From kbarrett at openjdk.java.net Thu Jul 1 00:00:00 2021
From: kbarrett at openjdk.java.net (Kim Barrett)
Date: Thu, 1 Jul 2021 00:00:00 GMT
Subject: [jdk17] RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array
In-Reply-To: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com>
References: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com>
Message-ID:

On Wed, 30 Jun 2021 20:11:17 GMT, Zhengyu Gu wrote:

> GetPrimitiveArrayCritical() is supposed to only be used with primitive array types, but nothing prevents current implementation from accepting object arrays (please see attached test case in bug).
>
> My proposed fix is not very friendly, it crashes JVM if a non-primitive array is passed in, but I am not sure what to expect in this scenario.
>
> Specification people, please comment. Thanks!

Changes requested by kbarrett (Reviewer).

src/hotspot/share/prims/jni.cpp line 2825:

> 2823: }
> 2824: oop a = lock_gc_or_pin_object(thread, array);
> 2825: guarantee(a->is_typeArray(), "Primitive array only");

I think an assert here is more usual. This is JNI - we don't make guarantees (pun slightly intentional). If you want error reporting, use -Xcheck:jni; the appropriate check and reporting is already there.

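For readers less familiar with the distinction drawn above, here is a minimal sketch of how a debug-only check differs from an always-on one. It does not use HotSpot's real `assert`/`guarantee` macros: `sketch_assert`, `sketch_guarantee`, `check_array`, `g_failures` and the `debug_build` flag are all hypothetical stand-ins, and failed checks are recorded rather than terminating the process so the difference can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT HotSpot's macros.
// Failed checks are recorded here instead of terminating the process.
static std::vector<std::string> g_failures;

// Always-on check: evaluated in every build flavor.
inline void sketch_guarantee(bool cond, const char* msg) {
  if (!cond) g_failures.push_back(msg);
}

// Debug-only check: does nothing when debug_build is false.
// (Assumption: a real implementation would select this at compile time.)
inline void sketch_assert(bool debug_build, bool cond, const char* msg) {
  if (debug_build && !cond) g_failures.push_back(msg);
}

// Applies ONE of the two check styles to a mock "is this a primitive array?"
// answer and returns how many failures this call recorded (0 or 1).
inline std::size_t check_array(bool is_type_array, bool debug_build, bool use_guarantee) {
  const std::size_t before = g_failures.size();
  if (use_guarantee) {
    sketch_guarantee(is_type_array, "Primitive array only");
  } else {
    sketch_assert(debug_build, is_type_array, "Primitive array only");
  }
  return g_failures.size() - before;
}
```

In this sketch, an object array passed in a non-debug build is caught only by the guarantee-style check; the assert-style check stays silent, which is the behavior the review comment argues is acceptable for JNI.
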
------------- PR: https://git.openjdk.java.net/jdk17/pull/185 From dcubed at openjdk.java.net Thu Jul 1 00:08:00 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 1 Jul 2021 00:08:00 GMT Subject: RFR: 8268364: jmethod clearing should be done during unloading In-Reply-To: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> Message-ID: On Wed, 30 Jun 2021 22:48:03 GMT, Coleen Phillimore wrote: > This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor. > Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it. As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events. > > Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug. > > Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmen > tationLimit=0 Since this fix touches jmethodIDs, I think it would be a good idea to run it through Mach5 Tier[1-7] (on all Oracle platforms). (Yes, I'm paranoid.) 
------------- PR: https://git.openjdk.java.net/jdk/pull/4643 From dcubed at openjdk.java.net Thu Jul 1 00:16:05 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 1 Jul 2021 00:16:05 GMT Subject: RFR: 8268364: jmethod clearing should be done during unloading In-Reply-To: <_Jm72ki3fIZ2dcAe4ykKXubrFUIOpzKoTQGniQ6GjmM=.e8df38c7-5d11-429d-939f-8b8ac002a180@github.com> References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> <_Jm72ki3fIZ2dcAe4ykKXubrFUIOpzKoTQGniQ6GjmM=.e8df38c7-5d11-429d-939f-8b8ac002a180@github.com> Message-ID: On Thu, 1 Jul 2021 00:06:45 GMT, Daniel D. Daugherty wrote: >> This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor. >> Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it. As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events. >> >> Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug. >> >> Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmen >> tationLimit=0 > > src/hotspot/share/oops/method.cpp line 2261: > >> 2259: } >> 2260: // Method should otherwise be valid. Assert for testing. 
>> 2261: assert(is_valid_method(o), "should be valid jmethodid"); > > nit typo: s/jmethodid/jmethodID/ I think you need a comment for L2262 to explain why you return NULL when the class loader is no longer alive. I think it's because it's racy if the class loader is no longer alive to return the jmethodID. However, if that is racy, then does that mean that the above new assert() is also racy? ------------- PR: https://git.openjdk.java.net/jdk/pull/4643 From dcubed at openjdk.java.net Thu Jul 1 00:16:04 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 1 Jul 2021 00:16:04 GMT Subject: RFR: 8268364: jmethod clearing should be done during unloading In-Reply-To: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> Message-ID: <_Jm72ki3fIZ2dcAe4ykKXubrFUIOpzKoTQGniQ6GjmM=.e8df38c7-5d11-429d-939f-8b8ac002a180@github.com> On Wed, 30 Jun 2021 22:48:03 GMT, Coleen Phillimore wrote: > This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor. > Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it. As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events. > > Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug. 
> > Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmen > tationLimit=0 Just comments for now. I need to think about this more. src/hotspot/share/oops/method.cpp line 2257: > 2255: if (mid == NULL) return NULL; > 2256: Method* o = resolve_jmethod_id(mid); > 2257: if (o == NULL || o == JNIMethodBlock::_free_method) { I need to think more about deleting the is_method() check. src/hotspot/share/oops/method.cpp line 2261: > 2259: } > 2260: // Method should otherwise be valid. Assert for testing. > 2261: assert(is_valid_method(o), "should be valid jmethodid"); nit typo: s/jmethodid/jmethodID/ ------------- PR: https://git.openjdk.java.net/jdk/pull/4643 From jwilhelm at openjdk.java.net Thu Jul 1 00:17:39 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 1 Jul 2021 00:17:39 GMT Subject: RFR: Merge jdk17 Message-ID: Forwardport JDK 17 -> JDK 18 ------------- Commit messages: - Merge - 8262841: Clarify the behavior of PhantomReference::refersTo - 8269703: ProblemList vmTestbase/nsk/jvmti/scenarios/sampling/SP07/sp07t002/TestDescription.java on Windows-X64 with -Xcomp - 8269513: Clarify the spec wrt `useOldISOCodes` system property - 8268897: [TESTBUG] compiler/compilercontrol/mixed/RandomCommandsTest.java must not fail on Command.quiet - 8268557: Module page uses unstyled table class - 8269691: ProblemList sun/management/jdp/JdpDefaultsTest.java on Linux-aarch64 - 8269486: CallerAccessTest fails for non server variant - 8269614: [s390] Interpreter checks wrong bit for slow path instance allocation - 8269594: assert(_handle_mark_nesting > 1) failed: memory leak: allocating handle outside HandleMark - ... 
and 6 more: https://git.openjdk.java.net/jdk/compare/85262c71...d9b654b1 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=4645&range=00.0 - jdk17: https://webrevs.openjdk.java.net/?repo=jdk&pr=4645&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/4645/files Stats: 394 lines in 29 files changed: 280 ins; 26 del; 88 mod Patch: https://git.openjdk.java.net/jdk/pull/4645.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4645/head:pull/4645 PR: https://git.openjdk.java.net/jdk/pull/4645 From david.holmes at oracle.com Thu Jul 1 00:20:09 2021 From: david.holmes at oracle.com (David Holmes) Date: Thu, 1 Jul 2021 10:20:09 +1000 Subject: RFR: JDK-8269650: Optimize gc-locker in [Get|Release]StringCritical for latin string In-Reply-To: References: Message-ID: On 1/07/2021 9:30 am, Kim Barrett wrote: > On Wed, 30 Jun 2021 11:55:49 GMT, Hamlin Li wrote: > >> Currently, JNI GetStringCritical locks gc locker for all strings including latin and non-latin until ReleaseStringCritical. >> But for latin, it's not necessary to still lock gc locker after GetStringCritical, as it's copied anyway whether obj pining is supported or not, so it's fine to unlock gc locker after GetStringCritical. > >> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.java.net):_ >> >> On 30/06/2021 10:35 pm, Thomas Schatzl wrote: >> >>> On Wed, 30 Jun 2021 11:55:49 GMT, Hamlin Li wrote: >>>> Currently, JNI GetStringCritical locks gc locker for all strings including latin and non-latin until ReleaseStringCritical. >>>> But for latin, it's not necessary to still lock gc locker after GetStringCritical, as it's copied anyway whether obj pining is supported or not, so it's fine to unlock gc locker after GetStringCritical. >>> >>> >>> Actually I think the *String* object can be unlocked regardless of `is_latin1` or not. 
The code returns the *char array* that the native code is going to process after all - which is not locked *at all* but probably should be. I filed [JDK-8269661](https://bugs.openjdk.java.net/browse/JDK-8269661) for this.
>>
>> I admit I do not know how String objects are laid out since compact
>> strings came along but IIRC before that the char* pointed to an actual
>> array embedded in the String's value, and so the String (and its value
>> array) had to be pinned/locked.
>>
>> David
>
> I don't see how that embedding of the char array could work (without bespoke GC support or something like Valhalla).
> I also don't see any sign of that in the code history.

I was unclear in what I was trying to describe. The String object contains a reference to its byte[] 'value'. The byte[] object embeds the actual array of bytes (C char) that we expose via GetStringCritical. So the byte[] has to be pinned, but not the String.

David

> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/4637
>

From zgu at openjdk.java.net Thu Jul 1 00:34:26 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Thu, 1 Jul 2021 00:34:26 GMT
Subject: [jdk17] RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array [v2]
In-Reply-To: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com>
References: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com>
Message-ID: <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com>

> GetPrimitiveArrayCritical() is supposed to only be used with primitive array types, but nothing prevents current implementation from accepting object arrays (please see attached test case in bug).
>
> My proposed fix is not very friendly, it crashes JVM if a non-primitive array is passed in, but I am not sure what to expect in this scenario.
>
> Specification people, please comment. Thanks!

Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Kim's comment ------------- Changes: - all: https://git.openjdk.java.net/jdk17/pull/185/files - new: https://git.openjdk.java.net/jdk17/pull/185/files/0b9f944e..a117ace2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk17&pr=185&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk17&pr=185&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk17/pull/185.diff Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/185/head:pull/185 PR: https://git.openjdk.java.net/jdk17/pull/185 From kbarrett at openjdk.java.net Thu Jul 1 01:06:19 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 1 Jul 2021 01:06:19 GMT Subject: [jdk17] RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array [v2] In-Reply-To: <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com> References: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com> <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com> Message-ID: <-CCV_PYOCcJf7aKFvc5rKzt9sQeSPt55KXi-_XHgdo8=.e3495833-308b-4aa6-a5bb-927009e7b8c7@github.com> On Thu, 1 Jul 2021 00:34:26 GMT, Zhengyu Gu wrote: >> GetPrimitiveArrayCritical() is supposed to only be used with primitive array types, but nothing prevents current implementation from accepting object arrays (please see attached test case in bug). >> >> My purposed fix is not very friendly, it crashes JVM if a none primitive array is passed in, but I am sure what to expect in this scenario. >> >> Specification people, please comment. Thanks! > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Kim's comment Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.java.net/jdk17/pull/185 From jwilhelm at openjdk.java.net Thu Jul 1 01:06:45 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 1 Jul 2021 01:06:45 GMT Subject: RFR: Merge jdk17 [v2] In-Reply-To: References: Message-ID: > Forwardport JDK 17 -> JDK 18 Jesper Wilhelmsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 125 commits: - Merge - 8268637: Update --release 17 symbol information for JDK 17 build 28 Reviewed-by: iris - 8269678: Remove unimplemented and unused os::bind_to_processor() Reviewed-by: dcubed - 8268457: XML Transformer outputs Unicode supplementary character incorrectly to HTML Reviewed-by: lancea, naoto, iris, joehw - 8269516: AArch64: Assembler cleanups Reviewed-by: ngasson, adinn - 8261495: Shenandoah: reconsider update references memory ordering Reviewed-by: zgu, rkennke - 8269478: Shenandoah: gc/shenandoah/mxbeans tests should be more resilient Reviewed-by: rkennke - 8269416: [JVMCI] capture libjvmci crash data to a file Reviewed-by: kvn, dholmes - 8268906: gc/g1/mixedgc/TestOldGenCollectionUsage.java assumes that GCs take 1ms minimum Reviewed-by: kbarrett, ayang, lkorinth - 8263461: jdk/jfr/event/gc/detailed/TestEvacuationFailedEvent.java uses wrong mechanism to cause evacuation failure Reviewed-by: kbarrett, iwalulya, ayang - ... 
and 115 more: https://git.openjdk.java.net/jdk/compare/9ac63a6e...d9b654b1 ------------- Changes: https://git.openjdk.java.net/jdk/pull/4645/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4645&range=01 Stats: 31079 lines in 656 files changed: 18219 ins; 10796 del; 2064 mod Patch: https://git.openjdk.java.net/jdk/pull/4645.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4645/head:pull/4645 PR: https://git.openjdk.java.net/jdk/pull/4645 From jwilhelm at openjdk.java.net Thu Jul 1 01:06:47 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 1 Jul 2021 01:06:47 GMT Subject: Integrated: Merge jdk17 In-Reply-To: References: Message-ID: <9kd-vCEotcwQxZlMbaZi-hGcyK6i6qGQRVefV1PeDBg=.d2fbbe58-7c3a-46d5-b685-66ab35fd8b70@github.com> On Thu, 1 Jul 2021 00:08:51 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 17 -> JDK 18 This pull request has now been integrated. Changeset: 9def3b06 Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/9def3b068e9ee065e2e545bb35f8dc56ccfe5955 Stats: 394 lines in 29 files changed: 280 ins; 26 del; 88 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/4645 From david.holmes at oracle.com Thu Jul 1 01:24:39 2021 From: david.holmes at oracle.com (David Holmes) Date: Thu, 1 Jul 2021 11:24:39 +1000 Subject: [jdk17] RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array In-Reply-To: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com> References: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com> Message-ID: <64586b79-75ff-fcdc-6e07-dd0fdb278932@oracle.com> On 1/07/2021 6:17 am, Zhengyu Gu wrote: > GetPrimitiveArrayCritical() is supposed to only be used with primitive array types, but nothing prevents current implementation from accepting object arrays (please see attached test case in bug). JNI does not in general perform parameter validation. 
This is a deliberate design choice with JNI. Use of -Xcheck:jni should detect and report incorrect usage.

David
-----

> My proposed fix is not very friendly, it crashes JVM if a non-primitive array is passed in, but I am not sure what to expect in this scenario.
>
> Specification people, please comment. Thanks!
>
> -------------
>
> Commit messages:
> - v0
>
> Changes: https://git.openjdk.java.net/jdk17/pull/185/files
> Webrev: https://webrevs.openjdk.java.net/?repo=jdk17&pr=185&range=00
> Issue: https://bugs.openjdk.java.net/browse/JDK-8269697
> Stats: 7 lines in 1 file changed: 0 ins; 5 del; 2 mod
> Patch: https://git.openjdk.java.net/jdk17/pull/185.diff
> Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/185/head:pull/185
>
> PR: https://git.openjdk.java.net/jdk17/pull/185
>

From david.holmes at oracle.com Thu Jul 1 04:41:44 2021
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 1 Jul 2021 14:41:44 +1000
Subject: [jdk17] RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array
In-Reply-To: <64586b79-75ff-fcdc-6e07-dd0fdb278932@oracle.com>
References: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com> <64586b79-75ff-fcdc-6e07-dd0fdb278932@oracle.com>
Message-ID: <9ae7cb6c-b114-58ef-28b8-c7e3ea671f07@oracle.com>

Hi Zhengyu,

On 1/07/2021 11:24 am, David Holmes wrote:
> On 1/07/2021 6:17 am, Zhengyu Gu wrote:
>> GetPrimitiveArrayCritical() is supposed to only be used with primitive
>> array types, but nothing prevents current implementation from
>> accepting object arrays (please see attached test case in bug).
>
> JNI does not in general perform parameter validation. This is a
> deliberate design choice with JNI. Use of -Xcheck:jni should detect and
> report incorrect usage.

Sorry I misunderstood the nature of the proposed fix.

>> My proposed fix is not very friendly, it crashes JVM if a
>> non-primitive array is passed in, but I am not sure what to expect in this
>> scenario.
>>
>> Specification people, please comment. Thanks!

Crashing if passed the wrong type of array is perfectly fine per JNI specification "Reporting Errors" section [1].

[1]. https://docs.oracle.com/en/java/javase/16/docs/specs/jni/design.html#reporting-programming-errors

I'll see if I can dig up why the code does what it does.

Thanks,
David
-----

>>
>> -------------
>>
>> Commit messages:
>> - v0
>>
>> Changes: https://git.openjdk.java.net/jdk17/pull/185/files
>> Webrev: https://webrevs.openjdk.java.net/?repo=jdk17&pr=185&range=00
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8269697
>> Stats: 7 lines in 1 file changed: 0 ins; 5 del; 2 mod
>> Patch: https://git.openjdk.java.net/jdk17/pull/185.diff
>> Fetch: git fetch https://git.openjdk.java.net/jdk17
>> pull/185/head:pull/185
>>
>> PR: https://git.openjdk.java.net/jdk17/pull/185
>>

From david.holmes at oracle.com Thu Jul 1 05:00:48 2021
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 1 Jul 2021 15:00:48 +1000
Subject: [jdk17] RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array
In-Reply-To: <9ae7cb6c-b114-58ef-28b8-c7e3ea671f07@oracle.com>
References: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com> <64586b79-75ff-fcdc-6e07-dd0fdb278932@oracle.com> <9ae7cb6c-b114-58ef-28b8-c7e3ea671f07@oracle.com>
Message-ID: <06838ae5-ee45-d89d-ebc4-ce441ca8c735@oracle.com>

On 1/07/2021 2:41 pm, David Holmes wrote:
> Hi Zhengyu,
>
> On 1/07/2021 11:24 am, David Holmes wrote:
>> On 1/07/2021 6:17 am, Zhengyu Gu wrote:
>>> GetPrimitiveArrayCritical() is supposed to only be used with
>>> primitive array types, but nothing prevents current implementation
>>> from accepting object arrays (please see attached test case in bug).
>>
>> JNI does not in general perform parameter validation. This is a
>> deliberate design choice with JNI. Use of -Xcheck:jni should detect
>> and report incorrect usage.
>
> Sorry I misunderstood the nature of the proposed fix.
>
>>> My proposed fix is not very friendly, it crashes JVM if a
>>> non-primitive array is passed in, but I am not sure what to expect in this
>>> scenario.
>>>
>>> Specification people, please comment. Thanks!
>
> Crashing if passed the wrong type of array is perfectly fine per JNI
> specification "Reporting Errors" section [1].
>
> [1].
> https://docs.oracle.com/en/java/javase/16/docs/specs/jni/design.html#reporting-programming-errors
>
>
> I'll see if I can dig up why the code does what it does.

No, this is basically "day one" code from 1997.

David

> Thanks,
> David
> -----
>
>>>
>>> -------------
>>>
>>> Commit messages:
>>> - v0
>>>
>>> Changes: https://git.openjdk.java.net/jdk17/pull/185/files
>>> Webrev: https://webrevs.openjdk.java.net/?repo=jdk17&pr=185&range=00
>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8269697
>>> Stats: 7 lines in 1 file changed: 0 ins; 5 del; 2 mod
>>> Patch: https://git.openjdk.java.net/jdk17/pull/185.diff
>>> Fetch: git fetch https://git.openjdk.java.net/jdk17
>>> pull/185/head:pull/185
>>>
>>> PR: https://git.openjdk.java.net/jdk17/pull/185
>>>

From dholmes at openjdk.java.net Thu Jul 1 05:03:04 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 1 Jul 2021 05:03:04 GMT
Subject: [jdk17] RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array [v2]
In-Reply-To: <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com>
References: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com> <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com>
Message-ID:

On Thu, 1 Jul 2021 00:34:26 GMT, Zhengyu Gu wrote:

>> GetPrimitiveArrayCritical() is supposed to only be used with primitive array types, but nothing prevents current implementation from accepting object arrays (please see attached test case in bug).
>> >> My purposed fix is not very friendly, it crashes JVM if a none primitive array is passed in, but I am sure what to expect in this scenario. >> >> Specification people, please comment. Thanks! > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Kim's comment The change looks good. We should add a testcase in runtime/jni/checked/ to ensure Xcheck:jni catches this. Thanks, David ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk17/pull/185 From shade at openjdk.java.net Thu Jul 1 08:49:02 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 1 Jul 2021 08:49:02 GMT Subject: [jdk17] RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array [v2] In-Reply-To: <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com> References: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com> <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com> Message-ID: On Thu, 1 Jul 2021 00:34:26 GMT, Zhengyu Gu wrote: >> GetPrimitiveArrayCritical() is supposed to only be used with primitive array types, but nothing prevents current implementation from accepting object arrays (please see attached test case in bug). >> >> My purposed fix is not very friendly, it crashes JVM if a none primitive array is passed in, but I am sure what to expect in this scenario. >> >> Specification people, please comment. Thanks! > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Kim's comment Hold on. This introduces the behavioral change _for the worse_, right? More specifically, the current callers-with-object-array would now get `TypeArrayKlass::cast()`-ed to improper class with unknown consequences. _Hopefully_ it would be a JVM crash, but it might as well be incorrect execution. 
Old code would execute fine with most GCs (= the ones using GCLocker). Would it be saner to do a copy of object array if such a request is detected? I believe it warrants a CSR discussion, and therefore postponement to JDK 18. ------------- PR: https://git.openjdk.java.net/jdk17/pull/185 From pliden at openjdk.java.net Thu Jul 1 08:54:01 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 1 Jul 2021 08:54:01 GMT Subject: [jdk17] RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array [v2] In-Reply-To: <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com> References: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com> <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com> Message-ID: <1Uk6ccQ7jGW8foy8pVaACn7miSWKUQCAfyjl7ud3uIE=.6b5b2d9e-9339-47b1-a78e-7d9454eb5be7@github.com> On Thu, 1 Jul 2021 00:34:26 GMT, Zhengyu Gu wrote: >> GetPrimitiveArrayCritical() is supposed to only be used with primitive array types, but nothing prevents current implementation from accepting object arrays (please see attached test case in bug). >> >> My purposed fix is not very friendly, it crashes JVM if a none primitive array is passed in, but I am sure what to expect in this scenario. >> >> Specification people, please comment. Thanks! > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Kim's comment The spec (https://docs.oracle.com/en/java/javase/16/docs/specs/jni/functions.html#getprimitivearraycritical-releaseprimitivearraycritical) says: > Returns a pointer to the array elements, or NULL if the operation fails. I interpret this as we're allowed to return NULL in this case. Operation failed because of invalid input. 
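One possible reading of the "return NULL on invalid input" interpretation above can be sketched as follows. This is not the HotSpot implementation and does not use real JNI types: `MockArray`, `get_critical_sketch`, `release_critical_sketch` and `pin_count` are hypothetical names invented for illustration. Placing the type check before the mock pin step is a choice made in this sketch, so that a rejected array is never pinned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mock of a Java array; NOT a HotSpot or JNI type.
struct MockArray {
  bool is_primitive;               // true for int[], byte[], ...; false for Object[]
  std::vector<std::uint8_t> data;  // backing storage for primitive elements
  int pin_count;                   // outstanding mock "critical" sections

  MockArray(bool primitive, std::size_t len)
      : is_primitive(primitive), data(len, 0), pin_count(0) {}
};

// Returns a pointer to the elements, or nullptr if the operation fails.
// A non-primitive array is treated as a failure and is never pinned.
// (For simplicity the sketch assumes non-empty arrays.)
inline void* get_critical_sketch(MockArray* array) {
  if (array == nullptr || !array->is_primitive) {
    return nullptr;  // invalid input: report failure to the caller
  }
  ++array->pin_count;  // stand-in for locking the GC or pinning the object
  return array->data.data();
}

// Ends a mock critical section previously opened by get_critical_sketch().
inline void release_critical_sketch(MockArray* array, void* elems) {
  if (array != nullptr && elems != nullptr && array->pin_count > 0) {
    --array->pin_count;
  }
}
```

A caller of such a function has to treat a null result as "no critical section was entered" and skip the matching release call.
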
------------- PR: https://git.openjdk.java.net/jdk17/pull/185 From shade at openjdk.java.net Thu Jul 1 08:59:57 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 1 Jul 2021 08:59:57 GMT Subject: [jdk17] RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array [v2] In-Reply-To: <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com> References: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com> <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com> Message-ID: On Thu, 1 Jul 2021 00:34:26 GMT, Zhengyu Gu wrote: >> GetPrimitiveArrayCritical() is supposed to only be used with primitive array types, but nothing prevents current implementation from accepting object arrays (please see attached test case in bug). >> >> My purposed fix is not very friendly, it crashes JVM if a none primitive array is passed in, but I am sure what to expect in this scenario. >> >> Specification people, please comment. Thanks! > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Kim's comment Yes, returning `NULL` would be much saner here. That would alleviate my concern: the out-of-spec users would have a clear failure result. (It should be done before trying to lock/pin the object, of course). ------------- PR: https://git.openjdk.java.net/jdk17/pull/185 From aph at redhat.com Thu Jul 1 10:00:50 2021 From: aph at redhat.com (Andrew Haley) Date: Thu, 1 Jul 2021 11:00:50 +0100 Subject: RFR: 8269476: Skip nmethod entry barrier if there is no oops in the jit code [v4] In-Reply-To: <9FFE0079-6EC2-4C0B-B235-C9F9D2C9E90D@oracle.com> References: <9FFE0079-6EC2-4C0B-B235-C9F9D2C9E90D@oracle.com> Message-ID: <8eb6de3a-44e6-dd80-c14f-b54b28a67c95@redhat.com> On 6/30/21 9:02 PM, John Rose wrote: > There are obvious micro-benchmarks where the cost > of nmethod entry would be detectable. 
But nmethods
> tend to be large and loopy and calls to them are typically
> infrequent and expensive. (Expenses include spilling
> registers, redoing checks on inputs, and lots more.)

True enough: method calls are expensive, particularly because we don't have any callee-preserved registers. The notion that "calls are expensive" has IMVHO caused us to ignore some things that could make them cheaper, such as frameless leaf methods.

That is to say: to some extent, the belief in an inevitable future can cause that future to happen. If we hadn't had such expensive calls maybe we wouldn't have needed to make so much effort to avoid them. And maybe we'd have had to do less inlining, etc., and our caches would work better. Or maybe not. :-)

(Please forgive the somewhat pedagogical tone of the following: it is the only way I know how to explain my thoughts.)

An efficient system includes thousands of tiny optimizations, each one too small to be measured on its own, at least without heroic efforts. This is perhaps more obvious in, e.g. automobile engines than it is in the systems we create, but I think it's the same thing. But lose them, and, one by one, the system slowly gets worse.

Even if, in isolation, adding some code makes no difference in speed, you have made something worse. The most likely reason nothing slowed down is that a superscalar CPU managed to run your extra instructions in parallel. But it does cost, even if the result was delivered to the user in the same time, in terms of power consumption. Also, it reduces the availability of speculation resources which might be used for useful work in a slightly different run of the same program.

I am going to take a little time to quantify, hopefully in percentage terms, the cost of nmethod barriers, as much for my own education as anything else. I'll get back to y'all.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley From coleenp at openjdk.java.net Thu Jul 1 12:24:02 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 1 Jul 2021 12:24:02 GMT Subject: RFR: 8268364: jmethod clearing should be done during unloading In-Reply-To: <_Jm72ki3fIZ2dcAe4ykKXubrFUIOpzKoTQGniQ6GjmM=.e8df38c7-5d11-429d-939f-8b8ac002a180@github.com> References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> <_Jm72ki3fIZ2dcAe4ykKXubrFUIOpzKoTQGniQ6GjmM=.e8df38c7-5d11-429d-939f-8b8ac002a180@github.com> Message-ID: On Thu, 1 Jul 2021 00:12:52 GMT, Daniel D. Daugherty wrote: >> This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor. >> Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it. As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events. >> >> Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug. >> >> Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmen >> tationLimit=0 > > src/hotspot/share/oops/method.cpp line 2257: > >> 2255: if (mid == NULL) return NULL; >> 2256: Method* o = resolve_jmethod_id(mid); >> 2257: if (o == NULL || o == JNIMethodBlock::_free_method) { > > I need to think more about deleting the is_method() check. 
The is_method() check was from when JNIHandleBlocks contained both jmethodID's and jobjects. It's a leftover from permgen elimination. I should have mentioned that in the PR comments. Now the only things that can be put in the JNIMethodBlocks are Methods, which the compiler type-checks, so that's the only thing that can come out (unless corrupted). ------------- PR: https://git.openjdk.java.net/jdk/pull/4643 From coleenp at openjdk.java.net Thu Jul 1 12:24:03 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 1 Jul 2021 12:24:03 GMT Subject: RFR: 8268364: jmethod clearing should be done during unloading In-Reply-To: References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> <_Jm72ki3fIZ2dcAe4ykKXubrFUIOpzKoTQGniQ6GjmM=.e8df38c7-5d11-429d-939f-8b8ac002a180@github.com> Message-ID: On Thu, 1 Jul 2021 00:11:55 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/oops/method.cpp line 2261: >> >>> 2259: } >>> 2260: // Method should otherwise be valid. Assert for testing. >>> 2261: assert(is_valid_method(o), "should be valid jmethodid"); >> >> nit typo: s/jmethodid/jmethodID/ > > I think you need a comment for L2262 to explain why you return NULL > when the class loader is no longer alive. I think it's because it's racy > if the class loader is no longer alive to return the jmethodID. > > However, if that is racy, then does that mean that the above new assert() > is also racy? The assert isn't racy because there needs to be a safepoint in between changing the color of the oop pointers, so if there's a Method there, the class loader liveness won't change state in between. The assert checks for the unexpected case where it's corrupted. 
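The order of checks discussed in this sub-thread can be sketched as below. This is only an illustrative model, not the HotSpot code: `LoaderSketch`, `MethodSketch`, `g_free_method` and `checked_resolve_sketch` are hypothetical stand-ins for ClassLoaderData, Method, JNIMethodBlock::_free_method and the checked resolve function, and the "weak load of the holder" is reduced to a boolean.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins; none of these are the real HotSpot classes.
struct LoaderSketch {
  bool holder_reachable;  // models a successful weak load of the holder oop
  bool is_alive() const { return holder_reachable; }
};

struct MethodSketch {
  const LoaderSketch* loader;  // models the class loader data of the holder
  bool looks_valid;            // models the result of is_valid_method()
};

// Sentinel playing the role of JNIMethodBlock::_free_method (assumption).
static MethodSketch g_free_method = { nullptr, false };

// Models the checked resolve: a jmethodID is treated here as a pointer to a
// slot that holds a method pointer.
inline MethodSketch* checked_resolve_sketch(MethodSketch** mid) {
  if (mid == nullptr) return nullptr;
  MethodSketch* o = *mid;
  if (o == nullptr || o == &g_free_method) return nullptr;
  // The method is expected to be intact here; the assert only catches corruption.
  assert(o->looks_valid && "should be valid jmethodID");
  // A dying loader yields nullptr even though the method memory is still intact.
  return o->loader->is_alive() ? o : nullptr;
}
```

In this model a slot whose loader is no longer alive resolves to nullptr without tripping the assert, which is the behaviour described above for the window before the memory is reclaimed.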
------------- PR: https://git.openjdk.java.net/jdk/pull/4643 From zgu at openjdk.java.net Thu Jul 1 12:36:02 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 1 Jul 2021 12:36:02 GMT Subject: [jdk17] RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array [v2] In-Reply-To: <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com> References: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com> <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com> Message-ID: On Thu, 1 Jul 2021 00:34:26 GMT, Zhengyu Gu wrote: >> GetPrimitiveArrayCritical() is supposed to only be used with primitive array types, but nothing prevents current implementation from accepting object arrays (please see attached test case in bug). >> >> My purposed fix is not very friendly, it crashes JVM if a none primitive array is passed in, but I am sure what to expect in this scenario. >> >> Specification people, please comment. Thanks! > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Kim's comment The spec also says: > Therefore we need to check its return value against NULL for possible out of memory situations. 
it suggests that returning NULL solely due to out of memory ------------- PR: https://git.openjdk.java.net/jdk17/pull/185 From david.holmes at oracle.com Thu Jul 1 13:03:17 2021 From: david.holmes at oracle.com (David Holmes) Date: Thu, 1 Jul 2021 23:03:17 +1000 Subject: [jdk17] RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array [v2] In-Reply-To: References: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com> <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com> Message-ID: <8ad7b801-31b7-4b2f-1655-706bc592cb62@oracle.com> On 1/07/2021 6:49 pm, Aleksey Shipilev wrote: > On Thu, 1 Jul 2021 00:34:26 GMT, Zhengyu Gu wrote: > >>> GetPrimitiveArrayCritical() is supposed to only be used with primitive array types, but nothing prevents current implementation from accepting object arrays (please see attached test case in bug). >>> >>> My purposed fix is not very friendly, it crashes JVM if a none primitive array is passed in, but I am sure what to expect in this scenario. >>> >>> Specification people, please comment. Thanks! >> >> Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: >> >> Kim's comment > > Hold on. This introduces the behavioral change _for the worse_, right? More specifically, the current callers-with-object-array would now get `TypeArrayKlass::cast()`-ed to improper class with unknown consequences. _Hopefully_ it would be a JVM crash, but it might as well be incorrect execution. Old code would execute fine with most GCs (= the ones using GCLocker). Would it be saner to do a copy of object array if such a request is detected? Making a copy seems a rather bizarre way to legitimize something invalid. I agree that it might be better to detect and return NULL rather than hoping it might crash. But the same can be said for passing bad parameters to any JNI method - we don't guarantee fail-fast. 
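For readers following along, the "detect and return NULL" option can be modelled roughly as follows. This is a toy sketch under stated assumptions, not JNI or HotSpot code: `ArraySketch` and `get_critical_sketch` are hypothetical names, and pinning or GC locking is left out entirely.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of a Java array; not a HotSpot or JNI type.
struct ArraySketch {
  bool is_primitive;                   // true for int[], byte[], ...; false for Object[]
  std::vector<unsigned char> payload;  // raw element storage
};

// "Detect and return NULL": refuse anything that is not a primitive array
// instead of handing out a pointer the caller cannot use safely.
inline void* get_critical_sketch(ArraySketch* a) {
  if (a == nullptr || !a->is_primitive) return nullptr;
  return a->payload.data();
}
```

One consequence, raised earlier in the thread, is that a caller of such a function could not tell this NULL apart from the out-of-memory NULL the spec already describes.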
> I believe it warrants a CSR discussion, and therefore postponement to JDK 18. It had skipped my notice that this was targeted to 17. I do not agree with that. This is day one behaviour so not a P3+ bug, so not suitable for changing in RDP1. This has to be targeted to 18 and yes a CSR request could be worthwhile - though unclear how to handle such a case when the spec itself fails to handle it. Perhaps for 18 we need to adjust the spec as well to explicitly reject a non-primitive array and return NULL. David > ------------- > > PR: https://git.openjdk.java.net/jdk17/pull/185 > From hseigel at openjdk.java.net Thu Jul 1 13:12:16 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Thu, 1 Jul 2021 13:12:16 GMT Subject: RFR: 8244162: Additional opportunities to use NONCOPYABLE Message-ID: Please review this small change to use NONCOPYABLE macro where applicable. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. Thanks, Harold ------------- Commit messages: - 8244162: Additional opportunities to use NONCOPYABLE Changes: https://git.openjdk.java.net/jdk/pull/4652/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4652&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8244162 Stats: 21 lines in 6 files changed: 6 ins; 9 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/4652.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4652/head:pull/4652 PR: https://git.openjdk.java.net/jdk/pull/4652 From zgu at openjdk.java.net Thu Jul 1 13:20:01 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 1 Jul 2021 13:20:01 GMT Subject: [jdk17] RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array [v2] In-Reply-To: <5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com> References: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com> 
<5wDsqEPH7CoHMK7zN7f7TFbNuwrtqto8_8bnjSnx7pE=.9377ce23-dbf8-4c03-9057-e612628782ab@github.com> Message-ID: <7Im4_7lHvEYAaz28RK8grsORFG-bPWVb6Ehu5LhH0_8=.5dd00e3c-58a8-4a34-8d44-55bb333e5c86@github.com> On Thu, 1 Jul 2021 00:34:26 GMT, Zhengyu Gu wrote: >> GetPrimitiveArrayCritical() is supposed to only be used with primitive array types, but nothing prevents current implementation from accepting object arrays (please see attached test case in bug). >> >> My purposed fix is not very friendly, it crashes JVM if a none primitive array is passed in, but I am sure what to expect in this scenario. >> >> Specification people, please comment. Thanks! > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Kim's comment Close this PR and move to 18 for further discussion. ------------- PR: https://git.openjdk.java.net/jdk17/pull/185 From zgu at openjdk.java.net Thu Jul 1 13:20:01 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 1 Jul 2021 13:20:01 GMT Subject: [jdk17] Withdrawn: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array In-Reply-To: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com> References: <3haNnfVYr8DFrHCOx3EKMARn3Qs_JTJSVVIxvl1fJYg=.10c1be00-600e-442c-95c7-db705464054e@github.com> Message-ID: On Wed, 30 Jun 2021 20:11:17 GMT, Zhengyu Gu wrote: > GetPrimitiveArrayCritical() is supposed to only be used with primitive array types, but nothing prevents current implementation from accepting object arrays (please see attached test case in bug). > > My purposed fix is not very friendly, it crashes JVM if a none primitive array is passed in, but I am sure what to expect in this scenario. > > Specification people, please comment. Thanks! This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk17/pull/185 From aph at redhat.com Thu Jul 1 13:49:30 2021 From: aph at redhat.com (Andrew Haley) Date: Thu, 1 Jul 2021 14:49:30 +0100 Subject: The cost of nmethod entry barriers [was: RFR: 8269476: Skip nmethod entry barrier if there is no oops in the jit code [v4]] In-Reply-To: <8eb6de3a-44e6-dd80-c14f-b54b28a67c95@redhat.com> References: <9FFE0079-6EC2-4C0B-B235-C9F9D2C9E90D@oracle.com> <8eb6de3a-44e6-dd80-c14f-b54b28a67c95@redhat.com> Message-ID: <61040ae8-bbbd-d10b-f579-e94eac74df0f@redhat.com> On 7/1/21 11:00 AM, Andrew Haley wrote: > I am going to take a little time to quantify, hopefully in percentage > terms, the cost of nmethod barriers, as much for my own education as > anything else. I'll get back to y'all. I have numbers. Running javac (@java.base) on AArch64 we execute on average 657152838 nmethod barriers. That number varies by as much as 2.5%, to be expected given that compiler threads are racing with Java threads. perf stats are typically:

    77232.039076      task-clock:u (msec)   #    3.284 CPUs utilized
          933972      page-faults:u         #    0.012 M/sec
    167237276242      cycles:u              #    2.165 GHz
    239468489262      instructions:u        #    1.43  insn per cycle
      1762368611      branch-misses:u

    23.520042515 seconds time elapsed

An nmethod barrier is 5 instructions, so the proportion of instructions executed by nmethod barriers is

    (657152838.0*5)/239468489262 = 1.38%

That's quite a lot. I would have expected it to be noticeable. It might well be that the barrier instructions commonly are (fully or partially) speculated in parallel on a big out-of-order machine so don't show up on a wall clock, but they will show up in perf stats. Also, that 1% is about the level of run-to-run variance even on a quiet server, so it'd take many runs averaged out to see it. But the effect is real; unless I have messed up my measurements or my thinking, which happens. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From zgu at openjdk.java.net Thu Jul 1 13:49:27 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 1 Jul 2021 13:49:27 GMT Subject: RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array Message-ID: <_2wLsWH3nda7_VjcP1R8HYf53avdsnYdeqaJeZZ0fbQ=.39912c26-11f7-447b-9fc1-bc7c37a96add@github.com> Open this PR to carry on the discussion started in jdk17 [https://github.com/openjdk/jdk17/pull/185](url) ------------- Commit messages: - v1 Changes: https://git.openjdk.java.net/jdk/pull/4653/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4653&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8269697 Stats: 128 lines in 3 files changed: 121 ins; 5 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/4653.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4653/head:pull/4653 PR: https://git.openjdk.java.net/jdk/pull/4653 From dcubed at openjdk.java.net Thu Jul 1 14:36:05 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 1 Jul 2021 14:36:05 GMT Subject: RFR: 8268364: jmethod clearing should be done during unloading In-Reply-To: References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> <_Jm72ki3fIZ2dcAe4ykKXubrFUIOpzKoTQGniQ6GjmM=.e8df38c7-5d11-429d-939f-8b8ac002a180@github.com> Message-ID: On Thu, 1 Jul 2021 12:17:58 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/method.cpp line 2257: >> >>> 2255: if (mid == NULL) return NULL; >>> 2256: Method* o = resolve_jmethod_id(mid); >>> 2257: if (o == NULL || o == JNIMethodBlock::_free_method) { >> >> I need to think more about deleting the is_method() check. > > The is_method() check was from when JNIHandleBlocks contained both jmethodID's and jobjects. It's a leftover from permgen elimination. I should have mentioned that in the PR comments. 
Now the only things that can be put in the JNIMethodBlocks are Methods, which the compiler type-checks, so that's the only thing that can come out (unless corrupted). Thanks for the explanation. ------------- PR: https://git.openjdk.java.net/jdk/pull/4643 From dcubed at openjdk.java.net Thu Jul 1 14:57:00 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 1 Jul 2021 14:57:00 GMT Subject: RFR: 8268364: jmethod clearing should be done during unloading In-Reply-To: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> Message-ID: On Wed, 30 Jun 2021 22:48:03 GMT, Coleen Phillimore wrote: > This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor. > Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it. As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events. > > Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug. > > Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmen > tationLimit=0 Still just comments because I'm not clear on the one code path. src/hotspot/share/oops/method.cpp line 2262: > 2260: // Method should otherwise be valid. Assert for testing. 
> 2261: assert(is_valid_method(o), "should be valid jmethodid"); > 2262: return o->method_holder()->is_loader_alive() ? o : NULL; Let me ask the question a different way. Here's the two lines of code: assert(is_valid_method(o), "should be valid jmethodid"); return o->method_holder()->is_loader_alive() ? o : NULL; If `is_loader_alive()` can return false at this point, then what does the previous `is_valid_method(o)` call return in that case? Should the `assert()` call only be made in the case where `is_loader_alive()` returns true? Based on what you said in your reply, the normal expectation is that the `assert()` call will pass because at that point we must have a valid Method. (I'm ignoring corruption here). If we have a valid Method, then that implies that the `is_loader_alive()` call must also return true. So if we're expecting the Method to be valid at the point of the `assert()` call, then why are we checking `is_loader_alive()`? It could be for robustness, but in that case, I would add a comment making that clear. ------------- PR: https://git.openjdk.java.net/jdk/pull/4643 From coleenp at openjdk.java.net Thu Jul 1 15:44:03 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 1 Jul 2021 15:44:03 GMT Subject: RFR: 8268364: jmethod clearing should be done during unloading In-Reply-To: References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> Message-ID: On Thu, 1 Jul 2021 14:53:24 GMT, Daniel D. Daugherty wrote: >> This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor. 
>> Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it. As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events. >> >> Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug. >> >> Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmen >> tationLimit=0 > > src/hotspot/share/oops/method.cpp line 2262: > >> 2260: // Method should otherwise be valid. Assert for testing. >> 2261: assert(is_valid_method(o), "should be valid jmethodid"); >> 2262: return o->method_holder()->is_loader_alive() ? o : NULL; > > Let me ask the question a different way. Here's the two lines of code: > > assert(is_valid_method(o), "should be valid jmethodid"); > return o->method_holder()->is_loader_alive() ? o : NULL; > > > If `is_loader_alive()` can return false at this point, then what does > the previous `is_valid_method(o)` call return in that case? Should the > `assert()` call only be made in the case where `is_loader_alive()` returns > true? > > Based on what you said in your reply, the normal expectation is that the > `assert()` call will pass because at that point we must have a valid Method. > (I'm ignoring corruption here). If we have a valid Method, then that implies > that the `is_loader_alive()` call must also return true. So if we're expecting > the Method to be valid at the point of the `assert()` call, then why are we > checking `is_loader_alive()`? It could be for robustness, but in that case, > I would add a comment making that clear. The assert is to check for corruptness but even if the loader isn't alive, the Method* should be valid. For concurrent unloading (ie. 
zgc), checking that the method loader is alive goes like this: method->method_holder() points to the method's InstanceKlass but the ->is_loader_alive will go to the ClassLoaderData and do a weak load of the holder oop (class_loader or mirror oop). If the weak load fails, the loader is no longer alive; the concurrent collector hasn't marked it as unloaded yet, but it will. After the safepoint, the code is then safe to purge and that will reclaim memory for the Method and its holder.

    // Unloading support
    bool ClassLoaderData::is_alive() const {
      bool alive = keep_alive()         // null class loader and incomplete non-strong hidden class.
          || (_holder.peek() != NULL);  // and not cleaned by the GC weak handle processing.
      return alive;
    }

------------- PR: https://git.openjdk.java.net/jdk/pull/4643 From coleenp at openjdk.java.net Thu Jul 1 15:50:26 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 1 Jul 2021 15:50:26 GMT Subject: RFR: 8268364: jmethod clearing should be done during unloading [v2] In-Reply-To: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> Message-ID: > This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor. > Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it.
As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events. > > Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug. > > Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmen > tationLimit=0 Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: - ooops fixed typo - Add a comment about is_loader_alive. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4643/files - new: https://git.openjdk.java.net/jdk/pull/4643/files/76ec7ae3..50865ee0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4643&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4643&range=00-01 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4643.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4643/head:pull/4643 PR: https://git.openjdk.java.net/jdk/pull/4643 From eosterlund at openjdk.java.net Thu Jul 1 16:38:00 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 1 Jul 2021 16:38:00 GMT Subject: RFR: 8268364: jmethod clearing should be done during unloading [v2] In-Reply-To: References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> Message-ID: On Thu, 1 Jul 2021 15:50:26 GMT, Coleen Phillimore wrote: >> This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor. 
>> Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it. As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events. >> >> Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug. >> >> Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmen >> tationLimit=0 > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - ooops fixed typo > - Add a comment about is_loader_alive. I did wonder if this was an issue but thought that if you are racingly using a jmethodid that is uncontrollably being unloaded (i.e. you are not holding the class alive in any way at all), then you are using the APIs in a seemingly buggy way that may or may not work the way you expect it to. Nevertheless, bugging out without a crash does seem better, and it did annoy me that these jmethodids are the only things unlinked that late. Thanks for fixing this properly. This is the way I thought about fixing it, when I was not sure if it needed fixing or not. Sounds like it did need fixing after all. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4643 From dcubed at openjdk.java.net Thu Jul 1 17:54:02 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 1 Jul 2021 17:54:02 GMT Subject: RFR: 8268364: jmethod clearing should be done during unloading [v2] In-Reply-To: References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> Message-ID: On Thu, 1 Jul 2021 15:50:26 GMT, Coleen Phillimore wrote: >> This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor. >> Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it. As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events. >> >> Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug. >> >> Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmen >> tationLimit=0 > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - ooops fixed typo > - Add a comment about is_loader_alive. Thanks for the explanation on the `is_loader_alive()` part and thanks for adding the new comment. Thumbs up. ------------- Marked as reviewed by dcubed (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4643 From coleen.phillimore at oracle.com Thu Jul 1 19:37:04 2021 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Thu, 1 Jul 2021 15:37:04 -0400 Subject: RFR: 8268364: jmethod clearing should be done during unloading [v2] In-Reply-To: References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> Message-ID: <2a09a4dc-6db4-fb4d-acdb-574f2d9b0a52@oracle.com> On 7/1/21 12:38 PM, Erik Österlund wrote: > On Thu, 1 Jul 2021 15:50:26 GMT, Coleen Phillimore wrote: > >>> This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor. >>> Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it. As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events. >>> >>> Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug. >>> >>> Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmen >>> tationLimit=0 >> Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: >> >> - ooops fixed typo >> - Add a comment about is_loader_alive. > I did wonder if this was an issue but thought that if you are racingly using a jmethodid that is uncontrollably being unloaded (i.e.
you are not holding the class alive in any way at all), then you are using the APIs in a seemingly buggy way that may or may not work the way you expect it to. Nevertheless, bugging out without a crash does seem better, and it did annoy me that these jmethodids are the only things unlinked that late. > Thanks for fixing this properly. This is the way I thought about fixing it, when I was not sure if it needed fixing or not. Sounds like it did need fixing after all. Erik, thank you for confirming my diagnosis of the problem and solution. Ioi had found this through code inspection and carefully reading JVMTI together helped. There weren't any test failures, and in fact, this bug is getting marked with noreg-hard. Thanks! Coleen > > ------------- > > Marked as reviewed by eosterlund (Reviewer). > > PR: https://git.openjdk.java.net/jdk/pull/4643 From john.r.rose at oracle.com Thu Jul 1 20:56:31 2021 From: john.r.rose at oracle.com (John Rose) Date: Thu, 1 Jul 2021 20:56:31 +0000 Subject: [External] : The cost of nmethod entry barriers [was: RFR: 8269476: Skip nmethod entry barrier if there is no oops in the jit code [v4]] In-Reply-To: <61040ae8-bbbd-d10b-f579-e94eac74df0f@redhat.com> References: <9FFE0079-6EC2-4C0B-B235-C9F9D2C9E90D@oracle.com> <8eb6de3a-44e6-dd80-c14f-b54b28a67c95@redhat.com> <61040ae8-bbbd-d10b-f579-e94eac74df0f@redhat.com> Message-ID: <33BC1746-BAE1-4161-8499-92DFA2935792@oracle.com> Nice work; thanks. So for javac (which BTW has a lot of polymorphism in it) the dynamic proportion of retired nmethod entry instructions is 1.38%. Unless those instructions are unusually slow ones, that percentage is an upper limit to their wall clock contribution. If we can get rid of 2 out of 5 of them, the estimate would be 0.83%, with a savings of 0.55% (upper bound). That is indeed a tempting target. Just as performance can degrade by the "death of a thousand cuts" as you describe, it improves, often, by doggedly adding half percent improvements over and over.
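The percentages in this sub-thread can be rechecked with a few lines of arithmetic. The sketch below is only a convenience for the reader; `barrier_share_percent` is a hypothetical helper name, and the inputs are the barrier count and the instructions:u figure reported in the measurement message. Computed without intermediate rounding, the quotients come out a shade below the rounded 1.38% and 0.83% quoted in the thread (roughly 1.37% and 0.82%), which does not change the argument.

```cpp
#include <cstdint>

// Percentage of all retired instructions that belong to nmethod entry
// barriers, given the number of barriers executed, the instruction count
// per barrier, and the total retired instruction count.
inline double barrier_share_percent(std::uint64_t barriers,
                                    unsigned insns_per_barrier,
                                    std::uint64_t total_insns) {
  return 100.0 * static_cast<double>(barriers) * insns_per_barrier
         / static_cast<double>(total_insns);
}
```

Calling it with 657152838 barriers and 239468489262 total instructions, once with 5 instructions per barrier and once with 3, gives the current share and the share after dropping 2 of the 5 instructions; the difference is the upper-bound saving of a little over half a percent.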
And (as we all know) if you need a "success of a thousand tweaks", you have to pick and choose which tweaks you attempt. One tweak that promises good effects but has high costs in complexity and maintainability (like, perhaps, the one we are talking about here), can have the economic effect (a la Bastiat "that which is not seen") of quashing five other tweaks that might require an equal amount of maintenance but provide better combined benefit. Also, an overly complex tweak can have the effect of muddying the code so that later tweaks (like Erik's Swiss army knife tactics) become more costly themselves or even impossible. BTW, that 1.38% is a *lower* limit for nmethod overhead proper, because it doesn't count the effects of lost inlining, which are surely at least as large. The combined effects of lost inlining (whatever they are) are IMO the root reason why we keep piling on more and more inlining tactics, trying to make nmethods large and calls infrequent. HTH! > On Jul 1, 2021, at 6:49 AM, Andrew Haley wrote: > > On 7/1/21 11:00 AM, Andrew Haley wrote: > >> I am going to take a little time to quantify, hopefully in percentage >> terms, the cost of nmethod barriers, as much for my own education as >> anything else. I'll get back to y'all. > > I have numbers. > > Running javac (@java.base) on AArch64 we execute on average 657152838 > nmethod barriers. That number varies by as much as 2.5%, to be > expected given that compiler threads are racing with Java threads. > > perf stats are typically: > > 77232.039076 task-clock:u (msec) # 3.284 CPUs utilized > 933972 page-faults:u # 0.012 M/sec > 167237276242 cycles:u # 2.165 GHz > 239468489262 instructions:u # 1.43 insn per cycle > 1762368611 branch-misses:u > > 23.520042515 seconds time elapsed > > An nmethod barrier is 5 instructions, so the proportion of instructions > executed by nmethod barriers is > > (657152838.0*5)/239468489262 = 1.38% > > That's quite a lot. I would have expected it to be noticeable.
It > might well be that the barrier instructions commonly are (fully or > partially) speculated in parallel on a big out-of-order machine so > don't show up on a wall clock, but they will show up in perf > stats. Also, that 1% is about the level of run-to-run variance even on > a quiet server, so it'd take many runs averaged out to see it. But the > effect is real; unless I have messed up my measurements or my > thinking, which happens. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://urldefense.com/v3/__https://keybase.io/andrewhaley__;!!ACWV5N9M2RV99hQ!d79BL2gH4Ia5-JuRczEGCL852C2FIKBiVsyvUa6wnQB_2CTVMxzM_KQ31rYrRuHG$ > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From kbarrett at openjdk.java.net Thu Jul 1 21:26:00 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 1 Jul 2021 21:26:00 GMT Subject: RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array In-Reply-To: <_2wLsWH3nda7_VjcP1R8HYf53avdsnYdeqaJeZZ0fbQ=.39912c26-11f7-447b-9fc1-bc7c37a96add@github.com> References: <_2wLsWH3nda7_VjcP1R8HYf53avdsnYdeqaJeZZ0fbQ=.39912c26-11f7-447b-9fc1-bc7c37a96add@github.com> Message-ID: On Thu, 1 Jul 2021 13:41:15 GMT, Zhengyu Gu wrote: > Open this PR to carry on the discussion started in jdk17 [https://github.com/openjdk/jdk17/pull/185](url) Thanks for adding the test. This looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4653 From kbarrett at openjdk.java.net Thu Jul 1 21:39:01 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 1 Jul 2021 21:39:01 GMT Subject: RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array In-Reply-To: <_2wLsWH3nda7_VjcP1R8HYf53avdsnYdeqaJeZZ0fbQ=.39912c26-11f7-447b-9fc1-bc7c37a96add@github.com> References: <_2wLsWH3nda7_VjcP1R8HYf53avdsnYdeqaJeZZ0fbQ=.39912c26-11f7-447b-9fc1-bc7c37a96add@github.com> Message-ID: <4msE7a0CpakhVGnBUMkqp_kVxxEOHJ48NTzxzKq4GiY=.3dc888ed-b16e-41c3-8051-4fd4c58fc3f3@github.com> On Thu, 1 Jul 2021 13:41:15 GMT, Zhengyu Gu wrote: > Open this PR to carry on the discussion started in jdk17 [https://github.com/openjdk/jdk17/pull/185](url) Following up on the discussion from https://github.com/openjdk/jdk17/pull/185 GetPrimitiveArrayCritical returns a void* which the caller is then expected to cast to the appropriate pointer to jXXX for read or write. If it's an objarray, it's not clear what type should be used for that cast; it's not jobject. JNI doesn't provide an appropriate type. In fact, you need to know whether UseCompressedOops is enabled or not to even begin to access values. Let's say you have side-channel knowledge of the value of UseCompressedOops. Then what? Writing has a good chance of leading to crashes with most (maybe all?) of our collectors. Even if you could somehow obtain a well-formed value, many collectors have some barrier protocol that's needed for writes. Reading and doing anything non-trivial with a value also has a good chance of leading to crashes with at least some GCs. I don't think changing it to return a copy helps much with any of that. There are a few corner cases that might be different, but not interestingly so. And JNI_COMMIT mode might have very bad results for at least some collectors.
So I'm going to claim there's no significant compatibility issue with code that was violating the spec, because I don't think such code could have been doing anything interesting or useful anyway. About all you can get from the contents is some raw bits that you can't do much with. ------------- PR: https://git.openjdk.java.net/jdk/pull/4653 From coleenp at openjdk.java.net Thu Jul 1 21:44:00 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 1 Jul 2021 21:44:00 GMT Subject: RFR: 8268364: jmethod clearing should be done during unloading [v2] In-Reply-To: References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> Message-ID: On Thu, 1 Jul 2021 15:50:26 GMT, Coleen Phillimore wrote: >> This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor. >> Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it. As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events. >> >> Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug. >> >> Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmen >> tationLimit=0 > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - ooops fixed typo > - Add a comment about is_loader_alive. 
Thanks Dan for the careful review and questions. ------------- PR: https://git.openjdk.java.net/jdk/pull/4643 From jwilhelm at openjdk.java.net Fri Jul 2 00:25:29 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Fri, 2 Jul 2021 00:25:29 GMT Subject: RFR: Merge jdk17 Message-ID: Forwardport JDK 17 -> JDK 18 ------------- Commit messages: - Merge - 8269745: [JVMCI] restore original qualified exports to Graal - 8268566: java/foreign/TestResourceScope.java timed out - 8260684: vmTestbase/gc/gctests/PhantomReference/phantom002/TestDescription.java timed out - 8269580: assert(is_valid()) failed: invalid register (-1) - 8269704: Typo in j.t.Normalizer.normalize() - 8269354: javac crashes when processing parenthesized pattern in instanceof - 8269285: Crash/miscompile in CallGenerator::for_method_handle_inline after JDK-8191998 - 8269088: C2 fails with assert(!n->is_Store() && !n->is_LoadStore()) failed: no node with a side effect - 8269230: C2: main loop in micro benchmark never executed - ... 
and 3 more: https://git.openjdk.java.net/jdk/compare/de61328d...5515a992 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=4661&range=00.0 - jdk17: https://webrevs.openjdk.java.net/?repo=jdk&pr=4661&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/4661/files Stats: 996 lines in 26 files changed: 843 ins; 64 del; 89 mod Patch: https://git.openjdk.java.net/jdk/pull/4661.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4661/head:pull/4661 PR: https://git.openjdk.java.net/jdk/pull/4661 From dholmes at openjdk.java.net Fri Jul 2 01:09:01 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 2 Jul 2021 01:09:01 GMT Subject: RFR: 8268364: jmethod clearing should be done during unloading [v2] In-Reply-To: References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> Message-ID: On Thu, 1 Jul 2021 15:50:26 GMT, Coleen Phillimore wrote: >> This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor. >> Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it. As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events. >> >> Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug. 
>> >> Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmen >> tationLimit=0 > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - ooops fixed typo > - Add a comment about is_loader_alive. Hi Coleen, So IIUC by clearing during unload, rather than in the destructor, it ensures that a jmethodID seen as valid by `Method::checked_resolve_jmethod_id` can't become invalid (due to a loader no longer being alive and being reclaimed) unless there is a safepoint (and assuming the JVM TI agent or application code, is not keeping the loader alive). So this fixes the situation in `jvmti_GetMethodDeclaringClass` as per the bug report, but in general we would have to be careful in the JVM TI implementation to avoid safepoints after validating the jmethodID. Is that correct? Thanks, David ------------- PR: https://git.openjdk.java.net/jdk/pull/4643 From jwilhelm at openjdk.java.net Fri Jul 2 01:12:34 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Fri, 2 Jul 2021 01:12:34 GMT Subject: RFR: Merge jdk17 [v2] In-Reply-To: References: Message-ID: > Forwardport JDK 17 -> JDK 18 Jesper Wilhelmsson has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 133 commits: - Merge - 8225559: assertion error at TransTypes.visitApply Reviewed-by: sadayapalam, jlahoda - 8268960: com/sun/net/httpserver/Headers.java: Ensure mutators normalize keys and disallow null for keys and values Reviewed-by: chegar, dfuchs, michaelm - 8267307: Introduce new client property for XAWT: xawt.mwm_decor_title Reviewed-by: azvegint, serb - 8133873: Simplify {Register,Unregister}NMethodOopClosure Reviewed-by: tschatzl, kbarrett - 8268298: jdk/jfr/api/consumer/log/TestVerbosity.java fails: unexpected log message Reviewed-by: egahlin - 8266746: C1: Replace UnsafeGetRaw with UnsafeGet when setting up OSR entry block Replace UnsafeGetRaw with UnsafeGetObject when setting up OSR entry block, and rename Unsafe{Get,Put}Object to Unsafe{Get,Put} Reviewed-by: thartmann, dlong, mdoerr - 8268870: Remove dead code in metaspaceShared Reviewed-by: tschatzl - Merge - 8268637: Update --release 17 symbol information for JDK 17 build 28 Reviewed-by: iris - ... and 123 more: https://git.openjdk.java.net/jdk/compare/a4d2a9a7...5515a992 ------------- Changes: https://git.openjdk.java.net/jdk/pull/4661/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4661&range=01 Stats: 32483 lines in 677 files changed: 18918 ins; 11377 del; 2188 mod Patch: https://git.openjdk.java.net/jdk/pull/4661.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4661/head:pull/4661 PR: https://git.openjdk.java.net/jdk/pull/4661 From jwilhelm at openjdk.java.net Fri Jul 2 01:12:36 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Fri, 2 Jul 2021 01:12:36 GMT Subject: Integrated: Merge jdk17 In-Reply-To: References: Message-ID: <9541FIty7ivYR6t4zu5TZr5R_tIsqCHUaP4Wmo7q1nM=.6563c5be-64bb-40aa-a0e4-4facdfb85b2c@github.com> On Fri, 2 Jul 2021 00:18:55 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 17 -> JDK 18 This pull request has now been integrated. 
Changeset: b0e18679 Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/b0e186792e816be30347dacfd88b8e55476584e7 Stats: 996 lines in 26 files changed: 843 ins; 64 del; 89 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/4661 From dholmes at openjdk.java.net Fri Jul 2 01:30:06 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 2 Jul 2021 01:30:06 GMT Subject: RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array In-Reply-To: <_2wLsWH3nda7_VjcP1R8HYf53avdsnYdeqaJeZZ0fbQ=.39912c26-11f7-447b-9fc1-bc7c37a96add@github.com> References: <_2wLsWH3nda7_VjcP1R8HYf53avdsnYdeqaJeZZ0fbQ=.39912c26-11f7-447b-9fc1-bc7c37a96add@github.com> Message-ID: On Thu, 1 Jul 2021 13:41:15 GMT, Zhengyu Gu wrote: > Open this PR to carry on the discussion started in jdk17 [https://github.com/openjdk/jdk17/pull/185](url) Hi Zhengyu, Please put the test under runtime/jni/checked. The actual code change is fine by me. I will discuss the compatibility issue elsewhere. Thanks, David test/hotspot/jtreg/runtime/jni/TestPrimitiveArrayCriticalWithBadParam/TestPrimitiveArrayCriticalWithBadParam.java line 29: > 27: * @summary -Xcheck:jni should catch wrong parameter passed to GetPrimitiveArrayCritical > 28: * @library /test/lib > 29: * @run main/othervm/native TestPrimitiveArrayCriticalWithBadParam othervm is not needed, but you will need to explicitly set the library path for the launched VM. ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4653 From dholmes at openjdk.java.net Fri Jul 2 01:48:00 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 2 Jul 2021 01:48:00 GMT Subject: RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array In-Reply-To: <_2wLsWH3nda7_VjcP1R8HYf53avdsnYdeqaJeZZ0fbQ=.39912c26-11f7-447b-9fc1-bc7c37a96add@github.com> References: <_2wLsWH3nda7_VjcP1R8HYf53avdsnYdeqaJeZZ0fbQ=.39912c26-11f7-447b-9fc1-bc7c37a96add@github.com> Message-ID: On Thu, 1 Jul 2021 13:41:15 GMT, Zhengyu Gu wrote: > Open this PR to carry on the discussion started in jdk17 [https://github.com/openjdk/jdk17/pull/185](url) A CSR request has been created and filled out. @zhengyu123 please mark it as Finalized. Thanks. Others can comment on the CSR request. I have already included a large portion of @kimbarrett 's comment above. ------------- PR: https://git.openjdk.java.net/jdk/pull/4653 From iklam at openjdk.java.net Fri Jul 2 06:07:43 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 2 Jul 2021 06:07:43 GMT Subject: RFR: 8269004 Implement ResizableResourceHashtable [v5] In-Reply-To: References: Message-ID: > In HotSpot we have (at least) two hashtable designs in the C++ code: > > - share/utilities/hashtable.hpp > - share/utilities/resourceHash.hpp > > Of the two, the `ResourceHashtable` API is much cleaner and most new code has been written with it. However, one issue is that the `SIZE` of `ResourceHashtable` is a compile-time constant. This makes the hash-to-index computation very fast on x64 (gcc can avoid using the slow divq instruction for modulo). However, the downside is we cannot use `ResourceHashtable` when we need a hashtable whose size is determined at run time (and, optionally, resizeable). > > This PR refactors `ResourceHashtable` into a base template class `ResourceHashtableBase`, whose `size()` function can be configured by a subclass to be either constant or runtime-configurable. 
> > Note: since we want to preserve the performance of `hash % SIZE`, we can't make `size()` a virtual function. > > Preliminary benchmark shows that this refactoring has no impact on the performance of the constant `ResourceHashtable`. See https://github.com/iklam/tools/tree/main/bench/resourceHash: > > *before* > ResourceHashtable: 2.70 sec > > *after* > ResourceHashtable: 2.72 sec > ResizableResourceHashtable: 5.29 sec > > To make sure `ResizableResourceHashtable` works, I rewrote some CDS code to use `ResizableResourceHashtable` instead of `KVHashtable` Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @kimbarrett comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4536/files - new: https://git.openjdk.java.net/jdk/pull/4536/files/543de2b7..66c6e381 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4536&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4536&range=03-04 Stats: 10 lines in 2 files changed: 3 ins; 4 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/4536.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4536/head:pull/4536 PR: https://git.openjdk.java.net/jdk/pull/4536 From iklam at openjdk.java.net Fri Jul 2 06:07:51 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 2 Jul 2021 06:07:51 GMT Subject: RFR: 8269004 Implement ResizableResourceHashtable [v4] In-Reply-To: References: <29ADEYUJ_QfBxtes5NSY8CwtQnw06eXQWJMAP0MdJ60=.cbdab722-ac5c-40c7-a481-dbad7bed54cd@github.com> Message-ID: On Wed, 30 Jun 2021 12:46:53 GMT, Kim Barrett wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @coleenp comments > > src/hotspot/share/utilities/resourceHash.hpp line 80: > >> 78: >> 79: Node const** lookup_node(unsigned hash, K const& key) const { >> 80: return const_cast( > > [pre-existing] I think this `const_cast` to add const is unnecessary. 
I tried removing it but gcc complains. I will leave it as is for now. > src/hotspot/share/utilities/resourceHash.hpp line 88: > >> 86: >> 87: public: >> 88: ResourceHashtableBase() : _number_of_entries(0) {} > > I'd prefer this explicitly initialize STORAGE, e.g. add `STORAGE()` to value-initialize rather than default-initialize it. (I think it doesn't currently make a difference, but I think being explicit is clearer.) Fixed. > src/hotspot/share/utilities/resourceHash.hpp line 90: > >> 88: ResourceHashtableBase() : _number_of_entries(0) {} >> 89: >> 90: ResourceHashtableBase(unsigned size) : STORAGE(size), _number_of_entries(0) {} > > These constructors seem like they should be non-public. Fixed. > src/hotspot/share/utilities/resourceHash.hpp line 92: > >> 90: ResourceHashtableBase(unsigned size) : STORAGE(size), _number_of_entries(0) {} >> 91: >> 92: ~ResourceHashtableBase() { > > Base class constructor should be non-public to avoid slicing. Also need to decide what to do about copying, either disallow or deep copy. Default shallow copy seems likely to be wrong. I made all the constructors and destructors protected. I also made the classes `NONCOPYABLE`. > src/hotspot/share/utilities/resourceHash.hpp line 222: > >> 220: protected: >> 221: FixedResourceHashtableStorage() { >> 222: memset(_table, 0, TABLE_SIZE * sizeof(Node*)); > > Instead of memset, consider `FixedResourceHashtableStorage() : _table{} {}`. Fixed. > src/hotspot/share/utilities/resourceHash.hpp line 224: > >> 222: memset(_table, 0, TABLE_SIZE * sizeof(Node*)); >> 223: } >> 224: > > Destructor should be non-public to prevent slicing. Also need to consider what to do about copying. This class doesn't have a destructor. Should I add one and make it protected? 
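The review points above (non-public constructors and destructor, no default shallow copy, `_table{}` instead of `memset`) can be illustrated with a minimal sketch. The names `StorageBase`, `FixedStorage` and `Node` are hypothetical stand-ins chosen for this example and are not the classes in `resourceHash.hpp`; the deleted copy operations only approximate what a `NONCOPYABLE`-style macro is assumed to do.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical node type for illustration only; not the HotSpot Node.
struct Node {
  int   key;
  Node* next;
};

// Base class intended to be used only through subclasses.
class StorageBase {
 protected:
  StorageBase() = default;
  // Non-public, non-virtual destructor: outside code cannot destroy
  // (or delete) an object through the base type.
  ~StorageBase() = default;

 public:
  // Disallow the default shallow copy.
  StorageBase(const StorageBase&) = delete;
  StorageBase& operator=(const StorageBase&) = delete;
};

template <unsigned TABLE_SIZE>
class FixedStorage : public StorageBase {
  Node* _table[TABLE_SIZE];

 public:
  // `_table{}` value-initializes every bucket to nullptr,
  // which is what the memset was doing by hand.
  FixedStorage() : _table{} {}

  unsigned size() const { return TABLE_SIZE; }
  Node* bucket(unsigned i) const { return _table[i]; }
};
```

With this shape, `FixedStorage<8> a; FixedStorage<8> b = a;` fails to compile, and so does declaring a `StorageBase` object directly.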
------------- PR: https://git.openjdk.java.net/jdk/pull/4536 From kbarrett at openjdk.java.net Fri Jul 2 07:49:04 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 2 Jul 2021 07:49:04 GMT Subject: RFR: 8269004 Implement ResizableResourceHashtable [v5] In-Reply-To: References: Message-ID: <19XEJpbcG5LRF-_tssdMnCxj6m9I9RWE2XGCTe-hguo=.44a6937e-a914-49eb-b308-5834c9e6cce4@github.com> On Fri, 2 Jul 2021 06:07:43 GMT, Ioi Lam wrote: >> In HotSpot we have (at least) two hashtable designs in the C++ code: >> >> - share/utilities/hashtable.hpp >> - share/utilities/resourceHash.hpp >> >> Of the two, the `ResourceHashtable` API is much cleaner and most new code has been written with it. However, one issue is that the `SIZE` of `ResourceHashtable` is a compile-time constant. This makes the hash-to-index computation very fast on x64 (gcc can avoid using the slow divq instruction for modulo). However, the downside is we cannot use `ResourceHashtable` when we need a hashtable whose size is determined at run time (and, optionally, resizeable). >> >> This PR refactors `ResourceHashtable` into a base template class `ResourceHashtableBase`, whose `size()` function can be configured by a subclass to be either constant or runtime-configurable. >> >> Note: since we want to preserve the performance of `hash % SIZE`, we can't make `size()` a virtual function. >> >> Preliminary benchmark shows that this refactoring has no impact on the performance of the constant `ResourceHashtable`. 
See https://github.com/iklam/tools/tree/main/bench/resourceHash: >> >> *before* >> ResourceHashtable: 2.70 sec >> >> *after* >> ResourceHashtable: 2.72 sec >> ResizableResourceHashtable: 5.29 sec >> >> To make sure `ResizableResourceHashtable` works, I rewrote some CDS code to use `ResizableResourceHashtable` instead of `KVHashtable` > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @kimbarrett comments The hashtable changes look good, with the one lingering nit about one of the destructors. I didn't review the usage changes carefully. It all looked okay to me, but that's not code I'm familiar with. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4536 From kbarrett at openjdk.java.net Fri Jul 2 07:49:05 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 2 Jul 2021 07:49:05 GMT Subject: RFR: 8269004 Implement ResizableResourceHashtable [v4] In-Reply-To: References: <29ADEYUJ_QfBxtes5NSY8CwtQnw06eXQWJMAP0MdJ60=.cbdab722-ac5c-40c7-a481-dbad7bed54cd@github.com> Message-ID: On Fri, 2 Jul 2021 06:03:29 GMT, Ioi Lam wrote: >> src/hotspot/share/utilities/resourceHash.hpp line 80: >> >>> 78: >>> 79: Node const** lookup_node(unsigned hash, K const& key) const { >>> 80: return const_cast( >> >> [pre-existing] I think this `const_cast` to add const is unnecessary. > > I tried removing it but gcc complains. I will leave it as is for now. Oops, I misread it. OK. >> src/hotspot/share/utilities/resourceHash.hpp line 224: >> >>> 222: memset(_table, 0, TABLE_SIZE * sizeof(Node*)); >>> 223: } >>> 224: >> >> Destructor should be non-public to prevent slicing. Also need to consider what to do about copying. > > This class doesn't have a destructor. Should I add one and make it protected? I would. Make it `= default`. 
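The design described in the quoted PR summary, a base template whose `size()` is supplied by a storage class and is deliberately not virtual, can be sketched roughly as follows. This is an illustration under assumptions: `FixedSize`, `RuntimeSize`, `TableBase` and `index_for` are hypothetical names, not the identifiers used in the patch.

```cpp
#include <cassert>

// Storage policy with a compile-time size: the compiler sees a constant
// divisor in `hash % SIZE` and can avoid a division instruction.
template <unsigned SIZE>
class FixedSize {
 public:
  unsigned size() const { return SIZE; }
};

// Storage policy whose size is chosen (and can be changed) at run time.
class RuntimeSize {
  unsigned _size;

 public:
  explicit RuntimeSize(unsigned size) : _size(size) {}
  unsigned size() const { return _size; }
  void resize(unsigned new_size) { _size = new_size; }
};

// The table inherits from its storage policy, so size() is bound at
// compile time and no virtual dispatch is involved.
template <typename STORAGE>
class TableBase : public STORAGE {
 public:
  TableBase() : STORAGE() {}
  explicit TableBase(unsigned size) : STORAGE(size) {}

  unsigned index_for(unsigned hash) const {
    return hash % STORAGE::size();
  }
};
```

`TableBase<FixedSize<256>>` keeps the constant-divisor fast path, while `TableBase<RuntimeSize>` pays for a real modulo in exchange for a size decided at run time, which is consistent in spirit with the benchmark gap quoted above.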
------------- PR: https://git.openjdk.java.net/jdk/pull/4536 From aph at redhat.com Fri Jul 2 08:43:05 2021 From: aph at redhat.com (Andrew Haley) Date: Fri, 2 Jul 2021 09:43:05 +0100 Subject: [External] : The cost of nmethod entry barriers [was: RFR: 8269476: Skip nmethod entry barrier if there is no oops in the jit code [v4]] In-Reply-To: <33BC1746-BAE1-4161-8499-92DFA2935792@oracle.com> References: <9FFE0079-6EC2-4C0B-B235-C9F9D2C9E90D@oracle.com> <8eb6de3a-44e6-dd80-c14f-b54b28a67c95@redhat.com> <61040ae8-bbbd-d10b-f579-e94eac74df0f@redhat.com> <33BC1746-BAE1-4161-8499-92DFA2935792@oracle.com> Message-ID: <10fbf542-f64f-ca1f-c04d-d98076f57e85@redhat.com> On 7/1/21 9:56 PM, John Rose wrote: > So for javac (which BTW has a lot of polymorphism in it) Ha! Good point, I hadn't thought of that. javac is my pet program for testing compiler tweaks because it's complex, does a lot of (re)compiling, requires no setting up or external dependencies, and it's there. > the dynamic proportion of retired nmethod entry > instructions is 1.38%. Unless those instructions are > unusually slow ones, that percentage is an upper limit > to their wall clock contribution. Probably, but one of them is a LoadLoad fence, whose cost is horribly difficult to quantize because that cost depends entirely on what is going on around the instruction. That is to say, fences can restrict speculation and speculation is what makes our fastest processors fast. However, I understand that some processors speculate past memory fence instructions, so that may not be such a big issue. > If we can get rid of 2 out of 5 of them, the estimate would be > 0.83%, with a savings of 0.55% (upper bound). Depending on the cost of that fence, yes. But, for avoidance of doubt, I'm not arguing for 8269476. My goal here is to provide some clarity about the costs of nmethod barriers, mostly because I've seen them dismissed as insignificant. They're not. That 1.38% figure surprised me. 
I hadn't expected it to be as insignificant as people made it out to be, but ... wow. That doesn't mean that nmethod barriers are bad, of course. > BTW, that 1.38% is a *lower* limit for nmethod overhead proper, > because it doesn't count the effects of lost inlining, which are > surely at least as large. The combined effects of lost inlining > (whatever they are) are IMO the root reason why we keep piling on > more and more inlining tactics, trying to make nmethods large and > calls infrequent. That's a good point too. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at openjdk.java.net Fri Jul 2 09:20:35 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 2 Jul 2021 09:20:35 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v6] In-Reply-To: References: Message-ID: <8wTsKKtuOS-OWOMgEvZD8Z1kql-ncYNA7-iYl11pDk8=.e870a1f2-58f6-407b-b9fb-391dffe0c6da@github.com> > Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy. > > For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough. > > For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises.
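The release/acquire pairing described in the summary above can be sketched with standard C++ atomics. This is only an illustration under assumptions: it uses `std::atomic` rather than HotSpot's `Atomic` API, a plain pointer field instead of an encoded mark word, and the names `Obj`, `forwardee` and `try_forward` are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object layout: a forwarding pointer plus some payload.
struct Obj {
  std::atomic<Obj*> fwd{nullptr};
  int payload = 0;
};

// Reader side: an acquire load, so that anything written to the copy
// before it was published is visible once the pointer is seen.
inline Obj* forwardee(const Obj* from) {
  return from->fwd.load(std::memory_order_acquire);
}

// Writer side: fill in the copy first, then publish it with a release
// CAS. No two-way fence is requested.
inline Obj* try_forward(Obj* from, Obj* copy) {
  copy->payload = from->payload;
  Obj* expected = nullptr;
  if (from->fwd.compare_exchange_strong(expected, copy,
                                        std::memory_order_release,
                                        std::memory_order_relaxed)) {
    return copy;  // this thread installed the forwardee
  }
  // Another thread won the race; re-read with acquire before using its copy.
  return forwardee(from);
}
```

The failure path re-reads through `forwardee()` so that a losing thread also gets acquire semantics before touching the winner's copy.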
> > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `tier1` with Shenandoah Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8261492-shenandoah-forwardee-memord - 8261492: Shenandoah: reconsider forwardee accesses memory ordering ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2496/files - new: https://git.openjdk.java.net/jdk/pull/2496/files/36e2da27..4953a6dd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=04-05 Stats: 20377 lines in 568 files changed: 8720 ins; 9594 del; 2063 mod Patch: https://git.openjdk.java.net/jdk/pull/2496.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2496/head:pull/2496 PR: https://git.openjdk.java.net/jdk/pull/2496 From whuang at openjdk.java.net Fri Jul 2 09:54:32 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Fri, 2 Jul 2021 09:54:32 GMT Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3] In-Reply-To: References: Message-ID: > Dear all, > Could you give me a favor to review this patch? It improves the performance of the intrinsic of `String.equals` on Neon backend of Aarch64. 
> We profile the performance by using this JMH case: > > > ```java > package com.huawei.string; > import java.util.*; > import java.util.concurrent.TimeUnit; > > import org.openjdk.jmh.annotations.CompilerControl; > import org.openjdk.jmh.annotations.Benchmark; > import org.openjdk.jmh.annotations.Level; > import org.openjdk.jmh.annotations.OutputTimeUnit; > import org.openjdk.jmh.annotations.Param; > import org.openjdk.jmh.annotations.Scope; > import org.openjdk.jmh.annotations.Setup; > import org.openjdk.jmh.annotations.State; > import org.openjdk.jmh.annotations.Fork; > import org.openjdk.jmh.infra.Blackhole; > > @State(Scope.Thread) > @OutputTimeUnit(TimeUnit.MILLISECONDS) > public class StringEqual { > @Param({"8", "64", "4096"}) > int size; > > String str1; > String str2; > > @Setup(Level.Trial) > public void init() { > str1 = newString(size, 'c', '1'); > str2 = newString(size, 'c', '2'); > } > > public String newString(int length, char charToFill, char lastChar) { > if (length > 0) { > char[] array = new char[length]; > Arrays.fill(array, charToFill); > array[length - 1] = lastChar; > return new String(array); > } > return ""; > } > > @Benchmark > @CompilerControl(CompilerControl.Mode.DONT_INLINE) > public boolean EqualString() { > return str1.equals(str2); > } > } > > ``` > The results are listed as follows (Linux aarch64 with 128 cores): > > Benchmark | (size) | Mode | Cnt | Score | Error | Units > ----------------------------------|-------|---------|-------|------------|------------|---------- > StringEqual.EqualString | 8 | thrpt | 10 | 123971.994 | ± 1462.131 | ops/ms > StringEqual.EqualString | 64 | thrpt | 10 | 56009.960 | ± 999.734 | ops/ms > StringEqual.EqualString | 4096 | thrpt | 10 | 1943.852 | ± 8.159 | ops/ms > StringEqual.EqualStringWithNEON | 8 | thrpt | 10 | 120319.271 | ± 1392.185 | ops/ms > StringEqual.EqualStringWithNEON | 64 | thrpt | 10 | 72914.767 | ± 1814.173 | ops/ms > StringEqual.EqualStringWithNEON | 4096 | thrpt | 10 | 2579.155 | ± 
15.589 | ops/ms > > Yours, > WANG Huang Wang Huang has updated the pull request incrementally with one additional commit since the last revision: unroll when small string sizes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4423/files - new: https://git.openjdk.java.net/jdk/pull/4423/files/4f02c00f..7957ee28 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4423&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4423&range=01-02 Stats: 161 lines in 3 files changed: 34 ins; 48 del; 79 mod Patch: https://git.openjdk.java.net/jdk/pull/4423.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4423/head:pull/4423 PR: https://git.openjdk.java.net/jdk/pull/4423 From whuang at openjdk.java.net Fri Jul 2 09:59:01 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Fri, 2 Jul 2021 09:59:01 GMT Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3] In-Reply-To: References: Message-ID: <0sQRhgsM2oRxLZwttYTCstpSGkO1kDeQAKTwsbVYsLA=.7eecf031-00e1-43ce-bc60-f6e0d2a6052e@github.com> On Fri, 2 Jul 2021 09:54:32 GMT, Wang Huang wrote: >> Dear all, >> Could you give me a favor to review this patch? It improves the performance of the intrinsic of `String.equals` on Neon backend of Aarch64. 
>> We profile the performance by using this JMH case: >> >> >> ```java >> package com.huawei.string; >> import java.util.*; >> import java.util.concurrent.TimeUnit; >> >> import org.openjdk.jmh.annotations.CompilerControl; >> import org.openjdk.jmh.annotations.Benchmark; >> import org.openjdk.jmh.annotations.Level; >> import org.openjdk.jmh.annotations.OutputTimeUnit; >> import org.openjdk.jmh.annotations.Param; >> import org.openjdk.jmh.annotations.Scope; >> import org.openjdk.jmh.annotations.Setup; >> import org.openjdk.jmh.annotations.State; >> import org.openjdk.jmh.annotations.Fork; >> import org.openjdk.jmh.infra.Blackhole; >> >> @State(Scope.Thread) >> @OutputTimeUnit(TimeUnit.MILLISECONDS) >> public class StringEqual { >> @Param({"8", "64", "4096"}) >> int size; >> >> String str1; >> String str2; >> >> @Setup(Level.Trial) >> public void init() { >> str1 = newString(size, 'c', '1'); >> str2 = newString(size, 'c', '2'); >> } >> >> public String newString(int length, char charToFill, char lastChar) { >> if (length > 0) { >> char[] array = new char[length]; >> Arrays.fill(array, charToFill); >> array[length - 1] = lastChar; >> return new String(array); >> } >> return ""; >> } >> >> @Benchmark >> @CompilerControl(CompilerControl.Mode.DONT_INLINE) >> public boolean EqualString() { >> return str1.equals(str2); >> } >> } >> >> ``` >> The results are listed as follows (Linux aarch64 with 128 cores): >> >> Benchmark | (size) | Mode | Cnt | Score | Error | Units >> ----------------------------------|-------|---------|-------|------------|------------|---------- >> StringEqual.EqualString | 8 | thrpt | 10 | 123971.994 | ± 1462.131 | ops/ms >> StringEqual.EqualString | 64 | thrpt | 10 | 56009.960 | ± 999.734 | ops/ms >> StringEqual.EqualString | 4096 | thrpt | 10 | 1943.852 | ± 8.159 | ops/ms >> StringEqual.EqualStringWithNEON | 8 | thrpt | 10 | 120319.271 | ± 1392.185 | ops/ms >> StringEqual.EqualStringWithNEON | 64 | thrpt | 10 | 72914.767 | ± 
1814.173 | ops/ms >> StringEqual.EqualStringWithNEON | 4096 | thrpt | 10 | 2579.155 | ± 15.589 | ops/ms >> >> Yours, >> WANG Huang > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > unroll when small string sizes > _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.java.net):_ > > I had to make some changes to the benchmark to get accurate timing, because > it is swamped by JMH overhead for very small strings. > > It should be clear from my patch what I did. The most important part is > to run the test code in a loop, or you won't see small effects. We're > trying to measure something that only takes a few nanoseconds. > > This is what I see, Apple M1, two equal strings: > > Old: > > StringEquals.equal 8 avgt 5 0.948 ± 0.001 us/op > StringEquals.equal 11 avgt 5 0.948 ± 0.004 us/op > StringEquals.equal 16 avgt 5 0.948 ± 0.001 us/op > StringEquals.equal 22 avgt 5 1.260 ± 0.002 us/op > StringEquals.equal 32 avgt 5 1.886 ± 0.001 us/op > StringEquals.equal 45 avgt 5 2.514 ± 0.001 us/op > StringEquals.equal 64 avgt 5 3.141 ± 0.003 us/op > StringEquals.equal 91 avgt 5 4.395 ± 0.002 us/op > StringEquals.equal 121 avgt 5 5.653 ± 0.014 us/op > StringEquals.equal 181 avgt 5 8.011 ± 0.010 us/op > StringEquals.equal 256 avgt 5 11.433 ± 0.014 us/op > StringEquals.equal 512 avgt 5 23.005 ± 0.124 us/op > StringEquals.equal 1024 avgt 5 49.185 ± 0.032 us/op > > Your patch: > > Benchmark (size) Mode Cnt Score Error Units > StringEquals.equal 8 avgt 5 1.574 ± 0.001 us/op > StringEquals.equal 11 avgt 5 1.734 ± 0.004 us/op > StringEquals.equal 16 avgt 5 1.888 ± 0.002 us/op > StringEquals.equal 22 avgt 5 1.892 ± 0.003 us/op > StringEquals.equal 32 avgt 5 2.517 ± 0.003 us/op > StringEquals.equal 45 avgt 5 2.988 ± 0.002 us/op > StringEquals.equal 64 avgt 5 2.517 ± 0.003 us/op > StringEquals.equal 91 avgt 5 8.659 ± 
0.007 us/op > StringEquals.equal 121 avgt 5 5.649 ? 0.007 us/op > StringEquals.equal 181 avgt 5 6.050 ? 0.009 us/op > StringEquals.equal 256 avgt 5 7.088 ? 0.016 us/op > StringEquals.equal 512 avgt 5 14.163 ? 0.018 us/op > StringEquals.equal 1024 avgt 5 29.998 ? 0.052 us/op > > As you can see, we're looking at regressions all the way up to size=45, > with something very odd happening at size=91. Finally the vectorized > code starts to pull ahead at size=181. > > A few things: > > You should never be executing the TAIL unless the string is really > short. Just do one pair of unaligned loads at the end to finish. > > Please don't use aliases for rscratch1 and rscratch2. Calling them tmp1 > and tmp2 doesn't help the reader. > > So: please make sure the smaller strings are at least as good as > they are now. Remember strings are usually short, so we can tolerate > no regressions with the smaller sizes. > > I don't think that Neon does any good here. This is what I get by rewriting > (just) the stub with scalar registers, in the attached patch: > > Benchmark (size) Mode Cnt Score Error Units > StringEquals.equal 8 avgt 5 1.574 ? 0.004 us/op > StringEquals.equal 11 avgt 5 1.734 ? 0.003 us/op > StringEquals.equal 16 avgt 5 1.888 ? 0.002 us/op > StringEquals.equal 22 avgt 5 1.891 ? 0.003 us/op > StringEquals.equal 32 avgt 5 2.517 ? 0.001 us/op > StringEquals.equal 45 avgt 5 2.988 ? 0.002 us/op > StringEquals.equal 64 avgt 5 2.595 ? 0.004 us/op > StringEquals.equal 91 avgt 5 4.083 ? 0.006 us/op > StringEquals.equal 121 avgt 5 5.432 ? 0.006 us/op > StringEquals.equal 181 avgt 5 6.292 ? 0.009 us/op > StringEquals.equal 256 avgt 5 7.232 ? 0.008 us/op > StringEquals.equal 512 avgt 5 13.304 ? 0.012 us/op > StringEquals.equal 1024 avgt 5 25.537 ? 0.012 us/op > > I use an editor with automatic indentation, as do many people, so > I inserted brackets in the right places in the assembly code. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. 
> https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: 8268229.patch > Type: text/x-patch > Size: 12464 bytes > Desc: not available > URL: @theRealAph Thank you for your suggestion. It's my fault that the JMH I used is not accurate. I changed my codes and re-tested under your JMH: Before opt? Benchmark |(size)| Mode| Cnt | Score| Error |Units -------------------|------|-----|-----|-------|---------|----- StringEquals.equal | 8| avgt| 5 | 2.334|? 0.012 |us/op StringEquals.equal | 11| avgt| 5 | 2.335|? 0.012 |us/op StringEquals.equal | 16| avgt| 5 | 2.334|? 0.011 |us/op StringEquals.equal | 22| avgt| 5 | 3.414|? 0.422 |us/op StringEquals.equal | 32| avgt| 5 | 3.890|? 0.004 |us/op StringEquals.equal | 45| avgt| 5 | 5.610|? 0.023 |us/op StringEquals.equal | 64| avgt| 5 | 7.215|? 0.009 |us/op StringEquals.equal | 91| avgt| 5 | 12.305|? 1.716 |us/op StringEquals.equal | 121| avgt| 5 | 14.891|? 0.085 |us/op StringEquals.equal | 181| avgt| 5 | 21.502|? 0.050 |us/op StringEquals.equal | 256| avgt| 5 | 29.968|? 0.155 |us/op StringEquals.equal | 512| avgt| 5 | 59.414|? 2.341 |us/op StringEquals.equal | 1024| avgt| 5 |118.365|? 20.794 |us/op After opt? Benchmark |(size)| Mode| Cnt | Score| Error| Units -------------------|------|-----|-----|------|-------|------ StringEquals.equal | 8| avgt| 5 | 2.333|? 0.003| us/op StringEquals.equal | 11| avgt| 5 | 2.333|? 0.001| us/op StringEquals.equal | 16| avgt| 5 | 2.332|? 0.002| us/op StringEquals.equal | 22| avgt| 5 | 3.265|? 0.404| us/op StringEquals.equal | 32| avgt| 5 | 3.875|? 0.002| us/op StringEquals.equal | 45| avgt| 5 | 5.793|? 0.331| us/op StringEquals.equal | 64| avgt| 5 | 6.730|? 0.054| us/op StringEquals.equal | 91| avgt| 5 | 8.611|? 0.075| us/op StringEquals.equal | 121| avgt| 5 |10.041|? 0.042| us/op StringEquals.equal | 181| avgt| 5 |13.968|? 0.653| us/op StringEquals.equal | 256| avgt| 5 |19.199|? 
1.227| us/op StringEquals.equal | 512| avgt| 5 |39.508|? 1.784| us/op StringEquals.equal | 1024| avgt| 5 |77.883|? 1.290| us/op ------------- PR: https://git.openjdk.java.net/jdk/pull/4423 From zgu at redhat.com Fri Jul 2 11:55:03 2021 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 2 Jul 2021 07:55:03 -0400 Subject: RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array In-Reply-To: References: <_2wLsWH3nda7_VjcP1R8HYf53avdsnYdeqaJeZZ0fbQ=.39912c26-11f7-447b-9fc1-bc7c37a96add@github.com> Message-ID: Hi David, Thanks for filing the CSR. > > A CSR request has been created and filled out. @zhengyu123 please mark it as Finalized. Thanks. Others can comment on the CSR request. I have already included a large portion of @kimbarrett 's comment above. > Done. -Zhengyu > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/4653 > From zgu at openjdk.java.net Fri Jul 2 12:41:41 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Fri, 2 Jul 2021 12:41:41 GMT Subject: RFR: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array [v2] In-Reply-To: <_2wLsWH3nda7_VjcP1R8HYf53avdsnYdeqaJeZZ0fbQ=.39912c26-11f7-447b-9fc1-bc7c37a96add@github.com> References: <_2wLsWH3nda7_VjcP1R8HYf53avdsnYdeqaJeZZ0fbQ=.39912c26-11f7-447b-9fc1-bc7c37a96add@github.com> Message-ID: > Open this PR to carry on the discussion started in jdk17 [https://github.com/openjdk/jdk17/pull/185](url) Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: fix and move test to new location ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4653/files - new: https://git.openjdk.java.net/jdk/pull/4653/files/d6842ff9..89de9a67 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4653&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4653&range=00-01 Stats: 4 lines in 2 files changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4653.diff Fetch: git fetch 
https://git.openjdk.java.net/jdk pull/4653/head:pull/4653 PR: https://git.openjdk.java.net/jdk/pull/4653 From coleen.phillimore at oracle.com Fri Jul 2 13:45:29 2021 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Fri, 2 Jul 2021 09:45:29 -0400 Subject: RFR: 8268364: jmethod clearing should be done during unloading [v2] In-Reply-To: References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com> Message-ID: On 7/1/21 9:09 PM, David Holmes wrote: > On Thu, 1 Jul 2021 15:50:26 GMT, Coleen Phillimore wrote: > >>> This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor. >>> Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it. As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events. >>> >>> Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug. >>> >>> Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmen >>> tationLimit=0 >> Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: >> >> - ooops fixed typo >> - Add a comment about is_loader_alive. 
> Hi Coleen,
>
> So IIUC by clearing during unload, rather than in the destructor, it ensures that a jmethodID seen as valid by `Method::checked_resolve_jmethod_id` can't become invalid (due to a loader no longer being alive and being reclaimed) unless there is a safepoint (and assuming the JVM TI agent or application code, is not keeping the loader alive). So this fixes the situation in `jvmti_GetMethodDeclaringClass` as per the bug report, but in general we would have to be careful in the JVM TI implementation to avoid safepoints after validating the jmethodID. Is that correct?

Yes, it is true. Once you have a Method*, you'd have to have a java.lang.Class object for the InstanceKlass on the stack (or a jobject to one) if you want to use the Method* later. This is okay for most JVMTI entries because there is already a jobject pointing to the mirror, and jvmti_GetMethodDeclaringClass was sort of the exception that didn't have the mirror until after the call.

In general if you don't want your Method* to go away, you also need to have a methodHandle so that redefinition doesn't clean it up if it's redefined, which is a much less frequent but bad situation.

Coleen

>
> Thanks,
> David
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/4643

From zgu at openjdk.java.net Fri Jul 2 14:25:16 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Fri, 2 Jul 2021 14:25:16 GMT
Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v6]
In-Reply-To: <8wTsKKtuOS-OWOMgEvZD8Z1kql-ncYNA7-iYl11pDk8=.e870a1f2-58f6-407b-b9fb-391dffe0c6da@github.com>
References: <8wTsKKtuOS-OWOMgEvZD8Z1kql-ncYNA7-iYl11pDk8=.e870a1f2-58f6-407b-b9fb-391dffe0c6da@github.com>
Message-ID:

On Fri, 2 Jul 2021 09:20:35 GMT, Aleksey Shipilev wrote:

>> Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy.
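The release/acquire pairing described in the quoted paragraph can be illustrated with a small sketch in standard C++. This is not the Shenandoah code: `Obj`, `try_forward` and `forwardee` are hypothetical names, the header word is a plain `std::atomic`, and the choice of `acq_rel` on a successful CAS (so that a losing thread may also read the winner's copy) is an assumption of the sketch rather than something taken from the patch.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for an object whose header word may hold a
// forwarding pointer; this is not HotSpot's oopDesc or markWord.
struct Obj {
  std::atomic<std::uintptr_t> mark{0};  // 0 means "not forwarded"
  int payload = 0;
};

// Try to install `copy` as the forwardee of `from`.
// The release half of a successful CAS publishes the writes that filled
// in `copy`; the acquire half (and the acquire failure order) lets the
// caller safely read whichever copy ends up installed.
// Returns the winning forwardee: ours, or a competitor's.
inline Obj* try_forward(Obj& from, Obj* copy) {
  std::uintptr_t expected = 0;
  const std::uintptr_t desired = reinterpret_cast<std::uintptr_t>(copy);
  if (from.mark.compare_exchange_strong(expected, desired,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire)) {
    return copy;
  }
  // Lost the race: `expected` now holds the other thread's forwardee.
  return reinterpret_cast<Obj*>(expected);
}

// Read the forwardee with acquire semantics, or nullptr if the object
// has not been forwarded yet.
inline Obj* forwardee(const Obj& from) {
  return reinterpret_cast<Obj*>(from.mark.load(std::memory_order_acquire));
}
```

A reader that obtains a non-null pointer from `forwardee` is then guaranteed to see the contents written to the copy before it was installed, which is the property the quoted text calls "acquiring" the object copy.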
>>
>> For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough.
>>
>> For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises.
>>
>> Additional testing:
>>  - [x] Linux x86_64 `hotspot_gc_shenandoah`
>>  - [x] Linux AArch64 `hotspot_gc_shenandoah`
>>  - [x] Linux AArch64 `tier1` with Shenandoah
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>
>  - Merge branch 'master' into JDK-8261492-shenandoah-forwardee-memord
>  - 8261492: Shenandoah: reconsider forwardee accesses memory ordering

Still good.

-------------

Marked as reviewed by zgu (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2496

From aph at openjdk.java.net Fri Jul 2 14:28:06 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Fri, 2 Jul 2021 14:28:06 GMT
Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3]
In-Reply-To:
References:
Message-ID: <1KtE4vY2m9EWyX1_ZGG0JpQzvGb_mteSxZR6ecwUoO8=.73b0f977-634f-457d-aca6-749ea64f76b9@github.com>

On Fri, 2 Jul 2021 09:54:32 GMT, Wang Huang wrote:

>> Dear all,
>> Could you give me a favor to review this patch? It improves the performance of the intrinsic of `String.equals` on Neon backend of Aarch64.
>> We profile the performance by using this JMH case:
>>
>> [...]
>>
>> Yours,
>> WANG Huang
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>
>   unroll when small string sizes

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4816:

> 4814:     cbnz(rscratch2, DONE);
> 4815:     b(SAME);
> 4816:

No, we are not going to do all this unrolling at the call site.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4670:

> 4668:     __ cbnz(rscratch1, NOT_EQUAL);
> 4669:     __ br(__ GE, LOOP);
> 4670:

As I said before, we gain nothing by using Neon here.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4423

From aph at openjdk.java.net Fri Jul 2 14:32:55 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Fri, 2 Jul 2021 14:32:55 GMT
Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 2 Jul 2021 09:54:32 GMT, Wang Huang wrote:

>> Dear all,
>> Could you give me a favor to review this patch? It improves the performance of the intrinsic of `String.equals` on Neon backend of Aarch64.
>> We profile the performance by using this JMH case:
>>
>> [...]
>>
>> Yours,
>> WANG Huang
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>
>   unroll when small string sizes

It would have been a good idea, to read what I wrote, and act on it. Or did you not see my patch attached to the email?

-------------

PR: https://git.openjdk.java.net/jdk/pull/4423

From aph at openjdk.java.net Fri Jul 2 14:32:56 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Fri, 2 Jul 2021 14:32:56 GMT
Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3]
In-Reply-To: <1KtE4vY2m9EWyX1_ZGG0JpQzvGb_mteSxZR6ecwUoO8=.73b0f977-634f-457d-aca6-749ea64f76b9@github.com>
References: <1KtE4vY2m9EWyX1_ZGG0JpQzvGb_mteSxZR6ecwUoO8=.73b0f977-634f-457d-aca6-749ea64f76b9@github.com>
Message-ID:

On Fri, 2 Jul 2021 14:25:07 GMT, Andrew Haley wrote:

>> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   unroll when small string sizes
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4670:
>
>> 4668:     __ cbnz(rscratch1, NOT_EQUAL);
>> 4669:     __ br(__ GE, LOOP);
>> 4670:
>
> As I said before, we gain nothing by using Neon here.
Much better:

+    __ ldp(r5, r6, Address(__ post(a1, wordSize * 2)));
+    __ ldp(rscratch1, rscratch2, Address(__ post(a2, wordSize * 2)));
+    __ cmp(r5, rscratch1);
+    __ ccmp(r6, rscratch2, 0, Assembler::EQ);
+    __ br(__ NE, NOT_EQUAL);

-------------

PR: https://git.openjdk.java.net/jdk/pull/4423

From coleenp at openjdk.java.net Fri Jul 2 18:07:58 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 2 Jul 2021 18:07:58 GMT
Subject: Integrated: 8268364: jmethod clearing should be done during unloading
In-Reply-To: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com>
References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com>
Message-ID: <7dbBx7ykvOdyP2nlBc3L6aiNVaGdBli9KqBD4yvKV-0=.c7703520-9e67-433e-bdd0-194fd6c8eeb8@github.com>

On Wed, 30 Jun 2021 22:48:03 GMT, Coleen Phillimore wrote:

> This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor.
> Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it. As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events.
>
> Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug.
>
> Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmentationLimit=0

This pull request has now been integrated.
Changeset: 3d84398d
Author:    Coleen Phillimore
URL:       https://git.openjdk.java.net/jdk/commit/3d84398d128bb2eed6280ebbc3f57afb3b89908f
Stats:     36 lines in 2 files changed: 20 ins; 13 del; 3 mod

8268364: jmethod clearing should be done during unloading

Reviewed-by: dcubed, eosterlund

-------------

PR: https://git.openjdk.java.net/jdk/pull/4643

From coleenp at openjdk.java.net Fri Jul 2 18:07:57 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 2 Jul 2021 18:07:57 GMT
Subject: RFR: 8268364: jmethod clearing should be done during unloading [v2]
In-Reply-To:
References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com>
Message-ID:

On Fri, 2 Jul 2021 01:06:01 GMT, David Holmes wrote:

>> Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - ooops fixed typo
>>  - Add a comment about is_loader_alive.
>
> Hi Coleen,
>
> So IIUC by clearing during unload, rather than in the destructor, it ensures that a jmethodID seen as valid by `Method::checked_resolve_jmethod_id` can't become invalid (due to a loader no longer being alive and being reclaimed) unless there is a safepoint (and assuming the JVM TI agent or application code, is not keeping the loader alive). So this fixes the situation in `jvmti_GetMethodDeclaringClass` as per the bug report, but in general we would have to be careful in the JVM TI implementation to avoid safepoints after validating the jmethodID. Is that correct?
>
> Thanks,
> David

I'm not sure why the bot didn't put my reply to @dholmes-ora in the PR.
For the record, all of the tier1-7 tests passed on the oracle supported platforms (linux-aarch64, linux-x64, windows-x64, macos-x64, and macos-aarch64 and their debug counterparts).
-------------

PR: https://git.openjdk.java.net/jdk/pull/4643

From dcubed at openjdk.java.net Fri Jul 2 18:30:55 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Fri, 2 Jul 2021 18:30:55 GMT
Subject: RFR: 8268364: jmethod clearing should be done during unloading [v2]
In-Reply-To:
References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com>
Message-ID:

On Fri, 2 Jul 2021 18:03:06 GMT, Coleen Phillimore wrote:

>> Hi Coleen,
>>
>> So IIUC by clearing during unload, rather than in the destructor, it ensures that a jmethodID seen as valid by `Method::checked_resolve_jmethod_id` can't become invalid (due to a loader no longer being alive and being reclaimed) unless there is a safepoint (and assuming the JVM TI agent or application code, is not keeping the loader alive). So this fixes the situation in `jvmti_GetMethodDeclaringClass` as per the bug report, but in general we would have to be careful in the JVM TI implementation to avoid safepoints after validating the jmethodID. Is that correct?
>>
>> Thanks,
>> David
>
> I'm not sure why the bot didn't put my reply to @dholmes-ora in the PR.
> For the record, all of the tier1-7 tests passed on the oracle supported platforms (linux-aarch64, linux-x64, windows-x64, macos-x64, and macos-aarch64 and their debug counterparts).

@coleenp - That was me that asked for the Tier[1-7]. Thanks for running that.
-------------

PR: https://git.openjdk.java.net/jdk/pull/4643

From coleenp at openjdk.java.net Fri Jul 2 19:30:57 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 2 Jul 2021 19:30:57 GMT
Subject: RFR: 8268364: jmethod clearing should be done during unloading [v2]
In-Reply-To:
References: <4rn1RSGefWZrUjBgkyJFeFe6hg05r3iVmQq-7PcRj1o=.ac03228b-1fea-459d-95eb-5cfde53f803a@github.com>
Message-ID: <-hnzOt3T8svhqeoM4sL8cnRlXqRJ3Q1Idt2ymkhu8h4=.05e90d0b-700c-4b54-ac65-06b49db94226@github.com>

On Thu, 1 Jul 2021 15:50:26 GMT, Coleen Phillimore wrote:

>> This patch moves the jmethod clearing to ClassLoaderData::unload() but also adds a check to Method::checked_resolved_jmethod_id() to handle the case where ZGC may be unloading a class but not have gotten to ClassLoaderData::unload() yet. JVMTI will read a NULL method for checked_resolved_jmethod_id() in this case, and not get a Method that will shortly, or has already been reclaimed in the Metaspace destructor.
>> Since I was there, I also added Method::is_valid_method() check to checked_resolve_jmethod_id. I don't think it's expensive anymore but it could be added under DEBUG. Either way method->method_holder()->is_loader_alive() will crash if !is_valid_method so we should leave it. As I wrote in the related issues, the bogus Method may have been because of a previous set of bugs with post_compiled_method_load events.
>>
>> Tested with tiers 1-6 on linux-x64-debug and 1-3 on windows-x64-debug.
>>
>> Also ran vmTestbase/nsk/{jdi,jvmti} tests with VM_OPTIONS=-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmentationLimit=0
>
> Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision:
>
>  - ooops fixed typo
>  - Add a comment about is_loader_alive.

Thanks for the suggestion, Dan!
-------------

PR: https://git.openjdk.java.net/jdk/pull/4643

From kvn at openjdk.java.net Fri Jul 2 19:44:02 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Fri, 2 Jul 2021 19:44:02 GMT
Subject: [jdk17] RFR: 8269825: [TESTBUG] Missing testing for x86 KNL platforms
Message-ID:

The Knights family of x86 Intel CPUs (KNL) does not support some AVX-512 features (AVX512VL/BW) and has other restrictions. We may not have such machines in our testing environment and may miss bugs, as JBS history shows (see the recent fixes for KNL).
On the other hand, we have some Windows VM instances which seem to emulate a KNL configuration and limit AVX-512 instructions on a CPU which supports the full set. Recent bug JDK-8269775 shows such an example.
I suggest adding a -XX:+UseKNLSetting x86 diagnostic flag to emulate KNL settings in the HotSpot VM to test such configurations.

-------------

Commit messages:
 - 8269825: [TESTBUG] Missing testing for x86 KNL platforms

Changes: https://git.openjdk.java.net/jdk17/pull/205/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk17&pr=205&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8269825
  Stats: 4 lines in 2 files changed: 3 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk17/pull/205.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/205/head:pull/205

PR: https://git.openjdk.java.net/jdk17/pull/205

From jwilhelm at openjdk.java.net Fri Jul 2 19:44:18 2021
From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson)
Date: Fri, 2 Jul 2021 19:44:18 GMT
Subject: RFR: Merge jdk17
Message-ID:

Forwardport JDK 17 -> JDK 18

-------------

Commit messages:
 - Merge
 - 8269768: JFR Terminology Refresh
 - 8269775: compiler/codegen/ClearArrayTest.java failed with "assert(false) failed: bad AD file"
 - 8269543: The warning for System::setSecurityManager should only appear once for each caller
 - 8262017: C2: assert(n != __null) failed: Bad immediate dominator info.
 - 8269771: assert(tmp == _callprojs.fallthrough_catchproj) failed: allocation control projection
 - 8265132: C2 compilation fails with assert "missing precedence edge"

The webrevs contain the adjustments done while merging with regards to each parent branch:
 - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=4673&range=00.0
 - jdk17: https://webrevs.openjdk.java.net/?repo=jdk&pr=4673&range=00.1

Changes: https://git.openjdk.java.net/jdk/pull/4673/files
  Stats: 341 lines in 12 files changed: 266 ins; 34 del; 41 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4673.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4673/head:pull/4673

PR: https://git.openjdk.java.net/jdk/pull/4673

From iklam at openjdk.java.net Fri Jul 2 20:23:37 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Fri, 2 Jul 2021 20:23:37 GMT
Subject: RFR: 8269004 Implement ResizableResourceHashtable [v6]
In-Reply-To:
References:
Message-ID: <4f1YCGl8EbHchOMfpV4R1QeMmej3KLa0AeRKEDLf_xM=.d38bc652-717e-472e-9cc3-61b25fecd391@github.com>

> In HotSpot we have (at least) two hashtable designs in the C++ code:
>
> - share/utilities/hashtable.hpp
> - share/utilities/resourceHash.hpp
>
> Of the two, the `ResourceHashtable` API is much cleaner and most new code has been written with it. However, one issue is that the `SIZE` of `ResourceHashtable` is a compile-time constant. This makes the hash-to-index computation very fast on x64 (gcc can avoid using the slow divq instruction for modulo). However, the downside is we cannot use `ResourceHashtable` when we need a hashtable whose size is determined at run time (and, optionally, resizeable).
>
> This PR refactors `ResourceHashtable` into a base template class `ResourceHashtableBase`, whose `size()` function can be configured by a subclass to be either constant or runtime-configurable.
>
> Note: since we want to preserve the performance of `hash % SIZE`, we can't make `size()` a virtual function.
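To make the point about a non-virtual `size()` concrete, here is a minimal sketch of one way such a split can be expressed in standard C++. It is an illustration only, not the code in the PR: `HashIndexer`, `FixedStorage`, `ResizableStorage` and `table_size()` are hypothetical names, and the sketch covers just the hash-to-index step, with no buckets or entries.

```cpp
#include <cstddef>

// Hypothetical sketch: the table size comes from a Storage base class, so
// the call to table_size() is resolved statically (no virtual dispatch).
template <typename Storage>
class HashIndexer : public Storage {
 public:
  using Storage::Storage;  // reuse the storage's constructors, if any

  // With FixedStorage the divisor is a compile-time constant, so the
  // compiler is free to replace the modulo with cheaper instructions.
  std::size_t index_for(unsigned hash) const {
    return hash % this->table_size();
  }
};

// Size fixed at compile time.
template <std::size_t SIZE>
class FixedStorage {
 public:
  constexpr std::size_t table_size() const { return SIZE; }
};

// Size chosen (and changeable) at run time; the modulo needs a real division.
class ResizableStorage {
 public:
  explicit ResizableStorage(std::size_t n) : _size(n) {}
  std::size_t table_size() const { return _size; }
  void resize(std::size_t n) { _size = n; }

 private:
  std::size_t _size;
};
```

For example, `HashIndexer<FixedStorage<256>>` computes `hash % 256` with a constant divisor, while `HashIndexer<ResizableStorage> t(100)` reads the divisor from a field and can later call `t.resize(...)`. Had `table_size()` been virtual, both variants would pay for an indirect call and lose the constant divisor.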
>
> Preliminary benchmark shows that this refactoring has no impact on the performance of the constant `ResourceHashtable`. See https://github.com/iklam/tools/tree/main/bench/resourceHash:
>
> *before*
> ResourceHashtable: 2.70 sec
>
> *after*
> ResourceHashtable: 2.72 sec
> ResizableResourceHashtable: 5.29 sec
>
> To make sure `ResizableResourceHashtable` works, I rewrote some CDS code to use `ResizableResourceHashtable` instead of `KVHashtable`

Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:

 - Merge branch 'master' into 8269004-resizeable-resource-hashtable
 - @kimbarrett review -- add ~FixedResourceHashtableStorage() = default;
 - @kimbarrett comments
 - @coleenp comments
 - @kimbarrett feedback to move the storage code to a base class
 - @coleenp comments
 - cleanup
 - step4 - implemented resizing
 - step3
 - step2
 - ... and 1 more: https://git.openjdk.java.net/jdk/compare/ee28753e...e41c7d22

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4536/files
  - new: https://git.openjdk.java.net/jdk/pull/4536/files/66c6e381..e41c7d22

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4536&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4536&range=04-05

  Stats: 44942 lines in 899 files changed: 23215 ins; 18555 del; 3172 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4536.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4536/head:pull/4536

PR: https://git.openjdk.java.net/jdk/pull/4536

From dlong at openjdk.java.net Fri Jul 2 20:40:53 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Fri, 2 Jul 2021 20:40:53 GMT
Subject: [jdk17] RFR: 8269825: [TESTBUG] Missing testing for x86 KNL platforms
In-Reply-To:
References:
Message-ID: <4uATHQKUM4LXeAWVeBG7T-icSBOiw7OoCRghjC316kg=.c92d41e2-27a6-496d-ae28-2870fcf8c8b1@github.com>

On Fri, 2 Jul 2021 19:35:56 GMT, Vladimir Kozlov wrote:

> The Knights family of x86 Intel CPUs (KNL) does not support some AVX-512 features (AVX512VL/BW) and has other restrictions. We may not have such machines in our testing environment and may miss bugs, as JBS history shows (see the recent fixes for KNL).
> On the other hand, we have some Windows VM instances which seem to emulate a KNL configuration and limit AVX-512 instructions on a CPU which supports the full set. Recent bug JDK-8269775 shows such an example.
> I suggest adding a -XX:+UseKNLSetting x86 diagnostic flag to emulate KNL settings in the HotSpot VM to test such configurations.

Marked as reviewed by dlong (Reviewer).
-------------

PR: https://git.openjdk.java.net/jdk17/pull/205

From coleenp at openjdk.java.net Fri Jul 2 20:43:02 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 2 Jul 2021 20:43:02 GMT
Subject: RFR: 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects
Message-ID: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com>

Replace '3' and LogMinObjAlignmentInBytes with LogBytesPerWord (3 for LP64 and 2 for 32 bits). Hoping git actions will test 32 bits.

Tested with tier1 on all Oracle platforms, but since I didn't change the values for our platforms, no failures expected.

-------------

Commit messages:
 - 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects

Changes: https://git.openjdk.java.net/jdk/pull/4675/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4675&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8267303
  Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4675.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4675/head:pull/4675

PR: https://git.openjdk.java.net/jdk/pull/4675

From jwilhelm at openjdk.java.net Fri Jul 2 20:53:56 2021
From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson)
Date: Fri, 2 Jul 2021 20:53:56 GMT
Subject: Integrated: Merge jdk17
In-Reply-To:
References:
Message-ID: <6-wiHfIsWRpKqKS94UW7KNPg_YlZwc4PtYiVRX9-P2w=.ee4fac91-806c-4f49-bcb4-24f194fccded@github.com>

On Fri, 2 Jul 2021 19:34:35 GMT, Jesper Wilhelmsson wrote:

> Forwardport JDK 17 -> JDK 18

This pull request has now been integrated.
Changeset: 17f53f2f Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/17f53f2f9c5928395eff9186160924e9a8e9a794 Stats: 341 lines in 12 files changed: 266 ins; 34 del; 41 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/4673 From kvn at openjdk.java.net Fri Jul 2 22:43:47 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 2 Jul 2021 22:43:47 GMT Subject: [jdk17] RFR: 8269825: [TESTBUG] Missing testing for x86 KNL platforms In-Reply-To: References: Message-ID: On Fri, 2 Jul 2021 19:35:56 GMT, Vladimir Kozlov wrote: > Knights family of X86 Intel CPU (KNL) does not support some of avx512 features (AVX512VL/BW) and have other restrictions. We may not have such kind of machines in our testing environment and may miss bugs as JBS history shows (look recent fixes for KNL). > On other hand we have some Windows VM instances which seem emulate KNL configuration and limit avx512 instructions on CPU which supports full set. Recent bug JDK-8269775 shows such example. > I suggest to add -XX:+UseKNLSetting x86 diagnostic flag to emulate KNL settings in HotSpot VM to test such configuration. Thank you, Dean. ------------- PR: https://git.openjdk.java.net/jdk17/pull/205 From kbarrett at openjdk.java.net Sun Jul 4 10:59:03 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sun, 4 Jul 2021 10:59:03 GMT Subject: [jdk17] RFR: 8269661: JNI_GetStringCritical does not lock char array Message-ID: Please review this fix to jni_GetStringCritical. When object pinning is being used it pins/unpins the String argument, but that's the wrong object. It should be pinning the associated byte array (when !latin1), as it is a reference into that array that will be returned and must be imobilized. Presently only Shenandoah is affected by this bug, as it is the only GC that uses object pinning (at the region level) rather than the GC locker. But both G1 and ZGC plan to use object pinning in the future. 
But with region pinning the problem is still going to be rare because the String and its value array are likely to be allocated in the same region. In addition, if pinning the String's value array, we also need to prevent string deduplication from changing the array while in the critical section. Otherwise, the unpin when the critical section is released will use the wrong object for the unpin operation. We accomplish this by marking the String as no longer subject to string deduplication. As that state is sticky, when pinning is used a String can't be deduplicated after being used in a critical section. We could add a critical section counter to String, but for now we're going to guess that isn't worth the effort. (String has at least 2 bytes available for the two string deduplication flags plus such a counter (more in some configurations, but always at least 2 at present)). As part of the restructuring to accomplish these changes, it proved to be easy to avoid using the GC locker (or pinning) for the latin1 case, where a copy of the value array will be used. Hence this change also addresses JDK-8269650. Testing: Existing tests for GetStringCritical are in various vmTestbase/gc/lock tests and vmTestbase/nsk/stress/jni/jnistress004.java. Ran mach5 tier1, tier5. Tier5 is where the vmTestbase/gc/lock and vmTestbase/nsk/stress/jni tests are run. Locally (linux-x64) ran those tests with Shenandoah, with and without string deduplication enabled. 
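
The interaction between pinning and string deduplication described above can be sketched as a toy model. This is only an illustration under stated assumptions: `ValueArray`, `ToyString`, `enter_critical`, `leave_critical` and `try_deduplicate` are hypothetical names, not HotSpot APIs, and the pin count stands in for whatever the collector really records.

```cpp
#include <cassert>

// Toy model only; none of these names are HotSpot APIs.
struct ValueArray {
  int pin_count = 0;  // > 0 means the collector must not move this array
};

struct ToyString {
  ValueArray* value = nullptr;
  bool dedup_forbidden = false;  // sticky once set, as in the description above
};

// Enter a critical section: forbid deduplication first, then pin the array
// that the returned pointer refers into (not the ToyString itself).
inline ValueArray* enter_critical(ToyString& s) {
  s.dedup_forbidden = true;
  s.value->pin_count++;
  return s.value;
}

// Leave the critical section: unpin whatever array the string refers to now.
// This is only correct because deduplication could not swap s.value meanwhile.
inline void leave_critical(ToyString& s) {
  assert(s.value->pin_count > 0);
  s.value->pin_count--;
}

// Deduplication replaces the string's array with a shared one unless forbidden.
inline bool try_deduplicate(ToyString& s, ValueArray* shared) {
  if (s.dedup_forbidden) return false;
  s.value = shared;
  return true;
}
```

In this model, a deduplication attempt between `enter_critical` and `leave_critical` is refused, so the unpin reaches the same array that was pinned. Without the sticky flag, `leave_critical` would decrement the count of the shared array and leave the original pinned.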
------------- Commit messages: - conditionally lock the string value array Changes: https://git.openjdk.java.net/jdk17/pull/209/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk17&pr=209&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8269661 Stats: 73 lines in 3 files changed: 60 ins; 10 del; 3 mod Patch: https://git.openjdk.java.net/jdk17/pull/209.diff Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/209/head:pull/209 PR: https://git.openjdk.java.net/jdk17/pull/209 From kbarrett at openjdk.java.net Sun Jul 4 22:41:51 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sun, 4 Jul 2021 22:41:51 GMT Subject: RFR: 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects In-Reply-To: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com> References: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com> Message-ID: On Fri, 2 Jul 2021 20:36:23 GMT, Coleen Phillimore wrote: > Replace '3' and LogMinObjAlignmentInBytes with LogBytesPerWord (3 for LP64 and 2 for 32 bits). > Hoping git actions will test 32 bits. > Tested with tier1 on all Oracle platforms, but since I didn't change the values for our platforms, no failures expected. Change looks good. Just a question in the comments. src/hotspot/share/oops/symbol.hpp line 158: > 156: static int max_length() { return max_symbol_length; } > 157: unsigned identity_hash() const { > 158: unsigned addr_bits = (unsigned)((uintptr_t)this >> (LogBytesPerWord + 3)); [pre-existing] I'm puzzled by the `+3` in this line. I'm guessing this is `log2(sizeof(Symbol))`? ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4675 From dholmes at openjdk.java.net Mon Jul 5 00:49:50 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 5 Jul 2021 00:49:50 GMT Subject: [jdk17] RFR: 8269661: JNI_GetStringCritical does not lock char array In-Reply-To: References: Message-ID: On Sun, 4 Jul 2021 04:55:56 GMT, Kim Barrett wrote: > Please review this fix to jni_GetStringCritical. When object pinning is > being used it pins/unpins the String argument, but that's the wrong object. > It should be pinning the associated byte array (when !latin1), as it is a > reference into that array that will be returned and must be imobilized. > > Presently only Shenandoah is affected by this bug, as it is the only GC that > uses object pinning (at the region level) rather than the GC locker. But > both G1 and ZGC plan to use object pinning in the future. But with region > pinning the problem is still going to be rare because the String and its > value array are likely to be allocated in the same region. > > In addition, if pinning the String's value array, we also need to prevent > string deduplication from changing the array while in the critical section. > Otherwise, the unpin when the critical section is released will use the > wrong object for the unpin operation. We accomplish this by marking the > String as no longer subject to string deduplication. As that state is > sticky, when pinning is used a String can't be deduplicated after being used > in a critical section. We could add a critical section counter to String, > but for now we're going to guess that isn't worth the effort. (String has > at least 2 bytes available for the two string deduplication flags plus such > a counter (more in some configurations, but always at least 2 at present)). > > As part of the restructuring to accomplish these changes, it proved to be > easy to avoid using the GC locker (or pinning) for the latin1 case, where a > copy of the value array will be used. 
Hence this change also addresses > JDK-8269650. > > Testing: > Existing tests for GetStringCritical are in various vmTestbase/gc/lock > tests and vmTestbase/nsk/stress/jni/jnistress004.java. > > Ran mach5 tier1, tier5. Tier5 is where the vmTestbase/gc/lock and > vmTestbase/nsk/stress/jni tests are run. > > Locally (linux-x64) ran those tests with Shenandoah, with and without > string deduplication enabled. Hi Kim, Based on your description and previous discussion this all looks good to me. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk17/pull/209 From dholmes at openjdk.java.net Mon Jul 5 01:54:53 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 5 Jul 2021 01:54:53 GMT Subject: RFR: 8244162: Additional opportunities to use NONCOPYABLE In-Reply-To: References: Message-ID: On Thu, 1 Jul 2021 13:06:55 GMT, Harold Seigel wrote: > Please review this small change to use NONCOPYABLE macro where applicable. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. > > Thanks, Harold Hi Harold, That all seems quite reasonable. One query below about private constructors. Thanks, David src/hotspot/share/classfile/stackMapTableFormat.hpp line 47: > 45: > 46: protected: > 47: // No constructors - should be 'private', but GCC issues a warning if it is Have we checked this is still the case? Might be a very old gcc issue. ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4652 From iklam at openjdk.java.net Mon Jul 5 02:34:04 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 5 Jul 2021 02:34:04 GMT Subject: Integrated: 8269004 Implement ResizableResourceHashtable In-Reply-To: References: Message-ID: On Mon, 21 Jun 2021 04:31:42 GMT, Ioi Lam wrote: > In HotSpot we have (at least) two hashtable designs in the C++ code: > > - share/utilities/hashtable.hpp > - share/utilities/resourceHash.hpp > > Of the two, the `ResourceHashtable` API is much cleaner and most new code has been written with it. However, one issue is that the `SIZE` of `ResourceHashtable` is a compile-time constant. This makes the hash-to-index computation very fast on x64 (gcc can avoid using the slow divq instruction for modulo). However, the downside is we cannot use `ResourceHashtable` when we need a hashtable whose size is determined at run time (and, optionally, resizeable). > > This PR refactors `ResourceHashtable` into a base template class `ResourceHashtableBase`, whose `size()` function can be configured by a subclass to be either constant or runtime-configurable. > > Note: since we want to preserve the performance of `hash % SIZE`, we can't make `size()` a virtual function. > > Preliminary benchmark shows that this refactoring has no impact on the performance of the constant `ResourceHashtable`. See https://github.com/iklam/tools/tree/main/bench/resourceHash: > > *before* > ResourceHashtable: 2.70 sec > > *after* > ResourceHashtable: 2.72 sec > ResizableResourceHashtable: 5.29 sec > > To make sure `ResizableResourceHashtable` works, I rewrote some CDS code to use `ResizableResourceHashtable` instead of `KVHashtable` This pull request has now been integrated. 
Changeset: 4da52eaf Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/4da52eaf53e531e96e1e6eac460d6209916d6f2f Stats: 272 lines in 7 files changed: 214 ins; 19 del; 39 mod 8269004: Implement ResizableResourceHashtable Reviewed-by: coleenp, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/4536 From iklam at openjdk.java.net Mon Jul 5 02:34:01 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 5 Jul 2021 02:34:01 GMT Subject: RFR: 8269004 Implement ResizableResourceHashtable [v4] In-Reply-To: References: <29ADEYUJ_QfBxtes5NSY8CwtQnw06eXQWJMAP0MdJ60=.cbdab722-ac5c-40c7-a481-dbad7bed54cd@github.com> Message-ID: <1hZ1ndt-FOYhkopuZrZv5xGU7Kly8I3PYwJtRHIpiO0=.4a3fb467-2dca-4507-9df0-ca67b0b8b8d5@github.com> On Tue, 29 Jun 2021 21:33:34 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @coleenp comments > > Marked as reviewed by coleenp (Reviewer). Thanks @coleenp and @kimbarrett for your review. ------------- PR: https://git.openjdk.java.net/jdk/pull/4536 From whuang at openjdk.java.net Mon Jul 5 06:40:56 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Mon, 5 Jul 2021 06:40:56 GMT Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3] In-Reply-To: <_lWD1fRe29XDPIa4-0Ft7OUNBsU0jWxEjJSoXiHlONI=.36acfd73-7093-4595-b6b1-c3f6836843bf@github.com> References: <1KtE4vY2m9EWyX1_ZGG0JpQzvGb_mteSxZR6ecwUoO8=.73b0f977-634f-457d-aca6-749ea64f76b9@github.com> <_lWD1fRe29XDPIa4-0Ft7OUNBsU0jWxEjJSoXiHlONI=.36acfd73-7093-4595-b6b1-c3f6836843bf@github.com> Message-ID: On Mon, 5 Jul 2021 06:37:47 GMT, Wang Huang wrote: >> * Why we unrolling the loop ? >> * It is found that these codes degrades performance >> ``` >> br(LE, B16); >> subs(cnt1, cnt1, wordSize); >> br(LE, B24); >> subs(cnt1, cnt1, wordSize); >> >> We unrolls the loop to remove these comparsion. > > * Why we unroll the loop ? 
> * It is found that these codes degrades performance > ``` > br(LE, B16); > subs(cnt1, cnt1, wordSize); > br(LE, B24); > subs(cnt1, cnt1, wordSize); > > We unroll the loop to remove these comparsion. * Why we unroll the loop ? * It is found that these codes degrades performance ``` br(LE, B16); subs(cnt1, cnt1, wordSize); br(LE, B24); subs(cnt1, cnt1, wordSize); We unroll the loop to remove these comparsion. ------------- PR: https://git.openjdk.java.net/jdk/pull/4423 From whuang at openjdk.java.net Mon Jul 5 06:40:56 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Mon, 5 Jul 2021 06:40:56 GMT Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3] In-Reply-To: References: <1KtE4vY2m9EWyX1_ZGG0JpQzvGb_mteSxZR6ecwUoO8=.73b0f977-634f-457d-aca6-749ea64f76b9@github.com> Message-ID: <_lWD1fRe29XDPIa4-0Ft7OUNBsU0jWxEjJSoXiHlONI=.36acfd73-7093-4595-b6b1-c3f6836843bf@github.com> On Mon, 5 Jul 2021 06:37:01 GMT, Wang Huang wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4816: >> >>> 4814: cbnz(rscratch2, DONE); >>> 4815: b(SAME); >>> 4816: >> >> No, we are not going to do all this unrolling at the call site. > > * Why we unrolling the loop ? > * It is found that these codes degrades performance > ``` > br(LE, B16); > subs(cnt1, cnt1, wordSize); > br(LE, B24); > subs(cnt1, cnt1, wordSize); > > We unrolls the loop to remove these comparsion. * Why we unroll the loop ? * It is found that these codes degrades performance ``` br(LE, B16); subs(cnt1, cnt1, wordSize); br(LE, B24); subs(cnt1, cnt1, wordSize); We unroll the loop to remove these comparsion. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4423 From whuang at openjdk.java.net Mon Jul 5 06:40:55 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Mon, 5 Jul 2021 06:40:55 GMT Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3] In-Reply-To: <1KtE4vY2m9EWyX1_ZGG0JpQzvGb_mteSxZR6ecwUoO8=.73b0f977-634f-457d-aca6-749ea64f76b9@github.com> References: <1KtE4vY2m9EWyX1_ZGG0JpQzvGb_mteSxZR6ecwUoO8=.73b0f977-634f-457d-aca6-749ea64f76b9@github.com> Message-ID: On Fri, 2 Jul 2021 14:24:33 GMT, Andrew Haley wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> unroll when small string sizes > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4816: > >> 4814: cbnz(rscratch2, DONE); >> 4815: b(SAME); >> 4816: > > No, we are not going to do all this unrolling at the call site. * Why we unrolling the loop ? * It is found that these codes degrades performance ``` br(LE, B16); subs(cnt1, cnt1, wordSize); br(LE, B24); subs(cnt1, cnt1, wordSize); We unrolls the loop to remove these comparsion. ------------- PR: https://git.openjdk.java.net/jdk/pull/4423 From whuang at openjdk.java.net Mon Jul 5 06:57:54 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Mon, 5 Jul 2021 06:57:54 GMT Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3] In-Reply-To: References: <1KtE4vY2m9EWyX1_ZGG0JpQzvGb_mteSxZR6ecwUoO8=.73b0f977-634f-457d-aca6-749ea64f76b9@github.com> Message-ID: On Fri, 2 Jul 2021 14:30:18 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4670: >> >>> 4668: __ cbnz(rscratch1, NOT_EQUAL); >>> 4669: __ br(__ GE, LOOP); >>> 4670: >> >> As I said before, we gain nothing by using Neon here. 
>
> Much better:
>
>
> +    __ ldp(r5, r6, Address(__ post(a1, wordSize * 2)));
> +    __ ldp(rscratch1, rscratch2, Address(__ post(a2, wordSize * 2)));
> +    __ cmp(r5, rscratch1);
> +    __ ccmp(r6, rscratch2, 0, Assembler::EQ);
> +    __ br(__ NE, NOT_EQUAL);

We changed `ld1` into `ldp` and get the result as following,

simple:

Benchmark | (size) | Mode | Cnt | Score | Error | Units
-------------------|------|-----|-----|-------|---------|-----
StringEquals.equal | 45 | avgt | 5 | 6.105 | ± 0.635 | us/op
StringEquals.equal | 64 | avgt | 5 | 7.226 | ± 0.056 | us/op
StringEquals.equal | 91 | avgt | 5 | 12.010 | ± 0.375 | us/op
StringEquals.equal | 121 | avgt | 5 | 14.772 | ± 0.114 | us/op
StringEquals.equal | 181 | avgt | 5 | 21.468 | ± 0.676 | us/op
StringEquals.equal | 256 | avgt | 5 | 28.942 | ± 4.806 | us/op
StringEquals.equal | 512 | avgt | 5 | 58.479 | ± 5.918 | us/op
StringEquals.equal | 1024 | avgt | 5 | 119.313 | ± 16.661 | us/op

ldp:

Benchmark | (size) | Mode | Cnt | Score | Error | Units
-------------------|------|-----|-----|-------|---------|-----
StringEquals.equal | 45 | avgt | 5 | 6.449 | ± 0.202 | us/op
StringEquals.equal | 64 | avgt | 5 | 7.367 | ± 0.055 | us/op
StringEquals.equal | 91 | avgt | 5 | 9.984 | ± 0.065 | us/op
StringEquals.equal | 121 | avgt | 5 | 12.540 | ± 0.545 | us/op
StringEquals.equal | 181 | avgt | 5 | 15.614 | ± 0.280 | us/op
StringEquals.equal | 256 | avgt | 5 | 19.346 | ± 0.243 | us/op
StringEquals.equal | 512 | avgt | 5 | 35.718 | ± 0.599 | us/op
StringEquals.equal | 1024 | avgt | 5 | 67.846 | ± 0.439 | us/op

neon:

Benchmark | (size) | Mode | Cnt | Score | Error | Units
-------------------|------|-----|-----|-------|---------|-----
StringEquals.equal | 45 | avgt | 5 | 5.883 | ± 0.173 | us/op
StringEquals.equal | 64 | avgt | 5 | 6.737 | ± 0.035 | us/op
StringEquals.equal | 91 | avgt | 5 | 8.997 | ± 0.215 | us/op
StringEquals.equal | 121 | avgt | 5 | 10.789 | ± 0.386 | us/op
StringEquals.equal | 181 | avgt | 5 | 14.063 | ± 0.253 | us/op
StringEquals.equal | 256 | avgt | 5 | 19.679 | ± 1.419 | us/op
StringEquals.equal | 512 | avgt | 5 | 38.813 | ± 1.378 | us/op
StringEquals.equal | 1024 | avgt | 5 | 77.769 | ± 3.082 | us/op

From the results, we can see that,
* for small size (45~181), the performance of `ldp` version is not as good as `neon/ld1` version
* for big size, `ldp` version is better than `neon/ld1` version
* all versions (both `ldp` and `ld1`) are better than old `simple` version.
* I agree with you `ldp` version is better than `ld1` version at **last patch** because I used

      __ ldr(v0, __ Q, Address(__ post(a1, wordSize * 2)));
      __ ldr(v1, __ Q, Address(__ post(a2, wordSize * 2)));

  at last patch. However, I use

      __ ld1(v0, v1, __ T2D, Address(__ post(a1, loopThreshold)));
      __ ld1(v2, v3, __ T2D, Address(__ post(a2, loopThreshold)));

  in recent patch. I think this change has fixed the problem here.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4423

From tschatzl at openjdk.java.net Mon Jul 5 08:06:51 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Mon, 5 Jul 2021 08:06:51 GMT
Subject: [jdk17] RFR: 8269661: JNI_GetStringCritical does not lock char array
In-Reply-To:
References:
Message-ID:

On Sun, 4 Jul 2021 04:55:56 GMT, Kim Barrett wrote:

> Please review this fix to jni_GetStringCritical. When object pinning is
> being used it pins/unpins the String argument, but that's the wrong object.
> It should be pinning the associated byte array (when !latin1), as it is a
> reference into that array that will be returned and must be imobilized.
>
> Presently only Shenandoah is affected by this bug, as it is the only GC that
> uses object pinning (at the region level) rather than the GC locker. But
> both G1 and ZGC plan to use object pinning in the future. But with region
> pinning the problem is still going to be rare because the String and its
> value array are likely to be allocated in the same region.
>
> In addition, if pinning the String's value array, we also need to prevent
> string deduplication from changing the array while in the critical section.
> Otherwise, the unpin when the critical section is released will use the > wrong object for the unpin operation. We accomplish this by marking the > String as no longer subject to string deduplication. As that state is > sticky, when pinning is used a String can't be deduplicated after being used > in a critical section. We could add a critical section counter to String, > but for now we're going to guess that isn't worth the effort. (String has > at least 2 bytes available for the two string deduplication flags plus such > a counter (more in some configurations, but always at least 2 at present)). > > As part of the restructuring to accomplish these changes, it proved to be > easy to avoid using the GC locker (or pinning) for the latin1 case, where a > copy of the value array will be used. Hence this change also addresses > JDK-8269650. > > Testing: > Existing tests for GetStringCritical are in various vmTestbase/gc/lock > tests and vmTestbase/nsk/stress/jni/jnistress004.java. > > Ran mach5 tier1, tier5. Tier5 is where the vmTestbase/gc/lock and > vmTestbase/nsk/stress/jni tests are run. > > Locally (linux-x64) ran those tests with Shenandoah, with and without > string deduplication enabled. Marked as reviewed by tschatzl (Reviewer). src/hotspot/share/prims/jni.cpp line 2894: > 2892: ret[s_len] = 0; > 2893: } > 2894: if (isCopy != NULL) *isCopy = JNI_TRUE; Pre-existing: This also returns `JNI_TRUE` for `isCopy` if the return value is `NULL`. It probably does not matter, but seems strange. Not sure if something should be done about this (separately). 
------------- PR: https://git.openjdk.java.net/jdk17/pull/209 From ngasson at openjdk.java.net Mon Jul 5 08:24:01 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 5 Jul 2021 08:24:01 GMT Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3] In-Reply-To: References: <1KtE4vY2m9EWyX1_ZGG0JpQzvGb_mteSxZR6ecwUoO8=.73b0f977-634f-457d-aca6-749ea64f76b9@github.com> <_lWD1fRe29XDPIa4-0Ft7OUNBsU0jWxEjJSoXiHlONI=.36acfd73-7093-4595-b6b1-c3f6836843bf@github.com> Message-ID: On Mon, 5 Jul 2021 06:38:00 GMT, Wang Huang wrote: >> * Why we unroll the loop ? >> * It is found that these codes degrades performance >> ``` >> br(LE, B16); >> subs(cnt1, cnt1, wordSize); >> br(LE, B24); >> subs(cnt1, cnt1, wordSize); >> >> We unroll the loop to remove these comparsion. > > * Why we unroll the loop ? > * It is found that these codes degrades performance > ``` > br(LE, B16); > subs(cnt1, cnt1, wordSize); > br(LE, B24); > subs(cnt1, cnt1, wordSize); > > We unroll the loop to remove these comparsion. But we need to balance performance against code size, given that this is expanded frequently. ------------- PR: https://git.openjdk.java.net/jdk/pull/4423 From ngasson at openjdk.java.net Mon Jul 5 08:24:00 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 5 Jul 2021 08:24:00 GMT Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3] In-Reply-To: References: Message-ID: <8_yMpXeVqT2wC_B0eWuS5s3XMQT-BySPtGJFxywQHR4=.dc55516e-5164-4be1-81f2-8aa7bd80ac1a@github.com> On Fri, 2 Jul 2021 09:54:32 GMT, Wang Huang wrote: >> Dear all, >> Could you give me a favor to review this patch? It improves the performance of the intrinsic of `String.equals` on Neon backend of Aarch64. 
>> We profile the performance by using this JMH case:
>>
>>
>> ```java
>> package com.huawei.string;
>> import java.util.*;
>> import java.util.concurrent.TimeUnit;
>>
>> import org.openjdk.jmh.annotations.CompilerControl;
>> import org.openjdk.jmh.annotations.Benchmark;
>> import org.openjdk.jmh.annotations.Level;
>> import org.openjdk.jmh.annotations.OutputTimeUnit;
>> import org.openjdk.jmh.annotations.Param;
>> import org.openjdk.jmh.annotations.Scope;
>> import org.openjdk.jmh.annotations.Setup;
>> import org.openjdk.jmh.annotations.State;
>> import org.openjdk.jmh.annotations.Fork;
>> import org.openjdk.jmh.infra.Blackhole;
>>
>> @State(Scope.Thread)
>> @OutputTimeUnit(TimeUnit.MILLISECONDS)
>> public class StringEqual {
>>     @Param({"8", "64", "4096"})
>>     int size;
>>
>>     String str1;
>>     String str2;
>>
>>     @Setup(Level.Trial)
>>     public void init() {
>>         str1 = newString(size, 'c', '1');
>>         str2 = newString(size, 'c', '2');
>>     }
>>
>>     public String newString(int length, char charToFill, char lastChar) {
>>         if (length > 0) {
>>             char[] array = new char[length];
>>             Arrays.fill(array, charToFill);
>>             array[length - 1] = lastChar;
>>             return new String(array);
>>         }
>>         return "";
>>     }
>>
>>     @Benchmark
>>     @CompilerControl(CompilerControl.Mode.DONT_INLINE)
>>     public boolean EqualString() {
>>         return str1.equals(str2);
>>     }
>> }
>>
>> ```
>> The result is list as following: (Linux aarch64 with 128 cores)
>>
>> Benchmark | (size) | Mode | Cnt | Score | Error | Units
>> ----------------------------------|-------|---------|-------|------------|------------|----------
>> StringEqual.EqualString | 8 | thrpt | 10 | 123971.994 | ± 1462.131 | ops/ms
>> StringEqual.EqualString | 64 | thrpt | 10 | 56009.960 | ± 999.734 | ops/ms
>> StringEqual.EqualString | 4096 | thrpt | 10 | 1943.852 | ± 8.159 | ops/ms
>> StringEqual.EqualStringWithNEON | 8 | thrpt | 10 | 120319.271 | ± 1392.185 | ops/ms
>> StringEqual.EqualStringWithNEON | 64 | thrpt | 10 | 72914.767 | ± 1814.173 | ops/ms
>> StringEqual.EqualStringWithNEON | 4096 | thrpt | 10 | 2579.155 | ± 15.589 | ops/ms
>>
>> Yours,
>> WANG Huang
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>
>   unroll when small string sizes

src/hotspot/cpu/aarch64/globals_aarch64.hpp line 96:

> 94:   product(bool, UseSimpleArrayEquals, false, \
> 95:           "Use simpliest and shortest implementation for array equals") \
> 96:   product(bool, UseSimpleStringEquals, true, \

Do we really need a user-facing toggle, especially a product one? Under what situations do we expect the user to change this? It's useful for comparison but if the new implementation is demonstrably better then we should just delete the old one.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4658:

> 4656:     // Main 32 byte comparison loop.
> 4657:     __ bind(LOOP);
> 4658:     __ ld1(v0, v1, __ T2D, Address(__ post(a1, loopThreshold)));

You need to reserve v0-3 as temporaries in the .ad file string_equals patterns otherwise we might be overwriting live values here.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4423

From tschatzl at openjdk.java.net Mon Jul 5 08:24:57 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Mon, 5 Jul 2021 08:24:57 GMT
Subject: RFR: 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects
In-Reply-To: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com>
References: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com>
Message-ID:

On Fri, 2 Jul 2021 20:36:23 GMT, Coleen Phillimore wrote:

> Replace '3' and LogMinObjAlignmentInBytes with LogBytesPerWord (3 for LP64 and 2 for 32 bits).
> Hoping git actions will test 32 bits.
> Tested with tier1 on all Oracle platforms, but since I didn't change the values for our platforms, no failures expected.

Same question about the additional +3, but looks good otherwise.
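
For readers following the "3 for LP64 and 2 for 32 bits" remark, a minimal standalone sketch of the arithmetic is given here. It assumes only that the word size equals the pointer size; `log2_exact`, `kLogBytesPerWord` and `address_bits` are hypothetical names for illustration, not the HotSpot definitions, and the purpose of the extra `+ 3` in `identity_hash()` remains the open question from the review.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for LogBytesPerWord: log2 of the pointer size.
constexpr int log2_exact(std::size_t n) {
  return n <= 1 ? 0 : 1 + log2_exact(n / 2);
}

constexpr int kLogBytesPerWord = log2_exact(sizeof(void*));

static_assert((std::size_t{1} << kLogBytesPerWord) == sizeof(void*),
              "pointer size is expected to be a power of two");

// Drops the low address bits that are always zero for word-aligned objects,
// plus `extra_shift` further bits (the "+ 3" being asked about).
inline unsigned address_bits(std::uintptr_t addr, int extra_shift) {
  return static_cast<unsigned>(addr >> (kLogBytesPerWord + extra_shift));
}
```

On an LP64 build `kLogBytesPerWord` evaluates to 3, and on a 32-bit build to 2, so replacing a hard-coded `3` with the constant changes nothing on 64-bit platforms.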
------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4675 From aph at openjdk.java.net Mon Jul 5 09:19:52 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 5 Jul 2021 09:19:52 GMT Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3] In-Reply-To: <8_yMpXeVqT2wC_B0eWuS5s3XMQT-BySPtGJFxywQHR4=.dc55516e-5164-4be1-81f2-8aa7bd80ac1a@github.com> References: <8_yMpXeVqT2wC_B0eWuS5s3XMQT-BySPtGJFxywQHR4=.dc55516e-5164-4be1-81f2-8aa7bd80ac1a@github.com> Message-ID: On Mon, 5 Jul 2021 08:18:24 GMT, Nick Gasson wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> unroll when small string sizes > > src/hotspot/cpu/aarch64/globals_aarch64.hpp line 96: > >> 94: product(bool, UseSimpleArrayEquals, false, \ >> 95: "Use simpliest and shortest implementation for array equals") \ >> 96: product(bool, UseSimpleStringEquals, true, \ > > Do we really need a user-facing toggle, especially a product one? Under what situations do we expect the user to change this? It's useful for comparison but if the new implementation if demonstrably better then we should just delete the old one. True. But we've a way to go; taking this out should be the last step. > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4658: > >> 4656: // Main 32 byte comparison loop. >> 4657: __ bind(LOOP); >> 4658: __ ld1(v0, v1, __ T2D, Address(__ post(a1, loopThreshold))); > > You need to reserve v0-3 as temporaries in the .ad file string_equals patterns otherwise we might be overwriting live values here. There's no point pursuing the use of vector registers any more. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4423 From aph at openjdk.java.net Mon Jul 5 09:23:49 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 5 Jul 2021 09:23:49 GMT Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3] In-Reply-To: References: <1KtE4vY2m9EWyX1_ZGG0JpQzvGb_mteSxZR6ecwUoO8=.73b0f977-634f-457d-aca6-749ea64f76b9@github.com> <_lWD1fRe29XDPIa4-0Ft7OUNBsU0jWxEjJSoXiHlONI=.36acfd73-7093-4595-b6b1-c3f6836843bf@github.com> Message-ID: On Mon, 5 Jul 2021 08:19:51 GMT, Nick Gasson wrote: >> * Why we unroll the loop ? >> * It is found that these codes degrades performance >> ``` >> br(LE, B16); >> subs(cnt1, cnt1, wordSize); >> br(LE, B24); >> subs(cnt1, cnt1, wordSize); >> >> We unroll the loop to remove these comparsion. > > But we need to balance performance against code size, given that this is expanded frequently. Nick is right. This unrolling at the call site is not going to be accepted. ------------- PR: https://git.openjdk.java.net/jdk/pull/4423 From aph at openjdk.java.net Mon Jul 5 11:33:55 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 5 Jul 2021 11:33:55 GMT Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3] In-Reply-To: References: Message-ID: On Fri, 2 Jul 2021 09:54:32 GMT, Wang Huang wrote: >> Dear all, >> Could you give me a favor to review this patch? It improves the performance of the intrinsic of `String.equals` on Neon backend of Aarch64. 
>> We profile the performance by using this JMH case:
>>
>>
>> ```java
>> package com.huawei.string;
>> import java.util.*;
>> import java.util.concurrent.TimeUnit;
>>
>> import org.openjdk.jmh.annotations.CompilerControl;
>> import org.openjdk.jmh.annotations.Benchmark;
>> import org.openjdk.jmh.annotations.Level;
>> import org.openjdk.jmh.annotations.OutputTimeUnit;
>> import org.openjdk.jmh.annotations.Param;
>> import org.openjdk.jmh.annotations.Scope;
>> import org.openjdk.jmh.annotations.Setup;
>> import org.openjdk.jmh.annotations.State;
>> import org.openjdk.jmh.annotations.Fork;
>> import org.openjdk.jmh.infra.Blackhole;
>>
>> @State(Scope.Thread)
>> @OutputTimeUnit(TimeUnit.MILLISECONDS)
>> public class StringEqual {
>>     @Param({"8", "64", "4096"})
>>     int size;
>>
>>     String str1;
>>     String str2;
>>
>>     @Setup(Level.Trial)
>>     public void init() {
>>         str1 = newString(size, 'c', '1');
>>         str2 = newString(size, 'c', '2');
>>     }
>>
>>     public String newString(int length, char charToFill, char lastChar) {
>>         if (length > 0) {
>>             char[] array = new char[length];
>>             Arrays.fill(array, charToFill);
>>             array[length - 1] = lastChar;
>>             return new String(array);
>>         }
>>         return "";
>>     }
>>
>>     @Benchmark
>>     @CompilerControl(CompilerControl.Mode.DONT_INLINE)
>>     public boolean EqualString() {
>>         return str1.equals(str2);
>>     }
>> }
>>
>> ```
>> The result is list as following: (Linux aarch64 with 128 cores)
>>
>> Benchmark | (size) | Mode | Cnt | Score | Error | Units
>> ----------------------------------|-------|---------|-------|------------|------------|----------
>> StringEqual.EqualString | 8 | thrpt | 10 | 123971.994 | ± 1462.131 | ops/ms
>> StringEqual.EqualString | 64 | thrpt | 10 | 56009.960 | ± 999.734 | ops/ms
>> StringEqual.EqualString | 4096 | thrpt | 10 | 1943.852 | ± 8.159 | ops/ms
>> StringEqual.EqualStringWithNEON | 8 | thrpt | 10 | 120319.271 | ± 1392.185 | ops/ms
>> StringEqual.EqualStringWithNEON | 64 | thrpt | 10 | 72914.767 | ± 1814.173 | ops/ms
>> StringEqual.EqualStringWithNEON | 4096 | thrpt | 10 | 2579.155 | ± 15.589 | ops/ms
>>
>> Yours,
>> WANG Huang
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>
>   unroll when small string sizes

Please bear in mind that String.equals() is typically used for short strings: identifiers, names, etc. The mean string length in most cases I've tried is around 18 characters. The use of String.equals() for long strings is unusual, and we should not burden typical usages with higher overheads for the sake of rare usages.

The current String.equals() is a compromise: it performs fairly well on the String instances we expect to see, but it is not highly optimized for very long strings. Whatever you do, any replacement must not be worse for small strings. That is to say, it must not use many more registers or significantly more (expanded inline) code space.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4423

From aph at openjdk.java.net Mon Jul 5 12:16:51 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 5 Jul 2021 12:16:51 GMT
Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 2 Jul 2021 09:54:32 GMT, Wang Huang wrote:

>> Dear all,
>> Could you give me a favor to review this patch? It improves the performance of the intrinsic of `String.equals` on Neon backend of Aarch64.
>> We profile the performance by using this JMH case: >> >> >> ```java >> package com.huawei.string; >> import java.util.*; >> import java.util.concurrent.TimeUnit; >> >> import org.openjdk.jmh.annotations.CompilerControl; >> import org.openjdk.jmh.annotations.Benchmark; >> import org.openjdk.jmh.annotations.Level; >> import org.openjdk.jmh.annotations.OutputTimeUnit; >> import org.openjdk.jmh.annotations.Param; >> import org.openjdk.jmh.annotations.Scope; >> import org.openjdk.jmh.annotations.Setup; >> import org.openjdk.jmh.annotations.State; >> import org.openjdk.jmh.annotations.Fork; >> import org.openjdk.jmh.infra.Blackhole; >> >> @State(Scope.Thread) >> @OutputTimeUnit(TimeUnit.MILLISECONDS) >> public class StringEqual { >> @Param({"8", "64", "4096"}) >> int size; >> >> String str1; >> String str2; >> >> @Setup(Level.Trial) >> public void init() { >> str1 = newString(size, 'c', '1'); >> str2 = newString(size, 'c', '2'); >> } >> >> public String newString(int length, char charToFill, char lastChar) { >> if (length > 0) { >> char[] array = new char[length]; >> Arrays.fill(array, charToFill); >> array[length - 1] = lastChar; >> return new String(array); >> } >> return ""; >> } >> >> @Benchmark >> @CompilerControl(CompilerControl.Mode.DONT_INLINE) >> public boolean EqualString() { >> return str1.equals(str2); >> } >> } >> >> ``` >> The result is list as following:?Linux aarch64 with 128cores? >> >> Benchmark | (size) | Mode | Cnt | Score | Error | Units >> ----------------------------------|-------|---------|-------|------------|------------|---------- >> StringEqual.EqualString | 8 | thrpt | 10 | 123971.994 | ? 1462.131 | ops/ms >> StringEqual.EqualString | 64 | thrpt | 10 | 56009.960 | ? 999.734 | ops/ms >> StringEqual.EqualString | 4096 | thrpt | 10 | 1943.852 | ? 8.159 | ops/ms >> StringEqual.EqualStringWithNEON | 8 | thrpt | 10 | 120319.271 | ? 1392.185 | ops/ms >> StringEqual.EqualStringWithNEON | 64 | thrpt | 10 | 72914.767 | ? 
1814.173 | ops/ms >> StringEqual.EqualStringWithNEON | 4096 | thrpt | 10 | 2579.155 | ? 15.589 | ops/ms >> >> Yours, >> WANG Huang > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > unroll when small string sizes There is one other thing I should mention, of which you may not be aware. Whenever you expand a macro inline, you reduce the opportunities for methods to be inlined. That's because if a method is bigger than (default) 2500 bytes, we do not inline it into other methods. Inlining is the most powerful optimization we have, but we need to prevent code size explosion. So not only does inlining code add pressure on the machine's icache, the HotSpot code cache, and so on, but it also prevents other optimizations. That's why we are very wary of increasing the size of String.equals to benefit unusual cases. ------------- PR: https://git.openjdk.java.net/jdk/pull/4423 From shade at openjdk.java.net Mon Jul 5 16:13:16 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 5 Jul 2021 16:13:16 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v7] In-Reply-To: References: Message-ID: > Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy. > > For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough. > > For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. 
It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises. > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `tier1` with Shenandoah Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: "acquire" is too slow on aarch64, and does not seem neccessary anyway ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2496/files - new: https://git.openjdk.java.net/jdk/pull/2496/files/4953a6dd..0e78dc98 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=05-06 Stats: 32 lines in 3 files changed: 13 ins; 7 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/2496.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2496/head:pull/2496 PR: https://git.openjdk.java.net/jdk/pull/2496 From shade at openjdk.java.net Mon Jul 5 16:23:18 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 5 Jul 2021 16:23:18 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v7] In-Reply-To: References: Message-ID: On Mon, 5 Jul 2021 16:13:16 GMT, Aleksey Shipilev wrote: >> Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy. >> >> For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough. 
The reader side is much more interesting, because we generally want "consume", but it is not available. We can do "acquire", but it regresses performance all too much. >> >> The close inspection of the code reveals we don't even need "consume", since we don't read the evacuated object contents in GC code, and mutator code goes through runtime interface which provides enough cushion for relaxed reads to work. This must explain why current weaker reader side was never seen to fail. >> >> The relaxation in `try_update_forwardee` improves concurrent evacuation quite visibly. See for example GC cycle times with SPECjvm2008, Compiler.sunflow on AArch64: >> >> Before: >> >> >> [info][gc,stats] Concurrent Evacuation = 3.421 s (a = 21247 us) (n = 161) >> [info][gc,stats] Concurrent Evacuation = 3.584 s (a = 21080 us) (n = 170) >> [info][gc,stats] Concurrent Evacuation = 3.226 s (a = 21088 us) (n = 153) >> [info][gc,stats] Concurrent Evacuation = 3.270 s (a = 20827 us) (n = 157) >> [info][gc,stats] Concurrent Evacuation = 3.339 s (a = 20742 us) (n = 161) >> >> >> After: >> >> [info][gc,stats] Concurrent Evacuation = 3.109 s (a = 18617 us) (n = 167) >> [info][gc,stats] Concurrent Evacuation = 3.027 s (a = 18918 us) (n = 160) >> [info][gc,stats] Concurrent Evacuation = 2.862 s (a = 17669 us) (n = 162) >> [info][gc,stats] Concurrent Evacuation = 2.858 s (a = 17425 us) (n = 164) >> [info][gc,stats] Concurrent Evacuation = 2.883 s (a = 17685 us) (n = 163) >> >> >> Additional testing: >> - [x] Linux x86_64 `hotspot_gc_shenandoah` >> - [x] Linux AArch64 `hotspot_gc_shenandoah` >> - [x] Linux x86_64 `tier1` with Shenandoah >> - [x] Linux AArch64 `tier1` with Shenandoah > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > "acquire" is too slow on aarch64, and does not seem neccessary anyway Performance tests on AArch64 reveal that doing "acquire" on the `get_forwardee` path is penalizing concurrent update references 
quite significantly. Therefore, I had to inspect the code and convince myself that current "relaxed" reads are actually fine. Therefore, we only need to relax the fwdptr store path with "release". See the PR body for sample performance data, and the code comments for more discussion. ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From shade at openjdk.java.net Mon Jul 5 16:23:17 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 5 Jul 2021 16:23:17 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v8] In-Reply-To: References: Message-ID: <_IqJ7u4Vk7jF8E--2RzWfdnxYXDQr86TIsxA7sh_3WI=.4d2c4cd9-63c8-4921-b5a1-e77d66c10325@github.com> > Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy. > > For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough. The reader side is much more interesting, because we generally want "consume", but it is not available. We can do "acquire", but it regresses performance all too much. > > The close inspection of the code reveals we don't even need "consume", since we don't read the evacuated object contents in GC code, and mutator code goes through runtime interface which provides enough cushion for relaxed reads to work. This must explain why current weaker reader side was never seen to fail. > > The relaxation in `try_update_forwardee` improves concurrent evacuation quite visibly. 
See for example GC cycle times with SPECjvm2008, Compiler.sunflow on AArch64: > > Before: > > > [info][gc,stats] Concurrent Evacuation = 3.421 s (a = 21247 us) (n = 161) > [info][gc,stats] Concurrent Evacuation = 3.584 s (a = 21080 us) (n = 170) > [info][gc,stats] Concurrent Evacuation = 3.226 s (a = 21088 us) (n = 153) > [info][gc,stats] Concurrent Evacuation = 3.270 s (a = 20827 us) (n = 157) > [info][gc,stats] Concurrent Evacuation = 3.339 s (a = 20742 us) (n = 161) > > > After: > > [info][gc,stats] Concurrent Evacuation = 3.109 s (a = 18617 us) (n = 167) > [info][gc,stats] Concurrent Evacuation = 3.027 s (a = 18918 us) (n = 160) > [info][gc,stats] Concurrent Evacuation = 2.862 s (a = 17669 us) (n = 162) > [info][gc,stats] Concurrent Evacuation = 2.858 s (a = 17425 us) (n = 164) > [info][gc,stats] Concurrent Evacuation = 2.883 s (a = 17685 us) (n = 163) > > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `hotspot_gc_shenandoah` > - [x] Linux x86_64 `tier1` with Shenandoah > - [x] Linux AArch64 `tier1` with Shenandoah Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Add TODO ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2496/files - new: https://git.openjdk.java.net/jdk/pull/2496/files/0e78dc98..954dfc19 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=06-07 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2496.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2496/head:pull/2496 PR: https://git.openjdk.java.net/jdk/pull/2496 From aph at openjdk.java.net Mon Jul 5 16:34:54 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 5 Jul 2021 16:34:54 GMT Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3] In-Reply-To: References: Message-ID: On Fri, 2 Jul 2021 
09:54:32 GMT, Wang Huang wrote: >> Dear all, >> Could you give me a favor to review this patch? It improves the performance of the intrinsic of `String.equals` on Neon backend of Aarch64. >> We profile the performance by using this JMH case: >> >> >> ```java >> package com.huawei.string; >> import java.util.*; >> import java.util.concurrent.TimeUnit; >> >> import org.openjdk.jmh.annotations.CompilerControl; >> import org.openjdk.jmh.annotations.Benchmark; >> import org.openjdk.jmh.annotations.Level; >> import org.openjdk.jmh.annotations.OutputTimeUnit; >> import org.openjdk.jmh.annotations.Param; >> import org.openjdk.jmh.annotations.Scope; >> import org.openjdk.jmh.annotations.Setup; >> import org.openjdk.jmh.annotations.State; >> import org.openjdk.jmh.annotations.Fork; >> import org.openjdk.jmh.infra.Blackhole; >> >> @State(Scope.Thread) >> @OutputTimeUnit(TimeUnit.MILLISECONDS) >> public class StringEqual { >> @Param({"8", "64", "4096"}) >> int size; >> >> String str1; >> String str2; >> >> @Setup(Level.Trial) >> public void init() { >> str1 = newString(size, 'c', '1'); >> str2 = newString(size, 'c', '2'); >> } >> >> public String newString(int length, char charToFill, char lastChar) { >> if (length > 0) { >> char[] array = new char[length]; >> Arrays.fill(array, charToFill); >> array[length - 1] = lastChar; >> return new String(array); >> } >> return ""; >> } >> >> @Benchmark >> @CompilerControl(CompilerControl.Mode.DONT_INLINE) >> public boolean EqualString() { >> return str1.equals(str2); >> } >> } >> >> ``` >> The result is list as following:?Linux aarch64 with 128cores? >> >> Benchmark | (size) | Mode | Cnt | Score | Error | Units >> ----------------------------------|-------|---------|-------|------------|------------|---------- >> StringEqual.EqualString | 8 | thrpt | 10 | 123971.994 | ? 1462.131 | ops/ms >> StringEqual.EqualString | 64 | thrpt | 10 | 56009.960 | ? 999.734 | ops/ms >> StringEqual.EqualString | 4096 | thrpt | 10 | 1943.852 | ? 
8.159 | ops/ms
>> StringEqual.EqualStringWithNEON | 8 | thrpt | 10 | 120319.271 | ± 1392.185 | ops/ms
>> StringEqual.EqualStringWithNEON | 64 | thrpt | 10 | 72914.767 | ± 1814.173 | ops/ms
>> StringEqual.EqualStringWithNEON | 4096 | thrpt | 10 | 2579.155 | ± 15.589 | ops/ms
>>
>> Yours,
>> WANG Huang
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>
>   unroll when small string sizes

Here are some Graviton 2 timings, five versions:

+UseSimpleStringEquals:

Benchmark           (size)  Mode  Cnt   Score   Error  Units
StringEquals.equal       1  avgt    5   2.813 ± 0.001  us/op
StringEquals.equal       3  avgt    5   2.821 ± 0.001  us/op
StringEquals.equal       4  avgt    5   2.812 ± 0.001  us/op
StringEquals.equal       6  avgt    5   2.821 ± 0.002  us/op
StringEquals.equal       8  avgt    5   2.420 ± 0.001  us/op
StringEquals.equal      11  avgt    5   2.420 ± 0.001  us/op
StringEquals.equal      16  avgt    5   2.421 ± 0.002  us/op
StringEquals.equal      21  avgt    5   3.291 ± 0.003  us/op
StringEquals.equal      32  avgt    5   4.412 ± 0.001  us/op
StringEquals.equal      45  avgt    5   5.623 ± 0.001  us/op
StringEquals.equal      64  avgt    5   7.225 ± 0.010  us/op
StringEquals.equal      91  avgt    5  10.426 ± 0.002  us/op
StringEquals.equal     128  avgt    5  13.628 ± 0.001  us/op
StringEquals.equal     181  avgt    5  19.231 ± 0.002  us/op
StringEquals.equal     256  avgt    5  26.436 ± 0.009  us/op

Your commit 4f02c00f55f1dc37762a04b7e30534ee27a7f20a:

Benchmark           (size)  Mode  Cnt   Score   Error  Units
StringEquals.equal       1  avgt    5   2.812 ± 0.001  us/op
StringEquals.equal       3  avgt    5   3.212 ± 0.001  us/op
StringEquals.equal       4  avgt    5   2.812 ± 0.001  us/op
StringEquals.equal       6  avgt    5   3.212 ± 0.001  us/op
StringEquals.equal       8  avgt    5   3.612 ± 0.001  us/op
StringEquals.equal      11  avgt    5   4.413 ± 0.001  us/op
StringEquals.equal      16  avgt    5   4.813 ± 0.001  us/op
StringEquals.equal      21  avgt    5   5.613 ± 0.001  us/op
StringEquals.equal      32  avgt    5   6.418 ± 0.001  us/op
StringEquals.equal      45  avgt    5   7.614 ± 0.001  us/op
StringEquals.equal      64  avgt    5   6.929 ± 0.081  us/op
StringEquals.equal      91  avgt    5   9.617 ± 0.001  us/op
StringEquals.equal     128  avgt    5  11.880 ± 0.152  us/op
StringEquals.equal     181  avgt    5  16.576 ± 0.002  us/op
StringEquals.equal     256  avgt    5  21.869 ± 0.108  us/op

My hack using ldp:

Benchmark           (size)  Mode  Cnt   Score   Error  Units
StringEquals.equal       1  avgt    5   2.414 ± 0.001  us/op
StringEquals.equal       3  avgt    5   2.814 ± 0.001  us/op
StringEquals.equal       4  avgt    5   2.414 ± 0.001  us/op
StringEquals.equal       6  avgt    5   2.814 ± 0.001  us/op
StringEquals.equal       8  avgt    5   3.214 ± 0.001  us/op
StringEquals.equal      11  avgt    5   4.015 ± 0.001  us/op
StringEquals.equal      16  avgt    5   4.419 ± 0.001  us/op
StringEquals.equal      21  avgt    5   5.216 ± 0.001  us/op
StringEquals.equal      32  avgt    5   6.017 ± 0.001  us/op
StringEquals.equal      45  avgt    5   7.218 ± 0.001  us/op
StringEquals.equal      64  avgt    5   6.015 ± 0.001  us/op
StringEquals.equal      91  avgt    5   8.967 ± 0.015  us/op
StringEquals.equal     128  avgt    5   9.217 ± 0.001  us/op
StringEquals.equal     181  avgt    5  14.096 ± 0.011  us/op
StringEquals.equal     256  avgt    5  15.462 ± 0.259  us/op

Today's -UseSimpleStringEquals:

Benchmark           (size)  Mode  Cnt   Score   Error  Units
StringEquals.equal       1  avgt    5   2.812 ± 0.001  us/op
StringEquals.equal       3  avgt    5   3.212 ± 0.001  us/op
StringEquals.equal       4  avgt    5   2.812 ± 0.001  us/op
StringEquals.equal       6  avgt    5   3.212 ± 0.001  us/op
StringEquals.equal       8  avgt    5   2.813 ± 0.002  us/op
StringEquals.equal      11  avgt    5   2.813 ± 0.001  us/op
StringEquals.equal      16  avgt    5   2.813 ± 0.001  us/op
StringEquals.equal      21  avgt    5   3.615 ± 0.001  us/op
StringEquals.equal      32  avgt    5   4.414 ± 0.001  us/op
StringEquals.equal      45  avgt    5   7.080 ± 0.027  us/op
StringEquals.equal      64  avgt    5   7.613 ± 0.001  us/op
StringEquals.equal      91  avgt    5  10.037 ± 0.005  us/op
StringEquals.equal     128  avgt    5  10.419 ± 0.001  us/op
StringEquals.equal     181  avgt    5  14.896 ± 0.004  us/op
StringEquals.equal     256  avgt    5  16.823 ± 0.001  us/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/4423

From aph at openjdk.java.net Mon Jul 5 16:50:50 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 5 Jul 2021 16:50:50 GMT
Subject: RFR: 8268229: Aarch64: Use Neon in intrinsics for String.equals [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 2 Jul 2021 09:54:32 GMT, Wang Huang wrote:

>> Dear all,
>> Could you give me a favor to review this patch? It improves the performance of the intrinsic of `String.equals` on Neon backend of Aarch64.
>> We profile the performance by using this JMH case:
>>
>>
>> ```java
>> package com.huawei.string;
>> import java.util.*;
>> import java.util.concurrent.TimeUnit;
>>
>> import org.openjdk.jmh.annotations.CompilerControl;
>> import org.openjdk.jmh.annotations.Benchmark;
>> import org.openjdk.jmh.annotations.Level;
>> import org.openjdk.jmh.annotations.OutputTimeUnit;
>> import org.openjdk.jmh.annotations.Param;
>> import org.openjdk.jmh.annotations.Scope;
>> import org.openjdk.jmh.annotations.Setup;
>> import org.openjdk.jmh.annotations.State;
>> import org.openjdk.jmh.annotations.Fork;
>> import org.openjdk.jmh.infra.Blackhole;
>>
>> @State(Scope.Thread)
>> @OutputTimeUnit(TimeUnit.MILLISECONDS)
>> public class StringEqual {
>>     @Param({"8", "64", "4096"})
>>     int size;
>>
>>     String str1;
>>     String str2;
>>
>>     @Setup(Level.Trial)
>>     public void init() {
>>         str1 = newString(size, 'c', '1');
>>         str2 = newString(size, 'c', '2');
>>     }
>>
>>     public String newString(int length, char charToFill, char lastChar) {
>>         if (length > 0) {
>>             char[] array = new char[length];
>>             Arrays.fill(array, charToFill);
>>             array[length - 1] = lastChar;
>>             return new String(array);
>>         }
>>         return "";
>>     }
>>
>>     @Benchmark
>>     @CompilerControl(CompilerControl.Mode.DONT_INLINE)
>>     public boolean EqualString() {
>>         return str1.equals(str2);
>>     }
>> }
>>
>> ```
>> The result is list as following: (Linux aarch64 with 128 cores)
>>
>> Benchmark | (size) | Mode | Cnt | Score | Error | Units
>> ----------------------------------|-------|---------|-------|------------|------------|----------
>> StringEqual.EqualString | 8 | thrpt | 10 | 123971.994 | ± 1462.131 | ops/ms
>> StringEqual.EqualString | 64 | thrpt | 10 | 56009.960 | ± 999.734 | ops/ms
>> StringEqual.EqualString | 4096 | thrpt | 10 | 1943.852 | ± 8.159 | ops/ms
>> StringEqual.EqualStringWithNEON | 8 | thrpt | 10 | 120319.271 | ± 1392.185 | ops/ms
>> StringEqual.EqualStringWithNEON | 64 | thrpt | 10 | 72914.767 | ± 1814.173 | ops/ms
>> StringEqual.EqualStringWithNEON | 4096 | thrpt | 10 | 2579.155 | ± 15.589 | ops/ms
>>
>> Yours,
>> WANG Huang
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>
>   unroll when small string sizes

I'm still seeing a slight advantage for `ldp` on Graviton 2:

Benchmark           (size)  Mode  Cnt   Score   Error  Units
StringEquals.equal     256  avgt    5  15.592 ± 0.080  us/op
StringEquals.equal     512  avgt    5  28.467 ± 0.245  us/op
StringEquals.equal    1024  avgt    5  53.883 ± 0.272  us/op

Versus the latest Neon version:

Benchmark           (size)  Mode  Cnt   Score   Error  Units
StringEquals.equal     256  avgt    5  16.848 ± 0.158  us/op
StringEquals.equal     512  avgt    5  29.640 ± 0.024  us/op
StringEquals.equal    1024  avgt    5  55.257 ± 0.050  us/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/4423

From dholmes at openjdk.java.net Mon Jul 5 23:15:02 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 5 Jul 2021 23:15:02 GMT
Subject: RFR: 8269882: stack-use-after-scope in NewObjectA
Message-ID:

Please review this trivial fix to add '&' so that a macro parameter is passed as a reference. See bug for details.
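
[Editorial sketch, not part of the original message.] For readers without the bug report at hand, the following is a minimal illustration of the kind of lifetime problem that adding '&' can avoid. It assumes, without confirmation from the message above, that the macro hands its argument to a helper that stores a reference, and that the argument is written as a cast. `ReturnMark` and `observe_final_value` are hypothetical names and are not HotSpot code.

```cpp
// Hypothetical stand-in for a helper object that remembers *where* a
// result lives so that it can inspect the final value later.
template <typename T>
class ReturnMark {
  const T& _ref;  // refers to the caller's variable, does not copy it
 public:
  explicit ReturnMark(const T& ref) : _ref(ref) {}
  const T& value() const { return _ref; }
};

int observe_final_value() {
  int result = 0;

  // Casting to a reference type keeps the binding on `result` itself.
  ReturnMark<int> mark((const int&)result);

  // By contrast, a cast to the non-reference type, `(const int)result`,
  // would create a temporary copy that is destroyed at the end of the
  // declaration above, leaving `_ref` dangling (a stack-use-after-scope).

  result = 42;  // the mark observes this update through the reference
  return mark.value();
}
```

With the reference cast, `observe_final_value()` reads the variable that was actually updated. The non-reference variant is shown only in a comment because executing it would be undefined behaviour.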
Thanks, David ------------- Commit messages: - 8269882: stack-use-after-scope in NewObjectA Changes: https://git.openjdk.java.net/jdk/pull/4683/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4683&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8269882 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4683.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4683/head:pull/4683 PR: https://git.openjdk.java.net/jdk/pull/4683 From jwilhelm at openjdk.java.net Mon Jul 5 23:32:18 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Mon, 5 Jul 2021 23:32:18 GMT Subject: RFR: Merge jdk17 Message-ID: <67qSGBCFyCiWEsK6JnokVIBYDWXAN7uHbnilBkWAANM=.900d6c9c-c2aa-4cdc-929e-577fa9de79db@github.com> Forwardport JDK 17 -> JDK 18 ------------- Commit messages: - Merge - 8266595: jdk/jfr/jcmd/TestJcmdDump.java with slowdebug bits fails with AttachNotSupportedException - 8269668: [aarch64] java.library.path not including /usr/lib64 - 8268775: Password is being converted to String in AccessibleJPasswordField The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=4684&range=00.0 - jdk17: https://webrevs.openjdk.java.net/?repo=jdk&pr=4684&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/4684/files Stats: 34 lines in 3 files changed: 12 ins; 8 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/4684.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4684/head:pull/4684 PR: https://git.openjdk.java.net/jdk/pull/4684 From jwilhelm at openjdk.java.net Tue Jul 6 00:16:59 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Tue, 6 Jul 2021 00:16:59 GMT Subject: RFR: Merge jdk17 [v2] In-Reply-To: <67qSGBCFyCiWEsK6JnokVIBYDWXAN7uHbnilBkWAANM=.900d6c9c-c2aa-4cdc-929e-577fa9de79db@github.com> References: 
<67qSGBCFyCiWEsK6JnokVIBYDWXAN7uHbnilBkWAANM=.900d6c9c-c2aa-4cdc-929e-577fa9de79db@github.com> Message-ID: <3y1H2jlOJ6k1kXnNJzwM7ypThkw5nLI9DSZfzq35Cz8=.d90637cf-ea69-4f04-9a59-29d4d9487c45@github.com> > Forwardport JDK 17 -> JDK 18 Jesper Wilhelmsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 153 commits: - Merge - 8233020: (fs) UnixFileSystemProvider should use StaticProperty.userDir(). Reviewed-by: alanb - 8269760: idea.sh should not invoke cygpath directly Reviewed-by: mcimadamore, erikj - 8269758: idea.sh doesn't work when there are multiple configurations available. Reviewed-by: mcimadamore, erikj - 8263389: IGV: Zooming changes the point that is currently centered Reviewed-by: rrich, neliasso - 8269700: source level for IntelliJ JDK project is set incorrectly Reviewed-by: mcimadamore - 8269124: Update java.time to use switch expressions (part II) Reviewed-by: dfuchs, vtewari, aefimov, iris, lancea, naoto - 8269821: Remove is-queue-active check in inner loop of write_ref_array_pre_work Reviewed-by: ayang, kbarrett - 8269004: Implement ResizableResourceHashtable Reviewed-by: coleenp, kbarrett - 8269652: Factor out the common code for creating system j.l.Thread objects Reviewed-by: coleenp, dcubed, kvn, xliu - ... 
and 143 more: https://git.openjdk.java.net/jdk/compare/5b8e1a26...b282c112 ------------- Changes: https://git.openjdk.java.net/jdk/pull/4684/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4684&range=01 Stats: 33778 lines in 717 files changed: 19338 ins; 11716 del; 2724 mod Patch: https://git.openjdk.java.net/jdk/pull/4684.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4684/head:pull/4684 PR: https://git.openjdk.java.net/jdk/pull/4684 From jwilhelm at openjdk.java.net Tue Jul 6 00:16:59 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Tue, 6 Jul 2021 00:16:59 GMT Subject: Integrated: Merge jdk17 In-Reply-To: <67qSGBCFyCiWEsK6JnokVIBYDWXAN7uHbnilBkWAANM=.900d6c9c-c2aa-4cdc-929e-577fa9de79db@github.com> References: <67qSGBCFyCiWEsK6JnokVIBYDWXAN7uHbnilBkWAANM=.900d6c9c-c2aa-4cdc-929e-577fa9de79db@github.com> Message-ID: On Mon, 5 Jul 2021 23:21:21 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 17 -> JDK 18 This pull request has now been integrated. Changeset: a18a1129 Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/a18a1129639a9650d9b6cea7f11dab9ce8d4cd59 Stats: 34 lines in 3 files changed: 12 ins; 8 del; 14 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/4684 From yyang at openjdk.java.net Tue Jul 6 02:27:53 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Tue, 6 Jul 2021 02:27:53 GMT Subject: RFR: 8268425: Show decimal nid of OSThread instead of hex format one [v5] In-Reply-To: References: <2XpHch1KL91iW9wQ9VdboCFdkyUxdCwCq_-Dad6zo4E=.b01db185-1596-4f0c-b1ee-2d125d50963c@github.com> Message-ID: On Wed, 30 Jun 2021 06:38:30 GMT, Yi Yang wrote: >> From users' perspective, we can find corresponding os thread via top directly, otherwise, we must convert hex format based nid to an integer, and find that thread via `top -pid `. This slightly facilitates our debugging process, but would obviously break some existing jstack analysis tool. 
>> >> Jstack Before: >> >> "ParGC Thread#7" os_prio=0 cpu=103260.18ms elapsed=5255043.58s tid=0x00007f967000b000 nid=0x12e67 runnable >> >> "ParGC Thread#8" os_prio=0 cpu=104818.76ms elapsed=5255043.58s tid=0x00007f967000c000 nid=0x12e68 runnable >> >> "ParGC Thread#9" os_prio=0 cpu=102164.69ms elapsed=5255043.58s tid=0x00007f967000e000 nid=0x12e69 runnable >> >> Jstack After: >> "G1 Conc#0" os_prio=0 cpu=0.03ms elapsed=1295.27s tid=0x00007f99dc096490 nid=117707 runnable >> >> "G1 Refine#0" os_prio=0 cpu=0.06ms elapsed=1295.22s tid=0x00007f99dc2cad20 nid=117708 runnable >> >> "G1 Service" os_prio=0 cpu=87.05ms elapsed=1295.22s tid=0x00007f99dc2cc140 nid=117709 runnable >> >> Top: >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 49083 tianxia+ 20 0 32.8g 594148 10796 S 103.3 0.1 0:10.05 java >> 71291 qingfen+ 20 0 39.3g 26.7g 18312 S 100.7 5.3 16861:35 jhsdb >> 50407 tianxia+ 20 0 32.5g 32796 9768 S 100.3 0.0 0:05.80 java >> 107429 maolian+ 20 0 11.4g 1.1g 10956 S 100.3 0.2 20173:52 java >> 99923 root 10 -10 288520 163228 5088 S 5.9 0.0 6463:53 AliYunDun > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > use \p{XDigit} @dholmes-ora David, can you please take a look at the latest versions. Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/4449 From dholmes at openjdk.java.net Tue Jul 6 03:11:57 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 6 Jul 2021 03:11:57 GMT Subject: RFR: 8268425: Show decimal nid of OSThread instead of hex format one [v5] In-Reply-To: References: <2XpHch1KL91iW9wQ9VdboCFdkyUxdCwCq_-Dad6zo4E=.b01db185-1596-4f0c-b1ee-2d125d50963c@github.com> Message-ID: On Wed, 30 Jun 2021 06:38:30 GMT, Yi Yang wrote: >> From users' perspective, we can find corresponding os thread via top directly, otherwise, we must convert hex format based nid to an integer, and find that thread via `top -pid `. 
This slightly facilitates our debugging process, but would obviously break some existing jstack analysis tool. >> >> Jstack Before: >> >> "ParGC Thread#7" os_prio=0 cpu=103260.18ms elapsed=5255043.58s tid=0x00007f967000b000 nid=0x12e67 runnable >> >> "ParGC Thread#8" os_prio=0 cpu=104818.76ms elapsed=5255043.58s tid=0x00007f967000c000 nid=0x12e68 runnable >> >> "ParGC Thread#9" os_prio=0 cpu=102164.69ms elapsed=5255043.58s tid=0x00007f967000e000 nid=0x12e69 runnable >> >> Jstack After: >> "G1 Conc#0" os_prio=0 cpu=0.03ms elapsed=1295.27s tid=0x00007f99dc096490 nid=117707 runnable >> >> "G1 Refine#0" os_prio=0 cpu=0.06ms elapsed=1295.22s tid=0x00007f99dc2cad20 nid=117708 runnable >> >> "G1 Service" os_prio=0 cpu=87.05ms elapsed=1295.22s tid=0x00007f99dc2cc140 nid=117709 runnable >> >> Top: >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 49083 tianxia+ 20 0 32.8g 594148 10796 S 103.3 0.1 0:10.05 java >> 71291 qingfen+ 20 0 39.3g 26.7g 18312 S 100.7 5.3 16861:35 jhsdb >> 50407 tianxia+ 20 0 32.5g 32796 9768 S 100.3 0.0 0:05.80 java >> 107429 maolian+ 20 0 11.4g 1.1g 10956 S 100.3 0.2 20173:52 java >> 99923 root 10 -10 288520 163228 5088 S 5.9 0.0 6463:53 AliYunDun > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > use \p{XDigit} src/hotspot/share/runtime/osThread.cpp line 41: > 39: // Printing > 40: void OSThread::print_on(outputStream *st) const { > 41: st->print("nid=" UINT64_FORMAT " ", (uint64_t)thread_id()); Why are you forcing this to be a 64-bit type? 
-------------

PR: https://git.openjdk.java.net/jdk/pull/4449

From yyang at openjdk.java.net Tue Jul 6 03:25:57 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Tue, 6 Jul 2021 03:25:57 GMT
Subject: RFR: 8268425: Show decimal nid of OSThread instead of hex format one [v5]
In-Reply-To:
References: <2XpHch1KL91iW9wQ9VdboCFdkyUxdCwCq_-Dad6zo4E=.b01db185-1596-4f0c-b1ee-2d125d50963c@github.com>
Message-ID:

On Tue, 6 Jul 2021 03:08:42 GMT, David Holmes wrote:

>> Yi Yang has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   use \p{XDigit}
>
> src/hotspot/share/runtime/osThread.cpp line 41:
>
>> 39: // Printing
>> 40: void OSThread::print_on(outputStream *st) const {
>> 41:   st->print("nid=" UINT64_FORMAT " ", (uint64_t)thread_id());
>
> Why are you forcing this to be a 64-bit type?

IMHO, I prefer using `%d` since a large portion of existing code uses `%d`. Thomas suggests using UINT64_FORMAT rather than `%d`:

> You'd do:
> print("nid: " UINT64_FORMAT, (uint64_t) id);
> thread_t is, among other things, pthread_t, which is opaque. Any current code treating that as signed int is incorrect too.

There is no uniform format for the formatted output of thread_id in hotspot. As far as I can see, `%ld`, `%d`, and `UINTX_FORMAT` are all used, so I want to leave the decision to the reviewers.
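
[Editorial sketch, not part of the original message.] The widen-then-print approach discussed above can be illustrated outside HotSpot with the standard `PRIu64` macro. This assumes that `UINT64_FORMAT` behaves like a `"%" PRIu64` style specifier, which the thread does not spell out. `format_nid` is a hypothetical helper name.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper, not HotSpot code: formats a native thread id in
// decimal. The caller widens the id to 64 bits first, so one format
// specifier works whatever the platform's underlying thread-id type is.
std::string format_nid(std::uint64_t nid) {
  char buf[32];  // "nid=" + at most 20 digits + space + NUL fits easily
  std::snprintf(buf, sizeof(buf), "nid=%" PRIu64 " ", nid);
  return std::string(buf);
}
```

For example, `format_nid(0x12e67)` produces `nid=77415 `, the decimal form of the `nid=0x12e67` shown in the "Jstack Before" output quoted earlier in this thread.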
------------- PR: https://git.openjdk.java.net/jdk/pull/4449 From david.holmes at oracle.com Tue Jul 6 03:31:26 2021 From: david.holmes at oracle.com (David Holmes) Date: Tue, 6 Jul 2021 13:31:26 +1000 Subject: RFR: 8268425: Show decimal nid of OSThread instead of hex format one [v5] In-Reply-To: References: <2XpHch1KL91iW9wQ9VdboCFdkyUxdCwCq_-Dad6zo4E=.b01db185-1596-4f0c-b1ee-2d125d50963c@github.com> Message-ID: <2928e4d8-8889-f289-5b15-3f434df3f742@oracle.com> On 6/07/2021 1:25 pm, Yi Yang wrote: > On Tue, 6 Jul 2021 03:08:42 GMT, David Holmes wrote: > >>> Yi Yang has updated the pull request incrementally with one additional commit since the last revision: >>> >>> use \p{XDigit} >> >> src/hotspot/share/runtime/osThread.cpp line 41: >> >>> 39: // Printing >>> 40: void OSThread::print_on(outputStream *st) const { >>> 41: st->print("nid=" UINT64_FORMAT " ", (uint64_t)thread_id()); >> >> Why are you forcing this to be a 64-bit type? > > IMHO, I prefer using `%d` since a large portion of existing code using `%d`. Thomas suggests using UINT64_FORMAT rather than `%d`: >> You'd do: > >> print("nid: " UINT64_FORMAT, (uint64_t) id):; > >> thread_t is, among other things, pthread_t, which is opaque. Any current code treating that as signed int is incorrect too. > > There is no uniform format for the formatted output of thread_id in hotspot. As far as I can see, `%ld` `%d` and `UINTX_FORMAT` are used, so I want to left the decision to reviewers. Okay. This is a mess but that's not your issue. At least a 64-bit decimal value won't show any leading zeroes so it doesn't really matter. If Thomas and Keven are happy with the latest changes then that is fine. 
Thanks, David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/4449 > From kbarrett at openjdk.java.net Tue Jul 6 07:19:51 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 6 Jul 2021 07:19:51 GMT Subject: RFR: 8244162: Additional opportunities to use NONCOPYABLE In-Reply-To: References: Message-ID: On Thu, 1 Jul 2021 13:06:55 GMT, Harold Seigel wrote: > Please review this small change to use NONCOPYABLE macro where applicable. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. > > Thanks, Harold Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.hpp line 85: > 83: Counters _discovered_count; > 84: Counters _enqueued_count; > 85: NONCOPYABLE(ShenandoahRefProcThreadLocal); I think this change should not be made. We Oracle folks don't make casual changes to Shenandoah, since we don't support or test it. The Shenandoah team is free to write it out rather than using the macro if they prefer. ------------- PR: https://git.openjdk.java.net/jdk/pull/4652 From kbarrett at openjdk.java.net Tue Jul 6 07:19:51 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 6 Jul 2021 07:19:51 GMT Subject: RFR: 8244162: Additional opportunities to use NONCOPYABLE In-Reply-To: References: Message-ID: On Mon, 5 Jul 2021 01:49:41 GMT, David Holmes wrote: >> Please review this small change to use NONCOPYABLE macro where applicable. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. >> >> Thanks, Harold > > src/hotspot/share/classfile/stackMapTableFormat.hpp line 47: > >> 45: >> 46: protected: >> 47: // No constructors - should be 'private', but GCC issues a warning if it is > > Have we checked this is still the case? Might be a very old gcc issue. I did a little bit of experimenting and couldn't get gcc to warn, though other compilers might do so. 
I think such a warning is not unreasonable. ------------- PR: https://git.openjdk.java.net/jdk/pull/4652 From kbarrett at openjdk.java.net Tue Jul 6 07:31:50 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 6 Jul 2021 07:31:50 GMT Subject: [jdk17] RFR: 8269661: JNI_GetStringCritical does not lock char array In-Reply-To: References: Message-ID: <-IHgdQYvstzvLQgl38hjMQBjsd72wPX__N5wcBD2CAU=.68e7d11d-4959-43e4-a84c-5f2c85b05f2c@github.com> On Mon, 5 Jul 2021 00:47:09 GMT, David Holmes wrote: >> Please review this fix to jni_GetStringCritical. When object pinning is >> being used it pins/unpins the String argument, but that's the wrong object. >> It should be pinning the associated byte array (when !latin1), as it is a >> reference into that array that will be returned and must be imobilized. >> >> Presently only Shenandoah is affected by this bug, as it is the only GC that >> uses object pinning (at the region level) rather than the GC locker. But >> both G1 and ZGC plan to use object pinning in the future. But with region >> pinning the problem is still going to be rare because the String and its >> value array are likely to be allocated in the same region. >> >> In addition, if pinning the String's value array, we also need to prevent >> string deduplication from changing the array while in the critical section. >> Otherwise, the unpin when the critical section is released will use the >> wrong object for the unpin operation. We accomplish this by marking the >> String as no longer subject to string deduplication. As that state is >> sticky, when pinning is used a String can't be deduplicated after being used >> in a critical section. We could add a critical section counter to String, >> but for now we're going to guess that isn't worth the effort. (String has >> at least 2 bytes available for the two string deduplication flags plus such >> a counter (more in some configurations, but always at least 2 at present)). 
>> >> As part of the restructuring to accomplish these changes, it proved to be >> easy to avoid using the GC locker (or pinning) for the latin1 case, where a >> copy of the value array will be used. Hence this change also addresses >> JDK-8269650. >> >> Testing: >> Existing tests for GetStringCritical are in various vmTestbase/gc/lock >> tests and vmTestbase/nsk/stress/jni/jnistress004.java. >> >> Ran mach5 tier1, tier5. Tier5 is where the vmTestbase/gc/lock and >> vmTestbase/nsk/stress/jni tests are run. >> >> Locally (linux-x64) ran those tests with Shenandoah, with and without >> string deduplication enabled. > > Hi Kim, > > Based on your description and previous discussion this all looks good to me. > > Thanks, > David Thanks for reviews @dholmes-ora and @tschatzl . ------------- PR: https://git.openjdk.java.net/jdk17/pull/209 From kbarrett at openjdk.java.net Tue Jul 6 07:31:51 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 6 Jul 2021 07:31:51 GMT Subject: [jdk17] RFR: 8269661: JNI_GetStringCritical does not lock char array In-Reply-To: References: Message-ID: On Mon, 5 Jul 2021 07:51:52 GMT, Thomas Schatzl wrote: >> Please review this fix to jni_GetStringCritical. When object pinning is >> being used it pins/unpins the String argument, but that's the wrong object. >> It should be pinning the associated byte array (when !latin1), as it is a >> reference into that array that will be returned and must be imobilized. >> >> Presently only Shenandoah is affected by this bug, as it is the only GC that >> uses object pinning (at the region level) rather than the GC locker. But >> both G1 and ZGC plan to use object pinning in the future. But with region >> pinning the problem is still going to be rare because the String and its >> value array are likely to be allocated in the same region. >> >> In addition, if pinning the String's value array, we also need to prevent >> string deduplication from changing the array while in the critical section. 
>> Otherwise, the unpin when the critical section is released will use the >> wrong object for the unpin operation. We accomplish this by marking the >> String as no longer subject to string deduplication. As that state is >> sticky, when pinning is used a String can't be deduplicated after being used >> in a critical section. We could add a critical section counter to String, >> but for now we're going to guess that isn't worth the effort. (String has >> at least 2 bytes available for the two string deduplication flags plus such >> a counter (more in some configurations, but always at least 2 at present)). >> >> As part of the restructuring to accomplish these changes, it proved to be >> easy to avoid using the GC locker (or pinning) for the latin1 case, where a >> copy of the value array will be used. Hence this change also addresses >> JDK-8269650. >> >> Testing: >> Existing tests for GetStringCritical are in various vmTestbase/gc/lock >> tests and vmTestbase/nsk/stress/jni/jnistress004.java. >> >> Ran mach5 tier1, tier5. Tier5 is where the vmTestbase/gc/lock and >> vmTestbase/nsk/stress/jni tests are run. >> >> Locally (linux-x64) ran those tests with Shenandoah, with and without >> string deduplication enabled. > > src/hotspot/share/prims/jni.cpp line 2894: > >> 2892: ret[s_len] = 0; >> 2893: } >> 2894: if (isCopy != NULL) *isCopy = JNI_TRUE; > > Pre-existing: This also returns `JNI_TRUE` for `isCopy` if the return value is `NULL`. It probably does not matter, but seems strange. Not sure if something should be done about this (separately). I noticed that, but decided against making any change to that behavior. I think it's arguable whether it's correct usage, but I can imagine a caller that used a JNI_FALSE isCopy result to indicate no copy was attempted so no OOME could have happened so the NULL check of the result isn't needed. The specification doesn't say what the isCopy value should be in the OOME case. 
------------- PR: https://git.openjdk.java.net/jdk17/pull/209 From shade at openjdk.java.net Tue Jul 6 07:36:50 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 6 Jul 2021 07:36:50 GMT Subject: RFR: 8244162: Additional opportunities to use NONCOPYABLE In-Reply-To: References: Message-ID: <5Fe2aqTCHFpotxcrCvIfH81EleWfUMY4_EYq50Lba7w=.5c5753e0-4359-4b72-a19b-af3fe6827050@github.com> On Tue, 6 Jul 2021 07:03:33 GMT, Kim Barrett wrote: >> Please review this small change to use NONCOPYABLE macro where applicable. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. >> >> Thanks, Harold > > src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.hpp line 85: > >> 83: Counters _discovered_count; >> 84: Counters _enqueued_count; >> 85: NONCOPYABLE(ShenandoahRefProcThreadLocal); > > I think this change should not be made. We Oracle folks don't make casual changes to Shenandoah, since we don't support or test it. The Shenandoah team is free to write it out rather than using the macro if they prefer. OTOH, it is fine to suggest minor/cosmetic changes to the entirety of the OpenJDK code, and ask others to review/test the bits that are not built/tested by the submitter. This change is fine (AFAICS, it is semantically the same as the old code), and it passes `hotspot_gc_shenandoah`. ------------- PR: https://git.openjdk.java.net/jdk/pull/4652 From shade at openjdk.java.net Tue Jul 6 07:36:49 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 6 Jul 2021 07:36:49 GMT Subject: RFR: 8244162: Additional opportunities to use NONCOPYABLE In-Reply-To: References: Message-ID: On Thu, 1 Jul 2021 13:06:55 GMT, Harold Seigel wrote: > Please review this small change to use NONCOPYABLE macro where applicable. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. > > Thanks, Harold Shenandoah bit looks good. 
------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4652 From kbarrett at openjdk.java.net Tue Jul 6 07:44:22 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 6 Jul 2021 07:44:22 GMT Subject: [jdk17] RFR: 8269661: JNI_GetStringCritical does not lock char array [v2] In-Reply-To: References: Message-ID: > Please review this fix to jni_GetStringCritical. When object pinning is > being used it pins/unpins the String argument, but that's the wrong object. > It should be pinning the associated byte array (when !latin1), as it is a > reference into that array that will be returned and must be imobilized. > > Presently only Shenandoah is affected by this bug, as it is the only GC that > uses object pinning (at the region level) rather than the GC locker. But > both G1 and ZGC plan to use object pinning in the future. But with region > pinning the problem is still going to be rare because the String and its > value array are likely to be allocated in the same region. > > In addition, if pinning the String's value array, we also need to prevent > string deduplication from changing the array while in the critical section. > Otherwise, the unpin when the critical section is released will use the > wrong object for the unpin operation. We accomplish this by marking the > String as no longer subject to string deduplication. As that state is > sticky, when pinning is used a String can't be deduplicated after being used > in a critical section. We could add a critical section counter to String, > but for now we're going to guess that isn't worth the effort. (String has > at least 2 bytes available for the two string deduplication flags plus such > a counter (more in some configurations, but always at least 2 at present)). > > As part of the restructuring to accomplish these changes, it proved to be > easy to avoid using the GC locker (or pinning) for the latin1 case, where a > copy of the value array will be used. 
Hence this change also addresses > JDK-8269650. > > Testing: > Existing tests for GetStringCritical are in various vmTestbase/gc/lock > tests and vmTestbase/nsk/stress/jni/jnistress004.java. > > Ran mach5 tier1, tier5. Tier5 is where the vmTestbase/gc/lock and > vmTestbase/nsk/stress/jni tests are run. > > Locally (linux-x64) ran those tests with Shenandoah, with and without > string deduplication enabled. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into getstringcritical - conditionally lock the string value array ------------- Changes: - all: https://git.openjdk.java.net/jdk17/pull/209/files - new: https://git.openjdk.java.net/jdk17/pull/209/files/ea8bd386..32bc98fc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk17&pr=209&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk17&pr=209&range=00-01 Stats: 254 lines in 7 files changed: 233 ins; 1 del; 20 mod Patch: https://git.openjdk.java.net/jdk17/pull/209.diff Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/209/head:pull/209 PR: https://git.openjdk.java.net/jdk17/pull/209 From kbarrett at openjdk.java.net Tue Jul 6 07:44:24 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 6 Jul 2021 07:44:24 GMT Subject: [jdk17] Integrated: 8269661: JNI_GetStringCritical does not lock char array In-Reply-To: References: Message-ID: On Sun, 4 Jul 2021 04:55:56 GMT, Kim Barrett wrote: > Please review this fix to jni_GetStringCritical. When object pinning is > being used it pins/unpins the String argument, but that's the wrong object. > It should be pinning the associated byte array (when !latin1), as it is a > reference into that array that will be returned and must be imobilized. 
> > Presently only Shenandoah is affected by this bug, as it is the only GC that > uses object pinning (at the region level) rather than the GC locker. But > both G1 and ZGC plan to use object pinning in the future. But with region > pinning the problem is still going to be rare because the String and its > value array are likely to be allocated in the same region. > > In addition, if pinning the String's value array, we also need to prevent > string deduplication from changing the array while in the critical section. > Otherwise, the unpin when the critical section is released will use the > wrong object for the unpin operation. We accomplish this by marking the > String as no longer subject to string deduplication. As that state is > sticky, when pinning is used a String can't be deduplicated after being used > in a critical section. We could add a critical section counter to String, > but for now we're going to guess that isn't worth the effort. (String has > at least 2 bytes available for the two string deduplication flags plus such > a counter (more in some configurations, but always at least 2 at present)). > > As part of the restructuring to accomplish these changes, it proved to be > easy to avoid using the GC locker (or pinning) for the latin1 case, where a > copy of the value array will be used. Hence this change also addresses > JDK-8269650. > > Testing: > Existing tests for GetStringCritical are in various vmTestbase/gc/lock > tests and vmTestbase/nsk/stress/jni/jnistress004.java. > > Ran mach5 tier1, tier5. Tier5 is where the vmTestbase/gc/lock and > vmTestbase/nsk/stress/jni tests are run. > > Locally (linux-x64) ran those tests with Shenandoah, with and without > string deduplication enabled. This pull request has now been integrated. 
Changeset: 0f4e07b7 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk17/commit/0f4e07b7d9190dd44b2fd65eff58fb6ec983a467 Stats: 73 lines in 3 files changed: 60 ins; 10 del; 3 mod 8269661: JNI_GetStringCritical does not lock char array 8269650: Optimize gc-locker in [Get|Release]StringCritical for latin string Reviewed-by: dholmes, tschatzl ------------- PR: https://git.openjdk.java.net/jdk17/pull/209 From kbarrett at openjdk.java.net Tue Jul 6 07:44:51 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 6 Jul 2021 07:44:51 GMT Subject: RFR: 8244162: Additional opportunities to use NONCOPYABLE In-Reply-To: References: Message-ID: On Thu, 1 Jul 2021 13:06:55 GMT, Harold Seigel wrote: > Please review this small change to use NONCOPYABLE macro where applicable. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. > > Thanks, Harold Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4652 From kbarrett at openjdk.java.net Tue Jul 6 07:44:52 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 6 Jul 2021 07:44:52 GMT Subject: RFR: 8244162: Additional opportunities to use NONCOPYABLE In-Reply-To: <5Fe2aqTCHFpotxcrCvIfH81EleWfUMY4_EYq50Lba7w=.5c5753e0-4359-4b72-a19b-af3fe6827050@github.com> References: <5Fe2aqTCHFpotxcrCvIfH81EleWfUMY4_EYq50Lba7w=.5c5753e0-4359-4b72-a19b-af3fe6827050@github.com> Message-ID: <3dqApek2b5eDgZqfVuWa26VMxAChgUMtVU4WmF33py8=.b6a00624-08eb-4b26-969f-9743ce893a1e@github.com> On Tue, 6 Jul 2021 07:32:07 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.hpp line 85: >> >>> 83: Counters _discovered_count; >>> 84: Counters _enqueued_count; >>> 85: NONCOPYABLE(ShenandoahRefProcThreadLocal); >> >> I think this change should not be made. We Oracle folks don't make casual changes to Shenandoah, since we don't support or test it. 
The Shenandoah team is free to write it out rather than using the macro if they prefer. > > OTOH, it is fine to suggest minor/cosmetic changes to the entirety of the OpenJDK code, and ask others to review/test the bits that are not built/tested by the submitter. > > This change is fine (AFAICS, it is semantically the same as the old code), and it passes `hotspot_gc_shenandoah`. Then I'll withdraw my objection. ------------- PR: https://git.openjdk.java.net/jdk/pull/4652 From jbhateja at openjdk.java.net Tue Jul 6 09:09:51 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 6 Jul 2021 09:09:51 GMT Subject: [jdk17] RFR: 8269825: [TESTBUG] Missing testing for x86 KNL platforms In-Reply-To: <4uATHQKUM4LXeAWVeBG7T-icSBOiw7OoCRghjC316kg=.c92d41e2-27a6-496d-ae28-2870fcf8c8b1@github.com> References: <4uATHQKUM4LXeAWVeBG7T-icSBOiw7OoCRghjC316kg=.c92d41e2-27a6-496d-ae28-2870fcf8c8b1@github.com> Message-ID: <58_N4Nv-idYtlSuMr9U3bSpy1CTd7Y3FpJHXCQXuxzg=.bb911554-6fdb-46f5-82b3-0868163aff3c@github.com> On Fri, 2 Jul 2021 20:37:24 GMT, Dean Long wrote: >> Knights family of X86 Intel CPU (KNL) does not support some of avx512 features (AVX512VL/BW) and have other restrictions. We may not have such kind of machines in our testing environment and may miss bugs as JBS history shows (look recent fixes for KNL). >> On other hand we have some Windows VM instances which seem emulate KNL configuration and limit avx512 instructions on CPU which supports full set. Recent bug JDK-8269775 shows such example. >> I suggest to add -XX:+UseKNLSetting x86 diagnostic flag to emulate KNL settings in HotSpot VM to test such configuration. > > Marked as reviewed by dlong (Reviewer). Hi @dean-long , @vnkozlov , Since the idea is to test KNL settings on any latest X86 Server platform we should also turn off following options. 
    if (is_intel()) { // Intel cpus specific settings
      if (is_knights_family()) {
        _features &= ~CPU_VZEROUPPER;
        _features &= ~CPU_AVX512BW;
        _features &= ~CPU_AVX512VL;
+       _features &= ~CPU_AVX512DQ;
+       _features &= ~CPU_AVX512_VNNI;
+       _features &= ~CPU_AVX512_VAES;
+       _features &= ~CPU_AVX512_VPOPCNTDQ;
+       _features &= ~CPU_AVX512_VPCLMULQDQ;
+       _features &= ~CPU_AVX512_VBMI;
+       _features &= ~CPU_AVX512_VBMI2;
      }
    }

-------------

PR: https://git.openjdk.java.net/jdk17/pull/205

From zgu at openjdk.java.net Tue Jul 6 12:29:01 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Tue, 6 Jul 2021 12:29:01 GMT
Subject: Integrated: 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array
In-Reply-To: <_2wLsWH3nda7_VjcP1R8HYf53avdsnYdeqaJeZZ0fbQ=.39912c26-11f7-447b-9fc1-bc7c37a96add@github.com>
References: <_2wLsWH3nda7_VjcP1R8HYf53avdsnYdeqaJeZZ0fbQ=.39912c26-11f7-447b-9fc1-bc7c37a96add@github.com>
Message-ID:

On Thu, 1 Jul 2021 13:41:15 GMT, Zhengyu Gu wrote:

> Open this PR to carry on the discussion started in jdk17 https://github.com/openjdk/jdk17/pull/185

This pull request has now been integrated.
Changeset: 16aa8cbf Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/16aa8cbf8d6c0b89cd88cbe4f39c2bb76968c06e Stats: 131 lines in 3 files changed: 124 ins; 5 del; 2 mod 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array Reviewed-by: kbarrett, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/4653 From kvn at openjdk.java.net Tue Jul 6 14:52:57 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 6 Jul 2021 14:52:57 GMT Subject: [jdk17] RFR: 8269825: [TESTBUG] Missing testing for x86 KNL platforms In-Reply-To: <58_N4Nv-idYtlSuMr9U3bSpy1CTd7Y3FpJHXCQXuxzg=.bb911554-6fdb-46f5-82b3-0868163aff3c@github.com> References: <4uATHQKUM4LXeAWVeBG7T-icSBOiw7OoCRghjC316kg=.c92d41e2-27a6-496d-ae28-2870fcf8c8b1@github.com> <58_N4Nv-idYtlSuMr9U3bSpy1CTd7Y3FpJHXCQXuxzg=.bb911554-6fdb-46f5-82b3-0868163aff3c@github.com> Message-ID: On Tue, 6 Jul 2021 09:06:05 GMT, Jatin Bhateja wrote: >> Marked as reviewed by dlong (Reviewer). > > Hi @dean-long , @vnkozlov , > Since the idea is to test KNL settings on any latest X86 Server platform we should also turn off following options. > > if (is_intel()) { // Intel cpus specific settings > if (is_knights_family()) { > _features &= ~CPU_VZEROUPPER; > _features &= ~CPU_AVX512BW; > _features &= ~CPU_AVX512VL; > + _features &= ~CPU_AVX512DQ; > + _features &= ~CPU_AVX512_VNNI; > + _features &= ~CPU_AVX512_VAES; > + _features &= ~CPU_AVX512_VPOPCNTDQ; > + _features &= ~CPU_AVX512_VPCLMULQDQ; > + _features &= ~CPU_AVX512_VBMI; > + _features &= ~CPU_AVX512_VBMI2; > } > } Thank you, @jatin-bhateja I think I should be more clear here. 
What I want to emulate is VM instance we use in our testing: KVM virtualization detected CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per core) family 6 model 85 stepping 4 microcode 0x1, cx8, cmov, fxsr, ht, mmx, 3dnowpref, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, lzcnt, tsc, avx, avx2, aes, erms, clmul, bmi1, bmi2, rtm, adx, avx512f, avx512dq, avx512cd, fma, clflush, clflushopt, clwb, hv And because current KNL flags setting is matching it I used it. I don't want to emulate exactly KNL CPU. ------------- PR: https://git.openjdk.java.net/jdk17/pull/205 From kvn at openjdk.java.net Tue Jul 6 15:04:49 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 6 Jul 2021 15:04:49 GMT Subject: [jdk17] RFR: 8269825: [TESTBUG] Missing testing for x86 KNL platforms In-Reply-To: <58_N4Nv-idYtlSuMr9U3bSpy1CTd7Y3FpJHXCQXuxzg=.bb911554-6fdb-46f5-82b3-0868163aff3c@github.com> References: <4uATHQKUM4LXeAWVeBG7T-icSBOiw7OoCRghjC316kg=.c92d41e2-27a6-496d-ae28-2870fcf8c8b1@github.com> <58_N4Nv-idYtlSuMr9U3bSpy1CTd7Y3FpJHXCQXuxzg=.bb911554-6fdb-46f5-82b3-0868163aff3c@github.com> Message-ID: On Tue, 6 Jul 2021 09:06:05 GMT, Jatin Bhateja wrote: >> Marked as reviewed by dlong (Reviewer). > > Hi @dean-long , @vnkozlov , > Since the idea is to test KNL settings on any latest X86 Server platform we should also turn off following options. > > if (is_intel()) { // Intel cpus specific settings > if (is_knights_family()) { > _features &= ~CPU_VZEROUPPER; > _features &= ~CPU_AVX512BW; > _features &= ~CPU_AVX512VL; > + _features &= ~CPU_AVX512DQ; > + _features &= ~CPU_AVX512_VNNI; > + _features &= ~CPU_AVX512_VAES; > + _features &= ~CPU_AVX512_VPOPCNTDQ; > + _features &= ~CPU_AVX512_VPCLMULQDQ; > + _features &= ~CPU_AVX512_VBMI; > + _features &= ~CPU_AVX512_VBMI2; > } > } Tests listed in [8269828](https://bugs.openjdk.java.net/browse/JDK-8269828) passed with additional KNL CPU features switched off as @jatin-bhateja suggested. 
But it does not solve our internal testing issue. I will change bug's subject and flag name to be specific that it is VM emulation instead of KNL CPU. ------------- PR: https://git.openjdk.java.net/jdk17/pull/205 From coleenp at openjdk.java.net Tue Jul 6 15:56:50 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 6 Jul 2021 15:56:50 GMT Subject: RFR: 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects In-Reply-To: References: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com> Message-ID: On Sun, 4 Jul 2021 22:37:45 GMT, Kim Barrett wrote: >> Replace '3' and LogMinObjAlignmentInBytes with LogBytesPerWord (3 for LP64 and 2 for 32 bits). >> Hoping git actions will test 32 bits. >> Tested with tier1 on all Oracle platforms, but since I didn't change the values for our platforms, no failures expected. > > src/hotspot/share/oops/symbol.hpp line 158: > >> 156: static int max_length() { return max_symbol_length; } >> 157: unsigned identity_hash() const { >> 158: unsigned addr_bits = (unsigned)((uintptr_t)this >> (LogBytesPerWord + 3)); > > [pre-existing] I'm puzzled by the `+3` in this line. I'm guessing this is `log2(sizeof(Symbol))`? Oh that might be a good guess. I thought it was just some way to randomize the address some more. @yminqi or @iklam would know since it was added with JDK-8130115. 
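To make the shift under discussion concrete, here is a small sketch of how the address bits come out for the two word sizes. The function addr_bits and its explicit log_bytes_per_word parameter are hypothetical and exist only for illustration; in HotSpot the value comes from the LogBytesPerWord constant, and the purpose of the extra +3 is still an open question at this point in the thread.

```cpp
#include <cstdint>

// Hypothetical illustration of the shift in Symbol::identity_hash().
// log_bytes_per_word is passed explicitly here: 3 on LP64, 2 on 32-bit.
unsigned addr_bits(uintptr_t addr, int log_bytes_per_word) {
  // Drop the low (log_bytes_per_word + 3) bits of the address.
  return (unsigned)(addr >> (log_bytes_per_word + 3));
}
```

For an address of 0x1000 this yields 64 with a shift of 3 + 3 and 128 with a shift of 2 + 3, so the same address contributes different bits on 64-bit and 32-bit builds.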
------------- PR: https://git.openjdk.java.net/jdk/pull/4675 From jbhateja at openjdk.java.net Tue Jul 6 16:37:53 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 6 Jul 2021 16:37:53 GMT Subject: [jdk17] RFR: 8269825: [TESTBUG] Missing testing for x86 KNL platforms In-Reply-To: References: <4uATHQKUM4LXeAWVeBG7T-icSBOiw7OoCRghjC316kg=.c92d41e2-27a6-496d-ae28-2870fcf8c8b1@github.com> <58_N4Nv-idYtlSuMr9U3bSpy1CTd7Y3FpJHXCQXuxzg=.bb911554-6fdb-46f5-82b3-0868163aff3c@github.com> Message-ID: On Tue, 6 Jul 2021 15:02:17 GMT, Vladimir Kozlov wrote: >> Hi @dean-long , @vnkozlov , >> Since the idea is to test KNL settings on any latest X86 Server platform we should also turn off following options. >> >> if (is_intel()) { // Intel cpus specific settings >> if (is_knights_family()) { >> _features &= ~CPU_VZEROUPPER; >> _features &= ~CPU_AVX512BW; >> _features &= ~CPU_AVX512VL; >> + _features &= ~CPU_AVX512DQ; >> + _features &= ~CPU_AVX512_VNNI; >> + _features &= ~CPU_AVX512_VAES; >> + _features &= ~CPU_AVX512_VPOPCNTDQ; >> + _features &= ~CPU_AVX512_VPCLMULQDQ; >> + _features &= ~CPU_AVX512_VBMI; >> + _features &= ~CPU_AVX512_VBMI2; >> } >> } > > Tests listed in [8269828](https://bugs.openjdk.java.net/browse/JDK-8269828) passed with additional KNL CPU features switched off as @jatin-bhateja suggested. But it does not solve our internal testing issue. > > I will change bug's subject and flag name to be specific that it is VM emulation instead of KNL CPU. Hi @vnkozlov , Thanks for clarifications, configuration of VM guest mentions AVX512DQ feature which is not supported by KNL. Below is the KNL features list generated using Intel's SDE (SW Dev Emulator). 
SPROMPT>sde -knl -- cpuid -1 | grep "AVX512" | grep "true" AVX512F: AVX-512 foundation instructions = true AVX512PF: prefetch instructions = true AVX512ER: exponent & reciprocal instrs = true AVX512CD: conflict detection instrs = true SPROMPT>sde -knl -- cpuid -1 | grep "AVX512" | grep "false" AVX512DQ: double & quadword instructions = false AVX512IFMA: fused multiply add = false AVX512BW: byte & word instructions = false AVX512VL: vector length = false AVX512VBMI: vector byte manipulation = false AVX512_VBMI2: byte VPCOMPRESS, VPEXPAND = false AVX512_VNNI: neural network instructions = false AVX512_BITALG: bit count/shiffle = false AVX512: VPOPCNTDQ instruction = false AVX512_4VNNIW: neural network instrs = false AVX512_4FMAPS: multiply acc single prec = false AVX512_VP2INTERSECT: intersect mask regs = false Regards ------------- PR: https://git.openjdk.java.net/jdk17/pull/205 From minqi at openjdk.java.net Tue Jul 6 16:51:53 2021 From: minqi at openjdk.java.net (Yumin Qi) Date: Tue, 6 Jul 2021 16:51:53 GMT Subject: RFR: 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects In-Reply-To: References: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com> Message-ID: On Tue, 6 Jul 2021 15:54:19 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/symbol.hpp line 158: >> >>> 156: static int max_length() { return max_symbol_length; } >>> 157: unsigned identity_hash() const { >>> 158: unsigned addr_bits = (unsigned)((uintptr_t)this >> (LogBytesPerWord + 3)); >> >> [pre-existing] I'm puzzled by the `+3` in this line. I'm guessing this is `log2(sizeof(Symbol))`? > > Oh that might be a good guess. I thought it was just some way to randomize the address some more. @yminqi or @iklam would know since it was added with JDK-8130115. This address_bits here is for using 'this' to join the calculation of identity_hash. I don't think it matters which bits are used here, but we need keep consistency with SA. 
If you change this, you also need to change Symbol.java. Note that on 32-bit, LogBytesPerWord differs from LogMinObjAlignmentInBytes (which is 3 for both 32 and 64 bits).

-------------

PR: https://git.openjdk.java.net/jdk/pull/4675

From kvn at openjdk.java.net Tue Jul 6 16:56:47 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 6 Jul 2021 16:56:47 GMT
Subject: [jdk17] RFR: 8269825: [TESTBUG] Missing testing for x86 KNL platforms
In-Reply-To:
References:
Message-ID:

On Fri, 2 Jul 2021 19:35:56 GMT, Vladimir Kozlov wrote:

> Knights family of X86 Intel CPU (KNL) does not support some of avx512 features (AVX512VL/BW) and have other restrictions. We may not have such kind of machines in our testing environment and may miss bugs as JBS history shows (look recent fixes for KNL).
> On other hand we have some Windows VM instances which seem emulate KNL configuration and limit avx512 instructions on CPU which supports full set. Recent bug JDK-8269775 shows such example.
> I suggest to add -XX:+UseKNLSetting x86 diagnostic flag to emulate KNL settings in HotSpot VM to test such configuration.

Interesting. Excluding AVX512DQ does help to pass tests from 8269828 (at least on my local machine).

Yes, the CPUID I listed before was incorrect (it was generated with my patch). Here is the correct one, from one of the crashes we see (no AVX512BW, AVX512VL, AVX512DQ, CLWB, FLUSHOPT):

CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per core) family 6 model 85 stepping 4 microcode 0x1, cx8, cmov, fxsr, ht, mmx, 3dnowpref, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, lzcnt, tsc, avx, avx2, aes, erms, clmul, bmi1, bmi2, rtm, adx, avx512f, avx512cd, fma, vzeroupper, clflush, hv

I am currently testing an update for this PR which excludes AVX512DQ too.
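The feature exclusion being discussed is plain bit-mask clearing. A self-contained sketch follows; the enum values are made-up bit positions chosen for the example, not HotSpot's real CPU_* constants, and apply_knl_restrictions is a hypothetical name.

```cpp
#include <cstdint>

// Made-up bit positions for illustration; HotSpot's real CPU_* values differ.
enum Feature : uint64_t {
  AVX512F  = 1ULL << 0,
  AVX512CD = 1ULL << 1,
  AVX512BW = 1ULL << 2,
  AVX512VL = 1ULL << 3,
  AVX512DQ = 1ULL << 4,
};

// Hypothetical helper: clears the AVX-512 subsets the thread proposes to
// exclude, leaving every other bit in the mask unchanged.
uint64_t apply_knl_restrictions(uint64_t features) {
  features &= ~(uint64_t)AVX512BW;
  features &= ~(uint64_t)AVX512VL;
  features &= ~(uint64_t)AVX512DQ;
  return features;
}
```

Starting from a mask with all five bits set, only AVX512F and AVX512CD remain afterwards, which mirrors the avx512f, avx512cd pair in the CPU line quoted above.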
------------- PR: https://git.openjdk.java.net/jdk17/pull/205 From kvn at openjdk.java.net Tue Jul 6 17:12:51 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 6 Jul 2021 17:12:51 GMT Subject: [jdk17] RFR: 8269825: [TESTBUG] Missing testing for x86 KNL platforms In-Reply-To: References: <4uATHQKUM4LXeAWVeBG7T-icSBOiw7OoCRghjC316kg=.c92d41e2-27a6-496d-ae28-2870fcf8c8b1@github.com> <58_N4Nv-idYtlSuMr9U3bSpy1CTd7Y3FpJHXCQXuxzg=.bb911554-6fdb-46f5-82b3-0868163aff3c@github.com> Message-ID: <4Q1rzXwbmjh7xH-E9QGLoYn7vA4cHHV9hNIhJ80Ie3Y=.5b97a352-61bf-4cde-aa75-0dc8e0d1a899@github.com> On Tue, 6 Jul 2021 16:34:29 GMT, Jatin Bhateja wrote: >> Tests listed in [8269828](https://bugs.openjdk.java.net/browse/JDK-8269828) passed with additional KNL CPU features switched off as @jatin-bhateja suggested. But it does not solve our internal testing issue. >> >> I will change bug's subject and flag name to be specific that it is VM emulation instead of KNL CPU. > > Hi @vnkozlov , > Thanks for clarifications, configuration of VM guest mentions AVX512DQ feature which is not supported by KNL. > Below is the KNL features list generated using Intel's SDE (SW Dev Emulator). 
> > > > SPROMPT>sde -knl -- cpuid -1 | grep "AVX512" | grep "true" > AVX512F: AVX-512 foundation instructions = true > AVX512PF: prefetch instructions = true > AVX512ER: exponent & reciprocal instrs = true > AVX512CD: conflict detection instrs = true > SPROMPT>sde -knl -- cpuid -1 | grep "AVX512" | grep "false" > AVX512DQ: double & quadword instructions = false > AVX512IFMA: fused multiply add = false > AVX512BW: byte & word instructions = false > AVX512VL: vector length = false > AVX512VBMI: vector byte manipulation = false > AVX512_VBMI2: byte VPCOMPRESS, VPEXPAND = false > AVX512_VNNI: neural network instructions = false > AVX512_BITALG: bit count/shiffle = false > AVX512: VPOPCNTDQ instruction = false > AVX512_4VNNIW: neural network instrs = false > AVX512_4FMAPS: multiply acc single prec = false > AVX512_VP2INTERSECT: intersect mask regs = false > > > Regards @jatin-bhateja What is your suggestion for this change? Because of my mistake about AVX512DQ I think I can just apply your additional features exclusion for KNL cpu and it will emulate VM settings we have. And I don't need separate settings for virtualization. ------------- PR: https://git.openjdk.java.net/jdk17/pull/205 From iklam at openjdk.java.net Tue Jul 6 17:34:46 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 6 Jul 2021 17:34:46 GMT Subject: RFR: 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects In-Reply-To: References: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com> Message-ID: On Tue, 6 Jul 2021 16:48:34 GMT, Yumin Qi wrote: >> Oh that might be a good guess. I thought it was just some way to randomize the address some more. @yminqi or @iklam would know since it was added with JDK-8130115. > > This address_bits here is for using 'this' to join the calculation of identity_hash. I don't think it matters which bits are used here, but we need keep consistency with SA. 
> If you changed here, you need to change Symbol.java too. Note in 32 bits, LogBytesPerWord is different from LogMinObjAlignmentInBytes (it is 3 for both 32 and 64 bits).

I agree that the `+3` can be replaced with `log2(sizeof(Symbol))`. If I remember correctly, the intention was to avoid getting the same value for `((((uintptr_t)this) >> LogBytesPerWord) & 0x07)`.

However, this may not be necessary. The Symbols are of variable sizes (the string body is allocated as part of the Symbol). I wrote a program to analyze the distribution of the above expression for 18487 Symbols allocated for running a HelloWorld Java program. While it's not perfectly distributed, it may be good enough.

Perhaps we can get rid of the `+ 3` or `+ log2(sizeof(Symbol))` in a separate RFE?

0: +++++++
1: +++++++++++++++++++++++++++++++++++++++
2: ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3: ++++++++++++++++++++++++++++++++++++++++++++++
4: ++++++++++++++++++++++++++++++
5: ++++++++++++++++++++++
6: +++++++++++++++
7: ++++++++++

0: 571
1: 3152
2: 4659
3: 3721
4: 2467
5: 1801
6: 1276
7: 838

-------------

PR: https://git.openjdk.java.net/jdk/pull/4675

From jbhateja at openjdk.java.net Tue Jul 6 17:38:53 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Tue, 6 Jul 2021 17:38:53 GMT
Subject: [jdk17] RFR: 8269825: [TESTBUG] Missing testing for x86 KNL platforms
In-Reply-To:
References: <4uATHQKUM4LXeAWVeBG7T-icSBOiw7OoCRghjC316kg=.c92d41e2-27a6-496d-ae28-2870fcf8c8b1@github.com> <58_N4Nv-idYtlSuMr9U3bSpy1CTd7Y3FpJHXCQXuxzg=.bb911554-6fdb-46f5-82b3-0868163aff3c@github.com>
Message-ID:

On Tue, 6 Jul 2021 16:34:29 GMT, Jatin Bhateja wrote:

>> Tests listed in [8269828](https://bugs.openjdk.java.net/browse/JDK-8269828) passed with additional KNL CPU features switched off as @jatin-bhateja suggested. But it does not solve our internal testing issue.
>>
>> I will change bug's subject and flag name to be specific that it is VM emulation instead of KNL CPU.
> > Hi @vnkozlov , > Thanks for clarifications, configuration of VM guest mentions AVX512DQ feature which is not supported by KNL. > Below is the KNL features list generated using Intel's SDE (SW Dev Emulator). > > > > SPROMPT>sde -knl -- cpuid -1 | grep "AVX512" | grep "true" > AVX512F: AVX-512 foundation instructions = true > AVX512PF: prefetch instructions = true > AVX512ER: exponent & reciprocal instrs = true > AVX512CD: conflict detection instrs = true > SPROMPT>sde -knl -- cpuid -1 | grep "AVX512" | grep "false" > AVX512DQ: double & quadword instructions = false > AVX512IFMA: fused multiply add = false > AVX512BW: byte & word instructions = false > AVX512VL: vector length = false > AVX512VBMI: vector byte manipulation = false > AVX512_VBMI2: byte VPCOMPRESS, VPEXPAND = false > AVX512_VNNI: neural network instructions = false > AVX512_BITALG: bit count/shiffle = false > AVX512: VPOPCNTDQ instruction = false > AVX512_4VNNIW: neural network instrs = false > AVX512_4FMAPS: multiply acc single prec = false > AVX512_VP2INTERSECT: intersect mask regs = false > > > Regards > @jatin-bhateja > What is your suggestion for this change? Because of my mistake about AVX512DQ I think I can just apply your additional features exclusion for KNL cpu and it will emulate VM settings we have. And I don't need separate settings for virtualization. I agree with your suggestion. Settings enabled under UseNewCode with additional new features disabled for KNL will help in testing. 
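To make the idea of "additional features disabled for KNL" concrete, here is a minimal standalone sketch of emulating a KNL-like configuration by masking feature bits. It is not the code from this PR: the constants, `has_feature`, and `emulate_knl_features` are hypothetical names invented for illustration, and the cleared set covers only the three features the thread discusses explicitly (AVX512DQ, AVX512BW, AVX512VL).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits for illustration only; these are not HotSpot's constants.
constexpr std::uint64_t AVX512F  = 1ULL << 0;
constexpr std::uint64_t AVX512CD = 1ULL << 1;
constexpr std::uint64_t AVX512PF = 1ULL << 2;
constexpr std::uint64_t AVX512ER = 1ULL << 3;
constexpr std::uint64_t AVX512DQ = 1ULL << 4;
constexpr std::uint64_t AVX512BW = 1ULL << 5;
constexpr std::uint64_t AVX512VL = 1ULL << 6;

// Assumption: a subset of the features reported as "false" in the
// SDE -knl listing quoted above.
constexpr std::uint64_t KNL_UNSUPPORTED = AVX512DQ | AVX512BW | AVX512VL;

// Returns true if the given single feature bit is set in the mask.
inline bool has_feature(std::uint64_t mask, std::uint64_t bit) {
  assert(bit != 0 && (bit & (bit - 1)) == 0);  // exactly one bit expected
  return (mask & bit) != 0;
}

// Clears the features KNL lacks and leaves everything else as detected.
inline std::uint64_t emulate_knl_features(std::uint64_t detected) {
  return detected & ~KNL_UNSUPPORTED;
}
```

Applied to a CPU that reports the full AVX-512 set, a mask like this keeps AVX512F and AVX512CD visible while hiding DQ, BW and VL, which is the shape of the configuration described in the crash report earlier in the thread.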
-------------

PR: https://git.openjdk.java.net/jdk17/pull/205

From coleenp at openjdk.java.net Tue Jul 6 21:22:25 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Tue, 6 Jul 2021 21:22:25 GMT
Subject: RFR: 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects [v2]
In-Reply-To:
References: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com>
Message-ID:

On Tue, 6 Jul 2021 17:31:49 GMT, Ioi Lam wrote:

>> This address_bits here is for using 'this' to join the calculation of identity_hash. I don't think it matters which bits are used here, but we need keep consistency with SA. If you changed here, you need to change Symbol.java too. Note in 32 bits, LogBytesPerWord is different from LogMinObjAlignmentInBytes (it is 3 for both 32 and 64 bits).
>
> I agree that the `+3` can be replaced with `log2(sizeof(Symbol))`. If I remember correctly, the intention was to avoid getting the same value for `((((uintptr_t)this) >> LogBytesPerWord) & 0x07)`.
>
> However, this may not be necessary. The Symbols are of variable sizes (the string body is allocated as part of the Symbol). I wrote a program to analyze the distribution of the above expression for 18487 Symbols allocated for running a HelloWorld Java program. While it's not perfectly distributed, it may be good enough.
>
> Perhaps we can get rid of the `+ 3` or `+ log2(sizeof(Symbol))` in a separate RFE?
>
>
> 0: +++++++
> 1: +++++++++++++++++++++++++++++++++++++++
> 2: ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 3: ++++++++++++++++++++++++++++++++++++++++++++++
> 4: ++++++++++++++++++++++++++++++
> 5: ++++++++++++++++++++++
> 6: +++++++++++++++
> 7: ++++++++++
>
> 0: 571
> 1: 3152
> 2: 4659
> 3: 3721
> 4: 2467
> 5: 1801
> 6: 1276
> 7: 838

Thanks Yumin for pointing out the SA dependencies. It turns out that this function isn't ultimately used by the SA anymore, but I've made the changes anyway.
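For readers following the `identity_hash` discussion, the two expressions involved can be sketched in a small standalone form. This is not HotSpot code: `log_bytes_per_word`, `addr_bits`, and `bucket_counts` are hypothetical helpers, and the analysis program mentioned in the quoted mail is not shown in the thread, so the bucketing below is only an assumption about its general shape.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for LogBytesPerWord: 3 for 8-byte words, 2 for 4-byte words.
constexpr int log_bytes_per_word(std::size_t word_size) {
  return word_size == 8 ? 3 : 2;
}

// Same shape as the quoted line in symbol.hpp: drop the word-alignment bits
// plus three more before the address is mixed into the hash.
inline unsigned addr_bits(std::uintptr_t addr, int log_word) {
  assert(log_word >= 0 && log_word + 3 < static_cast<int>(sizeof(addr) * 8));
  return static_cast<unsigned>(addr >> (log_word + 3));
}

// Count addresses per value of (addr >> log_word) & 0x07, the expression
// whose distribution is shown in the histogram quoted above.
inline std::array<std::size_t, 8> bucket_counts(
    const std::vector<std::uintptr_t>& addrs, int log_word) {
  std::array<std::size_t, 8> counts{};
  for (std::uintptr_t a : addrs) {
    counts[(a >> log_word) & 0x07]++;
  }
  return counts;
}
```

If every object were the same size and allocated back to back, the three bits above the word alignment would cycle predictably; because Symbols vary in size, the buckets fill unevenly but without any of them being empty, which matches the point made in the quoted analysis.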
------------- PR: https://git.openjdk.java.net/jdk/pull/4675 From coleenp at openjdk.java.net Tue Jul 6 21:22:24 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 6 Jul 2021 21:22:24 GMT Subject: RFR: 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects [v2] In-Reply-To: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com> References: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com> Message-ID: > Replace '3' and LogMinObjAlignmentInBytes with LogBytesPerWord (3 for LP64 and 2 for 32 bits). > Hoping git actions will test 32 bits. > Tested with tier1 on all Oracle platforms, but since I didn't change the values for our platforms, no failures expected. Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: - missed Symbol.java part of the change. - Add equivalent change to SA. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4675/files - new: https://git.openjdk.java.net/jdk/pull/4675/files/acc23f38..02ccd486 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4675&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4675&range=00-01 Stats: 8 lines in 2 files changed: 6 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/4675.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4675/head:pull/4675 PR: https://git.openjdk.java.net/jdk/pull/4675 From coleenp at openjdk.java.net Tue Jul 6 21:29:49 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 6 Jul 2021 21:29:49 GMT Subject: RFR: 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects [v2] In-Reply-To: References: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com> Message-ID: On Tue, 6 Jul 2021 21:16:29 GMT, Coleen Phillimore wrote: >> I agree that the `+3` can be replaced with 
`log2(sizeof(Symbol))`. If I remember correctly, the intention was to avoid getting the same value for `((((uintptr_t)this) >> LogBytesPerWord) & 0x07)`.
>>
>> However, this may not be necessary. The Symbols are of variable sizes (the string body is allocated as part of the Symbol). I wrote a program to analyze the distribution of the above expression for 18487 Symbols allocated for running a HelloWorld Java program. While it's not perfectly distributed, it may be good enough.
>>
>> Perhaps we can get rid of the `+ 3` or `+ log2(sizeof(Symbol))` in a separate RFE?
>>
>>
>> 0: +++++++
>> 1: +++++++++++++++++++++++++++++++++++++++
>> 2: ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 3: ++++++++++++++++++++++++++++++++++++++++++++++
>> 4: ++++++++++++++++++++++++++++++
>> 5: ++++++++++++++++++++++
>> 6: +++++++++++++++
>> 7: ++++++++++
>>
>> 0: 571
>> 1: 3152
>> 2: 4659
>> 3: 3721
>> 4: 2467
>> 5: 1801
>> 6: 1276
>> 7: 838
>
> Thanks Yumin for pointing out the SA dependencies. It turns out that this function isn't ultimately used by the SA anymore, but I've made the changes anyway.

I'd be happy if someone changed the 3 to log2(sizeof(Symbol)) in another RFE since I'm having trouble commenting why this is important.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4675

From kvn at openjdk.java.net Tue Jul 6 21:46:22 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 6 Jul 2021 21:46:22 GMT
Subject: [jdk17] RFR: 8269825: [TESTBUG] Missing testing for x86 KNL platforms [v2]
In-Reply-To:
References:
Message-ID:

> Knights family of X86 Intel CPU (KNL) does not support some of avx512 features (AVX512VL/BW) and have other restrictions. We may not have such kind of machines in our testing environment and may miss bugs as JBS history shows (look recent fixes for KNL).
> On other hand we have some Windows VM instances which seem emulate KNL configuration and limit avx512 instructions on CPU which supports full set.
Recent bug JDK-8269775 shows such example. > I suggest to add -XX:+UseKNLSetting x86 diagnostic flag to emulate KNL settings in HotSpot VM to test such configuration. Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Exclude more CPU features to emulate KNL cpu - Merge branch 'master' into JDK-8269825 - 8269825: [TESTBUG] Missing testing for x86 KNL platforms ------------- Changes: - all: https://git.openjdk.java.net/jdk17/pull/205/files - new: https://git.openjdk.java.net/jdk17/pull/205/files/9466fb97..4117946a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk17&pr=205&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk17&pr=205&range=00-01 Stats: 3695 lines in 52 files changed: 2834 ins; 359 del; 502 mod Patch: https://git.openjdk.java.net/jdk17/pull/205.diff Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/205/head:pull/205 PR: https://git.openjdk.java.net/jdk17/pull/205 From kvn at openjdk.java.net Tue Jul 6 21:51:56 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 6 Jul 2021 21:51:56 GMT Subject: [jdk17] RFR: 8269825: [TESTBUG] Missing testing for x86 KNL platforms [v2] In-Reply-To: References: Message-ID: On Tue, 6 Jul 2021 21:46:22 GMT, Vladimir Kozlov wrote: >> Knights family of X86 Intel CPU (KNL) does not support some of avx512 features (AVX512VL/BW) and have other restrictions. We may not have such kind of machines in our testing environment and may miss bugs as JBS history shows (look recent fixes for KNL). >> On other hand we have some Windows VM instances which seem emulate KNL configuration and limit avx512 instructions on CPU which supports full set. Recent bug JDK-8269775 shows such example. 
>> I suggest to add -XX:+UseKNLSetting x86 diagnostic flag to emulate KNL settings in HotSpot VM to test such configuration. > > Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Exclude more CPU features to emulate KNL cpu > - Merge branch 'master' into JDK-8269825 > - 8269825: [TESTBUG] Missing testing for x86 KNL platforms The update changes were suggested by Jatin. ------------- PR: https://git.openjdk.java.net/jdk17/pull/205 From kvn at openjdk.java.net Tue Jul 6 21:56:04 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 6 Jul 2021 21:56:04 GMT Subject: [jdk17] Integrated: 8269825: [TESTBUG] Missing testing for x86 KNL platforms In-Reply-To: References: Message-ID: On Fri, 2 Jul 2021 19:35:56 GMT, Vladimir Kozlov wrote: > Knights family of X86 Intel CPU (KNL) does not support some of avx512 features (AVX512VL/BW) and have other restrictions. We may not have such kind of machines in our testing environment and may miss bugs as JBS history shows (look recent fixes for KNL). > On other hand we have some Windows VM instances which seem emulate KNL configuration and limit avx512 instructions on CPU which supports full set. Recent bug JDK-8269775 shows such example. > I suggest to add -XX:+UseKNLSetting x86 diagnostic flag to emulate KNL settings in HotSpot VM to test such configuration. This pull request has now been integrated. 
Changeset: 0d1cd3a7 Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk17/commit/0d1cd3a7452a83f198d5d6eab0d4fbbaf44a302b Stats: 13 lines in 3 files changed: 12 ins; 0 del; 1 mod 8269825: [TESTBUG] Missing testing for x86 KNL platforms Reviewed-by: dlong, jbhateja ------------- PR: https://git.openjdk.java.net/jdk17/pull/205 From jwilhelm at openjdk.java.net Tue Jul 6 22:26:28 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Tue, 6 Jul 2021 22:26:28 GMT Subject: RFR: Merge jdk17 Message-ID: Forwardport JDK 17 -> JDK 18 ------------- Commit messages: - Merge - 8269825: [TESTBUG] Missing testing for x86 KNL platforms - 8269955: ProblemList compiler/vectorapi/VectorCastShape[64|128]Test.java tests on x86 - 8268966: AArch64: 'bad AD file' in some vector conversion tests - 8225667: Clarify the behavior of System::gc w.r.t. reference processing - 8269568: JVM crashes when running VectorMask query tests - 8269661: JNI_GetStringCritical does not lock char array - 8269575: C2: assert(false) failed: graph should be schedulable after JDK-8252372 - 8268883: C2: assert(false) failed: unscheduable graph - 8268369: SIGSEGV in PhaseCFG::implicit_null_check due to missing null check The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=4698&range=00.0 - jdk17: https://webrevs.openjdk.java.net/?repo=jdk&pr=4698&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/4698/files Stats: 3666 lines in 51 files changed: 2827 ins; 351 del; 488 mod Patch: https://git.openjdk.java.net/jdk/pull/4698.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4698/head:pull/4698 PR: https://git.openjdk.java.net/jdk/pull/4698 From jwilhelm at openjdk.java.net Tue Jul 6 23:05:13 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Tue, 6 Jul 2021 23:05:13 GMT Subject: RFR: Merge jdk17 [v2] In-Reply-To: References: Message-ID: > Forwardport JDK 17 -> JDK 18 
Jesper Wilhelmsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 164 commits: - Merge - 8269935: ProblemList runtime/jni/checked/TestPrimitiveArrayCriticalWithBadParam.java on windows Reviewed-by: jjg - 8269917: Insert missing commas in copyrights in java.net Reviewed-by: chegar, dfuchs - 8253119: Remove the legacy PlainSocketImpl and PlainDatagramSocketImpl implementation Reviewed-by: alanb, dfuchs, chegar - 8269692: sun.net.httpserver.ServerImpl::createContext should throw IAE Reviewed-by: dfuchs - 8269697: JNI_GetPrimitiveArrayCritical() should not accept object array Reviewed-by: kbarrett, dholmes - 8266310: deadlock between System.loadLibrary and JNI FindClass loading another class Reviewed-by: dholmes, plevart, chegar, mchung - 8269882: stack-use-after-scope in NewObjectA Reviewed-by: kbarrett - 8269672: C1: Remove unaligned move on all architectures Co-authored-by: Martin Doerr Reviewed-by: thartmann - 8267956: C1 code cleanup Reviewed-by: thartmann - ... and 154 more: https://git.openjdk.java.net/jdk/compare/0d1cd3a7...b68ba015 ------------- Changes: https://git.openjdk.java.net/jdk/pull/4698/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4698&range=01 Stats: 46385 lines in 833 files changed: 20688 ins; 22823 del; 2874 mod Patch: https://git.openjdk.java.net/jdk/pull/4698.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4698/head:pull/4698 PR: https://git.openjdk.java.net/jdk/pull/4698 From jwilhelm at openjdk.java.net Tue Jul 6 23:05:14 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Tue, 6 Jul 2021 23:05:14 GMT Subject: Integrated: Merge jdk17 In-Reply-To: References: Message-ID: On Tue, 6 Jul 2021 22:16:07 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 17 -> JDK 18 This pull request has now been integrated. 
Changeset: 7a4f08ae Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/7a4f08ae32ede32beb05f6e5e0a266943b91b1ee Stats: 3666 lines in 51 files changed: 2827 ins; 351 del; 488 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/4698 From minqi at openjdk.java.net Tue Jul 6 22:33:07 2021 From: minqi at openjdk.java.net (Yumin Qi) Date: Tue, 6 Jul 2021 22:33:07 GMT Subject: RFR: 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects [v2] In-Reply-To: References: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com> Message-ID: On Tue, 6 Jul 2021 21:22:24 GMT, Coleen Phillimore wrote: >> Replace '3' and LogMinObjAlignmentInBytes with LogBytesPerWord (3 for LP64 and 2 for 32 bits). >> Hoping git actions will test 32 bits. >> Tested with tier1 on all Oracle platforms, but since I didn't change the values for our platforms, no failures expected. > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - missed Symbol.java part of the change. > - Add equivalent change to SA. LGTM. ------------- Marked as reviewed by minqi (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4675 From kbarrett at openjdk.java.net Tue Jul 6 23:22:35 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 6 Jul 2021 23:22:35 GMT Subject: RFR: 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects [v2] In-Reply-To: References: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com> Message-ID: On Tue, 6 Jul 2021 21:22:24 GMT, Coleen Phillimore wrote: >> Replace '3' and LogMinObjAlignmentInBytes with LogBytesPerWord (3 for LP64 and 2 for 32 bits). >> Hoping git actions will test 32 bits. >> Tested with tier1 on all Oracle platforms, but since I didn't change the values for our platforms, no failures expected. 
> > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - missed Symbol.java part of the change. > - Add equivalent change to SA. Still good. I'd also be fine with removal of that questionable "+ 3". ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4675 From iklam at openjdk.java.net Wed Jul 7 01:40:50 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 7 Jul 2021 01:40:50 GMT Subject: RFR: 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects [v2] In-Reply-To: References: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com> Message-ID: On Tue, 6 Jul 2021 23:11:35 GMT, Kim Barrett wrote: > Still good. I'd also be fine with removal of that questionable "+ 3". I filed https://bugs.openjdk.java.net/browse/JDK-8269986 to remove the "+3" ------------- PR: https://git.openjdk.java.net/jdk/pull/4675 From yyang at openjdk.java.net Wed Jul 7 01:42:55 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 7 Jul 2021 01:42:55 GMT Subject: Integrated: 8268425: Show decimal nid of OSThread instead of hex format one In-Reply-To: <2XpHch1KL91iW9wQ9VdboCFdkyUxdCwCq_-Dad6zo4E=.b01db185-1596-4f0c-b1ee-2d125d50963c@github.com> References: <2XpHch1KL91iW9wQ9VdboCFdkyUxdCwCq_-Dad6zo4E=.b01db185-1596-4f0c-b1ee-2d125d50963c@github.com> Message-ID: On Thu, 10 Jun 2021 02:07:36 GMT, Yi Yang wrote: > From users' perspective, we can find corresponding os thread via top directly, otherwise, we must convert hex format based nid to an integer, and find that thread via `top -pid `. This slightly facilitates our debugging process, but would obviously break some existing jstack analysis tool. 
>
> Jstack Before:
>
> "ParGC Thread#7" os_prio=0 cpu=103260.18ms elapsed=5255043.58s tid=0x00007f967000b000 nid=0x12e67 runnable
>
> "ParGC Thread#8" os_prio=0 cpu=104818.76ms elapsed=5255043.58s tid=0x00007f967000c000 nid=0x12e68 runnable
>
> "ParGC Thread#9" os_prio=0 cpu=102164.69ms elapsed=5255043.58s tid=0x00007f967000e000 nid=0x12e69 runnable
>
> Jstack After:
> "G1 Conc#0" os_prio=0 cpu=0.03ms elapsed=1295.27s tid=0x00007f99dc096490 nid=117707 runnable
>
> "G1 Refine#0" os_prio=0 cpu=0.06ms elapsed=1295.22s tid=0x00007f99dc2cad20 nid=117708 runnable
>
> "G1 Service" os_prio=0 cpu=87.05ms elapsed=1295.22s tid=0x00007f99dc2cc140 nid=117709 runnable
>
> Top:
>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>  49083 tianxia+  20   0   32.8g 594148  10796 S 103.3  0.1   0:10.05 java
>  71291 qingfen+  20   0   39.3g  26.7g  18312 S 100.7  5.3  16861:35 jhsdb
>  50407 tianxia+  20   0   32.5g  32796   9768 S 100.3  0.0   0:05.80 java
> 107429 maolian+  20   0   11.4g   1.1g  10956 S 100.3  0.2  20173:52 java
>  99923 root      10 -10  288520 163228   5088 S   5.9  0.0   6463:53 AliYunDun

This pull request has now been integrated.

Changeset: a9e20101
Author: Yi Yang
URL: https://git.openjdk.java.net/jdk/commit/a9e201016de119af4b0fd3ebb43768896fb9e5c5
Stats: 6 lines in 3 files changed: 1 ins; 0 del; 5 mod

8268425: Show decimal nid of OSThread instead of hex format one

Reviewed-by: stuefe, kevinw

-------------

PR: https://git.openjdk.java.net/jdk/pull/4449

From dholmes at openjdk.java.net Wed Jul 7 07:11:08 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Wed, 7 Jul 2021 07:11:08 GMT
Subject: RFR: 8215948: [TESTBUG] gtest pseudo-JavaThreads could be more regular JavaThreads
Message-ID:

Please review this change to the gtest infrastructure that makes the test JavaThreads more regular JavaThreads, with the same lifecycle methods and having a regular j.l.Thread object.

One test had to be modified slightly to create and start the threads outside of a code region where a Mutex is held.

Testing: tiers 1-3, GHA

Thanks,
David

-------------

Commit messages:
 - 8215948: [TESTBUG] gtest pseudo-JavaThreads could be more regular JavaThreads

Changes: https://git.openjdk.java.net/jdk/pull/4704/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4704&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8215948
Stats: 108 lines in 2 files changed: 43 ins; 38 del; 27 mod
Patch: https://git.openjdk.java.net/jdk/pull/4704.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/4704/head:pull/4704

PR: https://git.openjdk.java.net/jdk/pull/4704

From aw at openjdk.java.net Wed Jul 7 11:47:06 2021
From: aw at openjdk.java.net (Andreas Woess)
Date: Wed, 7 Jul 2021 11:47:06 GMT
Subject: RFR: 8269592: [JVMCI] Optimize c2v_iterateFrames
Message-ID:

Several smaller optimizations and cleanups to JVMCI's iterateFrames:

* Restructure the iterateFrames method for better readability and maintenance, with some parts extracted to helper functions.
* Use vframeStream as the iterator for faster iteration in case not every vframe matches the method filter, so we can avoid creating javaVFrames for skipped vframes. We use vframeStream::asJavaVFrame() to get the current javaVFrame.
* Extended vframeStream::asJavaVFrame() to also work with native frames, so that it works with all java frames returned by vframeStream. This way, native compiledVFrames will just work and do not need extra handling. Test coverage is provided via a newly added iterateFrames jtreg test that includes a JNI call on the stack.
* Added two trivial getters to vframeStream: vframe_id() and decode_offset(). These are used together with compiledVFrame::at_scope() to avoid going through vframeStream::asJavaVFrame() and recreating the scope objects for every matched inlined vframe of a compiled frame which would be more expensive than using javaVFrame::sender() (that shares the scope object pool).
* Only resolve the callback interface method once per iterateFrames call.
* Only resolve the Method* of the ResolvedJavaMethods to be matched once per iterateFrames call.
* Only allocate localIsVirtual array if at least one local is virtual (the Java part already expects this).
* Use matched ResolvedJavaMethod instances instead of going through JVMCIEnv::get_jvmci_method, if possible.

-------------

Commit messages:
 - [JVMCI] Test iterateFrames with native frames.
 - [JVMCI] Optimize iterateFrames.
 - Add support for native frames to vframeStreamCommon::asJavaVFrame().
 - Add getters to vframeStreamCommon.

Changes: https://git.openjdk.java.net/jdk/pull/4625/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4625&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8269592
Stats: 456 lines in 6 files changed: 335 ins; 55 del; 66 mod
Patch: https://git.openjdk.java.net/jdk/pull/4625.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/4625/head:pull/4625

PR: https://git.openjdk.java.net/jdk/pull/4625

From coleenp at openjdk.java.net Wed Jul 7 12:49:52 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Wed, 7 Jul 2021 12:49:52 GMT
Subject: RFR: 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects [v2]
In-Reply-To:
References: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com>
Message-ID:

On Tue, 6 Jul 2021 21:22:24 GMT, Coleen Phillimore wrote:

>> Replace '3' and LogMinObjAlignmentInBytes with LogBytesPerWord (3 for LP64 and 2 for 32 bits).
>> Hoping git actions will test 32 bits.
>> Tested with tier1 on all Oracle platforms, but since I didn't change the values for our platforms, no failures expected.
>
> Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision:
>
> - missed Symbol.java part of the change.
> - Add equivalent change to SA.

Thanks Ioi, Yumin, Kim and Thomas for the review and comments.
------------- PR: https://git.openjdk.java.net/jdk/pull/4675 From coleenp at openjdk.java.net Wed Jul 7 12:49:53 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 7 Jul 2021 12:49:53 GMT Subject: Integrated: 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects In-Reply-To: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com> References: <2iYTJ25o-hTaqlrJHGRa-M1Zru7jNIfTFz_XUQxw8yw=.f89be21e-fad1-42f9-98cd-bda9b9e9e520@github.com> Message-ID: <7xywhVgLftjliRet4MTnmPVpwqUGSGXc0XijVmxx63g=.014cd543-fa82-4d72-a115-f1d958e4265a@github.com> On Fri, 2 Jul 2021 20:36:23 GMT, Coleen Phillimore wrote: > Replace '3' and LogMinObjAlignmentInBytes with LogBytesPerWord (3 for LP64 and 2 for 32 bits). > Hoping git actions will test 32 bits. > Tested with tier1 on all Oracle platforms, but since I didn't change the values for our platforms, no failures expected. This pull request has now been integrated. Changeset: 2dc54864 Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/2dc5486415097bf44e7fca1cf601847fde0eeecb Stats: 11 lines in 4 files changed: 6 ins; 0 del; 5 mod 8267303: Replace MinObjectAlignmentSize usages for non-Java heap objects Reviewed-by: kbarrett, tschatzl, minqi ------------- PR: https://git.openjdk.java.net/jdk/pull/4675 From coleenp at openjdk.java.net Wed Jul 7 13:35:10 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 7 Jul 2021 13:35:10 GMT Subject: RFR: 8269962: SA has unused Hashtable, Dictionary classes In-Reply-To: References: Message-ID: <17Vyll0T_s8sQGC8070vcu9ZeLxX-Nf5Re0skIdi2k8=.99923ba2-abdc-4eb3-b48c-30f1d0783384@github.com> On Wed, 7 Jul 2021 13:27:49 GMT, Coleen Phillimore wrote: > See bug for more details. This code is unused and is soon going to be not in hotspot. > I left in logBytesPerWord() from my previous change because it might be useful in the SA. I could remove it if opinion warrants. > Ran tier1-3. 
I also removed BinaryTreeDictionary left over from Metaspace and CMS. ------------- PR: https://git.openjdk.java.net/jdk/pull/4708 From coleenp at openjdk.java.net Wed Jul 7 13:35:10 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 7 Jul 2021 13:35:10 GMT Subject: RFR: 8269962: SA has unused Hashtable, Dictionary classes Message-ID: See bug for more details. This code is unused and is soon going to be not in hotspot. I left in logBytesPerWord() from my previous change because it might be useful in the SA. I could remove it if opinion warrants. Ran tier1-3. ------------- Commit messages: - 8269962: SA has unused Hashtable, Dictionary classes Changes: https://git.openjdk.java.net/jdk/pull/4708/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4708&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8269962 Stats: 669 lines in 14 files changed: 0 ins; 666 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/4708.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4708/head:pull/4708 PR: https://git.openjdk.java.net/jdk/pull/4708 From iklam at openjdk.java.net Thu Jul 8 04:27:03 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 8 Jul 2021 04:27:03 GMT Subject: RFR: 8270059: Remove KVHashtable Message-ID: There are now only 2 uses of KVHashtable in the HotSpot code. They can be easily rewritten to use ResourceHashtable and ResizeableResourceHashtable. 
------------- Commit messages: - 8270059: Remove KVHashtable Changes: https://git.openjdk.java.net/jdk/pull/4715/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4715&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8270059 Stats: 133 lines in 6 files changed: 6 ins; 110 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/4715.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4715/head:pull/4715 PR: https://git.openjdk.java.net/jdk/pull/4715 From dholmes at openjdk.java.net Thu Jul 8 05:14:46 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 8 Jul 2021 05:14:46 GMT Subject: RFR: 8270059: Remove KVHashtable In-Reply-To: References: Message-ID: <4gK3ZlakhukXMEULmwyJ3sTmfH15njTTXDmSpERHuHc=.3693516e-b5ad-4df2-9475-46fbd57cc5c9@github.com> On Thu, 8 Jul 2021 04:18:22 GMT, Ioi Lam wrote: > There are now only 2 uses of KVHashtable in the HotSpot code. They can be easily rewritten to use ResourceHashtable and ResizeableResourceHashtable. Hi Ioi, This seems okay to me. One query: why did the AsyncLogWriter usage have to specify all 7 template parameters when the ArchiveBuilder usage didn't ?? Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4715 From iklam at openjdk.java.net Thu Jul 8 05:46:49 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 8 Jul 2021 05:46:49 GMT Subject: RFR: 8270059: Remove KVHashtable In-Reply-To: <4gK3ZlakhukXMEULmwyJ3sTmfH15njTTXDmSpERHuHc=.3693516e-b5ad-4df2-9475-46fbd57cc5c9@github.com> References: <4gK3ZlakhukXMEULmwyJ3sTmfH15njTTXDmSpERHuHc=.3693516e-b5ad-4df2-9475-46fbd57cc5c9@github.com> Message-ID: On Thu, 8 Jul 2021 05:11:53 GMT, David Holmes wrote: > Hi Ioi, > > This seems okay to me. One query: why did the AsyncLogWriter usage have to specify all 7 template parameters when the ArchiveBuilder usage didn't ?? That's because AsyncLogWriter uses ResourceHashtable and ArchiveBuilder uses ResizeableResourceHashtable. 
These two templates have different order for the template parameters. It will be all sorted out in https://github.com/iklam/jdk/pull/new/8270061-reorder-resource-hash-params. I will do that after this PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/4715 From dholmes at openjdk.java.net Thu Jul 8 06:56:52 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 8 Jul 2021 06:56:52 GMT Subject: RFR: 8270059: Remove KVHashtable In-Reply-To: References: Message-ID: <0d0dqcoHMDEawcRk8BEbwiYWvfTM2Ez8Fwnu6bQLESE=.4c67f363-ed10-4859-8a7b-55a58a80bdb6@github.com> On Thu, 8 Jul 2021 04:18:22 GMT, Ioi Lam wrote: > There are now only 2 uses of KVHashtable in the HotSpot code. They can be easily rewritten to use ResourceHashtable and ResizeableResourceHashtable. Thanks for clarifying. ------------- PR: https://git.openjdk.java.net/jdk/pull/4715 From ysuenaga at openjdk.java.net Thu Jul 8 12:05:07 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Thu, 8 Jul 2021 12:05:07 GMT Subject: RFR: 8270083: -Wnonnull errors happen with GCC 11.1.1 Message-ID: I attempted to build OpenJDK on Fedora 34 with gcc-11.1.1-3.fc34.x86_64, but I saw following errors: In file included from /home/ysuenaga/github-forked/jdk/src/hotspot/share/runtime/frame.inline.hpp:42, from /home/ysuenaga/github-forked/jdk/src/hotspot/cpu/x86/abstractInterpreter_x86.cpp:29: ------------- Commit messages: - 8270083: -Wnonnull errors happen with GCC 11.1.1 Changes: https://git.openjdk.java.net/jdk/pull/4719/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4719&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8270083 Stats: 42 lines in 9 files changed: 38 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/4719.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4719/head:pull/4719 PR: https://git.openjdk.java.net/jdk/pull/4719 From shade at openjdk.java.net Thu Jul 8 16:57:08 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 8 Jul 
2021 16:57:08 GMT Subject: RFR: 8263375: Support stack watermarks in Zero VM Message-ID: <044dfg2EqIGmN7EM-PX2JkjpYwITRDIiA9qZqPHlFHA=.ead4f0b8-e21c-41eb-9fdb-7954cfb548e0@github.com> Zero VM supports most of GCs. Since JDK 16, Shenandoah uses stack watermarks, so Zero has to support those if Shenandoah+Zero support is to remain. This PR adds the stack watermark support in Zero VM. This should also be useful as other projects, notably Loom, mature and depend on stack watermarks. Zero already calls into Hotspot safepoint machinery to do things, and it seems only the hooks for `on_iteration` and `on_unwind` are missing. AFAICS, Zero only has on-return safepoints, renamed it to be more precise. @fisk, do you see any obvious problems with this patch? Additional testing: - [x] Linux x86_64 Zero `hotspot_gc_shenandoah` now passes - [ ] Linux x86_64 Zero `tier1` ------------- Commit messages: - Revert debugging - 8263375: Support stack watermarks in Zero VM Changes: https://git.openjdk.java.net/jdk/pull/4728/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4728&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8263375 Stats: 50 lines in 5 files changed: 29 ins; 14 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/4728.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4728/head:pull/4728 PR: https://git.openjdk.java.net/jdk/pull/4728 From erikj at openjdk.java.net Thu Jul 8 18:05:55 2021 From: erikj at openjdk.java.net (Erik Joelsson) Date: Thu, 8 Jul 2021 18:05:55 GMT Subject: RFR: 8270083: -Wnonnull errors happen with GCC 11.1.1 In-Reply-To: References: Message-ID: On Thu, 8 Jul 2021 09:42:43 GMT, Yasumasa Suenaga wrote: > I attempted to build OpenJDK on Fedora 34 with gcc-11.1.1-3.fc34.x86_64, but I saw following errors: > > > In file included from /home/ysuenaga/github-forked/jdk/src/hotspot/share/runtime/frame.inline.hpp:42, > from /home/ysuenaga/github-forked/jdk/src/hotspot/cpu/x86/abstractInterpreter_x86.cpp:29: Build change looks 
good. Can't comment on the code changes. ------------- Marked as reviewed by erikj (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4719 From kvn at openjdk.java.net Thu Jul 8 18:29:52 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 8 Jul 2021 18:29:52 GMT Subject: RFR: 8269592: [JVMCI] Optimize c2v_iterateFrames In-Reply-To: References: Message-ID: On Tue, 29 Jun 2021 17:26:58 GMT, Andreas Woess wrote: > Several smaller optimizations and cleanups to JVMCI's iterateFrames: > * Restructure the iterateFrames method for better readability and maintenance, with some parts extracted to helper functions. > * Use vframeStream as the iterator for faster iteration in case not every vframe matches the method filter, so we can avoid creating javaVFrames for skipped vframes. We use vframeStream::asJavaVFrame() to get the current javaVFrame. > * Extended vframeStream::asJavaVFrame() to also work with native frames, so that it works with all java frames returned by vframeStream. This way, native compiledVFrames will just work and do not need extra handling. > Test coverage is provided via a newly added iterateFrames jtreg test that includes a JNI call on the stack. > * Added two trivial getters to vframeStream: vframe_id() and decode_offset(). > These are used together with compiledVFrame::at_scope() to avoid going through vframeStream::asJavaVFrame() and recreating the scope objects for every matched inlined vframe of a compiled frame which would be more expensive than using javaVFrame::sender() (that shares the scope object pool). > * Only resolve the callback interface method once per iterateFrames call. > * Only resolve the Method* of the ResolvedJavaMethods to be matched once per iterateFrames call. > * Only allocate localIsVirtual array if at least one local is virtual (the Java part already expects this). > * Use matched ResolvedJavaMethod instances instead of going through JVMCIEnv::get_jvmci_method, if possible. 
Someone from the Graal group has to review it. I have a concern about relaxing the guarantees in `asJavaVFrame()`. It is shared code and, as I understand it, the only case where we can see a native frame is a call from JVMCI's `iterateFrames()` method. So the relaxation should be JVMCI-specific. ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4625 From sspitsyn at openjdk.java.net Thu Jul 8 19:18:04 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Thu, 8 Jul 2021 19:18:04 GMT Subject: [jdk17] RFR: 8269558: fix of JDK-8252657 missed to update history at the end of JVM TI spec Message-ID: The fix of: 8252657 JVMTI agent is not unloaded when Agent_OnAttach is failed did not update the JVM TI spec history at the end of the document. This PR adds the missing item to the JVM TI spec history. ------------- Commit messages: - 8269558: fix of JDK-8252657 missed to update history at the end of JVM TI spec Changes: https://git.openjdk.java.net/jdk17/pull/233/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk17&pr=233&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8269558 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk17/pull/233.diff Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/233/head:pull/233 PR: https://git.openjdk.java.net/jdk17/pull/233 From never at openjdk.java.net Thu Jul 8 19:18:49 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Thu, 8 Jul 2021 19:18:49 GMT Subject: RFR: 8269592: [JVMCI] Optimize c2v_iterateFrames In-Reply-To: References: Message-ID: On Tue, 29 Jun 2021 17:26:58 GMT, Andreas Woess wrote: > Several smaller optimizations and cleanups to JVMCI's iterateFrames: > * Restructure the iterateFrames method for better readability and maintenance, with some parts extracted to helper functions.
> * Use vframeStream as the iterator for faster iteration in case not every vframe matches the method filter, so we can avoid creating javaVFrames for skipped vframes. We use vframeStream::asJavaVFrame() to get the current javaVFrame. > * Extended vframeStream::asJavaVFrame() to also work with native frames, so that it works with all java frames returned by vframeStream. This way, native compiledVFrames will just work and do not need extra handling. > Test coverage is provided via a newly added iterateFrames jtreg test that includes a JNI call on the stack. > * Added two trivial getters to vframeStream: vframe_id() and decode_offset(). > These are used together with compiledVFrame::at_scope() to avoid going through vframeStream::asJavaVFrame() and recreating the scope objects for every matched inlined vframe of a compiled frame which would be more expensive than using javaVFrame::sender() (that shares the scope object pool). > * Only resolve the callback interface method once per iterateFrames call. > * Only resolve the Method* of the ResolvedJavaMethods to be matched once per iterateFrames call. > * Only allocate localIsVirtual array if at least one local is virtual (the Java part already expects this). > * Use matched ResolvedJavaMethod instances instead of going through JVMCIEnv::get_jvmci_method, if possible. Looks good. ------------- Marked as reviewed by never (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4625 From never at openjdk.java.net Thu Jul 8 19:22:56 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Thu, 8 Jul 2021 19:22:56 GMT Subject: RFR: 8269592: [JVMCI] Optimize c2v_iterateFrames In-Reply-To: References: Message-ID: On Tue, 29 Jun 2021 17:26:58 GMT, Andreas Woess wrote: > Several smaller optimizations and cleanups to JVMCI's iterateFrames: > * Restructure the iterateFrames method for better readability and maintenance, with some parts extracted to helper functions. 
> * Use vframeStream as the iterator for faster iteration in case not every vframe matches the method filter, so we can avoid creating javaVFrames for skipped vframes. We use vframeStream::asJavaVFrame() to get the current javaVFrame. > * Extended vframeStream::asJavaVFrame() to also work with native frames, so that it works with all java frames returned by vframeStream. This way, native compiledVFrames will just work and do not need extra handling. > Test coverage is provided via a newly added iterateFrames jtreg test that includes a JNI call on the stack. > * Added two trivial getters to vframeStream: vframe_id() and decode_offset(). > These are used together with compiledVFrame::at_scope() to avoid going through vframeStream::asJavaVFrame() and recreating the scope objects for every matched inlined vframe of a compiled frame which would be more expensive than using javaVFrame::sender() (that shares the scope object pool). > * Only resolve the callback interface method once per iterateFrames call. > * Only resolve the Method* of the ResolvedJavaMethods to be matched once per iterateFrames call. > * Only allocate localIsVirtual array if at least one local is virtual (the Java part already expects this). > * Use matched ResolvedJavaMethod instances instead of going through JVMCIEnv::get_jvmci_method, if possible. We went through a bunch of review and tests cycles on this in our JDK repos so this is the result of that, so I've approved it. @dean-long saw some of this progress as well. So you want an ifdef JVMCI around the is_native part of the guarantees? Why are those guarantees anyway? Seems like they should just be asserts to me. 
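The guarantee-versus-assert question raised here can be illustrated with a minimal standalone sketch. It assumes the usual HotSpot convention that `guarantee` is checked in every build while `assert` is compiled in only for debug builds. The macros `TOY_GUARANTEE` and `TOY_ASSERT`, the `TOY_DEBUG_BUILD` switch, and the counter are hypothetical stand-ins for illustration, not the HotSpot definitions.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical counter: records how many conditions were actually evaluated.
static int toy_checks_evaluated = 0;

inline bool toy_count(bool cond) {
  ++toy_checks_evaluated;
  return cond;
}

// Always-on check: evaluated in every build, aborts on failure.
#define TOY_GUARANTEE(cond, msg)                               \
  do {                                                         \
    if (!toy_count((cond))) {                                  \
      std::fprintf(stderr, "guarantee failed: %s\n", msg);     \
      std::abort();                                            \
    }                                                          \
  } while (0)

// Debug-only check: compiled away unless TOY_DEBUG_BUILD is defined.
#ifdef TOY_DEBUG_BUILD
#define TOY_ASSERT(cond, msg) TOY_GUARANTEE(cond, msg)
#else
#define TOY_ASSERT(cond, msg) ((void)0)
#endif

// Runs one check of each kind and reports how many were evaluated so far.
inline int toy_run_checks() {
  TOY_GUARANTEE(1 + 1 == 2, "always checked");
  TOY_ASSERT(2 + 2 == 4, "checked only in debug builds");
  return toy_checks_evaluated;
}
```

In this sketch a build without `TOY_DEBUG_BUILD` evaluates only the guarantee, which is the cost argument for turning a guarantee on a hot shared path into an assert when the condition is an internal invariant rather than something a product build must enforce.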
------------- PR: https://git.openjdk.java.net/jdk/pull/4625 From dcubed at openjdk.java.net Thu Jul 8 19:33:57 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 8 Jul 2021 19:33:57 GMT Subject: [jdk17] RFR: 8269558: fix of JDK-8252657 missed to update history at the end of JVM TI spec In-Reply-To: References: Message-ID: On Thu, 8 Jul 2021 19:11:45 GMT, Serguei Spitsyn wrote: > The fix of: > 8252657 JVMTI agent is not unloaded when Agent_OnAttach is failed > did not update the JVM TI spec history at the end of document. > This PR adds missed item to the JVM TI spec history. Thumbs up. This looks like a trivial fix to me. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk17/pull/233 From aw at openjdk.java.net Thu Jul 8 19:44:57 2021 From: aw at openjdk.java.net (Andreas Woess) Date: Thu, 8 Jul 2021 19:44:57 GMT Subject: RFR: 8269592: [JVMCI] Optimize c2v_iterateFrames In-Reply-To: References: Message-ID: On Thu, 8 Jul 2021 18:26:34 GMT, Vladimir Kozlov wrote: > I have concern about relaxing guarantees in `asJavaVFrame()`. It is shared code and the only case when we can see native frame is call from JVMCI's method `iterateFrames()` as I understand. Then relaxation should be JVMCI specific. @vnkozlov The changes to `asJavaVFrame()` seemed like a good idea since `vframeStream` iterates all javaVFrames including native frames but, without these changes, fails the `_frame.is_compiled_frame()` guarantee for native frames. This is ok if you filter the frames by method first, as is the case with current usages, but you won't be able to use `asJavaVFrame()` on all vframes in the stream. If the change is considered problematic/too risky, we can work around it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4625 From cjplummer at openjdk.java.net Thu Jul 8 19:48:54 2021 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Thu, 8 Jul 2021 19:48:54 GMT Subject: [jdk17] RFR: 8269558: fix of JDK-8252657 missed to update history at the end of JVM TI spec In-Reply-To: References: Message-ID: On Thu, 8 Jul 2021 19:11:45 GMT, Serguei Spitsyn wrote: > The fix of: > 8252657 JVMTI agent is not unloaded when Agent_OnAttach is failed > did not update the JVM TI spec history at the end of document. > This PR adds missed item to the JVM TI spec history. Marked as reviewed by cjplummer (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk17/pull/233 From coleenp at openjdk.java.net Thu Jul 8 19:49:57 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 8 Jul 2021 19:49:57 GMT Subject: RFR: 8270059: Remove KVHashtable In-Reply-To: References: Message-ID: On Thu, 8 Jul 2021 04:18:22 GMT, Ioi Lam wrote: > There are now only 2 uses of KVHashtable in the HotSpot code. They can be easily rewritten to use ResourceHashtable and ResizeableResourceHashtable. Nice cleanup! src/hotspot/share/cds/archiveBuilder.hpp line 183: > 181: public: > 182: bool do_entry(address key, const SourceObjInfo& value) { > 183: delete value.ref(); One of the things that I thought that should change with the _src_obj_table is that SourceObjInfo should have a destructor that should call this delete. But then the table would have to take SourceObjInfo elements and not copy from a stack object. I think ResourceHashtable's destructor calls delete on all the nodes. Anyway, that can be a future improvement if you've followed what I mean. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4715 From coleenp at openjdk.java.net Thu Jul 8 20:04:05 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 8 Jul 2021 20:04:05 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning Message-ID: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! Tested with tier1-3. ------------- Commit messages: - Remove Amalloc_D because it's the same as Amalloc_4. - 8253779: Amalloc may be wasting space by overaligning Changes: https://git.openjdk.java.net/jdk/pull/4732/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4732&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253779 Stats: 34 lines in 7 files changed: 3 ins; 18 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/4732.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4732/head:pull/4732 PR: https://git.openjdk.java.net/jdk/pull/4732 From iklam at openjdk.java.net Thu Jul 8 20:08:53 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 8 Jul 2021 20:08:53 GMT Subject: RFR: 8270059: Remove KVHashtable In-Reply-To: References: Message-ID: On Thu, 8 Jul 2021 19:46:32 GMT, Coleen Phillimore wrote: >> There are now only 2 uses of KVHashtable in the HotSpot code. They can be easily rewritten to use ResourceHashtable and ResizeableResourceHashtable. 
> > src/hotspot/share/cds/archiveBuilder.hpp line 183: > >> 181: public: >> 182: bool do_entry(address key, const SourceObjInfo& value) { >> 183: delete value.ref(); > > One of the things that I thought that should change with the _src_obj_table is that SourceObjInfo should have a destructor that should call this delete. But then the table would have to take SourceObjInfo elements and not copy from a stack object. I think ResourceHashtable's destructor calls delete on all the nodes. Anyway, that can be a future improvement if you've followed what I mean. Yes, I plan to add calls to the elements' destructors (perhaps both ~K() and ~V()?) in a separate PR. This would be similar to what `GrowableArray` does when the array elements are removed, or when the array is destroyed. To do this, we need to get `SourceObjInfo` to work with C++ Move Semantics so that the `_ref` field will not be freed more than once. I am still trying to learn how that works .... ------------- PR: https://git.openjdk.java.net/jdk/pull/4715 From dlong at openjdk.java.net Thu Jul 8 20:46:56 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 8 Jul 2021 20:46:56 GMT Subject: RFR: 8270083: -Wnonnull errors happen with GCC 11.1.1 In-Reply-To: References: Message-ID: On Thu, 8 Jul 2021 09:42:43 GMT, Yasumasa Suenaga wrote: > I attempted to build OpenJDK on Fedora 34 with gcc-11.1.1-3.fc34.x86_64, but I saw following errors: > > > In file included from /home/ysuenaga/github-forked/jdk/src/hotspot/share/runtime/frame.inline.hpp:42, > from /home/ysuenaga/github-forked/jdk/src/hotspot/cpu/x86/abstractInterpreter_x86.cpp:29: Marked as reviewed by dlong (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/4719 From dlong at openjdk.java.net Thu Jul 8 20:51:53 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 8 Jul 2021 20:51:53 GMT Subject: RFR: 8269592: [JVMCI] Optimize c2v_iterateFrames In-Reply-To: References: Message-ID: On Thu, 8 Jul 2021 19:19:37 GMT, Tom Rodriguez wrote: >> Several smaller optimizations and cleanups to JVMCI's iterateFrames: >> * Restructure the iterateFrames method for better readability and maintenance, with some parts extracted to helper functions. >> * Use vframeStream as the iterator for faster iteration in case not every vframe matches the method filter, so we can avoid creating javaVFrames for skipped vframes. We use vframeStream::asJavaVFrame() to get the current javaVFrame. >> * Extended vframeStream::asJavaVFrame() to also work with native frames, so that it works with all java frames returned by vframeStream. This way, native compiledVFrames will just work and do not need extra handling. >> Test coverage is provided via a newly added iterateFrames jtreg test that includes a JNI call on the stack. >> * Added two trivial getters to vframeStream: vframe_id() and decode_offset(). >> These are used together with compiledVFrame::at_scope() to avoid going through vframeStream::asJavaVFrame() and recreating the scope objects for every matched inlined vframe of a compiled frame which would be more expensive than using javaVFrame::sender() (that shares the scope object pool). >> * Only resolve the callback interface method once per iterateFrames call. >> * Only resolve the Method* of the ResolvedJavaMethods to be matched once per iterateFrames call. >> * Only allocate localIsVirtual array if at least one local is virtual (the Java part already expects this). >> * Use matched ResolvedJavaMethod instances instead of going through JVMCIEnv::get_jvmci_method, if possible. 
> > We went through a bunch of review and tests cycles on this in our JDK repos so this is the result of that, so I've approved it. @dean-long saw some of this progress as well. > So you want an ifdef JVMCI around the is_native part of the guarantees? Why are those guarantees anyway? Seems like they should just be asserts to me. @tkrodriguez I think I introduced those guarantees, and now I agree they should probably be asserts. ------------- PR: https://git.openjdk.java.net/jdk/pull/4625 From kvn at openjdk.java.net Thu Jul 8 21:40:50 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 8 Jul 2021 21:40:50 GMT Subject: RFR: 8269592: [JVMCI] Optimize c2v_iterateFrames In-Reply-To: References: Message-ID: On Tue, 29 Jun 2021 17:26:58 GMT, Andreas Woess wrote: > Several smaller optimizations and cleanups to JVMCI's iterateFrames: > * Restructure the iterateFrames method for better readability and maintenance, with some parts extracted to helper functions. > * Use vframeStream as the iterator for faster iteration in case not every vframe matches the method filter, so we can avoid creating javaVFrames for skipped vframes. We use vframeStream::asJavaVFrame() to get the current javaVFrame. > * Extended vframeStream::asJavaVFrame() to also work with native frames, so that it works with all java frames returned by vframeStream. This way, native compiledVFrames will just work and do not need extra handling. > Test coverage is provided via a newly added iterateFrames jtreg test that includes a JNI call on the stack. > * Added two trivial getters to vframeStream: vframe_id() and decode_offset(). > These are used together with compiledVFrame::at_scope() to avoid going through vframeStream::asJavaVFrame() and recreating the scope objects for every matched inlined vframe of a compiled frame which would be more expensive than using javaVFrame::sender() (that shares the scope object pool). > * Only resolve the callback interface method once per iterateFrames call. 
> * Only resolve the Method* of the ResolvedJavaMethods to be matched once per iterateFrames call. > * Only allocate localIsVirtual array if at least one local is virtual (the Java part already expects this). > * Use matched ResolvedJavaMethod instances instead of going through JVMCIEnv::get_jvmci_method, if possible. Please link Graal's PR to this RFE so I can see your reviews. I am fine with converting guarantees to asserts. ------------- PR: https://git.openjdk.java.net/jdk/pull/4625 From kbarrett at openjdk.java.net Thu Jul 8 21:41:47 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 8 Jul 2021 21:41:47 GMT Subject: RFR: 8270083: -Wnonnull errors happen with GCC 11.1.1 In-Reply-To: References: Message-ID: On Thu, 8 Jul 2021 09:42:43 GMT, Yasumasa Suenaga wrote: > I attempted to build OpenJDK on Fedora 34 with gcc-11.1.1-3.fc34.x86_64, but I saw following errors: > > > In file included from /home/ysuenaga/github-forked/jdk/src/hotspot/share/runtime/frame.inline.hpp:42, > from /home/ysuenaga/github-forked/jdk/src/hotspot/cpu/x86/abstractInterpreter_x86.cpp:29: This _might_ be good enough as a short-term workaround, but it should not be considered a long-term fix. The underlying problem really needs to be addressed properly. ------------- PR: https://git.openjdk.java.net/jdk/pull/4719 From sspitsyn at openjdk.java.net Thu Jul 8 21:52:54 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Thu, 8 Jul 2021 21:52:54 GMT Subject: [jdk17] RFR: 8269558: fix of JDK-8252657 missed to update history at the end of JVM TI spec In-Reply-To: References: Message-ID: On Thu, 8 Jul 2021 19:11:45 GMT, Serguei Spitsyn wrote: > The fix of: > 8252657 JVMTI agent is not unloaded when Agent_OnAttach is failed > did not update the JVM TI spec history at the end of document. > This PR adds missed item to the JVM TI spec history. Dan and Chris, thank you for quick review! 
------------- PR: https://git.openjdk.java.net/jdk17/pull/233 From dlong at openjdk.java.net Thu Jul 8 22:30:50 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 8 Jul 2021 22:30:50 GMT Subject: RFR: 8270083: -Wnonnull errors happen with GCC 11.1.1 In-Reply-To: References: Message-ID: On Thu, 8 Jul 2021 09:42:43 GMT, Yasumasa Suenaga wrote: > I attempted to build OpenJDK on Fedora 34 with gcc-11.1.1-3.fc34.x86_64, but I saw following errors: > > > In file included from /home/ysuenaga/github-forked/jdk/src/hotspot/share/runtime/frame.inline.hpp:42, > from /home/ysuenaga/github-forked/jdk/src/hotspot/cpu/x86/abstractInterpreter_x86.cpp:29: I agree, it would be good to fix the underlying problem without causing a performance regression. Maybe storing the register encoding as a field inside class RegisterImpl and using constexpr would work, or is there a better solution? ------------- PR: https://git.openjdk.java.net/jdk/pull/4719 From kbarrett at openjdk.java.net Thu Jul 8 22:52:56 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 8 Jul 2021 22:52:56 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Thu, 8 Jul 2021 22:17:07 GMT, Kim Barrett wrote: >> src/hotspot/share/memory/arena.hpp line 159: >> >>> 157: assert((x & (sizeof(char*)-1)) == 0, "misaligned size"); >>> 158: debug_only(if (UseMallocOnly) return malloc(x);) >>> 159: if (!check_for_overflow(x, "Arena::Amalloc_4", alloc_failmode)) >> >> [pre-existing] Missing brackets for the multiline `if` is contrary to the style guide. Similarly in Amalloc. > > [pre-existing: I can't comment directly on line 161, so adding here.] > `if (_hwm + x > _max)` has a risk of overflow if x is excessively large. Better is `if (_max - _hwm < x)`. Similarly in Amalloc. [pre-existing. I can't comment directly on line 161, so adding here.] 
The `if` to range check `x` could be an `else if`. Similarly in Amalloc. ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From kbarrett at openjdk.java.net Thu Jul 8 22:52:56 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 8 Jul 2021 22:52:56 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Thu, 8 Jul 2021 22:16:23 GMT, Kim Barrett wrote: >> Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. >> I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! >> Tested with tier1-3. > > src/hotspot/share/memory/arena.hpp line 159: > >> 157: assert((x & (sizeof(char*)-1)) == 0, "misaligned size"); >> 158: debug_only(if (UseMallocOnly) return malloc(x);) >> 159: if (!check_for_overflow(x, "Arena::Amalloc_4", alloc_failmode)) > > [pre-existing] Missing brackets for the multiline `if` is contrary to the style guide. Similarly in Amalloc. [pre-existing: I can't comment directly on line 161, so adding here.] `if (_hwm + x > _max)` has a risk of overflow if x is excessively large. Better is `if (_max - _hwm < x)`. Similarly in Amalloc. 
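The overflow concern above can be shown with a small sketch. It models the high-water mark and chunk end as unsigned integers so that the wraparound is well defined; the `ToyArena` type and its member names are hypothetical and only mirror `_hwm` and `_max` from the review comment. It also assumes `std::uintptr_t` and `std::size_t` have the same width, as on common platforms.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an arena chunk: hwm is the current high-water
// mark and max is the end of the chunk, with the invariant hwm <= max.
struct ToyArena {
  std::uintptr_t hwm;
  std::uintptr_t max;

  // Naive form, mirroring `if (_hwm + x > _max)`: for a very large x the
  // sum wraps around and the request is wrongly reported as fitting.
  bool fits_naive(std::size_t x) const { return !(hwm + x > max); }

  // Rearranged form, mirroring `if (_max - _hwm < x)`: the subtraction
  // cannot wrap while hwm <= max holds, so huge requests are rejected.
  bool fits_safe(std::size_t x) const { return !(max - hwm < x); }
};
```

With `hwm = 1000` and `max = 2000`, both forms accept a 500-byte request, but only the rearranged form rejects a request of `SIZE_MAX` bytes.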
------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From kbarrett at openjdk.java.net Thu Jul 8 22:52:55 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 8 Jul 2021 22:52:55 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning In-Reply-To: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Thu, 8 Jul 2021 19:56:33 GMT, Coleen Phillimore wrote: > Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. > I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! > Tested with tier1-3. Changes requested by kbarrett (Reviewer). src/hotspot/share/memory/arena.hpp line 138: > 136: > 137: // Fast allocate in the arena. Common case aligns to the size of long which is 64 bits > 138: // on both 32 and 64 bit platforms. Required for atomic long operations on 32 bits. s/long/jlong/ because C++ long is not 64 bits on some platforms (Windows!). src/hotspot/share/memory/arena.hpp line 140: > 138: // on both 32 and 64 bit platforms. Required for atomic long operations on 32 bits. > 139: void* Amalloc(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) { > 140: assert(is_power_of_2(ARENA_AMALLOC_ALIGNMENT) , "should be a power of 2"); [pre-existing] I think this could now be a static_assert. src/hotspot/share/memory/arena.hpp line 155: > 153: > 154: // Allocate in the arena, assuming the size has been aligned to size of pointer, which > 155: // is 4 bytes on 32 bits, hence the name. 
`Amalloc_4` is a horrible name for this, because this is allocating pointer-aligned and sized values, and even enforces that the size is appropriately pointer aligned. Maybe better would be something like `AmallocP`? Though it can also be used to allocate `size_t` and the like that are non-pointers. src/hotspot/share/memory/arena.hpp line 157: > 155: // is 4 bytes on 32 bits, hence the name. > 156: void *Amalloc_4(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) { > 157: assert((x & (sizeof(char*)-1)) == 0, "misaligned size"); This ought to be using `is_aligned`. src/hotspot/share/memory/arena.hpp line 159: > 157: assert((x & (sizeof(char*)-1)) == 0, "misaligned size"); > 158: debug_only(if (UseMallocOnly) return malloc(x);) > 159: if (!check_for_overflow(x, "Arena::Amalloc_4", alloc_failmode)) [pre-existing] Missing brackets for the multiline `if` is contrary to the style guide. Similarly in Amalloc. src/hotspot/share/memory/arena.hpp line 171: > 169: // Allocate with 'double' alignment. It is 8 bytes on sparc. > 170: // In other cases Amalloc_D() should be the same as Amalloc_4(). > 171: void* Amalloc_D(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) { I'm happy to see this go. That description is about as clear as mud; I have no idea what the intended semantics are for this name. Oh, I see. Before SPARC removal (JDK-8244224), this had some seriously stale conditional code for 32bit SPARC. I still don't understand what that was about, but don't really care either. Yay code deletion. 
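Several of the comments above (use `is_aligned`, turn the power-of-2 check into a `static_assert`, do not rely on C++ `long` being 64 bits) can be sketched together. The helpers below are hypothetical local versions written for illustration, not the HotSpot utilities of similar names, and `kToyArenaAlignment` is an assumed stand-in for `ARENA_AMALLOC_ALIGNMENT`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if v has exactly one bit set.
constexpr bool toy_is_power_of_2(std::size_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Hypothetical helper: true if size is a multiple of a power-of-2 alignment.
constexpr bool toy_is_aligned(std::size_t size, std::size_t alignment) {
  return (size & (alignment - 1)) == 0;
}

// Hypothetical helper: rounds size up to the next multiple of alignment.
constexpr std::size_t toy_align_up(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Use a fixed-width 64-bit type rather than `long`, which is 32 bits on
// some platforms (for example 64-bit Windows).
constexpr std::size_t kToyArenaAlignment = sizeof(std::int64_t);

// Compile-time version of the "should be a power of 2" check.
static_assert(toy_is_power_of_2(kToyArenaAlignment), "should be a power of 2");
```

Because the alignment is a compile-time constant, the power-of-2 check costs nothing at run time, and a named predicate such as `toy_is_aligned(x, sizeof(char*))` reads more clearly than an open-coded mask test.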
------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From kvn at openjdk.java.net Thu Jul 8 22:59:48 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 8 Jul 2021 22:59:48 GMT Subject: RFR: 8269592: [JVMCI] Optimize c2v_iterateFrames In-Reply-To: References: Message-ID: On Tue, 29 Jun 2021 17:26:58 GMT, Andreas Woess wrote: > Several smaller optimizations and cleanups to JVMCI's iterateFrames: > * Restructure the iterateFrames method for better readability and maintenance, with some parts extracted to helper functions. > * Use vframeStream as the iterator for faster iteration in case not every vframe matches the method filter, so we can avoid creating javaVFrames for skipped vframes. We use vframeStream::asJavaVFrame() to get the current javaVFrame. > * Extended vframeStream::asJavaVFrame() to also work with native frames, so that it works with all java frames returned by vframeStream. This way, native compiledVFrames will just work and do not need extra handling. > Test coverage is provided via a newly added iterateFrames jtreg test that includes a JNI call on the stack. > * Added two trivial getters to vframeStream: vframe_id() and decode_offset(). > These are used together with compiledVFrame::at_scope() to avoid going through vframeStream::asJavaVFrame() and recreating the scope objects for every matched inlined vframe of a compiled frame which would be more expensive than using javaVFrame::sender() (that shares the scope object pool). > * Only resolve the callback interface method once per iterateFrames call. > * Only resolve the Method* of the ResolvedJavaMethods to be matched once per iterateFrames call. > * Only allocate localIsVirtual array if at least one local is virtual (the Java part already expects this). > * Use matched ResolvedJavaMethod instances instead of going through JVMCIEnv::get_jvmci_method, if possible. 
My concern comes from the fact that we don't treat a `native` frame as a `java` frame: [frame.cpp#L183](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/frame.cpp#L183). And `frame::is_compiled_frame()` specifically checks that the compiled method is a Java method. On the other hand, we create a `compiledVFrame` for native call wrappers; we check only that it is compiled code: [vframe.cpp#L81](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/vframe.cpp#L81). And `compiledVFrame` is a subclass of `javaVFrame`. That looks inconsistent to me, but it is not related to these changes.

Okay, if Tom and Dean think it is a safe change for the `asJavaVFrame()` code, I will approve the changes.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4625

From ysuenaga at openjdk.java.net Fri Jul 9 00:31:52 2021
From: ysuenaga at openjdk.java.net (Yasumasa Suenaga)
Date: Fri, 9 Jul 2021 00:31:52 GMT
Subject: RFR: 8270083: -Wnonnull errors happen with GCC 11.1.1
In-Reply-To: References: Message-ID:

On Thu, 8 Jul 2021 09:42:43 GMT, Yasumasa Suenaga wrote:

> I attempted to build OpenJDK on Fedora 34 with gcc-11.1.1-3.fc34.x86_64, but I saw the following errors:
>
>
> In file included from /home/ysuenaga/github-forked/jdk/src/hotspot/share/runtime/frame.inline.hpp:42,
> from /home/ysuenaga/github-forked/jdk/src/hotspot/cpu/x86/abstractInterpreter_x86.cpp:29:

It would be better if we could treat registers as a class **normally**. A similar change was made for markWord in [JDK-8229258](https://bugs.openjdk.java.net/browse/JDK-8229258). Should we make that change now?
------------- PR: https://git.openjdk.java.net/jdk/pull/4719 From coleenp at openjdk.java.net Fri Jul 9 00:38:50 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 00:38:50 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Thu, 8 Jul 2021 22:34:43 GMT, Kim Barrett wrote: >> [pre-existing: I can't comment directly on line 161, so adding here.] >> `if (_hwm + x > _max)` has a risk of overflow if x is excessively large. Better is `if (_max - _hwm < x)`. Similarly in Amalloc. > > [pre-existing. I can't comment directly on line 161, so adding here.] > The `if` to range check `x` could be an `else if`. Similarly in Amalloc. 1, 3. Ok, turned to else if with appropriate brackets. 2. Both of these places have a call check_for_overflow that checks that so rearranging the code leads to casts since _hwm and _max are char* and x is size_t. ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Fri Jul 9 00:38:50 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 00:38:50 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Thu, 8 Jul 2021 22:33:29 GMT, Kim Barrett wrote: >> Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. >> I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! >> Tested with tier1-3. 
> > src/hotspot/share/memory/arena.hpp line 140: > >> 138: // on both 32 and 64 bit platforms. Required for atomic long operations on 32 bits. >> 139: void* Amalloc(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) { >> 140: assert(is_power_of_2(ARENA_AMALLOC_ALIGNMENT) , "should be a power of 2"); > > [pre-existing] I think this could now be a static_assert. Changing it so. Why is there capital STATIC_ASSERT and not capital static_assert ? > src/hotspot/share/memory/arena.hpp line 155: > >> 153: >> 154: // Allocate in the arena, assuming the size has been aligned to size of pointer, which >> 155: // is 4 bytes on 32 bits, hence the name. > > `Amalloc_4` is a horrible name for this, because this is allocating pointer-aligned and sized values, and even enforces that the size is appropriately pointer aligned. Maybe better would be something like `AmallocP`? Though it can also be used to allocate `size_t` and the like that are non-pointers. I agree that Amalloc_4 is a bad name. I thought of Amalloc_ptr but it's not good either. Amalloc_naturally_aligned.... too long. ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Fri Jul 9 01:25:15 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 01:25:15 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: > Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. > I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. 
That's not a great name either. I'm open to suggestions! > Tested with tier1-3. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Improvements ala Kim ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4732/files - new: https://git.openjdk.java.net/jdk/pull/4732/files/fcb50228..db91c5ae Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4732&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4732&range=00-01 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/4732.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4732/head:pull/4732 PR: https://git.openjdk.java.net/jdk/pull/4732 From minqi at openjdk.java.net Fri Jul 9 02:51:08 2021 From: minqi at openjdk.java.net (Yumin Qi) Date: Fri, 9 Jul 2021 02:51:08 GMT Subject: RFR: 8267281: Investigate calling MetaspaceShared::link_and_cleanup_shared_classes for jcmd dynamic_dump Message-ID: Hi, Please review When using 'jcmd VM.cds dynamic_dump' to dump dynamic archive, we did not call MetaspaceShared::link_and_cleanup_shared_classes, which is linking linkable shared classes before dump. The classes should be those loaded but not yet linked or loaded during verification. It also will regenerate the lambda form invoker holder classes (those recorded in static archive plus loaded during app run time). For static dump, a separate process spawned to dump the static archive, and for dynamic dump without using jcmd, after dump the process will exit, so this function only called once. With jcmd, we can do multiple dumps to same live process (See bug 8264735) so we need to check if calling this function is safe (do not change runtime data consistency). The lambda form invoker holder classes will be generated every time this function is called either. The function name is renamed to link_shared_classes, since the cleanup work is no longer done by this function. 
Tests: tier1,tier3,tier4 (going on to finish, without failure found). Thanks Yumin ------------- Commit messages: - 8267281: Investigate calling MetaspaceShared::link_and_cleanup_shared_classes for jcmd dynamic_dump Changes: https://git.openjdk.java.net/jdk/pull/4736/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4736&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267281 Stats: 10 lines in 6 files changed: 1 ins; 0 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/4736.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4736/head:pull/4736 PR: https://git.openjdk.java.net/jdk/pull/4736 From dlong at openjdk.java.net Fri Jul 9 03:51:51 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 9 Jul 2021 03:51:51 GMT Subject: RFR: 8269592: [JVMCI] Optimize c2v_iterateFrames In-Reply-To: References: Message-ID: <5_DLmkZ8CMFVmLtxuuIzc9y5PNR-PflZBBsFQMw_YuQ=.61ae2e8c-dd90-4ae2-85a1-3fcd1fc16f5b@github.com> On Tue, 29 Jun 2021 17:26:58 GMT, Andreas Woess wrote: > Several smaller optimizations and cleanups to JVMCI's iterateFrames: > * Restructure the iterateFrames method for better readability and maintenance, with some parts extracted to helper functions. > * Use vframeStream as the iterator for faster iteration in case not every vframe matches the method filter, so we can avoid creating javaVFrames for skipped vframes. We use vframeStream::asJavaVFrame() to get the current javaVFrame. > * Extended vframeStream::asJavaVFrame() to also work with native frames, so that it works with all java frames returned by vframeStream. This way, native compiledVFrames will just work and do not need extra handling. > Test coverage is provided via a newly added iterateFrames jtreg test that includes a JNI call on the stack. > * Added two trivial getters to vframeStream: vframe_id() and decode_offset(). 
> These are used together with compiledVFrame::at_scope() to avoid going through vframeStream::asJavaVFrame() and recreating the scope objects for every matched inlined vframe of a compiled frame which would be more expensive than using javaVFrame::sender() (that shares the scope object pool). > * Only resolve the callback interface method once per iterateFrames call. > * Only resolve the Method* of the ResolvedJavaMethods to be matched once per iterateFrames call. > * Only allocate localIsVirtual array if at least one local is virtual (the Java part already expects this). > * Use matched ResolvedJavaMethod instances instead of going through JVMCIEnv::get_jvmci_method, if possible. Marked as reviewed by dlong (Reviewer). I agree, it's confusing that different APIs can't agree on the exact status of compiled native frames. But as Vladimir pointed out, we support them in vframe::new_vframe, and we also support them in vframeStreamCommon::fill_from_compiled_native_frame(), so it seems OK for asJavaVFrame() to support them as well. ------------- PR: https://git.openjdk.java.net/jdk/pull/4625 From kbarrett at openjdk.java.net Fri Jul 9 04:07:51 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 9 Jul 2021 04:07:51 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: <4IWDH8jhv8Dq9QAvH-6NK4qogNBNE2698yx5CV07exk=.73819cbb-f928-4644-acf1-896cab947632@github.com> On Fri, 9 Jul 2021 01:25:15 GMT, Coleen Phillimore wrote: >> Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. >> I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. 
That's not a great name either. I'm open to suggestions! >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Improvements ala Kim I replied to some "outdated" conversation threads. I'm not sure how that will show up in the UI or in the Skara emails. GitHub seems to be aggressively hiding some of the conversations in the "Files changed" view, which seems pretty unhelpful. ------------- Changes requested by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4732 From dlong at openjdk.java.net Fri Jul 9 04:07:49 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 9 Jul 2021 04:07:49 GMT Subject: RFR: 8270083: -Wnonnull errors happen with GCC 11.1.1 In-Reply-To: References: Message-ID: On Fri, 9 Jul 2021 00:28:38 GMT, Yasumasa Suenaga wrote: >> I attempted to build OpenJDK on Fedora 34 with gcc-11.1.1-3.fc34.x86_64, but I saw following errors: >> >> >> In file included from /home/ysuenaga/github-forked/jdk/src/hotspot/share/runtime/frame.inline.hpp:42, >> from /home/ysuenaga/github-forked/jdk/src/hotspot/cpu/x86/abstractInterpreter_x86.cpp:29: > > It is better if we can treat registers as a class **normally**. Similar change happens for markWord in [JDK-8229258](https://bugs.openjdk.java.net/browse/JDK-8229258). Should we change like now? @YaSuenag I'm in favor of a permanent solution, assuming there's no regression in footprint or code quality. See https://bugs.openjdk.java.net/browse/JDK-8269122 for a related improvement. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4719 From kbarrett at openjdk.java.net Fri Jul 9 04:07:53 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 9 Jul 2021 04:07:53 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 00:36:17 GMT, Coleen Phillimore wrote: >> src/hotspot/share/memory/arena.hpp line 140: >> >>> 138: // on both 32 and 64 bit platforms. Required for atomic long operations on 32 bits. >>> 139: void* Amalloc(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) { >>> 140: assert(is_power_of_2(ARENA_AMALLOC_ALIGNMENT) , "should be a power of 2"); >> >> [pre-existing] I think this could now be a static_assert. > > Changing it so. Why is there capital STATIC_ASSERT and not capital static_assert ? Is there now a preference? `static_assert` is new in C++11, `STATIC_ASSERT` is a macro that provides somewhat similar functionality (but without the informative message) that works before C++11. I think we should prefer `static_assert` now; the only (arguable) downside is that (until C++17) the informative message is required. >> src/hotspot/share/memory/arena.hpp line 155: >> >>> 153: >>> 154: // Allocate in the arena, assuming the size has been aligned to size of pointer, which >>> 155: // is 4 bytes on 32 bits, hence the name. >> >> `Amalloc_4` is a horrible name for this, because this is allocating pointer-aligned and sized values, and even enforces that the size is appropriately pointer aligned. Maybe better would be something like `AmallocP`? Though it can also be used to allocate `size_t` and the like that are non-pointers. > > I agree that Amalloc_4 is a bad name. I thought of Amalloc_ptr but it's not good either. Amalloc_naturally_aligned.... too long. Perhaps the naming ought to be Amalloc => AmallocL (long) and Amalloc4 => Amalloc? 
But even if you agree with that or some other better naming scheme, such renaming probably ought to be a separate thing. ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From kbarrett at openjdk.java.net Fri Jul 9 04:07:53 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 9 Jul 2021 04:07:53 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: <4Eh8LTAhgcIkNHKnwOyREE7b3JGt0EBljM3knyEfnpI=.797af79f-e414-41e2-8f8e-090ac6a3a6ec@github.com> On Thu, 8 Jul 2021 21:56:43 GMT, Kim Barrett wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Improvements ala Kim > > src/hotspot/share/memory/arena.hpp line 138: > >> 136: >> 137: // Fast allocate in the arena. Common case aligns to the size of long which is 64 bits >> 138: // on both 32 and 64 bit platforms. Required for atomic long operations on 32 bits. > > s/long/jlong/ because C++ long is not 64 bits on some platforms (Windows!). There were two "long" that need to be "jlong"; only one was changed. > src/hotspot/share/memory/arena.hpp line 157: > >> 155: // is 4 bytes on 32 bits, hence the name. >> 156: void *Amalloc_4(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) { >> 157: assert((x & (sizeof(char*)-1)) == 0, "misaligned size"); > > This ought to be using `is_aligned`. The change to BytesPerWord for the alignment seems correct. 
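
To illustrate the `static_assert` and `jlong` points from this thread, here is a minimal sketch under stated assumptions. `jlong_sketch` uses `std::int64_t` as a stand-in for `jlong` (the JNI headers are not included), and `kArenaAlignmentSketch`, `is_power_of_2_sketch`, and `align_up_sketch` are hypothetical names for this example, not the identifiers in arena.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: std::int64_t stands in for jlong; C++ 'long' is only 32 bits on
// some platforms (e.g. 64-bit Windows), which is why the comment should say jlong.
typedef std::int64_t jlong_sketch;

// Hypothetical alignment constant, playing the role of the arena alignment.
constexpr std::size_t kArenaAlignmentSketch = sizeof(jlong_sketch);

constexpr bool is_power_of_2_sketch(std::size_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// C++11 static_assert: checked at compile time; the message is mandatory until C++17.
static_assert(sizeof(jlong_sketch) == 8, "jlong stand-in must be 64 bits");
static_assert(is_power_of_2_sketch(kArenaAlignmentSketch), "should be a power of 2");

// Round 'size' up to the next multiple of 'alignment' (a power of 2).
constexpr std::size_t align_up_sketch(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}
```

Because the alignment is a compile-time constant, the power-of-2 check costs nothing at run time when written this way, unlike a run-time `assert` inside the allocation path.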
------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From kbarrett at openjdk.java.net Fri Jul 9 04:07:54 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 9 Jul 2021 04:07:54 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 00:34:57 GMT, Coleen Phillimore wrote: >> [pre-existing. I can't comment directly on line 161, so adding here.] >> The `if` to range check `x` could be an `else if`. Similarly in Amalloc. > > 1, 3. Ok, turned to else if with appropriate brackets. > 2. Both of these places have a call check_for_overflow that checks that so rearranging the code leads to casts since _hwm and _max are char* and x is size_t. Instead of my previously suggested `if (_max - _hwm < x)`, use `if (pointer_delta(_max, _hwm, 1) < x)` to avoid the signed vs unsigned comparison warning. Then I think `check_for_overflow` is just not needed. (Sorry, I forgot to mention previously that I think `check_for_overflow` can be eliminated.) ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From kbarrett at openjdk.java.net Fri Jul 9 04:11:53 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 9 Jul 2021 04:11:53 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: <75-Urc75oS7EgzMbU_AhI9HYTPRMZLLwA18JdXjuVIE=.1b27719c-3d99-4a95-91d1-d3b3d4868b19@github.com> On Fri, 9 Jul 2021 01:25:15 GMT, Coleen Phillimore wrote: >> Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. 
>> I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Improvements ala Kim Yuck! The handling of the replies to "outdated" conversations seems pretty horrible. In "conversation" mode they remain attached to the original change, while in "files changed" mode they don't show up at all. Sorry this seems to be kind of a mess. I also have no idea why these got tagged as "outdated". ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From kbarrett at openjdk.java.net Fri Jul 9 04:35:50 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 9 Jul 2021 04:35:50 GMT Subject: RFR: 8270083: -Wnonnull errors happen with GCC 11.1.1 In-Reply-To: References: Message-ID: On Thu, 8 Jul 2021 09:42:43 GMT, Yasumasa Suenaga wrote: > I attempted to build OpenJDK on Fedora 34 with gcc-11.1.1-3.fc34.x86_64, but I saw following errors: > > > In file included from /home/ysuenaga/github-forked/jdk/src/hotspot/share/runtime/frame.inline.hpp:42, > from /home/ysuenaga/github-forked/jdk/src/hotspot/cpu/x86/abstractInterpreter_x86.cpp:29: I'm conditionally approving this as a temporary workaround. A followup RFE to improve things should be filed. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4719 From kbarrett at openjdk.java.net Fri Jul 9 04:35:50 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 9 Jul 2021 04:35:50 GMT Subject: RFR: 8270083: -Wnonnull errors happen with GCC 11.1.1 In-Reply-To: References: Message-ID: On Fri, 9 Jul 2021 00:28:38 GMT, Yasumasa Suenaga wrote: > It is better if we can treat registers as a class **normally**. Similar change happens for markWord in [JDK-8229258](https://bugs.openjdk.java.net/browse/JDK-8229258). Should we change like now? 
The potential difficulty with doing that is the same as for markWord - see [JDK-8235362](https://bugs.openjdk.java.net/browse/JDK-8235362). A different option might be to make the various categories of Register be enum classes without any defined enumerators. That wouldn't support the existing XXXRegisterImpl derivation from AbstractRegisterImpl. I'm not sure that relationship is needed; it may not even be good. The only use I can find is in the various assert_different_registers, and that could be done differently. (The numerous overloads could also probably be simplified using variadic templates.) ------------- PR: https://git.openjdk.java.net/jdk/pull/4719 From stuefe at openjdk.java.net Fri Jul 9 04:59:52 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 9 Jul 2021 04:59:52 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: <75-Urc75oS7EgzMbU_AhI9HYTPRMZLLwA18JdXjuVIE=.1b27719c-3d99-4a95-91d1-d3b3d4868b19@github.com> References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> <75-Urc75oS7EgzMbU_AhI9HYTPRMZLLwA18JdXjuVIE=.1b27719c-3d99-4a95-91d1-d3b3d4868b19@github.com> Message-ID: On Fri, 9 Jul 2021 04:08:44 GMT, Kim Barrett wrote: > Yuck! The handling of the replies to "outdated" conversations seems pretty horrible. In "conversation" mode they remain attached to the original change, while in "files changed" mode they don't show up at all. Sorry this seems to be kind of a mess. I also have no idea why these got tagged as "outdated". I think they get outdated once a new push is made. I agree that this is confusing. The way we work, the "outdated" tag has no meaning. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From stuefe at openjdk.java.net Fri Jul 9 05:40:54 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 9 Jul 2021 05:40:54 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 03:31:56 GMT, Kim Barrett wrote: >> I agree that Amalloc_4 is a bad name. I thought of Amalloc_ptr but it's not good either. Amalloc_naturally_aligned.... too long. > > Perhaps the naming ought to be Amalloc => AmallocL (long) and Amalloc4 => Amalloc? But even if you agree with that or some other better naming scheme, such renaming probably ought to be a separate thing. I agree. I feel like this should really be the default "Amalloc" (pointer-sized alignment) and Amalloc should really be specific, e.g. "Amalloc64". ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From stuefe at openjdk.java.net Fri Jul 9 05:40:52 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 9 Jul 2021 05:40:52 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 01:25:15 GMT, Coleen Phillimore wrote: >> Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. >> I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! >> Tested with tier1-3. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Improvements ala Kim I like this fix and the suggested naming cleanups. LGTM. Since you are talking about potential improvements: I was always a bit unhappy with this arena code. E.g. I disliked how, instead of letting the Arena itself deal with its chunk chain, it exposed the chain internals and let the Marks modify the chunks from outside. This always seemed wrong to me. There may be more potential improvements: - _hwm and _max should be properties of the chunk, not the arena, and then arguably could be smaller typed offsets instead of pointers. - I am not sure why we need to keep track of both _first and _current chunk in the arenas. I think one pointer would suffice: just holding the last added chunk, which would serve as starting point to traverse the chain. I may miss something here though, maybe traversal order matters somewhere. I also disliked how an Arena would always create its first chunk right when constructed, instead of delaying chunk allocation to the first allocation. You always pay upfront even if you don't allocate from the Arena. If one does all of the above, Arena could maybe shrink to just one member (the top chunk pointer), and then it could be embedded as a value object into Thread instead of having to dynamically create and destroy it. Would be slightly simpler and save one pointer dereferencing when accessing the resource area. You'd also save NMT registration of the Arenas themselves. Cheers, Thomas (I also had vague ideas of re-using Metaspace arena code for these hotspot arenas, which share many similarities. But I am not sure if or when I find the time to play with that idea.) src/hotspot/share/memory/arena.hpp line 36: > 34: > 35: // The byte alignment to be used by Arena::Amalloc. > 36: #define ARENA_AMALLOC_ALIGNMENT BytesPerLong We don't store types in here which need alignment larger than 64bit? eg long double? 
.. I searched and found no cases where they lived inside an arena, so it's probably fine. src/hotspot/share/memory/arena.hpp line 140: > 138: // on both 32 and 64 bit platforms. Required for atomic jlong operations on 32 bits. > 139: void* Amalloc(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) { > 140: STATIC_ASSERT(is_power_of_2(ARENA_AMALLOC_ALIGNMENT)); Move this static assert up to the definition of ARENA_AMALLOC_ALIGNMENT? ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4732 From ysuenaga at openjdk.java.net Fri Jul 9 05:59:53 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Fri, 9 Jul 2021 05:59:53 GMT Subject: RFR: 8270083: -Wnonnull errors happen with GCC 11.1.1 In-Reply-To: References: Message-ID: On Thu, 8 Jul 2021 18:02:55 GMT, Erik Joelsson wrote: >> I attempted to build OpenJDK on Fedora 34 with gcc-11.1.1-3.fc34.x86_64, but I saw following errors: >> >> >> In file included from /home/ysuenaga/github-forked/jdk/src/hotspot/share/runtime/frame.inline.hpp:42, >> from /home/ysuenaga/github-forked/jdk/src/hotspot/cpu/x86/abstractInterpreter_x86.cpp:29: > > Build change looks good. Can't comment on the code changes. Thanks @erikj79 @dean-long @kimbarrett to approve this change! I filed a followup RFE as [JDK-8270140](https://bugs.openjdk.java.net/browse/JDK-8270140). I will integrate this change later. ------------- PR: https://git.openjdk.java.net/jdk/pull/4719 From mbaesken at openjdk.java.net Fri Jul 9 11:17:32 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Fri, 9 Jul 2021 11:17:32 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v3] In-Reply-To: References: Message-ID: <42XR7tr8DLQJfVW41iHaTjARZqBYB_QvLP9cq6CHa8I=.80538a01-23e2-405c-a8af-d308c234fef3@github.com> > Hello, please review this PR; it extend the OSContainer API in order to also support the pids controller of cgroups. 
> > I noticed that unlike the other controllers "cpu", "cpuset", "cpuacct", "memory" on some older Linux distros (SLES 12.1, RHEL 7.1) the pids controller might not be there (or not fully supported) so it was added as optional , see the coding > > > if (!cg_infos[PIDS_IDX]._data_complete) { > log_debug(os, container)("Optional cgroup v1 pids subsystem not found"); > // keep the other controller info, pids is optional > } Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: test and small adjustments suggested by Severin ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4518/files - new: https://git.openjdk.java.net/jdk/pull/4518/files/afd7bf61..f5527143 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4518&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4518&range=01-02 Stats: 103 lines in 7 files changed: 92 ins; 2 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/4518.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4518/head:pull/4518 PR: https://git.openjdk.java.net/jdk/pull/4518 From aw at openjdk.java.net Fri Jul 9 11:27:52 2021 From: aw at openjdk.java.net (Andreas Woess) Date: Fri, 9 Jul 2021 11:27:52 GMT Subject: RFR: 8269592: [JVMCI] Optimize c2v_iterateFrames In-Reply-To: References: Message-ID: On Tue, 29 Jun 2021 17:26:58 GMT, Andreas Woess wrote: > Several smaller optimizations and cleanups to JVMCI's iterateFrames: > * Restructure the iterateFrames method for better readability and maintenance, with some parts extracted to helper functions. > * Use vframeStream as the iterator for faster iteration in case not every vframe matches the method filter, so we can avoid creating javaVFrames for skipped vframes. We use vframeStream::asJavaVFrame() to get the current javaVFrame. > * Extended vframeStream::asJavaVFrame() to also work with native frames, so that it works with all java frames returned by vframeStream. 
This way, native compiledVFrames will just work and do not need extra handling. > Test coverage is provided via a newly added iterateFrames jtreg test that includes a JNI call on the stack. > * Added two trivial getters to vframeStream: vframe_id() and decode_offset(). > These are used together with compiledVFrame::at_scope() to avoid going through vframeStream::asJavaVFrame() and recreating the scope objects for every matched inlined vframe of a compiled frame which would be more expensive than using javaVFrame::sender() (that shares the scope object pool). > * Only resolve the callback interface method once per iterateFrames call. > * Only resolve the Method* of the ResolvedJavaMethods to be matched once per iterateFrames call. > * Only allocate localIsVirtual array if at least one local is virtual (the Java part already expects this). > * Use matched ResolvedJavaMethod instances instead of going through JVMCIEnv::get_jvmci_method, if possible. > My concern comes from fact that we don't treat native frame as as java frame: frame.cpp#L183. And frame::is_compiled_frame() specifically checks that compiled method is Java method. Indeed, there's an inconsistency between `frame` and `vframe` w.r.t. native frames: * `frame`: `is_java_frame()` and `is_compiled_frame()` return false * `vframe`: `is_java_frame()` and `is_compiled_frame()` return true (since they're represented as `compiledVFrame`) I can change the guarantees to asserts. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4625 From mbaesken at openjdk.java.net Fri Jul 9 11:38:56 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Fri, 9 Jul 2021 11:38:56 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v2] In-Reply-To: <4TD_2jJOnOQ6-D2eCFdJzF3tQg_H-Vm6IrFcyX_xSIw=.028fbe3f-bc04-4b9c-8b35-a6a450a80f7f@github.com> References: <4TD_2jJOnOQ6-D2eCFdJzF3tQg_H-Vm6IrFcyX_xSIw=.028fbe3f-bc04-4b9c-8b35-a6a450a80f7f@github.com> Message-ID: On Tue, 29 Jun 2021 08:21:51 GMT, Severin Gehwolf wrote: > This looks pretty good now. Looking forward to seeing container tests for this new code. Hi Severin, I made some adjustments following your suggestions. I added Docker-based tests for pids-limit (with a limit and also with an unlimited value). I noticed that on our ppc64le-based Linux, the message "WARNING: Your kernel does not support pids limit capabilities or the cgroup is not mounted. PIDs limit discarded." shows up, and the docker "--pids-limit" limitation does not work because of this. So I had to take this into account. ------------- PR: https://git.openjdk.java.net/jdk/pull/4518 From aw at openjdk.java.net Fri Jul 9 12:15:24 2021 From: aw at openjdk.java.net (Andreas Woess) Date: Fri, 9 Jul 2021 12:15:24 GMT Subject: RFR: 8269592: [JVMCI] Optimize c2v_iterateFrames In-Reply-To: References: Message-ID: On Tue, 29 Jun 2021 17:26:58 GMT, Andreas Woess wrote: > Several smaller optimizations and cleanups to JVMCI's iterateFrames: > * Restructure the iterateFrames method for better readability and maintenance, with some parts extracted to helper functions. > * Use vframeStream as the iterator for faster iteration in case not every vframe matches the method filter, so we can avoid creating javaVFrames for skipped vframes. We use vframeStream::asJavaVFrame() to get the current javaVFrame.
> * Extended vframeStream::asJavaVFrame() to also work with native frames, so that it works with all java frames returned by vframeStream. This way, native compiledVFrames will just work and do not need extra handling. > Test coverage is provided via a newly added iterateFrames jtreg test that includes a JNI call on the stack. > * Added two trivial getters to vframeStream: vframe_id() and decode_offset(). > These are used together with compiledVFrame::at_scope() to avoid going through vframeStream::asJavaVFrame() and recreating the scope objects for every matched inlined vframe of a compiled frame which would be more expensive than using javaVFrame::sender() (that shares the scope object pool). > * Only resolve the callback interface method once per iterateFrames call. > * Only resolve the Method* of the ResolvedJavaMethods to be matched once per iterateFrames call. > * Only allocate localIsVirtual array if at least one local is virtual (the Java part already expects this). > * Use matched ResolvedJavaMethod instances instead of going through JVMCIEnv::get_jvmci_method, if possible. I converted the guarantees in asJavaVFrame() to asserts. Let me know if any of the asserts should remain guarantees. ------------- PR: https://git.openjdk.java.net/jdk/pull/4625 From aw at openjdk.java.net Fri Jul 9 12:15:22 2021 From: aw at openjdk.java.net (Andreas Woess) Date: Fri, 9 Jul 2021 12:15:22 GMT Subject: RFR: 8269592: [JVMCI] Optimize c2v_iterateFrames [v2] In-Reply-To: References: Message-ID: > Several smaller optimizations and cleanups to JVMCI's iterateFrames: > * Restructure the iterateFrames method for better readability and maintenance, with some parts extracted to helper functions. > * Use vframeStream as the iterator for faster iteration in case not every vframe matches the method filter, so we can avoid creating javaVFrames for skipped vframes. We use vframeStream::asJavaVFrame() to get the current javaVFrame. 
> * Extended vframeStream::asJavaVFrame() to also work with native frames, so that it works with all java frames returned by vframeStream. This way, native compiledVFrames will just work and do not need extra handling. > Test coverage is provided via a newly added iterateFrames jtreg test that includes a JNI call on the stack. > * Added two trivial getters to vframeStream: vframe_id() and decode_offset(). > These are used together with compiledVFrame::at_scope() to avoid going through vframeStream::asJavaVFrame() and recreating the scope objects for every matched inlined vframe of a compiled frame which would be more expensive than using javaVFrame::sender() (that shares the scope object pool). > * Only resolve the callback interface method once per iterateFrames call. > * Only resolve the Method* of the ResolvedJavaMethods to be matched once per iterateFrames call. > * Only allocate localIsVirtual array if at least one local is virtual (the Java part already expects this). > * Use matched ResolvedJavaMethod instances instead of going through JVMCIEnv::get_jvmci_method, if possible. Andreas Woess has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains five new commits since the last revision: - Convert guarantee()s in asJavaVFrame() to assert()s. - [JVMCI] Test iterateFrames with native frames. - [JVMCI] Optimize iterateFrames. - Add support for native frames to vframeStreamCommon::asJavaVFrame(). - Add getters to vframeStreamCommon. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4625/files - new: https://git.openjdk.java.net/jdk/pull/4625/files/b7b029c0..1d53feb5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4625&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4625&range=00-01 Stats: 12 lines in 3 files changed: 2 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/4625.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4625/head:pull/4625 PR: https://git.openjdk.java.net/jdk/pull/4625 From stuefe at openjdk.java.net Fri Jul 9 13:28:53 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 9 Jul 2021 13:28:53 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 01:25:15 GMT, Coleen Phillimore wrote: >> Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. >> I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Improvements ala Kim One thought, maybe it should keep the alignment adjustable with a diagnostic switch in debug. At least for a little while. Since we may uncover hidden overwrite issues with the reduced alignment. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From sgehwolf at openjdk.java.net Fri Jul 9 13:44:56 2021 From: sgehwolf at openjdk.java.net (Severin Gehwolf) Date: Fri, 9 Jul 2021 13:44:56 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v2] In-Reply-To: References: <4TD_2jJOnOQ6-D2eCFdJzF3tQg_H-Vm6IrFcyX_xSIw=.028fbe3f-bc04-4b9c-8b35-a6a450a80f7f@github.com> Message-ID: On Fri, 9 Jul 2021 11:35:34 GMT, Matthias Baesken wrote: > > This looks pretty good now. Looking forward to seeing container tests for this new code. > > Hi Severin, I made some adjustments following your suggestions. > I added Docker-based tests for pids-limit (with a limit and also with an unlimited value). > I noticed that on our ppc64le-based Linux, the message "WARNING: Your kernel does not support pids limit capabilities or the cgroup is not mounted. PIDs limit discarded." shows up, and the docker "--pids-limit" limitation does not work because of this. > So I had to take this into account. OK. Please also add a test on the hotspot side. You may want to add relevant parts to `TestMisc.java`. ------------- PR: https://git.openjdk.java.net/jdk/pull/4518 From mbaesken at openjdk.java.net Fri Jul 9 13:56:52 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Fri, 9 Jul 2021 13:56:52 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v2] In-Reply-To: References: <4TD_2jJOnOQ6-D2eCFdJzF3tQg_H-Vm6IrFcyX_xSIw=.028fbe3f-bc04-4b9c-8b35-a6a450a80f7f@github.com> Message-ID: On Fri, 9 Jul 2021 13:42:15 GMT, Severin Gehwolf wrote: > OK. Please also add a test on the hotspot side. You may want to add relevant parts to `TestMisc.java`. Thanks for the suggestion, I will look into `TestMisc.java`.
------------- PR: https://git.openjdk.java.net/jdk/pull/4518 From coleenp at openjdk.java.net Fri Jul 9 14:01:55 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 14:01:55 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 03:56:03 GMT, Kim Barrett wrote: >> Changing it so. Why is there capital STATIC_ASSERT and not capital static_assert ? Is there now a preference? > > `static_assert` is new in C++11, `STATIC_ASSERT` is a macro that provides somewhat similar functionality (but without the informative message) that works before C++11. I think we should prefer `static_assert` now; the only (arguable) downside is that (until C++17) the informative message is required. I was sort of underwhelmed with the usefulness of this: runtime/synchronizer.cpp: static_assert(is_power_of_2(NINFLATIONLOCKS), "must be"); I'll change it to lower case static_assert ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Fri Jul 9 14:01:56 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 14:01:56 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: <4Eh8LTAhgcIkNHKnwOyREE7b3JGt0EBljM3knyEfnpI=.797af79f-e414-41e2-8f8e-090ac6a3a6ec@github.com> References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> <4Eh8LTAhgcIkNHKnwOyREE7b3JGt0EBljM3knyEfnpI=.797af79f-e414-41e2-8f8e-090ac6a3a6ec@github.com> Message-ID: On Fri, 9 Jul 2021 03:57:40 GMT, Kim Barrett wrote: >> src/hotspot/share/memory/arena.hpp line 157: >> >>> 155: // is 4 bytes on 32 bits, hence the name. 
>>> 156: void *Amalloc_4(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) { >>> 157: assert((x & (sizeof(char*)-1)) == 0, "misaligned size"); >> >> This ought to be using `is_aligned`. > > The change to BytesPerWord for the alignment seems correct. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Fri Jul 9 14:13:56 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 14:13:56 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: <-0PoF8zolz52ppH9tTNTi0-TzKvcexnyiX5LfG4zlok=.afdee3a9-b86a-4d44-8db6-faa1d6b7528b@github.com> On Fri, 9 Jul 2021 05:06:10 GMT, Thomas Stuefe wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Improvements ala Kim > > src/hotspot/share/memory/arena.hpp line 36: > >> 34: >> 35: // The byte alignment to be used by Arena::Amalloc. >> 36: #define ARENA_AMALLOC_ALIGNMENT BytesPerLong > > We don't store types in here which need alignment larger than 64bit? eg long double? > > .. I searched and found no cases where they lived inside an arena, so it's probably fine. I don't think so. I'm not sure how to search for that if there are any. > src/hotspot/share/memory/arena.hpp line 140: > >> 138: // on both 32 and 64 bit platforms. Required for atomic jlong operations on 32 bits. >> 139: void* Amalloc(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) { >> 140: STATIC_ASSERT(is_power_of_2(ARENA_AMALLOC_ALIGNMENT)); > > Move this static assert up to the definition of ARENA_AMALLOC_ALIGNMENT? ok, can do that. It looks funny but it compiles. 
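For readers following along, here is a minimal sketch of what placing the check directly under the definition could look like. It is an illustration only: the `_SKETCH` macro and the `is_power_of_2_sketch` helper are hypothetical stand-ins for HotSpot's `ARENA_AMALLOC_ALIGNMENT` and `is_power_of_2`, and `sizeof(long long)` is assumed to play the role of `BytesPerLong`.

```cpp
#include <cstddef>

// Hypothetical stand-in for HotSpot's is_power_of_2(); not the real utility.
constexpr bool is_power_of_2_sketch(std::size_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Stand-in for ARENA_AMALLOC_ALIGNMENT; sizeof(long long) plays the role
// of BytesPerLong in this sketch.
#define ARENA_AMALLOC_ALIGNMENT_SKETCH sizeof(long long)

// The compile-time check sits at namespace scope, right below the definition
// it guards, rather than inside every function that uses the alignment.
// The message argument is mandatory before C++17.
static_assert(is_power_of_2_sketch(ARENA_AMALLOC_ALIGNMENT_SKETCH),
              "arena alignment must be a power of 2");
```

Keeping the assertion next to the macro means a bad value is rejected once, at the point of definition, regardless of which allocation function is compiled first.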
------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Fri Jul 9 14:13:55 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 14:13:55 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: <6c12jO8VFd2ED1txsDZD4_ZgeoG5QUQiBMzTgNgfwx8=.9157490f-77a3-4164-b418-6c5759abd207@github.com> On Fri, 9 Jul 2021 01:25:15 GMT, Coleen Phillimore wrote: >> Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. >> I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Improvements ala Kim 2 comments in this view. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Fri Jul 9 14:13:57 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 14:13:57 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: <4Eh8LTAhgcIkNHKnwOyREE7b3JGt0EBljM3knyEfnpI=.797af79f-e414-41e2-8f8e-090ac6a3a6ec@github.com> References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> <4Eh8LTAhgcIkNHKnwOyREE7b3JGt0EBljM3knyEfnpI=.797af79f-e414-41e2-8f8e-090ac6a3a6ec@github.com> Message-ID: <2mlvbonBJeMmBKBJ3lNAEhI8grsNaVCnB6zjwaExP-0=.de2b3c4a-e588-4191-8f14-0c597fd4047c@github.com> On Fri, 9 Jul 2021 03:52:59 GMT, Kim Barrett wrote: >> src/hotspot/share/memory/arena.hpp line 138: >> >>> 136: >>> 137: // Fast allocate in the arena. Common case aligns to the size of long which is 64 bits >>> 138: // on both 32 and 64 bit platforms. Required for atomic long operations on 32 bits. >> >> s/long/jlong/ because C++ long is not 64 bits on some platforms (Windows!). > > There were two "long" that need to be "jlong"; only one was changed. ok, got it. ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Fri Jul 9 14:23:50 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 14:23:50 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 03:40:31 GMT, Kim Barrett wrote: >> 1, 3. Ok, turned to else if with appropriate brackets. >> 2. Both of these places have a call check_for_overflow that checks that so rearranging the code leads to casts since _hwm and _max are char* and x is size_t. 
> > Instead of my previously suggested `if (_max - _hwm < x)`, use `if (pointer_delta(_max, _hwm, 1) < x)` to avoid the signed vs unsigned comparison warning. Then I think `check_for_overflow` is just not needed. (Sorry, I forgot to mention previously that I think `check_for_overflow` can be eliminated.) check_for_overflow does more than just the pointer comparison. It conditionally calls vm_exit_out_of_memory. I don't want to copy this into both of the Amalloc functions, and it has a nice name vs. squinting and trying to understand why "if (pointer_delta(_max, _hwm, 1) < x)" does the same thing. As is, this is readable and I don't want to change it. ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Fri Jul 9 14:30:55 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 14:30:55 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 14:27:21 GMT, Coleen Phillimore wrote: >> I agree. I feel like this should really be the default "Amalloc" (pointer-sized alignment) and Amalloc should really be specific, e.g. "Amalloc64". > > For some reason, the new me doesn't want to change all of these calls, but there are _only_ 45 Amalloc and now 38 Amalloc_4s. > I agree with both of you though. Amalloc should be the default pointer sized alignment, and existing Amalloc should be jlong aligned. So Amalloc_4 => Amalloc and Amalloc => Amalloc64. I suspect the Amalloc64s could really be Amalloc but that only matters on 32 bit platforms which we don't support. They can be changed by supporters of 32 bit platforms though. It seems like this PR can do that renaming to get this done. The reason I worked on this enhancement is to get it out of the backlog, so I don't want to add another fairly simple RFE to the backlog.
------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Fri Jul 9 14:30:55 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 14:30:55 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 05:11:12 GMT, Thomas Stuefe wrote: >> Perhaps the naming ought to be Amalloc => AmallocL (long) and Amalloc4 => Amalloc? But even if you agree with that or some other better naming scheme, such renaming probably ought to be a separate thing. > > I agree. I feel like this should really be the default "Amalloc" (pointer-sized alignment) and Amalloc should really be specific, e.g. "Amalloc64". For some reason, the new me doesn't want to change all of these calls, but there are _only_ 45 Amalloc and now 38 Amalloc_4s. I agree with both of you though. Amalloc should be the default pointer sized alignment, and existing Amalloc should be jlong aligned. So Amalloc_4 => Amalloc and Amalloc => Amalloc64. I suspect the Amalloc64s could really be Amalloc but that only matters on 32 bit platforms which we don't support. They can be changed by supporters of 32 bit platforms though. ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Fri Jul 9 14:50:19 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 14:50:19 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v3] In-Reply-To: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: > Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. 
I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. > I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! > Tested with tier1-3. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Move static_assert and fix jlong. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4732/files - new: https://git.openjdk.java.net/jdk/pull/4732/files/db91c5ae..220c3cb8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4732&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4732&range=01-02 Stats: 4 lines in 1 file changed: 1 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/4732.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4732/head:pull/4732 PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Fri Jul 9 14:50:20 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 14:50:20 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 01:25:15 GMT, Coleen Phillimore wrote: >> Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. >> I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! >> Tested with tier1-3. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Improvements ala Kim I'll try to answer the main comments in one message. I hope I haven't lost any. 1. I don't want to add a switch to keep the original alignment. I think we should be confident that this change is correct or not make it! It's the beginning of the release. We have time to find any bugs that are unlikely to fall out. 2. Funny that metaspace was originally inherited from Arena, as you know, but I'd rather see a simpler implementation for Arena if rewritten than using all of Metaspace. Metaspace needs to deallocate from the middle and Arena doesn't. Arena uses malloc which is better imo except for on one platform. Rewriting Arena isn't something that's on the top of our list, please only file a bug if you think you're going to do it @tstuefe :) 3. I'll file another RFE for renaming (ignore some comment above that I can't find now). ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From stuefe at openjdk.java.net Fri Jul 9 15:02:54 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 9 Jul 2021 15:02:54 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v2] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 14:43:37 GMT, Coleen Phillimore wrote: > I'll try to answer the main comments in one message. I hope I haven't lost any. > > 1. I don't want to add a switch to keep the original alignment. I think we should be confident that this change is correct or not make it! It's the beginning of the release. We have time to find any bugs that are unlikely to fall out. Okay, sure. > 2. Funny that metaspace was originally inherited from Arena, as you know, but I'd rather see a simpler implementation for Arena if rewritten than using all of Metaspace. 
Metaspace needs to deallocate from the middle and Arena doesn't. Arena uses malloc which is better imo except for on one platform. Rewriting Arena isn't something that's on the top of our list, please only file a bug if you think you're going to do it @tstuefe :) Of course, sorry, I was not suggesting someone else should do it. I was just interested in your thoughts. > 3. I'll file another RFE for renaming (ignore some comment above that I can't find now). The change still looks good to me. ..Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Fri Jul 9 15:10:51 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 15:10:51 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v3] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 14:50:19 GMT, Coleen Phillimore wrote: >> Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. >> I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Move static_assert and fix jlong. Thanks Thomas! Since Arenas haven't been a problem we haven't actually thought about them, other than this RFE and now the new RFE JDK-8270179.
------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From kvn at openjdk.java.net Fri Jul 9 15:50:56 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 9 Jul 2021 15:50:56 GMT Subject: RFR: 8269592: [JVMCI] Optimize c2v_iterateFrames [v2] In-Reply-To: References: Message-ID: On Fri, 9 Jul 2021 12:15:22 GMT, Andreas Woess wrote: >> Several smaller optimizations and cleanups to JVMCI's iterateFrames: >> * Restructure the iterateFrames method for better readability and maintenance, with some parts extracted to helper functions. >> * Use vframeStream as the iterator for faster iteration in case not every vframe matches the method filter, so we can avoid creating javaVFrames for skipped vframes. We use vframeStream::asJavaVFrame() to get the current javaVFrame. >> * Extended vframeStream::asJavaVFrame() to also work with native frames, so that it works with all java frames returned by vframeStream. This way, native compiledVFrames will just work and do not need extra handling. >> Test coverage is provided via a newly added iterateFrames jtreg test that includes a JNI call on the stack. >> * Added two trivial getters to vframeStream: vframe_id() and decode_offset(). >> These are used together with compiledVFrame::at_scope() to avoid going through vframeStream::asJavaVFrame() and recreating the scope objects for every matched inlined vframe of a compiled frame which would be more expensive than using javaVFrame::sender() (that shares the scope object pool). >> * Only resolve the callback interface method once per iterateFrames call. >> * Only resolve the Method* of the ResolvedJavaMethods to be matched once per iterateFrames call. >> * Only allocate localIsVirtual array if at least one local is virtual (the Java part already expects this). >> * Use matched ResolvedJavaMethod instances instead of going through JVMCIEnv::get_jvmci_method, if possible. 
> > Andreas Woess has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains five new commits since the last revision: > > - Convert guarantee()s in asJavaVFrame() to assert()s. > - [JVMCI] Test iterateFrames with native frames. > - [JVMCI] Optimize iterateFrames. > - Add support for native frames to vframeStreamCommon::asJavaVFrame(). > - Add getters to vframeStreamCommon. Approved. Please, run mach5 testing before push. Tom or Dean can help. Thank you, Tom, for Graal's PR link. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4625 From iklam at openjdk.java.net Fri Jul 9 18:18:20 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 9 Jul 2021 18:18:20 GMT Subject: RFR: 8270059: Remove KVHashtable [v2] In-Reply-To: References: Message-ID: > There are now only 2 uses of KVHashtable in the HotSpot code. They can be easily rewritten to use ResourceHashtable and ResizeableResourceHashtable. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into 8270059-remove-kvhashtable - 8270059: Remove KVHashtable ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4715/files - new: https://git.openjdk.java.net/jdk/pull/4715/files/78dc95d3..934e8347 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4715&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4715&range=00-01 Stats: 2916 lines in 152 files changed: 1605 ins; 899 del; 412 mod Patch: https://git.openjdk.java.net/jdk/pull/4715.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4715/head:pull/4715 PR: https://git.openjdk.java.net/jdk/pull/4715 From coleenp at openjdk.java.net Fri Jul 9 19:07:57 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 19:07:57 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v3] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: <83Tp5XZlDwUxXUedlBNOtlIDQJSXs_-00P3mC4vMlYM=.6a66b22b-dea9-49c8-a5d3-10ede4b0626c@github.com> On Fri, 9 Jul 2021 14:28:50 GMT, Coleen Phillimore wrote: >> For some reason, the new me doesn't want to change all of these calls, but there are _only_ 45 Amalloc and now 38 Amalloc_4s. >> I agree with both of you though. Amalloc should be the default pointer sized alignment, and existing Amalloc should be jlong aligned. So Amalloc_4 => Amalloc and Amalloc => Amalloc64. I suspect the Amalloc64s could really be Amalloc but that only matters on 32 bit platforms which we don't support. They can be changed by supporters of 32 bit platforms though. > > It seems like this can do that renaming to get this done. The reason I worked on this enhancement is to get it out of the backlog, so I don't want to add another fairly simple RFE to the backlog. I filed JDK-8270179 for the follow-on renaming. 
It's almost trivial but not quite. ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From iklam at openjdk.java.net Fri Jul 9 19:31:56 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 9 Jul 2021 19:31:56 GMT Subject: RFR: 8270059: Remove KVHashtable In-Reply-To: <0d0dqcoHMDEawcRk8BEbwiYWvfTM2Ez8Fwnu6bQLESE=.4c67f363-ed10-4859-8a7b-55a58a80bdb6@github.com> References: <0d0dqcoHMDEawcRk8BEbwiYWvfTM2Ez8Fwnu6bQLESE=.4c67f363-ed10-4859-8a7b-55a58a80bdb6@github.com> Message-ID: <9bc4X4ym-iYi_8hGK6Gcl5w24k-W5TddN509OtJdcLs=.75640ba5-e00c-41bb-b7a6-bd16d6f29956@github.com> On Thu, 8 Jul 2021 06:53:33 GMT, David Holmes wrote: >> There are now only 2 uses of KVHashtable in the HotSpot code. They can be easily rewritten to use ResourceHashtable and ResizeableResourceHashtable. > > Thanks for clarifying. Thanks @dholmes-ora and @coleenp for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/4715 From iklam at openjdk.java.net Fri Jul 9 19:31:57 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 9 Jul 2021 19:31:57 GMT Subject: Integrated: 8270059: Remove KVHashtable In-Reply-To: References: Message-ID: <2h4qvnte49_xUTR6vXYpRWX3sVRScPdP0WLO7pxpOB0=.fd3d0c4a-1833-4fe7-885a-44ee7bea2091@github.com> On Thu, 8 Jul 2021 04:18:22 GMT, Ioi Lam wrote: > There are now only 2 uses of KVHashtable in the HotSpot code. They can be easily rewritten to use ResourceHashtable and ResizeableResourceHashtable. This pull request has now been integrated. 
Changeset: d6c0f5fa Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/d6c0f5fa22d2fc07a4d8957d7ad005c03df9f8d2 Stats: 133 lines in 6 files changed: 6 ins; 110 del; 17 mod 8270059: Remove KVHashtable Reviewed-by: dholmes, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/4715 From kbarrett at openjdk.java.net Fri Jul 9 20:08:55 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 9 Jul 2021 20:08:55 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v3] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 13:59:01 GMT, Coleen Phillimore wrote: >> `static_assert` is new in C++11, `STATIC_ASSERT` is a macro that provides somewhat similar functionality (but without the informative message) that works before C++11. I think we should prefer `static_assert` now; the only (arguable) downside is that (until C++17) the informative message is required. > > I was sort of underwhelmed with the usefulness of this: > runtime/synchronizer.cpp: static_assert(is_power_of_2(NINFLATIONLOCKS), "must be"); > I'll change it to lower case static_assert Kind of like the required message for our assert macro. static_assert was fixed in C++17. ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From kbarrett at openjdk.java.net Fri Jul 9 20:08:55 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 9 Jul 2021 20:08:55 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v3] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 14:20:58 GMT, Coleen Phillimore wrote: >> Instead of my previously suggested `if (_max - _hwm < x)`, use `if (pointer_delta(_max, _hwm, 1) < x)` to avoid the signed vs unsigned comparison warning. Then I think `check_for_overflow` is just not needed. 
(Sorry, I forgot to mention previously that I think `check_for_overflow` can be eliminated.) > > check_for_overflow does more than just the pointer comparison. It conditionally calls vm_exit_out_of_memory. I don't want to copy this into both of the Amalloc function, and it has a nice name vs. squinting and trying to understand why "if (pointer_delta(_max, _hwm, 1) < x)" does the same thing. As is, this is readable and I don't want to change it. Such a use of pointer_delta is one of its canonical use-cases, as discussed in the comment describing that function. grow does everything that check_for_overflow does, except correctly. check_for_overflow can incorrectly report an error if the current chunk happens to be near the end of the address space, even though malloc can succeed. The current code with the call to check_for_overflow is incorrect, slower, and larger than it needs to be. It's got nothing at all going for it except the "whence" string, which is not enough to make up for the other problems. If that feature is important, add it to grow(). ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From kbarrett at openjdk.java.net Fri Jul 9 20:23:55 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 9 Jul 2021 20:23:55 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v3] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 14:50:19 GMT, Coleen Phillimore wrote: >> Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. >> I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! >> Tested with tier1-3. 
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Move static_assert and fix jlong.

Changes requested by kbarrett (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/4732

From kbarrett at openjdk.java.net Fri Jul 9 22:38:55 2021
From: kbarrett at openjdk.java.net (Kim Barrett)
Date: Fri, 9 Jul 2021 22:38:55 GMT
Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v3]
In-Reply-To:
References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com>
Message-ID:

On Fri, 9 Jul 2021 14:50:19 GMT, Coleen Phillimore wrote:

>> Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490.
>> I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions!
>> Tested with tier1-3.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Move static_assert and fix jlong.

src/hotspot/share/memory/arena.hpp line 37:

> 35: // The byte alignment to be used by Arena::Amalloc.
> 36: #define ARENA_AMALLOC_ALIGNMENT BytesPerLong
> 37: static_assert(is_power_of_2(ARENA_AMALLOC_ALIGNMENT), "is aligned");

This static_assert could just be dropped, along with ARENA_ALIGN_M1 and ARENA_ALIGN_MASK, and change ARENA_ALIGN to
`#define ARENA_ALIGN(x) (align_up((x), ARENA_AMALLOC_ALIGNMENT))`
or replace the macro with
`constexpr size_t arena_align(size_t x) { return align_up(x, ARENA_AMALLOC_ALIGNMENT); }`
(and either way, #include align.hpp).
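
For illustration, the `constexpr` variant suggested here can be sketched as a standalone snippet. This is only a sketch under stated assumptions, not the HotSpot code: `align_up_sketch` is a hypothetical stand-in for HotSpot's `align_up` from align.hpp, and `kArenaAmallocAlignment` assumes that `BytesPerLong` is 8.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: stands in for ARENA_AMALLOC_ALIGNMENT (BytesPerLong, taken to be 8).
constexpr std::size_t kArenaAmallocAlignment = 8;

// Hypothetical simplified stand-in for HotSpot's align_up().
// Only valid when 'alignment' is a power of two.
constexpr std::size_t align_up_sketch(std::size_t x, std::size_t alignment) {
  return (x + alignment - 1) & ~(alignment - 1);
}

// Function form of the ARENA_ALIGN macro: rounds a byte size up to the
// arena alignment.
constexpr std::size_t arena_align(std::size_t x) {
  return align_up_sketch(x, kArenaAmallocAlignment);
}

// Because the function is constexpr, it can be checked at compile time.
static_assert(arena_align(0) == 0, "zero stays zero");
static_assert(arena_align(1) == 8, "small sizes round up to 8");
static_assert(arena_align(8) == 8, "aligned sizes are unchanged");
static_assert(arena_align(13) == 16, "13 rounds up to 16");
```

Compared with the macro, the function form gives the argument a type and evaluates it exactly once, which is presumably part of the appeal of the suggestion.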
------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Fri Jul 9 22:38:56 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 22:38:56 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v3] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 20:19:31 GMT, Kim Barrett wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Move static_assert and fix jlong. > > src/hotspot/share/memory/arena.hpp line 37: > >> 35: // The byte alignment to be used by Arena::Amalloc. >> 36: #define ARENA_AMALLOC_ALIGNMENT BytesPerLong >> 37: static_assert(is_power_of_2(ARENA_AMALLOC_ALIGNMENT), "is aligned"); > > This static_assert could just be dropped, along with ARENA_ALIGN_M1 and ARENA_ALIGN_MASK, and change ARENA_ALIGN to > `#define ARENA_ALIGN(x) (align_up((x), ARENA_AMALLOC_ALIGNMENT))` > or replace the macro with > `constexpr size_t arena_align(size_t x) { return align_up(x, ARENA_AMALLOC_ALIGNMENT); }` > (and either way, #include align.hpp). sure, makes sense. ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Fri Jul 9 22:52:25 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 9 Jul 2021 22:52:25 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v4] In-Reply-To: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: > Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. 
I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. > I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! > Tested with tier1-3. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix ARENA_ALIGN macro ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4732/files - new: https://git.openjdk.java.net/jdk/pull/4732/files/220c3cb8..7ebc7421 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4732&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4732&range=02-03 Stats: 6 lines in 1 file changed: 1 ins; 4 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4732.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4732/head:pull/4732 PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Sat Jul 10 00:04:57 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Sat, 10 Jul 2021 00:04:57 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v4] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 20:04:08 GMT, Kim Barrett wrote: >> check_for_overflow does more than just the pointer comparison. It conditionally calls vm_exit_out_of_memory. I don't want to copy this into both of the Amalloc function, and it has a nice name vs. squinting and trying to understand why "if (pointer_delta(_max, _hwm, 1) < x)" does the same thing. As is, this is readable and I don't want to change it. > > Such a use of pointer_delta is one of its canonical use-cases, as discussed in the comment describing that function. > > grow does everything that check_for_overflow does, except correctly. 
check_for_overflow can incorrectly report an error if the current chunk happens to be near the end of the address space, even though malloc can succeed. > > The current code with the call to check_for_overflow is incorrect, slower, and larger than it needs to be. It's got nothing at all going for it except the "whence" string, which is not enough to make up for the other problems. If that feature is important, add it to grow(). I'll file another RFE to fix it. I hope you can do it because I would only cut/paste the pointer diff expressions that you type here. This has already exceeded the original RFE change. JDK-8270217 One backlog item removed, two created JDK-8270217. ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From jwilhelm at openjdk.java.net Sat Jul 10 00:26:17 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Sat, 10 Jul 2021 00:26:17 GMT Subject: RFR: Merge jdk17 Message-ID: Forwardport JDK 17 -> JDK 18 ------------- Commit messages: - Merge - 8268826: Cleanup Override in Context-Specific Deserialization Filters - 8261147: C2: Node is wrongly marked as reduction resulting in a wrong execution due to wrong vector instructions - 8270151: IncompatibleClassChangeError on empty pattern switch statement case - 8269146: Missing unreported constraints on pattern and other case label combination - 8269952: compiler/vectorapi/VectorCastShape*Test.java tests failed on avx2 machines - 8269840: Update Platform.isDefaultCDSArchiveSupported() to return true for aarch64 platforms The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=4748&range=00.0 - jdk17: https://webrevs.openjdk.java.net/?repo=jdk&pr=4748&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/4748/files Stats: 691 lines in 29 files changed: 544 ins; 66 del; 81 mod Patch: https://git.openjdk.java.net/jdk/pull/4748.diff Fetch: git fetch https://git.openjdk.java.net/jdk 
pull/4748/head:pull/4748 PR: https://git.openjdk.java.net/jdk/pull/4748 From jwilhelm at openjdk.java.net Sat Jul 10 01:26:52 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Sat, 10 Jul 2021 01:26:52 GMT Subject: Integrated: Merge jdk17 In-Reply-To: References: Message-ID: On Sat, 10 Jul 2021 00:17:07 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 17 -> JDK 18 This pull request has now been integrated. Changeset: ec975c6a Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/ec975c6a055688c014e709917dcfc340037e684f Stats: 691 lines in 29 files changed: 544 ins; 66 del; 81 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/4748 From ysuenaga at openjdk.java.net Sat Jul 10 05:04:58 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Sat, 10 Jul 2021 05:04:58 GMT Subject: Integrated: 8270083: -Wnonnull errors happen with GCC 11.1.1 In-Reply-To: References: Message-ID: <9x39bpx0Vb4Xd4snMsaUEza_inopGhThmnmnMVGI_7E=.fafa5a94-be02-443a-b402-7d8b5b974d2e@github.com> On Thu, 8 Jul 2021 09:42:43 GMT, Yasumasa Suenaga wrote: > I attempted to build OpenJDK on Fedora 34 with gcc-11.1.1-3.fc34.x86_64, but I saw following errors: > > > In file included from /home/ysuenaga/github-forked/jdk/src/hotspot/share/runtime/frame.inline.hpp:42, > from /home/ysuenaga/github-forked/jdk/src/hotspot/cpu/x86/abstractInterpreter_x86.cpp:29: This pull request has now been integrated. 
Changeset: 68b6e11e Author: Yasumasa Suenaga URL: https://git.openjdk.java.net/jdk/commit/68b6e11e481349e40014aa4593a53ae2ea74aedc Stats: 42 lines in 9 files changed: 38 ins; 0 del; 4 mod 8270083: -Wnonnull errors happen with GCC 11.1.1 Reviewed-by: erikj, dlong, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/4719 From kbarrett at openjdk.java.net Sat Jul 10 06:52:57 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 10 Jul 2021 06:52:57 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v4] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 22:52:25 GMT, Coleen Phillimore wrote: >> Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. >> I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix ARENA_ALIGN macro Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4732 From sspitsyn at openjdk.java.net Sun Jul 11 08:13:30 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Sun, 11 Jul 2021 08:13:30 GMT Subject: [jdk17] RFR: 8269558: fix of JDK-8252657 missed to update history at the end of JVM TI spec [v2] In-Reply-To: References: Message-ID: > The fix of: > 8252657 JVMTI agent is not unloaded when Agent_OnAttach is failed > did not update the JVM TI spec history at the end of document. > This PR adds missed item to the JVM TI spec history. 
Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge - 8269558: fix of JDK-8252657 missed to update history at the end of JVM TI spec ------------- Changes: - all: https://git.openjdk.java.net/jdk17/pull/233/files - new: https://git.openjdk.java.net/jdk17/pull/233/files/7a5b9b3b..82f97000 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk17&pr=233&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk17&pr=233&range=00-01 Stats: 828 lines in 37 files changed: 668 ins; 67 del; 93 mod Patch: https://git.openjdk.java.net/jdk17/pull/233.diff Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/233/head:pull/233 PR: https://git.openjdk.java.net/jdk17/pull/233 From sspitsyn at openjdk.java.net Sun Jul 11 11:07:01 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Sun, 11 Jul 2021 11:07:01 GMT Subject: [jdk17] Integrated: 8269558: fix of JDK-8252657 missed to update history at the end of JVM TI spec In-Reply-To: References: Message-ID: On Thu, 8 Jul 2021 19:11:45 GMT, Serguei Spitsyn wrote: > The fix of: > 8252657 JVMTI agent is not unloaded when Agent_OnAttach is failed > did not update the JVM TI spec history at the end of document. > This PR adds missed item to the JVM TI spec history. This pull request has now been integrated. 
Changeset: 3d82b0e6 Author: Serguei Spitsyn URL: https://git.openjdk.java.net/jdk17/commit/3d82b0e634583f4bc01ceece9dd82fc00fd6f9c3 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod 8269558: fix of JDK-8252657 missed to update history at the end of JVM TI spec Reviewed-by: dcubed, cjplummer ------------- PR: https://git.openjdk.java.net/jdk17/pull/233 From coleenp at openjdk.java.net Sun Jul 11 18:19:59 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Sun, 11 Jul 2021 18:19:59 GMT Subject: RFR: 8253779: Amalloc may be wasting space by overaligning [v4] In-Reply-To: References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: On Fri, 9 Jul 2021 22:52:25 GMT, Coleen Phillimore wrote: >> Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. >> I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix ARENA_ALIGN macro Thanks Kim. Thank you for the suggestions on other improvements. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From coleenp at openjdk.java.net Sun Jul 11 18:20:00 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Sun, 11 Jul 2021 18:20:00 GMT Subject: Integrated: 8253779: Amalloc may be wasting space by overaligning In-Reply-To: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> References: <3C-18DwlsyIGa_VBCOLY89gkYj_Az0cZY4nshVYiT8Y=.3ee95921-5a40-4f16-9d09-162cfd9413b5@github.com> Message-ID: <1nq-vlnVDMIwmViSxB-Efeoaa6HLp0hiMHEgWLduFbY=.6f641836-8192-4ec3-8db0-e48350d33482@github.com> On Thu, 8 Jul 2021 19:56:33 GMT, Coleen Phillimore wrote: > Thanks to @kimbarrett for noticing this. The alignment was changed to 64 bits for 32 bit platforms, but overalign for 64 bits platforms. I changed this to BytesPerLong to cover both, since the long case is why it was changed on 32 bits in the first place in JDK-4526490. > I also removed Amalloc_D since I don't know what D stands for and it's the same as Amalloc_4. That's not a great name either. I'm open to suggestions! > Tested with tier1-3. This pull request has now been integrated. 
Changeset: ac75a53f Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/ac75a53fc513cce2a1aa266f0b7235d150a76c01 Stats: 43 lines in 7 files changed: 2 ins; 20 del; 21 mod 8253779: Amalloc may be wasting space by overaligning Reviewed-by: kbarrett, stuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/4732 From apangin at openjdk.java.net Sun Jul 11 22:24:58 2021 From: apangin at openjdk.java.net (Andrei Pangin) Date: Sun, 11 Jul 2021 22:24:58 GMT Subject: RFR: 8178287: AsyncGetCallTrace fails to traverse valid Java stacks [v3] In-Reply-To: References: <9qfnLj_-jz8MocK7UIIs5-NYZsVPJ7J20ZLiORqpUlM=.cb712662-0eb9-4d17-a67d-42451423f470@github.com> Message-ID: On Fri, 18 Jun 2021 08:56:32 GMT, Ludovic Henry wrote: >> When the signal sent for AsyncGetCallTrace or JFR would land on a runtime stub (like arraycopy), a vtable stub, or the prolog of a compiled method, it wouldn't be able to detect the sender (caller) frame for multiple reasons. This patch fixes these cases through adding CodeBlob-specific frame parser which are in the best position to know how a frame is setup. >> >> The following examples have been profiled with honest-profiler which uses `AsyncGetCallTrace`. 
>>
>> # `Prof1`
>>
>>     public class Prof1 {
>>
>>         public static void main(String[] args) {
>>             StringBuilder sb = new StringBuilder();
>>             for (int i = 0; i < 1000000; i++) {
>>                 sb.append("ab");
>>                 sb.delete(0, 1);
>>             }
>>             System.out.println(sb.length());
>>         }
>>     }
>>
>> - Baseline:
>>
>>     Flat Profile (by method):
>>     (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5]
>>     (t 0.5,s 0.2) Prof1::main
>>     (t 0.2,s 0.2) java.lang.AbstractStringBuilder::append
>>     (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
>>     (t 0.0,s 0.0) java.lang.AbstractStringBuilder::ensureCapacityInternal
>>     (t 0.0,s 0.0) java.lang.AbstractStringBuilder::shift
>>     (t 0.0,s 0.0) java.lang.String::getBytes
>>     (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt
>>     (t 0.0,s 0.0) java.lang.StringBuilder::delete
>>     (t 0.2,s 0.0) java.lang.StringBuilder::append
>>     (t 0.0,s 0.0) java.lang.AbstractStringBuilder::delete
>>     (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt
>>
>> - With `StubRoutinesBlob::FrameParser`:
>>
>>     Flat Profile (by method):
>>     (t 98.7,s 98.7) java.lang.AbstractStringBuilder::ensureCapacityInternal
>>     (t 0.9,s 0.9) java.lang.AbstractStringBuilder::delete
>>     (t 99.8,s 0.2) Prof1::main
>>     (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
>>     (t 0.0,s 0.0) AGCT::Unknown Java[ERR=-5]
>>     (t 98.8,s 0.0) java.lang.AbstractStringBuilder::append
>>     (t 98.8,s 0.0) java.lang.StringBuilder::append
>>     (t 0.9,s 0.0) java.lang.StringBuilder::delete
>>
>> # `Prof2`
>>
>>     import java.util.function.Supplier;
>>
>>     public class Prof2 {
>>
>>         public static void main(String[] args) {
>>             var rand = new java.util.Random(0);
>>             Supplier[] suppliers = {
>>                 () -> 0,
>>                 () -> 1,
>>                 () -> 2,
>>                 () -> 3,
>>             };
>>
>>             long sum = 0;
>>             for (int i = 0; i >= 0; i++) {
>>                 sum += (int)suppliers[i % suppliers.length].get();
>>             }
>>         }
>>     }
>>
>> - Baseline:
>>
>>     Flat Profile (by method):
>>     (t 60.7,s 60.7) AGCT::Unknown Java[ERR=-5]
>>     (t 39.2,s 35.2) Prof2::main
>>     (t 1.4,s 1.4) Prof2::lambda$main$3
>>     (t 1.0,s 1.0) Prof2::lambda$main$2
>>     (t 0.9,s 0.9) Prof2::lambda$main$1
>>     (t 0.7,s 0.7) Prof2::lambda$main$0
>>     (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
>>     (t 0.0,s 0.0) java.lang.Thread::exit
>>     (t 0.9,s 0.0) Prof2$$Lambda$2.0x0000000800c00c28::get
>>     (t 1.0,s 0.0) Prof2$$Lambda$3.0x0000000800c01000::get
>>     (t 1.4,s 0.0) Prof2$$Lambda$4.0x0000000800c01220::get
>>     (t 0.7,s 0.0) Prof2$$Lambda$1.0x0000000800c00a08::get
>>
>> - With `VtableBlob::FrameParser` and `nmethod::FrameParser`:
>>
>>     Flat Profile (by method):
>>     (t 74.1,s 70.3) Prof2::main
>>     (t 6.5,s 5.5) Prof2$$Lambda$29.0x0000000800081220::get
>>     (t 6.6,s 5.4) Prof2$$Lambda$28.0x0000000800081000::get
>>     (t 5.7,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get
>>     (t 5.9,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get
>>     (t 4.9,s 4.9) AGCT::Unknown Java[ERR=-5]
>>     (t 1.2,s 1.2) Prof2::lambda$main$2
>>     (t 0.9,s 0.9) Prof2::lambda$main$3
>>     (t 0.9,s 0.9) Prof2::lambda$main$1
>>     (t 0.7,s 0.7) Prof2::lambda$main$0
>>     (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
>
> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix comments

Hi Ludovic,

Thank you for working on this long-standing bug. I like the idea of the proposed solution, but unfortunately it cannot be applied as is.

Since the stack walking code runs inside a signal handler, it is very limited in things it can do. In particular, it must not allocate, acquire locks, etc. In your implementation, FrameParser does allocate though.
The issue is not just theoretical: when I ran JDK with this patch with async-profiler, I immediately got the following deadlock:

(gdb) bt
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fa2363ca025 in __GI___pthread_mutex_lock (mutex=0x7fa235da5440 ) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007fa235696cb6 in ThreadCritical::ThreadCritical() () from /usr/java/jdk-18/lib/server/libjvm.so
#3  0x00007fa234b6fe53 in Chunk::next_chop() () from /usr/java/jdk-18/lib/server/libjvm.so
#4  0x00007fa234e88523 in frame::safe_for_sender(JavaThread*) () from /usr/java/jdk-18/lib/server/libjvm.so
#5  0x00007fa234e838f2 in vframeStreamForte::forte_next() () from /usr/java/jdk-18/lib/server/libjvm.so
#6  0x00007fa2349fbb9b in forte_fill_call_trace_given_top(JavaThread*, ASGCT_CallTrace*, int, frame) [clone .isra.20] () from /usr/java/jdk-18/lib/server/libjvm.so
#7  0x00007fa234e8426e in AsyncGetCallTrace () from /usr/java/jdk-18/lib/server/libjvm.so
#8  0x00007fa228519312 in Profiler::getJavaTraceAsync(void*, ASGCT_CallFrame*, int) () from /mnt/c/Users/Andrei/java/async-profiler/build/libasyncProfiler.so
#9  0x00007fa228519c72 in Profiler::recordSample(void*, unsigned long long, int, Event*) () from /mnt/c/Users/Andrei/java/async-profiler/build/libasyncProfiler.so
#10 0x00007fa2285164f8 in WallClock::signalHandler(int, siginfo_t*, void*) () from /mnt/c/Users/Andrei/java/async-profiler/build/libasyncProfiler.so
#11 <signal handler called>
#12 __pthread_mutex_unlock_usercnt (decr=1, mutex=0x7fa235da5440 ) at pthread_mutex_unlock.c:41
#13 __GI___pthread_mutex_unlock (mutex=0x7fa235da5440 ) at pthread_mutex_unlock.c:356
#14 0x00007fa235696d3b in ThreadCritical::~ThreadCritical() () from /usr/java/jdk-18/lib/server/libjvm.so
#15 0x00007fa234b6fe71 in Chunk::next_chop() () from /usr/java/jdk-18/lib/server/libjvm.so
#16 0x00007fa234d1ca62 in ClassFileParser::parse_method(ClassFileStream const*, bool, ConstantPool const*, AccessFlags*, JavaThread*) () from /usr/java/jdk-18/lib/server/libjvm.so
#17 0x00007fa234d1e338 in ClassFileParser::parse_methods(ClassFileStream const*, bool, AccessFlags*, bool*, bool*, JavaThread*) () from /usr/java/jdk-18/lib/server/libjvm.so
#18 0x00007fa234d22459 in ClassFileParser::parse_stream(ClassFileStream const*, JavaThread*) () from /usr/java/jdk-18/lib/server/libjvm.so
#19 0x00007fa234d2291c in ClassFileParser::ClassFileParser(ClassFileStream*, Symbol*, ClassLoaderData*, ClassLoadInfo const*, ClassFileParser::Publicity, JavaThread*) () from /usr/java/jdk-18/lib/server/libjvm.so
#20 0x00007fa2351febb6 in KlassFactory::create_from_stream(ClassFileStream*, Symbol*, ClassLoaderData*, ClassLoadInfo const&, JavaThread*) () from /usr/java/jdk-18/lib/server/libjvm.so
#21 0x00007fa235645b40 in SystemDictionary::resolve_class_from_stream(ClassFileStream*, Symbol*, Handle, ClassLoadInfo const&, JavaThread*) () from /usr/java/jdk-18/lib/server/libjvm.so
#22 0x00007fa2350bad0a in jvm_define_class_common(char const*, _jobject*, signed char const*, int, _jobject*, char const*, JavaThread*) [clone .constprop.299] () from /usr/java/jdk-18/lib/server/libjvm.so
#23 0x00007fa2350bae6d in JVM_DefineClassWithSource () from /usr/java/jdk-18/lib/server/libjvm.so
#24 0x00007fa236a0ee12 in Java_java_lang_ClassLoader_defineClass1 () from /usr/java/jdk-18/lib/libjava.so

-------------

PR: https://git.openjdk.java.net/jdk/pull/4436

From coleenp at openjdk.java.net Sun Jul 11 23:35:02 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Sun, 11 Jul 2021 23:35:02 GMT
Subject: RFR: 8270179: Rename Amalloc_4
Message-ID:

This renames Amalloc_4 to AmallocWords. While I had to fix internal_malloc_4, which is a copy of Amalloc_4 (except with UseMallocOnly handling), I also made the change for
JDK-8270217 Fix Arena::Amalloc to check for overflow better
Tested with tier1 - all Oracle platforms and tier1-3 on linux-x64.
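
The overflow concern behind JDK-8270217 can be illustrated with a small standalone sketch. It is not the actual patch: both function names are hypothetical, plain integers stand in for the arena's `_hwm` and `_max` pointers, and the first function is only an assumed model of a wrap-around style check of the kind criticized earlier in the 8253779 review. The second function mirrors the `pointer_delta(_max, _hwm, 1) < x` comparison suggested there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed model of a wrap-around check: treats the request as an error
// whenever hwm + request would wrap past the top of the address space,
// regardless of how much room the current chunk has.
bool wraps_address_space(std::uintptr_t hwm, std::size_t request) {
  return UINTPTR_MAX - request < hwm;
}

// Remaining-space check: compares the request with the bytes left in the
// current chunk. A true result only means "get a new chunk", not "fail".
bool needs_grow(std::uintptr_t hwm, std::uintptr_t max, std::size_t request) {
  return static_cast<std::size_t>(max - hwm) < request;
}
```

With a chunk placed near the top of the address space, say `hwm = UINTPTR_MAX - 64` and `max = UINTPTR_MAX - 32`, a 128-byte request makes `wraps_address_space` report an error even though a fresh chunk elsewhere could satisfy it, while `needs_grow` simply signals that the current chunk is too small.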
------------- Commit messages: - 8270179: Rename Amalloc_4 Changes: https://git.openjdk.java.net/jdk/pull/4750/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4750&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8270179 Stats: 94 lines in 16 files changed: 0 ins; 40 del; 54 mod Patch: https://git.openjdk.java.net/jdk/pull/4750.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4750/head:pull/4750 PR: https://git.openjdk.java.net/jdk/pull/4750 From coleenp at openjdk.java.net Sun Jul 11 23:35:02 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Sun, 11 Jul 2021 23:35:02 GMT Subject: RFR: 8270179: Rename Amalloc_4 In-Reply-To: References: Message-ID: On Sun, 11 Jul 2021 23:27:33 GMT, Coleen Phillimore wrote: > This renames Amalloc_4 to AmallocWords. While I had to fix internal_malloc_4, which is a copy of Amalloc_4 (except with UseMallocOnly handling), I also made the change for > JDK-8270217 Fix Arena::Amalloc to check for overflow better > Tested with tier1 - all Oracle platforms and tier1-3 on linux-x64. Why do we still have UseMallocOnly? There's one test for it but why. ------------- PR: https://git.openjdk.java.net/jdk/pull/4750 From kbarrett at openjdk.java.net Mon Jul 12 05:09:53 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 12 Jul 2021 05:09:53 GMT Subject: RFR: 8270179: Rename Amalloc_4 In-Reply-To: References: Message-ID: On Sun, 11 Jul 2021 23:28:37 GMT, Coleen Phillimore wrote: > Why do we still have UseMallocOnly? There's one test for it but why. Presumably to let a developer compare the behavior of malloc vs arena-based allocation, either for correctness or performance. It's a developer option, so we're free to nuke it without deprecation or anything like that. I wouldn't object to that; it seems like a rather crude hammer for such uses (more interesting for performance would be to control a specific arena). Though because it's a developer option it also doesn't affect product builds. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4750 From stuefe at openjdk.java.net Mon Jul 12 05:13:50 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 12 Jul 2021 05:13:50 GMT Subject: RFR: 8270179: Rename Amalloc_4 In-Reply-To: References: Message-ID: On Mon, 12 Jul 2021 05:06:48 GMT, Kim Barrett wrote: > Why do we still have UseMallocOnly? There's one test for it but why. I always thought it is done to find overwriters between individual arena allocations by moving them out of the arena block. You also get the malloc headers with the canaries in os::malloc (debug builds only), which would trip you on overwrites when the arena is cleaned and all allocations are deleted. But I think it's too complex and too much code for this purpose and could be done differently. In Metaspace, I add optional canary gaps between individual allocations within the arena itself (https://github.com/openjdk/jdk/blob/master/src/hotspot/share/memory/metaspace/allocationGuard.hpp) and test them periodically, which achieves the same goal with less code. It also means you don't need to individually free them. ------------- PR: https://git.openjdk.java.net/jdk/pull/4750 From kbarrett at openjdk.java.net Mon Jul 12 05:30:02 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 12 Jul 2021 05:30:02 GMT Subject: RFR: 8270179: Rename Amalloc_4 In-Reply-To: References: Message-ID: On Sun, 11 Jul 2021 23:27:33 GMT, Coleen Phillimore wrote: > This renames Amalloc_4 to AmallocWords. While I had to fix internal_malloc_4, which is a copy of Amalloc_4 (except with UseMallocOnly handling), I also made the change for > JDK-8270217 Fix Arena::Amalloc to check for overflow better > Tested with tier1 - all Oracle platforms and tier1-3 on linux-x64. Looks good. One possible naming nit. I agree that UseMallocOnly might not be pulling its weight. > ... 
I also made the change for
> JDK-8270217 Fix Arena::Amalloc to check for overflow better

You should add that bug to this PR via `/issue add JDK-8270217`

src/hotspot/share/memory/arena.hpp line 105:

> 103:   debug_only(void* malloc(size_t size);)
> 104:
> 105:   void* internal_malloc_words(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) {

If UseMallocOnly is retained, I think a better name for this would be internal_amalloc_only, to avoid confusion with the case of actually using malloc. (And yes, this can be taken as another argument for nuking that option.)

-------------

Marked as reviewed by kbarrett (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4750

From stuefe at openjdk.java.net Mon Jul 12 06:12:57 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Mon, 12 Jul 2021 06:12:57 GMT
Subject: RFR: 8270179: Rename Amalloc_4
In-Reply-To:
References:
Message-ID: <11I6eE_tfyYlUgkA7bzL7Bfbi4JaXeuTbVfjyepsLSw=.d3486203-692b-44e9-bf1d-9c07eba5e518@github.com>

On Sun, 11 Jul 2021 23:27:33 GMT, Coleen Phillimore wrote:

> This renames Amalloc_4 to AmallocWords. While I had to fix internal_malloc_4, which is a copy of Amalloc_4 (except with UseMallocOnly handling), I also made the change for
> JDK-8270217 Fix Arena::Amalloc to check for overflow better
> Tested with tier1 - all Oracle platforms and tier1-3 on linux-x64.

I'm not sure the alignment handling is correct. Either that or I don't understand the contract of the Amalloc... functions. I thought:

- Amalloc(x) -> allocate x bytes, align returned pointer to 64bit
- AmallocWords(x) -> allocate x bytes, align returned pointer to pointer size

?

Because the functions just align the given byte size (well, `Amalloc()` aligns automatically, `AmallocWords()` asserts). But that has no effect on the returned pointer.

Consider this, on 32bit:

1) AmallocWords(4) -> _hwm = 4
2) Amalloc(8) -> will return a pointer to offset 4, so not 64bit aligned.
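
The two-step sequence above can be reproduced with a toy bump allocator. This is a minimal sketch, assuming a 32-bit platform and modeling `_hwm` as a byte offset from an 8-byte-aligned chunk start; `ToyArena` and its member names are hypothetical and do not correspond to the real `Arena` class. The last member function shows one possible remedy (aligning the high-water mark before allocating), which is an assumption here and not something proposed in the thread.

```cpp
#include <cassert>
#include <cstddef>

struct ToyArena {
  std::size_t hwm = 0;  // offset of the next free byte in the chunk

  // Rounds x up to a multiple of a; a must be a power of two.
  static std::size_t align_up(std::size_t x, std::size_t a) {
    return (x + a - 1) & ~(a - 1);
  }

  // Models AmallocWords on 32-bit: the size must be a multiple of 4 bytes.
  std::size_t amalloc_words(std::size_t bytes) {
    assert(bytes % 4 == 0);
    std::size_t result = hwm;
    hwm += bytes;
    return result;
  }

  // Models Amalloc as described above: the size is rounded up to 8 bytes,
  // but the starting offset is whatever hwm happens to be.
  std::size_t amalloc_size_only(std::size_t bytes) {
    bytes = align_up(bytes, 8);
    std::size_t result = hwm;
    hwm += bytes;
    return result;
  }

  // Possible remedy (assumption): align hwm itself before allocating.
  std::size_t amalloc_aligned(std::size_t bytes) {
    hwm = align_up(hwm, 8);
    std::size_t result = hwm;
    hwm += align_up(bytes, 8);
    return result;
  }
};
```

After `amalloc_words(4)`, `amalloc_size_only(8)` hands out offset 4, which is not 8-byte aligned, whereas `amalloc_aligned(8)` would skip to offset 8 at the cost of 4 padding bytes.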
------------- PR: https://git.openjdk.java.net/jdk/pull/4750 From kbarrett at openjdk.java.net Mon Jul 12 08:17:51 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 12 Jul 2021 08:17:51 GMT Subject: RFR: 8270179: Rename Amalloc_4 In-Reply-To: <11I6eE_tfyYlUgkA7bzL7Bfbi4JaXeuTbVfjyepsLSw=.d3486203-692b-44e9-bf1d-9c07eba5e518@github.com> References: <11I6eE_tfyYlUgkA7bzL7Bfbi4JaXeuTbVfjyepsLSw=.d3486203-692b-44e9-bf1d-9c07eba5e518@github.com> Message-ID: On Mon, 12 Jul 2021 06:10:18 GMT, Thomas Stuefe wrote: > Consider this sequence, on 32bit: > > 1. AmallocWords(4) -> _hwm = 4 > > 2. Amalloc(8) -> will return a pointer to offset 4, so not 64bit aligned. I thought that's what ARENA_ALIGN at the beginning of Amalloc was for. But you are right, there's nothing to align `_hwm`. This looks like a pre-existing bug, and I wonder how this ever worked on 32bit platforms. ------------- PR: https://git.openjdk.java.net/jdk/pull/4750 From stuefe at openjdk.java.net Mon Jul 12 08:26:53 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 12 Jul 2021 08:26:53 GMT Subject: RFR: 8270179: Rename Amalloc_4 In-Reply-To: References: <11I6eE_tfyYlUgkA7bzL7Bfbi4JaXeuTbVfjyepsLSw=.d3486203-692b-44e9-bf1d-9c07eba5e518@github.com> Message-ID: <4RHRSZCA8HNJcpt-WG3ObtxZa6KsdQoOOQwQ4oLVSSQ=.4c99ede5-0d75-4a2d-b066-be4eb0e2ca0a@github.com> On Mon, 12 Jul 2021 08:15:01 GMT, Kim Barrett wrote: > > Consider this sequence, on 32bit: > > ``` > > 1. AmallocWords(4) -> _hwm = 4 > > > > 2. Amalloc(8) -> will return a pointer to offset 4, so not 64bit aligned. > > ``` > > I thought that's what ARENA_ALIGN at the beginning of Amalloc was for. But you are right, there's nothing to align `_hwm`. This looks like a pre-existing bug, and I wonder how this ever worked on 32bit platforms. My guess would be that the only widely relevant 32bit architecture was x86, and they allow unaligned access. I wonder how 32bit arm handles this. 
Maybe allocating 32bit words of memory is just very rare. I honestly always thought Amalloc4 was for allocating 32bits, if you knew 32bit alignment was enough. E.g. for storing ints. But seems I was wrong all along. ------------- PR: https://git.openjdk.java.net/jdk/pull/4750 From kbarrett at openjdk.java.net Mon Jul 12 09:28:55 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 12 Jul 2021 09:28:55 GMT Subject: RFR: 8270179: Rename Amalloc_4 In-Reply-To: <4RHRSZCA8HNJcpt-WG3ObtxZa6KsdQoOOQwQ4oLVSSQ=.4c99ede5-0d75-4a2d-b066-be4eb0e2ca0a@github.com> References: <11I6eE_tfyYlUgkA7bzL7Bfbi4JaXeuTbVfjyepsLSw=.d3486203-692b-44e9-bf1d-9c07eba5e518@github.com> <4RHRSZCA8HNJcpt-WG3ObtxZa6KsdQoOOQwQ4oLVSSQ=.4c99ede5-0d75-4a2d-b066-be4eb0e2ca0a@github.com> Message-ID: <94GOcNBJmjP5Xo0Z21TQnbiPY-TQTpzWMJoOR5qwodA=.88f81cf4-1a97-4f94-a0bd-1da6c9034b83@github.com> On Mon, 12 Jul 2021 08:23:55 GMT, Thomas Stuefe wrote: > My guess would be that the only widely relevant 32bit architecture was x86, and they allow unaligned access. I wonder how 32bit arm handles this. Maybe allocating 32bit words of memory is just very rare. Maybe using a given arena for a mix of operations doesn't happen? If only Amalloc is used for an arena then it will stay 8 byte aligned. > I honestly always thought Amalloc4 was for allocating 32bits, if you knew 32bit alignment was enough. E.g. for storing ints. But seems I was wrong all along. That's what I would have guessed too, based on the name. That's why I complained about it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4750 From eosterlund at openjdk.java.net Mon Jul 12 11:56:52 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Mon, 12 Jul 2021 11:56:52 GMT Subject: RFR: 8263375: Support stack watermarks in Zero VM In-Reply-To: <044dfg2EqIGmN7EM-PX2JkjpYwITRDIiA9qZqPHlFHA=.ead4f0b8-e21c-41eb-9fdb-7954cfb548e0@github.com> References: <044dfg2EqIGmN7EM-PX2JkjpYwITRDIiA9qZqPHlFHA=.ead4f0b8-e21c-41eb-9fdb-7954cfb548e0@github.com> Message-ID: On Thu, 8 Jul 2021 16:48:26 GMT, Aleksey Shipilev wrote: > Zero VM supports most of GCs. Since JDK 16, Shenandoah uses stack watermarks, so Zero has to support those if Shenandoah+Zero support is to remain. This PR adds the stack watermark support in Zero VM. This should also be useful as other projects, notably Loom, mature and depend on stack watermarks. > > Zero already calls into Hotspot safepoint machinery to do things, and it seems only the hooks for `on_iteration` and `on_unwind` are missing. AFAICS, Zero only has on-return safepoints, renamed it to be more precise. > > @fisk, do you see any obvious problems with this patch? > > Additional testing: > - [x] Linux x86_64 Zero `hotspot_gc_shenandoah` now passes Looks good in general, would just like slightly more precise hooks as explained in a comment above. It will probably work without that though. src/hotspot/cpu/zero/zeroInterpreter_zero.cpp line 205: > 203: // Notify the stack watermarks machinery that we are unwinding. > 204: // Should do this before resetting the frame anchor. > 205: stack_watermark_unwind_check(thread); I wonder if this should maybe move down a bit to where we inspect the reason we left the interpreter loop. There are multiple reasons and only some involve unwinding.
I'm thinking BytecodeInterpreter::return_from_method and BytecodeInterpreter::do_osr. Regarding BytecodeInterpreter::throwing_exception, the current contract for exception handling is that an unwind handler is called *after* unwinding instead. We have some exception handler function in the interpreter runtime that gets called after unwinding with an exception into an interpreted frame. Hopefully that still gets called when using zero. Worth double checking. ------------- PR: https://git.openjdk.java.net/jdk/pull/4728 From shade at openjdk.java.net Mon Jul 12 12:03:03 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 12 Jul 2021 12:03:03 GMT Subject: RFR: 8263375: Support stack watermarks in Zero VM In-Reply-To: References: <044dfg2EqIGmN7EM-PX2JkjpYwITRDIiA9qZqPHlFHA=.ead4f0b8-e21c-41eb-9fdb-7954cfb548e0@github.com> Message-ID: On Mon, 12 Jul 2021 11:51:16 GMT, Erik Österlund wrote: >> Zero VM supports most of GCs. Since JDK 16, Shenandoah uses stack watermarks, so Zero has to support those if Shenandoah+Zero support is to remain. This PR adds the stack watermark support in Zero VM. This should also be useful as other projects, notably Loom, mature and depend on stack watermarks. >> >> Zero already calls into Hotspot safepoint machinery to do things, and it seems only the hooks for `on_iteration` and `on_unwind` are missing. AFAICS, Zero only has on-return safepoints, renamed it to be more precise. >> >> @fisk, do you see any obvious problems with this patch? >> >> Additional testing: >> - [x] Linux x86_64 Zero `hotspot_gc_shenandoah` now passes > > src/hotspot/cpu/zero/zeroInterpreter_zero.cpp line 205: > >> 203: // Notify the stack watermarks machinery that we are unwinding. >> 204: // Should do this before resetting the frame anchor. >> 205: stack_watermark_unwind_check(thread); > > I wonder if this should maybe move down a bit to where we inspect the reason we left the interpreter loop. There are multiple reasons and only some involve unwinding.
I'm thinking BytecodeInterpreter::return_from_method and BytecodeInterpreter::do_osr. > > Regarding BytecodeInterpreter::throwing_exception, the current contract for exception handling is that an unwind handler is called *after* unwinding instead. We have some exception handler function in the interpreter runtime that gets called after unwinding with an exception into an interpreted frame. Hopefully that still gets called when using zero. Worth double checking. Thanks! I'll take a look after I am back from extended time off. Zero does not actually do OSR anymore (Zero gradually eroded to interpreter-only mode), so only return_from_method might need handling. I'll see what happens in throwing_exception case; worst case I think I can set up a one-off frame anchor and call the unwind handler with it. ------------- PR: https://git.openjdk.java.net/jdk/pull/4728 From coleenp at openjdk.java.net Mon Jul 12 13:54:53 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 12 Jul 2021 13:54:53 GMT Subject: RFR: 8270179: Rename Amalloc_4 In-Reply-To: References: Message-ID: <-20d98EmU8Uo6Fr5UycvjLyWU8QN3W3zofLvPqYSBnQ=.04a0dd95-9ca7-4598-8fd3-c87ca8e3f52b@github.com> On Sun, 11 Jul 2021 23:27:33 GMT, Coleen Phillimore wrote: > This renames Amalloc_4 to AmallocWords. While I had to fix internal_malloc_4, which is a copy of Amalloc_4 (except with UseMallocOnly handling), I also made the change for > JDK-8270217 Fix Arena::Amalloc to check for overflow better > Tested with tier1 - all Oracle platforms and tier1-3 on linux-x64. Thomas, you're right. More bugs in this code! >> My guess would be that the only widely relevant 32bit architecture was x86, and they allow unaligned access. I wonder how 32bit arm handles this. Maybe allocating 32bit words of memory is just very rare. > Maybe using a given arena for a mix of operations doesn't happen? If only Amalloc is used for an arena then it will stay 8 byte aligned. I think this is the case.
The case for aligning to 64 bits for 32 bit platforms was added for bug JDK-4526490 but I can't really follow where the unaligned load is coming from in that case. The test program is Java so the interpreter will do the volatile loads correctly regardless of alignment, so I think the problem was with Xcomp. I suppose I can put a comment and file *another* bug. ------------- PR: https://git.openjdk.java.net/jdk/pull/4750 From coleenp at openjdk.java.net Mon Jul 12 13:54:53 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 12 Jul 2021 13:54:53 GMT Subject: RFR: 8270179: Rename Amalloc_4 In-Reply-To: References: Message-ID: <8KXzVcjSd9YRmR3GCxIABbFa51K_PORuajx0TgF13kc=.c267076d-d8c7-4de5-9637-07f885bc8f2f@github.com> On Mon, 12 Jul 2021 05:20:44 GMT, Kim Barrett wrote: >> This renames Amalloc_4 to AmallocWords. While I had to fix internal_malloc_4, which is a copy of Amalloc_4 (except with UseMallocOnly handling), I also made the change for >> JDK-8270217 Fix Arena::Amalloc to check for overflow better >> Tested with tier1 - all Oracle platforms and tier1-3 on linux-x64. > > src/hotspot/share/memory/arena.hpp line 105: > >> 103: debug_only(void* malloc(size_t size);) >> 104: >> 105: void* internal_malloc_words(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) { > > If UseMallocOnly is retained, I think a better name for this would be internal_amalloc_only, to avoid confusion with the case of actually using malloc. (And yes, this can be taken as another argument for nuking that option.) Ok. I'll file another RFE for removing UseMallocOnly. ------------- PR: https://git.openjdk.java.net/jdk/pull/4750 From coleenp at openjdk.java.net Mon Jul 12 14:11:30 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 12 Jul 2021 14:11:30 GMT Subject: RFR: 8270179: Rename Amalloc_4 [v2] In-Reply-To: References: Message-ID: > This renames Amalloc_4 to AmallocWords. 
While I had to fix internal_malloc_4, which is a copy of Amalloc_4 (except with UseMallocOnly handling), I also made the change for > JDK-8270217 Fix Arena::Amalloc to check for overflow better > Tested with tier1 - all Oracle platforms and tier1-3 on linux-x64. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Rename internal_malloc_words and add comment about Amalloc being wrong. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4750/files - new: https://git.openjdk.java.net/jdk/pull/4750/files/4e30e42e..dfd64c57 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4750&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4750&range=00-01 Stats: 7 lines in 4 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/4750.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4750/head:pull/4750 PR: https://git.openjdk.java.net/jdk/pull/4750 From coleenp at openjdk.java.net Mon Jul 12 14:22:56 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 12 Jul 2021 14:22:56 GMT Subject: RFR: 8270179: Rename Amalloc_4 [v2] In-Reply-To: References: Message-ID: <14dPaVzdJ8F4-P3GLMDrOWcwCacmW4eCgVTR9fy2bG8=.fb7cc6f8-05af-45d4-bd51-cc5a68245fc0@github.com> On Mon, 12 Jul 2021 14:11:30 GMT, Coleen Phillimore wrote: >> This renames Amalloc_4 to AmallocWords. While I had to fix internal_malloc_4, which is a copy of Amalloc_4 (except with UseMallocOnly handling), I also made the change for >> JDK-8270217 Fix Arena::Amalloc to check for overflow better >> Tested with tier1 - all Oracle platforms and tier1-3 on linux-x64. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Rename internal_malloc_words and add comment about Amalloc being wrong. 
From bug JDK-8007475: > Nevertheless I think the -XX:+UseMallocOnly option (which is also only available in the debug version of the VM) is important enough (i.e. very nice to hunt other memory problems) to fix the problem. ------------- PR: https://git.openjdk.java.net/jdk/pull/4750 From hseigel at openjdk.java.net Mon Jul 12 14:24:56 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 12 Jul 2021 14:24:56 GMT Subject: RFR: 8244162: Additional opportunities to use NONCOPYABLE In-Reply-To: References: Message-ID: On Tue, 6 Jul 2021 07:01:22 GMT, Kim Barrett wrote: >> src/hotspot/share/classfile/stackMapTableFormat.hpp line 47: >> >>> 45: >>> 46: protected: >>> 47: // No constructors - should be 'private', but GCC issues a warning if it is >> >> Have we checked this is still the case? Might be a very old gcc issue. > > I did a little bit of experimenting and couldn't get gcc to warn, though other compilers might do so. I think such a warning is not unreasonable. Based on Kim's comment, I decided to leave the change as it currently is. ------------- PR: https://git.openjdk.java.net/jdk/pull/4652 From hseigel at openjdk.java.net Mon Jul 12 14:24:55 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 12 Jul 2021 14:24:55 GMT Subject: RFR: 8244162: Additional opportunities to use NONCOPYABLE In-Reply-To: References: Message-ID: On Thu, 1 Jul 2021 13:06:55 GMT, Harold Seigel wrote: > Please review this small change to use NONCOPYABLE macro where applicable. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. > > Thanks, Harold Thanks David, Kim, and Aleksey for reviewing this change!
------------- PR: https://git.openjdk.java.net/jdk/pull/4652 From hseigel at openjdk.java.net Mon Jul 12 14:24:56 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 12 Jul 2021 14:24:56 GMT Subject: Integrated: 8244162: Additional opportunities to use NONCOPYABLE In-Reply-To: References: Message-ID: On Thu, 1 Jul 2021 13:06:55 GMT, Harold Seigel wrote: > Please review this small change to use NONCOPYABLE macro where applicable. The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. > > Thanks, Harold This pull request has now been integrated. Changeset: 92ae6a51 Author: Harold Seigel URL: https://git.openjdk.java.net/jdk/commit/92ae6a512340485f75a12479dc1c1b8d3261bc76 Stats: 21 lines in 6 files changed: 6 ins; 9 del; 6 mod 8244162: Additional opportunities to use NONCOPYABLE Reviewed-by: dholmes, kbarrett, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/4652 From chris at sageembedded.com Mon Jul 12 16:47:02 2021 From: chris at sageembedded.com (Chris Cole) Date: Mon, 12 Jul 2021 09:47:02 -0700 Subject: 8267042: bug in monitor locking/unlocking on ARM32 C1 due to uninitialized BasicObjectLock::_displaced_header Message-ID: Hi Boris and others, Thanks for integrating this fix into jdk18 and jdk17. This bug is also present in OpenJDK 11.0.10 and later (introduced by the backport of JDK-8241234). I would suggest that this also be backported to jdk11u. It applies cleanly. Let me know if there is any way for me to help with this effort. Thanks again for the support, Chris Cole ------------ BUG - https://bugs.openjdk.java.net/browse/JDK-8267042 From martin.doerr at sap.com Mon Jul 12 17:22:14 2021 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 12 Jul 2021 17:22:14 +0000 Subject: AW: 8267042: bug in monitor locking/unlocking on ARM32 C1 due to uninitialized BasicObjectLock::_displaced_header In-Reply-To: References: Message-ID: Hi Chris, thanks for bringing this up! 
Can you create a "Backport Pull Request" for jdk11u-dev [1] according to [2]? Best regards, Martin [1] https://github.com/openjdk/jdk11u-dev [2] https://wiki.openjdk.java.net/display/SKARA/Backports From: jdk-updates-dev on behalf of Chris Cole Date: Monday, 12 July 2021 at 18:48 To: hotspot-compiler-dev at openjdk.java.net, hotspot-dev at openjdk.java.net, jdk-updates-dev at openjdk.java.net Subject: 8267042: bug in monitor locking/unlocking on ARM32 C1 due to uninitialized BasicObjectLock::_displaced_header Hi Boris and others, Thanks for integrating this fix into jdk18 and jdk17. This bug is also present in OpenJDK 11.0.10 and later (introduced by the backport of JDK-8241234). I would suggest that this also be backported to jdk11u. It applies cleanly. Let me know if there is any way for me to help with this effort. Thanks again for the support, Chris Cole ------------ BUG - https://bugs.openjdk.java.net/browse/JDK-8267042 From akozlov at openjdk.java.net Mon Jul 12 20:30:18 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Mon, 12 Jul 2021 20:30:18 GMT Subject: [jdk17] RFR: 8266889: [macosx-aarch64] Crash with SIGBUS in MarkActivationClosure::do_code_blob during vmTestbase/nsk/jvmti/.../bi04t002 test run Message-ID: The change adds W^X transition in RawMonitor family of functions, fixing the crash. RawMonitor functions are treated specially, so W^X transition is not inserted automatically. A better fix would be in .xml description for entries generation, but for now it is too risky. I hope to get this fixed in 17 in the simplest way and do re-work in the 18. This change is still 100% correct for the reported bug.
Verified by: jtreg -vmoption:-XX:+AssertWXAtThreadSync test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002 ------------- Commit messages: - 8266889: Switch W^X in JVMTI RawMonitor functions Changes: https://git.openjdk.java.net/jdk17/pull/244/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk17&pr=244&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266889 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk17/pull/244.diff Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/244/head:pull/244 PR: https://git.openjdk.java.net/jdk17/pull/244 From jwilhelm at openjdk.java.net Mon Jul 12 23:21:35 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Mon, 12 Jul 2021 23:21:35 GMT Subject: RFR: Merge jdk17 Message-ID: <7zQieFkCIJDip3Y4qUzNkzmvs5CWEaF2L09fXDMbLkc=.2b0d08b5-5ed6-4d63-bc76-fccad123bdb6@github.com> Forwardport JDK 17 -> JDK 18 ------------- Commit messages: - Merge - 8266345: (fs) Custom DefaultFileSystemProvider security related loops - 8269873: serviceability/sa/Clhsdb tests are using a C2 specific VMStruct field - 8268965: TCP Connection Reset when connecting simple socket to SSL server - 8269558: fix of JDK-8252657 missed to update history at the end of JVM TI spec - 8270216: [macOS] Update named used for Java run loop mode The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=4760&range=00.0 - jdk17: https://webrevs.openjdk.java.net/?repo=jdk&pr=4760&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/4760/files Stats: 39 lines in 7 files changed: 33 ins; 2 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/4760.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4760/head:pull/4760 PR: https://git.openjdk.java.net/jdk/pull/4760 From coleenp at openjdk.java.net Mon Jul 12 23:23:24 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 
12 Jul 2021 23:23:24 GMT Subject: RFR: 8270179: Rename Amalloc_4 [v3] In-Reply-To: References: Message-ID: > This renames Amalloc_4 to AmallocWords. While I had to fix internal_malloc_4, which is a copy of Amalloc_4 (except with UseMallocOnly handling), I also made the change for > JDK-8270217 Fix Arena::Amalloc to check for overflow better > Tested with tier1 - all Oracle platforms and tier1-3 on linux-x64. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add more info why UseMallocOnly is useful. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4750/files - new: https://git.openjdk.java.net/jdk/pull/4750/files/dfd64c57..b49f5860 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4750&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4750&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4750.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4750/head:pull/4750 PR: https://git.openjdk.java.net/jdk/pull/4750 From sspitsyn at openjdk.java.net Mon Jul 12 23:49:53 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Mon, 12 Jul 2021 23:49:53 GMT Subject: [jdk17] RFR: 8266889: [macosx-aarch64] Crash with SIGBUS in MarkActivationClosure::do_code_blob during vmTestbase/nsk/jvmti/.../bi04t002 test run In-Reply-To: References: Message-ID: On Mon, 12 Jul 2021 20:22:25 GMT, Anton Kozlov wrote: > The change adds W^X transition in RawMonitor family of functions, fixing the crash. RawMonitor functions are treated specially, so W^X transition is not inserted automatically. A better fix would be in .xml description for entries generation, but for now it is too risky. I hope to get this fixed in 17 in the simplest way and do re-work in the 18. This change is still 100% correct for the reported bug. 
> > Verified by: > > jtreg -vmoption:-XX:+AssertWXAtThreadSync test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002 Hi Anton, At least, some comments would be nice to have explaining why the switches are needed there. Thanks, Serguei ------------- PR: https://git.openjdk.java.net/jdk17/pull/244 From coleenp at openjdk.java.net Mon Jul 12 23:52:53 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 12 Jul 2021 23:52:53 GMT Subject: RFR: 8215948: [TESTBUG] gtest pseudo-JavaThreads could be more regular JavaThreads In-Reply-To: References: Message-ID: <8E2ayGT8HJOATcmlSo9uA1Z4o8ZsKgdwWQVLjJErvWY=.6c7222ec-03c8-4066-8eb9-0e7fdd8cb8df@github.com> On Wed, 7 Jul 2021 07:03:55 GMT, David Holmes wrote: > Please review this change to the gtest infrastructure that makes the test JavaThreads more regular JavaThreads, with the same lifecycle methods and having a regular j.l.Thread object. > > One test had to be modified slightly to create and start the threads outside of a code region where a Mutex is held. > > Testing: tiers 1-3, GHA > > Thanks, > David This looks good. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4704 From coleenp at openjdk.java.net Mon Jul 12 23:56:21 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 12 Jul 2021 23:56:21 GMT Subject: RFR: 8270179: Rename Amalloc_4 [v4] In-Reply-To: References: Message-ID: > This renames Amalloc_4 to AmallocWords. While I had to fix internal_malloc_4, which is a copy of Amalloc_4 (except with UseMallocOnly handling), I also made the change for > JDK-8270217 Fix Arena::Amalloc to check for overflow better > Tested with tier1 - all Oracle platforms and tier1-3 on linux-x64. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Rename internal_malloc_only to internal_amalloc, which is a better name. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4750/files - new: https://git.openjdk.java.net/jdk/pull/4750/files/b49f5860..59690839 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4750&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4750&range=02-03 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/4750.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4750/head:pull/4750 PR: https://git.openjdk.java.net/jdk/pull/4750 From kbarrett at openjdk.java.net Tue Jul 13 01:23:56 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 13 Jul 2021 01:23:56 GMT Subject: RFR: 8270179: Rename Amalloc_4 [v4] In-Reply-To: References: Message-ID: <38TIB2jg8nCWmvOu7pKBPVwVoe7-SdEordigYWm6_2E=.920e4eeb-6d7f-4789-896a-6703aeb946e9@github.com> On Mon, 12 Jul 2021 23:56:21 GMT, Coleen Phillimore wrote: >> This renames Amalloc_4 to AmallocWords. While I had to fix internal_malloc_4, which is a copy of Amalloc_4 (except with UseMallocOnly handling), I also made the change for >> JDK-8270217 Fix Arena::Amalloc to check for overflow better >> Tested with tier1 - all Oracle platforms and tier1-3 on linux-x64. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Rename internal_malloc_only to internal_amalloc, which is a better name. Still looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4750 From dholmes at openjdk.java.net Tue Jul 13 01:28:51 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 13 Jul 2021 01:28:51 GMT Subject: [jdk17] RFR: 8266889: [macosx-aarch64] Crash with SIGBUS in MarkActivationClosure::do_code_blob during vmTestbase/nsk/jvmti/.../bi04t002 test run In-Reply-To: References: Message-ID: On Mon, 12 Jul 2021 20:22:25 GMT, Anton Kozlov wrote: > The change adds W^X transition in RawMonitor family of functions, fixing the crash. RawMonitor functions are treated specially, so W^X transition is not inserted automatically. A better fix would be in .xml description for entries generation, but for now it is too risky. I hope to get this fixed in 17 in the simplest way and do re-work in the 18. This change is still 100% correct for the reported bug. > > Verified by: > > jtreg -vmoption:-XX:+AssertWXAtThreadSync test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002 I can understand that raw_enter and raw_wait need this fix because they have the ThreadBlockInVM transition; but why do we need it for raw_exit and raw_notify when they do not change the thread state ?? ------------- PR: https://git.openjdk.java.net/jdk17/pull/244 From david.holmes at oracle.com Tue Jul 13 01:44:47 2021 From: david.holmes at oracle.com (David Holmes) Date: Tue, 13 Jul 2021 11:44:47 +1000 Subject: RFR: 8215948: [TESTBUG] gtest pseudo-JavaThreads could be more regular JavaThreads In-Reply-To: <8E2ayGT8HJOATcmlSo9uA1Z4o8ZsKgdwWQVLjJErvWY=.6c7222ec-03c8-4066-8eb9-0e7fdd8cb8df@github.com> References: <8E2ayGT8HJOATcmlSo9uA1Z4o8ZsKgdwWQVLjJErvWY=.6c7222ec-03c8-4066-8eb9-0e7fdd8cb8df@github.com> Message-ID: On 13/07/2021 9:52 am, Coleen Phillimore wrote: > On Wed, 7 Jul 2021 07:03:55 GMT, David Holmes wrote: > >> Please review this change to the gtest infrastructure that makes the test JavaThreads more regular JavaThreads, with the same lifecycle methods and having a regular j.l.Thread object. 
>> >> One test had to be modified slightly to create and start the threads outside of a code region where a Mutex is held. >> >> Testing: tiers 1-3, GHA >> >> Thanks, >> David > > This looks good. Thanks for the review Coleen! Can I get a second review please. David > ------------- > > Marked as reviewed by coleenp (Reviewer). > > PR: https://git.openjdk.java.net/jdk/pull/4704 > From stuefe at openjdk.java.net Tue Jul 13 03:58:54 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 13 Jul 2021 03:58:54 GMT Subject: RFR: 8270179: Rename Amalloc_4 [v4] In-Reply-To: References: Message-ID: <6USlXA-qkz-jbe9ovASUsMJ_wdaFCzhBSJqnHkE3j0E=.ae2c4cb0-1d90-4317-9e57-48f025fb39d5@github.com> On Mon, 12 Jul 2021 23:56:21 GMT, Coleen Phillimore wrote: >> This renames Amalloc_4 to AmallocWords. While I had to fix internal_malloc_4, which is a copy of Amalloc_4 (except with UseMallocOnly handling), I also made the change for >> JDK-8270217 Fix Arena::Amalloc to check for overflow better >> Tested with tier1 - all Oracle platforms and tier1-3 on linux-x64. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Rename internal_malloc_only to internal_amalloc, which is a better name. LGTM (besides the alignment issue as well as the somewhat unclear comments, but both can be fixed with the follow-up RFE) ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4750 From cashford at openjdk.java.net Tue Jul 13 04:55:06 2021 From: cashford at openjdk.java.net (Corey Ashford) Date: Tue, 13 Jul 2021 04:55:06 GMT Subject: RFR: JDK-8270340: Base64 decodeBlock intrinsic for Power64 needs cleanup Message-ID: This series of commits was written to accomplish several cleanup and Power10 optimization tasks for the Base64 decodeBlock intrinsic for Power64: * Remove the ISA 3.1+ (Power10+) pextd instruction optimization in decodeBlock. 
This "optimization" turned out to actually cause a performance hit. Removing it gains back about 3% in performance. * Introduce a constant block, similar to that in use by encodeBlock() to speed up constant loading. * Add the ISA 3.1+ xxpermx instruction and align_prefix() method for use in a Power10 optimization for decodeBlock. Please see the commit log for my concerns about this change. * Implement the xxpermx-based decodeBlock algorithm for Power10+, which gives about a 5% performance boost. More details can be found in the commit logs. I want to note here that I looked into changing the loop_unrolls constant, and found that at large buffer sizes, the values of 2 and 4 give some extra performance gain. For example, on a 20001-byte destination buffer, I see an increase from 4.7X over intrinsic disabled (loop_unrolls=1), to 5.3X over intrinsic disabled (loop_unrolls=4), but on smaller buffer sizes, up to about 512, it causes performance degradation over loop_unrolls=1, so I have decided to stick with the original value of 1, since I don't know where to focus the performance versus buffer length tradeoff. 
------------- Commit messages: - macroAssembler_ppc.cpp: fix whitespace error - stubGenerator_ppc.cpp: decodeBlock(): Use xxpermx to improve performance of decodeBlock on Power10+ - Add xxpermx instruction, and align_prefix() method - stubGenerator_cpp.cpp: decodeBlock(): use constant block for loading constants into vector registers for cleaner and faster code - stubGenerator_ppc.cpp: Remove p10 pextd optimization Changes: https://git.openjdk.java.net/jdk/pull/4762/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4762&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8270340 Stats: 437 lines in 5 files changed: 206 ins; 126 del; 105 mod Patch: https://git.openjdk.java.net/jdk/pull/4762.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4762/head:pull/4762 PR: https://git.openjdk.java.net/jdk/pull/4762 From akozlov at openjdk.java.net Tue Jul 13 10:13:39 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Tue, 13 Jul 2021 10:13:39 GMT Subject: [jdk17] RFR: 8266889: [macosx-aarch64] Crash with SIGBUS in MarkActivationClosure::do_code_blob during vmTestbase/nsk/jvmti/.../bi04t002 test run [v2] In-Reply-To: References: Message-ID: <8qJPTLrfnYRnzcv9csluIhKDZW9SxNwoFsnMEAYPnW8=.5b36c87f-09af-4935-88e5-1538f4377c2c@github.com> > The change adds W^X transition in RawMonitor family of functions, fixing the crash. RawMonitor functions are treated specially, so W^X transition is not inserted automatically. A better fix would be in .xml description for entries generation, but for now it is too risky. I hope to get this fixed in 17 in the simplest way and do re-work in the 18. This change is still 100% correct for the reported bug. 
> > Verified by: > > jtreg -vmoption:-XX:+AssertWXAtThreadSync test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002 Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: Subset of RawMonitor; comments added ------------- Changes: - all: https://git.openjdk.java.net/jdk17/pull/244/files - new: https://git.openjdk.java.net/jdk17/pull/244/files/4598ddb8..0088479c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk17&pr=244&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk17&pr=244&range=00-01 Stats: 5 lines in 1 file changed: 2 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk17/pull/244.diff Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/244/head:pull/244 PR: https://git.openjdk.java.net/jdk17/pull/244 From akozlov at openjdk.java.net Tue Jul 13 10:13:57 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Tue, 13 Jul 2021 10:13:57 GMT Subject: [jdk17] RFR: 8266889: [macosx-aarch64] Crash with SIGBUS in MarkActivationClosure::do_code_blob during vmTestbase/nsk/jvmti/.../bi04t002 test run [v2] In-Reply-To: References: Message-ID: On Tue, 13 Jul 2021 01:25:48 GMT, David Holmes wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Subset of RawMonitor; comments added > > I can understand that raw_enter and raw_wait need this fix because they have the ThreadBlockInVM transition; but why do we need it for raw_exit and raw_notify when they do not change the thread state ?? @dholmes-ora I added the W^X into all RawMonitor functions since they clearly slipped from the common JVMTI entry with W^X management. It was just precaution. But I don't see any problem with handling only raw_enter and raw_wait, updated the patch. Handling only subset of functions also simplifies the comment suggested by @sspitsyn. Thanks for the comments! 
------------- PR: https://git.openjdk.java.net/jdk17/pull/244 From aph at openjdk.java.net Tue Jul 13 10:34:03 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 13 Jul 2021 10:34:03 GMT Subject: [jdk17] RFR: 8266889: [macosx-aarch64] Crash with SIGBUS in MarkActivationClosure::do_code_blob during vmTestbase/nsk/jvmti/.../bi04t002 test run [v2] In-Reply-To: <8qJPTLrfnYRnzcv9csluIhKDZW9SxNwoFsnMEAYPnW8=.5b36c87f-09af-4935-88e5-1538f4377c2c@github.com> References: <8qJPTLrfnYRnzcv9csluIhKDZW9SxNwoFsnMEAYPnW8=.5b36c87f-09af-4935-88e5-1538f4377c2c@github.com> Message-ID: <5RGCSuMnuhEB_pV7UDdCJaks7Liz_nJovKKldsufGoc=.24afb9c2-fb75-4d8c-aedb-0818cb152546@github.com> On Tue, 13 Jul 2021 10:13:39 GMT, Anton Kozlov wrote: >> The change adds W^X transition in RawMonitor family of functions, fixing the crash. RawMonitor functions are treated specially, so W^X transition is not inserted automatically. A better fix would be in .xml description for entries generation, but for now it is too risky. I hope to get this fixed in 17 in the simplest way and do re-work in the 18. This change is still 100% correct for the reported bug. >> >> Verified by: >> >> jtreg -vmoption:-XX:+AssertWXAtThreadSync test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002 > > Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Subset of RawMonitor; comments added OK. We really need to have a think about how W^X should be handled: this temporal coupling is a code smell, and will be a reliability problem until we do something better. ------------- Marked as reviewed by aph (Reviewer).
PR: https://git.openjdk.java.net/jdk17/pull/244 From jwilhelm at openjdk.java.net Tue Jul 13 10:54:07 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Tue, 13 Jul 2021 10:54:07 GMT Subject: Integrated: Merge jdk17 In-Reply-To: <7zQieFkCIJDip3Y4qUzNkzmvs5CWEaF2L09fXDMbLkc=.2b0d08b5-5ed6-4d63-bc76-fccad123bdb6@github.com> References: <7zQieFkCIJDip3Y4qUzNkzmvs5CWEaF2L09fXDMbLkc=.2b0d08b5-5ed6-4d63-bc76-fccad123bdb6@github.com> Message-ID: On Mon, 12 Jul 2021 23:12:29 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 17 -> JDK 18 This pull request has now been integrated. Changeset: 6b123b05 Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/6b123b059136b0c1efa62a23824b9aa253e6a519 Stats: 39 lines in 7 files changed: 33 ins; 2 del; 4 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/4760 From coleenp at openjdk.java.net Tue Jul 13 13:09:59 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 13 Jul 2021 13:09:59 GMT Subject: RFR: 8270179: Rename Amalloc_4 [v4] In-Reply-To: References: Message-ID: On Mon, 12 Jul 2021 23:56:21 GMT, Coleen Phillimore wrote: >> This renames Amalloc_4 to AmallocWords. While I had to fix internal_malloc_4, which is a copy of Amalloc_4 (except with UseMallocOnly handling), I also made the change for >> JDK-8270217 Fix Arena::Amalloc to check for overflow better >> Tested with tier1 - all Oracle platforms and tier1-3 on linux-x64. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Rename internal_malloc_only to internal_amalloc, which is a better name. Thanks Kim and Thomas. I filed JDK-8270308 for the alignment issue. I almost fixed it with this but then I'd want to write a gtest and don't really have 32 bit platforms to conveniently test it on. The comments can be made more clear if someone fixes that bug (assuming the comments refer to alignment). Thanks for all the reviews and improvements! 
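The overflow concern mentioned above (JDK-8270217) can be illustrated with a minimal sketch. This is not the HotSpot `Arena` code: `align_up_checked`, `BumpArena`, and `alloc_words` are hypothetical names, and the sketch only assumes a bump-pointer allocator that rounds each request up to word size. The point it shows is that both the rounding step and the bounds check should be written so that a huge request cannot wrap around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: round x up to a multiple of alignment (a power of
// two). Returns false instead of wrapping around when x is too large.
bool align_up_checked(size_t x, size_t alignment, size_t* out) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  const size_t mask = alignment - 1;
  if (x > std::numeric_limits<size_t>::max() - mask) {
    return false;  // x + mask would overflow
  }
  *out = (x + mask) & ~mask;
  return true;
}

// Hypothetical, simplified bump allocator over a single fixed chunk.
struct BumpArena {
  char* hwm;  // next free byte
  char* max;  // one past the end of the chunk

  // Returns nullptr when the request cannot be satisfied.
  void* alloc_words(size_t bytes) {
    size_t aligned = 0;
    if (!align_up_checked(bytes, sizeof(void*), &aligned)) {
      return nullptr;
    }
    // Compare against the remaining space rather than computing
    // hwm + aligned, which could overflow the pointer.
    if (aligned > static_cast<size_t>(max - hwm)) {
      return nullptr;
    }
    void* p = hwm;
    hwm += aligned;
    return p;
  }
};
```

In this sketch a request of `SIZE_MAX` bytes is rejected at the rounding step, and a request that is merely larger than the chunk is rejected at the bounds check; neither path performs arithmetic that can wrap.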
------------- PR: https://git.openjdk.java.net/jdk/pull/4750 From coleenp at openjdk.java.net Tue Jul 13 13:10:00 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 13 Jul 2021 13:10:00 GMT Subject: RFR: 8270179: Rename Amalloc_4 [v4] In-Reply-To: <8KXzVcjSd9YRmR3GCxIABbFa51K_PORuajx0TgF13kc=.c267076d-d8c7-4de5-9637-07f885bc8f2f@github.com> References: <8KXzVcjSd9YRmR3GCxIABbFa51K_PORuajx0TgF13kc=.c267076d-d8c7-4de5-9637-07f885bc8f2f@github.com> Message-ID: <6-LnuWSUi9Lk9hXQTdThOccEalENbga0o3K58G9nAhs=.4239b4bc-3052-49f9-9bea-598919606ee0@github.com> On Mon, 12 Jul 2021 13:52:08 GMT, Coleen Phillimore wrote: >> src/hotspot/share/memory/arena.hpp line 105: >> >>> 103: debug_only(void* malloc(size_t size);) >>> 104: >>> 105: void* internal_malloc_words(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) { >> >> If UseMallocOnly is retained, I think a better name for this would be internal_amalloc_only, to avoid confusion with the case of actually using malloc. (And yes, this can be taken as another argument for nuking that option.) > > Ok. I'll file another RFE for removing UseMallocOnly. I decided that UseMallocOnly found an interesting bug once and there's a test for it, so it might find another interesting bug someday, so I added a comment and decided to leave it. We can decide some other time to remove it if we have a good alternative. ------------- PR: https://git.openjdk.java.net/jdk/pull/4750 From coleenp at openjdk.java.net Tue Jul 13 13:10:01 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 13 Jul 2021 13:10:01 GMT Subject: Integrated: 8270179: Rename Amalloc_4 In-Reply-To: References: Message-ID: On Sun, 11 Jul 2021 23:27:33 GMT, Coleen Phillimore wrote: > This renames Amalloc_4 to AmallocWords. 
While I had to fix internal_malloc_4, which is a copy of Amalloc_4 (except with UseMallocOnly handling), I also made the change for > JDK-8270217 Fix Arena::Amalloc to check for overflow better > Tested with tier1 - all Oracle platforms and tier1-3 on linux-x64. This pull request has now been integrated. Changeset: 460c4bb6 Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/460c4bb6ceeea20d21f41c9d62280c0b2bd747e7 Stats: 97 lines in 17 files changed: 1 ins; 40 del; 56 mod 8270179: Rename Amalloc_4 8270217: Fix Arena::Amalloc to check for overflow better Reviewed-by: kbarrett, stuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/4750 From sspitsyn at openjdk.java.net Tue Jul 13 16:15:54 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 13 Jul 2021 16:15:54 GMT Subject: [jdk17] RFR: 8266889: [macosx-aarch64] Crash with SIGBUS in MarkActivationClosure::do_code_blob during vmTestbase/nsk/jvmti/.../bi04t002 test run [v2] In-Reply-To: <8qJPTLrfnYRnzcv9csluIhKDZW9SxNwoFsnMEAYPnW8=.5b36c87f-09af-4935-88e5-1538f4377c2c@github.com> References: <8qJPTLrfnYRnzcv9csluIhKDZW9SxNwoFsnMEAYPnW8=.5b36c87f-09af-4935-88e5-1538f4377c2c@github.com> Message-ID: <5ylMEuqR6Wu5I7mqoDAZH94mKivSnnefC0SMJmDRP0w=.569f3410-1354-454c-8864-cf7b65cf8aa7@github.com> On Tue, 13 Jul 2021 10:13:39 GMT, Anton Kozlov wrote: >> The change adds W^X transition in RawMonitor family of functions, fixing the crash. RawMonitor functions are treated specially, so W^X transition is not inserted automatically. A better fix would be in .xml description for entries generation, but for now it is too risky. I hope to get this fixed in 17 in the simplest way and do re-work in the 18. This change is still 100% correct for the reported bug. 
>> >> Verified by: >> >> jtreg -vmoption:-XX:+AssertWXAtThreadSync test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002 > > Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Subset of RawMonitor; comments added Marked as reviewed by sspitsyn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk17/pull/244 From github.com+58006833+xbzhang99 at openjdk.java.net Tue Jul 13 21:43:24 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Tue, 13 Jul 2021 21:43:24 GMT Subject: RFR: 8264543: Cross modify fence optimization for x86 [v2] In-Reply-To: <-uL2Xl3RD6-QYEUniGR5ur7YbGLVsjRZnqWdQhkUIQg=.26632121-76b8-41ea-afaa-626e9502af07@github.com> References: <-uL2Xl3RD6-QYEUniGR5ur7YbGLVsjRZnqWdQhkUIQg=.26632121-76b8-41ea-afaa-626e9502af07@github.com> Message-ID: <1CaQ3iFQHAc-aT_rU1d6Sw-9T-u_y4SH5C6yHTo3Sw4=.ccf681e7-06c7-4944-a80f-d52cde9f6932@github.com> On Thu, 27 May 2021 17:36:24 GMT, Xubo Zhang wrote: >> Intel introduced a new instruction "serialize" which ensures that all modifications to flags, registers, and memory by previous instructions are completed and all buffered writes are drained to memory before the next instruction is fetched and executed. It is a serializing instruction and can be used to implement cross modify fence (OrderAccess::cross_modify_fence_impl) more efficiently than using "cpuid" on supported 32-bit and 64-bit x86 platforms. >> >> The availability of the SERIALIZE instruction is indicated by the presence of the CPUID feature flag SERIALIZE, bit 14 of the EDX register in sub-leaf CPUID:7H.0H. >> >> https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html > > Xubo Zhang has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains one commit: > > 8264543: Using Intel serialize instruction to replace cpuid in Cross modify fence, on supported platforms > rebase with master I profiled VM exits for a simple C/asm program that executes the cpuid and serialize instructions inside a virtual machine. The results show that each cpuid caused a VM exit while serialize did not (excluding fixed overhead):

12000000 asm cpuid:
  VM-EXIT   Samples    Samples%
  CPUID     12000347   99.88%

12000000 asm serialize:
  VM-EXIT   Samples    Samples%
  CPUID     331        6.25%

This shows that replacing cpuid with serialize greatly reduces the number of VM exits, which benefits Java programs running in a virtualized environment. ------------- PR: https://git.openjdk.java.net/jdk/pull/3334 From pchilanomate at openjdk.java.net Tue Jul 13 22:30:11 2021 From: pchilanomate at openjdk.java.net (Patricio Chilano Mateo) Date: Tue, 13 Jul 2021 22:30:11 GMT Subject: RFR: 8215948: [TESTBUG] gtest pseudo-JavaThreads could be more regular JavaThreads In-Reply-To: References: Message-ID: On Wed, 7 Jul 2021 07:03:55 GMT, David Holmes wrote: > Please review this change to the gtest infrastructure that makes the test JavaThreads more regular JavaThreads, with the same lifecycle methods and having a regular j.l.Thread object. > > One test had to be modified slightly to create and start the threads outside of a code region where a Mutex is held. > > Testing: tiers 1-3, GHA > > Thanks, > David Changes look good to me. Thanks, Patricio ------------- Marked as reviewed by pchilanomate (Committer).
PR: https://git.openjdk.java.net/jdk/pull/4704 From svkamath at openjdk.java.net Tue Jul 13 23:46:40 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Tue, 13 Jul 2021 23:46:40 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v3] In-Reply-To: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: > I would like to submit AES-GCM optimization for x86_64 architectures supporting AVX3+VAES (Evex encoded AES). This optimization interleaves AES and GHASH operations. > Performance gain of ~1.5x - 2x for message sizes 8k and above. Smita Kamath has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - merge master - 8267125:Updated intrinsic signature to remove copies of counter, state and subkeyHtbl - Merge master - JDK-8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions ------------- Changes: https://git.openjdk.java.net/jdk/pull/4019/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4019&range=02 Stats: 935 lines in 21 files changed: 926 ins; 0 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/4019.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4019/head:pull/4019 PR: https://git.openjdk.java.net/jdk/pull/4019 From dholmes at openjdk.java.net Wed Jul 14 00:11:16 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 14 Jul 2021 00:11:16 GMT Subject: [jdk17] RFR: 8266889: [macosx-aarch64] Crash with SIGBUS in MarkActivationClosure::do_code_blob during vmTestbase/nsk/jvmti/.../bi04t002 test run [v2] In-Reply-To: <8qJPTLrfnYRnzcv9csluIhKDZW9SxNwoFsnMEAYPnW8=.5b36c87f-09af-4935-88e5-1538f4377c2c@github.com> References: 
<8qJPTLrfnYRnzcv9csluIhKDZW9SxNwoFsnMEAYPnW8=.5b36c87f-09af-4935-88e5-1538f4377c2c@github.com> Message-ID: <3haZTu2ZwYp0-RQ82bjglrUG_dKH0ookReMroI5-KDc=.08bf0e7e-7853-462f-943b-24629684d2f1@github.com> On Tue, 13 Jul 2021 10:13:39 GMT, Anton Kozlov wrote: >> The change adds W^X transition in RawMonitor family of functions, fixing the crash. RawMonitor functions are treated specially, so W^X transition is not inserted automatically. A better fix would be in .xml description for entries generation, but for now it is too risky. I hope to get this fixed in 17 in the simplest way and do re-work in the 18. This change is still 100% correct for the reported bug. >> >> Verified by: >> >> jtreg -vmoption:-XX:+AssertWXAtThreadSync test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002 > > Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Subset of RawMonitor; comments added Seems fine to address current problem, but we need a much clearer, more systematic approach to handling this (and enabling others to easily understand when and why it is needed). Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk17/pull/244 From jwilhelm at openjdk.java.net Wed Jul 14 00:22:49 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Wed, 14 Jul 2021 00:22:49 GMT Subject: RFR: Merge jdk17 Message-ID: <8R4n8NuSxB6bvN-nX-OpZ4gW1MKS-Q0B5eBNoIif7n0=.dd765279-93e5-47c9-9f1d-47e282e04e5f@github.com> Forwardport JDK 17 -> JDK 18 ------------- Commit messages: - Merge - 8270025: DynamicCallSiteDesc::withArgs doesn't throw NPE - 8270184: [TESTBUG] Add coverage for jvmci ResolvedJavaType.toJavaName() for lambdas - 8269281: java/foreign/Test{Down,Up}call.java time out - 8269635: Stress test SEGV while emitting OldObjectSample - 8269525: Deadlock during Volano with JFR - 8259848: Interim javadoc build does not support platform links - 8269795: C2: Out of bounds array load floats above its range check in loop peeling resulting in SEGV - 8270203: Missing build dependency between jdk.jfr-gendata and buildtools-hotspot The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=4771&range=00.0 - jdk17: https://webrevs.openjdk.java.net/?repo=jdk&pr=4771&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/4771/files Stats: 388 lines in 13 files changed: 322 ins; 18 del; 48 mod Patch: https://git.openjdk.java.net/jdk/pull/4771.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4771/head:pull/4771 PR: https://git.openjdk.java.net/jdk/pull/4771 From david.holmes at oracle.com Wed Jul 14 01:03:15 2021 From: david.holmes at oracle.com (David Holmes) Date: Wed, 14 Jul 2021 11:03:15 +1000 Subject: RFR: 8215948: [TESTBUG] gtest pseudo-JavaThreads could be more regular JavaThreads In-Reply-To: References: Message-ID: <4bbce4b4-7185-7df1-89dc-6eff773ee811@oracle.com> On 14/07/2021 8:30 am, Patricio Chilano Mateo wrote: > On Wed, 7 Jul 2021 07:03:55 GMT, David Holmes wrote: > >> Please review this change to the gtest infrastructure that makes the test 
JavaThreads more regular JavaThreads, with the same lifecycle methods and having a regular j.l.Thread object. >> >> One test had to be modified slightly to create and start the threads outside of a code region where a Mutex is held. >> >> Testing: tiers 1-3, GHA >> >> Thanks, >> David > > Changes look good to me. Thanks Patricio! David > Thanks, > Patricio > > ------------- > > Marked as reviewed by pchilanomate (Committer). > > PR: https://git.openjdk.java.net/jdk/pull/4704 > From dholmes at openjdk.java.net Wed Jul 14 01:08:16 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 14 Jul 2021 01:08:16 GMT Subject: Integrated: 8215948: [TESTBUG] gtest pseudo-JavaThreads could be more regular JavaThreads In-Reply-To: References: Message-ID: <9JSueOhxGQTNFHhiBk8j3FbBPPaBRG2yQ73hjWtft2s=.5bac9b8d-7ef3-4d30-83f0-12898d524c94@github.com> On Wed, 7 Jul 2021 07:03:55 GMT, David Holmes wrote: > Please review this change to the gtest infrastructure that makes the test JavaThreads more regular JavaThreads, with the same lifecycle methods and having a regular j.l.Thread object. > > One test had to be modified slightly to create and start the threads outside of a code region where a Mutex is held. > > Testing: tiers 1-3, GHA > > Thanks, > David This pull request has now been integrated. 
Changeset: 770e2aa3 Author: David Holmes URL: https://git.openjdk.java.net/jdk/commit/770e2aa3c6a2bbbc578e60dc2b11300344863e70 Stats: 108 lines in 2 files changed: 43 ins; 38 del; 27 mod 8215948: [TESTBUG] gtest pseudo-JavaThreads could be more regular JavaThreads Reviewed-by: coleenp, pchilanomate ------------- PR: https://git.openjdk.java.net/jdk/pull/4704 From jwilhelm at openjdk.java.net Wed Jul 14 01:10:29 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Wed, 14 Jul 2021 01:10:29 GMT Subject: Integrated: Merge jdk17 In-Reply-To: <8R4n8NuSxB6bvN-nX-OpZ4gW1MKS-Q0B5eBNoIif7n0=.dd765279-93e5-47c9-9f1d-47e282e04e5f@github.com> References: <8R4n8NuSxB6bvN-nX-OpZ4gW1MKS-Q0B5eBNoIif7n0=.dd765279-93e5-47c9-9f1d-47e282e04e5f@github.com> Message-ID: <5EuAHb-D3XlzTzriaMpMRYvuN7F7Rsga91-2nbN_X1s=.639bf5f5-df8a-4dbf-bd9c-efe2b21a5cbe@github.com> On Wed, 14 Jul 2021 00:13:41 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 17 -> JDK 18 This pull request has now been integrated. Changeset: 4a7ccf36 Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/4a7ccf36e9a3978c437db3efe892dd23e8a0b772 Stats: 388 lines in 13 files changed: 322 ins; 18 del; 48 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/4771 From dholmes at openjdk.java.net Wed Jul 14 01:20:26 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 14 Jul 2021 01:20:26 GMT Subject: RFR: 8264543: Cross modify fence optimization for x86 [v2] In-Reply-To: <-uL2Xl3RD6-QYEUniGR5ur7YbGLVsjRZnqWdQhkUIQg=.26632121-76b8-41ea-afaa-626e9502af07@github.com> References: <-uL2Xl3RD6-QYEUniGR5ur7YbGLVsjRZnqWdQhkUIQg=.26632121-76b8-41ea-afaa-626e9502af07@github.com> Message-ID: On Thu, 27 May 2021 17:36:24 GMT, Xubo Zhang wrote: >> Intel introduced a new instruction "serialize"
which ensures that all modifications to flags, registers, and memory by previous instructions are completed and all buffered writes are drained to memory before the next instruction is fetched and executed. It is a serializing instruction and can be used to implement cross modify fence (OrderAccess::cross_modify_fence_impl) more efficiently than using "cpuid" on supported 32-bit and 64-bit x86 platforms. >> >> The availability of the SERIALIZE instruction is indicated by the presence of the CPUID feature flag SERIALIZE, bit 14 of the EDX register in sub-leaf CPUID:7H.0H. >> >> https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html > > Xubo Zhang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8264543: Using Intel serialize instruction to replace cpuid in Cross modify fence, on supported platforms > rebase with master Hi, Thanks for the additional benchmarking numbers. This seems fine in principle as previously stated but should be done for all x86 not just Linux. Other changes requested below. Thanks, David src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp line 461: > 459: } > 460: > 461: bool os::supports_serialize(){ This function is unnecessary and pollutes the OS namespace with something that is not OS related. Just use VM_Version::supports_serialize() directly. ------------- Changes requested by dholmes (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3334 From akozlov at openjdk.java.net Wed Jul 14 10:39:11 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 14 Jul 2021 10:39:11 GMT Subject: [jdk17] Integrated: 8266889: [macosx-aarch64] Crash with SIGBUS in MarkActivationClosure::do_code_blob during vmTestbase/nsk/jvmti/.../bi04t002 test run In-Reply-To: References: Message-ID: On Mon, 12 Jul 2021 20:22:25 GMT, Anton Kozlov wrote: > The change adds W^X transition in RawMonitor family of functions, fixing the crash. RawMonitor functions are treated specially, so W^X transition is not inserted automatically. A better fix would be in .xml description for entries generation, but for now it is too risky. I hope to get this fixed in 17 in the simplest way and do re-work in the 18. This change is still 100% correct for the reported bug. > > Verified by: > > jtreg -vmoption:-XX:+AssertWXAtThreadSync test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002 This pull request has now been integrated. 
Changeset: 381bd621 Author: Anton Kozlov URL: https://git.openjdk.java.net/jdk17/commit/381bd621074a13cc2f260c18371c956bc48abd4d Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod 8266889: [macosx-aarch64] Crash with SIGBUS in MarkActivationClosure::do_code_blob during vmTestbase/nsk/jvmti/.../bi04t002 test run Reviewed-by: dholmes, aph, sspitsyn ------------- PR: https://git.openjdk.java.net/jdk17/pull/244 From akozlov at openjdk.java.net Wed Jul 14 10:39:10 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 14 Jul 2021 10:39:10 GMT Subject: [jdk17] RFR: 8266889: [macosx-aarch64] Crash with SIGBUS in MarkActivationClosure::do_code_blob during vmTestbase/nsk/jvmti/.../bi04t002 test run [v2] In-Reply-To: <8qJPTLrfnYRnzcv9csluIhKDZW9SxNwoFsnMEAYPnW8=.5b36c87f-09af-4935-88e5-1538f4377c2c@github.com> References: <8qJPTLrfnYRnzcv9csluIhKDZW9SxNwoFsnMEAYPnW8=.5b36c87f-09af-4935-88e5-1538f4377c2c@github.com> Message-ID: On Tue, 13 Jul 2021 10:13:39 GMT, Anton Kozlov wrote: >> The change adds W^X transition in RawMonitor family of functions, fixing the crash. RawMonitor functions are treated specially, so W^X transition is not inserted automatically. A better fix would be in .xml description for entries generation, but for now it is too risky. I hope to get this fixed in 17 in the simplest way and do re-work in the 18. This change is still 100% correct for the reported bug. >> >> Verified by: >> >> jtreg -vmoption:-XX:+AssertWXAtThreadSync test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002 > > Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Subset of RawMonitor; comments added Thanks for the reviews. It's my fault that I have not explained how W^X is supposed to work. I still do not consider this a code smell, but since someone thinks so, it's my bad. I'm going to create a page on cr.openjdk.java.net that will be a memo of the current state.
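For readers following the W^X discussion, the general shape of a scoped transition can be sketched as below. This is an illustration only, not the HotSpot implementation: `WXMode`, `current_wx_mode`, `ScopedWX`, and the two functions are hypothetical names, the per-thread state is simulated with a plain variable, and the assumption that the entry point needs write mode is made purely for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for a per-thread W^X state.
enum class WXMode { Exec, Write };

// Simulated per-thread state. A real implementation on macOS/AArch64 would
// change the thread's JIT memory permission (for example through
// pthread_jit_write_protect_np) rather than assign to a variable.
thread_local WXMode current_wx_mode = WXMode::Exec;

// RAII guard: switches the current thread to `mode` and restores the
// previous mode when the scope ends, including on early return.
class ScopedWX {
 public:
  explicit ScopedWX(WXMode mode) : _saved(current_wx_mode) {
    current_wx_mode = mode;
  }
  ~ScopedWX() { current_wx_mode = _saved; }
  ScopedWX(const ScopedWX&) = delete;
  ScopedWX& operator=(const ScopedWX&) = delete;

 private:
  WXMode _saved;
};

// Hypothetical operation that is only valid in Write mode. The assert plays
// the role of a state check at a known synchronization point.
void work_requiring_write_mode() {
  assert(current_wx_mode == WXMode::Write);
}

// Hypothetical entry point that is not covered by a common wrapper and so
// must perform the transition itself.
WXMode entry_sketch() {
  ScopedWX wx(WXMode::Write);
  work_requiring_write_mode();
  return current_wx_mode;
}
```

The guard makes the transition hard to leave unbalanced, but it does not remove the coupling raised in review: every entry point that bypasses the common path still has to remember to create one.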
------------- PR: https://git.openjdk.java.net/jdk17/pull/244 From aw at openjdk.java.net Wed Jul 14 17:36:21 2021 From: aw at openjdk.java.net (Andreas Woess) Date: Wed, 14 Jul 2021 17:36:21 GMT Subject: Integrated: 8269592: [JVMCI] Optimize c2v_iterateFrames In-Reply-To: References: Message-ID: On Tue, 29 Jun 2021 17:26:58 GMT, Andreas Woess wrote: > Several smaller optimizations and cleanups to JVMCI's iterateFrames: > * Restructure the iterateFrames method for better readability and maintenance, with some parts extracted to helper functions. > * Use vframeStream as the iterator for faster iteration in case not every vframe matches the method filter, so we can avoid creating javaVFrames for skipped vframes. We use vframeStream::asJavaVFrame() to get the current javaVFrame. > * Extended vframeStream::asJavaVFrame() to also work with native frames, so that it works with all java frames returned by vframeStream. This way, native compiledVFrames will just work and do not need extra handling. > Test coverage is provided via a newly added iterateFrames jtreg test that includes a JNI call on the stack. > * Added two trivial getters to vframeStream: vframe_id() and decode_offset(). > These are used together with compiledVFrame::at_scope() to avoid going through vframeStream::asJavaVFrame() and recreating the scope objects for every matched inlined vframe of a compiled frame which would be more expensive than using javaVFrame::sender() (that shares the scope object pool). > * Only resolve the callback interface method once per iterateFrames call. > * Only resolve the Method* of the ResolvedJavaMethods to be matched once per iterateFrames call. > * Only allocate localIsVirtual array if at least one local is virtual (the Java part already expects this). > * Use matched ResolvedJavaMethod instances instead of going through JVMCIEnv::get_jvmci_method, if possible. This pull request has now been integrated. 
Changeset: b1bb05bc Author: Andreas Woess Committer: Tom Rodriguez URL: https://git.openjdk.java.net/jdk/commit/b1bb05bcf4956f38d6e1a15bcfbed92154ba85a2 Stats: 461 lines in 6 files changed: 337 ins; 55 del; 69 mod 8269592: [JVMCI] Optimize c2v_iterateFrames Reviewed-by: kvn, never, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/4625 From svkamath at openjdk.java.net Wed Jul 14 21:02:01 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Wed, 14 Jul 2021 21:02:01 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: > I would like to submit AES-GCM optimization for x86_64 architectures supporting AVX3+VAES (Evex encoded AES). This optimization interleaves AES and GHASH operations. > Performance gain of ~1.5x - 2x for message sizes 8k and above. 
Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Updated AES-GCM intrinsic to match latest Java Code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4019/files - new: https://git.openjdk.java.net/jdk/pull/4019/files/4b6e881e..4a36816f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4019&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4019&range=02-03 Stats: 469 lines in 11 files changed: 226 ins; 128 del; 115 mod Patch: https://git.openjdk.java.net/jdk/pull/4019.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4019/head:pull/4019 PR: https://git.openjdk.java.net/jdk/pull/4019 From duke at openjdk.java.net Wed Jul 14 21:12:17 2021 From: duke at openjdk.java.net (duke) Date: Wed, 14 Jul 2021 21:12:17 GMT Subject: Withdrawn: 8266936: Add a finalization JFR event In-Reply-To: References: Message-ID: On Tue, 18 May 2021 20:55:10 GMT, Brent Christian wrote: > Please review this enhancement to add a new JFR event, generated whenever a finalizer is run. > (The makeup is similar to the Deserialization event, [JDK-8261160](https://bugs.openjdk.java.net/browse/JDK-8261160).) > > The event's only datum (beyond those common to all jfr events) is the class of the object that was finalized. > > The Category for the event: > `"Java Virtual Machine" / "GC" / "Finalization"` > is what made sense to me, even though the event is generated from library code. > > Along with the new regtest, I added a run mode to the basic finalizer test to enable jfr. > Automated testing looks good so far. > > Thanks, > -Brent This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4101 From jwilhelm at openjdk.java.net Wed Jul 14 21:43:43 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Wed, 14 Jul 2021 21:43:43 GMT Subject: RFR: Merge jdk17 Message-ID: Forwardport JDK 17 -> JDK 18 ------------- Commit messages: - Merge - 8270422: Test build/AbsPathsInImage.java fails after JDK-8259848 - 8266313: (JEP-356) - RandomGenerator spec implementation requirements tightly coupled to JDK internal classes - 8270075: SplittableRandom extends AbstractSplittableGenerator - 8266889: [macosx-aarch64] Crash with SIGBUS in MarkActivationClosure::do_code_blob during vmTestbase/nsk/jvmti/.../bi04t002 test run - 8259499: Handling type arguments from outer classes for inner class in javadoc - 8268620: InfiniteLoopException test may fail on x86 platforms - 8269865: Async UL needs to handle ERANGE on exceeding SEM_VALUE_MAX - 8270056: Generated lambda class can not access protected static method of target class The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=4786&range=00.0 - jdk17: https://webrevs.openjdk.java.net/?repo=jdk&pr=4786&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/4786/files Stats: 328 lines in 19 files changed: 240 ins; 14 del; 74 mod Patch: https://git.openjdk.java.net/jdk/pull/4786.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4786/head:pull/4786 PR: https://git.openjdk.java.net/jdk/pull/4786 From jwilhelm at openjdk.java.net Wed Jul 14 22:41:17 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Wed, 14 Jul 2021 22:41:17 GMT Subject: Integrated: Merge jdk17 In-Reply-To: References: Message-ID: On Wed, 14 Jul 2021 21:35:34 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 17 -> JDK 18 This pull request has now been integrated. 
Changeset: 7d0edb57 Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/7d0edb5743aacfc22f76ee8aa7b03d7dc0f90dca Stats: 328 lines in 19 files changed: 240 ins; 14 del; 74 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/4786 From kvn at openjdk.java.net Thu Jul 15 00:22:14 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 15 Jul 2021 00:22:14 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Wed, 14 Jul 2021 21:02:01 GMT, Smita Kamath wrote: >> I would like to submit AES-GCM optimization for x86_64 architectures supporting AVX3+VAES (Evex encoded AES). This optimization interleaves AES and GHASH operations. >> Performance gain of ~1.5x - 2x for message sizes 8k and above. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated AES-GCM intrinsic to match latest Java Code Looks like you have some issues: wrong file property. ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From kvn at openjdk.java.net Thu Jul 15 00:44:15 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 15 Jul 2021 00:44:15 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Wed, 14 Jul 2021 21:02:01 GMT, Smita Kamath wrote: >> I would like to submit AES-GCM optimization for x86_64 architectures supporting AVX3+VAES (Evex encoded AES). This optimization interleaves AES and GHASH operations. >> Performance gain of ~1.5x - 2x for message sizes 8k and above. 
> > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated AES-GCM intrinsic to match latest Java Code First, you need a review from Tony for the Java-side changes. Second, you need to extend the tests in `test/hotspot/jtreg/compiler/codegen/aes/` to cover this implementation. And third, I think we need to put this on hold until the issue of the effect that generating big intrinsic stubs has on startup is solved. See the discussion in https://bugs.openjdk.java.net/browse/JDK-8270323

- code_size1 = 20000 LP64_ONLY(+10000), // simply increase if too small (assembler will crash if too small)
- code_size2 = 35300 LP64_ONLY(+25000) // simply increase if too small (assembler will crash if too small)
+ code_size1 = 20000 LP64_ONLY(+12000), // simply increase if too small (assembler will crash if too small)
+ code_size2 = 35300 LP64_ONLY(+37000) // simply increase if too small (assembler will crash if too small)

@sviswa7 please note these changes too for our discussion. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 7644: > 7642: } > 7643: if (UseAESIntrinsics) { > 7644: if (VM_Version::supports_avx512_vaes() && VM_Version::supports_avx512vl() && VM_Version::supports_avx512dq()) { Why duplicate the already existing checks? Move the code there and add a comment saying which intrinsic the code is generated for. src/hotspot/cpu/x86/stubRoutines_x86.hpp line 36: > 34: enum platform_dependent_constants { > 35: code_size1 = 20000 LP64_ONLY(+12000), // simply increase if too small (assembler will crash if too small) > 36: code_size2 = 35300 LP64_ONLY(+37000) // simply increase if too small (assembler will crash if too small) This is almost a 50% increase!
src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 333: > 331: static_field(StubRoutines, _bigIntegerRightShiftWorker, address) \ > 332: static_field(StubRoutines, _bigIntegerLeftShiftWorker, address) \ > 333: static_field(StubRoutines, _galoisCounterMode_AESCrypt, address) \ Move up to other AESCrypt lines. src/hotspot/share/opto/escape.cpp line 1111: > 1109: strcmp(call->as_CallLeaf()->_name, "bigIntegerLeftShiftWorker") == 0 || > 1110: strcmp(call->as_CallLeaf()->_name, "vectorizedMismatch") == 0 || > 1111: strcmp(call->as_CallLeaf()->_name, "galoisCounterMode_AESCrypt") == 0 || Please, move new line where other AEScrypt methods listed. src/hotspot/share/runtime/stubRoutines.cpp line 130: > 128: address StubRoutines::_base64_encodeBlock = NULL; > 129: address StubRoutines::_base64_decodeBlock = NULL; > 130: address StubRoutines::_galoisCounterMode_AESCrypt = NULL; Move up few lines src/hotspot/share/runtime/stubRoutines.hpp line 212: > 210: static address _base64_encodeBlock; > 211: static address _base64_decodeBlock; > 212: static address _galoisCounterMode_AESCrypt; Move up few lines src/hotspot/share/runtime/vmStructs.cpp line 592: > 590: static_field(StubRoutines, _unsafe_arraycopy, address) \ > 591: static_field(StubRoutines, _generic_arraycopy, address) \ > 592: static_field(StubRoutines, _galoisCounterMode_AESCrypt, address) \ Move up to other AESCrypt declarations. ------------- Changes requested by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4019 From github.com+58006833+xbzhang99 at openjdk.java.net Thu Jul 15 07:27:18 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Thu, 15 Jul 2021 07:27:18 GMT Subject: RFR: 8264543: Cross modify fence optimization for x86 [v2] In-Reply-To: References: <-uL2Xl3RD6-QYEUniGR5ur7YbGLVsjRZnqWdQhkUIQg=.26632121-76b8-41ea-afaa-626e9502af07@github.com> Message-ID: On Wed, 14 Jul 2021 01:14:37 GMT, David Holmes wrote: >> Xubo Zhang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> 8264543: Using Intel serialize instruction to replace cpuid in Cross modify fence, on supported platforms >> rebase with master > > src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp line 461: > >> 459: } >> 460: >> 461: bool os::supports_serialize(){ > > This function is unnecessary and pollutes the OS namespace with something that is not OS related. Just use VM_Version::supports_serialize() directly. orderAccess_linux_x86.hpp is included in orderAccess.hpp, atomic.hpp, etc. VM_Version is not defined in any of the nested header files. If I add inclusion of vm_version.hpp in any of these nested head filer, it will be messy. Not sure about the best solution here. 
------------- PR: https://git.openjdk.java.net/jdk/pull/3334 From dholmes at openjdk.java.net Thu Jul 15 07:56:12 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 15 Jul 2021 07:56:12 GMT Subject: RFR: 8264543: Cross modify fence optimization for x86 [v2] In-Reply-To: References: <-uL2Xl3RD6-QYEUniGR5ur7YbGLVsjRZnqWdQhkUIQg=.26632121-76b8-41ea-afaa-626e9502af07@github.com> Message-ID: <66oSsTIKRNq28BftbfhSyUTujvowAfeGi77FFa0_ZLw=.05bc1b5a-e712-4722-a860-18a409b697dd@github.com> On Thu, 15 Jul 2021 07:24:04 GMT, Xubo Zhang wrote: >> src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp line 461: >> >>> 459: } >>> 460: >>> 461: bool os::supports_serialize(){ >> >> This function is unnecessary and pollutes the OS namespace with something that is not OS related. Just use VM_Version::supports_serialize() directly. > > orderAccess_linux_x86.hpp is included in orderAccess.hpp, atomic.hpp, etc. VM_Version is not defined in any of the nested header files. If I add inclusion of vm_version.hpp in any of these nested head filer, it will be messy. > Not sure about the best solution here. What actually happens if you just include vm_version.hpp in orderAccess_linux_x86.hpp ? ------------- PR: https://git.openjdk.java.net/jdk/pull/3334 From mbaesken at openjdk.java.net Thu Jul 15 13:09:54 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Thu, 15 Jul 2021 13:09:54 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v4] In-Reply-To: References: Message-ID: > Hello, please review this PR; it extend the OSContainer API in order to also support the pids controller of cgroups. 
> > I noticed that unlike the other controllers "cpu", "cpuset", "cpuacct", "memory" on some older Linux distros (SLES 12.1, RHEL 7.1) the pids controller might not be there (or not fully supported) so it was added as optional , see the coding > > > if (!cg_infos[PIDS_IDX]._data_complete) { > log_debug(os, container)("Optional cgroup v1 pids subsystem not found"); > // keep the other controller info, pids is optional > } Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: Add hotspot tests ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4518/files - new: https://git.openjdk.java.net/jdk/pull/4518/files/f5527143..3fe73c3c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4518&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4518&range=02-03 Stats: 117 lines in 3 files changed: 116 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4518.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4518/head:pull/4518 PR: https://git.openjdk.java.net/jdk/pull/4518 From mbaesken at openjdk.java.net Thu Jul 15 13:24:19 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Thu, 15 Jul 2021 13:24:19 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v2] In-Reply-To: References: <4TD_2jJOnOQ6-D2eCFdJzF3tQg_H-Vm6IrFcyX_xSIw=.028fbe3f-bc04-4b9c-8b35-a6a450a80f7f@github.com> Message-ID: On Fri, 9 Jul 2021 13:53:27 GMT, Matthias Baesken wrote: > > OK. Please also add a test on the hotspot side. You may want to add relevant parts to `TestMisc.java`. > > Thanks for the suggestion, I will look into TestMisc.java . I added some HS testing code in the latest commit. 
Best regards, Matthias ------------- PR: https://git.openjdk.java.net/jdk/pull/4518 From sgehwolf at openjdk.java.net Thu Jul 15 13:42:17 2021 From: sgehwolf at openjdk.java.net (Severin Gehwolf) Date: Thu, 15 Jul 2021 13:42:17 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v4] In-Reply-To: References: Message-ID: <3VJ1EMr_QoFConUexuO8Z1pUGuLoIKiNAlJyFT0HKHw=.e2904647-69c9-418e-b0be-db252d439e30@github.com> On Thu, 15 Jul 2021 13:09:54 GMT, Matthias Baesken wrote: >> Hello, please review this PR; it extend the OSContainer API in order to also support the pids controller of cgroups. >> >> I noticed that unlike the other controllers "cpu", "cpuset", "cpuacct", "memory" on some older Linux distros (SLES 12.1, RHEL 7.1) the pids controller might not be there (or not fully supported) so it was added as optional , see the coding >> >> >> if (!cg_infos[PIDS_IDX]._data_complete) { >> log_debug(os, container)("Optional cgroup v1 pids subsystem not found"); >> // keep the other controller info, pids is optional >> } > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Add hotspot tests Thanks, I'll test it and review once again. Meanwhile, please merge the master branch into your `JDK-8266490` branch and push so that we get a better ruling of pre-integration tests. ------------- PR: https://git.openjdk.java.net/jdk/pull/4518 From rkennke at openjdk.java.net Thu Jul 15 14:44:21 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 15 Jul 2021 14:44:21 GMT Subject: RFR: 8270554: Shenandoah: Optimize heap scan loop Message-ID: This is a fall-out from Lilliput. I noticed that in the heap scan loop, we load the size of objects in the size-based object scanner, even though all of the object closures already load the size, or at least in some cases, the Klass* which is necessary to determine the size. 
We can optimize that by making the scan loop co-operate with the closures. In other words, this changes the loop to avoid double-loading the Klass* in most cases in the size-based part of the scan loop.

Note: the motivation in Lilliput is not performance, but correctness, because there loading the Klass* means loading the header, and this needs to be done carefully because of concurrent evacuation and concurrent locking code both messing with the header, and thus depends a lot on the actual closures to do it correctly.

Implementation notes:
- SH::evacuate_object() has been changed so that it can return both the forwardee and the size. I opted to return the size as return-value because otherwise I'd have to null check an incoming pointer in the cases when we're not interested in the size. The way it is done, it can simply be ignored (and optimized-out) by the compiler.
- I added a do_object_size() variant to all affected iterators. I tried to do it with templates, but could not figure out how to please the compiler.
- While I was at it, I marked all do_object() methods as 'inline'.
- I ran some benchmarks. I think I see consistent but small improvements in evac and update-refs times, but it's not large enough to say that it is a definite improvement.
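
To illustrate the idea behind the do_object_size() variant, here is a minimal sketch on a toy heap. It is not the Shenandoah code: `Heap`, `load_size`, `CountingClosure`, `scan_with_reload` and `scan_cooperative` are hypothetical names, and reading the first word of an object stands in for loading the Klass* and deriving the size from it. It assumes every object size is at least one word.

```cpp
#include <cstddef>
#include <vector>

// Toy heap: each object starts with one header word holding its size in
// words (at least 1). Reading it stands in for "load Klass*, derive size".
using Heap = std::vector<std::size_t>;

static int header_loads = 0;  // counts simulated header/Klass* loads

static std::size_t load_size(const std::size_t* obj) {
  ++header_loads;
  return *obj;
}

// Hypothetical closure offering both entry points.
struct CountingClosure {
  int visited = 0;

  // Classic variant: loads the size for its own use but does not expose it.
  void do_object(const std::size_t* obj) {
    (void)load_size(obj);
    ++visited;
  }

  // Cooperative variant: returns the size it had to load anyway.
  std::size_t do_object_size(const std::size_t* obj) {
    std::size_t size = load_size(obj);
    ++visited;
    return size;
  }
};

// Scan loop that loads the size a second time to find the next object.
template <typename Closure>
void scan_with_reload(const Heap& heap, Closure& cl) {
  for (std::size_t i = 0; i < heap.size(); ) {
    cl.do_object(&heap[i]);
    i += load_size(&heap[i]);
  }
}

// Scan loop that advances by the size the closure reports.
template <typename Closure>
void scan_cooperative(const Heap& heap, Closure& cl) {
  for (std::size_t i = 0; i < heap.size(); ) {
    i += cl.do_object_size(&heap[i]);
  }
}
```

On a heap holding three objects, scan_with_reload performs two simulated loads per object while scan_cooperative performs one. Returning the size by value also means a caller that does not need it can simply ignore it.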
Testing: - [x] hotspot_gc_shenandoah - [ ] tier1 (+UseShenandoahGC) - [x] specjvm testing ------------- Commit messages: - 8270554: Shenandoah: Optimize heap scan loop Changes: https://git.openjdk.java.net/jdk/pull/4797/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4797&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8270554 Stats: 92 lines in 10 files changed: 59 ins; 3 del; 30 mod Patch: https://git.openjdk.java.net/jdk/pull/4797.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4797/head:pull/4797 PR: https://git.openjdk.java.net/jdk/pull/4797 From ogatak at openjdk.java.net Thu Jul 15 17:22:15 2021 From: ogatak at openjdk.java.net (Kazunori Ogata) Date: Thu, 15 Jul 2021 17:22:15 GMT Subject: RFR: JDK-8270340: Base64 decodeBlock intrinsic for Power64 needs cleanup In-Reply-To: References: Message-ID: On Tue, 13 Jul 2021 04:28:47 GMT, Corey Ashford wrote: > This series of commits was written to accomplish several cleanup and Power10 optimization tasks for the Base64 decodeBlock intrinsic for Power64: > * Remove the ISA 3.1+ (Power10+) pextd instruction optimization in decodeBlock. This "optimization" turned out to actually cause a performance hit. Removing it gains back about 3% in performance. > * Introduce a constant block, similar to that in use by encodeBlock() to speed up constant loading. > * Add the ISA 3.1+ xxpermx instruction and align_prefix() method for use in a Power10 optimization for decodeBlock. Please see the commit log for my concerns about this change. > * Implement the xxpermx-based decodeBlock algorithm for Power10+, which gives about a 5% performance boost. > > More details can be found in the commit logs. > > I want to note here that I looked into changing the loop_unrolls constant, and found that at large buffer sizes, the values of 2 and 4 give some extra performance gain. 
For example, on a 20001-byte destination buffer, I see an increase from 4.7X over intrinsic disabled (loop_unrolls=1), to 5.3X over intrinsic disabled (loop_unrolls=4), but on smaller buffer sizes, up to about 512, it causes performance degradation over loop_unrolls=1, so I have decided to stick with the original value of 1, since I don't know where to focus the performance versus buffer length tradeoff. src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3869: > 3867: VectorSRegister M = VSR8; > 3868: > 3869: // P10+ VSR lookup variables "P10+" should better be "P10 or above" (or something grammatically correct) because it appears like an enhanced version of P10, like P7+. Although IBM didn't released "+" models after P8, they have manufactured with reduced semiconductor process and apply enhancements, including larger caches, higher clock frequency, etc. src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3995: > 3993: __ align_prefix(); __ xxpermx(xlate_b, table_64_79, table_80_95, input->to_vsr(), 2); > 3994: __ xxlor(xlate_b, xlate_a, xlate_b); > 3995: __ align_prefix(); __ xxpermx(xlate_a, table_96_111, table_112_127, input->to_vsr(), 3); This is just a reminder comment. align_prefix() does nothing when loop_unrolls is 1 because the label unrolled_loop_start is aligned to a 32-byte boundary and these three prefixed instructions are within 32bytes after the label, and thus, they never cross a 64-byte boundary. Since the comment above mentions that the best unrolling factor here is 1, align_prefix() are just safe guards for the case someone try to unroll this loop. src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 4038: > 4036: __ bne_predict_not_taken(CCR6, unrolled_loop_exit); > 4037: > 4038: // The Base64 characters had no errors, so add the offsets It may be helpful to add comments for P10 case, where each byte of input contains (decoded 6-bit binary | 0x80) and offsets contains 0x80, and vaddubm is used to clear the MSB. 
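
The arithmetic behind that suggested comment can be checked with a small sketch. This models a single byte lane only; `add_byte_modulo` and `clear_tag` are hypothetical helper names, not part of the stub generator. It relies on vaddubm adding bytes modulo 256.

```cpp
#include <cstdint>

// Models one byte lane of a modulo-256 vector add (what vaddubm does per byte).
constexpr std::uint8_t add_byte_modulo(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a + b);
}

// Hypothetical helper: a lookup result is (6-bit value | 0x80); adding an
// offset byte of 0x80 wraps past 0xFF and leaves only the 6-bit value.
constexpr std::uint8_t clear_tag(std::uint8_t tagged) {
  return add_byte_modulo(tagged, 0x80);
}

static_assert(clear_tag(0x80 | 0x00) == 0x00, "lowest 6-bit value");
static_assert(clear_tag(0x80 | 0x3F) == 0x3F, "highest 6-bit value");
```

Since (v | 0x80) + 0x80 equals v + 0x100 for any v below 0x80, the carry falls out of the byte and the MSB ends up cleared.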
------------- PR: https://git.openjdk.java.net/jdk/pull/4762 From cashford at openjdk.java.net Thu Jul 15 17:31:19 2021 From: cashford at openjdk.java.net (Corey Ashford) Date: Thu, 15 Jul 2021 17:31:19 GMT Subject: RFR: JDK-8270340: Base64 decodeBlock intrinsic for Power64 needs cleanup In-Reply-To: References: Message-ID: On Thu, 15 Jul 2021 17:02:43 GMT, Kazunori Ogata wrote: >> This series of commits was written to accomplish several cleanup and Power10 optimization tasks for the Base64 decodeBlock intrinsic for Power64: >> * Remove the ISA 3.1+ (Power10+) pextd instruction optimization in decodeBlock. This "optimization" turned out to actually cause a performance hit. Removing it gains back about 3% in performance. >> * Introduce a constant block, similar to that in use by encodeBlock() to speed up constant loading. >> * Add the ISA 3.1+ xxpermx instruction and align_prefix() method for use in a Power10 optimization for decodeBlock. Please see the commit log for my concerns about this change. >> * Implement the xxpermx-based decodeBlock algorithm for Power10+, which gives about a 5% performance boost. >> >> More details can be found in the commit logs. >> >> I want to note here that I looked into changing the loop_unrolls constant, and found that at large buffer sizes, the values of 2 and 4 give some extra performance gain. For example, on a 20001-byte destination buffer, I see an increase from 4.7X over intrinsic disabled (loop_unrolls=1), to 5.3X over intrinsic disabled (loop_unrolls=4), but on smaller buffer sizes, up to about 512, it causes performance degradation over loop_unrolls=1, so I have decided to stick with the original value of 1, since I don't know where to focus the performance versus buffer length tradeoff. 
> > src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3869: > >> 3867: VectorSRegister M = VSR8; >> 3868: >> 3869: // P10+ VSR lookup variables > > "P10+" should better be "P10 or above" (or something grammatically correct) because it appears like an enhanced version of P10, like P7+. Although IBM didn't released "+" models after P8, they have manufactured with reduced semiconductor process and apply enhancements, including larger caches, higher clock frequency, etc. Good point. Will fix. > src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3995: > >> 3993: __ align_prefix(); __ xxpermx(xlate_b, table_64_79, table_80_95, input->to_vsr(), 2); >> 3994: __ xxlor(xlate_b, xlate_a, xlate_b); >> 3995: __ align_prefix(); __ xxpermx(xlate_a, table_96_111, table_112_127, input->to_vsr(), 3); > > This is just a reminder comment. align_prefix() does nothing when loop_unrolls is 1 because the label unrolled_loop_start is aligned to a 32-byte boundary and these three prefixed instructions are within 32bytes after the label, and thus, they never cross a 64-byte boundary. Since the comment above mentions that the best unrolling factor here is 1, align_prefix() are just safe guards for the case someone try to unroll this loop. That's a good point. Initially I had removed the align_prefix() calls for that reason and added a comment about why they weren't needed, but it burnt me when I was testing an unroll value of 4. I will put in a comment like you asked. > src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 4038: > >> 4036: __ bne_predict_not_taken(CCR6, unrolled_loop_exit); >> 4037: >> 4038: // The Base64 characters had no errors, so add the offsets > > It may be helpful to add comments for P10 case, where each byte of input contains (decoded 6-bit binary | 0x80) and offsets contains 0x80, and vaddubm is used to clear the MSB. Good point. I'll update the comment. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4762 From github.com+58006833+xbzhang99 at openjdk.java.net Thu Jul 15 18:15:15 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Thu, 15 Jul 2021 18:15:15 GMT Subject: RFR: 8264543: Cross modify fence optimization for x86 [v2] In-Reply-To: <66oSsTIKRNq28BftbfhSyUTujvowAfeGi77FFa0_ZLw=.05bc1b5a-e712-4722-a860-18a409b697dd@github.com> References: <-uL2Xl3RD6-QYEUniGR5ur7YbGLVsjRZnqWdQhkUIQg=.26632121-76b8-41ea-afaa-626e9502af07@github.com> <66oSsTIKRNq28BftbfhSyUTujvowAfeGi77FFa0_ZLw=.05bc1b5a-e712-4722-a860-18a409b697dd@github.com> Message-ID: On Thu, 15 Jul 2021 07:53:28 GMT, David Holmes wrote: >> orderAccess_linux_x86.hpp is included in orderAccess.hpp, atomic.hpp, etc. VM_Version is not defined in any of the nested header files. If I add inclusion of vm_version.hpp in any of these nested head filer, it will be messy. >> Not sure about the best solution here. > > What actually happens if you just include vm_version.hpp in orderAccess_linux_x86.hpp ? it will give me errors, like 'Atomic' has not been declared. Also, the cross_modify_fence_impl actually is os-dependent. Linux and Windows have different implementations. ------------- PR: https://git.openjdk.java.net/jdk/pull/3334 From mdoerr at openjdk.java.net Thu Jul 15 18:51:10 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 15 Jul 2021 18:51:10 GMT Subject: RFR: JDK-8270340: Base64 decodeBlock intrinsic for Power64 needs cleanup In-Reply-To: References: Message-ID: On Thu, 15 Jul 2021 17:27:04 GMT, Corey Ashford wrote: >> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3995: >> >>> 3993: __ align_prefix(); __ xxpermx(xlate_b, table_64_79, table_80_95, input->to_vsr(), 2); >>> 3994: __ xxlor(xlate_b, xlate_a, xlate_b); >>> 3995: __ align_prefix(); __ xxpermx(xlate_a, table_96_111, table_112_127, input->to_vsr(), 3); >> >> This is just a reminder comment. 
align_prefix() does nothing when loop_unrolls is 1 because the label unrolled_loop_start is aligned to a 32-byte boundary and these three prefixed instructions are within 32bytes after the label, and thus, they never cross a 64-byte boundary. Since the comment above mentions that the best unrolling factor here is 1, align_prefix() are just safe guards for the case someone try to unroll this loop.
>
> That's a good point. Initially I had removed the align_prefix() calls for that reason and added a comment about why they weren't needed, but it burnt me when I was testing an unroll value of 4.
> I will put in a comment like you asked.

I guess we will never change the unroll factor. Why not clean it up?

-------------

PR: https://git.openjdk.java.net/jdk/pull/4762

From rkennke at openjdk.java.net Thu Jul 15 20:03:34 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Thu, 15 Jul 2021 20:03:34 GMT
Subject: RFR: 8270794: Avoid loading Klass* twice in TypeArrayKlass::oop_size()
Message-ID:

TypeArrayKlass::oop_size() calls into TypeArrayOopDesc::object_size() which loads the Klass* from the object, but this is not necessary because we're coming from TypeArrayKlass.

Note: This came up in Lilliput, where we need to be careful how to load the Klass, and must figure out the object size using oopDesc::size_given_klass() without blindly re-loading the Klass*. Outside of Lilliput I consider this a cosmetic change (i.e. no substantial performance improvement expected because most cases should be covered by layout-helper).
Testing: - [ ] tier1 - [ ] tier2 ------------- Commit messages: - 8270794: Avoid loading Klass* twice in TypeArrayKlass::oop_size() Changes: https://git.openjdk.java.net/jdk/pull/4799/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4799&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8270794 Stats: 5 lines in 4 files changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/4799.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4799/head:pull/4799 PR: https://git.openjdk.java.net/jdk/pull/4799 From valeriep at openjdk.java.net Thu Jul 15 22:47:13 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Thu, 15 Jul 2021 22:47:13 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Wed, 14 Jul 2021 21:02:01 GMT, Smita Kamath wrote: >> I would like to submit AES-GCM optimization for x86_64 architectures supporting AVX3+VAES (Evex encoded AES). This optimization interleaves AES and GHASH operations. >> Performance gain of ~1.5x - 2x for message sizes 8k and above. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated AES-GCM intrinsic to match latest Java Code src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 170: > 168: > 169: // always encrypt mode for embedded cipher > 170: blockCipher.init(false, key.getAlgorithm(), keyValue); Is this change intentional? Looks like we are reverting to older version of source and undo newer changes. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From valeriep at openjdk.java.net Thu Jul 15 22:50:17 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Thu, 15 Jul 2021 22:50:17 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Wed, 14 Jul 2021 21:02:01 GMT, Smita Kamath wrote: >> I would like to submit AES-GCM optimization for x86_64 architectures supporting AVX3+VAES (Evex encoded AES). This optimization interleaves AES and GHASH operations. >> Performance gain of ~1.5x - 2x for message sizes 8k and above. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated AES-GCM intrinsic to match latest Java Code src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 472: > 470: engine = null; > 471: if (encodedKey != null) { > 472: Arrays.fill(encodedKey, (byte)0); Looks like another unintentional newer->older change. ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From valeriep at openjdk.java.net Thu Jul 15 22:55:13 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Thu, 15 Jul 2021 22:55:13 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Wed, 14 Jul 2021 21:02:01 GMT, Smita Kamath wrote: >> I would like to submit AES-GCM optimization for x86_64 architectures supporting AVX3+VAES (Evex encoded AES). This optimization interleaves AES and GHASH operations. >> Performance gain of ~1.5x - 2x for message sizes 8k and above. 
> > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated AES-GCM intrinsic to match latest Java Code src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 992: > 990: */ > 991: byte[] overlapDetection(byte[] in, int inOfs, byte[] out, int outOfs) { > 992: if (in == out && (!encryption || inOfs < outOfs)) { So, we will always allocate an output buffer for decryption if in==out? Why just decryption? Update the javadoc for this method with the reason? ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From valeriep at openjdk.java.net Thu Jul 15 23:53:10 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Thu, 15 Jul 2021 23:53:10 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: <2Et7eUIahoDViqp0MhD_mMse1n0tW7fxWgMjea9yyQU=.09d1eeac-a7be-4da5-8ac6-5405a7c0e45f@github.com> On Wed, 14 Jul 2021 21:02:01 GMT, Smita Kamath wrote: >> I would like to submit AES-GCM optimization for x86_64 architectures supporting AVX3+VAES (Evex encoded AES). This optimization interleaves AES and GHASH operations. >> Performance gain of ~1.5x - 2x for message sizes 8k and above. 
> > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated AES-GCM intrinsic to match latest Java Code src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 589: > 587: * Requires 768 bytes (48 AES blocks) to efficiently use the intrinsic > 588: * @param in input buffer > 589: * @param inOfs input offset missed @param inLen src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 594: > 592: * @param out output buffer > 593: * @param outOfs output offset > 594: * @param gctr object for the CTR operation typo: CTR->GCTR? ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From valeriep at openjdk.java.net Fri Jul 16 00:14:16 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Fri, 16 Jul 2021 00:14:16 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: <3X43XyHHzWOWHvKNMoblGEQpvOBIB5cudVpXZl2yIH8=.4e382ef3-90c8-4749-8768-12470d98e9ab@github.com> On Wed, 14 Jul 2021 21:02:01 GMT, Smita Kamath wrote: >> I would like to submit AES-GCM optimization for x86_64 architectures supporting AVX3+VAES (Evex encoded AES). This optimization interleaves AES and GHASH operations. >> Performance gain of ~1.5x - 2x for message sizes 8k and above. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated AES-GCM intrinsic to match latest Java Code src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 611: > 609: outOfs + len); > 610: ghash.update(ct, ctOfs, segments); > 611: ctOfs = len; This does not look right when the initial value of ctOfs != 0. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From ascarpino at openjdk.java.net Fri Jul 16 00:14:14 2021 From: ascarpino at openjdk.java.net (Anthony Scarpino) Date: Fri, 16 Jul 2021 00:14:14 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: <9DGzWlRgC8DaSEZFFeOzQJuRvopW8CISMLJwYQAUGTo=.1aa32797-386f-4101-a96d-6cbad78934f7@github.com> On Thu, 15 Jul 2021 22:44:05 GMT, Valerie Peng wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated AES-GCM intrinsic to match latest Java Code > > src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 170: > >> 168: >> 169: // always encrypt mode for embedded cipher >> 170: blockCipher.init(false, key.getAlgorithm(), keyValue); > > Is this change intentional? Looks like we are reverting to older version of source and undo newer changes. Nope.. unintentional > src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 472: > >> 470: engine = null; >> 471: if (encodedKey != null) { >> 472: Arrays.fill(encodedKey, (byte)0); > > Looks like another unintentional newer->older change. I don't remember an old comment about that, dunno if that was reverted > src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 992: > >> 990: */ >> 991: byte[] overlapDetection(byte[] in, int inOfs, byte[] out, int outOfs) { >> 992: if (in == out && (!encryption || inOfs < outOfs)) { > > So, we will always allocate an output buffer for decryption if in==out? Why just decryption? Update the javadoc for this method with the reason? If the crypto is decryption in-place, an internal output buffer is needed in case the auth tag fails, otherwise the input buffer would be zero'ed. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From mbaesken at openjdk.java.net Fri Jul 16 06:14:07 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Fri, 16 Jul 2021 06:14:07 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v5] In-Reply-To: References: Message-ID: > Hello, please review this PR; it extend the OSContainer API in order to also support the pids controller of cgroups. > > I noticed that unlike the other controllers "cpu", "cpuset", "cpuacct", "memory" on some older Linux distros (SLES 12.1, RHEL 7.1) the pids controller might not be there (or not fully supported) so it was added as optional , see the coding > > > if (!cg_infos[PIDS_IDX]._data_complete) { > log_debug(os, container)("Optional cgroup v1 pids subsystem not found"); > // keep the other controller info, pids is optional > } Matthias Baesken has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into JDK-8266490 - Add hotspot tests - test and small adjustments suggested by Severin - Adjustments following Severins comments - JDK-8266490 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4518/files - new: https://git.openjdk.java.net/jdk/pull/4518/files/3fe73c3c..5fc52fb1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4518&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4518&range=03-04 Stats: 675520 lines in 5968 files changed: 547978 ins; 105328 del; 22214 mod Patch: https://git.openjdk.java.net/jdk/pull/4518.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4518/head:pull/4518 PR: https://git.openjdk.java.net/jdk/pull/4518 From dholmes at openjdk.java.net Fri Jul 16 06:24:12 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 16 Jul 2021 06:24:12 GMT Subject: RFR: 8264543: Cross modify fence optimization for x86 [v2] In-Reply-To: References: <-uL2Xl3RD6-QYEUniGR5ur7YbGLVsjRZnqWdQhkUIQg=.26632121-76b8-41ea-afaa-626e9502af07@github.com> <66oSsTIKRNq28BftbfhSyUTujvowAfeGi77FFa0_ZLw=.05bc1b5a-e712-4722-a860-18a409b697dd@github.com> Message-ID: <59pKeqHa3QFazHZyOhmv1izEKylKq02KwM2yxD5n9aE=.b37590ab-77a1-4819-b70b-71a917c3781a@github.com> On Thu, 15 Jul 2021 18:12:31 GMT, Xubo Zhang wrote: >> What actually happens if you just include vm_version.hpp in orderAccess_linux_x86.hpp ? > > it will give me errors, like 'Atomic' has not been declared. > Also, the cross_modify_fence_impl actually is os-dependent. Linux and Windows have different implementations. vm_version_x86.hpp has a dependency on universe.hpp which has a dependency on orderAccess.hpp which is why that circularity problem arises. That can be fixed by moving the inline definition of `supports_clflush()` (which calls `Universe::is_fully_initialized()`) out of vm_version_x86.hpp into vm_version_x86.cpp. 
We can then proceed to include vm_version.hpp in orderAccess.hpp (something I'm surprised is not already done as I recall CPU specific memory barriers being used back in JDK 7 and 8). That change (of course) requires a few other tweaks, so I put the changes together here and ran them through our build system. https://github.com/openjdk/jdk/compare/master...dholmes-ora:orderAccess-vm_version?expand=1 ------------- PR: https://git.openjdk.java.net/jdk/pull/3334 From stuefe at openjdk.java.net Fri Jul 16 11:13:13 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 16 Jul 2021 11:13:13 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value Message-ID: Hi, may I have reviews for this fix to Arena alignment handling in hotspot. This affects the old Arena classes as well as child classes (ResourceArea, HandleArea etc). This is a followup to https://bugs.openjdk.java.net/browse/JDK-8270179 and other recent Arena fixes. I redid this patch twice. It was tempting to rewrite more, but I decided to keep this change as small as possible to make reviews easier. And defer cleanups to separate RFEs. This patch establishes new alignment rules, but actual behavioral changes to current Arena users should be minimal and should only affect 32-bit anyway. Before this patch, Arena allocation checked (or auto-aligned) the *allocation size* in an attempt to guarantee alignment of the allocation start address. That was neither needed nor sufficient - see JBS text. The new code just allows any allocation size, aligned or unaligned. However, it now clearly guarantees alignment of the *allocation start address*. E.g., theoretically, you could allocate a 16bit word at a 64bit boundary. The arena does this by aligning, if needed, before an allocation. Alignment gaps are filled (debug build + ZapResourceArea) with a special zap pattern. --- Notes: - Chunk start, payload start, and payload end have to be aligned to max arena alignment. 
I wondered whether to just statically assert that `sizeof(Chunk)` is aligned, but I did not find a pragma for this that works on all platforms, and did not want to play whack-a-mole with 32-bit and strange compilers like Xlc. Therefore we align payload start separately. - To simplify implementation, I added the rule that the four standard chunk sizes (those cached in the pools) have to be aligned to max arena alignment. They mostly were already; I just had to tweak numbers a bit for 32-bit. - Chunks are preceded by a canary now for debugging purposes and potential sanity tests in gtests (not used yet) - Arena::realloc(): - Commented in header that the original alignment is preserved (which it is since Arealloc either changes the allocation in-place, so start address does not change, or reallocates with max possible alignment). - I fixed another pointer calculation to use pointer_delta when an allocation is grown in-place - I added code to zap the leftover memory if an allocation shrinks Tests: - I wrote a large number of gtests for the Arena class. These test proper behavior on alloc, realloc, and free, including overwrite tests and gap pattern tests. - I wrote a new gtests jtreg wrapper to run the Arena gtests with +UseMallocOnly - I think as long as we support it we should test it. Ran all tests on 32-bit and 64-bit Linux on my box. Nightlies at SAP are in progress.
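As a rough illustration of the start-address alignment rule described above, here is a minimal sketch. It is not the HotSpot Arena code: `ToyArena`, `align_up`, and `kZapByte` are hypothetical names invented for this example. It only shows the idea that any allocation size is accepted, the bump pointer is aligned before each allocation, and the resulting gap is filled with a zap pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: round x up to a power-of-two alignment.
inline std::size_t align_up(std::size_t x, std::size_t alignment) {
  return (x + alignment - 1) & ~(alignment - 1);
}

// Hypothetical toy bump allocator over a fixed buffer; not HotSpot's Arena.
struct ToyArena {
  static constexpr unsigned char kZapByte = 0xAB;  // made-up gap pattern
  alignas(16) unsigned char buf[256];
  std::size_t top = 0;  // offset of the next free byte

  // Accepts any size; guarantees only that the returned address is aligned
  // (for power-of-two alignments up to 16, the alignment of buf).
  void* alloc(std::size_t size, std::size_t alignment) {
    std::size_t start = align_up(top, alignment);
    if (start > sizeof(buf) || size > sizeof(buf) - start) {
      return nullptr;  // out of space in this toy buffer
    }
    std::memset(buf + top, kZapByte, start - top);  // zap the alignment gap
    top = start + size;
    return buf + start;
  }
};
```

With this sketch, a 2-byte allocation followed by a 2-byte allocation requested at 8-byte alignment leaves a 6-byte zapped gap, which mirrors the "16bit word at a 64bit boundary" case from the description.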
Manual tests: - I also tested a higher arena max alignment (16 bytes on 64-bit), ran gtests, worked fine - I also tested an unaligned Chunk size on 64-bit to test that padding after the header is handled correctly, works fine ------------- Commit messages: - Arena alignment fixes Changes: https://git.openjdk.java.net/jdk/pull/4784/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4784&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8270308 Stats: 624 lines in 6 files changed: 568 ins; 19 del; 37 mod Patch: https://git.openjdk.java.net/jdk/pull/4784.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4784/head:pull/4784 PR: https://git.openjdk.java.net/jdk/pull/4784 From whuang at openjdk.java.net Fri Jul 16 11:30:24 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Fri, 16 Jul 2021 11:30:24 GMT Subject: RFR: 8270832: Aarch64: Update algorithm annotations for MacroAssembler::fill_words Message-ID: <2xPts-aE-Mr-T24nLCj5WZnGieBVx9oVtJ-WzKcU0mM=.ad6e2fe0-db5f-48c9-a604-88b332c50db1@github.com> The comments on `MacroAssembler::fill_words` are not correct. Let's fix that.
------------- Commit messages: - 8270832: Aarch64: Update algorithm annotations for MacroAssembler::fill_words Changes: https://git.openjdk.java.net/jdk/pull/4809/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4809&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8270832 Stats: 14 lines in 1 file changed: 7 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/4809.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4809/head:pull/4809 PR: https://git.openjdk.java.net/jdk/pull/4809 From thartmann at openjdk.java.net Fri Jul 16 14:06:25 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 16 Jul 2021 14:06:25 GMT Subject: [jdk17] RFR: 8270461: ZGC: Invalid oop passed to ZBarrierSetRuntime::load_barrier_on_oop_array Message-ID: For object arrays, C2's clone intrinsic emits calls to the `oop_disjoint_arraycopy_uninit` stub. With ZGC, load barriers on the source array elements are applied via `BarrierSetAssembler::arraycopy_prologue` before copying to the destination array: https://github.com/openjdk/jdk17/blob/1e3b418a53a080a53827989393362338b43dd363/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L2400-L2403 The problem is that `BarrierSetC2::arraycopy_payload_base_offset` may 8-byte align the array offset to ensure that a copy in 8-byte chunks of the 8-byte aligned object is possible: https://github.com/openjdk/jdk17/blob/1e3b418a53a080a53827989393362338b43dd363/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L658-L662 Now with ` -XX:-UseCompressedClassPointers`, the offset starts at the 4-byte length field of the array (8 bytes mark word + 8 byte klass = 16 byte). That is fine if we don't need load barriers and can copy the array as `T_LONG` but with ZGC we crash in `ZBarrier::mark` because the element is not a valid oop. I propose to simply set the offset to the first array element when cloning Object arrays with ZGC. 
We can still copy in 8 byte chunks because the oop elements are 8 byte on 64-bit (and ZGC is only supported on 64-bit). I found this when investigating intermittent ZGC crashes in project Valhalla. The bug was introduced by [JDK-8268125](https://bugs.openjdk.java.net/browse/JDK-8268125) in JDK 17. The code is quite messy and will hopefully be cleaned up by [JDK-8268020](https://bugs.openjdk.java.net/browse/JDK-8268020). Thanks, Tobias ------------- Commit messages: - 8270461: ZGC: Invalid oop passed to ZBarrierSetRuntime::load_barrier_on_oop_array Changes: https://git.openjdk.java.net/jdk17/pull/252/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk17&pr=252&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8270461 Stats: 14 lines in 3 files changed: 8 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk17/pull/252.diff Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/252/head:pull/252 PR: https://git.openjdk.java.net/jdk17/pull/252 From github.com+58006833+xbzhang99 at openjdk.java.net Fri Jul 16 16:17:54 2021 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Fri, 16 Jul 2021 16:17:54 GMT Subject: RFR: 8264543: Cross modify fence optimization for x86 [v2] In-Reply-To: <59pKeqHa3QFazHZyOhmv1izEKylKq02KwM2yxD5n9aE=.b37590ab-77a1-4819-b70b-71a917c3781a@github.com> References: <-uL2Xl3RD6-QYEUniGR5ur7YbGLVsjRZnqWdQhkUIQg=.26632121-76b8-41ea-afaa-626e9502af07@github.com> <66oSsTIKRNq28BftbfhSyUTujvowAfeGi77FFa0_ZLw=.05bc1b5a-e712-4722-a860-18a409b697dd@github.com> <59pKeqHa3QFazHZyOhmv1izEKylKq02KwM2yxD5n9aE=.b37590ab-77a1-4819-b70b-71a917c3781a@github.com> Message-ID: <2dcGB4plpCOjdE3VgKIywSLkj66kLDBKZoJi52bd3WM=.9f94ed16-c880-4f1c-b4f6-0b32145eb65e@github.com> On Fri, 16 Jul 2021 06:20:20 GMT, David Holmes wrote: >> it will give me errors, like 'Atomic' has not been declared. >> Also, the cross_modify_fence_impl actually is os-dependent. Linux and Windows have different implementations. 
> > vm_version_x86.hpp has a dependency on universe.hpp which has a dependency on orderAccess.hpp which is why that circularity problem arises. That can be fixed by moving the inline definition of `supports_clflush()` (which calls `Universe::is_fully_initialized()`) out of vm_version_x86.hpp into vm_version_x86.cpp. We can then proceed to include vm_version.hpp in orderAccess.hpp (something I'm surprised is not already done as I recall CPU specific memory barriers being used back in JDK 7 and 8). That change (of course) requires a few other tweaks, so I put the changes together here and ran them through our build system. > https://github.com/openjdk/jdk/compare/master...dholmes-ora:orderAccess-vm_version?expand=1 cool. thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/3334 From valeriep at openjdk.java.net Fri Jul 16 19:44:53 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Fri, 16 Jul 2021 19:44:53 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Wed, 14 Jul 2021 21:02:01 GMT, Smita Kamath wrote: >> I would like to submit AES-GCM optimization for x86_64 architectures supporting AVX3+VAES (Evex encoded AES). This optimization interleaves AES and GHASH operations. >> Performance gain of ~1.5x - 2x for message sizes 8k and above. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated AES-GCM intrinsic to match latest Java Code src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 629: > 627: GCTR gctr; > 628: GHASH ghash; > 629: GCMOperation op; It seems clearer to initialize "op" in GCMEngine ctor since it's declared here. There is already logic in its method checking whether we are doing encryption or decryption. 
src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 650: > 648: int originalOutOfs = 0; > 649: byte[] in; > 650: byte[] out; The names "in" and "out" are used in almost all calls, so it's hard to tell when these two are actually used. Can we rename them to something more distinctive? src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 724: > 722: } else { > 723: ct = in; > 724: } This can just be: byte[] ct = (encryption ? out : in); Since you only use this 'ct' variable inside the else block on line 746, move this down to that block. src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 743: > 741: dst.array(), dst.arrayOffset() + dst.position(), > 742: gctr, ghash); > 743: } Can we use another ByteBuffer variable to avoid almost-duplicate calls? ByteBuffer ct = (encryption ? dst : src); rlen -= GaloisCounterMode.implGCMCrypt(src.array(), src.arrayOffset() + src.position(), src.remaining(), ct.array(), ct.arrayOffset() + ct.position(), dst.array(), dst.arrayOffset() + dst.position(), gctr, ghash); ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From valeriep at openjdk.java.net Fri Jul 16 19:44:54 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Fri, 16 Jul 2021 19:44:54 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Fri, 16 Jul 2021 00:32:16 GMT, Valerie Peng wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated AES-GCM intrinsic to match latest Java Code > > src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 629: > >> 627: GCTR gctr; >> 628: GHASH ghash; >> 629: GCMOperation op; > > It seems clearer to initialize "op" in GCMEngine ctor since it's declared here.
There is already logic in its method checking whether we are doing encryption or decryption. Now that you have GCMOperation op, there are still if-else blocks that check whether it's encryption or decryption and use gctr and ghash instead of op. That looks a bit ad hoc. Can GaloisCounterMode.implGCMCrypt(...) just take GCMOperation op instead, with no need for ct, ctOfs, gctr and ghash? ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From valeriep at openjdk.java.net Fri Jul 16 20:51:52 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Fri, 16 Jul 2021 20:51:52 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Wed, 14 Jul 2021 21:02:01 GMT, Smita Kamath wrote: >> I would like to submit AES-GCM optimization for x86_64 architectures supporting AVX3+VAES (Evex encoded AES). This optimization interleaves AES and GHASH operations. >> Performance gain of ~1.5x - 2x for message sizes 8k and above. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated AES-GCM intrinsic to match latest Java Code src/java.base/share/classes/com/sun/crypto/provider/GHASH.java line 146: > 144: } > 145: state = new long[2]; > 146: subkeyHtbl = new long[2*57]; // 48 keys for the interleaved implementation, 8 for avx-ghash implementation and one for the original key nit: the comment is too long, i.e.
> 80 ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From cashford at openjdk.java.net Sat Jul 17 00:05:50 2021 From: cashford at openjdk.java.net (Corey Ashford) Date: Sat, 17 Jul 2021 00:05:50 GMT Subject: RFR: JDK-8270340: Base64 decodeBlock intrinsic for Power64 needs cleanup In-Reply-To: References: Message-ID: <2MGEbbqycE1PB8pY5U0Ups5OhB4WgC6sTZXFDjmzlFY=.330056df-bee3-4c05-94b9-e988afc0f35a@github.com> On Thu, 15 Jul 2021 18:48:37 GMT, Martin Doerr wrote: > I guess we will never change the unroll factor. Why not clean it up? I debated with myself back and forth on that. My thinking was that perhaps it will make sense on later processors, like Power11. On the other hand, it's some technical debt to have that complexity without the need. I'll push another commit to remove it (after testing it, of course) ------------- PR: https://git.openjdk.java.net/jdk/pull/4762 From duke at openjdk.java.net Sat Jul 17 00:11:55 2021 From: duke at openjdk.java.net (duke) Date: Sat, 17 Jul 2021 00:11:55 GMT Subject: Withdrawn: 8267356: AArch64: Vector API SVE codegen support In-Reply-To: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Thu, 20 May 2021 07:32:52 GMT, Ningsheng Jian wrote: > This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. 
> > Note: our original plan was making this work part of JEP 414 Vector API (Second Incubator) [1], but we realized that it's now close to 17 release cycle and the JEP process may take time. Adding more features could delay the whole review process for the JEP. So we separate this work out as a standalone patch. > > [1] http://openjdk.java.net/jeps/414 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From dholmes at openjdk.java.net Sat Jul 17 01:01:04 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sat, 17 Jul 2021 01:01:04 GMT Subject: RFR: 8270862: Fix problem list entries for 32-bit Message-ID: 32-bit is specified as linux-i586 not linux-x86 Waiting for GHA results to confirm. Thanks, David ------------- Commit messages: - Fix 32-bit Changes: https://git.openjdk.java.net/jdk/pull/4818/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4818&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8270862 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/4818.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4818/head:pull/4818 PR: https://git.openjdk.java.net/jdk/pull/4818 From kvn at openjdk.java.net Sat Jul 17 01:20:08 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 17 Jul 2021 01:20:08 GMT Subject: [jdk17] RFR: 8270461: ZGC: Invalid oop passed to ZBarrierSetRuntime::load_barrier_on_oop_array In-Reply-To: References: Message-ID: On Fri, 16 Jul 2021 13:59:02 GMT, Tobias Hartmann wrote: > For object arrays, C2's clone intrinsic emits calls to the `oop_disjoint_arraycopy_uninit` stub. 
With ZGC, load barriers on the source array elements are applied via `BarrierSetAssembler::arraycopy_prologue` before copying to the destination array: https://github.com/openjdk/jdk17/blob/1e3b418a53a080a53827989393362338b43dd363/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L2400-L2403 > > The problem is that `BarrierSetC2::arraycopy_payload_base_offset` may 8-byte align the array offset to ensure that a copy in 8-byte chunks of the 8-byte aligned object is possible: https://github.com/openjdk/jdk17/blob/1e3b418a53a080a53827989393362338b43dd363/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L658-L662 > Now with ` -XX:-UseCompressedClassPointers`, the offset starts at the 4-byte length field of the array (8 bytes mark word + 8 byte klass = 16 byte). That is fine if we don't need load barriers and can copy the array as `T_LONG` but with ZGC we crash in `ZBarrier::mark` because the element is not a valid oop. > > I propose to simply set the offset to the first array element when cloning Object arrays with ZGC. We can still copy in 8 byte chunks because the oop elements are 8 byte on 64-bit (and ZGC is only supported on 64-bit). > > I found this when investigating intermittent ZGC crashes in project Valhalla. The bug was introduced by [JDK-8268125](https://bugs.openjdk.java.net/browse/JDK-8268125) in JDK 17. The code is quite messy and will hopefully be cleaned up by [JDK-8268020](https://bugs.openjdk.java.net/browse/JDK-8268020). > > Thanks, > Tobias Should we fix `arraycopy_payload_base_offset()` instead? The placement of the fix looks strange: `src_offset` and `dest_offset` are input nodes from `ArrayCopyNode`, but you are replacing them with one constant. The fix is valid because the code is guarded by the `ac->is_clone_array()` check. But I think `src_offset` and `dest_offset` should already be corrected when `ZBarrierSetC2::clone_at_expansion()` is called.
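The offsets under discussion can be worked through in a small sketch. The sizes are the ones given in the quoted description (8-byte mark word, 8-byte klass pointer with `-XX:-UseCompressedClassPointers`, 4-byte length field, 8-byte oops on 64-bit). The constant and function names are hypothetical, and the assumption that the first element sits at the next 8-byte boundary after the length field is mine; this is not the HotSpot code.

```cpp
#include <cassert>

// Sizes taken from the quoted description; names are made up for this sketch.
constexpr int kMarkWordBytes = 8;
constexpr int kKlassBytes    = 8;  // uncompressed class pointer
constexpr int kLengthBytes   = 4;  // array length field
constexpr int kOopBytes      = 8;  // oop element size on 64-bit

// Round x up to a multiple of a.
constexpr int align_up(int x, int a) { return (x + a - 1) / a * a; }

// Offset of the array length field: directly after mark word and klass.
constexpr int length_offset() { return kMarkWordBytes + kKlassBytes; }

// Assumed offset of element 0: after the length field, aligned for 8-byte oops.
constexpr int first_element_offset() {
  return align_up(length_offset() + kLengthBytes, kOopBytes);
}
```

Under these assumptions the length field sits at offset 16, which is already 8-byte aligned, while the first oop sits at offset 24. A copy in 8-byte chunks that starts at 16 would treat the length field plus padding as its first "element", which matches the invalid oop described in the report; starting at 24 covers only oops.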
------------- PR: https://git.openjdk.java.net/jdk17/pull/252 From cashford at openjdk.java.net Sat Jul 17 02:01:26 2021 From: cashford at openjdk.java.net (Corey Ashford) Date: Sat, 17 Jul 2021 02:01:26 GMT Subject: RFR: 8270340: Base64 decodeBlock intrinsic for Power64 needs cleanup [v2] In-Reply-To: References: Message-ID: On Thu, 15 Jul 2021 17:27:18 GMT, Corey Ashford wrote: >> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3869: >> >>> 3867: VectorSRegister M = VSR8; >>> 3868: >>> 3869: // P10+ VSR lookup variables >> >> "P10+" would better be "P10 or above" (or something grammatically correct) because it reads like an enhanced version of P10, like P7+. Although IBM hasn't released "+" models after P8, they have manufactured chips with a reduced semiconductor process and applied enhancements, including larger caches, higher clock frequency, etc. > > Good point. Will fix. I believe I have resolved this with my latest commit to this PR. Please double check. >> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 4038: >> >>> 4036: __ bne_predict_not_taken(CCR6, unrolled_loop_exit); >>> 4037: >>> 4038: // The Base64 characters had no errors, so add the offsets >> >> It may be helpful to add comments for the P10 case, where each byte of input contains (decoded 6-bit binary | 0x80) and offsets contains 0x80, and vaddubm is used to clear the MSB. > > Good point. I'll update the comment. I believe I have resolved this with my latest commit to this PR. Please double check.
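The arithmetic behind the P10 remark in the quoted review comment can be checked with a scalar sketch. `vaddubm` adds bytes modulo 256, so adding 0x80 to a byte that already has its most significant bit set clears that bit and leaves the 6-bit value. The helper names below are hypothetical, and the code models a single byte lane rather than the vector instruction itself.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one byte lane of a modulo-256 add (what vaddubm does per byte).
inline std::uint8_t add_byte_modulo(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a + b);  // wraps around at 256
}

// The input lane holds (6-bit value | 0x80); adding the 0x80 offset clears the MSB.
inline std::uint8_t clear_msb_by_add(std::uint8_t lane) {
  return add_byte_modulo(lane, 0x80);
}
```

For example, a lane holding 0xBF (0x3F | 0x80) becomes 0x3F after the add, since 0xBF + 0x80 = 0x13F and only the low byte is kept.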
------------- PR: https://git.openjdk.java.net/jdk/pull/4762 From cashford at openjdk.java.net Sat Jul 17 02:01:23 2021 From: cashford at openjdk.java.net (Corey Ashford) Date: Sat, 17 Jul 2021 02:01:23 GMT Subject: RFR: 8270340: Base64 decodeBlock intrinsic for Power64 needs cleanup [v2] In-Reply-To: References: Message-ID: > This series of commits was written to accomplish several cleanup and Power10 optimization tasks for the Base64 decodeBlock intrinsic for Power64: > * Remove the ISA 3.1+ (Power10+) pextd instruction optimization in decodeBlock. This "optimization" turned out to actually cause a performance hit. Removing it gains back about 3% in performance. > * Introduce a constant block, similar to that in use by encodeBlock() to speed up constant loading. > * Add the ISA 3.1+ xxpermx instruction and align_prefix() method for use in a Power10 optimization for decodeBlock. Please see the commit log for my concerns about this change. > * Implement the xxpermx-based decodeBlock algorithm for Power10+, which gives about a 5% performance boost. > > More details can be found in the commit logs. > > I want to note here that I looked into changing the loop_unrolls constant, and found that at large buffer sizes, the values of 2 and 4 give some extra performance gain. For example, on a 20001-byte destination buffer, I see an increase from 4.7X over intrinsic disabled (loop_unrolls=1), to 5.3X over intrinsic disabled (loop_unrolls=4), but on smaller buffer sizes, up to about 512, it causes performance degradation over loop_unrolls=1, so I have decided to stick with the original value of 1, since I don't know where to focus the performance versus buffer length tradeoff. 
Corey Ashford has updated the pull request incrementally with one additional commit since the last revision: stubGenerator_ppc.cpp: fixes for feedback from Kazunori Ogata and Martin Doerr decodeBlock changes: * Remove unroll loop and associated comments * Change comments referring to "P10+" to "P10 (or later)" to remove ambiguity * Make clear the lack of need for "align_prefix()" calls when using xxpermx due to the align(32) call The following change isn't based on feedback: encodeBlock changes: * Fix a comment that still referred to "unrolled loop". Unrolling was removed in an earlier commit. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4762/files - new: https://git.openjdk.java.net/jdk/pull/4762/files/bfa1d3be..1400e26d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4762&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4762&range=00-01 Stats: 275 lines in 1 file changed: 103 ins; 127 del; 45 mod Patch: https://git.openjdk.java.net/jdk/pull/4762.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4762/head:pull/4762 PR: https://git.openjdk.java.net/jdk/pull/4762 From cashford at openjdk.java.net Sat Jul 17 02:01:28 2021 From: cashford at openjdk.java.net (Corey Ashford) Date: Sat, 17 Jul 2021 02:01:28 GMT Subject: RFR: 8270340: Base64 decodeBlock intrinsic for Power64 needs cleanup [v2] In-Reply-To: <2MGEbbqycE1PB8pY5U0Ups5OhB4WgC6sTZXFDjmzlFY=.330056df-bee3-4c05-94b9-e988afc0f35a@github.com> References: <2MGEbbqycE1PB8pY5U0Ups5OhB4WgC6sTZXFDjmzlFY=.330056df-bee3-4c05-94b9-e988afc0f35a@github.com> Message-ID: On Sat, 17 Jul 2021 00:03:12 GMT, Corey Ashford wrote: >> I guess we will never change the unroll factor. Why not clean it up? > >> I guess we will never change the unroll factor. Why not clean it up? > > I debated with myself back and forth on that. My thinking was that perhaps it will make sense on later processors, like Power11.
On the other hand, it's some technical debt to have that complexity without the need. > > I'll push another commit to remove it (after testing it, of course) I believe I have resolved this with my latest commit to this PR. Please double check. ------------- PR: https://git.openjdk.java.net/jdk/pull/4762 From ascarpino at openjdk.java.net Sat Jul 17 03:33:54 2021 From: ascarpino at openjdk.java.net (Anthony Scarpino) Date: Sat, 17 Jul 2021 03:33:54 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: <9DGzWlRgC8DaSEZFFeOzQJuRvopW8CISMLJwYQAUGTo=.1aa32797-386f-4101-a96d-6cbad78934f7@github.com> References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> <9DGzWlRgC8DaSEZFFeOzQJuRvopW8CISMLJwYQAUGTo=.1aa32797-386f-4101-a96d-6cbad78934f7@github.com> Message-ID: On Fri, 16 Jul 2021 00:10:52 GMT, Anthony Scarpino wrote: >> src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 992: >> >>> 990: */ >>> 991: byte[] overlapDetection(byte[] in, int inOfs, byte[] out, int outOfs) { >>> 992: if (in == out && (!encryption || inOfs < outOfs)) { >> >> So, we will always allocate an output buffer for decryption if in==out? Why just decryption? Update the javadoc for this method with the reason? > > If the crypto is decryption in-place, an internal output buffer is needed in case the auth tag fails, otherwise the input buffer would be zero'ed. If decryption fails with a bad auth tag, the in is not overwritten because it's in-place. Encryption is not needed because there is nothing to check. I can add a comment. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From ascarpino at openjdk.java.net Sat Jul 17 04:04:56 2021 From: ascarpino at openjdk.java.net (Anthony Scarpino) Date: Sat, 17 Jul 2021 04:04:56 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Fri, 16 Jul 2021 20:49:20 GMT, Valerie Peng wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated AES-GCM intrinsic to match latest Java Code > > src/java.base/share/classes/com/sun/crypto/provider/GHASH.java line 146: > >> 144: } >> 145: state = new long[2]; >> 146: subkeyHtbl = new long[2*57]; // 48 keys for the interleaved implementation, 8 for avx-ghash implementation and one for the original key > > nit: the comment is too long, i.e. > 80 Ah.. I forgot I didn't change GHASH with my changes.. but I'll fix that thanks > src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 743: > >> 741: dst.array(), dst.arrayOffset() + dst.position(), >> 742: gctr, ghash); >> 743: } > > Can we use another ByteBuffer variable to avoid almost-duplicate calls? > > ByteBuffer ct = (encryption? 
dst : src); > rlen -= GaloisCounterMode.implGCMCrypt(src.array(), > src.arrayOffset() + src.position(), src.remaining(), > ct.array(), ct.arrayOffset() + ct.position(), > dst.array(), dst.arrayOffset() + dst.position(), > gctr, ghash); That may be a better choice. ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From sspitsyn at openjdk.java.net Sat Jul 17 04:35:51 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Sat, 17 Jul 2021 04:35:51 GMT Subject: RFR: 8270862: Fix problem list entries for 32-bit In-Reply-To: References: Message-ID: On Sat, 17 Jul 2021 00:50:04 GMT, David Holmes wrote: > 32-bit is specified as linux-i586 not linux-x86 > > Waiting for GHA results to confirm. > > Thanks, > David Hi David, Looks good and trivial. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4818 From stuefe at openjdk.java.net Sat Jul 17 07:16:25 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 17 Jul 2021 07:16:25 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value [v2] In-Reply-To: References: Message-ID: > Hi, > > may I have reviews for this fix to Arena alignment handling in hotspot? This affects the old Arena classes as well as child classes (ResourceArea, HandleArea etc). This is a followup to https://bugs.openjdk.java.net/browse/JDK-8270179 and other recent Arena fixes. > > This makes arenas more flexible, allowing allocation with different alignments to achieve tighter packing. Theoretically arenas should be able to do that today, but it is actually broken. We did not notice any errors though, probably because the only platforms affected are 32-bit platforms other than x86 (x86 allows unaligned access). Also, today in the VM nobody mixes different alignments - you either allocate only with Amalloc, so with 64-bit alignment, or only with AmallocWords (former Amalloc_4), which is 32-bit on 32-bit. > > I redid this patch twice.
My first attempts rewrote a lot of the code, but I scratched that and in this patch kept it focused only on the alignment issue. Defer cleanups to separate RFEs. > > This patch establishes new alignment rules, but actual behavioral changes to current Arena users should be minimal and should only affect 32-bit anyway. > > Before this patch, Arena allocation checked (or auto-aligned) the *allocation size* in an attempt to guarantee alignment of the allocation start address. That was neither needed nor sufficient - see JBS text. > > The new code just allows any allocation size, aligned or unaligned. However, it now clearly guarantees alignment of the *allocation start address*. E.g., theoretically, you could allocate a 16bit word at a 64bit boundary. > > The arena does this by aligning, if needed, before an allocation. Alignment gaps are filled (debug build + ZapResourceArea) with a special zap pattern. > > > --- > > Notes: > - Chunk start, payload start, and payload end have to be aligned to max arena alignment. I wondered whether just statically assert `sizeof(Chunk)` to be aligned; but I did not find a pragma for this for all platforms, and did not want to play whack-the-mole with 32bit and strange compilers like Xlc. Therefore we align payload start separately. > - To simplify implementation, I added the rule that the four standard chunk sizes (those cached in the pools) have to be aligned to max arena alignment. They mostly were already, just had to tweak numbers a bit for 32-bit. > - Chunks are preceded with a canary now for debugging purposes and potential sanity tests in gtests (not used yet) > - Arena::realloc(): > - Commented in header that the original alignment is preserved (which it is since Arealloc either changes the allocation in-place, so start address does not change, or reallocates with max possible alignment. 
> - I fixed another pointer calculation to use pointer_delta when an allocation is grown in-place > - I added code to zap the leftover memory if an allocation shrinks > > Tests: > - I wrote a large number of gtests for the Arena class. These test proper behavior on alloc, realloc, and free, including overwrite tests and gap pattern tests. > - wrote a new gtests jtreg wrapper to run the Arena gtests with +UseMallocOnly - I think as long as we support it we should test it. > > Ran all tests on 32bit and 64bit Linux on my box. > > Nightlies at SAP are in progress. > > Manual tests: > - I also tested a higher arena max alignment (16 bytes on 64 bit), ran gtests, worked fine > - I also tested an unaligned Chunk size on 64bit to test padding after header is handled correctly, works fine Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge - Arena alignment fixes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4784/files - new: https://git.openjdk.java.net/jdk/pull/4784/files/14e7c086..0e11110a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4784&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4784&range=00-01 Stats: 2716 lines in 96 files changed: 2050 ins; 234 del; 432 mod Patch: https://git.openjdk.java.net/jdk/pull/4784.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4784/head:pull/4784 PR: https://git.openjdk.java.net/jdk/pull/4784 From dholmes at openjdk.java.net Sat Jul 17 07:46:51 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sat, 17 Jul 2021 07:46:51 GMT Subject: RFR: 8270862: Fix problem list entries for 32-bit In-Reply-To: References: Message-ID: <7jo-8emtSyLSTyjVIAuedI5TnSQcApTY9XaY-SGzY7s=.588ea792-384f-4d77-887e-9d00293f8598@github.com> On Sat, 17 Jul 2021 04:32:55 GMT, Serguei 
Spitsyn wrote: >> 32-bit is specified as linux-i586 not linux-x86 >> >> Waiting for GHA results to confirm. >> >> Thanks, >> David > > Hi David, > Looks good and trivial. > Thanks, > Serguei Thanks @sspitsyn ! ------------- PR: https://git.openjdk.java.net/jdk/pull/4818 From dholmes at openjdk.java.net Sat Jul 17 07:46:52 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sat, 17 Jul 2021 07:46:52 GMT Subject: Integrated: 8270862: Fix problem list entries for 32-bit In-Reply-To: References: Message-ID: On Sat, 17 Jul 2021 00:50:04 GMT, David Holmes wrote: > 32-bit is specified as linux-i586 not linux-x86 > > Waiting for GHA results to confirm. > > Thanks, > David This pull request has now been integrated. Changeset: e7cdfebb Author: David Holmes URL: https://git.openjdk.java.net/jdk/commit/e7cdfebbeebb274b28495b469f39d5874af45e65 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8270862: Fix problem list entries for 32-bit Reviewed-by: sspitsyn ------------- PR: https://git.openjdk.java.net/jdk/pull/4818 From ascarpino at openjdk.java.net Sat Jul 17 16:53:52 2021 From: ascarpino at openjdk.java.net (Anthony Scarpino) Date: Sat, 17 Jul 2021 16:53:52 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Fri, 16 Jul 2021 19:41:53 GMT, Valerie Peng wrote: >> src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 629: >> >>> 627: GCTR gctr; >>> 628: GHASH ghash; >>> 629: GCMOperation op; >> >> It seems clearer to initialize "op" in GCMEngine ctor since it's declared here. There is already logic in its method checking whether we are doing encryption or decryption. > > Now that you have GCMOperation op, but there is still if-else blocks checking whether it's encryption/decryption and uses gctr and ghash instead of op. 
Looks like a bit adhoc? Can GaloisCounterMode.implGCMCrypt(...) just take GCMOperation op instead, no need for ct, ctOfs, gctr and ghash? Initializing op in abstract GCMEngine would mean another 'if(encryption)', when that would not be needed in the GCMEncrypt() or GCMDecrypt(). I don't see why that is clearer. GaloisCounterMode.implGCMCrypt(...) is the intrinsic, so I have to use what is used by hotspot. ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From ogatak at openjdk.java.net Sun Jul 18 20:11:54 2021 From: ogatak at openjdk.java.net (Kazunori Ogata) Date: Sun, 18 Jul 2021 20:11:54 GMT Subject: RFR: 8270340: Base64 decodeBlock intrinsic for Power64 needs cleanup [v2] In-Reply-To: References: Message-ID: On Sat, 17 Jul 2021 02:01:23 GMT, Corey Ashford wrote: >> This series of commits was written to accomplish several cleanup and Power10 optimization tasks for the Base64 decodeBlock intrinsic for Power64: >> * Remove the ISA 3.1+ (Power10+) pextd instruction optimization in decodeBlock. This "optimization" turned out to actually cause a performance hit. Removing it gains back about 3% in performance. >> * Introduce a constant block, similar to that in use by encodeBlock() to speed up constant loading. >> * Add the ISA 3.1+ xxpermx instruction and align_prefix() method for use in a Power10 optimization for decodeBlock. Please see the commit log for my concerns about this change. >> * Implement the xxpermx-based decodeBlock algorithm for Power10+, which gives about a 5% performance boost. >> >> More details can be found in the commit logs. >> >> I want to note here that I looked into changing the loop_unrolls constant, and found that at large buffer sizes, the values of 2 and 4 give some extra performance gain. 
For example, on a 20001-byte destination buffer, I see an increase from 4.7X over intrinsic disabled (loop_unrolls=1), to 5.3X over intrinsic disabled (loop_unrolls=4), but on smaller buffer sizes, up to about 512, it causes performance degradation over loop_unrolls=1, so I have decided to stick with the original value of 1, since I don't know where to focus the performance versus buffer length tradeoff. > > Corey Ashford has updated the pull request incrementally with one additional commit since the last revision: > > stubGenerator_ppc.cpp: fixes for feedback from Kazunori Ogata and Martin Doerr > > decodeBlock changes: > > * Remove unroll loop and associated comments > * Change comments referring to "P10+" to "P10 (or later)" to remove ambiguity > * Make clear the lack of need for "align_prefix()" calls when using xxpermx due to the align(32) call > > The following change isn't based on feedback: > > encodeBlock changes: > * Fix a comment that still referred to "unrolled loop". Unrolling was removed in an earlier commit. The changes look fine to me besides the comment about align_prefix(), though I'm not a Reviewer. src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3966: > 3964: // Note that due to align(32) call above, the xxpermx instructions do > 3965: // not require align_prefix() calls, since the final xxpermx > 3966: // prefix+opcode is at byte 24. The question is whether we should leave align_prefix() in macroAssembler.[ch]pp. Since it appears convenient, someone may want to use it without noticing its intention that is only allowed (or acceptable) for generating intrinsics. If we leave it, I think it is better to add an assertion statement to check if it is used for intrinsics generation (though I don't know how to implement it.)
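The reasoning in the quoted code comment (a prefixed instruction at byte 24 after an align(32) call needs no align_prefix()) can be checked with a little arithmetic. The sketch below is only an illustration, not HotSpot code. It assumes that a Power10 prefixed instruction is 8 bytes long and must not cross a 64-byte boundary; the class and method names are hypothetical.

```java
// Hypothetical illustration, not part of macroAssembler or stubGenerator_ppc.cpp.
final class PrefixAlignmentSketch {
    // Assumption: a prefixed instruction is prefix word + opcode word = 8 bytes.
    static final int PREFIXED_INSN_BYTES = 8;
    // Assumption: a prefixed instruction must not cross a 64-byte boundary.
    static final int BOUNDARY = 64;

    // True if an 8-byte instruction starting at 'address' spans two 64-byte blocks.
    static boolean crossesBoundary(long address) {
        long firstBlock = address / BOUNDARY;
        long lastBlock = (address + PREFIXED_INSN_BYTES - 1) / BOUNDARY;
        return firstBlock != lastBlock;
    }

    public static void main(String[] args) {
        // Any 32-byte-aligned base puts offset 24 at byte 24 or 56 of a 64-byte block,
        // so bytes 24..31 (or 56..63) stay inside one block.
        for (long base = 0; base < 256; base += 32) {
            System.out.println("base=" + base + " crosses=" + crossesBoundary(base + 24));
        }
        // A start at byte 60 of a block would span bytes 60..67 and cross the boundary.
        System.out.println("offset 60 crosses=" + crossesBoundary(60)); // true
    }
}
```

Under these assumptions the check never fires for offset 24 behind a 32-byte alignment, which matches the claim in the comment under review.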
------------- PR: https://git.openjdk.java.net/jdk/pull/4762 From njian at openjdk.java.net Mon Jul 19 02:47:57 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 19 Jul 2021 02:47:57 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support In-Reply-To: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Thu, 20 May 2021 07:32:52 GMT, Ningsheng Jian wrote: > This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. > > Note: our original plan was making this work part of JEP 414 Vector API (Second Incubator) [1], but we realized that it's now close to 17 release cycle and the JEP process may take time. Adding more features could delay the whole review process for the JEP. So we separate this work out as a standalone patch. > > [1] http://openjdk.java.net/jeps/414 Will reopen this when https://bugs.openjdk.java.net/browse/JDK-8269306 finalized. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From ngasson at openjdk.java.net Mon Jul 19 03:06:54 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 19 Jul 2021 03:06:54 GMT Subject: RFR: 8270832: Aarch64: Update algorithm annotations for MacroAssembler::fill_words In-Reply-To: <2xPts-aE-Mr-T24nLCj5WZnGieBVx9oVtJ-WzKcU0mM=.ad6e2fe0-db5f-48c9-a604-88b332c50db1@github.com> References: <2xPts-aE-Mr-T24nLCj5WZnGieBVx9oVtJ-WzKcU0mM=.ad6e2fe0-db5f-48c9-a604-88b332c50db1@github.com> Message-ID: On Fri, 16 Jul 2021 11:20:45 GMT, Wang Huang wrote: > It is found that the comments of `MacroAssembler::fill_words` is not right here. Let's fix that. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4811: > 4809: { > 4810: // Algorithm: > 4811: // There's an extra store at the beginning to align `base` if bit 3 is set. Do you want to document that here too? src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4834: > 4832: // } while (cnt); > 4833: // } > 4834: // if (cnt & 1 == 1) { Should be `(cnt & 1) == 1`. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4835: > 4833: // } > 4834: // if (cnt & 1 == 1) { > 4835: // p[0] = v; How about `*p++ = v` as `base` also gets incremented here. ------------- PR: https://git.openjdk.java.net/jdk/pull/4809 From thartmann at openjdk.java.net Mon Jul 19 06:42:12 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 19 Jul 2021 06:42:12 GMT Subject: [jdk17] RFR: 8270461: ZGC: Invalid oop passed to ZBarrierSetRuntime::load_barrier_on_oop_array In-Reply-To: References: Message-ID: On Fri, 16 Jul 2021 13:59:02 GMT, Tobias Hartmann wrote: > For object arrays, C2's clone intrinsic emits calls to the `oop_disjoint_arraycopy_uninit` stub. 
With ZGC, load barriers on the source array elements are applied via `BarrierSetAssembler::arraycopy_prologue` before copying to the destination array: https://github.com/openjdk/jdk17/blob/1e3b418a53a080a53827989393362338b43dd363/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L2400-L2403 > > The problem is that `BarrierSetC2::arraycopy_payload_base_offset` may 8-byte align the array offset to ensure that a copy in 8-byte chunks of the 8-byte aligned object is possible: https://github.com/openjdk/jdk17/blob/1e3b418a53a080a53827989393362338b43dd363/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L658-L662 > Now with ` -XX:-UseCompressedClassPointers`, the offset starts at the 4-byte length field of the array (8 bytes mark word + 8 byte klass = 16 byte). That is fine if we don't need load barriers and can copy the array as `T_LONG` but with ZGC we crash in `ZBarrier::mark` because the element is not a valid oop. > > I propose to simply set the offset to the first array element when cloning Object arrays with ZGC. We can still copy in 8 byte chunks because the oop elements are 8 byte on 64-bit (and ZGC is only supported on 64-bit). > > I found this when investigating intermittent ZGC crashes in project Valhalla. The bug was introduced by [JDK-8268125](https://bugs.openjdk.java.net/browse/JDK-8268125) in JDK 17. The code is quite messy and will hopefully be cleaned up by [JDK-8268020](https://bugs.openjdk.java.net/browse/JDK-8268020). > > Thanks, > Tobias Thanks for the review, Vladimir! The problem is that in `arraycopy_payload_base_offset` we don't know the basic type and therefore can't use `arrayOopDesc::base_offset_in_bytes(bt)`: https://github.com/openjdk/jdk17/blob/1e3b418a53a080a53827989393362338b43dd363/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L660 Also, modifying that code would affect the other GCs using `T_LONG` for cloning Object arrays. 
Since we are late for JDK 17, I would rather prefer going with this conservative fix and postpone cleaning things up to [JDK-8268020](https://bugs.openjdk.java.net/browse/JDK-8268020). What do you think? And yes, for `ArrayCopyNodes` used for clone, `src_offset` is always a constant and equal to `dest_offset` (see code in `BarrierSetC2::clone`). ------------- PR: https://git.openjdk.java.net/jdk17/pull/252 From luhenry at openjdk.java.net Mon Jul 19 07:09:56 2021 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Mon, 19 Jul 2021 07:09:56 GMT Subject: RFR: 8178287: AsyncGetCallTrace fails to traverse valid Java stacks [v3] In-Reply-To: References: <9qfnLj_-jz8MocK7UIIs5-NYZsVPJ7J20ZLiORqpUlM=.cb712662-0eb9-4d17-a67d-42451423f470@github.com> Message-ID: On Sun, 11 Jul 2021 22:21:31 GMT, Andrei Pangin wrote: >> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comments > > Hi Ludovic, > > Thank you for working on this long-standing bug. > I like the idea of the proposed solution, but unfortunately it cannot be applied as is. Since the stack walking code runs inside a signal handler, it is very limited in things it can do. In particular, it must not allocate, acquire locks, etc. In your implementation, FrameParser does allocate though. 
> > The issue is not just theoretical: when I ran JDK with this patch with async-profiler, I immediately got the following deadlock: > > > (gdb) bt > #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 > #1 0x00007fa2363ca025 in __GI___pthread_mutex_lock (mutex=0x7fa235da5440 ) > at ../nptl/pthread_mutex_lock.c:80 > #2 0x00007fa235696cb6 in ThreadCritical::ThreadCritical() () from /usr/java/jdk-18/lib/server/libjvm.so > #3 0x00007fa234b6fe53 in Chunk::next_chop() () from /usr/java/jdk-18/lib/server/libjvm.so > #4 0x00007fa234e88523 in frame::safe_for_sender(JavaThread*) () from /usr/java/jdk-18/lib/server/libjvm.so > #5 0x00007fa234e838f2 in vframeStreamForte::forte_next() () from /usr/java/jdk-18/lib/server/libjvm.so > #6 0x00007fa2349fbb9b in forte_fill_call_trace_given_top(JavaThread*, ASGCT_CallTrace*, int, frame) [clone .isra.20] > () from /usr/java/jdk-18/lib/server/libjvm.so > #7 0x00007fa234e8426e in AsyncGetCallTrace () from /usr/java/jdk-18/lib/server/libjvm.so > #8 0x00007fa228519312 in Profiler::getJavaTraceAsync(void*, ASGCT_CallFrame*, int) () > from /mnt/c/Users/Andrei/java/async-profiler/build/libasyncProfiler.so > #9 0x00007fa228519c72 in Profiler::recordSample(void*, unsigned long long, int, Event*) () > from /mnt/c/Users/Andrei/java/async-profiler/build/libasyncProfiler.so > #10 0x00007fa2285164f8 in WallClock::signalHandler(int, siginfo_t*, void*) () > from /mnt/c/Users/Andrei/java/async-profiler/build/libasyncProfiler.so > #11 > #12 __pthread_mutex_unlock_usercnt (decr=1, mutex=0x7fa235da5440 ) at pthread_mutex_unlock.c:41 > #13 __GI___pthread_mutex_unlock (mutex=0x7fa235da5440 ) at pthread_mutex_unlock.c:356 > #14 0x00007fa235696d3b in ThreadCritical::~ThreadCritical() () from /usr/java/jdk-18/lib/server/libjvm.so > #15 0x00007fa234b6fe71 in Chunk::next_chop() () from /usr/java/jdk-18/lib/server/libjvm.so > #16 0x00007fa234d1ca62 in ClassFileParser::parse_method(ClassFileStream const*, bool, ConstantPool const*, 
AccessFlags*, JavaThread*) () from /usr/java/jdk-18/lib/server/libjvm.so > #17 0x00007fa234d1e338 in ClassFileParser::parse_methods(ClassFileStream const*, bool, AccessFlags*, bool*, bool*, JavaThread*) () from /usr/java/jdk-18/lib/server/libjvm.so > #18 0x00007fa234d22459 in ClassFileParser::parse_stream(ClassFileStream const*, JavaThread*) () > from /usr/java/jdk-18/lib/server/libjvm.so > #19 0x00007fa234d2291c in ClassFileParser::ClassFileParser(ClassFileStream*, Symbol*, ClassLoaderData*, ClassLoadInfo const*, ClassFileParser::Publicity, JavaThread*) () from /usr/java/jdk-18/lib/server/libjvm.so > #20 0x00007fa2351febb6 in KlassFactory::create_from_stream(ClassFileStream*, Symbol*, ClassLoaderData*, ClassLoadInfo const&, JavaThread*) () > from /usr/java/jdk-18/lib/server/libjvm.so > #21 0x00007fa235645b40 in SystemDictionary::resolve_class_from_stream(ClassFileStream*, Symbol*, Handle, ClassLoadInfo const&, JavaThread*) () > from /usr/java/jdk-18/lib/server/libjvm.so > #22 0x00007fa2350bad0a in jvm_define_class_common(char const*, _jobject*, signed char const*, int, _jobject*, char const*, JavaThread*) [clone .constprop.299] () > from /usr/java/jdk-18/lib/server/libjvm.so > #23 0x00007fa2350bae6d in JVM_DefineClassWithSource () from /usr/java/jdk-18/lib/server/libjvm.so > #24 0x00007fa236a0ee12 in Java_java_lang_ClassLoader_defineClass1 () from /usr/java/jdk-18/lib/libjava.so @apangin Thanks for pointing that out! I'm updating it right now and should be pushing an update very soon. I'll also add examples on how it impacts JFR. 
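The deadlock in the quoted backtrace has a simple shape: a thread is interrupted while it is still inside the ThreadCritical lock/unlock path, and the signal handler running on that same thread then tries to take the same lock. A minimal Java sketch of that shape follows. It is an analogy only: a Semaphore stands in for a non-recursive native mutex, an ordinary method call stands in for signal delivery, and all names are hypothetical.

```java
import java.util.concurrent.Semaphore;

// Hypothetical analogy, not HotSpot code.
final class SignalHandlerLockSketch {
    // Stand-in for a non-recursive native mutex.
    static final Semaphore MUTEX = new Semaphore(1);

    // Stand-in for a stack walker that allocates and therefore needs the mutex.
    // Uses tryAcquire so the sketch reports the problem instead of hanging.
    static boolean handlerCanProceed() {
        boolean acquired = MUTEX.tryAcquire();
        if (acquired) {
            MUTEX.release();
        }
        return acquired;
    }

    // Stand-in for a thread that is interrupted while it still holds the mutex:
    // the "handler" runs on the same thread before the mutex is released.
    static boolean interruptedWhileHoldingLock() {
        MUTEX.acquireUninterruptibly();
        try {
            return handlerCanProceed();
        } finally {
            MUTEX.release();
        }
    }

    public static void main(String[] args) {
        System.out.println("handler, lock free: " + handlerCanProceed());
        System.out.println("handler, lock held by interrupted thread: "
                + interruptedWhileHoldingLock());
    }
}
```

A real signal handler would call a blocking lock rather than tryAcquire and would wait forever, which is why code reachable from AsyncGetCallTrace has to avoid allocation and locking altogether.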
------------- PR: https://git.openjdk.java.net/jdk/pull/4436 From paul.sandoz at oracle.com Mon Jul 19 20:28:37 2021 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 19 Jul 2021 20:28:37 +0000 Subject: RFR: 8267356: AArch64: Vector API SVE codegen support In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: <787204D1-0E8A-4FC8-B7F0-0DFC0261F57F@oracle.com> Either way I think it's good that changes are/were socialized early, so as reviewers may familiarize themselves and not be surprised. Leaving a PR open for comments is useful in this respect. I'll start pushing the JEP forward this week. Paul. > On Jul 18, 2021, at 7:47 PM, Ningsheng Jian wrote: > > On Thu, 20 May 2021 07:32:52 GMT, Ningsheng Jian wrote: > >> This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: >> >> 1. Code generation for Vector API c2 IR nodes with SVE. >> 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. >> 3. Some more SVE assemblers (and tests) used by the codegen part. >> >> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. >> >> Note: our original plan was making this work part of JEP 414 Vector API (Second Incubator) [1], but we realized that it's now close to 17 release cycle and the JEP process may take time. Adding more features could delay the whole review process for the JEP. So we separate this work out as a standalone patch. >> >> [1] http://openjdk.java.net/jeps/414 > > Will reopen this when https://bugs.openjdk.java.net/browse/JDK-8269306 finalized.
> > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/4122 From coleenp at openjdk.java.net Wed Jul 21 20:20:13 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 21 Jul 2021 20:20:13 GMT Subject: RFR: 8271063: Print injected fields for InstanceKlass [v3] In-Reply-To: References: Message-ID: > I added code to print the injected fields in InstanceKlass. I also removed field offset sorting for do_nonstatic_fields because it's not necessary when used for purposes other than printing the fields. I also added a gtest. > Tested with tier1 all Oracle platforms, tier2-3 on linux-x64-debug, windows-x64-debug. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix errant print ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4858/files - new: https://git.openjdk.java.net/jdk/pull/4858/files/4c86b5c4..7f8e7421 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4858&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4858&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4858.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4858/head:pull/4858 PR: https://git.openjdk.java.net/jdk/pull/4858 From fparain at openjdk.java.net Wed Jul 21 21:18:46 2021 From: fparain at openjdk.java.net (Frederic Parain) Date: Wed, 21 Jul 2021 21:18:46 GMT Subject: RFR: 8271063: Print injected fields for InstanceKlass [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 20:20:13 GMT, Coleen Phillimore wrote: >> I added code to print the injected fields in InstanceKlass. I also removed field offset sorting for do_nonstatic_fields because it's not necessary when used for purposes other than printing the fields. I also added a gtest. >> Tested with tier1 all Oracle platforms, tier2-3 on linux-x64-debug, windows-x64-debug. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix errant print Looks good to me. Fred ------------- Marked as reviewed by fparain (Committer). PR: https://git.openjdk.java.net/jdk/pull/4858 From coleenp at openjdk.java.net Wed Jul 21 22:27:48 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 21 Jul 2021 22:27:48 GMT Subject: RFR: 8271063: Print injected fields for InstanceKlass [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 20:20:13 GMT, Coleen Phillimore wrote: >> I added code to print the injected fields in InstanceKlass. I also removed field offset sorting for do_nonstatic_fields because it's not necessary when used for purposes other than printing the fields. I also added a gtest. >> Tested with tier1 all Oracle platforms, tier2-3 on linux-x64-debug, windows-x64-debug. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix errant print Thanks Fred! ------------- PR: https://git.openjdk.java.net/jdk/pull/4858 From jwilhelm at openjdk.java.net Thu Jul 22 00:02:23 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 22 Jul 2021 00:02:23 GMT Subject: RFR: Merge jdk17 Message-ID: Forwardport JDK 17 -> JDK 18 ------------- Commit messages: - Merge - 8266347: assert(Dependencies::is_concrete_root_method(fm, ctxk) == Dependencies::is_concrete_method(m, ctxk)) failed: mismatch - 8264066: Enhance compiler validation - 8265201: JarFile.getInputStream not validating invalid signed jars - 8258432: Improve File Transfers - 8264079: Improve abstractions - 8262380: Enhance XML processing passes - 8262967: Improve Zip file support - 8264460: Improve NTLM support - 8256491: Better HTTP transport - ... 
and 10 more: https://git.openjdk.java.net/jdk/compare/0790f04d...025eaefb The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=4863&range=00.0 - jdk17: https://webrevs.openjdk.java.net/?repo=jdk&pr=4863&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/4863/files Stats: 517 lines in 33 files changed: 403 ins; 34 del; 80 mod Patch: https://git.openjdk.java.net/jdk/pull/4863.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4863/head:pull/4863 PR: https://git.openjdk.java.net/jdk/pull/4863 From jwilhelm at openjdk.java.net Thu Jul 22 00:51:37 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 22 Jul 2021 00:51:37 GMT Subject: RFR: Merge jdk17 [v2] In-Reply-To: References: Message-ID: > Forwardport JDK 17 -> JDK 18 Jesper Wilhelmsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 282 commits: - Merge - 8271015: Split cds/SharedBaseAddress.java test into smaller parts Reviewed-by: ccheung, minqi - 8271014: Refactor HeapShared::is_archived_object() Reviewed-by: ccheung, minqi - 8270949: Make dynamically generated classes with the class file version of the current release Reviewed-by: alanb - 8269849: vmTestbase/gc/gctests/PhantomReference/phantom002/TestDescription.java failed with "OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects" Reviewed-by: kbarrett - 8270991: G1 Full GC always performs heap verification after JDK-8269295 Reviewed-by: iwalulya, kbarrett - 8270820: remove unused stiFileTableIndex from SDE.c Reviewed-by: cjplummer, sspitsyn - 8270147: Increase stride size allowing unrolling more loops Reviewed-by: kvn, iveresov - 8270803: Reduce CDS API verbosity Reviewed-by: minqi, ccheung - 8269933: test/jdk/javax/net/ssl/compatibility/JdkInfo incorrect verification of protocol and cipher support Reviewed-by: xuelei, rhalade - ... 
and 272 more: https://git.openjdk.java.net/jdk/compare/89f7998a...025eaefb ------------- Changes: https://git.openjdk.java.net/jdk/pull/4863/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4863&range=01 Stats: 55988 lines in 1158 files changed: 26162 ins; 25130 del; 4696 mod Patch: https://git.openjdk.java.net/jdk/pull/4863.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4863/head:pull/4863 PR: https://git.openjdk.java.net/jdk/pull/4863 From jwilhelm at openjdk.java.net Thu Jul 22 00:51:38 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 22 Jul 2021 00:51:38 GMT Subject: Integrated: Merge jdk17 In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 23:52:53 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 17 -> JDK 18 This pull request has now been integrated. Changeset: c36755de Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/c36755dedf1a0d7ce0aeadd401e0c70ff84185e7 Stats: 517 lines in 33 files changed: 403 ins; 34 del; 80 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/4863 From hboehm at google.com Thu Jul 22 01:25:16 2021 From: hboehm at google.com (Hans Boehm) Date: Wed, 21 Jul 2021 18:25:16 -0700 Subject: JNI WeakGlobalRefs In-Reply-To: References: Message-ID: [ Moving here from core-libs-dev on David Holmes' recommendation. ] I'm concerned that the current semantics of JNI WeakGlobalRefs are still dangerous in a very subtle way that is hidden in the spec. The current (14+) spec says: "Weak global references are related to Java phantom references (java.lang.ref.PhantomReference). A weak global reference to a specific object is treated as a phantom reference referring to that object when determining whether the object is phantom reachable (see java.lang.ref). ---> Such a weak global reference will become functionally equivalent to NULL at the same time as a PhantomReference referring to that same object would be cleared by the garbage collector. <---"
(This was the result of JDK-8220617, and is IMO a large improvement over the prior version, but ...) Consider what happens if I have a WeakGlobalRef W that refers to a Java object A which, possibly indirectly, relies on an object F, where F is finalizable, i.e. W - - -> A -----> ... -----> F Assume that F becomes invalid once it is finalized, e.g. because the finalizer deallocates a native object that F relies on. This seems to be a very common case. We are then exposed to the following scenario: 0) At some point, there are no longer any other references to A or F. 1) F is enqueued for finalization. 2) W is dereferenced by Thread 1, yielding a strong reference to A and transitively to F. 3) F is finalized. 4) Thread 1 uses A and F, accessing F, which is no longer valid. 5) Crash, or possibly memory corruption followed by a later crash elsewhere. (3) and (4) actually race, so there is some synchronization effort and cost required to prevent F from corrupting memory. Commonly the implementer of W will have no idea that F even exists. I believe that typically there is no way to prevent this scenario, unless the developer adding W actually knows how every class that A could possibly rely on, including those in the Java standard library, are implemented. This is reminiscent of finalizer ordering issues. But it seems to be worse, in that there isn't even a semi-plausible workaround. I believe all of this is exactly the reason PhantomReference.get() always returns null, while WeakReference provides significantly different semantics, and WeakReferences are enqueued when an object is enqueued for finalization. The situation improves, but the problem doesn't fully disappear, in a hypothetical world without finalizers. It's still possible to use WeakGlobalRef to get a strong reference to A after a WeakReference to A has been cleared and enqueued. I think the problem does go away if all cleanup code were to use PhantomReference-based Cleaners. 
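The difference between the two java.lang.ref kinds mentioned above can be observed from plain Java. The following is a minimal sketch with hypothetical class and method names. It does not model JNI weak global references or finalization at all; it only shows that a PhantomReference never hands its referent back, while a WeakReference does for as long as it has not been cleared.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical illustration of the get() behaviour discussed above.
final class ReferenceGetSketch {
    // PhantomReference.get() returns null even while the referent is strongly
    // reachable, so it cannot be used to obtain a new strong reference.
    static boolean phantomGetIsNull(Object referent) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> ref = new PhantomReference<>(referent, queue);
        return ref.get() == null;
    }

    // WeakReference.get() returns the referent until the reference is cleared.
    // The referent is still strongly held by the caller here, so it is returned.
    static boolean weakGetReturnsReferent(Object referent) {
        WeakReference<Object> ref = new WeakReference<>(referent);
        return ref.get() == referent;
    }

    public static void main(String[] args) {
        Object a = new Object();
        System.out.println("phantom get() is null: " + phantomGetIsNull(a));
        System.out.println("weak get() returns referent: " + weakGetReturnsReferent(a));
    }
}
```

A JNI weak global reference combines the two: it is cleared on the phantom schedule, yet it can still be turned into a strong reference before that point, which is the window the scenario above depends on.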
AFAICT, backward-compatibility aside, the obvious solution here is to have WeakGlobalRefs behave like WeakReferences. My impression is that this would fix significantly more broken clients than it would break correct ones, so it is arguably still a viable option. There is a case in which the current semantics are actually the desired ones, namely when implementing, say, a String intern table. In this case it's important the reference not be cleared even if the referent is, at some point, only reachable via a finalizer. But this use case again relies on the programmer knowing that no part of the referent is invalidated by a finalizer. That's a reasonable assumption for the Java-implementation-provided String intern table. But I'm not sure it's reasonable for any user-written code. There seem to be two ways forward here: 1) Make WeakGlobalRefs behave like WeakReferences instead of PhantomReferences, or 2) Add strong warnings to the spec that basically suggest using a strong GlobalRef to a WeakReference instead. Has there been prior discussion of this? Are there reasonable use cases for the current semantics? Is there something else that I'm overlooking? If not, what's the best way forward here? (I found some discussion from JDK-8220617, including a message I posted. Unfortunately, it seems to me that all of us overlooked this issue?) Hans From kbarrett at openjdk.java.net Thu Jul 22 04:22:46 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 22 Jul 2021 04:22:46 GMT Subject: RFR: 8270961: [TESTBUG] Move GotWrongOOMEException into vm.share.gc package In-Reply-To: <68ZribzSwQCScsVM69wcNS5HYkUN3nMIz5H6z05u7UI=.413e9cb3-cb70-4b66-ad67-cc7861999bfb@github.com> References: <68ZribzSwQCScsVM69wcNS5HYkUN3nMIz5H6z05u7UI=.413e9cb3-cb70-4b66-ad67-cc7861999bfb@github.com> Message-ID: On Tue, 20 Jul 2021 19:26:31 GMT, Leonid Mesnik wrote: > The small refactoring which consolidates shared code in the shared library. Make project more IDE-friendly. 
Some minor commenting issues, but otherwise looks good. test/hotspot/jtreg/vmTestbase/metaspace/stressHierarchy/common/StressHierarchyBaseClass.java line 117: > 115: * We pass test in this case as this breaks test logic. We have dedicated test configurations > 116: * for OOME:heap provoking class unloading, that why we are not missing test coverage here. > 117: */ This comment is unnecessary, as it adds nothing over the log message that follows. test/hotspot/jtreg/vmTestbase/vm/share/gc/HeapOOMEException.java line 26: > 24: > 25: /** > 26: * This class is used to differ OOME in metaspace and heap when trigger class unloading. That doesn't scan so well. Perhaps something like "This class is used to distinguish between OOME in metaspace and OOME in heap when triggering class unloading." ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4844 From lmesnik at openjdk.java.net Thu Jul 22 04:41:25 2021 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Thu, 22 Jul 2021 04:41:25 GMT Subject: RFR: 8270961: [TESTBUG] Move GotWrongOOMEException into vm.share.gc package [v2] In-Reply-To: References: <68ZribzSwQCScsVM69wcNS5HYkUN3nMIz5H6z05u7UI=.413e9cb3-cb70-4b66-ad67-cc7861999bfb@github.com> Message-ID: On Thu, 22 Jul 2021 04:13:16 GMT, Kim Barrett wrote: >> Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - fixed after Kim's comments. >> - Merge branch 'master' of https://github.com/openjdk/jdk into 8270961 >> - fixed log string. >> - import fix. >> - 8270961 > > test/hotspot/jtreg/vmTestbase/metaspace/stressHierarchy/common/StressHierarchyBaseClass.java line 117: > >> 115: * We pass test in this case as this breaks test logic. 
We have dedicated test configurations >> 116: * for OOME:heap provoking class unloading, that why we are not missing test coverage here. >> 117: */ > > This comment is unnecessary, as it adds nothing over the log message that follows. Thanks! I just moved it and missed that is the same as the log message. ------------- PR: https://git.openjdk.java.net/jdk/pull/4844 From lmesnik at openjdk.java.net Thu Jul 22 04:41:20 2021 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Thu, 22 Jul 2021 04:41:20 GMT Subject: RFR: 8270961: [TESTBUG] Move GotWrongOOMEException into vm.share.gc package [v2] In-Reply-To: <68ZribzSwQCScsVM69wcNS5HYkUN3nMIz5H6z05u7UI=.413e9cb3-cb70-4b66-ad67-cc7861999bfb@github.com> References: <68ZribzSwQCScsVM69wcNS5HYkUN3nMIz5H6z05u7UI=.413e9cb3-cb70-4b66-ad67-cc7861999bfb@github.com> Message-ID: > The small refactoring which consolidates shared code in the shared library. Make project more IDE-friendly. Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - fixed after Kim's comments. - Merge branch 'master' of https://github.com/openjdk/jdk into 8270961 - fixed log string. - import fix. 
- 8270961 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4844/files - new: https://git.openjdk.java.net/jdk/pull/4844/files/990bff2f..70e726f3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4844&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4844&range=00-01 Stats: 1509 lines in 67 files changed: 998 ins; 186 del; 325 mod Patch: https://git.openjdk.java.net/jdk/pull/4844.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4844/head:pull/4844 PR: https://git.openjdk.java.net/jdk/pull/4844 From thartmann at openjdk.java.net Thu Jul 22 05:55:49 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 22 Jul 2021 05:55:49 GMT Subject: [jdk17] RFR: 8270461: ZGC: Invalid oop passed to ZBarrierSetRuntime::load_barrier_on_oop_array [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 06:46:37 GMT, Tobias Hartmann wrote: >> For object arrays, C2's clone intrinsic emits calls to the `oop_disjoint_arraycopy_uninit` stub. With ZGC, load barriers on the source array elements are applied via `BarrierSetAssembler::arraycopy_prologue` before copying to the destination array: https://github.com/openjdk/jdk17/blob/1e3b418a53a080a53827989393362338b43dd363/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L2400-L2403 >> >> The problem is that `BarrierSetC2::arraycopy_payload_base_offset` may 8-byte align the array offset to ensure that a copy in 8-byte chunks of the 8-byte aligned object is possible: https://github.com/openjdk/jdk17/blob/1e3b418a53a080a53827989393362338b43dd363/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L658-L662 >> Now with ` -XX:-UseCompressedClassPointers`, the offset starts at the 4-byte length field of the array (8 bytes mark word + 8 byte klass = 16 byte). That is fine if we don't need load barriers and can copy the array as `T_LONG` but with ZGC we crash in `ZBarrier::mark` because the element is not a valid oop. 
>> >> I propose to simply set the offset to the first array element when cloning Object arrays with ZGC. We can still copy in 8 byte chunks because the oop elements are 8 byte on 64-bit (and ZGC is only supported on 64-bit). >> >> I found this when investigating intermittent ZGC crashes in project Valhalla. The bug was introduced by [JDK-8268125](https://bugs.openjdk.java.net/browse/JDK-8268125) in JDK 17. The code is quite messy and will hopefully be cleaned up by [JDK-8268020](https://bugs.openjdk.java.net/browse/JDK-8268020). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted comment Thanks, Vladimir! ------------- PR: https://git.openjdk.java.net/jdk17/pull/252 From thartmann at openjdk.java.net Thu Jul 22 06:02:49 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 22 Jul 2021 06:02:49 GMT Subject: [jdk17] Integrated: 8270461: ZGC: Invalid oop passed to ZBarrierSetRuntime::load_barrier_on_oop_array In-Reply-To: References: Message-ID: On Fri, 16 Jul 2021 13:59:02 GMT, Tobias Hartmann wrote: > For object arrays, C2's clone intrinsic emits calls to the `oop_disjoint_arraycopy_uninit` stub. With ZGC, load barriers on the source array elements are applied via `BarrierSetAssembler::arraycopy_prologue` before copying to the destination array: https://github.com/openjdk/jdk17/blob/1e3b418a53a080a53827989393362338b43dd363/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L2400-L2403 > > The problem is that `BarrierSetC2::arraycopy_payload_base_offset` may 8-byte align the array offset to ensure that a copy in 8-byte chunks of the 8-byte aligned object is possible: https://github.com/openjdk/jdk17/blob/1e3b418a53a080a53827989393362338b43dd363/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L658-L662 > Now with ` -XX:-UseCompressedClassPointers`, the offset starts at the 4-byte length field of the array (8 bytes mark word + 8 byte klass = 16 byte). 
That is fine if we don't need load barriers and can copy the array as `T_LONG` but with ZGC we crash in `ZBarrier::mark` because the element is not a valid oop. > > I propose to simply set the offset to the first array element when cloning Object arrays with ZGC. We can still copy in 8 byte chunks because the oop elements are 8 byte on 64-bit (and ZGC is only supported on 64-bit). > > I found this when investigating intermittent ZGC crashes in project Valhalla. The bug was introduced by [JDK-8268125](https://bugs.openjdk.java.net/browse/JDK-8268125) in JDK 17. The code is quite messy and will hopefully be cleaned up by [JDK-8268020](https://bugs.openjdk.java.net/browse/JDK-8268020). > > Thanks, > Tobias This pull request has now been integrated. Changeset: 4119a52c Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk17/commit/4119a52c4b3d30d7e02e6f987f61121a90758876 Stats: 19 lines in 3 files changed: 13 ins; 0 del; 6 mod 8270461: ZGC: Invalid oop passed to ZBarrierSetRuntime::load_barrier_on_oop_array Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.java.net/jdk17/pull/252 From tschatzl at openjdk.java.net Thu Jul 22 07:22:49 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 22 Jul 2021 07:22:49 GMT Subject: RFR: 8270961: [TESTBUG] Move GotWrongOOMEException into vm.share.gc package [v2] In-Reply-To: References: <68ZribzSwQCScsVM69wcNS5HYkUN3nMIz5H6z05u7UI=.413e9cb3-cb70-4b66-ad67-cc7861999bfb@github.com> Message-ID: <91jfzBI7s8byTQLDCgDDaLUi4FqH5CsBo4MZf8VAgQs=.5666606b-8cd0-4d16-acfa-3ff3eac962b4@github.com> On Thu, 22 Jul 2021 04:41:20 GMT, Leonid Mesnik wrote: >> The small refactoring which consolidates shared code in the shared library. Make project more IDE-friendly. > > Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - fixed after Kim's comments. > - Merge branch 'master' of https://github.com/openjdk/jdk into 8270961 > - fixed log string. > - import fix. > - 8270961 Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4844 From erik.osterlund at oracle.com Thu Jul 22 08:20:36 2021 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Thu, 22 Jul 2021 08:20:36 +0000 Subject: JNI WeakGlobalRefs In-Reply-To: References: Message-ID: Hi Hans, So you are saying that a jweak allows you to peek into the finalizer graph of a finalizing object from the outside due to having phantom semantics, and that is sometimes bad. I agree. As you noted, if you don't use finalizers, there is no problem here. But let's assume we do what you propose, and make jweak have weak semantics, instead of phantom semantics. First of all, have we now removed all gimmicks that allow you to peek through finalizers? I would say no. Consider the following (same notation as your example): F ---> ... ---> F Here we have an object with a finalizer that can reach another object with a finalizer. They can both get finalized by the same GC cycle. So depending on who is enqueued for finalization first, you may or may not be able to peek into an already finalized object here, from the outside, through another finalizer. One might argue about the probability of users using an object behind a jweak vs finalizer, without knowing their implementation. But it's bad news for both. Once again, if finalizers are not used, there is no problem. Secondly, as for when you would ever want jweak to have phantom semantics... I don't think this is uncommon, in particular for native code. It is not uncommon to use JNI to implement a native mirror component for a Java object. The Java object and its native data structure are to be considered as one logical unit that stick together. 
The Java object has a pointer to its native part, and the native part wants a pointer back to its Java object, so they can talk to each other seamlessly. You don't want the back pointer to be strong because it would create memory leaks, so you use a jweak. The expectation is that surely, as long as this Java object is around, you can access it through the jweak. Especially since the spec says so, it's a very sound assumption. Then some cleaner will delete the data structure when it is no longer relevant. So in contexts where you are in the native data structure and haven't passed around the object reference, you can always get the object through the jweak as long as it isn't dead. Now if we change the semantics of jweak to weak instead, then every time the Java object is reachable through a finalizer only, the logic will be wrong. The native part thinks the object is dead, but it isn't. So if anyone uses this object through a finalizer, bad things can happen. Kind of like a native object monitor not working properly in finalizers. You essentially no longer can have an object with a native component point at each other, without running into a bunch of issues. Either introducing a memory leak, or having the object be crippled when reachable from finalizers. For similar reasons, all native weak references used internally in HotSpot, do have phantom semantics, and rely on that being the case. So it's a two edged sword I think. What is true for this whole discussion though is that if we don't have finalizers, then there isn't really a problem. Finalizers are deprecated, so hopefully its use will die out over time. What is also true is that by changing the semantics of jweak, you trade one problem for another one. This seems particularly nasty to change, as the spec clearly states what users can expect from this API. Hope this helps. /Erik > [ Moving here from core-libs-dev on David Holmes' recommendation. 
] > > I'm concerned that the current semantics of JNI WeakGlobalRefs are still > dangerous in a very subtle way that is hidden in the spec. The current > (14+) spec says: > > "Weak global references are related to Java phantom references > (java.lang.ref.PhantomReference). A weak global reference to a specific > object is treated as a phantom reference referring to that object when > determining whether the object is phantom reachable (see java.lang.ref). > ---> Such a weak global reference will become functionally equivalent to > NULL at the same time as a PhantomReference referring to that same object > would be cleared by the garbage collector. <---" > > (This was the result of JDK-8220617, and is IMO a large improvement over the > prior version, but ...) > > Consider what happens if I have a WeakGlobalRef W that refers to a Java > object A which, possibly indirectly, relies on an object F, where F is > finalizable, i.e. > > W - - -> A -----> ... -----> F > > Assume that F becomes invalid once it is finalized, e.g. because the finalizer > deallocates a native object that F relies on. This seems to be a very common > case. We are then exposed to the following scenario: > > 0) At some point, there are no longer any other references to A or F. > 1) F is enqueued for finalization. > 2) W is dereferenced by Thread 1, yielding a strong reference to A and > transitively to F. > 3) F is finalized. > 4) Thread 1 uses A and F, accessing F, which is no longer valid. > 5) Crash, or possibly memory corruption followed by a later crash elsewhere. > > (3) and (4) actually race, so there is some synchronization effort and cost > required to prevent F from corrupting memory. Commonly the implementer > of W will have no idea that F even exists. > > I believe that typically there is no way to prevent this scenario, unless the > developer adding W actually knows how every class that A could possibly rely > on, including those in the Java standard library, are implemented.
> > This is reminiscent of finalizer ordering issues. But it seems to be worse, in > that there isn't even a semi-plausible workaround. > > I believe all of this is exactly the reason PhantomReference.get() always > returns null, while WeakReference provides significantly different semantics, > and WeakReferences are enqueued when an object is enqueued for > finalization. > > The situation improves, but the problem doesn't fully disappear, in a > hypothetical world without finalizers. It's still possible to use WeakGlobalRef > to get a strong reference to A after a WeakReference to A has been cleared > and enqueued. I think the problem does go away if all cleanup code were to > use PhantomReference-based Cleaners. > > AFAICT, backward-compatibility aside, the obvious solution here is to have > WeakGlobalRefs behave like WeakReferences. My impression is that this > would fix significantly more broken clients than it would break correct ones, > so it is arguably still a viable option. > > There is a case in which the current semantics are actually the desired ones, > namely when implementing, say, a String intern table. In this case it's > important the reference not be cleared even if the referent is, at some > point, only reachable via a finalizer. But this use case again relies on the > programmer knowing that no part of the referent is invalidated by a finalizer. > That's a reasonable assumption for the Java-implementation-provided String > intern table. But I'm not sure it's reasonable for any user-written code. > > There seem to be two ways forward here: > > 1) Make WeakGlobalRefs behave like WeakReferences instead of > PhantomReferences, or > 2) Add strong warnings to the spec that basically suggest using a strong > GlobalRef to a WeakReference instead. > > Has there been prior discussion of this? Are there reasonable use cases for > the current semantics? Is there something else that I'm overlooking? If not, > what's the best way forward here? 
> > (I found some discussion from JDK-8220617, including a message I posted. > Unfortunately, it seems to me that all of us overlooked this issue?) > > Hans From coleenp at openjdk.java.net Thu Jul 22 11:47:48 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 22 Jul 2021 11:47:48 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v2] In-Reply-To: References: <2pyhkqWrPeCCq-mva2BtVRH8GXQsWF6wW7cyTKcNJj8=.95a9ed9f-5e78-4e42-b385-4190ec8eec9b@github.com> Message-ID: <4EvTy1zZI6i_Sf3kZuayPvLsRgMifI7Fauy7EenaCGE=.1908ff49-dcd2-42bf-b038-ed62da6418f3@github.com> On Wed, 21 Jul 2021 14:45:53 GMT, Thomas Stuefe wrote: >> Or guarantee the chunk size is 64bit aligned and std::max_align_t is also (at least) 64bit aligned (which it almost certainly is), so that the result of malloc will always be (at least) 64bit aligned. > > I strongly agree that _max and the chunk boundaries should be aligned to 64bit alignment. But as I wrote, that would cause more widespread changes. I would just duplicate the work I did with my first proposal https://github.com/openjdk/jdk/pull/4784. Which already exists, and it does do all you requested. Maybe take a look at that? If you like it, I propose we put this minimal patch through and later use my first PR as a base for a better cleanup. > > The current behavior for Amalloc(0) is non-malloc-standard: it returns a non-unique-non-NULL address (since the next allocation returns the same address). When I worked on the first proposal I tried to fix this too. First tried to return NULL, which does not work, callers assume that's an OOM. Then by returning a unique pointer, but I did not like the memory waste. The true fix would be to disallow size==0 and fix callers not to allocate 0 bytes. But I believe that memory allocated with size==0 is not used, since it would be overwritten by the next allocation. > > I'll modify the patch and do a size==0->size=1 for 32bit only. 
Let's hope that does not blow up arena sizes too much. > > About std::max_align_t, can we use that? I hard-coded malloc align in a number of places. Can you just assert that size > 0 for Amalloc? ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From hseigel at openjdk.java.net Thu Jul 22 12:12:49 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Thu, 22 Jul 2021 12:12:49 GMT Subject: RFR: 8271063: Print injected fields for InstanceKlass [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 20:20:13 GMT, Coleen Phillimore wrote: >> I added code to print the injected fields in InstanceKlass. I also removed field offset sorting for do_nonstatic_fields because it's not necessary when used for purposes other than printing the fields. I also added a gtest. >> Tested with tier1 all Oracle platforms, tier2-3 on linux-x64-debug, windows-x64-debug. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix errant print Changes look good! Thanks, Harold ------------- Marked as reviewed by hseigel (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4858 From mbaesken at openjdk.java.net Thu Jul 22 12:18:36 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Thu, 22 Jul 2021 12:18:36 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v5] In-Reply-To: References: Message-ID: <5U29-7MNu_hZ-zKgsFUKj656igVcq-0h3kq-EASNU_M=.2881b609-4712-49b7-8f4f-fb447b94c2eb@github.com> On Fri, 16 Jul 2021 06:14:07 GMT, Matthias Baesken wrote: >> Hello, please review this PR; it extend the OSContainer API in order to also support the pids controller of cgroups. 
>> >> I noticed that unlike the other controllers "cpu", "cpuset", "cpuacct", "memory" on some older Linux distros (SLES 12.1, RHEL 7.1) the pids controller might not be there (or not fully supported) so it was added as optional , see the coding >> >> >> if (!cg_infos[PIDS_IDX]._data_complete) { >> log_debug(os, container)("Optional cgroup v1 pids subsystem not found"); >> // keep the other controller info, pids is optional >> } > > Matthias Baesken has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into JDK-8266490 > - Add hotspot tests > - test and small adjustments suggested by Severin > - Adjustments following Severins comments > - JDK-8266490 Hi Severin, thanks for the comments. I added a commit with a number of adjustments src/hotspot/os/linux/cgroupSubsystem_linux.cpp adjusted log_info to log_debug src/java.base/share/classes/sun/launcher/LauncherHelper.java adjusted the output to "Maximum Processes Limit:" test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java removed the getPidsMax related line (I think I inserted it while running some tests and forgot previously to remove it) test/hotspot/jtreg/containers/docker/TestPids.java added testing of "Unlimited"; added --pids-limit=-1 for Unlimited procs like you suggested test/jdk/jdk/internal/platform/docker/TestPidsLimit.java adjusted output; added --pids-limit=-1 for Unlimited procs like you suggested ------------- PR: https://git.openjdk.java.net/jdk/pull/4518 From mbaesken at openjdk.java.net Thu Jul 22 12:18:20 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Thu, 22 Jul 2021 12:18:20 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v6] In-Reply-To: References: Message-ID: > Hello, please 
review this PR; it extend the OSContainer API in order to also support the pids controller of cgroups. > > I noticed that unlike the other controllers "cpu", "cpuset", "cpuacct", "memory" on some older Linux distros (SLES 12.1, RHEL 7.1) the pids controller might not be there (or not fully supported) so it was added as optional , see the coding > > > if (!cg_infos[PIDS_IDX]._data_complete) { > log_debug(os, container)("Optional cgroup v1 pids subsystem not found"); > // keep the other controller info, pids is optional > } Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: Minor adjustments, handling of Unlimited ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4518/files - new: https://git.openjdk.java.net/jdk/pull/4518/files/5fc52fb1..857ab1db Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4518&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4518&range=04-05 Stats: 19 lines in 5 files changed: 11 ins; 1 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/4518.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4518/head:pull/4518 PR: https://git.openjdk.java.net/jdk/pull/4518 From yyang at openjdk.java.net Thu Jul 22 12:28:46 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Thu, 22 Jul 2021 12:28:46 GMT Subject: RFR: 8271063: Print injected fields for InstanceKlass [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 20:20:13 GMT, Coleen Phillimore wrote: >> I added code to print the injected fields in InstanceKlass. I also removed field offset sorting for do_nonstatic_fields because it's not necessary when used for purposes other than printing the fields. I also added a gtest. >> Tested with tier1 all Oracle platforms, tier2-3 on linux-x64-debug, windows-x64-debug. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix errant print src/hotspot/share/oops/instanceKlass.cpp line 1641: > 1639: } > 1640: > 1641: struct F { Hi Coleen, is it possible to replace this with existing Pair structure(utilities/pair.hpp)? ------------- PR: https://git.openjdk.java.net/jdk/pull/4858 From stuefe at openjdk.java.net Thu Jul 22 13:02:48 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 22 Jul 2021 13:02:48 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v2] In-Reply-To: <4EvTy1zZI6i_Sf3kZuayPvLsRgMifI7Fauy7EenaCGE=.1908ff49-dcd2-42bf-b038-ed62da6418f3@github.com> References: <2pyhkqWrPeCCq-mva2BtVRH8GXQsWF6wW7cyTKcNJj8=.95a9ed9f-5e78-4e42-b385-4190ec8eec9b@github.com> <4EvTy1zZI6i_Sf3kZuayPvLsRgMifI7Fauy7EenaCGE=.1908ff49-dcd2-42bf-b038-ed62da6418f3@github.com> Message-ID: On Thu, 22 Jul 2021 11:44:35 GMT, Coleen Phillimore wrote: >> I strongly agree that _max and the chunk boundaries should be aligned to 64bit alignment. But as I wrote, that would cause more widespread changes. I would just duplicate the work I did with my first proposal https://github.com/openjdk/jdk/pull/4784. Which already exists, and it does do all you requested. Maybe take a look at that? If you like it, I propose we put this minimal patch through and later use my first PR as a base for a better cleanup. >> >> The current behavior for Amalloc(0) is non-malloc-standard: it returns a non-unique-non-NULL address (since the next allocation returns the same address). When I worked on the first proposal I tried to fix this too. First tried to return NULL, which does not work, callers assume that's an OOM. Then by returning a unique pointer, but I did not like the memory waste. The true fix would be to disallow size==0 and fix callers not to allocate 0 bytes. 
But I believe that memory allocated with size==0 is not used, since it would be overwritten by the next allocation. >> >> I'll modify the patch and do a size==0->size=1 for 32bit only. Let's hope that does not blow up arena sizes too much. >> >> About std::max_align_t, can we use that? I hard-coded malloc align in a number of places. > > Can you just assert that size > 0 for Amalloc? Unfortunately not, since that is actually done. I would have to hunt down callers which do this. That is maybe worthwhile, but would be a larger change. ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From coleenp at openjdk.java.net Thu Jul 22 14:25:46 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 22 Jul 2021 14:25:46 GMT Subject: RFR: 8271063: Print injected fields for InstanceKlass [v3] In-Reply-To: References: Message-ID: On Thu, 22 Jul 2021 12:26:05 GMT, Yi Yang wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix errant print > > src/hotspot/share/oops/instanceKlass.cpp line 1641: > >> 1639: } >> 1640: >> 1641: struct F { > > Hi Coleen, is it possible to reuse existing Pair structure(utilities/pair.hpp)? Just IMHO:) Yeah, that would be nicer. Let me have a look. ------------- PR: https://git.openjdk.java.net/jdk/pull/4858 From coleenp at openjdk.java.net Thu Jul 22 14:35:21 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 22 Jul 2021 14:35:21 GMT Subject: RFR: 8271063: Print injected fields for InstanceKlass [v4] In-Reply-To: References: Message-ID: > I added code to print the injected fields in InstanceKlass. I also removed field offset sorting for do_nonstatic_fields because it's not necessary when used for purposes other than printing the fields. I also added a gtest. > Tested with tier1 all Oracle platforms, tier2-3 on linux-x64-debug, windows-x64-debug. 
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Pair is much nicer. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4858/files - new: https://git.openjdk.java.net/jdk/pull/4858/files/7f8e7421..d7f648a0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4858&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4858&range=02-03 Stats: 14 lines in 1 file changed: 1 ins; 6 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/4858.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4858/head:pull/4858 PR: https://git.openjdk.java.net/jdk/pull/4858 From coleenp at openjdk.java.net Thu Jul 22 14:35:24 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 22 Jul 2021 14:35:24 GMT Subject: RFR: 8271063: Print injected fields for InstanceKlass [v3] In-Reply-To: References: Message-ID: On Thu, 22 Jul 2021 14:23:10 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/instanceKlass.cpp line 1641: >> >>> 1639: } >>> 1640: >>> 1641: struct F { >> >> Hi Coleen, is it possible to reuse existing Pair structure(utilities/pair.hpp)? Just IMHO:) > > Yeah, that would be nicer. Let me have a look. Thank you for the improvement. I was trying to be minimal with 'F' but Pair is better. 
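The struct-versus-Pair point can be illustrated outside HotSpot. What follows is a minimal sketch, assuming the goal is to collect (offset, index) pairs and sort them by offset before printing. It uses `std::pair` and `std::vector` as stand-ins for HotSpot's `Pair` (utilities/pair.hpp) and its growable arrays; `FieldInfo` and `sorted_by_offset` are hypothetical names for illustration, not code from the patch.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a field descriptor; not a HotSpot type.
struct FieldInfo {
  std::string name;
  int offset;  // byte offset of the field within the object
};

// Returns (offset, index) pairs sorted by ascending offset. The second
// member of each pair is the field's position in the input vector, so the
// caller can look the field up again after sorting.
std::vector<std::pair<int, int>> sorted_by_offset(const std::vector<FieldInfo>& fields) {
  std::vector<std::pair<int, int>> order;
  order.reserve(fields.size());
  for (int i = 0; i < static_cast<int>(fields.size()); i++) {
    order.emplace_back(fields[i].offset, i);
  }
  // Compare on the offset only, as a dedicated two-int struct would.
  std::sort(order.begin(), order.end(),
            [](const std::pair<int, int>& a, const std::pair<int, int>& b) {
              return a.first < b.first;
            });
  return order;
}
```

A generic pair type removes the need for a single-purpose struct such as `F`, at the cost of less descriptive member names (`first`/`second` here).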
------------- PR: https://git.openjdk.java.net/jdk/pull/4858 From sgehwolf at openjdk.java.net Thu Jul 22 15:42:01 2021 From: sgehwolf at openjdk.java.net (Severin Gehwolf) Date: Thu, 22 Jul 2021 15:42:01 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v5] In-Reply-To: <5U29-7MNu_hZ-zKgsFUKj656igVcq-0h3kq-EASNU_M=.2881b609-4712-49b7-8f4f-fb447b94c2eb@github.com> References: <5U29-7MNu_hZ-zKgsFUKj656igVcq-0h3kq-EASNU_M=.2881b609-4712-49b7-8f4f-fb447b94c2eb@github.com> Message-ID: On Thu, 22 Jul 2021 12:15:03 GMT, Matthias Baesken wrote: >> Matthias Baesken has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/master' into JDK-8266490 >> - Add hotspot tests >> - test and small adjustments suggested by Severin >> - Adjustments following Severins comments >> - JDK-8266490 > > Hi Severin, thanks for the comments. I added a commit with a number of adjustments > > src/hotspot/os/linux/cgroupSubsystem_linux.cpp > adjusted log_info to log_debug > > src/java.base/share/classes/sun/launcher/LauncherHelper.java > adjusted the output to "Maximum Processes Limit:" > > test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java > removed the getPidsMax related line (I think I inserted it while running some tests and forgot previously to remove it) > > test/hotspot/jtreg/containers/docker/TestPids.java > added testing of "Unlimited"; added --pids-limit=-1 for Unlimited procs like you suggested > > test/jdk/jdk/internal/platform/docker/TestPidsLimit.java > adjusted output; added --pids-limit=-1 for Unlimited procs like you suggested @MBaesken Thanks. We need a solution for https://github.com/openjdk/jdk/pull/4518#issuecomment-882637594 though. 
`--pids-limit=-1` doesn't seem to make it unlimited on all container runtimes. For example it fails for me here with: $ docker --version Docker version 20.10.6, build 370c289 ------------- PR: https://git.openjdk.java.net/jdk/pull/4518 From ascarpino at openjdk.java.net Thu Jul 22 17:22:25 2021 From: ascarpino at openjdk.java.net (Anthony Scarpino) Date: Thu, 22 Jul 2021 17:22:25 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Fri, 16 Jul 2021 00:31:43 GMT, Valerie Peng wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated AES-GCM intrinsic to match latest Java Code > > src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 650: > >> 648: int originalOutOfs = 0; >> 649: byte[] in; >> 650: byte[] out; > > The name "in", "out" are almost used in all calls, it's hard to tell when these two are actually used. Can we rename them to make them more unique? ok ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From ascarpino at openjdk.java.net Thu Jul 22 17:22:24 2021 From: ascarpino at openjdk.java.net (Anthony Scarpino) Date: Thu, 22 Jul 2021 17:22:24 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Tue, 20 Jul 2021 22:36:28 GMT, Valerie Peng wrote: >> Initializing op in abstract GCMEngine would mean another 'if(encryption)', when that would not be needed in the GCMEncrypt() or GCMDecrypt(). I don't see why that is clearer. >> >> GaloisCounterMode.implGCMCrypt(...) is the intrinsic, so I have to use what is used by hotspot. 
> > Seems strange to have GCMOperation op defined in GCMEngine but not initialized, nor used. The methods in GCMEngine which use op has an argument named op anyway. Either you just use the "op" field (remove the "op" argument) or the "op" argument (move the op field to GCMEncrypt/GCMDecrypt class). Having both looks confusing. Ok.. Moving it into GCMEncrypt makes sense. Now that I look at the code GCMDecrypt only uses it when passed to a method. GCMEncrypt uses it ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From ascarpino at openjdk.java.net Thu Jul 22 17:51:07 2021 From: ascarpino at openjdk.java.net (Anthony Scarpino) Date: Thu, 22 Jul 2021 17:51:07 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: <4PCHgWxJmLLQYJvoVAjaRvD97KUwd5z5wE3czxfiKg8=.c1a12cda-bb95-4597-9078-c17cd71edadb@github.com> References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> <4PCHgWxJmLLQYJvoVAjaRvD97KUwd5z5wE3czxfiKg8=.c1a12cda-bb95-4597-9078-c17cd71edadb@github.com> Message-ID: <0WMPRxRKCdPaYY8hsB9VqEXtk41YrUMPhWhxmgJGlno=.e811e3dd-ddf5-4f7f-b8e1-54b22a08d8a1@github.com> On Mon, 19 Jul 2021 19:22:53 GMT, Valerie Peng wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated AES-GCM intrinsic to match latest Java Code > > src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 729: > >> 727: >> 728: if (src.hasArray() && dst.hasArray()) { >> 729: int l = rlen; > > Remove this "l" variable since it's not used? 
The code needs some reorganizing ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From coleenp at openjdk.java.net Thu Jul 22 17:55:18 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 22 Jul 2021 17:55:18 GMT Subject: RFR: 8271063: Print injected fields for InstanceKlass [v4] In-Reply-To: References: Message-ID: On Thu, 22 Jul 2021 14:35:21 GMT, Coleen Phillimore wrote: >> I added code to print the injected fields in InstanceKlass. I also removed field offset sorting for do_nonstatic_fields because it's not necessary when used for purposes other than printing the fields. I also added a gtest. >> Tested with tier1 all Oracle platforms, tier2-3 on linux-x64-debug, windows-x64-debug. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Pair is much nicer. Thanks Harold! ------------- PR: https://git.openjdk.java.net/jdk/pull/4858 From ascarpino at openjdk.java.net Thu Jul 22 18:00:21 2021 From: ascarpino at openjdk.java.net (Anthony Scarpino) Date: Thu, 22 Jul 2021 18:00:21 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Mon, 19 Jul 2021 19:35:16 GMT, Valerie Peng wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated AES-GCM intrinsic to match latest Java Code > > src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 761: > >> 759: } >> 760: >> 761: dst.put(out, 0, rlen); > > This looks belong to the above if-block? I wonder how this have not affected the operation to fail. Perhaps the existing regression tests did not cover the 'rlen < blockSize' case. If the code in the above if-block is not run, this outsize dst.put(...) 
call would put extra output bytes into the output buffer. Yes... this one and the ct offset problem earlier I would have expected the regression test it pick the mistake. There should be tests that catch this.. I'm not sure what's up. ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From lmesnik at openjdk.java.net Thu Jul 22 18:21:17 2021 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Thu, 22 Jul 2021 18:21:17 GMT Subject: Integrated: 8270961: [TESTBUG] Move GotWrongOOMEException into vm.share.gc package In-Reply-To: <68ZribzSwQCScsVM69wcNS5HYkUN3nMIz5H6z05u7UI=.413e9cb3-cb70-4b66-ad67-cc7861999bfb@github.com> References: <68ZribzSwQCScsVM69wcNS5HYkUN3nMIz5H6z05u7UI=.413e9cb3-cb70-4b66-ad67-cc7861999bfb@github.com> Message-ID: <4-_OOzeOnoX7nRVsyrWlASrSiaap7sFZ1YdAY1ClJTc=.c43f1687-a43d-4345-bff6-87960f72fdd8@github.com> On Tue, 20 Jul 2021 19:26:31 GMT, Leonid Mesnik wrote: > The small refactoring which consolidates shared code in the shared library. Make project more IDE-friendly. This pull request has now been integrated. 
Changeset: 258f188b Author: Leonid Mesnik URL: https://git.openjdk.java.net/jdk/commit/258f188bff07b6c873128a181746afcf8053d936 Stats: 82 lines in 4 files changed: 37 ins; 40 del; 5 mod 8270961: [TESTBUG] Move GotWrongOOMEException into vm.share.gc package Reviewed-by: kbarrett, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/4844 From ascarpino at openjdk.java.net Thu Jul 22 18:34:11 2021 From: ascarpino at openjdk.java.net (Anthony Scarpino) Date: Thu, 22 Jul 2021 18:34:11 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Tue, 20 Jul 2021 01:35:04 GMT, Valerie Peng wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated AES-GCM intrinsic to match latest Java Code > > src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 762: > >> 760: >> 761: dst.put(out, 0, rlen); >> 762: processed += srcLen; > > It seems that callers of this implGCMCrypt() method such as GCMEngine.doLastBlock() adds the returned value to the "processed" field which looks like double counting? However, some caller such as GCMEncrypt.doUpdate() does not. Seems inconsistent and may lead to wrong value for the "processed" field? All the callers that use GCMOperations, ie op.update(...), have the processed value updated. implGCMCrypt() calls op.update() and updates the value. It cannot double count 'processed' is not updated after implGCMCrypt(). I can see your point, but the other methods do not have access to 'processed' and would mean I copy that line 3 times elsewhere. 
I'd rather keep it as is ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From ascarpino at openjdk.java.net Thu Jul 22 18:39:14 2021 From: ascarpino at openjdk.java.net (Anthony Scarpino) Date: Thu, 22 Jul 2021 18:39:14 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> <9DGzWlRgC8DaSEZFFeOzQJuRvopW8CISMLJwYQAUGTo=.1aa32797-386f-4101-a96d-6cbad78934f7@github.com> Message-ID: On Mon, 19 Jul 2021 23:41:49 GMT, Valerie Peng wrote: >> If decryption fails with a bad auth tag, the in is not overwritten because it's in-place. Encryption is not needed because there is nothing to check. I can add a comment. > > Hmm ok, so if it's not decryption in-place, then output buffer would still be zero'ed when the auth tag failed, but this is ok? This is able in-place, not about two separate buffers.. zeroing happens somewhere else for all decryption bad buffers ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From ascarpino at openjdk.java.net Thu Jul 22 19:18:08 2021 From: ascarpino at openjdk.java.net (Anthony Scarpino) Date: Thu, 22 Jul 2021 19:18:08 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: <3X43XyHHzWOWHvKNMoblGEQpvOBIB5cudVpXZl2yIH8=.4e382ef3-90c8-4749-8768-12470d98e9ab@github.com> References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> <3X43XyHHzWOWHvKNMoblGEQpvOBIB5cudVpXZl2yIH8=.4e382ef3-90c8-4749-8768-12470d98e9ab@github.com> Message-ID: On Fri, 16 Jul 2021 00:09:37 GMT, Valerie Peng wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated AES-GCM intrinsic to match latest Java Code > > 
src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 611: > >> 609: outOfs + len); >> 610: ghash.update(ct, ctOfs, segments); >> 611: ctOfs = len; > > This does not look right when the initial value of ctOfs != 0. Yeah that doesn't look right ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From valeriep at openjdk.java.net Thu Jul 22 22:44:07 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Thu, 22 Jul 2021 22:44:07 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> <9DGzWlRgC8DaSEZFFeOzQJuRvopW8CISMLJwYQAUGTo=.1aa32797-386f-4101-a96d-6cbad78934f7@github.com> Message-ID: On Thu, 22 Jul 2021 18:36:16 GMT, Anthony Scarpino wrote: >> Hmm ok, so if it's not decryption in-place, then output buffer would still be zero'ed when the auth tag failed, but this is ok? > > This is able in-place, not about two separate buffers.. zeroing happens somewhere else for all decryption bad buffers Yes, I know. Basically, we are trying to optimize performance by trying to write into the supplied buffers (out) as much as we can. But then when tag verification failed, the "written" bytes are erased w/ 0. Ideal case would be not to touch the output buffer until after the tag verification succeeds. Isn't this the previous approach? Verify the tag first and then write out the plain text afterwards. 
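
[Editorial sketch] The point above about tag verification can be seen from the caller's side with the public `javax.crypto` API. The following is a minimal sketch, not the `GaloisCounterMode` internals discussed in this review: the class and method names (`GcmTagCheckSketch`, `encrypt`, `decryptOrNull`) are hypothetical, and the all-zero key and IV are assumptions made only to keep the example short. It shows that `doFinal` on tampered input throws `AEADBadTagException`, so the `byte[]`-returning form never hands plaintext to the caller. It says nothing about what the implementation may have written to a caller-supplied output buffer before the check, which is the question raised in the thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration class; not part of the JDK or of the PR under review.
final class GcmTagCheckSketch {
    private static final int TAG_BITS = 128;

    private static Cipher newCipher(int mode, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
        return cipher;
    }

    // Returns ciphertext followed by the 16-byte authentication tag.
    static byte[] encrypt(byte[] key, byte[] iv, byte[] plaintext) {
        try {
            return newCipher(Cipher.ENCRYPT_MODE, key, iv).doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the plaintext, or null when the authentication tag does not verify.
    static byte[] decryptOrNull(byte[] key, byte[] iv, byte[] ciphertextAndTag) {
        try {
            return newCipher(Cipher.DECRYPT_MODE, key, iv).doFinal(ciphertextAndTag);
        } catch (AEADBadTagException e) {
            return null; // tampered input: no plaintext is returned to the caller
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // demo only: never use a fixed key
        byte[] iv = new byte[12];  // demo only: never reuse an IV with the same key
        byte[] message = "hello".getBytes(StandardCharsets.UTF_8);

        byte[] sealed = encrypt(key, iv, message);
        System.out.println(Arrays.equals(message, decryptOrNull(key, iv, sealed)));

        sealed[0] ^= 1; // flip one ciphertext bit
        System.out.println(decryptOrNull(key, iv, sealed) == null);
    }
}
```

An implementation that interleaves GHASH and GCTR has to produce plaintext bytes before the final tag comparison, so whichever design is chosen, the caller-visible contract in this sketch (an exception and no returned plaintext) is what needs to hold.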
------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From ascarpino at openjdk.java.net Thu Jul 22 22:55:09 2021 From: ascarpino at openjdk.java.net (Anthony Scarpino) Date: Thu, 22 Jul 2021 22:55:09 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> <9DGzWlRgC8DaSEZFFeOzQJuRvopW8CISMLJwYQAUGTo=.1aa32797-386f-4101-a96d-6cbad78934f7@github.com> Message-ID: On Thu, 22 Jul 2021 22:41:03 GMT, Valerie Peng wrote: >> This is able in-place, not about two separate buffers.. zeroing happens somewhere else for all decryption bad buffers > > Yes, I know. Basically, we are trying to optimize performance by trying to write into the supplied buffers (out) as much as we can. But then when tag verification failed, the "written" bytes are erased w/ 0. Ideal case would be not to touch the output buffer until after the tag verification succeeds. Isn't this the previous approach? Verify the tag first and then write out the plain text afterwards. With this new intrinsic doing both ghash and gctr at the same time, I cannot do the that ghash check first before the gctr op. I wish I could ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From hboehm at google.com Thu Jul 22 23:12:30 2021 From: hboehm at google.com (Hans Boehm) Date: Thu, 22 Jul 2021 16:12:30 -0700 Subject: JNI WeakGlobalRefs In-Reply-To: References: Message-ID: Thanks for the detailed response! More below ... On Thu, Jul 22, 2021 at 1:20 AM Erik Osterlund wrote: > Hi Hans, > > So you are saying that a jweak allows you to peek into the finalizer graph > of a > finalizing object from the outside due to having phantom semantics, and > that > is sometimes bad. I agree. As you noted, if you don't use finalizers, > there is no > problem here. 
> > But let's assume we do what you propose, and make jweak have weak > semantics, > instead of phantom semantics. First of all, have we now removed all > gimmicks > that allow you to peek through finalizers? I would say no. > > Consider the following (same notation as your example): > > F ---> ... ---> F > > Here we have an object with a finalizer that can reach another object with > a > finalizer. They can both get finalized by the same GC cycle. So depending > on who > is enqueued for finalization first, you may or may not be able to peek > into an > already finalized object here, from the outside, through another > finalizer. > One might argue about the probability of users using an object behind a > jweak > vs finalizer, without knowing their implementation. But it's bad news for > both. > Once again, if finalizers are not used, there is no problem. > Agreed. I think this is problematic as well, and we're clearly not going to fix that problem at this stage. But as I tried to argue too briefly, I think there kind of is a work-around in the finalizer ordering case. If the first F's finalizer keeps a strong reference (from some strongly reachable data structure) to everything the finalizer needs to run, and clears it in the finalizer. (IIRC, Guy Steele suggested this long ago, and I think the 2005 memory model work kind of, sort of, enabled this to be done correctly.) In the WeakGlobalRef case, I don't think there is an analogous workaround. So I think it's actually worse than what is already a very unpleasant problem. Secondly, as for when you would ever want jweak to have phantom semantics... > I don't think this is uncommon, in particular for native code. It is not > uncommon > to use JNI to implement a native mirror component for a Java object. The > Java > object and its native data structure are to be considered as one logical > unit that > stick together. 
The Java object has a pointer to its native part, and the > native part > wants a pointer back to its Java object, so they can talk to each other > seamlessly. > You don't want the back pointer to be strong because it would create > memory leaks, > so you use a jweak. The expectation is that surely, as long as this Java > object is > around, you can access it through the jweak. Especially since the spec > says so, it's > a very sound assumption. Then some cleaner will delete the data structure > when > it is no longer relevant. So in contexts where you are in the native data > structure > and haven't passed around the object reference, you can always get the > object > through the jweak as long as it isn't dead. > This sounds like a very reasonable intent, but I don't see how to implement it correctly with the current semantics, unless you control the implementation of the Java object all the way down to the bits, which seems rare. The core question is what it means for the Java object to "be around". When the Java object is eligible for finalization, it's, in general, no longer safe to access from the native mirror, in spite of the fact that the native mirror has no way to tell that it has entered that half-dead-and-potentially-invalid state. The only way to dodge the issue is to know that no part of the Java object is finalizable (or affected by WeakReference-based cleanup), which I think is usually unknowable. It's unclear to me what guarantees even the standard library provides in this respect. > > Now if we change the semantics of jweak to weak instead, then every time > the > Java object is reachable through a finalizer only, the logic will be > wrong. The native > part thinks the object is dead, but it isn't. So if anyone uses this > object through > a finalizer, bad things can happen. Kind of like a native object monitor > not working > properly in finalizers. 
You essentially no longer can have an object with > a native > component point at each other, without running into a bunch of issues. > Either > introducing a memory leak, or having the object be crippled when reachable > from > finalizers. For similar reasons, all native weak references used > internally in HotSpot, > do have phantom semantics, and rely on that being the case. > That's a good data point, and suggests that current implementations need to keep supporting the current semantics for internal use. But I think Java implementations themselves here are in a completely different situation from client code: I would hope that those HotSpot uses actually are aware of the referent implementation all the way down to the bits, and either know there no finalizers involved, or write the finalizer code to defend against accessing finalized objects. The client code, on the other hand, seems extremely likely to use referents (e.g. listeners for some event generated by native code) that can rely on arbitrary unknown Java objects. And I think the current semantics just don't work for such cases. But you may well be right that there is just enough client code using this correctly that we can't actually change the semantics. > So it's a two edged sword I think. What is true for this whole discussion > though is > that if we don't have finalizers, then there isn't really a problem. > Finalizers are > deprecated, so hopefully its use will die out over time. What is also true > is that > by changing the semantics of jweak, you trade one problem for another one. > This seems particularly nasty to change, as the spec clearly states what > users > can expect from this API. The failure probability indeed goes down dramatically if you get rid of finalizers. Based on what I've seen, that's unfortunately not likely to happen quickly. But writing demonstrably correct code seems to remain a problem even without finalizers. 
Based on the current spec, you can get a similar effect to a finalizer by enqueuing a WeakReference. In reality, it might be the case that in the absence of finalizers, GlobalWeakRefs are cleared when WeakReferences are enqueued, but I don't think the spec says that. The implementation is still allowed to enqueue WeakReferences, leave the object around for a while, and clear PhantomReferences and GlobalWeakRefs later, leaving a window during which WeakReferences have been enqueued and processed, but GlobalWeakRefs will continue to generate strong references to the object. So I think this problem will really only go away after finalizers die out, and the spec is updated to take advantage of the fact that finalizers are dead. Which I expect to take a long time.

If we can't change the semantics, do you agree that it makes sense to have the spec warn about the problem, and recommend a (strong) GlobalRef to a WeakReference for references to objects not controlled by the same developer?

Hans

> Hope this helps.
>
> /Erik
>
> > [ Moving here from core-libs-dev on David Holmes' recommendation. ]
> >
> > I'm concerned that the current semantics of JNI WeakGlobalRefs are still dangerous in a very subtle way that is hidden in the spec. The current (14+) spec says:
> >
> > "Weak global references are related to Java phantom references (java.lang.ref.PhantomReference). A weak global reference to a specific object is treated as a phantom reference referring to that object when determining whether the object is phantom reachable (see java.lang.ref). ---> Such a weak global reference will become functionally equivalent to NULL at the same time as a PhantomReference referring to that same object would be cleared by the garbage collector. <---"
> >
> > (This was the result of JDK-8220617, and is IMO a large improvement over the prior version, but ...)
> >
> > Consider what happens if I have a WeakGlobalRef W that refers to a Java object A which, possibly indirectly, relies on an object F, where F is finalizable, i.e.
> >
> > W - - -> A -----> ... -----> F
> >
> > Assume that F becomes invalid once it is finalized, e.g. because the finalizer deallocates a native object that F relies on. This seems to be a very common case. We are then exposed to the following scenario:
> >
> > 0) At some point, there are no longer any other references to A or F.
> > 1) F is enqueued for finalization.
> > 2) W is dereferenced by Thread 1, yielding a strong reference to A and transitively to F.
> > 3) F is finalized.
> > 4) Thread 1 uses A and F, accessing F, which is no longer valid.
> > 5) Crash, or possibly memory corruption followed by a later crash elsewhere.
> >
> > (3) and (4) actually race, so there is some synchronization effort and cost required to prevent F from corrupting memory. Commonly the implementer of W will have no idea that F even exists.
> >
> > I believe that typically there is no way to prevent this scenario, unless the developer adding W actually knows how every class that A could possibly rely on, including those in the Java standard library, are implemented.
> >
> > This is reminiscent of finalizer ordering issues. But it seems to be worse, in that there isn't even a semi-plausible workaround.
> >
> > I believe all of this is exactly the reason PhantomReference.get() always returns null, while WeakReference provides significantly different semantics, and WeakReferences are enqueued when an object is enqueued for finalization.
> >
> > The situation improves, but the problem doesn't fully disappear, in a hypothetical world without finalizers. It's still possible to use WeakGlobalRef to get a strong reference to A after a WeakReference to A has been cleared and enqueued.
> > I think the problem does go away if all cleanup code were to use PhantomReference-based Cleaners.
> >
> > AFAICT, backward-compatibility aside, the obvious solution here is to have WeakGlobalRefs behave like WeakReferences. My impression is that this would fix significantly more broken clients than it would break correct ones, so it is arguably still a viable option.
> >
> > There is a case in which the current semantics are actually the desired ones, namely when implementing, say, a String intern table. In this case it's important the reference not be cleared even if the referent is, at some point, only reachable via a finalizer. But this use case again relies on the programmer knowing that no part of the referent is invalidated by a finalizer. That's a reasonable assumption for the Java-implementation-provided String intern table. But I'm not sure it's reasonable for any user-written code.
> >
> > There seem to be two ways forward here:
> >
> > 1) Make WeakGlobalRefs behave like WeakReferences instead of PhantomReferences, or
> > 2) Add strong warnings to the spec that basically suggest using a strong GlobalRef to a WeakReference instead.
> >
> > Has there been prior discussion of this? Are there reasonable use cases for the current semantics? Is there something else that I'm overlooking? If not, what's the best way forward here?
> >
> > (I found some discussion from JDK-8220617, including a message I posted.
> > Unfortunately, it seems to me that all of us overlooked this issue?)
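
[Editorial sketch] The contrast drawn above between `PhantomReference.get()` and `WeakReference.get()` can be shown on the Java side without any JNI code. This is a minimal sketch under stated assumptions: the class and method names (`RefSemanticsSketch`, `weakGetReturnsLiveReferent`, `phantomGetIsAlwaysNull`) are hypothetical, and the example deliberately avoids depending on when the garbage collector runs. It does not reproduce the finalizer race in the numbered scenario; it only shows why a Java-level phantom reference cannot be used to regain a strong reference, whereas a JNI weak global reference with phantom semantics can.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical illustration class; the names are not from the JDK or the thread.
final class RefSemanticsSketch {

    // While the referent is strongly reachable, WeakReference.get() returns it,
    // so a weak reference can be turned back into a strong one.
    static boolean weakGetReturnsLiveReferent() {
        Object referent = new Object();
        WeakReference<Object> weak = new WeakReference<>(referent);
        boolean same = weak.get() == referent;
        Reference.reachabilityFence(referent); // keep the referent alive up to here
        return same;
    }

    // PhantomReference.get() returns null even while the referent is alive,
    // so Java code can never resurrect an object through a phantom reference.
    static boolean phantomGetIsAlwaysNull() {
        Object referent = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);
        boolean isNull = phantom.get() == null;
        Reference.reachabilityFence(referent);
        return isNull;
    }

    public static void main(String[] args) {
        System.out.println(weakGetReturnsLiveReferent());
        System.out.println(phantomGetIsAlwaysNull());
    }
}
```

Option 2 in the list above amounts to holding a strong JNI global reference to a `WeakReference` object like the one in `weakGetReturnsLiveReferent`, and calling `get()` on it from native code, so that the referent is cleared with weak rather than phantom timing.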
> > > > Hans > From jwilhelm at openjdk.java.net Fri Jul 23 00:38:36 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Fri, 23 Jul 2021 00:38:36 GMT Subject: RFR: Merge jdk17 Message-ID: Forwardport JDK 17 -> JDK 18 ------------- Commit messages: - Merge - 8271162: runtime/StackTrace/LargeClassTest.java can be run in driver mode - 8271158: runtime/handshake/HandshakeTimeoutTest.java test doesn't check exit code - 8271169: runtime/Safepoint/TestAbortVMOnSafepointTimeout.java can be run in driver mode - 8271160: runtime/jni/checked/TestCheckedJniExceptionCheck.java doesn't set -Djava.library.path - 8271155: Wrong path separator in env variable - 8270916: Update java.lang.annotation.Target for changes in JLS 9.6.4.1 - 8271094: runtime/duplAttributes/DuplAttributesTest.java doesn't check exit code - 8271093: remove deadcode from runtime/Thread/TestThreadDumpSMRInfo.java test - 8270085: Suspend during block transition may deadlock if lock held - ... and 2 more: https://git.openjdk.java.net/jdk/compare/a7d30123...b6b24fa0 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=4883&range=00.0 - jdk17: https://webrevs.openjdk.java.net/?repo=jdk&pr=4883&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/4883/files Stats: 268 lines in 20 files changed: 178 ins; 43 del; 47 mod Patch: https://git.openjdk.java.net/jdk/pull/4883.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4883/head:pull/4883 PR: https://git.openjdk.java.net/jdk/pull/4883 From jwilhelm at openjdk.java.net Fri Jul 23 01:44:14 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Fri, 23 Jul 2021 01:44:14 GMT Subject: Integrated: Merge jdk17 In-Reply-To: References: Message-ID: On Fri, 23 Jul 2021 00:28:53 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 17 -> JDK 18 This pull request has now been integrated. 
Changeset: 9935440e Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/9935440eded25b041ea3e73cfa8ac0d95bbd66c6 Stats: 268 lines in 20 files changed: 178 ins; 43 del; 47 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/4883 From yyang at openjdk.java.net Fri Jul 23 01:53:06 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Fri, 23 Jul 2021 01:53:06 GMT Subject: RFR: 8271063: Print injected fields for InstanceKlass [v4] In-Reply-To: References: Message-ID: <_bqq-eLJVAE3_CdkL4MdkEysqthOZxPdEHCiPbN4x-0=.0719886c-8a54-47c5-844c-ad9cb8ff2234@github.com> On Thu, 22 Jul 2021 14:35:21 GMT, Coleen Phillimore wrote: >> I added code to print the injected fields in InstanceKlass. I also removed field offset sorting for do_nonstatic_fields because it's not necessary when used for purposes other than printing the fields. I also added a gtest. >> Tested with tier1 all Oracle platforms, tier2-3 on linux-x64-debug, windows-x64-debug. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Pair is much nicer. Looks good, thanks for doing this. ------------- Marked as reviewed by yyang (Committer). PR: https://git.openjdk.java.net/jdk/pull/4858 From aw at openjdk.java.net Fri Jul 23 02:47:14 2021 From: aw at openjdk.java.net (Andreas Woess) Date: Fri, 23 Jul 2021 02:47:14 GMT Subject: RFR: 8271140: Fix native frame handling in vframeStream::asJavaVFrame() Message-ID: Follow-up to https://github.com/openjdk/jdk/pull/4625 ([JDK-8269592](https://bugs.openjdk.java.net/browse/JDK-8269592)) which added support for native frames to `vframeStreamCommon::asJavaVFrame()`. This change was not correct when `asJavaVFrame()` is called for a native frame that is the last frame on the stack (i.e. the first frame in the stream), in which case there's no `_prev_frame` yet. 
We don't actually need the extended frame information for native frames, so the fix is to just use the vframeStream's `_frame` and `_reg_map` for native frames. Extended the test from JDK-8269592 to cover this case. ------------- Commit messages: - 8271140: Fix native frame handling in vframeStream::asJavaVFrame() Changes: https://git.openjdk.java.net/jdk/pull/4872/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4872&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271140 Stats: 71 lines in 2 files changed: 34 ins; 22 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/4872.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4872/head:pull/4872 PR: https://git.openjdk.java.net/jdk/pull/4872 From dnsimon at openjdk.java.net Fri Jul 23 02:47:15 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 23 Jul 2021 02:47:15 GMT Subject: RFR: 8271140: Fix native frame handling in vframeStream::asJavaVFrame() In-Reply-To: References: Message-ID: On Thu, 22 Jul 2021 13:07:46 GMT, Andreas Woess wrote: > Follow-up to https://github.com/openjdk/jdk/pull/4625 ([JDK-8269592](https://bugs.openjdk.java.net/browse/JDK-8269592)) which added support for native frames to `vframeStreamCommon::asJavaVFrame()`. This change was not correct when `asJavaVFrame()` is called for a native frame that is the last frame on the stack (i.e. the first frame in the stream), in which case there's no `_prev_frame` yet. We don't actually need the extended frame information for native frames, so the fix is to just use the vframeStream's `_frame` and `_reg_map` for native frames. > Extended the test from JDK-8269592 to cover this case. test/hotspot/jtreg/compiler/jvmci/compilerToVM/IterateFramesNative.java line 47: > 45: * -XX:+DoEscapeAnalysis -XX:-UseCounterDecay > 46: * compiler.jvmci.compilerToVM.IterateFramesNative > 47: * @run main/othervm -Xbatch -Xcomp -Xbootclasspath/a:. I think `-Xcomp` implies `-Xbatch`. 
I think you should add `-XX:CompileOnly=jdk.vm.ci.hotspot.CompilerToVM::iterateFrames` since you only really care about that method being compiled for sake of the test. test/hotspot/jtreg/compiler/jvmci/compilerToVM/IterateFramesNative.java line 50: > 48: * -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI > 49: * -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI > 50: * -XX:+DoEscapeAnalysis -XX:-UseCounterDecay Are the `-XX:+DoEscapeAnalysis -XX:-UseCounterDecay` options really needed? test/hotspot/jtreg/compiler/jvmci/compilerToVM/IterateFramesNative.java line 90: > 88: ? CompilerWhiteBoxTest.THRESHOLD > 89: : CompilerWhiteBoxTest.THRESHOLD * 2; > 90: `COMPILE_THRESHOLD` looks unused. ------------- PR: https://git.openjdk.java.net/jdk/pull/4872 From aw at openjdk.java.net Fri Jul 23 02:47:16 2021 From: aw at openjdk.java.net (Andreas Woess) Date: Fri, 23 Jul 2021 02:47:16 GMT Subject: RFR: 8271140: Fix native frame handling in vframeStream::asJavaVFrame() In-Reply-To: References: Message-ID: On Thu, 22 Jul 2021 22:17:21 GMT, Doug Simon wrote: >> Follow-up to https://github.com/openjdk/jdk/pull/4625 ([JDK-8269592](https://bugs.openjdk.java.net/browse/JDK-8269592)) which added support for native frames to `vframeStreamCommon::asJavaVFrame()`. This change was not correct when `asJavaVFrame()` is called for a native frame that is the last frame on the stack (i.e. the first frame in the stream), in which case there's no `_prev_frame` yet. We don't actually need the extended frame information for native frames, so the fix is to just use the vframeStream's `_frame` and `_reg_map` for native frames. >> Extended the test from JDK-8269592 to cover this case. > > test/hotspot/jtreg/compiler/jvmci/compilerToVM/IterateFramesNative.java line 47: > >> 45: * -XX:+DoEscapeAnalysis -XX:-UseCounterDecay >> 46: * compiler.jvmci.compilerToVM.IterateFramesNative >> 47: * @run main/othervm -Xbatch -Xcomp -Xbootclasspath/a:. > > I think `-Xcomp` implies `-Xbatch`. 
> I think you should add `-XX:CompileOnly=jdk.vm.ci.hotspot.CompilerToVM::iterateFrames` since you only really care about that method being compiled for sake of the test. Hm, actually, I don't really need to use `-Xcomp` since the test should already trigger compilation anyway. It was just a way to ensure it's really compiled. I'll replace it with an assert in the test that the method is compiled. > test/hotspot/jtreg/compiler/jvmci/compilerToVM/IterateFramesNative.java line 50: > >> 48: * -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI >> 49: * -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI >> 50: * -XX:+DoEscapeAnalysis -XX:-UseCounterDecay > > Are the `-XX:+DoEscapeAnalysis -XX:-UseCounterDecay` options really needed? Probably copied them from somewhere, I think they're irrelevant and will remove them. ------------- PR: https://git.openjdk.java.net/jdk/pull/4872 From mbaesken at openjdk.java.net Fri Jul 23 06:52:06 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Fri, 23 Jul 2021 06:52:06 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v6] In-Reply-To: References: Message-ID: <80NDhQE20WOO7LMCDS9C9zYQIRy-YKqNiGgrPQAPI64=.ef6e55d9-8995-4669-9c6f-e10a61bd427f@github.com> On Thu, 22 Jul 2021 12:18:20 GMT, Matthias Baesken wrote: >> Hello, please review this PR; it extend the OSContainer API in order to also support the pids controller of cgroups. 
>>
>> I noticed that, unlike the other controllers ("cpu", "cpuset", "cpuacct", "memory"), the pids controller might not be there (or not fully supported) on some older Linux distros (SLES 12.1, RHEL 7.1), so it was added as optional; see the code:
>>
>>
>> if (!cg_infos[PIDS_IDX]._data_complete) {
>>   log_debug(os, container)("Optional cgroup v1 pids subsystem not found");
>>   // keep the other controller info, pids is optional
>> }
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
>
>   Minor adjustments, handling of Unlimited

> @MBaesken Thanks. We need a solution for [#4518 (comment)](https://github.com/openjdk/jdk/pull/4518#issuecomment-882637594) though. `--pids-limit=-1` doesn't seem to make it unlimited on all container runtimes. For example it fails for me here with:
>
> ```
> $ docker --version
> Docker version 20.10.6, build 370c289
> ```

Hi Severin, that's a pity and looks like a bug, because the Docker documentation (https://docs.docker.com/engine/reference/commandline/run/) says:

    --pids-limit    Tune container pids limit (set -1 for unlimited)

Do you have an idea what to set with Docker 20 on your setup? I did not find much about this in the Docker 20 release notes (https://docs.docker.com/engine/release-notes/).

-------------

PR: https://git.openjdk.java.net/jdk/pull/4518

From dnsimon at openjdk.java.net  Fri Jul 23 08:39:02 2021
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Fri, 23 Jul 2021 08:39:02 GMT
Subject: RFR: 8271140: Fix native frame handling in vframeStream::asJavaVFrame()
In-Reply-To:
References:
Message-ID:

On Thu, 22 Jul 2021 13:07:46 GMT, Andreas Woess wrote:

> Follow-up to https://github.com/openjdk/jdk/pull/4625 ([JDK-8269592](https://bugs.openjdk.java.net/browse/JDK-8269592)) which added support for native frames to `vframeStreamCommon::asJavaVFrame()`.
This change was not correct when `asJavaVFrame()` is called for a native frame that is the last frame on the stack (i.e. the first frame in the stream), in which case there's no `_prev_frame` yet. We don't actually need the extended frame information for native frames, so the fix is to just use the vframeStream's `_frame` and `_reg_map` for native frames. > Extended the test from JDK-8269592 to cover this case. Marked as reviewed by dnsimon (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4872 From sgehwolf at openjdk.java.net Fri Jul 23 08:43:04 2021 From: sgehwolf at openjdk.java.net (Severin Gehwolf) Date: Fri, 23 Jul 2021 08:43:04 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v6] In-Reply-To: <80NDhQE20WOO7LMCDS9C9zYQIRy-YKqNiGgrPQAPI64=.ef6e55d9-8995-4669-9c6f-e10a61bd427f@github.com> References: <80NDhQE20WOO7LMCDS9C9zYQIRy-YKqNiGgrPQAPI64=.ef6e55d9-8995-4669-9c6f-e10a61bd427f@github.com> Message-ID: On Fri, 23 Jul 2021 06:49:15 GMT, Matthias Baesken wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor adjustments, handling of Unlimited > >> @MBaesken Thanks. We need a solution for [#4518 (comment)](https://github.com/openjdk/jdk/pull/4518#issuecomment-882637594) though. `--pids-limit=-1` doesn't seem to make it unlimited on all container runtimes. For example it fails for me here with: >> >> ``` >> $ docker --version >> Docker version 20.10.6, build 370c289 >> ``` > > Hi Severin, that's a pity and looks like a bug, because the docker documentation says > https://docs.docker.com/engine/reference/commandline/run/ > > > > > > --pids-limit | ? | Tune container pids limit (set -1 for unlimited) > -- | -- | -- > > > > > > > Do you have an idea what to set with docker 20 on your setup? I did not find much about this in the docker 20 release notes https://docs.docker.com/engine/release-notes/ . 
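
[Editorial sketch] For readers following the pids-limit discussion: the kernel's `pids.max` control file holds either the literal string `max` (no limit) or a decimal number. The following is a minimal sketch of interpreting that file's contents, assuming the text has already been read; the class name `PidsLimitSketch`, the method `parsePidsMax`, and the use of `-1` for "unlimited" are hypothetical choices for illustration and are not taken from the HotSpot OSContainer code, which is written in C++.

```java
// Hypothetical illustration class; not the HotSpot OSContainer implementation.
final class PidsLimitSketch {
    // Sentinel chosen for this sketch to mean "no limit configured".
    static final long UNLIMITED = -1L;

    // Interprets the text of a cgroup pids.max file:
    // "max" means unlimited, anything else is a decimal limit.
    static long parsePidsMax(String fileContent) {
        String value = fileContent.trim();
        if (value.equals("max")) {
            return UNLIMITED;
        }
        return Long.parseLong(value);
    }

    public static void main(String[] args) {
        System.out.println(parsePidsMax("max\n"));
        System.out.println(parsePidsMax("38019\n"));
    }
}
```

With a parser of this shape, a runtime that silently substitutes its own default for `--pids-limit=-1` shows up as a concrete number rather than as unlimited, which is the behaviour being reported in this thread.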
> > @MBaesken Thanks. We need a solution for [#4518 (comment)](https://github.com/openjdk/jdk/pull/4518#issuecomment-882637594) though. `--pids-limit=-1` doesn't seem to make it unlimited on all container runtimes. For example it fails for me here with: > > ``` > > $ docker --version > > Docker version 20.10.6, build 370c289 > > ``` > > Hi Severin, that's a pity and looks like a bug, because the docker documentation says > https://docs.docker.com/engine/reference/commandline/run/ > --pids-limit Tune container pids limit (set -1 for unlimited) > > Do you have an idea what to set with docker 20 on your setup? I did not find much about this in the docker 20 release notes https://docs.docker.com/engine/release-notes/ . No, I don't know what to do about it. All I can see it comes back with a pids limit of `38019` when set to `-1`. It does seem like a bug or an intentional setting so as to avoid fork bombs. ------------- PR: https://git.openjdk.java.net/jdk/pull/4518 From erik.osterlund at oracle.com Fri Jul 23 10:31:12 2021 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Fri, 23 Jul 2021 10:31:12 +0000 Subject: [External] : Re: JNI WeakGlobalRefs In-Reply-To: References: , Message-ID: <7C309890-C960-4DE9-90BE-56ED2992D9CE@oracle.com> Hi Hans, On 23 Jul 2021, at 01:12, Hans Boehm wrote: ? Thanks for the detailed response! More below ... On Thu, Jul 22, 2021 at 1:20 AM Erik Osterlund > wrote: Hi Hans, So you are saying that a jweak allows you to peek into the finalizer graph of a finalizing object from the outside due to having phantom semantics, and that is sometimes bad. I agree. As you noted, if you don't use finalizers, there is no problem here. But let's assume we do what you propose, and make jweak have weak semantics, instead of phantom semantics. First of all, have we now removed all gimmicks that allow you to peek through finalizers? I would say no. Consider the following (same notation as your example): F ---> ... 
---> F Here we have an object with a finalizer that can reach another object with a finalizer. They can both get finalized by the same GC cycle. So depending on who is enqueued for finalization first, you may or may not be able to peek into an already finalized object here, from the outside, through another finalizer. One might argue about the probability of users using an object behind a jweak vs finalizer, without knowing their implementation. But it's bad news for both. Once again, if finalizers are not used, there is no problem. Agreed. I think this is problematic as well, and we're clearly not going to fix that problem at this stage. But as I tried to argue too briefly, I think there kind of is a work-around in the finalizer ordering case. If the first F's finalizer keeps a strong reference (from some strongly reachable data structure) to everything the finalizer needs to run, and clears it in the finalizer. (IIRC, Guy Steele suggested this long ago, and I think the 2005 memory model work kind of, sort of, enabled this to be done correctly.) In the WeakGlobalRef case, I don't think there is an analogous workaround. So I think it's actually worse than what is already a very unpleasant problem. I think you are right. But if we flipped the coin and bought the problem of native components being unable to access their Java mirror any longer, if they are finalizably reachable, which they can't know, there wouldn't really be any workaround for that either, AFAICT. So either way you end up with problems hard to work around, except by removing finalizers of course. Which by all means is what we want people to do. Secondly, as for when you would ever want jweak to have phantom semantics... I don't think this is uncommon, in particular for native code. It is not uncommon to use JNI to implement a native mirror component for a Java object. The Java object and its native data structure are to be considered as one logical unit that stick together.
The Java object has a pointer to its native part, and the native part wants a pointer back to its Java object, so they can talk to each other seamlessly. You don't want the back pointer to be strong because it would create memory leaks, so you use a jweak. The expectation is that surely, as long as this Java object is around, you can access it through the jweak. Especially since the spec says so, it's a very sound assumption. Then some cleaner will delete the data structure when it is no longer relevant. So in contexts where you are in the native data structure and haven't passed around the object reference, you can always get the object through the jweak as long as it isn't dead. This sounds like a very reasonable intent, but I don't see how to implement it correctly with the current semantics, unless you control the implementation of the Java object all the way down to the bits, which seems rare. The core question is what it means for the Java object to "be around". When the Java object is eligible for finalization, it's, in general, no longer safe to access from the native mirror, in spite of the fact that the native mirror has no way to tell that it has entered that half-dead-and-potentially-invalid state. The only way to dodge the issue is to know that no part of the Java object is finalizable (or affected by WeakReference-based cleanup), which I think is usually unknowable. It's unclear to me what guarantees even the standard library provides in this respect. I disagree. I don't think you need to know the object isn't finalizable. You just need to know your native mirror object doesn't have a finalizer which breaks the object. And that seems easy to know. Just follow the guidelines and don't write a finalizer for that object, and objects you use in your native code. Instead use a phantom based cleaner. If other objects have finalizers that can reach said native mirrored objects, doesn't really matter.
Because said finalizers are not allowed to break the object. So you only need to reason about your own class to know your native mirrored component works correctly. Now if we change the semantics of jweak to weak instead, then every time the Java object is reachable through a finalizer only, the logic will be wrong. The native part thinks the object is dead, but it isn't. So if anyone uses this object through a finalizer, bad things can happen. Kind of like a native object monitor not working properly in finalizers. You essentially no longer can have an object with a native component point at each other, without running into a bunch of issues. Either introducing a memory leak, or having the object be crippled when reachable from finalizers. For similar reasons, all native weak references used internally in HotSpot do have phantom semantics, and rely on that being the case. That's a good data point, and suggests that current implementations need to keep supporting the current semantics for internal use. But I think Java implementations themselves here are in a completely different situation from client code: I would hope that those HotSpot uses actually are aware of the referent implementation all the way down to the bits, and either know there are no finalizers involved, or write the finalizer code to defend against accessing finalized objects. The client code, on the other hand, seems extremely likely to use referents (e.g. listeners for some event generated by native code) that can rely on arbitrary unknown Java objects. And I think the current semantics just don't work for such cases. Similarly to my comment above, we just don't need to know about what libraries use finalizers. We know our native components are implemented right with corresponding phantom-based cleanups and don't get broken by other finalizers. But you may well be right that there is just enough client code using this correctly that we can't actually change the semantics. I think so.
Especially if we only swap problems around. If we could make everyone happy, it might be a different story. So it's a two-edged sword I think. What is true for this whole discussion though is that if we don't have finalizers, then there isn't really a problem. Finalizers are deprecated, so hopefully their use will die out over time. What is also true is that by changing the semantics of jweak, you trade one problem for another one. This seems particularly nasty to change, as the spec clearly states what users can expect from this API. The failure probability indeed goes down dramatically if you get rid of finalizers. Based on what I've seen, that's unfortunately not likely to happen quickly. I can imagine the transition will be slow indeed. But writing demonstrably correct code seems to remain a problem even without finalizers. Based on the current spec, you can get a similar effect to a finalizer by enqueuing a WeakReference. In reality, it might be the case that in the absence of finalizers, GlobalWeakRefs are cleared when WeakReferences are enqueued, but I don't think the spec says that. The implementation is still allowed to enqueue WeakReferences, leave the object around for a while, and clear PhantomReferences and GlobalWeakRefs later, leaving a window during which WeakReferences have been enqueued and processed, but GlobalWeakRefs will continue to generate strong references to the object. Without finalizers, weak and phantom will be equivalent. Therefore jweak and WeakReference will agree about whether the object is cleared or not. So after the WeakReference cleanup has run, jweaks are guaranteed to resolve to null. So I don't see the problem you are referring to, in a finalizer-absent world. So I think this problem will really only go away after finalizers die out, and the spec is updated to take advantage of the fact that finalizers are dead. Which I expect to take a long time. It sounds like we agree that without finalizers everyone will be happy.
It's just getting there that could take a while. If we can't change the semantics, do you agree that it makes sense to have the spec warn about the problem, and recommend a (strong) GlobalRef to a WeakReference for references to objects not controlled by the same developer? Writing down a note that you should probably think about this, and possibly recommending removal of finalizers, seems fine to me. /Erik Hans Hope this helps. /Erik > [ Moving here from core-libs-dev on David Holmes' recommendation. ] > > I'm concerned that the current semantics of JNI WeakGlobalRefs are still > dangerous in a very subtle way that is hidden in the spec. The current > (14+) spec says: > > "Weak global references are related to Java phantom references > (java.lang.ref.PhantomReference). A weak global reference to a specific > object is treated as a phantom reference referring to that object when > determining whether the object is phantom reachable (see java.lang.ref). > ---> Such a weak global reference will become functionally equivalent to > NULL at the same time as a PhantomReference referring to that same object > would be cleared by the garbage collector. <---" > > (This was the result of JDK-8220617, and is IMO a large improvement over the > prior version, but ...) > > Consider what happens if I have a WeakGlobalRef W that refers to a Java > object A which, possibly indirectly, relies on an object F, where F is > finalizable, i.e. > > W - - -> A -----> ... -----> F > > Assume that F becomes invalid once it is finalized, e.g. because the finalizer > deallocates a native object that F relies on. This seems to be a very common > case. We are then exposed to the following scenario: > > 0) At some point, there are no longer any other references to A or F. > 1) F is enqueued for finalization. > 2) W is dereferenced by Thread 1, yielding a strong reference to A and > transitively to F. > 3) F is finalized. > 4) Thread 1 uses A and F, accessing F, which is no longer valid.
> 5) Crash, or possibly memory corruption followed by a later crash elsewhere. > > (3) and (4) actually race, so there is some synchronization effort and cost > required to prevent F from corrupting memory. Commonly the implementer > of W will have no idea that F even exists. > > I believe that typically there is no way to prevent this scenario, unless the > developer adding W actually knows how every class that A could possibly rely > on, including those in the Java standard library, are implemented. > > This is reminiscent of finalizer ordering issues. But it seems to be worse, in > that there isn't even a semi-plausible workaround. > > I believe all of this is exactly the reason PhantomReference.get() always > returns null, while WeakReference provides significantly different semantics, > and WeakReferences are enqueued when an object is enqueued for > finalization. > > The situation improves, but the problem doesn't fully disappear, in a > hypothetical world without finalizers. It's still possible to use WeakGlobalRef > to get a strong reference to A after a WeakReference to A has been cleared > and enqueued. I think the problem does go away if all cleanup code were to > use PhantomReference-based Cleaners. > > AFAICT, backward-compatibility aside, the obvious solution here is to have > WeakGlobalRefs behave like WeakReferences. My impression is that this > would fix significantly more broken clients than it would break correct ones, > so it is arguably still a viable option. > > There is a case in which the current semantics are actually the desired ones, > namely when implementing, say, a String intern table. In this case it's > important the reference not be cleared even if the referent is, at some > point, only reachable via a finalizer. But this use case again relies on the > programmer knowing that no part of the referent is invalidated by a finalizer. > That's a reasonable assumption for the Java-implementation-provided String > intern table. 
But I'm not sure it's reasonable for any user-written code. > > There seem to be two ways forward here: > > 1) Make WeakGlobalRefs behave like WeakReferences instead of > PhantomReferences, or > 2) Add strong warnings to the spec that basically suggest using a strong > GlobalRef to a WeakReference instead. > > Has there been prior discussion of this? Are there reasonable use cases for > the current semantics? Is there something else that I'm overlooking? If not, > what's the best way forward here? > > (I found some discussion from JDK-8220617, including a message I posted. > Unfortunately, it seems to me that all of us overlooked this issue?) > > Hans From rkennke at openjdk.java.net Fri Jul 23 10:45:03 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 23 Jul 2021 10:45:03 GMT Subject: Integrated: 8270894: Use acquire semantics in ObjectSynchronizer::read_stable_mark() In-Reply-To: References: Message-ID: On Mon, 19 Jul 2021 19:13:59 GMT, Roman Kennke wrote: > Currently, the object header is read using plain loads in read_stable_mark() (synchronizer.cpp). The matching stores use release semantics in corresponding CAS and release_store(). It seems reasonable to use acquire-semantics for the loads of the object header. > > See also discussion here: > https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2021-July/050132.html > > I propose to use MO_ACQUIRE when reading the object header in read_stable_mark() and some related loads of the header. As discussed in the thread, current_thread_holds_lock() is the only place where we could do without acquire, but it doesn't seem worth to introduce extra complexity just to make this access relaxed, because it does not seem to be used in any place that looks very performance sensitive. > > If it were me, I'd probably also change the other mark() calls to MO_ACQUIRE for consistency, but that might be overkill. > > Testing: > - [x] tier1 > - [x] tier2 This pull request has now been integrated. 
Changeset: f2261903 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/f22619032df2cf45664f110c71ddf509a5128900 Stats: 11 lines in 3 files changed: 6 ins; 0 del; 5 mod 8270894: Use acquire semantics in ObjectSynchronizer::read_stable_mark() Reviewed-by: dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/4829 From mbaesken at openjdk.java.net Fri Jul 23 11:05:03 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Fri, 23 Jul 2021 11:05:03 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v6] In-Reply-To: References: <80NDhQE20WOO7LMCDS9C9zYQIRy-YKqNiGgrPQAPI64=.ef6e55d9-8995-4669-9c6f-e10a61bd427f@github.com> Message-ID: On Fri, 23 Jul 2021 08:39:37 GMT, Severin Gehwolf wrote: > > No, I don't know what to do about it. All I can see it comes back with a pids limit of `38019` when set to `-1`. It does seem like a bug or an intentional setting so as to avoid fork bombs. Very strange indeed, I have docker 20.10.2 on my Ubuntu test server and there the tests work and no "38019" is coming back for -1/unlimited . What distro are you using? I did a quick search but did not find much about the mysterious `38019` . Could this be some setting configured on your box that is picked up as an additional limit ? ------------- PR: https://git.openjdk.java.net/jdk/pull/4518 From dholmes at openjdk.java.net Fri Jul 23 11:50:04 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 23 Jul 2021 11:50:04 GMT Subject: RFR: 8270894: Use acquire semantics in ObjectSynchronizer::read_stable_mark() In-Reply-To: References: Message-ID: On Mon, 19 Jul 2021 19:13:59 GMT, Roman Kennke wrote: > Currently, the object header is read using plain loads in read_stable_mark() (synchronizer.cpp). The matching stores use release semantics in corresponding CAS and release_store(). It seems reasonable to use acquire-semantics for the loads of the object header. 
> > See also discussion here: > https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2021-July/050132.html > > I propose to use MO_ACQUIRE when reading the object header in read_stable_mark() and some related loads of the header. As discussed in the thread, current_thread_holds_lock() is the only place where we could do without acquire, but it doesn't seem worth to introduce extra complexity just to make this access relaxed, because it does not seem to be used in any place that looks very performance sensitive. > > If it were me, I'd probably also change the other mark() calls to MO_ACQUIRE for consistency, but that might be overkill. > > Testing: > - [x] tier1 > - [x] tier2 @rkennke two reviews are required for hotspot changes! ------------- PR: https://git.openjdk.java.net/jdk/pull/4829 From rkennke at openjdk.java.net Fri Jul 23 11:54:04 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 23 Jul 2021 11:54:04 GMT Subject: RFR: 8270894: Use acquire semantics in ObjectSynchronizer::read_stable_mark() In-Reply-To: References: Message-ID: On Fri, 23 Jul 2021 11:47:12 GMT, David Holmes wrote: > @rkennke two reviews are required for hotspot changes! Oh shit. I forgot, sorry! Do you want me to back it out? ------------- PR: https://git.openjdk.java.net/jdk/pull/4829 From adinn at redhat.com Fri Jul 23 12:22:33 2021 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 23 Jul 2021 13:22:33 +0100 Subject: RFR: 8270894: Use acquire semantics in ObjectSynchronizer::read_stable_mark() In-Reply-To: References: Message-ID: On 23/07/2021 12:54, Roman Kennke wrote: > On Fri, 23 Jul 2021 11:47:12 GMT, David Holmes wrote: > >> @rkennke two reviews are required for hotspot changes! > > Oh shit. I forgot, sorry! Do you want me to back it out? I have been reading this thread and I am happy to provide a second review if that will help. 
regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From coleenp at openjdk.java.net Fri Jul 23 12:26:09 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 23 Jul 2021 12:26:09 GMT Subject: RFR: 8271063: Print injected fields for InstanceKlass [v4] In-Reply-To: References: Message-ID: On Thu, 22 Jul 2021 14:35:21 GMT, Coleen Phillimore wrote: >> I added code to print the injected fields in InstanceKlass. I also removed field offset sorting for do_nonstatic_fields because it's not necessary when used for purposes other than printing the fields. I also added a gtest. >> Tested with tier1 all Oracle platforms, tier2-3 on linux-x64-debug, windows-x64-debug. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Pair is much nicer. Thanks for the review and suggestion. ------------- PR: https://git.openjdk.java.net/jdk/pull/4858 From coleenp at openjdk.java.net Fri Jul 23 12:26:09 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 23 Jul 2021 12:26:09 GMT Subject: Integrated: 8271063: Print injected fields for InstanceKlass In-Reply-To: References: Message-ID: <4cAbo6_rr55iREUqFAiGYdUD-FHXC3Iyjq0FfoKHJpc=.17d04e78-fe05-401b-83bd-3048d72810e5@github.com> On Wed, 21 Jul 2021 13:47:04 GMT, Coleen Phillimore wrote: > I added code to print the injected fields in InstanceKlass. I also removed field offset sorting for do_nonstatic_fields because it's not necessary when used for purposes other than printing the fields. I also added a gtest. > Tested with tier1 all Oracle platforms, tier2-3 on linux-x64-debug, windows-x64-debug. This pull request has now been integrated. 
Changeset: 9b27df6a Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/9b27df6a4f0e5cdc7765144d6bcbc95700bdb6a3 Stats: 88 lines in 4 files changed: 53 ins; 25 del; 10 mod 8271063: Print injected fields for InstanceKlass Reviewed-by: fparain, hseigel, yyang ------------- PR: https://git.openjdk.java.net/jdk/pull/4858 From sgehwolf at openjdk.java.net Fri Jul 23 12:32:03 2021 From: sgehwolf at openjdk.java.net (Severin Gehwolf) Date: Fri, 23 Jul 2021 12:32:03 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v6] In-Reply-To: References: <80NDhQE20WOO7LMCDS9C9zYQIRy-YKqNiGgrPQAPI64=.ef6e55d9-8995-4669-9c6f-e10a61bd427f@github.com> Message-ID: On Fri, 23 Jul 2021 11:02:13 GMT, Matthias Baesken wrote: > > No, I don't know what to do about it. All I can see it comes back with a pids limit of `38019` when set to `-1`. It does seem like a bug or an intentional setting so as to avoid fork bombs. > > Very strange indeed, I have docker 20.10.2 on my Ubuntu test server and there the tests work and no "38019" is coming back for -1/unlimited . What distro are you using? I did a quick search but did not find much about the mysterious `38019` . Could this be some setting configured on your box that is picked up as an additional limit ? 
I'm on Fedora 34 and have the moby distro build of docker: https://koji.fedoraproject.org/koji/buildinfo?buildID=1781164 ------------- PR: https://git.openjdk.java.net/jdk/pull/4518 From sgehwolf at openjdk.java.net Fri Jul 23 12:35:11 2021 From: sgehwolf at openjdk.java.net (Severin Gehwolf) Date: Fri, 23 Jul 2021 12:35:11 GMT Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v6] In-Reply-To: References: <80NDhQE20WOO7LMCDS9C9zYQIRy-YKqNiGgrPQAPI64=.ef6e55d9-8995-4669-9c6f-e10a61bd427f@github.com> Message-ID: <_CXJ5Lcpd7-PqzRzGAtEE4NyZzAGirYGSgVT7KbPyFw=.f2ce7164-7d28-4b6e-9a79-9417054e0113@github.com> On Fri, 23 Jul 2021 12:28:47 GMT, Severin Gehwolf wrote: > Could this be some setting configured on your box that is picked up as an additional limit ? Possibly. Not sure where to look, though. ------------- PR: https://git.openjdk.java.net/jdk/pull/4518 From dholmes at openjdk.java.net Fri Jul 23 13:32:07 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 23 Jul 2021 13:32:07 GMT Subject: RFR: 8270894: Use acquire semantics in ObjectSynchronizer::read_stable_mark() In-Reply-To: References: Message-ID: On Mon, 19 Jul 2021 19:13:59 GMT, Roman Kennke wrote: > Currently, the object header is read using plain loads in read_stable_mark() (synchronizer.cpp). The matching stores use release semantics in corresponding CAS and release_store(). It seems reasonable to use acquire-semantics for the loads of the object header. > > See also discussion here: > https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2021-July/050132.html > > I propose to use MO_ACQUIRE when reading the object header in read_stable_mark() and some related loads of the header. 
As discussed in the thread, current_thread_holds_lock() is the only place where we could do without acquire, but it doesn't seem worth to introduce extra complexity just to make this access relaxed, because it does not seem to be used in any place that looks very performance sensitive. > > If it were me, I'd probably also change the other mark() calls to MO_ACQUIRE for consistency, but that might be overkill. > > Testing: > - [x] tier1 > - [x] tier2 Perhaps solicit a second review from someone. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/4829 From adinn at openjdk.java.net Fri Jul 23 14:37:10 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Fri, 23 Jul 2021 14:37:10 GMT Subject: RFR: 8270894: Use acquire semantics in ObjectSynchronizer::read_stable_mark() In-Reply-To: References: Message-ID: On Mon, 19 Jul 2021 19:13:59 GMT, Roman Kennke wrote: > Currently, the object header is read using plain loads in read_stable_mark() (synchronizer.cpp). The matching stores use release semantics in corresponding CAS and release_store(). It seems reasonable to use acquire-semantics for the loads of the object header. > > See also discussion here: > https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2021-July/050132.html > > I propose to use MO_ACQUIRE when reading the object header in read_stable_mark() and some related loads of the header. As discussed in the thread, current_thread_holds_lock() is the only place where we could do without acquire, but it doesn't seem worth to introduce extra complexity just to make this access relaxed, because it does not seem to be used in any place that looks very performance sensitive. > > If it were me, I'd probably also change the other mark() calls to MO_ACQUIRE for consistency, but that might be overkill. 
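As a rough illustration of the pairing discussed in this description (release stores matched by acquire loads), here is a minimal Java sketch. It is not the HotSpot code: `AcquireReleaseSketch`, `header`, and `payload` are hypothetical names, and an `AtomicInteger` merely stands in for the object header word. It assumes Java 9 or later for `setRelease`/`getAcquire`.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class AcquireReleaseSketch {
    // Hypothetical stand-in for a header word; 0 means "not yet stable".
    static final AtomicInteger header = new AtomicInteger(0);
    // Plain (non-volatile) data that is published through the header.
    static int payload = 0;

    // Writer: fill in the payload first, then publish with a release store.
    static void publish(int value) {
        payload = value;
        header.setRelease(1);
    }

    // Reader: spin with acquire loads until the header is non-zero.
    // The acquire load pairs with the release store, so the payload
    // written before the store is visible after the load.
    static int readStable() {
        while (header.getAcquire() == 0) {
            Thread.onSpinWait();
        }
        return payload;
    }

    // Runs one reader thread against one publish and returns what it saw.
    static int demo() {
        header.setPlain(0);
        payload = 0;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = readStable());
        reader.start();
        publish(42);
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With a plain load in `readStable()` instead of `getAcquire()`, nothing in the Java memory model would order the read of `payload` after the header read, which is the kind of gap the proposal closes on the C++ side.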
> > Testing: > - [x] tier1 > - [x] tier2 This looks fine to me Reviewed ------------- PR: https://git.openjdk.java.net/jdk/pull/4829 From kvn at openjdk.java.net Fri Jul 23 16:13:02 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 23 Jul 2021 16:13:02 GMT Subject: RFR: 8271140: Fix native frame handling in vframeStream::asJavaVFrame() In-Reply-To: References: Message-ID: On Thu, 22 Jul 2021 13:07:46 GMT, Andreas Woess wrote: > Follow-up to https://github.com/openjdk/jdk/pull/4625 ([JDK-8269592](https://bugs.openjdk.java.net/browse/JDK-8269592)) which added support for native frames to `vframeStreamCommon::asJavaVFrame()`. This change was not correct when `asJavaVFrame()` is called for a native frame that is the last frame on the stack (i.e. the first frame in the stream), in which case there's no `_prev_frame` yet. We don't actually need the extended frame information for native frames, so the fix is to just use the vframeStream's `_frame` and `_reg_map` for native frames. > Extended the test from JDK-8269592 to cover this case. Testing show failures. Please, investigate. ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4872 From richard.reingruber at sap.com Fri Jul 23 18:08:03 2021 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 23 Jul 2021 18:08:03 +0000 Subject: [External] : Re: JNI WeakGlobalRefs In-Reply-To: <7C309890-C960-4DE9-90BE-56ED2992D9CE@oracle.com> References: , <7C309890-C960-4DE9-90BE-56ED2992D9CE@oracle.com> Message-ID: Hi, > > But writing demonstrably correct code seems to remain a problem even without finalizers. > > Based on the current spec, you can get a similar effect to a finalizer by > > enqueuing a WeakReference. In reality, it might be the case that in the absence > > of finalizers, GlobalWeakRefs are cleared when WeakReferences are enqueued, > > But I don't think the spec says that. 
The implementation is still allowed to enqueue > > WeakReferences, leave the object around for a while, and clear PhantomReferences > > and GlobalWeakRefs later, leaving a window during which WeakReferences > > have been enqueued and processed, but GlobalWeakRefs will continue to generate > > strong references to the object. > Without finalizers, weak and phantom will be equivalent. I guess Hans referred to the api doc / spec: "An object is phantom reachable if it is neither strongly, softly, nor weakly reachable, it has been finalized, and some phantom reference refers to it." [1] This would leave room for implementations to treat them differently, which they should not. Without finalization phantom references must not be enqueued before weak references with the same referent. This potentially prevents processing weak and phantom references in the same phase. I guess the spec can be changed to define phantom and weak as equivalent when finalizers are removed. It would be even better to remove them as well but that would likely be too much incompatibility. Should phantom references be deprecated now? I don't see why they shouldn't. Cheers, Richard. [1] https://download.java.net/java/early_access/jdk17/docs/api/java.base/java/lang/ref/package-summary.html By the way (anecdotal): I learned the hard way about the semantics of JNI weak references when we backported HotSpot from JDK11 to our commercial JDK8 (licensed from Oracle). HotSpot from JDK11 uses JNI weak references for the classloader of a method M enqueued for JIT compilation (see CompileTask). This can cause issues if the loader is also held by a phantom reference (seems to be a pattern in Nashorn) because when the loader and an invokedynamic MutableCallSite C in M become phantom reachable then C's dependencies to nmethods are flushed by a Cleaner but the CompileTask's JNI weak reference is not cleared (because phantom references are not cleared in Java 8).
C2 compiles M and tries to register a dependency to C which is not good after C was cleaned. In that case it would have been good if the JNI weak reference would be processed like j.l.r.WeakReferences. -----Original Message----- From: hotspot-dev On Behalf Of Erik Osterlund Sent: Friday, 23 July 2021 12:31 To: Hans Boehm Cc: hotspot-dev developers Subject: Re: [External] : Re: JNI WeakGlobalRefs Hi Hans, On 23 Jul 2021, at 01:12, Hans Boehm wrote: Thanks for the detailed response! More below ... On Thu, Jul 22, 2021 at 1:20 AM Erik Osterlund wrote: Hi Hans, So you are saying that a jweak allows you to peek into the finalizer graph of a finalizing object from the outside due to having phantom semantics, and that is sometimes bad. I agree. As you noted, if you don't use finalizers, there is no problem here. But let's assume we do what you propose, and make jweak have weak semantics, instead of phantom semantics. First of all, have we now removed all gimmicks that allow you to peek through finalizers? I would say no. Consider the following (same notation as your example): F ---> ... ---> F Here we have an object with a finalizer that can reach another object with a finalizer. They can both get finalized by the same GC cycle. So depending on who is enqueued for finalization first, you may or may not be able to peek into an already finalized object here, from the outside, through another finalizer. One might argue about the probability of users using an object behind a jweak vs finalizer, without knowing their implementation. But it's bad news for both. Once again, if finalizers are not used, there is no problem. Agreed. I think this is problematic as well, and we're clearly not going to fix that problem at this stage. But as I tried to argue too briefly, I think there kind of is a work-around in the finalizer ordering case.
The first F can keep a strong reference (from some strongly reachable data structure) to everything its finalizer needs to run, and clear it in the finalizer. (IIRC, Guy Steele suggested this long ago, and I think the 2005 memory model work kind of, sort of, enabled this to be done correctly.) In the WeakGlobalRef case, I don't think there is an analogous workaround. So I think it's actually worse than what is already a very unpleasant problem.

I think you are right. But if we flipped the coin and bought the problem of native components being unable to access their Java mirror any longer, if they are finalizably reachable, which they can't know, there wouldn't really be any workaround for that either, AFAICT. So either way you end up with problems hard to work around, except by removing finalizers of course. Which by all means is what we want people to do.

Secondly, as for when you would ever want jweak to have phantom semantics... I don't think this is uncommon, in particular for native code. It is not uncommon to use JNI to implement a native mirror component for a Java object. The Java object and its native data structure are to be considered as one logical unit that stick together. The Java object has a pointer to its native part, and the native part wants a pointer back to its Java object, so they can talk to each other seamlessly. You don't want the back pointer to be strong because it would create memory leaks, so you use a jweak. The expectation is that surely, as long as this Java object is around, you can access it through the jweak. Especially since the spec says so, it's a very sound assumption. Then some cleaner will delete the data structure when it is no longer relevant. So in contexts where you are in the native data structure and haven't passed around the object reference, you can always get the object through the jweak as long as it isn't dead.

This sounds like a very reasonable intent, but I don't see how to implement it correctly with the current semantics, unless you control the implementation of the Java object all the way down to the bits, which seems rare. The core question is what it means for the Java object to "be around". When the Java object is eligible for finalization, it's, in general, no longer safe to access from the native mirror, in spite of the fact that the native mirror has no way to tell that it has entered that half-dead-and-potentially-invalid state. The only way to dodge the issue is to know that no part of the Java object is finalizable (or affected by WeakReference-based cleanup), which I think is usually unknowable. It's unclear to me what guarantees even the standard library provides in this respect.

I disagree. I don't think you need to know the object isn't finalizable. You just need to know your native mirror object doesn't have a finalizer which breaks the object. And that seems easy to know. Just follow the guidelines and don't write a finalizer for that object, and objects you use in your native code. Instead use a phantom based cleaner. If other objects have finalizers that can reach said native mirrored objects, it doesn't really matter. Because said finalizers are not allowed to break the object. So you only need to reason about your own class to know your native mirrored component works correctly.

Now if we change the semantics of jweak to weak instead, then every time the Java object is reachable through a finalizer only, the logic will be wrong. The native part thinks the object is dead, but it isn't. So if anyone uses this object through a finalizer, bad things can happen. Kind of like a native object monitor not working properly in finalizers. You essentially no longer can have an object with a native component point at each other, without running into a bunch of issues.
Either introducing a memory leak, or having the object be crippled when reachable from finalizers. For similar reasons, all native weak references used internally in HotSpot, do have phantom semantics, and rely on that being the case.

That's a good data point, and suggests that current implementations need to keep supporting the current semantics for internal use. But I think Java implementations themselves here are in a completely different situation from client code: I would hope that those HotSpot uses actually are aware of the referent implementation all the way down to the bits, and either know there are no finalizers involved, or write the finalizer code to defend against accessing finalized objects. The client code, on the other hand, seems extremely likely to use referents (e.g. listeners for some event generated by native code) that can rely on arbitrary unknown Java objects. And I think the current semantics just don't work for such cases.

Similarly to my comment above, we just don't need to know about what libraries use finalizers. We know our native components are implemented right with corresponding phantom based cleanups and don't get broken by other finalizers.

But you may well be right that there is just enough client code using this correctly that we can't actually change the semantics.

I think so. Especially if we only swap problems around. If we could make everyone happy, it might be a different story.

So it's a two edged sword I think. What is true for this whole discussion though is that if we don't have finalizers, then there isn't really a problem. Finalizers are deprecated, so hopefully their use will die out over time. What is also true is that by changing the semantics of jweak, you trade one problem for another one. This seems particularly nasty to change, as the spec clearly states what users can expect from this API.

The failure probability indeed goes down dramatically if you get rid of finalizers.
Based on what I've seen, that's unfortunately not likely to happen quickly.

I can imagine the transition will be slow indeed.

But writing demonstrably correct code seems to remain a problem even without finalizers. Based on the current spec, you can get a similar effect to a finalizer by enqueuing a WeakReference. In reality, it might be the case that in the absence of finalizers, GlobalWeakRefs are cleared when WeakReferences are enqueued, but I don't think the spec says that. The implementation is still allowed to enqueue WeakReferences, leave the object around for a while, and clear PhantomReferences and GlobalWeakRefs later, leaving a window during which WeakReferences have been enqueued and processed, but GlobalWeakRefs will continue to generate strong references to the object.

Without finalizers, weak and phantom will be equivalent. Therefore jweak and WeakReference will agree about whether the object is cleared or not. So after the WeakReference cleanup has run, jweaks are guaranteed to resolve to null. So I don't see the problem you are referring to, in a finalizer absent world.

So I think this problem will really only go away after finalizers die out, and the spec is updated to take advantage of the fact that finalizers are dead. Which I expect to take a long time.

It sounds like we agree that without finalizers everyone will be happy. It's just getting there that could take a while.

If we can't change the semantics, do you agree that it makes sense to have the spec warn about the problem, and recommend a (strong) GlobalRef to a WeakReference for references to objects not controlled by the same developer?

Writing down a note that you should probably think about this, and possibly recommending removal of finalizers, seems fine to me.

/Erik

Hans

Hope this helps.
/Erik

> [ Moving here from core-libs-dev on David Holmes' recommendation.
]
>
> I'm concerned that the current semantics of JNI WeakGlobalRefs are still
> dangerous in a very subtle way that is hidden in the spec. The current
> (14+) spec says:
>
> "Weak global references are related to Java phantom references
> (java.lang.ref.PhantomReference). A weak global reference to a specific
> object is treated as a phantom reference referring to that object when
> determining whether the object is phantom reachable (see java.lang.ref).
> ---> Such a weak global reference will become functionally equivalent to
> NULL at the same time as a PhantomReference referring to that same object
> would be cleared by the garbage collector. <---"
>
> (This was the result of JDK-8220617, and is IMO a large improvement over the
> prior version, but ...)
>
> Consider what happens if I have a WeakGlobalRef W that refers to a Java
> object A which, possibly indirectly, relies on an object F, where F is
> finalizable, i.e.
>
> W - - -> A -----> ... -----> F
>
> Assume that F becomes invalid once it is finalized, e.g. because the finalizer
> deallocates a native object that F relies on. This seems to be a very common
> case. We are then exposed to the following scenario:
>
> 0) At some point, there are no longer any other references to A or F.
> 1) F is enqueued for finalization.
> 2) W is dereferenced by Thread 1, yielding a strong reference to A and
> transitively to F.
> 3) F is finalized.
> 4) Thread 1 uses A and F, accessing F, which is no longer valid.
> 5) Crash, or possibly memory corruption followed by a later crash elsewhere.
>
> (3) and (4) actually race, so there is some synchronization effort and cost
> required to prevent F from corrupting memory. Commonly the implementer
> of W will have no idea that F even exists.
>
> I believe that typically there is no way to prevent this scenario, unless the
> developer adding W actually knows how every class that A could possibly rely
> on, including those in the Java standard library, are implemented.
> > This is reminiscent of finalizer ordering issues. But it seems to be worse, in > that there isn't even a semi-plausible workaround. > > I believe all of this is exactly the reason PhantomReference.get() always > returns null, while WeakReference provides significantly different semantics, > and WeakReferences are enqueued when an object is enqueued for > finalization. > > The situation improves, but the problem doesn't fully disappear, in a > hypothetical world without finalizers. It's still possible to use WeakGlobalRef > to get a strong reference to A after a WeakReference to A has been cleared > and enqueued. I think the problem does go away if all cleanup code were to > use PhantomReference-based Cleaners. > > AFAICT, backward-compatibility aside, the obvious solution here is to have > WeakGlobalRefs behave like WeakReferences. My impression is that this > would fix significantly more broken clients than it would break correct ones, > so it is arguably still a viable option. > > There is a case in which the current semantics are actually the desired ones, > namely when implementing, say, a String intern table. In this case it's > important the reference not be cleared even if the referent is, at some > point, only reachable via a finalizer. But this use case again relies on the > programmer knowing that no part of the referent is invalidated by a finalizer. > That's a reasonable assumption for the Java-implementation-provided String > intern table. But I'm not sure it's reasonable for any user-written code. > > There seem to be two ways forward here: > > 1) Make WeakGlobalRefs behave like WeakReferences instead of > PhantomReferences, or > 2) Add strong warnings to the spec that basically suggest using a strong > GlobalRef to a WeakReference instead. > > Has there been prior discussion of this? Are there reasonable use cases for > the current semantics? Is there something else that I'm overlooking? If not, > what's the best way forward here? 
> > (I found some discussion from JDK-8220617, including a message I posted. > Unfortunately, it seems to me that all of us overlooked this issue?) > > Hans From coleenp at openjdk.java.net Fri Jul 23 19:00:16 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 23 Jul 2021 19:00:16 GMT Subject: RFR: 8271219: [REDO] JDK-8271063 Print injected fields for InstanceKlass Message-ID: There was a test bug in the JDK-8271063. I thought I'd check the output of Method->print() except it's only available in debug mode. The basic diff is line 56 and 63 in the gtest. Retested with mach5 tier1 on 5 Oracle platforms. ------------- Commit messages: - 8271219: [REDO] JDK-8271063 Print injected fields for InstanceKlass Changes: https://git.openjdk.java.net/jdk/pull/4894/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4894&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271219 Stats: 91 lines in 4 files changed: 56 ins; 25 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/4894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4894/head:pull/4894 PR: https://git.openjdk.java.net/jdk/pull/4894 From dcubed at openjdk.java.net Fri Jul 23 19:14:03 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 23 Jul 2021 19:14:03 GMT Subject: RFR: 8271219: [REDO] JDK-8271063 Print injected fields for InstanceKlass In-Reply-To: References: Message-ID: <2KI_4N0oAXYqs5PVKKlse8cMvUpBoEQD830J3Z6CTs8=.5ade3274-dad2-4a2a-9256-4fe1f146349c@github.com> On Fri, 23 Jul 2021 18:48:21 GMT, Coleen Phillimore wrote: > There was a test bug in the JDK-8271063. I thought I'd check the output of Method->print() except it's only available in debug mode. The basic diff is line 56 and 63 in the gtest. Retested with mach5 tier1 on 5 Oracle platforms. Thumbs up! I also compared the original patch to the new patch. It would be good to have one of the original reviewers chime in on this PR. ------------- Marked as reviewed by dcubed (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4894 From fparain at openjdk.java.net Fri Jul 23 20:16:06 2021 From: fparain at openjdk.java.net (Frederic Parain) Date: Fri, 23 Jul 2021 20:16:06 GMT Subject: RFR: 8271219: [REDO] JDK-8271063 Print injected fields for InstanceKlass In-Reply-To: References: Message-ID: <_82qhVpqW8cDhiYBl4VC8Zoxcy-EMXppLUZdmkvx6Ro=.4e191b67-6159-4ce4-a893-a2578ba656b1@github.com> On Fri, 23 Jul 2021 18:48:21 GMT, Coleen Phillimore wrote: > There was a test bug in the JDK-8271063. I thought I'd check the output of Method->print() except it's only available in debug mode. The basic diff is line 56 and 63 in the gtest. Retested with mach5 tier1 on 5 Oracle platforms. LGTM Fred ------------- Marked as reviewed by fparain (Committer). PR: https://git.openjdk.java.net/jdk/pull/4894 From coleenp at openjdk.java.net Fri Jul 23 20:57:06 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 23 Jul 2021 20:57:06 GMT Subject: Integrated: 8271219: [REDO] JDK-8271063 Print injected fields for InstanceKlass In-Reply-To: References: Message-ID: On Fri, 23 Jul 2021 18:48:21 GMT, Coleen Phillimore wrote: > There was a test bug in the JDK-8271063. I thought I'd check the output of Method->print() except it's only available in debug mode. The basic diff is line 56 and 63 in the gtest. Retested with mach5 tier1 on 5 Oracle platforms. This pull request has now been integrated. 
Changeset: 286106dd Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/286106dd2ae899746c0e9d9a263ed4af9e56c536 Stats: 91 lines in 4 files changed: 56 ins; 25 del; 10 mod 8271219: [REDO] JDK-8271063 Print injected fields for InstanceKlass Reviewed-by: dcubed, fparain ------------- PR: https://git.openjdk.java.net/jdk/pull/4894 From coleenp at openjdk.java.net Fri Jul 23 20:57:05 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 23 Jul 2021 20:57:05 GMT Subject: RFR: 8271219: [REDO] JDK-8271063 Print injected fields for InstanceKlass In-Reply-To: References: Message-ID: <72UDBKx37HThv2D6H0Vp3s97SvEgHuKPsoqr0lKpHZ4=.adcc4e7a-c5aa-4d30-92ba-8fb5b8bbb8ba@github.com> On Fri, 23 Jul 2021 18:48:21 GMT, Coleen Phillimore wrote: > There was a test bug in the JDK-8271063. I thought I'd check the output of Method->print() except it's only available in debug mode. The basic diff is line 56 and 63 in the gtest. Retested with mach5 tier1 on 5 Oracle platforms. Thanks Dan and Fred! ------------- PR: https://git.openjdk.java.net/jdk/pull/4894 From stuefe at openjdk.java.net Sat Jul 24 04:58:04 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 24 Jul 2021 04:58:04 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v2] In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 13:31:10 GMT, Coleen Phillimore wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix +UseMallocOnly on 32-bit > > I don't know if I can speak for everybody but I think cleanups in this area would be welcome if they're incremental and we can plan for time to review them. This hasn't been an area of pain for us (that I know of), but if it has for SAP, maybe cleanups like allowing packing in arenas has a higher priority. > Having a GrowableArray for UseMallocOnly might make sense. 
I was somewhat surprised by this implementation last week when I first looked at it closely. Or maybe UseMallocOnly could be removed in favor of memory guards in arena in debug mode, since UseMallocOnly is sparingly tested. That would be my preference anyway.
> Thanks for taking this on and your thoughts on this.

@coleenp, @kimbarrett is this current version okay for you?

-------------

PR: https://git.openjdk.java.net/jdk/pull/4835

From github.com+58006833+xbzhang99 at openjdk.java.net Sat Jul 24 05:18:29 2021
From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang)
Date: Sat, 24 Jul 2021 05:18:29 GMT
Subject: RFR: 8264543: Cross modify fence optimization for x86 [v3]
In-Reply-To:
References:
Message-ID: <9XBsUYvWe8qbOz-4xQH3sKDE_QtzPcJ6zlJJkF6r9co=.59658cbc-0e5f-4f76-90d2-3a59b9384ff5@github.com>

> Intel introduced a new instruction "serialize" which ensures that all modifications to flags, registers, and memory by previous instructions are completed and all buffered writes are drained to memory before the next instruction is fetched and executed. It is a serializing instruction and can be used to implement cross modify fence (OrderAccess::cross_modify_fence_impl) more efficiently than using "cpuid" on supported 32-bit and 64-bit x86 platforms.
>
> The availability of the SERIALIZE instruction is indicated by the presence of the CPUID feature flag SERIALIZE, bit 14 of the EDX register in sub-leaf CPUID:7H.0H.
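
[Editor's illustration, not part of the PR.] The feature test described above comes down to checking one bit. The sketch below assumes the caller has already obtained the EDX output of CPUID leaf 7, sub-leaf 0 by some platform-specific means. The names `kSerializeBit` and `has_serialize` are hypothetical and are not taken from the HotSpot sources.

```cpp
#include <cassert>
#include <cstdint>

// Bit position as stated in the PR description: CPUID.(EAX=07H,ECX=0H):EDX[14].
// The constant name is hypothetical.
constexpr unsigned kSerializeBit = 14;

// Returns true if the SERIALIZE feature flag is set in the given EDX value.
// `edx_leaf7_sub0` is assumed to be the EDX register contents produced by
// executing CPUID with EAX=7 and ECX=0; reading it is left to the caller.
constexpr bool has_serialize(std::uint32_t edx_leaf7_sub0) {
  return ((edx_leaf7_sub0 >> kSerializeBit) & 1u) != 0;
}
```

A caller would fall back to the existing `cpuid`-based fence whenever `has_serialize(edx)` returns false.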
> > https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: add support for bsd ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4848/files - new: https://git.openjdk.java.net/jdk/pull/4848/files/277a2b54..d70eb4a6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4848&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4848&range=01-02 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/4848.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4848/head:pull/4848 PR: https://git.openjdk.java.net/jdk/pull/4848 From kbarrett at openjdk.java.net Sat Jul 24 09:00:02 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 24 Jul 2021 09:00:02 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v3] In-Reply-To: References: Message-ID: <9ZgL__5Vp9HlPdRffcb849EQUTEkodSdi_9YR4VACi4=.fb97ddba-6caf-4f84-b3f7-c795a499248c@github.com> On Wed, 21 Jul 2021 15:19:12 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. >> >> The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit: >> >> >> Arena ar; >> p1 = ar.AmallocWords(); // return 32bit aligned address >> p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. >> >> >> This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . 
>> >> This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation. But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. >> >> Remaining issues: >> >> - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient, we could just remove that code. I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. >> - The chunk dimensions are not guaranteed to be 64-bit aligned: >> 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. >> 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. >> 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. >> >> Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. >> >> ----- >> >> Tests: >> >> - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) >> - GHA >> - Tests are scheduled at SAP. 
> > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > kbarret feedback I think this still isn't right. src/hotspot/share/memory/arena.hpp line 135: > 133: // on both 32 and 64 bit platforms. Required for atomic jlong operations on 32 bits. > 134: void* Amalloc(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) { > 135: x = ARENA_ALIGN(x); // note for 32 bits this should align _hwm as well. The comment here is stale. ------------- Changes requested by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4835 From kbarrett at openjdk.java.net Sat Jul 24 09:00:02 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 24 Jul 2021 09:00:02 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v2] In-Reply-To: References: <2pyhkqWrPeCCq-mva2BtVRH8GXQsWF6wW7cyTKcNJj8=.95a9ed9f-5e78-4e42-b385-4190ec8eec9b@github.com> <4EvTy1zZI6i_Sf3kZuayPvLsRgMifI7Fauy7EenaCGE=.1908ff49-dcd2-42bf-b038-ed62da6418f3@github.com> Message-ID: On Thu, 22 Jul 2021 12:59:42 GMT, Thomas Stuefe wrote: >> Can you just assert that size > 0 for Amalloc? > > Unfortunately not, since that is actually done. I would have to hunt down callers which do this. That is maybe worthwhile, but would be a larger change. The latest version doesn't seem any better than the previous. It could still return a misaligned value in the same circumstance, just more slowly because of the additional branch. Under what circumstances can _hwm not be 64bit aligned? What needs to be done to prevent it? It looks to me that if Arena::grow were to align its size argument, then it all falls out, assuming std::max_align_t is also appropriately aligned. Add asserts in Chunk allocation. Add a static assert that std::max_align_t is appropriately aligned and leave it to the person porting to the very weird platform to figure out what to do if it fails. And I think that's sufficient. 
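
[Editor's illustration, not part of the message.] A minimal sketch of the suggestion above: round the chunk size up, assert the chunk start is aligned, and statically assert the platform's maximum fundamental alignment. It assumes a malloc-backed chunk. `kArenaAlignment`, `align_size_up`, `ToyChunk` and `allocate_chunk` are hypothetical names, not HotSpot's. It covers chunk boundaries only and says nothing about how later allocations move the high-water mark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical constant: the alignment the arena wants to guarantee (64 bit).
constexpr std::size_t kArenaAlignment = 8;

// If this fails, malloc'ed chunk starts cannot be assumed 64-bit aligned and
// whoever ports to that platform has to decide what to do.
static_assert(alignof(std::max_align_t) >= kArenaAlignment,
              "malloc alignment is too weak for 64-bit aligned chunks");

// Round a size up to the next multiple of kArenaAlignment (a power of two).
constexpr std::size_t align_size_up(std::size_t size) {
  return (size + kArenaAlignment - 1) & ~(kArenaAlignment - 1);
}

// Toy stand-in for a chunk: [bottom, top) is the usable payload.
struct ToyChunk {
  char* bottom;
  char* top;
};

// Allocate a chunk whose start and end are both 64-bit aligned.
// `requested` is assumed to be non-zero.
inline ToyChunk allocate_chunk(std::size_t requested) {
  const std::size_t len = align_size_up(requested);
  char* p = static_cast<char*>(std::malloc(len));
  assert(p != nullptr);
  // Makes the implicit assumption about malloc alignment explicit.
  assert(reinterpret_cast<std::uintptr_t>(p) % kArenaAlignment == 0);
  return ToyChunk{p, p + len};
}
```

With `allocate_chunk(13)` the payload is 16 bytes long, so both `bottom` and `top` fall on 8-byte boundaries.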
(While use of alignof and std::max_align_t are not currently discussed in the HotSpot Style Guide, I would have no objection to their use when the alternative is something more complicated and less robust. And feel free to propose corresponding post facto changes to style guide.) ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From stuefe at openjdk.java.net Sat Jul 24 11:29:02 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 24 Jul 2021 11:29:02 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v3] In-Reply-To: <9ZgL__5Vp9HlPdRffcb849EQUTEkodSdi_9YR4VACi4=.fb97ddba-6caf-4f84-b3f7-c795a499248c@github.com> References: <9ZgL__5Vp9HlPdRffcb849EQUTEkodSdi_9YR4VACi4=.fb97ddba-6caf-4f84-b3f7-c795a499248c@github.com> Message-ID: On Sat, 24 Jul 2021 08:50:55 GMT, Kim Barrett wrote: > The comment here is stale. Yes, I know. See my initial PR text. I left it in since my first fix was rejected for being too complex, so I wanted to limit this patch to the absolute minimum needed. I do not think aligning the allocation size - or insisting on an aligned allocation size as the sister method `AmallocWords` does - is useful at all. Removing the comment just leaves the code, which leaves the reader to ask himself why it's in there. Leaving the comment at least states the reason the original author saw. We should either remove the code and the comment or leave both in place. ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From stuefe at openjdk.java.net Sat Jul 24 11:45:04 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 24 Jul 2021 11:45:04 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 15:19:12 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. 
>> >> The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit: >> >> >> Arena ar; >> p1 = ar.AmallocWords(); // return 32bit aligned address >> p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. >> >> >> This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . >> >> This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation. But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. >> >> Remaining issues: >> >> - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient, we could just remove that code. I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. >> - The chunk dimensions are not guaranteed to be 64-bit aligned: >> 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. >> 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. 
Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. >> 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. >> >> Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. >> >> ----- >> >> Tests: >> >> - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) >> - GHA >> - Tests are scheduled at SAP. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > kbarret feedback > I think this still isn't right. >... > The latest version doesn't seem any better than the previous. It could still return a misaligned value in the same circumstance, just more slowly because of the additional branch. How? I must be blind. - If size > 0: Amalloc aligns _hwm to 64-bit and returns a 64-bit aligned address -> ok - if size == 0: Amalloc does not change the arena, but returns a potentially unaligned pointer - but that is fine too since that pointer cannot be used for storing anything. Certainly nothing needing 64-bit alignment. What am I overlooking here? > Under what circumstances can _hwm not be 64bit aligned? What needs to be done to prevent it? _hwm is misaligned by the caller mixing Amalloc and AmallocWords on 32-bit. AmallocWords advances _hwm by word size. Then, it is not 64-bit aligned anymore. A subsequent Amalloc() call needs to return a 64-bit aligned pointer, so it needs to align _hwm up. A subsequent AmallocWords() call, however, would be fine with _hwm as it is, so the alignment should really be done at Amalloc(). 
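
[Editor's illustration, not part of the message.] The mixing problem described above can be reproduced with a toy bump allocator. This is a sketch under stated assumptions, not HotSpot's `Arena`: `ToyArena`, `alloc_words` and `alloc_aligned64` are hypothetical names, and a 4-byte request stands in for one word on a 32-bit platform. The aligned path rounds the high-water mark up first and then checks against the end of the buffer, because the end itself need not be aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy bump allocator over a caller-supplied buffer (hypothetical, for illustration).
class ToyArena {
  std::uintptr_t _hwm;  // next free address ("high water mark")
  std::uintptr_t _max;  // one past the last usable address

  static std::uintptr_t align_up(std::uintptr_t v, std::uintptr_t alignment) {
    return (v + alignment - 1) & ~(alignment - 1);  // alignment is a power of two
  }

 public:
  ToyArena(void* buf, std::size_t len)
      : _hwm(reinterpret_cast<std::uintptr_t>(buf)), _max(_hwm + len) {}

  // Word-granular allocation: advances _hwm by exactly `bytes`, no padding.
  void* alloc_words(std::size_t bytes) {
    if (bytes > _max - _hwm) return nullptr;
    void* p = reinterpret_cast<void*>(_hwm);
    _hwm += bytes;
    return p;
  }

  // 64-bit aligned allocation: align _hwm up before handing out memory.
  void* alloc_aligned64(std::size_t bytes) {
    // A zero-sized request leaves the arena untouched; the returned pointer
    // may be unaligned but cannot be used to store anything.
    if (bytes == 0) return reinterpret_cast<void*>(_hwm);
    const std::uintptr_t start = align_up(_hwm, 8);
    // The end of the buffer may be unaligned, so aligning can overshoot it.
    if (start > _max || bytes > _max - start) return nullptr;
    _hwm = start + bytes;
    return reinterpret_cast<void*>(start);
  }
};
```

On an 8-byte aligned buffer, a 4-byte `alloc_words` followed by `alloc_aligned64(8)` skips 4 bytes of padding and returns the address at offset 8. If the buffer is only 12 bytes long, the same sequence returns `nullptr` instead of running past the end.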
We cannot just align the allocation size to always be 64-bit, since on an Arena that uses repeated calls to AmallocWords() (e.g., HandleArea) this would create allocation padding and waste memory or even crash if the arena cannot handle padding. > It looks to me that if Arena::grow were to align its size argument, then it all falls out, assuming std::max_align_t is also appropriately aligned. No, since this only ensures the chunk boundaries are properly aligned. The first call to AmallocWords - on a 32-bit platform - will misalign _hwm for potential follow-up calls to Amalloc(). In short, mixing allocation calls with different alignment requirements means the calls with higher alignment guarantees must align. Alternatively, you must ensure that for a given Arena, only one alignment is used - I proposed that alternative in my first patch. Make alignment property of the Arena and only allow allocations with that alignment. ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From kim.barrett at oracle.com Sat Jul 24 12:23:25 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Sat, 24 Jul 2021 12:23:25 +0000 Subject: JNI WeakGlobalRefs In-Reply-To: References: Message-ID: <5F4FD9CE-1CC3-47D6-8D4B-4987C2ACE955@oracle.com> > On Jul 21, 2021, at 9:25 PM, Hans Boehm wrote: > > I'm concerned that the current semantics of JNI WeakGlobalRefs are still > dangerous in a very subtle way that is hidden in the spec. The current > (14+) spec says: I'm responding to various parts of your different emails here, and not being linear in the discussion. Hopefully I'm not being too confusing. > (I found some discussion from JDK-8220617, including a message I posted. > Unfortunately, it seems to me that all of us overlooked this issue?) I don't think the problems arising from WeakGlobalRef's being able to access post-WeakReference clearing or post-finalization were overlooked. What you are saying is "well known", and there are multiple previous discussions. 
The most recent I can find is here: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043919.html and in particular, here: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043924.html > Based on the current spec, you can get a similar effect to a finalizer by > enqueuing a WeakReference. In reality, it might be the case that in the > absence of finalizers, GlobalWeakRefs are cleared when WeakReferences are > enqueued, But I don't think the spec says that. The implementation is still > allowed to enqueue WeakReferences, leave the object around for a while, and > clear PhantomReferences and GlobalWeakRefs later, leaving a window during > which WeakReferences have been enqueued and processed, but GlobalWeakRefs > will continue to generate strong references to the object. It's true that the spec permits the GC to use different "certain point in time"s for deciding whether an object has reached the various strengths. But to my knowledge (and Gil Tene's as well: https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2016-January/016298.html and I would trust him more than me on this question), no implementation actually does that, other than as required to support finalization. There's no obvious benefit, and doing so likely has undesirable costs (implementation complexity, runtime speed/space, pause durations, &etc.). Assuming that, your initial problem description isn't possible: > Consider what happens if I have a WeakGlobalRef W that refers to a Java > object A which, possibly indirectly, relies on an object F, where F is > finalizable, i.e. > > W - - -> A -----> ... -----> F > > Assume that F becomes invalid once it is finalized, e.g. because the > finalizer deallocates a native object that F relies on. This seems to be a > very common case. We are then exposed to the following scenario: > > 0) At some point, there are no longer any other references to A or F. > 1) F is enqueued for finalization. 
> 2) W is dereferenced by Thread 1, yielding a strong reference to A and > transitively to F. > 3) F is finalized. > 4) Thread 1 uses A and F, accessing F, which is no longer valid. > 5) Crash, or possibly memory corruption followed by a later crash elsewhere. F is only going to become finalizable at the same time that W is cleared. So Step 2 can't happen. There are other ways the strength of WeakGlobalRefs relative to WeakReference and finalization can get one in trouble though. I think the description of the reference strengths and the associated behaviors ought to be improved. But again, finalization makes that complicated. My impression is that there's a reluctance to make even clarifications in this area that have to deal with the complexity induced by finalization, when removing finalization would simplify so much. (And yeah, that's despite the slow rate of progress on removing finalization.) The deficiencies of resurrection-based cleanup, and the comparative superiority of proxy-based cleanup, have been known for a long time. (I have archived discussions on the topic from the late 1980s and early 1990s, some including you as a participant.) Finalization is long past its expiration date. It seems to me that rather than spending time discussing how to continue to work around its problems, it would be more productive to work toward removal. Some progress has been made. There's a punch-list for removing finalizers in the jdk here: https://bugs.openjdk.java.net/browse/JDK-8253568 There is also work being done on tooling to help with the process. Things like checking for usage of finalization, enabling or disabling at various granularities, measuring and reporting its costs, and so on. 
From stuefe at openjdk.java.net Sat Jul 24 12:39:20 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 24 Jul 2021 12:39:20 GMT Subject: RFR: JDK-8271242: add arena regression tests Message-ID: Hi, May I have reviews for these new regression tests for hotspot arenas, please? Arena coding is fragile, and recently we have changed it and plan to change it some more. But we have no tests, so regression tests should exist. This is a breakout from https://github.com/openjdk/jdk/pull/4784, in which I rewrote arena alignment handling and added a lot of tests. The arena changes were rejected as too complex, which is okay, but most of the tests are still good and I hope not controversial. Therefore I would like to add the tests to the gtest suite. This patch adds tests, but does *not* change the Arenas themselves. Instead it tests current behavior, taking it as a baseline. Where the current behavior is inconsistent (e.g. when dealing with 0-sized allocations), it nevertheless tests the inconsistent behavior, since callers may rely on it. This patch also adds a new jtreg gtest wrapper which tests the arena-related part of the tests with +UseMallocOnly. As long as we support that mode, we should test it. --- Tests: - manual on both Linux x64 and x86, with and without UseMallocOnly. - GHAs - nightlies at SAP are scheduled. ---- Note: this needs JDK-8270308, which is not yet pushed but under review here: https://github.com/openjdk/jdk/pull/4835.
------------- Commit messages: - start - JDK-8270308-Amalloc-aligns-size-but-not-return-value Changes: https://git.openjdk.java.net/jdk/pull/4898/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4898&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271242 Stats: 442 lines in 4 files changed: 441 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4898.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4898/head:pull/4898 PR: https://git.openjdk.java.net/jdk/pull/4898 From stuefe at openjdk.java.net Sat Jul 24 12:39:20 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 24 Jul 2021 12:39:20 GMT Subject: RFR: JDK-8271242: add arena regression tests In-Reply-To: References: Message-ID: <95uqV4-v0bMdOUw5wXmasl5tEpejp9ikufyrFXiOjSU=.0e8900d7-6934-4d71-88b1-7b967b71ea9c@github.com> On Sat, 24 Jul 2021 07:50:33 GMT, Thomas Stuefe wrote: > Hi, > > May I have reviews for these new regression tests for hotspot arenas, please? Arena coding is fragile, and recently we have changed it and plan to change it some more. But we have no tests, so regression tests should exist. > > This is a breakout from https://github.com/openjdk/jdk/pull/4784. In which I rewrote arena alignment handling and added a lot of tests. The arena changes were rejected as too complex, which is okay, but most of the tests are still good and I hope not controversial. Therefore I would like the tests to the gtest suite. > > This patch adds test, but does *not* change the Arenas themselves. Instead it tests current behavior, taking it as a baseline. Where the current behavior is inconsistent (eg. when dealing with 0-sized allocations), it nevertheless tests the inconsistent behavior, since callers may rely on it. > > This patch also adds a new jtreg gtest wrapper which tests the arena-releated part of the tests with +UseMallocOnly. As long as we support that mode, we should test it. 
> > --- > > Tests: > - manual on Linux x64 and x86 both, with and without UseMallocOnly. > - GHAs > - nightlies at SAP are scheduled. > > ---- > > Note: this needs JDK-8270308, which is not yet pushed but under review here: https://github.com/openjdk/jdk/pull/4835. src/hotspot/share/memory/arena.hpp line 137: > 135: x = ARENA_ALIGN(x); // note for 32 bits this should align _hwm as well. > 136: debug_only(if (UseMallocOnly) return malloc(x);) > 137: // JDK-8270308: Amalloc guarantees 64-bit alignment and we need to ensure that in case the preceding Please ignore, this belongs to JDK-8270308 which is waiting for final approval (https://github.com/openjdk/jdk/pull/4835). Will be removed before pushing. ------------- PR: https://git.openjdk.java.net/jdk/pull/4898 From kbarrett at openjdk.java.net Sat Jul 24 13:51:01 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 24 Jul 2021 13:51:01 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 15:19:12 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. >> >> The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit: >> >> >> Arena ar; >> p1 = ar.AmallocWords(); // return 32bit aligned address >> p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. >> >> >> This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . >> >> This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation. 
But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. >> >> Remaining issues: >> >> - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient, we could just remove that code. I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. >> - The chunk dimensions are not guaranteed to be 64-bit aligned: >> 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. >> 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. >> 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. >> >> Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. >> >> ----- >> >> Tests: >> >> - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) >> - GHA >> - Tests are scheduled at SAP. 
> > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > kbarret feedback Sorry about the typo. I meant to ask, how can `_max` not be 64bit aligned? As that is what the comment is claiming can happen, and that is where things can go wrong. If grow() were to align the allocation size then I don't think that can happen and all would be good. ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From stuefe at openjdk.java.net Sat Jul 24 15:37:03 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 24 Jul 2021 15:37:03 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v3] In-Reply-To: References: Message-ID: On Sat, 24 Jul 2021 13:48:25 GMT, Kim Barrett wrote: > Sorry about the typo. I meant to ask, how can `_max` not be 64bit aligned? As that is what the comment is claiming can happen, and that is where things can go wrong. If grow() were to align the allocation size then I don't think that can happen and all would be good. Oh, okay. I see what you mean. Arena::_max is not 64-bit aligned currently, but maybe that is easier to fix than I thought: Arena::_max depends on the chunk length (the payload length). I would not want to align Arena::_max itself, because there are too many entry points (all the save/restore code directly modifies it). But maybe the chunk length itself? The chunk length is either handed in as the init_size parameter to `Arena::Arena(MEMFLAGS memflag, size_t init_size);`, which can be anything. But we could align it without problem. We already do, to word size. Could align to 64bit instead. Or, the chunk length is one of the default chunk lengths - `Chunk::size, tinysize` etc. Those are not aligned correctly, since the `slack` offset which we use to keep waste low if libc uses buddy allocation is not aligned correctly. It's 20. If we make it 24, it's 64-bit aligned, and all default chunk lengths are 64-bit aligned. I think that may work.
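The slack arithmetic can be checked at compile time. This is only a sketch: the base sizes (256 bytes, 1K, 10K, 32K) and the helper names are illustrative assumptions, not taken from the patch. Only the slack values 20 and 24 come from the mail.

```cpp
#include <cstddef>

// Assumed model: each default chunk length is a power-of-two-ish base
// minus a "slack" value. The base sizes here are illustrative only.
constexpr std::size_t K = 1024;

// True if v is a multiple of a.
constexpr bool is_aligned(std::size_t v, std::size_t a) { return v % a == 0; }

// True if every (base - slack) length is a multiple of 8 bytes.
constexpr bool all_lengths_64bit_aligned(std::size_t slack) {
  return is_aligned(256 - slack, 8) &&
         is_aligned(1 * K - slack, 8) &&
         is_aligned(10 * K - slack, 8) &&
         is_aligned(32 * K - slack, 8);
}

// A slack of 20 leaves the lengths only 4-byte aligned (e.g. 256 - 20 = 236).
static_assert(!all_lengths_64bit_aligned(20), "slack 20 is not 64-bit friendly");
// A slack of 24 keeps every length a multiple of 8 (e.g. 256 - 24 = 232).
static_assert(all_lengths_64bit_aligned(24), "slack 24 is 64-bit friendly");
```

Since every base is itself a multiple of 8, the lengths are 64-bit aligned exactly when the slack is a multiple of 8, which is why 24 works and 20 does not.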
I have to check. It would certainly be cleaner. ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From stuefe at openjdk.java.net Sat Jul 24 18:04:29 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 24 Jul 2021 18:04:29 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v4] In-Reply-To: References: Message-ID: > Hi, > > may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. > > The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit: > > > Arena ar; > p1 = ar.AmallocWords(); // return 32bit aligned address > p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. > > > This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . > > This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation. But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. > > Remaining issues: > > - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient, we could just remove that code. I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. > - The chunk dimensions are not guaranteed to be 64-bit aligned: > 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. 
It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. > 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. > 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. > > Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. > > ----- > > Tests: > > - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) > - GHA > - Tests are scheduled at SAP. 
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: make sure Chunk payload boundaries are 64-bit aligned ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4835/files - new: https://git.openjdk.java.net/jdk/pull/4835/files/a6eaf586..b1324ee9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4835&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4835&range=02-03 Stats: 49 lines in 3 files changed: 29 ins; 5 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/4835.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4835/head:pull/4835 PR: https://git.openjdk.java.net/jdk/pull/4835 From stuefe at openjdk.java.net Sat Jul 24 18:07:05 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 24 Jul 2021 18:07:05 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 15:19:12 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. >> >> The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit: >> >> >> Arena ar; >> p1 = ar.AmallocWords(); // return 32bit aligned address >> p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. >> >> >> This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . >> >> This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation. But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. 
Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. >> >> Remaining issues: >> >> - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient, we could just remove that code. I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. >> - The chunk dimensions are not guaranteed to be 64-bit aligned: >> 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. >> 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. >> 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. >> >> Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. >> >> ----- >> >> Tests: >> >> - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) >> - GHA >> - Tests are scheduled at SAP. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > kbarret feedback In this new version, the chunk payload area - and therefore, Arena::_max - is aligned to 64-bit. 
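As a rough illustration of why an aligned chunk length gives an aligned payload end: the sketch below assumes the payload start is already 64-bit aligned. `ToyChunk`, `align_up` and `make_chunk_aligned` are hypothetical names for this example, not the functions touched by the patch.

```cpp
#include <cstddef>
#include <cstdint>

// Round v up to the next multiple of a (a must be a power of two).
constexpr std::size_t align_up(std::size_t v, std::size_t a) {
  return (v + a - 1) & ~(a - 1);
}

// Hypothetical model of a chunk payload: [bottom, bottom + length).
struct ToyChunk {
  std::uintptr_t bottom;   // payload start, assumed 8-byte aligned
  std::size_t length;      // payload length in bytes
  std::uintptr_t top() const { return bottom + length; }
};

// Build a chunk whose requested length is first rounded up to 8 bytes,
// so that top() is 8-byte aligned whenever bottom is.
inline ToyChunk make_chunk_aligned(std::uintptr_t bottom, std::size_t requested) {
  return ToyChunk{bottom, align_up(requested, 8)};
}
```

With `bottom = 0x1000`, a raw length of 1004 puts `top()` on a 4-byte boundary only, while `make_chunk_aligned(0x1000, 1004)` rounds the length to 1008 and `top()` lands on an 8-byte boundary.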
In order to do this we need to make sure the chunk length given when creating a chunk is 64-bit aligned. There are three places to fix: - the constants for the cached chunk sizes - if a caller specifies a custom chunk size as initial size when creating an arena - when the chunk grows In all three cases we align now to 64-bit. This works seamlessly. Tested this new version on x64 and x86 Ubuntu, works. I tested it also with my new gtests (see https://github.com/openjdk/jdk/pull/4898), no problems found. ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From kbarrett at openjdk.java.net Sat Jul 24 20:54:01 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 24 Jul 2021 20:54:01 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v3] In-Reply-To: References: <9ZgL__5Vp9HlPdRffcb849EQUTEkodSdi_9YR4VACi4=.fb97ddba-6caf-4f84-b3f7-c795a499248c@github.com> Message-ID: On Sat, 24 Jul 2021 11:26:04 GMT, Thomas Stuefe wrote: >> src/hotspot/share/memory/arena.hpp line 135: >> >>> 133: // on both 32 and 64 bit platforms. Required for atomic jlong operations on 32 bits. >>> 134: void* Amalloc(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) { >>> 135: x = ARENA_ALIGN(x); // note for 32 bits this should align _hwm as well. >> >> The comment here is stale. > >> The comment here is stale. > > Yes, I know. See my initial PR text. > > I left it in since my first fix was rejected for being too complex, so I wanted to limit this patch to the absolute minimum needed. > > I do not think aligning the allocation size - or insisting on an aligned allocation size as the sister method `AmallocWords` does - is useful at all. Removing the comment just leaves the code, which leaves the reader to ask himself why it's in there. Leaving the comment at least states the reason the original author saw. We should either remove the code and the comment or leave both in place. 
The length alignment (either by adjustment or by precondition) is what lets the ARENA_ALIGN of _hwm be 32bit only in Amalloc, and not needed in AmallocWords. Maybe you are suggesting dropping the size alignment and instead unconditionally aligning _hwm as needed for the operation? Maybe that's the next RFE? ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From kbarrett at openjdk.java.net Sat Jul 24 20:54:00 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 24 Jul 2021 20:54:00 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v4] In-Reply-To: References: Message-ID: <_hi64-QlEARE58dnqovdz-N7y-my5UP4-9PY1osR-2s=.24234f54-1fde-4943-865e-7d83dc008f79@github.com> On Sat, 24 Jul 2021 18:04:29 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. >> >> The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit: >> >> >> Arena ar; >> p1 = ar.AmallocWords(); // return 32bit aligned address >> p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. >> >> >> This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . >> >> This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation. But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. >> >> Remaining issues: >> >> - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. 
This is not needed nor sufficient, we could just remove that code. I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. >> - The chunk dimensions are not guaranteed to be 64-bit aligned: >> 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. >> 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. >> 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. >> >> Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. >> >> ----- >> >> Tests: >> >> - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) >> - GHA >> - Tests are scheduled at SAP. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > make sure Chunk payload boundaries are 64-bit aligned This looks much better. But I think you missed a place. 
If AmallocWords is called with a non-64bit-aligned value (it could be only 32bit aligned on a 32bit platform), and it calls grow(), grow will call the chunk allocator with that length, which fails the precondition because it's not BytesPerLong aligned. I think grow() needs to call ARENA_ALIGN on the length on 32bit platforms. That also suggests an additional test. I like the new comment for Chunk::operator new. ------------- Changes requested by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4835 From aw at openjdk.java.net Sun Jul 25 01:01:21 2021 From: aw at openjdk.java.net (Andreas Woess) Date: Sun, 25 Jul 2021 01:01:21 GMT Subject: RFR: 8271140: Fix native frame handling in vframeStream::asJavaVFrame() [v2] In-Reply-To: References: Message-ID: <8pIzHubeG8Uxz534lizR7KFFou-zLxecGWa_lPlWSew=.82bff8e7-b5c9-4737-9da3-759498adb0ed@github.com> > Follow-up to https://github.com/openjdk/jdk/pull/4625 ([JDK-8269592](https://bugs.openjdk.java.net/browse/JDK-8269592)) which added support for native frames to `vframeStreamCommon::asJavaVFrame()`. This change was not correct when `asJavaVFrame()` is called for a native frame that is the last frame on the stack (i.e. the first frame in the stream), in which case there's no `_prev_frame` yet. We don't actually need the extended frame information for native frames, so the fix is to just use the vframeStream's `_frame` and `_reg_map` for native frames. > Extended the test from JDK-8269592 to cover this case. Andreas Woess has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains one commit: 8271140: Fix native frame handling in vframeStream::asJavaVFrame() ------------- Changes: https://git.openjdk.java.net/jdk/pull/4872/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4872&range=01 Stats: 78 lines in 2 files changed: 41 ins; 19 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/4872.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4872/head:pull/4872 PR: https://git.openjdk.java.net/jdk/pull/4872 From aw at openjdk.java.net Sun Jul 25 01:01:22 2021 From: aw at openjdk.java.net (Andreas Woess) Date: Sun, 25 Jul 2021 01:01:22 GMT Subject: RFR: 8271140: Fix native frame handling in vframeStream::asJavaVFrame() [v2] In-Reply-To: References: Message-ID: <8qYTy1fw7Mzmp4YiGuRFD64dd30lSu7fuqIrpAk4uVY=.3d94de68-0b59-4c1f-b783-2c5c8a2c5417@github.com> On Fri, 23 Jul 2021 02:30:53 GMT, Andreas Woess wrote: >> test/hotspot/jtreg/compiler/jvmci/compilerToVM/IterateFramesNative.java line 47: >> >>> 45: * -XX:+DoEscapeAnalysis -XX:-UseCounterDecay >>> 46: * compiler.jvmci.compilerToVM.IterateFramesNative >>> 47: * @run main/othervm -Xbatch -Xcomp -Xbootclasspath/a:. >> >> I think `-Xcomp` implies `-Xbatch`. >> I think you should add `-XX:CompileOnly=jdk.vm.ci.hotspot.CompilerToVM::iterateFrames` since you only really care about that method being compiled for sake of the test. > > Hm, actually, I don't really need to use `-Xcomp` since the test should already trigger compilation anyway. It was just a way to ensure it's really compiled. I'll replace it with an assert in the test that the method is compiled. Reverted to using `-Xcomp` but with `-XX:CompileOnly`. Some configurations did not trigger compilation in the expected iteration; I've probably just got the threshold wrong, but I think the `-Xcomp` solution is less fragile and thus preferable. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4872 From stuefe at openjdk.java.net Sun Jul 25 05:22:03 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sun, 25 Jul 2021 05:22:03 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v5] In-Reply-To: References: Message-ID: > Hi, > > may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. > > The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit: > > > Arena ar; > p1 = ar.AmallocWords(); // return 32bit aligned address > p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. > > > This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . > > This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation. But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. > > Remaining issues: > > - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient, we could just remove that code. I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. > - The chunk dimensions are not guaranteed to be 64-bit aligned: > 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. 
It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. > 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. > 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. > > Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. > > ----- > > Tests: > > - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) > - GHA > - Tests are scheduled at SAP. 
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Fix Arena::grow for misaligned large grow sizes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4835/files - new: https://git.openjdk.java.net/jdk/pull/4835/files/b1324ee9..d3700fd9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4835&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4835&range=03-04 Stats: 22 lines in 2 files changed: 21 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4835.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4835/head:pull/4835 PR: https://git.openjdk.java.net/jdk/pull/4835 From stuefe at openjdk.java.net Sun Jul 25 05:27:13 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sun, 25 Jul 2021 05:27:13 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v4] In-Reply-To: <_hi64-QlEARE58dnqovdz-N7y-my5UP4-9PY1osR-2s=.24234f54-1fde-4943-865e-7d83dc008f79@github.com> References: <_hi64-QlEARE58dnqovdz-N7y-my5UP4-9PY1osR-2s=.24234f54-1fde-4943-865e-7d83dc008f79@github.com> Message-ID: On Sat, 24 Jul 2021 20:50:37 GMT, Kim Barrett wrote: > This looks much better. But I think you missed a place. > > If AmallocWords is called with a non-64bit-aligned value (it could be only 32bit aligned on a 32bit platform), and it calls grow(), grow will call the chunk allocator with that length, which fails the precondition because it's not BytesPerLong aligned. I think grow() needs to call ARENA_ALIGN on the length on 32bit platforms. You are right. I even spelled it out in my last comment, but then did not fix it. I added a test, saw that it fires on 32-bit, then fixed Arena::grow(). I also limited all new tests to 32-bit to save some cycles on 64-bit. One remaining worry I have is that when mixing Amalloc and AmallocWords now we align correctly, that is fine. But if the Arena cannot handle gaps - e.g. 
HandleArea - difficult to analyze crashes can happen on 32-bit. The original cause would be that the author mixes Amalloc and AmallocWords for an Arena where he should stick to one alignment only. My first patch filled alignment gaps in debug with a pattern, to trip off the analyzer. What do you think, should I do this here too? ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From kbarrett at openjdk.java.net Sun Jul 25 10:33:08 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sun, 25 Jul 2021 10:33:08 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v5] In-Reply-To: References: Message-ID: On Sun, 25 Jul 2021 05:22:03 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. >> >> The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit: >> >> >> Arena ar; >> p1 = ar.AmallocWords(); // return 32bit aligned address >> p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. >> >> >> This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . >> >> This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation. But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. >> >> Remaining issues: >> >> - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient, we could just remove that code. 
I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. >> - The chunk dimensions are not guaranteed to be 64-bit aligned: >> 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. >> 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. >> 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. >> >> Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. >> >> ----- >> >> Tests: >> >> - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) >> - GHA >> - Tests are scheduled at SAP. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Fix Arena::grow for misaligned large grow sizes Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4835 From stuefe at openjdk.java.net Sun Jul 25 11:42:11 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sun, 25 Jul 2021 11:42:11 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v3] In-Reply-To: References: Message-ID: <0LzqmH0tRLVtFjvYlQ1E3eQoCHqTMokvpm4-aFCHYz0=.08dae1f4-84f7-41fb-957d-116a43bbcf9c@github.com> On Sat, 24 Jul 2021 13:48:25 GMT, Kim Barrett wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> kbarret feedback > > Sorry about the typo. I meant to ask, how can `_max` not be 64bit aligned. As that is what the comment is claiming can happen, and that is where things can go wrong. If grow() were to align the allocation size then I don't think that can happen and all would be good. Thanks, @kimbarrett ! @coleenp are you ok with the latest version too? ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From aw at openjdk.java.net Sun Jul 25 16:22:09 2021 From: aw at openjdk.java.net (Andreas Woess) Date: Sun, 25 Jul 2021 16:22:09 GMT Subject: RFR: 8271140: Fix native frame handling in vframeStream::asJavaVFrame() [v2] In-Reply-To: References: Message-ID: <9_5EIKPwGb9IlzdIPLF53kwZlIQVZNdvGiFdYDmNO60=.5579610b-bd8b-4c03-bc8c-5b9c8c0e532d@github.com> On Fri, 23 Jul 2021 16:10:06 GMT, Vladimir Kozlov wrote: >> Andreas Woess has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: >> >> 8271140: Fix native frame handling in vframeStream::asJavaVFrame() > > Testing show failures. Please, investigate. @vnkozlov Tests are all green now. I've only changed the test slightly to reliably trigger compilation (using `-Xcomp` rather than compilation threshold). 
------------- PR: https://git.openjdk.java.net/jdk/pull/4872 From njian at openjdk.java.net Mon Jul 26 04:56:38 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 26 Jul 2021 04:56:38 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v2] In-Reply-To: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: > This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: 8267356: AArch64: Vector API SVE codegen support This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: 1. Code generation for Vector API c2 IR nodes with SVE. 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. 3. Some more SVE assemblers (and tests) used by the codegen part. 
Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. ------------- Changes: https://git.openjdk.java.net/jdk/pull/4122/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4122&range=01 Stats: 5676 lines in 13 files changed: 4524 ins; 188 del; 964 mod Patch: https://git.openjdk.java.net/jdk/pull/4122.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4122/head:pull/4122 PR: https://git.openjdk.java.net/jdk/pull/4122 From dholmes at openjdk.java.net Mon Jul 26 06:00:07 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 26 Jul 2021 06:00:07 GMT Subject: RFR: 8264543: Cross modify fence optimization for x86 [v3] In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 17:33:37 GMT, Xubo Zhang wrote: >> I was compiling using --with-toolchain-version=2019 >> _serialize() is defined here https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=seri&expand=6062 > > Also ran specjvm2008 successfully You are linking to Intel documentation of the Intel C++ compiler. This `_serialize()` intrinsic needs to be available in Microsoft Visual Studio. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4848 From dholmes at openjdk.java.net Mon Jul 26 06:00:07 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 26 Jul 2021 06:00:07 GMT Subject: RFR: 8264543: Cross modify fence optimization for x86 [v3] In-Reply-To: <9XBsUYvWe8qbOz-4xQH3sKDE_QtzPcJ6zlJJkF6r9co=.59658cbc-0e5f-4f76-90d2-3a59b9384ff5@github.com> References: <9XBsUYvWe8qbOz-4xQH3sKDE_QtzPcJ6zlJJkF6r9co=.59658cbc-0e5f-4f76-90d2-3a59b9384ff5@github.com> Message-ID: <0HVfX-mQmTK0pqke8Guna_tATfqMNFKVd0-sSlxCqdk=.2d361c2f-bafe-495a-97b7-d8bccc45d983@github.com> On Sat, 24 Jul 2021 05:18:29 GMT, Xubo Zhang wrote: >> Intel introduced a new instruction "serialize" which ensures that all modifications to flags, registers, and memory by previous instructions are completed and all buffered writes are drained to memory before the next instruction is fetched and executed. It is a serializing instruction and can be used to implement cross modify fence (OrderAccess::cross_modify_fence_impl) more efficiently than using "cpuid" on supported 32-bit and 64-bit x86 platforms. >> >> The availability of the SERIALIZE instruction is indicated by the presence of the CPUID feature flag SERIALIZE, bit 14 of the EDX register in sub-leaf CPUID:7H.0H. >> >> https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html > > Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: > > add support for bsd Changes requested by dholmes (Reviewer).
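As a small illustration of the feature check described in the quoted text above: the SERIALIZE flag is bit 14 of EDX as returned by CPUID leaf 7, sub-leaf 0. The helper name below is hypothetical, and the sketch takes the EDX value as a parameter instead of executing CPUID itself, so it shows only the bit test and not how HotSpot obtains the value.

```cpp
#include <cstdint>

// Bit position of the SERIALIZE feature flag in EDX of CPUID.(EAX=7H, ECX=0H).
constexpr unsigned kSerializeBit = 14;

// Hypothetical helper: true if the given EDX value advertises SERIALIZE.
constexpr bool supports_serialize(uint32_t cpuid7_edx) {
  return ((cpuid7_edx >> kSerializeBit) & 1u) != 0;
}
```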
------------- PR: https://git.openjdk.java.net/jdk/pull/4848 From dholmes at openjdk.java.net Mon Jul 26 06:42:08 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 26 Jul 2021 06:42:08 GMT Subject: RFR: 8264543: Cross modify fence optimization for x86 [v3] In-Reply-To: References: Message-ID: On Mon, 26 Jul 2021 05:56:51 GMT, David Holmes wrote: >> Also ran specjvm2008 successfully > > You are linking to Intel documentation of the Intel C++ compiler. This `_serialize()` intrinsic needs to be available in Microsoft Visual Studio. I ran it through our build system and it builds fine but I'd be happier if I could find some documentation that shows the availability of `_serialize()` in Visual Studio. ------------- PR: https://git.openjdk.java.net/jdk/pull/4848 From aph at openjdk.java.net Mon Jul 26 08:19:16 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 26 Jul 2021 08:19:16 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v2] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Mon, 26 Jul 2021 04:56:38 GMT, Ningsheng Jian wrote: >> This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: >> >> 1. Code generation for Vector API c2 IR nodes with SVE. >> 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. >> 3. Some more SVE assemblers (and tests) used by the codegen part. >> >> Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask >> >> >> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. 
> > Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8267356: AArch64: Vector API SVE codegen support > > This is the integration of current SVE work done in > panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on > 256-bit SVE environment could also generate optimized SVE > instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further > improvement to map mask to predicate register is under development at > https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware > with MaxVectorSize=16/32/64. src/hotspot/cpu/aarch64/assembler_aarch64.cpp line 73: > 71: case 8: > 72: return D; > 73: default: Please use an array here. ------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From aph at openjdk.java.net Mon Jul 26 08:23:14 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 26 Jul 2021 08:23:14 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v2] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Mon, 26 Jul 2021 04:56:38 GMT, Ningsheng Jian wrote: >> This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: >> >> 1. Code generation for Vector API c2 IR nodes with SVE. >> 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. >> 3. Some more SVE assemblers (and tests) used by the codegen part. 
>> >> Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask >> >> >> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. > > Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8267356: AArch64: Vector API SVE codegen support > > This is the integration of current SVE work done in > panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on > 256-bit SVE environment could also generate optimized SVE > instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further > improvement to map mask to predicate register is under development at > https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware > with MaxVectorSize=16/32/64. src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3143: > 3141: #undef INSN > 3142: > 3143: // SVE load gather, store scatter (scalar plus vector) - 32-bit scaled offset In the Arm ARM Supplement for SVE, this group is called "Gather/scatter load/store/prefetch (SVE)". 
------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From aph at openjdk.java.net Mon Jul 26 08:34:14 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 26 Jul 2021 08:34:14 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v2] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Mon, 26 Jul 2021 04:56:38 GMT, Ningsheng Jian wrote: >> This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: >> >> 1. Code generation for Vector API c2 IR nodes with SVE. >> 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. >> 3. Some more SVE assemblers (and tests) used by the codegen part. >> >> Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask >> >> >> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. > > Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8267356: AArch64: Vector API SVE codegen support > > This is the integration of current SVE work done in > panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on > 256-bit SVE environment could also generate optimized SVE > instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. 
> > Note: VectorMask is still represented in vector register, a further > improvement to map mask to predicate register is under development at > https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware > with MaxVectorSize=16/32/64. Basically looks good. One small thing: please check all comments in the Assembler to make sure they match the named instruction groups in DDI0584B_a_SVE/SVE_xml/xhtml/encodingindex.html. I know that Arm aren't consistent in their docs either, but we don't need to add to the confusion. ------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From aph at openjdk.java.net Mon Jul 26 08:38:41 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 26 Jul 2021 08:38:41 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v2] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Mon, 26 Jul 2021 04:56:38 GMT, Ningsheng Jian wrote: >> This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: >> >> 1. Code generation for Vector API c2 IR nodes with SVE. >> 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. >> 3. Some more SVE assemblers (and tests) used by the codegen part. >> >> Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask >> >> >> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. > > Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains one commit: > > 8267356: AArch64: Vector API SVE codegen support > > This is the integration of current SVE work done in > panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on > 256-bit SVE environment could also generate optimized SVE > instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further > improvement to map mask to predicate register is under development at > https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware > with MaxVectorSize=16/32/64. src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 953: > 951: if (bt == T_FLOAT || bt == T_DOUBLE) { > 952: switch (cond) { > 953: case BoolTest::eq: sve_fcmeq(pd, size, pg, zn, zm); break; Wouldn't this work better by making cond and size arguments to sve_fcm ? ------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From njian at openjdk.java.net Mon Jul 26 09:23:19 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 26 Jul 2021 09:23:19 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v2] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Mon, 26 Jul 2021 04:56:38 GMT, Ningsheng Jian wrote: >> This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: >> >> 1. Code generation for Vector API c2 IR nodes with SVE. >> 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. >> 3. Some more SVE assemblers (and tests) used by the codegen part. 
>> >> Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask >> >> >> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. > > Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8267356: AArch64: Vector API SVE codegen support > > This is the integration of current SVE work done in > panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on > 256-bit SVE environment could also generate optimized SVE > instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further > improvement to map mask to predicate register is under development at > https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware > with MaxVectorSize=16/32/64. > Basically looks good. > One small thing: please check all comments in the Assembler to make sure they match the named instruction groups in DDI0584B_a_SVE/SVE_xml/xhtml/encodingindex.html. I know that Arm aren't consistent in their docs either, but we don't need to add to the confusion. Thank you for the review, Andrew! I will revisit all the assembler comments to make them clear.
------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From njian at openjdk.java.net Mon Jul 26 09:23:20 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 26 Jul 2021 09:23:20 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v2] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Mon, 26 Jul 2021 08:34:42 GMT, Andrew Haley wrote: >> Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> 8267356: AArch64: Vector API SVE codegen support >> >> This is the integration of current SVE work done in >> panama-vector/vectorIntrinscs, which includes: >> >> 1. Code generation for Vector API c2 IR nodes with SVE. >> 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on >> 256-bit SVE environment could also generate optimized SVE >> instructions with predicate feature. >> 3. Some more SVE assemblers (and tests) used by the codegen part. >> >> Note: VectorMask is still represented in vector register, a further >> improvement to map mask to predicate register is under development at >> https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask >> >> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware >> with MaxVectorSize=16/32/64. > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 953: > >> 951: if (bt == T_FLOAT || bt == T_DOUBLE) { >> 952: switch (cond) { >> 953: case BoolTest::eq: sve_fcmeq(pd, size, pg, zn, zm); break; > > Wouldn't this work better by making cond and size arguments to sve_fcm ? Thanks! But we still need to map the c2-specific condition value in BoolTest to the instruction condition encoding. So I think making cond an argument to fcm doesn't help here. (This also aligns with the current NEON implementation.) What do you think?
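To make the mapping under discussion concrete, here is a minimal sketch of a switch that converts a compiler-level comparison kind into an instruction condition. The enums `Test` and `Cond` are simplified, hypothetical stand-ins for `BoolTest` conditions and `Assembler::Condition`; they are not the HotSpot definitions, and the sketch says nothing about how the SVE compare emitters would consume the result.

```cpp
#include <cassert>

// Simplified stand-ins; not the real HotSpot enums.
enum class Test { eq, ne, lt, le, gt, ge };
enum class Cond { EQ, NE, LT, LE, GT, GE };

// Convert a compiler-level comparison into an instruction condition.
inline Cond to_condition(Test t) {
  switch (t) {
    case Test::eq: return Cond::EQ;
    case Test::ne: return Cond::NE;
    case Test::lt: return Cond::LT;
    case Test::le: return Cond::LE;
    case Test::gt: return Cond::GT;
    case Test::ge: return Cond::GE;
  }
  // Only reachable if an out-of-range value was cast to Test.
  assert(false && "unexpected comparison kind");
  return Cond::EQ;
}
```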
------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From aph at openjdk.java.net Mon Jul 26 10:51:07 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 26 Jul 2021 10:51:07 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v2] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Mon, 26 Jul 2021 09:16:34 GMT, Ningsheng Jian wrote: > we still need to map c2 specific condition value in BoolTest to instruction condition encoding You'd still need a way to do that mapping, but it'd be a simple switch in a function that does the conversion to Assembler::Condition. Then you'd just have e.g. sve_cmp(GE, xword ...) and most of this code would disappear. You wouldn't need all the Assembler functions, either. ------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From dholmes at openjdk.java.net Mon Jul 26 11:21:10 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 26 Jul 2021 11:21:10 GMT Subject: RFR: 8264543: Cross modify fence optimization for x86 [v3] In-Reply-To: References: Message-ID: On Mon, 26 Jul 2021 06:38:57 GMT, David Holmes wrote: >> You are linking to Intel documentation of the Intel C++ compiler. This `_serialize()` intrinsic needs to be available in Microsoft Visual Studio. > > I ran it through our build system and it builds fine but I'd be happier if I could find some documentation that shows the availability of `_serialize()` in Visual Studio. Update: it builds fine with VS 2019, but not VS 2017. I don't know if we still expect to be able to build with VS 2017. If we do then the new code will need a compiler version guard on it. 
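A compiler version guard of the kind mentioned above might look roughly like the sketch below. The macro name, the enum and `choose_fence` are hypothetical. The threshold 1920 is the `_MSC_VER` of the first Visual Studio 2019 release; which exact toolset first provides `_serialize()` is not established in this thread, so treat that number as an assumption. Only the MSVC case is modelled; other toolchains would need their own check.

```cpp
// Build-time check (assumed threshold: any VS 2019 toolset, _MSC_VER >= 1920).
#if defined(_MSC_VER) && _MSC_VER >= 1920
  #define HAVE_SERIALIZE_INTRINSIC 1
#else
  #define HAVE_SERIALIZE_INTRINSIC 0
#endif

enum class FenceKind { Serialize, Cpuid };

// The fast path needs both compiler support (build time) and CPU support
// (run time); otherwise fall back to the cpuid-based fence.
constexpr FenceKind choose_fence(bool cpu_has_serialize) {
  return (HAVE_SERIALIZE_INTRINSIC && cpu_has_serialize) ? FenceKind::Serialize
                                                         : FenceKind::Cpuid;
}
```

Keeping the build-time and run-time conditions separate means an older compiler still produces a working binary that simply never takes the `_serialize()` path.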
------------- PR: https://git.openjdk.java.net/jdk/pull/4848 From stuefe at openjdk.java.net Mon Jul 26 11:22:07 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 26 Jul 2021 11:22:07 GMT Subject: Withdrawn: JDK-8271242: add arena regression tests In-Reply-To: References: Message-ID: On Sat, 24 Jul 2021 07:50:33 GMT, Thomas Stuefe wrote: > Hi, > > May I have reviews for these new regression tests for hotspot arenas, please? Arena coding is fragile, and recently we have changed it and plan to change it some more. But we have no tests, so regression tests should exist. > > This is a breakout from https://github.com/openjdk/jdk/pull/4784, in which I rewrote arena alignment handling and added a lot of tests. The arena changes were rejected as too complex, which is okay, but most of the tests are still good and I hope not controversial. Therefore I would like to add the tests to the gtest suite. > > This patch adds tests, but does *not* change the Arenas themselves. Instead it tests current behavior, taking it as a baseline. Where the current behavior is inconsistent (e.g. when dealing with 0-sized allocations), it nevertheless tests the inconsistent behavior, since callers may rely on it. > > This patch also adds a new jtreg gtest wrapper which tests the arena-related part of the tests with +UseMallocOnly. As long as we support that mode, we should test it. > > --- > > Tests: > - manual on Linux x64 and x86 both, with and without UseMallocOnly. > - GHAs > - nightlies at SAP are scheduled. > > ---- > > Note: this needs JDK-8270308, which is not yet pushed but under review here: https://github.com/openjdk/jdk/pull/4835. This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.java.net/jdk/pull/4898 From stuefe at openjdk.java.net Mon Jul 26 12:15:55 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 26 Jul 2021 12:15:55 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable Message-ID: Short: this patch makes NMT available in custom-launcher scenarios and during gtests. It simplifies NMT initialization. It adds a lot of NMT-specific testing. --------- NMT continues to be an extremely useful tool for SAP to tackle memory problems in the JVM. However, NMT is of limited use due to the following restrictions: - NMT cannot be used if the hotspot is embedded into a custom launcher unless the launcher actively cooperates. Just creating and invoking the JVM is not enough, it needs to do some steps prior to loading the hotspot. This limitation is not well known (nor, do I believe, documented). Many products don't do this, e.g., you cannot use NMT with IntelliJ. For us at SAP this problem limits NMT usefulness greatly since our VMs are often embedded into custom launchers and modifying every launcher is impossible. - Worse, if that custom launcher links the libjvm *statically* there is just no way to activate NMT at all. This is the reason NMT cannot be used in the `gtestlauncher`. - Related to that is that we cannot pass NMT options via `JAVA_TOOL_OPTIONS` and `-XX:Flags=`. - The fact that NMT cannot be used in gtests is really a pity since it would allow us to both test NMT itself more rigorously and check for memory leaks while testing other stuff. The reason for all this is that NMT initialization happens very early, on the first call to `os::malloc()`. And those calls happen already during dynamic C++ initialization - a long time before the VM gets around parsing arguments. So, regular VM argument parsing is too late to parse NMT arguments. The current solution is to pass NMT arguments via a specially prepared environment variable: `NMT_LEVEL_=`. 
That environment variable has to be set by the embedding launcher, before it loads the libjvm. Since its name contains the PID, we cannot even set that variable in the shell before starting the launcher. All that means that every launcher needs to specially parse and process the NMT arguments given at the command line (or via whatever method) and prepare the environment variable. `java` itself does this. This only works before the libjvm.so is loaded, before its dynamic C++ initialization. For that reason, it does not work if the launcher links statically against the hotspot, since in that case C++ initialization of the launcher and hotspot are folded into one phase with no possibility of executing code beforehand. And since it bypasses argument handling in the VM, it bypasses a number of ways to pass arguments, e.g., `JAVA_TOOL_OPTIONS`. ------ This patch fixes these shortcomings by making NMT late-initializable: it can now be initialized after normal VM argument parsing, like all other parts of the VM. This greatly simplifies NMT initialization and makes it work automagically for every third-party launcher, as well as within our gtests. The glaring problem with late-initializing NMT is the NMT malloc headers. If we rule out just always having them (unacceptable in terms of memory overhead), there is no safe way to determine, in os::free(), if an allocation came from before or after NMT initialization ran, and therefore what to do with its malloc headers. For a more extensive explanation, please see the comment block in `nmtPreInit.hpp` and the discussion with @kimbarrett and @zhengyu123 in the JBS comment section. The heart of this patch is a new way to track early, pre-NMT-init allocations. These are tracked via a lookup table. This was a suggestion by Kim and it worked out well. Changes in detail: - pre-NMT-init handling: - the new files `nmtPreInit.hpp/cpp` take care of NMT pre-init handling.
They contain a small global lookup table managing C-heap blocks allocated in the pre-NMT-init phase. - `os::malloc()/os::realloc()/os::free()` defer to this code before doing anything else. - Please see the extensive comment block at the start of `nmtPreInit.hpp` explaining the details. - Changes to NMT: - Before, NMT initialization was spread over two phases, `initialize()` and `late_initialize()`. Those were merged into one and simplified - there is only one initialization now which happens after argument parsing. - Minor changes were needed for the `NMT_TrackingLevel` enum - to simplify code, I changed NMT_unknown to be numerically 0. A new comment block in `nmtCommon.hpp` now clearly specifies what's what, including allowed level state transitions. - New utility functions to translate tracking level from/to strings added to `NMTUtil` - NMT has never been able to handle virtual memory allocations before initialization, which is fine since os::reserve_memory() is not called before the VM parses arguments. We now assert that. - All code outside the VM handling NMT initialization (e.g., libjli) has been removed, as has the code testing it. - Gtests: - Some existing gtests had to be modified: before, they all changed global state (turning NMT on/off) before testing. This is not allowed anymore, to keep NMT simple. Also, this pattern disturbed other tests. - The new way to test is to passively check whether NMT has been switched on or off, and do tests accordingly: if on, full tests, if off, test just what makes sense in off-state. That does not disturb neighboring tests and actually gives us better coverage all around. - It is now possible to start the gtestlauncher with NMT on! Which additionally gives us good coverage. - To actually do gtests with NMT - since it's disabled by default - we now run NMT-enabled gtests as part of the hotspot jtreg NMT wrapper. We have used this pattern for a number of other facilities, see all the tests in test/hotspot/jtreg/gtest.. .
It works very well. - Finally, a new gtest has been written to test the NMT preinit lookup map in isolation, placed in `gtest/nmt/test_nmtpreinitmap.cpp`. - jtreg: - A new test has been added, `runtime/NMT/NMTInitializationTest.java`, testing NMT initialization in the face of many many VM arguments. ------------- Tests: - ran manually all new tests on 64-bit and 32-bit Linux - GHAs - The patch has been active in SAPs test systems for a while now. ------------- Commit messages: - NMT late init, hashmap variant Changes: https://git.openjdk.java.net/jdk/pull/4874/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4874&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256844 Stats: 1616 lines in 22 files changed: 1227 ins; 346 del; 43 mod Patch: https://git.openjdk.java.net/jdk/pull/4874.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4874/head:pull/4874 PR: https://git.openjdk.java.net/jdk/pull/4874 From coleenp at openjdk.java.net Mon Jul 26 13:36:08 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 26 Jul 2021 13:36:08 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v5] In-Reply-To: References: Message-ID: On Sun, 25 Jul 2021 05:22:03 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. >> >> The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit: >> >> >> Arena ar; >> p1 = ar.AmallocWords(); // return 32bit aligned address >> p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. >> >> >> This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . 
>> >> This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation. But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. >> >> Remaining issues: >> >> - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient, we could just remove that code. I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. >> - The chunk dimensions are not guaranteed to be 64-bit aligned: >> 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. >> 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. >> 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. >> >> Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. >> >> ----- >> >> Tests: >> >> - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) >> - GHA >> - Tests are scheduled at SAP. 
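The fix described in the quoted text (align `_hwm` before allocating, then guard against stepping past an unaligned `_max`) can be sketched roughly as follows. This is only an illustration under stated assumptions: `ToyArena`, `align_up` and `alloc` are hypothetical names, not HotSpot's `Arena` API, and where a real arena would grow into a new chunk this sketch simply returns `nullptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy arena for illustration; not HotSpot's Arena class.
struct ToyArena {
  char* hwm;  // high-water mark: next free byte in the current chunk
  char* max;  // one past the last usable byte; not necessarily aligned

  // Round p up to the next multiple of alignment (must be a power of two).
  static char* align_up(char* p, size_t alignment) {
    const uintptr_t mask = static_cast<uintptr_t>(alignment) - 1;
    return reinterpret_cast<char*>((reinterpret_cast<uintptr_t>(p) + mask) & ~mask);
  }

  // Bump allocation that aligns the returned address, not just the size.
  // Returns nullptr where a real arena would grow into a new chunk.
  void* alloc(size_t size, size_t alignment) {
    char* p = align_up(hwm, alignment);
    // If max is unaligned, aligning hwm can step past it, so check p first.
    if (p > max || size > static_cast<size_t>(max - p)) {
      return nullptr;
    }
    hwm = p + size;
    return p;
  }
};
```

With a 4-byte word, `alloc(4, 4)` followed by `alloc(8, 8)` mirrors the `AmallocWords` then `Amalloc` sequence from the description: the second call skips the 4-byte gap instead of returning a misaligned address.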
> > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Fix Arena::grow for misaligned large grow sizes These are small comments but maybe important. src/hotspot/share/memory/arena.cpp line 185: > 183: // | |p | | > 184: // +-----------+--+--------------------------------------------+ > 185: // A B C D Hooray for ascii art! src/hotspot/share/memory/arena.cpp line 197: > 195: assert(is_aligned(length, BytesPerLong), "chunk payload length not 64-bit aligned: " > 196: SIZE_FORMAT ".", length); > 197: size_t bytes = ARENA_ALIGN(sizeofChunk) + length; Are Chunk::size, medium_size, init_size and tiny size all 64 bit aligned? I see that they're aligned here again but it seems like those sizes being invariantly aligned to 64 bit would prevent future bugs (if they're not already). Also maybe needing a static_assert ? src/hotspot/share/memory/arena.cpp line 208: > 206: vm_exit_out_of_memory(bytes, OOM_MALLOC_ERROR, "Chunk::new"); > 207: } > 208: assert(is_aligned(p, BytesPerLong), "Chunk start address not malloc aligned?"); Should this be ARENA_AMALLOC_ALIGNMENT, in case we have to change it back to 128 bits for some reason? src/hotspot/share/memory/arena.hpp line 55: > 53: // default sizes; make them slightly smaller than 2**k to guard against > 54: // buddy-system style malloc implementations > 55: // Note: please keep these constants 64-bit aligned. Can you add a static_assert(is_aligned(slack, ARENA_ALIGNMENT)? test/hotspot/gtest/memory/test_arena.cpp line 2: > 1: /* > 2: * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved. Did you want an SAP copyright? test/hotspot/gtest/memory/test_arena.cpp line 30: > 28: #ifndef LP64 > 29: // These tests below are about alignment issues when mixing Amalloc and AmallocWords. > 30: // Since on 64-bit these APIs offer the same alignment, they only matter for 32-bit. Thank you so much for testing this on 32 bit platforms. 
test/hotspot/gtest/memory/test_arena.cpp line 39: > 37: void* p2 = ar.Amalloc(BytesPerLong); > 38: ASSERT_TRUE(is_aligned(p1, BytesPerWord)); > 39: ASSERT_TRUE(is_aligned(p2, BytesPerLong)); Should BytesPerLong in this test be ARENA_AMALLOC_ALIGNMENT? ------------- Changes requested by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4835 From coleenp at openjdk.java.net Mon Jul 26 13:36:09 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 26 Jul 2021 13:36:09 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v3] In-Reply-To: References: <9ZgL__5Vp9HlPdRffcb849EQUTEkodSdi_9YR4VACi4=.fb97ddba-6caf-4f84-b3f7-c795a499248c@github.com> Message-ID: On Sat, 24 Jul 2021 20:49:38 GMT, Kim Barrett wrote: >>> The comment here is stale. >> >> Yes, I know. See my initial PR text. >> >> I left it in since my first fix was rejected for being too complex, so I wanted to limit this patch to the absolute minimum needed. >> >> I do not think aligning the allocation size - or insisting on an aligned allocation size as the sister method `AmallocWords` does - is useful at all. Removing the comment just leaves the code, which leaves the reader to ask himself why it's in there. Leaving the comment at least states the reason the original author saw. We should either remove the code and the comment or leave both in place. > > The length alignment (either by adjustment or by precondition) is what lets the ARENA_ALIGN of _hwm be 32bit only in Amalloc, and not needed in AmallocWords. Maybe you are suggesting dropping the size alignment and instead unconditionally aligning _hwm as needed for the operation? Maybe that's the next RFE? Oh please no more RFEs unless Kim can review them. This turned out much more difficult than we thought it would be! This comment is true though, right? For 32 bits if you do an AmallocWords then Amalloc, the _hwm will then be aligned to ARENA_ALIGN, ie 64 bits. That's ok. 
That seems like just what we want. Although it seems like this latter comment says what I think this does better. I'm fine with leaving the comment though. Makes you think about it anyway. ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From coleenp at openjdk.java.net Mon Jul 26 13:42:03 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 26 Jul 2021 13:42:03 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v4] In-Reply-To: References: <_hi64-QlEARE58dnqovdz-N7y-my5UP4-9PY1osR-2s=.24234f54-1fde-4943-865e-7d83dc008f79@github.com> Message-ID: On Sun, 25 Jul 2021 05:23:41 GMT, Thomas Stuefe wrote: > My first patch filled alignment gaps in debug with a pattern, to trip off the analyzer. What do you think, should I do this here too? Maybe. Aren't arena's freed with a zap pattern so you'd find bugs that way? Anyway, it might be better as a separate RFE and new round of testing. Thanks for default aligning the chunks though because this is what one would expect without looking at the code more closely. ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From stuefe at openjdk.java.net Mon Jul 26 17:11:37 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 26 Jul 2021 17:11:37 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v5] In-Reply-To: References: Message-ID: <1JQCK7W4Vs-pcb2qMsQSrOAwKQsQmuxGGwhpxZ-8x8g=.c1ec15c0-4231-4100-ab13-1137117b048d@github.com> On Mon, 26 Jul 2021 13:12:26 GMT, Coleen Phillimore wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Arena::grow for misaligned large grow sizes > > src/hotspot/share/memory/arena.hpp line 55: > >> 53: // default sizes; make them slightly smaller than 2**k to guard against >> 54: // buddy-system style malloc implementations >> 55: // Note: please keep these constants 64-bit aligned. 
> > Can you add a static_assert(is_aligned(slack, ARENA_ALIGNMENT)? Sure! ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From never at openjdk.java.net Mon Jul 26 17:17:53 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Mon, 26 Jul 2021 17:17:53 GMT Subject: RFR: 8271140: Fix native frame handling in vframeStream::asJavaVFrame() [v2] In-Reply-To: <8pIzHubeG8Uxz534lizR7KFFou-zLxecGWa_lPlWSew=.82bff8e7-b5c9-4737-9da3-759498adb0ed@github.com> References: <8pIzHubeG8Uxz534lizR7KFFou-zLxecGWa_lPlWSew=.82bff8e7-b5c9-4737-9da3-759498adb0ed@github.com> Message-ID: On Sun, 25 Jul 2021 01:01:21 GMT, Andreas Woess wrote: >> Follow-up to https://github.com/openjdk/jdk/pull/4625 ([JDK-8269592](https://bugs.openjdk.java.net/browse/JDK-8269592)) which added support for native frames to `vframeStreamCommon::asJavaVFrame()`. This change was not correct when `asJavaVFrame()` is called for a native frame that is the last frame on the stack (i.e. the first frame in the stream), in which case there's no `_prev_frame` yet. We don't actually need the extended frame information for native frames, so the fix is to just use the vframeStream's `_frame` and `_reg_map` for native frames. >> Extended the test from JDK-8269592 to cover this case. > > Andreas Woess has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8271140: Fix native frame handling in vframeStream::asJavaVFrame() Marked as reviewed by never (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4872 From stuefe at openjdk.java.net Mon Jul 26 17:21:39 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 26 Jul 2021 17:21:39 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v5] In-Reply-To: References: Message-ID: On Mon, 26 Jul 2021 13:23:16 GMT, Coleen Phillimore wrote: > Are Chunk::size, medium_size, init_size and tiny size all 64 bit aligned? 
I see that they're aligned here again but it seems like those sizes being invariantly aligned to 64 bit would prevent future bugs (if they're not already). Also maybe needing a static_assert ? They are all aligned now - and you are right, I can add static asserts to assert that. I do not align them here, I just assert that they are aligned (via assert(is_aligned(length))). The only thing I actively align here is the header size aka sizeof(Chunk). I tried to force the compiler to generate Chunk as a 64-bit aligned structure but gave up after fiddling with pragmas for half an hour. > src/hotspot/share/memory/arena.cpp line 208: > >> 206: vm_exit_out_of_memory(bytes, OOM_MALLOC_ERROR, "Chunk::new"); >> 207: } >> 208: assert(is_aligned(p, BytesPerLong), "Chunk start address not malloc aligned?"); > > Should this be ARENA_AMALLOC_ALIGNMENT, in case we have to change it back to 128 bits for some reason? Yes, I can do that too. I should try to set it to 128 bits and test it. With 128 bits we may run into alignment problems on 32-bit platforms, since we rely on ARENA_AMALLOC_ALIGNMENT being <= malloc alignment, and on 32-bit I believe malloc alignment is just 64 bit. Not a showstopper, but may require a bit more care. > test/hotspot/gtest/memory/test_arena.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved. > > Did you want an SAP copyright? I could add one. Probably should, thanks for noticing. > test/hotspot/gtest/memory/test_arena.cpp line 30: > >> 28: #ifndef LP64 >> 29: // These tests below are about alignment issues when mixing Amalloc and AmallocWords. >> 30: // Since on 64-bit these APIs offer the same alignment, they only matter for 32-bit. > > Thank you so much for testing this on 32 bit platforms. Sure. I run Ubuntu 20.4, with Debian multi-arch it's beautifully easy to have 32-bit and 64-bit compilers side by side.
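The static asserts discussed above, checking at compile time that the default chunk sizes stay aligned, might look something like this sketch. All names and values here (`kArenaAlignment`, `kSlack`, `kTinySize` and so on) are hypothetical placeholders chosen for illustration, not the constants from `arena.hpp`.

```cpp
#include <cassert>
#include <cstddef>

// Power-of-two alignment check usable in constant expressions.
constexpr bool is_aligned_to(size_t value, size_t alignment) {
  return (value & (alignment - 1)) == 0;
}

// All names and values below are hypothetical placeholders.
constexpr size_t kArenaAlignment = 8;  // 64-bit alignment, in bytes
constexpr size_t kSlack      = 40;     // keeps sizes slightly below 2**k
constexpr size_t kTinySize   = 256 - kSlack;
constexpr size_t kInitSize   = 1024 - kSlack;
constexpr size_t kMediumSize = 10 * 1024 - kSlack;
constexpr size_t kSize       = 32 * 1024 - kSlack;

// A misaligned constant now breaks the build instead of a 32-bit test run.
static_assert(is_aligned_to(kSlack, kArenaAlignment), "slack not aligned");
static_assert(is_aligned_to(kTinySize, kArenaAlignment), "tiny size not aligned");
static_assert(is_aligned_to(kInitSize, kArenaAlignment), "init size not aligned");
static_assert(is_aligned_to(kMediumSize, kArenaAlignment), "medium size not aligned");
static_assert(is_aligned_to(kSize, kArenaAlignment), "size not aligned");
```

Since every size is derived from the slack value, asserting the sizes also covers the slack indirectly; asserting it directly just gives a clearer error message.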
> test/hotspot/gtest/memory/test_arena.cpp line 39: > >> 37: void* p2 = ar.Amalloc(BytesPerLong); >> 38: ASSERT_TRUE(is_aligned(p1, BytesPerWord)); >> 39: ASSERT_TRUE(is_aligned(p2, BytesPerLong)); > > Should BytesPerLong in this test be ARENA_AMALLOC_ALIGNMENT? Yes, I can do that. ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From stuefe at openjdk.java.net Mon Jul 26 17:27:30 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 26 Jul 2021 17:27:30 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v4] In-Reply-To: References: <_hi64-QlEARE58dnqovdz-N7y-my5UP4-9PY1osR-2s=.24234f54-1fde-4943-865e-7d83dc008f79@github.com> Message-ID: On Mon, 26 Jul 2021 13:39:29 GMT, Coleen Phillimore wrote: > > My first patch filled alignment gaps in debug with a pattern, to trip off the analyzer. What do you think, should I do this here too? > Maybe. Aren't arena's freed with a zap pattern so you'd find bugs that way? Yes, you are right, maybe that's sufficient. I thought that the zap pattern would be overwritten with the first allocation wave (allocate a bunch, reset-to-mark, allocate again) but I see reset-to-mark - ultimately Chunk::chop() - also zaps, so this may be fine. > Anyway, it might be better as a separate RFE and new round of testing. Yes. But the default pattern may be fine already. > Thanks for default aligning the chunks though because this is what one would expect without looking at the code more closely. Sure! ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From stuefe at openjdk.java.net Mon Jul 26 18:13:11 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 26 Jul 2021 18:13:11 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v6] In-Reply-To: References: Message-ID: > Hi, > > may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. 
> > The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit: > > > Arena ar; > p1 = ar.AmallocWords(); // return 32bit aligned address > p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. > > > This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . > > This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation. But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. > > Remaining issues: > > - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient, we could just remove that code. I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. > - The chunk dimensions are not guaranteed to be 64-bit aligned: > 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. > 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. 
Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. > 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. > > Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. > > ----- > > Tests: > > - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) > - GHA > - Tests are scheduled at SAP. Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: feedback coleen ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4835/files - new: https://git.openjdk.java.net/jdk/pull/4835/files/d3700fd9..cef2e4d1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4835&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4835&range=04-05 Stats: 20 lines in 2 files changed: 14 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/4835.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4835/head:pull/4835 PR: https://git.openjdk.java.net/jdk/pull/4835 From stuefe at openjdk.java.net Mon Jul 26 18:13:14 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 26 Jul 2021 18:13:14 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v5] In-Reply-To: References: Message-ID: <5wKtEgmQ-2PPKAMNs5kT_eQ0EXKlT4rndVPyir5ylEo=.3e8984fa-f84b-459b-9ef4-83703b5be922@github.com> On Sun, 25 Jul 2021 05:22:03 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. >> >> The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. 
This sequence will not work on 32-bit: >> >> >> Arena ar; >> p1 = ar.AmallocWords(); // return 32bit aligned address >> p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. >> >> >> This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . >> >> This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation. But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. >> >> Remaining issues: >> >> - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient, we could just remove that code. I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. >> - The chunk dimensions are not guaranteed to be 64-bit aligned: >> 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. >> 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. 
>> 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. >> >> Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. >> >> ----- >> >> Tests: >> >> - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) >> - GHA >> - Tests are scheduled at SAP. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Fix Arena::grow for misaligned large grow sizes New version: - added static asserts to test that the default chunk sizes are now properly aligned (since Chunk::slack is used to compute those sizes, this also tests Chunk::slack) - changed a number of literal checks for 64-bit aligned-ness to ARENA_AMALLOC_ALIGNMENT - added SAP copyright ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From coleenp at openjdk.java.net Mon Jul 26 18:32:34 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 26 Jul 2021 18:32:34 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v6] In-Reply-To: References: Message-ID: On Mon, 26 Jul 2021 18:13:11 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. >> >> The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit: >> >> >> Arena ar; >> p1 = ar.AmallocWords(); // return 32bit aligned address >> p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. 
>> >> >> This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . >> >> This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation. But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. >> >> Remaining issues: >> >> - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient, we could just remove that code. I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. >> - The chunk dimensions are not guaranteed to be 64-bit aligned: >> 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. >> 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. >> 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. 
>> >> Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. >> >> ----- >> >> Tests: >> >> - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) >> - GHA >> - Tests are scheduled at SAP. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > feedback coleen Looks great. Thank you! ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4835 From coleenp at openjdk.java.net Mon Jul 26 18:32:35 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 26 Jul 2021 18:32:35 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v5] In-Reply-To: References: Message-ID: <9kq4TIxJkGqLsnXt0JJv_cilz0RdAz8F_Z79vFloIk8=.6f629fec-0750-4664-85b6-603690a8e471@github.com> On Mon, 26 Jul 2021 17:18:50 GMT, Thomas Stuefe wrote: >> src/hotspot/share/memory/arena.cpp line 208: >> >>> 206: vm_exit_out_of_memory(bytes, OOM_MALLOC_ERROR, "Chunk::new"); >>> 207: } >>> 208: assert(is_aligned(p, BytesPerLong), "Chunk start address not malloc aligned?"); >> >> Should this be ARENA_AMALLOC_ALIGNMENT, in case we have to change it back to 128 bits for some reason? > > Yes, I can do that too. I should try to set it to 128bits and test it. > > With 128 bit we may run into alignment problems with 32-bit platforms, since we rely on ARENA_AMALLOC_ALIGNMENT being <= malloc alignment, and on 32-bit I believe malloc alignment is just 64 bit. Not a showstopper, but may require a bit more care. I was wondering if it would make an interesting test mode. 
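One way to get a feeling for the malloc alignment mentioned above is a small probe like the following sketch. It is an empirical check under stated assumptions, not a guarantee: the helper names `observed_alignment` and `probe_malloc_alignment` are hypothetical, and the result only reflects the allocator and sizes actually sampled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Largest power of two dividing the address, i.e. its lowest set bit.
inline size_t observed_alignment(const void* p) {
  const uintptr_t v = reinterpret_cast<uintptr_t>(p);
  return static_cast<size_t>(v & (~v + 1));
}

// Smallest alignment seen over a few malloc calls of varying size.
// Hypothetical probe: a larger result here does not prove a guarantee.
inline size_t probe_malloc_alignment(int samples) {
  size_t min_seen = ~static_cast<size_t>(0);
  for (int i = 1; i <= samples; i++) {
    void* p = std::malloc(static_cast<size_t>(i) * 24);
    if (p == nullptr) continue;  // skip failed allocations
    const size_t a = observed_alignment(p);
    if (a < min_seen) min_seen = a;
    std::free(p);
  }
  return min_seen;
}
```

Comparing the probe result against the arena alignment one intends to use (64 or 128 bits) on each target platform would show quickly whether the "arena alignment <= malloc alignment" assumption holds there.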
------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From kvn at openjdk.java.net Mon Jul 26 19:42:38 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 26 Jul 2021 19:42:38 GMT Subject: RFR: 8271140: Fix native frame handling in vframeStream::asJavaVFrame() [v2] In-Reply-To: <8pIzHubeG8Uxz534lizR7KFFou-zLxecGWa_lPlWSew=.82bff8e7-b5c9-4737-9da3-759498adb0ed@github.com> References: <8pIzHubeG8Uxz534lizR7KFFou-zLxecGWa_lPlWSew=.82bff8e7-b5c9-4737-9da3-759498adb0ed@github.com> Message-ID: <6pfPMfgFyfgz9HvzlT6KLqzWWZYjWkTKQFm8k8xBmEQ=.d3a06f7a-1075-4eb1-b82f-45a8c8343ff5@github.com> On Sun, 25 Jul 2021 01:01:21 GMT, Andreas Woess wrote: >> Follow-up to https://github.com/openjdk/jdk/pull/4625 ([JDK-8269592](https://bugs.openjdk.java.net/browse/JDK-8269592)) which added support for native frames to `vframeStreamCommon::asJavaVFrame()`. This change was not correct when `asJavaVFrame()` is called for a native frame that is the last frame on the stack (i.e. the first frame in the stream), in which case there's no `_prev_frame` yet. We don't actually need the extended frame information for native frames, so the fix is to just use the vframeStream's `_frame` and `_reg_map` for native frames. >> Extended the test from JDK-8269592 to cover this case. > > Andreas Woess has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8271140: Fix native frame handling in vframeStream::asJavaVFrame() Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4872 From aw at openjdk.java.net Mon Jul 26 19:50:33 2021 From: aw at openjdk.java.net (Andreas Woess) Date: Mon, 26 Jul 2021 19:50:33 GMT Subject: Integrated: 8271140: Fix native frame handling in vframeStream::asJavaVFrame() In-Reply-To: References: Message-ID: On Thu, 22 Jul 2021 13:07:46 GMT, Andreas Woess wrote: > Follow-up to https://github.com/openjdk/jdk/pull/4625 ([JDK-8269592](https://bugs.openjdk.java.net/browse/JDK-8269592)) which added support for native frames to `vframeStreamCommon::asJavaVFrame()`. This change was not correct when `asJavaVFrame()` is called for a native frame that is the last frame on the stack (i.e. the first frame in the stream), in which case there's no `_prev_frame` yet. We don't actually need the extended frame information for native frames, so the fix is to just use the vframeStream's `_frame` and `_reg_map` for native frames. > Extended the test from JDK-8269592 to cover this case. This pull request has now been integrated. Changeset: 3aadae20 Author: Andreas Woess Committer: Tom Rodriguez URL: https://git.openjdk.java.net/jdk/commit/3aadae2077e9bf0a5900af79929b679bc6ec62b2 Stats: 78 lines in 2 files changed: 41 ins; 19 del; 18 mod 8271140: Fix native frame handling in vframeStream::asJavaVFrame() Reviewed-by: dnsimon, kvn, never ------------- PR: https://git.openjdk.java.net/jdk/pull/4872 From kbarrett at openjdk.java.net Mon Jul 26 19:52:59 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 26 Jul 2021 19:52:59 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v6] In-Reply-To: References: Message-ID: On Mon, 26 Jul 2021 18:13:11 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. 
>> >> The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit: >> >> >> Arena ar; >> p1 = ar.AmallocWords(); // return 32bit aligned address >> p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. >> >> >> This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . >> >> This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation. But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. >> >> Remaining issues: >> >> - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient, we could just remove that code. I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. >> - The chunk dimensions are not guaranteed to be 64-bit aligned: >> 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. >> 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. 
Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. >> 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. >> >> Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. >> >> ----- >> >> Tests: >> >> - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) >> - GHA >> - Tests are scheduled at SAP. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > feedback coleen Looks even better after dealing with Coleen's recent comments. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4835 From hboehm at google.com Mon Jul 26 20:02:36 2021 From: hboehm at google.com (Hans Boehm) Date: Mon, 26 Jul 2021 13:02:36 -0700 Subject: JNI WeakGlobalRefs In-Reply-To: <5F4FD9CE-1CC3-47D6-8D4B-4987C2ACE955@oracle.com> References: <5F4FD9CE-1CC3-47D6-8D4B-4987C2ACE955@oracle.com> Message-ID: On Sat, Jul 24, 2021 at 5:23 AM Kim Barrett wrote: > > On Jul 21, 2021, at 9:25 PM, Hans Boehm wrote: > > > > I'm concerned that the current semantics of JNI WeakGlobalRefs are still > > dangerous in a very subtle way that is hidden in the spec. The current > > (14+) spec says: > > I'm responding to various parts of your different emails here, and not > being > linear in the discussion. Hopefully I'm not being too confusing. > > > (I found some discussion from JDK-8220617, including a message I posted. > > Unfortunately, it seems to me that all of us overlooked this issue?) > > I don't think the problems arising from WeakGlobalRef's being able to > access > post-WeakReference clearing or post-finalization were overlooked. 
What you > are saying is "well known", and there are multiple previous discussions. > The > most recent I can find is here: > > https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043919.html > > and in particular, here: > > https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-November/043924.html > > > Based on the current spec, you can get a similar effect to a finalizer by > > enqueuing a WeakReference. In reality, it might be the case that in the > > absence of finalizers, GlobalWeakRefs are cleared when WeakReferences are > > enqueued, But I don't think the spec says that. The implementation is > still > > allowed to enqueue WeakReferences, leave the object around for a while, > and > > clear PhantomReferences and GlobalWeakRefs later, leaving a window during > > which WeakReferences have been enqueued and processed, but GlobalWeakRefs > > will continue to generate strong references to the object. > > It's true that the spec permits the GC to use different "certain point in > time"s for deciding whether an object has reached the various strengths. > But > to my knowledge (and Gil Tene's as well: > > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2016-January/016298.html > and I would trust him more than me on this question), no implementation > actually does that, other than as required to support finalization. There's > no obvious benefit, and doing so likely has undesirable costs > (implementation > complexity, runtime speed/space, pause durations, &etc.). > Thanks for the useful pointers! To me, the problem is not really resurrection, but unintentional resurrection of no-longer-valid objects. But in a Java context, I agree that's a meaningless distinction. > Assuming that, your initial problem description isn't possible: > > > Consider what happens if I have a WeakGlobalRef W that refers to a Java > > object A which, possibly indirectly, relies on an object F, where F is > > finalizable, i.e. 
> > > > W - - -> A -----> ... -----> F > > > > Assume that F becomes invalid once it is finalized, e.g. because the > > finalizer deallocates a native object that F relies on. This seems to be > a > > very common case. We are then exposed to the following scenario: > > > > 0) At some point, there are no longer any other references to A or F. > > 1) F is enqueued for finalization. > > 2) W is dereferenced by Thread 1, yielding a strong reference to A and > > transitively to F. > > 3) F is finalized. > > 4) Thread 1 uses A and F, accessing F, which is no longer valid. > > 5) Crash, or possibly memory corruption followed by a later crash > elsewhere. > > F is only going to become finalizable at the same time that W is cleared. > So > Step 2 can't happen. > Interesting point. This perhaps shouldn't actually happen unless A and F are strongly connected, i.e. A is also reachable from F. (Even if it's not, the implementation needs to preclude that by not clearing W until it has traced objects reachable from F. But I think that's at most a performance issue.) But I'm not sure how one would specify that, or how much it actually helps. At a minimum, it still has to break if A is a client supplied object which is itself finalizable, e.g. because the object's type is a finalizable subclass of what the implementer of W expected. > > There are other ways the strength of WeakGlobalRefs relative to > WeakReference and finalization can get one in trouble though. > > I think the description of the reference strengths and the associated > behaviors ought to be improved. But again, finalization makes that > complicated. My impression is that there's a reluctance to make even > clarifications in this area that have to deal with the complexity induced > by > finalization, when removing finalization would simplify so much. (And yeah, > that's despite the slow rate of progress on removing finalization.) 
> > The deficiencies of resurrection-based cleanup, and the comparative > superiority of proxy-based cleanup, have been known for a long time. (I > have > archived discussions on the topic from the late 1980s and early 1990s, some > including you as a participant.) Finalization is long past its expiration > date. It seems to me that rather than spending time discussing how to > continue to work around its problems, it would be more productive to work > toward removal. Some progress has been made. There's a punch-list for > removing finalizers in the jdk here: > https://bugs.openjdk.java.net/browse/JDK-8253568 > There is also work being done on tooling to help with the process. Things > like checking for usage of finalization, enabling or disabling at various > granularities, measuring and reporting its costs, and so on. >
From my perspective, this looks like a great direction, but we still have a long way to go here. To really solve the problem, and converge on a simple model, it looks to me like we need to:
1) Remove all uses of finalization from the standard library. (In progress.)
2) Remove all uses of finalization from other Java libraries and application code.
3) Revise the spec to require simultaneous clearing of all kinds of Soft/Weak/Phantom References.
IMO, (2) will take a really long time, even with tooling. (AFAICT, one technical obstacle is that the current Cleaner spec provides no easy way to share Cleaners across separately developed libraries, risking thread proliferation, and thus not making Cleaners a consistent win over finalizers today. But the main reason is that finalizers are fairly widely used, and it will take time to touch all that code. I don't see a way to mechanize that transformation.) If I had to speculate on a date for (3), I would guess somewhere in the 2030-2050 range. Is there an argument that I should be more optimistic?
In the meantime, it still seems to me that WeakGlobalRefs are more likely to be used incorrectly than correctly, though you've convinced me that actual unpredictable failures are a bit less likely than I had thought. Hans From dholmes at openjdk.java.net Mon Jul 26 21:03:30 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 26 Jul 2021 21:03:30 GMT Subject: RFR: 8264543: Cross modify fence optimization for x86 [v3] In-Reply-To: References: Message-ID: On Mon, 26 Jul 2021 11:18:13 GMT, David Holmes wrote: >> I ran it through our build system and it builds fine but I'd be happier if I could find some documentation that shows the availability of `_serialize()` in Visual Studio. > > Update: it builds fine with VS 2019, but not VS 2017. I don't know if we still expect to be able to build with VS 2017. If we do then the new code will need a compiler version guard on it. Update: VS 2017 is the minimum supported compiler version so the new code will need to be enabled only for VS 2019. ------------- PR: https://git.openjdk.java.net/jdk/pull/4848 From david.holmes at oracle.com Mon Jul 26 21:06:13 2021 From: david.holmes at oracle.com (David Holmes) Date: Tue, 27 Jul 2021 07:06:13 +1000 Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: Hi Thomas, On 26/07/2021 10:15 pm, Thomas Stuefe wrote: > Short: this patch makes NMT available in custom-launcher scenarios and during gtests. It simplifies NMT initialization. It adds a lot of NMT-specific testing. Before looking at this, have you checked the startup performance impact? Thanks, David ----- > --------- > > NMT continues to be an extremely useful tool for SAP to tackle memory problems in the JVM. > > However, NMT is of limited use due to the following restrictions: > > - NMT cannot be used if the hotspot is embedded into a custom launcher unless the launcher actively cooperates. 
Just creating and invoking the JVM is not enough, it needs to do some steps prior to loading the hotspot. This limitation is not well known (nor, do I believe, documented). Many products don't do this, e.g., you cannot use NMT with IntelliJ. For us at SAP this problem limits NMT usefulness greatly since our VMs are often embedded into custom launchers and modifying every launcher is impossible. > - Worse, if that custom launcher links the libjvm *statically* there is just no way to activate NMT at all. This is the reason NMT cannot be used in the `gtestlauncher`. > - Related to that is that we cannot pass NMT options via `JAVA_TOOL_OPTIONS` and `-XX:Flags=`. > - The fact that NMT cannot be used in gtests is really a pity since it would allow us to both test NMT itself more rigorously and check for memory leaks while testing other stuff. > > The reason for all this is that NMT initialization happens very early, on the first call to `os::malloc()`. And those calls happen already during dynamic C++ initialization - a long time before the VM gets around parsing arguments. So, regular VM argument parsing is too late to parse NMT arguments. > > The current solution is to pass NMT arguments via a specially prepared environment variable: `NMT_LEVEL_=`. That environment variable has to be set by the embedding launcher, before it loads the libjvm. Since its name contains the PID, we cannot even set that variable in the shell before starting the launcher. > > All that means that every launcher needs to especially parse and process the NMT arguments given at the command line (or via whatever method) and prepare the environment variable. `java` itself does this. This only works before the libjvm.so is loaded, before its dynamic C++ initialization. For that reason, it does not work if the launcher links statically against the hotspot, since in that case C++ initialization of the launcher and hotspot are folded into one phase with no possibility of executing code beforehand. 
> > And since it bypasses argument handling in the VM, it bypasses a number of argument-processing mechanisms, e.g., `JAVA_TOOL_OPTIONS`. > > ------ > > This patch fixes these shortcomings by making NMT late-initializable: it can now be initialized after normal VM argument parsing, like all other parts of the VM. This greatly simplifies NMT initialization and makes it work automagically for every third party launcher, as well as within our gtests. > > The glaring problem with late-initializing NMT is the NMT malloc headers. If we rule out just always having them (unacceptable in terms of memory overhead), there is no safe way to determine, in os::free(), if an allocation came from before or after NMT initialization ran, and therefore what to do with its malloc headers. For a more extensive explanation, please see the comment block in `nmtPreInit.hpp` and the discussion with @kimbarrett and @zhengyu123 in the JBS comment section. > > The heart of this patch is a new way to track early, pre-NMT-init allocations. These are tracked via a lookup table. This was a suggestion by Kim and it worked out well. > > Changes in detail: > > - pre-NMT-init handling: > - the new files `nmtPreInit.hpp/cpp` take care of NMT pre-init handling. They contain a small global lookup table managing C-heap blocks allocated in the pre-NMT-init phase. > - `os::malloc()/os::realloc()/os::free()` defer to this code before doing anything else. > - Please see the extensive comment block at the start of `nmtPreInit.hpp` explaining the details. > > - Changes to NMT: > - Before, NMT initialization was spread over two phases, `initialize()` and `late_initialize()`. Those were merged into one and simplified - there is only one initialization now which happens after argument parsing. > - Minor changes were needed for the `NMT_TrackingLevel` enum - to simplify code, I changed NMT_unknown to be numerically 0.
A new comment block in `nmtCommon.hpp` now clearly specifies what's what, including allowed level state transitions. > - New utility functions to translate tracking level from/to strings added to `NMTUtil` > - NMT has never been able to handle virtual memory allocations before initialization, which is fine since os::reserve_memory() is not called before the VM parses arguments. We now assert that. > - All code outside the VM handling NMT initialization (e.g. libjli) has been removed, as has the code testing it. > > - Gtests: > - Some existing gtests had to be modified: before, they all changed global state (turning NMT on/off) before testing. This is not allowed anymore, to keep NMT simple. Also, this pattern disturbed other tests. > - The new way to test is to passively check whether NMT has been switched on or off, and do tests accordingly: if on, full tests, if off, test just what makes sense in off-state. That does not disturb neighboring tests and actually gives us better coverage all around. > - It is now possible to start the gtestlauncher with NMT on! Which additionally gives us good coverage. > - To actually do gtests with NMT - since it's disabled by default - we now run NMT-enabled gtests as part of the hotspot jtreg NMT wrapper. This pattern we have done for a number of other facilities, see all the tests in test/hotspot/jtreg/gtest. It works very well. > - Finally, a new gtest has been written to test the NMT preinit lookup map in isolation, placed in `gtest/nmt/test_nmtpreinitmap.cpp`. > > - jtreg: > - A new test has been added, `runtime/NMT/NMTInitializationTest.java`, testing NMT initialization in the face of many, many VM arguments. > > ------------- > > Tests: > - ran manually all new tests on 64-bit and 32-bit Linux > - GHAs > - The patch has been active in SAP's test systems for a while now.
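The pre-init lookup-table idea described above can be sketched roughly as follows. This is an illustration only, not the code in `nmtPreInit.hpp`: `ToyTracker`, `kHeaderSize` and all method names are hypothetical, and a `std::unordered_set` stands in for the small global lookup table (a real implementation inside `os::malloc()` could not use a container that itself allocates through the tracked path). Blocks allocated before initialization carry no header and are remembered in the table; blocks allocated afterwards carry a size header, and the release path consults the table to decide which layout a pointer has.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Hypothetical header size; chosen so the payload keeps malloc's alignment.
constexpr std::size_t kHeaderSize = alignof(std::max_align_t);

class ToyTracker {
public:
  // Models the single, late initialization step (after argument parsing).
  void initialize() { _initialized = true; }

  void* allocate(std::size_t size) {
    if (!_initialized) {
      // Pre-init: plain block without header, remembered in the lookup table.
      void* p = std::malloc(size);
      if (p != nullptr) _early.insert(p);
      return p;
    }
    // Post-init: prepend a header that records the payload size.
    char* raw = static_cast<char*>(std::malloc(kHeaderSize + size));
    if (raw == nullptr) return nullptr;
    *reinterpret_cast<std::size_t*>(raw) = size;
    _tracked_bytes += size;
    return raw + kHeaderSize;
  }

  void release(void* p) {
    if (p == nullptr) return;
    // The table answers the question "was this block allocated before init?".
    if (_early.erase(p) == 1) {
      std::free(p);  // headerless early block
      return;
    }
    char* raw = static_cast<char*>(p) - kHeaderSize;
    _tracked_bytes -= *reinterpret_cast<std::size_t*>(raw);
    std::free(raw);
  }

  std::size_t tracked_bytes() const { return _tracked_bytes; }
  std::size_t early_blocks() const { return _early.size(); }

private:
  std::unordered_set<void*> _early;  // blocks handed out before initialize()
  std::size_t _tracked_bytes = 0;    // bytes accounted for after initialize()
  bool _initialized = false;
};
```

The design point this is meant to show is that the release path never has to guess whether a header is present: membership in the table decides it.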
> > ------------- > > Commit messages: > - NMT late init, hashmap variant > > Changes: https://git.openjdk.java.net/jdk/pull/4874/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4874&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8256844 > Stats: 1616 lines in 22 files changed: 1227 ins; 346 del; 43 mod > Patch: https://git.openjdk.java.net/jdk/pull/4874.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/4874/head:pull/4874 > > PR: https://git.openjdk.java.net/jdk/pull/4874 > From valeriep at openjdk.java.net Mon Jul 26 23:33:33 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Mon, 26 Jul 2021 23:33:33 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Thu, 22 Jul 2021 18:30:50 GMT, Anthony Scarpino wrote: >> src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 762: >> >>> 760: >>> 761: dst.put(out, 0, rlen); >>> 762: processed += srcLen; >> >> It seems that callers of this implGCMCrypt() method such as GCMEngine.doLastBlock() add the returned value to the "processed" field, which looks like double counting? However, some callers such as GCMEncrypt.doUpdate() do not. Seems inconsistent and may lead to wrong value for the "processed" field? > > All the callers that use GCMOperations, i.e. op.update(...), have the processed value updated. implGCMCrypt() calls op.update() and updates the value. It cannot double count because 'processed' is not updated after implGCMCrypt(). I can see your point, but the other methods do not have access to 'processed' and would mean I copy that line 3 times elsewhere. I'd rather keep it as is. As long as the "processed" value is correct, it's fine.
Not sure if I may be missing some subtle things, GCMEngine.implGCMCrypt(GCMOperation op, ByteBuffer src, ByteBuffer dst) impl would increment "processed" with "srcLen". But then in GCMEngine.doLastBlock(GCMOperation op, ByteBuffer buffer, ByteBuffer src, ByteBuffer dst) impl, it calls the GCMEngine.implGCMCrypt(GCMOperation op, ByteBuffer src, ByteBuffer dst) method and stores the return value into "resultLen" and then again increments "processed" with "resultLen" after op.doFinal(...) call. Since "resultLen" contains the number of bytes processed by GCMEngine.implGCMCrypt(GCMOperation op, ByteBuffer src, ByteBuffer dst) method already, adding it to "processed" looks like double counting. Not sure what I missed. ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From stuefe at openjdk.java.net Tue Jul 27 04:18:30 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 27 Jul 2021 04:18:30 GMT Subject: RFR: JDK-8270308: Amalloc aligns size but not return value (take 2) [v6] In-Reply-To: References: Message-ID: On Mon, 26 Jul 2021 18:13:11 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. >> >> The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit: >> >> >> Arena ar; >> p1 = ar.AmallocWords(); // return 32bit aligned address >> p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. >> >> >> This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . >> >> This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation.
But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. >> >> Remaining issues: >> >> - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient, we could just remove that code. I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. >> - The chunk dimensions are not guaranteed to be 64-bit aligned: >> 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. >> 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. >> 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. >> >> Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. >> >> ----- >> >> Tests: >> >> - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) >> - GHA >> - Tests are scheduled at SAP. 
> > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > feedback coleen Thanks Coleen & Kim! ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From stuefe at openjdk.java.net Tue Jul 27 04:25:41 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 27 Jul 2021 04:25:41 GMT Subject: Integrated: JDK-8270308: Arena::Amalloc may return misaligned address on 32-bit In-Reply-To: References: Message-ID: On Tue, 20 Jul 2021 05:28:42 GMT, Thomas Stuefe wrote: > Hi, > > may I please have reviews for this fix. This fixes an issue with arena allocation alignment which can only happen on 32-bit. > > The underlying problem is that even though Arenas offer ways to allocate with different alignment (Amalloc and AmallocWords), allocation alignment is not guaranteed. This sequence will not work on 32-bit: > > > Arena ar; > p1 = ar.AmallocWords(); // return 32bit aligned address > p2 = ar.Amalloc(); // supposed to return 64bit aligned address but does not. > > > This patch is the bare minimum needed to fix the specific problem; I proposed a larger patch before which redid arena alignment handling but it was found too complex: https://github.com/openjdk/jdk/pull/4784 . > > This fix is limited to `Amalloc()` and aligns `_hwm` to be 64bit aligned before allocation. But since chunk boundaries are not guaranteed to be 64-bit aligned either, additional care must be taken to not overflow `_max`. Since this adds instructions into a hot allocation path, I restricted this code to 32 bit - it is only needed there. > > Remaining issues: > > - `Amalloc...` align the allocation size in an attempt to ensure allocation alignment. This is not needed nor sufficient, we could just remove that code. I left it untouched to keep the patch minimal. I also left the `// ... for 32 bits this should align _hwm as well.` comment in `Amalloc()` though I think it is wrong. 
> - The chunk dimensions are not guaranteed to be 64-bit aligned: > 1) Chunk bottom depends on Chunk start. We currently align the header size, but if the chunk starts at an unaligned address, this is not sufficient. It's not a real issue though as long as Chunks are C-heap allocated since malloc alignment is at least 64bit on our 32bit platforms. More of a beauty spot, since this is an implicit assumption which we don't really check. > 2) Chunk top and hence Arena::_max are not guaranteed to be 64-bit aligned either. They depend on the input chunk length, which is not even aligned for the standard chunk sizes used in ChunkPool. And users can hand in any size they want. Fixing this would require more widespread changes to the ChunkPool logic though, so I left it as it is. > 3) similarly, we cannot just align Arena::_max, because that is set in many places and we need to cover all of that; it ties in with Arena rollback as well. > > Because of (2) and (3), I had to add the overflow check into `Amalloc()` - any other way to solve this would result in more widespread changes. > > ----- > > Tests: > > - I tested the provided gtest on both 64-bit and 32-bit platforms (with and without the fix, without it shows the expected problem) > - GHA > - Tests are scheduled at SAP. This pull request has now been integrated. 
Changeset: 45d277fe Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/45d277feb04a51faa3858099336fc67dfb222542 Stats: 116 lines in 3 files changed: 105 ins; 1 del; 10 mod 8270308: Arena::Amalloc may return misaligned address on 32-bit Reviewed-by: coleenp, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/4835 From njian at openjdk.java.net Tue Jul 27 05:43:33 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Tue, 27 Jul 2021 05:43:33 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v2] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Mon, 26 Jul 2021 10:48:30 GMT, Andrew Haley wrote: > > You'd still need a way to do that mapping, but it'd be a simple switch in a function that does the conversion to Assembler::Condition. Then you'd just have e.g. sve_cmp(GE, xword ...) and most of this code would disappear. You wouldn't need all the Assembler functions, either. But Assembler::Condition cannot be encoded to vector compare instructions directly, which means that we need two switch mappings: BoolTest --> Assembler::Condition --> vector compare encodings. That looks more complicated than current implementation? ------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From aph at openjdk.java.net Tue Jul 27 08:19:30 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 27 Jul 2021 08:19:30 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v2] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Tue, 27 Jul 2021 05:40:13 GMT, Ningsheng Jian wrote: >>> we still need to map c2 specific condition value in BoolTest to instruction condition encoding >> >> You'd still need a way to do that mapping, but it'd be a simple switch in a function that does the conversion to Assembler::Condition. Then you'd just have e.g. 
sve_cmp(GE, xword ...) and most of this code would disappear. You wouldn't need all the Assembler functions, either. > >> >> You'd still need a way to do that mapping, but it'd be a simple switch in a function that does the conversion to Assembler::Condition. Then you'd just have e.g. sve_cmp(GE, xword ...) and most of this code would disappear. You wouldn't need all the Assembler functions, either. > > But Assembler::Condition cannot be encoded to vector compare instructions directly, which means that we need two switch mappings: BoolTest --> Assembler::Condition --> vector compare encodings. That looks more complicated than current implementation? Ah, true, SVE conditions are (of course!) different from scalar conditions. But that's still no more than a mapping function from c2-specific cond to SVE codes, isn't it? But I take your point, it's not so obvious as I thought. (Re the "aligning with Neon" argument: I let a lot of things through in the past, which I now regret. I should have pushed back harder.) ------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From shade at openjdk.java.net Tue Jul 27 08:34:28 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 27 Jul 2021 08:34:28 GMT Subject: RFR: 8270794: Avoid loading Klass* twice in TypeArrayKlass::oop_size() In-Reply-To: References: Message-ID: On Thu, 15 Jul 2021 19:56:04 GMT, Roman Kennke wrote: > TypeArrayKlass::oop_size() calls into TypeArrayOopDesc::object_size() which loads the Klass* from the object, but this is not necessary because we're coming from TypeArrayKlass. > > Note: This came up in Lilliput, where we need to be careful how to load the Klass, and must figure out the object size using oopDesc::size_given_klass() without blindly re-loading the Klass*. Outside of Lilliput I consider this a cosmetic change (i.e. no substantial performance improvement expected because most cases should be covered by layout-helper).
> > Testing: > - [x] tier1 > - [ ] tier2 This looks fine to me, but somebody from runtime team should take a look as well. src/hotspot/share/opto/runtime.cpp line 305: > 303: is_deoptimized_caller_frame(current)) { > 304: // Zero array here if the caller is deoptimized. > 305: int size = TypeArrayKlass::cast(array_type)->oop_size(result); I think you can pull `TypeArrayKlass::cast(array_type)` into a local variable, and use it in the line below as well. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4799 From stuefe at openjdk.java.net Tue Jul 27 09:48:52 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 27 Jul 2021 09:48:52 GMT Subject: RFR: 8271242: Add Arena regression tests Message-ID: May I please have reviews for these test additions. These are new regression tests for hotspot arenas. We don't have any and it makes sense to have them since this code is fragile and we work on it. It also contains some new gtest utility functions which I will use to consolidate some more test coding, mainly in the metaspace gtests (in a future rfe). It also comes with a new jtreg gtest launcher for arena tests to test UseMallocOnly mode. As long as we support that, we should test it. 
---

tests:
- gtests, manually with 32/64 bit and with/without UseMallocOnly
- GHAs

-------------

Commit messages:
 - start

Changes: https://git.openjdk.java.net/jdk/pull/4909/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4909&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8271242
  Stats: 465 lines in 4 files changed: 465 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4909.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4909/head:pull/4909

PR: https://git.openjdk.java.net/jdk/pull/4909

From mbaesken at openjdk.java.net Tue Jul 27 10:28:30 2021
From: mbaesken at openjdk.java.net (Matthias Baesken)
Date: Tue, 27 Jul 2021 10:28:30 GMT
Subject: RFR: JDK-8266490: Extend the OSContainer API to support the pids controller of cgroups [v6]
In-Reply-To: <_CXJ5Lcpd7-PqzRzGAtEE4NyZzAGirYGSgVT7KbPyFw=.f2ce7164-7d28-4b6e-9a79-9417054e0113@github.com>
References: <80NDhQE20WOO7LMCDS9C9zYQIRy-YKqNiGgrPQAPI64=.ef6e55d9-8995-4669-9c6f-e10a61bd427f@github.com> <_CXJ5Lcpd7-PqzRzGAtEE4NyZzAGirYGSgVT7KbPyFw=.f2ce7164-7d28-4b6e-9a79-9417054e0113@github.com>
Message-ID:

On Fri, 23 Jul 2021 12:32:04 GMT, Severin Gehwolf wrote:

> > Could this be some setting configured on your box that is picked up as an additional limit ?
>
> Possibly. Not sure where to look, though.

I ask some local experts about that issue.
What do you think about also accepting a high limit such as 20,000+ when -1/unlimited is set (plus a comment that on some setups "unlimited" just means a high number, not truly unlimited)?
Another idea I had was to start a small Java test program that creates e.g. 50,000 (or another high number of) threads. If this fails with an "unlimited" pids limit set, we might have a setup like yours and could then skip the test (or accept a high number, as I suggested).
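The first suggestion could be sketched roughly as follows. This is only an illustration in C++, not code from the PR: `kUnlimited`, `kHighLimitThreshold` and `accept_as_unlimited` are hypothetical names, and the threshold simply reuses the 20,000 figure mentioned above.

```cpp
#include <cassert>

// Hypothetical sentinel: how this sketch represents "no limit reported".
constexpr long kUnlimited = -1;

// Hypothetical threshold, taken from the "20,000+" figure in the message.
constexpr long kHighLimitThreshold = 20000;

// Returns true if a reported pids limit is acceptable for a test that
// configured the container with an unlimited pids limit: either the
// sentinel itself, or a value high enough to count as "just a high number".
constexpr bool accept_as_unlimited(long reported_limit) {
  return reported_limit == kUnlimited || reported_limit >= kHighLimitThreshold;
}
```

A test would call `accept_as_unlimited()` on whatever value the container reports instead of comparing against the sentinel alone.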
------------- PR: https://git.openjdk.java.net/jdk/pull/4518 From stuefe at openjdk.java.net Tue Jul 27 14:17:34 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 27 Jul 2021 14:17:34 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Mon, 26 Jul 2021 21:08:04 GMT, David Holmes wrote: > Before looking at this, have you checked the startup performance impact? > > Thanks, > David > ----- Hi David, performance should not be a problem. The potentially costly thing is the underlying hashmap. But we keep it operating with a very small load factor. More details: Adding entries is O(1). Since during pre-init phase almost only adds happen, startup time is not affected. Still, to make sure this is true, I did a bunch of tests: - tested WCT of a helloworld, no differences with and without patch - tested startup time in various of ways, no differences - repeated those tests with 25000 (!) VM arguments, the only way to influence the number of pre-init allocations. No differences (VM gets slower with and without patch). ---- The expensive thing is lookup since we potentially need to walk a very full hashmap. Lookup affects post-init more than pre-init. To get an idea of the cost of a too-full preinit lookup table, I modified the VM to do a configurable number of pre-init test-allocations, with the intent of artificially inflating the lookup table. Then, after NMT initialization, I measured the cost of lookup. The short story, I was not able to measure anything, even with a million pre-init allocations. Of course, with more allocations lookup table got fuller and the VM got slower, but the time increase was caused by the cost of the malloc calls themselves, not the table lookup. Finally, I did an isolated test for the lookup table, testing pure adding and retrieval cost with artificial values. There, I could see costs for add were static (as expected), and lookup cost increased with table population. 
On my machine:

| lookup table entries | time per lookup |
| ------ |:-------------:|
| 1000 | 3 ns |
| 1 million | 240 ns |

As you can see, once the lookup table population goes beyond 1 million entries, lookup time starts to become noticeable over background noise. But with these numbers, I am not worried. The standard lookup table population should be around *300-500*, with very long command lines resulting in table populations of *~1000*. We should never see 10000 entries, let alone millions of them.

Still, I added a jtreg test to verify the expected hash table population. It is there to catch errors like an unforeseen mass of pre-init allocations (let's say a leak or badly written code sneaked in), or the hash algorithm suddenly no longer distributing well.

Two more points:

1) I kept this code deliberately simple. If we are really worried about a degenerated lookup table, we can do things to fix that:
- we could automatically resize and rehash
- we could, if we sense something wrong, just stop filling it and disable NMT, stopping the NMT init phase prematurely at the cost of not being able to use NMT.

The latter I had implemented already but removed again to keep complexity down, and because I saw no need.

2) In our proprietary production VM we have a system similar to NMT, but predating it. In that system we don't use malloc headers but store all (millions of) malloc'ed pointers in a big hash map. It performs excellently on *all our libc variants*. It is so fast that we just leave it always switched on. This solution has been in production for more than 10 years, and therefore I am confident that this is viable.
This proposed hashmap with a planned population of 300-1000 is really not much :) ------------- PR: https://git.openjdk.java.net/jdk/pull/4874 From coleenp at openjdk.java.net Tue Jul 27 15:05:30 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 27 Jul 2021 15:05:30 GMT Subject: RFR: 8270794: Avoid loading Klass* twice in TypeArrayKlass::oop_size() In-Reply-To: References: Message-ID: On Thu, 15 Jul 2021 19:56:04 GMT, Roman Kennke wrote: > TypeArrayKlass::oop_size() calls into TypoArrayOopDesc::object_size() which loads the Klass* from the object, but this is not necessary because we're coming from TypeArrayKlass. > > Note: This came up in Lilliput, where we need to be careful how to load the Klass, and must figure out the object size using oopDesc::size_given_klass() without blindly re-loading the Klass*. Outside of Lilliput I consider this a cosmetic change (i.e. no substantial performance improvement expected because most cases should be covered by layout-helper). > > Testing: > - [x] tier1 > - [ ] tier2 This looks fine. I was wondering why the other Metadata types have oop_size(oop) which call object_size() shouldn't have the same treatment but it doesn't appear that objArrayOop::object_size loads the _klass field. So TypeArrayKlass::object_size() can be different and that seems fine. It's not virtual. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4799

From iklam at openjdk.java.net Tue Jul 27 16:24:41 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Tue, 27 Jul 2021 16:24:41 GMT
Subject: RFR: 8270061: Change parameter order of ResourceHashtable
Message-ID:

The template parameters for ResourceHashtable are currently in this order:

    template<
        typename K, typename V,
        unsigned (*HASH)  (K const&)           = primitive_hash<K>,
        bool     (*EQUALS)(K const&, K const&) = primitive_equals<K>,
        unsigned SIZE = 256,
        ResourceObj::allocation_type ALLOC_TYPE = ResourceObj::RESOURCE_AREA,
        MEMFLAGS MEM_TYPE = mtInternal
        >
    class ResourceHashtable {...}

However, more often than not, the default values of `HASH` and `EQUALS` will be used, whereas the other parameters may need to be specified.

We should move the `HASH` and `EQUALS` parameters to the end of the parameter list.

-------------

Commit messages:
 - 8270061: Change parameter order of ResourceHashtable

Changes: https://git.openjdk.java.net/jdk/pull/4912/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4912&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8270061
  Stats: 97 lines in 18 files changed: 24 ins; 48 del; 25 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4912.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4912/head:pull/4912

PR: https://git.openjdk.java.net/jdk/pull/4912

From rkennke at openjdk.java.net Tue Jul 27 16:41:32 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Tue, 27 Jul 2021 16:41:32 GMT
Subject: Integrated: 8270794: Avoid loading Klass* twice in TypeArrayKlass::oop_size()
In-Reply-To:
References:
Message-ID: <_yWBI1BInXRFHCNARax--uby3wNOkhTui-anXzpxgjI=.edcd240a-86e0-4036-b87f-c7101090237f@github.com>

On Thu, 15 Jul 2021 19:56:04 GMT, Roman Kennke wrote:

> TypeArrayKlass::oop_size() calls into TypoArrayOopDesc::object_size() which loads the Klass* from the object, but this is not necessary because we're coming from TypeArrayKlass.
> > Note: This came up in Lilliput, where we need to be careful how to load the Klass, and must figure out the object size using oopDesc::size_given_klass() without blindly re-loading the Klass*. Outside of Lilliput I consider this a cosmetic change (i.e. no substantial performance improvement expected because most cases should be covered by layout-helper). > > Testing: > - [x] tier1 > - [x] tier2 This pull request has now been integrated. Changeset: ea49691f Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/ea49691f1dbb4f57ed0c5982f004e7aabcd15d13 Stats: 5 lines in 4 files changed: 0 ins; 1 del; 4 mod 8270794: Avoid loading Klass* twice in TypeArrayKlass::oop_size() Reviewed-by: shade, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/4799 From coleenp at openjdk.java.net Tue Jul 27 18:56:30 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 27 Jul 2021 18:56:30 GMT Subject: RFR: 8270061: Change parameter order of ResourceHashtable In-Reply-To: References: Message-ID: On Tue, 27 Jul 2021 16:10:48 GMT, Ioi Lam wrote: > The template parameter for ResourceHashtable is currently in this order: > > template< > typename K, typename V, > unsigned (*HASH) (K const&) = primitive_hash, > bool (*EQUALS)(K const&, K const&) = primitive_equals, > unsigned SIZE = 256, > ResourceObj::allocation_type ALLOC_TYPE = ResourceObj::RESOURCE_AREA, > MEMFLAGS MEM_TYPE = mtInternal > > > class ResourceHashtable {...} > > > However, more often than not, default values of `HASH` and `EQUALS` will be used, where the other parameters may need to be specified. > > We should move the `HASH` and `EQUALS` parameters to the end of the parameter list. Looks good. Thanks for doing this! ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4912 From stuefe at openjdk.java.net Tue Jul 27 19:14:30 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 27 Jul 2021 19:14:30 GMT Subject: RFR: 8270061: Change parameter order of ResourceHashtable In-Reply-To: References: Message-ID: On Tue, 27 Jul 2021 16:10:48 GMT, Ioi Lam wrote: > The template parameter for ResourceHashtable is currently in this order: > > template< > typename K, typename V, > unsigned (*HASH) (K const&) = primitive_hash, > bool (*EQUALS)(K const&, K const&) = primitive_equals, > unsigned SIZE = 256, > ResourceObj::allocation_type ALLOC_TYPE = ResourceObj::RESOURCE_AREA, > MEMFLAGS MEM_TYPE = mtInternal > > > class ResourceHashtable {...} > > > However, more often than not, default values of `HASH` and `EQUALS` will be used, where the other parameters may need to be specified. > > We should move the `HASH` and `EQUALS` parameters to the end of the parameter list. Looks good. One could save even more writing by some more reordering: move MEMFLAGS upfront, since its is usually specified, and make the default allocator C_HEAP, because more users seem to want a C-heap map than a resource area one. But this is just idle nitpicking, the change is good as it is. ..Thomas ------------- Marked as reviewed by stuefe (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4912 From coleenp at openjdk.java.net Tue Jul 27 20:53:36 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 27 Jul 2021 20:53:36 GMT Subject: RFR: 8270061: Change parameter order of ResourceHashtable In-Reply-To: References: Message-ID: On Tue, 27 Jul 2021 16:10:48 GMT, Ioi Lam wrote: > The template parameter for ResourceHashtable is currently in this order: > > template< > typename K, typename V, > unsigned (*HASH) (K const&) = primitive_hash, > bool (*EQUALS)(K const&, K const&) = primitive_equals, > unsigned SIZE = 256, > ResourceObj::allocation_type ALLOC_TYPE = ResourceObj::RESOURCE_AREA, > MEMFLAGS MEM_TYPE = mtInternal > > > class ResourceHashtable {...} > > > However, more often than not, default values of `HASH` and `EQUALS` will be used, where the other parameters may need to be specified. > > We should move the `HASH` and `EQUALS` parameters to the end of the parameter list. Oh but MEMFLAGS is generally found after the allocation type and I thought there were more resource allocated resource hashtables (at least initially). ------------- PR: https://git.openjdk.java.net/jdk/pull/4912 From mseledtsov at openjdk.java.net Tue Jul 27 22:41:27 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Tue, 27 Jul 2021 22:41:27 GMT Subject: RFR: 8271242: Add Arena regression tests In-Reply-To: References: Message-ID: On Tue, 27 Jul 2021 06:06:40 GMT, Thomas Stuefe wrote: > May I please have reviews for these test additions. These are new regression tests for hotspot arenas. We don't have any and it makes sense to have them since this code is fragile and we work on it. > > It also contains some new gtest utility functions which I will use to consolidate some more test coding, mainly in the metaspace gtests (in a future rfe). > > It also comes with a new jtreg gtest launcher for arena tests to test UseMallocOnly mode. As long as we support that, we should test it. 
> > --- > > tests: > - gtests, manually with 32/64 bit and with/without UseMallocOnly > - GHAs Thank you for adding these tests. These new tests look good to me (from HotSpot test engineer POV). Someone who is an expert in Arena allocation should review this change as well. test/hotspot/jtreg/gtest/ArenaGtests.java line 38: > 36: * @modules java.base/jdk.internal.misc > 37: * java.xml > 38: * @requires vm.flagless Style comment: Please group all at-requires together in the same or neighboring lines. ------------- Marked as reviewed by mseledtsov (Committer). PR: https://git.openjdk.java.net/jdk/pull/4909 From david.holmes at oracle.com Tue Jul 27 23:02:55 2021 From: david.holmes at oracle.com (David Holmes) Date: Wed, 28 Jul 2021 09:02:55 +1000 Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: <57df56b8-2a33-9a8f-8d6a-132208ce1b5f@oracle.com> On 28/07/2021 12:17 am, Thomas Stuefe wrote: > On Mon, 26 Jul 2021 21:08:04 GMT, David Holmes wrote: > >> Before looking at this, have you checked the startup performance impact? >> >> Thanks, >> David >> ----- > > Hi David, > > performance should not be a problem. The potentially costly thing is the underlying hashmap. But we keep it operating with a very small load factor. > > More details: > > Adding entries is O(1). Since during pre-init phase almost only adds happen, startup time is not affected. Still, to make sure this is true, I did a bunch of tests: > > - tested WCT of a helloworld, no differences with and without patch > - tested startup time in various of ways, no differences > - repeated those tests with 25000 (!) VM arguments, the only way to influence the number of pre-init allocations. No differences (VM gets slower with and without patch). > > ---- > > The expensive thing is lookup since we potentially need to walk a very full hashmap. Lookup affects post-init more than pre-init. 
> > To get an idea of the cost of a too-full preinit lookup table, I modified the VM to do a configurable number of pre-init test-allocations, with the intent of artificially inflating the lookup table. Then, after NMT initialization, I measured the cost of lookup. The short story, I was not able to measure anything, even with a million pre-init allocations. Of course, with more allocations lookup table got fuller and the VM got slower, but the time increase was caused by the cost of the malloc calls themselves, not the table lookup. > > Finally, I did an isolated test for the lookup table, testing pure adding and retrieval cost with artificial values. There, I could see costs for add were static (as expected), and lookup cost increased with table population. On my machine: > > | lu table entries | time per lookup | > | ------ |:-------------:| > | 1000 | 3 ns | > | 1 mio | 240 ns | > > As you can see, if lookup table population goes beyond 1 mio entries, lookup time starts being noticeable over background noise. But with these numbers, I am not worried. Standard lookup population should be around *300-500*, with very long command lines resulting in table populations of *~1000*. We should never seen 10000 entries, let alone millions of them. > > Still, I added a jtreg test to verify the expected hash table population. To catch errors like an unforeseen mass of pre-init allocations (lets say a leak or badly written code sneaked in), or if the hash algorithm suddenly is not good anymore. > > Two more points > > 1) I kept this coding deliberately simple. If we are really worried about a degenerated lookup table, we can do things to fix that: > - we could automatically resize and rehash > - we could, if we sense something wrong, just stop filling it and disable NMT, stopping NMT init phase prematurely at the cost of not being able to use NMT. > > The latter I had implemented already but removed it again to keep complexity down, and because I saw no need. 
> > 2) In our propietary production VM we have a system similar to NMT, but predating it. In that system we don't use malloc headers but store all (millions of) malloc'ed pointers in a big hash map. It performs excellent on *all our libc variants*. It is so fast that we just leave it always switched on. This solution has been productive since >10 years, and therefore I am confident that this is viable. This proposed hashmap with a planned population of 300-1000 is really not much :) Thanks Thomas! I appreciate the detailed investigation. Cheers, David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/4874 > From jwilhelm at openjdk.java.net Tue Jul 27 23:54:39 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Tue, 27 Jul 2021 23:54:39 GMT Subject: RFR: Merge jdk17 Message-ID: Forwardport JDK 17 -> JDK 18 ------------- Commit messages: - Merge - 8271350: runtime/Safepoint tests use OutputAnalyzer::shouldMatch instead of shouldContaint - 8270866: NPE in DocTreePath.getTreePath() - 8270491: SEGV at read_string_field(oopDesc*, char const*, JavaThread*)+0x54 - 8271223: two runtime/ClassFile tests don't check exit code The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. 
Changes: https://git.openjdk.java.net/jdk/pull/4914/files Stats: 205 lines in 14 files changed: 114 ins; 53 del; 38 mod Patch: https://git.openjdk.java.net/jdk/pull/4914.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4914/head:pull/4914 PR: https://git.openjdk.java.net/jdk/pull/4914 From iklam at openjdk.java.net Wed Jul 28 00:41:27 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 28 Jul 2021 00:41:27 GMT Subject: RFR: 8270061: Change parameter order of ResourceHashtable In-Reply-To: References: Message-ID: On Tue, 27 Jul 2021 20:50:35 GMT, Coleen Phillimore wrote: > Oh but MEMFLAGS is generally found after the allocation type and I thought there were more resource allocated resource hashtables (at least initially). OK, I'll leave the PR as is without further rearranging the arguments. Since the class is named ResourceHashtable, using C_HEAP as the default seems a little weird. We are planning to get rid of the "old" Hashtable and then rename ResourceHashtable to Hashtable. We can rethink the default values at that point. ------------- PR: https://git.openjdk.java.net/jdk/pull/4912 From jwilhelm at openjdk.java.net Wed Jul 28 00:42:00 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Wed, 28 Jul 2021 00:42:00 GMT Subject: RFR: Merge jdk17 [v2] In-Reply-To: References: Message-ID: > Forwardport JDK 17 -> JDK 18 Jesper Wilhelmsson has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 337 commits: - Merge - 8270859: Post JEP 411 refactoring: client libs with maximum covering > 10K Reviewed-by: serb - 8267485: Remove the dependency on SecurityManager in JceSecurityManager.java Reviewed-by: mchung - 8270794: Avoid loading Klass* twice in TypeArrayKlass::oop_size() Reviewed-by: shade, coleenp - 8270946: X509CertImpl.getFingerprint should not return the empty String Reviewed-by: weijun - 8270308: Arena::Amalloc may return misaligned address on 32-bit Reviewed-by: coleenp, kbarrett - 8212961: [TESTBUG] vmTestbase/nsk/stress/jni/ native code cleanup Reviewed-by: stuefe, iignatyev - 8269753: Misplaced caret in PatternSyntaxException's detail message Reviewed-by: prappo - 8190753: (zipfs): Accessing a large entry (> 2^31 bytes) leads to a negative initial size for ByteArrayOutputStream Reviewed-by: lancea - Merge - ... and 327 more: https://git.openjdk.java.net/jdk/compare/f1e15c8c...9bbbad27 ------------- Changes: https://git.openjdk.java.net/jdk/pull/4914/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4914&range=01 Stats: 60112 lines in 1260 files changed: 28209 ins; 25676 del; 6227 mod Patch: https://git.openjdk.java.net/jdk/pull/4914.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4914/head:pull/4914 PR: https://git.openjdk.java.net/jdk/pull/4914 From jwilhelm at openjdk.java.net Wed Jul 28 00:42:03 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Wed, 28 Jul 2021 00:42:03 GMT Subject: Integrated: Merge jdk17 In-Reply-To: References: Message-ID: On Tue, 27 Jul 2021 23:47:20 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 17 -> JDK 18 This pull request has now been integrated. 
Changeset: a50161b7 Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/a50161b75045715b1a0ee2a55a6352e4c1aa009a Stats: 205 lines in 14 files changed: 114 ins; 53 del; 38 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/4914 From stuefe at openjdk.java.net Wed Jul 28 03:59:29 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 28 Jul 2021 03:59:29 GMT Subject: RFR: 8270061: Change parameter order of ResourceHashtable In-Reply-To: References: Message-ID: <-yMdhvxqbFv1mjl9cTLJv0iDoFmTj9BrVDo099hi0Yw=.aa114475-5c26-407f-8ca5-6272a06352bd@github.com> On Wed, 28 Jul 2021 00:38:53 GMT, Ioi Lam wrote: > > Oh but MEMFLAGS is generally found after the allocation type and I thought there were more resource allocated resource hashtables (at least initially). > > OK, I'll leave the PR as is without further rearranging the arguments. > > Since the class is named ResourceHashtable, using C_HEAP as the default seems a little weird. We are planning to get rid of the "old" Hashtable and then rename ResourceHashtable to Hashtable. We can rethink the default values at that point. Sure, all good! 
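For readers following the thread, the motivation for the reordering can be illustrated with a small standalone sketch. This is not HotSpot code: `demo_hash`, `demo_equals`, `OldOrderTable` and `NewOrderTable` are hypothetical stand-ins for `primitive_hash`, `primitive_equals` and `ResourceHashtable`, and the allocation-type and MEMFLAGS parameters are left out for brevity.

```cpp
#include <cassert>

// Hypothetical stand-ins for the default hash and equality functions.
template <typename K>
unsigned demo_hash(K const& k) { return static_cast<unsigned>(k); }

template <typename K>
bool demo_equals(K const& a, K const& b) { return a == b; }

// Old order: HASH and EQUALS precede SIZE, so overriding SIZE alone
// forces the caller to spell out both defaults.
template <typename K, typename V,
          unsigned (*HASH)(K const&) = demo_hash<K>,
          bool (*EQUALS)(K const&, K const&) = demo_equals<K>,
          unsigned SIZE = 256>
struct OldOrderTable {
  static unsigned bucket_of(K const& key) { return HASH(key) % SIZE; }
  static bool same_key(K const& a, K const& b) { return EQUALS(a, b); }
};

// New order: the rarely overridden HASH and EQUALS move to the end.
template <typename K, typename V,
          unsigned SIZE = 256,
          unsigned (*HASH)(K const&) = demo_hash<K>,
          bool (*EQUALS)(K const&, K const&) = demo_equals<K> >
struct NewOrderTable {
  static unsigned bucket_of(K const& key) { return HASH(key) % SIZE; }
  static bool same_key(K const& a, K const& b) { return EQUALS(a, b); }
};

// Same 16-bucket table, declared with each ordering.
using OldSmall = OldOrderTable<int, int, demo_hash<int>, demo_equals<int>, 16>;
using NewSmall = NewOrderTable<int, int, 16>;
```

Both aliases describe the same table; with the new ordering the declaration names only the parameter that actually differs from its default.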
------------- PR: https://git.openjdk.java.net/jdk/pull/4912 From iklam at openjdk.java.net Wed Jul 28 04:21:03 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 28 Jul 2021 04:21:03 GMT Subject: RFR: 8270061: Change parameter order of ResourceHashtable [v2] In-Reply-To: References: Message-ID: <_D8f5roHrLu8B4uCfktdMURJEJeb3nIqZx_kwbRd02g=.3e6a5c55-7d5b-4c11-a1fe-65fbc0637cc1@github.com> > The template parameter for ResourceHashtable is currently in this order: > > template< > typename K, typename V, > unsigned (*HASH) (K const&) = primitive_hash, > bool (*EQUALS)(K const&, K const&) = primitive_equals, > unsigned SIZE = 256, > ResourceObj::allocation_type ALLOC_TYPE = ResourceObj::RESOURCE_AREA, > MEMFLAGS MEM_TYPE = mtInternal > > > class ResourceHashtable {...} > > > However, more often than not, default values of `HASH` and `EQUALS` will be used, where the other parameters may need to be specified. > > We should move the `HASH` and `EQUALS` parameters to the end of the parameter list. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into 8270061-reorder-resource-hash-params - 8270061: Change parameter order of ResourceHashtable ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4912/files - new: https://git.openjdk.java.net/jdk/pull/4912/files/dd6afc8f..3a87c5db Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4912&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4912&range=00-01 Stats: 6717 lines in 133 files changed: 5425 ins; 406 del; 886 mod Patch: https://git.openjdk.java.net/jdk/pull/4912.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4912/head:pull/4912 PR: https://git.openjdk.java.net/jdk/pull/4912 From kvn at openjdk.java.net Wed Jul 28 05:48:45 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 28 Jul 2021 05:48:45 GMT Subject: RFR: 8271368: [BACKOUT] JDK-8266054 VectorAPI rotate operation optimization Message-ID: <3CmpsfEtgNRMlwhC0L2QmMFmKL_umQ95A_cT65DGVBs=.21a0fea3-9a65-41d9-86e1-ef0827c105ee@github.com> Backout the following changes due to vector tests failures in tier 2 and later: [JDK-8266054](https://bugs.openjdk.java.net/browse/JDK-8266054) VectorAPI rotate operation optimization Changes also caused copyright header validation failure in Tier1 due to missing `,` after copyright year in new test. Currently running testing. 
------------- Commit messages: - 8271368: [BACKOUT] JDK-8266054 VectorAPI rotate operation optimization Changes: https://git.openjdk.java.net/jdk/pull/4915/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4915&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271368 Stats: 4438 lines in 57 files changed: 58 ins; 4219 del; 161 mod Patch: https://git.openjdk.java.net/jdk/pull/4915.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4915/head:pull/4915 PR: https://git.openjdk.java.net/jdk/pull/4915 From dholmes at openjdk.java.net Wed Jul 28 05:48:45 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 28 Jul 2021 05:48:45 GMT Subject: RFR: 8271368: [BACKOUT] JDK-8266054 VectorAPI rotate operation optimization In-Reply-To: <3CmpsfEtgNRMlwhC0L2QmMFmKL_umQ95A_cT65DGVBs=.21a0fea3-9a65-41d9-86e1-ef0827c105ee@github.com> References: <3CmpsfEtgNRMlwhC0L2QmMFmKL_umQ95A_cT65DGVBs=.21a0fea3-9a65-41d9-86e1-ef0827c105ee@github.com> Message-ID: On Wed, 28 Jul 2021 05:35:59 GMT, Vladimir Kozlov wrote: > Backout the following changes due to vector tests failures in tier 2 and later: > [JDK-8266054](https://bugs.openjdk.java.net/browse/JDK-8266054) VectorAPI rotate operation optimization > > Changes also caused copyright header validation failure in Tier1 due to missing `,` after copyright year in new test. > > Currently running testing. Backout looks good. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4915 From iklam at openjdk.java.net Wed Jul 28 06:08:33 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 28 Jul 2021 06:08:33 GMT Subject: RFR: 8271368: [BACKOUT] JDK-8266054 VectorAPI rotate operation optimization In-Reply-To: <3CmpsfEtgNRMlwhC0L2QmMFmKL_umQ95A_cT65DGVBs=.21a0fea3-9a65-41d9-86e1-ef0827c105ee@github.com> References: <3CmpsfEtgNRMlwhC0L2QmMFmKL_umQ95A_cT65DGVBs=.21a0fea3-9a65-41d9-86e1-ef0827c105ee@github.com> Message-ID: On Wed, 28 Jul 2021 05:35:59 GMT, Vladimir Kozlov wrote: > Backout the following changes due to vector tests failures in tier 2 and later: > [JDK-8266054](https://bugs.openjdk.java.net/browse/JDK-8266054) VectorAPI rotate operation optimization > > Changes also caused copyright header validation failure in Tier1 due to missing `,` after copyright year in new test. > > Currently running testing. LGTM ------------- Marked as reviewed by iklam (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4915 From jbhateja at openjdk.java.net Wed Jul 28 06:27:27 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 28 Jul 2021 06:27:27 GMT Subject: RFR: 8271368: [BACKOUT] JDK-8266054 VectorAPI rotate operation optimization In-Reply-To: <3CmpsfEtgNRMlwhC0L2QmMFmKL_umQ95A_cT65DGVBs=.21a0fea3-9a65-41d9-86e1-ef0827c105ee@github.com> References: <3CmpsfEtgNRMlwhC0L2QmMFmKL_umQ95A_cT65DGVBs=.21a0fea3-9a65-41d9-86e1-ef0827c105ee@github.com> Message-ID: On Wed, 28 Jul 2021 05:35:59 GMT, Vladimir Kozlov wrote: > Backout the following changes due to vector tests failures in tier 2 and later: > [JDK-8266054](https://bugs.openjdk.java.net/browse/JDK-8266054) VectorAPI rotate operation optimization > > Changes also caused copyright header validation failure in Tier1 due to missing `,` after copyright year in new test. > > Currently running testing. - Thanks for reporting it, should it be ok to move those tests to ProblemList.txt and let me fix this as a follow up issue instead of a revert ? 
------------- PR: https://git.openjdk.java.net/jdk/pull/4915 From kvn at openjdk.java.net Wed Jul 28 07:01:34 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 28 Jul 2021 07:01:34 GMT Subject: Integrated: 8271368: [BACKOUT] JDK-8266054 VectorAPI rotate operation optimization In-Reply-To: <3CmpsfEtgNRMlwhC0L2QmMFmKL_umQ95A_cT65DGVBs=.21a0fea3-9a65-41d9-86e1-ef0827c105ee@github.com> References: <3CmpsfEtgNRMlwhC0L2QmMFmKL_umQ95A_cT65DGVBs=.21a0fea3-9a65-41d9-86e1-ef0827c105ee@github.com> Message-ID: On Wed, 28 Jul 2021 05:35:59 GMT, Vladimir Kozlov wrote: > Backout the following changes due to vector tests failures in tier 2 and later: > [JDK-8266054](https://bugs.openjdk.java.net/browse/JDK-8266054) VectorAPI rotate operation optimization > > Changes also caused copyright header validation failure in Tier1 due to missing `,` after copyright year in new test. > > Currently running testing. This pull request has now been integrated. Changeset: d7b5cb68 Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/d7b5cb688956ce79443ef3cd080c36028fcfb19d Stats: 4438 lines in 57 files changed: 58 ins; 4219 del; 161 mod 8271368: [BACKOUT] JDK-8266054 VectorAPI rotate operation optimization Reviewed-by: dholmes, iklam ------------- PR: https://git.openjdk.java.net/jdk/pull/4915 From kvn at openjdk.java.net Wed Jul 28 07:07:31 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 28 Jul 2021 07:07:31 GMT Subject: RFR: 8271368: [BACKOUT] JDK-8266054 VectorAPI rotate operation optimization In-Reply-To: References: <3CmpsfEtgNRMlwhC0L2QmMFmKL_umQ95A_cT65DGVBs=.21a0fea3-9a65-41d9-86e1-ef0827c105ee@github.com> Message-ID: On Wed, 28 Jul 2021 06:24:20 GMT, Jatin Bhateja wrote: > * Thanks for reporting it, should it be ok to move those tests to ProblemList.txt and let me fix this as a follow up issue instead of a revert ? @jatin-bhateja We did not test original changes in our testing infra. 
There could be other issues in high tiers which we don't know. I prefer that you use 8271366 to prepare changes with fixed reported failure, file PR and let me run testing. ------------- PR: https://git.openjdk.java.net/jdk/pull/4915 From stuefe at openjdk.java.net Wed Jul 28 09:29:31 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 28 Jul 2021 09:29:31 GMT Subject: RFR: 8271242: Add Arena regression tests In-Reply-To: References: Message-ID: On Tue, 27 Jul 2021 22:38:06 GMT, Mikhailo Seledtsov wrote: > Thank you for adding these tests. These new tests look good to me (from HotSpot test engineer POV). Someone who is an expert in Arena allocation should review this change as well. Thank you Mikhailo! > test/hotspot/jtreg/gtest/ArenaGtests.java line 38: > >> 36: * @modules java.base/jdk.internal.misc >> 37: * java.xml >> 38: * @requires vm.flagless > > Style comment: Please group all at-requires together in the same or neighboring lines. Will do. ------------- PR: https://git.openjdk.java.net/jdk/pull/4909 From aph at openjdk.java.net Wed Jul 28 10:04:44 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 28 Jul 2021 10:04:44 GMT Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects Message-ID: C1 has its own code generators for zeroing words. We use the same logic for C1 and C2, which should give us better C1 performance and result in less code to maintain. ------------- Commit messages: - Untabify - Test - Better loops - Merge https://git.openjdk.java.net/jdk into c1-initialize - comments - Benchmark - Cleanup - Better - Intermediate - Intermediate - ... 
and 1 more: https://git.openjdk.java.net/jdk/compare/515113d8...5a1acc81 Changes: https://git.openjdk.java.net/jdk/pull/4919/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4919&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8270947 Stats: 28931 lines in 8 files changed: 28779 ins; 116 del; 36 mod Patch: https://git.openjdk.java.net/jdk/pull/4919.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4919/head:pull/4919 PR: https://git.openjdk.java.net/jdk/pull/4919 From kbarrett at openjdk.java.net Wed Jul 28 11:56:39 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 28 Jul 2021 11:56:39 GMT Subject: RFR: 8271352: Extend jcc erratum mitigation to additional processors Message-ID: Please review this change to default enable the Intel jcc erratum performance mitigation for family 6 model 165 (0xA5). This seems to reduce the frequency of an issue that is still under investigation. ------------- Commit messages: - add 06_A5H Changes: https://git.openjdk.java.net/jdk/pull/4922/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4922&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271352 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4922.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4922/head:pull/4922 PR: https://git.openjdk.java.net/jdk/pull/4922 From thartmann at openjdk.java.net Wed Jul 28 12:15:28 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 28 Jul 2021 12:15:28 GMT Subject: RFR: 8271352: Extend jcc erratum mitigation to additional processors In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 11:49:52 GMT, Kim Barrett wrote: > Please review this change to default enable the Intel jcc erratum > performance mitigation for family 6 model 165 (0xA5). This seems to reduce > the frequency of an issue that is still under investigation. Looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4922 From aph at openjdk.java.net Wed Jul 28 12:22:32 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 28 Jul 2021 12:22:32 GMT Subject: Withdrawn: 8270947: AArch64: C1: use zero_words to initialize all objects In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 09:39:32 GMT, Andrew Haley wrote:

> C1 has its own code generators for zeroing words. We should use the same logic for C1 and C2, which should give us better C1 performance and result in less code to maintain.
>
> This is one of those patches that's a great joy to write, because it consists mainly of deletions. The code I've added is mostly adapters to allow the C1 code to use the memory-zeroing logic written originally for C2. This means we have less code, but also that VM configuration options (e.g. `BlockZeroingLowLimit`) work with C1 and C2 in the same way.
>
> Measuring the performance of memory allocation is quite tricky, so I've written a JMH test case that measures the raw allocation rate of the JVM for various object sizes. This is inevitably rather noisy because it combines the effects of both the allocation code and other GC-related pauses. Nonetheless, it's a useful sanity check.
>
> The performance differences between old and new are mostly in the noise, but with large allocations the advantage of `DC ZVA` becomes apparent:
>
> old:
>
> RawAllocationRate.arrayTest_C1 8192 thrpt 5 11220.314 ± 336.878 ops/s
> RawAllocationRate.arrayTest_C1 16384 thrpt 5 16655.815 ± 88.577 ops/s
> RawAllocationRate.arrayTest_C1 65536 thrpt 5 28302.661 ± 155.513 ops/s
> RawAllocationRate.arrayTest_C1 131072 thrpt 5 31434.868 ± 211.768 ops/s
>
> new:
>
> RawAllocationRate.arrayTest_C1 8192 thrpt 5 13677.987 ± 143.048 ops/s
> RawAllocationRate.arrayTest_C1 16384 thrpt 5 19517.416 ± 155.004 ops/s
> RawAllocationRate.arrayTest_C1 65536 thrpt 5 37348.536 ± 307.582 ops/s
> RawAllocationRate.arrayTest_C1 131072 thrpt 5 43414.399 ± 58.317 ops/s
>
>
> Full test results, Graviton 2 (i.e. Neoverse N1). Units are megabytes per second,
> object sizes are in bytes:
>
>
> old:
>
> Benchmark (size) Mode Cnt Score Error Units
> RawAllocationRate.arrayTest 32 thrpt 5 5092.798 ± 20.879 ops/s
> RawAllocationRate.arrayTest 64 thrpt 5 9821.608 ± 6.250 ops/s
> RawAllocationRate.arrayTest 256 thrpt 5 14117.192 ± 72.720 ops/s
> RawAllocationRate.arrayTest 1024 thrpt 5 9090.514 ± 40.239 ops/s
> RawAllocationRate.arrayTest 2048 thrpt 5 9842.503 ± 52.744 ops/s
> RawAllocationRate.arrayTest 4096 thrpt 5 9866.179 ± 6.332 ops/s
> RawAllocationRate.arrayTest 8192 thrpt 5 12836.968 ± 14.143 ops/s
> RawAllocationRate.arrayTest 16384 thrpt 5 18970.307 ± 96.903 ops/s
> RawAllocationRate.arrayTest 65536 thrpt 5 36709.095 ± 38.256 ops/s
> RawAllocationRate.arrayTest 131072 thrpt 5 43055.263 ± 60.808 ops/s
> RawAllocationRate.arrayTest_C1 32 thrpt 5 3045.285 ± 23.128 ops/s
> RawAllocationRate.arrayTest_C1 64 thrpt 5 5774.157 ± 52.472 ops/s
> RawAllocationRate.arrayTest_C1 256 thrpt 5 4720.713 ± 9.419 ops/s
> RawAllocationRate.arrayTest_C1 1024 thrpt 5 7457.880 ± 806.208 ops/s
> RawAllocationRate.arrayTest_C1 2048 thrpt 5 8155.046 ± 194.153 ops/s
> RawAllocationRate.arrayTest_C1 4096 thrpt 5 8364.379 ± 127.661 ops/s
> RawAllocationRate.arrayTest_C1 8192 thrpt 5 11220.314 ± 336.878 ops/s
> RawAllocationRate.arrayTest_C1 16384 thrpt 5 16655.815 ± 88.577 ops/s
> RawAllocationRate.arrayTest_C1 65536 thrpt 5 28302.661 ± 155.513 ops/s
> RawAllocationRate.arrayTest_C1 131072 thrpt 5 31434.868 ± 211.768 ops/s
> RawAllocationRate.instanceTest 32 thrpt 5 6667.433 ± 50.031 ops/s
> RawAllocationRate.instanceTest 64 thrpt 5 10669.876 ± 72.109 ops/s
> RawAllocationRate.instanceTest 256 thrpt 5 5483.582 ± 336.743 ops/s
> RawAllocationRate.instanceTest 1024 thrpt 5 9740.872 ± 6.269 ops/s
> RawAllocationRate.instanceTest 2048 thrpt 5 9868.685 ± 51.939 ops/s
> RawAllocationRate.instanceTest 4096 thrpt 5 9881.944 ± 46.306 ops/s
> RawAllocationRate.instanceTest 8192 thrpt 5 13524.791 ± 69.250 ops/s
> RawAllocationRate.instanceTest 16384 thrpt 5 19560.774 ± 109.518 ops/s
> RawAllocationRate.instanceTest 65536 thrpt 5 37510.256 ± 15.586 ops/s
> RawAllocationRate.instanceTest 131072 thrpt 5 43361.887 ± 181.294 ops/s
> RawAllocationRate.instanceTest_C1 32 thrpt 5 2851.135 ± 22.891 ops/s
> RawAllocationRate.instanceTest_C1 64 thrpt 5 5476.183 ± 84.376 ops/s
> RawAllocationRate.instanceTest_C1 256 thrpt 5 5105.347 ± 35.389 ops/s
> RawAllocationRate.instanceTest_C1 1024 thrpt 5 7380.805 ± 3.944 ops/s
> RawAllocationRate.instanceTest_C1 2048 thrpt 5 8963.428 ± 83.857 ops/s
> RawAllocationRate.instanceTest_C1 4096 thrpt 5 9257.715 ± 52.647 ops/s
> RawAllocationRate.instanceTest_C1 8192 thrpt 5 11655.359 ± 70.209 ops/s
> RawAllocationRate.instanceTest_C1 16384 thrpt 5 17084.813 ± 91.150 ops/s
> RawAllocationRate.instanceTest_C1 65536 thrpt 5 28682.783 ± 176.563 ops/s
> RawAllocationRate.instanceTest_C1 131072 thrpt 5 31268.318 ± 221.486 ops/s
>
> new:
>
> Benchmark (size) Mode Cnt Score Error Units
> RawAllocationRate.arrayTest 32 thrpt 5 5355.477 ± 43.045 ops/s
> RawAllocationRate.arrayTest 64 thrpt 5 9825.067 ± 55.493 ops/s
> RawAllocationRate.arrayTest 256 thrpt 5 13984.865 ± 125.125 ops/s
> RawAllocationRate.arrayTest 1024 thrpt 5 9025.380 ± 48.921 ops/s
> RawAllocationRate.arrayTest 2048 thrpt 5 9844.463 ± 6.780 ops/s
> RawAllocationRate.arrayTest 4096 thrpt 5 9866.566 ± 48.659 ops/s
> RawAllocationRate.arrayTest 8192 thrpt 5 12753.622 ± 67.211 ops/s
> RawAllocationRate.arrayTest 16384 thrpt 5 18890.419 ± 14.152 ops/s
> RawAllocationRate.arrayTest 65536 thrpt 5 37322.124 ± 269.352 ops/s
> RawAllocationRate.arrayTest 131072 thrpt 5 43017.952 ± 204.057 ops/s
> RawAllocationRate.arrayTest_C1 32 thrpt 5 3102.221 ± 13.811 ops/s
> RawAllocationRate.arrayTest_C1 64 thrpt 5 5947.419 ± 36.408 ops/s
> RawAllocationRate.arrayTest_C1 256 thrpt 5 5124.479 ± 548.617 ops/s
> RawAllocationRate.arrayTest_C1 1024 thrpt 5 9459.376 ± 716.317 ops/s
> RawAllocationRate.arrayTest_C1 2048 thrpt 5 9840.594 ± 15.922 ops/s
> RawAllocationRate.arrayTest_C1 4096 thrpt 5 9860.274 ± 56.088 ops/s
> RawAllocationRate.arrayTest_C1 8192 thrpt 5 13677.987 ± 143.048 ops/s
> RawAllocationRate.arrayTest_C1 16384 thrpt 5 19517.416 ± 155.004 ops/s
> RawAllocationRate.arrayTest_C1 65536 thrpt 5 37348.536 ± 307.582 ops/s
> RawAllocationRate.arrayTest_C1 131072 thrpt 5 43414.399 ± 58.317 ops/s
> RawAllocationRate.instanceTest 32 thrpt 5 6620.452 ± 137.048 ops/s
> RawAllocationRate.instanceTest 64 thrpt 5 9850.677 ± 6.417 ops/s
> RawAllocationRate.instanceTest 256 thrpt 5 5533.512 ± 129.334 ops/s
> RawAllocationRate.instanceTest 1024 thrpt 5 9829.806 ± 7.555 ops/s
> RawAllocationRate.instanceTest 2048 thrpt 5 9857.707 ± 51.541 ops/s
> RawAllocationRate.instanceTest 4096 thrpt 5 9957.300 ± 7.115 ops/s
> RawAllocationRate.instanceTest 8192 thrpt 5 13662.581 ± 85.225 ops/s
> RawAllocationRate.instanceTest 16384 thrpt 5 19571.796 ± 120.962 ops/s
> RawAllocationRate.instanceTest 65536 thrpt 5 37401.527 ± 67.260 ops/s
> RawAllocationRate.instanceTest 131072 thrpt 5 43327.339 ± 35.077 ops/s
> RawAllocationRate.instanceTest_C1 32 thrpt 5 2842.031 ± 47.924 ops/s
> RawAllocationRate.instanceTest_C1 64 thrpt 5 5359.357 ± 53.031 ops/s
> RawAllocationRate.instanceTest_C1 256 thrpt 5 5081.287 ± 57.737 ops/s
> RawAllocationRate.instanceTest_C1 1024 thrpt 5 8372.330 ± 267.016 ops/s
> RawAllocationRate.instanceTest_C1 2048 thrpt 5 9470.224 ± 250.706 ops/s
> RawAllocationRate.instanceTest_C1 4096 thrpt 5 9843.936 ± 52.825 ops/s
> RawAllocationRate.instanceTest_C1 8192 thrpt 5 13695.863 ± 80.433 ops/s
> RawAllocationRate.instanceTest_C1 16384 thrpt 5 19495.110 ± 116.300 ops/s
> RawAllocationRate.instanceTest_C1 65536 thrpt 5 37448.948 ± 291.917 ops/s
> RawAllocationRate.instanceTest_C1 131072 thrpt 5 43443.406 ± 267.236 ops/s

This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/4919 From eosterlund at openjdk.java.net Wed Jul 28 12:23:37 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Wed, 28 Jul 2021 12:23:37 GMT Subject: RFR: 8271352: Extend jcc erratum mitigation to additional processors In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 11:49:52 GMT, Kim Barrett wrote: > Please review this change to default enable the Intel jcc erratum > performance mitigation for family 6 model 165 (0xA5). This seems to reduce > the frequency of an issue that is still under investigation. Marked as reviewed by eosterlund (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4922 From kbarrett at openjdk.java.net Wed Jul 28 12:43:30 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 28 Jul 2021 12:43:30 GMT Subject: RFR: 8271352: Extend jcc erratum mitigation to additional processors In-Reply-To: References: Message-ID: <6RThq-3v1YoQk9OyPYOd416UJC9XtpR5dI3GCDU1SR4=.1a43e043-2c23-48e6-8e54-c0fd19776937@github.com> On Wed, 28 Jul 2021 11:49:52 GMT, Kim Barrett wrote: > Please review this change to default enable the Intel jcc erratum > performance mitigation for family 6 model 165 (0xA5). This seems to reduce > the frequency of an issue that is still under investigation. Withdrawing this. I accidentally opened this PR against mainline, rather than the intended jdk17 repo.
------------- PR: https://git.openjdk.java.net/jdk/pull/4922 From kbarrett at openjdk.java.net Wed Jul 28 12:43:31 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 28 Jul 2021 12:43:31 GMT Subject: Withdrawn: 8271352: Extend jcc erratum mitigation to additional processors In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 11:49:52 GMT, Kim Barrett wrote: > Please review this change to default enable the Intel jcc erratum > performance mitigation for family 6 model 165 (0xA5). This seems to reduce > the frequency of an issue that is still under investigation. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/4922 From kbarrett at openjdk.java.net Wed Jul 28 12:52:49 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 28 Jul 2021 12:52:49 GMT Subject: [jdk17] RFR: 8271352: Extend jcc erratum mitigation to additional processors Message-ID: <9c-6bAexA7yYlG2Kd0H7_8w6Faba2UCJrxNQ4aXoMjY=.aa91c4b7-b671-4079-8e4f-a707e2d908e1@github.com> Please review this change to default enable the Intel jcc erratum performance mitigation for family 6 model 165 (0xA5). This seems to reduce the frequency of an issue that is still under investigation. 
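
For readers unfamiliar with the "family 6 model 165 (0xA5)" notation used above, here is a minimal sketch of how a display family and model are conventionally derived from the EAX value returned by x86 CPUID leaf 1. It is not HotSpot's code: the names `CpuSignature`, `decode_signature` and `is_family6_model_a5` are hypothetical, and the sketch only decodes a signature; it does not reproduce the list of processors the mitigation applies to.

```cpp
#include <cstdint>

// Hypothetical helper type for illustration only (not a HotSpot type).
struct CpuSignature {
  std::uint32_t family;
  std::uint32_t model;
  std::uint32_t stepping;
};

// Decode the EAX value of CPUID leaf 1 into display family/model/stepping.
// The extended model bits are used when the base family is 0x6 or 0xF;
// the extended family bits are added only when the base family is 0xF.
inline CpuSignature decode_signature(std::uint32_t eax) {
  const std::uint32_t stepping    = eax & 0xF;
  const std::uint32_t base_model  = (eax >> 4) & 0xF;
  const std::uint32_t base_family = (eax >> 8) & 0xF;
  const std::uint32_t ext_model   = (eax >> 16) & 0xF;
  const std::uint32_t ext_family  = (eax >> 20) & 0xFF;

  std::uint32_t family = base_family;
  if (base_family == 0xF) {
    family += ext_family;
  }
  std::uint32_t model = base_model;
  if (base_family == 0x6 || base_family == 0xF) {
    model |= ext_model << 4;
  }
  return CpuSignature{family, model, stepping};
}

// True when the signature decodes to family 6, model 0xA5 (165 decimal),
// the combination named in the review request above.
inline bool is_family6_model_a5(std::uint32_t eax) {
  const CpuSignature s = decode_signature(eax);
  return s.family == 6 && s.model == 0xA5;
}
```

With this decoding, an assumed example signature of `0x000A0655` yields family 6 and model 0xA5, which matches the `06_A5H` shorthand in the commit message.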
------------- Commit messages: - add 06_A5H Changes: https://git.openjdk.java.net/jdk17/pull/286/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk17&pr=286&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271352 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk17/pull/286.diff Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/286/head:pull/286 PR: https://git.openjdk.java.net/jdk17/pull/286 From eosterlund at openjdk.java.net Wed Jul 28 12:52:49 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Wed, 28 Jul 2021 12:52:49 GMT Subject: [jdk17] RFR: 8271352: Extend jcc erratum mitigation to additional processors In-Reply-To: <9c-6bAexA7yYlG2Kd0H7_8w6Faba2UCJrxNQ4aXoMjY=.aa91c4b7-b671-4079-8e4f-a707e2d908e1@github.com> References: <9c-6bAexA7yYlG2Kd0H7_8w6Faba2UCJrxNQ4aXoMjY=.aa91c4b7-b671-4079-8e4f-a707e2d908e1@github.com> Message-ID: <0C4yC_ytQMkx7dC0anfZG2D_zPkcn4ZMjBN6aMwwKyk=.3d812531-3a19-45e4-89d5-61f4b37c42c8@github.com> On Wed, 28 Jul 2021 12:42:10 GMT, Kim Barrett wrote: > Please review this change to default enable the Intel jcc erratum > performance mitigation for family 6 model 165 (0xA5). This seems to reduce > the frequency of an issue that is still under investigation. Marked as reviewed by eosterlund (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk17/pull/286 From thartmann at openjdk.java.net Wed Jul 28 12:52:49 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 28 Jul 2021 12:52:49 GMT Subject: [jdk17] RFR: 8271352: Extend jcc erratum mitigation to additional processors In-Reply-To: <9c-6bAexA7yYlG2Kd0H7_8w6Faba2UCJrxNQ4aXoMjY=.aa91c4b7-b671-4079-8e4f-a707e2d908e1@github.com> References: <9c-6bAexA7yYlG2Kd0H7_8w6Faba2UCJrxNQ4aXoMjY=.aa91c4b7-b671-4079-8e4f-a707e2d908e1@github.com> Message-ID: On Wed, 28 Jul 2021 12:42:10 GMT, Kim Barrett wrote: > Please review this change to default enable the Intel jcc erratum > performance mitigation for family 6 model 165 (0xA5). This seems to reduce > the frequency of an issue that is still under investigation. Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk17/pull/286 From kbarrett at openjdk.java.net Wed Jul 28 14:10:36 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 28 Jul 2021 14:10:36 GMT Subject: [jdk17] RFR: 8271352: Extend jcc erratum mitigation to additional processors In-Reply-To: References: <9c-6bAexA7yYlG2Kd0H7_8w6Faba2UCJrxNQ4aXoMjY=.aa91c4b7-b671-4079-8e4f-a707e2d908e1@github.com> Message-ID: On Wed, 28 Jul 2021 12:47:42 GMT, Tobias Hartmann wrote: >> Please review this change to default enable the Intel jcc erratum >> performance mitigation for family 6 model 165 (0xA5). This seems to reduce >> the frequency of an issue that is still under investigation. > > Marked as reviewed by thartmann (Reviewer). Thanks for reviews @TobiHartmann and @fisk . Currently waiting for jdk17-fix-request approval. 
------------- PR: https://git.openjdk.java.net/jdk17/pull/286 From thartmann at openjdk.java.net Wed Jul 28 15:18:35 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 28 Jul 2021 15:18:35 GMT Subject: [jdk17] RFR: 8271352: Extend jcc erratum mitigation to additional processors In-Reply-To: <9c-6bAexA7yYlG2Kd0H7_8w6Faba2UCJrxNQ4aXoMjY=.aa91c4b7-b671-4079-8e4f-a707e2d908e1@github.com> References: <9c-6bAexA7yYlG2Kd0H7_8w6Faba2UCJrxNQ4aXoMjY=.aa91c4b7-b671-4079-8e4f-a707e2d908e1@github.com> Message-ID: On Wed, 28 Jul 2021 12:42:10 GMT, Kim Barrett wrote: > Please review this change to default enable the Intel jcc erratum > performance mitigation for family 6 model 165 (0xA5). This seems to reduce > the frequency of an issue that is still under investigation. And for the record: I think this is a trivial change. ------------- PR: https://git.openjdk.java.net/jdk17/pull/286 From kbarrett at openjdk.java.net Wed Jul 28 15:34:37 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 28 Jul 2021 15:34:37 GMT Subject: [jdk17] RFR: 8271352: Extend jcc erratum mitigation to additional processors In-Reply-To: References: <9c-6bAexA7yYlG2Kd0H7_8w6Faba2UCJrxNQ4aXoMjY=.aa91c4b7-b671-4079-8e4f-a707e2d908e1@github.com> Message-ID: On Wed, 28 Jul 2021 15:15:08 GMT, Tobias Hartmann wrote: >> Please review this change to default enable the Intel jcc erratum >> performance mitigation for family 6 model 165 (0xA5). This seems to reduce >> the frequency of an issue that is still under investigation. > > And for the record: I think this is a trivial change. Thanks @TobiHartmann - I agree it's a trivial change. It's passed tests and has jdk17-fix approval, so I'm going ahead with pushing now. 
------------- PR: https://git.openjdk.java.net/jdk17/pull/286 From kbarrett at openjdk.java.net Wed Jul 28 15:34:38 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 28 Jul 2021 15:34:38 GMT Subject: [jdk17] Integrated: 8271352: Extend jcc erratum mitigation to additional processors In-Reply-To: <9c-6bAexA7yYlG2Kd0H7_8w6Faba2UCJrxNQ4aXoMjY=.aa91c4b7-b671-4079-8e4f-a707e2d908e1@github.com> References: <9c-6bAexA7yYlG2Kd0H7_8w6Faba2UCJrxNQ4aXoMjY=.aa91c4b7-b671-4079-8e4f-a707e2d908e1@github.com> Message-ID: On Wed, 28 Jul 2021 12:42:10 GMT, Kim Barrett wrote: > Please review this change to default enable the Intel jcc erratum > performance mitigation for family 6 model 165 (0xA5). This seems to reduce > the frequency of an issue that is still under investigation. This pull request has now been integrated. Changeset: 5fcf7208 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk17/commit/5fcf72086ffca85f524fae2d5bd9fd328c9a77e0 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod 8271352: Extend jcc erratum mitigation to additional processors Reviewed-by: thartmann, eosterlund ------------- PR: https://git.openjdk.java.net/jdk17/pull/286 From tschatzl at openjdk.java.net Wed Jul 28 15:39:47 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 28 Jul 2021 15:39:47 GMT Subject: RFR: 8271396: Spelling errors In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 19:12:25 GMT, Emmanuel Bourg wrote: > This PR fixes the following spelling errors: > > choosen -> chosen > commad -> command > hiearchy -> hierarchy > leagacy -> legacy > minium -> minimum > subsytem -> subsystem > unamed -> unnamed Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2385 From github.com+54304+ebourg at openjdk.java.net Wed Jul 28 15:39:47 2021 From: github.com+54304+ebourg at openjdk.java.net (Emmanuel Bourg) Date: Wed, 28 Jul 2021 15:39:47 GMT Subject: RFR: 8271396: Spelling errors Message-ID: This PR fixes the following spelling errors: choosen -> chosen commad -> command hiearchy -> hierarchy leagacy -> legacy minium -> minimum subsytem -> subsystem unamed -> unnamed ------------- Commit messages: - 8271396: Fix spelling errors Changes: https://git.openjdk.java.net/jdk/pull/2385/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2385&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271396 Stats: 78 lines in 34 files changed: 0 ins; 0 del; 78 mod Patch: https://git.openjdk.java.net/jdk/pull/2385.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2385/head:pull/2385 PR: https://git.openjdk.java.net/jdk/pull/2385 From chegar at openjdk.java.net Wed Jul 28 15:39:48 2021 From: chegar at openjdk.java.net (Chris Hegarty) Date: Wed, 28 Jul 2021 15:39:48 GMT Subject: RFR: 8271396: Spelling errors In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 19:12:25 GMT, Emmanuel Bourg wrote: > This PR fixes the following spelling errors: > > choosen -> chosen > commad -> command > hiearchy -> hierarchy > leagacy -> legacy > minium -> minimum > subsytem -> subsystem > unamed -> unnamed Trivially, looks ok to me. ------------- Marked as reviewed by chegar (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2385 From yyang at openjdk.java.net Wed Jul 28 15:39:48 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 28 Jul 2021 15:39:48 GMT Subject: RFR: 8271396: Spelling errors In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 19:12:25 GMT, Emmanuel Bourg wrote: > This PR fixes the following spelling errors: > > choosen -> chosen > commad -> command > hiearchy -> hierarchy > leagacy -> legacy > minium -> minimum > subsytem -> subsystem > unamed -> unnamed Hi, I've filed https://bugs.openjdk.java.net/browse/JDK-8271396 for this PR, you can change the title to "8271396: Spelling errors", openjdk bot will link this PR to the corresponding issue. Also you should resolve conflicts and pass jcheck before integrating it. ------------- PR: https://git.openjdk.java.net/jdk/pull/2385 From github.com+54304+ebourg at openjdk.java.net Wed Jul 28 15:39:48 2021 From: github.com+54304+ebourg at openjdk.java.net (Emmanuel Bourg) Date: Wed, 28 Jul 2021 15:39:48 GMT Subject: RFR: 8271396: Spelling errors In-Reply-To: References: Message-ID: <10SItkPG_iJXTz8Ya0h8wBgoiXRl-aki9ADR8U_jaj8=.f3b7b05e-2b91-4522-9f7f-4b2ea97c8a50@github.com> On Wed, 3 Feb 2021 19:12:25 GMT, Emmanuel Bourg wrote: > This PR fixes the following spelling errors: > > choosen -> chosen > commad -> command > hiearchy -> hierarchy > leagacy -> legacy > minium -> minimum > subsytem -> subsystem > unamed -> unnamed . . Thank you! 
The PR has been updated ------------- PR: https://git.openjdk.java.net/jdk/pull/2385 From iris at openjdk.java.net Wed Jul 28 16:17:31 2021 From: iris at openjdk.java.net (Iris Clark) Date: Wed, 28 Jul 2021 16:17:31 GMT Subject: RFR: 8271396: Spelling errors In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 19:12:25 GMT, Emmanuel Bourg wrote: > This PR fixes the following spelling errors: > > choosen -> chosen > commad -> command > hiearchy -> hierarchy > leagacy -> legacy > minium -> minimum > subsytem -> subsystem > unamed -> unnamed Marked as reviewed by iris (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2385 From psadhukhan at openjdk.java.net Wed Jul 28 16:17:31 2021 From: psadhukhan at openjdk.java.net (Prasanta Sadhukhan) Date: Wed, 28 Jul 2021 16:17:31 GMT Subject: RFR: 8271396: Spelling errors In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 19:12:25 GMT, Emmanuel Bourg wrote: > This PR fixes the following spelling errors: > > choosen -> chosen > commad -> command > hiearchy -> hierarchy > leagacy -> legacy > minium -> minimum > subsytem -> subsystem > unamed -> unnamed Thanks for awt/swing correction. ------------- Marked as reviewed by psadhukhan (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2385 From cjplummer at openjdk.java.net Wed Jul 28 16:25:32 2021 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Wed, 28 Jul 2021 16:25:32 GMT Subject: RFR: 8271396: Spelling errors In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 19:12:25 GMT, Emmanuel Bourg wrote: > This PR fixes the following spelling errors: > > choosen -> chosen > commad -> command > hiearchy -> hierarchy > leagacy -> legacy > minium -> minimum > subsytem -> subsystem > unamed -> unnamed jdi, jvmti, and dcmd related changes look good. ------------- Marked as reviewed by cjplummer (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2385 From aph at openjdk.java.net Wed Jul 28 16:26:02 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 28 Jul 2021 16:26:02 GMT Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v2] In-Reply-To: References: Message-ID:

> C1 has its own code generators for zeroing words. We should use the same logic for C1 and C2, which should give us better C1 performance and result in less code to maintain.
>
> This is one of those patches that's a great joy to write, because it consists mainly of deletions. The code I've added is mostly adapters to allow the C1 code to use the memory-zeroing logic written originally for C2. This means we have less code, but also that VM configuration options (e.g. `BlockZeroingLowLimit`) work with C1 and C2 in the same way.
>
> Measuring the performance of memory allocation is quite tricky, so I've written a JMH test case that measures the raw allocation rate of the JVM for various object sizes. This is inevitably rather noisy because it combines the effects of both the allocation code and other GC-related pauses. Nonetheless, it's a useful sanity check.
>
> The performance differences between old and new are mostly in the noise, but with large allocations the advantage of `DC ZVA` becomes apparent:
>
> old:
>
> RawAllocationRate.arrayTest_C1 8192 thrpt 5 11220.314 ± 336.878 ops/s
> RawAllocationRate.arrayTest_C1 16384 thrpt 5 16655.815 ± 88.577 ops/s
> RawAllocationRate.arrayTest_C1 65536 thrpt 5 28302.661 ± 155.513 ops/s
> RawAllocationRate.arrayTest_C1 131072 thrpt 5 31434.868 ± 211.768 ops/s
>
> new:
>
> RawAllocationRate.arrayTest_C1 8192 thrpt 5 13677.987 ± 143.048 ops/s
> RawAllocationRate.arrayTest_C1 16384 thrpt 5 19517.416 ± 155.004 ops/s
> RawAllocationRate.arrayTest_C1 65536 thrpt 5 37348.536 ± 307.582 ops/s
> RawAllocationRate.arrayTest_C1 131072 thrpt 5 43414.399 ± 58.317 ops/s
>
>
> Full test results, Graviton 2 (i.e. Neoverse N1). Units are megabytes per second,
> object sizes are in bytes:
>
>
> old:
>
> Benchmark (size) Mode Cnt Score Error Units
> RawAllocationRate.arrayTest 32 thrpt 5 5092.798 ± 20.879 ops/s
> RawAllocationRate.arrayTest 64 thrpt 5 9821.608 ± 6.250 ops/s
> RawAllocationRate.arrayTest 256 thrpt 5 14117.192 ± 72.720 ops/s
> RawAllocationRate.arrayTest 1024 thrpt 5 9090.514 ± 40.239 ops/s
> RawAllocationRate.arrayTest 2048 thrpt 5 9842.503 ± 52.744 ops/s
> RawAllocationRate.arrayTest 4096 thrpt 5 9866.179 ± 6.332 ops/s
> RawAllocationRate.arrayTest 8192 thrpt 5 12836.968 ± 14.143 ops/s
> RawAllocationRate.arrayTest 16384 thrpt 5 18970.307 ± 96.903 ops/s
> RawAllocationRate.arrayTest 65536 thrpt 5 36709.095 ± 38.256 ops/s
> RawAllocationRate.arrayTest 131072 thrpt 5 43055.263 ± 60.808 ops/s
> RawAllocationRate.arrayTest_C1 32 thrpt 5 3045.285 ± 23.128 ops/s
> RawAllocationRate.arrayTest_C1 64 thrpt 5 5774.157 ± 52.472 ops/s
> RawAllocationRate.arrayTest_C1 256 thrpt 5 4720.713 ± 9.419 ops/s
> RawAllocationRate.arrayTest_C1 1024 thrpt 5 7457.880 ± 806.208 ops/s
> RawAllocationRate.arrayTest_C1 2048 thrpt 5 8155.046 ± 194.153 ops/s
> RawAllocationRate.arrayTest_C1 4096 thrpt 5 8364.379 ± 127.661 ops/s
> RawAllocationRate.arrayTest_C1 8192 thrpt 5 11220.314 ± 336.878 ops/s
> RawAllocationRate.arrayTest_C1 16384 thrpt 5 16655.815 ± 88.577 ops/s
> RawAllocationRate.arrayTest_C1 65536 thrpt 5 28302.661 ± 155.513 ops/s
> RawAllocationRate.arrayTest_C1 131072 thrpt 5 31434.868 ± 211.768 ops/s
> RawAllocationRate.instanceTest 32 thrpt 5 6667.433 ± 50.031 ops/s
> RawAllocationRate.instanceTest 64 thrpt 5 10669.876 ± 72.109 ops/s
> RawAllocationRate.instanceTest 256 thrpt 5 5483.582 ± 336.743 ops/s
> RawAllocationRate.instanceTest 1024 thrpt 5 9740.872 ± 6.269 ops/s
> RawAllocationRate.instanceTest 2048 thrpt 5 9868.685 ± 51.939 ops/s
> RawAllocationRate.instanceTest 4096 thrpt 5 9881.944 ± 46.306 ops/s
> RawAllocationRate.instanceTest 8192 thrpt 5 13524.791 ± 69.250 ops/s
> RawAllocationRate.instanceTest 16384 thrpt 5 19560.774 ± 109.518 ops/s
> RawAllocationRate.instanceTest 65536 thrpt 5 37510.256 ± 15.586 ops/s
> RawAllocationRate.instanceTest 131072 thrpt 5 43361.887 ± 181.294 ops/s
> RawAllocationRate.instanceTest_C1 32 thrpt 5 2851.135 ± 22.891 ops/s
> RawAllocationRate.instanceTest_C1 64 thrpt 5 5476.183 ± 84.376 ops/s
> RawAllocationRate.instanceTest_C1 256 thrpt 5 5105.347 ± 35.389 ops/s
> RawAllocationRate.instanceTest_C1 1024 thrpt 5 7380.805 ± 3.944 ops/s
> RawAllocationRate.instanceTest_C1 2048 thrpt 5 8963.428 ± 83.857 ops/s
> RawAllocationRate.instanceTest_C1 4096 thrpt 5 9257.715 ± 52.647 ops/s
> RawAllocationRate.instanceTest_C1 8192 thrpt 5 11655.359 ± 70.209 ops/s
> RawAllocationRate.instanceTest_C1 16384 thrpt 5 17084.813 ± 91.150 ops/s
> RawAllocationRate.instanceTest_C1 65536 thrpt 5 28682.783 ± 176.563 ops/s
> RawAllocationRate.instanceTest_C1 131072 thrpt 5 31268.318 ± 221.486 ops/s
>
> new:
>
> Benchmark (size) Mode Cnt Score Error Units
> RawAllocationRate.arrayTest 32 thrpt 5 5355.477 ± 43.045 ops/s
> RawAllocationRate.arrayTest 64 thrpt 5 9825.067 ± 55.493 ops/s
> RawAllocationRate.arrayTest 256 thrpt 5 13984.865 ± 125.125 ops/s
> RawAllocationRate.arrayTest 1024 thrpt 5 9025.380 ± 48.921 ops/s
> RawAllocationRate.arrayTest 2048 thrpt 5 9844.463 ± 6.780 ops/s
> RawAllocationRate.arrayTest 4096 thrpt 5 9866.566 ± 48.659 ops/s
> RawAllocationRate.arrayTest 8192 thrpt 5 12753.622 ± 67.211 ops/s
> RawAllocationRate.arrayTest 16384 thrpt 5 18890.419 ± 14.152 ops/s
> RawAllocationRate.arrayTest 65536 thrpt 5 37322.124 ± 269.352 ops/s
> RawAllocationRate.arrayTest 131072 thrpt 5 43017.952 ± 204.057 ops/s
> RawAllocationRate.arrayTest_C1 32 thrpt 5 3102.221 ± 13.811 ops/s
> RawAllocationRate.arrayTest_C1 64 thrpt 5 5947.419 ± 36.408 ops/s
> RawAllocationRate.arrayTest_C1 256 thrpt 5 5124.479 ± 548.617 ops/s
> RawAllocationRate.arrayTest_C1 1024 thrpt 5 9459.376 ± 716.317 ops/s
> RawAllocationRate.arrayTest_C1 2048 thrpt 5 9840.594 ± 15.922 ops/s
> RawAllocationRate.arrayTest_C1 4096 thrpt 5 9860.274 ± 56.088 ops/s
> RawAllocationRate.arrayTest_C1 8192 thrpt 5 13677.987 ± 143.048 ops/s
> RawAllocationRate.arrayTest_C1 16384 thrpt 5 19517.416 ± 155.004 ops/s
> RawAllocationRate.arrayTest_C1 65536 thrpt 5 37348.536 ± 307.582 ops/s
> RawAllocationRate.arrayTest_C1 131072 thrpt 5 43414.399 ± 58.317 ops/s
> RawAllocationRate.instanceTest 32 thrpt 5 6620.452 ± 137.048 ops/s
> RawAllocationRate.instanceTest 64 thrpt 5 9850.677 ± 6.417 ops/s
> RawAllocationRate.instanceTest 256 thrpt 5 5533.512 ± 129.334 ops/s
> RawAllocationRate.instanceTest 1024 thrpt 5 9829.806 ± 7.555 ops/s
> RawAllocationRate.instanceTest 2048 thrpt 5 9857.707 ± 51.541 ops/s
> RawAllocationRate.instanceTest 4096 thrpt 5 9957.300 ± 7.115 ops/s
> RawAllocationRate.instanceTest 8192 thrpt 5 13662.581 ± 85.225 ops/s
> RawAllocationRate.instanceTest 16384 thrpt 5 19571.796 ± 120.962 ops/s
> RawAllocationRate.instanceTest 65536 thrpt 5 37401.527 ± 67.260 ops/s
> RawAllocationRate.instanceTest 131072 thrpt 5 43327.339 ± 35.077 ops/s
> RawAllocationRate.instanceTest_C1 32 thrpt 5 2842.031 ± 47.924 ops/s
> RawAllocationRate.instanceTest_C1 64 thrpt 5 5359.357 ± 53.031 ops/s
> RawAllocationRate.instanceTest_C1 256 thrpt 5 5081.287 ± 57.737 ops/s
> RawAllocationRate.instanceTest_C1 1024 thrpt 5 8372.330 ± 267.016 ops/s
> RawAllocationRate.instanceTest_C1 2048 thrpt 5 9470.224 ± 250.706 ops/s
> RawAllocationRate.instanceTest_C1 4096 thrpt 5 9843.936 ± 52.825 ops/s
> RawAllocationRate.instanceTest_C1 8192 thrpt 5 13695.863 ± 80.433 ops/s
> RawAllocationRate.instanceTest_C1 16384 thrpt 5 19495.110 ± 116.300 ops/s
> RawAllocationRate.instanceTest_C1 65536 thrpt 5 37448.948 ± 291.917 ops/s
> RawAllocationRate.instanceTest_C1 131072 thrpt 5 43443.406 ± 267.236 ops/s

Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: D'oh! Stupid fencepost error. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4919/files - new: https://git.openjdk.java.net/jdk/pull/4919/files/5a1acc81..31f238e6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4919&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4919&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4919.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4919/head:pull/4919 PR: https://git.openjdk.java.net/jdk/pull/4919 From jboes at openjdk.java.net Wed Jul 28 16:29:29 2021 From: jboes at openjdk.java.net (Julia Boes) Date: Wed, 28 Jul 2021 16:29:29 GMT Subject: RFR: 8271396: Spelling errors In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 19:12:25 GMT, Emmanuel Bourg wrote: > This PR fixes the following spelling errors: > > choosen -> chosen > commad -> command > hiearchy -> hierarchy > leagacy -> legacy > minium -> minimum > subsytem -> subsystem > unamed -> unnamed I'm happy to sponsor this change, but could you please update the copyright year where necessary, e.g.
in src/java.desktop/unix/classes/sun/awt/X11/XMSelection.java: `Copyright (c) 2003, 2018, Oracle...` -> `Copyright (c) 2003, 2021, Oracle...` ------------- PR: https://git.openjdk.java.net/jdk/pull/2385 From iklam at openjdk.java.net Wed Jul 28 16:41:05 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 28 Jul 2021 16:41:05 GMT Subject: RFR: 8270061: Change parameter order of ResourceHashtable [v3] In-Reply-To: References: Message-ID: > The template parameter for ResourceHashtable is currently in this order: > > template< > typename K, typename V, > unsigned (*HASH) (K const&) = primitive_hash, > bool (*EQUALS)(K const&, K const&) = primitive_equals, > unsigned SIZE = 256, > ResourceObj::allocation_type ALLOC_TYPE = ResourceObj::RESOURCE_AREA, > MEMFLAGS MEM_TYPE = mtInternal > > > class ResourceHashtable {...} > > > However, more often than not, default values of `HASH` and `EQUALS` will be used, where the other parameters may need to be specified. > > We should move the `HASH` and `EQUALS` parameters to the end of the parameter list. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into 8270061-reorder-resource-hash-params - Merge branch 'master' of https://github.com/openjdk/jdk into 8270061-reorder-resource-hash-params - 8270061: Change parameter order of ResourceHashtable ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4912/files - new: https://git.openjdk.java.net/jdk/pull/4912/files/3a87c5db..56abd91e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4912&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4912&range=01-02 Stats: 4616 lines in 66 files changed: 81 ins; 4328 del; 207 mod Patch: https://git.openjdk.java.net/jdk/pull/4912.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4912/head:pull/4912 PR: https://git.openjdk.java.net/jdk/pull/4912 From github.com+54304+ebourg at openjdk.java.net Wed Jul 28 17:12:04 2021 From: github.com+54304+ebourg at openjdk.java.net (Emmanuel Bourg) Date: Wed, 28 Jul 2021 17:12:04 GMT Subject: RFR: 8271396: Spelling errors [v2] In-Reply-To: References: Message-ID: > This PR fixes the following spelling errors: > > choosen -> chosen > commad -> command > hiearchy -> hierarchy > leagacy -> legacy > minium -> minimum > subsytem -> subsystem > unamed -> unnamed Emmanuel Bourg has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains one additional commit since the last revision: 8271396: Fix spelling errors choosen -> chosen commad -> command hiearchy -> hierarchy leagacy -> legacy minium -> minimum subsytem -> subsystem unamed -> unnamed ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2385/files - new: https://git.openjdk.java.net/jdk/pull/2385/files/31cfcba7..6e1be4f6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2385&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2385&range=00-01 Stats: 61642 lines in 1361 files changed: 29147 ins; 26026 del; 6469 mod Patch: https://git.openjdk.java.net/jdk/pull/2385.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2385/head:pull/2385 PR: https://git.openjdk.java.net/jdk/pull/2385 From github.com+54304+ebourg at openjdk.java.net Wed Jul 28 17:12:05 2021 From: github.com+54304+ebourg at openjdk.java.net (Emmanuel Bourg) Date: Wed, 28 Jul 2021 17:12:05 GMT Subject: RFR: 8271396: Spelling errors In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 16:26:49 GMT, Julia Boes wrote: >> This PR fixes the following spelling errors: >> >> choosen -> chosen >> commad -> command >> hiearchy -> hierarchy >> leagacy -> legacy >> minium -> minimum >> subsytem -> subsystem >> unamed -> unnamed > > I'm happy to sponsor this change, but could you please update the copyright year where necessary, e.g. 
in > src/java.desktop/unix/classes/sun/awt/X11/XMSelection.java: > `Copyright (c) 2003, 2018, Oracle...` -> `Copyright (c) 2003, 2021, Oracle...` @FrauBoes thank you, the PR has been updated to modify the copyright year ------------- PR: https://git.openjdk.java.net/jdk/pull/2385 From kcr at openjdk.java.net Wed Jul 28 17:26:35 2021 From: kcr at openjdk.java.net (Kevin Rushforth) Date: Wed, 28 Jul 2021 17:26:35 GMT Subject: RFR: 8271396: Spelling errors In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 17:08:01 GMT, Emmanuel Bourg wrote: >> I'm happy to sponsor this change, but could you please update the copyright year where necessary, e.g. in >> src/java.desktop/unix/classes/sun/awt/X11/XMSelection.java: >> `Copyright (c) 2003, 2018, Oracle...` -> `Copyright (c) 2003, 2021, Oracle...` > > @FrauBoes thank you, the PR has been updated to modify the copyright year @ebourg for future PRs please do not force push after the PR is out for review. Just push incremental commits normally. The Skara tooling will squash them all into a single commit. ------------- PR: https://git.openjdk.java.net/jdk/pull/2385 From iris at openjdk.java.net Wed Jul 28 17:26:34 2021 From: iris at openjdk.java.net (Iris Clark) Date: Wed, 28 Jul 2021 17:26:34 GMT Subject: RFR: 8271396: Spelling errors [v2] In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 17:12:04 GMT, Emmanuel Bourg wrote: >> This PR fixes the following spelling errors: >> >> choosen -> chosen >> commad -> command >> hiearchy -> hierarchy >> leagacy -> legacy >> minium -> minimum >> subsytem -> subsystem >> unamed -> unnamed > > Emmanuel Bourg has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains one additional commit since the last revision: > > 8271396: Fix spelling errors > > choosen -> chosen > commad -> command > hiearchy -> hierarchy > leagacy -> legacy > minium -> minimum > subsytem -> subsystem > unamed -> unnamed Marked as reviewed by iris (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2385 From github.com+54304+ebourg at openjdk.java.net Wed Jul 28 17:26:35 2021 From: github.com+54304+ebourg at openjdk.java.net (Emmanuel Bourg) Date: Wed, 28 Jul 2021 17:26:35 GMT Subject: RFR: 8271396: Spelling errors In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 17:20:37 GMT, Kevin Rushforth wrote: >> @FrauBoes thank you, the PR has been updated to modify the copyright year > > @ebourg for future PRs please do not force push after the PR is out for review. Just push incremental commits normally. The Skara tooling will squash them all into a single commit. @kevinrushforth I'll do that, thank you for the hint ------------- PR: https://git.openjdk.java.net/jdk/pull/2385 From iklam at openjdk.java.net Wed Jul 28 20:42:37 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 28 Jul 2021 20:42:37 GMT Subject: RFR: 8270061: Change parameter order of ResourceHashtable In-Reply-To: References: Message-ID: On Tue, 27 Jul 2021 20:50:35 GMT, Coleen Phillimore wrote: >> The template parameter for ResourceHashtable is currently in this order: >> >> template< >> typename K, typename V, >> unsigned (*HASH) (K const&) = primitive_hash, >> bool (*EQUALS)(K const&, K const&) = primitive_equals, >> unsigned SIZE = 256, >> ResourceObj::allocation_type ALLOC_TYPE = ResourceObj::RESOURCE_AREA, >> MEMFLAGS MEM_TYPE = mtInternal >> > >> class ResourceHashtable {...} >> >> >> However, more often than not, default values of `HASH` and `EQUALS` will be used, where the other parameters may need to be specified. >> >> We should move the `HASH` and `EQUALS` parameters to the end of the parameter list. 
> > Oh but MEMFLAGS is generally found after the allocation type and I thought there were more resource allocated resource hashtables (at least initially). Thanks @coleenp and @tstuefe for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/4912 From iklam at openjdk.java.net Wed Jul 28 20:42:37 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 28 Jul 2021 20:42:37 GMT Subject: Integrated: 8270061: Change parameter order of ResourceHashtable In-Reply-To: References: Message-ID: On Tue, 27 Jul 2021 16:10:48 GMT, Ioi Lam wrote: > The template parameter for ResourceHashtable is currently in this order: > > template< > typename K, typename V, > unsigned (*HASH) (K const&) = primitive_hash, > bool (*EQUALS)(K const&, K const&) = primitive_equals, > unsigned SIZE = 256, > ResourceObj::allocation_type ALLOC_TYPE = ResourceObj::RESOURCE_AREA, > MEMFLAGS MEM_TYPE = mtInternal > > > class ResourceHashtable {...} > > > However, more often than not, default values of `HASH` and `EQUALS` will be used, where the other parameters may need to be specified. > > We should move the `HASH` and `EQUALS` parameters to the end of the parameter list. This pull request has now been integrated. 
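A minimal sketch of the reordering discussed in this thread, assuming toy stand-ins (`toy_hash`, `toy_equals`, `OldOrderTable`, `NewOrderTable`, `bucket`) rather than HotSpot's real `primitive_hash`, `primitive_equals`, or `ResourceHashtable`; the `ALLOC_TYPE` and `MEM_TYPE` parameters are left out for brevity. It only illustrates why a caller that wants a non-default `SIZE` benefits when `HASH` and `EQUALS` come last:

```cpp
#include <cassert>

// Hypothetical stand-ins for the default hash and equality functions.
template <typename K> unsigned toy_hash(K const& k) { return static_cast<unsigned>(k); }
template <typename K> bool toy_equals(K const& a, K const& b) { return a == b; }

// Old-style order: HASH and EQUALS precede SIZE.
template <typename K, typename V,
          unsigned (*HASH)(K const&) = toy_hash<K>,
          bool (*EQUALS)(K const&, K const&) = toy_equals<K>,
          unsigned SIZE = 256>
struct OldOrderTable {
  static constexpr unsigned size = SIZE;
  static unsigned bucket(K const& k) { return HASH(k) % SIZE; }
};

// Proposed order: SIZE first, HASH and EQUALS at the end.
template <typename K, typename V,
          unsigned SIZE = 256,
          unsigned (*HASH)(K const&) = toy_hash<K>,
          bool (*EQUALS)(K const&, K const&) = toy_equals<K>>
struct NewOrderTable {
  static constexpr unsigned size = SIZE;
  static unsigned bucket(K const& k) { return HASH(k) % SIZE; }
};

// Changing only SIZE: the old order forces the defaults to be spelled out...
using OldSmall = OldOrderTable<int, int, toy_hash<int>, toy_equals<int>, 16>;
// ...while the new order lets them stay implicit.
using NewSmall = NewOrderTable<int, int, 16>;
using NewDefault = NewOrderTable<int, int>;

static_assert(OldSmall::size == 16, "old order: SIZE is the fifth argument");
static_assert(NewSmall::size == 16, "new order: SIZE is the third argument");
```

Both aliases describe the same table; only the amount of boilerplate at the use site differs.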
Changeset: 357947ac Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/357947acd80b50b1f26679608245de1f9566163e Stats: 97 lines in 18 files changed: 24 ins; 48 del; 25 mod 8270061: Change parameter order of ResourceHashtable Reviewed-by: coleenp, stuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/4912 From svkamath at openjdk.java.net Wed Jul 28 23:14:04 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Wed, 28 Jul 2021 23:14:04 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v5] In-Reply-To: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: > I would like to submit AES-GCM optimization for x86_64 architectures supporting AVX3+VAES (Evex encoded AES). This optimization interleaves AES and GHASH operations. > Performance gain of ~1.5x - 2x for message sizes 8k and above. 
Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Extended jtreg test case, made changes to .cpp and .java files ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4019/files - new: https://git.openjdk.java.net/jdk/pull/4019/files/4a36816f..108d4a3e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4019&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4019&range=03-04 Stats: 110 lines in 9 files changed: 53 ins; 29 del; 28 mod Patch: https://git.openjdk.java.net/jdk/pull/4019.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4019/head:pull/4019 PR: https://git.openjdk.java.net/jdk/pull/4019 From jwilhelm at openjdk.java.net Thu Jul 29 00:07:59 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 29 Jul 2021 00:07:59 GMT Subject: RFR: Merge jdk17 Message-ID: Forwardport JDK 17 -> JDK 18 ------------- Commit messages: - Merge - 8271403: mark hotspot runtime/memory tests which ignore external VM flags - 8271402: mark hotspot runtime/os tests which ignore external VM flags - 8271412: ProblemList javax/sound/midi/Sequencer/Looping.java - 8271251: JavaThread::java_suspend() fails with "fatal error: Illegal threadstate encountered: 6" - 8271174: runtime/ClassFile/UnsupportedClassFileVersion.java can be run in driver mode - 8271352: Extend jcc erratum mitigation to additional processors - 8270908: TestParallelRefProc fails on single core machines The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=4925&range=00.0 - jdk17: https://webrevs.openjdk.java.net/?repo=jdk&pr=4925&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/4925/files Stats: 43 lines in 10 files changed: 28 ins; 7 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/4925.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4925/head:pull/4925 PR: 
https://git.openjdk.java.net/jdk/pull/4925 From iignatyev at openjdk.java.net Thu Jul 29 00:53:12 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Thu, 29 Jul 2021 00:53:12 GMT Subject: [jdk17] RFR: 8067223: [TESTBUG] Rename Whitebox API package Message-ID: <39PNb3wa-43yB5IFR3XroqFf4vMNZPitzF85lO1Gw58=.f2587741-d2c4-467d-b722-b17269587e7a@github.com> Hi all, could you please review this big tedious and trivial(-ish) patch which moves `sun.hotspot.WhiteBox` and related classes to `jdk.test.whitebox` package? the majority of the patch is the following substitutions: - `s~sun/hotspot/WhiteBox~jdk/test/whitebox/WhiteBox~g` - `s/sun.hotspot.parser/jdk.test.whitebox.parser/g` - `s/sun.hotspot.cpuinfo/jdk.test.whitebox.cpuinfo/g` - `s/sun.hotspot.code/jdk.test.whitebox.code/g` - `s/sun.hotspot.gc/jdk.test.whitebox.gc/g` - `s/sun.hotspot.WhiteBox/jdk.test.whitebox.WhiteBox/g` testing: tier1-4 Thanks, -- Igor ------------- Commit messages: - copyright year - update TEST.ROOT - jdk: s/sun\.hotspot\.gc/jdk.test.whitebox.gc/g - jdk: s/sun\.hotspot\.code/jdk.test.whitebox.code/g - jdk: s/sun\.hotspot\.WhiteBox/jdk.test.whitebox.WhiteBox/g - hotspot: 's~sun/hotspot/WhiteBox~jdk/test/whitebox/WhiteBox~g' - hotspot: s/sun\.hotspot\.parser/jdk.test.whitebox.parser/g - hotspot: s/sun\.hotspot\.cpuinfo/jdk.test.whitebox.cpuinfo/g - hotspot: s/sun\.hotspot\.code/jdk.test.whitebox.code/g - hotspot: 's/sun\.hotspot\.gc/jdk.test.whitebox.gc/g' - ... 
and 9 more: https://git.openjdk.java.net/jdk17/compare/c8ae7e5b...8f12f2cf Changes: https://git.openjdk.java.net/jdk17/pull/290/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk17&pr=290&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8067223 Stats: 2310 lines in 949 files changed: 0 ins; 0 del; 2310 mod Patch: https://git.openjdk.java.net/jdk17/pull/290.diff Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/290/head:pull/290 PR: https://git.openjdk.java.net/jdk17/pull/290 From jwilhelm at openjdk.java.net Thu Jul 29 01:11:31 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 29 Jul 2021 01:11:31 GMT Subject: Integrated: Merge jdk17 In-Reply-To: References: Message-ID: <00oNYsWDQ206SlNU0nmvAjvt_1LYzEAbgUxJewtcVmI=.e55a1b2c-07ee-4639-a674-c0fba4a18309@github.com> On Wed, 28 Jul 2021 23:58:29 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 17 -> JDK 18 This pull request has now been integrated. Changeset: a0504cff Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/a0504cff9f91617fb9810333f3656dba196218d6 Stats: 43 lines in 10 files changed: 28 ins; 7 del; 8 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/4925 From kvn at openjdk.java.net Thu Jul 29 01:33:29 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 29 Jul 2021 01:33:29 GMT Subject: [jdk17] RFR: 8067223: [TESTBUG] Rename Whitebox API package In-Reply-To: <39PNb3wa-43yB5IFR3XroqFf4vMNZPitzF85lO1Gw58=.f2587741-d2c4-467d-b722-b17269587e7a@github.com> References: <39PNb3wa-43yB5IFR3XroqFf4vMNZPitzF85lO1Gw58=.f2587741-d2c4-467d-b722-b17269587e7a@github.com> Message-ID: <_3viZFs2iZ00hMdeP3Nq9gTmwfIHeTzZ7NVT8thQ1BM=.10005882-f135-4592-a663-b75747da3f6c@github.com> On Wed, 28 Jul 2021 17:13:49 GMT, Igor Ignatyev wrote: > Hi all, > > could you please review this big tedious and trivial(-ish) patch which moves `sun.hotspot.WhiteBox` and related classes to `jdk.test.whitebox` package? 
> > the majority of the patch is the following substitutions: > - `s~sun/hotspot/WhiteBox~jdk/test/whitebox/WhiteBox~g` > - `s/sun.hotspot.parser/jdk.test.whitebox.parser/g` > - `s/sun.hotspot.cpuinfo/jdk.test.whitebox.cpuinfo/g` > - `s/sun.hotspot.code/jdk.test.whitebox.code/g` > - `s/sun.hotspot.gc/jdk.test.whitebox.gc/g` > - `s/sun.hotspot.WhiteBox/jdk.test.whitebox.WhiteBox/g` > > testing: tier1-4 > > Thanks, > -- Igor I know that test fixes can be pushed during RDP2 without approval. But this one is very big and it is not a fix - it is an enhancement. What is the reason you want to push it into JDK 17 just a few days before the first Release Candidate, instead of pushing it into JDK 18? And I can't even review it because the GitHub UI hangs on these big changes. ------------- PR: https://git.openjdk.java.net/jdk17/pull/290 From dholmes at openjdk.java.net Thu Jul 29 01:59:34 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 29 Jul 2021 01:59:34 GMT Subject: [jdk17] RFR: 8067223: [TESTBUG] Rename Whitebox API package In-Reply-To: <_3viZFs2iZ00hMdeP3Nq9gTmwfIHeTzZ7NVT8thQ1BM=.10005882-f135-4592-a663-b75747da3f6c@github.com> References: <39PNb3wa-43yB5IFR3XroqFf4vMNZPitzF85lO1Gw58=.f2587741-d2c4-467d-b722-b17269587e7a@github.com> <_3viZFs2iZ00hMdeP3Nq9gTmwfIHeTzZ7NVT8thQ1BM=.10005882-f135-4592-a663-b75747da3f6c@github.com> Message-ID: On Thu, 29 Jul 2021 01:30:37 GMT, Vladimir Kozlov wrote: >> Hi all, >> >> could you please review this big tedious and trivial(-ish) patch which moves `sun.hotspot.WhiteBox` and related classes to `jdk.test.whitebox` package? 
>> >> the majority of the patch is the following substitutions: >> - `s~sun/hotspot/WhiteBox~jdk/test/whitebox/WhiteBox~g` >> - `s/sun.hotspot.parser/jdk.test.whitebox.parser/g` >> - `s/sun.hotspot.cpuinfo/jdk.test.whitebox.cpuinfo/g` >> - `s/sun.hotspot.code/jdk.test.whitebox.code/g` >> - `s/sun.hotspot.gc/jdk.test.whitebox.gc/g` >> - `s/sun.hotspot.WhiteBox/jdk.test.whitebox.WhiteBox/g` >> >> testing: tier1-4 >> >> Thanks, >> -- Igor > > I know that test fixes can be pushed during RDP2 without approval. > But this one is very big and it is not a fix - it is an enhancement. > > What is the reason you want to push it into JDK 17 just a few days before the first Release Candidate, instead of pushing it into JDK 18? > > And I can't even review it because the GitHub UI hangs on these big changes. I agree with @vnkozlov: this is too big and disruptive for 17 at this stage of the release. I also think a better approach to this cleanup would have been to copy the WhiteBox class to the new package structure, then update the tests in chunks, then remove the old WhiteBox classes when that is complete. Thanks, David ------------- PR: https://git.openjdk.java.net/jdk17/pull/290 From dholmes at openjdk.java.net Thu Jul 29 05:27:31 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 29 Jul 2021 05:27:31 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Thu, 22 Jul 2021 14:58:47 GMT, Thomas Stuefe wrote: > Short: this patch makes NMT available in custom-launcher scenarios and during gtests. It simplifies NMT initialization. It adds a lot of NMT-specific testing, cleans them up and makes them sideeffect-free. > > --------- > > NMT continues to be an extremely useful tool for SAP to tackle memory problems in the JVM. > > However, NMT is of limited use due to the following restrictions: > > - NMT cannot be used if the hotspot is embedded into a custom launcher unless the launcher actively cooperates. 
Just creating and invoking the JVM is not enough, it needs to do some steps prior to loading the hotspot. This limitation is not well known (nor, do I believe, documented). Many products don't do this, e.g., you cannot use NMT with IntelliJ. For us at SAP this problem limits NMT usefulness greatly since our VMs are often embedded into custom launchers and modifying every launcher is impossible. > - Worse, if that custom launcher links the libjvm *statically* there is just no way to activate NMT at all. This is the reason NMT cannot be used in the `gtestlauncher`. > - Related to that is that we cannot pass NMT options via `JAVA_TOOL_OPTIONS` and `-XX:Flags=`. > - The fact that NMT cannot be used in gtests is really a pity since it would allow us to both test NMT itself more rigorously and check for memory leaks while testing other stuff. > > The reason for all this is that NMT initialization happens very early, on the first call to `os::malloc()`. And those calls happen already during dynamic C++ initialization - a long time before the VM gets around parsing arguments. So, regular VM argument parsing is too late to parse NMT arguments. > > The current solution is to pass NMT arguments via a specially prepared environment variable: `NMT_LEVEL_=`. That environment variable has to be set by the embedding launcher, before it loads the libjvm. Since its name contains the PID, we cannot even set that variable in the shell before starting the launcher. > > All that means that every launcher needs to especially parse and process the NMT arguments given at the command line (or via whatever method) and prepare the environment variable. `java` itself does this. This only works before the libjvm.so is loaded, before its dynamic C++ initialization. For that reason, it does not work if the launcher links statically against the hotspot, since in that case C++ initialization of the launcher and hotspot are folded into one phase with no possibility of executing code beforehand. 
> > And since it bypasses argument handling in the VM, it bypasses a number of argument processing ways, e.g., `JAVA_TOOL_OPTIONS`. > > ------ > > This patch fixes these shortcomings by making NMT late-initializable: it can now be initialized after normal VM argument parsing, like all other parts of the VM. This greatly simplifies NMT initialization and makes it work automagically for every third party launcher, as well as within our gtests. > > The glaring problem with late-initializing NMT is the NMT malloc headers. If we rule out just always having them (unacceptable in terms of memory overhead), there is no safe way to determine, in os::free(), if an allocation came from before or after NMT initialization ran, and therefore what to do with its malloc headers. For a more extensive explanation, please see the comment block `nmtPreInit.hpp` and the discussion with @kimbarrett and @zhengyu123 in the JBS comment section. > > The heart of this patch is a new way to track early, pre-NMT-init allocations. These are tracked via a lookup table. This was a suggestion by Kim and it worked out well. > > Changes in detail: > > - pre-NMT-init handling: > - the new files `nmtPreInit.hpp/cpp` take case of NMT pre-init handling. They contain a small global lookup table managing C-heap blocks allocated in the pre-NMT-init phase. > - `os::malloc()/os::realloc()/os::free()` defer to this code before doing anything else. > - Please see the extensive comment block at the start of `nmtPreinit.hpp` explaining the details. > > - Changes to NMT: > - Before, NMT initialization was spread over two phases, `initialize()` and `late_initialize()`. Those were merged into one and simplified - there is only one initialization now which happens after argument parsing. > - Minor changes were needed for the `NMT_TrackingLevel` enum - to simplify code, I changed NMT_unknown to be numerically 0. 
A new comment block in `nmtCommon.hpp` now clearly specifies what's what, including allowed level state transitions. > - New utility functions to translate tracking level from/to strings added to `NMTUtil` > - NMT has never been able to handle virtual memory allocations before initialization, which is fine since os::reserve_memory() is not called before VM parses arguments. We now assert that. > - All code outside the VM handling NMT initialization (eg. libjli) has been removed, as has the code testing it. > > - Gtests: > - Some existing gtests had to be modified: before, they all changed global state (turning NMT on/off) before testing. This is not allowed anymore, to keep NMT simple. Also, this pattern disturbed other tests. > - The new way to test is to passively check whether NMT has been switched on or off, and do tests accordingly: if on, full tests, if off, test just what makes sense in off-state. That does not disturb neighboring tests, gives us actually better coverage all around. > - It is now possible to start the gtestlauncher with NMT on! Which additionally gives us good coverage. > - To actually do gtests with NMT - since it's disabled by default - we now run NMT-enabled gtests as part of the hotspot jtreg NMT wrapper. This pattern we have done for a number of other facitilites, see all the tests in test/hotspot/jtreg/gtest.. . It works very well. > - Finally, a new gtest has been written to test the NMT preinit lookup map in isolation, placed in `gtest/nmt/test_nmtpreinitmap.cpp`. > > - jtreg: > - A new test has been added, `runtime/NMT/NMTInitializationTest.java`, testing NMT initialization in the face of many many VM arguments. > > ------------- > > Tests: > - ran manually all new tests on 64-bit and 32-bit Linux > - GHAs > - The patch has been active in SAPs test systems for a while now. 
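The pre-init tracking idea described above can be sketched roughly as follows. This is a hypothetical, single-threaded illustration and not the code in `nmtPreInit.hpp`: the names `PreInitTable` and `ToyAllocator`, the table size of 251, the 16-byte header, and the choice to free pre-init blocks after initialization are all assumptions made for the example. The point it shows is that a pointer found in the lookup table is known to carry no malloc header, so the free path can pick the right treatment without inspecting the block itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch: a chained hash set of raw pointers that records
// blocks allocated before tracking was initialized.
class PreInitTable {
  static constexpr std::size_t kSlots = 251;  // a prime; the value is an assumption
  struct Entry { void* ptr; Entry* next; };
  Entry* _buckets[kSlots] = {};

  static std::size_t slot_for(const void* p) {
    return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p) % kSlots);
  }

public:
  void add(void* p) {
    Entry* e = static_cast<Entry*>(std::malloc(sizeof(Entry)));
    if (e == nullptr) std::abort();
    const std::size_t i = slot_for(p);
    e->ptr = p;
    e->next = _buckets[i];
    _buckets[i] = e;
  }

  bool contains(const void* p) const {
    for (const Entry* e = _buckets[slot_for(p)]; e != nullptr; e = e->next) {
      if (e->ptr == p) return true;
    }
    return false;
  }

  // Unlinks p if present; returns whether it was found.
  bool remove(const void* p) {
    for (Entry** link = &_buckets[slot_for(p)]; *link != nullptr; link = &(*link)->next) {
      if ((*link)->ptr == p) {
        Entry* dead = *link;
        *link = dead->next;
        std::free(dead);
        return true;
      }
    }
    return false;
  }
};

// Toy allocator: header-less blocks before init, header-prefixed blocks after.
struct ToyAllocator {
  static constexpr std::size_t kHeader = 16;  // hypothetical header size
  PreInitTable early;
  bool initialized = false;

  void* allocate(std::size_t n) {
    if (!initialized) {
      void* p = std::malloc(n);
      if (p != nullptr) early.add(p);  // remember: this block has no header
      return p;
    }
    char* raw = static_cast<char*>(std::malloc(n + kHeader));
    return raw == nullptr ? nullptr : raw + kHeader;
  }

  // Returns true if p was a pre-init (header-less) block.
  bool release(void* p) {
    if (p == nullptr) return false;
    if (early.remove(p)) {
      std::free(p);
      return true;
    }
    std::free(static_cast<char*>(p) - kHeader);  // step back over the header
    return false;
  }
};
```

A real implementation also has to cover `realloc` and concurrent access, which this sketch deliberately ignores.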
src/hotspot/share/services/memTracker.cpp line 147: > 145: if (tracking_level() >= NMT_summary) { > 146: report(true, output, MemReporterBase::default_scale); // just print summary for error case. > 147: output->print("Preinit state:"); print_cr? Or need space after ':' ? ------------- PR: https://git.openjdk.java.net/jdk/pull/4874 From dholmes at openjdk.java.net Thu Jul 29 06:40:30 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 29 Jul 2021 06:40:30 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Thu, 22 Jul 2021 14:58:47 GMT, Thomas Stuefe wrote: > Short: this patch makes NMT available in custom-launcher scenarios and during gtests. It simplifies NMT initialization. It adds a lot of NMT-specific testing, cleans them up and makes them sideeffect-free. > > --------- > > NMT continues to be an extremely useful tool for SAP to tackle memory problems in the JVM. > > However, NMT is of limited use due to the following restrictions: > > - NMT cannot be used if the hotspot is embedded into a custom launcher unless the launcher actively cooperates. Just creating and invoking the JVM is not enough, it needs to do some steps prior to loading the hotspot. This limitation is not well known (nor, do I believe, documented). Many products don't do this, e.g., you cannot use NMT with IntelliJ. For us at SAP this problem limits NMT usefulness greatly since our VMs are often embedded into custom launchers and modifying every launcher is impossible. > - Worse, if that custom launcher links the libjvm *statically* there is just no way to activate NMT at all. This is the reason NMT cannot be used in the `gtestlauncher`. > - Related to that is that we cannot pass NMT options via `JAVA_TOOL_OPTIONS` and `-XX:Flags=`. > - The fact that NMT cannot be used in gtests is really a pity since it would allow us to both test NMT itself more rigorously and check for memory leaks while testing other stuff. 
> > The reason for all this is that NMT initialization happens very early, on the first call to `os::malloc()`. And those calls happen already during dynamic C++ initialization - a long time before the VM gets around parsing arguments. So, regular VM argument parsing is too late to parse NMT arguments. > > The current solution is to pass NMT arguments via a specially prepared environment variable: `NMT_LEVEL_=`. That environment variable has to be set by the embedding launcher, before it loads the libjvm. Since its name contains the PID, we cannot even set that variable in the shell before starting the launcher. > > All that means that every launcher needs to especially parse and process the NMT arguments given at the command line (or via whatever method) and prepare the environment variable. `java` itself does this. This only works before the libjvm.so is loaded, before its dynamic C++ initialization. For that reason, it does not work if the launcher links statically against the hotspot, since in that case C++ initialization of the launcher and hotspot are folded into one phase with no possibility of executing code beforehand. > > And since it bypasses argument handling in the VM, it bypasses a number of argument processing ways, e.g., `JAVA_TOOL_OPTIONS`. > > ------ > > This patch fixes these shortcomings by making NMT late-initializable: it can now be initialized after normal VM argument parsing, like all other parts of the VM. This greatly simplifies NMT initialization and makes it work automagically for every third party launcher, as well as within our gtests. > > The glaring problem with late-initializing NMT is the NMT malloc headers. If we rule out just always having them (unacceptable in terms of memory overhead), there is no safe way to determine, in os::free(), if an allocation came from before or after NMT initialization ran, and therefore what to do with its malloc headers. 
For a more extensive explanation, please see the comment block `nmtPreInit.hpp` and the discussion with @kimbarrett and @zhengyu123 in the JBS comment section. > > The heart of this patch is a new way to track early, pre-NMT-init allocations. These are tracked via a lookup table. This was a suggestion by Kim and it worked out well. > > Changes in detail: > > - pre-NMT-init handling: > - the new files `nmtPreInit.hpp/cpp` take case of NMT pre-init handling. They contain a small global lookup table managing C-heap blocks allocated in the pre-NMT-init phase. > - `os::malloc()/os::realloc()/os::free()` defer to this code before doing anything else. > - Please see the extensive comment block at the start of `nmtPreinit.hpp` explaining the details. > > - Changes to NMT: > - Before, NMT initialization was spread over two phases, `initialize()` and `late_initialize()`. Those were merged into one and simplified - there is only one initialization now which happens after argument parsing. > - Minor changes were needed for the `NMT_TrackingLevel` enum - to simplify code, I changed NMT_unknown to be numerically 0. A new comment block in `nmtCommon.hpp` now clearly specifies what's what, including allowed level state transitions. > - New utility functions to translate tracking level from/to strings added to `NMTUtil` > - NMT has never been able to handle virtual memory allocations before initialization, which is fine since os::reserve_memory() is not called before VM parses arguments. We now assert that. > - All code outside the VM handling NMT initialization (eg. libjli) has been removed, as has the code testing it. > > - Gtests: > - Some existing gtests had to be modified: before, they all changed global state (turning NMT on/off) before testing. This is not allowed anymore, to keep NMT simple. Also, this pattern disturbed other tests. 
> - The new way to test is to passively check whether NMT has been switched on or off, and do tests accordingly: if on, full tests, if off, test just what makes sense in off-state. That does not disturb neighboring tests, gives us actually better coverage all around. > - It is now possible to start the gtestlauncher with NMT on! Which additionally gives us good coverage. > - To actually do gtests with NMT - since it's disabled by default - we now run NMT-enabled gtests as part of the hotspot jtreg NMT wrapper. This pattern we have done for a number of other facitilites, see all the tests in test/hotspot/jtreg/gtest.. . It works very well. > - Finally, a new gtest has been written to test the NMT preinit lookup map in isolation, placed in `gtest/nmt/test_nmtpreinitmap.cpp`. > > - jtreg: > - A new test has been added, `runtime/NMT/NMTInitializationTest.java`, testing NMT initialization in the face of many many VM arguments. > > ------------- > > Tests: > - ran manually all new tests on 64-bit and 32-bit Linux > - GHAs > - The patch has been active in SAPs test systems for a while now. Hi Thomas, I had a look through this and it seems reasonable, but I'm not familiar enough with the details to approve at this stage. A few nits below. Thanks, David src/hotspot/share/services/nmtCommon.hpp line 127: > 125: } > 126: > 127: // parses the tracking level from a string. Returns NMT_unknown if Nit: start sentence with capital P src/hotspot/share/services/nmtCommon.hpp line 131: > 129: static NMT_TrackingLevel parse_tracking_level(const char* s); > 130: > 131: // returns textual representation of a tracking level. Nit: start sentence with capital R src/hotspot/share/services/nmtPreInit.hpp line 114: > 112: // > 113: // We use a basic open hashmap, dimensioned generously - hash collisions should be very rare. > 114: // The table is customized for holding malloced pointers. 
> One main point of this map is that we do

Nit: double-space at line start (and for rest of comment block)

src/hotspot/share/services/nmtPreInit.hpp line 140:

> 138:
> 139: // table_size: keep table size a prime and the hash function simple; this
> 140: // seems to give a good distribution for malloce'd pointers on all our libc variants.

typo: malloce'd

src/hotspot/share/services/nmtPreInit.hpp line 268:

> 266: add_to_map(a);
> 267: (*rc) = a->payload();
> 268: _num_mallocs_pre ++;

style nit: no space with unary operator

src/java.base/share/native/libjli/java.c line 807:

> 805: */
> 806: static void
> 807: SetJvmEnvironment(int argc, char **argv) {

This doesn't seem to do anything any more.

test/hotspot/gtest/nmt/test_nmtpreinitmap.cpp line 2:

> 1: /*
> 2: * Copyright (c) 2017, 2020, Oracle and/or its affiliates. All rights reserved.

New file should only have single, current, copyright year

-------------

PR: https://git.openjdk.java.net/jdk/pull/4874

From ngasson at openjdk.java.net Thu Jul 29 07:19:35 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Thu, 29 Jul 2021 07:19:35 GMT
Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 28 Jul 2021 16:26:02 GMT, Andrew Haley wrote:

>> C1 has its own code generators for zeroing words. We should use the same logic for C1 and C2, which should give us better C1 performance and result in less code to maintain.
>>
>> This is one of those patches that's a great joy to write, because it consists mainly of deletions. The code I've added is mostly adapters to allow the C1 code to use the memory-zeroing logic written originally for C2. This means we have less code, but also that VM configuration options (e.g. `BlockZeroingLowLimit`) work with C1 and C2 in the same way.
>>
>> Measuring the performance of memory allocation is quite tricky, so I've written a JMH test case that measures the raw allocation rate of the JVM for various object sizes.
>> This is inevitably rather noisy because it combines the effects of both the allocation code and other GC-related pauses. Nonetheless, it's a useful sanity check.
>>
>> The performance differences between old and new are mostly in the noise, but with large allocations the advantage of `DC ZVA` becomes apparent:
>>
>> old:
>>
>> RawAllocationRate.arrayTest_C1       8192  thrpt    5  11220.314 ± 336.878  ops/s
>> RawAllocationRate.arrayTest_C1      16384  thrpt    5  16655.815 ±  88.577  ops/s
>> RawAllocationRate.arrayTest_C1      65536  thrpt    5  28302.661 ± 155.513  ops/s
>> RawAllocationRate.arrayTest_C1     131072  thrpt    5  31434.868 ± 211.768  ops/s
>>
>> new:
>>
>> RawAllocationRate.arrayTest_C1       8192  thrpt    5  13677.987 ± 143.048  ops/s
>> RawAllocationRate.arrayTest_C1      16384  thrpt    5  19517.416 ± 155.004  ops/s
>> RawAllocationRate.arrayTest_C1      65536  thrpt    5  37348.536 ± 307.582  ops/s
>> RawAllocationRate.arrayTest_C1     131072  thrpt    5  43414.399 ±  58.317  ops/s
>>
>>
>> Full test results, Graviton 2 (i.e. Neoverse N1). Units are megabytes per second,
>> object sizes are in bytes:
>>
>>
>> old:
>>
>> Benchmark                          (size)   Mode  Cnt      Score     Error  Units
>> RawAllocationRate.arrayTest            32  thrpt    5   5092.798 ±  20.879  ops/s
>> RawAllocationRate.arrayTest            64  thrpt    5   9821.608 ±   6.250  ops/s
>> RawAllocationRate.arrayTest           256  thrpt    5  14117.192 ±  72.720  ops/s
>> RawAllocationRate.arrayTest          1024  thrpt    5   9090.514 ±  40.239  ops/s
>> RawAllocationRate.arrayTest          2048  thrpt    5   9842.503 ±  52.744  ops/s
>> RawAllocationRate.arrayTest          4096  thrpt    5   9866.179 ±   6.332  ops/s
>> RawAllocationRate.arrayTest          8192  thrpt    5  12836.968 ±  14.143  ops/s
>> RawAllocationRate.arrayTest         16384  thrpt    5  18970.307 ±  96.903  ops/s
>> RawAllocationRate.arrayTest         65536  thrpt    5  36709.095 ±  38.256  ops/s
>> RawAllocationRate.arrayTest        131072  thrpt    5  43055.263 ±  60.808  ops/s
>> RawAllocationRate.arrayTest_C1         32  thrpt    5   3045.285 ±  23.128  ops/s
>> RawAllocationRate.arrayTest_C1         64  thrpt    5   5774.157 ±  52.472  ops/s
>> RawAllocationRate.arrayTest_C1        256  thrpt    5   4720.713 ±   9.419  ops/s
>> RawAllocationRate.arrayTest_C1       1024  thrpt    5   7457.880 ± 806.208  ops/s
>> RawAllocationRate.arrayTest_C1       2048  thrpt    5   8155.046 ± 194.153  ops/s
>> RawAllocationRate.arrayTest_C1       4096  thrpt    5   8364.379 ± 127.661  ops/s
>> RawAllocationRate.arrayTest_C1       8192  thrpt    5  11220.314 ± 336.878  ops/s
>> RawAllocationRate.arrayTest_C1      16384  thrpt    5  16655.815 ±  88.577  ops/s
>> RawAllocationRate.arrayTest_C1      65536  thrpt    5  28302.661 ± 155.513  ops/s
>> RawAllocationRate.arrayTest_C1     131072  thrpt    5  31434.868 ± 211.768  ops/s
>> RawAllocationRate.instanceTest         32  thrpt    5   6667.433 ±  50.031  ops/s
>> RawAllocationRate.instanceTest         64  thrpt    5  10669.876 ±  72.109  ops/s
>> RawAllocationRate.instanceTest        256  thrpt    5   5483.582 ± 336.743  ops/s
>> RawAllocationRate.instanceTest       1024  thrpt    5   9740.872 ±   6.269  ops/s
>> RawAllocationRate.instanceTest       2048  thrpt    5   9868.685 ±  51.939  ops/s
>> RawAllocationRate.instanceTest       4096  thrpt    5   9881.944 ±  46.306  ops/s
>> RawAllocationRate.instanceTest       8192  thrpt    5  13524.791 ±  69.250  ops/s
>> RawAllocationRate.instanceTest      16384  thrpt    5  19560.774 ± 109.518  ops/s
>> RawAllocationRate.instanceTest      65536  thrpt    5  37510.256 ±  15.586  ops/s
>> RawAllocationRate.instanceTest     131072  thrpt    5  43361.887 ± 181.294  ops/s
>> RawAllocationRate.instanceTest_C1      32  thrpt    5   2851.135 ±  22.891  ops/s
>> RawAllocationRate.instanceTest_C1      64  thrpt    5   5476.183 ±  84.376  ops/s
>> RawAllocationRate.instanceTest_C1     256  thrpt    5   5105.347 ±  35.389  ops/s
>> RawAllocationRate.instanceTest_C1    1024  thrpt    5   7380.805 ±   3.944  ops/s
>> RawAllocationRate.instanceTest_C1    2048  thrpt    5   8963.428 ±  83.857  ops/s
>> RawAllocationRate.instanceTest_C1    4096  thrpt    5   9257.715 ±  52.647  ops/s
>> RawAllocationRate.instanceTest_C1    8192  thrpt    5  11655.359 ±  70.209  ops/s
>> RawAllocationRate.instanceTest_C1   16384  thrpt    5  17084.813 ±  91.150  ops/s
>> RawAllocationRate.instanceTest_C1   65536  thrpt    5  28682.783 ± 176.563  ops/s
>> RawAllocationRate.instanceTest_C1  131072  thrpt    5  31268.318 ± 221.486  ops/s
>>
>> new:
>>
>> Benchmark                          (size)   Mode  Cnt      Score     Error  Units
>> RawAllocationRate.arrayTest            32  thrpt    5   5355.477 ±  43.045  ops/s
>> RawAllocationRate.arrayTest            64  thrpt    5   9825.067 ±  55.493  ops/s
>> RawAllocationRate.arrayTest           256  thrpt    5  13984.865 ± 125.125  ops/s
>> RawAllocationRate.arrayTest          1024  thrpt    5   9025.380 ±  48.921  ops/s
>> RawAllocationRate.arrayTest          2048  thrpt    5   9844.463 ±   6.780  ops/s
>> RawAllocationRate.arrayTest          4096  thrpt    5   9866.566 ±  48.659  ops/s
>> RawAllocationRate.arrayTest          8192  thrpt    5  12753.622 ±  67.211  ops/s
>> RawAllocationRate.arrayTest         16384  thrpt    5  18890.419 ±  14.152  ops/s
>> RawAllocationRate.arrayTest         65536  thrpt    5  37322.124 ± 269.352  ops/s
>> RawAllocationRate.arrayTest        131072  thrpt    5  43017.952 ± 204.057  ops/s
>> RawAllocationRate.arrayTest_C1         32  thrpt    5   3102.221 ±  13.811  ops/s
>> RawAllocationRate.arrayTest_C1         64  thrpt    5   5947.419 ±  36.408  ops/s
>> RawAllocationRate.arrayTest_C1        256  thrpt    5   5124.479 ± 548.617  ops/s
>> RawAllocationRate.arrayTest_C1       1024  thrpt    5   9459.376 ± 716.317  ops/s
>> RawAllocationRate.arrayTest_C1       2048  thrpt    5   9840.594 ±  15.922  ops/s
>> RawAllocationRate.arrayTest_C1       4096  thrpt    5   9860.274 ±  56.088  ops/s
>> RawAllocationRate.arrayTest_C1       8192  thrpt    5  13677.987 ± 143.048  ops/s
>> RawAllocationRate.arrayTest_C1      16384  thrpt    5  19517.416 ± 155.004  ops/s
>> RawAllocationRate.arrayTest_C1      65536  thrpt    5  37348.536 ± 307.582  ops/s
>> RawAllocationRate.arrayTest_C1     131072  thrpt    5  43414.399 ±  58.317  ops/s
>> RawAllocationRate.instanceTest         32  thrpt    5   6620.452 ± 137.048  ops/s
>> RawAllocationRate.instanceTest         64  thrpt    5   9850.677 ±   6.417  ops/s
>> RawAllocationRate.instanceTest        256  thrpt    5   5533.512 ± 129.334  ops/s
>> RawAllocationRate.instanceTest       1024  thrpt    5   9829.806 ±   7.555  ops/s
>> RawAllocationRate.instanceTest       2048  thrpt    5   9857.707 ±  51.541  ops/s
>> RawAllocationRate.instanceTest       4096  thrpt    5   9957.300 ±   7.115  ops/s
>> RawAllocationRate.instanceTest       8192  thrpt    5  13662.581 ±  85.225  ops/s
>> RawAllocationRate.instanceTest      16384  thrpt    5  19571.796 ± 120.962  ops/s
>> RawAllocationRate.instanceTest      65536  thrpt    5  37401.527 ±  67.260  ops/s
>> RawAllocationRate.instanceTest     131072  thrpt    5  43327.339 ±  35.077  ops/s
>> RawAllocationRate.instanceTest_C1      32  thrpt    5   2842.031 ±  47.924  ops/s
>> RawAllocationRate.instanceTest_C1      64  thrpt    5   5359.357 ±  53.031  ops/s
>> RawAllocationRate.instanceTest_C1     256  thrpt    5   5081.287 ±  57.737  ops/s
>> RawAllocationRate.instanceTest_C1    1024  thrpt    5   8372.330 ± 267.016  ops/s
>> RawAllocationRate.instanceTest_C1    2048  thrpt    5   9470.224 ± 250.706  ops/s
>> RawAllocationRate.instanceTest_C1    4096  thrpt    5   9843.936 ±  52.825  ops/s
>> RawAllocationRate.instanceTest_C1    8192  thrpt    5  13695.863 ±  80.433  ops/s
>> RawAllocationRate.instanceTest_C1   16384  thrpt    5  19495.110 ± 116.300  ops/s
>> RawAllocationRate.instanceTest_C1   65536  thrpt    5  37448.948 ± 291.917  ops/s
>> RawAllocationRate.instanceTest_C1  131072  thrpt    5  43443.406 ± 267.236  ops/s
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
> D'oh! Stupid fencepost error.

src/hotspot/cpu/aarch64/c1_LIRGenerator_aarch64.cpp line 1147:

> 1145: LIR_Opr reg = result_register_for(x->type());
> 1146: LIR_Opr tmp1 = FrameMap::r2_oop_opr;
> 1147: LIR_Opr tmp2 = FrameMap::r4_oop_opr;

I think we still need r2 and r4 here: they're used in the runtime stub for eden allocation if TLAB is disabled.

src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 804:

> 802: __ sub(arr_size, arr_size, t1); // body length
> 803: __ add(t1, t1, obj); // body start
> 804: __ initialize_body(t1, arr_size, 0, t2, noreg);

Is this the only place where we can end up with the wrong temporaries in `initialize_body`? If so I think it would be better to add an assert in `initialize_body` and move the register saving here or else change the allocation of `t1`, `t2`, and `arr_size` above (see the earlier comment).
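
As a rough illustration of the kind of assert suggested in the comment above, here is a minimal, self-contained C++ sketch. It does not use HotSpot's real register types or macro assembler: `Reg`, `all_distinct`, and `initialize_body_sketch` are hypothetical stand-ins, and the assumption that the inputs and temporaries simply must not alias one another is mine, not something taken from the patch.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for a machine register: just an encoding number.
struct Reg {
  int encoding;
};

inline bool operator==(Reg a, Reg b) { return a.encoding == b.encoding; }

// Returns true when no two registers in the list share an encoding.
inline bool all_distinct(std::initializer_list<Reg> regs) {
  for (const Reg* a = regs.begin(); a != regs.end(); ++a) {
    for (const Reg* b = a + 1; b != regs.end(); ++b) {
      if (*a == *b) return false;
    }
  }
  return true;
}

// Hypothetical helper that requires its inputs and temporaries to be distinct.
// Checking at the top of the helper catches a bad register choice at every
// call site, instead of relying on each caller to get it right.
inline void initialize_body_sketch(Reg base, Reg len, Reg tmp1, Reg tmp2) {
  assert(all_distinct({base, len, tmp1, tmp2}) &&
         "temporaries must not alias the inputs or each other");
  // ... the zeroing code would be emitted here ...
}
```

In a debug build, a call such as `initialize_body_sketch(Reg{0}, Reg{1}, Reg{1}, Reg{3})` would trip the assert, which is the behaviour the review comment appears to be asking for.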
------------- PR: https://git.openjdk.java.net/jdk/pull/4919 From njian at openjdk.java.net Thu Jul 29 07:45:57 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Thu, 29 Jul 2021 07:45:57 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v2] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: <06wMDWrhwZ1LDi5YX3E27BumDZS2f0BDANCEMPv0plg=.554d7dc3-f16c-4241-afaa-3e4637adb09b@github.com> On Tue, 27 Jul 2021 08:16:32 GMT, Andrew Haley wrote: > Ah, true, SVE conditions are (of course!) different from scalar conditions. But that's still no more than a mapping function from c2-specific cond to SVE codes, isn't it? Updated in the latest commit. ------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From njian at openjdk.java.net Thu Jul 29 07:45:56 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Thu, 29 Jul 2021 07:45:56 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v3] In-Reply-To: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: > This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. 
Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision: Address Andrew's comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4122/files - new: https://git.openjdk.java.net/jdk/pull/4122/files/eaa19f03..24100773 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4122&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4122&range=01-02 Stats: 232 lines in 5 files changed: 64 ins; 50 del; 118 mod Patch: https://git.openjdk.java.net/jdk/pull/4122.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4122/head:pull/4122 PR: https://git.openjdk.java.net/jdk/pull/4122 From njian at openjdk.java.net Thu Jul 29 07:48:44 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Thu, 29 Jul 2021 07:48:44 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v2] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Mon, 26 Jul 2021 08:30:59 GMT, Andrew Haley wrote: >> Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> 8267356: AArch64: Vector API SVE codegen support >> >> This is the integration of current SVE work done in >> panama-vector/vectorIntrinscs, which includes: >> >> 1. Code generation for Vector API c2 IR nodes with SVE. >> 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on >> 256-bit SVE environment could also generate optimized SVE >> instructions with predicate feature. >> 3. Some more SVE assemblers (and tests) used by the codegen part. 
>>
>> Note: VectorMask is still represented in vector register, a further
>> improvement to map mask to predicate register is under development at
>> https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask
>>
>> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware
>> with MaxVectorSize=16/32/64.
>
> Basically looks good.
> One small thing: please check all comments in the Assembler to make sure they match the named instruction groups in DDI0584B_a_SVE/SVE_xml/xhtml/encodingindex.html. I know that Arm aren't consistent in their docs either, but we don't need to add to the confusion.

Hi Andrew @theRealAph , I think I have addressed all your comments in the new commit. Could you please have a look? Thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/4122

From njian at openjdk.java.net Thu Jul 29 08:21:54 2021
From: njian at openjdk.java.net (Ningsheng Jian)
Date: Thu, 29 Jul 2021 08:21:54 GMT
Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v4]
In-Reply-To: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com>
References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com>
Message-ID:

> This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes:
>
> 1. Code generation for Vector API c2 IR nodes with SVE.
> 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature.
> 3. Some more SVE assemblers (and tests) used by the codegen part.
>
> Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask
>
>
> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64.

Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision:

  Add missing part

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4122/files
  - new: https://git.openjdk.java.net/jdk/pull/4122/files/24100773..c444dc5a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4122&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4122&range=02-03

  Stats: 129 lines in 2 files changed: 0 ins; 14 del; 115 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4122.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4122/head:pull/4122

PR: https://git.openjdk.java.net/jdk/pull/4122

From ngasson at openjdk.java.net Thu Jul 29 08:56:27 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Thu, 29 Jul 2021 08:56:27 GMT
Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 28 Jul 2021 16:26:02 GMT, Andrew Haley wrote:

>> C1 has its own code generators for zeroing words. We should use the same logic for C1 and C2, which should give us better C1 performance and result in less code to maintain.
>>
>> This is one of those patches that's a great joy to write, because it consists mainly of deletions. The code I've added is mostly adapters to allow the C1 code to use the memory-zeroing logic written originally for C2. This means we have less code, but also that VM configuration options (e.g. `BlockZeroingLowLimit`) work with C1 and C2 in the same way.
>>
>> Measuring the performance of memory allocation is quite tricky, so I've written a JMH test case that measures the raw allocation rate of the JVM for various object sizes. This is inevitably rather noisy because it combines the effects of both the allocation code and other GC-related pauses. Nonetheless, it's a useful sanity check.
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
> D'oh! Stupid fencepost error.

I hit the following assertion failure when I ran tier1 with `-XX:TieredStopAtLevel=1`. This was from `compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java`.

# Internal Error (/mnt/nicgas01-pc/jdk/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:566), pid=124784, tid=124797
# assert(pc() == ((address)::badAddressVal)) failed: postcond

V [libjvm.so+0x12c1bbc] MacroAssembler::trampoline_call(Address, CodeBuffer*)+0x2fc
V [libjvm.so+0x12daf10] MacroAssembler::zero_words(RegisterImpl*, RegisterImpl*)+0x130
V [libjvm.so+0x79c950] C1_MacroAssembler::initialize_body(RegisterImpl*, RegisterImpl*, int, RegisterImpl*, RegisterImpl*)+0x1b0
V [libjvm.so+0x79daec] C1_MacroAssembler::allocate_array(RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, int, int, RegisterImpl*, Label&)+0x41c
V [libjvm.so+0x73bad0] LIR_Assembler::emit_alloc_array(LIR_OpAllocArray*)+0x160
V [libjvm.so+0x71be40] LIR_OpAllocArray::emit_code(LIR_Assembler*)+0x20
V [libjvm.so+0x72c6e8] LIR_Assembler::emit_lir_list(LIR_List*)+0xe8
V [libjvm.so+0x72ce48] LIR_Assembler::emit_code(BlockList*)+0x78
V [libjvm.so+0x6d394c] Compilation::emit_code_body()+0x14c
V [libjvm.so+0x6d40a0] Compilation::compile_java_method()+0x550

-------------

PR: https://git.openjdk.java.net/jdk/pull/4919

From ngasson at openjdk.java.net Thu Jul 29 09:12:33 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Thu, 29 Jul 2021 09:12:33 GMT
Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jul 2021 08:53:35 GMT, Nick Gasson wrote:

> I hit the following assertion failure when I ran tier1 with `-XX:TieredStopAtLevel=1`. This was from `compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java`.
>
> ```
> # Internal Error (/mnt/nicgas01-pc/jdk/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:566), pid=124784, tid=124797
> # assert(pc() == ((address)::badAddressVal)) failed: postcond
> ```

Maybe we need to reserve some extra space for the trampoline stubs in C1's `Compilation::setup_code_buffer()`?
That assert seems dodgy though, `pc()` won't be `badAddressVal` if `code()->blob()` was initially NULL (the code buffer wasn't resizeable). ------------- PR: https://git.openjdk.java.net/jdk/pull/4919 From aph at redhat.com Thu Jul 29 09:19:52 2021 From: aph at redhat.com (Andrew Haley) Date: Thu, 29 Jul 2021 10:19:52 +0100 Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v2] In-Reply-To: References: Message-ID: <934a804d-aba9-4396-a038-4cf732561a04@redhat.com> On 7/29/21 9:56 AM, Nick Gasson wrote: > I hit the following assertion failure when I ran tier1 with `-XX:TieredStopAtLevel=1`. This was from `compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java`. > > > # Internal Error (/mnt/nicgas01-pc/jdk/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:566), pid=124784, tid=124797 > # assert(pc() == ((address)::badAddressVal)) failed: postcond Oh, that really is weird. I'll dig some more. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Jul 29 09:36:37 2021 From: aph at redhat.com (Andrew Haley) Date: Thu, 29 Jul 2021 10:36:37 +0100 Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v2] In-Reply-To: <934a804d-aba9-4396-a038-4cf732561a04@redhat.com> References: <934a804d-aba9-4396-a038-4cf732561a04@redhat.com> Message-ID: On 7/29/21 10:19 AM, Andrew Haley wrote: > On 7/29/21 9:56 AM, Nick Gasson wrote: >> I hit the following assertion failure when I ran tier1 with `-XX:TieredStopAtLevel=1`. This was from `compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java`. >> >> >> # Internal Error (/mnt/nicgas01-pc/jdk/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:566), pid=124784, tid=124797 >> # assert(pc() == ((address)::badAddressVal)) failed: postcond > > Oh, that really is weird. I'll dig some more. 

I get

Running test 'jtreg:test/hotspot/jtreg/compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java'
Test results: no tests selected

perhaps because that is a C2-only test.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From aph at redhat.com Thu Jul 29 09:44:03 2021
From: aph at redhat.com (Andrew Haley)
Date: Thu, 29 Jul 2021 10:44:03 +0100
Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v2]
In-Reply-To:
References: <934a804d-aba9-4396-a038-4cf732561a04@redhat.com>
Message-ID:

On 7/29/21 10:36 AM, Andrew Haley wrote:

> perhaps because that is a C2-only test.

Ah no, it's a debug-only test.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From cgo at openjdk.java.net Thu Jul 29 09:48:45 2021
From: cgo at openjdk.java.net (Christoph Göttschkes)
Date: Thu, 29 Jul 2021 09:48:45 GMT
Subject: RFR: 8271128: InlineIntrinsics support for 32-bit ARM
Message-ID:

Hi,

please review this patch, which adds support for InlineIntrinsics to the 32-bit ARM port. The old aarch32 port had this intrinsic implemented and enabled by default.

Like on many other platforms, the 32-bit ARM port simply calls into the `SharedRuntime` to intrinsify the basic `java.lang.Math` methods. InlineIntrinsics is already implemented for C1 on 32-bit ARM, which does the same thing.

testing: hotspot tier1 on ARMv5TE (soft-float) and ARMv7-A (hard-float)

There is already the micro benchmark `test/micro/org/openjdk/bench/java/lang/MathBench.java` which I used. The soft-float benchmarks are not that meaningful, since I performed them in QEMU.
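
To make the "calls into the `SharedRuntime`" idea a little more concrete, here is a small, self-contained C++ sketch of the dispatch pattern. It is only an illustration under stated assumptions: `MathIntrinsic`, `math_entry_for`, `call_math`, and the `rt_*` wrappers are hypothetical names, the `<cmath>` functions stand in for the runtime's math entry points, and the real interpreter generates a method entry stub rather than returning a function pointer. The two-argument `pow` is left out to keep a single signature.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical IDs for the unary java.lang.Math methods from the benchmark list.
enum class MathIntrinsic { Sin, Cos, Tan, Exp, Log, Log10, Sqrt, Abs };

// All entries share one signature so they can be reached through a plain pointer.
using UnaryMathFn = double (*)(double);

// Stand-ins for the runtime's math entry points. Wrapping the overloaded
// <cmath> functions gives each one a single, unambiguous address.
inline double rt_sin(double x)   { return std::sin(x); }
inline double rt_cos(double x)   { return std::cos(x); }
inline double rt_tan(double x)   { return std::tan(x); }
inline double rt_exp(double x)   { return std::exp(x); }
inline double rt_log(double x)   { return std::log(x); }
inline double rt_log10(double x) { return std::log10(x); }
inline double rt_sqrt(double x)  { return std::sqrt(x); }
inline double rt_abs(double x)   { return std::fabs(x); }

// Returns the runtime function to call for `id`, or nullptr when inlining of
// intrinsics is switched off and the caller should use the ordinary method entry.
inline UnaryMathFn math_entry_for(MathIntrinsic id, bool inline_intrinsics) {
  if (!inline_intrinsics) return nullptr;
  switch (id) {
    case MathIntrinsic::Sin:   return rt_sin;
    case MathIntrinsic::Cos:   return rt_cos;
    case MathIntrinsic::Tan:   return rt_tan;
    case MathIntrinsic::Exp:   return rt_exp;
    case MathIntrinsic::Log:   return rt_log;
    case MathIntrinsic::Log10: return rt_log10;
    case MathIntrinsic::Sqrt:  return rt_sqrt;
    case MathIntrinsic::Abs:   return rt_abs;
  }
  return nullptr;  // not reached for valid ids
}

// Convenience wrapper: look up the entry with inlining enabled and call it.
inline double call_math(MathIntrinsic id, double x) {
  UnaryMathFn fn = math_entry_for(id, true);
  assert(fn != nullptr && "every listed intrinsic has an entry");
  return fn(x);
}
```

The point of the pattern is that the fast path skips interpreting the Java method body and jumps straight to native code, which is consistent with the gap between the `+InlineIntrinsics` and `-InlineIntrinsics` runs reported in this message.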

__hard-float__ `-Xint -XX:+InlineIntrinsics`

| Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
| :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
| MathBench.absDouble           |      0 | thrpt |   5 | 1169.574 | +/- 133.694 | ops/ms |
| MathBench.cosDouble           |      0 | thrpt |   5 |  759.902 | +/- 573.852 | ops/ms |
| MathBench.expDouble           |      0 | thrpt |   5 |  854.753 | +/-  67.217 | ops/ms |
| MathBench.log10Double         |      0 | thrpt |   5 |  902.034 | +/-  22.413 | ops/ms |
| MathBench.logDouble           |      0 | thrpt |   5 |  895.470 | +/- 113.811 | ops/ms |
| MathBench.powDouble           |      0 | thrpt |   5 |  936.136 | +/-  40.661 | ops/ms |
| MathBench.sinDouble           |      0 | thrpt |   5 |  864.670 | +/-  68.329 | ops/ms |
| MathBench.sqrtDouble          |      0 | thrpt |   5 | 1082.589 | +/-  92.570 | ops/ms |
| MathBench.tanDouble           |      0 | thrpt |   5 |  853.715 | +/- 122.427 | ops/ms |

__hard-float__ `-Xint -XX:-InlineIntrinsics`

| Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
| :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
| MathBench.absDouble           |      0 | thrpt |   5 |  450.907 | +/-  10.402 | ops/ms |
| MathBench.cosDouble           |      0 | thrpt |   5 |  592.242 | +/-  14.011 | ops/ms |
| MathBench.expDouble           |      0 | thrpt |   5 |  167.614 | +/-   7.530 | ops/ms |
| MathBench.log10Double         |      0 | thrpt |   5 |  572.099 | +/-  55.089 | ops/ms |
| MathBench.logDouble           |      0 | thrpt |   5 |  596.588 | +/-  24.976 | ops/ms |
| MathBench.powDouble           |      0 | thrpt |   5 |  212.673 | +/-   4.060 | ops/ms |
| MathBench.sinDouble           |      0 | thrpt |   5 |  584.873 | +/-  42.774 | ops/ms |
| MathBench.sqrtDouble          |      0 | thrpt |   5 |  514.690 | +/-  30.568 | ops/ms |
| MathBench.tanDouble           |      0 | thrpt |   5 |  566.586 | +/-  23.995 | ops/ms |

__soft-float__ `-Xint -XX:+InlineIntrinsics`

| Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
| :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
| MathBench.absDouble           |      0 | thrpt |   5 |  279.575 | +/-  56.455 | ops/ms |
| MathBench.cosDouble           |      0 | thrpt |   5 |  137.005 | +/-  72.561 | ops/ms |
| MathBench.expDouble           |      0 | thrpt |   5 |  117.778 | +/-  30.186 | ops/ms |
| MathBench.log10Double         |      0 | thrpt |   5 |  107.957 | +/-  10.158 | ops/ms |
| MathBench.logDouble           |      0 | thrpt |   5 |  101.341 | +/-   3.914 | ops/ms |
| MathBench.powDouble           |      0 | thrpt |   5 |  222.220 | +/-   3.854 | ops/ms |
| MathBench.sinDouble           |      0 | thrpt |   5 |  112.715 | +/-   9.088 | ops/ms |
| MathBench.sqrtDouble          |      0 | thrpt |   5 |  119.341 | +/-  76.528 | ops/ms |
| MathBench.tanDouble           |      0 | thrpt |   5 |  105.224 | +/-  30.477 | ops/ms |

__soft-float__ `-Xint -XX:-InlineIntrinsics`

| Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
| :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
| MathBench.absDouble           |      0 | thrpt |   5 |  173.150 | +/-  36.279 | ops/ms |
| MathBench.cosDouble           |      0 | thrpt |   5 |  129.774 | +/-   8.795 | ops/ms |
| MathBench.expDouble           |      0 | thrpt |   5 |   53.524 | +/-   1.679 | ops/ms |
| MathBench.log10Double         |      0 | thrpt |   5 |  132.503 | +/-   4.274 | ops/ms |
| MathBench.logDouble           |      0 | thrpt |   5 |  135.483 | +/-   1.150 | ops/ms |
| MathBench.powDouble           |      0 | thrpt |   5 |   54.266 | +/-   0.699 | ops/ms |
| MathBench.sinDouble           |      0 | thrpt |   5 |  105.636 | +/-   4.647 | ops/ms |
| MathBench.sqrtDouble          |      0 | thrpt |   5 |  204.550 | +/-   7.206 | ops/ms |
| MathBench.tanDouble           |      0 | thrpt |   5 |  101.072 | +/-   3.701 | ops/ms |

-------------

Commit messages:
 - 8271128: InlineIntrinsics support for 32-bit ARM

Changes: https://git.openjdk.java.net/jdk/pull/4927/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4927&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8271128
  Stats: 111 lines in 4 files changed: 104 ins; 0 del; 7 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4927.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4927/head:pull/4927

PR: https://git.openjdk.java.net/jdk/pull/4927

From nick.gasson at arm.com Thu Jul 29 09:58:17 2021
From: nick.gasson at arm.com (Nick Gasson)
Date: Thu, 29 Jul 2021
17:58:17 +0800 Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v2] In-Reply-To: References: <934a804d-aba9-4396-a038-4cf732561a04@redhat.com> Message-ID: <8535rxphl2.fsf@arm.com> On 29/07/21 17:36 pm, Andrew Haley wrote: > > I get > > Running test 'jtreg:test/hotspot/jtreg/compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java' > Test results: no tests selected > > perhaps because that is a C2-only test. I did: make exploded-test TEST="compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java" \ JTREG="VM_OPTIONS=-XX:TieredStopAtLevel=1" On a fastdebug build. -- Nick From shade at openjdk.java.net Thu Jul 29 10:08:31 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 29 Jul 2021 10:08:31 GMT Subject: RFR: 8271128: InlineIntrinsics support for 32-bit ARM In-Reply-To: References: Message-ID: On Thu, 29 Jul 2021 09:40:08 GMT, Christoph G?ttschkes wrote: > Hi, > > please review this patch, which adds support for InlineIntrinsics to the 32-bit ARM port. The old aarch32 port had this intrinsic implemented and enabled by default. > > Like on many other platforms, the 32-bit ARM port simply calls into the `SharedRuntime` to intrinsify the basic `java.lang.Math` methods. InlineIntrinsics is already implemented for C1 on 32-bit ARM, which does the same thing. > > testing: hotspot tier1 on ARMv5TE (soft-float) and ARMv7-A (hard-float) > > There is already the micro benchmark `test/micro/org/openjdk/bench/java/lang/MathBench.java` which I used. The soft-float benchmarks are not that meaningful, since I performed them in QEMU. 
> > __hard-float__ `-Xint -XX:+InlineIntrinsics` > > | Benchmark | (seed) | Mode | Cnt | Score | Error | Units | > | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: | > | MathBench.absDouble | 0 | thrpt | 5 | 1169.574 | +/- 133.694 | ops/ms | > | MathBench.cosDouble | 0 | thrpt | 5 | 759.902 | +/- 573.852 | ops/ms | > | MathBench.expDouble | 0 | thrpt | 5 | 854.753 | +/- 67.217 | ops/ms | > | MathBench.log10Double | 0 | thrpt | 5 | 902.034 | +/- 22.413 | ops/ms | > | MathBench.logDouble | 0 | thrpt | 5 | 895.470 | +/- 113.811 | ops/ms | > | MathBench.powDouble | 0 | thrpt | 5 | 936.136 | +/- 40.661 | ops/ms | > | MathBench.sinDouble | 0 | thrpt | 5 | 864.670 | +/- 68.329 | ops/ms | > | MathBench.sqrtDouble | 0 | thrpt | 5 | 1082.589 | +/- 92.570 | ops/ms | > | MathBench.tanDouble | 0 | thrpt | 5 | 853.715 | +/- 122.427 | ops/ms | > > __hard-float__ `-Xint -XX:-InlineIntrinsics` > > | Benchmark | (seed) | Mode | Cnt | Score | Error | Units | > | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: | > | MathBench.absDouble | 0 | thrpt | 5 | 450.907 | +/- 10.402 | ops/ms | > | MathBench.cosDouble | 0 | thrpt | 5 | 592.242 | +/- 14.011 | ops/ms | > | MathBench.expDouble | 0 | thrpt | 5 | 167.614 | +/- 7.530 | ops/ms | > | MathBench.log10Double | 0 | thrpt | 5 | 572.099 | +/- 55.089 | ops/ms | > | MathBench.logDouble | 0 | thrpt | 5 | 596.588 | +/- 24.976 | ops/ms | > | MathBench.powDouble | 0 | thrpt | 5 | 212.673 | +/- 4.060 | ops/ms | > | MathBench.sinDouble | 0 | thrpt | 5 | 584.873 | +/- 42.774 | ops/ms | > | MathBench.sqrtDouble | 0 | thrpt | 5 | 514.690 | +/- 30.568 | ops/ms | > | MathBench.tanDouble | 0 | thrpt | 5 | 566.586 | +/- 23.995 | ops/ms | > > __soft-float__ `-Xint -XX:+InlineIntrinsics` > > | Benchmark | (seed) | Mode | Cnt | Score | Error | Units | > | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: | > | MathBench.absDouble | 0 | thrpt | 5 
| 279.575 | +/- 56.455 | ops/ms | > | MathBench.cosDouble | 0 | thrpt | 5 | 137.005 | +/- 72.561 | ops/ms | > | MathBench.expDouble | 0 | thrpt | 5 | 117.778 | +/- 30.186 | ops/ms | > | MathBench.log10Double | 0 | thrpt | 5 | 107.957 | +/- 10.158 | ops/ms | > | MathBench.logDouble | 0 | thrpt | 5 | 101.341 | +/- 3.914 | ops/ms | > | MathBench.powDouble | 0 | thrpt | 5 | 222.220 | +/- 3.854 | ops/ms | > | MathBench.sinDouble | 0 | thrpt | 5 | 112.715 | +/- 9.088 | ops/ms | > | MathBench.sqrtDouble | 0 | thrpt | 5 | 119.341 | +/- 76.528 | ops/ms | > | MathBench.tanDouble | 0 | thrpt | 5 | 105.224 | +/- 30.477 | ops/ms | > > __soft-float__ `-Xint -XX:-InlineIntrinsics` > > | Benchmark | (seed) | Mode | Cnt | Score | Error | Units | > | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: | > | MathBench.absDouble | 0 | thrpt | 5 | 173.150 | +/- 36.279 | ops/ms | > | MathBench.cosDouble | 0 | thrpt | 5 | 129.774 | +/- 8.795 | ops/ms | > | MathBench.expDouble | 0 | thrpt | 5 | 53.524 | +/- 1.679 | ops/ms | > | MathBench.log10Double | 0 | thrpt | 5 | 132.503 | +/- 4.274 | ops/ms | > | MathBench.logDouble | 0 | thrpt | 5 | 135.483 | +/- 1.150 | ops/ms | > | MathBench.powDouble | 0 | thrpt | 5 | 54.266 | +/- 0.699 | ops/ms | > | MathBench.sinDouble | 0 | thrpt | 5 | 105.636 | +/- 4.647 | ops/ms | > | MathBench.sqrtDouble | 0 | thrpt | 5 | 204.550 | +/- 7.206 | ops/ms | > | MathBench.tanDouble | 0 | thrpt | 5 | 101.072 | +/- 3.701 | ops/ms | This looks okay to me, but I think `hotspot-tier1` testing is inadequate for this. Please run at least full `tier1`, which should include JDK tests for `java.lang.Math`. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/4927

From aph at redhat.com Thu Jul 29 10:59:40 2021
From: aph at redhat.com (Andrew Haley)
Date: Thu, 29 Jul 2021 11:59:40 +0100
Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v2]
In-Reply-To: <8535rxphl2.fsf@arm.com>
References: <934a804d-aba9-4396-a038-4cf732561a04@redhat.com> <8535rxphl2.fsf@arm.com>
Message-ID:

On 7/29/21 10:58 AM, Nick Gasson wrote:
> On 29/07/21 17:36 pm, Andrew Haley wrote:
>>
>> I get
>>
>> Running test 'jtreg:test/hotspot/jtreg/compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java'
>> Test results: no tests selected
>>
>> perhaps because that is a C2-only test.
>
> I did:
>
> make exploded-test TEST="compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java" \
> JTREG="VM_OPTIONS=-XX:TieredStopAtLevel=1"
>
> On a fastdebug build.

Fascinating. C1 doesn't allocate any space for trampoline stubs, except for explicit call LIR. Anyway, it makes sense to use far_call() in the case of C1 compilation: we want compilations to be fast anyway.

From cgo at openjdk.java.net Thu Jul 29 12:01:29 2021
From: cgo at openjdk.java.net (Christoph Göttschkes)
Date: Thu, 29 Jul 2021 12:01:29 GMT
Subject: RFR: 8271128: InlineIntrinsics support for 32-bit ARM
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jul 2021 09:40:08 GMT, Christoph Göttschkes wrote:

> Hi,
>
> please review this patch, which adds support for InlineIntrinsics to the 32-bit ARM port. The old aarch32 port had this intrinsic implemented and enabled by default.
>
> Like on many other platforms, the 32-bit ARM port simply calls into the `SharedRuntime` to intrinsify the basic `java.lang.Math` methods. InlineIntrinsics is already implemented for C1 on 32-bit ARM, which does the same thing.
>
> testing: hotspot tier1 on ARMv5TE (soft-float) and ARMv7-A (hard-float)
>
> There is already the micro benchmark `test/micro/org/openjdk/bench/java/lang/MathBench.java` which I used.
The soft-float benchmarks are not that meaningful, since I performed them in QEMU. > > __hard-float__ `-Xint -XX:+InlineIntrinsics` > > | Benchmark | (seed) | Mode | Cnt | Score | Error | Units | > | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: | > | MathBench.absDouble | 0 | thrpt | 5 | 1169.574 | +/- 133.694 | ops/ms | > | MathBench.cosDouble | 0 | thrpt | 5 | 759.902 | +/- 573.852 | ops/ms | > | MathBench.expDouble | 0 | thrpt | 5 | 854.753 | +/- 67.217 | ops/ms | > | MathBench.log10Double | 0 | thrpt | 5 | 902.034 | +/- 22.413 | ops/ms | > | MathBench.logDouble | 0 | thrpt | 5 | 895.470 | +/- 113.811 | ops/ms | > | MathBench.powDouble | 0 | thrpt | 5 | 936.136 | +/- 40.661 | ops/ms | > | MathBench.sinDouble | 0 | thrpt | 5 | 864.670 | +/- 68.329 | ops/ms | > | MathBench.sqrtDouble | 0 | thrpt | 5 | 1082.589 | +/- 92.570 | ops/ms | > | MathBench.tanDouble | 0 | thrpt | 5 | 853.715 | +/- 122.427 | ops/ms | > > __hard-float__ `-Xint -XX:-InlineIntrinsics` > > | Benchmark | (seed) | Mode | Cnt | Score | Error | Units | > | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: | > | MathBench.absDouble | 0 | thrpt | 5 | 450.907 | +/- 10.402 | ops/ms | > | MathBench.cosDouble | 0 | thrpt | 5 | 592.242 | +/- 14.011 | ops/ms | > | MathBench.expDouble | 0 | thrpt | 5 | 167.614 | +/- 7.530 | ops/ms | > | MathBench.log10Double | 0 | thrpt | 5 | 572.099 | +/- 55.089 | ops/ms | > | MathBench.logDouble | 0 | thrpt | 5 | 596.588 | +/- 24.976 | ops/ms | > | MathBench.powDouble | 0 | thrpt | 5 | 212.673 | +/- 4.060 | ops/ms | > | MathBench.sinDouble | 0 | thrpt | 5 | 584.873 | +/- 42.774 | ops/ms | > | MathBench.sqrtDouble | 0 | thrpt | 5 | 514.690 | +/- 30.568 | ops/ms | > | MathBench.tanDouble | 0 | thrpt | 5 | 566.586 | +/- 23.995 | ops/ms | > > __soft-float__ `-Xint -XX:+InlineIntrinsics` > > | Benchmark | (seed) | Mode | Cnt | Score | Error | Units | > | :---------------------------- | -----: | 
----: | --: | -------: | ----------: | -----: | > | MathBench.absDouble | 0 | thrpt | 5 | 279.575 | +/- 56.455 | ops/ms | > | MathBench.cosDouble | 0 | thrpt | 5 | 137.005 | +/- 72.561 | ops/ms | > | MathBench.expDouble | 0 | thrpt | 5 | 117.778 | +/- 30.186 | ops/ms | > | MathBench.log10Double | 0 | thrpt | 5 | 107.957 | +/- 10.158 | ops/ms | > | MathBench.logDouble | 0 | thrpt | 5 | 101.341 | +/- 3.914 | ops/ms | > | MathBench.powDouble | 0 | thrpt | 5 | 222.220 | +/- 3.854 | ops/ms | > | MathBench.sinDouble | 0 | thrpt | 5 | 112.715 | +/- 9.088 | ops/ms | > | MathBench.sqrtDouble | 0 | thrpt | 5 | 119.341 | +/- 76.528 | ops/ms | > | MathBench.tanDouble | 0 | thrpt | 5 | 105.224 | +/- 30.477 | ops/ms | > > __soft-float__ `-Xint -XX:-InlineIntrinsics` > > | Benchmark | (seed) | Mode | Cnt | Score | Error | Units | > | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: | > | MathBench.absDouble | 0 | thrpt | 5 | 173.150 | +/- 36.279 | ops/ms | > | MathBench.cosDouble | 0 | thrpt | 5 | 129.774 | +/- 8.795 | ops/ms | > | MathBench.expDouble | 0 | thrpt | 5 | 53.524 | +/- 1.679 | ops/ms | > | MathBench.log10Double | 0 | thrpt | 5 | 132.503 | +/- 4.274 | ops/ms | > | MathBench.logDouble | 0 | thrpt | 5 | 135.483 | +/- 1.150 | ops/ms | > | MathBench.powDouble | 0 | thrpt | 5 | 54.266 | +/- 0.699 | ops/ms | > | MathBench.sinDouble | 0 | thrpt | 5 | 105.636 | +/- 4.647 | ops/ms | > | MathBench.sqrtDouble | 0 | thrpt | 5 | 204.550 | +/- 7.206 | ops/ms | > | MathBench.tanDouble | 0 | thrpt | 5 | 101.072 | +/- 3.701 | ops/ms | Sorry, I forgot to mention this. I tested the java.lang.Math methods for which the new intrinsics are implemented manually, by comparing the results of the intrinsics with the results of the corresponding java.lang.StrictMath method. Since both, StrictMath and the intrinsics use the same algorithm on 32-bit ARM, it is possible to do that. But I agree, doing some more testing doesn't hurt. 
I will start jdk tier1 as well and will report back as soon as it is done; it will definitely take some time, though.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4927

From chagedorn at openjdk.java.net Thu Jul 29 12:34:40 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Thu, 29 Jul 2021 12:34:40 GMT
Subject: RFR: 8271471: [IR Framework] Rare occurrence of "<!-- safepoint while printing -->" in PrintIdeal/PrintOptoAssembly can let tests fail
Message-ID:

A test VM used by the IR framework sometimes prints `<!-- safepoint while printing -->` in the middle of emitting a `PrintIdeal` or `PrintOptoAssembly` output which could lead to IR matching failures:
https://github.com/openjdk/jdk/blob/489e5fd12a37a45f4f5ea64b05f85c6f99f70811/src/hotspot/share/utilities/ostream.cpp#L918-L927

I thought about just bailing out of IR matching if this string is found after a failure but this issue also affects internal framework tests (I observed one case locally where this happened in the test `TestIRMatching`, letting it fail).

Handling `<!-- safepoint while printing -->` makes things more complicated for the IR framework tests. I'm not sure about the value of printing this message in the first place. But if nobody objects, I suggest to either remove it or at least guard it with `Verbose`, for example. I went with the latter solution in this PR.
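[Editorial illustration, not part of the original message.] The "guard it with `Verbose`" option can be pictured with the small sketch below. It is not the HotSpot change itself: the global `Verbose`, the function `break_lock_for_safepoint`, the helper `marker_output`, and the use of `std::ostream` in place of the VM's log stream are all assumptions made for illustration, and the marker text is assumed to be the comment emitted from `ostream.cpp`.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a VM diagnostic flag; off by default.
static bool Verbose = false;

// Hypothetical helper: called when a printing thread has to give up the
// output lock. The marker is only written when Verbose is set, so normal
// runs keep their compiler output free of interleaved comments.
void break_lock_for_safepoint(std::ostream& log) {
  if (Verbose) {
    log << "<!-- safepoint while printing -->\n";
  }
  // Releasing the lock itself would happen here regardless of Verbose.
}

// Demonstration helper: returns what would have been written to the log.
std::string marker_output(bool verbose) {
  Verbose = verbose;
  std::ostringstream log;
  break_lock_for_safepoint(log);
  return log.str();
}
```

With the flag off, a tool that pattern-matches the output sees nothing extra; with it on, the marker is still available to anyone debugging lock handoffs.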
Thanks, Christian ------------- Commit messages: - 8271471: [IR Framework] Rare occurrence of "" in PrintIdeal/PrintOptoAssembly can let tests fai Changes: https://git.openjdk.java.net/jdk/pull/4932/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4932&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271471 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4932.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4932/head:pull/4932 PR: https://git.openjdk.java.net/jdk/pull/4932 From jboes at openjdk.java.net Thu Jul 29 15:58:29 2021 From: jboes at openjdk.java.net (Julia Boes) Date: Thu, 29 Jul 2021 15:58:29 GMT Subject: RFR: 8271396: Spelling errors In-Reply-To: References: Message-ID: <6pKjSeljbRVUY95MMl6XydDzNvJde-Pa1DSN6OfM7mM=.8b5ea3f5-dce4-49d3-8090-023734e19061@github.com> On Wed, 28 Jul 2021 17:23:51 GMT, Emmanuel Bourg wrote: >> @ebourg for future PRs please do not force push after the PR is out for review. Just push incremental commits normally. The Skara tooling will squash them all into a single commit. > > @kevinrushforth I'll do that, thank you for the hint @ebourg Thanks for updating the copyright. If you integrate again, I can sponsor. ------------- PR: https://git.openjdk.java.net/jdk/pull/2385 From github.com+54304+ebourg at openjdk.java.net Thu Jul 29 16:06:37 2021 From: github.com+54304+ebourg at openjdk.java.net (Emmanuel Bourg) Date: Thu, 29 Jul 2021 16:06:37 GMT Subject: Integrated: 8271396: Spelling errors In-Reply-To: References: Message-ID: <9uAMUJBjsby1KkwiwU8wrYEw0ozGjXD7Xnjil0nLoXg=.fd3b9a16-c42a-4806-a174-8b6b0b565abb@github.com> On Wed, 3 Feb 2021 19:12:25 GMT, Emmanuel Bourg wrote: > This PR fixes the following spelling errors: > > choosen -> chosen > commad -> command > hiearchy -> hierarchy > leagacy -> legacy > minium -> minimum > subsytem -> subsystem > unamed -> unnamed This pull request has now been integrated. 
Changeset: d09b0284 Author: Emmanuel Bourg Committer: Julia Boes URL: https://git.openjdk.java.net/jdk/commit/d09b028407ff9d0e8c2dfd9cc5d0dca19c4497e3 Stats: 103 lines in 34 files changed: 0 ins; 0 del; 103 mod 8271396: Spelling errors Reviewed-by: tschatzl, chegar, iris, psadhukhan, cjplummer ------------- PR: https://git.openjdk.java.net/jdk/pull/2385 From github.com+54304+ebourg at openjdk.java.net Thu Jul 29 16:11:36 2021 From: github.com+54304+ebourg at openjdk.java.net (Emmanuel Bourg) Date: Thu, 29 Jul 2021 16:11:36 GMT Subject: RFR: 8271396: Spelling errors [v2] In-Reply-To: References: Message-ID: <1IourDXaaOqLbUP1_8BfEui3NErozrWkUoBcYUmYAx8=.82ca7fd0-732d-41ab-ad90-8e77412b8ac2@github.com> On Wed, 28 Jul 2021 17:12:04 GMT, Emmanuel Bourg wrote: >> This PR fixes the following spelling errors: >> >> choosen -> chosen >> commad -> command >> hiearchy -> hierarchy >> leagacy -> legacy >> minium -> minimum >> subsytem -> subsystem >> unamed -> unnamed > > Emmanuel Bourg has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: > > 8271396: Fix spelling errors > > choosen -> chosen > commad -> command > hiearchy -> hierarchy > leagacy -> legacy > minium -> minimum > subsytem -> subsystem > unamed -> unnamed Thank you, glad to land my first commit to OpenJDK ------------- PR: https://git.openjdk.java.net/jdk/pull/2385 From aph at openjdk.java.net Thu Jul 29 16:18:58 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 29 Jul 2021 16:18:58 GMT Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v3] In-Reply-To: References: Message-ID: > C1 has its own code generators for zeroing words. We should use the same logic for C1 and C2, which should give us better C1 performance and result in less code to maintain. 
>
> This is one of those patches that's a great joy to write, because it consists mainly of deletions. The code I've added is mostly adapters to allow the C1 code to use the memory-zeroing logic written originally for C2. This means we have less code, but also that VM configuration options (e.g. `BlockZeroingLowLimit`) work with C1 and C2 in the same way.
>
> Measuring the performance of memory allocation is quite tricky, so I've written a JMH test case that measures the raw allocation rate of the JVM for various object sizes. This is inevitably rather noisy because it combines the effects of both the allocation code and other GC-related pauses. Nonetheless, it's a useful sanity check.
>
> The performance differences between old and new are mostly in the noise, but with large allocations the advantage of `DC ZVA` becomes apparent:
>
> old:
>
> RawAllocationRate.arrayTest_C1       8192  thrpt    5  11220.314 ± 336.878  ops/s
> RawAllocationRate.arrayTest_C1      16384  thrpt    5  16655.815 ±  88.577  ops/s
> RawAllocationRate.arrayTest_C1      65536  thrpt    5  28302.661 ± 155.513  ops/s
> RawAllocationRate.arrayTest_C1     131072  thrpt    5  31434.868 ± 211.768  ops/s
>
> new:
>
> RawAllocationRate.arrayTest_C1       8192  thrpt    5  13677.987 ± 143.048  ops/s
> RawAllocationRate.arrayTest_C1      16384  thrpt    5  19517.416 ± 155.004  ops/s
> RawAllocationRate.arrayTest_C1      65536  thrpt    5  37348.536 ± 307.582  ops/s
> RawAllocationRate.arrayTest_C1     131072  thrpt    5  43414.399 ±  58.317  ops/s
>
>
> Full test results, Graviton 2 (i.e. Neoverse N1). Units are megabytes per second,
> object sizes are in bytes:
>
>
> old:
>
> Benchmark                          (size)   Mode  Cnt      Score     Error  Units
> RawAllocationRate.arrayTest            32  thrpt    5   5092.798 ±  20.879  ops/s
> RawAllocationRate.arrayTest            64  thrpt    5   9821.608 ±   6.250  ops/s
> RawAllocationRate.arrayTest           256  thrpt    5  14117.192 ±  72.720  ops/s
> RawAllocationRate.arrayTest          1024  thrpt    5   9090.514 ±  40.239  ops/s
> RawAllocationRate.arrayTest          2048  thrpt    5   9842.503 ±  52.744  ops/s
> RawAllocationRate.arrayTest          4096  thrpt    5   9866.179 ±   6.332  ops/s
> RawAllocationRate.arrayTest          8192  thrpt    5  12836.968 ±  14.143  ops/s
> RawAllocationRate.arrayTest         16384  thrpt    5  18970.307 ±  96.903  ops/s
> RawAllocationRate.arrayTest         65536  thrpt    5  36709.095 ±  38.256  ops/s
> RawAllocationRate.arrayTest        131072  thrpt    5  43055.263 ±  60.808  ops/s
> RawAllocationRate.arrayTest_C1         32  thrpt    5   3045.285 ±  23.128  ops/s
> RawAllocationRate.arrayTest_C1         64  thrpt    5   5774.157 ±  52.472  ops/s
> RawAllocationRate.arrayTest_C1        256  thrpt    5   4720.713 ±   9.419  ops/s
> RawAllocationRate.arrayTest_C1       1024  thrpt    5   7457.880 ± 806.208  ops/s
> RawAllocationRate.arrayTest_C1       2048  thrpt    5   8155.046 ± 194.153  ops/s
> RawAllocationRate.arrayTest_C1       4096  thrpt    5   8364.379 ± 127.661  ops/s
> RawAllocationRate.arrayTest_C1       8192  thrpt    5  11220.314 ± 336.878  ops/s
> RawAllocationRate.arrayTest_C1      16384  thrpt    5  16655.815 ±  88.577  ops/s
> RawAllocationRate.arrayTest_C1      65536  thrpt    5  28302.661 ± 155.513  ops/s
> RawAllocationRate.arrayTest_C1     131072  thrpt    5  31434.868 ± 211.768  ops/s
> RawAllocationRate.instanceTest         32  thrpt    5   6667.433 ±  50.031  ops/s
> RawAllocationRate.instanceTest         64  thrpt    5  10669.876 ±  72.109  ops/s
> RawAllocationRate.instanceTest        256  thrpt    5   5483.582 ± 336.743  ops/s
> RawAllocationRate.instanceTest       1024  thrpt    5   9740.872 ±   6.269  ops/s
> RawAllocationRate.instanceTest       2048  thrpt    5   9868.685 ±  51.939  ops/s
> RawAllocationRate.instanceTest       4096  thrpt    5   9881.944 ±  46.306  ops/s
> RawAllocationRate.instanceTest       8192  thrpt    5  13524.791 ±  69.250  ops/s
> RawAllocationRate.instanceTest      16384  thrpt    5  19560.774 ± 109.518  ops/s
> RawAllocationRate.instanceTest      65536  thrpt    5  37510.256 ±  15.586  ops/s
> RawAllocationRate.instanceTest     131072  thrpt    5  43361.887 ± 181.294  ops/s
> RawAllocationRate.instanceTest_C1      32  thrpt    5   2851.135 ±  22.891  ops/s
> RawAllocationRate.instanceTest_C1      64  thrpt    5   5476.183 ±  84.376  ops/s
> RawAllocationRate.instanceTest_C1     256  thrpt    5   5105.347 ±  35.389  ops/s
> RawAllocationRate.instanceTest_C1    1024  thrpt    5   7380.805 ±   3.944  ops/s
> RawAllocationRate.instanceTest_C1    2048  thrpt    5   8963.428 ±  83.857  ops/s
> RawAllocationRate.instanceTest_C1    4096  thrpt    5   9257.715 ±  52.647  ops/s
> RawAllocationRate.instanceTest_C1    8192  thrpt    5  11655.359 ±  70.209  ops/s
> RawAllocationRate.instanceTest_C1   16384  thrpt    5  17084.813 ±  91.150  ops/s
> RawAllocationRate.instanceTest_C1   65536  thrpt    5  28682.783 ± 176.563  ops/s
> RawAllocationRate.instanceTest_C1  131072  thrpt    5  31268.318 ± 221.486  ops/s
>
> new:
>
> Benchmark                          (size)   Mode  Cnt      Score     Error  Units
> RawAllocationRate.arrayTest            32  thrpt    5   5355.477 ±  43.045  ops/s
> RawAllocationRate.arrayTest            64  thrpt    5   9825.067 ±  55.493  ops/s
> RawAllocationRate.arrayTest           256  thrpt    5  13984.865 ± 125.125  ops/s
> RawAllocationRate.arrayTest          1024  thrpt    5   9025.380 ±  48.921  ops/s
> RawAllocationRate.arrayTest          2048  thrpt    5   9844.463 ±   6.780  ops/s
> RawAllocationRate.arrayTest          4096  thrpt    5   9866.566 ±  48.659  ops/s
> RawAllocationRate.arrayTest          8192  thrpt    5  12753.622 ±  67.211  ops/s
> RawAllocationRate.arrayTest         16384  thrpt    5  18890.419 ±  14.152  ops/s
> RawAllocationRate.arrayTest         65536  thrpt    5  37322.124 ± 269.352  ops/s
> RawAllocationRate.arrayTest        131072  thrpt    5  43017.952 ± 204.057  ops/s
> RawAllocationRate.arrayTest_C1         32  thrpt    5   3102.221 ±  13.811  ops/s
> RawAllocationRate.arrayTest_C1         64  thrpt    5   5947.419 ±  36.408  ops/s
> RawAllocationRate.arrayTest_C1        256  thrpt    5   5124.479 ± 548.617  ops/s
> RawAllocationRate.arrayTest_C1       1024  thrpt    5   9459.376 ± 716.317  ops/s
> RawAllocationRate.arrayTest_C1       2048  thrpt    5   9840.594 ±  15.922  ops/s
> RawAllocationRate.arrayTest_C1       4096  thrpt    5   9860.274 ±  56.088  ops/s
> RawAllocationRate.arrayTest_C1       8192  thrpt    5  13677.987 ± 143.048  ops/s
> RawAllocationRate.arrayTest_C1      16384  thrpt    5  19517.416 ± 155.004  ops/s
> RawAllocationRate.arrayTest_C1      65536  thrpt    5  37348.536 ± 307.582  ops/s
> RawAllocationRate.arrayTest_C1     131072  thrpt    5  43414.399 ±  58.317  ops/s
> RawAllocationRate.instanceTest         32  thrpt    5   6620.452 ± 137.048  ops/s
> RawAllocationRate.instanceTest         64  thrpt    5   9850.677 ±   6.417  ops/s
> RawAllocationRate.instanceTest        256  thrpt    5   5533.512 ± 129.334  ops/s
> RawAllocationRate.instanceTest       1024  thrpt    5   9829.806 ±   7.555  ops/s
> RawAllocationRate.instanceTest       2048  thrpt    5   9857.707 ±  51.541  ops/s
> RawAllocationRate.instanceTest       4096  thrpt    5   9957.300 ±   7.115  ops/s
> RawAllocationRate.instanceTest       8192  thrpt    5  13662.581 ±  85.225  ops/s
> RawAllocationRate.instanceTest      16384  thrpt    5  19571.796 ± 120.962  ops/s
> RawAllocationRate.instanceTest      65536  thrpt    5  37401.527 ±  67.260  ops/s
> RawAllocationRate.instanceTest     131072  thrpt    5  43327.339 ±  35.077  ops/s
> RawAllocationRate.instanceTest_C1      32  thrpt    5   2842.031 ±  47.924  ops/s
> RawAllocationRate.instanceTest_C1      64  thrpt    5   5359.357 ±  53.031  ops/s
> RawAllocationRate.instanceTest_C1     256  thrpt    5   5081.287 ±  57.737  ops/s
> RawAllocationRate.instanceTest_C1    1024  thrpt    5   8372.330 ± 267.016  ops/s
> RawAllocationRate.instanceTest_C1    2048  thrpt    5   9470.224 ± 250.706  ops/s
> RawAllocationRate.instanceTest_C1    4096  thrpt    5   9843.936 ±  52.825  ops/s
> RawAllocationRate.instanceTest_C1    8192  thrpt    5  13695.863 ±  80.433  ops/s
> RawAllocationRate.instanceTest_C1   16384  thrpt    5  19495.110 ± 116.300  ops/s
> RawAllocationRate.instanceTest_C1   65536  thrpt    5  37448.948 ± 291.917  ops/s
> RawAllocationRate.instanceTest_C1  131072  thrpt    5  43443.406 ± 267.236  ops/s

Andrew Haley has updated the pull request incrementally with two additional commits since the last revision:

 - Tidy up register temps in C1 stubs that call initialize_body()
 - Don't use a trampoline call to zero_blocks in C1 compiles

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4919/files
  - new: https://git.openjdk.java.net/jdk/pull/4919/files/31f238e6..cd1e83ba

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4919&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4919&range=01-02

  Stats: 30 lines in 3 files changed: 9 ins; 7 del; 14 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4919.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4919/head:pull/4919

PR: https://git.openjdk.java.net/jdk/pull/4919

From rriggs at openjdk.java.net Thu Jul 29 16:44:55 2021
From: rriggs at openjdk.java.net (Roger Riggs)
Date: Thu, 29 Jul 2021 16:44:55 GMT
Subject: [jdk17] RFR: 8271489: (doc) Clarify Filter Factory example
Message-ID: <0p6tjDXve1NzwIv5CqU2RjJgL9R0t0Je8scHatsVlTU=.a49cd354-ca52-4a8c-8481-dde3431a4a27@github.com>

Improve the clarity of comments in the ObjectInputFilter FilterInThread example.

-------------

Commit messages:
 - 8271489: (doc) Clarify Filter Factory example
 - 8270398: Enhance canonicalization
 - 8270404: Better canonicalization
 - Merge
 - Merge
 - 8263531: Remove unused buffer int
 - 8262731: [macOS] Exception from "Printable.print" is swallowed during "PrinterJob.print"
 - 8269763: The JEditorPane is blank after JDK-8265167
 - 8265580: Enhanced style for RTF kit
 - 8265574: Improve handling of sheets
 - ... and 15 more: https://git.openjdk.java.net/jdk17/compare/c1304519...650e1561

Changes: https://git.openjdk.java.net/jdk17/pull/292/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk17&pr=292&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8271489
  Stats: 1001 lines in 42 files changed: 625 ins; 181 del; 195 mod
  Patch: https://git.openjdk.java.net/jdk17/pull/292.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/292/head:pull/292

PR: https://git.openjdk.java.net/jdk17/pull/292

From kvn at openjdk.java.net Thu Jul 29 17:03:28 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Thu, 29 Jul 2021 17:03:28 GMT
Subject: RFR: 8271471: [IR Framework] Rare occurrence of "<!-- safepoint while printing -->" in PrintIdeal/PrintOptoAssembly can let tests fail
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jul 2021 12:25:27 GMT, Christian Hagedorn wrote:

> A test VM used by the IR framework sometimes prints `<!-- safepoint while printing -->` in the middle of emitting a `PrintIdeal` or `PrintOptoAssembly` output which could lead to IR matching failures:
> https://github.com/openjdk/jdk/blob/489e5fd12a37a45f4f5ea64b05f85c6f99f70811/src/hotspot/share/utilities/ostream.cpp#L918-L927
>
> I thought about just bailing out of IR matching if this string is found after a failure but this issue also affects internal framework tests (I observed one case locally where this happened in the test `TestIRMatching`, letting it fail).
>
> Handling `<!-- safepoint while printing -->` makes things more complicated for the IR framework tests. I'm not sure about the value of printing this message in the first place. But if nobody objects, I suggest to either remove it or at least guard it with `Verbose`, for example. I went with the latter solution in this PR.
>
> Thanks,
> Christian

Agree.

-------------

Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/4932

From rriggs at openjdk.java.net Thu Jul 29 17:46:36 2021
From: rriggs at openjdk.java.net (Roger Riggs)
Date: Thu, 29 Jul 2021 17:46:36 GMT
Subject: [jdk17] Withdrawn: 8271489: (doc) Clarify Filter Factory example
In-Reply-To: <0p6tjDXve1NzwIv5CqU2RjJgL9R0t0Je8scHatsVlTU=.a49cd354-ca52-4a8c-8481-dde3431a4a27@github.com>
References: <0p6tjDXve1NzwIv5CqU2RjJgL9R0t0Je8scHatsVlTU=.a49cd354-ca52-4a8c-8481-dde3431a4a27@github.com>
Message-ID: <4o8A0bH9cVUSjB1lO6rW6Z3QT1KTxpu9Z0IaWK3c_-g=.25c55618-f4b9-46d6-9e4e-39674df24ca9@github.com>

On Thu, 29 Jul 2021 16:36:21 GMT, Roger Riggs wrote:

> Improve the clarity of comments in the ObjectInputFilter FilterInThread example.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk17/pull/292

From coleenp at openjdk.java.net Thu Jul 29 21:07:45 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 29 Jul 2021 21:07:45 GMT
Subject: RFR: 8271506: Add ResourceHashtable support for deleting selected entries
Message-ID:

The ResourceHashtable doesn't have a way to delete selected entries based on criteria such as their class having been unloaded, which is needed to replace Hashtable with ResourceHashtable.

The nodes of the ResourceHashtable have a key and value of template types.

    template<typename K, typename V> class ResourceHashtableNode : public ResourceObj {
    public:
      unsigned _hash;
      K _key;
      V _value;
      ...

But there's no destructor, so ~K and ~V are not called (if I understand C++ correctly). When instantiated with a value that's not a pointer, calling code does this:

    SourceObjInfo src_info(ref, read_only, follow_mode);
    bool created;
    SourceObjInfo* p = _src_obj_table.put_if_absent(src_obj, src_info, &created);

So if SourceObjInfo has a destructor, it'll have to have a careful assignment operator so that the value copied into the hashtable doesn't get deleted.
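[Editorial illustration, not part of the original message.] To make the ownership question concrete, here is a minimal sketch of a chained hash table in which removal of selected entries is driven by a caller-supplied `do_entry` callback, and that callback, not the table, is responsible for releasing whatever the key and value refer to. This is not the HotSpot `ResourceHashtable`: `TinyTable`, its `unlink` signature, the `Info` payload, and `purge_dead_entries` are hypothetical names chosen for the example.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical, simplified chained hash table; not HotSpot code.
template <typename K, typename V>
class TinyTable {
  struct Node { K key; V value; Node* next; };
  static constexpr std::size_t kBuckets = 16;
  Node* _buckets[kBuckets] = {};
  std::size_t _size = 0;

  static std::size_t bucket_of(const K& key) {
    return std::hash<K>()(key) % kBuckets;
  }

 public:
  TinyTable() = default;
  TinyTable(const TinyTable&) = delete;
  TinyTable& operator=(const TinyTable&) = delete;

  // Frees any remaining nodes. Objects behind pointer values are NOT freed here.
  ~TinyTable() {
    unlink([](const K&, V&) -> bool { return true; });
  }

  // Inserts without checking for duplicates, to keep the sketch short.
  void put(const K& key, const V& value) {
    std::size_t i = bucket_of(key);
    _buckets[i] = new Node{key, value, _buckets[i]};
    ++_size;
  }

  std::size_t size() const { return _size; }

  // Calls do_entry(key, value) for every entry. If it returns true, the node
  // is spliced out and freed; cleaning up what key/value refer to is the
  // callback's job.
  template <typename F>
  void unlink(F do_entry) {
    for (std::size_t i = 0; i < kBuckets; ++i) {
      Node** link = &_buckets[i];
      while (*link != nullptr) {
        Node* node = *link;
        if (do_entry(node->key, node->value)) {
          *link = node->next;
          delete node;
          --_size;
        } else {
          link = &node->next;
        }
      }
    }
  }
};

// Hypothetical payload: "alive" stands in for "its class is still loaded".
struct Info { bool alive; };

// Demonstration: removes the dead entries and returns how many survive.
std::size_t purge_dead_entries() {
  TinyTable<int, Info*> table;
  table.put(1, new Info{true});
  table.put(2, new Info{false});
  table.put(3, new Info{false});
  table.unlink([](const int&, Info*& info) -> bool {
    if (info->alive) return false;
    delete info;  // the callback owns cleanup of the value
    return true;
  });
  std::size_t survivors = table.size();
  // Release the remaining payloads before the table is destroyed.
  table.unlink([](const int&, Info*& info) -> bool { delete info; return true; });
  return survivors;
}
```

Keeping cleanup in the callback means the table never has to know whether a value is a plain object, an owning pointer, or a borrowed one; the cost is that every caller has to get that cleanup right.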
In this patch, I assign the responsibility of deleting the Key and Value of the hashtable to the do_entry function, because it's simple. If we want to use more advanced unreadable C++ code, someone will have to suggest an alternate set of changes, because my C++ is not up to this. Tested with tier1-3, gtest, and upcoming patch for JDK-8048190. ------------- Commit messages: - 8271506: Add ResourceHashtable support for deleting selected entries Changes: https://git.openjdk.java.net/jdk/pull/4938/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4938&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271506 Stats: 51 lines in 2 files changed: 51 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4938.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4938/head:pull/4938 PR: https://git.openjdk.java.net/jdk/pull/4938 From coleenp at openjdk.java.net Thu Jul 29 21:42:29 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 29 Jul 2021 21:42:29 GMT Subject: RFR: 8271242: Add Arena regression tests In-Reply-To: References: Message-ID: <_chJbiP524b_Xm5Ir_gtmnRKX00FuJfVmSfwVhC0PoM=.ec823b0f-65d9-4bc1-92ec-8be13d06cf3b@github.com> On Tue, 27 Jul 2021 06:06:40 GMT, Thomas Stuefe wrote: > May I please have reviews for these test additions. These are new regression tests for hotspot arenas. We don't have any and it makes sense to have them since this code is fragile and we work on it. > > It also contains some new gtest utility functions which I will use to consolidate some more test coding, mainly in the metaspace gtests (in a future rfe). > > It also comes with a new jtreg gtest launcher for arena tests to test UseMallocOnly mode. As long as we support that, we should test it. > > --- > > tests: > - gtests, manually with 32/64 bit and with/without UseMallocOnly > - GHAs Looks good. Thank you for writing these tests. test/hotspot/gtest/memory/test_arena.cpp line 65: > 63: } > 64: // Allocate again. 
The new allocations should have the same position as the 0-sized > 65: // first one. It seems so dangerous to allow zero sized Amalloc. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4909 From dholmes at openjdk.java.net Thu Jul 29 22:35:31 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 29 Jul 2021 22:35:31 GMT Subject: RFR: 8271471: [IR Framework] Rare occurrence of "" in PrintIdeal/PrintOptoAssembly can let tests fail In-Reply-To: References: Message-ID: On Thu, 29 Jul 2021 12:25:27 GMT, Christian Hagedorn wrote: > A test VM used by the IR framework sometimes prints `` in the middle of emitting a `PrintIdeal` or `PrintOptoAssembly` output which could lead to IR matching failures: > https://github.com/openjdk/jdk/blob/489e5fd12a37a45f4f5ea64b05f85c6f99f70811/src/hotspot/share/utilities/ostream.cpp#L918-L927 > > I thought about just bailing out of IR matching if this string is found after a failure but this issue also affects internal framework tests (I observed one case locally where this happened in the test `TestIRMatching`, letting it fail). > > Handling `` makes things more complicated for the IR framework tests. I'm not sure about the value of printing this message in the first place. But if nobody objects, I suggest to either remove it or at least guard it with `Verbose`, for example. I went with the latter solution in this PR. > > Thanks, > Christian I think the expectation is that a safepoint whilst printing and holding the tty lock should be a very rare thing, possibly indicative of an error, and so something we want to know about. If this is hidden by Verbose then it will effectively always be hidden and we won't spot this. Do you understand why this is getting printed in this context, and that it doesn't indicate a problem? 
Thanks,
David

-------------

PR: https://git.openjdk.java.net/jdk/pull/4932

From coleenp at openjdk.java.net Thu Jul 29 23:06:30 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 29 Jul 2021 23:06:30 GMT
Subject: RFR: JDK-8256844: Make NMT late-initializable
In-Reply-To:
References:
Message-ID:

On Thu, 22 Jul 2021 14:58:47 GMT, Thomas Stuefe wrote:

> Short: this patch makes NMT available in custom-launcher scenarios and during gtests. It simplifies NMT initialization. It adds a lot of NMT-specific testing, cleans them up and makes them side-effect-free.
>
> ---------
>
> NMT continues to be an extremely useful tool for SAP to tackle memory problems in the JVM.
>
> However, NMT is of limited use due to the following restrictions:
>
> - NMT cannot be used if the hotspot is embedded into a custom launcher unless the launcher actively cooperates. Just creating and invoking the JVM is not enough, it needs to do some steps prior to loading the hotspot. This limitation is not well known (nor, do I believe, documented). Many products don't do this, e.g., you cannot use NMT with IntelliJ. For us at SAP this problem limits NMT usefulness greatly since our VMs are often embedded into custom launchers and modifying every launcher is impossible.
> - Worse, if that custom launcher links the libjvm *statically* there is just no way to activate NMT at all. This is the reason NMT cannot be used in the `gtestlauncher`.
> - Related to that is that we cannot pass NMT options via `JAVA_TOOL_OPTIONS` and `-XX:Flags=`.
> - The fact that NMT cannot be used in gtests is really a pity since it would allow us to both test NMT itself more rigorously and check for memory leaks while testing other stuff.
>
> The reason for all this is that NMT initialization happens very early, on the first call to `os::malloc()`. And those calls happen already during dynamic C++ initialization - a long time before the VM gets around to parsing arguments. So, regular VM argument parsing is too late to parse NMT arguments.
>
> The current solution is to pass NMT arguments via a specially prepared environment variable: `NMT_LEVEL_=`. That environment variable has to be set by the embedding launcher, before it loads the libjvm. Since its name contains the PID, we cannot even set that variable in the shell before starting the launcher.
>
> All that means that every launcher needs to especially parse and process the NMT arguments given at the command line (or via whatever method) and prepare the environment variable. `java` itself does this. This only works before the libjvm.so is loaded, before its dynamic C++ initialization. For that reason, it does not work if the launcher links statically against the hotspot, since in that case C++ initialization of the launcher and hotspot are folded into one phase with no possibility of executing code beforehand.
>
> And since it bypasses argument handling in the VM, it bypasses a number of argument processing ways, e.g., `JAVA_TOOL_OPTIONS`.
>
> ------
>
> This patch fixes these shortcomings by making NMT late-initializable: it can now be initialized after normal VM argument parsing, like all other parts of the VM. This greatly simplifies NMT initialization and makes it work automagically for every third party launcher, as well as within our gtests.
>
> The glaring problem with late-initializing NMT is the NMT malloc headers. If we rule out just always having them (unacceptable in terms of memory overhead), there is no safe way to determine, in os::free(), if an allocation came from before or after NMT initialization ran, and therefore what to do with its malloc headers. For a more extensive explanation, please see the comment block in `nmtPreInit.hpp` and the discussion with @kimbarrett and @zhengyu123 in the JBS comment section.
>
> The heart of this patch is a new way to track early, pre-NMT-init allocations. These are tracked via a lookup table. This was a suggestion by Kim and it worked out well.
>
> Changes in detail:
>
> - pre-NMT-init handling:
>   - the new files `nmtPreInit.hpp/cpp` take care of NMT pre-init handling. They contain a small global lookup table managing C-heap blocks allocated in the pre-NMT-init phase.
>   - `os::malloc()/os::realloc()/os::free()` defer to this code before doing anything else.
>   - Please see the extensive comment block at the start of `nmtPreInit.hpp` explaining the details.
>
> - Changes to NMT:
>   - Before, NMT initialization was spread over two phases, `initialize()` and `late_initialize()`. Those were merged into one and simplified - there is only one initialization now which happens after argument parsing.
>   - Minor changes were needed for the `NMT_TrackingLevel` enum - to simplify code, I changed NMT_unknown to be numerically 0. A new comment block in `nmtCommon.hpp` now clearly specifies what's what, including allowed level state transitions.
>   - New utility functions to translate tracking level from/to strings added to `NMTUtil`
>   - NMT has never been able to handle virtual memory allocations before initialization, which is fine since os::reserve_memory() is not called before VM parses arguments. We now assert that.
>   - All code outside the VM handling NMT initialization (e.g. libjli) has been removed, as has the code testing it.
>
> - Gtests:
>   - Some existing gtests had to be modified: before, they all changed global state (turning NMT on/off) before testing. This is not allowed anymore, to keep NMT simple. Also, this pattern disturbed other tests.
>   - The new way to test is to passively check whether NMT has been switched on or off, and do tests accordingly: if on, full tests, if off, test just what makes sense in off-state. That does not disturb neighboring tests, gives us actually better coverage all around.
>   - It is now possible to start the gtestlauncher with NMT on! Which additionally gives us good coverage.
>   - To actually do gtests with NMT - since it's disabled by default - we now run NMT-enabled gtests as part of the hotspot jtreg NMT wrapper. This pattern we have done for a number of other facilities, see all the tests in test/hotspot/jtreg/gtest.. . It works very well.
>   - Finally, a new gtest has been written to test the NMT preinit lookup map in isolation, placed in `gtest/nmt/test_nmtpreinitmap.cpp`.
>
> - jtreg:
>   - A new test has been added, `runtime/NMT/NMTInitializationTest.java`, testing NMT initialization in the face of many many VM arguments.
>
> -------------
>
> Tests:
> - ran manually all new tests on 64-bit and 32-bit Linux
> - GHAs
> - The patch has been active in SAP's test systems for a while now.

This is interesting, and it seems a better way to solve this problem. Where were you all those years ago?? I hope @zhengyu123 has a chance to review it.

Also interesting is that we were wondering how we could return malloc'd memory on JVM creation failure, and this might partially help with that larger problem.

src/hotspot/share/services/nmtPreInit.hpp line 128:

> 126: // Returns start of the user data area
> 127: void* payload() { return this + 1; }
> 128: const void* payload() const { return this + 1; }

This is an odd looking overload (that I wouldn't have thought possible). Maybe one of these payloads can be renamed to why it's const.

src/hotspot/share/services/nmtPreInit.hpp line 166:

> 164: NMTPreInitAllocation** find_entry(const void* p) const {
> 165: const unsigned index = index_for_key(p);
> 166: NMTPreInitAllocation** aa = (NMTPreInitAllocation**) (&(_entries[index]));

Why is this cast needed?

-------------

Marked as reviewed by coleenp (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4874

From kbarrett at openjdk.java.net Fri Jul 30 04:06:29 2021
From: kbarrett at openjdk.java.net (Kim Barrett)
Date: Fri, 30 Jul 2021 04:06:29 GMT
Subject: RFR: JDK-8256844: Make NMT late-initializable
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jul 2021 22:52:07 GMT, Coleen Phillimore wrote:

> src/hotspot/share/services/nmtPreInit.hpp line 128:
>
>> 126: // Returns start of the user data area
>> 127: void* payload() { return this + 1; }
>> 128: const void* payload() const { return this + 1; }
>
> This is an odd looking overload (that I wouldn't have thought possible). Maybe one of these payloads can be renamed to why it's const.

[Not a review, just a drive-by comment.]

This is a normal and idiomatic overload on the const-ness of `this`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4874

From kbarrett at openjdk.java.net Fri Jul 30 04:12:31 2021
From: kbarrett at openjdk.java.net (Kim Barrett)
Date: Fri, 30 Jul 2021 04:12:31 GMT
Subject: RFR: JDK-8256844: Make NMT late-initializable
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jul 2021 22:53:42 GMT, Coleen Phillimore wrote:

> src/hotspot/share/services/nmtPreInit.hpp line 166:
>
>> 164: NMTPreInitAllocation** find_entry(const void* p) const {
>> 165: const unsigned index = index_for_key(p);
>> 166: NMTPreInitAllocation** aa = (NMTPreInitAllocation**) (&(_entries[index]));
>
> Why is this cast needed?

[Not a review, just a drive-by comment.]

It's casting away const. Better would be a const_cast. And probably moved to the final result, with the body keeping things const-qualified. And maybe const and non-const overloads of this function. Or maybe this function shouldn't be const-qualified if a non-const result is always needed, but that doesn't seem likely.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4874

From iklam at openjdk.java.net Fri Jul 30 04:32:30 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Fri, 30 Jul 2021 04:32:30 GMT
Subject: RFR: 8271506: Add ResourceHashtable support for deleting selected entries
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jul 2021 21:00:06 GMT, Coleen Phillimore wrote:

> The ResourceHashtable doesn't have a way to delete selected entries based on things like their class has been unloaded, which is needed to replace Hashtable with ResourceHashtable.
> The Nodes of the ResourceHashtable has a Key and Value of template types.
> template
> class ResourceHashtableNode : public ResourceObj {
> public:
> unsigned _hash;
> K _key;
> V _value;
> ...
> But there's no destructor so that ~K and ~V are not called (if I understand C++ correctly).
> > When instantiated with a value that's not a pointer, calling code does this: > > SourceObjInfo src_info(ref, read_only, follow_mode); > bool created; > SourceObjInfo* p = _src_obj_table.put_if_absent(src_obj, src_info, &created); > > So if SourceObjInfo has a destructor, it'll have to have a careful assignment operator so that the value copied into the hashtable doesn't get deleted. > > In this patch, I assign the responsibility of deleting the Key and Value of the hashtable to the do_entry function, because it's simple. If we want to use more advanced unreadable C++ code, someone will have to suggest an alternate set of changes, because my C++ is not up to this. > > Tested with tier1-3, gtest, and upcoming patch for JDK-8048190. Changes requested by iklam (Reviewer). src/hotspot/share/utilities/resourceHash.hpp line 219: > 217: // the entry is deleted. > 218: template > 219: void unlink(ITER* iter) const { `unlink()` shouldn't be `const` as it modifies the table. However, you probably needed to declare it as such because it calls `bucket_at()`, which is `const`. And the reason that `bucket_at()` is `const` is because it's called by `iterate()`, which is `const`. The use of `const` and `const_cast` in this class is getting a bit out of whack. I have created [JDK-8271525](https://bugs.openjdk.java.net/browse/JDK-8271525) with a preliminary fix here: https://github.com/openjdk/jdk/pull/4942 Maybe we should do that before this PR? it will also make the `const_cast` unnecessary for `decrement_entries()`. 
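
The const/non-const split discussed in this review can be sketched as follows. This is a minimal illustration under stated assumptions, not the change in JDK-8271525 or in pull request 4942: `TinyTable`, `Node`, `SIZE`, `put()` and `entries()` are hypothetical names invented here, and the real `ResourceHashtable` differs in many ways. The idea shown is that the read-only `iterate()` goes through a `const` overload of `bucket_at()`, while the mutating `unlink()` is non-const and uses the non-const overload, so neither needs a `const_cast`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, heavily simplified chained hash table.
// None of these names are taken from resourceHash.hpp.
class TinyTable {
  struct Node {
    int   key;
    int   value;
    Node* next;
  };

  static const std::size_t SIZE = 8;
  Node* _buckets[SIZE];
  int   _entries;

  // Const overload: hands out a pointer through which the chain head
  // cannot be reassigned. Used by read-only code such as iterate().
  Node* const* bucket_at(std::size_t i) const { return &_buckets[i]; }

  // Non-const overload: used by code that modifies the chains.
  Node** bucket_at(std::size_t i) { return &_buckets[i]; }

 public:
  TinyTable() : _entries(0) {
    for (std::size_t i = 0; i < SIZE; i++) _buckets[i] = nullptr;
  }
  ~TinyTable() { unlink([](int, int) { return true; }); }

  TinyTable(const TinyTable&) = delete;
  TinyTable& operator=(const TinyTable&) = delete;

  // Inserts at the head of the chain; duplicates are not checked.
  void put(int key, int value) {
    Node** b = bucket_at(static_cast<std::size_t>(key) % SIZE);
    *b = new Node{key, value, *b};
    _entries++;
  }

  int entries() const { return _entries; }

  // Read-only traversal: const-qualified, resolves to the const bucket_at().
  template <typename F>
  void iterate(F f) const {
    for (std::size_t i = 0; i < SIZE; i++) {
      for (const Node* n = *bucket_at(i); n != nullptr; n = n->next) {
        f(n->key, n->value);
      }
    }
  }

  // Deletes every entry for which f returns true. It modifies the table,
  // so it is not const-qualified and resolves to the non-const bucket_at().
  template <typename F>
  void unlink(F f) {
    for (std::size_t i = 0; i < SIZE; i++) {
      Node** p = bucket_at(i);
      while (*p != nullptr) {
        Node* n = *p;
        if (f(n->key, n->value)) {
          *p = n->next;
          delete n;
          assert(_entries > 0);
          _entries--;
        } else {
          p = &n->next;
        }
      }
    }
  }
};
```

With this shape, calling `unlink()` on a `const TinyTable&` is a compile error, which is the property the review comment asks for.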
------------- PR: https://git.openjdk.java.net/jdk/pull/4938 From chagedorn at openjdk.java.net Fri Jul 30 07:22:34 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 30 Jul 2021 07:22:34 GMT Subject: RFR: 8271471: [IR Framework] Rare occurrence of "" in PrintIdeal/PrintOptoAssembly can let tests fail In-Reply-To: References: Message-ID: On Thu, 29 Jul 2021 12:25:27 GMT, Christian Hagedorn wrote: > A test VM used by the IR framework sometimes prints `` in the middle of emitting a `PrintIdeal` or `PrintOptoAssembly` output which could lead to IR matching failures: > https://github.com/openjdk/jdk/blob/489e5fd12a37a45f4f5ea64b05f85c6f99f70811/src/hotspot/share/utilities/ostream.cpp#L918-L927 > > I thought about just bailing out of IR matching if this string is found after a failure but this issue also affects internal framework tests (I observed one case locally where this happened in the test `TestIRMatching`, letting it fail). > > Handling `` makes things more complicated for the IR framework tests. I'm not sure about the value of printing this message in the first place. But if nobody objects, I suggest to either remove it or at least guard it with `Verbose`, for example. I went with the latter solution in this PR. > > Thanks, > Christian I see your concern. Let me try to reproduce it and get some more information when it happens again. This might take some time. I'll get back to you. ------------- PR: https://git.openjdk.java.net/jdk/pull/4932 From ngasson at openjdk.java.net Fri Jul 30 07:31:31 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Fri, 30 Jul 2021 07:31:31 GMT Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v3] In-Reply-To: References: Message-ID: On Thu, 29 Jul 2021 16:18:58 GMT, Andrew Haley wrote: >> C1 has its own code generators for zeroing words. We should use the same logic for C1 and C2, which should give us better C1 performance and result in less code to maintain. 
>>
>> This is one of those patches that's a great joy to write, because it consists mainly of deletions. The code I've added is mostly adapters to allow the C1 code to use the memory-zeroing logic written originally for C2. This means we have less code, but also that VM configuration options (e.g. `BlockZeroingLowLimit`) work with C1 and C2 in the same way.
>>
>> Measuring the performance of memory allocation is quite tricky, so I've written a JMH test case that measures the raw allocation rate of the JVM for various object sizes. This is inevitably rather noisy because it combines the effects of both the allocation code and other GC-related pauses. Nonetheless, it's a useful sanity check.
>>
>> The performance differences between old and new are mostly in the noise, but with large allocations the advantage of `DC ZVA` becomes apparent:
>>
>> old:
>>
>> RawAllocationRate.arrayTest_C1       8192  thrpt    5  11220.314 ± 336.878  ops/s
>> RawAllocationRate.arrayTest_C1      16384  thrpt    5  16655.815 ±  88.577  ops/s
>> RawAllocationRate.arrayTest_C1      65536  thrpt    5  28302.661 ± 155.513  ops/s
>> RawAllocationRate.arrayTest_C1     131072  thrpt    5  31434.868 ± 211.768  ops/s
>>
>> new:
>>
>> RawAllocationRate.arrayTest_C1       8192  thrpt    5  13677.987 ± 143.048  ops/s
>> RawAllocationRate.arrayTest_C1      16384  thrpt    5  19517.416 ± 155.004  ops/s
>> RawAllocationRate.arrayTest_C1      65536  thrpt    5  37348.536 ± 307.582  ops/s
>> RawAllocationRate.arrayTest_C1     131072  thrpt    5  43414.399 ±  58.317  ops/s
>>
>>
>> Full test results, Graviton 2 (i.e. Neoverse N1). Units are megabytes per second, object sizes are in bytes:
>>
>>
>> old:
>>
>> Benchmark                          (size)   Mode  Cnt      Score     Error  Units
>> RawAllocationRate.arrayTest            32  thrpt    5   5092.798 ±  20.879  ops/s
>> RawAllocationRate.arrayTest            64  thrpt    5   9821.608 ±   6.250  ops/s
>> RawAllocationRate.arrayTest           256  thrpt    5  14117.192 ±  72.720  ops/s
>> RawAllocationRate.arrayTest          1024  thrpt    5   9090.514 ±  40.239  ops/s
>> RawAllocationRate.arrayTest          2048  thrpt    5   9842.503 ±  52.744  ops/s
>> RawAllocationRate.arrayTest          4096  thrpt    5   9866.179 ±   6.332  ops/s
>> RawAllocationRate.arrayTest          8192  thrpt    5  12836.968 ±  14.143  ops/s
>> RawAllocationRate.arrayTest         16384  thrpt    5  18970.307 ±  96.903  ops/s
>> RawAllocationRate.arrayTest         65536  thrpt    5  36709.095 ±  38.256  ops/s
>> RawAllocationRate.arrayTest        131072  thrpt    5  43055.263 ±  60.808  ops/s
>> RawAllocationRate.arrayTest_C1         32  thrpt    5   3045.285 ±  23.128  ops/s
>> RawAllocationRate.arrayTest_C1         64  thrpt    5   5774.157 ±  52.472  ops/s
>> RawAllocationRate.arrayTest_C1        256  thrpt    5   4720.713 ±   9.419  ops/s
>> RawAllocationRate.arrayTest_C1       1024  thrpt    5   7457.880 ± 806.208  ops/s
>> RawAllocationRate.arrayTest_C1       2048  thrpt    5   8155.046 ± 194.153  ops/s
>> RawAllocationRate.arrayTest_C1       4096  thrpt    5   8364.379 ± 127.661  ops/s
>> RawAllocationRate.arrayTest_C1       8192  thrpt    5  11220.314 ± 336.878  ops/s
>> RawAllocationRate.arrayTest_C1      16384  thrpt    5  16655.815 ±  88.577  ops/s
>> RawAllocationRate.arrayTest_C1      65536  thrpt    5  28302.661 ± 155.513  ops/s
>> RawAllocationRate.arrayTest_C1     131072  thrpt    5  31434.868 ± 211.768  ops/s
>> RawAllocationRate.instanceTest         32  thrpt    5   6667.433 ±  50.031  ops/s
>> RawAllocationRate.instanceTest         64  thrpt    5  10669.876 ±  72.109  ops/s
>> RawAllocationRate.instanceTest        256  thrpt    5   5483.582 ± 336.743  ops/s
>> RawAllocationRate.instanceTest       1024  thrpt    5   9740.872 ±   6.269  ops/s
>> RawAllocationRate.instanceTest       2048  thrpt    5   9868.685 ±  51.939  ops/s
>> RawAllocationRate.instanceTest       4096  thrpt    5   9881.944 ±  46.306  ops/s
>> RawAllocationRate.instanceTest       8192  thrpt    5  13524.791 ±  69.250  ops/s
>> RawAllocationRate.instanceTest      16384  thrpt    5  19560.774 ± 109.518  ops/s
>> RawAllocationRate.instanceTest      65536  thrpt    5  37510.256 ±  15.586  ops/s
>> RawAllocationRate.instanceTest     131072  thrpt    5  43361.887 ± 181.294  ops/s
>> RawAllocationRate.instanceTest_C1      32  thrpt    5   2851.135 ±  22.891  ops/s
>> RawAllocationRate.instanceTest_C1      64  thrpt    5   5476.183 ±  84.376  ops/s
>> RawAllocationRate.instanceTest_C1     256  thrpt    5   5105.347 ±  35.389  ops/s
>> RawAllocationRate.instanceTest_C1    1024  thrpt    5   7380.805 ±   3.944  ops/s
>> RawAllocationRate.instanceTest_C1    2048  thrpt    5   8963.428 ±  83.857  ops/s
>> RawAllocationRate.instanceTest_C1    4096  thrpt    5   9257.715 ±  52.647  ops/s
>> RawAllocationRate.instanceTest_C1    8192  thrpt    5  11655.359 ±  70.209  ops/s
>> RawAllocationRate.instanceTest_C1   16384  thrpt    5  17084.813 ±  91.150  ops/s
>> RawAllocationRate.instanceTest_C1   65536  thrpt    5  28682.783 ± 176.563  ops/s
>> RawAllocationRate.instanceTest_C1  131072  thrpt    5  31268.318 ± 221.486  ops/s
>>
>> new:
>>
>> Benchmark                          (size)   Mode  Cnt      Score     Error  Units
>> RawAllocationRate.arrayTest            32  thrpt    5   5355.477 ±  43.045  ops/s
>> RawAllocationRate.arrayTest            64  thrpt    5   9825.067 ±  55.493  ops/s
>> RawAllocationRate.arrayTest           256  thrpt    5  13984.865 ± 125.125  ops/s
>> RawAllocationRate.arrayTest          1024  thrpt    5   9025.380 ±  48.921  ops/s
>> RawAllocationRate.arrayTest          2048  thrpt    5   9844.463 ±   6.780  ops/s
>> RawAllocationRate.arrayTest          4096  thrpt    5   9866.566 ±  48.659  ops/s
>> RawAllocationRate.arrayTest          8192  thrpt    5  12753.622 ±  67.211  ops/s
>> RawAllocationRate.arrayTest         16384  thrpt    5  18890.419 ±  14.152  ops/s
>> RawAllocationRate.arrayTest         65536  thrpt    5  37322.124 ± 269.352  ops/s
>> RawAllocationRate.arrayTest        131072  thrpt    5  43017.952 ± 204.057  ops/s
>> RawAllocationRate.arrayTest_C1         32  thrpt    5   3102.221 ±  13.811  ops/s
>> RawAllocationRate.arrayTest_C1         64  thrpt    5   5947.419 ±  36.408  ops/s
>> RawAllocationRate.arrayTest_C1        256  thrpt    5   5124.479 ± 548.617  ops/s
>> RawAllocationRate.arrayTest_C1       1024  thrpt    5   9459.376 ± 716.317  ops/s
>> RawAllocationRate.arrayTest_C1       2048  thrpt    5   9840.594 ±  15.922  ops/s
>> RawAllocationRate.arrayTest_C1       4096  thrpt    5   9860.274 ±  56.088  ops/s
>> RawAllocationRate.arrayTest_C1       8192  thrpt    5  13677.987 ± 143.048  ops/s
>> RawAllocationRate.arrayTest_C1      16384  thrpt    5  19517.416 ± 155.004  ops/s
>> RawAllocationRate.arrayTest_C1      65536  thrpt    5  37348.536 ± 307.582  ops/s
>> RawAllocationRate.arrayTest_C1     131072  thrpt    5  43414.399 ±  58.317  ops/s
>> RawAllocationRate.instanceTest         32  thrpt    5   6620.452 ± 137.048  ops/s
>> RawAllocationRate.instanceTest         64  thrpt    5   9850.677 ±   6.417  ops/s
>> RawAllocationRate.instanceTest        256  thrpt    5   5533.512 ± 129.334  ops/s
>> RawAllocationRate.instanceTest       1024  thrpt    5   9829.806 ±   7.555  ops/s
>> RawAllocationRate.instanceTest       2048  thrpt    5   9857.707 ±  51.541  ops/s
>> RawAllocationRate.instanceTest       4096  thrpt    5   9957.300 ±   7.115  ops/s
>> RawAllocationRate.instanceTest       8192  thrpt    5  13662.581 ±  85.225  ops/s
>> RawAllocationRate.instanceTest      16384  thrpt    5  19571.796 ± 120.962  ops/s
>> RawAllocationRate.instanceTest      65536  thrpt    5  37401.527 ±  67.260  ops/s
>> RawAllocationRate.instanceTest     131072  thrpt    5  43327.339 ±  35.077  ops/s
>> RawAllocationRate.instanceTest_C1      32  thrpt    5   2842.031 ±  47.924  ops/s
>> RawAllocationRate.instanceTest_C1      64  thrpt    5   5359.357 ±  53.031  ops/s
>> RawAllocationRate.instanceTest_C1     256  thrpt    5   5081.287 ±  57.737  ops/s
>> RawAllocationRate.instanceTest_C1    1024  thrpt    5   8372.330 ± 267.016  ops/s
>> RawAllocationRate.instanceTest_C1    2048  thrpt    5   9470.224 ± 250.706  ops/s
>> RawAllocationRate.instanceTest_C1    4096  thrpt    5   9843.936 ±  52.825  ops/s
>> RawAllocationRate.instanceTest_C1    8192  thrpt    5  13695.863 ±  80.433  ops/s
>> RawAllocationRate.instanceTest_C1   16384  thrpt    5  19495.110 ± 116.300  ops/s
>> RawAllocationRate.instanceTest_C1   65536  thrpt    5  37448.948 ± 291.917  ops/s
>> RawAllocationRate.instanceTest_C1  131072  thrpt    5  43443.406 ± 267.236  ops/s
>
> Andrew Haley has updated the pull request incrementally with two additional commits since the last revision:
>
> - Tidy up register temps in C1 stubs that call initialize_body()
> - Don't use a trampoline call to zero_blocks in C1 compiles

Looks good to me and I've tested tier1 with -XX:TieredStopAtLevel=1. Although you probably ought to update the copyright year in c1_MacroAssembler_aarch64.hpp.

-------------

Marked as reviewed by ngasson (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/4919 From aph at openjdk.java.net Fri Jul 30 09:06:06 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 30 Jul 2021 09:06:06 GMT Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v4] In-Reply-To: References: Message-ID: > C1 has its own code generators for zeroing words. We should use the same logic for C1 and C2, which should give us better C1 performance and result in less code to maintain. > > This is one of those patches that's a great joy to write, because it consists mainly of deletions. The code I've added is mostly adapters to allow the C1 code to use the memory-zeroing logic written originally for C2. This means we have less code, but also that VM configuration options (e.g. `BlockZeroingLowLimit`) work with C1 and C2 in the same way. > > Measuring the performance of memory allocation is quite tricky, so I've written a JMH test case that measures the raw allocation rate of the JVM for various object sizes. This is inevitably rather noisy because it combines the effects of both the allocation code and other GC-related pauses. Nonetheless, it's a useful sanity check. > > The performance differences between old and new are mostly in the noise, but with large allocations the advantage of `DC ZVA` becomes apparent: > > old: > > RawAllocationRate.arrayTest_C1 8192 thrpt 5 11220.314 ? 336.878 ops/s > RawAllocationRate.arrayTest_C1 16384 thrpt 5 16655.815 ? 88.577 ops/s > RawAllocationRate.arrayTest_C1 65536 thrpt 5 28302.661 ? 155.513 ops/s > RawAllocationRate.arrayTest_C1 131072 thrpt 5 31434.868 ? 211.768 ops/s > > new: > > RawAllocationRate.arrayTest_C1 8192 thrpt 5 13677.987 ? 143.048 ops/s > RawAllocationRate.arrayTest_C1 16384 thrpt 5 19517.416 ? 155.004 ops/s > RawAllocationRate.arrayTest_C1 65536 thrpt 5 37348.536 ? 307.582 ops/s > RawAllocationRate.arrayTest_C1 131072 thrpt 5 43414.399 ? 58.317 ops/s > > > Full test results, Graviton 2 (i.e. Neoverse N1). 
Units are megabytes per second, > objects sizes are in bytes: > > > old: > > Benchmark (size) Mode Cnt Score Error Units > RawAllocationRate.arrayTest 32 thrpt 5 5092.798 ? 20.879 ops/s > RawAllocationRate.arrayTest 64 thrpt 5 9821.608 ? 6.250 ops/s > RawAllocationRate.arrayTest 256 thrpt 5 14117.192 ? 72.720 ops/s > RawAllocationRate.arrayTest 1024 thrpt 5 9090.514 ? 40.239 ops/s > RawAllocationRate.arrayTest 2048 thrpt 5 9842.503 ? 52.744 ops/s > RawAllocationRate.arrayTest 4096 thrpt 5 9866.179 ? 6.332 ops/s > RawAllocationRate.arrayTest 8192 thrpt 5 12836.968 ? 14.143 ops/s > RawAllocationRate.arrayTest 16384 thrpt 5 18970.307 ? 96.903 ops/s > RawAllocationRate.arrayTest 65536 thrpt 5 36709.095 ? 38.256 ops/s > RawAllocationRate.arrayTest 131072 thrpt 5 43055.263 ? 60.808 ops/s > RawAllocationRate.arrayTest_C1 32 thrpt 5 3045.285 ? 23.128 ops/s > RawAllocationRate.arrayTest_C1 64 thrpt 5 5774.157 ? 52.472 ops/s > RawAllocationRate.arrayTest_C1 256 thrpt 5 4720.713 ? 9.419 ops/s > RawAllocationRate.arrayTest_C1 1024 thrpt 5 7457.880 ? 806.208 ops/s > RawAllocationRate.arrayTest_C1 2048 thrpt 5 8155.046 ? 194.153 ops/s > RawAllocationRate.arrayTest_C1 4096 thrpt 5 8364.379 ? 127.661 ops/s > RawAllocationRate.arrayTest_C1 8192 thrpt 5 11220.314 ? 336.878 ops/s > RawAllocationRate.arrayTest_C1 16384 thrpt 5 16655.815 ? 88.577 ops/s > RawAllocationRate.arrayTest_C1 65536 thrpt 5 28302.661 ? 155.513 ops/s > RawAllocationRate.arrayTest_C1 131072 thrpt 5 31434.868 ? 211.768 ops/s > RawAllocationRate.instanceTest 32 thrpt 5 6667.433 ? 50.031 ops/s > RawAllocationRate.instanceTest 64 thrpt 5 10669.876 ? 72.109 ops/s > RawAllocationRate.instanceTest 256 thrpt 5 5483.582 ? 336.743 ops/s > RawAllocationRate.instanceTest 1024 thrpt 5 9740.872 ? 6.269 ops/s > RawAllocationRate.instanceTest 2048 thrpt 5 9868.685 ? 51.939 ops/s > RawAllocationRate.instanceTest 4096 thrpt 5 9881.944 ? 46.306 ops/s > RawAllocationRate.instanceTest 8192 thrpt 5 13524.791 ? 
69.250 ops/s > RawAllocationRate.instanceTest 16384 thrpt 5 19560.774 ? 109.518 ops/s > RawAllocationRate.instanceTest 65536 thrpt 5 37510.256 ? 15.586 ops/s > RawAllocationRate.instanceTest 131072 thrpt 5 43361.887 ? 181.294 ops/s > RawAllocationRate.instanceTest_C1 32 thrpt 5 2851.135 ? 22.891 ops/s > RawAllocationRate.instanceTest_C1 64 thrpt 5 5476.183 ? 84.376 ops/s > RawAllocationRate.instanceTest_C1 256 thrpt 5 5105.347 ? 35.389 ops/s > RawAllocationRate.instanceTest_C1 1024 thrpt 5 7380.805 ? 3.944 ops/s > RawAllocationRate.instanceTest_C1 2048 thrpt 5 8963.428 ? 83.857 ops/s > RawAllocationRate.instanceTest_C1 4096 thrpt 5 9257.715 ? 52.647 ops/s > RawAllocationRate.instanceTest_C1 8192 thrpt 5 11655.359 ? 70.209 ops/s > RawAllocationRate.instanceTest_C1 16384 thrpt 5 17084.813 ? 91.150 ops/s > RawAllocationRate.instanceTest_C1 65536 thrpt 5 28682.783 ? 176.563 ops/s > RawAllocationRate.instanceTest_C1 131072 thrpt 5 31268.318 ? 221.486 ops/s > > new: > > Benchmark (size) Mode Cnt Score Error Units > RawAllocationRate.arrayTest 32 thrpt 5 5355.477 ? 43.045 ops/s > RawAllocationRate.arrayTest 64 thrpt 5 9825.067 ? 55.493 ops/s > RawAllocationRate.arrayTest 256 thrpt 5 13984.865 ? 125.125 ops/s > RawAllocationRate.arrayTest 1024 thrpt 5 9025.380 ? 48.921 ops/s > RawAllocationRate.arrayTest 2048 thrpt 5 9844.463 ? 6.780 ops/s > RawAllocationRate.arrayTest 4096 thrpt 5 9866.566 ? 48.659 ops/s > RawAllocationRate.arrayTest 8192 thrpt 5 12753.622 ? 67.211 ops/s > RawAllocationRate.arrayTest 16384 thrpt 5 18890.419 ? 14.152 ops/s > RawAllocationRate.arrayTest 65536 thrpt 5 37322.124 ? 269.352 ops/s > RawAllocationRate.arrayTest 131072 thrpt 5 43017.952 ? 204.057 ops/s > RawAllocationRate.arrayTest_C1 32 thrpt 5 3102.221 ? 13.811 ops/s > RawAllocationRate.arrayTest_C1 64 thrpt 5 5947.419 ? 36.408 ops/s > RawAllocationRate.arrayTest_C1 256 thrpt 5 5124.479 ? 548.617 ops/s > RawAllocationRate.arrayTest_C1 1024 thrpt 5 9459.376 ? 
716.317 ops/s > RawAllocationRate.arrayTest_C1 2048 thrpt 5 9840.594 ? 15.922 ops/s > RawAllocationRate.arrayTest_C1 4096 thrpt 5 9860.274 ? 56.088 ops/s > RawAllocationRate.arrayTest_C1 8192 thrpt 5 13677.987 ? 143.048 ops/s > RawAllocationRate.arrayTest_C1 16384 thrpt 5 19517.416 ? 155.004 ops/s > RawAllocationRate.arrayTest_C1 65536 thrpt 5 37348.536 ? 307.582 ops/s > RawAllocationRate.arrayTest_C1 131072 thrpt 5 43414.399 ? 58.317 ops/s > RawAllocationRate.instanceTest 32 thrpt 5 6620.452 ? 137.048 ops/s > RawAllocationRate.instanceTest 64 thrpt 5 9850.677 ? 6.417 ops/s > RawAllocationRate.instanceTest 256 thrpt 5 5533.512 ? 129.334 ops/s > RawAllocationRate.instanceTest 1024 thrpt 5 9829.806 ? 7.555 ops/s > RawAllocationRate.instanceTest 2048 thrpt 5 9857.707 ? 51.541 ops/s > RawAllocationRate.instanceTest 4096 thrpt 5 9957.300 ? 7.115 ops/s > RawAllocationRate.instanceTest 8192 thrpt 5 13662.581 ? 85.225 ops/s > RawAllocationRate.instanceTest 16384 thrpt 5 19571.796 ? 120.962 ops/s > RawAllocationRate.instanceTest 65536 thrpt 5 37401.527 ? 67.260 ops/s > RawAllocationRate.instanceTest 131072 thrpt 5 43327.339 ? 35.077 ops/s > RawAllocationRate.instanceTest_C1 32 thrpt 5 2842.031 ? 47.924 ops/s > RawAllocationRate.instanceTest_C1 64 thrpt 5 5359.357 ? 53.031 ops/s > RawAllocationRate.instanceTest_C1 256 thrpt 5 5081.287 ? 57.737 ops/s > RawAllocationRate.instanceTest_C1 1024 thrpt 5 8372.330 ? 267.016 ops/s > RawAllocationRate.instanceTest_C1 2048 thrpt 5 9470.224 ? 250.706 ops/s > RawAllocationRate.instanceTest_C1 4096 thrpt 5 9843.936 ? 52.825 ops/s > RawAllocationRate.instanceTest_C1 8192 thrpt 5 13695.863 ? 80.433 ops/s > RawAllocationRate.instanceTest_C1 16384 thrpt 5 19495.110 ? 116.300 ops/s > RawAllocationRate.instanceTest_C1 65536 thrpt 5 37448.948 ? 291.917 ops/s > RawAllocationRate.instanceTest_C1 131072 thrpt 5 43443.406 ? 
267.236 ops/s Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Fix editing error restoring r19. Update copyright notices. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4919/files - new: https://git.openjdk.java.net/jdk/pull/4919/files/cd1e83ba..9ed8b7de Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4919&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4919&range=02-03 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/4919.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4919/head:pull/4919 PR: https://git.openjdk.java.net/jdk/pull/4919 From stuefe at openjdk.java.net Fri Jul 30 09:35:28 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 30 Jul 2021 09:35:28 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Fri, 30 Jul 2021 04:03:57 GMT, Kim Barrett wrote: >> src/hotspot/share/services/nmtPreInit.hpp line 128: >> >>> 126: // Returns start of the user data area >>> 127: void* payload() { return this + 1; } >>> 128: const void* payload() const { return this + 1; } >> >> This is an odd looking overload (that I wouldn't have thought possible). Maybe one of these payloads can be renamed to why it's const. > > [Not a review, just a drive-by comment.] This is a normal and idiomatic overload on the const-ness of `this`. To expand on Kim's answer. This is a way to solve const/nonconst problems like https://github.com/openjdk/jdk/pull/4938/files#r679639391. You get a const version (suitably returning a write-protected pointer) which the compiler chooses in const code, and a non-const version for non-const code, and no awkward const-casts are needed from the outside. In this case the implementation is simple enough that I just duplicated it; were it more complex, I'd call one in terms of the other. We do this in other places too, see e.g. 
`ResourceHashTable::lookup_node`. ------------- PR: https://git.openjdk.java.net/jdk/pull/4874 From stuefe at openjdk.java.net Fri Jul 30 09:50:33 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 30 Jul 2021 09:50:33 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Thu, 29 Jul 2021 06:37:36 GMT, David Holmes wrote: > Hi Thomas, > > I had a look through this and it seems reasonable, but I'm not familiar enough with the details to approve at this stage. > > A few nits below. > > Thanks, > David I did not expect a quick review for this one, so thanks for looking at this! All your suggestions make sense, I'll incorporate them. ..Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/4874 From stuefe at openjdk.java.net Fri Jul 30 09:50:34 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 30 Jul 2021 09:50:34 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Fri, 30 Jul 2021 04:09:32 GMT, Kim Barrett wrote: >> src/hotspot/share/services/nmtPreInit.hpp line 166: >> >>> 164: NMTPreInitAllocation** find_entry(const void* p) const { >>> 165: const unsigned index = index_for_key(p); >>> 166: NMTPreInitAllocation** aa = (NMTPreInitAllocation**) (&(_entries[index])); >> >> Why is this cast needed? > > [Not a review, just a drive-by comment.] It's casting away const. Better would be a const_cast. And probably moved to the final result, with the body keeping things const-qualified. And maybe const and non-const overloads of this function. Or maybe this function shouldn't be const-qualified if a non-const result is always needed, but that doesn't seem likely. I'll rethink this. 
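
For illustration only, here is a minimal sketch of what Kim describes: keep the body const-qualified, apply a single `const_cast` to the final result, and offer const and non-const overloads. `Node`, `Table`, `add()` and `contains()` are hypothetical stand-ins, not the actual code in `nmtPreInit.hpp`, and the hash function is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a table entry; not the real NMTPreInitAllocation.
struct Node {
  const void* key;
  Node* next;
};

// Hypothetical chained lookup table keyed by pointer value.
class Table {
  static constexpr unsigned kSize = 8;
  Node* _entries[kSize] = {};

  // Arbitrary illustrative hash: drop low alignment bits, then wrap.
  static unsigned index_for_key(const void* p) {
    return static_cast<unsigned>((reinterpret_cast<std::uintptr_t>(p) >> 3) % kSize);
  }

public:
  // Const overload: everything stays const-qualified, no casts needed.
  // Returns the slot that points at the matching node, or the chain's
  // terminating null slot if the key is absent.
  Node* const* find_entry(const void* p) const {
    Node* const* aa = &_entries[index_for_key(p)];
    while (*aa != nullptr && (*aa)->key != p) {
      aa = &(*aa)->next;
    }
    return aa;
  }

  // Non-const overload: reuse the const body, cast only the final result.
  // This is safe because *this is known to be non-const here.
  Node** find_entry(const void* p) {
    const Table* self = this;
    return const_cast<Node**>(self->find_entry(p));
  }

  bool contains(const void* p) const { return *find_entry(p) != nullptr; }

  void add(Node* n) {
    Node** aa = find_entry(n->key);
    assert(*aa == nullptr && "key already present");
    n->next = nullptr;
    *aa = n;  // link at the end of the chain
  }

  Node* find_and_remove(const void* p) {
    Node** aa = find_entry(p);
    Node* n = *aa;
    if (n != nullptr) {
      *aa = n->next;  // unlink from its chain
      n->next = nullptr;
    }
    return n;
  }
};
```

Const callers get a slot they cannot write through, while the mutating functions pick the non-const overload automatically, so no cast appears at any call site.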
------------- PR: https://git.openjdk.java.net/jdk/pull/4874 From stuefe at openjdk.java.net Fri Jul 30 09:50:33 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 30 Jul 2021 09:50:33 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Thu, 29 Jul 2021 23:03:46 GMT, Coleen Phillimore wrote: > This is interesting, and it seems a better way to solve this problem. Where were you all those years ago?? I hope @zhengyu123 has a chance to review it. Thank you! I was here, but we were not yet doing much upstream :) To be fair, this problem got quite involved and did cost me some cycles and false starts. I fully understand that the first solution uses the environment variable approach. I spent some time investigating different ideas with this one; at first I did not use a hash-table but a static pre-allocated buffer from which I fed early allocations. But the code got too complex, and Kim's suggestion with the side table turned out just to be a lot simpler. > Also interesting is that we were wondering how we could return malloc'd memory on JVM creation failure, and this might partially help with that larger problem. Yes, this would be trivial now, to return that memory. Though I am afraid it would be a small part only. But NMT may be instrumental in releasing all memory, since it knows everything. Only, it's not always enabled. Is that a real-life problem? Are there cases where the launcher would want to live on if the JVM failed to load? ------------- PR: https://git.openjdk.java.net/jdk/pull/4874 From zgu at openjdk.java.net Fri Jul 30 13:06:32 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Fri, 30 Jul 2021 13:06:32 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Thu, 22 Jul 2021 14:58:47 GMT, Thomas Stuefe wrote: > Short: this patch makes NMT available in custom-launcher scenarios and during gtests. It simplifies NMT initialization. 
It adds a lot of NMT-specific testing, cleans them up and makes them sideeffect-free. > > --------- > > NMT continues to be an extremely useful tool for SAP to tackle memory problems in the JVM. > > However, NMT is of limited use due to the following restrictions: > > - NMT cannot be used if the hotspot is embedded into a custom launcher unless the launcher actively cooperates. Just creating and invoking the JVM is not enough, it needs to do some steps prior to loading the hotspot. This limitation is not well known (nor, do I believe, documented). Many products don't do this, e.g., you cannot use NMT with IntelliJ. For us at SAP this problem limits NMT usefulness greatly since our VMs are often embedded into custom launchers and modifying every launcher is impossible. > - Worse, if that custom launcher links the libjvm *statically* there is just no way to activate NMT at all. This is the reason NMT cannot be used in the `gtestlauncher`. > - Related to that is that we cannot pass NMT options via `JAVA_TOOL_OPTIONS` and `-XX:Flags=`. > - The fact that NMT cannot be used in gtests is really a pity since it would allow us to both test NMT itself more rigorously and check for memory leaks while testing other stuff. > > The reason for all this is that NMT initialization happens very early, on the first call to `os::malloc()`. And those calls happen already during dynamic C++ initialization - a long time before the VM gets around parsing arguments. So, regular VM argument parsing is too late to parse NMT arguments. > > The current solution is to pass NMT arguments via a specially prepared environment variable: `NMT_LEVEL_=`. That environment variable has to be set by the embedding launcher, before it loads the libjvm. Since its name contains the PID, we cannot even set that variable in the shell before starting the launcher. 
> > All that means that every launcher needs to specially parse and process the NMT arguments given at the command line (or via whatever method) and prepare the environment variable. `java` itself does this. This only works before the libjvm.so is loaded, before its dynamic C++ initialization. For that reason, it does not work if the launcher links statically against the hotspot, since in that case C++ initialization of the launcher and hotspot are folded into one phase with no possibility of executing code beforehand. > > And since it bypasses argument handling in the VM, it bypasses a number of argument-processing mechanisms, e.g., `JAVA_TOOL_OPTIONS`. > > ------ > > This patch fixes these shortcomings by making NMT late-initializable: it can now be initialized after normal VM argument parsing, like all other parts of the VM. This greatly simplifies NMT initialization and makes it work automagically for every third party launcher, as well as within our gtests. > > The glaring problem with late-initializing NMT is the NMT malloc headers. If we rule out just always having them (unacceptable in terms of memory overhead), there is no safe way to determine, in os::free(), if an allocation came from before or after NMT initialization ran, and therefore what to do with its malloc headers. For a more extensive explanation, please see the comment block in `nmtPreInit.hpp` and the discussion with @kimbarrett and @zhengyu123 in the JBS comment section. > > The heart of this patch is a new way to track early, pre-NMT-init allocations. These are tracked via a lookup table. This was a suggestion by Kim and it worked out well. > > Changes in detail: > > - pre-NMT-init handling: > - the new files `nmtPreInit.hpp/cpp` take care of NMT pre-init handling. They contain a small global lookup table managing C-heap blocks allocated in the pre-NMT-init phase. > - `os::malloc()/os::realloc()/os::free()` defer to this code before doing anything else. 
> - Please see the extensive comment block at the start of `nmtPreInit.hpp` explaining the details. > > - Changes to NMT: > - Before, NMT initialization was spread over two phases, `initialize()` and `late_initialize()`. Those were merged into one and simplified - there is only one initialization now which happens after argument parsing. > - Minor changes were needed for the `NMT_TrackingLevel` enum - to simplify code, I changed NMT_unknown to be numerically 0. A new comment block in `nmtCommon.hpp` now clearly specifies what's what, including allowed level state transitions. > - New utility functions to translate tracking level from/to strings added to `NMTUtil` > - NMT has never been able to handle virtual memory allocations before initialization, which is fine since os::reserve_memory() is not called before the VM parses arguments. We now assert that. > - All code outside the VM handling NMT initialization (e.g. libjli) has been removed, as has the code testing it. > > - Gtests: > - Some existing gtests had to be modified: before, they all changed global state (turning NMT on/off) before testing. This is not allowed anymore, to keep NMT simple. Also, this pattern disturbed other tests. > - The new way to test is to passively check whether NMT has been switched on or off, and do tests accordingly: if on, full tests, if off, test just what makes sense in off-state. That does not disturb neighboring tests and actually gives us better coverage all around. > - It is now possible to start the gtestlauncher with NMT on! Which additionally gives us good coverage. > - To actually do gtests with NMT - since it's disabled by default - we now run NMT-enabled gtests as part of the hotspot jtreg NMT wrapper. We have used this pattern for a number of other facilities, see all the tests in test/hotspot/jtreg/gtest.. . It works very well. > - Finally, a new gtest has been written to test the NMT preinit lookup map in isolation, placed in `gtest/nmt/test_nmtpreinitmap.cpp`. 
> > - jtreg: > - A new test has been added, `runtime/NMT/NMTInitializationTest.java`, testing NMT initialization in the face of many many VM arguments. > > ------------- > > Tests: > - ran all new tests manually on 64-bit and 32-bit Linux > - GHAs > - The patch has been active in SAP's test systems for a while now. Sorry for the late review. Did a quick scan and have a few questions, will do a more detailed reading later. src/hotspot/share/services/nmtPreInit.hpp line 108: > 106: // - lookup speed is paramount since lookup is done for every os::free() call. > 107: // - insert/delete speed only matters for VM startup - after NMT initialization the lookup > 108: // table is readonly This comment does not seem to be true: you have find_and_remove(), which alters the table. src/hotspot/share/services/nmtPreInit.hpp line 202: > 200: assert((*aa) != NULL, "Entry not found: " PTR_FORMAT, p2i(p)); > 201: NMTPreInitAllocation* a = (*aa); > 202: (*aa) = (*aa)->next; // remove from its list Could this be a race? There is no synchronization, so the read/write result could be arbitrary. src/hotspot/share/services/nmtPreInit.hpp line 309: > 307: ::memcpy(p_new, a->payload(), MIN2(a->size, new_size)); > 308: (*rc) = p_new; > 309: _num_reallocs_pre_to_post++; post-NMT-init counter updates are racy. ------------- PR: https://git.openjdk.java.net/jdk/pull/4874 From adinn at openjdk.java.net Fri Jul 30 13:29:33 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Fri, 30 Jul 2021 13:29:33 GMT Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v4] In-Reply-To: References: Message-ID: On Fri, 30 Jul 2021 09:06:06 GMT, Andrew Haley wrote: >> C1 has its own code generators for zeroing words. We should use the same logic for C1 and C2, which should give us better C1 performance and result in less code to maintain. >> >> This is one of those patches that's a great joy to write, because it consists mainly of deletions. 
The code I've added is mostly adapters to allow the C1 code to use the memory-zeroing logic written originally for C2. This means we have less code, but also that VM configuration options (e.g. `BlockZeroingLowLimit`) work with C1 and C2 in the same way. >> >> Measuring the performance of memory allocation is quite tricky, so I've written a JMH test case that measures the raw allocation rate of the JVM for various object sizes. This is inevitably rather noisy because it combines the effects of both the allocation code and other GC-related pauses. Nonetheless, it's a useful sanity check. >> >> The performance differences between old and new are mostly in the noise, but with large allocations the advantage of `DC ZVA` becomes apparent: >> >> old: >> >> RawAllocationRate.arrayTest_C1 8192 thrpt 5 11220.314 ? 336.878 ops/s >> RawAllocationRate.arrayTest_C1 16384 thrpt 5 16655.815 ? 88.577 ops/s >> RawAllocationRate.arrayTest_C1 65536 thrpt 5 28302.661 ? 155.513 ops/s >> RawAllocationRate.arrayTest_C1 131072 thrpt 5 31434.868 ? 211.768 ops/s >> >> new: >> >> RawAllocationRate.arrayTest_C1 8192 thrpt 5 13677.987 ? 143.048 ops/s >> RawAllocationRate.arrayTest_C1 16384 thrpt 5 19517.416 ? 155.004 ops/s >> RawAllocationRate.arrayTest_C1 65536 thrpt 5 37348.536 ? 307.582 ops/s >> RawAllocationRate.arrayTest_C1 131072 thrpt 5 43414.399 ? 58.317 ops/s >> >> >> Full test results, Graviton 2 (i.e. Neoverse N1). Units are megabytes per second, >> object sizes are in bytes: >> >> >> old: >> >> Benchmark (size) Mode Cnt Score Error Units >> RawAllocationRate.arrayTest 32 thrpt 5 5092.798 ? 20.879 ops/s >> RawAllocationRate.arrayTest 64 thrpt 5 9821.608 ? 6.250 ops/s >> RawAllocationRate.arrayTest 256 thrpt 5 14117.192 ? 72.720 ops/s >> RawAllocationRate.arrayTest 1024 thrpt 5 9090.514 ? 40.239 ops/s >> RawAllocationRate.arrayTest 2048 thrpt 5 9842.503 ? 52.744 ops/s >> RawAllocationRate.arrayTest 4096 thrpt 5 9866.179 ? 
6.332 ops/s >> RawAllocationRate.arrayTest 8192 thrpt 5 12836.968 ? 14.143 ops/s >> RawAllocationRate.arrayTest 16384 thrpt 5 18970.307 ? 96.903 ops/s >> RawAllocationRate.arrayTest 65536 thrpt 5 36709.095 ? 38.256 ops/s >> RawAllocationRate.arrayTest 131072 thrpt 5 43055.263 ? 60.808 ops/s >> RawAllocationRate.arrayTest_C1 32 thrpt 5 3045.285 ? 23.128 ops/s >> RawAllocationRate.arrayTest_C1 64 thrpt 5 5774.157 ? 52.472 ops/s >> RawAllocationRate.arrayTest_C1 256 thrpt 5 4720.713 ? 9.419 ops/s >> RawAllocationRate.arrayTest_C1 1024 thrpt 5 7457.880 ? 806.208 ops/s >> RawAllocationRate.arrayTest_C1 2048 thrpt 5 8155.046 ? 194.153 ops/s >> RawAllocationRate.arrayTest_C1 4096 thrpt 5 8364.379 ? 127.661 ops/s >> RawAllocationRate.arrayTest_C1 8192 thrpt 5 11220.314 ? 336.878 ops/s >> RawAllocationRate.arrayTest_C1 16384 thrpt 5 16655.815 ? 88.577 ops/s >> RawAllocationRate.arrayTest_C1 65536 thrpt 5 28302.661 ? 155.513 ops/s >> RawAllocationRate.arrayTest_C1 131072 thrpt 5 31434.868 ? 211.768 ops/s >> RawAllocationRate.instanceTest 32 thrpt 5 6667.433 ? 50.031 ops/s >> RawAllocationRate.instanceTest 64 thrpt 5 10669.876 ? 72.109 ops/s >> RawAllocationRate.instanceTest 256 thrpt 5 5483.582 ? 336.743 ops/s >> RawAllocationRate.instanceTest 1024 thrpt 5 9740.872 ? 6.269 ops/s >> RawAllocationRate.instanceTest 2048 thrpt 5 9868.685 ? 51.939 ops/s >> RawAllocationRate.instanceTest 4096 thrpt 5 9881.944 ? 46.306 ops/s >> RawAllocationRate.instanceTest 8192 thrpt 5 13524.791 ? 69.250 ops/s >> RawAllocationRate.instanceTest 16384 thrpt 5 19560.774 ? 109.518 ops/s >> RawAllocationRate.instanceTest 65536 thrpt 5 37510.256 ? 15.586 ops/s >> RawAllocationRate.instanceTest 131072 thrpt 5 43361.887 ? 181.294 ops/s >> RawAllocationRate.instanceTest_C1 32 thrpt 5 2851.135 ? 22.891 ops/s >> RawAllocationRate.instanceTest_C1 64 thrpt 5 5476.183 ? 84.376 ops/s >> RawAllocationRate.instanceTest_C1 256 thrpt 5 5105.347 ? 
35.389 ops/s >> RawAllocationRate.instanceTest_C1 1024 thrpt 5 7380.805 ? 3.944 ops/s >> RawAllocationRate.instanceTest_C1 2048 thrpt 5 8963.428 ? 83.857 ops/s >> RawAllocationRate.instanceTest_C1 4096 thrpt 5 9257.715 ? 52.647 ops/s >> RawAllocationRate.instanceTest_C1 8192 thrpt 5 11655.359 ? 70.209 ops/s >> RawAllocationRate.instanceTest_C1 16384 thrpt 5 17084.813 ? 91.150 ops/s >> RawAllocationRate.instanceTest_C1 65536 thrpt 5 28682.783 ? 176.563 ops/s >> RawAllocationRate.instanceTest_C1 131072 thrpt 5 31268.318 ? 221.486 ops/s >> >> new: >> >> Benchmark (size) Mode Cnt Score Error Units >> RawAllocationRate.arrayTest 32 thrpt 5 5355.477 ? 43.045 ops/s >> RawAllocationRate.arrayTest 64 thrpt 5 9825.067 ? 55.493 ops/s >> RawAllocationRate.arrayTest 256 thrpt 5 13984.865 ? 125.125 ops/s >> RawAllocationRate.arrayTest 1024 thrpt 5 9025.380 ? 48.921 ops/s >> RawAllocationRate.arrayTest 2048 thrpt 5 9844.463 ? 6.780 ops/s >> RawAllocationRate.arrayTest 4096 thrpt 5 9866.566 ? 48.659 ops/s >> RawAllocationRate.arrayTest 8192 thrpt 5 12753.622 ? 67.211 ops/s >> RawAllocationRate.arrayTest 16384 thrpt 5 18890.419 ? 14.152 ops/s >> RawAllocationRate.arrayTest 65536 thrpt 5 37322.124 ? 269.352 ops/s >> RawAllocationRate.arrayTest 131072 thrpt 5 43017.952 ? 204.057 ops/s >> RawAllocationRate.arrayTest_C1 32 thrpt 5 3102.221 ? 13.811 ops/s >> RawAllocationRate.arrayTest_C1 64 thrpt 5 5947.419 ? 36.408 ops/s >> RawAllocationRate.arrayTest_C1 256 thrpt 5 5124.479 ? 548.617 ops/s >> RawAllocationRate.arrayTest_C1 1024 thrpt 5 9459.376 ? 716.317 ops/s >> RawAllocationRate.arrayTest_C1 2048 thrpt 5 9840.594 ? 15.922 ops/s >> RawAllocationRate.arrayTest_C1 4096 thrpt 5 9860.274 ? 56.088 ops/s >> RawAllocationRate.arrayTest_C1 8192 thrpt 5 13677.987 ? 143.048 ops/s >> RawAllocationRate.arrayTest_C1 16384 thrpt 5 19517.416 ? 155.004 ops/s >> RawAllocationRate.arrayTest_C1 65536 thrpt 5 37348.536 ? 307.582 ops/s >> RawAllocationRate.arrayTest_C1 131072 thrpt 5 43414.399 ? 
58.317 ops/s >> RawAllocationRate.instanceTest 32 thrpt 5 6620.452 ? 137.048 ops/s >> RawAllocationRate.instanceTest 64 thrpt 5 9850.677 ? 6.417 ops/s >> RawAllocationRate.instanceTest 256 thrpt 5 5533.512 ? 129.334 ops/s >> RawAllocationRate.instanceTest 1024 thrpt 5 9829.806 ? 7.555 ops/s >> RawAllocationRate.instanceTest 2048 thrpt 5 9857.707 ? 51.541 ops/s >> RawAllocationRate.instanceTest 4096 thrpt 5 9957.300 ? 7.115 ops/s >> RawAllocationRate.instanceTest 8192 thrpt 5 13662.581 ? 85.225 ops/s >> RawAllocationRate.instanceTest 16384 thrpt 5 19571.796 ? 120.962 ops/s >> RawAllocationRate.instanceTest 65536 thrpt 5 37401.527 ? 67.260 ops/s >> RawAllocationRate.instanceTest 131072 thrpt 5 43327.339 ? 35.077 ops/s >> RawAllocationRate.instanceTest_C1 32 thrpt 5 2842.031 ? 47.924 ops/s >> RawAllocationRate.instanceTest_C1 64 thrpt 5 5359.357 ? 53.031 ops/s >> RawAllocationRate.instanceTest_C1 256 thrpt 5 5081.287 ? 57.737 ops/s >> RawAllocationRate.instanceTest_C1 1024 thrpt 5 8372.330 ? 267.016 ops/s >> RawAllocationRate.instanceTest_C1 2048 thrpt 5 9470.224 ? 250.706 ops/s >> RawAllocationRate.instanceTest_C1 4096 thrpt 5 9843.936 ? 52.825 ops/s >> RawAllocationRate.instanceTest_C1 8192 thrpt 5 13695.863 ? 80.433 ops/s >> RawAllocationRate.instanceTest_C1 16384 thrpt 5 19495.110 ? 116.300 ops/s >> RawAllocationRate.instanceTest_C1 65536 thrpt 5 37448.948 ? 291.917 ops/s >> RawAllocationRate.instanceTest_C1 131072 thrpt 5 43443.406 ? 267.236 ops/s > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix editing error restoring r19. Update copyright notices. This looks good to me ------------- Marked as reviewed by adinn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4919 From coleenp at openjdk.java.net Fri Jul 30 13:42:53 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 30 Jul 2021 13:42:53 GMT Subject: RFR: 8271506: Add ResourceHashtable support for deleting selected entries [v2] In-Reply-To: References: Message-ID: > The ResourceHashtable doesn't have a way to delete selected entries based on criteria such as their class having been unloaded, which is needed to replace Hashtable with ResourceHashtable. > The Nodes of the ResourceHashtable have a Key and Value of template types. > template > class ResourceHashtableNode : public ResourceObj { > public: > unsigned _hash; > K _key; > V _value; > ... > But there's no destructor, so ~K and ~V are not called (if I understand C++ correctly). > > When instantiated with a value that's not a pointer, calling code does this: > > SourceObjInfo src_info(ref, read_only, follow_mode); > bool created; > SourceObjInfo* p = _src_obj_table.put_if_absent(src_obj, src_info, &created); > > So if SourceObjInfo has a destructor, it'll have to have a careful assignment operator so that the value copied into the hashtable doesn't get deleted. > > In this patch, I assign the responsibility of deleting the Key and Value of the hashtable to the do_entry function, because it's simple. If we want to use more advanced unreadable C++ code, someone will have to suggest an alternate set of changes, because my C++ is not up to this. > > Tested with tier1-3, gtest, and upcoming patch for JDK-8048190. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Make unlink non-const because it's not. 
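
As a rough illustration of the idea in this RFR (selectively removing entries from a chained hash table through a closure whose `do_entry` decides what goes), here is a minimal, self-contained sketch. It is not the `ResourceHashtable` code from the patch: `TinyHashtable`, `RemoveOddKeys` and the hashing scheme are hypothetical, nodes are allocated with plain `new`/`delete` instead of being `ResourceObj`s, and `do_entry` here only returns a verdict rather than cleaning up the key and value itself.

```cpp
#include <cassert>

// Hypothetical, heavily simplified chained hash table for illustration.
template <typename K, typename V, unsigned SIZE = 16>
class TinyHashtable {
  struct Node {
    K key;
    V value;
    Node* next;
  };
  Node* _table[SIZE] = {};
  int _entries = 0;

  // Assumption for the sketch: keys convert to unsigned (e.g. small ints).
  static unsigned index_for(const K& key) {
    return static_cast<unsigned>(key) % SIZE;
  }

public:
  TinyHashtable() = default;
  TinyHashtable(const TinyHashtable&) = delete;
  TinyHashtable& operator=(const TinyHashtable&) = delete;

  ~TinyHashtable() {
    for (unsigned i = 0; i < SIZE; i++) {
      while (_table[i] != nullptr) {
        Node* node = _table[i];
        _table[i] = node->next;
        delete node;
      }
    }
  }

  // Prepends to the bucket chain; no duplicate check in this sketch.
  void put(const K& key, const V& value) {
    Node** bucket = &_table[index_for(key)];
    *bucket = new Node{key, value, *bucket};
    _entries++;
  }

  V* get(const K& key) {
    for (Node* n = _table[index_for(key)]; n != nullptr; n = n->next) {
      if (n->key == key) return &n->value;
    }
    return nullptr;
  }

  int number_of_entries() const { return _entries; }

  // ITER must provide: bool do_entry(const K& key, V& value);
  // If do_entry returns true, the entry is unlinked and deleted.
  // Non-const, because it modifies the table.
  template <class ITER>
  void unlink(ITER* iter) {
    for (unsigned i = 0; i < SIZE; i++) {
      Node** ptr = &_table[i];
      while (*ptr != nullptr) {
        Node* node = *ptr;
        if (iter->do_entry(node->key, node->value)) {
          *ptr = node->next;  // bypass the node, keep ptr on the same slot
          delete node;
          _entries--;
        } else {
          ptr = &node->next;  // advance to the next slot in the chain
        }
      }
    }
  }
};

// Example closure: asks the table to drop every entry with an odd key.
struct RemoveOddKeys {
  bool do_entry(const int& key, int& /*value*/) { return (key % 2) != 0; }
};
```

Walking the chain through a pointer to the link slot (`Node**`) means removal of the first, a middle, or the last node is handled by the same two assignments.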

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4938/files
  - new: https://git.openjdk.java.net/jdk/pull/4938/files/d3e23f7a..999405db

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4938&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4938&range=00-01

  Stats: 9 lines in 1 file changed: 5 ins; 1 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4938.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4938/head:pull/4938

PR: https://git.openjdk.java.net/jdk/pull/4938

From coleenp at openjdk.java.net Fri Jul 30 13:42:54 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 30 Jul 2021 13:42:54 GMT
Subject: RFR: 8271506: Add ResourceHashtable support for deleting selected entries [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 30 Jul 2021 04:29:42 GMT, Ioi Lam wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Make unlink non-const because it's not.
>
> src/hotspot/share/utilities/resourceHash.hpp line 219:
>
>> 217:   // the entry is deleted.
>> 218:   template
>> 219:   void unlink(ITER* iter) const {
>
> `unlink()` shouldn't be `const` as it modifies the table. However, you probably needed to declare it as such because it calls `bucket_at()`, which is `const`. And the reason that `bucket_at()` is `const` is because it's called by `iterate()`, which is `const`.
>
> The use of `const` and `const_cast` in this class is getting a bit out of whack. I have created [JDK-8271525](https://bugs.openjdk.java.net/browse/JDK-8271525) with a preliminary fix here: https://github.com/openjdk/jdk/pull/4942
>
> Maybe we should do that before this PR? it will also make the `const_cast` unnecessary for `decrement_entries()`.

I agree with @tstuefe in your draft PR. I don't think get should be non-const. I haven't been able to parse through the monstrosity cast of lookup_node to know why and whether it should be changed.
I made unlink non-const and made the changes that he suggested there, and that fixed the cast that I needed in order to decrement the entries.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4938

From stuefe at openjdk.java.net Fri Jul 30 14:36:42 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 30 Jul 2021 14:36:42 GMT
Subject: RFR: 8271525: ResourceHashtableBase::iterate() should not declared as const
In-Reply-To:
References:
Message-ID:

On Fri, 30 Jul 2021 04:16:59 GMT, Ioi Lam wrote:

> `ResourceHashtableBase::iterate()` is declared `const`, but it can actually change the contents of the table. The same is true for `ResourceHashtableBase::get()`, which returns a non-`const` pointer to the value, allowing the caller to modify it.
>
> We should declare these two functions as non-`const`. This will also remove a lot of ugly `const_cast<>` code.
>
> The `iterate()` API is tightened such that the `do_entry()` function can modify the `value` but not the `key`.

I think this is not an improvement, sorry. Now, a number of methods became non-const, including `iterate`, and that non-constness propagates. `iterate` itself does not change the table, and it can be reasonably called from const contexts. There are actually a number of callers which arguably should be const but aren't (e.g. `ClassLoaderStatsClosure::print()`).

I think a better approach would be to have both const and non-const versions, e.g. for `bucket_at()`:

    Node** bucket_at(unsigned index) ...
    const Node* const * bucket_at(unsigned index) const ...

This way, the compiler chooses the const version in const code, giving you the benefit of a read-only V*, and in non-const code you get access to the innards of V*. No awkward const casts needed.

Above code makes Coleen's new unlink() from https://github.com/openjdk/jdk/pull/4938 compile in non-const form.

Note how `ResourceHashTable::lookup_node()` already follows this pattern. You deleted the non-const version, but it made sense.

Same with `::get`:

    const V* get(K const& key) const;
    V* get(K const& key);

Making the Key references const is fine though.

..Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/4942

From coleenp at openjdk.java.net Fri Jul 30 14:36:42 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 30 Jul 2021 14:36:42 GMT
Subject: RFR: 8271525: ResourceHashtableBase::iterate() should not declared as const
In-Reply-To:
References:
Message-ID:

On Fri, 30 Jul 2021 09:26:00 GMT, Thomas Stuefe wrote:

>> `ResourceHashtableBase::iterate()` is declared `const`, but it can actually change the contents of the table. The same is true for `ResourceHashtableBase::get()`, which returns a non-`const` pointer to the value, allowing the caller to modify it.
>>
>> We should declare these two functions as non-`const`. This will also remove a lot of ugly `const_cast<>` code.
>>
>> The `iterate()` API is tightened such that the `do_entry()` function can modify the `value` but not the `key`.
>
> I think this is not an improvement, sorry. Now, a number of methods became non-const, including `iterate`, and that non-constness propagates. `iterate` itself does not change the table, and it can be reasonably called from const contexts. There are actually a number of callers which arguably should be const but aren't (e.g. `ClassLoaderStatsClosure::print()`).
>
> I think a better approach would be to have both const and non-const versions, e.g. for `bucket_at()`:
>
>
>     Node** bucket_at(unsigned index) ...
>     const Node* const * bucket_at(unsigned index) const ...
>
>
> This way, the compiler chooses the const version in const code, giving you the benefit of a read-only V*, and in non-const code you get access to the innards of V*. No awkward const casts needed.
>
> Above code makes Coleen's new unlink() from https://github.com/openjdk/jdk/pull/4938 compile in non-const form.
>
> Note how `ResourceHashTable::lookup_node()` already follows this pattern. You deleted the non-const version, but it made sense.
>
> Same with `::get`:
>
>
>     const V* get(K const& key) const;
>     V* get(K const& key);
>
>
> Making the Key references const is fine though.
>
> ..Thomas

@tstuefe I was not paying attention when C++ added overloading based on constness of return type so that's an interesting change, but also, does this get rid of the monstrosity:

    return const_cast(
      const_cast(this)->lookup_node(hash, key));

?

-------------

PR: https://git.openjdk.java.net/jdk/pull/4942

From iklam at openjdk.java.net Fri Jul 30 14:36:42 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Fri, 30 Jul 2021 14:36:42 GMT
Subject: RFR: 8271525: ResourceHashtableBase::iterate() should not declared as const
In-Reply-To:
References:
Message-ID:

On Fri, 30 Jul 2021 04:16:59 GMT, Ioi Lam wrote:

> `ResourceHashtableBase::iterate()` is declared `const`, but it can actually change the contents of the table. The same is true for `ResourceHashtableBase::get()`, which returns a non-`const` pointer to the value, allowing the caller to modify it.
>
> We should declare these two functions as non-`const`. This will also remove a lot of ugly `const_cast<>` code.
>
> The `iterate()` API is tightened such that the `do_entry()` function can modify the `value` but not the `key`.

> I think this is not an improvement, sorry. Now, a number of methods became non-const, including `iterate`, and that non-constness propagates. `iterate` itself does not change the table, and it can be reasonably called from const contexts. There are actually a number of callers which arguably should be const but aren't (e.g. `ClassLoaderStatsClosure::print()`).
>
> I think a better approach would be to have both const and non-const versions, e.g. for `bucket_at()`:
>
> ```
> Node** bucket_at(unsigned index) ...
> const Node* const * bucket_at(unsigned index) const ...
> ```
>
> This way, the compiler chooses the const version in const code, giving you the benefit of a read-only V*, and in non-const code you get access to the innards of V*. No awkward const casts needed.
>
> Above code makes Coleen's new unlink() from #4938 compile in non-const form.
>
> Note how `ResourceHashTable::lookup_node()` already follows this pattern. You deleted the non-const version, but it made sense.
>
> Same with `::get`:
>
> ```
> const V* get(K const& key) const;
> V* get(K const& key);
> ```
>
> Making the Key references const is fine though.

Hi Thomas,

I've added `const V* get(K const& key) const;` as you suggested.

However, I don't think `iterate()` should be const, because we have code that actually modifies the table's contents:

https://github.com/openjdk/jdk/blob/77fbd99f792c42bb92a240d38f35e3af25500f99/src/hotspot/share/logging/logAsyncWriter.cpp#L98-L107

I'll try to refactor the code to have a `const_iterate()` which calls `ITER::do_entry(const&K, const &V)`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4942

From iklam at openjdk.java.net Fri Jul 30 14:36:42 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Fri, 30 Jul 2021 14:36:42 GMT
Subject: RFR: 8271525: ResourceHashtableBase::iterate() should not declared as const
Message-ID:

`ResourceHashtableBase::iterate()` is declared `const`, but it can actually change the contents of the table. The same is true for `ResourceHashtableBase::get()`, which returns a non-`const` pointer to the value, allowing the caller to modify it.

We should declare these two functions as non-`const`. This will also remove a lot of ugly `const_cast<>` code.

The `iterate()` API is tightened such that the `do_entry()` function can modify the `value` but not the `key`.
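
[Editorial sketch] The const/non-const overload pair proposed in this thread for `get()` and `lookup_node()` can be illustrated with a small stand-alone example. Everything below is an assumption made for illustration: `TinyTable`, `index_for`, and `demo_overloads` are hypothetical names, keys are plain `int`, and none of this is the code in `resourceHash.hpp`. Note that the compiler selects between the two overloads by the const qualification of the object the member function is called on, not by the return type.

```cpp
#include <cassert>

// Hypothetical toy table for illustration only; this is not HotSpot's
// ResourceHashtable. Fixed bucket count, int keys, no removal.
template <typename V>
class TinyTable {
  struct Node {
    int   _key;
    V     _value;
    Node* _next;
  };

  static constexpr unsigned SIZE = 8;
  Node* _buckets[SIZE] = {};

  static unsigned index_for(int key) {
    return static_cast<unsigned>(key) % SIZE;
  }

  // The lookup is written once, as a const member function.
  const Node* lookup_node(int key) const {
    for (const Node* n = _buckets[index_for(key)]; n != nullptr; n = n->_next) {
      if (n->_key == key) return n;
    }
    return nullptr;
  }

  // The non-const overload forwards to the const one and casts the result
  // back. The cast is well-defined because the node belongs to an object
  // that was reached through a non-const `this`.
  Node* lookup_node(int key) {
    return const_cast<Node*>(
        static_cast<const TinyTable*>(this)->lookup_node(key));
  }

 public:
  TinyTable() = default;
  TinyTable(const TinyTable&) = delete;
  TinyTable& operator=(const TinyTable&) = delete;

  ~TinyTable() {
    for (unsigned i = 0; i < SIZE; i++) {
      while (_buckets[i] != nullptr) {
        Node* n = _buckets[i];
        _buckets[i] = n->_next;
        delete n;
      }
    }
  }

  void put(int key, const V& value) {
    if (Node* n = lookup_node(key)) {
      n->_value = value;  // overwrite an existing entry
      return;
    }
    const unsigned i = index_for(key);
    _buckets[i] = new Node{key, value, _buckets[i]};
  }

  // Chosen when the table is const: the caller gets a read-only pointer.
  const V* get(int key) const {
    const Node* n = lookup_node(key);
    return n != nullptr ? &n->_value : nullptr;
  }

  // Chosen when the table is non-const: the caller may modify the value.
  V* get(int key) {
    Node* n = lookup_node(key);
    return n != nullptr ? &n->_value : nullptr;
  }
};

// Reads through a const reference, so the const overload of get() is used.
int read_or_default(const TinyTable<int>& table, int key, int fallback) {
  const int* v = table.get(key);
  return v != nullptr ? *v : fallback;
}

int demo_overloads() {
  TinyTable<int> table;
  table.put(1, 10);
  *table.get(1) += 5;                     // non-const overload, mutable V*
  return read_or_default(table, 1, -1);   // const overload, read-only V*
}
```

In this sketch the only cast lives in the private non-const `lookup_node()`; both public `get()` overloads are cast-free, which is roughly the trade-off the thread is weighing.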

-------------

Commit messages:
 - added const_iterate()
 - added const version of get()
 - fixed spacing
 - fixed typos
 - also make the get() function non-const, since it returns a pointer to the value, which the caller can modify
 - step2 -- make iterate() a non-const function
 - step1 -- keep iterate() as a const function, but clean up the rest of the code

Changes: https://git.openjdk.java.net/jdk/pull/4942/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4942&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8271525
  Stats: 60 lines in 6 files changed: 29 ins; 4 del; 27 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4942.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4942/head:pull/4942

PR: https://git.openjdk.java.net/jdk/pull/4942

From coleenp at openjdk.java.net Fri Jul 30 14:36:43 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 30 Jul 2021 14:36:43 GMT
Subject: RFR: 8271525: ResourceHashtableBase::iterate() should not declared as const
In-Reply-To:
References:
Message-ID:

On Fri, 30 Jul 2021 04:16:59 GMT, Ioi Lam wrote:

> `ResourceHashtableBase::iterate()` is declared `const`, but it can actually change the contents of the table. The same is true for `ResourceHashtableBase::get()`, which returns a non-`const` pointer to the value, allowing the caller to modify it.
>
> We should declare these two functions as non-`const`. This will also remove a lot of ugly `const_cast<>` code.
>
> The `iterate()` API is tightened such that the `do_entry()` function can modify the `value` but not the `key`.

src/hotspot/share/utilities/resourceHash.hpp line 204:

> 202:       Node* node = *bucket;
> 203:       while (node != NULL) {
> 204:         bool cont = iter->do_entry(const_cast(node->_key), node->_value);

It seems to me that iterate should be const because it doesn't change the table. Whether or not it changes the pointers in the Nodes is not part of the contract, since they could have been declared non-const.
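
[Editorial sketch] The disagreement above turns on the fact that `const` on a member function is shallow in C++: inside a `const` member, a pointer member becomes a const pointer, but the object it points to stays mutable. The following minimal example shows a `const` `iterate()` that still lets a closure change stored values. `Chain`, `Doubler`, `Summer`, and `sum_after_doubling` are hypothetical names invented for this sketch and do not come from HotSpot.

```cpp
#include <cassert>

// Hypothetical miniature of one hash bucket chain; illustration only.
struct Node {
  int   _key;
  int   _value;
  Node* _next;
};

class Chain {
  Node* _head = nullptr;

 public:
  Chain() = default;
  Chain(const Chain&) = delete;
  Chain& operator=(const Chain&) = delete;

  ~Chain() {
    while (_head != nullptr) {
      Node* n = _head;
      _head = n->_next;
      delete n;
    }
  }

  void push(int key, int value) { _head = new Node{key, value, _head}; }

  // Inside this const member function, _head has type `Node* const`:
  // the pointer cannot be reseated, but the Node it points to is not const,
  // so node->_value can still be handed out as a mutable reference.
  template <typename ITER>
  void iterate(ITER* iter) const {
    for (Node* node = _head; node != nullptr; node = node->_next) {
      if (!iter->do_entry(node->_key, node->_value)) break;
    }
  }
};

// A closure that modifies values while iterating.
struct Doubler {
  bool do_entry(const int&, int& value) {
    value *= 2;
    return true;
  }
};

// A closure that only reads.
struct Summer {
  int total = 0;
  bool do_entry(const int&, const int& value) {
    total += value;
    return true;
  }
};

int sum_after_doubling() {
  Chain chain;
  chain.push(1, 10);
  chain.push(2, 20);

  const Chain& view = chain;  // only const access from here on
  Doubler doubler;
  view.iterate(&doubler);     // compiles: the chain structure is untouched
  Summer summer;
  view.iterate(&summer);
  return summer.total;        // 20 + 40
}
```

Whether value mutation through a `const` table should count as "changing the table" is exactly the design question being debated; the language itself permits either reading.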

-------------

PR: https://git.openjdk.java.net/jdk/pull/4942

From aph at openjdk.java.net Fri Jul 30 15:22:02 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Fri, 30 Jul 2021 15:22:02 GMT
Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v5]
In-Reply-To:
References:
Message-ID:

> C1 has its own code generators for zeroing words. We should use the same logic for C1 and C2, which should give us better C1 performance and result in less code to maintain.
>
> This is one of those patches that's a great joy to write, because it consists mainly of deletions. The code I've added is mostly adapters to allow the C1 code to use the memory-zeroing logic written originally for C2. This means we have less code, but also that VM configuration options (e.g. `BlockZeroingLowLimit`) work with C1 and C2 in the same way.
>
> Measuring the performance of memory allocation is quite tricky, so I've written a JMH test case that measures the raw allocation rate of the JVM for various object sizes. This is inevitably rather noisy because it combines the effects of both the allocation code and other GC-related pauses. Nonetheless, it's a useful sanity check.
>
> The performance differences between old and new are mostly in the noise, but with large allocations the advantage of `DC ZVA` becomes apparent:
>
> old:
>
> RawAllocationRate.arrayTest_C1       8192  thrpt    5  11220.314 ± 336.878  ops/s
> RawAllocationRate.arrayTest_C1      16384  thrpt    5  16655.815 ±  88.577  ops/s
> RawAllocationRate.arrayTest_C1      65536  thrpt    5  28302.661 ± 155.513  ops/s
> RawAllocationRate.arrayTest_C1     131072  thrpt    5  31434.868 ± 211.768  ops/s
>
> new:
>
> RawAllocationRate.arrayTest_C1       8192  thrpt    5  13677.987 ± 143.048  ops/s
> RawAllocationRate.arrayTest_C1      16384  thrpt    5  19517.416 ± 155.004  ops/s
> RawAllocationRate.arrayTest_C1      65536  thrpt    5  37348.536 ± 307.582  ops/s
> RawAllocationRate.arrayTest_C1     131072  thrpt    5  43414.399 ±  58.317  ops/s
>
>
> Full test results, Graviton 2 (i.e. Neoverse N1). Units are megabytes per second,
> object sizes are in bytes:
>
>
> old:
>
> Benchmark                          (size)   Mode  Cnt      Score     Error  Units
> RawAllocationRate.arrayTest            32  thrpt    5   5092.798 ±  20.879  ops/s
> RawAllocationRate.arrayTest            64  thrpt    5   9821.608 ±   6.250  ops/s
> RawAllocationRate.arrayTest           256  thrpt    5  14117.192 ±  72.720  ops/s
> RawAllocationRate.arrayTest          1024  thrpt    5   9090.514 ±  40.239  ops/s
> RawAllocationRate.arrayTest          2048  thrpt    5   9842.503 ±  52.744  ops/s
> RawAllocationRate.arrayTest          4096  thrpt    5   9866.179 ±   6.332  ops/s
> RawAllocationRate.arrayTest          8192  thrpt    5  12836.968 ±  14.143  ops/s
> RawAllocationRate.arrayTest         16384  thrpt    5  18970.307 ±  96.903  ops/s
> RawAllocationRate.arrayTest         65536  thrpt    5  36709.095 ±  38.256  ops/s
> RawAllocationRate.arrayTest        131072  thrpt    5  43055.263 ±  60.808  ops/s
> RawAllocationRate.arrayTest_C1         32  thrpt    5   3045.285 ±  23.128  ops/s
> RawAllocationRate.arrayTest_C1         64  thrpt    5   5774.157 ±  52.472  ops/s
> RawAllocationRate.arrayTest_C1        256  thrpt    5   4720.713 ±   9.419  ops/s
> RawAllocationRate.arrayTest_C1       1024  thrpt    5   7457.880 ± 806.208  ops/s
> RawAllocationRate.arrayTest_C1       2048  thrpt    5   8155.046 ± 194.153  ops/s
> RawAllocationRate.arrayTest_C1       4096  thrpt    5   8364.379 ± 127.661  ops/s
> RawAllocationRate.arrayTest_C1       8192  thrpt    5  11220.314 ± 336.878  ops/s
> RawAllocationRate.arrayTest_C1      16384  thrpt    5  16655.815 ±  88.577  ops/s
> RawAllocationRate.arrayTest_C1      65536  thrpt    5  28302.661 ± 155.513  ops/s
> RawAllocationRate.arrayTest_C1     131072  thrpt    5  31434.868 ± 211.768  ops/s
> RawAllocationRate.instanceTest         32  thrpt    5   6667.433 ±  50.031  ops/s
> RawAllocationRate.instanceTest         64  thrpt    5  10669.876 ±  72.109  ops/s
> RawAllocationRate.instanceTest        256  thrpt    5   5483.582 ± 336.743  ops/s
> RawAllocationRate.instanceTest       1024  thrpt    5   9740.872 ±   6.269  ops/s
> RawAllocationRate.instanceTest       2048  thrpt    5   9868.685 ±  51.939  ops/s
> RawAllocationRate.instanceTest       4096  thrpt    5   9881.944 ±  46.306  ops/s
> RawAllocationRate.instanceTest       8192  thrpt    5  13524.791 ±  69.250  ops/s
> RawAllocationRate.instanceTest      16384  thrpt    5  19560.774 ± 109.518  ops/s
> RawAllocationRate.instanceTest      65536  thrpt    5  37510.256 ±  15.586  ops/s
> RawAllocationRate.instanceTest     131072  thrpt    5  43361.887 ± 181.294  ops/s
> RawAllocationRate.instanceTest_C1      32  thrpt    5   2851.135 ±  22.891  ops/s
> RawAllocationRate.instanceTest_C1      64  thrpt    5   5476.183 ±  84.376  ops/s
> RawAllocationRate.instanceTest_C1     256  thrpt    5   5105.347 ±  35.389  ops/s
> RawAllocationRate.instanceTest_C1    1024  thrpt    5   7380.805 ±   3.944  ops/s
> RawAllocationRate.instanceTest_C1    2048  thrpt    5   8963.428 ±  83.857  ops/s
> RawAllocationRate.instanceTest_C1    4096  thrpt    5   9257.715 ±  52.647  ops/s
> RawAllocationRate.instanceTest_C1    8192  thrpt    5  11655.359 ±  70.209  ops/s
> RawAllocationRate.instanceTest_C1   16384  thrpt    5  17084.813 ±  91.150  ops/s
> RawAllocationRate.instanceTest_C1   65536  thrpt    5  28682.783 ± 176.563  ops/s
> RawAllocationRate.instanceTest_C1  131072  thrpt    5  31268.318 ± 221.486  ops/s
>
> new:
>
> Benchmark                          (size)   Mode  Cnt      Score     Error  Units
> RawAllocationRate.arrayTest            32  thrpt    5   5355.477 ±  43.045  ops/s
> RawAllocationRate.arrayTest            64  thrpt    5   9825.067 ±  55.493  ops/s
> RawAllocationRate.arrayTest           256  thrpt    5  13984.865 ± 125.125  ops/s
> RawAllocationRate.arrayTest          1024  thrpt    5   9025.380 ±  48.921  ops/s
> RawAllocationRate.arrayTest          2048  thrpt    5   9844.463 ±   6.780  ops/s
> RawAllocationRate.arrayTest          4096  thrpt    5   9866.566 ±  48.659  ops/s
> RawAllocationRate.arrayTest          8192  thrpt    5  12753.622 ±  67.211  ops/s
> RawAllocationRate.arrayTest         16384  thrpt    5  18890.419 ±  14.152  ops/s
> RawAllocationRate.arrayTest         65536  thrpt    5  37322.124 ± 269.352  ops/s
> RawAllocationRate.arrayTest        131072  thrpt    5  43017.952 ± 204.057  ops/s
> RawAllocationRate.arrayTest_C1         32  thrpt    5   3102.221 ±  13.811  ops/s
> RawAllocationRate.arrayTest_C1         64  thrpt    5   5947.419 ±  36.408  ops/s
> RawAllocationRate.arrayTest_C1        256  thrpt    5   5124.479 ± 548.617  ops/s
> RawAllocationRate.arrayTest_C1       1024  thrpt    5   9459.376 ± 716.317  ops/s
> RawAllocationRate.arrayTest_C1       2048  thrpt    5   9840.594 ±  15.922  ops/s
> RawAllocationRate.arrayTest_C1       4096  thrpt    5   9860.274 ±  56.088  ops/s
> RawAllocationRate.arrayTest_C1       8192  thrpt    5  13677.987 ± 143.048  ops/s
> RawAllocationRate.arrayTest_C1      16384  thrpt    5  19517.416 ± 155.004  ops/s
> RawAllocationRate.arrayTest_C1      65536  thrpt    5  37348.536 ± 307.582  ops/s
> RawAllocationRate.arrayTest_C1     131072  thrpt    5  43414.399 ±  58.317  ops/s
> RawAllocationRate.instanceTest         32  thrpt    5   6620.452 ± 137.048  ops/s
> RawAllocationRate.instanceTest         64  thrpt    5   9850.677 ±   6.417  ops/s
> RawAllocationRate.instanceTest        256  thrpt    5   5533.512 ± 129.334  ops/s
> RawAllocationRate.instanceTest       1024  thrpt    5   9829.806 ±   7.555  ops/s
> RawAllocationRate.instanceTest       2048  thrpt    5   9857.707 ±  51.541  ops/s
> RawAllocationRate.instanceTest       4096  thrpt    5   9957.300 ±   7.115  ops/s
> RawAllocationRate.instanceTest       8192  thrpt    5  13662.581 ±  85.225  ops/s
> RawAllocationRate.instanceTest      16384  thrpt    5  19571.796 ± 120.962  ops/s
> RawAllocationRate.instanceTest      65536  thrpt    5  37401.527 ±  67.260  ops/s
> RawAllocationRate.instanceTest     131072  thrpt    5  43327.339 ±  35.077  ops/s
> RawAllocationRate.instanceTest_C1      32  thrpt    5   2842.031 ±  47.924  ops/s
> RawAllocationRate.instanceTest_C1      64  thrpt    5   5359.357 ±  53.031  ops/s
> RawAllocationRate.instanceTest_C1     256  thrpt    5   5081.287 ±  57.737  ops/s
> RawAllocationRate.instanceTest_C1    1024  thrpt    5   8372.330 ± 267.016  ops/s
> RawAllocationRate.instanceTest_C1    2048  thrpt    5   9470.224 ± 250.706  ops/s
> RawAllocationRate.instanceTest_C1    4096  thrpt    5   9843.936 ±  52.825  ops/s
> RawAllocationRate.instanceTest_C1    8192  thrpt    5  13695.863 ±  80.433  ops/s
> RawAllocationRate.instanceTest_C1   16384  thrpt    5  19495.110 ± 116.300  ops/s
> RawAllocationRate.instanceTest_C1   65536  thrpt    5  37448.948 ± 291.917  ops/s
> RawAllocationRate.instanceTest_C1  131072  thrpt    5  43443.406 ± 267.236  ops/s

Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:

  Guard ciEnv::current() check with a test to ensure this is a compiler thread.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4919/files
  - new: https://git.openjdk.java.net/jdk/pull/4919/files/9ed8b7de..e0f00afa

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4919&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4919&range=03-04

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4919.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4919/head:pull/4919

PR: https://git.openjdk.java.net/jdk/pull/4919

From iignatyev at openjdk.java.net Fri Jul 30 15:36:35 2021
From: iignatyev at openjdk.java.net (Igor Ignatyev)
Date: Fri, 30 Jul 2021 15:36:35 GMT
Subject: [jdk17] RFR: 8067223: [TESTBUG] Rename Whitebox API package
In-Reply-To: <_3viZFs2iZ00hMdeP3Nq9gTmwfIHeTzZ7NVT8thQ1BM=.10005882-f135-4592-a663-b75747da3f6c@github.com>
References: <39PNb3wa-43yB5IFR3XroqFf4vMNZPitzF85lO1Gw58=.f2587741-d2c4-467d-b722-b17269587e7a@github.com> <_3viZFs2iZ00hMdeP3Nq9gTmwfIHeTzZ7NVT8thQ1BM=.10005882-f135-4592-a663-b75747da3f6c@github.com>
Message-ID:

On Thu, 29 Jul 2021 01:30:37 GMT, Vladimir Kozlov wrote:

>> Hi all,
>>
>> could you please review this big tedious and trivial(-ish) patch which moves `sun.hotspot.WhiteBox` and related classes to `jdk.test.whitebox` package?
>>
>> the majority of the patch is the following substitutions:
>>  - `s~sun/hotspot/WhiteBox~jdk/test/whitebox/WhiteBox~g`
>>  - `s/sun.hotspot.parser/jdk.test.whitebox.parser/g`
>>  - `s/sun.hotspot.cpuinfo/jdk.test.whitebox.cpuinfo/g`
>>  - `s/sun.hotspot.code/jdk.test.whitebox.code/g`
>>  - `s/sun.hotspot.gc/jdk.test.whitebox.gc/g`
>>  - `s/sun.hotspot.WhiteBox/jdk.test.whitebox.WhiteBox/g`
>>
>> testing: tier1-4
>>
>> Thanks,
>> -- Igor
>
> I know that test fixes could be pushed during RDP2 without approval.
> But this one is very big and it is not a fix - it is an enhancement.
>
> What is the reason you want to push it into JDK 17 just a few days before the first Release Candidate? Instead of pushing it into JDK 18.
>
> And I can't even review it because GitHub UI hangs on these big changes.

@vnkozlov, @dholmes-ora, Thank you for looking at this!

I want this to be integrated into JDK 17 b/c some "external" libraries use (used to use) WhiteBox API, e.g. jcstress[[2]] used WhiteBox API to deoptimize compiled methods[[3]], and it would be easier for maintainers of such libraries to condition package name based on JDK major version. Also, given JDK 17 is an LTS, it would be beneficial for everyone not to have big differences in test bases b/w it and the mainline.

According to JEP 3, test RFEs are allowed until the very end and don't require late enhancement approval: "Enhancements to tests and documentation during RDP 1 and RDP 2 do not require approval, as long as the relevant issues are identified with a noreg-self or noreg-doc label, as appropriate"[[1]]. So, process-wise, I don't see any issues w/ integrating this RFE, yet, I fully agree that due to its size, this patch can be disruptive and can cause massive failures, which is something we obviously don't want at the current stage of JDK 17.

I like David's idea about phasing this clean-up, and, due to the reasons described above, I would like to integrate the first part, copying WhiteBox classes to the new package structure and associated changes w/o updating all the tests, into JDK 17 and update the tests on the mainline (w/ backporting into jdk17u). WDYT?

Cheers,
-- Igor

[1]: https://openjdk.java.net/jeps/3#Late-Enhancement-Request-Process
[2]: https://github.com/openjdk/jcstress
[3]: https://github.com/openjdk/jcstress/blob/df83b446f187ae0b0fa31fa54decb59db9e955da/jcstress-core/src/main/java/org/openjdk/jcstress/vm/WhiteBoxSupport.java

-------------

PR: https://git.openjdk.java.net/jdk17/pull/290

From adinn at openjdk.java.net Fri Jul 30 15:55:33 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Fri, 30 Jul 2021 15:55:33 GMT
Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v5]
In-Reply-To:
References:
Message-ID:

On Fri, 30 Jul 2021 15:22:02 GMT, Andrew Haley wrote:

>> C1 has its own code generators for zeroing words. We should use the same logic for C1 and C2, which should give us better C1 performance and result in less code to maintain.
>>
>> This is one of those patches that's a great joy to write, because it consists mainly of deletions. The code I've added is mostly adapters to allow the C1 code to use the memory-zeroing logic written originally for C2. This means we have less code, but also that VM configuration options (e.g. `BlockZeroingLowLimit`) work with C1 and C2 in the same way.
>>
>> Measuring the performance of memory allocation is quite tricky, so I've written a JMH test case that measures the raw allocation rate of the JVM for various object sizes. This is inevitably rather noisy because it combines the effects of both the allocation code and other GC-related pauses. Nonetheless, it's a useful sanity check.
>>
>> The performance differences between old and new are mostly in the noise, but with large allocations the advantage of `DC ZVA` becomes apparent:
>>
>> old:
>>
>> RawAllocationRate.arrayTest_C1       8192  thrpt    5  11220.314 ± 336.878  ops/s
>> RawAllocationRate.arrayTest_C1      16384  thrpt    5  16655.815 ±  88.577  ops/s
>> RawAllocationRate.arrayTest_C1      65536  thrpt    5  28302.661 ± 155.513  ops/s
>> RawAllocationRate.arrayTest_C1     131072  thrpt    5  31434.868 ± 211.768  ops/s
>>
>> new:
>>
>> RawAllocationRate.arrayTest_C1       8192  thrpt    5  13677.987 ± 143.048  ops/s
>> RawAllocationRate.arrayTest_C1      16384  thrpt    5  19517.416 ± 155.004  ops/s
>> RawAllocationRate.arrayTest_C1      65536  thrpt    5  37348.536 ± 307.582  ops/s
>> RawAllocationRate.arrayTest_C1     131072  thrpt    5  43414.399 ±  58.317  ops/s
>>
>>
>> Full test results, Graviton 2 (i.e. Neoverse N1). Units are megabytes per second,
>> object sizes are in bytes:
>>
>> [...]
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
>   Guard ciEnv::current() check with a test to ensure this is a compiler thread.

My initial reaction to that extra last commit was that anything other than a compiler thread calling MacroAssembler::zero_words implies "Something is rotten in the state of Denmark" and the only end game is a stage strewn with corpses.

Of course, zero_words gets called from the stub generator. However, I guess the stub complete check avoids any problems there. Were you thinking of some other potential use for it? e.g. lazy intrinsics?

Anyway, the check certainly does no harm so it is all still good.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4919

From stuefe at openjdk.java.net Fri Jul 30 16:13:29 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 30 Jul 2021 16:13:29 GMT
Subject: RFR: 8271525: ResourceHashtableBase::iterate() should not declared as const
In-Reply-To:
References:
Message-ID:

On Fri, 30 Jul 2021 04:16:59 GMT, Ioi Lam wrote:

> `ResourceHashtableBase::iterate()` is declared `const`, but it can actually change the contents of the table. The same is true for `ResourceHashtableBase::get()`, which returns a non-`const` pointer to the value, allowing the caller to modify it.
>
> We should declare these two functions as non-`const`. This will also remove a lot of ugly `const_cast<>` code.
>
> The `iterate()` API is tightened such that the `do_entry()` function can modify the `value` but not the `key`.

Hi Ioi,

>
> Hi Thomas,
>
> I've added `const V* get(K const& key) const;` as you suggested.
>
> However, I don't think `iterate()` should be const, because we have code that actually modifies the table's contents:
>

I disagree; iterate does not modify the table. Calling it from a const context (e.g. to print the table) should be possible.

> https://github.com/openjdk/jdk/blob/77fbd99f792c42bb92a240d38f35e3af25500f99/src/hotspot/share/logging/logAsyncWriter.cpp#L98-L107
>
> I'll try to refactor the code to have a `const_iterate()` which calls `ITER::do_entry(const&K, const &V)`.

If ITER were not a template parameter but a real functor, this would make sense since we could say:

    void iterate(iterator& it);
    void iterate(const_iterator& it) const;

but with ITER being a template parameter, I don't know of a way to enforce the use of only const ITERs with a const_iterate. Therefore I am not sure a const_iterate would give us that much.
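
[Editorial sketch] One possible way to approach the enforcement question raised above, offered only as an assumption-laden illustration: if the `const` overload of a templated `iterate()` hands keys and values to `do_entry()` as `const` references, a closure whose `do_entry()` takes `V&` simply fails to compile when used on a const table, without needing a separate functor base class. `FlatTable`, `Bumper`, `Summer`, `accepts_const_entry`, and `bump_then_sum` are hypothetical names for this sketch; it uses a flat array rather than hash buckets and is not the HotSpot implementation.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical fixed-capacity table; illustration only.
template <typename K, typename V, unsigned N>
class FlatTable {
  K _keys[N] = {};
  V _values[N] = {};
  unsigned _count = 0;

 public:
  bool put(const K& key, const V& value) {
    if (_count == N) return false;
    _keys[_count] = key;
    _values[_count] = value;
    ++_count;
    return true;
  }

  // Non-const overload: do_entry sees (const K&, V&) and may modify values.
  template <typename ITER>
  void iterate(ITER* iter) {
    for (unsigned i = 0; i < _count; i++) {
      const K& key = _keys[i];
      if (!iter->do_entry(key, _values[i])) break;
    }
  }

  // Const overload: inside a const member the arrays are const, so do_entry
  // can only bind (const K&, const V&). A mutating closure does not compile.
  template <typename ITER>
  void iterate(ITER* iter) const {
    for (unsigned i = 0; i < _count; i++) {
      if (!iter->do_entry(_keys[i], _values[i])) break;
    }
  }
};

// Compile-time probe: can ITER::do_entry be called with const K&, const V&?
template <typename ITER, typename K, typename V, typename = void>
struct accepts_const_entry : std::false_type {};

template <typename ITER, typename K, typename V>
struct accepts_const_entry<
    ITER, K, V,
    decltype(void(std::declval<ITER&>().do_entry(std::declval<const K&>(),
                                                 std::declval<const V&>())))>
    : std::true_type {};

struct Summer {  // read-only closure
  int total = 0;
  bool do_entry(const int&, const int& value) {
    total += value;
    return true;
  }
};

struct Bumper {  // mutating closure
  bool do_entry(const int&, int& value) {
    ++value;
    return true;
  }
};

static_assert(accepts_const_entry<Summer, int, int>::value,
              "a read-only closure is usable with the const overload");
static_assert(!accepts_const_entry<Bumper, int, int>::value,
              "a mutating closure is rejected by the const overload");

int bump_then_sum() {
  FlatTable<int, int, 4> table;
  table.put(1, 10);
  table.put(2, 20);

  Bumper bumper;
  table.iterate(&bumper);  // non-const overload: values become 11 and 21

  const FlatTable<int, int, 4>& view = table;
  Summer summer;
  view.iterate(&summer);   // const overload
  // view.iterate(&bumper); // would not compile: int& cannot bind to const int
  return summer.total;
}
```

This relies on the storage itself becoming const inside the const overload. With pointer-linked nodes, as in a chained hash table, the const overload would have to add the `const` explicitly when passing the node's value to `do_entry()`.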
------------- PR: https://git.openjdk.java.net/jdk/pull/4942 From stuefe at openjdk.java.net Fri Jul 30 16:22:37 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 30 Jul 2021 16:22:37 GMT Subject: RFR: 8271525: ResourceHashtableBase::iterate() should not declared as const In-Reply-To: References: Message-ID: <2Tb21gNwyf6TJFSITnDupFddCyPWJKJoPK9j9rEqIPU=.317dec9f-1068-4b27-9311-a070ec1a487d@github.com> On Fri, 30 Jul 2021 16:10:29 GMT, Thomas Stuefe wrote: >> `ResourceHashtableBase::iterate()` is declared `const`, but it can actually change the contents of the table. The same is true for `ResourceHashtableBase::get()`, which returns a non-`const` pointer to the value, allowing the caller to modify it. >> >> We should declare these two functions as non-`const`. This will also remove a lot of ugly `const_cast<>` code. >> >> The `iterate()` API is tightened such that the `do_entry()` function can modify the `value` but not the `key`. > > Hi Ioi, > >> >> Hi Thomas, >> >> I've added `const V* get(K const& key) const;` as you suggested. >> >> However, I don't think `iterate()` should be const, because we have code that actually modifies the table's contents: >> > > I disagree; iterate does not modify the table. Calling it from a const context (eg. to print the table) should be possible. > >> https://github.com/openjdk/jdk/blob/77fbd99f792c42bb92a240d38f35e3af25500f99/src/hotspot/share/logging/logAsyncWriter.cpp#L98-L107 >> >> I'll try to refactor the code to have a `const_iterate()` which calls `ITER::do_entry(const&K, const &V)`. > > If ITER were not a template parameter but a real functor, this would make sense since we could say: > > > void iterate(iterator& it); > void iterate(const_iterator& it) const; > > > but with ITER being a template parameter, I don't know of a way to enforce the use of only const ITERs with a const_iterate. Therefore I am not sure a const_iterate would give us that much. 
> @tstuefe I was not paying attention when C++ added overloading based on constness of return type so that's an interesting change, but also, does this get rid of the monstrosity: > > ``` > return const_cast( > const_cast(this)->lookup_node(hash, key)); > ``` > > ? No, sorry, I think this monstrosity is the price one pays for const overloading a method but sharing the implementation. Plus doing all casts in the C++ correct very verbose fashion instead of the old boring C-style way. That said, in the above code, I am not 100% sure the cast for the return value is even needed. Since we go from non-const pointer to const pointer, should that not work without a cast? ------------- PR: https://git.openjdk.java.net/jdk/pull/4942 From stuefe at openjdk.java.net Fri Jul 30 16:31:29 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 30 Jul 2021 16:31:29 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Fri, 30 Jul 2021 12:56:59 GMT, Zhengyu Gu wrote: >> Short: this patch makes NMT available in custom-launcher scenarios and during gtests. It simplifies NMT initialization. It adds a lot of NMT-specific testing, cleans them up and makes them sideeffect-free. >> >> --------- >> >> NMT continues to be an extremely useful tool for SAP to tackle memory problems in the JVM. >> >> However, NMT is of limited use due to the following restrictions: >> >> - NMT cannot be used if the hotspot is embedded into a custom launcher unless the launcher actively cooperates. Just creating and invoking the JVM is not enough, it needs to do some steps prior to loading the hotspot. This limitation is not well known (nor, do I believe, documented). Many products don't do this, e.g., you cannot use NMT with IntelliJ. For us at SAP this problem limits NMT usefulness greatly since our VMs are often embedded into custom launchers and modifying every launcher is impossible. 
>> - Worse, if that custom launcher links the libjvm *statically* there is just no way to activate NMT at all. This is the reason NMT cannot be used in the `gtestlauncher`. >> - Related to that is that we cannot pass NMT options via `JAVA_TOOL_OPTIONS` and `-XX:Flags=`. >> - The fact that NMT cannot be used in gtests is really a pity since it would allow us to both test NMT itself more rigorously and check for memory leaks while testing other stuff. >> >> The reason for all this is that NMT initialization happens very early, on the first call to `os::malloc()`. And those calls happen already during dynamic C++ initialization - a long time before the VM gets around parsing arguments. So, regular VM argument parsing is too late to parse NMT arguments. >> >> The current solution is to pass NMT arguments via a specially prepared environment variable: `NMT_LEVEL_=`. That environment variable has to be set by the embedding launcher, before it loads the libjvm. Since its name contains the PID, we cannot even set that variable in the shell before starting the launcher. >> >> All that means that every launcher needs to especially parse and process the NMT arguments given at the command line (or via whatever method) and prepare the environment variable. `java` itself does this. This only works before the libjvm.so is loaded, before its dynamic C++ initialization. For that reason, it does not work if the launcher links statically against the hotspot, since in that case C++ initialization of the launcher and hotspot are folded into one phase with no possibility of executing code beforehand. >> >> And since it bypasses argument handling in the VM, it bypasses a number of argument processing ways, e.g., `JAVA_TOOL_OPTIONS`. >> >> ------ >> >> This patch fixes these shortcomings by making NMT late-initializable: it can now be initialized after normal VM argument parsing, like all other parts of the VM. 
This greatly simplifies NMT initialization and makes it work automagically for every third party launcher, as well as within our gtests. >> >> The glaring problem with late-initializing NMT is the NMT malloc headers. If we rule out just always having them (unacceptable in terms of memory overhead), there is no safe way to determine, in os::free(), if an allocation came from before or after NMT initialization ran, and therefore what to do with its malloc headers. For a more extensive explanation, please see the comment block in `nmtPreInit.hpp` and the discussion with @kimbarrett and @zhengyu123 in the JBS comment section. >> >> The heart of this patch is a new way to track early, pre-NMT-init allocations. These are tracked via a lookup table. This was a suggestion by Kim and it worked out well. >> >> Changes in detail: >> >> - pre-NMT-init handling: >> - the new files `nmtPreInit.hpp/cpp` take care of NMT pre-init handling. They contain a small global lookup table managing C-heap blocks allocated in the pre-NMT-init phase. >> - `os::malloc()/os::realloc()/os::free()` defer to this code before doing anything else. >> - Please see the extensive comment block at the start of `nmtPreInit.hpp` explaining the details. >> >> - Changes to NMT: >> - Before, NMT initialization was spread over two phases, `initialize()` and `late_initialize()`. Those were merged into one and simplified - there is only one initialization now which happens after argument parsing. >> - Minor changes were needed for the `NMT_TrackingLevel` enum - to simplify code, I changed NMT_unknown to be numerically 0. A new comment block in `nmtCommon.hpp` now clearly specifies what's what, including allowed level state transitions. >> - New utility functions to translate tracking level from/to strings added to `NMTUtil` >> - NMT has never been able to handle virtual memory allocations before initialization, which is fine since os::reserve_memory() is not called before the VM parses arguments. We now assert that. 
>> - All code outside the VM handling NMT initialization (eg. libjli) has been removed, as has the code testing it. >> >> - Gtests: >> - Some existing gtests had to be modified: before, they all changed global state (turning NMT on/off) before testing. This is not allowed anymore, to keep NMT simple. Also, this pattern disturbed other tests. >> - The new way to test is to passively check whether NMT has been switched on or off, and do tests accordingly: if on, full tests, if off, test just what makes sense in off-state. That does not disturb neighboring tests and actually gives us better coverage all around. >> - It is now possible to start the gtestlauncher with NMT on! Which additionally gives us good coverage. >> - To actually do gtests with NMT - since it's disabled by default - we now run NMT-enabled gtests as part of the hotspot jtreg NMT wrapper. We have used this pattern for a number of other facilities, see all the tests in test/hotspot/jtreg/gtest.. . It works very well. >> - Finally, a new gtest has been written to test the NMT preinit lookup map in isolation, placed in `gtest/nmt/test_nmtpreinitmap.cpp`. >> >> - jtreg: >> - A new test has been added, `runtime/NMT/NMTInitializationTest.java`, testing NMT initialization in the face of many many VM arguments. >> >> ------------- >> >> Tests: >> - ran manually all new tests on 64-bit and 32-bit Linux >> - GHAs >> - The patch has been active in SAP's test systems for a while now. > > src/hotspot/share/services/nmtPreInit.hpp line 202: > >> 200: assert((*aa) != NULL, "Entry not found: " PTR_FORMAT, p2i(p)); >> 201: NMTPreInitAllocation* a = (*aa); >> 202: (*aa) = (*aa)->next; // remove from its list > > Could this be a race? There is no synchronization, read/write result could be arbitrary. The code is implicitly thread-safe because the hashmap is only modified in the pre-NMT-init phase. After NMT initialization, the table is read-only. 
During pre-NMT-init we are effectively single-threaded - at most two threads access the map, the thread loading the libjvm, and the thread calling CreateJavaVM, but not at the same time. See also the asserts in the AllStatic class `NMTPreInit`. (I should have described it more clearly, will add a comment.) ------------- PR: https://git.openjdk.java.net/jdk/pull/4874 From stuefe at openjdk.java.net Fri Jul 30 16:39:31 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 30 Jul 2021 16:39:31 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Fri, 30 Jul 2021 13:03:32 GMT, Zhengyu Gu wrote: > Sorry for late review. > > Did a quick scan and have a few questions, will do more detail reading later. Thanks a lot, I appreciate your feedback! > src/hotspot/share/services/nmtPreInit.hpp line 108: > >> 106: // - lookup speed is paramount since lookup is done for every os::free() call. >> 107: // - insert/delete speed only matters for VM startup - after NMT initialization the lookup >> 108: // table is readonly > > This comment does not seem to be true, you have find_and_remove() that alters table. The point is *after NMT initialization* - `find_and_remove` only gets called before NMT initialization; after that, we only do non-modifying lookup. You'll find the logic in `NMTPreInit::handle_realloc()` and `NMTPreInit::handle_free()`, respectively. The basic idea behind this is that we remove pointers from the map as long as we can without risking concurrency issues, which is until NMT initialization. After that, we leave the map alone. It was either that or protect the map with a lock, and this is the lesser of two evils since the map is usually sparsely populated. > src/hotspot/share/services/nmtPreInit.hpp line 309: > >> 307: ::memcpy(p_new, a->payload(), MIN2(a->size, new_size)); >> 308: (*rc) = p_new; >> 309: _num_reallocs_pre_to_post++; > > post-NMT-init counter updates are racy. True, this is racy. 
It's just for diagnostics though - I rather remove them than make them atomic since we would pay for this with every malloc. Or, maybe atomic + debug only? ------------- PR: https://git.openjdk.java.net/jdk/pull/4874 From zgu at openjdk.java.net Fri Jul 30 16:39:31 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Fri, 30 Jul 2021 16:39:31 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Fri, 30 Jul 2021 16:28:54 GMT, Thomas Stuefe wrote: >> src/hotspot/share/services/nmtPreInit.hpp line 202: >> >>> 200: assert((*aa) != NULL, "Entry not found: " PTR_FORMAT, p2i(p)); >>> 201: NMTPreInitAllocation* a = (*aa); >>> 202: (*aa) = (*aa)->next; // remove from its list >> >> Could this be a race? There is no synchronization, read/write result could be arbitrary. > > The code is implicitly thread-safe because the hashmap is only modified in the pre-NMT-init phase. After NMT initialization, the table is read-only. During pre-NMT-init we are effectively single-threaded - at most two threads access the map, the thread loading the libjvm, and the thread calling CreateJavaVM, but not at the same time. > > See also the asserts in the AllStatic class `NMTPreInit`. > > (I should have described it more clearly, will add a comment.) So, you are saying that there is no memory that is malloc'd pre-NMT-init phase and freed post-NMT-init phase? ------------- PR: https://git.openjdk.java.net/jdk/pull/4874 From stuefe at openjdk.java.net Fri Jul 30 16:45:34 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 30 Jul 2021 16:45:34 GMT Subject: RFR: 8271242: Add Arena regression tests In-Reply-To: <_chJbiP524b_Xm5Ir_gtmnRKX00FuJfVmSfwVhC0PoM=.ec823b0f-65d9-4bc1-92ec-8be13d06cf3b@github.com> References: <_chJbiP524b_Xm5Ir_gtmnRKX00FuJfVmSfwVhC0PoM=.ec823b0f-65d9-4bc1-92ec-8be13d06cf3b@github.com> Message-ID: On Thu, 29 Jul 2021 21:39:48 GMT, Coleen Phillimore wrote: > Looks good. Thank you for writing these tests. 
Thank you Coleen! > test/hotspot/gtest/memory/test_arena.cpp line 65: > >> 63: } >> 64: // Allocate again. The new allocations should have the same position as the 0-sized >> 65: // first one. > > It seems so dangerous to allow zero sized Amalloc. I agree, but I wanted these tests to test existing behavior. I tried to do a quick "lets assert > 0" and that backfired, a bit of cleanup would be needed at the caller side. ------------- PR: https://git.openjdk.java.net/jdk/pull/4909 From stuefe at openjdk.java.net Fri Jul 30 16:45:35 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 30 Jul 2021 16:45:35 GMT Subject: Integrated: 8271242: Add Arena regression tests In-Reply-To: References: Message-ID: On Tue, 27 Jul 2021 06:06:40 GMT, Thomas Stuefe wrote: > May I please have reviews for these test additions. These are new regression tests for hotspot arenas. We don't have any and it makes sense to have them since this code is fragile and we work on it. > > It also contains some new gtest utility functions which I will use to consolidate some more test coding, mainly in the metaspace gtests (in a future rfe). > > It also comes with a new jtreg gtest launcher for arena tests to test UseMallocOnly mode. As long as we support that, we should test it. > > --- > > tests: > - gtests, manually with 32/64 bit and with/without UseMallocOnly > - GHAs This pull request has now been integrated. 
Changeset: cd7e30ef Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/cd7e30ef84165722c2128471231b6000b1c46fb8 Stats: 465 lines in 4 files changed: 465 ins; 0 del; 0 mod 8271242: Add Arena regression tests Reviewed-by: mseledtsov, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/4909 From aph at redhat.com Fri Jul 30 17:05:25 2021 From: aph at redhat.com (Andrew Haley) Date: Fri, 30 Jul 2021 18:05:25 +0100 Subject: RFR: 8270947: AArch64: C1: use zero_words to initialize all objects [v5] In-Reply-To: References: Message-ID: <3677dee4-e621-b517-435a-d43f942c1a7f@redhat.com> On 7/30/21 4:55 PM, Andrew Dinn wrote: > My initial reaction to that extra last commit was that anything other than a compiler thread calling MacroAssembler::zero_words implies "Something is rotten in the state of Danmark" and the only end game is a stage strewn with corpses. Of course, zero_words gets called from the stub generator. However, I guess the stub complete check avoids any problems there. > > Were you thinking of some other potential use for it? e.g. lazy intrinsics? > > Anyway, the check certainly does no harm so it is all still good. I *think* it's probably unnecessary. However, we do lazily generate IC buffers from Java code, and we use the assembler to do that. So in general it is not against any rules to use the MacroAssembler from non-compiler threads. It is certainly the case that you can't call ciEnv::current() unless you're on a compiler thread without risking Severe Badness, because current() blindly does this: static CompilerThread* current() { => return CompilerThread::cast(JavaThread::current()); } static CompilerThread* cast(Thread* t) { assert(t->is_Compiler_thread(), "incorrect cast to CompilerThread"); return static_cast(t); } And therefore ciEnv::current()->task() is Undefined Behaviour if not on a CompilerThread. (This is pretty awful, and I should submit a patch to perhaps add ciEnv::current_orNull(). But not in this patch.) 
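The guard discussed here can be illustrated with a small, self-contained sketch. Everything in it is a simplified stand-in: `Thread`, `CompilerThread`, `g_current_thread`, `current_compiler_thread_or_null()` and `current_task_id_or()` are hypothetical names chosen for the example and do not reproduce the HotSpot classes or `ciEnv`. The sketch only shows the general pattern of checking the thread kind before downcasting, as opposed to an unconditional cast.

```cpp
#include <cassert>

// Simplified stand-ins for a thread hierarchy; NOT the HotSpot classes.
class Thread {
 public:
  virtual ~Thread() {}
  virtual bool is_compiler_thread() const { return false; }
};

class CompilerThread : public Thread {
 public:
  explicit CompilerThread(int task_id) : _task_id(task_id) {}
  bool is_compiler_thread() const override { return true; }
  int task_id() const { return _task_id; }

 private:
  int _task_id;
};

// Hypothetical "current thread" slot, set by the caller in this example.
static thread_local Thread* g_current_thread = nullptr;

// Checked accessor: returns nullptr instead of performing an invalid
// downcast when the current thread is not a compiler thread.
CompilerThread* current_compiler_thread_or_null() {
  Thread* t = g_current_thread;
  if (t == nullptr || !t->is_compiler_thread()) {
    return nullptr;
  }
  return static_cast<CompilerThread*>(t);
}

// Caller-side guard: only touch compiler-thread state after the check.
int current_task_id_or(int fallback) {
  CompilerThread* ct = current_compiler_thread_or_null();
  return ct != nullptr ? ct->task_id() : fallback;
}
```

A caller on an ordinary thread gets the fallback value, while a caller on a compiler thread gets the task id. Nothing is dereferenced through a pointer of the wrong dynamic type.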
So it's fairly uncontroversial, I would have thought, to guard all calls to ciEnv::current() from MacroAssembler with is_Compiler_thread(). -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at openjdk.java.net Fri Jul 30 18:05:39 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 30 Jul 2021 18:05:39 GMT Subject: Integrated: 8270947: AArch64: C1: use zero_words to initialize all objects In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 09:39:32 GMT, Andrew Haley wrote: > C1 has its own code generators for zeroing words. We should use the same logic for C1 and C2, which should give us better C1 performance and result in less code to maintain. > > This is one of those patches that's a great joy to write, because it consists mainly of deletions. The code I've added is mostly adapters to allow the C1 code to use the memory-zeroing logic written originally for C2. This means we have less code, but also that VM configuration options (e.g. `BlockZeroingLowLimit`) work with C1 and C2 in the same way. > > Measuring the performance of memory allocation is quite tricky, so I've written a JMH test case that measures the raw allocation rate of the JVM for various object sizes. This is inevitably rather noisy because it combines the effects of both the allocation code and other GC-related pauses. Nonetheless, it's a useful sanity check. > > The performance differences between old and new are mostly in the noise, but with large allocations the advantage of `DC ZVA` becomes apparent: > > old: > > RawAllocationRate.arrayTest_C1 8192 thrpt 5 11220.314 ± 336.878 ops/s > RawAllocationRate.arrayTest_C1 16384 thrpt 5 16655.815 ± 88.577 ops/s > RawAllocationRate.arrayTest_C1 65536 thrpt 5 28302.661 ± 155.513 ops/s > RawAllocationRate.arrayTest_C1 131072 thrpt 5 31434.868 ± 211.768 ops/s > > new: > > RawAllocationRate.arrayTest_C1 8192 thrpt 5 13677.987 ± 
143.048 ops/s > RawAllocationRate.arrayTest_C1 16384 thrpt 5 19517.416 ± 155.004 ops/s > RawAllocationRate.arrayTest_C1 65536 thrpt 5 37348.536 ± 307.582 ops/s > RawAllocationRate.arrayTest_C1 131072 thrpt 5 43414.399 ± 58.317 ops/s > > > Full test results, Graviton 2 (i.e. Neoverse N1). Units are megabytes per second, > object sizes are in bytes: > > > old: > > Benchmark (size) Mode Cnt Score Error Units > RawAllocationRate.arrayTest 32 thrpt 5 5092.798 ± 20.879 ops/s > RawAllocationRate.arrayTest 64 thrpt 5 9821.608 ± 6.250 ops/s > RawAllocationRate.arrayTest 256 thrpt 5 14117.192 ± 72.720 ops/s > RawAllocationRate.arrayTest 1024 thrpt 5 9090.514 ± 40.239 ops/s > RawAllocationRate.arrayTest 2048 thrpt 5 9842.503 ± 52.744 ops/s > RawAllocationRate.arrayTest 4096 thrpt 5 9866.179 ± 6.332 ops/s > RawAllocationRate.arrayTest 8192 thrpt 5 12836.968 ± 14.143 ops/s > RawAllocationRate.arrayTest 16384 thrpt 5 18970.307 ± 96.903 ops/s > RawAllocationRate.arrayTest 65536 thrpt 5 36709.095 ± 38.256 ops/s > RawAllocationRate.arrayTest 131072 thrpt 5 43055.263 ± 60.808 ops/s > RawAllocationRate.arrayTest_C1 32 thrpt 5 3045.285 ± 23.128 ops/s > RawAllocationRate.arrayTest_C1 64 thrpt 5 5774.157 ± 52.472 ops/s > RawAllocationRate.arrayTest_C1 256 thrpt 5 4720.713 ± 9.419 ops/s > RawAllocationRate.arrayTest_C1 1024 thrpt 5 7457.880 ± 806.208 ops/s > RawAllocationRate.arrayTest_C1 2048 thrpt 5 8155.046 ± 194.153 ops/s > RawAllocationRate.arrayTest_C1 4096 thrpt 5 8364.379 ± 127.661 ops/s > RawAllocationRate.arrayTest_C1 8192 thrpt 5 11220.314 ± 336.878 ops/s > RawAllocationRate.arrayTest_C1 16384 thrpt 5 16655.815 ± 88.577 ops/s > RawAllocationRate.arrayTest_C1 65536 thrpt 5 28302.661 ± 155.513 ops/s > RawAllocationRate.arrayTest_C1 131072 thrpt 5 31434.868 ± 211.768 ops/s > RawAllocationRate.instanceTest 32 thrpt 5 6667.433 ± 50.031 ops/s > RawAllocationRate.instanceTest 64 thrpt 5 10669.876 ± 72.109 ops/s > RawAllocationRate.instanceTest 256 thrpt 5 5483.582 ± 
336.743 ops/s > RawAllocationRate.instanceTest 1024 thrpt 5 9740.872 ± 6.269 ops/s > RawAllocationRate.instanceTest 2048 thrpt 5 9868.685 ± 51.939 ops/s > RawAllocationRate.instanceTest 4096 thrpt 5 9881.944 ± 46.306 ops/s > RawAllocationRate.instanceTest 8192 thrpt 5 13524.791 ± 69.250 ops/s > RawAllocationRate.instanceTest 16384 thrpt 5 19560.774 ± 109.518 ops/s > RawAllocationRate.instanceTest 65536 thrpt 5 37510.256 ± 15.586 ops/s > RawAllocationRate.instanceTest 131072 thrpt 5 43361.887 ± 181.294 ops/s > RawAllocationRate.instanceTest_C1 32 thrpt 5 2851.135 ± 22.891 ops/s > RawAllocationRate.instanceTest_C1 64 thrpt 5 5476.183 ± 84.376 ops/s > RawAllocationRate.instanceTest_C1 256 thrpt 5 5105.347 ± 35.389 ops/s > RawAllocationRate.instanceTest_C1 1024 thrpt 5 7380.805 ± 3.944 ops/s > RawAllocationRate.instanceTest_C1 2048 thrpt 5 8963.428 ± 83.857 ops/s > RawAllocationRate.instanceTest_C1 4096 thrpt 5 9257.715 ± 52.647 ops/s > RawAllocationRate.instanceTest_C1 8192 thrpt 5 11655.359 ± 70.209 ops/s > RawAllocationRate.instanceTest_C1 16384 thrpt 5 17084.813 ± 91.150 ops/s > RawAllocationRate.instanceTest_C1 65536 thrpt 5 28682.783 ± 176.563 ops/s > RawAllocationRate.instanceTest_C1 131072 thrpt 5 31268.318 ± 221.486 ops/s > > new: > > Benchmark (size) Mode Cnt Score Error Units > RawAllocationRate.arrayTest 32 thrpt 5 5355.477 ± 43.045 ops/s > RawAllocationRate.arrayTest 64 thrpt 5 9825.067 ± 55.493 ops/s > RawAllocationRate.arrayTest 256 thrpt 5 13984.865 ± 125.125 ops/s > RawAllocationRate.arrayTest 1024 thrpt 5 9025.380 ± 48.921 ops/s > RawAllocationRate.arrayTest 2048 thrpt 5 9844.463 ± 6.780 ops/s > RawAllocationRate.arrayTest 4096 thrpt 5 9866.566 ± 48.659 ops/s > RawAllocationRate.arrayTest 8192 thrpt 5 12753.622 ± 67.211 ops/s > RawAllocationRate.arrayTest 16384 thrpt 5 18890.419 ± 14.152 ops/s > RawAllocationRate.arrayTest 65536 thrpt 5 37322.124 ± 269.352 ops/s > RawAllocationRate.arrayTest 131072 thrpt 5 43017.952 ± 
204.057 ops/s > RawAllocationRate.arrayTest_C1 32 thrpt 5 3102.221 ± 13.811 ops/s > RawAllocationRate.arrayTest_C1 64 thrpt 5 5947.419 ± 36.408 ops/s > RawAllocationRate.arrayTest_C1 256 thrpt 5 5124.479 ± 548.617 ops/s > RawAllocationRate.arrayTest_C1 1024 thrpt 5 9459.376 ± 716.317 ops/s > RawAllocationRate.arrayTest_C1 2048 thrpt 5 9840.594 ± 15.922 ops/s > RawAllocationRate.arrayTest_C1 4096 thrpt 5 9860.274 ± 56.088 ops/s > RawAllocationRate.arrayTest_C1 8192 thrpt 5 13677.987 ± 143.048 ops/s > RawAllocationRate.arrayTest_C1 16384 thrpt 5 19517.416 ± 155.004 ops/s > RawAllocationRate.arrayTest_C1 65536 thrpt 5 37348.536 ± 307.582 ops/s > RawAllocationRate.arrayTest_C1 131072 thrpt 5 43414.399 ± 58.317 ops/s > RawAllocationRate.instanceTest 32 thrpt 5 6620.452 ± 137.048 ops/s > RawAllocationRate.instanceTest 64 thrpt 5 9850.677 ± 6.417 ops/s > RawAllocationRate.instanceTest 256 thrpt 5 5533.512 ± 129.334 ops/s > RawAllocationRate.instanceTest 1024 thrpt 5 9829.806 ± 7.555 ops/s > RawAllocationRate.instanceTest 2048 thrpt 5 9857.707 ± 51.541 ops/s > RawAllocationRate.instanceTest 4096 thrpt 5 9957.300 ± 7.115 ops/s > RawAllocationRate.instanceTest 8192 thrpt 5 13662.581 ± 85.225 ops/s > RawAllocationRate.instanceTest 16384 thrpt 5 19571.796 ± 120.962 ops/s > RawAllocationRate.instanceTest 65536 thrpt 5 37401.527 ± 67.260 ops/s > RawAllocationRate.instanceTest 131072 thrpt 5 43327.339 ± 35.077 ops/s > RawAllocationRate.instanceTest_C1 32 thrpt 5 2842.031 ± 47.924 ops/s > RawAllocationRate.instanceTest_C1 64 thrpt 5 5359.357 ± 53.031 ops/s > RawAllocationRate.instanceTest_C1 256 thrpt 5 5081.287 ± 57.737 ops/s > RawAllocationRate.instanceTest_C1 1024 thrpt 5 8372.330 ± 267.016 ops/s > RawAllocationRate.instanceTest_C1 2048 thrpt 5 9470.224 ± 250.706 ops/s > RawAllocationRate.instanceTest_C1 4096 thrpt 5 9843.936 ± 52.825 ops/s > RawAllocationRate.instanceTest_C1 8192 thrpt 5 13695.863 ± 80.433 ops/s > RawAllocationRate.instanceTest_C1 16384 thrpt 5 19495.110 ± 
116.300 ops/s > RawAllocationRate.instanceTest_C1 65536 thrpt 5 37448.948 ± 291.917 ops/s > RawAllocationRate.instanceTest_C1 131072 thrpt 5 43443.406 ± 267.236 ops/s This pull request has now been integrated. Changeset: 6c68ce2d Author: Andrew Haley URL: https://git.openjdk.java.net/jdk/commit/6c68ce2d396c6fe02201daf2bdb8c164de807cc1 Stats: 28948 lines in 8 files changed: 28783 ins; 117 del; 48 mod 8270947: AArch64: C1: use zero_words to initialize all objects Reviewed-by: ngasson, adinn ------------- PR: https://git.openjdk.java.net/jdk/pull/4919 From valeriep at openjdk.java.net Fri Jul 30 18:56:40 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Fri, 30 Jul 2021 18:56:40 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Thu, 22 Jul 2021 17:16:45 GMT, Anthony Scarpino wrote: >> Seems strange to have GCMOperation op defined in GCMEngine but not initialized, nor used. The methods in GCMEngine which use op has an argument named op anyway. Either you just use the "op" field (remove the "op" argument) or the "op" argument (move the op field to GCMEncrypt/GCMDecrypt class). Having both looks confusing. > > Ok.. Moving it into GCMEncrypt makes sense. Now that I look at the code GCMDecrypt only uses it when passed to a method. GCMEncrypt uses it This is still present in the latest update. Is there another update coming? 
------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From valeriep at openjdk.java.net Fri Jul 30 18:56:41 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Fri, 30 Jul 2021 18:56:41 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Thu, 22 Jul 2021 17:19:20 GMT, Anthony Scarpino wrote: >> src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 650: >> >>> 648: int originalOutOfs = 0; >>> 649: byte[] in; >>> 650: byte[] out; >> >> The name "in", "out" are almost used in all calls, it's hard to tell when these two are actually used. Can we rename them to make them more unique? > > ok This is still present in the latest update. Is there another update coming? ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From svkamath at openjdk.java.net Fri Jul 30 18:56:40 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Fri, 30 Jul 2021 18:56:40 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Fri, 30 Jul 2021 18:23:18 GMT, Valerie Peng wrote: >> Ok.. Moving it into GCMEncrypt makes sense. Now that I look at the code GCMDecrypt only uses it when passed to a method. GCMEncrypt uses it > > This is still present in the latest update. Is there another update coming? Yes. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From svkamath at openjdk.java.net Fri Jul 30 18:56:42 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Fri, 30 Jul 2021 18:56:42 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Fri, 30 Jul 2021 18:23:44 GMT, Valerie Peng wrote: >> ok > > This is still present in the latest update. Is there another update coming? Yes. There will be another update. ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From iklam at openjdk.java.net Fri Jul 30 18:57:32 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 30 Jul 2021 18:57:32 GMT Subject: RFR: 8271525: ResourceHashtableBase::iterate() should not declared as const In-Reply-To: References: Message-ID: On Fri, 30 Jul 2021 16:10:29 GMT, Thomas Stuefe wrote: > Hi Ioi, > > > Hi Thomas, > > I've added `const V* get(K const& key) const;` as you suggested. > > However, I don't think `iterate()` should be const, because we have code that actually modifies the table's contents: > > I disagree; iterate does not modify the table. Calling it from a const context (eg. to print the table) should be possible. It does. The code I pointed below modifies the `counter` which is stored inside the table. It's possible to write a iterator that prints out all the counters, and resets all of them to zero. To me, this is a modification of the table. The current API does not forbid that, so it should not be declared `const`. We need to be consistent: - `V* get(key)` returns a `V*` that can be used to modify the contents in the table. - `iterate(iter)` passes a `V*` that can be used to modify the contents in the table. So we should declare both of these functions const (or declare both of them non-const). I think non-const makes more sense. 
I have a hard time understanding why adding/removing items is considered non-const, but modifying items is considered const. > > https://github.com/openjdk/jdk/blob/77fbd99f792c42bb92a240d38f35e3af25500f99/src/hotspot/share/logging/logAsyncWriter.cpp#L98-L107 > > > > I'll try to refactor the code to have a `const_iterate()` which calls `ITER::do_entry(const&K, const &V)`. > > If ITER were not a template parameter but a real functor, this would make sense since we could say: > > ``` > void iterate(iterator& it); > void iterate(const_iterator& it) const; > ``` > > but with ITER being a template parameter, I don't know of a way to enforce the use of only const ITERs with a const_iterate. Therefore I am not sure a const_iterate would give us that much. My current implementation of `const_iterate()` already enforces that `ITER::do_entry()` must be declared with `(K const&, V const&)`, due to this call: bool cont = iter->do_entry(const_cast(node->_key), const_cast(node->_value)); That's because `const_iterate()` instantiates `VALUE` as `V const&`. 
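To illustrate how a templated `const_iterate()` can reject mutating iterators at compile time, here is a minimal sketch. It does not reproduce the patch: `TinyMap`, `SumCounters`, `ResetCounters`, `accepts_const_entry` and `sum_of` are hypothetical names, and the sketch relies on the const-ness of the member function rather than on the `const_cast` call quoted above. The effect is similar, though. Inside `const_iterate()` the value is passed as a const reference, so an `ITER` whose `do_entry()` wants a non-const `V&` fails to instantiate.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical fixed-size map; names are illustrative, not HotSpot's.
template <class K, class V, std::size_t N = 8>
class TinyMap {
 public:
  void put(const K& key, const V& value) {
    assert(_count < N);
    _keys[_count] = key;
    _values[_count] = value;
    _count++;
  }

  // Mutating walk: do_entry may take the value by non-const reference.
  template <class ITER>
  void iterate(ITER* iter) {
    for (std::size_t i = 0; i < _count; i++) {
      const K& key = _keys[i];  // keys stay read-only in both walks
      if (!iter->do_entry(key, _values[i])) break;
    }
  }

  // Read-only walk: in a const member function _values[i] is a const V,
  // so it cannot bind to a `V&` parameter of do_entry.
  template <class ITER>
  void const_iterate(ITER* iter) const {
    for (std::size_t i = 0; i < _count; i++) {
      if (!iter->do_entry(_keys[i], _values[i])) break;
    }
  }

 private:
  K _keys[N];
  V _values[N];
  std::size_t _count = 0;
};

// Read-only iterator: usable with const_iterate().
struct SumCounters {
  int total = 0;
  bool do_entry(const int& /*key*/, const int& value) {
    total += value;
    return true;
  }
};

// Mutating iterator: usable with iterate() only.
struct ResetCounters {
  bool do_entry(const int& /*key*/, int& value) {
    value = 0;
    return true;
  }
};

// Compile-time probe: can ITER::do_entry be called with (const K&, const V&)?
template <class ITER, class K, class V>
class accepts_const_entry {
  template <class T>
  static auto probe(int) -> decltype(
      std::declval<T&>().do_entry(std::declval<const K&>(),
                                  std::declval<const V&>()),
      std::true_type());
  template <class>
  static std::false_type probe(...);

 public:
  static constexpr bool value = decltype(probe<ITER>(0))::value;
};

static_assert(accepts_const_entry<SumCounters, int, int>::value,
              "read-only iterator is accepted by a const walk");
static_assert(!accepts_const_entry<ResetCounters, int, int>::value,
              "mutating iterator is rejected by a const walk");

int sum_of(const TinyMap<int, int>& map) {
  SumCounters s;
  map.const_iterate(&s);
  return s.total;
}
```

Calling `map.const_iterate(&reset)` with a `ResetCounters` instance would be a compile error, which is the enforcement being discussed. The probe class exists only to make that claim checkable without breaking the build.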
------------- PR: https://git.openjdk.java.net/jdk/pull/4942 From svkamath at openjdk.java.net Fri Jul 30 18:56:43 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Fri, 30 Jul 2021 18:56:43 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Mon, 19 Jul 2021 19:18:54 GMT, Valerie Peng wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated AES-GCM intrinsic to match latest Java Code > > src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 717: > >> 715: in = new byte[Math.min(PARALLEL_LEN, srcLen)]; >> 716: out = new byte[Math.min(PARALLEL_LEN, srcLen)]; >> 717: } > > Move this down to else-block below just like the 'ct' variable. I've kept this code as is and not moved as recommended. If we move this line to the else part, the case where srcLen is less than PARALLEL_LEN but greater than BlockSize, in[] is null. As a result, three tests in test/jdk/../Cipher/AEAD were failing on src.get(in, 0, rlen) line. Do let me know if that's okay. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From stuefe at openjdk.java.net Fri Jul 30 18:57:32 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 30 Jul 2021 18:57:32 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Fri, 30 Jul 2021 18:28:44 GMT, Zhengyu Gu wrote: >> So, you are saying that there is no memory that is malloc'd pre-NMT-init phase and freed post-NMT-init phase? > > Okay, I see. It just leaks those memory, so the table can be read-only. Exactly. There are a few allocs that either get free'd or realloc'ed post-init, but not enough to make freeing them worth it if it means having to serialize access to the lookup table. 
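The phase rule described in this thread (modify the table only before initialization, treat it as read-only afterwards, and deliberately leak early blocks that are freed late) can be sketched as follows. This is a simplified model under stated assumptions, not the `NMTPreInit` code: `PreInitTracker` and its members are hypothetical names, a flat array stands in for the hash map, and NMT malloc headers are not modelled at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical, simplified model of a pre-init allocation table.
// It is NOT the NMTPreInit code and does not model NMT malloc headers.
class PreInitTracker {
 public:
  // Allocate a block; before initialization the pointer is also recorded.
  void* allocate(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr && !_initialized) {
      assert(_count < CAPACITY);
      _blocks[_count++] = p;
    }
    return p;
  }

  // Release a block. Returns true if the memory was handed back to free().
  bool release(void* p) {
    if (!_initialized) {
      // Startup phase, assumed single-threaded: safe to modify the table.
      remove(p);
      std::free(p);
      return true;
    }
    if (contains(p)) {
      // Early block freed late: leave the table untouched and leak it.
      return false;
    }
    std::free(p);
    return true;
  }

  // After this call the table is never written again, only searched.
  void mark_initialized() { _initialized = true; }

  bool contains(const void* p) const {
    for (std::size_t i = 0; i < _count; i++) {
      if (_blocks[i] == p) return true;
    }
    return false;
  }

  std::size_t tracked() const { return _count; }

 private:
  void remove(void* p) {
    for (std::size_t i = 0; i < _count; i++) {
      if (_blocks[i] == p) {
        _blocks[i] = _blocks[--_count];  // fill the hole with the last entry
        return;
      }
    }
    assert(false && "block was not allocated before initialization");
  }

  static constexpr std::size_t CAPACITY = 64;
  void* _blocks[CAPACITY];
  std::size_t _count = 0;
  bool _initialized = false;
};
```

Because nothing writes to the table once `mark_initialized()` has run, concurrent lookups after that point need no lock in this model. The price is the small, bounded leak of early blocks that are released late.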
------------- PR: https://git.openjdk.java.net/jdk/pull/4874 From zgu at openjdk.java.net Fri Jul 30 18:57:35 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Fri, 30 Jul 2021 18:57:35 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Fri, 30 Jul 2021 16:36:41 GMT, Thomas Stuefe wrote: >> src/hotspot/share/services/nmtPreInit.hpp line 309: >> >>> 307: ::memcpy(p_new, a->payload(), MIN2(a->size, new_size)); >>> 308: (*rc) = p_new; >>> 309: _num_reallocs_pre_to_post++; >> >> Post-NMT-init counter updates are racy. > > True, this is racy. It's just for diagnostics though - I'd rather remove them than make them atomic, since we would pay for this with every malloc. Or, maybe atomic + debug only? Either one is fine. ------------- PR: https://git.openjdk.java.net/jdk/pull/4874 From zgu at openjdk.java.net Fri Jul 30 18:57:31 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Fri, 30 Jul 2021 18:57:31 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Fri, 30 Jul 2021 16:33:17 GMT, Zhengyu Gu wrote: >> The code is implicitly thread-safe because the hashmap is only modified in the pre-NMT-init phase. After NMT initialization, the table is read-only. During pre-NMT-init we are effectively single-threaded - at most two threads access the map, the thread loading the libjvm, and the thread calling CreateJavaVM, but not at the same time. >> >> See also the asserts in the AllStatic class `NMTPreInit`. >> >> (I should have described it more clearly, will add a comment.) > > So, you are saying that there is no memory that is malloc'd in the pre-NMT-init phase and freed in the post-NMT-init phase? Okay, I see. It just leaks that memory, so the table can be read-only.
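A rough sketch of the "modify only before init, read-only afterwards" idea, under stated assumptions: `PreInitTable`, `freeze()` and `release()` are hypothetical names, the slot count and the hash are arbitrary, and none of this is the code in `nmtPreInit.hpp`. It only illustrates why leaking the few pre-init blocks lets post-init lookups run without a lock: once the table is frozen there are no writers left.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical lookup table for blocks allocated before initialization.
class PreInitTable {
  static const unsigned kSlots = 64;
  struct Entry { const void* p; Entry* next; };
  Entry* _buckets[kSlots] = {};
  bool _frozen = false;

  static unsigned hash(const void* p) {
    uintptr_t tmp = reinterpret_cast<uintptr_t>(p);
    // Assumption: malloc results are aligned, so the lowest bits carry no information.
    tmp >>= (sizeof(void*) == 8 ? 4 : 3);
    return static_cast<unsigned>(tmp % kSlots);
  }

 public:
  // Only legal before freeze(); this phase is assumed to be effectively single-threaded.
  void add(const void* p) {
    assert(!_frozen);
    Entry* e = static_cast<Entry*>(std::malloc(sizeof(Entry)));
    assert(e != nullptr);
    unsigned h = hash(p);
    e->p = p;
    e->next = _buckets[h];
    _buckets[h] = e;
  }

  // Called once at initialization; afterwards the table is never modified again.
  void freeze() { _frozen = true; }

  // Read-only lookup, so concurrent callers need no lock after freeze().
  bool contains(const void* p) const {
    for (const Entry* e = _buckets[hash(p)]; e != nullptr; e = e->next) {
      if (e->p == p) return true;
    }
    return false;
  }
};

// free()-like helper: returns true if the block was actually released.
bool release(PreInitTable const& table, void* p) {
  if (table.contains(p)) {
    return false;  // pre-init block: deliberately leaked, table stays untouched
  }
  std::free(p);
  return true;
}

int count_released() {
  PreInitTable table;
  void* early = std::malloc(16);
  table.add(early);
  table.freeze();                 // stands in for "NMT initialization has run"
  void* late = std::malloc(16);
  int released = 0;
  if (release(table, early)) released++;  // leaked, not released
  if (release(table, late)) released++;   // released normally
  return released;
}
```

The trade-off matches the one stated in the thread: a small, bounded leak in exchange for not serializing every later `free()` on the table.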
------------- PR: https://git.openjdk.java.net/jdk/pull/4874 From svkamath at openjdk.java.net Fri Jul 30 19:03:53 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Fri, 30 Jul 2021 19:03:53 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v6] In-Reply-To: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: > I would like to submit AES-GCM optimization for x86_64 architectures supporting AVX3+VAES (Evex encoded AES). This optimization interleaves AES and GHASH operations. > Performance gain of ~1.5x - 2x for message sizes 8k and above. Smita Kamath has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - changed file property of GaloisCounterMode.java - Merge branch 'master' of https://git.openjdk.java.net/jdk into aes-gcm - Extended jtreg test case, made changes to .cpp and .java files - Updated AES-GCM intrinsic to match latest Java Code - merge master - 8267125:Updated intrinsic signature to remove copies of counter, state and subkeyHtbl - Merge master - JDK-8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions ------------- Changes: https://git.openjdk.java.net/jdk/pull/4019/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4019&range=05 Stats: 1320 lines in 21 files changed: 1148 ins; 100 del; 72 mod Patch: https://git.openjdk.java.net/jdk/pull/4019.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4019/head:pull/4019 PR: https://git.openjdk.java.net/jdk/pull/4019 From zgu at openjdk.java.net Fri Jul 30 20:17:32 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Fri, 30 Jul 2021 20:17:32 GMT Subject: RFR: JDK-8256844: Make NMT late-initializable In-Reply-To: References: Message-ID: On Thu, 
22 Jul 2021 14:58:47 GMT, Thomas Stuefe wrote: > Short: this patch makes NMT available in custom-launcher scenarios and during gtests. It simplifies NMT initialization. It adds a lot of NMT-specific testing, cleans them up and makes them sideeffect-free. > > --------- > > NMT continues to be an extremely useful tool for SAP to tackle memory problems in the JVM. > > However, NMT is of limited use due to the following restrictions: > > - NMT cannot be used if the hotspot is embedded into a custom launcher unless the launcher actively cooperates. Just creating and invoking the JVM is not enough, it needs to do some steps prior to loading the hotspot. This limitation is not well known (nor, do I believe, documented). Many products don't do this, e.g., you cannot use NMT with IntelliJ. For us at SAP this problem limits NMT usefulness greatly since our VMs are often embedded into custom launchers and modifying every launcher is impossible. > - Worse, if that custom launcher links the libjvm *statically* there is just no way to activate NMT at all. This is the reason NMT cannot be used in the `gtestlauncher`. > - Related to that is that we cannot pass NMT options via `JAVA_TOOL_OPTIONS` and `-XX:Flags=`. > - The fact that NMT cannot be used in gtests is really a pity since it would allow us to both test NMT itself more rigorously and check for memory leaks while testing other stuff. > > The reason for all this is that NMT initialization happens very early, on the first call to `os::malloc()`. And those calls happen already during dynamic C++ initialization - a long time before the VM gets around parsing arguments. So, regular VM argument parsing is too late to parse NMT arguments. > > The current solution is to pass NMT arguments via a specially prepared environment variable: `NMT_LEVEL_=`. That environment variable has to be set by the embedding launcher, before it loads the libjvm. 
Since its name contains the PID, we cannot even set that variable in the shell before starting the launcher. > > All that means that every launcher needs to specially parse and process the NMT arguments given at the command line (or via whatever method) and prepare the environment variable. `java` itself does this. This only works before the libjvm.so is loaded, before its dynamic C++ initialization. For that reason, it does not work if the launcher links statically against the hotspot, since in that case C++ initialization of the launcher and hotspot are folded into one phase with no possibility of executing code beforehand. > > And since it bypasses argument handling in the VM, it bypasses a number of ways to pass arguments, e.g., `JAVA_TOOL_OPTIONS`. > > ------ > > This patch fixes these shortcomings by making NMT late-initializable: it can now be initialized after normal VM argument parsing, like all other parts of the VM. This greatly simplifies NMT initialization and makes it work automagically for every third party launcher, as well as within our gtests. > > The glaring problem with late-initializing NMT is the NMT malloc headers. If we rule out just always having them (unacceptable in terms of memory overhead), there is no safe way to determine, in os::free(), if an allocation came from before or after NMT initialization ran, and therefore what to do with its malloc headers. For a more extensive explanation, please see the comment block in `nmtPreInit.hpp` and the discussion with @kimbarrett and @zhengyu123 in the JBS comment section. > > The heart of this patch is a new way to track early, pre-NMT-init allocations. These are tracked via a lookup table. This was a suggestion by Kim and it worked out well. > > Changes in detail: > > - pre-NMT-init handling: > - the new files `nmtPreInit.hpp/cpp` take care of NMT pre-init handling. They contain a small global lookup table managing C-heap blocks allocated in the pre-NMT-init phase.
> - `os::malloc()/os::realloc()/os::free()` defer to this code before doing anything else. > - Please see the extensive comment block at the start of `nmtPreInit.hpp` explaining the details. > > - Changes to NMT: > - Before, NMT initialization was spread over two phases, `initialize()` and `late_initialize()`. Those were merged into one and simplified - there is only one initialization now which happens after argument parsing. > - Minor changes were needed for the `NMT_TrackingLevel` enum - to simplify code, I changed NMT_unknown to be numerically 0. A new comment block in `nmtCommon.hpp` now clearly specifies what's what, including allowed level state transitions. > - New utility functions to translate tracking level from/to strings were added to `NMTUtil` > - NMT has never been able to handle virtual memory allocations before initialization, which is fine since os::reserve_memory() is not called before the VM parses arguments. We now assert that. > - All code outside the VM handling NMT initialization (e.g., libjli) has been removed, as has the code testing it. > > - Gtests: > - Some existing gtests had to be modified: before, they all changed global state (turning NMT on/off) before testing. This is not allowed anymore, to keep NMT simple. Also, this pattern disturbed other tests. > - The new way to test is to passively check whether NMT has been switched on or off, and do tests accordingly: if on, full tests, if off, test just what makes sense in off-state. That does not disturb neighboring tests and actually gives us better coverage all around. > - It is now possible to start the gtestlauncher with NMT on! Which additionally gives us good coverage. > - To actually do gtests with NMT - since it's disabled by default - we now run NMT-enabled gtests as part of the hotspot jtreg NMT wrapper. We have used this pattern for a number of other facilities, see all the tests in test/hotspot/jtreg/gtest.. . It works very well.
> - Finally, a new gtest has been written to test the NMT preinit lookup map in isolation, placed in `gtest/nmt/test_nmtpreinitmap.cpp`. > > - jtreg: > - A new test has been added, `runtime/NMT/NMTInitializationTest.java`, testing NMT initialization in the face of many, many VM arguments. > > ------------- > > Tests: > - ran all new tests manually on 64-bit and 32-bit Linux > - GHAs > - The patch has been active in SAP's test systems for a while now. Looks good in general. src/hotspot/share/services/nmtPreInit.hpp line 153: > 151: > 152: static unsigned calculate_hash(const void* p) { > 153: uintptr_t tmp = p2i(p); malloc'd memory is usually aligned to two machine words; maybe `tmp = tmp >> LP64_ONLY(4) NOT_LP64(3)` would result in a better hash distribution? ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4874 From valeriep at openjdk.java.net Fri Jul 30 20:19:37 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Fri, 30 Jul 2021 20:19:37 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Fri, 30 Jul 2021 18:40:14 GMT, Smita Kamath wrote: >> src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 717: >> >>> 715: in = new byte[Math.min(PARALLEL_LEN, srcLen)]; >>> 716: out = new byte[Math.min(PARALLEL_LEN, srcLen)]; >>> 717: } >> >> Move this down to else-block below just like the 'ct' variable. > > I've kept this code as is and not moved as recommended. If we move this line to the else part, the case where srcLen is less than PARALLEL_LEN but greater than BlockSize, in[] is null. As a result, three tests in test/jdk/../Cipher/AEAD were failing on src.get(in, 0, rlen) line. Do let me know if that's okay. Thanks. Hmm, I see. Sure, fine to keep it as is then.
------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From valeriep at openjdk.java.net Fri Jul 30 20:19:37 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Fri, 30 Jul 2021 20:19:37 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> Message-ID: On Thu, 22 Jul 2021 17:57:13 GMT, Anthony Scarpino wrote: >> src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 761: >> >>> 759: } >>> 760: >>> 761: dst.put(out, 0, rlen); >> >> This looks like it belongs in the above if-block? I wonder why this has not caused the operation to fail. Perhaps the existing regression tests did not cover the 'rlen < blockSize' case. If the code in the above if-block is not run, this outside dst.put(...) call would put extra output bytes into the output buffer. > > Yes... for this one and the ct offset problem earlier, I would have expected the regression tests to pick up the mistake. There should be tests that catch this. I'm not sure what's up. This shall be addressed in the next update, I assume? ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From valeriep at openjdk.java.net Fri Jul 30 20:19:38 2021 From: valeriep at openjdk.java.net (Valerie Peng) Date: Fri, 30 Jul 2021 20:19:38 GMT Subject: RFR: 8267125: AES Galois CounterMode (GCM) interleaved implementation using AVX512 + VAES instructions [v4] In-Reply-To: References: <0a7b_-PDU_JYXR7OrJRK8Z8QPRwLlV2vcHbBbW06SO8=.f0d61fd3-0205-40a7-b1a1-58caa2ea0f45@github.com> <9DGzWlRgC8DaSEZFFeOzQJuRvopW8CISMLJwYQAUGTo=.1aa32797-386f-4101-a96d-6cbad78934f7@github.com> Message-ID: On Thu, 22 Jul 2021 22:52:14 GMT, Anthony Scarpino wrote: >> Yes, I know. Basically, we are trying to optimize performance by trying to write into the supplied buffers (out) as much as we can.
But then when tag verification fails, the "written" bytes are erased w/ 0. The ideal case would be not to touch the output buffer until after the tag verification succeeds. Isn't this the previous approach? Verify the tag first and then write out the plain text afterwards. > > With this new intrinsic doing both ghash and gctr at the same time, I cannot do the ghash check first, before the gctr op. I wish I could. Oh well, ok. ------------- PR: https://git.openjdk.java.net/jdk/pull/4019 From coleenp at openjdk.java.net Fri Jul 30 21:32:29 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 30 Jul 2021 21:32:29 GMT Subject: RFR: 8271525: ResourceHashtableBase::iterate() should not declared as const In-Reply-To: References: Message-ID: <2QfaaDy3R20HshsviuqjNZ51EP_3BDodO2u5aTccnsA=.c16ed47c-03d3-446b-979f-31133dd940e1@github.com> On Fri, 30 Jul 2021 04:16:59 GMT, Ioi Lam wrote: > `ResourceHashtableBase::iterate()` is declared `const`, but it can actually change the contents of the table. The same is true for `ResourceHashtableBase::get()`, which returns a non-`const` pointer to the value, allowing the caller to modify it. > > We should declare these two functions as non-`const`. This will also remove a lot of ugly `const_cast<>` code. > > The `iterate()` API is tightened such that the `do_entry()` function can modify the `value` but not the `key`. If you want to enforce not modifying the items in the table, the table can be instantiated with a const type. I think that's what you want.
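A short sketch of that suggestion, assuming a simplified table: `MiniTable`, `Counter` and `Incrementer` are hypothetical names, not HotSpot classes. With a non-const `iterate()` that hands out the key as `K const&` and the value as `V&`, instantiating the table with a const value type is enough to make every visitor read-only.

```cpp
#include <cassert>
#include <list>

// Hypothetical minimal table with a non-const iterate(): the visitor may
// modify the value but never the key.
template <typename K, typename V>
class MiniTable {
  struct Node { K _key; V _value; };
  std::list<Node> _nodes;

 public:
  void put(K const& k, V const& v) { _nodes.push_back(Node{k, v}); }

  template <typename ITER>
  void iterate(ITER* iter) {
    for (Node& n : _nodes) {
      K const& key = n._key;
      if (!iter->do_entry(key, n._value)) break;  // n._value is const when V is const
    }
  }
};

// Read-only visitor: works with both MiniTable<int, int> and MiniTable<int, const int>.
struct Counter {
  int sum = 0;
  bool do_entry(int const&, int const& value) { sum += value; return true; }
};

// Mutating visitor: works only when the value type is not const.
struct Incrementer {
  bool do_entry(int const&, int& value) { ++value; return true; }
};

int frozen_and_open_sum() {
  MiniTable<int, const int> frozen;   // items cannot be modified through iterate()
  frozen.put(1, 10);
  frozen.put(2, 20);
  Counter c;
  frozen.iterate(&c);                 // sum == 30
  // Incrementer bad; frozen.iterate(&bad);  // would not compile: const int to int&

  MiniTable<int, int> open;           // items may be modified
  open.put(1, 10);
  Incrementer inc;
  open.iterate(&inc);                 // value becomes 11
  Counter c2;
  open.iterate(&c2);
  return c.sum + c2.sum;
}
```

The design choice here is that constness lives in the element type rather than in a separate `const_iterate()` entry point.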
------------- PR: https://git.openjdk.java.net/jdk/pull/4942 From iignatyev at openjdk.java.net Sat Jul 31 20:42:10 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Sat, 31 Jul 2021 20:42:10 GMT Subject: [jdk17] RFR: 8067223: [TESTBUG] Rename Whitebox API package [v2] In-Reply-To: <39PNb3wa-43yB5IFR3XroqFf4vMNZPitzF85lO1Gw58=.f2587741-d2c4-467d-b722-b17269587e7a@github.com> References: <39PNb3wa-43yB5IFR3XroqFf4vMNZPitzF85lO1Gw58=.f2587741-d2c4-467d-b722-b17269587e7a@github.com> Message-ID: > Hi all, > > could you please review this big tedious and trivial(-ish) patch which moves `sun.hotspot.WhiteBox` and related classes to `jdk.test.whitebox` package? > > the majority of the patch is the following substitutions: > - `s~sun/hotspot/WhiteBox~jdk/test/whitebox/WhiteBox~g` > - `s/sun.hotspot.parser/jdk.test.whitebox.parser/g` > - `s/sun.hotspot.cpuinfo/jdk.test.whitebox.cpuinfo/g` > - `s/sun.hotspot.code/jdk.test.whitebox.code/g` > - `s/sun.hotspot.gc/jdk.test.whitebox.gc/g` > - `s/sun.hotspot.WhiteBox/jdk.test.whitebox.WhiteBox/g` > > testing: tier1-4 > > Thanks, > -- Igor Igor Ignatyev has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains 12 new commits since the last revision: - fixed ctw build - updated runtime/cds/appcds/JarBuilder to copy j.t.w.WhiteBox's inner class - updated requires.VMProps - updated TEST.ROOT - adjusted hotspot source - added test - moved and adjusted WhiteBox tests (test/lib-test/sun/hotspot/whitebox) - updated ClassFileInstaller to copy j.t.w.WhiteBox's inner class - removed sun/hotspot/parser/DiagnosticCommand - deprecated sun/hotspot classes disallowed s.h.WhiteBox w/ security manager - ... 
and 2 more: https://git.openjdk.java.net/jdk17/compare/8f12f2cf...237e8860 ------------- Changes: - all: https://git.openjdk.java.net/jdk17/pull/290/files - new: https://git.openjdk.java.net/jdk17/pull/290/files/8f12f2cf..237e8860 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk17&pr=290&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk17&pr=290&range=00-01 Stats: 3248 lines in 939 files changed: 969 ins; 0 del; 2279 mod Patch: https://git.openjdk.java.net/jdk17/pull/290.diff Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/290/head:pull/290 PR: https://git.openjdk.java.net/jdk17/pull/290 From iignatyev at openjdk.java.net Sat Jul 31 20:51:46 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Sat, 31 Jul 2021 20:51:46 GMT Subject: [jdk17] RFR: 8067223: [TESTBUG] Rename Whitebox API package [v2] In-Reply-To: References: <39PNb3wa-43yB5IFR3XroqFf4vMNZPitzF85lO1Gw58=.f2587741-d2c4-467d-b722-b17269587e7a@github.com> Message-ID: On Sat, 31 Jul 2021 20:42:10 GMT, Igor Ignatyev wrote: >> Hi all, >> >> could you please review this big tedious and trivial(-ish) patch which moves `sun.hotspot.WhiteBox` and related classes to `jdk.test.whitebox` package? >> >> the majority of the patch is the following substitutions: >> - `s~sun/hotspot/WhiteBox~jdk/test/whitebox/WhiteBox~g` >> - `s/sun.hotspot.parser/jdk.test.whitebox.parser/g` >> - `s/sun.hotspot.cpuinfo/jdk.test.whitebox.cpuinfo/g` >> - `s/sun.hotspot.code/jdk.test.whitebox.code/g` >> - `s/sun.hotspot.gc/jdk.test.whitebox.gc/g` >> - `s/sun.hotspot.WhiteBox/jdk.test.whitebox.WhiteBox/g` >> >> testing: tier1-4 >> >> Thanks, >> -- Igor > > Igor Ignatyev has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Vladimir, David, I've (forced) pushed a smaller version of the renaming. 
instead of removing `sun.hotspot` classes, it copies them to `jdk.test.whitebox` (w/ `s.h.parser.DiagnosticCommand` being removed as it's used in WhiteBox signature and it was easier to update a few tests that use it), updates hotspot code to register native methods for both `sun.hotspot.WhiteBox` and `jdk.test.whitebox.WhiteBox` classes. To make it easier and not to introduce extra dependency, I've made it impossible to use `s.h.WB` w/ a security manager enabled, otherwise there would be a dependency b/w `s.h.WB` and `j.t.w.WB$WhiteBoxPermission` or there would be 2 permissions. There are no open JDK tests that are impacted by this limitation. With minor tweaks in closed source, the patch successfully passes Oracle's tier1-4. -- Igor ------------- PR: https://git.openjdk.java.net/jdk17/pull/290