From alanb at openjdk.java.net Sun Nov 1 16:08:54 2020 From: alanb at openjdk.java.net (Alan Bateman) Date: Sun, 1 Nov 2020 16:08:54 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 14:13:40 GMT, Maurizio Cimadamore wrote: >>> @mcimadamore, if you pull from current master, you would get the Linux x86_32 tier1 run "for free". >> >> Just did that - I also removed TestMismatch from the problem list in the latest iteration, and fixed the alignment for long/double layouts, after chatting with the team (https://bugs.openjdk.java.net/browse/JDK-8255350) > > I've just uploaded another iteration which addresses some comments from @AlanBateman. Basically, there are some operations on Channel and Socket which take ByteBuffer as arguments, and then, if such buffers are *direct*, they get the address and pass it down to some native function. This idiom is problematic because there's no way to guarantee that the buffer won't be closed (if obtained from a memory segment) after the address has been obtained. As a stop gap solution, I've introduced checks in `DirectBuffer::address` method, which is used in around 30 places in the JDK. This method will now throw if (a) the buffer has a shared scope, or (b) if the scope is confined, but already closed. With this extra check, I believe there's no way to misuse the buffer obtained from a segment. We have discussed plans to remove this limitations (which we think will be possible) - but for the time being, it's better to play the conservative card. I looked through the changes in this update. The shared memory segment support looks sound and the mechanism to close a shared memory segment is clever (albeit a bit surprising at first that it does global handshake to look for a frame in a scoped region. Also surprising that close can cause failure at both ends - it took me a while to see that this is pragmatic approach). 
The share method specifies NPE if thread == null, but there is no thread parameter; is this a cut 'n paste error? There is another one in registerCleaner, where it should be NPE if the cleaner is null.

I think the javadoc for the close method needs to be a bit clearer on the state of the memory segment when IllegalStateException is thrown. Will it be marked "not alive" when it fails? Does this mean there is a resource leak? I think an apiNote to explain the rationale for why close is not idempotent is also needed, or maybe it should be re-visited so that close is a no-op when the memory segment is not alive.

Now that MemorySegment is AutoCloseable, maybe the term "alive" should be replaced with "open" or "closed" and isAlive replaced with isOpen or isClosed.

FileDescriptor can be an attractive nuisance and has forced reference counting everywhere that it is used. Is it needed? Could an isMapped method work instead?

mapFromPath was in the second preview but I think the method name should be re-examined as it maps a file; the path just locates the file. Naming is subjective, but in this case using "map" or "mapFile" would fit beside the allocateNative methods.

MappedMemorySegments. The force method specifies a write back guarantee but at the same time, the implNote in the class description suggests that the methods might be a no-op. You might want to adjust the wording to avoid any suggestion that force might be a no-op.

The javadoc for copyFrom isn't changed in this update but I notice it specifies IndexOutOfBoundsException when the source segment is larger than the receiver; have other exceptions been examined?

I don't have any comments on MemoryAccess except that it's not immediately clear why there are "Byte" methods that take a ByteOrder. It makes sense for the multi-byte types of course.

The updates to the java/nio sources look okay but it would be helpful if the really long lines could be chopped down, as it's just too hard to do side-by-side reviews when the lines are so long.
A minor nit but the changes X-Buffer.java.template mess up the alignment of the parameters to copyMemory/copySwapMemory methods. ------------- PR: https://git.openjdk.java.net/jdk/pull/548 From vlivanov at openjdk.java.net Sun Nov 1 19:10:58 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Sun, 1 Nov 2020 19:10:58 GMT Subject: RFR: 8255616: Disable AOT and Graal in Oracle OpenJDK In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 17:40:51 GMT, Vladimir Kozlov wrote: > We shipped Ahead-of-Time compilation (the jaotc tool) in JDK 9, as an experimental feature. We shipped Graal as an experimental JIT compiler in JDK 10. We haven't seen much use of these features, and the effort required to support and enhance them is significant. We therefore intend to disable these features in Oracle builds as of JDK 16. > > We'll leave the sources for these features in the repository, in case any one else is interested in building them. But we will not update or test them. > > We'll continue to build and ship JVMCI as an experimental feature in Oracle builds. > > Tested changes in all tiers. > > I verified that with these changes I still able to build Graal in open repo and run graalunit testing: > > `open$ bash test/hotspot/jtreg/compiler/graalunit/downloadLibs.sh /mydir/graalunit_lib/` > `open$ bash configure --with-debug-level=fastdebug --with-graalunit-lib=/mydir/graalunit_lib/ --with-jtreg=/mydir/jtreg` > `open$ make jdk-image` > `open$ make test-image` > `open$ make run-test TEST=compiler/graalunit/HotspotTest.java` Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/960 From iveresov at openjdk.java.net Sun Nov 1 20:17:55 2020 From: iveresov at openjdk.java.net (Igor Veresov) Date: Sun, 1 Nov 2020 20:17:55 GMT Subject: RFR: 8255616: Disable AOT and Graal in Oracle OpenJDK In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 17:40:51 GMT, Vladimir Kozlov wrote: > We shipped Ahead-of-Time compilation (the jaotc tool) in JDK 9, as an experimental feature. We shipped Graal as an experimental JIT compiler in JDK 10. We haven't seen much use of these features, and the effort required to support and enhance them is significant. We therefore intend to disable these features in Oracle builds as of JDK 16. > > We'll leave the sources for these features in the repository, in case any one else is interested in building them. But we will not update or test them. > > We'll continue to build and ship JVMCI as an experimental feature in Oracle builds. > > Tested changes in all tiers. > > I verified that with these changes I still able to build Graal in open repo and run graalunit testing: > > `open$ bash test/hotspot/jtreg/compiler/graalunit/downloadLibs.sh /mydir/graalunit_lib/` > `open$ bash configure --with-debug-level=fastdebug --with-graalunit-lib=/mydir/graalunit_lib/ --with-jtreg=/mydir/jtreg` > `open$ make jdk-image` > `open$ make test-image` > `open$ make run-test TEST=compiler/graalunit/HotspotTest.java` Marked as reviewed by iveresov (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/960 From kvn at openjdk.java.net Sun Nov 1 21:02:53 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 1 Nov 2020 21:02:53 GMT Subject: RFR: 8255616: Disable AOT and Graal in Oracle OpenJDK In-Reply-To: References: Message-ID: On Sun, 1 Nov 2020 20:15:01 GMT, Igor Veresov wrote: >> We shipped Ahead-of-Time compilation (the jaotc tool) in JDK 9, as an experimental feature. We shipped Graal as an experimental JIT compiler in JDK 10. 
We haven't seen much use of these features, and the effort required to support and enhance them is significant. We therefore intend to disable these features in Oracle builds as of JDK 16.
>>
>> We'll leave the sources for these features in the repository, in case any one else is interested in building them. But we will not update or test them.
>>
>> We'll continue to build and ship JVMCI as an experimental feature in Oracle builds.
>>
>> Tested changes in all tiers.
>>
>> I verified that with these changes I still able to build Graal in open repo and run graalunit testing:
>>
>> `open$ bash test/hotspot/jtreg/compiler/graalunit/downloadLibs.sh /mydir/graalunit_lib/`
>> `open$ bash configure --with-debug-level=fastdebug --with-graalunit-lib=/mydir/graalunit_lib/ --with-jtreg=/mydir/jtreg`
>> `open$ make jdk-image`
>> `open$ make test-image`
>> `open$ make run-test TEST=compiler/graalunit/HotspotTest.java`
>
> Marked as reviewed by iveresov (Reviewer).

Thank you for reviews.

-------------

PR: https://git.openjdk.java.net/jdk/pull/960

From dongbo at openjdk.java.net Mon Nov 2 03:12:01 2020
From: dongbo at openjdk.java.net (Dong Bo)
Date: Mon, 2 Nov 2020 03:12:01 GMT
Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic
Message-ID:

Base64.encodeBlock stub is implemented for x86_64. We can also do the same thing for aarch64 with SIMD LD3/ST4/TBL. A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.

Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build. Tests in test/jdk/java/util/Base64/* were also run specifically to check the correctness of the implementation, and passed.

A JMH micro, Base64Encode.java, is added for performance testing. With different input lengths (upper-bounded by parameter `maxNumBytes` in the JMH micro), we witness ~4x improvement with long input and no regression with short input on Kunpeng916 and Kunpeng920.
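For readers unfamiliar with what an encodeBlock stub has to compute, here is a minimal scalar sketch in C++ of the mapping the description above refers to: every 3 input bytes are split into four 6-bit indices, and each index selects one character from the 64-entry alphabet. This is an illustration only, not the AArch64 stub from the patch. The function name encode_block_scalar is hypothetical, and the input length is assumed to be a multiple of 3 so that no padding is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Scalar reference for Base64-encoding whole 3-byte groups.
// Hypothetical helper for illustration; not code from the pull request.
std::string encode_block_scalar(const std::uint8_t* src, std::size_t len) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  assert(len % 3 == 0);  // assumption: the caller handles any trailing bytes

  std::string out;
  out.reserve(len / 3 * 4);
  for (std::size_t i = 0; i < len; i += 3) {
    // Pack three bytes into one 24-bit value.
    std::uint32_t bits = (static_cast<std::uint32_t>(src[i]) << 16) |
                         (static_cast<std::uint32_t>(src[i + 1]) << 8) |
                         static_cast<std::uint32_t>(src[i + 2]);
    // Emit four characters, one per 6-bit field, most significant first.
    out.push_back(kAlphabet[(bits >> 18) & 0x3f]);
    out.push_back(kAlphabet[(bits >> 12) & 0x3f]);
    out.push_back(kAlphabet[(bits >> 6) & 0x3f]);
    out.push_back(kAlphabet[bits & 0x3f]);
  }
  return out;
}
```

Called on the three bytes of "Man", this returns "TWFu". A SIMD version would presumably perform the same split and table lookup on many groups at once, which is where the LD3, TBL and ST4 instructions named above come in.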
The Base64Encode.java JMH micro-benchmark results:

Benchmark                      (maxNumBytes)  Mode  Cnt      Score    Error  Units

# kunpeng 916, intrinsic
Base64Encode.testBase64Encode              1  avgt   10     31.564 ± 0.034  ns/op
Base64Encode.testBase64Encode              2  avgt   10     33.921 ± 0.362  ns/op
Base64Encode.testBase64Encode              3  avgt   10     38.015 ± 0.220  ns/op
Base64Encode.testBase64Encode              6  avgt   10     41.115 ± 0.281  ns/op
Base64Encode.testBase64Encode              7  avgt   10     42.161 ± 0.630  ns/op
Base64Encode.testBase64Encode              9  avgt   10     44.797 ± 0.849  ns/op
Base64Encode.testBase64Encode             10  avgt   10     46.013 ± 0.917  ns/op
Base64Encode.testBase64Encode             48  avgt   10     67.984 ± 0.777  ns/op
Base64Encode.testBase64Encode            512  avgt   10    174.494 ± 1.614  ns/op
Base64Encode.testBase64Encode           1000  avgt   10    277.103 ± 0.306  ns/op
Base64Encode.testBase64Encode          20000  avgt   10   4261.018 ± 1.883  ns/op

# kunpeng 916, default
Base64Encode.testBase64Encode              1  avgt   10     31.710 ± 0.234  ns/op
Base64Encode.testBase64Encode              2  avgt   10     33.978 ± 0.305  ns/op
Base64Encode.testBase64Encode              3  avgt   10     40.059 ± 0.444  ns/op
Base64Encode.testBase64Encode              6  avgt   10     47.958 ± 0.328  ns/op
Base64Encode.testBase64Encode              7  avgt   10     49.017 ± 1.305  ns/op
Base64Encode.testBase64Encode              9  avgt   10     53.150 ± 0.769  ns/op
Base64Encode.testBase64Encode             10  avgt   10     55.418 ± 0.316  ns/op
Base64Encode.testBase64Encode             48  avgt   10     93.517 ± 0.391  ns/op
Base64Encode.testBase64Encode            512  avgt   10    494.809 ± 0.413  ns/op
Base64Encode.testBase64Encode           1000  avgt   10    898.581 ± 0.944  ns/op
Base64Encode.testBase64Encode          20000  avgt   10  16464.411 ± 7.582  ns/op

# kunpeng 920, intrinsic
Base64Encode.testBase64Encode              1  avgt   10     17.494 ± 0.012  ns/op
Base64Encode.testBase64Encode              2  avgt   10     21.023 ± 0.169  ns/op
Base64Encode.testBase64Encode              3  avgt   10     25.772 ± 0.138  ns/op
Base64Encode.testBase64Encode              6  avgt   10     30.121 ± 0.347  ns/op
Base64Encode.testBase64Encode              7  avgt   10     31.591 ± 0.238  ns/op
Base64Encode.testBase64Encode              9  avgt   10     32.728 ± 0.395  ns/op
Base64Encode.testBase64Encode             10  avgt   10     35.110 ± 0.215  ns/op
Base64Encode.testBase64Encode             48  avgt   10     48.621 ± 0.314  ns/op
Base64Encode.testBase64Encode            512  avgt   10    113.391 ± 0.554  ns/op
Base64Encode.testBase64Encode           1000  avgt   10    180.749 ± 0.193  ns/op
Base64Encode.testBase64Encode          20000  avgt   10   3273.961 ± 5.706  ns/op

# kunpeng 920, default
Base64Encode.testBase64Encode              1  avgt   10     17.428 ± 0.037  ns/op
Base64Encode.testBase64Encode              2  avgt   10     20.926 ± 0.155  ns/op
Base64Encode.testBase64Encode              3  avgt   10     25.466 ± 0.140  ns/op
Base64Encode.testBase64Encode              6  avgt   10     32.526 ± 0.190  ns/op
Base64Encode.testBase64Encode              7  avgt   10     34.132 ± 0.387  ns/op
Base64Encode.testBase64Encode              9  avgt   10     36.685 ± 0.212  ns/op
Base64Encode.testBase64Encode             10  avgt   10     38.117 ± 0.246  ns/op
Base64Encode.testBase64Encode             48  avgt   10     62.447 ± 0.900  ns/op
Base64Encode.testBase64Encode            512  avgt   10    377.275 ± 0.162  ns/op
Base64Encode.testBase64Encode           1000  avgt   10    700.628 ± 0.509  ns/op
Base64Encode.testBase64Encode          20000  avgt   10  13626.764 ± 3.448  ns/op

-------------

Commit messages:
 - Merge branch 'master' into aarch64-base64-encoding
 - 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic

Changes: https://git.openjdk.java.net/jdk/pull/992/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255625
  Stats: 216 lines in 3 files changed: 216 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/992.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/992/head:pull/992

PR: https://git.openjdk.java.net/jdk/pull/992

From rehn at openjdk.java.net Mon Nov 2 07:35:55 2020
From: rehn at openjdk.java.net (Robbin Ehn)
Date: Mon, 2 Nov 2020 07:35:55 GMT
Subject: RFR: 8255596: Mutex safepoint checking options and flags should be scoped enums
In-Reply-To:
References:
Message-ID: <6UO5Xv_DO_zsSoLd1ybgRxd-NzqbTp7EqF7hcBqeGX4=.c2f93658-ecfa-4127-b4c5-483caa6f78da@github.com>

On Fri, 30 Oct 2020 14:20:25 GMT, Kim Barrett wrote:

> Please review this change to some enums in
the Mutex class. > SafepointCheckFlag and SafepointCheckRequired are changed to scoped > enums. Also removed the anonymous enum defining _allow_vm_block_flag > and _as_suspend_equivalent_flag, instead defining those as bool > constants. > > To avoid changing all references to the SafepointCheckXXX enumerators > (due to the additional scoping introduced by using scoped enums), same > named constants are defined at Mutex class scope. Some renaming might > be preferable in the long term, but I didn't want to do that just to > get the improved type checking. An X-macro approach to defining the > enumerators and hoisting them into class scope could have been taken, > but the number of enumerators here doesn't seem to warrant the additional > infrastructure to do so. > > Changing the enum types uncovered a few places in the implementation > of Mutex and MutexLocker where enum values were being implicitly > converted to bool, with associated assumptions about the order or > values of the enumerators. Those have been fixed. > > Testing: > tier1 +1 ------------- Marked as reviewed by rehn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/957 From adinn at openjdk.java.net Mon Nov 2 09:50:55 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Mon, 2 Nov 2020 09:50:55 GMT Subject: RFR: JDK-8255544: Create a checked cast [v2] In-Reply-To: References: Message-ID: On Sat, 31 Oct 2020 14:02:07 GMT, Andrew Haley wrote: >> In many places we've added C-style casts to silence compiler warnings, for example when truncating a size_t to an int when we know the size_t is a small struct. Such casts are inherently risky, because they effectively disable useful compiler warnings. We should add a form of cast that checks at runtime that a truncation does not overflow. 
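To make the idea in the JDK-8255544 description above concrete, a checked cast could look roughly like the following sketch. It is not the code from the pull request: the name checked_cast, the round-trip test and the use of an exception are all assumptions made for this illustration (HotSpot code would more likely use an assert than throw).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: convert `from` to To, then verify that converting back
// yields the original value and that the sign did not flip. If either check
// fails, the value did not fit in To.
template <typename To, typename From>
To checked_cast(From from) {
  To to = static_cast<To>(from);
  bool round_trips = static_cast<From>(to) == from;
  bool same_sign = (to < To{}) == (from < From{});
  if (!round_trips || !same_sign) {
    throw std::range_error("checked_cast: value does not fit in target type");
  }
  return to;
}
```

With this sketch, checked_cast<int>(sizeof(some_small_struct)) behaves like the C-style cast it replaces, while a size_t that is too large for an int is reported at runtime instead of being silently truncated.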
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8255544: Create a checked cast Still looking good ------------- Marked as reviewed by adinn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/904 From kbarrett at openjdk.java.net Mon Nov 2 10:03:54 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 2 Nov 2020 10:03:54 GMT Subject: RFR: 8255596: Mutex safepoint checking options and flags should be scoped enums In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 15:27:31 GMT, Thomas Schatzl wrote: >> Please review this change to some enums in the Mutex class. >> SafepointCheckFlag and SafepointCheckRequired are changed to scoped >> enums. Also removed the anonymous enum defining _allow_vm_block_flag >> and _as_suspend_equivalent_flag, instead defining those as bool >> constants. >> >> To avoid changing all references to the SafepointCheckXXX enumerators >> (due to the additional scoping introduced by using scoped enums), same >> named constants are defined at Mutex class scope. Some renaming might >> be preferable in the long term, but I didn't want to do that just to >> get the improved type checking. An X-macro approach to defining the >> enumerators and hoisting them into class scope could have been taken, >> but the number of enumerators here doesn't seem to warrant the additional >> infrastructure to do so. >> >> Changing the enum types uncovered a few places in the implementation >> of Mutex and MutexLocker where enum values were being implicitly >> converted to bool, with associated assumptions about the order or >> values of the enumerators. Those have been fixed. >> >> Testing: >> tier1 > > Lgtm. Thanks @tschatzl and @robehn for reviews. 
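As a rough illustration of the approach described for 8255596, the sketch below shows a scoped enum whose enumerator names are re-exported as same-named constants at class scope. The class name MutexSketch and the helper checks_safepoint are hypothetical, and the real Mutex class has more members than shown here.

```cpp
#include <cassert>
#include <type_traits>

class MutexSketch {
 public:
  enum class SafepointCheckFlag {
    _safepoint_check_flag,
    _no_safepoint_check_flag
  };

  // Same-named constants at class scope, so existing code that writes
  // MutexSketch::_safepoint_check_flag keeps compiling unchanged.
  static constexpr SafepointCheckFlag _safepoint_check_flag =
      SafepointCheckFlag::_safepoint_check_flag;
  static constexpr SafepointCheckFlag _no_safepoint_check_flag =
      SafepointCheckFlag::_no_safepoint_check_flag;
};

// A scoped enum does not convert to bool implicitly, which is the improved
// type checking the change is after.
static_assert(
    !std::is_convertible<MutexSketch::SafepointCheckFlag, bool>::value,
    "scoped enums do not convert implicitly to bool");

// Code that used to rely on the implicit conversion has to compare explicitly.
inline bool checks_safepoint(MutexSketch::SafepointCheckFlag flag) {
  return flag == MutexSketch::_safepoint_check_flag;
}
```

The explicit comparison in checks_safepoint is the kind of fix the description mentions for places where enum values were implicitly converted to bool: the result no longer depends on the order or values of the enumerators.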
------------- PR: https://git.openjdk.java.net/jdk/pull/957 From kbarrett at openjdk.java.net Mon Nov 2 10:14:11 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 2 Nov 2020 10:14:11 GMT Subject: RFR: 8255596: Mutex safepoint checking options and flags should be scoped enums [v2] In-Reply-To: References: Message-ID: <0j9q9EPYihcSMQ0Y5Pl2zkPPMKmD84b4noOff_CXrOI=.8f661ec2-1169-4c82-a6fa-91f3a6c71038@github.com> > Please review this change to some enums in the Mutex class. > SafepointCheckFlag and SafepointCheckRequired are changed to scoped > enums. Also removed the anonymous enum defining _allow_vm_block_flag > and _as_suspend_equivalent_flag, instead defining those as bool > constants. > > To avoid changing all references to the SafepointCheckXXX enumerators > (due to the additional scoping introduced by using scoped enums), same > named constants are defined at Mutex class scope. Some renaming might > be preferable in the long term, but I didn't want to do that just to > get the improved type checking. An X-macro approach to defining the > enumerators and hoisting them into class scope could have been taken, > but the number of enumerators here doesn't seem to warrant the additional > infrastructure to do so. > > Changing the enum types uncovered a few places in the implementation > of Mutex and MutexLocker where enum values were being implicitly > converted to bool, with associated assumptions about the order or > values of the enumerators. Those have been fixed. > > Testing: > tier1 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into strong_mutex_flags - fix assert messages - use scoped enums ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/957/files - new: https://git.openjdk.java.net/jdk/pull/957/files/caa0c162..a4ea5df5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=957&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=957&range=00-01 Stats: 1971 lines in 87 files changed: 1004 ins; 420 del; 547 mod Patch: https://git.openjdk.java.net/jdk/pull/957.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/957/head:pull/957 PR: https://git.openjdk.java.net/jdk/pull/957 From kbarrett at openjdk.java.net Mon Nov 2 10:22:03 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 2 Nov 2020 10:22:03 GMT Subject: Integrated: 8255596: Mutex safepoint checking options and flags should be scoped enums In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 14:20:25 GMT, Kim Barrett wrote: > Please review this change to some enums in the Mutex class. > SafepointCheckFlag and SafepointCheckRequired are changed to scoped > enums. Also removed the anonymous enum defining _allow_vm_block_flag > and _as_suspend_equivalent_flag, instead defining those as bool > constants. > > To avoid changing all references to the SafepointCheckXXX enumerators > (due to the additional scoping introduced by using scoped enums), same > named constants are defined at Mutex class scope. Some renaming might > be preferable in the long term, but I didn't want to do that just to > get the improved type checking. An X-macro approach to defining the > enumerators and hoisting them into class scope could have been taken, > but the number of enumerators here doesn't seem to warrant the additional > infrastructure to do so. 
> > Changing the enum types uncovered a few places in the implementation > of Mutex and MutexLocker where enum values were being implicitly > converted to bool, with associated assumptions about the order or > values of the enumerators. Those have been fixed. > > Testing: > tier1 This pull request has now been integrated. Changeset: 69f5235e Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/69f5235e Stats: 23 lines in 3 files changed: 12 ins; 2 del; 9 mod 8255596: Mutex safepoint checking options and flags should be scoped enums Reviewed-by: tschatzl, rehn ------------- PR: https://git.openjdk.java.net/jdk/pull/957 From mcimadamore at openjdk.java.net Mon Nov 2 11:11:56 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 2 Nov 2020 11:11:56 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) In-Reply-To: References: Message-ID: On Sun, 1 Nov 2020 16:06:32 GMT, Alan Bateman wrote: > Now that MemorySegment is AutoCloseable then maybe the term "alive" should be replaced with "open" or "closed" and isAlive replaced with isOpen is isClosed. While the reason for the method being called "isAlive" are mostly historical (the old Panama pointer API had such a method), I think I still stand behind the current naming scheme. For temporal bounds, I think "isAlive" works better than "isOpened". > MappedMemorySegments. The force method specifies a write back guarantee but at the same time, the implNote in the class description suggests that the methods might be a no-op. You might want to adjust the wording to avoid any suggestion that force might be a no-op. The comment that this operation could be no-op was borrowed from the `MappedByteBuffer` API; looking at the impl, it seems that you are right that, under no circumstances (unless the segment has length zero) this should be a no-op. How do you suggest I proceed? 
------------- PR: https://git.openjdk.java.net/jdk/pull/548 From eosterlund at openjdk.java.net Mon Nov 2 11:14:10 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 2 Nov 2020 11:14:10 GMT Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2] In-Reply-To: References: Message-ID: > The imasm::remove_activation() call does not deal with safepoints very well. However, when the MethodExit JVMTI event is being called, we call into the runtime in the middle of remove_activation(). If the value being returned is an object type, then the top-of-stack contains the oop. However, the GC does not traverse said oop in any oop map, because it is simply not expected that we safepoint in the middle of remove_activation(). > > The JvmtiExport::post_method_exit() function we end up calling, reads the top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, that eventually call Java and a bunch of stuff that safepoints. So after the JVMTI callback, we can expect the top-of-stack oop to be broken. Unfortunately, when we continue, we therefore end up returning a broken oop. > > Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, is wrong, as we can safepoint on the way back to Java, which will break the return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, moving the transition to VM and back, into a block of code that is protected against GC. Before the JRT_BLOCK is called, we stash away the return oop, and after the JRT_BLOCK_END, we restore the top-of-stack oop. In the path when InterpreterRuntime::post_method_exit is called when throwing an exception, we don't have the same problem of retaining an oop result, and hence the JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the logic is the same as before for this path. > > This is a JVMTI bug that has probably been around for a long time. 
It crashes with all GCs, but was discovered recently after concurrent stack processing, as StefanK has been running better GC stressing code in JVMTI, and the bug reproduced more easily with concurrent stack processing, as the timings were a bit different. The following reproducer failed pretty much 100% of the time:
> while true; do make test JTREG="RETAIN=all" TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done
>
> With my fix I can run this repeatedly without any more failures. I have also sanity checked the patch by running tier 1-5, to make sure that it does not introduce any new issues of its own. I have also used Stefan's nice external GC stressing with jcmd technique that was used to trigger crashes with other GCs, to make sure said crashes no longer reproduce either.

Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:

  Coleen CR1: Refactoring

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/930/files
  - new: https://git.openjdk.java.net/jdk/pull/930/files/d7500082..ae6355fd

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=930&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=930&range=00-01

  Stats: 41 lines in 2 files changed: 13 ins; 19 del; 9 mod
  Patch: https://git.openjdk.java.net/jdk/pull/930.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/930/head:pull/930

PR: https://git.openjdk.java.net/jdk/pull/930

From eosterlund at openjdk.java.net Mon Nov 2 11:19:59 2020
From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Mon, 2 Nov 2020 11:19:59 GMT
Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 30 Oct 2020 14:09:46 GMT, Erik Österlund wrote:
>> Oh that's actually horrible. I wonder if it's possible to hoist saving the result oop into the InterpreterRuntime entry. And pass the Handle into JvmtiExport::post_method_exit().
>
> I tried that first, and ended up with a bunch of non-trivial code duplication instead, as reading the oop is done in both paths but for different reasons. One to preserve/restore it (interpreter remove_activation entry), but also inside of JvmtiExport::post_method_exit() so that it can be passed into the MethodExit. I will give it another shot and see if it is possible to refactor it in a better way.

I uploaded a CR that does pretty much what you suggested, ish. Hope you like it!

-------------

PR: https://git.openjdk.java.net/jdk/pull/930

From eosterlund at openjdk.java.net Mon Nov 2 11:19:59 2020
From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=)
Date: Mon, 2 Nov 2020 11:19:59 GMT
Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2]
In-Reply-To: <5-JnsHKeZztip64W88tRwA9_pcuWze2jftUq0C6oYSM=.79916a5b-5dd1-402f-b1ba-0caa38a39c1b@github.com>
References: <5-JnsHKeZztip64W88tRwA9_pcuWze2jftUq0C6oYSM=.79916a5b-5dd1-402f-b1ba-0caa38a39c1b@github.com>
Message-ID:

On Sat, 31 Oct 2020 09:54:09 GMT, Serguei Spitsyn wrote:

>> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Coleen CR1: Refactoring
>
> Hi Erik,
>
> Nice discovery! Indeed, this is a long standing issue. It looks good in general.
> I agree with Coleen, it would be nice if there is an elegant way to move the oop_result saving/restoring code to InterpreterRuntime::post_method_exit. Otherwise, I'm okay with what you have now.
> It is also nice discovery of the issue with clearing the expression stack. I think, it was my mistake in the initial implementation of the ForceEarlyReturn when I followed the PopFrame implementation pattern. It is good to separate it from the current fix.
> > Thanks, > Serguei I uploaded a new commit to perform some refactoring as requested by Coleen and Serguei. I made the oop save/restore + JRT_BLOCK logic belong only to the path taken from InterpreterRuntime::post_method_exit. An inner posting method is called both from that path and from JvmtiExport::notice_unwind_due_to_exception. I think the result is an improvement in terms of how clear it is. I didn't want to move logic all the way back to InterpreterRuntime::post_method_exit though, as I don't think it looks pretty to have large chunks of JVMTI implementation details in the interpreterRuntime.cpp file. So I did basically what you suggested, with the slight difference of moving all the JVMTI implementation into the JVMTI file instead, which is just called from InterpreterRuntime::post_method_exit. Hope you are okay with this! ------------- PR: https://git.openjdk.java.net/jdk/pull/930 From eosterlund at openjdk.java.net Mon Nov 2 11:25:56 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 2 Nov 2020 11:25:56 GMT Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2] In-Reply-To: <5-JnsHKeZztip64W88tRwA9_pcuWze2jftUq0C6oYSM=.79916a5b-5dd1-402f-b1ba-0caa38a39c1b@github.com> References: <5-JnsHKeZztip64W88tRwA9_pcuWze2jftUq0C6oYSM=.79916a5b-5dd1-402f-b1ba-0caa38a39c1b@github.com> Message-ID: On Sat, 31 Oct 2020 09:54:09 GMT, Serguei Spitsyn wrote: > Hi Erik, > > Nice discovery! Indeed, this is a long standing issue. It looks good in general. > I agree with Coleen, it would be nice if there is an elegant way to move the oop_result saving/restoring code to InterpreterRuntime::post_method_exit. Otherwise, I'm okay with what you have now. > It is also nice discovery of the issue with clearing the expression stack. I think, it was my mistake in the initial implementation of the ForceEarlyReturn when I followed the PopFrame implementation pattern. 
It is good to separate it from the current fix. > > Thanks, > Serguei Thanks for reviewing this Serguei. And thanks for confirming our suspicions regarding clearing of the expression stack. I wasn't sure if anyone would be around that knew how it ended up there! I made the refactoring that you and Coleen wanted, I think. Hope you like it! ------------- PR: https://git.openjdk.java.net/jdk/pull/930 From mcimadamore at openjdk.java.net Mon Nov 2 11:29:54 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 2 Nov 2020 11:29:54 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) In-Reply-To: References: Message-ID: On Sun, 1 Nov 2020 16:06:32 GMT, Alan Bateman wrote: > The javadoc for copyFrom isn't changed in this update but I notice it specifies IndexOutOfBoundException when the source segment is larger than the receiver, have other exceptions been examined? This exception is consistent with other uses of this exception throughout this API (e.g. when writing a segment out of bounds). ------------- PR: https://git.openjdk.java.net/jdk/pull/548 From stefank at openjdk.java.net Mon Nov 2 11:40:04 2020 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Mon, 2 Nov 2020 11:40:04 GMT Subject: RFR: 8212879: Make JVMTI TagMap table not hash on oop address In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 08:25:28 GMT, Stefan Karlsson wrote: >> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. >> >> The hash function for the table is a hash of the lower 32 bits of the address. 
>> A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use.
>>
>> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways:
>>
>> 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post.
>> 2. For GetObjectsWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event.
>>
>> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads.
>>
>> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks.
>>
>> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code.
>>
>> It has also been tested with tier1-6.
>>
>> Thank you to Stefan, Erik and Kim for their help with this change.
>
> src/hotspot/share/prims/jvmtiTagMap.cpp line 126:
>
>> 124: // concurrent GCs. So fix it here once we have a lock or are
>> 125: // at a safepoint.
>> 126: // SetTag and GetTag should not post events!
>
> I think it would be good to explain why. Otherwise, this just leaves the readers wondering why this is the case.

Maybe even move this comment to the set_tag/get_tag code.

-------------

PR: https://git.openjdk.java.net/jdk/pull/967

From stefank at openjdk.java.net Mon Nov 2 11:40:04 2020
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Mon, 2 Nov 2020 11:40:04 GMT
Subject: RFR: 8212879: Make JVMTI TagMap table not hash on oop address
In-Reply-To:
References:
Message-ID:

On Fri, 30 Oct 2020 20:23:04 GMT, Coleen Phillimore wrote:

> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. [...]

Commented on nits, and reviewed GC code and tag map code. Didn't look closely at the hashmap changes.

src/hotspot/share/gc/shared/oopStorageSet.hpp line 41:

> 39: // Must be updated when new OopStorages are introduced
> 40: static const uint strong_count = 4 JVMTI_ONLY(+ 1);
> 41: static const uint weak_count = 5 JVMTI_ONLY(+1) JFR_ONLY(+ 1);

All other uses have `+ 1` instead of `+1`.

src/hotspot/share/gc/shared/weakProcessorPhaseTimes.hpp line 49:

> 47: double _phase_times_sec[1];
> 48: size_t _phase_dead_items[1];
> 49: size_t _phase_total_items[1];

This should be removed, along with the associated reset_items.

src/hotspot/share/gc/z/zOopClosures.hpp line 64:

> 62: };
> 63:
> 64: class ZPhantomKeepAliveOopClosure : public ZRootsIteratorClosure {

Seems like you flipped the location of these two. Maybe revert?

src/hotspot/share/prims/jvmtiExport.hpp line 405:

> 403:
> 404: // Delete me and all my callers!
> 405: static void weak_oops_do(BoolObjectClosure* b, OopClosure* f) {}

Maybe delete?

src/hotspot/share/prims/jvmtiTagMap.cpp line 126:

> 124: // concurrent GCs. So fix it here once we have a lock or are
> 125: // at a safepoint.
> 126: // SetTag and GetTag should not post events!

I think it would be good to explain why. Otherwise, this just leaves the readers wondering why this is the case.

src/hotspot/share/prims/jvmtiTagMap.cpp line 131:

> 129: // Operating on the hashmap must always be locked, since concurrent GC threads may
> 130: // notify while running through a safepoint.
> 131: assert(is_locked(), "checking");

Maybe move this to the top of the function to make it very clear.

src/hotspot/share/prims/jvmtiTagMap.cpp line 133:

> 131: assert(is_locked(), "checking");
> 132: if (post_events && env()->is_enabled(JVMTI_EVENT_OBJECT_FREE)) {
> 133: log_info(jvmti, table)("TagMap table needs posting before heap walk");

Not sure about the "before heap walk" since this is also done from GetObjectsWithTags, which does *not* do a heap walk but still requires posting.

src/hotspot/share/prims/jvmtiTagMap.cpp line 140:

> 138: hashmap()->rehash();
> 139: _needs_rehashing = false;
> 140: }

It's not clear to me that it's correct to rehash *after* posting. I think it is, because unlink_and_post will use load barriers to fix up old pointers.

src/hotspot/share/prims/jvmtiTagMap.cpp line 146:

> 144: // The ZDriver may be walking the hashmaps concurrently so all these locks are needed.
> 145: void JvmtiTagMap::check_hashmaps_for_heapwalk() {
> 146:

Extra white space. (Also double whitespace after this function)

src/hotspot/share/prims/jvmtiTagMap.cpp line 144:

> 142:
> 143: // This checks for posting and rehashing and is called from the heap walks.
> 144: // The ZDriver may be walking the hashmaps concurrently so all these locks are needed.

Should this comment be moved down to the lock taking?

src/hotspot/share/prims/jvmtiTagMap.cpp line 377:

> 375: MutexLocker ml(lock(), Mutex::_no_safepoint_check_flag);
> 376:
> 377: // Check if we have to processing for concurrent GCs.

Sentence seems to be missing a few words.

src/hotspot/share/prims/jvmtiTagMap.cpp line 954:

> 952: o->klass()->external_name());
> 953: return;
> 954: }

Why is this done as a part of this RFE? Is this a bug fix that should be done as a separate patch?

src/hotspot/share/prims/jvmtiTagMap.cpp line 1152:

> 1150: void JvmtiTagMap::unlink_and_post_locked() {
> 1151: MutexLocker ml(lock(), Mutex::_no_safepoint_check_flag);
> 1152: log_info(jvmti, table)("TagMap table needs posting before GetObjectTags");

There's no function called GetObjectTags. This log line needs to be adjusted.

src/hotspot/share/prims/jvmtiTagMap.cpp line 1162:

> 1160: VMOp_Type type() const { return VMOp_Cleanup; }
> 1161: void doit() {
> 1162: _tag_map->unlink_and_post_locked();

Either inline unlink_and_post_locked() or update gc_notification to use it?

src/hotspot/share/prims/jvmtiTagMap.cpp line 1279:

> 1277: // Can't post ObjectFree events here from a JavaThread, so this
> 1278: // will race with the gc_notification thread in the tiny
> 1279: // window where the oop is not marked but hasn't been notified that

Please don't use "oop" when referring to "objects".

src/hotspot/share/prims/jvmtiTagMap.cpp line 2975:

> 2973: }
> 2974:
> 2975: // Concurrent GC needs to call this in relocation pause, so after the oops are moved

oops => objects

src/hotspot/share/prims/jvmtiTagMap.cpp line 2977:

> 2975: // Concurrent GC needs to call this in relocation pause, so after the oops are moved
> 2976: // and have their new addresses, the table can be rehashed.
> 2977: void JvmtiTagMap::set_needs_processing() {

Maybe rename to set_needs_rehashing?

src/hotspot/share/prims/jvmtiTagMap.cpp line 2985:

> 2983:
> 2984: JvmtiEnvIterator it;
> 2985: for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) {

The iterator seems fast enough, so it seems unnecessary to have the environments_might_exist check.

src/hotspot/share/prims/jvmtiTagMap.cpp line 2998:

> 2996: // thread creation and before VMThread creation (1 thread); initial GC
> 2997: // verification can happen in that window which gets to here.
> 2998: if (!JvmtiEnv::environments_might_exist()) { return; }

I don't know what this comment is saying, and why the code is needed.

src/hotspot/share/prims/jvmtiTagMap.cpp line 3020:

> 3018: JvmtiTagMap* tag_map = env->tag_map_acquire();
> 3019: if (tag_map != NULL && !tag_map->is_empty()) {
> 3020: if (num_dead_entries > 0) {

The other num_dead_entries check tests for != 0. Maybe use the same in the two branches?

src/hotspot/share/prims/jvmtiTagMap.cpp line 3023:

> 3021: tag_map->hashmap()->unlink_and_post(tag_map->env());
> 3022: }
> 3023: tag_map->_needs_rehashing = true;

Maybe add a small comment why this is deferred.

src/hotspot/share/prims/jvmtiTagMap.hpp line 56:

> 54: void entry_iterate(JvmtiTagMapEntryClosure* closure);
> 55: void post_dead_object_on_vm_thread();
> 56: public:

Looked nicer when there was a blank line before public. Now it looks like public "relates" more to the code before than after.

src/hotspot/share/prims/jvmtiTagMap.hpp line 114:

> 112: static void check_hashmaps_for_heapwalk();
> 113: static void set_needs_processing() NOT_JVMTI_RETURN;
> 114: static void gc_notification(size_t num_dead_entries) NOT_JVMTI_RETURN;

Have you verified that this builds without JVMTI?

src/hotspot/share/prims/jvmtiTagMapTable.cpp line 50:

> 48: // A subsequent oop_load without AS_NO_KEEPALIVE (the object() accessor)
> 49: // keeps the oop alive before doing so.
> 50: return literal().peek();

I'm not sure we should be talking about the low-level Access names. Maybe reword in terms of WeakHandle operations?

src/hotspot/share/prims/jvmtiTagMapTable.cpp line 81:

> 79: void JvmtiTagMapTable::free_entry(JvmtiTagMapEntry* entry) {
> 80: unlink_entry(entry);
> 81: entry->literal().release(JvmtiExport::weak_tag_storage()); // release OopStorage

release *to* OopStorage?

src/hotspot/share/prims/jvmtiTagMapTable.cpp line 82:

> 80: unlink_entry(entry);
> 81: entry->literal().release(JvmtiExport::weak_tag_storage()); // release OopStorage
> 82: FREE_C_HEAP_ARRAY(char, entry); // C_Heap free.

Seems excessively redundant: // C_Heap free.

src/hotspot/share/prims/jvmtiTagMapTable.cpp line 98:

> 96:
> 97: // The obj is in the table as a target already
> 98: if (target != NULL && target == obj) {

Wonder if we could assert that obj is not NULL at the entry of this function, and then change this to simply target == obj?

src/hotspot/share/prims/jvmtiTagMapTable.cpp line 122:

> 120: int index = hash_to_index(hash);
> 121: // One was added while acquiring the lock
> 122: JvmtiTagMapEntry* entry = find(index, hash, obj);

Should this be done inside ASSERT?

test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM02/cm02t001/cm02t001.cpp line 64:

> 62: static jclass klass = NULL;
> 63: static jobject testedObject = NULL;
> 64: const jlong TESTED_TAG_VALUE = (5555555L);

Remove parentheses?
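
The deferred rehashing discussed in this review (table hashed on the lower 32 bits of the object address, GC only sets a flag, table rehashed at the next use) can be sketched roughly as follows. This is a toy Java model under stated assumptions, not the HotSpot code: `AddressTagTableSketch`, `Obj`, `setNeedsRehashing` and the bucket layout are hypothetical names chosen for illustration, objects are simulated with a mutable `address` field, and WeakHandle, locking and ObjectFree posting are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: names are hypothetical and this is not the HotSpot implementation.
final class AddressTagTableSketch {

    // Stand-in for a heap object; a simulated GC "moves" it by changing its address.
    static final class Obj {
        long address;
        Obj(long address) { this.address = address; }
    }

    private static final class Entry {
        final Obj obj;
        final long tag;
        Entry(Obj obj, long tag) { this.obj = obj; this.tag = tag; }
    }

    private static final int BUCKETS = 8;
    private List<List<Entry>> buckets = newBuckets();
    private boolean needsRehashing;

    private static List<List<Entry>> newBuckets() {
        List<List<Entry>> result = new ArrayList<>();
        for (int i = 0; i < BUCKETS; i++) {
            result.add(new ArrayList<>());
        }
        return result;
    }

    // Hash on the lower 32 bits of the address, as in the change description.
    private static int indexFor(Obj obj) {
        int low = (int) obj.address;
        return Math.floorMod(low ^ (low >>> 16), BUCKETS);
    }

    // What the GC would call after moving objects: it only records that work is pending.
    void setNeedsRehashing() { needsRehashing = true; }

    void setTag(Obj obj, long tag) {
        rehashIfNeeded();
        List<Entry> bucket = buckets.get(indexFor(obj));
        bucket.removeIf(e -> e.obj == obj);
        bucket.add(new Entry(obj, tag));
    }

    // Returns the tag, or null if the object is not tagged.
    Long getTag(Obj obj) {
        rehashIfNeeded();
        for (Entry e : buckets.get(indexFor(obj))) {
            if (e.obj == obj) return e.tag;
        }
        return null;
    }

    // Deferred rehash: entries are re-bucketed using their current addresses.
    private void rehashIfNeeded() {
        if (!needsRehashing) return;
        List<List<Entry>> old = buckets;
        buckets = newBuckets();
        for (List<Entry> bucket : old) {
            for (Entry e : bucket) {
                buckets.get(indexFor(e.obj)).add(e);
            }
        }
        needsRehashing = false;
    }

    public static void main(String[] args) {
        AddressTagTableSketch table = new AddressTagTableSketch();
        Obj obj = new Obj(0x1000L);
        table.setTag(obj, 42L);
        obj.address = 0x2003L;      // simulated relocation by the GC
        table.setNeedsRehashing();  // GC notification: flag only, no table walk
        System.out.println(table.getTag(obj)); // prints 42
    }
}
```

In this model the relocated object lands in a different bucket, so the lookup only succeeds because `getTag` rehashes first; that is the reason the flag has to be checked on every table operation.
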

-------------

PR: https://git.openjdk.java.net/jdk/pull/967

From mcimadamore at openjdk.java.net Mon Nov 2 11:59:09 2020
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Mon, 2 Nov 2020 11:59:09 GMT
Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v20]
In-Reply-To:
References:
Message-ID:

> This patch contains the changes associated with the third incubation round of the foreign memory access API (see JEP 393 [1]). This iteration focuses on improving the usability of the API in 3 main ways:
>
> * first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from multiple threads
> * second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee that the memory will be deallocated, eventually
> * third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class has been added, which defines several useful dereference routines; these are really just thin wrappers around memory access var handles, but they make the barrier to entry for using this API somewhat lower.
>
> A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit of dereference.
>
> This has all changed as per this API refresh; now a `MemoryAddress` is just a dumb carrier which wraps a pair of object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is concerned. You cannot dereference memory if you don't have a segment.
> This improves usability in a number of ways - first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that; otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling `MemoryAddress::asSegmentRestricted`).
>
> A list of the API, implementation and test changes is provided below. If you have any questions, or need more detailed explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide the feedback required.
>
> A big thank you to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segments would not have been possible; also I'd like to thank Paul Sandoz, whose insights on API design have been very helpful in this journey.
>
> Thanks
> Maurizio
>
> Javadoc:
>
> http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html
>
> Specdiff:
>
> http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html
>
> CSR:
>
> https://bugs.openjdk.java.net/browse/JDK-8254163
>
>
>
> ### API Changes
>
> * `MemorySegment`
>   * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below)
>   * added a no-arg factory for a native restricted segment representing entire native heap
>   * rename `withOwnerThread` to `handoff`
>   * add new `share` method, to create shared segments
>   * add new `registerCleaner` method, to register a segment against a cleaner
>   * add more helpers to create arrays from a segment e.g.
>     `toIntArray`
>   * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors)
>   * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`)
> * `MemoryAddress`
>   * drop `segment` accessor
>   * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative to a given segment
> * `MemoryAccess`
>   * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a carrier is one of the usual suspects (a Java primitive, minus `boolean`); the access mode can be simple (e.g. access base address of given segment), or indexed, in which case the accessor takes a segment and either a low-level byte offset, or a high-level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs. `getByteAtOffset` vs. `getByteAtIndex`).
> * `MemoryHandles`
>   * drop `withOffset` combinator
>   * drop `withStride` combinator
>   * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which it is easy to derive all the other handles using plain var handle combinators.
> * `Addressable`
>   * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2] to add more implementations. Clients can largely ignore this interface, which comes in really handy when defining native bindings with tools like `jextract`.
> * `MemoryLayouts`
>   * A new layout, for machine addresses, has been added to the mix.
>
>
>
> ### Implementation changes
>
> There are two main things to discuss here: support for shared segments, and the general simplification of the memory access var handle support.
>
> #### Shared segments
>
> The support for shared segments cuts in pretty deep in the VM.
> Support for shared segments is notoriously hard to achieve, at least in a way that guarantees optimal access performance. This is caused by the fact that, if a segment is shared, it would be possible for a thread to close it while another is accessing it.
>
> After considering several options (see [3]), we zeroed in on an approach which is inspired by a happy idea that Andrew Haley had (and that he reminded me of at this year's OpenJDK committer workshop - thanks!). The idea is that if we could *freeze* the world (e.g. with a GC pause), while a segment is closed, we could then prevent segments from being accessed concurrently with a close operation. For this to work, it is crucial that no GC safepoints can occur between a segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop right before an unsafe call). It also relies on the fact that hotspot/C2 should not be able to propagate loads across safepoints.
>
> Sadly, none of these conditions seems to be valid in the current implementation, so we needed to resort to a bit of creativity. First, we noted that, if we could mark so-called *scoped* methods with an annotation, it would be very simple to check whether a thread was in the middle of a scoped method when we stopped the world for a close operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312 [4]).
>
> The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread, thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds. Unfortunately the machinery for async exceptions is a bit fragile (e.g.
> not all the code in hotspot checks for async exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that another thread is accessing the segment being closed.
>
> As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a bug in the user code, likely arising from lack of synchronization, and should be treated as such.
>
> In terms of gritty implementation, we needed to centralize memory access routines in a single place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in addition, also provided a liveness check. This way we could mark all these routines with the special `@Scoped` annotation, which tells the VM that something important is going on.
>
> To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`. This class contains all the main memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the access coordinates, also a scope object, which is tested before access. A reachability fence is also thrown in the mix to make sure that the scope is kept alive during access (which is important when registering segments against cleaners).
>
> Of course, to make memory access safe, memory access var handles, byte buffer var handles, and byte buffer API should use the new `ScopedMemoryAccess` class instead of unsafe, so that a liveness check can be triggered (in case a scope is present).
>
> `ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the handshake completed successfully.
>
> `MemoryScope` (now significantly simplified from what we had before) has two implementations, one for confined segments and one for shared segments; the main difference between the two is what happens when the scope is closed; a confined segment sets a boolean flag to false, and returns, whereas a shared segment goes into a `CLOSING` state, then starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive` will still return `true`, while the liveness check upon memory access will fail.
>
> #### Memory access var handles overhaul
>
> The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and `long`, all other access types could be derived from this basic var handle form.
>
> This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g. an additional offset is injected into a base memory access var handle.
>
> This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators, which previously needed low-level access to the innards of the memory access var handle. All that code is now gone.
>
> #### Test changes
>
> Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the microbenchmarks also needed some tweaks - and some of them were updated to also test performance in the shared segment case.
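
The shared-segment close protocol described above (`ALIVE` to `CLOSING`, then to `CLOSED` or back to `ALIVE` depending on the handshake) can be sketched as a small state machine. This is a minimal sketch under stated assumptions, not the JDK's `MemoryScope`: the class name `SharedScopeSketch`, the `beginAccess`/`endAccess` methods and the accessor counter are hypothetical, and the counter merely stands in for the thread-local handshake that `closeScope` performs in the real implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: this is not the JDK's MemoryScope, and the accessor counter is a
// hypothetical stand-in for the thread-local handshake done by closeScope.
final class SharedScopeSketch {
    static final int ALIVE = 0, CLOSING = 1, CLOSED = 2;

    private final AtomicInteger state = new AtomicInteger(ALIVE);
    private final AtomicInteger activeAccessors = new AtomicInteger();

    // Mirrors the description: still true while the scope is CLOSING.
    boolean isAlive() { return state.get() != CLOSED; }

    // Liveness check on access: fails in CLOSING as well as CLOSED.
    // The accessor registers itself first, so a concurrent close() either sees it
    // and fails, or has already left ALIVE and makes this access fail.
    void beginAccess() {
        activeAccessors.incrementAndGet();
        if (state.get() != ALIVE) {
            activeAccessors.decrementAndGet();
            throw new IllegalStateException("Segment is not alive");
        }
    }

    void endAccess() { activeAccessors.decrementAndGet(); }

    void close() {
        if (!state.compareAndSet(ALIVE, CLOSING)) {
            throw new IllegalStateException("Already closed or closing");
        }
        boolean handshakeOk = activeAccessors.get() == 0;
        state.set(handshakeOk ? CLOSED : ALIVE);
        if (!handshakeOk) {
            throw new IllegalStateException("Segment is being accessed by another thread");
        }
    }

    public static void main(String[] args) {
        SharedScopeSketch scope = new SharedScopeSketch();
        scope.beginAccess();
        try {
            scope.close(); // fails: an access is in progress
        } catch (IllegalStateException e) {
            System.out.println("close failed: " + e.getMessage());
        }
        System.out.println("alive after failed close: " + scope.isAlive());
        scope.endAccess();
        scope.close(); // succeeds: no accessor left
        System.out.println("alive after successful close: " + scope.isAlive());
    }
}
```

Note that in this model a racing access and close can both fail, which matches the observation made earlier in the thread that close can cause failure at both ends.
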
> > [1] - https://openjdk.java.net/jeps/393 > [2] - https://openjdk.java.net/jeps/389 > [3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html > [4] - https://openjdk.java.net/jeps/312 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Address comments from @AlanBateman ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/548/files - new: https://git.openjdk.java.net/jdk/pull/548/files/bd400615..8225bf2e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=19 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=18-19 Stats: 121 lines in 9 files changed: 14 ins; 53 del; 54 mod Patch: https://git.openjdk.java.net/jdk/pull/548.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/548/head:pull/548 PR: https://git.openjdk.java.net/jdk/pull/548 From serb at openjdk.java.net Mon Nov 2 12:15:02 2020 From: serb at openjdk.java.net (Sergey Bylokhov) Date: Mon, 2 Nov 2020 12:15:02 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v20] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 11:59:09 GMT, Maurizio Cimadamore wrote: >> This patch contains the changes associated with the third incubation round of the foreign memory access API incubation (see JEP 393 [1]). 
This iteration focus on improving the usability of the API in 3 main ways: >> >> * first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from multiple threads >> * second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee that the memory will be deallocated, eventually >> * third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class has been added, which defines several useful dereference routines; these are really just thin wrappers around memory access var handles, but they make the barrier of entry for using this API somewhat lower. >> >> A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit of dereference. >> >> This has all changed as per this API refresh; now a `MemoryAddress` is just a dumb carrier which wraps a pair of object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is concerned. You cannot dereference memory if you don't have a segment. This improves usability in a number of ways - first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that; otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling `MemoryAddress::asSegmentRestricted`). >> >> A list of the API, implementation and test changes is provided below. 
If you have any questions, or need more detailed explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide the feedback required. >> >> A big thank to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segment would not have been possible; also I'd like to thank Paul Sandoz, whose insights on API design have been very helpful in this journey. >> >> Thanks >> Maurizio >> >> Javadoc: >> >> http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html >> >> Specdiff: >> >> http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html >> >> CSR: >> >> https://bugs.openjdk.java.net/browse/JDK-8254163 >> >> >> >> ### API Changes >> >> * `MemorySegment` >> * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below) >> * added a no-arg factory for a native restricted segment representing entire native heap >> * rename `withOwnerThread` to `handoff` >> * add new `share` method, to create shared segments >> * add new `registerCleaner` method, to register a segment against a cleaner >> * add more helpers to create arrays from a segment e.g. `toIntArray` >> * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors) >> * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`) >> * `MemoryAddress` >> * drop `segment` accessor >> * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative to a given segment >> * `MemoryAccess` >> * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a carrier is one of the usual suspect (a Java primitive, minus `boolean`); the access mode can be simple (e.g. 
access base address of given segment), or indexed, in which case the accessor takes a segment and either a low-level byte offset,or a high level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs. `getByteAtOffset` vs `getByteAtIndex`). >> * `MemoryHandles` >> * drop `withOffset` combinator >> * drop `withStride` combinator >> * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which it is easy to derive all the other handles using plain var handle combinators. >> * `Addressable` >> * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2] to add more implementations. Clients can largely ignore this interface, which comes in really handy when defining native bindings with tools like `jextract`. >> * `MemoryLayouts` >> * A new layout, for machine addresses, has been added to the mix. >> >> >> >> ### Implementation changes >> >> There are two main things to discuss here: support for shared segments, and the general simplification of the memory access var handle support. >> >> #### Shared segments >> >> The support for shared segments cuts in pretty deep in the VM. Support for shared segments is notoriously hard to achieve, at least in a way that guarantees optimal access performances. This is caused by the fact that, if a segment is shared, it would be possible for a thread to close it while another is accessing it. >> >> After considering several options (see [3]), we zeroed onto an approach which is inspired by an happy idea that Andrew Haley had (and that he reminded me of at this year OpenJDK committer workshop - thanks!). The idea is that if we could *freeze* the world (e.g. with a GC pause), while a segment is closed, we could then prevent segments from being accessed concurrently to a close operation. 
For this to work, it is crucial that no GC safepoints can occur between a segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop just right before an unsafe call). It also relies on the fact that hotspot/C2 should not be able to propagate loads across safepoints. >> >> Sadly, none of these conditions seems to be valid in the current implementation, so we needed to resort to a bit of creativity. First, we noted that, if we could mark so called *scoped* method with an annotation, it would be very simply to check as to whether a thread was in the middle of a scoped method when we stopped the world for a close operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312 [4]). >> >> The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread, thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds. Unfortunately the machinery for async exceptions is a bit fragile (e.g. not all the code in hotspot checks for async exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that another thread is accessing the segment being closed. >> >> As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a bug in the user code, likely arising from lack of synchronization, and should be treated as such. >> >> In terms of gritty implementation, we needed to centralize memory access routines in a single place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in addition, also provided a liveness check. 
This way we could mark all these routines with the special `@Scoped` annotation, which tells the VM that something important is going on.
>>
>> To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`. This class contains all the main memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the access coordinates, also a scope object, which is tested before access. A reachability fence is also thrown in the mix to make sure that the scope is kept alive during access (which is important when registering segments against cleaners).
>>
>> Of course, to make memory access safe, memory access var handles, byte buffer var handles, and byte buffer API should use the new `ScopedMemoryAccess` class instead of `Unsafe`, so that a liveness check can be triggered (in case a scope is present).
>>
>> `ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the handshake completed successfully.
>>
>> `MemoryScope` (now significantly simplified from what we had before) has two implementations, one for confined segments and one for shared segments; the main difference between the two is what happens when the scope is closed: a confined segment sets a boolean flag to false and returns, whereas a shared segment goes into a `CLOSING` state, then starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive` will still return `true`, while the liveness check upon memory access will fail.
>>
>> #### Memory access var handles overhaul
>>
>> The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and `long`, all other access types could be derived from this basic var handle form.
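
The close protocol described above for shared segments (`ALIVE` -> `CLOSING` -> `CLOSED`, or back to `ALIVE` when the handshake fails) can be sketched in plain Java. This is only an illustration under stated assumptions, not the actual `MemoryScope` code: the class name `SharedScopeSketch` is hypothetical, and the thread-local handshake that `ScopedMemoryAccess::closeScope` performs is modelled by a caller-supplied `BooleanSupplier`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the shared-scope state machine; not the JDK implementation.
class SharedScopeSketch {
    static final int ALIVE = 0, CLOSING = 1, CLOSED = 2;

    private final AtomicInteger state = new AtomicInteger(ALIVE);
    // Assumption: stands in for the handshake; returns true if no thread was
    // found in the middle of a scoped access.
    private final BooleanSupplier handshake;

    SharedScopeSketch(BooleanSupplier handshake) {
        this.handshake = handshake;
    }

    // As described in the text: still reports true while the scope is CLOSING.
    boolean isAlive() {
        return state.get() != CLOSED;
    }

    // Liveness check performed before each memory access: fails unless ALIVE.
    void checkValidState() {
        if (state.get() != ALIVE) {
            throw new IllegalStateException("Already closed");
        }
    }

    void close() {
        if (!state.compareAndSet(ALIVE, CLOSING)) {
            throw new IllegalStateException("Already closed");
        }
        boolean success = handshake.getAsBoolean();
        // Either complete the close or roll back to ALIVE.
        state.set(success ? CLOSED : ALIVE);
        if (!success) {
            throw new IllegalStateException("Cannot close while another thread is accessing the segment");
        }
    }

    public static void main(String[] args) {
        SharedScopeSketch scope = new SharedScopeSketch(() -> true);
        scope.checkValidState();   // access allowed while ALIVE
        scope.close();             // handshake succeeds, scope becomes CLOSED
        System.out.println("alive after close: " + scope.isAlive());
    }
}
```

A failed handshake leaves the scope usable, which matches the point made earlier in the message that an exception from `close` signals a synchronization bug in client code rather than a condition to retry.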
>> >> This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g. additional offset is injected into a base memory access var handle. >> >> This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators, which previously needed low-level access on the innards of the memory access var handle. All that code is now gone. >> >> #### Test changes >> >> Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the microbenchmarks also needed some tweaks - and some of them were also updated to also test performance in the shared segment case. >> >> [1] - https://openjdk.java.net/jeps/393 >> [2] - https://openjdk.java.net/jeps/389 >> [3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html >> [4] - https://openjdk.java.net/jeps/312 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Address comments from @AlanBateman test/jdk/java/foreign/TestCleaner.java line 8: > 6: * under the terms of the GNU General Public License version 2 only, as > 7: * published by the Free Software Foundation. Oracle designates this > 8: * particular file as subject to the "Classpath" exception as provided "Classpath exception" could be dropped? 
------------- PR: https://git.openjdk.java.net/jdk/pull/548 From ihse at openjdk.java.net Mon Nov 2 12:30:56 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 2 Nov 2020 12:30:56 GMT Subject: RFR: 8212879: Make JVMTI TagMap table not hash on oop address In-Reply-To: References: Message-ID: <4yShPUjSyZSPGnbVh-jr58lGT9psXrg88HZIk5Wa-40=.c2f77dae-d582-4fb4-9f6e-bdcb5ced63b0@github.com> On Fri, 30 Oct 2020 20:23:04 GMT, Coleen Phillimore wrote: > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. 
ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. > > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. Build changes look good. Not reviewed hotspot changes. ------------- Marked as reviewed by ihse (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/967 From ihse at openjdk.java.net Mon Nov 2 12:31:59 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 2 Nov 2020 12:31:59 GMT Subject: RFR: 8255616: Disable AOT and Graal in Oracle OpenJDK In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 17:40:51 GMT, Vladimir Kozlov wrote: > We shipped Ahead-of-Time compilation (the jaotc tool) in JDK 9, as an experimental feature. We shipped Graal as an experimental JIT compiler in JDK 10. We haven't seen much use of these features, and the effort required to support and enhance them is significant. We therefore intend to disable these features in Oracle builds as of JDK 16. > > We'll leave the sources for these features in the repository, in case any one else is interested in building them. But we will not update or test them. > > We'll continue to build and ship JVMCI as an experimental feature in Oracle builds. > > Tested changes in all tiers. 
>
> I verified that with these changes I am still able to build Graal in the open repo and run graalunit testing:
>
> `open$ bash test/hotspot/jtreg/compiler/graalunit/downloadLibs.sh /mydir/graalunit_lib/`
> `open$ bash configure --with-debug-level=fastdebug --with-graalunit-lib=/mydir/graalunit_lib/ --with-jtreg=/mydir/jtreg`
> `open$ make jdk-image`
> `open$ make test-image`
> `open$ make run-test TEST=compiler/graalunit/HotspotTest.java`

Build changes look good.

-------------

Marked as reviewed by ihse (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/960

From rkennke at openjdk.java.net Mon Nov 2 12:56:12 2020
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Mon, 2 Nov 2020 12:56:12 GMT
Subject: RFR: 8254315: Shenandoah: Concurrent weak reference processing [v27]
In-Reply-To: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com>
References: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com>
Message-ID:

> Until now, references (as in java.lang.ref.Reference and its subclasses WeakReference, SoftReference, PhantomReference and the non-public FinalReference - I'll collectively call them weak references for the purpose of clarity). Workloads that make heavy use of such weak references will therefore potentially cause significant GC pauses.
>
> There are 3 main items that contribute to pause time linear in the number of references, or worse:
> - We need to scan and consider each reference on the various 'discovered' lists.
> - We need to mark through the subgraph of objects that are reachable only through FinalReference. Notice that this is theoretically only bounded by the live data set size.
> - Finally, all no-longer-reachable references need to be enqueued in the 'pending list'
>
> The problem is somewhat mitigated by pre-cleaning the discovered list: Any weak reference that we find to be strongly reachable will be removed before we go into the final-mark-pause.
However, that is only a band-aid. > > The solution to this is two-fold: > 1. Extend concurrent marking to also mark the 'finalizable' subgraph of the heap. This requires to extend the marking bitmap to allow for two kinds of reachability: each object can now be strongly and finalizably reachable. Whenever marking encounters a FinalReference, it will mark through the referent and switch to 'finalizably' reachability for all objects starting from the referent. When marking encounters finalizably reachable objects while marking strongly, it will 'upgrade' reachability of such objects to strongly reachable. All of this can be done concurrently. Any encounter of a Reference (or subclass) object will enqueue that object into a thread-local 'discovered' list. Except for FinalReference, marking stops there, and does not mark through the referent. > 2. Concurrent processing is performed after the final-mark pause. GC workers scan all discovered lists that have been collected by concurrent marking, and depending on reachability of the referent, either drop the Reference, or enqueue it into the global 'pending' list (from where it will be processed by Java reference handler thread). In addition to that, we must ensure that no referents become resurrected by accessing Reference.get() on it. In order to achieve this, we employ special barriers in Reference.get() intrinsics that return NULL when the referent is not reachable. > > Testing: hotspot_gc_shenadoah (release+fastdebug, x86+aarch64), specjvm+specjbb without regressions, tier1, tier2, vmTestbase_vm_metaspace, vmTestbase_nsk_jvmti, with -XX:+UseShenandoahGC without regressions, specjvm with various levels of verification Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 84 commits: - Adopt ShenandoahReferenceBarrier to recent changes in LRB runtime impl - Merge branch 'master' into shenandoah-concurrent-weakrefs - Invert strong/weak in marking tasks and related code - Fix merge mistake - Merge branch 'master' into shenandoah-concurrent-weakrefs - Pass marking-strength through chunked arrays - Rename mark_final -> mark_weak and several cleanups (by shade) - Some more ShMarkTask cleanups - Call into native-LRB on unknown oop strenght (i.e. reflection) too - Put in comment about API impedence mismatch around interpreter native LRB - ... and 74 more: https://git.openjdk.java.net/jdk/compare/eb66418b...c09fda9a ------------- Changes: https://git.openjdk.java.net/jdk/pull/505/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=505&range=26 Stats: 2416 lines in 55 files changed: 1643 ins; 565 del; 208 mod Patch: https://git.openjdk.java.net/jdk/pull/505.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/505/head:pull/505 PR: https://git.openjdk.java.net/jdk/pull/505 From coleenp at openjdk.java.net Mon Nov 2 13:22:15 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 2 Nov 2020 13:22:15 GMT Subject: RFR: 8212879: Make JVMTI TagMap table not hash on oop address In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 08:08:53 GMT, Stefan Karlsson wrote: >> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. >> >> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. 
>> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. >> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. >> >> It has also been tested with tier1-6. >> >> Thank you to Stefan, Erik and Kim for their help with this change. 
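
The deferred-rehash scheme in the description above (hash on the low 32 bits of the address, set a flag during GC, rehash at the next use) can be sketched roughly as follows. This is a simplified model under loud assumptions, not the HotSpot code: Java cannot observe object addresses, so a hypothetical `Obj` class carries a mutable `address` field that the caller changes to simulate relocation, and `WeakHandle`, locking, and `ObjectFree` posting are all left out. The names `TagMapSketch`, `gcNotification` and `checkHashmap` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an address-hashed tag table with deferred rehashing.
class TagMapSketch {
    /** Stand-in for a heap object whose address a moving GC may change. */
    static final class Obj {
        long address;
        Obj(long address) { this.address = address; }
    }

    private static final class Entry {
        final Obj obj;
        final long tag;
        Entry(Obj obj, long tag) { this.obj = obj; this.tag = tag; }
    }

    private static final int BUCKETS = 16;
    private List<List<Entry>> table = newTable();
    private boolean needsRehashing;

    private static List<List<Entry>> newTable() {
        List<List<Entry>> t = new ArrayList<>();
        for (int i = 0; i < BUCKETS; i++) t.add(new ArrayList<>());
        return t;
    }

    // Hash on the lower 32 bits of the (simulated) address.
    static int bucketOf(long address) {
        int h = (int) address;
        h ^= h >>> 16;
        return Math.floorMod(h, BUCKETS);
    }

    // Called by the (simulated) GC: objects may have moved, so only set a flag.
    void gcNotification() {
        needsRehashing = true;
    }

    // Called at the start of every table operation: rehash lazily if needed.
    void checkHashmap() {
        if (!needsRehashing) return;
        List<List<Entry>> old = table;
        table = newTable();
        for (List<Entry> bucket : old) {
            for (Entry e : bucket) {
                table.get(bucketOf(e.obj.address)).add(e);
            }
        }
        needsRehashing = false;
    }

    void setTag(Obj obj, long tag) {
        checkHashmap();
        List<Entry> bucket = table.get(bucketOf(obj.address));
        bucket.removeIf(e -> e.obj == obj);
        bucket.add(new Entry(obj, tag));
    }

    /** Returns the tag for obj, or 0 if it is untagged. */
    long getTag(Obj obj) {
        checkHashmap();
        for (Entry e : table.get(bucketOf(obj.address))) {
            if (e.obj == obj) return e.tag;
        }
        return 0L;
    }

    public static void main(String[] args) {
        TagMapSketch map = new TagMapSketch();
        Obj o = new Obj(0x1008L);
        map.setTag(o, 5555555L);
        o.address = 0x2040L;       // simulated relocation by the GC
        map.gcNotification();      // table is now stale until the next use
        System.out.println(map.getTag(o));
    }
}
```

Deferring the work to the next use keeps the GC-side hook cheap, at the cost of every table operation having to check the flag first.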
> > src/hotspot/share/gc/shared/oopStorageSet.hpp line 41: > >> 39: // Must be updated when new OopStorages are introduced >> 40: static const uint strong_count = 4 JVMTI_ONLY(+ 1); >> 41: static const uint weak_count = 5 JVMTI_ONLY(+1) JFR_ONLY(+ 1); > > All other uses `+ 1` instead of `+1`. Fixed, although I think the space looks strange there but I'll go along. > src/hotspot/share/gc/shared/weakProcessorPhaseTimes.hpp line 49: > >> 47: double _phase_times_sec[1]; >> 48: size_t _phase_dead_items[1]; >> 49: size_t _phase_total_items[1]; > > This should be removed and the associated reset_items Removed. > src/hotspot/share/gc/z/zOopClosures.hpp line 64: > >> 62: }; >> 63: >> 64: class ZPhantomKeepAliveOopClosure : public ZRootsIteratorClosure { > > Seems like you flipped the location of these two. Maybe revert? Reverted. There was a rebasing conflict here so this was unintentional. > src/hotspot/share/prims/jvmtiExport.hpp line 405: > >> 403: >> 404: // Delete me and all my callers! >> 405: static void weak_oops_do(BoolObjectClosure* b, OopClosure* f) {} > > Maybe delete? Yes, meant to do that. > src/hotspot/share/prims/jvmtiTagMap.cpp line 131: > >> 129: // Operating on the hashmap must always be locked, since concurrent GC threads may >> 130: // notify while running through a safepoint. >> 131: assert(is_locked(), "checking"); > > Maybe move this to the top of the function to make it very clear. ok. > src/hotspot/share/prims/jvmtiTagMap.cpp line 133: > >> 131: assert(is_locked(), "checking"); >> 132: if (post_events && env()->is_enabled(JVMTI_EVENT_OBJECT_FREE)) { >> 133: log_info(jvmti, table)("TagMap table needs posting before heap walk"); > > Not sure about the "before heap walk" since this is also done from GetObjectsWithTags, which does *not* do a heap walk but still requires posting. I don't call check_hashmap for GetObjectsWithTags. 
> src/hotspot/share/prims/jvmtiTagMap.cpp line 140: > >> 138: hashmap()->rehash(); >> 139: _needs_rehashing = false; >> 140: } > > It's not clear to me that it's correct to rehash *after* posting. I think it is, because unlink_and_post will use load barriers to fixup old pointers. I think it's better that the rehashing doesn't encounter null entries and the WeakHandle.peek() operation is used for both so I hope it would get the same answer. If not, which seems bad, the last answer should be what we hash on. > src/hotspot/share/prims/jvmtiTagMap.cpp line 144: > >> 142: >> 143: // This checks for posting and rehashing and is called from the heap walks. >> 144: // The ZDriver may be walking the hashmaps concurrently so all these locks are needed. > > Should this comment be moved down to the lock taking? ok, I also made it singular since Erik pointed out that we don't need the other lock. > src/hotspot/share/prims/jvmtiTagMap.cpp line 146: > >> 144: // The ZDriver may be walking the hashmaps concurrently so all these locks are needed. >> 145: void JvmtiTagMap::check_hashmaps_for_heapwalk() { >> 146: > > Extra white space. (Also double whitespace after this function) ? I removed the double whitespace after the function and put the whitespace here. It needs some whitespace. // This checks for posting and rehashing and is called from the heap walks. void JvmtiTagMap::check_hashmaps_for_heapwalk() { assert(SafepointSynchronize::is_at_safepoint(), "called from safepoints"); // Verify that the tag map tables are valid and unconditionally post events // that are expected to be posted before gc_notification. JvmtiEnvIterator it; > src/hotspot/share/prims/jvmtiTagMap.cpp line 377: > >> 375: MutexLocker ml(lock(), Mutex::_no_safepoint_check_flag); >> 376: >> 377: // Check if we have to processing for concurrent GCs. > > Sentence seems to be missing a few words. removed the sentence, because non concurrent GCs also defer rehashing to next use. 
> src/hotspot/share/prims/jvmtiTagMap.cpp line 954: > >> 952: o->klass()->external_name()); >> 953: return; >> 954: } > > Why is this done as a part of this RFE? Is this a bug fix that should be done as a separate patch? Because it crashed with my changes and didn't without. I cannot recollect why. > src/hotspot/share/prims/jvmtiTagMap.cpp line 1152: > >> 1150: void JvmtiTagMap::unlink_and_post_locked() { >> 1151: MutexLocker ml(lock(), Mutex::_no_safepoint_check_flag); >> 1152: log_info(jvmti, table)("TagMap table needs posting before GetObjectTags"); > > There's no function called GetObjectTags. This log line needs to be adjusted. GetObjectsWithTags fixed. > src/hotspot/share/prims/jvmtiTagMap.cpp line 1162: > >> 1160: VMOp_Type type() const { return VMOp_Cleanup; } >> 1161: void doit() { >> 1162: _tag_map->unlink_and_post_locked(); > > Either inline unlink_and_post_locked() or updated gc_notification to use it? I thought of trying to share it but one logs and the other doesn't and it only saves 1 lines of code. > src/hotspot/share/prims/jvmtiTagMap.cpp line 1279: > >> 1277: // Can't post ObjectFree events here from a JavaThread, so this >> 1278: // will race with the gc_notification thread in the tiny >> 1279: // window where the oop is not marked but hasn't been notified that > > Please don't use "oop" when referring to "objects". fixed. > src/hotspot/share/prims/jvmtiTagMap.cpp line 2975: > >> 2973: } >> 2974: >> 2975: // Concurrent GC needs to call this in relocation pause, so after the oops are moved > > oops => objects fixed. > src/hotspot/share/prims/jvmtiTagMap.cpp line 2977: > >> 2975: // Concurrent GC needs to call this in relocation pause, so after the oops are moved >> 2976: // and have their new addresses, the table can be rehashed. >> 2977: void JvmtiTagMap::set_needs_processing() { > > Maybe rename to set_needs_rehashing? 
Since I went back and forth about what this function did (it posted events at one time), I thought the generic _processing name would be better. GC callers shouldn't really have to know what processing we're doing here. Hopefully it won't change from rehashing. That's why I like processing. > src/hotspot/share/prims/jvmtiTagMap.cpp line 2985: > >> 2983: >> 2984: JvmtiEnvIterator it; >> 2985: for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) { > > The iterator seems fast enough, so it seems unnecessary to have the environments_might_exist check. yes, it looks like it does the same thing. I'll remove it. > src/hotspot/share/prims/jvmtiTagMap.cpp line 2998: > >> 2996: // thread creation and before VMThread creation (1 thread); initial GC >> 2997: // verification can happen in that window which gets to here. >> 2998: if (!JvmtiEnv::environments_might_exist()) { return; } > > I don't know what this comment is saying, and why the code is needed. I've spent tons of time trying to understand this comment too. I think gc verification used to call oops do on the tagmap table. This comments obsolete now, and I'll remove it. > src/hotspot/share/prims/jvmtiTagMap.cpp line 3020: > >> 3018: JvmtiTagMap* tag_map = env->tag_map_acquire(); >> 3019: if (tag_map != NULL && !tag_map->is_empty()) { >> 3020: if (num_dead_entries > 0) { > > The other num_dead_entries check for != 0. Maybe use the same in the two branches? ok. > src/hotspot/share/prims/jvmtiTagMap.cpp line 3023: > >> 3021: tag_map->hashmap()->unlink_and_post(tag_map->env()); >> 3022: } >> 3023: tag_map->_needs_rehashing = true; > > Maybe add a small comment why this is deferred. // Later GC code will relocate the oops, so defer rehashing until then. ? > src/hotspot/share/prims/jvmtiTagMap.hpp line 56: > >> 54: void entry_iterate(JvmtiTagMapEntryClosure* closure); >> 55: void post_dead_object_on_vm_thread(); >> 56: public: > > Looked nicer when there was a blank line before public. 
Now it looks like public "relates" more to the code before than after. ok > src/hotspot/share/prims/jvmtiTagMap.hpp line 114: > >> 112: static void check_hashmaps_for_heapwalk(); >> 113: static void set_needs_processing() NOT_JVMTI_RETURN; >> 114: static void gc_notification(size_t num_dead_entries) NOT_JVMTI_RETURN; > > Have you verified that this builds without JVMTI? I will do (might have already done) that. Building non-oracle platforms builds minimal vm. > src/hotspot/share/prims/jvmtiTagMapTable.cpp line 50: > >> 48: // A subsequent oop_load without AS_NO_KEEPALIVE (the object() accessor) >> 49: // keeps the oop alive before doing so. >> 50: return literal().peek(); > > I'm not sure we should be talking about the low-level Access names. Maybe reword in terms of WeakHandle operations? I'm going to say: // Just peek at the object without keeping it alive. > src/hotspot/share/prims/jvmtiTagMapTable.cpp line 81: > >> 79: void JvmtiTagMapTable::free_entry(JvmtiTagMapEntry* entry) { >> 80: unlink_entry(entry); >> 81: entry->literal().release(JvmtiExport::weak_tag_storage()); // release OopStorage > > release *to* OopStorage? fixed > src/hotspot/share/prims/jvmtiTagMapTable.cpp line 98: > >> 96: >> 97: // The obj is in the table as a target already >> 98: if (target != NULL && target == obj) { > > Wonder if we could assert that obj is not NULL at the entry of this function, and then change this to simply target == obj? makes sense. > src/hotspot/share/prims/jvmtiTagMapTable.cpp line 122: > >> 120: int index = hash_to_index(hash); >> 121: // One was added while acquiring the lock >> 122: JvmtiTagMapEntry* entry = find(index, hash, obj); > > Should this be done inside ASSERT? yes. > test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM02/cm02t001/cm02t001.cpp line 64: > >> 62: static jclass klass = NULL; >> 63: static jobject testedObject = NULL; >> 64: const jlong TESTED_TAG_VALUE = (5555555L); > > Remove parenthesis? 
I copied this from some other place that had parenthesis. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Mon Nov 2 13:22:01 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 2 Nov 2020 13:22:01 GMT Subject: RFR: 8212879: Make JVMTI TagMap table not hash on oop address In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 20:23:04 GMT, Coleen Phillimore wrote: > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. 
The allowed callbacks to the VM already have code to allow NonJava threads. > > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. I think I addressed your comments, retesting now. Thank you! ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Mon Nov 2 13:22:15 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 2 Nov 2020 13:22:15 GMT Subject: RFR: 8212879: Make JVMTI TagMap table not hash on oop address In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 08:34:17 GMT, Stefan Karlsson wrote: >> src/hotspot/share/prims/jvmtiTagMap.cpp line 126: >> >>> 124: // concurrent GCs. So fix it here once we have a lock or are >>> 125: // at a safepoint. >>> 126: // SetTag and GetTag should not post events! >> >> I think it would be good to explain why. Otherwise, this just leaves the readers wondering why this is the case. > > Maybe even move this comment to the set_tag/get_tag code. I was trying to explain why there's a boolean there but I can put this comment at both get_tag and set_tag. // Check if we have to processing for concurrent GCs. // GetTag should not post events because the JavaThread has to // transition to native for the callback and this cannot stop for // safepoints with the hashmap lock held. 
check_hashmap(false); ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Mon Nov 2 13:28:59 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 2 Nov 2020 13:28:59 GMT Subject: RFR: 8212879: Make JVMTI TagMap table not hash on oop address In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 20:46:31 GMT, Erik Joelsson wrote: >> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. >> >> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. >> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. >> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. 
The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. >> >> It has also been tested with tier1-6. >> >> Thank you to Stefan, Erik and Kim for their help with this change. > > Build changes look ok. There should be a thank you emoji that doesn't send email except maybe to the person reviewing the code. Thank you @erikj79 and @magicus for reviewing the build changes. There should also be a 'fixed' emoji. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From aph at openjdk.java.net Mon Nov 2 13:30:00 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 2 Nov 2020 13:30:00 GMT Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic In-Reply-To: References: Message-ID: <51wDcKZ0kbfUQ3yXerMX2k_PiqNMYzUdq-1AE-vAqzI=.60806e4a-ce08-4fb7-b218-93ac40444613@github.com> On Mon, 2 Nov 2020 03:05:48 GMT, Dong Bo wrote: > Base64.encodeBlock stub is implemented for x86_64. > We can also do the same thing for aarch64 with SIMD LD3/ST4/TBL. > A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords. > > Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build. > Tests in test/jdk/java/util/Base64/* runned specially for the correctness of the implementation and passed. > > A JMH micro, Base64Encode.java, is added for performance test. 
> With different input length (upper-bounded by parameter `maxNumBytes` in the JMH micro), > we witness ~4x improvement with long input and no regression with short input on Kunpeng916 and Kunpeng920. > > The Base64Encode.java JMH micro-benchmark results: > Benchmark (maxNumBytes) Mode Cnt Score Error Units > # kunpeng 916, intrinsic > Base64Encode.testBase64Encode 1 avgt 10 31.564 ± 0.034 ns/op > Base64Encode.testBase64Encode 2 avgt 10 33.921 ± 0.362 ns/op > Base64Encode.testBase64Encode 3 avgt 10 38.015 ± 0.220 ns/op > Base64Encode.testBase64Encode 6 avgt 10 41.115 ± 0.281 ns/op > Base64Encode.testBase64Encode 7 avgt 10 42.161 ± 0.630 ns/op > Base64Encode.testBase64Encode 9 avgt 10 44.797 ± 0.849 ns/op > Base64Encode.testBase64Encode 10 avgt 10 46.013 ± 0.917 ns/op > Base64Encode.testBase64Encode 48 avgt 10 67.984 ± 0.777 ns/op > Base64Encode.testBase64Encode 512 avgt 10 174.494 ± 1.614 ns/op > Base64Encode.testBase64Encode 1000 avgt 10 277.103 ± 0.306 ns/op > Base64Encode.testBase64Encode 20000 avgt 10 4261.018 ± 1.883 ns/op > > # kunpeng 916, default > Base64Encode.testBase64Encode 1 avgt 10 31.710 ± 0.234 ns/op > Base64Encode.testBase64Encode 2 avgt 10 33.978 ± 0.305 ns/op > Base64Encode.testBase64Encode 3 avgt 10 40.059 ± 0.444 ns/op > Base64Encode.testBase64Encode 6 avgt 10 47.958 ± 0.328 ns/op > Base64Encode.testBase64Encode 7 avgt 10 49.017 ± 1.305 ns/op > Base64Encode.testBase64Encode 9 avgt 10 53.150 ± 0.769 ns/op > Base64Encode.testBase64Encode 10 avgt 10 55.418 ± 0.316 ns/op > Base64Encode.testBase64Encode 48 avgt 10 93.517 ± 0.391 ns/op > Base64Encode.testBase64Encode 512 avgt 10 494.809 ± 0.413 ns/op > Base64Encode.testBase64Encode 1000 avgt 10 898.581 ± 0.944 ns/op > Base64Encode.testBase64Encode 20000 avgt 10 16464.411 ± 7.582 ns/op > > # kunpeng 920, intrinsic > Base64Encode.testBase64Encode 1 avgt 10 17.494 ± 0.012 ns/op > Base64Encode.testBase64Encode 2 avgt 10 21.023 ± 0.169 ns/op > Base64Encode.testBase64Encode 3 avgt 10 25.772 ±
0.138 ns/op > Base64Encode.testBase64Encode 6 avgt 10 30.121 ± 0.347 ns/op > Base64Encode.testBase64Encode 7 avgt 10 31.591 ± 0.238 ns/op > Base64Encode.testBase64Encode 9 avgt 10 32.728 ± 0.395 ns/op > Base64Encode.testBase64Encode 10 avgt 10 35.110 ± 0.215 ns/op > Base64Encode.testBase64Encode 48 avgt 10 48.621 ± 0.314 ns/op > Base64Encode.testBase64Encode 512 avgt 10 113.391 ± 0.554 ns/op > Base64Encode.testBase64Encode 1000 avgt 10 180.749 ± 0.193 ns/op > Base64Encode.testBase64Encode 20000 avgt 10 3273.961 ± 5.706 ns/op > > # kunpeng 920, default > Base64Encode.testBase64Encode 1 avgt 10 17.428 ± 0.037 ns/op > Base64Encode.testBase64Encode 2 avgt 10 20.926 ± 0.155 ns/op > Base64Encode.testBase64Encode 3 avgt 10 25.466 ± 0.140 ns/op > Base64Encode.testBase64Encode 6 avgt 10 32.526 ± 0.190 ns/op > Base64Encode.testBase64Encode 7 avgt 10 34.132 ± 0.387 ns/op > Base64Encode.testBase64Encode 9 avgt 10 36.685 ± 0.212 ns/op > Base64Encode.testBase64Encode 10 avgt 10 38.117 ± 0.246 ns/op > Base64Encode.testBase64Encode 48 avgt 10 62.447 ± 0.900 ns/op > Base64Encode.testBase64Encode 512 avgt 10 377.275 ± 0.162 ns/op > Base64Encode.testBase64Encode 1000 avgt 10 700.628 ± 0.509 ns/op > Base64Encode.testBase64Encode 20000 avgt 10 13626.764 ± 3.448 ns/op src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5442: > 5440: Register src = c_rarg0; // source array > 5441: Register sp = c_rarg1; // source start offset > 5442: Register sl = c_rarg2; // source end offset Please don't use "sp" as a register name. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5453: > 5451: > 5452: #define BASE64_ENCODE_SIMD_ROUND(in0, in1, in2, out0, out1, out2, out3, SZ) \ > 5453: __ ld3(in0, in1, in2, __ T##SZ##B, __ post(src, 3 * SZ)); \ There's no need for this to be a macro -- as far as I can see.
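As background for the comments above, and not as part of the patch under review: one round of the proposed stub loads groups of three source bytes (LD3), maps 6-bit values to characters (TBL), and stores four output bytes per group (ST4). A scalar Java sketch of that 3-to-4 mapping might look like the following. The class and method names are hypothetical, the code handles only inputs whose length is a multiple of three (no padding), and it is meant as an illustration of the arithmetic rather than as the intrinsic's actual logic.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration class; not part of the JDK or of the patch.
final class Base64RoundSketch {
    // Standard Base64 alphabet from RFC 4648.
    private static final char[] ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    // Encodes complete 3-byte groups only; padding is deliberately out of scope.
    static String encodeFullGroups(byte[] src) {
        if (src.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        StringBuilder out = new StringBuilder(src.length / 3 * 4);
        for (int i = 0; i < src.length; i += 3) {
            // Scalar counterpart of de-interleaving three input bytes.
            int b0 = src[i] & 0xff;
            int b1 = src[i + 1] & 0xff;
            int b2 = src[i + 2] & 0xff;
            // Split 24 bits into four 6-bit indices, then look each one up.
            out.append(ALPHABET[b0 >>> 2]);
            out.append(ALPHABET[((b0 & 0x03) << 4) | (b1 >>> 4)]);
            out.append(ALPHABET[((b1 & 0x0f) << 2) | (b2 >>> 6)]);
            out.append(ALPHABET[b2 & 0x3f]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] input = "foobar".getBytes(StandardCharsets.US_ASCII);
        String sketch = encodeFullGroups(input);
        String reference = Base64.getEncoder().encodeToString(input);
        // Compare the sketch against the library encoder for the same input.
        System.out.println(sketch + " matches java.util.Base64: " + sketch.equals(reference));
    }
}
```

Checking the sketch against `java.util.Base64` on the same input is a cheap way to confirm the bit manipulation before reasoning about a vectorized version.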
------------- PR: https://git.openjdk.java.net/jdk/pull/992 From burban at openjdk.java.net Mon Nov 2 13:45:57 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 2 Nov 2020 13:45:57 GMT Subject: Integrated: 8254072: AArch64: Get rid of --disable-warnings-as-errors on Windows+ARM64 build In-Reply-To: References: Message-ID: On Tue, 6 Oct 2020 18:09:05 GMT, Bernhard Urban-Forster wrote: > I organized this PR so that each commit contains the warning emitted by MSVC as commit message and its relevant fix. > > Verified on > * Linux+ARM64: `{hotspot,jdk,langtools}:tier1`, no failures. > * Windows+ARM64: `{hotspot,jdk,langtools}:tier1`, no (new) failures. > * internal macOS+ARM64 port: build without `--disable-warnings-as-errors` still works. Just mentioning this here, because it's yet another toolchain (Xcode / clang) that needs to be kept happy [going forward](https://openjdk.java.net/jeps/391). This pull request has now been integrated. Changeset: d2812f78 Author: Bernhard Urban-Forster Committer: Andrew Haley URL: https://git.openjdk.java.net/jdk/commit/d2812f78 Stats: 23 lines in 8 files changed: 2 ins; 0 del; 21 mod 8254072: AArch64: Get rid of --disable-warnings-as-errors on Windows+ARM64 build Reviewed-by: ihse, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/530 From aph at openjdk.java.net Mon Nov 2 13:45:57 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 2 Nov 2020 13:45:57 GMT Subject: RFR: 8254072: AArch64: Get rid of --disable-warnings-as-errors on Windows+ARM64 build [v4] In-Reply-To: References: Message-ID: On Tue, 27 Oct 2020 14:04:04 GMT, Andrew Haley wrote: >> Bernhard Urban-Forster has updated the pull request incrementally with two additional commits since the last revision: >> >> - uppercase suffix >> - add assert > > Marked as reviewed by aph (Reviewer). > Would you mind to sponsor it @theRealAph or @magicus? Hmm, I think you have to integrate it first. 
https://wiki.openjdk.java.net/display/SKARA/Pull+Request+Commands#PullRequestCommands-/sponsor ------------- PR: https://git.openjdk.java.net/jdk/pull/530 From zgu at openjdk.java.net Mon Nov 2 13:48:02 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 2 Nov 2020 13:48:02 GMT Subject: RFR: 8255691: Shenandoah: Invoke native-LRB only on non-strong refs [v3] In-Reply-To: <2wI5SLQjP_SmJOstjPHk3ct5b1Pr3nZEvi97NWCbo-E=.39961cac-4a3b-48c4-b608-425f024020b0@github.com> References: <2O7ZS_-HyPZpm5m2sMTf-dZv8Qkw3tB47hMgORkckoU=.a58d7945-a2ea-4f76-9e01-540521bd77ea@github.com> <2wI5SLQjP_SmJOstjPHk3ct5b1Pr3nZEvi97NWCbo-E=.39961cac-4a3b-48c4-b608-425f024020b0@github.com> Message-ID: <0OzLgZFXNCorTwEdj9a2zFvR8Dj10pxCA8FiYp_tLdM=.edd21518-28c5-40ad-b2ee-a9e770e878b2@github.com> On Fri, 30 Oct 2020 19:39:07 GMT, Roman Kennke wrote: >> The way that the current native LRB is implemented is wrong (but non-fatal) and misleading. Its purpose is to prevent resurrection of unreachable non-strong references, and it should only be invoked on non-strong references, not all native references. This distinction will become even more important once we get concurrent reference processing: then we also want to invoke this barrier on referent-loads. >> >> This changes the runtime-part of native-LRB so that it is only invoked when it's invoked with a non-strong reference decorator. Otherwise it acts as a regular LRB. >> >> Testing: hotspot_gc_shenandoah > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Rename LRB-native -> LRB-weak Marked as reviewed by zgu (Reviewer). src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp line 974: > 972: CAST_FROM_FN_PTR(address, ShenandoahRuntime::load_reference_barrier); > 973: > 974: address calladdr = is_native ? CAST_FROM_FN_PTR(address, ShenandoahRuntime::load_reference_barrier_weak) Please rename is_native -> is_weak.
------------- PR: https://git.openjdk.java.net/jdk/pull/961 From burban at openjdk.java.net Mon Nov 2 14:02:56 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 2 Nov 2020 14:02:56 GMT Subject: RFR: 8254072: AArch64: Get rid of --disable-warnings-as-errors on Windows+ARM64 build [v4] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 13:41:53 GMT, Andrew Haley wrote: >> Marked as reviewed by aph (Reviewer). > >> Would you mind to sponsor it @theRealAph or @magicus? > > Hmm, I think you have to integrate it first. > https://wiki.openjdk.java.net/display/SKARA/Pull+Request+Commands#PullRequestCommands-/sponsor Thank you Andrew. ------------- PR: https://git.openjdk.java.net/jdk/pull/530 From sjohanss at openjdk.java.net Mon Nov 2 14:07:00 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 2 Nov 2020 14:07:00 GMT Subject: RFR: 8253600: G1: Fully support pinned regions for full gc [v3] In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 19:51:55 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I get reviews for this change that implements "proper" support for pinned regions in the G1 full collector? >> >> By proper I mean that at the end of gc, pinned regions contain the correct TAMS and bitmap markings under the TAMS so that dead objects within them are supported. >> >> Currently all (pinned) regions have their TAMS set to bottom() and their bitmap above TAMS cleared (at least logically :) ). This works as long as objects within these regions can't be dead, as is the case now: >> - humongous regions are either live or fully reclaimed. >> - all other pinned regions are archive regions at the moment that are always treated as fully live (and do not contain dead objects). >> >> This change is a requirement for fixing JDK-8253081 as some earlier change made it possible to have dead objects within open archive regions. It also enables supporting removal of gclocker use for g1, i.e. using region pinning.
>> >> Based on the PR#808 (https://github.com/openjdk/jdk/pull/808). >> >> Testing: tier1-8, testing with prototype for region pinning, testing with prototype for JDK-8253081. >> Performance testing: no regressions >> >> Some comments for questions that might come up during review: >> >> - how does this work with the bitmaps now: >> - at start of full gc the next bitmap is cleared >> - full gc marks the next bitmap >> - for all pinned regions, keep TAMS and top() (*), otherwise set TAMS to bottom >> - swap bitmaps >> - clear next bitmap for next marking >> >> (*) this means that from a usage POV pinned regions are considered full. This is inaccurate, but sufficient: full gc clears all remembered sets anyway, so we do not need that information for gc efficiency purposes anyway to evacuate later. The next marking before old gen evacuation will update it to the correct values anyway. G1 does not support allocation into "holes" in pinned regions that can be open archive only at this time too, so there is no need to be more exact. >> >> - use of a region attribute table for phase 2+ only: compared to before we need fast access to information whether a given reference goes into a pinned region (as opposed to an archive region) wrt to adjusting that pointer to avoid doing work for these references. >> >> Phase 1 marking could have used this information for the do-we-need-to-preserve-the-mark check too: however this would have required g1 to add an extra another pass over all regions to update that. This seemed slower than just checking this information "more slowly" for the objects that need mark preservation. Tests showed that this is the case for <0.00% (yeah, these references that need mark preservation are rounding errors in cases it matters) of overall references, so I did not add that pass. >> (Additionally g1 full gc is a last-ditch effort, and while marking takes a significant time, it does not completely dominate it). >> >> I.e. 
the second clause in the condition of this hunk is intentionally slower than could be: >> @@ -52,7 +52,9 @@ inline bool G1FullGCMarker::mark_object(oop obj) { >> // Marked by us, preserve if needed. >> markWord mark = obj->mark(); >> if (obj->mark_must_be_preserved(mark) && >> // It is not necessary to preserve marks for objects in pinned regions because >> // we do not change their headers (i.e. forward them). >> !G1CollectedHeap::heap()->heap_region_containing(obj)->is_pinned()) { >> preserved_stack()->push(obj, mark); >> } >> - there is no code yet that checks for empty pinned regions yet. Only JDK-8253081 introduces that because still all contents of all archive regions are live forever. >> >> Also please note that the 51b297b change is from the #808 change. >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into 8253600-full-gc-pinned-region-support > - Merge branch 'master' into 8253600-full-gc-pinned-region-support > - sjohanss review > > Also remove _archive_allocator_map et al as the new attribute table > implements the same functionality also suggested by sjohanss in > private. > - Initial import > - Initial import Thanks Thomas for addressing my concerns around the archive-map and the new region-attr overlap. I think this looks much better. Just a few additional comments. src/hotspot/share/gc/g1/g1FullCollector.cpp line 169: > 167: PrepareRegionsClosure cl(this); > 168: _heap->heap_region_iterate(&cl); > 169: I know the old iteration tearing down the region sets was also done by a single thread, but I wonder if it would make sense to do it in parallel here. I guess we could file a CR to investigate if it would be worth doing. src/hotspot/share/gc/g1/g1FullGCMarker.inline.hpp line 57: > 55: // It is not necessary to preserve marks for objects in pinned regions because > 56: // we do not change their headers (i.e. 
forward them). > 57: !_collector->is_in_pinned_or_closed(obj)) { Even if it's correct, I think we should add a `is_in_pinned(obj)` and use it here, since we early out for objects in closed above. src/hotspot/share/gc/shared/collectedHeap.hpp line 515: > 513: // Is the given object inside a CDS archive area? > 514: virtual bool is_archived_object(oop object) const { return false; } > 515: I think this is a good addition, but to be consistent with the other functions, I think the implementation should be moved to the cpp-file. src/hotspot/share/gc/g1/g1FullCollector.hpp line 101: > 99: bool is_in_pinned_or_closed(oop obj) const { return _region_attr_table.is_pinned_or_closed(cast_from_oop(obj)); } > 100: bool is_in_closed(oop obj) const { return _region_attr_table.is_closed_archive(cast_from_oop(obj)); } > 101: Can't decide if it's worth adding a inline.hpp for those. Just throwing it out there in case you thought about it as well :) ------------- PR: https://git.openjdk.java.net/jdk/pull/824 From sjohanss at openjdk.java.net Mon Nov 2 14:07:01 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 2 Nov 2020 14:07:01 GMT Subject: RFR: 8253600: G1: Fully support pinned regions for full gc [v3] In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 08:34:13 GMT, Stefan Johansson wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into 8253600-full-gc-pinned-region-support >> - Merge branch 'master' into 8253600-full-gc-pinned-region-support >> - sjohanss review >> >> Also remove _archive_allocator_map et al as the new attribute table >> implements the same functionality also suggested by sjohanss in >> private. >> - Initial import >> - Initial import > > src/hotspot/share/gc/g1/g1FullGCMarker.hpp line 46: > >> 44: >> 45: class G1CMBitMap; >> 46: class G1FullCollector; > > Not needed (yet). 
Now it's needed :) ------------- PR: https://git.openjdk.java.net/jdk/pull/824 From rkennke at openjdk.java.net Mon Nov 2 14:16:14 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 2 Nov 2020 14:16:14 GMT Subject: RFR: 8255691: Shenandoah: Invoke native-LRB only on non-strong refs [v4] In-Reply-To: <2O7ZS_-HyPZpm5m2sMTf-dZv8Qkw3tB47hMgORkckoU=.a58d7945-a2ea-4f76-9e01-540521bd77ea@github.com> References: <2O7ZS_-HyPZpm5m2sMTf-dZv8Qkw3tB47hMgORkckoU=.a58d7945-a2ea-4f76-9e01-540521bd77ea@github.com> Message-ID: > The way that the current native LRB is implemented is wrong (but non-fatal) and misleading. Its purpose is to prevent resurrection of unreachable non-strong references, and it should only be invoked on non-strong references, not all native references. This distinction will become even more important once we get concurrent reference processing: then we also want to invoke this barrier on referent-loads. > > This changes the runtime-part of native-LRB so that it is only invoked when it's invoked with a non-strong reference decorator. Otherwise it acts as a regular LRB.
> > Testing: hotspot_gc_shenandoah Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Rename argument is_native -> is_weak ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/961/files - new: https://git.openjdk.java.net/jdk/pull/961/files/86c80228..2db8a9d5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=961&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=961&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/961.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/961/head:pull/961 PR: https://git.openjdk.java.net/jdk/pull/961 From shade at openjdk.java.net Mon Nov 2 14:16:16 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 2 Nov 2020 14:16:16 GMT Subject: RFR: 8255691: Shenandoah: Invoke native-LRB only on non-strong refs [v3] In-Reply-To: <2wI5SLQjP_SmJOstjPHk3ct5b1Pr3nZEvi97NWCbo-E=.39961cac-4a3b-48c4-b608-425f024020b0@github.com> References: <2O7ZS_-HyPZpm5m2sMTf-dZv8Qkw3tB47hMgORkckoU=.a58d7945-a2ea-4f76-9e01-540521bd77ea@github.com> <2wI5SLQjP_SmJOstjPHk3ct5b1Pr3nZEvi97NWCbo-E=.39961cac-4a3b-48c4-b608-425f024020b0@github.com> Message-ID: On Fri, 30 Oct 2020 19:39:07 GMT, Roman Kennke wrote: >> The way that the current native LRB is implemented is wrong (but non-fatal) and misleading. Its purpose is to prevent resurrection of unreachable non-strong references, and it should only be invoked on non-strong references, not all native references. This distinction will become even more important once we get concurrent reference processing: then we also want to invoke this barrier on referent-loads. >> >> This changes the runtime-part of native-LRB so that it is only invoked when it's invoked with a non-strong reference decorator. Otherwise it acts as a regular LRB.
>> >> Testing: hotspot_gc_shenandoah > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Rename LRB-native -> LRB-weak Generally looks good, some minor nits. src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp line 1063: > 1061: Node* in2 = n->in(2); > 1062: > 1063: // If one input is NULL, then step over the barriers (except LRB native) on the other input Should be `weak LRB`, not `LRB native` now? Probably text search for `native` elsewhere? src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp line 98: > 96: } > 97: > 98: return ((decorators & IN_NATIVE) != 0) && ((decorators & ON_STRONG_OOP_REF) == 0); This would change with conc ref processing, right? Currently this only accepts "native" + "weak" LRBs. src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 105: > 103: inline oop ShenandoahBarrierSet::load_reference_barrier(oop obj, T* load_addr) { > 104: > 105: // Prevent resurrection of unreachable non-strorg references. Typo: "non-strorg" ------------- Changes requested by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/961 From rkennke at openjdk.java.net Mon Nov 2 14:16:17 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 2 Nov 2020 14:16:17 GMT Subject: Integrated: 8255691: Shenandoah: Invoke native-LRB only on non-strong refs In-Reply-To: <2O7ZS_-HyPZpm5m2sMTf-dZv8Qkw3tB47hMgORkckoU=.a58d7945-a2ea-4f76-9e01-540521bd77ea@github.com> References: <2O7ZS_-HyPZpm5m2sMTf-dZv8Qkw3tB47hMgORkckoU=.a58d7945-a2ea-4f76-9e01-540521bd77ea@github.com> Message-ID: On Fri, 30 Oct 2020 18:12:18 GMT, Roman Kennke wrote: > The way that the current native LRB is implemented is wrong (but non-fatal) and misleading. Its purpose is to prevent resurrection of unreachable non-strong references, and it should only be invoked on non-strong references, not all native references.
This distinction will become even more important once we get concurrent reference processing: then we also want to invoke this barrier on referent-loads. > > This changes the runtime-part of native-LRB so that it is only invoked when it's invoked with a non-strong reference decorator. Otherwise it acts as a regular LRB. > > Testing: hotspot_gc_shenandoah This pull request has now been integrated. Changeset: 1019581c Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/1019581c Stats: 84 lines in 14 files changed: 4 ins; 3 del; 77 mod 8255691: Shenandoah: Invoke native-LRB only on non-strong refs Reviewed-by: zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/961 From mcimadamore at openjdk.java.net Mon Nov 2 14:18:14 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 2 Nov 2020 14:18:14 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v21] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the third incubation round of the foreign memory access API incubation (see JEP 393 [1]). This iteration focuses on improving the usability of the API in 3 main ways: > > * first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from multiple threads > * second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee that the memory will be deallocated, eventually > * third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class has been added, which defines several useful dereference routines; these are really just thin wrappers around memory access var handles, but they make the barrier to entry for using this API somewhat lower.
> > A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit of dereference. > > This has all changed as per this API refresh; now a `MemoryAddress` is just a dumb carrier which wraps a pair of object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is concerned. You cannot dereference memory if you don't have a segment. This improves usability in a number of ways - first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that; otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling `MemoryAddress::asSegmentRestricted`). > > A list of the API, implementation and test changes is provided below. If you have any questions, or need more detailed explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide the feedback required. > > A big thank you to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segment would not have been possible; also I'd like to thank Paul Sandoz, whose insights on API design have been very helpful in this journey.
> > Thanks > Maurizio > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254163 > > > > ### API Changes > > * `MemorySegment` > * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below) > * added a no-arg factory for a native restricted segment representing entire native heap > * rename `withOwnerThread` to `handoff` > * add new `share` method, to create shared segments > * add new `registerCleaner` method, to register a segment against a cleaner > * add more helpers to create arrays from a segment e.g. `toIntArray` > * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors) > * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`) > * `MemoryAddress` > * drop `segment` accessor > * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative to a given segment > * `MemoryAccess` > * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a carrier is one of the usual suspects (a Java primitive, minus `boolean`); the access mode can be simple (e.g. access base address of given segment), or indexed, in which case the accessor takes a segment and either a low-level byte offset, or a high-level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs. `getByteAtOffset` vs `getByteAtIndex`).
> * `MemoryHandles` > * drop `withOffset` combinator > * drop `withStride` combinator > * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which it is easy to derive all the other handles using plain var handle combinators. > * `Addressable` > * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2] to add more implementations. Clients can largely ignore this interface, which comes in really handy when defining native bindings with tools like `jextract`. > * `MemoryLayouts` > * A new layout, for machine addresses, has been added to the mix. > > > > ### Implementation changes > > There are two main things to discuss here: support for shared segments, and the general simplification of the memory access var handle support. > > #### Shared segments > > The support for shared segments cuts in pretty deep in the VM. Support for shared segments is notoriously hard to achieve, at least in a way that guarantees optimal access performance. This is caused by the fact that, if a segment is shared, it would be possible for a thread to close it while another is accessing it. > > After considering several options (see [3]), we zeroed in on an approach which is inspired by a happy idea that Andrew Haley had (and that he reminded me of at this year's OpenJDK committer workshop - thanks!). The idea is that if we could *freeze* the world (e.g. with a GC pause), while a segment is closed, we could then prevent segments from being accessed concurrently to a close operation. For this to work, it is crucial that no GC safepoints can occur between a segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop just right before an unsafe call).
It also relies on the fact that hotspot/C2 should not be able to propagate loads across safepoints. > > Sadly, none of these conditions seems to be valid in the current implementation, so we needed to resort to a bit of creativity. First, we noted that, if we could mark so-called *scoped* methods with an annotation, it would be very simple to check whether a thread was in the middle of a scoped method when we stopped the world for a close operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312 [4]). > > The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread, thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds. Unfortunately the machinery for async exceptions is a bit fragile (e.g. not all the code in hotspot checks for async exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that another thread is accessing the segment being closed. > > As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a bug in the user code, likely arising from lack of synchronization, and should be treated as such. > > In terms of gritty implementation, we needed to centralize memory access routines in a single place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in addition, also provided a liveness check. This way we could mark all these routines with the special `@Scoped` annotation, which tells the VM that something important is going on. > > To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`.
This class contains all the main memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the access coordinates, also a scope object, which is tested before access. A reachability fence is also thrown in the mix to make sure that the scope is kept alive during access (which is important when registering segments against cleaners). > > Of course, to make memory access safe, memory access var handles, byte buffer var handles, and byte buffer API should use the new `ScopedMemoryAccess` class instead of unsafe, so that a liveness check can be triggered (in case a scope is present). > > `ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the handshake completed successfully. > > The implementation of `MemoryScope` (now significantly simplified from what we had before), has two implementations, one for confined segments and one for shared segments; the main difference between the two is what happens when the scope is closed; a confined segment sets a boolean flag to false, and returns, whereas a shared segment goes into a `CLOSING` state, then starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive` will still return `true`, while the liveness check upon memory access will fail. > > #### Memory access var handles overhaul > > The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and `long`, all other access types could be derived from this basic var handle form. > > This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g. 
additional offset is injected into a base memory access var handle. > > This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators, which previously needed low-level access on the innards of the memory access var handle. All that code is now gone. > > #### Test changes > > Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the microbenchmarks also needed some tweaks - and some of them were also updated to also test performance in the shared segment case. > > [1] - https://openjdk.java.net/jeps/393 > [2] - https://openjdk.java.net/jeps/389 > [3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html > [4] - https://openjdk.java.net/jeps/312 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Addess remaining feedback from @AlanBateman and @mrserb ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/548/files - new: https://git.openjdk.java.net/jdk/pull/548/files/8225bf2e..e2f69ec0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=20 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=19-20 Stats: 5 lines in 2 files changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/548.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/548/head:pull/548 PR: https://git.openjdk.java.net/jdk/pull/548 From dongbo at openjdk.java.net Mon Nov 2 14:32:07 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 2 Nov 2020 14:32:07 GMT Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v2] In-Reply-To: References: Message-ID: > 
Base64.encodeBlock stub is implemented for x86_64. > We can also do the same thing for aarch64 with SIMD LD3/ST4/TBL. > A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords. > > Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build. > Tests in test/jdk/java/util/Base64/* were also run specifically to check the correctness of the implementation, and passed. > > A JMH micro, Base64Encode.java, is added for performance testing. > With different input lengths (upper-bounded by parameter `maxNumBytes` in the JMH micro), > we see a ~4x improvement with long input and no regression with short input on Kunpeng916 and Kunpeng920. > > The Base64Encode.java JMH micro-benchmark results:
> Benchmark                      (maxNumBytes)  Mode  Cnt      Score   Error  Units
> # kunpeng 916, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10     31.564 ± 0.034  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     33.921 ± 0.362  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     38.015 ± 0.220  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     41.115 ± 0.281  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     42.161 ± 0.630  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     44.797 ± 0.849  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     46.013 ± 0.917  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     67.984 ± 0.777  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    174.494 ± 1.614  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    277.103 ± 0.306  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10   4261.018 ± 1.883  ns/op
>
> # kunpeng 916, default
> Base64Encode.testBase64Encode              1  avgt   10     31.710 ± 0.234  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     33.978 ± 0.305  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     40.059 ± 0.444  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     47.958 ± 0.328  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     49.017 ± 1.305  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     53.150 ± 0.769  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     55.418 ± 0.316  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     93.517 ± 0.391  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    494.809 ± 0.413  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    898.581 ± 0.944  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  16464.411 ± 7.582  ns/op
>
> # kunpeng 920, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10     17.494 ± 0.012  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     21.023 ± 0.169  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     25.772 ± 0.138  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     30.121 ± 0.347  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     31.591 ± 0.238  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     32.728 ± 0.395  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     35.110 ± 0.215  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     48.621 ± 0.314  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    113.391 ± 0.554  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    180.749 ± 0.193  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10   3273.961 ± 5.706  ns/op
>
> # kunpeng 920, default
> Base64Encode.testBase64Encode              1  avgt   10     17.428 ± 0.037  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     20.926 ± 0.155  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     25.466 ± 0.140  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     32.526 ± 0.190  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     34.132 ± 0.387  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     36.685 ± 0.212  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     38.117 ± 0.246  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     62.447 ± 0.900  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    377.275 ± 0.162  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    700.628 ± 0.509  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  13626.764 ± 3.448  ns/op
Dong Bo has updated the pull request incrementally with one additional commit since the last revision: change register name sp and unpack the macro ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/992/files - new: https://git.openjdk.java.net/jdk/pull/992/files/a34be2a9..2a17576b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=00-01 Stats: 74 lines in 1 file changed: 40 ins; 26 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/992.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/992/head:pull/992 PR: https://git.openjdk.java.net/jdk/pull/992 From dongbo at openjdk.java.net Mon Nov 2 14:32:09 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 2 Nov 2020 14:32:09 GMT Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v2] In-Reply-To: <51wDcKZ0kbfUQ3yXerMX2k_PiqNMYzUdq-1AE-vAqzI=.60806e4a-ce08-4fb7-b218-93ac40444613@github.com> References: <51wDcKZ0kbfUQ3yXerMX2k_PiqNMYzUdq-1AE-vAqzI=.60806e4a-ce08-4fb7-b218-93ac40444613@github.com> Message-ID: On Mon, 2 Nov 2020 13:26:49 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request incrementally with one additional commit since the last revision: >> >> change register name sp and unpack the macro > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5453: > >> 5451: >> 5452: #define BASE64_ENCODE_SIMD_ROUND(in0, in1, in2, out0, out1, out2, out3, SZ) \ >> 5453: __ ld3(in0, in1, in2, __ T##SZ##B, __ post(src, 3 * SZ)); \ > > There's no need for this to be a macro -- as far as I can see. Thanks for the suggestions. Just updated a version. The register name `sp` is changed to `soff`, and the macro is unpacked into code block `Process48B` and `Process24B`. Verified with `test/jdk/java/util/Base64/`.
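
For readers following the encodeBlock discussion, the scalar mapping that a Base64 block encoder performs can be sketched as below. This is only an illustration under stated assumptions, not the stub's code: `encode_block` and `kAlphabet` are hypothetical names, and the sketch handles full 3-byte groups only. As far as I understand the SIMD approach named in the thread, LD3 de-interleaves the three input byte streams, TBL performs the alphabet lookup, and ST4 interleaves the four output streams, so the per-group arithmetic is the same.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Standard Base64 alphabet (RFC 4648, section 4).
static const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encodes only complete 3-byte groups and appends the
// result to dst. Any 1 or 2 trailing bytes (which would need '=' padding)
// are left to the caller. Returns the number of input bytes consumed.
std::size_t encode_block(const std::uint8_t* src, std::size_t len, std::string& dst) {
    std::size_t i = 0;
    for (; i + 3 <= len; i += 3) {
        // Pack three input bytes into one 24-bit value.
        std::uint32_t bits = (std::uint32_t(src[i]) << 16) |
                             (std::uint32_t(src[i + 1]) << 8) |
                             std::uint32_t(src[i + 2]);
        // Split the 24 bits into four 6-bit indices into the alphabet.
        dst.push_back(kAlphabet[(bits >> 18) & 0x3f]);
        dst.push_back(kAlphabet[(bits >> 12) & 0x3f]);
        dst.push_back(kAlphabet[(bits >> 6) & 0x3f]);
        dst.push_back(kAlphabet[bits & 0x3f]);
    }
    return i;
}
```

A vectorized version applies the same shift-and-mask step to many groups per iteration, which would be consistent with the benchmark gap growing as the input gets longer.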
------------- PR: https://git.openjdk.java.net/jdk/pull/992 From aph at openjdk.java.net Mon Nov 2 15:00:57 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 2 Nov 2020 15:00:57 GMT Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v2] In-Reply-To: References: <51wDcKZ0kbfUQ3yXerMX2k_PiqNMYzUdq-1AE-vAqzI=.60806e4a-ce08-4fb7-b218-93ac40444613@github.com> Message-ID: On Mon, 2 Nov 2020 14:29:39 GMT, Dong Bo wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5453: >> >>> 5451: >>> 5452: #define BASE64_ENCODE_SIMD_ROUND(in0, in1, in2, out0, out1, out2, out3, SZ) \ >>> 5453: __ ld3(in0, in1, in2, __ T##SZ##B, __ post(src, 3 * SZ)); \ >> >> There's no need for this to be a macro -- as far as I can see. > > Thanks for the suggestions. > > Just updated a version. > The register name `sp` is changed to `soff`, and the macro is unpacked into code block `Process48B` and `Process24B`. > > Verified with `test/jdk/java/util/Base64/`. I'm sorry, there's no way I wanted the macro to be unpacked; I wanted it to be a function. I apologize for not being clear. ------------- PR: https://git.openjdk.java.net/jdk/pull/992 From rkennke at openjdk.java.net Mon Nov 2 15:28:09 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 2 Nov 2020 15:28:09 GMT Subject: RFR: 8254315: Shenandoah: Concurrent weak reference processing [v28] In-Reply-To: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com> References: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com> Message-ID: > Until now, references (as in java.lang.ref.Reference and its subclasses WeakReference, SoftReference, PhantomReference and the non-public FinalReference; I'll collectively call them weak references for clarity) have been processed during a GC pause. Workloads that make heavy use of such weak references can therefore cause significant GC pauses.
> > There are 3 main items that contribute to pause time linear to number of references, or worse: > - We need to scan and consider each reference on the various 'discovered' lists. > - We need to mark through subgraph of objects that are reachable only through FinalReference. Notice that this is theoretically only bounded by the live data set size. > - Finally, all no-longer-reachable references need to be enqueued in the 'pending list' > > The problem is somewhat mitigated by pre-cleaning the discovered list: Any weak reference that we find to be strongly reachable will be removed before we go into the final-mark-pause. However, that is only a band-aid. > > The solution to this is two-fold: > 1. Extend concurrent marking to also mark the 'finalizable' subgraph of the heap. This requires to extend the marking bitmap to allow for two kinds of reachability: each object can now be strongly and finalizably reachable. Whenever marking encounters a FinalReference, it will mark through the referent and switch to 'finalizably' reachability for all objects starting from the referent. When marking encounters finalizably reachable objects while marking strongly, it will 'upgrade' reachability of such objects to strongly reachable. All of this can be done concurrently. Any encounter of a Reference (or subclass) object will enqueue that object into a thread-local 'discovered' list. Except for FinalReference, marking stops there, and does not mark through the referent. > 2. Concurrent processing is performed after the final-mark pause. GC workers scan all discovered lists that have been collected by concurrent marking, and depending on reachability of the referent, either drop the Reference, or enqueue it into the global 'pending' list (from where it will be processed by Java reference handler thread). In addition to that, we must ensure that no referents become resurrected by accessing Reference.get() on it. 
In order to achieve this, we employ special barriers in Reference.get() intrinsics that return NULL when the referent is not reachable. > > Testing: hotspot_gc_shenandoah (release+fastdebug, x86+aarch64), specjvm+specjbb without regressions, tier1, tier2, vmTestbase_vm_metaspace, vmTestbase_nsk_jvmti, with -XX:+UseShenandoahGC without regressions, specjvm with various levels of verification Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 86 commits: - Invoke interpreter weak-LRB on weak-refs too (fixes merge mistake) - Merge branch 'master' into shenandoah-concurrent-weakrefs - Adopt ShenandoahReferenceBarrier to recent changes in LRB runtime impl - Merge branch 'master' into shenandoah-concurrent-weakrefs - Invert strong/weak in marking tasks and related code - Fix merge mistake - Merge branch 'master' into shenandoah-concurrent-weakrefs - Pass marking-strength through chunked arrays - Rename mark_final -> mark_weak and several cleanups (by shade) - Some more ShMarkTask cleanups - ...
and 76 more: https://git.openjdk.java.net/jdk/compare/4c66b158...2c2579d2 ------------- Changes: https://git.openjdk.java.net/jdk/pull/505/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=505&range=27 Stats: 2409 lines in 55 files changed: 1632 ins; 564 del; 213 mod Patch: https://git.openjdk.java.net/jdk/pull/505.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/505/head:pull/505 PR: https://git.openjdk.java.net/jdk/pull/505 From roland at openjdk.java.net Mon Nov 2 15:45:58 2020 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 2 Nov 2020 15:45:58 GMT Subject: RFR: 8255401: Shenandoah: Allow oldval and newval registers to overlap in cmpxchg_oop() In-Reply-To: References: Message-ID: <_EufJHEmM7yEZd_RS0vzeafF8BFqVA3nm_J4n-B9IeM=.94b48189-7853-4154-9a1f-5409f2414717@github.com> On Mon, 26 Oct 2020 20:50:40 GMT, Roman Kennke wrote: > We encountered a failure in testing: > > Internal Error (/home/jenkins/workspace/nightly/jdk-jdk/src/hotspot/share/asm/register.hpp:141), pid=15470, tid=15611 > assert(a != b && a != c && a != d && b != c && b != d && c != d) failed: registers must be different: a=0x0000000000000000, b=0x0000000000000000, c=0x000000000000000b, d=0x000000000000000a > > in: > > Stack: [0x00007fb8fa2e3000,0x00007fb8fa3e4000], sp=0x00007fb8fa3deca0, free space=1007k > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x156890e] ShenandoahBarrierSetAssembler::cmpxchg_oop(MacroAssembler*, RegisterImpl*, Address, RegisterImpl*, RegisterImpl*, bool, RegisterImpl*, RegisterImpl*)+0xde > V [libjvm.so+0x3ec1d1] compareAndSwapN_shenandoahNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x571 > > It seems to appear very rarely. > > The failure is that both newval and oldval are the same register (rax). 
I believe it is ok for the two registers to overlap: > - It is not expected that newval is preserved across the cmpxchg > - The CAS will override newval, but: > - The first CAS is unaffected by the overlap > - The retry-loop is only entered when previous-value == old-value, and thus newval will still hold the same value > > For aarch64 it matters even less, because newval is never overridden. > > Testing: hotspot_gc_shenandoah (x86 & aarch64). Looks good to me ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/871 From ihse at openjdk.java.net Mon Nov 2 15:46:57 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 2 Nov 2020 15:46:57 GMT Subject: RFR: 8254072: AArch64: Get rid of --disable-warnings-as-errors on Windows+ARM64 build [v4] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 14:00:33 GMT, Bernhard Urban-Forster wrote: >>> Would you mind to sponsor it @theRealAph or @magicus? >> >> Hmm, I think you have to integrate it first. >> https://wiki.openjdk.java.net/display/SKARA/Pull+Request+Commands#PullRequestCommands-/sponsor > > Thank you Andrew. @lewurm This patch seems to break on linux-aarch64 with gcc: open/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp:1501:52: error: comparison of integer expressions of different signedness: 'size_t' {aka 'long unsigned int'} and 'int' [-Werror=sign-compare] 1501 | assert(StackOverflow::stack_shadow_zone_size() == (int)StackOverflow::stack_shadow_zone_size(), "must be same"); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Did you test building this on any aarch64 platforms besides Windows? 
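
For context on the gcc error above: in a `size_t == int` comparison the `int` operand is converted to an unsigned type, which is what `-Wsign-compare` reports, and `-Werror` turns that into a build failure. The following is a minimal standalone sketch of one way to express a "does this value fit in an int" check without mixing signedness. It is not HotSpot code and not necessarily the fix that was applied; `fits_in_int` and `checked_to_int` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper: true if `value` is representable as a non-negative int.
// Both operands of the comparison are unsigned, so there is no signed/unsigned mix.
constexpr bool fits_in_int(std::size_t value) {
    return value <= static_cast<std::size_t>(std::numeric_limits<int>::max());
}

// Hypothetical helper: narrows to int after asserting that the value fits.
inline int checked_to_int(std::size_t value) {
    assert(fits_in_int(value) && "value does not fit in int");
    return static_cast<int>(value);
}
```

A call site would then narrow with something like `int size = checked_to_int(zone_size);` rather than comparing a `size_t` directly against its `(int)` cast.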
------------- PR: https://git.openjdk.java.net/jdk/pull/530 From coleenp at openjdk.java.net Mon Nov 2 15:58:15 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 2 Nov 2020 15:58:15 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v2] In-Reply-To: References: Message-ID: <3rZagetDi1APoH1Gg2q4Z5mVPYjk0vBHwDnnhuh7d6M=.da07fe0e-91d6-43cb-b7c7-f3f9daedb931@github.com> > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. 
The allowed callbacks to the VM already have code to allow NonJava threads. > > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Code review comments from StefanK. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/967/files - new: https://git.openjdk.java.net/jdk/pull/967/files/534326da..cb4c83e0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=00-01 Stats: 75 lines in 9 files changed: 12 ins; 48 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/967/head:pull/967 PR: https://git.openjdk.java.net/jdk/pull/967 From zgu at openjdk.java.net Mon Nov 2 16:07:02 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 2 Nov 2020 16:07:02 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v2] In-Reply-To: <3rZagetDi1APoH1Gg2q4Z5mVPYjk0vBHwDnnhuh7d6M=.da07fe0e-91d6-43cb-b7c7-f3f9daedb931@github.com> References: <3rZagetDi1APoH1Gg2q4Z5mVPYjk0vBHwDnnhuh7d6M=.da07fe0e-91d6-43cb-b7c7-f3f9daedb931@github.com> Message-ID: On Mon, 2 Nov 2020 15:58:15 GMT, Coleen Phillimore wrote: >> This change turns the 
HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. >> >> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. >> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. >> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. 
Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. >> >> It has also been tested with tier1-6. >> >> Thank you to Stefan, Erik and Kim for their help with this change. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Code review comments from StefanK. Shenandoah part looks good. ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/967 From rkennke at openjdk.java.net Mon Nov 2 16:08:56 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 2 Nov 2020 16:08:56 GMT Subject: Integrated: 8255401: Shenandoah: Allow oldval and newval registers to overlap in cmpxchg_oop() In-Reply-To: References: Message-ID: On Mon, 26 Oct 2020 20:50:40 GMT, Roman Kennke wrote: > We encountered a failure in testing: > > Internal Error (/home/jenkins/workspace/nightly/jdk-jdk/src/hotspot/share/asm/register.hpp:141), pid=15470, tid=15611 > assert(a != b && a != c && a != d && b != c && b != d && c != d) failed: registers must be different: a=0x0000000000000000, b=0x0000000000000000, c=0x000000000000000b, d=0x000000000000000a > > in: > > Stack: [0x00007fb8fa2e3000,0x00007fb8fa3e4000], sp=0x00007fb8fa3deca0, free space=1007k > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x156890e] ShenandoahBarrierSetAssembler::cmpxchg_oop(MacroAssembler*, RegisterImpl*, Address, RegisterImpl*, RegisterImpl*, bool, RegisterImpl*, RegisterImpl*)+0xde > V [libjvm.so+0x3ec1d1] compareAndSwapN_shenandoahNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x571 > > It seems to appear very rarely. > > The failure is that both newval and oldval are the same register (rax). 
I believe it is ok for the two registers to overlap: > - It is not expected that newval is preserved across the cmpxchg > - The CAS will override newval, but: > - The first CAS is unaffected by the overlap > - The retry-loop is only entered when previous-value == old-value, and thus newval will still hold the same value > > For aarch64 it matters even less, because newval is never overridden. > > Testing: hotspot_gc_shenandoah (x86 & aarch64). This pull request has now been integrated. Changeset: 0e19ded9 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/0e19ded9 Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod 8255401: Shenandoah: Allow oldval and newval registers to overlap in cmpxchg_oop() Reviewed-by: roland ------------- PR: https://git.openjdk.java.net/jdk/pull/871 From burban at openjdk.java.net Mon Nov 2 16:08:58 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 2 Nov 2020 16:08:58 GMT Subject: RFR: 8254072: AArch64: Get rid of --disable-warnings-as-errors on Windows+ARM64 build [v4] In-Reply-To: References: Message-ID: <8gg1Viyrq9xvlobYWDR2MTvXYFEBtVHXIfAkrptF240=.8936d808-6aac-43d7-9893-e8c095c6abb9@github.com> On Mon, 2 Nov 2020 15:41:06 GMT, Magnus Ihse Bursie wrote: >> Thank you Andrew. > > @lewurm > This patch seems to break on linux-aarch64 with gcc: > open/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp:1501:52: error: comparison of integer expressions of different signedness: 'size_t' {aka 'long unsigned int'} and 'int' [-Werror=sign-compare] > 1501 | assert(StackOverflow::stack_shadow_zone_size() == (int)StackOverflow::stack_shadow_zone_size(), "must be same"); > | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > Did you test building this on any aarch64 platforms besides Windows? @magicus I did test the initial version of this PR on linux+arm64, but not the latest iteration. Sorry about that. What is the policy here?
Submit a revert right away or investigate a fix? ------------- PR: https://git.openjdk.java.net/jdk/pull/530 From kvn at openjdk.java.net Mon Nov 2 16:09:55 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 2 Nov 2020 16:09:55 GMT Subject: Integrated: 8255616: Disable AOT and Graal in Oracle OpenJDK In-Reply-To: References: Message-ID: <-4ETgOIXCz-lGU-CQQTpYtgUZ5gs6TYaeAZtBRqXmmg=.16b603f0-0af9-425f-aa62-b3ae29e41e56@github.com> On Fri, 30 Oct 2020 17:40:51 GMT, Vladimir Kozlov wrote: > We shipped Ahead-of-Time compilation (the jaotc tool) in JDK 9, as an experimental feature. We shipped Graal as an experimental JIT compiler in JDK 10. We haven't seen much use of these features, and the effort required to support and enhance them is significant. We therefore intend to disable these features in Oracle builds as of JDK 16. > > We'll leave the sources for these features in the repository, in case anyone else is interested in building them. But we will not update or test them. > > We'll continue to build and ship JVMCI as an experimental feature in Oracle builds. > > Tested changes in all tiers. > > I verified that with these changes I am still able to build Graal in the open repo and run graalunit testing: > > `open$ bash test/hotspot/jtreg/compiler/graalunit/downloadLibs.sh /mydir/graalunit_lib/` > `open$ bash configure --with-debug-level=fastdebug --with-graalunit-lib=/mydir/graalunit_lib/ --with-jtreg=/mydir/jtreg` > `open$ make jdk-image` > `open$ make test-image` > `open$ make run-test TEST=compiler/graalunit/HotspotTest.java` This pull request has now been integrated.
Changeset: 2f7d34f2 Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/2f7d34f2 Stats: 36 lines in 4 files changed: 21 ins; 11 del; 4 mod 8255616: Disable AOT and Graal in Oracle OpenJDK Reviewed-by: iignatyev, vlivanov, iveresov, ihse ------------- PR: https://git.openjdk.java.net/jdk/pull/960 From eosterlund at openjdk.java.net Mon Nov 2 16:14:06 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 2 Nov 2020 16:14:06 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v2] In-Reply-To: <3rZagetDi1APoH1Gg2q4Z5mVPYjk0vBHwDnnhuh7d6M=.da07fe0e-91d6-43cb-b7c7-f3f9daedb931@github.com> References: <3rZagetDi1APoH1Gg2q4Z5mVPYjk0vBHwDnnhuh7d6M=.da07fe0e-91d6-43cb-b7c7-f3f9daedb931@github.com> Message-ID: <0heSKANZ4kJqlQOKY6MCs6cSZvT8-KRFRbbbKTlzNzA=.7224a164-a76d-47ce-8016-9db052b09773@github.com> On Mon, 2 Nov 2020 15:58:15 GMT, Coleen Phillimore wrote: >> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. >> >> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. >> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. 
In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. >> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. >> >> It has also been tested with tier1-6. >> >> Thank you to Stefan, Erik and Kim for their help with this change. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Code review comments from StefanK. Looks great in general. Great work Coleen, and thanks again for fixing this. I like all the red lines in the GC code. I added a few nits/questions. test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM02/cm02t001/cm02t001.cpp line 656: > 654: result = NSK_FALSE; > 655: > 656: printf("Object free events %d\n", ObjectFreeEventsCount); Is this old debug info you forgot to remove? Other code seems to use NSK_DISPLAY macros instead. 
src/hotspot/share/prims/jvmtiTagMap.cpp line 345: > 343: > 344: // Check if we have to process for concurrent GCs. > 345: check_hashmap(false); Maybe add a comment stating the parameter name, as was done in other callsites for check_hashmap. src/hotspot/share/prims/jvmtiTagMap.cpp line 3009: > 3007: // Lock each hashmap from concurrent posting and cleaning > 3008: MutexLocker ml(tag_map->lock(), Mutex::_no_safepoint_check_flag); > 3009: tag_map->hashmap()->unlink_and_post(tag_map->env()); This could call unlink_and_post_locked instead of manually locking. ------------- Changes requested by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/967 From eosterlund at openjdk.java.net Mon Nov 2 16:14:06 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 2 Nov 2020 16:14:06 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 12:50:23 GMT, Coleen Phillimore wrote: >> src/hotspot/share/prims/jvmtiTagMap.cpp line 954: >> >>> 952: o->klass()->external_name()); >>> 953: return; >>> 954: } >> >> Why is this done as a part of this RFE? Is this a bug fix that should be done as a separate patch? > > Because it crashed with my changes and didn't without. I cannot recollect why. I thought that we didn't load the archived heap from CDS, if JVMTI heap walker capabilities are in place, as we didn't want this kind of interactions. But maybe I'm missing something, since you said having this if statement here made a difference. 
------------- PR: https://git.openjdk.java.net/jdk/pull/967 From ihse at openjdk.java.net Mon Nov 2 16:19:07 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 2 Nov 2020 16:19:07 GMT Subject: RFR: 8254072: AArch64: Get rid of --disable-warnings-as-errors on Windows+ARM64 build [v4] In-Reply-To: <8gg1Viyrq9xvlobYWDR2MTvXYFEBtVHXIfAkrptF240=.8936d808-6aac-43d7-9893-e8c095c6abb9@github.com> References: <8gg1Viyrq9xvlobYWDR2MTvXYFEBtVHXIfAkrptF240=.8936d808-6aac-43d7-9893-e8c095c6abb9@github.com> Message-ID: <-mnJRsB2bDYTyIboORS0S7t2rVJC_YolZrqmr2lqloM=.da15438d-463a-4462-8292-f6595b4bcb06@github.com> On Mon, 2 Nov 2020 16:06:15 GMT, Bernhard Urban-Forster wrote: >> @lewurm >> This patch seems to break on linux-aarch64 with gcc: >> open/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp:1501:52: error: comparison of integer expressions of different signedness: 'size_t' {aka 'long unsigned int'} and 'int' [-Werror=sign-compare] >> 1501 | assert(StackOverflow::stack_shadow_zone_size() == (int)StackOverflow::stack_shadow_zone_size(), "must be same"); >> | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> >> Did you test building this on any aarch64 platforms besides Windows? > > @magicus I did test the initial version of this PR on linux+arm64, but not the latest iteration. Sorry about that. > > What is the policy here? Submit a revert right away or investigate a fix? @lewurm Open a new JBS issue with the bug. If you can find a fix in a short amount of time (which I believe should be possible; it probably just needs a proper cast) it's acceptable to fix it directly. What amounts to a "short amount of time" is left to reasonable judgement; minutes to hours are okay, days are not. Otherwise, create an anti-delta (revert changeset) to back out your changes, and open yet another JBS issue for re-implementing them correctly.
In this case, an alternative short-term fix could also be to remove the assert instead of backing out the entire patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/530 From stefank at openjdk.java.net Mon Nov 2 16:21:05 2020 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Mon, 2 Nov 2020 16:21:05 GMT Subject: RFR: 8255662: ZGC: Unify nmethod closures in the heap iterator Message-ID: In the heap iterator, we use different nmethod closures to visit the on-stack nmethods and the rest that are visited if class unloading is turned off. The rationale is that the first set has already been processed and does not have to be fixed, so the code simply verifies that each nmethod has been processed and visits all the oops. The second set contains both nmethods that have been entered and processed and nmethods that have not. Before visiting oops in those nmethods, we apply an nmethod barrier to ensure that it's safe to visit the oops. The proposal is to get rid of this separation and simply apply the nmethod entry barrier on all visited nmethods. This will make it easier to reason about the safety of visiting the oops.
------------- Commit messages: - Review 1 - hide ClassUnloading check - ZGC: Unify nmethod closures in the heap iterator Changes: https://git.openjdk.java.net/jdk/pull/1011/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1011&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255662 Stats: 39 lines in 3 files changed: 18 ins; 15 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/1011.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1011/head:pull/1011 PR: https://git.openjdk.java.net/jdk/pull/1011 From coleenp at openjdk.java.net Mon Nov 2 16:23:03 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 2 Nov 2020 16:23:03 GMT Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 11:14:10 GMT, Erik ?sterlund wrote: >> The imasm::remove_activation() call does not deal with safepoints very well. However, when the MethodExit JVMTI event is being called, we call into the runtime in the middle of remove_activation(). If the value being returned is an object type, then the top-of-stack contains the oop. However, the GC does not traverse said oop in any oop map, because it is simply not expected that we safepoint in the middle of remove_activation(). >> >> The JvmtiExport::post_method_exit() function we end up calling, reads the top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, that eventually call Java and a bunch of stuff that safepoints. So after the JVMTI callback, we can expect the top-of-stack oop to be broken. Unfortunately, when we continue, we therefore end up returning a broken oop. >> >> Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, is wrong, as we can safepoint on the way back to Java, which will break the return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, moving the transition to VM and back, into a block of code that is protected against GC. 
Before the JRT_BLOCK is called, we stash away the return oop, and after the JRT_BLOCK_END, we restore the top-of-stack oop. In the path when InterpreterRuntime::post_method_exit is called when throwing an exception, we don't have the same problem of retaining an oop result, and hence the JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the logic is the same as before for this path. >> >> This is a JVMTI bug that has probably been around for a long time. It crashes with all GCs, but was discovered recently after concurrent stack processing, as StefanK has been running better GC stressing code in JVMTI, and the bug reproduced more easily with concurrent stack processing, as the timings were a bit different. The following reproducer failed pretty much 100% of the time: >> while true; do make test JTREG="RETAIN=all" TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done >> >> With my fix I can run this repeatedly without any more failures. I have also sanity checked the patch by running tier 1-5, to make sure that it does not introduce any new issues on its own. I have also used Stefan's nice external GC stressing with jcmd technique that was used to trigger crashes with other GCs, to make sure said crashes no longer reproduce either. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Coleen CR1: Refactoring This looks better. Just to have the JRT_BLOCK be unconditional is an improvement. src/hotspot/share/prims/jvmtiExport.cpp line 1570: > 1568: // return a flag when a method terminates by throwing an exception > 1569: // i.e.
if an exception is thrown and it's not caught by the current method > 1570: bool exception_exit = state->is_exception_detected() && !state->is_exception_caught(); So this only applies to the case where the post_method_exit comes from remove_activation? Good to have it only on this path in this case. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/930 From coleenp at openjdk.java.net Mon Nov 2 16:27:03 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 2 Nov 2020 16:27:03 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v2] In-Reply-To: <0heSKANZ4kJqlQOKY6MCs6cSZvT8-KRFRbbbKTlzNzA=.7224a164-a76d-47ce-8016-9db052b09773@github.com> References: <3rZagetDi1APoH1Gg2q4Z5mVPYjk0vBHwDnnhuh7d6M=.da07fe0e-91d6-43cb-b7c7-f3f9daedb931@github.com> <0heSKANZ4kJqlQOKY6MCs6cSZvT8-KRFRbbbKTlzNzA=.7224a164-a76d-47ce-8016-9db052b09773@github.com> Message-ID: On Mon, 2 Nov 2020 15:18:43 GMT, Erik ?sterlund wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Code review comments from StefanK. > > test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM02/cm02t001/cm02t001.cpp line 656: > >> 654: result = NSK_FALSE; >> 655: >> 656: printf("Object free events %d\n", ObjectFreeEventsCount); > > Is this old debug info you forgot to remove? Other code seems to use NSK_DISPLAY macros instead. yes, removed it. > src/hotspot/share/prims/jvmtiTagMap.cpp line 345: > >> 343: >> 344: // Check if we have to process for concurrent GCs. >> 345: check_hashmap(false); > > Maybe add a comment stating the parameter name, as was done in other callsites for check_hashmap. Ok, will I run afoul of the ZGC people putting the parameter name after the parameter and the rest of the code, it is before? 
> src/hotspot/share/prims/jvmtiTagMap.cpp line 3009: > >> 3007: // Lock each hashmap from concurrent posting and cleaning >> 3008: MutexLocker ml(tag_map->lock(), Mutex::_no_safepoint_check_flag); >> 3009: tag_map->hashmap()->unlink_and_post(tag_map->env()); > > This could call unlink_and_post_locked instead of manually locking. Ok, 2 requests. I can call that then and move the logging. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From eosterlund at openjdk.java.net Mon Nov 2 16:32:54 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Mon, 2 Nov 2020 16:32:54 GMT Subject: RFR: 8255662: ZGC: Unify nmethod closures in the heap iterator In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 16:10:55 GMT, Stefan Karlsson wrote: > In the heap iterator, we use different nmethod closures to visit the on-stack nmethods and the rest that are visited if class unloading is turned off. > > The rationale is that the first set has already been processed and does not have to be fixed, so the code simply verifies that each nmethod has been processed and visits all the oops. The second set contains both nmethods that have been entered and processed and nmethods that have not. Before visiting oops in those nmethods, we apply an nmethod barrier to ensure that it's safe to visit the oops. > > The proposal is to get rid of this separation and simply apply the nmethod entry barrier on all visited nmethods. This will make it easier to reason about the safety of visiting the oops. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1011 From github.com+58006833+xbzhang99 at openjdk.java.net Mon Nov 2 16:52:05 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 2 Nov 2020 16:52:05 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms In-Reply-To: References: Message-ID: On Wed, 28 Oct 2020 04:32:41 GMT, Xubo Zhang wrote: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number I am from Intel Corp. Intel is OCA signatory ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From github.com+58006833+xbzhang99 at openjdk.java.net Mon Nov 2 16:52:04 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 2 Nov 2020 16:52:04 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms Message-ID: Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. 
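For readers less familiar with the two x86 branch conditions named above: jge tests a signed "greater or equal" relation, while jae tests the unsigned one ("above or equal"). The following is only a minimal sketch of that difference on plain 32-bit integers; it does not reproduce the exp stub, and the function names `signed_ge` and `unsigned_ge` are made up for the illustration.

```cpp
#include <cstdint>

// Signed ">=": the relation a jge branch tests after an integer compare.
static bool signed_ge(std::uint32_t a, std::uint32_t b) {
  // Reinterpreting the bits as int32_t is modular on mainstream compilers
  // (and guaranteed since C++20).
  return static_cast<std::int32_t>(a) >= static_cast<std::int32_t>(b);
}

// Unsigned ">=": the relation a jae branch tests after the same compare.
static bool unsigned_ge(std::uint32_t a, std::uint32_t b) {
  return a >= b;
}
```

With the bit pattern 0x80000000 on the left and 1 on the right, the unsigned comparison holds while the signed one does not, so the two branch forms can send control flow in opposite directions for the same inputs.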
Also changed movdqu to movsd as it was supposed to load a 64-bit number ------------- Commit messages: - Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large Changes: https://git.openjdk.java.net/jdk/pull/894/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255368 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/894/head:pull/894 PR: https://git.openjdk.java.net/jdk/pull/894 From coleenp at openjdk.java.net Mon Nov 2 16:57:14 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 2 Nov 2020 16:57:14 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 16:52:02 GMT, Coleen Phillimore wrote: >> I thought that we didn't load the archived heap from CDS, if JVMTI heap walker capabilities are in place, as we didn't want this kind of interactions. But maybe I'm missing something, since you said having this if statement here made a difference. > > Now I remember. I added an assert in JvmtiTagMapTable::find() for oop != NULL which didn't exist in the current hashmap code. The current hashmap code just didn't find a null oop. I tracked it down to the fact that we're finding dormant objects whose class hasn't been loaded yet. So I think we do load the archived heap from CDS. The heap walker capabilities can be added dynamically. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Mon Nov 2 16:57:13 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 2 Nov 2020 16:57:13 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 15:45:18 GMT, Erik ?sterlund wrote: >> Because it crashed with my changes and didn't without. I cannot recollect why. 
> > I thought that we didn't load the archived heap from CDS, if JVMTI heap walker capabilities are in place, as we didn't want this kind of interactions. But maybe I'm missing something, since you said having this if statement here made a difference. Now I remember. I added an assert in JvmtiTagMapTable::find() for oop != NULL which didn't exist in the current hashmap code. The current hashmap code just didn't find a null oop. I tracked it down to the fact that we're finding dormant objects whose class hasn't been loaded yet. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From burban at openjdk.java.net Mon Nov 2 17:07:54 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 2 Nov 2020 17:07:54 GMT Subject: RFR: 8254072: AArch64: Get rid of --disable-warnings-as-errors on Windows+ARM64 build [v4] In-Reply-To: <-mnJRsB2bDYTyIboORS0S7t2rVJC_YolZrqmr2lqloM=.da15438d-463a-4462-8292-f6595b4bcb06@github.com> References: <8gg1Viyrq9xvlobYWDR2MTvXYFEBtVHXIfAkrptF240=.8936d808-6aac-43d7-9893-e8c095c6abb9@github.com> <-mnJRsB2bDYTyIboORS0S7t2rVJC_YolZrqmr2lqloM=.da15438d-463a-4462-8292-f6595b4bcb06@github.com> Message-ID: On Mon, 2 Nov 2020 16:16:25 GMT, Magnus Ihse Bursie wrote: >> @magicus I did test the initial version of this PR on linux+arm64, but not the latest iteration. sorry about that ?? >> >> What is the policy here? Submit a revert right away or investigate a fix? > > @lewurm Open a new JBS issue with the bug. If you can find a fix in a short amount of time (which I would believe should be possible; probably just need a proper cast) it's acceptable to fix it directly. What amounts to a "short amount of time" is left to reasonable judgement; minutes to hours are okay, days are not. > > Otherwise, create an anti-delta (revert changeset) to back out your changes, and open yet another JBS issue for re-implementing them correctly. 
> > In this case, an alternative short-term fix could also be to remove the assert instead of backing out the entire patch. https://github.com/openjdk/jdk/pull/1013 ------------- PR: https://git.openjdk.java.net/jdk/pull/530 From darcy at openjdk.java.net Mon Nov 2 17:45:02 2020 From: darcy at openjdk.java.net (Joe Darcy) Date: Mon, 2 Nov 2020 17:45:02 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms In-Reply-To: References: Message-ID: <_NtWpo5mp-I1zmPtn883ezvoO1jA7WdJflfoFXcpL70=.96a06d93-47a4-48dd-88fc-f3c777315291@github.com> On Wed, 28 Oct 2020 04:32:41 GMT, Xubo Zhang wrote: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number The regression tests for exp should be explicitly updated to cover the previously erroneous input if they do not do so already. ------------- Changes requested by darcy (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/894 From aph at openjdk.java.net Mon Nov 2 17:45:57 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 2 Nov 2020 17:45:57 GMT Subject: RFR: 8254072: AArch64: Get rid of --disable-warnings-as-errors on Windows+ARM64 build [v4] In-Reply-To: References: <8gg1Viyrq9xvlobYWDR2MTvXYFEBtVHXIfAkrptF240=.8936d808-6aac-43d7-9893-e8c095c6abb9@github.com> <-mnJRsB2bDYTyIboORS0S7t2rVJC_YolZrqmr2lqloM=.da15438d-463a-4462-8292-f6595b4bcb06@github.com> Message-ID: On Mon, 2 Nov 2020 17:05:19 GMT, Bernhard Urban-Forster wrote: >> @lewurm Open a new JBS issue with the bug. If you can find a fix in a short amount of time (which I would believe should be possible; probably just need a proper cast) it's acceptable to fix it directly. What amounts to a "short amount of time" is left to reasonable judgement; minutes to hours are okay, days are not. 
>> >> Otherwise, create an anti-delta (revert changeset) to back out your changes, and open yet another JBS issue for re-implementing them correctly. >> >> In this case, an alternative short-term fix could also be to remove the assert instead of backing out the entire patch. > > https://github.com/openjdk/jdk/pull/1013 > @lewurm > This patch seems to break on linux-aarch64 with gcc: Builds cleanly on Linux/GCC for me. ------------- PR: https://git.openjdk.java.net/jdk/pull/530 From jbhateja at openjdk.java.net Mon Nov 2 17:50:56 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 2 Nov 2020 17:50:56 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v3] In-Reply-To: References: <94qadtiTzSkdsJAc_8IWrLxpBvmfiBXMf_W9Z965P80=.9a59a5db-2209-4007-94bb-16ccd8ff0b77@github.com> Message-ID: On Fri, 16 Oct 2020 14:50:15 GMT, Nils Eliasson wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Replacing explicit type checks with existing type checking routines > > src/hotspot/share/opto/cfgnode.cpp line 396: > >> 394: } >> 395: >> 396: bool RegionNode::is_self_loop(Node* n) { > > A bit expensive to DFS the entire graph to find a self loop. You don't need to visit nodes outside the loop. But you might not need to do this at all - see my comments further down.
DONE ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From eosterlund at openjdk.java.net Mon Nov 2 17:56:07 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Mon, 2 Nov 2020 17:56:07 GMT Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 16:19:59 GMT, Coleen Phillimore wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Coleen CR1: Refactoring > > src/hotspot/share/prims/jvmtiExport.cpp line 1570: > >> 1568: // return a flag when a method terminates by throwing an exception >> 1569: // i.e. if an exception is thrown and it's not caught by the current method >> 1570: bool exception_exit = state->is_exception_detected() && !state->is_exception_caught(); > > So this only applies to the case where the post_method_exit comes from remove_activation? Good to have it only on this path in this case. I'm not sure. There might be other cases, when remove_activation is called by the exception code. That's why I didn't want to change it to just true in this path.
------------- PR: https://git.openjdk.java.net/jdk/pull/930 From jbhateja at openjdk.java.net Mon Nov 2 17:58:03 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 2 Nov 2020 17:58:03 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v3] In-Reply-To: References: <94qadtiTzSkdsJAc_8IWrLxpBvmfiBXMf_W9Z965P80=.9a59a5db-2209-4007-94bb-16ccd8ff0b77@github.com> Message-ID: On Fri, 16 Oct 2020 14:53:32 GMT, Nils Eliasson wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Replacing explicit type checks with existing type checking routines > > src/hotspot/share/opto/cfgnode.cpp line 436: > >> 434: Node* rep_node = NULL; >> 435: PhaseIterGVN *igvn = phase->is_IterGVN(); >> 436: if (in(1)->is_top() && !in(2)->is_top()) { > > The Phi-nodes for loops are always normalized - in(1) will be loop-entry and in(2) is the backedge. So if in(1) is top - in(2) will be a self loop. Yes, the self-loop check is no longer needed, so I removed it together with the associated phi disintegration logic. There was also a problem with the input connections of exit_region (the convergence region after the partially inlined fast-path region and the stub-call slow-path region), which was distorting the graph shape; that has been fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Mon Nov 2 18:00:58 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 2 Nov 2020 18:00:58 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v4] In-Reply-To: References: Message-ID: On Mon, 19 Oct 2020 18:33:22 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains four commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Replacing explicit type checks with existing type checking routines >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions. > > There is regression after 8252847 changes: 8254890. > It should be fixed before we proceed with these changes. Hi @vnkozlov , @neliasso, kindly let me know if there any review comments which needs to be addressed. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From cjashfor at linux.ibm.com Mon Nov 2 18:09:36 2020 From: cjashfor at linux.ibm.com (Corey Ashford) Date: Mon, 2 Nov 2020 10:09:36 -0800 Subject: RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v8] In-Reply-To: <-7PHVafzbyMukuWngsX5bdLvJPubN2KzjMWM2lrQnCs=.a278b608-3e1c-4126-9791-efe18a5d8d5e@github.com> References: <_JR-e3ZsRFwvZCR7ws34z5jLjp2kJQ1bu4gyl0RG1XU=.ec3040cf-8147-4dcd-b87d-4fd9be4eb59e@github.com> <-7PHVafzbyMukuWngsX5bdLvJPubN2KzjMWM2lrQnCs=.a278b608-3e1c-4126-9791-efe18a5d8d5e@github.com> Message-ID: <8c320241-a64f-380a-6e02-6fdf5878cd10@linux.ibm.com> On 10/26/20 12:47 PM, Paul Murphy wrote: > On Thu, 22 Oct 2020 22:06:11 GMT, CoreyAshford wrote: > >>> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3878: >>> >>>> 3876: // | Element | | | | | | | | | >>>> 3877: // +===============+=============+======================+======================+=============+=============+======================+======================+=============+ >>>> 3878: // | after vaddubm | 00||b0:0..5 | 00||b0:6..7||b1:0..3 | 00||b1:4..7||b2:0..1 | 00||b2:2..7 | 00||b3:0..5 | 00||b3:6..7||b4:0..3 | 00||b4:4..7||b5:0..1 | 00||b5:2..7 | >>> >>> An extra line here showing how the 8 6-bit values above get mapping into 6 bytes greatly help my brain out. 
(likewise for the > >> Just to make sure I understand, you're not asking for a change here, is that right? > > I think the first line should also express the initial layout of the 6-bit values similar to the linked algo. I think changing this comment to add an extra line which describes the bits as they leave `vaddubm` would be helpful to understand the demangling here. (e.g. the `00aaaaaa 00bbbbbb 00cccccc 00dddddd` comments in the linked paper) Ok, got it. I will change it as you suggest to create a better mental link between the terminology used in the paper and the bit numbering I chose to use in the code comments. > >>> src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3884: >>> >>>> 3882: // | vec_0x3fs | 00111111 | 00111111 | 00111111 | 00111111 | 00111111 | 00111111 | 00111111 | 00111111 | >>>> 3883: // +---------------+-------------+----------------------+----------------------+-------------+-------------+----------------------+----------------------+-------------+ >>>> 3884: // | after vpextd | b5:0..7 | b4:0..7 | b3:0..7 | b2:0..7 | b1:0..7 | b0:0..7 | 00000000 | 00000000 | >>> >>> Are these comments correct or am I misunderstanding this? I read the final result as something starting as `b5:2..7 || b4:4..7|| b5:0..1` from vpextd. >> >> Because the bytes are displayed e15..e8, instead of the other way around, it's hard to follow. As an example, consider just the last four bytes of the table, but displayed in the reverse order: >> >> 00||b0:0..5 00||b0:6..7||b1:0..3 00||b1:4..7||b2:0..1 00||b2:2..7 >> >> After vpextd with bit select pattern 00111111 for all bytes: >> >> b0:0..5||b0:6..7 b1:0..3||b1:4..7 b2:0..1||b2:2..7 >> = >> b0:0..7 b1:0..7 b2:0..7 >> >> Should I reverse the order of this table with a comment at the top, to explain the reason for the reversal? It seems like a good idea. > > Since you are operating on doublewords here, expressing this as operations on a doubleword instead of bytes would be more intuitive here.
I think the lane mappings for little endian are what throw me off. > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/293 > Got it. I will try that out and see how it looks compared to the byte-swapped version. Also I will add a comment about vpextd operating on doublewords. As a side note, on github, it's waiting for you to check a box: "I agree to the OpenJDK Terms of use for all comments I make in a project in the OpenJDK GitHub organization.". Until you tick that box, your comment can't be seen there. https://github.com/openjdk/jdk/pull/293#discussion_r512214354 Thanks, - Corey From burban at openjdk.java.net Mon Nov 2 18:37:56 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 2 Nov 2020 18:37:56 GMT Subject: RFR: 8254072: AArch64: Get rid of --disable-warnings-as-errors on Windows+ARM64 build [v4] In-Reply-To: References: <8gg1Viyrq9xvlobYWDR2MTvXYFEBtVHXIfAkrptF240=.8936d808-6aac-43d7-9893-e8c095c6abb9@github.com> <-mnJRsB2bDYTyIboORS0S7t2rVJC_YolZrqmr2lqloM=.da15438d-463a-4462-8292-f6595b4bcb06@github.com> Message-ID: On Mon, 2 Nov 2020 17:43:31 GMT, Andrew Haley wrote: >> https://github.com/openjdk/jdk/pull/1013 > >> @lewurm >> This patch seems to break on linux-aarch64 with gcc: > > Builds cleanly on Linux/GCC or me. @theRealAph what gcc version? I can reproduce with $ gcc --version gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008 which ships in Ubuntu 19.10 as default ------------- PR: https://git.openjdk.java.net/jdk/pull/530 From stuefe at openjdk.java.net Mon Nov 2 19:14:01 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 2 Nov 2020 19:14:01 GMT Subject: RFR: JDK-8255780: Remove unused overloads of VMError::report_and_die() Message-ID: <6h8bS0mNYL0Ke0o0RjmK13SPwG79EvnY3tXV_iT-HJ0=.8719c0ff-1f51-4f32-91a3-fc8bcb182c7d@github.com> VMError::report_and_die() comes in a lot of overloads. These are unused: void report_and_die(const char* message, const char* detail_fmt, ...) 
void report_and_die(const char* message); and can be removed. ------------- Commit messages: - remove unused functions Changes: https://git.openjdk.java.net/jdk/pull/1018/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1018&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255780 Stats: 16 lines in 2 files changed: 0 ins; 16 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1018.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1018/head:pull/1018 PR: https://git.openjdk.java.net/jdk/pull/1018 From never at openjdk.java.net Mon Nov 2 19:25:54 2020 From: never at openjdk.java.net (Tom Rodriguez) Date: Mon, 2 Nov 2020 19:25:54 GMT Subject: Integrated: 8255578: [JVMCI] be more careful about reflective reads of Class.componentType. In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 19:23:39 GMT, Tom Rodriguez wrote: > cc @vnkozlov This pull request has now been integrated. Changeset: bc6085b0 Author: Tom Rodriguez URL: https://git.openjdk.java.net/jdk/commit/bc6085b0 Stats: 25 lines in 3 files changed: 25 ins; 0 del; 0 mod 8255578: [JVMCI] be more careful about reflective reads of Class.componentType. Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/962 From gziemski at openjdk.java.net Mon Nov 2 19:31:06 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Mon, 2 Nov 2020 19:31:06 GMT Subject: RFR: 8253742: POSIX signal code cleanup Message-ID: hi all, Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. 
This change addresses them all: #1 David's feedback - removed non-POSIX SIGNIFICANT_SIGNAL_MASK code #2 David's feedback - used unblock_program_error_signals() on all platforms (this change is superseded by JDK-8252533, will need to merge) #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms #4 Coleen's feedback - cleaned up header files in src/hotspot/os/posix/signals_posix.hpp #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include #6 Coleen's feedback - factored out print_signal_handlers() #7 Thomas's feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() #8 Thomas's feedback - factored out common POSIX signal initialization code #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API #10 YaSuenag's feedback - unified logging is out of scope for this fix #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? ------------- Commit messages: - Factor out common POSIX signal initialization code - Factor out do_task into PosixSignals - factor out print_signal_handlers - Coleen's feedback integrated - merge - Use JVM_handle_posix_signal for all POSIX platforms - factor out ucontext_get_pc and ucontext_set_pc into their respective platform code - Use JVM_handle_posix_signal for all POSIX platforms - Use unblock_program_error_signals() on all platforms - Remove non POSIX SIGNIFICANT_SIGNAL_MASK code Changes: https://git.openjdk.java.net/jdk/pull/636/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253742 Stats: 326 lines in 20 files changed: 91 ins; 157 del; 78 mod Patch: https://git.openjdk.java.net/jdk/pull/636.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/636/head:pull/636 PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Mon Nov 2 19:31:06 2020 From: gziemski at openjdk.java.net (Gerard
Ziemski) Date: Mon, 2 Nov 2020 19:31:06 GMT Subject: RFR: 8253742: POSIX signal code cleanup In-Reply-To: References: Message-ID: On Tue, 13 Oct 2020 14:19:02 GMT, Gerard Ziemski wrote: > hi all, > > Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. This change addresses them all: > > #1 David's feedback - removed non-POSIX SIGNIFICANT_SIGNAL_MASK code > > #2 David's feedback - used unblock_program_error_signals() on all platforms (this change is superseded by JDK-8252533, will need to merge) > > #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms > > #4 Coleen's feedback - cleaned up header files in src/hotspot/os/posix/signals_posix.hpp > > #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include > > #6 Coleen's feedback - factored out print_signal_handlers() > > #7 Thomas's feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() > > #8 Thomas's feedback - factored out common POSIX signal initialization code > > #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API > > #10 YaSuenag's feedback - unified logging is out of scope for this fix > > #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? Does anyone have a suggestion on how to merge with current JDK in the safest way?
------------- PR: https://git.openjdk.java.net/jdk/pull/636 From mdoerr at openjdk.java.net Mon Nov 2 19:54:58 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 2 Nov 2020 19:54:58 GMT Subject: RFR: JDK-8255780: Remove unused overloads of VMError::report_and_die() In-Reply-To: <6h8bS0mNYL0Ke0o0RjmK13SPwG79EvnY3tXV_iT-HJ0=.8719c0ff-1f51-4f32-91a3-fc8bcb182c7d@github.com> References: <6h8bS0mNYL0Ke0o0RjmK13SPwG79EvnY3tXV_iT-HJ0=.8719c0ff-1f51-4f32-91a3-fc8bcb182c7d@github.com> Message-ID: On Mon, 2 Nov 2020 19:07:29 GMT, Thomas Stuefe wrote: > VMError::report_and_die() comes in a lot of overloads. These are unused: > > void report_and_die(const char* message, const char* detail_fmt, ...) > void report_and_die(const char* message); > > and can be removed. Thanks for reducing overloaded versions of it. Looks good. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1018 From stuefe at openjdk.java.net Mon Nov 2 20:16:58 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 2 Nov 2020 20:16:58 GMT Subject: RFR: 8253742: POSIX signal code cleanup In-Reply-To: References: Message-ID: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> On Tue, 13 Oct 2020 14:19:02 GMT, Gerard Ziemski wrote: > hi all, > > Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. 
This change addresses them all: > > #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code > > #2 David's feedback - used unblock_program_error_signals() on all platforms (this change is superseded by JDK-8252533, will need to merge) > > #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms > > #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp > > #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include > > #6 Coleen's feedback - factored out print_signal_handlers() > > #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() > > #8 Thomas's feedback - factored out common POSIX signal initialization code > > #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API > > #10 YaSuenag's feedback - unified logging out of the scope for this fix > > #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? Hi Gerard, good job! But we really should synchronize :) I am currently working on: https://bugs.openjdk.java.net/browse/JDK-8255711 (see Draft PR: https://github.com/openjdk/jdk/pull/982) and https://bugs.openjdk.java.net/browse/JDK-8252533 is also still open (see https://github.com/openjdk/jdk/pull/839) - still waiting for David's final OK. So unfortunately there are a number of clashes with your change: - "JVM_handle_xxx_signal()": See my mail to jdk-dev from this morning: https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html - we can either get rid of this function completely, which is what my current Draft for JDK-8255711 does, or we need to retain it for backward compatibility, but if so, we need to retain it with the current interface. Either way, could you please withhold changes to the hotspot signal handlers for the moment (so, both javaSignalHandler() and the various JVM_handle_xxx_signal() functions)?
- I removed some functions you changed - all that signal blocking mask stuff. See https://github.com/openjdk/jdk/pull/839. Could you hold any changes to those functions until JDK-8252533 is out of the door? Hope to do this tomorrow, with your and David's approval. The unification of the signal handler printing stuff and the SR initialization make sense and are a good simplification. Please find further remarks inline. Cheers, Thomas make/hotspot/symbols/symbols-linux line 24: > 22: # > 23: > 24: JVM_handle_posix_signal Please don't change these (see comment above). src/hotspot/os/posix/signals_posix.cpp line 1261: > 1259: PosixSignals::print_signal_handler(st, SHUTDOWN3_SIGNAL , buf, buflen); > 1260: PosixSignals::print_signal_handler(st, BREAK_SIGNAL, buf, buflen); > 1261: #if defined(AIX) Can you change both #ifdefs to `#ifdef SIGDANGER` and `#ifdef SIGTRAP` respectively, please? Some other Unices have these too. (Side note: I always wanted to change this code to a loop that prints all signal handlers unconditionally, regardless of whether this is a "hotspot signal" or not, since when analyzing customer problems it is sometimes interesting to know if other handlers are installed too (e.g. SIGCHLD). At least that's how we do things in our proprietary VM.) src/hotspot/os/aix/os_aix.cpp line 1442: > 1440: } > 1441: > 1442: void os::print_signal_handlers(outputStream* st, char* buf, size_t buflen) { This makes sense. But then, it would make sense to move this to os_posix.cpp completely. Or to even completely replace calls to os::print_signal_handlers with PosixSignals::print_signal_handlers() and remove the former. src/hotspot/os/aix/os_aix.cpp line 3301: > 3299: } > 3300: > 3301: address os::ucontext_get_pc(const ucontext_t* ctx) { +1 src/hotspot/os/posix/signals_posix.cpp line 452: > 450: } > 451: > 452: // Renamed from 'signalHandler' to avoid collision with other shared libs. Please don't change those, nor javaSignalHandler().
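To make the side note above about printing every installed handler in a loop a bit more concrete, here is a minimal standalone sketch. It is not HotSpot code and not what the patch does: `describe_handler`, `print_all_signal_handlers`, `print_optional_signals` and the bound `kMaxSignal` are hypothetical names picked for this example, and the only system call used is the query form of POSIX `sigaction()` (a null `act` argument). It also shows the `#ifdef SIGDANGER` style of feature test suggested above in place of a platform macro.

```cpp
#include <cassert>
#include <signal.h>
#include <stdio.h>
#include <string>

// Assumption for this sketch: probe signal numbers 1..64. Numbers that the
// OS or libc rejects are simply skipped.
static const int kMaxSignal = 64;

// Hypothetical helper: describe the disposition currently installed for
// `sig`. Returns an empty string if the signal number cannot be queried.
static std::string describe_handler(int sig) {
  struct sigaction sa;
  if (sigaction(sig, nullptr, &sa) != 0) {
    return std::string();  // invalid or reserved signal number
  }
  if ((sa.sa_flags & SA_SIGINFO) != 0) {
    return "handler (SA_SIGINFO)";  // three-argument handler installed
  }
  if (sa.sa_handler == SIG_DFL) return "SIG_DFL";
  if (sa.sa_handler == SIG_IGN) return "SIG_IGN";
  return "handler";
}

// Print every queryable signal, whether or not the VM cares about it.
// Returns the number of signals printed.
static int print_all_signal_handlers(FILE* out) {
  int printed = 0;
  for (int sig = 1; sig <= kMaxSignal; sig++) {
    const std::string d = describe_handler(sig);
    if (d.empty()) continue;
    fprintf(out, "signal %2d: %s\n", sig, d.c_str());
    printed++;
  }
  return printed;
}

// Test for the signal macro itself instead of a platform macro such as AIX.
static void print_optional_signals(FILE* out) {
  (void)out;  // silences the unused warning if neither macro exists
#ifdef SIGDANGER
  fprintf(out, "SIGDANGER: %s\n", describe_handler(SIGDANGER).c_str());
#endif
#ifdef SIGTRAP
  fprintf(out, "SIGTRAP: %s\n", describe_handler(SIGTRAP).c_str());
#endif
}
```

A real implementation inside the VM would of course go through `outputStream` and the existing `PosixSignals::print_signal_handler()` formatting; the loop structure is the only point here.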
src/hotspot/os/posix/signals_posix.cpp line 459: > 457: // on all our platforms they would bring down the process immediately when > 458: // getting raised while being blocked. > 459: unblock_program_error_signals(); As remarked above, this will conflict with JDK-8252533, since I remove this function too. Can we leave this out please? src/hotspot/share/runtime/os.hpp line 970: > 968: > 969: static address ucontext_get_pc(const ucontext_t* ctx); > 970: static void ucontext_set_pc(ucontext_t* ctx, address pc); This feels misplaced here (and probably won't compile on Windows) since ucontext_t is POSIX. At the very least it needs ucontext.h. But I would consider moving this to os_posix. ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From sspitsyn at openjdk.java.net Mon Nov 2 21:02:56 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Mon, 2 Nov 2020 21:02:56 GMT Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 16:20:09 GMT, Coleen Phillimore wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Coleen CR1: Refactoring > > This looks better. Just to have the JRT_BLOCK be unconditional is an improvement. Erik, Thank you for the update! It looks more elegant. One concern is that after the move of this fragment to the post_method_exit_inner: 1614 if (state == NULL || !state->is_interp_only_mode()) { 1615 // for any thread that actually wants method exit, interp_only_mode is set 1616 return; 1617 } there is no guarantee that the current frame is interpreted below: 1580 if (!exception_exit) { 1581 oop oop_result; 1582 BasicType type = current_frame.interpreter_frame_result(&oop_result, &value); . . .
1597 if (result.not_null() && !mh->is_native()) { 1598 // We have to restore the oop on the stack for interpreter frames 1599 *(oop*)current_frame.interpreter_frame_tos_address() = result(); 1600 } Probably, extra checks for current_frame.is_interpreted_frame() in these fragments will be sufficient. ------------- PR: https://git.openjdk.java.net/jdk/pull/930 From andrew at openjdk.java.net Mon Nov 2 21:20:02 2020 From: andrew at openjdk.java.net (Andrew John Hughes) Date: Mon, 2 Nov 2020 21:20:02 GMT Subject: RFR: 8142984: Zero: fast accessors should handle both getters and setters [v2] In-Reply-To: References: <0skRfs7hB88JHFy53lVD0Fvt-JlF2HWGTC05qMgHidA=.346c67ce-af20-41ba-b4fa-a24e8ca6c0e2@github.com> Message-ID: On Wed, 28 Oct 2020 09:56:31 GMT, Aleksey Shipilev wrote: >> It started as removing the TODO item in `abstractInterpreter.cpp`. Zero is the only implementation that treats `accessor` to mean `getter`, which makes the awkward choice in the entry selection. After going back and forth (including trying to remove the fast accessor methods altogether in [JDK-8255066](https://bugs.openjdk.java.net/browse/JDK-8255066)), I settled on implementing the fast Zero `setter`-s too, plus renaming and whipping the existing `getter` code in shape. The end result seems to be more straight-forward than it was before. >> >> On the plus side, it improves `make bootcycle-images` in release mode from ~47m40s to ~46m50s, because we are saving time doing the `normal_entry` for setters. >> >> The "normal", non-Zero template interpreter is not affected, because it does not have any specializations for `accessor`, `getter` or `setter`, and instead just doing the normal entry. 
>> >> Testing: >> - [x] Linux x86_64 {fastdebug, release} Zero `make bootcycle-images` >> - [x] Linux aarch64 {fastdebug, release} Zero `make bootcycle-images` >> - [x] Linux x86_64 Zero release jcstress >> - [x] Linux aarch64 Zero release jcstress > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8142984: Zero: fast accessors should handle both getters and setters Looks good to me. ------------- Marked as reviewed by andrew (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/728 From sspitsyn at openjdk.java.net Mon Nov 2 21:23:00 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Mon, 2 Nov 2020 21:23:00 GMT Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 17:52:58 GMT, Erik Österlund wrote: >> src/hotspot/share/prims/jvmtiExport.cpp line 1570: >> >>> 1568: // return a flag when a method terminates by throwing an exception >>> 1569: // i.e. if an exception is thrown and it's not caught by the current method >>> 1570: bool exception_exit = state->is_exception_detected() && !state->is_exception_caught(); >> >> So this only applies to the case where the post_method_exit comes from remove_activation? Good to have it only on this path in this case. > > I'm not sure. There might be other cases, when remove_activation is called by the exception code. That's why I didn't want to change it to just true in this path.
The post_method_exit can come from the Zero interpreter: src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp: CALL_VM_NOCHECK(InterpreterRuntime::post_method_exit(THREAD)); ------------- PR: https://git.openjdk.java.net/jdk/pull/930 From pliden at openjdk.java.net Mon Nov 2 21:38:56 2020 From: pliden at openjdk.java.net (Per Liden) Date: Mon, 2 Nov 2020 21:38:56 GMT Subject: RFR: 8255662: ZGC: Unify nmethod closures in the heap iterator In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 16:10:55 GMT, Stefan Karlsson wrote: > In the heap iterator, we use different nmethod closures to visit the on-stack nmethods and the rest that are visited if class unloading is turned off. > > The rationale is that the first set has already been processed and does not have to be fixed, so the code simply verifies that the nmethod has been processed and visits all the oops. The second set contains nmethods of the kind that has been entered and processed, and those that have not. Before visiting oops in those nmethods, we apply an nmethod barrier to ensure that it's safe to visit the oops. > > The proposal is to get rid of this separation and simply apply the nmethod entry barrier on all visited nmethods. This will make it easier to reason about the safety of visiting the oops. Marked as reviewed by pliden (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1011 From rkennke at openjdk.java.net Mon Nov 2 22:52:04 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 2 Nov 2020 22:52:04 GMT Subject: RFR: 8255762: Shenandoah: Consolidate/streamline interpreter LRBs Message-ID: We currently have two LRB implementations in the interpreter: one normal and one for native/weak LRB. We should consolidate them into one.
The main issue here is that the two implementations follow different approaches: - the normal LRB calls through the shenandoah_lrb stub which does additional null- and cset-checking - the weak LRB calls to runtime directly and must not do cset-checking The reason for calling through the stub is that it gives more freedom to allocate two registers that are required for the cset check. However, we can invert the cset addressing like we did in JDK-8245465 and save a register. We can also eliminate the null-check and let the cset-check subsume it (like we do everywhere else). Allocating a single register for the cset-check is easy, and we can do so in-line without the extra jump through the stub. The runtime call through the stub has also been very costly: it dumps 2KB of register data on the stack at each call, which is excessive. save_xmm_registers() should be more than enough (in fact, I am almost certain that this is excessive too, and we should only need to save/restore xmm0 - but not in this patch). Not needing to generate the call-stub is also helpful for backportability, because in jdk8-shenandoah we cannot do that.
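The remark above that the cset-check can subsume the null-check can be pictured with a toy model. The sketch below is not Shenandoah code and does not reproduce the inverted addressing from JDK-8245465; `ToyCollectionSet`, `add_region` and `needs_barrier` are hypothetical names, and the assumption is simply a byte map indexed by `address >> region_shift` in which the region containing address 0 is never part of the collection set. Under that assumption a null reference fails the lookup like any other non-cset address, so no separate null test is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: one byte per region, indexed by (address >> shift).
class ToyCollectionSet {
 public:
  ToyCollectionSet(unsigned region_shift, size_t num_regions)
      : _shift(region_shift), _map(num_regions, 0) {}

  // Region 0 contains address 0 (null) and is deliberately never added.
  void add_region(size_t index) {
    assert(index != 0 && "region 0 must stay outside the collection set");
    _map.at(index) = 1;
  }

  // Single table lookup, no explicit null check: address 0 maps to
  // region 0, whose byte is always 0. The caller guarantees that `addr`
  // lies inside the modelled address range.
  bool needs_barrier(uintptr_t addr) const {
    return _map[addr >> _shift] != 0;
  }

 private:
  unsigned _shift;
  std::vector<unsigned char> _map;
};
```

In generated code the same idea means one shift, one byte load and one compare, which is why a single scratch register is enough for the inline check.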
Testing: hotspot_gc_shenandoah (x86_64, x86_32) ------------- Commit messages: - Improve register shuffling in interpreter LRB/aarch64 - Consolidate/streamline interpreter LRBs/aarch64 part - Fix x86_32 build - 8255762: Shenandoah: Consolidate/streamline interpreter LRBs Changes: https://git.openjdk.java.net/jdk/pull/1010/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1010&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255762 Stats: 424 lines in 4 files changed: 45 ins; 312 del; 67 mod Patch: https://git.openjdk.java.net/jdk/pull/1010.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1010/head:pull/1010 PR: https://git.openjdk.java.net/jdk/pull/1010 From coleenp at openjdk.java.net Mon Nov 2 23:06:55 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 2 Nov 2020 23:06:55 GMT Subject: RFR: JDK-8255780: Remove unused overloads of VMError::report_and_die() In-Reply-To: <6h8bS0mNYL0Ke0o0RjmK13SPwG79EvnY3tXV_iT-HJ0=.8719c0ff-1f51-4f32-91a3-fc8bcb182c7d@github.com> References: <6h8bS0mNYL0Ke0o0RjmK13SPwG79EvnY3tXV_iT-HJ0=.8719c0ff-1f51-4f32-91a3-fc8bcb182c7d@github.com> Message-ID: <2KoReo7d64Gkc0DIw8WRJajpTt0FvGEtcRmn8TsuTEs=.d093ace9-b1b2-4f80-9d81-a495c80aa236@github.com> On Mon, 2 Nov 2020 19:07:29 GMT, Thomas Stuefe wrote: > VMError::report_and_die() comes in a lot of overloads. These are unused: > > void report_and_die(const char* message, const char* detail_fmt, ...) > void report_and_die(const char* message); > > and can be removed. Looks good! ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1018 From dholmes at openjdk.java.net Tue Nov 3 01:57:00 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 3 Nov 2020 01:57:00 GMT Subject: RFR: 8250637: UseOSErrorReporting times out (on Mac and Linux) [v4] In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 15:56:09 GMT, Gerard Ziemski wrote: >> hi all, >> >> Please review this simple fix for POSIX platforms, which addresses a timeout that occurs while handling a crash with UseOSErrorReporting turned ON. >> >> It appears that the "UseOSErrorReporting" flag was only ever meant to be used on the Windows platform and was mistakenly left available for other platforms. In this fix we make sure to only use the flag on the Windows platform and make it a NOP for other platforms. >> >> Note #1: A similar hang issue occurs today even on Windows, with the only difference being that before a process times out (takes 2 minutes) it runs out of stack space in about 250 loops, so that's the only reason it doesn't linger for that long. The Windows issue is tracked separately by https://bugs.openjdk.java.net/browse/JDK-8250782 >> >> Note #2: Creating a native crash log (on macOS) is a non-trivial, research-level effort that is tracked by https://bugs.openjdk.java.net/browse/JDK-8237727 >> >> Note #3: Removal of the "UseOSErrorReporting" flag will depend on whether we can do #2, and at that time we can decide whether to keep it and implement it for other platforms or whether to remove it, provided that #2 cannot be done reliably. > > Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: > > - Fixed one more leftover UseOsErrorReporting to UseOSErrorReporting > - last tweaks and fixes Marked as reviewed by dholmes (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/813 From github.com+51754783+coreyashford at openjdk.java.net Tue Nov 3 02:24:58 2020 From: github.com+51754783+coreyashford at openjdk.java.net (CoreyAshford) Date: Tue, 3 Nov 2020 02:24:58 GMT Subject: RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v8] In-Reply-To: References: Message-ID: On Sat, 24 Oct 2020 21:38:55 GMT, CoreyAshford wrote: >> Yes, it assumes uniformly random data, but also recall that the unencoded data bytes get shifted by 2, 4, 6 bits into the encoded bytes, which I'm guessing would tend to make the data somewhat more uniform, even if the source data has low entropy. >> >> That said, I didn't actually benchmark it. I will do that to make sure there is a gain, and if there isn't I will remove the conditional branch. > >> I took a look at the VSX algo. I haven't looked much beyond it. I had a few questions I've inlined. It does look like a faithful VSX implementation of the linked algo. > > I neglected to thank you for reviewing this code! I realize there's quite a time commitment required to review this properly, and because of that I was having difficulty finding a second reviewer for the PPC64 portion. > > Just to set expectations, I will be on vacation next week, so further commits won't be posted until the following week, but I will address all of your great feedback. Thanks again! I just got done running a benchmark without the branch around the xxsel, and your hunch was right. There's about a 9% performance gain in the benchmark with that branch dropped. I also changed the previous instruction not to set the condition code, but I doubt that affected performance. Both regression tests for Base64 encoding/decoding still pass. The next set of commits will contain this change. 
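As a scalar analogy for the result above (executing the select unconditionally beat branching around it), the sketch below contrasts the two shapes. It is not the PPC64 stub: `bit_select` and `select_with_branch` are hypothetical helpers, and the only thing borrowed from `xxsel` is its bitwise-select behaviour, where each result bit comes from the second operand if the mask bit is 1 and from the first operand otherwise. Both functions return the same value; whether the branch-free form is faster depends on the hardware and the data, as the benchmark in this thread shows for one case.

```cpp
#include <cassert>
#include <cstdint>

// Bitwise select: take bits of `b` where `mask` is 1, bits of `a` elsewhere.
static inline uint64_t bit_select(uint64_t a, uint64_t b, uint64_t mask) {
  return (a & ~mask) | (b & mask);
}

// Same result, but with a conditional skip for the "nothing to select" case.
// The skip saves the select when mask == 0 at the price of a branch.
static inline uint64_t select_with_branch(uint64_t a, uint64_t b, uint64_t mask) {
  if (mask == 0) {
    return a;
  }
  return bit_select(a, b, mask);
}
```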
------------- PR: https://git.openjdk.java.net/jdk/pull/293 From github.com+51754783+coreyashford at openjdk.java.net Tue Nov 3 02:53:09 2020 From: github.com+51754783+coreyashford at openjdk.java.net (CoreyAshford) Date: Tue, 3 Nov 2020 02:53:09 GMT Subject: RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v9] In-Reply-To: References: Message-ID: > This patch set encompasses the following commits: > > - Adds a new HotSpot intrinsic candidate to the java.util.Base64 class - decodeBlock(), and provides a flexible API for the intrinsic. The API is similar to the existing encodeBlock intrinsic. > - Adds the code in HotSpot to check and marshal the new intrinsic's arguments to the arch-specific intrinsic implementation > - Adds a Power64LE-specific implementation of the decodeBlock intrinsic. > - Adds a JMH microbenchmark for both Base64 encoding and decoding. > - Enhances the JTReg hotspot intrinsic "TestBase64.java" regression test to more fully test both decoding and encoding. CoreyAshford has updated the pull request incrementally with two additional commits since the last revision: - stubGenerator_ppc.cpp: Remove the predicted branch around the xxsel instruction to improve performance by about 9% This conditional branch around the xxsel seemed like a good idea at the time, because I thought the branch would be less costly than the xxsel instruction, but it turns out not to be the case; executing the xxsel every time without a conditional branch increases performance by about 9%. Removing that branch also removed the need for the declaration and usage of an array of Labels for the branch destinations inside the unrolled code. - stubGenerator_ppc.cpp: address issues with understanding the pack algorithm * Change the order of the bytes as listed in the tables, which makes the use of vpextd easier to understand.
* Because the byte order of the constants used in the tables is reversed from the original documentation, change the constant declarations to match the order in the table, by using the ARRAY_TO_LXV_ORDER macro. This makes the constant declarations more consistent as well. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/293/files - new: https://git.openjdk.java.net/jdk/pull/293/files/8e15d971..0e291be4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=293&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=293&range=07-08 Stats: 106 lines in 1 file changed: 25 ins; 12 del; 69 mod Patch: https://git.openjdk.java.net/jdk/pull/293.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/293/head:pull/293 PR: https://git.openjdk.java.net/jdk/pull/293 From github.com+51754783+coreyashford at openjdk.java.net Tue Nov 3 02:55:58 2020 From: github.com+51754783+coreyashford at openjdk.java.net (CoreyAshford) Date: Tue, 3 Nov 2020 02:55:58 GMT Subject: RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v9] In-Reply-To: <-7PHVafzbyMukuWngsX5bdLvJPubN2KzjMWM2lrQnCs=.a278b608-3e1c-4126-9791-efe18a5d8d5e@github.com> References: <_JR-e3ZsRFwvZCR7ws34z5jLjp2kJQ1bu4gyl0RG1XU=.ec3040cf-8147-4dcd-b87d-4fd9be4eb59e@github.com> <-7PHVafzbyMukuWngsX5bdLvJPubN2KzjMWM2lrQnCs=.a278b608-3e1c-4126-9791-efe18a5d8d5e@github.com> Message-ID: On Mon, 26 Oct 2020 19:22:23 GMT, Paul Murphy wrote: >> Just to make sure I understand, you're not asking for a change here, is that right? > > I think the first line should also express the initial layout of the 6 bit values similar to the linked algo. I think changing this comment to add an extra line that describes the bits as they leave `vaddubm` would be helpful to understand the demangling here. (e.g. the `00aaaaaa 00bbbbbb 00cccccc 00dddddd` comments in the linked paper) I think I have addressed the issues in this comment with the latest commits.
Reversing the order of the bytes in the tables seems to make the tables easier to understand, and also makes the vector constant declarations consistent: all use the ARRAY_TO_LXV_ORDER macro now. The 00aaaaaa (etc.) bit fields are added to the tables. I'm not 100% sure they help much, but at least the comments follow the original paper in a clearer way. ------------- PR: https://git.openjdk.java.net/jdk/pull/293 From github.com+51754783+coreyashford at openjdk.java.net Tue Nov 3 03:01:09 2020 From: github.com+51754783+coreyashford at openjdk.java.net (CoreyAshford) Date: Tue, 3 Nov 2020 03:01:09 GMT Subject: RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v10] In-Reply-To: References: Message-ID: > This patch set encompasses the following commits: > > - Adds a new HotSpot intrinsic candidate to the java.util.Base64 class - decodeBlock(), and provides a flexible API for the intrinsic. The API is similar to the existing encodeBlock intrinsic. > - Adds the code in HotSpot to check and marshal the new intrinsic's arguments to the arch-specific intrinsic implementation > - Adds a Power64LE-specific implementation of the decodeBlock intrinsic. > - Adds a JMH microbenchmark for both Base64 encoding and decoding. > - Enhances the JTReg hotspot intrinsic "TestBase64.java" regression test to more fully test both decoding and encoding.
CoreyAshford has updated the pull request incrementally with one additional commit since the last revision: stubGenerator_ppc.cpp: fix trailing whitespace errors ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/293/files - new: https://git.openjdk.java.net/jdk/pull/293/files/0e291be4..8292527e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=293&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=293&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/293.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/293/head:pull/293 PR: https://git.openjdk.java.net/jdk/pull/293 From dongbo at openjdk.java.net Tue Nov 3 03:13:09 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 3 Nov 2020 03:13:09 GMT Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v3] In-Reply-To: References: Message-ID: > Base64.encodeBlock stub is implemented for x86_64. > We can also do the same thing for aarch64 with SIMD LD3/ST4/TBL. > A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords. > > Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build. > Tests in test/jdk/java/util/Base64/* were run specifically to check the correctness of the implementation, and passed. > > A JMH micro, Base64Encode.java, is added for performance test. > With different input lengths (upper-bounded by parameter `maxNumBytes` in the JMH micro), > we witness ~4x improvement with long input and no regression with short input on Kunpeng916 and Kunpeng920. > > The Base64Encode.java JMH micro-benchmark results:
> Benchmark                      (maxNumBytes)  Mode  Cnt      Score   Error  Units
> # kunpeng 916, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10     31.564 ± 0.034  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     33.921 ± 0.362  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     38.015 ± 0.220  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     41.115 ± 0.281  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     42.161 ± 0.630  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     44.797 ± 0.849  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     46.013 ± 0.917  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     67.984 ± 0.777  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    174.494 ± 1.614  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    277.103 ± 0.306  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10   4261.018 ± 1.883  ns/op
>
> # kunpeng 916, default
> Base64Encode.testBase64Encode              1  avgt   10     31.710 ± 0.234  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     33.978 ± 0.305  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     40.059 ± 0.444  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     47.958 ± 0.328  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     49.017 ± 1.305  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     53.150 ± 0.769  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     55.418 ± 0.316  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     93.517 ± 0.391  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    494.809 ± 0.413  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    898.581 ± 0.944  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  16464.411 ± 7.582  ns/op
>
> # kunpeng 920, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10     17.494 ± 0.012  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     21.023 ± 0.169  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     25.772 ± 0.138  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     30.121 ± 0.347  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     31.591 ± 0.238  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     32.728 ± 0.395  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     35.110 ± 0.215  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     48.621 ± 0.314  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    113.391 ± 0.554  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    180.749 ± 0.193  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10   3273.961 ± 5.706  ns/op
>
> # kunpeng 920, default
> Base64Encode.testBase64Encode              1  avgt   10     17.428 ± 0.037  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     20.926 ± 0.155  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     25.466 ± 0.140  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     32.526 ± 0.190  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     34.132 ± 0.387  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     36.685 ± 0.212  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     38.117 ± 0.246  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     62.447 ± 0.900  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    377.275 ± 0.162  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    700.628 ± 0.509  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  13626.764 ± 3.448  ns/op
Dong Bo has updated the pull request incrementally with one additional commit since the last revision: reconstructed the macro as function generate_base64_encode_simdround ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/992/files - new: https://git.openjdk.java.net/jdk/pull/992/files/2a17576b..2999ac15 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=01-02 Stats: 77 lines in 1 file changed: 35 ins; 40 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/992.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/992/head:pull/992 PR: https://git.openjdk.java.net/jdk/pull/992 From dongbo at openjdk.java.net Tue Nov 3 03:17:57 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 3 Nov 2020 03:17:57 GMT Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v3] In-Reply-To: References: <51wDcKZ0kbfUQ3yXerMX2k_PiqNMYzUdq-1AE-vAqzI=.60806e4a-ce08-4fb7-b218-93ac40444613@github.com> Message-ID: <0dVn-ZndEjTQ0WNFHKY4yF1P-stGBr4Ryf4Gd_TaYIs=.924a7686-f084-4d75-abfe-16753a8c4354@github.com> On Mon, 2 Nov 2020 14:58:17 GMT, Andrew Haley wrote: >> Thanks for the suggestions.
>> >> Just updated a version. >> The register name `sp` is changed to `soff`, and the macro is unpacked into code block `Process48B` and `Process24B`. >> >> Verified with `test/jdk/java/util/Base64/`. > > I'm sorry, there's no way I wanted the macro to be unpacked; I wanted it to be a function. I apologize for not being clear. Okay. Updated, reconstructed the macro as function `generate_base64_encode_simdround`. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/992 From dongbo at openjdk.java.net Tue Nov 3 07:01:12 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 3 Nov 2020 07:01:12 GMT Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v4] In-Reply-To: References: Message-ID: <_p5rQggkG8fBak9RrEDnoOzRnl8hsezlFjNmN_MFRiU=.c4167eaf-79f4-45eb-9fa9-2d0613274392@github.com> > Base64.encodeBlock stub is implemented for x86_64. > We can also do the same thing for aarch64 with SIMD LD3/ST4/TBL. > A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords. > > Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build. > Tests in test/jdk/java/util/Base64/* runned specially for the correctness of the implementation and passed. > > A JMH micro, Base64Encode.java, is added for performance test. > With different input length (upper-bounded by parameter `maxNumBytes` in the JMH micro), > we witness ~4x improvement with long input and no regression with short input on Kunpeng916 and Kunpeng920. > > The Base64Encode.java JMH micro-benchmark results: > Benchmark (maxNumBytes) Mode Cnt Score Error Units > # kunpeng 916, intrinsic > Base64Encode.testBase64Encode 1 avgt 10 31.564 ? 0.034 ns/op > Base64Encode.testBase64Encode 2 avgt 10 33.921 ? 0.362 ns/op > Base64Encode.testBase64Encode 3 avgt 10 38.015 ? 0.220 ns/op > Base64Encode.testBase64Encode 6 avgt 10 41.115 ? 0.281 ns/op > Base64Encode.testBase64Encode 7 avgt 10 42.161 ? 
0.630 ns/op > Base64Encode.testBase64Encode 9 avgt 10 44.797 ? 0.849 ns/op > Base64Encode.testBase64Encode 10 avgt 10 46.013 ? 0.917 ns/op > Base64Encode.testBase64Encode 48 avgt 10 67.984 ? 0.777 ns/op > Base64Encode.testBase64Encode 512 avgt 10 174.494 ? 1.614 ns/op > Base64Encode.testBase64Encode 1000 avgt 10 277.103 ? 0.306 ns/op > Base64Encode.testBase64Encode 20000 avgt 10 4261.018 ? 1.883 ns/op > > # kunpeng 916, default > Base64Encode.testBase64Encode 1 avgt 10 31.710 ? 0.234 ns/op > Base64Encode.testBase64Encode 2 avgt 10 33.978 ? 0.305 ns/op > Base64Encode.testBase64Encode 3 avgt 10 40.059 ? 0.444 ns/op > Base64Encode.testBase64Encode 6 avgt 10 47.958 ? 0.328 ns/op > Base64Encode.testBase64Encode 7 avgt 10 49.017 ? 1.305 ns/op > Base64Encode.testBase64Encode 9 avgt 10 53.150 ? 0.769 ns/op > Base64Encode.testBase64Encode 10 avgt 10 55.418 ? 0.316 ns/op > Base64Encode.testBase64Encode 48 avgt 10 93.517 ? 0.391 ns/op > Base64Encode.testBase64Encode 512 avgt 10 494.809 ? 0.413 ns/op > Base64Encode.testBase64Encode 1000 avgt 10 898.581 ? 0.944 ns/op > Base64Encode.testBase64Encode 20000 avgt 10 16464.411 ? 7.582 ns/op > > # kunpeng 920, intrinsic > Base64Encode.testBase64Encode 1 avgt 10 17.494 ? 0.012 ns/op > Base64Encode.testBase64Encode 2 avgt 10 21.023 ? 0.169 ns/op > Base64Encode.testBase64Encode 3 avgt 10 25.772 ? 0.138 ns/op > Base64Encode.testBase64Encode 6 avgt 10 30.121 ? 0.347 ns/op > Base64Encode.testBase64Encode 7 avgt 10 31.591 ? 0.238 ns/op > Base64Encode.testBase64Encode 9 avgt 10 32.728 ? 0.395 ns/op > Base64Encode.testBase64Encode 10 avgt 10 35.110 ? 0.215 ns/op > Base64Encode.testBase64Encode 48 avgt 10 48.621 ? 0.314 ns/op > Base64Encode.testBase64Encode 512 avgt 10 113.391 ? 0.554 ns/op > Base64Encode.testBase64Encode 1000 avgt 10 180.749 ? 0.193 ns/op > Base64Encode.testBase64Encode 20000 avgt 10 3273.961 ? 5.706 ns/op > > # kunpeng 920, default > Base64Encode.testBase64Encode 1 avgt 10 17.428 ? 
0.037 ns/op > Base64Encode.testBase64Encode 2 avgt 10 20.926 ± 0.155 ns/op > Base64Encode.testBase64Encode 3 avgt 10 25.466 ± 0.140 ns/op > Base64Encode.testBase64Encode 6 avgt 10 32.526 ± 0.190 ns/op > Base64Encode.testBase64Encode 7 avgt 10 34.132 ± 0.387 ns/op > Base64Encode.testBase64Encode 9 avgt 10 36.685 ± 0.212 ns/op > Base64Encode.testBase64Encode 10 avgt 10 38.117 ± 0.246 ns/op > Base64Encode.testBase64Encode 48 avgt 10 62.447 ± 0.900 ns/op > Base64Encode.testBase64Encode 512 avgt 10 377.275 ± 0.162 ns/op > Base64Encode.testBase64Encode 1000 avgt 10 700.628 ± 0.509 ns/op > Base64Encode.testBase64Encode 20000 avgt 10 13626.764 ± 3.448 ns/op Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into aarch64-base64-encoding - reconstructed the macro as function generate_base64_encode_simdround - change register name sp and unpack the macro - Merge branch 'master' into aarch64-base64-encoding - 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/992/files - new: https://git.openjdk.java.net/jdk/pull/992/files/2999ac15..e5c50ffd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=02-03 Stats: 4625 lines in 215 files changed: 2529 ins; 894 del; 1202 mod Patch: https://git.openjdk.java.net/jdk/pull/992.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/992/head:pull/992 PR: https://git.openjdk.java.net/jdk/pull/992 From stefank at openjdk.java.net Tue Nov 3 07:30:55 2020 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 3 Nov 2020 07:30:55 GMT Subject: RFR: 8255662: ZGC: Unify nmethod closures in the heap iterator In-Reply-To: References: Message-ID:
<80SMRC4moQAHMiEV58M0AiFRYI0OhdU3hFA4T3uzrys=.ec781538-afa4-482f-82b7-e7b0ecb35980@github.com> On Mon, 2 Nov 2020 21:35:51 GMT, Per Liden wrote: >> In the heap iterator, we use different nmethod closures to visit the on-stack nmethods and the rest that are visited if class unloading is turned off. >> >> The rationale is that the first set has already been processed and does not have to be fixed, so the code simply verifies that the nmethod has been processed and visits all the oops. The second set contains nmethods of the kind that has been entered and processed, and those that have not. Before visiting oops in those nmethods, we apply an nmethod barrier to ensure that it's safe to visit the oops. >> >> The proposal is to get rid of this separation and simply apply the nmethod entry barrier on all visited nmethods. This will make it easier to reason about the safeness of visiting the oops. > > Marked as reviewed by pliden (Reviewer). Thanks for reviewing ------------- PR: https://git.openjdk.java.net/jdk/pull/1011 From stefank at openjdk.java.net Tue Nov 3 07:36:57 2020 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 3 Nov 2020 07:36:57 GMT Subject: Integrated: 8255662: ZGC: Unify nmethod closures in the heap iterator In-Reply-To: References: Message-ID: <0_gSEcxunF7ZIz7TJ77-8KYN48TtiMw8maF_xpsTORw=.426197dd-9662-4490-a49e-7ba065492583@github.com> On Mon, 2 Nov 2020 16:10:55 GMT, Stefan Karlsson wrote: > In the heap iterator, we use different nmethod closures to visit the on-stack nmethods and the rest that are visited if class unloading is turned off. > > The rationale is that the first set has already been processed and does not have to be fixed, so the code simply verifies that the nmethod has been processed and visits all the oops. The second set contains nmethods of the kind that has been entered and processed, and those that have not. Before visiting oops in those nmethods, we apply an nmethod barrier to ensure that it's safe to visit the oops.
> > The proposal is to get rid of this separation and simply apply the nmethod entry barrier on all visited nmethods. This will make it easier to reason about the safeness of visiting the oops. This pull request has now been integrated. Changeset: c96a914b Author: Stefan Karlsson URL: https://git.openjdk.java.net/jdk/commit/c96a914b Stats: 39 lines in 3 files changed: 18 ins; 15 del; 6 mod 8255662: ZGC: Unify nmethod closures in the heap iterator Reviewed-by: eosterlund, pliden ------------- PR: https://git.openjdk.java.net/jdk/pull/1011 From stuefe at openjdk.java.net Tue Nov 3 07:38:55 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 3 Nov 2020 07:38:55 GMT Subject: RFR: JDK-8255780: Remove unused overloads of VMError::report_and_die() In-Reply-To: References: <6h8bS0mNYL0Ke0o0RjmK13SPwG79EvnY3tXV_iT-HJ0=.8719c0ff-1f51-4f32-91a3-fc8bcb182c7d@github.com> Message-ID: On Mon, 2 Nov 2020 19:52:37 GMT, Martin Doerr wrote: > Thanks for reducing overloaded versions of it. Looks good. Thanks Martin! ------------- PR: https://git.openjdk.java.net/jdk/pull/1018 From stuefe at openjdk.java.net Tue Nov 3 07:38:55 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 3 Nov 2020 07:38:55 GMT Subject: RFR: JDK-8255780: Remove unused overloads of VMError::report_and_die() In-Reply-To: References: <6h8bS0mNYL0Ke0o0RjmK13SPwG79EvnY3tXV_iT-HJ0=.8719c0ff-1f51-4f32-91a3-fc8bcb182c7d@github.com> Message-ID: On Tue, 3 Nov 2020 07:32:47 GMT, Thomas Stuefe wrote: > Looks good! Thank you Coleen! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1018 From stuefe at openjdk.java.net Tue Nov 3 07:38:55 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 3 Nov 2020 07:38:55 GMT Subject: Integrated: JDK-8255780: Remove unused overloads of VMError::report_and_die() In-Reply-To: <6h8bS0mNYL0Ke0o0RjmK13SPwG79EvnY3tXV_iT-HJ0=.8719c0ff-1f51-4f32-91a3-fc8bcb182c7d@github.com> References: <6h8bS0mNYL0Ke0o0RjmK13SPwG79EvnY3tXV_iT-HJ0=.8719c0ff-1f51-4f32-91a3-fc8bcb182c7d@github.com> Message-ID: On Mon, 2 Nov 2020 19:07:29 GMT, Thomas Stuefe wrote: > VMError::report_and_die() comes in a lot of overloads. These are unused: > > void report_and_die(const char* message, const char* detail_fmt, ...) > void report_and_die(const char* message); > > and can be removed. This pull request has now been integrated. Changeset: 9a367479 Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/9a367479 Stats: 16 lines in 2 files changed: 0 ins; 16 del; 0 mod 8255780: Remove unused overloads of VMError::report_and_die() Reviewed-by: mdoerr, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/1018 From shade at openjdk.java.net Tue Nov 3 08:10:00 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 3 Nov 2020 08:10:00 GMT Subject: RFR: 8255782: Turn UseTLAB and ResizeTLAB from product_pd to product, defaulting to "true" Message-ID: When doing Zero VM performance investigations, I realized that `UseTLAB` is disabled there by default. That is effectively because `UseTLAB` is currently `product_pd` (defined in `gc_globals.hpp`), and it is enabled for every platform with C1 and C2 ports (in their respective `c1/c2_globals.hpp`). `compiler_globals.hpp` has a block that defines `UseTLAB` to `false` when no C1/C2/JVMCI is present. Not only this is awkward -- GC flag is managed by Compiler globals! -- it makes Zero awkward to opt-in to `UseTLAB` in `*_zero_globals.hpp`, because `compiler_globals.hpp` already defines it. 
I think we can make this all better by turning TLAB flags from `product_pd` to `product`, and defaulting them to `true`. This matches what every current Server/Minimal VM config has, and would implicitly enable `UseTLAB` and `ResizeTLAB` for Zero, as well as for builds with `--with-jvm-features=-compiler1,-compiler-2,-jvmci` (in case anyone actually builds it, that is only with template interpreter). On the downside, this shuts the door for new platform ports to disable TLAB flags by default to ease porting. But I believe the same can be achieved by turning these flags off in `arguments.cpp` under the special platform defines, while TLAB enablement work is in progress. Additional testing: - [x] Linux x86_64 Zero ad-hoc runs - [x] Linux x86_64 `--with-jvm-features=-compiler1,-compiler-2,-jvmci,*` builds ------------- Commit messages: - Prototype Changes: https://git.openjdk.java.net/jdk/pull/1019/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1019&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255782 Stats: 25 lines in 12 files changed: 1 ins; 22 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1019.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1019/head:pull/1019 PR: https://git.openjdk.java.net/jdk/pull/1019 From stuefe at openjdk.java.net Tue Nov 3 08:37:56 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 3 Nov 2020 08:37:56 GMT Subject: RFR: 8255782: Turn UseTLAB and ResizeTLAB from product_pd to product, defaulting to "true" In-Reply-To: References: Message-ID: <7PsAR-PGqbin2lbvtr802PjP4thu4yHIS4q1ZZR3PGQ=.189b87e1-218a-419a-b713-f955170c423c@github.com> On Mon, 2 Nov 2020 19:28:29 GMT, Aleksey Shipilev wrote: > When doing Zero VM performance investigations, I realized that `UseTLAB` is disabled there by default. 
> > That is effectively because `UseTLAB` is currently `product_pd` (defined in `gc_globals.hpp`), and it is enabled for every platform with C1 and C2 ports (in their respective `c1/c2_globals.hpp`). `compiler_globals.hpp` has a block that defines `UseTLAB` to `false` when no C1/C2/JVMCI is present. > > Not only this is awkward -- GC flag is managed by Compiler globals! -- it makes Zero awkward to opt-in to `UseTLAB` in `*_zero_globals.hpp`, because `compiler_globals.hpp` already defines it. I think we can make this all better by turning TLAB flags from `product_pd` to `product`, and defaulting them to `true`. This matches what every current Server/Minimal VM config has, and would implicitly enable `UseTLAB` and `ResizeTLAB` for Zero, as well as for builds with `--with-jvm-features=-compiler1,-compiler-2,-jvmci` (in case anyone actually builds it, that is only with template interpreter). > > On the downside, this shuts the door for new platform ports to disable TLAB flags by default to ease porting. But I believe the same can be achieved by turning these flags off in `arguments.cpp` under the special platform defines, while TLAB enablement work is in progress. > > Additional testing: > - [x] Linux x86_64 Zero ad-hoc runs > - [x] Linux x86_64 `--with-jvm-features=-compiler1,-compiler-2,-jvmci,*` builds I think this is fine. New platforms could also just use NOT_platform ONLY_platform in gc_globals.hpp, which would be better grep-able. ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1019 From shade at openjdk.java.net Tue Nov 3 09:10:56 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 3 Nov 2020 09:10:56 GMT Subject: RFR: 8255762: Shenandoah: Consolidate/streamline interpreter LRBs In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 15:55:40 GMT, Roman Kennke wrote: > We currently have two LRB implementations in interpreter: one normal and one for native/weak LRB. 
We should consolidate them into one. > > The main issue here is that the two implementations follow different approaches: > - the normal LRB calls through the shenandoah_lrb stub which does additional null- and cset-checking > - the weak LRB calls to runtime directly and must not do cset-checking > > The reason for calling through the stub is that it gives more freedom to allocate two registers that are required for the cset check. However, we can invert the cset addressing like we did in JDK-8245465 and save a register. We can also eliminate the null-check and let the cset-check subsume it (like we do everywhere else). Allocating a single register for the cset-check is easy, and we can do so in-line without the extra jump through the stub. The runtime call through the stub has also been very costly: it dumps 2KB of register data on the stack at each call, which is very excessive. save_xmm_registers() should be more than enough (in fact, I am almost certain that this is excessive too, and we should only need to save/restore xmm0 - but not in this patch). Not needing to generate the call-stub is also helpful for backportability, because in jdk8-shenandoah we cannot do that. > > Testing: hotspot_gc_shenandoah (x86_64, x86_32) This looks nice. Could you please also run `hotspot_gc_shenandoah` on `aarch64`? src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp line 239: > 237: > 238: // Check for heap stability > 239: __ tbz(rscratch2, ShenandoahHeap::HAS_FORWARDED_BITPOS, not_fwded); `not_fwded` reads as if the object is not forwarded. Maybe it should be `heap_stable`? src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp line 307: > 305: assert(tmp1 != dst, ""); > 306: assert(tmp1 != src.base(), ""); > 307: assert(tmp1 != src.index(), ""); Is this just `assert_different_registers(tmp1, dst, src.base(), src.index())`, or am I missing something? ------------- Changes requested by shade (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1010 From shade at openjdk.java.net Tue Nov 3 09:15:54 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 3 Nov 2020 09:15:54 GMT Subject: RFR: 8255782: Turn UseTLAB and ResizeTLAB from product_pd to product, defaulting to "true" In-Reply-To: <7PsAR-PGqbin2lbvtr802PjP4thu4yHIS4q1ZZR3PGQ=.189b87e1-218a-419a-b713-f955170c423c@github.com> References: <7PsAR-PGqbin2lbvtr802PjP4thu4yHIS4q1ZZR3PGQ=.189b87e1-218a-419a-b713-f955170c423c@github.com> Message-ID: On Tue, 3 Nov 2020 08:34:45 GMT, Thomas Stuefe wrote: >> When doing Zero VM performance investigations, I realized that `UseTLAB` is disabled there by default. >> >> That is effectively because `UseTLAB` is currently `product_pd` (defined in `gc_globals.hpp`), and it is enabled for every platform with C1 and C2 ports (in their respective `c1/c2_globals.hpp`). `compiler_globals.hpp` has a block that defines `UseTLAB` to `false` when no C1/C2/JVMCI is present. >> >> Not only this is awkward -- GC flag is managed by Compiler globals! -- it makes Zero awkward to opt-in to `UseTLAB` in `*_zero_globals.hpp`, because `compiler_globals.hpp` already defines it. I think we can make this all better by turning TLAB flags from `product_pd` to `product`, and defaulting them to `true`. This matches what every current Server/Minimal VM config has, and would implicitly enable `UseTLAB` and `ResizeTLAB` for Zero, as well as for builds with `--with-jvm-features=-compiler1,-compiler-2,-jvmci` (in case anyone actually builds it, that is only with template interpreter). >> >> On the downside, this shuts the door for new platform ports to disable TLAB flags by default to ease porting. But I believe the same can be achieved by turning these flags off in `arguments.cpp` under the special platform defines, while TLAB enablement work is in progress. 
>> >> Additional testing: >> - [x] Linux x86_64 Zero ad-hoc runs >> - [x] Linux x86_64 `--with-jvm-features=-compiler1,-compiler-2,-jvmci,*` builds > > I think this is fine. New platforms could also just use NOT_platform ONLY_platform in gc_globals.hpp, which would be better grep-able. Thanks, @tstuefe. Need another reviewer. ------------- PR: https://git.openjdk.java.net/jdk/pull/1019 From stefank at openjdk.java.net Tue Nov 3 09:29:53 2020 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 3 Nov 2020 09:29:53 GMT Subject: RFR: 8255782: Turn UseTLAB and ResizeTLAB from product_pd to product, defaulting to "true" In-Reply-To: References: Message-ID: <1OzPeIS9fm-ju9MIajtY8pz_rf0NVtKiDeiTd29_zmc=.5800f906-fb0f-4080-b41f-0e39864fd2ae@github.com> On Mon, 2 Nov 2020 19:28:29 GMT, Aleksey Shipilev wrote: > When doing Zero VM performance investigations, I realized that `UseTLAB` is disabled there by default. > > That is effectively because `UseTLAB` is currently `product_pd` (defined in `gc_globals.hpp`), and it is enabled for every platform with C1 and C2 ports (in their respective `c1/c2_globals.hpp`). `compiler_globals.hpp` has a block that defines `UseTLAB` to `false` when no C1/C2/JVMCI is present. > > Not only this is awkward -- GC flag is managed by Compiler globals! -- it makes Zero awkward to opt-in to `UseTLAB` in `*_zero_globals.hpp`, because `compiler_globals.hpp` already defines it. I think we can make this all better by turning TLAB flags from `product_pd` to `product`, and defaulting them to `true`. This matches what every current Server/Minimal VM config has, and would implicitly enable `UseTLAB` and `ResizeTLAB` for Zero, as well as for builds with `--with-jvm-features=-compiler1,-compiler-2,-jvmci` (in case anyone actually builds it, that is only with template interpreter). > > On the downside, this shuts the door for new platform ports to disable TLAB flags by default to ease porting. 
But I believe the same can be achieved by turning these flags off in `arguments.cpp` under the special platform defines, while TLAB enablement work is in progress. > > Additional testing: > - [x] Linux x86_64 Zero ad-hoc runs > - [x] Linux x86_64 `--with-jvm-features=-compiler1,-compiler-2,-jvmci,*` builds Sounds good to me. (As usual with shared HotSpot code, remember to leave this open for a while to allow people in other timezones time to see it.) ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1019 From tschatzl at openjdk.java.net Tue Nov 3 09:44:17 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 3 Nov 2020 09:44:17 GMT Subject: RFR: 8253600: G1: Fully support pinned regions for full gc [v4] In-Reply-To: References: Message-ID: > Hi all, > > can I get reviews for this change that implements "proper" support for pinned regions in the G1 full collector? > > By proper I mean that at the end of gc, pinned regions contain the correct TAMS and bitmap markings under the TAMS so that dead objects within them are supported. > > Currently all (pinned) regions have their TAMS set to bottom() and their bitmap above TAMS cleared (at least logically :) ). This works as long as objects within these regions can't be dead, as is the case now: > - humongous regions are either live or fully reclaimed. > - all other pinned regions are archive regions at the moment that are always treated as fully live (and do not contain dead objects). > > This change is a requirement for fixing JDK-8253081 as some earlier change made it possible to have dead objects within open archive regions. It also enables supporting removal of gclocker use for g1, i.e. using region pinning. > > Based on the PR#808 (https://github.com/openjdk/jdk/pull/808). > > Testing: tier1-8, testing with prototype for region pinning, testing with prototype for JDK-8253081.
> Performance testing: no regressions > > Some comments for questions that might come up during review: > > - how does this work with the bitmaps now: > - at start of full gc the next bitmap is cleared > - full gc marks the next bitmap > - for all pinned regions, keep TAMS and top() (*), otherwise set TAMS to bottom > - swap bitmaps > - clear next bitmap for next marking > > (*) this means that from a usage POV pinned regions are considered full. This is inaccurate, but sufficient: full gc clears all remembered sets anyway, so we do not need that information for gc efficiency purposes to evacuate later. The next marking before old gen evacuation will update it to the correct values anyway. G1 does not support allocation into "holes" in pinned regions that can be open archive only at this time too, so there is no need to be more exact. > > - use of a region attribute table for phase 2+ only: compared to before we need fast access to information whether a given reference goes into a pinned region (as opposed to an archive region) with respect to adjusting that pointer to avoid doing work for these references. > > Phase 1 marking could have used this information for the do-we-need-to-preserve-the-mark check too: however this would have required g1 to add another pass over all regions to update that. This seemed slower than just checking this information "more slowly" for the objects that need mark preservation. Tests showed that this is the case for <0.00% (yeah, these references that need mark preservation are rounding errors in cases it matters) of overall references, so I did not add that pass. > (Additionally g1 full gc is a last-ditch effort, and while marking takes a significant time, it does not completely dominate it). > > I.e. the second clause in the condition of this hunk is intentionally slower than it could be: > @@ -52,7 +52,9 @@ inline bool G1FullGCMarker::mark_object(oop obj) { > // Marked by us, preserve if needed.
> markWord mark = obj->mark(); > if (obj->mark_must_be_preserved(mark) && > // It is not necessary to preserve marks for objects in pinned regions because > // we do not change their headers (i.e. forward them). > !G1CollectedHeap::heap()->heap_region_containing(obj)->is_pinned()) { > preserved_stack()->push(obj, mark); > } > - there is no code yet that checks for empty pinned regions. Only JDK-8253081 introduces that because all contents of all archive regions are still live forever. > > Also please note that the 51b297b change is from the #808 change. > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: sjohanss review 2 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/824/files - new: https://git.openjdk.java.net/jdk/pull/824/files/dd487a91..15db3776 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=824&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=824&range=02-03 Stats: 63 lines in 9 files changed: 55 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/824.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/824/head:pull/824 PR: https://git.openjdk.java.net/jdk/pull/824 From dholmes at openjdk.java.net Tue Nov 3 09:44:58 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 3 Nov 2020 09:44:58 GMT Subject: RFR: 8253742: POSIX signal code cleanup In-Reply-To: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> References: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> Message-ID: On Mon, 2 Nov 2020 20:13:49 GMT, Thomas Stuefe wrote: >> hi all, >> >> Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup.
This change addresses them all: >> >> #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code >> >> #2 David's feedback - used unblock_program_error_signals() on all platforms (this change is superseded by JDK-8252533, will need to merge) >> >> #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms >> >> #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp >> >> #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include >> >> #6 Coleen's feedback - factored out print_signal_handlers() >> >> #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() >> >> #8 Thomas's feedback - factored out common POSIX signal initialization code >> >> #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API >> >> #10 YaSuenag's feedback - unified logging out of the scope for this fix >> >> #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? > > Hi Gerard, > > good job! > > But we really should synchronize :) > > I am currently working on: https://bugs.openjdk.java.net/browse/JDK-8255711 > (see Draft PR: https://github.com/openjdk/jdk/pull/982) > > and https://bugs.openjdk.java.net/browse/JDK-8252533 is also still open (see https://github.com/openjdk/jdk/pull/839) - still waiting for David's final OK. > > So unfortunately there are a number of clashes with your change: > > - "JVM_handle_xxx_signal()": See my mail to jdk-dev from this morning: https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html - we can either get rid of this function completely, which is what my current Draft for JDK-8255711 does. Or we need to retain it for backward compatibility, but if so, we need to retain it with the current interface.
> > Either way, could you please withhold changes to the hotspot signal handlers for the moment (so, both javaSignalHandler() and the various JVM_handle_xxx_signal() functions)? > > - I removed some functions you changed - all that signal blocking mask stuff. See https://github.com/openjdk/jdk/pull/839. Could you hold any changes to those functions until JDK-8252533 is out of the door? Hope to do this tomorrow, with your and David's approval. > > The unification of the signal handler printing stuff and the SR initialization make sense and are a good simplification. > > Please find further remarks inline. > > Cheers, Thomas I hadn't realized that JVM_handle_XXX_signal defined a per-platform "public" entry point to allow external callers of the signal handling function in conjunction with -XX:+AllowUserSignalHandlers. We need to keep these but they can each call JVM_handle_posix_signal as their implementation. ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From tschatzl at openjdk.java.net Tue Nov 3 09:57:56 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 3 Nov 2020 09:57:56 GMT Subject: RFR: 8253600: G1: Fully support pinned regions for full gc [v3] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 14:04:37 GMT, Stefan Johansson wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into 8253600-full-gc-pinned-region-support >> - Merge branch 'master' into 8253600-full-gc-pinned-region-support >> - sjohanss review >> >> Also remove _archive_allocator_map et al as the new attribute table >> implements the same functionality also suggested by sjohanss in >> private. >> - Initial import >> - Initial import > > Thanks Thomas for addressing my concerns around the archive-map and the new region-attr overlap. I think this looks much better. > > Just a few additional comments.
- I'll file a CR for that investigation. In my recent measurements of a different full-iterate-all-regions closure we are talking about something < 0.0x ms. - added an `is_in_pinned` method, and since now we've got three not-completely-trivial methods added a `.inline.hpp` file. I did not see need for moving the completely trivial getters returning just a member variable there, but tell me if I should move them too. - moved "implementation" of `CollectedHeap::is_archived_object` to .cpp file ------------- PR: https://git.openjdk.java.net/jdk/pull/824 From stuefe at openjdk.java.net Tue Nov 3 10:08:00 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 3 Nov 2020 10:08:00 GMT Subject: RFR: 8253742: POSIX signal code cleanup In-Reply-To: References: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> Message-ID: On Tue, 3 Nov 2020 09:42:19 GMT, David Holmes wrote: > I hadn't realized that JVM_handle_XXX_signal defined a per-platform "public" entry point to allow external callers of the signal handling function in conjunction with -XX:+AllowUserSignalHandlers. We need to keep these but they can each call JVM_handle_posix_signal as their implementation. We should disentangle https://bugs.openjdk.java.net/browse/JDK-8255711 and this patch, https://bugs.openjdk.java.net/browse/JDK-8253742. I started by giving my patch a less generic name ("Fix and unify hotspot signal handlers"). I propose to do the same with this patch, or even split this patch into two smaller parts, since it does two things: - unify diagnostic printing code - unify SR handler setup As I wrote, I'd prefer to keep changes to JVM_xxx and javaSignalHandler out of this patch completely. I have to change those functions since the point of my patch is signal handler unification. In turn, I will keep my hands off any other code in signals_posix.xxx to decrease chances of conflict with this patch. 
------------- PR: https://git.openjdk.java.net/jdk/pull/636 From stuefe at openjdk.java.net Tue Nov 3 10:11:01 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 3 Nov 2020 10:11:01 GMT Subject: RFR: 8253742: POSIX signal code cleanup In-Reply-To: References: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> Message-ID: On Tue, 3 Nov 2020 10:05:03 GMT, Thomas Stuefe wrote: > > I hadn't realized that JVM_handle_XXX_signal defined a per-platform "public" entry point to allow external callers of the signal handling function in conjunction with -XX:+AllowUserSignalHandlers. We need to keep these but they can each call JVM_handle_posix_signal as their implementation. > > We should disentangle https://bugs.openjdk.java.net/browse/JDK-8255711 and this patch, https://bugs.openjdk.java.net/browse/JDK-8253742. > > I started by giving my patch a less generic name ("Fix and unify hotspot signal handlers"). I propose to do the same with this patch, or even split this patch into two smaller parts, since it does two things: > > * unify diagnostic printing code > * unify SR handler setup > > As I wrote, I'd prefer to keep changes to JVM_xxx and javaSignalHandler out of this patch completely. I have to change those functions since the point of my patch is signal handler unification. > > In turn, I will keep my hands off any other code in signals_posix.xxx to decrease chances of conflict with this patch. Oh, and yes, I preserve the JVM_handle_xxx_signal entries in my patch, but they are thin wrappers around an internal, posix-specific handler function. 
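The arrangement described above, where each per-platform `JVM_handle_xxx_signal` entry point stays exported but only forwards to one shared handler, could look roughly like the following sketch. The shared function name `posix_signal_handler_sketch` and its placeholder body are hypothetical and are not taken from either patch; only the idea of thin forwarding wrappers comes from the thread.

```cpp
#include <signal.h>

// Hypothetical stand-in for the internal, POSIX-wide handler. It returns
// nonzero when the VM recognizes the signal. The body is a placeholder:
// a real handler would inspect the faulting pc, stubs, thread state, etc.
static int posix_signal_handler_sketch(int sig, siginfo_t* info, void* ucontext,
                                       int abort_if_unrecognized) {
  (void)info;
  (void)ucontext;
  (void)abort_if_unrecognized;  // a real handler would report a fatal error here
  return (sig == SIGSEGV || sig == SIGBUS) ? 1 : 0;
}

// The per-platform entry points keep their exported names and signatures so
// that external callers (for example user handlers used together with
// -XX:+AllowUserSignalHandlers) keep linking. Each one only forwards.
extern "C" int JVM_handle_linux_signal(int sig, siginfo_t* info, void* ucontext,
                                       int abort_if_unrecognized) {
  return posix_signal_handler_sketch(sig, info, ucontext, abort_if_unrecognized);
}

extern "C" int JVM_handle_bsd_signal(int sig, siginfo_t* info, void* ucontext,
                                     int abort_if_unrecognized) {
  return posix_signal_handler_sketch(sig, info, ucontext, abort_if_unrecognized);
}
```

With this shape, changes to the signal handling logic touch only the shared function, while the exported interface stays as it is for existing external callers.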
------------- PR: https://git.openjdk.java.net/jdk/pull/636 From aph at openjdk.java.net Tue Nov 3 10:22:02 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 3 Nov 2020 10:22:02 GMT Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v4] In-Reply-To: <_p5rQggkG8fBak9RrEDnoOzRnl8hsezlFjNmN_MFRiU=.c4167eaf-79f4-45eb-9fa9-2d0613274392@github.com> References: <_p5rQggkG8fBak9RrEDnoOzRnl8hsezlFjNmN_MFRiU=.c4167eaf-79f4-45eb-9fa9-2d0613274392@github.com> Message-ID: On Tue, 3 Nov 2020 07:01:12 GMT, Dong Bo wrote: >> Base64.encodeBlock stub is implemented for x86_64. >> We can also do the same thing for aarch64 with SIMD LD3/ST4/TBL. >> A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords. >> >> Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build. >> Tests in test/jdk/java/util/Base64/* were also run specifically to check the correctness of the implementation, and they passed. >> >> A JMH micro, Base64Encode.java, is added for performance testing. >> With different input lengths (upper-bounded by parameter `maxNumBytes` in the JMH micro), >> we witness ~4x improvement with long input and no regression with short input on Kunpeng916 and Kunpeng920. >> >> The Base64Encode.java JMH micro-benchmark results: >> Benchmark (maxNumBytes) Mode Cnt Score Error Units >> # kunpeng 916, intrinsic >> Base64Encode.testBase64Encode 1 avgt 10 31.564 ± 0.034 ns/op >> Base64Encode.testBase64Encode 2 avgt 10 33.921 ± 0.362 ns/op >> Base64Encode.testBase64Encode 3 avgt 10 38.015 ± 0.220 ns/op >> Base64Encode.testBase64Encode 6 avgt 10 41.115 ± 0.281 ns/op >> Base64Encode.testBase64Encode 7 avgt 10 42.161 ± 0.630 ns/op >> Base64Encode.testBase64Encode 9 avgt 10 44.797 ± 0.849 ns/op >> Base64Encode.testBase64Encode 10 avgt 10 46.013 ± 0.917 ns/op >> Base64Encode.testBase64Encode 48 avgt 10 67.984 ± 0.777 ns/op >> Base64Encode.testBase64Encode 512 avgt 10 174.494 ±
1.614 ns/op >> Base64Encode.testBase64Encode 1000 avgt 10 277.103 ? 0.306 ns/op >> Base64Encode.testBase64Encode 20000 avgt 10 4261.018 ? 1.883 ns/op >> >> # kunpeng 916, default >> Base64Encode.testBase64Encode 1 avgt 10 31.710 ? 0.234 ns/op >> Base64Encode.testBase64Encode 2 avgt 10 33.978 ? 0.305 ns/op >> Base64Encode.testBase64Encode 3 avgt 10 40.059 ? 0.444 ns/op >> Base64Encode.testBase64Encode 6 avgt 10 47.958 ? 0.328 ns/op >> Base64Encode.testBase64Encode 7 avgt 10 49.017 ? 1.305 ns/op >> Base64Encode.testBase64Encode 9 avgt 10 53.150 ? 0.769 ns/op >> Base64Encode.testBase64Encode 10 avgt 10 55.418 ? 0.316 ns/op >> Base64Encode.testBase64Encode 48 avgt 10 93.517 ? 0.391 ns/op >> Base64Encode.testBase64Encode 512 avgt 10 494.809 ? 0.413 ns/op >> Base64Encode.testBase64Encode 1000 avgt 10 898.581 ? 0.944 ns/op >> Base64Encode.testBase64Encode 20000 avgt 10 16464.411 ? 7.582 ns/op >> >> # kunpeng 920, intrinsic >> Base64Encode.testBase64Encode 1 avgt 10 17.494 ? 0.012 ns/op >> Base64Encode.testBase64Encode 2 avgt 10 21.023 ? 0.169 ns/op >> Base64Encode.testBase64Encode 3 avgt 10 25.772 ? 0.138 ns/op >> Base64Encode.testBase64Encode 6 avgt 10 30.121 ? 0.347 ns/op >> Base64Encode.testBase64Encode 7 avgt 10 31.591 ? 0.238 ns/op >> Base64Encode.testBase64Encode 9 avgt 10 32.728 ? 0.395 ns/op >> Base64Encode.testBase64Encode 10 avgt 10 35.110 ? 0.215 ns/op >> Base64Encode.testBase64Encode 48 avgt 10 48.621 ? 0.314 ns/op >> Base64Encode.testBase64Encode 512 avgt 10 113.391 ? 0.554 ns/op >> Base64Encode.testBase64Encode 1000 avgt 10 180.749 ? 0.193 ns/op >> Base64Encode.testBase64Encode 20000 avgt 10 3273.961 ? 5.706 ns/op >> >> # kunpeng 920, default >> Base64Encode.testBase64Encode 1 avgt 10 17.428 ? 0.037 ns/op >> Base64Encode.testBase64Encode 2 avgt 10 20.926 ? 0.155 ns/op >> Base64Encode.testBase64Encode 3 avgt 10 25.466 ? 0.140 ns/op >> Base64Encode.testBase64Encode 6 avgt 10 32.526 ? 0.190 ns/op >> Base64Encode.testBase64Encode 7 avgt 10 34.132 ? 
0.387 ns/op >> Base64Encode.testBase64Encode 9 avgt 10 36.685 ? 0.212 ns/op >> Base64Encode.testBase64Encode 10 avgt 10 38.117 ? 0.246 ns/op >> Base64Encode.testBase64Encode 48 avgt 10 62.447 ? 0.900 ns/op >> Base64Encode.testBase64Encode 512 avgt 10 377.275 ? 0.162 ns/op >> Base64Encode.testBase64Encode 1000 avgt 10 700.628 ? 0.509 ns/op >> Base64Encode.testBase64Encode 20000 avgt 10 13626.764 ? 3.448 ns/op > > Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into aarch64-base64-encoding > - reconstructed the macro as function generate_base64_encode_simdround > - change register name sp and unpack the macro > - Merge branch 'master' into aarch64-base64-encoding > - 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5483: > 5481: Register codec = rscratch1; > 5482: Register length = rscratch2; > 5483: Alias names for scratch registers have proved to be risky because assembler macros use scratch registers freely. A maintenance programmer might not to see this code uses rscratch1 and 2. Given that c_rarg6 and 7 are free, please use them. ------------- PR: https://git.openjdk.java.net/jdk/pull/992 From shade at openjdk.java.net Tue Nov 3 10:22:14 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 3 Nov 2020 10:22:14 GMT Subject: RFR: 8255617: Zero: purge the remaining bytecode interpreter profiling support [v2] In-Reply-To: References: Message-ID: > All the stubs in `interpreter/zero/bytecodeInterpreterProfiling.hpp` are empty. History shows the whole thing gradually moved to template interpreter. We can probably simplify Zero code by dropping these empty stubs altogether. 
> Arguably, this makes porting to new architectures a bit harder, but it seems that the proper stepping stone after Zero is implementing template interpreter anyway.
>
> On my TR 3970X, this improves:
> - Linux x86_64 Zero "make images" times from ~18 minutes to ~17.5 minutes
>
> I would like to have the opinion of @GoeLin who added this for PPC64 porting back in 8u. And probably @DamonFool who is usually interested in Zero. And @jerboaa, @gnu-andrew who deal with Zero from time to time.

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:

- Merge branch 'master' into JDK-8255617-zero-purge-bi-profiliing
- Also remove now unused BytecodeInterpreter::mdx field
- Remove leftover variable
- Remove leftover comments
- 8255617: Zero: purge the remaining bytecode interpreter profiling support

-------------

Changes:
- all: https://git.openjdk.java.net/jdk/pull/944/files
- new: https://git.openjdk.java.net/jdk/pull/944/files/2bcc4e31..15df2843

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=944&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=944&range=00-01

Stats: 5456 lines in 267 files changed: 3019 ins; 1050 del; 1387 mod
Patch: https://git.openjdk.java.net/jdk/pull/944.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/944/head:pull/944

PR: https://git.openjdk.java.net/jdk/pull/944

From eosterlund at openjdk.java.net Tue Nov 3 10:25:00 2020
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 3 Nov 2020 10:25:00 GMT
Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v2]
In-Reply-To:
References: <3rZagetDi1APoH1Gg2q4Z5mVPYjk0vBHwDnnhuh7d6M=.da07fe0e-91d6-43cb-b7c7-f3f9daedb931@github.com> <0heSKANZ4kJqlQOKY6MCs6cSZvT8-KRFRbbbKTlzNzA=.7224a164-a76d-47ce-8016-9db052b09773@github.com>
Message-ID:
On Mon, 2 Nov 2020 16:22:57 GMT, Coleen Phillimore wrote: >> src/hotspot/share/prims/jvmtiTagMap.cpp line 345: >> >>> 343: >>> 344: // Check if we have to process for concurrent GCs. >>> 345: check_hashmap(false); >> >> Maybe add a comment stating the parameter name, as was done in other callsites for check_hashmap. > > Ok, will I run afoul of the ZGC people putting the parameter name after the parameter and the rest of the code, it is before? ZGC people passionately place the comment after the argument value. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From stefank at openjdk.java.net Tue Nov 3 10:36:57 2020 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 3 Nov 2020 10:36:57 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 12:58:49 GMT, Coleen Phillimore wrote: > GC callers shouldn't really have to know what processing we're doing here. I completely disagree with this. It's extremely important that the GC and Runtime code agrees on what this code does and where the GC *must* call it. Knowing the details allows us to skip calling this after mark end, but forces us to call it in relocate start, when objects should start to move. Though, I don't want to block this review because of this point, so if you still thinks that a non-descriptive name is better then we can argue that separately. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From shade at openjdk.java.net Tue Nov 3 10:40:05 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 3 Nov 2020 10:40:05 GMT Subject: RFR: 8255617: Zero: purge the remaining bytecode interpreter profiling support [v2] In-Reply-To: References: Message-ID: <4LnzknOoUgbWxRwRBCjaY6sOBoT2xNyhRzXWgGJPTE8=.558d9fb9-e296-44a1-80f1-501846bcb254@github.com> On Tue, 3 Nov 2020 08:49:11 GMT, Jie Fu wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8255617-zero-purge-bi-profiliing >> - Also remove now unused BytecodeInterpreter::mdx field >> - Remove leftover variable >> - Remove leftover comments >> - 8255617: Zero: purge the remaining bytecode interpreter profiling support > > Marked as reviewed by jiefu (Committer). I also figured we don't need `mdx` field in `BytecodeInterpreter` now. Removed. Plus a few comments/variables removed. Please take a look again, if you can. ------------- PR: https://git.openjdk.java.net/jdk/pull/944 From stefank at openjdk.java.net Tue Nov 3 10:44:05 2020 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 3 Nov 2020 10:44:05 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v2] In-Reply-To: <3rZagetDi1APoH1Gg2q4Z5mVPYjk0vBHwDnnhuh7d6M=.da07fe0e-91d6-43cb-b7c7-f3f9daedb931@github.com> References: <3rZagetDi1APoH1Gg2q4Z5mVPYjk0vBHwDnnhuh7d6M=.da07fe0e-91d6-43cb-b7c7-f3f9daedb931@github.com> Message-ID: <0DUaLOt_27wPkI2SwP4BwykioL4Hn2c-j7hMz3AbHYI=.2270cb60-bd2f-4d72-abc8-cd8ea44d30b5@github.com> On Mon, 2 Nov 2020 15:58:15 GMT, Coleen Phillimore wrote: >> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. >> >> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. >> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. 
In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. >> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. >> >> It has also been tested with tier1-6. >> >> Thank you to Stefan, Erik and Kim for their help with this change. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Code review comments from StefanK. Some more nit-picking to make the code more consistent. 
src/hotspot/share/prims/jvmtiTagMapTable.cpp line 52:

> 50: : Hashtable(_table_size, sizeof(JvmtiTagMapEntry)) {}
> 51:
> 52:

Double whitespace

src/hotspot/share/prims/jvmtiTagMapTable.cpp line 185:

> 183: // Serially remove unused oops from the table, and notify jvmti.
> 184: void JvmtiTagMapTable::unlink_and_post(JvmtiEnv* env) {
> 185:

Stray newline

src/hotspot/share/prims/jvmtiTagMapTable.cpp line 224:

> 222: // Rehash oops in the table
> 223: void JvmtiTagMapTable::rehash() {
> 224:

Stray newline

src/hotspot/share/prims/jvmtiTagMapTable.hpp line 75:

> 73:
> 74: void resize_if_needed();
> 75: public:

Newline between

src/hotspot/share/prims/jvmtiTagMapTable.hpp line 100:

> 98: };
> 99:
> 100:

Double newline

src/hotspot/share/prims/jvmtiTagMapTable.cpp line 258:

> 256: int rehash_len = moved_entries.length();
> 257: // Now add back in the entries that were removed.
> 258: for (int i = 0; i < moved_entries.length(); i++) {

rehash_len is read, but not used in for loop condition.

src/hotspot/share/prims/jvmtiTagMapTable.cpp line 165:

> 163: }
> 164: }
> 165: const int _resize_load_trigger = 5; // load factor that will trigger the resize

Newline between

-------------

Marked as reviewed by stefank (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/967

From volker.simonis at gmail.com Tue Nov 3 10:51:37 2020
From: volker.simonis at gmail.com (Volker Simonis)
Date: Tue, 3 Nov 2020 11:51:37 +0100
Subject: RFR: 8255616: Disable AOT and Graal in Oracle OpenJDK
In-Reply-To:
References:
Message-ID:

Hi Vladimir,

this is an interesting step and I wonder how it affects the OpenJDK Graal, Metropolis and Leyden projects?

- Project Graal [1] seems to have already been merged into project Metropolis as it states on its project page: "Further work on integrating Graal in the OpenJDK has moved to Project Metropolis."
- Project Metropolis [2] has the following mission statement on its project page: "The goal of this Project is to provide a venue to explore and incubate advanced "Java-on-Java" implementation techniques for HotSpot. Our starting point is earlier proposals for using the Graal compiler and AOT static compilation technology to replace the HotSpot server compiler, and possibly other components of HotSpot." It seems that this goal becomes void when Graal AOT and Graal JIT are abandoned in the OpenJDK.

- Project Leyden [?]: @Mark: what's actually the state of Project Leyden? We had a discussion [3], a vote [4] and the approval of the project [5] yet nothing has happened ever since. There's neither a project page nor a mailing list. Considering the fact that Leyden was supposed to "be based upon existing components in the JDK such as the HotSpot JVM, the `jaotc` ahead-of-time compiler, application class-data sharing, and the `jlink` linking tool" I wonder if Leyden is already dead before its instantiation if "jaotc", one of its core components, has now been deprecated? Or are there any plans to enhance C2 for AOT scenarios?

Thank you and best regards,
Volker

[1] http://openjdk.java.net/projects/graal/
[2] http://openjdk.java.net/projects/metropolis/
[3] https://mail.openjdk.java.net/pipermail/discuss/2020-April/005429.html
[4] https://mail.openjdk.java.net/pipermail/discuss/2020-May/005475.html
[5] https://mail.openjdk.java.net/pipermail/announce/2020-June/000290.html

On Fri, Oct 30, 2020 at 6:47 PM Vladimir Kozlov wrote:
>
> We shipped Ahead-of-Time compilation (the jaotc tool) in JDK 9, as an experimental feature. We shipped Graal as an experimental JIT compiler in JDK 10. We haven't seen much use of these features, and the effort required to support and enhance them is significant. We therefore intend to disable these features in Oracle builds as of JDK 16.
> > We'll leave the sources for these features in the repository, in case any one else is interested in building them. But we will not update or test them. > > We'll continue to build and ship JVMCI as an experimental feature in Oracle builds. > > Tested changes in all tiers. > > I verified that with these changes I still able to build Graal in open repo and run graalunit testing: > > `open$ bash test/hotspot/jtreg/compiler/graalunit/downloadLibs.sh /mydir/graalunit_lib/` > `open$ bash configure --with-debug-level=fastdebug --with-graalunit-lib=/mydir/graalunit_lib/ --with-jtreg=/mydir/jtreg` > `open$ make jdk-image` > `open$ make test-image` > `open$ make run-test TEST=compiler/graalunit/HotspotTest.java` > > ------------- > > Commit messages: > - 8255616: Disable AOT and Graal in Oracle OpenJDK > > Changes: https://git.openjdk.java.net/jdk/pull/960/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=960&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8255616 > Stats: 36 lines in 4 files changed: 21 ins; 11 del; 4 mod > Patch: https://git.openjdk.java.net/jdk/pull/960.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/960/head:pull/960 > > PR: https://git.openjdk.java.net/jdk/pull/960 From sjohanss at openjdk.java.net Tue Nov 3 11:01:56 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 3 Nov 2020 11:01:56 GMT Subject: RFR: 8253600: G1: Fully support pinned regions for full gc [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 09:44:17 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I get reviews for this change that implements "proper" support for pinned regions in the G1 full collector? >> >> By proper I mean that at the end of gc, pinned regions contain the correct TAMS and bitmap markings under the TAMS so that dead objects within them are supported? >> >> Currently all (pinned) regions have their TAMS set to bottom() and their bitmap above TAMS cleared (at least logically :) ). 
This works as long objects within these regions can't be dead as it is the case now: >> - humongous regions are either live or fully reclaimed. >> - all other pinned regions are archive regions at the moment that are always treated as fully live (and do not contain dead objects). >> >> This change is a requirement for fixing JDK-8253081 as some earlier change made it possible to have dead objects within open archive regions. It also enables supporting removal of gclocker use for g1, i.e. using region pinning. >> >> Based on the PR#808 (https://github.com/openjdk/jdk/pull/808). >> >> Testing: tier1-8, testing with prototype for region pinning, testing with prototype for JDK-8253081. >> Performance testing: no regressions >> >> Some comments for questions that might come up during review: >> >> - how does this work with the bitmaps now: >> - at start of full gc the next bitmap is cleared >> - full gc marks the next bitmap >> - for all pinned regions, keep TAMS and top() (*), otherwise set TAMS to bottom >> - swap bitmaps >> - clear next bitmap for next marking >> >> (*) this means that from a usage POV pinned regions are considered full. This is inaccurate, but sufficient: full gc clears all remembered sets anyway, so we do not need that information for gc efficiency purposes anyway to evacuate later. The next marking before old gen evacuation will update it to the correct values anyway. G1 does not support allocation into "holes" in pinned regions that can be open archive only at this time too, so there is no need to be more exact. >> >> - use of a region attribute table for phase 2+ only: compared to before we need fast access to information whether a given reference goes into a pinned region (as opposed to an archive region) wrt to adjusting that pointer to avoid doing work for these references. 
>> >> Phase 1 marking could have used this information for the do-we-need-to-preserve-the-mark check too: however this would have required g1 to add an extra another pass over all regions to update that. This seemed slower than just checking this information "more slowly" for the objects that need mark preservation. Tests showed that this is the case for <0.00% (yeah, these references that need mark preservation are rounding errors in cases it matters) of overall references, so I did not add that pass. >> (Additionally g1 full gc is a last-ditch effort, and while marking takes a significant time, it does not completely dominate it). >> >> I.e. the second clause in the condition of this hunk is intentionally slower than could be: >> @@ -52,7 +52,9 @@ inline bool G1FullGCMarker::mark_object(oop obj) { >> // Marked by us, preserve if needed. >> markWord mark = obj->mark(); >> if (obj->mark_must_be_preserved(mark) && >> // It is not necessary to preserve marks for objects in pinned regions because >> // we do not change their headers (i.e. forward them). >> !G1CollectedHeap::heap()->heap_region_containing(obj)->is_pinned()) { >> preserved_stack()->push(obj, mark); >> } >> - there is no code yet that checks for empty pinned regions yet. Only JDK-8253081 introduces that because still all contents of all archive regions are live forever. >> >> Also please note that the 51b297b change is from the #808 change. >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > sjohanss review 2 This looks great, thanks. ------------- Marked as reviewed by sjohanss (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/824 From rkennke at openjdk.java.net Tue Nov 3 11:11:11 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 3 Nov 2020 11:11:11 GMT Subject: RFR: 8255762: Shenandoah: Consolidate/streamline interpreter LRBs [v2] In-Reply-To: References: Message-ID: > We currently have two LRB implementations in interpreter: one normal and one for native/weak LRB. We should consolidate them into one. > > The main issue here is that the two implementations follow different approaches: > - the normal LRB calls through the shenandoah_lrb stub which does additional null- and cset-checking > - the weak LRB calls to runtime directly and must not do cset-checking > > The reason for calling through the stub is that it gives more freedom to allocate two registers that are required for the cset check. However, we can invert the cset addressing like we did in JDK-8245465 and save a register. We can also eliminate the null-check and let the cset-check subsume it (like we do everywhere else). Allocating a single register for the cset-check is easy, and we can do so in-line without the extra jump through the stub. The runtime call through the stub has also been very costly: it dumps 2KB of register data on the stack at each call, that is very excessive. save_xmm_registers() should be more than enough (in-fact, I am almost certain that this is excessive too, and we should only need to save/restore xmm0 - but not in this patch). Not needing to generate the call-stub is also helpful for backportability, because in jdk8-shenandoah we cannot do that. 
>
> Testing: hotspot_gc_shenandoah (x86_64, x86_32)

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

A few touch-ups

-------------

Changes:
- all: https://git.openjdk.java.net/jdk/pull/1010/files
- new: https://git.openjdk.java.net/jdk/pull/1010/files/6573511d..12382e6c

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1010&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1010&range=00-01

Stats: 9 lines in 2 files changed: 0 ins; 1 del; 8 mod
Patch: https://git.openjdk.java.net/jdk/pull/1010.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/1010/head:pull/1010

PR: https://git.openjdk.java.net/jdk/pull/1010

From rkennke at openjdk.java.net Tue Nov 3 11:15:57 2020
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Tue, 3 Nov 2020 11:15:57 GMT
Subject: RFR: 8255762: Shenandoah: Consolidate/streamline interpreter LRBs [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 3 Nov 2020 09:04:46 GMT, Aleksey Shipilev wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>> A few touch-ups
>
> src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp line 307:
>
>> 305: assert(tmp1 != dst, "");
>> 306: assert(tmp1 != src.base(), "");
>> 307: assert(tmp1 != src.index(), "");
>
> Is this just `assert_different_registers(tmp1, dst, src.base(), src.index())`, or am I missing something?

Not quite. dst can still legitimately be src.base(). Also, assert_different_registers() is somewhat impractical because it fails in the wrong place. Anyhow, I rewrote it to be more succinct.
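
The aliasing point in the reply above can be illustrated with a small stand-alone sketch. It is not HotSpot code: `Reg`, `all_different` and `tmp_is_free` are hypothetical stand-ins, and `all_different` only mirrors the general idea of an all-pairs "different registers" check rather than the real assertion. The sketch shows that an all-pairs check rejects the case where dst aliases the base register, while checking only the temporary against the others accepts it.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for a machine register; the real Register type is richer.
struct Reg {
  int encoding;
};

inline bool operator==(Reg a, Reg b) { return a.encoding == b.encoding; }
inline bool operator!=(Reg a, Reg b) { return !(a == b); }

// All-pairs check: true only if no two registers in the list are the same.
inline bool all_different(std::initializer_list<Reg> regs) {
  for (const Reg* i = regs.begin(); i != regs.end(); ++i) {
    for (const Reg* j = i + 1; j != regs.end(); ++j) {
      if (*i == *j) return false;
    }
  }
  return true;
}

// Weaker check: only the temporary must differ from the other three;
// dst, base and index are allowed to alias each other.
inline bool tmp_is_free(Reg tmp, Reg dst, Reg base, Reg index) {
  return tmp != dst && tmp != base && tmp != index;
}
```

With `dst` and `base` sharing an encoding, `tmp_is_free` holds as long as the temporary is distinct, whereas `all_different` over the same four registers does not.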
-------------

PR: https://git.openjdk.java.net/jdk/pull/1010

From dongbo at openjdk.java.net Tue Nov 3 11:57:16 2020
From: dongbo at openjdk.java.net (Dong Bo)
Date: Tue, 3 Nov 2020 11:57:16 GMT
Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v5]
In-Reply-To:
References:
Message-ID:

> Base64.encodeBlock stub is implemented for x86_64.
> We can also do the same thing for aarch64 with SIMD LD3/ST4/TBL.
> A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
>
> Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build.
> Tests in test/jdk/java/util/Base64/* were run specifically to check the correctness of the implementation, and passed.
>
> A JMH micro, Base64Encode.java, is added for performance test.
> With different input length (upper-bounded by parameter `maxNumBytes` in the JMH micro),
> we witness ~4x improvement with long input and no regression with short input on Kunpeng916 and Kunpeng920.
>
> The Base64Encode.java JMH micro-benchmark results:
> Benchmark                      (maxNumBytes)  Mode  Cnt      Score   Error  Units
> # kunpeng 916, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10     31.564 ± 0.034  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     33.921 ± 0.362  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     38.015 ± 0.220  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     41.115 ± 0.281  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     42.161 ± 0.630  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     44.797 ± 0.849  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     46.013 ± 0.917  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     67.984 ± 0.777  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    174.494 ± 1.614  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    277.103 ± 0.306  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10   4261.018 ± 1.883  ns/op
>
> # kunpeng 916, default
> Base64Encode.testBase64Encode              1  avgt   10     31.710 ± 0.234  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     33.978 ± 0.305  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     40.059 ± 0.444  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     47.958 ± 0.328  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     49.017 ± 1.305  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     53.150 ± 0.769  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     55.418 ± 0.316  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     93.517 ± 0.391  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    494.809 ± 0.413  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    898.581 ± 0.944  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  16464.411 ± 7.582  ns/op
>
> # kunpeng 920, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10     17.494 ± 0.012  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     21.023 ± 0.169  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     25.772 ± 0.138  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     30.121 ± 0.347  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     31.591 ± 0.238  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     32.728 ± 0.395  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     35.110 ± 0.215  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     48.621 ± 0.314  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    113.391 ± 0.554  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    180.749 ± 0.193  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10   3273.961 ± 5.706  ns/op
>
> # kunpeng 920, default
> Base64Encode.testBase64Encode              1  avgt   10     17.428 ± 0.037  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     20.926 ± 0.155  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     25.466 ± 0.140  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     32.526 ± 0.190  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     34.132 ± 0.387  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     36.685 ± 0.212  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     38.117 ± 0.246  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     62.447 ± 0.900  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    377.275 ± 0.162  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    700.628 ± 0.509  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  13626.764 ± 3.448  ns/op

Dong Bo has updated the pull request incrementally with one additional commit since the last revision:

use r6/r7 instead of scratch registers

-------------

Changes:
- all: https://git.openjdk.java.net/jdk/pull/992/files
- new: https://git.openjdk.java.net/jdk/pull/992/files/e5c50ffd..6d6103c5

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=04
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=03-04

Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
Patch: https://git.openjdk.java.net/jdk/pull/992.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/992/head:pull/992

PR: https://git.openjdk.java.net/jdk/pull/992

From dongbo at openjdk.java.net Tue Nov 3 11:57:19 2020
From: dongbo at openjdk.java.net (Dong Bo)
Date: Tue, 3 Nov 2020 11:57:19 GMT
Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v4]
In-Reply-To:
References: <_p5rQggkG8fBak9RrEDnoOzRnl8hsezlFjNmN_MFRiU=.c4167eaf-79f4-45eb-9fa9-2d0613274392@github.com>
Message-ID:

On Tue, 3 Nov 2020 10:17:23 GMT, Andrew Haley wrote:

>> Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>>
>> - Merge branch 'master' into aarch64-base64-encoding
>> - reconstructed the macro as function generate_base64_encode_simdround
>> - change register name sp and unpack the macro
>> - Merge branch 'master' into aarch64-base64-encoding
>> - 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5483:
>
>> 5481: Register codec = rscratch1;
>> 5482: Register length = rscratch2;
>> 5483:
>
> Alias names for scratch registers have proved to be risky because assembler macros use scratch registers freely.
A maintenance programmer might not to see this code uses rscratch1 and 2. Given that c_rarg6 and 7 are free, please use them. Done, I didn't realize this would be a problem at all. Thanks for the clarification. ------------- PR: https://git.openjdk.java.net/jdk/pull/992 From shade at openjdk.java.net Tue Nov 3 12:16:01 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 3 Nov 2020 12:16:01 GMT Subject: RFR: 8255606: Enable concurrent stack processing on x86_32 platforms [v2] In-Reply-To: References: Message-ID: <3UnxY32uP-ERCn8odu787AX2OqPH8soXglsanBwpz3Q=.8eb97050-dd9e-4294-861c-aa5c3f3a90e2@github.com> On Fri, 30 Oct 2020 12:14:55 GMT, Zhengyu Gu wrote: >> 8255606: Enable concurrent stack processing on x86_32 platforms > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Fix jump direction I have minor comments. src/hotspot/cpu/x86/x86_32.ad line 656: > 654: st->print_cr("POPL EBP"); st->print("\t"); > 655: if (do_polling() && C->is_method_compilation()) { > 656: st->print("cmpptr rsp, poll_offset[thread] \n\t" Let's say `CMPL` here? I don't think we reference `cmpptr` in outputs here. Also, note every instruction is capitalized in `x86_32.ad`. src/hotspot/cpu/x86/x86_64.ad line 933: > 931: if (do_polling() && C->is_method_compilation()) { > 932: st->print("\t"); > 933: st->print_cr("cmpptr rsp, poll_offset[r15_thread] \n\t" Ditto, leave `cmpq` here. ------------- Changes requested by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/945

From coleenp at openjdk.java.net Tue Nov 3 12:22:03 2020
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Tue, 3 Nov 2020 12:22:03 GMT
Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v2]
In-Reply-To:
References: <3rZagetDi1APoH1Gg2q4Z5mVPYjk0vBHwDnnhuh7d6M=.da07fe0e-91d6-43cb-b7c7-f3f9daedb931@github.com> <0heSKANZ4kJqlQOKY6MCs6cSZvT8-KRFRbbbKTlzNzA=.7224a164-a76d-47ce-8016-9db052b09773@github.com>
Message-ID:

On Tue, 3 Nov 2020 10:22:09 GMT, Erik Österlund wrote:

>> Ok, will I run afoul of the ZGC people putting the parameter name after the parameter and the rest of the code, it is before?
>
> ZGC people passionately place the comment after the argument value.

I see that but in the non-zgc code it's the opposite and this is non-zgc code.

-------------

PR: https://git.openjdk.java.net/jdk/pull/967

From shade at openjdk.java.net Tue Nov 3 12:23:59 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 3 Nov 2020 12:23:59 GMT
Subject: RFR: 8255762: Shenandoah: Consolidate/streamline interpreter LRBs [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 3 Nov 2020 11:11:11 GMT, Roman Kennke wrote:

>> We currently have two LRB implementations in interpreter: one normal and one for native/weak LRB. We should consolidate them into one.
>>
>> The main issue here is that the two implementations follow different approaches:
>> - the normal LRB calls through the shenandoah_lrb stub which does additional null- and cset-checking
>> - the weak LRB calls to runtime directly and must not do cset-checking
>>
>> The reason for calling through the stub is that it gives more freedom to allocate two registers that are required for the cset check. However, we can invert the cset addressing like we did in JDK-8245465 and save a register. We can also eliminate the null-check and let the cset-check subsume it (like we do everywhere else).
Allocating a single register for the cset-check is easy, and we can do so in-line without the extra jump through the stub. The runtime call through the stub has also been very costly: it dumps 2KB of register data on the stack at each call, that is very excessive. save_xmm_registers() should be more than enough (in-fact, I am almost certain that this is excessive too, and we should only need to save/restore xmm0 - but not in this patch). Not needing to generate the call-stub is also helpful for backportability, because in jdk8-shenandoah we cannot do that. >> >> Testing: hotspot_gc_shenandoah (x86_64, x86_32, aarch64) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > A few touch-ups Looks good, apart from two nits. src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp line 301: > 299: Register r = as_Register(i); > 300: if (r != rsp && r != rbp && r != dst && r != src.base() && r != src.index()) { > 301: tmp1 = r; Minor nit: looks like you just want to `break` from here, and not check `tmp1->is_valid()` in the loop predicate? src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp line 335: > 333: assert(slot == 0, "must use all slots"); > 334: > 335: Register tmp2 = dst == rsi ? rdx : rsi; Please do parentheses around `(dst == rsi)`. ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1010 From coleenp at openjdk.java.net Tue Nov 3 12:36:01 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 3 Nov 2020 12:36:01 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v2] In-Reply-To: <0DUaLOt_27wPkI2SwP4BwykioL4Hn2c-j7hMz3AbHYI=.2270cb60-bd2f-4d72-abc8-cd8ea44d30b5@github.com> References: <3rZagetDi1APoH1Gg2q4Z5mVPYjk0vBHwDnnhuh7d6M=.da07fe0e-91d6-43cb-b7c7-f3f9daedb931@github.com> <0DUaLOt_27wPkI2SwP4BwykioL4Hn2c-j7hMz3AbHYI=.2270cb60-bd2f-4d72-abc8-cd8ea44d30b5@github.com> Message-ID: On Tue, 3 Nov 2020 10:36:21 GMT, Stefan Karlsson wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Code review comments from StefanK. > > src/hotspot/share/prims/jvmtiTagMapTable.cpp line 185: > >> 183: // Serially remove unused oops from the table, and notify jvmti. >> 184: void JvmtiTagMapTable::unlink_and_post(JvmtiEnv* env) { >> 185: > > Stray newline that's a useful newline. > src/hotspot/share/prims/jvmtiTagMapTable.cpp line 258: > >> 256: int rehash_len = moved_entries.length(); >> 257: // Now add back in the entries that were removed. >> 258: for (int i = 0; i < moved_entries.length(); i++) { > > rehash_len is read, but not used in for loop condition. It's in the logging message below. I'll use it in the loop too. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Tue Nov 3 12:40:06 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 3 Nov 2020 12:40:06 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v2] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 10:33:16 GMT, Stefan Karlsson wrote: >> Since I went back and forth about what this function did (it posted events at one time), I thought the generic _processing name would be better. GC callers shouldn't really have to know what processing we're doing here. 
Hopefully it won't change from rehashing. That's why I like processing. > >> GC callers shouldn't really have to know what processing we're doing here. > > I completely disagree with this. It's extremely important that the GC and Runtime code agrees on what this code does and where the GC *must* call it. Knowing the details allows us to skip calling this after mark end, but forces us to call it in relocate start, when objects should start to move. Though, I don't want to block this review because of this point, so if you still think that a non-descriptive name is better then we can argue that separately. Ok, I'll rename it to needs_hashing. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Tue Nov 3 12:58:22 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 3 Nov 2020 12:58:22 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v3] In-Reply-To: References: Message-ID: <5UvpVT3pWDNSJ1Vh_WIy-E3ZjOS8O-8sZi-9ZRyYYQI=.d5d244cf-5e1b-4790-9370-66ae3a2ea76c@github.com> > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. 
In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. > > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. 
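The "rehash at the next use" scheme in the description above can be sketched roughly as follows. This is a toy model, not the code in the PR: `ToyTagTable`, `ToyEntry`, and the multiplicative hash constant are all made up for illustration, and a plain pointer to a slot stands in for the indirection that a WeakHandle provides. The point is only that the GC sets a flag when objects may have moved, and the table lazily re-buckets its entries the next time it is used.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: none of these names are HotSpot APIs.
// An entry refers to a slot whose value (the object's address) the GC may
// change, loosely mimicking the indirection a WeakHandle provides.
struct ToyEntry {
  const uintptr_t* slot;  // location holding the object's current address
  long tag;
};

class ToyTagTable {
 public:
  explicit ToyTagTable(size_t buckets) : _buckets(buckets) {}

  // Called by the (toy) GC after objects may have moved.
  void set_needs_rehashing() { _needs_rehashing = true; }

  void add(const uintptr_t* slot, long tag) {
    rehash_if_needed();
    _buckets[index_for(*slot)].push_back(ToyEntry{slot, tag});
  }

  // Returns the tag for the object currently at addr, or 0 if untagged.
  long find(uintptr_t addr) {
    rehash_if_needed();
    for (const ToyEntry& e : _buckets[index_for(addr)]) {
      if (*e.slot == addr) return e.tag;
    }
    return 0;
  }

  int rehash_count() const { return _rehash_count; }

 private:
  // Hash only the lower 32 bits of the address (the constant is arbitrary).
  static uint32_t hash(uintptr_t addr) {
    uint32_t low = static_cast<uint32_t>(addr);
    return (low * 2654435761u) >> 16;
  }

  size_t index_for(uintptr_t addr) const {
    return hash(addr) % _buckets.size();
  }

  // Pull every entry out and re-insert it under its current address.
  void rehash_if_needed() {
    if (!_needs_rehashing) return;
    std::vector<ToyEntry> moved;
    for (auto& bucket : _buckets) {
      moved.insert(moved.end(), bucket.begin(), bucket.end());
      bucket.clear();
    }
    for (const ToyEntry& e : moved) {
      _buckets[index_for(*e.slot)].push_back(e);
    }
    _needs_rehashing = false;
    ++_rehash_count;
  }

  std::vector<std::vector<ToyEntry>> _buckets;
  bool _needs_rehashing = false;
  int _rehash_count = 0;
};
```

In this sketch a lookup after a simulated move only succeeds reliably because the flag forces a rehash first; without the flag, the entry could still sit in the bucket chosen for its old address.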
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: More review comments from Stefan and ErikO ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/967/files - new: https://git.openjdk.java.net/jdk/pull/967/files/cb4c83e0..eeaf9aed Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=01-02 Stats: 18 lines in 7 files changed: 3 ins; 6 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/967/head:pull/967 PR: https://git.openjdk.java.net/jdk/pull/967 From rkennke at openjdk.java.net Tue Nov 3 13:04:06 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 3 Nov 2020 13:04:06 GMT Subject: RFR: 8255762: Shenandoah: Consolidate/streamline interpreter LRBs [v3] In-Reply-To: References: Message-ID: > We currently have two LRB implementations in interpreter: one normal and one for native/weak LRB. We should consolidate them into one. > > The main issue here is that the two implementations follow different approaches: > - the normal LRB calls through the shenandoah_lrb stub which does additional null- and cset-checking > - the weak LRB calls to runtime directly and must not do cset-checking > > The reason for calling through the stub is that it gives more freedom to allocate two registers that are required for the cset check. However, we can invert the cset addressing like we did in JDK-8245465 and save a register. We can also eliminate the null-check and let the cset-check subsume it (like we do everywhere else). Allocating a single register for the cset-check is easy, and we can do so in-line without the extra jump through the stub. The runtime call through the stub has also been very costly: it dumps 2KB of register data on the stack at each call, that is very excessive. 
save_xmm_registers() should be more than enough (in-fact, I am almost certain that this is excessive too, and we should only need to save/restore xmm0 - but not in this patch). Not needing to generate the call-stub is also helpful for backportability, because in jdk8-shenandoah we cannot do that. > > Testing: hotspot_gc_shenandoah (x86_64, x86_32, aarch64) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: A few more touch-ups ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1010/files - new: https://git.openjdk.java.net/jdk/pull/1010/files/12382e6c..f031885d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1010&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1010&range=01-02 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1010.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1010/head:pull/1010 PR: https://git.openjdk.java.net/jdk/pull/1010 From rkennke at openjdk.java.net Tue Nov 3 13:04:07 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 3 Nov 2020 13:04:07 GMT Subject: Integrated: 8255762: Shenandoah: Consolidate/streamline interpreter LRBs In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 15:55:40 GMT, Roman Kennke wrote: > We currently have two LRB implementations in interpreter: one normal and one for native/weak LRB. We should consolidate them into one. > > The main issue here is that the two implementations follow different approaches: > - the normal LRB calls through the shenandoah_lrb stub which does additional null- and cset-checking > - the weak LRB calls to runtime directly and must not do cset-checking > > The reason for calling through the stub is that it gives more freedom to allocate two registers that are required for the cset check. However, we can invert the cset addressing like we did in JDK-8245465 and save a register. 
We can also eliminate the null-check and let the cset-check subsume it (like we do everywhere else). Allocating a single register for the cset-check is easy, and we can do so in-line without the extra jump through the stub. The runtime call through the stub has also been very costly: it dumps 2KB of register data on the stack at each call, that is very excessive. save_xmm_registers() should be more than enough (in-fact, I am almost certain that this is excessive too, and we should only need to save/restore xmm0 - but not in this patch). Not needing to generate the call-stub is also helpful for backportability, because in jdk8-shenandoah we cannot do that. > > Testing: hotspot_gc_shenandoah (x86_64, x86_32, aarch64) This pull request has now been integrated. Changeset: 93ef0091 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/93ef0091 Stats: 418 lines in 4 files changed: 41 ins; 308 del; 69 mod 8255762: Shenandoah: Consolidate/streamline interpreter LRBs Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1010 From eosterlund at openjdk.java.net Tue Nov 3 13:12:02 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 3 Nov 2020 13:12:02 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v3] In-Reply-To: <5UvpVT3pWDNSJ1Vh_WIy-E3ZjOS8O-8sZi-9ZRyYYQI=.d5d244cf-5e1b-4790-9370-66ae3a2ea76c@github.com> References: <5UvpVT3pWDNSJ1Vh_WIy-E3ZjOS8O-8sZi-9ZRyYYQI=.d5d244cf-5e1b-4790-9370-66ae3a2ea76c@github.com> Message-ID: <4rQSjnt6O05gF9AFgc-nOxXIPehn_TKL0jJj85fvbr4=.c80aafac-a743-47d0-9d1a-31caa35d54ec@github.com> On Tue, 3 Nov 2020 12:58:22 GMT, Coleen Phillimore wrote: >> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. 
GC walks the backing OopStorages concurrently. >> >> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. >> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. >> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. 
>> >> It has also been tested with tier1-6. >> >> Thank you to Stefan, Erik and Kim for their help with this change. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > More review comments from Stefan and ErikO Marked as reviewed by eosterlund (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From rkennke at openjdk.java.net Tue Nov 3 13:26:09 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 3 Nov 2020 13:26:09 GMT Subject: RFR: 8254315: Shenandoah: Concurrent weak reference processing [v29] In-Reply-To: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com> References: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com> Message-ID: > Until now, references (as in java.lang.ref.Reference and its subclasses WeakReference, SoftReference, PhantomReference and the non-public FinalReference - I'll collectively call them weak references for the purpose of clarity). Workloads that make heavy use of such weak references will therefore potentially cause significant GC pauses. > > There are 3 main items that contribute to pause time linear to number of references, or worse: > - We need to scan and consider each reference on the various 'discovered' lists. > - We need to mark through subgraph of objects that are reachable only through FinalReference. Notice that this is theoretically only bounded by the live data set size. > - Finally, all no-longer-reachable references need to be enqueued in the 'pending list' > > The problem is somewhat mitigated by pre-cleaning the discovered list: Any weak reference that we find to be strongly reachable will be removed before we go into the final-mark-pause. However, that is only a band-aid. > > The solution to this is two-fold: > 1. Extend concurrent marking to also mark the 'finalizable' subgraph of the heap. 
This requires to extend the marking bitmap to allow for two kinds of reachability: each object can now be strongly and finalizably reachable. Whenever marking encounters a FinalReference, it will mark through the referent and switch to 'finalizably' reachability for all objects starting from the referent. When marking encounters finalizably reachable objects while marking strongly, it will 'upgrade' reachability of such objects to strongly reachable. All of this can be done concurrently. Any encounter of a Reference (or subclass) object will enqueue that object into a thread-local 'discovered' list. Except for FinalReference, marking stops there, and does not mark through the referent. > 2. Concurrent processing is performed after the final-mark pause. GC workers scan all discovered lists that have been collected by concurrent marking, and depending on reachability of the referent, either drop the Reference, or enqueue it into the global 'pending' list (from where it will be processed by Java reference handler thread). In addition to that, we must ensure that no referents become resurrected by accessing Reference.get() on it. In order to achieve this, we employ special barriers in Reference.get() intrinsics that return NULL when the referent is not reachable. > > Testing: hotspot_gc_shenandoah (release+fastdebug, x86+aarch64), specjvm+specjbb without regressions, tier1, tier2, vmTestbase_vm_metaspace, vmTestbase_nsk_jvmti, with -XX:+UseShenandoahGC without regressions, specjvm with various levels of verification Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 87 commits: - Merge branch 'master' into shenandoah-concurrent-weakrefs - Invoke interpreter weak-LRB on weak-refs too (fixes merge mistake) - Merge branch 'master' into shenandoah-concurrent-weakrefs - Adopt ShenandoahReferenceBarrier to recent changes in LRB runtime impl - Merge branch 'master' into shenandoah-concurrent-weakrefs - Invert strong/weak in marking tasks and related code - Fix merge mistake - Merge branch 'master' into shenandoah-concurrent-weakrefs - Pass marking-strength through chunked arrays - Rename mark_final -> mark_weak and several cleanups (by shade) - ... and 77 more: https://git.openjdk.java.net/jdk/compare/93ef0091...5e4a8f22 ------------- Changes: https://git.openjdk.java.net/jdk/pull/505/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=505&range=28 Stats: 2444 lines in 55 files changed: 1638 ins; 576 del; 230 mod Patch: https://git.openjdk.java.net/jdk/pull/505.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/505/head:pull/505 PR: https://git.openjdk.java.net/jdk/pull/505 From rkennke at openjdk.java.net Tue Nov 3 13:34:17 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 3 Nov 2020 13:34:17 GMT Subject: RFR: 8254315: Shenandoah: Concurrent weak reference processing [v30] In-Reply-To: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com> References: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com> Message-ID: > Until now, references (as in java.lang.ref.Reference and its subclasses WeakReference, SoftReference, PhantomReference and the non-public FinalReference - I'll collectively call them weak references for the purpose of clarity). Workloads that make heavy use of such weak references will therefore potentially cause significant GC pauses. 
> > There are 3 main items that contribute to pause time linear to number of references, or worse: > - We need to scan and consider each reference on the various 'discovered' lists. > - We need to mark through subgraph of objects that are reachable only through FinalReference. Notice that this is theoretically only bounded by the live data set size. > - Finally, all no-longer-reachable references need to be enqueued in the 'pending list' > > The problem is somewhat mitigated by pre-cleaning the discovered list: Any weak reference that we find to be strongly reachable will be removed before we go into the final-mark-pause. However, that is only a band-aid. > > The solution to this is two-fold: > 1. Extend concurrent marking to also mark the 'finalizable' subgraph of the heap. This requires to extend the marking bitmap to allow for two kinds of reachability: each object can now be strongly and finalizably reachable. Whenever marking encounters a FinalReference, it will mark through the referent and switch to 'finalizably' reachability for all objects starting from the referent. When marking encounters finalizably reachable objects while marking strongly, it will 'upgrade' reachability of such objects to strongly reachable. All of this can be done concurrently. Any encounter of a Reference (or subclass) object will enqueue that object into a thread-local 'discovered' list. Except for FinalReference, marking stops there, and does not mark through the referent. > 2. Concurrent processing is performed after the final-mark pause. GC workers scan all discovered lists that have been collected by concurrent marking, and depending on reachability of the referent, either drop the Reference, or enqueue it into the global 'pending' list (from where it will be processed by Java reference handler thread). In addition to that, we must ensure that no referents become resurrected by accessing Reference.get() on it. 
In order to achieve this, we employ special barriers in Reference.get() intrinsics that return NULL when the referent is not reachable. > > Testing: hotspot_gc_shenandoah (release+fastdebug, x86+aarch64), specjvm+specjbb without regressions, tier1, tier2, vmTestbase_vm_metaspace, vmTestbase_nsk_jvmti, with -XX:+UseShenandoahGC without regressions, specjvm with various levels of verification Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Invert check for access kind (merge mistake) ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/505/files - new: https://git.openjdk.java.net/jdk/pull/505/files/5e4a8f22..58dead58 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=505&range=29 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=505&range=28-29 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/505.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/505/head:pull/505 PR: https://git.openjdk.java.net/jdk/pull/505 From zgu at openjdk.java.net Tue Nov 3 13:41:11 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 3 Nov 2020 13:41:11 GMT Subject: RFR: 8255606: Enable concurrent stack processing on x86_32 platforms [v3] In-Reply-To: References: Message-ID: > 8255606: Enable concurrent stack processing on x86_32 platforms Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Reverted back to cmpq/cmpl ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/945/files - new: https://git.openjdk.java.net/jdk/pull/945/files/c946b816..d6a88224 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=945&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=945&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/945.diff Fetch: git fetch https://git.openjdk.java.net/jdk 
pull/945/head:pull/945 PR: https://git.openjdk.java.net/jdk/pull/945 From zgu at openjdk.java.net Tue Nov 3 13:44:13 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 3 Nov 2020 13:44:13 GMT Subject: RFR: 8255606: Enable concurrent stack processing on x86_32 platforms [v4] In-Reply-To: References: Message-ID: > 8255606: Enable concurrent stack processing on x86_32 platforms Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Capitalize instructions in x86_32.ad ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/945/files - new: https://git.openjdk.java.net/jdk/pull/945/files/d6a88224..28c69c49 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=945&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=945&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/945.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/945/head:pull/945 PR: https://git.openjdk.java.net/jdk/pull/945 From zgu at openjdk.java.net Tue Nov 3 13:44:15 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 3 Nov 2020 13:44:15 GMT Subject: RFR: 8255606: Enable concurrent stack processing on x86_32 platforms [v2] In-Reply-To: <3UnxY32uP-ERCn8odu787AX2OqPH8soXglsanBwpz3Q=.8eb97050-dd9e-4294-861c-aa5c3f3a90e2@github.com> References: <3UnxY32uP-ERCn8odu787AX2OqPH8soXglsanBwpz3Q=.8eb97050-dd9e-4294-861c-aa5c3f3a90e2@github.com> Message-ID: <5JJIeaZhiTez4B7sDxCW0OYD86cuhCk6jQXGXzPucow=.36b27a50-5e05-4089-ba79-15fc707dda9d@github.com> On Tue, 3 Nov 2020 12:09:36 GMT, Aleksey Shipilev wrote: >> Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix jump direction > > src/hotspot/cpu/x86/x86_32.ad line 656: > >> 654: st->print_cr("POPL EBP"); st->print("\t"); >> 655: if (do_polling() && C->is_method_compilation()) { >> 656: st->print("cmpptr rsp, poll_offset[thread] \n\t" > > 
Let's say `CMPL` here? I don't think we reference `cmpptr` in outputs here. Also, note every instruction is capitalized in `x86_32.ad`. Thanks for the review. Updated according to your suggestions. ------------- PR: https://git.openjdk.java.net/jdk/pull/945 From eosterlund at openjdk.java.net Tue Nov 3 13:50:11 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 3 Nov 2020 13:50:11 GMT Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v3] In-Reply-To: References: Message-ID: <78WEYNz8e_RDtz43fTct4YdAh70TYGSM-x6Y0JQwgqs=.9c01b9c1-c8c2-420f-92f9-bb3cd95d2589@github.com> > The imasm::remove_activation() call does not deal with safepoints very well. However, when the MethodExit JVMTI event is being called, we call into the runtime in the middle of remove_activation(). If the value being returned is an object type, then the top-of-stack contains the oop. However, the GC does not traverse said oop in any oop map, because it is simply not expected that we safepoint in the middle of remove_activation(). > > The JvmtiExport::post_method_exit() function we end up calling, reads the top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, that eventually call Java and a bunch of stuff that safepoints. So after the JVMTI callback, we can expect the top-of-stack oop to be broken. Unfortunately, when we continue, we therefore end up returning a broken oop. > > Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, is wrong, as we can safepoint on the way back to Java, which will break the return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, moving the transition to VM and back, into a block of code that is protected against GC. Before the JRT_BLOCK is called, we stash away the return oop, and after the JRT_BLOCK_END, we restore the top-of-stack oop. 
In the path when InterpreterRuntime::post_method_exit is called when throwing an exception, we don't have the same problem of retaining an oop result, and hence the JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the logic is the same as before for this path. > > This is a JVMTI bug that has probably been around for a long time. It crashes with all GCs, but was discovered recently after concurrent stack processing, as StefanK has been running better GC stressing code in JVMTI, and the bug reproduced more easily with concurrent stack processing, as the timings were a bit different. The following reproducer failed pretty much 100% of the time: > while true; do make test JTREG="RETAIN=all" TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done > > With my fix I can run this repeatedly without any more failures. I have also sanity checked the patch by running tier 1-5, so that it does not introduce any new issues on its own. I have also used Stefan's nice external GC stressing with jcmd technique that was used to trigger crashes with other GCs, to make sure said crashes no longer reproduce either. 
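The failure mode and the stash/restore fix described above can be illustrated with a small toy model. To be clear, this is not HotSpot code and none of the names below exist in the JDK: `ToyGC` "relocates" objects by bumping every registered root slot, `ToyHandle` stands in for putting the return oop in a handle, and `collect()` stands in for a JVMTI callback that reaches a safepoint. The unprotected variant returns a stale address because the top-of-stack value was never visible to the collector; the protected variant stashes it in a root first and reads it back afterwards.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Toy model only: none of these names are HotSpot APIs.
struct ToyGC {
  std::vector<uintptr_t*> roots;  // slots the collector knows about

  // "Relocate" every object: only registered root slots see the new address.
  void collect() {
    for (uintptr_t* slot : roots) *slot += 0x100;
  }
};

// RAII root, loosely analogous to stashing an oop in a handle.
class ToyHandle {
 public:
  ToyHandle(ToyGC& gc, uintptr_t addr) : _gc(gc), _addr(addr) {
    _gc.roots.push_back(&_addr);
  }
  ~ToyHandle() {
    auto& r = _gc.roots;
    r.erase(std::remove(r.begin(), r.end(), &_addr), r.end());
  }
  ToyHandle(const ToyHandle&) = delete;
  ToyHandle& operator=(const ToyHandle&) = delete;

  uintptr_t value() const { return _addr; }

 private:
  ToyGC& _gc;
  uintptr_t _addr;
};

// Broken: the top-of-stack value is invisible to the collector.
uintptr_t method_exit_unprotected(ToyGC& gc, uintptr_t tos) {
  gc.collect();  // stands in for a callback that safepoints
  return tos;    // stale: still the pre-relocation address
}

// Fixed: stash the value in a root, run the callback, then restore it.
uintptr_t method_exit_protected(ToyGC& gc, uintptr_t tos) {
  ToyHandle result(gc, tos);
  gc.collect();
  return result.value();  // updated address, read before the root is dropped
}
```

The real patch additionally has to get the thread-state transitions right (the JRT_BLOCK_ENTRY / JRT_BLOCK part), which this sketch does not attempt to model.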
Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Serguei CR1: Check interpreted only ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/930/files - new: https://git.openjdk.java.net/jdk/pull/930/files/ae6355fd..4d68c624 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=930&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=930&range=01-02 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/930.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/930/head:pull/930 PR: https://git.openjdk.java.net/jdk/pull/930 From eosterlund at openjdk.java.net Tue Nov 3 13:54:57 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 3 Nov 2020 13:54:57 GMT Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 21:00:23 GMT, Serguei Spitsyn wrote: > Erik, > > Thank you for the update! It looks more elegant. > > One concern is that after the move of this fragment to the post_method_exit_inner: > > ``` > 1614 if (state == NULL || !state->is_interp_only_mode()) { > 1615 // for any thread that actually wants method exit, interp_only_mode is set > 1616 return; > 1617 } > ``` > > there is no guarantee that the current frame is interpreted below: > > ``` > 1580 if (!exception_exit) { > 1581 oop oop_result; > 1582 BasicType type = current_frame.interpreter_frame_result(&oop_result, &value); > . . . > 1597 if (result.not_null() && !mh->is_native()) { > 1598 // We have to restore the oop on the stack for interpreter frames > 1599 *(oop*)current_frame.interpreter_frame_tos_address() = result(); > 1600 } > ``` > > Probably, extra checks for current_frame.is_interpreted_frame() in these fragments will be sufficient. That makes sense. Added a check in the latest version that we are in interp only mode. 
------------- PR: https://git.openjdk.java.net/jdk/pull/930 From rkennke at openjdk.java.net Tue Nov 3 14:36:19 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 3 Nov 2020 14:36:19 GMT Subject: RFR: 8254315: Shenandoah: Concurrent weak reference processing [v31] In-Reply-To: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com> References: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com> Message-ID: > Until now, references (as in java.lang.ref.Reference and its subclasses WeakReference, SoftReference, PhantomReference and the non-public FinalReference - I'll collectively call them weak references for the purpose of clarity). Workloads that make heavy use of such weak references will therefore potentially cause significant GC pauses. > > There are 3 main items that contribute to pause time linear to number of references, or worse: > - We need to scan and consider each reference on the various 'discovered' lists. > - We need to mark through subgraph of objects that are reachable only through FinalReference. Notice that this is theoretically only bounded by the live data set size. > - Finally, all no-longer-reachable references need to be enqueued in the 'pending list' > > The problem is somewhat mitigated by pre-cleaning the discovered list: Any weak reference that we find to be strongly reachable will be removed before we go into the final-mark-pause. However, that is only a band-aid. > > The solution to this is two-fold: > 1. Extend concurrent marking to also mark the 'finalizable' subgraph of the heap. This requires to extend the marking bitmap to allow for two kinds of reachability: each object can now be strongly and finalizably reachable. Whenever marking encounters a FinalReference, it will mark through the referent and switch to 'finalizably' reachability for all objects starting from the referent. 
When marking encounters finalizably reachable objects while marking strongly, it will 'upgrade' reachability of such objects to strongly reachable. All of this can be done concurrently. Any encounter of a Reference (or subclass) object will enqueue that object into a thread-local 'discovered' list. Except for FinalReference, marking stops there, and does not mark through the referent. > 2. Concurrent processing is performed after the final-mark pause. GC workers scan all discovered lists that have been collected by concurrent marking, and depending on reachability of the referent, either drop the Reference, or enqueue it into the global 'pending' list (from where it will be processed by Java reference handler thread). In addition to that, we must ensure that no referents become resurrected by calling Reference.get() on them. In order to achieve this, we employ special barriers in Reference.get() intrinsics that return NULL when the referent is not reachable. > > Testing: hotspot_gc_shenandoah (release+fastdebug, x86+aarch64), specjvm+specjbb without regressions, tier1, tier2, vmTestbase_vm_metaspace, vmTestbase_nsk_jvmti, with -XX:+UseShenandoahGC without regressions, specjvm with various levels of verification Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: AArch64 build fixes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/505/files - new: https://git.openjdk.java.net/jdk/pull/505/files/58dead58..9165b6e5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=505&range=30 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=505&range=29-30 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/505.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/505/head:pull/505 PR: https://git.openjdk.java.net/jdk/pull/505 From coleenp at openjdk.java.net Tue Nov 3 14:53:12 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date:
Tue, 3 Nov 2020 14:53:12 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. > > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. 
Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into jvmti-table - Merge branch 'master' into jvmti-table - More review comments from Stefan and ErikO - Code review comments from StefanK. - 8212879: Make JVMTI TagMap table not hash on oop address ------------- Changes: https://git.openjdk.java.net/jdk/pull/967/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=03 Stats: 1749 lines in 41 files changed: 627 ins; 1020 del; 102 mod Patch: https://git.openjdk.java.net/jdk/pull/967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/967/head:pull/967 PR: https://git.openjdk.java.net/jdk/pull/967 From stuefe at openjdk.java.net Tue Nov 3 15:57:01 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 3 Nov 2020 15:57:01 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers Message-ID: Hi all, may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers. Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues. This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code. This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324. 
---- This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (beside the documented one of using libjsig). It seems that the JVM_handle_xxx_functions() are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag is used quite a bit. See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history. Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can. I plan to add a proper regression test with a later change, but for now I don't have the time for that. --- The fixed issues: 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed. Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early. But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed. 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls). 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported. 
4) Every platform handler has this section:

    JavaThread* thread = NULL;
    VMThread* vmthread = NULL;
    if (PosixSignals::are_signal_handlers_installed()) {
      if (t != NULL) {
        if (t->is_Java_thread()) {
          thread = t->as_Java_thread();
        } else if (t->is_VM_thread()) {
          vmthread = (VMThread *)t;
        }
      }
    }

`vmthread` is unused on all platforms and can be removed.

5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM):

    if (sig == SIGPIPE || sig == SIGXFSZ) {
      // allow chained handler to go first
      if (PosixSignals::chained_handler(sig, info, ucVoid)) {
        return true;
      } else {
        // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219
        return true;
      }
    }

- On s390 and ppc, we miss SIGXFSZ handling. _Update: Fixed separately for easier backport, see [JDK-8255734](https://bugs.openjdk.java.net/browse/JDK-8255734)_.
- both paths return true - section can be shortened

Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler.

6) At the end of every platform handler, before calling into fatal error handling, we unblock the signal:

> // unmask current signal
> sigset_t newset;
> sigemptyset(&newset);
> sigaddset(&newset, sig);
> sigprocmask(SIG_UNBLOCK, &newset, NULL);
>

- Use of `sigprocmask()` is UB in a multithreaded program.
- but then, this section is unnecessary anyway, since [JDK-8252533](https://bugs.openjdk.java.net/browse/JDK-8252533) we unmask error signals at the start of the signal handler.

7) The JFR crash protection is not consistently checked in all platform handlers.

8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism).
9) On Linux ppc64 and AIX, we have this section:

> if (sig == SIGILL && (pc < (address) 0x200)) {
>   goto report_and_die;
> }

which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide). This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway.

10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`. On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice.

----

The changes in this patch:

a) hotspot signal handling is now done by the following functions:

            |                        |
            v                        v
    javaSignalHandler      JVM_handle_linux_signal()
            |              /
            v             v
    javaSignalHandler_inner
            |
            v
    PosixSignals::pd_hotspot_signal_handler()

The right branch only exists to support the `AllowUserSignalHandlers` case, see [4]. `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before. `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions.

b) I commonized prologue- and epilogue coding.
- I simplified (4) to a single line in the shared handler
- I moved the JFR thread crash protection (7) up to the shared handler
- I moved the complete epilogue up to the shared handler. That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler.
That fixes (8) and (6) - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept. - I simplified (5) and commonized it, and removed (9) completely - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out. Thanks for reviewing. Testing: this patch ran through our nightlies, in an earlier form. They will be re-run some more times. I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change. Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned. See also Gerard's work on [5], which is under review too. ---- [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html [5] https://bugs.openjdk.java.net/browse/JDK-8253742 ------------- Commit messages: - Initial patch Changes: https://git.openjdk.java.net/jdk/pull/1034/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1034&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255711 Stats: 916 lines in 13 files changed: 116 ins; 686 del; 114 mod Patch: https://git.openjdk.java.net/jdk/pull/1034.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1034/head:pull/1034 PR: https://git.openjdk.java.net/jdk/pull/1034 From rkennke at openjdk.java.net Tue Nov 3 16:50:00 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 3 Nov 2020 16:50:00 GMT Subject: RFR: 8255606: Enable concurrent stack processing on x86_32 platforms [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov
2020 13:44:13 GMT, Zhengyu Gu wrote: >> 8255606: Enable concurrent stack processing on x86_32 platforms > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Capitalize instructions in x86_32.ad Looks good to me! Thanks! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/945 From aph at openjdk.java.net Tue Nov 3 17:03:01 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 3 Nov 2020 17:03:01 GMT Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v4] In-Reply-To: References: <_p5rQggkG8fBak9RrEDnoOzRnl8hsezlFjNmN_MFRiU=.c4167eaf-79f4-45eb-9fa9-2d0613274392@github.com> Message-ID: On Tue, 3 Nov 2020 11:51:42 GMT, Dong Bo wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5483: >> >>> 5481: Register codec = rscratch1; >>> 5482: Register length = rscratch2; >>> 5483: >> >> Alias names for scratch registers have proved to be risky because assembler macros use scratch registers freely. A maintenance programmer might not see that this code uses rscratch1 and rscratch2. Given that c_rarg6 and 7 are free, please use them. > > Done, I didn't realize this would be a problem at all. > Thanks for the clarification. OK, thanks, I need to write some of this stuff down as guidance. Aliases for register names are always risky, but for the scratch registers doubly so.
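To make the aliasing hazard concrete, here is a toy model. It is a sketch under stated assumptions, not AArch64 assembler code: the register file, the register indices and `macro_add_large_imm` are all hypothetical, invented only to show how a macro that borrows a scratch register can clobber a value kept under an alias.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy register file; the indices are hypothetical and do not match AArch64 numbering.
enum Reg { c_rarg6 = 6, c_rarg7 = 7, rscratch1 = 8, rscratch2 = 9, kNumRegs = 10 };
using RegFile = std::array<std::uint64_t, kNumRegs>;

// Models an assembler macro that silently uses rscratch1 as a temporary.
void macro_add_large_imm(RegFile& r, Reg dst, std::uint64_t imm) {
  assert(dst != rscratch1);   // the macro itself cannot target its own temporary
  r[rscratch1] = imm;         // clobbers whatever the caller kept in rscratch1
  r[dst] += r[rscratch1];
}

// Keeps a live value in `codec`, invokes the macro, and returns the value afterwards.
std::uint64_t value_after_macro(Reg codec) {
  RegFile r{};
  r[codec] = 0x1234;          // value the surrounding stub still needs
  macro_add_large_imm(r, c_rarg7, 100);
  return r[codec];
}
```

With `codec` aliased to `rscratch1`, `value_after_macro` returns the macro's immediate instead of the original value; with `codec` bound to `c_rarg6`, the value survives. Nothing at the call site hints at the difference, which is the maintenance risk described above.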
------------- PR: https://git.openjdk.java.net/jdk/pull/992 From eosterlund at openjdk.java.net Tue Nov 3 17:06:56 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Tue, 3 Nov 2020 17:06:56 GMT Subject: RFR: 8255606: Enable concurrent stack processing on x86_32 platforms [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 13:44:13 GMT, Zhengyu Gu wrote: >> 8255606: Enable concurrent stack processing on x86_32 platforms > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Capitalize instructions in x86_32.ad Looks good. Nice to see one more platform support this. Hoping that eventually all platforms will support this. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/945 From akozlov at openjdk.java.net Tue Nov 3 17:08:03 2020 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Tue, 3 Nov 2020 17:08:03 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly Message-ID: JDK-8255716 (#983) uncovered that os::processor_count on Linux may differ from the number of cores reported in /proc/cpuinfo. The latter historically was used to decide the CPU_A53MAC feature. This patch restores feature detection based on /proc/cpuinfo.
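As an illustration of why the two counts can disagree, the sketch below counts `processor` entries in cpuinfo-style text, independently of any scheduler-visible CPU count. It is not the JDK's parser: `CpuInfoSummary` and `summarize_cpuinfo` are hypothetical names, the text layout is assumed to be the usual Linux/AArch64 one with `processor` and `CPU part` lines, and 0xd03 is the Cortex-A53 part number.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical summary of a /proc/cpuinfo-style text.
struct CpuInfoSummary {
  int  cpu_lines = 0;         // number of "processor" entries seen
  bool saw_part_d03 = false;  // any "CPU part" line mentioning 0xd03 (Cortex-A53)
};

// Scans the text line by line; assumes the usual "key : value" layout.
CpuInfoSummary summarize_cpuinfo(const std::string& text) {
  CpuInfoSummary s;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, 9, "processor") == 0) {
      ++s.cpu_lines;
    } else if (line.compare(0, 8, "CPU part") == 0 &&
               line.find("0xd03") != std::string::npos) {
      s.saw_part_d03 = true;
    }
  }
  return s;
}
```

A process restricted to a subset of CPUs (for example by an affinity mask or a container limit) can observe a smaller available-processor count than the number of `processor` entries in the file, so a feature decision keyed on one value may flip when the other is substituted.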
Testing: linux -version Unfortunately I cannot test windows/aarch64, CC: @lewurm @luhenry ------------- Commit messages: - Set CPU_A53MAC after /proc/cpuinfo Changes: https://git.openjdk.java.net/jdk/pull/1039/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1039&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255799 Stats: 15 lines in 4 files changed: 13 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1039.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1039/head:pull/1039 PR: https://git.openjdk.java.net/jdk/pull/1039 From zgu at openjdk.java.net Tue Nov 3 17:12:55 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 3 Nov 2020 17:12:55 GMT Subject: RFR: 8255606: Enable concurrent stack processing on x86_32 platforms [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 17:04:33 GMT, Erik Österlund wrote: > Looks good. Nice to see one more platform support this. Hoping that eventually all platforms will support this. Thanks, Erik. ------------- PR: https://git.openjdk.java.net/jdk/pull/945 From shade at openjdk.java.net Tue Nov 3 17:28:59 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 3 Nov 2020 17:28:59 GMT Subject: RFR: 8255606: Enable concurrent stack processing on x86_32 platforms [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 13:44:13 GMT, Zhengyu Gu wrote: >> 8255606: Enable concurrent stack processing on x86_32 platforms > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Capitalize instructions in x86_32.ad Marked as reviewed by shade (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/945 From zgu at openjdk.java.net Tue Nov 3 17:29:00 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 3 Nov 2020 17:29:00 GMT Subject: Integrated: 8255606: Enable concurrent stack processing on x86_32 platforms In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 20:40:30 GMT, Zhengyu Gu wrote: > 8255606: Enable concurrent stack processing on x86_32 platforms This pull request has now been integrated. Changeset: 134e22a0 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/134e22a0 Stats: 63 lines in 7 files changed: 34 ins; 14 del; 15 mod 8255606: Enable concurrent stack processing on x86_32 platforms Reviewed-by: shade, rkennke, eosterlund ------------- PR: https://git.openjdk.java.net/jdk/pull/945 From sspitsyn at openjdk.java.net Tue Nov 3 17:53:54 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 3 Nov 2020 17:53:54 GMT Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 13:52:24 GMT, Erik Österlund wrote:
>> Erik,
>>
>> Thank you for the update! It looks more elegant.
>>
>> One concern is that after the move of this fragment to the post_method_exit_inner:
>>
>> ```
>> 1614   if (state == NULL || !state->is_interp_only_mode()) {
>> 1615     // for any thread that actually wants method exit, interp_only_mode is set
>> 1616     return;
>> 1617   }
>> ```
>>
>> there is no guarantee that the current frame is interpreted below:
>>
>> ```
>> 1580   if (!exception_exit) {
>> 1581     oop oop_result;
>> 1582     BasicType type = current_frame.interpreter_frame_result(&oop_result, &value);
>>  . . .
>> 1597   if (result.not_null() && !mh->is_native()) {
>> 1598     // We have to restore the oop on the stack for interpreter frames
>> 1599     *(oop*)current_frame.interpreter_frame_tos_address() = result();
>> 1600   }
>> ```
>>
>> Probably, extra checks for current_frame.is_interpreted_frame() in these fragments will be sufficient.
>
> That makes sense. Added a check in the latest version that we are in interp only mode.

Hi Erik,

I'm not sure if this fragment is still needed:

    1620   if (state == NULL || !state->is_interp_only_mode()) {
    1621     // for any thread that actually wants method exit, interp_only_mode is set
    1622     return;
    1623   }

Also, can it be that this condition is true: ` (state == NULL || !state->is_interp_only_mode())` but the top frame is interpreted? If so, then should we still save/restore the result oop over a possible safepoint?
Thanks, Serguei ------------- PR: https://git.openjdk.java.net/jdk/pull/930 From kbarrett at openjdk.java.net Tue Nov 3 18:12:07 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 3 Nov 2020 18:12:07 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v3] In-Reply-To: <5UvpVT3pWDNSJ1Vh_WIy-E3ZjOS8O-8sZi-9ZRyYYQI=.d5d244cf-5e1b-4790-9370-66ae3a2ea76c@github.com> References: <5UvpVT3pWDNSJ1Vh_WIy-E3ZjOS8O-8sZi-9ZRyYYQI=.d5d244cf-5e1b-4790-9370-66ae3a2ea76c@github.com> Message-ID: On Tue, 3 Nov 2020 12:58:22 GMT, Coleen Phillimore wrote: >> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. >> >> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. >> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. 
>> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. >> >> It has also been tested with tier1-6. >> >> Thank you to Stefan, Erik and Kim for their help with this change. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > More review comments from Stefan and ErikO src/hotspot/share/gc/shared/weakProcessorPhases.hpp line 41: > 39: class Iterator; > 40: > 41: typedef void (*Processor)(BoolObjectClosure*, OopClosure*); I think this typedef is to support serial phases and that it is probably no longer used. src/hotspot/share/gc/shared/weakProcessorPhases.hpp line 50: > 48: }; > 49: > 50: typedef uint WeakProcessorPhase; This was originally written with the idea that WeakProcessorPhases::Phase (and WeakProcessorPhase) should be a scoped enum (but we didn't have that feature yet). It's possible there are places that don't cope with a scoped enum, since that feature wasn't available when the code was written, so there might have been mistakes.
But because of that, I'd prefer to keep the WeakProcessorPhases::Phase type and the existing definition of WeakProcessorPhase. Except this proposed change is breaking that at least here:

src/hotspot/share/gc/shared/weakProcessor.inline.hpp
 116     uint oopstorage_index = WeakProcessorPhases::oopstorage_index(phase);
 117     StorageState* cur_state = _storage_states.par_state(oopstorage_index);
=>
 103     StorageState* cur_state = _storage_states.par_state(phase);

I think eventually (as in some future RFE) this could all be collapsed to something provided by OopStorageSet.

    enum class WeakProcessorPhase : uint {};
    ENUMERATOR_RANGE(WeakProcessorPhase,
                     static_cast<WeakProcessorPhase>(0),
                     static_cast<WeakProcessorPhase>(OopStorageSet::weak_count));

and replacing all uses of WeakProcessorPhases::Iterator with EnumIterator<WeakProcessorPhase> (which involves more than a type alias).

Though it might be possible to go even further and eliminate WeakProcessorPhases as a thing separate from OopStorageSet.

-------------

PR: https://git.openjdk.java.net/jdk/pull/967

From kbarrett at openjdk.java.net Tue Nov 3 18:12:10 2020
From: kbarrett at openjdk.java.net (Kim Barrett)
Date: Tue, 3 Nov 2020 18:12:10 GMT
Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4]
In-Reply-To: References: Message-ID:

On Tue, 3 Nov 2020 14:53:12 GMT, Coleen Phillimore wrote:

>> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently.
>>
>> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use.
>> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. >> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. >> >> It has also been tested with tier1-6. >> >> Thank you to Stefan, Erik and Kim for their help with this change. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: > > - Merge branch 'master' into jvmti-table > - Merge branch 'master' into jvmti-table > - More review comments from Stefan and ErikO > - Code review comments from StefanK. > - 8212879: Make JVMTI TagMap table not hash on oop address src/hotspot/share/prims/jvmtiTagMap.cpp line 3018: > 3016: } > 3017: // Later GC code will relocate the oops, so defer rehashing until then. > 3018: tag_map->_needs_rehashing = true; This is wrong for some collectors. I think all collectors ought to be calling set_needs_rehashing in appropriate places, and it can't be correctly piggybacked on the num-dead callback. (See discussion above for that function.) For example, G1 remark pause does weak processing (including weak oopstorage) and will call the num-dead callback, but does not move objects, so does not require tagmap rehashing. (I think CMS oldgen remark may have been similar, for what that's worth.) src/hotspot/share/prims/jvmtiTagMap.cpp line 3015: > 3013: if (tag_map != NULL && !tag_map->is_empty()) { > 3014: if (num_dead_entries != 0) { > 3015: tag_map->hashmap()->unlink_and_post(tag_map->env()); Why are we doing this in the callback, rather than just setting a flag? I thought part of the point of this change was to get tagmap processing out of GC pauses. The same question applies to the non-safepoint side. The idea was to be lazy about updating the tagmap, waiting until someone actually needed to use it. Or if more prompt ObjectFree notifications are needed then signal some thread (maybe the service thread?) for followup. src/hotspot/share/prims/jvmtiTagMap.cpp line 2979: > 2977: > 2978: // Concurrent GC needs to call this in relocation pause, so after the objects are moved > 2979: // and have their new addresses, the table can be rehashed. I think the comment is confusing and wrong. The requirement is that the collector must call this before exposing moved objects to the mutator, and must provide the to-space invariant.
(This whole design would not work with the old Shenandoah barriers without additional work. I don't know if tagmaps ever worked at all for them? Maybe they added calls to Access<>::resolve (since happily deceased) to deal with that?) I also think there are a bunch of missing calls; piggybacking on the num-dead callback isn't correct (see later comment about that). src/hotspot/share/prims/jvmtiTagMap.cpp line 127: > 125: // The table cleaning, posting and rehashing can race for > 126: // concurrent GCs. So fix it here once we have a lock or are > 127: // at a safepoint. I think this comment and the one below about locking are confused, at least about rehashing. I _think_ this is referring to concurrent num-dead notification? I've already commented there about it being a problem to do the unlink &etc in the GC pause (see later comment). It also seems like a bad idea to be doing this here and block progress by a concurrent GC because we're holding the tagmap lock for a long time, which is another reason to not have the num-dead notification do very much (and not require a lock that might be held here for a long time). ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From shade at openjdk.java.net Tue Nov 3 18:24:05 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 3 Nov 2020 18:24:05 GMT Subject: RFR: 8254315: Shenandoah: Concurrent weak reference processing [v31] In-Reply-To: References: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com> Message-ID: <2rSnA-jJHIly2sT0ZMSZbo0hNasGUgeylnJ9jVs5QBU=.aa5c1c5f-1392-43be-97f3-d21eb679467b@github.com> On Tue, 3 Nov 2020 14:36:19 GMT, Roman Kennke wrote: >> Until now, references (as in java.lang.ref.Reference and its subclasses WeakReference, SoftReference, PhantomReference and the non-public FinalReference - I'll collectively call them weak references for the purpose of clarity). 
Workloads that make heavy use of such weak references will therefore potentially cause significant GC pauses. >> >> There are 3 main items that contribute to pause time linear in the number of references, or worse: >> - We need to scan and consider each reference on the various 'discovered' lists. >> - We need to mark through the subgraph of objects that are reachable only through FinalReference. Notice that this is theoretically only bounded by the live data set size. >> - Finally, all no-longer-reachable references need to be enqueued in the 'pending list' >> >> The problem is somewhat mitigated by pre-cleaning the discovered list: Any weak reference that we find to be strongly reachable will be removed before we go into the final-mark-pause. However, that is only a band-aid. >> >> The solution to this is two-fold: >> 1. Extend concurrent marking to also mark the 'finalizable' subgraph of the heap. This requires extending the marking bitmap to allow for two kinds of reachability: each object can now be strongly or finalizably reachable. Whenever marking encounters a FinalReference, it will mark through the referent and switch to 'finalizably' reachability for all objects starting from the referent. When marking encounters finalizably reachable objects while marking strongly, it will 'upgrade' reachability of such objects to strongly reachable. All of this can be done concurrently. Any encounter of a Reference (or subclass) object will enqueue that object into a thread-local 'discovered' list. Except for FinalReference, marking stops there, and does not mark through the referent. >> 2. Concurrent processing is performed after the final-mark pause. GC workers scan all discovered lists that have been collected by concurrent marking, and depending on reachability of the referent, either drop the Reference, or enqueue it into the global 'pending' list (from where it will be processed by the Java reference handler thread). 
In addition to that, we must ensure that no referents become resurrected by accessing Reference.get() on it. In order to achieve this, we employ special barriers in Reference.get() intrinsics that return NULL when the referent is not reachable. >> >> Testing: hotspot_gc_shenadoah (release+fastdebug, x86+aarch64), specjvm+specjbb without regressions, tier1, tier2, vmTestbase_vm_metaspace, vmTestbase_nsk_jvmti, with -XX:+UseShenandoahGC without regressions, specjvm with various levels of verification > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > AArch64 build fixes Looks fine, modulo few nits below: src/hotspot/share/gc/shenandoah/c1/shenandoahBarrierSetC1.cpp line 56: > 54: _load_reference_barrier_normal_rt_code_blob(NULL), > 55: _load_reference_barrier_native_rt_code_blob(NULL), > 56: _load_reference_barrier_weakref_rt_code_blob(NULL) {} `weakref` or `weak`? `weakref` looks like new nomenclature to me. src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp line 1066: > 1064: if (in1->bottom_type() == TypePtr::NULL_PTR && > 1065: !((in2->Opcode() == Op_ShenandoahLoadReferenceBarrier) && > 1066: ((ShenandoahLoadReferenceBarrierNode*)in2)->kind() != ShenandoahBarrierSet::AccessKind::NORMAL)) { The comment "LRB native" deserves a new wording now? I.e. "then step over normal LRB barriers". src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.hpp line 53: > 51: > 52: template > 53: inline void do_chunked_array_start(ShenandoahObjToScanQueue* q, T* cl, oop array, bool strong); This should be `bool weak`? src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.hpp line 56: > 54: > 55: template > 56: inline void do_chunked_array(ShenandoahObjToScanQueue* q, T* cl, oop array, int chunk, int pow, bool strong); This should be `bool weak`? ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/505 From kvn at openjdk.java.net Tue Nov 3 19:00:59 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 3 Nov 2020 19:00:59 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v9] In-Reply-To: References: <8Ryyxuf5P2D6WNyj4riYCTgN0U6WLrLpBmxhNbnmPpQ=.b2ed5660-99d0-49d1-83e0-8b2de518d7b8@github.com> Message-ID: <3OBqYTJqjla1_OhTl3dXiNKljr3yyba_OIzxlNvHgnk=.ba8e45f8-34da-42e0-ae0c-e30197c2438b@github.com> On Fri, 23 Oct 2020 12:01:11 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/vectornode.hpp line 826: >> >>> 824: class VectorMaskGenNode : public TypeNode { >>> 825: public: >>> 826: VectorMaskGenNode(Node* src, const Type* ty, const Type* ety): TypeNode(ty, 2), _elemType(ety) { >> >> Sorry, I don't quite understand the arguments here. What does 'src' mean to the mask? > > ty -> Node type , long in this case since for X86 mask register is 64 bit wide. > ety -> Mask element type, currently used during LoadVectorMasked/StoreVectorMasked idealization to compute the block sizes for constant masks and replace masked vector operations with non-masked if block size is equal to vector size. Src has been replaced by a better name "length" used for mask computation. Please, use meaningful names. 
------------- PR: https://git.openjdk.java.net/jdk/pull/302 From rkennke at openjdk.java.net Tue Nov 3 19:03:11 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 3 Nov 2020 19:03:11 GMT Subject: RFR: 8254315: Shenandoah: Concurrent weak reference processing [v32] In-Reply-To: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com> References: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com> Message-ID: <8LETdj0OOts00biRQDWKIr-FWZ2rPQf-DrdOnjxz68s=.6e174122-8317-48a1-8e49-6fd2da1d1230@github.com> > Until now, references (as in java.lang.ref.Reference and its subclasses WeakReference, SoftReference, PhantomReference and the non-public FinalReference - I'll collectively call them weak references for the purpose of clarity). Workloads that make heavy use of such weak references will therefore potentially cause significant GC pauses. > > There are 3 main items that contribute to pause time linear in the number of references, or worse: > - We need to scan and consider each reference on the various 'discovered' lists. > - We need to mark through the subgraph of objects that are reachable only through FinalReference. Notice that this is theoretically only bounded by the live data set size. > - Finally, all no-longer-reachable references need to be enqueued in the 'pending list' > > The problem is somewhat mitigated by pre-cleaning the discovered list: Any weak reference that we find to be strongly reachable will be removed before we go into the final-mark-pause. However, that is only a band-aid. > > The solution to this is two-fold: > 1. Extend concurrent marking to also mark the 'finalizable' subgraph of the heap. This requires extending the marking bitmap to allow for two kinds of reachability: each object can now be strongly or finalizably reachable. 
Whenever marking encounters a FinalReference, it will mark through the referent and switch to 'finalizably' reachability for all objects starting from the referent. When marking encounters finalizably reachable objects while marking strongly, it will 'upgrade' reachability of such objects to strongly reachable. All of this can be done concurrently. Any encounter of a Reference (or subclass) object will enqueue that object into a thread-local 'discovered' list. Except for FinalReference, marking stops there, and does not mark through the referent. > 2. Concurrent processing is performed after the final-mark pause. GC workers scan all discovered lists that have been collected by concurrent marking, and depending on reachability of the referent, either drop the Reference, or enqueue it into the global 'pending' list (from where it will be processed by Java reference handler thread). In addition to that, we must ensure that no referents become resurrected by accessing Reference.get() on it. In order to achieve this, we employ special barriers in Reference.get() intrinsics that return NULL when the referent is not reachable. 
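The two reachability levels and the 'upgrade' rule from item 1 can be illustrated with a small single-threaded sketch. This is not Shenandoah's implementation: the names `Mark`, `RefKind`, `Obj` and `Marker` are hypothetical, the heap is modelled as a plain vector indexed by position, and everything concurrent (barriers, thread-local discovered lists, the real bitmap encoding) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-object mark state; ordered so that a larger value is "stronger".
enum class Mark : std::uint8_t { Unmarked = 0, Finalizable = 1, Strong = 2 };

// Hypothetical classification of an object as a java.lang.ref.Reference kind.
enum class RefKind : std::uint8_t { None, Weak, Final };

struct Obj {
  std::vector<std::size_t> fields;  // indices of ordinary (strong) fields
  RefKind ref_kind;                 // None for non-Reference objects
  std::size_t referent;             // meaningful only when ref_kind != None
};

class Marker {
 public:
  explicit Marker(const std::vector<Obj>& heap)
      : _heap(heap), _marks(heap.size(), Mark::Unmarked) {}

  // Marks everything reachable from a strong root.
  void mark_from_root(std::size_t root) {
    visit(root, Mark::Strong);
    while (!_worklist.empty()) {
      const std::pair<std::size_t, Mark> item = _worklist.back();
      _worklist.pop_back();
      const Obj& obj = _heap[item.first];
      // Ordinary fields inherit the strength of the object being scanned.
      for (std::size_t field : obj.fields) {
        visit(field, item.second);
      }
      // Only FinalReference marks through its referent, and only finalizably.
      // Other Reference kinds stop here; their referent is left for later processing.
      if (obj.ref_kind == RefKind::Final) {
        visit(obj.referent, Mark::Finalizable);
      }
    }
  }

  Mark mark_of(std::size_t index) const { return _marks[index]; }
  const std::vector<std::size_t>& discovered() const { return _discovered; }

 private:
  void visit(std::size_t index, Mark strength) {
    assert(index < _heap.size());
    const Mark old = _marks[index];
    if (old >= strength) return;  // already marked at least this strongly
    if (old == Mark::Unmarked && _heap[index].ref_kind != RefKind::None) {
      _discovered.push_back(index);  // first encounter of a Reference object
    }
    _marks[index] = strength;  // first mark, or upgrade Finalizable -> Strong
    _worklist.emplace_back(index, strength);  // (re)scan so children are upgraded too
  }

  const std::vector<Obj>& _heap;
  std::vector<Mark> _marks;
  std::vector<std::size_t> _discovered;
  std::vector<std::pair<std::size_t, Mark>> _worklist;
};
```

In this model an object first reached through a FinalReference's referent is marked `Finalizable`; if a strong path to it is found later, `visit` upgrades it to `Strong` and rescans it so that its children are upgraded as well.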
> > Testing: hotspot_gc_shenadoah (release+fastdebug, x86+aarch64), specjvm+specjbb without regressions, tier1, tier2, vmTestbase_vm_metaspace, vmTestbase_nsk_jvmti, with -XX:+UseShenandoahGC without regressions, specjvm with various levels of verification Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'origin/shenandoah-concurrent-weakrefs' into shenandoah-concurrent-weakrefs - Final touch-ups ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/505/files - new: https://git.openjdk.java.net/jdk/pull/505/files/9165b6e5..9f367a7c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=505&range=31 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=505&range=30-31 Stats: 14 lines in 6 files changed: 0 ins; 0 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/505.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/505/head:pull/505 PR: https://git.openjdk.java.net/jdk/pull/505 From rkennke at openjdk.java.net Tue Nov 3 19:03:12 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 3 Nov 2020 19:03:12 GMT Subject: Integrated: 8254315: Shenandoah: Concurrent weak reference processing In-Reply-To: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com> References: <8byaPRNFSF4tG_fA2jxtiDwcEbbMS_Zmk39w86ugIV4=.6a942481-9fd0-44f7-a42a-3668b22bea3e@github.com> Message-ID: On Mon, 5 Oct 2020 13:42:02 GMT, Roman Kennke wrote: > Until now, references (as in java.lang.ref.Reference and its subclasses WeakReference, SoftReference, PhantomReference and the non-public FinalReference - I'll collectively call them weak references for the purpose of clarity). Workloads that make heavy use of such weak references will therefore potentially cause significant GC pauses. 
> > There are 3 main items that contribute to pause time linear to number of references, or worse: > - We need to scan and consider each reference on the various 'discovered' lists. > - We need to mark through subgraph of objects that are reachable only through FinalReference. Notice that this is theoretically only bounded by the live data set size. > - Finally, all no-longer-reachable references need to be enqueued in the 'pending list' > > The problem is somewhat mitigated by pre-cleaning the discovered list: Any weak reference that we find to be strongly reachable will be removed before we go into the final-mark-pause. However, that is only a band-aid. > > The solution to this is two-fold: > 1. Extend concurrent marking to also mark the 'finalizable' subgraph of the heap. This requires to extend the marking bitmap to allow for two kinds of reachability: each object can now be strongly and finalizably reachable. Whenever marking encounters a FinalReference, it will mark through the referent and switch to 'finalizably' reachability for all objects starting from the referent. When marking encounters finalizably reachable objects while marking strongly, it will 'upgrade' reachability of such objects to strongly reachable. All of this can be done concurrently. Any encounter of a Reference (or subclass) object will enqueue that object into a thread-local 'discovered' list. Except for FinalReference, marking stops there, and does not mark through the referent. > 2. Concurrent processing is performed after the final-mark pause. GC workers scan all discovered lists that have been collected by concurrent marking, and depending on reachability of the referent, either drop the Reference, or enqueue it into the global 'pending' list (from where it will be processed by Java reference handler thread). In addition to that, we must ensure that no referents become resurrected by accessing Reference.get() on it. 
In order to achieve this, we employ special barriers in Reference.get() intrinsics that return NULL when the referent is not reachable. > > Testing: hotspot_gc_shenadoah (release+fastdebug, x86+aarch64), specjvm+specjbb without regressions, tier1, tier2, vmTestbase_vm_metaspace, vmTestbase_nsk_jvmti, with -XX:+UseShenandoahGC without regressions, specjvm with various levels of verification This pull request has now been integrated. Changeset: f64a15d6 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/f64a15d6 Stats: 2440 lines in 55 files changed: 1638 ins; 576 del; 226 mod 8254315: Shenandoah: Concurrent weak reference processing Reviewed-by: zgu, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/505 From kvn at openjdk.java.net Tue Nov 3 19:28:05 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 3 Nov 2020 19:28:05 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11] In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 10:28:00 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. >> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. >> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
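The dispatch described in summary items 1 to 3 can be pictured with a scalar emulation. This is only a sketch under stated assumptions: `kPartialInlineSize` stands in for the `ArrayCopyPartialInlineSize` flag, `mask_for_lanes`, `masked_copy` and `copy_bytes` are hypothetical names, the `<=` comparison against the threshold is an assumption, and the masked vector load/store pair is imitated with a byte loop rather than real AVX-512 instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the ArrayCopyPartialInlineSize flag, in bytes.
constexpr std::size_t kPartialInlineSize = 32;

// Builds a lane mask with the low `lanes` bits set, one bit per byte lane.
inline std::uint64_t mask_for_lanes(std::size_t lanes) {
  assert(lanes <= 64);
  return lanes == 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << lanes) - 1);
}

// Scalar emulation of a masked vector load followed by a masked vector store.
// Lanes whose mask bit is clear are neither read nor written.
inline void masked_copy(std::uint8_t* dst, const std::uint8_t* src,
                        std::uint64_t mask, std::size_t vector_bytes) {
  assert(vector_bytes <= 64);
  std::uint8_t lanes[64] = {0};
  for (std::size_t lane = 0; lane < vector_bytes; ++lane) {
    if (mask & (std::uint64_t{1} << lane)) lanes[lane] = src[lane];  // "load"
  }
  for (std::size_t lane = 0; lane < vector_bytes; ++lane) {
    if (mask & (std::uint64_t{1} << lane)) dst[lane] = lanes[lane];  // "store"
  }
}

enum class CopyPath { Inline, Stub };

// Runtime length check: short copies take the inlined masked sequence,
// longer ones fall back to the out-of-line routine (memmove plays the stub here).
inline CopyPath copy_bytes(std::uint8_t* dst, const std::uint8_t* src,
                           std::size_t len) {
  if (len <= kPartialInlineSize) {
    masked_copy(dst, src, mask_for_lanes(len), kPartialInlineSize);
    return CopyPath::Inline;
  }
  std::memmove(dst, src, len);
  return CopyPath::Stub;
}
```

The point of the mask is that a single vector-wide load and store handles any length up to the threshold without a tail loop and without touching bytes past `len`.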
>>
>> Performance Results:
>> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
>> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
>> ArrayCopyPartialInlineSize : 32
>>
>> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
>> -- | -- | -- | -- | --
>> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
>> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
>> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
>> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
>> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
>> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
>> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
>> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
>> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
>> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
>> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
>> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
>> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
>> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
>> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
>> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
>> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
>> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
>> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
>> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
>> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
>> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
>> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
>> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
>> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
>> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
>> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
>> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
>> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
>> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
>> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
>> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
>> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
>> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
>> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
>> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
>> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
>> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
>> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
>> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
>> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
>> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
>> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
>> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
>> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
>> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
>> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
>> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
>> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
>> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
>> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
>> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
>> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
>> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
>> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
>> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
>> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
>> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
>> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
>> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
>> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
>> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
>> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
>> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
>> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
>> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
>> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
>> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
>> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
>> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
>> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
>> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
>> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
>> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
>> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
>> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
>> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
>> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>>
>> Detailed Reports:
>> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt) >>
WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - JDK-8252848: Review comments addressed. > - Merge remote-tracking branch 'origin' into JDK-8252848 > - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - JDK-8252848 : Review comments resolution. > - Merge remote-tracking branch 'upstream' into JDK-8252848 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - Replacing explicit type checks with existing type checking routines > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - ... and 1 more: https://git.openjdk.java.net/jdk/compare/4031cb41...9e85592a Changes requested by kvn (Reviewer). src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8074: > 8072: break; > 8073: default: > 8074: assert(false,"Should not reach here."); Please, use fatal() here and print typ value. 
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8055: > 8053: > 8054: > 8055: void MacroAssembler::evmovdqu(BasicType typ, KRegister kmask, XMMRegister dst, Address src, int vector_len) { Don't shorten words: typ - > type src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8079: > 8077: } > 8078: > 8079: void MacroAssembler::evmovdqu(BasicType typ, KRegister kmask, Address dst, XMMRegister src, int vector_len) { typ -> type src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8098: > 8096: break; > 8097: default: > 8098: assert(false,"Should not reach here."); fatal() src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1093: > 1091: // AVX512 Unaligned > 1092: void evmovdqu(BasicType typ, KRegister kmask, Address dst, XMMRegister src, int vector_len); > 1093: void evmovdqu(BasicType typ, KRegister kmask, XMMRegister dst, Address src, int vector_len); typ -> type src/hotspot/cpu/x86/vm_version_x86.cpp line 1409: > 1407: ArrayCopyPartialInlineSize != 32 && > 1408: ArrayCopyPartialInlineSize != 64)) { > 1409: int pi_size = 0; What is 'pi'? src/hotspot/cpu/x86/vm_version_x86.cpp line 1410: > 1408: ArrayCopyPartialInlineSize != 64)) { > 1409: int pi_size = 0; > 1410: if (MaxVectorSize > 32 && AVX3Threshold == 0) { I think we can compare with 64 here because MaxVectorSize value is power of 2: (MaxVectorSize >= 64 src/hotspot/cpu/x86/vm_version_x86.cpp line 1423: > 1421: if (ArrayCopyPartialInlineSize > MaxVectorSize) { > 1422: ArrayCopyPartialInlineSize = MaxVectorSize; > 1423: warning("Setting ArrayCopyPartialInlineSize as MaxVectorSize"); warning only if ArrayCopyPartialInlineSize is not default. src/hotspot/share/opto/arraycopynode.hpp line 184: > 182: static bool may_modify(const TypeOopPtr *t_oop, MemBarNode* mb, PhaseTransform *phase, ArrayCopyNode*& ac); > 183: > 184: static int get_partial_inline_vector_lane_count(BasicType type, int con_len); 'con' defined in Dictionary as 'an instance of deceiving or tricking someone'. Please, don't use short words which may confuse. 
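The review points on the `vm_version_x86.cpp` flag checks (a descriptive name instead of `pi_size`, comparing against 64 because `MaxVectorSize` is a power of two, and warning only for a non-default value) can be sketched as a standalone function. This is a guess at the shape of the logic, not the patch itself: the function and struct names are hypothetical, the flags are passed as plain parameters, and the fallback choice between 64 and 32 is an assumption since the quoted diff does not show it.

```cpp
#include <cassert>
#include <cstdio>

// Result of normalizing the (hypothetical) partial-inline-size setting.
struct InlineSizeResult {
  int size;     // value the flag would end up with, in bytes
  bool warned;  // whether a warning was printed
};

inline bool is_power_of_two(int value) {
  return value > 0 && (value & (value - 1)) == 0;
}

// requested:       value of the flag before normalization
// is_default:      true when the user did not set the flag explicitly
// max_vector_size: stand-in for MaxVectorSize (bytes, power of two)
// avx3_threshold:  stand-in for AVX3Threshold
inline InlineSizeResult normalize_partial_inline_size(int requested,
                                                      bool is_default,
                                                      int max_vector_size,
                                                      int avx3_threshold) {
  assert(is_power_of_two(max_vector_size));
  InlineSizeResult result = {requested, false};

  // Only 0, 32 and 64 are accepted; anything else gets a fallback (assumed policy).
  if (requested != 0 && requested != 32 && requested != 64) {
    int inline_size = 0;
    // Because max_vector_size is a power of two, "> 32" and ">= 64" are equivalent;
    // the latter states the intent directly.
    if (max_vector_size >= 64 && avx3_threshold == 0) {
      inline_size = 64;
    } else if (max_vector_size >= 32) {
      inline_size = 32;
    }
    result.size = inline_size;
  }

  // Never inline more than one vector's worth; warn only if the user chose the value.
  if (result.size > max_vector_size) {
    result.size = max_vector_size;
    if (!is_default) {
      std::fprintf(stderr, "Setting ArrayCopyPartialInlineSize as MaxVectorSize\n");
      result.warned = true;
    }
  }
  return result;
}
```

With this shape, a machine limited to 16-byte vectors silently lowers the default of 32 to 16, while an explicit request of 64 on a 32-byte machine is clamped with a warning.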
src/hotspot/share/opto/macroArrayCopy.cpp line 202: > 200: bool PhaseMacroExpand::generate_partial_inlining_block(Node** ctrl, MergeMemNode** mem, const TypePtr* adr_type, > 201: RegionNode** exit_block, Node** result_memory, Node* length, > 202: Node* src_start, Node* dst_start, BasicType type) { I need more time to look on this method. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From Monica.Beckwith at microsoft.com Tue Nov 3 20:05:59 2020 From: Monica.Beckwith at microsoft.com (Monica Beckwith) Date: Tue, 3 Nov 2020 20:05:59 +0000 Subject: RFR: 8255616: Disable AOT and Graal in Oracle OpenJDK In-Reply-To: References: , Message-ID: Hi all, First of all, big thanks to Volker for starting this discussion. As you all may have seen from the PRs we submitted last week [1] and our previous PR [2], we have been working on enabling JVMCI and AOT for Windows on Arm64. So RFR 8255616 came in as a surprise to us. It would be really helpful for us to understand the implication of 8255616 on Projects Metropolis and Leyden. We had started looking at Metropolis but more details on Leyden are hard to find. Given the interdependence of both these projects on the Graal compiler and the AOT/jaotc compiler, (as echoed by Volker), we would appreciate any guidance as to the future of these projects and of startup related improvements to C2. Regards, Monica [1] https://github.com/openjdk/jdk/pull/972 [2] https://github.com/openjdk/jdk/pull/685 From: hotspot-dev Date: Tuesday, November 3, 2020 at 4:52 AM To: Vladimir Kozlov Cc: HotSpot Open Source Developers , Mark Reinhold , discuss at openjdk.java.net Subject: Re: RFR: 8255616: Disable AOT and Graal in Oracle OpenJDK Hi Vladimir, this is an interesting step and I wonder how it affects the OpenJDK Graal, Metropolis and Leyden projects? 
- Project Graal [1] seems to have already been merged into project Metropolis as it states on its project page: "Further work on integrating Graal in the OpenJDK has moved to Project Metropolis." - Project Metropolis [2] has the following mission statement on its project page: "The goal of this Project is to provide a venue to explore and incubate advanced "Java-on-Java" implementation techniques for HotSpot. Our starting point is earlier proposals for using the Graal compiler and AOT static compilation technology to replace the HotSpot server compiler, and possibly other components of HotSpot." It seems that this goal becomes void when Graal AOT and Graal JIT are abandoned in the OpenJDK. - Project Leyden [?]: @Mark: what's actually the state of Project Leyden? We had a discussion [3], a vote [4] and the approval of the project [5] yet nothing has happened ever since. There's neither a project page nor a mailing list. Considering the fact that Leyden was supposed to "be based upon existing components in the JDK such as the HotSpot JVM, the `jaotc` ahead-of-time compiler, application class-data sharing, and the `jlink` linking tool" I wonder if Leyden is already dead before its instantiation if "jaotc", one of its core components, has now been deprecated? Or are there any plans to enhance C2 for AOT scenarios? 
Thank you and best regards, Volker [1] http://openjdk.java.net/projects/graal/ [2] http://openjdk.java.net/projects/metropolis/ [3] https://mail.openjdk.java.net/pipermail/discuss/2020-April/005429.html [4] https://mail.openjdk.java.net/pipermail/discuss/2020-May/005475.html [5] 
https://mail.openjdk.java.net/pipermail/announce/2020-June/000290.html On Fri, Oct 30, 2020 at 6:47 PM Vladimir Kozlov wrote: > > We shipped Ahead-of-Time compilation (the jaotc tool) in JDK 9, as an experimental feature. We shipped Graal as an experimental JIT compiler in JDK 10. We haven't seen much use of these features, and the effort required to support and enhance them is significant. We therefore intend to disable these features in Oracle builds as of JDK 16. > > We'll leave the sources for these features in the repository, in case any one else is interested in building them. But we will not update or test them. > > We'll continue to build and ship JVMCI as an experimental feature in Oracle builds. > > Tested changes in all tiers. 
> > I verified that with these changes I am still able to build Graal in open repo and run graalunit testing: > > `open$ bash test/hotspot/jtreg/compiler/graalunit/downloadLibs.sh /mydir/graalunit_lib/` > `open$ bash configure --with-debug-level=fastdebug --with-graalunit-lib=/mydir/graalunit_lib/ --with-jtreg=/mydir/jtreg` > `open$ make jdk-image` > `open$ make test-image` > `open$ make run-test TEST=compiler/graalunit/HotspotTest.java` > > ------------- > > Commit messages: > - 8255616: Disable AOT and Graal in Oracle OpenJDK > > Changes: https://git.openjdk.java.net/jdk/pull/960/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=960&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8255616 > Stats: 36 lines in 4 files changed: 21 ins; 11 del; 4 mod > Patch: 
https://git.openjdk.java.net/jdk/pull/960.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/960/head:pull/960 > > PR: https://git.openjdk.java.net/jdk/pull/960 From coleenp at openjdk.java.net Tue Nov 3 21:17:05 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 3 Nov 2020 21:17:05 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 14:47:35 GMT, Kim Barrett wrote: >> Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into jvmti-table >> - Merge branch 'master' into jvmti-table >> - More review comments from Stefan and ErikO >> - Code review comments from StefanK. 
>> - 8212879: Make JVMTI TagMap table not hash on oop address > > src/hotspot/share/prims/jvmtiTagMap.cpp line 3018: > >> 3016: } >> 3017: // Later GC code will relocate the oops, so defer rehashing until then. >> 3018: tag_map->_needs_rehashing = true; > > This is wrong for some collectors. I think all collectors ought to be calling set_needs_rehashing in appropriate places, and it can't be correctly piggybacked on the num-dead callback. (See discussion above for that function.) > > For example, G1 remark pause does weak processing (including weak oopstorage) and will call the num-dead callback, but does not move objects, so does not require tagmap rehashing. > > (I think CMS oldgen remark may have been similar, for what that's worth.) Ok, so I'm going to need help to know where in all the different GCs to make this call. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From ayang at openjdk.java.net Tue Nov 3 21:17:03 2020 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 3 Nov 2020 21:17:03 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 14:53:12 GMT, Coleen Phillimore wrote: >> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. >> >> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. >> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. 
In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. >> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. >> >> It has also been tested with tier1-6. >> >> Thank you to Stefan, Erik and Kim for their help with this change. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into jvmti-table > - Merge branch 'master' into jvmti-table > - More review comments from Stefan and ErikO > - Code review comments from StefanK. 
> - 8212879: Make JVMTI TagMap table not hash on oop address src/hotspot/share/utilities/hashtable.cpp line 164: > 162: } > 163: } > 164: return newsize; It is existing code, but could be made clearer as part of this PR: int newsize; for (int i=0; i= requested) break; } return newsize; Additionally, this method could be made `const`, right? PS: not a review, just a comment in passing ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From daniel.daugherty at oracle.com Tue Nov 3 21:18:23 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 3 Nov 2020 16:18:23 -0500 Subject: RFR: 8212879: Make JVMTI TagMap table not hash on oop address In-Reply-To: References: Message-ID: <3fb77c6e-a99b-c90b-9170-1ce5fca6df1b@oracle.com> On 11/2/20 8:22 AM, Coleen Phillimore wrote: > On Mon, 2 Nov 2020 08:34:17 GMT, Stefan Karlsson wrote: > >>> src/hotspot/share/prims/jvmtiTagMap.cpp line 126: >>> >>>> 124: // concurrent GCs. So fix it here once we have a lock or are >>>> 125: // at a safepoint. >>>> 126: // SetTag and GetTag should not post events! >>> I think it would be good to explain why. Otherwise, this just leaves the readers wondering why this is the case. >> Maybe even move this comment to the set_tag/get_tag code. > I was trying to explain why there's a boolean there but I can put this comment at both get_tag and set_tag. > > // Check if we have to processing for concurrent GCs. Typo: s/we have to processing/we have to do processing/ > // GetTag should not post events because the JavaThread has to > // transition to native for the callback and this cannot stop for > // safepoints with the hashmap lock held. 
> check_hashmap(false); > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/967 From gziemski at openjdk.java.net Tue Nov 3 21:21:07 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 3 Nov 2020 21:21:07 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v2] In-Reply-To: References: Message-ID: > hi all, > > Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. This change addresses them all: > > #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code > > #2 David's feedback - used unblock_program_error_signals() on all platforms (this change is superseded by JDK-8252533, will need to merge) > > #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms > > #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp > > #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include > > #6 Coleen's feedback - factored out print_signal_handlers() > > #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() > > #8 Thomas's feedback - factored out common POSIX signal initialization code > > #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API > > #10 YaSuenag's feedback - unified logging out of the scope for this fix > > #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: revert JVM_handle_XXX_signal change ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/636/files - new: https://git.openjdk.java.net/jdk/pull/636/files/29f56357..f0bbbcc7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=00-01 Stats: 47 lines in 14 files changed: 17 ins; 0 del; 30 mod Patch: https://git.openjdk.java.net/jdk/pull/636.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/636/head:pull/636 PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Tue Nov 3 21:21:08 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 3 Nov 2020 21:21:08 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v2] In-Reply-To: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> References: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> Message-ID: <97yDRMP9ypRIEe4imbYAggioWNUklNuGfKwnGOjt5hU=.db3180b6-40ab-4a05-8e2c-4a957a6a02f6@github.com> On Mon, 2 Nov 2020 19:43:27 GMT, Thomas Stuefe wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> revert JVM_handle_XXX_signal change > > make/hotspot/symbols/symbols-linux line 24: > >> 22: # >> 23: >> 24: JVM_handle_posix_signal > > Please don't change these (see comment above). Reverted. 
------------- PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Tue Nov 3 21:24:58 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 3 Nov 2020 21:24:58 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v2] In-Reply-To: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> References: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> Message-ID: On Mon, 2 Nov 2020 19:59:08 GMT, Thomas Stuefe wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> revert JVM_handle_XXX_signal change > > src/hotspot/os/posix/signals_posix.cpp line 452: > >> 450: } >> 451: >> 452: // Renamed from 'signalHandler' to avoid collision with other shared libs. > > Please don't change those, nor javaSignalHandler(). Reverted. > src/hotspot/os/posix/signals_posix.cpp line 1261: > >> 1259: PosixSignals::print_signal_handler(st, SHUTDOWN3_SIGNAL , buf, buflen); >> 1260: PosixSignals::print_signal_handler(st, BREAK_SIGNAL, buf, buflen); >> 1261: #if defined(AIX) > > Can you change both #ifdefs to `#ifdef SIGDANGER` and `#ifdef SIGTRAP` respectively, please? Some other Unices have this too. > > (Side note, I always wanted to change this coding to a loop to print all signal handlers unconditionally, regardless of whether this is a "hotspot signal" or not. When analyzing customer problems, sometimes it's interesting to know if other handlers are installed too (e.g. SIGCHLD). At least that's how we do things in our proprietary VM.) Fixed. 
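A minimal sketch of the `#ifdef SIGDANGER` / `#ifdef SIGTRAP` idea, assuming a POSIX system: guarding on the signal macro itself instead of on a platform macro such as `AIX` means any platform whose `<signal.h>` defines the signal gets the extra output. The names `describe_handler` and `print_optional_signal_handlers` are hypothetical and are not the HotSpot `PosixSignals` functions.

```cpp
#include <signal.h>
#include <cstdio>

// Hypothetical helper, not HotSpot's PosixSignals::print_signal_handler.
// Queries the current disposition of `sig` without changing it.
const char* describe_handler(int sig) {
  struct sigaction sa;
  if (sigaction(sig, nullptr, &sa) != 0) return "unknown";
  // With SA_SIGINFO set, the handler is stored in sa_sigaction.
  if ((sa.sa_flags & SA_SIGINFO) != 0) return "handler installed";
  if (sa.sa_handler == SIG_DFL) return "SIG_DFL";
  if (sa.sa_handler == SIG_IGN) return "SIG_IGN";
  return "handler installed";
}

// Prints the signals that exist only on some platforms and returns how
// many were printed. The guards test the signal macros, not the OS.
int print_optional_signal_handlers() {
  int printed = 0;
#ifdef SIGDANGER
  std::printf("SIGDANGER: %s\n", describe_handler(SIGDANGER));
  printed++;
#endif
#ifdef SIGTRAP
  std::printf("SIGTRAP: %s\n", describe_handler(SIGTRAP));
  printed++;
#endif
  return printed;
}
```

On a system without `SIGDANGER` the first block simply compiles away, so no per-OS conditional is needed at the call site.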
------------- PR: https://git.openjdk.java.net/jdk/pull/636 From coleenp at openjdk.java.net Tue Nov 3 21:28:08 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 3 Nov 2020 21:28:08 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 14:50:36 GMT, Kim Barrett wrote: >> Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into jvmti-table >> - Merge branch 'master' into jvmti-table >> - More review comments from Stefan and ErikO >> - Code review comments from StefanK. >> - 8212879: Make JVMTI TagMap table not hash on oop address > > src/hotspot/share/prims/jvmtiTagMap.cpp line 3015: > >> 3013: if (tag_map != NULL && !tag_map->is_empty()) { >> 3014: if (num_dead_entries != 0) { >> 3015: tag_map->hashmap()->unlink_and_post(tag_map->env()); > > Why are we doing this in the callback, rather than just setting a flag? I thought part of the point of this change was to get tagmap processing out of GC pauses. The same question applies to the non-safepoint side. The idea was to be lazy about updating the tagmap, waiting until someone actually needed to use it. Or if more prompt ObjectFree notifications are needed then signal some thread (maybe the service thread?) for followup. The JVMTI code expects the posting to be done quite eagerly presumably during GC, before it has a chance to disable the event or some other such operation. So the posting is done during the notification because it's as soon as possible. Deferring to the ServiceThread had two problems. 1. the event came later than the caller is expecting it, and in at least one test the event was disabled before posting, and 2. there's a comment in the code why we can't post events with a JavaThread. We'd have to transition into native while holding a no safepoint lock (or else deadlock). 
The point of making this change was so that the JVMTI table does not need GC code to serially process the table. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From dcubed at openjdk.java.net Tue Nov 3 21:28:06 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 3 Nov 2020 21:28:06 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: <6uV_hLuxf7fu90EgzZckv9LwT-jAVboukH35MKaTtcU=.40efbbd6-fd6e-427a-a940-784ec86ddb15@github.com> On Mon, 2 Nov 2020 13:19:08 GMT, Coleen Phillimore wrote: >> Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into jvmti-table >> - Merge branch 'master' into jvmti-table >> - More review comments from Stefan and ErikO >> - Code review comments from StefanK. >> - 8212879: Make JVMTI TagMap table not hash on oop address > > I think I addressed your comments, retesting now. Thank you! @coleenp - please make sure you hear from someone on the Serviceability team for this PR... ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From david.holmes at oracle.com Tue Nov 3 21:30:29 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 4 Nov 2020 07:30:29 +1000 Subject: Biased locking Obsoletion In-Reply-To: References: Message-ID: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> Expanding to hotspot-dev. On 4/11/2020 7:08 am, Patricio Chilano wrote: > Hi all, > > As discussed in 8231264, the idea was to switch biased locking to false > by default and deprecate all related flags with the intent to remove the > code in a future release unless compelling evidence showed that the code > is worth maintaining. > I see there is only one issue that was filed since biased locking was > disabled by default (https://github.com/openjdk/jdk/pull/542) that seems > to have been addressed. 
As per 8231264 change, the code was set to be > obsoleted in 16, so we are already in a position to remove biased > locking code unless there are arguments for the contrary. The > alternative would be to give more time and move biased locking > obsoletion to a future release. > Let me know your thoughts. > > Thanks, > > Patricio From coleenp at openjdk.java.net Tue Nov 3 21:33:59 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 3 Nov 2020 21:33:59 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 16:12:21 GMT, Kim Barrett wrote: >> Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into jvmti-table >> - Merge branch 'master' into jvmti-table >> - More review comments from Stefan and ErikO >> - Code review comments from StefanK. >> - 8212879: Make JVMTI TagMap table not hash on oop address > > src/hotspot/share/prims/jvmtiTagMap.cpp line 2979: > >> 2977: >> 2978: // Concurrent GC needs to call this in relocation pause, so after the objects are moved >> 2979: // and have their new addresses, the table can be rehashed. > > I think the comment is confusing and wrong. The requirement is that the collector must call this before exposing moved objects to the mutator, and must provide the to-space invariant. (This whole design would not work with the old Shenandoah barriers without additional work. I don't know if tagmaps ever worked at all for them? Maybe they added calls to Access<>::resolve (since happily deceased) to deal with that?) I also think there are a bunch of missing calls; piggybacking on the num-dead callback isn't correct (see later comment about that). So the design is that when the oops have new addresses, we set a flag in the table to rehash it. Not sure why this is wrong and why wouldn't it work for shenandoah? @zhengyu123 ? 
When we call WeakHandle.peek()/resolve() after the call, the new/moved oop address should be returned. Why wouldn't this be the case? ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From gziemski at openjdk.java.net Tue Nov 3 21:35:08 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 3 Nov 2020 21:35:08 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: References: Message-ID: > hi all, > > Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. This change addresses them all: > > #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code > > #2 David's feedback - used unblock_program_error_signals() on all platforms (this change is superseded by JDK-8252533, will need to merge) > > #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms > > #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp > > #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include > > #6 Coleen's feedback - factored out print_signal_handlers() > > #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() > > #8 Thomas's feedback - factored out common POSIX signal initialization code > > #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API > > #10 YaSuenag's feedback - unified logging out of the scope for this fix > > #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? 
Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: - use ifdef(SIGDANGER) and ifdef(SIGTRAP) - revert unblock_program_error_signals change ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/636/files - new: https://git.openjdk.java.net/jdk/pull/636/files/f0bbbcc7..1c2726de Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=01-02 Stats: 35 lines in 2 files changed: 20 ins; 8 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/636.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/636/head:pull/636 PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Tue Nov 3 21:35:09 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 3 Nov 2020 21:35:09 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> References: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> Message-ID: On Mon, 2 Nov 2020 20:00:23 GMT, Thomas Stuefe wrote: >> Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: >> >> - use ifdef(SIGDANGER) and ifdef(SIGTRAP) >> - revert unblock_program_error_signals change > > src/hotspot/os/posix/signals_posix.cpp line 459: > >> 457: // on all our platforms they would bring down the process immediately when >> 458: // getting raised while being blocked. >> 459: unblock_program_error_signals(); > > As remarked above, this will conflict with JDK-8252533, since I remove this function too. Can we leave this out please? Reverted. 
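A minimal sketch of what unblocking program error signals means at the POSIX level, assuming `pthread_sigmask` is available. The function names and the exact signal set (SIGSEGV, SIGBUS, SIGILL, SIGFPE) are assumptions for illustration; this is not the HotSpot `unblock_program_error_signals()` implementation.

```cpp
#include <signal.h>
#include <pthread.h>

// Hypothetical stand-in for the idea behind unblock_program_error_signals().
// Removes the synchronous error signals from the calling thread's mask.
// Returns 0 on success, or an error number from pthread_sigmask.
int unblock_error_signals() {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGSEGV);
  sigaddset(&set, SIGBUS);
  sigaddset(&set, SIGILL);
  sigaddset(&set, SIGFPE);
  return pthread_sigmask(SIG_UNBLOCK, &set, nullptr);
}

// True if `sig` is currently blocked in the calling thread.
// Passing a null `set` makes pthread_sigmask only report the old mask.
bool is_blocked(int sig) {
  sigset_t current;
  sigemptyset(&current);
  pthread_sigmask(SIG_SETMASK, nullptr, &current);
  return sigismember(&current, sig) == 1;
}
```

The mask is per thread, which is why the sketch uses `pthread_sigmask` rather than `sigprocmask`.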
------------- PR: https://git.openjdk.java.net/jdk/pull/636 From coleenp at openjdk.java.net Tue Nov 3 21:46:04 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 3 Nov 2020 21:46:04 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 16:17:58 GMT, Kim Barrett wrote: >> Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into jvmti-table >> - Merge branch 'master' into jvmti-table >> - More review comments from Stefan and ErikO >> - Code review comments from StefanK. >> - 8212879: Make JVMTI TagMap table not hash on oop address > > src/hotspot/share/prims/jvmtiTagMap.cpp line 127: > >> 125: // The table cleaning, posting and rehashing can race for >> 126: // concurrent GCs. So fix it here once we have a lock or are >> 127: // at a safepoint. > > I think this comment and the one below about locking are confused, at least about rehashing. I _think_ this is referring to concurrent num-dead notification? I've already commented there about it being a problem to do the unlink &etc in the GC pause (see later comment). It also seems like a bad idea to be doing this here and block progress by a concurrent GC because we're holding the tagmap lock for a long time, which is another reason to not have the num-dead notification do very much (and not require a lock that might be held here for a long time). The comment is trying to describe the situation like: 1. mark-end pause (WeakHandle.peek() returns NULL because object A is unmarked) 2. safepoint for heap walk 2a. Need to post ObjectFree event for object A before the heap walk doesn't find object A. 3. 
gc_notification - would have posted an ObjectFree event for object A if the heapwalk hadn't intervened The check_hashmap() function also checks whether the hash table needs to be rehashed before the next operation that uses the hashtable. Both operations require the table to be locked. The unlink and post needs to be in a GC pause for reasons that I stated above. The unlink and post were done in a GC pause so this isn't worse for any GCs. The lock can be held for concurrent GC while the number of entries are processed and this would be a delay for some applications that have requested a lot of tags, but these applications have asked for this and it's not worse than what we had with GC walking this table in safepoints. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Tue Nov 3 21:46:02 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 3 Nov 2020 21:46:02 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 13:19:08 GMT, Coleen Phillimore wrote: >> Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into jvmti-table >> - Merge branch 'master' into jvmti-table >> - More review comments from Stefan and ErikO >> - Code review comments from StefanK. >> - 8212879: Make JVMTI TagMap table not hash on oop address > > I think I addressed your comments, retesting now. Thank you! > @coleenp - please make sure you hear from someone on the Serviceability team > for this PR... I've asked @sspitsyn to review this. 
------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Tue Nov 3 21:46:05 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 3 Nov 2020 21:46:05 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v3] In-Reply-To: References: <5UvpVT3pWDNSJ1Vh_WIy-E3ZjOS8O-8sZi-9ZRyYYQI=.d5d244cf-5e1b-4790-9370-66ae3a2ea76c@github.com> Message-ID: On Tue, 3 Nov 2020 13:43:32 GMT, Kim Barrett wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> More review comments from Stefan and ErikO > > src/hotspot/share/gc/shared/weakProcessorPhases.hpp line 41: > >> 39: class Iterator; >> 40: >> 41: typedef void (*Processor)(BoolObjectClosure*, OopClosure*); > > I think this typedef is to support serial phases and that it is probably no longer used. ok, removed. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From gziemski at openjdk.java.net Tue Nov 3 21:48:01 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 3 Nov 2020 21:48:01 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> References: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> Message-ID: <-jCaLe-hgq37MGTSpeo_Ka-3ZDd_5zN4z2bg9MACu2M=.39538d88-d553-4710-bf40-38e07ff1401e@github.com> On Mon, 2 Nov 2020 19:55:29 GMT, Thomas Stuefe wrote: >> Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: >> >> - use ifdef(SIGDANGER) and ifdef(SIGTRAP) >> - revert unblock_program_error_signals change > > src/hotspot/os/aix/os_aix.cpp line 1442: > >> 1440: } >> 1441: >> 1442: void os::print_signal_handlers(outputStream* st, char* buf, size_t buflen) { > > This makes sense. But then, it would make sense to move this to os_posix.cpp completely. 
Or to even completely replace calls to os::print_signal_handlers with PosixSignals::print_signal_handlers() and remove the former. print_signal_handlers() is declared in os.h, so at first glance I thought I couldn't collapse it further. I'll have a second look. > src/hotspot/share/runtime/os.hpp line 970: > >> 968: >> 969: static address ucontext_get_pc(const ucontext_t* ctx); >> 970: static void ucontext_set_pc(ucontext_t* ctx, address pc); > > This feels misplaced here (and probably won't compile on windows) since ucontext_t is POSIX. At the very least needs ucontext.h. But I would consider moving this to os_posix. I thought I tested it and it built fine on Windows - will take another look... ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Tue Nov 3 21:48:02 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 3 Nov 2020 21:48:02 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: <-jCaLe-hgq37MGTSpeo_Ka-3ZDd_5zN4z2bg9MACu2M=.39538d88-d553-4710-bf40-38e07ff1401e@github.com> References: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> <-jCaLe-hgq37MGTSpeo_Ka-3ZDd_5zN4z2bg9MACu2M=.39538d88-d553-4710-bf40-38e07ff1401e@github.com> Message-ID: On Tue, 3 Nov 2020 21:42:07 GMT, Gerard Ziemski wrote: >> src/hotspot/os/aix/os_aix.cpp line 1442: >> >>> 1440: } >>> 1441: >>> 1442: void os::print_signal_handlers(outputStream* st, char* buf, size_t buflen) { >> >> This makes sense. But then, it would make sense to move this to os_posix.cpp completely. Or to even completely replace calls to os::print_signal_handlers with PosixSignals::print_signal_handlers() and remove the former. > > print_signal_handlers() is declared in os.h, so at first glance I thought I couldn't collapse it further. I'll have a second look. void os::print_signal_handlers(outputStream* st, char* buf, size_t buflen) is declared in os.h and is used on Windows platform as well. 
------------- PR: https://git.openjdk.java.net/jdk/pull/636 From coleenp at openjdk.java.net Tue Nov 3 21:51:03 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 3 Nov 2020 21:51:03 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v3] In-Reply-To: References: <5UvpVT3pWDNSJ1Vh_WIy-E3ZjOS8O-8sZi-9ZRyYYQI=.d5d244cf-5e1b-4790-9370-66ae3a2ea76c@github.com> Message-ID: <0dSlRGfxnjP_6IlH5CEYdF48d3ymvjCKE_s4UZb_WNc=.789216de-6d78-4c00-bdd4-e09f1dec3924@github.com> On Tue, 3 Nov 2020 13:45:57 GMT, Kim Barrett wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> More review comments from Stefan and ErikO > > src/hotspot/share/gc/shared/weakProcessorPhases.hpp line 50: > >> 48: }; >> 49: >> 50: typedef uint WeakProcessorPhase; > > This was originally written with the idea that WeakProcessorPhases::Phase (and WeakProcessorPhase) should be a scoped enum (but we didn't have that feature yet). It's possible there are places that don't cope with a scoped enum, since that feature wasn't available when the code was written, so there might have been mistakes. > > But because of that, I'd prefer to keep the WeakProcessorPhases::Phase type and the existing definition of WeakProcessorPhase. Except this proposed change is breaking that at least here: > > src/hotspot/share/gc/shared/weakProcessor.inline.hpp > 116 uint oopstorage_index = WeakProcessorPhases::oopstorage_index(phase); > 117 StorageState* cur_state = _storage_states.par_state(oopstorage_index); > => > 103 StorageState* cur_state = _storage_states.par_state(phase); > > I think eventually (as in some future RFE) this could all be collapsed to something provided by OopStorageSet. 
> enum class WeakProcessorPhase : uint {}; > > ENUMERATOR_RANGE(WeakProcessorPhase, > static_cast<WeakProcessorPhase>(0), > static_cast<WeakProcessorPhase>(OopStorageSet::weak_count)); > and replacing all uses of WeakProcessorPhases::Iterator with EnumIterator<WeakProcessorPhase> (which involves more than a type alias). > > Though it might be possible to go even further and eliminate WeakProcessorPhases as a thing separate from OopStorageSet. Ok, so I'm not sure what to do with this: enum Phase { // Serial phase. JVMTI_ONLY(jvmti) // Additional implicit phase values follow for oopstorages. }; I've removed the only thing in this enum. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From gziemski at openjdk.java.net Tue Nov 3 22:00:58 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 3 Nov 2020 22:00:58 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: References: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> Message-ID: On Tue, 3 Nov 2020 10:08:28 GMT, Thomas Stuefe wrote: >>> I hadn't realized that JVM_handle_XXX_signal defined a per-platform "public" entry point to allow external callers of the signal handling function in conjunction with -XX:+AllowUserSignalHandlers. We need to keep these but they can each call JVM_handle_posix_signal as their implementation. >> >> We should disentangle https://bugs.openjdk.java.net/browse/JDK-8255711 and this patch, https://bugs.openjdk.java.net/browse/JDK-8253742. >> >> I started by giving my patch a less generic name ("Fix and unify hotspot signal handlers"). I propose to do the same with this patch, or even split this patch into two smaller parts, since it does two things: >> - unify diagnostic printing code >> - unify SR handler setup >> >> As I wrote, I'd prefer to keep changes to JVM_xxx and javaSignalHandler out of this patch completely. I have to change those functions since the point of my patch is signal handler unification. 
>> >> In turn, I will keep my hands off any other code in signals_posix.xxx to decrease chances of conflict with this patch. > >> > I hadn't realized that JVM_handle_XXX_signal defined a per-platform "public" entry point to allow external callers of the signal handling function in conjunction with -XX:+AllowUserSignalHandlers. We need to keep these but they can each call JVM_handle_posix_signal as their implementation. >> >> We should disentangle https://bugs.openjdk.java.net/browse/JDK-8255711 and this patch, https://bugs.openjdk.java.net/browse/JDK-8253742. >> >> I started by giving my patch a less generic name ("Fix and unify hotspot signal handlers"). I propose to do the same with this patch, or even split this patch into two smaller parts, since it does two things: >> >> * unify diagnostic printing code >> * unify SR handler setup >> >> As I wrote, I'd prefer to keep changes to JVM_xxx and javaSignalHandler out of this patch completely. I have to change those functions since the point of my patch is signal handler unification. >> >> In turn, I will keep my hands off any other code in signals_posix.xxx to decrease chances of conflict with this patch. > > Oh, and yes, I preserve the JVM_handle_xxx_signal entries in my patch, but they are thin wrappers around an internal, posix-specific handler function. The snapshot of JDK that I'm using in my PR does not build on Windows. Do you have any suggestion how I can safely update to the latest JDK without messing up my PR? ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From sspitsyn at openjdk.java.net Tue Nov 3 22:23:00 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 3 Nov 2020 22:23:00 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 21:41:55 GMT, Coleen Phillimore wrote: > > @coleenp - please make sure you hear from someone on the Serviceability team > > for this PR... > > I've asked @sspitsyn to review this. 
Yes, I'm reviewing this. Still need another pass. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From burban at openjdk.java.net Tue Nov 3 23:08:59 2020 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Tue, 3 Nov 2020 23:08:59 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 16:58:58 GMT, Anton Kozlov wrote: > JDK-8255716 (#983) uncovered that os::processor_count on Linux can be not equal to number of cores reported in /proc/cpuinfo. The latter historically was used to decide CPU_A53MAC feature. This patch restores feature detection based on /proc/cpuinfo. > > Testing: linux -version > Unfortunately I cannot test windows/aarch64, CC: @lewurm @luhenry Tested slowdebug build on Windows+Arm64 with this patch, and smoke tested it with `jtreg:tier1_compiler_1` successfully. Change itself looks good to me too (but I'm not a reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From coleenp at openjdk.java.net Tue Nov 3 23:41:02 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 3 Nov 2020 23:41:02 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v3] In-Reply-To: <0dSlRGfxnjP_6IlH5CEYdF48d3ymvjCKE_s4UZb_WNc=.789216de-6d78-4c00-bdd4-e09f1dec3924@github.com> References: <5UvpVT3pWDNSJ1Vh_WIy-E3ZjOS8O-8sZi-9ZRyYYQI=.d5d244cf-5e1b-4790-9370-66ae3a2ea76c@github.com> <0dSlRGfxnjP_6IlH5CEYdF48d3ymvjCKE_s4UZb_WNc=.789216de-6d78-4c00-bdd4-e09f1dec3924@github.com> Message-ID: <1qM8Skbob0uL_KwdoJNDTyavFxOH_VHJc5o6yF881zI=.604bc76e-0536-48a0-91d5-4ba85e32bc11@github.com> On Tue, 3 Nov 2020 21:47:24 GMT, Coleen Phillimore wrote: >> src/hotspot/share/gc/shared/weakProcessorPhases.hpp line 50: >> >>> 48: }; >>> 49: >>> 50: typedef uint WeakProcessorPhase; >> >> This was originally written with the idea that WeakProcessorPhases::Phase (and WeakProcessorPhase) should be a scoped enum (but we didn't have that feature yet).
It's possible there are places that don't cope with a scoped enum, since that feature wasn't available when the code was written, so there might be mistakes. >> >> But because of that, I'd prefer to keep the WeakProcessorPhases::Phase type and the existing definition of WeakProcessorPhase. Except this proposed change is breaking that at least here: >> >> src/hotspot/share/gc/shared/weakProcessor.inline.hpp >> 116 uint oopstorage_index = WeakProcessorPhases::oopstorage_index(phase); >> 117 StorageState* cur_state = _storage_states.par_state(oopstorage_index); >> => >> 103 StorageState* cur_state = _storage_states.par_state(phase); >> >> I think eventually (as in some future RFE) this could all be collapsed to something provided by OopStorageSet. >> enum class WeakProcessorPhase : uint {}; >> >> ENUMERATOR_RANGE(WeakProcessorPhase, >> static_cast<WeakProcessorPhase>(0), >> static_cast<WeakProcessorPhase>(OopStorageSet::weak_count)); >> and replacing all uses of WeakProcessorPhases::Iterator with EnumIterator<WeakProcessorPhase> (which involves more than a type alias). >> >> Though it might be possible to go even further and eliminate WeakProcessorPhases as a thing separate from OopStorageSet. > > Ok, so I'm not sure what to do with this: > > enum Phase { > // Serial phase. > JVMTI_ONLY(jvmti) > // Additional implicit phase values follow for oopstorages. > `};` > > I've removed the only thing in this enum. >Though it might be possible to go even further and eliminate WeakProcessorPhases as a thing separate from OopStorageSet. This makes sense. Can we file another RFE for this? I was sort of surprised by how much code was involved so I tried to find a place to stop deleting.
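For reference, a standalone sketch of the scoped-enum idea discussed above: an enum with an explicit underlying type and no named enumerators, whose values cover a contiguous index range. This is only an illustration under stated assumptions; `Phase`, `kWeakCount`, `for_each_phase` and `phase_index` are hypothetical names, and the loop stands in for (and is not) HotSpot's ENUMERATOR_RANGE / EnumIterator machinery.

```cpp
// Hypothetical stand-in for OopStorageSet::weak_count.
constexpr unsigned kWeakCount = 4;

// Scoped enum with a fixed underlying type and no named enumerators.
// Because the underlying type is fixed, every unsigned value is a valid
// Phase, so each index in [0, kWeakCount) can denote one phase.
enum class Phase : unsigned {};

// Convert a phase back to its storage index (explicit, since scoped
// enums do not convert implicitly to integers).
constexpr unsigned phase_index(Phase p) { return static_cast<unsigned>(p); }

// Visit every phase in [0, kWeakCount) in order.
template <typename Fn>
void for_each_phase(Fn fn) {
  for (unsigned i = 0; i < kWeakCount; ++i) {
    fn(static_cast<Phase>(i));
  }
}
```

The point of the scoped enum here is that a `Phase` can no longer be passed where a plain `uint` index is expected without an explicit conversion, which is the kind of mix-up the quoted `par_state(phase)` change runs into.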
------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Tue Nov 3 23:41:02 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 3 Nov 2020 23:41:02 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 21:12:49 GMT, Albert Mingkun Yang wrote: >> Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into jvmti-table >> - Merge branch 'master' into jvmti-table >> - More review comments from Stefan and ErikO >> - Code review comments from StefanK. >> - 8212879: Make JVMTI TagMap table not hash on oop address > > src/hotspot/share/utilities/hashtable.cpp line 164: > >> 162: } >> 163: } >> 164: return newsize; > > It is existing code, but could be made clearer as part of this PR: > int newsize; > for (int i=0; i newsize = primelist[i]; > if (newsize >= requested) > break; > } > return newsize; > Additionally, this method could be made `const`, right? > > PS: not a review, just a comment in passing Yes, that is a lot simpler and better. I'd copied that code from another file without changing it that much. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From jiefu at openjdk.java.net Tue Nov 3 23:47:58 2020 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 3 Nov 2020 23:47:58 GMT Subject: RFR: 8255617: Zero: purge the remaining bytecode interpreter profiling support [v2] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 10:22:14 GMT, Aleksey Shipilev wrote: >> All the stubs in `interpreter/zero/bytecodeInterpreterProfiling.hpp` are empty. History shows the whole thing gradually moved to template interpreter. We can probably simplify Zero code by dropping these empty stubs altogether. 
Arguably, this makes porting to new architectures a bit harder, but it seems that the proper stepping stone after Zero is implementing template interpreter anyway. >> >> On my TR 3970X, this improves: >> - Linux x86_64 Zero "make images" times from ~18 minutes to ~17.5 minutes >> >> I would like to have the opinion of @GoeLin who added this for PPC64 porting back in 8u. And probably @DamonFool who is usually interested in Zero. And @jerboaa, @gnu-andrew who deal with Zero from time to time. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into JDK-8255617-zero-purge-bi-profiliing > - Also remove now unused BytecodeInterpreter::mdx field > - Remove leftover variable > - Remove leftover comments > - 8255617: Zero: purge the remaining bytecode interpreter profiling support Still looks good. ------------- Marked as reviewed by jiefu (Committer). PR: https://git.openjdk.java.net/jdk/pull/944 From coleenp at openjdk.java.net Tue Nov 3 23:59:56 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 3 Nov 2020 23:59:56 GMT Subject: RFR: 8255617: Zero: purge the remaining bytecode interpreter profiling support [v2] In-Reply-To: References: Message-ID: <3GuFh7in4L6v-mRtQQ-7FrwtA9eDzwuhpfpohDy4dm0=.8f75b5c9-d43d-4dcb-9dcd-60f321ada496@github.com> On Tue, 3 Nov 2020 10:22:14 GMT, Aleksey Shipilev wrote: >> All the stubs in `interpreter/zero/bytecodeInterpreterProfiling.hpp` are empty. History shows the whole thing gradually moved to template interpreter. We can probably simplify Zero code by dropping these empty stubs altogether. Arguably, this makes porting to new architectures a bit harder, but it seems that the proper stepping stone after Zero is implementing template interpreter anyway. 
>> >> On my TR 3970X, this improves: >> - Linux x86_64 Zero "make images" times from ~18 minutes to ~17.5 minutes >> >> I would like to have the opinion of @GoeLin who added this for PPC64 porting back in 8u. And probably @DamonFool who is usually interested in Zero. And @jerboaa, @gnu-andrew who deal with Zero from time to time. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into JDK-8255617-zero-purge-bi-profiliing > - Also remove now unused BytecodeInterpreter::mdx field > - Remove leftover variable > - Remove leftover comments > - 8255617: Zero: purge the remaining bytecode interpreter profiling support Still looks good. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/944 From coleenp at openjdk.java.net Wed Nov 4 00:08:10 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 4 Nov 2020 00:08:10 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v5] In-Reply-To: References: Message-ID: > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. 
In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. > > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Code review comments from Kim and Albert. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/967/files - new: https://git.openjdk.java.net/jdk/pull/967/files/1c3f2e1e..f66ea839 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=03-04 Stats: 10 lines in 3 files changed: 0 ins; 4 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/967/head:pull/967 PR: https://git.openjdk.java.net/jdk/pull/967 From david.holmes at oracle.com Wed Nov 4 00:52:36 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 4 Nov 2020 10:52:36 +1000 Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: References: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> Message-ID: <73af7be5-b2cb-6dfb-ba64-3b3e6ed608e1@oracle.com> On 4/11/2020 8:00 am, Gerard Ziemski wrote: > The snapshot of JDK that I'm using in my PR does not build on Windows. Do you have any suggestion how I can safely update to the latest JDK without messing up my PR? Your PR will have to merge with latest changes one way or another anyway, so I don't think you can mess up the PR. Just update your main branch, merge into your bug branch and push the merge changeset to your PR. But I'm going to find it very difficult to try and review your changes and Thomas's simultaneously anyway. There's just too much to keep track of.
Cheers, David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/636 > From dongbo at openjdk.java.net Wed Nov 4 01:45:55 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 4 Nov 2020 01:45:55 GMT Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v4] In-Reply-To: References: <_p5rQggkG8fBak9RrEDnoOzRnl8hsezlFjNmN_MFRiU=.c4167eaf-79f4-45eb-9fa9-2d0613274392@github.com> Message-ID: On Tue, 3 Nov 2020 17:00:27 GMT, Andrew Haley wrote: >> Done, I didn't realize this would be a problem at all. >> Thanks for the clarification. > > OK, thanks, I need to write some of this stuff down as guidance. Aliases for register names are always risky, but for the scratch registers doubly so. That's great! I think we will walk into much less detours if there is a guidance which contains empirical coding rules in it. For this patch, if we do not need further modifications, could you please press the approval button? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/992 From sspitsyn at openjdk.java.net Wed Nov 4 02:19:00 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Wed, 4 Nov 2020 02:19:00 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v5] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 00:08:10 GMT, Coleen Phillimore wrote: >> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. >> >> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. 
>> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. >> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. >> >> It has also been tested with tier1-6. >> >> Thank you to Stefan, Erik and Kim for their help with this change. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Code review comments from Kim and Albert. 
Hi Coleen, Wow, there are a lot of simplifications and code removal with this fix! It looks great in general, just some nits below. I also wanted to suggest renaming the 'set_needs_processing' to 'set_needs_rehashing'. :) src/hotspot/share/prims/jvmtiTagMap.hpp: Nit: Would it be better to use a plural form 'post_dead_objects_on_vm_thread'? : `+ void post_dead_object_on_vm_thread();` src/hotspot/share/prims/jvmtiTagMap.cpp: Nit: It'd be nice to add a short comment before the check_hashmap similar to L143 also explaining a difference (does check and post just for one env) with the check_hashmaps_for_heapwalk: 122 void JvmtiTagMap::check_hashmap(bool post_events) { . . . 143 // This checks for posting and rehashing and is called from the heap walks. 144 void JvmtiTagMap::check_hashmaps_for_heapwalk() { I'm just curious how this fragment was added. Did you get any failures in testing? : 1038 // skip if object is a dormant shared object whose mirror hasn't been loaded 1039 if (obj != NULL && obj->klass()->java_mirror() == NULL) { 1040 log_debug(cds, heap)("skipped dormant archived object " INTPTR_FORMAT " (%s)", p2i(obj), 1041 obj->klass()->external_name()); 1042 return; 1043 } Nit: Can we rename this field to something like '_some_dead_found' or '_dead_found'? : `1186 bool _some_dead;` Nit: The lines 2997-3007 and 3009-3019 do the same but in different contexts.
2996 if (!is_vm_thread) { 2997 if (num_dead_entries != 0) { 2998 JvmtiEnvIterator it; 2999 for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) { 3000 JvmtiTagMap* tag_map = env->tag_map_acquire(); 3001 if (tag_map != NULL) { 3002 // Lock each hashmap from concurrent posting and cleaning 3003 tag_map->unlink_and_post_locked(); 3004 } 3005 } 3006 // there's another callback for needs_rehashing 3007 } 3008 } else { 3009 assert(SafepointSynchronize::is_at_safepoint(), "must be"); 3010 JvmtiEnvIterator it; 3011 for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) { 3012 JvmtiTagMap* tag_map = env->tag_map_acquire(); 3013 if (tag_map != NULL && !tag_map->is_empty()) { 3014 if (num_dead_entries != 0) { 3015 tag_map->hashmap()->unlink_and_post(tag_map->env()); 3016 } 3017 // Later GC code will relocate the oops, so defer rehashing until then. 3018 tag_map->_needs_rehashing = true; 3019 } It feels like it can be refactored/simplified, at least, a little bit. Is it possible to check and just return if (num_dead_entries == 0)? If not, then, at least, it can be done the same way (except of locking). Also, can we have just one (static?) lock for the whole gc_notification (not per JVMTI env/JvmtiTagMap)? How much do we win by locking per each env/JvmtiTagMap? Note, that in normal case there is just one agent. It is very rare to have multiple agents requesting object tagging and ObjectFree events. It seems, this can be refactored to more simple code with one function doing work in both contexts. 
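A minimal sketch of what a single function serving both contexts could look like. It uses stand-in types rather than the real HotSpot classes: `TagMapSketch`, `notify_tag_maps` and the per-map `std::mutex` are all hypothetical, and applying the empty-map check on both paths is an assumption, not something the current code does.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Stand-in for JvmtiTagMap; none of these members are the real HotSpot API.
struct TagMapSketch {
  std::mutex lock;               // hypothetical per-map lock
  std::size_t entries = 0;       // number of tagged objects in the map
  bool needs_rehashing = false;  // set when a later use must rehash
  int posted = 0;                // counts unlink-and-post passes (illustration only)

  bool is_empty() const { return entries == 0; }
  void unlink_and_post() { ++posted; }
};

// One loop for both contexts. The non-VM-thread path takes the per-map lock
// around posting; the VM-thread (safepoint) path posts without the lock and
// additionally defers rehashing until the GC has relocated the oops.
inline void notify_tag_maps(const std::vector<TagMapSketch*>& maps,
                            std::size_t num_dead_entries,
                            bool is_vm_thread) {
  for (TagMapSketch* map : maps) {
    if (map == nullptr || map->is_empty()) {
      continue;  // nothing to post or rehash for this environment
    }
    if (num_dead_entries != 0) {
      if (is_vm_thread) {
        map->unlink_and_post();
      } else {
        std::lock_guard<std::mutex> guard(map->lock);
        map->unlink_and_post();
      }
    }
    if (is_vm_thread) {
      map->needs_rehashing = true;
    }
  }
}
```

Whether the lock can be dropped to a single static one, as asked above, is left open here; the sketch keeps one lock per map only because that mirrors the quoted code.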
src/hotspot/share/utilities/hashtable.cpp: Nit: Need space after the '{' : `+const int _small_table_sizes[] = {107, 1009, 2017, 4049, 5051, 10103, 20201, 40423 } ;` src/hotspot/share/prims/jvmtiTagMapTable.cpp: Nit: Extra space after assert: `119 assert (find(index, hash, obj) == NULL, "shouldn't already be present");` Thanks, Serguei ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From github.com+670087+jrziviani at openjdk.java.net Wed Nov 4 03:14:09 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Wed, 4 Nov 2020 03:14:09 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v4] In-Reply-To: References: Message-ID: > - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. > - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. > Ref: PowerISA 3.1, page 129. > > These instructions are particularly interesting to improve the following > pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in > `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. > > Long.toString, that generate such pattern in getChars, has showed a > good performance gain by using these new instructions. > > Example: > for (int i = 0; i < 200_000; i++) > res = Long.toString((long)i); > > java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString > > Without setbc (average): 0.1178 seconds > With setbc (average): 0.0396 seconds Ziviani has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. Ref: PowerISA 3.1, page 129. These instructions are particularly interesting to improve the following pattern `(src1<src2)? -1: ((src1>src2)?
1: 0)`, which can be found in `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. Long.toString, that generate such pattern in getChars, has showed a good performance gain by using these new instructions. Example: for (int i = 0; i < 200_000; i++) res = Long.toString((long)i); java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString Without setbc (average): 0.1178 seconds With setbc (average): 0.0396 seconds ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/907/files - new: https://git.openjdk.java.net/jdk/pull/907/files/41502730..0af02057 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=907&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=907&range=02-03 Stats: 8084 lines in 281 files changed: 4806 ins; 2000 del; 1278 mod Patch: https://git.openjdk.java.net/jdk/pull/907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/907/head:pull/907 PR: https://git.openjdk.java.net/jdk/pull/907 From github.com+670087+jrziviani at openjdk.java.net Wed Nov 4 03:14:09 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Wed, 4 Nov 2020 03:14:09 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v3] In-Reply-To: References: Message-ID: <7dpBkAsgncLqMBAZ4I4TlEYkUNi8-re3dhmsAxOtcP8=.63e7d361-d0e3-4ac0-86ba-fd53197e80e8@github.com> On Mon, 2 Nov 2020 10:06:20 GMT, Martin Doerr wrote: >> Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. > > Thanks for doing this. Please check my inline comments. > If you would like to benchmark C1, you can use -XX:TieredStopAtLevel=1 to switch off C2. > When you factor the new logic out, I highly prefer to use it everywhere: C2, C1 (LIR_Assembler::comp_fl2i), interpreter (TemplateTable::lcmp, TemplateTable::float_cmp) Hallo, @TheRealMDoerr. 
Sorry for the delay, there was a public holiday here. Anyway, the new code can be found [here](https://github.com/openjdk/jdk/commit/0af0205797e7084af4356934ed8e8ea185810569), hope I haven't missed any point. Thanks again sir. Your hints were very precise, I learned a lot. ------------- PR: https://git.openjdk.java.net/jdk/pull/907 From dholmes at openjdk.java.net Wed Nov 4 04:29:03 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 4 Nov 2020 04:29:03 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: References: Message-ID: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> On Tue, 3 Nov 2020 21:35:08 GMT, Gerard Ziemski wrote: >> hi all, >> >> Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. This change addresses them all: >> >> #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code >> >> #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) >> >> #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) >> >> #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp >> >> #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include >> >> #6 Coleen's feedback - factored out print_signal_handlers() >> >> #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() >> >> #8 Thomas's feedback - factored out common POSIX signal initialization code >> >> #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API >> >> #10 YaSuenag's feedback - unified logging out of the scope for this fix >> >> #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? 
> > Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: > > - use ifdef(SIGDANGER) and ifdef(SIGTRAP) > - revert unblock_program_error_signals change Hi Gerard, Overall looking good. Some changes still to be finalized, e.g. ucontext_t related functions in os.hpp. I flagged some os functions that are implemented in os_foo.cpp but which just call the Posix helper, which can be deleted from os_foo.cpp and simply added to os_posix.cpp. That can be a further cleanup RFE if you want to limit changes in this PR. A few minor nits below. Thanks, David src/hotspot/os/aix/os_aix.cpp line 2578: > 2576: } > 2577: > 2578: void os::SuspendedThreadTask::internal_do_task() { We should be able to have a single definition of this function in os_posix.cpp too. src/hotspot/os/posix/signals_posix.cpp line 1286: > 1284: void PosixSignals::print_signal_handlers(outputStream* st, char* buf, size_t buflen) { > 1285: st->print_cr("Signal Handlers:"); > 1286: PosixSignals::print_signal_handler(st, SIGSEGV, buf, buflen); You shouldn't need the PosixSignals:: prefix in this method. src/hotspot/os/posix/signals_posix.cpp line 1349: > 1347: sigaddset(&unblocked_sigs, SIGTRAP); > 1348: #endif > 1349: sigaddset(&unblocked_sigs, PosixSignals::SR_signum); Shouldn't need the PosixSignals:: prefix in this method. src/hotspot/os/posix/signals_posix.cpp line 1642: > 1640: > 1641: void PosixSignals::do_task(Thread* thread, os::SuspendedThreadTask* task) { > 1642: if (PosixSignals::do_suspend(thread->osthread())) { Shouldn't need PosixSignals:: prefix in this method. src/hotspot/os/posix/signals_posix.hpp line 33: > 31: > 32: typedef siginfo_t siginfo_t; > 33: typedef sigset_t sigset_t; I don't see why this is needed/wanted. We can include signal.h without a problem. I'm not even sure what these typedefs mean.
------------- PR: https://git.openjdk.java.net/jdk/pull/636 From dholmes at openjdk.java.net Wed Nov 4 04:29:03 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 4 Nov 2020 04:29:03 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: References: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> <-jCaLe-hgq37MGTSpeo_Ka-3ZDd_5zN4z2bg9MACu2M=.39538d88-d553-4710-bf40-38e07ff1401e@github.com> Message-ID: On Tue, 3 Nov 2020 21:45:04 GMT, Gerard Ziemski wrote: >> print_signal_handlers() is declared in os.h, so at first glance I thought I couldn't collapse it further. I'll have a second look. > > void os::print_signal_handlers(outputStream* st, char* buf, size_t buflen) is declared in os.h and is used on Windows platform as well. So we have one definition in os_posix.cpp and one in os_windows.cpp. ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From stuefe at openjdk.java.net Wed Nov 4 05:09:01 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 4 Nov 2020 05:09:01 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> References: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> Message-ID: On Wed, 4 Nov 2020 04:25:52 GMT, David Holmes wrote: >> Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: >> >> - use ifdef(SIGDANGER) and ifdef(SIGTRAP) >> - revert unblock_program_error_signals change > > Hi Gerard, > > Overall looking good. Some changes still to be finalized e.g ucontext_t related functions in os.hpp. > > I flagged some os functions that are implemented in os_foo.cpp but which just call the Posix helper, which can be deleted from os_foo.cpp and simply added to os_posix.cpp. That can't be a further cleanup RFE if you want to limit changes in this PR. 
>
> A few minor nits below.
>
> Thanks,
> David
> The snapshot of JDK that I'm using in my PR does not build on Windows. Do you have any suggestion how I can safely update to the latest JDK without messing up my PR?

Hi Gerard,

merging master would not help you with the build error. As I said, it complains about ucontext_t.

As for your question, just do a

    git checkout yourbranch
    git merge master

if you get conflicts, you'll need to resolve them, but this is the way to go without invalidating old commits.

-------------

PR: https://git.openjdk.java.net/jdk/pull/636

From dholmes at openjdk.java.net Wed Nov 4 05:11:00 2020
From: dholmes at openjdk.java.net (David Holmes)
Date: Wed, 4 Nov 2020 05:11:00 GMT
Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers
In-Reply-To:
References:
Message-ID:

On Tue, 3 Nov 2020 10:06:50 GMT, Thomas Stuefe wrote:

> Hi all,
>
> may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers.
>
> Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues.
>
> This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code.
>
> This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324.
>
> ----
>
> This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (beside the documented one of using libjsig). It seems that the `JVM_handle_xxx_signal()` functions are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory GitHub search revealed that this flag is used quite a bit.
>
> See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history.
>
> Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can. I plan to add a proper regression test with a later change, but for now I don't have the time for that.
>
> ---
>
> The fixed issues:
>
> 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed.
>
> Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early.
> But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed.
>
> 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls).
>
> 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported.
>
> 4) Every platform handler has this section:
>
>     JavaThread* thread = NULL;
>     VMThread* vmthread = NULL;
>     if (PosixSignals::are_signal_handlers_installed()) {
>       if (t != NULL ){
>         if(t->is_Java_thread()) {
>           thread = t->as_Java_thread();
>         }
>         else if(t->is_VM_thread()){
>           vmthread = (VMThread *)t;
>         }
>       }
>     }
>
> `vmthread` is unused on all platforms and can be removed.
>
> 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM):
>
>     if (sig == SIGPIPE || sig == SIGXFSZ) {
>       // allow chained handler to go first
>       if (PosixSignals::chained_handler(sig, info, ucVoid)) {
>         return true;
>       } else {
>         // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219
>         return true;
>       }
>     }
>
> - On s390 and ppc, we miss SIGXFSZ handling
>   _Update: Fixed separately for easier backport, see [JDK-8255734](https://bugs.openjdk.java.net/browse/JDK-8255734)_.
> - both paths return true - section can be shortened
>
> Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler.
>
> 6) At the end of every platform handler, before calling into fatal error handling, we unblock the signal:
>
>> // unmask current signal
>> sigset_t newset;
>> sigemptyset(&newset);
>> sigaddset(&newset, sig);
>> sigprocmask(SIG_UNBLOCK, &newset, NULL);
>>
>
> - Use of `sigprocmask()` is unspecified in a multithreaded process.
> - but then, this section is unnecessary anyway, since [JDK-8252533](https://bugs.openjdk.java.net/browse/JDK-8252533) we unmask error signals at the start of the signal handler.
>
> 7) the JFR crash protection is not consistently checked in all platform handlers.
>
> 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism).
>
> 9) on Linux ppc64 and AIX, we have this section:
>
>> if (sig == SIGILL && (pc < (address) 0x200)) {
>>   goto report_and_die;
>> }
>
> which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide).
>
> This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway.
>
> 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`.
>
> On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice.
>
> ----
>
> The changes in this patch:
>
> a) hotspot signal handling is now done by the following functions:
>
>
>         |                       |
>         v                       v
>     javaSignalHandler       JVM_handle_linux_signal()
>         |                  /
>         v                 v
>     javaSignalHandler_inner
>         |
>         v
>     PosixSignals::pd_hotspot_signal_handler()
>
>
> The right branch only exists to support the `AllowUserSignalHandlers` case, see [4].
>
> `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before.
> `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions.
>
>
> b) I commonized prologue- and epilogue coding.
> - I simplified (4) to a single line in the shared handler
> - I moved the JFR thread crash protection (7) up to the shared handler
> - I moved the complete epilogue up to the shared handler. That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6)
> - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept.
> - I simplified (5) and commonized it, and removed (9) completely
> - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out.
>
> Thanks for reviewing.
>
> Testing: this patch ran through our nightlies, in an earlier form. They will be re-run a few more times.
>
> I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change.
>
> Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned up. See also Gerard's work on [5], which is under review too.
>
> ----
>
> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html
> [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html
> [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html
> [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html
> [5] https://bugs.openjdk.java.net/browse/JDK-8253742

src/hotspot/os/posix/signals_posix.cpp line 1231:

> 1229: void PosixSignals::install_signal_handlers() {
> 1230:   static bool done = false;
> 1231:   assert(!done, "Only call once");

Not sure this kind of init guard is really needed, but if it is then the flag should be DEBUG_ONLY. Also I don't see you setting this to true anywhere.
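
For illustration only, a minimal sketch of what a debug-only call-once guard of the kind discussed above could look like. `DEMO_DEBUG_ONLY` is a hypothetical stand-in for HotSpot's `DEBUG_ONLY` macro, the function and counter names are made up for this example, and the standard one-argument `assert` is used instead of HotSpot's own `assert(cond, msg)`:

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's DEBUG_ONLY: keeps its argument only in
// builds where assertions are enabled (NDEBUG not defined).
#ifdef NDEBUG
#define DEMO_DEBUG_ONLY(code)
#else
#define DEMO_DEBUG_ONLY(code) code
#endif

// Counts installs so that the effect is observable in this example.
static int demo_install_count = 0;

void demo_install_signal_handlers() {
  // The flag exists only in debug builds, so release builds pay nothing.
  DEMO_DEBUG_ONLY(static bool done = false;)
  assert(!done && "Only call once");
  // Setting the flag is the step the review comment found missing.
  DEMO_DEBUG_ONLY(done = true;)
  ++demo_install_count;  // placeholder for the real installation work
}
```

In a debug build a second call to `demo_install_signal_handlers()` would trip the assert; in a release build both the flag and the check compile away.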
-------------

PR: https://git.openjdk.java.net/jdk/pull/1034

From stuefe at openjdk.java.net Wed Nov 4 05:11:01 2020
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Wed, 4 Nov 2020 05:11:01 GMT
Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers
In-Reply-To:
References:
Message-ID:

On Wed, 4 Nov 2020 05:05:28 GMT, David Holmes wrote:

> src/hotspot/os/posix/signals_posix.cpp line 1231:
>
>> 1229: void PosixSignals::install_signal_handlers() {
>> 1230:   static bool done = false;
>> 1231:   assert(!done, "Only call once");
>
> Not sure this kind of init guard is really needed, but if it is then the flag should be DEBUG_ONLY. Also I don't see you setting this to true anywhere.

Ouch. Stupid rebasing error.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1034

From stuefe at openjdk.java.net Wed Nov 4 05:20:09 2020
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Wed, 4 Nov 2020 05:20:09 GMT
Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v2]
In-Reply-To:
References:
Message-ID:

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  Remove unnecessary call-once guard from PosixSignals::install_signal_handlers()

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1034/files
  - new: https://git.openjdk.java.net/jdk/pull/1034/files/dde40e19..a548111f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1034&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1034&range=00-01

  Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1034.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1034/head:pull/1034

PR: https://git.openjdk.java.net/jdk/pull/1034

From sspitsyn at openjdk.java.net Wed Nov 4 05:40:03 2020
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Wed, 4 Nov 2020 05:40:03 GMT
Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v5]
In-Reply-To:
References:
Message-ID:

On Wed, 4 Nov 2020 02:15:52 GMT, Serguei Spitsyn wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Code review comments from Kim and Albert.
>
> Hi Coleen,
>
> Wow, there are a lot of simplifications and code removal with this fix!
> It looks great in general, just some nits below.
> I also wanted to suggest renaming the 'set_needs_processing' to 'set_needs_rehashing'. :)
>
> src/hotspot/share/prims/jvmtiTagMap.hpp:
>
> Nit: Would it be better to use a plural form 'post_dead_objects_on_vm_thread'? :
> `+ void post_dead_object_on_vm_thread();`
>
> src/hotspot/share/prims/jvmtiTagMap.cpp:
>
> Nit: It'd be nice to add a short comment before the check_hashmap similar to L143 also explaining a difference (does check and post just for one env) with the check_hashmaps_for_heapwalk:
> 122 void JvmtiTagMap::check_hashmap(bool post_events) {
> . . .
> 143 // This checks for posting and rehashing and is called from the heap walks.
> 144 void JvmtiTagMap::check_hashmaps_for_heapwalk() {
>
> I'm just curious how this fragment was added. Did you get any failures in testing? :
> 1038   // skip if object is a dormant shared object whose mirror hasn't been loaded
> 1039   if (obj != NULL && obj->klass()->java_mirror() == NULL) {
> 1040     log_debug(cds, heap)("skipped dormant archived object " INTPTR_FORMAT " (%s)", p2i(obj),
> 1041                          obj->klass()->external_name());
> 1042     return;
> 1043   }
>
> Nit: Can we rename this field to something like '_some_dead_found' or '_dead_found'? :
> `1186 bool _some_dead;`
>
> Nit: The lines 2997-3007 and 3009-3019 do the same but in different contexts.
> 2996   if (!is_vm_thread) {
> 2997     if (num_dead_entries != 0) {
> 2998       JvmtiEnvIterator it;
> 2999       for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) {
> 3000         JvmtiTagMap* tag_map = env->tag_map_acquire();
> 3001         if (tag_map != NULL) {
> 3002           // Lock each hashmap from concurrent posting and cleaning
> 3003           tag_map->unlink_and_post_locked();
> 3004         }
> 3005       }
> 3006       // there's another callback for needs_rehashing
> 3007     }
> 3008   } else {
> 3009     assert(SafepointSynchronize::is_at_safepoint(), "must be");
> 3010     JvmtiEnvIterator it;
> 3011     for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) {
> 3012       JvmtiTagMap* tag_map = env->tag_map_acquire();
> 3013       if (tag_map != NULL && !tag_map->is_empty()) {
> 3014         if (num_dead_entries != 0) {
> 3015           tag_map->hashmap()->unlink_and_post(tag_map->env());
> 3016         }
> 3017         // Later GC code will relocate the oops, so defer rehashing until then.
> 3018         tag_map->_needs_rehashing = true;
> 3019       }
> It feels like it can be refactored/simplified, at least, a little bit.
> Is it possible to check and just return if (num_dead_entries == 0)?
> If not, then, at least, it can be done the same way (except for locking).
> Q: Should the _needs_rehashing be set in both contexts?
>
> Also, can we have just one (static?) lock for the whole gc_notification (not per JVMTI env/JvmtiTagMap)? How much do we win by locking per each env/JvmtiTagMap? Note, that in normal case there is just one agent. It is very rare to have multiple agents requesting object tagging and ObjectFree events. It seems, this can be refactored to more simple code with one function doing work in both contexts.
>
> src/hotspot/share/utilities/hashtable.cpp:
>
> Nit: Need space after the '{' :
> `+const int _small_table_sizes[] = {107, 1009, 2017, 4049, 5051, 10103, 20201, 40423 } ;`
>
> src/hotspot/share/prims/jvmtiTagMapTable.cpp:
>
> Nit: Extra space after assert:
> `119 assert (find(index, hash, obj) == NULL, "shouldn't already be present");`
>
> Thanks,
> Serguei

More about possible refactoring of the JvmtiTagMap::gc_notification().
I'm thinking about something like below:

    void JvmtiTagMap::unlink_and_post_for_all_envs() {
      if (num_dead_entries == 0) {
        return; // nothing to unlink and post
      }
      JvmtiEnvIterator it;
      for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) {
        JvmtiTagMap* tag_map = env->tag_map_acquire();
        if (tag_map != NULL && !tag_map->is_empty()) {
          tag_map->unlink_and_post();
        }
      }
    }

    void JvmtiTagMap::gc_notification(size_t num_dead_entries) {
      if (Thread::current()->is_VM_thread()) {
        assert(SafepointSynchronize::is_at_safepoint(), "must be");
        unlink_and_post_for_all_envs();
        set_needs_rehashing();
      } else {
        MutexLocker ml(JvmtiTagMap_lock(), Mutex::_no_safepoint_check_flag);
        unlink_and_post_for_all_envs();
        // there's another callback for needs_rehashing
      }
    }

If we still need a lock per each JvmtiTagMap then it is possible to add this fragment to the unlink_and_post_for_all_envs:

    bool is_vm_thread = Thread::current()->is_VM_thread();
    MutexLocker ml(is_vm_thread ? NULL : lock(), Mutex::_no_safepoint_check_flag);

Then the code above could look like below:

    void JvmtiTagMap::unlink_and_post_for_all_envs() {
      if (num_dead_entries == 0) {
        return; // nothing to unlink and post
      }
      bool is_vm_thread = Thread::current()->is_VM_thread();
      JvmtiEnvIterator it;
      for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) {
        JvmtiTagMap* tag_map = env->tag_map_acquire();
        if (tag_map != NULL && !tag_map->is_empty()) {
          MutexLocker ml(is_vm_thread ?
NULL : lock(), Mutex::_no_safepoint_check_flag); tag_map->unlink_and_post(); } } } void JvmtiTagMap::gc_notification(size_t num_dead_entries) { if (Thread::current()->is_VM_thread()) { assert(SafepointSynchronize::is_at_safepoint(), "must be"); set_needs_rehashing(); } unlink_and_post_for_all_envs(); } ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From dholmes at openjdk.java.net Wed Nov 4 05:50:54 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 4 Nov 2020 05:50:54 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v2] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 05:20:09 GMT, Thomas Stuefe wrote: >> Hi all, >> >> may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers. >> >> Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues. >> >> This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code. >> >> This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324. >> >> ---- >> >> This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (beside the documented one of using libjsig). It seems that the JVM_handle_xxx_functions() are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag is used quite a bit. >> >> See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history. >> >> Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can. 
I plan to add a proper regression test with a later change, but for now I don't have the time for that. >> >> --- >> >> The fixed issues: >> >> 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed. >> >> Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early. >> But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed. >> >> 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls). >> >> 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported. >> >> 4) Every platform handler has this section: >> >> JavaThread* thread = NULL; >> VMThread* vmthread = NULL; >> if (PosixSignals::are_signal_handlers_installed()) { >> if (t != NULL ){ >> if(t->is_Java_thread()) { >> thread = t->as_Java_thread(); >> } >> else if(t->is_VM_thread()){ >> vmthread = (VMThread *)t; >> } >> } >> } >> >> `vmthread` is unused on all platforms and can be removed. 
>> >> 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM): >> >> if (sig == SIGPIPE || sig == SIGXFSZ) { >> // allow chained handler to go first >> if (PosixSignals::chained_handler(sig, info, ucVoid)) { >> return true; >> } else { >> // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219 >> return true; >> } >> } >> >> - On s390 and ppc, we miss SIGXFSZ handling >> _Update: Fixed separately for easier backport, see [https://bugs.openjdk.java.net/browse/JDK-8255734](JDK-8255734)_. >> - both paths return true - section can be shortened >> >> Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler. >> >> 6) At the end of every platform header, before calling into fatal error handling, we unblock the signal: >> >>> // unmask current signal >>> sigset_t newset; >>> sigemptyset(&newset); >>> sigaddset(&newset, sig); >>> sigprocmask(SIG_UNBLOCK, &newset, NULL); >>> >> >> - Use of `sigprocmask()` is UB in a multithreaded program. >> - but then, this section is unnecessary anyway, since [https://bugs.openjdk.java.net/browse/JDK-8252533](JDK-8252533) we unmask error signals at the start of the signal handler. >> >> 7) the JFR crash protection is not consistently checked in all platform handlers. >> >> 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism). 
>> >> 9) on Linux ppc64 and AIX, we have this section: >> >>> if (sig == SIGILL && (pc < (address) 0x200)) { >>> goto report_and_die; >>> } >> >> which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide). >> >> This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway. >> >> 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`. >> >> On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice. >> >> ---- >> >> The changes in this patch: >> >> a) hotspot signal handling is now done by the following functions: >> >> >> | | >> v v >> javaSignalHandler JVM_handle_linux_signal() >> | / >> v v >> javaSignalHandler_inner >> | >> v >> PosixSignals::pd_hotspot_signal_handler() >> >> >> The right branch only exists to support the `AllowUserSignalHandlers` case, see [4]. >> >> `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before. >> `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions. >> >> >> b) I commonized prologue- and epilogue coding. >> - I simplified (4) to a single line in the shared handler >> - I moved the JFR thread crash protection (7) up to the shared handler >> - I moved the complete epilogue up to the shared handler. 
That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6).
>> - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept.
>> - I simplified (5) and commonized it, and removed (9) completely.
>> - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out.
>>
>> Thanks for reviewing.
>>
>> Testing: this patch ran through our nightlies, in an earlier form. They will be re-run some more times.
>>
>> I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change.
>>
>> Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned. See also Gerard's work on [5], which is under review too.
>>
>> ----
>>
>> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html
>> [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html
>> [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html
>> [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html
>> [5] https://bugs.openjdk.java.net/browse/JDK-8253742
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove unnecessary call-once guard from PosixSignals::install_signal_handlers()

src/hotspot/os/posix/signals_posix.cpp line 443:

> 441: extern "C" JNIEXPORT int
> 442: #if defined(BSD)
> 443: JVM_handle_bsd_signal

Can we define this using token pasting, e.g. PASTE_TOKENS(JVM_handle, PASTE_TOKENS(INCLUDE_SUFFIX_OS, _signal))?
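
For illustration, a minimal sketch of how such token pasting could build the per-OS function name. Everything prefixed `DEMO_` is made up for this example and is not the HotSpot definition; the OS suffix is hard-coded to `_linux` here, whereas a real build would supply it from outside, and the handler body is an empty placeholder.

```cpp
#include <signal.h>
#include <cassert>
#include <cstring>

// Two-level macros: the outer level lets arguments be macro-expanded
// before the inner level applies ## (paste) or # (stringify).
#define DEMO_PASTE_AUX(x, y) x ## y
#define DEMO_PASTE(x, y) DEMO_PASTE_AUX(x, y)
#define DEMO_STR_AUX(x) #x
#define DEMO_STR(x) DEMO_STR_AUX(x)

// Assumption: stand-in for a build-provided OS suffix.
#define DEMO_OS_SUFFIX _linux

// Same nesting as suggested in the review comment:
// JVM_handle + (_linux + _signal) -> JVM_handle_linux_signal
#define DEMO_HANDLER_NAME \
  DEMO_PASTE(JVM_handle, DEMO_PASTE(DEMO_OS_SUFFIX, _signal))

// Placeholder definition under the pasted name; it does no real signal handling.
extern "C" int DEMO_HANDLER_NAME(int sig, siginfo_t* info, void* ucVoid,
                                 int abort_if_unrecognized) {
  assert(sig > 0);  // signal numbers are positive
  (void)info; (void)ucVoid; (void)abort_if_unrecognized;
  return 0;         // placeholder: "not handled"
}

// Returns the pasted name as a string, to make the expansion visible.
static const char* demo_handler_name() {
  return DEMO_STR(DEMO_HANDLER_NAME);
}
```

With this, the four `#if defined(...)` branches would collapse into a single definition, at the cost of the exported name no longer being greppable in the source.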
------------- PR: https://git.openjdk.java.net/jdk/pull/1034 From david.holmes at oracle.com Wed Nov 4 05:56:34 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 4 Nov 2020 15:56:34 +1000 Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers In-Reply-To: References: Message-ID: <7888a9d2-a3b9-9981-665f-1cb3181d199d@oracle.com> Hi Thomas, Some initial comments as this is quite big - but thanks for all the detailed explanations. On 4/11/2020 1:57 am, Thomas Stuefe wrote: > Hi all, > > may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers. > > Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues. > > This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code. > > This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324. > > ---- > > This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (beside the documented one of using libjsig). It seems that the JVM_handle_xxx_functions() are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag is used quite a bit. Thanks for doing that search. > See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history. > > Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can. I plan to add a proper regression test with a later change, but for now I don't have the time for that. On the documentation front we could at least explain what the flag really does. 
Rather than saying: product(bool, AllowUserSignalHandlers, false, \ "Do not complain if the application installs signal handlers " we could say something like: product(bool, AllowUserSignalHandlers, false, \ "Allow the application to install the primary signal handlers instead of the JVM." \ and we (I?) could update the java manpage. > > --- > > The fixed issues: > > 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed. > > Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early. > But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed. The code that is guarded by this check is implicitly safe as it is trivial and early in VM init the current thread will be null anyway and so we won't execute the guarded code. > 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls). Ok. > 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported. > > 4) Every platform handler has this section: > > JavaThread* thread = NULL; > VMThread* vmthread = NULL; > if (PosixSignals::are_signal_handlers_installed()) { > if (t != NULL ){ > if(t->is_Java_thread()) { > thread = t->as_Java_thread(); > } > else if(t->is_VM_thread()){ > vmthread = (VMThread *)t; > } > } > } > > `vmthread` is unused on all platforms and can be removed. Ok. Once upon a time we probably did something different if in the vmThread versus some other non-JavaThread. 
> 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM): > > if (sig == SIGPIPE || sig == SIGXFSZ) { > // allow chained handler to go first > if (PosixSignals::chained_handler(sig, info, ucVoid)) { > return true; > } else { > // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219 > return true; > } > } > > - On s390 and ppc, we miss SIGXFSZ handling > _Update: Fixed separately for easier backport, see [https://bugs.openjdk.java.net/browse/JDK-8255734](JDK-8255734)_. > - both paths return true - section can be shortened > > Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler. > > 6) At the end of every platform header, before calling into fatal error handling, we unblock the signal: > >> // unmask current signal >> sigset_t newset; >> sigemptyset(&newset); >> sigaddset(&newset, sig); >> sigprocmask(SIG_UNBLOCK, &newset, NULL); >> > > - Use of `sigprocmask()` is UB in a multithreaded program. > - but then, this section is unnecessary anyway, since [https://bugs.openjdk.java.net/browse/JDK-8252533](JDK-8252533) we unmask error signals at the start of the signal handler. But is this guaranteed to be one of the error signals? What if the application calls our handler for some other signal? I guess that is their problem. > 7) the JFR crash protection is not consistently checked in all platform handlers. > > 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism). 
> > 9) on Linux ppc64 and AIX, we have this section: > >> if (sig == SIGILL && (pc < (address) 0x200)) { >> goto report_and_die; >> } > > which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide). > > This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway. > > 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`. > > On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice. The Posix spec also states "For some implementations, the value of si_addr may be inaccurate." - so I'm not at all sure which "pc" we should be trusting here? I thought the ucontext was the detailed platform specific "context" object that we should extract information from. Which architectures give different values in the two and is there some documentation stating what happens for any given os/cpu? > ---- > > The changes in this patch: > > a) hotspot signal handling is now done by the following functions: > > > | | > v v > javaSignalHandler JVM_handle_linux_signal() > | / > v v > javaSignalHandler_inner Not clear why we need the _inner version. Why can't we just have javaSignalHandler which is installed as the handler and which is called by JVM_handle_XXX_signal? > | > v > PosixSignals::pd_hotspot_signal_handler() > > > The right branch only exists to support the `AllowUserSignalHandlers` case, see [4]. 
> > `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before. > `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions. > > > b) I commonized prologue- and epilogue coding. > - I simplified (4) to a single line in the shared handler > - I moved the JFR thread crash protection (7) up to the shared handler > - I moved the complete epilogue up to the shared handler. That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6) > - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept. > - I simplified (5) and commonized it, and removed (9) completely > - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out. > > Thanks for reviewing. > > Testing: this patch ran through our nightlies, in an earlier form. They will be re-ran some more times. > > I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change. > > Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned. See also Gerard's work on [5], which is under review too. :) I've actually gone through this in far more detail as I've been composing this email and overall it is looking very good. I've made a couple of comments directly in the PR, in addition to the above. 
Thanks, David ----- > > ---- > > [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html > [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html > [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html > [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html > [5] https://bugs.openjdk.java.net/browse/JDK-8253742 > > ------------- > > Commit messages: > - Initial patch > > Changes: https://git.openjdk.java.net/jdk/pull/1034/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1034&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8255711 > Stats: 916 lines in 13 files changed: 116 ins; 686 del; 114 mod > Patch: https://git.openjdk.java.net/jdk/pull/1034.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/1034/head:pull/1034 > > PR: https://git.openjdk.java.net/jdk/pull/1034 > From david.holmes at oracle.com Wed Nov 4 06:10:03 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 4 Nov 2020 16:10:03 +1000 Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> References: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> Message-ID: <6067c6c6-6754-ef11-93be-47d248ea7de7@oracle.com> Typo: On 4/11/2020 2:29 pm, David Holmes wrote: > On Tue, 3 Nov 2020 21:35:08 GMT, Gerard Ziemski wrote: > >>> hi all, >>> >>> Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. 
This change addresses them all: >>> >>> #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code >>> >>> #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) >>> >>> #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) >>> >>> #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp >>> >>> #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include >>> >>> #6 Coleen's feedback - factored out print_signal_handlers() >>> >>> #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() >>> >>> #8 Thomas's feedback - factored out common POSIX signal initialization code >>> >>> #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API >>> >>> #10 YaSuenag's feedback - unified logging out of the scope for this fix >>> >>> #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? >> >> Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: >> >> - use ifdef(SIGDANGER) and ifdef(SIGTRAP) >> - revert unblock_program_error_signals change > > Hi Gerard, > > Overall looking good. Some changes still to be finalized e.g ucontext_t related functions in os.hpp. > > I flagged some os functions that are implemented in os_foo.cpp but which just call the Posix helper, which can be deleted from os_foo.cpp and simply added to os_posix.cpp. That can't be a further cleanup RFE if you want to limit changes in this PR. s/can't be/can be/ :) David ----- > A few minor nits below. > > Thanks, > David > > src/hotspot/os/aix/os_aix.cpp line 2578: > >> 2576: } >> 2577: >> 2578: void os::SuspendedThreadTask::internal_do_task() { > > We should be able to have a single definition of this function in os_posix.cpp too. 
> > src/hotspot/os/posix/signals_posix.cpp line 1286: > >> 1284: void PosixSignals::print_signal_handlers(outputStream* st, char* buf, size_t buflen) { >> 1285: st->print_cr("Signal Handlers:"); >> 1286: PosixSignals::print_signal_handler(st, SIGSEGV, buf, buflen); > > You shouldn't need the PosixSignals:: prefix in this method. > > src/hotspot/os/posix/signals_posix.cpp line 1349: > >> 1347: sigaddset(&unblocked_sigs, SIGTRAP); >> 1348: #endif >> 1349: sigaddset(&unblocked_sigs, PosixSignals::SR_signum); > > Shouldn't need the PosixSignals::prefix in this method > > src/hotspot/os/posix/signals_posix.cpp line 1642: > >> 1640: >> 1641: void PosixSignals::do_task(Thread* thread, os::SuspendedThreadTask* task) { >> 1642: if (PosixSignals::do_suspend(thread->osthread())) { > > Shouldn't need PosixSignals:: prefix in this method. > > src/hotspot/os/posix/signals_posix.hpp line 33: > >> 31: >> 32: typedef siginfo_t siginfo_t; >> 33: typedef sigset_t sigset_t; > > I don't see why this is needed/wanted. We can include signal.h without a problem. > > I'm not even sure what these typedefs means ?? > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/636 > From thomas.stuefe at gmail.com Wed Nov 4 06:34:38 2020 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 4 Nov 2020 07:34:38 +0100 Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers In-Reply-To: <7888a9d2-a3b9-9981-665f-1cb3181d199d@oracle.com> References: <7888a9d2-a3b9-9981-665f-1cb3181d199d@oracle.com> Message-ID: Hi David, On Wed, Nov 4, 2020 at 6:56 AM David Holmes wrote: > Hi Thomas, > > Some initial comments as this is quite big - but thanks for all the > detailed explanations. > > > > > See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for > untangling this bit of history. > > > > Unfortunately there is no official documentation from Sun or Oracle, and > zero regression tests. So I try to preserve this interface as best as I > can. 
I plan to add a proper regression test with a later change, but for > now I don't have the time for that. > > On the documentation front we could at least explain what the flag > really does. Rather than saying: > > product(bool, AllowUserSignalHandlers, false, > \ > "Do not complain if the application installs signal handlers > " > > we could say something like: > > product(bool, AllowUserSignalHandlers, false, > \ > "Allow the application to install the primary signal handlers > instead of the JVM." \ > > and we (I?) could update the java manpage. > > I can update the text description, but I would like to defer any additional work for this switch to some other RFE. > > > > --- > > > > The fixed issues: > > > > 1) `PosixSignals::_are_signal_handlers_installed` is used inside the > platform handlers to guard a part of the platform handlers against > execution in case the signal handlers are not yet installed. > > > > Initially this confused me since when this handler is called it would of > course be installed. So that boolean would always be true. The only > explanation I found was that since these handlers can be invoked directly > from outside, this is some (ineffective) form of guard against calling this > handler too early. > > But that guard can be left out and that boolean removed. Our signal > handlers are safe to call before VM initialization is completed. > > The code that is guarded by this check is implicitly safe as it is > trivial and early in VM init the current thread will be null anyway and > so we won't execute the guarded code. > Yes, that's what I thought. > > > 2) The return code of JVM_handle_xxx_signal() was inconsistently set > (some as bool, some as int) as well as unused in normal code paths > (excluding outside calls). > > Ok. > > > 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there > is a day-zero bug which caused it to not be exported. 
> > > > 4) Every platform handler has this section: > > > > JavaThread* thread = NULL; > > VMThread* vmthread = NULL; > > if (PosixSignals::are_signal_handlers_installed()) { > > if (t != NULL ){ > > if(t->is_Java_thread()) { > > thread = t->as_Java_thread(); > > } > > else if(t->is_VM_thread()){ > > vmthread = (VMThread *)t; > > } > > } > > } > > > > `vmthread` is unused on all platforms and can be removed. > > Ok. Once upon a time we probably did something different if in the > vmThread versus some other non-JavaThread. > Solaris did use it (and AIX), but only comparing with NULL (I guess as a shorthand for "t->is_VM_thread()"). > > > 5) Every platform handler has some variant of this section, to ignore > SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM): > > > > if (sig == SIGPIPE || sig == SIGXFSZ) { > > // allow chained handler to go first > > if (PosixSignals::chained_handler(sig, info, ucVoid)) { > > return true; > > } else { > > // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219 > > return true; > > } > > } > > > > - On s390 and ppc, we miss SIGXFSZ handling > > _Update: Fixed separately for easier backport, see [ > https://bugs.openjdk.java.net/browse/JDK-8255734](JDK-8255734)_. > > - both paths return true - section can be shortened > > > > Side note: having handlers for those signals may be unnecessary. We > could just set the signal handler to `SIG_IGN`. We would have to tiptoe > around any third party handlers for those signals, but it still may be > simpler. > > > > 6) At the end of every platform header, before calling into fatal error > handling, we unblock the signal: > > > >> // unmask current signal > >> sigset_t newset; > >> sigemptyset(&newset); > >> sigaddset(&newset, sig); > >> sigprocmask(SIG_UNBLOCK, &newset, NULL); > >> > > > > - Use of `sigprocmask()` is UB in a multithreaded program. 
> > - but then, this section is unnecessary anyway, since [JDK-8252533](https://bugs.openjdk.java.net/browse/JDK-8252533) we unmask error signals at the start of the signal handler.
>
> But is this guaranteed to be one of the error signals? What if the
> application calls our handler for some other signal? I guess that is
> their problem.
>

Good point, but:
- we only need to unblock error signals. There is no reasonable need to unblock other signals in fatal error handling (on the contrary, the rest should be kept blocked so as not to interfere with hs-err printing).
- we only install handlers for: SIGSEGV, SIGPIPE, SIGBUS, SIGILL, SIGFPE, SIGTRAP, SIGXFSZ. Of those, we don't unblock SIGPIPE and SIGXFSZ. But those are handled at the entrance of javaSignalHandler_inner, so we should never enter fatal error handling because of those.

But you raise an interesting point: I am not sure whether SIGXFSZ can be deferred. What happens if we write a big core file and that hits the file limit, but SIGXFSZ is blocked from delivery? Will we just get terminated? Well, maybe we should get terminated. But this is a question for another RFE.

> > 7) the JFR crash protection is not consistently checked in all platform handlers.
> >
> > 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism).
> >
> > 9) on Linux ppc64 and AIX, we have this section:
> >
> >> if (sig == SIGILL && (pc < (address) 0x200)) {
> >> goto report_and_die;
> >> }
> >
> > which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide).
> >
> > This coding is irrelevant on Linux.
On AIX, it can also be removed, > since this SIGILL would be unrecognized by the hotspot and later count as > fatal error anyway. > > > > 10) When invoking the fatal error handler, we extract the pc from the > context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is > not totally correct. According to POSIX [3], for those signals the address > of the faulting instruction is handed over in `si_info.si_addr`. > > > > On most platforms this does not matter, they are the same. But on some > architectures the pc in the signal context actually points somewhere else, > e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the > better choice. > > The Posix spec also states "For some implementations, the value of > si_addr may be inaccurate." - so I'm not at all sure which "pc" we > should be trusting here? I thought the ucontext was the detailed > platform specific "context" object that we should extract information > from. Which architectures give different values in the two and is there > some documentation stating what happens for any given os/cpu? > > I overlooked this. Well, this is a helpful standard :) I saw this happen on s390 and on pa-risc. Before this patch, I did correct this in the platform handler. Out of caution I could #ifdef this section to s390. > > ---- > > > > The changes in this patch: > > > > a) hotspot signal handling is now done by the following functions: > > > > > > | | > > v v > > javaSignalHandler JVM_handle_linux_signal() > > | / > > v v > > javaSignalHandler_inner > > Not clear why we need the _inner version. Why can't we just have > javaSignalHandler which is installed as the handler and which is called > by JVM_handle_XXX_signal? > Because JVM_handle_XXX_signal has one more argument than the standard signal handler (abort_if_unrecognized). > > > | > > v > > PosixSignals::pd_hotspot_signal_handler() > > > > > > The right branch only exists to support the `AllowUserSignalHandlers` > case, see [4]. 
> > > > `javaSignalHandler` is registered as handler, as it was before; > JVM_handle_linux_signal() is exported as before. > > `javaSignalHandler_inner` contains the shared portion of the signal > handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining > platform dependent portions. > > > > > > b) I commonized prologue- and epilogue coding. > > - I simplified (4) to a single line in the shared handler > > - I moved the JFR thread crash protection (7) up to the shared handler > > - I moved the complete epilogue up to the shared handler. That > includes calling the chained handlers, should they exist, as well as > invoking the fatal error handler. That fixes (8) and (6) > > - Zero has this tradition of showing a robot telling a cat about the > error signal, which I like, and kept. > > - I simplified (5) and commonized it, and removed (9) completely > > - In PosixSignals::install_signal_handlers(), I removed the > `signal_handlers_are_installed` guard and replaced it with an assert. > Unfortunately this causes lots of indentation changes. @gerard-ziemski: if > this clashes too much with your patch for JDK-8253742, I'll leave that part > out. > > > > Thanks for reviewing. > > > > Testing: this patch ran through our nightlies, in an earlier form. They > will be re-ran some more times. > > > > I'd be happy if aarch64 porters could take a look at the aarch64 portion > of this change. > > > > Please note that I had to draw a line somewhere - this is an open ended > issue and a lot more could be cleaned. See also Gerard's work on [5], which > is under review too. > > :) I've actually gone through this in far more detail as I've been > composing this email and overall it is looking very good. I've made a > couple of comments directly in the PR, in addition to the above. > > Thank you! I'll wait some more, then prepare an update. 
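
To make the answer about the `_inner` function concrete, here is a rough sketch of the layering from (a). The `demo_` names and bodies are placeholders, not the actual HotSpot code; it is also an assumption of this sketch that the registered handler passes `true` for `abort_if_unrecognized`. The point is only that the registered handler must have the three-argument `sa_sigaction` shape, while the exported entry point carries a fourth argument, so both forward to a shared inner function.

```cpp
#include <signal.h>
#include <cassert>

// Demonstration only: records the flag the inner handler was last called with.
static int demo_last_abort_flag = -1;

// Shared portion. Takes the extra abort_if_unrecognized argument.
static bool demo_handler_inner(int sig, siginfo_t* info, void* uc,
                               bool abort_if_unrecognized) {
  (void)sig; (void)info; (void)uc;
  demo_last_abort_flag = abort_if_unrecognized ? 1 : 0;
  return false;  // placeholder: pretend the signal was not recognized
}

// Left branch: has the signature sigaction() expects for SA_SIGINFO handlers,
// so it has no room for a fourth argument.
static void demo_signal_handler(int sig, siginfo_t* info, void* uc) {
  demo_handler_inner(sig, info, uc, true /* assumed default */);
}

// Right branch: exported-style entry point for third-party handlers,
// which lets the caller choose abort_if_unrecognized.
extern "C" int demo_handle_signal(int sig, siginfo_t* info, void* uc,
                                  int abort_if_unrecognized) {
  return demo_handler_inner(sig, info, uc, abort_if_unrecognized != 0) ? 1 : 0;
}
```

Registration would then use `demo_signal_handler` as `sa_sigaction` with `SA_SIGINFO` set, and an outside handler would call `demo_handle_signal(sig, info, uc, 0)` first and fall back to its own logic if that returns 0.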
Cheers, Thomas > Thanks, > David > ----- > > > > > ---- > > > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html > > [2] > https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html > > [3] > https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html > > [4] > https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html > > [5] https://bugs.openjdk.java.net/browse/JDK-8253742 > > > > ------------- > > > > Commit messages: > > - Initial patch > > > > Changes: https://git.openjdk.java.net/jdk/pull/1034/files > > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1034&range=00 > > Issue: https://bugs.openjdk.java.net/browse/JDK-8255711 > > Stats: 916 lines in 13 files changed: 116 ins; 686 del; 114 mod > > Patch: https://git.openjdk.java.net/jdk/pull/1034.diff > > Fetch: git fetch https://git.openjdk.java.net/jdk > pull/1034/head:pull/1034 > > > > PR: https://git.openjdk.java.net/jdk/pull/1034 > > > From thomas.stuefe at gmail.com Wed Nov 4 06:39:28 2020 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 4 Nov 2020 07:39:28 +0100 Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers In-Reply-To: References: <7888a9d2-a3b9-9981-665f-1cb3181d199d@oracle.com> Message-ID: > >> > 6) At the end of every platform header, before calling into fatal error >> handling, we unblock the signal: >> > >> >> // unmask current signal >> >> sigset_t newset; >> >> sigemptyset(&newset); >> >> sigaddset(&newset, sig); >> >> sigprocmask(SIG_UNBLOCK, &newset, NULL); >> >> >> > >> > - Use of `sigprocmask()` is UB in a multithreaded program. >> > - but then, this section is unnecessary anyway, since [ >> https://bugs.openjdk.java.net/browse/JDK-8252533](JDK-8252533) we unmask >> error signals at the start of the signal handler. >> >> But is this guaranteed to be one of the error signals? What if the >> application calls our handler for some other signal? 
I guess that is
>> their problem.
>>
>>
> Good point, but:
> - we only need to unblock error signals. There is no reasonable need to unblock other signals in fatal error handling (on the contrary, the rest should be kept blocked so as not to interfere with hs-err printing).
> - we only install handlers for: SIGSEGV, SIGPIPE, SIGBUS, SIGILL, SIGFPE, SIGTRAP, SIGXFSZ. Of those, we don't unblock SIGPIPE and SIGXFSZ. But those are handled at the entrance of javaSignalHandler_inner, so we should never enter fatal error handling because of those.
> But you raise an interesting point: I am not sure whether SIGXFSZ can be deferred. What happens if we write a big core file and that hits the file limit, but SIGXFSZ is blocked from delivery? Will we just get terminated? Well, maybe we should get terminated. But this is a question for another RFE.
>
>

Missed part of your point here. Calling this function from the outside is only allowed for a couple of signals, see comment:

// This routine may recognize any of the following kinds of signals:
// SIGBUS, SIGSEGV, SIGILL, SIGFPE, SIGQUIT, SIGPIPE, SIGXFSZ, SIGUSR1.
// It should be consulted by handlers for any of those signals.

Note that this list is not really correct, as it includes SIGUSR1 and SIGQUIT, neither of which is handled by the hotspot signal handler. As you wrote before, this mechanism is only to handle signals the hotspot commandeers.

So I think JVM_handle_xxx_signal() should test for the list of allowed signals, and just return false right away in case sig is none of the hotspot signals.
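
Roughly what I have in mind, as a sketch only. The function names are hypothetical and the "handled" return value is a placeholder; the signal list is the one given above for the handlers we install.

```cpp
#include <signal.h>
#include <cassert>

// Hypothetical helper: true if sig is one of the signals for which hotspot
// installs its handler (list as given in this mail).
static bool demo_is_hotspot_signal(int sig) {
  switch (sig) {
    case SIGSEGV:
    case SIGPIPE:
    case SIGBUS:
    case SIGILL:
    case SIGFPE:
    case SIGTRAP:
    case SIGXFSZ:
      return true;
    default:
      return false;
  }
}

// Hypothetical stand-in for the exported entry point: bail out early for
// signals that are not ours, before touching any handler logic.
static int demo_handle_signal_from_outside(int sig, siginfo_t* info, void* uc,
                                           int abort_if_unrecognized) {
  (void)info; (void)uc; (void)abort_if_unrecognized;
  if (!demo_is_hotspot_signal(sig)) {
    return 0;  // not a hotspot signal: report "not handled" right away
  }
  // ... the real function would continue into the shared handler here ...
  return 1;    // placeholder result for this sketch
}
```

With such a check, SIGUSR1 and SIGQUIT from the old comment would simply be rejected, and the comment could be corrected to match.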
>> From shade at openjdk.java.net Wed Nov 4 06:39:55 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 4 Nov 2020 06:39:55 GMT Subject: RFR: 8255617: Zero: purge the remaining bytecode interpreter profiling support [v2] In-Reply-To: <3GuFh7in4L6v-mRtQQ-7FrwtA9eDzwuhpfpohDy4dm0=.8f75b5c9-d43d-4dcb-9dcd-60f321ada496@github.com> References: <3GuFh7in4L6v-mRtQQ-7FrwtA9eDzwuhpfpohDy4dm0=.8f75b5c9-d43d-4dcb-9dcd-60f321ada496@github.com> Message-ID: <1LGHbZw6ldM5ubjxAbyfXwkfFicg-W_YIwuResAYcL8=.bd0fca7a-d570-4241-a87d-eefa83cb22c3@github.com> On Tue, 3 Nov 2020 23:56:57 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8255617-zero-purge-bi-profiliing >> - Also remove now unused BytecodeInterpreter::mdx field >> - Remove leftover variable >> - Remove leftover comments >> - 8255617: Zero: purge the remaining bytecode interpreter profiling support > > Still looks good. Thanks folks! ------------- PR: https://git.openjdk.java.net/jdk/pull/944 From shade at openjdk.java.net Wed Nov 4 06:42:56 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 4 Nov 2020 06:42:56 GMT Subject: Integrated: 8255617: Zero: purge the remaining bytecode interpreter profiling support In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 19:47:02 GMT, Aleksey Shipilev wrote: > All the stubs in `interpreter/zero/bytecodeInterpreterProfiling.hpp` are empty. History shows the whole thing gradually moved to template interpreter. We can probably simplify Zero code by dropping these empty stubs altogether. Arguably, this makes porting to new architectures a bit harder, but it seems that the proper stepping stone after Zero is implementing template interpreter anyway. 
> > On my TR 3970X, this improves: > - Linux x86_64 Zero "make images" times from ~18 minutes to ~17.5 minutes > > I would like to have the opinion of @GoeLin who added this for PPC64 porting back in 8u. And probably @DamonFool who is usually interested in Zero. And @jerboaa, @gnu-andrew who deal with Zero from time to time. This pull request has now been integrated. Changeset: a5d8a9c2 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/a5d8a9c2 Stats: 180 lines in 6 files changed: 2 ins; 171 del; 7 mod 8255617: Zero: purge the remaining bytecode interpreter profiling support Reviewed-by: coleenp, sgehwolf, jiefu ------------- PR: https://git.openjdk.java.net/jdk/pull/944 From thartmann at openjdk.java.net Wed Nov 4 07:08:07 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 4 Nov 2020 07:08:07 GMT Subject: RFR: 8255665: C2 should aggressively remove temporary hook nodes [v2] In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 18:41:04 GMT, Vladimir Kozlov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Restored handling of constant nodes > > src/hotspot/share/opto/cfgnode.cpp line 2308: > >> 2306: igvn->_worklist.remove(hook); >> 2307: } >> 2308: hook->destruct(); > > I think we should pass PhaseGVN* into destruct() and do removal from worklist there because it looks like repetitive pattern. Also we can take Compile pointer from PhaseGVN instead of calling Compile::current(): > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/node.cpp#L584 Thanks Vladimir, that's a good suggestion. I've updated the PR accordingly. 
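For readers following along, a toy sketch of the refactoring suggested above: the node's destruct() receives the phase and removes itself from the worklist, so call sites no longer repeat that step. Worklist, Phase and Node here are simplified stand-ins invented for this illustration; they are not the real C2 classes, and the real Node::destruct does considerably more.

```cpp
#include <algorithm>
#include <vector>

struct Node;

// Toy stand-in for the IGVN worklist (not the real C2 data structure).
struct Worklist {
  std::vector<Node*> nodes;
  void push(Node* n) { nodes.push_back(n); }
  void remove(Node* n) {
    nodes.erase(std::remove(nodes.begin(), nodes.end(), n), nodes.end());
  }
  bool contains(const Node* n) const {
    return std::find(nodes.begin(), nodes.end(), n) != nodes.end();
  }
};

// Toy stand-in for the phase object passed to destruct().
// Assumption: a phase without iterative GVN simply has no worklist.
struct Phase {
  Worklist* worklist = nullptr;
};

struct Node {
  bool destructed = false;

  // The worklist removal lives here once, instead of at every call site.
  void destruct(Phase* phase) {
    if (phase->worklist != nullptr) {
      phase->worklist->remove(this);
    }
    destructed = true;
  }
};
```

A call site then shrinks from "remove from worklist, then destruct" to a single hook->destruct(phase) call.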
------------- PR: https://git.openjdk.java.net/jdk/pull/994 From thartmann at openjdk.java.net Wed Nov 4 07:08:06 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 4 Nov 2020 07:08:06 GMT Subject: RFR: 8255665: C2 should aggressively remove temporary hook nodes [v3] In-Reply-To: References: Message-ID: <5wM4jpVaL-BlP7FdbWwQkWgpxZZbUlpTVoFw3SRBUqA=.2419c090-2d90-4ff1-8be9-d9167b0dc1ef@github.com> > C2 often creates temporary "hook" nodes to keep other nodes alive. Although dead, these are sometimes not removed because they don't end up on the IGVN worklist. JDK-8040213 added detection of modified nodes that are not re-processed by IGVN but currently ignores dead nodes. > > This patch includes the following changes: > - Adjust detection of modified nodes such that dead nodes are included as well. This revealed several locations where dead nodes are not eagerly destructed (or not even added to the worklist for later removal). I've fixed all of these. > - No need to yank node inputs before calling `destruct`. > - `kill_dead_code` accidentally re-adds dead nodes to the `_modified_nodes` list. `Compile::remove_modified_node` should be called at the end to avoid this. > - Some removal of dead code. > > Tested with tier1-3, higher tiers are running. > > JDK-8255670 will further improve detection.
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Pass PhaseValues to Node::destruct ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/994/files - new: https://git.openjdk.java.net/jdk/pull/994/files/e8899406..21a73e6d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=994&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=994&range=01-02 Stats: 41 lines in 12 files changed: 3 ins; 20 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/994.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/994/head:pull/994 PR: https://git.openjdk.java.net/jdk/pull/994 From alanb at openjdk.java.net Wed Nov 4 07:47:55 2020 From: alanb at openjdk.java.net (Alan Bateman) Date: Wed, 4 Nov 2020 07:47:55 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) In-Reply-To: References: Message-ID: <-u4eSM47o_e_KlfTRYBNGyLNhjqAeG-84u_uEd3ppH0=.c49b4269-1001-49d1-96fd-ecacbf2417e9@github.com> On Mon, 2 Nov 2020 11:26:51 GMT, Maurizio Cimadamore wrote: >> I looked through the changes in this update. >> >> The shared memory segment support looks sound and the mechanism to close a shared memory segment is clever (albeit a bit surprising at first that it does global handshake to look for a frame in a scoped region. Also surprising that close can cause failure at both ends - it took me a while to see that this is pragmatic approach). >> >> The share method specifies NPE if thread == null but there is no thread parameter, is this a cut 'n paste error? Another one in registerCleaner where it should be NPE if the cleaner is null. >> >> I think the javadoc for the close method needs to be a bit clearer on the state of the memory segment when IllegalStateException is thrown. Will it be marked "not alive" when it fails? Does this mean there is a resource leak? 
I think an apiNote to explain the rationale for why close is not idempotent is also needed, or maybe it should be re-visited so that close is a no-op when the memory segment is not alive. >> >> Now that MemorySegment is AutoCloseable, maybe the term "alive" should be replaced with "open" or "closed" and isAlive replaced with isOpen or isClosed. >> >> FileDescriptor can be an attractive nuisance and forces reference counting everywhere that it is used. Is it needed? Could an isMapped method work instead? >> >> mapFromPath was in the second preview but I think the method name should be re-examined as it maps a file, the path just locates the file. Naming is subjective but in this case using "map" or "mapFile" would fit beside the allocateNative methods. >> >> MappedMemorySegments. The force method specifies a write back guarantee but at the same time, the implNote in the class description suggests that the methods might be a no-op. You might want to adjust the wording to avoid any suggestion that force might be a no-op. >> >> The javadoc for copyFrom isn't changed in this update but I notice it specifies IndexOutOfBoundsException when the source segment is larger than the receiver, have other exceptions been examined? >> >> I don't have any comments on MemoryAccess except that it's not immediately clear why there are "Byte" methods that take a ByteOrder. Makes sense for the multi-byte types of course. >> >> The updates to the java/nio sources look okay but it would be helpful if the really long lines could be chopped down as it's just too hard to do side-by-side reviews when the lines are so long. A minor nit but the changes to X-Buffer.java.template mess up the alignment of the parameters to copyMemory/copySwapMemory methods. > >> The javadoc for copyFrom isn't changed in this update but I notice it specifies IndexOutOfBoundsException when the source segment is larger than the receiver, have other exceptions been examined?
> > This exception is consistent with other uses of this exception throughout this API (e.g. when writing a segment out of bounds). I assume the CSR needs to be updated so that it's in sync with the API changes in the latest round. ------------- PR: https://git.openjdk.java.net/jdk/pull/548 From ngasson at openjdk.java.net Wed Nov 4 08:33:56 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 4 Nov 2020 08:33:56 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 16:58:58 GMT, Anton Kozlov wrote: > JDK-8255716 (#983) uncovered that os::processor_count on Linux can be not equal to number of cores reported in /proc/cpuinfo. The latter historically was used to decide CPU_A53MAC feature. This patch restores feature detection based on /proc/cpuinfo. > > Testing: linux -version > Unfortunately I cannot test windows/aarch64, CC: @lewurm @luhenry src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 186: > 184: > 185: if (_cpu == CPU_ARM && (_model == 0xd07 || _model2 == 0xd07)) _features |= CPU_STXR_PREFETCH; > 186: // If max number of cores (on Linux reported in /proc/cpuinfo) then if _model is an A57 (0xd07) Should be "if max number of cores = 1"? src/hotspot/cpu/aarch64/vm_version_aarch64.hpp line 43: > 41: static int _stepping; > 42: > 43: // Used to decide CPU_A53MAC feature of some single-core CPUs. Note this It's not a feature of single-core CPUs: AFAIK it's a work around for very old arm64 kernels that only reported a single CPU in `/proc/cpuinfo` on multi-core systems where you may have a mix of different CPU types (i.e. mixed A53/A57 where the A57 is reported in cpuinfo). I wonder if we should just remove this workaround altogether? The patch to list all CPUs in `/proc/cpuinfo` was backported to at least the 3.10 series. I really doubt there's anyone running latest OpenJDK on a A53 with such an old kernel. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From ngasson at openjdk.java.net Wed Nov 4 08:37:54 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 4 Nov 2020 08:37:54 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 08:30:41 GMT, Nick Gasson wrote: >> JDK-8255716 (#983) uncovered that os::processor_count on Linux can be not equal to number of cores reported in /proc/cpuinfo. The latter historically was used to decide CPU_A53MAC feature. This patch restores feature detection based on /proc/cpuinfo. >> >> Testing: linux -version >> Unfortunately I cannot test windows/aarch64, CC: @lewurm @luhenry > > src/hotspot/cpu/aarch64/vm_version_aarch64.hpp line 43: > >> 41: static int _stepping; >> 42: >> 43: // Used to decide CPU_A53MAC feature of some single-core CPUs. Note this > > It's not a feature of single-core CPUs: AFAIK it's a work around for very old arm64 kernels that only reported a single CPU in `/proc/cpuinfo` on multi-core systems where you may have a mix of different CPU types (i.e. mixed A53/A57 where the A57 is reported in cpuinfo). > > I wonder if we should just remove this workaround altogether? The patch to list all CPUs in `/proc/cpuinfo` was backported to at least the 3.10 series. I really doubt there's anyone running latest OpenJDK on a A53 with such an old kernel. 
Here is the patch that prints CPU features for all CPUs, backported to 3.10: https://patchwork.kernel.org/project/linux-arm-kernel/patch/20150209083040.217202212@linuxfoundation.org/ ------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From eosterlund at openjdk.java.net Wed Nov 4 09:06:07 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Wed, 4 Nov 2020 09:06:07 GMT Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v4] In-Reply-To: References: Message-ID: > The imasm::remove_activation() call does not deal with safepoints very well. However, when the MethodExit JVMTI event is being called, we call into the runtime in the middle of remove_activation(). If the value being returned is an object type, then the top-of-stack contains the oop. However, the GC does not traverse said oop in any oop map, because it is simply not expected that we safepoint in the middle of remove_activation(). > > The JvmtiExport::post_method_exit() function we end up calling, reads the top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, that eventually call Java and a bunch of stuff that safepoints. So after the JVMTI callback, we can expect the top-of-stack oop to be broken. Unfortunately, when we continue, we therefore end up returning a broken oop. > > Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, is wrong, as we can safepoint on the way back to Java, which will break the return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, moving the transition to VM and back, into a block of code that is protected against GC. Before the JRT_BLOCK is called, we stash away the return oop, and after the JRT_BLOCK_END, we restore the top-of-stack oop.
In the path when InterpreterRuntime::post_method_exit is called when throwing an exception, we don't have the same problem of retaining an oop result, and hence the JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the logic is the same as before for this path. > > This is a JVMTI bug that has probably been around for a long time. It crashes with all GCs, but was discovered recently after concurrent stack processing, as StefanK has been running better GC stressing code in JVMTI, and the bug reproduced more easily with concurrent stack processing, as the timings were a bit different. The following reproducer failed pretty much 100% of the time: > while true; do make test JTREG="RETAIN=all" TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done > > With my fix I can run this repeatedly without any more failures. I have also sanity checked the patch by running tier 1-5, to make sure that it does not introduce any new issues of its own. I have also used Stefan's nice external GC stressing with jcmd technique that was used to trigger crashes with other GCs, to make sure said crashes no longer reproduce either.
Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Serguei CR2: Don't check interpreted only ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/930/files - new: https://git.openjdk.java.net/jdk/pull/930/files/4d68c624..95306514 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=930&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=930&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/930.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/930/head:pull/930 PR: https://git.openjdk.java.net/jdk/pull/930 From eosterlund at openjdk.java.net Wed Nov 4 09:06:07 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Wed, 4 Nov 2020 09:06:07 GMT Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 17:49:38 GMT, Serguei Spitsyn wrote: > Hi Erik, > > I'm not sure, if this fragment is still needed: > > ``` > if (state == NULL || !state->is_interp_only_mode()) { > // for any thread that actually wants method exit, interp_only_mode is set > return; > } > ``` Seems like it is not needed. I removed it. > Also, can it be that this condition is true: > ` (state == NULL || !state->is_interp_only_mode())` > but the top frame is interpreted? > If so, then should we still save/restore the result oop over a possible safepoint? It could definitely be that the top frame is interpreted even though that condition is true. However, we now enter InterpreterRuntime::post_method_exit as a JRT_BLOCK_ENTRY call, which performs no transition (similar to JRT_LEAF). So in that case we should just return to the caller without doing anything, and no GC will happen in this path.
It is only when we perform the JRT_BLOCK and JRT_BLOCK_END that we allow GCs to happen, and we save/restore the result across that section. Thanks, > Thanks, > Serguei ------------- PR: https://git.openjdk.java.net/jdk/pull/930 From kbarrett at openjdk.java.net Wed Nov 4 09:10:27 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Nov 2020 09:10:27 GMT Subject: RFR: 8188055: (ref) Add Reference::refersTo predicate [v7] In-Reply-To: References: Message-ID: > Finally returning to this review that was started in April 2020. I've > recast it as a github PR. I think the security concern raised by Gil > has been adequately answered. > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-April/029203.html > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-July/030401.html > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-August/030677.html > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-September/030793.html > > Please review a new function: java.lang.ref.Reference.refersTo. > > This function is needed to test the referent of a Reference object without > artificially extending the lifetime of the referent object, as may happen > when calling Reference.get. Some garbage collectors require extending the > lifetime of a weak referent when accessed, in order to maintain collector > invariants. Lifetime extension may occur with any collector when the > Reference is a SoftReference, as calling get indicates recent access. This > new function also allows testing the referent of a PhantomReference, which > can't be accessed by calling get. > > The new function uses native methods whose implementations are in the VM so > they can use the Access API. It is the intent that these methods will be > intrinsified by optimizing compilers like C2 or graal, but that hasn't been > implemented yet. Bear that in mind before rushing off to change existing > uses of Reference.get. 
> > There are two native methods involved, one in Reference and an override in > PhantomReference, both package private in java.lang.ref. The reason for this > split is to simplify the intrinsification. This is a change from the version > from April 2020; that version had a single native method in Reference, > implemented using the ON_UNKNOWN_OOP_REF Access reference strength category. > However, adding support for that category in the compilers adds significant > implementation effort and complexity. Splitting avoids that complexity. > > Testing: > mach5 tier1 > Locally (linux-x64) verified the new test passes with various garbage collectors. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Merge branch 'master' into refersto - improve wording in refersTo javadoc - Merge branch 'master' into refersto - More explicit refersTo0 comment. - simplify test - cleanup nits from Mandy - use Object instead of TestObject - improve refersTo0 descriptions - basic functional test - change referent access - ... 
and 3 more: https://git.openjdk.java.net/jdk/compare/f06d7348...79277ff3 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/498/files - new: https://git.openjdk.java.net/jdk/pull/498/files/3a15b6a9..79277ff3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=498&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=498&range=05-06 Stats: 90837 lines in 1555 files changed: 63919 ins; 19502 del; 7416 mod Patch: https://git.openjdk.java.net/jdk/pull/498.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/498/head:pull/498 PR: https://git.openjdk.java.net/jdk/pull/498 From kbarrett at openjdk.java.net Wed Nov 4 09:23:00 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Nov 2020 09:23:00 GMT Subject: Integrated: 8188055: (ref) Add Reference::refersTo predicate In-Reply-To: References: Message-ID: On Sun, 4 Oct 2020 03:59:59 GMT, Kim Barrett wrote: > Finally returning to this review that was started in April 2020. I've > recast it as a github PR. I think the security concern raised by Gil > has been adequately answered. > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-April/029203.html > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-July/030401.html > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-August/030677.html > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-September/030793.html > > Please review a new function: java.lang.ref.Reference.refersTo. > > This function is needed to test the referent of a Reference object without > artificially extending the lifetime of the referent object, as may happen > when calling Reference.get. Some garbage collectors require extending the > lifetime of a weak referent when accessed, in order to maintain collector > invariants. Lifetime extension may occur with any collector when the > Reference is a SoftReference, as calling get indicates recent access. 
This > new function also allows testing the referent of a PhantomReference, which > can't be accessed by calling get. > > The new function uses native methods whose implementations are in the VM so > they can use the Access API. It is the intent that these methods will be > intrinsified by optimizing compilers like C2 or graal, but that hasn't been > implemented yet. Bear that in mind before rushing off to change existing > uses of Reference.get. > > There are two native methods involved, one in Reference and an override in > PhantomReference, both package private in java.lang.ref. The reason for this > split is to simplify the intrinsification. This is a change from the version > from April 2020; that version had a single native method in Reference, > implemented using the ON_UNKNOWN_OOP_REF Access reference strength category. > However, adding support for that category in the compilers adds significant > implementation effort and complexity. Splitting avoids that complexity. > > Testing: > mach5 tier1 > Locally (linux-x64) verified the new test passes with various garbage collectors. This pull request has now been integrated. 
Changeset: 6023f6b1 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/6023f6b1 Stats: 501 lines in 13 files changed: 488 ins; 0 del; 13 mod 8188055: (ref) Add Reference::refersTo predicate Reviewed-by: mchung, pliden, rriggs, dholmes, ihse, smarks, alanb ------------- PR: https://git.openjdk.java.net/jdk/pull/498 From tvaleev at openjdk.java.net Wed Nov 4 09:34:11 2020 From: tvaleev at openjdk.java.net (Tagir F.Valeev) Date: Wed, 4 Nov 2020 09:34:11 GMT Subject: RFR: 8188055: (ref) Add Reference::refersTo predicate [v6] In-Reply-To: References: <0dhF_xxcp1VoUowwdZenB2qWa9ILcZjTMe3lsaRrg7k=.3c633db8-f745-4353-ad34-a64fbc96d4e0@github.com> Message-ID: On Wed, 28 Oct 2020 15:56:48 GMT, Alan Bateman wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> improve wording in refersTo javadoc > > The API looks good, thanks for getting this in. Hello! As an IDE developer, I'm thinking about an IDE inspection that may suggest the new method. My idea is to suggest replacing every `ref.get() == obj` with `ref.refersTo(obj)`. Is this a good idea, or are there cases when `ref.get() == obj` could be preferred? What do you think? ------------- PR: https://git.openjdk.java.net/jdk/pull/498 From kbarrett at openjdk.java.net Wed Nov 4 09:36:02 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Nov 2020 09:36:02 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v3] In-Reply-To: References: <5UvpVT3pWDNSJ1Vh_WIy-E3ZjOS8O-8sZi-9ZRyYYQI=.d5d244cf-5e1b-4790-9370-66ae3a2ea76c@github.com> <0dSlRGfxnjP_6IlH5CEYdF48d3ymvjCKE_s4UZb_WNc=.789216de-6d78-4c00-bdd4-e09f1dec3924@github.com> <1qM8Skbob0uL_KwdoJNDTyavFxOH_VHJc5o6yF881zI=.604bc76e-0536-48a0-91d5-4ba85e32bc11@github.com> Message-ID: On Wed, 4 Nov 2020 07:41:39 GMT, Kim Barrett wrote: >>>Though it might be possible to go even further and eliminate WeakProcessorPhases as a thing separate from OopStorageSet. >> >> This makes sense.
Can we file another RFE for this? I was sort of surprised by how much code was involved so I tried to find a place to stop deleting. > >> Ok, so I'm not sure what to do with this: >> >> enum Phase { >> // Serial phase. >> JVMTI_ONLY(jvmti) >> // Additional implicit phase values follow for oopstorages. >> `};` >> >> I've removed the only thing in this enum. > > Enums without any named enumerators are still meaningful types. More so with scoped enums, but still with unscoped enums. > > Though it might be possible to go even further and eliminate WeakProcessorPhases as a thing separate from OopStorageSet. > > This makes sense. Can we file another RFE for this? I was sort of surprised by how much code was involved so I tried to find a place to stop deleting. I think the deletion stopped at the wrong place; it either went too far, or not far enough. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From kbarrett at openjdk.java.net Wed Nov 4 09:36:02 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Nov 2020 09:36:02 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v3] In-Reply-To: <1qM8Skbob0uL_KwdoJNDTyavFxOH_VHJc5o6yF881zI=.604bc76e-0536-48a0-91d5-4ba85e32bc11@github.com> References: <5UvpVT3pWDNSJ1Vh_WIy-E3ZjOS8O-8sZi-9ZRyYYQI=.d5d244cf-5e1b-4790-9370-66ae3a2ea76c@github.com> <0dSlRGfxnjP_6IlH5CEYdF48d3ymvjCKE_s4UZb_WNc=.789216de-6d78-4c00-bdd4-e09f1dec3924@github.com> <1qM8Skbob0uL_KwdoJNDTyavFxOH_VHJc5o6yF881zI=.604bc76e-0536-48a0-91d5-4ba85e32bc11@github.com> Message-ID: On Tue, 3 Nov 2020 23:38:08 GMT, Coleen Phillimore wrote: >> Ok, so I'm not sure what to do with this: >> >> enum Phase { >> // Serial phase. >> JVMTI_ONLY(jvmti) >> // Additional implicit phase values follow for oopstorages. >> `};` >> >> I've removed the only thing in this enum. > >>Though it might be possible to go even further and eliminate WeakProcessorPhases as a thing separate from OopStorageSet. > > This makes sense. 
Can we file another RFE for this? I was sort of surprised by how much code was involved so I tried to find a place to stop deleting. > Ok, so I'm not sure what to do with this: > > enum Phase { > // Serial phase. > JVMTI_ONLY(jvmti) > // Additional implicit phase values follow for oopstorages. > `};` > > I've removed the only thing in this enum. Enums without any named enumerators are still meaningful types. More so with scoped enums, but still with unscoped enums. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From kbarrett at openjdk.java.net Wed Nov 4 09:36:03 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Nov 2020 09:36:03 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 21:31:35 GMT, Coleen Phillimore wrote: >> src/hotspot/share/prims/jvmtiTagMap.cpp line 2979: >> >>> 2977: >>> 2978: // Concurrent GC needs to call this in relocation pause, so after the objects are moved >>> 2979: // and have their new addresses, the table can be rehashed. >> >> I think the comment is confusing and wrong. The requirement is that the collector must call this before exposing moved objects to the mutator, and must provide the to-space invariant. (This whole design would not work with the old Shenandoah barriers without additional work. I don't know if tagmaps ever worked at all for them? Maybe they added calls to Access<>::resolve (since happily deceased) to deal with that?) I also think there are a bunch of missing calls; piggybacking on the num-dead callback isn't correct (see later comment about that). > > So the design is that when the oops have new addresses, we set a flag in the table to rehash it. Not sure why this is wrong and why wouldn't it work for shenandoah? @zhengyu123 ? When we call WeakHandle.peek()/resolve() after the call, the new/moved oop address should be returned. Why wouldn't this be the case? 
I didn't say it "doesn't work for shenandoah", I said it wouldn't have worked with the old shenandoah barriers without additional work, like adding calls to resolve. I understand the design intent of notifying the table management that its hash codes are out of date. And the num-dead callback isn't the right place, since there are num-dead callback invocations that aren't associated with hash code invalidation. (It's not a correctness wrong, it's a "these things are unrelated and this causes unnecessary work" wrong.) >> src/hotspot/share/prims/jvmtiTagMap.cpp line 3015: >> >>> 3013: if (tag_map != NULL && !tag_map->is_empty()) { >>> 3014: if (num_dead_entries != 0) { >>> 3015: tag_map->hashmap()->unlink_and_post(tag_map->env()); >> >> Why are we doing this in the callback, rather than just setting a flag? I thought part of the point of this change was to get tagmap processing out of GC pauses. The same question applies to the non-safepoint side. The idea was to be lazy about updating the tagmap, waiting until someone actually needed to use it. Or if more prompt ObjectFree notifications are needed then signal some thread (maybe the service thread?) for followup. > > The JVMTI code expects the posting to be done quite eagerly presumably during GC, before it has a chance to disable the event or some other such operation. So the posting is done during the notification because it's as soon as possible. Deferring to the ServiceThread had two problems. 1. the event came later than the caller is expecting it, and in at least one test the event was disabled before posting, and 2. there's a comment in the code why we can't post events with a JavaThread. We'd have to transition into native while holding a no safepoint lock (or else deadlock). The point of making this change was so that the JVMTI table does not need GC code to serially process the table. I know of nothing that leads to "presumably during GC" being a requirement. 
Having all pending events of some type occur before that type of event is disabled seems like a reasonable requirement, but just means that event disabling also requires the table to be "up to date", in the sense that any GC-cleared entries need to be dealt with. That can be handled just like other operations that use the table contents, rather than during the GC. That is, use post_dead_object_on_vm_thread if there are or might be any pending dead objects, before disabling the event. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From kbarrett at openjdk.java.net Wed Nov 4 09:36:04 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Nov 2020 09:36:04 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v5] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 00:08:10 GMT, Coleen Phillimore wrote: >> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. >> >> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. >> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. 
In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. >> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. >> >> It has also been tested with tier1-6. >> >> Thank you to Stefan, Erik and Kim for their help with this change. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Code review comments from Kim and Albert. src/hotspot/share/prims/jvmtiTagMapTable.hpp line 36: > 34: class JvmtiTagMapEntryClosure; > 35: > 36: class JvmtiTagMapEntry : public HashtableEntry { By using utilities/hashtable this buys into having to use HashtableEntry, which includes the _hash member, even though that value is trivially computed from the key (since we're using address-based hashing here). This costs an additional 8 bytes (_LP64) per entry (a 25% increase) compared to the old JvmtiTagHashmapEntry. 
(I think it doesn't currently make a difference on !_LP64 because of poorly chosen layout in the old code, but fixing that would make the difference 33%). It seems like it should not have been hard to replace the oop _object member in the old code with a WeakHandle while otherwise maintaining the Entry interface, allowing much of the rest of the code to remain the same or similar and not incurring this additional space cost. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From kbarrett at openjdk.java.net Wed Nov 4 09:36:04 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Nov 2020 09:36:04 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 07:52:12 GMT, Kim Barrett wrote: >> So the design is that when the oops have new addresses, we set a flag in the table to rehash it. Not sure why this is wrong and why wouldn't it work for shenandoah? @zhengyu123 ? When we call WeakHandle.peek()/resolve() after the call, the new/moved oop address should be returned. Why wouldn't this be the case? > > I didn't say it "doesn't work for shenandoah", I said it wouldn't have worked with the old shenandoah barriers without additional work, like adding calls to resolve. I understand the design intent of notifying the table management that its hash codes are out of date. And the num-dead callback isn't the right place, since there are num-dead callback invocations that aren't associated with hash code invalidation. (It's not a correctness wrong, it's a "these things are unrelated and this causes unnecessary work" wrong.) It used to be that jvmti tagmap processing was all-in-one (in GC weak reference processing), with the weak clearing, dead table entry removal, and rehashing all done in one pass.
This change has split that up, with the weak clearing happening in a different place (still as part of the GC's weak reference processing) than the others (which I claim can now be part of the mutator, whether further separated or not). "Concurrent GC" has nothing to do with whether tagmaps need rehashing. Any copying collector needs to do so. A non-copying collector (whether concurrent or not) would not. (We don't have any of those in HotSpot today.) And weak reference clearing (whether concurrent or not) has nothing to do with whether objects have been moved and so the hashing has been invalidated. There's also a "well known" issue with address-based hashing and generational or similar collectors, where a simple boolean "objects have moved" flag can be problematic, and which tagmaps seem likely to be prone to. The old do_weak_oops tries to mitigate it by recognizing when the object didn't move. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From kbarrett at openjdk.java.net Wed Nov 4 09:38:00 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Nov 2020 09:38:00 GMT Subject: RFR: 8188055: (ref) Add Reference::refersTo predicate [v6] In-Reply-To: References: <0dhF_xxcp1VoUowwdZenB2qWa9ILcZjTMe3lsaRrg7k=.3c633db8-f745-4353-ad34-a64fbc96d4e0@github.com> Message-ID: On Wed, 4 Nov 2020 09:31:13 GMT, Tagir F. Valeev wrote: >> The API looks good, thanks for getting this in. > > Hello! > > As an IDE developer, I'm thinking about IDE inspection that may suggest the new method. My idea is to suggest replacing every `ref.get() == obj` with `ref.refersTo(obj)`. Is this a good idea or there are cases when `ref.get() == obj` could be preferred? What do you think? Thanks to a whole host of folks for reviews and comments. 
------------- PR: https://git.openjdk.java.net/jdk/pull/498 From aph at redhat.com Wed Nov 4 09:54:59 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 4 Nov 2020 09:54:59 +0000 Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: Message-ID: On 04/11/2020 08:33, Nick Gasson wrote: > It's not a feature of single-core CPUs: AFAIK it's a work around for very old arm64 kernels that only reported a single CPU in `/proc/cpuinfo` on multi-core systems where you may have a mix of different CPU types (i.e. mixed A53/A57 where the A57 is reported in cpuinfo). > > I wonder if we should just remove this workaround altogether? The patch to list all CPUs in `/proc/cpuinfo` was backported to at least the 3.10 series. I really doubt there's anyone running latest OpenJDK on a A53 with such an old kernel. Yes, please. And while we're talking about Linux, can we really not get the info we need without parsing /proc/cpuinfo? And do we need to parse the entire file? This is not good for startup. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From kbarrett at openjdk.java.net Wed Nov 4 10:08:00 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Nov 2020 10:08:00 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: <172AiTMoD9T5iKu5xEVQ3AEixTDRx4gaAx0pQFfR57k=.d1d7648c-fb6f-4e87-b515-295c4e6187f7@github.com> On Tue, 3 Nov 2020 21:40:39 GMT, Coleen Phillimore wrote: >> src/hotspot/share/prims/jvmtiTagMap.cpp line 127: >> >>> 125: // The table cleaning, posting and rehashing can race for >>> 126: // concurrent GCs. So fix it here once we have a lock or are >>> 127: // at a safepoint. >> >> I think this comment and the one below about locking are confused, at least about rehashing. I _think_ this is referring to concurrent num-dead notification? 
I've already commented there about it being a problem to do the unlink &etc in the GC pause (see later comment). It also seems like a bad idea to be doing this here and block progress by a concurrent GC because we're holding the tagmap lock for a long time, which is another reason to not have the num-dead notification do very much (and not require a lock that might be held here for a long time). > > The comment is trying to describe the situation like: > 1. mark-end pause (WeakHandle.peek() returns NULL because object A is unmarked) > 2. safepoint for heap walk > 2a. Need to post ObjectFree event for object A before the heap walk doesn't find object A. > 3. gc_notification - would have posted an ObjectFree event for object A if the heapwalk hadn't intervened > > The check_hashmap() function also checks whether the hash table needs to be rehashed before the next operation that uses the hashtable. > > Both operations require the table to be locked. > > The unlink and post needs to be in a GC pause for reasons that I stated above. The unlink and post were done in a GC pause so this isn't worse for any GCs. The lock can be held for concurrent GC while the number of entries are processed and this would be a delay for some applications that have requested a lot of tags, but these applications have asked for this and it's not worse than what we had with GC walking this table in safepoints. For the GCs that call the num_dead notification in a pause it is much worse than what we had. As I pointed out elsewhere, it used to be that tagmap processing was all-in-one, as a single serial subtask taken by the first thread that reached it in WeakProcessor processing. Other threads would find that subtask taken and move on to processing oopstores in parallel with the tagmap processing. 
Now everything except the oopstorage-based clearing of dead entries is a single threaded serial task done by the VMThread, after all the parallel WeakProcessor work is done, because that's where the num-dead callbacks are invoked. WeakProcessor's parallel oopstorage processing doesn't have a way to do the num-dead callbacks by the last thread out of each parallel oopstorage processing. Instead it's left to the end, on the assumption that the callbacks are relatively cheap. But that could still be much worse than the old code, since the tagmap oopstorage could be late in the order of processing, and so still effectively be a serial subtask after all the parallel subtasks are done or mostly done. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From kbarrett at openjdk.java.net Wed Nov 4 10:16:59 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Nov 2020 10:16:59 GMT Subject: RFR: 8188055: (ref) Add Reference::refersTo predicate [v6] In-Reply-To: References: <0dhF_xxcp1VoUowwdZenB2qWa9ILcZjTMe3lsaRrg7k=.3c633db8-f745-4353-ad34-a64fbc96d4e0@github.com> Message-ID: On Wed, 4 Nov 2020 09:31:13 GMT, Tagir F. Valeev wrote: > Hello! > > As an IDE developer, I'm thinking about IDE inspection that may suggest the new method. My idea is to suggest replacing every `ref.get() == obj` with `ref.refersTo(obj)`. Is this a good idea or there are cases when `ref.get() == obj` could be preferred? What do you think? Those have different behaviors when ref's class overrides get. Sometimes that might be intentional (PhantomReference, where get blocks access to the referent, and SoftReference, where get may update heuristics for recent accesses delaying GC clearing). But if some further subclass overrides get for some reason, such a change might not be appropriate.
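
To make the difference concrete, here is a minimal sketch in Java (it assumes JDK 16 or later, where `Reference::refersTo` is available). The class `RefersToDemo`, the subclass `HidingRef`, and the helper names are hypothetical and exist only for illustration. Each case returns a pair: the result of `ref.get() == obj` and the result of `ref.refersTo(obj)`, evaluated while `obj` is still strongly reachable.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Arrays;

// Illustrative sketch only; all names here are hypothetical.
class RefersToDemo {

    // A subclass that overrides get() for its own reasons (here: it hides the referent).
    static final class HidingRef<T> extends WeakReference<T> {
        HidingRef(T referent) { super(referent); }
        @Override public T get() { return null; }
    }

    // Returns { ref.get() == obj, ref.refersTo(obj) }.
    static boolean[] compare(Reference<Object> ref, Object obj) {
        boolean viaGet = ref.get() == obj;        // goes through any get() override
        boolean viaRefersTo = ref.refersTo(obj);  // final method, asks about the referent itself
        Reference.reachabilityFence(obj);         // keep obj strongly reachable until here
        return new boolean[] { viaGet, viaRefersTo };
    }

    static boolean[] weakCase() {
        Object obj = new Object();
        return compare(new WeakReference<Object>(obj), obj);
    }

    static boolean[] phantomCase() {
        Object obj = new Object();
        return compare(new PhantomReference<Object>(obj, new ReferenceQueue<Object>()), obj);
    }

    static boolean[] hidingCase() {
        Object obj = new Object();
        return compare(new HidingRef<Object>(obj), obj);
    }

    public static void main(String[] args) {
        System.out.println("weak:    " + Arrays.toString(weakCase()));
        System.out.println("phantom: " + Arrays.toString(phantomCase()));
        System.out.println("hiding:  " + Arrays.toString(hidingCase()));
    }
}
```

For a plain `WeakReference` both expressions agree. For `PhantomReference`, and for any subclass that overrides `get`, the two can disagree, so an automatic rewrite would change what the code observes.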
------------- PR: https://git.openjdk.java.net/jdk/pull/498 From mdoerr at openjdk.java.net Wed Nov 4 11:10:59 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 4 Nov 2020 11:10:59 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v4] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 03:14:09 GMT, Ziviani wrote: >> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. >> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. >> Ref: PowerISA 3.1, page 129. >> >> These instructions are particularly interesting to improve the following >> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in >> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. >> >> Long.toString, that generate such pattern in getChars, has showed a >> good performance gain by using these new instructions. >> >> Example: >> for (int i = 0; i < 200_000; i++) >> res = Long.toString((long)i); >> >> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString >> >> Without setbc (average): 0.1178 seconds >> With setbc (average): 0.0396 seconds > > Ziviani has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: > > 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions > > - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. > - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. > Ref: PowerISA 3.1, page 129. > > These instructions are particularly interesting to improve the following > pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in > `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. > > Long.toString, that generate such pattern in getChars, has showed a > good performance gain by using these new instructions.
> > Example: > for (int i = 0; i < 200_000; i++) > res = Long.toString((long)i); > > java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString > > Without setbc (average): 0.1178 seconds > With setbc (average): 0.0396 seconds Nice work! Looks correct and much cleaner than before. I only have a few improvement requests left. src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 240: > 238: > 239: // set dst to -1, 0, +1 > 240: inline void MacroAssembler::set_cmp3(Register dst) { Please add assert_different_registers(dst, R0); src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 241: > 239: // set dst to -1, 0, +1 > 240: inline void MacroAssembler::set_cmp3(Register dst) { > 241: // P10, prefer using setbc intructions Please adapt style: 2 leading spaces in C++ code src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 246: > 244: setnbc(dst, CCR0, Assembler::less); > 245: } > 246: else { Please adapt style: } else { src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 255: > 253: > 254: // set dst to -1, 0, +1 > 255: inline void MacroAssembler::set_cmpu3(Register dst) { Shorter possibility with only 1 additional instruction on any Power version: cror(CCR0, Assembler::less, CCR0, Assembler::summary_overflow); // treat overflow like less set_cmp3(dst); src/hotspot/cpu/ppc/ppc.ad line 11424: > 11422: instruct cmpL3_reg_reg(iRegIdst dst, iRegLsrc src1, iRegLsrc src2) %{ > 11423: match(Set dst (CmpL3 src1 src2)); > 11424: ins_cost(DEFAULT_COST * (VM_Version::has_brw() ? 4 : 5)); "size" needs to be precise, but a rough estimate is sufficient for "ins_cost". In this case CmpL3 has only one match rule, so the matcher has no choice and the cost is pointless. So I suggest keeping it simpler and making the cost independent of has_brw. ------------- Changes requested by mdoerr (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/907 From mdoerr at openjdk.java.net Wed Nov 4 11:47:58 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 4 Nov 2020 11:47:58 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v4] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 10:58:39 GMT, Martin Doerr wrote: >> Ziviani has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: >> >> 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions >> >> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. >> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. >> Ref: PowerISA 3.1, page 129. >> >> These instructions are particularly interesting to improve the following >> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in >> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. >> >> Long.toString, that generate such pattern in getChars, has showed a >> good performance gain by using these new instructions.
>> >> Example: >> for (int i = 0; i < 200_000; i++) >> res = Long.toString((long)i); >> >> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString >> >> Without setbc (average): 0.1178 seconds >> With setbc (average): 0.0396 seconds > > src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 255: > >> 253: >> 254: // set dst to -1, 0, +1 >> 255: inline void MacroAssembler::set_cmpu3(Register dst) { > > Shorter possibility with only 1 additional instruction on any Power version: > cror(CCR0, Assembler::less, CCR0, Assembler::summary_overflow); // treat unordered like less > set_cmp3(dst); Or even better, with a parameter: inline void MacroAssembler::set_cmpu3(Register dst, bool treat_unordered_like_less) { if (treat_unordered_like_less) { cror(CCR0, Assembler::less, CCR0, Assembler::summary_overflow); // treat unordered like less } else { cror(CCR0, Assembler::greater, CCR0, Assembler::summary_overflow); // treat unordered like greater } set_cmp3(dst); } This allows more cleanup in the interpreter and C1. (unordered_result is only +1 or -1 in TemplateTable::float_cmp which we can assert.)
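
As a reading aid, the intended effect of the parameterized `set_cmpu3` can be modeled in plain Java. This is only a sketch of the semantics described above, not HotSpot code: the `Cr0` class is a hypothetical stand-in for the four bits of CR0 (with `unordered` modeling the summary_overflow position that a floating-point compare sets for NaN operands), `setbc`/`setnbc` follow the one-line descriptions in the PR text, and combining their results with a bitwise OR is an assumption of this sketch.

```java
// Illustrative model only; none of these names are HotSpot APIs.
class Cmp3Model {

    // Hypothetical stand-in for the four bits of condition register field CR0.
    static final class Cr0 {
        boolean less, greater, equal, unordered; // "unordered" models the summary_overflow bit
    }

    // setbc RT,BI: RT = 1 if the selected CR bit is set, otherwise 0.
    static int setbc(boolean bit) { return bit ? 1 : 0; }

    // setnbc RT,BI: RT = -1 if the selected CR bit is set, otherwise 0.
    static int setnbc(boolean bit) { return bit ? -1 : 0; }

    // Models a floating-point compare: exactly one of the four bits ends up set.
    static Cr0 fcmpu(double a, double b) {
        Cr0 cr = new Cr0();
        if (Double.isNaN(a) || Double.isNaN(b)) cr.unordered = true;
        else if (a < b) cr.less = true;
        else if (a > b) cr.greater = true;
        else cr.equal = true;
        return cr;
    }

    // Branch-free three-way result: -1, 0, or +1 (the OR step is an assumption).
    static int setCmp3(Cr0 cr) {
        return setbc(cr.greater) | setnbc(cr.less);
    }

    // Folds the unordered bit into less or greater first, mirroring the cror step.
    static int setCmpu3(Cr0 cr, boolean treatUnorderedLikeLess) {
        if (treatUnorderedLikeLess) {
            cr.less |= cr.unordered;     // treat unordered like less
        } else {
            cr.greater |= cr.unordered;  // treat unordered like greater
        }
        return setCmp3(cr);
    }

    public static void main(String[] args) {
        System.out.println(setCmpu3(fcmpu(1.0, 2.0), true));
        System.out.println(setCmpu3(fcmpu(2.0, 1.0), true));
        System.out.println(setCmpu3(fcmpu(1.0, 1.0), true));
        System.out.println(setCmpu3(fcmpu(Double.NaN, 1.0), true));
        System.out.println(setCmpu3(fcmpu(Double.NaN, 1.0), false));
    }
}
```

In this model the boolean parameter only matters for NaN operands, which is consistent with the remark that unordered_result is only +1 or -1.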
------------- PR: https://git.openjdk.java.net/jdk/pull/907 From redestad at openjdk.java.net Wed Nov 4 12:02:05 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 4 Nov 2020 12:02:05 GMT Subject: RFR: 8255894: Remove unused StubRoutines::_zero_aligned_words Message-ID: _zero_aligned_words was a SPARC-only optimization added by JDK-7059037 ------------- Commit messages: - Remove unused StubRoutines::_zero_aligned_words Changes: https://git.openjdk.java.net/jdk/pull/1053/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1053&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255894 Stats: 7 lines in 2 files changed: 0 ins; 7 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1053.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1053/head:pull/1053 PR: https://git.openjdk.java.net/jdk/pull/1053 From glaubitz at openjdk.java.net Wed Nov 4 12:12:58 2020 From: glaubitz at openjdk.java.net (John Paul Adrian Glaubitz) Date: Wed, 4 Nov 2020 12:12:58 GMT Subject: RFR: 8255894: Remove unused StubRoutines::_zero_aligned_words In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 11:51:20 GMT, Claes Redestad wrote: > _zero_aligned_words was a SPARC-only optimization added by JDK-7059037 FWIW, we are still building Zero on SPARC in Debian. So, if there is an extra alignment that's needed on SPARC, I'd appreciate if it could stay in in case it's required for SPARC. ------------- PR: https://git.openjdk.java.net/jdk/pull/1053 From glaubitz at openjdk.java.net Wed Nov 4 12:16:56 2020 From: glaubitz at openjdk.java.net (John Paul Adrian Glaubitz) Date: Wed, 4 Nov 2020 12:16:56 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 23:05:58 GMT, Bernhard Urban-Forster wrote: >> JDK-8255716 (#983) uncovered that os::processor_count on Linux can be not equal to number of cores reported in /proc/cpuinfo. 
The latter historically was used to decide CPU_A53MAC feature. This patch restores feature detection based on /proc/cpuinfo. >> >> Testing: linux -version >> Unfortunately I cannot test windows/aarch64, CC: @lewurm @luhenry > > Tested slowdebug build on Windows+Arm64 with this patch, and smoke tested it with `jtreg:tier1_compiler_1` successfully. > > Change itself looks good to me too (but I'm not a reviewer). Does this account for the fact that CPU cores can be offline? Normally, the number of available cores is determined with `sysconf()` in a portable manner and `sysconf()` differentiates between _configured_ and _online_ processors: - _SC_NPROCESSORS_CONF The number of processors configured. See also get_nprocs_conf(3). - _SC_NPROCESSORS_ONLN The number of processors currently online (available). See also get_nprocs_conf(3). The number of online processors can be lower than the number of configured processors. I remember fixing an issue regarding this feature in PulseAudio as the testsuite broke on SPARC on Linux, see: https://gitlab.freedesktop.org/pulseaudio/pulseaudio/-/commit/1df21e6ab6cd42e2f7601a6c5577c20b7e3d1046 ------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From coleenp at openjdk.java.net Wed Nov 4 12:21:12 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 4 Nov 2020 12:21:12 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v5] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 05:37:00 GMT, Serguei Spitsyn wrote: >> Hi Coleen, >> >> Wow, there are a lot of simplifications and code removal with this fix! >> It looks great in general, just some nits below. >> I also wanted to suggest renaming the 'set_needs_processing' to 'set_needs_rehashing'. :) >> >> src/hotspot/share/prims/jvmtiTagMap.hpp: >> >> Nit: Would it be better to use a plural form 'post_dead_objects_on_vm_thread'?
: >> `+ void post_dead_object_on_vm_thread();` >> >> src/hotspot/share/prims/jvmtiTagMap.cpp: >> >> Nit: It'd be nice to add a short comment before the check_hashmap similar to L143 also explaining a difference (does check and post just for one env) with the check_hashmaps_for_heapwalk: >> 122 void JvmtiTagMap::check_hashmap(bool post_events) { >> . . . >> 143 // This checks for posting and rehashing and is called from the heap walks. >> 144 void JvmtiTagMap::check_hashmaps_for_heapwalk() { >> >> I'm just curious how this fragment was added. Did you get any failures in testing? : >> 1038 // skip if object is a dormant shared object whose mirror hasn't been loaded >> 1039 if (obj != NULL && obj->klass()->java_mirror() == NULL) { >> 1040 log_debug(cds, heap)("skipped dormant archived object " INTPTR_FORMAT " (%s)", p2i(obj), >> 1041 obj->klass()->external_name()); >> 1042 return; >> 1043 } >> >> Nit: Can we rename this field to something like '_some_dead_found' or '_dead_found'? : >> `1186 bool _some_dead;` >> >> Nit: The lines 2997-3007 and 3009-3019 do the same but in different contexts. 
>> 2996 if (!is_vm_thread) { >> 2997 if (num_dead_entries != 0) { >> 2998 JvmtiEnvIterator it; >> 2999 for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) { >> 3000 JvmtiTagMap* tag_map = env->tag_map_acquire(); >> 3001 if (tag_map != NULL) { >> 3002 // Lock each hashmap from concurrent posting and cleaning >> 3003 tag_map->unlink_and_post_locked(); >> 3004 } >> 3005 } >> 3006 // there's another callback for needs_rehashing >> 3007 } >> 3008 } else { >> 3009 assert(SafepointSynchronize::is_at_safepoint(), "must be"); >> 3010 JvmtiEnvIterator it; >> 3011 for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) { >> 3012 JvmtiTagMap* tag_map = env->tag_map_acquire(); >> 3013 if (tag_map != NULL && !tag_map->is_empty()) { >> 3014 if (num_dead_entries != 0) { >> 3015 tag_map->hashmap()->unlink_and_post(tag_map->env()); >> 3016 } >> 3017 // Later GC code will relocate the oops, so defer rehashing until then. >> 3018 tag_map->_needs_rehashing = true; >> 3019 } >> It feels like it can be refactored/simplified, at least, a little bit. >> Is it possible to check and just return if (num_dead_entries == 0)? >> If not, then, at least, it can be done the same way (except of locking). >> Q: Should the _needs_rehashing be set in both contexts? >> >> Also, can we have just one (static?) lock for the whole gc_notification (not per JVMTI env/JvmtiTagMap)? How much do we win by locking per each env/JvmtiTagMap? Note, that in normal case there is just one agent. It is very rare to have multiple agents requesting object tagging and ObjectFree events. It seems, this can be refactored to more simple code with one function doing work in both contexts. 
>> >> src/hotspot/share/utilities/hashtable.cpp: >> >> Nit: Need space after the '{' : >> `+const int _small_table_sizes[] = {107, 1009, 2017, 4049, 5051, 10103, 20201, 40423 } ;` >> >> src/hotspot/share/prims/jvmtiTagMapTable.cpp: >> >> Nit: Extra space after assert: >> `119 assert (find(index, hash, obj) == NULL, "shouldn't already be present");` >> >> Thanks, >> Serguei > > More about possible refactoring of the JvmtiTagMap::gc_notification(). > I'm thinking about something like below: > > void JvmtiTagMap::unlink_and_post_for_all_envs() { > if (num_dead_entries == 0) { > return; // nothing to unlink and post > } > JvmtiEnvIterator it; > for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) { > JvmtiTagMap* tag_map = env->tag_map_acquire(); > if (tag_map != NULL && !tag_map->is_empty()) { > tag_map->unlink_and_post(); > } > } > } > > void JvmtiTagMap::gc_notification(size_t num_dead_entries) { > if (Thread::current()->is_VM_thread()) { > assert(SafepointSynchronize::is_at_safepoint(), "must be"); > unlink_and_post_for_all_envs(); > set_needs_rehashing(); > } else { > MutexLocker ml(JvmtiTagMap_lock(), Mutex::_no_safepoint_check_flag); > unlink_and_post_for_all_envs(); > // there's another callback for needs_rehashing > } > } > > If we still need a lock per each JvmtiTagMap then it is possible to add this fragment to the unlink_and_post_for_all_envs: > bool is_vm_thread = Thread::current()->is_VM_thread() > MutexLocker ml(is_vm_thread ? NULL : lock(), Mutex::_no_safepoint_check_flag); > > Then the code above could look like below: > > void JvmtiTagMap::unlink_and_post_for_all_envs() { > if (num_dead_entries == 0) { > return; // nothing to unlink and post > } > bool is_vm_thread = Thread::current()->is_VM_thread() > JvmtiEnvIterator it; > for (JvmtiEnv* env = it.first(); env != NULL; env = it.next(env)) { > JvmtiTagMap* tag_map = env->tag_map_acquire(); > if (tag_map != NULL && !tag_map->is_empty()) { > MutexLocker ml(is_vm_thread ? 
NULL : lock(), Mutex::_no_safepoint_check_flag); > tag_map->unlink_and_post(); > } > } > } > > void JvmtiTagMap::gc_notification(size_t num_dead_entries) { > if (Thread::current()->is_VM_thread()) { > assert(SafepointSynchronize::is_at_safepoint(), "must be"); > set_needs_rehashing(); > } > unlink_and_post_for_all_envs(); > } @sspitsyn Thank you for reviewing the code. The gc_notification refactoring is awkward because each table needs a lock and not a global lock. If we find another place in the GCs to call set_needs_rehashing() it might be possible to make gc_notification call two functions with a boolean to decide whether to take the lock. We're still working on @kimbarrett comments so maybe the notification will change to some new thread and be refactored that way if necessary. I fixed the code for your other comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Wed Nov 4 12:21:12 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 4 Nov 2020 12:21:12 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v6] In-Reply-To: References: Message-ID: > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. 
In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. > > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: - Add back WeakProcessorPhases::Phase enum. - Serguei 1. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/967/files - new: https://git.openjdk.java.net/jdk/pull/967/files/f66ea839..7d3fdf68 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=04-05 Stats: 18 lines in 5 files changed: 7 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/967/head:pull/967 PR: https://git.openjdk.java.net/jdk/pull/967 From shade at openjdk.java.net Wed Nov 4 12:23:04 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 4 Nov 2020 12:23:04 GMT Subject: RFR: 8255894: Remove unused StubRoutines::_zero_aligned_words In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 12:09:56 GMT, John Paul Adrian Glaubitz wrote: > FWIW, we are still building Zero on SPARC in Debian. So, if there is an extra alignment that's needed on SPARC, I'd appreciate if it could stay in in case it's required for SPARC. This is not about Zero VM. This stub is supposed to be _zeroing the memory_, and it is not used at all, AFAICS. So there is no breakage for Zero VM on SPARC. ------------- PR: https://git.openjdk.java.net/jdk/pull/1053 From glaubitz at openjdk.java.net Wed Nov 4 12:30:55 2020 From: glaubitz at openjdk.java.net (John Paul Adrian Glaubitz) Date: Wed, 4 Nov 2020 12:30:55 GMT Subject: RFR: 8255894: Remove unused StubRoutines::_zero_aligned_words In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 12:18:36 GMT, Aleksey Shipilev wrote: > This is not about Zero VM. This stub is supposed to be _zeroing the memory_, and it is not used at all, AFAICS. So there is no breakage for Zero VM on SPARC. OK, thanks. I just wanted to be sure since I remember there were parts in Hotspot that were used both for Zero and the native versions of Hotspot. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1053 From redestad at openjdk.java.net Wed Nov 4 12:30:55 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 4 Nov 2020 12:30:55 GMT Subject: RFR: 8255894: Remove unused StubRoutines::_zero_aligned_words In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 12:18:36 GMT, Aleksey Shipilev wrote: > FWIW, we are still building Zero on SPARC in Debian. So, if there is an extra alignment that's needed on SPARC, I'd appreciate if it could stay in in case it's required for SPARC. This routine was used as part of an optimization of [pd_fill_to_aligned_words](https://github.com/openjdk/jdk/commit/644620568827ddd5f5a4dc130d615ed6fd915c2d#diff-fe8c06a22855cd5f58f908f73109212d77cca26ec0c422d90496ebeedefe770d) on SPARC. I'm not sure how this method looks in your source tree now since all the flags that control this has been dropped from the mainline. ------------- PR: https://git.openjdk.java.net/jdk/pull/1053 From rkennke at openjdk.java.net Wed Nov 4 12:38:05 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 4 Nov 2020 12:38:05 GMT Subject: RFR: 8255886: Shenandoah: Avoid register clash when calling LRB-runtime from interpreter Message-ID: JDK-8255762 caused test failure on Windows because of overlapping argument registers in the LRB runtime call. The problem is more general, though, but hasn't manifested anywhere else. 
Testing: hotspot_gc_shenandoah (linux: x86_64, x86_32, windows: x86_64) ------------- Commit messages: - 8255886: Shenandoah: Avoid register clash when calling LRB-runtime from interpreter Changes: https://git.openjdk.java.net/jdk/pull/1054/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1054&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255886 Stats: 18 lines in 1 file changed: 10 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/1054.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1054/head:pull/1054 PR: https://git.openjdk.java.net/jdk/pull/1054 From shade at openjdk.java.net Wed Nov 4 12:39:59 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 4 Nov 2020 12:39:59 GMT Subject: RFR: 8255894: Remove unused StubRoutines::_zero_aligned_words In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 12:28:25 GMT, Claes Redestad wrote: >>> FWIW, we are still building Zero on SPARC in Debian. So, if there is an extra alignment that's needed on SPARC, I'd appreciate if it could stay in in case it's required for SPARC. >> >> This is not about Zero VM. This stub is supposed to be _zeroing the memory_, and it is not used at all, AFAICS. So there is no breakage for Zero VM on SPARC. > >> FWIW, we are still building Zero on SPARC in Debian. So, if there is an extra alignment that's needed on SPARC, I'd appreciate if it could stay in in case it's required for SPARC. > > This routine was used as part of an optimization of [pd_fill_to_aligned_words](https://github.com/openjdk/jdk/commit/644620568827ddd5f5a4dc130d615ed6fd915c2d#diff-fe8c06a22855cd5f58f908f73109212d77cca26ec0c422d90496ebeedefe770d) on SPARC. I'm not sure how this method looks in your source tree now since all the flags that control this has been dropped from the mainline. After looking at `copy_zero.hpp`, I believe Zero uses its own `pd_fill_to_aligned_words` and `pd_fill_to_words`. 
AFAIU, Zero tries to avoid the arch-specific code as much as possible, and that also extends to not using the "usual" stubs that require arch-specific generators. ------------- PR: https://git.openjdk.java.net/jdk/pull/1053 From shade at openjdk.java.net Wed Nov 4 12:39:55 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 4 Nov 2020 12:39:55 GMT Subject: RFR: 8255894: Remove unused StubRoutines::_zero_aligned_words In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 11:51:20 GMT, Claes Redestad wrote: > _zero_aligned_words was a SPARC-only optimization added by JDK-7059037 Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1053 From redestad at openjdk.java.net Wed Nov 4 12:44:59 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 4 Nov 2020 12:44:59 GMT Subject: RFR: 8255894: Remove unused StubRoutines::_zero_aligned_words In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 12:37:16 GMT, Aleksey Shipilev wrote: >> _zero_aligned_words was a SPARC-only optimization added by JDK-7059037 > > Marked as reviewed by shade (Reviewer). @shipilev thanks for verifying that (and for reviewing)! ------------- PR: https://git.openjdk.java.net/jdk/pull/1053 From stuefe at openjdk.java.net Wed Nov 4 14:40:56 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 4 Nov 2020 14:40:56 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v2] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 05:48:14 GMT, David Holmes wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unnecessary call-once guard from PosixSignals::install_signal_handlers() > > src/hotspot/os/posix/signals_posix.cpp line 443: > >> 441: extern "C" JNIEXPORT int >> 442: #if defined(BSD) >> 443: JVM_handle_bsd_signal > > Can we define this using token pasting e.g. 
> > PASTE_TOKENS(JVM_handle, PASTE_TOKENS(INCLUDE_SUFFIX_OS, _signal)) > > ? I find that actually less readable than what we have now. Plus, right now it's easier to grep for the function name. ------------- PR: https://git.openjdk.java.net/jdk/pull/1034 From mdoerr at openjdk.java.net Wed Nov 4 14:54:07 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 4 Nov 2020 14:54:07 GMT Subject: RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v10] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 03:01:09 GMT, CoreyAshford wrote: >> This patch set encompasses the following commits: >> >> - Adds a new HotSpot intrinsic candidate to the java.util.Base64 class - decodeBlock(), and provides a flexible API for the intrinsic. The API is similar to the existing encodeBlock intrinsic. >> - Adds the code in HotSpot to check and marshal the new intrinsic's arguments to the arch-specific intrinsic implementation >> - Adds a Power64LE-specific implementation of the decodeBlock intrinsic. >> - Adds a JMH microbenchmark for both Base64 encoding and decoding. >> - Enhances the JTReg hotspot intrinsic "TestBase64.java" regression test to more fully test both decoding and encoding. > > CoreyAshford has updated the pull request incrementally with one additional commit since the last revision: > > stubGenerator_ppc.cpp: fix trailing whitespace errors Thanks for removing the branch from the loop. (Maybe this affects the unrolling decision.) Looks good. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/293 From rkennke at openjdk.java.net Wed Nov 4 15:03:10 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 4 Nov 2020 15:03:10 GMT Subject: RFR: 8255886: Shenandoah: Avoid register clash when calling LRB-runtime from interpreter [v2] In-Reply-To: References: Message-ID: > JDK-8255762 caused test failure on Windows because of overlapping argument registers in the LRB runtime call.
The problem is more general, though, but hasn't manifested anywhere else. > > Testing: hotspot_gc_shenandoah (linux: x86_64, x86_32, windows: x86_64) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Generate regular call_VM_leaf for non-weak LRB ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1054/files - new: https://git.openjdk.java.net/jdk/pull/1054/files/7418c712..339dde59 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1054&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1054&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1054.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1054/head:pull/1054 PR: https://git.openjdk.java.net/jdk/pull/1054 From rriggs at openjdk.java.net Wed Nov 4 15:05:58 2020 From: rriggs at openjdk.java.net (Roger Riggs) Date: Wed, 4 Nov 2020 15:05:58 GMT Subject: RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v10] In-Reply-To: References: Message-ID: <87_YermU8Z_HS5_a1cJDOjVKsPrjQZkdrPFlNoLcqwI=.a70cdec0-6ea0-468f-853b-a7a9079a9c7c@github.com> On Tue, 3 Nov 2020 03:01:09 GMT, CoreyAshford wrote: >> This patch set encompasses the following commits: >> >> - Adds a new HotSpot intrinsic candidate to the java.util.Base64 class - decodeBlock(), and provides a flexible API for the intrinsic. The API is similar to the existing encodeBlock intrinsic. >> - Adds the code in HotSpot to check and marshal the new intrinsic's arguments to the arch-specific intrinsic implementation >> - Adds a Power64LE-specific implementation of the decodeBlock intrinsic. >> - Adds a JMH microbenchmark for both Base64 encoding and decoding. >> - Enhances the JTReg hotspot intrinsic "TestBase64.java" regression test to more fully test both decoding and encoding.
> > CoreyAshford has updated the pull request incrementally with one additional commit since the last revision: > > stubGenerator_ppc.cpp: fix trailing whitespace errors Marked as reviewed by rriggs (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/293 From rkennke at openjdk.java.net Wed Nov 4 15:17:06 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 4 Nov 2020 15:17:06 GMT Subject: RFR: 8255886: Shenandoah: Avoid register clash when calling LRB-runtime from interpreter [v3] In-Reply-To: References: Message-ID: <72TjNYQ1Sc9N-3mA0bxvN7fdTMN5JApffKyr2-EZw2E=.1cf707a3-b951-4840-8166-b5bed966da6b@github.com> > JDK-8255762 caused test failure on Windows because of overlapping argument registers in the LRB runtime call. The problem is more general, though, but hasn't manifested anywhere else. > > Testing: hotspot_gc_shenandoah (linux: x86_64, x86_32, windows: x86_64) Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8255886 - Generate regular call_VM_leaf for non-weak LRB - 8255886: Shenandoah: Avoid register clash when calling LRB-runtime from interpreter ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1054/files - new: https://git.openjdk.java.net/jdk/pull/1054/files/339dde59..504e56ae Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1054&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1054&range=01-02 Stats: 10 lines in 3 files changed: 1 ins; 0 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/1054.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1054/head:pull/1054 PR: https://git.openjdk.java.net/jdk/pull/1054 From stuefe at openjdk.java.net Wed Nov 4 15:22:11 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 4 Nov 2020 15:22:11 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v3] In-Reply-To: References: Message-ID: > Hi all, > > may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers. > > Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues. > > This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code. > > This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324. > > ---- > > This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (beside the documented one of using libjsig). It seems that the JVM_handle_xxx_functions() are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. 
> A cursory github search revealed that this flag is used quite a bit.
>
> See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history.
>
> Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can. I plan to add a proper regression test with a later change, but for now I don't have the time for that.
>
> ---
>
> The fixed issues:
>
> 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed.
>
> Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early.
> But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed.
>
> 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls).
>
> 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported.
>
> 4) Every platform handler has this section:
>
>     JavaThread* thread = NULL;
>     VMThread* vmthread = NULL;
>     if (PosixSignals::are_signal_handlers_installed()) {
>       if (t != NULL ){
>         if(t->is_Java_thread()) {
>           thread = t->as_Java_thread();
>         }
>         else if(t->is_VM_thread()){
>           vmthread = (VMThread *)t;
>         }
>       }
>     }
>
> `vmthread` is unused on all platforms and can be removed.
>
> 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM):
>
>     if (sig == SIGPIPE || sig == SIGXFSZ) {
>       // allow chained handler to go first
>       if (PosixSignals::chained_handler(sig, info, ucVoid)) {
>         return true;
>       } else {
>         // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219
>         return true;
>       }
>     }
>
> - On s390 and ppc, we miss SIGXFSZ handling
>   _Update: Fixed separately for easier backport, see [JDK-8255734](https://bugs.openjdk.java.net/browse/JDK-8255734)_.
> - both paths return true - section can be shortened
>
> Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler.
>
> 6) At the end of every platform handler, before calling into fatal error handling, we unblock the signal:
>
>> // unmask current signal
>> sigset_t newset;
>> sigemptyset(&newset);
>> sigaddset(&newset, sig);
>> sigprocmask(SIG_UNBLOCK, &newset, NULL);
>>
>
> - Use of `sigprocmask()` is UB in a multithreaded program.
> - but then, this section is unnecessary anyway, since [JDK-8252533](https://bugs.openjdk.java.net/browse/JDK-8252533) we unmask error signals at the start of the signal handler.
>
> 7) the JFR crash protection is not consistently checked in all platform handlers.
>
> 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism).
>
> 9) on Linux ppc64 and AIX, we have this section:
>
>> if (sig == SIGILL && (pc < (address) 0x200)) {
>>   goto report_and_die;
>> }
>
> which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide).
>
> This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway.
>
> 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`.
>
> On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice.
>
> ----
>
> The changes in this patch:
>
> a) hotspot signal handling is now done by the following functions:
>
>
>         |                       |
>         v                       v
>  javaSignalHandler     JVM_handle_linux_signal()
>         |               /
>         v              v
>      javaSignalHandler_inner
>         |
>         v
>  PosixSignals::pd_hotspot_signal_handler()
>
>
> The right branch only exists to support the `AllowUserSignalHandlers` case, see [4].
>
> `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before.
> `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions.
>
>
> b) I commonized prologue- and epilogue coding.
> - I simplified (4) to a single line in the shared handler
> - I moved the JFR thread crash protection (7) up to the shared handler
> - I moved the complete epilogue up to the shared handler.
>   That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6)
> - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept.
> - I simplified (5) and commonized it, and removed (9) completely
> - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out.
>
> Thanks for reviewing.
>
> Testing: this patch ran through our nightlies, in an earlier form. They will be re-run some more times.
>
> I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change.
>
> Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned. See also Gerard's work on [5], which is under review too.
>
> ----
>
> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html
> [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html
> [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html
> [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html
> [5] https://bugs.openjdk.java.net/browse/JDK-8253742

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  Feedback David

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1034/files
  - new: https://git.openjdk.java.net/jdk/pull/1034/files/a548111f..3a6a8095

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1034&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1034&range=01-02

  Stats: 44 lines in 2 files changed: 25 ins; 9 del; 10 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1034.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk
pull/1034/head:pull/1034 PR: https://git.openjdk.java.net/jdk/pull/1034 From shade at openjdk.java.net Wed Nov 4 16:41:01 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 4 Nov 2020 16:41:01 GMT Subject: RFR: 8255822: Zero: improve build-time JVMTI handling Message-ID: <7-Y8_X8c04i5dRiPOWnSV-Wqige6jXxb1WPW1pIO7CE=.f66482a9-f624-4897-a68f-1e131a6abf8f@github.com> The current Zero interpreter has an optimization for JVMTI support. It recognizes that JVMTI is disabled most of the time, and that JVMTI checks in the interpreter code slow it down considerably. (I measured it myself for this patch; it gives about a 20% hit in build times.) The current optimization works as follows. At build time, an XSLT transform is performed on `bytecodeInterpreter.cpp`, yielding `bytecodeInterpreterWithChecks.cpp`. In that new compilation unit, the `VM_JVMTI` macro is defined, and a new entry point -- `BytecodeInterpreter::withChecks` -- is defined. Then, both compilation units are compiled. In one of them, `JVMTI` hooks are stripped out. In the other, they persist. Then, callers have to choose which entry point to use. I believe this can be rewritten to use C++ templates instead of the XSLT-and-defines dance. This also allows cleaning up the JVMTI checks a bit.
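As a rough illustration of the template idea above (a minimal sketch, not the code from the pull request; `run`, `interpret`, `jvmti_hook` and `g_hook_calls` are hypothetical names made up for this example), a boolean template parameter can stand in for the `VM_JVMTI` define, so that one source body yields both a checked and an unchecked entry point:

```cpp
// Hypothetical stand-in for a JVMTI hook; it only counts how often it fires.
static int g_hook_calls = 0;
static void jvmti_hook() { ++g_hook_calls; }

// One interpreter body, instantiated twice. The JVMTI branch depends on a
// compile-time constant, so the <false> instantiation never calls the hook
// and the compiler is free to drop the check entirely.
template <bool JVMTI_ENABLED>
int run(const int* bytecodes, int n) {
  int acc = 0;
  for (int i = 0; i < n; i++) {
    if (JVMTI_ENABLED) {
      jvmti_hook();
    }
    acc += bytecodes[i];  // placeholder for real bytecode dispatch
  }
  return acc;
}

// The caller chooses the entry point at run time, once per invocation.
int interpret(const int* bytecodes, int n, bool jvmti_active) {
  return jvmti_active ? run<true>(bytecodes, n) : run<false>(bytecodes, n);
}
```

Both instantiations come from the same source text, which is what the XSLT step achieved by generating a second compilation unit.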
Additional testing: - [x] Linux x86_64 Zero fastdebug build with `-jvmti` - [x] Linux x86_64 Zero fastdebug/release build times are not regressing ------------- Commit messages: - Revert one dubious change - 8255822: Zero: improve build-time JVMTI handling Changes: https://git.openjdk.java.net/jdk/pull/1061/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1061&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255822 Stats: 155 lines in 6 files changed: 3 ins; 118 del; 34 mod Patch: https://git.openjdk.java.net/jdk/pull/1061.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1061/head:pull/1061 PR: https://git.openjdk.java.net/jdk/pull/1061 From adinn at openjdk.java.net Wed Nov 4 16:48:54 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Wed, 4 Nov 2020 16:48:54 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: Message-ID: <055zgcELaav78pPYT17FjL4QbKy1OXK3gEeUkTkkv5k=.21c42be4-e78f-4ee9-9c71-40893fd1d9d5@github.com> On Wed, 4 Nov 2020 12:13:55 GMT, John Paul Adrian Glaubitz wrote: >> Tested slowdebug build on Windows+Arm64 with this patch, and smoke tested it with `jtreg:tier1_compiler_1` successfully. >> >> Change itself looks good to me too (but I'm not a reviewer). > > Does this account for the fact that CPU cores can be offline? > > Normally, the number of available cores is determined with ```sysconf()``` in a portable manner and ```sysconf()``` differentiates between _configured_ and _online_ processors: > > - _SC_NPROCESSORS_CONF > The number of processors configured. See also get_nprocs_conf(3). > > - _SC_NPROCESSORS_ONLN > The number of processors currently online (available). See also get_nprocs_conf(3). > > The number of online processors can be lower than the number of configured processors.
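The two `sysconf()` queries described above can be sketched as follows (a minimal example, assuming a Unix-like system such as Linux where both `_SC_NPROCESSORS_*` names are available; `CpuCounts` and `query_cpu_counts` are hypothetical names for illustration, not HotSpot code):

```cpp
#include <unistd.h>

// Holds both processor counts so a caller can compare them.
struct CpuCounts {
  long configured;  // processors known to the system
  long online;      // processors currently available for scheduling
};

// Query both values; sysconf() returns -1 if a name is not supported.
CpuCounts query_cpu_counts() {
  CpuCounts c;
  c.configured = sysconf(_SC_NPROCESSORS_CONF);
  c.online = sysconf(_SC_NPROCESSORS_ONLN);
  return c;
}
```

Code that sizes per-CPU data from the online count alone may come up short when offline cores are brought online later, which is the kind of mismatch the question above is asking about.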
> > I remember fixing an issue regarding this feature in PulseAudio as the testsuite broke on SPARC on Linux, see: https://gitlab.freedesktop.org/pulseaudio/pulseaudio/-/commit/1df21e6ab6cd42e2f7601a6c5577c20b7e3d1046 > And while we're talking about Linux, can we really not get the info we need without parsing /proc/cpuinfo? And do we need to parse the entire file? At present, reading /proc/cpuinfo is the only reliable way I know of to identify whether dcpop is a supported feature (used to force persistence of data to memory). That is needed to support use of NVRAM-backed MappedByteBuffers. ------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From adinn at openjdk.java.net Wed Nov 4 16:51:56 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Wed, 4 Nov 2020 16:51:56 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: <055zgcELaav78pPYT17FjL4QbKy1OXK3gEeUkTkkv5k=.21c42be4-e78f-4ee9-9c71-40893fd1d9d5@github.com> References: <055zgcELaav78pPYT17FjL4QbKy1OXK3gEeUkTkkv5k=.21c42be4-e78f-4ee9-9c71-40893fd1d9d5@github.com> Message-ID: On Wed, 4 Nov 2020 16:46:36 GMT, Andrew Dinn wrote: >> Does this account for the fact that CPU cores can be offline? >> >> Normally, the number of available cores is determined with ```sysconf()``` in a portable manner and ```sysconf()``` differentiates between _configured_ and _online_ processors: >> >> - _SC_NPROCESSORS_CONF >> The number of processors configured. See also get_nprocs_conf(3). >> >> - _SC_NPROCESSORS_ONLN >> The number of processors currently online (available). See also get_nprocs_conf(3). >> >> The number of online processors can be lower than the number of configured processors.
>> >> I remember fixing an issue regarding this feature in PulseAudio as the testsuite broke on SPARC on Linux, see: https://gitlab.freedesktop.org/pulseaudio/pulseaudio/-/commit/1df21e6ab6cd42e2f7601a6c5577c20b7e3d1046 > >> And while we're talking about Linux, can we really not get the info we need > without parsing /proc/cpuinfo? And do we need to parse the entire file? > > At present, reading /proc/cpuinfo is the only reliable way I know of to identify whether dcpop is a supported feature (used to force persistence of data to memory). That is needed to support use of NVRAM-backed MappedByteBuffers. @theRealAph BTW, I think Nick is right that this patch is not needed. Are you ok to reject it? ------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From rkennke at openjdk.java.net Wed Nov 4 17:23:04 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 4 Nov 2020 17:23:04 GMT Subject: RFR: 8255886: Shenandoah: Avoid register clash when calling LRB-runtime from interpreter [v4] In-Reply-To: References: Message-ID: > JDK-8255762 caused test failure on Windows because of overlapping argument registers in the LRB runtime call. The problem is more general, though, but hasn't manifested anywhere else.
> > Testing: hotspot_gc_shenandoah (linux: x86_64, x86_32, windows: x86_64) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Use 2-register form of cset-check, MacOSX doesn't allocate cset-table in location for 32bit-addressing ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1054/files - new: https://git.openjdk.java.net/jdk/pull/1054/files/504e56ae..e83cae28 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1054&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1054&range=02-03 Stats: 11 lines in 1 file changed: 7 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1054.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1054/head:pull/1054 PR: https://git.openjdk.java.net/jdk/pull/1054 From mchung at openjdk.java.net Wed Nov 4 17:40:00 2020 From: mchung at openjdk.java.net (Mandy Chung) Date: Wed, 4 Nov 2020 17:40:00 GMT Subject: RFR: 8188055: (ref) Add Reference::refersTo predicate [v6] In-Reply-To: References: <0dhF_xxcp1VoUowwdZenB2qWa9ILcZjTMe3lsaRrg7k=.3c633db8-f745-4353-ad34-a64fbc96d4e0@github.com> Message-ID: <95NVU6qFbtLLeI2DfNZBHdpbkrXlc0jWTbHpebLpHj0=.31a36bc2-e770-453c-940f-ca21ff79e4c2@github.com> On Wed, 4 Nov 2020 10:13:59 GMT, Kim Barrett wrote: >> Hello! >> >> As an IDE developer, I'm thinking about IDE inspection that may suggest the new method. My idea is to suggest replacing every `ref.get() == obj` with `ref.refersTo(obj)`. Is this a good idea or there are cases when `ref.get() == obj` could be preferred? What do you think? > >> Hello! >> >> As an IDE developer, I'm thinking about IDE inspection that may suggest the new method. My idea is to suggest replacing every `ref.get() == obj` with `ref.refersTo(obj)`. Is this a good idea or there are cases when `ref.get() == obj` could be preferred? What do you think? > > Those have different behaviors when ref's class overrides get. 
Sometimes that might be intentional (PhantomReference, where get blocks access to the referent, and SoftReference, where get may update heuristics for recent accesses delaying GC clearing). But if some further subclass overrides get for some reason, such a change might not be appropriate. Checking whether a reference has been cleared, i.e. `ref.get() == null` or `ref.get() != null`, may benefit from the IDE giving a hint. ------------- PR: https://git.openjdk.java.net/jdk/pull/498 From shade at openjdk.java.net Wed Nov 4 17:45:57 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 4 Nov 2020 17:45:57 GMT Subject: RFR: 8255886: Shenandoah: Avoid register clash when calling LRB-runtime from interpreter [v4] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 17:23:04 GMT, Roman Kennke wrote: >> JDK-8255762 caused test failure on Windows because of overlapping argument registers in the LRB runtime call. The problem is more general, though, but hasn't manifested anywhere else. >> >> Testing: hotspot_gc_shenandoah (linux: x86_64, x86_32, windows: x86_64) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Use 2-register form of cset-check, MacOSX doesn't allocate cset-table in location for 32bit-addressing Minor nits src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp line 297: > 295: if (kind == ShenandoahBarrierSet::AccessKind::NORMAL) { > 296: // Test for object in cset > 297: // Allocate tmp-reg. "Allocate temporary registers" now.
src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp line 312: > 310: __ push(tmp2); > 311: assert_different_registers(tmp1, src.base(), src.index()); > 312: assert_different_registers(tmp1, dst); Let's do: assert(tmp1 != noreg, "tmp1 allocated"); assert(tmp2 != noreg, "tmp2 allocated"); assert_different_registers(tmp1, tmp2, src.base(), src.index()); assert_different_registers(tmp1, tmp2, dst); __ push(tmp1); __ push(tmp2); ------------- Changes requested by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1054 From rkennke at openjdk.java.net Wed Nov 4 18:30:08 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 4 Nov 2020 18:30:08 GMT Subject: RFR: 8255886: Shenandoah: Avoid register clash when calling LRB-runtime from interpreter [v5] In-Reply-To: References: Message-ID: > JDK-8255762 caused test failure on Windows because of overlapping argument registers in the LRB runtime call. The problem is more general, though, but hasn't manifested anywhere else. 
> > Testing: hotspot_gc_shenandoah (linux: x86_64, x86_32, windows: x86_64) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Some touch-ups ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1054/files - new: https://git.openjdk.java.net/jdk/pull/1054/files/e83cae28..607b98f7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1054&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1054&range=03-04 Stats: 8 lines in 1 file changed: 5 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1054.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1054/head:pull/1054 PR: https://git.openjdk.java.net/jdk/pull/1054 From shade at openjdk.java.net Wed Nov 4 18:32:59 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 4 Nov 2020 18:32:59 GMT Subject: RFR: 8255886: Shenandoah: Avoid register clash when calling LRB-runtime from interpreter [v5] In-Reply-To: References: Message-ID: <7q8itHvAjJ4U8j8KwY-xeP-TP2HKx-k1FOXetEs1gBg=.e1c3b029-6ab9-44a3-a52e-d257c7127059@github.com> On Wed, 4 Nov 2020 18:30:08 GMT, Roman Kennke wrote: >> JDK-8255762 caused test failure on Windows because of overlapping argument registers in the LRB runtime call. The problem is more general, though, but hasn't manifested anywhere else. >> >> Testing: hotspot_gc_shenandoah (linux: x86_64, x86_32, windows: x86_64) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Some touch-ups Looks good. Please rename the bug and PR to capture the cset check changes. Suggestion: "Shenandoah: Resolve cset address truncation and register clash in interpreter LRB" ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1054 From kvn at openjdk.java.net Wed Nov 4 19:03:58 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 4 Nov 2020 19:03:58 GMT Subject: RFR: 8255665: C2 should aggressively remove temporary hook nodes [v3] In-Reply-To: <5wM4jpVaL-BlP7FdbWwQkWgpxZZbUlpTVoFw3SRBUqA=.2419c090-2d90-4ff1-8be9-d9167b0dc1ef@github.com> References: <5wM4jpVaL-BlP7FdbWwQkWgpxZZbUlpTVoFw3SRBUqA=.2419c090-2d90-4ff1-8be9-d9167b0dc1ef@github.com> Message-ID: On Wed, 4 Nov 2020 07:08:06 GMT, Tobias Hartmann wrote: >> C2 often creates temporary "hook" nodes to keep other nodes alive. Although dead, these are sometimes not removed because they don't end up on the IGVN worklist. JDK-8040213 added detection of modified nodes that are not re-processed by IGVN but currently ignores dead nodes. >> >> This patch includes the following changes: >> - Adjust detection of modified nodes such that dead nodes are included as well. This revealed several locations where dead nodes are not eagerly destructed (or not even added to the worklist for later removal). I've fixed all of these. >> - No need to yank node inputs before calling `destruct`. >> - `kill_dead_code` accidentally re-adds dead nodes to the `_modified_nodes` list. `Compile::remove_modified_node` should be called at the end to avoid this. >> - Some removal of dead code. >> >> Tested with tier1-3, higher tiers are running. >> >> JDK-8255670 will further improve detection. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Pass PhaseValues to Node::destruct Good. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/994 From github.com+51754783+coreyashford at openjdk.java.net Wed Nov 4 20:25:58 2020 From: github.com+51754783+coreyashford at openjdk.java.net (CoreyAshford) Date: Wed, 4 Nov 2020 20:25:58 GMT Subject: RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v10] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 14:51:06 GMT, Martin Doerr wrote: > Thanks for removing the branch from the loop. (Maybe this affects unrolling decision.) Looks good. Yeah, it does, and oddly enough the best loop unroll value is now 1. I will re-run the benchmarks again to confirm, but that's what it's looking like now. ------------- PR: https://git.openjdk.java.net/jdk/pull/293 From rkennke at openjdk.java.net Wed Nov 4 21:33:56 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 4 Nov 2020 21:33:56 GMT Subject: Integrated: 8255886: Shenandoah: Resolve cset address truncation and register clash in interpreter LRB In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 12:28:23 GMT, Roman Kennke wrote: > JDK-8255762 caused test failure on Windows because of overlapping argument registers in the LRB runtime call. The problem is more general, though, but hasn't manifested anywhere else. > > Testing: hotspot_gc_shenandoah (linux: x86_64, x86_32, windows: x86_64) This pull request has now been integrated. 
Changeset: 29db1dcd Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/29db1dcd Stats: 36 lines in 1 file changed: 21 ins; 1 del; 14 mod 8255886: Shenandoah: Resolve cset address truncation and register clash in interpreter LRB Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1054 From sspitsyn at openjdk.java.net Wed Nov 4 22:12:06 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Wed, 4 Nov 2020 22:12:06 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v6] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 12:21:12 GMT, Coleen Phillimore wrote: >> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. >> >> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. >> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. 
>> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. >> >> It has also been tested with tier1-6. >> >> Thank you to Stefan, Erik and Kim for their help with this change. > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - Add back WeakProcessorPhases::Phase enum. > - Serguei 1. Thank you for the update, Coleen! I leave it for you to decide to refactor the gc_notification or not. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/967 From sspitsyn at openjdk.java.net Wed Nov 4 22:15:58 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Wed, 4 Nov 2020 22:15:58 GMT Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v4] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 09:06:07 GMT, Erik Österlund wrote: >> The imasm::remove_activation() call does not deal with safepoints very well.
However, when the MethodExit JVMTI event is being called, we call into the runtime in the middle of remove_activation(). If the value being returned is an object type, then the top-of-stack contains the oop. However, the GC does not traverse said oop in any oop map, because it is simply not expected that we safepoint in the middle of remove_activation(). >> >> The JvmtiExport::post_method_exit() function we end up calling, reads the top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, that eventually call Java and a bunch of stuff that safepoints. So after the JVMTI callback, we can expect the top-of-stack oop to be broken. Unfortunately, when we continue, we therefore end up returning a broken oop. >> >> Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, is wrong, as we can safepoint on the way back to Java, which will break the return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, moving the transition to VM and back, into a block of code that is protected against GC. Before the JRT_BLOCK is called, we stash away the return oop, and after the JRT_BLOCK_END, we restore the top-of-stack oop. In the path when InterpreterRuntime::post_method_exit is called when throwing an exception, we don't have the same problem of retaining an oop result, and hence the JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the logic is the same as before for this path. >> >> This is a JVMTI bug that has probably been around for a long time. It crashes with all GCs, but was discovered recently after concurrent stack processing, as StefanK has been running better GC stressing code in JVMTI, and the bug reproduced more easily with concurrent stack processing, as the timings were a bit different. 
The following reproducer failed pretty much 100% of the time: >> while true; do make test JTREG="RETAIN=all" TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done >> >> With my fix I can run this repeatedly without any more failures. I have also sanity checked the patch by running tier 1-5, so that it does not introduce any new issues on its own. I have also used Stefan's nice external GC stressing with jcmd technique that was used to trigger crashes with other GCs, to make sure said crashes no longer reproduce either. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Serguei CR2: Don't check interpreted only Erik, Thank you for the explanation! I agree with you, so the fix is good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/930 From david.holmes at oracle.com Wed Nov 4 22:40:09 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 5 Nov 2020 08:40:09 +1000 Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers In-Reply-To: References: <7888a9d2-a3b9-9981-665f-1cb3181d199d@oracle.com> Message-ID: <9eb11108-18f4-dc91-206d-dccd3b581e26@oracle.com> On 4/11/2020 4:39 pm, Thomas Stüfe wrote: > > 6) At the end of every platform header, before calling into > fatal error handling, we unblock the signal: > > > >> // unmask current signal > >> sigset_t newset; > >> sigemptyset(&newset); > >> sigaddset(&newset, sig); > >> sigprocmask(SIG_UNBLOCK, &newset, NULL); > >> > > > > - Use of `sigprocmask()` is UB in a multithreaded program.
> > - but then, this section is unnecessary anyway, since > [https://bugs.openjdk.java.net/browse/JDK-8252533](JDK-8252533) > > we unmask error signals at the start of the signal handler. > > But is this guaranteed to be one of the error signals? What if the > application calls our handler for some other signal? I guess > that is > their problem. > > > Good point, but: > - we only need to unblock error signals. There is no reasonable need > to unblock other signals in fatal error handling (to the contrary, > the rest should be kept blocked to not interfere with hs-err printing) > - we only install handlers for: SIGSEGV, SIGPIPE, SIGBUS, SIGILL, > SIGFPE, SIGTRAP, SIGXFSZ. Of those, we don't unblock SIGPIPE > and SIGXFSZ. But those are handled at the entrance of > javaSignalHandler_inner, so we should never enter fatal > error handling because of those. > But you raise an interesting point, I am not sure whether SIGXFSZ > can be deferred. What happens if we write a big core file and that > hits the file limit, but SIGXFSZ is blocked from delivery? Will we > just get terminated? Well, maybe we should get terminated. But this > is a question for another RFE. > > > Missed part of your point here. > > Calling this function from the outside is only allowed for a couple of > signals, see comment: > > // This routine may recognize any of the following kinds of signals: > // SIGBUS, SIGSEGV, SIGILL, SIGFPE, SIGQUIT, SIGPIPE, SIGXFSZ, SIGUSR1. > // It should be consulted by handlers for any of those signals. > > Note that this list is not really correct, as it includes SIGUSR1 > and SIGQUIT. None of which are handled by the hotspot signal handler. As > you wrote before, this mechanism is only to handle signals the hotspot > commandeers. SIGUSR1 is probably a leftover from when we did use SIGUSR1 for something.
SIGQUIT is interesting because if the app is in charge of signals then it will install a SIGQUIT handler and we would not want the "VM" to run its normal SIGQUIT handler. I have a big question mark over how the use of the signal handler thread can/should interact with AllowUserSignalHandlers. > So I think JVM_handle_xx_signal() should test for the list of allowed > signals, and just return false right away in case sig is none of the > hotspot signals. I don't think we should change existing behaviour here. Thanks, David > From david.holmes at oracle.com Wed Nov 4 22:41:54 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 5 Nov 2020 08:41:54 +1000 Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v2] In-Reply-To: References: Message-ID: <6e3ab6d5-07e0-8c22-fcae-dd95fbc6f5e4@oracle.com> On 5/11/2020 12:40 am, Thomas Stuefe wrote: > On Wed, 4 Nov 2020 05:48:14 GMT, David Holmes wrote: > >>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >>> >>> Remove unnecessary call-once guard from PosixSignals::install_signal_handlers() >> >> src/hotspot/os/posix/signals_posix.cpp line 443: >> >>> 441: extern "C" JNIEXPORT int >>> 442: #if defined(BSD) >>> 443: JVM_handle_bsd_signal >> >> Can we define this using token pasting e.g. >> >> PASTE_TOKENS(JVM_handle, PASTE_TOKENS(INCLUDE_SUFFIX_OS, _signal)) >> >> ? > > I find that actually less readable than what we have now. Plus, right now it's easier to grep for the function name. True ... I just hate the cascading ifdefs for this kind of thing. I actually expected to see JVM_handle_xxx_signal still defined in the platform specific os_xxx.cpp file.
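For readers unfamiliar with the token-pasting idea floated above, here is a minimal C++ sketch. The `DEMO_*` macros, the suffix value, and the function body are hypothetical stand-ins; HotSpot's own `PASTE_TOKENS` and `INCLUDE_SUFFIX_OS` are not reproduced here. It only illustrates how a per-OS function name could be assembled by the preprocessor, and why the resulting name then no longer appears literally in the source for grep to find.

```cpp
#include <cassert>

// Two-level paste: the outer macro lets its arguments expand first,
// the inner one performs the actual ## concatenation.
#define DEMO_PASTE_AUX(a, b) a ## b
#define DEMO_PASTE(a, b) DEMO_PASTE_AUX(a, b)

// Hypothetical per-OS suffix; a real build would pick this per platform.
#define DEMO_OS_SUFFIX _linux

// Expands to JVM_handle_linux_signal while DEMO_OS_SUFFIX is _linux.
#define DEMO_HANDLER_NAME \
  DEMO_PASTE(DEMO_PASTE(JVM_handle, DEMO_OS_SUFFIX), _signal)

// Illustrative body only: reports whether the signal number is positive.
extern "C" int DEMO_HANDLER_NAME(int sig) {
  return sig > 0 ? 1 : 0;
}

// Calls the function through its pasted, fully spelled-out name.
inline void demo_check() {
  assert(JVM_handle_linux_signal(11) == 1);
  assert(JVM_handle_linux_signal(0) == 0);
}
```

The definition site spells only `DEMO_HANDLER_NAME`, so a search for `JVM_handle_linux_signal` finds the callers but not the definition, which is the readability trade-off raised in the reply.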
David ----- > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/1034 > From github.com+670087+jrziviani at openjdk.java.net Wed Nov 4 22:42:07 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Wed, 4 Nov 2020 22:42:07 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v5] In-Reply-To: References: Message-ID: > - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. > - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. > Ref: PowerISA 3.1, page 129. > > These instructions are particularly interesting to improve the following > pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in > `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. > > Long.toString, that generate such pattern in getChars, has showed a > good performance gain by using these new instructions. > > Example: > for (int i = 0; i < 200_000; i++) > res = Long.toString((long)i); > > java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString > > Without setbc (average): 0.1178 seconds > With setbc (average): 0.0396 seconds Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. Ref: PowerISA 3.1, page 129. These instructions are particularly interesting to improve the following pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. Long.toString, that generate such pattern in getChars, has showed a good performance gain by using these new instructions.
Example: for (int i = 0; i < 200_000; i++) res = Long.toString((long)i); java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString Without setbc (average): 0.1178 seconds With setbc (average): 0.0396 seconds ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/907/files - new: https://git.openjdk.java.net/jdk/pull/907/files/0af02057..08e58fff Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=907&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=907&range=03-04 Stats: 56 lines in 5 files changed: 5 ins; 19 del; 32 mod Patch: https://git.openjdk.java.net/jdk/pull/907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/907/head:pull/907 PR: https://git.openjdk.java.net/jdk/pull/907 From shade at openjdk.java.net Wed Nov 4 22:47:55 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 4 Nov 2020 22:47:55 GMT Subject: RFR: 8255782: Turn UseTLAB and ResizeTLAB from product_pd to product, defaulting to "true" In-Reply-To: <1OzPeIS9fm-ju9MIajtY8pz_rf0NVtKiDeiTd29_zmc=.5800f906-fb0f-4080-b41f-0e39864fd2ae@github.com> References: <1OzPeIS9fm-ju9MIajtY8pz_rf0NVtKiDeiTd29_zmc=.5800f906-fb0f-4080-b41f-0e39864fd2ae@github.com> Message-ID: On Tue, 3 Nov 2020 09:27:11 GMT, Stefan Karlsson wrote: >> When doing Zero VM performance investigations, I realized that `UseTLAB` is disabled there by default. >> >> That is effectively because `UseTLAB` is currently `product_pd` (defined in `gc_globals.hpp`), and it is enabled for every platform with C1 and C2 ports (in their respective `c1/c2_globals.hpp`). `compiler_globals.hpp` has a block that defines `UseTLAB` to `false` when no C1/C2/JVMCI is present. >> >> Not only this is awkward -- GC flag is managed by Compiler globals! -- it makes Zero awkward to opt-in to `UseTLAB` in `*_zero_globals.hpp`, because `compiler_globals.hpp` already defines it. 
I think we can make this all better by turning TLAB flags from `product_pd` to `product`, and defaulting them to `true`. This matches what every current Server/Minimal VM config has, and would implicitly enable `UseTLAB` and `ResizeTLAB` for Zero, as well as for builds with `--with-jvm-features=-compiler1,-compiler-2,-jvmci` (in case anyone actually builds it, that is only with template interpreter). >> >> On the downside, this shuts the door for new platform ports to disable TLAB flags by default to ease porting. But I believe the same can be achieved by turning these flags off in `arguments.cpp` under the special platform defines, while TLAB enablement work is in progress. >> >> Additional testing: >> - [x] Linux x86_64 Zero ad-hoc runs >> - [x] Linux x86_64 `--with-jvm-features=-compiler1,-compiler-2,-jvmci,*` builds > > Sounds good to me. > > (As usual with shared HotSpot code, remember to leave this open for a while to allow people in other timezones time to see it.) Friendly reminder if anyone else wants to chime in. ------------- PR: https://git.openjdk.java.net/jdk/pull/1019 From github.com+51754783+coreyashford at openjdk.java.net Wed Nov 4 22:54:13 2020 From: github.com+51754783+coreyashford at openjdk.java.net (CoreyAshford) Date: Wed, 4 Nov 2020 22:54:13 GMT Subject: RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v11] In-Reply-To: References: Message-ID: > This patch set encompasses the following commits: > > - Adds a new HotSpot intrinsic candidate to the java.util.Base64 class - decodeBlock(), and provides a flexible API for the intrinsic. The API is similar to the existing encodeBlock intrinsic. > - Adds the code in HotSpot to check and marshal the new intrinsic's arguments to the arch-specific intrinsic implementation > - Adds a Power64LE-specific implementation of the decodeBlock intrinsic. > - Adds a JMH microbenchmark for both Base64 decoding and encoding.
> - Enhances the JTReg hotspot intrinsic "TestBase64.java" regression test to more fully test both decoding and encoding. CoreyAshford has updated the pull request incrementally with one additional commit since the last revision: stubGenerator_ppc.cpp: reduce loop_unrolls to 1 to match new benchmark results. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/293/files - new: https://git.openjdk.java.net/jdk/pull/293/files/8292527e..c4d22da3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=293&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=293&range=09-10 Stats: 12 lines in 1 file changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/293.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/293/head:pull/293 PR: https://git.openjdk.java.net/jdk/pull/293 From vladimir.kozlov at oracle.com Wed Nov 4 22:53:47 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 4 Nov 2020 14:53:47 -0800 Subject: RFR: 8255616: Disable AOT and Graal in Oracle OpenJDK In-Reply-To: References: Message-ID: Hi, Volker and Monica On 11/3/20 2:51 AM, Volker Simonis wrote: > Hi Vladimir, > > this is an interesting step and I wonder how it affects the OpenJDK > Graal, Metropolis and Leyden projects? > > - Project Graal [1] seems to have already been merged into project > Metropolis as it states on its project page: > "Further work on integrating Graal in the OpenJDK has moved to Project > Metropolis." This page is outdated and currently incorrect. The main development of Graal is done in GraalVM. OpenJDK and Metropolis are downstream. > > - Project Metropolis [2] has the following mission statement on its > project page: > "The goal of this Project is to provide a venue to explore and > incubate advanced "Java-on-Java" implementation techniques for > HotSpot. 
Our starting point is earlier proposals for using the Graal > compiler and AOT static compilation technology to replace the HotSpot > server compiler, and possibly other components of HotSpot." > > It seems that this goal becomes void when Graal AOT and Graal JIT are > abandoned in the OpenJDK. No, this goal is still valid. We still think "Java-on-Java" is the right direction for some components of HotSpot. We learned a lot and made some progress with Graal as JIT in Metropolis. And we have got very good expertise from AOT work which will help us with Project Leyden. We will return to work on Metropolis later on. But right now, we think the work on C2 improvement is more important to keep Java vibrant and competitive. I will let Mark to talk about Project Leyden. Best regards, Vladimir Kozlov > > - Project Leyden [?]: @Mark: what's actually the state of Project > Leyden? We had a discussion [3], a vote [4] and the approval of the > project [5] yet nothing has happened ever since. There's neither a > project page nor a mailing list. > > Considering the fact that Leyden was supposed to "be based upon > existing components in the JDK such as the HotSpot JVM, the `jaotc` > ahead-of-time compiler, application class-data sharing, and the > `jlink` linking tool" I wonder if Leyden is already dead before its > instantiation if "jaotc", one of its core components, has now been > deprecated? Or are there any plans to enhance C2 for AOT scenarios? > > Thank you and best regards, > Volker > > [1] http://openjdk.java.net/projects/graal/ > [2] http://openjdk.java.net/projects/metropolis/ > [3] https://mail.openjdk.java.net/pipermail/discuss/2020-April/005429.html > [4] https://mail.openjdk.java.net/pipermail/discuss/2020-May/005475.html > [5] https://mail.openjdk.java.net/pipermail/announce/2020-June/000290.html > > On Fri, Oct 30, 2020 at 6:47 PM Vladimir Kozlov wrote: >> >> We shipped Ahead-of-Time compilation (the jaotc tool) in JDK 9, as an experimental feature.
We shipped Graal as an experimental JIT compiler in JDK 10. We haven't seen much use of these features, and the effort required to support and enhance them is significant. We therefore intend to disable these features in Oracle builds as of JDK 16. >> >> We'll leave the sources for these features in the repository, in case any one else is interested in building them. But we will not update or test them. >> >> We'll continue to build and ship JVMCI as an experimental feature in Oracle builds. >> >> Tested changes in all tiers. >> >> I verified that with these changes I still able to build Graal in open repo and run graalunit testing: >> >> `open$ bash test/hotspot/jtreg/compiler/graalunit/downloadLibs.sh /mydir/graalunit_lib/` >> `open$ bash configure --with-debug-level=fastdebug --with-graalunit-lib=/mydir/graalunit_lib/ --with-jtreg=/mydir/jtreg` >> `open$ make jdk-image` >> `open$ make test-image` >> `open$ make run-test TEST=compiler/graalunit/HotspotTest.java` >> >> ------------- >> >> Commit messages: >> - 8255616: Disable AOT and Graal in Oracle OpenJDK >> >> Changes: https://git.openjdk.java.net/jdk/pull/960/files >> Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=960&range=00 >> Issue: https://bugs.openjdk.java.net/browse/JDK-8255616 >> Stats: 36 lines in 4 files changed: 21 ins; 11 del; 4 mod >> Patch: https://git.openjdk.java.net/jdk/pull/960.diff >> Fetch: git fetch https://git.openjdk.java.net/jdk pull/960/head:pull/960 >> >> PR: https://git.openjdk.java.net/jdk/pull/960 From david.holmes at oracle.com Wed Nov 4 22:55:06 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 5 Nov 2020 08:55:06 +1000 Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers In-Reply-To: References: <7888a9d2-a3b9-9981-665f-1cb3181d199d@oracle.com> Message-ID: On 4/11/2020 4:34 pm, Thomas St?fe wrote: >> On the documentation front we could at least explain what the flag >> really does. 
Rather than saying: >> >> product(bool, AllowUserSignalHandlers, false, >> \ >> "Do not complain if the application installs signal >> handlers >> " >> >> we could say something like: >> >> product(bool, AllowUserSignalHandlers, false, >> \ >> "Allow the application to install the primary signal >> handlers >> instead of the JVM." \ >> >> and we (I?) could update the java manpage. > > I can update the text description, but I would like to defer any > additional work for this switch to some other RFE. Sure, the manpage update can be another RFE. > > 10) When invoking the fatal error handler, we extract the pc from > the context and hand it over as "faulting pc". For SIGILL and > SIGFPE, this is not totally correct. According to POSIX [3], for > those signals the address of the faulting instruction is handed over > in `si_info.si_addr`. > > > > On most platforms this does not matter, they are the same. But on > some architectures the pc in the signal context actually points > somewhere else, e.g. beyond the faulting instruction. Therefore > `si_info.si_addr` is the better choice. > > The Posix spec also states "For some implementations, the value of > si_addr may be inaccurate." - so I'm not at all sure which "pc" we > should be trusting here? I thought the ucontext was the detailed > platform specific "context" object that we should extract information > from. Which architectures give different values in the two and is there > some documentation stating what happens for any given os/cpu? > > > I overlooked this. Well, this is a helpful standard :) > > I saw this happen on s390 and on pa-risc. Before this patch, I did > correct this in the platform handler. Out of caution I could #ifdef this > section to s390. I prefer not to see any change in behaviour for platforms where no problem has been observed.
Also see what seems to be related comment in ./cpu/arm/vm_version_arm_32.cpp // JVM_handle_linux_signal moves PC here if SIGILL happens > > ---- > > > > The changes in this patch: > > > > a) hotspot signal handling is now done by the following functions: > > > > > > | | > > v v > > javaSignalHandler JVM_handle_linux_signal() > > | / > > v v > > javaSignalHandler_inner > > Not clear why we need the _inner version. Why can't we just have > javaSignalHandler which is installed as the handler and which is called > by JVM_handle_XXX_signal? > > > Because JVM_handle_XXX_signal has one more argument than the standard > signal handler (abort_if_unrecognized). Okay so why introduce this shape instead of keeping the existing form: javaSignalHandler -> JVM_handle_xxx_signal(..., true) -> javaSignalHandler_inner ? With the new arrangement the equivalence between javaSignalHandler and JVM_handle_xxx_signal can only be seen by inspecting the code of both. Thanks, David From github.com+670087+jrziviani at openjdk.java.net Wed Nov 4 23:04:10 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Wed, 4 Nov 2020 23:04:10 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v6] In-Reply-To: References: Message-ID: > - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. > - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. > Ref: PowerISA 3.1, page 129. > > These instructions are particularly interesting to improve the following > pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in > `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. > > Long.toString, that generate such pattern in getChars, has showed a > good performance gain by using these new instructions.
> > Example: > for (int i = 0; i < 200_000; i++) > res = Long.toString((long)i); > > java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString > > Without setbc (average): 0.1178 seconds > With setbc (average): 0.0396 seconds Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. Ref: PowerISA 3.1, page 129. These instructions are particularly interesting to improve the following pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. Long.toString, that generate such pattern in getChars, has showed a good performance gain by using these new instructions.
Example: for (int i = 0; i < 200_000; i++) res = Long.toString((long)i); java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString Without setbc (average): 0.1178 seconds With setbc (average): 0.0396 seconds ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/907/files - new: https://git.openjdk.java.net/jdk/pull/907/files/08e58fff..4e092e13 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=907&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=907&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/907/head:pull/907 PR: https://git.openjdk.java.net/jdk/pull/907 From github.com+670087+jrziviani at openjdk.java.net Wed Nov 4 23:04:12 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Wed, 4 Nov 2020 23:04:12 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v4] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 11:08:05 GMT, Martin Doerr wrote: >> Ziviani has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. > > Nice work! Looks correct and much cleaner than before. I only have a few improvement requests left. Hallo, @TheRealMDoerr ! A new version is [here](https://github.com/openjdk/jdk/commit/4e092e13be4c0013c27c9ae1055891d49d93d270). 
P9: Test Results: ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR jtreg:test/hotspot/jtreg:tier1 1454 1454 0 0 jtreg:test/jdk:tier1 1948 1948 0 0 jtreg:test/langtools:tier1 4092 4092 0 0 jtreg:test/jaxp:tier1 0 0 0 0 jtreg:test/lib-test:tier1 0 0 0 0 ============================== TEST SUCCESS P10: ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR jtreg:test/hotspot/jtreg:tier1 1456 1456 0 0 jtreg:test/jdk:tier1 1951 1951 0 0 jtreg:test/langtools:tier1 4093 4093 0 0 jtreg:test/jaxp:tier1 0 0 0 0 jtreg:test/lib-test:tier1 0 0 0 0 ============================== TEST SUCCESS ------------- PR: https://git.openjdk.java.net/jdk/pull/907 From mark.reinhold at oracle.com Wed Nov 4 23:06:07 2020 From: mark.reinhold at oracle.com (mark.reinhold at oracle.com) Date: Wed, 04 Nov 2020 15:06:07 -0800 Subject: RFR: 8255616: Disable AOT and Graal in Oracle OpenJDK In-Reply-To: References: Message-ID: <20201104150607.735653032@eggemoggin.niobe.net> 2020/11/4 14:53:47 -0800, vladimir.kozlov at oracle.com: > On 11/3/20 2:51 AM, Volker Simonis wrote: >> this is an interesting step and I wonder how it affects the OpenJDK >> Graal, Metropolis and Leyden projects? >> > ... > > I will let Mark to talk about Project Leyden. > > Best regards, > Vladimir Kozlov > > ... >> >> - Project Leyden [?]: @Mark: what's actually the state of Project >> Leyden? We had a discussion [3], a vote [4] and the approval of the >> project [5] yet nothing has happened ever since. There's neither a >> project page nor a mailing list. Unfortunately, due to other priorities I haven't had the time to get this project started properly. I hope to be able to do that soon.
>> Considering the fact that Leyden was supposed to "be based upon >> existing components in the JDK such as the HotSpot JVM, the `jaotc` >> ahead-of-time compiler, application class-data sharing, and the >> `jlink` linking tool" I wonder if Leyden is already dead before its >> instantiation if "jaotc", one of its core components, has now been >> deprecated? Or are there any plans to enhance C2 for AOT scenarios? We are considering the possibility of using C2 for ahead-of-time compilation. - Mark From coleenp at openjdk.java.net Wed Nov 4 23:32:57 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 4 Nov 2020 23:32:57 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v3] In-Reply-To: References: Message-ID: <7oh4JAWsnFjXMCx9bwBCjhaZALn4U3kje15ZXn2klE0=.ab9b5ea6-0336-47dc-b9a5-32827ad47d66@github.com> On Wed, 4 Nov 2020 15:22:11 GMT, Thomas Stuefe wrote: >> Hi all, >> >> may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers. >> >> Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues. >> >> This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code. >> >> This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324. >> >> ---- >> >> This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (beside the documented one of using libjsig). It seems that the JVM_handle_xxx_functions() are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag is used quite a bit. >> >> See a more in-depth discussion here: [4]. 
Thanks to @dholmes-ora for untangling this bit of history.
>>
>> Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can. I plan to add a proper regression test with a later change, but for now I don't have the time for that.
>>
>> ---
>>
>> The fixed issues:
>>
>> 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed.
>>
>> Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early.
>> But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed.
>>
>> 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls).
>>
>> 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported.
>>
>> 4) Every platform handler has this section:
>>
>>     JavaThread* thread = NULL;
>>     VMThread* vmthread = NULL;
>>     if (PosixSignals::are_signal_handlers_installed()) {
>>       if (t != NULL ){
>>         if(t->is_Java_thread()) {
>>           thread = t->as_Java_thread();
>>         }
>>         else if(t->is_VM_thread()){
>>           vmthread = (VMThread *)t;
>>         }
>>       }
>>     }
>>
>> `vmthread` is unused on all platforms and can be removed.
>>
>> 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM):
>>
>>     if (sig == SIGPIPE || sig == SIGXFSZ) {
>>       // allow chained handler to go first
>>       if (PosixSignals::chained_handler(sig, info, ucVoid)) {
>>         return true;
>>       } else {
>>         // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219
>>         return true;
>>       }
>>     }
>>
>> - On s390 and ppc, we miss SIGXFSZ handling
>> _Update: Fixed separately for easier backport, see [JDK-8255734](https://bugs.openjdk.java.net/browse/JDK-8255734)_.
>> - both paths return true - section can be shortened
>>
>> Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler.
>>
>> 6) At the end of every platform handler, before calling into fatal error handling, we unblock the signal:
>>
>>> // unmask current signal
>>> sigset_t newset;
>>> sigemptyset(&newset);
>>> sigaddset(&newset, sig);
>>> sigprocmask(SIG_UNBLOCK, &newset, NULL);
>>>
>>
>> - Use of `sigprocmask()` is UB in a multithreaded program.
>> - but then, this section is unnecessary anyway, since [JDK-8252533](https://bugs.openjdk.java.net/browse/JDK-8252533) we unmask error signals at the start of the signal handler.
>>
>> 7) the JFR crash protection is not consistently checked in all platform handlers.
>>
>> 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism).
>>
>> 9) on Linux ppc64 and AIX, we have this section:
>>
>>> if (sig == SIGILL && (pc < (address) 0x200)) {
>>>   goto report_and_die;
>>> }
>>
>> which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide).
>>
>> This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway.
>>
>> 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`.
>>
>> On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice.
>>
>> ----
>>
>> The changes in this patch:
>>
>> a) hotspot signal handling is now done by the following functions:
>>
>>
>>         |                      |
>>         v                      v
>>   javaSignalHandler   JVM_handle_linux_signal()
>>         |               /
>>         v              v
>>     javaSignalHandler_inner
>>         |
>>         v
>>   PosixSignals::pd_hotspot_signal_handler()
>>
>>
>> The right branch only exists to support the `AllowUserSignalHandlers` case, see [4].
>>
>> `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before.
>> `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions.
>>
>>
>> b) I commonized prologue- and epilogue coding.
>> - I simplified (4) to a single line in the shared handler
>> - I moved the JFR thread crash protection (7) up to the shared handler
>> - I moved the complete epilogue up to the shared handler.
That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6)
>> - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept.
>> - I simplified (5) and commonized it, and removed (9) completely
>> - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out.
>>
>> Thanks for reviewing.
>>
>> Testing: this patch ran through our nightlies, in an earlier form. They will be re-run some more times.
>>
>> I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change.
>>
>> Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned. See also Gerard's work on [5], which is under review too.
>>
>> ----
>>
>> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html
>> [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html
>> [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html
>> [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html
>> [5] https://bugs.openjdk.java.net/browse/JDK-8253742
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
> Feedback David

Thank you for doing this cleanup and removing some duplicated code in the os_cpu signal handlers.

src/hotspot/os/posix/signals_posix.cpp line 615:

> 613: "\n# /--------------------\\"
> 614: "\n# | %-7s |"
> 615: "\n# \\---\\ /--------------/"

Isn't the little robot supposed to say "segmentation fault" and would that be safer than calling get_signal_name in this context? thanks for keeping the picture.
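
The handler layering described in the quoted patch summary (two entry points funnelling into one shared function, which then calls the platform-dependent part) can be sketched roughly as follows. This is only an illustration for a POSIX system, not the code from the patch: every name ending in `_sketch`, the `Outcome` enum, and the choice of SIGSEGV as the only "recognized" signal are assumptions made for the example.

```cpp
#include <cassert>
#include <csignal>

// Possible results of running a signal through the sketch below.
enum class Outcome { Handled, Unrecognized, Fatal };

// Hypothetical stand-in for PosixSignals::pd_hotspot_signal_handler():
// it pretends that SIGSEGV is the only signal the platform code recognizes.
static bool pd_handler_sketch(int sig) {
  return sig == SIGSEGV;
}

// Stand-in for javaSignalHandler_inner: shared prologue and epilogue.
static Outcome handler_inner_sketch(int sig, bool abort_if_unrecognized) {
  // Shared form of item (5): SIGPIPE and SIGXFSZ are swallowed.
  if (sig == SIGPIPE || sig == SIGXFSZ) {
    return Outcome::Handled;
  }
  // Platform-dependent portion.
  if (pd_handler_sketch(sig)) {
    return Outcome::Handled;
  }
  // Shared epilogue: the real handler would try chained handlers and then
  // enter fatal error reporting; the sketch only reports the decision.
  return abort_if_unrecognized ? Outcome::Fatal : Outcome::Unrecognized;
}

// Left branch of the diagram: the function that would be registered as the
// handler. Assumed here to always abort on unrecognized signals.
static Outcome java_signal_handler_sketch(int sig) {
  return handler_inner_sketch(sig, true);
}

// Right branch of the diagram: the exported entry point, where the caller
// decides whether an unrecognized signal is fatal.
static Outcome jvm_handle_signal_sketch(int sig, bool abort_if_unrecognized) {
  return handler_inner_sketch(sig, abort_if_unrecognized);
}
```

With this shape, `jvm_handle_signal_sketch(SIGUSR2, false)` reports `Outcome::Unrecognized`, while the same signal arriving through `java_signal_handler_sketch` reports `Outcome::Fatal`; both share the SIGPIPE/SIGXFSZ shortcut.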
-------------

Marked as reviewed by coleenp (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1034

From coleenp at openjdk.java.net Thu Nov 5 00:06:01 2020
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 5 Nov 2020 00:06:01 GMT
Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 4 Nov 2020 09:06:07 GMT, Erik Österlund wrote:

>> The imasm::remove_activation() call does not deal with safepoints very well. However, when the MethodExit JVMTI event is being called, we call into the runtime in the middle of remove_activation(). If the value being returned is an object type, then the top-of-stack contains the oop. However, the GC does not traverse said oop in any oop map, because it is simply not expected that we safepoint in the middle of remove_activation().
>>
>> The JvmtiExport::post_method_exit() function we end up calling, reads the top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, that eventually call Java and a bunch of stuff that safepoints. So after the JVMTI callback, we can expect the top-of-stack oop to be broken. Unfortunately, when we continue, we therefore end up returning a broken oop.
>>
>> Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, is wrong, as we can safepoint on the way back to Java, which will break the return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, moving the transition to VM and back, into a block of code that is protected against GC. Before the JRT_BLOCK is called, we stash away the return oop, and after the JRT_BLOCK_END, we restore the top-of-stack oop. In the path when InterpreterRuntime::post_method_exit is called when throwing an exception, we don't have the same problem of retaining an oop result, and hence the JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the logic is the same as before for this path.
>>
>> This is a JVMTI bug that has probably been around for a long time. It crashes with all GCs, but was discovered recently after concurrent stack processing, as StefanK has been running better GC stressing code in JVMTI, and the bug reproduced more easily with concurrent stack processing, as the timings were a bit different. The following reproducer failed pretty much 100% of the time:
>> while true; do make test JTREG="RETAIN=all" TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done
>>
>> With my fix I can run this repeatedly without any more failures. I have also sanity checked the patch by running tier 1-5, so that it does not introduce any new issues on its own. I have also used Stefan's nice external GC stressing with jcmd technique that was used to trigger crashes with other GCs, to make sure said crashes no longer reproduce either.
>
> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>
> Serguei CR2: Don't check interpreted only

Still looks good.

-------------

Marked as reviewed by coleenp (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/930

From coleenp at openjdk.java.net Thu Nov 5 00:06:03 2020
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 5 Nov 2020 00:06:03 GMT
Subject: RFR: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 2 Nov 2020 21:20:27 GMT, Serguei Spitsyn wrote:

>> I'm not sure. There might be other cases, when remove_activation is called by the exception code. That's why I didn't want to change it to just true in this path.
>
> The post_method_exit can come from Zero interpreter:
> src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp:
> CALL_VM_NOCHECK(InterpreterRuntime::post_method_exit(THREAD));

It seems like the case of exception_exit is a condition from the call from notice_unwind_due_to_exception() and not the calls from InterpreterRuntime::post_method_exit() for both callers. But maybe if the exception is set and not caught, this post_method_exit() is called when unwinding to the exception handler. I can't tell, so leave it to be safe.

-------------

PR: https://git.openjdk.java.net/jdk/pull/930

From github.com+51754783+coreyashford at openjdk.java.net Thu Nov 5 01:20:08 2020
From: github.com+51754783+coreyashford at openjdk.java.net (CoreyAshford)
Date: Thu, 5 Nov 2020 01:20:08 GMT
Subject: RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v12]
In-Reply-To:
References:
Message-ID:

> This patch set encompasses the following commits:
>
> - Adds a new HotSpot intrinsic candidate to the java.util.Base64 class - decodeBlock(), and provides a flexible API for the intrinsic. The API is similar to the existing encodeBlock intrinsic.
> - Adds the code in HotSpot to check and marshal the new intrinsic's arguments to the arch-specific intrinsic implementation
> - Adds a Power64LE-specific implementation of the decodeBlock intrinsic.
> - Adds a JMH microbenchmark for both Base64 encoding and decoding.
> - Enhances the JTReg hotspot intrinsic "TestBase64.java" regression test to more fully test both decoding and encoding.
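
To make the idea of a block-decoding primitive concrete, here is a minimal scalar sketch of what such a routine might do: consume whole 4-character groups of the standard Base64 alphabet and stop at the first group it cannot decode, leaving padding and error handling to the caller. The function names, parameter names and the stop-at-first-bad-group behaviour are assumptions made for this illustration; they are not taken from the patch or from the JDK sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Map one character of the standard Base64 alphabet to its 6-bit value.
// Returns -1 for anything else (including the '=' padding character).
static int b64_value_sketch(unsigned char c) {
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

// Hypothetical block decoder: decodes complete 4-character groups from
// src[sp, sl) into dst starting at index dp and returns the number of bytes
// written. The caller must size dst for 3 output bytes per input group.
static size_t decode_block_sketch(const std::string& src, size_t sp, size_t sl,
                                  std::vector<uint8_t>& dst, size_t dp) {
  size_t written = 0;
  while (sp + 4 <= sl) {
    int v[4];
    bool ok = true;
    for (int i = 0; i < 4; i++) {
      v[i] = b64_value_sketch(static_cast<unsigned char>(src[sp + i]));
      if (v[i] < 0) ok = false;
    }
    if (!ok) break;  // leave padding or illegal input to the caller
    uint32_t bits = (static_cast<uint32_t>(v[0]) << 18) |
                    (static_cast<uint32_t>(v[1]) << 12) |
                    (static_cast<uint32_t>(v[2]) << 6) |
                     static_cast<uint32_t>(v[3]);
    dst[dp + written]     = static_cast<uint8_t>((bits >> 16) & 0xFF);
    dst[dp + written + 1] = static_cast<uint8_t>((bits >> 8) & 0xFF);
    dst[dp + written + 2] = static_cast<uint8_t>(bits & 0xFF);
    written += 3;
    sp += 4;
  }
  return written;
}
```

A caller would typically run the block routine over the bulk of the input and then handle the remaining tail (padding, partial groups, error reporting) in ordinary code, which is the kind of split that makes the hot loop a candidate for an intrinsic.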
CoreyAshford has updated the pull request incrementally with one additional commit since the last revision: stubGenerator_ppc.cpp: fix typo (omitted 'the') ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/293/files - new: https://git.openjdk.java.net/jdk/pull/293/files/c4d22da3..9e303dad Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=293&range=11 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=293&range=10-11 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/293.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/293/head:pull/293 PR: https://git.openjdk.java.net/jdk/pull/293 From ngasson at openjdk.java.net Thu Nov 5 01:49:54 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 5 Nov 2020 01:49:54 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: <055zgcELaav78pPYT17FjL4QbKy1OXK3gEeUkTkkv5k=.21c42be4-e78f-4ee9-9c71-40893fd1d9d5@github.com> Message-ID: On Wed, 4 Nov 2020 16:49:00 GMT, Andrew Dinn wrote: > @theRealAph BTW, I think Nick is right that this patch is not needed. Are you ok to reject it? We should delete lines 184-187 in the current file: this isn't working as intended since the switch to `os::processor_count()` and as discussed the ancient kernel versions where this was necessary should no longer be in use. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From ngasson at openjdk.java.net Thu Nov 5 01:52:53 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 5 Nov 2020 01:52:53 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: <055zgcELaav78pPYT17FjL4QbKy1OXK3gEeUkTkkv5k=.21c42be4-e78f-4ee9-9c71-40893fd1d9d5@github.com> References: <055zgcELaav78pPYT17FjL4QbKy1OXK3gEeUkTkkv5k=.21c42be4-e78f-4ee9-9c71-40893fd1d9d5@github.com> Message-ID: On Wed, 4 Nov 2020 16:46:36 GMT, Andrew Dinn wrote: > > At present reading /proc/cpuinfo that is the only reliable way I know of to identify whether dcpop is a supported feature (used to force persistence of data to memory). That is needed to support use of NVRAM-backed MappedByteBuffers. I think that comes from `getauxval() & HWCAP_DCPOP`? ------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From xliu at openjdk.java.net Thu Nov 5 02:39:10 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 5 Nov 2020 02:39:10 GMT Subject: RFR: 8255562: delete UseRDPCForConstantTableBase [v2] In-Reply-To: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> References: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> Message-ID: > UseRDPCForConstantTableBase was a SPARC-exclusive flag. Sparc has been removed > from hotspot, so remove this flag. Xin Liu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains two new commits since the last revision: - 8255562: delete UseRDPCForConstantTableBase mark UseRDPCForConstantTableBase is obsoletd in jdk16 and will expire in jdk17. 
- 8255562: delete UseRDPCForConstantTableBase ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/949/files - new: https://git.openjdk.java.net/jdk/pull/949/files/a4e875e8..478aab95 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=949&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=949&range=00-01 Stats: 13 lines in 3 files changed: 8 ins; 3 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/949.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/949/head:pull/949 PR: https://git.openjdk.java.net/jdk/pull/949 From xliu at openjdk.java.net Thu Nov 5 05:49:11 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 5 Nov 2020 05:49:11 GMT Subject: RFR: 8255562: delete UseRDPCForConstantTableBase [v3] In-Reply-To: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> References: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> Message-ID: > UseRDPCForConstantTableBase was a SPARC-exclusive flag. Sparc has been removed > from hotspot, so remove this flag. Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - 8255562: delete UseRDPCForConstantTableBase mark UseRDPCForConstantTableBase is obsoletd in jdk16 and will expire in jdk17. 
- 8255562: delete UseRDPCForConstantTableBase ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/949/files - new: https://git.openjdk.java.net/jdk/pull/949/files/478aab95..ae686179 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=949&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=949&range=01-02 Stats: 12770 lines in 658 files changed: 7121 ins; 3260 del; 2389 mod Patch: https://git.openjdk.java.net/jdk/pull/949.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/949/head:pull/949 PR: https://git.openjdk.java.net/jdk/pull/949 From xliu at openjdk.java.net Thu Nov 5 06:03:56 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 5 Nov 2020 06:03:56 GMT Subject: RFR: 8255562: delete UseRDPCForConstantTableBase [v3] In-Reply-To: References: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> Message-ID: On Fri, 30 Oct 2020 08:47:12 GMT, Tobias Hartmann wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - 8255562: delete UseRDPCForConstantTableBase >> >> mark UseRDPCForConstantTableBase is obsoletd in jdk16 and will expire in jdk17. >> - 8255562: delete UseRDPCForConstantTableBase > > Changes requested by thartmann (Reviewer). I made the hotspot cmdline flag `UseRDPCForConstantTableBase` obsolete in jdk(16) and will expire in jdk(17) based on the document [here](https://wiki.openjdk.java.net/display/HotSpot/Hotspot+Command-line+Flags%3A+Kinds%2C+Lifecycle+and+the+CSR+Process) The reason I modified `Arguments::handle_aliases_and_deprecation` is that hotspot didn't emit any warning message if we only mark it obsoleted but never define deprecated version. After I modify it, it emits a warning message to alarm user that the flag has been obsoleted. 
./bin/java -XX:+UseRDPCForConstantTableBase --version OpenJDK 64-Bit Server VM warning: Temporarily processing option UseRDPCForConstantTableBase; support is scheduled for removal in 16.0 openjdk 16-internal 2021-03-16 OpenJDK Runtime Environment (build 16-internal+0-adhoc.ubuntu.jdk) OpenJDK 64-Bit Server VM (build 16-internal+0-adhoc.ubuntu.jdk, mixed mode) ------------- PR: https://git.openjdk.java.net/jdk/pull/949 From shade at openjdk.java.net Thu Nov 5 06:39:02 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 5 Nov 2020 06:39:02 GMT Subject: RFR: 8255822: Zero: improve build-time JVMTI handling [v2] In-Reply-To: <7-Y8_X8c04i5dRiPOWnSV-Wqige6jXxb1WPW1pIO7CE=.f66482a9-f624-4897-a68f-1e131a6abf8f@github.com> References: <7-Y8_X8c04i5dRiPOWnSV-Wqige6jXxb1WPW1pIO7CE=.f66482a9-f624-4897-a68f-1e131a6abf8f@github.com> Message-ID: > Current Zero interpreter has the optimization for JVMTI support. It recognizes that JVMTI is disabled most of the time, and that JVMTI checks in the interpreter code slows it down considerably. (I measured it myself for this patch, it gives about 20% hit in build times). > > Current optimization works as follows. At build time, an XSLT transform is performed on `bytecodeInterpreter.cpp`, yielding `bytecodeInterpreterWithChecks.cpp`. In that new compilation unit, `VM_JVMTI` macro is defined, and a new entry point -- `BytecodeInterpreter::withChecks` -- is defined. Then, both compilation units are compiled. In one of them, `JVMTI` hooks are stripped out. In another, they persist. Then, callers have to choose which entry point to use. > > I believe this can be rewritten to use C++ templates instead of XLST and defines dance. This also allows to clean up JVMTI checks a bit. > > Additional testing: > - [x] Linux x86_64 Zero fastdebug build with `-jvmti` > - [x] Linux x86_64 Zero fastdebug/release build times are not regressing Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - Fix build error - Merge branch 'master' into JDK-8255822-zero-jvmti-rework - Revert one dubious change - 8255822: Zero: improve build-time JVMTI handling Summary: use C++ templates instead of XSLT transforms ------------- Changes: https://git.openjdk.java.net/jdk/pull/1061/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1061&range=01 Stats: 156 lines in 6 files changed: 3 ins; 119 del; 34 mod Patch: https://git.openjdk.java.net/jdk/pull/1061.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1061/head:pull/1061 PR: https://git.openjdk.java.net/jdk/pull/1061 From dholmes at openjdk.java.net Thu Nov 5 06:50:56 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 5 Nov 2020 06:50:56 GMT Subject: RFR: 8255822: Zero: improve build-time JVMTI handling [v2] In-Reply-To: References: <7-Y8_X8c04i5dRiPOWnSV-Wqige6jXxb1WPW1pIO7CE=.f66482a9-f624-4897-a68f-1e131a6abf8f@github.com> Message-ID: <5iCgi65PZMWTwIIaBn4AdC7aM_Uvl2gqX3hUNRpK1Ok=.2352c564-b061-48ba-a40c-4a2d329676bf@github.com> On Thu, 5 Nov 2020 06:39:02 GMT, Aleksey Shipilev wrote: >> Current Zero interpreter has the optimization for JVMTI support. It recognizes that JVMTI is disabled most of the time, and that JVMTI checks in the interpreter code slows it down considerably. (I measured it myself for this patch, it gives about 20% hit in build times). >> >> Current optimization works as follows. At build time, an XSLT transform is performed on `bytecodeInterpreter.cpp`, yielding `bytecodeInterpreterWithChecks.cpp`. In that new compilation unit, `VM_JVMTI` macro is defined, and a new entry point -- `BytecodeInterpreter::withChecks` -- is defined. Then, both compilation units are compiled. In one of them, `JVMTI` hooks are stripped out. In another, they persist. Then, callers have to choose which entry point to use. >> >> I believe this can be rewritten to use C++ templates instead of XLST and defines dance. 
This also allows us to clean up JVMTI checks a bit.
>>
>> Additional testing:
>> - [x] Linux x86_64 Zero fastdebug build with `-jvmti`
>> - [x] Linux x86_64 Zero fastdebug/release build times are not regressing
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
>
> - Fix build error
> - Merge branch 'master' into JDK-8255822-zero-jvmti-rework
> - Revert one dubious change
> - 8255822: Zero: improve build-time JVMTI handling
> Summary: use C++ templates instead of XSLT transforms

This looks like a good cleanup to me - far simpler!

Cheers,
David

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1061

From stuefe at openjdk.java.net Thu Nov 5 07:04:56 2020
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Thu, 5 Nov 2020 07:04:56 GMT
Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v3]
In-Reply-To: <7oh4JAWsnFjXMCx9bwBCjhaZALn4U3kje15ZXn2klE0=.ab9b5ea6-0336-47dc-b9a5-32827ad47d66@github.com>
References: <7oh4JAWsnFjXMCx9bwBCjhaZALn4U3kje15ZXn2klE0=.ab9b5ea6-0336-47dc-b9a5-32827ad47d66@github.com>
Message-ID:

On Wed, 4 Nov 2020 23:28:48 GMT, Coleen Phillimore wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Feedback David
>
> src/hotspot/os/posix/signals_posix.cpp line 615:
>
>> 613: "\n# /--------------------\\"
>> 614: "\n# | %-7s |"
>> 615: "\n# \\---\\ /--------------/"
>
> Isn't the little robot supposed to say "segmentation fault" and would that be safer than calling get_signal_name in this context? thanks for keeping the picture.

Thanks Coleen. Little robot now spells out the name of the signal (since "segmentation fault" is only correct for segv, and I wanted to see it for the other cases too).
get_signal_name() is completely harmless, just uses a bit of stack buffer (and that only for the case of unknown signals which are printed numerically; I plan to change that and give us a simple version which just returns static strings). ------------- PR: https://git.openjdk.java.net/jdk/pull/1034 From shade at openjdk.java.net Thu Nov 5 07:21:54 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 5 Nov 2020 07:21:54 GMT Subject: RFR: 8255523: Clean up temporary shared_locs initializations In-Reply-To: <5OHoOpEGZ0j62A09r5hqDzxIu39qDSrxrC-b2tWOAzg=.4555cf67-82ee-4827-9154-1322b3f4fcf8@github.com> References: <5OHoOpEGZ0j62A09r5hqDzxIu39qDSrxrC-b2tWOAzg=.4555cf67-82ee-4827-9154-1322b3f4fcf8@github.com> Message-ID: On Wed, 28 Oct 2020 09:36:55 GMT, Aleksey Shipilev wrote: > See #648. Apparently, LLVM 11 complains that we are computing the number of elements over the array of a different type. Instead of ignoring the warning, it seems better to just clean up that code. We can allocate the whole thing as resource array of the same size. `sizeOf(relocInfo) = 2`, since it carries `unsigned short`. Friendly reminder. ------------- PR: https://git.openjdk.java.net/jdk/pull/897 From thartmann at openjdk.java.net Thu Nov 5 07:25:56 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 5 Nov 2020 07:25:56 GMT Subject: RFR: 8255665: C2 should aggressively remove temporary hook nodes [v3] In-Reply-To: References: <5wM4jpVaL-BlP7FdbWwQkWgpxZZbUlpTVoFw3SRBUqA=.2419c090-2d90-4ff1-8be9-d9167b0dc1ef@github.com> Message-ID: On Wed, 4 Nov 2020 19:01:30 GMT, Vladimir Kozlov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Pass PhaseValues to Node::destruct > > Good. Thanks Vladimir! 
------------- PR: https://git.openjdk.java.net/jdk/pull/994 From thomas.stuefe at gmail.com Thu Nov 5 07:32:15 2020 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 5 Nov 2020 08:32:15 +0100 Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers In-Reply-To: References: <7888a9d2-a3b9-9981-665f-1cb3181d199d@oracle.com> Message-ID: Hi David, > > > > 10) When invoking the fatal error handler, we extract the pc from > > the context and hand it over as "faulting pc". For SIGILL and > > SIGFPE, this is not totally correct. According to POSIX [3], for > > those signals the address of the faulting instruction is handed over > > in `si_info.si_addr`. > > > > > > On most platforms this does not matter, they are the same. But on > > some architectures the pc in the signal context actually points > > somewhere else, e.g. beyond the faulting instruction. Therefore > > `si_info.si_addr` is the better choice. > > > > The Posix spec also states "For some implementations, the value of > > si_addr may be inaccurate." - so I'm not at all sure which "pc" we > > should be trusting here? I thought the ucontext was the detailed > > platform specific "context" object that we should extract information > > from. Which architectures give different values in the two and is > there > > some documentation stating what happens for any given os/cpu? > > > > > > I overlooked this. Well, this is a helpful standard :) > > > > I saw this happen on s390 and on pa-risc. Before this patch, I did > > correct this in the platform handler. Out of caution I could #ifdef this > > section to s390. > > I prefer not to see any change in behaviour for platforms where no problem > has been observed. Also see what seems to be related comment in > ./cpu/arm/vm_version_arm_32.cpp > > Okay, made this section s390 only. Arm may or may not have the same problem. Their signal handler is in parts using si_addr, partly the context pc.
But since you want to keep old behavior, let's keep it for now. I still think si_addr would be more correct and simpler, but we can leave this for a follow up fix. > // JVM_handle_linux_signal moves PC here if SIGILL happens > > > > ---- > > > > > > The changes in this patch: > > > > > > a) hotspot signal handling is now done by the following functions: > > > > > > > > > | | > > > v v > > > javaSignalHandler JVM_handle_linux_signal() > > > | / > > > v v > > > javaSignalHandler_inner > > > > Not clear why we need the _inner version. Why can't we just have > > javaSignalHandler which is installed as the handler and which is > called > > by JVM_handle_XXX_signal? > > > > > > Because JVM_handle_XXX_signal has one more argument than the standard > > signal handler (abort_if_unrecognized). > > Okay so why introduce this shape instead of keeping the existing form: > > javaSignalHandler -> JVM_handle_xxx_signal(..., true) -> > javaSignalHandler_inner > > ? With the new arrangement the equivalence between javaSignalHandler and > JVM_handle_xxx_signal can only be seen by inspecting the code of both. > > I see now what you mean. To my mind, JVM_handle_linux_signal() is an external API with a contract which does not necessarily correspond with what javaSignalHandler_inner() is about to do. Its existence demonstrates that (as an explicit side door into the signal handler hierarchy), and that door can be guarded and adorned with pre- and post-processing if needed. For example, the contract says that the user shall not pass anything other than the given list of signals. Well, we can assert it here more clearly. So users could use a debug VM to test their application. It also means we can, for the release case, safely shortcut signals it should ignore. That isolates this interface a bit from any future changes to the hotspot signal handlers (because no-one ever thinks about this stuff when working with the handlers). I know these are not hard arguments, it is a matter of taste. 
If you insist, I change it in the way you prefer. > > So I think JVM_handle_xx_signal() should test for the list of allowed > > signals, and just return false right away in case sig is none of the > > hotspot signals. > I don't think we should change existing behaviour here. But is that not an error worth fixing? Before, someone could have passed in 177 as a signal number and this would have crashed the VM if he also passed in abort_if_unrecognized=true. This is a bit like my argument from above. I believe we have a clear contract with this API. If that contract is broken, what should we do? I prefer to assert, but in release case to be tolerant and ignore it. Otherwise, what good is this contract if we cannot rely on it and don't check it? > > // This routine may recognize any of the following kinds of signals: > > // SIGBUS, SIGSEGV, SIGILL, SIGFPE, SIGQUIT, SIGPIPE, SIGXFSZ, SIGUSR1. > > // It should be consulted by handlers for any of those signals. > > > > Note that this list is not really correct, as it includes SIGUSR1 > > and SIGQUIT. None of which are handled by the hotspot signal handler. As > > you wrote before, this mechanism is only to handle signals the hotspot > > commandeers. > > SIGUSR1 is probably a leftover from when we did use SIGUSR1 for something. > > SIGQUIT is interesting because if the app is in charge of signals then > it will install a SIGQUIT handler and we would not want the "VM" to do > it's normal SIGQUIT handler. Hm, I can see it either way. SIGQUIT is for thread dumping. Whether or not the application would want us to honor it we cannot say. The current behavior would be to treat SIGQUIT as unknown signal (crash or ignore). I would keep that behavior for now. > I have a big question mark over how the use > of the signal handler thread can/should interact with > AllowUserSignalHandlers. What is the signal handler thread? 
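
The contract check argued for above (reject anything outside the documented list up front, asserting in a debug VM and quietly ignoring it in a release VM) might look roughly like the following sketch. The signal list, the function names and the function-pointer parameter are assumptions made for illustration on a POSIX system; SIGUSR1 and SIGQUIT are left out, following the remark that the hotspot handler does not actually handle them.

```cpp
#include <cassert>
#include <csignal>

// Assumed set of signals the external entry point is meant to accept.
static bool is_expected_signal_sketch(int sig) {
  switch (sig) {
    case SIGBUS:
    case SIGSEGV:
    case SIGILL:
    case SIGFPE:
    case SIGPIPE:
    case SIGXFSZ:
      return true;
    default:
      return false;
  }
}

// Hypothetical guard in front of the shared handler. `inner` stands in for
// the shared handler and returns true if it handled the signal.
static bool guarded_entry_sketch(int sig, bool abort_if_unrecognized,
                                 bool (*inner)(int, bool)) {
  if (!is_expected_signal_sketch(sig)) {
    // A debug build could assert here; a release build tolerates the call
    // and reports "not handled", whatever abort_if_unrecognized says.
    return false;
  }
  return inner(sig, abort_if_unrecognized);
}

// Trivial stand-in for the shared handler, used only to exercise the guard.
static bool inner_stub_sketch(int, bool) {
  return true;
}
```

Under these assumptions, `guarded_entry_sketch(177, true, inner_stub_sketch)` returns false without ever reaching the shared handler, which is the release-mode behaviour proposed for the "177 as a signal number" case.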
Thanks, Thomas From ihse at openjdk.java.net Thu Nov 5 07:58:57 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Thu, 5 Nov 2020 07:58:57 GMT Subject: RFR: 8255822: Zero: improve build-time JVMTI handling [v2] In-Reply-To: References: <7-Y8_X8c04i5dRiPOWnSV-Wqige6jXxb1WPW1pIO7CE=.f66482a9-f624-4897-a68f-1e131a6abf8f@github.com> Message-ID: On Thu, 5 Nov 2020 06:39:02 GMT, Aleksey Shipilev wrote: >> Current Zero interpreter has the optimization for JVMTI support. It recognizes that JVMTI is disabled most of the time, and that JVMTI checks in the interpreter code slows it down considerably. (I measured it myself when working on this patch: removing this optimization yields about 20% hit in build times). >> >> Current optimization works as follows. At build time, an XSLT transform is performed on `bytecodeInterpreter.cpp`, yielding `bytecodeInterpreterWithChecks.cpp`. In that new compilation unit, `VM_JVMTI` macro is defined, and a new entry point -- `BytecodeInterpreter::withChecks` -- is defined. Then, both compilation units are compiled. In one of them, `JVMTI` hooks are stripped out. In another, they persist. Then, callers have to choose which entry point to use. >> >> I believe this can be rewritten to use C++ templates instead of XLST and defines dance. This also allows to clean up JVMTI checks a bit. >> >> Additional testing: >> - [x] Linux x86_64 Zero fastdebug build with `-jvmti` >> - [x] Linux x86_64 Zero fastdebug/release build times are not regressing > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Fix build error > - Merge branch 'master' into JDK-8255822-zero-jvmti-rework > - Revert one dubious change > - 8255822: Zero: improve build-time JVMTI handling > Summary: use C++ templates instead of XSLT transforms Looks good to me. Thanks for the cleanup! ------------- Marked as reviewed by ihse (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1061 From chagedorn at openjdk.java.net Thu Nov 5 07:58:57 2020 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 5 Nov 2020 07:58:57 GMT Subject: RFR: 8255665: C2 should aggressively remove temporary hook nodes [v3] In-Reply-To: <5wM4jpVaL-BlP7FdbWwQkWgpxZZbUlpTVoFw3SRBUqA=.2419c090-2d90-4ff1-8be9-d9167b0dc1ef@github.com> References: <5wM4jpVaL-BlP7FdbWwQkWgpxZZbUlpTVoFw3SRBUqA=.2419c090-2d90-4ff1-8be9-d9167b0dc1ef@github.com> Message-ID: On Wed, 4 Nov 2020 07:08:06 GMT, Tobias Hartmann wrote: >> C2 often creates temporary "hook" nodes to keep other nodes alive. Although dead, these are sometimes not removed because they don't end up on the IGVN worklist. JDK-8040213 added detection of modified nodes that are not re-processed by IGVN but currently ignores dead nodes. >> >> This patch includes the following changes: >> - Adjust detection of modified nodes such that dead nodes are includes as well. This revealed several locations were dead nodes are not eagerly destructed (or not even added to the worklist for later removal). I've fixed all of these. >> - No need to yank node inputs before calling `destruct`. >> - `kill_dead_code` accidentally re-adds dead nodes to the `_modified_nodes` list. `Compile::remove_modified_node` should be called at the end to avoid this. >> - Some removal of dead code. >> >> Tested with tier1-3, higher tiers are running. >> >> JDK-8255670 will further improve detection. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Pass PhaseValues to Node::destruct Marked as reviewed by chagedorn (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/994 From thartmann at openjdk.java.net Thu Nov 5 08:05:56 2020 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 5 Nov 2020 08:05:56 GMT Subject: Integrated: 8255665: C2 should aggressively remove temporary hook nodes In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 08:14:28 GMT, Tobias Hartmann wrote: > C2 often creates temporary "hook" nodes to keep other nodes alive. Although dead, these are sometimes not removed because they don't end up on the IGVN worklist. JDK-8040213 added detection of modified nodes that are not re-processed by IGVN but currently ignores dead nodes. > > This patch includes the following changes: > - Adjust detection of modified nodes such that dead nodes are includes as well. This revealed several locations were dead nodes are not eagerly destructed (or not even added to the worklist for later removal). I've fixed all of these. > - No need to yank node inputs before calling `destruct`. > - `kill_dead_code` accidentally re-adds dead nodes to the `_modified_nodes` list. `Compile::remove_modified_node` should be called at the end to avoid this. > - Some removal of dead code. > > Tested with tier1-3, higher tiers are running. > > JDK-8255670 will further improve detection. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: eb85b8da Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/eb85b8da Stats: 83 lines in 16 files changed: 4 ins; 42 del; 37 mod 8255665: C2 should aggressively remove temporary hook nodes Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/994 From volker.simonis at gmail.com Thu Nov 5 09:02:15 2020 From: volker.simonis at gmail.com (Volker Simonis) Date: Thu, 5 Nov 2020 10:02:15 +0100 Subject: RFR: 8255616: Disable AOT and Graal in Oracle OpenJDK In-Reply-To: <20201104150607.735653032@eggemoggin.niobe.net> References: <20201104150607.735653032@eggemoggin.niobe.net> Message-ID: On Thu, Nov 5, 2020 at 12:06 AM wrote: > > 2020/11/4 14:53:47 -0800, vladimir.kozlov at oracle.com: > > On 11/3/20 2:51 AM, Volker Simonis wrote: > >> this is an interesting step and I wonder how it affects the OpenJDK > >> Graal, Metropolis and Leyden projects? > >> > > ... > > > > I will let Mark to talk about Project Leyden. > > > > Best regards, > > Vladimir Kozlov > > > > ... > >> > >> - Project Leyden [?]: @Mark: what's actually the state of Project > >> Leyden? We had a discussion [3], a vote [4] and the approval of the > >> project [5] yet nothing has happened ever since. There's neither a > >> project page nor a mailing list. > > Unfortunately, due to other priorities I haven't had the time to get > this project started properly. I hope to be able to do that soon. > > >> Considering the fact that Leyden was supposed to "be based upon > >> existing components in the JDK such as the HotSpot JVM, the `jaotc` > >> ahead-of-time compiler, application class-data sharing, and the > >> `jlink` linking tool" I wonder if Leyden is already dead before its > >> instantiation if "jaotc", one of its core components, has now been > >> deprecated? Or are there any plans to enhance C2 for AOT scenarios? > > We are considering the possibility of using C2 for ahead-of-time > compilation. > Vladimir, Mark, thanks a lot for your answers.
It's encouraging to see that you keep on investing in C2. Best regards, Volker > - Mark > From jbhateja at openjdk.java.net Thu Nov 5 09:06:10 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 5 Nov 2020 09:06:10 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v12] In-Reply-To: References: Message-ID: > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. > 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. > 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. > 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
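For readers skimming the summary, the runtime check from item 2 can be pictured roughly as follows. This is only a scalar C++ emulation under stated assumptions, not the code C2 emits: `masked_copy32`, `arraycopy_stub`, `arraycopy_bytes` and `kPartialInlineSize` are made-up names, the masked vector load/store is imitated with a bit mask over 32 byte lanes, and whether the real length check uses `<` or `<=` is not something this sketch settles.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors the default value of ArrayCopyPartialInlineSize (in bytes).
constexpr std::size_t kPartialInlineSize = 32;

// Only here so the two paths can be told apart when experimenting.
static int g_stub_calls = 0;

// Stand-in for the out-of-line arraycopy stub (the "call" path).
static void arraycopy_stub(std::uint8_t* dst, const std::uint8_t* src, std::size_t len) {
  ++g_stub_calls;
  std::memmove(dst, src, len);
}

// Scalar emulation of one masked vector load followed by one masked store:
// lane i takes part only if bit i of the mask is set, i.e. if i < len.
static void masked_copy32(std::uint8_t* dst, const std::uint8_t* src, std::size_t len) {
  const std::uint32_t mask = (len >= 32) ? 0xFFFFFFFFu : ((1u << len) - 1u);
  std::uint8_t lanes[32] = {0};
  for (unsigned i = 0; i < 32; ++i) {
    if ((mask >> i) & 1u) lanes[i] = src[i];  // masked load
  }
  for (unsigned i = 0; i < 32; ++i) {
    if ((mask >> i) & 1u) dst[i] = lanes[i];  // masked store
  }
}

// The conditional emitted at the call site: short copies stay inline,
// everything else goes through the stub.
static void arraycopy_bytes(std::uint8_t* dst, const std::uint8_t* src, std::size_t len) {
  if (len <= kPartialInlineSize) {
    masked_copy32(dst, src, len);
  } else {
    arraycopy_stub(dst, src, len);
  }
}
```

The point of the mask is that a single load/store pair covers every length up to the threshold, so the short path needs neither a loop nor a call.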
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: JDK-8252848: Review comments resolution. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/302/files - new: https://git.openjdk.java.net/jdk/pull/302/files/9e85592a..689426d6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=11 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=10-11 Stats: 28 lines in 6 files changed: 2 ins; 0 del; 26 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Thu Nov 5 09:06:12 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 5 Nov 2020 09:06:12 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 19:25:30 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - JDK-8252848: Review comments addressed. >> - Merge remote-tracking branch 'origin' into JDK-8252848 >> - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - JDK-8252848 : Review comments resolution. 
>> - Merge remote-tracking branch 'upstream' into JDK-8252848 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Replacing explicit type checks with existing type checking routines >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - ... and 1 more: https://git.openjdk.java.net/jdk/compare/4031cb41...9e85592a > > Changes requested by kvn (Reviewer). Hi @vnkozlov , I have resolved your review comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From simonis at openjdk.java.net Thu Nov 5 09:55:59 2020 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 5 Nov 2020 09:55:59 GMT Subject: RFR: 8255562: delete UseRDPCForConstantTableBase [v3] In-Reply-To: References: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> Message-ID: On Thu, 5 Nov 2020 05:49:11 GMT, Xin Liu wrote: >> UseRDPCForConstantTableBase was a SPARC-exclusive flag. Sparc has been removed >> from hotspot, so remove this flag. > > Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - 8255562: delete UseRDPCForConstantTableBase > > mark UseRDPCForConstantTableBase is obsoletd in jdk16 and will expire in jdk17. > - 8255562: delete UseRDPCForConstantTableBase Hi Xin, you also have to remove the option from `c2_globals.hpp`. Also, please add the line { "UseRDPCForConstantTableBase", JDK_Version::undefined(), JDK_Version::jdk(16), JDK_Version::jdk(17) }, unconditionally (e.g. after the line for the `Debugging` option). You've added it to the `#ifndef COMPILER2` section. I suppose that's the reason why you haven't seen the deprecation warning. I think if you add the option unconditionally to `special_jvm_flags` you shouldn't need any changes to `Arguments::handle_aliases_and_deprecation`. 
You can have a look at [8252889: Obsolete -XX:+InsertMemBarAfterArraycopy](https://bugs.openjdk.java.net/browse/JDK-8252889) for an example. ------------- Changes requested by simonis (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/949 From mdoerr at openjdk.java.net Thu Nov 5 10:13:59 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 5 Nov 2020 10:13:59 GMT Subject: RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v12] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 01:20:08 GMT, CoreyAshford wrote: >> This patch set encompasses the following commits: >> >> - Adds a new HotSpot intrinsic candidate to the java.util.Base64 class - decodeBlock(), and provides a flexible API for the intrinsic. The API is similar to the existing encodeBlock intrinsic. >> - Adds the code in HotSpot to check and marshal the new intrinsic's arguments to the arch-specific intrinsic implementation >> - Adds a Power64LE-specific implementation of the decodeBlock intrinsic. >> - Adds a JMH microbenchmark for both Base64 encoding and decoding. >> - Enhances the JTReg hotspot intrinsic "TestBase64.java" regression test to more fully test both decoding and encoding. > > CoreyAshford has updated the pull request incrementally with one additional commit since the last revision: > > stubGenerator_ppc.cpp: fix typo (omitted 'the') I already had the feeling that unrolling the large loop was not beneficial. Thanks and thumbs up from my side! ------------- Marked as reviewed by mdoerr (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/293 From aph at redhat.com Thu Nov 5 10:17:17 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 5 Nov 2020 10:17:17 +0000 Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: <055zgcELaav78pPYT17FjL4QbKy1OXK3gEeUkTkkv5k=.21c42be4-e78f-4ee9-9c71-40893fd1d9d5@github.com> Message-ID: <81a8ab23-1505-bacf-87a2-ab9e9088c76a@redhat.com> On 05/11/2020 01:52, Nick Gasson wrote: > On Wed, 4 Nov 2020 16:46:36 GMT, Andrew Dinn wrote: > >> >> At present reading /proc/cpuinfo that is the only reliable way I know of to identify whether dcpop is a supported feature (used to force persistence of data to memory). That is needed to support use of NVRAM-backed MappedByteBuffers. > > I think that comes from `getauxval() & HWCAP_DCPOP`? If there's anything we need from /proc/cpuinfo in order to run Java, we really should be raising a bug with the kernel people. And besides, what happens if you try to use dcpop and it's not supported? If you get SIGILL, then we're done. We don't have to ask permission if we can try it. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From akozlov at openjdk.java.net Thu Nov 5 10:42:54 2020 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Thu, 5 Nov 2020 10:42:54 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: <055zgcELaav78pPYT17FjL4QbKy1OXK3gEeUkTkkv5k=.21c42be4-e78f-4ee9-9c71-40893fd1d9d5@github.com> Message-ID: On Thu, 5 Nov 2020 01:49:51 GMT, Nick Gasson wrote: >>> And while we're talking about Linux, can we really not get the info we need >> without parsing /proc/cpuinfo? And do we need to parse the entire file? >> >> At present reading /proc/cpuinfo that is the only reliable way I know of to identify whether dcpop is a supported feature (used to force persistence of data to memory). 
That is needed to support use of NVRAM-backed MappedByteBuffers. > >> >> At present reading /proc/cpuinfo that is the only reliable way I know of to identify whether dcpop is a supported feature (used to force persistence of data to memory). That is needed to support use of NVRAM-backed MappedByteBuffers. > > I think that comes from `getauxval() & HWCAP_DCPOP`? > > It's not a feature of single-core CPUs: AFAIK it's a work around for very old arm64 kernels that only reported a single CPU in `/proc/cpuinfo` on multi-core systems where you may have a mix of different CPU types (i.e. mixed A53/A57 where the A57 is reported in cpuinfo). > > I wonder if we should just remove this workaround altogether? The patch to list all CPUs in `/proc/cpuinfo` was backported to at least the 3.10 series. I really doubt there's anyone running latest OpenJDK on a A53 with such an old kernel. > > Yes, please. > > @theRealAph BTW, I think Nick is right that this patch is not needed. Are you ok to reject it? > > We should delete lines 184-187 in the current file: this isn't working as intended since the switch to `os::processor_count()` and as discussed the ancient kernel versions where this was necessary should no longer be in use. In general, I support to abandon this and remove the CPU_A53MAC instead. But the use of the flag is not clear for me https://github.com/openjdk/jdk/blob/f279ddfa06392f8ea14224e478a00bad33b84e7a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L447 This code does not check if there is a mix of CPU types, but does for exact type/number-of-cores combination that stands behind the flag. I don't feel comfortable to remove this `nop` now. It will be great if If someone can clarify this. Meanwhile, I'll check for `madd`/`msub`/... and will followup. > > At present reading /proc/cpuinfo that is the only reliable way I know of to identify whether dcpop is a supported feature (used to force persistence of data to memory). 
That is needed to support use of NVRAM-backed MappedByteBuffers. > > I think that comes from `getauxval() & HWCAP_DCPOP`? Yes, since JDK-8253015 the dcpop feature is detected via hwcap, with a guarantee about /proc/cpuinfo https://github.com/openjdk/jdk/commit/ec9bee68#diff-7e6fa90a7bcdbe41687eb8d39c6c6232e6518b019937a87aab75284166ef67bdR157-R159. Only the `hwcap => cpuinfo` implication is checked, catching silent suboptimal performance. Failing `<=` should be evident from crashing on using dcpop. > And while we're talking about Linux, can we really not get the info we need without parsing /proc/cpuinfo? And do we need to parse the entire file? Another reason to read /proc/cpuinfo is to get vendor, model, etc of the CPUs. I would like to avoid addressing this here. > Tested slowdebug build on Windows+Arm64 with this patch, and smoke tested it with `jtreg:tier1_compiler_1` successfully. > > Change itself looks good to me too (but I'm not a reviewer). Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From mdoerr at openjdk.java.net Thu Nov 5 11:00:03 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 5 Nov 2020 11:00:03 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v6] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 23:04:10 GMT, Ziviani wrote: >> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. >> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. >> Ref: PowerISA 3.1, page 129. >> >> These instructions are particularly interesting to improve the following >> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in >> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. >> >> Long.toString, which generates such a pattern in getChars, has shown a >> good performance gain by using these new instructions.
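As a rough illustration of why the two instructions remove the branches from the three-way compare described above, here is a small C++ emulation. It is a sketch only: `setbc`, `setnbc` and `cmp3` are plain functions written for this example, not the macro assembler routines, and the emulation itself uses `?:` where the hardware reads a CR bit directly.

```cpp
#include <cstdint>

// Emulates "setbc RT,BI": RT = 1 if the selected CR bit is set, else 0.
static inline std::int64_t setbc(bool cr_bit) { return cr_bit ? 1 : 0; }

// Emulates "setnbc RT,BI": RT = -1 if the selected CR bit is set, else 0.
static inline std::int64_t setnbc(bool cr_bit) { return cr_bit ? -1 : 0; }

// Three-way compare in the CmpL3 style: -1, 0 or 1.
// After one compare, the "less" and "greater" bits are mutually exclusive,
// so OR-ing the two results yields the answer without any branch.
static inline int cmp3(std::int64_t a, std::int64_t b) {
  const bool lt = a < b;  // plays the role of the CR0 "less" bit
  const bool gt = a > b;  // plays the role of the CR0 "greater" bit
  return static_cast<int>(setbc(gt) | setnbc(lt));
}
```

If both bits are clear the operands are equal and the OR gives 0, which is the third case of the original pattern.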
>> >> Example: >> for (int i = 0; i < 200_000; i++) >> res = Long.toString((long)i); >> >> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString >> >> Without setbc (average): 0.1178 seconds >> With setbc (average): 0.0396 seconds > > Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Very nice! Unfortunately, I had forgotten one detail (see inline). I'll try to find a 2nd reviewer in our team. src/hotspot/cpu/ppc/ppc.ad line 11422: > 11420: > 11421: // Manifest a CmpL3 result in an integer register. > 11422: instruct cmpL3_reg_reg(iRegIdst dst, iRegLsrc src1, iRegLsrc src2) %{ I had forgotten one detail in my previous review. Sorry for that. We need to model the CR0 effect: instruct cmpL3_reg_reg(iRegIdst dst, iRegLsrc src1, iRegLsrc src2, flagsRegCR0 cr0) %{ match(Set dst (CmpL3 src1 src2)); effect(KILL cr0); (Same for other nodes.) src/hotspot/cpu/ppc/ppc.ad line 11766: > 11764: __ fcmpu(CCR0, $src1$$FloatRegister, $src2$$FloatRegister); > 11765: __ cror(CCR0, Assembler::less, CCR0, Assembler::summary_overflow); > 11766: __ set_cmp3($dst$$Register); Why not set_cmpu3($dst$$Register, true); // C2 requires unordered to get treated like less ? (same for CmpD3) src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 1611: > 1609: > 1610: __ fcmpu(CCR0, Rfirst, Rsecond); // compare > 1611: // if unordered_result is 1, treat unordered_result like 'greater than' Please add assert(unordered_result == 1 || unordered_result == -1, "only supported"); ------------- Changes requested by mdoerr (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/907 From aph at openjdk.java.net Thu Nov 5 11:03:56 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 5 Nov 2020 11:03:56 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: <055zgcELaav78pPYT17FjL4QbKy1OXK3gEeUkTkkv5k=.21c42be4-e78f-4ee9-9c71-40893fd1d9d5@github.com> Message-ID: On Thu, 5 Nov 2020 10:40:33 GMT, Anton Kozlov wrote: > @theRealAph BTW, I think Nick is right that this patch is not needed. Are you ok to reject it? Why do I have to reject it? Can't the author simply withdraw it? ------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From ngasson at openjdk.java.net Thu Nov 5 11:31:57 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 5 Nov 2020 11:31:57 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: <055zgcELaav78pPYT17FjL4QbKy1OXK3gEeUkTkkv5k=.21c42be4-e78f-4ee9-9c71-40893fd1d9d5@github.com> Message-ID: On Thu, 5 Nov 2020 10:40:33 GMT, Anton Kozlov wrote: > > > > We should delete lines 184-187 in the current file: this isn't working as intended since the switch to `os::processor_count()` and as discussed the ancient kernel versions where this was necessary should no longer be in use. > > In general, I support to abandon this and remove the CPU_A53MAC instead. But the use of the flag is not clear for me > > https://github.com/openjdk/jdk/blob/f279ddfa06392f8ea14224e478a00bad33b84e7a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L447 > This code does not check if there is a mix of CPU types, but does for exact type/number-of-cores combination that stands behind the flag. I don't feel comfortable to remove this `nop` now. It will be great if If someone can clarify this. Meanwhile, I'll check for `madd`/`msub`/... and will followup. 
> I wasn't suggesting removing CPU_A53MAC entirely - that will always be required to work around an A53 hardware errata and we should set it whenever we detect an A53 in cpuinfo. What I meant was remove the logic on lines 184-187 that sets this flag if there is only one CPU listed in /proc/cpuinfo and that CPU is an A57. This exists to handle old Linux kernels that only reported CPU features for a single core in cpuinfo: it's possible on a mixed A53/A57 system that only the A57 features are reported but there's also an A53 lurking, in which case we still need to apply the MAC workaround in the JIT. The kernel was patched long ago to print the features of every CPU in /proc/cpuinfo so this check is no longer required. ------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From akozlov at openjdk.java.net Thu Nov 5 12:24:54 2020 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Thu, 5 Nov 2020 12:24:54 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: <055zgcELaav78pPYT17FjL4QbKy1OXK3gEeUkTkkv5k=.21c42be4-e78f-4ee9-9c71-40893fd1d9d5@github.com> Message-ID: On Thu, 5 Nov 2020 11:29:16 GMT, Nick Gasson wrote: >>> > It's not a feature of single-core CPUs: AFAIK it's a work around for very old arm64 kernels that only reported a single CPU in `/proc/cpuinfo` on multi-core systems where you may have a mix of different CPU types (i.e. mixed A53/A57 where the A57 is reported in cpuinfo). >>> > I wonder if we should just remove this workaround altogether? The patch to list all CPUs in `/proc/cpuinfo` was backported to at least the 3.10 series. I really doubt there's anyone running latest OpenJDK on a A53 with such an old kernel. >>> >>> Yes, please. >> >>> > @theRealAph BTW, I think Nick is right that this patch is not needed. Are you ok to reject it? 
>>> >>> We should delete lines 184-187 in the current file: this isn't working as intended since the switch to `os::processor_count()` and as discussed the ancient kernel versions where this was necessary should no longer be in use. >> >> In general, I support to abandon this and remove the CPU_A53MAC instead. But the use of the flag is not clear for me https://github.com/openjdk/jdk/blob/f279ddfa06392f8ea14224e478a00bad33b84e7a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L447 This code does not check if there is a mix of CPU types, but does for exact type/number-of-cores combination that stands behind the flag. I don't feel comfortable to remove this `nop` now. It will be great if If someone can clarify this. Meanwhile, I'll check for `madd`/`msub`/... and will followup. >> >>> > At present reading /proc/cpuinfo that is the only reliable way I know of to identify whether dcpop is a supported feature (used to force persistence of data to memory). That is needed to support use of NVRAM-backed MappedByteBuffers. >>> >>> I think that comes from `getauxval() & HWCAP_DCPOP`? >> >> Yes, since JDK-8253015 dcpop feature is deteced by hwcap with guarantee about /proc/cpuinfo https://github.com/openjdk/jdk/commit/ec9bee68#diff-7e6fa90a7bcdbe41687eb8d39c6c6232e6518b019937a87aab75284166ef67bdR157-R159. It's only `hwcap => cpuinfo` implication checked, catching silent suboptimal performance. Failing `<=` should be evident from crashing on using dcpop. >> >>> And while we're talking about Linux, can we really not get the info we need without parsing /proc/cpuinfo? And do we need to parse the entire file? >> >> Another reason to read /proc/cpuinfo is to get vendor, model, etc of the CPUs. I would like to avoid addressing this here. >> >>> Tested slowdebug build on Windows+Arm64 with this patch, and smoked tested it with `jtreg:tier1_compiler_1` successfully. >>> >>> Change itself looks go to me too (but I'm not a reviewer). >> >> Thanks! 
> >> > >> > We should delete lines 184-187 in the current file: this isn't working as intended since the switch to `os::processor_count()` and as discussed the ancient kernel versions where this was necessary should no longer be in use. >> >> In general, I support abandoning this and removing CPU_A53MAC instead. But the use of the flag is not clear to me >> >> https://github.com/openjdk/jdk/blob/f279ddfa06392f8ea14224e478a00bad33b84e7a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L447 >> This code does not check whether there is a mix of CPU types, but does check for the exact type/number-of-cores combination that stands behind the flag. I don't feel comfortable removing this `nop` now. It would be great if someone could clarify this. Meanwhile, I'll check for `madd`/`msub`/... and will follow up. >> > > I wasn't suggesting removing CPU_A53MAC entirely - that will always be required to work around an A53 hardware erratum and we should set it whenever we detect an A53 in cpuinfo. What I meant was remove the logic on lines 184-187 that sets this flag if there is only one CPU listed in /proc/cpuinfo and that CPU is an A57. This exists to handle old Linux kernels that only reported CPU features for a single core in cpuinfo: it's possible on a mixed A53/A57 system that only the A57 features are reported but there's also an A53 lurking, in which case we still need to apply the MAC workaround in the JIT. The kernel was patched long ago to print the features of every CPU in /proc/cpuinfo so this check is no longer required. > I wasn't suggesting removing CPU_A53MAC entirely [...] What I meant was remove the logic on lines 184-187 Oh, my bad, I spent so much time thinking about the reported single-core case that the real meaning of the flag slipped my mind. Thanks for the hint. I agree with the suggestion.
------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From akozlov at openjdk.java.net Thu Nov 5 12:24:56 2020 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Thu, 5 Nov 2020 12:24:56 GMT Subject: Withdrawn: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: Message-ID: <-XDBvInkX38bhnE044VUJa4Gwu7_UOrUv8iNHg7P2s4=.0c48ef63-a7d9-4771-86d4-9c7a171935b6@github.com> On Tue, 3 Nov 2020 16:58:58 GMT, Anton Kozlov wrote: > JDK-8255716 (#983) uncovered that os::processor_count on Linux can be not equal to number of cores reported in /proc/cpuinfo. The latter historically was used to decide CPU_A53MAC feature. This patch restores feature detection based on /proc/cpuinfo. > > Testing: linux -version > Unfortunately I cannot test windows/aarch64, CC: @lewurm @luhenry This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From coleenp at openjdk.java.net Thu Nov 5 12:30:01 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 5 Nov 2020 12:30:01 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: <172AiTMoD9T5iKu5xEVQ3AEixTDRx4gaAx0pQFfR57k=.d1d7648c-fb6f-4e87-b515-295c4e6187f7@github.com> References: <172AiTMoD9T5iKu5xEVQ3AEixTDRx4gaAx0pQFfR57k=.d1d7648c-fb6f-4e87-b515-295c4e6187f7@github.com> Message-ID: On Wed, 4 Nov 2020 10:05:29 GMT, Kim Barrett wrote: >> The comment is trying to describe the situation like: >> 1. mark-end pause (WeakHandle.peek() returns NULL because object A is unmarked) >> 2. safepoint for heap walk >> 2a. Need to post ObjectFree event for object A before the heap walk doesn't find object A. >> 3. gc_notification - would have posted an ObjectFree event for object A if the heapwalk hadn't intervened >> >> The check_hashmap() function also checks whether the hash table needs to be rehashed before the next operation that uses the hashtable. >> >> Both operations require the table to be locked. 
>> >> The unlink and post needs to be in a GC pause for reasons that I stated above. The unlink and post were done in a GC pause so this isn't worse for any GCs. The lock can be held for concurrent GC while the number of entries are processed and this would be a delay for some applications that have requested a lot of tags, but these applications have asked for this and it's not worse than what we had with GC walking this table in safepoints. > > For the GCs that call the num_dead notification in a pause it is much worse than what we had. As I pointed out elsewhere, it used to be that tagmap processing was all-in-one, as a single serial subtask taken by the first thread that reached it in WeakProcessor processing. Other threads would find that subtask taken and move on to processing oopstores in parallel with the tagmap processing. Now everything except the oopstorage-based clearing of dead entries is a single threaded serial task done by the VMThread, after all the parallel WeakProcessor work is done, because that's where the num-dead callbacks are invoked. WeakProcessor's parallel oopstorage processing doesn't have a way to do the num-dead callbacks by the last thread out of each parallel oopstorage processing. Instead it's left to the end, on the assumption that the callbacks are relatively cheap. But that could still be much worse than the old code, since the tagmap oopstorage could be late in the order of processing, and so still effectively be a serial subtask after all the parallel subtasks are done or mostly done. Yes, you are right that the processing will be done serially and not by a parallel worker thread. This is could spawn a new GC worker thread to process the posts, as you suggest. We could do that if we find a customer that has a complaint about the pause time of this processing. >> The JVMTI code expects the posting to be done quite eagerly presumably during GC, before it has a chance to disable the event or some other such operation. 
So the posting is done during the notification because it's as soon as possible. Deferring to the ServiceThread had two problems. 1. the event came later than the caller is expecting it, and in at least one test the event was disabled before posting, and 2. there's a comment in the code why we can't post events with a JavaThread. We'd have to transition into native while holding a no safepoint lock (or else deadlock). The point of making this change was so that the JVMTI table does not need GC code to serially process the table. > > I know of nothing that leads to "presumably during GC" being a requirement. Having all pending events of some type occur before that type of event is disabled seems like a reasonable requirement, but just means that event disabling also requires the table to be "up to date", in the sense that any GC-cleared entries need to be dealt with. That can be handled just like other operations that use the table contents, rather than during the GC. That is, use post_dead_object_on_vm_thread if there are or might be any pending dead objects, before disabling the event. Ok, so there were many test failures with other approaches. Having GC trigger the posting was the most reliable way to post the events when the tests (and presumably the jvmti customers) expected the events to be posted. We could revisit during event disabling if a customer complains about GC pause times. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Thu Nov 5 12:30:00 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 5 Nov 2020 12:30:00 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v5] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 08:56:54 GMT, Kim Barrett wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Code review comments from Kim and Albert. 
> > src/hotspot/share/prims/jvmtiTagMapTable.hpp line 36: > >> 34: class JvmtiTagMapEntryClosure; >> 35: >> 36: class JvmtiTagMapEntry : public HashtableEntry { > > By using utilities/hashtable this buys into having to use HashtableEntry, which includes the _hash member, even though that value is trivially computed from the key (since we're using address-based hashing here). This costs an additional 8 bytes (_LP64) per entry (a 25% increase) compared to the old JvmtiTagHashmapEntry. (I think it doesn't currently make a difference on !_LP64 because of poorly chosen layout in the old code, but fixing that would make the difference 33%). > > It seems like it should not have been hard to replace the oop _object member in the old code with a WeakHandle while otherwise maintaining the Entry interface, allowing much of the rest of the code to remain the same or similar and not incurring this additional space cost. Yes, there is 64/32 bits extra per hashtable entry with the standard hashtable implementation. It wouldn't have been hard to replace the oop object, but using shared code was a goal of this change. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Thu Nov 5 12:30:01 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 5 Nov 2020 12:30:01 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 09:27:39 GMT, Kim Barrett wrote: >> I didn't say it "doesn't work for shenandoah", I said it wouldn't have worked with the old shenandoah barriers without additional work, like adding calls to resolve. I understand the design intent of notifying the table management that its hash codes are out of date. And the num-dead callback isn't the right place, since there are num-dead callback invocations that aren't associated with hash code invalidation. 
(It's not a correctness wrong, it's a "these things are unrelated and this causes unnecessary work" wrong.) > > It used to be that jvmti tagmap processing was all-in-one (in GC weak reference processing, with the weak clearing, dead table entry removal, and rehashing all done in one pass. This change has split that up, with the weak clearing happening in a different place (still as part of the GC's weak reference processing) than the others (which I claim can now be part of the mutator, whether further separated or not). > > "Concurrent GC" has nothing to do with whether tagmaps need rehashing. Any copying collector needs to do so. A non-copying collector (whether concurrent or not) would not. (We don't have any purely non-copying collectors, but G1 concurrent oldgen collection is non-copying.) And weak reference clearing (whether concurrent or not) has nothing to do with whether objects have been moved and so the hashing has been invalidated. > > There's also a "well known" issue with address-based hashing and generational or similar collectors, where a simple boolean "objects have moved" flag can be problematic, and which tagmaps seem likely to be prone to. The old do_weak_oops tries to mitigate it by recognizing when the object didn't move. The new rehash function also doesn't move the objects either. It essentially does the same as the old weak_oops_do function. 
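For readers following the rehashing discussion, a minimal sketch of why an address-hashed table has to be rehashed after a copying collection may help. This is a toy model and not HotSpot code: `TagTable`, `Entry`, `object_moved` and the bucket layout are invented for illustration, and raw pointers stand in for weak handles. Only the idea of a `_needs_rehashing` flag that is checked before the next table operation comes from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: none of these types exist in HotSpot.
struct Entry {
  const void* obj;  // stands in for the weak reference to the tagged object
  long tag;
};

class TagTable {
  std::vector<std::vector<Entry>> _buckets;
  bool _needs_rehashing = false;

  // Address-based hash: the bucket depends on where the object lives.
  std::size_t index_for(const void* p) const {
    return (reinterpret_cast<std::uintptr_t>(p) >> 3) % _buckets.size();
  }

  // Re-insert every entry under the bucket for its current address.
  void rehash() {
    std::vector<std::vector<Entry>> fresh(_buckets.size());
    for (const auto& bucket : _buckets) {
      for (const Entry& e : bucket) {
        fresh[index_for(e.obj)].push_back(e);
      }
    }
    _buckets.swap(fresh);
    _needs_rehashing = false;
  }

 public:
  explicit TagTable(std::size_t bucket_count) : _buckets(bucket_count) {
    assert(bucket_count > 0);
  }

  void add(const void* obj, long tag) {
    if (_needs_rehashing) rehash();
    _buckets[index_for(obj)].push_back(Entry{obj, tag});
  }

  // Stand-in for a copying collector updating the reference in place.
  // The entry stays in its old bucket, so the table is only flagged here.
  void object_moved(const void* from, const void* to) {
    for (auto& bucket : _buckets) {
      for (Entry& e : bucket) {
        if (e.obj == from) e.obj = to;
      }
    }
    _needs_rehashing = true;
  }

  // Returns the tag, or 0 if the object is not tagged.
  long find(const void* obj) {
    if (_needs_rehashing) rehash();  // deferred until the table is used
    for (const Entry& e : _buckets[index_for(obj)]) {
      if (e.obj == obj) return e.tag;
    }
    return 0;
  }
};
```

In this model a collector that does not move objects never needs to set the flag, which is the distinction drawn above between "wrong" and "unnecessary work".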
------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Thu Nov 5 12:30:02 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 5 Nov 2020 12:30:02 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v3] In-Reply-To: References: <5UvpVT3pWDNSJ1Vh_WIy-E3ZjOS8O-8sZi-9ZRyYYQI=.d5d244cf-5e1b-4790-9370-66ae3a2ea76c@github.com> <0dSlRGfxnjP_6IlH5CEYdF48d3ymvjCKE_s4UZb_WNc=.789216de-6d78-4c00-bdd4-e09f1dec3924@github.com> <1qM8Skbob0uL_KwdoJNDTyavFxOH_VHJc5o6yF881zI=.604bc76e-0536-48a0-91d5-4ba85e32bc11@github.com> Message-ID: On Wed, 4 Nov 2020 07:42:36 GMT, Kim Barrett wrote: >>> Ok, so I'm not sure what to do with this: >>> >>> enum Phase { >>> // Serial phase. >>> JVMTI_ONLY(jvmti) >>> // Additional implicit phase values follow for oopstorages. >>> `};` >>> >>> I've removed the only thing in this enum. >> >> Enums without any named enumerators are still meaningful types. More so with scoped enums, but still with unscoped enums. > >> > Though it might be possible to go even further and eliminate WeakProcessorPhases as a thing separate from OopStorageSet. >> >> This makes sense. Can we file another RFE for this? I was sort of surprised by how much code was involved so I tried to find a place to stop deleting. > > I think the deletion stopped at the wrong place; it either went too far, or not far enough. I restored the empty enum for Phase and can open a new RFE if there is more code to remove. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From github.com+670087+jrziviani at openjdk.java.net Thu Nov 5 12:34:56 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Thu, 5 Nov 2020 12:34:56 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v6] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 10:50:10 GMT, Martin Doerr wrote: >> Ziviani has refreshed the contents of this pull request, and previous commits have been removed. 
The incremental views will show differences compared to the previous content of the PR. > > src/hotspot/cpu/ppc/ppc.ad line 11422: > >> 11420: >> 11421: // Manifest a CmpL3 result in an integer register. >> 11422: instruct cmpL3_reg_reg(iRegIdst dst, iRegLsrc src1, iRegLsrc src2) %{ > > I had forgotten one detail in my previous review. Sorry for that. We need to model the CR0 effect: > instruct cmpL3_reg_reg(iRegIdst dst, iRegLsrc src1, iRegLsrc src2, flagsRegCR0 cr0) %{ > match(Set dst (CmpL3 src1 src2)); > effect(KILL cr0); > (Same for other nodes.) Wow, thanks for catching it. But, let me make my naive question: why is it necessary? ------------- PR: https://git.openjdk.java.net/jdk/pull/907 From coleenp at openjdk.java.net Thu Nov 5 12:45:01 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 5 Nov 2020 12:45:01 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v6] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 22:09:21 GMT, Serguei Spitsyn wrote: >> Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add back WeakProcessorPhases::Phase enum. >> - Serguei 1. > > Thank you for the update, Coleen! > I leave it for you to decide to refactor the gc_notification or not. > Thanks, > Serguei Thanks @sspitsyn . I'm going to leave the gc_notification code because structurally the two sides of the if statement are different and it's not a long function. Thank you for reviewing the change. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From github.com+670087+jrziviani at openjdk.java.net Thu Nov 5 13:30:11 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Thu, 5 Nov 2020 13:30:11 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v7] In-Reply-To: References: Message-ID: > - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. > - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. 
> Ref: PowerISA 3.1, page 129. > > These instructions are particularly interesting to improve the following > pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in > `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. > > Long.toString, which generates such a pattern in getChars, has shown a > good performance gain by using these new instructions. > > Example: > for (int i = 0; i < 200_000; i++) > res = Long.toString((long)i); > > java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString > > Without setbc (average): 0.1178 seconds > With setbc (average): 0.0396 seconds Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. Ref: PowerISA 3.1, page 129. These instructions are particularly interesting to improve the following pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. Long.toString, which generates such a pattern in getChars, has shown a good performance gain by using these new instructions.
Example: for (int i = 0; i < 200_000; i++) res = Long.toString((long)i); java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString Without setbc (average): 0.1178 seconds With setbc (average): 0.0396 seconds ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/907/files - new: https://git.openjdk.java.net/jdk/pull/907/files/4e092e13..68081ca6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=907&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=907&range=05-06 Stats: 11 lines in 2 files changed: 4 ins; 2 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/907/head:pull/907 PR: https://git.openjdk.java.net/jdk/pull/907 From github.com+670087+jrziviani at openjdk.java.net Thu Nov 5 13:33:56 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Thu, 5 Nov 2020 13:33:56 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v6] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 10:52:40 GMT, Martin Doerr wrote: >> Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. > > src/hotspot/cpu/ppc/ppc.ad line 11766: > >> 11764: __ fcmpu(CCR0, $src1$$FloatRegister, $src2$$FloatRegister); >> 11765: __ cror(CCR0, Assembler::less, CCR0, Assembler::summary_overflow); >> 11766: __ set_cmp3($dst$$Register); > > Why not > set_cmpu3($dst$$Register, true); // C2 requires unordered to get treated like less > ? > (same for CmpD3) my bad, fixed! Thanks! 
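As a reading aid, the three-way compare that these nodes compute (-1, 0 or 1) can be sketched in plain C++. This is only an illustration of the semantics, not the PPC assembler code: the function names are invented here, and how a compiler lowers them is not something this sketch claims. The floating-point variant shows the "unordered is treated like less" rule mentioned in the review comment.

```cpp
#include <cassert>
#include <cmath>

// Branchy form of the three-way compare.
inline int cmp3_branchy(long a, long b) {
  return (a < b) ? -1 : ((a > b) ? 1 : 0);
}

// Branch-free form: each comparison yields 0 or 1, in the same spirit as
// setbc turning a single condition bit into a register value.
inline int cmp3_branchless(long a, long b) {
  return (a > b) - (a < b);
}

// Floating-point variant in which an unordered comparison (a NaN operand)
// is reported as "less", i.e. -1.
inline int cmp3_unordered_is_less(double a, double b) {
  // !(a >= b) is true when a < b and also when either operand is NaN.
  return (a > b) - !(a >= b);
}

// Small self-check that the two integer forms agree on a few samples.
inline void cmp3_self_check() {
  const long samples[] = {-5, 0, 7};
  for (long a : samples) {
    for (long b : samples) {
      assert(cmp3_branchy(a, b) == cmp3_branchless(a, b));
    }
  }
}
```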
------------- PR: https://git.openjdk.java.net/jdk/pull/907 From mdoerr at openjdk.java.net Thu Nov 5 14:06:57 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 5 Nov 2020 14:06:57 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v6] In-Reply-To: References: Message-ID: <9C7uAl7b_TLzZtADhBbeXlwNLT-W5cUvGEVM7uLADk8=.3a0e038c-2ed4-49ba-9a2e-8d4ed2441d26@github.com> On Thu, 5 Nov 2020 12:32:09 GMT, Ziviani wrote: >> src/hotspot/cpu/ppc/ppc.ad line 11422: >> >>> 11420: >>> 11421: // Manifest a CmpL3 result in an integer register. >>> 11422: instruct cmpL3_reg_reg(iRegIdst dst, iRegLsrc src1, iRegLsrc src2) %{ >> >> I had forgotten one detail in my previous review. Sorry for that. We need to model the CR0 effect: >> instruct cmpL3_reg_reg(iRegIdst dst, iRegLsrc src1, iRegLsrc src2, flagsRegCR0 cr0) %{ >> match(Set dst (CmpL3 src1 src2)); >> effect(KILL cr0); >> (Same for other nodes.) > > Wow, thanks for catching it. But, let me ask my naive question: why is it necessary? E.g. instruct testI_reg_imm sets cr0 as its result and branchConFar uses it. The kill cr0 effect disallows scheduling your nodes between them. ------------- PR: https://git.openjdk.java.net/jdk/pull/907 From eosterlund at openjdk.java.net Thu Nov 5 14:22:01 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 5 Nov 2020 14:22:01 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 21:14:04 GMT, Coleen Phillimore wrote: >> src/hotspot/share/prims/jvmtiTagMap.cpp line 3018: >> >>> 3016: } >>> 3017: // Later GC code will relocate the oops, so defer rehashing until then. >>> 3018: tag_map->_needs_rehashing = true; >> >> This is wrong for some collectors. I think all collectors ought to be calling set_needs_rehashing in appropriate places, and it can't be correctly piggybacked on the num-dead callback. (See discussion above for that function.)
>> >> For example, G1 remark pause does weak processing (including weak oopstorage) and will call the num-dead callback, but does not move objects, so does not require tagmap rehashing. >> >> (I think CMS oldgen remark may have been similar, for what that's worth.) > > Ok, so I'm going to need help to know where in all the different GCs to make this call. This seemed simpler at the expense of maybe causing a rehash at some points when it might not be necessary. For what GC is this wrong? I can see that it might yield more work than required, when performing a full GC, but not that it would do too little work. In other words, I can't see how it is wrong, as opposed to inaccurate. Littering GCs with JVMTI hooks so that we can optimize away an operation we do every young GC, from a full GC, does not really seem worth it IMO. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From eosterlund at openjdk.java.net Thu Nov 5 14:40:02 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 5 Nov 2020 14:40:02 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: <172AiTMoD9T5iKu5xEVQ3AEixTDRx4gaAx0pQFfR57k=.d1d7648c-fb6f-4e87-b515-295c4e6187f7@github.com> Message-ID: On Wed, 4 Nov 2020 13:32:07 GMT, Coleen Phillimore wrote: >> I know of nothing that leads to "presumably during GC" being a requirement. Having all pending events of some type occur before that type of event is disabled seems like a reasonable requirement, but just means that event disabling also requires the table to be "up to date", in the sense that any GC-cleared entries need to be dealt with. That can be handled just like other operations that use the table contents, rather than during the GC. That is, use post_dead_object_on_vm_thread if there are or might be any pending dead objects, before disabling the event. > > Ok, so there were many test failures with other approaches. 
Having GC trigger the posting was the most reliable way to post the events when the tests (and presumably the jvmti customers) expected the events to be posted. We could revisit during event disabling if a customer complains about GC pause times. The point of this change was not necessarily to be lazy about updating the tagmap, until someone uses it. The point was to get rid of the last annoying serial GC phase. Doing it all lazily would certainly also achieve that. But it would also lead to situations where no event is ever posted from GC to GC. So you would get the event 20 GCs later, which might come as a surprise. It did come as a surprise to some tests, so it is reasonable to assume it would come as a surprise to users too. And I don't think we want such surprises unless we couldn't deal with them. And we can. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From eosterlund at openjdk.java.net Thu Nov 5 14:53:01 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 5 Nov 2020 14:53:01 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: <172AiTMoD9T5iKu5xEVQ3AEixTDRx4gaAx0pQFfR57k=.d1d7648c-fb6f-4e87-b515-295c4e6187f7@github.com> Message-ID: On Wed, 4 Nov 2020 13:22:57 GMT, Coleen Phillimore wrote: >> For the GCs that call the num_dead notification in a pause it is much worse than what we had. As I pointed out elsewhere, it used to be that tagmap processing was all-in-one, as a single serial subtask taken by the first thread that reached it in WeakProcessor processing. Other threads would find that subtask taken and move on to processing oopstores in parallel with the tagmap processing. Now everything except the oopstorage-based clearing of dead entries is a single threaded serial task done by the VMThread, after all the parallel WeakProcessor work is done, because that's where the num-dead callbacks are invoked. 
WeakProcessor's parallel oopstorage processing doesn't have a way to do the num-dead callbacks by the last thread out of each parallel oopstorage processing. Instead it's left to the end, on the assumption that the callbacks are relatively cheap. But that could still be much worse than the old code, since the tagmap oopstorage could be late in the order of processing, and so still effectively be a serial subtask after all the parallel subtasks are done or mostly done. > > Yes, you are right that the processing will be done serially and not by a parallel worker thread. This could spawn a new GC worker thread to process the posts, as you suggest. We could do that if we find a customer that has a complaint about the pause time of this processing. So both before and now, this task is a single-threaded task. The difference is that, before, that single-threaded task could be performed in parallel with other tasks. So if the table is small, you probably won't be able to notice any difference, as a small table implies not much to do. And if the table is large, you still probably won't be able to notice any difference, as a large table implies it will dominate the pause with both the old and new approach. Any difference at all is bounded at 2x processing time, as it was serial both before and after. But now if we have a perfectly medium balanced table, we can at the very worst observe a theoretical 2x worse processing of this JVMTI table. I think that if we truly did care about this difference, and it were important to keep this code performing as well as possible, then we would not have a serial phase for this at all. The fact that this has been serial suggests to me that it is not a path that is critical, and therefore I don't think optimizing the theoretical max 2x worse processing times for perfectly medium sized JVMTI tag map tables is worth the hassle. At least I can't see why this would be of any importance.
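The 2x bound can be made concrete with a deliberately simplified model. The assumptions are mine, not measurements: `other` is the wall time of the remaining parallel weak processing, `tagmap` is the wall time of the serial tag map task, the old scheme overlaps the two perfectly and the new scheme runs them back to back. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model, not HotSpot code.
// Old scheme: the serial tag map task overlaps with the other work.
inline double old_pause(double other, double tagmap) {
  return std::max(other, tagmap);
}

// New scheme: the tag map task runs after the other work has finished.
inline double new_pause(double other, double tagmap) {
  return other + tagmap;
}

// Ratio of new to old pause time. Since other + tagmap can never exceed
// twice the larger of the two, the ratio is at most 2.
inline double slowdown(double other, double tagmap) {
  assert(other >= 0 && tagmap >= 0 && (other > 0 || tagmap > 0));
  return new_pause(other, tagmap) / old_pause(other, tagmap);
}
```

Under these assumptions the worst case is reached only when both durations are equal (the "perfectly medium" table); when either side dominates, the ratio approaches 1.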
------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Thu Nov 5 15:07:01 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 5 Nov 2020 15:07:01 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v5] In-Reply-To: References: Message-ID: <_w3Wb6lCIkBeH6UxIGCZ-HoXgwx-qSYF6KpLY1txy6Y=.eeeb53b9-6b1f-41cb-b59b-109b714e8eed@github.com> On Wed, 4 Nov 2020 13:19:21 GMT, Coleen Phillimore wrote: >> src/hotspot/share/prims/jvmtiTagMapTable.hpp line 36: >> >>> 34: class JvmtiTagMapEntryClosure; >>> 35: >>> 36: class JvmtiTagMapEntry : public HashtableEntry { >> >> By using utilities/hashtable this buys into having to use HashtableEntry, which includes the _hash member, even though that value is trivially computed from the key (since we're using address-based hashing here). This costs an additional 8 bytes (_LP64) per entry (a 25% increase) compared to the old JvmtiTagHashmapEntry. (I think it doesn't currently make a difference on !_LP64 because of poorly chosen layout in the old code, but fixing that would make the difference 33%). >> >> It seems like it should not have been hard to replace the oop _object member in the old code with a WeakHandle while otherwise maintaining the Entry interface, allowing much of the rest of the code to remain the same or similar and not incurring this additional space cost. > > Yes, there is 64/32 bits extra per hashtable entry with the standard hashtable implementation. It wouldn't have been hard to replace the oop object, but using shared code was a goal of this change. So looking at the concurrent hashtable, the entries can be created without saving the hashcode. I was going to use that at first but didn't want to cut/paste the boilerplate to do so and the jvmti tag map hashtable is always accessed with a lock. This could be a future RFE if necessary and would also serve to eliminate another ad-hoc hashtable. 
------------- PR: https://git.openjdk.java.net/jdk/pull/967 From adinn at openjdk.java.net Thu Nov 5 15:25:54 2020 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 5 Nov 2020 15:25:54 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: <055zgcELaav78pPYT17FjL4QbKy1OXK3gEeUkTkkv5k=.21c42be4-e78f-4ee9-9c71-40893fd1d9d5@github.com> Message-ID: On Thu, 5 Nov 2020 10:40:33 GMT, Anton Kozlov wrote: > > At present reading /proc/cpuinfo that is the only reliable way I know of to identify whether dcpop is a supported feature (used to force persistence of data to memory). That is needed to support use of NVRAM-backed MappedByteBuffers. > > I think that comes from `getauxval() & HWCAP_DCPOP`? That assumes that HWCAP_DCPOP is defined on all the Linux/AArch64 releases we still need to build on. I am not sure if that is actually the case. ------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From gziemski at openjdk.java.net Thu Nov 5 16:19:01 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Thu, 5 Nov 2020 16:19:01 GMT Subject: Integrated: 8250637: UseOSErrorReporting times out (on Mac and Linux) In-Reply-To: References: Message-ID: <9x-kcouwQPbp2IX6oHgqkI5ekSXzilrpQ98jtDjJwI0=.0a903283-da46-444a-810e-9987d4c795a7@github.com> On Thu, 22 Oct 2020 16:40:43 GMT, Gerard Ziemski wrote: > hi all, > > Please review this simple fix for POSIX platforms, which addresses a time out that occurs while handling a crash with UseOSErrorReporting turned ON. > > It appears that "UseOSErrorReporting" flag was only ever meant to be used on Windows platform and was mistakenly left available for other platforms. In this fix we make sure to only use the flag on Windows platform and make it a NOP for other platforms. 
> > Note #1: A similar hang issue occurs today even on Windows, with the only difference being that before a process times out (takes 2 minutes) it runs out of stack space in about 250 loops, so that's the only reason it doesn't linger for that long. The Windows issue is tracked separately by https://bugs.openjdk.java.net/browse/JDK-8250782 > > Note #2: Creating a native crash log (on macOS) is a non-trivial, research-wise effort that is tracked by https://bugs.openjdk.java.net/browse/JDK-8237727 > > Note #3: Removal of the "UseOSErrorReporting" flag will depend on whether we can do #2 and at that time we can decide whether to keep it and implement it for other platforms or whether to remove it, provided that #2 cannot be done reliably. This pull request has now been integrated. Changeset: ba2ff3a6 Author: Gerard Ziemski URL: https://git.openjdk.java.net/jdk/commit/ba2ff3a6 Stats: 23 lines in 7 files changed: 5 ins; 8 del; 10 mod 8250637: UseOSErrorReporting times out (on Mac and Linux) Reviewed-by: stuefe, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/813 From eosterlund at openjdk.java.net Thu Nov 5 16:21:56 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 5 Nov 2020 16:21:56 GMT Subject: Integrated: 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 12:44:58 GMT, Erik Österlund wrote: > The imasm::remove_activation() call does not deal with safepoints very well. However, when the MethodExit JVMTI event is being called, we call into the runtime in the middle of remove_activation(). If the value being returned is an object type, then the top-of-stack contains the oop. However, the GC does not traverse said oop in any oop map, because it is simply not expected that we safepoint in the middle of remove_activation().
> > The JvmtiExport::post_method_exit() function we end up calling, reads the top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks, that eventually call Java and a bunch of stuff that safepoints. So after the JVMTI callback, we can expect the top-of-stack oop to be broken. Unfortunately, when we continue, we therefore end up returning a broken oop. > > Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY, is wrong, as we can safepoint on the way back to Java, which will break the return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY, moving the transition to VM and back, into a block of code that is protected against GC. Before the JRT_BLOCK is called, we stash away the return oop, and after the JRT_BLOCK_END, we restore the top-of-stack oop. In the path when InterpreterRuntime::post_method_exit is called when throwing an exception, we don't have the same problem of retaining an oop result, and hence the JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the logic is the same as before for this path. > > This is a JVMTI bug that has probably been around for a long time. It crashes with all GCs, but was discovered recently after concurrent stack processing, as StefanK has been running better GC stressing code in JVMTI, and the bug reproduced more easily with concurrent stack processing, as the timings were a bit different. The following reproducer failed pretty much 100% of the time: > while true; do make test JTREG="RETAIN=all" TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001 -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done > > With my fix I can run this repeatedly without any more failures. I have also sanity checked the patch by running tier 1-5, so that it does not introduces any new issues on its own. 
I have also used Stefan's nice external GC stressing with jcmd technique that was used to trigger crashes with other GCs, to make sure said crashes no longer reproduce either. This pull request has now been integrated. Changeset: 3a02578b Author: Erik Österlund URL: https://git.openjdk.java.net/jdk/commit/3a02578b Stats: 58 lines in 3 files changed: 42 ins; 12 del; 4 mod 8255452: Doing GC during JVMTI MethodExit event posting breaks return oop Reviewed-by: coleenp, dlong, rrich, sspitsyn ------------- PR: https://git.openjdk.java.net/jdk/pull/930 From akozlov at openjdk.java.net Thu Nov 5 16:54:57 2020 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Thu, 5 Nov 2020 16:54:57 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: <055zgcELaav78pPYT17FjL4QbKy1OXK3gEeUkTkkv5k=.21c42be4-e78f-4ee9-9c71-40893fd1d9d5@github.com> Message-ID: On Thu, 5 Nov 2020 15:23:24 GMT, Andrew Dinn wrote: > > > At present reading /proc/cpuinfo that is the only reliable way I know of to identify whether dcpop is a supported feature (used to force persistence of data to memory). That is needed to support use of NVRAM-backed MappedByteBuffers. > > > > > > I think that comes from `getauxval() & HWCAP_DCPOP`? > > That assumes that HWCAP_DCPOP is defined on all the Linux/AArch64 releases we still need to build on. I am not sure if that is actually the case. There is an established practice to define a value of HWCAP if one is not provided in the system headers https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_aarch64/vm_version_linux_aarch64.cpp#L58, so it should be fine in any case.
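For illustration, the fallback-define practice mentioned above might look roughly like the sketch below. This is not the HotSpot source: the helper names `hwcap_has_dcpop` and `cpu_has_dcpop` are made up for this example, and the bit position (16) is assumed from the Linux arm64 hwcap headers.

```cpp
#include <cassert>
#if defined(__linux__)
#include <sys/auxv.h>   // getauxval(), AT_HWCAP
#endif

// Fallback for system headers that predate the DCPOP capability bit.
// Assumption: bit 16, as in the Linux arm64 hwcap headers.
#ifndef HWCAP_DCPOP
#define HWCAP_DCPOP (1UL << 16)
#endif

// Hypothetical helper: tests the DCPOP bit in an AT_HWCAP word.
inline bool hwcap_has_dcpop(unsigned long hwcap) {
  return (hwcap & HWCAP_DCPOP) != 0;
}

// Hypothetical helper: asks the kernel for the capability word.
// Only meaningful on Linux/AArch64; reports "unsupported" elsewhere.
inline bool cpu_has_dcpop() {
#if defined(__linux__) && defined(__aarch64__)
  return hwcap_has_dcpop(getauxval(AT_HWCAP));
#else
  return false;
#endif
}
```

Splitting the bit test from the `getauxval()` call keeps the capability check usable on a build host whose headers or kernel do not know about the bit.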
------------- PR: https://git.openjdk.java.net/jdk/pull/1039 From dcubed at openjdk.java.net Thu Nov 5 16:59:09 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 5 Nov 2020 16:59:09 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM Message-ID: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Changes from @fisk and @dcubed-ojdk to: - simplify ObjectMonitor list management - get rid of Type-Stable Memory (TSM) This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) - a few minor regressions (<= -0.24%) - Volano is 6.8% better Eric C. has also been running promotion perf runs on these bits and says "the results look fine". ------------- Commit messages: - 8253064.v00.part2 - 8253064.v00.part1 Changes: https://git.openjdk.java.net/jdk/pull/642/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=642&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253064 Stats: 2510 lines in 25 files changed: 604 ins; 1721 del; 185 mod Patch: https://git.openjdk.java.net/jdk/pull/642.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/642/head:pull/642 PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Thu Nov 5 16:59:09 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 5 Nov 2020 16:59:09 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM In-Reply-To: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: On Tue, 13 Oct 2020 20:31:44 GMT, Daniel D.
Daugherty wrote: > Changes from @fisk and @dcubed-ojdk to: > > - simplify ObjectMonitor list management > - get rid of Type-Stable Memory (TSM) > > This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. > Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, > SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) > - a few minor regressions (<= -0.24%) > - Volano is 6.8% better > > Eric C. has also running promotion perf runs on these bits and says "the results look fine". Self review done. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Thu Nov 5 16:59:10 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 5 Nov 2020 16:59:10 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: <6eZv40x8Nr--K-r6YCSihm0z_vX_KkXOIskxyYNgo70=.dbc7a86a-46a0-42e7-b9ae-da8802113105@github.com> On Mon, 2 Nov 2020 22:13:30 GMT, Daniel D. Daugherty wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also running promotion perf runs on these bits and says "the results look fine". > > Self review done. ### Gory details about these changes from @fisk and @dcubed-ojdk: ### Simplify `ObjectMonitor` List Management: - delete per-thread in-use and free-lists. 
- delete global free-list and global wait-list; there is still a global in-use list; after `ObjectMonitor`s on the global in-use list are deflated, they are unlinked and added to a function local `GrowableArray` called `delete_list`; we do a handshake/safepoint with all JavaThreads and that makes all the `ObjectMonitor`s on the `delete_list` safe for deletion; lastly, we delete all the `ObjectMonitor`s on `delete_list`. - move async deflation work from the `ServiceThread` to a dedicated `MonitorDeflationThread`; this prevents `ObjectMonitor` inflation storms from delaying the work done by the `ServiceThread` for other subsystems; this means the `ServiceThread` no longer wakes up every `GuaranteedSafepointInterval` to check for work. - the `AllocationState` enum is dropped along with the `_allocation_state` field and associated getters and setters; the simpler list management no longer requires the allocation state to be tracked. - the safepoint cleanup phase no longer requests async monitor deflation; there is no longer a safepoint cleanup task for monitor deflation, but there is still an auditing/logging hook for debugging purposes. - delete ObjectSynchronizer functions associated with more complicated list management: `deflate_global_idle_monitors()`, `deflate_per_thread_idle_monitors()`, `deflate_common_idle_monitors()`, `om_flush()`, `prepend_list_to_common()`, `prepend_list_to_global_free_list()`, `prepend_list_to_global_wait_list()`, `prepend_list_to_global_in_use_list()`, `prepend_to_common()`, `prepend_to_om_free_list()`, `prepend_to_om_in_use_list()`, `take_from_start_of_common()`, `take_from_start_of_global_free_list()`, `take_from_start_of_om_free_list()` - delete the spin-lock functions needed by the more complicated list management. - delete a number of audit/debug/logging related functions needed by the more complicated list management.
- restore the barrier related code that needed relocation due to om_flush()'s access of the weak obj reference; now that om_flush() is gone, the barrier related code can go back to its more natural place. ### Get Rid of Type-Stable Memory (TSM): - `ObjectMonitor` now subclasses `CHeapObj`. - the `ObjectMonitor` constructor and destructor are now more normal C++! - delete `ObjectMonitor` functions associated with TSM: `clear()`, `clear_common()`, `object_addr()`, `Recycle()`, and `set_object()`. - delete the version of `set_owner_from()` that supports two possible old values since it is no longer needed; we are not recycling deflated `ObjectMonitor`s anymore so there's no longer a possibility of a `NULL` `_owner` value or a `DEFLATER_MARKER` value on the same code path. - delete ObjectSynchronizer functions associated with TSM: `om_alloc()`, `om_release()`, `prepend_block_to_lists()` - simplify ObjectSynchronizer functions related to TSM: `deflate_idle_monitors()`, `deflate_monitor_list()`, `inflate()` ### Change A Displaced Header is Always at Offset 0 - Change `markWord::displaced_mark_helper()` and `markWord::set_displaced_mark_helper()` to no longer assume that the displaced header in a `BasicLock` or `ObjectMonitor` is at offset 0. - ObjectMonitor::header_addr() no longer requires the offset to be zero. ### New Diagnostic Options - `AvgMonitorsPerThreadEstimate` - Used to estimate a variable ceiling based on number of threads for use with `MonitorUsedDeflationThreshold`; default is 1024, 0 is off, range is 0..max_jint. The current count of inflated `ObjectMonitor`s and the ceiling are used to determine whether the in-use ratio is higher than `MonitorUsedDeflationThreshold` (default 90). - `MonitorDeflationMax` - The maximum number of `ObjectMonitor`s to deflate, unlink and delete at one time; default is 1 million; range is 1024..max_jint.
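To make the interaction of the two threshold options concrete, here is a simplified sketch of the in-use ratio check. It is an approximation for discussion, not the code in the patch: `DeflationFlags`, `in_use_ceiling()` and `above_deflation_threshold()` are hypothetical names, and treating an estimate of 0 as "never trigger" is an assumption about what "0 is off" means.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the two flags, with the defaults quoted above.
struct DeflationFlags {
  std::size_t avg_monitors_per_thread_estimate = 1024;  // AvgMonitorsPerThreadEstimate
  int monitor_used_deflation_threshold = 90;            // MonitorUsedDeflationThreshold, in percent
};

// The ceiling scales with the number of threads; an estimate of 0 yields 0.
inline std::size_t in_use_ceiling(const DeflationFlags& flags, std::size_t thread_count) {
  return flags.avg_monitors_per_thread_estimate * thread_count;
}

// Returns true when the in-use percentage exceeds the threshold.
// Assumption: a zero ceiling or a non-positive threshold disables the check.
inline bool above_deflation_threshold(const DeflationFlags& flags,
                                      std::size_t in_use_count,
                                      std::size_t thread_count) {
  const std::size_t ceiling = in_use_ceiling(flags, thread_count);
  if (ceiling == 0 || flags.monitor_used_deflation_threshold <= 0) {
    return false;
  }
  const std::size_t usage_percent = (in_use_count * 100) / ceiling;  // integer percent
  return usage_percent > static_cast<std::size_t>(flags.monitor_used_deflation_threshold);
}
```

With the defaults and 4 threads the ceiling is 4096, so under these assumptions deflation would be requested once the inflated count pushes the integer percentage past 90.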
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Thu Nov 5 16:59:10 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 5 Nov 2020 16:59:10 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM In-Reply-To: <6eZv40x8Nr--K-r6YCSihm0z_vX_KkXOIskxyYNgo70=.dbc7a86a-46a0-42e7-b9ae-da8802113105@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <6eZv40x8Nr--K-r6YCSihm0z_vX_KkXOIskxyYNgo70=.dbc7a86a-46a0-42e7-b9ae-da8802113105@github.com> Message-ID: On Mon, 2 Nov 2020 22:13:41 GMT, Daniel D. Daugherty wrote: >> Self review done. > > ### Gory details about these changes from @fisk and @dcubed-ojdk: > > ### Simplify `ObjectMonitor` List Management: > > - delete per-thread in-use and free-lists. > - delete global free-list and global wait-list; there is a still a global in-use list; after `ObjectMonitor`s on the global in-use list are deflated, they are unlinked and added to a function local `GrowableArray` called `delete_list`; we do a handshake/safepoint with all JavaThreads and that makes all the `ObjectMonitor`s on the `delete_list` safe for deletion; lastly, we delete all the `ObjectMonitor`s on `delete_list`. > - move async deflation work from the `ServiceThread` to a dedicated `MonitorDeflationThread`; this prevents `ObjectMonitor` inflation storms from delaying the work done by the `ServiceThread` for other subsystems; this means the `ServiceThread` no longer wakes up every `GuaranteedSafepointInterval` to check for work. > - the `AllocationState` enum is dropped along with the `_allocation_state` field and associated getters and setters; the simpler list management no longer requires the allocation state to be tracked. 
> - the safepoint cleanup phase no longer requests async monitor deflation; there is no longer a safepoint cleanup task for monitor deflation, but there is still an auditing/logging hook for debugging purposes. > - delete ObjectSynchronizer functions associated with more complicated list management: `deflate_global_idle_monitors()`, `deflate_per_thread_idle_monitors()`, `deflate_common_idle_monitors()`, `om_flush()`, `prepend_list_to_common()`, `prepend_list_to_global_free_list()`, `prepend_list_to_global_wait_list()`, `prepend_list_to_global_in_use_list()`, `prepend_to_common()`, `prepend_to_om_free_list()`, `prepend_to_om_in_use_list()`, `take_from_start_of_common()`, `take_from_start_of_global_free_list()`, `take_from_start_of_om_free_list()` > - delete the spin-lock functions needed by the more complicated list management. > - delete a number of audit/debug/logging related functions needed by the more complicated list management. > - restore the barrier related code that needed relocation due to om_flush()'s access of the weak obj reference; now that om_flush() is gone, the barrier related code can go back to its more natural place. > > ### Get Rid of Type-Stable Memory (TSM): > > - `ObjectMonitor` now subclasses `CHeapObj`. > - the `ObjectMonitor` constructor and destructor are now more normal C++! > - delete `ObjectMonitor` functions associated with TSM: `clear()`, `clear_common()`, `object_addr()`, `Recycle()`, and `set_object()`. > - delete the version of `set_owner_from()` that support two possible old values since it is no longer needed; we are not recycling deflated `ObjectMonitor`s anymore so there's no longer a possibility of a `NULL` `_owner` value or a `DEFLATER_MARKER` value on the same code path. 
> - delete ObjectSynchronizer functions associated with TSM: `om_alloc()`, `om_release()`, `prepend_block_to_lists()` > - simplify ObjectSynchronizer functions related to TSM: `deflate_idle_monitors()`, `deflate_monitor_list()`, `inflate()` > > ### Change A Displaced Header is Always at Offset 0 > > - Change `markWord::displaced_mark_helper()` and `markWord::set_displaced_mark_helper()` to no longer assume that the displaced header in a `BasicLock` or `ObjectMonitor` is at offset 0. > - ObjectMonitor::header_addr() no longer requires the offset to be zero. > > ### New Diagnostic Options > > - `AvgMonitorsPerThreadEstimate` - Used to estimate a variable ceiling based on number of threads for use with `MonitorUsedDeflationThreshold`; default is 1024, 0 is off, range is 0..max_jint. The current count of inflated `ObjectMonitor`s and the ceiling are used to determine whether the in-use ratio is higher than `MonitorUsedDeflationThreshold` (default 90). > - `MonitorDeflationMax` - The maximum number of `ObjectMonitor`s to deflate, unlink and delete at one time; default is 1 million; range is 1024..max_jint. Rebased the project to jdk-16+23. Local macOS and Linux-X64 builds and KitchensinkSanity runs pass. Kicking off a new round of Mach5 testing... @coleenp, @dholmes-ora, @fisk, and @robehn - as usual for ObjectMonitor stuff, your reviews would be greatly appreciated. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From mcimadamore at openjdk.java.net Thu Nov 5 17:14:16 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Thu, 5 Nov 2020 17:14:16 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v22] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the third incubation round of the foreign memory access API incubation (see JEP 393 [1]). 
This iteration focuses on improving the usability of the API in 3 main ways: > > * first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from multiple threads > * second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee that the memory will be deallocated, eventually > * third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class has been added, which defines several useful dereference routines; these are really just thin wrappers around memory access var handles, but they make the barrier of entry for using this API somewhat lower. > > A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit of dereference. > > This has all changed as per this API refresh; now a `MemoryAddress` is just a dumb carrier which wraps a pair of object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is concerned. You cannot dereference memory if you don't have a segment. This improves usability in a number of ways - first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that; otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling `MemoryAddress::asSegmentRestricted`). > > A list of the API, implementation and test changes is provided below.
If you have any questions, or need more detailed explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide the feedback required. > > A big thank to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segment would not have been possible; also I'd like to thank Paul Sandoz, whose insights on API design have been very helpful in this journey. > > Thanks > Maurizio > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254163 > > > > ### API Changes > > * `MemorySegment` > * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below) > * added a no-arg factory for a native restricted segment representing entire native heap > * rename `withOwnerThread` to `handoff` > * add new `share` method, to create shared segments > * add new `registerCleaner` method, to register a segment against a cleaner > * add more helpers to create arrays from a segment e.g. `toIntArray` > * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors) > * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`) > * `MemoryAddress` > * drop `segment` accessor > * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative to a given segment > * `MemoryAccess` > * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a carrier is one of the usual suspect (a Java primitive, minus `boolean`); the access mode can be simple (e.g. 
access base address of given segment), or indexed, in which case the accessor takes a segment and either a low-level byte offset, or a high level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs. `getByteAtOffset` vs `getByteAtIndex`). > * `MemoryHandles` > * drop `withOffset` combinator > * drop `withStride` combinator > * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which it is easy to derive all the other handles using plain var handle combinators. > * `Addressable` > * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2] to add more implementations. Clients can largely ignore this interface, which comes in really handy when defining native bindings with tools like `jextract`. > * `MemoryLayouts` > * A new layout, for machine addresses, has been added to the mix. > > > > ### Implementation changes > > There are two main things to discuss here: support for shared segments, and the general simplification of the memory access var handle support. > > #### Shared segments > > The support for shared segments cuts in pretty deep in the VM. Support for shared segments is notoriously hard to achieve, at least in a way that guarantees optimal access performances. This is caused by the fact that, if a segment is shared, it would be possible for a thread to close it while another is accessing it. > > After considering several options (see [3]), we zeroed onto an approach which is inspired by a happy idea that Andrew Haley had (and that he reminded me of at this year's OpenJDK committer workshop - thanks!). The idea is that if we could *freeze* the world (e.g. with a GC pause), while a segment is closed, we could then prevent segments from being accessed concurrently to a close operation.
For this to work, it is crucial that no GC safepoints can occur between a segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop just right before an unsafe call). It also relies on the fact that hotspot/C2 should not be able to propagate loads across safepoints. > > Sadly, none of these conditions seems to be valid in the current implementation, so we needed to resort to a bit of creativity. First, we noted that, if we could mark so-called *scoped* methods with an annotation, it would be very simple to check whether a thread was in the middle of a scoped method when we stopped the world for a close operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312 [4]). > > The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread, thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds. Unfortunately the machinery for async exceptions is a bit fragile (e.g. not all the code in hotspot checks for async exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that another thread is accessing the segment being closed. > > As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a bug in the user code, likely arising from lack of synchronization, and should be treated as such. > > In terms of gritty implementation, we needed to centralize memory access routines in a single place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in addition, also provided a liveness check.
This way we could mark all these routines with the special `@Scoped` annotation, which tells the VM that something important is going on. > > To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`. This class contains all the main memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the access coordinates, also a scope object, which is tested before access. A reachability fence is also thrown in the mix to make sure that the scope is kept alive during access (which is important when registering segments against cleaners). > > Of course, to make memory access safe, memory access var handles, byte buffer var handles, and byte buffer API should use the new `ScopedMemoryAccess` class instead of unsafe, so that a liveness check can be triggered (in case a scope is present). > > `ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the handshake completed successfully. > > The implementation of `MemoryScope` (now significantly simplified from what we had before), has two implementations, one for confined segments and one for shared segments; the main difference between the two is what happens when the scope is closed; a confined segment sets a boolean flag to false, and returns, whereas a shared segment goes into a `CLOSING` state, then starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive` will still return `true`, while the liveness check upon memory access will fail. > > #### Memory access var handles overhaul > > The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and `long`, all other access types could be derived from this basic var handle form. 
> > This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g. additional offset is injected into a base memory access var handle. > > This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators, which previously needed low-level access on the innards of the memory access var handle. All that code is now gone. > > #### Test changes > > Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the microbenchmarks also needed some tweaks - and some of them were also updated to also test performance in the shared segment case. > > [1] - https://openjdk.java.net/jeps/393 > [2] - https://openjdk.java.net/jeps/389 > [3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html > [4] - https://openjdk.java.net/jeps/312 Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: - Fix post-merge issues caused by 8219014 - Merge branch 'master' into 8254162 - Addess remaining feedback from @AlanBateman and @mrserb - Address comments from @AlanBateman - Merge branch 'master' into 8254162 - Fix issues with derived buffers and IO operations - More 32-bit fixes for TestLayouts - * Add final to MappedByteBuffer::SCOPED_MEMORY_ACCESS field * Tweak TestLayouts to make it 32-bit friendly after recent MemoryLayouts tweaks - Remove TestMismatch from 32-bit problem list - Merge branch 'master' into 8254162 - ... 
and 19 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...02f9e251 ------------- Changes: https://git.openjdk.java.net/jdk/pull/548/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=21 Stats: 7608 lines in 80 files changed: 4859 ins; 1545 del; 1204 mod Patch: https://git.openjdk.java.net/jdk/pull/548.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/548/head:pull/548 PR: https://git.openjdk.java.net/jdk/pull/548 From gziemski at openjdk.java.net Thu Nov 5 17:42:01 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Thu, 5 Nov 2020 17:42:01 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v3] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 15:22:11 GMT, Thomas Stuefe wrote: >> Hi all, >> >> may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers. >> >> Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues. >> >> This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code. >> >> This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324. >> >> ---- >> >> This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (beside the documented one of using libjsig). It seems that the JVM_handle_xxx_functions() are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag is used quite a bit. >> >> See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history. >> >> Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. 
So I try to preserve this interface as best as I can. I plan to add a proper regression test with a later change, but for now I don't have the time for that. >> >> --- >> >> The fixed issues: >> >> 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed. >> >> Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early. >> But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed. >> >> 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls). >> >> 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported. >> >> 4) Every platform handler has this section: >> >> JavaThread* thread = NULL; >> VMThread* vmthread = NULL; >> if (PosixSignals::are_signal_handlers_installed()) { >> if (t != NULL ){ >> if(t->is_Java_thread()) { >> thread = t->as_Java_thread(); >> } >> else if(t->is_VM_thread()){ >> vmthread = (VMThread *)t; >> } >> } >> } >> >> `vmthread` is unused on all platforms and can be removed. 
>> >> 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM): >> >> if (sig == SIGPIPE || sig == SIGXFSZ) { >> // allow chained handler to go first >> if (PosixSignals::chained_handler(sig, info, ucVoid)) { >> return true; >> } else { >> // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219 >> return true; >> } >> } >> >> - On s390 and ppc, we miss SIGXFSZ handling >> _Update: Fixed separately for easier backport, see [JDK-8255734](https://bugs.openjdk.java.net/browse/JDK-8255734)_. >> - both paths return true - section can be shortened >> >> Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler. >> >> 6) At the end of every platform handler, before calling into fatal error handling, we unblock the signal: >> >>> // unmask current signal >>> sigset_t newset; >>> sigemptyset(&newset); >>> sigaddset(&newset, sig); >>> sigprocmask(SIG_UNBLOCK, &newset, NULL); >>> >> >> - Use of `sigprocmask()` is UB in a multithreaded program. >> - but then, this section is unnecessary anyway, since [JDK-8252533](https://bugs.openjdk.java.net/browse/JDK-8252533) we unmask error signals at the start of the signal handler. >> >> 7) the JFR crash protection is not consistently checked in all platform handlers. >> >> 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism).
>> >> 9) on Linux ppc64 and AIX, we have this section: >> >>> if (sig == SIGILL && (pc < (address) 0x200)) { >>> goto report_and_die; >>> } >> >> which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide). >> >> This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway. >> >> 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`. >> >> On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice. >> >> ---- >> >> The changes in this patch: >> >> a) hotspot signal handling is now done by the following functions: >> >> >> | | >> v v >> javaSignalHandler JVM_handle_linux_signal() >> | / >> v v >> javaSignalHandler_inner >> | >> v >> PosixSignals::pd_hotspot_signal_handler() >> >> >> The right branch only exists to support the `AllowUserSignalHandlers` case, see [4]. >> >> `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before. >> `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions. >> >> >> b) I commonized prologue- and epilogue coding. >> - I simplified (4) to a single line in the shared handler >> - I moved the JFR thread crash protection (7) up to the shared handler >> - I moved the complete epilogue up to the shared handler. 
That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6) >> - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept. >> - I simplified (5) and commonized it, and removed (9) completely >> - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out. >> >> Thanks for reviewing. >> >> Testing: this patch ran through our nightlies, in an earlier form. They will be re-ran some more times. >> >> I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change. >> >> Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned. See also Gerard's work on [5], which is under review too. >> >> ---- >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html >> [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html >> [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html >> [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html >> [5] https://bugs.openjdk.java.net/browse/JDK-8253742 > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Feedback David Changes requested by gziemski (Committer). src/hotspot/os/posix/signals_posix.cpp line 587: > 585: // Call platform dependent signal handler. > 586: if (!signal_was_handled) { > 587: signal_was_handled = PosixSignals::pd_hotspot_signal_handler(sig, info, uc, jt); Can we move: JavaThread* const jt = (t != NULL && t->is_Java_thread()) ? (JavaThread*) t : NULL; here? It's not used anywhere else. 
src/hotspot/os/posix/signals_posix.cpp line 610: > 608: } > 609: #if defined(ZERO) && !defined(PRODUCT) > 610: char buf[20]; How do we know 20 makes it big enough? ------------- PR: https://git.openjdk.java.net/jdk/pull/1034 From gziemski at openjdk.java.net Thu Nov 5 17:42:02 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Thu, 5 Nov 2020 17:42:02 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v2] In-Reply-To: References: Message-ID: <6D3fPBrnCIKwyidw1Fys8Ve6iDMld_QxOEIWeMyLXOA=.e79d36bf-fd91-42f5-9a29-7988dc20e430@github.com> On Wed, 4 Nov 2020 05:20:09 GMT, Thomas Stuefe wrote: >> Hi all, >> >> may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers. >> >> Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues. >> >> This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code. >> >> This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324. >> >> ---- >> >> This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (beside the documented one of using libjsig). It seems that the JVM_handle_xxx_functions() are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag is used quite a bit. >> >> See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history. >> >> Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can. I plan to add a proper regression test with a later change, but for now I don't have the time for that. 
>> >> --- >> >> The fixed issues: >> >> 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed. >> >> Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early. >> But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed. >> >> 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls). >> >> 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported. >> >> 4) Every platform handler has this section: >> >> JavaThread* thread = NULL; >> VMThread* vmthread = NULL; >> if (PosixSignals::are_signal_handlers_installed()) { >> if (t != NULL ){ >> if(t->is_Java_thread()) { >> thread = t->as_Java_thread(); >> } >> else if(t->is_VM_thread()){ >> vmthread = (VMThread *)t; >> } >> } >> } >> >> `vmthread` is unused on all platforms and can be removed. >> >> 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM): >> >> if (sig == SIGPIPE || sig == SIGXFSZ) { >> // allow chained handler to go first >> if (PosixSignals::chained_handler(sig, info, ucVoid)) { >> return true; >> } else { >> // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219 >> return true; >> } >> } >> >> - On s390 and ppc, we miss SIGXFSZ handling >> _Update: Fixed separately for easier backport, see [https://bugs.openjdk.java.net/browse/JDK-8255734](JDK-8255734)_. 
>> - both paths return true - section can be shortened >> >> Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler. >> >> 6) At the end of every platform header, before calling into fatal error handling, we unblock the signal: >> >>> // unmask current signal >>> sigset_t newset; >>> sigemptyset(&newset); >>> sigaddset(&newset, sig); >>> sigprocmask(SIG_UNBLOCK, &newset, NULL); >>> >> >> - Use of `sigprocmask()` is UB in a multithreaded program. >> - but then, this section is unnecessary anyway, since [https://bugs.openjdk.java.net/browse/JDK-8252533](JDK-8252533) we unmask error signals at the start of the signal handler. >> >> 7) the JFR crash protection is not consistently checked in all platform handlers. >> >> 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism). >> >> 9) on Linux ppc64 and AIX, we have this section: >> >>> if (sig == SIGILL && (pc < (address) 0x200)) { >>> goto report_and_die; >>> } >> >> which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide). >> >> This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway. >> >> 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`. 
>> >> On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice. >> >> ---- >> >> The changes in this patch: >> >> a) hotspot signal handling is now done by the following functions: >> >> >> | | >> v v >> javaSignalHandler JVM_handle_linux_signal() >> | / >> v v >> javaSignalHandler_inner >> | >> v >> PosixSignals::pd_hotspot_signal_handler() >> >> >> The right branch only exists to support the `AllowUserSignalHandlers` case, see [4]. >> >> `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before. >> `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions. >> >> >> b) I commonized prologue- and epilogue coding. >> - I simplified (4) to a single line in the shared handler >> - I moved the JFR thread crash protection (7) up to the shared handler >> - I moved the complete epilogue up to the shared handler. That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6) >> - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept. >> - I simplified (5) and commonized it, and removed (9) completely >> - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out. >> >> Thanks for reviewing. >> >> Testing: this patch ran through our nightlies, in an earlier form. They will be re-ran some more times. >> >> I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change. 
>> >> Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned. See also Gerard's work on [5], which is under review too. >> >> ---- >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html >> [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html >> [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html >> [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html >> [5] https://bugs.openjdk.java.net/browse/JDK-8253742 > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary call-once guard from PosixSignals::install_signal_handlers() src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp line 699: > 697: size_t os::Posix::_java_thread_min_stack_allowed = 48 * K; > 698: #ifdef _LP64 > 699: size_t os::Posix::_vm_internal_thread_min_stack_allowed = 64 * K; This is not being moved to javaSignalHandler_inner(), so why are we dropping this completely? 
------------- PR: https://git.openjdk.java.net/jdk/pull/1034 From gziemski at openjdk.java.net Thu Nov 5 17:46:01 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Thu, 5 Nov 2020 17:46:01 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v2] In-Reply-To: <6D3fPBrnCIKwyidw1Fys8Ve6iDMld_QxOEIWeMyLXOA=.e79d36bf-fd91-42f5-9a29-7988dc20e430@github.com> References: <6D3fPBrnCIKwyidw1Fys8Ve6iDMld_QxOEIWeMyLXOA=.e79d36bf-fd91-42f5-9a29-7988dc20e430@github.com> Message-ID: On Thu, 5 Nov 2020 17:34:58 GMT, Gerard Ziemski wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unnecessary call-once guard from PosixSignals::install_signal_handlers() > > src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp line 699: > >> 697: size_t os::Posix::_java_thread_min_stack_allowed = 48 * K; >> 698: #ifdef _LP64 >> 699: size_t os::Posix::_vm_internal_thread_min_stack_allowed = 64 * K; > > This is not being moved to javaSignalHandler_inner(), so why are we dropping this completely? I just saw the answer to this in the thread, nvm. ------------- PR: https://git.openjdk.java.net/jdk/pull/1034 From gziemski at openjdk.java.net Thu Nov 5 17:57:00 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Thu, 5 Nov 2020 17:57:00 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v3] In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 15:22:11 GMT, Thomas Stuefe wrote: >> Hi all, >> >> may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers. >> >> Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues. >> >> This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code. 
>> >> This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324. >> >> ---- >> >> This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (beside the documented one of using libjsig). It seems that the JVM_handle_xxx_functions() are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag is used quite a bit. >> >> See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history. >> >> Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can. I plan to add a proper regression test with a later change, but for now I don't have the time for that. >> >> --- >> >> The fixed issues: >> >> 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed. >> >> Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early. >> But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed. >> >> 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls). >> >> 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported. 
>> >> 4) Every platform handler has this section: >> >> JavaThread* thread = NULL; >> VMThread* vmthread = NULL; >> if (PosixSignals::are_signal_handlers_installed()) { >> if (t != NULL ){ >> if(t->is_Java_thread()) { >> thread = t->as_Java_thread(); >> } >> else if(t->is_VM_thread()){ >> vmthread = (VMThread *)t; >> } >> } >> } >> >> `vmthread` is unused on all platforms and can be removed. >> >> 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM): >> >> if (sig == SIGPIPE || sig == SIGXFSZ) { >> // allow chained handler to go first >> if (PosixSignals::chained_handler(sig, info, ucVoid)) { >> return true; >> } else { >> // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219 >> return true; >> } >> } >> >> - On s390 and ppc, we miss SIGXFSZ handling >> _Update: Fixed separately for easier backport, see [https://bugs.openjdk.java.net/browse/JDK-8255734](JDK-8255734)_. >> - both paths return true - section can be shortened >> >> Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler. >> >> 6) At the end of every platform header, before calling into fatal error handling, we unblock the signal: >> >>> // unmask current signal >>> sigset_t newset; >>> sigemptyset(&newset); >>> sigaddset(&newset, sig); >>> sigprocmask(SIG_UNBLOCK, &newset, NULL); >>> >> >> - Use of `sigprocmask()` is UB in a multithreaded program. >> - but then, this section is unnecessary anyway, since [https://bugs.openjdk.java.net/browse/JDK-8252533](JDK-8252533) we unmask error signals at the start of the signal handler. >> >> 7) the JFR crash protection is not consistently checked in all platform handlers. 
>> >> 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism). >> >> 9) on Linux ppc64 and AIX, we have this section: >> >>> if (sig == SIGILL && (pc < (address) 0x200)) { >>> goto report_and_die; >>> } >> >> which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide). >> >> This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway. >> >> 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`. >> >> On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice. >> >> ---- >> >> The changes in this patch: >> >> a) hotspot signal handling is now done by the following functions: >> >> >> | | >> v v >> javaSignalHandler JVM_handle_linux_signal() >> | / >> v v >> javaSignalHandler_inner >> | >> v >> PosixSignals::pd_hotspot_signal_handler() >> >> >> The right branch only exists to support the `AllowUserSignalHandlers` case, see [4]. >> >> `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before. 
>> `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions. >> >> >> b) I commonized prologue- and epilogue coding. >> - I simplified (4) to a single line in the shared handler >> - I moved the JFR thread crash protection (7) up to the shared handler >> - I moved the complete epilogue up to the shared handler. That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6) >> - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept. >> - I simplified (5) and commonized it, and removed (9) completely >> - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out. >> >> Thanks for reviewing. >> >> Testing: this patch ran through our nightlies, in an earlier form. They will be re-ran some more times. >> >> I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change. >> >> Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned. See also Gerard's work on [5], which is under review too. 
>> >> ---- >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html >> [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html >> [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html >> [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html >> [5] https://bugs.openjdk.java.net/browse/JDK-8253742 > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Feedback David src/hotspot/os/posix/signals_posix.cpp line 1276: > 1274: set_signal_handler(SIGFPE, true); > 1275: PPC64_ONLY(set_signal_handler(SIGTRAP, true);) > 1276: set_signal_handler(SIGXFSZ, true); Can we drop the last argument here, it's always true the way we use set_signal_handler() ? ------------- PR: https://git.openjdk.java.net/jdk/pull/1034 From mdoerr at openjdk.java.net Thu Nov 5 18:19:00 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 5 Nov 2020 18:19:00 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v6] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 10:56:59 GMT, Martin Doerr wrote: >> Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. > > Very nice! Unfortunately, I had forgotten one detail (see inline). > I'll try to find a 2nd reviewer in our team. Note that it's better to avoid force-push to preserve history. All commits in your branch get merged automatically when integrating so you will only see one final change in the master repository. 
------------- PR: https://git.openjdk.java.net/jdk/pull/907 From mdoerr at openjdk.java.net Thu Nov 5 18:19:01 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 5 Nov 2020 18:19:01 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v6] In-Reply-To: <9C7uAl7b_TLzZtADhBbeXlwNLT-W5cUvGEVM7uLADk8=.3a0e038c-2ed4-49ba-9a2e-8d4ed2441d26@github.com> References: <9C7uAl7b_TLzZtADhBbeXlwNLT-W5cUvGEVM7uLADk8=.3a0e038c-2ed4-49ba-9a2e-8d4ed2441d26@github.com> Message-ID: On Thu, 5 Nov 2020 14:04:27 GMT, Martin Doerr wrote: >> Wow, thanks for catching it. But, let me make my naive question: why is it necessary? > > E.g. instruct testI_reg_imm sets cr0 as result and branchConFar uses it. The kill cr0 effect disallows scheduling your nodes between them. Btw. advantage of the expand rules is that the single instructions in a basic block can get scheduled in an interleaved fashion to hide instruction latencies. But this has become less relevant with modern out-of-order CPUs. We had done that for Power 6, but we're no longer optimizing for it. ------------- PR: https://git.openjdk.java.net/jdk/pull/907 From stuefe at openjdk.java.net Thu Nov 5 18:47:01 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 5 Nov 2020 18:47:01 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v3] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 17:39:02 GMT, Gerard Ziemski wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Feedback David > > Changes requested by gziemski (Committer). Thanks for the review, Gerard! I'll wait for Davids feedback, since he may want more changes done. > src/hotspot/os/posix/signals_posix.cpp line 587: > >> 585: // Call platform dependent signal handler. 
>> 586: if (!signal_was_handled) { >> 587: signal_was_handled = PosixSignals::pd_hotspot_signal_handler(sig, info, uc, jt); > > Can we move: > > JavaThread* const jt = (t != NULL && t->is_Java_thread()) ? (JavaThread*) t : NULL; > > here? It's not used anywhere else. Sure, I can do this. We can move it up again when/if later needed. > src/hotspot/os/posix/signals_posix.cpp line 610: > >> 608: } >> 609: #if defined(ZERO) && !defined(PRODUCT) >> 610: char buf[20]; > > How do we know 20 makes it big enough? It's the signal name, plus possible additions in the form of numbers (e.g. "UNKNOWN (222)"). Now that I write this I see you are right, this could lead to truncation if the number is very long (MAX_INT is 10 digits long). I'll up this to 64. > src/hotspot/os/posix/signals_posix.cpp line 1276: > >> 1274: set_signal_handler(SIGFPE, true); >> 1275: PPC64_ONLY(set_signal_handler(SIGTRAP, true);) >> 1276: set_signal_handler(SIGXFSZ, true); > > Can we drop the last argument here, it's always true the way we use set_signal_handler() ? Yes, we can do this. ------------- PR: https://git.openjdk.java.net/jdk/pull/1034 From gziemski at openjdk.java.net Thu Nov 5 20:32:59 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Thu, 5 Nov 2020 20:32:59 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v3] In-Reply-To: References: Message-ID: <0UdjCJ4YXGEw_A06WaPzljEJF9fAPtvWiWqTqxSJxog=.1ba84588-0af1-40fc-9e2c-fc39bbdfbd1b@github.com> On Thu, 5 Nov 2020 18:43:57 GMT, Thomas Stuefe wrote: >> Changes requested by gziemski (Committer). > > Thanks for the review, Gerard! I'll wait for Davids feedback, since he may want more changes done. Thank you Thomas for the work, I really like the cleanup here!
------------- PR: https://git.openjdk.java.net/jdk/pull/1034 From redestad at openjdk.java.net Thu Nov 5 21:21:57 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 5 Nov 2020 21:21:57 GMT Subject: Integrated: 8255894: Remove unused StubRoutines::_zero_aligned_words In-Reply-To: References: Message-ID: On Wed, 4 Nov 2020 11:51:20 GMT, Claes Redestad wrote: > _zero_aligned_words was a SPARC-only optimization added by JDK-7059037 This pull request has now been integrated. Changeset: 140c162a Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/140c162a Stats: 7 lines in 2 files changed: 0 ins; 7 del; 0 mod 8255894: Remove unused StubRoutines::_zero_aligned_words Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1053 From mcimadamore at openjdk.java.net Thu Nov 5 21:26:16 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Thu, 5 Nov 2020 21:26:16 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the first incubation round of the foreign linker access API incubation > (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). > > The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. > > Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR.
I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. > > A big thank you to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank you to Paul Sandoz, who provided many insights (often by trying the bits first hand). > > Thanks > Maurizio > > Webrev: > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff (relative to [3]): > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254232 > > > > ### API Changes > > The API changes are actually rather slim: > > * `LibraryLookup` > * This class allows clients to look up symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then look up symbols on that library. > * `FunctionDescriptor` > * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. > * `CLinker` > * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls.
> * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. > * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back. > * `NativeScope` > * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resource* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. > * `MemorySegment` > * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. > > ### Safety > > The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API). > > ### Implementation changes > > The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g.
same library cannot be loaded by different classloaders). > > As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. > > Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. > > The main idea behind foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function. > > This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. > > For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g.
that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating points are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. > > A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall. > > At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: > > * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them in their expected registers/stack slots (note that each position in the buffer corresponds to a different register).
This is the most general invocation mode, the more "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. > > * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). > > * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI. > > For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). > > Again, for further reading on the internals of the foreign linker support, please refer to [5]. > > #### Test changes > > Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation.
There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. > > Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI. > > [1] - https://openjdk.java.net/jeps/389 > [2] - https://openjdk.java.net/jeps/393 > [3] - https://git.openjdk.java.net/jdk/pull/548 > [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md > [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 64 commits: - Merge branch '8254162' into 8254231_linker - Fix post-merge issues caused by 8219014 - Merge branch 'master' into 8254162 - Addess remaining feedback from @AlanBateman and @mrserb - Address comments from @AlanBateman - Fix typo in upcall helper for aarch64 - Merge branch '8254162' into 8254231_linker - Merge branch 'master' into 8254162 - Fix issues with derived buffers and IO operations - More 32-bit fixes for TestLayouts - ...
and 54 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...b38afb3f ------------- Changes: https://git.openjdk.java.net/jdk/pull/634/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=14 Stats: 75292 lines in 271 files changed: 72365 ins; 1626 del; 1301 mod Patch: https://git.openjdk.java.net/jdk/pull/634.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634 PR: https://git.openjdk.java.net/jdk/pull/634 From akozlov at openjdk.java.net Thu Nov 5 21:34:00 2020 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Thu, 5 Nov 2020 21:34:00 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly Message-ID: Follow-up patch for PR #1039. As clarified by @nick-arm, CPU_A53MAC was set to work around an old Linux bug where A53 cores may be available when only a single A57 core is reported in /proc/cpuinfo. The workaround was broken recently but the bug is assumed to be fixed everywhere, so the workaround can be removed completely. CCing old participants: @theRealAph @adinn ------------- Commit messages: - Remove CPU_A53MAC assumption based on number of CPUs Changes: https://git.openjdk.java.net/jdk/pull/1084/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1084&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255799 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1084.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1084/head:pull/1084 PR: https://git.openjdk.java.net/jdk/pull/1084 From david.holmes at oracle.com Thu Nov 5 23:57:30 2020 From: david.holmes at oracle.com (David Holmes) Date: Fri, 6 Nov 2020 09:57:30 +1000 Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers In-Reply-To: References: <7888a9d2-a3b9-9981-665f-1cb3181d199d@oracle.com> Message-ID: <8946f2ff-a1ea-682d-6db6-d386a8cb81bd@oracle.com> Hi Thomas, On 5/11/2020 5:32 pm, Thomas Stüfe wrote: > Hi David, > > > > > > 
> 10) When invoking the fatal error handler, we extract the > pc from > > the context and hand it over as "faulting pc". For SIGILL and > > SIGFPE, this is not totally correct. According to POSIX [3], for > > those signals the address of the faulting instruction is > handed over > > in `si_info.si_addr`. > > > > > > On most platforms this does not matter, they are the same. > But on > > some architectures the pc in the signal context actually points > > somewhere else, e.g. beyond the faulting instruction. Therefore > > `si_info.si_addr` is the better choice. > > > > The Posix spec also states "For some implementations, the > value of > > si_addr may be inaccurate." - so I'm not at all sure which > "pc" we > > should be trusting here? I thought the ucontext was the detailed > > platform specific "context" object that we should extract > information > > from. Which architectures give different values in the two > and is there > > some documentation stating what happens for any given os/cpu? > > > > > > I overlooked this. Well, this is a helpful standard :) > > > > I saw this happen on s390 and on pa-risc. Before this patch, I did > > correct this in the platform handler. Out of caution I could > #ifdef this > > section to s390. > > I prefer not see any change in behaviour for platforms where no problem > has been observed. Also see what seems to be related comment in > ./cpu/arm/vm_version_arm_32.cpp > > > Okay, made this section s390 only. > > Arm may or may not have the same problem. Their signal handler is in > parts using si_addr, partly the context pc. But since you want to keep > old behavior, let's keep it for now. > > I still think si_addr would be more correct and simpler, but we can > leave this for a follow up fix. Okay. Thanks. > // JVM_handle_linux_signal moves PC here if SIGILL happens > > > > ---- > > > > > > > The changes in this patch: > > > > > > 
> a) hotspot signal handling is now done by the following > functions:
> > >
> > >                   handler too>
> > >     |                           |
> > >     v                           v
> > >    javaSignalHandler       JVM_handle_linux_signal()
> > >          |                   /
> > >          v                  v
> > >        javaSignalHandler_inner
> > > > Not clear why we need the _inner version. Why can't we just have > > javaSignalHandler which is installed as the handler and which > is called > > by JVM_handle_XXX_signal? > > > > > > Because JVM_handle_XXX_signal has one more argument than the > standard > > signal handler (abort_if_unrecognized). > > Okay so why introduce this shape instead of keeping the existing form: > > javaSignalHandler -> JVM_handle_xxx_signal(..., true) -> > javaSignalHandler_inner > > ? With the new arrangement the equivalence between javaSignalHandler > and > JVM_handle_xxx_signal can only be seen by inspecting the code of both. > > > I see now what you mean. > > To my mind, JVM_handle_linux_signal() is an external API with a contract > which does not necessarily correspond with > what javaSignalHandler_inner() is about to do. Its existence > demonstrates that (as an explicit side door into the signal handler > hierarchy), and that door can be guarded and adorned with pre- and > post-processing if needed. In an abstract sense yes, but today javaSignalHandler does call JVM_handle_xxx_signal and it is that implementation that we have factored out into javaSignalHandler_inner. > For example, the contract says that the user shall not pass anything > other than the given list of signals. Well, we can assert it here more > clearly. So users could use a debug VM to test their application. It > also means we can, for the release case, safely shortcut signals it > should ignore.
That isolates this interface a bit from any future > changes to the hotspot signal handlers (because no-one ever thinks about > this stuff when working with the handlers). I think you may be making this "contract" a bit more concrete than it actually is. The large comment block states: // The user-defined signal handler must pass unrecognized signals to this // routine, which doesn't suggest only a specific sub-set. It also states: // This routine may recognize any of the following kinds of signals: // SIGBUS, SIGSEGV, SIGILL, SIGFPE, SIGQUIT, SIGPIPE, SIGXFSZ, SIGUSR1. but note that it only says "may" - it is not IMO intended to be definitive, only indicative. So to me the application handler would be free to pass all signals through (with abort_if_unrecognized==false) and use the return value to determine if this was a VM related signal. > I know these are not hard arguments, it is a matter of taste. If you > insist, I change it in the way you prefer. > >> > So I think JVM_handle_xx_signal() should test for the list of allowed >> > signals, and just return false right away in case sig is none of the >> > hotspot signals. > > > I don't think we should change existing behaviour here. > > But is that not an error worth fixing? Before, someone could have passed > in 177 as a signal number and this would have crashed the VM if he also > passed in abort_if_unrecognized=true. And this is exactly how it is supposed to work. There is a responsibility on the application code to either pass known good signals with abort_if_unrecognized, or else not set abort_if_unrecognized. > This is a bit like my argument from above. I believe we have a clear > contract with this API. If that contract is broken, what should we do? I > prefer to assert, but in release case to be tolerant and ignore it. > Otherwise, what good is this contract if we cannot rely on it and don't > check it? The "contract" includes the abort_if_unrecognized flag.
If it is true then we enforce things, otherwise we don't. > > > // This routine may recognize any of the following kinds of signals: > > > //   SIGBUS, SIGSEGV, SIGILL, SIGFPE, SIGQUIT, SIGPIPE, SIGXFSZ, > SIGUSR1. > > > // It should be consulted by handlers for any of those signals. > > > > > > Note that this list is not really correct, as it includes SIGUSR1 > > > and SIGQUIT. None of which are handled by the hotspot signal > handler. As > > > you wrote before, this mechanism is only to handle signals the hotspot > > > commandeers. > > > > SIGUSR1 is probably a leftover from when we did use SIGUSR1 for > something. > > > > SIGQUIT is interesting because if the app is in charge of signals then > > it will install a SIGQUIT handler and we would not want the "VM" to do > > its normal SIGQUIT handler. > > Hm, I can see it either way. SIGQUIT is for thread dumping. Whether or > not the application would want us to honor it we cannot say. I'm always forgetting which of SIGQUIT and SIGINT do which (to me QUIT == exit the VM). But in both cases these are signals for which the JDK installs a handler. > The current behavior would be to treat SIGQUIT as unknown signal (crash > or ignore). I would keep that behavior for now. > > > I have a big question mark over how the use > > of the signal handler thread can/should interact with > > AllowUserSignalHandlers. > > What is the signal handler thread? The signal handler thread is a Java thread that synchronously processes the signals that are managed by the os::user_handler(). But I realize now these signals can be controlled by -Xrs so that the application can manage them instead. (Though with no programmatic way to pass them on to the JVM AFAICS).
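To make the reading of the "contract" above concrete, here is a minimal sketch of an application handler that passes every signal through with abort_if_unrecognized == false and uses the return value to decide whether the signal was VM related. Everything named here is an assumption for illustration: `vm_handle_signal_stub` is a hypothetical stand-in for JVM_handle_linux_signal() (the real entry point lives in libjvm and is not linked), and it pretends that only SIGSEGV is recognized.

```cpp
#include <cassert>
#include <signal.h>

// Hypothetical stand-in for the VM entry point discussed above. It only mimics
// the shape of the contract: non-zero means "recognized", zero means "not mine".
// For illustration it recognizes SIGSEGV only.
static int vm_handle_signal_stub(int sig, siginfo_t* info, void* ucontext,
                                 int abort_if_unrecognized) {
  assert(sig > 0);  // signal numbers are positive
  (void)info;
  (void)ucontext;
  if (sig == SIGSEGV) {
    return 1;
  }
  // A real VM would report a fatal error here when abort_if_unrecognized is
  // set; this sketch never sets it, so it simply declines the signal.
  (void)abort_if_unrecognized;
  return 0;
}

// Number of signals the application ended up handling itself.
static volatile sig_atomic_t app_handled = 0;

// Application handler: offer every signal to the VM first without asking it to
// abort, and fall back to application logic only when the VM declines.
static void app_signal_handler(int sig, siginfo_t* info, void* ucontext) {
  if (vm_handle_signal_stub(sig, info, ucontext, /*abort_if_unrecognized=*/0)) {
    return;  // VM related signal, nothing more to do here
  }
  app_handled = app_handled + 1;  // application-specific handling goes here
}
```

In a real application the handler would be installed with sigaction() and SA_SIGINFO; calling it directly, as a test would, is enough to see that SIGSEGV is consumed by the stub while something like SIGUSR2 falls through to the application.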
Thanks, David ----- > Thanks, Thomas From xliu at openjdk.java.net Fri Nov 6 00:15:08 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Fri, 6 Nov 2020 00:15:08 GMT Subject: RFR: 8255562: delete UseRDPCForConstantTableBase [v4] In-Reply-To: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> References: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> Message-ID: <5oJPHjLHldduDGrkzhs5QXssAhCkmmwtn6jssbxnuvI=.cef1e346-791b-4076-a7af-8bdbc7a7b0fc@github.com> > UseRDPCForConstantTableBase was a SPARC-exclusive flag. Sparc has been removed > from hotspot, so remove this flag. Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8255562: delete UseRDPCForConstantTableBase revert the change of Arguments::handle_aliases_and_deprecation. Arguments::process_argument can handle obsoleted arguments. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/949/files - new: https://git.openjdk.java.net/jdk/pull/949/files/ae686179..213a161b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=949&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=949&range=02-03 Stats: 13 lines in 2 files changed: 3 ins; 8 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/949.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/949/head:pull/949 PR: https://git.openjdk.java.net/jdk/pull/949 From kvn at openjdk.java.net Fri Nov 6 01:12:03 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 6 Nov 2020 01:12:03 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 19:16:21 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 11 commits: >> >> - JDK-8252848: Review comments addressed. >> - Merge remote-tracking branch 'origin' into JDK-8252848 >> - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - JDK-8252848 : Review comments resolution. >> - Merge remote-tracking branch 'upstream' into JDK-8252848 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Replacing explicit type checks with existing type checking routines >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - ... and 1 more: https://git.openjdk.java.net/jdk/compare/4031cb41...9e85592a > > src/hotspot/cpu/x86/vm_version_x86.cpp line 1423: > >> 1421: if (ArrayCopyPartialInlineSize > MaxVectorSize) { >> 1422: ArrayCopyPartialInlineSize = MaxVectorSize; >> 1423: warning("Setting ArrayCopyPartialInlineSize as MaxVectorSize"); > > warning only if ArrayCopyPartialInlineSize is not default. I don't see your fix for my comment. I asked to add `if(!FLAG_IS_DEFAULT(ArrayCopyPartialInlineSize))` check ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From kvn at openjdk.java.net Fri Nov 6 01:11:59 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 6 Nov 2020 01:11:59 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v12] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 09:06:10 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. 
>> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. >> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. >> >> Performance Results: >> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz >> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java >> ArrayCopyPartialInlineSize : 32 >> >> JMH | Block Size | Baseline (ns/op) | Partial Inling (ns/op) | Gain >> -- | -- | -- | -- | -- >> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997 >> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866 >> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829 >> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564 >> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909 >> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667 >> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773 >> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179 >> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874 >> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228 >> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082 >> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769 >> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672 >> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844 >> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788 >> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947 >> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157 >> ArrayCopyAligned.testChar | 300 | 9.904 | 
10.112 | 0.97943038 >> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953 >> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584 >> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443 >> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724 >> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302 >> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756 >> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836 >> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289 >> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512 >> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831 >> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362 >> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974 >> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475 >> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064 >> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095 >> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621 >> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903 >> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081 >> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886 >> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287 >> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945 >> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067 >> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604 >> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204 >> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213 >> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003 >> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627 >> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174 >> 
ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741 >> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313 >> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734 >> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345 >> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994 >> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803 >> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055 >> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957 >> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541 >> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416 >> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692 >> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774 >> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467 >> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137 >> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237 >> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138 >> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353 >> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523 >> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588 >> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453 >> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983 >> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321 >> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053 >> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507 >> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559 >> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747 >> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095 >> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273 >> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 
0.94649139 >> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629 >> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935 >> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385 >> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096 >> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975 >> >> Detailed Reports: >> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt) >> WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8252848: Review comments resolution. Changes requested by kvn (Reviewer). src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 2133: > 2131: MacroAssembler::evmovdqu(typ, kmask, dst, src, vector_len); > 2132: } > 2133: typ -> type src/hotspot/cpu/x86/c2_MacroAssembler_x86.hpp line 125: > 123: void evmovdqu(BasicType typ, KRegister kmask, XMMRegister dst, Address src, int vector_len); > 124: void evmovdqu(BasicType typ, KRegister kmask, Address dst, XMMRegister src, int vector_len); > 125: typ -> type src/hotspot/share/opto/cfgnode.hpp line 106: > 104: bool try_clean_mem_phi(PhaseGVN *phase); > 105: bool is_self_loop(Node* n, PhaseGVN *phase); > 106: bool try_phi_disintegration(PhaseGVN *phase); Why these changes and where new definitions? ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From ngasson at openjdk.java.net Fri Nov 6 01:26:57 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Fri, 6 Nov 2020 01:26:57 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 21:29:37 GMT, Anton Kozlov wrote: > Follow-up patch for PR #1039. 
As clarified by @nick-arm, CPU_A53MAC was set to workaround old Linux bug when A53 cores may be available if only a single A57 core is reported in /proc/cpuinfo. The workaround was broken recently but the bug is assumed to be fixed everywhere, so the workaround can be removed completely. > > CCing old participants: @theRealAph @adinn LGTM (I'm not a Reviewer) ------------- Marked as reviewed by ngasson (Committer). PR: https://git.openjdk.java.net/jdk/pull/1084 From dholmes at openjdk.java.net Fri Nov 6 02:44:02 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 6 Nov 2020 02:44:02 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM In-Reply-To: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> On Tue, 13 Oct 2020 20:31:44 GMT, Daniel D. Daugherty wrote: > Changes from @fisk and @dcubed-ojdk to: > > - simplify ObjectMonitor list management > - get rid of Type-Stable Memory (TSM) > > This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. > Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, > SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) > - a few minor regressions (<= -0.24%) > - Volano is 6.8% better > > Eric C. has also done promotion perf runs on these bits and says "the results look fine". Hi Dan, Overall this looks great. Comparing old and new code is complex but the new code on its own is generally much simpler/clearer (not all though :) ). I have a few nits, comments and queries below. Thanks, David src/hotspot/share/runtime/monitorDeflationThread.hpp line 44: > 42: > 43: // Hide this thread from external view. 
> 44: bool is_hidden_from_external_view() const { return true; } Style nit: a single space after "const" suffices src/hotspot/share/runtime/objectMonitor.cpp line 380: > 378: if (event.should_commit()) { > 379: event.set_monitorClass(object()->klass()); > 380: event.set_address((uintptr_t)this); This looks wrong - the event should refer to the Object whose monitor we have entered, not the OM itself. src/hotspot/share/runtime/objectMonitor.cpp line 1472: > 1470: event->set_monitorClass(monitor->object()->klass()); > 1471: event->set_timeout(timeout); > 1472: event->set_address((uintptr_t)monitor); Again the event should refer to the Object, not the OM. src/hotspot/share/runtime/objectMonitor.hpp line 145: > 143: // have busy multi-threaded access. _header and _object are set at initial > 144: // inflation. The _object does not change, so it is a good choice to share > 145: // its the cache line with _header. Typo: its the src/hotspot/share/runtime/synchronizer.cpp line 60: > 58: > 59: void MonitorList::add(ObjectMonitor* m) { > 60: for (;;) { Style nit: a do/while loop would look nicer here. src/hotspot/share/runtime/synchronizer.cpp line 94: > 92: // Find next live ObjectMonitor. > 93: ObjectMonitor* next = m; > 94: while (next != NULL && next->is_being_async_deflated()) { Nit: This loop seems odd. Given we know m is_being_async_deflated, this should either be a do/while loop, or else we should initialize: ObjectMonitor* next = m->next_om(); and dispense with the awkwardly named next_next. src/hotspot/share/runtime/synchronizer.cpp line 126: > 124: } > 125: > 126: if (self->is_Java_thread() && Non-JavaThreads may have a monitor list ??? src/hotspot/share/runtime/synchronizer.cpp line 136: > 134: > 135: // Honor block request. > 136: ThreadBlockInVM tbivm(self->as_Java_thread()); ThreadBlockInVM is generally used to wrap blocking code, not to cause the current thread to block (which it will do as a side-effect if a safepoint/handshake is requested).
Surely this should just be a call to `process_if_requested` (or the new `process_if_requested_with_exit_check`)? src/hotspot/share/runtime/synchronizer.cpp line 1425: > 1423: } > 1424: > 1425: if (self->is_Java_thread() && Again unclear how a non-JavaThread is doing this? Isn't this always done by the MonitorDeflation thread?? src/hotspot/share/runtime/synchronizer.cpp line 1435: > 1433: > 1434: // Honor block request. > 1435: ThreadBlockInVM tbivm(self->as_Java_thread()); Same comment as previous use of TBIVM. (It's hard to see in the PR UI how the two blocks relate.) src/hotspot/share/runtime/synchronizer.cpp line 1460: > 1458: // This function is called by the MonitorDeflationThread to deflate > 1459: // ObjectMonitors. It is also called via do_final_audit_and_print_stats() > 1460: // by the VMThread. Ah! I think this addresses my previous comments about a non-JavaThread doing this. src/hotspot/share/runtime/synchronizer.cpp line 1501: > 1499: if (ls != NULL) { > 1500: timer.stop(); > 1501: ls->print_cr("before handshaking: unlinked_count=" SIZE_FORMAT ", in_use_list stats: ceiling=" SIZE_FORMAT ", count=" SIZE_FORMAT ", max=" SIZE_FORMAT, Style nit: line too long src/hotspot/share/runtime/synchronizer.cpp line 1520: > 1518: // deflated in this cycle. > 1519: size_t deleted_count = 0; > 1520: for (ObjectMonitor* monitor: delete_list) { I didn't realize C++ has a "foreach" loop construct! Is this in our allowed C++ usage? src/hotspot/share/runtime/synchronizer.cpp line 1533: > 1531: > 1532: // Honor block request. > 1533: ThreadBlockInVM tbivm(self->as_Java_thread()); Ditto previous comments on use of TBIVM. src/hotspot/share/runtime/synchronizer.cpp line 1712: > 1710: // Check the in_use_list; log the results of the checks. > 1711: void ObjectSynchronizer::chk_in_use_list(outputStream* out, int *error_cnt_p) { > 1712: size_t l_in_use_count = _in_use_list.count(); Style nit: what is this `l_` prefix? Is that a one or a small L? Why do we want/need it?
(Applies elsewhere too)

src/hotspot/share/runtime/synchronizer.cpp line 1748:

> 1746: int* error_cnt_p) {
> 1747: if (n->owner_is_DEFLATER_MARKER()) {
> 1748: // This should not happen, but if it does, it is not fatal.

Deflating an in-use monitor is not fatal? Please explain how things would recover.

src/hotspot/share/runtime/synchronizer.hpp line 173:

> 171:
> 172: static MonitorList _in_use_list;
> 173: static jint _in_use_list_ceiling;

Can you add some commentary on what this ceiling is as I could not understand its role just by looking at the code. Thanks.

src/hotspot/share/runtime/synchronizer.cpp line 221:

> 219:
> 220: MonitorList ObjectSynchronizer::_in_use_list;
> 221: // Start the ceiling with one thread:

This relates to me not understanding what this ceiling is (as commented elsewhere) but why does this say "start with one thread" when the value of AvgMonitorsPerThreadEstimate defaults to 1024 ??

-------------

PR: https://git.openjdk.java.net/jdk/pull/642

From jbhateja at openjdk.java.net Fri Nov 6 07:10:57 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 6 Nov 2020 07:10:57 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11]
In-Reply-To:
References:
Message-ID:

On Fri, 6 Nov 2020 01:02:43 GMT, Vladimir Kozlov wrote:

>> src/hotspot/cpu/x86/vm_version_x86.cpp line 1423:
>>
>>> 1421: if (ArrayCopyPartialInlineSize > MaxVectorSize) {
>>> 1422: ArrayCopyPartialInlineSize = MaxVectorSize;
>>> 1423: warning("Setting ArrayCopyPartialInlineSize as MaxVectorSize");
>>
>> warning only if ArrayCopyPartialInlineSize is not default.
>
> I don't see your fix for my comment.
I asked to add `if(!FLAG_IS_DEFAULT(ArrayCopyPartialInlineSize))` check

The default value of ArrayCopyPartialInlineSize is -1 (value range [-1,64]) and the default value of MaxVectorSize is 0 (value range [0,max_int]), so control flow reaches this warning only for a non-default value of ArrayCopyPartialInlineSize.

-------------

PR: https://git.openjdk.java.net/jdk/pull/302

From jbhateja at openjdk.java.net Fri Nov 6 07:13:57 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 6 Nov 2020 07:13:57 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v12]
In-Reply-To:
References:
Message-ID:

On Fri, 6 Nov 2020 01:07:34 GMT, Vladimir Kozlov wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> JDK-8252848: Review comments resolution.
>
> src/hotspot/share/opto/cfgnode.hpp line 106:
>
>> 104: bool try_clean_mem_phi(PhaseGVN *phase);
>> 105: bool is_self_loop(Node* n, PhaseGVN *phase);
>> 106: bool try_phi_disintegration(PhaseGVN *phase);
>
> Why these changes and where new definitions?

The definitions were removed earlier, so the declarations are not needed anymore. Thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/302

From jbhateja at openjdk.java.net Fri Nov 6 07:23:07 2020
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 6 Nov 2020 07:23:07 GMT
Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v13]
In-Reply-To:
References:
Message-ID:

> Summary:
>
> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes.
> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
> 4) Based on the perf results seen in benchmarks, partial in-lining is currently performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types.
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:

 - Merge remote-tracking branch 'upstream' into JDK-8252848
 - JDK-8252848 : Review comments resolved
 - JDK-8252848: Review comments resolution.
 - JDK-8252848: Review comments addressed.
 - Merge remote-tracking branch 'origin' into JDK-8252848
 - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions.
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848
 - JDK-8252848 : Review comments resolution.
 - Merge remote-tracking branch 'upstream' into JDK-8252848
 - ...
and 4 more: https://git.openjdk.java.net/jdk/compare/5dfb42fc...ed343a9e

-------------

Changes: https://git.openjdk.java.net/jdk/pull/302/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=12
Stats: 535 lines in 27 files changed: 485 ins; 23 del; 27 mod
Patch: https://git.openjdk.java.net/jdk/pull/302.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302

PR: https://git.openjdk.java.net/jdk/pull/302

From simonis at openjdk.java.net Fri Nov 6 09:30:01 2020
From: simonis at openjdk.java.net (Volker Simonis)
Date: Fri, 6 Nov 2020 09:30:01 GMT
Subject: RFR: 8255562: delete UseRDPCForConstantTableBase [v4]
In-Reply-To: <5oJPHjLHldduDGrkzhs5QXssAhCkmmwtn6jssbxnuvI=.cef1e346-791b-4076-a7af-8bdbc7a7b0fc@github.com>
References: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> <5oJPHjLHldduDGrkzhs5QXssAhCkmmwtn6jssbxnuvI=.cef1e346-791b-4076-a7af-8bdbc7a7b0fc@github.com>
Message-ID: <5Z7xuFo1k14QqU8vN015Y7DCrOyV9zFprXm_FCJzFlM=.02475ddd-c7b0-4fb6-a173-04f8a25952f7@github.com>

On Fri, 6 Nov 2020 00:15:08 GMT, Xin Liu wrote:

>> UseRDPCForConstantTableBase was a SPARC-exclusive flag. Sparc has been removed
>> from hotspot, so remove this flag.
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>
> 8255562: delete UseRDPCForConstantTableBase
>
> revert the change of Arguments::handle_aliases_and_deprecation.
> Arguments::process_argument can handle obsoleted arguments.

Thanks Xin. Looks good to me now.

-------------

Marked as reviewed by simonis (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/949 From mdoerr at openjdk.java.net Fri Nov 6 09:57:58 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 6 Nov 2020 09:57:58 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v7] In-Reply-To: References: Message-ID: <-E9uhE9me-5uGQDT0ofstDWiLdDf4HgGXfdpuS61uE8=.af48c1d0-fd81-4c4f-9bec-38a1fcea3f35@github.com> On Thu, 5 Nov 2020 13:30:11 GMT, Ziviani wrote: >> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. >> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. >> Ref: PowerISA 3.1, page 129. >> >> These instructions are particularly interesting to improve the following >> pattern `(src1src2)? 1: 0)`, which can be found in >> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. >> >> Long.toString, that generate such pattern in getChars, has showed a >> good performance gain by using these new instructions. >> >> Example: >> for (int i = 0; i < 200_000; i++) >> res = Long.toString((long)i); >> >> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString >> >> Without setbc (average): 0.1178 seconds >> With setbc (average): 0.0396 seconds > > Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions > > - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. > - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. > Ref: PowerISA 3.1, page 129. > > These instructions are particularly interesting to improve the following > pattern `(src1src2)? 1: 0)`, which can be found in > `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. > > Long.toString, that generate such pattern in getChars, has showed a > good performance gain by using these new instructions. 
> > Example: > for (int i = 0; i < 200_000; i++) > res = Long.toString((long)i); > > java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString > > Without setbc (average): 0.1178 seconds > With setbc (average): 0.0396 seconds Tests look good on Power8/9. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/907 From aph at openjdk.java.net Fri Nov 6 10:07:59 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 6 Nov 2020 10:07:59 GMT Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 21:29:37 GMT, Anton Kozlov wrote: > Follow-up patch for PR #1039. As clarified by @nick-arm, CPU_A53MAC was set to workaround old Linux bug when A53 cores may be available if only a single A57 core is reported in /proc/cpuinfo. The workaround was broken recently but the bug is assumed to be fixed everywhere, so the workaround can be removed completely. > > CCing old participants: @theRealAph @adinn Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1084 From coleenp at openjdk.java.net Fri Nov 6 12:16:14 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 6 Nov 2020 12:16:14 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v7] In-Reply-To: References: Message-ID: > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. 
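The rehash-on-next-use scheme described in the paragraph above can be illustrated with a small standalone sketch. This is not the HotSpot hashtable or the JvmtiTagMap code: `TagTable`, `Slot`, and `set_needs_rehashing` are hypothetical names, a `Slot` merely stands in for a weak handle whose stored address a collector may change, and the bit-mixing step is an arbitrary choice. It only shows hashing on the lower 32 bits of the address and deferring the rehash until the table is next used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a weak handle: a cell holding the object's
// current address, which a (simulated) collector may overwrite on a move.
using Slot = std::uintptr_t;

class TagTable {
 public:
  explicit TagTable(std::size_t bucket_count = 16) : _buckets(bucket_count) {
    assert(bucket_count > 0);
  }

  void add(const Slot* slot, long tag) {
    rehash_if_needed();
    _buckets[index_for(*slot)].push_back(Entry{slot, tag});
  }

  // Returns the tag of the object currently at 'address', or 0 if untagged.
  long find(std::uintptr_t address) {
    rehash_if_needed();
    for (const Entry& e : _buckets[index_for(address)]) {
      if (*e.slot == address) return e.tag;
    }
    return 0;
  }

  // Called after objects may have moved; the rehash itself is deferred.
  void set_needs_rehashing() { _needs_rehashing = true; }

 private:
  struct Entry {
    const Slot* slot;
    long tag;
  };

  // Hash only the lower 32 bits of the address (mixing step is arbitrary).
  static std::uint32_t hash(std::uintptr_t address) {
    std::uint32_t h = static_cast<std::uint32_t>(address & 0xFFFFFFFFu);
    h ^= h >> 16;
    h *= 0x45d9f3bu;
    h ^= h >> 16;
    return h;
  }

  std::size_t index_for(std::uintptr_t address) const {
    return hash(address) % _buckets.size();
  }

  // Rebuild the buckets from the current slot contents, once per notification.
  void rehash_if_needed() {
    if (!_needs_rehashing) return;
    std::vector<std::vector<Entry>> fresh(_buckets.size());
    for (const auto& bucket : _buckets) {
      for (const Entry& e : bucket) {
        fresh[hash(*e.slot) % fresh.size()].push_back(e);
      }
    }
    _buckets.swap(fresh);
    _needs_rehashing = false;
  }

  std::vector<std::vector<Entry>> _buckets;
  bool _needs_rehashing = false;
};
```

In this sketch the cost of rehashing is paid by the first `add` or `find` after a notification rather than by the notifier, which mirrors the "rehashed at the next use" wording above.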
> > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. > > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - Merge branch 'master' into jvmti-table - Add back WeakProcessorPhases::Phase enum. 
- Serguei 1. - Code review comments from Kim and Albert. - Merge branch 'master' into jvmti-table - Merge branch 'master' into jvmti-table - More review comments from Stefan and ErikO - Code review comments from StefanK. - 8212879: Make JVMTI TagMap table not hash on oop address ------------- Changes: https://git.openjdk.java.net/jdk/pull/967/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=06 Stats: 1748 lines in 41 files changed: 628 ins; 1018 del; 102 mod Patch: https://git.openjdk.java.net/jdk/pull/967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/967/head:pull/967 PR: https://git.openjdk.java.net/jdk/pull/967 From stuefe at openjdk.java.net Fri Nov 6 12:56:19 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 6 Nov 2020 12:56:19 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v4] In-Reply-To: References: Message-ID: > Hi all, > > may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers. > > Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues. > > This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code. > > This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324. > > ---- > > This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (beside the documented one of using libjsig). It seems that the JVM_handle_xxx_functions() are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag is used quite a bit. > > See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history. 
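As a rough illustration of what chaining a signal handler means in general, here is a plain POSIX sketch. It rests on assumptions and is not HotSpot code: it uses neither libjsig nor the JVM_handle_xxx_signal() entry points, and `first_handler`, `chaining_handler`, `install`, and `demo_chain` are made-up names. The first handler stands in for whatever was installed earlier; the second does its own work and then forwards to the saved previous action.

```cpp
#include <cassert>
#include <signal.h>

// Action that was installed before the chaining handler took over.
static struct sigaction g_previous;
static volatile sig_atomic_t g_first_ran = 0;
static volatile sig_atomic_t g_chained_ran = 0;

// Stands in for a handler that was already present.
void first_handler(int, siginfo_t*, void*) { g_first_ran = 1; }

// Stands in for a later handler that does its own work, then chains.
void chaining_handler(int sig, siginfo_t* info, void* context) {
  g_chained_ran = 1;
  if (g_previous.sa_flags & SA_SIGINFO) {
    if (g_previous.sa_sigaction != nullptr) {
      g_previous.sa_sigaction(sig, info, context);
    }
  } else if (g_previous.sa_handler != SIG_DFL &&
             g_previous.sa_handler != SIG_IGN) {
    g_previous.sa_handler(sig);
  }
}

// Installs 'fn' for 'sig'; stores the replaced action in 'old' if non-null.
bool install(int sig, void (*fn)(int, siginfo_t*, void*), struct sigaction* old) {
  assert(fn != nullptr);
  struct sigaction sa = {};
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_SIGINFO;
  sa.sa_sigaction = fn;
  return sigaction(sig, &sa, old) == 0;
}

// Installs both handlers on SIGUSR1, raises it, and reports whether both ran.
bool demo_chain() {
  return install(SIGUSR1, first_handler, nullptr) &&
         install(SIGUSR1, chaining_handler, &g_previous) &&
         raise(SIGUSR1) == 0 &&
         g_first_ran == 1 && g_chained_ran == 1;
}
```

The handlers only write `volatile sig_atomic_t` flags, which keeps them within what is safe to do from a signal handler.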
> > Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can. I plan to add a proper regression test with a later change, but for now I don't have the time for that. > > --- > > The fixed issues: > > 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed. > > Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early. > But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed. > > 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls). > > 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported. > > 4) Every platform handler has this section: > > JavaThread* thread = NULL; > VMThread* vmthread = NULL; > if (PosixSignals::are_signal_handlers_installed()) { > if (t != NULL ){ > if(t->is_Java_thread()) { > thread = t->as_Java_thread(); > } > else if(t->is_VM_thread()){ > vmthread = (VMThread *)t; > } > } > } > > `vmthread` is unused on all platforms and can be removed. 
> > 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM): > > if (sig == SIGPIPE || sig == SIGXFSZ) { > // allow chained handler to go first > if (PosixSignals::chained_handler(sig, info, ucVoid)) { > return true; > } else { > // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219 > return true; > } > } > > - On s390 and ppc, we miss SIGXFSZ handling > _Update: Fixed separately for easier backport, see [https://bugs.openjdk.java.net/browse/JDK-8255734](JDK-8255734)_. > - both paths return true - section can be shortened > > Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler. > > 6) At the end of every platform header, before calling into fatal error handling, we unblock the signal: > >> // unmask current signal >> sigset_t newset; >> sigemptyset(&newset); >> sigaddset(&newset, sig); >> sigprocmask(SIG_UNBLOCK, &newset, NULL); >> > > - Use of `sigprocmask()` is UB in a multithreaded program. > - but then, this section is unnecessary anyway, since [https://bugs.openjdk.java.net/browse/JDK-8252533](JDK-8252533) we unmask error signals at the start of the signal handler. > > 7) the JFR crash protection is not consistently checked in all platform handlers. > > 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism). 
> > 9) on Linux ppc64 and AIX, we have this section: > >> if (sig == SIGILL && (pc < (address) 0x200)) { >> goto report_and_die; >> } > > which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide). > > This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway. > > 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`. > > On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice. > > ---- > > The changes in this patch: > > a) hotspot signal handling is now done by the following functions: > > > | | > v v > javaSignalHandler JVM_handle_linux_signal() > | / > v v > javaSignalHandler_inner > | > v > PosixSignals::pd_hotspot_signal_handler() > > > The right branch only exists to support the `AllowUserSignalHandlers` case, see [4]. > > `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before. > `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions. > > > b) I commonized prologue- and epilogue coding. > - I simplified (4) to a single line in the shared handler > - I moved the JFR thread crash protection (7) up to the shared handler > - I moved the complete epilogue up to the shared handler. 
That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6) > - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept. > - I simplified (5) and commonized it, and removed (9) completely > - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out. > > Thanks for reviewing. > > Testing: this patch ran through our nightlies, in an earlier form. They will be re-ran some more times. > > I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change. > > Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned. See also Gerard's work on [5], which is under review too. > > ---- > > [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html > [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html > [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html > [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html > [5] https://bugs.openjdk.java.net/browse/JDK-8253742 Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
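Regarding item 6 in the list of fixed issues: POSIX leaves the behaviour of `sigprocmask()` unspecified in a multi-threaded process, and `pthread_sigmask()` is the per-thread counterpart. The sketch below shows unblocking a single signal for the calling thread only. The helper names are hypothetical, and this is not what the patch does (the patch drops the unmasking section altogether); it is only meant to make the distinction concrete.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Applies 'how' (SIG_BLOCK or SIG_UNBLOCK) to one signal in the calling
// thread's mask. Returns true on success.
bool change_mask_for_current_thread(int how, int sig) {
  assert(sig > 0);
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, sig);
  return pthread_sigmask(how, &set, nullptr) == 0;
}

bool block_signal_for_current_thread(int sig) {
  return change_mask_for_current_thread(SIG_BLOCK, sig);
}

bool unblock_signal_for_current_thread(int sig) {
  return change_mask_for_current_thread(SIG_UNBLOCK, sig);
}

// Queries the calling thread's mask without modifying it (null 'set').
bool is_blocked_in_current_thread(int sig) {
  sigset_t current;
  sigemptyset(&current);
  if (pthread_sigmask(SIG_BLOCK, nullptr, &current) != 0) return false;
  return sigismember(&current, sig) == 1;
}
```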
The pull request contains five additional commits since the last revision: - Review feedback - Merge - Feedback David - Remove unnecessary call-once guard from PosixSignals::install_signal_handlers() - Initial patch ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1034/files - new: https://git.openjdk.java.net/jdk/pull/1034/files/3a6a8095..d9a2deff Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1034&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1034&range=02-03 Stats: 9564 lines in 608 files changed: 5075 ins; 3152 del; 1337 mod Patch: https://git.openjdk.java.net/jdk/pull/1034.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1034/head:pull/1034 PR: https://git.openjdk.java.net/jdk/pull/1034 From stuefe at openjdk.java.net Fri Nov 6 12:56:19 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 6 Nov 2020 12:56:19 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v3] In-Reply-To: <0UdjCJ4YXGEw_A06WaPzljEJF9fAPtvWiWqTqxSJxog=.1ba84588-0af1-40fc-9e2c-fc39bbdfbd1b@github.com> References: <0UdjCJ4YXGEw_A06WaPzljEJF9fAPtvWiWqTqxSJxog=.1ba84588-0af1-40fc-9e2c-fc39bbdfbd1b@github.com> Message-ID: On Thu, 5 Nov 2020 20:30:01 GMT, Gerard Ziemski wrote: >> Thanks for the review, Gerard! I'll wait for Davids feedback, since he may want more changes done. > > Thank you Thomas for the work, I really like the cleanup here! I added some more minor cleanups: - added missing include (atomic.hpp) - CDT complained about check_pending_signals() not returning a value upon exit; we never exit, so thats fine, but I added a return and a ShouldNotReachHere just for clarity. Review-triggered fixes: - @gerard-ziemski: removed the "set_installed" parameter from set_signal_handler. That parameter was used to reset the signal back to the default handler. But that had not been used. 
And it had been incorrect too, since the correct way of doing this would have been to reset to any chained handler if one was present when the hotspot signal handler was installed. - @gerard-ziemski: increased the buffer size for the signal name in little robots text bubble to 64. - @gerard-ziemski: moved definition of jt closer to where its needed. - @dholmes-ora: as requested, I merged javaSignalHandler_inner() and JVM_handle_xxx_signal(). I am not super happy about this - one thing I immediately noticed is that now I cannot assert for "+AllowUserSignalHandlers" to make sure this function is only called then. But I can live with it. ------------- PR: https://git.openjdk.java.net/jdk/pull/1034 From coleenp at openjdk.java.net Fri Nov 6 13:03:00 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 6 Nov 2020 13:03:00 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v3] In-Reply-To: References: <7oh4JAWsnFjXMCx9bwBCjhaZALn4U3kje15ZXn2klE0=.ab9b5ea6-0336-47dc-b9a5-32827ad47d66@github.com> Message-ID: On Thu, 5 Nov 2020 07:02:32 GMT, Thomas Stuefe wrote: >> src/hotspot/os/posix/signals_posix.cpp line 615: >> >>> 613: "\n# /--------------------\\" >>> 614: "\n# | %-7s |" >>> 615: "\n# \\---\\ /--------------/" >> >> Isn't the little robot supposed to say "segmentation fault" and would that be safer than calling get_signal_name in this context? thanks for keeping the picture. > > Thanks Coleen. Little robot now spells out the name of the signal (since "segmentation fault" is only correct for segv, and I wanted to see it for the other cases too. get_signal_name() is completely harmless, just uses a bit of stack buffer (and that only for the case of unknown signals which are printed numerically; I plan to change that and give us a simple version which just returns static strings). OK. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/1034

From hohensee at amazon.com Fri Nov 6 13:35:51 2020
From: hohensee at amazon.com (Hohensee, Paul)
Date: Fri, 6 Nov 2020 13:35:51 +0000
Subject: RFR: 8255562: delete UseRDPCForConstantTableBase [v4]
Message-ID: <0BCB4D24-2329-4DF9-AA4E-63D9887105A4@amazon.com>

I updated the CSR Specification to make it more clear:

-----
Remove

product(bool, UseRDPCForConstantTableBase, false, "Use Sparc RDPC instruction for the constant table base.")

and make it obsolete in JDK 16 and expired in JDK 17.
-----

Lgtm as well.

Thanks,
Paul

On 11/6/20, 1:31 AM, "hotspot-dev on behalf of Volker Simonis" wrote:

On Fri, 6 Nov 2020 00:15:08 GMT, Xin Liu wrote:

>> UseRDPCForConstantTableBase was a SPARC-exclusive flag. Sparc has been removed
>> from hotspot, so remove this flag.
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>
> 8255562: delete UseRDPCForConstantTableBase
>
> revert the change of Arguments::handle_aliases_and_deprecation.
> Arguments::process_argument can handle obsoleted arguments.

Thanks Xin. Looks good to me now.

-------------

Marked as reviewed by simonis (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/949

From akozlov at openjdk.java.net Fri Nov 6 13:52:55 2020
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Fri, 6 Nov 2020 13:52:55 GMT
Subject: RFR: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly
In-Reply-To:
References:
Message-ID:

On Fri, 6 Nov 2020 01:24:22 GMT, Nick Gasson wrote:

>> Follow-up patch for PR #1039. As clarified by @nick-arm, CPU_A53MAC was set to workaround old Linux bug when A53 cores may be available if only a single A57 core is reported in /proc/cpuinfo. The workaround was broken recently but the bug is assumed to be fixed everywhere, so the workaround can be removed completely.
>> >> CCing old participants: @theRealAph @adinn > > LGTM (I'm not a Reviewer) All credit should go to @nick-arm, whose explanation helped to connect the dots. Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/1084 From gziemski at openjdk.java.net Fri Nov 6 15:23:01 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Fri, 6 Nov 2020 15:23:01 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v4] In-Reply-To: References: Message-ID: <_A6p-VDsGFyYVIkZAYVXIIpA2_V6kyaeuB-reLrRwSk=.d531fb03-9bbd-4ac3-be48-05ade6a2b026@github.com> On Fri, 6 Nov 2020 12:56:19 GMT, Thomas Stuefe wrote: >> Hi all, >> >> may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers. >> >> Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues. >> >> This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code. >> >> This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324. >> >> ---- >> >> This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (besides the documented one of using libjsig). It seems that the JVM_handle_xxx_signal() functions are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag is used quite a bit. >> >> See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history. >> >> Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can. I plan to add a proper regression test with a later change, but for now I don't have the time for that.
>> >> --- >> >> The fixed issues: >> >> 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed. >> >> Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early. >> But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed. >> >> 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls). >> >> 3) JVM_handle_xxx_signal() is supposed to be exported, but on AIX there is a day-zero bug which causes it not to be exported. >> >> 4) Every platform handler has this section: >> >> JavaThread* thread = NULL; >> VMThread* vmthread = NULL; >> if (PosixSignals::are_signal_handlers_installed()) { >> if (t != NULL ){ >> if(t->is_Java_thread()) { >> thread = t->as_Java_thread(); >> } >> else if(t->is_VM_thread()){ >> vmthread = (VMThread *)t; >> } >> } >> } >> >> `vmthread` is unused on all platforms and can be removed. >> >> 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM): >> >> if (sig == SIGPIPE || sig == SIGXFSZ) { >> // allow chained handler to go first >> if (PosixSignals::chained_handler(sig, info, ucVoid)) { >> return true; >> } else { >> // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219 >> return true; >> } >> } >> >> - On s390 and ppc, we miss SIGXFSZ handling >> _Update: Fixed separately for easier backport, see [JDK-8255734](https://bugs.openjdk.java.net/browse/JDK-8255734)_.
>> - both paths return true - section can be shortened >> >> Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler. >> >> 6) At the end of every platform handler, before calling into fatal error handling, we unblock the signal: >> >>> // unmask current signal >>> sigset_t newset; >>> sigemptyset(&newset); >>> sigaddset(&newset, sig); >>> sigprocmask(SIG_UNBLOCK, &newset, NULL); >>> >> >> - Use of `sigprocmask()` is unspecified in a multithreaded program (per POSIX). >> - but then, this section is unnecessary anyway, since [JDK-8252533](https://bugs.openjdk.java.net/browse/JDK-8252533) we unmask error signals at the start of the signal handler. >> >> 7) the JFR crash protection is not consistently checked in all platform handlers. >> >> 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism). >> >> 9) on Linux ppc64 and AIX, we have this section: >> >>> if (sig == SIGILL && (pc < (address) 0x200)) { >>> goto report_and_die; >>> } >> >> which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a no-op on PPC, so we don't nop-slide). >> >> This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway. >> >> 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`.
>> >> On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice. >> >> ---- >> >> The changes in this patch: >> >> a) hotspot signal handling is now done by the following functions: >> >> >> | | >> v v >> javaSignalHandler JVM_handle_linux_signal() >> | / >> v v >> javaSignalHandler_inner >> | >> v >> PosixSignals::pd_hotspot_signal_handler() >> >> >> The right branch only exists to support the `AllowUserSignalHandlers` case, see [4]. >> >> `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before. >> `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions. >> >> >> b) I commonized prologue- and epilogue coding. >> - I simplified (4) to a single line in the shared handler >> - I moved the JFR thread crash protection (7) up to the shared handler >> - I moved the complete epilogue up to the shared handler. That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6) >> - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept. >> - I simplified (5) and commonized it, and removed (9) completely >> - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out. >> >> Thanks for reviewing. >> >> Testing: this patch ran through our nightlies, in an earlier form. They will be re-ran some more times. >> >> I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change. 
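On item 6 above: POSIX leaves `sigprocmask()` unspecified in multithreaded processes and provides `pthread_sigmask()` for the per-thread mask. Below is a minimal sketch of unblocking one signal for the calling thread only. The function names are hypothetical and this is not the code from the patch, which, as described, drops the epilogue unmasking altogether.

```cpp
#include <pthread.h>
#include <signal.h>

// Hypothetical helper: remove `sig` from the calling thread's signal mask.
// Returns 0 on success, an error number otherwise (pthread convention).
int unblock_signal_for_this_thread(int sig) {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, sig);
  return pthread_sigmask(SIG_UNBLOCK, &set, nullptr);
}

// Hypothetical helper: query whether `sig` is blocked in the calling thread.
// A null `set` argument leaves the mask unchanged and only reads it back.
bool is_blocked_in_this_thread(int sig) {
  sigset_t current;
  sigemptyset(&current);
  pthread_sigmask(SIG_SETMASK, nullptr, &current);
  return sigismember(&current, sig) == 1;
}
```

Both helpers act only on the calling thread; other threads keep their own masks.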
>> >> Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned. See also Gerard's work on [5], which is under review too. >> >> ---- >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html >> [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html >> [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html >> [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html >> [5] https://bugs.openjdk.java.net/browse/JDK-8253742 > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Review feedback > - Merge > - Feedback David > - Remove unnecessary call-once guard from PosixSignals::install_signal_handlers() > - Initial patch Marked as reviewed by gziemski (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1034 From iignatyev at openjdk.java.net Fri Nov 6 15:55:58 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 6 Nov 2020 15:55:58 GMT Subject: RFR: 8255964: Add jcmd Thread.print to jtreg timeout handler In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 17:09:58 GMT, Nils Eliasson wrote: > This patch adds jcmd Thread.print to the jtreg timeout handler. > > Please review. Hi Nils, It looks alright, but could you please elaborate on why we need it when there is already `jstack` action? ? 
Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/1080 From gziemski at openjdk.java.net Fri Nov 6 16:40:03 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Fri, 6 Nov 2020 16:40:03 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> References: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> Message-ID: On Wed, 4 Nov 2020 04:22:05 GMT, David Holmes wrote: >> Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: >> >> - use ifdef(SIGDANGER) and ifdef(SIGTRAP) >> - revert unblock_program_error_signals change > > src/hotspot/os/posix/signals_posix.hpp line 33: > >> 31: >> 32: typedef siginfo_t siginfo_t; >> 33: typedef sigset_t sigset_t; > > I don't see why this is needed/wanted. We can include signal.h without a problem. > > I'm not even sure what these typedefs means ?? Coleen asked me to remove from the signals_posix.hpp, so those are forward declarations for the signal types we use. I thought it was a reasonable request to minimize the number of headers. I saw some efforts in the past to cleanup header files, which is supposed to help with build times, so every little bit helps. > src/hotspot/os/posix/signals_posix.cpp line 1642: > >> 1640: >> 1641: void PosixSignals::do_task(Thread* thread, os::SuspendedThreadTask* task) { >> 1642: if (PosixSignals::do_suspend(thread->osthread())) { > > Shouldn't need PosixSignals:: prefix in this method. Leftover from the time when do_task wasn't in signals_posix.cpp, fixed. 
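The question of when the `PosixSignals::` prefix is needed comes down to ordinary C++ name lookup: inside a member function of a class, its other static members are found unqualified, while a free function (or a member of another class, such as `os`) has to qualify them. A small sketch, with a hypothetical `Signals` class standing in for `PosixSignals` and made-up function bodies:

```cpp
// Hypothetical stand-in for PosixSignals; the bodies are placeholders.
class Signals {
 public:
  static int do_suspend(int value);
  static int do_task(int value);
};

int Signals::do_suspend(int value) { return value + 1; }

// Member function: class scope is searched, so no prefix is required.
int Signals::do_task(int value) { return do_suspend(value); }

// Free function: do_suspend is not visible unqualified here, so the
// Signals:: prefix is mandatory.
int free_do_task(int value) { return Signals::do_suspend(value); }
```

Moving a function out of the class, as with `signal_sets_init()` or `os::print_signal_handlers()` in this review, therefore turns an optional prefix into a required one.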
------------- PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Fri Nov 6 16:46:01 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Fri, 6 Nov 2020 16:46:01 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> References: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> Message-ID: On Wed, 4 Nov 2020 04:18:46 GMT, David Holmes wrote: >> Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: >> >> - use ifdef(SIGDANGER) and ifdef(SIGTRAP) >> - revert unblock_program_error_signals change > > src/hotspot/os/posix/signals_posix.cpp line 1349: > >> 1347: sigaddset(&unblocked_sigs, SIGTRAP); >> 1348: #endif >> 1349: sigaddset(&unblocked_sigs, PosixSignals::SR_signum); > > Shouldn't need the PosixSignals:: prefix in this method Here we are going from: `void PosixSignals::signal_sets_init()` to `void signal_sets_init()` so in this case we do need it. > src/hotspot/os/posix/signals_posix.cpp line 1286: > >> 1284: void PosixSignals::print_signal_handlers(outputStream* st, char* buf, size_t buflen) { >> 1285: st->print_cr("Signal Handlers:"); >> 1286: PosixSignals::print_signal_handler(st, SIGSEGV, buf, buflen); > > You shouldn't need the PosixSignals:: prefix in this method. Good catch, fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From neliasso at openjdk.java.net Fri Nov 6 16:55:57 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 6 Nov 2020 16:55:57 GMT Subject: RFR: 8255964: Add jcmd Thread.print to jtreg timeout handler In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 15:53:35 GMT, Igor Ignatyev wrote: >> This patch adds jcmd Thread.print to the jtreg timeout handler. >> >> Please review.
> > Hi Nils, > > It looks alright, but could you please elaborate on why we need it when there is already a `jstack` action? > > Igor They have the same output - I didn't realize that. But it would be nice to add the extended output - so I am updating the PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/1080 From gziemski at openjdk.java.net Fri Nov 6 16:57:58 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Fri, 6 Nov 2020 16:57:58 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: References: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> Message-ID: On Fri, 6 Nov 2020 16:43:33 GMT, Gerard Ziemski wrote: >> src/hotspot/os/posix/signals_posix.cpp line 1286: >> >>> 1284: void PosixSignals::print_signal_handlers(outputStream* st, char* buf, size_t buflen) { >>> 1285: st->print_cr("Signal Handlers:"); >>> 1286: PosixSignals::print_signal_handler(st, SIGSEGV, buf, buflen); >> >> You shouldn't need the PosixSignals:: prefix in this method. > > Good catch, fixed. Actually now that we are changing `PosixSignals::print_signal_handlers()` to `os::print_signal_handlers()` we do need the `PosixSignals::` prefix after all. ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From stuefe at openjdk.java.net Fri Nov 6 17:04:01 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 6 Nov 2020 17:04:01 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: References: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> Message-ID: On Fri, 6 Nov 2020 16:35:14 GMT, Gerard Ziemski wrote: >> src/hotspot/os/posix/signals_posix.hpp line 33: >> >>> 31: >>> 32: typedef siginfo_t siginfo_t; >>> 33: typedef sigset_t sigset_t; >> >> I don't see why this is needed/wanted. We can include signal.h without a problem. >> >> I'm not even sure what these typedefs mean ??
> > Coleen asked me to remove from the signals_posix.hpp, so those are forward declarations for the signal types we use. > > I thought it was a reasonable request to minimize the number of headers. I saw some efforts in the past to cleanup header files, which is supposed to help with build times, so every little bit helps. Hi Gerard, the proper way to do this would be to remove the header and forward declare the structures like this: struct sigset_t; struct siginfo_t; struct ucontext_t; That means we can use pointers to those things without including their definition. But in this case I believe you could just remove signal.h without forward declaring anything. System headers are usually included via globalDefinitions.hpp, which we do include here. See globalDefinitions_gcc.hpp in this case. Cheers, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From stuefe at openjdk.java.net Fri Nov 6 17:04:02 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 6 Nov 2020 17:04:02 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: References: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> Message-ID: <49QKKvk4LoQeV4mArPOOpirqLze0cprzH6mXeyjtO9g=.d3858b09-6407-4cd0-bc70-2157c3892350@github.com> On Fri, 6 Nov 2020 16:58:50 GMT, Thomas Stuefe wrote: >> Coleen asked me to remove from the signals_posix.hpp, so those are forward declarations for the signal types we use. >> >> I thought it was a reasonable request to minimize the number of headers. I saw some efforts in the past to cleanup header files, which is supposed to help with build times, so every little bit helps. > > Hi Gerard, > > the proper way to do this would be to remove the header and forward declare the structures like this: > > struct sigset_t; > struct siginfo_t; > struct ucontext_t; > That means we can use pointers to those things without including their definition. 
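The forward-declaration technique described here can be sketched with a hypothetical type. `SignalInfo` and `signal_number` are invented for the example; whether `struct sigset_t;` itself is accepted depends on how the platform headers declare that name, which this sketch does not check. Note also that a self-typedef such as `typedef siginfo_t siginfo_t;` only compiles if the name is already declared, so it cannot act as a forward declaration.

```cpp
// Forward declaration: the type is incomplete from here on, which is
// enough to form pointers and references to it.
struct SignalInfo;

// A header can declare functions taking the incomplete type by pointer
// without including the header that defines it.
int signal_number(const SignalInfo* info);

// The full definition is only needed where members are accessed,
// typically in the .cpp file that includes the real header.
struct SignalInfo {
  int signo;
};

int signal_number(const SignalInfo* info) { return info->signo; }
```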
> > But in this case I believe you could just remove signal.h without forward declaring anything. System headers are usually included via globalDefinitions.hpp, which we do include here. See globalDefinitions_gcc.hpp in this case. > > Cheers, Thomas But it's actually a matter of taste; leaving the forward declarations in explicitly, as described above, would be okay too. ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Fri Nov 6 17:11:00 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Fri, 6 Nov 2020 17:11:00 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> References: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> Message-ID: <_L33d1_WSo2thIRyyO7OdhXTiI1qBc6NJBlxlwSx8HY=.d95412ff-fb18-455f-9697-cf70915a48a2@github.com> On Wed, 4 Nov 2020 04:09:16 GMT, David Holmes wrote: >> Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: >> >> - use ifdef(SIGDANGER) and ifdef(SIGTRAP) >> - revert unblock_program_error_signals change > > src/hotspot/os/aix/os_aix.cpp line 2578: > >> 2576: } >> 2577: >> 2578: void os::SuspendedThreadTask::internal_do_task() { > > We should be able to have a single definition of this function in os_posix.cpp too. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From martin.doerr at sap.com Fri Nov 6 17:29:23 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 6 Nov 2020 17:29:23 +0000 Subject: Biased locking Obsoletion In-Reply-To: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> Message-ID: Hi Patricio, seems like nobody wanted to be the first person to reply. So I'll just share a few thoughts. Unfortunately, I haven't heard any feedback from end users.
If the Biased Locking Code removal is not urgent because it's in the way for something else, I'd slightly prefer to remove it early in JDK17. My impression is that modern workloads are fine without BL, so typical JDK15 users will probably not notice it was switched off. Some old workloads are heavily affected, like SPEC jvm98. See performance drop on Power9: http://cr.openjdk.java.net/~mdoerr/BiasedLocking_Power9.png and on Intel Xeon E5: http://cr.openjdk.java.net/~mdoerr/BiasedLocking_XeonE5.png Are there any plans for mitigations? If so, it would be nice to implement them before finally removing BL. My 0.02$. Best regards, Martin > -----Original Message----- > From: hotspot-runtime-dev > On Behalf Of David Holmes > Sent: Dienstag, 3. November 2020 22:30 > To: Patricio Chilano ; hotspot-runtime- > dev at openjdk.java.net; hotspot-dev developers dev at openjdk.java.net> > Subject: Re: Biased locking Obsoletion > > Expanding to hotspot-dev. > > > On 4/11/2020 7:08 am, Patricio Chilano wrote: > > Hi all, > > > > As discussed in 8231264, the idea was to switch biased locking to false > > by default and deprecate all related flags with the intent to remove the > > code in a future release unless compelling evidence showed that the code > > is worth maintaining. > > I see there is only one issue that was filed since biased locking was > > disabled by default (https://github.com/openjdk/jdk/pull/542) that seems > > to have been addressed. As per 8231264 change, the code was set to be > > obsoleted in 16, so we are already in a position to remove biased > > locking code unless there are arguments for the contrary. The > > alternative would be to give more time and move biased locking > > obsoletion to a future release. > > Let me know your thoughts. 
> > > > Thanks, > > > > Patricio From rkennke at openjdk.java.net Fri Nov 6 17:51:01 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 6 Nov 2020 17:51:01 GMT Subject: RFR: 8255991: Shenandoah: Apply 'weak' LRB on cmpxchg and xchg Message-ID: It is possible to access a Reference's referent by using various cmpxchg and xchg intrinsics. When that happens, we need to apply the weak LRB to prevent resurrection. Testing: hotspot_gc_shenandoah ------------- Commit messages: - 8255991: Shenandoah: Apply 'weak' LRB on cmpxchg and xchg Changes: https://git.openjdk.java.net/jdk/pull/1098/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1098&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255991 Stats: 8 lines in 3 files changed: 3 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/1098.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1098/head:pull/1098 PR: https://git.openjdk.java.net/jdk/pull/1098 From xliu at openjdk.java.net Fri Nov 6 18:18:01 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Fri, 6 Nov 2020 18:18:01 GMT Subject: RFR: 8255562: delete UseRDPCForConstantTableBase [v4] In-Reply-To: <5Z7xuFo1k14QqU8vN015Y7DCrOyV9zFprXm_FCJzFlM=.02475ddd-c7b0-4fb6-a173-04f8a25952f7@github.com> References: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> <5oJPHjLHldduDGrkzhs5QXssAhCkmmwtn6jssbxnuvI=.cef1e346-791b-4076-a7af-8bdbc7a7b0fc@github.com> <5Z7xuFo1k14QqU8vN015Y7DCrOyV9zFprXm_FCJzFlM=.02475ddd-c7b0-4fb6-a173-04f8a25952f7@github.com> Message-ID: On Fri, 6 Nov 2020 09:27:08 GMT, Volker Simonis wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8255562: delete UseRDPCForConstantTableBase >> >> revert the change of Arguments::handle_aliases_and_deprecation. >> Arguments::process_argument can handle obsoleted arguments. > > Thanks Xin. Looks good to me now. @simonis Thanks!
------------- PR: https://git.openjdk.java.net/jdk/pull/949 From simonis at openjdk.java.net Fri Nov 6 18:22:57 2020 From: simonis at openjdk.java.net (Volker Simonis) Date: Fri, 6 Nov 2020 18:22:57 GMT Subject: RFR: 8255562: delete UseRDPCForConstantTableBase [v4] In-Reply-To: References: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> <5oJPHjLHldduDGrkzhs5QXssAhCkmmwtn6jssbxnuvI=.cef1e346-791b-4076-a7af-8bdbc7a7b0fc@github.com> <5Z7xuFo1k14QqU8vN015Y7DCrOyV9zFprXm_FCJzFlM=.02475ddd-c7b0-4fb6-a173-04f8a25952f7@github.com> Message-ID: On Fri, 6 Nov 2020 18:15:32 GMT, Xin Liu wrote: >> Thanks Xin. Looks good to me now. > > @simonis Thanks! @navyxliu, you can't push this before the CSR has been approved. ------------- PR: https://git.openjdk.java.net/jdk/pull/949 From aph at redhat.com Fri Nov 6 18:36:27 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 6 Nov 2020 18:36:27 +0000 Subject: Biased locking Obsoletion In-Reply-To: References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> Message-ID: <8ecc52b9-132c-ebde-5759-b1829caaacc5@redhat.com> On 11/6/20 5:29 PM, Doerr, Martin wrote: > My impression is that modern workloads are fine without BL, so typical JDK15 users will probably not notice it was switched off. I don't believe they are, because there are several (many?) places in the Java library that perform badly. In particular, see JDK-8254078, DataOutputStream is very slow post-disabling of Biased Locking. This is not a solved problem, and I don't know what a "typical" user is, but some users may experience significant performance degradation. This is not a solved problem. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From gziemski at openjdk.java.net Fri Nov 6 19:48:11 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Fri, 6 Nov 2020 19:48:11 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v4] In-Reply-To: References: Message-ID: > hi all, > > Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. This change addresses them all: > > #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code > > #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) > > #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) > > #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp > > #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include > > #6 Coleen's feedback - factored out print_signal_handlers() > > #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() > > #8 Thomas's feedback - factored out common POSIX signal initialization code > > #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API > > #10 YaSuenag's feedback - unified logging out of the scope for this fix > > #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: David's feedback ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/636/files - new: https://git.openjdk.java.net/jdk/pull/636/files/1c2726de..4d8f4f7b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=02-03 Stats: 39 lines in 6 files changed: 0 ins; 33 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/636.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/636/head:pull/636 PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Fri Nov 6 20:13:10 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Fri, 6 Nov 2020 20:13:10 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v5] In-Reply-To: References: Message-ID: <-uNIXq8h8NDGpzaNFVp4loICt2jh0EGev5EneflHsX4=.149b59bc-4ffc-4056-9ac1-82d3f4710ab6@github.com> > hi all, > > Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. 
This change addresses them all: > > #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code > > #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) > > #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) > > #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp > > #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include > > #6 Coleen's feedback - factored out print_signal_handlers() > > #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() > > #8 Thomas's feedback - factored out common POSIX signal initialization code > > #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API > > #10 YaSuenag's feedback - unified logging out of the scope for this fix > > #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: Thomas' feedback - cleanup ucontext_get_pc/ucontext_set_pc ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/636/files - new: https://git.openjdk.java.net/jdk/pull/636/files/4d8f4f7b..7b325a4b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=03-04 Stats: 117 lines in 18 files changed: 4 ins; 34 del; 79 mod Patch: https://git.openjdk.java.net/jdk/pull/636.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/636/head:pull/636 PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Fri Nov 6 20:13:11 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Fri, 6 Nov 2020 20:13:11 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v5] In-Reply-To: <-jCaLe-hgq37MGTSpeo_Ka-3ZDd_5zN4z2bg9MACu2M=.39538d88-d553-4710-bf40-38e07ff1401e@github.com> References: <2iv9MdVknMdhPsGDrBAH3AcgT-GtVwktrLlVAOt8z7U=.06b77a86-3aae-4612-a350-a8cfb7c4efcc@github.com> <-jCaLe-hgq37MGTSpeo_Ka-3ZDd_5zN4z2bg9MACu2M=.39538d88-d553-4710-bf40-38e07ff1401e@github.com> Message-ID: On Tue, 3 Nov 2020 21:43:00 GMT, Gerard Ziemski wrote: >> src/hotspot/share/runtime/os.hpp line 970: >> >>> 968: >>> 969: static address ucontext_get_pc(const ucontext_t* ctx); >>> 970: static void ucontext_set_pc(ucontext_t* ctx, address pc); >> >> This feels misplaced here (and probably won't compile on windows) since ucontext_t is POSIX. At the very least needs ucontext.h. But I would consider moving this to os_posix. > > I thought I tested it and it built fine on Windows - will take another look... Fixed. 
------------- PR: https://git.openjdk.java.net/jdk/pull/636 From neliasso at openjdk.java.net Fri Nov 6 20:25:13 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 6 Nov 2020 20:25:13 GMT Subject: RFR: 8255964: Add jcmd Thread.print to jtreg timeout handler [v2] In-Reply-To: References: Message-ID: > This patch adds jcmd Thread.print to the jtreg timeout handler. > > Please review. Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: add extended printing to jstack ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1080/files - new: https://git.openjdk.java.net/jdk/pull/1080/files/2451660c..2ca47884 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1080&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1080&range=00-01 Stats: 4 lines in 1 file changed: 1 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1080.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1080/head:pull/1080 PR: https://git.openjdk.java.net/jdk/pull/1080 From coleenp at openjdk.java.net Fri Nov 6 22:41:05 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 6 Nov 2020 22:41:05 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 21:26:16 GMT, Maurizio Cimadamore wrote: >> This patch contains the changes associated with the first incubation round of the foreign linker access API incubation >> (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). >> >> The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. 
I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. >> >> Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. >> >> A big thanks to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thanks to Paul Sandoz, who provided many insights (often by trying the bits first hand). >> >> Thanks >> Maurizio >> >> Webrev: >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev >> >> Javadoc: >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html >> >> Specdiff (relative to [3]): >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html >> >> CSR: >> >> https://bugs.openjdk.java.net/browse/JDK-8254232 >> >> >> >> ### API Changes >> >> The API changes are actually rather slim: >> >> * `LibraryLookup` >> * This class allows clients to look up symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then look up symbols in that library. >> * `FunctionDescriptor` >> * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. >> * `CLinker` >> * This is the real star of the show.
A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls.
>>   * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place.
>>   * Finally, this class provides some helper functions, e.g. so that clients can convert Java strings into C strings and back.
>> * `NativeScope`
>>   * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resource* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls.
>> * `MemorySegment`
>>   * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope.
>>
>> ### Safety
>>
>> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches.
For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API).
>>
>> ### Implementation changes
>>
>> The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. same library cannot be loaded by different classloaders).
>>
>> As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`.
>>
>> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5].
>>
>> The main idea behind foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function.
>>
>> This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference.
>>
>> For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating points are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly.
>>
>> A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor`, what is the set of bindings associated with the downcall/upcall.
>>
>> At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below:
>>
>> * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings.
For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them into their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the most "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place.
>>
>> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones).
>>
>> * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI.
>>
>> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized).
>>
>> Again, for further reading on the internals of the foreign linker support, please refer to [5].
>> >> #### Test changes >> >> Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verify that the support works by checking that the native call returned the results we expected, these tests aims at checking that the set of bindings generated by the call arranger is correct. This also mean that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. >> >> Some additional microbenchmarks have been added to compare the performances of downcall/upcall with JNI. >> >> [1] - https://openjdk.java.net/jeps/389 >> [2] - https://openjdk.java.net/jeps/393 >> [3] - https://git.openjdk.java.net/jdk/pull/548 >> [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md >> [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html > > Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 64 commits: > > - Merge branch '8254162' into 8254231_linker > - Fix post-merge issues caused by 8219014 > - Merge branch 'master' into 8254162 > - Addess remaining feedback from @AlanBateman and @mrserb > - Address comments from @AlanBateman > - Fix typo in upcall helper for aarch64 > - Merge branch '8254162' into 8254231_linker > - Merge branch 'master' into 8254162 > - Fix issues with derived buffers and IO operations > - More 32-bit fixes for TestLayouts > - ... and 54 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...b38afb3f src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 58: > 56: #if 0 > 57: fprintf(stderr, "upcall_init()\n"); > 58: #endif There shouldn't be #if 0 debugging code in the final version. src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 55: > 53: > 54: // FIXME: This should be initialized explicitly instead of lazily/racily > 55: static void upcall_init() { The FIXME is right this should be initialized as a well known class and referred to here as SystemDictionary::ProgrammableUpcallHandler_klass(). This really doesn't belong here. src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 81: > 79: #endif > 80: > 81: Method* method = k->lookup_method(mname_sym, mdesc_sym); This "method" appears unused. src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 76: > 74: #endif > 75: > 76: Klass* k = SystemDictionary::resolve_or_null(cname_sym, THREAD); pass CATCH if you expect this to never throw an ClassNotFoundException. src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 68: > 66: Symbol* cname_sym = SymbolTable::new_symbol(cname, (int)strlen(cname)); > 67: Symbol* mname_sym = SymbolTable::new_symbol(mname, (int)strlen(mname)); > 68: Symbol* mdesc_sym = SymbolTable::new_symbol(mdesc, (int)strlen(mdesc)); You don't need the strlen() argument. 
src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 121: > 119: upcall_info.upcall_method.name, upcall_info.upcall_method.sig, > 120: &args, thread); > 121: } This code shouldn't be in the cpu directory. This should be in SharedRuntime or in jni.cpp. It should have a JNI_ENTRY and not transition directly. I don't know what AttachCurrentThreadAsDaemon does. src/hotspot/share/prims/universalNativeInvoker.hpp line 28: > 26: > 27: #include "classfile/javaClasses.hpp" > 28: #include "classfile/vmSymbols.hpp" This file doesn't seem to need all of these #include files. src/hotspot/share/prims/universalNativeInvoker.hpp line 37: > 35: #ifdef ZERO > 36: # include "entry_zero.hpp" > 37: #endif needed? Or does the header file that declares ProgrammableStub need this? src/hotspot/share/prims/universalUpcallHandler.cpp line 2: > 1: /* > 2: * Copyright (c) 2018, 2020, Oracle and/or its affiliates. All rights reserved. New files should only have 2020 even though this might have been checked into a branch before. There are a bunch of new files that have old copyrights I noticed. src/hotspot/share/prims/universalUpcallHandler.cpp line 52: > 50: guarantee(status == JNI_OK && !env->ExceptionOccurred(), > 51: "register jdk.internal.foreign.abi.ProgrammableUpcallHandler natives"); > 52: JNI_END Why isn't that other function here? src/hotspot/share/prims/universalUpcallHandler.hpp line 30: > 28: #include "classfile/vmSymbols.hpp" > 29: #include "include/jvm.h" > 30: #include "runtime/frame.inline.hpp" None of these header files seem to be needed here. src/hotspot/share/runtime/init.cpp line 40: > 38: #include "prims/methodHandles.hpp" > 39: #include "prims/universalNativeInvoker.hpp" > 40: #include "runtime/globals.hpp" This looks like the only change so why do you need the #includes? src/hotspot/share/runtime/thread.hpp line 1574: > 1572: return byte_offset_of(JavaThread, _anchor); > 1573: } > 1574: blank line, revert this if this is the only change here to avoid conflicts. 
------------- PR: https://git.openjdk.java.net/jdk/pull/634 From coleenp at openjdk.java.net Fri Nov 6 22:41:06 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 6 Nov 2020 22:41:06 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15] In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 21:47:42 GMT, Coleen Phillimore wrote: >> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 64 commits: >> >> - Merge branch '8254162' into 8254231_linker >> - Fix post-merge issues caused by 8219014 >> - Merge branch 'master' into 8254162 >> - Addess remaining feedback from @AlanBateman and @mrserb >> - Address comments from @AlanBateman >> - Fix typo in upcall helper for aarch64 >> - Merge branch '8254162' into 8254231_linker >> - Merge branch 'master' into 8254162 >> - Fix issues with derived buffers and IO operations >> - More 32-bit fixes for TestLayouts >> - ... and 54 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...b38afb3f > > src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 81: > >> 79: #endif >> 80: >> 81: Method* method = k->lookup_method(mname_sym, mdesc_sym); > > This "method" appears unused. This should be moved into javaClasses or common code. resolve_or_null only resolves the class, it doesn't also call the initializer for the class so you shouldn't be able to call a static method on the class. ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From iignatyev at openjdk.java.net Sat Nov 7 00:00:56 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Sat, 7 Nov 2020 00:00:56 GMT Subject: RFR: 8255964: Add all details to jstack log in jtreg timeout handler [v2] In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 20:25:13 GMT, Nils Eliasson wrote: >> This patch adds jcmd Thread.print to the jtreg timeout handler. >> >> Please review. 
> > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > add extended printing to jstack Marked as reviewed by iignatyev (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1080 From david.holmes at oracle.com Sat Nov 7 04:38:20 2020 From: david.holmes at oracle.com (David Holmes) Date: Sat, 7 Nov 2020 14:38:20 +1000 Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: References: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> Message-ID: On 7/11/2020 2:40 am, Gerard Ziemski wrote: > On Wed, 4 Nov 2020 04:22:05 GMT, David Holmes wrote: > >>> Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: >>> >>> - use ifdef(SIGDANGER) and ifdef(SIGTRAP) >>> - revert unblock_program_error_signals change >> >> src/hotspot/os/posix/signals_posix.hpp line 33: >> >>> 31: >>> 32: typedef siginfo_t siginfo_t; >>> 33: typedef sigset_t sigset_t; >> >> I don't see why this is needed/wanted. We can include signal.h without a problem. >> >> I'm not even sure what these typedefs means ?? > > Coleen asked me to remove from the signals_posix.hpp, so those are forward declarations for the signal types we use. > > I thought it was a reasonable request to minimize the number of headers. I saw some efforts in the past to cleanup header files, which is supposed to help with build times, so every little bit helps. The only reason we would care is if signals_posix.hpp were included in many other handers/files and that should not be the case. This looks completely bogus to me as we need the types from signal.h in this header file. What do those typedefs even mean? I would expect a forward declaration to be of the form: struct siginfo_t; but you don't know what type sigset_t (could be integer or struct) actually is so you can't forward declare it that way. 
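A minimal sketch of the point about forward declarations, not code from the PR: since POSIX allows `sigset_t` to be either an integer or a structure type, a header that uses it cannot portably write `struct sigset_t;` and has to include `<signal.h>` instead. `PosixSignalsSketch` and its two members are hypothetical names chosen for illustration and are not the contents of `signals_posix.hpp`.

```cpp
// Sketch only: PosixSignalsSketch is a hypothetical stand-in, not HotSpot code.
#include <signal.h>   // supplies the complete siginfo_t and sigset_t types
#include <cassert>

// With <signal.h> included, both types can be used directly in declarations;
// no typedef or forward declaration is needed.
struct PosixSignalsSketch {
  // Returns true if sig is a member of *set.
  static bool is_blocked(const sigset_t* set, int sig) {
    assert(set != nullptr);
    return sigismember(set, sig) == 1;
  }

  // Returns the signal number carried by a siginfo_t.
  static int signal_number(const siginfo_t* info) {
    assert(info != nullptr);
    return info->si_signo;
  }
};
```

Whether the extra include has a measurable build-time cost depends on how many files include the header, which is the question raised above.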
David ----- >> src/hotspot/os/posix/signals_posix.cpp line 1642: >> >>> 1640: >>> 1641: void PosixSignals::do_task(Thread* thread, os::SuspendedThreadTask* task) { >>> 1642: if (PosixSignals::do_suspend(thread->osthread())) { >> >> Shouldn't need PosixSignals:: prefix in this method. > > Leftover from the time when do_task wasn't in signals_posix.cpp, fixed. > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/636 > From github.com+58006833+xbzhang99 at openjdk.java.net Sat Nov 7 07:33:06 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Sat, 7 Nov 2020 07:33:06 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v2] In-Reply-To: References: Message-ID: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: Added test cases for exp at the value of 1024 and 10000 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/894/files - new: https://git.openjdk.java.net/jdk/pull/894/files/72630558..305d915b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=00-01 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/894/head:pull/894 PR: https://git.openjdk.java.net/jdk/pull/894 From patricio.chilano.mateo at oracle.com Sat Nov 7 07:52:30 2020 From: patricio.chilano.mateo at oracle.com (Patricio Chilano) Date: Sat, 7 Nov 2020 02:52:30 -0500 Subject: Biased locking Obsoletion In-Reply-To: References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> Message-ID: Hi Martin, Thanks for your feedback. 
On 11/6/20 12:29 PM, Doerr, Martin wrote:
> Hi Patricio,
>
> seems like nobody wanted to be the first person to reply. So I just share a few thoughts.
>
> Unfortunately, I haven't heard any feedback from end users.
> If the Biased Locking Code removal is not urgent because it's in the way for something else, I'd slightly prefer to remove it early in JDK17.
>
> My impression is that modern workloads are fine without BL, so typical JDK15 users will probably not notice it was switched off.
>
> Some old workloads are heavily affected, like SPEC jvm98. See performance drop on Power9:
> http://cr.openjdk.java.net/~mdoerr/BiasedLocking_Power9.png
> and on Intel Xeon E5:
> http://cr.openjdk.java.net/~mdoerr/BiasedLocking_XeonE5.png
>
> Are there any plans for mitigations?
> If so, it would be nice to implement them before finally removing BL.

SPECjvm98 uses some classes that synchronize on pretty much every access and almost all workloads are single-threaded. To give you a couple of examples, I logged all synchronization attempts from _209_db, _228_jack and _202_jess, benchmarks for which I saw regressions from 10% (_202_jess) up to 25%-30% (_209_db), similar to the results you sent. Here are the results showing the number of times a thread synchronized on objects of a given class (run a single iteration with -g -M1 -s100):

_209_db:
   Count   Thread              Class
56255655   0x00007f9a18026dd0  java.util.Vector
   24252   0x00007f9a18026dd0  spec.io.FileInputStream
   15326   0x00007f9a18026dd0  java.lang.StringBuffer
    5203   0x00007f9a18026dd0  java.io.ByteArrayOutputStream
    5203   0x00007f9a18026dd0  java.io.ByteArrayInputStream
    4088   0x00007f9a18026dd0  java.io.OutputStreamWriter
    1643   0x00007f9a18026dd0  spec.io.PrintStream
     931   0x00007f9a18026dd0  java.io.BufferedOutputStream
     633   0x00007f9a18026dd0  spec.io.TableOfExistingFiles
     625   0x00007f9a18026dd0  java.io.PrintStream
     369   0x00007f9a18026dd0  java.util.concurrent.ConcurrentHashMap$Node
(Full trace: http://cr.openjdk.java.net/~pchilanomate/specjvm98synctests/_209_db.txt)

_228_jack:
   Count   Thread              Class
 8349013   0x00007fba14026dd0  java.util.Vector
 2808067   0x00007fba14026dd0  java.util.Hashtable
 1173017   0x00007fba14026dd0  java.io.OutputStreamWriter
  886454   0x00007fba14026dd0  java.lang.StringBuffer
  580516   0x00007fba14026dd0  spec.benchmarks._228_jack.JackPrintStream
  291017   0x00007fba14026dd0  spec.io.FileInputStream
    1116   0x00007fba14026dd0  java.io.ByteArrayOutputStream
    1116   0x00007fba14026dd0  java.io.ByteArrayInputStream
     633   0x00007fba14026dd0  spec.io.TableOfExistingFiles
     525   0x00007fba14026dd0  java.util.concurrent.ConcurrentHashMap$Node
     414   0x00007fba14026dd0  java.io.FileDescriptor
(Full trace: http://cr.openjdk.java.net/~pchilanomate/specjvm98synctests/_228_jack.txt)

_202_jess:
   Count   Thread              Class
 3515114   0x00007f6ad4026dd0  java.util.Hashtable
 1323387   0x00007f6ad4026dd0  java.util.Vector
   43451   0x00007f6ad4026dd0  java.util.Stack
   18920   0x00007f6ad4026dd0  java.lang.StringBuffer
   14952   0x00007f6ad4026dd0  spec.io.FileInputStream
    3811   0x00007f6ad4026dd0  java.io.OutputStreamWriter
    2413   0x00007f6ad4026dd0  java.io.ByteArrayOutputStream
    2413   0x00007f6ad4026dd0  java.io.ByteArrayInputStream
    1623   0x00007f6ad4026dd0  spec.io.PrintStream
    1380   0x00007f6ad4026dd0  java.lang.Class
     847   0x00007f6ad4026dd0  java.io.FileDescriptor
     814   0x00007f6ad4026dd0  java.util.concurrent.ConcurrentHashMap$Node
     633   0x00007f6ad4026dd0  spec.io.TableOfExistingFiles
(Full trace: http://cr.openjdk.java.net/~pchilanomate/specjvm98synctests/_202_jess.txt)

So all the work is done by the same JavaThread, and given the amount of uncontended synchronization, biased locking shines. Not sure if the pictures you sent happen to be from those particular benchmarks or others (_201_compress, _200_check, etc), but we could use the same technique and find out which classes we are synchronizing on that are causing the tests to prefer BL and whether the workload is single or multithreaded. We can do the same for other benchmarks. Note: you can also get those numbers with DiagnoseSyncOnPrimitiveWrappers, the flag I added recently for Valhalla (plus some minor changes).

The same tests on benchmark _222_mpegaudio, for which I didn't see a major performance difference with biased locking disabled, showed the following results (run a single iteration with -g -M1 -s100):

_222_mpegaudio:
   Count   Thread              Class
    2165   0x00007fc4d0026dd0  spec.io.FileInputStream
     790   0x00007fc4d0026dd0  java.lang.StringBuffer
     633   0x00007fc4d0026dd0  spec.io.TableOfExistingFiles
     581   0x00007fc4d0026dd0  java.util.concurrent.ConcurrentHashMap$Node
     365   0x00007fc4d0026dd0  java.io.FileDescriptor
     266   0x00007fc4d0026dd0  java.lang.Object
     208   0x00007fc4d0026dd0  java.io.ByteArrayOutputStream
     208   0x00007fc4d0026dd0  java.io.ByteArrayInputStream
(Full trace: http://cr.openjdk.java.net/~pchilanomate/specjvm98synctests/_222_mpegaudio.txt)

Still single-threaded with many synchronized calls but not as many as in the above benchmarks, so it makes sense that performance doesn't change much with or without BL.

If the benchmarks above were part of a real world app for which we were trying to solve a performance issue, I think the solution would be to just drop java.util.Vector and java.util.Hashtable and use java.util.ArrayList and java.util.HashMap instead.
Also, if using custom classes, they shouldn't use the synchronized keyword unless it's necessary. Then, not only would performance not be affected, but it's also likely it will improve since we are not wasting time in unneeded monitorenter/exit bytecodes. If the issue is that the JDK library only provides synchronized classes, then I think we should have an unsynchronized flavor too. Then there could be workflows that still benefit from BL (although I tend to think the app code can probably be re-written so as to avoid those unnecessary synchronization calls) but the question is whether we want to support those cases.

I agree that since there is nothing urgent that needs BL to go away we can push it to the next release instead. Then we will have even more time for feedback and/or to fix any issues.

Thanks,
Patricio

> My 0.02$.
>
> Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: hotspot-runtime-dev
>> On Behalf Of David Holmes
>> Sent: Tuesday, 3 November 2020 22:30
>> To: Patricio Chilano; hotspot-runtime-dev at openjdk.java.net; hotspot-dev developers <hotspot-dev at openjdk.java.net>
>> Subject: Re: Biased locking Obsoletion
>>
>> Expanding to hotspot-dev.
>>
>>
>> On 4/11/2020 7:08 am, Patricio Chilano wrote:
>>> Hi all,
>>>
>>> As discussed in 8231264, the idea was to switch biased locking to false
>>> by default and deprecate all related flags with the intent to remove the
>>> code in a future release unless compelling evidence showed that the code
>>> is worth maintaining.
>>> I see there is only one issue that was filed since biased locking was
>>> disabled by default (https://github.com/openjdk/jdk/pull/542) that seems
>>> to have been addressed. As per 8231264 change, the code was set to be
>>> obsoleted in 16, so we are already in a position to remove biased
>>> locking code unless there are arguments for the contrary.
The >>> alternative would be to give more time and move biased locking >>> obsoletion to a future release. >>> Let me know your thoughts. >>> >>> Thanks, >>> >>> Patricio From stuefe at openjdk.java.net Sat Nov 7 08:01:01 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 7 Nov 2020 08:01:01 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v3] In-Reply-To: References: <3EZKLGJG_aAOaBURXHnuA29ElMwzghaujWMFkX88Iik=.de2cb23b-d93b-4e3e-8788-5aa9c824c9c4@github.com> Message-ID: On Wed, 4 Nov 2020 05:06:14 GMT, Thomas Stuefe wrote: >> Hi Gerard, >> >> Overall looking good. Some changes still to be finalized e.g ucontext_t related functions in os.hpp. >> >> I flagged some os functions that are implemented in os_foo.cpp but which just call the Posix helper, which can be deleted from os_foo.cpp and simply added to os_posix.cpp. That can't be a further cleanup RFE if you want to limit changes in this PR. >> >> A few minor nits below. >> >> Thanks, >> David > >> The snapshot of JDK that I'm using in my PR does not build on Windows. Do you have any suggestion how I can safely update to the latest JDK without messing up my PR? > > Hi Gerard, > > merging master would not help you with the build error. As I said, it complains about ucontext_t. > > As for your question, just do a > > git checkout yourbranch > git merge master > > if you get conflicts, you'll need to resolve them, but this is the way to go without invalidating old commits. 
> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_ > > On 7/11/2020 2:40 am, Gerard Ziemski wrote: > > > On Wed, 4 Nov 2020 04:22:05 GMT, David Holmes wrote: > > > > Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: > > > > - use ifdef(SIGDANGER) and ifdef(SIGTRAP) > > > > - revert unblock_program_error_signals change > > > > > > > > > src/hotspot/os/posix/signals_posix.hpp line 33: > > > > 31: > > > > 32: typedef siginfo_t siginfo_t; > > > > 33: typedef sigset_t sigset_t; > > > > > > > > > I don't see why this is needed/wanted. We can include signal.h without a problem. > > > I'm not even sure what these typedefs means ?? > > > > > > Coleen asked me to remove from the signals_posix.hpp, so those are forward declarations for the signal types we use. > > I thought it was a reasonable request to minimize the number of headers. I saw some efforts in the past to cleanup header files, which is supposed to help with build times, so every little bit helps. > > The only reason we would care is if signals_posix.hpp were included in > many other handers/files and that should not be the case. This looks > completely bogus to me as we need the types from signal.h in this header > file. What do those typedefs even mean? I would expect a forward > declaration to be of the form: > > struct siginfo_t; > > but you don't know what type sigset_t (could be integer or struct) > actually is so you can't forward declare it that way. > Oh you are right. Posix guarantees that siginfo_t and ucontext_t are structures; sigset_t can be either a structure or an integer type. Missed that. I'd just either leave out the header. Or even leave it in. Do we have a policy like "all system headers only go into one central place, e.g. globalDefinitions"? Because if not, Gerard's original code would be more correct (where he included signal.h) Just my 5 cent. 
I'll keep out of this discussion for now. Any form is fine for me. Cheers Thomas > David > ----- ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From dcubed at openjdk.java.net Sat Nov 7 16:57:55 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Sat, 7 Nov 2020 16:57:55 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM In-Reply-To: <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Fri, 6 Nov 2020 01:54:52 GMT, David Holmes wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > src/hotspot/share/runtime/monitorDeflationThread.hpp line 44: > >> 42: >> 43: // Hide this thread from external view. >> 44: bool is_hidden_from_external_view() const { return true; } > > Style nit: a single space after "const" suffices Fixed. I copied that from the serviceThread.hpp file, but, since I haven't otherwise touched that file in this changeset, I'm going to leave serviceThread.hpp alone. 
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Sat Nov 7 17:04:58 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Sat, 7 Nov 2020 17:04:58 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM In-Reply-To: <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Fri, 6 Nov 2020 01:57:59 GMT, David Holmes wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > src/hotspot/share/runtime/objectMonitor.cpp line 380: > >> 378: if (event.should_commit()) { >> 379: event.set_monitorClass(object()->klass()); >> 380: event.set_address((uintptr_t)this); > > This looks wrong - the event should refer to the Object whose monitor we have entered, not the OM itself. I noticed that in my preliminary review of Erik's changes. He checked with the JFR guys and they said it just needed to be an address and does not have to refer to the Object. @fisk - can you think of a comment we should add here? > src/hotspot/share/runtime/objectMonitor.cpp line 1472: > >> 1470: event->set_monitorClass(monitor->object()->klass()); >> 1471: event->set_timeout(timeout); >> 1472: event->set_address((uintptr_t)monitor); > > Again the event should refer to the Object, not the OM. 
I noticed that in my preliminary review of Erik's changes. He checked with the JFR guys and they said it just needed to be an address and does not have to refer to the Object. @fisk - can you think of a comment we should add here? ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Sat Nov 7 17:16:58 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Sat, 7 Nov 2020 17:16:58 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM In-Reply-To: <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Fri, 6 Nov 2020 02:00:07 GMT, David Holmes wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > src/hotspot/share/runtime/objectMonitor.hpp line 145: > >> 143: // have busy multi-threaded access. _header and _object are set at initial >> 144: // inflation. The _object does not change, so it is a good choice to share >> 145: // its the cache line with _header. > > Typo: its the Nice catch. Fixed. > src/hotspot/share/runtime/synchronizer.cpp line 60: > >> 58: >> 59: void MonitorList::add(ObjectMonitor* m) { >> 60: for (;;) { > > Style nit: a do/while loop would look nicer here.
Fixed:

    ObjectMonitor* head;
    do {
      head = Atomic::load(&_head);
      m->set_next_om(head);
    } while (Atomic::cmpxchg(&_head, head, m) != head);

> src/hotspot/share/runtime/synchronizer.cpp line 94:
>
>> 92: // Find next live ObjectMonitor.
>> 93: ObjectMonitor* next = m;
>> 94: while (next != NULL && next->is_being_async_deflated()) {
>
> Nit: This loop seems odd. Given we know m is_being_async_deflated, this should either be a do/while loop, or else we should initialize:
>
> ObjectMonitor* next = m->next_om();
>
> and dispense with the awkwardly named next_next.

@fisk - I'm leaving this one for you for now.

> src/hotspot/share/runtime/synchronizer.cpp line 126:
>
>> 124: }
>> 125:
>> 126: if (self->is_Java_thread() &&
>
> Non-JavaThreads may have a monitor list ???

This function, MonitorList::unlink_deflated(), may be called by either the MonitorDeflationThread or the VMThread. The VMThread does not need to block for the safepoint which is what the `if (self->is_Java_thread()` prevents.

-------------

PR: https://git.openjdk.java.net/jdk/pull/642

From dcubed at openjdk.java.net Sat Nov 7 17:21:59 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Sat, 7 Nov 2020 17:21:59 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM
In-Reply-To: <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com>
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com>
Message-ID:

On Fri, 6 Nov 2020 02:14:56 GMT, David Holmes wrote:

>> Changes from @fisk and @dcubed-ojdk to:
>>
>> - simplify ObjectMonitor list management
>> - get rid of Type-Stable Memory (TSM)
>>
>> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions.
>> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > src/hotspot/share/runtime/synchronizer.cpp line 136: > >> 134: >> 135: // Honor block request. >> 136: ThreadBlockInVM tbivm(self->as_Java_thread()); > > ThreadBlockInVM is generally used to wrap blocking code, not to cause the current thread to block (which it will do as a side-effect if a safepoint/handshake is requested). Surely this should just be a call to `process_if_requested` (or the new `process_if_requested_with_exit_check`)? This kind of use of ThreadBlockInVM predates this changeset so while the location is new, the code style is old, very old... I'll hold off changing this for now. > src/hotspot/share/runtime/synchronizer.cpp line 1425: > >> 1423: } >> 1424: >> 1425: if (self->is_Java_thread() && > > Again unclear how a non-JavaThread is doing this? Isn't this always done by the MonitorDeflation thread?? This function, ObjectSynchronizer::deflate_monitor_list(), may be called by either the MonitorDeflationThread or the VMThread. The VMThread does not need to block for the safepoint which is what the `if (self->is_Java_thread()` prevents. > src/hotspot/share/runtime/synchronizer.cpp line 1435: > >> 1433: >> 1434: // Honor block request. >> 1435: ThreadBlockInVM tbivm(self->as_Java_thread()); > > Same comment as previous use of TBIVM. (It's hard to see in the PR UI how the two blocks relate.) This kind of use of ThreadBlockInVM predates this changeset so while the location is new, the code style is old, very old... I'll hold off changing this for now. > src/hotspot/share/runtime/synchronizer.cpp line 1460: > >> 1458: // This function is called by the MonitorDeflationThread to deflate >> 1459: // ObjectMonitors.
It is also called via do_final_audit_and_print_stats() >> 1460: // by the VMThread. > > Ah! I think this addresses my previous comments about a non-JavaThread doing this. Yup... ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Sat Nov 7 17:27:58 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Sat, 7 Nov 2020 17:27:58 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM In-Reply-To: <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Fri, 6 Nov 2020 02:25:23 GMT, David Holmes wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > src/hotspot/share/runtime/synchronizer.cpp line 1501: > >> 1499: if (ls != NULL) { >> 1500: timer.stop(); >> 1501: ls->print_cr("before handshaking: unlinked_count=" SIZE_FORMAT ", in_use_list stats: ceiling=" SIZE_FORMAT ", count=" SIZE_FORMAT ", max=" SIZE_FORMAT, > > Style nit: line too long Yeah... I keep going back and forth for long logging lines... I reformatted that one and another one a few lines farther down. > src/hotspot/share/runtime/synchronizer.cpp line 1520: > >> 1518: // deflated in this cycle. 
>> 1519: size_t deleted_count = 0; >> 1520: for (ObjectMonitor* monitor: delete_list) { > > I didn't realize C++ has a "foreach" loop construct! Is this in our allowed C++ usage? @fisk - this one is for you... :-) > src/hotspot/share/runtime/synchronizer.cpp line 1533: > >> 1531: >> 1532: // Honor block request. >> 1533: ThreadBlockInVM tbivm(self->as_Java_thread()); > > Ditto previous comments on use of TBIVM. This kind of use of ThreadBlockInVM predates this changeset so while the location is new, the code style is old, very old... I'll hold off changing this for now. > src/hotspot/share/runtime/synchronizer.cpp line 1712: > >> 1710: // Check the in_use_list; log the results of the checks. >> 1711: void ObjectSynchronizer::chk_in_use_list(outputStream* out, int *error_cnt_p) { >> 1712: size_t l_in_use_count = _in_use_list.count(); > > Style nit: what is this `l_` prefix? Is that a one or a small L? Why do we want/need it? (Applies elsewhere too) The "l_" prefix is used for a local copy of a value where we want to make sure that we use a consistent value for the check and for the resulting audit/logging message. This is not a new thing with this changeset and that style was used in previous versions of the ObjectMonitor audit/logging code.
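The "l_" local-copy idiom described above can be sketched outside of HotSpot as follows. This is only an illustration under stated assumptions: `CounterSketch` and `check_count_sketch` are hypothetical names, `std::atomic` stands in for HotSpot's own Atomic wrappers, and the particular check (count against max) is invented for the example rather than taken from chk_in_use_list().

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-in for counters that other threads may update
// concurrently; this is not HotSpot's MonitorList.
struct CounterSketch {
  std::atomic<std::size_t> count{0};
  std::atomic<std::size_t> max{0};
};

// Read each shared value exactly once into an "l_" local, then use only the
// locals for both the check and the message, so the logged numbers always
// match the numbers that were actually compared.
// Returns the audit message and bumps *error_cnt_p when the check fails.
inline std::string check_count_sketch(const CounterSketch& c, int* error_cnt_p) {
  assert(error_cnt_p != nullptr);
  const std::size_t l_count = c.count.load();
  const std::size_t l_max = c.max.load();
  std::ostringstream out;
  if (l_count <= l_max) {
    out << "count=" << l_count << " is less than or equal to max=" << l_max;
  } else {
    out << "WARNING: count=" << l_count << " is greater than max=" << l_max;
    ++*error_cnt_p;
  }
  return out.str();
}
```

Without the locals, a second read of `c.count` for the message could observe a different value than the one used in the comparison, which would make the audit output confusing.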
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Sat Nov 7 17:40:55 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Sat, 7 Nov 2020 17:40:55 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM In-Reply-To: <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Fri, 6 Nov 2020 02:31:12 GMT, David Holmes wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > src/hotspot/share/runtime/synchronizer.cpp line 1748: > >> 1746: int* error_cnt_p) { >> 1747: if (n->owner_is_DEFLATER_MARKER()) { >> 1748: // This should not happen, but if it does, it is not fatal. > > Deflating an in-use monitor is not fatal? Please explain how things would recover. When the MonitorDeflationThread is doing its final async deflation cycle before VM shutdown, it is possible for it to finish a deflation pass on the in-use list and then block for the final safepoint before unlinking the deflated ObjectMonitors. If the VMThread happens to be doing an audit_and_print_stats() call when this happens (see the end of SafepointSynchronize::do_cleanup_tasks()), then we don't want that audit to FAIL. 
As for recovery, if the VMThread is doing a final audit, then when it sees an already deflated ObjectMonitor on the in-use list, it will take care of the unlinking and deleting... ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From Monica.Beckwith at microsoft.com Sat Nov 7 17:52:52 2020 From: Monica.Beckwith at microsoft.com (Monica Beckwith) Date: Sat, 7 Nov 2020 17:52:52 +0000 Subject: RFR: 8255616: Disable AOT and Graal in Oracle OpenJDK In-Reply-To: <20201104150607.735653032@eggemoggin.niobe.net> References: , <20201104150607.735653032@eggemoggin.niobe.net> Message-ID: Thanks Vladimir and Mark. Great to hear the positivity in your replies. Please let me know how we can be of help. Regards, Monica ________________________________ From: hotspot-dev on behalf of mark.reinhold at oracle.com Sent: Wednesday, November 4, 2020, 5:06 PM To: volker.simonis at gmail.com; vladimir.kozlov at oracle.com Cc: hotspot-dev at openjdk.java.net; discuss at openjdk.java.net Subject: Re: RFR: 8255616: Disable AOT and Graal in Oracle OpenJDK 2020/11/4 14:53:47 -0800, vladimir.kozlov at oracle.com: > On 11/3/20 2:51 AM, Volker Simonis wrote: >> this is an interesting step and I wonder how it affects the OpenJDK >> Graal, Metropolis and Leyden projects? >> > ... > > I will let Mark talk about Project Leyden. > > Best regards, > Vladimir Kozlov > > ... >> >> - Project Leyden [?]: @Mark: what's actually the state of Project >> Leyden? We had a discussion [3], a vote [4] and the approval of the >> project [5] yet nothing has happened ever since. There's neither a >> project page nor a mailing list. Unfortunately, due to other priorities I haven't had the time to get this project started properly. I hope to be able to do that soon.
>> Considering the fact that Leyden was supposed to "be based upon >> existing components in the JDK such as the HotSpot JVM, the `jaotc` >> ahead-of-time compiler, application class-data sharing, and the >> `jlink` linking tool" I wonder if Leyden is already dead before its >> instantiation if "jaotc", one of its core components, has now been >> deprecated? Or are there any plans to enhance C2 for AOT scenarios? We are considering the possibility of using C2 for ahead-of-time compilation. - Mark From dcubed at openjdk.java.net Sat Nov 7 18:01:57 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Sat, 7 Nov 2020 18:01:57 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM In-Reply-To: <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Fri, 6 Nov 2020 02:33:37 GMT, David Holmes wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > src/hotspot/share/runtime/synchronizer.hpp line 173: > >> 171: >> 172: static MonitorList _in_use_list; >> 173: static jint _in_use_list_ceiling; > > Can you add some commentary on what this ceiling is as I could not understand its role just by looking at the code. Thanks. 
How about this: static MonitorList _in_use_list; // The ratio of the current _in_use_list count to the ceiling is used // to determine if we are above MonitorUsedDeflationThreshold and need // to do an async monitor deflation cycle. The ceiling is increased by // AvgMonitorsPerThreadEstimate when a thread is added to the system // and is decreased by AvgMonitorsPerThreadEstimate when a thread is // removed from the system. // Note: If the _in_use_list max exceeds the ceiling, then // monitors_used_above_threshold() will use the in_use_list max instead // of the thread count derived ceiling because we have used more // ObjectMonitors than the estimated average. static jint _in_use_list_ceiling; > src/hotspot/share/runtime/synchronizer.cpp line 221: > >> 219: >> 220: MonitorList ObjectSynchronizer::_in_use_list; >> 221: // Start the ceiling with one thread: > > This relates to me not understanding what this ceiling is (as commented elsewhere) but why does this say "start with one thread" when the value of AvgMonitorsPerThreadEstimate defaults to 1024 ?? The estimate is that a single thread will generate at most 1024 inflated ObjectMonitors on average. I changed the comment like this: // Start the ceiling with the estimate for one thread: jint ObjectSynchronizer::_in_use_list_ceiling = AvgMonitorsPerThreadEstimate; Does that help? 
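To make the ceiling comment above concrete, here is a minimal sketch of the heuristic as it is described in this message. It is not the HotSpot implementation: `DeflationHeuristicSketch` and its members are hypothetical names, the strict `>` comparison and the integer percentage arithmetic are assumptions, and the values 1024 and 90 in the usage note are only illustrative inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical model of the ceiling bookkeeping described in the comment.
struct DeflationHeuristicSketch {
  std::size_t per_thread_estimate;  // stands in for AvgMonitorsPerThreadEstimate
  std::size_t threshold_percent;    // stands in for MonitorUsedDeflationThreshold
  std::size_t ceiling;              // stands in for _in_use_list_ceiling

  // Start the ceiling with the estimate for one thread.
  DeflationHeuristicSketch(std::size_t estimate, std::size_t percent)
      : per_thread_estimate(estimate),
        threshold_percent(percent),
        ceiling(estimate) {}

  // The ceiling grows and shrinks with the number of threads in the system.
  void thread_added() { ceiling += per_thread_estimate; }
  void thread_removed() {
    assert(ceiling >= per_thread_estimate);
    ceiling -= per_thread_estimate;
  }

  // If the observed maximum exceeds the thread-derived ceiling, use the
  // maximum instead, because more monitors were used than estimated.
  // Overflow of in_use_count * 100 is ignored in this sketch.
  bool used_above_threshold(std::size_t in_use_count,
                            std::size_t in_use_max) const {
    const std::size_t effective = std::max(ceiling, in_use_max);
    if (effective == 0) {
      return false;
    }
    return (in_use_count * 100) / effective > threshold_percent;
  }
};
```

For example, with an estimate of 1024 and a threshold of 90, one added thread raises the ceiling to 2048; 1000 in-use monitors are then about 48% of the ceiling (below the threshold), while 1900 are about 92% (above it).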
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Sat Nov 7 18:05:00 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Sat, 7 Nov 2020 18:05:00 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM In-Reply-To: <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Fri, 6 Nov 2020 02:40:44 GMT, David Holmes wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > Hi Dan, > > Overall this looks great. Comparing old and new code is complex but the new code on its own is generally much simpler/clearer (not all though :) ). > > I have a few nits, comments and queries below. > > Thanks, > David @dholmes-ora - Thanks for the review! Hmm... I'm not sure why the GitHub UI sends out my replies one-at-a-time. Perhaps I should have replied from the "files" view instead of the main PR view?
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Sat Nov 7 18:20:17 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Sat, 7 Nov 2020 18:20:17 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2] In-Reply-To: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: > Changes from @fisk and @dcubed-ojdk to: > > - simplify ObjectMonitor list management > - get rid of Type-Stable Memory (TSM) > > This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. > Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, > SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) > - a few minor regressions (<= -0.24%) > - Volano is 6.8% better > > Eric C. has also done promotion perf runs on these bits and says "the results look fine". Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: Resolve most of dholmes-ora comments. 
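
One of the review comments in this thread asked for MonitorList::add to be written as a do/while loop around a compare-and-swap. The shape of such a lock-free prepend can be sketched in portable C++ as below. This is an illustration only: `NodeSketch` and `PrependListSketch` are hypothetical names, and `std::atomic::compare_exchange_weak` is used in place of HotSpot's `Atomic::cmpxchg`, which returns the old value rather than a bool.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical list node; stands in for an ObjectMonitor with a next link.
struct NodeSketch {
  int value;
  NodeSketch* next = nullptr;
  explicit NodeSketch(int v) : value(v) {}
};

// Prepend-only list with a CAS loop on the head pointer.
class PrependListSketch {
  std::atomic<NodeSketch*> _head{nullptr};
  std::atomic<std::size_t> _count{0};

 public:
  void add(NodeSketch* n) {
    assert(n != nullptr);
    // 'head' is declared outside the loop so it is in scope in the condition.
    NodeSketch* head = _head.load();
    do {
      // Link the new node in front of the head we last observed.
      n->next = head;
      // On failure compare_exchange_weak stores the current head into 'head',
      // so the next iteration retries against the fresh value.
    } while (!_head.compare_exchange_weak(head, n));
    _count.fetch_add(1);
  }

  NodeSketch* head() const { return _head.load(); }
  std::size_t count() const { return _count.load(); }
};
```

Because nodes are only ever prepended here, the loop does not have to deal with the ABA problem that arises when nodes can be removed and reused concurrently; a list that also unlinks nodes needs additional care.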
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/642/files - new: https://git.openjdk.java.net/jdk/pull/642/files/b7d0c1e9..6c2db34a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=642&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=642&range=00-01 Stats: 28 lines in 4 files changed: 15 ins; 2 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/642.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/642/head:pull/642 PR: https://git.openjdk.java.net/jdk/pull/642 From alanb at openjdk.java.net Sun Nov 8 08:12:56 2020 From: alanb at openjdk.java.net (Alan Bateman) Date: Sun, 8 Nov 2020 08:12:56 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) In-Reply-To: <-u4eSM47o_e_KlfTRYBNGyLNhjqAeG-84u_uEd3ppH0=.c49b4269-1001-49d1-96fd-ecacbf2417e9@github.com> References: <-u4eSM47o_e_KlfTRYBNGyLNhjqAeG-84u_uEd3ppH0=.c49b4269-1001-49d1-96fd-ecacbf2417e9@github.com> Message-ID: On Wed, 4 Nov 2020 07:45:09 GMT, Alan Bateman wrote: >>> The javadoc for copyFrom isn't changed in this update but I notice it specifies IndexOutOfBoundException when the source segment is larger than the receiver, have other exceptions been examined? >> >> This exception is consistent with other uses of this exception throughout this API (e.g. when writing a segment out of bounds). > > I assume the CSR needs to be updated so that it's in sync with the API changes in the latest round. I see the xxxByteAtIndex methods that took a ByteOrder have been removed from MemoryAccess. Should the xxxByte and xxxByteAtOffset that take a ByteOrder be removed too? 
------------- PR: https://git.openjdk.java.net/jdk/pull/548 From rkennke at openjdk.java.net Sun Nov 8 14:02:01 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Sun, 8 Nov 2020 14:02:01 GMT Subject: RFR: 8256015: Shenandoah: Add missing Shenandoah implementation in WB_isObjectInOldGen Message-ID: The test gc/TestReferenceRefersTo.java fails with Shenandoah because of a missing implementation in WB_isObjectInOldGen: Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/gc/shared/collectedHeap.hpp:207), pid=511307, tid=511422 assert(kind == heap->kind()) failed: Heap kind 6 should be 1 This is introduced by JDK-8188055. Testing: hotspot_gc_shenandoah, tier1+Shenandoah, TestReferenceRefersTo.java ------------- Commit messages: - 8256015: Shenandoah: Add missing Shenandoah implementation in WB_isObjectInOldGen Changes: https://git.openjdk.java.net/jdk/pull/1111/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1111&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256015 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1111.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1111/head:pull/1111 PR: https://git.openjdk.java.net/jdk/pull/1111 From shade at openjdk.java.net Sun Nov 8 14:52:53 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sun, 8 Nov 2020 14:52:53 GMT Subject: RFR: 8256015: Shenandoah: Add missing Shenandoah implementation in WB_isObjectInOldGen In-Reply-To: References: Message-ID: On Sun, 8 Nov 2020 13:57:03 GMT, Roman Kennke wrote: > The test gc/TestReferenceRefersTo.java fails with Shenandoah because of a missing implementation in WB_isObjectInOldGen: > > Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/gc/shared/collectedHeap.hpp:207), pid=511307, tid=511422 > assert(kind == heap->kind()) failed: Heap kind 6 should be 1 > > This is introduced by JDK-8188055. 
> > Testing: hotspot_gc_shenandoah, tier1+Shenandoah, TestReferenceRefersTo.java Looks fine and trivial. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1111 From shade at openjdk.java.net Sun Nov 8 14:56:59 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sun, 8 Nov 2020 14:56:59 GMT Subject: RFR: 8255991: Shenandoah: Apply 'weak' LRB on cmpxchg and xchg In-Reply-To: References: Message-ID: <5d4PdExALw13yedls7jrZSAvOjd9hUKmw5nleWM_CMo=.117f449b-dadd-438d-a5b3-5d3fc4e74386@github.com> On Fri, 6 Nov 2020 17:45:53 GMT, Roman Kennke wrote: > It is possible to access a Reference's referent by using various cmpxchg and xchg intrinsics. When that happens, we need to apply the weak LRB to prevent resurrection. > > Testing: hotspot_gc_shenandoah Makes sense to me. Looks good. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1098 From xliu at openjdk.java.net Sun Nov 8 15:06:58 2020 From: xliu at openjdk.java.net (Xin Liu) Date: Sun, 8 Nov 2020 15:06:58 GMT Subject: Integrated: 8255562: delete UseRDPCForConstantTableBase In-Reply-To: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> References: <4Uz5ndhxS0KyAkpqwOw0A7yNDj3XvxE46JV5n7F8nFo=.406aac45-16c8-4161-bac0-64d3a8feaa8d@github.com> Message-ID: On Fri, 30 Oct 2020 05:43:25 GMT, Xin Liu wrote: > UseRDPCForConstantTableBase was a SPARC-exclusive flag. Sparc has been removed > from hotspot, so remove this flag. This pull request has now been integrated. 
Changeset: 6a183fbb Author: Xin Liu Committer: Paul Hohensee URL: https://git.openjdk.java.net/jdk/commit/6a183fbb Stats: 5 lines in 3 files changed: 1 ins; 4 del; 0 mod 8255562: delete UseRDPCForConstantTableBase Reviewed-by: simonis ------------- PR: https://git.openjdk.java.net/jdk/pull/949 From alanb at openjdk.java.net Sun Nov 8 16:31:58 2020 From: alanb at openjdk.java.net (Alan Bateman) Date: Sun, 8 Nov 2020 16:31:58 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v22] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 17:14:16 GMT, Maurizio Cimadamore wrote: >> This patch contains the changes associated with the third incubation round of the foreign memory access API incubation (see JEP 393 [1]). This iteration focuses on improving the usability of the API in 3 main ways: >> >> * first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from multiple threads >> * second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee that the memory will be deallocated, eventually >> * third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class has been added, which defines several useful dereference routines; these are really just thin wrappers around memory access var handles, but they make the barrier of entry for using this API somewhat lower. >> >> A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit of dereference.
>> >> This has all changed as per this API refresh; now a `MemoryAddress` is just a dumb carrier which wraps a pair of object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is concerned. You cannot dereference memory if you don't have a segment. This improves usability in a number of ways - first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that; otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling `MemoryAddress::asSegmentRestricted`). >> >> A list of the API, implementation and test changes is provided below. If you have any questions, or need more detailed explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide the feedback required. >> >> A big thanks to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segment would not have been possible; also I'd like to thank Paul Sandoz, whose insights on API design have been very helpful in this journey.
>> >> Thanks >> Maurizio >> >> Javadoc: >> >> http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html >> >> Specdiff: >> >> http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html >> >> CSR: >> >> https://bugs.openjdk.java.net/browse/JDK-8254163 >> >> >> >> ### API Changes >> >> * `MemorySegment` >> * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below) >> * added a no-arg factory for a native restricted segment representing entire native heap >> * rename `withOwnerThread` to `handoff` >> * add new `share` method, to create shared segments >> * add new `registerCleaner` method, to register a segment against a cleaner >> * add more helpers to create arrays from a segment e.g. `toIntArray` >> * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors) >> * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`) >> * `MemoryAddress` >> * drop `segment` accessor >> * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative to a given segment >> * `MemoryAccess` >> * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a carrier is one of the usual suspects (a Java primitive, minus `boolean`); the access mode can be simple (e.g. access base address of given segment), or indexed, in which case the accessor takes a segment and either a low-level byte offset, or a high-level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs. `getByteAtOffset` vs `getByteAtIndex`).
>> * `MemoryHandles` >> * drop `withOffset` combinator >> * drop `withStride` combinator >> * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which it is easy to derive all the other handles using plain var handle combinators. >> * `Addressable` >> * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2] to add more implementations. Clients can largely ignore this interface, which comes in really handy when defining native bindings with tools like `jextract`. >> * `MemoryLayouts` >> * A new layout, for machine addresses, has been added to the mix. >> >> >> >> ### Implementation changes >> >> There are two main things to discuss here: support for shared segments, and the general simplification of the memory access var handle support. >> >> #### Shared segments >> >> The support for shared segments cuts in pretty deep in the VM. Support for shared segments is notoriously hard to achieve, at least in a way that guarantees optimal access performance. This is caused by the fact that, if a segment is shared, it would be possible for a thread to close it while another is accessing it. >> >> After considering several options (see [3]), we zeroed in on an approach which is inspired by a happy idea that Andrew Haley had (and that he reminded me of at this year's OpenJDK committer workshop - thanks!). The idea is that if we could *freeze* the world (e.g. with a GC pause), while a segment is closed, we could then prevent segments from being accessed concurrently to a close operation. For this to work, it is crucial that no GC safepoints can occur between a segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop just right before an unsafe call).
It also relies on the fact that hotspot/C2 should not be able to propagate loads across safepoints.
>>
>> Sadly, none of these conditions seems to be valid in the current implementation, so we needed to resort to a bit of creativity. First, we noted that, if we could mark so-called *scoped* methods with an annotation, it would be very simple to check whether a thread was in the middle of a scoped method when we stopped the world for a close operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312 [4]).
>>
>> The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread, thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds. Unfortunately the machinery for async exceptions is a bit fragile (e.g. not all the code in hotspot checks for async exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that another thread is accessing the segment being closed.
>>
>> As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a bug in the user code, likely arising from lack of synchronization, and should be treated as such.
>>
>> In terms of gritty implementation, we needed to centralize memory access routines in a single place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in addition, also provided a liveness check. This way we could mark all these routines with the special `@Scoped` annotation, which tells the VM that something important is going on.
>>
>> To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`.
This class contains all the main memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the access coordinates, also a scope object, which is tested before access. A reachability fence is also thrown in the mix to make sure that the scope is kept alive during access (which is important when registering segments against cleaners).
>>
>> Of course, to make memory access safe, memory access var handles, byte buffer var handles, and byte buffer API should use the new `ScopedMemoryAccess` class instead of unsafe, so that a liveness check can be triggered (in case a scope is present).
>>
>> `ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the handshake completed successfully.
>>
>> `MemoryScope` (now significantly simplified from what we had before) has two implementations, one for confined segments and one for shared segments; the main difference between the two is what happens when the scope is closed; a confined segment sets a boolean flag to false, and returns, whereas a shared segment goes into a `CLOSING` state, then starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive` will still return `true`, while the liveness check upon memory access will fail.
>>
>> #### Memory access var handles overhaul
>>
>> The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and `long`, all other access types could be derived from this basic var handle form.
>>
>> This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g.
additional offset is injected into a base memory access var handle. >> >> This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators, which previously needed low-level access on the innards of the memory access var handle. All that code is now gone. >> >> #### Test changes >> >> Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the microbenchmarks also needed some tweaks - and some of them were also updated to also test performance in the shared segment case. >> >> [1] - https://openjdk.java.net/jeps/393 >> [2] - https://openjdk.java.net/jeps/389 >> [3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html >> [4] - https://openjdk.java.net/jeps/312 > > Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: > > - Fix post-merge issues caused by 8219014 > - Merge branch 'master' into 8254162 > - Addess remaining feedback from @AlanBateman and @mrserb > - Address comments from @AlanBateman > - Merge branch 'master' into 8254162 > - Fix issues with derived buffers and IO operations > - More 32-bit fixes for TestLayouts > - * Add final to MappedByteBuffer::SCOPED_MEMORY_ACCESS field > * Tweak TestLayouts to make it 32-bit friendly after recent MemoryLayouts tweaks > - Remove TestMismatch from 32-bit problem list > - Merge branch 'master' into 8254162 > - ... and 19 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...02f9e251 Marked as reviewed by alanb (Reviewer). 
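
Editorial illustration (not part of the original message): the close protocol for shared segments described in the quoted PR text, where a scope moves to `CLOSING`, runs the handshake, and ends up `CLOSED` or back at `ALIVE`, can be modelled with a small state machine. This is only a sketch under stated assumptions: the class `SharedScopeSketch`, its method names, and the `BooleanSupplier` standing in for the `closeScope` handshake are hypothetical and are not the JDK's `MemoryScope` implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical model of the shared-scope states described above; not JDK code. */
public class SharedScopeSketch {
    public static final int ALIVE = 0, CLOSING = 1, CLOSED = 2;

    private final AtomicInteger state = new AtomicInteger(ALIVE);
    // Assumption: stands in for the thread-local handshake; returns true when
    // no other thread was found in the middle of a scoped access.
    private final BooleanSupplier handshake;

    public SharedScopeSketch(BooleanSupplier handshake) {
        this.handshake = handshake;
    }

    /** Mirrors the quoted note: still reports true while in CLOSING. */
    public boolean isAlive() {
        return state.get() != CLOSED;
    }

    /** Liveness check performed before each access: fails in CLOSING and CLOSED. */
    public void checkValidState() {
        if (state.get() != ALIVE) {
            throw new IllegalStateException("Scope is not alive");
        }
    }

    /** ALIVE -> CLOSING -> (CLOSED on handshake success, ALIVE on failure). */
    public void close() {
        if (!state.compareAndSet(ALIVE, CLOSING)) {
            throw new IllegalStateException("Already closed or closing");
        }
        boolean success = handshake.getAsBoolean();
        state.set(success ? CLOSED : ALIVE);
        if (!success) {
            throw new IllegalStateException("Another thread is accessing the segment");
        }
    }

    public static void main(String[] args) {
        SharedScopeSketch scope = new SharedScopeSketch(() -> true);
        scope.close();
        System.out.println("alive after close: " + scope.isAlive());
    }
}
```

In this model a failed handshake leaves the scope alive, which matches the quoted statement that an exception on `close` signals a bug in user code rather than a condition to retry.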
------------- PR: https://git.openjdk.java.net/jdk/pull/548 From rkennke at openjdk.java.net Sun Nov 8 20:39:54 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Sun, 8 Nov 2020 20:39:54 GMT Subject: Integrated: 8256015: Shenandoah: Add missing Shenandoah implementation in WB_isObjectInOldGen In-Reply-To: References: Message-ID: On Sun, 8 Nov 2020 13:57:03 GMT, Roman Kennke wrote: > The test gc/TestReferenceRefersTo.java fails with Shenandoah because of a missing implementation in WB_isObjectInOldGen: > > Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/gc/shared/collectedHeap.hpp:207), pid=511307, tid=511422 > assert(kind == heap->kind()) failed: Heap kind 6 should be 1 > > This is introduced by JDK-8188055. > > Testing: hotspot_gc_shenandoah, tier1+Shenandoah, TestReferenceRefersTo.java This pull request has now been integrated. Changeset: f39a2c89 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/f39a2c89 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod 8256015: Shenandoah: Add missing Shenandoah implementation in WB_isObjectInOldGen Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1111 From dholmes at openjdk.java.net Sun Nov 8 21:45:56 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 8 Nov 2020 21:45:56 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Sat, 7 Nov 2020 17:55:34 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/runtime/synchronizer.hpp line 173: >> >>> 171: >>> 172: static MonitorList _in_use_list; >>> 173: static jint _in_use_list_ceiling; >> >> Can you add some commentary on what this ceiling is as I could not understand its role just by looking at the code. Thanks. 
>
> How about this:
>     static MonitorList _in_use_list;
>     // The ratio of the current _in_use_list count to the ceiling is used
>     // to determine if we are above MonitorUsedDeflationThreshold and need
>     // to do an async monitor deflation cycle. The ceiling is increased by
>     // AvgMonitorsPerThreadEstimate when a thread is added to the system
>     // and is decreased by AvgMonitorsPerThreadEstimate when a thread is
>     // removed from the system.
>     // Note: If the _in_use_list max exceeds the ceiling, then
>     // monitors_used_above_threshold() will use the in_use_list max instead
>     // of the thread count derived ceiling because we have used more
>     // ObjectMonitors than the estimated average.
>     static jint _in_use_list_ceiling;

Thanks for the comment. So instead of checking the threshold on each OM allocation we use this averaging technique to estimate the number of monitors in use? Can you explain how this came about rather than the simple/obvious check at allocation time? Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/642

From david.holmes at oracle.com Sun Nov 8 22:04:42 2020
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 9 Nov 2020 08:04:42 +1000
Subject: Biased locking Obsoletion
In-Reply-To:
References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com>
Message-ID: <26c5c51e-9bbe-455c-7bf0-1f7cbcd126f2@oracle.com>

Hi Martin,

On 7/11/2020 3:29 am, Doerr, Martin wrote:
> Hi Patricio,
>
> seems like nobody wanted to be the first person to reply. So I just share a few thoughts.
>
> Unfortunately, I haven't heard any feedback from end users.
> If the Biased Locking Code removal is not urgent because it's in the way for something else, I'd slightly prefer to remove it early in JDK17.
>
> My impression is that modern workloads are fine without BL, so typical JDK15 users will probably not notice it was switched off.
>
> Some old workloads are heavily affected, like SPEC jvm98.
See performance drop on Power9: > http://cr.openjdk.java.net/~mdoerr/BiasedLocking_Power9.png > and on Intel Xeon E5: > http://cr.openjdk.java.net/~mdoerr/BiasedLocking_XeonE5.png > > Are there any plans for mitigations? I don't see what mitigations are possible. We know that if you use heavily synchronized code when it is not necessary (i.e. uncontended) then BL shines at improving performance. Real "old" code would have been updated years ago to move away from the synchronized library classes (Vector, Hashtable) that typically result in these situations. Given that we can't update things like SPEC jvm98, they will show lower performance without BL. But I don't think we should care about this in 2020. Cheers, David > If so, it would be nice to implement them before finally removing BL. > > My 0.02$. > > Best regards, > Martin > > >> -----Original Message----- >> From: hotspot-runtime-dev >> On Behalf Of David Holmes >> Sent: Dienstag, 3. November 2020 22:30 >> To: Patricio Chilano ; hotspot-runtime- >> dev at openjdk.java.net; hotspot-dev developers > dev at openjdk.java.net> >> Subject: Re: Biased locking Obsoletion >> >> Expanding to hotspot-dev. >> >> >> On 4/11/2020 7:08 am, Patricio Chilano wrote: >>> Hi all, >>> >>> As discussed in 8231264, the idea was to switch biased locking to false >>> by default and deprecate all related flags with the intent to remove the >>> code in a future release unless compelling evidence showed that the code >>> is worth maintaining. >>> I see there is only one issue that was filed since biased locking was >>> disabled by default (https://github.com/openjdk/jdk/pull/542) that seems >>> to have been addressed. As per 8231264 change, the code was set to be >>> obsoleted in 16, so we are already in a position to remove biased >>> locking code unless there are arguments for the contrary. The >>> alternative would be to give more time and move biased locking >>> obsoletion to a future release. >>> Let me know your thoughts. 
>>> >>> Thanks, >>> >>> Patricio From david.holmes at oracle.com Sun Nov 8 22:15:33 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 9 Nov 2020 08:15:33 +1000 Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: <2c5e0f26-ef31-2ca4-d51f-f197efdfbcb2@oracle.com> Hi Dan, On 8/11/2020 3:40 am, Daniel D.Daugherty wrote: > On Fri, 6 Nov 2020 02:31:12 GMT, David Holmes wrote: > >>> Changes from @fisk and @dcubed-ojdk to: >>> >>> - simplify ObjectMonitor list management >>> - get rid of Type-Stable Memory (TSM) >>> >>> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >>> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >>> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >>> - a few minor regressions (<= -0.24%) >>> - Volano is 6.8% better >>> >>> Eric C. has also done promotion perf runs on these bits and says "the results look fine". >> >> src/hotspot/share/runtime/synchronizer.cpp line 1748: >> >>> 1746: int* error_cnt_p) { >>> 1747: if (n->owner_is_DEFLATER_MARKER()) { >>> 1748: // This should not happen, but if it does, it is not fatal. >> >> Deflating an in-use monitor is not fatal? Please explain how things would recover. > > When the MonitorDeflationThread is doing its final async deflation cycle before > VM shutdown, it is possible for it to finish a deflation pass on the in-use list and > then block for the final safepoint before unlinking the deflated ObjectMonitors. > > If the VMThread happens to be doing an audit_and_print_stats() call when this > happens (see the end of SafepointSynchronize::do_cleanup_tasks()), then we > don't want that audit to FAIL. 
> > As for recovery, if the VMThread is doing a final audit, then when it sees an > already deflated ObjectMonitor on the in-use list, it will take care of the unlinking > and deleting... So it isn't really an "in-use" monitor, it is a monitor that is still in the in-use-list - is that right? Thanks, David ----- > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/642 > From dholmes at openjdk.java.net Mon Nov 9 01:22:01 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 9 Nov 2020 01:22:01 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v4] In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 12:56:19 GMT, Thomas Stuefe wrote: >> Hi all, >> >> may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers. >> >> Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues. >> >> This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code. >> >> This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324. >> >> ---- >> >> This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (beside the documented one of using libjsig). It seems that the JVM_handle_xxx_functions() are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag is used quite a bit. >> >> See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history. >> >> Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can. 
I plan to add a proper regression test with a later change, but for now I don't have the time for that.
>>
>> ---
>>
>> The fixed issues:
>>
>> 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed.
>>
>> Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early.
>> But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed.
>>
>> 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls).
>>
>> 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported.
>>
>> 4) Every platform handler has this section:
>>
>>     JavaThread* thread = NULL;
>>     VMThread* vmthread = NULL;
>>     if (PosixSignals::are_signal_handlers_installed()) {
>>       if (t != NULL ){
>>         if(t->is_Java_thread()) {
>>           thread = t->as_Java_thread();
>>         }
>>         else if(t->is_VM_thread()){
>>           vmthread = (VMThread *)t;
>>         }
>>       }
>>     }
>>
>> `vmthread` is unused on all platforms and can be removed.
>>
>> 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM):
>>
>>     if (sig == SIGPIPE || sig == SIGXFSZ) {
>>       // allow chained handler to go first
>>       if (PosixSignals::chained_handler(sig, info, ucVoid)) {
>>         return true;
>>       } else {
>>         // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219
>>         return true;
>>       }
>>     }
>>
>> - On s390 and ppc, we miss SIGXFSZ handling
>> _Update: Fixed separately for easier backport, see [JDK-8255734](https://bugs.openjdk.java.net/browse/JDK-8255734)_.
>> - both paths return true - section can be shortened
>>
>> Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler.
>>
>> 6) At the end of every platform handler, before calling into fatal error handling, we unblock the signal:
>>
>>> // unmask current signal
>>> sigset_t newset;
>>> sigemptyset(&newset);
>>> sigaddset(&newset, sig);
>>> sigprocmask(SIG_UNBLOCK, &newset, NULL);
>>>
>>
>> - Use of `sigprocmask()` is UB in a multithreaded program.
>> - but then, this section is unnecessary anyway, since [JDK-8252533](https://bugs.openjdk.java.net/browse/JDK-8252533) we unmask error signals at the start of the signal handler.
>>
>> 7) the JFR crash protection is not consistently checked in all platform handlers.
>>
>> 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism).
>>
>> 9) on Linux ppc64 and AIX, we have this section:
>>
>>> if (sig == SIGILL && (pc < (address) 0x200)) {
>>>   goto report_and_die;
>>> }
>>
>> which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide).
>>
>> This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway.
>>
>> 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`.
>>
>> On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice.
>>
>> ----
>>
>> The changes in this patch:
>>
>> a) hotspot signal handling is now done by the following functions:
>>
>>
>>         |                       |
>>         v                       v
>>     javaSignalHandler     JVM_handle_linux_signal()
>>         |                  /
>>         v                 v
>>     javaSignalHandler_inner
>>         |
>>         v
>>     PosixSignals::pd_hotspot_signal_handler()
>>
>>
>> The right branch only exists to support the `AllowUserSignalHandlers` case, see [4].
>>
>> `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before.
>> `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions.
>>
>>
>> b) I commonized prologue- and epilogue coding.
>> - I simplified (4) to a single line in the shared handler
>> - I moved the JFR thread crash protection (7) up to the shared handler
>> - I moved the complete epilogue up to the shared handler.
That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6)
>> - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept.
>> - I simplified (5) and commonized it, and removed (9) completely
>> - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out.
>>
>> Thanks for reviewing.
>>
>> Testing: this patch ran through our nightlies, in an earlier form. They will be re-run some more times.
>>
>> I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change.
>>
>> Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned. See also Gerard's work on [5], which is under review too.
>>
>> ----
>>
>> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html
>> [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html
>> [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html
>> [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html
>> [5] https://bugs.openjdk.java.net/browse/JDK-8253742
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
> - Review feedback
> - Merge
> - Feedback David
> - Remove unnecessary call-once guard from PosixSignals::install_signal_handlers()
> - Initial patch

Thanks for the updates. A couple of minor (mostly pre-existing) nits in the comments.
David src/hotspot/os/posix/signals_posix.cpp line 475: > 473: // also be called by a user application, if a user application prefers to do > 474: // signal handling itself - in that case it needs to pass signals the hotspot > 475: // internally uses on to the hotspot first. Nit: s/the hotspot/the VM/ src/hotspot/os/posix/signals_posix.cpp line 480: > 478: // routine, and if it returns true (non-zero), then the signal handler must > 479: // return immediately. If the flag "abort_if_unrecognized" is true, then this > 480: // routine will never retun false (zero), but instead will execute a VM panic typo: retun src/hotspot/os/posix/signals_posix.cpp line 481: > 479: // return immediately. If the flag "abort_if_unrecognized" is true, then this > 480: // routine will never retun false (zero), but instead will execute a VM panic > 481: // routine kill the process. "to kill" ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1034 From akozlov at openjdk.java.net Mon Nov 9 01:38:57 2020 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Mon, 9 Nov 2020 01:38:57 GMT Subject: Integrated: 8255799: AArch64: CPU_A53MAC feature may be set incorrectly In-Reply-To: References: Message-ID: <2WIwu2cNZOXSmSC-_3j_gEi_BtJQUBfJpY-ifjziv00=.f73c245e-e174-4743-8bd4-f6714ba60a88@github.com> On Thu, 5 Nov 2020 21:29:37 GMT, Anton Kozlov wrote: > Follow-up patch for PR #1039. As clarified by @nick-arm, CPU_A53MAC was set to workaround old Linux bug when A53 cores may be available if only a single A57 core is reported in /proc/cpuinfo. The workaround was broken recently but the bug is assumed to be fixed everywhere, so the workaround can be removed completely. > > CCing old participants: @theRealAph @adinn This pull request has now been integrated. 
Changeset: 2c8f4e20 Author: Anton Kozlov Committer: Nick Gasson URL: https://git.openjdk.java.net/jdk/commit/2c8f4e20 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod 8255799: AArch64: CPU_A53MAC feature may be set incorrectly Reviewed-by: ngasson, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/1084 From dholmes at openjdk.java.net Mon Nov 9 03:16:59 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 9 Nov 2020 03:16:59 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v5] In-Reply-To: <-uNIXq8h8NDGpzaNFVp4loICt2jh0EGev5EneflHsX4=.149b59bc-4ffc-4056-9ac1-82d3f4710ab6@github.com> References: <-uNIXq8h8NDGpzaNFVp4loICt2jh0EGev5EneflHsX4=.149b59bc-4ffc-4056-9ac1-82d3f4710ab6@github.com> Message-ID: On Fri, 6 Nov 2020 20:13:10 GMT, Gerard Ziemski wrote: >> hi all, >> >> Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. 
This change addresses them all: >> >> #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code >> >> #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) >> >> #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) >> >> #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp >> >> #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include >> >> #6 Coleen's feedback - factored out print_signal_handlers() >> >> #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() >> >> #8 Thomas's feedback - factored out common POSIX signal initialization code >> >> #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API >> >> #10 YaSuenag's feedback - unified logging out of the scope for this fix >> >> #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > Thomas' feedback - cleanup ucontext_get_pc/ucontext_set_pc Latest changes all look good. Only open issue is the include, or not, of signal.h. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/636 From dcubed at openjdk.java.net Mon Nov 9 03:21:56 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 9 Nov 2020 03:21:56 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Sat, 7 Nov 2020 18:01:57 GMT, Daniel D. Daugherty wrote: >> Hi Dan, >> >> Overall this looks great. 
Comparing old and new code is complex but the new code on its own is generally much simpler/clearer (not all though :) ).
>>
>> I have a few nits, comments and queries below.
>>
>> Thanks,
>> David
>
> @dholmes-ora - Thanks for the review! Hmm... I'm not sure why the GitHub UI
> sent out my replies one-at-a-time. Perhaps I should have replied from the
> "files" view instead of the main PR view?

It is a deflated monitor that is still on the in-use list.

-------------

PR: https://git.openjdk.java.net/jdk/pull/642

From dholmes at openjdk.java.net Mon Nov 9 03:26:01 2020
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 9 Nov 2020 03:26:01 GMT
Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15]
In-Reply-To:
References:
Message-ID: <8bF0fXpoTGxGwg_Ea__S4LApLPhAJ9ejrOP4P_IvaiM=.14114594-11fa-415f-9008-d3be3dfaf0c3@github.com>

On Thu, 5 Nov 2020 21:26:16 GMT, Maurizio Cimadamore wrote:

>> This patch contains the changes associated with the first incubation round of the foreign linker access API incubation
>> (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]).
>>
>> The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients.
>>
>> Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible.
>>
>> A big thanks to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thanks to Paul Sandoz, who provided many insights (often by trying the bits first hand).
>>
>> Thanks
>> Maurizio
>>
>> Webrev:
>> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev
>>
>> Javadoc:
>>
>> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html
>>
>> Specdiff (relative to [3]):
>>
>> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html
>>
>> CSR:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8254232
>>
>>
>>
>> ### API Changes
>>
>> The API changes are actually rather slim:
>>
>> * `LibraryLookup`
>>   * This class allows clients to lookup symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then lookup symbols on that library.
>> * `FunctionDescriptor`
>>   * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function.
>> * `CLinker`
>>   * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls.
>>   * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place.
>>   * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back.
>> * `NativeScope`
>>   * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resource* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls.
>> * `MemorySegment`
>>   * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope.
>>
>> ### Safety
>>
>> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API).
>>
>> ### Implementation changes
>>
>> The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g.
same library cannot be loaded by different classloaders). >> >> As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign meory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. >> >> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of this changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. >> >> The main idea behind foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function. >> >> This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. >> >> For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. 
that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating points are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `Clinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. >> >> A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallingArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall. >> >> At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: >> >> * basic intepreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them in their expected registers/stack slots (note that each position in the buffer corresponds to a different register). 
This is the most general invocation mode, the more "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. >> >> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). >> >> * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performances that are on par with JNI. >> >> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost perfomances (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). >> >> Again, for more readings on the internals of the foreign linker support, please refer to [5]. >> >> #### Test changes >> >> Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. 
There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verify that the support works by checking that the native call returned the results we expected, these tests aims at checking that the set of bindings generated by the call arranger is correct. This also mean that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. >> >> Some additional microbenchmarks have been added to compare the performances of downcall/upcall with JNI. >> >> [1] - https://openjdk.java.net/jeps/389 >> [2] - https://openjdk.java.net/jeps/393 >> [3] - https://git.openjdk.java.net/jdk/pull/548 >> [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md >> [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html > > Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 64 commits: > > - Merge branch '8254162' into 8254231_linker > - Fix post-merge issues caused by 8219014 > - Merge branch 'master' into 8254162 > - Addess remaining feedback from @AlanBateman and @mrserb > - Address comments from @AlanBateman > - Fix typo in upcall helper for aarch64 > - Merge branch '8254162' into 8254231_linker > - Merge branch 'master' into 8254162 > - Fix issues with derived buffers and IO operations > - More 32-bit fixes for TestLayouts > - ... and 54 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...b38afb3f src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 60: > 58: #endif > 59: > 60: TRAPS = Thread::current(); Don't use TRAPS this way - it exists for use in signatures. 
Just declare: `Thread* THREAD = Thread::current();`

-------------

PR: https://git.openjdk.java.net/jdk/pull/634

From dcubed at openjdk.java.net Mon Nov 9 03:50:56 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Mon, 9 Nov 2020 03:50:56 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com>
Message-ID:

On Sun, 8 Nov 2020 21:43:00 GMT, David Holmes wrote:

>> How about this:
>> static MonitorList _in_use_list;
>> // The ratio of the current _in_use_list count to the ceiling is used
>> // to determine if we are above MonitorUsedDeflationThreshold and need
>> // to do an async monitor deflation cycle. The ceiling is increased by
>> // AvgMonitorsPerThreadEstimate when a thread is added to the system
>> // and is decreased by AvgMonitorsPerThreadEstimate when a thread is
>> // removed from the system.
>> // Note: If the _in_use_list max exceeds the ceiling, then
>> // monitors_used_above_threshold() will use the in_use_list max instead
>> // of the thread count derived ceiling because we have used more
>> // ObjectMonitors than the estimated average.
>> static jint _in_use_list_ceiling;
>
> Thanks for the comment. So instead of checking the threshhold on each OM allocation we use this averaging technique to estimate the number of monitors in use? Can you explain how this came about rather than the simple/obvious check at allocation time. Thanks.

I'm not sure I understand your question, but let me take a stab at it anyway...

We used to compare the sum of the in-use counts from all the in-use lists with the total population of ObjectMonitors. If that ratio was higher than MonitorUsedDeflationThreshold, then we would do an async deflation cycle.
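The ceiling rule from the quoted comment above can be modeled in a few lines. What follows is a minimal Java sketch, not the HotSpot C++ code: the class and method names are hypothetical, and the two constructor arguments are stand-ins for `AvgMonitorsPerThreadEstimate` and `MonitorUsedDeflationThreshold`.

```java
// Simplified model of the thread-count-derived ceiling described above.
// All names here are hypothetical; this is not the HotSpot implementation.
final class MonitorCeilingSketch {
    private final long avgMonitorsPerThread; // stand-in for AvgMonitorsPerThreadEstimate
    private final long thresholdPercent;     // stand-in for MonitorUsedDeflationThreshold
    private long ceiling;                    // grows and shrinks with the thread count
    private long inUse;                      // current in-use monitor count
    private long max;                        // high-water mark of the in-use count

    MonitorCeilingSketch(long avgMonitorsPerThread, long thresholdPercent) {
        this.avgMonitorsPerThread = avgMonitorsPerThread;
        this.thresholdPercent = thresholdPercent;
    }

    void threadAdded()   { ceiling += avgMonitorsPerThread; }
    void threadRemoved() { ceiling -= avgMonitorsPerThread; }

    void monitorInflated() {
        inUse++;
        if (inUse > max) {
            max = inUse;
        }
    }

    void monitorDeflated() {
        if (inUse > 0) {
            inUse--;
        }
    }

    // True when the in-use count, as a percentage of the effective ceiling,
    // exceeds the threshold. The effective ceiling is the larger of the
    // thread-derived ceiling and the observed max.
    boolean usedAboveThreshold() {
        if (thresholdPercent == 0 || inUse == 0) {
            return false; // check disabled, or nothing in use
        }
        long effectiveCeiling = Math.max(ceiling, max);
        return (inUse * 100) / effectiveCeiling > thresholdPercent;
    }
}
```

With one thread, an estimate of 1024 and a threshold of 90 (illustrative values), ten in-use monitors are far below the ceiling, so no deflation is requested. Once the max passes the ceiling, the comparison is made against the max instead.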
Since we got rid of TSM, we no longer had a population of already allocated ObjectMonitors; we had a max value instead. However, when the VM's use of ObjectMonitors is first spinning up, the max value is typically very close to the in-use count, so we would always be asking for an async deflation during that spinning-up phase.

I created the idea of a ceiling value that is tied to thread count and the AvgMonitorsPerThreadEstimate to replace the population value that we used to have. By comparing the in-use count against the ceiling value, we no longer exceed the MonitorUsedDeflationThreshold when the VM's use of ObjectMonitors is first spinning up, so we no longer do async deflations continuously during that phase.

If the max value exceeds the ceiling value, then we're using a LOT of ObjectMonitors and, in that case, we compare the in-use count against the max to determine if we're exceeding the MonitorUsedDeflationThreshold.

Does this help?

-------------

PR: https://git.openjdk.java.net/jdk/pull/642

From dholmes at openjdk.java.net Mon Nov 9 04:13:07 2020
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 9 Nov 2020 04:13:07 GMT
Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15]
In-Reply-To:
References:
Message-ID:

On Thu, 5 Nov 2020 21:26:16 GMT, Maurizio Cimadamore wrote:

>> This patch contains the changes associated with the first incubation round of the foreign linker access API incubation
>> (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]).
>>
>> The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients.
>> >> Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reasons, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. >> >> A big thank to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank to Paul Sandoz, who provided many insights (often by trying the bits first hand). >> >> Thanks >> Maurizio >> >> Webrev: >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev >> >> Javadoc: >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html >> >> Specdiff (relative to [3]): >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html >> >> CSR: >> >> https://bugs.openjdk.java.net/browse/JDK-8254232 >> >> >> >> ### API Changes >> >> The API changes are actually rather slim: >> >> * `LibraryLookup` >> * This class allows clients to lookup symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then lookup symbols on that library. >> * `FunctionDescriptor` >> * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. >> * `CLinker` >> * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. 
The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. >> * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classfication information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. >> * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back. >> * `NativeScope` >> * This is an helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resource* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only an usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. >> * `MemorySegment` >> * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. >> >> ### Safety >> >> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as it's the case for other restricted method in the foreign memory API). 
>> >> ### Implementation changes >> >> The Java changes associated with `LibraryLookup` are relative straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. same library cannot be loaded by different classloaders). >> >> As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign meory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. >> >> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of this changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. >> >> The main idea behind foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function. >> >> This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. 
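As a rough illustration of the kind of inference described in the quoted text above, here is a minimal Java sketch of System V x86-64 argument assignment for scalar arguments only. It is not the `CallArranger` code from the PR, and all names in it are hypothetical. It assumes the usual SysV rule: the first six integer or pointer arguments go in `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`, the first eight floating-point arguments go in `xmm0` to `xmm7`, and the rest go on the stack in 8-byte slots. Structs, varargs and return values are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of SysV x86-64 scalar argument assignment.
// Hypothetical names; not the CallArranger implementation.
final class SysVClassifierSketch {
    // Stand-in for the classification attribute attached to a layout.
    enum Kind { INTEGER, FLOAT }

    private static final String[] INT_REGS =
        { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    private static final String[] FLOAT_REGS =
        { "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7" };

    // Returns, for each argument in order, the register or stack slot
    // it would be moved into.
    static List<String> assign(List<Kind> args) {
        List<String> locations = new ArrayList<>();
        int nextInt = 0;
        int nextFloat = 0;
        int stackOffset = 0;
        for (Kind kind : args) {
            if (kind == Kind.INTEGER && nextInt < INT_REGS.length) {
                locations.add(INT_REGS[nextInt++]);
            } else if (kind == Kind.FLOAT && nextFloat < FLOAT_REGS.length) {
                locations.add(FLOAT_REGS[nextFloat++]);
            } else {
                // Out of registers for this class: use the next stack slot.
                locations.add("stack+" + stackOffset);
                stackOffset += 8;
            }
        }
        return locations;
    }
}
```

The two register counters advance independently, which is why the runtime has to know whether a 64-bit layout holds an integral or a floating-point value before it can place the argument.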
>> >> For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating points are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `Clinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. >> >> A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallingArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall. >> >> At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: >> >> * basic intepreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them in their expected registers/stack slots (note that each position in the buffer corresponds to a different register). 
This is the most general invocation mode, the more "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. >> >> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). >> >> * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performances that are on par with JNI. >> >> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost perfomances (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). >> >> Again, for more readings on the internals of the foreign linker support, please refer to [5]. >> >> #### Test changes >> >> Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. 
There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verify that the support works by checking that the native call returned the results we expected, these tests aims at checking that the set of bindings generated by the call arranger is correct. This also mean that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. >> >> Some additional microbenchmarks have been added to compare the performances of downcall/upcall with JNI. >> >> [1] - https://openjdk.java.net/jeps/389 >> [2] - https://openjdk.java.net/jeps/393 >> [3] - https://git.openjdk.java.net/jdk/pull/548 >> [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md >> [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html > > Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 64 commits: > > - Merge branch '8254162' into 8254231_linker > - Fix post-merge issues caused by 8219014 > - Merge branch 'master' into 8254162 > - Addess remaining feedback from @AlanBateman and @mrserb > - Address comments from @AlanBateman > - Fix typo in upcall helper for aarch64 > - Merge branch '8254162' into 8254231_linker > - Merge branch 'master' into 8254162 > - Fix issues with derived buffers and IO operations > - More 32-bit fixes for TestLayouts > - ... and 54 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...b38afb3f A high-level scan through - mostly VM files. 
src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 99:

> 97: if (thread == NULL) {
> 98: JavaVM_ *vm = (JavaVM *)(&main_vm);
> 99: vm -> functions -> AttachCurrentThreadAsDaemon(vm, &p_env, NULL);

Style nit: don't put spaces around the `->` operator.

What is the context for this being called? It looks highly suspicious to just attach the current thread to the VM this way.

src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 105:

> 103: assert(thread->is_Java_thread(), "really?");
> 104:
> 105: ThreadInVMfromNative __tiv((JavaThread *)thread);

Please use `thread->as_Java_thread()` instead of the cast.

src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 111:

> 109: }
> 110:
> 111: ResourceMark rm;

Pass `thread` to the `ResourceMark` constructor.

src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 3521:

> 3519: __ movl(Address(r15_thread, JavaThread::thread_state_offset()), _thread_in_native_trans);
> 3520:
> 3521: if (os::is_MP()) {

The assumption these days is that we are always MP and we don't litter the code with `os::is_MP()` checks any more.

src/hotspot/cpu/x86/universalUpcallHandler_x86.cpp line 53:

> 51: Symbol* sig;
> 52: } upcall_method; // jdk.internal.foreign.abi.UniversalUpcallHandler::invoke
> 53: } upcall_info;

Why is this being duplicated in platform-specific code when it appears to be common/shared?

src/hotspot/cpu/x86/universalUpcallHandler_x86.cpp line 56:

> 54:
> 55: // FIXME: This should be initialized explicitly instead of lazily/racily
> 56: static void upcall_init() {

Obviously, see all comments on the AArch64 files. It appears this should be common/shared code.

src/hotspot/share/prims/scopedMemoryAccess.cpp line 86:

> 84: void do_thread(Thread* thread) {
> 85:
> 86: JavaThread* jt = (JavaThread*)thread;

Please use `thread->as_Java_thread()` instead of the cast.
src/hotspot/share/prims/universalNativeInvoker.cpp line 40:

> 38: assert(thread->thread_state() == _thread_in_native, "thread state is: %d", thread->thread_state());
> 39: }
> 40: assert(thread->thread_state() == _thread_in_vm, "thread state is: %d", thread->thread_state());

Is there some reason you don't trust the thread-state transition code and are asserting it updates the state correctly all the time? :) There are already a number of assertions of this kind within the ThreadToNativeFromVM code.

src/java.base/share/classes/jdk/internal/invoke/NativeEntryPoint.java line 63:

> 61: }
> 62:
> 63: public static NativeEntryPoint make(long addr, String name, ABIDescriptorProxy abi, VMStorageProxy[] argMoves, VMStorageProxy[] returnMoves,

Where is name validation performed, to ensure the named native method is in fact legal and not trying to provide backdoor access to native code that should be encapsulated and protected?

src/java.base/windows/native/libjava/jni_util_md.c line 51:

> 49:
> 50: // first come, first served
> 51: if(EnumProcessModules(hProcess, hMods, sizeof(hMods), &cbNeeded)) {

Style check: space after `if`

-------------

PR: https://git.openjdk.java.net/jdk/pull/634

From dholmes at openjdk.java.net Mon Nov 9 04:13:10 2020
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 9 Nov 2020 04:13:10 GMT
Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v14]
In-Reply-To:
References:
Message-ID: <037JOlF9tavFXVI6H6AnLiI3GSpcJLiAOANXz_KTWUg=.16b28fcd-302a-42b2-b493-cd2e4e59b9b2@github.com>

On Fri, 30 Oct 2020 12:16:02 GMT, Maurizio Cimadamore wrote:

>> This patch contains the changes associated with the first incubation round of the foreign linker access API incubation
>> (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]).
>> >> The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. >> >> Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reasons, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. >> >> A big thank to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank to Paul Sandoz, who provided many insights (often by trying the bits first hand). >> >> Thanks >> Maurizio >> >> Webrev: >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev >> >> Javadoc: >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html >> >> Specdiff (relative to [3]): >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html >> >> CSR: >> >> https://bugs.openjdk.java.net/browse/JDK-8254232 >> >> >> >> ### API Changes >> >> The API changes are actually rather slim: >> >> * `LibraryLookup` >> * This class allows clients to lookup symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then lookup symbols on that library. >> * `FunctionDescriptor` >> * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. 
A function descriptor is used to describe the signature of a native function. >> * `CLinker` >> * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. >> * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classfication information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. >> * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back. >> * `NativeScope` >> * This is an helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resource* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only an usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. >> * `MemorySegment` >> * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. >> >> ### Safety >> >> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. 
For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as it's the case for other restricted method in the foreign memory API). >> >> ### Implementation changes >> >> The Java changes associated with `LibraryLookup` are relative straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. same library cannot be loaded by different classloaders). >> >> As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign meory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. >> >> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of this changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. 
>>
>> The main idea behind the foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance), a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function.
>>
>> This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference.
>>
>> For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating point values are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly.
>>
>> A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor`, what is the set of bindings associated with the downcall/upcall.
>>
>> At the heart of the foreign linker support is the `ProgrammableInvoker` class.
This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below:
>>
>> * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them in their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the more "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place.
>>
>> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones).
>>
>> * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI.
>>
>> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there.
We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized).
>>
>> Again, for further reading on the internals of the foreign linker support, please refer to [5].
>>
>> #### Test changes
>>
>> Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on.
>>
>> Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI.
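As a rough illustration of the classification step and of the white-box testing style described in the quoted text, here is a toy sketch in Java. It is not the `CallArranger` code: `SysVClassifierSketch`, `Kind` and `assign` are hypothetical names, only scalar arguments are handled (no structs, no varargs), and the only facts relied upon are the SysV x86-64 argument registers (`rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9` for integral values, `xmm0`-`xmm7` for floating point values). The `stack+N` labels are an invented notation for stack slots.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of argument classification; hypothetical, not part of the JDK. */
final class SysVClassifierSketch {

    /** Hypothetical stand-in for the classification attribute carried by a layout. */
    enum Kind { INTEGER, FLOAT }

    // SysV x86-64 argument registers, in assignment order.
    private static final String[] INT_REGS =
            {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    private static final String[] FLOAT_REGS =
            {"xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"};

    /**
     * Returns, for each argument, the register or stack slot it would be moved to.
     * Each entry plays the role of the target of one "move" step in the recipe.
     */
    static List<String> assign(List<Kind> args) {
        List<String> targets = new ArrayList<>();
        int nextInt = 0;
        int nextFloat = 0;
        int stackOffset = 0;
        for (Kind kind : args) {
            if (kind == Kind.INTEGER && nextInt < INT_REGS.length) {
                targets.add(INT_REGS[nextInt++]);
            } else if (kind == Kind.FLOAT && nextFloat < FLOAT_REGS.length) {
                targets.add(FLOAT_REGS[nextFloat++]);
            } else {
                // Registers of this class are exhausted: spill to the stack.
                targets.add("stack+" + stackOffset);
                stackOffset += 8;
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // Mixed signature, e.g. something shaped like f(long, double, int).
        System.out.println(assign(List.of(Kind.INTEGER, Kind.FLOAT, Kind.INTEGER)));
    }
}
```

A white-box test in this style asserts on the returned list of targets rather than on the result of an actual native call, which is why such checks can run on any host platform.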
>>
>> [1] - https://openjdk.java.net/jeps/389
>> [2] - https://openjdk.java.net/jeps/393
>> [3] - https://git.openjdk.java.net/jdk/pull/548
>> [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md
>> [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html
>
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix typo in upcall helper for aarch64

src/java.base/share/classes/java/lang/System.java line 2086:

> 2084:                 break;
> 2085:             case "allow":
> 2086:                 allowSecurityManager = MAYBE;

Why is this removed? I don't see the connection.

-------------

PR: https://git.openjdk.java.net/jdk/pull/634

From dholmes at openjdk.java.net Mon Nov 9 04:13:09 2020
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 9 Nov 2020 04:13:09 GMT
Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15]
In-Reply-To: References: Message-ID:

On Fri, 6 Nov 2020 21:42:41 GMT, Coleen Phillimore wrote:

>> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 64 commits:
>>
>>  - Merge branch '8254162' into 8254231_linker
>>  - Fix post-merge issues caused by 8219014
>>  - Merge branch 'master' into 8254162
>>  - Addess remaining feedback from @AlanBateman and @mrserb
>>  - Address comments from @AlanBateman
>>  - Fix typo in upcall helper for aarch64
>>  - Merge branch '8254162' into 8254231_linker
>>  - Merge branch 'master' into 8254162
>>  - Fix issues with derived buffers and IO operations
>>  - More 32-bit fixes for TestLayouts
>>  - ...
and 54 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...b38afb3f
>
> src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 55:
>
>> 53:
>> 54: // FIXME: This should be initialized explicitly instead of lazily/racily
>> 55: static void upcall_init() {
>
> The FIXME is right: this should be initialized as a well-known class and referred to here as SystemDictionary::ProgrammableUpcallHandler_klass(). This really doesn't belong here.

I agree with Coleen.

-------------

PR: https://git.openjdk.java.net/jdk/pull/634

From patricio.chilano.mateo at oracle.com Mon Nov 9 04:23:09 2020
From: patricio.chilano.mateo at oracle.com (Patricio Chilano)
Date: Sun, 8 Nov 2020 23:23:09 -0500
Subject: Biased locking Obsoletion
In-Reply-To: <8ecc52b9-132c-ebde-5759-b1829caaacc5@redhat.com>
References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <8ecc52b9-132c-ebde-5759-b1829caaacc5@redhat.com>
Message-ID: <4d2609ca-05ed-195f-6e93-7c14e61cbe01@oracle.com>

Hi Andrew,

On 11/6/20 1:36 PM, Andrew Haley wrote:
> On 11/6/20 5:29 PM, Doerr, Martin wrote:
>> My impression is that modern workloads are fine without BL, so typical JDK15 users will probably not notice it was switched off.
> I don't believe they are, because there are several (many?) places in
> the Java library that perform badly. In particular, see JDK-8254078,
> DataOutputStream is very slow post-disabling of Biased Locking.

I looked at JDK-8254078 and ran some small tests on the JMH benchmarks defined in DataOutputStreamTest.java so I could have a better picture of the issue. I see that in each of those `@Benchmark` methods we are essentially calling in a loop the write() method of BufferedOutputStream, ByteArrayOutputStream or FileOutputStream respectively. In both BufferedOutputStream and ByteArrayOutputStream write() is synchronized and does very little work (buf[count++] = byte), so for those cases the benchmark can basically be reduced to:

while (true) {
    synchronized (BufferedOutputStream/ByteArrayOutputStream object) {
        buf[count++] = byte;
    }
}

Since the workload is single-threaded this particular benchmark is ideal for biased locking. (Although from the performance numbers posted it seems BufferedOutputStream didn't get that much worse with BL disabled. Maybe it was an outlier.) On the other hand FileOutputStream's write() is not synchronized, but rather is a native call that ends up in a write() syscall, so it makes sense that performance didn't change with or without BL for the dataOutputStreamOverRawFileStream case.

I ran a trace of all synchronization attempts for the 3 benchmarks just to look at the numbers. I uploaded all the results to http://cr.openjdk.java.net/~pchilanomate/jmh%208254078trace/ but here is a summary for the "short" case (only 1 iteration/1s - 1 warmup/1s):

dataOutputStreamOverBufferedFileStream:
 Count   Thread               Class
417792   0x00007fb0a05b3a60   java.io.BufferedOutputStream
  3831   0x00007fb0a0026ec0   java.lang.Object
  2111   0x00007fb0a0026ec0   java.util.concurrent.ConcurrentHashMap$Node
   838   0x00007fb0a0026ec0   java.util.zip.Inflater$InflaterZStreamRef

dataOutputStreamOverByteArray:
 Count   Thread               Class
417894   0x00007fd7cc5b3850   java.io.ByteArrayOutputStream
  3830   0x00007fd7cc026ec0   java.lang.Object
  2252   0x00007fd7cc026ec0   java.util.concurrent.ConcurrentHashMap$Node
   838   0x00007fd7cc026ec0   java.util.zip.Inflater$InflaterZStreamRef

dataOutputStreamOverRawFileStream:
 Count   Thread               Class
  3831   0x00007eff78026ec0   java.lang.Object
  2113   0x00007eff78026ec0   java.util.concurrent.ConcurrentHashMap$Node
   838   0x00007eff78026ec0   java.util.zip.Inflater$InflaterZStreamRef

So again, it makes sense that benchmark dataOutputStreamOverRawFileStream is not affected with BL disabled but the other two are.
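For readers who want to try the reduced loop, here is a minimal runnable sketch in Java, under a few assumptions. `ReducedWriteLoop` and `UnsyncByteSink` are hypothetical names and not JDK classes; the sketch relies only on `ByteArrayOutputStream.write(int)` being a synchronized method, and contrasts it with a buffer that takes no monitor at all. It is a shape illustration, not a substitute for the JMH benchmarks in DataOutputStreamTest.java, and it makes no timing claims.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical reduction of the benchmark loop; not taken from the JDK sources. */
final class ReducedWriteLoop {

    /** Hypothetical unsynchronized byte buffer, shown only for contrast. */
    static final class UnsyncByteSink {
        private byte[] buf;
        private int count;

        UnsyncByteSink(int capacity) {
            buf = new byte[Math.max(1, capacity)];
        }

        void write(int b) {
            if (count == buf.length) {
                buf = Arrays.copyOf(buf, buf.length * 2); // grow when full
            }
            buf[count++] = (byte) b; // same tiny body, but no monitor enter/exit
        }

        byte[] toByteArray() {
            return Arrays.copyOf(buf, count);
        }
    }

    /** One monitor enter/exit per byte: write(int) is synchronized. */
    static byte[] viaSynchronizedStream(int n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(n);
        for (int i = 0; i < n; i++) {
            out.write(i);
        }
        return out.toByteArray();
    }

    /** Same bytes written without any locking. */
    static byte[] viaUnsynchronizedSink(int n) {
        UnsyncByteSink out = new UnsyncByteSink(n);
        for (int i = 0; i < n; i++) {
            out.write(i);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        byte[] a = viaSynchronizedStream(n);
        byte[] b = viaUnsynchronizedSink(n);
        System.out.println("same contents: " + Arrays.equals(a, b));
    }
}
```

Both methods produce the same bytes; the only difference is whether each one-byte write pays for a lock acquisition, which is the cost that biased locking used to hide in the single-threaded case.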
With your patch to DataOutputStream we keep calling a synchronized write() method for the BufferedOutputStream and ByteArrayOutputStream cases, but now we copy more bytes each time we synchronize. The results again for the "short" case (only 1 iteration/1s - 1 warmup/1s):
(all uploaded to http://cr.openjdk.java.net/~pchilanomate/jmh%208254078trace/patched/)

dataOutputStreamOverBufferedFileStream:
 Count   Thread               Class
354304   0x00007f540c5d3d50   java.io.BufferedOutputStream
  3830   0x00007f540c026ec0   java.lang.Object
  2110   0x00007f540c026ec0   java.util.concurrent.ConcurrentHashMap$Node
   838   0x00007f540c026ec0   java.util.zip.Inflater$InflaterZStreamRef

dataOutputStreamOverByteArray:
 Count   Thread               Class
387261   0x00007fd80c5cb750   java.io.ByteArrayOutputStream
  3830   0x00007fd80c026ec0   java.lang.Object
  2252   0x00007fd80c026ec0   java.util.concurrent.ConcurrentHashMap$Node
   838   0x00007fd80c026ec0   java.util.zip.Inflater$InflaterZStreamRef

dataOutputStreamOverRawFileStream:
 Count   Thread               Class
  3830   0x00007efd28026ec0   java.lang.Object
  2112   0x00007efd28026ec0   java.util.concurrent.ConcurrentHashMap$Node
   838   0x00007efd28026ec0   java.util.zip.Inflater$InflaterZStreamRef

Still a big synchronization count on java.io.BufferedOutputStream and java.io.ByteArrayOutputStream as expected, but since we are copying more bytes each time performance still improves relative to the unpatched version. dataOutputStreamOverRawFileStream remains unaltered in terms of synchronization, also as expected, but it benefits greatly, indirectly, because we are now making fewer native calls for each run of the benchmark.

Now, the way I see it is that the problem appears because we are calling a synchronized method when we really shouldn't. If we are in a single-threaded case the DataOutputStream objects should have been backed by an output stream that is unsynchronized.
So it seems to me that the issue is we are missing an unsynchronized version of BufferedOutputStream and ByteArrayOutputStream. Shouldn't we provide that in the JDK library to solve for these cases? That way these kinds of workloads will run even faster than with biased locking.

Thanks,
Patricio

> This is not a solved problem, and I don't know what a "typical"
> user is, but some users may experience significant performance
> degradation. This is not a solved problem.

From stuefe at openjdk.java.net Mon Nov 9 05:50:13 2020
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Mon, 9 Nov 2020 05:50:13 GMT
Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v5]
In-Reply-To: References: Message-ID:

> Hi all,
>
> may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers.
>
> Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues.
>
> This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code.
>
> This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324.
>
> ----
>
> This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (besides the documented one of using libjsig). It seems that the JVM_handle_xxx_functions() are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag is used quite a bit.
>
> See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history.
>
> Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can.
I plan to add a proper regression test with a later change, but for now I don't have the time for that.
>
> ---
>
> The fixed issues:
>
> 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed.
>
> Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early.
> But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed.
>
> 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls).
>
> 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported.
>
> 4) Every platform handler has this section:
>
>     JavaThread* thread = NULL;
>     VMThread* vmthread = NULL;
>     if (PosixSignals::are_signal_handlers_installed()) {
>       if (t != NULL ){
>         if(t->is_Java_thread()) {
>           thread = t->as_Java_thread();
>         }
>         else if(t->is_VM_thread()){
>           vmthread = (VMThread *)t;
>         }
>       }
>     }
>
> `vmthread` is unused on all platforms and can be removed.
>
> 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM):
>
>     if (sig == SIGPIPE || sig == SIGXFSZ) {
>       // allow chained handler to go first
>       if (PosixSignals::chained_handler(sig, info, ucVoid)) {
>         return true;
>       } else {
>         // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219
>         return true;
>       }
>     }
>
> - On s390 and ppc, we miss SIGXFSZ handling
>   _Update: Fixed separately for easier backport, see [JDK-8255734](https://bugs.openjdk.java.net/browse/JDK-8255734)_.
> - both paths return true - section can be shortened
>
> Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler.
>
> 6) At the end of every platform handler, before calling into fatal error handling, we unblock the signal:
>
>> // unmask current signal
>> sigset_t newset;
>> sigemptyset(&newset);
>> sigaddset(&newset, sig);
>> sigprocmask(SIG_UNBLOCK, &newset, NULL);
>>
>
> - Use of `sigprocmask()` is UB in a multithreaded program.
> - but then, this section is unnecessary anyway, since [JDK-8252533](https://bugs.openjdk.java.net/browse/JDK-8252533) we unmask error signals at the start of the signal handler.
>
> 7) the JFR crash protection is not consistently checked in all platform handlers.
>
> 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism).
>
> 9) on Linux ppc64 and AIX, we have this section:
>
>> if (sig == SIGILL && (pc < (address) 0x200)) {
>>   goto report_and_die;
>> }
>
> which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide).
>
> This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway.
>
> 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`.
>
> On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice.
>
> ----
>
> The changes in this patch:
>
> a) hotspot signal handling is now done by the following functions:
>
>
>         |                     |
>         v                     v
> javaSignalHandler    JVM_handle_linux_signal()
>         |             /
>         v            v
>     javaSignalHandler_inner
>             |
>             v
> PosixSignals::pd_hotspot_signal_handler()
>
>
> The right branch only exists to support the `AllowUserSignalHandlers` case, see [4].
>
> `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before.
> `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions.
>
>
> b) I commonized prologue- and epilogue coding.
> - I simplified (4) to a single line in the shared handler
> - I moved the JFR thread crash protection (7) up to the shared handler
> - I moved the complete epilogue up to the shared handler.
That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6)
> - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept.
> - I simplified (5) and commonized it, and removed (9) completely
> - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out.
>
> Thanks for reviewing.
>
> Testing: this patch ran through our nightlies, in an earlier form. They will be re-run some more times.
>
> I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change.
>
> Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned up. See also Gerard's work on [5], which is under review too.
>
> ----
>
> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html
> [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html
> [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html
> [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html
> [5] https://bugs.openjdk.java.net/browse/JDK-8253742

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  Fix comments

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1034/files
  - new: https://git.openjdk.java.net/jdk/pull/1034/files/d9a2deff..2f466565

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1034&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1034&range=03-04

  Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1034.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk
pull/1034/head:pull/1034

PR: https://git.openjdk.java.net/jdk/pull/1034

From stuefe at openjdk.java.net Mon Nov 9 05:50:14 2020
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Mon, 9 Nov 2020 05:50:14 GMT
Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v4]
In-Reply-To: References: Message-ID:

On Mon, 9 Nov 2020 01:19:01 GMT, David Holmes wrote:

> Thanks for the updates. A couple of minor (mostly pre-existing) nits in the comments.
>
> David

Thank you David!

-------------

PR: https://git.openjdk.java.net/jdk/pull/1034

From david.holmes at oracle.com Mon Nov 9 06:06:34 2020
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 9 Nov 2020 16:06:34 +1000
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com>
Message-ID:

Hi Dan,

On 9/11/2020 1:50 pm, Daniel D.Daugherty wrote:
> On Sun, 8 Nov 2020 21:43:00 GMT, David Holmes wrote:
>
>>> How about this:
>>>   static MonitorList _in_use_list;
>>>   // The ratio of the current _in_use_list count to the ceiling is used
>>>   // to determine if we are above MonitorUsedDeflationThreshold and need
>>>   // to do an async monitor deflation cycle. The ceiling is increased by
>>>   // AvgMonitorsPerThreadEstimate when a thread is added to the system
>>>   // and is decreased by AvgMonitorsPerThreadEstimate when a thread is
>>>   // removed from the system.
>>>   // Note: If the _in_use_list max exceeds the ceiling, then
>>>   // monitors_used_above_threshold() will use the in_use_list max instead
>>>   // of the thread count derived ceiling because we have used more
>>>   // ObjectMonitors than the estimated average.
>>>   static jint _in_use_list_ceiling;
>>
>> Thanks for the comment.
So instead of checking the threshold on each OM allocation we use this averaging technique to estimate the number of monitors in use? Can you explain how this came about rather than the simple/obvious check at allocation time? Thanks.
>
> I'm not sure I understand your question, but let me take a stab at it anyway...
>
> We used to compare the sum of the in-use counts from all the in-use lists
> with the total population of ObjectMonitors. If that ratio was higher than
> MonitorUsedDeflationThreshold, then we would do an async deflation cycle.
> Since we got rid of TSM, we no longer had a population of already allocated
> ObjectMonitors, we had a max value instead. However, when the VM's use
> of ObjectMonitors is first spinning up, the max value is typically very close
> to the in-use count so we would always be asking for an async-deflation
> during that spinning up phase.
>
> I created the idea of a ceiling value that is tied to thread count and the
> AvgMonitorsPerThreadEstimate to replace the population value that we
> used to have. By comparing the in-use count against the ceiling value, we
> no longer exceed the MonitorUsedDeflationThreshold when the VM's use
> of ObjectMonitors is first spinning up so we no longer do async deflations
> continuously during that phase. If the max value exceeds the ceiling value,
> then we're using a LOT of ObjectMonitors and, in that case, we compare
> the in-use count against the max to determine if we're exceeding the
> MonitorUsedDeflationThreshold.
>
> Does this help?

It helps but I'm still wrestling with what MonitorUsedDeflationThreshold actually means now.

So the existing MonitorUsedDeflationThreshold is used as a measure of the proportion of monitors actually in-use compared to the number of monitors pre-allocated. If an inflation request requires a new block to be allocated and we're above MonitorUsedDeflationThreshold % then a request for async deflation occurs (when we actually check).
The new code, IIUC, says, let's assume we expect AvgMonitorsPerThreadEstimate monitors-per-thread. If the number of monitors in-use is > MonitorUsedDeflationThreshold % of (AvgMonitorsPerThreadEstimate * number_of_threads), then we request async deflation.

So ... obviously we need some kind of watermark based system for requesting deflation otherwise there will be far too many deflation requests. And we also don't want to have to check for exceeding the threshold on every monitor allocation. So the deflation thread will wake up periodically and check if the threshold is exceeded.

Okay ... so then it comes down to deciding whether AvgMonitorsPerThreadEstimate is the best way to establish the watermark and what the default value should be. This doesn't seem like something that an application developer could reasonably try to estimate so it is just going to be a tuning knob they adjust somewhat arbitrarily. I assume the 1024 default came from tuning something? Have you looked at the effect on memory use these changes have (i.e. peak RSS use)? Did your performance measurements look at using different values? (I can imagine that with enough memory we can effectively disable deflation and so potentially increase performance. OTOH maybe deflation is so infrequent it is a non-issue.)

I have to confess that I never really thought about the old set of heuristics for this, but the fact we're changing the heuristics does raise a concern about what impact applications may see.

BTW MonitorUsedDeflationThreshold should really be diagnostic not experimental, as real applications may need to tune it (and people often don't want to use experimental flags in production as a matter of policy).
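To make the heuristic under discussion concrete, here is a small model of it written in Java. This is only a sketch of the logic as described in this thread, not the HotSpot C++ code: `DeflationHeuristic` and its method names are hypothetical, the per-thread estimate of 1024 is the default mentioned above, and the threshold percentage is passed in as a parameter because its default is not stated here.

```java
/** Hypothetical Java model of the ceiling-based check; the real logic lives in HotSpot C++. */
final class DeflationHeuristic {
    private final long avgMonitorsPerThreadEstimate;
    private final int thresholdPercent;
    private long ceiling; // models _in_use_list_ceiling

    DeflationHeuristic(long avgMonitorsPerThreadEstimate, int thresholdPercent) {
        this.avgMonitorsPerThreadEstimate = avgMonitorsPerThreadEstimate;
        this.thresholdPercent = thresholdPercent;
    }

    /** The ceiling grows by the per-thread estimate when a thread is added. */
    void threadAdded() {
        ceiling += avgMonitorsPerThreadEstimate;
    }

    /** The ceiling shrinks by the per-thread estimate when a thread is removed. */
    void threadRemoved() {
        ceiling -= avgMonitorsPerThreadEstimate;
    }

    long ceiling() {
        return ceiling;
    }

    /**
     * Compares the in-use count against the ceiling, or against the observed
     * maximum when that maximum has already exceeded the ceiling.
     */
    boolean monitorsUsedAboveThreshold(long inUseCount, long inUseMax) {
        long denominator = Math.max(ceiling, inUseMax);
        if (denominator <= 0) {
            return false; // no threads registered and nothing allocated yet
        }
        return inUseCount * 100 / denominator > thresholdPercent;
    }

    public static void main(String[] args) {
        DeflationHeuristic h = new DeflationHeuristic(1024, 90); // 90 is an assumed threshold
        h.threadAdded();
        h.threadAdded();
        // During spin-up the max is close to the in-use count, but the ceiling dominates.
        System.out.println(h.monitorsUsedAboveThreshold(100, 100));
        System.out.println(h.monitorsUsedAboveThreshold(2000, 2000));
    }
}
```

With two threads the ceiling is 2048, so an in-use count of 100 is far below any high threshold even though it equals the max, which is the spin-up case Dan describes. Once the max exceeds the ceiling, the max becomes the denominator instead.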
Thanks,
David
-----

> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/642
>

From ngasson at openjdk.java.net Mon Nov 9 06:10:03 2020
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Mon, 9 Nov 2020 06:10:03 GMT
Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15]
In-Reply-To:
References:
Message-ID: <4bUSiuyN2A59lbK4owjFzfRXLm4G49lJo4ObxopBTrs=.bc8c90c7-f831-4814-b5e4-c1f1fcc4e98f@github.com>

On Thu, 5 Nov 2020 21:26:16 GMT, Maurizio Cimadamore wrote:

>> This patch contains the changes associated with the first incubation round of the foreign linker access API (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]).
>>
>> The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients.
>>
>> Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible.
>>
>> A big thank you to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank you to Paul Sandoz, who provided many insights (often by trying the bits first hand).
>>
>> Thanks
>> Maurizio
>>
>> Webrev:
>> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev
>>
>> Javadoc:
>>
>> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html
>>
>> Specdiff (relative to [3]):
>>
>> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html
>>
>> CSR:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8254232
>>
>>
>>
>> ### API Changes
>>
>> The API changes are actually rather slim:
>>
>> * `LibraryLookup`
>>   * This class allows clients to look up symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then look up symbols on that library.
>> * `FunctionDescriptor`
>>   * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function.
>> * `CLinker`
>>   * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle and a `FunctionDescriptor`, and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls.
>>   * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place.
>>   * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back.
>> * `NativeScope`
>>   * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resources* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls.
>> * `MemorySegment`
>>   * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope.
>>
>> ### Safety
>>
>> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API).
>>
>> ### Implementation changes
>>
>> The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. same library cannot be loaded by different classloaders).
>> >> As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. >> >> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. >> >> The main idea behind foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function. >> >> This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. >> >> For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating points are passed in different registers by most ABIs. 
For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. >> >> A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall. >> >> At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: >> >> * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them in their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the more "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. 
>> >> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). >> >> * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI. >> >> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). >> >> Again, for more reading on the internals of the foreign linker support, please refer to [5]. >> >> #### Test changes >> >> Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. 
There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. >> >> Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI. >> >> [1] - https://openjdk.java.net/jeps/389 >> [2] - https://openjdk.java.net/jeps/393 >> [3] - https://git.openjdk.java.net/jdk/pull/548 >> [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md >> [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html > > Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 64 commits: > > - Merge branch '8254162' into 8254231_linker > - Fix post-merge issues caused by 8219014 > - Merge branch 'master' into 8254162 > - Addess remaining feedback from @AlanBateman and @mrserb > - Address comments from @AlanBateman > - Fix typo in upcall helper for aarch64 > - Merge branch '8254162' into 8254231_linker > - Merge branch 'master' into 8254162 > - Fix issues with derived buffers and IO operations > - More 32-bit fixes for TestLayouts > - ... 
and 54 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...b38afb3f src/hotspot/share/opto/output.cpp line 1697: > 1695: current_offset = cb->insts_size(); > 1696: > 1697: assert(!is_mcall || (call_returns[block->_pre_order] == (uint) current_offset), "ret_addr_offset() did not match size of emitted code"); This assertion is too strong: on AArch64 we generate additional instructions after the BLR (call) instruction for certain types of call. For example 0x0000ffff790f00dc: adr x9, 0x0000ffff790f00f4 0x0000ffff790f00e0: mov x8, #0x5714 // #22292 0x0000ffff790f00e4: movk x8, #0x8d3d, lsl #16 0x0000ffff790f00e8: movk x8, #0xffff, lsl #32 0x0000ffff790f00ec: stp xzr, x9, [sp, #-16]! 0x0000ffff790f00f0: blr x8 0x0000ffff790f00f4: add sp, sp, #0x10 <== ret_addr_offset() is here 0x0000ffff790f00f8: Address 0x0000ffff790f00f8 is out of bounds. <== current_offset is here I think the `==` should be `<=`. (Although this still fails sometimes on AArch64, but I believe it exposes a real bug. I've opened JDK-8256025 and will fix this shortly.) ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From dholmes at openjdk.java.net Mon Nov 9 06:11:03 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 9 Nov 2020 06:11:03 GMT Subject: RFR: JDK-8255711: Fix and unify hotspot signal handlers [v5] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 05:50:13 GMT, Thomas Stuefe wrote: >> Hi all, >> >> may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers. >> >> Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues. >> >> This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code. >> >> This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324. 
>> >> ---- >> >> This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (beside the documented one of using libjsig). It seems that the JVM_handle_xxx_functions() are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag is used quite a bit. >> >> See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history. >> >> Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can. I plan to add a proper regression test with a later change, but for now I don't have the time for that. >> >> --- >> >> The fixed issues: >> >> 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed. >> >> Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early. >> But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed. >> >> 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls). >> >> 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported. 
>> >> 4) Every platform handler has this section: >> >> JavaThread* thread = NULL; >> VMThread* vmthread = NULL; >> if (PosixSignals::are_signal_handlers_installed()) { >> if (t != NULL ){ >> if(t->is_Java_thread()) { >> thread = t->as_Java_thread(); >> } >> else if(t->is_VM_thread()){ >> vmthread = (VMThread *)t; >> } >> } >> } >> >> `vmthread` is unused on all platforms and can be removed. >> >> 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM): >> >> if (sig == SIGPIPE || sig == SIGXFSZ) { >> // allow chained handler to go first >> if (PosixSignals::chained_handler(sig, info, ucVoid)) { >> return true; >> } else { >> // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219 >> return true; >> } >> } >> >> - On s390 and ppc, we miss SIGXFSZ handling >> _Update: Fixed separately for easier backport, see [JDK-8255734](https://bugs.openjdk.java.net/browse/JDK-8255734)_. >> - both paths return true - section can be shortened >> >> Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler. >> >> 6) At the end of every platform handler, before calling into fatal error handling, we unblock the signal: >> >>> // unmask current signal >>> sigset_t newset; >>> sigemptyset(&newset); >>> sigaddset(&newset, sig); >>> sigprocmask(SIG_UNBLOCK, &newset, NULL); >>> >> >> - Use of `sigprocmask()` is UB in a multithreaded program. >> - but then, this section is unnecessary anyway, since [JDK-8252533](https://bugs.openjdk.java.net/browse/JDK-8252533) we unmask error signals at the start of the signal handler. >> >> 7) the JFR crash protection is not consistently checked in all platform handlers. 
>> >> 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism). >> >> 9) on Linux ppc64 and AIX, we have this section: >> >>> if (sig == SIGILL && (pc < (address) 0x200)) { >>> goto report_and_die; >>> } >> >> which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide). >> >> This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway. >> >> 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`. >> >> On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice. >> >> ---- >> >> The changes in this patch: >> >> a) hotspot signal handling is now done by the following functions: >> >> >> | | >> v v >> javaSignalHandler JVM_handle_linux_signal() >> | / >> v v >> javaSignalHandler_inner >> | >> v >> PosixSignals::pd_hotspot_signal_handler() >> >> >> The right branch only exists to support the `AllowUserSignalHandlers` case, see [4]. >> >> `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before. 
>> `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions. >> >> >> b) I commonized prologue- and epilogue coding. >> - I simplified (4) to a single line in the shared handler >> - I moved the JFR thread crash protection (7) up to the shared handler >> - I moved the complete epilogue up to the shared handler. That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6) >> - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept. >> - I simplified (5) and commonized it, and removed (9) completely >> - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out. >> >> Thanks for reviewing. >> >> Testing: this patch ran through our nightlies, in an earlier form. They will be re-run some more times. >> >> I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change. >> >> Please note that I had to draw a line somewhere - this is an open ended issue and a lot more could be cleaned. See also Gerard's work on [5], which is under review too. 
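Editorial sketch, not part of the patch: points (5) and (6) in the list above could look roughly like the following once shortened. The function names `call_chained_handler`, `handle_ignored_signal` and `unblock_signal` are hypothetical stand-ins invented for this illustration; they are not HotSpot APIs. The sketch assumes a POSIX system. POSIX describes the use of `sigprocmask()` in a multi-threaded process as unspecified, which is why `pthread_sigmask()` is used here to change the mask of the calling thread only.

```cpp
#include <csignal>
#include <pthread.h>

// Hypothetical stand-in for a chained third-party handler lookup.
// It always reports "no chained handler" in this sketch.
static bool call_chained_handler(int sig) {
  (void)sig;
  return false;
}

// Point (5): both branches of the original code returned true, so the
// check collapses to "give the chained handler a chance, then report
// the signal as handled".
static bool handle_ignored_signal(int sig) {
  if (sig == SIGPIPE || sig == SIGXFSZ) {
    call_chained_handler(sig);  // allow a chained handler to go first
    return true;                // handled either way
  }
  return false;                 // not one of the ignored signals
}

// Point (6): unblock a single signal for the calling thread only.
// Returns 0 on success, an error number otherwise.
static int unblock_signal(int sig) {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, sig);
  return pthread_sigmask(SIG_UNBLOCK, &set, nullptr);
}
```

A caller would use `handle_ignored_signal(sig)` early in the shared handler and return immediately when it yields true; whether the real patch structures it this way is not shown in this mail.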
>> >> ---- >> >> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html >> [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html >> [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html >> [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html >> [5] https://bugs.openjdk.java.net/browse/JDK-8253742 > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Fix comments Marked as reviewed by dholmes (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1034 From eosterlund at openjdk.java.net Mon Nov 9 08:38:01 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 9 Nov 2020 08:38:01 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Sat, 7 Nov 2020 17:01:42 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 380: >> >>> 378: if (event.should_commit()) { >>> 379: event.set_monitorClass(object()->klass()); >>> 380: event.set_address((uintptr_t)this); >> >> This looks wrong - the event should refer to the Object whose monitor we have entered, not the OM itself. > > I noticed that in my preliminary review of Erik's changes. He checked > with the JFR guys and they said it just needed to be an address and > does not have to refer to the Object. > > @fisk - can you think of a comment we should add here? We could write something along the lines of "An address that is 'unique enough', such that events close in time and with the same address are likely (but not guaranteed) to belong to the same object". 
This uniqueness property has always been more of a heuristic thing than anything else, as deflation shuffles the addresses around. Taking the this pointer vs an offset into the this pointer does however serve the exact same purpose; there was never any correlation to the contents of the object field. >> src/hotspot/share/runtime/objectMonitor.cpp line 1472: >> >>> 1470: event->set_monitorClass(monitor->object()->klass()); >>> 1471: event->set_timeout(timeout); >>> 1472: event->set_address((uintptr_t)monitor); >> >> Again the event should refer to the Object, not the OM. > > I noticed that in my preliminary review of Erik's changes. He checked > with the JFR guys and they said it just needed to be an address and > does not have to refer to the Object. > > @fisk - can you think of a comment we should add here? I wrote one in the section above, hope it is useful. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From eosterlund at openjdk.java.net Mon Nov 9 08:49:00 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 9 Nov 2020 08:49:00 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Sat, 7 Nov 2020 17:11:42 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 94: >> >>> 92: // Find next live ObjectMonitor. >>> 93: ObjectMonitor* next = m; >>> 94: while (next != NULL && next->is_being_async_deflated()) { >> >> Nit: This loop seems odd. Given we know m is_being_async_deflated, this should either be a do/while loop, or else we should initialize: >> >> ObjectMonitor* next = m->next_om(); >> >> and dispense with the awkwardly named next_next. > > @fisk - I'm leaving this one for you for now. Changing it to a do/while loop makes sense. 
The while condition is always true the first iteration, so doing a while or do/while loop is equivalent. If you find the do/while loop easier to read, then that sounds good to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From eosterlund at openjdk.java.net Mon Nov 9 09:05:59 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 9 Nov 2020 09:05:59 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Sat, 7 Nov 2020 17:22:21 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 1520: >> >>> 1518: // deflated in this cycle. >>> 1519: size_t deleted_count = 0; >>> 1520: for (ObjectMonitor* monitor: delete_list) { >> >> I didn't realize C++ has a "foreach" loop construct! Is this in our allowed C++ usage? > > @fisk - this one is for you... :-) Yeah this is one of the new cool features we can use. I thought it is allowed, because it is neither in the excluded nor undecided list of features in our doc/hotspot-style.md file. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From shade at openjdk.java.net Mon Nov 9 09:19:57 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 9 Nov 2020 09:19:57 GMT Subject: RFR: 8255782: Turn UseTLAB and ResizeTLAB from product_pd to product, defaulting to "true" In-Reply-To: References: <1OzPeIS9fm-ju9MIajtY8pz_rf0NVtKiDeiTd29_zmc=.5800f906-fb0f-4080-b41f-0e39864fd2ae@github.com> Message-ID: On Wed, 4 Nov 2020 22:45:31 GMT, Aleksey Shipilev wrote: >> Sounds good to me. >> >> (As usual with shared HotSpot code, remember to leave this open for a while to allow people in other timezones time to see it.) > > Friendly reminder if anyone else wants to chime in. 
Last call, if anyone has objections. ------------- PR: https://git.openjdk.java.net/jdk/pull/1019 From rkennke at openjdk.java.net Mon Nov 9 09:21:58 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 9 Nov 2020 09:21:58 GMT Subject: Integrated: 8255991: Shenandoah: Apply 'weak' LRB on cmpxchg and xchg In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 17:45:53 GMT, Roman Kennke wrote: > It is possible to access a Reference's referent by using various cmpxchg and xchg intrinsics. When that happens, we need to apply the weak LRB to prevent resurrection. > > Testing: hotspot_gc_shenandoah This pull request has now been integrated. Changeset: d99e1f6c Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/d99e1f6c Stats: 8 lines in 3 files changed: 3 ins; 0 del; 5 mod 8255991: Shenandoah: Apply 'weak' LRB on cmpxchg and xchg Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1098 From shade at openjdk.java.net Mon Nov 9 09:22:58 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 9 Nov 2020 09:22:58 GMT Subject: RFR: 8142984: Zero: fast accessors should handle both getters and setters [v2] In-Reply-To: References: <0skRfs7hB88JHFy53lVD0Fvt-JlF2HWGTC05qMgHidA=.346c67ce-af20-41ba-b4fa-a24e8ca6c0e2@github.com> Message-ID: On Mon, 2 Nov 2020 21:17:01 GMT, Andrew John Hughes wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> 8142984: Zero: fast accessors should handle both getters and setters > > Looks good to me. @coleenp or other runtime folks might want to take a look as well? 
------------- PR: https://git.openjdk.java.net/jdk/pull/728 From tschatzl at openjdk.java.net Mon Nov 9 09:25:59 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 9 Nov 2020 09:25:59 GMT Subject: RFR: 8255782: Turn UseTLAB and ResizeTLAB from product_pd to product, defaulting to "true" In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 19:28:29 GMT, Aleksey Shipilev wrote: > When doing Zero VM performance investigations, I realized that `UseTLAB` is disabled there by default. > > That is effectively because `UseTLAB` is currently `product_pd` (defined in `gc_globals.hpp`), and it is enabled for every platform with C1 and C2 ports (in their respective `c1/c2_globals.hpp`). `compiler_globals.hpp` has a block that defines `UseTLAB` to `false` when no C1/C2/JVMCI is present. > > Not only this is awkward -- GC flag is managed by Compiler globals! -- it makes Zero awkward to opt-in to `UseTLAB` in `*_zero_globals.hpp`, because `compiler_globals.hpp` already defines it. I think we can make this all better by turning TLAB flags from `product_pd` to `product`, and defaulting them to `true`. This matches what every current Server/Minimal VM config has, and would implicitly enable `UseTLAB` and `ResizeTLAB` for Zero, as well as for builds with `--with-jvm-features=-compiler1,-compiler-2,-jvmci` (in case anyone actually builds it, that is only with template interpreter). > > On the downside, this shuts the door for new platform ports to disable TLAB flags by default to ease porting. But I believe the same can be achieved by turning these flags off in `arguments.cpp` under the special platform defines, while TLAB enablement work is in progress. > > Additional testing: > - [x] Linux x86_64 Zero ad-hoc runs > - [x] Linux x86_64 `--with-jvm-features=-compiler1,-compiler-2,-jvmci,*` builds Marked as reviewed by tschatzl (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1019 From aph at redhat.com Mon Nov 9 09:44:28 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 9 Nov 2020 09:44:28 +0000 Subject: Biased locking Obsoletion In-Reply-To: <26c5c51e-9bbe-455c-7bf0-1f7cbcd126f2@oracle.com> References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <26c5c51e-9bbe-455c-7bf0-1f7cbcd126f2@oracle.com> Message-ID: On 11/8/20 10:04 PM, David Holmes wrote: > Hi Martin, > > On 7/11/2020 3:29 am, Doerr, Martin wrote: >> Hi Patricio, >> >> seems like nobody wanted to be the first person to reply. So I just share a few thoughts. >> >> Unfortunately, I haven't heard any feedback from end users. >> If the Biased Locking Code removal is not urgent because it's in the way for something else, I'd slightly prefer to remove it early in JDK17. >> >> My impression is that modern workloads are fine without BL, so typical JDK15 users will probably not notice it was switched off. >> >> Some old workloads are heavily affected, like SPEC jvm98. See performance drop on Power9: >> http://cr.openjdk.java.net/~mdoerr/BiasedLocking_Power9.png >> and on Intel Xeon E5: >> http://cr.openjdk.java.net/~mdoerr/BiasedLocking_XeonE5.png >> >> Are there any plans for mitigations? > > I don't see what mitigations are possible. We know that if you use > heavily synchronized code when it is not necessary (i.e. uncontended) > then BL shines at improving performance. Real "old" code would have been > updated years ago to move away from the synchronized library classes > (Vector, Hashtable) that typically result in these situations. Given > that we can't update things like SPEC jvm98, they will show lower > performance without BL. But I don't think we should care about this in 2020. Well, we kind-of have to care, given that some of this "old" code that would have been updated years ago is still in core classes in the Java library. 
JDK-8254078 is a simple example, and a very modest proposal for change, and it's still stuck in CSR. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From Alan.Bateman at oracle.com Mon Nov 9 10:19:13 2020 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 9 Nov 2020 10:19:13 +0000 Subject: Biased locking Obsoletion In-Reply-To: References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <26c5c51e-9bbe-455c-7bf0-1f7cbcd126f2@oracle.com> Message-ID: <9b9414e1-c7b7-59be-8e6d-d7d5d44527d1@oracle.com> On 09/11/2020 09:44, Andrew Haley wrote: > : > JDK-8254078 is a simple example, and a very modest proposal for change, > and it's still stuck in CSR. The CSR was approved and closed on Oct 24 so you should be good to go. -Alan From jvernee at openjdk.java.net Mon Nov 9 11:12:05 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Mon, 9 Nov 2020 11:12:05 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15] In-Reply-To: <4bUSiuyN2A59lbK4owjFzfRXLm4G49lJo4ObxopBTrs=.bc8c90c7-f831-4814-b5e4-c1f1fcc4e98f@github.com> References: <4bUSiuyN2A59lbK4owjFzfRXLm4G49lJo4ObxopBTrs=.bc8c90c7-f831-4814-b5e4-c1f1fcc4e98f@github.com> Message-ID: <3Rq38lsrqgc6urjbkdsLYLCMljA519ITzLs6cLnOEfk=.24bc9068-30f4-4a8d-91d9-20bcd61696c1@github.com> On Mon, 9 Nov 2020 06:07:32 GMT, Nick Gasson wrote: >> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 64 commits: >> >> - Merge branch '8254162' into 8254231_linker >> - Fix post-merge issues caused by 8219014 >> - Merge branch 'master' into 8254162 >> - Addess remaining feedback from @AlanBateman and @mrserb >> - Address comments from @AlanBateman >> - Fix typo in upcall helper for aarch64 >> - Merge branch '8254162' into 8254231_linker >> - Merge branch 'master' into 8254162 >> - Fix issues with derived buffers and IO operations >> - More 32-bit fixes for TestLayouts >> - ... and 54 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...b38afb3f > > src/hotspot/share/opto/output.cpp line 1697: > >> 1695: current_offset = cb->insts_size(); >> 1696: >> 1697: assert(!is_mcall || (call_returns[block->_pre_order] == (uint) current_offset), "ret_addr_offset() did not match size of emitted code"); > > This assertion is too strong: on AArch64 we generate additional instructions after the BLR (call) instruction for certain types of call. For example > > > 0x0000ffff790f00dc: adr x9, 0x0000ffff790f00f4 > 0x0000ffff790f00e0: mov x8, #0x5714 // #22292 > 0x0000ffff790f00e4: movk x8, #0x8d3d, lsl #16 > 0x0000ffff790f00e8: movk x8, #0xffff, lsl #32 > 0x0000ffff790f00ec: stp xzr, x9, [sp, #-16]! > 0x0000ffff790f00f0: blr x8 > 0x0000ffff790f00f4: add sp, sp, #0x10 <== ret_addr_offset() is here > 0x0000ffff790f00f8: Address 0x0000ffff790f00f8 is out of bounds. <== current_offset is here > > I think the `==` should be `<=`. (Although this still fails sometimes on AArch64, but I believe it exposes a real bug. I've opened JDK-8256025 and will fix this shortly.) Ok, that seems fine to me. IIRC the problem this was trying to catch is a ret_addr_offset that is too large, which might cause a later call's oop map to be overridden. So, using `<=` should still work. At least if the code between ret_addr_offset and current_offset is guaranteed not to contain any calls or other safepoints. 
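Editorial sketch, not code from `output.cpp`: the relaxed condition discussed above can be written as a small predicate. The helper name `ret_addr_offset_ok` and its parameters are hypothetical, chosen only to mirror the terms used in the assertion. It assumes, as stated above, that nothing between the return address and the end of the emitted code is a call or other safepoint.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the proposed assertion with `<=`.
// Nodes that are not machine calls are not checked. For calls, the
// recorded return address may be followed by extra instructions (such
// as the AArch64 `add sp, sp, #0x10` in the dump above), so it only has
// to lie at or before the end of the code emitted so far.
static bool ret_addr_offset_ok(bool is_mcall,
                               uint32_t ret_addr_offset,
                               uint32_t current_offset) {
  return !is_mcall || ret_addr_offset <= current_offset;
}
```

With the offsets from the AArch64 dump taken relative to its first instruction, the return address is at 0x18 and the end of the emitted code at 0x1c, so the `==` form fails while the `<=` form holds. A return address offset larger than the current offset, the case the check was meant to catch, is still rejected.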
------------- PR: https://git.openjdk.java.net/jdk/pull/634 From martin.doerr at sap.com Mon Nov 9 11:23:34 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 9 Nov 2020 11:23:34 +0000 Subject: Biased locking Obsoletion In-Reply-To: <26c5c51e-9bbe-455c-7bf0-1f7cbcd126f2@oracle.com> References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <26c5c51e-9bbe-455c-7bf0-1f7cbcd126f2@oracle.com> Message-ID: Thanks for all your replies! And thanks for bringing up JDK-8254078 again, Andrew! I think we can live with the regression in jvm98, but we should better address the planned mitigations before finally removing BL. Maybe there are more such kind of things planned? Besides this, I'm only a bit concerned about the LTS users. I guess most "traditional" workload is running on jdk8 or 11. So these users may notice the impact when migrating to 17. I hope people will not refrain from doing so because of missing BL. Best regards, Martin > -----Original Message----- > From: David Holmes > Sent: Sonntag, 8. November 2020 23:05 > To: Doerr, Martin ; Patricio Chilano > ; hotspot-runtime- > dev at openjdk.java.net; hotspot-dev developers dev at openjdk.java.net> > Subject: Re: Biased locking Obsoletion > > Hi Martin, > > On 7/11/2020 3:29 am, Doerr, Martin wrote: > > Hi Patricio, > > > > seems like nobody wanted to be the first person to reply. So I just share a > few thoughts. > > > > Unfortunately, I haven't heard any feedback from end users. > > If the Biased Locking Code removal is not urgent because it's in the way for > something else, I'd slightly prefer to remove it early in JDK17. > > > > My impression is that modern workloads are fine without BL, so typical > JDK15 users will probably not notice it was switched off. > > > > Some old workloads are heavily affected, like SPEC jvm98. 
See > performance drop on Power9: > > http://cr.openjdk.java.net/~mdoerr/BiasedLocking_Power9.png > > and on Intel Xeon E5: > > http://cr.openjdk.java.net/~mdoerr/BiasedLocking_XeonE5.png > > > > Are there any plans for mitigations? > > I don't see what mitigations are possible. We know that if you use > heavily synchronized code when it is not necessary (i.e. uncontended) > then BL shines at improving performance. Real "old" code would have been > updated years ago to move away from the synchronized library classes > (Vector, Hashtable) that typically result in these situations. Given > that we can't update things like SPEC jvm98, they will show lower > performance without BL. But I don't think we should care about this in 2020. > > Cheers, > David > > > If so, it would be nice to implement them before finally removing BL. > > > > My 0.02$. > > > > Best regards, > > Martin > > > > > >> -----Original Message----- > >> From: hotspot-runtime-dev retn at openjdk.java.net> > >> On Behalf Of David Holmes > >> Sent: Dienstag, 3. November 2020 22:30 > >> To: Patricio Chilano ; hotspot- > runtime- > >> dev at openjdk.java.net; hotspot-dev developers >> dev at openjdk.java.net> > >> Subject: Re: Biased locking Obsoletion > >> > >> Expanding to hotspot-dev. > >> > >> > >> On 4/11/2020 7:08 am, Patricio Chilano wrote: > >>> Hi all, > >>> > >>> As discussed in 8231264, the idea was to switch biased locking to false > >>> by default and deprecate all related flags with the intent to remove the > >>> code in a future release unless compelling evidence showed that the > code > >>> is worth maintaining. > >>> I see there is only one issue that was filed since biased locking was > >>> disabled by default (https://github.com/openjdk/jdk/pull/542) that > seems > >>> to have been addressed. As per 8231264 change, the code was set to be > >>> obsoleted in 16, so we are already in a position to remove biased > >>> locking code unless there are arguments for the contrary. 
The > >>> alternative would be to give more time and move biased locking > >>> obsoletion to a future release. > >>> Let me know your thoughts. > >>> > >>> Thanks, > >>> > >>> Patricio From volker.simonis at gmail.com Mon Nov 9 11:30:54 2020 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 9 Nov 2020 12:30:54 +0100 Subject: Biased locking Obsoletion In-Reply-To: <9b9414e1-c7b7-59be-8e6d-d7d5d44527d1@oracle.com> References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <26c5c51e-9bbe-455c-7bf0-1f7cbcd126f2@oracle.com> <9b9414e1-c7b7-59be-8e6d-d7d5d44527d1@oracle.com> Message-ID: The following is just my personal opinion based on my feeling - it's not backed by any data. The arguments for removing BiasedLocking go like this: - only single-threaded legacy code using legacy APIs is benefiting from BiasedLocking - badly written code (which can easily be fixed) is benefiting from BiasedLocking BiasedLocking was disabled in JDK15. That's not a release which is widely used in production. Not many enterprise workloads have even migrated to JDK 11. We know that migration to JDK 11 is hard and migration to the next LTS version 17 will be even harder. Big applications tend to have a lot of dependencies and code which can't be easily upgraded or rewritten (even if it's badly written or uses "old" APIs). In order to not introduce yet another upgrade problem I think it would make sense to keep BiasedLocking alive in the next LTS release 17. Removing it in 18 would be fine. Best regards, Volker On Mon, Nov 9, 2020 at 11:19 AM Alan Bateman wrote: > > On 09/11/2020 09:44, Andrew Haley wrote: > > : > > JDK-8254078 is a simple example, and a very modest proposal for change, > > and it's still stuck in CSR. > > The CSR was approved and closed on Oct 24 so you should be good to go. 
> > -Alan From jvernee at openjdk.java.net Mon Nov 9 11:57:07 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Mon, 9 Nov 2020 11:57:07 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 03:29:05 GMT, David Holmes wrote: >> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 64 commits: >> >> - Merge branch '8254162' into 8254231_linker >> - Fix post-merge issues caused by 8219014 >> - Merge branch 'master' into 8254162 >> - Addess remaining feedback from @AlanBateman and @mrserb >> - Address comments from @AlanBateman >> - Fix typo in upcall helper for aarch64 >> - Merge branch '8254162' into 8254231_linker >> - Merge branch 'master' into 8254162 >> - Fix issues with derived buffers and IO operations >> - More 32-bit fixes for TestLayouts >> - ... and 54 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...b38afb3f > > src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 99: > >> 97: if (thread == NULL) { >> 98: JavaVM_ *vm = (JavaVM *)(&main_vm); >> 99: vm -> functions -> AttachCurrentThreadAsDaemon(vm, &p_env, NULL); > > Style nit: don't put spaces around `->` operator. > > What is the context for this being called? It looks highly suspicious to just attach the current thread to the VM this way. The context is a thread that is spawned by native code doing an upcall. We need to attach the thread to the VM first in that case. Normally this would be handled by the calling code, but in our case the calling code doesn't know it's calling into Java. 
> src/java.base/share/classes/jdk/internal/invoke/NativeEntryPoint.java line 63: > >> 61: } >> 62: >> 63: public static NativeEntryPoint make(long addr, String name, ABIDescriptorProxy abi, VMStorageProxy[] argMoves, VMStorageProxy[] returnMoves, > > Where is name validation performed, to ensure the named native method is in fact legal and not trying to provide backdoor access to native code that should be encapsulated and protected? The name is just used as debugging information (in e.g. CallNativeNode::dump_spec), we are not looking it up. The address that is passed there is the actual function target. ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From stuefe at openjdk.java.net Mon Nov 9 12:05:58 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 9 Nov 2020 12:05:58 GMT Subject: Integrated: JDK-8255711: Fix and unify hotspot signal handlers In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 10:06:50 GMT, Thomas Stuefe wrote: > Hi all, > > may I please have opinions and reviews on this cleanup-fix-patch for the hotspot signal handlers. > > Its main intent is to simplify coding and to commonize some of it across all Posix platforms where possible. Also to fix a number of smaller issues. > > This will have a number of benefits, mainly easing maintenance pain for porters and reducing bitrot for platform dependent code. > > This all builds upon the work @gerard-ziemski did with https://bugs.openjdk.java.net/browse/JDK-8252324. > > ---- > > This cleanup was made more complicated by the fact that there exists a non-obvious and undocumented way for a third party app to chain signal handlers (beside the documented one of using libjsig). It seems that the JVM_handle_xxx_functions() are in fact interfaces for third party coding to invoke hotspot signal handling. This only makes sense in combination with `-XX:+AllowUserSignalHandlers`. A cursory github search revealed that this flag is used quite a bit. 
> > See a more in-depth discussion here: [4]. Thanks to @dholmes-ora for untangling this bit of history. > > Unfortunately there is no official documentation from Sun or Oracle, and zero regression tests. So I try to preserve this interface as best as I can. I plan to add a proper regression test with a later change, but for now I don't have the time for that. > > --- > > The fixed issues: > > 1) `PosixSignals::_are_signal_handlers_installed` is used inside the platform handlers to guard a part of the platform handlers against execution in case the signal handlers are not yet installed. > > Initially this confused me since when this handler is called it would of course be installed. So that boolean would always be true. The only explanation I found was that since these handlers can be invoked directly from outside, this is some (ineffective) form of guard against calling this handler too early. > But that guard can be left out and that boolean removed. Our signal handlers are safe to call before VM initialization is completed. > > 2) The return code of JVM_handle_xxx_signal() was inconsistently set (some as bool, some as int) as well as unused in normal code paths (excluding outside calls). > > 3) JVM_handle_xxx_signal are supposed to be exported, but on AIX there is a day-zero bug which caused it to not be exported. > > 4) Every platform handler has this section: > > JavaThread* thread = NULL; > VMThread* vmthread = NULL; > if (PosixSignals::are_signal_handlers_installed()) { > if (t != NULL ){ > if(t->is_Java_thread()) { > thread = t->as_Java_thread(); > } > else if(t->is_VM_thread()){ > vmthread = (VMThread *)t; > } > } > } > > `vmthread` is unused on all platforms and can be removed. 
> > 5) Every platform handler has some variant of this section, to ignore SIGPIPE, SIGXFSZ (whose default actions are to terminate the VM): > > if (sig == SIGPIPE || sig == SIGXFSZ) { > // allow chained handler to go first > if (PosixSignals::chained_handler(sig, info, ucVoid)) { > return true; > } else { > // Ignoring SIGPIPE/SIGXFSZ - see bugs 4229104 or 6499219 > return true; > } > } > > - On s390 and ppc, we miss SIGXFSZ handling > _Update: Fixed separately for easier backport, see [https://bugs.openjdk.java.net/browse/JDK-8255734](JDK-8255734)_. > - both paths return true - section can be shortened > > Side note: having handlers for those signals may be unnecessary. We could just set the signal handler to `SIG_IGN`. We would have to tiptoe around any third party handlers for those signals, but it still may be simpler. > > 6) At the end of every platform header, before calling into fatal error handling, we unblock the signal: > >> // unmask current signal >> sigset_t newset; >> sigemptyset(&newset); >> sigaddset(&newset, sig); >> sigprocmask(SIG_UNBLOCK, &newset, NULL); >> > > - Use of `sigprocmask()` is UB in a multithreaded program. > - but then, this section is unnecessary anyway, since [https://bugs.openjdk.java.net/browse/JDK-8252533](JDK-8252533) we unmask error signals at the start of the signal handler. > > 7) the JFR crash protection is not consistently checked in all platform handlers. > > 8) On Zero, when entering fatal error handling, we do so via fatal() instead of VMError::report_and_die(), thereby discarding the real crash context and obfuscating the register content in the hs-err file (we still see registers, but those stem from the assertion-poison-page mechanism). 
> > 9) on Linux ppc64 and AIX, we have this section: > >> if (sig == SIGILL && (pc < (address) 0x200)) { >> goto report_and_die; >> } > > which is related to the fact that the zero page on AIX is readable, filled with 0, and reading instructions from it will yield us a SIGILL, not a SIGSEGV (0 is not a noop on PPC, so we don't nop-slide). > > This coding is irrelevant on Linux. On AIX, it can also be removed, since this SIGILL would be unrecognized by the hotspot and later count as fatal error anyway. > > 10) When invoking the fatal error handler, we extract the pc from the context and hand it over as "faulting pc". For SIGILL and SIGFPE, this is not totally correct. According to POSIX [3], for those signals the address of the faulting instruction is handed over in `si_info.si_addr`. > > On most platforms this does not matter, they are the same. But on some architectures the pc in the signal context actually points somewhere else, e.g. beyond the faulting instruction. Therefore `si_info.si_addr` is the better choice. > > ---- > > The changes in this patch: > > a) hotspot signal handling is now done by the following functions: > > > | | > v v > javaSignalHandler JVM_handle_linux_signal() > | / > v v > javaSignalHandler_inner > | > v > PosixSignals::pd_hotspot_signal_handler() > > > The right branch only exists to support the `AllowUserSignalHandlers` case, see [4]. > > `javaSignalHandler` is registered as handler, as it was before; JVM_handle_linux_signal() is exported as before. > `javaSignalHandler_inner` contains the shared portion of the signal handler; `PosixSignals::pd_hotspot_signal_handler()` contains the remaining platform dependent portions. > > > b) I commonized prologue- and epilogue coding. > - I simplified (4) to a single line in the shared handler > - I moved the JFR thread crash protection (7) up to the shared handler > - I moved the complete epilogue up to the shared handler. 
That includes calling the chained handlers, should they exist, as well as invoking the fatal error handler. That fixes (8) and (6) > - Zero has this tradition of showing a robot telling a cat about the error signal, which I like, and kept. > - I simplified (5) and commonized it, and removed (9) completely > - In PosixSignals::install_signal_handlers(), I removed the `signal_handlers_are_installed` guard and replaced it with an assert. Unfortunately this causes lots of indentation changes. @gerard-ziemski: if this clashes too much with your patch for JDK-8253742, I'll leave that part out. > > Thanks for reviewing. > > Testing: this patch ran through our nightlies, in an earlier form. They will be re-run some more times. > > I'd be happy if aarch64 porters could take a look at the aarch64 portion of this change. > > Please note that I had to draw a line somewhere - this is an open-ended issue and a lot more could be cleaned up. See also Gerard's work on [5], which is under review too. > > ---- > > [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043145.html > [2] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-October/043191.html > [3] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html > [4] https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004887.html > [5] https://bugs.openjdk.java.net/browse/JDK-8253742 This pull request has now been integrated. 
Changeset: dd8e4ffb Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/dd8e4ffb Stats: 1026 lines in 15 files changed: 174 ins; 757 del; 95 mod 8255711: Fix and unify hotspot signal handlers Reviewed-by: coleenp, gziemski, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/1034 From adinn at redhat.com Mon Nov 9 12:11:22 2020 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 9 Nov 2020 12:11:22 +0000 Subject: Biased locking Obsoletion In-Reply-To: <8ecc52b9-132c-ebde-5759-b1829caaacc5@redhat.com> References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <8ecc52b9-132c-ebde-5759-b1829caaacc5@redhat.com> Message-ID: <4c6d7418-1e99-4f31-af66-3dc949cc0602@redhat.com> On 06/11/2020 18:36, Andrew Haley wrote: > On 11/6/20 5:29 PM, Doerr, Martin wrote: >> My impression is that modern workloads are fine without BL, so typical JDK15 users will probably not notice it was switched off. > > I don't believe they are, because there are several (many?) places in > the Java library that perform badly. In particular, see JDK-8254078, > DataOutputStream is very slow post-disabling of Biased Locking. > > This is not a solved problem, and I don't know what a "typical" > user is, but some users may experience significant performance > degradation. This is not a solved problem. Red Hat has been conducting an internal review of our middleware products to assess the impact of biased lock removal. For the cases we have been able to test (which we acknowledge are /far/ from comprehensive) we have not found any clear picture of outright gain or loss in performance in the middleware code per se. We have almost always seen small improvements or degradations in performance (sometimes in the same product with different workloads). The only case where we have seen a significant loss of performance with biased locking disabled was for our Transactions product when running in a /non-production/ operational mode. 
That specific degradation was not directly caused by our middleware implementation. It resulted from unnecessary and unavoidable synchronization overheads in /standard/ JDK library code that the middleware relied on (specifically, the DataOutputStream issue that Andrew Haley addressed in JDK-8254078). Note that this issue did not affect /production/ operational modes because in those modes the extra synch costs were amortized across storage access costs. We think the same story most likely explains the lack of significant degradation for our other middleware products. However, that does not mean the JDK gets the all clear. Indeed, it may not even leave our middleware in the clear. With only limited testing we cannot be sure storage costs will mask synchronization costs in all production use cases. More generally, it is quite probable that there are applications built over DataOutputStream or other JDK XXXOutputStream APIs (perhaps also other class hierarchies) which don't suffer from synchronization costs at present but are going to hit problems when biased locking is removed. The argument from David Holmes that: Real "old" code would have been updated years ago to move away from the synchronized library classes misses the obvious point that "real, old code" and, quite possibly, some "real, relatively new code" will not have manifested performance issues when using these old skool JDK classes with biased locking enabled. Relying on users to have already fixed problems when they are not visible prior to the deprecation change seems to me, at best, unrealistic. I also feel that relying on users to come up with their own alternatives to work around the limitations of these /standard/ classes -- if and when problems do turn up -- would be an inadequate response. 
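To make the concern concrete, here is a hypothetical sketch of the kind of ad hoc workaround an application might end up writing: encode the primitives into a `ByteBuffer` first, so that values do not each pass through the stream's synchronized methods. It is not taken from JDK-8254078 or from any Red Hat product; the class and method names are invented for illustration. It assumes the default big-endian byte order of `ByteBuffer`, which matches what `DataOutputStream.writeInt` emits.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical illustration only; names are not taken from the JDK or from JDK-8254078.
final class BatchedWriteSketch {

    // Baseline: every writeInt goes through the stream classes, and
    // ByteArrayOutputStream's write methods are synchronized.
    static byte[] viaDataOutputStream(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    // Workaround: encode into a heap ByteBuffer (big-endian by default),
    // which takes no locks, and hand back the bytes in one go.
    static byte[] viaByteBuffer(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }
}
```

Both methods produce identical bytes; the point is only that every application hitting the problem would have to write and maintain something like this itself.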
If a need emerges for unsynchronized, non-thread safe alternatives to JDK library classes because of this JVM change then I think it would be appropriate for that need to be met by upgrading the JDK itself to provide either an alternative class or alternative mode of operation for the existing class. A common, well built, integrated solution would be preferable to many ad hoc alternatives. Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From jvernee at openjdk.java.net Mon Nov 9 12:15:01 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Mon, 9 Nov 2020 12:15:01 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 03:31:15 GMT, David Holmes wrote: >> src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 55: >> >>> 53: >>> 54: // FIXME: This should be initialized explicitly instead of lazily/racily >>> 55: static void upcall_init() { >> >> The FIXME is right this should be initialized as a well known class and referred to here as SystemDictionary::ProgrammableUpcallHandler_klass(). This really doesn't belong here. > > I agree with Coleen. I'll give this another try, but I think last time I tried this resolution of the class failed when trying to build the JDK, seemingly since it exists in an incubator module, which is not always added to the module graph. 
------------- PR: https://git.openjdk.java.net/jdk/pull/634 From mcimadamore at openjdk.java.net Mon Nov 9 12:28:13 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 9 Nov 2020 12:28:13 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v23] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the third incubation round of the foreign memory access API (see JEP 393 [1]). This iteration focuses on improving the usability of the API in 3 main ways: > > * first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from multiple threads > * second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee that the memory will be deallocated, eventually > * third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class has been added, which defines several useful dereference routines; these are really just thin wrappers around memory access var handles, but they make the barrier of entry for using this API somewhat lower. > > A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit of dereference. > > This has all changed as per this API refresh; now a `MemoryAddress` is just a dumb carrier which wraps a pair of object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is concerned. You cannot dereference memory if you don't have a segment. 
This improves usability in a number of ways - first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that; otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling `MemoryAddress::asSegmentRestricted`). > > A list of the API, implementation and test changes is provided below. If you have any questions, or need more detailed explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide the feedback required. > > A big thank to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segment would not have been possible; also I'd like to thank Paul Sandoz, whose insights on API design have been very helpful in this journey. > > Thanks > Maurizio > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254163 > > > > ### API Changes > > * `MemorySegment` > * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below) > * added a no-arg factory for a native restricted segment representing entire native heap > * rename `withOwnerThread` to `handoff` > * add new `share` method, to create shared segments > * add new `registerCleaner` method, to register a segment against a cleaner > * add more helpers to create arrays from a segment e.g. 
`toIntArray` > * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors) > * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`) > * `MemoryAddress` > * drop `segment` accessor > * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative to a given segment > * `MemoryAccess` > * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a carrier is one of the usual suspects (a Java primitive, minus `boolean`); the access mode can be simple (e.g. access base address of given segment), or indexed, in which case the accessor takes a segment and either a low-level byte offset, or a high-level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs. `getByteAtOffset` vs `getByteAtIndex`). > * `MemoryHandles` > * drop `withOffset` combinator > * drop `withStride` combinator > * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which it is easy to derive all the other handles using plain var handle combinators. > * `Addressable` > * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2] to add more implementations. Clients can largely ignore this interface, which comes in really handy when defining native bindings with tools like `jextract`. > * `MemoryLayouts` > * A new layout, for machine addresses, has been added to the mix. > > > > ### Implementation changes > > There are two main things to discuss here: support for shared segments, and the general simplification of the memory access var handle support. > > #### Shared segments > > The support for shared segments cuts pretty deep into the VM. 
Support for shared segments is notoriously hard to achieve, at least in a way that guarantees optimal access performance. This is caused by the fact that, if a segment is shared, it would be possible for a thread to close it while another is accessing it. > > After considering several options (see [3]), we zeroed in on an approach which is inspired by a happy idea that Andrew Haley had (and that he reminded me of at this year's OpenJDK committer workshop - thanks!). The idea is that if we could *freeze* the world (e.g. with a GC pause), while a segment is closed, we could then prevent segments from being accessed concurrently with a close operation. For this to work, it is crucial that no GC safepoints can occur between a segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop right before an unsafe call). It also relies on the fact that hotspot/C2 should not be able to propagate loads across safepoints. > > Sadly, none of these conditions seems to be valid in the current implementation, so we needed to resort to a bit of creativity. First, we noted that, if we could mark so-called *scoped* methods with an annotation, it would be very simple to check whether a thread was in the middle of a scoped method when we stopped the world for a close operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312 [4]). > > The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread, thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds. Unfortunately the machinery for async exceptions is a bit fragile (e.g. 
not all the code in hotspot checks for async exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that another thread is accessing the segment being closed. > > As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a bug in the user code, likely arising from lack of synchronization, and should be treated as such. > > In terms of gritty implementation, we needed to centralize memory access routines in a single place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in addition, also provided a liveness check. This way we could mark all these routines with the special `@Scoped` annotation, which tells the VM that something important is going on. > > To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`. This class contains all the main memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the access coordinates, also a scope object, which is tested before access. A reachability fence is also thrown in the mix to make sure that the scope is kept alive during access (which is important when registering segments against cleaners). > > Of course, to make memory access safe, memory access var handles, byte buffer var handles, and byte buffer API should use the new `ScopedMemoryAccess` class instead of unsafe, so that a liveness check can be triggered (in case a scope is present). > > `ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the handshake completed successfully. 
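As a rough illustration of how a shared scope might act on that result, here is a minimal model of the close path. It is a sketch under stated assumptions, not the JDK's `MemoryScope`: the class name, the `BooleanSupplier` standing in for `closeScope`, and the exception messages are invented. The `ALIVE`/`CLOSING`/`CLOSED` states and the difference between `isAlive` and the access-time liveness check follow this message's own description.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical model of a shared scope's life cycle; not the JDK implementation.
final class SharedScopeSketch {
    static final int ALIVE = 0;
    static final int CLOSING = 1;
    static final int CLOSED = 2;

    private final AtomicInteger state = new AtomicInteger(ALIVE);

    // Stand-in for the handshake: returns true when no thread was found
    // in the middle of accessing the segment.
    private final BooleanSupplier handshake;

    SharedScopeSketch(BooleanSupplier handshake) {
        this.handshake = handshake;
    }

    // Modelled on the description of isAlive: still true while CLOSING.
    boolean isAlive() {
        return state.get() != CLOSED;
    }

    // Modelled on the liveness check done before each access: fails unless ALIVE.
    void checkValidState() {
        if (state.get() != ALIVE) {
            throw new IllegalStateException("Already closed");
        }
    }

    void close() {
        // Only one closer may move the scope from ALIVE to CLOSING.
        if (!state.compareAndSet(ALIVE, CLOSING)) {
            throw new IllegalStateException("Already closed");
        }
        boolean success = handshake.getAsBoolean();
        // Either commit the close or roll back to ALIVE.
        state.set(success ? CLOSED : ALIVE);
        if (!success) {
            throw new IllegalStateException("Segment is being accessed by another thread");
        }
    }
}
```

In this model a failed `close` leaves the scope usable again, which matches the point made above that an exception on `close` signals a synchronization bug in the caller rather than something to retry blindly.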
> > The implementation of `MemoryScope` (now significantly simplified from what we had before), has two implementations, one for confined segments and one for shared segments; the main difference between the two is what happens when the scope is closed; a confined segment sets a boolean flag to false, and returns, whereas a shared segment goes into a `CLOSING` state, then starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive` will still return `true`, while the liveness check upon memory access will fail. > > #### Memory access var handles overhaul > > The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and `long`, all other access types could be derived from this basic var handle form. > > This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g. additional offset is injected into a base memory access var handle. > > This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators, which previously needed low-level access on the innards of the memory access var handle. All that code is now gone. > > #### Test changes > > Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the microbenchmarks also needed some tweaks - and some of them were also updated to also test performance in the shared segment case. 
> > [1] - https://openjdk.java.net/jeps/393 > [2] - https://openjdk.java.net/jeps/389 > [3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html > [4] - https://openjdk.java.net/jeps/312 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Remove endianness-aware byte getter/setter in MemoryAccess Remove index-based version of byte getter/setter in MemoryAccess ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/548/files - new: https://git.openjdk.java.net/jdk/pull/548/files/02f9e251..6940f0ac Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=22 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=21-22 Stats: 110 lines in 3 files changed: 0 ins; 98 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/548.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/548/head:pull/548 PR: https://git.openjdk.java.net/jdk/pull/548 From mcimadamore at openjdk.java.net Mon Nov 9 12:28:16 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 9 Nov 2020 12:28:16 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v22] In-Reply-To: References: Message-ID: On Sun, 8 Nov 2020 16:28:41 GMT, Alan Bateman wrote: >> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 29 commits: >> >> - Fix post-merge issues caused by 8219014 >> - Merge branch 'master' into 8254162 >> - Addess remaining feedback from @AlanBateman and @mrserb >> - Address comments from @AlanBateman >> - Merge branch 'master' into 8254162 >> - Fix issues with derived buffers and IO operations >> - More 32-bit fixes for TestLayouts >> - * Add final to MappedByteBuffer::SCOPED_MEMORY_ACCESS field >> * Tweak TestLayouts to make it 32-bit friendly after recent MemoryLayouts tweaks >> - Remove TestMismatch from 32-bit problem list >> - Merge branch 'master' into 8254162 >> - ... and 19 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...02f9e251 > > Marked as reviewed by alanb (Reviewer). > I see the xxxByteAtIndex methods that took a ByteOrder have been removed from MemoryAccess. Should the xxxByte and xxxByteAtOffset that take a ByteOrder be removed too? I've addressed this in the latest iteration. Since I was there I also removed `getByteAtIndex` and `setByteAtIndex`, since their behavior is identical to that of `getByteAtOffset` and `setByteAtOffset`, respectively (in other words, the indexed variants are not really helpful until carrier size > 1 byte). ------------- PR: https://git.openjdk.java.net/jdk/pull/548 From kbarrett at openjdk.java.net Mon Nov 9 12:38:04 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 9 Nov 2020 12:38:04 GMT Subject: RFR: 8252588: HotSpot Style Guide should permit uniform initialization Message-ID: Please review and vote on this change to the HotSpot Style Guide to permit the use of uniform initialization, aka brace initialization, in HotSpot code. Uniform initialization is a feature added in C++11. This is a modification of the Style Guide, so rough consensus among the HotSpot Group members is required to make this change. Only Group members should vote for approval (via the github PR), though reasoned objections or comments from anyone will be considered.
A decision to approve will not be made before Monday 16-Nov-2020 at 12h00 UTC. [Note: This is the first attempt to change the Style Guide since the revision that added a description for a change process (requires rough consensus of the Group), and also since the start of using git and github PRs. I'm making a guess at how to instantiate that process within the new mechanisms.] ------------- Commit messages: - permit uniform initialization Changes: https://git.openjdk.java.net/jdk/pull/1119/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1119&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252588 Stats: 29 lines in 2 files changed: 29 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1119.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1119/head:pull/1119 PR: https://git.openjdk.java.net/jdk/pull/1119 From mcimadamore at openjdk.java.net Mon Nov 9 13:22:14 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 9 Nov 2020 13:22:14 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v24] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the third incubation round of the foreign memory access API incubation (see JEP 393 [1]). This iteration focus on improving the usability of the API in 3 main ways: > > * first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from multiple threads > * second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee that the memory will be deallocated, eventually > * third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class has been added, which defines several useful dereference routines; these are really just thin wrappers around memory access var handles, but they make the barrier of entry for using this API somewhat lower. 
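The second point in the list above, registering a segment against a `Cleaner`, builds on the standard `java.lang.ref.Cleaner` mechanism. The following is a minimal sketch of that mechanism only; it does not use `MemorySegment::registerCleaner`, and `NativeResourceSketch` with its release counter is a hypothetical stand-in for a segment and the native memory behind it.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical resource; the counter stands in for "free the native memory".
final class NativeResourceSketch implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup action must not hold a reference to the owning object,
    // otherwise the owner could never become unreachable.
    private static final class Release implements Runnable {
        private final AtomicInteger releases;

        Release(AtomicInteger releases) {
            this.releases = releases;
        }

        @Override
        public void run() {
            releases.incrementAndGet();
        }
    }

    private final AtomicInteger releases = new AtomicInteger();
    private final Cleaner.Cleanable cleanable;

    NativeResourceSketch() {
        // Runs Release either on explicit close() or, eventually, after this
        // object becomes unreachable and the cleaner thread processes it.
        this.cleanable = CLEANER.register(this, new Release(releases));
    }

    int releaseCount() {
        return releases.get();
    }

    @Override
    public void close() {
        // Cleanable.clean() invokes the action at most once.
        cleanable.clean();
    }

    public static void main(String[] args) {
        NativeResourceSketch resource = new NativeResourceSketch();
        resource.close();
        resource.close();
        System.out.println("releases: " + resource.releaseCount());
    }
}
```

Explicit `close` remains the deterministic path; the cleaner only provides the optional, eventual guarantee mentioned in the description.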
> > A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit of dereference. > > This has all changed as per this API refresh; now a `MemoryAddress` is just a dumb carrier which wraps a pair of object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is concerned. You cannot dereference memory if you don't have a segment. This improves usability in a number of ways - first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that; otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling `MemoryAddress::asSegmentRestricted`). > > A list of the API, implementation and test changes is provided below. If you have any questions, or need more detailed explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide the feedback required. > > A big thank to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segment would not have been possible; also I'd like to thank Paul Sandoz, whose insights on API design have been very helpful in this journey. 
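The idea that dereferencing always goes through a bounded container plus a `long`-like offset, with static helpers as thin wrappers over a var handle, can be illustrated with the long-standing `ByteBuffer` view var handles. This is an analogy under stated assumptions, not the incubating API: a `ByteBuffer` plays the role of the segment, and `BufferAccessSketch` and its method names are hypothetical, chosen to echo the `AtOffset` and `AtIndex` naming scheme.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class; ByteBuffer stands in for a memory segment.
final class BufferAccessSketch {
    // Var handle with coordinates (ByteBuffer, int byteOffset) and carrier int.
    private static final VarHandle INT_HANDLE =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    private BufferAccessSketch() {
    }

    // Low-level access: the offset is expressed in bytes.
    static int getIntAtOffset(ByteBuffer buffer, int offset) {
        return (int) INT_HANDLE.get(buffer, offset);
    }

    static void setIntAtOffset(ByteBuffer buffer, int offset, int value) {
        INT_HANDLE.set(buffer, offset, value);
    }

    // Logical access: the index is scaled by the carrier size.
    static int getIntAtIndex(ByteBuffer buffer, int index) {
        return getIntAtOffset(buffer, index * Integer.BYTES);
    }

    static void setIntAtIndex(ByteBuffer buffer, int index, int value) {
        setIntAtOffset(buffer, index * Integer.BYTES, value);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        setIntAtIndex(buffer, 2, 42);
        System.out.println(getIntAtOffset(buffer, 8));
    }
}
```

Because the container carries its own bounds, an access past the end fails with an exception instead of touching arbitrary memory, which is the property the segment-centric design relies on.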
> > Thanks > Maurizio > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254163 > > > > ### API Changes > > * `MemorySegment` > * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below) > * added a no-arg factory for a native restricted segment representing entire native heap > * rename `withOwnerThread` to `handoff` > * add new `share` method, to create shared segments > * add new `registerCleaner` method, to register a segment against a cleaner > * add more helpers to create arrays from a segment e.g. `toIntArray` > * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors) > * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`) > * `MemoryAddress` > * drop `segment` accessor > * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative to a given segment > * `MemoryAccess` > * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a carrier is one of the usual suspect (a Java primitive, minus `boolean`); the access mode can be simple (e.g. access base address of given segment), or indexed, in which case the accessor takes a segment and either a low-level byte offset,or a high level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs. `getByteAtOffset` vs `getByteAtIndex`). 
> * `MemoryHandles` > * drop `withOffset` combinator > * drop `withStride` combinator > * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which it is easy to derive all the other handles using plain var handle combinators. > * `Addressable` > * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2] to add more implementations. Clients can largely ignore this interface, which comes in really handy when defining native bindings with tools like `jextract`. > * `MemoryLayouts` > * A new layout, for machine addresses, has been added to the mix. > > > > ### Implementation changes > > There are two main things to discuss here: support for shared segments, and the general simplification of the memory access var handle support. > > #### Shared segments > > The support for shared segments cuts in pretty deep in the VM. Support for shared segments is notoriously hard to achieve, at least in a way that guarantees optimal access performances. This is caused by the fact that, if a segment is shared, it would be possible for a thread to close it while another is accessing it. > > After considering several options (see [3]), we zeroed onto an approach which is inspired by an happy idea that Andrew Haley had (and that he reminded me of at this year OpenJDK committer workshop - thanks!). The idea is that if we could *freeze* the world (e.g. with a GC pause), while a segment is closed, we could then prevent segments from being accessed concurrently to a close operation. For this to work, it is crucial that no GC safepoints can occur between a segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop just right before an unsafe call). 
It also relies on the fact that hotspot/C2 should not be able to propagate loads across safepoints. > > Sadly, none of these conditions seems to be valid in the current implementation, so we needed to resort to a bit of creativity. First, we noted that, if we could mark so called *scoped* method with an annotation, it would be very simply to check as to whether a thread was in the middle of a scoped method when we stopped the world for a close operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312 [4]). > > The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread, thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds. Unfortunately the machinery for async exceptions is a bit fragile (e.g. not all the code in hotspot checks for async exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that another thread is accessing the segment being closed. > > As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a bug in the user code, likely arising from lack of synchronization, and should be treated as such. > > In terms of gritty implementation, we needed to centralize memory access routines in a single place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in addition, also provided a liveness check. This way we could mark all these routines with the special `@Scoped` annotation, which tells the VM that something important is going on. > > To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`. 
This class contains all the main memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the access coordinates, also a scope object, which is tested before access. A reachability fence is also thrown in the mix to make sure that the scope is kept alive during access (which is important when registering segments against cleaners). > > Of course, to make memory access safe, memory access var handles, byte buffer var handles, and byte buffer API should use the new `ScopedMemoryAccess` class instead of unsafe, so that a liveness check can be triggered (in case a scope is present). > > `ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the handshake completed successfully. > > The implementation of `MemoryScope` (now significantly simplified from what we had before), has two implementations, one for confined segments and one for shared segments; the main difference between the two is what happens when the scope is closed; a confined segment sets a boolean flag to false, and returns, whereas a shared segment goes into a `CLOSING` state, then starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive` will still return `true`, while the liveness check upon memory access will fail. > > #### Memory access var handles overhaul > > The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and `long`, all other access types could be derived from this basic var handle form. > > This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g. 
additional offset is injected into a base memory access var handle. > > This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators, which previously needed low-level access on the innards of the memory access var handle. All that code is now gone. > > #### Test changes > > Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the microbenchmarks also needed some tweaks - and some of them were also updated to also test performance in the shared segment case. > > [1] - https://openjdk.java.net/jeps/393 > [2] - https://openjdk.java.net/jeps/389 > [3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html > [4] - https://openjdk.java.net/jeps/312 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Improve debugging output of TestHandhsake ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/548/files - new: https://git.openjdk.java.net/jdk/pull/548/files/6940f0ac..f5d339a7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=23 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=22-23 Stats: 37 lines in 1 file changed: 11 ins; 1 del; 25 mod Patch: https://git.openjdk.java.net/jdk/pull/548.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/548/head:pull/548 PR: https://git.openjdk.java.net/jdk/pull/548 From david.holmes at oracle.com Mon Nov 9 13:33:49 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 9 Nov 2020 23:33:49 +1000 Subject: Biased locking Obsoletion In-Reply-To: <4c6d7418-1e99-4f31-af66-3dc949cc0602@redhat.com> References: 
<7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <8ecc52b9-132c-ebde-5759-b1829caaacc5@redhat.com> <4c6d7418-1e99-4f31-af66-3dc949cc0602@redhat.com> Message-ID: <11e412f4-12fb-2200-61d8-b8acc5f52b1c@oracle.com> On 9/11/2020 10:11 pm, Andrew Dinn wrote: > On 06/11/2020 18:36, Andrew Haley wrote: >> On 11/6/20 5:29 PM, Doerr, Martin wrote: >>> My impression is that modern workloads are fine without BL, so >>> typical JDK15 users will probably not notice it was switched off. >> >> I don't believe they are, because there are several (many?) places in >> the Java library that perform badly. In particular, see JDK-8254078, >> DataOutputStream is very slow post-disabling of Biased Locking. >> >> This is not a solved problem, and I don't know what a "typical" >> user is, but some users may experience significant performance >> degradation. This is not a solved problem. > Red Hat has been conducting an internal review of our middleware > products to assess the impact of biased lock removal. For the cases we > have been able to test (which we acknowledge are /far/ from > comprehensive) we have not found any clear picture of outright gain or > loss in performance in the middleware code per se. We have almost always > seen small improvements or degradations in performance (sometimes in the > same product with different workloads). > > The only case where we have seen a significant loss of performance with > biased locking disabled was for our Transactions product when running in > a /non-production/ operational mode. That specific degradation was not > directly caused by our middleware implementation. It resulted from > unnecessary and unavoidable synchronization overheads in /standard/ JDK > library code that the middleware relied on (specifically, the > DataOutputStream issue that Andrew Haley addressed in JDK-8254078). 
> > Note that this issue did not affect /production/ operational modes > because in those modes the extra synch costs were amortized across > storage access costs. We think the same story most likely explains the > lack of significant degradation for our other middleware products. > However, that does not mean the JDK gets the all clear. Indeed, it may > not even leave our middleware in the clear. With only limited testing we > cannot be sure storage costs will mask synchronization costs in all > production use cases. > > More generally, it is quite probable that there are applications built > over DataOutputStream or other JDK XXXOutputStream APIs (perhaps also > other class hierarchies) which don't suffer from synchronization costs > at present but are going to hit problems when biased locking is removed. > The argument from David Holmes that: > > Real "old" code would have been updated years ago to move away from > the synchronized library classes > > misses the obvious point that "real, old code" and, quite possibly, some > "real, relatively new code" will not have manifested performance issues > when using these old skool JDK classes with biased locking enabled. > Relying on users to have already fixed problems when they are not > visible prior to the deprecation change seems to me, at best, unrealistic. We are talking about the ancient fully-synchronized "collection classes" primarily: Vector and Hashtable. The problems with these being fully synchronized have been well known since Java 1.1 and there have been alternatives since 1.2 with the proper Collection classes. The migration away from these old classes even predates the introduction of BL! The DataOutputStream issue was a bit of a surprise as the code is grossly inefficient as written even with BL helping it out. I'm surprised nobody spotted this much earlier, but I'm glad that it has now been spotted and fixed. If other cases are discovered then we can also fix them.
> I also feel that relying on users to come up their own alternatives to > work around the limitations of these /standard/ classes -- if and when > problems do turn up -- would be an inadequate response. If a need > emerges for unsynchronized, non-thread safe alternatives to JDK library > classes because of this JVM change then I think it would be appropriate > for that need to be met by upgrading the JDK itself to provide either an > alternative class or alternative mode of operation for the existing > class. A common, well built, integrated solution would be preferable to > many ad hoc alternatives. I think that is overstating the case somewhat. David ----- > > Andrew Dinn > ----------- > Red Hat Distinguished Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > From david.holmes at oracle.com Mon Nov 9 13:51:03 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 9 Nov 2020 23:51:03 +1000 Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15] In-Reply-To: References: Message-ID: <8c4cbff8-32b5-c90b-20da-dccfa49e970c@oracle.com> Hi Jorn, On 9/11/2020 9:57 pm, Jorn Vernee wrote: > On Mon, 9 Nov 2020 03:29:05 GMT, David Holmes wrote: > >> src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 99: >> >>> 97: if (thread == NULL) { >>> 98: JavaVM_ *vm = (JavaVM *)(&main_vm); >>> 99: vm -> functions -> AttachCurrentThreadAsDaemon(vm, &p_env, NULL); >> >> Style nit: don't put spaces around `->` operator. >> >> What is the context for this being called? It looks highly suspicious to just attach the current thread to the VM this way. > > The context is a thread that is spawned by native code doing an upcall. We need to attach the thread to the VM first in that case. Normally this would be handled by the calling code, but in our case the calling code doesn't know it's calling into Java. 
Apologies that I don't have enough knowledge of this feature to understand the context. Shouldn't you then detach it again afterwards? Otherwise when will it detach? If you don't detach you will get a memory leak. >> src/java.base/share/classes/jdk/internal/invoke/NativeEntryPoint.java line 63: >> >>> 61: } >>> 62: >>> 63: public static NativeEntryPoint make(long addr, String name, ABIDescriptorProxy abi, VMStorageProxy[] argMoves, VMStorageProxy[] returnMoves, >> >> Where is name validation performed, to ensure the named native method is in fact legal and not trying to provide backdoor access to native code that should be encapsulated and protected? > > The name is just used as debugging information (in e.g. CallNativeNode::dump_spec), we are not looking it up. The address that is passed there is the actual function target. Okay. Thanks, David ----- > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/634 > From david.holmes at oracle.com Mon Nov 9 13:54:34 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 9 Nov 2020 23:54:34 +1000 Subject: Biased locking Obsoletion In-Reply-To: References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <26c5c51e-9bbe-455c-7bf0-1f7cbcd126f2@oracle.com> Message-ID: <623a8a08-b8e6-4c6c-aa73-3b9452e439d7@oracle.com> On 9/11/2020 7:44 pm, Andrew Haley wrote: > On 11/8/20 10:04 PM, David Holmes wrote: >> Hi Martin, >> >> On 7/11/2020 3:29 am, Doerr, Martin wrote: >>> Hi Patricio, >>> >>> seems like nobody wanted to be the first person to reply. So I just share a few thoughts. >>> >>> Unfortunately, I haven't heard any feedback from end users. >>> If the Biased Locking Code removal is not urgent because it's in the way for something else, I'd slightly prefer to remove it early in JDK17. >>> >>> My impression is that modern workloads are fine without BL, so typical JDK15 users will probably not notice it was switched off. >>> >>> Some old workloads are heavily affected, like SPEC jvm98. 
See performance drop on Power9: >>> http://cr.openjdk.java.net/~mdoerr/BiasedLocking_Power9.png >>> and on Intel Xeon E5: >>> http://cr.openjdk.java.net/~mdoerr/BiasedLocking_XeonE5.png >>> >>> Are there any plans for mitigations? >> >> I don't see what mitigations are possible. We know that if you use >> heavily synchronized code when it is not necessary (i.e. uncontended) >> then BL shines at improving performance. Real "old" code would have been >> updated years ago to move away from the synchronized library classes >> (Vector, Hashtable) that typically result in these situations. Given >> that we can't update things like SPEC jvm98, they will show lower >> performance without BL. But I don't think we should care about this in 2020. > > Well, we kind-of have to care, given that some of this "old" code that would > have been updated years ago is still in core classes in the Java library. I'm talking about caring about old code like SPECjvm98. The classes themselves are perfectly fine for MT situations. It is only ancient code that utilised them for single-threaded situations (and they had no choice prior to 1.2) that is impacted. > JDK-8254078 is a simple example, and a very modest proposal for change, > and it's still stuck in CSR. It was approved on October 25. David ----- From vlivanov at openjdk.java.net Mon Nov 9 14:40:03 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 9 Nov 2020 14:40:03 GMT Subject: RFR: 8256050: JVM crashes with -XX:+PrintDeoptimizationDetails Message-ID: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> -XX:+PrintDeoptimizationDetails triggers intermittent crashes. 
I spotted 2 independent problems which the patch addresses: * `markWord::print_on` doesn't handle the displaced header case well (the pointer stored in the header may be stale); * `InstanceKlass::oop_print_value_on` dumps some specific details about `MemberName`, but the code assumes the instance is fully initialized. That is not necessarily the case: for example, deoptimization can happen while the `MemberName` constructor is being executed. Testing: - [x] manually verified that the crashes go away with -XX:+PrintDeoptimizationDetails - [x] hs-precheckin-comp,hs-tier1,hs-tier2 ------------- Commit messages: - Fix PrintDeoptimizationDetails Changes: https://git.openjdk.java.net/jdk/pull/1124/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1124&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256050 Stats: 30 lines in 7 files changed: 15 ins; 0 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/1124.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1124/head:pull/1124 PR: https://git.openjdk.java.net/jdk/pull/1124 From aph at redhat.com Mon Nov 9 15:03:42 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 9 Nov 2020 15:03:42 +0000 Subject: Biased locking Obsoletion In-Reply-To: <9b9414e1-c7b7-59be-8e6d-d7d5d44527d1@oracle.com> References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <26c5c51e-9bbe-455c-7bf0-1f7cbcd126f2@oracle.com> <9b9414e1-c7b7-59be-8e6d-d7d5d44527d1@oracle.com> Message-ID: <695619f7-7a4e-121a-3a3b-b057732cb1e8@redhat.com> On 11/9/20 10:19 AM, Alan Bateman wrote: > On 09/11/2020 09:44, Andrew Haley wrote: >> : >> JDK-8254078 is a simple example, and a very modest proposal for change, >> and it's still stuck in CSR. > > The CSR was approved and closed on Oct 24 so you should be good to go. Wow, really? I didn't see any message. Thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From jorn.vernee at oracle.com Mon Nov 9 15:07:48 2020 From: jorn.vernee at oracle.com (Jorn Vernee) Date: Mon, 9 Nov 2020 16:07:48 +0100 Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15] In-Reply-To: <8c4cbff8-32b5-c90b-20da-dccfa49e970c@oracle.com> References: <8c4cbff8-32b5-c90b-20da-dccfa49e970c@oracle.com> Message-ID: <56e42c7c-4abf-3948-cb63-1f102acedd85@oracle.com> Hi David, On 09/11/2020 14:51, David Holmes wrote: > Hi Jorn, > > On 9/11/2020 9:57 pm, Jorn Vernee wrote: >> On Mon, 9 Nov 2020 03:29:05 GMT, David Holmes >> wrote: >> >>> src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 99: >>> >>>> 97: if (thread == NULL) { >>>> 98: JavaVM_ *vm = (JavaVM *)(&main_vm); >>>> 99: vm -> functions -> AttachCurrentThreadAsDaemon(vm, &p_env, >>>> NULL); >>> >>> Style nit: don't put spaces around `->` operator. >>> >>> What is the context for this being called? It looks highly >>> suspicious to just attach the current thread to the VM this way. >> >> The context is a thread that is spawned by native code doing an >> upcall. We need to attach the thread to the VM first in that case. >> Normally this would be handled by the calling code, but in our case >> the calling code doesn't know it's calling into Java. > > Apologies that I don't have enough knowledge of this feature to > understand the context. Shouldn't you then detach it again afterwards? > Otherwise when will it detach? If you don't detach you will get a > memory leak. I went back to look at the original thread that added this code [1]. At that point the decision was made to go with the memory leak to avoid additional complexity in assembly stub generation and performance loss from having to repeatedly attach and detach threads. At least the complexity argument is no longer valid. As for the performance argument.
This implementation of upcalls is not expected to deliver stellar performance, so I think calling detach again in case we did an attach is the right call. (Though we'll have to solve the same problem again down the line for the optimized version). Thanks for catching, Jorn [1] : https://mail.openjdk.java.net/pipermail/panama-dev/2019-June/005761.html >>> src/java.base/share/classes/jdk/internal/invoke/NativeEntryPoint.java >>> line 63: >>> >>>> 61: } >>>> 62: >>>> 63: public static NativeEntryPoint make(long addr, String name, >>>> ABIDescriptorProxy abi, VMStorageProxy[] argMoves, VMStorageProxy[] >>>> returnMoves, >>> >>> Where is name validation performed, to ensure the named native >>> method is in fact legal and not trying to provide backdoor access to >>> native code that should be encapsulated and protected? >> >> The name is just used as debugging information (in e.g. >> CallNativeNode::dump_spec), we are not looking it up. The address >> that is passed there is the actual function target. > > Okay. > > Thanks, > David > ----- > >> ------------- >> >> PR: https://git.openjdk.java.net/jdk/pull/634 >> From mcimadamore at openjdk.java.net Mon Nov 9 15:26:13 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 9 Nov 2020 15:26:13 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v25] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the third incubation round of the foreign memory access API incubation (see JEP 393 [1]).
This iteration focus on improving the usability of the API in 3 main ways: > > * first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from multiple threads > * second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee that the memory will be deallocated, eventually > * third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class has been added, which defines several useful dereference routines; these are really just thin wrappers around memory access var handles, but they make the barrier of entry for using this API somewhat lower. > > A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit of dereference. > > This has all changed as per this API refresh; now a `MemoryAddress` is just a dumb carrier which wraps a pair of object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is concerned. You cannot dereference memory if you don't have a segment. This improves usability in a number of ways - first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that; otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling `MemoryAddress::asSegmentRestricted`). > > A list of the API, implementation and test changes is provided below. 
If you have any questions, or need more detailed explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide the feedback required. > > A big thank to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segment would not have been possible; also I'd like to thank Paul Sandoz, whose insights on API design have been very helpful in this journey. > > Thanks > Maurizio > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254163 > > > > ### API Changes > > * `MemorySegment` > * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below) > * added a no-arg factory for a native restricted segment representing entire native heap > * rename `withOwnerThread` to `handoff` > * add new `share` method, to create shared segments > * add new `registerCleaner` method, to register a segment against a cleaner > * add more helpers to create arrays from a segment e.g. `toIntArray` > * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors) > * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`) > * `MemoryAddress` > * drop `segment` accessor > * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative to a given segment > * `MemoryAccess` > * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a carrier is one of the usual suspect (a Java primitive, minus `boolean`); the access mode can be simple (e.g. 
access base address of given segment), or indexed, in which case the accessor takes a segment and either a low-level byte offset, or a high-level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs. `getByteAtOffset` vs. `getByteAtIndex`).
> * `MemoryHandles`
>   * drop `withOffset` combinator
>   * drop `withStride` combinator
>   * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which it is easy to derive all the other handles using plain var handle combinators.
> * `Addressable`
>   * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2], to add more implementations. Clients can largely ignore this interface, which comes in really handy when defining native bindings with tools like `jextract`.
> * `MemoryLayouts`
>   * A new layout, for machine addresses, has been added to the mix.
>
> ### Implementation changes
>
> There are two main things to discuss here: support for shared segments, and the general simplification of the memory access var handle support.
>
> #### Shared segments
>
> The support for shared segments cuts pretty deep into the VM. Support for shared segments is notoriously hard to achieve, at least in a way that guarantees optimal access performance. This is caused by the fact that, if a segment is shared, it would be possible for a thread to close it while another is accessing it.
>
> After considering several options (see [3]), we zeroed in on an approach which is inspired by a happy idea that Andrew Haley had (and that he reminded me of at this year's OpenJDK committer workshop - thanks!). The idea is that if we could *freeze* the world (e.g. with a GC pause), while a segment is closed, we could then prevent segments from being accessed concurrently to a close operation.
For this to work, it is crucial that no GC safepoints can occur between a segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop right before an unsafe call). It also relies on the fact that hotspot/C2 should not be able to propagate loads across safepoints.
>
> Sadly, none of these conditions seems to be valid in the current implementation, so we needed to resort to a bit of creativity. First, we noted that, if we could mark so-called *scoped* methods with an annotation, it would be very simple to check whether a thread was in the middle of a scoped method when we stopped the world for a close operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312 [4]).
>
> The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread, thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds. Unfortunately the machinery for async exceptions is a bit fragile (e.g. not all the code in hotspot checks for async exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that another thread is accessing the segment being closed.
>
> As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a bug in the user code, likely arising from lack of synchronization, and should be treated as such.
>
> In terms of gritty implementation, we needed to centralize memory access routines in a single place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in addition, also provided a liveness check.
This way we could mark all these routines with the special `@Scoped` annotation, which tells the VM that something important is going on.
>
> To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`. This class contains all the main memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the access coordinates, also a scope object, which is tested before access. A reachability fence is also thrown in the mix to make sure that the scope is kept alive during access (which is important when registering segments against cleaners).
>
> Of course, to make memory access safe, memory access var handles, byte buffer var handles, and byte buffer API should use the new `ScopedMemoryAccess` class instead of unsafe, so that a liveness check can be triggered (in case a scope is present).
>
> `ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the handshake completed successfully.
>
> `MemoryScope` (now significantly simplified from what we had before) has two implementations, one for confined segments and one for shared segments; the main difference between the two is what happens when the scope is closed; a confined segment sets a boolean flag to false, and returns, whereas a shared segment goes into a `CLOSING` state, then starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive` will still return `true`, while the liveness check upon memory access will fail.
>
> #### Memory access var handles overhaul
>
> The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and `long`, all other access types could be derived from this basic var handle form.
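
The derivation idea in the paragraph above (a base accessor taking a carrier plus an offset, from which fixed-offset accessors are derived with plain combinators) can be illustrated with a minimal sketch. It is not the incubating API: it assumes a `ByteBuffer` view var handle as a stand-in for the `(MemorySegment, long)` coordinate pair, and it binds the offset with `MethodHandles.insertArguments` on a method handle obtained from the var handle. The class name `DerivedAccessors` and the two-field struct layout are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration only: ByteBuffer stands in for MemorySegment, int for the long offset.
public class DerivedAccessors {
    // Base accessor with coordinates (ByteBuffer, int byteOffset).
    static final VarHandle INT_AT_OFFSET =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);

    // Getter for the hypothetical field "y" of struct { int x; int y; }, derived
    // from the base accessor by binding the offset coordinate to 4.
    static final MethodHandle Y_GETTER = fieldGetter(4);

    /** Derives a (ByteBuffer)int getter by injecting a fixed byte offset. */
    static MethodHandle fieldGetter(int byteOffset) {
        // Type of the base handle: (ByteBuffer, int) -> int
        MethodHandle base = INT_AT_OFFSET.toMethodHandle(VarHandle.AccessMode.GET);
        // Bind argument 1 (the offset); the result has type (ByteBuffer) -> int
        return MethodHandles.insertArguments(base, 1, byteOffset);
    }

    static int readY(ByteBuffer struct) {
        try {
            return (int) Y_GETTER.invokeExact(struct);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        ByteBuffer struct = ByteBuffer.allocate(8);
        INT_AT_OFFSET.set(struct, 0, 10); // x
        INT_AT_OFFSET.set(struct, 4, 20); // y
        System.out.println(readY(struct));
    }
}
```

The point of the sketch is only that no special-purpose offset combinator is needed once the base accessor exposes the offset as an ordinary coordinate.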
>
> This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g. an additional offset is injected into a base memory access var handle.
>
> This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators, which previously needed low-level access to the innards of the memory access var handle. All that code is now gone.
>
> #### Test changes
>
> Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the microbenchmarks also needed some tweaks - and some of them were updated to also test performance in the shared segment case.
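
The close protocol for shared segments that `TestHandshake` stresses (`ALIVE` to `CLOSING`, then either `CLOSED` or back to `ALIVE`) can be modelled with a small toy class. This is a sketch under loud assumptions and not the JDK implementation: the thread-local handshake is replaced by a plain counter of in-flight accesses, and the names `SharedScopeModel`, `beginAccess` and `endAccess` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the shared-scope state machine described in the PR text.
// The real VM handshake is replaced by an accessor counter (an assumption).
public class SharedScopeModel {
    static final int ALIVE = 0, CLOSING = 1, CLOSED = 2;

    private final AtomicInteger state = new AtomicInteger(ALIVE);
    private final AtomicInteger activeAccessors = new AtomicInteger();

    /** Still reports alive while CLOSING, as the description notes for isAlive. */
    public boolean isAlive() {
        return state.get() != CLOSED;
    }

    /** Marks the start of an access; the liveness check fails in CLOSING and CLOSED. */
    public void beginAccess() {
        activeAccessors.incrementAndGet();
        if (state.get() != ALIVE) {
            activeAccessors.decrementAndGet();
            throw new IllegalStateException("Segment is not alive");
        }
    }

    public void endAccess() {
        activeAccessors.decrementAndGet();
    }

    /** Fails, and restores ALIVE, when another access is in flight. */
    public void close() {
        if (!state.compareAndSet(ALIVE, CLOSING)) {
            throw new IllegalStateException("Already closed or closing");
        }
        boolean handshakeOk = activeAccessors.get() == 0; // stand-in for the handshake
        state.set(handshakeOk ? CLOSED : ALIVE);
        if (!handshakeOk) {
            throw new IllegalStateException("Segment is being accessed by another thread");
        }
    }

    public static void main(String[] args) {
        SharedScopeModel scope = new SharedScopeModel();
        scope.beginAccess();
        try {
            scope.close();
        } catch (IllegalStateException e) {
            System.out.println("close failed: " + e.getMessage());
        }
        scope.endAccess();
        scope.close();
        System.out.println("alive after close: " + scope.isAlive());
    }
}
```

As in the description, a failed `close` here signals missing synchronization in the caller rather than a condition to retry.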
>
> [1] - https://openjdk.java.net/jeps/393
> [2] - https://openjdk.java.net/jeps/389
> [3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html
> [4] - https://openjdk.java.net/jeps/312

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Further improve output of TestHandshake

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/548/files
  - new: https://git.openjdk.java.net/jdk/pull/548/files/f5d339a7..27677a13

Webrevs:
  - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=24
  - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=23-24

Stats: 8 lines in 1 file changed: 4 ins; 2 del; 2 mod
Patch: https://git.openjdk.java.net/jdk/pull/548.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/548/head:pull/548

PR: https://git.openjdk.java.net/jdk/pull/548

From fparain at openjdk.java.net Mon Nov 9 15:48:01 2020
From: fparain at openjdk.java.net (Frederic Parain)
Date: Mon, 9 Nov 2020 15:48:01 GMT
Subject: RFR: 8256052: Remove unused allocation type from fieldInfo
Message-ID:

Please review this small cleanup code, removing the now unused allocation type from the fieldInfo structure.

Tested with Mach5, tiers 1 to 3 and locally by running test/hotspot/jtreg/serviceability/sa tests.

Thank you,

Fred

-------------

Commit messages:
  - Cleanup fieldInfo structure

Changes: https://git.openjdk.java.net/jdk/pull/1130/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1130&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8256052
Stats: 121 lines in 5 files changed: 6 ins; 98 del; 17 mod
Patch: https://git.openjdk.java.net/jdk/pull/1130.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/1130/head:pull/1130

PR: https://git.openjdk.java.net/jdk/pull/1130

From gziemski at openjdk.java.net Mon Nov 9 15:59:00 2020
From: gziemski at openjdk.java.net (Gerard Ziemski)
Date: Mon, 9 Nov 2020 15:59:00 GMT
Subject: RFR: 8253742: POSIX signal code cleanup [v5]
In-Reply-To:
References: <-uNIXq8h8NDGpzaNFVp4loICt2jh0EGev5EneflHsX4=.149b59bc-4ffc-4056-9ac1-82d3f4710ab6@github.com>
Message-ID: <5GGhy0pxA33snyHk4-oAP608V87gagZwctV9FlwBTR4=.ea121adc-64bd-47a5-831a-eff9bb984237@github.com>

On Mon, 9 Nov 2020 03:14:29 GMT, David Holmes wrote:

>> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Thomas' feedback - cleanup ucontext_get_pc/ucontext_set_pc
>
> Latest changes all look good.
>
> Only open issue is the include, or not, of signal.h.
>
> Thanks,
> David

> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_
>
> On 7/11/2020 2:40 am, Gerard Ziemski wrote:
>
> > On Wed, 4 Nov 2020 04:22:05 GMT, David Holmes wrote:
> > > > Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision:
> > > > - use ifdef(SIGDANGER) and ifdef(SIGTRAP)
> > > > - revert unblock_program_error_signals change
> > >
> > >
> > > src/hotspot/os/posix/signals_posix.hpp line 33:
> > > > 31:
> > > > 32: typedef siginfo_t siginfo_t;
> > > > 33: typedef sigset_t sigset_t;
> > >
> > >
> > > I don't see why this is needed/wanted. We can include signal.h without a problem.
> > > I'm not even sure what these typedefs means ??
> >
> >
> > Coleen asked me to remove `<signal.h>` from the signals_posix.hpp, so those are forward declarations for the signal types we use.
> > I thought it was a reasonable request to minimize the number of headers. I saw some efforts in the past to cleanup header files, which is supposed to help with build times, so every little bit helps.
>
> The only reason we would care is if signals_posix.hpp were included in
> many other headers/files and that should not be the case. This looks
> completely bogus to me as we need the types from signal.h in this header
> file.

Just trying to learn: why do we **need** them? We only included them in the APIs here, but we don't actually **use** them otherwise. And if we need to include `<signal.h>` then don't we also need to include the headers for `outputStream`, `Thread` and `OSThread`? All of these types are used to define the APIs in this header file and are used in the same capacity.

> What do those typedefs even mean? I would expect a forward
> declaration to be of the form:
>
> struct siginfo_t;
>
> but you don't know what type sigset_t (could be integer or struct)
> actually is so you can't forward declare it that way.

Forward declaration was a new concept to me, so I had to look it up and although it was easy enough to do it for the internal c++ classes, I struggled with the forward declaring of `struct sigset_t`, but I thought I found it. Should I also drop the forward declaration for `outputStream`, `Thread`, `OSThread`?

-------------

PR: https://git.openjdk.java.net/jdk/pull/636

From mcimadamore at openjdk.java.net Mon Nov 9 16:07:13 2020
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Mon, 9 Nov 2020 16:07:13 GMT
Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v26]
In-Reply-To:
References:
Message-ID:

> This patch contains the changes associated with the third incubation round of the foreign memory access API incubation (see JEP 393 [1]).

Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:

  Add more output in TestHandhsake.java

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/548/files
  - new: https://git.openjdk.java.net/jdk/pull/548/files/27677a13..a22b6b5b

Webrevs:
  - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=25
  - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=24-25

Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/548.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/548/head:pull/548

PR: https://git.openjdk.java.net/jdk/pull/548

From jvernee at openjdk.java.net Mon Nov 9 16:34:00 2020
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Mon, 9 Nov 2020 16:34:00 GMT
Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15]
In-Reply-To:
References:
Message-ID: <_Nl5ypHkLlY3aqitzjfT_Rot6lDm6TX3VS3KbTE_gBg=.4a0f5742-bad8-4ceb-805d-2e7e0c51ebf4@github.com>

On Mon, 9 Nov 2020 12:11:56 GMT, Jorn Vernee wrote:

>> I agree with Coleen.
>
> I'll give this another try, but I think last time I tried this resolution of the class failed when trying to build the JDK, seemingly since it exists in an incubator module, which is not always added to the module graph.

Ok, I can confirm that moving this to be a well-known class will result in a `java/lang/NoClassDefFoundError: jdk/internal/foreign/abi/ProgrammableUpcallHandler` error while trying to build the JDK. I think this is because the particular class is in an incubator module, which is not always present. I think we'll have to stick with the lazy resolution instead.

-------------

PR: https://git.openjdk.java.net/jdk/pull/634

From jvernee at openjdk.java.net Mon Nov 9 16:34:01 2020
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Mon, 9 Nov 2020 16:34:01 GMT
Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15]
In-Reply-To:
References:
Message-ID:

On Fri, 6 Nov 2020 22:02:31 GMT, Coleen Phillimore wrote:

>> src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 81:
>>
>>> 79: #endif
>>> 80:
>>> 81: Method* method = k->lookup_method(mname_sym, mdesc_sym);
>>
>> This "method" appears unused.
>
> This should be moved into javaClasses or common code. resolve_or_null only resolves the class, it doesn't also call the initializer for the class so you shouldn't be able to call a static method on the class.

I'll move this to common code and add a call to `Klass::initialize`

-------------

PR: https://git.openjdk.java.net/jdk/pull/634

From jvernee at openjdk.java.net Mon Nov 9 16:34:05 2020
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Mon, 9 Nov 2020 16:34:05 GMT
Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15]
In-Reply-To:
References:
Message-ID:

On Fri, 6 Nov 2020 22:07:39 GMT, Coleen Phillimore wrote:

>> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 64 commits:
>>
>> - Merge branch '8254162' into 8254231_linker
>> - Fix post-merge issues caused by 8219014
>> - Merge branch 'master' into 8254162
>> - Addess remaining feedback from @AlanBateman and @mrserb
>> - Address comments from @AlanBateman
>> - Fix typo in upcall helper for aarch64
>> - Merge branch '8254162' into 8254231_linker
>> - Merge branch 'master' into 8254162
>> - Fix issues with derived buffers and IO operations
>> - More 32-bit fixes for TestLayouts
>> - ...
and 54 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...b38afb3f
>
> src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 68:
>
>> 66: Symbol* cname_sym = SymbolTable::new_symbol(cname, (int)strlen(cname));
>> 67: Symbol* mname_sym = SymbolTable::new_symbol(mname, (int)strlen(mname));
>> 68: Symbol* mdesc_sym = SymbolTable::new_symbol(mdesc, (int)strlen(mdesc));
>
> You don't need the strlen() argument.

Ok, I see it has an overload that does that in the header file (and I only looked at the .cpp file. :( )

> src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 121:
>
>> 119: upcall_info.upcall_method.name, upcall_info.upcall_method.sig,
>> 120: &args, thread);
>> 121: }
>
> This code shouldn't be in the cpu directory. This should be in SharedRuntime or in jni.cpp. It should have a JNI_ENTRY and not transition directly. I don't know what AttachCurrentThreadAsDaemon does.

Roger that. We need the thread state transition though in case we get a random native thread calling us.

-------------

PR: https://git.openjdk.java.net/jdk/pull/634

From jvernee at openjdk.java.net Mon Nov 9 16:37:07 2020
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Mon, 9 Nov 2020 16:37:07 GMT
Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15]
In-Reply-To:
References:
Message-ID:

On Mon, 9 Nov 2020 03:52:27 GMT, David Holmes wrote:

>> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 64 commits:
>>
>> - Merge branch '8254162' into 8254231_linker
>> - Fix post-merge issues caused by 8219014
>> - Merge branch 'master' into 8254162
>> - Addess remaining feedback from @AlanBateman and @mrserb
>> - Address comments from @AlanBateman
>> - Fix typo in upcall helper for aarch64
>> - Merge branch '8254162' into 8254231_linker
>> - Merge branch 'master' into 8254162
>> - Fix issues with derived buffers and IO operations
>> - More 32-bit fixes for TestLayouts
>> - ... and 54 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...b38afb3f
>
> src/hotspot/share/prims/universalNativeInvoker.cpp line 40:
>
>> 38: assert(thread->thread_state() == _thread_in_native, "thread state is: %d", thread->thread_state());
>> 39: }
>> 40: assert(thread->thread_state() == _thread_in_vm, "thread state is: %d", thread->thread_state());
>
> Is there some reason you don't trust the thread-state transition code and are asserting it updates the state correctly all the time? :) There are already a number of assertions of this kind within the ThreadToNativeFromVM code.

Pre-existing code (to me at least). I agree it seems unnecessary, I'll remove the asserts.

-------------

PR: https://git.openjdk.java.net/jdk/pull/634

From jvernee at openjdk.java.net Mon Nov 9 16:37:09 2020
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Mon, 9 Nov 2020 16:37:09 GMT
Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v14]
In-Reply-To: <037JOlF9tavFXVI6H6AnLiI3GSpcJLiAOANXz_KTWUg=.16b28fcd-302a-42b2-b493-cd2e4e59b9b2@github.com>
References: <037JOlF9tavFXVI6H6AnLiI3GSpcJLiAOANXz_KTWUg=.16b28fcd-302a-42b2-b493-cd2e4e59b9b2@github.com>
Message-ID:

On Mon, 9 Nov 2020 03:56:38 GMT, David Holmes wrote:

>> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix typo in upcall helper for aarch64
>
> src/java.base/share/classes/java/lang/System.java line 2086:
>
>> 2084: break;
>> 2085: case "allow":
>> 2086: allowSecurityManager = MAYBE;
>
> Why is this removed? I don't see the connection.

Seems to be a problem from a merge gone wrong. Not related as you say. Will remove

-------------

PR: https://git.openjdk.java.net/jdk/pull/634

From redestad at openjdk.java.net Mon Nov 9 16:38:57 2020
From: redestad at openjdk.java.net (Claes Redestad)
Date: Mon, 9 Nov 2020 16:38:57 GMT
Subject: RFR: 8256052: Remove unused allocation type from fieldInfo
In-Reply-To:
References:
Message-ID:

On Mon, 9 Nov 2020 15:43:09 GMT, Frederic Parain wrote:

> Please review this small cleanup code, removing the now unused allocation type from the fieldInfo structure.
>
> Tested with Mach5, tiers 1 to 3 and locally by running test/hotspot/jtreg/serviceability/sa tests.
>
> Thank you,
>
> Fred

Nice cleanup!

src/hotspot/share/classfile/classFileParser.cpp line 1708:

> 1706:
> 1707: // Remember how many oops we encountered and compute allocation type
> 1708: const FieldAllocationType atype = fac->update(is_static, type);

The returned `FieldAllocationType` is never used at either call-site, so maybe the `update` method can be simplified, too?
(It seems all `update` does is increment a per-type counter, so the name is a bit surprising)

src/hotspot/share/runtime/vmStructs.cpp line 2261:

> 2259: declare_preprocessor_constant("FIELDINFO_TAG_SIZE", FIELDINFO_TAG_SIZE) \
> 2260: declare_preprocessor_constant("FIELDINFO_TAG_OFFSET", FIELDINFO_TAG_OFFSET) \
> 2261: declare_preprocessor_constant("FIELDINFO_TAG_CONTENDED", FIELDINFO_TAG_CONTENDED) \

Not sure it's necessary to add this with no usage?

-------------

Marked as reviewed by redestad (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1130

From kvn at openjdk.java.net Mon Nov 9 17:15:56 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Mon, 9 Nov 2020 17:15:56 GMT
Subject: RFR: 8256050: JVM crashes with -XX:+PrintDeoptimizationDetails
In-Reply-To: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com>
References: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com>
Message-ID:

On Mon, 9 Nov 2020 13:01:18 GMT, Vladimir Ivanov wrote:

> -XX:+PrintDeoptimizationDetails triggers intermittent crashes. I spotted 2 independent problems which the patch addresses:
> * `markWord::print_on` doesn't handle displaced header case well (the pointer stored in the header may be stale);
> * `InstanceKlass::oop_print_value_on` dumps some specific details about `MemberName`, but the code assumes the instance is fully initialized. It's not necessarily the case: for example, deoptimization can happen when `MemberName` constructor is being executed.
>
> Testing:
> - [x] manually verified that the crashes go away with -XX:+PrintDeoptimizationDetails
> - [x] hs-precheckin-comp,hs-tier1,hs-tier2

Good.

-------------

Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1124 From github.com+51754783+coreyashford at openjdk.java.net Mon Nov 9 17:32:00 2020 From: github.com+51754783+coreyashford at openjdk.java.net (Corey Ashford) Date: Mon, 9 Nov 2020 17:32:00 GMT Subject: RFR: 8248188: Add IntrinsicCandidate and API for Base64 decoding [v12] In-Reply-To: <-7PHVafzbyMukuWngsX5bdLvJPubN2KzjMWM2lrQnCs=.a278b608-3e1c-4126-9791-efe18a5d8d5e@github.com> References: <_JR-e3ZsRFwvZCR7ws34z5jLjp2kJQ1bu4gyl0RG1XU=.ec3040cf-8147-4dcd-b87d-4fd9be4eb59e@github.com> <-7PHVafzbyMukuWngsX5bdLvJPubN2KzjMWM2lrQnCs=.a278b608-3e1c-4126-9791-efe18a5d8d5e@github.com> Message-ID: On Mon, 26 Oct 2020 19:27:59 GMT, Paul Murphy wrote: >> Because the bytes are displayed e15..e8, instead of the other way around, it's hard to follow. As an example, consider just the last four bytes of the table, but displayed in the reverse order: >> >> 00||b0:0..5 00||b0:6..7||b1:0..3 00||b1:4..7||b2:0..1 00||b2:2..7 >> >> After vpextd with bit select pattern 00111111 for all bytes: >> >> b0:0..5||b0:6..7 b1:0..3||1:4..7 b2:0..1||b2:2..7 >> = >> b0:0..7 b1:0..7 b2:0..7 >> >> Should I reverse the order of this table with a comment at the top, to explain the reason for the reversal? It seems like a good idea. > > Since you are operating on doublewords here, expressing this as operations on a doubleword instead of bytes would be more intuitive here. I think the lane mappings for little endian are what throw me off. Are you satisfied with the recent changes to the comments? 
------------- PR: https://git.openjdk.java.net/jdk/pull/293 From kbarrett at openjdk.java.net Mon Nov 9 18:22:00 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 9 Nov 2020 18:22:00 GMT Subject: RFR: 8252588: HotSpot Style Guide should permit uniform initialization In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 11:05:35 GMT, Kim Barrett wrote: > Please review and vote on this change to the HotSpot Style Guide to > permit the use of uniform initialization, aka brace initialization, in > HotSpot code. Uniform initialization is a feature added in C++11. > > This is a modification of the Style Guide, so rough consensus among > the HotSpot Group members is required to make this change. Only Group > members should vote for approval (via the github PR), though reasoned > objections or comments from anyone will be considered. A decision to > approve will not be made before Monday 16-Nov-2020 at 12h00 UTC. > > [Note: This is the first attempt to change the Style Guide since the > revision that added a description for a change process (requires rough > consensus of the Group), and also since the start of using git and > github PRs. I'm making a guess at how to instantiate that process > within the new mechanisms.] Since we're piggybacking on github PRs here, please use the PR review process to approve (click on Review Changes > Approve), rather than sending a "vote: yes" email reply that would be normal for a CFV. Other responses can still use email of course. ------------- PR: https://git.openjdk.java.net/jdk/pull/1119 From mcimadamore at openjdk.java.net Mon Nov 9 18:25:27 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 9 Nov 2020 18:25:27 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v16] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the first incubation round of the foreign linker access API incubation > (see JEP 389 [1]).
This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). > > The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. > > Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. > > A big thanks to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thanks to Paul Sandoz, who provided many insights (often by trying the bits first hand). > > Thanks > Maurizio > > Webrev: > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff (relative to [3]): > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254232 > > > > ### API Changes > > The API changes are actually rather slim: > > * `LibraryLookup` > * This class allows clients to look up symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then look up symbols in that library.
> * `FunctionDescriptor` > * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. > * `CLinker` > * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. > * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. > * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back. > * `NativeScope` > * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resources* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. > * `MemorySegment` > * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope.
> > ### Safety > > The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API). > > ### Implementation changes > > The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. the same library cannot be loaded by different classloaders). > > As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. > > Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5].
> > The main idea behind the foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function. > > This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. > > For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating-point values are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. > > A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor`, the set of bindings associated with the downcall/upcall. > > At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`.
There are actually various strategies to interpret these bindings - listed below: > > * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them into their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the most "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. > > * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). > > * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI. > > For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there.
We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). > > Again, for further reading on the internals of the foreign linker support, please refer to [5]. > > #### Test changes > > Many new tests have been added to validate the foreign linker support; we have high-level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. > > Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI.
> > [1] - https://openjdk.java.net/jeps/389 > [2] - https://openjdk.java.net/jeps/393 > [3] - https://git.openjdk.java.net/jdk/pull/548 > [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md > [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html Maurizio Cimadamore has updated the pull request incrementally with two additional commits since the last revision: - Merge pull request #6 from JornVernee/MoveUpcallInfo Address review comments - - Use lazy constant for upcall_info - Reduce copied code between platforms - Clean up includes - Split thread attach from upcall_helper ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/634/files - new: https://git.openjdk.java.net/jdk/pull/634/files/b38afb3f..e9606edb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=15 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=14-15 Stats: 463 lines in 8 files changed: 100 ins; 296 del; 67 mod Patch: https://git.openjdk.java.net/jdk/pull/634.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634 PR: https://git.openjdk.java.net/jdk/pull/634 From shade at openjdk.java.net Mon Nov 9 19:06:00 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 9 Nov 2020 19:06:00 GMT Subject: RFR: 8253525: Implement getInstanceSize/sizeOf intrinsics [v5] In-Reply-To: <_TFbHUvH8zI29hGvpE3TGGNLGgi7PhP7rfed23up13U=.5a19aaed-cc1e-4d18-997d-f37616323e61@github.com> References: <_TFbHUvH8zI29hGvpE3TGGNLGgi7PhP7rfed23up13U=.5a19aaed-cc1e-4d18-997d-f37616323e61@github.com> Message-ID: On Wed, 21 Oct 2020 17:37:20 GMT, Aleksey Shipilev wrote: >> Good. > > Thanks for review, @kvn! I would also like a review from someone from serviceability. Friendly reminder. 
------------- PR: https://git.openjdk.java.net/jdk/pull/650 From fparain at openjdk.java.net Mon Nov 9 19:06:12 2020 From: fparain at openjdk.java.net (Frederic Parain) Date: Mon, 9 Nov 2020 19:06:12 GMT Subject: RFR: 8256052: Remove unused allocation type from fieldInfo [v2] In-Reply-To: References: Message-ID: > Please review this small cleanup code, removing the now unused allocation type from the fieldInfo structure. > > Tested with Mach5, tiers 1 to 3 and locally by running test/hotspot/jtreg/serviceability/sa tests. > > Thank you, > > Fred Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: More cleanup ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1130/files - new: https://git.openjdk.java.net/jdk/pull/1130/files/c048832e..b822ba0e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1130&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1130&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 1 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/1130.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1130/head:pull/1130 PR: https://git.openjdk.java.net/jdk/pull/1130 From jrose at openjdk.java.net Mon Nov 9 19:46:58 2020 From: jrose at openjdk.java.net (John R Rose) Date: Mon, 9 Nov 2020 19:46:58 GMT Subject: RFR: 8252588: HotSpot Style Guide should permit uniform initialization In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 11:05:35 GMT, Kim Barrett wrote: > Please review and vote on this change to the HotSpot Style Guide to > permit the use of uniform initialization, aka brace initialization, in > HotSpot code. Uniform initialization is a feature added in C++11. > > This is a modification of the Style Guide, so rough consensus among > the HotSpot Group members is required to make this change. Only Group > members should vote for approval (via the github PR), though reasoned > objections or comments from anyone will be considered.
A decision to > approve will not be made before Monday 16-Nov-2020 at 12h00 UTC. > > [Note: This is the first attempt to change the Style Guide since the > revision that added a description for a change process (requires rough > consensus of the Group), and also since the start of using git and > github PRs. I'm making a guess at how to instantiate that process > within the new mechanisms.] Allowing this syntax is helpful and safe. Helpful because it allows us to move towards a more uniform and modern C++ style. Safe because it doesn't threaten portability of our code base. In particular, C++11 (including this feature) is widely available on the platforms we care about, and this feature, being syntax sugar, does not drag in additional C++ runtime dependencies. ------------- Marked as reviewed by jrose (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1119 From fparain at openjdk.java.net Mon Nov 9 19:58:11 2020 From: fparain at openjdk.java.net (Frederic Parain) Date: Mon, 9 Nov 2020 19:58:11 GMT Subject: RFR: 8256052: Remove unused allocation type from fieldInfo [v3] In-Reply-To: References: Message-ID: <0TM274lRiS3ITNLU8TU7z5IMIhTqJNifNA6DjsqOH5s=.4311e578-e29e-4469-b734-ac666e299f8b@github.com> > Please review this small cleanup code, removing the now unused allocation type from the fieldInfo structure. > > Tested with Mach5, tiers 1 to 3 and locally by running test/hotspot/jtreg/serviceability/sa tests. 
> > Thank you, > > Fred Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: Remove unused symbol from vmStruct ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1130/files - new: https://git.openjdk.java.net/jdk/pull/1130/files/b822ba0e..b4b24792 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1130&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1130&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1130.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1130/head:pull/1130 PR: https://git.openjdk.java.net/jdk/pull/1130 From lfoltan at openjdk.java.net Mon Nov 9 19:58:12 2020 From: lfoltan at openjdk.java.net (Lois Foltan) Date: Mon, 9 Nov 2020 19:58:12 GMT Subject: RFR: 8256052: Remove unused allocation type from fieldInfo [v3] In-Reply-To: <0TM274lRiS3ITNLU8TU7z5IMIhTqJNifNA6DjsqOH5s=.4311e578-e29e-4469-b734-ac666e299f8b@github.com> References: <0TM274lRiS3ITNLU8TU7z5IMIhTqJNifNA6DjsqOH5s=.4311e578-e29e-4469-b734-ac666e299f8b@github.com> Message-ID: On Mon, 9 Nov 2020 19:54:58 GMT, Frederic Parain wrote: >> Please review this small cleanup code, removing the now unused allocation type from the fieldInfo structure. >> >> Tested with Mach5, tiers 1 to 3 and locally by running test/hotspot/jtreg/serviceability/sa tests. >> >> Thank you, >> >> Fred > > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused symbol from vmStruct Marked as reviewed by lfoltan (Reviewer). Looks good Fred. ------------- PR: https://git.openjdk.java.net/jdk/pull/1130
From lfoltan at openjdk.java.net Mon Nov 9 19:58:14 2020 From: lfoltan at openjdk.java.net (Lois Foltan) Date: Mon, 9 Nov 2020 19:58:14 GMT Subject: RFR: 8256052: Remove unused allocation type from fieldInfo [v3] In-Reply-To: References: <0TM274lRiS3ITNLU8TU7z5IMIhTqJNifNA6DjsqOH5s=.4311e578-e29e-4469-b734-ac666e299f8b@github.com> Message-ID: On Mon, 9 Nov 2020 19:54:09 GMT, Lois Foltan wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unused symbol from vmStruct > > Marked as reviewed by lfoltan (Reviewer). Looks good. ------------- PR: https://git.openjdk.java.net/jdk/pull/1130 From fparain at openjdk.java.net Mon Nov 9 19:58:13 2020 From: fparain at openjdk.java.net (Frederic Parain) Date: Mon, 9 Nov 2020 19:58:13 GMT Subject: RFR: 8256052: Remove unused allocation type from fieldInfo [v3] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 16:36:07 GMT, Claes Redestad wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unused symbol from vmStruct > > Nice cleanup! Hi Claes, Thank you for your review, the new version should address the points you raised. Fred > src/hotspot/share/classfile/classFileParser.cpp line 1708: > >> 1706: >> 1707: // Remember how many oops we encountered and compute allocation type >> 1708: const FieldAllocationType atype = fac->update(is_static, type); > > The returned `FieldAllocationType` is never used at either call-site, so maybe the `update` method can be simplified, too? (It seems all `update` does is increment a per-type counter, so the name is a bit surprising) Right, I removed the value returned by the `update()` method. 
> src/hotspot/share/runtime/vmStructs.cpp line 2261: > >> 2259: declare_preprocessor_constant("FIELDINFO_TAG_SIZE", FIELDINFO_TAG_SIZE) \ >> 2260: declare_preprocessor_constant("FIELDINFO_TAG_OFFSET", FIELDINFO_TAG_OFFSET) \ >> 2261: declare_preprocessor_constant("FIELDINFO_TAG_CONTENDED", FIELDINFO_TAG_CONTENDED) \ > > Not sure it's necessary to add this with no usage? Not necessary, removed. ------------- PR: https://git.openjdk.java.net/jdk/pull/1130 From hseigel at openjdk.java.net Mon Nov 9 19:58:15 2020 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 9 Nov 2020 19:58:15 GMT Subject: RFR: 8256052: Remove unused allocation type from fieldInfo [v3] In-Reply-To: <0TM274lRiS3ITNLU8TU7z5IMIhTqJNifNA6DjsqOH5s=.4311e578-e29e-4469-b734-ac666e299f8b@github.com> References: <0TM274lRiS3ITNLU8TU7z5IMIhTqJNifNA6DjsqOH5s=.4311e578-e29e-4469-b734-ac666e299f8b@github.com> Message-ID: On Mon, 9 Nov 2020 19:54:58 GMT, Frederic Parain wrote: >> Please review this small cleanup code, removing the now unused allocation type from the fieldInfo structure. >> >> Tested with Mach5, tiers 1 to 3 and locally by running test/hotspot/jtreg/serviceability/sa tests. >> >> Thank you, >> >> Fred > > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused symbol from vmStruct src/hotspot/share/oops/fieldInfo.hpp line 61: > 59: // [------------------offset----------------]01 - real field offset > 60: > 61: // Bit O indicates if the packed field contains an offset (O=1) or not (O=1) Hi Fred, should this comment say "... or not (0=0) ? 
------------- PR: https://git.openjdk.java.net/jdk/pull/1130 From fparain at openjdk.java.net Mon Nov 9 20:05:09 2020 From: fparain at openjdk.java.net (Frederic Parain) Date: Mon, 9 Nov 2020 20:05:09 GMT Subject: RFR: 8256052: Remove unused allocation type from fieldInfo [v4] In-Reply-To: References: Message-ID: > Please review this small cleanup code, removing the now unused allocation type from the fieldInfo structure. > > Tested with Mach5, tiers 1 to 3 and locally by running test/hotspot/jtreg/serviceability/sa tests. > > Thank you, > > Fred Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: Fix comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1130/files - new: https://git.openjdk.java.net/jdk/pull/1130/files/b4b24792..18ad6490 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1130&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1130&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1130.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1130/head:pull/1130 PR: https://git.openjdk.java.net/jdk/pull/1130 From fparain at openjdk.java.net Mon Nov 9 20:05:10 2020 From: fparain at openjdk.java.net (Frederic Parain) Date: Mon, 9 Nov 2020 20:05:10 GMT Subject: RFR: 8256052: Remove unused allocation type from fieldInfo [v3] In-Reply-To: References: <0TM274lRiS3ITNLU8TU7z5IMIhTqJNifNA6DjsqOH5s=.4311e578-e29e-4469-b734-ac666e299f8b@github.com> Message-ID: On Mon, 9 Nov 2020 19:53:32 GMT, Harold Seigel wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unused symbol from vmStruct > > src/hotspot/share/oops/fieldInfo.hpp line 61: > >> 59: // [------------------offset----------------]01 - real field offset >> 60: >> 61: // Bit O indicates if the packed field contains an offset (O=1) or not (O=1) > > Hi Fred, 
should this comment say "... or not (0=0) ? You're right. I've fixed the comment (and the line below which had the same issue). Fred ------------- PR: https://git.openjdk.java.net/jdk/pull/1130 From daniel.daugherty at oracle.com Mon Nov 9 20:16:58 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 9 Nov 2020 15:16:58 -0500 Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: Hi David, I'm going to try replying to this review comment via email to see how that works out... On 11/9/20 1:06 AM, David Holmes wrote: > Hi Dan, > > On 9/11/2020 1:50 pm, Daniel D.Daugherty wrote: >> On Sun, 8 Nov 2020 21:43:00 GMT, David Holmes >> wrote: >> >>>> How about this: >>>> static MonitorList _in_use_list; >>>> // The ratio of the current _in_use_list count to the ceiling is used >>>> // to determine if we are above MonitorUsedDeflationThreshold and need >>>> // to do an async monitor deflation cycle. The ceiling is increased by >>>> // AvgMonitorsPerThreadEstimate when a thread is added to the system >>>> // and is decreased by AvgMonitorsPerThreadEstimate when a thread is >>>> // removed from the system. >>>> // Note: If the _in_use_list max exceeds the ceiling, then >>>> // monitors_used_above_threshold() will use the in_use_list max instead >>>> // of the thread count derived ceiling because we have used more >>>> // ObjectMonitors than the estimated average. >>>> static jint _in_use_list_ceiling; >>> >>> Thanks for the comment. So instead of checking the threshold on >>> each OM allocation we use this averaging technique to estimate the >>> number of monitors in use?
Can you explain how this came about >>> rather than the simple/obvious check at allocation time. Thanks. >> >> I'm not sure I understand your question, but let me take a stab at it >> anyway... >> >> We used to compare the sum of the in-use counts from all the in-use >> lists >> with the total population of ObjectMonitors. If that ratio was higher >> than >> MonitorUsedDeflationThreshold, then we would do an async deflation >> cycle. >> Since we got rid of TSM, we no longer had a population of already >> allocated >> ObjectMonitors, we had a max value instead. However, when the VM's use >> of ObjectMonitors is first spinning up, the max value is typically >> very close >> to the in-use count so we would always be asking for an async-deflation >> during that spinning up phase. >> >> I created the idea of a ceiling value that is tied to thread count >> and the >> AvgMonitorsPerThreadEstimate to replace the population value that we >> used to have. By comparing the in-use count against the ceiling >> value, we >> no longer exceed the MonitorUsedDeflationThreshold when the VM's use >> of ObjectMonitors is first spinning up so we no longer do async >> deflations >> continuously during that phase. If the max value exceeds the ceiling >> value, >> then we're using a LOT of ObjectMonitors and, in that case, we compare >> the in-use count against the max to determine if we're exceeding the >> MonitorUsedDeflationThreshold. >> >> Does this help? > > It helps but I'm still wrestling with what > MonitorUsedDeflationThreshold actually means now. > > So the existing MonitorUsedDeflationThreshold is used as a measure of > the proportion of monitors actually in-use compared to the number of > monitors pre-allocated. Slight correction: not a measure of proportion, but a threshold on the proportion. > If an inflation request requires a new block to be allocated and we're > above MonitorUsedDeflationThreshold % then a request for async > deflation occurs (when we actually check).
Not quite. We did have code in om_alloc() that could cause a deflation cycle to be invoked, but that code was not enabled by default and was removed. See the fix for: JDK-8230940 Obsolete MonitorBound https://bugs.openjdk.java.net/browse/JDK-8230940 which was pushed in jdk-15-B22. In the current baseline, ObjectSynchronizer::is_async_deflation_needed() is used to ask if we need to perform an async deflation cycle. That function calls monitors_used_above_threshold() which is where the logic that uses the MonitorUsedDeflationThreshold value comes into play. > The new code, IIUC, says, let's assume we expect > AvgMonitorsPerThreadEstimate monitors-per-thread. If the number of > monitors in-use is > MonitorUsedDeflationThreshold % of > (AvgMonitorsPerThreadEstimate * number_of_threads), then we request > async deflation. You understand correctly. > So ... obviously we need some kind of watermark based system for > requesting deflation otherwise there will be far too many deflation > requests. Yes. That's what I saw before I added AvgMonitorsPerThreadEstimate and the ceiling. Although, I think I like the phrase "watermark" better! Should I rename this new ceiling concept to watermark? > And we also don't want to have to check for exceeding the threshold on > every monitor allocation. So the deflation thread will wake up > periodically and check if the threshold is exceeded. Exactly. GuaranteedSafepointInterval (default 1sec) controls how often the MonitorDeflationThread will wake up on its own to check for work. AsyncDeflationInterval (default 250ms) controls at most how often the MonitorDeflationThread will actually do the work. However, if an async deflation cycle is directly requested, then it will be honored regardless of the limits. At this point, only the WhiteBox support uses that feature. Although we do have an RFE from you to restore that feature to System.gc(): JDK-8249638 Re-instate idle monitor deflation as part of System.gc()
    https://bugs.openjdk.java.net/browse/JDK-8249638

> Okay ... so then it comes down to deciding whether AvgMonitorsPerThreadEstimate is the best way to establish the watermark and what the default value should be. This doesn't seem like something that an application developer could reasonably try to estimate so it is just going to be a tuning knob they adjust somewhat arbitrarily.

I actually don't expect anyone to care about AvgMonitorsPerThreadEstimate, but I've created it just-in-case.

> I assume the 1024 default came from tuning something?

The default 1024 value is the closest I can come to the baseline VM behavior where we allowed a per-thread free-list to allocate at most 1024 ObjectMonitors in one attempt. Since the per-thread free list is the fastest code path for an ObjectMonitor allocation in the baseline VM, I thought using that value made for a reasonable default for the AvgMonitorsPerThreadEstimate.

However, we don't pre-allocate in the new code so a thread doesn't actually have (at most) 1024 ObjectMonitors waiting on the per-thread free list for fast allocation.

> Have you looked at the effect on memory use these changes have (i.e. peak RSS use)?

Actually what I've been looking at is population values in the baseline VM and the max in the new code with Kitchensink8H runs. I don't have the absolute latest values, but, IIRC, the new code has a much smaller max value than the baseline population value. Obviously this just tells me the raw numbers of ObjectMonitors and doesn't include any "hidden" overhead that would be captured by RSS values, but I think it gives me a reasonable feel for memory utilization.

> Did your performance measurements look at using different values?

No. All my performance runs used default values. I tend to focus on out-of-the-box settings unless I'm looking to justify changing some default value.

> (I can imagine that with enough memory we can effectively disable deflation and so potentially increase performance.
> OTOH maybe deflation is so infrequent it is a non-issue.)
>
> I have to confess that I never really thought about the old set of heuristics for this, but the fact we're changing the heuristics does raise a concern about what impact applications may see.

None of the performance testing that we've done so far has raised any concerns about the new heuristics. The new ObjectMonitor inflation mechanism is much, much faster than the old mechanism. During my Inflate2 stress testing, the baseline VM would peak at a population of about 12 million at 4.5 hours into an 8 hour run with release bits. With the new code, Inflate2 would reach a max of 400+ million at about an hour with no signs that was the actual peak when I stopped the run.

> BTW MonitorUsedDeflationThreshold should really be diagnostic not experimental, as real applications may need to tune it (and people often don't want to use experimental flags in production as a matter of policy).

MonitorUsedDeflationThreshold wasn't added with this project. It was added by Robbin using this bug ID:

    JDK-8181859 Monitor deflation is not checked in cleanup path
    https://bugs.openjdk.java.net/browse/JDK-8181859

way back in jdk-10-B21... I don't know the reason that Robbin created the option as experimental rather than diagnostic, but I can investigate.

Thanks again for the review! At this point, I don't see anything that I plan to change in response to this set of comments. I do have a query up above about renaming the ceiling concept to watermark. Please let me know what you think.
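
For readers following the ceiling discussion, the check described in this exchange can be modelled roughly as below. This is only an illustrative sketch, not the HotSpot code: the struct, the function name with its `_sketch` suffix, and the parameter names are hypothetical, and the two flag values are passed in as plain arguments rather than read from the VM.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical snapshot of the counters discussed in the thread.
struct MonitorStatsSketch {
  size_t in_use;        // current in-use ObjectMonitor count
  size_t max_in_use;    // highest in-use count seen so far (the "max")
  size_t thread_count;  // number of threads currently in the system
};

// Returns true when the in-use count is more than threshold_percent of the
// effective ceiling. The ceiling is thread_count * avg_monitors_per_thread,
// unless the max has already exceeded it, in which case the max is used.
inline bool monitors_used_above_threshold_sketch(const MonitorStatsSketch& s,
                                                 size_t avg_monitors_per_thread,
                                                 size_t threshold_percent) {
  size_t ceiling = s.thread_count * avg_monitors_per_thread;
  // "If the max value exceeds the ceiling value ... we compare the in-use
  // count against the max".
  ceiling = std::max(ceiling, s.max_in_use);
  if (ceiling == 0) {
    return false;  // nothing allocated and no threads counted yet
  }
  // in_use / ceiling > threshold_percent / 100, kept in integer arithmetic.
  return s.in_use * 100 > threshold_percent * ceiling;
}
```

With one thread, an estimate of 1024 and a sample threshold of 90 (the 90 is just a value picked for the example), an in-use count of 900 that equals the max does not trigger a request, which is the spin-up case the ceiling was introduced for.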

Dan

>
> Thanks,
> David
> -----
>
>> -------------
>>
>> PR: https://git.openjdk.java.net/jdk/pull/642
>>

From hseigel at openjdk.java.net Mon Nov 9 20:26:57 2020
From: hseigel at openjdk.java.net (Harold Seigel)
Date: Mon, 9 Nov 2020 20:26:57 GMT
Subject: RFR: 8256052: Remove unused allocation type from fieldInfo [v4]
In-Reply-To:
References:
Message-ID: <-Wc3FeXsLIx0x0U3gt0oSW6qiTGSRuMKVr4qJ3aTTXE=.7ed06bf5-f795-44ca-99ca-848d199a2733@github.com>

On Mon, 9 Nov 2020 20:05:09 GMT, Frederic Parain wrote:

>> Please review this small cleanup code, removing the now unused allocation type from the fieldInfo structure.
>>
>> Tested with Mach5, tiers 1 to 3 and locally by running test/hotspot/jtreg/serviceability/sa tests.
>>
>> Thank you,
>>
>> Fred
>
> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix comment

Looks good!

-------------

Marked as reviewed by hseigel (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1130

From dcubed at openjdk.java.net Mon Nov 9 20:39:59 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Mon, 9 Nov 2020 20:39:59 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com>
Message-ID: <-VFMKh1P4U674scgRXbBch7b4eE91O4ZelExZOeXByw=.e4acb6ea-a3c6-4ac0-a6d2-fe2a09cc178b@github.com>

On Mon, 9 Nov 2020 08:34:56 GMT, Erik Österlund wrote:

>> I noticed that in my preliminary review of Erik's changes. He checked with the JFR guys and they said it just needed to be an address and does not have to refer to the Object.
>>
>> @fisk - can you think of a comment we should add here?
> > We could write something along the lines of "An address that is 'unique enough', such that events close in time and with the same address are likely (but not guaranteed) to belong to the same object". This uniqueness property has always been more of a heuristic thing than anything else, as deflation shuffles the addresses around. Taking the this pointer vs an offset into the this pointer does however serve the exact same purpose; there was never any correlation to the contents of the object field. Thanks @fisk! I've added a slightly edited version of the comment: // Set an address that is 'unique enough', such that events close in // time and with the same address are likely (but not guaranteed) to // belong to the same object. @dholmes-ora - does this work for you? >> I noticed that in my preliminary review of Erik's changes. He checked >> with the JFR guys and they said it just needed to be an address and >> does not have to refer to the Object. >> >> @fisk - can you think of a comment we should add here? > > I wrote one in the section above, hope it is useful. Thanks @fisk. I copied the same comment here since it is about 1000 lines away from the other comment. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From ayang at openjdk.java.net Mon Nov 9 20:42:00 2020 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 9 Nov 2020 20:42:00 GMT Subject: RFR: 8253600: G1: Fully support pinned regions for full gc [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 09:44:17 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I get reviews for this change that implements "proper" support for pinned regions in the G1 full collector? >> >> By proper I mean that at the end of gc, pinned regions contain the correct TAMS and bitmap markings under the TAMS so that dead objects within them are supported? >> >> Currently all (pinned) regions have their TAMS set to bottom() and their bitmap above TAMS cleared (at least logically :) ). 
>> This works as long as objects within these regions can't be dead, as is the case now:
>> - humongous regions are either live or fully reclaimed.
>> - all other pinned regions are archive regions at the moment that are always treated as fully live (and do not contain dead objects).
>>
>> This change is a requirement for fixing JDK-8253081 as some earlier change made it possible to have dead objects within open archive regions. It also enables supporting removal of gclocker use for g1, i.e. using region pinning.
>>
>> Based on the PR#808 (https://github.com/openjdk/jdk/pull/808).
>>
>> Testing: tier1-8, testing with prototype for region pinning, testing with prototype for JDK-8253081.
>> Performance testing: no regressions
>>
>> Some comments for questions that might come up during review:
>>
>> - how does this work with the bitmaps now:
>>   - at start of full gc the next bitmap is cleared
>>   - full gc marks the next bitmap
>>   - for all pinned regions, keep TAMS and top() (*), otherwise set TAMS to bottom
>>   - swap bitmaps
>>   - clear next bitmap for next marking
>>
>> (*) this means that from a usage POV pinned regions are considered full. This is inaccurate, but sufficient: full gc clears all remembered sets anyway, so we do not need that information for gc efficiency purposes anyway to evacuate later. The next marking before old gen evacuation will update it to the correct values anyway. G1 does not support allocation into "holes" in pinned regions that can be open archive only at this time too, so there is no need to be more exact.
>>
>> - use of a region attribute table for phase 2+ only: compared to before we need fast access to information whether a given reference goes into a pinned region (as opposed to an archive region) wrt adjusting that pointer to avoid doing work for these references.
>>
>> Phase 1 marking could have used this information for the do-we-need-to-preserve-the-mark check too: however this would have required g1 to add another pass over all regions to update that. This seemed slower than just checking this information "more slowly" for the objects that need mark preservation. Tests showed that this is the case for <0.00% (yeah, these references that need mark preservation are rounding errors in cases it matters) of overall references, so I did not add that pass.
>> (Additionally g1 full gc is a last-ditch effort, and while marking takes a significant time, it does not completely dominate it).
>>
>> I.e. the second clause in the condition of this hunk is intentionally slower than it could be:
>> @@ -52,7 +52,9 @@ inline bool G1FullGCMarker::mark_object(oop obj) {
>>    // Marked by us, preserve if needed.
>>    markWord mark = obj->mark();
>>    if (obj->mark_must_be_preserved(mark) &&
>>        // It is not necessary to preserve marks for objects in pinned regions because
>>        // we do not change their headers (i.e. forward them).
>>        !G1CollectedHeap::heap()->heap_region_containing(obj)->is_pinned()) {
>>      preserved_stack()->push(obj, mark);
>>    }
>> - there is no code yet that checks for empty pinned regions. Only JDK-8253081 introduces that because still all contents of all archive regions are live forever.
>>
>> Also please note that the 51b297b change is from the #808 change.
>>
>> Thanks,
>> Thomas
>
> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:
>
>   sjohanss review 2

Thank you for the code walk-through. Some further questions/comments.

src/hotspot/share/gc/g1/g1FullGCHeapRegionAttr.hpp line 58:

> 56:   bool is_pinned_or_closed(HeapWord* obj) const {
> 57:     assert(!is_invalid(obj), "not initialized yet");
> 58:     return get_by_address(obj) >= Pinned;

Better have a static assert that `ClosedArchive > Pinned`.
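
As a rough illustration of that suggestion, a compile-time check could look something like the sketch below. It uses the standard C++11 `static_assert` on a hypothetical stand-in struct: only `ClosedArchive = 2` and `Invalid = 255` are quoted in this thread, while the `Normal` and `Pinned` values and the simplified `is_pinned_or_closed(uint8_t)` signature are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the attribute values in g1FullGCHeapRegionAttr.hpp.
struct RegionAttrSketch {
  static const uint8_t Normal = 0;         // assumed value
  static const uint8_t Pinned = 1;         // assumed value
  static const uint8_t ClosedArchive = 2;  // quoted in the review
  static const uint8_t Invalid = 255;      // quoted in the review

  // The range check below only works if ClosedArchive sorts above Pinned,
  // so a reordering of the constants fails the build instead of silently
  // changing behavior.
  static_assert(ClosedArchive > Pinned,
                "is_pinned_or_closed() relies on ClosedArchive > Pinned");

  // Simplified: takes the attribute byte directly instead of looking it up
  // by address as the real table does.
  static bool is_pinned_or_closed(uint8_t attr) {
    assert(attr != Invalid && "not initialized yet");
    return attr >= Pinned;
  }
};
```

The check costs nothing at run time; it documents the ordering that the `>=` comparison depends on.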
src/hotspot/share/gc/g1/g1FullGCHeapRegionAttr.hpp line 37: > 35: static const uint8_t ClosedArchive = 2; > 36: > 37: static const uint8_t Invalid = 255; Why use 255 as the default value? Prior to this PR, the default value is 0. I think it's best to keep it intact. (Probably, the compiler can optimize `clear()` better if each element is reset to 0.) src/hotspot/share/gc/g1/g1FullCollector.hpp line 76: > 74: ReferenceProcessorSubjectToDiscoveryMutator _is_subject_mutator; > 75: > 76: G1FullGCHeapRegionAttr _region_attr_table; I don't really see the point of this auxiliary data structure. Why can't we just query the underlying region for its type, pinned, open/close archive? ------------- Changes requested by ayang (Author). PR: https://git.openjdk.java.net/jdk/pull/824 From coleenp at openjdk.java.net Mon Nov 9 20:43:06 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 9 Nov 2020 20:43:06 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v6] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 12:42:40 GMT, Coleen Phillimore wrote: >> Thank you for the update, Coleen! >> I leave it for you to decide to refactor the gc_notification or not. >> Thanks, >> Serguei > > Thanks @sspitsyn . I'm going to leave the gc_notification code because structurally the two sides of the if statement are different and it's not a long function. Thank you for reviewing the change. This change also passes tier 7,8 testing. 

-------------

PR: https://git.openjdk.java.net/jdk/pull/967

From dcubed at openjdk.java.net Mon Nov 9 20:44:56 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Mon, 9 Nov 2020 20:44:56 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com>
Message-ID: <9K1VRMjKJ4TIYx2Dc2Cn98vmFTIAaZ-l903nmvcl3rI=.c91f9d0b-e385-4c6c-977e-5555175a3783@github.com>

On Mon, 9 Nov 2020 08:45:49 GMT, Erik Österlund wrote:

>> @fisk - I'm leaving this one for you for now.
>
> Changing it to a do/while loop makes sense. The while condition is always true on the first iteration, so doing a while or do/while loop is equivalent. If you find the do/while loop easier to read, then that sounds good to me.

Okay. I've changed it to a do-while loop.

-------------

PR: https://git.openjdk.java.net/jdk/pull/642

From dcubed at openjdk.java.net Mon Nov 9 20:51:17 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Mon, 9 Nov 2020 20:51:17 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v3]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com>
Message-ID:

On Mon, 9 Nov 2020 03:19:17 GMT, Daniel D. Daugherty wrote:

>> @dholmes-ora - Thanks for the review! Hmm... I'm not sure why the GitHub UI sent out my replies one-at-a-time. Perhaps I should have replied from the "files" view instead of the main PR view?
>
> It is a deflated monitor that is still on the in-use list.

I've made all the changes based on @fisk's replies to @dholmes-ora's comments.

@dholmes-ora - I think I've addressed all of your comments. Please let me know if I've missed something.
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Mon Nov 9 20:51:17 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 9 Nov 2020 20:51:17 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v3] In-Reply-To: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: > Changes from @fisk and @dcubed-ojdk to: > > - simplify ObjectMonitor list management > - get rid of Type-Stable Memory (TSM) > > This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. > Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, > SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) > - a few minor regressions (<= -0.24%) > - Volano is 6.8% better > > Eric C. has also done promotion perf runs on these bits and says "the results look fine". Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: Resolve more @dholmes-ora comments with help from @fisk. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/642/files - new: https://git.openjdk.java.net/jdk/pull/642/files/6c2db34a..2b668f08 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=642&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=642&range=01-02 Stats: 8 lines in 2 files changed: 6 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/642.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/642/head:pull/642 PR: https://git.openjdk.java.net/jdk/pull/642 From stuefe at openjdk.java.net Mon Nov 9 20:55:57 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 9 Nov 2020 20:55:57 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v5] In-Reply-To: <5GGhy0pxA33snyHk4-oAP608V87gagZwctV9FlwBTR4=.ea121adc-64bd-47a5-831a-eff9bb984237@github.com> References: <-uNIXq8h8NDGpzaNFVp4loICt2jh0EGev5EneflHsX4=.149b59bc-4ffc-4056-9ac1-82d3f4710ab6@github.com> <5GGhy0pxA33snyHk4-oAP608V87gagZwctV9FlwBTR4=.ea121adc-64bd-47a5-831a-eff9bb984237@github.com> Message-ID: On Mon, 9 Nov 2020 15:55:13 GMT, Gerard Ziemski wrote: > > > Coleen asked me to remove from the signals_posix.hpp, so those are forward declarations for the signal types we use. > > > I thought it was a reasonable request to minimize the number of headers. I saw some efforts in the past to cleanup header files, which is supposed to help with build times, so every little bit helps. > > > > > > The only reason we would care is if signals_posix.hpp were included in > > many other handers/files and that should not be the case. This looks > > completely bogus to me as we need the types from signal.h in this header > > file. > > Just trying to learn: why do we **need** them? We only included them in the APIs here, but we don't actually **use** them otherwise. > > And if we need to include `` then don't we also need to include the headers for `outputStream`, `Thread` and `OSThread`? 
All of these types are used to define the APIs in this header file and are used in the same capacity.

You need to include them if you:
- use the full type, so whoever compiles the header has to know the type size. E.g. if you pass a structure by value.
- use the type as pointer and do not forward declare it.

In hotspot, it is standard practice to forward-declare structures from some hotspot utility headers - eg ostream.hpp - to avoid including them. There is no guideline of when to do this, and obviously there is a point at which it is simpler and clearer to just include the header.

In my personal opinion forward declaration is just a bandaid to work around badly designed headers. Ideally, headers should be small and concise. But hotspot headers are balls of yarn. Pull one in and you get a bunch of unrelated others. So people got used to forward declaring some classes instead, things like outputStream. I actually think it's a bad practice and in the ideal world we would just include whatever we need.

Which also means that the benefit of forward-declaring types from system headers is limited. Posix headers are usually well designed, and you are better off just including them. Especially since you need to be careful here: it is not clear what these opaque posix types actually are. Sometimes the standard tells you: "The header shall define the siginfo_t type as a structure.." but sometimes it leaves it open: "sigset_t ... Integer or structure type ...".

> > > What do those typedefs even mean? I would expect a forward declaration to be of the form:
> > > struct siginfo_t;
> > > but you don't know what type sigset_t (could be integer or struct) actually is so you can't forward declare it that way.
> >
> > Forward declaration was a new concept to me, so I had to look it up and although it was easy enough to do it for the internal c++ classes, I struggled with the forward declaring of `struct sigset_t`, but I thought I found it.
> >
> > Should I also drop the forward declaration for `outputStream`, `Thread`, `OSThread`?

For hotspot classes, I would leave the forward declarations in and the headers out. Current standard practice. System headers OTOH I would either include here or, somewhat better, in globalDefinitions_gcc.hpp.

-------------

PR: https://git.openjdk.java.net/jdk/pull/636

From dcubed at openjdk.java.net Mon Nov 9 21:05:57 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Mon, 9 Nov 2020 21:05:57 GMT
Subject: RFR: 8256050: JVM crashes with -XX:+PrintDeoptimizationDetails
In-Reply-To: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com>
References: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com>
Message-ID:

On Mon, 9 Nov 2020 13:01:18 GMT, Vladimir Ivanov wrote:

> -XX:+PrintDeoptimizationDetails triggers intermittent crashes. I spotted 2 independent problems which the patch addresses:
> * `markWord::print_on` doesn't handle displaced header case well (the pointer stored in the header may be stale);
> * `InstanceKlass::oop_print_value_on` dumps some specific details about `MemberName`, but the code assumes the instance is fully initialized. It's not necessarily the case: for example, deoptimization can happen when `MemberName` constructor is being executed.
>
> Testing:
> - [x] manually verified that the crashes go away with -XX:+PrintDeoptimizationDetails
> - [x] hs-precheckin-comp,hs-tier1,hs-tier2

Looks good. Your call on whether to add the comment I proposed.

src/hotspot/share/runtime/basicLock.cpp line 34:

> 32:   markWord mark_word = displaced_header();
> 33:   if (mark_word.value() != 0) {
> 34:     bool print_monitor_info = (owner != NULL) && (owner->mark() == markWord::from_pointer((void*)this));

Could use a comment between L33 and L34:

    // Print monitor info if there's an owning oop and it refers to this BasicLock.

-------------

Marked as reviewed by dcubed (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1124 From github.com+58006833+xbzhang99 at openjdk.java.net Mon Nov 9 21:18:08 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Mon, 9 Nov 2020 21:18:08 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v3] In-Reply-To: References: Message-ID: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number Xubo Zhang has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8255368' of github.com:xbzhang99/bugfixes into JDK-8255368 - Added test cases for exp at the value of 1024 and 10000 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/894/files - new: https://git.openjdk.java.net/jdk/pull/894/files/305d915b..757192c3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=01-02 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/894/head:pull/894 PR: https://git.openjdk.java.net/jdk/pull/894 From darcy at openjdk.java.net Mon Nov 9 21:34:57 2020 From: darcy at openjdk.java.net (Joe Darcy) Date: Mon, 9 Nov 2020 21:34:57 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v3] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 21:18:08 GMT, Xubo Zhang wrote: >> Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. 
>> Also changed movdqu to movsd as it was supposed to load a 64-bit number
>
> Xubo Zhang has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Merge branch 'JDK-8255368' of github.com:xbzhang99/bugfixes into JDK-8255368
>  - Added test cases for exp at the value of 1024 and 10000

test/jdk/java/lang/Math/WorstCaseTests.java line 117:

> 115:             {+0x1.1D5C2DAEBE367p4,      0x1.A8C02E974C314p25},
> 116:             {+0x1.C44CE0D716A1Ap4,      0x1.B890CA8637AE1p40},
> 117:             {+0x4.0p8,                  Double.POSITIVE_INFINITY}, //bug 8255368 gave 0

This is not an appropriate test to update to cover this bug. This test is specifically probing at difficult cases in double arithmetic for the underlying mathematical function as opposed to flaws in a particular implementation.

-------------

PR: https://git.openjdk.java.net/jdk/pull/894

From github.com+58006833+xbzhang99 at openjdk.java.net Mon Nov 9 22:12:57 2020
From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang)
Date: Mon, 9 Nov 2020 22:12:57 GMT
Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v3]
In-Reply-To: <_NtWpo5mp-I1zmPtn883ezvoO1jA7WdJflfoFXcpL70=.96a06d93-47a4-48dd-88fc-f3c777315291@github.com>
References: <_NtWpo5mp-I1zmPtn883ezvoO1jA7WdJflfoFXcpL70=.96a06d93-47a4-48dd-88fc-f3c777315291@github.com>
Message-ID:

On Mon, 2 Nov 2020 17:42:27 GMT, Joe Darcy wrote:

>> Xubo Zhang has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Merge branch 'JDK-8255368' of github.com:xbzhang99/bugfixes into JDK-8255368
>>  - Added test cases for exp at the value of 1024 and 10000
>
> The regression tests for exp should be explicitly updated to cover the previously erroneous input if they do not do so already.

Hi Darcy,

Where should the test be? A new test file?

Best regards,
Xubo

From: Joe Darcy
Sent: Monday, November 9, 2020 1:33 PM
To: openjdk/jdk
Cc: Zhang, Xubo ; Mention
Subject: Re: [openjdk/jdk] 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms (#894)

@jddarcy commented on this pull request.

________________________________

In test/jdk/java/lang/Math/WorstCaseTests.java:

> @@ -114,8 +114,8 @@ private static int testWorstExp() {
>              {+0x1.A8EAD058BC6B8p3,      0x1.1D71965F516ADp19},
>              {+0x1.1D5C2DAEBE367p4,      0x1.A8C02E974C314p25},
>              {+0x1.C44CE0D716A1Ap4,      0x1.B890CA8637AE1p40},
> -            {+0x4.0p8,                  Double.POSITIVE_INFINITY},
> -            {+0x2.71p12,                Double.POSITIVE_INFINITY},
> +            {+0x4.0p8,                  Double.POSITIVE_INFINITY}, //bug 8255368 gave 0

This is not an appropriate test to update to cover this bug. This test is specifically probing at difficult cases in double arithmetic for the underlying mathematical function as opposed to flaws in a particular implementation.

-------------

PR: https://git.openjdk.java.net/jdk/pull/894

From dholmes at openjdk.java.net Mon Nov 9 22:48:55 2020
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 9 Nov 2020 22:48:55 GMT
Subject: RFR: 8252588: HotSpot Style Guide should permit uniform initialization
In-Reply-To:
References:
Message-ID: <30R1KBb8P-A_bgvkcj67Bod_PBUe1z9Gk9QT3z8hC5Y=.c40c02ba-160e-4223-9635-caeca9be9f47@github.com>

On Mon, 9 Nov 2020 11:05:35 GMT, Kim Barrett wrote:

> Please review and vote on this change to the HotSpot Style Guide to permit the use of uniform initialization, aka brace initialization, in HotSpot code. Uniform initialization is a feature added in C++11.
>
> This is a modification of the Style Guide, so rough consensus among the HotSpot Group members is required to make this change. Only Group members should vote for approval (via the github PR), though reasoned objections or comments from anyone will be considered.
> A decision to approve will not be made before Monday 16-Nov-2020 at 12h00 UTC.
>
> [Note: This is the first attempt to change the Style Guide since the revision that added a description for a change process (requires rough consensus of the Group), and also since the start of using git and github PRs. I'm making a guess at how to instantiate that process within the new mechanisms.]

Seems quite reasonable.

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1119

From david.holmes at oracle.com Mon Nov 9 23:07:21 2020
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 10 Nov 2020 09:07:21 +1000
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com>
Message-ID:

On 9/11/2020 7:05 pm, Erik Österlund wrote:
> On Sat, 7 Nov 2020 17:22:21 GMT, Daniel D. Daugherty wrote:
>
>>> src/hotspot/share/runtime/synchronizer.cpp line 1520:
>>>
>>>> 1518:   // deflated in this cycle.
>>>> 1519:   size_t deleted_count = 0;
>>>> 1520:   for (ObjectMonitor* monitor: delete_list) {
>>>
>>> I didn't realize C++ has a "foreach" loop construct! Is this in our allowed C++ usage?
>>
>> @fisk - this one is for you... :-)
>
> Yeah this is one of the new cool features we can use. I thought it is allowed, because it is neither in the excluded nor undecided list of features in our doc/hotspot-style.md file.

But also not in the allowed list yet, so I'm checking on this - ref:

https://bugs.openjdk.java.net/browse/JDK-8254733

Cheers,
David

> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/642
>

From david.holmes at oracle.com Mon Nov 9 23:09:59 2020
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 10 Nov 2020 09:09:59 +1000
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com>
Message-ID: <265f6954-5a6d-1535-2e3f-48f56cb7e927@oracle.com>

Hi Dan,

I'm going to top-post just to keep this short :)

Thanks for all the details below on the performance aspects and use of the heuristics - all good.

Regarding "ceiling" versus "watermark" ... a "high watermark" is a ceiling so it's really a matter of personal preference what terminology you want to use.

Thanks again,
David

On 10/11/2020 6:16 am, Daniel D. Daugherty wrote:
> Hi David,
>
> I'm going to try replying to this review comment via email to see how that works out...
>
> On 11/9/20 1:06 AM, David Holmes wrote:
>> Hi Dan,
>>
>> On 9/11/2020 1:50 pm, Daniel D.Daugherty wrote:
>>> On Sun, 8 Nov 2020 21:43:00 GMT, David Holmes wrote:
>>>
>>>>> How about this:
>>>>>   static MonitorList   _in_use_list;
>>>>>   // The ratio of the current _in_use_list count to the ceiling is used
>>>>>   // to determine if we are above MonitorUsedDeflationThreshold and need
>>>>>   // to do an async monitor deflation cycle. The ceiling is increased by
>>>>>   // AvgMonitorsPerThreadEstimate when a thread is added to the system
>>>>>   // and is decreased by AvgMonitorsPerThreadEstimate when a thread is
>>>>>   // removed from the system.
>>>>>   // Note: If the _in_use_list max exceeds the ceiling, then
>>>>>   // monitors_used_above_threshold() will use the in_use_list max instead
>>>>>   // of the thread count derived ceiling because we have used more
>>>>>   // ObjectMonitors than the estimated average.
>>>>>   static jint          _in_use_list_ceiling;
>>>>
>>>> Thanks for the comment. So instead of checking the threshold on each OM allocation we use this averaging technique to estimate the number of monitors in use? Can you explain how this came about rather than the simple/obvious check at allocation time. Thanks.
>>>
>>> I'm not sure I understand your question, but let me take a stab at it anyway...
>>>
>>> We used to compare the sum of the in-use counts from all the in-use lists with the total population of ObjectMonitors. If that ratio was higher than MonitorUsedDeflationThreshold, then we would do an async deflation cycle. Since we got rid of TSM, we no longer had a population of already allocated ObjectMonitors, we had a max value instead. However, when the VM's use of ObjectMonitors is first spinning up, the max value is typically very close to the in-use count so we would always be asking for an async-deflation during that spinning up phase.
>>>
>>> I created the idea of a ceiling value that is tied to thread count and the AvgMonitorsPerThreadEstimate to replace the population value that we used to have. By comparing the in-use count against the ceiling value, we no longer exceed the MonitorUsedDeflationThreshold when the VM's use of ObjectMonitors is first spinning up so we no longer do async deflations continuously during that phase. If the max value exceeds the ceiling value, then we're using a LOT of ObjectMonitors and, in that case, we compare the in-use count against the max to determine if we're exceeding the MonitorUsedDeflationThreshold.
>>>
>>> Does this help?
>> >> It helps but I'm still wrestling with what >> MonitorUsedDeflationThreshold actually means now. >> >> So the existing MonitorUsedDeflationThreshold is used as a measure of >> the proportion of monitors actually in-use compared to the number of >> monitors pre-allocated. > > Slight correction: not a measure of proportion, but a threshold on the > proportion. > > >> If an inflation request requires a new block to be allocated and we're >> above MonitorUsedDeflationThreshold % then a request for async >> deflation occurs (when we actually check). > > Not quite. We did have code in om_alloc() that could cause a deflation > cycle to be invoked, but that code was not enabled by default and was > removed. See the fix for: > >     JDK-8230940 Obsolete MonitorBound >     https://bugs.openjdk.java.net/browse/JDK-8230940 > > which was pushed in jdk-15-B22. > > In the current baseline, ObjectSynchronizer::is_async_deflation_needed() > is used to ask if we need to perform an async deflation cycle. That > function calls monitors_used_above_threshold() which is where the logic > that uses the MonitorUsedDeflationThreshold value comes into play. > > >> The new code, IIUC, says, let's assume we expect >> AvgMonitorsPerThreadEstimate monitors-per-thread. If the number of >> monitors in-use is > MonitorUsedDeflationThreshold % of >> (AvgMonitorsPerThreadEstimate * number_of_threads), then we request >> async deflation. > > You understand correctly. > > >> So ... obviously we need some kind of watermark based system for >> requesting deflation otherwise there will be far too many deflation >> requests. > > Yes. That's what I saw before I added AvgMonitorsPerThreadEstimate and > the ceiling. Although, I think I like the phrase "watermark" better! > > Should I rename this new ceiling concept to watermark? > > >> And we also don't want to have to check for exceeding the threshold on >> every monitor allocation.
So the deflation thread will wake up >> periodically and check if the threshold is exceeded. > > Exactly. GuaranteedSafepointInterval (default 1sec) controls how often > the MonitorDeflationThread will wake up on its own to check for work. > AsyncDeflationInterval (default 250ms) controls at most how often > the MonitorDeflationThread will actually do the work. > > However, if an async deflation cycle is directly requested, then it > will be honored regardless of the limits. At this point, only the > WhiteBox support uses that feature. Although we do have an RFE from > you to restore that feature to System.gc(): > >   JDK-8249638 Re-instate idle monitor deflation as part of System.gc() >   https://bugs.openjdk.java.net/browse/JDK-8249638 > > >> Okay ... so then it comes down to deciding whether >> AvgMonitorsPerThreadEstimate is the best way to establish the >> watermark and what the default value should be. This doesn't seem like >> something that an application developer could reasonably try to >> estimate so it is just going to be a tuning knob they adjust somewhat >> arbitrarily. > > I actually don't expect anyone to care about AvgMonitorsPerThreadEstimate, > but I've created it just-in-case. > > >> I assume the 1024 default came from tuning something? > > The default 1024 value is the closest I can come to the baseline VM > behavior where we allowed a per-thread free-list to allocate at most > 1024 ObjectMonitors in one attempt. Since the per-thread free list is > the fastest code path for an ObjectMonitor allocation in the baseline > VM, I thought using that value made for a reasonable default for the > AvgMonitorsPerThreadEstimate. > > However, we don't pre-allocate in the new code so a thread doesn't > actually have (at most) 1024 ObjectMonitors waiting on the per-thread > free list for fast allocation. > > >> Have you looked at the effect on memory use these changes have (i.e. >> peak RSS use)? 
> > Actually what I've been looking at is population values in the baseline > VM and the max in the new code with Kitchensink8H runs. I don't have > the absolute latest values, but, IIRC, the new code has a much smaller > max value than the baseline population value. > > Obviously this just tells me the raw numbers of ObjectMonitors and doesn't > include any "hidden" overhead that would be captured by RSS values, but I > think it gives me a reasonable feel for memory utilization. > > >> Did your performance measurements look at using different values? > > No. All my performance runs used default values. I tend to focus on > out-of-the-box settings unless I'm looking to justify changing some > default value. > > >> (I can imagine that with enough memory we can effectively disable >> deflation and so potentially increase performance. OTOH maybe >> deflation is so infrequent it is a non-issue.) >> >> I have to confess that I never really thought about the old set of >> heuristics for this, but the fact we're changing the heuristics does >> raise a concern about what impact applications may see. > > None of the performance testing that we've done so far has raised any > concerns about the new heuristics. > > The new ObjectMonitor inflation mechanism is much, much faster than > the old mechanism. During my Inflate2 stress testing, the baseline VM > would peak at a population of about 12 million at 4.5 hours into an 8 > hour run with release bits. With the new code, Inflate2 would reach a > max of 400+ million at about an hour with no signs that was the actual > peak when I stopped the run. > > >> BTW MonitorUsedDeflationThreshold should really be diagnostic not >> experimental, as real applications may need to tune it (and people >> often don't want to use experimental flags in production as a matter >> of policy). > > MonitorUsedDeflationThreshold wasn't added with this project. It was > added by Robbin using this bug ID: > >     
JDK-8181859 Monitor deflation is not checked in cleanup path >     https://bugs.openjdk.java.net/browse/JDK-8181859 > > way back in jdk-10-B21... I don't know the reason that Robbin created > the option as experimental rather than diagnostic, but I can investigate. > > Thanks again for the review! > > > At this point, I don't see anything that I plan to change in response > to this set of comments. I do have a query up above about renaming the > ceiling concept to watermark. Please let me know what you think. > > Dan > >> >> Thanks, >> David >> ----- >> >>> ------------- >>> >>> PR: https://git.openjdk.java.net/jdk/pull/642 >>> > From dcubed at openjdk.java.net Mon Nov 9 23:09:59 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 9 Nov 2020 23:09:59 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v3] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Mon, 9 Nov 2020 09:03:37 GMT, Erik Österlund wrote: >> @fisk - this one is for you... :-) > > Yeah this is one of the new cool features we can use. I thought it is allowed, because it is neither in the excluded nor undecided list of features in our doc/hotspot-style.md file. 
https://bugs.openjdk.java.net/browse/JDK-8254733 ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Mon Nov 9 23:18:03 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 9 Nov 2020 23:18:03 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v3] In-Reply-To: <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Fri, 6 Nov 2020 02:40:44 GMT, David Holmes wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> Resolve more @dholmes-ora comments with help from @fisk. > > Hi Dan, > > Overall this looks great. Comparing old and new code is complex but the new code on its own is generally much simpler/clearer (not all though :) ). > > I have a few nits, comments and queries below. > > Thanks, > David @dholmes-ora - I'll stick with ceiling for now. If you're satisfied with the changeset, please mark this PR as approved. @fisk - Please mark this PR as approved if you're happy with the current version. @robehn and @coleenp - It would be good to hear from one or both of you... 
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From david.holmes at oracle.com Mon Nov 9 23:18:38 2020 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 Nov 2020 09:18:38 +1000 Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v2] In-Reply-To: <-VFMKh1P4U674scgRXbBch7b4eE91O4ZelExZOeXByw=.e4acb6ea-a3c6-4ac0-a6d2-fe2a09cc178b@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> <-VFMKh1P4U674scgRXbBch7b4eE91O4ZelExZOeXByw=.e4acb6ea-a3c6-4ac0-a6d2-fe2a09cc178b@github.com> Message-ID: <6dcb9110-fb3e-11c6-6685-6415650025c8@oracle.com> On 10/11/2020 6:39 am, Daniel D.Daugherty wrote: > On Mon, 9 Nov 2020 08:34:56 GMT, Erik Österlund wrote: > >>> I noticed that in my preliminary review of Erik's changes. He checked >>> with the JFR guys and they said it just needed to be an address and >>> does not have to refer to the Object. >>> >>> @fisk - can you think of a comment we should add here? >> >> We could write something along the lines of "An address that is 'unique enough', such that events close in time and with the same address are likely (but not guaranteed) to belong to the same object". This uniqueness property has always been more of a heuristic thing than anything else, as deflation shuffles the addresses around. Taking the this pointer vs an offset into the this pointer does however serve the exact same purpose; there was never any correlation to the contents of the object field. > > Thanks @fisk! I've added a slightly edited version of the comment: > // Set an address that is 'unique enough', such that events close in > // time and with the same address are likely (but not guaranteed) to > // belong to the same object. > @dholmes-ora - does this work for you? Yes that is fine - thanks. 
David ----- From dholmes at openjdk.java.net Mon Nov 9 23:25:00 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 9 Nov 2020 23:25:00 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v3] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: On Mon, 9 Nov 2020 20:51:17 GMT, Daniel D. Daugherty wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: > > Resolve more @dholmes-ora comments with help from @fisk. src/hotspot/share/runtime/synchronizer.cpp line 89: > 87: ObjectMonitor* head = Atomic::load_acquire(&_head); > 88: ObjectMonitor* m = head; > 89: do { This wasn't the loop I was referring to. It is the while loop below this at line 93. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From daniel.daugherty at oracle.com Mon Nov 9 23:29:03 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 9 Nov 2020 18:29:03 -0500 Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v3] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: On 11/9/20 6:25 PM, David Holmes wrote: > On Mon, 9 Nov 2020 20:51:17 GMT, Daniel D. 
Daugherty wrote: > >>> Changes from @fisk and @dcubed-ojdk to: >>> >>> - simplify ObjectMonitor list management >>> - get rid of Type-Stable Memory (TSM) >>> >>> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >>> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >>> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >>> - a few minor regressions (<= -0.24%) >>> - Volano is 6.8% better >>> >>> Eric C. has also done promotion perf runs on these bits and says "the results look fine". >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> Resolve more @dholmes-ora comments with help from @fisk. > src/hotspot/share/runtime/synchronizer.cpp line 89: > >> 87: ObjectMonitor* head = Atomic::load_acquire(&_head); >> 88: ObjectMonitor* m = head; >> 89: do { > This wasn't the loop I was referring to. It is the while loop below this at line 93. The PR is showing comments from you for both while loops. And at this point, both have been tweaked... Dan > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Mon Nov 9 23:42:56 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 9 Nov 2020 23:42:56 GMT Subject: RFR: 8252588: HotSpot Style Guide should permit uniform initialization In-Reply-To: References: Message-ID: <7rBn7xI7FHJ6joVi62s-l1Tin4DRo3_nRyqpLgPaZw0=.8ff99cdc-6d79-44ca-827b-3be653ad44ca@github.com> On Mon, 9 Nov 2020 11:05:35 GMT, Kim Barrett wrote: > Please review and vote on this change to the HotSpot Style Guide to > permit the use of uniform initialization, aka brace initialization, in > HotSpot code. Uniform initialization is a feature added in C++11. > > This is a modification of the Style Guide, so rough consensus among > the HotSpot Group members is required to make this change. 
Only Group > members should vote for approval (via the github PR), though reasoned > objections or comments from anyone will be considered. A decision to > approve will not be made before Monday 16-Nov-2020 at 12h00 UTC. > > [Note: This is the first attempt to change the Style Guide since the > revision that added a description for a change process (requires rough > consensus of the Group), and also since the start of using git and > github PRs. I'm making a guess at how to instantiate that process > within the new mechanisms.] Looks like a useful addition. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1119 From dholmes at openjdk.java.net Tue Nov 10 01:05:59 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 10 Nov 2020 01:05:59 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v3] In-Reply-To: <9K1VRMjKJ4TIYx2Dc2Cn98vmFTIAaZ-l903nmvcl3rI=.c91f9d0b-e385-4c6c-977e-5555175a3783@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> <9K1VRMjKJ4TIYx2Dc2Cn98vmFTIAaZ-l903nmvcl3rI=.c91f9d0b-e385-4c6c-977e-5555175a3783@github.com> Message-ID: On Mon, 9 Nov 2020 20:42:19 GMT, Daniel D. Daugherty wrote: >> Changing it to a do/while loop makes sense. The while condition is always true the first iteration, so doing a while or do/while loop is equivalent. If you find the do/while loop easier to read, then that sounds good to me. > > Okay. I've changed it to a do-while loop. The loop that was being discussed here is the one on line 93 of the original changeset: `while (next != NULL && next->is_being_async_deflated()) {` I made no comment on the outer while loop. 
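The equivalence mentioned above (a `while` loop whose condition is known to hold on entry behaves the same as a `do`/`while` loop) can be illustrated with a small list walk. This is a generic sketch, not the loop from synchronizer.cpp; `Node` and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical singly linked node standing in for a monitor list entry.
struct Node {
  Node* next;
};

// Plain while loop: tests the condition before the first iteration.
size_t count_with_while(const Node* head) {
  size_t n = 0;
  const Node* m = head;
  while (m != nullptr) {
    ++n;
    m = m->next;
  }
  return n;
}

// do/while loop: valid only because the caller guarantees a non-null head,
// i.e. the condition would have been true on the first test anyway.
size_t count_with_do_while(const Node* head) {
  assert(head != nullptr);
  size_t n = 0;
  const Node* m = head;
  do {
    ++n;
    m = m->next;
  } while (m != nullptr);
  return n;
}
```

The two functions agree for every non-empty list; they differ only when the entry condition can be false, which is why the precondition is asserted in the `do`/`while` variant.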
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From david.holmes at oracle.com Tue Nov 10 01:33:11 2020 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 Nov 2020 11:33:11 +1000 Subject: RFR: 8253742: POSIX signal code cleanup [v5] In-Reply-To: References: <-uNIXq8h8NDGpzaNFVp4loICt2jh0EGev5EneflHsX4=.149b59bc-4ffc-4056-9ac1-82d3f4710ab6@github.com> <5GGhy0pxA33snyHk4-oAP608V87gagZwctV9FlwBTR4=.ea121adc-64bd-47a5-831a-eff9bb984237@github.com> Message-ID: <304228dd-5ef8-d311-f4ba-cbc969e369b6@oracle.com> Let me just add a little to what Thomas says ... On 10/11/2020 6:55 am, Thomas Stuefe wrote: > On Mon, 9 Nov 2020 15:55:13 GMT, Gerard Ziemski wrote: > >>>> Coleen asked me to remove `<signal.h>` from signals_posix.hpp, so those are forward declarations for the signal types we use. >>>> I thought it was a reasonable request to minimize the number of headers. I saw some efforts in the past to clean up header files, which is supposed to help with build times, so every little bit helps. >>> >>> >>> The only reason we would care is if signals_posix.hpp were included in >>> many other headers/files and that should not be the case. This looks >>> completely bogus to me as we need the types from signal.h in this header >>> file. >> >> Just trying to learn: why do we **need** them? We only included them in the APIs here, but we don't actually **use** them otherwise. They are external types that we need to use in the declarations of our API, so the general rule is that you include the header that defines those types (directly or knowingly indirectly). That applies to hotspot types and system types in the first instance. >> >> And if we need to include `<signal.h>` then don't we also need to include the headers for `outputStream`, `Thread` and `OSThread`? All of these types are used to define the APIs in this header file and are used in the same capacity. 
For hotspot, as our headers are somewhat of a mess, we often have to simplify things (ie avoid circular dependencies) by eliding includes if we can get away with simple class forward declarations. e.g. class Thread; class outputStream; More recently there has been a push to improve build times by detecting excessive includes of particular headers, and avoiding those includes by using forward declarations if possible; or by moving code to cpp files. To me this current situation with posix_signal.hpp and signal.h does not meet any of the criteria to try and avoid the include. > You need to include them if you: > - use the full type, so whoever compiles the header has to know the type size. E.g. if you pass a structure by value. > - use the type as pointer and do not forward declare it. > > In hotspot, it is standard practice to forward-declare structures from some hotspot utility headers - eg ostream.hpp - to avoid including them. There is no guideline of when to do this, and obviously there is a point at which it is simpler and clearer to just include the header. > > In my personal opinion forward declaration is just a bandaid to work around badly designed headers. Ideally, headers should be small and concise. But hotspot headers are balls of yarn. Pull one in you get a bunch of unrelated others. So people got used to forward declaring some classes instead, things like outputStream. I actually think its a bad practice and in the ideal world we would just include whatever we need. > > Which also means that the benefit of forward-declaring types from system headers is limited. Posix headers are usually well designed, and you are better off just including them. Especially since you need to be careful here: it is not clear what these opaque posix types are actually. Sometimes the standard tells you: "The header shall define the siginfo_t type as a structure.." but sometimes it leaves it open: "sigset_t ... Integer or structure type ...". 
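The rule of thumb above can be made concrete with a small POSIX-only sketch: a class that is only used by reference can be forward declared, while an opaque system type such as `sigset_t` (which may be an integer or a structure) should come from its header. `Printer` and `describe` are hypothetical names used for illustration; `Printer` merely stands in for a class like `outputStream`.

```cpp
#include <cassert>
#include <signal.h>  // sigset_t may be an integer or a struct, so include rather than forward declare
#include <string>

// Forward declaration is enough here: the declaration below only uses a reference.
class Printer;

// Compiles while Printer is still an incomplete type.
void describe(Printer& out, const sigset_t* set);

// The full definition is required before members are used or objects are created.
class Printer {
 public:
  void print(const std::string& s) { text += s; }
  std::string text;
};

void describe(Printer& out, const sigset_t* set) {
  // sigismember() returns 1 when the signal is a member of the set.
  out.print(sigismember(set, SIGINT) == 1 ? "SIGINT in set" : "SIGINT not in set");
}
```

A declaration such as `struct sigset_t;` would be wrong on any platform where `sigset_t` is a typedef for an integer type, which is the hazard described in the quoted text.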
> >> >>> What do those typedefs even mean? I would expect a forward >>> declaration to be of the form: >>> struct siginfo_t; >>> but you don't know what type sigset_t (could be integer or struct) >>> actually is so you can't forward declare it that way. >> >> Forward declaration was a new concept to me, so I had to look it up and although it was easy enough to do it for the internal c++ classes, I struggled with the forward declaring of `struct sigset_t`, but I thought I found it. >> >> Should I also drop the forward declaration for `outputStream`, `Thread`, `OSThread`? > > For hotspot classes, I would leave the forward declarations in and the headers out. Current standard practice. System headers OTOH I would either include here or, somewhat better, in globalDefinitions_gcc.hpp. Ha! It is already in globalDefinitions_gcc.hpp so neither the direct include nor the forward declarations are actually needed. Cheers, David ----- > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/636 > From dholmes at openjdk.java.net Tue Nov 10 01:51:02 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 10 Nov 2020 01:51:02 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v16] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 18:25:27 GMT, Maurizio Cimadamore wrote: >> This patch contains the changes associated with the first incubation round of the foreign linker access API incubation >> (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). >> >> The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. 
>> >> Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. >> >> A big thank you to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank you to Paul Sandoz, who provided many insights (often by trying the bits first hand). >> >> Thanks >> Maurizio >> >> Webrev: >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev >> >> Javadoc: >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html >> >> Specdiff (relative to [3]): >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html >> >> CSR: >> >> https://bugs.openjdk.java.net/browse/JDK-8254232 >> >> >> >> ### API Changes >> >> The API changes are actually rather slim: >> >> * `LibraryLookup` >> * This class allows clients to look up symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then look up symbols on that library. >> * `FunctionDescriptor` >> * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. >> * `CLinker` >> * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. 
The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. >> * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. >> * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back. >> * `NativeScope` >> * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resources* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. >> * `MemorySegment` >> * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. >> >> ### Safety >> >> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API). 
>> >> ### Implementation changes >> >> The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. the same library cannot be loaded by different classloaders). >> >> As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. >> >> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. >> >> The main idea behind foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function. >> >> This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. 
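The slicing scheme described above for the bounded `NativeScope` variant (one underlying block, each allocation request turned into a cheap slice) resembles a bump allocator. The following is a loose C++ analogy under stated assumptions, not the Java implementation: `BoundedArena` and its members are hypothetical names, and alignment is computed relative to the start of the block, which `malloc` aligns suitably for fundamental types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical bounded arena: a single malloc'ed block handed out in slices.
class BoundedArena {
 public:
  explicit BoundedArena(size_t capacity)
      : _base(static_cast<char*>(std::malloc(capacity))),
        _capacity(capacity),
        _used(0) {}

  // Releasing the arena frees every slice at once.
  ~BoundedArena() { std::free(_base); }

  BoundedArena(const BoundedArena&) = delete;
  BoundedArena& operator=(const BoundedArena&) = delete;

  // Returns a slice of the block, or nullptr if the request does not fit.
  // 'alignment' must be a power of two.
  void* allocate(size_t bytes, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // Round the current offset up to the requested alignment.
    const size_t start = (_used + alignment - 1) & ~(alignment - 1);
    if (_base == nullptr || start > _capacity || bytes > _capacity - start) {
      return nullptr;
    }
    _used = start + bytes;
    return _base + start;  // no call to malloc on this path
  }

  size_t used() const { return _used; }

 private:
  char*  _base;
  size_t _capacity;
  size_t _used;
};
```

Only the constructor calls `malloc`; each `allocate` is an offset adjustment, which mirrors the claim that not every allocation request turns into a `malloc` call. A growable variant would chain further blocks instead of returning `nullptr`.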
>> >> For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating-point values are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. >> >> A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall. >> >> At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: >> >> * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them into their expected registers/stack slots (note that each position in the buffer corresponds to a different register). 
This is the most general invocation mode, the most "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. >> >> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). >> >> * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI. >> >> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). >> >> Again, for further reading on the internals of the foreign linker support, please refer to [5]. >> >> #### Test changes >> >> Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. 
There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. >> >> Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI. >> >> [1] - https://openjdk.java.net/jeps/389 >> [2] - https://openjdk.java.net/jeps/393 >> [3] - https://git.openjdk.java.net/jdk/pull/548 >> [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md >> [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html > > Maurizio Cimadamore has updated the pull request incrementally with two additional commits since the last revision: > > - Merge pull request #6 from JornVernee/MoveUpcallInfo > > Address review comments > - - Use lazy constant for upcall_info > - Reduce copied code between platforms > - Clean up includes > - Split thread attach from upcall_helper src/hotspot/cpu/aarch64/universalNativeInvoker_aarch64.cpp line 29: > 27: #include "prims/universalNativeInvoker.hpp" > 28: #include "memory/resourceArea.hpp" > 29: #include "code/codeBlob.hpp" Nit: includes should be in alphabetical order src/hotspot/cpu/x86/universalNativeInvoker_x86.cpp line 28: > 26: #include "prims/universalNativeInvoker.hpp" > 27: #include "memory/resourceArea.hpp" > 28: #include "code/codeBlob.hpp" Nit: includes should be in alphabetical order src/hotspot/share/prims/universalUpcallHandler.cpp line 54: > 52: if (thread == nullptr) { > 53: JavaVM_ *vm = (JavaVM *)(&main_vm); > 
54: vm->functions->AttachCurrentThread(vm, (void**) &p_env, nullptr); The return value should be checked in case of errors. Making this non-daemon now is probably somewhat safer. ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From david.holmes at oracle.com Tue Nov 10 02:13:24 2020 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 Nov 2020 12:13:24 +1000 Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15] In-Reply-To: <_Nl5ypHkLlY3aqitzjfT_Rot6lDm6TX3VS3KbTE_gBg=.4a0f5742-bad8-4ceb-805d-2e7e0c51ebf4@github.com> References: <_Nl5ypHkLlY3aqitzjfT_Rot6lDm6TX3VS3KbTE_gBg=.4a0f5742-bad8-4ceb-805d-2e7e0c51ebf4@github.com> Message-ID: On 10/11/2020 2:34 am, Jorn Vernee wrote: > On Mon, 9 Nov 2020 12:11:56 GMT, Jorn Vernee wrote: > >>> I agree with Coleen. >> >> I'll give this another try, but I think last time I tried this resolution of the class failed when trying to build the JDK, seemingly since it exists in an incubator module, which is not always added to the module graph. > > Ok, I can confirm that moving this to be a well-known class will result in a `java/lang/NoClassDefFoundError: jdk/internal/foreign/abi/ProgrammableUpcallHandler` error while trying to build the JDK. I think this is because the particular class is in an incubator module, which is not always present. Right ... well-known classes appear to be limited to being in java.base module. > I think we'll have to stick with the lazy resolution instead. I think this could still be done non-racily during VM startup, after module system initialization i.e. between: call_initPhase2() ... call_initPhase3() in Threads::create_vm. And it could still use the mechanisms in systemDictionary to define a global accessor I think, even if not initialized with the other "well known" classes. I don't have a good mental picture of how all the pieces of this connect in terms of Java APIs and VM entry points so these structuring suggestions may, or may not make sense. 
Potentially this could be a future cleanup anyway. Thanks, David ----- > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/634 > From dcubed at openjdk.java.net Tue Nov 10 02:17:57 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 02:17:57 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v3] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> <9K1VRMjKJ4TIYx2Dc2Cn98vmFTIAaZ-l903nmvcl3rI=.c91f9d0b-e385-4c6c-977e-5555175a3783@github.com> Message-ID: On Tue, 10 Nov 2020 01:03:00 GMT, David Holmes wrote: >> Okay. I've changed it to a do-while loop. > > The loop that was being discussed here is the one on line 93 of the original changeset: > `while (next != NULL && next->is_being_async_deflated()) {` > I made no comment on the outer while loop. You're correct. I got confused by the line renumbering due to other changes. Fortunately, the outer while loop can also correctly and easily be changed into a do-while loop so I'll keep that change. I'm looking at the correct inner while loop now. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/642

From dcubed at openjdk.java.net Tue Nov 10 02:35:17 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Tue, 10 Nov 2020 02:35:17 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4]
In-Reply-To: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com>
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com>
Message-ID: <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com>

> Changes from @fisk and @dcubed-ojdk to:
>
> - simplify ObjectMonitor list management
> - get rid of Type-Stable Memory (TSM)
>
> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions.
> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint,
> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano)
> - a few minor regressions (<= -0.24%)
> - Volano is 6.8% better
>
> Eric C. has also done promotion perf runs on these bits and says "the results look fine".

Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision:

dholmes-ora - convert inner while loop to do-while loop in unlink_deflated().

-------------

Changes:
- all: https://git.openjdk.java.net/jdk/pull/642/files
- new: https://git.openjdk.java.net/jdk/pull/642/files/2b668f08..aec90b9a

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=642&range=03
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=642&range=02-03

Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
Patch: https://git.openjdk.java.net/jdk/pull/642.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/642/head:pull/642

PR: https://git.openjdk.java.net/jdk/pull/642

From dcubed at openjdk.java.net Tue Nov 10 02:35:18 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Tue, 10 Nov 2020 02:35:18 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> <9K1VRMjKJ4TIYx2Dc2Cn98vmFTIAaZ-l903nmvcl3rI=.c91f9d0b-e385-4c6c-977e-5555175a3783@github.com>
Message-ID:

On Tue, 10 Nov 2020 02:15:19 GMT, Daniel D. Daugherty wrote:

>> The loop that was being discussed here is the one on line 93 of the original changeset:
>> `while (next != NULL && next->is_being_async_deflated()) {`
>> I made no comment on the outer while loop.
>
> You're correct. I got confused by the line renumbering due to other changes.
> Fortunately, the outer while loop can also correctly and easily be changed
> into a do-while loop so I'll keep that change. I'm looking at the correct inner
> while loop now.

The inner while loop is now converted into a do-while loop.
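
For readers following the while versus do-while discussion, here is a minimal, self-contained C++ sketch of the idea. It is not the code from synchronizer.cpp: `Node` and its `deflated` flag are hypothetical stand-ins for `ObjectMonitor` and `is_being_async_deflated()`, the function name is reused only for illustration, and all atomics and concurrency concerns are left out. The point it shows is that once an enclosing `if` has already tested the first candidate, a do-while avoids repeating that test on entry.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a monitor list node; not the HotSpot ObjectMonitor.
struct Node {
  Node* next;
  bool  deflated;
};

// Unlinks every run of deflated nodes that follows a live node.
// Assumes the head node itself is live. Returns the number of unlinked nodes.
static size_t unlink_deflated(Node* head) {
  size_t unlinked = 0;
  for (Node* m = head; m != nullptr; m = m->next) {
    Node* next = m->next;
    if (next != nullptr && next->deflated) {
      // The guard above already established that 'next' is deflated,
      // so the do-while does not repeat that test on the first iteration.
      do {
        next = next->next;
        unlinked++;
      } while (next != nullptr && next->deflated);
      m->next = next;  // splice out the whole run at once
    }
  }
  return unlinked;
}
```

With a list live -> deflated -> deflated -> live -> deflated, the sketch unlinks three nodes and leaves the two live nodes linked together.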
-------------

PR: https://git.openjdk.java.net/jdk/pull/642

From dcubed at openjdk.java.net Tue Nov 10 02:35:19 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Tue, 10 Nov 2020 02:35:19 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v3]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com>
Message-ID: <9lQzjB5MoTaMBnO_24juUkl6I3s8DONmuSt5CiKCyP8=.8dd7b150-6953-430d-bf56-91b738af7832@github.com>

On Mon, 9 Nov 2020 23:21:48 GMT, David Holmes wrote:

>> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Resolve more @dholmes-ora comments with help from @fisk.
>
> src/hotspot/share/runtime/synchronizer.cpp line 89:
>
>> 87: ObjectMonitor* head = Atomic::load_acquire(&_head);
>> 88: ObjectMonitor* m = head;
>> 89: do {
>
> This wasn't the loop I was referring to. It is the while loop below this at line 93.

The inner while loop is now converted into a do-while loop.

-------------

PR: https://git.openjdk.java.net/jdk/pull/642

From dongbo at openjdk.java.net Tue Nov 10 02:59:09 2020
From: dongbo at openjdk.java.net (Dong Bo)
Date: Tue, 10 Nov 2020 02:59:09 GMT
Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v6]
In-Reply-To:
References:
Message-ID:

> Base64.encodeBlock stub is implemented for x86_64.
> We can also do the same thing for aarch64 with SIMD LD3/ST4/TBL.
> A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
>
> Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build.
> Tests in test/jdk/java/util/Base64/* were run specifically to check the correctness of the implementation and passed.
>
> A JMH micro, Base64Encode.java, is added for performance test.
> With different input length (upper-bounded by parameter `maxNumBytes` in the JMH micro),
> we witness ~4x improvement with long input and no regression with short input on Kunpeng916 and Kunpeng920.
>
> The Base64Encode.java JMH micro-benchmark results:
> Benchmark                      (maxNumBytes)  Mode  Cnt      Score   Error  Units
> # kunpeng 916, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10     31.564 ± 0.034  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     33.921 ± 0.362  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     38.015 ± 0.220  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     41.115 ± 0.281  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     42.161 ± 0.630  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     44.797 ± 0.849  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     46.013 ± 0.917  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     67.984 ± 0.777  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    174.494 ± 1.614  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    277.103 ± 0.306  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10   4261.018 ± 1.883  ns/op
>
> # kunpeng 916, default
> Base64Encode.testBase64Encode              1  avgt   10     31.710 ± 0.234  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     33.978 ± 0.305  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     40.059 ± 0.444  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     47.958 ± 0.328  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     49.017 ± 1.305  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     53.150 ± 0.769  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     55.418 ± 0.316  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     93.517 ± 0.391  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    494.809 ± 0.413  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    898.581 ± 0.944  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  16464.411 ± 7.582  ns/op
>
> # kunpeng 920, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10     17.494 ± 0.012  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     21.023 ± 0.169  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     25.772 ± 0.138  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     30.121 ± 0.347  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     31.591 ± 0.238  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     32.728 ± 0.395  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     35.110 ± 0.215  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     48.621 ± 0.314  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    113.391 ± 0.554  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    180.749 ± 0.193  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10   3273.961 ± 5.706  ns/op
>
> # kunpeng 920, default
> Base64Encode.testBase64Encode              1  avgt   10     17.428 ± 0.037  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     20.926 ± 0.155  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     25.466 ± 0.140  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     32.526 ± 0.190  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     34.132 ± 0.387  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     36.685 ± 0.212  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     38.117 ± 0.246  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     62.447 ± 0.900  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    377.275 ± 0.162  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    700.628 ± 0.509  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  13626.764 ± 3.448  ns/op

Dong Bo has updated the pull request incrementally with one additional commit since the last revision:

fix register naming style

-------------

Changes:
- all: https://git.openjdk.java.net/jdk/pull/992/files
- new: https://git.openjdk.java.net/jdk/pull/992/files/6d6103c5..e3380c84

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=05
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=04-05

Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod
Patch: https://git.openjdk.java.net/jdk/pull/992.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/992/head:pull/992

PR: https://git.openjdk.java.net/jdk/pull/992

From dongbo at openjdk.java.net Tue Nov 10 03:20:09 2020
From: dongbo at openjdk.java.net (Dong Bo)
Date: Tue, 10 Nov 2020 03:20:09 GMT
Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v7]
In-Reply-To:
References:
Message-ID: <4GWrn1Ligk5L4R_g0yl54gwOcKOIguJeTgTVXf14cb8=.d63bbfbb-b257-42b6-a232-a7be33e268ef@github.com>

> Base64.encodeBlock stub is implemented for x86_64.
> We can also do the same thing for aarch64 with SIMD LD3/ST4/TBL.
> A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
>
> Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build.
> Tests in test/jdk/java/util/Base64/* were run specifically to check the correctness of the implementation and passed.
>
> A JMH micro, Base64Encode.java, is added for performance test.
> With different input length (upper-bounded by parameter `maxNumBytes` in the JMH micro),
> we witness ~4x improvement with long input and no regression with short input on Kunpeng916 and Kunpeng920.

Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

fix register naming style

-------------

Changes:
- all: https://git.openjdk.java.net/jdk/pull/992/files
- new: https://git.openjdk.java.net/jdk/pull/992/files/e3380c84..5f4bc36c

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=06
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=992&range=05-06

Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/992.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/992/head:pull/992

PR: https://git.openjdk.java.net/jdk/pull/992

From dongbo at openjdk.java.net Tue Nov 10 03:22:56 2020
From: dongbo at openjdk.java.net (Dong Bo)
Date: Tue, 10 Nov 2020 03:22:56 GMT
Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic
In-Reply-To:
References:
Message-ID: <8KxHHfqAM9M5Nkohsy9Qfz9GgFECjhHJ7AUzRAVq1yI=.144e66b0-92f1-409d-8b8c-4e8b82f33e4b@github.com>

On Mon, 2 Nov 2020 03:05:48 GMT, Dong Bo wrote:

> Base64.encodeBlock stub is implemented for x86_64.
> We can also do the same thing for aarch64 with SIMD LD3/ST4/TBL.
> A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
>
> Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build.
> Tests in test/jdk/java/util/Base64/* were run specifically to check the correctness of the implementation and passed.
>
> A JMH micro, Base64Encode.java, is added for performance test.
> With different input length (upper-bounded by parameter `maxNumBytes` in the JMH micro),
> we witness ~4x improvement with long input and no regression with short input on Kunpeng916 and Kunpeng920.

@theRealAph Fixed the register name style in the newest version.
I added a comment line to make it clear that c_rarg6 and c_rarg7 are not arguments, but free registers to use as temps.
Suggestions?

-------------

PR: https://git.openjdk.java.net/jdk/pull/992

From david.holmes at oracle.com Tue Nov 10 03:30:13 2020
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 10 Nov 2020 13:30:13 +1000
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> <9K1VRMjKJ4TIYx2Dc2Cn98vmFTIAaZ-l903nmvcl3rI=.c91f9d0b-e385-4c6c-977e-5555175a3783@github.com>
Message-ID: <081aacad-8ab4-a928-5d43-3584777bc7cf@oracle.com>

On 10/11/2020 12:35 pm, Daniel D.Daugherty wrote:
> On Tue, 10 Nov 2020 02:15:19 GMT, Daniel D. Daugherty wrote:
>
>>> The loop that was being discussed here is the one on line 93 of the original changeset:
>>> `while (next != NULL && next->is_being_async_deflated()) {`
>>> I made no comment on the outer while loop.
>>
>> You're correct. I got confused by the line renumbering due to other changes.
>> Fortunately, the outer while loop can also correctly and easily be changed
>> into a do-while loop so I'll keep that change. I'm looking at the correct inner
>> while loop now.
>
> The inner while loop is now converted into a do-while loop.

There was a little more to it than just that, but never mind. I think I see why next_next is needed, so let's move on.

Thanks,
David

> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/642
>

From dholmes at openjdk.java.net Tue Nov 10 03:32:01 2020
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 10 Nov 2020 03:32:01 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4]
In-Reply-To: <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com>
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com>
Message-ID:

On Tue, 10 Nov 2020 02:35:17 GMT, Daniel D. Daugherty wrote:

>> Changes from @fisk and @dcubed-ojdk to:
>>
>> - simplify ObjectMonitor list management
>> - get rid of Type-Stable Memory (TSM)
>>
>> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions.
>> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint,
>> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano)
>> - a few minor regressions (<= -0.24%)
>> - Volano is 6.8% better
>>
>> Eric C. has also done promotion perf runs on these bits and says "the results look fine".
>
> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision:
>
> dholmes-ora - convert inner while loop to do-while loop in unlink_deflated().

Marked as reviewed by dholmes (Reviewer).
-------------

PR: https://git.openjdk.java.net/jdk/pull/642

From shade at openjdk.java.net Tue Nov 10 06:32:57 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 10 Nov 2020 06:32:57 GMT
Subject: Integrated: 8255782: Turn UseTLAB and ResizeTLAB from product_pd to product, defaulting to "true"
In-Reply-To:
References:
Message-ID:

On Mon, 2 Nov 2020 19:28:29 GMT, Aleksey Shipilev wrote:

> When doing Zero VM performance investigations, I realized that `UseTLAB` is disabled there by default.
>
> That is effectively because `UseTLAB` is currently `product_pd` (defined in `gc_globals.hpp`), and it is enabled for every platform with C1 and C2 ports (in their respective `c1/c2_globals.hpp`). `compiler_globals.hpp` has a block that defines `UseTLAB` to `false` when no C1/C2/JVMCI is present.
>
> Not only is this awkward -- GC flag is managed by Compiler globals! -- it also makes it awkward for Zero to opt in to `UseTLAB` in `*_zero_globals.hpp`, because `compiler_globals.hpp` already defines it. I think we can make this all better by turning TLAB flags from `product_pd` to `product`, and defaulting them to `true`. This matches what every current Server/Minimal VM config has, and would implicitly enable `UseTLAB` and `ResizeTLAB` for Zero, as well as for builds with `--with-jvm-features=-compiler1,-compiler-2,-jvmci` (in case anyone actually builds it, that is only with template interpreter).
>
> On the downside, this shuts the door for new platform ports to disable TLAB flags by default to ease porting. But I believe the same can be achieved by turning these flags off in `arguments.cpp` under the special platform defines, while TLAB enablement work is in progress.
>
> Additional testing:
> - [x] Linux x86_64 Zero ad-hoc runs
> - [x] Linux x86_64 `--with-jvm-features=-compiler1,-compiler-2,-jvmci,*` builds

This pull request has now been integrated.

Changeset: 4bc065cf
Author: Aleksey Shipilev
URL: https://git.openjdk.java.net/jdk/commit/4bc065cf
Stats: 25 lines in 12 files changed: 1 ins; 22 del; 2 mod

8255782: Turn UseTLAB and ResizeTLAB from product_pd to product, defaulting to "true"

Reviewed-by: stuefe, stefank, tschatzl

-------------

PR: https://git.openjdk.java.net/jdk/pull/1019

From shade at openjdk.java.net Tue Nov 10 07:22:07 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 10 Nov 2020 07:22:07 GMT
Subject: RFR: 8255822: Zero: improve build-time JVMTI handling [v3]
In-Reply-To: <7-Y8_X8c04i5dRiPOWnSV-Wqige6jXxb1WPW1pIO7CE=.f66482a9-f624-4897-a68f-1e131a6abf8f@github.com>
References: <7-Y8_X8c04i5dRiPOWnSV-Wqige6jXxb1WPW1pIO7CE=.f66482a9-f624-4897-a68f-1e131a6abf8f@github.com>
Message-ID:

> Current Zero interpreter has the optimization for JVMTI support. It recognizes that JVMTI is disabled most of the time, and that JVMTI checks in the interpreter code slow it down considerably. (I measured it myself when working on this patch: removing this optimization yields about a 20% hit in build times).
>
> Current optimization works as follows. At build time, an XSLT transform is performed on `bytecodeInterpreter.cpp`, yielding `bytecodeInterpreterWithChecks.cpp`. In that new compilation unit, `VM_JVMTI` macro is defined, and a new entry point -- `BytecodeInterpreter::withChecks` -- is defined. Then, both compilation units are compiled. In one of them, `JVMTI` hooks are stripped out. In another, they persist. Then, callers have to choose which entry point to use.
>
> I believe this can be rewritten to use C++ templates instead of the XSLT and defines dance. This also allows cleaning up JVMTI checks a bit.
>
> Additional testing:
> - [x] Linux x86_64 Zero fastdebug build with `-jvmti`
> - [x] Linux x86_64 Zero fastdebug/release build times are not regressing

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
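
As a rough illustration of the "templates instead of a second generated compilation unit" idea in the description above, here is a simplified, self-contained C++ sketch. It is not the Zero interpreter code: `Counters`, `run` and `run_dispatch` are made-up names for this example, and the only thing taken from the thread is the assumption that a compile-time boolean takes over the role of the `VM_JVMTI` macro, so that one source file yields both the checked and the unchecked entry point.

```cpp
#include <cassert>

// Hypothetical bookkeeping so the effect of each instantiation is observable.
struct Counters {
  int executed = 0;
  int jvmti_checks = 0;
};

// One source, two instantiations. JvmtiEnabled is a compile-time constant,
// so a compiler can drop the check entirely from the run<false> version.
template <bool JvmtiEnabled>
static void run(int bytecodes, Counters& c) {
  for (int i = 0; i < bytecodes; i++) {
    if (JvmtiEnabled) {
      c.jvmti_checks++;  // stands in for a JVMTI hook on the hot path
    }
    c.executed++;        // stands in for executing one bytecode
  }
}

// Callers pick the entry point once, at run time, outside the hot loop.
static void run_dispatch(bool jvmti_active, int bytecodes, Counters& c) {
  if (jvmti_active) {
    run<true>(bytecodes, c);
  } else {
    run<false>(bytecodes, c);
  }
}
```

Calling `run_dispatch(false, 5, c)` leaves `c.jvmti_checks` at zero, while `run_dispatch(true, 5, c)` counts one check per bytecode.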
The pull request now contains six commits: - Simplify a few JVMTI_ENABLED blocks - Merge branch 'master' into JDK-8255822-zero-jvmti-rework - Fix build error - Merge branch 'master' into JDK-8255822-zero-jvmti-rework - Revert one dubious change - 8255822: Zero: improve build-time JVMTI handling Summary: use C++ templates instead of XSLT transforms ------------- Changes: https://git.openjdk.java.net/jdk/pull/1061/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1061&range=02 Stats: 188 lines in 6 files changed: 5 ins; 128 del; 55 mod Patch: https://git.openjdk.java.net/jdk/pull/1061.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1061/head:pull/1061 PR: https://git.openjdk.java.net/jdk/pull/1061 From aph at openjdk.java.net Tue Nov 10 08:41:00 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 10 Nov 2020 08:41:00 GMT Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v5] In-Reply-To: References: Message-ID: <1VK0__nxU327zmURyR3mVD402EmGZPIuzyjMXPP0Wyw=.48883f69-758d-4e69-8e19-363cb0c06177@github.com> On Tue, 3 Nov 2020 11:57:16 GMT, Dong Bo wrote: >> Base64.encodeBlock stub is implemented for x86_64. >> We can also do the same thing for aarch64 with SIMD LD3/ST4/TBL. >> A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords. >> >> Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build. >> Tests in test/jdk/java/util/Base64/* runned specially for the correctness of the implementation and passed. >> >> A JMH micro, Base64Encode.java, is added for performance test. >> With different input length (upper-bounded by parameter `maxNumBytes` in the JMH micro), >> we witness ~4x improvement with long input and no regression with short input on Kunpeng916 and Kunpeng920. 
>> >> The Base64Encode.java JMH micro-benchmark results: >> Benchmark (maxNumBytes) Mode Cnt Score Error Units >> # kunpeng 916, intrinsic >> Base64Encode.testBase64Encode 1 avgt 10 31.564 ? 0.034 ns/op >> Base64Encode.testBase64Encode 2 avgt 10 33.921 ? 0.362 ns/op >> Base64Encode.testBase64Encode 3 avgt 10 38.015 ? 0.220 ns/op >> Base64Encode.testBase64Encode 6 avgt 10 41.115 ? 0.281 ns/op >> Base64Encode.testBase64Encode 7 avgt 10 42.161 ? 0.630 ns/op >> Base64Encode.testBase64Encode 9 avgt 10 44.797 ? 0.849 ns/op >> Base64Encode.testBase64Encode 10 avgt 10 46.013 ? 0.917 ns/op >> Base64Encode.testBase64Encode 48 avgt 10 67.984 ? 0.777 ns/op >> Base64Encode.testBase64Encode 512 avgt 10 174.494 ? 1.614 ns/op >> Base64Encode.testBase64Encode 1000 avgt 10 277.103 ? 0.306 ns/op >> Base64Encode.testBase64Encode 20000 avgt 10 4261.018 ? 1.883 ns/op >> >> # kunpeng 916, default >> Base64Encode.testBase64Encode 1 avgt 10 31.710 ? 0.234 ns/op >> Base64Encode.testBase64Encode 2 avgt 10 33.978 ? 0.305 ns/op >> Base64Encode.testBase64Encode 3 avgt 10 40.059 ? 0.444 ns/op >> Base64Encode.testBase64Encode 6 avgt 10 47.958 ? 0.328 ns/op >> Base64Encode.testBase64Encode 7 avgt 10 49.017 ? 1.305 ns/op >> Base64Encode.testBase64Encode 9 avgt 10 53.150 ? 0.769 ns/op >> Base64Encode.testBase64Encode 10 avgt 10 55.418 ? 0.316 ns/op >> Base64Encode.testBase64Encode 48 avgt 10 93.517 ? 0.391 ns/op >> Base64Encode.testBase64Encode 512 avgt 10 494.809 ? 0.413 ns/op >> Base64Encode.testBase64Encode 1000 avgt 10 898.581 ? 0.944 ns/op >> Base64Encode.testBase64Encode 20000 avgt 10 16464.411 ? 7.582 ns/op >> >> # kunpeng 920, intrinsic >> Base64Encode.testBase64Encode 1 avgt 10 17.494 ? 0.012 ns/op >> Base64Encode.testBase64Encode 2 avgt 10 21.023 ? 0.169 ns/op >> Base64Encode.testBase64Encode 3 avgt 10 25.772 ? 0.138 ns/op >> Base64Encode.testBase64Encode 6 avgt 10 30.121 ? 0.347 ns/op >> Base64Encode.testBase64Encode 7 avgt 10 31.591 ? 
0.238 ns/op >> Base64Encode.testBase64Encode 9 avgt 10 32.728 ? 0.395 ns/op >> Base64Encode.testBase64Encode 10 avgt 10 35.110 ? 0.215 ns/op >> Base64Encode.testBase64Encode 48 avgt 10 48.621 ? 0.314 ns/op >> Base64Encode.testBase64Encode 512 avgt 10 113.391 ? 0.554 ns/op >> Base64Encode.testBase64Encode 1000 avgt 10 180.749 ? 0.193 ns/op >> Base64Encode.testBase64Encode 20000 avgt 10 3273.961 ? 5.706 ns/op >> >> # kunpeng 920, default >> Base64Encode.testBase64Encode 1 avgt 10 17.428 ? 0.037 ns/op >> Base64Encode.testBase64Encode 2 avgt 10 20.926 ? 0.155 ns/op >> Base64Encode.testBase64Encode 3 avgt 10 25.466 ? 0.140 ns/op >> Base64Encode.testBase64Encode 6 avgt 10 32.526 ? 0.190 ns/op >> Base64Encode.testBase64Encode 7 avgt 10 34.132 ? 0.387 ns/op >> Base64Encode.testBase64Encode 9 avgt 10 36.685 ? 0.212 ns/op >> Base64Encode.testBase64Encode 10 avgt 10 38.117 ? 0.246 ns/op >> Base64Encode.testBase64Encode 48 avgt 10 62.447 ? 0.900 ns/op >> Base64Encode.testBase64Encode 512 avgt 10 377.275 ? 0.162 ns/op >> Base64Encode.testBase64Encode 1000 avgt 10 700.628 ? 0.509 ns/op >> Base64Encode.testBase64Encode 20000 avgt 10 13626.764 ? 3.448 ns/op > > Dong Bo has updated the pull request incrementally with one additional commit since the last revision: > > use r6/r7 instead of scratch registers Thanks. I'm sorry that this took so long. ------------- Marked as reviewed by aph (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/992 From aph at openjdk.java.net Tue Nov 10 08:41:01 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 10 Nov 2020 08:41:01 GMT Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v4] In-Reply-To: References: <_p5rQggkG8fBak9RrEDnoOzRnl8hsezlFjNmN_MFRiU=.c4167eaf-79f4-45eb-9fa9-2d0613274392@github.com> Message-ID: On Tue, 3 Nov 2020 17:00:27 GMT, Andrew Haley wrote: >> Done, I didn't realize this would be a problem at all. >> Thanks for the clarification. 
>
> OK, thanks, I need to write some of this stuff down as guidance. Aliases for register names are always risky, but for the scratch registers doubly so.

Oh, and please make up your mind whether to use "r6" or "c_rarg6". I know they're the same really. With that, this patch is good.

-------------

PR: https://git.openjdk.java.net/jdk/pull/992

From dongbo at openjdk.java.net Tue Nov 10 09:10:02 2020
From: dongbo at openjdk.java.net (Dong Bo)
Date: Tue, 10 Nov 2020 09:10:02 GMT
Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v4]
In-Reply-To:
References: <_p5rQggkG8fBak9RrEDnoOzRnl8hsezlFjNmN_MFRiU=.c4167eaf-79f4-45eb-9fa9-2d0613274392@github.com>
Message-ID:

On Wed, 4 Nov 2020 01:43:22 GMT, Dong Bo wrote:

>> Oh, and please make up your mind whether to use "r6" or "c_rarg6". I know they're the same really. With that, this patch is good.
>
> That's great! I think we will walk into far fewer detours if there is guidance that contains empirical coding rules.
>
> For this patch, if we do not need further modifications, could you please press the approval button? Thanks.

Agree with `c_rarg6`, it's easier for readers to understand.
In the newest version, I have replaced these with c_rarg6/7, that is:

    // c_rarg6 and c_rarg7 are free to use as temps
    Register codec = c_rarg6;
    Register length = c_rarg7;

Are we good to go?

-------------

PR: https://git.openjdk.java.net/jdk/pull/992

From rkennke at openjdk.java.net Tue Nov 10 09:32:09 2020
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Tue, 10 Nov 2020 09:32:09 GMT
Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v3]
In-Reply-To:
References:
Message-ID:

> In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong.
Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. > > Testing: hotspot_gc_shenandoah, tier1 & tier2 with -XX:+UseShenandoahGC Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Ensure correct strength and width of LRB runtime calls - Merge branch 'master' into JDK-8256011 - Merge branch 'master' into JDK-8256011 - Simplify condition, don't resurrect finalizably reachable objects even on phantom access - 8256011: Shenandoah: Don't resurrect finalizably reachable objects ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1109/files - new: https://git.openjdk.java.net/jdk/pull/1109/files/189be782..8d979eea Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=01-02 Stats: 47771 lines in 313 files changed: 26190 ins; 14261 del; 7320 mod Patch: https://git.openjdk.java.net/jdk/pull/1109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1109/head:pull/1109 PR: https://git.openjdk.java.net/jdk/pull/1109 From shade at openjdk.java.net Tue Nov 10 09:53:05 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 10 Nov 2020 09:53:05 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v3] In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 09:32:09 GMT, Roman Kennke wrote: >> In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. 
>> >> Testing: hotspot_gc_shenandoah, tier1 & tier2 with -XX:+UseShenandoahGC > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Ensure correct strength and width of LRB runtime calls > - Merge branch 'master' into JDK-8256011 > - Merge branch 'master' into JDK-8256011 > - Simplify condition, don't resurrect finalizably reachable objects even on phantom access > - 8256011: Shenandoah: Don't resurrect finalizably reachable objects I have many questions :) src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp line 863: > 861: __ call(RuntimeAddress(bs->load_reference_barrier_weak_rt_code_blob()->code_begin())); > 862: } else { > 863: assert(is_phantom, "only remaining strenght"); Typo: "strenght" src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp line 944: > 942: if (UseCompressedOops) { > 943: __ call_VM_leaf(CAST_FROM_FN_PTR(address, ShenandoahRuntime::load_reference_barrier_strong_narrow), c_rarg0, > 944: c_rarg1); Stick with newline style? Is `c_rarg1` on new line or not in this method? src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp line 1063: > 1061: Node* in2 = n->in(2); > 1062: > 1063: // If one input is NULL, then step over the barriers normal LRB barriers on the other input "strong LRBs barriers" now... :) src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp line 984: > 982: calladdr = CAST_FROM_FN_PTR(address, ShenandoahRuntime::load_reference_barrier_strong); > 983: } > 984: name = "load_reference_barrier_strong"; Why not separate "load_reference_barrier_strong" and "load_reference_barrier_strong_narrow"? 
src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp line 2910: > 2908: > 2909: ShenandoahLoadReferenceBarrierNode::ShenandoahLoadReferenceBarrierNode(Node* ctrl, Node* obj, DecoratorSet decorators) > 2910: : Node(ctrl, obj), _decorators(decorators & (ON_STRONG_OOP_REF | ON_WEAK_OOP_REF | ON_PHANTOM_OOP_REF | ON_UNKNOWN_OOP_REF | IN_NATIVE)) { Do we really want to strip bits from decorator here? Aren't we testing specific bits anyway downstream? I can imagine some downstream code in the future would be surprised not to see some bits that are actually set, but stripped here? Is this for `LRBNode::hash/cmp` stability? It should be stripped in `hash/cmp` then. src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.hpp line 58: > 56: > 57: static bool is_strong_access(DecoratorSet decorators) { > 58: return (decorators & (ON_WEAK_OOP_REF | ON_PHANTOM_OOP_REF | ON_UNKNOWN_OOP_REF)) == 0; Is this the same as `decorators & ON_STRONG_OOP_REF`? src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 105: > 103: inline oop ShenandoahBarrierSet::load_reference_barrier(oop obj, T* load_addr) { > 104: > 105: // Prevent resurrection of unreachable non-strorg references. A chance to fix typo "non-strorg" here. src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.hpp line 62: > 60: > 61: static bool is_weak_access(DecoratorSet decorators) { > 62: return (decorators & (ON_WEAK_OOP_REF | ON_UNKNOWN_OOP_REF)) != 0; Um. Shouldn't `ON_UNKNOWN_OOP_REF` default to strong? I.e. the access via Unsafe should probably resurrect the object, otherwise it corrupts the heap? src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 108: > 106: if (!HasDecorator::value && obj != NULL && > 107: _heap->is_concurrent_weak_root_in_progress() && > 108: !(HasDecorator::value ? _heap->marking_context()->is_marked(obj) Maybe split out boolean locals for decorator tests? A matter of style, your call. 
src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp line 1066: > 1064: if (in1->bottom_type() == TypePtr::NULL_PTR && > 1065: !((in2->Opcode() == Op_ShenandoahLoadReferenceBarrier) && > 1066: (((ShenandoahLoadReferenceBarrierNode*)in2)->decorators() & ON_STRONG_OOP_REF) == 0)) { Is this `ShenandoahBarrierSet::is_strong_access`? ------------- Changes requested by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1109 From tschatzl at openjdk.java.net Tue Nov 10 09:58:03 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Nov 2020 09:58:03 GMT Subject: RFR: 8253600: G1: Fully support pinned regions for full gc [v4] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 18:07:48 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> sjohanss review 2 > > src/hotspot/share/gc/g1/g1FullGCHeapRegionAttr.hpp line 58: > >> 56: bool is_pinned_or_closed(HeapWord* obj) const { >> 57: assert(!is_invalid(obj), "not initialized yet"); >> 58: return get_by_address(obj) >= Pinned; > > Better have a static assert that `ClosedArchive > Pinned`. Okay, will do. > src/hotspot/share/gc/g1/g1FullGCHeapRegionAttr.hpp line 37: > >> 35: static const uint8_t ClosedArchive = 2; >> 36: >> 37: static const uint8_t Invalid = 255; > > Why use 255 as the default value? Prior to this PR, the default value is 0. I think it's best to keep it intact. (Probably, the compiler can optimize `clear()` better if each element is reset to 0.) I do not understand what "Prior to this PR the default value is 0" means, can you clarify? The default value for the biased arrays must always be defined by the child classes overriding default_value() so there does not seem to be a pre-defined "prior" value. 
Other than that, the reason for using 255 is easier discrimination from the valid values in the debugger, just like hotspot uses something like 0xdeadbeef for various kinds of invalid memory.

There is no difference for the compiler between 0 and 255, except maybe for the part about initializing the default value (on probably older x86 processors). However, the overhead of setting up an efficient byte-wise `clear()` method is much larger than, in this case, the difference between `xor reg, reg` and `movsx reg, -1` (e.g. for x86, the former is or has been a common trick to quickly clear registers). That pales in comparison to the actual work.

Some architectures (ppc, sparc, iirc likely some more) have a special "clear cache line" instruction which may be used to clear memory "quickly" to zero, but it is not used since such instructions often come with large caveats.

Otherwise, the clearing of this small table (one byte per region) is a one-time cost at the start of a full gc operation that may take billions of cpu cycles. It's not worth thinking about the overhead of the clearing in this case imho.

> src/hotspot/share/gc/g1/g1FullCollector.hpp line 76:
>
>> 74:   ReferenceProcessorSubjectToDiscoveryMutator _is_subject_mutator;
>> 75:
>> 76:   G1FullGCHeapRegionAttr _region_attr_table;
>
> I don't really see the point of this auxiliary data structure. Why can't we just query the underlying region for its type, pinned, open/close archive?

The only reason is performance. Consider memory access with that auxiliary data structure: it is a single dereference/memory load on a very dense data structure. Directly querying the HeapRegion* is two dependent dereferences (first getting the HeapRegion*, then to the HeapRegion, then accessing the member) on a fairly large data structure (HeapRegion).
Since these loads are done for every reference (millions of times), the second is much slower, at best filling the cache with useless data, as (roughly) a HeapRegion is around a cache line in size (iirc).

Further, with the current encoding of the values in the region attribute table, the closed-or-pinned check can be done with a single check instead of some disjunction (i.e. region->is_pinned() || region->is_closed(), although it is true that closed regions are always pinned), so saving even more code (and branch locations).

-------------

PR: https://git.openjdk.java.net/jdk/pull/824

From tschatzl at openjdk.java.net Tue Nov 10 09:59:55 2020
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Tue, 10 Nov 2020 09:59:55 GMT
Subject: RFR: 8252588: HotSpot Style Guide should permit uniform initialization
In-Reply-To:
References:
Message-ID:

On Mon, 9 Nov 2020 11:05:35 GMT, Kim Barrett wrote:

> Please review and vote on this change to the HotSpot Style Guide to
> permit the use of uniform initialization, aka brace initialization, in
> HotSpot code. Uniform initialization is a feature added in C++11.
>
> This is a modification of the Style Guide, so rough consensus among
> the HotSpot Group members is required to make this change. Only Group
> members should vote for approval (via the github PR), though reasoned
> objections or comments from anyone will be considered. A decision to
> approve will not be made before Monday 16-Nov-2020 at 12h00 UTC.
>
> [Note: This is the first attempt to change the Style Guide since the
> revision that added a description for a change process (requires rough
> consensus of the Group), and also since the start of using git and
> github PRs. I'm making a guess at how to instantiate that process
> within the new mechanisms.]

Marked as reviewed by tschatzl (Reviewer).
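
For readers unfamiliar with the feature under vote, the following is a minimal sketch of C++11 uniform (brace) initialization. It is an illustration only: the `Counters` type and the function names are hypothetical and come neither from HotSpot nor from the proposed Style Guide text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical aggregate used only for illustration; not a HotSpot type.
struct Counters {
  std::uint32_t hits;
  std::uint32_t misses;
};

// Aggregates can be brace-initialized member by member.
inline Counters make_counters(std::uint32_t hits, std::uint32_t misses) {
  return Counters{hits, misses};
}

// Arrays and scalars use the same syntax; empty braces value-initialize.
inline int sum_first_primes() {
  int primes[]{2, 3, 5, 7};
  int sum{};  // value-initialized to 0
  for (int p : primes) {
    sum += p;
  }
  return sum;
}

// Narrowing conversions are ill-formed inside braces, for example:
//   std::uint8_t small{300};  // rejected by the compiler
```

One practical effect of the brace form is that accidental narrowing, which `=` or parenthesized initialization would silently accept, becomes a compile-time error.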
------------- PR: https://git.openjdk.java.net/jdk/pull/1119 From tschatzl at openjdk.java.net Tue Nov 10 10:17:11 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Nov 2020 10:17:11 GMT Subject: RFR: 8253600: G1: Fully support pinned regions for full gc [v5] In-Reply-To: References: Message-ID: <5pZEcUBz7l9O7nN2kEbLOp6SIS8WjPrOnokK0aNgJZE=.272c196a-fe3c-480b-9b80-412a58d85830@github.com> > Hi all, > > can I get reviews for this change that implements "proper" support for pinned regions in the G1 full collector? > > By proper I mean that at the end of gc, pinned regions contain the correct TAMS and bitmap markings under the TAMS so that dead objects within them are supported? > > Currently all (pinned) regions have their TAMS set to bottom() and their bitmap above TAMS cleared (at least logically :) ). This works as long objects within these regions can't be dead as it is the case now: > - humongous regions are either live or fully reclaimed. > - all other pinned regions are archive regions at the moment that are always treated as fully live (and do not contain dead objects). > > This change is a requirement for fixing JDK-8253081 as some earlier change made it possible to have dead objects within open archive regions. It also enables supporting removal of gclocker use for g1, i.e. using region pinning. > > Based on the PR#808 (https://github.com/openjdk/jdk/pull/808). > > Testing: tier1-8, testing with prototype for region pinning, testing with prototype for JDK-8253081. > Performance testing: no regressions > > Some comments for questions that might come up during review: > > - how does this work with the bitmaps now: > - at start of full gc the next bitmap is cleared > - full gc marks the next bitmap > - for all pinned regions, keep TAMS and top() (*), otherwise set TAMS to bottom > - swap bitmaps > - clear next bitmap for next marking > > (*) this means that from a usage POV pinned regions are considered full. 
This is inaccurate, but sufficient: full gc clears all remembered sets anyway, so we do not need that information for gc efficiency purposes anyway to evacuate later. The next marking before old gen evacuation will update it to the correct values anyway. G1 does not support allocation into "holes" in pinned regions that can be open archive only at this time too, so there is no need to be more exact. > > - use of a region attribute table for phase 2+ only: compared to before we need fast access to information whether a given reference goes into a pinned region (as opposed to an archive region) wrt to adjusting that pointer to avoid doing work for these references. > > Phase 1 marking could have used this information for the do-we-need-to-preserve-the-mark check too: however this would have required g1 to add an extra another pass over all regions to update that. This seemed slower than just checking this information "more slowly" for the objects that need mark preservation. Tests showed that this is the case for <0.00% (yeah, these references that need mark preservation are rounding errors in cases it matters) of overall references, so I did not add that pass. > (Additionally g1 full gc is a last-ditch effort, and while marking takes a significant time, it does not completely dominate it). > > I.e. the second clause in the condition of this hunk is intentionally slower than could be: > @@ -52,7 +52,9 @@ inline bool G1FullGCMarker::mark_object(oop obj) { > // Marked by us, preserve if needed. > markWord mark = obj->mark(); > if (obj->mark_must_be_preserved(mark) && > // It is not necessary to preserve marks for objects in pinned regions because > // we do not change their headers (i.e. forward them). > !G1CollectedHeap::heap()->heap_region_containing(obj)->is_pinned()) { > preserved_stack()->push(obj, mark); > } > - there is no code yet that checks for empty pinned regions yet. 
Only JDK-8253081 introduces that because still all contents of all archive regions are live forever. > > Also please note that the 51b297b change is from the #808 change. > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - Merge branch 'master' into 8253600-full-gc-pinned-region-support - ayang review - sjohanss review 2 - Merge branch 'master' into 8253600-full-gc-pinned-region-support - Merge branch 'master' into 8253600-full-gc-pinned-region-support - sjohanss review Also remove _archive_allocator_map et al as the new attribute table implements the same functionality also suggested by sjohanss in private. - Initial import - Initial import ------------- Changes: https://git.openjdk.java.net/jdk/pull/824/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=824&range=04 Stats: 565 lines in 31 files changed: 270 ins; 199 del; 96 mod Patch: https://git.openjdk.java.net/jdk/pull/824.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/824/head:pull/824 PR: https://git.openjdk.java.net/jdk/pull/824 From rkennke at openjdk.java.net Tue Nov 10 10:23:03 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 10 Nov 2020 10:23:03 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v3] In-Reply-To: References: Message-ID: <1MKAddYmmZwDvkKF8UxfITgxVfm0o2Tcbg4ZotOHXZk=.fc4a354d-668f-4d8d-8da3-1277a08bca4e@github.com> On Tue, 10 Nov 2020 09:41:52 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: >> >> - Ensure correct strength and width of LRB runtime calls >> - Merge branch 'master' into JDK-8256011 >> - Merge branch 'master' into JDK-8256011 >> - Simplify condition, don't resurrect finalizably reachable objects even on phantom access >> - 8256011: Shenandoah: Don't resurrect finalizably reachable objects > > src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp line 984: > >> 982: calladdr = CAST_FROM_FN_PTR(address, ShenandoahRuntime::load_reference_barrier_strong); >> 983: } >> 984: name = "load_reference_barrier_strong"; > > Why not separate "load_reference_barrier_strong" and "load_reference_barrier_strong_narrow"? Dunno. Could do that. Within one instance of JVM, we would never see both narrow/non-narrow versions, though. > src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp line 2910: > >> 2908: >> 2909: ShenandoahLoadReferenceBarrierNode::ShenandoahLoadReferenceBarrierNode(Node* ctrl, Node* obj, DecoratorSet decorators) >> 2910: : Node(ctrl, obj), _decorators(decorators & (ON_STRONG_OOP_REF | ON_WEAK_OOP_REF | ON_PHANTOM_OOP_REF | ON_UNKNOWN_OOP_REF | IN_NATIVE)) { > > Do we really want to strip bits from decorator here? Aren't we testing specific bits anyway downstream? I can imagine some downstream code in the future would be surprised not to see some bits that are actually set, but stripped here? > > Is this for `LRBNode::hash/cmp` stability? It should be stripped in `hash/cmp` then. Yes, this is for hash/cmp. In particular, I suspect we might miss optimizations that don't find two LRBs equal only because some non-relevant access-bit is set. Will move it to hash/cmp then. > src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.hpp line 58: > >> 56: >> 57: static bool is_strong_access(DecoratorSet decorators) { >> 58: return (decorators & (ON_WEAK_OOP_REF | ON_PHANTOM_OOP_REF | ON_UNKNOWN_OOP_REF)) == 0; > > Is this the same as `decorators & ON_STRONG_OOP_REF`? 
It is the same, but in my testing I found that ON_STRONG_OOP_REF is not always set (it is the default when no other ON_XYZ_OOP_REF is specified).

> src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.hpp line 62:
>
>> 60:
>> 61:   static bool is_weak_access(DecoratorSet decorators) {
>> 62:     return (decorators & (ON_WEAK_OOP_REF | ON_UNKNOWN_OOP_REF)) != 0;
>
> Um. Shouldn't `ON_UNKNOWN_OOP_REF` default to strong? I.e. the access via Unsafe should probably resurrect the object, otherwise it corrupts the heap?

No. ON_UNKNOWN_OOP_REF *might* access the referent field by reflection. We must do the correct thing in this situation. Normally, ON_UNKNOWN_OOP_REF is indeed used on normal/strong references; however, we do the correct thing in this case too, because the referent would be marked and thus go through the regular LRB. It doesn't hurt to call into weak for unknown oop refs.

> src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 108:
>
>> 106:   if (!HasDecorator::value && obj != NULL &&
>> 107:       _heap->is_concurrent_weak_root_in_progress() &&
>> 108:       !(HasDecorator::value ? _heap->marking_context()->is_marked(obj)
>
> Maybe split out boolean locals for decorator tests? A matter of style, your call.

Hmm yeah. Actually I am thinking about how to do the HasDecorator tests as the outermost tests, because those are the compile-time tests. Would need to duplicate the whole test once for WEAK/UNKNOWN and once for PHANTOM I guess.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1109

From sjohanss at openjdk.java.net Tue Nov 10 10:56:25 2020
From: sjohanss at openjdk.java.net (Stefan Johansson)
Date: Tue, 10 Nov 2020 10:56:25 GMT
Subject: RFR: 8236926: Concurrently uncommit memory in G1
Message-ID:

Please review this change that implements concurrent uncommit for G1.

**Summary**
G1 currently checks if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC.
The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. The calculation of how much to shrink the heap, and when, is not changed, but during the pause only quick preparation work is done.

Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap keeps track of the regions available for use (active) and one bitmap tracks the regions ready to be uncommitted (inactive). The union of those two bitmaps is the set of regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS).

Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added.

One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization.

**Logging**
To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`:

[7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873
...
[7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms
[7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms
...
[9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms

The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated.

On `gc+heap+region` there are new logs to see how ranges of regions transition between different states:

[6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024)
[6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280)
[6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536)
[6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792)
[6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651)
[6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96)
[6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099)

**Testing**
Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are no significant changes.
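
To make the active/inactive bookkeeping described in the summary concrete, here is a minimal sketch of the state transitions. It is an assumption-laden illustration, not the code in the patch: the class name `CommitStateSketch` and all its methods are hypothetical, `std::vector<bool>` stands in for HotSpot's bitmap types, and the locking that the patch does with `Uncommit_lock` is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of the two-bitmap scheme; not the patch's code.
class CommitStateSketch {
  std::vector<bool> _active;    // regions available for use
  std::vector<bool> _inactive;  // committed regions ready to be uncommitted

public:
  explicit CommitStateSketch(std::size_t num_regions)
    : _active(num_regions, false), _inactive(num_regions, false) {}

  // Commit fresh memory for a region (a real implementation calls the OS).
  bool commit_and_activate(std::size_t i) {
    if (is_committed(i)) return false;
    _active[i] = true;
    return true;
  }

  // Done in the pause: only flip bits, no OS call.
  bool deactivate(std::size_t i) {
    if (!_active[i]) return false;
    _active[i] = false;
    _inactive[i] = true;
    return true;
  }

  // Heap expansion prefers this cheap path over committing new memory.
  bool reactivate(std::size_t i) {
    if (!_inactive[i]) return false;
    _inactive[i] = false;
    _active[i] = true;
    return true;
  }

  // Done later by the concurrent task: hand the memory back.
  bool uncommit(std::size_t i) {
    if (!_inactive[i]) return false;
    _inactive[i] = false;
    return true;
  }

  bool is_active(std::size_t i) const { return _active[i]; }

  // A region is committed if it is in either bitmap (the union).
  bool is_committed(std::size_t i) const { return _active[i] || _inactive[i]; }

  std::size_t committed_count() const {
    std::size_t n = 0;
    for (std::size_t i = 0; i < _active.size(); i++) {
      if (is_committed(i)) n++;
    }
    return n;
  }
};
```

In this sketch a region that was deactivated and then reactivated before the task ran is never uncommitted, which mirrors why the summary log line can report fewer regions than were initially made ready for uncommit.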
------------- Commit messages: - Self review - Simplified task - Improved logging - Test improvement - Uncommit task - Move HeapRegionRange constructor - Stress Uncommit - Feedback from dev-meeting - Initial patch for concurrent uncommit Changes: https://git.openjdk.java.net/jdk/pull/1141/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8236926 Stats: 1250 lines in 26 files changed: 1113 ins; 85 del; 52 mod Patch: https://git.openjdk.java.net/jdk/pull/1141.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1141/head:pull/1141 PR: https://git.openjdk.java.net/jdk/pull/1141 From alanb at openjdk.java.net Tue Nov 10 11:03:55 2020 From: alanb at openjdk.java.net (Alan Bateman) Date: Tue, 10 Nov 2020 11:03:55 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v26] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 16:07:13 GMT, Maurizio Cimadamore wrote: >> This patch contains the changes associated with the third incubation round of the foreign memory access API incubation (see JEP 393 [1]). This iteration focus on improving the usability of the API in 3 main ways: >> >> * first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from multiple threads >> * second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee that the memory will be deallocated, eventually >> * third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class has been added, which defines several useful dereference routines; these are really just thin wrappers around memory access var handles, but they make the barrier of entry for using this API somewhat lower. 
>> >> A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit of dereference. >> >> This has all changed as per this API refresh; now a `MemoryAddress` is just a dumb carrier which wraps a pair of object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is concerned. You cannot dereference memory if you don't have a segment. This improves usability in a number of ways - first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that; otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling `MemoryAddress::asSegmentRestricted`). >> >> A list of the API, implementation and test changes is provided below. If you have any questions, or need more detailed explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide the feedback required. >> >> A big thank to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segment would not have been possible; also I'd like to thank Paul Sandoz, whose insights on API design have been very helpful in this journey. 
>> >> Thanks >> Maurizio >> >> Javadoc: >> >> http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html >> >> Specdiff: >> >> http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html >> >> CSR: >> >> https://bugs.openjdk.java.net/browse/JDK-8254163 >> >> >> >> ### API Changes >> >> * `MemorySegment` >> * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below) >> * added a no-arg factory for a native restricted segment representing entire native heap >> * rename `withOwnerThread` to `handoff` >> * add new `share` method, to create shared segments >> * add new `registerCleaner` method, to register a segment against a cleaner >> * add more helpers to create arrays from a segment e.g. `toIntArray` >> * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors) >> * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`) >> * `MemoryAddress` >> * drop `segment` accessor >> * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative to a given segment >> * `MemoryAccess` >> * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a carrier is one of the usual suspect (a Java primitive, minus `boolean`); the access mode can be simple (e.g. access base address of given segment), or indexed, in which case the accessor takes a segment and either a low-level byte offset,or a high level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs. `getByteAtOffset` vs `getByteAtIndex`). 
>> * `MemoryHandles` >> * drop `withOffset` combinator >> * drop `withStride` combinator >> * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which it is easy to derive all the other handles using plain var handle combinators. >> * `Addressable` >> * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2] to add more implementations. Clients can largely ignore this interface, which comes in really handy when defining native bindings with tools like `jextract`. >> * `MemoryLayouts` >> * A new layout, for machine addresses, has been added to the mix. >> >> >> >> ### Implementation changes >> >> There are two main things to discuss here: support for shared segments, and the general simplification of the memory access var handle support. >> >> #### Shared segments >> >> The support for shared segments cuts in pretty deep in the VM. Support for shared segments is notoriously hard to achieve, at least in a way that guarantees optimal access performances. This is caused by the fact that, if a segment is shared, it would be possible for a thread to close it while another is accessing it. >> >> After considering several options (see [3]), we zeroed onto an approach which is inspired by an happy idea that Andrew Haley had (and that he reminded me of at this year OpenJDK committer workshop - thanks!). The idea is that if we could *freeze* the world (e.g. with a GC pause), while a segment is closed, we could then prevent segments from being accessed concurrently to a close operation. For this to work, it is crucial that no GC safepoints can occur between a segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop just right before an unsafe call). 
It also relies on the fact that hotspot/C2 should not be able to propagate loads across safepoints. >> >> Sadly, none of these conditions seems to be valid in the current implementation, so we needed to resort to a bit of creativity. First, we noted that, if we could mark so-called *scoped* methods with an annotation, it would be very simple to check whether a thread was in the middle of a scoped method when we stopped the world for a close operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312 [4]). >> >> The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread, thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds. Unfortunately the machinery for async exceptions is a bit fragile (e.g. not all the code in hotspot checks for async exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that another thread is accessing the segment being closed. >> >> As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a bug in the user code, likely arising from lack of synchronization, and should be treated as such. >> >> In terms of gritty implementation, we needed to centralize memory access routines in a single place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in addition, also provided a liveness check. This way we could mark all these routines with the special `@Scoped` annotation, which tells the VM that something important is going on. >> >> To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`.
This class contains all the main memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the access coordinates, also a scope object, which is tested before access. A reachability fence is also thrown in the mix to make sure that the scope is kept alive during access (which is important when registering segments against cleaners). >> >> Of course, to make memory access safe, memory access var handles, byte buffer var handles, and byte buffer API should use the new `ScopedMemoryAccess` class instead of unsafe, so that a liveness check can be triggered (in case a scope is present). >> >> `ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the handshake completed successfully. >> >> The implementation of `MemoryScope` (now significantly simplified from what we had before), has two implementations, one for confined segments and one for shared segments; the main difference between the two is what happens when the scope is closed; a confined segment sets a boolean flag to false, and returns, whereas a shared segment goes into a `CLOSING` state, then starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive` will still return `true`, while the liveness check upon memory access will fail. >> >> #### Memory access var handles overhaul >> >> The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and `long`, all other access types could be derived from this basic var handle form. >> >> This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g. 
additional offset is injected into a base memory access var handle. >> >> This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators, which previously needed low-level access on the innards of the memory access var handle. All that code is now gone. >> >> #### Test changes >> >> Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the microbenchmarks also needed some tweaks - and some of them were also updated to also test performance in the shared segment case. >> >> [1] - https://openjdk.java.net/jeps/393 >> [2] - https://openjdk.java.net/jeps/389 >> [3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html >> [4] - https://openjdk.java.net/jeps/312 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add more output in TestHandhsake.java Marked as reviewed by alanb (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/548 From sspitsyn at openjdk.java.net Tue Nov 10 11:05:02 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 10 Nov 2020 11:05:02 GMT Subject: RFR: 8253525: Implement getInstanceSize/sizeOf intrinsics [v5] In-Reply-To: References: <_TFbHUvH8zI29hGvpE3TGGNLGgi7PhP7rfed23up13U=.5a19aaed-cc1e-4d18-997d-f37616323e61@github.com> Message-ID: On Mon, 9 Nov 2020 19:02:41 GMT, Aleksey Shipilev wrote: >> Thanks for review, @kvn! I would also like a review from someone from serviceability. > > Friendly reminder. Hi Aleksey, I've not looked at the compiler generated code. The fix looks okay to me. I have a question and a couple of minor suggestions on new test. 
Q: Why the value of ITERS is that big? What is the need to have this number of iterations? Also, I do not like one-letter identifiers, especially if they are not local. Could you, please, replace identifiers R and A with some short versions that give a hint. Something like REFSIZE and ALIGNMENT would be good enough. Also, what tests did you run to make sure no regression is introduced? Thanks, Serguei ------------- PR: https://git.openjdk.java.net/jdk/pull/650 From shade at openjdk.java.net Tue Nov 10 11:06:02 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 10 Nov 2020 11:06:02 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v3] In-Reply-To: <1MKAddYmmZwDvkKF8UxfITgxVfm0o2Tcbg4ZotOHXZk=.fc4a354d-668f-4d8d-8da3-1277a08bca4e@github.com> References: <1MKAddYmmZwDvkKF8UxfITgxVfm0o2Tcbg4ZotOHXZk=.fc4a354d-668f-4d8d-8da3-1277a08bca4e@github.com> Message-ID: <5MmpX2SvrnBppVn3qGhkVt0v3Hrlqv57Y_3X3QTlvso=.471dec62-f0ac-4743-a86a-d911ea448e31@github.com> On Tue, 10 Nov 2020 10:15:22 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp line 984: >> >>> 982: calladdr = CAST_FROM_FN_PTR(address, ShenandoahRuntime::load_reference_barrier_strong); >>> 983: } >>> 984: name = "load_reference_barrier_strong"; >> >> Why not separate "load_reference_barrier_strong" and "load_reference_barrier_strong_narrow"? > > Dunno. Could do that. Within one instance of JVM, we would never see both narrow/non-narrow versions, though. I would be more straight-forward looking, AFAICS. Is there any code that actually matches those strings? >> src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.hpp line 58: >> >>> 56: >>> 57: static bool is_strong_access(DecoratorSet decorators) { >>> 58: return (decorators & (ON_WEAK_OOP_REF | ON_PHANTOM_OOP_REF | ON_UNKNOWN_OOP_REF)) == 0; >> >> Is this the same as `decorators & ON_STRONG_OOP_REF`? 
> > It is the same, but in my testing I found that ON_STRONG_OOP_REF is not always set (it is the default when no other ON_XYZ_OOP_REF is specified). Ouch. That means we should never trust `ON_STRONG_OOP_REF`, and instead go for `SBS::is_strong_access`? ------------- PR: https://git.openjdk.java.net/jdk/pull/1109 From shade at openjdk.java.net Tue Nov 10 11:11:59 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 10 Nov 2020 11:11:59 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v3] In-Reply-To: <1MKAddYmmZwDvkKF8UxfITgxVfm0o2Tcbg4ZotOHXZk=.fc4a354d-668f-4d8d-8da3-1277a08bca4e@github.com> References: <1MKAddYmmZwDvkKF8UxfITgxVfm0o2Tcbg4ZotOHXZk=.fc4a354d-668f-4d8d-8da3-1277a08bca4e@github.com> Message-ID: On Tue, 10 Nov 2020 10:18:57 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.hpp line 62: >> >>> 60: >>> 61: static bool is_weak_access(DecoratorSet decorators) { >>> 62: return (decorators & (ON_WEAK_OOP_REF | ON_UNKNOWN_OOP_REF)) != 0; >> >> Um. Shouldn't `ON_UNKNOWN_OOP_REF` default to strong? I.e. the access via Unsafe should probably resurrect the object, otherwise it corrupts the heap? > > No. ON_UNKNOWN_OOP_REF *might* access the referent field by reflection. We must do the correct thing in this situation. Normally, ON_UNKNOWN_OOP_REF is indeed called on normal/strong references, though, however we do the correct thing in this case too, because the referent would be marked and thus go through regular LRB. It doesn't hurt to call into weak for unknown oop refs. > No. ON_UNKNOWN_OOP_REF _might_ access the referent field by reflection. Ew. This just goes against my intuition about the default strongness. For example, `Unsafe_GetReference` calls with `ON_UNKNOWN_OOP_REF` -- does that mean we would treat that access as weak? I don't think we should. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1109 From rkennke at openjdk.java.net Tue Nov 10 11:19:55 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 10 Nov 2020 11:19:55 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v3] In-Reply-To: <5MmpX2SvrnBppVn3qGhkVt0v3Hrlqv57Y_3X3QTlvso=.471dec62-f0ac-4743-a86a-d911ea448e31@github.com> References: <1MKAddYmmZwDvkKF8UxfITgxVfm0o2Tcbg4ZotOHXZk=.fc4a354d-668f-4d8d-8da3-1277a08bca4e@github.com> <5MmpX2SvrnBppVn3qGhkVt0v3Hrlqv57Y_3X3QTlvso=.471dec62-f0ac-4743-a86a-d911ea448e31@github.com> Message-ID: On Tue, 10 Nov 2020 11:02:11 GMT, Aleksey Shipilev wrote: >> Dunno. Could do that. Within one instance of JVM, we would never see both narrow/non-narrow versions, though. > > I would be more straight-forward looking, AFAICS. Is there any code that actually matches those strings? I don't think any code matches those strings. I will change them to be more specific. >> It is the same, but in my testing I found that ON_STRONG_OOP_REF is not always set (it is the default when no other ON_XYZ_OOP_REF is specified). > > Ouch. That means we should never trust `ON_STRONG_OOP_REF`, and instead go for `SBS::is_strong_access`? Presumably, the Access machinery is supposed to fill it in when none of the others is set. In practice, it did not work for me. We could fix the access machinery too, but life's too short? 
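The two predicates quoted in this thread can be compared in a small, self-contained sketch. This is only an illustration: the bit values, the `IN_HEAP_EXAMPLE` constant and `has_strong_bit` are hypothetical stand-ins, not HotSpot's real definitions; only the two mask expressions are taken from the `shenandoahBarrierSet.hpp` snippets quoted above. Assuming, as reported in the reply, that `ON_STRONG_OOP_REF` may be left unset when no other strength is specified, checking for the absence of the non-strong bits still classifies such an access as strong, while testing the strong bit directly does not.

```cpp
#include <cstdint>

// Hypothetical stand-ins for HotSpot's decorator bits. The real values are
// defined in the HotSpot sources and are NOT reproduced here.
using DecoratorSet = std::uint64_t;
constexpr DecoratorSet ON_STRONG_OOP_REF  = 1u << 0;
constexpr DecoratorSet ON_WEAK_OOP_REF    = 1u << 1;
constexpr DecoratorSet ON_PHANTOM_OOP_REF = 1u << 2;
constexpr DecoratorSet ON_UNKNOWN_OOP_REF = 1u << 3;
constexpr DecoratorSet IN_HEAP_EXAMPLE    = 1u << 4;  // hypothetical unrelated decorator

// Same mask expression as the quoted is_strong_access: strong means that
// none of the non-strong reference-strength bits is set.
constexpr bool is_strong_access(DecoratorSet decorators) {
    return (decorators & (ON_WEAK_OOP_REF | ON_PHANTOM_OOP_REF | ON_UNKNOWN_OOP_REF)) == 0;
}

// Same mask expression as the quoted is_weak_access: unknown strength is
// conservatively grouped with weak.
constexpr bool is_weak_access(DecoratorSet decorators) {
    return (decorators & (ON_WEAK_OOP_REF | ON_UNKNOWN_OOP_REF)) != 0;
}

// Hypothetical alternative that trusts the explicit strong bit.
constexpr bool has_strong_bit(DecoratorSet decorators) {
    return (decorators & ON_STRONG_OOP_REF) != 0;
}
```

With a decorator set that carries no strength bit at all, `is_strong_access` returns `true` and `has_strong_bit` returns `false`, which is the discrepancy described in the reply above.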
------------- PR: https://git.openjdk.java.net/jdk/pull/1109 From rkennke at openjdk.java.net Tue Nov 10 11:25:57 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 10 Nov 2020 11:25:57 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v3] In-Reply-To: References: <1MKAddYmmZwDvkKF8UxfITgxVfm0o2Tcbg4ZotOHXZk=.fc4a354d-668f-4d8d-8da3-1277a08bca4e@github.com> Message-ID: <9gTaeVuNDSAXmEIuSFNkoybzNLospwZlGqNIRhHe1uA=.6e6d4011-e375-4415-bd0f-5d416543573f@github.com> On Tue, 10 Nov 2020 11:08:52 GMT, Aleksey Shipilev wrote: >> No. ON_UNKNOWN_OOP_REF *might* access the referent field by reflection. We must do the correct thing in this situation. Normally, ON_UNKNOWN_OOP_REF is indeed called on normal/strong references, though, however we do the correct thing in this case too, because the referent would be marked and thus go through regular LRB. It doesn't hurt to call into weak for unknown oop refs. > >> No. ON_UNKNOWN_OOP_REF _might_ access the referent field by reflection. > > Ew. This just goes against my intuition about the default strongness. For example, `Unsafe_GetReference` calls with `ON_UNKNOWN_OOP_REF` -- does that mean we would treat that access as weak? I don't think we should. Yes that is the idea. Note, as explained above, it does not hurt. The alternative would be to generate code that figures out the strength *at runtime* and calls into the correct LRB. This would be complex, though. We already do it for SATB-barriers, but there it is necessary, while here it is harmless to call the weak-LRB for non-weak references. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1109 From shade at openjdk.java.net Tue Nov 10 12:04:21 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 10 Nov 2020 12:04:21 GMT Subject: RFR: 8253525: Implement getInstanceSize/sizeOf intrinsics [v5] In-Reply-To: References: <_TFbHUvH8zI29hGvpE3TGGNLGgi7PhP7rfed23up13U=.5a19aaed-cc1e-4d18-997d-f37616323e61@github.com> Message-ID: On Tue, 10 Nov 2020 11:01:54 GMT, Serguei Spitsyn wrote: > I have a question and a couple of minor suggestions on new test. > Q: Why the value of ITERS is that big? What is the need to have this number of iterations? The test verifies the answer does not change if we hit JIT compilers in that code. Since we are doing C1/C2 intrinsics, we need to execute the loops more than compilation-threshold times. Since the current threshold is about 100K, doing 1M iterations seems good: it is well past the compilation threshold times, and there is time to re-enter the newly compiled method. The test run time is still sane, 1 minute on my Linux x86_64 fastdebug. I can do 200K iterations and -Xbatch instead, though, see new change. This drops the test run time to 30 seconds. > Also, I do not like one-letter identifiers, especially if they are not local. > Could you, please, replace identifiers R and A with some short versions that give a hint. > Something like REFSIZE and ALIGNMENT would be good enough. Renamed to `REF_SIZE` and `OBJ_ALIGN` instead. > Also, what tests did you run to make sure no regression is introduced? Old code calls into `oop::size()` to get the object size. That method decodes the object's layout helper. So when we replace it with intrinsic, we now have to test the different shapes of the layout helper and varying conditions for that decoding. 
So the new test tries to cover the comprehensive matrix: - the usual object shapes: objects, primitive arrays, object arrays; - different compressed oops modes that affect reference sizes; - different object alignment modes that affect object sizes; - different compilation modes: interpreter, C1, C2; - special paths like carrying special bits in layout helper for allocation slow-paths; I know that test is sensitive to compiler intrinsics bugs, as I used these tests to develop the intrinsics. If you inject simple off-by-one bugs into current C1/C2 intrinsics, new test catches that. The additional safety comes from the somewhat loose API requirement: it is specified to return the guess, and that guess might as well be wrong (not overly wrong though, for a quality implementation). ------------- PR: https://git.openjdk.java.net/jdk/pull/650 From shade at openjdk.java.net Tue Nov 10 12:04:21 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 10 Nov 2020 12:04:21 GMT Subject: RFR: 8253525: Implement getInstanceSize/sizeOf intrinsics [v6] In-Reply-To: References: Message-ID: > This is fork off the SizeOf JEP, JDK-8249196. There is already the entry point in JDK that can use the intrinsic like this: `Instrumentation.getInstanceSize`. Therefore, we can implement the C1/C2 intrinsic now, hook it up to `Instrumentation`, and let the tools use that fast path today. > > With this patch, JOL is able to be close to `deepSizeOf` implementation from SizeOf JEP. > > Example performance improvements for sizing up a custom linked list:
>
> Benchmark                     (size)  Mode  Cnt       Score      Error  Units
>
> # Default
> LinkedChainBench.linkedChain       1  avgt    5     705.835 ±    8.051  ns/op
> LinkedChainBench.linkedChain      10  avgt    5    3148.874 ±   37.856  ns/op
> LinkedChainBench.linkedChain     100  avgt    5   28693.256 ±  142.254  ns/op
> LinkedChainBench.linkedChain    1000  avgt    5  290161.590 ± 4594.631  ns/op
>
> # Instrumentation attached, no intrinsics
> LinkedChainBench.linkedChain       1  avgt    5     159.659 ±   19.238  ns/op
> LinkedChainBench.linkedChain      10  avgt    5     717.659 ±   22.540  ns/op
> LinkedChainBench.linkedChain     100  avgt    5    7739.394 ±  111.683  ns/op
> LinkedChainBench.linkedChain    1000  avgt    5   80724.238 ± 2887.794  ns/op
>
> # Instrumentation attached, new intrinsics
> LinkedChainBench.linkedChain       1  avgt    5      95.254 ±    0.808  ns/op
> LinkedChainBench.linkedChain      10  avgt    5     261.564 ±    8.524  ns/op
> LinkedChainBench.linkedChain     100  avgt    5    3367.192 ±   21.128  ns/op
> LinkedChainBench.linkedChain    1000  avgt    5   34148.851 ±  373.080  ns/op
Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Trim down the iteration count, and use -Xbatch to wait for compilation - Use proper constant names in the test - Merge branch 'master' into JDK-8253525-sizeof-intrinsics - The new intrinsic-related test - Revert the change to test - Merge branch 'master' into JDK-8253525-sizeof-intrinsics - Add new intrinsics to toBeInvestigated list in CheckGraalIntrinsics.java - 8253525: Implement getInstanceSize/sizeOf intrinsics ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/650/files - new: https://git.openjdk.java.net/jdk/pull/650/files/482c2f24..1b7290a3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=650&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=650&range=04-05 Stats: 138167 lines in 1930 files changed: 82950 ins; 42900 del; 12317 mod Patch: https://git.openjdk.java.net/jdk/pull/650.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/650/head:pull/650 PR: https://git.openjdk.java.net/jdk/pull/650 From mcimadamore at openjdk.java.net Tue Nov 10 12:10:26 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Tue, 10 Nov 2020 12:10:26 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v17] In-Reply-To:
References: Message-ID: > This patch contains the changes associated with the first incubation round of the foreign linker access API > (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). > > The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. > > Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. > > A big thanks to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thanks to Paul Sandoz, who provided many insights (often by trying the bits first hand). > > Thanks > Maurizio > > Webrev: > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff (relative to [3]): > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254232 > > > > ### API Changes > > The API changes are actually rather slim: > > * `LibraryLookup` > * This class allows clients to look up symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then look up symbols on that library.
> * `FunctionDescriptor` > * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. > * `CLinker` > * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. > * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. > * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back. > * `NativeScope` > * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resource* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. > * `MemorySegment` > * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope.
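The `NativeScope` idea described in the list above (one backing block, many slices) can be pictured with a small bookkeeping model. This is a hedged sketch in C++ rather than the Java API: `BoundedScopeModel`, `Slice` and `demo` are hypothetical names, the model only tracks offsets instead of owning memory, and it assumes a scope whose total size is fixed up front.

```cpp
#include <cstddef>

// Hypothetical model, not the jdk.incubator.foreign API: a slice is described
// by its offset and size within one backing block.
struct Slice {
    std::size_t offset;
    std::size_t size;
    bool ok;  // false when the request did not fit in the block
};

// Hypothetical bounded scope: every allocation is a slice of the same block,
// so no per-request call into the system allocator is modeled.
class BoundedScopeModel {
public:
    constexpr explicit BoundedScopeModel(std::size_t capacity)
        : capacity_(capacity), used_(0) {}

    // Carve the next slice out of the block. `align` must be non-zero;
    // overflow for very large sizes is not handled in this sketch.
    constexpr Slice allocate(std::size_t bytes, std::size_t align) {
        const std::size_t start = (used_ + align - 1) / align * align;
        if (start > capacity_ || bytes > capacity_ - start) {
            return Slice{0, 0, false};
        }
        used_ = start + bytes;
        return Slice{start, bytes, true};
    }

    constexpr std::size_t used() const { return used_; }

private:
    std::size_t capacity_;
    std::size_t used_;
};

// Three logically related requests served from a single 64-byte block.
constexpr bool demo() {
    BoundedScopeModel scope(64);
    const Slice a = scope.allocate(4, 4);    // starts at offset 0
    const Slice b = scope.allocate(8, 8);    // aligned up from 4 to offset 8
    const Slice c = scope.allocate(100, 1);  // too large for the remaining space
    return a.ok && a.offset == 0 && b.ok && b.offset == 8 && !c.ok && scope.used() == 16;
}
```

A real scope would hand out addresses inside one natively allocated block rather than bare offsets; the point of the model is only that each request costs a little arithmetic instead of a separate allocation.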
> > ### Safety > > The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API). > > ### Implementation changes > > The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. same library cannot be loaded by different classloaders). > > As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. > > Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5].
> > The main idea behind the foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function. > > This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. > > For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating-point values are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. > > A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall. > > At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`.
There are actually various strategies to interpret these bindings - listed below: > > * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them in their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the more "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. > > * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). > > * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI. > > For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there.
We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). > > Again, for more readings on the internals of the foreign linker support, please refer to [5]. > > #### Test changes > > Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. > > Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI.
> > [1] - https://openjdk.java.net/jeps/389 > [2] - https://openjdk.java.net/jeps/393 > [3] - https://git.openjdk.java.net/jdk/pull/548 > [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md > [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Address more CSR feedback ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/634/files - new: https://git.openjdk.java.net/jdk/pull/634/files/e9606edb..9960b3d7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=16 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=15-16 Stats: 50 lines in 6 files changed: 0 ins; 30 del; 20 mod Patch: https://git.openjdk.java.net/jdk/pull/634.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634 PR: https://git.openjdk.java.net/jdk/pull/634 From rkennke at openjdk.java.net Tue Nov 10 12:20:14 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 10 Nov 2020 12:20:14 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v4] In-Reply-To: References: Message-ID: > In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. > > I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: > - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. 
Those are two orthogonal dimensions > - We can have strong and phantom native referents, and strong and weak in-heap referents > - Native referents are never compressed > > Note that this depends on PR#1140. > > Testing: > - [x] hotspot_gc_shenandoah > - [x] tier1 +UseShenandoahGC +ShenandoahVerify > - [ ] tier2 +UseShenandoahGC +ShenandoahVerify Roman Kennke has updated the pull request incrementally with five additional commits since the last revision: - Whitespace changes - Mask decorators in hash/cmp, not in ctor - Use ShBarrierSet::is_strong_access() in CmpP optimization - separate code-paths for phantom- and weak-access in runtime-LRB - Give different name to different LRB runtime calls ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1109/files - new: https://git.openjdk.java.net/jdk/pull/1109/files/8d979eea..ac2674f5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=02-03 Stats: 33 lines in 4 files changed: 16 ins; 5 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/1109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1109/head:pull/1109 PR: https://git.openjdk.java.net/jdk/pull/1109 From shade at openjdk.java.net Tue Nov 10 12:33:56 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 10 Nov 2020 12:33:56 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v4] In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 12:20:14 GMT, Roman Kennke wrote: >> In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. 
>> >> I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: >> - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions >> - We can have strong and phantom native referents, and strong and weak in-heap referents >> - Native referents are never compressed >> >> Note that this depends on PR#1140. >> >> Testing: >> - [x] hotspot_gc_shenandoah >> - [x] tier1 +UseShenandoahGC +ShenandoahVerify >> - [ ] tier2 +UseShenandoahGC +ShenandoahVerify > > Roman Kennke has updated the pull request incrementally with five additional commits since the last revision: > > - Whitespace changes > - Mask decorators in hash/cmp, not in ctor > - Use ShBarrierSet::is_strong_access() in CmpP optimization > - separate code-paths for phantom- and weak-access in runtime-LRB > - Give different name to different LRB runtime calls This looks fine to me. Let Zhengyu look through this thoroughly. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1109 From dholmes at openjdk.java.net Tue Nov 10 12:35:59 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 10 Nov 2020 12:35:59 GMT Subject: RFR: 8255822: Zero: improve build-time JVMTI handling [v3] In-Reply-To: References: <7-Y8_X8c04i5dRiPOWnSV-Wqige6jXxb1WPW1pIO7CE=.f66482a9-f624-4897-a68f-1e131a6abf8f@github.com> Message-ID: <83OnOrWG5dCsK_f6OUm7517vxV4uSg76a-PhTQZJjlM=.687f44d9-d4c2-440a-86dd-efa5da75800b@github.com> On Tue, 10 Nov 2020 07:22:07 GMT, Aleksey Shipilev wrote: >> Current Zero interpreter has the optimization for JVMTI support. It recognizes that JVMTI is disabled most of the time, and that JVMTI checks in the interpreter code slows it down considerably. (I measured it myself when working on this patch: removing this optimization yields about 20% hit in build times). >> >> Current optimization works as follows. 
At build time, an XSLT transform is performed on `bytecodeInterpreter.cpp`, yielding `bytecodeInterpreterWithChecks.cpp`. In that new compilation unit, `VM_JVMTI` macro is defined, and a new entry point -- `BytecodeInterpreter::withChecks` -- is defined. Then, both compilation units are compiled. In one of them, `JVMTI` hooks are stripped out. In another, they persist. Then, callers have to choose which entry point to use. >> >> I believe this can be rewritten to use C++ templates instead of XLST and defines dance. This also allows to clean up JVMTI checks a bit. >> >> Additional testing: >> - [x] Linux x86_64 Zero fastdebug build with `-jvmti` >> - [x] Linux x86_64 Zero fastdebug/release build times are not regressing > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Simplify a few JVMTI_ENABLED blocks > - Merge branch 'master' into JDK-8255822-zero-jvmti-rework > - Fix build error > - Merge branch 'master' into JDK-8255822-zero-jvmti-rework > - Revert one dubious change > - 8255822: Zero: improve build-time JVMTI handling > Summary: use C++ templates instead of XSLT transforms Still good. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1061 From vlivanov at openjdk.java.net Tue Nov 10 12:39:15 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 10 Nov 2020 12:39:15 GMT Subject: RFR: 8256050: JVM crashes with -XX:+PrintDeoptimizationDetails [v2] In-Reply-To: References: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> Message-ID: On Mon, 9 Nov 2020 21:03:05 GMT, Daniel D. Daugherty wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > Looks good. Your call on whether to add the comment I proposed. Thanks for the reviews, Vladimir and Dan! 
> src/hotspot/share/runtime/basicLock.cpp line 34: > >> 32: markWord mark_word = displaced_header(); >> 33: if (mark_word.value() != 0) { >> 34: bool print_monitor_info = (owner != NULL) && (owner->mark() == markWord::from_pointer((void*)this)); > > Could use a comment between L33 and L34: > // Print monitor info if there's an owning oop and it refers to this BasicLock. Yes, I'll incorporate the comment. ------------- PR: https://git.openjdk.java.net/jdk/pull/1124 From vlivanov at openjdk.java.net Tue Nov 10 12:39:14 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 10 Nov 2020 12:39:14 GMT Subject: RFR: 8256050: JVM crashes with -XX:+PrintDeoptimizationDetails [v2] In-Reply-To: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> References: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> Message-ID: > -XX:+PrintDeoptimizationDetails triggers intermittent crashes. I spotted 2 independent problems which the patch addresses: > * `markWord::print_on` doesn't handle displaced header case well (the pointer stored in the header may be stale); > * `InstanceKlass::oop_print_value_on` dumps some specific details about `MemberName`, but the code assumes the instance is fully initialized. It's not necessarily the case: for example, deoptimization can happen when `MemberName` constructor is being executed.
> > Testing: > - [x] manually verified that the crashes go away -XX:+PrintDeoptimizationDetails > - [x] hs-precheckin-comp,hs-tier1,hs-tier2 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1124/files - new: https://git.openjdk.java.net/jdk/pull/1124/files/a41c82cd..89550ba2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1124&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1124&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1124.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1124/head:pull/1124 PR: https://git.openjdk.java.net/jdk/pull/1124 From vlivanov at openjdk.java.net Tue Nov 10 12:44:56 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 10 Nov 2020 12:44:56 GMT Subject: Integrated: 8256050: JVM crashes with -XX:+PrintDeoptimizationDetails In-Reply-To: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> References: <4vKGL5jqtgobFx2t-9Byioh9oMiECHl9zH59N1Xc87g=.f94e0110-ba17-4387-bec8-ab414e91890c@github.com> Message-ID: <0MlWpgsM3xXFe-OlCzKhH1rr9TsTN8WhIWWyddoZLXI=.0156927b-9657-45b9-a5e1-6add0f2d35ec@github.com> On Mon, 9 Nov 2020 13:01:18 GMT, Vladimir Ivanov wrote: > -XX:+PrintDeoptimizationDetails triggers intermittent crashes. I spotted 2 independent problems which the patch addresses: > * `markWord::print_on` doesn't handle displaced header case well (the pointer stored in the header may be stale); > * `InstanceKlass::oop_print_value_on` dumps some specific details about `MemberName`, but the code assumes the instance is fully initialized. It's not necessarily the case: for example, deoptimization can happen when `MemberName` constructor is being executed. 
> > Testing: > - [x] manually verified that the crashes go away -XX:+PrintDeoptimizationDetails > - [x] hs-precheckin-comp,hs-tier1,hs-tier2 This pull request has now been integrated. Changeset: 3455fa9b Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/3455fa9b Stats: 31 lines in 7 files changed: 16 ins; 0 del; 15 mod 8256050: JVM crashes with -XX:+PrintDeoptimizationDetails Reviewed-by: kvn, dcubed ------------- PR: https://git.openjdk.java.net/jdk/pull/1124 From coleenp at openjdk.java.net Tue Nov 10 13:22:09 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 10 Nov 2020 13:22:09 GMT Subject: RFR: 8138588: VerifyMergedCPBytecodes option cleanup needed Message-ID: This option has been removed in favor of always verifying the bytecodes in debug mode. Tested with tier1-3. ------------- Commit messages: - Fix test using option. - 8138588: VerifyMergedCPBytecodes option cleanup needed Changes: https://git.openjdk.java.net/jdk/pull/1137/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1137&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8138588 Stats: 9 lines in 4 files changed: 3 ins; 5 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1137.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1137/head:pull/1137 PR: https://git.openjdk.java.net/jdk/pull/1137 From rkennke at openjdk.java.net Tue Nov 10 13:28:13 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 10 Nov 2020 13:28:13 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v5] In-Reply-To: References: Message-ID: > In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. 
> > I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: > - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions > - We can have strong and phantom native referents, and strong and weak in-heap referents > - Native referents are never compressed > > Note that this depends on PR#1140. > > Testing: > - [x] hotspot_gc_shenandoah > - [x] tier1 +UseShenandoahGC +ShenandoahVerify > - [ ] tier2 +UseShenandoahGC +ShenandoahVerify Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Aarch64 parts ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1109/files - new: https://git.openjdk.java.net/jdk/pull/1109/files/ac2674f5..d361e60b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=03-04 Stats: 67 lines in 3 files changed: 9 ins; 11 del; 47 mod Patch: https://git.openjdk.java.net/jdk/pull/1109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1109/head:pull/1109 PR: https://git.openjdk.java.net/jdk/pull/1109 From ayang at openjdk.java.net Tue Nov 10 13:30:55 2020 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Nov 2020 13:30:55 GMT Subject: RFR: 8253600: G1: Fully support pinned regions for full gc [v4] In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 09:55:18 GMT, Thomas Schatzl wrote: > can you clarify? In the original code, `uint8_t default_value() const { return NoArchive; }`, where `NoArchive = 0`. > It's not worth thinking about overhead of the clearing in this case imho. OK, maybe it's not perf sensitive. 
------------- PR: https://git.openjdk.java.net/jdk/pull/824 From hseigel at openjdk.java.net Tue Nov 10 13:54:56 2020 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 10 Nov 2020 13:54:56 GMT Subject: RFR: 8138588: VerifyMergedCPBytecodes option cleanup needed In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 23:46:04 GMT, Coleen Phillimore wrote: > This option has been removed in favor of always verifying the bytecodes in debug mode. Tested with tier1-3. Looks good! Thanks, Harold ------------- Marked as reviewed by hseigel (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1137 From dcubed at openjdk.java.net Tue Nov 10 14:04:03 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 14:04:03 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> Message-ID: On Tue, 10 Nov 2020 03:28:52 GMT, David Holmes wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> dholmes-ora - convert inner while loop to do-while loop in unlink_deflated(). > > Marked as reviewed by dholmes (Reviewer). Thanks for closing the loop on this one. Yes, next_next is needed and it's a carry-over name from the baseline code. And again, thanks for the thorough review. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From mcimadamore at openjdk.java.net Tue Nov 10 14:16:22 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Tue, 10 Nov 2020 14:16:22 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v18] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the first incubation round of the foreign linker access API incubation > (see JEP 389 [1]). 
This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]).
>
> The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients.
>
> Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible.
>
> A big thanks to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thanks to Paul Sandoz, who provided many insights (often by trying the bits first hand).
>
> Thanks
> Maurizio
>
> Webrev:
> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev
>
> Javadoc:
>
> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html
>
> Specdiff (relative to [3]):
>
> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html
>
> CSR:
>
> https://bugs.openjdk.java.net/browse/JDK-8254232
>
>
>
> ### API Changes
>
> The API changes are actually rather slim:
>
> * `LibraryLookup`
>   * This class allows clients to look up symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then look up symbols on that library.
> * `FunctionDescriptor`
>   * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function.
> * `CLinker`
>   * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls.
>   * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place.
>   * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back.
> * `NativeScope`
>   * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resources* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls.
> * `MemorySegment`
>   * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope.
>
> ### Safety
>
> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API).
>
> ### Implementation changes
>
> The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. the same library cannot be loaded by different classloaders).
>
> As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`.
>
> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5].
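
The `NativeScope` allocation scheme described above (one shared block, each request served as a cheap slice) can be illustrated in plain Java. This is only a sketch of the idea, assuming a fixed-size arena backed by a direct `ByteBuffer`; the class `SlicingArena` and its methods are hypothetical names and are not part of `jdk.incubator.foreign`, and the real implementation works on memory segments rather than buffers.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of slice-based allocation; NOT the NativeScope implementation.
final class SlicingArena implements AutoCloseable {
    private final ByteBuffer block;   // the single underlying allocation
    private int offset = 0;           // index of the next free byte
    private boolean open = true;

    SlicingArena(int capacity) {
        this.block = ByteBuffer.allocateDirect(capacity);
    }

    // Serves a request by slicing the shared block; no new native allocation happens here.
    // Alignment is relative to the start of the block and must be a power of two.
    ByteBuffer allocate(int size, int alignment) {
        if (!open) {
            throw new IllegalStateException("arena is closed");
        }
        if (size < 0 || alignment <= 0 || Integer.bitCount(alignment) != 1) {
            throw new IllegalArgumentException("bad size or alignment");
        }
        long start = ((long) offset + alignment - 1) & -(long) alignment; // round up
        if (start + size > block.capacity()) {
            throw new OutOfMemoryError("arena exhausted");
        }
        ByteBuffer view = block.duplicate();
        view.position((int) start);
        view.limit((int) (start + size));
        offset = (int) (start + size);
        return view.slice();
    }

    int bytesUsed() {
        return offset;
    }

    // Unlike the real API, closing here only rejects further allocations;
    // slices handed out earlier are not invalidated.
    @Override
    public void close() {
        open = false;
    }

    public static void main(String[] args) {
        // One try-with-resources block covers all logically related allocations.
        try (SlicingArena arena = new SlicingArena(64)) {
            ByteBuffer flag = arena.allocate(3, 1);
            ByteBuffer counter = arena.allocate(8, 8);
            counter.putLong(0, 42L);
            System.out.println(arena.bytesUsed());
        }
    }
}
```

A growable variant, like the second `NativeScope` flavor mentioned in the text, would presumably chain additional blocks when the current one is exhausted; that is left out here to keep the sketch short.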
>
> The main idea behind the foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function.
>
> This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference.
>
> For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating-point values are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly.
>
> A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor`, what is the set of bindings associated with the downcall/upcall.
>
> At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`.
There are actually various strategies to interpret these bindings - listed below:
>
> * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them into their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the most "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place.
>
> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones).
>
> * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI.
>
> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there.
We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized).
>
> Again, for further reading on the internals of the foreign linker support, please refer to [5].
>
> #### Test changes
>
> Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on.
>
> Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI.
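
To make the "basic interpreted mode" from the implementation notes more concrete, here is a toy model of interpreting a list of bindings with an operand stack and an argument buffer in which each slot stands for one register. It is a sketch under stated assumptions, not the real `BindingInterpreter`: only `Move` is named in the description, while `Dup`, `LoadField` and the class `ToyBindingInterpreter` are hypothetical names chosen for illustration, and a `long[]` stands in for a struct argument.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical, heavily simplified model; NOT the JDK's BindingInterpreter.
final class ToyBindingInterpreter {
    // A primitive step of a call recipe (illustrative only).
    interface Binding {}

    // Duplicates the value on top of the operand stack.
    static final class Dup implements Binding {}

    // Pops a struct-like long[] and pushes the element at the given index.
    static final class LoadField implements Binding {
        final int index;
        LoadField(int index) { this.index = index; }
    }

    // Pops a value and stores it into the buffer slot that models one register.
    static final class Move implements Binding {
        final int slot;
        Move(int slot) { this.slot = slot; }
    }

    // Runs the recipe over one Java argument and returns the filled buffer.
    static long[] interpret(Object argument, List<Binding> recipe, int bufferSlots) {
        long[] buffer = new long[bufferSlots];
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(argument);
        for (Binding b : recipe) {
            if (b instanceof Dup) {
                stack.push(stack.peek());
            } else if (b instanceof LoadField) {
                long[] struct = (long[]) stack.pop();
                stack.push(struct[((LoadField) b).index]);
            } else if (b instanceof Move) {
                buffer[((Move) b).slot] = (Long) stack.pop();
            } else {
                throw new IllegalArgumentException("unknown binding: " + b);
            }
        }
        return buffer;
    }

    public static void main(String[] args) {
        // Recipe: split a two-field struct into two register slots.
        List<Binding> recipe = Arrays.asList(
                new Dup(), new LoadField(0), new Move(0),
                new LoadField(1), new Move(1));
        long[] buffer = interpret(new long[] {7L, 9L}, recipe, 2);
        System.out.println(Arrays.toString(buffer));
    }
}
```

In the description above, the filled buffer is what gets handed to the VM's assembly stub; the intrinsified mode exists precisely to avoid allocating and copying through such a buffer on every call.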
> > [1] - https://openjdk.java.net/jeps/389 > [2] - https://openjdk.java.net/jeps/393 > [3] - https://git.openjdk.java.net/jdk/pull/548 > [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md > [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html Maurizio Cimadamore has updated the pull request incrementally with 10 additional commits since the last revision: - Merge pull request #7 from JornVernee/Additional_Review_Comments Additional review comments - Revert System.java changes - Set copyright year for added files to 2020 - Check result of AttachCurrentThread - Sort includes alphabetically - Relax ret_addr_offset() assert - Extra space after if - remove excessive asserts in ProgrammableInvoker::invoke_native - Remove os::is_MP() check - remove blank line in thread.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/634/files - new: https://git.openjdk.java.net/jdk/pull/634/files/9960b3d7..efc969dc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=17 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=16-17 Stats: 90 lines in 56 files changed: 14 ins; 13 del; 63 mod Patch: https://git.openjdk.java.net/jdk/pull/634.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634 PR: https://git.openjdk.java.net/jdk/pull/634 From zgu at openjdk.java.net Tue Nov 10 14:17:01 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 10 Nov 2020 14:17:01 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v5] In-Reply-To: References: Message-ID: <1EJmFGKXgKlDUsfnNa9F3sVc_CIt1byutdNXLeZWCJM=.f4357e91-301c-47c4-9777-77f6f8e62948@github.com> On Tue, 10 Nov 2020 13:28:13 GMT, Roman Kennke wrote: >> In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. 
This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. >> >> I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: >> - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions >> - We can have strong and phantom native referents, and strong and weak in-heap referents >> - Native referents are never compressed >> >> Note that this depends on PR#1140. >> >> Testing: >> - [x] hotspot_gc_shenandoah >> - [x] tier1 +UseShenandoahGC +ShenandoahVerify >> - [ ] tier2 +UseShenandoahGC +ShenandoahVerify > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Aarch64 parts Changes requested by zgu (Reviewer). src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp line 958: > 956: } > 957: #else > 958: __ load_parameter(0, rax); It does not seem that x86_32 got the same adjustment as x86_64. Does it even build on x86_32? ------------- PR: https://git.openjdk.java.net/jdk/pull/1109 From coleenp at openjdk.java.net Tue Nov 10 14:18:00 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 10 Nov 2020 14:18:00 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Sat, 7 Nov 2020 17:17:01 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 136: >> >>> 134: >>> 135: // Honor block request.
>>> 136: ThreadBlockInVM tbivm(self->as_Java_thread()); >> >> ThreadBlockInVM is generally used to wrap blocking code, not to cause the current thread to block (which it will do as a side-effect if a safepoint/handshake is requested). Surely this should just be call to `process_if_requested` (or the new `process_if_requested_with_exit_check`)? > > This kind of use of ThreadBlockInVM predates this changeset so while > the location is new, then code style is old, very old... I'll hold off changing > this for now. I'd rather see ThreadBlockInVM as the convention of allowing the thread to block if a safepoint is requested. The calls like process_if_requested are becoming alphabet soup and keep changing, so having TBIVM is better in my opinion. That said, this is a strange usage. This code appears three times. It should be a function like allow_safepoint_block(LogStream* ls, timer), with some comment above. Then it's clear that it's checking for a safepoint in a loop that could take a long time and the logging is ancillary. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From jvernee at openjdk.java.net Tue Nov 10 14:28:06 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 10 Nov 2020 14:28:06 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 04:10:30 GMT, David Holmes wrote: >> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 64 commits: >> >> - Merge branch '8254162' into 8254231_linker >> - Fix post-merge issues caused by 8219014 >> - Merge branch 'master' into 8254162 >> - Addess remaining feedback from @AlanBateman and @mrserb >> - Address comments from @AlanBateman >> - Fix typo in upcall helper for aarch64 >> - Merge branch '8254162' into 8254231_linker >> - Merge branch 'master' into 8254162 >> - Fix issues with derived buffers and IO operations >> - More 32-bit fixes for TestLayouts >> - ... and 54 more: https://git.openjdk.java.net/jdk/compare/a50fdd54...b38afb3f > > A high-level scan through - mostly VM files. I've addressed the open review comments. One of the commits is a bigger change that removes the code duplication in the upcall handler code. The initialization code is moved to the ProgrammableUpcallHandler class' constructor instead. That class is then lazily instantiated using a local `static` variable (see ProgrammableUpcallHandler::instance), which since C++11 guarantees thread-safe initialize-once behaviour. Along with those changes I've also removed some other duplicated code in the native invoker code (ProgrammableInvokerGenerator), cleaned up the includes most files, as well as using a JNI_ENTRY function when doing upcalls (as requested), by splitting the current functionality of the upcall_helper function in 2; one function that does the thread attach, and then other that does the actual upcall (which is the one using JNI_ENTRY). (see: https://github.com/openjdk/jdk/pull/634/commits/719224ca9dc70fce6d28885acfb362fee715ebbd). As discussed, changing ProgrammableUpcallHandler to be a well-known class didn't work, since it is not in java.base. Changes: - Merge both versions of upcall_init and move the code to (the constructor of) ProgrammableUpcallHandler. Using the same lazy singleton pattern as for ForeignGlobals to make initialization thread-safe. 
- Merge both ProgrammableInvokeGenerator classes into a shared ProgrammableInvoke::Generator class.
- Also move ProgrammableStub to ProgrammableInvoke::Stub for better name-spacing
- Also move native_invoker_size constant to ProgrammableInvoker (we now have 1 instead of 2)
- Merge ProgrammableInvoker::Generator::generate and top-level generate_invoke_native functions (avoiding the need to forward fields)
- Split upcall_helper method into ProgrammableUpcallHandler::attach_thread_and_do_upcall and upcall_helper. The former does the thread attach/detach, the latter does the actual upcall.
- Add a few comments to ProgrammableUpcallHandler::generate_upcall_stub
- Remove unused imports

The rest of the review comments were addressed in a set of smaller commits (see timeline on GitHub). The changes therein are:
- remove blank line in thread.hpp
- Remove os::is_MP() check
- remove excessive asserts in ProgrammableInvoker::invoke_native
- Extra space after if in jni_util_md (Windows)
- Relax ret_addr_offset() assert
- Sort includes alphabetically in upcallHandler CPU files
- Check result of AttachCurrentThread call
- Set copyright year for added files to 2020 (I didn't touch the ARM copyright headers)

That should address all open review comments (but please let me know if I've missed something).

Thanks for the reviews so far.
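
The lazy, initialize-once pattern mentioned in this message relies on C++11 function-local statics inside HotSpot. For readers more at home in Java, a rough analogue is the initialization-on-demand holder idiom, sketched below. This is not code from the patch: `LazyHandler` and its members are hypothetical names, and the sketch only illustrates the "construct once, on first use, thread-safely" behaviour, which in Java follows from the class initialization rules.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Java analogue of a C++ function-local static; NOT code from the patch.
final class LazyHandler {
    // Counts constructor runs, for demonstration only.
    private static final AtomicInteger CONSTRUCTED = new AtomicInteger();

    private LazyHandler() {
        // Expensive one-time setup would go here.
        CONSTRUCTED.incrementAndGet();
    }

    // The nested class is initialized only when instance() first touches it,
    // and the JVM guarantees that class initialization runs at most once.
    private static final class Holder {
        static final LazyHandler INSTANCE = new LazyHandler();
    }

    static LazyHandler instance() {
        return Holder.INSTANCE;
    }

    static int timesConstructed() {
        return CONSTRUCTED.get();
    }

    public static void main(String[] args) {
        LazyHandler first = instance();
        LazyHandler second = instance();
        System.out.println(first == second);
        System.out.println(timesConstructed());
    }
}
```

As with the local `static` in C++, callers need no explicit locking, and nothing is constructed if `instance()` is never called.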
-------------

PR: https://git.openjdk.java.net/jdk/pull/634

From eosterlund at openjdk.java.net Tue Nov 10 14:33:58 2020
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 10 Nov 2020 14:33:58 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4]
In-Reply-To: <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com>
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com>
Message-ID:

On Tue, 10 Nov 2020 02:35:17 GMT, Daniel D. Daugherty wrote:

>> Changes from @fisk and @dcubed-ojdk to:
>>
>> - simplify ObjectMonitor list management
>> - get rid of Type-Stable Memory (TSM)
>>
>> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions.
>> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint,
>> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano)
>> - a few minor regressions (<= -0.24%)
>> - Volano is 6.8% better
>>
>> Eric C. has also done promotion perf runs on these bits and says "the results look fine".
>
> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision:
>
> dholmes-ora - convert inner while loop to do-while loop in unlink_deflated().

Looks very good! Thanks for picking this up and taking it all the way!

-------------

Marked as reviewed by eosterlund (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/642 From zgu at openjdk.java.net Tue Nov 10 14:37:55 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 10 Nov 2020 14:37:55 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v5] In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 13:28:13 GMT, Roman Kennke wrote: >> In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. >> >> I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: >> - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions >> - We can have strong and phantom native referents, and strong and weak in-heap referents >> - Native referents are never compressed >> >> Note that this depends on PR#1140. >> >> Testing: >> - [x] hotspot_gc_shenandoah >> - [x] tier1 +UseShenandoahGC +ShenandoahVerify >> - [ ] tier2 +UseShenandoahGC +ShenandoahVerify > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Aarch64 parts Changes requested by zgu (Reviewer). src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp line 278: > 276: bool is_phantom = ShenandoahBarrierSet::is_phantom_access(decorators); > 277: bool is_native = ShenandoahBarrierSet::is_native_access(decorators); > 278: bool is_narrow = LP64_ONLY(UseCompressedOops &&) !is_native; This seems wrong for 32-bits. 
should be: is_narrow = LP64_ONLY(UseCompressedOops &&) NOT_LP64(false &&) !is_native;

src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp line 237:

> 235: bool is_phantom = ShenandoahBarrierSet::is_phantom_access(decorators);
> 236: bool is_native = ShenandoahBarrierSet::is_native_access(decorators);
> 237: bool is_narrow = LP64_ONLY(UseCompressedOops &&) !is_native;

Don't need LP64_ONLY, aarch64 always 64-bits.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1109

From dcubed at openjdk.java.net Tue Nov 10 14:45:01 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Tue, 10 Nov 2020 14:45:01 GMT
Subject: RFR: 8138588: VerifyMergedCPBytecodes option cleanup needed
In-Reply-To:
References:
Message-ID:

On Mon, 9 Nov 2020 23:46:04 GMT, Coleen Phillimore wrote:

> This option has been removed in favor of always verifying the bytecodes in debug mode. Tested with tier1-3.

Thumbs up! Please make sure this is okay with someone on the Serviceability team.

src/hotspot/share/runtime/globals.hpp line 875:

> 873: "Force ldc -> ldc_w rewrite during RedefineClasses") \
> 874: \
> 875: product(bool, AllowRedefinitionToAddDeleteMethods, false, \

Well... it's definitely sometime after Mustang/1.6.0... :-)

-------------

Marked as reviewed by dcubed (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1137

From dcubed at openjdk.java.net Tue Nov 10 14:55:59 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Tue, 10 Nov 2020 14:55:59 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com>
Message-ID:

On Tue, 10 Nov 2020 14:31:14 GMT, Erik Österlund wrote:

>> Daniel D.
Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> dholmes-ora - convert inner while loop to do-while loop in unlink_deflated(). > > Looks very good! Thanks for picking this up and taking it all the way! @fisk - Thanks for the sanity check review. And thanks for prototyping this work and showing that this crazy idea could work! :-) ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Tue Nov 10 15:06:01 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 15:06:01 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: On Tue, 10 Nov 2020 14:15:02 GMT, Coleen Phillimore wrote: >> This kind of use of ThreadBlockInVM predates this changeset so while >> the location is new, then code style is old, very old... I'll hold off changing >> this for now. > > I'd rather see ThreadBlockInVM as the convention of allowing the thread to block if a safepoint is requested. The calls like process_if_requested are becoming alphabet soup and keep changing, so having TBIVM is better in my opinion. > That said, this is a strange usage. This code appears three times. It should be a function like allow_safepoint_block(LogStream* ls, timer), with some comment above. Then it's clear that it's checking for a safepoint in a loop that could take a long time and the logging is ancillary. I previously looked at refactoring the three locations where `ThreadBlockInVM` is used. The problem with the refactoring is that the log messages and the parameters have some differences and some commonalities. 
Each of these logging sites is trying to communicate some local context that is unique to that call site along with some global context that might have changed from call site to call site. I'll take another look at refactoring shortly and will let you know what I come up with. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From rkennke at openjdk.java.net Tue Nov 10 15:26:14 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 10 Nov 2020 15:26:14 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v6] In-Reply-To: References: Message-ID: <-X5aHA0aJFAUOjdrVoKttDntOBth2QQv7vD2cBVcAZo=.70f3d572-66dc-4ca6-b512-0e9ce0577131@github.com> > In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. > > I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: > - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions > - We can have strong and phantom native referents, and strong and weak in-heap referents > - Native referents are never compressed > > Note that this depends on PR#1140. 
> > Testing: > - [x] hotspot_gc_shenandoah (x86_64, x64_32, aarch64) > - [x] tier1 +UseShenandoahGC +ShenandoahVerify > - [ ] tier2 +UseShenandoahGC +ShenandoahVerify Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Remove superfluous LP64 in aarch64 part - Fixes/missing parts for x86_64 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1109/files - new: https://git.openjdk.java.net/jdk/pull/1109/files/d361e60b..37c97786 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=04-05 Stats: 20 lines in 3 files changed: 4 ins; 6 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/1109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1109/head:pull/1109 PR: https://git.openjdk.java.net/jdk/pull/1109 From ayang at openjdk.java.net Tue Nov 10 15:38:00 2020 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Nov 2020 15:38:00 GMT Subject: RFR: 8253600: G1: Fully support pinned regions for full gc [v5] In-Reply-To: <5pZEcUBz7l9O7nN2kEbLOp6SIS8WjPrOnokK0aNgJZE=.272c196a-fe3c-480b-9b80-412a58d85830@github.com> References: <5pZEcUBz7l9O7nN2kEbLOp6SIS8WjPrOnokK0aNgJZE=.272c196a-fe3c-480b-9b80-412a58d85830@github.com> Message-ID: On Tue, 10 Nov 2020 10:17:11 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I get reviews for this change that implements "proper" support for pinned regions in the G1 full collector? >> >> By proper I mean that at the end of gc, pinned regions contain the correct TAMS and bitmap markings under the TAMS so that dead objects within them are supported? >> >> Currently all (pinned) regions have their TAMS set to bottom() and their bitmap above TAMS cleared (at least logically :) ). This works as long objects within these regions can't be dead as it is the case now: >> - humongous regions are either live or fully reclaimed. 
>> - all other pinned regions are archive regions at the moment that are always treated as fully live (and do not contain dead objects). >> >> This change is a requirement for fixing JDK-8253081 as some earlier change made it possible to have dead objects within open archive regions. It also enables supporting removal of gclocker use for g1, i.e. using region pinning. >> >> Based on the PR#808 (https://github.com/openjdk/jdk/pull/808). >> >> Testing: tier1-8, testing with prototype for region pinning, testing with prototype for JDK-8253081. >> Performance testing: no regressions >> >> Some comments for questions that might come up during review: >> >> - how does this work with the bitmaps now: >> - at start of full gc the next bitmap is cleared >> - full gc marks the next bitmap >> - for all pinned regions, keep TAMS and top() (*), otherwise set TAMS to bottom >> - swap bitmaps >> - clear next bitmap for next marking >> >> (*) this means that from a usage POV pinned regions are considered full. This is inaccurate, but sufficient: full gc clears all remembered sets anyway, so we do not need that information for gc efficiency purposes anyway to evacuate later. The next marking before old gen evacuation will update it to the correct values anyway. G1 does not support allocation into "holes" in pinned regions that can be open archive only at this time too, so there is no need to be more exact. >> >> - use of a region attribute table for phase 2+ only: compared to before we need fast access to information whether a given reference goes into a pinned region (as opposed to an archive region) wrt to adjusting that pointer to avoid doing work for these references. >> >> Phase 1 marking could have used this information for the do-we-need-to-preserve-the-mark check too: however this would have required g1 to add an extra another pass over all regions to update that. 
This seemed slower than just checking this information "more slowly" for the objects that need mark preservation. Tests showed that this is the case for <0.00% (yeah, these references that need mark preservation are rounding errors in cases it matters) of overall references, so I did not add that pass. >> (Additionally g1 full gc is a last-ditch effort, and while marking takes a significant time, it does not completely dominate it). >> >> I.e. the second clause in the condition of this hunk is intentionally slower than could be: >> @@ -52,7 +52,9 @@ inline bool G1FullGCMarker::mark_object(oop obj) { >> // Marked by us, preserve if needed. >> markWord mark = obj->mark(); >> if (obj->mark_must_be_preserved(mark) && >> // It is not necessary to preserve marks for objects in pinned regions because >> // we do not change their headers (i.e. forward them). >> !G1CollectedHeap::heap()->heap_region_containing(obj)->is_pinned()) { >> preserved_stack()->push(obj, mark); >> } >> - there is no code yet that checks for empty pinned regions yet. Only JDK-8253081 introduces that because still all contents of all archive regions are live forever. >> >> Also please note that the 51b297b change is from the #808 change. >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge branch 'master' into 8253600-full-gc-pinned-region-support > - ayang review > - sjohanss review 2 > - Merge branch 'master' into 8253600-full-gc-pinned-region-support > - Merge branch 'master' into 8253600-full-gc-pinned-region-support > - sjohanss review > > Also remove _archive_allocator_map et al as the new attribute table > implements the same functionality also suggested by sjohanss in > private. > - Initial import > - Initial import `G1FullGCHeapRegionAttr::is_pinned` and `HeapRegion::is_pinned` share the same name, but have different meanings. 
I wonder if it's possible to rename the former into sth more precise. Maybe `G1FullGCHeapRegionAttr::should_not_be_relocated`, or the version without negation `G1FullGCHeapRegionAttr::should_relocate`? ------------- Marked as reviewed by ayang (Author). PR: https://git.openjdk.java.net/jdk/pull/824 From ayang at openjdk.java.net Tue Nov 10 15:38:01 2020 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Nov 2020 15:38:01 GMT Subject: RFR: 8253600: G1: Fully support pinned regions for full gc [v4] In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 09:55:25 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1FullCollector.hpp line 76: >> >>> 74: ReferenceProcessorSubjectToDiscoveryMutator _is_subject_mutator; >>> 75: >>> 76: G1FullGCHeapRegionAttr _region_attr_table; >> >> I don't really see the point of this auxiliary data structure. Why can't we just query the underlying region for its type, pinned, open/close archive? > > The only reason is performance: > > Consider memory access with that auxiliary data structure: it is a single dereference/memory load on a very dense data structure. > > Directly querying the HeapRegion* is two dependent dereferences (first getting the HeapRegion*, then to the HeapRegion, then accessing the member) on a fairly large data structure (HeapRegion). > > Since these loads are done for every reference (millions of times) the second is much slower, at best filling the cache with useless data as (roughly) a HeapRegion is around a cache line in size (iirc). > > Further, with the current encoding of the values in region attribute table, the closed-or-pinned check can be done with a single check instead of some disjunction (ie. region->is_pinned() || region->is_closed(), although it is true that closed regions are always pinned), so saving even more code (and branch locations). I see; thank you for the explanation. Please add some comments around this code. 
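The performance argument above (one load from a dense per-region table versus two dependent loads through a large region object) can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `RegionAttrSketch` and `RegionAttrTableSketch`, the one-byte encoding, and the choice of 0 for "normal" are all hypothetical and are not taken from the G1FullGCHeapRegionAttr code in the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical one-byte encoding: 0 means "may be relocated", any
// non-zero value means "pinned or closed archive" (assumption).
enum class RegionAttrSketch : std::uint8_t {
  Normal = 0,
  Pinned = 1,
  ClosedArchive = 2
};

// Hypothetical dense side table with one byte per heap region.
class RegionAttrTableSketch {
 public:
  RegionAttrTableSketch(std::uintptr_t heap_base, unsigned region_shift,
                        std::size_t num_regions)
      : _heap_base(heap_base),
        _region_shift(region_shift),
        _attrs(num_regions, static_cast<std::uint8_t>(RegionAttrSketch::Normal)) {}

  void set(std::size_t region_index, RegionAttrSketch attr) {
    _attrs[region_index] = static_cast<std::uint8_t>(attr);
  }

  // One subtraction, one shift, one byte load, one compare. No region
  // object is dereferenced, and both "skip" states share a single check.
  bool is_pinned_or_closed(std::uintptr_t addr) const {
    return _attrs[(addr - _heap_base) >> _region_shift] != 0;
  }

 private:
  std::uintptr_t _heap_base;         // address of the first region
  unsigned _region_shift;            // log2 of the region size in bytes
  std::vector<std::uint8_t> _attrs;  // dense: many entries per cache line
};
```

Because each entry is a single byte, the entries for many neighbouring regions share one cache line, which is the density property the explanation relies on.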
------------- PR: https://git.openjdk.java.net/jdk/pull/824 From zgu at openjdk.java.net Tue Nov 10 15:58:57 2020 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 10 Nov 2020 15:58:57 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v6] In-Reply-To: <-X5aHA0aJFAUOjdrVoKttDntOBth2QQv7vD2cBVcAZo=.70f3d572-66dc-4ca6-b512-0e9ce0577131@github.com> References: <-X5aHA0aJFAUOjdrVoKttDntOBth2QQv7vD2cBVcAZo=.70f3d572-66dc-4ca6-b512-0e9ce0577131@github.com> Message-ID: On Tue, 10 Nov 2020 15:26:14 GMT, Roman Kennke wrote: >> In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. >> >> I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: >> - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions >> - We can have strong and phantom native referents, and strong and weak in-heap referents >> - Native referents are never compressed >> >> Note that this depends on PR#1140. >> >> Testing: >> - [x] hotspot_gc_shenandoah (x86_64, x64_32, aarch64) >> - [x] tier1 +UseShenandoahGC +ShenandoahVerify >> - [ ] tier2 +UseShenandoahGC +ShenandoahVerify > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Remove superfluous LP64 in aarch64 part > - Fixes/missing parts for x86_64 Marked as reviewed by zgu (Reviewer). 
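As background for the 32-bit concern raised earlier in this review, the following sketch shows how the two `is_narrow` expressions expand. The macro definitions are simplified assumptions modelled on the `LP64_ONLY` / `NOT_LP64` usage quoted in the review, and the functions take the compressed-oops setting as a plain parameter (the real code reads the `UseCompressedOops` flag), so treat it as an illustration rather than the HotSpot source.

```cpp
// Simplified, assumed definitions of the conditional-compilation helpers:
// each macro either keeps its argument or drops it entirely.
#ifdef _LP64
#define LP64_ONLY(code) code
#define NOT_LP64(code)
#else
#define LP64_ONLY(code)
#define NOT_LP64(code) code
#endif

// Form quoted in the review. On a 32-bit build the LP64_ONLY part
// vanishes, leaving just "!is_native".
inline bool is_narrow_original(bool use_compressed_oops, bool is_native) {
  (void)use_compressed_oops;  // unused when LP64_ONLY expands to nothing
  return LP64_ONLY(use_compressed_oops &&) !is_native;
}

// Form suggested by the reviewer. On a 32-bit build this becomes
// "false && !is_native", i.e. never narrow.
inline bool is_narrow_suggested(bool use_compressed_oops, bool is_native) {
  (void)use_compressed_oops;
  return LP64_ONLY(use_compressed_oops &&) NOT_LP64(false &&) !is_native;
}
```

On a 64-bit build both functions expand to the same expression; they differ only when `_LP64` is not defined, which is the case the review comment is about.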
------------- PR: https://git.openjdk.java.net/jdk/pull/1109 From tschatzl at openjdk.java.net Tue Nov 10 16:20:09 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Nov 2020 16:20:09 GMT Subject: RFR: 8253600: G1: Fully support pinned regions for full gc [v6] In-Reply-To: References: Message-ID: > Hi all, > > can I get reviews for this change that implements "proper" support for pinned regions in the G1 full collector? > > By proper I mean that at the end of gc, pinned regions contain the correct TAMS and bitmap markings under the TAMS so that dead objects within them are supported? > > Currently all (pinned) regions have their TAMS set to bottom() and their bitmap above TAMS cleared (at least logically :) ). This works as long objects within these regions can't be dead as it is the case now: > - humongous regions are either live or fully reclaimed. > - all other pinned regions are archive regions at the moment that are always treated as fully live (and do not contain dead objects). > > This change is a requirement for fixing JDK-8253081 as some earlier change made it possible to have dead objects within open archive regions. It also enables supporting removal of gclocker use for g1, i.e. using region pinning. > > Based on the PR#808 (https://github.com/openjdk/jdk/pull/808). > > Testing: tier1-8, testing with prototype for region pinning, testing with prototype for JDK-8253081. > Performance testing: no regressions > > Some comments for questions that might come up during review: > > - how does this work with the bitmaps now: > - at start of full gc the next bitmap is cleared > - full gc marks the next bitmap > - for all pinned regions, keep TAMS and top() (*), otherwise set TAMS to bottom > - swap bitmaps > - clear next bitmap for next marking > > (*) this means that from a usage POV pinned regions are considered full. 
This is inaccurate, but sufficient: full gc clears all remembered sets anyway, so we do not need that information for gc efficiency purposes anyway to evacuate later. The next marking before old gen evacuation will update it to the correct values anyway. G1 does not support allocation into "holes" in pinned regions that can be open archive only at this time too, so there is no need to be more exact. > > - use of a region attribute table for phase 2+ only: compared to before we need fast access to information whether a given reference goes into a pinned region (as opposed to an archive region) wrt to adjusting that pointer to avoid doing work for these references. > > Phase 1 marking could have used this information for the do-we-need-to-preserve-the-mark check too: however this would have required g1 to add an extra another pass over all regions to update that. This seemed slower than just checking this information "more slowly" for the objects that need mark preservation. Tests showed that this is the case for <0.00% (yeah, these references that need mark preservation are rounding errors in cases it matters) of overall references, so I did not add that pass. > (Additionally g1 full gc is a last-ditch effort, and while marking takes a significant time, it does not completely dominate it). > > I.e. the second clause in the condition of this hunk is intentionally slower than could be: > @@ -52,7 +52,9 @@ inline bool G1FullGCMarker::mark_object(oop obj) { > // Marked by us, preserve if needed. > markWord mark = obj->mark(); > if (obj->mark_must_be_preserved(mark) && > // It is not necessary to preserve marks for objects in pinned regions because > // we do not change their headers (i.e. forward them). > !G1CollectedHeap::heap()->heap_region_containing(obj)->is_pinned()) { > preserved_stack()->push(obj, mark); > } > - there is no code yet that checks for empty pinned regions yet. 
Only JDK-8253081 introduces that because still all contents of all archive regions are live forever. > > Also please note that the 51b297b change is from the #808 change. > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review2 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/824/files - new: https://git.openjdk.java.net/jdk/pull/824/files/dfb7564c..ef440a24 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=824&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=824&range=04-05 Stats: 9 lines in 2 files changed: 2 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/824.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/824/head:pull/824 PR: https://git.openjdk.java.net/jdk/pull/824 From ayang at openjdk.java.net Tue Nov 10 16:20:10 2020 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Nov 2020 16:20:10 GMT Subject: RFR: 8253600: G1: Fully support pinned regions for full gc [v6] In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 16:17:55 GMT, Thomas Schatzl wrote:
> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > ayang review2 Thanks to Thomas' offline explanation, my previous comment is incorrect, due to my erroneous understanding of the enum in `HeapRegionType`. Thank you for new updates. This PR is good to go. ------------- Marked as reviewed by ayang (Author).
PR: https://git.openjdk.java.net/jdk/pull/824 From tschatzl at openjdk.java.net Tue Nov 10 16:40:00 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Nov 2020 16:40:00 GMT Subject: RFR: 8253600: G1: Fully support pinned regions for full gc [v4] In-Reply-To: References: Message-ID: On Tue, 3 Nov 2020 10:58:43 GMT, Stefan Johansson wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> sjohanss review 2 > > This looks great, thanks. Thanks @kstefanj @albertnetymk for your reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/824 From tschatzl at openjdk.java.net Tue Nov 10 16:40:03 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Nov 2020 16:40:03 GMT Subject: Integrated: 8253600: G1: Fully support pinned regions for full gc In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 07:40:45 GMT, Thomas Schatzl wrote: > Hi all, > > can I get reviews for this change that implements "proper" support for pinned regions in the G1 full collector? > > By proper I mean that at the end of gc, pinned regions contain the correct TAMS and bitmap markings under the TAMS so that dead objects within them are supported? > > Currently all (pinned) regions have their TAMS set to bottom() and their bitmap above TAMS cleared (at least logically :) ). This works as long objects within these regions can't be dead as it is the case now: > - humongous regions are either live or fully reclaimed. > - all other pinned regions are archive regions at the moment that are always treated as fully live (and do not contain dead objects). > > This change is a requirement for fixing JDK-8253081 as some earlier change made it possible to have dead objects within open archive regions. It also enables supporting removal of gclocker use for g1, i.e. using region pinning. > > Based on the PR#808 (https://github.com/openjdk/jdk/pull/808). 
> > Testing: tier1-8, testing with prototype for region pinning, testing with prototype for JDK-8253081. > Performance testing: no regressions > > Some comments for questions that might come up during review: > > - how does this work with the bitmaps now: > - at start of full gc the next bitmap is cleared > - full gc marks the next bitmap > - for all pinned regions, keep TAMS and top() (*), otherwise set TAMS to bottom > - swap bitmaps > - clear next bitmap for next marking > > (*) this means that from a usage POV pinned regions are considered full. This is inaccurate, but sufficient: full gc clears all remembered sets anyway, so we do not need that information for gc efficiency purposes anyway to evacuate later. The next marking before old gen evacuation will update it to the correct values anyway. G1 does not support allocation into "holes" in pinned regions that can be open archive only at this time too, so there is no need to be more exact. > > - use of a region attribute table for phase 2+ only: compared to before we need fast access to information whether a given reference goes into a pinned region (as opposed to an archive region) wrt to adjusting that pointer to avoid doing work for these references. > > Phase 1 marking could have used this information for the do-we-need-to-preserve-the-mark check too: however this would have required g1 to add an extra another pass over all regions to update that. This seemed slower than just checking this information "more slowly" for the objects that need mark preservation. Tests showed that this is the case for <0.00% (yeah, these references that need mark preservation are rounding errors in cases it matters) of overall references, so I did not add that pass. > (Additionally g1 full gc is a last-ditch effort, and while marking takes a significant time, it does not completely dominate it). > > I.e. 
the second clause in the condition of this hunk is intentionally slower than could be: > @@ -52,7 +52,9 @@ inline bool G1FullGCMarker::mark_object(oop obj) { > // Marked by us, preserve if needed. > markWord mark = obj->mark(); > if (obj->mark_must_be_preserved(mark) && > // It is not necessary to preserve marks for objects in pinned regions because > // we do not change their headers (i.e. forward them). > !G1CollectedHeap::heap()->heap_region_containing(obj)->is_pinned()) { > preserved_stack()->push(obj, mark); > } > - there is no code yet that checks for empty pinned regions yet. Only JDK-8253081 introduces that because still all contents of all archive regions are live forever. > > Also please note that the 51b297b change is from the #808 change. > > Thanks, > Thomas This pull request has now been integrated. Changeset: 6555996f Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/6555996f Stats: 567 lines in 31 files changed: 272 ins; 199 del; 96 mod 8253600: G1: Fully support pinned regions for full gc Reviewed-by: sjohanss, ayang ------------- PR: https://git.openjdk.java.net/jdk/pull/824 From gziemski at openjdk.java.net Tue Nov 10 17:04:01 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 10 Nov 2020 17:04:01 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v5] In-Reply-To: References: <-uNIXq8h8NDGpzaNFVp4loICt2jh0EGev5EneflHsX4=.149b59bc-4ffc-4056-9ac1-82d3f4710ab6@github.com> <5GGhy0pxA33snyHk4-oAP608V87gagZwctV9FlwBTR4=.ea121adc-64bd-47a5-831a-eff9bb984237@github.com> Message-ID: <182nfzh-a6eiBfaCmrFfv-ZDxWbyRgUNBzzYR0huHO0=.b6029a06-1aaf-4ad9-aab1-4d83800daab4@github.com> On Mon, 9 Nov 2020 20:53:12 GMT, Thomas Stuefe wrote: >>> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_ >>> >>> On 7/11/2020 2:40 am, Gerard Ziemski wrote: >>> >>> > On Wed, 4 Nov 2020 04:22:05 GMT, David Holmes wrote: >>> > > > Gerard Ziemski has updated 
the pull request incrementally with two additional commits since the last revision: >>> > > > - use ifdef(SIGDANGER) and ifdef(SIGTRAP) >>> > > > - revert unblock_program_error_signals change >>> > > >>> > > >>> > > src/hotspot/os/posix/signals_posix.hpp line 33: >>> > > > 31: >>> > > > 32: typedef siginfo_t siginfo_t; >>> > > > 33: typedef sigset_t sigset_t; >>> > > >>> > > >>> > > I don't see why this is needed/wanted. We can include signal.h without a problem. >>> > > I'm not even sure what these typedefs means ?? >>> > >>> > >>> > Coleen asked me to remove from the signals_posix.hpp, so those are forward declarations for the signal types we use. >>> > I thought it was a reasonable request to minimize the number of headers. I saw some efforts in the past to cleanup header files, which is supposed to help with build times, so every little bit helps. >>> >>> The only reason we would care is if signals_posix.hpp were included in >>> many other handers/files and that should not be the case. This looks >>> completely bogus to me as we need the types from signal.h in this header >>> file. >> >> Just trying to learn: why do we **need** them? We only included them in the APIs here, but we don't actually **use** them otherwise. >> >> And if we need to include `` then don't we also need to include the headers for `outputStream`, `Thread` and `OSThread`? All of these types are used to define the APIs in this header file and are used in the same capacity. >> >> >>> What do those typedefs even mean? I would expect a forward >>> declaration to be of the form: >>> >>> struct siginfo_t; >>> >>> but you don't know what type sigset_t (could be integer or struct) >>> actually is so you can't forward declare it that way. >> >> Forward declaration was a new concept to me, so I had to look it up and although it was easy enough to do it for the internal c++ classes, I struggled with the forward declaring of `struct sigset_t`, but I thought I found it. 
>> >> Should I also drop the forward declaration for `outputStream`, `Thread`, `OSThread`? > >> > > Coleen asked me to remove from the signals_posix.hpp, so those are forward declarations for the signal types we use. >> > > I thought it was a reasonable request to minimize the number of headers. I saw some efforts in the past to cleanup header files, which is supposed to help with build times, so every little bit helps. >> > >> > >> > The only reason we would care is if signals_posix.hpp were included in >> > many other handers/files and that should not be the case. This looks >> > completely bogus to me as we need the types from signal.h in this header >> > file. >> >> Just trying to learn: why do we **need** them? We only included them in the APIs here, but we don't actually **use** them otherwise. >> >> And if we need to include `` then don't we also need to include the headers for `outputStream`, `Thread` and `OSThread`? All of these types are used to define the APIs in this header file and are used in the same capacity. > > You need to include them if you: > - use the full type, so whoever compiles the header has to know the type size. E.g. if you pass a structure by value. > - use the type as pointer and do not forward declare it. > > In hotspot, it is standard practice to forward-declare structures from some hotspot utility headers - eg ostream.hpp - to avoid including them. There is no guideline of when to do this, and obviously there is a point at which it is simpler and clearer to just include the header. > > In my personal opinion forward declaration is just a bandaid to work around badly designed headers. Ideally, headers should be small and concise. But hotspot headers are balls of yarn. Pull one in you get a bunch of unrelated others. So people got used to forward declaring some classes instead, things like outputStream. I actually think its a bad practice and in the ideal world we would just include whatever we need. 
> > Which also means that the benefit of forward-declaring types from system headers is limited. Posix headers are usually well designed, and you are better off just including them. Especially since you need to be careful here: it is not clear what these opaque posix types are actually. Sometimes the standard tells you: "The header shall define the siginfo_t type as a structure.." but sometimes it leaves it open: "sigset_t ... Integer or structure type ...". > >> >> > What do those typedefs even mean? I would expect a forward >> > declaration to be of the form: >> > struct siginfo_t; >> > but you don't know what type sigset_t (could be integer or struct) >> > actually is so you can't forward declare it that way. >> >> Forward declaration was a new concept to me, so I had to look it up and although it was easy enough to do it for the internal c++ classes, I struggled with the forward declaring of `struct sigset_t`, but I thought I found it. >> >> Should I also drop the forward declaration for `outputStream`, `Thread`, `OSThread`? > > For hotspot classes, I would leave the forward declarations in and the headers out. Current standard practice. System headers OTOH I would either include here or, somewhat better, in globalDefinitions_gcc.hpp. > > > Forward declaration was a new concept to me, so I had to look it up and although it was easy enough to do it for the internal c++ classes, I struggled with the forward declaring of `struct sigset_t`, but I thought I found it. > > > Should I also drop the forward declaration for `outputStream`, `Thread`, `OSThread`? > > > > > > For hotspot classes, I would leave the forward declarations in and the headers out. Current standard practice. System headers OTOH I would either include here or, somewhat better, in globalDefinitions_gcc.hpp. > > Ha! It is already in globalDefinitions_gcc.hpp so neither the direct > include nor the forward declarations are actually needed. Many thanks Thomas & David for the lesson on the header files! 
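The include-versus-forward-declare rule discussed above can be sketched in a few lines of C++. This is only an illustration, not code from signals_posix.hpp: `Logger`, `report` and `is_blocked` are hypothetical names, and the "header" and "source" parts are kept in one file so the sketch compiles on its own. It assumes a POSIX system where `<signal.h>` provides `sigset_t` and `sigismember`.

```cpp
#include <signal.h>  // sigset_t may be an integer or a struct, so it cannot be
                     // portably forward-declared; include the system header.
#include <cstdio>

// --- what would go into the header (hypothetical names) ---

class Logger;  // forward declaration: enough for pointers and references

// Only a pointer to the incomplete type is used, so no Logger header is needed.
void report(Logger* log, int sig);

// Uses the opaque POSIX type; relies on the real typedef from <signal.h>.
bool is_blocked(const sigset_t* set, int sig);

// --- what would go into the source file ---

class Logger {
public:
  int lines = 0;
  void print(const char* msg, int value) {
    std::printf("%s %d\n", msg, value);
    ++lines;
  }
};

// The full definition of Logger is visible here, so members can be used.
void report(Logger* log, int sig) { log->print("signal", sig); }

// sigismember returns 1 if the signal is in the set, 0 if not, -1 on error.
bool is_blocked(const sigset_t* set, int sig) {
  return sigismember(set, sig) == 1;
}
```

A declaration such as `void report(Logger log, int sig);` followed by a call, or a `Logger` data member, would need the complete type and therefore the real header. That is the "use the full type" case from the list above.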
If I understand it correctly, then we need to add `#include "utilities/globalDefinitions.hpp"` to signals_posix.hpp, correct? ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From rkennke at openjdk.java.net Tue Nov 10 17:08:10 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 10 Nov 2020 17:08:10 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v7] In-Reply-To: References: Message-ID: > In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. > > I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: > - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions > - We can have strong and phantom native referents, and strong and weak in-heap referents > - Native referents are never compressed > > Note that this depends on PR#1140. 
> > Testing: > - [x] hotspot_gc_shenandoah (x86_64, x64_32, aarch64) > - [x] tier1 +UseShenandoahGC +ShenandoahVerify > - [ ] tier2 +UseShenandoahGC +ShenandoahVerify Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Don't make phantom-access narrow (mistake when doing 32bit parts ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1109/files - new: https://git.openjdk.java.net/jdk/pull/1109/files/37c97786..92a92fcd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1109/head:pull/1109 PR: https://git.openjdk.java.net/jdk/pull/1109 From stuefe at openjdk.java.net Tue Nov 10 17:17:03 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 10 Nov 2020 17:17:03 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v5] In-Reply-To: <182nfzh-a6eiBfaCmrFfv-ZDxWbyRgUNBzzYR0huHO0=.b6029a06-1aaf-4ad9-aab1-4d83800daab4@github.com> References: <-uNIXq8h8NDGpzaNFVp4loICt2jh0EGev5EneflHsX4=.149b59bc-4ffc-4056-9ac1-82d3f4710ab6@github.com> <5GGhy0pxA33snyHk4-oAP608V87gagZwctV9FlwBTR4=.ea121adc-64bd-47a5-831a-eff9bb984237@github.com> <182nfzh-a6eiBfaCmrFfv-ZDxWbyRgUNBzzYR0huHO0=.b6029a06-1aaf-4ad9-aab1-4d83800daab4@github.com> Message-ID: <8oWwl0HlvsDFV2WW8NO__SuUnMqIzASreFg0bhZbNbo=.6e0bec33-9dd9-4728-9881-b196fafbbad6@github.com> On Tue, 10 Nov 2020 17:01:22 GMT, Gerard Ziemski wrote: > > > > Forward declaration was a new concept to me, so I had to look it up and although it was easy enough to do it for the internal c++ classes, I struggled with the forward declaring of `struct sigset_t`, but I thought I found it. > > > > Should I also drop the forward declaration for `outputStream`, `Thread`, `OSThread`? 
> > > > > > > > > For hotspot classes, I would leave the forward declarations in and the headers out. Current standard practice. System headers OTOH I would either include here or, somewhat better, in globalDefinitions_gcc.hpp. > > > > > > Ha! It is already in globalDefinitions_gcc.hpp so neither the direct > > include nor the forward declarations are actually needed. > > Many thanks Thomas & David for the lesson on the header files! > > If I understand it correctly, then we need to add `#include "utilities/globalDefinitions.hpp"` to signals_posix.hpp, correct? Yes. ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From shade at openjdk.java.net Tue Nov 10 17:25:59 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 10 Nov 2020 17:25:59 GMT Subject: RFR: 8255822: Zero: improve build-time JVMTI handling [v3] In-Reply-To: <83OnOrWG5dCsK_f6OUm7517vxV4uSg76a-PhTQZJjlM=.687f44d9-d4c2-440a-86dd-efa5da75800b@github.com> References: <7-Y8_X8c04i5dRiPOWnSV-Wqige6jXxb1WPW1pIO7CE=.f66482a9-f624-4897-a68f-1e131a6abf8f@github.com> <83OnOrWG5dCsK_f6OUm7517vxV4uSg76a-PhTQZJjlM=.687f44d9-d4c2-440a-86dd-efa5da75800b@github.com> Message-ID: On Tue, 10 Nov 2020 12:33:34 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Simplify a few JVMTI_ENABLED blocks >> - Merge branch 'master' into JDK-8255822-zero-jvmti-rework >> - Fix build error >> - Merge branch 'master' into JDK-8255822-zero-jvmti-rework >> - Revert one dubious change >> - 8255822: Zero: improve build-time JVMTI handling >> Summary: use C++ templates instead of XSLT transforms > > Still good. Thanks! 
-------------

PR: https://git.openjdk.java.net/jdk/pull/1061

From shade at openjdk.java.net Tue Nov 10 17:26:00 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 10 Nov 2020 17:26:00 GMT
Subject: Integrated: 8255822: Zero: improve build-time JVMTI handling
In-Reply-To: <7-Y8_X8c04i5dRiPOWnSV-Wqige6jXxb1WPW1pIO7CE=.f66482a9-f624-4897-a68f-1e131a6abf8f@github.com>
References: <7-Y8_X8c04i5dRiPOWnSV-Wqige6jXxb1WPW1pIO7CE=.f66482a9-f624-4897-a68f-1e131a6abf8f@github.com>
Message-ID:

On Wed, 4 Nov 2020 16:29:27 GMT, Aleksey Shipilev wrote:

> The current Zero interpreter has an optimization for JVMTI support. It recognizes that JVMTI is disabled most of the time, and that JVMTI checks in the interpreter code slow it down considerably. (I measured it myself when working on this patch: removing this optimization yields about a 20% hit in build times).
>
> The current optimization works as follows. At build time, an XSLT transform is performed on `bytecodeInterpreter.cpp`, yielding `bytecodeInterpreterWithChecks.cpp`. In that new compilation unit, the `VM_JVMTI` macro is defined, and a new entry point -- `BytecodeInterpreter::withChecks` -- is defined. Then, both compilation units are compiled. In one of them, `JVMTI` hooks are stripped out. In the other, they persist. Then, callers have to choose which entry point to use.
>
> I believe this can be rewritten to use C++ templates instead of the XSLT and defines dance. This also allows cleaning up the JVMTI checks a bit.
>
> Additional testing:
> - [x] Linux x86_64 Zero fastdebug build with `-jvmti`
> - [x] Linux x86_64 Zero fastdebug/release build times are not regressing

This pull request has now been integrated.
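As a rough illustration of the idea described above (one source, two compiled variants selected by a template parameter rather than by a generated second compilation unit), here is a minimal C++ sketch. It is not the code from the actual change: `run`, `interpret`, `post_event` and `g_events` are hypothetical names, and the "bytecodes" are plain integers.

```cpp
#include <vector>

// Hypothetical stand-in for a JVMTI-style hook; it just counts events.
static int g_events = 0;
static void post_event(int /*opcode*/) { ++g_events; }

// One source for both variants. The compiler instantiates run<true> and
// run<false> separately; in run<false> the condition is a compile-time
// constant, so the hook call can be dropped from the generated loop.
template <bool WithChecks>
int run(const std::vector<int>& code) {
  int acc = 0;
  for (int op : code) {
    if (WithChecks) {
      post_event(op);
    }
    acc += op;  // stand-in for executing the bytecode
  }
  return acc;
}

// Callers pick the entry point once, based on a runtime flag.
int interpret(const std::vector<int>& code, bool checks_enabled) {
  return checks_enabled ? run<true>(code) : run<false>(code);
}
```

The caller still chooses between two entry points, as in the XSLT scheme, but both come from the same translation unit and no build-time source transformation is involved.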
Changeset: 643969a1 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/643969a1 Stats: 188 lines in 6 files changed: 5 ins; 128 del; 55 mod 8255822: Zero: improve build-time JVMTI handling Reviewed-by: dholmes, ihse ------------- PR: https://git.openjdk.java.net/jdk/pull/1061 From kirk at kodewerk.com Tue Nov 10 17:38:47 2020 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Tue, 10 Nov 2020 09:38:47 -0800 Subject: Biased locking Obsoletion In-Reply-To: References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <26c5c51e-9bbe-455c-7bf0-1f7cbcd126f2@oracle.com> <9b9414e1-c7b7-59be-8e6d-d7d5d44527d1@oracle.com> Message-ID: <7C170D34-56B3-473F-95B8-C967E6EC945B@kodewerk.com> > On Nov 9, 2020, at 3:30 AM, Volker Simonis wrote: > > The following is just my personal opinion based on my feeling - it's > not backed by any data. > > The arguments for removing BiasedLocking go like this: > - only single-threaded legacy code using legacy APIs is benefiting > from BiasedLocking > - badly written code (which can easily be fixed) is benefiting from > BiasedLocking > > BiasedLocking was disabled in JDK15. That's not a release which is > widely used in production. Not many enterprise workloads have even > migrated to JDK 11. We know that migration to JDK 11 is hard and > migration to the next LTS version 17 will be even harder. Big > applications tend to have a lot of dependencies and code which can't > be easily upgraded or rewritten (even if it's badly written or uses > "old" APIs). In order to not introduce yet another upgrade problem I > think it would make sense to keep BiasedLocking alive in the next LTS > release 17. Removing it in 18 would be fine. +1 Kind regards, Kirk > > Best regards, > Volker > > > On Mon, Nov 9, 2020 at 11:19 AM Alan Bateman wrote: >> >> On 09/11/2020 09:44, Andrew Haley wrote: >>> : >>> JDK-8254078 is a simple example, and a very modest proposal for change, >>> and it's still stuck in CSR. 
>> >> The CSR was approved and closed on Oct 24 so you should be good to go. >> >> -Alan From dcubed at openjdk.java.net Tue Nov 10 17:39:25 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 17:39:25 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5] In-Reply-To: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: > Changes from @fisk and @dcubed-ojdk to: > > - simplify ObjectMonitor list management > - get rid of Type-Stable Memory (TSM) > > This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. > Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, > SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) > - a few minor regressions (<= -0.24%) > - Volano is 6.8% better > > Eric C. has also done promotion perf runs on these bits and says "the results look fine". Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: coleenp CR - refactor common ThreadBlockInVM code block into ObjectSynchronizer::chk_for_block_req(). 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/642/files - new: https://git.openjdk.java.net/jdk/pull/642/files/aec90b9a..15ad3526 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=642&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=642&range=03-04 Stats: 80 lines in 2 files changed: 30 ins; 37 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/642.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/642/head:pull/642 PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Tue Nov 10 17:45:58 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 17:45:58 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> Message-ID: On Tue, 10 Nov 2020 14:53:34 GMT, Daniel D. Daugherty wrote: >> Looks very good! Thanks for picking this up and taking it all the way! > > @fisk - Thanks for the sanity check review. And thanks for prototyping > this work and showing that this crazy idea could work! :-) @coleenp - I refactored common ThreadBlockInVM code block into ObjectSynchronizer::chk_for_block_req(). Please let me know if you're okay with it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Tue Nov 10 18:14:57 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 18:14:57 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <0fYueANbbMJQZaXMqos-O_pCe-OEf5fgMHWEnOqJr3M=.8884e351-76f5-45d9-b21a-c6160396f116@github.com> Message-ID: <3kTvc5Z5p0PiaJgA0Aj7NoaSVWX2vc1ii8g_Y3NBdf8=.72a0228f-73b1-4a89-9b2b-313f017e43e5@github.com> On Tue, 10 Nov 2020 15:02:31 GMT, Daniel D. Daugherty wrote: >> I'd rather see ThreadBlockInVM as the convention of allowing the thread to block if a safepoint is requested. The calls like process_if_requested are becoming alphabet soup and keep changing, so having TBIVM is better in my opinion. >> That said, this is a strange usage. This code appears three times. It should be a function like allow_safepoint_block(LogStream* ls, timer), with some comment above. Then it's clear that it's checking for a safepoint in a loop that could take a long time and the logging is ancillary. > > I previously looked at refactoring the three locations where > `ThreadBlockInVM` is used. The problem with the refactoring > is that the log messages and the parameters have some > differences and some commonalities. Each of these logging > sites is trying to communicate some local context that is > unique to that call site along with some global context that > might have changed from call site to call site. > > I'll take another look at refactoring shortly and will let you > know what I come up with. I refactored common ThreadBlockInVM code block into ObjectSynchronizer::chk_for_block_req(). Please let me know if you're okay with it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From joe.darcy at oracle.com Tue Nov 10 19:04:07 2020 From: joe.darcy at oracle.com (Joe Darcy) Date: Tue, 10 Nov 2020 11:04:07 -0800 Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v3] In-Reply-To: References: <_NtWpo5mp-I1zmPtn883ezvoO1jA7WdJflfoFXcpL70=.96a06d93-47a4-48dd-88fc-f3c777315291@github.com> Message-ID: On 11/9/2020 2:12 PM, Xubo Zhang wrote: > On Mon, 2 Nov 2020 17:42:27 GMT, Joe Darcy wrote: > >>> Xubo Zhang has updated the pull request incrementally with two additional commits since the last revision: >>> >>> - Merge branch 'JDK-8255368' of github.com:xbzhang99/bugfixes into JDK-8255368 >>> - Added test cases for exp at the value of 1024 and 10000 >> The regression tests for exp should be explicitly updated to cover the previously erroneous input if they do not do so already. > Hi Darcy, > Where should the test be? A new test file? > Given the absence of an existing dedicated exp test, yes, a new file in the test/jdk/java/lang/Math directory. I suggest looking at Atan2Tests.java as a model for another test that just probes a few values. HTH, -Joe From fparain at openjdk.java.net Tue Nov 10 19:19:59 2020 From: fparain at openjdk.java.net (Frederic Parain) Date: Tue, 10 Nov 2020 19:19:59 GMT Subject: RFR: 8256052: Remove unused allocation type from fieldInfo [v4] In-Reply-To: <-Wc3FeXsLIx0x0U3gt0oSW6qiTGSRuMKVr4qJ3aTTXE=.7ed06bf5-f795-44ca-99ca-848d199a2733@github.com> References: <-Wc3FeXsLIx0x0U3gt0oSW6qiTGSRuMKVr4qJ3aTTXE=.7ed06bf5-f795-44ca-99ca-848d199a2733@github.com> Message-ID: On Mon, 9 Nov 2020 20:23:55 GMT, Harold Seigel wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment > > Looks good! Claes, Lois, Harold, Thank you for your reviews and feedback. 
Fred ------------- PR: https://git.openjdk.java.net/jdk/pull/1130 From fparain at openjdk.java.net Tue Nov 10 19:20:01 2020 From: fparain at openjdk.java.net (Frederic Parain) Date: Tue, 10 Nov 2020 19:20:01 GMT Subject: Integrated: 8256052: Remove unused allocation type from fieldInfo In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 15:43:09 GMT, Frederic Parain wrote: > Please review this small cleanup code, removing the now unused allocation type from the fieldInfo structure. > > Tested with Mach5, tiers 1 to 3 and locally by running test/hotspot/jtreg/serviceability/sa tests. > > Thank you, > > Fred This pull request has now been integrated. Changeset: bd3e65b5 Author: Frederic Parain URL: https://git.openjdk.java.net/jdk/commit/bd3e65b5 Stats: 126 lines in 5 files changed: 5 ins; 99 del; 22 mod 8256052: Remove unused allocation type from fieldInfo Reviewed-by: redestad, lfoltan, hseigel ------------- PR: https://git.openjdk.java.net/jdk/pull/1130 From sspitsyn at openjdk.java.net Tue Nov 10 20:11:58 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 10 Nov 2020 20:11:58 GMT Subject: RFR: 8138588: VerifyMergedCPBytecodes option cleanup needed In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 23:46:04 GMT, Coleen Phillimore wrote: > This option has been removed in favor of always verifying the bytecodes in debug mode. Tested with tier1-3. Hi Coleen, This looks good to me. Thank you for taking care about this flag! Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1137 From sspitsyn at openjdk.java.net Tue Nov 10 20:17:08 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 10 Nov 2020 20:17:08 GMT Subject: RFR: 8253525: Implement getInstanceSize/sizeOf intrinsics [v6] In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 12:04:21 GMT, Aleksey Shipilev wrote: >> This is fork off the SizeOf JEP, JDK-8249196. 
There is already an entry point in the JDK that can use the intrinsic like this: `Instrumentation.getInstanceSize`. Therefore, we can implement the C1/C2 intrinsic now, hook it up to `Instrumentation`, and let the tools use that fast path today.
>>
>> With this patch, JOL is able to get close to the `deepSizeOf` implementation from the SizeOf JEP.
>>
>> Example performance improvements for sizing up a custom linked list:
>>
>> Benchmark                     (size)  Mode  Cnt       Score      Error  Units
>>
>> # Default
>> LinkedChainBench.linkedChain       1  avgt    5     705.835 ±    8.051  ns/op
>> LinkedChainBench.linkedChain      10  avgt    5    3148.874 ±   37.856  ns/op
>> LinkedChainBench.linkedChain     100  avgt    5   28693.256 ±  142.254  ns/op
>> LinkedChainBench.linkedChain    1000  avgt    5  290161.590 ± 4594.631  ns/op
>>
>> # Instrumentation attached, no intrinsics
>> LinkedChainBench.linkedChain       1  avgt    5     159.659 ±   19.238  ns/op
>> LinkedChainBench.linkedChain      10  avgt    5     717.659 ±   22.540  ns/op
>> LinkedChainBench.linkedChain     100  avgt    5    7739.394 ±  111.683  ns/op
>> LinkedChainBench.linkedChain    1000  avgt    5   80724.238 ± 2887.794  ns/op
>>
>> # Instrumentation attached, new intrinsics
>> LinkedChainBench.linkedChain       1  avgt    5      95.254 ±    0.808  ns/op
>> LinkedChainBench.linkedChain      10  avgt    5     261.564 ±    8.524  ns/op
>> LinkedChainBench.linkedChain     100  avgt    5    3367.192 ±   21.128  ns/op
>> LinkedChainBench.linkedChain    1000  avgt    5   34148.851 ±  373.080  ns/op
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision:
>
> - Trim down the iteration count, and use -Xbatch to wait for compilation
> - Use proper constant names in the test
> - Merge branch 'master' into JDK-8253525-sizeof-intrinsics
> - The new intrinsic-related test
> - Revert the change to test
> - Merge branch 'master' into JDK-8253525-sizeof-intrinsics
> - Add new intrinsics to toBeInvestigated list in CheckGraalIntrinsics.java
> - 8253525: Implement getInstanceSize/sizeOf intrinsics

Aleksey,

Thank you for the update! It looks good to me.

One more nit, which I forgot to list in my previous comment, is the unneeded '()' around the comparisons:

`+ static final int REF_SIZE = ((compressedOops == null) || (compressedOops == true)) ? 4 : 8;`

Thanks,
Serguei

-------------

Marked as reviewed by sspitsyn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/650

From rehn at openjdk.java.net Tue Nov 10 20:37:01 2020
From: rehn at openjdk.java.net (Robbin Ehn)
Date: Tue, 10 Nov 2020 20:37:01 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com>
Message-ID:

On Tue, 10 Nov 2020 17:39:25 GMT, Daniel D. Daugherty wrote:

>> Changes from @fisk and @dcubed-ojdk to:
>>
>> - simplify ObjectMonitor list management
>> - get rid of Type-Stable Memory (TSM)
>>
>> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions.
>> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint,
>> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano)
>> - a few minor regressions (<= -0.24%)
>> - Volano is 6.8% better
>>
>> Eric C. has also done promotion perf runs on these bits and says "the results look fine".
>
> Daniel D.
Daugherty has updated the pull request incrementally with one additional commit since the last revision:
>
> coleenp CR - refactor common ThreadBlockInVM code block into ObjectSynchronizer::chk_for_block_req().

Hi, thanks for fixing.

I had some comments, nothing major, so approving.

src/hotspot/share/runtime/monitorDeflationThread.cpp line 92:

> 90: // We wait for GuaranteedSafepointInterval so that
> 91: // is_async_deflation_needed() is checked at the same interval.
> 92: ml.wait(GuaranteedSafepointInterval);

I don't like that we add a new global monitor just for the whitebox API.
Without WB poking this could just be a plain sleep.

If we must have this new global monitor, there seems to be no reason for this to be no safepoint?
So ThreadBlockInVM would not be needed if we did safepoint checks instead?

I would either skip the WB notification and run the test with a very low GuaranteedSafepointInterval and just set the flag and wait a second.
Or, if this is really important, use a local semaphore and skip the boolean; each increase on the sema would result in a deflation pass.

src/hotspot/share/runtime/synchronizer.cpp line 1419:

> 1417: ", count=" SIZE_FORMAT ", max=" SIZE_FORMAT, op_name,
> 1418: in_use_list_ceiling(), _in_use_list.count(), _in_use_list.max());
> 1419: timer_p->start();

ThreadBlockInVM has a misfeature: it safepoint polls on both the front-edge and the back-edge.
It should only have that poll on the back-edge; once that is fixed, this will do the wrong thing.
Also you may safepoint on both front-edge and back-edge now, so the timer would show the wrong thing then.

So to get the timer correct you should use:
SafepointMechanism::process_if_requested(thread);

src/hotspot/share/runtime/synchronizer.cpp line 1532:

> 1530: // A JavaThread must check for a safepoint/handshake and honor it.
> 1531: chk_for_block_req(self->as_Java_thread(), "deletion", "deleted_count", > 1532: deleted_count, ls, &timer); If you release oopStorage when deflating you can do this entire loop while blocked instead, which will be faster. src/hotspot/share/runtime/objectMonitor.hpp line 148: > 146: DEFINE_PAD_MINUS_SIZE(0, OM_CACHE_LINE_SIZE, sizeof(volatile markWord) + > 147: sizeof(WeakHandle)); > 148: // Used by async deflation as a marker in the _owner field: I have test with and without padding, I saw no difference. ------------- Marked as reviewed by rehn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/642 From coleenp at openjdk.java.net Tue Nov 10 20:41:58 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 10 Nov 2020 20:41:58 GMT Subject: Integrated: 8138588: VerifyMergedCPBytecodes option cleanup needed In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 23:46:04 GMT, Coleen Phillimore wrote: > This option has been removed in favor of always verifying the bytecodes in debug mode. Tested with tier1-3. This pull request has now been integrated. Changeset: 7d4e86be Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/7d4e86be Stats: 9 lines in 4 files changed: 3 ins; 5 del; 1 mod 8138588: VerifyMergedCPBytecodes option cleanup needed Reviewed-by: hseigel, dcubed, sspitsyn ------------- PR: https://git.openjdk.java.net/jdk/pull/1137 From coleenp at openjdk.java.net Tue Nov 10 20:41:56 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 10 Nov 2020 20:41:56 GMT Subject: RFR: 8138588: VerifyMergedCPBytecodes option cleanup needed In-Reply-To: References: Message-ID: <5acS29YfcnHztBpSs1ft6gDgM18W_4kgcJDYsipR-68=.e781cccf-d7f9-48cd-8335-276113122492@github.com> On Tue, 10 Nov 2020 20:09:06 GMT, Serguei Spitsyn wrote: >> This option has been removed in favor of always verifying the bytecodes in debug mode. Tested with tier1-3. > > Hi Coleen, > > This looks good to me. 
> Thank you for taking care about this flag! > > Thanks, > Serguei Thanks Dan, Harold and Serguei. ------------- PR: https://git.openjdk.java.net/jdk/pull/1137 From dcubed at openjdk.java.net Tue Nov 10 20:48:00 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 20:48:00 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: <05PVdNFn9-u7KHSDPXNSr62kvfFQHKCoyT6Z1TIAC2s=.40e0c36c-e7be-48e4-8352-dbc5ebae25b0@github.com> On Tue, 10 Nov 2020 19:41:23 GMT, Robbin Ehn wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> coleenp CR - refactor common ThreadBlockInVM code block into ObjectSynchronizer::chk_for_block_req(). > > src/hotspot/share/runtime/monitorDeflationThread.cpp line 92: > >> 90: // We wait for GuaranteedSafepointInterval so that >> 91: // is_async_deflation_needed() is checked at the same interval. >> 92: ml.wait(GuaranteedSafepointInterval); > > I don't like that we add a new global monitor just for the whitebox API. > Without WB poking this could just a plain sleep. > > If we must have this new global monitor there seems be no reason for this to be no safepoint ? > So ThreadBlockInVM would not be needed if we did safepoint checks instead? > > I would either skip WB notification and run the test with a very low GuaranteedSafepointInterval and just set flag and wait a second. > Or if this is really important use a local semaphore and skip the boolean, each increase on sema would result in a deflation pass. We may still decide to do this fix (even though the _object field is now weak): JDK-8249638 Re-instate idle monitor deflation as part of System.gc() https://bugs.openjdk.java.net/browse/JDK-8249638 and if we do, then we'll still need the request mechanism. 
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From rehn at openjdk.java.net Tue Nov 10 21:00:00 2020 From: rehn at openjdk.java.net (Robbin Ehn) Date: Tue, 10 Nov 2020 21:00:00 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5] In-Reply-To: <05PVdNFn9-u7KHSDPXNSr62kvfFQHKCoyT6Z1TIAC2s=.40e0c36c-e7be-48e4-8352-dbc5ebae25b0@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <05PVdNFn9-u7KHSDPXNSr62kvfFQHKCoyT6Z1TIAC2s=.40e0c36c-e7be-48e4-8352-dbc5ebae25b0@github.com> Message-ID: On Tue, 10 Nov 2020 20:45:18 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/runtime/monitorDeflationThread.cpp line 92: >> >>> 90: // We wait for GuaranteedSafepointInterval so that >>> 91: // is_async_deflation_needed() is checked at the same interval. >>> 92: ml.wait(GuaranteedSafepointInterval); >> >> I don't like that we add a new global monitor just for the whitebox API. >> Without WB poking this could just a plain sleep. >> >> If we must have this new global monitor there seems be no reason for this to be no safepoint ? >> So ThreadBlockInVM would not be needed if we did safepoint checks instead? >> >> I would either skip WB notification and run the test with a very low GuaranteedSafepointInterval and just set flag and wait a second. >> Or if this is really important use a local semaphore and skip the boolean, each increase on sema would result in a deflation pass. > > We may still decide to do this fix (even though the _object field is now weak): > > JDK-8249638 Re-instate idle monitor deflation as part of System.gc() > https://bugs.openjdk.java.net/browse/JDK-8249638 > > and if we do, then we'll still need the request mechanism. So why not use a local semaphore and wait with safepoint check instead? 
>> src/hotspot/share/runtime/synchronizer.cpp line 1419: >> >>> 1417: ", count=" SIZE_FORMAT ", max=" SIZE_FORMAT, op_name, >>> 1418: in_use_list_ceiling(), _in_use_list.count(), _in_use_list.max()); >>> 1419: timer_p->start(); >> >> ThreadBlockInVM have a miss-feature: it safepoint polls on front-edge and back-edge. >> It should only have that poll on backedge, once that is fixed, this will do the wrong thing. >> Also you may safepoint on both front-edge and back-edge now, so the timer would show the wrong thing then. >> >> So to get the timer correct you should use: >> SafepointMechanism::process_if_requested(thread); > > The baseline code (ObjectSynchronizer::deflate_common_idle_monitors()) > uses ThreadBlockInVM currently. I don't want to change that as part of > this work. If we want to generally change uses of ThreadBlockInVM to > something else, then we should do that with a dedicated bug. Currently there is no issue with ThreadBlockInVM since there is no code inside those scopes. This adds code there which assumes the timer will be 'resumed', and logs "resume" when it actually could be going to a safepoint. >> src/hotspot/share/runtime/synchronizer.cpp line 1532: >> >>> 1530: // A JavaThread must check for a safepoint/handshake and honor it. >>> 1531: chk_for_block_req(self->as_Java_thread(), "deletion", "deleted_count", >>> 1532: deleted_count, ls, &timer); >> >> If you release oopStorage when deflating you can do this entire loop while blocked instead, which will be faster. >> >> (From what I remember you can actually release during a safepoint, but that is not future-prof as I understood it then. So I skipped going into that rabbit hole this time also.) > > @fisk said we should not release the oopStorage during a safepoint > because that's not safe or will not be safe. I can't remember which. Yes that's why I said you can release it during deflation instead. 
(not saying you should do this in this feature change-set)

>> src/hotspot/share/runtime/objectMonitor.hpp line 148:
>>
>>> 146: DEFINE_PAD_MINUS_SIZE(0, OM_CACHE_LINE_SIZE, sizeof(volatile markWord) +
>>> 147: sizeof(WeakHandle));
>>> 148: // Used by async deflation as a marker in the _owner field:
>>
>> I have tested with and without padding, I saw no difference.
>
> We've removed enough padding with this work already. If we
> want to do more padding removal, then we need to use a
> different RFE.

Sure, this was more of an FYI.

-------------

PR: https://git.openjdk.java.net/jdk/pull/642

From dcubed at openjdk.java.net Tue Nov 10 20:59:58 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Tue, 10 Nov 2020 20:59:58 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com>
Message-ID:

On Tue, 10 Nov 2020 20:34:12 GMT, Robbin Ehn wrote:

>> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision:
>>
>> coleenp CR - refactor common ThreadBlockInVM code block into ObjectSynchronizer::chk_for_block_req().
>
> Hi, thanks for fixing.
>
> I had some comments, nothing major, so approving.

@robehn - Thanks for the review!! And thanks for approving.

> src/hotspot/share/runtime/synchronizer.cpp line 1419:
>
>> 1417: ", count=" SIZE_FORMAT ", max=" SIZE_FORMAT, op_name,
>> 1418: in_use_list_ceiling(), _in_use_list.count(), _in_use_list.max());
>> 1419: timer_p->start();
>
> ThreadBlockInVM has a misfeature: it safepoint polls on front-edge and back-edge.
> It should only have that poll on back-edge, once that is fixed, this will do the wrong thing.
> Also you may safepoint on both front-edge and back-edge now, so the timer would show the wrong thing then.
> > So to get the timer correct you should use: > SafepointMechanism::process_if_requested(thread); The baseline code (ObjectSynchronizer::deflate_common_idle_monitors()) uses ThreadBlockInVM currently. I don't want to change that as part of this work. If we want to generally change uses of ThreadBlockInVM to something else, then we should do that with a dedicated bug. > src/hotspot/share/runtime/synchronizer.cpp line 1532: > >> 1530: // A JavaThread must check for a safepoint/handshake and honor it. >> 1531: chk_for_block_req(self->as_Java_thread(), "deletion", "deleted_count", >> 1532: deleted_count, ls, &timer); > > If you release oopStorage when deflating you can do this entire loop while blocked instead, which will be faster. > > (From what I remember you can actually release during a safepoint, but that is not future-prof as I understood it then. So I skipped going into that rabbit hole this time also.) @fisk said we should not release the oopStorage during a safepoint because that's not safe or will not be safe. I can't remember which. > src/hotspot/share/runtime/objectMonitor.hpp line 148: > >> 146: DEFINE_PAD_MINUS_SIZE(0, OM_CACHE_LINE_SIZE, sizeof(volatile markWord) + >> 147: sizeof(WeakHandle)); >> 148: // Used by async deflation as a marker in the _owner field: > > I have test with and without padding, I saw no difference. We've removed enough padding with this work already. If we want to do more padding removal, then we need to use a different RFE. 
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From coleenp at openjdk.java.net Tue Nov 10 21:07:03 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 10 Nov 2020 21:07:03 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> Message-ID: On Tue, 10 Nov 2020 02:35:17 GMT, Daniel D. Daugherty wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: > > dholmes-ora - convert inner while loop to do-while loop in unlink_deflated(). Changes requested by coleenp (Reviewer). src/hotspot/share/oops/markWord.cpp line 63: > 61: fatal("bad header=" INTPTR_FORMAT, value()); > 62: } > 63: This makes it so much clearer where the displaced markWord is. src/hotspot/share/runtime/globals.hpp line 750: > 748: product(intx, MonitorUsedDeflationThreshold, 90, EXPERIMENTAL, \ > 749: "Percentage of used monitors before triggering deflation (0 is " \ > 750: "off). The check is performed on GuaranteedSafepointInterval " \ Should there still be experimental options after this change? 
src/hotspot/share/runtime/monitorDeflationThread.cpp line 85: > 83: // visible to external suspension. > 84: > 85: ThreadBlockInVM tbivm(jt); Does this have to be a JavaThread? Could it be a non-java thread since deflating monitors doesn't have to call any Java code? You'd have to lock down the Monitor list maybe, but couldn't this be a NamedThread? This isn't a request to change it right now. src/hotspot/share/runtime/synchronizer.cpp line 1641: > 1639: > 1640: // Do the final audit and print of ObjectMonitor stats; must be done > 1641: // by the VMThread (at VM exit time). Can you take (at VM exit time) out of parenthesis? it made me wonder when else is this called. src/hotspot/share/runtime/objectMonitor.cpp line 509: > 507: // > 508: bool ObjectMonitor::deflate_monitor() { > 509: if (is_busy()) { is_busy should be checked != 0 since it doesn't return a bool. src/hotspot/share/runtime/objectMonitor.cpp line 540: > 538: if (try_set_owner_from(NULL, DEFLATER_MARKER) != NULL) { > 539: // The owner field is no longer NULL so we lost the race since the > 540: // ObjectMonitor is now busy. So here would contentions be > 0? Can it be asserted? Doesn't need to be, the comment really helps to understand why the cas failed. src/hotspot/share/runtime/objectMonitor.cpp line 551: > 549: if (try_set_owner_from(DEFLATER_MARKER, NULL) != DEFLATER_MARKER) { > 550: // Deferred decrement for the JT EnterI() that cancelled the async deflation. > 551: add_to_contentions(-1); contentions is essentially a refcount, isn't it. Can you fix the comment to include this at line 360 since that's not the only purpose of this count. // Keep track of contention for JVM/TI and M&M queries. add_to_contentions(1); src/hotspot/share/runtime/synchronizer.hpp line 61: > 59: bool has_next() const { return _current != NULL; } > 60: ObjectMonitor* next(); > 61: }; Can MonitorList be defined in the .cpp file? I don't see anything outside of synchronizer.cpp that refers to it. 
src/hotspot/share/runtime/synchronizer.hpp line 28: > 26: #define SHARE_RUNTIME_SYNCHRONIZER_HPP > 27: > 28: #include "logging/logStream.hpp" If you need to put MonitorList in the header file, use a forward declaration for LogStream instead of #including logstream.hpp. src/hotspot/share/runtime/objectMonitor.hpp line 171: > 169: volatile int _SpinDuration; > 170: > 171: jint _contentions; // Number of active contentions in enter(). It is used by is_busy() Future RFE - can we replace jint with int32_t or even int or some C++ types. We're trying not to have Java types leak into runtime code since this doesn't directly interface with Java. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Tue Nov 10 21:07:03 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 21:07:03 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <05PVdNFn9-u7KHSDPXNSr62kvfFQHKCoyT6Z1TIAC2s=.40e0c36c-e7be-48e4-8352-dbc5ebae25b0@github.com> Message-ID: On Tue, 10 Nov 2020 20:52:15 GMT, Robbin Ehn wrote: >> We may still decide to do this fix (even though the _object field is now weak): >> >> JDK-8249638 Re-instate idle monitor deflation as part of System.gc() >> https://bugs.openjdk.java.net/browse/JDK-8249638 >> >> and if we do, then we'll still need the request mechanism. > > So why not use a local semaphore and wait with safepoint check instead? Sorry my preference is for Monitors instead of semaphores. Let's take that discussion off this PR and you can explain why you dislike the Monitor so much and think the local semaphore is the way to go. >> The baseline code (ObjectSynchronizer::deflate_common_idle_monitors()) >> uses ThreadBlockInVM currently. I don't want to change that as part of >> this work. 
>> If we want to generally change uses of ThreadBlockInVM to
>> something else, then we should do that with a dedicated bug.
>
> Currently there is no issue with ThreadBlockInVM since there is no code inside those scopes.
> This adds code there which assumes the timer will be 'resumed', and logs "resume" when it actually could be going to a safepoint.

So if I narrow the scope around the ThreadBlockInVM, then it would be fine?

    {
      // Honor block request.
      ThreadBlockInVM tbivm(self);
    }

I can make that change before I integrate...

-------------

PR: https://git.openjdk.java.net/jdk/pull/642

From dcubed at openjdk.java.net Tue Nov 10 21:12:02 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Tue, 10 Nov 2020 21:12:02 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <05PVdNFn9-u7KHSDPXNSr62kvfFQHKCoyT6Z1TIAC2s=.40e0c36c-e7be-48e4-8352-dbc5ebae25b0@github.com>
Message-ID: <_fjxyjNA-4UWOBnqEj6DVaS4BZEVbTUazpMJRolWcxg=.00e91263-7f75-4fb3-9350-36580395b1a6@github.com>

On Tue, 10 Nov 2020 20:55:17 GMT, Robbin Ehn wrote:

>> @fisk said we should not release the oopStorage during a safepoint
>> because that's not safe or will not be safe. I can't remember which.
>
> Yes that's why I said you can release it during deflation instead.
>
> (not saying you should do this in this feature change-set)

What does this "you can do this entire loop while blocked instead" mean?

Releasing during deflation kind of messes with the life-cycle I was trying to enforce since deletion is the natural end-of-life for these... But to think about that I need to know what you mean by "you can do this entire loop while blocked instead"...
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Tue Nov 10 21:12:04 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 21:12:04 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> Message-ID: On Tue, 10 Nov 2020 16:41:49 GMT, Coleen Phillimore wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> dholmes-ora - convert inner while loop to do-while loop in unlink_deflated(). > > src/hotspot/share/oops/markWord.cpp line 63: > >> 61: fatal("bad header=" INTPTR_FORMAT, value()); >> 62: } >> 63: > > This makes it so much clearer where the displaced markWord is. Yup. That was the point. Get rid of the magic offset zero coding... > src/hotspot/share/runtime/globals.hpp line 750: > >> 748: product(intx, MonitorUsedDeflationThreshold, 90, EXPERIMENTAL, \ >> 749: "Percentage of used monitors before triggering deflation (0 is " \ >> 750: "off). The check is performed on GuaranteedSafepointInterval " \ > > Should there still be experimental options after this change? Robbin added MonitorUsedDeflationThreshold as an experimental option back in JDK10. See the longer reply to David's comment. I don't plan to change that option with this changeset. 
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From rehn at openjdk.java.net Tue Nov 10 21:21:04 2020 From: rehn at openjdk.java.net (Robbin Ehn) Date: Tue, 10 Nov 2020 21:21:04 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <05PVdNFn9-u7KHSDPXNSr62kvfFQHKCoyT6Z1TIAC2s=.40e0c36c-e7be-48e4-8352-dbc5ebae25b0@github.com> Message-ID: <6na55oAsYSTSML28qV7bzizCY1BqkmpkNsc1eHEZZ4U=.c58c86f2-b090-4c42-a73e-f36ff2572963@github.com> On Tue, 10 Nov 2020 21:01:21 GMT, Daniel D. Daugherty wrote: >> So why not use a local semaphore and wait with safepoint check instead? > > Sorry my preference is for Monitors instead of semaphores. Let's > take that discussion off this PR and you can explain why you dislike > the Monitor so much and think the local semaphore is the way to go. Yes >> Currently there is no issue with ThreadBlockInVM since there is no code inside those scopes. >> This adds code there which assumes the timer will be 'resumed', and logs "resume" when it actually could be going to a safepoint. > > So if I narrow the scope around the ThreadBlockInVM, then it would be fine? > > { > // Honor block request. > ThreadBlockInVM tbivm(self); > } > > I can make that change before I integrate... Yes that avoids it! >> Yes that's why I said you can release it during deflation instead. >> >> (not saying you should do this in this feeature change-set) > > What does this "you can do this entire loop while blocked instead" mean? > > Releasing during deflation kind of messes with the life-cycle I was trying > to enforce since deletion is the nature end-of-life for these... But to think > about that I need to know what you mean by "you can do this entire loop > while blocked instead"... 
If you only need to free CHeap memory, you can do:

    size_t deleted_count = 0;
    {
      ThreadBlockInVM tbivm(self);
      for (ObjectMonitor* monitor : delete_list) {
        delete monitor;
        deleted_count++;
      }
    }

-------------

PR: https://git.openjdk.java.net/jdk/pull/642

From dcubed at openjdk.java.net Tue Nov 10 21:21:04 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Tue, 10 Nov 2020 21:21:04 GMT
Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4]
In-Reply-To:
References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com>
Message-ID: <-NIFifwEMk1oYQugLdKvFNgAxmeyEsf05sku_ccaNW0=.e54c02a9-acd4-449e-8fae-05da29aad1a9@github.com>

On Tue, 10 Nov 2020 16:46:10 GMT, Coleen Phillimore wrote:

>> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision:
>>
>> dholmes-ora - convert inner while loop to do-while loop in unlink_deflated().
>
> src/hotspot/share/runtime/monitorDeflationThread.cpp line 85:
>
>> 83: // visible to external suspension.
>> 84:
>> 85: ThreadBlockInVM tbivm(jt);
>
> Does this have to be a JavaThread? Could it be a non-java thread since deflating monitors doesn't have to call any Java code? You'd have to lock down the Monitor list maybe, but couldn't this be a NamedThread? This isn't a request to change it right now.

Ummm... we use a JavaThread because we have to stop async deflation during safepoints so that we're not messing with Object headers during GC. Yes, it's possible to use a non-JavaThread because this is "just software", but I don't want to try to figure out those races...

> src/hotspot/share/runtime/synchronizer.cpp line 1641:
>
>> 1639:
>> 1640: // Do the final audit and print of ObjectMonitor stats; must be done
>> 1641: // by the VMThread (at VM exit time).
>
> Can you take (at VM exit time) out of parenthesis?
it made me wonder when else is this called. Okay. > src/hotspot/share/runtime/objectMonitor.cpp line 509: > >> 507: // >> 508: bool ObjectMonitor::deflate_monitor() { >> 509: if (is_busy()) { > > is_busy should be checked != 0 since it doesn't return a bool. Nice catch! That has been there for many, many years... ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Tue Nov 10 21:25:04 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 21:25:04 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> Message-ID: On Tue, 10 Nov 2020 17:05:16 GMT, Coleen Phillimore wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> dholmes-ora - convert inner while loop to do-while loop in unlink_deflated(). > > src/hotspot/share/runtime/objectMonitor.cpp line 540: > >> 538: if (try_set_owner_from(NULL, DEFLATER_MARKER) != NULL) { >> 539: // The owner field is no longer NULL so we lost the race since the >> 540: // ObjectMonitor is now busy. > > So here would contentions be > 0? Can it be asserted? Doesn't need to be, the comment really helps to understand why the cas failed. No we can't assert that (contentions > 0). The ownership might have been taken by a fast path thread so it grabbed ownership without having to update contentions and wait. 
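Why `contentions > 0` cannot be asserted at that point can be illustrated with a small standalone sketch. Everything here is a toy model under stated assumptions: `ToyMonitor`, `fast_enter`, `try_claim_for_deflation` and `TOY_DEFLATER_MARKER` are hypothetical stand-ins, only the name `try_set_owner_from` is borrowed from the quoted code, and the real ObjectMonitor logic is considerably more involved. A fast-path thread takes ownership with a single CAS and never touches the counter, so the deflater can lose the race while the counter is still zero.

```cpp
#include <atomic>
#include <cassert>

// Toy stand-in for an object monitor; not the HotSpot implementation.
struct ToyMonitor {
  std::atomic<void*> owner{nullptr};
  std::atomic<int> contentions{0};  // only updated by slow-path enters

  // CAS owner from old_value to new_value. Returns the value observed
  // before the attempt, so a result equal to old_value means success.
  void* try_set_owner_from(void* old_value, void* new_value) {
    void* expected = old_value;
    owner.compare_exchange_strong(expected, new_value);
    return expected;
  }

  // Fast path: uncontended enter that never updates contentions.
  bool fast_enter(void* self) {
    return try_set_owner_from(nullptr, self) == nullptr;
  }
};

// Hypothetical marker value the deflater installs in the owner field.
static char toy_deflater_marker_storage;
static void* const TOY_DEFLATER_MARKER = &toy_deflater_marker_storage;

// Deflater side: claim the monitor only if it currently has no owner.
inline bool try_claim_for_deflation(ToyMonitor& m) {
  return m.try_set_owner_from(nullptr, TOY_DEFLATER_MARKER) == nullptr;
}

// Example: the deflater loses the race to a fast-path enter.
inline void example_lost_race() {
  ToyMonitor m;
  int thread_token = 0;  // its address plays the role of a thread pointer
  bool entered = m.fast_enter(&thread_token);
  bool claimed = try_claim_for_deflation(m);
  assert(entered);
  assert(!claimed);                    // owner is no longer null
  assert(m.contentions.load() == 0);   // yet nobody bumped contentions
  (void)entered;
  (void)claimed;
}
```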
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From coleenp at openjdk.java.net Tue Nov 10 21:25:06 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 10 Nov 2020 21:25:06 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 16:31:23 GMT, Jorn Vernee wrote: >> src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 121: >> >>> 119: upcall_info.upcall_method.name, upcall_info.upcall_method.sig, >>> 120: &args, thread); >>> 121: } >> >> This code shouldn't be in the cpu directory. This should be in SharedRuntime or in jni.cpp. It should have a JNI_ENTRY and not transition directly. I don't know what AttachCurrentThreadAsDaemon does. > > Roger that. > > We need the thread state transition though in case we get a random native thread calling us. yikes. Does that work? ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From github.com+1981974+kuaiwei at openjdk.java.net Tue Nov 10 21:29:04 2020 From: github.com+1981974+kuaiwei at openjdk.java.net (kuaiwei) Date: Tue, 10 Nov 2020 21:29:04 GMT Subject: Withdrawn: 8253049: Enhance itable_stub for AArch64 and x86_64 In-Reply-To: References: Message-ID: On Fri, 11 Sep 2020 11:58:34 GMT, kuaiwei wrote: > Now itable_stub will go through instanceKlass's itable twice to look up a method entry. resolved klass is used for type checking and method holder klass is used to find method entry. In many cases , we observed resolved klass is as same as holder klass. So we can improve itable stub based on it. If they are same klass, stub uses a fast loop to check only one klass. If not, a slow loop is used to checking both klasses. > > Even entering in slow loop, new implementation can be better than old one in some cases. Because new stub just need go through itable once and reduce memory operations. 
> > > bug: https://bugs.openjdk.java.net/browse/JDK-8253049 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/128 From dcubed at openjdk.java.net Tue Nov 10 21:30:05 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 21:30:05 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> Message-ID: <_8BktJ0NHE_i7LQ7_X8JQyyfriKl8_-PA7hOfdBrC3U=.ac425b59-0e89-47f2-9fa7-e461e55689a2@github.com> On Tue, 10 Nov 2020 17:32:56 GMT, Coleen Phillimore wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> dholmes-ora - convert inner while loop to do-while loop in unlink_deflated(). > > src/hotspot/share/runtime/objectMonitor.cpp line 551: > >> 549: if (try_set_owner_from(DEFLATER_MARKER, NULL) != DEFLATER_MARKER) { >> 550: // Deferred decrement for the JT EnterI() that cancelled the async deflation. >> 551: add_to_contentions(-1); > > contentions is essentially a refcount, isn't it. Can you fix the comment to include this at line 360 since that's not the only purpose of this count. > > // Keep track of contention for JVM/TI and M&M queries. > add_to_contentions(1); No it is not a ref_count. We got rid of the accurate ref_count field because maintaining it was too slow. contentions just tells you how many threads have blocked on the slow-path trying to enter the monitor. The fast-paths never touch contentions. The comment on line 360 is accurate. > src/hotspot/share/runtime/synchronizer.hpp line 61: > >> 59: bool has_next() const { return _current != NULL; } >> 60: ObjectMonitor* next(); >> 61: }; > > Can MonitorList be defined in the .cpp file? 
I don't see anything outside of synchronizer.cpp that refers to it. I can see if that will work. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From hseigel at openjdk.java.net Tue Nov 10 21:30:04 2020 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 10 Nov 2020 21:30:04 GMT Subject: RFR: 8255787: Tag container tests that use cGroups with cgroups keyword Message-ID: Please review this small change to add a cgroups keyword to tests that use cgroups. The fix was tested by running Mach5 container tests. ------------- Commit messages: - 8255787: Tag container tests that use cGroups with cgroups keyword Changes: https://git.openjdk.java.net/jdk/pull/1148/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1148&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255787 Stats: 20 lines in 15 files changed: 15 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/1148.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1148/head:pull/1148 PR: https://git.openjdk.java.net/jdk/pull/1148 From coleenp at openjdk.java.net Tue Nov 10 21:37:06 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 10 Nov 2020 21:37:06 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v18] In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 14:16:22 GMT, Maurizio Cimadamore wrote: >> This patch contains the changes associated with the first incubation round of the foreign linker access API incubation >> (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). >> >> The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. 
I suggest reading the writeup [4] I put together few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. >> >> Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reasons, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. >> >> A big thank to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank to Paul Sandoz, who provided many insights (often by trying the bits first hand). >> >> Thanks >> Maurizio >> >> Webrev: >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev >> >> Javadoc: >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html >> >> Specdiff (relative to [3]): >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html >> >> CSR: >> >> https://bugs.openjdk.java.net/browse/JDK-8254232 >> >> >> >> ### API Changes >> >> The API changes are actually rather slim: >> >> * `LibraryLookup` >> * This class allows clients to lookup symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then lookup symbols on that library. >> * `FunctionDescriptor` >> * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. >> * `CLinker` >> * This is the real star of the show. 
A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. >> * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classfication information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. >> * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back. >> * `NativeScope` >> * This is an helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resource* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only an usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. >> * `MemorySegment` >> * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. >> >> ### Safety >> >> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. 
>> For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API).
>>
>> ### Implementation changes
>>
>> The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. same library cannot be loaded by different classloaders).
>>
>> As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`.
>>
>> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5].
>>
>> The main idea behind the foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function.
>>
>> This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference.
>>
>> For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating points are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly.
>>
>> A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor`, what is the set of bindings associated with the downcall/upcall.
>>
>> At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below:
>>
>> * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings.
For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them into their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the most "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. >> >> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). >> >> * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go through an intermediate buffer. This gives us back performance that is on par with JNI. >> >> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). >> >> Again, for further reading on the internals of the foreign linker support, please refer to [5].
>> >> #### Test changes >> >> Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. >> >> Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI.
>> >> [1] - https://openjdk.java.net/jeps/389 >> [2] - https://openjdk.java.net/jeps/393 >> [3] - https://git.openjdk.java.net/jdk/pull/548 >> [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md >> [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html > > Maurizio Cimadamore has updated the pull request incrementally with 10 additional commits since the last revision: > > - Merge pull request #7 from JornVernee/Additional_Review_Comments > > Additional review comments > - Revert System.java changes > - Set copyright year for added files to 2020 > - Check result of AttachCurrentThread > - Sort includes alphabetically > - Relax ret_addr_offset() assert > - Extra space after if > - remove excessive asserts in ProgrammableInvoker::invoke_native > - Remove os::is_MP() check > - remove blank line in thread.hpp The Hotspot prims and runtime changes look good to me. src/hotspot/share/prims/universalUpcallHandler.cpp line 68: > 66: vm->functions->DetachCurrentThread(vm); > 67: } > 68: } Yes, this looks good. Thanks for moving this. src/hotspot/share/prims/universalUpcallHandler.cpp line 85: > 83: upcall_method.sig = SymbolTable::new_symbol("(L" FOREIGN_ABI "ProgrammableUpcallHandler;J)V"); > 84: > 85: assert(upcall_method.klass->lookup_method(upcall_method.name, upcall_method.sig) != nullptr, I think you need a ResourceMark here. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/634 From dcubed at openjdk.java.net Tue Nov 10 21:51:03 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 21:51:03 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5] In-Reply-To: <6na55oAsYSTSML28qV7bzizCY1BqkmpkNsc1eHEZZ4U=.c58c86f2-b090-4c42-a73e-f36ff2572963@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <05PVdNFn9-u7KHSDPXNSr62kvfFQHKCoyT6Z1TIAC2s=.40e0c36c-e7be-48e4-8352-dbc5ebae25b0@github.com> <6na55oAsYSTSML28qV7bzizCY1BqkmpkNsc1eHEZZ4U=.c58c86f2-b090-4c42-a73e-f36ff2572963@github.com> Message-ID: <7rQQpVtTPecL-Vt4JJ0CR3U_1523KksQwFAsn9QpJK8=.5e872ffe-d1e4-40e9-9dd4-595712452e5c@github.com> On Tue, 10 Nov 2020 21:16:33 GMT, Robbin Ehn wrote: >> So if I narrow the scope around the ThreadBlockInVM, then it would be fine? >> >> { >> // Honor block request. >> ThreadBlockInVM tbivm(self); >> } >> >> I can make that change before I integrate... > > Yes that avoids it! Done. I also did the one in ObjectSynchronizer::request_deflate_idle_monitors(). >> What does this "you can do this entire loop while blocked instead" mean? >> >> Releasing during deflation kind of messes with the life-cycle I was trying >> to enforce since deletion is the natural end-of-life for these... But to think >> about that I need to know what you mean by "you can do this entire loop >> while blocked instead"... > > If you only need to free CHeap memory, you can do: > size_t deleted_count = 0; > ThreadBlockInVM tbivm(self); > for (ObjectMonitor* monitor: delete_list) { > delete monitor; > deleted_count++; > } > } Ahhh... but that only works if we release the oopStorage when we deflate. Okay. I grok it now, but don't want to do that in this changeset. I would want a complete stress test cycle for that kind of a change.
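The two `ThreadBlockInVM` usages discussed above differ only in how long the guard object lives. A minimal sketch of that RAII idea follows; `ToyThread`, `ToyState` and `ToyBlockInVM` are hypothetical stand-ins invented for illustration, not HotSpot's classes, and the sketch models only the state flip, not any safepoint or handshake machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; these are NOT HotSpot's JavaThread/ThreadBlockInVM.
enum class ToyState { in_vm, blocked };

struct ToyThread {
  ToyState state = ToyState::in_vm;
  int block_count = 0;  // how many times the thread entered the blocked state
};

// RAII guard: the thread is "blocked" for exactly the guard's lifetime.
class ToyBlockInVM {
  ToyThread* _thread;
 public:
  explicit ToyBlockInVM(ToyThread* t) : _thread(t) {
    _thread->state = ToyState::blocked;
    _thread->block_count++;
  }
  ~ToyBlockInVM() { _thread->state = ToyState::in_vm; }
  ToyBlockInVM(const ToyBlockInVM&) = delete;
  ToyBlockInVM& operator=(const ToyBlockInVM&) = delete;
};

// Narrow scope: the guard is destroyed at the inner closing brace,
// so the thread is blocked only momentarily.
void honor_block_request(ToyThread* self) {
  {
    ToyBlockInVM tbivm(self);
  }
  assert(self->state == ToyState::in_vm);
}

// Wide scope: the guard outlives the loop, so every deletion
// happens while the thread is in the blocked state.
std::size_t delete_all_while_blocked(ToyThread* self,
                                     std::vector<int*>& delete_list) {
  std::size_t deleted_count = 0;
  ToyBlockInVM tbivm(self);
  for (int* item : delete_list) {
    assert(self->state == ToyState::blocked);
    delete item;
    deleted_count++;
  }
  delete_list.clear();
  return deleted_count;
}
```

In this toy model the only difference between the two functions is where the guard's scope ends, which is the point both reviewers are making.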
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Tue Nov 10 21:51:02 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 21:51:02 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> Message-ID: <3oT-RoU0dfm8DXWOSdkbjLG_xJnVsniox-6kpJ_h2cA=.15fff0b3-c8b1-4cc9-9113-8f764c20fa5e@github.com> On Tue, 10 Nov 2020 20:57:48 GMT, Coleen Phillimore wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> dholmes-ora - convert inner while loop to do-while loop in unlink_deflated(). > > src/hotspot/share/runtime/synchronizer.hpp line 28: > >> 26: #define SHARE_RUNTIME_SYNCHRONIZER_HPP >> 27: >> 28: #include "logging/logStream.hpp" > > If you need to put MonitorList in the header file, use a forward declaration for LogStream instead of #including logstream.hpp. I can see if that will work. > src/hotspot/share/runtime/objectMonitor.hpp line 171: > >> 169: volatile int _SpinDuration; >> 170: >> 171: jint _contentions; // Number of active contentions in enter(). It is used by is_busy() > > Future RFE - can we replace jint with int32_t or even int or some C++ types. We're trying not to have Java types leak into runtime code since this doesn't directly interface with Java. It can be a future RFE, but it won't be at the top of my list of things to do. There may already be an RFE for that. 
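For readers unfamiliar with the forward-declaration suggestion above, here is a minimal single-file sketch of the pattern. The names `ToyLogStream` and `ToyMonitorList` are hypothetical and only mimic the shape of the header/source split; they are not the real `LogStream` or `MonitorList`.

```cpp
#include <cassert>
#include <string>

// --- What the header would contain ---------------------------------------
// A forward declaration is enough because the class below only mentions
// ToyLogStream through a pointer; no #include of its definition is needed.
class ToyLogStream;

class ToyMonitorList {
  int _count = 0;
 public:
  void add() { ++_count; }
  int count() const { return _count; }
  void print_on(ToyLogStream* ls) const;  // pointer to incomplete type is fine
};

// --- What the .cpp file would contain ------------------------------------
// Only here, where members of ToyLogStream are used, is the full
// definition required (in real code: the #include of the stream header).
class ToyLogStream {
 public:
  std::string buffer;
  void print(const std::string& s) { buffer += s; }
};

void ToyMonitorList::print_on(ToyLogStream* ls) const {
  ls->print("count=" + std::to_string(_count));
}
```

The benefit is that files including the header no longer pull in the stream header transitively.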
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From coleenp at openjdk.java.net Tue Nov 10 22:25:02 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 10 Nov 2020 22:25:02 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5] In-Reply-To: <7rQQpVtTPecL-Vt4JJ0CR3U_1523KksQwFAsn9QpJK8=.5e872ffe-d1e4-40e9-9dd4-595712452e5c@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <05PVdNFn9-u7KHSDPXNSr62kvfFQHKCoyT6Z1TIAC2s=.40e0c36c-e7be-48e4-8352-dbc5ebae25b0@github.com> <6na55oAsYSTSML28qV7bzizCY1BqkmpkNsc1eHEZZ4U=.c58c86f2-b090-4c42-a73e-f36ff2572963@github.com> <7rQQpVtTPecL-Vt4JJ0CR3U_1523KksQwFAsn9QpJK8=.5e872ffe-d1e4-40e9-9dd4-595712452e5c@github.com> Message-ID: <_EeIXcUypiIUutwEAJRwUR49nfILH3PNmz75RLBmy0M=.dca51ee5-c4b5-4bcf-874f-3c3919f332ee@github.com> On Tue, 10 Nov 2020 21:35:43 GMT, Daniel D. Daugherty wrote: >> Yes that avoids it! > > Done. I also did the one in ObjectSynchronizer::request_deflate_idle_monitors(). I like it but can you do one thing for me? Can you s/chk/check/ in the name? ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From coleenp at openjdk.java.net Tue Nov 10 22:25:04 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 10 Nov 2020 22:25:04 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: <-NIFifwEMk1oYQugLdKvFNgAxmeyEsf05sku_ccaNW0=.e54c02a9-acd4-449e-8fae-05da29aad1a9@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> <-NIFifwEMk1oYQugLdKvFNgAxmeyEsf05sku_ccaNW0=.e54c02a9-acd4-449e-8fae-05da29aad1a9@github.com> Message-ID: On Tue, 10 Nov 2020 21:12:25 GMT, Daniel D. 
Daugherty wrote: >> src/hotspot/share/runtime/monitorDeflationThread.cpp line 85: >> >>> 83: // visible to external suspension. >>> 84: >>> 85: ThreadBlockInVM tbivm(jt); >> >> Does this have to be a JavaThread? Could it be a non-java thread since deflating monitors doesn't have to call any Java code? You'd have to lock down the Monitor list maybe, but couldn't this be a NamedThread? This isn't a request to change it right now. > > Ummm... we use a JavaThread because we have to stop async deflation > during safepoints so that we're not messing with Object headers during GC. > Yes, it's possible to use a non-JavaThread because this is "just software", > but I don't want to try to figure out those races... Ok, I see why it has to be a JavaThread. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From coleenp at openjdk.java.net Tue Nov 10 22:31:01 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 10 Nov 2020 22:31:01 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: <_8BktJ0NHE_i7LQ7_X8JQyyfriKl8_-PA7hOfdBrC3U=.ac425b59-0e89-47f2-9fa7-e461e55689a2@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> <_8BktJ0NHE_i7LQ7_X8JQyyfriKl8_-PA7hOfdBrC3U=.ac425b59-0e89-47f2-9fa7-e461e55689a2@github.com> Message-ID: <_91UeV37-LcU43acy9VUD7rt3woXrc-jdz-ILC7ichs=.57f26cbb-1ac0-4c9e-ba2e-9b620d60b2c9@github.com> On Tue, 10 Nov 2020 21:25:51 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 551: >> >>> 549: if (try_set_owner_from(DEFLATER_MARKER, NULL) != DEFLATER_MARKER) { >>> 550: // Deferred decrement for the JT EnterI() that cancelled the async deflation. >>> 551: add_to_contentions(-1); >> >> contentions is essentially a refcount, isn't it. 
Can you fix the comment to include this at line 360 since that's not the only purpose of this count. >> >> // Keep track of contention for JVM/TI and M&M queries. >> add_to_contentions(1); > > No it is not a ref_count. We got rid of the accurate ref_count field > because maintaining it was too slow. > > contentions just tells you how many threads have blocked on the > slow-path trying to enter the monitor. The fast-paths never touch > contentions. The comment on line 360 is accurate. But contentions is used for more than informing JVMTI, it's used to test whether the monitor is_busy on the slow path. That's why I wanted the comment to say something like your last sentence, since I spent time trying to understand why the various calls to add_to_contentions(-1) in deflate_monitor earlier. >> src/hotspot/share/runtime/objectMonitor.hpp line 171: >> >>> 169: volatile int _SpinDuration; >>> 170: >>> 171: jint _contentions; // Number of active contentions in enter(). It is used by is_busy() >> >> Future RFE - can we replace jint with int32_t or even int or some C++ types. We're trying not to have Java types leak into runtime code since this doesn't directly interface with Java. > > It can be a future RFE, but it won't be at the top of my list of > things to do. There may already be an RFE for that. No, I assume it's not high priority. I'll file an RFE because someday I want these to be cleaned up as a personal nit. 
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Tue Nov 10 22:35:04 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 22:35:04 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5] In-Reply-To: <_EeIXcUypiIUutwEAJRwUR49nfILH3PNmz75RLBmy0M=.dca51ee5-c4b5-4bcf-874f-3c3919f332ee@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <05PVdNFn9-u7KHSDPXNSr62kvfFQHKCoyT6Z1TIAC2s=.40e0c36c-e7be-48e4-8352-dbc5ebae25b0@github.com> <6na55oAsYSTSML28qV7bzizCY1BqkmpkNsc1eHEZZ4U=.c58c86f2-b090-4c42-a73e-f36ff2572963@github.com> <7rQQpVtTPecL-Vt4JJ0CR3U_1523KksQwFAsn9QpJK8=.5e872ffe-d1e4-40e9-9dd4-595712452e5c@github.com> <_EeIXcUypiIUutwEAJRwUR49nfILH3PNmz75RLBmy0M=.dca51ee5-c4b5-4bcf-874f-3c3919f332ee@github.com> Message-ID: On Tue, 10 Nov 2020 22:08:10 GMT, Coleen Phillimore wrote: >> Done. I also did the one in ObjectSynchronizer::request_deflate_idle_monitors(). > > I like it but can you do one thing for me? Can you s/chk/check/ in the name? I'd rather not. It is a function with six freaking parameters so I went with names as short as I could... 
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Tue Nov 10 22:56:00 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 22:56:00 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: <_91UeV37-LcU43acy9VUD7rt3woXrc-jdz-ILC7ichs=.57f26cbb-1ac0-4c9e-ba2e-9b620d60b2c9@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> <_8BktJ0NHE_i7LQ7_X8JQyyfriKl8_-PA7hOfdBrC3U=.ac425b59-0e89-47f2-9fa7-e461e55689a2@github.com> <_91UeV37-LcU43acy9VUD7rt3woXrc-jdz-ILC7ichs=.57f26cbb-1ac0-4c9e-ba2e-9b620d60b2c9@github.com> Message-ID: <3sH90dwOxC6MSBbtSZw8XNQpIvDyhsciRtPsrHJ8UPQ=.88463ff9-ca47-4dc1-8334-f436cd16276a@github.com> On Tue, 10 Nov 2020 22:27:06 GMT, Coleen Phillimore wrote: >> No it is not a ref_count. We got rid of the accurate ref_count field >> because maintaining it was too slow. >> >> contentions just tells you how many threads have blocked on the >> slow-path trying to enter the monitor. The fast-paths never touch >> contentions. The comment on line 360 is accurate. > > But contentions is used for more than informing JVMTI, it's used to test whether the monitor is_busy on the slow path. That's why I wanted the comment to say something like your last sentence, since I spent time trying to understand why the various calls to add_to_contentions(-1) in deflate_monitor earlier. Ahhh... I think I understand your confusion. This line: L550: // Deferred decrement for the JT EnterI() that cancelled the async deflation. L551: add_to_contentions(-1); doesn't match up with this line: L361: add_to_contentions(1); It matches up with one of these: if (try_set_owner_from(DEFLATER_MARKER, Self) == DEFLATER_MARKER) { // Cancelled the in-progress async deflation by changing owner from // DEFLATER_MARKER to Self. 
As part of the contended enter protocol, // contentions was incremented to a positive value before EnterI() // was called and that prevents the deflater thread from winning the // last part of the 2-part async deflation protocol. After EnterI() // returns to enter(), contentions is decremented because the caller // now owns the monitor. We bump contentions an extra time here to // prevent the deflater thread from winning the last part of the // 2-part async deflation protocol after the regular decrement // occurs in enter(). The deflater thread will decrement contentions // after it recognizes that the async deflation was cancelled. add_to_contentions(1); The long comments are in the two places where we temporarily increment contentions to stop the race with the deflater thread and the shorter comment, e.g., L550, are for where we undo the temporary increment. The primary purpose of the contentions field is for JVM/TI and M&M queries. We just (temporarily) steal it for async deflation purposes... ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From coleenp at openjdk.java.net Tue Nov 10 23:18:59 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 10 Nov 2020 23:18:59 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: <3sH90dwOxC6MSBbtSZw8XNQpIvDyhsciRtPsrHJ8UPQ=.88463ff9-ca47-4dc1-8334-f436cd16276a@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> <_8BktJ0NHE_i7LQ7_X8JQyyfriKl8_-PA7hOfdBrC3U=.ac425b59-0e89-47f2-9fa7-e461e55689a2@github.com> <_91UeV37-LcU43acy9VUD7rt3woXrc-jdz-ILC7ichs=.57f26cbb-1ac0-4c9e-ba2e-9b620d60b2c9@github.com> <3sH90dwOxC6MSBbtSZw8XNQpIvDyhsciRtPsrHJ8UPQ=.88463ff9-ca47-4dc1-8334-f436cd16276a@github.com> Message-ID: On Tue, 10 Nov 2020 22:53:10 GMT, Daniel D. 
Daugherty wrote: >> But contentions is used for more than informing JVMTI, it's used to test whether the monitor is_busy on the slow path. That's why I wanted the comment to say something like your last sentence, since I spent time trying to understand why the various calls to add_to_contentions(-1) in deflate_monitor earlier. > > Ahhh... I think I understand your confusion. This line: > > L550: // Deferred decrement for the JT EnterI() that cancelled the async deflation. > L551: add_to_contentions(-1); > > doesn't match up with this line: > > L361: add_to_contentions(1); > > It matches up with one of these: > if (try_set_owner_from(DEFLATER_MARKER, Self) == DEFLATER_MARKER) { > // Cancelled the in-progress async deflation by changing owner from > // DEFLATER_MARKER to Self. As part of the contended enter protocol, > // contentions was incremented to a positive value before EnterI() > // was called and that prevents the deflater thread from winning the > // last part of the 2-part async deflation protocol. After EnterI() > // returns to enter(), contentions is decremented because the caller > // now owns the monitor. We bump contentions an extra time here to > // prevent the deflater thread from winning the last part of the > // 2-part async deflation protocol after the regular decrement > // occurs in enter(). The deflater thread will decrement contentions > // after it recognizes that the async deflation was cancelled. > add_to_contentions(1); > > The long comments are in the two places where we temporarily > increment contentions to stop the race with the deflater thread > and the shorter comment, e.g., L550, are for where we undo the > temporary increment. > > The primary purpose of the contentions field is for JVM/TI and > M&M queries. We just (temporarily) steal it for async deflation > purposes... 
Well since it controls async deflation, it should probably get a mention since this comment on its own is not true: // Keep track of contention for JVM/TI and M&M queries and control async deflation. The field _contentions has a good comment. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Tue Nov 10 23:24:25 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 23:24:25 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: On Tue, 10 Nov 2020 20:34:12 GMT, Robbin Ehn wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> coleenp CR - refactor common ThreadBlockInVM code block into ObjectSynchronizer::chk_for_block_req(). > > Hi, thanks for fixing. > > I had some comments nothing major so approving. @robehn and @coleenp - resolved more comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Tue Nov 10 23:24:24 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 23:24:24 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: > Changes from @fisk and @dcubed-ojdk to: > > - simplify ObjectMonitor list management > - get rid of Type-Stable Memory (TSM) > > This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. 
> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, > SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) > - a few minor regressions (<= -0.24%) > - Volano is 6.8% better > > Eric C. has also done promotion perf runs on these bits and says "the results look fine". Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: resolve more robehn and coleenp comments. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/642/files - new: https://git.openjdk.java.net/jdk/pull/642/files/15ad3526..61d36884 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=642&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=642&range=04-05 Stats: 92 lines in 3 files changed: 46 ins; 38 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/642.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/642/head:pull/642 PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Tue Nov 10 23:33:59 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 23:33:59 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> <_8BktJ0NHE_i7LQ7_X8JQyyfriKl8_-PA7hOfdBrC3U=.ac425b59-0e89-47f2-9fa7-e461e55689a2@github.com> <_91UeV37-LcU43acy9VUD7rt3woXrc-jdz-ILC7ichs=.57f26cbb-1ac0-4c9e-ba2e-9b620d60b2c9@github.com> <3sH90dwOxC6MSBbtSZw8XNQpIvDyhsciRtPsrHJ8UPQ=.88463ff9-ca47-4dc1-8334-f436cd16276a@github.com> Message-ID: On Tue, 10 Nov 2020 23:15:15 GMT, Coleen Phillimore wrote: >> Ahhh... I think I understand your confusion. This line: >> >> L550: // Deferred decrement for the JT EnterI() that cancelled the async deflation. 
>> L551: add_to_contentions(-1); >> >> doesn't match up with this line: >> >> L361: add_to_contentions(1); >> >> It matches up with one of these: >> if (try_set_owner_from(DEFLATER_MARKER, Self) == DEFLATER_MARKER) { >> // Cancelled the in-progress async deflation by changing owner from >> // DEFLATER_MARKER to Self. As part of the contended enter protocol, >> // contentions was incremented to a positive value before EnterI() >> // was called and that prevents the deflater thread from winning the >> // last part of the 2-part async deflation protocol. After EnterI() >> // returns to enter(), contentions is decremented because the caller >> // now owns the monitor. We bump contentions an extra time here to >> // prevent the deflater thread from winning the last part of the >> // 2-part async deflation protocol after the regular decrement >> // occurs in enter(). The deflater thread will decrement contentions >> // after it recognizes that the async deflation was cancelled. >> add_to_contentions(1); >> >> The long comments are in the two places where we temporarily >> increment contentions to stop the race with the deflater thread >> and the shorter comment, e.g., L550, are for where we undo the >> temporary increment. >> >> The primary purpose of the contentions field is for JVM/TI and >> M&M queries. We just (temporarily) steal it for async deflation >> purposes... > > Well since it controls async deflation, it should probably get a mention since this comment on its own is not true: > > // Keep track of contention for JVM/TI and M&M queries and control async deflation. > > The field _contentions has a good comment. The comment for **that** location is correct. That's the location where we keep track of contended monitors for JVM/TI and M&M. The comments at the other locations are also correct (very verbose for two of them, but correct). 
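The interplay between contended entry and async deflation described in this exchange can be illustrated with a deliberately simplified model. Everything below is an assumption made for illustration: `ToyMonitor` is hypothetical, the negative sentinel and the compare-and-swap from zero are this sketch's way of modelling "the deflater wins only when nobody is contending", and the owner/`DEFLATER_MARKER` half of the two-part protocol is omitted entirely.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Hypothetical toy model; not ObjectMonitor.
class ToyMonitor {
  std::atomic<int> _contentions{0};

 public:
  int contentions() const { return _contentions.load(); }

  // In this model a negative count means the deflater has won.
  bool is_being_async_deflated() const { return contentions() < 0; }

  // Contended-enter side: increment first, then check whether the
  // increment lost the race; if so, undo it and tell the caller to retry.
  bool try_register_contender() {
    _contentions.fetch_add(1);
    if (is_being_async_deflated()) {
      _contentions.fetch_sub(1);
      return false;
    }
    return true;
  }

  void unregister_contender() { _contentions.fetch_sub(1); }

  // Deflater side: succeeds only if the count is exactly zero, so any
  // positive count held by a contender blocks deflation.
  bool try_deflate() {
    int expected = 0;
    return _contentions.compare_exchange_strong(expected, INT_MIN);
  }
};
```

Under these assumptions a single counter serves both purposes mentioned in the thread: a positive value reports contention to observers and, as a side effect, keeps the deflater from completing.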
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Tue Nov 10 23:33:59 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 10 Nov 2020 23:33:59 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> <_8BktJ0NHE_i7LQ7_X8JQyyfriKl8_-PA7hOfdBrC3U=.ac425b59-0e89-47f2-9fa7-e461e55689a2@github.com> <_91UeV37-LcU43acy9VUD7rt3woXrc-jdz-ILC7ichs=.57f26cbb-1ac0-4c9e-ba2e-9b620d60b2c9@github.com> <3sH90dwOxC6MSBbtSZw8XNQpIvDyhsciRtPsrHJ8UPQ=.88463ff9-ca47-4dc1-8334-f436cd16276a@github.com> Message-ID: On Tue, 10 Nov 2020 23:27:52 GMT, Daniel D. Daugherty wrote: >> Well since it controls async deflation, it should probably get a mention since this comment on its own is not true: >> >> // Keep track of contention for JVM/TI and M&M queries and control async deflation. >> >> The field _contentions has a good comment. > > The comment for **that** location is correct. That's the location > where we keep track of contended monitors for JVM/TI and M&M. > The comments at the other locations are also correct (very verbose > for two of them, but correct). Look at the entire context of that location: // Keep track of contention for JVM/TI and M&M queries. add_to_contentions(1); if (is_being_async_deflated()) { // Async deflation is in progress and our contentions increment // above lost the race to async deflation. Undo the work and // force the caller to retry. const oop l_object = object(); if (l_object != NULL) { // Attempt to restore the header/dmw to the object's header so that // we only retry once if the deflater thread happens to be slow. 
install_displaced_markword_in_object(l_object); } Self->_Stalled = 0; add_to_contentions(-1); return false; } We do the increment for JVM/TI and M&M purposes and then we check to see if async deflation beat us... ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From iklam at openjdk.java.net Wed Nov 11 00:27:08 2020 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 11 Nov 2020 00:27:08 GMT Subject: RFR: 8252526: Remove excessive inclusion of jvmti.h and jvmtiExport.hpp Message-ID: jvmti.h is included 905 times and jvmtiExport.hpp is included 776 times (out of 971 hotspot .o files). Most of these are unnecessarily included by the following 3 popular header files: * javaClasses.hpp: `java_lang_Class::ThreadStatus` (which depends on jvmti.h) is rarely used. Move this type to a separate header file. The enum is also changed to an enum class for better type safety. * os.hpp: No need to include jvmti.h. Use forward declaration for `struct jvmtiTimerInfo;` instead. * thread.hpp: No need to include jvmtiExport.hpp. Use forward declaration for `JvmtiSampledObjectAllocEventCollector` and `JvmtiVMObjectAllocEventCollector` instead. Tested with mach5 tier1-2 and build tiers 3-5. Note: Many files are changed, but most of the changes are adding a missing include of jvmtiExport.hpp.
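As a small illustration of the "enum class for better type safety" point, the sketch below contrasts an unscoped and a scoped enum. The type names and enumerator values are hypothetical and chosen only for the example; they are not the values used in the JDK.

```cpp
#include <cassert>
#include <type_traits>

// Unscoped enum: enumerators leak into the enclosing scope and convert
// implicitly to int, so accidental arithmetic or comparisons compile.
enum ToyLegacyStatus { LEGACY_NEW = 0, LEGACY_RUNNABLE = 1 };

// Scoped enum: enumerators must be qualified and there is no implicit
// conversion to int.
enum class ToyThreadStatus : int { NEW = 0, RUNNABLE = 1, TERMINATED = 2 };

static_assert(std::is_convertible<ToyLegacyStatus, int>::value,
              "unscoped enums convert to int implicitly");
static_assert(!std::is_convertible<ToyThreadStatus, int>::value,
              "scoped enums do not convert to int implicitly");

// Any conversion to an integer has to be spelled out at the call site.
inline int to_int(ToyThreadStatus s) { return static_cast<int>(s); }

inline bool is_runnable(ToyThreadStatus s) {
  return s == ToyThreadStatus::RUNNABLE;
}
```

The two `static_assert`s are checked at compile time, so the difference in conversion rules is verified by simply building the file.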
Original review thread: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-August/041509.html ------------- Commit messages: - 8252526: Remove excessive inclusion of jvmti.h and jvmtiExport.hpp Changes: https://git.openjdk.java.net/jdk/pull/1152/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1152&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8252526 Stats: 232 lines in 65 files changed: 125 ins; 34 del; 73 mod Patch: https://git.openjdk.java.net/jdk/pull/1152.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1152/head:pull/1152 PR: https://git.openjdk.java.net/jdk/pull/1152 From sspitsyn at openjdk.java.net Wed Nov 11 00:30:54 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Wed, 11 Nov 2020 00:30:54 GMT Subject: RFR: 8255787: Tag container tests that use cGroups with cgroups keyword In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 21:24:25 GMT, Harold Seigel wrote: > Please review this small change to add a cgroups keyword to tests that use cgroups. The fix was tested by running Mach5 container tests. Hi Harold, The fix looks good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1148 From dongbo at openjdk.java.net Wed Nov 11 01:43:56 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 11 Nov 2020 01:43:56 GMT Subject: RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic [v5] In-Reply-To: <1VK0__nxU327zmURyR3mVD402EmGZPIuzyjMXPP0Wyw=.48883f69-758d-4e69-8e19-363cb0c06177@github.com> References: <1VK0__nxU327zmURyR3mVD402EmGZPIuzyjMXPP0Wyw=.48883f69-758d-4e69-8e19-363cb0c06177@github.com> Message-ID: On Tue, 10 Nov 2020 08:38:28 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request incrementally with one additional commit since the last revision: >> >> use r6/r7 instead of scratch registers > > Thanks. I'm sorry that this took so long. Oh, I think I misunderstood this. Sorry for that. 
:) I think `c_rarg6` is more consistent and I am going to integrate this. @theRealAph Thanks for your review. ------------- PR: https://git.openjdk.java.net/jdk/pull/992 From dongbo at openjdk.java.net Wed Nov 11 01:55:58 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 11 Nov 2020 01:55:58 GMT Subject: Integrated: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 03:05:48 GMT, Dong Bo wrote: > Base64.encodeBlock stub is implemented for x86_64. > We can also do the same thing for aarch64 with SIMD LD3/ST4/TBL. > A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords. > > Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build. > Tests in test/jdk/java/util/Base64/* were run specifically to check the correctness of the implementation and passed. > > A JMH micro, Base64Encode.java, is added for performance testing. > With different input lengths (upper-bounded by parameter `maxNumBytes` in the JMH micro), > we witness ~4x improvement with long input and no regression with short input on Kunpeng916 and Kunpeng920. > > The Base64Encode.java JMH micro-benchmark results:
> Benchmark                      (maxNumBytes)  Mode  Cnt      Score   Error  Units
> # kunpeng 916, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10     31.564 ± 0.034  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     33.921 ± 0.362  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     38.015 ± 0.220  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     41.115 ± 0.281  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     42.161 ± 0.630  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     44.797 ± 0.849  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     46.013 ± 0.917  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     67.984 ± 0.777  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    174.494 ± 1.614  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    277.103 ± 0.306  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10   4261.018 ± 1.883  ns/op
>
> # kunpeng 916, default
> Base64Encode.testBase64Encode              1  avgt   10     31.710 ± 0.234  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     33.978 ± 0.305  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     40.059 ± 0.444  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     47.958 ± 0.328  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     49.017 ± 1.305  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     53.150 ± 0.769  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     55.418 ± 0.316  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     93.517 ± 0.391  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    494.809 ± 0.413  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    898.581 ± 0.944  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  16464.411 ± 7.582  ns/op
>
> # kunpeng 920, intrinsic
> Base64Encode.testBase64Encode              1  avgt   10     17.494 ± 0.012  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     21.023 ± 0.169  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     25.772 ± 0.138  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     30.121 ± 0.347  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     31.591 ± 0.238  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     32.728 ± 0.395  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     35.110 ± 0.215  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     48.621 ± 0.314  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    113.391 ± 0.554  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    180.749 ± 0.193  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10   3273.961 ± 5.706  ns/op
>
> # kunpeng 920, default
> Base64Encode.testBase64Encode              1  avgt   10     17.428 ± 0.037  ns/op
> Base64Encode.testBase64Encode              2  avgt   10     20.926 ± 0.155  ns/op
> Base64Encode.testBase64Encode              3  avgt   10     25.466 ± 0.140  ns/op
> Base64Encode.testBase64Encode              6  avgt   10     32.526 ± 0.190  ns/op
> Base64Encode.testBase64Encode              7  avgt   10     34.132 ± 0.387  ns/op
> Base64Encode.testBase64Encode              9  avgt   10     36.685 ± 0.212  ns/op
> Base64Encode.testBase64Encode             10  avgt   10     38.117 ± 0.246  ns/op
> Base64Encode.testBase64Encode             48  avgt   10     62.447 ± 0.900  ns/op
> Base64Encode.testBase64Encode            512  avgt   10    377.275 ± 0.162  ns/op
> Base64Encode.testBase64Encode           1000  avgt   10    700.628 ± 0.509  ns/op
> Base64Encode.testBase64Encode          20000  avgt   10  13626.764 ± 3.448  ns/op
This pull request has now been integrated. Changeset: 8638cd9a Author: Dong Bo Committer: Fei Yang URL: https://git.openjdk.java.net/jdk/commit/8638cd9a Stats: 226 lines in 3 files changed: 226 ins; 0 del; 0 mod 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic Reviewed-by: aph ------------- PR: https://git.openjdk.java.net/jdk/pull/992 From dholmes at openjdk.java.net Wed Nov 11 05:15:02 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 11 Nov 2020 05:15:02 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> On Tue, 10 Nov 2020 23:24:24 GMT, Daniel D. Daugherty wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: > > resolve more robehn and coleenp comments. One change requested in relation to use of jint instead of size_t. One code simplification suggestion.
Thanks, David src/hotspot/share/runtime/synchronizer.cpp line 153: > 151: if (self->is_Java_thread()) { > 152: // A JavaThread must check for a safepoint/handshake and honor it. > 153: ObjectSynchronizer::chk_for_block_req(self->as_Java_thread(), "unlinking", I won't disapprove but this is a case where refactoring is IMO worse than code duplication. Logging parameters should not be a part of this API IMO. src/hotspot/share/runtime/synchronizer.cpp line 1228: > 1226: os::naked_short_sleep(999); // sleep for almost 1 second > 1227: } else { > 1228: os::naked_short_sleep(999); // sleep for almost 1 second So this block can now just be: > if (self->is_Java_thread()) { > ThreadBlockInVM tbivm(self->as_Java_thread()); > } > os::naked_short_sleep(999); // sleep for almost 1 second src/hotspot/share/runtime/synchronizer.cpp line 246: > 244: // > 245: // Start the ceiling with the estimate for one thread: > 246: jint _in_use_list_ceiling = AvgMonitorsPerThreadEstimate; Why is this a jint when you use size_t for its accessor and all the other sizes that you compare with the ceiling are also size_t? I'm not sure size_t is right to use in these cases (do we really expect different maximums on 32-bit versus 64-bit?) but it should be all or none IMO. ------------- Changes requested by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/642 From david.holmes at oracle.com Wed Nov 11 05:47:39 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 11 Nov 2020 15:47:39 +1000 Subject: RFR: 8253742: POSIX signal code cleanup [v5] In-Reply-To: <182nfzh-a6eiBfaCmrFfv-ZDxWbyRgUNBzzYR0huHO0=.b6029a06-1aaf-4ad9-aab1-4d83800daab4@github.com> References: <-uNIXq8h8NDGpzaNFVp4loICt2jh0EGev5EneflHsX4=.149b59bc-4ffc-4056-9ac1-82d3f4710ab6@github.com> <5GGhy0pxA33snyHk4-oAP608V87gagZwctV9FlwBTR4=.ea121adc-64bd-47a5-831a-eff9bb984237@github.com> <182nfzh-a6eiBfaCmrFfv-ZDxWbyRgUNBzzYR0huHO0=.b6029a06-1aaf-4ad9-aab1-4d83800daab4@github.com> Message-ID: <6cba1c0a-791b-bf50-3d9f-07c7a9359bd0@oracle.com> On 11/11/2020 3:04 am, Gerard Ziemski wrote: > Many thanks Thomas & David for the lesson on the header files! > > If I understand it correctly, then we need to add `#include "utilities/globalDefinitions.hpp"` to signals_posix.hpp, correct? It is already there.
#ifndef OS_POSIX_SIGNALS_POSIX_HPP
#define OS_POSIX_SIGNALS_POSIX_HPP
#include "memory/allocation.hpp"
#include "utilities/globalDefinitions.hpp"
#include <signal.h>
So you can just delete the include of signal.h David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/636 > From dholmes at openjdk.java.net Wed Nov 11 07:24:07 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 11 Nov 2020 07:24:07 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v18] In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 14:16:22 GMT, Maurizio Cimadamore wrote: >> This patch contains the changes associated with the first incubation round of the foreign linker access API incubation >> (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). >> >> The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code.
In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. >> >> Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reasons, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. >> >> A big thank to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank to Paul Sandoz, who provided many insights (often by trying the bits first hand). >> >> Thanks >> Maurizio >> >> Webrev: >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev >> >> Javadoc: >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html >> >> Specdiff (relative to [3]): >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html >> >> CSR: >> >> https://bugs.openjdk.java.net/browse/JDK-8254232 >> >> >> >> ### API Changes >> >> The API changes are actually rather slim: >> >> * `LibraryLookup` >> * This class allows clients to lookup symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then lookup symbols on that library. >> * `FunctionDescriptor` >> * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. >> * `CLinker` >> * This is the real star of the show. 
A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. >> * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classfication information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. >> * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back. >> * `NativeScope` >> * This is an helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resource* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only an usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. >> * `MemorySegment` >> * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. >> >> ### Safety >> >> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. 
For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API). >> >> ### Implementation changes >> >> The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. the same library cannot be loaded by different classloaders). >> >> As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. >> >> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. >> >> The main idea behind the foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function.
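As a rough illustration of the kind of recipe inference described above, the sketch below assigns scalar arguments to locations following the System V x86-64 convention: the first six integer or pointer arguments go in `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`, the first eight floating-point arguments go in `xmm0` to `xmm7`, and anything left over is passed on the stack in 8-byte slots. The `SysVSketch` class and its `ArgKind` enum are hypothetical names made up for this example; they are not part of the foreign linker API or of the actual `CallArranger` code, and struct classification is ignored entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of scalar argument classification.
// This is NOT the JDK's CallArranger; it only illustrates the idea.
class SysVSketch {
    // Assumed classification attribute: integral/pointer vs floating point.
    enum ArgKind { INTEGER, FLOAT }

    // Integer argument registers of the System V x86-64 calling convention.
    private static final String[] INT_REGS = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    // xmm0..xmm7 are available for floating-point arguments.
    private static final int NUM_VECTOR_REGS = 8;

    // Returns, for each argument, the register or stack slot it would occupy.
    static List<String> assign(List<ArgKind> args) {
        List<String> locations = new ArrayList<>();
        int ints = 0;
        int floats = 0;
        int stackOffset = 0;
        for (ArgKind kind : args) {
            if (kind == ArgKind.INTEGER && ints < INT_REGS.length) {
                locations.add(INT_REGS[ints++]);
            } else if (kind == ArgKind.FLOAT && floats < NUM_VECTOR_REGS) {
                locations.add("xmm" + floats++);
            } else {
                // Registers of the required class are exhausted: spill to the stack.
                locations.add("stack+" + stackOffset);
                stackOffset += 8;
            }
        }
        return locations;
    }

    public static void main(String[] args) {
        // Models a native signature such as (long, double, long).
        System.out.println(assign(List.of(ArgKind.INTEGER, ArgKind.FLOAT, ArgKind.INTEGER)));
    }
}
```

The point of the sketch is only that the same 64-bit value ends up in a different place depending on whether it is integral or floating point, which is why a layout's size alone is not enough to drive the classification.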
>> >> This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. >> >> For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating point values are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. >> >> A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor`, the set of bindings associated with the downcall/upcall. >> >> At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: >> >> * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings.
For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them in their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the more "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. >> >> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). >> >> * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performances that are on par with JNI. >> >> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost perfomances (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). >> >> Again, for more readings on the internals of the foreign linker support, please refer to [5]. 
>> >> #### Test changes >> >> Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verify that the support works by checking that the native call returned the results we expected, these tests aims at checking that the set of bindings generated by the call arranger is correct. This also mean that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. >> >> Some additional microbenchmarks have been added to compare the performances of downcall/upcall with JNI. 
>> >> [1] - https://openjdk.java.net/jeps/389 >> [2] - https://openjdk.java.net/jeps/393 >> [3] - https://git.openjdk.java.net/jdk/pull/548 >> [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md >> [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html > > Maurizio Cimadamore has updated the pull request incrementally with 10 additional commits since the last revision: > > - Merge pull request #7 from JornVernee/Additional_Review_Comments > > Additional review comments > - Revert System.java changes > - Set copyright year for added files to 2020 > - Check result of AttachCurrentThread > - Sort includes alphabetically > - Relax ret_addr_offset() assert > - Extra space after if > - remove excessive asserts in ProgrammableInvoker::invoke_native > - Remove os::is_MP() check > - remove blank line in thread.hpp Updates seem fine to me. Thanks. Adding approval for hotspot parts. src/hotspot/share/prims/universalUpcallHandler.cpp line 55: > 53: JavaVM_ *vm = (JavaVM *)(&main_vm); > 54: jint result = vm->functions->AttachCurrentThread(vm, (void**) &p_env, nullptr); > 55: guarantee(result == JNI_OK, "Could not attach thread for upcall. JNI error code: %d", result); I'm assuming you don't have a mechanism for conveying an error back to the original entry point used by the native thread? Attaching an existing thread should only fail if we run out of C-Heap, so we're on the brink of aborting anyway, but still the guarantee here is not ideal. ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/634 From rehn at openjdk.java.net Wed Nov 11 08:29:01 2020 From: rehn at openjdk.java.net (Robbin Ehn) Date: Wed, 11 Nov 2020 08:29:01 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: On Tue, 10 Nov 2020 23:24:24 GMT, Daniel D. Daugherty wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: > > resolve more robehn and coleenp comments. src/hotspot/share/runtime/synchronizer.cpp line 1226: > 1224: ThreadBlockInVM tbivm(self->as_Java_thread()); > 1225: } > 1226: os::naked_short_sleep(999); // sleep for almost 1 second Hi Dan, you need to be blocked while sleeping, otherwise you are blocking safepoints for "almost 1 second". So previous code was correct. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From shade at openjdk.java.net Wed Nov 11 09:39:15 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 11 Nov 2020 09:39:15 GMT Subject: RFR: 8253525: Implement getInstanceSize/sizeOf intrinsics [v7] In-Reply-To: References: Message-ID: > This is fork off the SizeOf JEP, JDK-8249196. There is already the entry point in JDK that can use the intrinsic like this: `Instrumentation.getInstanceSize`. 
Therefore, we can implement the C1/C2 intrinsic now, hook it up to `Instrumentation`, and let the tools use that fast path today. > > With this patch, JOL can get close to the `deepSizeOf` implementation from the SizeOf JEP. > > Example performance improvements for sizing up a custom linked list:
>
> Benchmark                     (size)  Mode  Cnt       Score      Error  Units
>
> # Default
> LinkedChainBench.linkedChain       1  avgt    5     705.835 ±    8.051  ns/op
> LinkedChainBench.linkedChain      10  avgt    5    3148.874 ±   37.856  ns/op
> LinkedChainBench.linkedChain     100  avgt    5   28693.256 ±  142.254  ns/op
> LinkedChainBench.linkedChain    1000  avgt    5  290161.590 ± 4594.631  ns/op
>
> # Instrumentation attached, no intrinsics
> LinkedChainBench.linkedChain       1  avgt    5     159.659 ±   19.238  ns/op
> LinkedChainBench.linkedChain      10  avgt    5     717.659 ±   22.540  ns/op
> LinkedChainBench.linkedChain     100  avgt    5    7739.394 ±  111.683  ns/op
> LinkedChainBench.linkedChain    1000  avgt    5   80724.238 ± 2887.794  ns/op
>
> # Instrumentation attached, new intrinsics
> LinkedChainBench.linkedChain       1  avgt    5      95.254 ±    0.808  ns/op
> LinkedChainBench.linkedChain      10  avgt    5     261.564 ±    8.524  ns/op
> LinkedChainBench.linkedChain     100  avgt    5    3367.192 ±   21.128  ns/op
> LinkedChainBench.linkedChain    1000  avgt    5   34148.851 ±  373.080  ns/op
Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Drop parentheses around comparisons ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/650/files - new: https://git.openjdk.java.net/jdk/pull/650/files/1b7290a3..89881e9f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=650&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=650&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/650.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/650/head:pull/650 PR: https://git.openjdk.java.net/jdk/pull/650 From jvernee at openjdk.java.net Wed Nov 11 11:19:04 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Wed, 11 Nov 2020 11:19:04 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v18] In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 07:18:33 GMT, David Holmes wrote: >> Maurizio Cimadamore has updated the pull request incrementally with 10 additional commits since the last revision: >> >> - Merge pull request #7 from JornVernee/Additional_Review_Comments >> >> Additional review comments >> - Revert System.java changes >> - Set copyright year for added files to 2020 >> - Check result of AttachCurrentThread >> - Sort includes alphabetically >> - Relax ret_addr_offset() assert >> - Extra space after if >> - remove excessive asserts in ProgrammableInvoker::invoke_native >> - Remove os::is_MP() check >> - remove blank line in thread.hpp > > src/hotspot/share/prims/universalUpcallHandler.cpp line 55: > >> 53: JavaVM_ *vm = (JavaVM *)(&main_vm); >> 54: jint result = vm->functions->AttachCurrentThread(vm, (void**) &p_env, nullptr); >> 55: guarantee(result == JNI_OK, "Could not attach thread for upcall. JNI error code: %d", result); > > I'm assuming you don't have a mechanism for conveying an error back to the original entry point used by the native thread?
Attaching an existing thread should only fail if we run out of C-Heap, so we're on the brink of aborting anyway, but still the guarantee here is not ideal. Yeah, we have no idea where/how to report such an error. The native code might just call a function through a function pointer like `void(*)(void)` i.e. no place to return an error code. Also, since it's a previously non-attached thread, there is no JavaFrameAnchor that gives us a last Java frame we could jump back to and throw an exception. Is there a more explicit way to exit with an error, other than a `guarantee`, that you would prefer here? ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From david.holmes at oracle.com Wed Nov 11 11:36:43 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 11 Nov 2020 21:36:43 +1000 Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v18] In-Reply-To: References: Message-ID: <718dab9a-2939-68a6-9c79-2414a7296ffe@oracle.com> On 11/11/2020 9:19 pm, Jorn Vernee wrote: > On Wed, 11 Nov 2020 07:18:33 GMT, David Holmes wrote: > >>> Maurizio Cimadamore has updated the pull request incrementally with 10 additional commits since the last revision: >>> >>> - Merge pull request #7 from JornVernee/Additional_Review_Comments >>> >>> Additional review comments >>> - Revert System.java changes >>> - Set copyright year for added files to 2020 >>> - Check result of AttachCurrentThread >>> - Sort includes alphabetically >>> - Relax ret_addr_offset() assert >>> - Extra space after if >>> - remove excessive asserts in ProgrammableInvoker::invoke_native >>> - Remove os::is_MP() check >>> - remove blank line in thread.hpp >> >> src/hotspot/share/prims/universalUpcallHandler.cpp line 55: >> >>> 53: JavaVM_ *vm = (JavaVM *)(&main_vm); >>> 54: jint result = vm->functions->AttachCurrentThread(vm, (void**) &p_env, nullptr); >>> 55: guarantee(result == JNI_OK, "Could not attach thread for upcall.
JNI error code: %d", result); >> >> I'm assuming you don't have a mechanism for conveying an error back to the original entry point used by the native thread? Attaching an existing thread should only fail if we run out of C-Heap, so we're on the brink of aborting anyway, but still the guarantee here is not ideal. > > Yeah, we have no idea where/how to report such an error. The native code might just call a function through a function pointer like `void(*)(void)` i.e. no place to return an error code. Also, since it's a previously non-attached thread, there is no JavaFrameAnchor that gives us a last Java frame we could jump back to and throw an exception. Yeah, not a solvable problem when you can transparently attach to the VM. > Is there a more explicit way to exit with an error, other than a `guarantee`, that you would prefer here? Perhaps vm_exit_out_of_memory, under the assumption that running out of memory is the only reason attach could fail. But really I don't think it makes much practical difference. Thanks, David ----- > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/634 > From mcimadamore at openjdk.java.net Wed Nov 11 11:40:12 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Wed, 11 Nov 2020 11:40:12 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v27] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the third incubation round of the foreign memory access API incubation (see JEP 393 [1]).
This iteration focuses on improving the usability of the API in 3 main ways: > > * first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from multiple threads > * second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee that the memory will be deallocated, eventually > * third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class has been added, which defines several useful dereference routines; these are really just thin wrappers around memory access var handles, but they make the barrier to entry for using this API somewhat lower. > > A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit of dereference. > > This has all changed as per this API refresh; now a `MemoryAddress` is just a dumb carrier which wraps a pair of object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is concerned. You cannot dereference memory if you don't have a segment. This improves usability in a number of ways - first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that; otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling `MemoryAddress::asSegmentRestricted`). > > A list of the API, implementation and test changes is provided below.
If you have any questions, or need more detailed explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide the feedback required. > > A big thank to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segment would not have been possible; also I'd like to thank Paul Sandoz, whose insights on API design have been very helpful in this journey. > > Thanks > Maurizio > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254163 > > > > ### API Changes > > * `MemorySegment` > * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below) > * added a no-arg factory for a native restricted segment representing entire native heap > * rename `withOwnerThread` to `handoff` > * add new `share` method, to create shared segments > * add new `registerCleaner` method, to register a segment against a cleaner > * add more helpers to create arrays from a segment e.g. `toIntArray` > * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors) > * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`) > * `MemoryAddress` > * drop `segment` accessor > * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative to a given segment > * `MemoryAccess` > * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a carrier is one of the usual suspect (a Java primitive, minus `boolean`); the access mode can be simple (e.g. 
access base address of given segment), or indexed, in which case the accessor takes a segment and either a low-level byte offset, or a high-level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs. `getByteAtOffset` vs `getByteAtIndex`). > * `MemoryHandles` > * drop `withOffset` combinator > * drop `withStride` combinator > * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which it is easy to derive all the other handles using plain var handle combinators. > * `Addressable` > * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2] to add more implementations. Clients can largely ignore this interface, which comes in really handy when defining native bindings with tools like `jextract`. > * `MemoryLayouts` > * A new layout, for machine addresses, has been added to the mix. > > > > ### Implementation changes > > There are two main things to discuss here: support for shared segments, and the general simplification of the memory access var handle support. > > #### Shared segments > > The support for shared segments cuts pretty deep into the VM. Support for shared segments is notoriously hard to achieve, at least in a way that guarantees optimal access performance. This is caused by the fact that, if a segment is shared, it would be possible for a thread to close it while another is accessing it. > > After considering several options (see [3]), we zeroed in on an approach which is inspired by a happy idea that Andrew Haley had (and that he reminded me of at this year's OpenJDK committer workshop - thanks!). The idea is that if we could *freeze* the world (e.g. with a GC pause), while a segment is closed, we could then prevent segments from being accessed concurrently with a close operation.
For this to work, it is crucial that no GC safepoints can occur between a segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop right before an unsafe call). It also relies on the fact that hotspot/C2 should not be able to propagate loads across safepoints. > > Sadly, neither of these conditions seems to hold in the current implementation, so we needed to resort to a bit of creativity. First, we noted that, if we could mark so-called *scoped* methods with an annotation, it would be very simple to check whether a thread was in the middle of a scoped method when we stopped the world for a close operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312 [4]). > > The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread, thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds. Unfortunately the machinery for async exceptions is a bit fragile (e.g. not all the code in hotspot checks for async exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that another thread is accessing the segment being closed. > > As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a bug in the user code, likely arising from lack of synchronization, and should be treated as such. > > In terms of gritty implementation, we needed to centralize memory access routines in a single place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in addition, provided a liveness check.
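To make the "close might fail" strategy a bit more concrete, here is a minimal sketch in plain Java. It is not the JDK implementation: the class `SharedScopeSketch`, its state constants and the `inFlight` counter are hypothetical names, and the counter is a deliberately simple stand-in for the thread-local handshake that the VM performs. It only shows the contract described above: every access runs a liveness check first, and `close` fails if it finds an access in progress.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Hypothetical sketch of a shared scope; not the JDK's MemoryScope.
class SharedScopeSketch {
    private static final int ALIVE = 0, CLOSING = 1, CLOSED = 2;

    private final AtomicInteger state = new AtomicInteger(ALIVE);
    // Assumption: a counter replaces the VM handshake that looks for
    // threads currently executing a scoped access.
    private final AtomicInteger inFlight = new AtomicInteger();
    private final long[] memory = new long[4];

    // Runs 'body' as a checked access: liveness is tested before the body runs.
    long access(LongSupplier body) {
        inFlight.incrementAndGet();
        try {
            if (state.get() != ALIVE) {
                throw new IllegalStateException("segment is not alive");
            }
            return body.getAsLong();
        } finally {
            inFlight.decrementAndGet();
        }
    }

    long get(int index) {
        return access(() -> memory[index]);
    }

    // Fails, and leaves the scope alive, if an access is in progress.
    void close() {
        if (!state.compareAndSet(ALIVE, CLOSING)) {
            throw new IllegalStateException("already closed or closing");
        }
        if (inFlight.get() != 0) {
            state.set(ALIVE);
            throw new IllegalStateException("segment is being accessed");
        }
        state.set(CLOSED);
    }

    boolean isAlive() {
        return state.get() != CLOSED;
    }

    public static void main(String[] args) {
        SharedScopeSketch scope = new SharedScopeSketch();
        System.out.println("read: " + scope.get(0));
        try {
            // Closing from inside an access simulates a racing close deterministically.
            scope.access(() -> { scope.close(); return 0L; });
        } catch (IllegalStateException e) {
            System.out.println("close failed: " + e.getMessage());
        }
        scope.close(); // no access in progress, so this succeeds
        try {
            scope.get(0);
        } catch (IllegalStateException e) {
            System.out.println("access failed: " + e.getMessage());
        }
    }
}
```

The demo closes from inside an access on the same thread purely to get a deterministic "concurrent" access. Note that a counter like this costs an atomic update on every access, which is presumably the kind of per-access overhead the handshake-based approach is meant to avoid.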
This way we could mark all these routines with the special `@Scoped` annotation, which tells the VM that something important is going on. > > To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`. This class contains all the main memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the access coordinates, a scope object, which is tested before access. A reachability fence is also thrown into the mix to make sure that the scope is kept alive during access (which is important when registering segments against cleaners). > > Of course, to make memory access safe, memory access var handles, byte buffer var handles, and the byte buffer API should use the new `ScopedMemoryAccess` class instead of `Unsafe`, so that a liveness check can be triggered (in case a scope is present). > > `ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the handshake completed successfully. > > `MemoryScope` (now significantly simplified from what we had before) has two implementations, one for confined segments and one for shared segments; the main difference between the two is what happens when the scope is closed; a confined segment sets a boolean flag to false, and returns, whereas a shared segment goes into a `CLOSING` state, then starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive` will still return `true`, while the liveness check upon memory access will fail. > > #### Memory access var handles overhaul > > The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and `long`, all other access types could be derived from this basic var handle form.
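As an illustration of "derive everything from one basic form", the sketch below uses only long-standing `java.lang.invoke` APIs rather than the incubating ones. A byte buffer view var handle with coordinates `(ByteBuffer, int byteOffset)` stands in for the basic `(MemorySegment, long)` handle, and it is adapted through its method handle view with `MethodHandles.filterArguments` and `MethodHandles.insertArguments`. The class `DerivedAccessSketch` and its helpers are made up for this example; the patch itself works with var handle combinators, which this sketch only approximates.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: one basic accessor, other access shapes derived from it.
class DerivedAccessSketch {
    // Basic form: coordinates are (ByteBuffer, int byteOffset).
    static final VarHandle INT_AT_OFFSET =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);

    // Method handle view of the plain read: (ByteBuffer, int) -> int.
    static final MethodHandle GET_AT_OFFSET =
            INT_AT_OFFSET.toMethodHandle(VarHandle.AccessMode.GET);

    // Converts a logical int index into a byte offset.
    static int scale(int index) {
        return Math.multiplyExact(index, Integer.BYTES);
    }

    // Indexed access: the index argument is filtered through scale().
    static MethodHandle getterAtIndex() {
        try {
            MethodHandle scale = MethodHandles.lookup().findStatic(
                    DerivedAccessSketch.class, "scale",
                    MethodType.methodType(int.class, int.class));
            return MethodHandles.filterArguments(GET_AT_OFFSET, 1, scale);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Struct-field-like access: a fixed byte offset is bound into the handle.
    static MethodHandle getterAtFixedOffset(int byteOffset) {
        return MethodHandles.insertArguments(GET_AT_OFFSET, 1, byteOffset);
    }

    // Convenience wrapper so callers do not have to deal with Throwable.
    static int read(MethodHandle getter, Object... args) {
        try {
            return (int) getter.invokeWithArguments(args);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        INT_AT_OFFSET.set(buf, 8, 42);                          // write at byte offset 8
        System.out.println(read(getterAtIndex(), buf, 2));      // index 2 -> offset 8
        System.out.println(read(getterAtFixedOffset(8), buf));  // offset bound to 8
    }
}
```

Both derived getters end up calling the same basic accessor; only the way the offset coordinate is produced differs, which is the point the paragraph above makes about structural access handles.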
> > This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g. additional offset is injected into a base memory access var handle. > > This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators, which previously needed low-level access on the innards of the memory access var handle. All that code is now gone. > > #### Test changes > > Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the microbenchmarks also needed some tweaks - and some of them were also updated to also test performance in the shared segment case. > > [1] - https://openjdk.java.net/jeps/393 > [2] - https://openjdk.java.net/jeps/389 > [3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html > [4] - https://openjdk.java.net/jeps/312 Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: - Merge branch 'master' into 8254162 - Add more output in TestHandhsake.java - Further improve output of TestHandshake - Improve debugging output of TestHandhsake - Remove endianness-aware byte getter/setter in MemoryAccess Remove index-based version of byte getter/setter in MemoryAccess - Fix post-merge issues caused by 8219014 - Merge branch 'master' into 8254162 - Addess remaining feedback from @AlanBateman and @mrserb - Address comments from @AlanBateman - Merge branch 'master' into 8254162 - ... 
and 24 more: https://git.openjdk.java.net/jdk/compare/432c387e...8444c633 ------------- Changes: https://git.openjdk.java.net/jdk/pull/548/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=26 Stats: 7600 lines in 82 files changed: 4791 ins; 1590 del; 1219 mod Patch: https://git.openjdk.java.net/jdk/pull/548.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/548/head:pull/548 PR: https://git.openjdk.java.net/jdk/pull/548 From mcimadamore at openjdk.java.net Wed Nov 11 11:50:14 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Wed, 11 Nov 2020 11:50:14 GMT Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v28] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the third incubation round of the foreign memory access API (see JEP 393 [1]). This iteration focuses on improving the usability of the API in three main ways: > > * first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from multiple threads > * second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee that the memory will be deallocated, eventually > * third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class has been added, which defines several useful dereference routines; these are really just thin wrappers around memory access var handles, but they make the barrier to entry for using this API somewhat lower. > > A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit of dereference.
> > This has all changed with this API refresh; now a `MemoryAddress` is just a dumb carrier which wraps a pair of object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is concerned. You cannot dereference memory if you don't have a segment. This improves usability in a number of ways - first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that; otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling `MemoryAddress::asSegmentRestricted`). > > A list of the API, implementation and test changes is provided below. If you have any questions, or need more detailed explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide the feedback required. > > A big thanks to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segments would not have been possible; also I'd like to thank Paul Sandoz, whose insights on API design have been very helpful in this journey.
> > Thanks > Maurizio > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254163 > > > > ### API Changes > > * `MemorySegment` > * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below) > * add a no-arg factory for a native restricted segment representing the entire native heap > * rename `withOwnerThread` to `handoff` > * add new `share` method, to create shared segments > * add new `registerCleaner` method, to register a segment against a cleaner > * add more helpers to create arrays from a segment e.g. `toIntArray` > * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors) > * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`) > * `MemoryAddress` > * drop `segment` accessor > * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative to a given segment > * `MemoryAccess` > * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a carrier is one of the usual suspects (a Java primitive, minus `boolean`); the access mode can be simple (e.g. access the base address of a given segment), or indexed, in which case the accessor takes a segment and either a low-level byte offset, or a high-level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs. `getByteAtOffset` vs. `getByteAtIndex`).
> * `MemoryHandles` > * drop `withOffset` combinator > * drop `withStride` combinator > * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which it is easy to derive all the other handles using plain var handle combinators. > * `Addressable` > * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2], to add more implementations. Clients can largely ignore this interface, which comes in really handy when defining native bindings with tools like `jextract`. > * `MemoryLayouts` > * A new layout, for machine addresses, has been added to the mix. > > > > ### Implementation changes > > There are two main things to discuss here: support for shared segments, and the general simplification of the memory access var handle support. > > #### Shared segments > > The support for shared segments cuts pretty deep into the VM. Support for shared segments is notoriously hard to achieve, at least in a way that guarantees optimal access performance. This is caused by the fact that, if a segment is shared, it would be possible for a thread to close it while another is accessing it. > > After considering several options (see [3]), we zeroed in on an approach inspired by a happy idea that Andrew Haley had (and that he reminded me of at this year's OpenJDK committer workshop - thanks!). The idea is that if we could *freeze* the world (e.g. with a GC pause) while a segment is closed, we could then prevent segments from being accessed concurrently with a close operation. For this to work, it is crucial that no GC safepoints can occur between a segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop right before an unsafe call).
It also relies on the fact that hotspot/C2 should not be able to propagate loads across safepoints. > > Sadly, neither of these conditions seems to hold in the current implementation, so we needed to resort to a bit of creativity. First, we noted that, if we could mark so-called *scoped* methods with an annotation, it would be very simple to check whether a thread was in the middle of a scoped method when we stopped the world for a close operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312 [4]). > > The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread, thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds. Unfortunately the machinery for async exceptions is a bit fragile (e.g. not all the code in hotspot checks for async exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that another thread is accessing the segment being closed. > > As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a bug in the user code, likely arising from lack of synchronization, and should be treated as such. > > In terms of gritty implementation, we needed to centralize memory access routines in a single place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in addition, provided a liveness check. This way we could mark all these routines with the special `@Scoped` annotation, which tells the VM that something important is going on. > > To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`.
This class contains all the main memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the access coordinates, a scope object, which is tested before access. A reachability fence is also thrown into the mix to make sure that the scope is kept alive during access (which is important when registering segments against cleaners). > > Of course, to make memory access safe, memory access var handles, byte buffer var handles, and the byte buffer API should use the new `ScopedMemoryAccess` class instead of `Unsafe`, so that a liveness check can be triggered (in case a scope is present). > > `ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the handshake completed successfully. > > `MemoryScope` (now significantly simplified from what we had before) has two implementations, one for confined segments and one for shared segments; the main difference between the two is what happens when the scope is closed; a confined segment sets a boolean flag to false, and returns, whereas a shared segment goes into a `CLOSING` state, then starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive` will still return `true`, while the liveness check upon memory access will fail. > > #### Memory access var handles overhaul > > The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and `long`, all other access types could be derived from this basic var handle form. > > This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g.
additional offset is injected into a base memory access var handle. > > This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators, which previously needed low-level access on the innards of the memory access var handle. All that code is now gone. > > #### Test changes > > Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the microbenchmarks also needed some tweaks - and some of them were also updated to also test performance in the shared segment case. > > [1] - https://openjdk.java.net/jeps/393 > [2] - https://openjdk.java.net/jeps/389 > [3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html > [4] - https://openjdk.java.net/jeps/312 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Invert condition in memory access var handle `withInvokeBehavior' ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/548/files - new: https://git.openjdk.java.net/jdk/pull/548/files/8444c633..eae57b4d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=27 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=26-27 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/548.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/548/head:pull/548 PR: https://git.openjdk.java.net/jdk/pull/548 From ihse at openjdk.java.net Wed Nov 11 12:18:56 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Wed, 11 Nov 2020 12:18:56 GMT Subject: RFR: 8252526: Remove excessive inclusion of jvmti.h and jvmtiExport.hpp In-Reply-To: References: Message-ID: 
<6ZCqzq3jEjC-XId2G8FgW268uQvrUISUrrcf6-FIEkM=.c4362162-9b7e-49ab-ba2d-08c76d1e872d@github.com> On Wed, 11 Nov 2020 00:03:56 GMT, Ioi Lam wrote: > jvmti.h is included 905 times and jvmtiExport.hpp is included 776 times (out of 971 hotspot .o files). Most of these are unnecessarily included by the following 3 popular header files: > > * javaClasses.hpp: `java_lang_Class::ThreadStatus` (which depends on jvmti.h) is rarely used. Move this type to a separate header file. The enum is also changed to an enum class for better type safety. > * os.hpp: No need to include jvmti.h. Use forward declaration for `struct jvmtiTimerInfo;` instead. > * thread.hpp: No need to include jvmtiExport.hpp. Use forward declaration for `JvmtiSampledObjectAllocEventCollector` and `JvmtiVMObjectAllocEventCollector` instead. > > Tested with mach5 tier1-2 and build tiers 3-5. > > Note: Many files are changed, but most of the changes are adding a missing jvmtiExport.hpp include. > > Original review thread: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-August/041509.html Thank you for your continued effort to simplify building of hotspot. All efforts to clean up the include mess edge us slowly towards a more efficient hotspot build (which will improve build times overall, since most of the product depends on hotspot). ------------- Marked as reviewed by ihse (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1152 From kbarrett at openjdk.java.net Wed Nov 11 12:50:55 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 11 Nov 2020 12:50:55 GMT Subject: RFR: 8252526: Remove excessive inclusion of jvmti.h and jvmtiExport.hpp In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 00:03:56 GMT, Ioi Lam wrote: > jvmti.h is included 905 times and jvmtiExport.hpp is included 776 times (out of 971 hotspot .o files).
Most of these are unnecessarily included by the following 3 popular header files: > > * javaClasses.hpp: `java_lang_Class::ThreadStatus` (which depends on jvmti.h) is rarely used. Move this type to a separate header file. The enum is also changed to an enum class for better type safety. > * os.hpp: No need to include jvmti.h. Use forward declaration for `struct jvmtiTimerInfo;` instead. > * thread.hpp: No need to include jvmtiExport.hpp. Use forward declaration for `JvmtiSampledObjectAllocEventCollector` and `JvmtiVMObjectAllocEventCollector` instead. > > Tested with mach5 tier1-2 and build tiers 3-5. > > Note: Many files are changed, but most of the changes are adding a missing jvmtiExport.hpp include. > > Original review thread: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-August/041509.html Nice! ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1152 From tschatzl at openjdk.java.net Wed Nov 11 13:17:03 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 11 Nov 2020 13:17:03 GMT Subject: RFR: 8256181: Remove Allocation of old generation on alternate memory devices functionality Message-ID: Hi all, can I get reviews for this change that removes the "Allocation of old generation of Java heap on alternate memory devices" functionality introduced in JDK 12 with [JDK-8202286](https://bugs.openjdk.java.net/browse/JDK-8202286) due to being - not used by anyone - not maintained by anyone, i.e. several bugs open for a long time and bit rotting - requiring some workarounds for new feature development with respect to heap management All flags covered by this feature were experimental flags, so there are no additional procedural steps to take. I tried to remove all but a few minor cleanups that I thought useful, but of course this is very subjective.
Testing: hs-tier1-5 ------------- Commit messages: - Initial import Changes: https://git.openjdk.java.net/jdk/pull/1162/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1162&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256181 Stats: 2309 lines in 49 files changed: 4 ins; 2177 del; 128 mod Patch: https://git.openjdk.java.net/jdk/pull/1162.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1162/head:pull/1162 PR: https://git.openjdk.java.net/jdk/pull/1162 From dnsimon at openjdk.java.net Wed Nov 11 13:28:03 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 11 Nov 2020 13:28:03 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 11:50:54 GMT, Jorn Vernee wrote: >> src/hotspot/cpu/aarch64/universalUpcallHandler_aarch64.cpp line 99: >> >>> 97: if (thread == NULL) { >>> 98: JavaVM_ *vm = (JavaVM *)(&main_vm); >>> 99: vm -> functions -> AttachCurrentThreadAsDaemon(vm, &p_env, NULL); >> >> Style nit: don't put spaces around `->` operator. >> >> What is the context for this being called? It looks highly suspicious to just attach the current thread to the VM this way. > > The context is a thread that is spawned by native code doing an upcall. We need to attach the thread to the VM first in that case. Normally this would be handled by the calling code, but in our case the calling code doesn't know it's calling into Java. Where's the logic for the native thread to detach? We have a similar problem in libgraal. We have a [utility class](https://github.com/oracle/graal/blob/e4b9ab931940e1946f96f2015b937ba100384573/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java#L27-L32) for libgraal created threads (as opposed to VM created threads that call into libgraal) that call into the VM. 
The utility class takes care of [attaching](https://github.com/oracle/graal/blob/a913944a06425c25ccd6e4a90379938fcf7ea2cf/substratevm/src/com.oracle.svm.graal.hotspot.libgraal/src/com/oracle/svm/graal/hotspot/libgraal/LibGraalFeature.java#L749) and [detaching](https://github.com/oracle/graal/blob/a913944a06425c25ccd6e4a90379938fcf7ea2cf/substratevm/src/com.oracle.svm.graal.hotspot.libgraal/src/com/oracle/svm/graal/hotspot/libgraal/LibGraalFeature.java#L757) to/from the VM. ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From jvernee at openjdk.java.net Wed Nov 11 13:34:02 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Wed, 11 Nov 2020 13:34:02 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15] In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 13:25:32 GMT, Doug Simon wrote: >> The context is a thread that is spawned by native code doing an upcall. We need to attach the thread to the VM first in that case. Normally this would be handled by the calling code, but in our case the calling code doesn't know it's calling into Java. > > Where's the logic for the native thread to detach? We have a similar problem in libgraal. We have a [utility class](https://github.com/oracle/graal/blob/e4b9ab931940e1946f96f2015b937ba100384573/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java#L27-L32) for libgraal created threads (as opposed to VM created threads that call into libgraal) that call into the VM. The utility class takes care of [attaching](https://github.com/oracle/graal/blob/a913944a06425c25ccd6e4a90379938fcf7ea2cf/substratevm/src/com.oracle.svm.graal.hotspot.libgraal/src/com/oracle/svm/graal/hotspot/libgraal/LibGraalFeature.java#L749) and [detaching](https://github.com/oracle/graal/blob/a913944a06425c25ccd6e4a90379938fcf7ea2cf/substratevm/src/com.oracle.svm.graal.hotspot.libgraal/src/com/oracle/svm/graal/hotspot/libgraal/LibGraalFeature.java#L757) to/from the VM. 
I added a call to DetachCurrentThread here: https://github.com/openjdk/jdk/pull/634/commits/719224ca9dc70fce6d28885acfb362fee715ebbd#diff-c084afc373a6ce95010a480ddc5ab79d3cb759b80e46102c212c2cbc948e2303R65 ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From coleenp at openjdk.java.net Wed Nov 11 13:39:02 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 11 Nov 2020 13:39:02 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: On Tue, 10 Nov 2020 23:24:24 GMT, Daniel D. Daugherty wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: > > resolve more robehn and coleenp comments. Looks good! I don't have more comments blocking approval. ------------- Changes requested by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/642 From coleenp at openjdk.java.net Wed Nov 11 13:53:08 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 11 Nov 2020 13:53:08 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> Message-ID: On Wed, 11 Nov 2020 05:11:10 GMT, David Holmes wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> resolve more robehn and coleenp comments. > > src/hotspot/share/runtime/synchronizer.cpp line 246: > >> 244: // >> 245: // Start the ceiling with the estimate for one thread: >> 246: jint _in_use_list_ceiling = AvgMonitorsPerThreadEstimate; > > Why is this a jint when you use size_t for its accessor and all the other sizes that you compare with the ceiling are also size_t? > I'm not sure size_t is right to use in these cases (do we really expect different maximums on 32-bit versus 64-bit?) but it should be all or none IMO. Our int types are really confused. AvgMonitorsPerThreadEstimate is defined as an intx which is intptr_t and the range of it is 0..max_jint which is 0 .. 0x7fffffff . jint is long on windows (the problematic type) and int on unix. Since this is a new declaration, it probably should be something other than jint but what? At any rate, it should be declared as 'static'. 
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From coleenp at openjdk.java.net Wed Nov 11 13:59:04 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 11 Nov 2020 13:59:04 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: On Tue, 10 Nov 2020 23:24:24 GMT, Daniel D. Daugherty wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: > > resolve more robehn and coleenp comments. Marked as reviewed by coleenp (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dnsimon at openjdk.java.net Wed Nov 11 14:00:14 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 11 Nov 2020 14:00:14 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v15] In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 13:31:11 GMT, Jorn Vernee wrote: >> Where's the logic for the native thread to detach? We have a similar problem in libgraal. We have a [utility class](https://github.com/oracle/graal/blob/e4b9ab931940e1946f96f2015b937ba100384573/compiler/src/org.graalvm.compiler.core/src/org/graalvm/compiler/core/GraalServiceThread.java#L27-L32) for libgraal created threads (as opposed to VM created threads that call into libgraal) that call into the VM. 
The utility class takes care of [attaching](https://github.com/oracle/graal/blob/a913944a06425c25ccd6e4a90379938fcf7ea2cf/substratevm/src/com.oracle.svm.graal.hotspot.libgraal/src/com/oracle/svm/graal/hotspot/libgraal/LibGraalFeature.java#L749) and [detaching](https://github.com/oracle/graal/blob/a913944a06425c25ccd6e4a90379938fcf7ea2cf/substratevm/src/com.oracle.svm.graal.hotspot.libgraal/src/com/oracle/svm/graal/hotspot/libgraal/LibGraalFeature.java#L757) to/from the VM. > > I added a call to DetachCurrentThread here: https://github.com/openjdk/jdk/pull/634/commits/719224ca9dc70fce6d28885acfb362fee715ebbd#diff-c084afc373a6ce95010a480ddc5ab79d3cb759b80e46102c212c2cbc948e2303R65 Ok. That makes for high overhead for each upcall on a non-attached native thread. I assume that's an edge case not worth optimizing? ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From thomas.stuefe at gmail.com Wed Nov 11 14:06:52 2020 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Wed, 11 Nov 2020 15:06:52 +0100 Subject: RFR: 8256181: Remove Allocation of old generation on alternate memory devices functionality In-Reply-To: References: Message-ID: Hi Thomas, I think this makes sense. Just a question: what are your thoughts about JEP 316? Do you consider AllocateHeapAt similarly unused? Cheers, Thomas On Wed, Nov 11, 2020 at 2:17 PM Thomas Schatzl wrote: > Hi all, > > can I get reviews for this change that removes the "Allocation of old > generation of Java heap on alternate memory devices" functionality > introduced in JDK 12 with [JDK-8202286]( > https://bugs.openjdk.java.net/browse/JDK-8202286) due to being > > - not used by anyone > - not maintained by anyone, i.e. several bugs open for a long time and bit > rotting > - requiring some workarounds for new feature development with respect to heap > management > > All flags covered by this feature were experimental flags, so there are no > additional procedural steps to take.
> > I tried to remove all but a few minor cleanups that I thought useful, but > of course this is very subjective. > > Testing: hs-tier1-5 > > ------------- > > Commit messages: > - Initial import > > Changes: https://git.openjdk.java.net/jdk/pull/1162/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1162&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8256181 > Stats: 2309 lines in 49 files changed: 4 ins; 2177 del; 128 mod > Patch: https://git.openjdk.java.net/jdk/pull/1162.diff > Fetch: git fetch https://git.openjdk.java.net/jdk > pull/1162/head:pull/1162 > > PR: https://git.openjdk.java.net/jdk/pull/1162 > From vlivanov at openjdk.java.net Wed Nov 11 14:21:14 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 11 Nov 2020 14:21:14 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v18] In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 14:16:22 GMT, Maurizio Cimadamore wrote: >> This patch contains the changes associated with the first incubation round of the foreign linker access API incubation >> (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). >> >> The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. >> >> Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reasons, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. 
>> >> A big thank to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank to Paul Sandoz, who provided many insights (often by trying the bits first hand). >> >> Thanks >> Maurizio >> >> Webrev: >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev >> >> Javadoc: >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html >> >> Specdiff (relative to [3]): >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html >> >> CSR: >> >> https://bugs.openjdk.java.net/browse/JDK-8254232 >> >> >> >> ### API Changes >> >> The API changes are actually rather slim: >> >> * `LibraryLookup` >> * This class allows clients to lookup symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then lookup symbols on that library. >> * `FunctionDescriptor` >> * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. >> * `CLinker` >> * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. 
>> * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classfication information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. >> * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back. >> * `NativeScope` >> * This is an helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resource* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only an usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. >> * `MemorySegment` >> * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. >> >> ### Safety >> >> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as it's the case for other restricted method in the foreign memory API). >> >> ### Implementation changes >> >> The Java changes associated with `LibraryLookup` are relative straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. 
same library cannot be loaded by different classloaders). >> >> As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign meory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. >> >> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of this changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. >> >> The main idea behind foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function. >> >> This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. >> >> For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. 
that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating points are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. >> >> A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall. >> >> At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: >> >> * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them in their expected registers/stack slots (note that each position in the buffer corresponds to a different register).
This is the most general invocation mode, the more "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. >> >> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). >> >> * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performances that are on par with JNI. >> >> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost perfomances (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). >> >> Again, for more readings on the internals of the foreign linker support, please refer to [5]. >> >> #### Test changes >> >> Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. 
There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verify that the support works by checking that the native call returned the results we expected, these tests aims at checking that the set of bindings generated by the call arranger is correct. This also mean that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. >> >> Some additional microbenchmarks have been added to compare the performances of downcall/upcall with JNI. >> >> [1] - https://openjdk.java.net/jeps/389 >> [2] - https://openjdk.java.net/jeps/393 >> [3] - https://git.openjdk.java.net/jdk/pull/548 >> [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md >> [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html > > Maurizio Cimadamore has updated the pull request incrementally with 10 additional commits since the last revision: > > - Merge pull request #7 from JornVernee/Additional_Review_Comments > > Additional review comments > - Revert System.java changes > - Set copyright year for added files to 2020 > - Check result of AttachCurrentThread > - Sort includes alphabetically > - Relax ret_addr_offset() assert > - Extra space after if > - remove excessive asserts in ProgrammableInvoker::invoke_native > - Remove os::is_MP() check > - remove blank line in thread.hpp I made a pass over hotspot code. Overall, it looks good. Some comments follow. src/hotspot/cpu/aarch64/vmreg_aarch64.cpp line 57: > 55: #define INTEGER_TYPE 0 > 56: #define VECTOR_TYPE 1 > 57: #define X87_TYPE 2 Unused. src/hotspot/cpu/aarch64/foreign_globals_aarch64.hpp line 31: > 29: #include "utilities/growableArray.hpp" > 30: > 31: #define __ _masm-> Should be declared in cpp file instead. 
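A minimal sketch of the "declare it in the cpp file" suggestion, assuming a toy assembler class rather than HotSpot's real `MacroAssembler`: `ToyAssembler` and `generate_stub` are hypothetical names used only for illustration. The point it shows is scoping: the `__` shorthand is defined right before the code that uses it and undefined afterwards, so no file that includes a header ever sees the macro.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an assembler; NOT HotSpot's MacroAssembler.
class ToyAssembler {
 public:
  void mov(const std::string& dst, const std::string& src) {
    _code.push_back("mov " + dst + ", " + src);
  }
  void ret() { _code.push_back("ret"); }
  const std::vector<std::string>& code() const { return _code; }

 private:
  std::vector<std::string> _code;
};

// In a header/source split, everything from here down would live in the
// .cpp file. The macro name mirrors the HotSpot convention quoted above;
// note that identifiers containing a double underscore are formally
// reserved in C++, so this is a sketch of the convention, not advice.
#define __ _masm->

// Emits a two-instruction stub using the shorthand.
std::vector<std::string> generate_stub() {
  ToyAssembler assembler;
  ToyAssembler* _masm = &assembler;  // the macro expands to "_masm->"
  __ mov("rax", "rdi");
  __ ret();
  return assembler.code();
}

// Undefine the shorthand so it cannot leak past this translation unit's
// code-generation section.
#undef __
```

Calling `generate_stub()` yields the two recorded instruction strings; code placed after the `#undef` no longer sees the shorthand.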
src/hotspot/cpu/x86/foreign_globals_x86.hpp line 30: > 28: #include "utilities/growableArray.hpp" > 29: > 30: #define __ _masm-> Same here (move to cpp file). src/hotspot/share/opto/lcm.cpp line 867: > 865: case Op_CallNative: > 866: // FIXME compute actual save policy based on nep->abi > 867: save_policy = _matcher._c_reg_save_policy; Please, elaborate here why it's OK for now to use ` _c_reg_save_policy`. And then turn `FIXME` into `TODO`. If possible, would be nice to introduce some asserts to back the claim. src/hotspot/share/opto/machnode.cpp line 831: > 829: st->print("%s ",_name); > 830: st->print("_arg_regs: "); > 831: _arg_regs.print_on(st); It doesn't print any useful info: `_arg_regs: AllocatedObj(0x000000011cf5cbe8)`. Please, improve it. src/hotspot/share/opto/output.cpp line 3394: > 3392: } > 3393: > 3394: address* native_stubs = NULL; IMO it's worth considering inlining `native_stubs` array into `nmethod` itself. That's the way how per-nmethod information is handled now (e.g., dependencies, debug info, exception handler table, implicit exception table). src/hotspot/share/opto/callnode.cpp line 1184: > 1182: void CallNativeNode::calling_convention( BasicType* sig_bt, VMRegPair *parm_regs, uint argcnt ) const { > 1183: assert((tf()->domain()->cnt() - TypeFunc::Parms) == argcnt, "arg counts must match!"); > 1184: #ifndef PRODUCT Should be `#ifdef ASSERT` instead. src/hotspot/share/opto/callnode.cpp line 1143: > 1141: case TypeFunc::Parms: > 1142: default: { > 1143: if(tf()->range()->field_at(proj->_con) == Type::HALF) { That's `TypeFunc::Parms+1` case in `CallNode::match`. Why did you decide to move it to `default` case? Overall, it looks very similar to `CallNode::match`. Why not just customize `OptoRegPair regs` computation for `CallNative` there? 
src/hotspot/share/opto/graphKit.cpp line 2665: > 2663: for (uint vm_ret_pos = 0; vm_ret_pos < n_returns; vm_ret_pos++) { > 2664: if (new_call_type->range()->field_at(TypeFunc::Parms + vm_ret_pos) == Type::HALF) { > 2665: // FIXME is this needed? Why do you need the projection at all? Please, clarify and remove `FIXME` comment. src/hotspot/share/opto/graphKit.cpp line 2675: > 2673: // Unpack native results if needed > 2674: // Need this method type since it's unerased > 2675: switch (nep->method_type()->rtype()->basic_type()) { Are calls returning multiple values supported right now? (From what I'm seeing in other places, they are not supported.) If not, then you don't need a loop over return values and there are other places where it can simplify code. src/hotspot/share/opto/type.hpp line 678: > 676: static const TypeTuple *make_range(ciSignature *sig); > 677: static const TypeTuple *make_domain(ciInstanceKlass* recv, ciSignature *sig); > 678: static const TypeTuple *make_func(uint arg_cnt, const Type **arg_fields); I find `make_func` name misleading: it makes an impression you get `TypeFunc` out of it, but in reality it just composes type array with `TypeTuple::make`+`TypeTuple::fields`. I'd prefer to see it as `TypeTuple::fields` overload. Or rewrite `GraphKit::make_native_call` to operate directly on `TypeTuple::fields` and get rid of intermediate arrays. src/hotspot/share/opto/output.cpp line 1144: > 1142: methodHandle null_mh; > 1143: bool rethrow_exception = false; > 1144: bool is_opt_native = mach->is_MachCallNative(); Please, move it to `MachCall`-related logic (where `is_method_handle_invoke` is set). src/hotspot/cpu/x86/universalNativeInvoker_x86.cpp line 77: > 75: XMMRegister reg = _abi->_vector_argument_registers.at(i); > 76: size_t offs = _layout->arguments_vector + i * sizeof(VectorRegister); > 77: if (UseAVX >= 3) { Assuming worst-case scenario (w.r.t. value size) is sub-optimal.
Considering FP values are handled as vectors you end up operating on 32-bit/64-bit values as if they were 512-bit in size. And, in addition to wasted memory bandwidth, EVEX-encoded instructions may trigger CPU frequency scaling which will penalize cases when AVX512 is not used. So, it is worth considering annotating vector values with their actual sizes and taking the size into account when operating on vectors. ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From Alan.Hayward at arm.com Wed Nov 11 14:32:17 2020 From: Alan.Hayward at arm.com (Alan Hayward) Date: Wed, 11 Nov 2020 14:32:17 +0000 Subject: RFR: 8221554: aarch64 cross-modifying code [v6] Message-ID: > On Mon, 19 Oct 2020 08:20:46 GMT, Robbin Ehn wrote: > >>> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: >>> >>> Remove inlasm_isb define >>> >>> Change-Id: I2d0ef8a78292dac875f3f65d2253981cdb7a497a >> >> Seems fine to me, mostly look at shared code part. > > Patch merged to master - so that it's on top of pchilano's patch. > Already tested that both the patches work fine together (did a complete run of all our tests with VerifyCrossModifyFence set to true). > robehn has reviewed this patch, but I think I need a second review too (?) Quick ping for this. AIUI, it needs a second reviewer to vote ok. There aren't any outstanding issues (that I'm aware of). ------------- PR: https://git.openjdk.java.net/jdk/pull/428 IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
From rehn at openjdk.java.net Wed Nov 11 14:38:57 2020 From: rehn at openjdk.java.net (Robbin Ehn) Date: Wed, 11 Nov 2020 14:38:57 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v4] In-Reply-To: References: <0qlO5eLJPQgvI1Lg3pO8YpUeoC2zGexNt3DgHptiQiA=.e7ce4706-a702-4e32-a976-eac4adc24771@github.com> Message-ID: On Wed, 21 Oct 2020 15:34:16 GMT, Alan Hayward wrote: > Patch merged to master - so that it's on top of pchilano's patch. > Already tested that both the patches work fine together (did a complete run of all our tests with VerifyCrossModifyFence set to true). > robehn has reviewed this patch, but I think I need a second review too (?) I would recommend someone that knows aarch64 better than me, as I said mostly looked at shared parts. ------------- PR: https://git.openjdk.java.net/jdk/pull/428 From jvernee at openjdk.java.net Wed Nov 11 14:42:12 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Wed, 11 Nov 2020 14:42:12 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v18] In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 12:44:56 GMT, Vladimir Ivanov wrote: >> Maurizio Cimadamore has updated the pull request incrementally with 10 additional commits since the last revision: >> >> - Merge pull request #7 from JornVernee/Additional_Review_Comments >> >> Additional review comments >> - Revert System.java changes >> - Set copyright year for added files to 2020 >> - Check result of AttachCurrentThread >> - Sort includes alphabetically >> - Relax ret_addr_offset() assert >> - Extra space after if >> - remove excessive asserts in ProgrammableInvoker::invoke_native >> - Remove os::is_MP() check >> - remove blank line in thread.hpp > > src/hotspot/share/opto/callnode.cpp line 1143: > >> 1141: case TypeFunc::Parms: >> 1142: default: { >> 1143: if(tf()->range()->field_at(proj->_con) == Type::HALF) { > > That's `TypeFunc::Parms+1` case in `CallNode::match`. Why did you decide to move it to `default` case? 
> > Overall, it looks very similar to `CallNode::match`. Why not just customize `OptoRegPair regs` computation for `CallNative` there? For native calls we can have multiple return values, at least in theory. Currently this is not the case though. Will take another look. ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From jvernee at openjdk.java.net Wed Nov 11 14:48:04 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Wed, 11 Nov 2020 14:48:04 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v18] In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 14:17:20 GMT, Vladimir Ivanov wrote: >> Maurizio Cimadamore has updated the pull request incrementally with 10 additional commits since the last revision: >> >> - Merge pull request #7 from JornVernee/Additional_Review_Comments >> >> Additional review comments >> - Revert System.java changes >> - Set copyright year for added files to 2020 >> - Check result of AttachCurrentThread >> - Sort includes alphabetically >> - Relax ret_addr_offset() assert >> - Extra space after if >> - remove excessive asserts in ProgrammableInvoker::invoke_native >> - Remove os::is_MP() check >> - remove blank line in thread.hpp > > src/hotspot/cpu/x86/universalNativeInvoker_x86.cpp line 77: > >> 75: XMMRegister reg = _abi->_vector_argument_registers.at(i); >> 76: size_t offs = _layout->arguments_vector + i * sizeof(VectorRegister); >> 77: if (UseAVX >= 3) { > > Assuming worst-case scenario (w.r.t. value size) is sub-optimal. Considering FP values are handled as vectors you end up operating on 32-bit/64-bit values as if they were 512-bit in size. And, in addition to wasted memory bandwidth, EVEX-encoded instructions may trigger CPU frequency scaling which will penalize cases when AVX512 is not used. So, it is worth considering annotating vector values with their actual sizes and taking the size into account when operating on vectors. Yes, more cleanup is needed here. 
We don't support vectors at all right now, so I'd rather remove this code and only operate on XMM registers instead. In the future this could be handled using separate VMStorage types for different vector sizes. ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From dcubed at openjdk.java.net Wed Nov 11 15:09:04 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 11 Nov 2020 15:09:04 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> Message-ID: <9FnB_Mqa-xxw6ckDqqHgwo8RrtDOxwE5ji25mSZnCBc=.5664b114-621e-4c44-be28-47cdf45d855d@github.com> On Wed, 11 Nov 2020 04:50:52 GMT, David Holmes wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> resolve more robehn and coleenp comments. > > src/hotspot/share/runtime/synchronizer.cpp line 153: > >> 151: if (self->is_Java_thread()) { >> 152: // A JavaThread must check for a safepoint/handshake and honor it. >> 153: ObjectSynchronizer::chk_for_block_req(self->as_Java_thread(), "unlinking", > > I won't disapprove but this is a case where refactoring is IMO worse than code duplication. Logging parameters should not be a part of this API IMO. At this point, the refactoring has grown on me. I like the fact that it reduces the "noise" in those three functions. 
> src/hotspot/share/runtime/synchronizer.cpp line 1228: > >> 1226: os::naked_short_sleep(999); // sleep for almost 1 second >> 1227: } else { >> 1228: os::naked_short_sleep(999); // sleep for almost 1 second > > So this block can now just be: > >> if (self->is_Java_thread()) { >> ThreadBlockInVM tbivm(self->as_Java_thread()); >> } >> os::naked_short_sleep(999); // sleep for almost 1 second As @robehn pointed out, I screwed up that change because I lost the block over duration of the sleep. I've reverted that change to the original. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From thomas.schatzl at oracle.com Wed Nov 11 15:12:48 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 11 Nov 2020 16:12:48 +0100 Subject: RFR: 8256181: Remove Allocation of old generation on alternate memory devices functionality In-Reply-To: References: Message-ID: <9155ea57-140f-6206-5cc6-cc0064129b1f@oracle.com> Hi, On 11.11.20 15:06, Thomas Stüfe wrote: > Hi Thomas, > > I think this makes sense. Just a question, how are your thoughts about > JEP316? Do you consider AllocateHeapAt similarly unused? There are no plans to remove AllocateHeapAt functionality I am aware of. It is also much less intrusive than AllocateOldGenAt as you can see from this huge change, and actually works as far as I know :) AllocateHeapAt is also an easy way to get huge pages for your heap via HugeTLBFS without reserving them exclusively upfront, or putting the entire java heap onto NVDIMMs (afaiu, but you might notice that I'm no real expert in this).
Thanks, Thomas From dcubed at openjdk.java.net Wed Nov 11 15:16:03 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 11 Nov 2020 15:16:03 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> Message-ID: On Wed, 11 Nov 2020 13:50:08 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 246: >> >>> 244: // >>> 245: // Start the ceiling with the estimate for one thread: >>> 246: jint _in_use_list_ceiling = AvgMonitorsPerThreadEstimate; >> >> Why is this a jint when you use size_t for its accessor and all the other sizes that you compare with the ceiling are also size_t? >> I'm not sure size_t is right to use in these cases (do we really expect different maximums on 32-bit versus 64-bit?) but it should be all or none IMO. > > Our int types are really confused. AvgMonitorsPerThreadEstimate is defined as an intx which is intptr_t and the range of it is 0..max_jint which is 0 .. 0x7fffffff . jint is long on windows (the problematic type) and int on unix. Since this is a new declaration, it probably should be something other than jint but what? > At any rate, it should be declared as 'static'. `_in_use_list_ceiling` is a jint **because** we've specified the range as `0..max_jint` and I wanted some sanity to that variable's type. If I change `_in_use_list_ceiling` to `size_t`, then I get a compile time error probably because `AvgMonitorsPerThreadEstimate` is an `intx` which (I think) is my only choice for a command line option. @fisk will have to chime in with the background on why he picked `size_t`. 
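The type question above (a signed, pointer-sized flag feeding an unsigned ceiling) can be sketched in isolation. This is only an illustration under stated assumptions, not what the patch does: `intx_like`, `jint_like`, `avg_monitors_per_thread_estimate`, `flag_to_size` and `ceiling` are hypothetical stand-ins, and the clamping policy is invented for the example. It shows one way to keep every size as `size_t` while converting the signed flag in a single, explicit place, and it marks the file-scope variable `static` so it has internal linkage.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Stand-ins for the typedefs discussed in the thread (assumptions):
// the flag type is pointer-sized and signed, the Java int range is 32-bit.
using intx_like = std::intptr_t;
using jint_like = std::int32_t;

// Hypothetical flag value; the thread states its legal range is 0..max_jint.
static intx_like avg_monitors_per_thread_estimate = 1024;

// Converts the signed flag to size_t in one explicit place.
// Clamping out-of-range values is an illustrative choice only.
static std::size_t flag_to_size(intx_like value) {
  const intx_like max_jint = std::numeric_limits<jint_like>::max();
  if (value < 0) {
    return 0;
  }
  if (value > max_jint) {
    return static_cast<std::size_t>(max_jint);
  }
  return static_cast<std::size_t>(value);
}

// 'static' at file scope gives the variable internal linkage, so it is
// not visible to other translation units.
static std::size_t in_use_list_ceiling_value =
    flag_to_size(avg_monitors_per_thread_estimate);

// Accessor with the same type as the variable, so callers comparing it
// against other size_t counts need no further conversions.
static std::size_t ceiling() { return in_use_list_ceiling_value; }
```

With the conversion isolated in one helper, comparisons against other `size_t` counters do not mix signed and unsigned operands.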
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Wed Nov 11 15:16:04 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 11 Nov 2020 15:16:04 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: <8BzDrxXiWh2l4ZOq6k0D5-cU76gtjFXa9S1esX4Kkqg=.a42d08ea-8662-461a-a5e1-0335ef4ecd83@github.com> On Wed, 11 Nov 2020 08:26:14 GMT, Robbin Ehn wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> resolve more robehn and coleenp comments. > > src/hotspot/share/runtime/synchronizer.cpp line 1226: > >> 1224: ThreadBlockInVM tbivm(self->as_Java_thread()); >> 1225: } >> 1226: os::naked_short_sleep(999); // sleep for almost 1 second > > Hi Dan, you need to be blocked while sleeping, otherwise you are blocking safepoints for "almost 1 second". > So previous code was correct. Good catch! Yes, I should not have written that change so late at night. I know better. I have reverted it. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Wed Nov 11 15:19:03 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 11 Nov 2020 15:19:03 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> Message-ID: On Wed, 11 Nov 2020 05:12:21 GMT, David Holmes wrote: >> Daniel D. 
Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> resolve more robehn and coleenp comments. > > One change requested in relation to use of jint instead of size_t. > One code simplification suggestion. > Thanks, > David @dholmes-ora, @robehn and @coleenp - Thanks for chiming in on the review again. For the first time, I have a real conflict in a file so I'm updating my repo to the latest and greatest to see how that works. Then I'll make the minor tweaks that I mumbled about above. I did a Mach5 Tier1,2,3,4,5,6,7 last night and got through it with a single Tier7 unrelated and known failure. Of course, the baseline for last night was jdk-16+23 so... ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From ayang at openjdk.java.net Wed Nov 11 15:21:56 2020 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 11 Nov 2020 15:21:56 GMT Subject: RFR: 8256181: Remove Allocation of old generation on alternate memory devices functionality In-Reply-To: References: Message-ID: <0nEbOq5anud0koNcq_5EV57wUd8wc8qHUCjI1GP2L0I=.ae3defc8-6c6b-40dd-894b-e5a4dde4d4bb@github.com> On Wed, 11 Nov 2020 11:11:25 GMT, Thomas Schatzl wrote: > Hi all, > > can I get reviews for this change that removes the "Allocation of old generation of Java heap on alternate memory devices" functionality introduced with JDK 12 with [JDK-8202286](https://bugs.openjdk.java.net/browse/JDK-8202286) due to being > > - not used by anyone > - not maintained by anyone, i.e. several bugs open for a long time and bit rotting > - requiring some workarounds for new feature development wrt to heap management > > All flags covered by this feature were experimental flags, so there are no additional procedural issues to take. > > I tried to remove all but a few minor cleanups that I thought useful, but of course this is very subjective. 
> > Testing: hs-tier1-5 A general comment for future PRs: I think it's best to isolate mechanical changes into their own commits; e.g. `HeapRegionManager* _hrm;` -> `HeapRegionManager _hrm;`. Otherwise, a real change, buried in the immense size of diff, might slip through. test/hotspot/jtreg/TEST.ROOT line 78: > 76: vm.musl \ > 77: docker.support \ > 78: test.vm.gc.nvdimm \ `test/jtreg-ext/requires/VMProps.java` still references `test.vm.gc.nvdimm`, btw. ------------- Marked as reviewed by ayang (Author). PR: https://git.openjdk.java.net/jdk/pull/1162 From dcubed at openjdk.java.net Wed Nov 11 15:26:02 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 11 Nov 2020 15:26:02 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> Message-ID: On Wed, 11 Nov 2020 13:50:08 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 246: >> >>> 244: // >>> 245: // Start the ceiling with the estimate for one thread: >>> 246: jint _in_use_list_ceiling = AvgMonitorsPerThreadEstimate; >> >> Why is this a jint when you use size_t for its accessor and all the other sizes that you compare with the ceiling are also size_t? >> I'm not sure size_t is right to use in these cases (do we really expect different maximums on 32-bit versus 64-bit?) but it should be all or none IMO. > > Our int types are really confused. AvgMonitorsPerThreadEstimate is defined as an intx which is intptr_t and the range of it is 0..max_jint which is 0 .. 0x7fffffff . jint is long on windows (the problematic type) and int on unix. Since this is a new declaration, it probably should be something other than jint but what? > At any rate, it should be declared as 'static'. @coleenp - Nice catch on the missing 'static'. 
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From jvernee at openjdk.java.net Wed Nov 11 15:36:11 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Wed, 11 Nov 2020 15:36:11 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v18] In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 11:05:47 GMT, Vladimir Ivanov wrote: >> Maurizio Cimadamore has updated the pull request incrementally with 10 additional commits since the last revision: >> >> - Merge pull request #7 from JornVernee/Additional_Review_Comments >> >> Additional review comments >> - Revert System.java changes >> - Set copyright year for added files to 2020 >> - Check result of AttachCurrentThread >> - Sort includes alphabetically >> - Relax ret_addr_offset() assert >> - Extra space after if >> - remove excessive asserts in ProgrammableInvoker::invoke_native >> - Remove os::is_MP() check >> - remove blank line in thread.hpp > > src/hotspot/share/opto/machnode.cpp line 831: > >> 829: st->print("%s ",_name); >> 830: st->print("_arg_regs: "); >> 831: _arg_regs.print_on(st); > > It doesn't print any useful info: `_arg_regs: AllocatedObj(0x000000011cf5cbe8)`. Please, improve it. Ok, I added printing to GrowableArray at some point, but seems that this was removed in a merge maybe. > src/hotspot/share/opto/graphKit.cpp line 2665: > >> 2663: for (uint vm_ret_pos = 0; vm_ret_pos < n_returns; vm_ret_pos++) { >> 2664: if (new_call_type->range()->field_at(TypeFunc::Parms + vm_ret_pos) == Type::HALF) { >> 2665: // FIXME is this needed? > > Why do you need the projection at all? Please, clarify and remove `FIXME` comment. Was a leftover. Will remove > src/hotspot/share/opto/graphKit.cpp line 2675: > >> 2673: // Unpack native results if needed >> 2674: // Need this method type since it's unerased >> 2675: switch (nep->method_type()->rtype()->basic_type()) { > > Are calls returning multiple values supported right now? 
(From what I'm seeing in other places, they are not supported.) If not, then you don't need a loop over return values and there are other places where it can simplify code. Yes, multiple returns are not supported currently. Will simplify this. ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From tschatzl at openjdk.java.net Wed Nov 11 15:38:10 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 11 Nov 2020 15:38:10 GMT Subject: RFR: 8256181: Remove Allocation of old generation on alternate memory devices functionality [v2] In-Reply-To: References: Message-ID: > Hi all, > > can I get reviews for this change that removes the "Allocation of old generation of Java heap on alternate memory devices" functionality introduced with JDK 12 with [JDK-8202286](https://bugs.openjdk.java.net/browse/JDK-8202286) due to being > > - not used by anyone > - not maintained by anyone, i.e. several bugs open for a long time and bit rotting > - requiring some workarounds for new feature development wrt to heap management > > All flags covered by this feature were experimental flags, so there are no additional procedural issues to take. > > I tried to remove all but a few minor cleanups that I thought useful, but of course this is very subjective. 
> > Testing: hs-tier1-5 Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1162/files - new: https://git.openjdk.java.net/jdk/pull/1162/files/222739bc..9e849b60 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1162&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1162&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1162.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1162/head:pull/1162 PR: https://git.openjdk.java.net/jdk/pull/1162 From tschatzl at openjdk.java.net Wed Nov 11 15:47:59 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 11 Nov 2020 15:47:59 GMT Subject: RFR: 8256181: Remove Allocation of old generation on alternate memory devices functionality [v2] In-Reply-To: <0nEbOq5anud0koNcq_5EV57wUd8wc8qHUCjI1GP2L0I=.ae3defc8-6c6b-40dd-894b-e5a4dde4d4bb@github.com> References: <0nEbOq5anud0koNcq_5EV57wUd8wc8qHUCjI1GP2L0I=.ae3defc8-6c6b-40dd-894b-e5a4dde4d4bb@github.com> Message-ID: <_bANQzB5ETr8QdwEiSRFKzMLwWE_IV-Pwpf0mSjtjmM=.574e4b29-db0b-4eaa-8065-a2c713c8c3c2@github.com> On Wed, 11 Nov 2020 15:09:44 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> ayang review > > test/hotspot/jtreg/TEST.ROOT line 78: > >> 76: vm.musl \ >> 77: docker.support \ >> 78: test.vm.gc.nvdimm \ > > `test/jtreg-ext/requires/VMProps.java` still references `test.vm.gc.nvdimm`, btw. The original change introduced the `_hrm` -> `*_hrm` change, so this is a straightforward reversal, no optimization :) One more removal was possible: the `test.vm.gc.nvdimm` line in VMProps.java. Thanks for your review!
------------- PR: https://git.openjdk.java.net/jdk/pull/1162 From rkennke at openjdk.java.net Wed Nov 11 15:55:19 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 11 Nov 2020 15:55:19 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v8] In-Reply-To: References: Message-ID: <62FWNENDnaI16zsUXVc9v0RuANXHS9FQRzNYajRXls0=.d2ad75ec-30b3-40b4-a66c-1d50c534dfae@github.com> > In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. > > I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: > - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions > - We can have strong and phantom native referents, and strong and weak in-heap referents > - Native referents are never compressed > > Note that this depends on PR#1140. > > Testing: > - [x] hotspot_gc_shenandoah (x86_64, x64_32, aarch64) > - [x] tier1 +UseShenandoahGC +ShenandoahVerify > - [x] tier2 +UseShenandoahGC +ShenandoahVerify Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: - Fix/invert condition in CmpP optimization - Fix after merge - Merge branch 'master' into JDK-8256011 - Merge branch 'master' into JDK-8256011 - Don't make phantom-access narrow (mistake when doing 32bit parts - Remove superfluous LP64 in aarch64 part - Fixes/missing parts for x86_64 - Aarch64 parts - Whitespace changes - Mask decorators in hash/cmp, not in ctor - ... 
and 8 more: https://git.openjdk.java.net/jdk/compare/8d27361d...c0ee9346 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1109/files - new: https://git.openjdk.java.net/jdk/pull/1109/files/92a92fcd..c0ee9346 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=06-07 Stats: 6278 lines in 184 files changed: 4337 ins; 942 del; 999 mod Patch: https://git.openjdk.java.net/jdk/pull/1109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1109/head:pull/1109 PR: https://git.openjdk.java.net/jdk/pull/1109 From dcubed at openjdk.java.net Wed Nov 11 16:01:15 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 11 Nov 2020 16:01:15 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v7] In-Reply-To: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: > Changes from @fisk and @dcubed-ojdk to: > > - simplify ObjectMonitor list management > - get rid of Type-Stable Memory (TSM) > > This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. > Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, > SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) > - a few minor regressions (<= -0.24%) > - Volano is 6.8% better > > Eric C. has also done promotion perf runs on these bits and says "the results look fine". Daniel D. Daugherty has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - @dholmes-ora, @robehn and @coleenp CR - a few more minor tweaks. - Merge branch 'master' into JDK-8253064 - resolve more robehn and coleenp comments. - coleenp CR - refactor common ThreadBlockInVM code block into ObjectSynchronizer::chk_for_block_req(). 
- dholmes-ora - convert inner while loop to do-while loop in unlink_deflated(). - Resolve more @dholmes-ora comments with help from @fisk. - Resolve most of dholmes-ora comments. - 8253064.v00.part2 - 8253064.v00.part1 ------------- Changes: https://git.openjdk.java.net/jdk/pull/642/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=642&range=06 Stats: 2506 lines in 25 files changed: 602 ins; 1697 del; 207 mod Patch: https://git.openjdk.java.net/jdk/pull/642.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/642/head:pull/642 PR: https://git.openjdk.java.net/jdk/pull/642 From neliasso at openjdk.java.net Wed Nov 11 16:12:03 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 11 Nov 2020 16:12:03 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11] In-Reply-To: References: Message-ID: On Thu, 29 Oct 2020 10:28:00 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. >> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. >> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
>> >> Performance Results: >> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz >> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java >> ArrayCopyPartialInlineSize : 32 >> >> JMH | Block Size | Baseline (ns/op) | Partial Inling (ns/op) | Gain >> -- | -- | -- | -- | -- >> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997 >> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866 >> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829 >> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564 >> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909 >> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667 >> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773 >> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179 >> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874 >> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228 >> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082 >> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769 >> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672 >> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844 >> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788 >> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947 >> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157 >> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038 >> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953 >> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584 >> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443 >> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724 >> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302 >> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756 >> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836 >> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 
0.987907289 >> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512 >> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831 >> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362 >> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974 >> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475 >> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064 >> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095 >> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621 >> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903 >> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081 >> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886 >> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287 >> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945 >> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067 >> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604 >> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204 >> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213 >> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003 >> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627 >> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174 >> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741 >> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313 >> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734 >> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345 >> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994 >> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803 >> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055 >> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957 >> 
ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541 >> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416 >> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692 >> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774 >> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467 >> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137 >> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237 >> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138 >> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353 >> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523 >> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588 >> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453 >> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983 >> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321 >> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053 >> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507 >> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559 >> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747 >> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095 >> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273 >> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139 >> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629 >> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935 >> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385 >> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096 >> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975 >> >> Detailed Reports: >> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt) >> 
WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - JDK-8252848: Review comments addressed. > - Merge remote-tracking branch 'origin' into JDK-8252848 > - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - JDK-8252848 : Review comments resolution. > - Merge remote-tracking branch 'upstream' into JDK-8252848 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - Replacing explicit type checks with existing type checking routines > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - ... and 1 more: https://git.openjdk.java.net/jdk/compare/4031cb41...9e85592a Do you have any tests that exercise the different possible versions? - dynamic length with both small and long copies - dynamic length that can be proven always less than PartialInliningSize - constant size less than PartialInliningSize Except for these minor comments, and the tests, I am ready to approve. src/hotspot/share/opto/cfgnode.cpp line 397: > 395: } > 396: > 397: Remove unnecessary empty line src/hotspot/share/opto/node.hpp line 162: > 160: class StoreVectorScatterNode; > 161: class VectorMaskCmpNode; > 162: Remove line break or move it down one line below "class VectorSet;" ------------- Changes requested by neliasso (Reviewer). 
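The three shapes listed in this review all come down to which side of the runtime length check a given copy can reach. As a rough illustration only, here is a scalar C++ sketch of that check. It is not the C2 code from the patch: `kPartialInlineSize`, `CopyPath`, `copy_inline`, `copy_stub` and `partial_inline_copy` are hypothetical names, the byte loop merely stands in for the AVX-512 masked sequence, and treating the 32-byte boundary as inclusive is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Assumed threshold, mirroring the 32-byte default quoted for
// ArrayCopyPartialInlineSize. Whether the boundary is inclusive is a guess.
constexpr std::size_t kPartialInlineSize = 32;

// Which path a copy took; only used to make the dispatch observable.
enum class CopyPath { Inline, Stub };

// Stand-in for the out-of-line array-copy stub.
void copy_stub(unsigned char* dst, const unsigned char* src, std::size_t len) {
  std::memcpy(dst, src, len);
}

// Stand-in for the inlined sequence: a plain byte loop here, where the
// patch would emit masked vector loads and stores at the call site.
void copy_inline(unsigned char* dst, const unsigned char* src, std::size_t len) {
  for (std::size_t i = 0; i < len; i++) {
    dst[i] = src[i];
  }
}

// Runtime length check that picks one of the two paths and reports it.
CopyPath partial_inline_copy(unsigned char* dst, const unsigned char* src,
                             std::size_t len) {
  assert(dst != nullptr && src != nullptr);
  if (len <= kPartialInlineSize) {
    copy_inline(dst, src, len);
    return CopyPath::Inline;
  }
  copy_stub(dst, src, len);
  return CopyPath::Stub;
}
```

In this sketch, a call with an arbitrary `len` corresponds to the first shape (either path may be taken), a call such as `partial_inline_copy(dst, src, n & 15)` to a dynamic length that is always below the threshold, and a literal `8` to the constant-size case. That the compiler could drop the stub branch for the last two is an inference from the review comment, not something stated in the thread.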
PR: https://git.openjdk.java.net/jdk/pull/302 From neliasso at openjdk.java.net Wed Nov 11 16:12:05 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 11 Nov 2020 16:12:05 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v13] In-Reply-To: References: Message-ID: On Fri, 6 Nov 2020 07:23:07 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. >> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. >> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
>> >> Performance Results: >> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz >> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java >> ArrayCopyPartialInlineSize : 32 >> >> Detailed Reports: >> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt) >>
WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Merge remote-tracking branch 'upstream' into JDK-8252848 > - JDK-8252848 : Review comments resolved > - JDK-8252848: Review comments resolution. > - JDK-8252848: Review comments addressed. > - Merge remote-tracking branch 'origin' into JDK-8252848 > - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - JDK-8252848 : Review comments resolution. > - Merge remote-tracking branch 'upstream' into JDK-8252848 > - ... and 4 more: https://git.openjdk.java.net/jdk/compare/5dfb42fc...ed343a9e src/hotspot/share/opto/cfgnode.hpp line 104: > 102: virtual Node* Ideal(PhaseGVN* phase, bool can_reshape); > 103: virtual const RegMask &out_RegMask() const; > 104: bool try_clean_mem_phi(PhaseGVN *phase); This changed line looks like a mistake. Please revert. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From dcubed at openjdk.java.net Wed Nov 11 16:24:06 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 11 Nov 2020 16:24:06 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v7] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: On Wed, 11 Nov 2020 16:18:41 GMT, Robbin Ehn wrote: >> Daniel D. Daugherty has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: >> >> - @dholmes-ora, @robehn and @coleenp CR - a few more minor tweaks. 
>> - Merge branch 'master' into JDK-8253064 >> - resolve more robehn and coleenp comments. >> - coleenp CR - refactor common ThreadBlockInVM code block into ObjectSynchronizer::chk_for_block_req(). >> - dholmes-ora - convert inner while loop to do-while loop in unlink_deflated(). >> - Resolve more @dholmes-ora comments with help from @fisk. >> - Resolve most of dholmes-ora comments. >> - 8253064.v00.part2 >> - 8253064.v00.part1 > > Thanks! Rebuild on my MBP13 and KitchensinkSanity look good. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From rehn at openjdk.java.net Wed Nov 11 16:24:06 2020 From: rehn at openjdk.java.net (Robbin Ehn) Date: Wed, 11 Nov 2020 16:24:06 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v7] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: On Wed, 11 Nov 2020 16:01:15 GMT, Daniel D. Daugherty wrote: >> Changes from @fisk and @dcubed-ojdk to: >> >> - simplify ObjectMonitor list management >> - get rid of Type-Stable Memory (TSM) >> >> This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. >> Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, >> SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) >> - a few minor regressions (<= -0.24%) >> - Volano is 6.8% better >> >> Eric C. has also done promotion perf runs on these bits and says "the results look fine". > > Daniel D. Daugherty has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - @dholmes-ora, @robehn and @coleenp CR - a few more minor tweaks. > - Merge branch 'master' into JDK-8253064 > - resolve more robehn and coleenp comments. > - coleenp CR - refactor common ThreadBlockInVM code block into ObjectSynchronizer::chk_for_block_req(). > - dholmes-ora - convert inner while loop to do-while loop in unlink_deflated(). 
> - Resolve more @dholmes-ora comments with help from @fisk. > - Resolve most of dholmes-ora comments. > - 8253064.v00.part2 > - 8253064.v00.part1 Thanks! ------------- Marked as reviewed by rehn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/642 From eosterlund at openjdk.java.net Wed Nov 11 16:24:07 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Wed, 11 Nov 2020 16:24:07 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> Message-ID: On Wed, 11 Nov 2020 15:23:15 GMT, Daniel D. Daugherty wrote: >> Our int types are really confused. AvgMonitorsPerThreadEstimate is defined as an intx which is intptr_t and the range of it is 0..max_jint which is 0 .. 0x7fffffff . jint is long on windows (the problematic type) and int on unix. Since this is a new declaration, it probably should be something other than jint but what? >> At any rate, it should be declared as 'static'. > > @coleenp - Nice catch on the missing 'static'. I typically use size_t for entities that can scale with the size of the machine's memory, so I don't have to think about whether there are enough bits. Could AvgMonitorsPerThreadEstimate be uintx instead of intx? And then maybe we don't need to declare a range, as the natural range of the uintx seems perfectly valid.
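To make the type question in this sub-thread concrete, here is a minimal standalone C++ sketch of a file-local ceiling kept as `size_t` and initialised from a signed, pointer-sized flag through one checked conversion. It is not the HotSpot code and not what was integrated: `avg_monitors_per_thread_estimate`, `to_size_t`, `in_use_list_ceiling_value` and `inc_in_use_list_ceiling` are hypothetical stand-ins, `std::intptr_t` plays the role of `intx`, and the value 1024 is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an intx-style flag: signed, pointer-sized,
// with a declared range of 0..0x7fffffff. The value 1024 is an assumption.
static const std::intptr_t avg_monitors_per_thread_estimate = 1024;

// Checked conversion from the signed flag type to size_t, so every later
// comparison against the ceiling uses one unsigned type.
std::size_t to_size_t(std::intptr_t value) {
  assert(value >= 0);
  return static_cast<std::size_t>(value);
}

// File-local ('static') ceiling, declared with the type its accessor returns.
static std::size_t in_use_list_ceiling_value =
    to_size_t(avg_monitors_per_thread_estimate);

// Accessor with the same type as the variable it reads.
std::size_t in_use_list_ceiling() { return in_use_list_ceiling_value; }

// Raise the ceiling by one per-thread estimate (hypothetical helper).
void inc_in_use_list_ceiling() {
  in_use_list_ceiling_value += to_size_t(avg_monitors_per_thread_estimate);
}
```

Keeping the signed-to-unsigned conversion in one helper means the range assumption is checked in a single place; if the flag itself were unsigned, as suggested above, the helper would presumably become unnecessary.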
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Wed Nov 11 16:24:08 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 11 Nov 2020 16:24:08 GMT Subject: Integrated: 8253064: monitor list simplifications and getting rid of TSM In-Reply-To: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> Message-ID: On Tue, 13 Oct 2020 20:31:44 GMT, Daniel D. Daugherty wrote: > Changes from @fisk and @dcubed-ojdk to: > > - simplify ObjectMonitor list management > - get rid of Type-Stable Memory (TSM) > > This change has been tested with Mach5 Tier[1-3],4,5,6,7,8; no new regressions. > Aurora Perf runs have also been done (DaCapo-h2, Quick Startup/Footprint, > SPECjbb2015-Tuned-G1, SPECjbb2015-Tuned-ParGC, Volano) > - a few minor regressions (<= -0.24%) > - Volano is 6.8% better > > Eric C. has also done promotion perf runs on these bits and says "the results look fine". This pull request has now been integrated. Changeset: 2e19026d Author: Daniel D. 
Daugherty URL: https://git.openjdk.java.net/jdk/commit/2e19026d Stats: 2506 lines in 25 files changed: 602 ins; 1697 del; 207 mod 8253064: monitor list simplifications and getting rid of TSM Co-authored-by: Erik Österlund Reviewed-by: eosterlund, rehn, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Wed Nov 11 16:39:02 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 11 Nov 2020 16:39:02 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> Message-ID: <4xLwmW1Wya1aTmUbGPONAA7V-ScyRsdUK467gNvdmCQ=.5c2329ea-4b45-4732-8abd-e94d1f89ad5f@github.com> On Wed, 11 Nov 2020 16:16:19 GMT, Erik Österlund wrote: >> @coleenp - Nice catch on the missing 'static'. > > I typically use size_t for entities that can scale with the size of the machine's memory, so I don't have to think about whether there are enough bits. Could AvgMonitorsPerThreadEstimate be uintx instead of intx? And then maybe we don't need to declare a range, as the natural range of the uintx seems perfectly valid. I'm pretty sure I copied the decl for AvgMonitorsPerThreadEstimate from some other already existing option. That's SOP for me anyway... If we make any more changes here it will have to be in a follow up. 
------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Wed Nov 11 16:39:01 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 11 Nov 2020 16:39:01 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> Message-ID: On Wed, 11 Nov 2020 05:12:21 GMT, David Holmes wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> resolve more robehn and coleenp comments. > > One change requested in relation to use of jint instead of size_t. > One code simplification suggestion. > Thanks, > David This PR is now integrated! @dholmes-ora - for some reason you are not listed as a reviewer and I'm not sure why. I'm sorry I didn't notice that you were dropped off before I pulled the trigger. ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From aph at redhat.com Wed Nov 11 16:44:10 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 11 Nov 2020 16:44:10 +0000 Subject: RFR: 8221554: aarch64 cross-modifying code [v6] In-Reply-To: References: Message-ID: <3b96a543-f268-758c-09bd-4f8233d9c473@redhat.com> On 11/11/2020 14:32, Alan Hayward wrote: > Quick ping for this. > AIUI, it needs a second reviewer to vote ok. > There aren't any outstanding issues (that I'm aware of). Please look at the issues still marked Pending. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at openjdk.java.net Wed Nov 11 16:59:04 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 11 Nov 2020 16:59:04 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v8] In-Reply-To: <62FWNENDnaI16zsUXVc9v0RuANXHS9FQRzNYajRXls0=.d2ad75ec-30b3-40b4-a66c-1d50c534dfae@github.com> References: <62FWNENDnaI16zsUXVc9v0RuANXHS9FQRzNYajRXls0=.d2ad75ec-30b3-40b4-a66c-1d50c534dfae@github.com> Message-ID: On Wed, 11 Nov 2020 15:55:19 GMT, Roman Kennke wrote: >> In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. >> >> I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: >> - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions >> - We can have strong and phantom native referents, and strong and weak in-heap referents >> - Native referents are never compressed >> >> Note that this depends on PR#1140. >> >> Testing: >> - [x] hotspot_gc_shenandoah (x86_64, x64_32, aarch64) >> - [x] tier1 +UseShenandoahGC +ShenandoahVerify >> - [x] tier2 +UseShenandoahGC +ShenandoahVerify > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 18 additional commits since the last revision: > > - Fix/invert condition in CmpP optimization > - Fix after merge > - Merge branch 'master' into JDK-8256011 > - Merge branch 'master' into JDK-8256011 > - Don't make phantom-access narrow (mistake when doing 32bit parts > - Remove superfluous LP64 in aarch64 part > - Fixes/missing parts for x86_64 > - Aarch64 parts > - Whitespace changes > - Mask decorators in hash/cmp, not in ctor > - ... and 8 more: https://git.openjdk.java.net/jdk/compare/682e0e24...c0ee9346 This looks fine to me. Good to go, assuming the tests pass. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1109 From iklam at openjdk.java.net Wed Nov 11 17:41:59 2020 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 11 Nov 2020 17:41:59 GMT Subject: RFR: 8256181: Remove Allocation of old generation on alternate memory devices functionality [v2] In-Reply-To: <0nEbOq5anud0koNcq_5EV57wUd8wc8qHUCjI1GP2L0I=.ae3defc8-6c6b-40dd-894b-e5a4dde4d4bb@github.com> References: <0nEbOq5anud0koNcq_5EV57wUd8wc8qHUCjI1GP2L0I=.ae3defc8-6c6b-40dd-894b-e5a4dde4d4bb@github.com> Message-ID: On Wed, 11 Nov 2020 15:19:05 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> ayang review > > A general comment for future PRs: I think it's best to isolate mechanical changes into their own commits; e.g. `HeapRegionManager* _hrm;` -> `HeapRegionManager _hrm;`. Otherwise, a real change, buried in the immense size of diff, might slip through. This is a good fix. It removes the dependency on the problematic API `ReservedSpace::first_part(..., split=true)`, which cannot be implemented correctly on Windows (see [JDK-8256079](https://bugs.openjdk.java.net/browse/JDK-8256079)). This API is used only by CDS (to be removed in [JDK-8255917](https://bugs.openjdk.java.net/browse/JDK-8255917)) and heterogenous heap. 
So hopefully we can remove this problematic API soon. ------------- PR: https://git.openjdk.java.net/jdk/pull/1162 From iignatyev at openjdk.java.net Wed Nov 11 17:45:58 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 11 Nov 2020 17:45:58 GMT Subject: RFR: 8256181: Remove Allocation of old generation on alternate memory devices functionality [v2] In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 15:38:10 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I get reviews for this change that removes the "Allocation of old generation of Java heap on alternate memory devices" functionality introduced with JDK 12 with [JDK-8202286](https://bugs.openjdk.java.net/browse/JDK-8202286) due to being >> >> - not used by anyone >> - not maintained by anyone, i.e. several bugs open for a long time and bit rotting >> - requiring some workarounds for new feature development wrt to heap management >> >> All flags covered by this feature were experimental flags, so there are no additional procedural issues to take. >> >> I tried to remove all but a few minor cleanups that I thought useful, but of course this is very subjective. >> >> Testing: hs-tier1-5 > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > ayang review Marked as reviewed by iignatyev (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1162 From github.com+168222+mgkwill at openjdk.java.net Wed Nov 11 18:06:00 2020 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Wed, 11 Nov 2020 18:06:00 GMT Subject: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap Message-ID: Use 2m pages for executable large pages and in large page requests less than 1g on linux. 
- Add os::exec_large_page_size() that returns 2m as size - Add os::select_large_page_size() to return correct large page size for size_t bytes - Add 2m size to _page_sizes array - Update reserve_memory_special methods to set/use large_page_size based on exec size - Update large page not reserved warnings to include large_page_size attempted - Update TestLargePageUseForAuxMemory.java to expect 2m large pages in some instances Signed-off-by: Marcus G K Williams ------------- Commit messages: - Default to 2M LargePages for code Changes: https://git.openjdk.java.net/jdk/pull/1153/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1153&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256155 Stats: 99 lines in 3 files changed: 69 ins; 0 del; 30 mod Patch: https://git.openjdk.java.net/jdk/pull/1153.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1153/head:pull/1153 PR: https://git.openjdk.java.net/jdk/pull/1153 From rkennke at openjdk.java.net Wed Nov 11 18:13:07 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 11 Nov 2020 18:13:07 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v9] In-Reply-To: References: Message-ID: > In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. > > I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: > - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions > - We can have strong and phantom native referents, and strong and weak in-heap referents > - Native referents are never compressed > > Note that this depends on PR#1140. 
> > Testing: > - [x] hotspot_gc_shenandoah (x86_64, x64_32, aarch64) > - [x] tier1 +UseShenandoahGC +ShenandoahVerify > - [x] tier2 +UseShenandoahGC +ShenandoahVerify Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: - Move resurrection-barrier from JDK-8256020 into right place after merge - Merge branch 'master' into JDK-8256011 - Fix/invert condition in CmpP optimization - Fix after merge - Merge branch 'master' into JDK-8256011 - Merge branch 'master' into JDK-8256011 - Don't make phantom-access narrow (mistake when doing 32bit parts - Remove superfluous LP64 in aarch64 part - Fixes/missing parts for x86_64 - Aarch64 parts - ... and 10 more: https://git.openjdk.java.net/jdk/compare/96e02610...492cea4d ------------- Changes: https://git.openjdk.java.net/jdk/pull/1109/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=08 Stats: 337 lines in 16 files changed: 82 ins; 85 del; 170 mod Patch: https://git.openjdk.java.net/jdk/pull/1109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1109/head:pull/1109 PR: https://git.openjdk.java.net/jdk/pull/1109 From shade at openjdk.java.net Wed Nov 11 20:04:00 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 11 Nov 2020 20:04:00 GMT Subject: RFR: 8253525: Implement getInstanceSize/sizeOf intrinsics [v6] In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 20:13:41 GMT, Serguei Spitsyn wrote: > One more nit, I forgot to list in my previous comment, is uneeded '()' around comparisons: > `+ static final int REF_SIZE = ((compressedOops == null) || (compressedOops == true)) ? 4 : 8;` Right. Fixed that. Thanks! 
------------- PR: https://git.openjdk.java.net/jdk/pull/650 From rkennke at openjdk.java.net Wed Nov 11 20:34:07 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 11 Nov 2020 20:34:07 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v10] In-Reply-To: References: Message-ID: <1a7sW4Qf-6GcT82aBF5kbIW47ulaLfGO-nob6jl5uFI=.235b42d8-b4fc-4737-b1ee-8ee9ce478d56@github.com> > In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. > > I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: > - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions > - We can have strong and phantom native referents, and strong and weak in-heap referents > - Native referents are never compressed > > Note that this depends on PR#1140. > > Testing: > - [x] hotspot_gc_shenandoah (x86_64, x64_32, aarch64) > - [x] tier1 +UseShenandoahGC +ShenandoahVerify > - [x] tier2 +UseShenandoahGC +ShenandoahVerify Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: - Merge branch 'master' into JDK-8256011 - Move resurrection-barrier from JDK-8256020 into right place after merge - Merge branch 'master' into JDK-8256011 - Fix/invert condition in CmpP optimization - Fix after merge - Merge branch 'master' into JDK-8256011 - Merge branch 'master' into JDK-8256011 - Don't make phantom-access narrow (mistake when doing 32bit parts - Remove superfluous LP64 in aarch64 part - Fixes/missing parts for x86_64 - ... 
and 11 more: https://git.openjdk.java.net/jdk/compare/59965c17...3eeee3c3 ------------- Changes: https://git.openjdk.java.net/jdk/pull/1109/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=09 Stats: 337 lines in 16 files changed: 82 ins; 85 del; 170 mod Patch: https://git.openjdk.java.net/jdk/pull/1109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1109/head:pull/1109 PR: https://git.openjdk.java.net/jdk/pull/1109 From github.com+58006833+xbzhang99 at openjdk.java.net Wed Nov 11 20:35:17 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Wed, 11 Nov 2020 20:35:17 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v4] In-Reply-To: References: Message-ID: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/894/files - new: https://git.openjdk.java.net/jdk/pull/894/files/757192c3..fc1dff49 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=02-03 Stats: 69 lines in 3 files changed: 65 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/894/head:pull/894 PR: https://git.openjdk.java.net/jdk/pull/894 From github.com+58006833+xbzhang99 at openjdk.java.net Wed Nov 11 20:39:14 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Wed, 11 Nov 2020 20:39:14 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v5] In-Reply-To: References: Message-ID: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/894/files - new: https://git.openjdk.java.net/jdk/pull/894/files/fc1dff49..b23c8cba Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/894/head:pull/894 PR: https://git.openjdk.java.net/jdk/pull/894 From github.com+58006833+xbzhang99 at openjdk.java.net Wed Nov 11 20:50:55 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Wed, 11 Nov 2020 20:50:55 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v5] In-Reply-To: References: <_NtWpo5mp-I1zmPtn883ezvoO1jA7WdJflfoFXcpL70=.96a06d93-47a4-48dd-88fc-f3c777315291@github.com> Message-ID: On Mon, 9 Nov 2020 22:10:07 GMT, Xubo Zhang wrote: >> The regression tests for exp should be explicitly updated to cover the previously erroneous input if they do not do so already. > > Hi Darcy, > Where should the test be? A new test file? > > Best regards, > Xubo > > From: Joe Darcy > Sent: Monday, November 9, 2020 1:33 PM > To: openjdk/jdk > Cc: Zhang, Xubo ; Mention > Subject: Re: [openjdk/jdk] 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms (#894) > > > @jddarcy commented on this pull request. 
> > ________________________________ > > In test/jdk/java/lang/Math/WorstCaseTests.java: > >> @@ -114,8 +114,8 @@ private static int testWorstExp() { > > {+0x1.A8EAD058BC6B8p3, 0x1.1D71965F516ADp19}, > > {+0x1.1D5C2DAEBE367p4, 0x1.A8C02E974C314p25}, > > {+0x1.C44CE0D716A1Ap4, 0x1.B890CA8637AE1p40}, > > - {+0x4.0p8, Double.POSITIVE_INFINITY}, > > - {+0x2.71p12, Double.POSITIVE_INFINITY}, > > + {+0x4.0p8, Double.POSITIVE_INFINITY}, //bug 8255368 gave 0 > > This is not an appropriate test to update to cover this bug. This test is specifically probing at difficult cases in double arithmetic for the underlying mathematical function as opposed to flaws in a particular implementation. Sure. I reverted the changes in WorstCaseTests.java, and added a new test file ExpCornerCaseTests.java in test/jdk/java/lang/Math ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From github.com+51754783+coreyashford at openjdk.java.net Wed Nov 11 21:33:58 2020 From: github.com+51754783+coreyashford at openjdk.java.net (Corey Ashford) Date: Wed, 11 Nov 2020 21:33:58 GMT Subject: Integrated: 8248188: Add IntrinsicCandidate and API for Base64 decoding In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 02:45:36 GMT, Corey Ashford wrote: > This patch set encompasses the following commits: > > - Adds a new HotSpot intrinsic candidate to the java.util.Base64 class - decodeBlock(), and provides a flexible API for the intrinsic. The API is similar to the existing encodeBlock intrinsic. > - Adds the code in HotSpot to check and marshal the new intrinsic's arguments to the arch-specific intrinsic implementation > - Adds a Power64LE-specific implementation of the decodeBlock intrinsic. > - Adds a JMH microbenchmark for both Base64 encoding and decoding. 
> - Enhances the JTReg hotspot intrinsic "TestBase64.java" regression test to more fully test both decoding and encoding. This pull request has now been integrated. Changeset: ccb48b72 Author: Corey Ashford Committer: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/ccb48b72 Stats: 1905 lines in 25 files changed: 1878 ins; 4 del; 23 mod 8248188: Add IntrinsicCandidate and API for Base64 decoding 8248188: Add IntrinsicCandidate and API for Base64 decoding, add Power64LE intrinsic implementation. This patch set encompasses the following commits: Adds a new intrinsic candidate to the java.util.Base64 class - decodeBlock(), and provides a flexible API for the intrinsic. The API is similar to the existing encodeBlock intrinsic. Adds the code in HotSpot to check and marshal the new intrinsic's arguments to the arch-specific intrinsic implementation. Adds a Power64LE-specific implementation of the decodeBlock intrinsic. Adds a JMH microbenchmark for both Base64 encoding and decoding. Enhances the JTReg hotspot intrinsic "TestBase64.java" regression test to more fully test both decoding and encoding. Reviewed-by: rriggs, mdoerr, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/293 From david.holmes at oracle.com Wed Nov 11 22:15:41 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 12 Nov 2020 08:15:41 +1000 Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> Message-ID: <68912b7c-a600-fbea-6c0b-b4350ec8cc5c@oracle.com> On 12/11/2020 2:39 am, Daniel D.Daugherty wrote: > On Wed, 11 Nov 2020 05:12:21 GMT, David Holmes wrote: > >>> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >>> >>> resolve more robehn and coleenp comments. 
>> >> One change requested in relation to use of jint instead of size_t. >> One code simplification suggestion. >> Thanks, >> David > > This PR is now integrated! @dholmes-ora - for some reason you are not listed > as a reviewer and I'm not sure why. I'm sorry I didn't notice that you were dropped > off before I pulled the trigger. I'm not listed as a Reviewer because I did not mark this as approved and had an outstanding "change requested" status. That's just the way things work. I still think the jint/size_t situation is an unnecessary mess, but appreciate the extra comments you added. Thanks, David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/642 > From david.holmes at oracle.com Wed Nov 11 22:17:48 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 12 Nov 2020 08:17:48 +1000 Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5] In-Reply-To: <7rQQpVtTPecL-Vt4JJ0CR3U_1523KksQwFAsn9QpJK8=.5e872ffe-d1e4-40e9-9dd4-595712452e5c@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <05PVdNFn9-u7KHSDPXNSr62kvfFQHKCoyT6Z1TIAC2s=.40e0c36c-e7be-48e4-8352-dbc5ebae25b0@github.com> <6na55oAsYSTSML28qV7bzizCY1BqkmpkNsc1eHEZZ4U=.c58c86f2-b090-4c42-a73e-f36ff2572963@github.com> <7rQQpVtTPecL-Vt4JJ0CR3U_1523KksQwFAsn9QpJK8=.5e872ffe-d1e4-40e9-9dd4-595712452e5c@github.com> Message-ID: <900e4b3d-47d1-eecd-53f3-15ff16da3741@oracle.com> On 11/11/2020 7:51 am, Daniel D.Daugherty wrote: > On Tue, 10 Nov 2020 21:16:33 GMT, Robbin Ehn wrote: > >>> So if I narrow the scope around the ThreadBlockInVM, then it would be fine? >>> >>> { >>> // Honor block request. >>> ThreadBlockInVM tbivm(self); >>> } >>> >>> I can make that change before I integrate... >> >> Yes that avoids it! > > Done. I also did the one in ObjectSynchronizer::request_deflate_idle_monitors(). 
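The scoping question in the quoted exchange can be illustrated with a small sketch. This is not HotSpot code: `FakeThread` and `ScopedBlocked` are hypothetical stand-ins for a thread and for an RAII transition object such as ThreadBlockInVM, and the two functions only report whether a later step would run while the guard is still active.

```cpp
#include <cassert>

// Hypothetical stand-in for a thread with a "blocked" state flag.
struct FakeThread {
  bool blocked = false;
};

// Hypothetical RAII guard: the state is entered in the constructor and
// left in the destructor, so the braces around it decide how long it lasts.
class ScopedBlocked {
 public:
  explicit ScopedBlocked(FakeThread& t) : _t(t) {
    assert(!_t.blocked && "this sketch does not support nesting");
    _t.blocked = true;
  }
  ~ScopedBlocked() { _t.blocked = false; }
  ScopedBlocked(const ScopedBlocked&) = delete;
  ScopedBlocked& operator=(const ScopedBlocked&) = delete;

 private:
  FakeThread& _t;
};

// Narrow scope: the guard is destroyed at the closing brace, so a step
// placed after the braces (logging, say) no longer runs in the blocked state.
inline bool step_after_narrow_scope(FakeThread& t) {
  {
    ScopedBlocked guard(t);  // honor the block request
  }
  return t.blocked;  // state observed by the step that follows the braces
}

// Wide scope: a step placed before the guard is destroyed (a sleep, say)
// still runs in the blocked state.
inline bool step_inside_scope(FakeThread& t) {
  ScopedBlocked guard(t);
  return t.blocked;  // state observed by the step inside the scope
}
```

Which shape is right depends on what the step needs: the same narrowing that moves one step out of the guarded state would also move any other step placed after the braces out of it.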
Just to be crystal clear, the change in request_deflate_idle_monitors() was not needed as there is no logging code in the scope, and changing it was wrong as it put the sleep outside the scope of the TBIVM. Hence it was reverted and all is well. Thanks, David From iklam at openjdk.java.net Thu Nov 12 00:34:55 2020 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 12 Nov 2020 00:34:55 GMT Subject: RFR: 8256181: Remove Allocation of old generation on alternate memory devices functionality [v2] In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 15:38:10 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I get reviews for this change that removes the "Allocation of old generation of Java heap on alternate memory devices" functionality introduced with JDK 12 with [JDK-8202286](https://bugs.openjdk.java.net/browse/JDK-8202286) due to being >> >> - not used by anyone >> - not maintained by anyone, i.e. several bugs open for a long time and bit rotting >> - requiring some workarounds for new feature development wrt to heap management >> >> All flags covered by this feature were experimental flags, so there are no additional procedural issues to take. >> >> I tried to remove all but a few minor cleanups that I thought useful, but of course this is very subjective. >> >> Testing: hs-tier1-5 > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > ayang review LGTM ------------- Marked as reviewed by iklam (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1162 From ysuenaga at openjdk.java.net Thu Nov 12 01:36:56 2020 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Thu, 12 Nov 2020 01:36:56 GMT Subject: RFR: 8252657: JVMTI agent is not unloaded when Agent_OnAttach is failed In-Reply-To: <-Xrp6c94000-jE1p6NvzjsxFUW5ILrH_F1eT1i7esw8=.9d609f81-1b61-4ebf-9afd-73b834c1b18c@github.com> References: <1H1wUQdxCLU2qddqEIYSx2iOhIKL3b5etUmjsS6NBlU=.0bf1fe0c-8dcf-4ca0-bd57-b8794d5f2810@github.com> <80LJDTCsT_y-KlThryd5Bxu5RRyrjmKfs5p9vJUn61E=.68b594a0-fe58-4f4d-a49c-eec2e90f9373@github.com> <-Xrp6c94000-jE1p6NvzjsxFUW5ILrH_F1eT1i7esw8=.9d609f81-1b61-4ebf-9afd-73b834c1b18c@github.com> Message-ID: On Mon, 19 Oct 2020 01:32:45 GMT, Yasumasa Suenaga wrote: >>> * Q1: Is it necessary to call the Agent_OnUnload()? >> >> [JVMTI spec of Agent_OnUnload()](https://docs.oracle.com/en/java/javase/15/docs/specs/jvmti.html#onunload) says this function will be called when the agent library will be unloaded by platform specific mechanism. OTOH it also says `Agent_OnUnload()` will be called both at VM termination and **by other reasons**. >> The spec don't say for the case if `Agent_OnAttach()` would be failed. IMHO `Agent_OnUnload()` should be called because this PR would unload library if `Agent_OnAttach()` failed. >> >>> * Q2: Would it be a JVMTI spec violation to call the Agent_OnAttach() multiple times? (It seems to be the case to me.) >> >> `Agent_OnAttach()` should be called only once per attach request, but VM should accept multiple attach request for same agent library. >> >> For example, we can add multiple `-agentlib` and `-agentpath` request as below. JVMTI agent might change behavior due to arguments or configuration file. >> >> -agentlib:test=profile=A -agentlib:test=profile=B -agentpath:/path/to/libtest=profile=C >> >> Agent developers should have responsibility for the behavior when more than one agent is loaded at a time. 
>> >>> * Q3: What has to be done for statically linked agent? >> >> JVMTI spec says "unless it is statically linked into the executable", so I think we can ignore about Agent_OnUnload_L() in this PR. >> >>> * Q4: Should the agent be correctly loadable in the first place? What were the reasons its loading to fail? >> >> Agent (`Agent_OnAttach()`) might fail due to error in agent logic. For example, some agents load configuration file at initialization. If the user gives wrong value, it will fail. >> >>> Yes, at least, a CSR is needed for this. >> >> I will file CSR for this PR after this discussion. > > If we can change the spec that agent library would not be unloaded when `Agent_OnAttach()` failed, we can change like [webrev.00](https://cr.openjdk.java.net/~ysuenaga/JDK-8252657/webrev.00/). It is simple, and similar behavior with `Agent_OnLoad()`. It might be prefer for JVMTI agent developers. In case of `Agent_OnLoad()`, if it is failed (it returns other than zero), JVM is aborted and `Agent_OnUnload()` is not called. I think it is compliant with [JVMTI spec of Agent_OnUnload()](https://docs.oracle.com/en/java/javase/15/docs/specs/jvmti.html#onunload) which says uncontrolled shutdown (aborting JVM) is an exception to this rule. I will add CSR for this fix, but I want to discuss what we should do before. I like that `Agent_OnUnload()` wouldn't be called when `Agent_OnAttach()` is failed if we can change the spec - it is consistent and friendly with `Agent_OnUnload()`. 
------------- PR: https://git.openjdk.java.net/jdk/pull/19 From iklam at openjdk.java.net Thu Nov 12 01:49:11 2020 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 12 Nov 2020 01:49:11 GMT Subject: RFR: 8252526: Remove excessive inclusion of jvmti.h and jvmtiExport.hpp [v2] In-Reply-To: References: Message-ID: <4bq4kZB40GTPmR2CjRRUBiKHfTT9rXHhS1JqXEOd-bA=.f1b30506-4e13-4443-b834-7f233ff94e7f@github.com> > jvmti.h is included 905 times and jvmtiExport.hpp is included 776 times (out of 971 hotspot .o files). Most of these are unnecessarily included by the following 3 popular header files: > > * javaClasses.hpp: `java_lang_Class::ThreadStatus` (which depends on jvmti.h) this type is rarely used. Move this type to a separate header file. The enum is also changed to an enum class for better type safety. > * os.hpp: No need to include jvmti.h. Use forward declaration for `struct jvmtiTimerInfo;` instead. > * thread.hpp: No need to include jvmExport.hpp. Use forward declaration for `JvmtiSampledObjectAllocEventCollector` and `JvmtiVMObjectAllocEventCollector` instead. > > Tested with mach5 tier1-2 and build tiers 3-5. > > Note: Many files are changed, but most of the changes are adding a missing jvmtiExports.hpp. > > Original review thread: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-August/041509.html Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into 8252526-include-jvmti-hpp - 8252526: Remove excessive inclusion of jvmti.h and jvmtiExport.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1152/files - new: https://git.openjdk.java.net/jdk/pull/1152/files/becd1f67..4d0171e6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1152&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1152&range=00-01 Stats: 8837 lines in 139 files changed: 5759 ins; 2203 del; 875 mod Patch: https://git.openjdk.java.net/jdk/pull/1152.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1152/head:pull/1152 PR: https://git.openjdk.java.net/jdk/pull/1152 From iklam at openjdk.java.net Thu Nov 12 01:49:12 2020 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 12 Nov 2020 01:49:12 GMT Subject: RFR: 8252526: Remove excessive inclusion of jvmti.h and jvmtiExport.hpp [v2] In-Reply-To: <6ZCqzq3jEjC-XId2G8FgW268uQvrUISUrrcf6-FIEkM=.c4362162-9b7e-49ab-ba2d-08c76d1e872d@github.com> References: <6ZCqzq3jEjC-XId2G8FgW268uQvrUISUrrcf6-FIEkM=.c4362162-9b7e-49ab-ba2d-08c76d1e872d@github.com> Message-ID: On Wed, 11 Nov 2020 12:16:22 GMT, Magnus Ihse Bursie wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' of https://github.com/openjdk/jdk into 8252526-include-jvmti-hpp >> - 8252526: Remove excessive inclusion of jvmti.h and jvmtiExport.hpp > > Thank you for your continued effort to simplify building of hotspot. All efforts to clean up the include mess edges us slowly towards a more efficient hotspot build (which will improve build times overall, since most of the product depends on hotspot). 
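The forward-declaration technique named in the PR summary can be sketched as follows. This is a single-file illustration under stated assumptions, not the actual os.hpp or jvmti.h contents: `TimerInfoSketch`, its fields, and `fill_timer_info` are hypothetical names standing in for `jvmtiTimerInfo` and the code that uses it.

```cpp
#include <cassert>

// Part 1: what a widely included header would contain.
// A forward declaration is enough to declare functions that take a
// pointer or reference to the type, so the heavy header that defines it
// does not have to be included here.
struct TimerInfoSketch;  // hypothetical stand-in for jvmtiTimerInfo

void fill_timer_info(TimerInfoSketch* info);

// Part 2: what the few source files that really use the type would see.
// In the real code the full definition would come from including the
// defining header; the fields below are illustrative only.
struct TimerInfoSketch {
  long long max_value;
  bool may_skip_forward;
  bool may_skip_backward;
};

// Only the implementation needs the complete type, because it accesses
// the members.
void fill_timer_info(TimerInfoSketch* info) {
  assert(info != nullptr);
  info->max_value = 1000;
  info->may_skip_forward = false;
  info->may_skip_backward = false;
}
```

Files that only pass the pointer along compile against Part 1 alone, which is why a forward declaration in a popular header can cut the number of translation units that pull in the full definition.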
Thanks @magicus @kimbarrett for the review

-------------

PR: https://git.openjdk.java.net/jdk/pull/1152

From iklam at openjdk.java.net Thu Nov 12 01:49:15 2020
From: iklam at openjdk.java.net (Ioi Lam)
Date: Thu, 12 Nov 2020 01:49:15 GMT
Subject: Integrated: 8252526: Remove excessive inclusion of jvmti.h and jvmtiExport.hpp
In-Reply-To:
References:
Message-ID:

On Wed, 11 Nov 2020 00:03:56 GMT, Ioi Lam wrote:

> jvmti.h is included 905 times and jvmtiExport.hpp is included 776 times (out of 971 hotspot .o files). Most of these are unnecessarily included by the following 3 popular header files:
>
> * javaClasses.hpp: `java_lang_Class::ThreadStatus` (which depends on jvmti.h) is rarely used. Move this type to a separate header file. The enum is also changed to an enum class for better type safety.
> * os.hpp: No need to include jvmti.h. Use forward declaration for `struct jvmtiTimerInfo;` instead.
> * thread.hpp: No need to include jvmtiExport.hpp. Use forward declaration for `JvmtiSampledObjectAllocEventCollector` and `JvmtiVMObjectAllocEventCollector` instead.
>
> Tested with mach5 tier1-2 and build tiers 3-5.
>
> Note: Many files are changed, but most of the changes are adding a missing jvmtiExport.hpp.
>
> Original review thread: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2020-August/041509.html

This pull request has now been integrated.
Changeset: 2f06893a
Author: Ioi Lam
URL: https://git.openjdk.java.net/jdk/commit/2f06893a
Stats: 232 lines in 65 files changed: 125 ins; 34 del; 73 mod

8252526: Remove excessive inclusion of jvmti.h and jvmtiExport.hpp

Reviewed-by: ihse, kbarrett

-------------

PR: https://git.openjdk.java.net/jdk/pull/1152

From thomas.stuefe at gmail.com Thu Nov 12 05:03:09 2020
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 12 Nov 2020 06:03:09 +0100
Subject: RFR: 8256181: Remove Allocation of old generation on alternate memory devices functionality
In-Reply-To: <9155ea57-140f-6206-5cc6-cc0064129b1f@oracle.com>
References: <9155ea57-140f-6206-5cc6-cc0064129b1f@oracle.com>
Message-ID:

Hi Thomas,

On Wed, Nov 11, 2020 at 4:12 PM Thomas Schatzl wrote:

> Hi,
>
> On 11.11.20 15:06, Thomas Stüfe wrote:
> > Hi Thomas,
> >
> > I think this makes sense. Just a question, what are your thoughts about
> > JEP316? Do you consider AllocateHeapAt similarly unused?
>
> There are no plans to remove AllocateHeapAt functionality I am aware of.
> It is also much less intrusive than AllocateOldGenAt as you can see from
> this huge change, and actually works as far as I know :)
>

Yes, I am baffled by how invasive all that stuff was. AllocateHeapAt caused a bit of complexity in the virtual memory APIs that we could do without, but we should maybe revise those APIs anyway. They are really cracking at the seams.

> AllocateHeapAt is also an easy way to get huge pages for your heap via
> HugeTLBFS without reserving them exclusively upfront, or putting the
> entire java heap onto NVDIMMs (afaiu, but you might notice that I'm no
> real expert in this).

I see. I need to play around with this a bit more :)

>
> Thanks,
> Thomas
>

BTW I think it is good that we remove features if they are not maintained.
Cheers, Thomas From tschatzl at openjdk.java.net Thu Nov 12 09:06:06 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 12 Nov 2020 09:06:06 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 11:39:59 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that changes the way how archive regions are managed in general and specifically by the G1 collector, fixing the crashes caused by adding the module graph into the archive in [JDK-8244778](https://bugs.openjdk.java.net/browse/JDK-8244778)? > > Previously before the JDK-8244778 change, archived objects could always be assumed as live, and so the G1 collector did so, not caring about the archive region's contents at all. With JDK-8244778 however, archived objects could die, and keep stale references to objects outside of the archive regions, which obviously causes crashes when walking these objects. > > With this change, open archive region contents are basically handled as any other objects; to support that, all open archive regions are now reachable via a single object array root. This hopefully also facilitates implementation in other collectors. > > This allows us to remove quite a bit of special handling in G1 too; the only difference is that open archive regions will generally not be collected unless they are completely empty: we do want to profit from the sharing across VMs as much as possible. > > Testing: tier1-5, one or two 6-8 runs > > The appcds changes were done by @iklam. These changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements > > Thanks, > Thomas Change the handling of Open Archive areas, instead of assuming that everything in there is live always, a root containing references to all live root objects is provided. 
Adapt G1 to handle Open Archive regions as any other old region apart from never compacting or evacuating them. ------------- PR: https://git.openjdk.java.net/jdk/pull/1163 From tschatzl at openjdk.java.net Thu Nov 12 09:06:05 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 12 Nov 2020 09:06:05 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions Message-ID: Hi all, can I have reviews for this change that changes the way how archive regions are managed in general and specifically by the G1 collector, fixing the crashes caused by adding the module graph into the archive in [JDK-8244778](https://bugs.openjdk.java.net/browse/JDK-8244778)? Previously before the JDK-8244778 change, archived objects could always be assumed as live, and so the G1 collector did so, not caring about the archive region's contents at all. With JDK-8244778 however, archived objects could die, and keep stale references to objects outside of the archive regions, which obviously causes crashes when walking these objects. With this change, open archive region contents are basically handled as any other objects; to support that, all open archive regions are now reachable via a single object array root. This hopefully also facilitates implementation in other collectors. This allows us to remove quite a bit of special handling in G1 too; the only difference is that open archive regions will generally not be collected unless they are completely empty: we do want to profit from the sharing across VMs as much as possible. Testing: tier1-5, one or two 6-8 runs The appcds changes were done by @iklam. 
These changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements Thanks, Thomas ------------- Commit messages: - Initial import Changes: https://git.openjdk.java.net/jdk/pull/1163/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1163&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253081 Stats: 657 lines in 32 files changed: 467 ins; 83 del; 107 mod Patch: https://git.openjdk.java.net/jdk/pull/1163.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1163/head:pull/1163 PR: https://git.openjdk.java.net/jdk/pull/1163 From iklam at openjdk.java.net Thu Nov 12 09:06:06 2020 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 12 Nov 2020 09:06:06 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 12:31:52 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I have reviews for this change that changes the way how archive regions are managed in general and specifically by the G1 collector, fixing the crashes caused by adding the module graph into the archive in [JDK-8244778](https://bugs.openjdk.java.net/browse/JDK-8244778)? >> >> Previously before the JDK-8244778 change, archived objects could always be assumed as live, and so the G1 collector did so, not caring about the archive region's contents at all. With JDK-8244778 however, archived objects could die, and keep stale references to objects outside of the archive regions, which obviously causes crashes when walking these objects. >> >> With this change, open archive region contents are basically handled as any other objects; to support that, all open archive regions are now reachable via a single object array root. This hopefully also facilitates implementation in other collectors. 
>>
>> This allows us to remove quite a bit of special handling in G1 too; the only difference is that open archive regions will generally not be collected unless they are completely empty: we do want to profit from the sharing across VMs as much as possible.
>>
>> Testing: tier1-5, one or two 6-8 runs
>>
>> The appcds changes were done by @iklam. These changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements
>>
>> Thanks,
>> Thomas
>
> Change the handling of Open Archive areas, instead of assuming that everything in there is live always, a root containing references to all live root objects is provided. Adapt G1 to handle Open Archive regions as any other old region apart from never compacting or evacuating them.

The CDS changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements

-------------

PR: https://git.openjdk.java.net/jdk/pull/1163

From ngasson at openjdk.java.net Thu Nov 12 09:10:01 2020
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Thu, 12 Nov 2020 09:10:01 GMT
Subject: RFR: 8221554: aarch64 cross-modifying code [v6]
In-Reply-To: <1nKnthro_DGnukTjVKanK4y3FQwiQyGfJmtwc_Qm3Ik=.08f3cee7-8c4a-4f04-8073-d02e30774600@github.com>
References: <1nKnthro_DGnukTjVKanK4y3FQwiQyGfJmtwc_Qm3Ik=.08f3cee7-8c4a-4f04-8073-d02e30774600@github.com>
Message-ID:

On Fri, 23 Oct 2020 08:50:52 GMT, Alan Hayward wrote:

>> The AArch64 port uses maybe_isb in places where an ISB might be required
>> because the code may have safepointed. These maybe_isbs are very conservative
>> and in many places are used when a safepoint has not happened.
>>
>> cross_modify_fence was added in common code to place a barrier in all the
>> places after a safepoint has occurred. All the uses of it are in common code,
>> yet it remains unimplemented on AArch64.
>>
>> This set of patches implements cross_modify_fence for AArch64 and reconsiders
>> every use of maybe_isb, discarding many of them. In addition, it introduces
>> a new diagnostic option, which when enabled on AArch64 tests the correct
>> usage of the barriers.
>>
>> The advantage of this patch is threefold:
>> * Reducing the number of ISBs - giving a theoretical performance improvement.
>> * Use of common code instead of backend specific code.
>> * Additional test diagnostic options
>>
>> Patch 1: Split cross_modify_fence
>> =================================
>> This is simply refactoring work split out to simplify the other two patches.
>>
>> instruction_fence() is provided by each target and simply places
>> a fence for the instruction stream.
>>
>> cross_modify_fence() is now a member of JavaThread and just calls
>> instruction_fence. This function will be extended in Patch 3.
>>
>> Patch 2: Use cross_modify_fence instead of maybe_isb
>> ====================================================
>>
>> The [n] References refer to the comments for cross_modify_fence in
>> thread.hpp.
>>
>> These are all the existing uses of maybe_isb in the AArch64 target:
>>
>> 1) Instances of Java code calling a VM function
>> * This encapsulates the changes to:
>> ** MacroAssembler::call_VM_leaf_base()
>> ** generate_fast_get_int_field0()
>> ** stubGenerator_aarch64 generate_throw_exception()
>> ** sharedRuntime_aarch64 generate_handler_blob()
>> ** SharedRuntime::generate_resolve_blob()
>> ** C1 LIR_Assembler::rt_call
>> ** C1 StubAssembler::call_RT(): used by generate_exception_throw,
>> generate_handle_exception, generate_code_for.
>> ** OptoRuntime::generate_exception_blob()
>> * Any changes will be caught due to calls to [2] or [3] by the VM function.
>> * Any calls that do not call [2] or [3] do not require an ISB.
>> * This patch is more optimal for these cases.
>>
>> 2) Instances of Java code calling a JNI function
>> * This encapsulates the changes to:
>> ** SharedRuntime::generate_native_wrapper()
>> ** TemplateInterpreterGenerator::generate_native_entry()
>> * A safepoint still in progress after the call will be caught by [4].
>> * An ISB is still required for the case where there was a safepoint
>> but it completed during the call. This happens if the code doesn't
>> branch on safepoint_in_progress
>> * In the SharedRuntime version, the two possible calls to
>> reguard_yellow_pages and complete_monitor_unlocking_C are after the thread
>> goes back into its original state, so are covered by [2] and [3], the
>> same as a normal VM call.
>> * This patch is only more optimal for the two post-JNI calls.
>>
>> 3) Patching functions
>> * This encapsulates the changes to:
>> ** patch_callers_callsite() (called by gen_c2i_adapter())
>> * This results in code being patched, but does not safepoint
>> * Therefore an ISB is required.
>> * This patch introduces no change here.
>>
>> 4) C1 MacroAssembler::emit_static_call_stub()
>> * Calls ISB (not maybe_isb)
>> * By design, proper functioning of the patching doesn't require
>> the up-to-date destination.
>> * However, the ISB makes it most likely that the new destination will
>> be picked up.
>> * This patch introduces no change here.
>>
>> Patch 3: Add cross modify fence verification
>> ============================================
>>
>> The VerifyCrossModifyFence diagnostic flag enables verification of the correct
>> usage of instruction barriers. It can safely be enabled on any Java run.
>>
>> Enabling it will cause the following:
>>
>> * Once all threads have been brought to a safepoint, each thread will be
>> marked.
>>
>> * On a cross_modify_fence and safepoint_fence the mark for that thread
>> will be cleared.
>>
>> * On entry to a method and in a safepoint poll, the thread is checked.
>> If it is marked, then the code will error.
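The three verification steps quoted above for Patch 3 can be pictured with a small toy model. This is only a sketch in plain Java under stated assumptions: the real check lives in HotSpot's C++ runtime, and the class name, method names and the use of string thread ids here are all hypothetical and chosen purely for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: all names are hypothetical, nothing here is HotSpot code.
final class CrossModifyFenceModel {
    // true = the thread was stopped at a safepoint and has not fenced since
    private final Map<String, Boolean> marked = new ConcurrentHashMap<>();

    void register(String thread) {
        marked.put(thread, false);
    }

    // Step 1: once all threads are at a safepoint, every thread is marked.
    void safepointReached() {
        marked.replaceAll((thread, mark) -> true);
    }

    // Step 2: a cross_modify_fence or safepoint_fence clears that thread's mark.
    void fence(String thread) {
        marked.put(thread, false);
    }

    // Step 3: on method entry or in a safepoint poll, a still-marked thread is an error.
    void check(String thread) {
        if (marked.getOrDefault(thread, false)) {
            throw new IllegalStateException(thread + " resumed without an instruction fence");
        }
    }

    public static void main(String[] args) {
        CrossModifyFenceModel model = new CrossModifyFenceModel();
        model.register("t1");
        model.register("t2");
        model.safepointReached();
        model.fence("t1");
        model.check("t1"); // passes: t1 fenced after the safepoint
        try {
            model.check("t2"); // t2 never fenced, so the check fails
        } catch (IllegalStateException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

The point of the model is that a missing fence is detected at the next check, not at the safepoint itself, which matches the description of the flag as a diagnostic that can be left on for any run.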
> > Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge master > > Change-Id: I97df4e7686699478f0f89451ec0a3537d38cfd6d > - Merge master > > Change-Id: I5e1715fdb11305191fe7bf86cbfb7a6da446b3dc > - Remove inlasm_isb define > > Change-Id: I2d0ef8a78292dac875f3f65d2253981cdb7a497a > - AArch64: Add cross modify fence verification > - AArch64: Use cross_modify_fence instead of maybe_isb > - Split cross_modify_fence src/hotspot/os_cpu/linux_aarch64/orderAccess_linux_aarch64.hpp line 60: > 58: } > 59: > 60: #undef inlasm_isb Don't need this `#undef` any more. ------------- PR: https://git.openjdk.java.net/jdk/pull/428 From mbaesken at openjdk.java.net Thu Nov 12 09:21:03 2020 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Thu, 12 Nov 2020 09:21:03 GMT Subject: RFR: JDK-8256258: some missing NULL checks or asserts after CodeCache::find_blob_unsafe Message-ID: Most return values of CodeCache::find_blob_unsafe are NULL checked or at least handled by an asserts. However especially on ppc a few are missing and should be added (e.g. in nativeInst_ppc.cpp). 
------------- Commit messages: - JDK-8256258 Changes: https://git.openjdk.java.net/jdk/pull/1181/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1181&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256258 Stats: 10 lines in 3 files changed: 7 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1181/head:pull/1181 PR: https://git.openjdk.java.net/jdk/pull/1181 From rkennke at openjdk.java.net Thu Nov 12 09:54:07 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 12 Nov 2020 09:54:07 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v11] In-Reply-To: References: Message-ID: > In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. > > I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: > - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions > - We can have strong and phantom native referents, and strong and weak in-heap referents > - Native referents are never compressed > > Note that this depends on PR#1140. 
> > Testing: > - [x] hotspot_gc_shenandoah (x86_64, x64_32, aarch64) > - [x] tier1 +UseShenandoahGC +ShenandoahVerify > - [x] tier2 +UseShenandoahGC +ShenandoahVerify Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Call phantom LRB when ON_PHANTOM_OOP_REF is requested in C1 LRB stub ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1109/files - new: https://git.openjdk.java.net/jdk/pull/1109/files/3eeee3c3..29347682 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=09-10 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1109/head:pull/1109 PR: https://git.openjdk.java.net/jdk/pull/1109 From jbhateja at openjdk.java.net Thu Nov 12 10:10:00 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 12 Nov 2020 10:10:00 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11] In-Reply-To: References: Message-ID: <0_LT7bB5ut9xNX4zaFidW8UBC3lFRMh91qaMtBw21nI=.d0cedf97-d1cb-43c0-bb69-04495f611bb3@github.com> On Wed, 11 Nov 2020 16:09:20 GMT, Nils Eliasson wrote: > Do you have any tests that exercise the different possible versions? > > * dynamic length with both small and long copies > * dynamic length that can be proven always less than PartialInliningSize > * constant size less than PartialInliningSize > > Except for these minor comments, and the tests, I am ready to approve. Hi Nils, Thanks for your comments, Suggested tests have already been added as the part of commit for JDK-8252847 test/hotspot/jtreg/compiler/arraycopy/TestArrayCopyConjoint.java test/hotspot/jtreg/compiler/arraycopy/TestArrayCopyDisjoint.java I shall remove extra spaces before integration. 
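For readers who want to picture the three copy shapes Nils asks about, a minimal sketch in plain Java follows. It is not the content of TestArrayCopyConjoint.java or TestArrayCopyDisjoint.java; the class name and the limit of 32 bytes are assumptions standing in for whatever PartialInliningSize is on a given machine.

```java
import java.util.Arrays;

// Illustrative sketch only; ASSUMED_SMALL_LIMIT is a made-up stand-in value.
final class ArrayCopyLengthsSketch {
    static final int ASSUMED_SMALL_LIMIT = 32;

    // Case 1: length only known at run time, may be small or large.
    static byte[] copyDynamic(byte[] src, int len) {
        byte[] dst = new byte[len];
        System.arraycopy(src, 0, dst, 0, len);
        return dst;
    }

    // Case 2: dynamic length that is provably below the limit (masked to 0..31).
    static byte[] copyBounded(byte[] src, int len) {
        int n = len & (ASSUMED_SMALL_LIMIT - 1);
        byte[] dst = new byte[n];
        System.arraycopy(src, 0, dst, 0, n);
        return dst;
    }

    // Case 3: constant length below the limit.
    static byte[] copyConstant(byte[] src) {
        byte[] dst = new byte[16];
        System.arraycopy(src, 0, dst, 0, 16);
        return dst;
    }

    static byte[] source(int n) {
        byte[] a = new byte[n];
        for (int i = 0; i < n; i++) {
            a[i] = (byte) i;
        }
        return a;
    }

    public static void main(String[] args) {
        byte[] src = source(1024);
        for (int len : new int[] {0, 1, 7, 31, 32, 33, 1024}) {
            byte[] expected = Arrays.copyOf(src, len);
            if (!Arrays.equals(copyDynamic(src, len), expected)) {
                throw new AssertionError("dynamic copy mismatch at length " + len);
            }
            byte[] bounded = Arrays.copyOf(src, len & (ASSUMED_SMALL_LIMIT - 1));
            if (!Arrays.equals(copyBounded(src, len), bounded)) {
                throw new AssertionError("bounded copy mismatch at length " + len);
            }
        }
        if (!Arrays.equals(copyConstant(src), Arrays.copyOf(src, 16))) {
            throw new AssertionError("constant copy mismatch");
        }
        System.out.println("all copies match");
    }
}
```

Whether a JIT actually takes the partially inlined path for any of these methods depends on the VM flags and hardware; the sketch only checks that the copied contents are correct.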
------------- PR: https://git.openjdk.java.net/jdk/pull/302 From shade at openjdk.java.net Thu Nov 12 10:57:55 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 12 Nov 2020 10:57:55 GMT Subject: RFR: JDK-8256258: some missing NULL checks or asserts after CodeCache::find_blob_unsafe In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 09:11:51 GMT, Matthias Baesken wrote: > Most return values of CodeCache::find_blob_unsafe are NULL checked or at least handled by an asserts. However especially on ppc a few are missing and should be added (e.g. in nativeInst_ppc.cpp). Leftover below? Otherwise I think the change is fine, because the alternative is `NULL`-dereference and (hopefully) SEGV. hb-subset-plan.lst line 2: > 1: GRARNN: gr29643 used before defined gr29643 at: > 2: 0: L4Z gr29647=(hb_array_t).length(gr29643,8) What's this change? ------------- Changes requested by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1181 From mbaesken at openjdk.java.net Thu Nov 12 11:13:01 2020 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Thu, 12 Nov 2020 11:13:01 GMT Subject: RFR: JDK-8256258: some missing NULL checks or asserts after CodeCache::find_blob_unsafe In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 10:54:06 GMT, Aleksey Shipilev wrote: >> Most return values of CodeCache::find_blob_unsafe are NULL checked or at least handled by an asserts. However especially on ppc a few are missing and should be added (e.g. in nativeInst_ppc.cpp). > > hb-subset-plan.lst line 2: > >> 1: GRARNN: gr29643 used before defined gr29643 at: >> 2: 0: L4Z gr29647=(hb_array_t).length(gr29643,8) > > What's this change? 
seems this is somehow related to harfbuzz but it should be removed of course ------------- PR: https://git.openjdk.java.net/jdk/pull/1181 From mbaesken at openjdk.java.net Thu Nov 12 11:24:11 2020 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Thu, 12 Nov 2020 11:24:11 GMT Subject: RFR: JDK-8256258: some missing NULL checks or asserts after CodeCache::find_blob_unsafe [v2] In-Reply-To: References: Message-ID: > Most return values of CodeCache::find_blob_unsafe are NULL checked or at least handled by an asserts. However especially on ppc a few are missing and should be added (e.g. in nativeInst_ppc.cpp). Matthias Baesken has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: JDK-8256258 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1181/files - new: https://git.openjdk.java.net/jdk/pull/1181/files/e0156d33..7c196d3e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1181&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1181&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1181/head:pull/1181 PR: https://git.openjdk.java.net/jdk/pull/1181 From shade at openjdk.java.net Thu Nov 12 11:24:12 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 12 Nov 2020 11:24:12 GMT Subject: RFR: JDK-8256258: some missing NULL checks or asserts after CodeCache::find_blob_unsafe [v2] In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 11:21:14 GMT, Matthias Baesken wrote: >> Most return values of CodeCache::find_blob_unsafe are NULL checked or at least handled by an asserts. However especially on ppc a few are missing and should be added (e.g. in nativeInst_ppc.cpp). 
>
> Matthias Baesken has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
> JDK-8256258

Looks good to me. Do these cases only exist in AIX/PPC code?

-------------

Marked as reviewed by shade (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1181

From mbaesken at openjdk.java.net Thu Nov 12 11:24:12 2020
From: mbaesken at openjdk.java.net (Matthias Baesken)
Date: Thu, 12 Nov 2020 11:24:12 GMT
Subject: RFR: JDK-8256258: some missing NULL checks or asserts after CodeCache::find_blob_unsafe [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 12 Nov 2020 11:17:10 GMT, Aleksey Shipilev wrote:

> Looks good to me. Do these cases only exist in AIX/PPC code?

I found find_blob_unsafe calls whose return value is neither checked nor asserted only in ppc and ppc/AIX code. (Regarding find_blob, there are a few that are checked and a few without a NULL check/assert across platforms; this might be intentional.)

-------------

PR: https://git.openjdk.java.net/jdk/pull/1181

From sjohanss at openjdk.java.net Thu Nov 12 11:39:55 2020
From: sjohanss at openjdk.java.net (Stefan Johansson)
Date: Thu, 12 Nov 2020 11:39:55 GMT
Subject: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap
In-Reply-To:
References:
Message-ID:

On Wed, 11 Nov 2020 01:48:46 GMT, Marcus G K Williams wrote:

> Use 2m pages for executable large pages and in large page requests less than 1g on linux.
>
> - Add os::exec_large_page_size() that returns 2m as size
> - Add os::select_large_page_size() to return correct large page size for size_t bytes
> - Add 2m size to _page_sizes array
> - Update reserve_memory_special methods to set/use large_page_size based on exec size
> - Update large page not reserved warnings to include large_page_size attempted
> - Update TestLargePageUseForAuxMemory.java to expect 2m large pages in some instances
>
> Signed-off-by: Marcus G K Williams

Hi and welcome :)

I haven't started reviewing the code in detail but a first quick glance raised a couple of questions/comments:

* Why do we have a special case for `exec` when selecting a large page size?
* If we need the special case, `exec_large_page_size()` should not be hard-coded to return 2m but rather use `os::default_large_page_size()`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153

From mcimadamore at openjdk.java.net Thu Nov 12 12:18:09 2020
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Thu, 12 Nov 2020 12:18:09 GMT
Subject: RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v29]
In-Reply-To:
References:
Message-ID:

> This patch contains the changes associated with the third incubation round of the foreign memory access API (see JEP 393 [1]).
This iteration focuses on improving the usability of the API in 3 main ways:
>
> * first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from multiple threads
> * second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee that the memory will be deallocated, eventually
> * third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class has been added, which defines several useful dereference routines; these are really just thin wrappers around memory access var handles, but they make the barrier to entry for using this API somewhat lower.
>
> A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit of dereference.
>
> This has all changed as per this API refresh; now a `MemoryAddress` is just a dumb carrier which wraps a pair of object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is concerned. You cannot dereference memory if you don't have a segment. This improves usability in a number of ways - first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that; otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling `MemoryAddress::asSegmentRestricted`).
>
> A list of the API, implementation and test changes is provided below.
If you have any questions, or need more detailed explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide the feedback required.
>
> A big thanks to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segments would not have been possible; also I'd like to thank Paul Sandoz, whose insights on API design have been very helpful in this journey.
>
> Thanks
> Maurizio
>
> Javadoc:
>
> http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html
>
> Specdiff:
>
> http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html
>
> CSR:
>
> https://bugs.openjdk.java.net/browse/JDK-8254163
>
>
>
> ### API Changes
>
> * `MemorySegment`
>   * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below)
>   * added a no-arg factory for a native restricted segment representing entire native heap
>   * rename `withOwnerThread` to `handoff`
>   * add new `share` method, to create shared segments
>   * add new `registerCleaner` method, to register a segment against a cleaner
>   * add more helpers to create arrays from a segment e.g. `toIntArray`
>   * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors)
>   * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`)
> * `MemoryAddress`
>   * drop `segment` accessor
>   * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative to a given segment
> * `MemoryAccess`
>   * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a carrier is one of the usual suspects (a Java primitive, minus `boolean`); the access mode can be simple (e.g.
access base address of given segment), or indexed, in which case the accessor takes a segment and either a low-level byte offset, or a high-level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs. `getByteAtOffset` vs. `getByteAtIndex`).
> * `MemoryHandles`
>   * drop `withOffset` combinator
>   * drop `withStride` combinator
>   * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which it is easy to derive all the other handles using plain var handle combinators.
> * `Addressable`
>   * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2], to add more implementations. Clients can largely ignore this interface, which comes in really handy when defining native bindings with tools like `jextract`.
> * `MemoryLayouts`
>   * A new layout, for machine addresses, has been added to the mix.
>
>
>
> ### Implementation changes
>
> There are two main things to discuss here: support for shared segments, and the general simplification of the memory access var handle support.
>
> #### Shared segments
>
> The support for shared segments cuts in pretty deep in the VM. Support for shared segments is notoriously hard to achieve, at least in a way that guarantees optimal access performance. This is caused by the fact that, if a segment is shared, it would be possible for a thread to close it while another is accessing it.
>
> After considering several options (see [3]), we zeroed in on an approach which is inspired by a happy idea that Andrew Haley had (and that he reminded me of at this year's OpenJDK committer workshop - thanks!). The idea is that if we could *freeze* the world (e.g. with a GC pause), while a segment is closed, we could then prevent segments from being accessed concurrently with a close operation.
For this to work, it is crucial that no GC safepoints can occur between a segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop right before an unsafe call). It also relies on the fact that hotspot/C2 should not be able to propagate loads across safepoints.
>
> Sadly, none of these conditions seems to be valid in the current implementation, so we needed to resort to a bit of creativity. First, we noted that, if we could mark so-called *scoped* methods with an annotation, it would be very simple to check whether a thread was in the middle of a scoped method when we stopped the world for a close operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312 [4]).
>
> The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread, thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds. Unfortunately the machinery for async exceptions is a bit fragile (e.g. not all the code in hotspot checks for async exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that another thread is accessing the segment being closed.
>
> As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a bug in the user code, likely arising from lack of synchronization, and should be treated as such.
>
> In terms of gritty implementation, we needed to centralize memory access routines in a single place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in addition, also provided a liveness check.
This way we could mark all these routines with the special `@Scoped` annotation, which tells the VM that something important is going on. > > To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`. This class contains all the main memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the access coordinates, a scope object, which is tested before access. A reachability fence is also thrown in the mix to make sure that the scope is kept alive during access (which is important when registering segments against cleaners). > > Of course, to make memory access safe, memory access var handles, byte buffer var handles, and the byte buffer API should use the new `ScopedMemoryAccess` class instead of `Unsafe`, so that a liveness check can be triggered (in case a scope is present). > > `ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the handshake completed successfully. > > `MemoryScope` (now significantly simplified from what we had before) has two implementations, one for confined segments and one for shared segments; the main difference between the two is what happens when the scope is closed: a confined segment sets a boolean flag to false and returns, whereas a shared segment goes into a `CLOSING` state, then starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive` will still return `true`, while the liveness check upon memory access will fail. > > #### Memory access var handles overhaul > > The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and `long`, all other access types could be derived from this basic var handle form.
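As a rough illustration of deriving one var handle from a basic one, here is a minimal sketch that uses a byte-array view var handle as a stand-in for the basic `(MemorySegment, long)` handle. It assumes JDK 16 or later, where `MethodHandles.insertCoordinates` is available; the class name, the record layout and the field offset are hypothetical and are not taken from the patch.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Illustrative sketch only: a byte[] plays the role of a memory segment and
// an int index plays the role of the long byte offset.
class VarHandleDerivationSketch {

    // Basic handle: coordinates are (byte[] base, int byteOffset), value type is int.
    static final VarHandle BASIC_INT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);

    // Hypothetical layout: an int field "y" located 4 bytes into a record.
    static final int Y_OFFSET = 4;

    // Derived handle: the offset coordinate is bound, leaving only (byte[] base).
    static final VarHandle Y_FIELD =
            MethodHandles.insertCoordinates(BASIC_INT, 1, Y_OFFSET);

    static int readY(byte[] record) {
        return (int) Y_FIELD.get(record);
    }

    static void writeY(byte[] record, int value) {
        Y_FIELD.set(record, value);
    }

    public static void main(String[] args) {
        byte[] record = new byte[8];
        writeY(record, 42);
        // The basic handle sees the same bytes when given the offset explicitly.
        int viaBasic = (int) BASIC_INT.get(record, Y_OFFSET);
        System.out.println(readY(record) + " " + viaBasic);
    }
}
```

The derived handle carries no special knowledge of the underlying storage; it is just the basic handle with one coordinate pre-bound, which is the property that makes dedicated offset and stride combinators unnecessary.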
> > This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g. additional offset is injected into a base memory access var handle. > > This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators, which previously needed low-level access on the innards of the memory access var handle. All that code is now gone. > > #### Test changes > > Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the microbenchmarks also needed some tweaks - and some of them were also updated to also test performance in the shared segment case. > > [1] - https://openjdk.java.net/jeps/393 > [2] - https://openjdk.java.net/jeps/389 > [3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html > [4] - https://openjdk.java.net/jeps/312 Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 36 commits: - Merge branch 'master' into 8254162 - Invert condition in memory access var handle `withInvokeBehavior' - Merge branch 'master' into 8254162 - Add more output in TestHandhsake.java - Further improve output of TestHandshake - Improve debugging output of TestHandhsake - Remove endianness-aware byte getter/setter in MemoryAccess Remove index-based version of byte getter/setter in MemoryAccess - Fix post-merge issues caused by 8219014 - Merge branch 'master' into 8254162 - Addess remaining feedback from @AlanBateman and @mrserb - ... 
and 26 more: https://git.openjdk.java.net/jdk/compare/ec08b3f2...0b81a39e ------------- Changes: https://git.openjdk.java.net/jdk/pull/548/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=28 Stats: 7600 lines in 82 files changed: 4791 ins; 1590 del; 1219 mod Patch: https://git.openjdk.java.net/jdk/pull/548.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/548/head:pull/548 PR: https://git.openjdk.java.net/jdk/pull/548 From rkennke at openjdk.java.net Thu Nov 12 12:41:12 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 12 Nov 2020 12:41:12 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v12] In-Reply-To: References: Message-ID: <_cPA4cY7jgq7ZOiXVcPTTZ6SdoBiAIIH_OGrsswU2tg=.2788ba6a-985c-46b0-b73d-c9d9a198bdfd@github.com> > In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. > > I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: > - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions > - We can have strong and phantom native referents, and strong and weak in-heap referents > - Native referents are never compressed > > Note that this depends on PR#1140. 
> > Testing: > - [x] hotspot_gc_shenandoah (x86_64, x64_32, aarch64) > - [x] tier1 +UseShenandoahGC +ShenandoahVerify > - [x] tier2 +UseShenandoahGC +ShenandoahVerify Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Avoid null-check and skip cset-check on non-strong refs in C1 LRB stub - Call weak-LRB with ON_WEAK_OOP_REF, not ON_UNKNOWN_OOP_REF (cosmetic) ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1109/files - new: https://git.openjdk.java.net/jdk/pull/1109/files/29347682..83c07cb8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=11 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=10-11 Stats: 26 lines in 2 files changed: 6 ins; 7 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/1109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1109/head:pull/1109 PR: https://git.openjdk.java.net/jdk/pull/1109 From hseigel at openjdk.java.net Thu Nov 12 13:26:54 2020 From: hseigel at openjdk.java.net (Harold Seigel) Date: Thu, 12 Nov 2020 13:26:54 GMT Subject: Integrated: 8255787: Tag container tests that use cGroups with cgroups keyword In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 21:24:25 GMT, Harold Seigel wrote: > Please review this small change to add a cgroups keyword to tests that use cgroups. The fix was tested by running Mach5 container tests. This pull request has now been integrated. 
Changeset: 4df8abc2 Author: Harold Seigel URL: https://git.openjdk.java.net/jdk/commit/4df8abc2 Stats: 20 lines in 15 files changed: 15 ins; 0 del; 5 mod 8255787: Tag container tests that use cGroups with cgroups keyword Reviewed-by: sspitsyn ------------- PR: https://git.openjdk.java.net/jdk/pull/1148 From hseigel at openjdk.java.net Thu Nov 12 13:30:57 2020 From: hseigel at openjdk.java.net (Harold Seigel) Date: Thu, 12 Nov 2020 13:30:57 GMT Subject: RFR: 8255787: Tag container tests that use cGroups with cgroups keyword In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 00:27:59 GMT, Serguei Spitsyn wrote: >> Please review this small change to add a cgroups keyword to tests that use cgroups. The fix was tested by running Mach5 container tests. > > Hi Harold, > > The fix looks good. > > Thanks, > Serguei Thanks Serguei! Harold ------------- PR: https://git.openjdk.java.net/jdk/pull/1148 From tschatzl at openjdk.java.net Thu Nov 12 14:09:00 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 12 Nov 2020 14:09:00 GMT Subject: RFR: 8256181: Remove Allocation of old generation on alternate memory devices functionality [v2] In-Reply-To: <0nEbOq5anud0koNcq_5EV57wUd8wc8qHUCjI1GP2L0I=.ae3defc8-6c6b-40dd-894b-e5a4dde4d4bb@github.com> References: <0nEbOq5anud0koNcq_5EV57wUd8wc8qHUCjI1GP2L0I=.ae3defc8-6c6b-40dd-894b-e5a4dde4d4bb@github.com> Message-ID: On Wed, 11 Nov 2020 15:19:05 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> ayang review > > A general comment for future PRs: I think it's best to isolate mechanical changes into their own commits; e.g. `HeapRegionManager* _hrm;` -> `HeapRegionManager _hrm;`. Otherwise, a real change, buried in the immense size of diff, might slip through. Thanks @albertnetymk @iklam @iignatev for your reviews. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1162 From tschatzl at openjdk.java.net Thu Nov 12 14:09:01 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 12 Nov 2020 14:09:01 GMT Subject: Integrated: 8256181: Remove Allocation of old generation on alternate memory devices functionality In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 11:11:25 GMT, Thomas Schatzl wrote: > Hi all, > > can I get reviews for this change that removes the "Allocation of old generation of Java heap on alternate memory devices" functionality introduced in JDK 12 with [JDK-8202286](https://bugs.openjdk.java.net/browse/JDK-8202286) due to being > > - not used by anyone > - not maintained by anyone, i.e. several bugs open for a long time and bit rotting > - requiring some workarounds for new feature development wrt heap management > > All flags covered by this feature were experimental flags, so there are no additional procedural issues to take. > > I tried to remove all but a few minor cleanups that I thought useful, but of course this is very subjective. > > Testing: hs-tier1-5 This pull request has now been integrated. Changeset: bd8693a0 Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/bd8693a0 Stats: 2315 lines in 50 files changed: 4 ins; 2183 del; 128 mod 8256181: Remove Allocation of old generation on alternate memory devices functionality Reviewed-by: ayang, iignatyev, iklam ------------- PR: https://git.openjdk.java.net/jdk/pull/1162 From github.com+4146708+a74nh at openjdk.java.net Thu Nov 12 14:31:12 2020 From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward) Date: Thu, 12 Nov 2020 14:31:12 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v7] In-Reply-To: References: Message-ID: > The AArch64 port uses maybe_isb in places where an ISB might be required > because the code may have safepointed. These maybe_isbs are very conservative > and are used in many places where a safepoint has not happened.
> > cross_modify_fence was added in common code to place a barrier in all the > places after a safepoint has occurred. All the uses of it are in common code, > yet it remains unimplemented on AArch64. > > This set of patches implements cross_modify_fence for AArch64 and reconsiders > every use of maybe_isb, discarding many of them. In addition, it introduces > a new diagnostic option, which when enabled on AArch64 tests the correct > usage of the barriers. > > The advantage of this patch is threefold: > * Reducing the number of ISBs - giving a theoretical performance improvement. > * Use of common code instead of backend-specific code. > * Additional test diagnostic options > > Patch 1: Split cross_modify_fence > ================================= > This is simply refactoring work split out to simplify the other two patches. > > instruction_fence() is provided by each target and simply places > a fence for the instruction stream. > > cross_modify_fence() is now a member of JavaThread and just calls > instruction_fence. This function will be extended in Patch 3. > > Patch 2: Use cross_modify_fence instead of maybe_isb > ==================================================== > > The [n] references refer to the comments for cross_modify_fence in > thread.hpp. > > These are all the existing uses of maybe_isb in the AArch64 target: > > 1) Instances of Java code calling a VM function > * This encapsulates the changes to: > ** MacroAssembler::call_VM_leaf_base() > ** generate_fast_get_int_field0() > ** stubGenerator_aarch64 generate_throw_exception() > ** sharedRuntime_aarch64 generate_handler_blob() > ** SharedRuntime::generate_resolve_blob() > ** C1 LIR_Assembler::rt_call > ** C1 StubAssembler::call_RT(): used by generate_exception_throw, > generate_handle_exception, generate_code_for. > ** OptoRuntime::generate_exception_blob() > * Any changes will be caught due to calls to [2] or [3] by the VM function. > * Any calls that do not call [2] or [3] do not require an ISB.
> * This patch is more optimal for these cases. > > 2) Instances of Java code calling a JNI function > * This encapsulates the changes to: > ** SharedRuntime::generate_native_wrapper() > ** TemplateInterpreterGenerator::generate_native_entry() > * A safepoint still in progress after the call will be caught by [4]. > * An ISB is still required for the case where there was a safepoint > but it completed during the call. This happens if the code doesn't > branch on safepoint_in_progress. > * In the SharedRuntime version, the two possible calls to > reguard_yellow_pages and complete_monitor_unlocking_C are after the thread > goes back into its original state, so are covered by [2] and [3], the > same as a normal VM call. > * This patch is only more optimal for the two post-JNI calls. > > 3) Patching functions > * This encapsulates the changes to: > ** patch_callers_callsite() (called by gen_c2i_adapter()) > * This results in code being patched, but does not safepoint. > * Therefore an ISB is required. > * This patch introduces no change here. > > 4) C1 MacroAssembler::emit_static_call_stub() > * Calls ISB (not maybe_isb) > * By design, proper functioning does not require the up-to-date > destination to be visible after patching. > * However, the ISB makes it most likely that the new destination will > be picked up. > * This patch introduces no change here. > > Patch 3: Add cross modify fence verification > ============================================ > > The VerifyCrossModifyFence diagnostic flag enables confirmation of the correct > usage of instruction barriers. It can safely be enabled on any Java run. > > Enabling it will cause the following: > > * Once all threads have been brought to a safepoint, each thread will be > marked. > > * On a cross_modify_fence and safepoint_fence the mark for that thread > will be cleared. > > * On entry to a method and in a safepoint poll, the thread is checked. > If it is marked, then the code will error.
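The marking scheme behind VerifyCrossModifyFence can be illustrated with a small simulation. The sketch below is plain Java, not HotSpot code; the class and method names (`FenceVerifierSketch`, `SimThread`, `onSafepoint`, `crossModifyFence`, `checkOnEntry`) are hypothetical and only model the three rules listed above: mark every thread at a safepoint, clear the mark on a fence, and fail if a marked thread reaches a method entry or safepoint poll.

```java
import java.util.List;

// Simulation only: models the per-thread mark, not real instruction barriers.
class FenceVerifierSketch {

    // Hypothetical stand-in for a JavaThread with its verification mark.
    static final class SimThread {
        final String name;
        boolean requiresFence; // set at a safepoint, cleared by a fence

        SimThread(String name) {
            this.name = name;
        }
    }

    // Rule 1: once all threads are at a safepoint, mark each of them.
    static void onSafepoint(List<SimThread> threads) {
        for (SimThread t : threads) {
            t.requiresFence = true;
        }
    }

    // Rule 2: a cross_modify_fence (or safepoint fence) clears the mark.
    static void crossModifyFence(SimThread t) {
        t.requiresFence = false;
    }

    // Rule 3: on method entry or in a safepoint poll, a marked thread is an error.
    static void checkOnEntry(SimThread t) {
        if (t.requiresFence) {
            throw new IllegalStateException(
                    t.name + " ran code without a fence after a safepoint");
        }
    }

    public static void main(String[] args) {
        SimThread good = new SimThread("good");
        SimThread bad = new SimThread("bad");
        onSafepoint(List.of(good, bad));

        crossModifyFence(good);
        checkOnEntry(good); // passes: the fence cleared the mark

        try {
            checkOnEntry(bad); // fails: no fence since the safepoint
        } catch (IllegalStateException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

In this model a missing fence shows up at the next method entry instead of as a rare stale-instruction failure, which is presumably why the flag is useful as a debugging aid.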
Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - Update comments & remove ifdef Change-Id: Ibbe45650d351d8cff6fbf7a7c8baf30afbdac17c CustomizedGitHooks: yes - Merge master 2020/11/12 Change-Id: I73323c90765bf8524f12f680abde7e7e5b3bb898 CustomizedGitHooks: yes - Merge master Change-Id: I97df4e7686699478f0f89451ec0a3537d38cfd6d - Merge master Change-Id: I5e1715fdb11305191fe7bf86cbfb7a6da446b3dc - Remove inlasm_isb define Change-Id: I2d0ef8a78292dac875f3f65d2253981cdb7a497a - AArch64: Add cross modify fence verification - AArch64: Use cross_modify_fence instead of maybe_isb - Split cross_modify_fence ------------- Changes: https://git.openjdk.java.net/jdk/pull/428/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=428&range=06 Stats: 164 lines in 25 files changed: 117 ins; 8 del; 39 mod Patch: https://git.openjdk.java.net/jdk/pull/428.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/428/head:pull/428 PR: https://git.openjdk.java.net/jdk/pull/428 From github.com+4146708+a74nh at openjdk.java.net Thu Nov 12 14:31:13 2020 From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward) Date: Thu, 12 Nov 2020 14:31:13 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v6] In-Reply-To: References: <1nKnthro_DGnukTjVKanK4y3FQwiQyGfJmtwc_Qm3Ik=.08f3cee7-8c4a-4f04-8073-d02e30774600@github.com> Message-ID: On Thu, 12 Nov 2020 09:07:01 GMT, Nick Gasson wrote: >> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: >> >> - Merge master >> >> Change-Id: I97df4e7686699478f0f89451ec0a3537d38cfd6d >> - Merge master >> >> Change-Id: I5e1715fdb11305191fe7bf86cbfb7a6da446b3dc >> - Remove inlasm_isb define >> >> Change-Id: I2d0ef8a78292dac875f3f65d2253981cdb7a497a >> - AArch64: Add cross modify fence verification >> - AArch64: Use cross_modify_fence instead of maybe_isb >> - Split cross_modify_fence > > src/hotspot/os_cpu/linux_aarch64/orderAccess_linux_aarch64.hpp line 60: > >> 58: } >> 59: >> 60: #undef inlasm_isb > > Don't need this `#undef` any more. Missed that, thanks. Removed now ------------- PR: https://git.openjdk.java.net/jdk/pull/428 From Alan.Hayward at arm.com Thu Nov 12 14:44:14 2020 From: Alan.Hayward at arm.com (Alan Hayward) Date: Thu, 12 Nov 2020 14:44:14 +0000 Subject: RFR: 8221554: aarch64 cross-modifying code [v6] In-Reply-To: <3b96a543-f268-758c-09bd-4f8233d9c473@redhat.com> References: <3b96a543-f268-758c-09bd-4f8233d9c473@redhat.com> Message-ID: <198AD316-111F-41A2-B6E1-71C2F15F67F2@arm.com> > On 11 Nov 2020, at 16:44, Andrew Haley wrote: > > On 11/11/2020 14:32, Alan Hayward wrote: >> Quick ping for this. >> AIUI, it needs a second reviewer to vote ok. >> There aren't any outstanding issues (that I'm aware of). > > Please look at the issues still marked Pending. > I'm probably missing something obvious, but I don't see anything marked on the GitHub page, or see anything outstanding in the comments. Unless you just mean missing having a sponsor? Spotted the comments in the patch were slightly out of date, so I've updated those in the meantime (plus merged to head). Alan.
From rkennke at openjdk.java.net Thu Nov 12 15:18:10 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 12 Nov 2020 15:18:10 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v13] In-Reply-To: References: Message-ID: <0ZJsK6RHI2m-zr7RoheY-uQCsO-AxfxNxMyVjERjD18=.f22c8dff-d391-48bd-8aca-a65592a0ae30@github.com> > In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. > > I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: > - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions > - We can have strong and phantom native referents, and strong and weak in-heap referents > - Native referents are never compressed > > Note that this depends on PR#1140. 
> > Testing: > - [x] hotspot_gc_shenandoah (x86_64, x64_32, aarch64) > - [x] tier1 +UseShenandoahGC +ShenandoahVerify > - [x] tier2 +UseShenandoahGC +ShenandoahVerify Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Don't call compressed-oops LRB on IN_NATIVE from C1 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1109/files - new: https://git.openjdk.java.net/jdk/pull/1109/files/83c07cb8..51811e31 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=12 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=11-12 Stats: 29 lines in 3 files changed: 24 ins; 2 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1109/head:pull/1109 PR: https://git.openjdk.java.net/jdk/pull/1109 From stuefe at openjdk.java.net Thu Nov 12 15:39:00 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 12 Nov 2020 15:39:00 GMT Subject: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap In-Reply-To: References: Message-ID: <-xtX9qSJHuD-qfp52XPToKhkl1HypRmNFHCJaupaync=.99285cd9-69a6-42cd-84b4-3c87fefc2cc5@github.com> On Wed, 11 Nov 2020 01:48:46 GMT, Marcus G K Williams wrote: > Use 2m pages for executable large > pages and in large page requests less > than 1g on linux. > > - Add os::exec_large_page_size() that > returns 2m as size > - Add os::select_large_page_size() to return > correct large page size for size_t bytes > - Add 2m size to _page_sizes array > - Update reserve_memory_special methods > to set/use large_page_size based on exec > size > - Update large page not reserved warnings > to include large_page_size attempted > - Update TestLargePageUseForAuxMemory.java > to expect 2m large pages in some instances > > Signed-off-by: Marcus G K Williams Hi, this seems like an improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)).
At the moment this feels more like an early prototype. The lack of comments/documentation is not helping. Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? For x86 only? I ask since the ticket talks about 4K pages, which are not universal across all architectures. I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec. What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :) Why is this proposal hard coded to 2M pages? What memory regions are supposed to be affected by this? The JBS ticket talks about "code, card table and other". One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked into a few places. Are there any places where an implicit assumption about the page size or its "pinned-ness" could break things now (see also the remark about UseSHM below)? For instance, are these pages pinned on all our platforms, and if not, could code be affected which commits/uncommits and assumes a certain page size? What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM? The latter is interesting because that is arguably where the bigger behavioral change is. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages. For SHM, I think you need to make sure that alignment matches SHMLBA? It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitly is smaller than your exec_page size?
Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR. Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtest examples, see test/hotspot/gtest/runtime in the source folder). The linux-2m-page-specific code in the platform-generic G1 test seems wrong. Cheers, Thomas src/hotspot/os/linux/os_linux.cpp line 3970: > 3968: char* req_addr, bool exec) { > 3969: size_t large_page_size; > 3970: large_page_size = os::select_large_page_size(bytes, exec); The "os" namespace is shared and platform generic. Please don't add anything there unless you write it (and test it as much as possible) for the different platforms. I do not see why this API should even be exported from this unit. src/hotspot/os/linux/os_linux.cpp line 4002: > 4000: char msg[128]; > 4001: jio_snprintf(msg, sizeof(msg), "Failed to reserve shared memory with large_page_size: " SIZE_FORMAT ".", large_page_size); > 4002: shm_warning_format_with_errno("%s", msg); Why the double printf here? But you can just use Unified Logging `log_info(os)("..")`. See e.g. thread creation in this file for examples. test/hotspot/jtreg/gc/g1/TestLargePageUseForAuxMemory.java line 80: > 78: } > 79: > 80: static void testVM(String what, long heapsize, boolean cardsShouldUseLargePages, boolean bitmapShouldUseLargePages, boolean largePages2m) throws Exception { Having this linux-specific stuff in a generic G1 test :( test/hotspot/jtreg/gc/g1/TestLargePageUseForAuxMemory.java line 150: > 148: if (Platform.isLinux() && largePageSize != largePageExecSize) { > 149: try { > 150: Scanner scan_hugepages = new Scanner(new File("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages")); 2M hard coded.
------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From aph at openjdk.java.net Thu Nov 12 15:59:59 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 12 Nov 2020 15:59:59 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v3] In-Reply-To: References: Message-ID: On Mon, 12 Oct 2020 12:45:19 GMT, Alan Hayward wrote: > Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - AArch64: Add cross modify fence verification > - AArch64: Use cross_modify_fence instead of maybe_isb > - Split cross_modify_fence Still pending. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5317: > 5315: #endif > 5316: } > 5317: Unless VerifyCrossModifyFence is turned on in debug builds it will almost never be used. Please turn this on by default in AArch64 debug builds. ------------- Changes requested by aph (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/428 From aph at openjdk.java.net Thu Nov 12 16:00:05 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 12 Nov 2020 16:00:05 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v6] In-Reply-To: <1nKnthro_DGnukTjVKanK4y3FQwiQyGfJmtwc_Qm3Ik=.08f3cee7-8c4a-4f04-8073-d02e30774600@github.com> References: <1nKnthro_DGnukTjVKanK4y3FQwiQyGfJmtwc_Qm3Ik=.08f3cee7-8c4a-4f04-8073-d02e30774600@github.com> Message-ID: <0PpHCCKEdktOLbil22UM8L7Zpqcp82ejR9Zdu6VmmWI=.186fcfb1-33fc-45ad-9c49-a67552a837b9@github.com> On Fri, 23 Oct 2020 08:50:52 GMT, Alan Hayward wrote: >> The AArch64 port uses maybe_isb in places where an ISB might be required >> because the code may have safepointed. These maybe_isbs are very conservative >> and are used in many places are used when a safepoint has not happened. >> >> cross_modify_fence was added in common code to place a barrier in all the >> places after a safepoint has occurred. All the uses of it are in common code, >> yet it remains unimplemented on AArch64. >> >> This set of patches implements cross_modify_fence for AArch64 and reconsiders >> every uses of maybe_isb, discarding many of them. In addition, it introduces >> a new diagnostic option, which when enabled on AArch64 tests the correct >> usage of the barriers. >> >> Advantage of this patch is threefold: >> * Reducing the number of ISBs - giving a theoretical performance improvement. >> * Use of common code instead of backend specific code. >> * Additional test diagnostic options >> >> Patch 1: Split cross_modify_fence >> ================================= >> This is simply refactoring work split out to simplify the other two patches. >> >> instruction_fence() is provided by each target and simply places >> a fence for the instruction stream. >> >> cross_modify_fence() is now a member of JavaThread and just calls >> instruction_fence. This function will be extended in Patch 3. 
>> >> Patch 2: Use cross_modify_fence instead of maybe_isb >> ==================================================== >> >> The [n] references refer to the comments for cross_modify_fence in >> thread.hpp. >> >> These are all the existing uses of maybe_isb in the AArch64 target: >> >> 1) Instances of Java code calling a VM function >> * This encapsulates the changes to: >> ** MacroAssembler::call_VM_leaf_base() >> ** generate_fast_get_int_field0() >> ** stubGenerator_aarch64 generate_throw_exception() >> ** sharedRuntime_aarch64 generate_handler_blob() >> ** SharedRuntime::generate_resolve_blob() >> ** C1 LIR_Assembler::rt_call >> ** C1 StubAssembler::call_RT(): used by generate_exception_throw, >> generate_handle_exception, generate_code_for. >> ** OptoRuntime::generate_exception_blob() >> * Any changes will be caught due to calls to [2] or [3] by the VM function. >> * Any calls that do not call [2] or [3] do not require an ISB. >> * This patch is more optimal for these cases. >> >> 2) Instances of Java code calling a JNI function >> * This encapsulates the changes to: >> ** SharedRuntime::generate_native_wrapper() >> ** TemplateInterpreterGenerator::generate_native_entry() >> * A safepoint still in progress after the call will be caught by [4]. >> * An ISB is still required for the case where there was a safepoint >> but it completed during the call. This happens if the code doesn't >> branch on safepoint_in_progress >> * In the SharedRuntime version, the two possible calls to >> reguard_yellow_pages and complete_monitor_unlocking_C are after the thread >> goes back into its original state, so are covered by [2] and [3], the >> same as a normal VM call. >> * This patch is only more optimal for the two post-JNI calls. >> >> 3) Patching functions >> * This encapsulates the changes to: >> ** patch_callers_callsite() (called by gen_c2i_adapter()) >> * This results in code being patched, but does not safepoint >> * Therefore an ISB is required.
>> * This patch introduces no change here. >> >> 4) C1 MacroAssembler::emit_static_call_stub() >> * Calls ISB (not maybe_isb) >> * By design, the patching doesn't require that the up-to-date >> destination is required for proper functioning. >> * However, the ISB makes it most likely that the new destination will >> be picked up. >> * This patch introduces no change here. >> >> Patch 3: Add cross modify fence verification >> ============================================ >> >> The VerifyCrossModifyFence diagnostic flag enables confirmation to the correct >> usage of instruction barriers. It can safely be enabled on any Java run. >> >> Enabling it will cause the following: >> >> * Once all threads have been brought to a safepoint, each thread will be >> marked. >> >> * On a cross_modify_fence and safepoint_fence the mark for that thread >> will be cleared. >> >> * On entry to a method and in a safepoint poll, then the thread is checked. >> If it is marked, then the code will error. > > Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge master > > Change-Id: I97df4e7686699478f0f89451ec0a3537d38cfd6d > - Merge master > > Change-Id: I5e1715fdb11305191fe7bf86cbfb7a6da446b3dc > - Remove inlasm_isb define > > Change-Id: I2d0ef8a78292dac875f3f65d2253981cdb7a497a > - AArch64: Add cross modify fence verification > - AArch64: Use cross_modify_fence instead of maybe_isb > - Split cross_modify_fence src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1413: > 1411: __ blr(rscratch2); > 1412: // An instruction sync is required here after the call into the VM. However, > 1413: // that will have been caught in the VM by a cross_modify_fence call. I think this wording is confusing; I had to read it several times. Would not something like ''' // When we return from the VM, the instruction stream may have // been modified. 
Previously we emitted an ISB at this point, but // it's now unnecessary because the VM itself calls cross_modify_fence() ''' be better? src/hotspot/share/runtime/orderAccess.hpp line 237: > 235: // to the instruction code preceding the fence is not reordered w.r.t. any > 236: // memory accesses to instruction code subsequent to the fence in program order. > 237: // It should be used in conjunction with safepointing to ensure that changes This is rather misleading: the AArch64 needs the ISB for the instruction pipeline rather than the cache, which is invalidated by the IC IVAU broadcast. I suspect other processors work in the same way. The language in the AArch64 spec is better, IMO: "ensures that all instructions that come after the ISB instruction in program order are fetched from the cache or memory after the ISB instruction has completed" ------------- PR: https://git.openjdk.java.net/jdk/pull/428 From aph at openjdk.java.net Thu Nov 12 16:00:07 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 12 Nov 2020 16:00:07 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v3] In-Reply-To: References: Message-ID: On Mon, 12 Oct 2020 14:51:31 GMT, Andrew Haley wrote: >> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - AArch64: Add cross modify fence verification >> - AArch64: Use cross_modify_fence instead of maybe_isb >> - Split cross_modify_fence > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5317: > >> 5315: #endif >> 5316: } >> 5317: > > Unless VerifyCrossModifyFence is turned on in debug builds it will almost never be used. Please turn this on by default in AArch64 debug builds. Please... 
------------- PR: https://git.openjdk.java.net/jdk/pull/428 From aph at openjdk.java.net Thu Nov 12 16:00:08 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 12 Nov 2020 16:00:08 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v6] In-Reply-To: <0PpHCCKEdktOLbil22UM8L7Zpqcp82ejR9Zdu6VmmWI=.186fcfb1-33fc-45ad-9c49-a67552a837b9@github.com> References: <1nKnthro_DGnukTjVKanK4y3FQwiQyGfJmtwc_Qm3Ik=.08f3cee7-8c4a-4f04-8073-d02e30774600@github.com> <0PpHCCKEdktOLbil22UM8L7Zpqcp82ejR9Zdu6VmmWI=.186fcfb1-33fc-45ad-9c49-a67552a837b9@github.com> Message-ID: On Fri, 23 Oct 2020 10:10:09 GMT, Andrew Haley wrote: >> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge master >> >> Change-Id: I97df4e7686699478f0f89451ec0a3537d38cfd6d >> - Merge master >> >> Change-Id: I5e1715fdb11305191fe7bf86cbfb7a6da446b3dc >> - Remove inlasm_isb define >> >> Change-Id: I2d0ef8a78292dac875f3f65d2253981cdb7a497a >> - AArch64: Add cross modify fence verification >> - AArch64: Use cross_modify_fence instead of maybe_isb >> - Split cross_modify_fence > > src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1413: > >> 1411: __ blr(rscratch2); >> 1412: // An instruction sync is required here after the call into the VM. However, >> 1413: // that will have been caught in the VM by a cross_modify_fence call. > > I think this wording is confusing; I had to read it several times. > > Would not something like > ''' > // When we return from the VM, the instruction stream may have > // been modified. Previously we emitted an ISB at this point, but > // it's now unnecessary because the VM itself calls cross_modify_fence() > ''' > > be better? This is still pending. 
------------- PR: https://git.openjdk.java.net/jdk/pull/428 From coleenp at openjdk.java.net Thu Nov 12 16:23:23 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 12 Nov 2020 16:23:23 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v8] In-Reply-To: References: Message-ID: > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. 
> > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add logging to event posting in case of pauses. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/967/files - new: https://git.openjdk.java.net/jdk/pull/967/files/6dec83d8..0487b84c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=06-07 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/967/head:pull/967 PR: https://git.openjdk.java.net/jdk/pull/967 From mcimadamore at openjdk.java.net Thu Nov 12 16:41:01 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Thu, 12 Nov 2020 16:41:01 GMT Subject: Integrated: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 17:13:22 GMT, Maurizio Cimadamore wrote: > This patch contains the changes associated with the third incubation round of the foreign memory access API incubation (see JEP 393 [1]). 
This iteration focuses on improving the usability of the API in 3 main ways: > > * first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from multiple threads > * second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee that the memory will be deallocated, eventually > * third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class has been added, which defines several useful dereference routines; these are really just thin wrappers around memory access var handles, but they make the barrier to entry for using this API somewhat lower. > > A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit of dereference. > > This has all changed as per this API refresh; now a `MemoryAddress` is just a dumb carrier which wraps a pair of object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is concerned. You cannot dereference memory if you don't have a segment. This improves usability in a number of ways - first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that; otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling `MemoryAddress::asSegmentRestricted`). > > A list of the API, implementation and test changes is provided below.
If you have any questions, or need more detailed explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide the feedback required. > > A big thanks to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segments would not have been possible; also I'd like to thank Paul Sandoz, whose insights on API design have been very helpful in this journey. > > Thanks > Maurizio > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff: > > http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254163 > > > > ### API Changes > > * `MemorySegment` > * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below) > * added a no-arg factory for a native restricted segment representing the entire native heap > * rename `withOwnerThread` to `handoff` > * add new `share` method, to create shared segments > * add new `registerCleaner` method, to register a segment against a cleaner > * add more helpers to create arrays from a segment e.g. `toIntArray` > * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors) > * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`) > * `MemoryAddress` > * drop `segment` accessor > * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative to a given segment > * `MemoryAccess` > * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a carrier is one of the usual suspects (a Java primitive, minus `boolean`); the access mode can be simple (e.g.
access the base address of a given segment), or indexed, in which case the accessor takes a segment and either a low-level byte offset, or a high-level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs. `getByteAtOffset` vs. `getByteAtIndex`). > * `MemoryHandles` > * drop `withOffset` combinator > * drop `withStride` combinator > * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which it is easy to derive all the other handles using plain var handle combinators. > * `Addressable` > * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2], to add more implementations. Clients can largely ignore this interface, which comes in really handy when defining native bindings with tools like `jextract`. > * `MemoryLayouts` > * A new layout, for machine addresses, has been added to the mix. > > > > ### Implementation changes > > There are two main things to discuss here: support for shared segments, and the general simplification of the memory access var handle support. > > #### Shared segments > > The support for shared segments cuts pretty deep into the VM. Support for shared segments is notoriously hard to achieve, at least in a way that guarantees optimal access performance. This is caused by the fact that, if a segment is shared, it would be possible for a thread to close it while another is accessing it. > > After considering several options (see [3]), we zeroed in on an approach which is inspired by a happy idea that Andrew Haley had (and that he reminded me of at this year's OpenJDK committer workshop - thanks!). The idea is that if we could *freeze* the world (e.g. with a GC pause), while a segment is closed, we could then prevent segments from being accessed concurrently with a close operation.
For this to work, it is crucial that no GC safepoints can occur between a segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop right before an unsafe call). It also relies on the fact that hotspot/C2 should not be able to propagate loads across safepoints. > > Sadly, none of these conditions seems to be valid in the current implementation, so we needed to resort to a bit of creativity. First, we noted that, if we could mark so-called *scoped* methods with an annotation, it would be very simple to check whether a thread was in the middle of a scoped method when we stopped the world for a close operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312 [4]). > > The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread, thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds. Unfortunately the machinery for async exceptions is a bit fragile (e.g. not all the code in hotspot checks for async exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that another thread is accessing the segment being closed. > > As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a bug in the user code, likely arising from lack of synchronization, and should be treated as such. > > In terms of gritty implementation, we needed to centralize memory access routines in a single place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in addition, also provided a liveness check.
This way we could mark all these routines with the special `@Scoped` annotation, which tells the VM that something important is going on. > > To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`. This class contains all the main memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the access coordinates, also a scope object, which is tested before access. A reachability fence is also thrown in the mix to make sure that the scope is kept alive during access (which is important when registering segments against cleaners). > > Of course, to make memory access safe, memory access var handles, byte buffer var handles, and the byte buffer API should use the new `ScopedMemoryAccess` class instead of `Unsafe`, so that a liveness check can be triggered (in case a scope is present). > > `ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the handshake completed successfully. > > `MemoryScope` (now significantly simplified from what we had before) has two implementations, one for confined segments and one for shared segments; the main difference between the two is what happens when the scope is closed; a confined segment sets a boolean flag to false, and returns, whereas a shared segment goes into a `CLOSING` state, then starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive` will still return `true`, while the liveness check upon memory access will fail. > > #### Memory access var handles overhaul > > The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and `long`, all other access types could be derived from this basic var handle form.
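A minimal sketch of the shared-scope close protocol described under "Shared segments" above. This is a toy model, not the JDK's `MemoryScope`: the class name `SharedScopeModel`, the method names, and the `BooleanSupplier` standing in for the VM handshake are all assumptions made for illustration. It only shows the `ALIVE` -> `CLOSING` -> `CLOSED`/`ALIVE` transitions, and that `isAlive` stays `true` during `CLOSING` while the access-time liveness check already fails.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Toy model of the shared-scope close protocol; NOT the JDK implementation. */
final class SharedScopeModel {
    static final int ALIVE = 0, CLOSING = 1, CLOSED = 2;

    private final AtomicInteger state = new AtomicInteger(ALIVE);

    /** As described in the mail: still true while a close is in flight. */
    boolean isAlive() {
        return state.get() != CLOSED;
    }

    /** Liveness check performed before an access: fails in CLOSING and CLOSED. */
    void checkValidState() {
        if (state.get() != ALIVE) {
            throw new IllegalStateException("Already closed");
        }
    }

    /**
     * Attempts to close the scope. The handshake argument is a hypothetical
     * stand-in for the thread-local handshake; it returns true when no thread
     * was found in the middle of a scoped access.
     */
    void close(BooleanSupplier handshake) {
        if (!state.compareAndSet(ALIVE, CLOSING)) {
            throw new IllegalStateException("Already closed or closing");
        }
        boolean success = handshake.getAsBoolean();
        // Either commit the close or roll back to ALIVE.
        state.set(success ? CLOSED : ALIVE);
        if (!success) {
            throw new IllegalStateException(
                "Cannot close while another thread is accessing the segment");
        }
    }
}
```

In this model a failed `close` leaves the scope usable again, which matches the description that the state is updated back to `ALIVE` when the handshake does not succeed.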
> > This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g. additional offset is injected into a base memory access var handle. > > This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators, which previously needed low-level access on the innards of the memory access var handle. All that code is now gone. > > #### Test changes > > Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the microbenchmarks also needed some tweaks - and some of them were also updated to also test performance in the shared segment case. > > [1] - https://openjdk.java.net/jeps/393 > [2] - https://openjdk.java.net/jeps/389 > [3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html > [4] - https://openjdk.java.net/jeps/312 This pull request has now been integrated. Changeset: 3e70aac5 Author: Maurizio Cimadamore URL: https://git.openjdk.java.net/jdk/commit/3e70aac5 Stats: 7600 lines in 82 files changed: 4791 ins; 1590 del; 1219 mod 8254162: Implementation of Foreign-Memory Access API (Third Incubator) Reviewed-by: erikj, psandoz, alanb ------------- PR: https://git.openjdk.java.net/jdk/pull/548 From vlivanov at openjdk.java.net Thu Nov 12 17:05:06 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 12 Nov 2020 17:05:06 GMT Subject: RFR: 8256275: Optimized build is broken Message-ID: Fix optimized build. 
Testing: - [x] manual build with --with-debug-level=optimized - [x] hs-precheckin-comp, hs-tier1, hs-tier2 ------------- Commit messages: - Fix optimized build Changes: https://git.openjdk.java.net/jdk/pull/1185/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1185&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256275 Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/1185.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1185/head:pull/1185 PR: https://git.openjdk.java.net/jdk/pull/1185 From redestad at openjdk.java.net Thu Nov 12 17:15:00 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 12 Nov 2020 17:15:00 GMT Subject: RFR: 8256275: Optimized build is broken In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 12:53:24 GMT, Vladimir Ivanov wrote: > Fix optimized build. > > Testing: > - [x] manual build with --with-debug-level=optimized > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 Marked as reviewed by redestad (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1185 From github.com+4146708+a74nh at openjdk.java.net Thu Nov 12 17:28:14 2020 From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward) Date: Thu, 12 Nov 2020 17:28:14 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v3] In-Reply-To: References: Message-ID: On Fri, 23 Oct 2020 10:17:09 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5317: >> >>> 5315: #endif >>> 5316: } >>> 5317: >> >> Unless VerifyCrossModifyFence is turned on in debug builds it will almost never be used. Please turn this on by default in AArch64 debug builds. > > Please... Aha - Looks like your comments hadn't been made public until now. The problem is it massively slows down a run. A tier1 test run for fastdebug went from 1h 32m 58s to 3h 43m 47s. I didn't think that would be acceptable. 
------------- PR: https://git.openjdk.java.net/jdk/pull/428 From github.com+4146708+a74nh at openjdk.java.net Thu Nov 12 17:28:15 2020 From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward) Date: Thu, 12 Nov 2020 17:28:15 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v6] In-Reply-To: References: <1nKnthro_DGnukTjVKanK4y3FQwiQyGfJmtwc_Qm3Ik=.08f3cee7-8c4a-4f04-8073-d02e30774600@github.com> <0PpHCCKEdktOLbil22UM8L7Zpqcp82ejR9Zdu6VmmWI=.186fcfb1-33fc-45ad-9c49-a67552a837b9@github.com> Message-ID: On Wed, 11 Nov 2020 15:04:32 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1413: >> >>> 1411: __ blr(rscratch2); >>> 1412: // An instruction sync is required here after the call into the VM. However, >>> 1413: // that will have been caught in the VM by a cross_modify_fence call. >> >> I think this wording is confusing; I had to read it several times. >> >> Would not something like >> ''' >> // When we return from the VM, the instruction stream may have >> // been modified. Previously we emitted an ISB at this point, but >> // it's now unnecessary because the VM itself calls cross_modify_fence() >> ''' >> >> be better? > > This is still pending. I wanted to avoid mentioning code that no longer exists. (Maybe it's best to just drop the comment?) How about: // When we return from the VM, the instruction stream may have // been modified. Therefore an isb is required. The VM will // have already done this by calling cross_modify_fence().
------------- PR: https://git.openjdk.java.net/jdk/pull/428 From github.com+4146708+a74nh at openjdk.java.net Thu Nov 12 17:28:18 2020 From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward) Date: Thu, 12 Nov 2020 17:28:18 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v6] In-Reply-To: <0PpHCCKEdktOLbil22UM8L7Zpqcp82ejR9Zdu6VmmWI=.186fcfb1-33fc-45ad-9c49-a67552a837b9@github.com> References: <1nKnthro_DGnukTjVKanK4y3FQwiQyGfJmtwc_Qm3Ik=.08f3cee7-8c4a-4f04-8073-d02e30774600@github.com> <0PpHCCKEdktOLbil22UM8L7Zpqcp82ejR9Zdu6VmmWI=.186fcfb1-33fc-45ad-9c49-a67552a837b9@github.com> Message-ID: On Fri, 23 Oct 2020 10:15:38 GMT, Andrew Haley wrote: >> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge master >> >> Change-Id: I97df4e7686699478f0f89451ec0a3537d38cfd6d >> - Merge master >> >> Change-Id: I5e1715fdb11305191fe7bf86cbfb7a6da446b3dc >> - Remove inlasm_isb define >> >> Change-Id: I2d0ef8a78292dac875f3f65d2253981cdb7a497a >> - AArch64: Add cross modify fence verification >> - AArch64: Use cross_modify_fence instead of maybe_isb >> - Split cross_modify_fence > > src/hotspot/share/runtime/orderAccess.hpp line 237: > >> 235: // to the instruction code preceding the fence is not reordered w.r.t. any >> 236: // memory accesses to instruction code subsequent to the fence in program order. >> 237: // It should be used in conjunction with safepointing to ensure that changes > > This is rather misleading: the AArch64 needs the ISB for the instruction pipeline rather than the cache, which is invalidated by the IC IVAU broadcast. I suspect other processors work in the same way. > The language in the AArch64 spec is better, IMO: > "ensures that all instructions that come after the ISB instruction in program order are fetched from > the cache or memory after the ISB instruction has completed" This better? 
// Finally, we define an "instruction_fence" operation, which ensures that all // instructions that come after the ISB instruction in program order are fetched // from the cache or memory after the ISB instruction has completed // It should be used in conjunction with safepointing to ensure that changes // to the instruction stream are seen on exit from a safepoint. Namely: .....etc ------------- PR: https://git.openjdk.java.net/jdk/pull/428 From mcimadamore at openjdk.java.net Thu Nov 12 17:58:21 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Thu, 12 Nov 2020 17:58:21 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v19] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the first incubation round of the foreign linker access API incubation > (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). > > The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. > > Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reasons, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. > > A big thank to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. 
As usual, a big thank to Paul Sandoz, who provided many insights (often by trying the bits first hand). > > Thanks > Maurizio > > Webrev: > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff (relative to [3]): > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254232 > > > > ### API Changes > > The API changes are actually rather slim: > > * `LibraryLookup` > * This class allows clients to lookup symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then lookup symbols on that library. > * `FunctionDescriptor` > * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. > * `CLinker` > * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. > * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. 
`C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. > * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back. > * `NativeScope` > * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resource* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. > * `MemorySegment` > * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. > > ### Safety > > The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API). > > ### Implementation changes > > The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. same library cannot be loaded by different classloaders).
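As a rough illustration of the `NativeScope` idea from the API list above (one block, with many allocations served out of it instead of one `malloc` per request), here is a small bump-allocator sketch. It is a model only and does not use `jdk.incubator.foreign`: the class `BoundedArena`, its methods, and the use of a direct `ByteBuffer` as the backing block are assumptions made for illustration, corresponding loosely to a scope whose total size is known up front.

```java
import java.nio.ByteBuffer;

/** Toy bounded arena; a model of the idea, NOT jdk.incubator.foreign.NativeScope. */
final class BoundedArena {
    private final ByteBuffer block; // single up-front off-heap allocation
    private int next = 0;           // bump pointer: offset of the first free byte

    BoundedArena(int capacityBytes) {
        this.block = ByteBuffer.allocateDirect(capacityBytes);
    }

    /**
     * Serves an allocation as a slice of the shared block.
     * The alignment must be a power of two and is applied to the offset
     * within the block (an assumption of this sketch), not to the raw address.
     */
    ByteBuffer allocate(int bytes, int alignment) {
        if (bytes < 0 || Integer.bitCount(alignment) != 1) {
            throw new IllegalArgumentException("bad size or alignment");
        }
        long start = ((long) next + alignment - 1) & -(long) alignment; // round up
        if (start + bytes > block.capacity()) {
            throw new IllegalStateException("arena exhausted");
        }
        ByteBuffer view = block.duplicate(); // shares content, independent cursor
        view.position((int) start);
        view.limit((int) start + bytes);
        next = (int) start + bytes;
        return view.slice();                 // cheap: no new native allocation
    }

    /** Number of bytes consumed so far, including alignment padding. */
    int allocatedBytes() {
        return next;
    }
}
```

Each `allocate` call only moves an offset and creates a view, which is the sense in which turning an allocation request into a slice is much cheaper than a separate native allocation.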
> > As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. > > Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. > > The main idea behind the foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function. > > This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. > > For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating-point values are passed in different registers by most ABIs.
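[Editor's illustration] To make the integer-versus-floating-point distinction concrete, here is a much-simplified sketch of how scalar arguments are assigned to locations under the SysV x86-64 calling convention. It is not the JDK's `CallArranger`: the class `SysVSketch`, its `Kind` enum and the `stack+N` labels are hypothetical, and structs, varargs and return values are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical sketch; not the JDK's CallArranger.
final class SysVSketch {
    enum Kind { INTEGER, FLOAT }

    // Argument registers defined by the SysV x86-64 ABI for scalar values.
    private static final String[] INT_REGS = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    private static final String[] VEC_REGS =
            {"xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"};

    /** Returns, for each argument in order, the register or stack slot it travels in. */
    static List<String> assign(Kind... args) {
        List<String> locations = new ArrayList<>();
        int nextInt = 0;
        int nextVec = 0;
        int stackOffset = 0; // byte offset within the outgoing argument area
        for (Kind kind : args) {
            if (kind == Kind.INTEGER && nextInt < INT_REGS.length) {
                locations.add(INT_REGS[nextInt++]);
            } else if (kind == Kind.FLOAT && nextVec < VEC_REGS.length) {
                locations.add(VEC_REGS[nextVec++]);
            } else {
                // Registers of the required class are exhausted: spill to an 8-byte slot.
                locations.add("stack+" + stackOffset);
                stackOffset += 8;
            }
        }
        return locations;
    }

    public static void main(String[] args) {
        // prints [rdi, xmm0, rsi]
        System.out.println(assign(Kind.INTEGER, Kind.FLOAT, Kind.INTEGER));
    }
}
```

Note how a 64-bit integer and a 64-bit double, which have the same size, end up in different register files; size alone is not enough to pick the location.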
For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. > > A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor`, what is the set of bindings associated with the downcall/upcall. > > At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: > > * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them into their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the most "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place.
> > * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). > > * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need for an intermediate buffer. This gives us back performance that is on par with JNI. > > For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). > > Again, for further reading on the internals of the foreign linker support, please refer to [5]. > > #### Test changes > > Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation.
There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. > > Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI. > > [1] - https://openjdk.java.net/jeps/389 > [2] - https://openjdk.java.net/jeps/393 > [3] - https://git.openjdk.java.net/jdk/pull/548 > [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md > [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 78 commits: - Merge branch 'master' into 8254231_linker - Merge pull request #7 from JornVernee/Additional_Review_Comments Additional review comments - Revert System.java changes - Set copyright year for added files to 2020 - Check result of AttachCurrentThread - Sort includes alphabetically - Relax ret_addr_offset() assert - Extra space after if - remove excessive asserts in ProgrammableInvoker::invoke_native - Remove os::is_MP() check - ...
and 68 more: https://git.openjdk.java.net/jdk/compare/3e70aac5...56099dac ------------- Changes: https://git.openjdk.java.net/jdk/pull/634/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=18 Stats: 67452 lines in 213 files changed: 67277 ins; 79 del; 96 mod Patch: https://git.openjdk.java.net/jdk/pull/634.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634 PR: https://git.openjdk.java.net/jdk/pull/634 From mcimadamore at openjdk.java.net Thu Nov 12 18:07:23 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Thu, 12 Nov 2020 18:07:23 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v20] In-Reply-To: References: Message-ID:
Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix whitespaces ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/634/files - new: https://git.openjdk.java.net/jdk/pull/634/files/56099dac..e3d62ee7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=19 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=18-19 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/634.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634 PR: https://git.openjdk.java.net/jdk/pull/634 From mcimadamore at openjdk.java.net Thu Nov 12 18:07:23 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Thu, 12 Nov 2020 18:07:23 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v18] In-Reply-To: References: Message-ID:
<0NZr5QkMAOUHtmbqSuJA8ZUHPZR6D3sO5pQvBneCbf0=.70e86f1c-0217-47b0-bd53-bdb4fa488800@github.com> On Wed, 11 Nov 2020 14:18:33 GMT, Vladimir Ivanov wrote: >> Maurizio Cimadamore has updated the pull request incrementally with 10 additional commits since the last revision: >> >> - Merge pull request #7 from JornVernee/Additional_Review_Comments >> >> Additional review comments >> - Revert System.java changes >> - Set copyright year for added files to 2020 >> - Check result of AttachCurrentThread >> - Sort includes alphabetically >> - Relax ret_addr_offset() assert >> - Extra space after if >> - remove excessive asserts in ProgrammableInvoker::invoke_native >> - Remove os::is_MP() check >> - remove blank line in thread.hpp > > I made a pass over hotspot code. Overall, it looks good. Some comments follow. I've just merged against master - which now contains the foreign memory API changes that this JEP depends on. I believe reviewing the changes should now be easier, as only the relevant changes should be presented in the "File Changed" tab. ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From coleenp at openjdk.java.net Thu Nov 12 19:13:57 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 12 Nov 2020 19:13:57 GMT Subject: RFR: 8256275: Optimized build is broken In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 12:53:24 GMT, Vladimir Ivanov wrote: > Fix optimized build. > > Testing: > - [x] manual build with --with-debug-level=optimized > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 Looks trivial. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1185 From rkennke at openjdk.java.net Thu Nov 12 19:35:07 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 12 Nov 2020 19:35:07 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v14] In-Reply-To: References: Message-ID: <1xZs_DTTaoyM6EJz6wxsGbZ8TQIL7Az0CWcjI2Lfe_M=.df15f501-6ba8-4f2d-89ec-86ec15c5364f@github.com> > In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. > > I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: > - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions > - We can have strong and phantom native referents, and strong and weak in-heap referents > - Native referents are never compressed > > Note that this depends on PR#1140. 
> > Testing: > - [x] hotspot_gc_shenandoah (x86_64, x64_32, aarch64) > - [x] tier1 +UseShenandoahGC +ShenandoahVerify > - [x] tier2 +UseShenandoahGC +ShenandoahVerify Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Remaining aarch64 changes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1109/files - new: https://git.openjdk.java.net/jdk/pull/1109/files/51811e31..eef816c8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=13 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=12-13 Stats: 48 lines in 2 files changed: 18 ins; 20 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/1109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1109/head:pull/1109 PR: https://git.openjdk.java.net/jdk/pull/1109 From rkennke at openjdk.java.net Thu Nov 12 19:44:14 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 12 Nov 2020 19:44:14 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v15] In-Reply-To: References: Message-ID:
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Whitespace fixes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1109/files - new: https://git.openjdk.java.net/jdk/pull/1109/files/eef816c8..03633594 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=14 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=13-14 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1109/head:pull/1109 PR: https://git.openjdk.java.net/jdk/pull/1109 From rkennke at openjdk.java.net Thu Nov 12 20:14:12 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 12 Nov 2020 20:14:12 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v16] In-Reply-To: References: Message-ID:
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Asserts against impossible combinations of weak/phantom vs in-native/in-heap ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1109/files - new: https://git.openjdk.java.net/jdk/pull/1109/files/03633594..c3b1b19a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=15 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1109&range=14-15 Stats: 5 lines in 3 files changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1109.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1109/head:pull/1109 PR: https://git.openjdk.java.net/jdk/pull/1109 From gziemski at openjdk.java.net Thu Nov 12 20:16:57 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Thu, 12 Nov 2020 20:16:57 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v5] In-Reply-To: <8oWwl0HlvsDFV2WW8NO__SuUnMqIzASreFg0bhZbNbo=.6e0bec33-9dd9-4728-9881-b196fafbbad6@github.com> References: <-uNIXq8h8NDGpzaNFVp4loICt2jh0EGev5EneflHsX4=.149b59bc-4ffc-4056-9ac1-82d3f4710ab6@github.com> <5GGhy0pxA33snyHk4-oAP608V87gagZwctV9FlwBTR4=.ea121adc-64bd-47a5-831a-eff9bb984237@github.com> <182nfzh-a6eiBfaCmrFfv-ZDxWbyRgUNBzzYR0huHO0=.b6029a06-1aaf-4ad9-aab1-4d83800daab4@github.com> <8oWwl0HlvsDFV2WW8NO__SuUnMqIzASreFg0bhZbNbo=.6e0bec33-9dd9-4728-9881-b196fafbbad6@github.com> Message-ID: On Tue, 10 Nov 2020 17:14:42 GMT, Thomas Stuefe wrote: >>> > > Forward declaration was a new concept to me, so I had to look it up and although it was easy enough to do it for the internal c++ classes, I struggled with the forward declaring of `struct sigset_t`, but I thought I found it. >>> > > Should I also drop the forward declaration for `outputStream`, `Thread`, `OSThread`?
>>> > >>> > >>> > For hotspot classes, I would leave the forward declarations in and the headers out. Current standard practice. System headers OTOH I would either include here or, somewhat better, in globalDefinitions_gcc.hpp. >>> >>> Ha! It is already in globalDefinitions_gcc.hpp so neither the direct >>> include nor the forward declarations are actually needed. >> >> Many thanks Thomas & David for the lesson on the header files! >> >> If I understand it correctly, then we need to add `#include "utilities/globalDefinitions.hpp"` to signals_posix.hpp, correct? > >> > > > Forward declaration was a new concept to me, so I had to look it up and although it was easy enough to do it for the internal c++ classes, I struggled with the forward declaring of `struct sigset_t`, but I thought I found it. >> > > > Should I also drop the forward declaration for `outputStream`, `Thread`, `OSThread`? >> > > >> > > >> > > For hotspot classes, I would leave the forward declarations in and the headers out. Current standard practice. System headers OTOH I would either include here or, somewhat better, in globalDefinitions_gcc.hpp. >> > >> > >> > Ha! It is already in globalDefinitions_gcc.hpp so neither the direct >> > include nor the forward declarations are actually needed. >> >> Many thanks Thomas & David for the lesson on the header files! >> >> If I understand it correctly, then we need to add `#include "utilities/globalDefinitions.hpp"` to signals_posix.hpp, correct? > > Yes. > _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_ > > On 11/11/2020 3:04 am, Gerard Ziemski wrote: > > > Many thanks Thomas & David for the lesson on the header files! > > If I understand it correctly, then we need to add `#include "utilities/globalDefinitions.hpp"` to signals_posix.hpp, correct? > > It is already there. 
> > #ifndef OS_POSIX_SIGNALS_POSIX_HPP > #define OS_POSIX_SIGNALS_POSIX_HPP > > #include "memory/allocation.hpp" > #include "utilities/globalDefinitions.hpp" > > #include <signal.h> > > So you can just delete the include of signal.h `#include "memory/allocation.hpp"` has: `#include "memory/allStatic.hpp" #include "utilities/globalDefinitions.hpp"` so in the end `"memory/allocation.hpp"` is all we need. ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From dcubed at openjdk.java.net Thu Nov 12 21:06:01 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 12 Nov 2020 21:06:01 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5] In-Reply-To: <6na55oAsYSTSML28qV7bzizCY1BqkmpkNsc1eHEZZ4U=.c58c86f2-b090-4c42-a73e-f36ff2572963@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <05PVdNFn9-u7KHSDPXNSr62kvfFQHKCoyT6Z1TIAC2s=.40e0c36c-e7be-48e4-8352-dbc5ebae25b0@github.com> <6na55oAsYSTSML28qV7bzizCY1BqkmpkNsc1eHEZZ4U=.c58c86f2-b090-4c42-a73e-f36ff2572963@github.com> Message-ID: On Tue, 10 Nov 2020 21:16:20 GMT, Robbin Ehn wrote: >> Sorry my preference is for Monitors instead of semaphores. Let's >> take that discussion off this PR and you can explain why you dislike >> the Monitor so much and think the local semaphore is the way to go.
> > Yes Filed the following new RFE: JDK-8256241 replace MonitorDeflation_lock with a semaphore https://bugs.openjdk.java.net/browse/JDK-8256241 ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Thu Nov 12 21:14:07 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 12 Nov 2020 21:14:07 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5] In-Reply-To: <7rQQpVtTPecL-Vt4JJ0CR3U_1523KksQwFAsn9QpJK8=.5e872ffe-d1e4-40e9-9dd4-595712452e5c@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <05PVdNFn9-u7KHSDPXNSr62kvfFQHKCoyT6Z1TIAC2s=.40e0c36c-e7be-48e4-8352-dbc5ebae25b0@github.com> <6na55oAsYSTSML28qV7bzizCY1BqkmpkNsc1eHEZZ4U=.c58c86f2-b090-4c42-a73e-f36ff2572963@github.com> <7rQQpVtTPecL-Vt4JJ0CR3U_1523KksQwFAsn9QpJK8=.5e872ffe-d1e4-40e9-9dd4-595712452e5c@github.com> Message-ID: On Tue, 10 Nov 2020 21:37:40 GMT, Daniel D. Daugherty wrote: >> If you only need to free CHeap memory, you can do: >> size_t deleted_count = 0; >> ThreadBlockInVM tbivm(self); >> for (ObjectMonitor* monitor: delete_list) { >> delete monitor; >> deleted_count++; >> } >> } > > Ahhh... but that only works if we release the oopStorage when > we deflate. Okay. I grok it now, but don't want to do that in this > changeset. I would want a complete stress test cycle for that > kind of a change. 
Filed this new RFE: JDK-8256302 releasing oopStorage when deflating allows for faster deleting https://bugs.openjdk.java.net/browse/JDK-8256302 ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Thu Nov 12 21:20:00 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 12 Nov 2020 21:20:00 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v5] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <05PVdNFn9-u7KHSDPXNSr62kvfFQHKCoyT6Z1TIAC2s=.40e0c36c-e7be-48e4-8352-dbc5ebae25b0@github.com> Message-ID: On Tue, 10 Nov 2020 20:57:00 GMT, Robbin Ehn wrote: >> We've removed enough padding with this work already. If we >> want to do more padding removal, then we need to use a >> different RFE. > > Sure, this was more a FYI. Filed this new RFE: JDK-8256303 revisit ObjectMonitor padding between fields https://bugs.openjdk.java.net/browse/JDK-8256303 ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Thu Nov 12 21:25:05 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 12 Nov 2020 21:25:05 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> Message-ID: On Tue, 10 Nov 2020 21:08:53 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/runtime/globals.hpp line 750: >> >>> 748: product(intx, MonitorUsedDeflationThreshold, 90, EXPERIMENTAL, \ >>> 749: "Percentage of used monitors before triggering deflation (0 is " \ >>> 750: "off). The check is performed on GuaranteedSafepointInterval " \ >> >> Should there still be experimental options after this change? 
> > Robbin added MonitorUsedDeflationThreshold as an experimental > option back in JDK10. See the longer reply to David's comment. > I don't plan to change that option with this changeset. Filed the following new RFE: JDK-8256304 should MonitorUsedDeflationThreshold be experimental or diagnostic https://bugs.openjdk.java.net/browse/JDK-8256304 >> src/hotspot/share/runtime/objectMonitor.cpp line 509: >> >>> 507: // >>> 508: bool ObjectMonitor::deflate_monitor() { >>> 509: if (is_busy()) { >> >> is_busy should be checked != 0 since it doesn't return a bool. > > Nice catch! That has been there for many, many years... Filed the following new RFE: JDK-8256301 ObjectMonitor::is_busy() should return bool https://bugs.openjdk.java.net/browse/JDK-8256301 ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Thu Nov 12 21:47:06 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 12 Nov 2020 21:47:06 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v6] In-Reply-To: <4xLwmW1Wya1aTmUbGPONAA7V-ScyRsdUK467gNvdmCQ=.5c2329ea-4b45-4732-8abd-e94d1f89ad5f@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <4oqXqtvTejqhnmPrBcVFQ_2y8F6rTuYmgRmaaV2kk0U=.a260df3f-a1b4-4efa-b3eb-7e5558e84327@github.com> <4xLwmW1Wya1aTmUbGPONAA7V-ScyRsdUK467gNvdmCQ=.5c2329ea-4b45-4732-8abd-e94d1f89ad5f@github.com> Message-ID: On Wed, 11 Nov 2020 16:35:50 GMT, Daniel D. Daugherty wrote: >> I typically use size_t for entities that can scale with the size of the machine's memory, so I don't have to think about whether there are enough bits. Could AvgMonitorsPerThreadEstimate be uintx instead of intx? And then maybe we don't need to declare a range, as the natural range of the uintx seems perfectly valid. > > I'm pretty sure I copied the decl for AvgMonitorsPerThreadEstimate > from some other already existing option. That's SOP for me anyway... 
> If we make any more changes here it will have to be in a follow up. Filed the following new RFE: JDK-8256307 cleanup AvgMonitorsPerThreadEstimate and _in_use_list_ceiling types https://bugs.openjdk.java.net/browse/JDK-8256307 ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From dcubed at openjdk.java.net Thu Nov 12 21:47:05 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 12 Nov 2020 21:47:05 GMT Subject: RFR: 8253064: monitor list simplifications and getting rid of TSM [v4] In-Reply-To: <_91UeV37-LcU43acy9VUD7rt3woXrc-jdz-ILC7ichs=.57f26cbb-1ac0-4c9e-ba2e-9b620d60b2c9@github.com> References: <1ejVYakgL_jBrD0qpBMBNGJE3E2sLxKvt0ccOs7aKiA=.eadc65bc-3964-4917-b2c2-d18cf4925ac3@github.com> <3hoc3iLbt_d7hTF-3GIbPTSE7QaJhapR5EjsCH8nRkA=.996c5e96-f760-489c-93ae-b6631dc9446d@github.com> <_8BktJ0NHE_i7LQ7_X8JQyyfriKl8_-PA7hOfdBrC3U=.ac425b59-0e89-47f2-9fa7-e461e55689a2@github.com> <_91UeV37-LcU43acy9VUD7rt3woXrc-jdz-ILC7ichs=.57f26cbb-1ac0-4c9e-ba2e-9b620d60b2c9@github.com> Message-ID: On Tue, 10 Nov 2020 22:28:12 GMT, Coleen Phillimore wrote: >> It can be a future RFE, but it won't be at the top of my list of >> things to do. There may already be an RFE for that. > > No, I assume it's not high priority. I'll file an RFE because someday I want these to be cleaned up as a personal nit. Filed the following new RFE: JDK-8256306 ObjectMonitor::_contentions field should not be 'jint' https://bugs.openjdk.java.net/browse/JDK-8256306 ------------- PR: https://git.openjdk.java.net/jdk/pull/642 From ayang at openjdk.java.net Thu Nov 12 22:04:02 2020 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 12 Nov 2020 22:04:02 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 In-Reply-To: References: Message-ID: On Tue, 10 Nov 2020 10:43:20 GMT, Stefan Johansson wrote: > Please review this change that implements concurrent uncommit for G1. 
> > **Summary** > G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. > > The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). > > Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added. > > One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. > > **Logging** > To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: > [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 > ... > [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms > [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms > ... 
> [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms > > The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. > > On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: > [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) > [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) > [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) > [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) > [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) > [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) > [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) > > **Testing** > Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. > > I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes. The PR description provides a very good summary. Furthermore, it would be nice to have an overview of the algorithm: how the additional concurrency managed, what resources are protected by the lock, etc. Ideally, this should be in the comments, making references to corresponding classes/variables. > I've also done a performance run and as expected there are not significant changes. How about pauses? 
One of the motivations of this PR is to remove pauses introduced by synchronous uncommit, right? src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2668: > 2666: void G1CollectedHeap::uncommit_heap_if_necessary() { > 2667: if (hrm()->has_inactive_regions()) { > 2668: G1UncommitRegionTask::activate(); Since `activate` is already used for regions, I would suggest another word; simple `run` is enough. Additionally, I don't think using an enum is necessary, a plain `bool is_running` should do. enum class TaskState { active, inactive }; TaskState _state; src/hotspot/share/gc/g1/heapRegionManager.cpp line 178: > 176: > 177: void HeapRegionManager::expand(uint start, uint num_regions, WorkGang* pretouch_gang) { > 178: guarantee(num_regions > 0, "No point in calling this for zero regions"); The same `guarantee` is already in `commit_regions`; I think an assert is fine. src/hotspot/share/gc/g1/heapRegionManager.cpp line 189: > 187: } > 188: > 189: if (G1CollectedHeap::heap()->hr_printer()->is_active()) { This `if` is not needed, right? `commit(hr)` does such check already. There are a few other cases of the same kind. src/hotspot/share/gc/g1/heapRegionManager.cpp line 272: > 270: void HeapRegionManager::deactivate_regions(uint start, size_t num_regions) { > 271: guarantee(num_regions >= 1, "Need to specify at least one region to uncommit, tried to uncommit zero regions at %u", start); > 272: guarantee(length() >= num_regions, "pre-condition"); I am not really sure why `guarantee` here but `assert` in `reactivate_regions`. I don't see a strong reason to use `guarantee` here. src/hotspot/share/gc/g1/heapRegionManager.cpp line 264: > 262: assert(num_regions > 0, "No point in calling this for zero regions"); > 263: > 264: clear_auxiliary_data_structures(start, num_regions); IMO, this name is too general. Even after reading its body and the associated comments, I don't get what "data structures" are cleared. 
src/hotspot/share/gc/g1/heapRegionManager.cpp line 330: > 328: // No more regions available for uncommit > 329: if (range.length() == 0) { > 330: return uncommitted; Is `assert(uncommitted > 0)` true here? Why or why not? Please add some comments; I think it's helpful for the readers. src/hotspot/share/gc/g1/heapRegionManager.hpp line 97: > 95: > 96: // Notify other data structures about change in the heap layout. > 97: void update_committed_space(HeapWord* old_end, HeapWord* new_end); `update_committed_space` is not implemented, right? src/hotspot/share/gc/g1/heapRegionManager.hpp line 81: > 79: G1RegionToSpaceMapper* _card_counts_mapper; > 80: > 81: // Map to keep track of which regions are in use. The name of the variable suggests it's tracking what regions are committed and that's all. After reading `G1CommittedRegionMap`, I believe the class name and the var name are quite misleading; it tracks the state of mem backing up all regions, `committed+mapped`, `committed`, `to_be_uncommitted`, `committed`. I don't have any good alternatives, but the comments could surely be expanded. src/hotspot/share/gc/g1/heapRegionManager.hpp line 137: > 135: void deactivate_regions(uint start, size_t num_regions); > 136: void reactivate_regions(uint start, uint num_regions); > 137: void uncommit_regions(uint start, uint num_regions); Why `uint` vs `size_t`? Why `= 1` for some but not others? If such inconsistency is intentional, it should be documented. src/hotspot/share/gc/g1/g1UncommitRegionTask.cpp line 107: > 105: > 106: // Each execution is limited to uncommit at most 256M worth of regions. > 107: static const uint region_limit = (uint) (256 * M / G1HeapRegionSize); Why 256M? Better include some motivation in the comments. 
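On the 256M question: the trace lines quoted in the PR description report `256M, 32 regions` per invocation, which is consistent with an 8M region size in that run. A minimal sketch of that arithmetic follows, assuming the per-invocation cap is simply 256M divided by the region size, as in the quoted `region_limit` line. The names `kChunkBytes`, `region_limit` (as a function) and `invocations_needed` are hypothetical helpers for illustration, not code from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants for illustration only. In the patch the region
// size comes from G1HeapRegionSize, which is fixed at VM startup.
constexpr std::size_t M = 1024 * 1024;
constexpr std::size_t kChunkBytes = 256 * M;  // per-invocation cap quoted in the review

// Number of regions a single task invocation may uncommit.
constexpr std::size_t region_limit(std::size_t region_size_bytes) {
  return kChunkBytes / region_size_bytes;
}

// Number of task invocations needed to drain `inactive_regions` when each
// invocation uncommits at most region_limit() regions (ceiling division).
inline std::size_t invocations_needed(std::size_t inactive_regions,
                                      std::size_t region_size_bytes) {
  const std::size_t limit = region_limit(region_size_bytes);
  assert(limit > 0 && "region size must not exceed the chunk size");
  return (inactive_regions + limit - 1) / limit;
}
```

With 8M regions this gives 32 regions per invocation, and draining the 608 regions from the quoted summary line (4864M) would take 19 invocations. It does not answer why 256M was picked, only how the cap maps onto regions.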
src/hotspot/share/gc/g1/g1CommittedRegionMap.hpp line 67: > 65: void active_clear_range(uint start, uint end); > 66: void inactive_set_range(uint start, uint end); > 67: void inactive_clear_range(uint start, uint end); After reading their implementation, I see that `Uncommit_lock` must be held when calling them. I think it's best to mention this precondition in the comments, and explain why this lock is needed. ------------- Changes requested by ayang (Author). PR: https://git.openjdk.java.net/jdk/pull/1141 From david.holmes at oracle.com Fri Nov 13 04:02:19 2020 From: david.holmes at oracle.com (David Holmes) Date: Fri, 13 Nov 2020 14:02:19 +1000 Subject: RFR: 8253742: POSIX signal code cleanup [v5] In-Reply-To: References: <-uNIXq8h8NDGpzaNFVp4loICt2jh0EGev5EneflHsX4=.149b59bc-4ffc-4056-9ac1-82d3f4710ab6@github.com> <5GGhy0pxA33snyHk4-oAP608V87gagZwctV9FlwBTR4=.ea121adc-64bd-47a5-831a-eff9bb984237@github.com> <182nfzh-a6eiBfaCmrFfv-ZDxWbyRgUNBzzYR0huHO0=.b6029a06-1aaf-4ad9-aab1-4d83800daab4@github.com> <8oWwl0HlvsDFV2WW8NO__SuUnMqIzASreFg0bhZbNbo=.6e0bec33-9dd9-4728-9881-b196fafbbad6@github.com> Message-ID: <672167a3-e17a-794d-027b-affeb05d882f@oracle.com> On 13/11/2020 6:16 am, Gerard Ziemski wrote: >>> If I understand it correctly, then we need to add `#include "utilities/globalDefinitions.hpp"` to signals_posix.hpp, correct? >> >> It is already there. >> >> #ifndef OS_POSIX_SIGNALS_POSIX_HPP >> #define OS_POSIX_SIGNALS_POSIX_HPP >> >> #include "memory/allocation.hpp" >> #include "utilities/globalDefinitions.hpp" >> >> #include <signal.h> >> >> So you can just delete the include of signal.h > > `#include "memory/allocation.hpp"` has: > > `#include "memory/allStatic.hpp" > #include "utilities/globalDefinitions.hpp"` > > so in the end `"memory/allocation.hpp"` is all we need. Yes but we don't like to rely on extreme levels of indirection as they are completely obscure.
globalDefinitions.hpp is a known source of a whole bunch of includes (which a number of folk find objectionable in itself) so it is okay to get an indirect include through it. In all seriousness I would just leave the existing includes alone. If it ain't broke ... and this is wasting far too many cycles for everyone. Thanks, David ----- > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/636 > From jbhateja at openjdk.java.net Fri Nov 13 05:31:10 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 13 Nov 2020 05:31:10 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. > 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. > 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. > 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
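Points 1) to 3) above amount to a length check at the call site. A minimal sketch of that dispatch in portable C++ follows, with the masked vector move emulated by a scalar loop so that it runs without AVX-512 hardware. It is not the generated code: `copy_bytes`, `masked_copy`, `lane_mask` and `arraycopy_stub` are hypothetical names, the threshold mirrors the default `ArrayCopyPartialInlineSize` of 32, and treating that limit as inclusive is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kVectorBytes = 64;        // one 512-bit vector, one mask bit per byte lane
constexpr std::size_t kPartialInlineSize = 32;  // mirrors the default flag value (assumed inclusive)

enum class CopyPath { Inlined, Stub };

// Mask with the low `n` bits set, i.e. the first `n` byte lanes enabled.
inline std::uint64_t lane_mask(std::size_t n) {
  assert(n <= kVectorBytes);
  return n == kVectorBytes ? ~std::uint64_t{0} : (std::uint64_t{1} << n) - 1;
}

// Scalar emulation of a masked vector load followed by a masked vector store.
// Lanes whose mask bit is clear are neither read from src nor written to dst.
inline void masked_copy(std::uint8_t* dst, const std::uint8_t* src, std::uint64_t mask) {
  std::uint8_t vec[kVectorBytes] = {0};
  for (std::size_t lane = 0; lane < kVectorBytes; ++lane) {
    if ((mask >> lane) & 1u) vec[lane] = src[lane];  // masked load
  }
  for (std::size_t lane = 0; lane < kVectorBytes; ++lane) {
    if ((mask >> lane) & 1u) dst[lane] = vec[lane];  // masked store
  }
}

// Stand-in for the out-of-line array-copy stub.
inline void arraycopy_stub(std::uint8_t* dst, const std::uint8_t* src, std::size_t len) {
  std::memmove(dst, src, len);
}

// Runtime length check: short copies take the inlined masked sequence,
// everything else pays the call to the stub.
inline CopyPath copy_bytes(std::uint8_t* dst, const std::uint8_t* src, std::size_t len) {
  if (len <= kPartialInlineSize) {
    masked_copy(dst, src, lane_mask(len));
    return CopyPath::Inlined;
  }
  arraycopy_stub(dst, src, len);
  return CopyPath::Stub;
}
```

Under these assumptions a 20-byte copy stays on the inlined path while a 70-byte copy goes to the stub, which matches where the gains start and stop in the `testByte` rows of the results.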
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Review comments resolved - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - Merge remote-tracking branch 'upstream' into JDK-8252848 - JDK-8252848 : Review comments resolved - JDK-8252848: Review comments resolution. - JDK-8252848: Review comments addressed. - Merge remote-tracking branch 'origin' into JDK-8252848 - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 - ... and 6 more: https://git.openjdk.java.net/jdk/compare/1d3d64f3...edb74db3 ------------- Changes: https://git.openjdk.java.net/jdk/pull/302/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=13 Stats: 532 lines in 25 files changed: 483 ins; 23 del; 26 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Fri Nov 13 05:39:59 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 13 Nov 2020 05:39:59 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v11] In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 16:09:20 GMT, Nils Eliasson wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - JDK-8252848: Review comments addressed. 
>> - Merge remote-tracking branch 'origin' into JDK-8252848 >> - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - JDK-8252848 : Review comments resolution. >> - Merge remote-tracking branch 'upstream' into JDK-8252848 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Replacing explicit type checks with existing type checking routines >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - ... and 1 more: https://git.openjdk.java.net/jdk/compare/4031cb41...9e85592a > > Do you have any tests that exercise the different possible versions? > - dynamic length with both small and long copies > - dynamic length that can be proven always less than PartialInliningSize > - constant size less than PartialInliningSize > > Except for these minor comments, and the tests, I am ready to approve. Hi @neliasso I have resolved your outstanding review comments, it will be helpful if you can regress the patch through your test infrastructure once. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From shade at openjdk.java.net Fri Nov 13 07:59:56 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 13 Nov 2020 07:59:56 GMT Subject: RFR: 8256011: Shenandoah: Don't resurrect finalizably reachable objects [v16] In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 20:14:12 GMT, Roman Kennke wrote: >> In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. 
>> >> I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: >> - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions >> - We can have strong and phantom native referents, and strong and weak in-heap referents >> - Native referents are never compressed >> >> Note that this depends on PR#1140. >> >> Testing: >> - [x] hotspot_gc_shenandoah (x86_64, x64_32, aarch64) >> - [x] tier1 +UseShenandoahGC +ShenandoahVerify >> - [x] tier2 +UseShenandoahGC +ShenandoahVerify > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Asserts against impossible combinations of weak/phantom vs in-native/in-heap Okay, looks good. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1109 From shade at openjdk.java.net Fri Nov 13 08:23:01 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 13 Nov 2020 08:23:01 GMT Subject: RFR: 8253525: Implement getInstanceSize/sizeOf intrinsics [v5] In-Reply-To: References: Message-ID: On Wed, 21 Oct 2020 17:33:27 GMT, Vladimir Kozlov wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - The new intrinsic-related test >> - Revert the change to test >> - Merge branch 'master' into JDK-8253525-sizeof-intrinsics >> - Add new intrinsics to toBeInvestigated list in CheckGraalIntrinsics.java >> - 8253525: Implement getInstanceSize/sizeOf intrinsics > > Good. Thanks @vnkozlov and @sspitsyn! 
-------------

PR: https://git.openjdk.java.net/jdk/pull/650

From shade at openjdk.java.net Fri Nov 13 08:23:03 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Fri, 13 Nov 2020 08:23:03 GMT
Subject: Integrated: 8253525: Implement getInstanceSize/sizeOf intrinsics
In-Reply-To: References: Message-ID:

On Wed, 14 Oct 2020 10:11:23 GMT, Aleksey Shipilev wrote:

> This is a fork off the SizeOf JEP, JDK-8249196. There is already an entry point in the JDK that can use the intrinsic: `Instrumentation.getInstanceSize`. Therefore, we can implement the C1/C2 intrinsic now, hook it up to `Instrumentation`, and let the tools use that fast path today.
>
> With this patch, JOL is able to get close to the `deepSizeOf` implementation from the SizeOf JEP.
>
> Example performance improvements for sizing up a custom linked list:
>
> Benchmark                     (size)  Mode  Cnt       Score      Error  Units
>
> # Default
> LinkedChainBench.linkedChain       1  avgt    5     705.835 ±    8.051  ns/op
> LinkedChainBench.linkedChain      10  avgt    5    3148.874 ±   37.856  ns/op
> LinkedChainBench.linkedChain     100  avgt    5   28693.256 ±  142.254  ns/op
> LinkedChainBench.linkedChain    1000  avgt    5  290161.590 ± 4594.631  ns/op
>
> # Instrumentation attached, no intrinsics
> LinkedChainBench.linkedChain       1  avgt    5     159.659 ±   19.238  ns/op
> LinkedChainBench.linkedChain      10  avgt    5     717.659 ±   22.540  ns/op
> LinkedChainBench.linkedChain     100  avgt    5    7739.394 ±  111.683  ns/op
> LinkedChainBench.linkedChain    1000  avgt    5   80724.238 ± 2887.794  ns/op
>
> # Instrumentation attached, new intrinsics
> LinkedChainBench.linkedChain       1  avgt    5      95.254 ±    0.808  ns/op
> LinkedChainBench.linkedChain      10  avgt    5     261.564 ±    8.524  ns/op
> LinkedChainBench.linkedChain     100  avgt    5    3367.192 ±   21.128  ns/op
> LinkedChainBench.linkedChain    1000  avgt    5   34148.851 ±  373.080  ns/op

This pull request has now been integrated.
Changeset: b4d01867 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/b4d01867 Stats: 655 lines in 12 files changed: 655 ins; 0 del; 0 mod 8253525: Implement getInstanceSize/sizeOf intrinsics Reviewed-by: kvn, sspitsyn ------------- PR: https://git.openjdk.java.net/jdk/pull/650 From sjohanss at openjdk.java.net Fri Nov 13 08:27:39 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 13 Nov 2020 08:27:39 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v2] In-Reply-To: References: Message-ID: > Please review this change that implements concurrent uncommit for G1. > > **Summary** > G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. > > The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). > > Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. 
To prevent this a new lock `Uncommit_lock` is added. > > One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. > > **Logging** > To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: > [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 > ... > [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms > [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms > ... > [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms > > The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. > > On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: > [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) > [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) > [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) > [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) > [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) > [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) > [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) > > **Testing** > Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. 
> > I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes. Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Lock for small mapper and use BitMap parallel operations. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1141/files - new: https://git.openjdk.java.net/jdk/pull/1141/files/3f09c5eb..54e16ca9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=00-01 Stats: 200 lines in 4 files changed: 172 ins; 14 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/1141.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1141/head:pull/1141 PR: https://git.openjdk.java.net/jdk/pull/1141 From sjohanss at openjdk.java.net Fri Nov 13 08:35:59 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 13 Nov 2020 08:35:59 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v2] In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 22:00:40 GMT, Albert Mingkun Yang wrote: >> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: >> >> Lock for small mapper and use BitMap parallel operations. > > The PR description provides a very good summary. Furthermore, it would be nice to have an overview of the algorithm: how the additional concurrency managed, what resources are protected by the lock, etc. Ideally, this should be in the comments, making references to corresponding classes/variables. > >> I've also done a performance run and as expected there are not significant changes. > > How about pauses? One of the motivations of this PR is to remove pauses introduced by synchronous uncommit, right? Thanks @albertnetymk for your comments, this update does not include any updates for those. 
This instead fixes a race in the low level committing code for G1 where it was assumed only one thread can commit and uncommit regions at a time. This is no longer the case after this change. It is true for a single region, but adjacent regions are allowed to be committed/uncommitted in parallel. This can for example happen if a humongous allocation happens during concurrent uncommit. The fix is to use parallel versions of the bitmap operations and for the `G1RegionsSmallerThanCommitSizeMapper` add a lock to prevent parallel updates for this mapper. This is needed because multiple regions can share a single underlying OS page, so we need to make sure those updates are atomic on a page level. ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From shade at openjdk.java.net Fri Nov 13 08:46:59 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 13 Nov 2020 08:46:59 GMT Subject: RFR: JDK-8256258: some missing NULL checks or asserts after CodeCache::find_blob_unsafe [v2] In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 11:21:14 GMT, Matthias Baesken wrote: >> Looks good to me. Are these cases only exist in AIX/PPC code? > >>Looks good to me. Are these cases only exist in AIX/PPC code? > > I found find_blob_unsafe calls with unchecked/asserted return val only in ppc and ppc/AIX code. > > ( Regarding find_blob there are a few that are checked and a few without a NULL check/assert cross platforms, this might be intentional) This still looks fine, if you happen to wait for me. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1181 From aph at redhat.com Fri Nov 13 09:07:04 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 13 Nov 2020 09:07:04 +0000 Subject: RFR: 8221554: aarch64 cross-modifying code [v3] In-Reply-To: References: Message-ID: <31a6fb8f-7ef1-ebe8-0141-6fd9ab5d9015@redhat.com> On 11/12/20 5:28 PM, Alan Hayward wrote: > On Fri, 23 Oct 2020 10:17:09 GMT, Andrew Haley wrote: > >>> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5317: >>> >>>> 5315: #endif >>>> 5316: } >>>> 5317: >>> >>> Unless VerifyCrossModifyFence is turned on in debug builds it will almost never be used. Please turn this on by default in AArch64 debug builds. >> >> Please... > > Aha - Looks like your comments hadn't been made public until now. "Hadn't been made public?" There's a way to comment without the owner of the PR being able to see the comments? > The problem is it massively slows down a run. A tier1 test run for fastdebug went from 1h 32m 58s to > 3h 43m 47s. I didn't think that would be acceptable. But why is it so expensive? All it does is mark the threads at a safepoint and later check the mark at safepoints. It's not as if it's doing anything much, but you're telling me it is more expensive than everything else put together. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Fri Nov 13 09:09:00 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 13 Nov 2020 09:09:00 +0000 Subject: RFR: 8221554: aarch64 cross-modifying code [v6] In-Reply-To: References: <1nKnthro_DGnukTjVKanK4y3FQwiQyGfJmtwc_Qm3Ik=.08f3cee7-8c4a-4f04-8073-d02e30774600@github.com> <0PpHCCKEdktOLbil22UM8L7Zpqcp82ejR9Zdu6VmmWI=.186fcfb1-33fc-45ad-9c49-a67552a837b9@github.com> Message-ID: <76f601c4-c2c5-18e4-0054-5890adf7aa99@redhat.com> On 11/12/20 5:28 PM, Alan Hayward wrote: > On Wed, 11 Nov 2020 15:04:32 GMT, Andrew Haley wrote: > >>> src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1413: >>> >>>> 1411: __ blr(rscratch2); >>>> 1412: // An instruction sync is required here after the call into the VM. However, >>>> 1413: // that will have been caught in the VM by a cross_modify_fence call. >>> >>> I think this wording is confusing; I had to read it several times. >>> >>> Would not something like >>> ''' >>> // When we return from the VM, the instruction stream may have >>> // been modified. Previously we emitted an ISB at this point, but >>> // it's now unnecessary because the VM itself calls cross_modify_fence() >>> ''' >>> >>> be better? >> >> This is still pending. > > I wanted to avoid mentioning code that no longer exists. (Maybe it's best to just drop the comment?) The comment only makes sense in the context of the code that was there before. > How about: > > // When we return from the VM, the instruction stream may have > // been modified. Therefore needs an isb is required. The VM will > // have already done this by calling cross_modify_fence(). This is self-contradictory: first you say an ISB is required, then you say why it isn't. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at openjdk.java.net Fri Nov 13 09:14:59 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 13 Nov 2020 09:14:59 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v6] In-Reply-To: References: <1nKnthro_DGnukTjVKanK4y3FQwiQyGfJmtwc_Qm3Ik=.08f3cee7-8c4a-4f04-8073-d02e30774600@github.com> <0PpHCCKEdktOLbil22UM8L7Zpqcp82ejR9Zdu6VmmWI=.186fcfb1-33fc-45ad-9c49-a67552a837b9@github.com> Message-ID: On Thu, 12 Nov 2020 17:25:41 GMT, Alan Hayward wrote: >> src/hotspot/share/runtime/orderAccess.hpp line 237: >> >>> 235: // to the instruction code preceding the fence is not reordered w.r.t. any >>> 236: // memory accesses to instruction code subsequent to the fence in program order. >>> 237: // It should be used in conjunction with safepointing to ensure that changes >> >> This is rather misleading: the AArch64 needs the ISB for the instruction pipeline rather than the cache, which is invalidated by the IC IVAU broadcast. I suspect other processors work in the same way. >> The language in the AArch64 spec is better, IMO: >> "ensures that all instructions that come after the ISB instruction in program order are fetched from >> the cache or memory after the ISB instruction has completed" > > This better? > > // Finally, we define an "instruction_fence" operation, which ensures that all > // instructions that come after the ISB instruction in program order are fetched > // from the cache or memory after the ISB instruction has completed > // It should be used in conjunction with safepointing to ensure that changes > // to the instruction stream are seen on exit from a safepoint. 
Namely: > .....etc // Finally, we define an "instruction_fence" operation, which ensures that all // instructions that come after the fence in program order are fetched // from the cache or memory after the fence has completed ------------- PR: https://git.openjdk.java.net/jdk/pull/428 From neliasso at openjdk.java.net Fri Nov 13 09:42:57 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 13 Nov 2020 09:42:57 GMT Subject: Integrated: 8255964: Add all details to jstack log in jtreg timeout handler In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 17:09:58 GMT, Nils Eliasson wrote: > This patch adds jcmd Thread.print to the jtreg timeout handler. > > Please review. This pull request has now been integrated. Changeset: 41139e31 Author: Nils Eliasson URL: https://git.openjdk.java.net/jdk/commit/41139e31 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8255964: Add all details to jstack log in jtreg timeout handler Reviewed-by: iignatyev ------------- PR: https://git.openjdk.java.net/jdk/pull/1080 From sjohanss at openjdk.java.net Fri Nov 13 09:44:06 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 13 Nov 2020 09:44:06 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v3] In-Reply-To: References: Message-ID: > Please review this change that implements concurrent uncommit for G1. > > **Summary** > G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. > > The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. 
Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). > > Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added. > > One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. > > **Logging** > To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: > [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 > ... > [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms > [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms > ... > [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms > > The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. 
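The active/inactive bookkeeping described in the summary above can be modelled in a few lines of Java. This is only a toy sketch, not the HotSpot code: the class name `RegionStateSketch`, its methods, and the single `uncommitLock` object (standing in for `Uncommit_lock`) are all assumptions made for illustration, and no memory is actually committed or uncommitted.

```java
import java.util.BitSet;

/** Toy model of the two-bitmap scheme; every name here is hypothetical. */
class RegionStateSketch {
    private final BitSet active = new BitSet();    // regions available for use
    private final BitSet inactive = new BitSet();  // regions ready to be uncommitted
    private final Object uncommitLock = new Object(); // stands in for Uncommit_lock

    /** Pause-time preparation: only flip bits, do not touch the OS. */
    void deactivate(int region) {
        synchronized (uncommitLock) {
            active.clear(region);
            inactive.set(region);
        }
    }

    /**
     * Heap expansion: prefer re-activating an inactive region over
     * committing a new one. Returns true if a region was re-activated.
     */
    boolean expand(int newRegion) {
        synchronized (uncommitLock) {
            int candidate = inactive.nextSetBit(0);
            if (candidate >= 0) {
                inactive.clear(candidate);
                active.set(candidate);
                return true;
            }
            active.set(newRegion); // a real VM would commit memory here
            return false;
        }
    }

    /** Concurrent task: uncommit at most `chunk` inactive regions per invocation. */
    int uncommitChunk(int chunk) {
        synchronized (uncommitLock) {
            int done = 0;
            for (int r = inactive.nextSetBit(0); r >= 0 && done < chunk;
                 r = inactive.nextSetBit(r + 1)) {
                inactive.clear(r); // a real VM would hand the memory back to the OS here
                done++;
            }
            return done;
        }
    }

    /** Committed regions are the union of the two bitmaps. */
    int committedCount() {
        synchronized (uncommitLock) {
            BitSet union = (BitSet) active.clone();
            union.or(inactive);
            return union.cardinality();
        }
    }
}
```

The sketch also shows why the summary count can be lower than "Regions ready for uncommit": a region that `expand` re-activates before the task reaches it is never uncommitted.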
> > On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: > [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) > [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) > [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) > [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) > [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) > [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) > [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) > > **Testing** > Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. > > I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes. Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Merge branch 'master' into 8236926-ccu - Lock for small mapper and use BitMap parallel operations. - Self review - Simplified task - Improved logging - Test improvement - Uncommit task - Move HeapRegionRange constructor - Stress Uncommit - Feedback from dev-meeting - ... 
and 1 more: https://git.openjdk.java.net/jdk/compare/b4d01867...0a3ba091 ------------- Changes: https://git.openjdk.java.net/jdk/pull/1141/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=02 Stats: 1457 lines in 26 files changed: 1294 ins; 99 del; 64 mod Patch: https://git.openjdk.java.net/jdk/pull/1141.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1141/head:pull/1141 PR: https://git.openjdk.java.net/jdk/pull/1141 From rkennke at openjdk.java.net Fri Nov 13 09:49:56 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 13 Nov 2020 09:49:56 GMT Subject: Integrated: 8256011: Shenandoah: Don't resurrect finalizably reachable objects In-Reply-To: References: Message-ID: On Sat, 7 Nov 2020 20:10:14 GMT, Roman Kennke wrote: > In the weak-LRB we currently return referents when it is 'marked', that is when it's either reachable strongly or through a finalizable object. This means a finalizable object can be resurrected by Reference.get(), which is wrong. Only truly strongly reachable objects should be returned by Reference.get() during weak-reference-processing. > > I had to reconsider the way we call into runtime-LRBs from generated code for these reasons: > - We need to distinguish phantom, weak and strong reference strength, and native vs in-heap access. Those are two orthogonal dimensions > - We can have strong and phantom native referents, and strong and weak in-heap referents > - Native referents are never compressed > > Note that this depends on PR#1140. > > Testing: > - [x] hotspot_gc_shenandoah (x86_64, x64_32, aarch64) > - [x] tier1 +UseShenandoahGC +ShenandoahVerify > - [x] tier2 +UseShenandoahGC +ShenandoahVerify This pull request has now been integrated. 
Changeset: b0c28fad Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/b0c28fad Stats: 390 lines in 16 files changed: 107 ins; 87 del; 196 mod 8256011: Shenandoah: Don't resurrect finalizably reachable objects Reviewed-by: shade, zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/1109 From vlivanov at openjdk.java.net Fri Nov 13 11:10:56 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 13 Nov 2020 11:10:56 GMT Subject: RFR: 8256275: Optimized build is broken In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 19:11:30 GMT, Coleen Phillimore wrote: >> Fix optimized build. >> >> Testing: >> - [x] manual build with --with-debug-level=optimized >> - [x] hs-precheckin-comp, hs-tier1, hs-tier2 > > Looks trivial. Thanks for the reviews, Claes and Coleen. ------------- PR: https://git.openjdk.java.net/jdk/pull/1185 From vlivanov at openjdk.java.net Fri Nov 13 11:10:59 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 13 Nov 2020 11:10:59 GMT Subject: Integrated: 8256275: Optimized build is broken In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 12:53:24 GMT, Vladimir Ivanov wrote: > Fix optimized build. > > Testing: > - [x] manual build with --with-debug-level=optimized > - [x] hs-precheckin-comp, hs-tier1, hs-tier2 This pull request has now been integrated. 
Changeset: 8c31bd29 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/8c31bd29 Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod 8256275: Optimized build is broken Reviewed-by: redestad, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/1185 From mcimadamore at openjdk.java.net Fri Nov 13 11:38:24 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Fri, 13 Nov 2020 11:38:24 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v21] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the first incubation round of the foreign linker access API > (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). > > The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. > > Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. > > A big thanks to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thanks to Paul Sandoz, who provided many insights (often by trying the bits first hand).
> > Thanks > Maurizio > > Webrev: > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff (relative to [3]): > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254232 > > > > ### API Changes > > The API changes are actually rather slim: > > * `LibraryLookup` > * This class allows clients to look up symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then look up symbols on that library. > * `FunctionDescriptor` > * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. > * `CLinker` > * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. > * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. > * Finally, this class provides some helper functions e.g.
so that clients can convert Java strings into C strings and back. > * `NativeScope` > * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resource* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. > * `MemorySegment` > * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. > > ### Safety > > The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API). > > ### Implementation changes > > The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. same library cannot be loaded by different classloaders). > > As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation.
`NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. > > Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. > > The main idea behind the foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function. > > This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. > > For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating points are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly.
> > A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall. > > At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: > > * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them in their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the more "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. > > * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). > > * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter).
The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI. > > For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). > > Again, for more reading on the internals of the foreign linker support, please refer to [5]. > > #### Test changes > > Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. > > Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI.
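The idea of turning an allocation request into a slice of one shared allocation can be illustrated with plain `java.nio`. This is a minimal sketch under stated assumptions, not the `NativeScope` implementation: `SliceArenaSketch` and its methods are hypothetical names, a direct `ByteBuffer` stands in for the underlying memory segment, and only the bounded variant (size known up front) is modelled.

```java
import java.nio.ByteBuffer;

/** Toy bump allocator: one backing allocation, many slices. Hypothetical names throughout. */
final class SliceArenaSketch {
    private final ByteBuffer backing; // stands in for the shared memory segment
    private int offset = 0;           // next free byte in the backing buffer

    SliceArenaSketch(int capacity) {
        // The only "real" allocation; every later request is just a slice of it.
        this.backing = ByteBuffer.allocateDirect(capacity);
    }

    /** Returns a slice of `bytes` bytes; `alignment` must be a power of two. */
    ByteBuffer allocate(int bytes, int alignment) {
        int start = (offset + alignment - 1) & -alignment; // round up to the alignment
        if (start + bytes > backing.capacity()) {
            throw new IllegalStateException("arena exhausted");
        }
        ByteBuffer view = backing.duplicate(); // independent position/limit, same memory
        view.limit(start + bytes);
        view.position(start);
        offset = start + bytes;
        return view.slice();
    }

    /** Number of bytes consumed so far, including alignment padding. */
    int used() {
        return offset;
    }
}
```

Each `allocate` call costs a bounds check and a few integer operations rather than a separate native allocation, which is the performance argument made for `NativeScope` above.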
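As a rough illustration of the basic interpreted mode described above, the sketch below runs a per-argument list of bindings on an operand stack and lets a `Move` step store the lowered value into a buffer slot. Everything in it is hypothetical: the `Binding` interface, the `WidenInt` step, the use of a `long[]` as the buffer, and the slot numbers are assumptions for illustration only and do not reflect the actual `Binding` or `BindingInterpreter` classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Toy stack-based binding interpreter; not the JDK implementation. */
final class BindingInterpreterSketch {
    /** Hypothetical binding: one primitive step applied to the operand stack. */
    interface Binding {
        void interpret(Deque<Object> stack, long[] buffer);
    }

    /** Hypothetical step: widen the boxed int on top of the stack to a long. */
    static final class WidenInt implements Binding {
        public void interpret(Deque<Object> stack, long[] buffer) {
            long widened = (Integer) stack.pop();
            stack.push(widened);
        }
    }

    /** Hypothetical Move: pop a long and store it in the slot standing in for a register. */
    static final class Move implements Binding {
        private final int slot;

        Move(int slot) {
            this.slot = slot;
        }

        public void interpret(Deque<Object> stack, long[] buffer) {
            buffer[slot] = (Long) stack.pop();
        }
    }

    /** Lowers each argument by running its bindings; returns the filled buffer. */
    static long[] lower(Object[] args, List<List<Binding>> recipe, int slots) {
        long[] buffer = new long[slots]; // allocated per call, as in the slow path
        Deque<Object> stack = new ArrayDeque<>();
        for (int i = 0; i < args.length; i++) {
            stack.push(args[i]);
            for (Binding b : recipe.get(i)) {
                b.interpret(stack, buffer);
            }
        }
        return buffer;
    }
}
```

In the description above, the filled buffer is then handed to assembly code in the VM; the sketch stops at the buffer and makes no attempt to model that step or the intrinsified mode.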
> > [1] - https://openjdk.java.net/jeps/389 > [2] - https://openjdk.java.net/jeps/393 > [3] - https://git.openjdk.java.net/jdk/pull/548 > [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md > [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html Maurizio Cimadamore has updated the pull request incrementally with three additional commits since the last revision: - Fix crashes on aarch64 due to lack of intrinsics support - Fix high arity test for aarch64 - Fix build failure with disabled precompiled headers ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/634/files - new: https://git.openjdk.java.net/jdk/pull/634/files/e3d62ee7..15ab3647 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=20 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=19-20 Stats: 57 lines in 7 files changed: 52 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/634.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634 PR: https://git.openjdk.java.net/jdk/pull/634 From mcimadamore at openjdk.java.net Fri Nov 13 11:44:17 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Fri, 13 Nov 2020 11:44:17 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v22] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the first incubation round of the foreign linker access API incubation > (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). > > The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. 
I suggest reading the writeup [4] I put together few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. > > Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reasons, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. > > A big thank to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank to Paul Sandoz, who provided many insights (often by trying the bits first hand). > > Thanks > Maurizio > > Webrev: > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff (relative to [3]): > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254232 > > > > ### API Changes > > The API changes are actually rather slim: > > * `LibraryLookup` > * This class allows clients to lookup symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then lookup symbols on that library. > * `FunctionDescriptor` > * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. > * `CLinker` > * This is the real star of the show. 
A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. > * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. > * Finally, this class provides some helper functions, e.g. so that clients can convert Java strings into C strings and back. > * `NativeScope` > * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resources* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. > * `MemorySegment` > * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. > > ### Safety > > The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches.
For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API). > > ### Implementation changes > > The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. same library cannot be loaded by different classloaders). > > As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. > > Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. > > The main idea behind the foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function.
> > This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. > > For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating-point values are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. > > A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor`, the set of bindings associated with the downcall/upcall. > > At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: > > * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings.
For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them in their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the most "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. > > * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). > > * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This brings performance back on par with JNI. > > For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). > > Again, for further reading on the internals of the foreign linker support, please refer to [5].
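To make the classification step described above more concrete, here is a toy sketch of how integral and floating-point arguments end up in different registers under the System V x86-64 convention. It is an illustration only, not the JDK's `CallArranger`: the class `ToyCallArranger`, the `Kind` enum (a stand-in for the classification attribute carried by the layouts) and the string form of the "moves" are all hypothetical, and struct passing, varargs and return values are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only: hypothetical names, not the JDK's CallArranger. */
public class ToyCallArranger {
    /** Hypothetical stand-in for the classification attribute attached to a layout. */
    public enum Kind { INTEGER, FLOAT }

    // System V x86-64 argument registers, in assignment order.
    static final String[] INT_REGS = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    static final String[] VEC_REGS = {"xmm0", "xmm1", "xmm2", "xmm3",
                                      "xmm4", "xmm5", "xmm6", "xmm7"};

    /** Returns one "move" description per argument, e.g. "arg0 -> rdi". */
    public static List<String> arrange(Kind... args) {
        List<String> moves = new ArrayList<>();
        int nextInt = 0, nextVec = 0, nextStackSlot = 0;
        for (int i = 0; i < args.length; i++) {
            String target;
            if (args[i] == Kind.INTEGER && nextInt < INT_REGS.length) {
                target = INT_REGS[nextInt++];          // integral value: general-purpose register
            } else if (args[i] == Kind.FLOAT && nextVec < VEC_REGS.length) {
                target = VEC_REGS[nextVec++];          // floating-point value: vector register
            } else {
                target = "stack+" + (8 * nextStackSlot++); // registers exhausted: 8-byte stack slot
            }
            moves.add("arg" + i + " -> " + target);
        }
        return moves;
    }

    public static void main(String[] args) {
        // prints [arg0 -> rdi, arg1 -> xmm0, arg2 -> rsi]
        System.out.println(arrange(Kind.INTEGER, Kind.FLOAT, Kind.INTEGER));
    }
}
```

Because the result depends only on the argument kinds, a table like this can be checked on any host platform, which is the same property the white-box `callarranger` tests rely on.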
> > #### Test changes > > Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. > > Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI. > > [1] - https://openjdk.java.net/jeps/389 > [2] - https://openjdk.java.net/jeps/393 > [3] - https://git.openjdk.java.net/jdk/pull/548 > [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md > [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 83 commits: - Merge branch 'master' into 8254231_linker - Fix crashes on aarch64 due to lack of intrinsics support - Fix high arity test for aarch64 - Fix build failure with disabled precompiled headers - Fix whitespaces - Merge branch 'master' into 8254231_linker - Merge pull request #7 from JornVernee/Additional_Review_Comments Additional review comments - Revert System.java changes - Set copyright year for added files to 2020 - Check result of AttachCurrentThread - ... and 73 more: https://git.openjdk.java.net/jdk/compare/5973e91c...9b7cd259 ------------- Changes: https://git.openjdk.java.net/jdk/pull/634/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=21 Stats: 67506 lines in 214 files changed: 67329 ins; 79 del; 98 mod Patch: https://git.openjdk.java.net/jdk/pull/634.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634 PR: https://git.openjdk.java.net/jdk/pull/634 From sjohanss at openjdk.java.net Fri Nov 13 13:45:16 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 13 Nov 2020 13:45:16 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v4] In-Reply-To: References: Message-ID: <3P6x1VZJDJMNjlTesZmdkrdVcWM_V55Y_ONb5u6vNv0=.7ab3656f-142c-4312-9c39-27cc2c4ab462@github.com> > Please review this change that implements concurrent uncommit for G1. > > **Summary** > G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. > > The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. 
Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). > > Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added. > > One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. > > **Logging** > To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: > [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 > ... > [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms > [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms > ... > [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms > > The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. 
> > On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: > [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) > [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) > [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) > [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) > [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) > [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) > [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) > > **Testing** > Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. > > I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes. 
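The two-bitmap scheme from the summary above (active regions, inactive regions, committed as their union, and re-activation preferred over a fresh commit) can be sketched in a few lines. This is a simplified model written in Java for illustration, not the HotSpot C++ code: the class `CommittedRegionSketch` and its method names are hypothetical, `synchronized` merely stands in for the role played by `Uncommit_lock`, and no memory is actually committed or returned to the OS.

```java
import java.util.BitSet;

/** Simplified model; class and method names are hypothetical, not HotSpot's. */
public class CommittedRegionSketch {
    private final BitSet active = new BitSet();    // committed and available for use
    private final BitSet inactive = new BitSet();  // committed, waiting to be uncommitted

    /** Commit brand-new regions [start, end) and make them usable. */
    public synchronized void activate(int start, int end) {
        active.set(start, end);
    }

    /** In the pause: only flip bits; the expensive uncommit happens later. */
    public synchronized void deactivate(int start, int end) {
        active.clear(start, end);
        inactive.set(start, end);
    }

    /** Heap expansion: reuse up to 'wanted' inactive regions; returns how many were reused. */
    public synchronized int reactivate(int wanted) {
        int reused = 0;
        for (int i = inactive.nextSetBit(0); i >= 0 && reused < wanted;
             i = inactive.nextSetBit(i + 1)) {
            inactive.clear(i);
            active.set(i);
            reused++;
        }
        return reused;
    }

    /** Concurrent task: uncommit at most 'limit' regions; may legitimately return 0. */
    public synchronized int uncommit(int limit) {
        int done = 0;
        for (int i = inactive.nextSetBit(0); i >= 0 && done < limit;
             i = inactive.nextSetBit(i + 1)) {
            inactive.clear(i);   // a real VM would hand the memory back to the OS here
            done++;
        }
        return done;
    }

    /** Committed regions are the union of the two bitmaps. */
    public synchronized int committed() {
        BitSet union = (BitSet) active.clone();
        union.or(inactive);
        return union.cardinality();
    }

    public static void main(String[] args) {
        CommittedRegionSketch map = new CommittedRegionSketch();
        map.activate(0, 100);
        map.deactivate(60, 100);                 // 40 regions made ready for uncommit
        System.out.println(map.uncommit(32));    // first chunk is uncommitted
        System.out.println(map.reactivate(20));  // heap grows again, reusing what is left
        System.out.println(map.uncommit(32));    // nothing left for the task to do
        System.out.println(map.committed());
    }
}
```

The sequence in `main` mirrors the situation described for the logs: fewer regions are uncommitted than were made ready, because the heap grew again in between and the remaining inactive regions were re-activated.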
Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Albert review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1141/files - new: https://git.openjdk.java.net/jdk/pull/1141/files/0a3ba091..19490764 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=02-03 Stats: 51 lines in 7 files changed: 12 ins; 2 del; 37 mod Patch: https://git.openjdk.java.net/jdk/pull/1141.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1141/head:pull/1141 PR: https://git.openjdk.java.net/jdk/pull/1141 From sjohanss at openjdk.java.net Fri Nov 13 13:45:19 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 13 Nov 2020 13:45:19 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v3] In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 09:44:06 GMT, Stefan Johansson wrote: >> Please review this change that implements concurrent uncommit for G1. >> >> **Summary** >> G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. >> >> The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). 
The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). >> >> Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added. >> >> One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. >> >> **Logging** >> To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: >> [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 >> ... >> [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms >> [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms >> ... >> [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms >> >> The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. 
>> >> On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: >> [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) >> [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) >> [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) >> [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) >> [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) >> [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) >> [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) >> >> **Testing** >> Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. >> >> I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes. > > Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into 8236926-ccu > - Lock for small mapper and use BitMap parallel operations. > - Self review > - Simplified task > - Improved logging > - Test improvement > - Uncommit task > - Move HeapRegionRange constructor > - Stress Uncommit > - Feedback from dev-meeting > - ... and 1 more: https://git.openjdk.java.net/jdk/compare/b4d01867...0a3ba091 Thanks for the comments Albert. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From sjohanss at openjdk.java.net Fri Nov 13 13:45:25 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 13 Nov 2020 13:45:25 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v4] In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 10:49:23 GMT, Albert Mingkun Yang wrote: >> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: >> >> Albert review > > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2668: > >> 2666: void G1CollectedHeap::uncommit_heap_if_necessary() { >> 2667: if (hrm()->has_inactive_regions()) { >> 2668: G1UncommitRegionTask::activate(); > > Since `activate` is already used for regions, I would suggest another word; simple `run` is enough. Additionally, I don't think using an enum is necessary, a plain `bool is_running` should do. > > enum class TaskState { active, inactive }; > TaskState _state; Good catch, for a short while I had the state `running` as well, but now just having a simple bool is enough now. I still call the state `_active` since I think that is more accurate. Also change the name to `run()`. > src/hotspot/share/gc/g1/g1CommittedRegionMap.hpp line 67: > >> 65: void active_clear_range(uint start, uint end); >> 66: void inactive_set_range(uint start, uint end); >> 67: void inactive_clear_range(uint start, uint end); > > After reading their implementation, I see that `Uncommit_lock` must be held on call them. I think it's best to mention this precondition in the comments, and explain why this lock is needed. Updated the comments a bit and referred to guarantee_mt_safty_* for more details. > src/hotspot/share/gc/g1/g1UncommitRegionTask.cpp line 107: > >> 105: >> 106: // Each execution is limited to uncommit at most 256M worth of regions. >> 107: static const uint region_limit = (uint) (256 * M / G1HeapRegionSize); > > Why 256M? Better include some motivation in the comments. 
256M is just a "reasonable" limit that I picked to get short enough invocations. I updated the comment a bit. > src/hotspot/share/gc/g1/heapRegionManager.cpp line 178: > >> 176: >> 177: void HeapRegionManager::expand(uint start, uint num_regions, WorkGang* pretouch_gang) { >> 178: guarantee(num_regions > 0, "No point in calling this for zero regions"); > > The same `guarantee` is already in `commit_regions`; I think an assert is fine. Removed it, as you say we check it the first thing we do in `commit_regions`. > src/hotspot/share/gc/g1/heapRegionManager.cpp line 189: > >> 187: } >> 188: >> 189: if (G1CollectedHeap::heap()->hr_printer()->is_active()) { > > This `if` is not needed, right? `commit(hr)` does such check already. There are a few other cases of the same kind. This is a bit unfortunate, but this `is_active()` is checking if the `G1HRPrinter` is active and should print stuff. So it is needed. > src/hotspot/share/gc/g1/heapRegionManager.cpp line 264: > >> 262: assert(num_regions > 0, "No point in calling this for zero regions"); >> 263: >> 264: clear_auxiliary_data_structures(start, num_regions); > > IMO, this name is too general. Even after reading its body and the associated comments, I don't get what "data structures" are cleared. To me the comments and implementation are pretty descriptive, but I did update the comment for `signal_mapping_changed` a bit to make it more explicit. Also added a few lines to explain what will be cleared by each mapper. I agree that this is not a perfect name but I think the naming is in line with how we refer to these structures elsewhere in the code.
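On the 256M per-invocation limit discussed above, the arithmetic of `256 * M / G1HeapRegionSize` can be worked through in a small sketch. The helper class `UncommitChunks` and its method names are hypothetical and written in Java purely for illustration; the 8M region size is inferred from the example log lines ("256M, 32 regions"), not stated in the review.

```java
/** Illustrative arithmetic only; class and method names are hypothetical. */
public class UncommitChunks {
    public static final long M = 1024 * 1024;

    /** Regions handled per task invocation, mirroring 256 * M / G1HeapRegionSize. */
    public static int regionLimit(long regionSizeBytes) {
        return (int) (256 * M / regionSizeBytes);
    }

    /** Number of task invocations needed to uncommit the given number of regions. */
    public static int invocations(int regions, long regionSizeBytes) {
        int limit = regionLimit(regionSizeBytes);
        return (regions + limit - 1) / limit;   // ceiling division
    }

    public static void main(String[] args) {
        long regionSize = 8 * M;                           // inferred from the log example
        System.out.println(regionLimit(regionSize));       // prints 32
        System.out.println(invocations(608, regionSize));  // prints 19
    }
}
```

With 8M regions, each invocation covers 32 regions, so the 608 regions (4864M) reported in the summary log line would take 19 invocations of the task, each one short enough to let other service-thread tasks run in between.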
> src/hotspot/share/gc/g1/heapRegionManager.cpp line 272: > >> 270: void HeapRegionManager::deactivate_regions(uint start, size_t num_regions) { >> 271: guarantee(num_regions >= 1, "Need to specify at least one region to uncommit, tried to uncommit zero regions at %u", start); >> 272: guarantee(length() >= num_regions, "pre-condition"); > > I am not really sure why `guarantee` here but `assert` in `reactivate_regions`. I don't see a strong reason to use `guarantee` here. Changed to assert; the reason they differed is that the code in `deactivate_regions` is old and was just moved into this function. Also changed the condition to `> 0`, as we have in most other places. > src/hotspot/share/gc/g1/heapRegionManager.cpp line 330: > >> 328: // No more regions available for uncommit >> 329: if (range.length() == 0) { >> 330: return uncommitted; > > Is `assert(uncommitted > 0)` true here? Why or why not? Please add some comments; I think it's helpful for the readers. `uncommitted` can be 0 here so we can't add that assert. There is a chance that, between adding the uncommit task (which calls this function) and grabbing the lock, someone else has used the `inactive` regions to expand the heap again. Updated the comment a bit; I hope it makes it easier to follow. > src/hotspot/share/gc/g1/heapRegionManager.hpp line 81: > >> 79: G1RegionToSpaceMapper* _card_counts_mapper; >> 80: >> 81: // Map to keep track of which regions are in use. > > The name of the variable suggests it's tracking what regions are committed and that's all. After reading `G1CommittedRegionMap`, I believe the class name and the var name are quite misleading; it tracks the state of mem backing up all regions, `committed+mapped`, `committed`, `to_be_uncommitted`, `committed`. I don't have any good alternatives, but the comments could surely be expanded.
Naming is hard and I agree this isn't perfect but the union of the two bitmaps in `G1CommittedRegionMap` do track what is committed, so it's not completely wrong. I did update the comment here a bit. > src/hotspot/share/gc/g1/heapRegionManager.hpp line 97: > >> 95: >> 96: // Notify other data structures about change in the heap layout. >> 97: void update_committed_space(HeapWord* old_end, HeapWord* new_end); > > `update_committed_space` is not implemented, right? Correct, filed [JDK-8256323](https://bugs.openjdk.java.net/browse/JDK-8256323) for this. > src/hotspot/share/gc/g1/heapRegionManager.hpp line 137: > >> 135: void deactivate_regions(uint start, size_t num_regions); >> 136: void reactivate_regions(uint start, uint num_regions); >> 137: void uncommit_regions(uint start, uint num_regions); > > Why `uint` vs `size_t`? Why `= 1` for some but not others? If such inconsistency is intentional, it should be documented. The reason for `size_t` for `deactivate_regions` is that we have a size_t in `shrink_at` which calls it. But for consistency here I will change it to `uint` and add a cast in `shrink_at`. For the default values, on expand it is historic but can now be removed. Good catch, and for inactive I don't have any good excuse =) ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From sjohanss at openjdk.java.net Fri Nov 13 15:42:10 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 13 Nov 2020 15:42:10 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v5] In-Reply-To: References: Message-ID: > Please review this change that implements concurrent uncommit for G1. > > **Summary** > G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. 
The new task will uncommit memory in chunks of regions to avoid starving out other tasks. > > The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). > > Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added. > > One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. > > **Logging** > To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: > [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 > ... > [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms > [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms > ... > [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms > > The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. 
The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. > > On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: > [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) > [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) > [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) > [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) > [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) > [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) > [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) > > **Testing** > Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. > > I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes. 
Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Albert review 2 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1141/files - new: https://git.openjdk.java.net/jdk/pull/1141/files/19490764..8552d23b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=03-04 Stats: 16 lines in 1 file changed: 1 ins; 11 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1141.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1141/head:pull/1141 PR: https://git.openjdk.java.net/jdk/pull/1141 From sjohanss at openjdk.java.net Fri Nov 13 15:49:01 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 13 Nov 2020 15:49:01 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v5] In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 12:35:58 GMT, Stefan Johansson wrote: >> src/hotspot/share/gc/g1/heapRegionManager.cpp line 189: >> >>> 187: } >>> 188: >>> 189: if (G1CollectedHeap::heap()->hr_printer()->is_active()) { >> >> This `if` is not needed, right? `commit(hr)` does such check already. There are a few other cases of the same kind. > > This is a bit unfortunate, but this `is_active()` is checking if the `G1HRPrinter` is active and should print stuff. So it is needed. I misunderstood Albert's comment here. He is correct that the check inside `commit(hr)` is enough; I updated this and a few other similar cases.
------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From mcimadamore at openjdk.java.net Fri Nov 13 17:06:41 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Fri, 13 Nov 2020 17:06:41 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v23] In-Reply-To: References: Message-ID: <7igMkBiVz8heDwoSy6l7jyEr37zVgg6EIA7vNQrAFzA=.e7b3930b-e800-4426-a844-acffd744b64e@github.com> > This patch contains the changes associated with the first incubation round of the foreign linker API > (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). > > The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. > > Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. > > A big thank you to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank you to Paul Sandoz, who provided many insights (often by trying the bits first hand).
> > Thanks > Maurizio > > Webrev: > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff (relative to [3]): > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254232 > > > > ### API Changes > > The API changes are actually rather slim: > > * `LibraryLookup` > * This class allows clients to look up symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then look up symbols in that library. > * `FunctionDescriptor` > * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. > * `CLinker` > * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. > * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. > * Finally, this class provides some helper functions e.g.
so that clients can convert Java strings into C strings and back. > * `NativeScope` > * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resource* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. > * `MemorySegment` > * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. > > ### Safety > > The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API). > > ### Implementation changes > > The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. same library cannot be loaded by different classloaders). > > As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation.
`NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. > > Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. > > The main idea behind foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function. > > This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. > > For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating points are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly.
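[Illustration, not part of the patch] The classification idea described above can be sketched with a toy model. This is only a simplified picture of how the SysV x86-64 ABI assigns scalar arguments to registers (integer class to `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`; floating-point class to `xmm0`..`xmm7`; the rest on the stack). The class `SysVClassifierSketch`, its `Kind` enum and the `assign` method are hypothetical names chosen for this sketch; they are not the `CallArranger` or `CLinker.TypeKind` code from the PR, and structs, varargs and return values are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not the CallArranger implementation.
class SysVClassifierSketch {
    // Hypothetical stand-in for the classification attribute attached to a layout.
    enum Kind { INTEGER, FLOAT }

    // SysV x86-64 argument registers for scalar integer and floating-point values.
    static final String[] INT_REGS = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    static final String[] FP_REGS =
        {"xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"};

    // Assigns each scalar argument a register of its class, or a stack slot
    // once the registers of that class are exhausted.
    static List<String> assign(Kind... args) {
        List<String> locations = new ArrayList<>();
        int nextInt = 0, nextFp = 0, nextStack = 0;
        for (Kind k : args) {
            if (k == Kind.INTEGER && nextInt < INT_REGS.length) {
                locations.add(INT_REGS[nextInt++]);
            } else if (k == Kind.FLOAT && nextFp < FP_REGS.length) {
                locations.add(FP_REGS[nextFp++]);
            } else {
                locations.add("stack[" + nextStack++ + "]");
            }
        }
        return locations;
    }

    public static void main(String[] args) {
        // e.g. a native signature shaped like f(long, double, int)
        System.out.println(assign(Kind.INTEGER, Kind.FLOAT, Kind.INTEGER));
    }
}
```

The point of the sketch is only that knowing a layout's size is not enough: the same 64-bit value lands in a different register depending on whether it is classified as integral or floating point.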
> > A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall. > > At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: > > * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them in their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the more "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. > > * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). > > * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter).
The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI. > > For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). > > Again, for more reading on the internals of the foreign linker support, please refer to [5]. > > #### Test changes > > Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. > > Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI.
> > [1] - https://openjdk.java.net/jeps/389 > [2] - https://openjdk.java.net/jeps/393 > [3] - https://git.openjdk.java.net/jdk/pull/548 > [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md > [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html Maurizio Cimadamore has updated the pull request incrementally with six additional commits since the last revision: - Merge pull request #8 from JornVernee/Vlad_Comments Address More Review comments - - Don't print anything in nmehtod debug output for native invoker if there are none. - Use memcpy to copy native stubs to nmethod data - Simplify print code - Merge branch '8254231_linker' into Vlad_Comments - Address Vlad's review comments - Add ResourceMark to ProgrammableUpcallHandler constructor ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/634/files - new: https://git.openjdk.java.net/jdk/pull/634/files/9b7cd259..739c7925 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=22 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=21-22 Stats: 264 lines in 29 files changed: 72 ins; 112 del; 80 mod Patch: https://git.openjdk.java.net/jdk/pull/634.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634 PR: https://git.openjdk.java.net/jdk/pull/634 From darcy at openjdk.java.net Fri Nov 13 17:37:08 2020 From: darcy at openjdk.java.net (Joe Darcy) Date: Fri, 13 Nov 2020 17:37:08 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v5] In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 20:39:14 GMT, Xubo Zhang wrote: >> Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. 
Also changed movdqu to movsd as it was supposed to load a 64-bit number > > Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Marked as reviewed by darcy (Reviewer). test/jdk/java/lang/Math/ExpCornerCaseTests.java line 2: > 1: /* > 2: * Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved. Before pushing, please update the copyright year to "2011, 2020" (since this file was based on one from 2020). test/jdk/java/lang/Math/ExpCornerCaseTests.java line 28: > 26: * @bug 8255368 > 27: * @summary Tests corner cases of Math.exp > 28: * @author Xubo Zhang We don't use @author tags on new tests, but haven't removed them from old tests. src/hotspot/cpu/x86/macroAssembler_x86_exp.cpp line 593: > 591: jmp(L_2TAG_PACKET_2_0_2); > 592: cmpl(ecx, INT_MIN); > 593: jcc(Assembler::below, L_2TAG_PACKET_3_0_2); If all the changed instructions are not covered by the two test arguments, please add additional values to the test covering the other instructions. ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From github.com+58006833+xbzhang99 at openjdk.java.net Fri Nov 13 17:53:16 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Fri, 13 Nov 2020 17:53:16 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v6] In-Reply-To: References: Message-ID: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/894/files - new: https://git.openjdk.java.net/jdk/pull/894/files/b23c8cba..704dfff2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/894/head:pull/894 PR: https://git.openjdk.java.net/jdk/pull/894 From dongbo at openjdk.java.net Sat Nov 14 06:28:05 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Sat, 14 Nov 2020 06:28:05 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference Message-ID: This adds support for the floating-point absolute difference instructions, i.e. FABD scalar/vector. Verified with linux-aarch64-server-release, tier1-3. Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance test. For FABD (scalar), the performance tests handle registers directly, and the average latency drops to almost half (~57%) of the original. For FABD (vector), we restrict the data size (~24KB) to be less than L1 data cache size (32KB), so that the memory access can hit in L1, and witness 14.2% (float) and 21.2% (double) improvements. The JMH results on Kunpeng916: Benchmark (count) (seed) Mode Cnt Score Error Units # before, fsub+fabs FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 6038.333 ± 3.889 ns/op FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 6005.125 ± 3.025 ns/op FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 950.340 ± 9.398 ns/op FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 454.350 ±
1.798 ns/op # after, fabd FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 3483.801 ± 1.763 ns/op FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 3442.412 ± 1.866 ns/op FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 816.301 ± 4.454 ns/op FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 354.710 ± 1.001 ns/op ------------- Commit messages: - 8256318: AArch64: Add support for floating-point absolute difference Changes: https://git.openjdk.java.net/jdk/pull/1215/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1215&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256318 Stats: 209 lines in 20 files changed: 176 ins; 0 del; 33 mod Patch: https://git.openjdk.java.net/jdk/pull/1215.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1215/head:pull/1215 PR: https://git.openjdk.java.net/jdk/pull/1215 From aph at openjdk.java.net Sat Nov 14 10:34:56 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Sat, 14 Nov 2020 10:34:56 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference In-Reply-To: References: Message-ID: On Sat, 14 Nov 2020 06:22:19 GMT, Dong Bo wrote: > This adds support for the floating-point absolute difference instructions, i.e. FABD scalar/vector. > > Verified with linux-aarch64-server-release, tier1-3. > > Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance test. > > For FABD (scalar), the performance tests handle registers directly, and the average latency drops to almost half (~57%) of the original. > For FABD (vector), we restrict the data size (~24KB) to be less than L1 data cache size (32KB), > so that the memory access can hit in L1, and witness 14.2% (float) and 21.2% (double) improvements.
> > The JMH results on Kunpeng916: > > Benchmark (count) (seed) Mode Cnt Score Error Units > > # before, fsub+fabs > FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 6038.333 ± 3.889 ns/op > FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 6005.125 ± 3.025 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 950.340 ± 9.398 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 454.350 ± 1.798 ns/op > > # after, fabd > FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 3483.801 ± 1.763 ns/op > FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 3442.412 ± 1.866 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 816.301 ± 4.454 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 354.710 ± 1.001 ns/op Looks good. Please add any new instructions to aarch64-asmtest.py and regenerate assembler_aarch64.cpp. ------------- Changes requested by aph (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1215 From njian at openjdk.java.net Mon Nov 16 02:18:54 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 16 Nov 2020 02:18:54 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference In-Reply-To: References: Message-ID: On Sat, 14 Nov 2020 06:22:19 GMT, Dong Bo wrote: > This adds support for the floating-point absolute difference instructions, i.e. FABD scalar/vector. > > Verified with linux-aarch64-server-release, tier1-3. > > Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance test. > > For FABD (scalar), the performance tests handle registers directly, and the average latency drops to almost half (~57%) of the original.
> For FABD (vector), we restrict the data size (~24KB) to be less than L1 data cache size (32KB), > so that the memory access can hit in L1, and witness 14.2% (float) and 21.2% (double) improvements. > > The JMH results on Kunpeng916: > > Benchmark (count) (seed) Mode Cnt Score Error Units > > # before, fsub+fabs > FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 6038.333 ± 3.889 ns/op > FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 6005.125 ± 3.025 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 950.340 ± 9.398 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 454.350 ± 1.798 ns/op > > # after, fabd > FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 3483.801 ± 1.763 ns/op > FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 3442.412 ± 1.866 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 816.301 ± 4.454 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 354.710 ± 1.001 ns/op src/hotspot/cpu/aarch64/aarch64.ad line 18110: > 18108: %{ > 18109: predicate(n->as_Vector()->length() == 2); > 18110: match(Set dst (AbsVF (SubVF src1 src2))); We now have aarch64_neon.ad, do you think we should put neon vector rules to that file, to keep aarch64.ad smaller? ------------- PR: https://git.openjdk.java.net/jdk/pull/1215 From dongbo at openjdk.java.net Mon Nov 16 02:53:55 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 16 Nov 2020 02:53:55 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 02:15:13 GMT, Ningsheng Jian wrote: >> This adds support for the floating-point absolute difference instructions, i.e. FABD scalar/vector. >> >> Verified with linux-aarch64-server-release, tier1-3.
>> >> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance test. >> >> For FABD (scalar), the performance tests handle registers directly, and the average latency drops to almost half (~57%) of the original. >> For FABD (vector), we restrict the data size (~24KB) to be less than L1 data cache size (32KB), >> so that the memory access can hit in L1, and witness 14.2% (float) and 21.2% (double) improvements. >> >> The JMH results on Kunpeng916: >> >> Benchmark (count) (seed) Mode Cnt Score Error Units >> >> # before, fsub+fabs >> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 6038.333 ± 3.889 ns/op >> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 6005.125 ± 3.025 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 950.340 ± 9.398 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 454.350 ± 1.798 ns/op >> >> # after, fabd >> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 3483.801 ± 1.763 ns/op >> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 3442.412 ± 1.866 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 816.301 ± 4.454 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 354.710 ± 1.001 ns/op > > src/hotspot/cpu/aarch64/aarch64.ad line 18110: > >> 18108: %{ >> 18109: predicate(n->as_Vector()->length() == 2); >> 18110: match(Set dst (AbsVF (SubVF src1 src2))); > > We now have aarch64_neon.ad, do you think we should put neon vector rules to that file, to keep aarch64.ad smaller? I've considered this. But I feel it is a little bit inconsistent that only the new `fabd` is added into `aarch64_neon.ad`, while other NEON instructions (i.e. `fabs`, `fsub`, `fdiv`, `fsqrt`, etc) are still in aarch64.ad. And moving them all from aarch64.ad to aarch64_neon.ad deviates far from this patch.
I think I can put the closely related `fabs` and `fabd` into aarch64_neon.ad in this patch. Is that OK? ------------- PR: https://git.openjdk.java.net/jdk/pull/1215 From dongbo at openjdk.java.net Mon Nov 16 03:07:11 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 16 Nov 2020 03:07:11 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v2] In-Reply-To: References: Message-ID: > This adds support for the floating-point absolute difference instructions, i.e. FABD scalar/vector. > > Verified with linux-aarch64-server-release, tier1-3. > > Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance test. > > For FABD (scalar), the performance tests handle registers directly, and the average latency drops to almost half (~57%) of the original. > For FABD (vector), we restrict the data size (~24KB) to be less than L1 data cache size (32KB), > so that the memory access can hit in L1, and witness 14.2% (float) and 21.2% (double) improvements. > > The JMH results on Kunpeng916: > > Benchmark (count) (seed) Mode Cnt Score Error Units > > # before, fsub+fabs > FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 6038.333 ± 3.889 ns/op > FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 6005.125 ± 3.025 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 950.340 ± 9.398 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 454.350 ± 1.798 ns/op > > # after, fabd > FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 3483.801 ± 1.763 ns/op > FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 3442.412 ± 1.866 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 816.301 ± 4.454 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 354.710 ±
1.001 ns/op Dong Bo has updated the pull request incrementally with one additional commit since the last revision: Add fabd to aarch64-asmtest.py and regenerate assembler_aarch64.cpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1215/files - new: https://git.openjdk.java.net/jdk/pull/1215/files/c00d941e..b2543841 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1215&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1215&range=00-01 Stats: 306 lines in 2 files changed: 6 ins; 1 del; 299 mod Patch: https://git.openjdk.java.net/jdk/pull/1215.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1215/head:pull/1215 PR: https://git.openjdk.java.net/jdk/pull/1215 From dongbo at openjdk.java.net Mon Nov 16 03:15:23 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 16 Nov 2020 03:15:23 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v3] In-Reply-To: References: Message-ID: > This adds support for the floating-point absolute difference instructions, i.e. FABD scalar/vector. > > Verified with linux-aarch64-server-release, tier1-3. > > Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance test. > > For FABD (scalar), the performance tests handle registers directly, and the average latency drops to almost half (~57%) of the original. > For FABD (vector), we restrict the data size (~24KB) to be less than L1 data cache size (32KB), > so that the memory access can hit in L1, and witness 14.2% (float) and 21.2% (double) improvements. > > The JMH results on Kunpeng916: > > Benchmark (count) (seed) Mode Cnt Score Error Units > > # before, fsub+fabs > FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 6038.333 ± 3.889 ns/op > FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 6005.125 ± 3.025 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 950.340 ±
9.398 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 454.350 ± 1.798 ns/op > > # after, fabd > FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 3483.801 ± 1.763 ns/op > FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 3442.412 ± 1.866 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 816.301 ± 4.454 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 354.710 ± 1.001 ns/op Dong Bo has updated the pull request incrementally with one additional commit since the last revision: fix trailing whitespace error ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1215/files - new: https://git.openjdk.java.net/jdk/pull/1215/files/b2543841..e80d3802 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1215&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1215&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1215.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1215/head:pull/1215 PR: https://git.openjdk.java.net/jdk/pull/1215 From dongbo at openjdk.java.net Mon Nov 16 03:15:23 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 16 Nov 2020 03:15:23 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v3] In-Reply-To: References: Message-ID: On Sat, 14 Nov 2020 10:32:21 GMT, Andrew Haley wrote: > Looks good. Please add any new instructions to aarch64-asmtest.py and regenerate assembler_aarch64.cpp. Done, added tests for `fabd` scalar/vector instructions in this script and regenerated the code. Verified with linux-aarch64-server-fastdebug build.
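[Illustration, not part of the patch] For readers following the thread, the Java-level shape that the quoted match rule `AbsVF (SubVF src1 src2)` targets is an absolute value applied to a subtraction. The following is a minimal sketch under that assumption; the class and method names (`AbsDiffSketch`, `absDiff`) are hypothetical and this is not the `FloatingScalarVectorAbsDiff` benchmark from the PR. Whether the JIT actually emits FABD for it depends on the platform and the compiler's decisions.

```java
// Hypothetical sketch of the abs(a - b) pattern discussed in this thread.
class AbsDiffSketch {
    // Scalar shape: a subtraction feeding an absolute value.
    static double absDiff(double a, double b) {
        return Math.abs(a - b);
    }

    // Loop shape: element-wise abs(a[i] - b[i]), a candidate for auto-vectorization.
    static void absDiff(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.abs(a[i] - b[i]);
        }
    }

    public static void main(String[] args) {
        float[] out = new float[2];
        absDiff(new float[] {1.0f, 5.0f}, new float[] {3.0f, 2.0f}, out);
        System.out.println(out[0] + " " + out[1]);
        System.out.println(absDiff(1.5, 4.0));
    }
}
```

The result is the same whether the compiler emits a subtract followed by an absolute value or a single fused instruction; only the instruction count differs.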
------------- PR: https://git.openjdk.java.net/jdk/pull/1215 From njian at openjdk.java.net Mon Nov 16 03:15:24 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 16 Nov 2020 03:15:24 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v3] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 02:51:05 GMT, Dong Bo wrote: > But I feel it is a little bit inconsistent that only the new fabd is added into aarch64_neon.ad, > while other NEON instructions (i.e. fabs, fsub, fdiv, fsqrt, etc) are still in aarch64.ad. > And moving them all from aarch64.ad to aarch64_neon.ad deviates far from this patch. Yes, I think when we introduced aarch64_neon.ad (m4), we just tried to keep that patch simple and would move other vector rules in future patches. Maybe Andrew can comment on this? ------------- PR: https://git.openjdk.java.net/jdk/pull/1215 From shade at openjdk.java.net Mon Nov 16 08:20:58 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 16 Nov 2020 08:20:58 GMT Subject: RFR: 8255523: Clean up temporary shared_locs initializations In-Reply-To: References: <5OHoOpEGZ0j62A09r5hqDzxIu39qDSrxrC-b2tWOAzg=.4555cf67-82ee-4827-9154-1322b3f4fcf8@github.com> Message-ID: On Thu, 5 Nov 2020 07:19:26 GMT, Aleksey Shipilev wrote: >> See #648. Apparently, LLVM 11 complains that we are computing the number of elements over the array of a different type. Instead of ignoring the warning, it seems better to just clean up that code. We can allocate the whole thing as resource array of the same size. `sizeOf(relocInfo) = 2`, since it carries `unsigned short`. > > Friendly reminder. Anyone?
:) ------------- PR: https://git.openjdk.java.net/jdk/pull/897 From kbarrett at openjdk.java.net Mon Nov 16 09:38:00 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 16 Nov 2020 09:38:00 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 11:39:59 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that changes the way how archive regions are managed in general and specifically by the G1 collector, fixing the crashes caused by adding the module graph into the archive in [JDK-8244778](https://bugs.openjdk.java.net/browse/JDK-8244778)? > > Previously before the JDK-8244778 change, archived objects could always be assumed as live, and so the G1 collector did so, not caring about the archive region's contents at all. With JDK-8244778 however, archived objects could die, and keep stale references to objects outside of the archive regions, which obviously causes crashes when walking these objects. > > With this change, open archive region contents are basically handled as any other objects; to support that, all open archive regions are now reachable via a single object array root. This hopefully also facilitates implementation in other collectors. > > This allows us to remove quite a bit of special handling in G1 too; the only difference is that open archive regions will generally not be collected unless they are completely empty: we do want to profit from the sharing across VMs as much as possible. > > Testing: tier1-5, one or two 6-8 runs > > The appcds changes were done by @iklam. These changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements > > Thanks, > Thomas I've only skimmed the non-GC changes. 
src/hotspot/share/oops/klass.cpp line 207: > 205: _shared_class_path_index(-1) { > 206: CDS_ONLY(_shared_class_flags = 0); > 207: CDS_JAVA_HEAP_ONLY(_archived_mirror_index = -1); Why are the semi-colons being moved out of the macros here? This isn't needed, and is contrary to usage elsewhere. src/hotspot/share/memory/heapShared.hpp line 70: > 68: GrowableArray* _subgraph_entry_fields; > 69: > 70: // Does this KlassSubGraphInfo belong to the arcived full module graph s/arcived/archived/ src/hotspot/share/memory/heapShared.hpp line 85: > 83: _is_full_module_graph(is_full_module_graph), > 84: _has_non_early_klasses(false) {} > 85: ~KlassSubGraphInfo() { Please add a blank line between the constructor and the destructor. src/hotspot/share/gc/g1/heapRegion.hpp line 181: > 179: // Returns whether the given object is dead based on TAMS and bitmap. > 180: // An object is dead iff a) it was not allocated since the last mark, b) it > 181: // is not marked, and c) it is not in a closed archive region. The first, unchanged, line isn't consistent with the additional comment. I suggest ending it after "dead", and adding "(TAMS)" and "(bitmap)" before the ending commas of the first two alternatives. src/hotspot/share/gc/g1/g1FullGCPrepareTask.cpp line 60: > 58: // Never free closed archive regions. This is also be the only other allowed > 59: // type at this point. > 60: assert(hr->is_closed_archive(), "Only closed archive regions can also be pinned."); I found the assert message here very confusing. It's really that all other pinned region cases have been covered, and closed_archive is the last remaining one. src/hotspot/share/gc/g1/g1FullGCPrepareTask.cpp line 128: > 126: } > 127: > 128: void G1FullGCPrepareTask::G1CalculatePointersClosure::free_pinned_region(HeapRegion* hr) { Should this be called free_archive_region (or free_open_archive_region)? The statistics counter is `_pinned_archive_regions_removed`, so this presumably can't be used for some other kind of pinned region. 
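The liveness rule from the heapRegion.hpp comment discussed above, with the "(TAMS)" and "(bitmap)" attribution suggested in the review, could be sketched roughly as follows. This is only an illustration under stated assumptions: `ToyRegion`, its fields, and `run_liveness_examples` are hypothetical names invented here, and a `std::set` stands in for the mark bitmap; none of this is the actual HotSpot `HeapRegion` code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical stand-in for a heap region; NOT the HotSpot HeapRegion class.
struct ToyRegion {
  std::size_t top_at_mark_start;  // TAMS: addresses at or above it were allocated since the last mark
  std::set<std::size_t> marked;   // stand-in for the mark bitmap
  bool closed_archive;            // objects in closed archive regions are never treated as dead

  // Returns whether the given object is dead.
  // An object is dead iff
  //   a) it was not allocated since the last mark (TAMS),
  //   b) it is not marked (bitmap), and
  //   c) it is not in a closed archive region.
  bool is_obj_dead(std::size_t addr) const {
    const bool allocated_since_mark = addr >= top_at_mark_start;
    const bool is_marked = marked.count(addr) != 0;
    return !allocated_since_mark && !is_marked && !closed_archive;
  }
};

// Exercises each of the three conditions once.
inline void run_liveness_examples() {
  const ToyRegion r{100, {10, 20}, false};
  assert(!r.is_obj_dead(10));   // marked, so live
  assert(r.is_obj_dead(30));    // below TAMS and unmarked, so dead
  assert(!r.is_obj_dead(150));  // allocated since the last mark, so live

  const ToyRegion archived{100, {}, true};
  assert(!archived.is_obj_dead(30));  // closed archive region, so never dead
}
```

Writing the predicate as a conjunction of the three negated conditions keeps each alternative in the comment tied to the data structure that decides it, which is the consistency the review comment asks for.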
------------- Changes requested by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1163 From dholmes at openjdk.java.net Mon Nov 16 10:13:54 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 16 Nov 2020 10:13:54 GMT Subject: RFR: 8255523: Clean up temporary shared_locs initializations In-Reply-To: <5OHoOpEGZ0j62A09r5hqDzxIu39qDSrxrC-b2tWOAzg=.4555cf67-82ee-4827-9154-1322b3f4fcf8@github.com> References: <5OHoOpEGZ0j62A09r5hqDzxIu39qDSrxrC-b2tWOAzg=.4555cf67-82ee-4827-9154-1322b3f4fcf8@github.com> Message-ID: On Wed, 28 Oct 2020 09:36:55 GMT, Aleksey Shipilev wrote: > See #648. Apparently, LLVM 11 complains that we are computing the number of elements over the array of a different type. Instead of ignoring the warning, it seems better to just clean up that code. We can allocate the whole thing as resource array of the same size. `sizeOf(relocInfo) = 2`, since it carries `unsigned short`. Seems quite reasonable. We already have other resource area usage in that code. Not clear why the stack buffer was incorrectly typed in the first place? Alignment issue perhaps? ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/897 From shade at openjdk.java.net Mon Nov 16 10:25:55 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 16 Nov 2020 10:25:55 GMT Subject: RFR: 8255523: Clean up temporary shared_locs initializations In-Reply-To: References: <5OHoOpEGZ0j62A09r5hqDzxIu39qDSrxrC-b2tWOAzg=.4555cf67-82ee-4827-9154-1322b3f4fcf8@github.com> Message-ID: On Mon, 16 Nov 2020 10:11:26 GMT, David Holmes wrote: >> See #648. Apparently, LLVM 11 complains that we are computing the number of elements over the array of a different type. Instead of ignoring the warning, it seems better to just clean up that code. We can allocate the whole thing as resource array of the same size. `sizeOf(relocInfo) = 2`, since it carries `unsigned short`. > > Seems quite reasonable. 
We already have other resource area usage in that code. > > Not clear why the stack buffer was incorrectly typed in the first place? Alignment issue perhaps? Ew. Actually now I discovered https://bugs.openjdk.java.net/browse/JDK-8253375 and https://bugs.openjdk.java.net/browse/JDK-8253868. This needs more work. ------------- PR: https://git.openjdk.java.net/jdk/pull/897 From tschatzl at openjdk.java.net Mon Nov 16 10:33:10 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 16 Nov 2020 10:33:10 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v2] In-Reply-To: References: Message-ID: > Hi all, > > can I have reviews for this change that changes the way how archive regions are managed in general and specifically by the G1 collector, fixing the crashes caused by adding the module graph into the archive in [JDK-8244778](https://bugs.openjdk.java.net/browse/JDK-8244778)? > > Previously before the JDK-8244778 change, archived objects could always be assumed as live, and so the G1 collector did so, not caring about the archive region's contents at all. With JDK-8244778 however, archived objects could die, and keep stale references to objects outside of the archive regions, which obviously causes crashes when walking these objects. > > With this change, open archive region contents are basically handled as any other objects; to support that, all open archive regions are now reachable via a single object array root. This hopefully also facilitates implementation in other collectors. > > This allows us to remove quite a bit of special handling in G1 too; the only difference is that open archive regions will generally not be collected unless they are completely empty: we do want to profit from the sharing across VMs as much as possible. > > Testing: tier1-5, one or two 6-8 runs > > The appcds changes were done by @iklam. 
These changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: kbarrett review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1163/files - new: https://git.openjdk.java.net/jdk/pull/1163/files/c29503b6..a324c9c8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1163&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1163&range=00-01 Stats: 28 lines in 9 files changed: 4 ins; 0 del; 24 mod Patch: https://git.openjdk.java.net/jdk/pull/1163.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1163/head:pull/1163 PR: https://git.openjdk.java.net/jdk/pull/1163 From patric.hedlin at oracle.com Mon Nov 16 11:14:04 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Mon, 16 Nov 2020 12:14:04 +0100 Subject: [15u] RFR: 8248411: [aarch64] Insufficient error handling when CodeBuffer is exhausted Message-ID: <0e4f6efc-13b8-5456-2e44-e5c380231c9f@oracle.com> I would like to ask for help to review the following change/update to back-port JDK-8248411 to JDK15u. Changes made to the original patch include replacing (C++11) template varargs with macros. Issue: https://bugs.openjdk.java.net/browse/JDK-8248411 Webrev: http://cr.openjdk.java.net/~phedlin/tr8248411.15u/ Testing: tier1-3 Best regards, Patric From mcimadamore at openjdk.java.net Mon Nov 16 11:30:22 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 16 Nov 2020 11:30:22 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v24] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the first incubation round of the foreign linker access API incubation > (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]).
> > The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. > > Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. > > A big thank you to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the HotSpot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank you to Paul Sandoz, who provided many insights (often by trying the bits first hand). > > Thanks > Maurizio > > Webrev: > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff (relative to [3]): > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254232 > > > > ### API Changes > > The API changes are actually rather slim: > > * `LibraryLookup` > * This class allows clients to look up symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then look up symbols on that library. > * `FunctionDescriptor` > * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function.
> * `CLinker` > * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. > * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. > * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back. > * `NativeScope` > * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resources* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. > * `MemorySegment` > * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. > > ### Safety > > The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g.
have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API). > > ### Implementation changes > > The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. the same library cannot be loaded by different classloaders). > > As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. > > Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. > > The main idea behind the foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function.
> > This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. > > For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating-point values are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. > > A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor`, the set of bindings associated with the downcall/upcall. > > At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: > > * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings.
For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them into their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the most "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. > > * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). > > * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go through an intermediate buffer. This gives us back performance that is on par with JNI. > > For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). > > Again, for more reading on the internals of the foreign linker support, please refer to [5].
> > #### Test changes > > Many new tests have been added to validate the foreign linker support; we have high-level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. > > Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI.
> > [1] - https://openjdk.java.net/jeps/389 > [2] - https://openjdk.java.net/jeps/393 > [3] - https://git.openjdk.java.net/jdk/pull/548 > [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md > [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html Maurizio Cimadamore has updated the pull request incrementally with two additional commits since the last revision: - Merge pull request #9 from JornVernee/Windows_Warnings Fix warnings on MSVC - Fix warnings on MSVC ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/634/files - new: https://git.openjdk.java.net/jdk/pull/634/files/739c7925..000f75d5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=23 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=22-23 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/634.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634 PR: https://git.openjdk.java.net/jdk/pull/634 From aph at redhat.com Mon Nov 16 11:57:11 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 16 Nov 2020 11:57:11 +0000 Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v3] In-Reply-To: References: Message-ID: <06f9118b-1d98-5279-9ccc-4d2203e30627@redhat.com> On 11/16/20 3:15 AM, Ningsheng Jian wrote: > On Mon, 16 Nov 2020 02:51:05 GMT, Dong Bo wrote: > >> But I feel it is a little bit inconsistent that only the new fabd is added into aarch64_neon.ad, >> while other NEON instructions (e.g. fabs, fsub, fdiv, fsqrt, etc.) are still in aarch64.ad. >> And moving them all from aarch64.ad to aarch64_neon.ad deviates far from this patch. > > Yes, I think when we introduced aarch64_neon.ad (m4), we just tried to keep that patch simple and would move other vector rules in future patches. Maybe Andrew can comment on this? I wonder: I'm not sure if we should do the lot in one bang.
I can't quite figure out the best thing to do. It'll be tricky to move all of the SIMD instructions, but I guess it's the best thing to do. It'll make backports hard, but we don't see many in this area. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From jvernee at openjdk.java.net Mon Nov 16 12:19:03 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Mon, 16 Nov 2020 12:19:03 GMT Subject: RFR: 8256380: JDK-8254162 broke 32bit windows build Message-ID: Fix win-32 linker error due to forward declaration and definition signature mismatch. FWIW, the altered header is included from nativeLookup.cpp, which uses the function pointer of that function. But, since the signature of the definition [1] is different, and due to name mangling on the particular ABI, the symbols of the declaration in the header and definition are different as well, and things fail to link later. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/scopedMemoryAccess.cpp#L181 ------------- Commit messages: - Fix win-32 linker error due to forward declaration and definition signature mismatch Changes: https://git.openjdk.java.net/jdk/pull/1222/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1222&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256380 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1222.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1222/head:pull/1222 PR: https://git.openjdk.java.net/jdk/pull/1222 From shade at openjdk.java.net Mon Nov 16 12:28:01 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 16 Nov 2020 12:28:01 GMT Subject: RFR: 8256380: JDK-8254162 broke 32bit windows build In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 12:13:38 GMT, Jorn Vernee wrote: > Fix win-32 linker error due to forward declaration and definition signature mismatch. 
> > FWIW, the altered header is included from nativeLookup.cpp, which uses the function pointer of that function. But, since the signature of the definition [1] is different, and due to name mangling on the particular ABI, the symbols of the declaration in the header and definition are different as well, and things fail to link later. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/scopedMemoryAccess.cpp#L181 Looks good to me, and trivial. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1222 From tschatzl at openjdk.java.net Mon Nov 16 12:33:10 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 16 Nov 2020 12:33:10 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v3] In-Reply-To: References: Message-ID: > Hi all, > > can I have reviews for this change that changes the way how archive regions are managed in general and specifically by the G1 collector, fixing the crashes caused by adding the module graph into the archive in [JDK-8244778](https://bugs.openjdk.java.net/browse/JDK-8244778)? > > Previously before the JDK-8244778 change, archived objects could always be assumed as live, and so the G1 collector did so, not caring about the archive region's contents at all. With JDK-8244778 however, archived objects could die, and keep stale references to objects outside of the archive regions, which obviously causes crashes when walking these objects. > > With this change, open archive region contents are basically handled as any other objects; to support that, all open archive regions are now reachable via a single object array root. This hopefully also facilitates implementation in other collectors. 
> > This allows us to remove quite a bit of special handling in G1 too; the only difference is that open archive regions will generally not be collected unless they are completely empty: we do want to profit from the sharing across VMs as much as possible. > > Testing: tier1-5, one or two 6-8 runs > > The appcds changes were done by @iklam. These changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge branch 'master' of https://git.openjdk.java.net/jdk into 8253081-null-narrow-klass-changes2 - kbarrett review - Initial import ------------- Changes: https://git.openjdk.java.net/jdk/pull/1163/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1163&range=02 Stats: 666 lines in 32 files changed: 471 ins; 83 del; 112 mod Patch: https://git.openjdk.java.net/jdk/pull/1163.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1163/head:pull/1163 PR: https://git.openjdk.java.net/jdk/pull/1163 From stuefe at openjdk.java.net Mon Nov 16 12:38:55 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 16 Nov 2020 12:38:55 GMT Subject: RFR: 8256380: JDK-8254162 broke 32bit windows build In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 12:25:33 GMT, Aleksey Shipilev wrote: >> Fix win-32 linker error due to forward declaration and definition signature mismatch. >> >> FWIW, the altered header is included from nativeLookup.cpp, which uses the function pointer of that function. But, since the signature of the definition [1] is different, and due to name mangling on the particular ABI, the symbols of the declaration in the header and definition are different as well, and things fail to link later. 
>> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/scopedMemoryAccess.cpp#L181 > > Looks good to me, and trivial. Looks good. Interesting. So because we export with extern "C" all other platforms had the name decorations stripped away, and we only notice the mismatch because 32bit windows still has calling convention specific decorations. ..Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/1222 From tschatzl at openjdk.java.net Mon Nov 16 12:40:58 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 16 Nov 2020 12:40:58 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v3] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 09:34:51 GMT, Kim Barrett wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - Merge branch 'master' of https://git.openjdk.java.net/jdk into 8253081-null-narrow-klass-changes2 >> - kbarrett review >> - Initial import > > I've only skimmed the non-GC changes. @kimbarrett : I think the latest update fixes all your concerns. Also had to rebase. Reran tier1. ------------- PR: https://git.openjdk.java.net/jdk/pull/1163 From mcimadamore at openjdk.java.net Mon Nov 16 13:05:30 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 16 Nov 2020 13:05:30 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v25] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the first incubation round of the foreign linker access API incubation > (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). > > The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. 
In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. > > Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. > > A big thank you to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the HotSpot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank you to Paul Sandoz, who provided many insights (often by trying the bits first hand). > > Thanks > Maurizio > > Webrev: > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff (relative to [3]): > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254232 > > > > ### API Changes > > The API changes are actually rather slim: > > * `LibraryLookup` > * This class allows clients to look up symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then look up symbols on that library. > * `FunctionDescriptor` > * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. > * `CLinker` > * This is the real star of the show.
A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. > * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. > * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back. > * `NativeScope` > * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resources* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. > * `MemorySegment` > * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. > > ### Safety > > The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches.
For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API). > > ### Implementation changes > > The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. the same library cannot be loaded by different classloaders). > > As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. > > Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. > > The main idea behind the foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function.
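
As a rough illustration of the `NativeScope` idea described above (one backing allocation, with each request served as a slice of it), here is a minimal sketch in plain Java. It is not the `AbstractNativeScopeImpl` code from the PR: the class name `SliceScope`, its methods, and the use of `ByteBuffer` in place of `MemorySegment` are all assumptions made only for illustration, and it models just the fixed-size variant.

```java
import java.nio.ByteBuffer;

/** Hypothetical bump allocator: hands out slices of a single backing buffer. */
final class SliceScope {
    private final ByteBuffer backing; // the only real allocation, made up front
    private int offset = 0;           // index of the next free byte

    SliceScope(int capacityBytes) {
        this.backing = ByteBuffer.allocateDirect(capacityBytes);
    }

    /**
     * Returns a view over part of the backing buffer. No new native memory
     * is requested; `alignment` is assumed to be a power of two.
     */
    ByteBuffer allocate(int bytes, int alignment) {
        int start = (offset + alignment - 1) & -alignment; // round up to alignment
        if (bytes < 0 || bytes > backing.capacity() - start) {
            throw new IllegalStateException("scope exhausted");
        }
        offset = start + bytes;
        ByteBuffer view = backing.duplicate();
        view.position(start);
        view.limit(start + bytes);
        return view.slice();
    }

    /** Number of bytes consumed so far, including alignment padding. */
    int allocatedBytes() {
        return offset;
    }
}
```

Each `allocate` call only moves an offset and creates a view, which is why slicing is much cheaper than a separate `malloc` per request. A growable scope would presumably allocate a further backing block when the current one runs out.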
> > This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. > > For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating point values are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. > > A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall. > > At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: > > * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings.
For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them in their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the most "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. > > * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). > > * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI. > > For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). > > Again, for further reading on the internals of the foreign linker support, please refer to [5].
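
To make the step from classification to `Move` bindings a bit more concrete, here is a toy sketch of what a call arranger does for simple scalar arguments under the SysV x86-64 convention. It is only an illustration under stated assumptions: `ToyCallArranger`, its `TypeKind` enum and the string form of the moves are hypothetical and are not the `CallArranger` or `Binding` classes from the PR. Structs, varargs, return values and the other binding kinds are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, heavily simplified call arranger for scalar arguments. */
final class ToyCallArranger {
    /** Classification attached to each argument layout (assumed name). */
    enum TypeKind { INTEGER, FLOAT }

    // Argument registers used by the SysV x86-64 calling convention.
    private static final String[] INT_REGS =
            {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    private static final String[] FLOAT_REGS =
            {"xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"};

    /** Produces one textual "Move" per argument, in argument order. */
    static List<String> arrange(List<TypeKind> args) {
        List<String> moves = new ArrayList<>();
        int nextInt = 0;
        int nextFloat = 0;
        int stackOffset = 0;
        for (int i = 0; i < args.size(); i++) {
            String target;
            if (args.get(i) == TypeKind.INTEGER && nextInt < INT_REGS.length) {
                target = INT_REGS[nextInt++];
            } else if (args.get(i) == TypeKind.FLOAT && nextFloat < FLOAT_REGS.length) {
                target = FLOAT_REGS[nextFloat++];
            } else {
                // Registers of the required class are exhausted: use an 8-byte stack slot.
                target = "stack+" + stackOffset;
                stackOffset += 8;
            }
            moves.add("Move(arg" + i + " -> " + target + ")");
        }
        return moves;
    }
}
```

The point of the sketch is that the size of a layout alone cannot decide the target register. A 64-bit integer and a `double` are the same size but land in different register files, which is why the layouts need a classification attribute.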
> > #### Test changes > > Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. > > Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI.
> > [1] - https://openjdk.java.net/jeps/389 > [2] - https://openjdk.java.net/jeps/393 > [3] - https://git.openjdk.java.net/jdk/pull/548 > [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md > [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix signature mismatch on aarch64 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/634/files - new: https://git.openjdk.java.net/jdk/pull/634/files/000f75d5..3999a188 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=24 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=23-24 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/634.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634 PR: https://git.openjdk.java.net/jdk/pull/634 From dongbo at openjdk.java.net Mon Nov 16 13:26:23 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 16 Nov 2020 13:26:23 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v4] In-Reply-To: References: Message-ID: > This supports for floating-point absolute difference instructions, i.e. FABD scalar/vector. > > Verified with linux-aarch64-server-release, tier1-3. > > Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance test. > > The FABD (scalar), the performance tests handle registers directly, the average latency reduces to almost half (~57%) of the original. > For FABD (vector), we restrict the data size (~24KB) to be less than L1 data cache size (32KB), > so that the memory access can hit in L1, and witness 14.2% (float) and 21.2% (double) improvements. 
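
For readers who have not seen the pattern, the Java shape being discussed is a subtraction followed by an absolute value. The sketch below shows that shape for `float` and `double` arrays. It is not the source of `FloatingScalarVectorAbsDiff.java`; the class and method names are made up for illustration, and whether the JIT actually emits FABD for these loops depends on the JVM build and the hardware.

```java
/** Hypothetical kernels showing the subtract-then-abs pattern. */
final class AbsDiffSketch {
    /** out[i] = |a[i] - b[i]| for float inputs. */
    static void absDiff(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.abs(a[i] - b[i]); // fsub + fabs without the new rules
        }
    }

    /** out[i] = |a[i] - b[i]| for double inputs. */
    static void absDiff(double[] a, double[] b, double[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.abs(a[i] - b[i]);
        }
    }
}
```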
>
> The JMH results on Kunpeng916:
>
> Benchmark                                            (count)  (seed)  Mode  Cnt     Score   Error  Units
>
> # before, fsub+fabs
> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble     1024  316731  avgt   10  6038.333 ± 3.889  ns/op
> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat      1024  316731  avgt   10  6005.125 ± 3.025  ns/op
> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble     1024  316731  avgt   10   950.340 ± 9.398  ns/op
> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat      1024  316731  avgt   10   454.350 ± 1.798  ns/op
>
> # after, fabd
> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble     1024  316731  avgt   10  3483.801 ± 1.763  ns/op
> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat      1024  316731  avgt   10  3442.412 ± 1.866  ns/op
> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble     1024  316731  avgt   10   816.301 ± 4.454  ns/op
> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat      1024  316731  avgt   10   354.710 ± 1.001  ns/op

Dong Bo has updated the pull request incrementally with one additional commit since the last revision: move ABS/FABS/FABD neon vector rules into aarch64_neon.ad ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1215/files - new: https://git.openjdk.java.net/jdk/pull/1215/files/e80d3802..eae9185c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1215&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1215&range=02-03 Stats: 391 lines in 3 files changed: 227 ins; 164 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1215.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1215/head:pull/1215 PR: https://git.openjdk.java.net/jdk/pull/1215 From dongbo4 at huawei.com Mon Nov 16 13:31:24 2020 From: dongbo4 at huawei.com (dongbo (E)) Date: Mon, 16 Nov 2020 21:31:24 +0800 Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v3] In-Reply-To: <06f9118b-1d98-5279-9ccc-4d2203e30627@redhat.com> References: <06f9118b-1d98-5279-9ccc-4d2203e30627@redhat.com> Message-ID:
<654cca24-fdc1-485f-649c-143dede8a264@huawei.com> On 2020/11/16 19:57, Andrew Haley wrote: > On 11/16/20 3:15 AM, Ningsheng Jian wrote: >> On Mon, 16 Nov 2020 02:51:05 GMT, Dong Bo wrote: >> >>> But I feel a little bit inconsistent that only the new fabd is added into aarch64_neon.ad, >>> while other NEON intructions (i.e. fabs, fsub, fdiv, fsqrt, etc) are still in aarch64.ad. >>> And moving them all from aarch64.ad to aarch64_neon.ad deviates far away from this patch. >> Yes, I think when we introduced aarch64_neon.ad (m4), we just tried to keep that patch simple and would move other vector rules in future patches. Maybe Andrew can comment on this? > I wonder: I'm not sure if we should do the lot in one bang. > > I can't quite figure out the best thing to do. It'll be tricky to move > all of the SIMD instructions, but I guess it's the best thing to do. It'll > make backports hard, but we don't see many in this area. Yes. Moving all NEON instructions to aarch64_neon.ad would make the code clearer and more consistent. I put ABS/FABS/FABD into aarch64_neon.ad, hope it would be a good start for this work. Thanks. From vromero at openjdk.java.net Mon Nov 16 13:36:13 2020 From: vromero at openjdk.java.net (Vicente Romero) Date: Mon, 16 Nov 2020 13:36:13 GMT Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) Message-ID: Please review the code for the second iteration of sealed classes. In this iteration we are: - Enhancing narrowing reference conversion to allow for stricter checking of cast conversions with respect to sealed type hierarchies. 
- Also local classes are not considered when determining implicitly declared permitted direct subclasses of a sealed class or sealed interface ------------- Commit messages: - 8246778: Compiler implementation for Sealed Classes (Second Preview) Changes: https://git.openjdk.java.net/jdk/pull/1227/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1227&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8246778 Stats: 589 lines in 12 files changed: 508 ins; 18 del; 63 mod Patch: https://git.openjdk.java.net/jdk/pull/1227.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1227/head:pull/1227 PR: https://git.openjdk.java.net/jdk/pull/1227 From alanb at openjdk.java.net Mon Nov 16 13:52:06 2020 From: alanb at openjdk.java.net (Alan Bateman) Date: Mon, 16 Nov 2020 13:52:06 GMT Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 13:30:06 GMT, Vicente Romero wrote: > Please review the code for the second iteration of sealed classes. 
In this iteration we are: > > - Enhancing narrowing reference conversion to allow for stricter checking of cast conversions with respect to sealed type hierarchies > - Also local classes are not considered when determining implicitly declared permitted direct subclasses of a sealed class or sealed interface > - renaming Class::permittedSubclasses to Class::getPermittedSubclasses, still in the same method, the return type has been changed to Class[] instead of the previous ClassDesc[] > - adding code to make sure that annotations can't be sealed > - improving some tests > > TIA > > Related specs: > [Sealed Classes JLS](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jls.html) > [Sealed Classes JVMS](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jvms.html) > [Additional: Contextual Keywords](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/contextual-keywords-jls.html) src/java.base/share/classes/java/lang/Package.java line 227: > 225: * This method reports on a distinct concept of sealing from > 226: * {@link Class#isSealed() Class::isSealed}. > 227: * This API note will be very confusing to readers. I think the javadoc will need to be fleshed out and probably will need to link to a section of the Package class description that defines the legacy concept of sealing. ------------- PR: https://git.openjdk.java.net/jdk/pull/1227 From nils.eliasson at oracle.com Mon Nov 16 13:59:37 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 16 Nov 2020 14:59:37 +0100 Subject: [15u] RFR: 8248411: [aarch64] Insufficient error handling when CodeBuffer is exhausted In-Reply-To: <0e4f6efc-13b8-5456-2e44-e5c380231c9f@oracle.com> References: <0e4f6efc-13b8-5456-2e44-e5c380231c9f@oracle.com> Message-ID: Looks good! Thanks for fixing!
Best regards, Nils On 2020-11-16 12:14, Patric Hedlin wrote: > I would like to ask for help to review the following change/update to > back-port JDK-8248411 to JDK15u. > > Changes made to the original patch include replacing (C++11) template > varargs with macros. > > Issue: https://bugs.openjdk.java.net/browse/JDK-8248411 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8248411.15u/ > > Testing: tier1-3 > > Best regards, > Patric From mcimadamore at openjdk.java.net Mon Nov 16 14:01:04 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 16 Nov 2020 14:01:04 GMT Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 13:30:06 GMT, Vicente Romero wrote: > Please review the code for the second iteration of sealed classes. In this iteration we are: > > - Enhancing narrowing reference conversion to allow for stricter checking of cast conversions with respect to sealed type hierarchies > - Also local classes are not considered when determining implicitly declared permitted direct subclasses of a sealed class or sealed interface > - renaming Class::permittedSubclasses to Class::getPermittedSubclasses, still in the same method, the return type has been changed to Class[] instead of the previous ClassDesc[] > - adding code to make sure that annotations can't be sealed > - improving some tests > > TIA > > Related specs: > [Sealed Classes JLS](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jls.html) > [Sealed Classes JVMS](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jvms.html) > [Additional: Contextual Keywords](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/contextual-keywords-jls.html) I left some comments re.
the changes in the cast conversion routine

src/jdk.compiler/share/classes/com/sun/tools/javac/code/Types.java line 1659:

> 1657: result = isCastable.visit(t,s);
> 1658: }
> 1659: if ((t.tsym.isSealed() || s.tsym.isSealed())) {

It would probably be better to only run this extra check if `result == true`, to minimize compatibility impact.

src/jdk.compiler/share/classes/com/sun/tools/javac/code/Types.java line 1672:

> 1670: }
> 1671: // if both are classes or both are interfaces, shortcut
> 1672: if (ts.isInterface() == ss.isInterface()) {

What happens with code like this?

```
sealed interface A permits B { }
non-sealed interface B extends A { }
interface C { }
class D implements C, B { } // this is a valid witness for both A and C, but A and C are unrelated with subtyping

class Test {
    void m(A a, C c) {
        a = (A)c;
    }
}
```

Note that, w/o sealed types, the above snippet compiles ok - e.g. casting C to A is not going to give problems (as there could be a common subtype D <: A, C).

------------- PR: https://git.openjdk.java.net/jdk/pull/1227 From patric.hedlin at oracle.com Mon Nov 16 14:09:39 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Mon, 16 Nov 2020 15:09:39 +0100 Subject: [15u] RFR: 8248411: [aarch64] Insufficient error handling when CodeBuffer is exhausted In-Reply-To: References: <0e4f6efc-13b8-5456-2e44-e5c380231c9f@oracle.com> Message-ID: <271aa974-58f2-a4dd-9964-231af71d5195@oracle.com> Thanks for reviewing Nils. /Patric On 2020-11-16 14:59, Nils Eliasson wrote: > Looks good! > > Thanks for fixing! > > Best regards, > Nils > > On 2020-11-16 12:14, Patric Hedlin wrote: >> I would like to ask for help to review the following change/update to >> back-port JDK-8248411 to JDK15u. >> >> Changes made to the original patch include replacing (C++11) >> template varargs with macros. >> >> Issue:
https://bugs.openjdk.java.net/browse/JDK-8248411 >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8248411.15u/ >> >> Testing: tier1-3 >> >> Best regards, >> Patric > From ihse at openjdk.java.net Mon Nov 16 14:25:03 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 16 Nov 2020 14:25:03 GMT Subject: RFR: 8256380: JDK-8254162 broke 32bit windows build In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 12:36:30 GMT, Thomas Stuefe wrote: >> Looks good to me, and trivial. > > Looks good. > > Interesting. So because we export with extern "C" all other platforms had the name decorations stripped away, and we only notice the mismatch because 32bit windows still has calling convention specific decorations. > > ..Thomas @tstuefe This is a but unnerving. I have no objections to the patch, but it seems worrisome that this mismatch was not caught on other platforms. Anyone care digging more deeply into why this is the case? ------------- PR: https://git.openjdk.java.net/jdk/pull/1222 From jvernee at openjdk.java.net Mon Nov 16 14:29:02 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Mon, 16 Nov 2020 14:29:02 GMT Subject: RFR: 8256380: JDK-8254162 broke 32bit windows build In-Reply-To: References: Message-ID: <2yACyqbDnfsxpCaMqyPBpn4TevhUPSM4ST95ZjxhDq8=.df2f07df-8f77-4291-9561-24b2c38c1885@github.com> On Mon, 16 Nov 2020 14:22:39 GMT, Magnus Ihse Bursie wrote: >> Looks good. >> >> Interesting. So because we export with extern "C" all other platforms had the name decorations stripped away, and we only notice the mismatch because 32bit windows still has calling convention specific decorations. >> >> ..Thomas > > @tstuefe This is a but unnerving. I have no objections to the patch, but it seems worrisome that this mismatch was not caught on other platforms. Anyone care digging more deeply into why this is the case? Since it's a trivial change, I'll integrate this now. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1222 From jvernee at openjdk.java.net Mon Nov 16 14:29:04 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Mon, 16 Nov 2020 14:29:04 GMT Subject: Integrated: 8256380: JDK-8254162 broke 32bit windows build In-Reply-To: References: Message-ID: <6kF27aShBQ0V0DYt0olORLlqdZs-ASmy-0elHafXBE8=.04838bbf-c050-4f97-9033-a972a6703f94@github.com> On Mon, 16 Nov 2020 12:13:38 GMT, Jorn Vernee wrote: > Fix win-32 linker error due to forward declaration and definition signature mismatch. > > FWIW, the altered header is included from nativeLookup.cpp, which uses the function pointer of that function. But, since the signature of the definition [1] is different, and due to name mangling on the particular ABI, the symbols of the declaration in the header and definition are different as well, and things fail to link later. > > Testing: Building Windows-x86 locally, and running jdk_foreign tests. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/scopedMemoryAccess.cpp#L181 This pull request has now been integrated. Changeset: b8de2391 Author: Jorn Vernee URL: https://git.openjdk.java.net/jdk/commit/b8de2391 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8256380: JDK-8254162 broke 32bit windows build Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1222 From jvernee at openjdk.java.net Mon Nov 16 14:46:06 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Mon, 16 Nov 2020 14:46:06 GMT Subject: RFR: 8256380: JDK-8254162 broke 32bit windows build In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 14:22:39 GMT, Magnus Ihse Bursie wrote: >> Looks good. >> >> Interesting. So because we export with extern "C" all other platforms had the name decorations stripped away, and we only notice the mismatch because 32bit windows still has calling convention specific decorations. >> >> ..Thomas > > @tstuefe This is a but unnerving. 
I have no objections to the patch, but it seems worrisome that this mismatch was not caught on other platforms. Anyone care digging more deeply into why this is the case? @magicus I think it's just a consequence of how the C ABI works on each platform. Without name mangling, there is no way to discern a function `void foo(int)` from another function `void foo(double)`. C spec says this: > All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined. I.e. I can have a compilation unit that forward declares `extern void foo(int);` and then calls it, passing an `int` as argument. This gets compiled into a .obj file that just has e.g. an external `_foo` in its symbol table, i.e. no type information included. Then another compilation unit defines a `void foo(double)`. Again the .obj file will just have a `_foo` symbol in it. The linker is happy to link the 2 together, but the behavior is undefined. Maybe there is something that the linker can do based on other information in the .obj file, but I guess it doesn't? (Or maybe we don't turn it on) ------------- PR: https://git.openjdk.java.net/jdk/pull/1222 From stuefe at openjdk.java.net Mon Nov 16 14:53:03 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 16 Nov 2020 14:53:03 GMT Subject: RFR: 8256380: JDK-8254162 broke 32bit windows build In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 12:36:30 GMT, Thomas Stuefe wrote: >> Looks good to me, and trivial. > > Looks good. > > Interesting. So because we export with extern "C" all other platforms had the name decorations stripped away, and we only notice the mismatch because 32bit windows still has calling convention specific decorations. > > ..Thomas > @tstuefe This is a bit unnerving. I have no objections to the patch, but it seems worrisome that this mismatch was not caught on other platforms. Anyone care digging more deeply into why this is the case?
I guess the price you pay for extern-"C"-ing the `JVM_xxx()` entry points? Leaves the symbols C++-undecorated. Which makes sense, of course, since outside code wants to dynamically resolve those symbols. On 32bit Windows we have a second decoration scheme, nothing to do with C++, depending on the calling convention. 32bit Windows has three of those, no other platform has that. The JVM_xx() use JNICALL which is "__stdcall" which causes the name to have a leading underscore and be followed by '@' + the number of bytes in the argument list. Since the new argument list of the prototype had a different number of arguments than the implementation, this led to a different number after the @. I guess there is not much you can do here. On other platforms, a caller would have loaded the symbol by name and called it with the wrong number of arguments, which then hopefully causes crashes? Not sure how to prevent this. Apart from maybe macro-fying the argument list somewhere central or similar techniques. ------------- PR: https://git.openjdk.java.net/jdk/pull/1222 From jvernee at openjdk.java.net Mon Nov 16 15:01:05 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Mon, 16 Nov 2020 15:01:05 GMT Subject: RFR: 8256380: JDK-8254162 broke 32bit windows build In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 14:49:47 GMT, Thomas Stuefe wrote: >> Looks good. >> >> Interesting. So because we export with extern "C" all other platforms had the name decorations stripped away, and we only notice the mismatch because 32bit windows still has calling convention specific decorations. >> >> ..Thomas > >> @tstuefe This is a bit unnerving. I have no objections to the patch, but it seems worrisome that this mismatch was not caught on other platforms. Anyone care digging more deeply into why this is the case? > > I guess the price you pay for extern-"C"-ing the `JVM_xxx()` entry points? Leaves the symbols C++-undecorated.
Which makes sense, of course, since outside code wants to dynamically resolve those symbols. > > On 32bit Windows we have a second decoration scheme, nothing to do with C++, depending on the calling convention. 32bit Windows has three of those, no other platform has that. The JVM_xx() use JNICALL which is "__stdcall" which causes the name to have a leading underscore and followed by '@' + the number bytes in the argument list. Since the new argument list of the prototype had a different number of arguments than the implementation, this led to a different number after the @. > > I guess there is not much you can do here. On other platforms, a caller would have loaded the symbol by name and called it with the wrong number of arguments, which then hopefully causes crashes? Not sure how to prevent this. Apart from maybe macro-fying the argument list somewhere central or similar techniques. Hmm, just realized we could help the situation by including the header file (with the forward declaration) from the file that has the definition. Compiler will then complain about a conflicting declaration of the function. (time to write a script that catches files which don't do this :) ) ------------- PR: https://git.openjdk.java.net/jdk/pull/1222 From ihse at openjdk.java.net Mon Nov 16 15:08:59 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 16 Nov 2020 15:08:59 GMT Subject: RFR: 8256380: JDK-8254162 broke 32bit windows build In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 14:57:21 GMT, Jorn Vernee wrote: >>> @tstuefe This is a but unnerving. I have no objections to the patch, but it seems worrisome that this mismatch was not caught on other platforms. Anyone care digging more deeply into why this is the case? >> >> I guess the price you pay for extern-"C"-ing the `JVM_xxx()` entry points? Leaves the symbols C++-undecorated. Which makes sense, of course, since outside code wants to dynamically resolve those symbols. 
>> >> On 32bit Windows we have a second decoration scheme, nothing to do with C++, depending on the calling convention. 32bit Windows has three of those, no other platform has that. The JVM_xx() use JNICALL which is "__stdcall" which causes the name to have a leading underscore and be followed by '@' + the number of bytes in the argument list. Since the new argument list of the prototype had a different number of arguments than the implementation, this led to a different number after the @. >> >> I guess there is not much you can do here. On other platforms, a caller would have loaded the symbol by name and called it with the wrong number of arguments, which then hopefully causes crashes? Not sure how to prevent this. Apart from maybe macro-fying the argument list somewhere central or similar techniques. > > Hmm, just realized we could help the situation by including the header file (with the forward declaration) from the file that has the definition. Compiler will then complain about a conflicting declaration of the function. > > (time to write a script that catches files which don't do this :) ) @JornVernee Yes, it was something like that I was after. If the header file is always included, we get a compiler error if the declaration and definition differ. There are actually compiler flags that enforce this; basically saying that a symbol must either be declared before being defined (that is, have the header included), or it must not be exported outside the object file. Some months ago I tried enabling it on hotspot, but it generated too many warnings at that point for me to handle them; for that to work we'd need to do some passes over the code and make sure that these conditions are fulfilled.
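
The 32-bit Windows `__stdcall` decoration discussed in this thread can be modeled in a few lines. The sketch below is only an illustration: `StdcallNames` and the symbol `JVM_Example` are hypothetical, and the code imitates the naming rule rather than calling any toolchain. It shows why a declaration and a definition with different argument lists end up as different symbols on that platform, while a plain undecorated C symbol would still link.

```java
/** Hypothetical model of 32-bit Windows __stdcall name decoration. */
final class StdcallNames {
    /**
     * Builds "_name@N", where N is the number of bytes taken by the
     * argument list. Each argument is rounded up to a 4-byte stack slot.
     */
    static String decorate(String name, int... argSizesInBytes) {
        int total = 0;
        for (int size : argSizesInBytes) {
            total += (size + 3) & ~3; // round up to a multiple of 4
        }
        return "_" + name + "@" + total;
    }

    /** With extern "C" and no calling-convention decoration, only the name survives. */
    static String undecorated(String name) {
        return name;
    }
}
```

For example, a prototype taking two 4-byte arguments would be referenced as `_JVM_Example@8`, while a definition taking three would be emitted as `_JVM_Example@12`, so the linker cannot resolve one against the other. The undecorated form is identical in both cases, which matches the observation that other platforms linked without complaint.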
------------- PR: https://git.openjdk.java.net/jdk/pull/1222 From mcimadamore at openjdk.java.net Mon Nov 16 16:45:31 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 16 Nov 2020 16:45:31 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v26] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the first incubation round of the foreign linker access API incubation > (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). > > The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. > > Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. > > A big thanks to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thanks to Paul Sandoz, who provided many insights (often by trying the bits first hand).
> > Thanks > Maurizio > > Webrev: > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff (relative to [3]): > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254232 > > > > ### API Changes > > The API changes are actually rather slim: > > * `LibraryLookup` > * This class allows clients to look up symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then look up symbols in that library. > * `FunctionDescriptor` > * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. > * `CLinker` > * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. > * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. > * Finally, this class provides some helper functions e.g.
so that clients can convert Java strings into C strings and back. > * `NativeScope` > * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resources* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. > * `MemorySegment` > * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. > > ### Safety > > The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API). > > ### Implementation changes > > The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. the same library cannot be loaded by different class loaders). > > As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation.
`NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. > > Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. > > The main idea behind the foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function. > > This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. > > For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating-point values are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly.
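To make the role of the classification attribute concrete, here is a minimal sketch of how scalar arguments might be assigned to integer or floating-point registers under the SysV x86-64 convention. The `TypeKind` enum and `classify` method below are hypothetical stand-ins written for illustration; they are not the actual `CLinker.TypeKind` or `CallArranger` code, and struct passing, varargs and return values are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of argument classification; not the JDK's CallArranger.
final class ClassificationSketch {
    // Stand-in for the classification attribute attached to a layout.
    enum TypeKind { INTEGER, POINTER, FLOAT, DOUBLE }

    // SysV x86-64 argument registers for scalar values.
    static final String[] INT_REGS = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    static final String[] FP_REGS =
            {"xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"};

    // Returns, for each argument, the register or stack slot it would be moved to.
    static List<String> classify(List<TypeKind> args) {
        List<String> locations = new ArrayList<>();
        int nextInt = 0, nextFp = 0, nextStack = 0;
        for (TypeKind kind : args) {
            boolean isFp = kind == TypeKind.FLOAT || kind == TypeKind.DOUBLE;
            if (isFp && nextFp < FP_REGS.length) {
                locations.add(FP_REGS[nextFp++]);      // floating-point register
            } else if (!isFp && nextInt < INT_REGS.length) {
                locations.add(INT_REGS[nextInt++]);    // general-purpose register
            } else {
                locations.add("stack+" + (8 * nextStack++)); // registers exhausted
            }
        }
        return locations;
    }
}
```

Two arguments of the same size (say a 64-bit integer and a `double`) end up in different register files, which is why knowing only the bit width of a layout would not be enough.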
> > A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall. > > At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: > > * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them into their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the most "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. > > * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). > > * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter).
The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI. > > For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). > > Again, for more reading on the internals of the foreign linker support, please refer to [5]. > > #### Test changes > > Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. > > Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI.
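The buffer used by the basic interpreted mode described earlier in this message can be pictured with a toy model like the one below. This is only a sketch under stated assumptions: the `Move` class, the `REGISTERS` list and `fillBuffer` are hypothetical names, the real buffer layout is ABI specific, and the assembly stub that loads registers from the buffer is not modeled at all.

```java
import java.util.List;

// Toy model of the intermediate buffer; all names here are hypothetical.
final class MoveBufferSketch {
    // One buffer slot per argument register (a real layout is ABI specific).
    static final List<String> REGISTERS = List.of("rdi", "rsi", "rdx", "rcx", "r8", "r9");

    // In this model a Move binding only names the target register of a lowered value.
    static final class Move {
        final String register;
        Move(String register) { this.register = register; }
    }

    // Allocates a fresh buffer for the call and stores the i-th lowered value
    // in the slot that corresponds to the i-th Move's target register.
    static long[] fillBuffer(List<Move> moves, long[] loweredValues) {
        long[] buffer = new long[REGISTERS.size()]; // the per-call allocation
        for (int i = 0; i < moves.size(); i++) {
            int slot = REGISTERS.indexOf(moves.get(i).register);
            if (slot < 0) {
                throw new IllegalArgumentException("unknown register: " + moves.get(i).register);
            }
            buffer[slot] = loweredValues[i];
        }
        return buffer;
    }
}
```

The allocation inside `fillBuffer` happens on every call, which is the cost the intrinsified mode is meant to remove when the set of bindings is constant.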
> > [1] - https://openjdk.java.net/jeps/389 > [2] - https://openjdk.java.net/jeps/393 > [3] - https://git.openjdk.java.net/jdk/pull/548 > [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md > [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix aarch64 test failure ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/634/files - new: https://git.openjdk.java.net/jdk/pull/634/files/3999a188..a836cc32 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=25 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=24-25 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/634.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634 PR: https://git.openjdk.java.net/jdk/pull/634 From aph at redhat.com Mon Nov 16 16:50:44 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 16 Nov 2020 16:50:44 +0000 Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v3] In-Reply-To: <654cca24-fdc1-485f-649c-143dede8a264@huawei.com> References: <06f9118b-1d98-5279-9ccc-4d2203e30627@redhat.com> <654cca24-fdc1-485f-649c-143dede8a264@huawei.com> Message-ID: On 16/11/2020 13:31, dongbo (E) wrote: > On 2020/11/16 19:57, Andrew Haley wrote: >> On 11/16/20 3:15 AM, Ningsheng Jian wrote: >>> On Mon, 16 Nov 2020 02:51:05 GMT, Dong Bo wrote: >>> >>>> But it feels a little inconsistent that only the new fabd is added into aarch64_neon.ad, >>>> while other NEON instructions (e.g. fabs, fsub, fdiv, fsqrt, etc) are still in aarch64.ad. >>>> And moving them all from aarch64.ad to aarch64_neon.ad deviates far away from this patch.
>>> Yes, I think when we introduced aarch64_neon.ad (m4), we just tried to keep that patch simple and would move other vector rules in future patches. Maybe Andrew can comment on this? >> I wonder: I'm not sure if we should do the lot in one bang. >> >> I can't quite figure out the best thing to do. It'll be tricky to move >> all of the SIMD instructions, but I guess it's the best thing to do. It'll >> make backports hard, but we don't see many in this area. > > Yes. Moving all NEON instructions to aarch64_neon.ad would make the code > clearer and more consistent. OK. This should definitely be done as a stand-alone change. > I put ABS/FABS/FABD into aarch64_neon.ad, hope it would be a good start > for this work. Thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From gziemski at openjdk.java.net Mon Nov 16 16:57:19 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Mon, 16 Nov 2020 16:57:19 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v6] In-Reply-To: References: Message-ID: > Hi all, > > Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup.
This change addresses them all: > > #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code > > #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) > > #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) > > #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp > > #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include > > #6 Coleen's feedback - factored out print_signal_handlers() > > #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() > > #8 Thomas's feedback - factored out common POSIX signal initialization code > > #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API > > #10 YaSuenag's feedback - unified logging out of the scope for this fix > > #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? Gerard Ziemski has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Merge master - Thomas' feedback - cleanup ucontext_get_pc/ucontext_set_pc - David's feedback - use ifdef(SIGDANGER) and ifdef(SIGTRAP) - revert unblock_program_error_signals change - revert JVM_handle_XXX_signal change - Factor out common POSIX signal initialization code - Factor out do_task into PosixSignals - factor out print_signal_handlers - Coleen's feedback integrated - ... 
and 6 more: https://git.openjdk.java.net/jdk/compare/17f04fc9...dbec19cc ------------- Changes: https://git.openjdk.java.net/jdk/pull/636/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=05 Stats: 331 lines in 21 files changed: 66 ins; 170 del; 95 mod Patch: https://git.openjdk.java.net/jdk/pull/636.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/636/head:pull/636 PR: https://git.openjdk.java.net/jdk/pull/636 From coleenp at openjdk.java.net Mon Nov 16 17:12:32 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 16 Nov 2020 17:12:32 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v9] In-Reply-To: References: Message-ID: > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. 
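The rehash-on-next-use scheme described in this message (a hash derived from the lower 32 bits of the address, plus a flag set on GC notification) can be modeled roughly as follows. This is a toy Java model, not HotSpot's C++ hashtable: every name is hypothetical, addresses are simulated as plain `long` fields that a "GC" may change, and `WeakHandle`, locking and `ObjectFree` event posting are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a tag table that is rehashed lazily after a GC; names are hypothetical.
final class LazyRehashTagMap {
    // Simulated object whose address a moving GC may change.
    static final class Obj {
        long address;
        Obj(long address) { this.address = address; }
    }

    private static final class Entry {
        final Obj obj;
        final long tag;
        Entry(Obj obj, long tag) { this.obj = obj; this.tag = tag; }
    }

    private static final int BUCKETS = 16;
    private List<List<Entry>> buckets = newBuckets();
    private boolean needsRehashing;

    private static List<List<Entry>> newBuckets() {
        List<List<Entry>> b = new ArrayList<>();
        for (int i = 0; i < BUCKETS; i++) b.add(new ArrayList<>());
        return b;
    }

    // The hash uses only the lower 32 bits of the (simulated) address.
    static int hash(long address) { return (int) address; }

    private static int index(Obj o) { return Math.floorMod(hash(o.address), BUCKETS); }

    // Called by the "GC": objects may have moved, so bucket positions are stale.
    void gcNotification() { needsRehashing = true; }

    private void rehashIfNeeded() {
        if (!needsRehashing) return;
        List<List<Entry>> old = buckets;
        buckets = newBuckets();
        for (List<Entry> bucket : old) {
            for (Entry e : bucket) buckets.get(index(e.obj)).add(e);
        }
        needsRehashing = false;
    }

    void setTag(Obj o, long tag) {
        rehashIfNeeded();
        List<Entry> bucket = buckets.get(index(o));
        bucket.removeIf(e -> e.obj == o); // replace any existing tag
        bucket.add(new Entry(o, tag));
    }

    // Returns the tag, or null if the object is not found in its bucket.
    Long getTag(Obj o) {
        rehashIfNeeded();
        for (Entry e : buckets.get(index(o))) {
            if (e.obj == o) return e.tag;
        }
        return null;
    }
}
```

In this model a lookup made after an object has moved but before `gcNotification()` is called misses the entry, which is why the flag has to be set before the table is next used.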
> > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. > > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: - Add shenandoah set_needs_cleaning but this doesn't work. 
- fix vmTestbase/nsk/jvmti tests - improve tagmap cleanup and objectfree event posting ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/967/files - new: https://git.openjdk.java.net/jdk/pull/967/files/0487b84c..283696f6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=07-08 Stats: 223 lines in 16 files changed: 174 ins; 12 del; 37 mod Patch: https://git.openjdk.java.net/jdk/pull/967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/967/head:pull/967 PR: https://git.openjdk.java.net/jdk/pull/967 From gziemski at openjdk.java.net Mon Nov 16 17:18:29 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Mon, 16 Nov 2020 17:18:29 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v7] In-Reply-To: References: Message-ID: > hi all, > > Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. 
This change addresses them all: > > #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code > > #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) > > #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) > > #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp > > #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include > > #6 Coleen's feedback - factored out print_signal_handlers() > > #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() > > #8 Thomas's feedback - factored out common POSIX signal initialization code > > #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API > > #10 YaSuenag's feedback - unified logging out of the scope for this fix > > #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: Fix missing os::print_signal_handlers on Windows, white space on Linux ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/636/files - new: https://git.openjdk.java.net/jdk/pull/636/files/dbec19cc..44d34bac Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=05-06 Stats: 5 lines in 2 files changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/636.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/636/head:pull/636 PR: https://git.openjdk.java.net/jdk/pull/636 From iignatyev at openjdk.java.net Mon Nov 16 18:33:13 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 16 Nov 2020 18:33:13 GMT Subject: RFR: 8256414: add optimized build to submit workflow Message-ID: Hi all, Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? 
Thanks, -- Igor ------------- Commit messages: - add linux-x64-optimized build Changes: https://git.openjdk.java.net/jdk/pull/1233/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1233&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256414 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1233.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1233/head:pull/1233 PR: https://git.openjdk.java.net/jdk/pull/1233 From vlivanov at openjdk.java.net Mon Nov 16 18:38:10 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 16 Nov 2020 18:38:10 GMT Subject: RFR: 8256414: add optimized build to submit workflow In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:26:18 GMT, Igor Ignatyev wrote: > Hi all, > > Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? > > Thanks, > -- Igor Thanks a lot, Igor! Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1233 From shade at openjdk.java.net Mon Nov 16 18:43:03 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 16 Nov 2020 18:43:03 GMT Subject: RFR: 8256414: add optimized build to submit workflow In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:26:18 GMT, Igor Ignatyev wrote: > Hi all, > > Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? > > Thanks, > -- Igor Looks fine, but wouldn't you like to add `--disable-precompiled-headers` as well? ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1233 From kvn at openjdk.java.net Mon Nov 16 18:47:06 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 16 Nov 2020 18:47:06 GMT Subject: RFR: 8256414: add optimized build to submit workflow In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:26:18 GMT, Igor Ignatyev wrote: > Hi all, > > Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? > > Thanks, > -- Igor Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1233 From mcimadamore at openjdk.java.net Mon Nov 16 18:52:27 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 16 Nov 2020 18:52:27 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v27] In-Reply-To: References: Message-ID: > This patch contains the changes associated with the first incubation round of the foreign linker API > (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). > > The main goal of this API is to provide a way to call native functions from Java code without the need for intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. > > Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible.
> > A big thank you to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the HotSpot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank you to Paul Sandoz, who provided many insights (often by trying the bits first hand). > > Thanks > Maurizio > > Webrev: > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff (relative to [3]): > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254232 > > > > ### API Changes > > The API changes are actually rather slim: > > * `LibraryLookup` > * This class allows clients to look up symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then look up symbols in that library. > * `FunctionDescriptor` > * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. > * `CLinker` > * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. > * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g.
`C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. > * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back. > * `NativeScope` > * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resources* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. > * `MemorySegment` > * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. > > ### Safety > > The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API). > > ### Implementation changes > > The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. the same library cannot be loaded by different class loaders).
> > As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. > > Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. > > The main idea behind the foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function. > > This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. > > For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating-point values are passed in different registers by most ABIs.
For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. > > A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall. > > At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: > > * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them into their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the most "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place.
> > * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). > > * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI. > > For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). > > Again, for more reading on the internals of the foreign linker support, please refer to [5]. > > #### Test changes > > Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation.
There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. > > Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI. > > [1] - https://openjdk.java.net/jeps/389 > [2] - https://openjdk.java.net/jeps/393 > [3] - https://git.openjdk.java.net/jdk/pull/548 > [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md > [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Add `final` modifier on NativeLibraries.defaultLookup ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/634/files - new: https://git.openjdk.java.net/jdk/pull/634/files/a836cc32..a71d51a0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=26 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=25-26 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/634.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634 PR: https://git.openjdk.java.net/jdk/pull/634 From rhalade at openjdk.java.net Mon Nov 16 18:52:29 2020 From: rhalade at openjdk.java.net (Rajan Halade) Date: Mon, 16 Nov 2020 18:52:29 GMT Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v26] In-Reply-To: References: Message-ID:
<_MAO7-ELT_n5RxduwWzN7gJ23wpv06mQZftEFwtNWYs=.579b699a-8424-4eac-9b30-b37607e1ba32@github.com> On Mon, 16 Nov 2020 16:45:31 GMT, Maurizio Cimadamore wrote: >> This patch contains the changes associated with the first incubation round of the foreign linker access API incubation >> (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). >> >> The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. >> >> Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reasons, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. >> >> A big thank to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank to Paul Sandoz, who provided many insights (often by trying the bits first hand). 
>> >> Thanks >> Maurizio >> >> Webrev: >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev >> >> Javadoc: >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html >> >> Specdiff (relative to [3]): >> >> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html >> >> CSR: >> >> https://bugs.openjdk.java.net/browse/JDK-8254232 >> >> >> >> ### API Changes >> >> The API changes are actually rather slim: >> >> * `LibraryLookup` >> * This class allows clients to lookup symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then lookup symbols on that library. >> * `FunctionDescriptor` >> * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. >> * `CLinker` >> * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls. >> * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classfication information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place. 
>> * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back. >> * `NativeScope` >> * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resources* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls. >> * `MemorySegment` >> * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope. >> >> ### Safety >> >> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API). >> >> ### Implementation changes >> >> The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. the same library cannot be loaded by different classloaders). 
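
The allocation scheme described for `NativeScope` above (one backing allocation, with each request served as a cheap slice) can be illustrated with a small stand-alone sketch. This is not the incubator API: `BumpScope`, its methods, and the use of a direct `ByteBuffer` in place of a memory segment are all hypothetical stand-ins chosen so the example runs on a stock JDK.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of a bounded, NativeScope-like allocator.
 * It is NOT jdk.incubator.foreign.NativeScope; it only shows how one
 * up-front allocation can serve many requests as slices.
 */
final class BumpScope implements AutoCloseable {
    private final ByteBuffer backing; // single off-heap allocation
    private int offset = 0;           // next free byte in the backing buffer
    private boolean closed = false;

    BumpScope(int capacityBytes) {
        this.backing = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Returns a slice of the backing buffer; alignment is relative to the buffer start. */
    ByteBuffer allocate(int size, int alignment) {
        if (closed) {
            throw new IllegalStateException("scope is closed");
        }
        if (size < 0 || alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException("bad size or alignment");
        }
        // Round the current offset up to the requested power-of-two alignment.
        long start = ((long) offset + alignment - 1) & -(long) alignment;
        if (start + size > backing.capacity()) {
            throw new IndexOutOfBoundsException("scope exhausted");
        }
        ByteBuffer view = backing.duplicate();
        view.position((int) start);
        view.limit((int) start + size);
        offset = (int) start + size;
        return view.slice(); // no new native allocation happens here
    }

    int allocatedBytes() {
        return offset;
    }

    /** Only rejects further allocations; unlike a real scope it cannot invalidate old slices. */
    @Override
    public void close() {
        closed = true;
    }

    public static void main(String[] args) {
        try (BumpScope scope = new BumpScope(64)) {
            ByteBuffer a = scope.allocate(4, 4); // occupies bytes [0, 4)
            ByteBuffer b = scope.allocate(8, 8); // aligned up to bytes [8, 16)
            a.putInt(0, 42);
            b.putLong(0, 7L);
            System.out.println(scope.allocatedBytes()); // 16
        }
    }
}
```

The sketch covers only the fixed-size variant; a growable scope would presumably chain further backing buffers once the first is exhausted.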
>> >> As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`. >> >> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5]. >> >> The main idea behind the foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function. >> >> This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference. >> >> For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating point values are passed in different registers by most ABIs. 
For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly. >> >> A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor`, the set of bindings associated with the downcall/upcall. >> >> At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below: >> >> * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them in their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the more "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place. 
>> >> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones). >> >> * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without needing an intermediate buffer. This gives us back performance that is on par with JNI. >> >> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized). >> >> Again, for further reading on the internals of the foreign linker support, please refer to [5]. >> >> #### Test changes >> >> Many new tests have been added to validate the foreign linker support; we have high-level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. 
There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim to check that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on. >> >> Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI. >> >> [1] - https://openjdk.java.net/jeps/389 >> [2] - https://openjdk.java.net/jeps/393 >> [3] - https://git.openjdk.java.net/jdk/pull/548 >> [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md >> [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 test failure src/java.base/share/classes/jdk/internal/loader/NativeLibraries.java line 387: > 385: } > 386: > 387: public static NativeLibrary defaultLibrary = new NativeLibraryImpl(Object.class, "", true, true) { This field can be final. ------------- PR: https://git.openjdk.java.net/jdk/pull/634 From iignatyev at openjdk.java.net Mon Nov 16 18:53:20 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 16 Nov 2020 18:53:20 GMT Subject: RFR: 8256414: add optimized build to submit workflow [v2] In-Reply-To: References: Message-ID: > Hi all, > > Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? 
> > Thanks, > -- Igor Igor Ignatyev has updated the pull request incrementally with one additional commit since the last revision: added --disable-precompiled-headers ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1233/files - new: https://git.openjdk.java.net/jdk/pull/1233/files/bad0582f..f8189b0a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1233&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1233&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1233.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1233/head:pull/1233 PR: https://git.openjdk.java.net/jdk/pull/1233 From iignatyev at openjdk.java.net Mon Nov 16 18:53:21 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 16 Nov 2020 18:53:21 GMT Subject: RFR: 8256414: add optimized build to submit workflow [v2] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:40:27 GMT, Aleksey Shipilev wrote: > Looks fine, but wouldn't you like to add `--disable-precompiled-headers` as well? sure, make sense. ------------- PR: https://git.openjdk.java.net/jdk/pull/1233 From shade at openjdk.java.net Mon Nov 16 19:01:07 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 16 Nov 2020 19:01:07 GMT Subject: RFR: 8256414: add optimized build to submit workflow [v2] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:53:20 GMT, Igor Ignatyev wrote: >> Hi all, >> >> Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? >> >> Thanks, >> -- Igor > > Igor Ignatyev has updated the pull request incrementally with one additional commit since the last revision: > > added --disable-precompiled-headers Marked as reviewed by shade (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1233 From sjohanss at openjdk.java.net Mon Nov 16 19:22:18 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 16 Nov 2020 19:22:18 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v6] In-Reply-To: References: Message-ID: <3gDh_SQGrawMyGTKNsiZAiWunq6JTyxkQ194ixsWXjU=.bff99f70-bb1d-4b08-aeed-e6a09c5a8507@github.com> > Please review this change that implements concurrent uncommit for G1. > > **Summary** > G1 currently checks if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. > > The calculations of how much to shrink the heap and when are not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps is the set of regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). > > Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added. 
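
The two-bitmap bookkeeping described in the summary above can be modelled in a few lines of Java. This is only an illustration of the state transitions, not the HotSpot code (which is C++): the class `RegionCommitMaps`, its method names, and the use of `synchronized` as a stand-in for `Uncommit_lock` are assumptions made for the sketch.

```java
import java.util.BitSet;

/**
 * Hypothetical model of the active/inactive region bitmaps.
 * committed regions = active regions + inactive regions (the sets are disjoint).
 */
final class RegionCommitMaps {
    private final int numRegions;
    private final BitSet active = new BitSet();   // committed and usable
    private final BitSet inactive = new BitSet(); // committed, waiting for uncommit
    private int osCommitCalls = 0;                // simulated "commit" calls to the OS
    private int reactivations = 0;                // expansions served without the OS

    RegionCommitMaps(int numRegions) {
        this.numRegions = numRegions;
    }

    /** Expands by up to count regions, re-activating inactive regions first. */
    synchronized int expand(int count) {
        int done = 0;
        for (int r = inactive.nextSetBit(0); r >= 0 && done < count; r = inactive.nextSetBit(r + 1)) {
            inactive.clear(r);
            active.set(r);
            reactivations++;
            done++;
        }
        for (int r = 0; r < numRegions && done < count; r++) {
            if (!active.get(r) && !inactive.get(r)) {
                active.set(r); // a real VM would ask the OS for memory here
                osCommitCalls++;
                done++;
            }
        }
        return done;
    }

    /** Quick work done in the pause: mark active regions in [from, to) as ready for uncommit. */
    synchronized void deactivate(int from, int to) {
        for (int r = from; r < to; r++) {
            if (active.get(r)) {
                active.clear(r);
                inactive.set(r);
            }
        }
    }

    /** Work done by the concurrent task: uncommit at most maxRegions per invocation. */
    synchronized int uncommitChunk(int maxRegions) {
        int done = 0;
        for (int r = inactive.nextSetBit(0); r >= 0 && done < maxRegions; r = inactive.nextSetBit(r + 1)) {
            inactive.clear(r); // a real VM would hand the memory back to the OS here
            done++;
        }
        return done;
    }

    synchronized int activeCount() { return active.cardinality(); }
    synchronized int committedCount() { return active.cardinality() + inactive.cardinality(); }
    synchronized int osCommitCalls() { return osCommitCalls; }
    synchronized int reactivations() { return reactivations; }

    public static void main(String[] args) {
        RegionCommitMaps maps = new RegionCommitMaps(8);
        maps.expand(6);        // commits regions 0..5
        maps.deactivate(2, 6); // regions 2..5 become inactive
        maps.uncommitChunk(2); // the task uncommits regions 2 and 3
        maps.expand(3);        // re-activates 4 and 5, then commits one new region
        System.out.println(maps.committedCount() + " committed, "
                + maps.reactivations() + " re-activated");
    }
}
```

The last `expand` call shows why not every region made ready for uncommit ends up uncommitted: regions still sitting in the inactive map are taken back before any new memory is requested.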
> > One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. > > **Logging** > To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: > [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 > ... > [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms > [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms > ... > [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms > > The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. > > On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: > [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) > [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) > [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) > [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) > [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) > [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) > [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) > > **Testing** > Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. > > I've run multiple runs of mach5 testing tier 1-5 as well as local testing. 
I've also done a performance run and as expected there are not significant changes. Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Zoom feedback ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1141/files - new: https://git.openjdk.java.net/jdk/pull/1141/files/8552d23b..86360718 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=04-05 Stats: 30 lines in 5 files changed: 12 ins; 1 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/1141.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1141/head:pull/1141 PR: https://git.openjdk.java.net/jdk/pull/1141 From iignatyev at openjdk.java.net Mon Nov 16 19:34:06 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 16 Nov 2020 19:34:06 GMT Subject: RFR: 8256414: add optimized build to submit workflow [v2] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:58:37 GMT, Aleksey Shipilev wrote: >> Igor Ignatyev has updated the pull request incrementally with one additional commit since the last revision: >> >> added --disable-precompiled-headers > > Marked as reviewed by shade (Reviewer). Thanks for the reviews, folks. the build looks green, integrating. ------------- PR: https://git.openjdk.java.net/jdk/pull/1233 From iignatyev at openjdk.java.net Mon Nov 16 19:34:09 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 16 Nov 2020 19:34:09 GMT Subject: Integrated: 8256414: add optimized build to submit workflow In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 18:26:18 GMT, Igor Ignatyev wrote: > Hi all, > > Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? > > Thanks, > -- Igor This pull request has now been integrated. 
Changeset: 68fd71d2 Author: Igor Ignatyev URL: https://git.openjdk.java.net/jdk/commit/68fd71d2 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod 8256414: add optimized build to submit workflow add linux-x64-optimized to submit workflow Reviewed-by: vlivanov, shade, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1233 From sjohanss at openjdk.java.net Mon Nov 16 19:39:13 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 16 Nov 2020 19:39:13 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v7] In-Reply-To: References: Message-ID: > Please review this change that implements concurrent uncommit for G1. > > **Summary** > G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. > > The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). > > Splitting the work also comes with some additional synchronization. 
Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added. > > One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. > > **Logging** > To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: > [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 > ... > [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms > [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms > ... > [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms > > The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. > > On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: > [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) > [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) > [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) > [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) > [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) > [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) > [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) > > **Testing** > Two new tests have been added, one gtest and one jtreg test. 
These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. > > I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes. Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Merge branch 'master' into 8236926-ccu - Zoom feedback - Albert review 2 - Albert review - Merge branch 'master' into 8236926-ccu - Lock for small mapper and use BitMap parallel operations. - Self review - Simplified task - Improved logging - Test improvement - ... and 5 more: https://git.openjdk.java.net/jdk/compare/3675653c...c354b1d8 ------------- Changes: https://git.openjdk.java.net/jdk/pull/1141/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=06 Stats: 1475 lines in 26 files changed: 1308 ins; 102 del; 65 mod Patch: https://git.openjdk.java.net/jdk/pull/1141.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1141/head:pull/1141 PR: https://git.openjdk.java.net/jdk/pull/1141 From ayang at openjdk.java.net Mon Nov 16 20:04:07 2020 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 16 Nov 2020 20:04:07 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v7] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 19:39:13 GMT, Stefan Johansson wrote: >> Please review this change that implements concurrent uncommit for G1. >> >> **Summary** >> G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. 
The new task will uncommit memory in chunks of regions to avoid starving out other tasks. >> >> The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). >> >> Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added. >> >> One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. >> >> **Logging** >> To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: >> [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 >> ... >> [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms >> [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms >> ... >> [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms >> >> The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. 
The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. >> >> On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: >> [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) >> [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) >> [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) >> [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) >> [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) >> [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) >> [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) >> >> **Testing** >> Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. >> >> I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes. > > Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into 8236926-ccu > - Zoom feedback > - Albert review 2 > - Albert review > - Merge branch 'master' into 8236926-ccu > - Lock for small mapper and use BitMap parallel operations. > - Self review > - Simplified task > - Improved logging > - Test improvement > - ... and 5 more: https://git.openjdk.java.net/jdk/compare/3675653c...c354b1d8 Thank you for the revision. ------------- Marked as reviewed by ayang (Author). 
PR: https://git.openjdk.java.net/jdk/pull/1141 From dcubed at openjdk.java.net Mon Nov 16 20:34:59 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 16 Nov 2020 20:34:59 GMT Subject: RFR: 8256414: add optimized build to submit workflow [v2] In-Reply-To: References: Message-ID: <3OXruv8DJOopa0weeyBUGCY8uw244AJFdLL8a4CkQ3M=.cc7ae99e-a8b2-4a9c-95bc-06167d365936@github.com> On Mon, 16 Nov 2020 19:29:59 GMT, Igor Ignatyev wrote: >> Marked as reviewed by shade (Reviewer). > > Thanks for the reviews, folks. the build looks green, integrating. @iignatev - did you also change Mach5? I don't have workflow builds enabled by default since I typically do Mach5 builds... ------------- PR: https://git.openjdk.java.net/jdk/pull/1233 From stuefe at openjdk.java.net Mon Nov 16 20:41:11 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 16 Nov 2020 20:41:11 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v7] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 17:18:29 GMT, Gerard Ziemski wrote: >> hi all, >> >> Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. 
This change addresses them all: >> >> #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code >> >> #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) >> >> #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) >> >> #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp >> >> #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include >> >> #6 Coleen's feedback - factored out print_signal_handlers() >> >> #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() >> >> #8 Thomas's feedback - factored out common POSIX signal initialization code >> >> #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API >> >> #10 YaSuenag's feedback - unified logging out of the scope for this fix >> >> #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > Fix missing os::print_signal_handlers on Windows, white space on Linux Hi Gerard, we are getting closer :) Cheers, Thomas src/hotspot/os/posix/signals_posix.hpp line 31: > 29: #include "utilities/globalDefinitions.hpp" > 30: > 31: // Forward declarations to be independent of the include structure. I don't think you have to write this, this is clear from the context (we do this in a zillion other places too). 
src/hotspot/os/posix/signals_posix.cpp line 1389: > 1387: } > 1388: > 1389: int PosixSignals::unblock_thread_signal_mask(const sigset_t *set) { I don't think we need this (nor the prototype in the header) src/hotspot/os/posix/signals_posix.cpp line 1724: > 1722: // initialize suspend/resume support - must do this before signal_sets_init() > 1723: if (SR_initialize() != 0) { > 1724: perror("SR_initialize failed"); perror() is old :) It does not make much sense here either, since errno could be stale. Could you please switch this to whatever we do at this point when facing an unrecoverable error (probably vm_exit_initialization or similar) src/hotspot/os/posix/signals_posix.cpp line 1393: > 1391: } > 1392: > 1393: void signal_sets_init() { If this is not needed outside this compilation unit, please make it static. ------------- Changes requested by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/636 From iignatyev at openjdk.java.net Mon Nov 16 20:41:03 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 16 Nov 2020 20:41:03 GMT Subject: RFR: 8256414: add optimized build to submit workflow [v2] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 19:29:59 GMT, Igor Ignatyev wrote: >> Marked as reviewed by shade (Reviewer). > > Thanks for the reviews, folks. the build looks green, integrating. > @iignatev - did you also change Mach5? I don't have workflow builds enabled > by default since I typically do Mach5 builds... Hi Dan, no, not yet. I'm going to change the jib profile and tier definitions in a separate RFE. ------------- PR: https://git.openjdk.java.net/jdk/pull/1233 From coleenp at openjdk.java.net Mon Nov 16 20:43:12 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 16 Nov 2020 20:43:12 GMT Subject: RFR: 8256365: Clean up vtable initialization code Message-ID: I was looking through this code because of JDK-8061949 and want to do some minor cleanups. 1. There's a function in the wrong place (is_override) 2.
methodHandles that use mh()->is_native(), with extra (), 3. some methods declared with TRAPS, that don't trap 4. some multi-clause conditionals with confusing formatting 5. extra InstanceKlass::cast() casts 6. some useless asserts 7. and potentially a bug with RedefineClasses where the method being added to the vtable may have been redefined in the constraint verification call. (noreg-hard) Tested with tier1-3. ------------- Commit messages: - 8256365: Clean up vtable initialization code Changes: https://git.openjdk.java.net/jdk/pull/1236/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1236&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256365 Stats: 130 lines in 4 files changed: 33 ins; 39 del; 58 mod Patch: https://git.openjdk.java.net/jdk/pull/1236.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1236/head:pull/1236 PR: https://git.openjdk.java.net/jdk/pull/1236 From github.com+51754783+coreyashford at openjdk.java.net Mon Nov 16 21:06:09 2020 From: github.com+51754783+coreyashford at openjdk.java.net (Corey Ashford) Date: Mon, 16 Nov 2020 21:06:09 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v7] In-Reply-To: References: Message-ID: On Thu, 5 Nov 2020 13:30:11 GMT, Ziviani wrote: >> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. >> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. >> Ref: PowerISA 3.1, page 129. >> >> These instructions are particularly interesting to improve the following >> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in >> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. >> >> Long.toString, that generate such pattern in getChars, has showed a >> good performance gain by using these new instructions.
>> >> Example: >> for (int i = 0; i < 200_000; i++) >> res = Long.toString((long)i); >> >> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString >> >> Without setbc (average): 0.1178 seconds >> With setbc (average): 0.0396 seconds > > Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions > > - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. > - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. > Ref: PowerISA 3.1, page 129. > > These instructions are particularly interesting to improve the following > pattern `(src1src2)? 1: 0)`, which can be found in > `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. > > Long.toString, that generate such pattern in getChars, has showed a > good performance gain by using these new instructions. > > Example: > for (int i = 0; i < 200_000; i++) > res = Long.toString((long)i); > > java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString > > Without setbc (average): 0.1178 seconds > With setbc (average): 0.0396 seconds Looks great overall! The removal of branches is a big win. src/hotspot/cpu/ppc/assembler_ppc.inline.hpp line 387: > 385: { emit_int32(SETBC_OPCODE | rt(d) | bi(biint)); } > 386: inline void Assembler::setbc(Register d, ConditionRegister cr, Condition cc) { > 387: setbc(d, bi0(cr, cc)); Indentation here should be 2 spaces. src/hotspot/cpu/ppc/assembler_ppc.inline.hpp line 392: > 390: { emit_int32(SETNBC_OPCODE | rt(d) | bi(biint)); } > 391: inline void Assembler::setnbc(Register d, ConditionRegister cr, Condition cc) { > 392: setnbc(d, bi0(cr, cc)); Indentation here should be 2 spaces. 
src/hotspot/cpu/ppc/macroAssembler_ppc.hpp line 164: > 162: // branch, jump > 163: // > 164: // set dst to -1, 0, +1 Comment should be something like: set dst to -1, 0, +1, as follows: (some description) src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 239: > 237: } > 238: > 239: // set dst to -1, 0, +1 Comment should be something like: set dst to -1, 0, +1, as follows: (some description) src/hotspot/cpu/ppc/ppc.ad line 11425: > 11423: match(Set dst (CmpL3 src1 src2)); > 11424: effect(KILL cr0); > 11425: ins_cost(DEFAULT_COST * 5); Should this depend on P10 vs. P9 since the instruction cost changes by 1 ? src/hotspot/cpu/ppc/ppc.ad line 11760: > 11758: match(Set dst (CmpF3 src1 src2)); > 11759: effect(KILL cr0); > 11760: ins_cost(DEFAULT_COST * 6); Should this depend on P10 vs. P9 because of the different number of instructions needed? Maybe an approx. value is enough when other paths can't come close to competing. src/hotspot/cpu/ppc/ppc.ad line 11844: > 11842: match(Set dst (CmpD3 src1 src2)); > 11843: effect(KILL cr0); > 11844: ins_cost(DEFAULT_COST * 6); Same question here about P10 vs. P9 regarding cost src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 1613: > 1611: // if unordered_result is 1, treat unordered_result like 'greater than' > 1612: assert(unordered_result == 1 || unordered_result == -1, "only supported"); > 1613: __ set_cmpu3(R17_tos, (unordered_result == 1) ? false : true); instead of `(unordered_result == 1) ? false : true` how about `unordered_result != 1` src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 1612: > 1610: __ fcmpu(CCR0, Rfirst, Rsecond); // compare > 1611: // if unordered_result is 1, treat unordered_result like 'greater than' > 1612: assert(unordered_result == 1 || unordered_result == -1, "only supported"); The assertion error "only supported" is unclear to me. Is there precedent for this kind of message? 
src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 251: > 249: srawi(R0, R0, 31); > 250: } > 251: orr(dst, dst, R0); I think this section could use a bit more detail in the comments as to what's going on. I know comments are missing from the original code too, but as it is, it's clever but a bit obtuse. ------------- Changes requested by CoreyAshford at github.com (no known OpenJDK username). PR: https://git.openjdk.java.net/jdk/pull/907 From github.com+670087+jrziviani at openjdk.java.net Mon Nov 16 21:51:07 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Mon, 16 Nov 2020 21:51:07 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v7] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 20:31:02 GMT, Corey Ashford wrote: >> Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions >> >> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. >> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. >> Ref: PowerISA 3.1, page 129. >> >> These instructions are particularly interesting to improve the following >> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in >> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. >> >> Long.toString, that generate such pattern in getChars, has showed a >> good performance gain by using these new instructions.
>> >> Example: >> for (int i = 0; i < 200_000; i++) >> res = Long.toString((long)i); >> >> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString >> >> Without setbc (average): 0.1178 seconds >> With setbc (average): 0.0396 seconds > > src/hotspot/cpu/ppc/assembler_ppc.inline.hpp line 387: > >> 385: { emit_int32(SETBC_OPCODE | rt(d) | bi(biint)); } >> 386: inline void Assembler::setbc(Register d, ConditionRegister cr, Condition cc) { >> 387: setbc(d, bi0(cr, cc)); > > Indentation here should be 2 spaces. done, thanks > src/hotspot/cpu/ppc/assembler_ppc.inline.hpp line 392: > >> 390: { emit_int32(SETNBC_OPCODE | rt(d) | bi(biint)); } >> 391: inline void Assembler::setnbc(Register d, ConditionRegister cr, Condition cc) { >> 392: setnbc(d, bi0(cr, cc)); > > Indentation here should be 2 spaces. done, thanks ------------- PR: https://git.openjdk.java.net/jdk/pull/907 From github.com+670087+jrziviani at openjdk.java.net Mon Nov 16 21:55:07 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Mon, 16 Nov 2020 21:55:07 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v7] In-Reply-To: References: Message-ID: <_Fi9aesgguD0GepeoBY9smlin8DqVVhnfCdcjvvprgg=.f0d873af-afab-40fe-889d-6dedd76e4ec9@github.com> On Mon, 16 Nov 2020 20:46:27 GMT, Corey Ashford wrote: >> Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions >> >> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. >> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. >> Ref: PowerISA 3.1, page 129. >> >> These instructions are particularly interesting to improve the following >> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`,
which can be found in >> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. >> >> Long.toString, that generate such pattern in getChars, has showed a >> good performance gain by using these new instructions. >> >> Example: >> for (int i = 0; i < 200_000; i++) >> res = Long.toString((long)i); >> >> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString >> >> Without setbc (average): 0.1178 seconds >> With setbc (average): 0.0396 seconds > > src/hotspot/cpu/ppc/ppc.ad line 11425: > >> 11423: match(Set dst (CmpL3 src1 src2)); >> 11424: effect(KILL cr0); >> 11425: ins_cost(DEFAULT_COST * 5); > > Should this depend on P10 vs. P9 since the instruction cost changes by 1 ? As per Martin: > "size" needs to be precise, but a rough estimate is sufficient for "ins_cost". In this case CmpL3 has only one match rule, so matcher doesn't have a choice and cost is pointless. So I suggest to keep it more simple and make cost independent on has_brw. > src/hotspot/cpu/ppc/ppc.ad line 11760: > >> 11758: match(Set dst (CmpF3 src1 src2)); >> 11759: effect(KILL cr0); >> 11760: ins_cost(DEFAULT_COST * 6); > > Should this depend on P10 vs. P9 because of the different number of instructions needed? Maybe an approx. value is enough when other paths can't come close to competing. same as above > src/hotspot/cpu/ppc/ppc.ad line 11844: > >> 11842: match(Set dst (CmpD3 src1 src2)); >> 11843: effect(KILL cr0); >> 11844: ins_cost(DEFAULT_COST * 6); > > Same question here about P10 vs.
P9 regarding cost same as above ------------- PR: https://git.openjdk.java.net/jdk/pull/907 From github.com+670087+jrziviani at openjdk.java.net Mon Nov 16 22:02:11 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Mon, 16 Nov 2020 22:02:11 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v7] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 20:33:12 GMT, Corey Ashford wrote: >> Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions >> >> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. >> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. >> Ref: PowerISA 3.1, page 129. >> >> These instructions are particularly interesting to improve the following >> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in >> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches. >> >> Long.toString, that generate such pattern in getChars, has showed a >> good performance gain by using these new instructions. >> >> Example: >> for (int i = 0; i < 200_000; i++) >> res = Long.toString((long)i); >> >> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString >> >> Without setbc (average): 0.1178 seconds >> With setbc (average): 0.0396 seconds > > src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 239: > >> 237: } >> 238: >> 239: // set dst to -1, 0, +1 > > Comment should be something like: > set dst to -1, 0, +1, as follows: (some description) Added this comment: // set dst to -1, 0, +1 as follows: if CCR0bi is "greater than", dst is set to 1, // if CCR0bi is "equal", dst is set to 0, otherwise it's set to -1.
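For readers following the thread, the behaviour described in that comment can be modelled in a few lines of C++. This is only a sketch of the semantics, not the HotSpot code: the names `setbc_model`, `setnbc_model` and `cmp3_model` are hypothetical, and pairing setbc with the "greater than" bit and setnbc with the "less than" bit is an assumption read off the comment text and the final `orr(dst, dst, R0)`.

```cpp
#include <cassert>
#include <cstdint>

// Model of "setbc RT,BI": RT = 1 if the selected condition bit is set, else 0.
// (The ternary only models the result; the real instruction does not branch.)
inline int64_t setbc_model(bool cr_bit) { return cr_bit ? 1 : 0; }

// Model of "setnbc RT,BI": RT = -1 if the selected condition bit is set, else 0.
inline int64_t setnbc_model(bool cr_bit) { return cr_bit ? -1 : 0; }

// Three-way compare assembled from the two models above (hypothetical helper).
inline int64_t cmp3_model(int64_t src1, int64_t src2) {
  const bool lt = src1 < src2;  // stands in for the "less than" bit of CR0
  const bool gt = src1 > src2;  // stands in for the "greater than" bit of CR0
  assert(!(lt && gt));          // a compare never sets both bits

  const int64_t r0  = setbc_model(gt);   // 1 or 0
  const int64_t dst = setnbc_model(lt);  // -1 or 0
  return dst | r0;                       // -1, 0 or +1, like the final orr
}
```

Because at most one of the two intermediate values is non-zero, the bitwise OR simply selects whichever one was produced, which is why no branch is needed to combine them.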
> src/hotspot/cpu/ppc/macroAssembler_ppc.hpp line 164: > >> 162: // branch, jump >> 163: // >> 164: // set dst to -1, 0, +1 > > Comment should be something like: > set dst to -1, 0, +1, as follows: (some description) Added this comment: // set dst to -1, 0, +1 as follows: if CCR0bi is "greater than", dst is set to 1, // if CCR0bi is "equal", dst is set to 0, otherwise it's set to -1. > src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 1613: > >> 1611: // if unordered_result is 1, treat unordered_result like 'greater than' >> 1612: assert(unordered_result == 1 || unordered_result == -1, "only supported"); >> 1613: __ set_cmpu3(R17_tos, (unordered_result == 1) ? false : true); > > instead of `(unordered_result == 1) ? false : true` > > how about `unordered_result != 1` done, thanks ! ------------- PR: https://git.openjdk.java.net/jdk/pull/907 From github.com+670087+jrziviani at openjdk.java.net Mon Nov 16 22:07:16 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Mon, 16 Nov 2020 22:07:16 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v7] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 20:52:42 GMT, Corey Ashford wrote: >> Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions >> >> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. >> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. >> Ref: PowerISA 3.1, page 129. >> >> These instructions are particularly interesting to improve the following >> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in >> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches.
>> >> Long.toString, that generate such pattern in getChars, has showed a >> good performance gain by using these new instructions. >> >> Example: >> for (int i = 0; i < 200_000; i++) >> res = Long.toString((long)i); >> >> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString >> >> Without setbc (average): 0.1178 seconds >> With setbc (average): 0.0396 seconds > > src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 1612: > >> 1610: __ fcmpu(CCR0, Rfirst, Rsecond); // compare >> 1611: // if unordered_result is 1, treat unordered_result like 'greater than' >> 1612: assert(unordered_result == 1 || unordered_result == -1, "only supported"); > > The assertion error "only supported" is unclear to me. Is there precedent for this kind of message? https://github.com/openjdk/jdk/pull/907#discussion_r517961858 Actually I can change that to "unordered_result can be either 1 or -1". What do you think? ------------- PR: https://git.openjdk.java.net/jdk/pull/907 From iklam at openjdk.java.net Mon Nov 16 22:14:09 2020 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 16 Nov 2020 22:14:09 GMT Subject: RFR: 8256254: Convert vmIntrinsics::ID to enum class Message-ID: This PR follows the same style as https://github.com/openjdk/jdk/pull/276, except this time I am converting `vmIntrinsics::ID` to `vmIntrinsicID`: * Convert `vmIntrinsics::ID` to `enum class` to provide better type safety. * Also, put this enum class at the top level, so it can be forward-declared. I.e., `enum class vmIntrinsicID : int;`. This avoids excessive inclusion of vmIntrinsics.hpp and vmSymbols.hpp (which were included indirectly by almost every hotspot source file).
* vmIntrinsics.hpp: was included 805 times, now included 414 times * vmSymbols.hpp: was included 805 times, now included 394 times * Note: more #include reduction will be done in [JDK-8256424](https://bugs.openjdk.java.net/browse/JDK-8256424) Many files are changed, but most of the changes are minor: * Added missing dependencies of vmSymbols.hpp and/or vmIntrinsics.hpp * safe conversion between vmIntrinsicID and integer types (see comments around `vmIntrinsics::as_int()`) Since we have a lot of references like `vmIntrinsics::_invokeGeneric`, I added aliases like static const vmIntrinsicID vmIntrinsics::_invokeGeneric = vmIntrinsicID::_invokeGeneric so we don't need to change over a thousand `vmIntrinsics::XXX` to `vmIntrinsicID::XXX`. ------------- Commit messages: - 8256254: Convert vmIntrinsics::ID to enum class Changes: https://git.openjdk.java.net/jdk/pull/1237/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1237&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256254 Stats: 213 lines in 51 files changed: 91 ins; 16 del; 106 mod Patch: https://git.openjdk.java.net/jdk/pull/1237.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1237/head:pull/1237 PR: https://git.openjdk.java.net/jdk/pull/1237 From github.com+670087+jrziviani at openjdk.java.net Mon Nov 16 22:24:16 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Mon, 16 Nov 2020 22:24:16 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v8] In-Reply-To: References: Message-ID: > - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0. > - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0. > Ref: PowerISA 3.1, page 129. > > These instructions are particularly interesting to improve the following > pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in > `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches.
> > Long.toString, that generate such pattern in getChars, has showed a > good performance gain by using these new instructions. > > Example: > for (int i = 0; i < 200_000; i++) > res = Long.toString((long)i); > > java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString > > Without setbc (average): 0.1178 seconds > With setbc (average): 0.0396 seconds Ziviani has updated the pull request incrementally with one additional commit since the last revision: Implementing code review suggestions Signed-off-by: Jose Ricardo Ziviani ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/907/files - new: https://git.openjdk.java.net/jdk/pull/907/files/68081ca6..ff1ecd91 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=907&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=907&range=06-07 Stats: 14 lines in 4 files changed: 2 ins; 0 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/907/head:pull/907 PR: https://git.openjdk.java.net/jdk/pull/907 From coleenp at openjdk.java.net Mon Nov 16 23:10:21 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 16 Nov 2020 23:10:21 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v10] In-Reply-To: References: Message-ID: <4wSXIq_YvsFFLWWNjVgLCfNqbrxHDajiQGrKPuwcP3A=.5427b09b-48e2-409f-8099-9e44fbdc339d@github.com> > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. 
A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. > > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. 
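To make the "rehash at the next use" idea in the description above concrete, here is a small self-contained C++ sketch. It is not the HotSpot implementation: `Obj`, `TagTableSketch`, `gc_moved_objects` and the hash mixing are all hypothetical, a plain pointer stands in for a WeakHandle, and there is no locking and no ObjectFree posting. It only illustrates a table hashed on the low 32 bits of an object's address that is marked stale when objects may have moved and is rehashed lazily on the next access.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a heap object; `addr` may change when a GC moves it.
struct Obj { uintptr_t addr; };

class TagTableSketch {
  struct Entry { const Obj* obj; int64_t tag; };
  std::vector<std::vector<Entry>> buckets_;
  bool needs_rehash_ = false;

  // Hash of the lower 32 bits of the address (the mixing step is an arbitrary choice).
  static uint32_t hash(const Obj* o) {
    uint32_t h = static_cast<uint32_t>(o->addr);
    h *= 2654435761u;
    return h ^ (h >> 16);
  }

  std::vector<Entry>& bucket_for(const Obj* o) {
    return buckets_[hash(o) % buckets_.size()];
  }

  // Re-insert every entry using its current address, then clear the flag.
  void rehash_if_needed() {
    if (!needs_rehash_) return;
    std::vector<std::vector<Entry>> old(buckets_.size());
    old.swap(buckets_);
    for (const auto& bucket : old)
      for (const auto& e : bucket) bucket_for(e.obj).push_back(e);
    needs_rehash_ = false;
  }

 public:
  explicit TagTableSketch(std::size_t nbuckets = 16) : buckets_(nbuckets) {}

  // Called by the (hypothetical) GC side: only sets a flag, does no table work.
  void gc_moved_objects() { needs_rehash_ = true; }

  void set_tag(const Obj* o, int64_t tag) {
    rehash_if_needed();
    auto& bucket = bucket_for(o);
    for (auto& e : bucket) if (e.obj == o) { e.tag = tag; return; }
    bucket.push_back({o, tag});
  }

  // Returns 0 for an untagged object.
  int64_t get_tag(const Obj* o) {
    rehash_if_needed();
    for (const auto& e : bucket_for(o)) if (e.obj == o) return e.tag;
    return 0;
  }
};
```

The point of the design is that the GC-side notification stays cheap (one flag write) and the cost of rebuilding the buckets is paid by the next caller that actually uses the table.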
Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Reverse remove_dead_entries_locked function names. - Merge branch 'master' into jvmti-table - Add shenandoah set_needs_cleaning but this doesn't work. - fix vmTestbase/nsk/jvmti tests - improve tagmap cleanup and objectfree event posting - Add logging to event posting in case of pauses. - Merge branch 'master' into jvmti-table - Add back WeakProcessorPhases::Phase enum. - Serguei 1. - Code review comments from Kim and Albert. - ... and 5 more: https://git.openjdk.java.net/jdk/compare/0357db35...daaa13fe ------------- Changes: https://git.openjdk.java.net/jdk/pull/967/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=09 Stats: 1884 lines in 49 files changed: 768 ins; 993 del; 123 mod Patch: https://git.openjdk.java.net/jdk/pull/967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/967/head:pull/967 PR: https://git.openjdk.java.net/jdk/pull/967 From github.com+51754783+coreyashford at openjdk.java.net Mon Nov 16 23:19:13 2020 From: github.com+51754783+coreyashford at openjdk.java.net (Corey Ashford) Date: Mon, 16 Nov 2020 23:19:13 GMT Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v7] In-Reply-To: References: Message-ID: <9z6ta1AMB_zKub249zpQNTcxOxpt2ADdOE0JBmko7HI=.e457e01d-28bd-474c-8a68-6bd5266e48cd@github.com> On Mon, 16 Nov 2020 22:03:48 GMT, Ziviani wrote: >> src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 1612: >> >>> 1610: __ fcmpu(CCR0, Rfirst, Rsecond); // compare >>> 1611: // if unordered_result is 1, treat unordered_result like 'greater than' >>> 1612: assert(unordered_result == 1 || unordered_result == -1, "only supported"); >> >> The assertion error "only supported" is unclear to me. Is there precedent for this kind of message? 
> > https://github.com/openjdk/jdk/pull/907#discussion_r517961858 > Actually I can change that to "unordered_result can be either 1 or -1". What do you think? Yeah, that sounds good. >> src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 239: >> >>> 237: } >>> 238: >>> 239: // set dst to -1, 0, +1 >> >> Comment should be something like: >> set dst to -1, 0, +1, as follows: (some description) > > Added this comment: > // set dst to -1, 0, +1 as follows: if CCR0bi is "greater than", dst is set to 1, > // if CCR0bi is "equal", dst is set to 0, otherwise it's set to -1. Looks good! ------------- PR: https://git.openjdk.java.net/jdk/pull/907 From coleenp at openjdk.java.net Mon Nov 16 23:19:14 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 16 Nov 2020 23:19:14 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v4] In-Reply-To: References: <172AiTMoD9T5iKu5xEVQ3AEixTDRx4gaAx0pQFfR57k=.d1d7648c-fb6f-4e87-b515-295c4e6187f7@github.com> Message-ID: On Thu, 5 Nov 2020 14:36:44 GMT, Erik Österlund wrote: >> Ok, so there were many test failures with other approaches. Having GC trigger the posting was the most reliable way to post the events when the tests (and presumably the jvmti customers) expected the events to be posted. We could revisit during event disabling if a customer complains about GC pause times. > > The point of this change was not necessarily to be lazy about updating the tagmap, until someone uses it. The point was to get rid of the last annoying serial GC phase. Doing it all lazily would certainly also achieve that. But it would also lead to situations where no event is ever posted from GC to GC. So you would get the event 20 GCs later, which might come as a surprise. It did come as a surprise to some tests, so it is reasonable to assume it would come as a surprise to users too. And I don't think we want such surprises unless we couldn't deal with them. And we can.
Kim's change to post the events from the service thread or before other JVMTI operations removes posting events from the gc_notification, which was the objection. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Mon Nov 16 23:19:13 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 16 Nov 2020 23:19:13 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v6] In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 20:39:40 GMT, Coleen Phillimore wrote: >> Thanks @sspitsyn . I'm going to leave the gc_notification code because structurally the two sides of the if statement are different and it's not a long function. Thank you for reviewing the change. > > This change also passes tier 7,8 testing. does this work? I've added two commits from @kimbarrett that defer the ObjectFree posting to the service thread or to a place where it could be removed before posting. I also remerged and added the call JvmtiTagMap::set_needs_cleaning() to shenandoah which works after merging the latest code from shenandoah. Testing tiers 1-6 currently. jvmti/jdi tests pass with G1 and ZGC stress options, and JVMTI tests pass with shenandoah. Thanks! Coleen ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Mon Nov 16 23:30:25 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 16 Nov 2020 23:30:25 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v11] In-Reply-To: References: Message-ID: > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. 
A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. > > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. 
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix minimal build. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/967/files - new: https://git.openjdk.java.net/jdk/pull/967/files/daaa13fe..1940eaf1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=09-10 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/967/head:pull/967 PR: https://git.openjdk.java.net/jdk/pull/967 From patricio.chilano.mateo at oracle.com Tue Nov 17 00:12:43 2020 From: patricio.chilano.mateo at oracle.com (Patricio Chilano) Date: Mon, 16 Nov 2020 21:12:43 -0300 Subject: Biased locking Obsoletion In-Reply-To: <7C170D34-56B3-473F-95B8-C967E6EC945B@kodewerk.com> References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <26c5c51e-9bbe-455c-7bf0-1f7cbcd126f2@oracle.com> <9b9414e1-c7b7-59be-8e6d-d7d5d44527d1@oracle.com> <7C170D34-56B3-473F-95B8-C967E6EC945B@kodewerk.com> Message-ID: Hi all, Thank you for all the feedback. We've had some internal VM discussions here at Oracle, and there is a general consensus that leaving the biased-locking code in a while longer will not adversely affect any of the project work that is in the pipeline for JDK 16 and 17. While it is essential for the Valhalla project that the biased-locking code is removed, there is no immediate urgency in doing so. Further, there is a general consensus that giving more time for feedback on any impacts from biased-locking being disabled is a good idea and will allow time for any needed mitigations. So we will defer the obsoletion of the biased-locking code to JDK 18 (JDK-8256253), with the expectation that it will be done as soon as JDK 18 forks next June/July (JDK-8256425). 
Thanks again, Patricio From mchung at openjdk.java.net Tue Nov 17 00:29:03 2020 From: mchung at openjdk.java.net (Mandy Chung) Date: Tue, 17 Nov 2020 00:29:03 GMT Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) In-Reply-To: References: Message-ID: <3hcm-LPJG34kftsIY2_tgDJiPwuplmron5EQkJ4NT5s=.88ff5bc4-a2d8-4273-a958-e271aacd3358@github.com> On Mon, 16 Nov 2020 13:49:26 GMT, Alan Bateman wrote: >> Please review the code for the second iteration of sealed classes. In this iteration we are: >> >> - Enhancing narrowing reference conversion to allow for stricter checking of cast conversions with respect to sealed type hierarchies >> - Also local classes are not considered when determining implicitly declared permitted direct subclasses of a sealed class or sealed interface >> - renaming Class::permittedSubclasses to Class::getPermittedSubclasses, still in the same method, the return type has been changed to Class[] instead of the previous ClassDesc[] >> - adding code to make sure that annotations can't be sealed >> - improving some tests >> >> TIA >> >> Related specs: >> [Sealed Classes JSL](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jls.html) >> [Sealed Classes JVMS](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jvms.html) >> [Additional: Contextual Keywords](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/contextual-keywords-jls.html) > > src/java.base/share/classes/java/lang/Package.java line 227: > >> 225: * This method reports on a distinct concept of sealing from >> 226: * {@link Class#isSealed() Class::isSealed}. >> 227: * > > This API note will be very confusing to readers. I think the javadoc will need to be fleshed out and probably will need to link to a section the Package class description that defines the legacy concept of sealing. I agree. This @apiNote needs more clarification to help the readers to understand the context here. 
One thing we could do in the Package class description is to add a "Package Sealing" section that can also explain that it has no relationship to "sealed classes". ------------- PR: https://git.openjdk.java.net/jdk/pull/1227 From iignatyev at openjdk.java.net Tue Nov 17 00:36:10 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 17 Nov 2020 00:36:10 GMT Subject: RFR: 8256430: add linux-x64-optimized to tier1 Message-ID: Hi all, [8256414](https://bugs.openjdk.java.net/browse/JDK-8256414) / #1233 added similar profile to submit workflow, this patch defines `linux-x64-optimized` profile in jib-profile so it can be used by mach5 and added to tier1? Thanks -- Igor cc-ing @dcubed-ojdk ------------- Commit messages: - 8256430: add linux-x64-optimized to tier1 Changes: https://git.openjdk.java.net/jdk/pull/1244/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1244&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256430 Stats: 12 lines in 1 file changed: 12 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1244.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1244/head:pull/1244 PR: https://git.openjdk.java.net/jdk/pull/1244 From kvn at openjdk.java.net Tue Nov 17 01:17:02 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 01:17:02 GMT Subject: RFR: 8256430: add linux-x64-optimized to tier1 In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 00:31:24 GMT, Igor Ignatyev wrote: > Hi all, > > > [8256414](https://bugs.openjdk.java.net/browse/JDK-8256414) / #1233 added similar profile to submit workflow, this patch defines `linux-x64-optimized` profile in jib-profile so it can be used by mach5 and added to tier1? > > Thanks > -- Igor > > cc-ing @dcubed-ojdk Good. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1244 From dcubed at openjdk.java.net Tue Nov 17 01:43:01 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 17 Nov 2020 01:43:01 GMT Subject: RFR: 8256430: add linux-x64-optimized to tier1 In-Reply-To: References: Message-ID: <3XYPJS0vv9aGfb613MCbkJw0VvYIdwvE4T2iwCXhDCg=.602bf1fa-f90b-496a-a9be-ddafb13e153a@github.com> On Tue, 17 Nov 2020 00:31:24 GMT, Igor Ignatyev wrote: > Hi all, > > > [8256414](https://bugs.openjdk.java.net/browse/JDK-8256414) / #1233 added similar profile to submit workflow, this patch defines `linux-x64-optimized` profile in jib-profile so it can be used by mach5 and added to tier1? > > Thanks > -- Igor > > cc-ing @dcubed-ojdk Thumbs up. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1244 From dholmes at openjdk.java.net Tue Nov 17 01:49:05 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 17 Nov 2020 01:49:05 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v7] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 17:18:29 GMT, Gerard Ziemski wrote: >> hi all, >> >> Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. 
This change addresses them all: >> >> #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code >> >> #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) >> >> #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) >> >> #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp >> >> #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include >> >> #6 Coleen's feedback - factored out print_signal_handlers() >> >> #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() >> >> #8 Thomas's feedback - factored out common POSIX signal initialization code >> >> #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API >> >> #10 YaSuenag's feedback - unified logging out of the scope for this fix >> >> #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > Fix missing os::print_signal_handlers on Windows, white space on Linux I'm seeing unexpected changes relating to os::fetch_compiled_frame_from_context seemingly due to the "Merge from master". Otherwise the actual described changes since last commit seem fine. Once Thomas's last couple of comments are addressed this is done. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/636 From dholmes at openjdk.java.net Tue Nov 17 02:01:05 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 17 Nov 2020 02:01:05 GMT Subject: RFR: 8256380: JDK-8254162 broke 32bit windows build In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 12:13:38 GMT, Jorn Vernee wrote: > Fix win-32 linker error due to forward declaration and definition signature mismatch. 
> > FWIW, the altered header is included from nativeLookup.cpp, which uses the function pointer of that function. But, since the signature of the definition [1] is different, and due to name mangling on the particular ABI, the symbols of the declaration in the header and definition are different as well, and things fail to link later. > > Testing: Building Windows-x86 locally, and running jdk_foreign tests. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/scopedMemoryAccess.cpp#L181 src/hotspot/share/prims/scopedMemoryAccess.hpp line 32: > 30: > 31: extern "C" { > 32: void JNICALL JVM_RegisterJDKInternalMiscScopedMemoryAccessMethods(JNIEnv *env, jclass scopedMemoryAccessClass); Why isn't this declared with JNIEXPORT like the functions in jvm.h? ------------- PR: https://git.openjdk.java.net/jdk/pull/1222 From kvn at openjdk.java.net Tue Nov 17 02:06:05 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 02:06:05 GMT Subject: RFR: 8252588: HotSpot Style Guide should permit uniform initialization In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 11:05:35 GMT, Kim Barrett wrote: > Please review and vote on this change to the HotSpot Style Guide to > permit the use of uniform initialization, aka brace initialization, in > HotSpot code. Uniform initialization is a feature added in C++11. > > This is a modification of the Style Guide, so rough consensus among > the HotSpot Group members is required to make this change. Only Group > members should vote for approval (via the github PR), though reasoned > objections or comments from anyone will be considered. A decision to > approve will not be made before Monday 16-Nov-2020 at 12h00 UTC. > > [Note: This is the first attempt to change the Style Guide since the > revision that added a description for a change process (requires rough > consensus of the Group), and also since the start of using git and > github PRs.
I'm making a guess at how to instantiate that process > within the new mechanisms.] Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1119 From mchung at openjdk.java.net Tue Nov 17 02:07:06 2020 From: mchung at openjdk.java.net (Mandy Chung) Date: Tue, 17 Nov 2020 02:07:06 GMT Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 13:30:06 GMT, Vicente Romero wrote: > Please review the code for the second iteration of sealed classes. In this iteration we are: > > - Enhancing narrowing reference conversion to allow for stricter checking of cast conversions with respect to sealed type hierarchies > - Also local classes are not considered when determining implicitly declared permitted direct subclasses of a sealed class or sealed interface > - renaming Class::permittedSubclasses to Class::getPermittedSubclasses, still in the same method, the return type has been changed to Class[] instead of the previous ClassDesc[] > - adding code to make sure that annotations can't be sealed > - improving some tests > > TIA > > Related specs: > [Sealed Classes JSL](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jls.html) > [Sealed Classes JVMS](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jvms.html) > [Additional: Contextual Keywords](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/contextual-keywords-jls.html) src/java.base/share/classes/java/lang/Class.java line 4381: > 4379: */ > 4380: @jdk.internal.PreviewFeature(feature=jdk.internal.PreviewFeature.Feature.SEALED_CLASSES, essentialAPI=false) > 4381: public Class[] getPermittedSubclasses() { What happens if a permitted subclass is not found? I see that `getPermittedSubclasses0` ignores the entry if the class is not found. Should that be specified? 
Have you considered whether security package access is needed (now that this method returns `Class` objects the caller may not have access to)? This needs to be discussed with the security team. If someone gets a hold of a sealed class (e.g. `obj.getClass()`), this method could leak other `Class` objects. ------------- PR: https://git.openjdk.java.net/jdk/pull/1227 From darcy at openjdk.java.net Tue Nov 17 02:08:05 2020 From: darcy at openjdk.java.net (Joe Darcy) Date: Tue, 17 Nov 2020 02:08:05 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v6] In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 17:53:16 GMT, Xubo Zhang wrote: >> Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number > > Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large test/jdk/java/lang/Math/ExpCornerCaseTests.java line 2: > 1: /* > 2: * Copyright (c) 2011,2020 Oracle and/or its affiliates. All rights reserved. The year of the copyright syntax is "2011, 2020,"; no need for a re-review before pushing after that correction. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From dongbo at openjdk.java.net Tue Nov 17 02:25:02 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 17 Nov 2020 02:25:02 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v4] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 03:10:42 GMT, Dong Bo wrote: >> Looks good. 
Please add any new instructions to aarch64-asmtest.py and regenerate assembler_aarch64.cpp. > >> Looks good. Please add any new instructions to aarch64-asmtest.py and regenerate assembler_aarch64.cpp. > > Done, added tests for `fabd` scalar/vector instructions in this script and regenerated the code. > Verified with linux-aarch64-server-fastdebug build. > _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_ > > On 16/11/2020 13:31, dongbo (E) wrote: > > > On 2020/11/16 19:57, Andrew Haley wrote: > > > On 11/16/20 3:15 AM, Ningsheng Jian wrote: > > > > On Mon, 16 Nov 2020 02:51:05 GMT, Dong Bo wrote: > > > > > But I feel a little bit inconsistent that only the new fabd is added into aarch64_neon.ad, > > > > > while other NEON instructions (e.g. fabs, fsub, fdiv, fsqrt, etc) are still in aarch64.ad. > > > > > And moving them all from aarch64.ad to aarch64_neon.ad deviates far away from this patch. > > > > > Yes, I think when we introduced aarch64_neon.ad (m4), we just tried to keep that patch simple and would move other vector rules in future patches. Maybe Andrew can comment on this? > > > > > I wonder: I'm not sure if we should do the lot in one bang. > > > > > > > > > I can't quite figure out the best thing to do. It'll be tricky to move > > > all of the SIMD instructions, but I guess it's the best thing to do. It'll > > > make backports hard, but we don't see many in this area. > > > > > > Yes. Moving all NEON instructions to aarch64_neon.ad would make the code > > clearer and more consistent. > > OK. This should definitely be done as a stand-alone change. > We can do it with a new PR. @theRealAph @nsjian Are there any further suggestions for this FABD patch?
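For reference, the Java-level shape under discussion is an absolute value taken over a floating-point subtraction, i.e. the `fsub`+`fabs` pair that `fabd` replaces. The following is a minimal sketch only: the class and method names are illustrative assumptions, this is not the code of the `FloatingScalarVectorAbsDiff` benchmark, and whether a given JIT actually emits `fabd` for it depends on the build and the CPU.

```java
// Hypothetical illustration of the "absolute difference" source pattern.
// Names are made up for this sketch; nothing here is taken from the PR.
final class AbsDiffSketch {
    private AbsDiffSketch() {}

    // Scalar float form: |a - b|
    static float absDiff(float a, float b) {
        return Math.abs(a - b);
    }

    // Scalar double form: |a - b|
    static double absDiff(double a, double b) {
        return Math.abs(a - b);
    }

    // Array form: a simple counted loop of the kind an auto-vectorizer may handle.
    static void absDiff(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.abs(a[i] - b[i]);
        }
    }

    public static void main(String[] args) {
        float[] out = new float[3];
        absDiff(new float[] {1f, -2f, 5f}, new float[] {4f, 2f, 5f}, out);
        System.out.println(java.util.Arrays.toString(out));
        System.out.println(absDiff(1.5, 4.0));
    }
}
```

The sketch only fixes the semantics being matched; measuring any speedup would need a proper JMH harness like the one attached to the PR.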
------------- PR: https://git.openjdk.java.net/jdk/pull/1215 From kvn at openjdk.java.net Tue Nov 17 02:25:06 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 02:25:06 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v6] In-Reply-To: References: Message-ID: <-FdDOTDgxvTFXLBjs4ZCO3xHA5NM0yLXaPyMgrebbQA=.f533c3f1-0199-40b0-b507-104d7997c7cf@github.com> On Fri, 13 Nov 2020 17:53:16 GMT, Xubo Zhang wrote: >> Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number > > Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large Changes requested by kvn (Reviewer). src/hotspot/cpu/x86/macroAssembler_x86_exp.cpp line 496: > 494: movl(Address(rsp, 64), tmp); > 495: lea(tmp, ExternalAddress(static_const_table)); > 496: movsd(xmm0, Address(rsp, 128)); Can you explain this change? 
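As context for the question, the symptom from the PR description can be checked from Java without reading the stub. This is a minimal sketch assuming only standard `java.lang.Math` and `java.lang.StrictMath` behaviour; the class and method names are hypothetical, and it is not the `ExpCornerCaseTests.java` test from the PR.

```java
// Hypothetical check for the overflow corner case described in the PR:
// exp(x) for a large positive x must be +Infinity, not 0.
final class ExpOverflowSketch {
    private ExpOverflowSketch() {}

    // True when Math.exp(x) overflows to positive infinity.
    static boolean overflowsToInfinity(double x) {
        return Math.exp(x) == Double.POSITIVE_INFINITY;
    }

    // True when the (possibly intrinsified) Math.exp agrees with StrictMath.exp
    // on whether the result is infinite.
    static boolean agreesWithStrictMath(double x) {
        return Double.isInfinite(Math.exp(x)) == Double.isInfinite(StrictMath.exp(x));
    }

    public static void main(String[] args) {
        System.out.println("exp(10000) overflows: " + overflowsToInfinity(10000.0));
        System.out.println("agrees with StrictMath: " + agreesWithStrictMath(10000.0));
    }
}
```

On a correct implementation both checks hold for 10000; the reported failure is `Math.exp(10000)` returning 0 on x86 32-bit builds.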
------------- PR: https://git.openjdk.java.net/jdk/pull/894 From kvn at openjdk.java.net Tue Nov 17 02:25:08 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 02:25:08 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v6] In-Reply-To: <-FdDOTDgxvTFXLBjs4ZCO3xHA5NM0yLXaPyMgrebbQA=.f533c3f1-0199-40b0-b507-104d7997c7cf@github.com> References: <-FdDOTDgxvTFXLBjs4ZCO3xHA5NM0yLXaPyMgrebbQA=.f533c3f1-0199-40b0-b507-104d7997c7cf@github.com> Message-ID: <7jUr9izZQ1h44rV3OCRR-16J2bN3eAJKN559LIvjb1M=.48f7411b-3306-4c71-92e6-208fa18da860@github.com> On Tue, 17 Nov 2020 02:19:05 GMT, Vladimir Kozlov wrote: >> Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large > > src/hotspot/cpu/x86/macroAssembler_x86_exp.cpp line 496: > >> 494: movl(Address(rsp, 64), tmp); >> 495: lea(tmp, ExternalAddress(static_const_table)); >> 496: movsd(xmm0, Address(rsp, 128)); > > Can you explain this change? Would be nice to add comment about what values are on stack. ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From kbarrett at openjdk.java.net Tue Nov 17 03:19:02 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 17 Nov 2020 03:19:02 GMT Subject: RFR: 8252588: HotSpot Style Guide should permit uniform initialization In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 02:03:32 GMT, Vladimir Kozlov wrote: >> Please review and vote on this change to the HotSpot Style Guide to >> permit the use of uniform initialization, aka brace initialization, in >> HotSpot code. Uniform initialization is a feature added in C++11. 
>> >> This is a modification of the Style Guide, so rough consensus among >> the HotSpot Group members is required to make this change. Only Group >> members should vote for approval (via the github PR), though reasoned >> objections or comments from anyone will be considered. A decision to >> approve will not be made before Monday 16-Nov-2020 at 12h00 UTC. >> >> [Note: This is the first attempt to change the Style Guide since the >> revision that added a description for a change process (requires rough >> consensus of the Group), and also since the start of using git and >> github PRs. I'm making a guess at how to instantiate that process >> within the new mechanisms.] > > Good. Thanks for reviews and comments in support. ------------- PR: https://git.openjdk.java.net/jdk/pull/1119 From kbarrett at openjdk.java.net Tue Nov 17 03:19:04 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 17 Nov 2020 03:19:04 GMT Subject: Integrated: 8252588: HotSpot Style Guide should permit uniform initialization In-Reply-To: References: Message-ID: On Mon, 9 Nov 2020 11:05:35 GMT, Kim Barrett wrote: > Please review and vote on this change to the HotSpot Style Guide to > permit the use of uniform initialization, aka brace initialization, in > HotSpot code. Uniform initialization is a feature added in C++11. > > This is a modification of the Style Guide, so rough consensus among > the HotSpot Group members is required to make this change. Only Group > members should vote for approval (via the github PR), though reasoned > objections or comments from anyone will be considered. A decision to > approve will not be made before Monday 16-Nov-2020 at 12h00 UTC. > > [Note: This is the first attempt to change the Style Guide since the > revision that added a description for a change process (requires rough > consensus of the Group), and also since the start of using git and > github PRs. I'm making a guess at how to instantiate that process > within the new mechanisms.]
This pull request has now been integrated. Changeset: 537b40e0 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/537b40e0 Stats: 29 lines in 2 files changed: 29 ins; 0 del; 0 mod 8252588: HotSpot Style Guide should permit uniform initialization Reviewed-by: jrose, dholmes, dcubed, tschatzl, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1119 From njian at openjdk.java.net Tue Nov 17 03:51:07 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Tue, 17 Nov 2020 03:51:07 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v4] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 13:26:23 GMT, Dong Bo wrote: >> This adds support for floating-point absolute difference instructions, i.e. FABD scalar/vector. >> >> Verified with linux-aarch64-server-release, tier1-3. >> >> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance test. >> >> For FABD (scalar), the performance tests handle registers directly, and the average latency drops to almost half (~57%) of the original. >> For FABD (vector), we restrict the data size (~24KB) to be less than L1 data cache size (32KB), >> so that the memory access can hit in L1, and witness 14.2% (float) and 21.2% (double) improvements. >> >> The JMH results on Kunpeng916:
>>
>> Benchmark (count) (seed) Mode Cnt Score Error Units
>>
>> # before, fsub+fabs
>> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 6038.333 ± 3.889 ns/op
>> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 6005.125 ± 3.025 ns/op
>> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 950.340 ± 9.398 ns/op
>> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 454.350 ± 1.798 ns/op
>>
>> # after, fabd
>> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 3483.801 ± 1.763 ns/op
>> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 3442.412 ± 1.866 ns/op
>> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 816.301 ± 4.454 ns/op
>> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 354.710 ± 1.001 ns/op
> > Dong Bo has updated the pull request incrementally with one additional commit since the last revision: > > move ABS/FABS/FABD neon vector rules into aarch64_neon.ad src/hotspot/cpu/aarch64/aarch64_neon.ad line 3462: > 3460: instruct vabs8B(vecD dst, vecD src) > 3461: %{ > 3462: predicate(n->as_Vector()->length() == 8); This is different from original code. ------------- PR: https://git.openjdk.java.net/jdk/pull/1215 From kbarrett at openjdk.java.net Tue Nov 17 03:52:14 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 17 Nov 2020 03:52:14 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v11] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 23:30:25 GMT, Coleen Phillimore wrote: >> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. >> >> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. >> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet.
In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. >> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. >> >> It has also been tested with tier1-6. >> >> Thank you to Stefan, Erik and Kim for their help with this change. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix minimal build. Marked as reviewed by kbarrett (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/967 From dongbo at openjdk.java.net Tue Nov 17 06:05:04 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 17 Nov 2020 06:05:04 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v4] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 03:47:37 GMT, Ningsheng Jian wrote: >> Dong Bo has updated the pull request incrementally with one additional commit since the last revision: >> >> move ABS/FABS/FABD neon vector rules into aarch64_neon.ad > > src/hotspot/cpu/aarch64/aarch64_neon.ad line 3462: > >> 3460: instruct vabs8B(vecD dst, vecD src) >> 3461: %{ >> 3462: predicate(n->as_Vector()->length() == 8); > > This is different from original code. For integer absolute (vector), the accepted arrangements are `T8B, T16B, T4H, T8H, T2S, T4S, T2D`. ARM compiler armasm user guide reference: https://developer.arm.com/documentation/dui0801/h/A64-SIMD-Vector-Instructions/ABS--vector-?lang=en I think the original code `n->as_Vector()->length() == 4 ||` is not right for basic type Byte. So I delete it, I am sorry if I miss something. ------------- PR: https://git.openjdk.java.net/jdk/pull/1215 From goetz.lindenmaier at sap.com Tue Nov 17 07:21:51 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 17 Nov 2020 07:21:51 +0000 Subject: Biased locking Obsoletion In-Reply-To: References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <26c5c51e-9bbe-455c-7bf0-1f7cbcd126f2@oracle.com> <9b9414e1-c7b7-59be-8e6d-d7d5d44527d1@oracle.com> <7C170D34-56B3-473F-95B8-C967E6EC945B@kodewerk.com> Message-ID: Hi Patricio, That is a great plan, thanks! Best regards, Goetz. > -----Original Message----- > From: hotspot-dev On Behalf Of > Patricio Chilano > Sent: Tuesday, November 17, 2020 1:13 AM > To: HotSpot Open Source Developers > Subject: Re: Biased locking Obsoletion > > Hi all, > > Thank you for all the feedback. 
We've had some internal VM discussions > here at Oracle, and there is a general consensus that leaving the > biased-locking code in a while longer will not adversely affect any of > the project work that is in the pipeline for JDK 16 and 17. While it is > essential for the Valhalla project that the biased-locking code is > removed, there is no immediate urgency in doing so. Further, there is a > general consensus that giving more time for feedback on any impacts from > biased-locking being disabled is a good idea and will allow time for any > needed mitigations. So we will defer the obsoletion of the > biased-locking code to JDK 18 (JDK-8256253), with the expectation that > it will be done as soon as JDK 18 forks next June/July (JDK-8256425). > > Thanks again, > Patricio From mbaesken at openjdk.java.net Tue Nov 17 08:02:04 2020 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Tue, 17 Nov 2020 08:02:04 GMT Subject: Integrated: JDK-8256258: some missing NULL checks or asserts after CodeCache::find_blob_unsafe In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 09:11:51 GMT, Matthias Baesken wrote: > Most return values of CodeCache::find_blob_unsafe are NULL checked or at least handled by an asserts. However especially on ppc a few are missing and should be added (e.g. in nativeInst_ppc.cpp). This pull request has now been integrated. 
Changeset: 4553fa0b Author: Matthias Baesken URL: https://git.openjdk.java.net/jdk/commit/4553fa0b Stats: 8 lines in 2 files changed: 5 ins; 0 del; 3 mod 8256258: some missing NULL checks or asserts after CodeCache::find_blob_unsafe Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1181 From neliasso at openjdk.java.net Tue Nov 17 08:20:12 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 17 Nov 2020 08:20:12 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 05:31:10 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. >> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. >> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
>>
>> Performance Results:
>> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
>> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
>> ArrayCopyPartialInlineSize : 32
>>
>> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
>> -- | -- | -- | -- | --
>> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
>> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
>> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
>> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
>> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
>> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
>> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
>> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
>> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
>> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
>> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
>> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
>> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
>> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
>> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
>> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
>> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
>> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
>> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
>> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
>> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
>> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
>> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
>> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
>> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
>> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
>> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
>> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
>> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
>> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
>> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
>> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
>> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
>> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
>> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
>> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
>> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
>> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
>> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
>> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
>> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
>> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
>> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
>> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
>> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
>> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
>> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
>> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
>> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
>> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
>> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
>> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
>> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
>> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
>> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
>> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
>> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
>> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
>> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
>> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
>> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
>> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
>> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
>> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
>> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
>> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
>> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
>> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
>> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
>> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
>> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
>> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
>> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
>> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
>> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
>> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
>> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
>> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>>
>> Detailed Reports:
>> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
>> 
WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - Review comments resolved > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - Merge remote-tracking branch 'upstream' into JDK-8252848 > - JDK-8252848 : Review comments resolved > - JDK-8252848: Review comments resolution. > - JDK-8252848: Review comments addressed. > - Merge remote-tracking branch 'origin' into JDK-8252848 > - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - ... and 6 more: https://git.openjdk.java.net/jdk/compare/1d3d64f3...edb74db3 ok - looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/302 From njian at openjdk.java.net Tue Nov 17 08:41:06 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Tue, 17 Nov 2020 08:41:06 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v4] In-Reply-To: References: Message-ID: <5lFZtGvhwRh-OTHw9jRpxTSKbFHfQhyUKNhA30qj0iA=.b9c55684-ed83-417d-9feb-5ef1805da110@github.com> On Tue, 17 Nov 2020 06:02:41 GMT, Dong Bo wrote: >> src/hotspot/cpu/aarch64/aarch64_neon.ad line 3462: >> >>> 3460: instruct vabs8B(vecD dst, vecD src) >>> 3461: %{ >>> 3462: predicate(n->as_Vector()->length() == 8); >> >> This is different from original code. > > For integer absolute (vector), the accepted arrangements are `T8B, T16B, T4H, T8H, T2S, T4S, T2D`. 
> ARM compiler armasm user guide reference: https://developer.arm.com/documentation/dui0801/h/A64-SIMD-Vector-Instructions/ABS--vector-?lang=en > I think the original code `n->as_Vector()->length() == 4 ||` is not right for basic type Byte. So I delete it, I am sorry if I miss something. We load 4B with loadV4 but handle 4B types with T8B instructions. And current min_vector_size for byte type is 4, so it's possible for vectorizer to generate 4B vector nodes. ------------- PR: https://git.openjdk.java.net/jdk/pull/1215 From magnus.ihse.bursie at oracle.com Tue Nov 17 09:37:31 2020 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Tue, 17 Nov 2020 10:37:31 +0100 Subject: RFR: 8256414: add optimized build to submit workflow In-Reply-To: References: Message-ID: <9ffae379-72ea-f830-c6da-763522e0ed5f@oracle.com> Hi Igor, There is a long-standing bug with the intent to remove optimized builds (https://bugs.openjdk.java.net/browse/JDK-8183287). Given that it does not seem that popular, I wonder if it really is necessary to burden the submit workflow (and tier1 testing, as requested in https://bugs.openjdk.java.net/browse/JDK-8256430) with this. At the very least, I'd like to get some input from more Hotspot developers to hear if they think it is a worthy cause to spend our resources at. Otherwise, I believe a better way forward is to follow through on JDK-8183287, viz. to split up optimized builds into the two extra components it actually provides: enable diagnostic code in normal release builds (https://bugs.openjdk.java.net/browse/JDK-8183288) and enable tracing with INCLUDE_PRINT (https://bugs.openjdk.java.net/browse/JDK-8202283). /Magnus On 2020-11-16 19:33, Igor Ignatyev wrote: > Hi all, > > Could you please review this small and trivial patch which adds `linux-x64-optimized` build to submit workflow so breakages of this build flavor would be easier to spot? 
>
> Thanks,
> -- Igor
>
> -------------
>
> Commit messages:
> - add linux-x64-optimized build
>
> Changes: https://git.openjdk.java.net/jdk/pull/1233/files
> Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1233&range=00
> Issue: https://bugs.openjdk.java.net/browse/JDK-8256414
> Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod
> Patch: https://git.openjdk.java.net/jdk/pull/1233.diff
> Fetch: git fetch https://git.openjdk.java.net/jdk pull/1233/head:pull/1233
>
> PR: https://git.openjdk.java.net/jdk/pull/1233

From magnus.ihse.bursie at oracle.com Tue Nov 17 09:47:26 2020
From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie)
Date: Tue, 17 Nov 2020 10:47:26 +0100
Subject: RFR: 8256414: add optimized build to submit workflow
In-Reply-To: <9ffae379-72ea-f830-c6da-763522e0ed5f@oracle.com>
References: <9ffae379-72ea-f830-c6da-763522e0ed5f@oracle.com>
Message-ID: <282bed4c-56b5-3da2-1bbe-a9ff54edf91b@oracle.com>

I see now that this PR was already integrated.

I think that any change to the submit workflow (if it adds additional testing rather than just fixing bugs) is a non-trivial change which needs careful consideration. We have already had a huge influx of additional build platforms in a very short time. Each additional platform is subject to any kind of build issues, not all of which might be related to the actual patch, and we therefore need to weigh the benefits of getting additional testing of build platforms against the risk that this might cause unnecessary roadblocks for developers.

Also, I believe it is good practice when changing build code to make sure that at least one reviewer is a member of the Build Group (https://openjdk.java.net/census#build). This is not something Skara can enforce, so it is dependent on the good will of committers (who should notify the correct set of reviewers), and of JDK Reviewers to specify if they believe additional reviewers from any particular area are needed.
/Magnus

On 2020-11-17 10:37, Magnus Ihse Bursie wrote:
> Hi Igor,
>
> There is a long-standing bug with the intent to remove optimized
> builds (https://bugs.openjdk.java.net/browse/JDK-8183287). Given that
> it does not seem that popular, I wonder if it really is necessary to
> burden the submit workflow (and tier1 testing, as requested in
> https://bugs.openjdk.java.net/browse/JDK-8256430) with this.
>
> At the very least, I'd like to get some input from more Hotspot
> developers to hear if they think it is a worthy cause to spend our
> resources at.
>
> Otherwise, I believe a better way forward is to follow through on
> JDK-8183287, viz. to split up optimized builds into the two extra
> components it actually provides: enable diagnostic code in normal
> release builds (https://bugs.openjdk.java.net/browse/JDK-8183288) and
> enable tracing with INCLUDE_PRINT
> (https://bugs.openjdk.java.net/browse/JDK-8202283).
>
> /Magnus
>
> On 2020-11-16 19:33, Igor Ignatyev wrote:
>> Hi all,
>>
>> Could you please review this small and trivial patch which adds
>> `linux-x64-optimized` build to submit workflow so breakages of this
>> build flavor would be easier to spot?
>>
>> Thanks,
>> -- Igor
>>
>> -------------
>>
>> Commit messages:
>> - add linux-x64-optimized build
>>
>> Changes: https://git.openjdk.java.net/jdk/pull/1233/files
>> Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1233&range=00
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8256414
>> Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod
>> Patch: https://git.openjdk.java.net/jdk/pull/1233.diff
>> Fetch: git fetch https://git.openjdk.java.net/jdk
>> pull/1233/head:pull/1233
>>
>> PR: https://git.openjdk.java.net/jdk/pull/1233
>

From github.com+51754783+coreyashford at openjdk.java.net Tue Nov 17 10:09:08 2020
From: github.com+51754783+coreyashford at openjdk.java.net (Corey Ashford)
Date: Tue, 17 Nov 2020 10:09:08 GMT
Subject: RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v8]
In-Reply-To:
References:
Message-ID:

On Mon, 16 Nov 2020 22:24:16 GMT, Ziviani wrote:

>> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0.
>> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0.
>> Ref: PowerISA 3.1, page 129.
>>
>> These instructions are particularly interesting to improve the following
>> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in
>> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches.
>>
>> Long.toString, which generates such a pattern in getChars, has shown a
>> good performance gain by using these new instructions.
>>
>> Example:
>> for (int i = 0; i < 200_000; i++)
>> res = Long.toString((long)i);
>>
>> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString
>>
>> Without setbc (average): 0.1178 seconds
>> With setbc (average): 0.0396 seconds
>
> Ziviani has updated the pull request incrementally with one additional commit since the last revision:
>
> Implementing code review suggestions
>
> Signed-off-by: Jose Ricardo Ziviani

Marked as reviewed by CoreyAshford at github.com (no known OpenJDK username).
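
Editor's sketch, not part of the original thread: the quoted description only gives the benchmark loop as a fragment, so the class below fills in the rest under assumptions. The class name `TestToString` comes from the quoted command line; the `cmp3` and `run` helpers and the timing code are hypothetical, and `cmp3` assumes the quoted pattern is the usual three-way long compare returning -1, 0 or 1.

```java
public class TestToString {
    // Assumed shape of the three-way compare that the quoted description
    // refers to: -1 if src1 < src2, 1 if src1 > src2, 0 when equal.
    static int cmp3(long src1, long src2) {
        return (src1 < src2) ? -1 : ((src1 > src2) ? 1 : 0);
    }

    // The loop from the quoted example. The last result is returned so
    // the work cannot be discarded as dead code.
    static String run(int iterations) {
        String res = "";
        for (int i = 0; i < iterations; i++) {
            res = Long.toString((long) i);
        }
        return res;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        String res = run(200_000);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println("last=" + res + " elapsed=" + seconds + " s");
    }
}
```

With the flags quoted above this would be started as `java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString`. A single wall-clock measurement like this is only a rough indication; the averages quoted in the description may have been gathered differently.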
------------- PR: https://git.openjdk.java.net/jdk/pull/907 From sjohanss at openjdk.java.net Tue Nov 17 10:19:08 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 17 Nov 2020 10:19:08 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v3] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 12:33:10 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I have reviews for this change that changes the way how archive regions are managed in general and specifically by the G1 collector, fixing the crashes caused by adding the module graph into the archive in [JDK-8244778](https://bugs.openjdk.java.net/browse/JDK-8244778)? >> >> Previously before the JDK-8244778 change, archived objects could always be assumed as live, and so the G1 collector did so, not caring about the archive region's contents at all. With JDK-8244778 however, archived objects could die, and keep stale references to objects outside of the archive regions, which obviously causes crashes when walking these objects. >> >> With this change, open archive region contents are basically handled as any other objects; to support that, all open archive regions are now reachable via a single object array root. This hopefully also facilitates implementation in other collectors. >> >> This allows us to remove quite a bit of special handling in G1 too; the only difference is that open archive regions will generally not be collected unless they are completely empty: we do want to profit from the sharing across VMs as much as possible. >> >> Testing: tier1-5, one or two 6-8 runs >> >> The appcds changes were done by @iklam. These changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits:
>
> - Merge branch 'master' of https://git.openjdk.java.net/jdk into 8253081-null-narrow-klass-changes2
> - kbarrett review
> - Initial import

I've been focusing on the GC-parts and it looks good in general, just a few comments.

src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 4458:

> 4456: heap_region_iterate(&cl);
> 4457:
> 4458: remove_from_old_gen_sets(0, 0, cl.humongous_regions_reclaimed());

Looking at this call and now having three parameters that are "optional" for `remove_from_old_gen_sets()` I wonder if it would be cleaner to have three functions, one for each set. It would increase the number of times we take the lock, but we could restructure the code in `G1ReclaimEmptyRegionsTask` to not do the updates in the worker threads and that way only take the lock when there will be no contention.

If you feel like this is outside the scope of this change, please file an issue instead.

src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 1266:

> 1264: _g1h->remove_from_old_gen_sets(cl.old_regions_removed(),
> 1265: cl.archive_regions_removed(),
> 1266: cl.humongous_regions_removed());

If we want to go with the one call per set approach, here we could just atomically add these to a task-counter for each type and then do the calls to update the sets after the task has finished.

src/hotspot/share/gc/g1/g1FullGCPrepareTask.hpp line 60:

> 58: G1FullGCCompactionPoint* _cp;
> 59: uint _humongous_regions_removed;
> 60: uint _open_archive_regions_freed;

Since we no longer use these counters to update the sets, it is enough to just have one counter to track if any region has been freed. Or use a boolean if you prefer that.

-------------

Changes requested by sjohanss (Reviewer).
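
Editor's sketch, not part of the original thread and not HotSpot code: the "one call per set" idea from the review above, modelled in plain Java. Workers only add to per-type atomic counters, and each set is updated once after the task has finished, so the lock is taken without contention. Every name here (`DeferredSetUpdate`, `RegionSets`, `removeOld` and so on) is hypothetical, and the initial set sizes and per-worker counts are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DeferredSetUpdate {
    // Hypothetical stand-in for the old, archive and humongous region sets.
    static final class RegionSets {
        private int old = 100;
        private int archive = 10;
        private int humongous = 5;

        // One method per set; each call takes the lock exactly once.
        synchronized void removeOld(int n) { old -= n; }
        synchronized void removeArchive(int n) { archive -= n; }
        synchronized void removeHumongous(int n) { humongous -= n; }
        synchronized int total() { return old + archive + humongous; }
    }

    // Runs the workers, then applies the accumulated counts to the sets.
    static int reclaim(RegionSets sets, int workers) {
        AtomicInteger oldRemoved = new AtomicInteger();
        AtomicInteger archiveRemoved = new AtomicInteger();
        AtomicInteger humongousRemoved = new AtomicInteger();

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                // Each worker pretends to free two old regions and one
                // humongous region; it never touches the sets directly.
                oldRemoved.addAndGet(2);
                humongousRemoved.addAndGet(1);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // The task is done: update each set at most once, uncontended.
        if (oldRemoved.get() > 0) sets.removeOld(oldRemoved.get());
        if (archiveRemoved.get() > 0) sets.removeArchive(archiveRemoved.get());
        if (humongousRemoved.get() > 0) sets.removeHumongous(humongousRemoved.get());
        return sets.total();
    }

    public static void main(String[] args) {
        System.out.println(reclaim(new RegionSets(), 4));
    }
}
```

The trade-off is the one named in the review: splitting the update into one call per set means more lock acquisitions overall, but they all happen after the parallel phase.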
PR: https://git.openjdk.java.net/jdk/pull/1163 From jvernee at openjdk.java.net Tue Nov 17 10:33:06 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 17 Nov 2020 10:33:06 GMT Subject: RFR: 8256380: JDK-8254162 broke 32bit windows build In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 01:58:03 GMT, David Holmes wrote: >> Fix win-32 linker error due to forward declaration and definition signature mismatch. >> >> FWIW, the altered header is included from nativeLookup.cpp, which uses the function pointer of that function. But, since the signature of the definition [1] is different, and due to name mangling on the particular ABI, the symbols of the declaration in the header and definition are different as well, and things fail to link later. >> >> Testing: Building Windows-x86 locally, and running jdk_foreign tests. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/scopedMemoryAccess.cpp#L181 > > src/hotspot/share/prims/scopedMemoryAccess.hpp line 32: > >> 30: >> 31: extern "C" { >> 32: void JNICALL JVM_RegisterJDKInternalMiscScopedMemoryAccessMethods(JNIEnv *env, jclass scopedMemoryAccessClass); > > Why isn't this declared with JNIEXPORT like the functions in jvm.h? It's similar to the declarations in nativeLookup.cpp. Those don't have JNIEXPORT either. FWIW, this symbol doesn't need to/shouldn't be exported, since we only reference it internally as the implementation of a Java registerNatives method, and it's not meant to be called directly when an external program links against libjvm. The functions in jvm.h are part of an API though, so those need to be exported. e.g. on Windows the macro expands to __declspec(dllexport), which is used to make the symbol externally visible in the final jvm.dll. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1222 From volker.simonis at gmail.com Tue Nov 17 10:37:51 2020 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 17 Nov 2020 11:37:51 +0100 Subject: Biased locking Obsoletion In-Reply-To: References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <26c5c51e-9bbe-455c-7bf0-1f7cbcd126f2@oracle.com> <9b9414e1-c7b7-59be-8e6d-d7d5d44527d1@oracle.com> <7C170D34-56B3-473F-95B8-C967E6EC945B@kodewerk.com> Message-ID: Sounds good! Thanks Patricio. On Tue, Nov 17, 2020 at 1:15 AM Patricio Chilano wrote: > > Hi all, > > Thank you for all the feedback. We've had some internal VM discussions > here at Oracle, and there is a general consensus that leaving the > biased-locking code in a while longer will not adversely affect any of > the project work that is in the pipeline for JDK 16 and 17. While it is > essential for the Valhalla project that the biased-locking code is > removed, there is no immediate urgency in doing so. Further, there is a > general consensus that giving more time for feedback on any impacts from > biased-locking being disabled is a good idea and will allow time for any > needed mitigations. So we will defer the obsoletion of the > biased-locking code to JDK 18 (JDK-8256253), with the expectation that > it will be done as soon as JDK 18 forks next June/July (JDK-8256425). > > Thanks again, > Patricio From david.holmes at oracle.com Tue Nov 17 10:41:31 2020 From: david.holmes at oracle.com (David Holmes) Date: Tue, 17 Nov 2020 20:41:31 +1000 Subject: RFR: 8256380: JDK-8254162 broke 32bit windows build In-Reply-To: References: Message-ID: <0fe0f266-1d0d-df70-fed1-0305fcdcbfb9@oracle.com> Hi Jorn, On 17/11/2020 8:33 pm, Jorn Vernee wrote: > On Tue, 17 Nov 2020 01:58:03 GMT, David Holmes wrote: > >>> Fix win-32 linker error due to forward declaration and definition signature mismatch. 
>>> >>> FWIW, the altered header is included from nativeLookup.cpp, which uses the function pointer of that function. But, since the signature of the definition [1] is different, and due to name mangling on the particular ABI, the symbols of the declaration in the header and definition are different as well, and things fail to link later. >>> >>> Testing: Building Windows-x86 locally, and running jdk_foreign tests. >>> >>> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/scopedMemoryAccess.cpp#L181 >> >> src/hotspot/share/prims/scopedMemoryAccess.hpp line 32: >> >>> 30: >>> 31: extern "C" { >>> 32: void JNICALL JVM_RegisterJDKInternalMiscScopedMemoryAccessMethods(JNIEnv *env, jclass scopedMemoryAccessClass); >> >> Why isn't this declared with JNIEXPORT like the functions in jvm.h? > > It's similar to the declarations in nativeLookup.cpp. Those don't have JNIEXPORT either. > > FWIW, this symbol doesn't need to/shouldn't be exported, since we only reference it internally as the implementation of a Java registerNatives method, and it's not meant to be called directly when an external program links against libjvm. > > The functions in jvm.h are part of an API though, so those need to be exported. e.g. on Windows the macro expands to __declspec(dllexport), which is used to make the symbol externally visible in the final jvm.dll. All of jvm.h defines a contract between the JDK and JVM, not any external program. The JNIEXPORT is so that native code in libjava.so (or other JDK native libs) can call native code in libjvm.so. Possibly this is unneeded when the registration logic itself is executed in the VM. 
Cheers, David ----- > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/1222 > From vlivanov at openjdk.java.net Tue Nov 17 10:49:04 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 17 Nov 2020 10:49:04 GMT Subject: RFR: 8256430: add linux-x64-optimized to tier1 In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 00:31:24 GMT, Igor Ignatyev wrote: > Hi all, > > > [8256414](https://bugs.openjdk.java.net/browse/JDK-8256414) / #1233 added similar profile to submit workflow, this patch defines `linux-x64-optimized` profile in jib-profile so it can be used by mach5 and added to tier1? > > Thanks > -- Igor > > cc-ing @dcubed-ojdk Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1244 From tschatzl at openjdk.java.net Tue Nov 17 10:55:19 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 17 Nov 2020 10:55:19 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v4] In-Reply-To: References: Message-ID: > Hi all, > > can I have reviews for this change that changes the way how archive regions are managed in general and specifically by the G1 collector, fixing the crashes caused by adding the module graph into the archive in [JDK-8244778](https://bugs.openjdk.java.net/browse/JDK-8244778)? > > Previously before the JDK-8244778 change, archived objects could always be assumed as live, and so the G1 collector did so, not caring about the archive region's contents at all. With JDK-8244778 however, archived objects could die, and keep stale references to objects outside of the archive regions, which obviously causes crashes when walking these objects. > > With this change, open archive region contents are basically handled as any other objects; to support that, all open archive regions are now reachable via a single object array root. This hopefully also facilitates implementation in other collectors. 
> > This allows us to remove quite a bit of special handling in G1 too; the only difference is that open archive regions will generally not be collected unless they are completely empty: we do want to profit from the sharing across VMs as much as possible. > > Testing: tier1-5, one or two 6-8 runs > > The appcds changes were done by @iklam. These changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: - sjohanss review - Remove code that "activates" dormant objects as now all active objects are reachable via the root object array ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1163/files - new: https://git.openjdk.java.net/jdk/pull/1163/files/099ec2f4..6669b529 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1163&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1163&range=02-03 Stats: 56 lines in 6 files changed: 2 ins; 35 del; 19 mod Patch: https://git.openjdk.java.net/jdk/pull/1163.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1163/head:pull/1163 PR: https://git.openjdk.java.net/jdk/pull/1163 From tschatzl at openjdk.java.net Tue Nov 17 11:01:09 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 17 Nov 2020 11:01:09 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v3] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 10:16:28 GMT, Stefan Johansson wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - Merge branch 'master' of https://git.openjdk.java.net/jdk into 8253081-null-narrow-klass-changes2 >> - kbarrett review >> - Initial import > > I've been focusing on the GC-parts and it looks good in general just a few comments. 
Ran tier1-4, tier5-gc with the changes from ea78aa1 (contributed by @iklam after some discussion). > src/hotspot/share/gc/g1/g1FullGCPrepareTask.hpp line 60: > >> 58: G1FullGCCompactionPoint* _cp; >> 59: uint _humongous_regions_removed; >> 60: uint _open_archive_regions_freed; > > Since we no longer use these counters to update the sets, it is enough to just have one counter to track if any region has been freed. Or use a boolean if you prefer that. Fixed. > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 4458: > >> 4456: heap_region_iterate(&cl); >> 4457: >> 4458: remove_from_old_gen_sets(0, 0, cl.humongous_regions_reclaimed()); > > Looking at this call and now having three parameters that are "optional" for `remove_from_old_gen_sets()` I wonder if it would be cleaner to have three functions, one for each set. It would increase the number of times we take the look, but we could restructure the code in `G1ReclaimEmptyRegionsTask` to not do the updates in the worker threads and that way only take the lock when there will be no contention. > > If you fell like this is outside the scope of this change, please file an issue instead. I am not very clear on what the problem there is and how the parameters are optional. I do not completely understand how adding an extra method just for this caller would improve the code significantly. I'll opt to defer this cleanup to a separate CR. ------------- PR: https://git.openjdk.java.net/jdk/pull/1163 From vladimir.x.ivanov at oracle.com Tue Nov 17 11:02:18 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 17 Nov 2020 14:02:18 +0300 Subject: RFR: 8256414: add optimized build to submit workflow In-Reply-To: <9ffae379-72ea-f830-c6da-763522e0ed5f@oracle.com> References: <9ffae379-72ea-f830-c6da-763522e0ed5f@oracle.com> Message-ID: <58ab29bb-9170-78cd-97d1-87505ecb36bb@oracle.com> > There is a long-standing bug with the intent to remove optimized builds > (https://bugs.openjdk.java.net/browse/JDK-8183287). 
Given that it does
> not seem that popular, I wonder if it really is necessary to burden the
> submit workflow (and tier1 testing, as requested in
> https://bugs.openjdk.java.net/browse/JDK-8256430) with this.
>
> At the very least, I'd like to get some input from more Hotspot
> developers to hear if they think it is a worthy cause to spend our
> resources at.
>
> Otherwise, I believe a better way forward is to follow through on
> JDK-8183287, viz. to split up optimized builds into the two extra
> components it actually provides: enable diagnostic code in normal
> release builds (https://bugs.openjdk.java.net/browse/JDK-8183288) and
> enable tracing with INCLUDE_PRINT
> (https://bugs.openjdk.java.net/browse/JDK-8202283).

I find !PRODUCT vs ASSERT distinction confusing, but irrespective of the way the relevant code is guarded (!PRODUCT or INCLUDE_PRINT), it has to be built regularly to avoid the rot. So, once the bugs you mentioned are addressed, optimized build can be replaced with release build + tracing configuration.

Regarding the most appropriate tier to put it, I don't think it has to be part of tier1. IMO later tiers are fine as well. But having it in tier1 doesn't look like a significant waste of resources.

I don't think there's a notion of tiers in submit workflow, so I'm strongly in favor of having optimized configuration built there.

Best regards,
Vladimir Ivanov

> On 2020-11-16 19:33, Igor Ignatyev wrote:
>> Hi all,
>>
>> Could you please review this small and trivial patch which adds
>> `linux-x64-optimized` build to submit workflow so breakages of this
>> build flavor would be easier to spot?
>>
>> Thanks,
>> -- Igor
>>
>> -------------
>>
>> Commit messages:
>> - add linux-x64-optimized build
>>
>> Changes: https://git.openjdk.java.net/jdk/pull/1233/files
>> Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1233&range=00
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8256414
>> Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod
>> Patch: https://git.openjdk.java.net/jdk/pull/1233.diff
>> Fetch: git fetch https://git.openjdk.java.net/jdk
>> pull/1233/head:pull/1233
>>
>> PR: https://git.openjdk.java.net/jdk/pull/1233
>

From mcimadamore at openjdk.java.net Tue Nov 17 11:15:13 2020
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Tue, 17 Nov 2020 11:15:13 GMT
Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v26]
In-Reply-To: <_MAO7-ELT_n5RxduwWzN7gJ23wpv06mQZftEFwtNWYs=.579b699a-8424-4eac-9b30-b37607e1ba32@github.com>
References: <_MAO7-ELT_n5RxduwWzN7gJ23wpv06mQZftEFwtNWYs=.579b699a-8424-4eac-9b30-b37607e1ba32@github.com>
Message-ID:

On Mon, 16 Nov 2020 18:44:55 GMT, Rajan Halade wrote:

>> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix aarch64 test failure
>
> src/java.base/share/classes/jdk/internal/loader/NativeLibraries.java line 387:
>
>> 385: }
>> 386:
>> 387: public static NativeLibrary defaultLibrary = new NativeLibraryImpl(Object.class, "", true, true) {
>
> This field can be final.

Thanks - I already made this change in the latest revision.

-------------

PR: https://git.openjdk.java.net/jdk/pull/634

From mcimadamore at openjdk.java.net Tue Nov 17 11:49:26 2020
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Tue, 17 Nov 2020 11:49:26 GMT
Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v28]
In-Reply-To:
References:
Message-ID:

> This patch contains the changes associated with the first incubation round of the foreign linker access API incubation
> (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]).
>
> The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. 
In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. > > Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reasons, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible. > > A big thank to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank to Paul Sandoz, who provided many insights (often by trying the bits first hand). > > Thanks > Maurizio > > Webrev: > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev > > Javadoc: > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html > > Specdiff (relative to [3]): > > http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html > > CSR: > > https://bugs.openjdk.java.net/browse/JDK-8254232 > > > > ### API Changes > > The API changes are actually rather slim: > > * `LibraryLookup` > * This class allows clients to lookup symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then lookup symbols on that library. > * `FunctionDescriptor` > * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function. > * `CLinker` > * This is the real star of the show. 
>     A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls.
>   * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place.
>   * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back.
> * `NativeScope`
>   * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resource* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls.
> * `MemorySegment`
>   * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope.
>
> ### Safety
>
> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches.
> For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API).
>
> ### Implementation changes
>
> The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. same library cannot be loaded by different classloaders).
>
> As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`.
>
> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5].
>
> The main idea behind foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function.
>
> This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference.
>
> For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating points are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly.
>
> A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall.
>
> At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below:
>
> * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings.
>   For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them in their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the more "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place.
>
> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones).
>
> * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI.
>
> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized).
>
> Again, for more readings on the internals of the foreign linker support, please refer to [5].
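
The remark above about keeping the method handle in a `static`, `final` field can be illustrated with plain `java.lang.invoke`, without the incubator module. This is only a sketch under stated assumptions: `String.length()` stands in for a native function, the class name `ConstantHandleSketch` is hypothetical, and a real downcall handle would come from `CLinker` instead. It shows the idiom only, not what the JIT actually does with it.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ConstantHandleSketch {
    // Hypothetical stand-in for a downcall handle. Keeping it in a static final
    // field makes the handle a constant from the JIT's point of view, which is
    // the situation the description above refers to.
    private static final MethodHandle LENGTH;

    static {
        try {
            // Handle type is (String)int: the receiver becomes the first argument.
            LENGTH = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int length(String s) {
        try {
            // invokeExact requires the call-site signature to match (String)int exactly.
            return (int) LENGTH.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(length("hello"));
    }
}
```

A handle held in a non-final or instance field would work the same way functionally; the difference suggested by the description is only in how much the VM can specialize the call.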
>
> #### Test changes
>
> Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on.
>
> Some additional microbenchmarks have been added to compare the performances of downcall/upcall with JNI.
>
> [1] - https://openjdk.java.net/jeps/389
> [2] - https://openjdk.java.net/jeps/393
> [3] - https://git.openjdk.java.net/jdk/pull/548
> [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md
> [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html

Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 95 commits:

 - Merge branch 'master' into 8254231_linker
 - Add `final` modifier on NativeLibraries.defaultLookup
 - Fix aarch64 test failure
 - Fix signature mismatch on aarch64
 - Merge pull request #9 from JornVernee/Windows_Warnings Fix warnings on MSVC
 - Fix warnings on MSVC
 - Merge pull request #8 from JornVernee/Vlad_Comments Address More Review comments
 - - Don't print anything in nmehtod debug output for native invoker if there are none.
- Use memcpy to copy native stubs to nmethod data - Simplify print code - Merge branch '8254231_linker' into Vlad_Comments - ... and 85 more: https://git.openjdk.java.net/jdk/compare/a7422ac2...40bd5df1 ------------- Changes: https://git.openjdk.java.net/jdk/pull/634/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=27 Stats: 67469 lines in 212 files changed: 67290 ins; 79 del; 100 mod Patch: https://git.openjdk.java.net/jdk/pull/634.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634 PR: https://git.openjdk.java.net/jdk/pull/634 From sjohanss at openjdk.java.net Tue Nov 17 13:00:05 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 17 Nov 2020 13:00:05 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v4] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 10:55:19 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I have reviews for this change that changes the way how archive regions are managed in general and specifically by the G1 collector, fixing the crashes caused by adding the module graph into the archive in [JDK-8244778](https://bugs.openjdk.java.net/browse/JDK-8244778)? >> >> Previously before the JDK-8244778 change, archived objects could always be assumed as live, and so the G1 collector did so, not caring about the archive region's contents at all. With JDK-8244778 however, archived objects could die, and keep stale references to objects outside of the archive regions, which obviously causes crashes when walking these objects. >> >> With this change, open archive region contents are basically handled as any other objects; to support that, all open archive regions are now reachable via a single object array root. This hopefully also facilitates implementation in other collectors. 
>>
>> This allows us to remove quite a bit of special handling in G1 too; the only difference is that open archive regions will generally not be collected unless they are completely empty: we do want to profit from the sharing across VMs as much as possible.
>>
>> Testing: tier1-5, one or two 6-8 runs
>>
>> The appcds changes were done by @iklam. These changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements
>>
>> Thanks,
>> Thomas
>
> Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision:
>
> - sjohanss review
> - Remove code that "activates" dormant objects as now all active objects are reachable via the root object array

Marked as reviewed by sjohanss (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/1163

From sjohanss at openjdk.java.net Tue Nov 17 13:00:07 2020
From: sjohanss at openjdk.java.net (Stefan Johansson)
Date: Tue, 17 Nov 2020 13:00:07 GMT
Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 17 Nov 2020 10:56:44 GMT, Thomas Schatzl wrote:

>> src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 4458:
>>
>>> 4456: heap_region_iterate(&cl);
>>> 4457:
>>> 4458: remove_from_old_gen_sets(0, 0, cl.humongous_regions_reclaimed());
>>
>> Looking at this call and now having three parameters that are "optional" for `remove_from_old_gen_sets()` I wonder if it would be cleaner to have three functions, one for each set. It would increase the number of times we take the lock, but we could restructure the code in `G1ReclaimEmptyRegionsTask` to not do the updates in the worker threads and that way only take the lock when there will be no contention.
>>
>> If you feel like this is outside the scope of this change, please file an issue instead.
>
> I am not very clear on what the problem there is and how the parameters are optional.
I do not completely understand how adding an extra method just for this caller would improve the code significantly. > > I'll opt to defer this cleanup to a separate CR. Sounds good. ------------- PR: https://git.openjdk.java.net/jdk/pull/1163 From kbarrett at openjdk.java.net Tue Nov 17 14:55:07 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 17 Nov 2020 14:55:07 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v4] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 10:55:19 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I have reviews for this change that changes the way how archive regions are managed in general and specifically by the G1 collector, fixing the crashes caused by adding the module graph into the archive in [JDK-8244778](https://bugs.openjdk.java.net/browse/JDK-8244778)? >> >> Previously before the JDK-8244778 change, archived objects could always be assumed as live, and so the G1 collector did so, not caring about the archive region's contents at all. With JDK-8244778 however, archived objects could die, and keep stale references to objects outside of the archive regions, which obviously causes crashes when walking these objects. >> >> With this change, open archive region contents are basically handled as any other objects; to support that, all open archive regions are now reachable via a single object array root. This hopefully also facilitates implementation in other collectors. >> >> This allows us to remove quite a bit of special handling in G1 too; the only difference is that open archive regions will generally not be collected unless they are completely empty: we do want to profit from the sharing across VMs as much as possible. >> >> Testing: tier1-5, one or two 6-8 runs >> >> The appcds changes were done by @iklam. 
>> These changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements
>>
>> Thanks,
>> Thomas
>
> Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision:
>
> - sjohanss review
> - Remove code that "activates" dormant objects as now all active objects are reachable via the root object array

Marked as reviewed by kbarrett (Reviewer).

src/hotspot/share/memory/heapShared.cpp line 660:

> 658: VM_Verify verify_op;
> 659: VMThread::execute(&verify_op);
> 660: if (!FLAG_IS_DEFAULT(VerifyArchivedFields)) {

Comment says "command line", so this should be FLAG_IS_CMDLINE rather than !FLAG_IS_DEFAULT.

src/hotspot/share/memory/heapShared.cpp line 662:

> 660: if (!FLAG_IS_DEFAULT(VerifyArchivedFields)) {
> 661: // If this -XX:+VerifyArchivedFields is specified on the command-line, do extra
> 662: // checks.

s/this//

-------------

PR: https://git.openjdk.java.net/jdk/pull/1163

From github.com+670087+jrziviani at openjdk.java.net Tue Nov 17 15:20:06 2020
From: github.com+670087+jrziviani at openjdk.java.net (Ziviani)
Date: Tue, 17 Nov 2020 15:20:06 GMT
Subject: Integrated: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions
In-Reply-To:
References:
Message-ID:

On Wed, 28 Oct 2020 17:00:43 GMT, Ziviani wrote:

> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0.
> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0.
> Ref: PowerISA 3.1, page 129.
>
> These instructions are particularly interesting to improve the following pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches.
>
> Long.toString, which generates such a pattern in getChars, has shown a good performance gain by using these new instructions.
> > Example: > for (int i = 0; i < 200_000; i++) > res = Long.toString((long)i); > > java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString > > Without setbc (average): 0.1178 seconds > With setbc (average): 0.0396 seconds This pull request has now been integrated. Changeset: c3717826 Author: Jose Ricardo Ziviani Committer: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/c3717826 Stats: 180 lines in 7 files changed: 62 ins; 94 del; 24 mod 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions Reviewed-by: mdoerr ------------- PR: https://git.openjdk.java.net/jdk/pull/907 From gziemski at openjdk.java.net Tue Nov 17 15:20:13 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 17 Nov 2020 15:20:13 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v7] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 20:15:24 GMT, Thomas Stuefe wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix missing os::print_signal_handlers on Windows, white space on Linux > > src/hotspot/os/posix/signals_posix.hpp line 31: > >> 29: #include "utilities/globalDefinitions.hpp" >> 30: >> 31: // Forward declarations to be independent of the include structure. > > I don't think you have to write this, this is clear from the context (we do this in a zillion other places too). Fixed. > src/hotspot/os/posix/signals_posix.cpp line 1389: > >> 1387: } >> 1388: >> 1389: int PosixSignals::unblock_thread_signal_mask(const sigset_t *set) { > > I don't think we need this (nor the prototype in the header) Fixed. 
------------- PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Tue Nov 17 15:28:09 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 17 Nov 2020 15:28:09 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v7] In-Reply-To: References: Message-ID: <9zF4F1nd2tdhVzW0bDRjo8tXGDwFeTX-vES86x0k66g=.a3b41dd8-9379-4f7b-9fac-8531e4a379b4@github.com> On Mon, 16 Nov 2020 20:34:47 GMT, Thomas Stuefe wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix missing os::print_signal_handlers on Windows, white space on Linux > > src/hotspot/os/posix/signals_posix.cpp line 1724: > >> 1722: // initialize suspend/resume support - must do this before signal_sets_init() >> 1723: if (SR_initialize() != 0) { >> 1724: perror("SR_initialize failed"); > > perror() is old :) does not make much sense here either since errno could be stale. Could you pls switch this to whatever we do at this point when facing an unrecoverable error (probly vm_exit_initialization or similar) Fixed, used `vm_exit_during_initialization(err_msg("SR_initialize failed"));` > src/hotspot/os/posix/signals_posix.cpp line 1393: > >> 1391: } >> 1392: >> 1393: void signal_sets_init() { > > If this is not needed outside this compilation unit, make it pls static. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From mdoerr at openjdk.java.net Tue Nov 17 15:33:10 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 17 Nov 2020 15:33:10 GMT Subject: RFR: 8256479: [PPC64] C2 crashes when UseVectorByteReverseInstructionsPPC64 used without SuperwordUseVSX Message-ID: C2 crashes on Power9 when using UseVectorByteReverseInstructionsPPC64 without SuperwordUseVSX. bytes_reverse_long_vecNode uses a Vector Register. See bug for more details. The VectorConversion tests can detect the issue. 
------------- Commit messages: - 8256479: [PPC64] C2 crashes when UseVectorByteReverseInstructionsPPC64 used without SuperwordUseVSX Changes: https://git.openjdk.java.net/jdk/pull/1262/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1262&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256479 Stats: 18 lines in 2 files changed: 16 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1262.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1262/head:pull/1262 PR: https://git.openjdk.java.net/jdk/pull/1262 From gziemski at openjdk.java.net Tue Nov 17 15:41:13 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 17 Nov 2020 15:41:13 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v7] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 01:46:41 GMT, David Holmes wrote: > I'm seeing unexpected changes relating to os::fetch_compiled_frame_from_context seemingly due to the "Merge from master". > > Otherwise the actual described changes since last commit seem fine. > > Once Thomas's last couple of comments are addressed this is done. > > Thanks, > David The `ucontext_get_pc()` API needs to be in `os::Posix`, not `PosixSignals`, so that change is OK. 
------------- PR: https://git.openjdk.java.net/jdk/pull/636 From stuefe at openjdk.java.net Tue Nov 17 15:41:14 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 17 Nov 2020 15:41:14 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v7] In-Reply-To: <9zF4F1nd2tdhVzW0bDRjo8tXGDwFeTX-vES86x0k66g=.a3b41dd8-9379-4f7b-9fac-8531e4a379b4@github.com> References: <9zF4F1nd2tdhVzW0bDRjo8tXGDwFeTX-vES86x0k66g=.a3b41dd8-9379-4f7b-9fac-8531e4a379b4@github.com> Message-ID: On Tue, 17 Nov 2020 15:23:38 GMT, Gerard Ziemski wrote: >> src/hotspot/os/posix/signals_posix.cpp line 1724: >> >>> 1722: // initialize suspend/resume support - must do this before signal_sets_init() >>> 1723: if (SR_initialize() != 0) { >>> 1724: perror("SR_initialize failed"); >> >> perror() is old :) does not make much sense here either since errno could be stale. Could you pls switch this to whatever we do at this point when facing an unrecoverable error (probly vm_exit_initialization or similar) > > Fixed, used `vm_exit_during_initialization(err_msg("SR_initialize failed"));` No need for the err_msg here, just use the plain string literal. ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From goetz at openjdk.java.net Tue Nov 17 15:44:03 2020 From: goetz at openjdk.java.net (Goetz Lindenmaier) Date: Tue, 17 Nov 2020 15:44:03 GMT Subject: RFR: 8256479: [PPC64] C2 crashes when UseVectorByteReverseInstructionsPPC64 used without SuperwordUseVSX In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 15:27:29 GMT, Martin Doerr wrote: > C2 crashes on Power9 when using UseVectorByteReverseInstructionsPPC64 without SuperwordUseVSX. bytes_reverse_long_vecNode uses a Vector Register. See bug for more details. > > The VectorConversion tests can detect the issue. Looks good to me ------------- Marked as reviewed by goetz (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1262 From stuefe at openjdk.java.net Tue Nov 17 15:50:25 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 17 Nov 2020 15:50:25 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v8] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 15:47:04 GMT, Gerard Ziemski wrote: >> hi all, >> >> Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. This change addresses them all: >> >> #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code >> >> #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) >> >> #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) >> >> #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp >> >> #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include >> >> #6 Coleen's feedback - factored out print_signal_handlers() >> >> #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() >> >> #8 Thomas's feedback - factored out common POSIX signal initialization code >> >> #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API >> >> #10 YaSuenag's feedback - unified logging out of the scope for this fix >> >> #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? > > Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: > > - remove leftover comment > - implement Thomas' final feedback Hi Gerard, some final nits. I'll run the change through our system tonight to see if this broke one of our platforms. 
Cheers, Thomas src/hotspot/os/posix/signals_posix.cpp line 1: > 1: /* superfluous change src/hotspot/os/posix/signals_posix.cpp line 1721: > 1719: // initialize suspend/resume support - must do this before signal_sets_init() > 1720: if (SR_initialize() != 0) { > 1721: vm_exit_during_initialization(err_msg("SR_initialize failed")); err_msg not needed, just use the plain literal ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Tue Nov 17 15:50:24 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 17 Nov 2020 15:50:24 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v8] In-Reply-To: References: Message-ID: > hi all, > > Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. This change addresses them all: > > #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code > > #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) > > #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) > > #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp > > #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include > > #6 Coleen's feedback - factored out print_signal_handlers() > > #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() > > #8 Thomas's feedback - factored out common POSIX signal initialization code > > #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API > > #10 YaSuenag's feedback - unified logging out of the scope for this fix > > #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? 
Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: - remove leftover comment - implement Thomas' final feedback ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/636/files - new: https://git.openjdk.java.net/jdk/pull/636/files/44d34bac..e7ca88f9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=06-07 Stats: 11 lines in 2 files changed: 1 ins; 8 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/636.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/636/head:pull/636 PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Tue Nov 17 15:50:25 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 17 Nov 2020 15:50:25 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v7] In-Reply-To: References: Message-ID: <0oEEoXXtlh05aHsqH2V6fuy016qLRO1mJseP1A_hWxM=.8c5998af-e68a-4440-9fdd-91efc4eeee0b@github.com> On Tue, 17 Nov 2020 15:38:55 GMT, Gerard Ziemski wrote: >> I'm seeing unexpected changes relating to os::fetch_compiled_frame_from_context seemingly due to the "Merge from master". >> >> Otherwise the actual described changes since last commit seem fine. >> >> Once Thomas's last couple of comments are addressed this is done. >> >> Thanks, >> David > >> I'm seeing unexpected changes relating to os::fetch_compiled_frame_from_context seemingly due to the "Merge from master". >> >> Otherwise the actual described changes since last commit seem fine. >> >> Once Thomas's last couple of comments are addressed this is done. >> >> Thanks, >> David > > The `ucontext_get_pc()` API needs to be in `os::Posix`, not `PosixSignals`, so that change is OK. Thank you Thomas and David for your reviews. 
------------- PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Tue Nov 17 15:59:19 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 17 Nov 2020 15:59:19 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v9] In-Reply-To: References: Message-ID: > hi all, > > Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. This change addresses them all: > > #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code > > #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) > > #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) > > #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp > > #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include > > #6 Coleen's feedback - factored out print_signal_handlers() > > #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() > > #8 Thomas's feedback - factored out common POSIX signal initialization code > > #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API > > #10 YaSuenag's feedback - unified logging out of the scope for this fix > > #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? 
Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: last tweaks ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/636/files - new: https://git.openjdk.java.net/jdk/pull/636/files/e7ca88f9..45dfe8ec Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=636&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/636.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/636/head:pull/636 PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Tue Nov 17 15:59:20 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 17 Nov 2020 15:59:20 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v8] In-Reply-To: References: Message-ID: <2gGCbf-kP_ageIo31EL5ov8uRdi8c-ZuRMztYnSc9sM=.6ff14ba6-afd4-428e-81b1-7c51e072429d@github.com> On Tue, 17 Nov 2020 15:45:58 GMT, Thomas Stuefe wrote: >> Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: >> >> - remove leftover comment >> - implement Thomas' final feedback > > src/hotspot/os/posix/signals_posix.cpp line 1721: > >> 1719: // initialize suspend/resume support - must do this before signal_sets_init() >> 1720: if (SR_initialize() != 0) { >> 1721: vm_exit_during_initialization(err_msg("SR_initialize failed")); > > err_msg not needed, just use the plain literal Ah, `err_msg()` is for the cases where we need to print out values (i.e. formatting). 
------------- PR: https://git.openjdk.java.net/jdk/pull/636 From gziemski at openjdk.java.net Tue Nov 17 15:59:21 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 17 Nov 2020 15:59:21 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v9] In-Reply-To: References: Message-ID: <1kpgT4lN7nKVO8BkXXQQ39feJ_8ErbTKNThT__OEMwU=.c31c6a79-08f3-48d3-b1fc-fabd9de74603@github.com> On Tue, 17 Nov 2020 15:44:09 GMT, Thomas Stuefe wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> last tweaks > > src/hotspot/os/posix/signals_posix.cpp line 1: > >> 1: /* > > superfluous change Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From redestad at openjdk.java.net Tue Nov 17 16:14:08 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 17 Nov 2020 16:14:08 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v4] In-Reply-To: References: Message-ID: <_3YMX9f2F3xUZnge_avARhEOikx-f_hZB2VQ1SKZ4DM=.e2dbb88d-6c69-423e-9099-88ec9ec85827@github.com> On Tue, 17 Nov 2020 10:55:19 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I have reviews for this change that changes the way how archive regions are managed in general and specifically by the G1 collector, fixing the crashes caused by adding the module graph into the archive in [JDK-8244778](https://bugs.openjdk.java.net/browse/JDK-8244778)? >> >> Previously before the JDK-8244778 change, archived objects could always be assumed as live, and so the G1 collector did so, not caring about the archive region's contents at all. With JDK-8244778 however, archived objects could die, and keep stale references to objects outside of the archive regions, which obviously causes crashes when walking these objects. 
>> >> With this change, open archive region contents are basically handled as any other objects; to support that, all open archive regions are now reachable via a single object array root. This hopefully also facilitates implementation in other collectors. >> >> This allows us to remove quite a bit of special handling in G1 too; the only difference is that open archive regions will generally not be collected unless they are completely empty: we do want to profit from the sharing across VMs as much as possible. >> >> Testing: tier1-5, one or two 6-8 runs >> >> The appcds changes were done by @iklam. These changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - sjohanss review > - Remove code that "activates" dormant objects as now all active objects are reachable via the root object array Looks good! I took a sweep through the code and have some nits that you may choose to ignore. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 4538: > 4536: } else { > 4537: // We ignore free regions, we'll empty the free list afterwards. > 4538: assert(hr->is_free(), Can make this one line src/hotspot/share/memory/heapShared.cpp line 334: > 332: } > 333: > 334: // Returns an objArray that contains all the roots of the archived objects It does..? src/hotspot/share/memory/heapShared.cpp line 413: > 411: int length = _pending_roots != NULL ? _pending_roots->length() : 0; > 412: int size = objArrayOopDesc::object_size(length); > 413: Klass *k = Universe::objectArrayKlassObj(); // already relocated to point to archived klass `Klass* k` ------------- Marked as reviewed by redestad (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1163 From eosterlund at openjdk.java.net Tue Nov 17 16:37:04 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Tue, 17 Nov 2020 16:37:04 GMT Subject: RFR: 8256365: Clean up vtable initialization code In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 20:38:22 GMT, Coleen Phillimore wrote: > I was looking through this code because of JDK-8061949 and want to do some minor cleanups. > 1. There's a function in the wrong place (is_override) > 2. methodHandles that use mh()->is_native(), with extra (), > 3. some methods declared with TRAPS, that don't trap > 4. some multi-clause conditionals with confusing formatting > 5. extra InstanceKlass::cast() casts > 6. some useless asserts > 7. and potentially a bug with RedefineClasses where the method being added to the vtable may have been redefined in the constraint verification call. (noreg-hard) > > Tested with tier1-3. I don't think I like having nasty and subtle bug fixes and subtle behavioural changes hidden in what otherwise looks like a large (ish) cleanup patch. Could we do the bug fix first, and then rebase this mostly cleanup related change on top of that? As is, I have to read every line very carefully to see if it is just a cleanup or a subtle behaviour change. src/hotspot/share/oops/klassVtable.cpp line 230: > 228: HandleMark hm(THREAD); > 229: assert(default_methods->at(i)->is_method(), "must be a Method*"); > 230: methodHandle mh(THREAD, default_methods->at(i)); Maybe wrap the method in a {} block so you can see when it goes out of scope and don't get tempted to use it after it becomes invalid. src/hotspot/share/oops/klassVtable.cpp line 474: > 472: // (TBD: put in a method to throw NoSuchMethodError if this slot is ever used.) > 473: if (super_method->name() == name && super_method->signature() == signature && > 474: (!klass->is_interface() || Is _klass and klass really the same? I thought they could be different.
If so, this looks like a subtle behavioural change hidden in a large cleanup patch. ------------- Changes requested by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1236 From coleenp at openjdk.java.net Tue Nov 17 16:50:07 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 17 Nov 2020 16:50:07 GMT Subject: RFR: 8256365: Clean up vtable initialization code In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 16:18:46 GMT, Erik Österlund wrote: >> I was looking through this code because of JDK-8061949 and want to do some minor cleanups. >> 1. There's a function in the wrong place (is_override) >> 2. methodHandles that use mh()->is_native(), with extra (), >> 3. some methods declared with TRAPS, that don't trap >> 4. some multi-clause conditionals with confusing formatting >> 5. extra InstanceKlass::cast() casts >> 6. some useless asserts >> 7. and potentially a bug with RedefineClasses where the method being added to the vtable may have been redefined in the constraint verification call. (noreg-hard) >> >> Tested with tier1-3. > > src/hotspot/share/oops/klassVtable.cpp line 474: > >> 472: // (TBD: put in a method to throw NoSuchMethodError if this slot is ever used.) >> 473: if (super_method->name() == name && super_method->signature() == signature && >> 474: (!klass->is_interface() || > > Is _klass and klass really the same? I thought they could be different. If so, this looks like a subtle behavioural change hidden in a large cleanup patch. Yes, they are the same which is why I made this change. I thought of taking out the klass parameter and using _klass everywhere so there isn't this extra _klass v. klass distinction to worry about.
------------- PR: https://git.openjdk.java.net/jdk/pull/1236 From coleenp at openjdk.java.net Tue Nov 17 16:59:10 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 17 Nov 2020 16:59:10 GMT Subject: RFR: 8256365: Clean up vtable initialization code In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 16:34:46 GMT, Erik Österlund wrote: >> I was looking through this code because of JDK-8061949 and want to do some minor cleanups. >> 1. There's a function in the wrong place (is_override) >> 2. methodHandles that use mh()->is_native(), with extra (), >> 3. some methods declared with TRAPS, that don't trap >> 4. some multi-clause conditionals with confusing formatting >> 5. extra InstanceKlass::cast() casts >> 6. some useless asserts >> 7. and potentially a bug with RedefineClasses where the method being added to the vtable may have been redefined in the constraint verification call. (noreg-hard) >> >> Tested with tier1-3. > > I don't think I like having nasty and subtle bug fixes and subtle behavioural changes hidden in what otherwise looks like a large (ish) cleanup patch. Could we do the bug fix first, and then rebase this mostly cleanup related change on top of that? As is, I have to read every line very carefully to see if it is just a cleanup or a subtle behaviour change. The behavioral change is that redefinition can make a method an old method in the safepoint that does constraint checking. I can take this part out and leave the cleanup in this PR. See my other comment about _klass vs. klass. > src/hotspot/share/oops/klassVtable.cpp line 230: > >> 228: HandleMark hm(THREAD); >> 229: assert(default_methods->at(i)->is_method(), "must be a Method*"); >> 230: methodHandle mh(THREAD, default_methods->at(i)); > > Maybe wrap the method in a {} block so you can see when it goes out of scope and don't get tempted to use it after it becomes invalid. Do you mean this methodHandle mh? vs the refetch of method below on line 232?
Since your main comment was to not mix the behavior change with the cleanup, I'm going to revert line 232 and do a different PR for that so that I can adequately describe the redefinition problem. ------------- PR: https://git.openjdk.java.net/jdk/pull/1236 From jbhateja at openjdk.java.net Tue Nov 17 17:25:07 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 17 Nov 2020 17:25:07 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 08:17:44 GMT, Nils Eliasson wrote: > ok - looks good! Hi @neliasso, thanks for your comments and review approval. Hi @vnkozlov, kindly let me know if there are any other comments from your end. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From iklam at openjdk.java.net Tue Nov 17 18:09:12 2020 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 17 Nov 2020 18:09:12 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v4] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 14:51:23 GMT, Kim Barrett wrote: >> Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: >> >> - sjohanss review >> - Remove code that "activates" dormant objects as now all active objects are reachable via the root object array > > src/hotspot/share/memory/heapShared.cpp line 662: > >> 660: if (!FLAG_IS_DEFAULT(VerifyArchivedFields)) { >> 661: // If this -XX:+VerifyArchivedFields is specified on the command-line, do extra >> 662: // checks. > > s/this// I think we can keep the code and change the comments to the following: // If VerifyArchivedFields has a non-default value (e.g., specified on the command-line), do // more expensive checks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1163 From kvn at openjdk.java.net Tue Nov 17 18:12:13 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 18:12:13 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 05:31:10 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. >> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. >> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
>> >> Performance Results: >> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz >> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java >> ArrayCopyPartialInlineSize : 32 >> >> JMH | Block Size | Baseline (ns/op) | Partial Inling (ns/op) | Gain >> -- | -- | -- | -- | -- >> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997 >> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866 >> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829 >> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564 >> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909 >> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667 >> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773 >> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179 >> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874 >> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228 >> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082 >> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769 >> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672 >> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844 >> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788 >> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947 >> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157 >> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038 >> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953 >> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584 >> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443 >> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724 >> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302 >> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756 >> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836 >> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 
0.987907289 >> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512 >> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831 >> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362 >> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974 >> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475 >> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064 >> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095 >> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621 >> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903 >> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081 >> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886 >> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287 >> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945 >> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067 >> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604 >> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204 >> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213 >> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003 >> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627 >> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174 >> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741 >> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313 >> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734 >> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345 >> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994 >> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803 >> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055 >> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957 >> 
ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541 >> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416 >> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692 >> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774 >> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467 >> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137 >> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237 >> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138 >> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353 >> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523 >> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588 >> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453 >> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983 >> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321 >> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053 >> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507 >> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559 >> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747 >> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095 >> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273 >> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139 >> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629 >> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935 >> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385 >> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096 >> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975 >> >> Detailed Reports: >> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt) >> 
WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - Review comments resolved > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - Merge remote-tracking branch 'upstream' into JDK-8252848 > - JDK-8252848 : Review comments resolved > - JDK-8252848: Review comments resolution. > - JDK-8252848: Review comments addressed. > - Merge remote-tracking branch 'origin' into JDK-8252848 > - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 > - ... and 6 more: https://git.openjdk.java.net/jdk/compare/1d3d64f3...edb74db3 Changes requested by kvn (Reviewer). src/hotspot/share/opto/macroArrayCopy.cpp line 215: > 213: const_len = lty->get_con() << shift; > 214: } else if ((lty = _igvn.type(length)->isa_int()) && lty->is_con()) { > 215: const_len = lty->get_con() << shift; isa_int() may return NULL (common case if input is TOP). I suggest refactoring this code to check for that. And, please, don't use assignment inside checks. src/hotspot/share/opto/macroArrayCopy.cpp line 207: > 205: Node* orig_mem = *mem; > 206: Node* is_lt64bytes_tp = NULL; > 207: Node* is_lt64bytes_fp = NULL; It is difficult to distinguish between _tp and _fp. Please, use whole words: false, true.
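On the isa_int() point above, one possible shape for the refactoring is sketched below. The `IntType` and `Type` structs and the `constant_length_in_bytes` helper are simplified, hypothetical stand-ins for illustration only; they are not the real C2 declarations, and the actual code in macroArrayCopy.cpp may need a different structure:

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for an integer type descriptor.
struct IntType {
  bool constant;
  int value;
  bool is_con() const { return constant; }
  int get_con() const { return value; }
};

// Hypothetical, simplified stand-in for a generic type descriptor.
struct Type {
  const IntType* int_part;  // nullptr when the type is not an int (e.g. TOP)
  const IntType* isa_int() const { return int_part; }
};

// Returns the constant length in bytes, or -1 when the length is not a
// known int constant. The assignment is kept out of the condition and the
// NULL case is handled explicitly before any dereference.
int constant_length_in_bytes(const Type* length_type, int shift) {
  const IntType* lty = length_type->isa_int();
  if (lty == nullptr) {
    return -1;  // not an int type at all
  }
  if (!lty->is_con()) {
    return -1;  // an int, but not a compile-time constant
  }
  return lty->get_con() << shift;
}
```

Separating the lookup from the tests makes the NULL path visible at a glance, which is the intent of both review remarks.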
------------- PR: https://git.openjdk.java.net/jdk/pull/302 From iklam at openjdk.java.net Tue Nov 17 18:13:05 2020 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 17 Nov 2020 18:13:05 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v4] In-Reply-To: <_3YMX9f2F3xUZnge_avARhEOikx-f_hZB2VQ1SKZ4DM=.e2dbb88d-6c69-423e-9099-88ec9ec85827@github.com> References: <_3YMX9f2F3xUZnge_avARhEOikx-f_hZB2VQ1SKZ4DM=.e2dbb88d-6c69-423e-9099-88ec9ec85827@github.com> Message-ID: On Tue, 17 Nov 2020 15:55:07 GMT, Claes Redestad wrote: >> Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: >> >> - sjohanss review >> - Remove code that "activates" dormant objects as now all active objects are reachable via the root object array > > src/hotspot/share/memory/heapShared.cpp line 334: > >> 332: } >> 333: >> 334: // Returns an objArray that contains all the roots of the archived objects > > It does..? Oops good catch! This comment should be moved to above `oop HeapShared::get_root(int index, bool clear) {` ------------- PR: https://git.openjdk.java.net/jdk/pull/1163 From tschatzl at openjdk.java.net Tue Nov 17 18:20:17 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 17 Nov 2020 18:20:17 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v5] In-Reply-To: References: Message-ID: > Hi all, > > can I have reviews for this change that changes the way how archive regions are managed in general and specifically by the G1 collector, fixing the crashes caused by adding the module graph into the archive in [JDK-8244778](https://bugs.openjdk.java.net/browse/JDK-8244778)? > > Previously before the JDK-8244778 change, archived objects could always be assumed as live, and so the G1 collector did so, not caring about the archive region's contents at all. 
With JDK-8244778 however, archived objects could die, and keep stale references to objects outside of the archive regions, which obviously causes crashes when walking these objects. > > With this change, open archive region contents are basically handled as any other objects; to support that, all open archive regions are now reachable via a single object array root. This hopefully also facilitates implementation in other collectors. > > This allows us to remove quite a bit of special handling in G1 too; the only difference is that open archive regions will generally not be collected unless they are completely empty: we do want to profit from the sharing across VMs as much as possible. > > Testing: tier1-5, one or two 6-8 runs > > The appcds changes were done by @iklam. These changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: kbarrett cl4es review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1163/files - new: https://git.openjdk.java.net/jdk/pull/1163/files/6669b529..d68d7527 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1163&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1163&range=03-04 Stats: 10 lines in 2 files changed: 1 ins; 2 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/1163.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1163/head:pull/1163 PR: https://git.openjdk.java.net/jdk/pull/1163 From iignatyev at openjdk.java.net Tue Nov 17 19:31:16 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 17 Nov 2020 19:31:16 GMT Subject: RFR: 8256430: add linux-x64-optimized to regular testing [v2] In-Reply-To: References: Message-ID: > Hi all, > > > [8256414](https://bugs.openjdk.java.net/browse/JDK-8256414) / #1233 added similar profile to submit workflow, this patch defines 
`linux-x64-optimized` profile in jib-profile so it can be used by mach5 and added to tier1? > > Thanks > -- Igor > > cc-ing @dcubed-ojdk Igor Ignatyev has updated the pull request incrementally with one additional commit since the last revision: build only hotspot for optimized ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1244/files - new: https://git.openjdk.java.net/jdk/pull/1244/files/05c735be..2d861d14 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1244&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1244&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1244.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1244/head:pull/1244 PR: https://git.openjdk.java.net/jdk/pull/1244 From github.com+670087+jrziviani at openjdk.java.net Tue Nov 17 19:42:05 2020 From: github.com+670087+jrziviani at openjdk.java.net (Ziviani) Date: Tue, 17 Nov 2020 19:42:05 GMT Subject: RFR: 8256479: [PPC64] C2 crashes when UseVectorByteReverseInstructionsPPC64 used without SuperwordUseVSX In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 15:27:29 GMT, Martin Doerr wrote: > C2 crashes on Power9 when using UseVectorByteReverseInstructionsPPC64 without SuperwordUseVSX. bytes_reverse_long_vecNode uses a Vector Register. See bug for more details. > > The VectorConversion tests can detect the issue. Marked as reviewed by jrziviani at github.com (no known OpenJDK username). Hello @TheRealMDoerr, thank you for catching and fixing it. The fix looks good and works great (tested on P9 and P10): jdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:-SuperwordUseVSX -XX:+UseVectorByteReverseInstructionsPPC64 -version OpenJDK 64-Bit Server VM warning: UseVectorByteReverseInstructionsPPC64 specified, but needs SuperwordUseVSX.
------------- PR: https://git.openjdk.java.net/jdk/pull/1262 From psandoz at openjdk.java.net Tue Nov 17 19:48:06 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Tue, 17 Nov 2020 19:48:06 GMT Subject: RFR: 8256479: [PPC64] C2 crashes when UseVectorByteReverseInstructionsPPC64 used without SuperwordUseVSX In-Reply-To: References: Message-ID: <2niGhUOD2rNsM5tuIeMr3Jhg4Vtot1LNS6_ffYl6egk=.c4dbfa4a-efa7-46c6-b8ad-7f0dd5bf7872@github.com> On Tue, 17 Nov 2020 15:27:29 GMT, Martin Doerr wrote: > C2 crashes on Power9 when using UseVectorByteReverseInstructionsPPC64 without SuperwordUseVSX. bytes_reverse_long_vecNode uses a Vector Register. See bug for more details. > > The VectorConversion tests can detect the issue. Test update looks good. ------------- Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1262 From erikj at openjdk.java.net Tue Nov 17 19:55:12 2020 From: erikj at openjdk.java.net (Erik Joelsson) Date: Tue, 17 Nov 2020 19:55:12 GMT Subject: RFR: 8256430: add linux-x64-optimized to regular testing [v2] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 19:31:16 GMT, Igor Ignatyev wrote: >> Hi all, >> >> >> [8256414](https://bugs.openjdk.java.net/browse/JDK-8256414) / #1233 added similar profile to submit workflow, this patch defines `linux-x64-optimized` profile in jib-profile so it can be used by mach5 and added to tier1? >> >> Thanks >> -- Igor >> >> cc-ing @dcubed-ojdk > > Igor Ignatyev has updated the pull request incrementally with one additional commit since the last revision: > > build only hotspot for optimized Marked as reviewed by erikj (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1244 From iignatyev at openjdk.java.net Tue Nov 17 20:04:08 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 17 Nov 2020 20:04:08 GMT Subject: RFR: 8256430: add linux-x64-optimized to regular testing [v2] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 19:52:49 GMT, Erik Joelsson wrote: >> Igor Ignatyev has updated the pull request incrementally with one additional commit since the last revision: >> >> build only hotspot for optimized > > Marked as reviewed by erikj (Reviewer). folks, thanks for your reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/1244 From iignatyev at openjdk.java.net Tue Nov 17 20:04:09 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 17 Nov 2020 20:04:09 GMT Subject: Integrated: 8256430: add linux-x64-optimized to regular testing In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 00:31:24 GMT, Igor Ignatyev wrote: > Hi all, > > > [8256414](https://bugs.openjdk.java.net/browse/JDK-8256414) / #1233 added similar profile to submit workflow, this patch defines `linux-x64-optimized` profile in jib-profile so it can be used by mach5 and added to tier1? > > Thanks > -- Igor > > cc-ing @dcubed-ojdk This pull request has now been integrated. 
Changeset: d9dbd5de Author: Igor Ignatyev URL: https://git.openjdk.java.net/jdk/commit/d9dbd5de Stats: 13 lines in 1 file changed: 13 ins; 0 del; 0 mod 8256430: add linux-x64-optimized to regular testing Reviewed-by: kvn, dcubed, vlivanov, erikj ------------- PR: https://git.openjdk.java.net/jdk/pull/1244 From kvn at openjdk.java.net Tue Nov 17 20:24:07 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 20:24:07 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 18:08:58 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Review comments resolved >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Merge remote-tracking branch 'upstream' into JDK-8252848 >> - JDK-8252848 : Review comments resolved >> - JDK-8252848: Review comments resolution. >> - JDK-8252848: Review comments addressed. >> - Merge remote-tracking branch 'origin' into JDK-8252848 >> - JDK-8252848 : Replacing generic assembler routine evmovdqu with macro assembly routine calling type specific leaf level assembly functions. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8252848 >> - ... and 6 more: https://git.openjdk.java.net/jdk/compare/1d3d64f3...edb74db3 > > Changes requested by kvn (Reviewer). 
I ran tier1-tier4 with latest changes and got failures in TestArrayCopyDisjoint.java and TestArrayCopyConjoint.java tests: java -XX:-TieredCompilation -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:ArrayCopyPartialInlineSize=64 -XX:MaxVectorSize=64 compiler.arraycopy.TestArrayCopyDisjoint Test Byte constant length 45 [class [B] Result mismtach at i = 13 expected = 57 actual = 70 fromPos = 1324 toPos = 1353 java.lang.Error: Fail at compiler.arraycopy.TestArrayCopyDisjoint.validate(TestArrayCopyDisjoint.java:95) at compiler.arraycopy.TestArrayCopyDisjoint.testByte_constant_LT64B(TestArrayCopyDisjoint.java:162) at compiler.arraycopy.TestArrayCopyDisjoint.main(TestArrayCopyDisjoint.java:207) java -XX:-TieredCompilation -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:ArrayCopyPartialInlineSize=64 -XX:MaxVectorSize=64 compiler.arraycopy.TestArrayCopyConjoint elapsed time (seconds): 7.464 Test Byte constant length 45 [class [B] Result mismtach at i = 13 expected = 109 actual = 111 fromPos = 1120 toPos = 1122 java.lang.Error: Fail at compiler.arraycopy.TestArrayCopyConjoint.validate(TestArrayCopyConjoint.java:124) at compiler.arraycopy.TestArrayCopyConjoint.testByte_constant_LT64B(TestArrayCopyConjoint.java:192) at compiler.arraycopy.TestArrayCopyConjoint.main(TestArrayCopyConjoint.java:240) ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From redestad at openjdk.java.net Tue Nov 17 20:25:06 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 17 Nov 2020 20:25:06 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v5] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 18:20:17 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I have reviews for this change that changes the way how archive regions are managed in general and specifically by the G1 collector, fixing the crashes caused by adding the module graph 
into the archive in [JDK-8244778](https://bugs.openjdk.java.net/browse/JDK-8244778)? >> >> Previously before the JDK-8244778 change, archived objects could always be assumed as live, and so the G1 collector did so, not caring about the archive region's contents at all. With JDK-8244778 however, archived objects could die, and keep stale references to objects outside of the archive regions, which obviously causes crashes when walking these objects. >> >> With this change, open archive region contents are basically handled as any other objects; to support that, all open archive regions are now reachable via a single object array root. This hopefully also facilitates implementation in other collectors. >> >> This allows us to remove quite a bit of special handling in G1 too; the only difference is that open archive regions will generally not be collected unless they are completely empty: we do want to profit from the sharing across VMs as much as possible. >> >> Testing: tier1-5, one or two 6-8 runs >> >> The appcds changes were done by @iklam. These changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > kbarrett cl4es review Marked as reviewed by redestad (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1163 From github.com+58006833+xbzhang99 at openjdk.java.net Tue Nov 17 20:49:07 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Tue, 17 Nov 2020 20:49:07 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v6] In-Reply-To: <7jUr9izZQ1h44rV3OCRR-16J2bN3eAJKN559LIvjb1M=.48f7411b-3306-4c71-92e6-208fa18da860@github.com> References: <-FdDOTDgxvTFXLBjs4ZCO3xHA5NM0yLXaPyMgrebbQA=.f533c3f1-0199-40b0-b507-104d7997c7cf@github.com> <7jUr9izZQ1h44rV3OCRR-16J2bN3eAJKN559LIvjb1M=.48f7411b-3306-4c71-92e6-208fa18da860@github.com> Message-ID: On Tue, 17 Nov 2020 02:21:49 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/macroAssembler_x86_exp.cpp line 496: >> >>> 494: movl(Address(rsp, 64), tmp); >>> 495: lea(tmp, ExternalAddress(static_const_table)); >>> 496: movsd(xmm0, Address(rsp, 128)); >> >> Can you explain this change? > > Would be nice to add comment about what values are on stack. movdqu moves 128 bits from memory, while movsd moves 64 bits. movsd is what is needed for a double-precision calculation. In this case, however, no harm was done even when using movdqu, because the subsequent vunpcklpd would broadcast only the lower 64 bits. Still, it is safe to change to movsd to begin with. An example of a value on the stack is c05ec00000000000; movdqu would move 0x00000000e719ee40c05ec00000000000. ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From github.com+58006833+xbzhang99 at openjdk.java.net Tue Nov 17 20:49:10 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Tue, 17 Nov 2020 20:49:10 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v6] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 02:05:14 GMT, Joe Darcy wrote: >> Xubo Zhang has refreshed the contents of this pull request, and previous commits have been removed. 
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Fixed the bug in 32-bit build, exp generates 0 when the exponent is too large > > test/jdk/java/lang/Math/ExpCornerCaseTests.java line 2: > >> 1: /* >> 2: * Copyright (c) 2011,2020 Oracle and/or its affiliates. All rights reserved. > > The year of the copyright syntax is "2011, 2020,"; no need for a re-review before pushing after that correction. Thanks. changed ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From github.com+58006833+xbzhang99 at openjdk.java.net Tue Nov 17 21:04:22 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Tue, 17 Nov 2020 21:04:22 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v7] In-Reply-To: References: Message-ID: <4Ine9CL8R4EY5DMaa55hNlgFQ5ig9eyjtlvE7PLI-gQ=.0bf630f1-a3ea-47a4-94b4-2fcb4495a157@github.com> > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. 
Also changed movdqu to movsd as it was supposed to load a 64-bit number Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: fixed copyright syntax ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/894/files - new: https://git.openjdk.java.net/jdk/pull/894/files/704dfff2..5932c732 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=894&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/894.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/894/head:pull/894 PR: https://git.openjdk.java.net/jdk/pull/894 From redestad at openjdk.java.net Tue Nov 17 21:10:09 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 17 Nov 2020 21:10:09 GMT Subject: RFR: 8256254: Convert vmIntrinsics::ID to enum class In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 20:46:46 GMT, Ioi Lam wrote: > This PR is follows the same style as https://github.com/openjdk/jdk/pull/276, except this time I am converting `vmIntrinsics::ID` to `vmIntrinsicID`: > > * Convert `vmIntrinsics::SID` to `enum class` to provide better type safety. > * Also, put this enum class in the top level, so it can be forward-declared. I.e., `enum class vmIntrinsicID : int;`. This avoids excessive inclusion of vmIntrinsics.hpp and vmSymbols.hpp (which were included indirectly by almost every hotspot source files). 
> * vmIntrinsics.hpp: was included 805 times, now included 414 times > * vmSymbols.hpp: was included 805 times, now include 394 times > * Note: more #include reduction will be done in [JDK-8256424](https://bugs.openjdk.java.net/browse/JDK-8256424) > > Many files are changed, but most of them are minor > * Added missing dependencies of vmSymbols.hpp and/or vmIntrinsics.hpp > * safe conversion between vmIntrinsicID and integer types (see comments around `vmIntrinsics::as_int()`) > > Since we have a lot of references like `vmIntrinsics::_invokeGeneric`, I added aliases like > > static const vmIntrinsicID vmIntrinsics::_invokeGeneric = vmIntrinsicID::_invokeGeneric > > so we don't need to change over a thousand `vmIntrinsics::XXX` to `vmIntrinsicID::XXX`. Looks ok. src/hotspot/share/interpreter/abstractInterpreter.cpp line 117: > 115: vmIntrinsics::ID id = m->intrinsic_id(); > 116: assert(MethodHandles::is_signature_polymorphic(id), "must match an intrinsic"); > 117: MethodKind kind = (MethodKind)( method_handle_invoke_FIRST + pre-existing extraneous whitespace src/hotspot/share/interpreter/abstractInterpreter.hpp line 98: > 96: static vmIntrinsics::ID method_handle_intrinsic(MethodKind kind) { > 97: if (kind >= method_handle_invoke_FIRST && kind <= method_handle_invoke_LAST) > 98: return vmIntrinsics::ID_from(static_cast<int>(vmIntrinsics::FIRST_MH_SIG_POLY) + (kind - method_handle_invoke_FIRST) ); pre-existing extra whitespace src/hotspot/share/classfile/vmIntrinsics.hpp line 1085: > 1083: } > 1084: > 1085: static constexpr size_t number_of_intrinsics() { Could this be returning int? src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp line 164: > 162: > 163: JVMCIObjectArray CompilerToVM::initialize_intrinsics(JVMCI_TRAPS) { > 164: int len = static_cast<int>(vmIntrinsics::ID_LIMIT) - 1; vmIntrinsics::number_of_intrinsics? ------------- Marked as reviewed by redestad (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1237 From kvn at openjdk.java.net Tue Nov 17 21:14:07 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 21:14:07 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v7] In-Reply-To: <4Ine9CL8R4EY5DMaa55hNlgFQ5ig9eyjtlvE7PLI-gQ=.0bf630f1-a3ea-47a4-94b4-2fcb4495a157@github.com> References: <4Ine9CL8R4EY5DMaa55hNlgFQ5ig9eyjtlvE7PLI-gQ=.0bf630f1-a3ea-47a4-94b4-2fcb4495a157@github.com> Message-ID: <_MwgvrmtrJQFg549T6F9pgFPdiSHv0Sp44-bv1DPWfA=.749be78c-3551-4bbd-ba6f-3bee0dd81f4a@github.com> On Tue, 17 Nov 2020 21:04:22 GMT, Xubo Zhang wrote: >> Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number > > Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: > > fixed copyright syntax Okay ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/894 From sviswanathan at openjdk.java.net Tue Nov 17 21:42:06 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 17 Nov 2020 21:42:06 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v7] In-Reply-To: <_MwgvrmtrJQFg549T6F9pgFPdiSHv0Sp44-bv1DPWfA=.749be78c-3551-4bbd-ba6f-3bee0dd81f4a@github.com> References: <4Ine9CL8R4EY5DMaa55hNlgFQ5ig9eyjtlvE7PLI-gQ=.0bf630f1-a3ea-47a4-94b4-2fcb4495a157@github.com> <_MwgvrmtrJQFg549T6F9pgFPdiSHv0Sp44-bv1DPWfA=.749be78c-3551-4bbd-ba6f-3bee0dd81f4a@github.com> Message-ID: On Tue, 17 Nov 2020 21:10:49 GMT, Vladimir Kozlov wrote: >> Xubo Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed copyright syntax > > Okay Hi Vladimir, Please let me know the next steps on this. 
Looks like running tests need approval. Xubo is a first time contributor. Best Regards, Sandhya ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From coleenp at openjdk.java.net Tue Nov 17 21:54:17 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 17 Nov 2020 21:54:17 GMT Subject: RFR: 8256365: Clean up vtable initialization code [v2] In-Reply-To: References: Message-ID: > I was looking through this code because of JDK-8061949 and want to do some minor cleanups. > 1. There's a function in the wrong place (is_override) > 2. methodHandles that use mh()->is_native(), with extra (), > 3. some methods declared with TRAPS, that don't trap > 4. some multi-clause conditionals with confusing formatting > 5. extra InstanceKlass::cast() casts > 6. some useless asserts > 7. and potentially a bug with RedefineClasses where the method being added to the vtable may have been redefined in the constraint verification call. (noreg-hard) > > Tested with tier1-3. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Revert RedefineClasses fix and remove InstanceKlass argument from update_inherited_vtable so which klass is clear. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1236/files - new: https://git.openjdk.java.net/jdk/pull/1236/files/b839b156..364f8195 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1236&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1236&range=00-01 Stats: 17 lines in 2 files changed: 5 ins; 6 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/1236.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1236/head:pull/1236 PR: https://git.openjdk.java.net/jdk/pull/1236 From eosterlund at openjdk.java.net Tue Nov 17 22:01:07 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 17 Nov 2020 22:01:07 GMT Subject: RFR: 8256365: Clean up vtable initialization code [v2] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 21:54:17 GMT, Coleen Phillimore wrote: >> I was looking through this code because of JDK-8061949 and want to do some minor cleanups. >> 1. There's a function in the wrong place (is_override) >> 2. methodHandles that use mh()->is_native(), with extra (), >> 3. some methods declared with TRAPS, that don't trap >> 4. some multi-clause conditionals with confusing formatting >> 5. extra InstanceKlass::cast() casts >> 6. some useless asserts >> 7. and potentially a bug with RedefineClasses where the method being added to the vtable may have been redefined in the constraint verification call. (noreg-hard) >> >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Revert RedefineClasses fix and remove InstanceKlass argument from update_inherited_vtable so which klass is clear. Looks good. Thanks for separating out the bug fix. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1236 From eosterlund at openjdk.java.net Tue Nov 17 22:01:09 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 17 Nov 2020 22:01:09 GMT Subject: RFR: 8256365: Clean up vtable initialization code [v2] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 16:47:26 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/klassVtable.cpp line 474: >> >>> 472: // (TBD: put in a method to throw NoSuchMethodError if this slot is ever used.) >>> 473: if (super_method->name() == name && super_method->signature() == signature && >>> 474: (!klass->is_interface() || >> >> Is _klass and klass really the same? I thought they could be different. If so, this looks like a subtle behavioural change hidden in a large cleanup patch. > > Yes, they are the same which is why I made this change. I thought of taking out the klass parameter and using _klass everywhere so there isn't this extra _klass v. klass distinction to worry about. Thanks it is more clear in the new patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/1236 From coleenp at openjdk.java.net Tue Nov 17 22:38:07 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 17 Nov 2020 22:38:07 GMT Subject: RFR: 8256365: Clean up vtable initialization code [v2] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 21:57:52 GMT, Erik Österlund wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert RedefineClasses fix and remove InstanceKlass argument from update_inherited_vtable so which klass is clear. > > Looks good. Thanks for separating out the bug fix. Thanks Erik! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1236 From dholmes at openjdk.java.net Tue Nov 17 22:59:08 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 17 Nov 2020 22:59:08 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v9] In-Reply-To: References: Message-ID: <0oMeRRhZupog8tI80dsgVxx_luSw4HB1hloHTFrqHFQ=.48c17fde-f2a2-4ca9-b04b-14c0d503d0d7@github.com> On Tue, 17 Nov 2020 15:59:19 GMT, Gerard Ziemski wrote: >> hi all, >> >> Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. This change addresses them all: >> >> #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code >> >> #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) >> >> #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) >> >> #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp >> >> #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include >> >> #6 Coleen's feedback - factored out print_signal_handlers() >> >> #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() >> >> #8 Thomas's feedback - factored out common POSIX signal initialization code >> >> #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API >> >> #10 YaSuenag's feedback - unified logging out of the scope for this fix >> >> #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > last tweaks Marked as reviewed by dholmes (Reviewer). 
src/hotspot/os/posix/signals_posix.cpp line 1721: > 1719: // initialize suspend/resume support - must do this before signal_sets_init() > 1720: if (SR_initialize() != 0) { > 1721: vm_exit_during_initialization("SR_initialize failed"); You've lost the failure reason (errno) that perror would have provided. SR_initialize only returns non-zero if sigaction fails, which leaves errno set - hence perror() would report it. That said, there are only two errors for sigaction (EFAULT or EINVAL), so we really haven't lost anything that useful. Arguably the vm_exit... should be inside SR_initialize. But I'm fine with this as-is so we can conclude this PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From iklam at openjdk.java.net Tue Nov 17 23:16:17 2020 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 17 Nov 2020 23:16:17 GMT Subject: RFR: 8256254: Convert vmIntrinsics::ID to enum class [v2] In-Reply-To: References: Message-ID: > This PR is follows the same style as https://github.com/openjdk/jdk/pull/276, except this time I am converting `vmIntrinsics::ID` to `vmIntrinsicID`: > > * Convert `vmIntrinsics::SID` to `enum class` to provide better type safety. > * Also, put this enum class in the top level, so it can be forward-declared. I.e., `enum class vmIntrinsicID : int;`. This avoids excessive inclusion of vmIntrinsics.hpp and vmSymbols.hpp (which were included indirectly by almost every hotspot source files). 
> * vmIntrinsics.hpp: was included 805 times, now included 414 times > * vmSymbols.hpp: was included 805 times, now include 394 times > * Note: more #include reduction will be done in [JDK-8256424](https://bugs.openjdk.java.net/browse/JDK-8256424) > > Many files are changed, but most of them are minor > * Added missing dependencies of vmSymbols.hpp and/or vmIntrinsics.hpp > * safe conversion between vmIntrinsicID and integer types (see comments around `vmIntrinsics::as_int()`) > > Since we have a lot of references like `vmIntrinsics::_invokeGeneric`, I added aliases like > > static const vmIntrinsicID vmIntrinsics::_invokeGeneric = vmIntrinsicID::_invokeGeneric > > so we don't need to change over a thousand `vmIntrinsics::XXX` to `vmIntrinsicID::XXX`. Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @cl4es reviews ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1237/files - new: https://git.openjdk.java.net/jdk/pull/1237/files/313cef4b..4f8f2692 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1237&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1237&range=00-01 Stats: 16 lines in 6 files changed: 2 ins; 3 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/1237.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1237/head:pull/1237 PR: https://git.openjdk.java.net/jdk/pull/1237 From iklam at openjdk.java.net Tue Nov 17 23:16:18 2020 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 17 Nov 2020 23:16:18 GMT Subject: RFR: 8256254: Convert vmIntrinsics::ID to enum class [v2] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 21:01:31 GMT, Claes Redestad wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @cl4es reviews > > src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp line 164: > >> 162: >> 163: JVMCIObjectArray CompilerToVM::initialize_intrinsics(JVMCI_TRAPS) { >> 164: int 
len = static_cast<int>(vmIntrinsics::ID_LIMIT) - 1; > > vmIntrinsics::number_of_intrinsics? Thanks for the review. I have pushed a new version that addresses your comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/1237 From kvn at openjdk.java.net Tue Nov 17 23:25:07 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 17 Nov 2020 23:25:07 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v7] In-Reply-To: References: <4Ine9CL8R4EY5DMaa55hNlgFQ5ig9eyjtlvE7PLI-gQ=.0bf630f1-a3ea-47a4-94b4-2fcb4495a157@github.com> <_MwgvrmtrJQFg549T6F9pgFPdiSHv0Sp44-bv1DPWfA=.749be78c-3551-4bbd-ba6f-3bee0dd81f4a@github.com> Message-ID: On Tue, 17 Nov 2020 21:39:10 GMT, Sandhya Viswanathan wrote: >> Okay > > Hi Vladimir, > Please let me know the next steps on this. Looks like running tests need approval. > Xubo is a first time contributor. > Best Regards, > Sandhya I asked the Skara team a question about testing approval. Meanwhile, I submitted our internal tier1 testing. Note that we test only the 64-bit VM. You need to run the tests with a 32-bit VM to verify the fix. ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From kvn at openjdk.java.net Wed Nov 18 00:09:06 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 18 Nov 2020 00:09:06 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 20:21:04 GMT, Vladimir Kozlov wrote: >> Changes requested by kvn (Reviewer). 
> > I ran tier1-tier4 with latest changes and got failures in TestArrayCopyDisjoint.java and TestArrayCopyConjoint.java tests: > java -XX:-TieredCompilation -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:ArrayCopyPartialInlineSize=64 -XX:MaxVectorSize=64 compiler.arraycopy.TestArrayCopyDisjoint > Test Byte constant length 45 [class [B] Result mismtach at i = 13 expected = 57 actual = 70 fromPos = 1324 toPos = 1353 > java.lang.Error: Fail > at compiler.arraycopy.TestArrayCopyDisjoint.validate(TestArrayCopyDisjoint.java:95) > at compiler.arraycopy.TestArrayCopyDisjoint.testByte_constant_LT64B(TestArrayCopyDisjoint.java:162) > at compiler.arraycopy.TestArrayCopyDisjoint.main(TestArrayCopyDisjoint.java:207) > > java -XX:-TieredCompilation -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:ArrayCopyPartialInlineSize=64 -XX:MaxVectorSize=64 compiler.arraycopy.TestArrayCopyConjoint > > elapsed time (seconds): 7.464 > Test Byte constant length 45 [class [B] Result mismtach at i = 13 expected = 109 actual = 111 fromPos = 1120 toPos = 1122 > java.lang.Error: Fail > at compiler.arraycopy.TestArrayCopyConjoint.validate(TestArrayCopyConjoint.java:124) > at compiler.arraycopy.TestArrayCopyConjoint.testByte_constant_LT64B(TestArrayCopyConjoint.java:192) > at compiler.arraycopy.TestArrayCopyConjoint.main(TestArrayCopyConjoint.java:240) Forgot to say that failure was on Windows with only avx512f, avx512cd. 
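For readers unfamiliar with this kind of check, the failure report above can be pictured with a minimal sketch. This is not the source of `TestArrayCopyDisjoint`; the class `DisjointCopyCheck` and the method `copyAndFindMismatch` are hypothetical names chosen for illustration. It only assumes what the log shows: a byte copy with a compile-time constant length of 45 between two separate arrays, followed by an element-by-element validation that reports the first mismatching index.

```java
final class DisjointCopyCheck {
    // Compile-time constant length, mirroring "constant length 45" in the log.
    static final int LENGTH = 45;

    // Hypothetical helper: copies LENGTH bytes from src to a different array dst,
    // then returns the index of the first mismatching element, or -1 if all match.
    static int copyAndFindMismatch(byte[] src, int fromPos, byte[] dst, int toPos) {
        System.arraycopy(src, fromPos, dst, toPos, LENGTH);
        for (int i = 0; i < LENGTH; i++) {
            if (src[fromPos + i] != dst[toPos + i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] src = new byte[4096];
        byte[] dst = new byte[4096];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        // Positions taken from the failure report; any in-range values would do.
        int mismatch = copyAndFindMismatch(src, 1324, dst, 1353);
        System.out.println(mismatch < 0 ? "copy verified" : "mismatch at i = " + mismatch);
    }
}
```

A single call like this runs in the interpreter and would not exercise the C2 code path under review; the real tests presumably repeat the copy many times under the flags quoted above so that the compiled, partially inlined version is the one being validated.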
------------- PR: https://git.openjdk.java.net/jdk/pull/302 From kvn at openjdk.java.net Wed Nov 18 01:32:03 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 18 Nov 2020 01:32:03 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v7] In-Reply-To: References: <4Ine9CL8R4EY5DMaa55hNlgFQ5ig9eyjtlvE7PLI-gQ=.0bf630f1-a3ea-47a4-94b4-2fcb4495a157@github.com> <_MwgvrmtrJQFg549T6F9pgFPdiSHv0Sp44-bv1DPWfA=.749be78c-3551-4bbd-ba6f-3bee0dd81f4a@github.com> Message-ID: On Tue, 17 Nov 2020 23:22:46 GMT, Vladimir Kozlov wrote: >> Hi Vladimir, >> Please let me know the next steps on this. Looks like running tests need approval. >> Xubo is a first time contributor. >> Best Regards, >> Sandhya > > I asked Skara team question about testing approval. Meanwhile I submitted our internal tier1 testing. > Note, we test only 64-bit VM. You need to run tests with 32-bit VM to verify fix. My testing passed. But it did not test 32-bit as I said. ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From sviswanathan at openjdk.java.net Wed Nov 18 01:36:06 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 18 Nov 2020 01:36:06 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms [v7] In-Reply-To: References: <4Ine9CL8R4EY5DMaa55hNlgFQ5ig9eyjtlvE7PLI-gQ=.0bf630f1-a3ea-47a4-94b4-2fcb4495a157@github.com> <_MwgvrmtrJQFg549T6F9pgFPdiSHv0Sp44-bv1DPWfA=.749be78c-3551-4bbd-ba6f-3bee0dd81f4a@github.com> Message-ID: <3HWl7aNxx6xI8qsAXl2C6txeBewZtnVjKr5fq-AtDLc=.51cf5ea4-6c48-45df-a855-bab2f115997f@github.com> On Wed, 18 Nov 2020 01:29:13 GMT, Vladimir Kozlov wrote: >> I asked Skara team question about testing approval. Meanwhile I submitted our internal tier1 testing. >> Note, we test only 64-bit VM. You need to run tests with 32-bit VM to verify fix. > > My testing passed. But it did not test 32-bit as I said. Thanks a lot. I am doing the 32-bit tier1 testing. 
I will integrate the patch once the testing completes. ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From dongbo at openjdk.java.net Wed Nov 18 01:41:05 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 18 Nov 2020 01:41:05 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v4] In-Reply-To: <5lFZtGvhwRh-OTHw9jRpxTSKbFHfQhyUKNhA30qj0iA=.b9c55684-ed83-417d-9feb-5ef1805da110@github.com> References: <5lFZtGvhwRh-OTHw9jRpxTSKbFHfQhyUKNhA30qj0iA=.b9c55684-ed83-417d-9feb-5ef1805da110@github.com> Message-ID: On Tue, 17 Nov 2020 08:38:36 GMT, Ningsheng Jian wrote: >> For integer absolute (vector), the accepted arrangements are `T8B, T16B, T4H, T8H, T2S, T4S, T2D`. >> ARM compiler armasm user guide reference: https://developer.arm.com/documentation/dui0801/h/A64-SIMD-Vector-Instructions/ABS--vector-?lang=en >> I think the original code `n->as_Vector()->length() == 4 ||` is not right for basic type Byte. So I delete it, I am sorry if I miss something. > > We load 4B with loadV4 but handle 4B types with T8B instructions. And current min_vector_size for byte type is 4, so it's possible for vectorizer to generate 4B vector nodes. Oh, I see. Thanks a lot for clarifying this, I should have taken a deep look. Really sorry for producing a BUG here, I will fix it. ------------- PR: https://git.openjdk.java.net/jdk/pull/1215 From dholmes at openjdk.java.net Wed Nov 18 02:34:08 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 18 Nov 2020 02:34:08 GMT Subject: RFR: 8256365: Clean up vtable initialization code [v2] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 21:54:17 GMT, Coleen Phillimore wrote: >> I was looking through this code because of JDK-8061949 and want to do some minor cleanups. >> 1. There's a function in the wrong place (is_override) >> 2. methodHandles that use mh()->is_native(), with extra (), >> 3. some methods declared with TRAPS, that don't trap >> 4. 
some multi-clause conditionals with confusing formatting >> 5. extra InstanceKlass::cast() casts >> 6. some useless asserts >> 7. and potentially a bug with RedefineClasses where the method being added to the vtable may have been redefined in the constraint verification call. (noreg-hard) >> >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Revert RedefineClasses fix and remove InstanceKlass argument from update_inherited_vtable so which klass is clear. Hi Coleen, Why do you consider `is_override` to be in the wrong place? (It is badly named - should be `can_override` or `is_overridable`). I think of overriding as being a feature/property of a class not a vtable. Anyway all the cleanups appear to be fine. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1236 From dongbo at openjdk.java.net Wed Nov 18 02:48:25 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 18 Nov 2020 02:48:25 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v5] In-Reply-To: References: Message-ID: > This supports for floating-point absolute difference instructions, i.e. FABD scalar/vector. > > Verified with linux-aarch64-server-release, tier1-3. > > Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance test. > > The FABD (scalar), the performance tests handle registers directly, the average latency reduces to almost half (~57%) of the original. > For FABD (vector), we restrict the data size (~24KB) to be less than L1 data cache size (32KB), > so that the memory access can hit in L1, and witness 14.2% (float) and 21.2% (double) improvements. > > The JMH results on Kunpeng916:
>
> Benchmark                                            (count)  (seed)  Mode  Cnt     Score   Error  Units
>
> # before, fsub+fabs
> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble     1024  316731  avgt   10  6038.333 ± 3.889  ns/op
> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat      1024  316731  avgt   10  6005.125 ± 3.025  ns/op
> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble     1024  316731  avgt   10   950.340 ± 9.398  ns/op
> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat      1024  316731  avgt   10   454.350 ± 1.798  ns/op
>
> # after, fabd
> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble     1024  316731  avgt   10  3483.801 ± 1.763  ns/op
> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat      1024  316731  avgt   10  3442.412 ± 1.866  ns/op
> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble     1024  316731  avgt   10   816.301 ± 4.454  ns/op
> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat      1024  316731  avgt   10   354.710 ± 1.001  ns/op
Dong Bo has updated the pull request incrementally with one additional commit since the last revision: add match rules back for AbsVB with vector length 4 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1215/files - new: https://git.openjdk.java.net/jdk/pull/1215/files/eae9185c..3376972c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1215&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1215&range=03-04 Stats: 55 lines in 2 files changed: 6 ins; 24 del; 25 mod Patch: https://git.openjdk.java.net/jdk/pull/1215.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1215/head:pull/1215 PR: https://git.openjdk.java.net/jdk/pull/1215 From coleenp at openjdk.java.net Wed Nov 18 02:57:06 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 18 Nov 2020 02:57:06 GMT Subject: RFR: 8256365: Clean up vtable initialization code [v2] In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 02:30:55 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert RedefineClasses fix and remove InstanceKlass argument from update_inherited_vtable so which klass is clear. 
> > Hi Coleen, > Why do you consider `is_override` to be in the wrong place? (It is badly named - should be `can_override` or `is_overridable`). I think of overriding as being a feature/property of a class not a vtable. > Anyway all the cleanups appear to be fine. > Thanks, > David The function is_override should be colocated with the functions that call it that implement overriding rules for selection for vtable initialization, which are very complicated. can_be_overridden might be a better name for it. I'll change it. That's a minor change. // Returns true iff super_method can be overridden by a method in targetclassname It'll match the comment also. Thanks for the code review! ------------- PR: https://git.openjdk.java.net/jdk/pull/1236 From coleenp at openjdk.java.net Wed Nov 18 03:55:19 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 18 Nov 2020 03:55:19 GMT Subject: RFR: 8256365: Clean up vtable initialization code [v3] In-Reply-To: References: Message-ID: > I was looking through this code because of JDK-8061949 and want to do some minor cleanups. > 1. There's a function in the wrong place (is_override) > 2. methodHandles that use mh()->is_native(), with extra (), > 3. some methods declared with TRAPS, that don't trap > 4. some multi-clause conditionals with confusing formatting > 5. extra InstanceKlass::cast() casts > 6. some useless asserts > 7. and potentially a bug with RedefineClasses where the method being added to the vtable may have been redefined in the constraint verification call. (noreg-hard) > > Tested with tier1-3. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Rename is_override to can_be_overridden. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1236/files - new: https://git.openjdk.java.net/jdk/pull/1236/files/364f8195..5f51ef5b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1236&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1236&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1236.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1236/head:pull/1236 PR: https://git.openjdk.java.net/jdk/pull/1236 From sviswanathan at openjdk.java.net Wed Nov 18 04:37:04 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 18 Nov 2020 04:37:04 GMT Subject: RFR: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms In-Reply-To: References: Message-ID: <-6l9LeNyjb28a9nsoZ5VclAw2JfIQNePlgUXm32tXd8=.e894b00e-aa24-4088-bcf6-6ae913d3da8c@github.com> On Wed, 28 Oct 2020 04:32:41 GMT, Xubo Zhang wrote: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number @xbzhang99 32-bit tier1 testing passed. ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From dholmes at openjdk.java.net Wed Nov 18 04:47:07 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 18 Nov 2020 04:47:07 GMT Subject: RFR: 8256365: Clean up vtable initialization code [v3] In-Reply-To: References: Message-ID: <0UAn3LXa0sX2NmISOUqMHXD2QRP7sxLr-G6pjOPm-sw=.f1af40eb-5d24-40cc-aa92-8a9feecdcc6b@github.com> On Wed, 18 Nov 2020 03:55:19 GMT, Coleen Phillimore wrote: >> I was looking through this code because of JDK-8061949 and want to do some minor cleanups. >> 1. There's a function in the wrong place (is_override) >> 2. methodHandles that use mh()->is_native(), with extra (), >> 3. some methods declared with TRAPS, that don't trap >> 4. 
some multi-clause conditionals with confusing formatting >> 5. extra InstanceKlass::cast() casts >> 6. some useless asserts >> 7. and potentially a bug with RedefineClasses where the method being added to the vtable may have been redefined in the constraint verification call. (noreg-hard) >> >> Tested with tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Rename is_override to can_be_overridden. Marked as reviewed by dholmes (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1236 From github.com+58006833+xbzhang99 at openjdk.java.net Wed Nov 18 04:52:07 2020 From: github.com+58006833+xbzhang99 at openjdk.java.net (Xubo Zhang) Date: Wed, 18 Nov 2020 04:52:07 GMT Subject: Integrated: 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms In-Reply-To: References: Message-ID: On Wed, 28 Oct 2020 04:32:41 GMT, Xubo Zhang wrote: > Math.exp(10000) produces 0 instead of positive infinity on x86 32-bit platform. The reason was for some jmp instructions, it used jge instead of jae. Also changed movdqu to movsd as it was supposed to load a 64-bit number This pull request has now been integrated. 
Changeset: c0892148 Author: Xubo Zhang Committer: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/c0892148 Stats: 72 lines in 2 files changed: 64 ins; 0 del; 8 mod 8255368: Math.exp() gives wrong result for large values on x86 32-bit platforms Reviewed-by: darcy, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/894 From david.holmes at oracle.com Wed Nov 18 05:31:11 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 18 Nov 2020 15:31:11 +1000 Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) In-Reply-To: References: Message-ID: <618aa897-18fd-fd70-1f0a-506e0e5a74d8@oracle.com> Hi Vicente, On 16/11/2020 11:36 pm, Vicente Romero wrote: > Please review the code for the second iteration of sealed classes. In this iteration we are: > > - Enhancing narrowing reference conversion to allow for stricter checking of cast conversions with respect to sealed type hierarchies. > - Also local classes are not considered when determining implicitly declared permitted direct subclasses of a sealed class or sealed interface The major change here seems to be that getPermittedSubclasses() now returns actual Class objects instead of ClassDesc. My recollection from earlier discussions here was that the use of ClassDesc was very deliberate as the permitted subclasses may not actually exist and there may be security concerns with returning them!
Cheers, David ----- > ------------- > > Commit messages: > - 8246778: Compiler implementation for Sealed Classes (Second Preview) > > Changes: https://git.openjdk.java.net/jdk/pull/1227/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1227&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8246778 > Stats: 589 lines in 12 files changed: 508 ins; 18 del; 63 mod > Patch: https://git.openjdk.java.net/jdk/pull/1227.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/1227/head:pull/1227 > > PR: https://git.openjdk.java.net/jdk/pull/1227 > From stuefe at openjdk.java.net Wed Nov 18 06:47:08 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 18 Nov 2020 06:47:08 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v9] In-Reply-To: <0oMeRRhZupog8tI80dsgVxx_luSw4HB1hloHTFrqHFQ=.48c17fde-f2a2-4ca9-b04b-14c0d503d0d7@github.com> References: <0oMeRRhZupog8tI80dsgVxx_luSw4HB1hloHTFrqHFQ=.48c17fde-f2a2-4ca9-b04b-14c0d503d0d7@github.com> Message-ID: On Tue, 17 Nov 2020 22:55:54 GMT, David Holmes wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> last tweaks > > src/hotspot/os/posix/signals_posix.cpp line 1721: > >> 1719: // initialize suspend/resume support - must do this before signal_sets_init() >> 1720: if (SR_initialize() != 0) { >> 1721: vm_exit_during_initialization("SR_initialize failed"); > > You've lost the failure reason (errno) that perror would have provided. SR_initialize only returns non-zero if sigaction fails, which leaves errno set - hence perror() would report it. That said there are only two errors for sigaction (EFAULT or EINVAL) so we really haven't lost anything that useful. > Arguably the vm_exit... should be inside SR_initialize. > But I'm fine with this as-is so we can conclude this PR.
Yes, but this relies on knowing the internal implementation of SR_initialize() (mainly that the last CRT call there was sigaction) and is different from all the other places where we do CRT calls. I'd rather add logging right there if the call fails like we do in other places. ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From stuefe at openjdk.java.net Wed Nov 18 06:55:11 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 18 Nov 2020 06:55:11 GMT Subject: RFR: 8253742: POSIX signal code cleanup [v9] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 15:59:19 GMT, Gerard Ziemski wrote: >> hi all, >> >> Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. This change addresses them all: >> >> #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code >> >> #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) >> >> #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) >> >> #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp >> >> #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include >> >> #6 Coleen's feedback - factored out print_signal_handlers() >> >> #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() >> >> #8 Thomas's feedback - factored out common POSIX signal initialization code >> >> #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API >> >> #10 YaSuenag's feedback - unified logging out of the scope for this fix >> >> #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? 
> > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > last tweaks AIX build went through, tests are fine so far. Looks good now. Ship it! ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/636 From njian at openjdk.java.net Wed Nov 18 07:00:04 2020 From: njian at openjdk.java.net (Ningsheng Jian) Date: Wed, 18 Nov 2020 07:00:04 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v5] In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 02:48:25 GMT, Dong Bo wrote: >> This adds support for the floating-point absolute difference instructions, i.e. FABD scalar/vector. >> >> Verified with linux-aarch64-server-release, tier1-3. >> >> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance testing. >> >> For FABD (scalar), the performance tests operate on registers directly, and the average latency drops to almost half (~57%) of the original. >> For FABD (vector), we restrict the data size (~24KB) to be less than the L1 data cache size (32KB), >> so that memory accesses hit in L1, and observe improvements of 14.2% (float) and 21.2% (double). >> >> The JMH results on Kunpeng916: >> >> Benchmark (count) (seed) Mode Cnt Score Error Units >> >> # before, fsub+fabs >> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 6038.333 ± 3.889 ns/op >> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 6005.125 ± 3.025 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 950.340 ± 9.398 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 454.350 ± 1.798 ns/op >> >> # after, fabd >> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 3483.801 ± 1.763 ns/op >> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 3442.412 ±
1.866 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 816.301 ± 4.454 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 354.710 ± 1.001 ns/op > > Dong Bo has updated the pull request incrementally with one additional commit since the last revision: > > add match rules back for AbsVB with vector length 4 LGTM ------------- PR: https://git.openjdk.java.net/jdk/pull/1215 From tschatzl at openjdk.java.net Wed Nov 18 08:24:04 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 18 Nov 2020 08:24:04 GMT Subject: Integrated: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 11:39:59 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that changes how archive regions are managed in general and specifically by the G1 collector, fixing the crashes caused by adding the module graph into the archive in [JDK-8244778](https://bugs.openjdk.java.net/browse/JDK-8244778)? > > Before the JDK-8244778 change, archived objects could always be assumed to be live, and so the G1 collector did so, not caring about the archive region's contents at all. With JDK-8244778 however, archived objects could die, and keep stale references to objects outside of the archive regions, which obviously causes crashes when walking these objects. > > With this change, open archive region contents are basically handled as any other objects; to support that, all open archive regions are now reachable via a single object array root. This hopefully also facilitates implementation in other collectors. > > This allows us to remove quite a bit of special handling in G1 too; the only difference is that open archive regions will generally not be collected unless they are completely empty: we do want to profit from the sharing across VMs as much as possible.
> > Testing: tier1-5, one or two 6-8 runs > > The appcds changes were done by @iklam. These changes are described in this document: https://wiki.openjdk.java.net/display/HotSpot/CDS+Archived+Heap+Improvements > > Thanks, > Thomas This pull request has now been integrated. Changeset: d3095605 Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/d3095605 Stats: 694 lines in 32 files changed: 464 ins; 110 del; 120 mod 8253081: G1 fails on stale objects in archived module graph in Open Archive regions Change the handling of Open Archive areas, instead of assuming that everything in there is live always, a root containing references to all live root objects is provided. Adapt G1 to handle Open Archive regions as any other old region apart from never compacting or evacuating them. Co-authored-by: Ioi Lam Reviewed-by: kbarrett, sjohanss, redestad ------------- PR: https://git.openjdk.java.net/jdk/pull/1163 From tschatzl at openjdk.java.net Wed Nov 18 08:24:02 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 18 Nov 2020 08:24:02 GMT Subject: RFR: 8253081: G1 fails on stale objects in archived module graph in Open Archive regions [v4] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 14:52:35 GMT, Kim Barrett wrote: >> Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: >> >> - sjohanss review >> - Remove code that "activates" dormant objects as now all active objects are reachable via the root object array > > Marked as reviewed by kbarrett (Reviewer). Thanks @kimbarrett @kstefanj @cl4es for the reviews. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1163 From aph at openjdk.java.net Wed Nov 18 09:35:07 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 18 Nov 2020 09:35:07 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v5] In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 02:48:25 GMT, Dong Bo wrote: >> This adds support for the floating-point absolute difference instructions, i.e. FABD scalar/vector. >> >> Verified with linux-aarch64-server-release, tier1-3. >> >> Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance testing. >> >> For FABD (scalar), the performance tests operate on registers directly, and the average latency drops to almost half (~57%) of the original. >> For FABD (vector), we restrict the data size (~24KB) to be less than the L1 data cache size (32KB), >> so that memory accesses hit in L1, and observe improvements of 14.2% (float) and 21.2% (double). >> >> The JMH results on Kunpeng916: >> >> Benchmark (count) (seed) Mode Cnt Score Error Units >> >> # before, fsub+fabs >> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 6038.333 ± 3.889 ns/op >> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 6005.125 ± 3.025 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 950.340 ± 9.398 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 454.350 ± 1.798 ns/op >> >> # after, fabd >> FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 3483.801 ± 1.763 ns/op >> FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 3442.412 ± 1.866 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 816.301 ± 4.454 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 354.710 ±
1.001 ns/op > > Dong Bo has updated the pull request incrementally with one additional commit since the last revision: > > add match rules back for AbsVB with vector length 4 Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1215 From dongbo at openjdk.java.net Wed Nov 18 09:57:05 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 18 Nov 2020 09:57:05 GMT Subject: RFR: 8256318: AArch64: Add support for floating-point absolute difference [v5] In-Reply-To: References: Message-ID: <3yFx-RkAorTPKIH3r6oL4zPs8KY660_v12DHdgPQKjM=.083458ce-24cc-4d3e-9308-4a9e47adea72@github.com> On Wed, 18 Nov 2020 09:32:45 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request incrementally with one additional commit since the last revision: >> >> add match rules back for AbsVB with vector length 4 > > Marked as reviewed by aph (Reviewer). @theRealAph @nsjian Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/1215 From kbarrett at openjdk.java.net Wed Nov 18 10:01:16 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 18 Nov 2020 10:01:16 GMT Subject: RFR: 8256516: Simplify clearing References Message-ID: Please review this simplification of jlr.Reference clearing by VM code. The function java_lang_ref_Reference::set_referent_raw was being used to clear the referent of Reference objects, and only for that purpose. This change replaces that function with java_lang_ref_Reference::clear_referent, which is much more obvious in intent. That change is then percolated up through callers in the obvious way. 
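Note on the refactoring described above: the idea is to replace a general setter that is only ever called to clear a field with an operation named for that intent. The following is a minimal Java analogue, not HotSpot code; `Ref`, `clearReferent` and the commented-out `setReferentRaw` are hypothetical names chosen only to mirror `set_referent_raw` and `clear_referent`.

```java
// Illustrative analogue only; none of these names exist in HotSpot.
class ClearReferentSketch {
    public static void main(String[] args) {
        Ref<String> ref = new Ref<>("payload");
        ref.clearReferent();
        System.out.println(ref.get()); // prints "null"
    }
}

final class Ref<T> {
    private T referent;

    Ref(T referent) {
        this.referent = referent;
    }

    T get() {
        return referent;
    }

    // Before (assumed shape): a raw setter whose only callers passed null.
    // void setReferentRaw(T value) { this.referent = value; }

    // After: the single operation callers needed, named for what it does.
    void clearReferent() {
        this.referent = null;
    }
}
```

With the narrower method, a caller cannot accidentally store a non-null value through the raw path, and each call site states its purpose directly.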
Testing: mach5 tier1 ------------- Commit messages: - replace set_referent with clear_referent Changes: https://git.openjdk.java.net/jdk/pull/1286/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1286&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256516 Stats: 32 lines in 7 files changed: 9 ins; 10 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/1286.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1286/head:pull/1286 PR: https://git.openjdk.java.net/jdk/pull/1286 From rkennke at openjdk.java.net Wed Nov 18 10:13:04 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 18 Nov 2020 10:13:04 GMT Subject: RFR: 8256516: Simplify clearing References In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 09:53:11 GMT, Kim Barrett wrote: > Please review this simplification of jlr.Reference clearing by VM code. > > The function java_lang_ref_Reference::set_referent_raw was being used to > clear the referent of Reference objects, and only for that purpose. This > change replaces that function with java_lang_ref_Reference::clear_referent, > which is much more obvious in intent. That change is then percolated up > through callers in the obvious way. > > Testing: > mach5 tier1 Looks good to me! Thanks! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1286 From shade at openjdk.java.net Wed Nov 18 10:13:06 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 18 Nov 2020 10:13:06 GMT Subject: RFR: 8256516: Simplify clearing References In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 09:53:11 GMT, Kim Barrett wrote: > Please review this simplification of jlr.Reference clearing by VM code. > > The function java_lang_ref_Reference::set_referent_raw was being used to > clear the referent of Reference objects, and only for that purpose. This > change replaces that function with java_lang_ref_Reference::clear_referent, > which is much more obvious in intent. 
That change is then percolated up > through callers in the obvious way. > > Testing: > mach5 tier1 Looks good, with a minor nit. src/hotspot/share/gc/shared/referenceProcessor.hpp line 123: > 121: > 122: // Apply the keep_alive function to the referent address. > 123: void make_referent_alive(); I wonder if moving this from the `.hpp` to `.cpp` has performance implications for callers. Maybe move to `.inline.hpp`? src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.cpp line 319: > 317: } else { > 318: // Clear referent > 319: reference_clear_referent(reference); Now I am looking at this code and wonder if we could just inline `reference_clear_referent` and `reference_set_next` in both Shenandoah and ZGC code. Probably something for a followup. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1286 From dongbo at openjdk.java.net Wed Nov 18 10:17:06 2020 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 18 Nov 2020 10:17:06 GMT Subject: Integrated: 8256318: AArch64: Add support for floating-point absolute difference In-Reply-To: References: Message-ID: On Sat, 14 Nov 2020 06:22:19 GMT, Dong Bo wrote: > This adds support for the floating-point absolute difference instructions, i.e. FABD scalar/vector. > > Verified with linux-aarch64-server-release, tier1-3. > > Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance testing. > > For FABD (scalar), the performance tests operate on registers directly, and the average latency drops to almost half (~57%) of the original. > For FABD (vector), we restrict the data size (~24KB) to be less than the L1 data cache size (32KB), > so that memory accesses hit in L1, and observe improvements of 14.2% (float) and 21.2% (double). > > The JMH results on Kunpeng916: > > Benchmark (count) (seed) Mode Cnt Score Error Units > > # before, fsub+fabs > FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 6038.333 ±
3.889 ns/op > FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 6005.125 ± 3.025 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 950.340 ± 9.398 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 454.350 ± 1.798 ns/op > > # after, fabd > FloatingScalarVectorAbsDiff.testScalarAbsDiffDouble 1024 316731 avgt 10 3483.801 ± 1.763 ns/op > FloatingScalarVectorAbsDiff.testScalarAbsDiffFloat 1024 316731 avgt 10 3442.412 ± 1.866 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 1024 316731 avgt 10 816.301 ± 4.454 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 1024 316731 avgt 10 354.710 ± 1.001 ns/op This pull request has now been integrated. Changeset: b0b9dd27 Author: Dong Bo Committer: Fei Yang URL: https://git.openjdk.java.net/jdk/commit/b0b9dd27 Stats: 806 lines in 24 files changed: 350 ins; 124 del; 332 mod 8256318: AArch64: Add support for floating-point absolute difference Reviewed-by: aph ------------- PR: https://git.openjdk.java.net/jdk/pull/1215 From mdoerr at openjdk.java.net Wed Nov 18 10:24:02 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 18 Nov 2020 10:24:02 GMT Subject: RFR: 8256479: [PPC64] C2 crashes when UseVectorByteReverseInstructionsPPC64 used without SuperwordUseVSX In-Reply-To: <2niGhUOD2rNsM5tuIeMr3Jhg4Vtot1LNS6_ffYl6egk=.c4dbfa4a-efa7-46c6-b8ad-7f0dd5bf7872@github.com> References: <2niGhUOD2rNsM5tuIeMr3Jhg4Vtot1LNS6_ffYl6egk=.c4dbfa4a-efa7-46c6-b8ad-7f0dd5bf7872@github.com> Message-ID: On Tue, 17 Nov 2020 19:45:03 GMT, Paul Sandoz wrote: >> C2 crashes on Power9 when using UseVectorByteReverseInstructionsPPC64 without SuperwordUseVSX. bytes_reverse_long_vecNode uses a Vector Register. See bug for more details. >> >> The VectorConversion tests can detect the issue. > > Test update looks good. Thanks for the reviews!
------------- PR: https://git.openjdk.java.net/jdk/pull/1262 From mdoerr at openjdk.java.net Wed Nov 18 10:24:03 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 18 Nov 2020 10:24:03 GMT Subject: Integrated: 8256479: [PPC64] C2 crashes when UseVectorByteReverseInstructionsPPC64 used without SuperwordUseVSX In-Reply-To: References: Message-ID: <7GXzltelAtPnU0xz_itolnVShUeo2qDXxqEqVpuZJQk=.ffd1caa2-a9fc-469d-8f55-65dcbf49b4f9@github.com> On Tue, 17 Nov 2020 15:27:29 GMT, Martin Doerr wrote: > C2 crashes on Power9 when using UseVectorByteReverseInstructionsPPC64 without SuperwordUseVSX. bytes_reverse_long_vecNode uses a Vector Register. See bug for more details. > > The VectorConversion tests can detect the issue. This pull request has now been integrated. Changeset: 97074969 Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/97074969 Stats: 18 lines in 2 files changed: 16 ins; 0 del; 2 mod 8256479: [PPC64] C2 crashes when UseVectorByteReverseInstructionsPPC64 used without SuperwordUseVSX Reviewed-by: goetz, psandoz ------------- PR: https://git.openjdk.java.net/jdk/pull/1262 From redestad at openjdk.java.net Wed Nov 18 10:25:06 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 18 Nov 2020 10:25:06 GMT Subject: RFR: 8256254: Convert vmIntrinsics::ID to enum class [v2] In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 23:16:17 GMT, Ioi Lam wrote: >> This PR follows the same style as https://github.com/openjdk/jdk/pull/276, except this time I am converting `vmIntrinsics::ID` to `vmIntrinsicID`: >> >> * Convert `vmIntrinsics::ID` to `enum class` to provide better type safety. >> * Also, put this enum class in the top level, so it can be forward-declared. I.e., `enum class vmIntrinsicID : int;`. This avoids excessive inclusion of vmIntrinsics.hpp and vmSymbols.hpp (which were included indirectly by almost every hotspot source file).
>> * vmIntrinsics.hpp: was included 805 times, now included 414 times >> * vmSymbols.hpp: was included 805 times, now included 394 times >> * Note: more #include reduction will be done in [JDK-8256424](https://bugs.openjdk.java.net/browse/JDK-8256424) >> >> Many files are changed, but most of the changes are minor >> * Added missing dependencies of vmSymbols.hpp and/or vmIntrinsics.hpp >> * safe conversion between vmIntrinsicID and integer types (see comments around `vmIntrinsics::as_int()`) >> >> Since we have a lot of references like `vmIntrinsics::_invokeGeneric`, I added aliases like >> >> static const vmIntrinsicID vmIntrinsics::_invokeGeneric = vmIntrinsicID::_invokeGeneric >> >> so we don't need to change over a thousand `vmIntrinsics::XXX` to `vmIntrinsicID::XXX`. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @cl4es reviews Marked as reviewed by redestad (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1237 From rraj at openjdk.java.net Wed Nov 18 10:41:12 2020 From: rraj at openjdk.java.net (Rohit Arul Raj) Date: Wed, 18 Nov 2020 10:41:12 GMT Subject: RFR: 8256536: Newer AMD 19h (EPYC) Processor family defaults Message-ID: This patch sets the following flags as defaults for newer AMD 19h (EPYC) processors. bool UseFPUForSpilling = true bool UseUnalignedLoadStores = true bool UseXMMForArrayCopy = true bool UseXMMForObjInit = true bool UseFastStosb = false bool AlignVector = false Additional testing: make run-test TEST="tier1 tier2" JTREG="JOBS=128" CONF=linux-x86_64-server-fastdebug Please review this change.
Thanks, Rohit ------------- Commit messages: - 8256536: Newer AMD 19h (EPYC) Processor family defaults Changes: https://git.openjdk.java.net/jdk/pull/1288/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1288&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256536 Stats: 12 lines in 1 file changed: 9 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1288.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1288/head:pull/1288 PR: https://git.openjdk.java.net/jdk/pull/1288 From tschatzl at openjdk.java.net Wed Nov 18 10:46:12 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 18 Nov 2020 10:46:12 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v7] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 19:39:13 GMT, Stefan Johansson wrote: >> Please review this change that implements concurrent uncommit for G1. >> >> **Summary** >> G1 currently checks if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. >> >> The calculations of how much to shrink the heap and when are not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps is the set of regions currently committed.
When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). >> >> Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added. >> >> One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. >> >> **Logging** >> To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: >> [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 >> ... >> [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms >> [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms >> ... >> [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms >> >> The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. 
>> >> On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: >> [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) >> [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) >> [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) >> [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) >> [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) >> [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) >> [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) >> >> **Testing** >> Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. >> >> I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are no significant changes. > > Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into 8236926-ccu > - Zoom feedback > - Albert review 2 > - Albert review > - Merge branch 'master' into 8236926-ccu > - Lock for small mapper and use BitMap parallel operations. > - Self review > - Simplified task > - Improved logging > - Test improvement > - ... and 5 more: https://git.openjdk.java.net/jdk/compare/3675653c...c354b1d8 Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/g1/g1CommittedRegionMap.hpp line 98: > 96: // Finds the next range of committable regions starting at offset. > 97: // This function must only be called when no inactive regions are > 98: // present and can be used to active more regions.
s/active/activate

src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 567:

> 565:
> 566: // Check if there is memory to uncommit and if so schedule a task to do it.
> 567: void uncommit_heap_if_necessary();

I would prefer if the method were called `uncommit_regions_if_necessary()` as this method does not uncommit the *heap* but just uncommittable regions.

src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 568:

> 566: // Check if there is memory to uncommit and if so schedule a task to do it.
> 567: void uncommit_heap_if_necessary();
> 568: uint uncommit_regions(uint region_limit);

Please add a comment like "// Immediately uncommits uncommittable regions."

src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 806:

> 804: HeapRegion::GrainWords * HeapWordSize * shrink_count);
> 805: // Explicit uncommit.
> 806: _hrm.uncommit_inactive_regions((uint) shrink_count);

Please let the code use the `G1CollectedHeap::uncommit_regions()` helper here to limit the references to `_hrm`.

src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 747:

> 745: HeapRegion* prev_last_region = NULL;
> 746: size_t size_used = 0;
> 747: size_t shrink_count = 0;

The code may as well define `shrink_count` as `uint` as the only use seems to cast it to `uint` anyway.

src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1312:

> 1310: shrink_bytes, aligned_shrink_bytes, shrunk_bytes);
> 1311: if (num_regions_removed > 0) {
> 1312: log_debug(gc, heap)("Regions ready for uncommit: %u", num_regions_removed);

Maybe keep with existing terminology here, i.e. `Uncommittable regions after shrink: %u`

src/hotspot/share/gc/g1/g1UncommitRegionTask.hpp line 58:

> 56:
> 57: public:
> 58: static void run();

I'd prefer if the method were named `schedule` (or something like `schedule_for_later` or maybe `enqueue` to not clash with the protected `schedule(jlong)` method) instead of `run` since imho `run` implies that it is actually executed.

src/hotspot/share/gc/g1/g1UncommitRegionTask.cpp line 128:

> 126: // No delay, reason to reschedule rather then to loop is to allow
> 127: // other tasks to run without waiting for a full uncommit cycle.
> 128: schedule(0);

Why not notify like in `run()`? Maybe refactor the code a bit to allow calling the same method to schedule the task and notify the service thread for immediate processing here.

src/hotspot/share/gc/g1/g1UncommitRegionTask.hpp line 42:

> 40: // while running on the service thread joined with the suspendible
> 41: // thread set.
> 42: bool _active;

The information in the comment is missing context on how scheduling and suspension work and how the `_active` flag interacts with `run()` requests. The MT safety seems an implementation detail.

I.e. it would be nice to know that:
* this task does its work in chunks, i.e. does not uncommit everything at once to avoid long stalls or issues with GCs interrupting it
* this flag prevents rescheduling the task when it has not uncommitted everything yet but another request (`run` call) comes in.

src/hotspot/share/gc/g1/g1UncommitRegionTask.cpp line 83:

> 81: _summary_duration += time;
> 82:
> 83: log_trace(gc, heap)("Concurrent Uncommit: " SIZE_FORMAT "%s, %u regions, %1.3fms",

Maybe use gc+heap+region here (and below) like other code; imho after all this task is kind of an extension of the `HeapRegionManager`. I did not look at the output though.

src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 1172:

> 1170: reset_at_marking_complete();
> 1171:
> 1172: _g1h->uncommit_heap_if_necessary();

I'd prefer if this call were placed below the `resize_heap_if_necessary` call unless there is a reason not to.

src/hotspot/share/gc/g1/g1FullCollector.cpp line 218:

> 216: _heap->print_heap_after_full_collection(scope()->heap_transition());
> 217:
> 218: _heap->uncommit_heap_if_necessary();

Maybe move this to the `gc_epilogue` or actually into `prepare_heap_for_mutators` where the shrinking occurs.

src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp line 81:

> 79: }
> 80:
> 81: bool committed_range(uint start_idx, size_t num_regions) {

I would prefer if these two methods were named more like a condition, i.e. `is_range_committed` and `is_range_uncommitted` (optionally exchanging `range` and `[un]committed`) as this makes the verification they are used in read better.

src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp line 92:

> 90:
> 91: virtual void commit_regions(uint start_idx, size_t num_regions, WorkGang* pretouch_gang) {
> 92: guarantee(uncommitted_range(start_idx, num_regions),

Not sure this (and the one in `uncommit_regions`) should be guarantees.

src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp line 138:

> 136: // - G1RegionToSpaceMapper::_region_commit_map;
> 137: // - G1PageBasedVirtualSpace::_committed (_storage.commit())
> 138: Mutex _lock;

Good comment! :)

src/hotspot/share/gc/g1/g1ServiceThread.hpp line 138:

> 136: // Notify a change to the service thread. Used to stop either
> 137: // stop the service or to force check for new tasks.
> 138: void notify();

The first "stop" is too much.

src/hotspot/share/gc/g1/g1CommittedRegionMap.hpp line 46:

> 44: };
> 45:
> 46: class G1CommittedRegionMap : public CHeapObj {

It would be nice to have a diagram of the region states and their transitions here.

src/hotspot/share/gc/g1/heapRegionManager.cpp line 181:

> 179: G1CollectedHeap::heap()->hr_printer()->commit(hr);
> 180: }
> 181: activate_regions(start, num_regions);

In this place there will be two messages from the HRPrinter for every region:
1) a COMMIT message and
2) an ACTIVATE message.

This is a bit confusing as in my understanding (that's why I asked for a region state diagram in the `G1CommittedRegionMap` which may as well be put in `HeapRegionManager`) the (typical) flow of states is Uncommitted->Committed/Active->Committed/Inactive->Uncommitted.

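
For illustration only, the state flow mentioned here (Uncommitted->Committed/Active->Committed/Inactive->Uncommitted, plus the re-activation of inactive regions described in the PR text) could be sketched roughly as below. This is a minimal model, not the code in the PR: `RegionState` and `is_valid_transition` are hypothetical names, and the real `G1CommittedRegionMap` tracks these states in bitmaps rather than per-region enums.

```cpp
#include <cassert>

// Hypothetical model of the region states discussed in the review.
// "Active" and "Inactive" both mean the backing memory is committed.
enum class RegionState { Uncommitted, Active, Inactive };

// Returns true if moving a region from `from` to `to` matches the
// flow described in the review comment.
constexpr bool is_valid_transition(RegionState from, RegionState to) {
  switch (from) {
    case RegionState::Uncommitted:
      return to == RegionState::Active;       // commit and activate
    case RegionState::Active:
      return to == RegionState::Inactive;     // deactivate when the heap shrinks
    case RegionState::Inactive:
      return to == RegionState::Uncommitted   // concurrent uncommit by the task
          || to == RegionState::Active;       // re-activate when the heap grows
  }
  return false;
}
```

In this model there is deliberately no direct Active->Uncommitted edge, which mirrors the idea that shrinking only makes regions inactive and the uncommit itself happens later. The PR text notes one remaining case (CDS initialization) where the uncommit is still done directly; the sketch does not model that.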
As mentioned, I'm not sure if it is a good idea to send two separate messages here; better rename the "Active" message to "Commit-Active" (and "Inactive" to "Commit-Inactive") instead imho, even if it's quite long (and drop `HRPrinter::commit()` completely) ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From tschatzl at openjdk.java.net Wed Nov 18 10:46:12 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 18 Nov 2020 10:46:12 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v7] In-Reply-To: References: Message-ID: On Fri, 13 Nov 2020 12:15:58 GMT, Stefan Johansson wrote: >> src/hotspot/share/gc/g1/g1UncommitRegionTask.cpp line 107: >> >>> 105: >>> 106: // Each execution is limited to uncommit at most 256M worth of regions. >>> 107: static const uint region_limit = (uint) (256 * M / G1HeapRegionSize); >> >> Why 256M? Better include some motivation in the comments. > > 256M is just a "reasonable" limit that I picked to get short enough invocations. I updated the comment a bit. One suggestion I have is to make this a static constant in the class declaration not dig it up somewhere in the code. ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From github.com+4146708+a74nh at openjdk.java.net Wed Nov 18 11:42:13 2020 From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward) Date: Wed, 18 Nov 2020 11:42:13 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v8] In-Reply-To: References: Message-ID: > The AArch64 port uses maybe_isb in places where an ISB might be required > because the code may have safepointed. These maybe_isbs are very conservative > and are used in many places are used when a safepoint has not happened. > > cross_modify_fence was added in common code to place a barrier in all the > places after a safepoint has occurred. All the uses of it are in common code, > yet it remains unimplemented on AArch64. 
> > This set of patches implements cross_modify_fence for AArch64 and reconsiders > every uses of maybe_isb, discarding many of them. In addition, it introduces > a new diagnostic option, which when enabled on AArch64 tests the correct > usage of the barriers. > > Advantage of this patch is threefold: > * Reducing the number of ISBs - giving a theoretical performance improvement. > * Use of common code instead of backend specific code. > * Additional test diagnostic options > > Patch 1: Split cross_modify_fence > ================================= > This is simply refactoring work split out to simplify the other two patches. > > instruction_fence() is provided by each target and simply places > a fence for the instruction stream. > > cross_modify_fence() is now a member of JavaThread and just calls > instruction_fence. This function will be extended in Patch 3. > > Patch 2: Use cross_modify_fence instead of maybe_isb > ==================================================== > > The [n] References refer to the comments for cross_modify_fence in > thread.hpp. > > This is all the existing uses of maybe_isb in the AArch64 target: > > 1) Instances of Java code calling a VM function > * This encapsulates the changes to: > ** MacroAssembler::call_VM_leaf_base() > ** generate_fast_get_int_field0() > ** stubGenerator_aarch64 generate_throw_exception() > ** sharedRuntime_aarch64 generate_handler_blob() > ** SharedRuntime::generate_resolve_blob() > ** C1 LIR_Assembler::rt_call > ** C1 StubAssembler::call_RT(): used by Used by generate_exception_throw, > generate_handle_exception, generate_code_for. > ** OptoRuntime::generate_exception_blob() > * Any changes will be caught due to calls to [2] or [3] by the VM function. > * Any calls that do not call [2] or [3] do not require an ISB. > * This patch is more optimal for these cases. 
> > 2) Instances of Java code calling a JNI function > * This encapsulates the changes to: > ** SharedRuntime::generate_native_wrapper() > ** TemplateInterpreterGenerator::generate_native_entry() > * A safepoint still in progress after the call with be caught by [4]. > * An ISB is still required for the case where there was a safepoint > but it completed during the call. This happens if the code doesn't > branch on safepoint_in_progress > * In the SharedRuntime version, the two possible calls to > reguard_yellow_pages and complete_monitor_unlocking_C are after the thread > goes back into it's original state, so are covered by [2] and [3], the > same as a normal VM call. > * This patch is only more optimal for the two post-JNI calls. > > 3) Patching functions > * This encapsulates the changes to: > ** patch_callers_callsite() (called by gen_c2i_adapter()) > * This results in code being patched, but does not safepoint > * Therefore an ISB is required. > * This patch introduces no change here. > > 4) C1 MacroAssembler::emit_static_call_stub() > * Calls ISB (not maybe_isb) > * By design, the patching doesn't require that the up-to-date > destination is required for proper functioning. > * However, the ISB makes it most likely that the new destination will > be picked up. > * This patch introduces no change here. > > Patch 3: Add cross modify fence verification > ============================================ > > The VerifyCrossModifyFence diagnostic flag enables confirmation to the correct > usage of instruction barriers. It can safely be enabled on any Java run. > > Enabling it will cause the following: > > * Once all threads have been brought to a safepoint, each thread will be > marked. > > * On a cross_modify_fence and safepoint_fence the mark for that thread > will be cleared. > > * On entry to a method and in a safepoint poll, then the thread is checked. > If it is marked, then the code will error. 
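
The three verification steps listed for Patch 3 can be modelled in a few lines. The following is only a sketch under stated assumptions: `ModelThread`, `mark_all_at_safepoint`, `model_cross_modify_fence` and `model_verify` are hypothetical names made up for this illustration, and the real patch works on `JavaThread` and emits the check in generated code rather than calling a C++ function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a Java thread; only carries the "mark".
struct ModelThread {
  bool requires_fence = false;
};

// Step 1: once all threads have been brought to a safepoint, mark each one.
inline void mark_all_at_safepoint(std::vector<ModelThread>& threads) {
  for (ModelThread& t : threads) {
    t.requires_fence = true;
  }
}

// Step 2: a cross_modify_fence (or safepoint_fence) clears the mark
// for the thread that executes it.
inline void model_cross_modify_fence(ModelThread& t) {
  t.requires_fence = false;
}

// Step 3: checked on method entry and in a safepoint poll.
// Returning false models the "code will error" case.
inline bool model_verify(const ModelThread& t) {
  return !t.requires_fence;
}
```

A thread that resumes after a safepoint without going through the fence still carries the mark, so the next check fails for that thread only; other threads are unaffected.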
Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - Merge master 2020/11/18 Change-Id: I256b857ca275a8806febc1b9dc5412aac6d862a7 CustomizedGitHooks: yes - Enable VerifyCrossModifyFence for debug aarch64 - Remove An instruction sync is required comments - Update cross_modify_fence comment - Update comments & remove ifdef Change-Id: Ibbe45650d351d8cff6fbf7a7c8baf30afbdac17c CustomizedGitHooks: yes - Merge master 2020/11/12 Change-Id: I73323c90765bf8524f12f680abde7e7e5b3bb898 CustomizedGitHooks: yes - Merge master Change-Id: I97df4e7686699478f0f89451ec0a3537d38cfd6d - Merge master Change-Id: I5e1715fdb11305191fe7bf86cbfb7a6da446b3dc - Remove inlasm_isb define Change-Id: I2d0ef8a78292dac875f3f65d2253981cdb7a497a - AArch64: Add cross modify fence verification - ... and 2 more: https://git.openjdk.java.net/jdk/compare/f7f34474...499d7063 ------------- Changes: https://git.openjdk.java.net/jdk/pull/428/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=428&range=07 Stats: 141 lines in 25 files changed: 94 ins; 11 del; 36 mod Patch: https://git.openjdk.java.net/jdk/pull/428.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/428/head:pull/428 PR: https://git.openjdk.java.net/jdk/pull/428 From github.com+4146708+a74nh at openjdk.java.net Wed Nov 18 11:52:03 2020 From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward) Date: Wed, 18 Nov 2020 11:52:03 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v3] In-Reply-To: References: Message-ID: On Thu, 12 Nov 2020 15:57:23 GMT, Andrew Haley wrote: >> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - AArch64: Add cross modify fence verification >> - AArch64: Use cross_modify_fence instead of maybe_isb >> - Split cross_modify_fence > > Still pending. > > > The problem is it massively slows down a run. 
A tier1 test run for fastdebug went from 1h 32m 58s to > > 3h 43m 47s. I didn't think that would be acceptable. > > But why is it so expensive? All it does is mark the threads at a safepoint > and later check the mark at safepoints. It's not as if it's doing anything > much, but you're telling me it is more expensive than everything else put > together. > Ok, please ignore my comment about slowdowns. Turns out that our testing infrastructure had been scheduling onto different boxes. Once forced onto a single machine, I see no real difference in a complete tier 1 run with and without VerifyCrossModifyFence. (With the slower boxes taking ~4 hours and a quicker one taking ~1 hours) Patch updated to enable VerifyCrossModifyFence always for aarch64 debug. > > I wanted to avoid mentioning code that no longer exists. (Maybe it's best to just drop the comment?) > > The comment only makes sense in the context of the code that was there > before. > > > How about: > > // When we return from the VM, the instruction stream may have > > // been modified. Therefore needs an isb is required. The VM will > > // have already done this by calling cross_modify_fence(). > > This is self contradicting: firstly you say and ISB is required, then > you say why it isn't. Agreed. The comment only makes sense when it refers to code that used to exist. And I really dislike comments that do that. So, I've dropped them completely. > // Finally, we define an "instruction_fence" operation, which ensures that all > // instructions that come after the fence in program order are fetched > // from the cache or memory after the fence has completed Done. 
------------- PR: https://git.openjdk.java.net/jdk/pull/428 From github.com+4146708+a74nh at openjdk.java.net Wed Nov 18 12:00:21 2020 From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward) Date: Wed, 18 Nov 2020 12:00:21 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v9] In-Reply-To: References: Message-ID: > The AArch64 port uses maybe_isb in places where an ISB might be required > because the code may have safepointed. These maybe_isbs are very conservative > and are used in many places are used when a safepoint has not happened. > > cross_modify_fence was added in common code to place a barrier in all the > places after a safepoint has occurred. All the uses of it are in common code, > yet it remains unimplemented on AArch64. > > This set of patches implements cross_modify_fence for AArch64 and reconsiders > every uses of maybe_isb, discarding many of them. In addition, it introduces > a new diagnostic option, which when enabled on AArch64 tests the correct > usage of the barriers. > > Advantage of this patch is threefold: > * Reducing the number of ISBs - giving a theoretical performance improvement. > * Use of common code instead of backend specific code. > * Additional test diagnostic options > > Patch 1: Split cross_modify_fence > ================================= > This is simply refactoring work split out to simplify the other two patches. > > instruction_fence() is provided by each target and simply places > a fence for the instruction stream. > > cross_modify_fence() is now a member of JavaThread and just calls > instruction_fence. This function will be extended in Patch 3. > > Patch 2: Use cross_modify_fence instead of maybe_isb > ==================================================== > > The [n] References refer to the comments for cross_modify_fence in > thread.hpp. 
> > This is all the existing uses of maybe_isb in the AArch64 target: > > 1) Instances of Java code calling a VM function > * This encapsulates the changes to: > ** MacroAssembler::call_VM_leaf_base() > ** generate_fast_get_int_field0() > ** stubGenerator_aarch64 generate_throw_exception() > ** sharedRuntime_aarch64 generate_handler_blob() > ** SharedRuntime::generate_resolve_blob() > ** C1 LIR_Assembler::rt_call > ** C1 StubAssembler::call_RT(): used by Used by generate_exception_throw, > generate_handle_exception, generate_code_for. > ** OptoRuntime::generate_exception_blob() > * Any changes will be caught due to calls to [2] or [3] by the VM function. > * Any calls that do not call [2] or [3] do not require an ISB. > * This patch is more optimal for these cases. > > 2) Instances of Java code calling a JNI function > * This encapsulates the changes to: > ** SharedRuntime::generate_native_wrapper() > ** TemplateInterpreterGenerator::generate_native_entry() > * A safepoint still in progress after the call with be caught by [4]. > * An ISB is still required for the case where there was a safepoint > but it completed during the call. This happens if the code doesn't > branch on safepoint_in_progress > * In the SharedRuntime version, the two possible calls to > reguard_yellow_pages and complete_monitor_unlocking_C are after the thread > goes back into it's original state, so are covered by [2] and [3], the > same as a normal VM call. > * This patch is only more optimal for the two post-JNI calls. > > 3) Patching functions > * This encapsulates the changes to: > ** patch_callers_callsite() (called by gen_c2i_adapter()) > * This results in code being patched, but does not safepoint > * Therefore an ISB is required. > * This patch introduces no change here. > > 4) C1 MacroAssembler::emit_static_call_stub() > * Calls ISB (not maybe_isb) > * By design, the patching doesn't require that the up-to-date > destination is required for proper functioning. 
> * However, the ISB makes it most likely that the new destination will > be picked up. > * This patch introduces no change here. > > Patch 3: Add cross modify fence verification > ============================================ > > The VerifyCrossModifyFence diagnostic flag enables confirmation to the correct > usage of instruction barriers. It can safely be enabled on any Java run. > > Enabling it will cause the following: > > * Once all threads have been brought to a safepoint, each thread will be > marked. > > * On a cross_modify_fence and safepoint_fence the mark for that thread > will be cleared. > > * On entry to a method and in a safepoint poll, then the thread is checked. > If it is marked, then the code will error. Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Fix global flags indentation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/428/files - new: https://git.openjdk.java.net/jdk/pull/428/files/499d7063..56429ca8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=428&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=428&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/428.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/428/head:pull/428 PR: https://git.openjdk.java.net/jdk/pull/428 From david.holmes at oracle.com Wed Nov 18 13:35:22 2020 From: david.holmes at oracle.com (David Holmes) Date: Wed, 18 Nov 2020 23:35:22 +1000 Subject: RFR: 8256536: Newer AMD 19h (EPYC) Processor family defaults In-Reply-To: References: Message-ID: <8a6c0481-fbd5-7a93-2abb-61a5eee4f3ce@oracle.com> Hi Rohit, On 18/11/2020 8:41 pm, Rohit Arul Raj wrote: > This patch sets the following flags as defaults for newer AMD 19h (EPYC) processors. 
> bool UseFPUForSpilling = true > bool UseUnalignedLoadStores = true > bool UseXMMForArrayCopy = true > bool UseXMMForObjInit = true > bool UseFastStosb = false > bool AlignVector = false I assume you have performance numbers to justify/motivate this change. Can you please provide some details in the bug report. Thanks, David > Additional testing: make run-test TEST="tier1 tier2" JTREG="JOBS=128" CONF=linux-x86_64-server-fastdebug > > Please review this change. > > Thanks, > Rohit > > ------------- > > Commit messages: > - 8256536: Newer AMD 19h (EPYC) Processor family defaults > > Changes: https://git.openjdk.java.net/jdk/pull/1288/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1288&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8256536 > Stats: 12 lines in 1 file changed: 9 ins; 0 del; 3 mod > Patch: https://git.openjdk.java.net/jdk/pull/1288.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/1288/head:pull/1288 > > PR: https://git.openjdk.java.net/jdk/pull/1288 > From rkennke at openjdk.java.net Wed Nov 18 13:47:19 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 18 Nov 2020 13:47:19 GMT Subject: RFR: 8256497: Zero: enable G1 and Shenandoah GCs In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 19:02:03 GMT, Aleksey Shipilev wrote: > Following the [JDK-8255796](https://bugs.openjdk.java.net/browse/JDK-8255796) improvement that ditched the inline contiguous alloc use from Zero, we can now rely on GC interface to hook the GCs properly. G1 and Shenandoah are a bit special here, because they require special `Reference.get` handling. > > Note that it does not change the default GC for Zero, because Zero is implicitly `NeverActAsServerMachine`, which still selects Serial GC by default. After this change, Zero users can opt-in to G1 or Shenandoah. 
> > Additional testing: > - [x] Linux x86_64 Zero fastdebug `hotspot_gc_shenandoah` (some lingering failures about non-enabled compressed oops) > - [ ] Linux x86_64 Zero fastdebug `tier1` with `-XX:+UseG1GC` Looks good to me! Thanks! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1268 From shade at openjdk.java.net Wed Nov 18 13:47:19 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 18 Nov 2020 13:47:19 GMT Subject: RFR: 8256497: Zero: enable G1 and Shenandoah GCs Message-ID: Following the [JDK-8255796](https://bugs.openjdk.java.net/browse/JDK-8255796) improvement that ditched the inline contiguous alloc use from Zero, we can now rely on GC interface to hook the GCs properly. G1 and Shenandoah are a bit special here, because they require special `Reference.get` handling. Note that it does not change the default GC for Zero, because Zero is implicitly `NeverActAsServerMachine`, which still selects Serial GC by default. After this change, Zero users can opt-in to G1 or Shenandoah. 
Additional testing: - [x] Linux x86_64 Zero fastdebug `hotspot_gc_shenandoah` (some lingering failures about non-enabled compressed oops) - [ ] Linux x86_64 Zero fastdebug `tier1` with `-XX:+UseG1GC` ------------- Commit messages: - Remove TODO - 8256497: Zero: enable G1 and Shenandoah GCs Changes: https://git.openjdk.java.net/jdk/pull/1268/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1268&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256497 Stats: 72 lines in 5 files changed: 54 ins; 16 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1268.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1268/head:pull/1268 PR: https://git.openjdk.java.net/jdk/pull/1268 From coleenp at openjdk.java.net Wed Nov 18 14:16:09 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 18 Nov 2020 14:16:09 GMT Subject: Integrated: 8256365: Clean up vtable initialization code In-Reply-To: References: Message-ID: <2cW48Vv25aWZo6HSUH749tfAQA0VV8jRGctEcu8gUA8=.13cf3ac4-d37b-4370-9bcf-0e0e3353738e@github.com> On Mon, 16 Nov 2020 20:38:22 GMT, Coleen Phillimore wrote: > I was looking through this code because of JDK-8061949 and want to do some minor cleanups. > 1. There's a function in the wrong place (is_override) > 2. methodHandles that use mh()->is_native(), with extra (), > 3. some methods declared with TRAPS, that don't trap > 4. some multi-clause conditionals with confusing formatting > 5. extra InstanceKlass::cast() casts > 6. some useless asserts > 7. and potentially a bug with RedefineClasses where the method being added to the vtable may have been redefined in the constraint verification call. (noreg-hard) > > Tested with tier1-3. This pull request has now been integrated. 
Changeset: fa8dce4f Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/fa8dce4f Stats: 131 lines in 4 files changed: 30 ins; 37 del; 64 mod 8256365: Clean up vtable initialization code Reviewed-by: eosterlund, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/1236 From gziemski at openjdk.java.net Wed Nov 18 15:32:07 2020 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Wed, 18 Nov 2020 15:32:07 GMT Subject: Integrated: 8253742: POSIX signal code cleanup In-Reply-To: References: Message-ID: On Tue, 13 Oct 2020 14:19:02 GMT, Gerard Ziemski wrote: > hi all, > > Please review this followup to [JDK-8252324 Signal related code should be shared among POSIX platforms](https://bugs.openjdk.java.net/browse/JDK-8252324), where several issues were identified for a cleanup. This change addresses them all: > > #1 David's feedback - removed non POSIX SIGNIFICANT_SIGNAL_MASK code > > #2 David's feedback - used unblock_program_error_signals() on all platforms (reverted for JDK-8252533) > > #3 David's feedback - used single JVM_handle_posix_signal API for all POSIX platforms (reverted for JDK-8255711) > > #4 Coleen's feedback - cleanup header files in src/hotspot/os/posix/signals_posix.hpp > > #5 Coleen's feedback - hid SR_signum assignment in src/hotspot/os/posix/signals_posix.hpp to avoid having to include > > #6 Coleen's feedback - factored out print_signal_handlers() > > #7 Thomas' feedback - factored out common POSIX os::SuspendedThreadTask::internal_do_task() > > #8 Thomas's feedback - factored out common POSIX signal initialization code > > #9 YaSuenag's feedback - used JVM_handle_posix_signal for the common API > > #10 YaSuenag's feedback - unified logging out of the scope for this fix > > #11 YaSuenag's feedback - memset usage in PosixSignals::jdk_misc_signal_init() correct? This pull request has now been integrated. 
Changeset: 50a2c22f Author: Gerard Ziemski URL: https://git.openjdk.java.net/jdk/commit/50a2c22f Stats: 327 lines in 20 files changed: 65 ins; 172 del; 90 mod 8253742: POSIX signal code cleanup Reviewed-by: stuefe, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/636 From mbeckwit at openjdk.java.net Wed Nov 18 17:04:04 2020 From: mbeckwit at openjdk.java.net (Monica Beckwith) Date: Wed, 18 Nov 2020 17:04:04 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v9] In-Reply-To: References: Message-ID: <5Gv2io-Pf6IKjd2gAYl2G02cBqpX5hX9tw52j0gLP8s=.7e4e4b43-908b-4f4d-a3eb-347f8719af75@github.com> On Wed, 18 Nov 2020 12:00:21 GMT, Alan Hayward wrote: >> The AArch64 port uses maybe_isb in places where an ISB might be required >> because the code may have safepointed. These maybe_isbs are very conservative >> and are used in many places are used when a safepoint has not happened. >> >> cross_modify_fence was added in common code to place a barrier in all the >> places after a safepoint has occurred. All the uses of it are in common code, >> yet it remains unimplemented on AArch64. >> >> This set of patches implements cross_modify_fence for AArch64 and reconsiders >> every uses of maybe_isb, discarding many of them. In addition, it introduces >> a new diagnostic option, which when enabled on AArch64 tests the correct >> usage of the barriers. >> >> Advantage of this patch is threefold: >> * Reducing the number of ISBs - giving a theoretical performance improvement. >> * Use of common code instead of backend specific code. >> * Additional test diagnostic options >> >> Patch 1: Split cross_modify_fence >> ================================= >> This is simply refactoring work split out to simplify the other two patches. >> >> instruction_fence() is provided by each target and simply places >> a fence for the instruction stream. >> >> cross_modify_fence() is now a member of JavaThread and just calls >> instruction_fence. 
This function will be extended in Patch 3. >> >> Patch 2: Use cross_modify_fence instead of maybe_isb >> ==================================================== >> >> The [n] References refer to the comments for cross_modify_fence in >> thread.hpp. >> >> This is all the existing uses of maybe_isb in the AArch64 target: >> >> 1) Instances of Java code calling a VM function >> * This encapsulates the changes to: >> ** MacroAssembler::call_VM_leaf_base() >> ** generate_fast_get_int_field0() >> ** stubGenerator_aarch64 generate_throw_exception() >> ** sharedRuntime_aarch64 generate_handler_blob() >> ** SharedRuntime::generate_resolve_blob() >> ** C1 LIR_Assembler::rt_call >> ** C1 StubAssembler::call_RT(): used by Used by generate_exception_throw, >> generate_handle_exception, generate_code_for. >> ** OptoRuntime::generate_exception_blob() >> * Any changes will be caught due to calls to [2] or [3] by the VM function. >> * Any calls that do not call [2] or [3] do not require an ISB. >> * This patch is more optimal for these cases. >> >> 2) Instances of Java code calling a JNI function >> * This encapsulates the changes to: >> ** SharedRuntime::generate_native_wrapper() >> ** TemplateInterpreterGenerator::generate_native_entry() >> * A safepoint still in progress after the call with be caught by [4]. >> * An ISB is still required for the case where there was a safepoint >> but it completed during the call. This happens if the code doesn't >> branch on safepoint_in_progress >> * In the SharedRuntime version, the two possible calls to >> reguard_yellow_pages and complete_monitor_unlocking_C are after the thread >> goes back into it's original state, so are covered by [2] and [3], the >> same as a normal VM call. >> * This patch is only more optimal for the two post-JNI calls. 
>> >> 3) Patching functions >> * This encapsulates the changes to: >> ** patch_callers_callsite() (called by gen_c2i_adapter()) >> * This results in code being patched, but does not safepoint >> * Therefore an ISB is required. >> * This patch introduces no change here. >> >> 4) C1 MacroAssembler::emit_static_call_stub() >> * Calls ISB (not maybe_isb) >> * By design, the patching doesn't require that the up-to-date >> destination is required for proper functioning. >> * However, the ISB makes it most likely that the new destination will >> be picked up. >> * This patch introduces no change here. >> >> Patch 3: Add cross modify fence verification >> ============================================ >> >> The VerifyCrossModifyFence diagnostic flag enables confirmation to the correct >> usage of instruction barriers. It can safely be enabled on any Java run. >> >> Enabling it will cause the following: >> >> * Once all threads have been brought to a safepoint, each thread will be >> marked. >> >> * On a cross_modify_fence and safepoint_fence the mark for that thread >> will be cleared. >> >> * On entry to a method and in a safepoint poll, then the thread is checked. >> If it is marked, then the code will error. > > Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: > > Fix global flags indentation Could you please incorporate the following orderAccess changes for windows_aarch64.hpp in your PR?: https://gist.github.com/mo-beck/2dc66d068741030b4a422b57607bea8e#file-orderaccess_windows_aarch64_hpp-diff ------------- Changes requested by mbeckwit (Author). 
PR: https://git.openjdk.java.net/jdk/pull/428 From github.com+4146708+a74nh at openjdk.java.net Wed Nov 18 17:20:08 2020 From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward) Date: Wed, 18 Nov 2020 17:20:08 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v9] In-Reply-To: <5Gv2io-Pf6IKjd2gAYl2G02cBqpX5hX9tw52j0gLP8s=.7e4e4b43-908b-4f4d-a3eb-347f8719af75@github.com> References: <5Gv2io-Pf6IKjd2gAYl2G02cBqpX5hX9tw52j0gLP8s=.7e4e4b43-908b-4f4d-a3eb-347f8719af75@github.com> Message-ID: On Wed, 18 Nov 2020 17:01:16 GMT, Monica Beckwith wrote: > Could you please incorporate the following orderAccess changes for windows_aarch64.hpp in your PR?: https://gist.github.com/mo-beck/2dc66d068741030b4a422b57607bea8e#file-orderaccess_windows_aarch64_hpp-diff Happy to do this. The file has been added since I started the patch, so that's fine. But as a disclaimer for myself, I've no way of testing it, and I'm going to have to assume it just works for Windows AArch64. ------------- PR: https://git.openjdk.java.net/jdk/pull/428 From coleenp at openjdk.java.net Wed Nov 18 17:33:04 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 18 Nov 2020 17:33:04 GMT Subject: RFR: JDK-8255544: Create a checked cast [v2] In-Reply-To: References: Message-ID: On Sat, 31 Oct 2020 14:02:07 GMT, Andrew Haley wrote: >> In many places we've added C-style casts to silence compiler warnings, for example when truncating a size_t to an int when we know the size_t is a small struct. Such casts are inherently risky, because they effectively disable useful compiler warnings. We should add a form of cast that checks at runtime that a truncation does not overflow. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8255544: Create a checked cast Looks good to me. ------------- Marked as reviewed by coleenp (Reviewer). 
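As a rough illustration of the idea quoted above for JDK-8255544 (a cast that checks at runtime that a truncation does not overflow), here is a minimal sketch. The names `fits_in` and `checked_cast_sketch` are hypothetical and are not taken from the pull request; the sketch uses the standard `assert` rather than HotSpot's own assertion macros, and it assumes integral source and destination types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the PR): returns true if `value` can be
// converted to type To without changing its numeric value.
template <typename To, typename From>
bool fits_in(From value) {
  To narrowed = static_cast<To>(value);
  // The round trip must give back the original value, and the conversion
  // must not flip the sign (e.g. -1 converted to an unsigned type).
  return static_cast<From>(narrowed) == value &&
         ((narrowed > To{}) == (value > From{}));
}

// Hypothetical checked cast: behaves like static_cast, but asserts in
// debug builds that the conversion preserved the value.
template <typename To, typename From>
To checked_cast_sketch(From value) {
  assert(fits_in<To>(value) && "checked cast would change the value");
  return static_cast<To>(value);
}
```

A call such as `checked_cast_sketch<int>(sizeof(some_small_struct))` would then replace a bare C-style cast, so the compiler warning stays silenced while the assumption that the value is small is still verified in debug builds.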
PR: https://git.openjdk.java.net/jdk/pull/904 From github.com+51754783+coreyashford at openjdk.java.net Wed Nov 18 17:38:01 2020 From: github.com+51754783+coreyashford at openjdk.java.net (Corey Ashford) Date: Wed, 18 Nov 2020 17:38:01 GMT Subject: RFR: 8256479: [PPC64] C2 crashes when UseVectorByteReverseInstructionsPPC64 used without SuperwordUseVSX In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 15:27:29 GMT, Martin Doerr wrote: > C2 crashes on Power9 when using UseVectorByteReverseInstructionsPPC64 without SuperwordUseVSX. bytes_reverse_long_vecNode uses a Vector Register. See bug for more details. > > The VectorConversion tests can detect the issue. Looks good. I don't see any problems with this. ------------- PR: https://git.openjdk.java.net/jdk/pull/1262 From pliden at openjdk.java.net Wed Nov 18 18:03:01 2020 From: pliden at openjdk.java.net (Per Liden) Date: Wed, 18 Nov 2020 18:03:01 GMT Subject: RFR: 8256516: Simplify clearing References In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 09:53:11 GMT, Kim Barrett wrote: > Please review this simplification of jlr.Reference clearing by VM code. > > The function java_lang_ref_Reference::set_referent_raw was being used to > clear the referent of Reference objects, and only for that purpose. This > change replaces that function with java_lang_ref_Reference::clear_referent, > which is much more obvious in intent. That change is then percolated up > through callers in the obvious way. > > Testing: > mach5 tier1 Looks good! ------------- Marked as reviewed by pliden (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1286 From pliden at openjdk.java.net Wed Nov 18 18:03:04 2020 From: pliden at openjdk.java.net (Per Liden) Date: Wed, 18 Nov 2020 18:03:04 GMT Subject: RFR: 8256516: Simplify clearing References In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 10:10:33 GMT, Aleksey Shipilev wrote: >> Please review this simplification of jlr.Reference clearing by VM code. 
>> >> The function java_lang_ref_Reference::set_referent_raw was being used to >> clear the referent of Reference objects, and only for that purpose. This >> change replaces that function with java_lang_ref_Reference::clear_referent, >> which is much more obvious in intent. That change is then percolated up >> through callers in the obvious way. >> >> Testing: >> mach5 tier1 > > src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.cpp line 319: > >> 317: } else { >> 318: // Clear referent >> 319: reference_clear_referent(reference); > > Now I am looking at this code and wonder if we could just inline `reference_clear_referent` and `reference_set_next` both in Shenandoah and ZGC code. Probably something for a followup. I'd prefer to keep the `reference_*` helper functions as is in ZGC. ------------- PR: https://git.openjdk.java.net/jdk/pull/1286 From github.com+168222+mgkwill at openjdk.java.net Wed Nov 18 19:18:09 2020 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Wed, 18 Nov 2020 19:18:09 GMT Subject: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap In-Reply-To: <-xtX9qSJHuD-qfp52XPToKhkl1HypRmNFHCJaupaync=.99285cd9-69a6-42cd-84b4-3c87fefc2cc5@github.com> References: <-xtX9qSJHuD-qfp52XPToKhkl1HypRmNFHCJaupaync=.99285cd9-69a6-42cd-84b4-3c87fefc2cc5@github.com> Message-ID: On Thu, 12 Nov 2020 15:36:13 GMT, Thomas Stuefe wrote: >> Use 2m pages for executable large >> pages and in large page requests less >> than 1g on linux. 
>> >> - Add os::exec_large_page_size() that >> returns 2m as size >> - Add os::select_large_page_size() to return >> correct large page size for size_t bytes >> - Add 2m size to _page_sizes array >> - Update reserve_memory_special methods >> to set/use large_page_size based on exec >> size >> - Update large page not reserved warnings >> to include large_page_size attempted >> - Update TestLargePageUseForAuxMemory.java >> to expect 2m large pages in some instances >> >> Signed-off-by: Marcus G K Williams > > Hi, > > this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping. > > Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures. > > I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec. > > What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :) > > Why is this proposal hard coded to 2M pages? > > What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other". > > One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size? > > What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM? 
> > The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages. > > For SHM, I think you need to make sure that alignment matches SHMLBA? > > It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR. > > Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder). > > The linux-2m-page-specific code in the platform-generic G1 test seems wrong. > > Cheers, Thomas Hi Thomas, Thanks so much for your review. Please bear with me as this if my first patch to JDK community. But I have pushed patches to other open source communities (OpenDaylight, ONAP, OpenStack) and worked as a committer in some. **Responses below inline:** > Hi, > > this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping. > > Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures. > I appreciate the feedback. 
Perhaps the lack of detail in the pull request/JDK issue is a function of my narrow focus on the specific purpose and lack of understanding about how much detail is normally included. The purpose of the patch/issue is to enable code hugepage memory reservations on Linux when the JDK is configured with 1G hugepages (LargePages in JDK parlance). To my knowledge, in most cases code memory is currently reserved in the default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants the default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations. > I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec. > > What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :) I'd appreciate any advice on writing a less vague spec. I have used exec as a stand-in for code memory reservations in my descriptions, mostly due to the fact that a 'bool exec' is used in functions that reserve HugePages and this later is translated into 'PROT_EXEC' when mmap is called; "exec" is passed in but not used in SHM. These are the particular memory reservations we wanted the patch to affect when using 1G LargePages. However I will remove those references if unwarranted. > > Why is this proposal hard coded to 2M pages? > To my knowledge 2M is the smallest large page size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages.
Also it was an implementation suggestion from some Oracle engineers on the topic, though my implementation of those suggestions could be suspect. Perhaps we should detect the smallest large page size in _page_sizes array that will fit the requested memory amount. However populating _page_sizes array is complicated by the fact that the current jdk code relies heavily on the default large page size, almost as if that is the only size that can be used. 2M large page sizes are the default default_large_page_size in Linux and should be available even if one configures the default_large_page_size to 1G. > What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other". > Code is the target memory region. However there are some other instances where large_page reservation is happening due to the addition of 2M pages as an option. Some calls fail and error when adding 2M pages to _page_sizes array in the company of 1G pages. See line https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR3790 This is where 2m pages are added. However at https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4077 we get failures after the addition of 2M sizes to _page_sizes array, due to some smaller reservations that happen regardless of the LargePageSizeInBytes=1G So in https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4229 we make sure that we select the largest page size that works with the bytes requested for reservation. Perhaps we shouldn't have exec as a special case, as the large page returned will be the same based on the size requested. 
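To make the selection rule described above concrete (pick the largest page size that works with the number of bytes requested, independent of `exec`), here is a minimal sketch. It is not the code from the patch: `select_page_size_sketch` is a hypothetical name, and the sketch assumes the candidate sizes are supplied in descending order with the base page size last, which may differ from how `_page_sizes` is actually populated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the function from the patch).
// `page_sizes_descending` is assumed to be non-empty and sorted from the
// largest page size to the smallest, ending with the base page size.
std::size_t select_page_size_sketch(
    const std::vector<std::size_t>& page_sizes_descending,
    std::size_t bytes) {
  assert(!page_sizes_descending.empty());
  for (std::size_t page_size : page_sizes_descending) {
    // The first (that is, largest) size that fits within the request wins.
    if (page_size <= bytes) {
      return page_size;
    }
  }
  // The request is smaller than every page size: fall back to the smallest.
  return page_sizes_descending.back();
}
```

With candidate sizes of 1G, 2M and 4K, a 240M code cache reservation would select 2M and a 3G heap reservation would select 1G, so under these assumptions no special case for `exec` is needed.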
> One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size? > > What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM? > My architecture knowledge outside of x86_64 is limited. I've been looking into this and thinking about it and I will have some more comments in the next day or so. For UseSHM I ran some negative tests but I will do some more rigorous testing and report back. > The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages. > > For SHM, I think you need to make sure that alignment matches SHMLBA? Looking into this. > > It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR. > I'm open to removing exec references and just enabling multiple page sizes which would allow for 2M pages to be used by code memory reservations. > Finally, comments would be nice. Clear API specs. 
Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder). > Thanks. I will push an updated patch w/ comments. Will attempt clear API spec and regression tests for reserve_memory_special but will need some guidance on those. > The linux-2m-page-specific code in the platform-generic G1 test seems wrong. Any advice here? My change specifically changes the behavior of the pages returned in the test for linux platforms but should not have effects on other platforms. I don't know how this would generally happen for JDK tests in this case. It seems to me that the JDK will act differently on different platforms. How is this normally handled? > > Cheers, Thomas Thanks again for the review. > src/hotspot/os/linux/os_linux.cpp line 3970: > >> 3968: char* req_addr, bool exec) { >> 3969: size_t large_page_size; >> 3970: large_page_size = os::select_large_page_size(bytes, exec); > > The "os" is shared and platform generic. Please don't add anything there unless you write (and test as much as possible) the different platforms. I do not see why this API should even be exported from this unit. Understood. Will move from src/hotspot/share/runtime/os.hpp to src/hotspot/os/linux/os_linux.hpp . > src/hotspot/os/linux/os_linux.cpp line 4002: > >> 4000: char msg[128]; >> 4001: jio_snprintf(msg, sizeof(msg), "Failed to reserve shared memory with large_page_size: " SIZE_FORMAT ".", large_page_size); >> 4002: shm_warning_format_with_errno("%s", msg); > > Why the double printf here? But you can just use Unified Logging ` log_info(os)("..") `. See e.g. thread creation in this file for examples. Looking into this.
> test/hotspot/jtreg/gc/g1/TestLargePageUseForAuxMemory.java line 80: > >> 78: } >> 79: >> 80: static void testVM(String what, long heapsize, boolean cardsShouldUseLargePages, boolean bitmapShouldUseLargePages, boolean largePages2m) throws Exception { > > Having this linux-specific stuff in a generic G1 test :( Understood. Not sure how others would handle this as assumptions would change if my code is merged and one gets 2M large pages on smaller mem reservations but only on linux. Any advice? > test/hotspot/jtreg/gc/g1/TestLargePageUseForAuxMemory.java line 150: > >> 148: if (Platform.isLinux() && largePageSize != largePageExecSize) { >> 149: try { >> 150: Scanner scan_hugepages = new Scanner(new File("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages")); > > 2M hard coded. Understood. Not sure how others would handle this. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From github.com+168222+mgkwill at openjdk.java.net Wed Nov 18 19:28:08 2020 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Wed, 18 Nov 2020 19:28:08 GMT Subject: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap In-Reply-To: References: <-xtX9qSJHuD-qfp52XPToKhkl1HypRmNFHCJaupaync=.99285cd9-69a6-42cd-84b4-3c87fefc2cc5@github.com> Message-ID: On Wed, 18 Nov 2020 19:13:02 GMT, Marcus G K Williams wrote: >> Hi, >> >> this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping. >> >> Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures. >> >> I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec. 
>> >> What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :) >> >> Why is this proposal hard coded to 2M pages? >> >> What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other". >> >> One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size? >> >> What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM? >> >> The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages. >> >> For SHM, I think you need to make sure that alignment matches SHMLBA? >> >> It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR. >> >> Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder). 
>> >> The linux-2m-page-specific code in the platform-generic G1 test seems wrong. >> >> Cheers, Thomas > > Hi Thomas, > > Thanks so much for your review. Please bear with me as this if my first patch to JDK community. But I have pushed patches to other open source communities (OpenDaylight, ONAP, OpenStack) and worked as a committer in some. > > **Responses below inline:** > >> Hi, >> >> this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping. >> >> Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures. >> > > I appreciate the feedback. Perhaps the lack of detail in the pull request/JDK issue is a function of my zoomed focus on the specific purpose and lack of understanding about how much detail is normally included. The purpose of the patch/issue is to enable code hugepage memory reservations on Linux when the JDK is configured with 1G hugepages (LargePages in JDK parlance). > > To my knowledge, in most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations. > >> I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec. >> >> What does page size have to do with exec permission? This should not be tied to exec. 
The whole patch should not contain the word "exec" :) > > I'd appreciate any advice on writing a less vague spec. I have used exec as a stand-in for code memory reservations in my descriptions, mostly due to the fact that a 'bool exec' is used in functions that reserve HugePages and this later is translated into 'PROT_EXEC' when mmap is called, "exec" is passed in but not used in SHM. These are the particular memory reservations we wanted the patch to affect when using 1G LargePages. However I will remove those references if unwarranted. > >> >> Why is this proposal hard coded to 2M pages? >> > > To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. Also it was an implementation suggestion from some Oracle engineers on the topic, though my implementation of those suggestions could be suspect. Perhaps we should detect the smallest large page size in _page_sizes array that will fit the requested memory amount. However populating _page_sizes array is complicated by the fact that the current jdk code relies heavily on the default large page size, almost as if that is the only size that can be used. 2M large page sizes are the default default_large_page_size in Linux and should be available even if one configures the default_large_page_size to 1G. > >> What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other". >> > > Code is the target memory region. However there are some other instances where large_page reservation is happening due to the addition of 2M pages as an option. Some calls fail and error when adding 2M pages to _page_sizes array in the company of 1G pages. See line https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR3790 > This is where 2m pages are added. 
> > However at https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4077 > we get failures after the addition of 2M sizes to _page_sizes array, due to some smaller reservations that happen regardless of the LargePageSizeInBytes=1G > > So in > https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4229 > we make sure that we select the largest page size that works with the bytes requested for reservation. Perhaps we shouldn't have exec as a special case, as the large page returned will be the same based on the size requested. > >> One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size? >> >> What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM? >> > > My architecture knowledge outside of x86_64 is limited. I've been looking into this and thinking about it and I will have some more comments in the next day or so. For UseSHM I ran some negative tests but I will do some more rigorous testing and report back. > >> The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages. 
>> >> For SHM, I think you need to make sure that alignment matches SHMLBA? > > Looking into this. > >> >> It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR. >> > > I'm open to removing exec references and just enabling multiple page sizes which would allow for 2M pages to be used by code memory reservations. > >> Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder). >> > > Thanks. I will push an updated patch w/ comments. Will attempt clear API spec and regression tests for reserve_memory_special but will need some guidance on those. > >> The linux-2m-page-specific code in the platform-generic G1 test seems wrong. > > Any advice here. My change specifically changes the behavior of the pages returned in the test for linux platforms but should not have effects on other platforms. I don't know how this would generally happen for JDK tests in this case. It seems to me that the JDK will act differently on different platforms. How is this normally handled? > >> >> Cheers, Thomas > > Thanks again for the review. Hi Stefan, Thanks so much for your review. > Hi and welcome :) > > I haven't started reviewing the code in detail but a first quick glance raised a couple of questions/comments: > > * Why do we have a special case for `exec` when selecting a large page size? To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. 
Currently, in most cases, code memory is reserved in the default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants the default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations. Perhaps I should just select the page size <= bytes requested and remove the 'exec' special case. > * If we need the special case, `exec_large_page_size()` should not be hard coded to return 2m but rather use `os::default_large_page_size()`. os::default_large_page_size() will not necessarily be small enough for code memory reservations if os::default_large_page_size() = 1G; in those cases we would get 4k on most linux x86_64 variants. My attempt is to ensure the smallest large_page_size available is used for code memory reservations. Perhaps my 2M hardcoding was a mistake and I should discover this size and select it based on the bytes being reserved. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From sspitsyn at openjdk.java.net Wed Nov 18 19:34:12 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Wed, 18 Nov 2020 19:34:12 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v11] In-Reply-To: References: Message-ID: <8m4pLTYDq8LLEZ3MRVfnZsKducSHJLX8WK-FxGaqgQw=.f426b0f8-0a50-4383-b037-24a925d9cf7e@github.com> On Mon, 16 Nov 2020 23:30:25 GMT, Coleen Phillimore wrote: >> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently.
>> >> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. >> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. >> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. >> >> It has also been tested with tier1-6. 
>> >> Thank you to Stefan, Erik and Kim for their help with this change. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix minimal build. Hi Coleen, It looks good to me. Just a couple of nits below. src/hotspot/share/prims/jvmtiTagMap.cpp: There is a double-check for _needs_cleaning, so the one at line 136 can be removed: 136 if (_needs_cleaning && 137 post_events && 138 env()->is_enabled(JVMTI_EVENT_OBJECT_FREE)) { 139 remove_dead_entries(true /* post_object_free */); 1158 void JvmtiTagMap::remove_dead_entries(bool post_object_free) { 1159 assert(is_locked(), "precondition"); 1160 if (_needs_cleaning) { 1161 log_info(jvmti, table)("TagMap table needs cleaning%s", 1162 (post_object_free ? " and posting" : "")); 1163 hashmap()->remove_dead_entries(env(), post_object_free); 1164 _needs_cleaning = false; 1165 } 1166 } test/hotspot/jtreg/vmTestbase/nsk/jvmti/AttachOnDemand/attach021/attach021Agent00.cpp: The change below is not needed as the call to nsk_jvmti_aod_disableEventAndFinish() does exactly the same: - nsk_jvmti_aod_disableEventAndFinish(agentName, JVMTI_EVENT_OBJECT_FREE, success, jvmti, jni); + + /* Flush any pending ObjectFree events, which may set success to 1 */ + if (jvmti->SetEventNotificationMode(JVMTI_DISABLE, + JVMTI_EVENT_OBJECT_FREE, + NULL) != JVMTI_ERROR_NONE) { + success = 0; + } + + nsk_aod_agentFinished(jni, agentName, success); } ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/967 From sjohanss at openjdk.java.net Wed Nov 18 20:44:19 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 18 Nov 2020 20:44:19 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v8] In-Reply-To: References: Message-ID: > Please review this change that implements concurrent uncommit for G1. > > **Summary** > G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. 
The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. > > The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). > > Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added. > > One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. > > **Logging** > To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: > [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 > ... > [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms > [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms > ... 
> [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms > > The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. > > On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: > [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) > [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) > [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) > [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) > [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) > [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) > [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) > > **Testing** > Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. > > I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes. Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: - Merge branch 'master' into 8236926-ccu - Thomas review - Merge branch 'master' into 8236926-ccu - Zoom feedback - Albert review 2 - Albert review - Merge branch 'master' into 8236926-ccu - Lock for small mapper and use BitMap parallel operations. - Self review - Simplified task - ... 
and 7 more: https://git.openjdk.java.net/jdk/compare/03e84ef7...6e3e33fc ------------- Changes: https://git.openjdk.java.net/jdk/pull/1141/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=07 Stats: 1502 lines in 25 files changed: 1332 ins; 102 del; 68 mod Patch: https://git.openjdk.java.net/jdk/pull/1141.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1141/head:pull/1141 PR: https://git.openjdk.java.net/jdk/pull/1141 From sjohanss at openjdk.java.net Wed Nov 18 20:44:20 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 18 Nov 2020 20:44:20 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v7] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 19:39:13 GMT, Stefan Johansson wrote: >> Please review this change that implements concurrent uncommit for G1. >> >> **Summary** >> G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. >> >> The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). 
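The two-bitmap bookkeeping described in the paragraph above can be illustrated with a small model. This is only a sketch under stated assumptions: the class name `CommittedRegionMapSketch` and all of its methods are hypothetical, it uses `std::vector<bool>` in place of HotSpot's BitMap, and it is not the `G1CommittedRegionMap` code from the PR. It shows the union-of-bitmaps invariant and the preference for re-activating inactive regions before committing new ones.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the active/inactive bookkeeping; names are
// illustrative and are NOT the HotSpot G1CommittedRegionMap API.
class CommittedRegionMapSketch {
  std::vector<bool> _active;    // committed and available for use
  std::vector<bool> _inactive;  // committed, but ready to be uncommitted

public:
  explicit CommittedRegionMapSketch(std::size_t num_regions)
    : _active(num_regions, false), _inactive(num_regions, false) {}

  bool is_active(std::size_t i) const { return _active[i]; }
  bool is_inactive(std::size_t i) const { return _inactive[i]; }
  // The committed regions are the union of the two bitmaps.
  bool is_committed(std::size_t i) const { return _active[i] || _inactive[i]; }

  // Uncommitted -> Active: stands in for committing memory via the OS.
  void commit_and_activate(std::size_t i) {
    assert(!is_committed(i));
    _active[i] = true;
  }

  // Active -> Inactive: the quick preparation work done in the pause.
  void deactivate(std::size_t i) {
    assert(_active[i]);
    _active[i] = false;
    _inactive[i] = true;
  }

  // Inactive -> Active: reuse without any call to the OS.
  void reactivate(std::size_t i) {
    assert(_inactive[i]);
    _inactive[i] = false;
    _active[i] = true;
  }

  // Inactive -> Uncommitted: the expensive part, done later and concurrently.
  void uncommit(std::size_t i) {
    assert(_inactive[i]);
    _inactive[i] = false;
  }

  // Expand by up to 'count' regions, preferring re-activation over new
  // commits. Returns how many regions needed a fresh commit.
  std::size_t expand(std::size_t count) {
    std::size_t fresh = 0;
    for (std::size_t i = 0; i < _inactive.size() && count > 0; i++) {
      if (_inactive[i]) { reactivate(i); count--; }
    }
    for (std::size_t i = 0; i < _active.size() && count > 0; i++) {
      if (!is_committed(i)) { commit_and_activate(i); count--; fresh++; }
    }
    return fresh;
  }
};
```

In this model, expanding by two regions while one region is inactive re-activates that region first and commits only one new region.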
>> >> Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added. >> >> One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. >> >> **Logging** >> To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: >> [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 >> ... >> [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms >> [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms >> ... >> [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms >> >> The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. >> >> On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: >> [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) >> [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) >> [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) >> [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) >> [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) >> [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) >> [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) >> >> **Testing** >> Two new tests have been added, one gtest and one jtreg test. 
These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. >> >> I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes. > > Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into 8236926-ccu > - Zoom feedback > - Albert review 2 > - Albert review > - Merge branch 'master' into 8236926-ccu > - Lock for small mapper and use BitMap parallel operations. > - Self review > - Simplified task > - Improved logging > - Test improvement > - ... and 5 more: https://git.openjdk.java.net/jdk/compare/3675653c...c354b1d8 Thanks for the review Thomas. Addressed most of your concerns, but some things I left as is. ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From sjohanss at openjdk.java.net Wed Nov 18 20:44:29 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 18 Nov 2020 20:44:29 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v7] In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 09:17:42 GMT, Thomas Schatzl wrote: >> Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Merge branch 'master' into 8236926-ccu >> - Zoom feedback >> - Albert review 2 >> - Albert review >> - Merge branch 'master' into 8236926-ccu >> - Lock for small mapper and use BitMap parallel operations. >> - Self review >> - Simplified task >> - Improved logging >> - Test improvement >> - ... 
and 5 more: https://git.openjdk.java.net/jdk/compare/3675653c...c354b1d8 > > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 747: > >> 745: HeapRegion* prev_last_region = NULL; >> 746: size_t size_used = 0; >> 747: size_t shrink_count = 0; > > The code may as well define `shrink_count` as `uint` as the only use seems to cast it to `uint` anyway. I agree, I left it `size_t` since everything else in this function uses `size_t`, but `uint` is a better fit. > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 806: > >> 804: HeapRegion::GrainWords * HeapWordSize * shrink_count); >> 805: // Explicit uncommit. >> 806: _hrm.uncommit_inactive_regions((uint) shrink_count); > > Please let the code use the `G1CollectedHeap::uncommit_regions()` helper here to limit the references to `_hrm`. Good point, fixed. > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1312: > >> 1310: shrink_bytes, aligned_shrink_bytes, shrunk_bytes); >> 1311: if (num_regions_removed > 0) { >> 1312: log_debug(gc, heap)("Regions ready for uncommit: %u", num_regions_removed); > > Maybe keep with existing terminology here i.e. `Uncommittable regions after shrink: %u` Sounds good. > src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 567: > >> 565: >> 566: // Check if there is memory to uncommit and if so schedule a task to do it. >> 567: void uncommit_heap_if_necessary(); > > I would prefer if the method were called `uncommit_regions_if_necessary()` as this method does not uncommit the *heap* but just uncommittable regions. Done, I agree that is more accurate. > src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 568: > >> 566: // Check if there is memory to uncommit and if so schedule a task to do it. >> 567: void uncommit_heap_if_necessary(); >> 568: uint uncommit_regions(uint region_limit); > > Please add a comment like "// Immediately uncommits uncommittable regions." Done.
> src/hotspot/share/gc/g1/g1CommittedRegionMap.hpp line 46: > >> 44: }; >> 45: >> 46: class G1CommittedRegionMap : public CHeapObj { > > It would be nice to have a diagram of the region states and their transitions here. Good idea. > src/hotspot/share/gc/g1/g1CommittedRegionMap.hpp line 98: > >> 96: // Finds the next range of committable regions starting at offset. >> 97: // This function must only be called when no inactive regions are >> 98: // present and can be used to active more regions. > > s/active/activate ?? > src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 1172: > >> 1170: reset_at_marking_complete(); >> 1171: >> 1172: _g1h->uncommit_heap_if_necessary(); > > I'd prefer it this call were placed below the `resize_heap_if_necessary` call unless there is a reason not to. Fixed. > src/hotspot/share/gc/g1/g1FullCollector.cpp line 218: > >> 216: _heap->print_heap_after_full_collection(scope()->heap_transition()); >> 217: >> 218: _heap->uncommit_heap_if_necessary(); > > Maybe move this to the `gc_epilogue` or actually into `prepare_heap_for_mutators` where the shrinking occurs. Moved it to `prepare_heap_for_mutators`. > src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp line 81: > >> 79: } >> 80: >> 81: bool committed_range(uint start_idx, size_t num_regions) { > > I would prefer if these two methods were named more like a condition, i.e. `is_range_committed` and `is_range_uncommitted` (optionally exchanging `range` and `[un]committed`) as this makes the verification they are used in read better. Went with `is_range_[un]committed`. > src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp line 92: > >> 90: >> 91: virtual void commit_regions(uint start_idx, size_t num_regions, WorkGang* pretouch_gang) { >> 92: guarantee(uncommitted_range(start_idx, num_regions), > > Not sure this (and the one in `uncommit_regions`) should be guarantees. I went with `guarantee` since this is what's used in `G1PageBasedVirtualSpace` for similar checks. 
Errors like this might be more likely in release builds. > src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp line 138: > >> 136: // - G1RegionToSpaceMapper::_region_commit_map; >> 137: // - G1PageBasedVirtualSpace::_committed (_storage.commit()) >> 138: Mutex _lock; > > Good comment! :) Thanks! > src/hotspot/share/gc/g1/g1UncommitRegionTask.cpp line 83: > >> 81: _summary_duration += time; >> 82: >> 83: log_trace(gc, heap)("Concurrent Uncommit: " SIZE_FORMAT "%s, %u regions, %1.3fms", > > Maybe use gc+heap+region here (and below) like other code; imho after all this task is kind of an extension of the `HeapRegionManager`. I did not look at the output though. My idea is to use `gc+heap+region` for transitions on a region level and `gc+heap` for the actual changes to the heap. See the PR description for some examples. > src/hotspot/share/gc/g1/g1UncommitRegionTask.cpp line 128: > >> 126: // No delay, reason to reschedule rather then to loop is to allow >> 127: // other tasks to run without waiting for a full uncommit cycle. >> 128: schedule(0); > > Why not notify like in `run()`? Maybe refactor the code a bit to allow calling the same method to schedule the task and notify the service thread for immediate processing here. Because here we know that the service thread is running and if we schedule with a 0 delay, there is no risk of it going to sleep before we run again. Another task might be up next, but this task will eventually run before the service thread can go to sleep. There is room for improvement on how we schedule tasks from another thread. I changed `enqueue` to call a new public function on the `G1ServiceThread` called `schedule_task`, which calls `schedule(task, delay)` and then `notify()`. The function `schedule(task, delay)` is what previously was named `schedule_task`. This is what will be called when someone does `task->schedule(delay)` as well, so it is a bit more unified. 
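The scheduling behaviour described in the reply above can be sketched as a small single-threaded model. Everything here is an assumption made for illustration: `ServiceQueueSketch` and `UncommitTaskSketch` are hypothetical names, the real `G1ServiceThread` uses its own task queue and a monitor rather than a `std::multimap` and a counter, and time is just an integer. The sketch only shows the two points from the reply: `schedule_task` is `schedule(task, delay)` followed by `notify()`, and a task that reschedules itself with a 0 delay lets other queued tasks run between its chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded model of a service thread's task queue.
class ServiceQueueSketch {
  // Tasks ordered by due time; equal due times keep insertion order.
  std::multimap<long, std::function<void()>> _queue;
  long _now = 0;
  int _notifications = 0;

  // In a real service thread this would wake the thread if it sleeps.
  void notify() { _notifications++; }

public:
  // Used by a task that is already running on the service thread:
  // the thread is known to be awake, so no notification is needed.
  void schedule(std::function<void()> task, long delay) {
    _queue.emplace(_now + delay, std::move(task));
  }

  // Used from other threads: enqueue, then wake the service thread.
  void schedule_task(std::function<void()> task, long delay) {
    schedule(std::move(task), delay);
    notify();
  }

  int notifications() const { return _notifications; }

  // Run tasks in due-time order until the queue is empty.
  void run_all() {
    while (!_queue.empty()) {
      auto it = _queue.begin();
      if (it->first > _now) _now = it->first;
      std::function<void()> task = std::move(it->second);
      _queue.erase(it);
      task();
    }
  }
};

// Hypothetical chunked task: does a bounded amount of work per invocation
// and reschedules itself with no delay while work remains.
struct UncommitTaskSketch {
  ServiceQueueSketch& queue;
  std::size_t remaining;           // regions still to uncommit
  std::size_t chunk;               // regions handled per invocation
  std::vector<std::string>& log;

  void run() {
    std::size_t n = remaining < chunk ? remaining : chunk;
    remaining -= n;
    log.push_back("uncommit " + std::to_string(n));
    if (remaining > 0) {
      // Reschedule rather than loop, so other tasks can run in between.
      queue.schedule([this] { run(); }, 0);
    }
  }
};
```

With five regions, a chunk size of two and one other task queued at the same time, the other task runs after the first chunk instead of waiting for the whole uncommit cycle.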
> src/hotspot/share/gc/g1/g1UncommitRegionTask.hpp line 42: > >> 40: // while running on the service thread joined with the suspendible >> 41: // thread set. >> 42: bool _active; > > The information in the comment is missing context on how scheduling and suspension works and how the `_active` flag interacts with `run()` requests. The MT safety seems an implementation detail. I.e. it would be nice to know that: > * this task does its work in chunks, i.e. does not uncommit everything at once to avoid long stalls or issues with GCs interrupting it > * this flag prevents rescheduling the task when it has not uncommitted everything yet but another request (`run` call) comes in. Fair point, since I moved the 256M constant into the class, the chunking info is now in a comment for this constant. For the state I added the info around usage and moved the implementation detail into `set_active()`. > src/hotspot/share/gc/g1/g1UncommitRegionTask.hpp line 58: > >> 56: >> 57: public: >> 58: static void run(); > > I'd prefer if the method were named `schedule` (or something like `schedule_for_later` or maybe `enqueue` to not clash with the protected `schedule(jlong)` method) instead of `run` since imho `run` implies that it is actually executed. I like `enqueue`, changed. > src/hotspot/share/gc/g1/g1ServiceThread.hpp line 138: > >> 136: // Notify a change to the service thread. Used to stop either >> 137: // stop the service or to force check for new tasks. >> 138: void notify(); > > The first "stop" is too much Good catch. 
> src/hotspot/share/gc/g1/heapRegionManager.cpp line 181: > >> 179: G1CollectedHeap::heap()->hr_printer()->commit(hr); >> 180: } >> 181: activate_regions(start, num_regions); > > In this place there will be two messages from the HRPrinter for every region: > 1) a COMMIT message and > 2) an ACTIVATE message > > This is a bit confusing as in my understanding (that's why I asked for a region state diagram in the `G1CommittedRegionMap` which may as well be put in `HeapRegionManager`) the (typical) flow of states are Uncommitted->Committed/Active->Committed/Inactive->Uncommitted. > As mentioned, I'm not sure if it is a good idea to send two separate messages here; better rename the "Active" message to "Commit-Active" (and "Inactive" to "Commit-Inactive") instead imho, even if it's quite long (and drop `HRPrinter::commit()` completely) It's true that it will generate two messages when committing a previously uncommitted region, but I still think it is valuable to separate them since we can also have the state change `Active->Inactive->Active`. In this case the transition from Inactive to Active will not include a commit, but rather making inactive regions active again. Just seeing a "Commit-Active" message in this case would not be as clear as seeing "Active" that is not immediately preceded by "Commit". ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From sjohanss at openjdk.java.net Wed Nov 18 20:44:30 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 18 Nov 2020 20:44:30 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v8] In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 09:24:47 GMT, Thomas Schatzl wrote: >> 256M is just a "reasonable" limit that I picked to get short enough invocations. I updated the comment a bit. > > One suggestion I have is to make this a static constant in the class declaration not dig it up somewhere in the code. 
Moved the 256 * M out to be a class constant, leaving the region limit here since it is not known at compile time. ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From mchung at openjdk.java.net Wed Nov 18 21:57:09 2020 From: mchung at openjdk.java.net (Mandy Chung) Date: Wed, 18 Nov 2020 21:57:09 GMT Subject: RFR: 8256516: Simplify clearing References In-Reply-To: References: Message-ID: <8HHuL5zgZD6or93_F0C9twqdpMR35c9MUIvS_G16YRk=.3978c5ea-67b2-4178-ac60-9e04b9eef05e@github.com> On Wed, 18 Nov 2020 09:53:11 GMT, Kim Barrett wrote: > Please review this simplification of jlr.Reference clearing by VM code. > > The function java_lang_ref_Reference::set_referent_raw was being used to > clear the referent of Reference objects, and only for that purpose. This > change replaces that function with java_lang_ref_Reference::clear_referent, > which is much more obvious in intent. That change is then percolated up > through callers in the obvious way. > > Testing: > mach5 tier1 Marked as reviewed by mchung (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1286 From erikj at openjdk.java.net Wed Nov 18 22:32:03 2020 From: erikj at openjdk.java.net (Erik Joelsson) Date: Wed, 18 Nov 2020 22:32:03 GMT Subject: RFR: 8256497: Zero: enable G1 and Shenandoah GCs In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 19:02:03 GMT, Aleksey Shipilev wrote: > Following the [JDK-8255796](https://bugs.openjdk.java.net/browse/JDK-8255796) improvement that ditched the inline contiguous alloc use from Zero, we can now rely on GC interface to hook the GCs properly. G1 and Shenandoah are a bit special here, because they require special `Reference.get` handling. > > Note that it does not change the default GC for Zero, because Zero is implicitly `NeverActAsServerMachine`, which still selects Serial GC by default. After this change, Zero users can opt-in to G1 or Shenandoah.
> > Additional testing: > - [x] Linux x86_64 Zero fastdebug `hotspot_gc_shenandoah` (some lingering failures about non-enabled compressed oops) > - [ ] Linux x86_64 Zero fastdebug `tier1` with `-XX:+UseG1GC` Build change looks ok. ------------- Marked as reviewed by erikj (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1268 From coleenp at openjdk.java.net Wed Nov 18 23:24:12 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 18 Nov 2020 23:24:12 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v10] In-Reply-To: <4wSXIq_YvsFFLWWNjVgLCfNqbrxHDajiQGrKPuwcP3A=.5427b09b-48e2-409f-8099-9e44fbdc339d@github.com> References: <4wSXIq_YvsFFLWWNjVgLCfNqbrxHDajiQGrKPuwcP3A=.5427b09b-48e2-409f-8099-9e44fbdc339d@github.com> Message-ID: <6MQhIGtA1LWvXvSFSZWRkyU6d9AXf_LVdkE0zAq0ekc=.151e621a-eec4-4366-8ffa-729eb1d5bb63@github.com> On Mon, 16 Nov 2020 23:10:21 GMT, Coleen Phillimore wrote: >> This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. >> >> The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. >> >> The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: >> >> 1. 
In the Heap walk, there's an unconditional table walk to post events if events are needed to post. >> 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. >> >> Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. >> >> To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. >> >> Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. >> >> It has also been tested with tier1-6. >> >> Thank you to Stefan, Erik and Kim for their help with this change. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Reverse remove_dead_entries_locked function names. > - Merge branch 'master' into jvmti-table > - Add shenandoah set_needs_cleaning but this doesn't work. > - fix vmTestbase/nsk/jvmti tests > - improve tagmap cleanup and objectfree event posting > - Add logging to event posting in case of pauses. > - Merge branch 'master' into jvmti-table > - Add back WeakProcessorPhases::Phase enum. > - Serguei 1. > - Code review comments from Kim and Albert. > - ... 
and 5 more: https://git.openjdk.java.net/jdk/compare/0357db35...daaa13fe test/hotspot/jtreg/vmTestbase/nsk/jvmti/AttachOnDemand/attach021/attach021Agent00.cpp line 72: > 70: Java_nsk_jvmti_AttachOnDemand_attach021_attach021Target_shutdownAgent(JNIEnv * jni, > 71: jclass klass) { > 72: @kimbarrett ? I'm not sure why you made this change. See Serguei's comment. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Wed Nov 18 23:29:10 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 18 Nov 2020 23:29:10 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v11] In-Reply-To: <8m4pLTYDq8LLEZ3MRVfnZsKducSHJLX8WK-FxGaqgQw=.f426b0f8-0a50-4383-b037-24a925d9cf7e@github.com> References: <8m4pLTYDq8LLEZ3MRVfnZsKducSHJLX8WK-FxGaqgQw=.f426b0f8-0a50-4383-b037-24a925d9cf7e@github.com> Message-ID: On Wed, 18 Nov 2020 19:31:44 GMT, Serguei Spitsyn wrote: > There is a double-check for _needs_cleaning, so the one at line 136 can be removed: I want to leave _needs_cleaning at 136 because even though the boolean is checked twice, it doesn't hurt performance and it has a nice symmetry in that function. I asked Kim about the other change. Thank you for reviewing, Serguei! ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From kbarrett at openjdk.java.net Wed Nov 18 23:51:04 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 18 Nov 2020 23:51:04 GMT Subject: RFR: 8256516: Simplify clearing References In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 10:07:25 GMT, Aleksey Shipilev wrote: >> Please review this simplification of jlr.Reference clearing by VM code. >> >> The function java_lang_ref_Reference::set_referent_raw was being used to >> clear the referent of Reference objects, and only for that purpose. This >> change replaces that function with java_lang_ref_Reference::clear_referent, >> which is much more obvious in intent. 
That change is then percolated up >> through callers in the obvious way. >> >> Testing: >> mach5 tier1 > > src/hotspot/share/gc/shared/referenceProcessor.hpp line 123: > >> 121: >> 122: // Apply the keep_alive function to the referent address. >> 123: void make_referent_alive(); > > I wonder if moving this from the `.hpp` to `.cpp` has performance implications for callers. Maybe move to `.inline.hpp`? The reason for moving it out of the .hpp was of course that the change to call java_lang_ref_Reference::referent_addr_raw needs to #include javaClasses.inline.hpp. I don't see this move having any measureable performance difference, and not even sure what the sign of any change might be. A better refactoring might be to package up the common remove/make_referent_alive/move_to_next sequence. ------------- PR: https://git.openjdk.java.net/jdk/pull/1286 From coleenp at openjdk.java.net Thu Nov 19 00:20:24 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 00:20:24 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v12] In-Reply-To: References: Message-ID: > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. 
In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. > > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: - Merge branch 'master' into jvmti-table - Fix minimal build. - Reverse remove_dead_entries_locked function names. 
- Merge branch 'master' into jvmti-table - Add shenandoah set_needs_cleaning but this doesn't work. - fix vmTestbase/nsk/jvmti tests - improve tagmap cleanup and objectfree event posting - Add logging to event posting in case of pauses. - Merge branch 'master' into jvmti-table - Add back WeakProcessorPhases::Phase enum. - ... and 7 more: https://git.openjdk.java.net/jdk/compare/2b155713...9ef44f28 ------------- Changes: https://git.openjdk.java.net/jdk/pull/967/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=11 Stats: 1884 lines in 49 files changed: 768 ins; 992 del; 124 mod Patch: https://git.openjdk.java.net/jdk/pull/967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/967/head:pull/967 PR: https://git.openjdk.java.net/jdk/pull/967 From kbarrett at openjdk.java.net Thu Nov 19 00:43:09 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 19 Nov 2020 00:43:09 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v11] In-Reply-To: <8m4pLTYDq8LLEZ3MRVfnZsKducSHJLX8WK-FxGaqgQw=.f426b0f8-0a50-4383-b037-24a925d9cf7e@github.com> References: <8m4pLTYDq8LLEZ3MRVfnZsKducSHJLX8WK-FxGaqgQw=.f426b0f8-0a50-4383-b037-24a925d9cf7e@github.com> Message-ID: On Wed, 18 Nov 2020 19:31:44 GMT, Serguei Spitsyn wrote: > test/hotspot/jtreg/vmTestbase/nsk/jvmti/AttachOnDemand/attach021/attach021Agent00.cpp: > > The change below is not needed as the call to nsk_jvmti_aod_disableEventAndFinish() does exactly the same: > > ``` > - nsk_jvmti_aod_disableEventAndFinish(agentName, JVMTI_EVENT_OBJECT_FREE, success, jvmti, jni); > + > + /* Flush any pending ObjectFree events, which may set success to 1 */ > + if (jvmti->SetEventNotificationMode(JVMTI_DISABLE, > + JVMTI_EVENT_OBJECT_FREE, > + NULL) != JVMTI_ERROR_NONE) { > + success = 0; > + } > + > + nsk_aod_agentFinished(jni, agentName, success); > } > ``` This change really is needed. The success variable in the test is a global, initially 0, set to 1 by the ObjectFree handler. 
In the old code, if the ObjectFree event hasn't been posted yet, we pass the initial 0 value of success to nsk_jvmti_aod_disabledEventAndFinish, where it's a local variable (so unaffected by any changes to the variable in the test), so stays 0 through to the call to nsk_aod_agentFinished. So the test fails. The split in the change causes the updated post-ObjectFree event success value of 1 to be passed to agentFinished. So the test passes. That required some head scratching to find at the time. That's the point of the comment about flushing pending events. Maybe the comment should be improved. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From psandoz at openjdk.java.net Thu Nov 19 01:14:09 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 19 Nov 2020 01:14:09 GMT Subject: RFR: 8256581: Refactor vector conversion tests Message-ID: Refactor the vector conversions tests to improve performance and reduce explicit test methods (using data providers). ------------- Commit messages: - 8256581: Refactor vector conversion tests Changes: https://git.openjdk.java.net/jdk/pull/1302/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1302&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256581 Stats: 37231 lines in 6 files changed: 212 ins; 36768 del; 251 mod Patch: https://git.openjdk.java.net/jdk/pull/1302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1302/head:pull/1302 PR: https://git.openjdk.java.net/jdk/pull/1302 From github.com+168222+mgkwill at openjdk.java.net Thu Nov 19 03:06:08 2020 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Thu, 19 Nov 2020 03:06:08 GMT Subject: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2] In-Reply-To: References: Message-ID: > Add 2M LargePages to _page_sizes > > Use 2m pages for large page requests > less than 1g on linux when 1G are default > pages > > - Add os::Linux::large_page_size_2m() that > returns 
2m as size > - Add os::Linux::select_large_page_size() to return > correct large page size for size_t bytes > - Add 2m size to _page_sizes array > - Update reserve_memory_special methods > to set/use large_page_size based on bytes reserved > - Update large page not reserved warnings > to include large_page_size attempted > - Update TestLargePageUseForAuxMemory.java > to expect 2m large pages in some instances > > Signed-off-by: Marcus G K Williams Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: Add 2M LargePages to _page_sizes Use 2m pages for large page requests less than 1g on linux when 1G are default pages - Add os::Linux::large_page_size_2m() that returns 2m as size - Add os::Linux::select_large_page_size() to return correct large page size for size_t bytes - Add 2m size to _page_sizes array - Update reserve_memory_special methods to set/use large_page_size based on bytes reserved - Update large page not reserved warnings to include large_page_size attempted - Update TestLargePageUseForAuxMemory.java to expect 2m large pages in some instances Signed-off-by: Marcus G K Williams ------------- Changes: https://git.openjdk.java.net/jdk/pull/1153/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1153&range=01 Stats: 105 lines in 3 files changed: 75 ins; 0 del; 30 mod Patch: https://git.openjdk.java.net/jdk/pull/1153.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1153/head:pull/1153 PR: https://git.openjdk.java.net/jdk/pull/1153 From github.com+168222+mgkwill at openjdk.java.net Thu Nov 19 03:06:09 2020 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Thu, 19 Nov 2020 03:06:09 GMT Subject: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2] In-Reply-To: References: <-xtX9qSJHuD-qfp52XPToKhkl1HypRmNFHCJaupaync=.99285cd9-69a6-42cd-84b4-3c87fefc2cc5@github.com> Message-ID: On Wed, 18 
Nov 2020 19:14:08 GMT, Marcus G K Williams wrote: >> src/hotspot/os/linux/os_linux.cpp line 3970: >> >>> 3968: char* req_addr, bool exec) { >>> 3969: size_t large_page_size; >>> 3970: large_page_size = os::select_large_page_size(bytes, exec); >> >> The "os" is shared and platform generic. Please don't add anything there unless you write (and test as much as possible) the different platforms. I do not see why this API should even be exported from this unit. > > Understood. Will move from src/hotspot/share/runtime/os.hpp to src/hotspot/os/linux/os_linux.hpp. Done. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From github.com+168222+mgkwill at openjdk.java.net Thu Nov 19 03:12:05 2020 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Thu, 19 Nov 2020 03:12:05 GMT Subject: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2] In-Reply-To: References: <-xtX9qSJHuD-qfp52XPToKhkl1HypRmNFHCJaupaync=.99285cd9-69a6-42cd-84b4-3c87fefc2cc5@github.com> Message-ID: On Wed, 18 Nov 2020 19:22:06 GMT, Marcus G K Williams wrote: >> Hi Thomas, >> >> Thanks so much for your review. Please bear with me as this is my first patch to the JDK community. But I have pushed patches to other open source communities (OpenDaylight, ONAP, OpenStack) and worked as a committer in some. >> >> **Responses below inline:** >> >>> Hi, >>> >>> this seems like an improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping. >>> >>> Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures. >>> >> >> I appreciate the feedback.
Perhaps the lack of detail in the pull request/JDK issue is a function of my zoomed focus on the specific purpose and lack of understanding about how much detail is normally included. The purpose of the patch/issue is to enable code hugepage memory reservations on Linux when the JDK is configured with 1G hugepages (LargePages in JDK parlance). >> >> To my knowledge, in most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations. >> >>> I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec. >>> >>> What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :) >> >> I'd appreciate any advice on writing a less vague spec. I have used exec as a stand-in for code memory reservations in my descriptions, mostly due to the fact that a 'bool exec' is used in functions that reserve HugePages and this later is translated into 'PROT_EXEC' when mmap is called, "exec" is passed in but not used in SHM. These are the particular memory reservations we wanted the patch to affect when using 1G LargePages. However I will remove those references if unwarranted. >> >>> >>> Why is this proposal hard coded to 2M pages? >>> >> >> To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. 
Also it was an implementation suggestion from some Oracle engineers on the topic, though my implementation of those suggestions could be suspect. Perhaps we should detect the smallest large page size in _page_sizes array that will fit the requested memory amount. However populating _page_sizes array is complicated by the fact that the current jdk code relies heavily on the default large page size, almost as if that is the only size that can be used. 2M large page sizes are the default default_large_page_size in Linux and should be available even if one configures the default_large_page_size to 1G. >> >>> What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other". >>> >> >> Code is the target memory region. However there are some other instances where large_page reservation is happening due to the addition of 2M pages as an option. Some calls fail and error when adding 2M pages to _page_sizes array in the company of 1G pages. See line https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR3790 >> This is where 2m pages are added. >> >> However at https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4077 >> we get failures after the addition of 2M sizes to _page_sizes array, due to some smaller reservations that happen regardless of the LargePageSizeInBytes=1G >> >> So in >> https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4229 >> we make sure that we select the largest page size that works with the bytes requested for reservation. Perhaps we shouldn't have exec as a special case, as the large page returned will be the same based on the size requested. 
>> >>> One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size? >>> >>> What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM? >>> >> >> My architecture knowledge outside of x86_64 is limited. I've been looking into this and thinking about it and I will have some more comments in the next day or so. For UseSHM I ran some negative tests but I will do some more rigorous testing and report back. >> >>> The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages. >>> >>> For SHM, I think you need to make sure that alignment matches SHMLBA? >> >> Looking into this. >> >>> >>> It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR. >>> >> >> I'm open to removing exec references and just enabling multiple page sizes which would allow for 2M pages to be used by code memory reservations. >> >>> Finally, comments would be nice. Clear API specs. 
Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder). >>> >> >> Thanks. I will push an updated patch w/ comments. Will attempt clear API spec and regression tests for reserve_memory_special but will need some guidance on those. >> >>> The linux-2m-page-specific code in the platform-generic G1 test seems wrong. >> >> Any advice here. My change specifically changes the behavior of the pages returned in the test for linux platforms but should not have effects on other platforms. I don't know how this would generally happen for JDK tests in this case. It seems to me that the JDK will act differently on different platforms. How is this normally handled? >> >>> >>> Cheers, Thomas >> >> Thanks again for the review. > > Hi Stefan, > > Thanks so much for your review. > >> Hi and welcome :) >> >> I haven't started reviewing the code in detail but a first quick glance raised a couple of questions/comments: >> >> * Why do we have a special case for `exec` when selecting a large page size? > > To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. In most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations. > > Perhaps I should just select the page size <= bytes requested and remove 'exec' special case. 
> >> * If we need the special case, `exec_large_page_size()` should not be hard code to return 2m but rather use `os::default_large_page_size()`. > > os::default_large_page_size() will not necessarily be small enough for code memory reservations if the os::default_large_page_size() = 1G, in those cases we would get 4k on most linux x86_64 variants. My attempt is to ensure the smallest large_page_size availabe is used for code memory reservations. Perhaps my 2M hardcoding was a mistake and I should discover this size and select it based on the bytes being reserved. Pushed a new patch removing exec references and instead use large page size based on requested memory size bytes. Moved added definitions from os to os::Linux. More work/research in progress. ---- Add 2M LargePages to _page_sizes Use 2m pages for large page requests less than 1g on linux when 1G are default pages - **Add os::Linux::large_page_size_2m() that returns 2m as size** - **Add os::Linux::select_large_page_size() to return correct large page size for size_t bytes** - Add 2m size to _page_sizes array - Update reserve_memory_special methods to set/use large_page_size based on bytes reserved - Update large page not reserved warnings to include large_page_size attempted - Update TestLargePageUseForAuxMemory.java to expect 2m large pages in some instances ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From sviswanathan at openjdk.java.net Thu Nov 19 04:05:13 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 19 Nov 2020 04:05:13 GMT Subject: RFR: 8256585: Remove in-place conversion vector operators from Vector API Message-ID: Remove partially implemented in-place conversion vector operators from Vector API: ofNarrowing, ofWidening, INPLACE_XXX ------------- Commit messages: - 8256585: Remove in-place conversion vector operators from Vector API Changes: https://git.openjdk.java.net/jdk/pull/1305/files Webrev: 
https://webrevs.openjdk.java.net/?repo=jdk&pr=1305&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256585 Stats: 121 lines in 1 file changed: 0 ins; 118 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1305.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1305/head:pull/1305 PR: https://git.openjdk.java.net/jdk/pull/1305 From shade at openjdk.java.net Thu Nov 19 08:04:01 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 19 Nov 2020 08:04:01 GMT Subject: RFR: 8256516: Simplify clearing References In-Reply-To: References: Message-ID: <5O5Vqx5aD_xmnnp_o3ivY77pk3nQxVviAStdzdH_LK8=.4a584824-089c-47db-b4de-d738ff219950@github.com> On Wed, 18 Nov 2020 09:53:11 GMT, Kim Barrett wrote: > Please review this simplification of jlr.Reference clearing by VM code. > > The function java_lang_ref_Reference::set_referent_raw was being used to > clear the referent of Reference objects, and only for that purpose. This > change replaces that function with java_lang_ref_Reference::clear_referent, > which is much more obvious in intent. That change is then percolated up > through callers in the obvious way. > > Testing: > mach5 tier1 Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1286 From shade at openjdk.java.net Thu Nov 19 08:04:04 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 19 Nov 2020 08:04:04 GMT Subject: RFR: 8256516: Simplify clearing References In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 23:48:46 GMT, Kim Barrett wrote: >> src/hotspot/share/gc/shared/referenceProcessor.hpp line 123: >> >>> 121: >>> 122: // Apply the keep_alive function to the referent address. >>> 123: void make_referent_alive(); >> >> I wonder if moving this from the `.hpp` to `.cpp` has performance implications for callers. Maybe move to `.inline.hpp`? 
> > The reason for moving it out of the .hpp was of course that the > change to call java_lang_ref_Reference::referent_addr_raw needs to #include > javaClasses.inline.hpp. > > I don't see this move having any measurable performance difference, and not > even sure what the sign of any change might be. A better refactoring might > be to package up the common remove/make_referent_alive/move_to_next sequence. Yeah, it probably only affects the path that is infrequently taken, i.e. marking through the resurrected referents during weak reference processing, which is probably only finalizers. It is fine to have it in `.cpp`. ------------- PR: https://git.openjdk.java.net/jdk/pull/1286 From sjohanss at openjdk.java.net Thu Nov 19 08:23:03 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 19 Nov 2020 08:23:03 GMT Subject: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2] In-Reply-To: References: <-xtX9qSJHuD-qfp52XPToKhkl1HypRmNFHCJaupaync=.99285cd9-69a6-42cd-84b4-3c87fefc2cc5@github.com> Message-ID: <8f-BJdFip5yf0Rv4uw-qcXVk2uM3Lb6Hrq9VPR6UzF4=.04966477-8834-4fb9-aa77-8da86f104176@github.com> On Wed, 18 Nov 2020 19:22:06 GMT, Marcus G K Williams wrote: > Hi Stefan, > > Thanks so much for your review. > > > Hi and welcome :) > > I haven't started reviewing the code in detail but a first quick glance raised a couple of questions/comments: > > > > * Why do we have a special case for `exec` when selecting a large page size? > > To my knowledge 2M is the smallest large page size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. In most cases currently code memory is reserved in the default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k.
The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations. > > Perhaps I should just select the page size <= bytes requested and remove the 'exec' special case. > Yes, I see no reason to keep that special case and we want to keep this code as general as possible. Looking at the code in `os::Linux::find_default_large_page_size()` it looks like S390 supports 1M large pages, so we cannot assume 2M. I suggest using a technique similar to the one used in `os::Linux::find_large_page_size` to find supported page sizes. If you scan `/sys/kernel/mm/hugepages` and populate `_page_sizes` using the information found we know we only get supported sizes. > > * If we need the special case, `exec_large_page_size()` should not be hard coded to return 2m but rather use `os::default_large_page_size()`. > > os::default_large_page_size() will not necessarily be small enough for code memory reservations if the os::default_large_page_size() = 1G, in those cases we would get 4k on most linux x86_64 variants. My attempt is to ensure the smallest large_page_size available is used for code memory reservations. Perhaps my 2M hardcoding was a mistake and I should discover this size and select it based on the bytes being reserved. You are correct that the default size might indeed be 1G, so using something like I suggest above to figure out the available page sizes and then using an appropriate one given the size of the mapping sounds good. Please also avoid force pushing changes to open PRs since it makes it harder to follow what changes between updates. It is fine for a PR to contain multiple commits and if you need to update with things from the main branch you should merge rather than rebase.
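The page-size discovery suggested above (scan `/sys/kernel/mm/hugepages`, then pick a size that fits the mapping) could look roughly like the following. This is only a sketch under stated assumptions, not the HotSpot implementation: the function names `parse_hugepage_dir_name`, `scan_hugepage_sizes` and `select_large_page_size` are hypothetical, it assumes the sysfs entries are named `hugepages-<N>kB`, and it uses standard containers where HotSpot would use its own `_page_sizes` array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>
#include <dirent.h>

// Hypothetical helper: parse a sysfs entry name such as "hugepages-2048kB"
// into a size in bytes. Returns 0 if the name does not have that form.
std::size_t parse_hugepage_dir_name(const std::string& name) {
  const std::string prefix = "hugepages-";
  const std::string suffix = "kB";
  if (name.size() <= prefix.size() + suffix.size()) return 0;
  if (name.compare(0, prefix.size(), prefix) != 0) return 0;
  if (name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) return 0;
  const std::string digits =
      name.substr(prefix.size(), name.size() - prefix.size() - suffix.size());
  for (char c : digits) {
    if (c < '0' || c > '9') return 0;
  }
  return static_cast<std::size_t>(std::strtoull(digits.c_str(), nullptr, 10)) * 1024;
}

// Hypothetical helper: collect the sizes found under dir_path
// (e.g. "/sys/kernel/mm/hugepages"). Returns an empty vector if the
// directory cannot be opened.
std::vector<std::size_t> scan_hugepage_sizes(const char* dir_path) {
  std::vector<std::size_t> sizes;
  DIR* dir = opendir(dir_path);
  if (dir == nullptr) return sizes;
  for (dirent* entry = readdir(dir); entry != nullptr; entry = readdir(dir)) {
    const std::size_t size = parse_hugepage_dir_name(entry->d_name);
    if (size != 0) sizes.push_back(size);
  }
  closedir(dir);
  return sizes;
}

// Hypothetical helper: pick the largest discovered page size that does not
// exceed the requested bytes; fall back to the small page size otherwise.
std::size_t select_large_page_size(const std::vector<std::size_t>& sizes,
                                   std::size_t bytes,
                                   std::size_t small_page_size) {
  assert(small_page_size > 0);
  std::size_t best = small_page_size;
  for (std::size_t size : sizes) {
    if (size <= bytes && size > best) best = size;
  }
  return best;
}
```

With 2M and 1G both discovered, a 64M code reservation would select 2M rather than falling back to the small page size, which is the behavior the PR is after. Alignment of the reservation to the chosen size is deliberately left out of the sketch.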
Cheers, Stefan ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From shade at openjdk.java.net Thu Nov 19 08:32:14 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 19 Nov 2020 08:32:14 GMT Subject: RFR: 8256497: Zero: enable G1 and Shenandoah GCs [v2] In-Reply-To: References: Message-ID: > Following the [JDK-8255796](https://bugs.openjdk.java.net/browse/JDK-8255796) improvement that ditched the inline contiguous alloc use from Zero, we can now rely on GC interface to hook the GCs properly. G1 and Shenandoah are a bit special here, because they require special `Reference.get` handling. > > Note that it does not change the default GC for Zero, because Zero is implicitly `NeverActAsServerMachine`, which still selects Serial GC by default. After this change, Zero users can opt-in to G1 or Shenandoah. > > Additional testing: > - [x] Linux x86_64 Zero fastdebug `hotspot_gc_shenandoah` (some lingering failures about non-enabled compressed oops) > - [ ] Linux x86_64 Zero fastdebug `tier1` with `-XX:+UseG1GC` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: - Merge branch 'master' into JDK-8256497-zero-g1-shenandoah - Remove TODO - 8256497: Zero: enable G1 and Shenandoah GCs ------------- Changes: https://git.openjdk.java.net/jdk/pull/1268/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1268&range=01 Stats: 72 lines in 5 files changed: 54 ins; 16 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1268.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1268/head:pull/1268 PR: https://git.openjdk.java.net/jdk/pull/1268 From tschatzl at openjdk.java.net Thu Nov 19 08:33:10 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 19 Nov 2020 08:33:10 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v7] In-Reply-To: References: Message-ID: <-MlQ8Aq3ZIySP5WsAVfXbwEgffac0Xkg2gljQM1g7KE=.682a8845-5c40-4499-8351-468d714dc170@github.com> On Wed, 18 Nov 2020 20:34:42 GMT, Stefan Johansson wrote: >> src/hotspot/share/gc/g1/heapRegionManager.cpp line 181: >> >>> 179: G1CollectedHeap::heap()->hr_printer()->commit(hr); >>> 180: } >>> 181: activate_regions(start, num_regions); >> >> In this place there will be two messages from the HRPrinter for every region: >> 1) a COMMIT message and >> 2) an ACTIVATE message >> >> This is a bit confusing as in my understanding (that's why I asked for a region state diagram in the `G1CommittedRegionMap` which may as well be put in `HeapRegionManager`) the (typical) flow of states are Uncommitted->Committed/Active->Committed/Inactive->Uncommitted. 
>> As mentioned, I'm not sure if it is a good idea to send two separate messages here; better rename the "Active" message to "Commit-Active" (and "Inactive" to "Commit-Inactive") instead imho, even if it's quite long (and drop `HRPrinter::commit()` completely) > > It's true that it will generate two messages when committing a previously uncommitted region, but I still think it is valuable to separate them since we can also have the state change `Active->Inactive->Active`. In this case the transition from Inactive to Active will not include a commit, but rather making inactive regions active again. Just seeing a "Commit-Active" message in this case would not be as clear as seeing "Active" that is not immediately preceded by "Commit". Okay, I looked at the `G1HRPrinter` again, and indeed it prints the action, not the region state, so this is a fair point. ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From sjohanss at openjdk.java.net Thu Nov 19 08:43:07 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 19 Nov 2020 08:43:07 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v7] In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 20:40:29 GMT, Stefan Johansson wrote: >> Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Merge branch 'master' into 8236926-ccu >> - Zoom feedback >> - Albert review 2 >> - Albert review >> - Merge branch 'master' into 8236926-ccu >> - Lock for small mapper and use BitMap parallel operations. >> - Self review >> - Simplified task >> - Improved logging >> - Test improvement >> - ... and 5 more: https://git.openjdk.java.net/jdk/compare/3675653c...c354b1d8 > > Thanks for the review Thomas. Addressed most of your concerns, but some things I left as is. Testing on the latest changes looks good. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From tschatzl at openjdk.java.net Thu Nov 19 08:43:09 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 19 Nov 2020 08:43:09 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v7] In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 13:28:57 GMT, Stefan Johansson wrote: >> src/hotspot/share/gc/g1/g1UncommitRegionTask.cpp line 83: >> >>> 81: _summary_duration += time; >>> 82: >>> 83: log_trace(gc, heap)("Concurrent Uncommit: " SIZE_FORMAT "%s, %u regions, %1.3fms", >> >> Maybe use gc+heap+region here (and below) like other code; imho after all this task is kind of an extension of the `HeapRegionManager`. I did not look at the output though. > > My idea is to use `gc+heap+region` for transitions on a region level and `gc+heap` for the actual changes to the heap. See the PR description for some examples. Okay. ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From github.com+4146708+a74nh at openjdk.java.net Thu Nov 19 08:48:24 2020 From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward) Date: Thu, 19 Nov 2020 08:48:24 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v10] In-Reply-To: References: Message-ID: <4QMWLVkKVex4WlweFhX5pjx3wvjX8pKIGRkawtFiTNI=.e31d9a04-030c-43c6-9a87-736840aaa721@github.com> > The AArch64 port uses maybe_isb in places where an ISB might be required > because the code may have safepointed. These maybe_isbs are very conservative > and are used in many places where a safepoint has not happened. > > cross_modify_fence was added in common code to place a barrier in all the > places after a safepoint has occurred. All the uses of it are in common code, > yet it remains unimplemented on AArch64. > > This set of patches implements cross_modify_fence for AArch64 and reconsiders > every use of maybe_isb, discarding many of them.
In addition, it introduces > a new diagnostic option, which, when enabled on AArch64, tests the correct > usage of the barriers. > > The advantage of this patch is threefold: > * Reducing the number of ISBs - giving a theoretical performance improvement. > * Use of common code instead of backend specific code. > * Additional test diagnostic options. > > Patch 1: Split cross_modify_fence > ================================= > This is simply refactoring work split out to simplify the other two patches. > > instruction_fence() is provided by each target and simply places > a fence for the instruction stream. > > cross_modify_fence() is now a member of JavaThread and just calls > instruction_fence. This function will be extended in Patch 3. > > Patch 2: Use cross_modify_fence instead of maybe_isb > ==================================================== > > The [n] references refer to the comments for cross_modify_fence in > thread.hpp. > > These are all the existing uses of maybe_isb in the AArch64 target: > > 1) Instances of Java code calling a VM function > * This encapsulates the changes to: > ** MacroAssembler::call_VM_leaf_base() > ** generate_fast_get_int_field0() > ** stubGenerator_aarch64 generate_throw_exception() > ** sharedRuntime_aarch64 generate_handler_blob() > ** SharedRuntime::generate_resolve_blob() > ** C1 LIR_Assembler::rt_call > ** C1 StubAssembler::call_RT(): used by generate_exception_throw, > generate_handle_exception, generate_code_for. > ** OptoRuntime::generate_exception_blob() > * Any changes will be caught due to calls to [2] or [3] by the VM function. > * Any calls that do not call [2] or [3] do not require an ISB. > * This patch is more optimal for these cases. > > 2) Instances of Java code calling a JNI function > * This encapsulates the changes to: > ** SharedRuntime::generate_native_wrapper() > ** TemplateInterpreterGenerator::generate_native_entry() > * A safepoint still in progress after the call will be caught by [4].
> * An ISB is still required for the case where there was a safepoint > but it completed during the call. This happens if the code doesn't > branch on safepoint_in_progress. > * In the SharedRuntime version, the two possible calls to > reguard_yellow_pages and complete_monitor_unlocking_C are after the thread > goes back into its original state, so are covered by [2] and [3], the > same as a normal VM call. > * This patch is only more optimal for the two post-JNI calls. > > 3) Patching functions > * This encapsulates the changes to: > ** patch_callers_callsite() (called by gen_c2i_adapter()) > * This results in code being patched, but does not safepoint. > * Therefore an ISB is required. > * This patch introduces no change here. > > 4) C1 MacroAssembler::emit_static_call_stub() > * Calls ISB (not maybe_isb) > * By design, the patching doesn't require the up-to-date > destination for proper functioning. > * However, the ISB makes it most likely that the new destination will > be picked up. > * This patch introduces no change here. > > Patch 3: Add cross modify fence verification > ============================================ > > The VerifyCrossModifyFence diagnostic flag enables confirmation of the correct > usage of instruction barriers. It can safely be enabled on any Java run. > > Enabling it will cause the following: > > * Once all threads have been brought to a safepoint, each thread will be > marked. > > * On a cross_modify_fence and safepoint_fence the mark for that thread > will be cleared. > > * On entry to a method and in a safepoint poll, the thread is checked. > If it is marked, then the code will error.
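The mark/clear/check protocol listed above for VerifyCrossModifyFence can be modelled in a few lines. What follows is a minimal sketch of the bookkeeping only, assuming a per-thread boolean mark; `SketchThread`, `needs_fence` and the function names are hypothetical and are not taken from the patch, and no actual instruction barrier is issued.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a JavaThread; it only tracks the verification mark.
struct SketchThread {
  bool needs_fence = false;  // set at a safepoint, cleared by a fence
};

// Once all threads have been brought to a safepoint, each thread is marked.
void mark_all_at_safepoint(std::vector<SketchThread>& threads) {
  for (SketchThread& t : threads) {
    t.needs_fence = true;
  }
}

// A cross_modify_fence (or safepoint fence) clears the mark for that thread.
// A real implementation would also execute the instruction barrier here
// (an ISB on AArch64); this sketch only models the bookkeeping.
void cross_modify_fence(SketchThread& t) {
  t.needs_fence = false;
}

// True if the thread has fenced since the last safepoint marked it.
bool fence_seen_since_safepoint(const SketchThread& t) {
  return !t.needs_fence;
}

// The check performed on method entry and in a safepoint poll:
// a thread that is still marked is an error.
void on_method_entry(const SketchThread& t) {
  assert(fence_seen_since_safepoint(t) && "missing cross_modify_fence after safepoint");
}
```

In this model a thread that resumes after a safepoint without calling `cross_modify_fence` trips the assertion in `on_method_entry`, which corresponds to the "code will error" case in the description.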
Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Add cross_modify_fence_impl for Windows AArch64 Change-Id: I8701ea60d2823d16666cb43cb9d0935d92b81e52 CustomizedGitHooks: yes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/428/files - new: https://git.openjdk.java.net/jdk/pull/428/files/56429ca8..655497a0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=428&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=428&range=08-09 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/428.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/428/head:pull/428 PR: https://git.openjdk.java.net/jdk/pull/428 From tschatzl at openjdk.java.net Thu Nov 19 09:12:07 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 19 Nov 2020 09:12:07 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v7] In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 14:32:02 GMT, Stefan Johansson wrote: >> src/hotspot/share/gc/g1/g1UncommitRegionTask.cpp line 128: >> >>> 126: // No delay, reason to reschedule rather then to loop is to allow >>> 127: // other tasks to run without waiting for a full uncommit cycle. >>> 128: schedule(0); >> >> Why not notify like in `run()`? Maybe refactor the code a bit to allow calling the same method to schedule the task and notify the service thread for immediate processing here. > > Because here we know that the service thread is running and if we schedule with a 0 delay, there is no risk of it going to sleep before we run again. Another task might be up next, but this task will eventually run before the service thread can go to sleep. > > There is room for improvement on how we schedule tasks from another thread. I changed `enqueue` to call a new public function on the `G1ServiceThread` called `schedule_task`, which calls `schedule(task, delay)` and then `notify()`. 
The function `schedule(task, delay)` is what previously was named `schedule_task`. This is what will be called when someone does `task->schedule(delay)` as well, so it is a bit more unified. My question is mainly: why not notify the service thread in all cases when scheduling a new task? I understand the reason for the zero delay. I do not see a problem with doing so: * tasks that need to run (or are overdue) are automatically run before this task as they have a time-to-run < current time, and so this task is scheduled after them. * The extra notification for `schedule_task()` does not seem to hurt; at most it wakes up the service thread to do the next scheduled task (which, afaiu, may be other tasks if they are already due). I.e. the "optimization" here to not notify the service thread seems to be superfluous. Or maybe the notification could be suppressed automatically if `schedule_task()` is called in `execute` (`G1ServiceThread` can check fairly easily if it is currently running a task). I am concerned about users of the API needlessly having to decide whether they should call `schedule()` or `schedule_task()`, as they have different effects. Maybe `schedule()` could just call `schedule_task()`. (That might be a pre-existing issue of using `schedule` vs. `schedule_task()`, so feel free to say it's out of scope. :) ) ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From tschatzl at openjdk.java.net Thu Nov 19 09:12:06 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 19 Nov 2020 09:12:06 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v8] In-Reply-To: References: Message-ID: <0qczWwLrXPZdrZcKH1WFtQcbtYq5irpFsgM5HiTKJn4=.c677e730-d9bc-4cb4-b393-7c58b2854b72@github.com> On Wed, 18 Nov 2020 20:44:19 GMT, Stefan Johansson wrote: >> Please review this change that implements concurrent uncommit for G1. >> >> **Summary** >> G1 currently checks if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC.
The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. >> >> The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). >> >> Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added. >> >> One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. >> >> **Logging** >> To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: >> [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 >> ... >> [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms >> [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms >> ... 
>> [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms >> >> The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. >> >> On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: >> [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) >> [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) >> [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) >> [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) >> [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) >> [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) >> [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) >> >> **Testing** >> Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. >> >> I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes. > > Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into 8236926-ccu > - Thomas review > - Merge branch 'master' into 8236926-ccu > - Zoom feedback > - Albert review 2 > - Albert review > - Merge branch 'master' into 8236926-ccu > - Lock for small mapper and use BitMap parallel operations. > - Self review > - Simplified task > - ... 
and 7 more: https://git.openjdk.java.net/jdk/compare/03e84ef7...6e3e33fc All good, thanks, except for some remaining minor nit and some question. src/hotspot/share/gc/g1/g1ServiceThread.cpp line 213: > 211: // Notify the service thread that there is a new task, thread might > 212: // be waiting and the newly added task might be first in the list. > 213: notify(); Maybe call `schedule_task()` here because the two calls are just that. ------------- Changes requested by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1141 From aph at openjdk.java.net Thu Nov 19 09:27:05 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 19 Nov 2020 09:27:05 GMT Subject: RFR: 8221554: aarch64 cross-modifying code [v10] In-Reply-To: <4QMWLVkKVex4WlweFhX5pjx3wvjX8pKIGRkawtFiTNI=.e31d9a04-030c-43c6-9a87-736840aaa721@github.com> References: <4QMWLVkKVex4WlweFhX5pjx3wvjX8pKIGRkawtFiTNI=.e31d9a04-030c-43c6-9a87-736840aaa721@github.com> Message-ID: <_tkcDnK8lWnrNX0ltj7e6omroZ_GojTEzsO56g3pPAw=.7e7df669-f68a-449a-84f0-fc5b8fb09405@github.com> On Thu, 19 Nov 2020 08:48:24 GMT, Alan Hayward wrote: >> The AArch64 port uses maybe_isb in places where an ISB might be required >> because the code may have safepointed. These maybe_isbs are very conservative >> and are used in many places are used when a safepoint has not happened. >> >> cross_modify_fence was added in common code to place a barrier in all the >> places after a safepoint has occurred. All the uses of it are in common code, >> yet it remains unimplemented on AArch64. >> >> This set of patches implements cross_modify_fence for AArch64 and reconsiders >> every uses of maybe_isb, discarding many of them. In addition, it introduces >> a new diagnostic option, which when enabled on AArch64 tests the correct >> usage of the barriers. >> >> Advantage of this patch is threefold: >> * Reducing the number of ISBs - giving a theoretical performance improvement. >> * Use of common code instead of backend specific code. 
>> * Additional test diagnostic options >> >> Patch 1: Split cross_modify_fence >> ================================= >> This is simply refactoring work split out to simplify the other two patches. >> >> instruction_fence() is provided by each target and simply places >> a fence for the instruction stream. >> >> cross_modify_fence() is now a member of JavaThread and just calls >> instruction_fence. This function will be extended in Patch 3. >> >> Patch 2: Use cross_modify_fence instead of maybe_isb >> ==================================================== >> >> The [n] References refer to the comments for cross_modify_fence in >> thread.hpp. >> >> This is all the existing uses of maybe_isb in the AArch64 target: >> >> 1) Instances of Java code calling a VM function >> * This encapsulates the changes to: >> ** MacroAssembler::call_VM_leaf_base() >> ** generate_fast_get_int_field0() >> ** stubGenerator_aarch64 generate_throw_exception() >> ** sharedRuntime_aarch64 generate_handler_blob() >> ** SharedRuntime::generate_resolve_blob() >> ** C1 LIR_Assembler::rt_call >> ** C1 StubAssembler::call_RT(): used by Used by generate_exception_throw, >> generate_handle_exception, generate_code_for. >> ** OptoRuntime::generate_exception_blob() >> * Any changes will be caught due to calls to [2] or [3] by the VM function. >> * Any calls that do not call [2] or [3] do not require an ISB. >> * This patch is more optimal for these cases. >> >> 2) Instances of Java code calling a JNI function >> * This encapsulates the changes to: >> ** SharedRuntime::generate_native_wrapper() >> ** TemplateInterpreterGenerator::generate_native_entry() >> * A safepoint still in progress after the call with be caught by [4]. >> * An ISB is still required for the case where there was a safepoint >> but it completed during the call. 
This happens if the code doesn't >> branch on safepoint_in_progress >> * In the SharedRuntime version, the two possible calls to >> reguard_yellow_pages and complete_monitor_unlocking_C are after the thread >> goes back into it's original state, so are covered by [2] and [3], the >> same as a normal VM call. >> * This patch is only more optimal for the two post-JNI calls. >> >> 3) Patching functions >> * This encapsulates the changes to: >> ** patch_callers_callsite() (called by gen_c2i_adapter()) >> * This results in code being patched, but does not safepoint >> * Therefore an ISB is required. >> * This patch introduces no change here. >> >> 4) C1 MacroAssembler::emit_static_call_stub() >> * Calls ISB (not maybe_isb) >> * By design, the patching doesn't require that the up-to-date >> destination is required for proper functioning. >> * However, the ISB makes it most likely that the new destination will >> be picked up. >> * This patch introduces no change here. >> >> Patch 3: Add cross modify fence verification >> ============================================ >> >> The VerifyCrossModifyFence diagnostic flag enables confirmation to the correct >> usage of instruction barriers. It can safely be enabled on any Java run. >> >> Enabling it will cause the following: >> >> * Once all threads have been brought to a safepoint, each thread will be >> marked. >> >> * On a cross_modify_fence and safepoint_fence the mark for that thread >> will be cleared. >> >> * On entry to a method and in a safepoint poll, then the thread is checked. >> If it is marked, then the code will error. > > Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: > > Add cross_modify_fence_impl for Windows AArch64 > > Change-Id: I8701ea60d2823d16666cb43cb9d0935d92b81e52 > CustomizedGitHooks: yes Marked as reviewed by aph (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/428 From ihse at openjdk.java.net Thu Nov 19 09:58:05 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Thu, 19 Nov 2020 09:58:05 GMT Subject: RFR: 8256497: Zero: enable G1 and Shenandoah GCs [v2] In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 08:32:14 GMT, Aleksey Shipilev wrote: >> Following the [JDK-8255796](https://bugs.openjdk.java.net/browse/JDK-8255796) improvement that ditched the inline contiguous alloc use from Zero, we can now rely on GC interface to hook the GCs properly. G1 and Shenandoah are a bit special here, because they require special `Reference.get` handling. >> >> Note that it does not change the default GC for Zero, because Zero is implicitly `NeverActAsServerMachine`, which still selects Serial GC by default. After this change, Zero users can opt-in to G1 or Shenandoah. >> >> Additional testing: >> - [x] Linux x86_64 Zero fastdebug `hotspot_gc_shenandoah` (some lingering failures about non-enabled compressed oops) >> - [ ] Linux x86_64 Zero fastdebug `tier1` with `-XX:+UseG1GC` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge branch 'master' into JDK-8256497-zero-g1-shenandoah > - Remove TODO > - 8256497: Zero: enable G1 and Shenandoah GCs Build changes look good. ------------- Marked as reviewed by ihse (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1268 From sspitsyn at openjdk.java.net Thu Nov 19 10:13:13 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Thu, 19 Nov 2020 10:13:13 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v11] In-Reply-To: References: <8m4pLTYDq8LLEZ3MRVfnZsKducSHJLX8WK-FxGaqgQw=.f426b0f8-0a50-4383-b037-24a925d9cf7e@github.com> Message-ID: On Thu, 19 Nov 2020 00:39:52 GMT, Kim Barrett wrote: >> Hi Coleen, >> It looks good to me. >> Just a couple of nits below. 
>> >> src/hotspot/share/prims/jvmtiTagMap.cpp: >> >> There is a double-check for _needs_cleaning, so the one at line 136 can be removed: >> 136 if (_needs_cleaning && >> 137 post_events && >> 138 env()->is_enabled(JVMTI_EVENT_OBJECT_FREE)) { >> 139 remove_dead_entries(true /* post_object_free */); >> 1158 void JvmtiTagMap::remove_dead_entries(bool post_object_free) { >> 1159 assert(is_locked(), "precondition"); >> 1160 if (_needs_cleaning) { >> 1161 log_info(jvmti, table)("TagMap table needs cleaning%s", >> 1162 (post_object_free ? " and posting" : "")); >> 1163 hashmap()->remove_dead_entries(env(), post_object_free); >> 1164 _needs_cleaning = false; >> 1165 } >> 1166 } >> test/hotspot/jtreg/vmTestbase/nsk/jvmti/AttachOnDemand/attach021/attach021Agent00.cpp: >> >> The change below is not needed as the call to nsk_jvmti_aod_disableEventAndFinish() does exactly the same: >> - nsk_jvmti_aod_disableEventAndFinish(agentName, JVMTI_EVENT_OBJECT_FREE, success, jvmti, jni); >> + >> + /* Flush any pending ObjectFree events, which may set success to 1 */ >> + if (jvmti->SetEventNotificationMode(JVMTI_DISABLE, >> + JVMTI_EVENT_OBJECT_FREE, >> + NULL) != JVMTI_ERROR_NONE) { >> + success = 0; >> + } >> + >> + nsk_aod_agentFinished(jni, agentName, success); >> } > >> test/hotspot/jtreg/vmTestbase/nsk/jvmti/AttachOnDemand/attach021/attach021Agent00.cpp: >> >> The change below is not needed as the call to nsk_jvmti_aod_disableEventAndFinish() does exactly the same: >> >> ``` >> - nsk_jvmti_aod_disableEventAndFinish(agentName, JVMTI_EVENT_OBJECT_FREE, success, jvmti, jni); >> + >> + /* Flush any pending ObjectFree events, which may set success to 1 */ >> + if (jvmti->SetEventNotificationMode(JVMTI_DISABLE, >> + JVMTI_EVENT_OBJECT_FREE, >> + NULL) != JVMTI_ERROR_NONE) { >> + success = 0; >> + } >> + >> + nsk_aod_agentFinished(jni, agentName, success); >> } >> ``` > > This change really is needed. 
> > The success variable in the test is a global, initially 0, set to 1 by the > ObjectFree handler. > > In the old code, if the ObjectFree event hasn't been posted yet, we pass the > initial 0 value of success to nsk_jvmti_aod_disabledEventAndFinish, where > it's a local variable (so unaffected by any changes to the variable in the > test), so stays 0 through to the call to nsk_aod_agentFinished. So the test > fails. > > The split in the change causes the updated post-ObjectFree event success > value of 1 to be passed to agentFinished. So the test passes. > > That required some head scratching to find at the time. That's the point of > the comment about flushing pending events. Maybe the comment should be > improved. @kimbarrett Okay, thank you for explanation. I agree, it'd make sense to improve the comment a little bit. Thanks, Serguei ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From rraj at openjdk.java.net Thu Nov 19 10:29:03 2020 From: rraj at openjdk.java.net (Rohit Arul Raj) Date: Thu, 19 Nov 2020 10:29:03 GMT Subject: RFR: 8256536: Newer AMD 19h (EPYC) Processor family defaults In-Reply-To: References: Message-ID: <_77p1TbiDooGF3LoRryCXkl9RblFG9grulAbkDa0Mk8=.271d3976-6878-4d73-b6fd-b82cbdf6fd3e@github.com> On Wed, 18 Nov 2020 10:36:03 GMT, Rohit Arul Raj wrote: > This patch sets the following flags as defaults for newer AMD 19h (EPYC) processors. > bool UseFPUForSpilling = true > bool UseUnalignedLoadStores = true > bool UseXMMForArrayCopy = true > bool UseXMMForObjInit = true > bool UseFastStosb = false > bool AlignVector = false > > Additional testing: make run-test TEST="tier1 tier2" JTREG="JOBS=128" CONF=linux-x86_64-server-fastdebug > > Please review this change. > > Thanks, > Rohit > On 18/11/2020 8:41 pm, Rohit Arul Raj wrote: > > > This patch sets the following flags as defaults for newer AMD 19h (EPYC) processors. 
> > bool UseFPUForSpilling = true > > bool UseUnalignedLoadStores = true > > bool UseXMMForArrayCopy = true > > bool UseXMMForObjInit = true > > bool UseFastStosb = false > > bool AlignVector = false > > I assume you have performance numbers to justify/motivate this change. > Can you please provide some details in the bug report. > David, Thanks for the review. 1. The four flags below were already set as defaults on AMD 17h. We are just extending the existing AMD 17h defaults to AMD 19h. bool UseFPUForSpilling = true bool UseUnalignedLoadStores = true bool UseXMMForArrayCopy = true bool AlignVector = false 2. Since AMD 19h supports fast string operations, "UseFastStosb" was enabled by default for object initialization. But from our experiments, XMM/YMM MOVDQU instructions perform better overall, especially for array sizes > 16 (64 bytes) and < 256 (1KB). Attached performance data: [Perf-data.txt](https://github.com/openjdk/jdk/files/5566025/Perf-data.txt) Test case used: I have used the same test case as in (http://cr.openjdk.java.net/~shade/8146801/benchmarks.jar) with additional sizes. bool UseFastStosb = false bool UseXMMForObjInit = true Regards, Rohit ------------- PR: https://git.openjdk.java.net/jdk/pull/1288 From sjohanss at openjdk.java.net Thu Nov 19 12:06:05 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 19 Nov 2020 12:06:05 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v8] In-Reply-To: <0qczWwLrXPZdrZcKH1WFtQcbtYq5irpFsgM5HiTKJn4=.c677e730-d9bc-4cb4-b393-7c58b2854b72@github.com> References: <0qczWwLrXPZdrZcKH1WFtQcbtYq5irpFsgM5HiTKJn4=.c677e730-d9bc-4cb4-b393-7c58b2854b72@github.com> Message-ID: On Thu, 19 Nov 2020 08:45:31 GMT, Thomas Schatzl wrote: >> Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
The pull request now contains 17 commits: >> >> - Merge branch 'master' into 8236926-ccu >> - Thomas review >> - Merge branch 'master' into 8236926-ccu >> - Zoom feedback >> - Albert review 2 >> - Albert review >> - Merge branch 'master' into 8236926-ccu >> - Lock for small mapper and use BitMap parallel operations. >> - Self review >> - Simplified task >> - ... and 7 more: https://git.openjdk.java.net/jdk/compare/03e84ef7...6e3e33fc > > src/hotspot/share/gc/g1/g1ServiceThread.cpp line 213: > >> 211: // Notify the service thread that there is a new task, thread might >> 212: // be waiting and the newly added task might be first in the list. >> 213: notify(); > > Maybe call `schedule_task()` here because the two calls are just that. Good catch, yes since we are already calling `schedule_task()` this call to `notify` is not needed. ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From sjohanss at openjdk.java.net Thu Nov 19 12:06:07 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 19 Nov 2020 12:06:07 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v7] In-Reply-To: References: Message-ID: <-qxEvBUBccx4ceKiLAA-NfojHyhWf9t6t1N3SwiImTE=.08b68e70-8ec8-4a6d-8286-fa55e9e7417f@github.com> On Thu, 19 Nov 2020 08:47:47 GMT, Thomas Schatzl wrote: >> Because here we know that the service thread is running and if we schedule with a 0 delay, there is no risk of it going to sleep before we run again. Another task might be up next, but this task will eventually run before the service thread can go to sleep. >> >> There is room for improvement on how we schedule tasks from another thread. I changed `enqueue` to call a new public function on the `G1ServiceThread` called `schedule_task`, which calls `schedule(task, delay)` and then `notify()`. The function `schedule(task, delay)` is what previously was named `schedule_task`. This is what will be called when someone does `task->schedule(delay)` as well, so it is a bit more unified. 
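The split between the two scheduling entry points described in the quoted paragraph can be sketched as follows. This is a single-threaded model under stated assumptions, not the G1 code: `SketchTask`, `SketchServiceThread`, the explicit `now` parameter and the notification counter are all hypothetical, and a real `notify()` would signal a monitor instead of counting.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical task record: an id plus the time at which it is due to run.
struct SketchTask {
  int id;
  long due;
};

// Orders the queue so that the task with the earliest due time is on top.
struct ByDue {
  bool operator()(const SketchTask& a, const SketchTask& b) const {
    return a.due > b.due;
  }
};

class SketchServiceThread {
  std::priority_queue<SketchTask, std::vector<SketchTask>, ByDue> _queue;
  int _notifications = 0;

public:
  // Internal path (a running task rescheduling itself): only inserts the
  // task, ordered by due time. No wake-up, since the thread is running.
  void schedule(SketchTask task, long now, long delay) {
    task.due = now + delay;
    _queue.push(task);
  }

  // Stand-in for waking the service thread; here it only counts calls.
  void notify() { ++_notifications; }

  // External path: insert, then wake the thread in case the new task
  // ended up first in the queue while the thread was waiting.
  void schedule_task(SketchTask task, long now, long delay) {
    schedule(task, now, delay);
    notify();
  }

  // Precondition: at least one task has been scheduled.
  int next_id() const { return _queue.top().id; }
  int notifications() const { return _notifications; }
};
```

With this shape, a task rescheduled with a zero delay from inside a running task is ordered ahead of later-due tasks without an extra wake-up, which matches the reasoning given for `schedule(0)`.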
> > My question is mainly, why not notify the task to wake up when scheduling a new task in all cases. I understand the reason for the zero delay. > > Not seeing the problem of doing so: > > * tasks that need to run (or are overdue) are automatically run before this task as they have a time-to-run < current time, and so this task is scheduled after > > * The extra notification for `schedule_task()` does not seem to hurt, at most it wakes up the service thread to do the next scheduled task (which afaiu other tasks if they are already due). > > I.e. the "optimization" here to not notify the service thread seems to be superfluous. Or maybe the notification could be suppressed automatically if `schedule_task()` is called in `execute` (`G1ServiceThread` can check fairly easily if it is currently running a task) > > I am concerned about users of the API to needlessly have to decide whether they should call `schedule()` or `schedule_task()` as they have different effect. > > Maybe `schedule()` could just call `schedule_task()`. > > (That might be a pre-existing issue of using `schedule` vs. `schedule_task()`, so feel free to say it's out of scope. :) ) You are correct that doing the extra notification won't hurt, but I don't really see why we should do the notification when we know it is not needed. There are probably a few improvements that we want to do when it comes to how the service thread is registering and scheduling tasks. Since this mechanism is very new I think we will realize how we want this to work more and more. This is the way I see it right now (after the introduction of the public `G1ServiceThread::schedule_task()`: * `G1ServiceTask::schedule(delay)` should only be called from a running task. Using it to schedule it from the outside was a bit of a hack. `schedule()` will call `G1ServiceThread::schedule(task, delay)` and nothing more. * `G1ServiceThread::schedule_task(task, delay)` should be used to schedule a task from the outside. 
Under the hood it will call `G1ServiceThread::schedule(task, delay)` and `G1ServiceThread::notify()` to handle the case where the task ends up being the first task in the queue. To me separating the use cases is good, but you might not agree. We could add an assert to `G1ServiceTask::schedule(delay)` to ensure that it is only called when running on the service thread; that way we would catch wrong usage of the API quickly. ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From github.com+4146708+a74nh at openjdk.java.net Thu Nov 19 12:30:02 2020 From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward) Date: Thu, 19 Nov 2020 12:30:02 GMT Subject: Integrated: 8221554: aarch64 cross-modifying code In-Reply-To: References: Message-ID: On Wed, 30 Sep 2020 08:36:32 GMT, Alan Hayward wrote: > The AArch64 port uses maybe_isb in places where an ISB might be required > because the code may have safepointed. These maybe_isbs are very conservative > and are used in many places where a safepoint has not happened. > > cross_modify_fence was added in common code to place a barrier in all the > places after a safepoint has occurred. All the uses of it are in common code, > yet it remains unimplemented on AArch64. > > This set of patches implements cross_modify_fence for AArch64 and reconsiders > every use of maybe_isb, discarding many of them. In addition, it introduces > a new diagnostic option, which when enabled on AArch64 tests the correct > usage of the barriers. > > The advantage of this patch is threefold: > * Reducing the number of ISBs - giving a theoretical performance improvement. > * Use of common code instead of backend-specific code. > * Additional test diagnostic options > > Patch 1: Split cross_modify_fence > ================================= > This is simply refactoring work split out to simplify the other two patches. > > instruction_fence() is provided by each target and simply places > a fence for the instruction stream.
> > cross_modify_fence() is now a member of JavaThread and just calls > instruction_fence. This function will be extended in Patch 3. > > Patch 2: Use cross_modify_fence instead of maybe_isb > ==================================================== > > The [n] References refer to the comments for cross_modify_fence in > thread.hpp. > > This is all the existing uses of maybe_isb in the AArch64 target: > > 1) Instances of Java code calling a VM function > * This encapsulates the changes to: > ** MacroAssembler::call_VM_leaf_base() > ** generate_fast_get_int_field0() > ** stubGenerator_aarch64 generate_throw_exception() > ** sharedRuntime_aarch64 generate_handler_blob() > ** SharedRuntime::generate_resolve_blob() > ** C1 LIR_Assembler::rt_call > ** C1 StubAssembler::call_RT(): used by Used by generate_exception_throw, > generate_handle_exception, generate_code_for. > ** OptoRuntime::generate_exception_blob() > * Any changes will be caught due to calls to [2] or [3] by the VM function. > * Any calls that do not call [2] or [3] do not require an ISB. > * This patch is more optimal for these cases. > > 2) Instances of Java code calling a JNI function > * This encapsulates the changes to: > ** SharedRuntime::generate_native_wrapper() > ** TemplateInterpreterGenerator::generate_native_entry() > * A safepoint still in progress after the call with be caught by [4]. > * An ISB is still required for the case where there was a safepoint > but it completed during the call. This happens if the code doesn't > branch on safepoint_in_progress > * In the SharedRuntime version, the two possible calls to > reguard_yellow_pages and complete_monitor_unlocking_C are after the thread > goes back into it's original state, so are covered by [2] and [3], the > same as a normal VM call. > * This patch is only more optimal for the two post-JNI calls. 
> > 3) Patching functions > * This encapsulates the changes to: > ** patch_callers_callsite() (called by gen_c2i_adapter()) > * This results in code being patched, but does not safepoint > * Therefore an ISB is required. > * This patch introduces no change here. > > 4) C1 MacroAssembler::emit_static_call_stub() > * Calls ISB (not maybe_isb) > * By design, the patching doesn't require that the up-to-date > destination is required for proper functioning. > * However, the ISB makes it most likely that the new destination will > be picked up. > * This patch introduces no change here. > > Patch 3: Add cross modify fence verification > ============================================ > > The VerifyCrossModifyFence diagnostic flag enables confirmation to the correct > usage of instruction barriers. It can safely be enabled on any Java run. > > Enabling it will cause the following: > > * Once all threads have been brought to a safepoint, each thread will be > marked. > > * On a cross_modify_fence and safepoint_fence the mark for that thread > will be cleared. > > * On entry to a method and in a safepoint poll, then the thread is checked. > If it is marked, then the code will error. This pull request has now been integrated. 
Changeset: d183fc7f Author: Alan Hayward Committer: Nick Gasson URL: https://git.openjdk.java.net/jdk/commit/d183fc7f Stats: 145 lines in 26 files changed: 96 ins; 11 del; 38 mod 8221554: aarch64 cross-modifying code Reviewed-by: rehn, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/428 From coleenp at openjdk.java.net Thu Nov 19 12:39:14 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 12:39:14 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v11] In-Reply-To: References: <8m4pLTYDq8LLEZ3MRVfnZsKducSHJLX8WK-FxGaqgQw=.f426b0f8-0a50-4383-b037-24a925d9cf7e@github.com> Message-ID: <6Rg4SwfEMn0metyXsBl4pGfdP5zfspPuBLjFP82bGic=.f3ba30ad-ea01-4484-ae5f-1b6e3ce5b12a@github.com> On Thu, 19 Nov 2020 10:10:06 GMT, Serguei Spitsyn wrote: >>> test/hotspot/jtreg/vmTestbase/nsk/jvmti/AttachOnDemand/attach021/attach021Agent00.cpp: >>> >>> The change below is not needed as the call to nsk_jvmti_aod_disableEventAndFinish() does exactly the same: >>> >>> ``` >>> - nsk_jvmti_aod_disableEventAndFinish(agentName, JVMTI_EVENT_OBJECT_FREE, success, jvmti, jni); >>> + >>> + /* Flush any pending ObjectFree events, which may set success to 1 */ >>> + if (jvmti->SetEventNotificationMode(JVMTI_DISABLE, >>> + JVMTI_EVENT_OBJECT_FREE, >>> + NULL) != JVMTI_ERROR_NONE) { >>> + success = 0; >>> + } >>> + >>> + nsk_aod_agentFinished(jni, agentName, success); >>> } >>> ``` >> >> This change really is needed. >> >> The success variable in the test is a global, initially 0, set to 1 by the >> ObjectFree handler. >> >> In the old code, if the ObjectFree event hasn't been posted yet, we pass the >> initial 0 value of success to nsk_jvmti_aod_disabledEventAndFinish, where >> it's a local variable (so unaffected by any changes to the variable in the >> test), so stays 0 through to the call to nsk_aod_agentFinished. So the test >> fails. 
>> >> The split in the change causes the updated post-ObjectFree event success >> value of 1 to be passed to agentFinished. So the test passes. >> >> That required some head scratching to find at the time. That's the point of >> the comment about flushing pending events. Maybe the comment should be >> improved. > > @kimbarrett > Okay, thank you for explanation. > I agree, it'd make sense to improve the comment a little bit. > Thanks, > Serguei So should nsk_jvmti_aod_disableEventAndFinish pass the address of success instead? Why didn't it fail before this change? ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Thu Nov 19 12:57:25 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 12:57:25 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v11] In-Reply-To: <6Rg4SwfEMn0metyXsBl4pGfdP5zfspPuBLjFP82bGic=.f3ba30ad-ea01-4484-ae5f-1b6e3ce5b12a@github.com> References: <8m4pLTYDq8LLEZ3MRVfnZsKducSHJLX8WK-FxGaqgQw=.f426b0f8-0a50-4383-b037-24a925d9cf7e@github.com> <6Rg4SwfEMn0metyXsBl4pGfdP5zfspPuBLjFP82bGic=.f3ba30ad-ea01-4484-ae5f-1b6e3ce5b12a@github.com> Message-ID: On Thu, 19 Nov 2020 12:36:46 GMT, Coleen Phillimore wrote: >> @kimbarrett >> Okay, thank you for explanation. >> I agree, it'd make sense to improve the comment a little bit. >> Thanks, >> Serguei > > So should nsk_jvmti_aod_disableEventAndFinish pass the address of success instead? Why didn't it fail before this change? > Ok, with this change, it's not posted yet and the success variable for nsk_aod_agentFinished is the local variable. We should fix this in an RFE filed: https://bugs.openjdk.java.net/browse/JDK-8256651 /* Flush any pending ObjectFree events, which will set global success variable to 1 for any pending ObjectFree events. */ How about this? The word 'global' helps me. 
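
The pass-by-value problem described above can be sketched in isolation. This is a minimal model, not the nsk test library: `object_free_handler`, `disable_event_and_flush`, `finish_with_snapshot` and `flush_then_finish` are hypothetical names, and the "flush" is simulated with a boolean rather than a real JVMTI call.

```cpp
#include <cassert>

// Hypothetical stand-in for the test's global flag; the handler sets it to 1.
int success = 0;

// Models the ObjectFree callback writing to the global.
void object_free_handler() { success = 1; }

// Models disabling the event: a pending ObjectFree event is delivered first.
void disable_event_and_flush(bool event_pending) {
  if (event_pending) object_free_handler();
}

// Old shape: the caller copies the global into a parameter before the flush,
// so the handler's later write to the global never reaches the copy.
int finish_with_snapshot(int success_copy, bool event_pending) {
  disable_event_and_flush(event_pending);
  return success_copy;  // stale value taken before the flush
}

// New shape: flush first, then read the global.
int flush_then_finish(bool event_pending) {
  disable_event_and_flush(event_pending);
  return success;       // observes the handler's update
}
```

With a pending event, the first function reports the value captured before the flush and the second reports the value written by the handler, which is the difference the split in the test relies on.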
------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Thu Nov 19 12:57:24 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 12:57:24 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v13] In-Reply-To: References: Message-ID: > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. 
> > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Update comment in jvmti test. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/967/files - new: https://git.openjdk.java.net/jdk/pull/967/files/9ef44f28..40168e63 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=12 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=11-12 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/967/head:pull/967 PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Thu Nov 19 12:57:25 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 12:57:25 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v11] In-Reply-To: References: <8m4pLTYDq8LLEZ3MRVfnZsKducSHJLX8WK-FxGaqgQw=.f426b0f8-0a50-4383-b037-24a925d9cf7e@github.com> <6Rg4SwfEMn0metyXsBl4pGfdP5zfspPuBLjFP82bGic=.f3ba30ad-ea01-4484-ae5f-1b6e3ce5b12a@github.com> Message-ID: On Thu, 19 Nov 2020 12:51:11 GMT, Coleen Phillimore wrote: >> So should nsk_jvmti_aod_disableEventAndFinish pass the address of success instead? 
Why didn't it fail before this change? >> Ok, with this change, it's not posted yet and the success variable for nsk_aod_agentFinished is the local variable. We should fix this in an RFE filed: https://bugs.openjdk.java.net/browse/JDK-8256651 > > /* Flush any pending ObjectFree events, which will set global success variable to 1 > for any pending ObjectFree events. */ > How about this? The word 'global' helps me. With remerging into shenandoah, all the jdi tests pass with shenandoah also. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From tschatzl at openjdk.java.net Thu Nov 19 12:59:10 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 19 Nov 2020 12:59:10 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v7] In-Reply-To: <-qxEvBUBccx4ceKiLAA-NfojHyhWf9t6t1N3SwiImTE=.08b68e70-8ec8-4a6d-8286-fa55e9e7417f@github.com> References: <-qxEvBUBccx4ceKiLAA-NfojHyhWf9t6t1N3SwiImTE=.08b68e70-8ec8-4a6d-8286-fa55e9e7417f@github.com> Message-ID: On Thu, 19 Nov 2020 12:02:27 GMT, Stefan Johansson wrote: >> My question is mainly, why not notify the task to wake up when scheduling a new task in all cases. I understand the reason for the zero delay. >> >> Not seeing the problem of doing so: >> >> * tasks that need to run (or are overdue) are automatically run before this task as they have a time-to-run < current time, and so this task is scheduled after >> >> * The extra notification for `schedule_task()` does not seem to hurt, at most it wakes up the service thread to do the next scheduled task (which afaiu other tasks if they are already due). >> >> I.e. the "optimization" here to not notify the service thread seems to be superfluous. 
Or maybe the notification could be suppressed automatically if `schedule_task()` is called in `execute` (`G1ServiceThread` can check fairly easily if it is currently running a task) >> >> I am concerned about users of the API needlessly having to decide whether they should call `schedule()` or `schedule_task()` as they have different effects. >> >> Maybe `schedule()` could just call `schedule_task()`. >> >> (That might be a pre-existing issue of using `schedule` vs. `schedule_task()`, so feel free to say it's out of scope. :) ) > > You are correct that doing the extra notification won't hurt, but I don't really see why we should do the notification when we know it is not needed. There are probably a few improvements that we want to do when it comes to how the service thread is registering and scheduling tasks. Since this mechanism is very new I think we will realize how we want this to work more and more. > > This is the way I see it right now (after the introduction of the public `G1ServiceThread::schedule_task()`): > * `G1ServiceTask::schedule(delay)` should only be called from a running task. Using it to schedule it from the outside was a bit of a hack. `schedule()` will call `G1ServiceThread::schedule(task, delay)` and nothing more. > * `G1ServiceThread::schedule_task(task, delay)` should be used to schedule a task from the outside. It will under the hood call `G1ServiceThread::schedule(task, delay)` and `G1ServiceThread::notify()` to handle the case where the task ends up being the first task in the queue. > > To me separating the use cases is good, but you might not agree. We could add an assert to `G1ServiceTask::schedule(delay)` to ensure that it is only called when running on the service thread, that way we would catch wrong usage of the API quickly. I would be fine with the assert to prevent misuse.
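
For readers following the thread, the split between the two entry points and the proposed assert can be sketched as follows. This is a single-threaded toy, not the G1 code: `ServiceThreadSketch`, `Task`, `run_one` and the notification counter are hypothetical, and "notify" is modelled as a counter instead of a monitor wake-up.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

struct Task {
  int  id;
  long run_at;  // absolute time at which the task becomes due
};

// Orders the priority queue so the earliest run_at is on top.
struct EarliestFirst {
  bool operator()(const Task& a, const Task& b) const { return a.run_at > b.run_at; }
};

class ServiceThreadSketch {
  std::priority_queue<Task, std::vector<Task>, EarliestFirst> _queue;
  bool _running_task = false;  // true only while a task executes
  int  _notifications = 0;     // stands in for waking the service thread

  void enqueue(Task t, long now, long delay) {
    t.run_at = now + delay;
    _queue.push(t);
  }

public:
  // Reschedule from inside a running task: no wake-up is needed because the
  // service thread is the caller. The assert catches use from the outside.
  void schedule(Task t, long now, long delay) {
    assert(_running_task && "schedule() is only for a task that is currently running");
    enqueue(t, now, delay);
  }

  // Schedule from the outside: enqueue and wake the service thread, in case
  // the new task ends up first in the queue.
  void schedule_task(Task t, long now, long delay) {
    enqueue(t, now, delay);
    ++_notifications;
  }

  // Runs the earliest due task, which reschedules itself via schedule().
  bool run_one(long now, long reschedule_delay) {
    if (_queue.empty() || _queue.top().run_at > now) return false;
    Task t = _queue.top();
    _queue.pop();
    _running_task = true;
    schedule(t, now, reschedule_delay);
    _running_task = false;
    return true;
  }

  int         notifications() const { return _notifications; }
  std::size_t pending()       const { return _queue.size(); }
  long        next_run_at()   const { return _queue.top().run_at; }
};
```

In this model an external `schedule_task()` produces exactly one notification, and the self-rescheduling done inside `run_one()` produces none.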
------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From coleenp at openjdk.java.net Thu Nov 19 13:01:22 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 13:01:22 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v14] In-Reply-To: References: Message-ID: > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events are needed to post. > 2. For GetObjectWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. 
> > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. > > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix copyrights in test changes. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/967/files - new: https://git.openjdk.java.net/jdk/pull/967/files/40168e63..589e4c5b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=13 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=967&range=12-13 Stats: 5 lines in 5 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/967.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/967/head:pull/967 PR: https://git.openjdk.java.net/jdk/pull/967 From stuefe at openjdk.java.net Thu Nov 19 13:11:10 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 19 Nov 2020 13:11:10 GMT Subject: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2] In-Reply-To: <8f-BJdFip5yf0Rv4uw-qcXVk2uM3Lb6Hrq9VPR6UzF4=.04966477-8834-4fb9-aa77-8da86f104176@github.com> References: <-xtX9qSJHuD-qfp52XPToKhkl1HypRmNFHCJaupaync=.99285cd9-69a6-42cd-84b4-3c87fefc2cc5@github.com> <8f-BJdFip5yf0Rv4uw-qcXVk2uM3Lb6Hrq9VPR6UzF4=.04966477-8834-4fb9-aa77-8da86f104176@github.com> Message-ID: On Thu, 19 Nov 2020 
08:19:59 GMT, Stefan Johansson wrote: >> Hi Stefan, >> >> Thanks so much for your review. >> >>> Hi and welcome :) >>> >>> I haven't started reviewing the code in detail but a first quick glance raised a couple of questions/comments: >>> >>> * Why do we have a special case for `exec` when selecting a large page size? >> >> To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. In most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations. >> >> Perhaps I should just select the page size <= bytes requested and remove 'exec' special case. >> >>> * If we need the special case, `exec_large_page_size()` should not be hard code to return 2m but rather use `os::default_large_page_size()`. >> >> os::default_large_page_size() will not necessarily be small enough for code memory reservations if the os::default_large_page_size() = 1G, in those cases we would get 4k on most linux x86_64 variants. My attempt is to ensure the smallest large_page_size availabe is used for code memory reservations. Perhaps my 2M hardcoding was a mistake and I should discover this size and select it based on the bytes being reserved. > >> Hi Stefan, >> >> Thanks so much for your review. >> >> > Hi and welcome :) >> > I haven't started reviewing the code in detail but a first quick glance raised a couple of questions/comments: >> > >> > * Why do we have a special case for `exec` when selecting a large page size? 
>> >> To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. In most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations. >> >> Perhaps I should just select the page size <= bytes requested and remove 'exec' special case. >> > Yes, I see no reason to keep that special case and we want to keep this code as general as possible. Looking at the code in `os::Linux::find_default_large_page_size()` it looks like S390 supports 1M large pages, so we cannot assume 2M. I suggest using a technique similar to the one used in `os::Linux::find_large_page_size` to find supported page sizes. If you scan `/sys/kernel/mm/hugepages` and populate `_page_sizes` using the information found we know we only get supported sizes. > >> > * If we need the special case, `exec_large_page_size()` should not be hard code to return 2m but rather use `os::default_large_page_size()`. >> >> os::default_large_page_size() will not necessarily be small enough for code memory reservations if the os::default_large_page_size() = 1G, in those cases we would get 4k on most linux x86_64 variants. My attempt is to ensure the smallest large_page_size availabe is used for code memory reservations. Perhaps my 2M hardcoding was a mistake and I should discover this size and select it based on the bytes being reserved. 
> > You are correct that the default size might indeed be 1G, so using something like what I suggest above to figure out the available page sizes and then using an appropriate one given the size of the mapping sounds good. > > Please also avoid force pushing changes to open PRs since it makes it harder to follow what changes between updates. It is fine for a PR to contain multiple commits and if you need to update with things from the main branch you should merge rather than rebase. > > Cheers, > Stefan Hi Markus, thanks, and a belated welcome! Some initial background: We at SAP are maintainers for a number of ports, among others AIX and linux ppc/s390 as well as some proprietary ones (e.g. HPUX or ia64). So I wear my platform glasses when looking at this code. IMHO the virtual memory layer in hotspot - os::reserve_memory() and all its friends - could do with a revamp. At least consistent API documentation :-/. Supposed to be an API-independent abstraction, its facade breaks in many places. See e.g. JDK-8255978, JDK-8253649 (all windows), AIX sysV shmem handling, @AntonKozlov's valiant attempt to add MAP_JIT code heap reservation on MacOS (https://github.com/openjdk/jdk/pull/294), or the relative difficulty with which support for JEP 316 (from Intel) had been added. Hence my initial caution. Every new feature increases complexity for us maintainers. Especially if it continues the bad tradition of not documenting or commenting anything. Since I do not know whether Intel sticks around to maintain this contribution (bit of a mixed track record there, see e.g. JDK-8256181), we must plan on maintenance falling to us. That said, now that I understand better what you want to do, your plan certainly makes sense and is useful. One of the more pressing concerns I have is that the changes to reserve_memory() would somehow be observable from the outside and/or leak back into the os layer when calling os::commit_memory/uncommit_memory/release_memory. 
This is the case with @AntonKozlov's MAP_JIT change: it requires a matching commit call in os::commit_memory() to be made for executable memory allocated with os::reserve_memory(), and therefore exposed one weakness of the os::reserve_memory() API, that it's very difficult to pass along meta information about memory mappings. I think this is not the case here, but I'm not sure and we should be sure. **More remarks inline.** > Hi Thomas, > > Thanks so much for your review. Please bear with me as this is my first patch to the JDK community. But I have pushed patches to other open source communities (OpenDaylight, ONAP, OpenStack) and worked as a committer in some. > > **Responses below inline:** > > > Hi, > > this seems like an improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping. > > Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures. > > I appreciate the feedback. Perhaps the lack of detail in the pull request/JDK issue is a function of my zoomed focus on the specific purpose and lack of understanding about how much detail is normally included. The purpose of the patch/issue is to enable code hugepage memory reservations on Linux when the JDK is configured with 1G hugepages (LargePages in JDK parlance). Please beef up the JBS issue a bit. If you do not have access to it, you can send the text to me and I will update it. Or even easier, just update the PR description and we copy the text to the JBS. JBS tickets are supposed to keep information about what we did and why for a long time. When formulating the text, just imagine the reader to be someone in the future with general knowledge in your field but without particular knowledge about this very case. 
I know this is a vague description though; for an example, see e.g. https://bugs.openjdk.java.net/browse/JDK-8255978. > > To my knowledge, in most cases currently code memory is reserved in the default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants the default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations. Right, and as Stefan suggested, this should be kept more "fluid" and not be hard coded to 2M, nor to just one additional large page. Maybe the system has four page sizes (our proprietary HPUX has that, not that it matters here). > > > I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec. > > What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :) > > I'd appreciate any advice on writing a less vague spec. I have used exec as a stand-in for code memory reservations in my descriptions, mostly due to the fact that a 'bool exec' is used in functions that reserve HugePages and this later is translated into 'PROT_EXEC' when mmap is called, "exec" is passed in but not used in SHM. These are the particular memory reservations we wanted the patch to affect when using 1G LargePages. However I will remove those references if unwarranted. > > > What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other". > > Code is the target memory region. However there are some other instances where large_page reservation is happening due to the addition of 2M pages as an option. 
Some calls fail and error when adding 2M pages to _page_sizes array in the company of 1G pages. See line https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR3790 > This is where 2m pages are added. > > However at https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4077 > we get failures after the addition of 2M sizes to _page_sizes array, due to some smaller reservations that happen regardless of the LargePageSizeInBytes=1G > > So in > https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4229 > we make sure that we select the largest page size that works with the bytes requested for reservation. Perhaps we shouldn't have exec as a special case, as the large page returned will be the same based on the size requested. We need to decide on whether we want to do this for the code heap only or for every reservation done with reserve_memory_special (I really dislike that name btw). In your proposal you "piggyback" on the exec property as a standin for "code heap", which is not clean and also not necessarily true. So: a) If we only want to do this for the code heap, we could think about creating an own API for allocating the code heap. E.g. os::reserve_code_space() and os::release_code_space(). This is one of the ideas @AntonKozlov came up with to circumvent the need for a fully fledged revamp of these APIs while still being able to move his PR forward. b) If we want to do this for all callers of reserve_memory_special(), we should also remove any mention of "exec" and just implement that. I currently favour (b) but would like to know opinions of others. 
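
To make option (b) concrete: "select the largest page size that works with the bytes requested" could look roughly like the sketch below. It is not the patch's code. `select_page_size` and its parameters are hypothetical, the list of supported sizes is assumed to have been discovered elsewhere (for example by scanning `/sys/kernel/mm/hugepages` as Stefan suggested), and alignment of the request to the chosen page size is deliberately not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the largest supported page size that does not exceed the request,
// falling back to the base page size when no larger size fits.
// page_sizes may be in any order; base_page is the system's small page size.
std::size_t select_page_size(const std::vector<std::size_t>& page_sizes,
                             std::size_t bytes,
                             std::size_t base_page) {
  std::size_t best = base_page;
  for (std::size_t ps : page_sizes) {
    if (ps <= bytes && ps > best) {
      best = ps;
    }
  }
  return best;
}
```

Under these assumptions, with 4K, 2M and 1G available, a 240M request would get 2M pages, a 2G request would get 1G pages, and a 64K request would stay on 4K pages. Nothing in the selection depends on an exec flag.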
> > > One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size? > > What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM? > > My architecture knowledge outside of x86_64 is limited. I've been looking into this and thinking about it and I will have some more comments in the next day or so. For UseSHM I ran some negative tests but I will do some more rigorous testing and report back. Okay. We do not expect every contributor to have exotic test machines, but this means we will have to do that testing. We need to know to plan in these efforts. > > > The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages. > > For SHM, I think you need to make sure that alignment matches SHMLBA? > > Looking into this. > > > It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitly is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR. > > I'm open to removing exec references and just enabling multiple page sizes which would allow for 2M pages to be used by code memory reservations. 
> > > Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder). > > Thanks. I will push an updated patch w/ comments. Will attempt clear API spec and regression tests for reserve_memory_special but will need some guidance on those. When I write API specs I basically mean "new code should comment better". That can be as simple as a one liner above your os::Linux::select_large_page_size() function. About regression tests, we have a google-test suite (see test/hotspot/gtest) which would be the appropriate point to put in tests. > > > The linux-2m-page-specific code in the platform-generic G1 test seems wrong. > > Any advice here? My change specifically changes the behavior of the pages returned in the test for linux platforms but should not have effects on other platforms. I don't know how this would generally happen for JDK tests in this case. It seems to me that the JDK will act differently on different platforms. How is this normally handled? I defer to the G1 folks for that. > > > Cheers, Thomas > > Thanks again for the review. Sure. Thanks for the much clearer information. Cheers, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From sjohanss at openjdk.java.net Thu Nov 19 13:16:20 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 19 Nov 2020 13:16:20 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v9] In-Reply-To: References: Message-ID: > Please review this change that implements concurrent uncommit for G1. > > **Summary** > G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. 
The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. > > The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps are the regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). > > Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this a new lock `Uncommit_lock` is added. > > One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. > > **Logging** > To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: > [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 > ... > [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms > [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms > ... > [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms > > The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. 
As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. > > On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: > [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) > [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) > [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) > [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) > [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) > [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) > [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) > > **Testing** > Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. > > I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes. 
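
The two-bitmap bookkeeping from the summary can be illustrated with a small model. This is only a sketch under stated assumptions: `CommitMapSketch` and its method names are hypothetical, it is single-threaded (the real change guards the inactive map with `Uncommit_lock`), and "calls to the OS" are reduced to a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class CommitMapSketch {
  std::vector<bool> _active;    // regions available for use
  std::vector<bool> _inactive;  // regions committed but waiting to be uncommitted
  int _os_commits = 0;          // counts simulated commit calls to the OS

public:
  explicit CommitMapSketch(std::size_t regions)
    : _active(regions, false), _inactive(regions, false) {}

  // A region is committed if it is in either bitmap (the union of the two).
  bool is_committed(std::size_t i) const { return _active[i] || _inactive[i]; }

  // Pause-time part of a shrink: only flip the bits, no OS work.
  void deactivate(std::size_t i) {
    assert(_active[i]);
    _active[i] = false;
    _inactive[i] = true;
  }

  // Concurrent part of a shrink: hand the memory back to the OS.
  void uncommit(std::size_t i) {
    assert(_inactive[i]);
    _inactive[i] = false;
  }

  // Expansion prefers re-activating an inactive region (no OS call) over
  // committing a new one. Returns the region index, or size() if full.
  std::size_t expand_one() {
    for (std::size_t i = 0; i < _inactive.size(); ++i) {
      if (_inactive[i]) {
        _inactive[i] = false;
        _active[i] = true;
        return i;
      }
    }
    for (std::size_t i = 0; i < _active.size(); ++i) {
      if (!_active[i]) {
        _active[i] = true;
        ++_os_commits;
        return i;
      }
    }
    return _active.size();
  }

  int         os_commits() const { return _os_commits; }
  std::size_t size()       const { return _active.size(); }
};
```

In this model a region that was deactivated but not yet uncommitted can be handed out again without a new commit, which mirrors why not all regions "ready for uncommit" in the logs end up being uncommitted.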
Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Thomas review 2 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1141/files - new: https://git.openjdk.java.net/jdk/pull/1141/files/6e3e33fc..553f99a1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=07-08 Stats: 13 lines in 2 files changed: 5 ins; 4 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1141.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1141/head:pull/1141 PR: https://git.openjdk.java.net/jdk/pull/1141 From sjohanss at openjdk.java.net Thu Nov 19 13:35:08 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 19 Nov 2020 13:35:08 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v9] In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 13:16:20 GMT, Stefan Johansson wrote: >> Please review this change that implements concurrent uncommit for G1. >> >> **Summary** >> G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. >> >> The calculations of how much to shrink the heap and when is not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). 
The union of those two bitmaps is the set of regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). >> >> Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this, a new lock `Uncommit_lock` is added. >> >> One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. >> >> **Logging** >> To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: >> [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 >> ... >> [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms >> [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms >> ... >> [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms >> >> The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. 
>> >> On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: >> [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) >> [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) >> [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) >> [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) >> [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) >> [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) >> [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) >> >> **Testing** >> Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. >> >> I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes. > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > Thomas review 2 Thanks Thomas for your review. ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From sjohanss at openjdk.java.net Thu Nov 19 13:35:09 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 19 Nov 2020 13:35:09 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v7] In-Reply-To: References: <-qxEvBUBccx4ceKiLAA-NfojHyhWf9t6t1N3SwiImTE=.08b68e70-8ec8-4a6d-8286-fa55e9e7417f@github.com> Message-ID: On Thu, 19 Nov 2020 12:56:31 GMT, Thomas Schatzl wrote: >> You are correct that doing the extra notification won't hurt, but I don't really see why we should do the notification when we know it is not needed. There are probably a few improvements that we want to do when it comes to how the service thread is registering and scheduling tasks. 
Since this mechanism is very new I think we will realize how we want this to work more and more. >> >> This is the way I see it right now (after the introduction of the public `G1ServiceThread::schedule_task()`): >> * `G1ServiceTask::schedule(delay)` should only be called from a running task. Using it to schedule a task from the outside was a bit of a hack. `schedule()` will call `G1ServiceThread::schedule(task, delay)` and nothing more. >> * `G1ServiceThread::schedule_task(task, delay)` should be used to schedule a task from the outside. It will under the hood call `G1ServiceThread::schedule(task, delay)` and `G1ServiceThread::notify()` to handle the case where the task ends up being the first task in the queue. >> >> To me separating the use cases is good, but you might not agree. We could add an assert to `G1ServiceTask::schedule(delay)` to ensure that it is only called when running on the service thread, that way we would catch wrong usage of the API quickly. > > I would be fine with the assert to prevent misuse. Just pushed the assert and also updated the comments a bit. ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From tschatzl at openjdk.java.net Thu Nov 19 14:18:10 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 19 Nov 2020 14:18:10 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v9] In-Reply-To: References: Message-ID: <80ivNCjCbJgQiMYSzrq0lOE0hC2N9UF4Ke7Davc40gk=.cedfa72a-c6ee-4c07-aee4-d02865bd622b@github.com> On Thu, 19 Nov 2020 13:16:20 GMT, Stefan Johansson wrote: >> Please review this change that implements concurrent uncommit for G1. >> >> **Summary** >> G1 currently checks if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. 
The new task will uncommit memory in chunks of regions to avoid starving out other tasks. >> >> The calculations of how much to shrink the heap and when are not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps is the set of regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). >> >> Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this, a new lock `Uncommit_lock` is added. >> >> One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. >> >> **Logging** >> To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: >> [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 >> ... >> [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms >> [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms >> ... >> [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms >> >> The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. 
The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. >> >> On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: >> [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) >> [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) >> [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) >> [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) >> [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) >> [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) >> [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) >> >> **Testing** >> Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. >> >> I've run multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are not significant changes. > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > Thomas review 2 Lgtm. Ship it. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1141 From coleenp at openjdk.java.net Thu Nov 19 14:33:16 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 14:33:16 GMT Subject: Integrated: 8212879: Make JVMTI TagMap table concurrent In-Reply-To: References: Message-ID: On Fri, 30 Oct 2020 20:23:04 GMT, Coleen Phillimore wrote: > This change turns the HashTable that JVMTI uses for object tagging into a regular Hotspot hashtable - the one in hashtable.hpp with resizing and rehashing. 
Instead of pointing directly to oops so that GC has to walk the table to follow oops and then to rehash the table, this table points to WeakHandle. GC walks the backing OopStorages concurrently. > > The hash function for the table is a hash of the lower 32 bits of the address. A flag is set during GC (gc_notification if in a safepoint, and through a call to JvmtiTagMap::needs_processing()) so that the table is rehashed at the next use. > > The gc_notification mechanism of weak oop processing is used to notify Jvmti to post ObjectFree events. In concurrent GCs there can be a window of time between weak oop marking where the oop is unmarked, so dead (the phantom load in peek returns NULL) but the gc_notification hasn't been done yet. In this window, a heap walk or GetObjectsWithTags call would not find an object before the ObjectFree event is posted. This is dealt with in two ways: > > 1. In the Heap walk, there's an unconditional table walk to post events if events need to be posted. > 2. For GetObjectsWithTags, if a dead oop is found in the table and posting is required, we use the VM thread to post the event. > > Event posting cannot be done in a JavaThread because the posting needs to be done while holding the table lock, so that the JvmtiEnv state doesn't change before posting is done. ObjectFree callbacks are limited in what they can do as per the JVMTI Specification. The allowed callbacks to the VM already have code to allow NonJava threads. > > To avoid rehashing, I also tried to use object->identity_hash() but this breaks because entries can be added to the table during heapwalk, where the objects use marking. The starting markWord is saved and restored. Adding a hashcode during this operation makes restoring the former markWord (locked, inflated, etc.) too complicated. Plus we don't want all these objects to have hashcodes because locking operations after tagging would have to always use inflated locks. 
> > Much of this change is to remove serial weak oop processing for the weakProcessor, ZGC and Shenandoah. The GCs have been stress tested with jvmti code. > > It has also been tested with tier1-6. > > Thank you to Stefan, Erik and Kim for their help with this change. This pull request has now been integrated. Changeset: ba721f5f Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/ba721f5f Stats: 1891 lines in 49 files changed: 769 ins; 992 del; 130 mod 8212879: Make JVMTI TagMap table concurrent Co-authored-by: Kim Barrett Co-authored-by: Coleen Phillimore Reviewed-by: stefank, ihse, zgu, eosterlund, sspitsyn, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From coleenp at openjdk.java.net Thu Nov 19 14:33:12 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 14:33:12 GMT Subject: RFR: 8212879: Make JVMTI TagMap table concurrent [v11] In-Reply-To: References: <8m4pLTYDq8LLEZ3MRVfnZsKducSHJLX8WK-FxGaqgQw=.f426b0f8-0a50-4383-b037-24a925d9cf7e@github.com> <6Rg4SwfEMn0metyXsBl4pGfdP5zfspPuBLjFP82bGic=.f3ba30ad-ea01-4484-ae5f-1b6e3ce5b12a@github.com> Message-ID: On Thu, 19 Nov 2020 12:54:23 GMT, Coleen Phillimore wrote: >> /* Flush any pending ObjectFree events, which will set global success variable to 1 >> for any pending ObjectFree events. */ >> How about this? The word 'global' helps me. > > With remerging into shenandoah, all the jdi tests pass with shenandoah also. Thank you to all the reviewers. ------------- PR: https://git.openjdk.java.net/jdk/pull/967 From kbarrett at openjdk.java.net Thu Nov 19 15:39:22 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 19 Nov 2020 15:39:22 GMT Subject: RFR: 8256516: Simplify clearing References [v2] In-Reply-To: References: Message-ID: > Please review this simplification of jlr.Reference clearing by VM code. 
> > The function java_lang_ref_Reference::set_referent_raw was being used to > clear the referent of Reference objects, and only for that purpose. This > change replaces that function with java_lang_ref_Reference::clear_referent, > which is much more obvious in intent. That change is then percolated up > through callers in the obvious way. > > Testing: > mach5 tier1 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into simplify - replace set_referent with clear_referent ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1286/files - new: https://git.openjdk.java.net/jdk/pull/1286/files/84dc63e4..74a67a1a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1286&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1286&range=00-01 Stats: 8917 lines in 316 files changed: 5071 ins; 2346 del; 1500 mod Patch: https://git.openjdk.java.net/jdk/pull/1286.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1286/head:pull/1286 PR: https://git.openjdk.java.net/jdk/pull/1286 From kbarrett at openjdk.java.net Thu Nov 19 15:46:06 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 19 Nov 2020 15:46:06 GMT Subject: RFR: 8256516: Simplify clearing References [v2] In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 10:08:30 GMT, Roman Kennke wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into simplify >> - replace set_referent with clear_referent > > Looks good to me! Thanks! 
Thanks for reviews @rkennke, @shipilev, @pliden, @mlchung ------------- PR: https://git.openjdk.java.net/jdk/pull/1286 From kbarrett at openjdk.java.net Thu Nov 19 15:46:08 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 19 Nov 2020 15:46:08 GMT Subject: Integrated: 8256516: Simplify clearing References In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 09:53:11 GMT, Kim Barrett wrote: > Please review this simplification of jlr.Reference clearing by VM code. > > The function java_lang_ref_Reference::set_referent_raw was being used to > clear the referent of Reference objects, and only for that purpose. This > change replaces that function with java_lang_ref_Reference::clear_referent, > which is much more obvious in intent. That change is then percolated up > through callers in the obvious way. > > Testing: > mach5 tier1 This pull request has now been integrated. Changeset: 675d1d56 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/675d1d56 Stats: 32 lines in 7 files changed: 9 ins; 10 del; 13 mod 8256516: Simplify clearing References Provide and use explicit referent clearing instead of set to null. Reviewed-by: rkennke, shade, pliden, mchung ------------- PR: https://git.openjdk.java.net/jdk/pull/1286 From vlivanov at openjdk.java.net Thu Nov 19 16:12:11 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 19 Nov 2020 16:12:11 GMT Subject: RFR: 8256581: Refactor vector conversion tests In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 01:07:12 GMT, Paul Sandoz wrote: > Refactor the vector conversions tests to improve performance and reduce explicit test methods (using data providers). > +463, -37,019 Impressive improvement, Paul! :-) ------------- Marked as reviewed by vlivanov (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1302 From coleenp at openjdk.java.net Thu Nov 19 16:15:11 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 16:15:11 GMT Subject: RFR: 8246378: [Windows] assert on MethodHandle logging code Message-ID: The MethodHandles verbose logging code tries to walk Windows stacks without using the StackWalk API and fails miserably. It was the only consumer of os::current_frame() and get_sender_for_C_frame(). For Windows, the hs_err stack walking was fixed to use the StackWalk API and NMT was changed to use RtlCaptureStackBackTrace. This change removes the stub generated for getting the fp (which isn't valid anyway on Windows). I also removed it from windows_aarch64 since it wasn't actually generated. Tested with tier1-3 for windows-x64. Thanks, Coleen ------------- Commit messages: - 8246378: [Windows] assert on MethodHandle logging code Changes: https://git.openjdk.java.net/jdk/pull/1321/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1321&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8246378 Stats: 136 lines in 11 files changed: 23 ins; 92 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/1321.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1321/head:pull/1321 PR: https://git.openjdk.java.net/jdk/pull/1321 From shade at openjdk.java.net Thu Nov 19 16:31:04 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 19 Nov 2020 16:31:04 GMT Subject: RFR: 8142984: Zero: fast accessors should handle both getters and setters [v2] In-Reply-To: References: <0skRfs7hB88JHFy53lVD0Fvt-JlF2HWGTC05qMgHidA=.346c67ce-af20-41ba-b4fa-a24e8ca6c0e2@github.com> Message-ID: <3A0JRinKBX1t4Y_H3fl7q4GWft8H7CsIreQZvt3MMZc=.4590daf5-5d73-4283-92d8-c1a3fe6ebbad@github.com> On Mon, 9 Nov 2020 09:19:59 GMT, Aleksey Shipilev wrote: >> Looks good to me. > > @coleenp or other runtime folks might want to take a look as well? Anyone else? 
:) ------------- PR: https://git.openjdk.java.net/jdk/pull/728 From coleenp at openjdk.java.net Thu Nov 19 16:38:11 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 16:38:11 GMT Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable Message-ID: I added an assert with https://bugs.openjdk.java.net/browse/JDK-8256365 to catch adding redefined methods into the vtable where they don't belong (except while redefining that class). The assert found that CDS was restoring classes with methods in the default_methods array which can be from a redefined class. So these methods need to be adjusted before recreating the vtable. Also fixed is a potential bug where a Method in a methodHandle in the default_methods array can be redefined in the safepoint caused by loader constraint checking, then added to the vtable. The window for this bug is very small so I couldn't write a test for it. This change was in my v1 patch for JDK-8256365, so reintroduced here. The test java/lang/instrument/IsModifiableClassAgent.java now passes with this patch. Rerunning tier1-6 tests in progress. Built with minimal VM to verify #if INCLUDE_JVMTIs were in the right places. 
------------- Commit messages: - 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable Changes: https://git.openjdk.java.net/jdk/pull/1323/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1323&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256640 Stats: 24 lines in 2 files changed: 19 ins; 1 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1323.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1323/head:pull/1323 PR: https://git.openjdk.java.net/jdk/pull/1323 From coleenp at openjdk.java.net Thu Nov 19 16:46:05 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 16:46:05 GMT Subject: RFR: 8142984: Zero: fast accessors should handle both getters and setters [v2] In-Reply-To: References: <0skRfs7hB88JHFy53lVD0Fvt-JlF2HWGTC05qMgHidA=.346c67ce-af20-41ba-b4fa-a24e8ca6c0e2@github.com> Message-ID: <09kZ5xJaFtQ8iq0W6Gzfby0MiBIVbx6sQ2kInhpsiXo=.87cd635c-2d2b-4489-b32a-957ed403764e@github.com> On Wed, 28 Oct 2020 09:56:31 GMT, Aleksey Shipilev wrote: >> It started as removing the TODO item in `abstractInterpreter.cpp`. Zero is the only implementation that treats `accessor` to mean `getter`, which makes the awkward choice in the entry selection. After going back and forth (including trying to remove the fast accessor methods altogether in [JDK-8255066](https://bugs.openjdk.java.net/browse/JDK-8255066)), I settled on implementing the fast Zero `setter`-s too, plus renaming and whipping the existing `getter` code in shape. The end result seems to be more straight-forward than it was before. >> >> On the plus side, it improves `make bootcycle-images` in release mode from ~47m40s to ~46m50s, because we are saving time doing the `normal_entry` for setters. >> >> The "normal", non-Zero template interpreter is not affected, because it does not have any specializations for `accessor`, `getter` or `setter`, and instead just doing the normal entry. 
>> >> Testing: >> - [x] Linux x86_64 {fastdebug, release} Zero `make bootcycle-images` >> - [x] Linux aarch64 {fastdebug, release} Zero `make bootcycle-images` >> - [x] Linux x86_64 Zero release jcstress >> - [x] Linux aarch64 Zero release jcstress > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8142984: Zero: fast accessors should handle both getters and setters Looks good to me. For Zero, I suppose these methods improve performance by a lot. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/728 From psandoz at openjdk.java.net Thu Nov 19 17:01:04 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 19 Nov 2020 17:01:04 GMT Subject: Integrated: 8256581: Refactor vector conversion tests In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 01:07:12 GMT, Paul Sandoz wrote: > Refactor the vector conversions tests to improve performance and reduce explicit test methods (using data providers). This pull request has now been integrated. Changeset: 580f22cc Author: Paul Sandoz URL: https://git.openjdk.java.net/jdk/commit/580f22cc Stats: 37231 lines in 6 files changed: 212 ins; 36768 del; 251 mod 8256581: Refactor vector conversion tests Reviewed-by: vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/1302 From sjohanss at openjdk.java.net Thu Nov 19 17:09:22 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 19 Nov 2020 17:09:22 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v10] In-Reply-To: References: Message-ID: > Please review this change that implements concurrent uncommit for G1. > > **Summary** > G1 currently check if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. 
The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. > > The calculations of how much to shrink the heap and when are not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps is the set of regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). > > Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this, a new lock `Uncommit_lock` is added. > > One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. > > **Logging** > To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: > [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 > ... > [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms > [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms > ... > [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms > > The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. 
As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. > > On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: > [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) > [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) > [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) > [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) > [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) > [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) > [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) > > **Testing** > Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. > > I've done multiple runs of mach5 testing tier 1-5 as well as local testing. I've also done a performance run and as expected there are no significant changes. 
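[Editorial illustration, not part of the original message.] The two-bitmap bookkeeping described in the summary quoted above could be modelled roughly as follows. This is only a sketch under stated assumptions: `RegionCommitModel` and all of its member names are hypothetical and do not come from the HotSpot sources, the model is single-threaded (the real change guards the inactive map with `Uncommit_lock`), and "uncommit" here just clears a bit instead of calling into the OS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the active/inactive region bitmaps; NOT HotSpot code.
class RegionCommitModel {
  std::vector<bool> _active;    // committed and available for use
  std::vector<bool> _inactive;  // committed, but ready to be uncommitted

public:
  explicit RegionCommitModel(std::size_t max_regions)
    : _active(max_regions, false), _inactive(max_regions, false) {}

  bool is_active(std::size_t i) const { return _active[i]; }
  bool is_inactive(std::size_t i) const { return _inactive[i]; }

  // A region is committed if it is in the union of the two bitmaps.
  bool is_committed(std::size_t i) const { return _active[i] || _inactive[i]; }

  // Heap expansion: re-activating an inactive region is the cheap path.
  // Returns true only if a fresh OS commit would have been needed.
  bool activate(std::size_t i) {
    assert(!_active[i]);
    bool needed_os_commit = !_inactive[i];
    _inactive[i] = false;
    _active[i] = true;
    return needed_os_commit;
  }

  // In the pause: only quick preparation, the region is marked for uncommit.
  void deactivate(std::size_t i) {
    assert(_active[i]);
    _active[i] = false;
    _inactive[i] = true;
  }

  // Concurrent task: uncommit at most `limit` inactive regions per
  // invocation, so other work is not starved. Returns the number handled.
  std::size_t uncommit_chunk(std::size_t limit) {
    std::size_t done = 0;
    for (std::size_t i = 0; i < _inactive.size() && done < limit; ++i) {
      if (_inactive[i]) {
        _inactive[i] = false;  // stand-in for handing memory back to the OS
        ++done;
      }
    }
    return done;
  }
};
```

In this model a region that is deactivated and then activated again before `uncommit_chunk` reaches it never leaves the committed set, which matches the log example where fewer regions are uncommitted than were made ready.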
Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Fix missing include ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1141/files - new: https://git.openjdk.java.net/jdk/pull/1141/files/553f99a1..ee031dc5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1141&range=08-09 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1141.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1141/head:pull/1141 PR: https://git.openjdk.java.net/jdk/pull/1141 From psandoz at openjdk.java.net Thu Nov 19 17:21:05 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 19 Nov 2020 17:21:05 GMT Subject: RFR: 8256585: Remove in-place conversion vector operators from Vector API In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 03:26:20 GMT, Sandhya Viswanathan wrote: > Remove partially implemented in-place conversion vector operators from Vector API: > ofNarrowing, ofWidening, INPLACE_XXX The documentation `Vector.convert` and `Vector.convertShape` needs to be updated to remove specification of in-place conversions, as does the class documentation on `Vector`. A search for the term `in-place` should find the relevant locations. ------------- Changes requested by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1305 From shade at openjdk.java.net Thu Nov 19 17:40:03 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 19 Nov 2020 17:40:03 GMT Subject: RFR: 8142984: Zero: fast accessors should handle both getters and setters [v2] In-Reply-To: References: <0skRfs7hB88JHFy53lVD0Fvt-JlF2HWGTC05qMgHidA=.346c67ce-af20-41ba-b4fa-a24e8ca6c0e2@github.com> Message-ID: On Mon, 2 Nov 2020 21:17:01 GMT, Andrew John Hughes wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains one commit: >> >> 8142984: Zero: fast accessors should handle both getters and setters > > Looks good to me. Thanks @gnu-andrew and @coleenp. Yes, these improve Zero performance quite significantly, along with cleaning up some TODOs in shared code. Seems like win-win. ------------- PR: https://git.openjdk.java.net/jdk/pull/728 From shade at openjdk.java.net Thu Nov 19 17:40:04 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 19 Nov 2020 17:40:04 GMT Subject: Integrated: 8142984: Zero: fast accessors should handle both getters and setters In-Reply-To: <0skRfs7hB88JHFy53lVD0Fvt-JlF2HWGTC05qMgHidA=.346c67ce-af20-41ba-b4fa-a24e8ca6c0e2@github.com> References: <0skRfs7hB88JHFy53lVD0Fvt-JlF2HWGTC05qMgHidA=.346c67ce-af20-41ba-b4fa-a24e8ca6c0e2@github.com> Message-ID: On Mon, 19 Oct 2020 08:57:04 GMT, Aleksey Shipilev wrote: > It started as removing the TODO item in `abstractInterpreter.cpp`. Zero is the only implementation that treats `accessor` to mean `getter`, which makes the awkward choice in the entry selection. After going back and forth (including trying to remove the fast accessor methods altogether in [JDK-8255066](https://bugs.openjdk.java.net/browse/JDK-8255066)), I settled on implementing the fast Zero `setter`-s too, plus renaming and whipping the existing `getter` code in shape. The end result seems to be more straight-forward than it was before. > > On the plus side, it improves `make bootcycle-images` in release mode from ~47m40s to ~46m50s, because we are saving time doing the `normal_entry` for setters. > > The "normal", non-Zero template interpreter is not affected, because it does not have any specializations for `accessor`, `getter` or `setter`, and instead just doing the normal entry. 
> > Testing: > - [x] Linux x86_64 {fastdebug, release} Zero `make bootcycle-images` > - [x] Linux aarch64 {fastdebug, release} Zero `make bootcycle-images` > - [x] Linux x86_64 Zero release jcstress > - [x] Linux aarch64 Zero release jcstress This pull request has now been integrated. Changeset: defdd12e Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/defdd12e Stats: 202 lines in 7 files changed: 97 ins; 38 del; 67 mod 8142984: Zero: fast accessors should handle both getters and setters Reviewed-by: andrew, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/728 From sjohanss at openjdk.java.net Thu Nov 19 17:59:12 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 19 Nov 2020 17:59:12 GMT Subject: RFR: 8236926: Concurrently uncommit memory in G1 [v7] In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 20:01:19 GMT, Albert Mingkun Yang wrote: >> Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Merge branch 'master' into 8236926-ccu >> - Zoom feedback >> - Albert review 2 >> - Albert review >> - Merge branch 'master' into 8236926-ccu >> - Lock for small mapper and use BitMap parallel operations. >> - Self review >> - Simplified task >> - Improved logging >> - Test improvement >> - ... and 5 more: https://git.openjdk.java.net/jdk/compare/3675653c...c354b1d8 > > Thank you for the revision. Thanks for the reviews @albertnetymk and @tschatzl! 
------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From sjohanss at openjdk.java.net Thu Nov 19 17:59:13 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 19 Nov 2020 17:59:13 GMT Subject: Integrated: 8236926: Concurrently uncommit memory in G1 In-Reply-To: References: Message-ID: <92_IE4xxslQIwRi7YHh1enH7ebTJ9DFUMHcECyTYlXo=.3d5bf08c-3d28-426c-beb9-0829504c6b79@github.com> On Tue, 10 Nov 2020 10:43:20 GMT, Stefan Johansson wrote: > Please review this change that implements concurrent uncommit for G1. > > **Summary** > G1 currently checks if the heap can be shrunk at the end of the Remark pause and at the end of a Full GC. The uncommit work (handing back the memory to the OS) is quite expensive and this change moves it out of the pause. The actual uncommitting is now handled by the G1 service thread and the new task `G1 Uncommit Region Task`. The new task will uncommit memory in chunks of regions to avoid starving out other tasks. > > The calculations of how much to shrink the heap and when are not changed, but during the pause only quick preparation work is done. Splitting the uncommit work into two parts comes with some additional meta-data cost. Previously we had a single bitmap to mark if a region was committed or not, now we need two bitmaps. One bitmap to keep track of the regions available for use (active) and one bitmap for the regions ready to be uncommitted (inactive). The union of those two bitmaps is the set of regions currently committed. When expanding the heap we prefer to re-activate regions from the inactive bitmap if there are any, instead of committing new regions, since this is cheaper (avoiding calls to the OS). > > Splitting the work also comes with some additional synchronization. Both the uncommit task and a mutator thread doing a humongous allocation might want to alter the inactive map at the same time. To prevent this, a new lock `Uncommit_lock` is added.
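The two-bitmap bookkeeping described above can be sketched roughly as follows. This is only an illustrative model under stated assumptions, not the G1 implementation: the class `RegionStateSketch` and all of its method names are hypothetical, a plain `std::mutex` stands in for `Uncommit_lock`, and "committing" or "uncommitting" a region only flips a bit instead of calling the OS.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical, simplified model of the active/inactive bitmaps; not HotSpot code.
class RegionStateSketch {
 public:
  explicit RegionStateSketch(std::size_t max_regions)
      : active_(max_regions, false), inactive_(max_regions, false) {}

  // Expand by up to `count` regions. Inactive regions are reactivated first
  // because they are still committed; only the remainder is freshly committed.
  // Returns the number of regions that needed a fresh commit.
  std::size_t expand(std::size_t count) {
    std::lock_guard<std::mutex> guard(lock_);
    for (std::size_t i = 0; i < active_.size() && count > 0; ++i) {
      if (inactive_[i]) {
        inactive_[i] = false;
        active_[i] = true;
        --count;
      }
    }
    std::size_t fresh = 0;
    for (std::size_t i = 0; i < active_.size() && count > 0; ++i) {
      if (!active_[i] && !inactive_[i]) {
        active_[i] = true;  // stands in for committing memory via the OS
        ++fresh;
        --count;
      }
    }
    return fresh;
  }

  // Quick in-pause preparation: move up to `count` regions from the top of
  // the heap from active to inactive. Returns the number of regions moved.
  std::size_t deactivate(std::size_t count) {
    std::lock_guard<std::mutex> guard(lock_);
    std::size_t moved = 0;
    for (std::size_t i = active_.size(); i > 0 && moved < count; --i) {
      if (active_[i - 1]) {
        active_[i - 1] = false;
        inactive_[i - 1] = true;
        ++moved;
      }
    }
    return moved;
  }

  // Concurrent step: uncommit at most `chunk` inactive regions per invocation
  // so that other work is not starved. Returns the number uncommitted.
  std::size_t uncommit_chunk(std::size_t chunk) {
    std::lock_guard<std::mutex> guard(lock_);
    std::size_t done = 0;
    for (std::size_t i = 0; i < inactive_.size() && done < chunk; ++i) {
      if (inactive_[i]) {
        assert(!active_[i] && "a region is never active and inactive at once");
        inactive_[i] = false;  // stands in for handing memory back to the OS
        ++done;
      }
    }
    return done;
  }

  std::size_t active_count() const {
    std::lock_guard<std::mutex> guard(lock_);
    return count_bits(active_);
  }

  std::size_t inactive_count() const {
    std::lock_guard<std::mutex> guard(lock_);
    return count_bits(inactive_);
  }

  // Committed regions are the union of the two (disjoint) bitmaps.
  std::size_t committed() const {
    std::lock_guard<std::mutex> guard(lock_);
    return count_bits(active_) + count_bits(inactive_);
  }

 private:
  static std::size_t count_bits(const std::vector<bool>& bits) {
    std::size_t n = 0;
    for (bool b : bits) n += b ? 1 : 0;
    return n;
  }

  mutable std::mutex lock_;     // plays the role of Uncommit_lock in this sketch
  std::vector<bool> active_;    // regions available for use
  std::vector<bool> inactive_;  // committed regions waiting to be uncommitted
};
```

In this model, a shrink followed by a partial uncommit and then an expansion reactivates the remaining inactive regions before committing anything new, which mirrors the preference stated in the summary. The single lock around every operation is a simplification chosen for the sketch.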
> > One thing to note is that there is still one case left where we do the uncommit directly and this is during CDS initialization. > > **Logging** > To track the concurrent uncommit in logs a few additional messages have been added. There are no new `info` messages, but for `gc+heap` there are two new `debug` messages and one `trace`: > [7,468s][debug][gc,heap ] GC(32) Regions ready for uncommit: 1873 > ... > [7,509s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 11,173ms > [7,522s][trace][gc,heap ] Concurrent Uncommit: 256M, 32 regions, 12,599ms > ... > [9,691s][debug][gc,heap ] Concurrent Uncommit Summary: 4864M, 608 regions, 405,827ms > > The `trace` is printed for each invocation of the task while the `debug` message is only printed when there is no more uncommit work available. As you can see in the above example, it's not certain that all regions made ready for uncommit are actually uncommitted. The reason for this is that the heap had to grow again during the concurrent uncommit, and regions were re-activated. > > On `gc+heap+region` there are new logs to see how ranges of regions transition between different states: > [6,337s][debug][gc,heap,region ] Uncommit regions [12768, 13024) > [6,424s][debug][gc,heap,region ] Uncommit regions [13024, 13280) > [6,438s][debug][gc,heap,region ] Uncommit regions [13280, 13536) > [6,510s][debug][gc,heap,region ] Uncommit regions [13536, 13792) > [6,573s][debug][gc,heap,region ] GC(79) Reactivate regions [13792, 15651) > [6,574s][debug][gc,heap,region ] GC(79) Activate regions [76, 96) > [6,579s][debug][gc,heap,region ] GC(79) Activate regions [97, 1099) > > **Testing** > Two new tests have been added, one gtest and one jtreg test. These are intended to test the basic functionality, but most testing is gained by just running applications that resize the heap. This is quite common in our testing, so the code will be exercised a lot. > > I've run multiple runs of mach5 testing tier 1-5 as well as local testing. 
I've also done a performance run and as expected there are not significant changes. This pull request has now been integrated. Changeset: b8244b60 Author: Stefan Johansson URL: https://git.openjdk.java.net/jdk/commit/b8244b60 Stats: 1515 lines in 25 files changed: 1338 ins; 106 del; 71 mod 8236926: Concurrently uncommit memory in G1 Reviewed-by: ayang, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/1141 From kvn at openjdk.java.net Thu Nov 19 18:07:08 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 19 Nov 2020 18:07:08 GMT Subject: RFR: 8256536: Newer AMD 19h (EPYC) Processor family defaults In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 10:36:03 GMT, Rohit Arul Raj wrote: > This patch sets the following flags as defaults for newer AMD 19h (EPYC) processors. > bool UseFPUForSpilling = true > bool UseUnalignedLoadStores = true > bool UseXMMForArrayCopy = true > bool UseXMMForObjInit = true > bool UseFastStosb = false > bool AlignVector = false > > Additional testing: make run-test TEST="tier1 tier2" JTREG="JOBS=128" CONF=linux-x86_64-server-fastdebug > > Please review this change. > > Thanks, > Rohit Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1288 From coleen.phillimore at oracle.com Thu Nov 19 18:24:15 2020 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Thu, 19 Nov 2020 13:24:15 -0500 Subject: RFR: 8142984: Zero: fast accessors should handle both getters and setters [v2] In-Reply-To: References: <0skRfs7hB88JHFy53lVD0Fvt-JlF2HWGTC05qMgHidA=.346c67ce-af20-41ba-b4fa-a24e8ca6c0e2@github.com> Message-ID: <50353d76-ae53-7216-1993-311f6c288955@oracle.com> On 11/19/20 12:40 PM, Aleksey Shipilev wrote: > On Mon, 2 Nov 2020 21:17:01 GMT, Andrew John Hughes wrote: > >>> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains one commit: >>> >>> 8142984: Zero: fast accessors should handle both getters and setters >> Looks good to me. > Thanks @gnu-andrew and @coleenp. Yes, these improve Zero performance quite significantly, along with cleaning up some TODOs in shared code. Seems like win-win. > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/728 There's an #if 0 in zeroInterpreter_zero.cpp that you can trivially remove also. Coleen From shade at openjdk.java.net Thu Nov 19 18:48:06 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 19 Nov 2020 18:48:06 GMT Subject: RFR: 8142984: Zero: fast accessors should handle both getters and setters [v2] In-Reply-To: References: <0skRfs7hB88JHFy53lVD0Fvt-JlF2HWGTC05qMgHidA=.346c67ce-af20-41ba-b4fa-a24e8ca6c0e2@github.com> Message-ID: On Thu, 19 Nov 2020 17:36:16 GMT, Aleksey Shipilev wrote: >> Looks good to me. > > Thanks @gnu-andrew and @coleenp. Yes, these improve Zero performance quite significantly, along with cleaning up some TODOs in shared code. Seems like win-win. > _Mailing list message from [Coleen Phillimore](mailto:coleen.phillimore at oracle.com) on [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_ > > On 11/19/20 12:40 PM, Aleksey Shipilev wrote: > > There's an #if 0 in zeroInterpreter_zero.cpp that you can trivially > remove also. D'oh. It does not seem related, thankfully. I'll file another issue to remove that block. ------------- PR: https://git.openjdk.java.net/jdk/pull/728 From shade at openjdk.java.net Thu Nov 19 19:05:12 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 19 Nov 2020 19:05:12 GMT Subject: RFR: 8256692: Zero: remove obsolete block from ZeroInterpreter::native_entry Message-ID: There is a block that is protected by #if 0. It seems to be protected by it after [JDK-8239782](https://bugs.openjdk.java.net/browse/JDK-8239782), and it is completely unnecessary after [JDK-8255617](https://bugs.openjdk.java.net/browse/JDK-8255617).
------------- Commit messages: - 8256692: Zero: remove obsolete block from ZeroInterpreter::native_entry Changes: https://git.openjdk.java.net/jdk/pull/1328/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1328&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256692 Stats: 20 lines in 1 file changed: 0 ins; 20 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1328.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1328/head:pull/1328 PR: https://git.openjdk.java.net/jdk/pull/1328 From lfoltan at openjdk.java.net Thu Nov 19 19:34:03 2020 From: lfoltan at openjdk.java.net (Lois Foltan) Date: Thu, 19 Nov 2020 19:34:03 GMT Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable In-Reply-To: References: Message-ID: <-ANOQiniVwL9O_YTP4G5v0uaY9vgV3bPRelq8Vb8tss=.7e15eee4-a08e-4f78-9783-df4bcc07cd46@github.com> On Thu, 19 Nov 2020 16:31:44 GMT, Coleen Phillimore wrote: > I added an assert with https://bugs.openjdk.java.net/browse/JDK-8256365 to catch adding redefined methods into the vtable where they don't belong (except while redefining that class). The assert found that CDS was restoring classes with methods in the default_methods array which can be from a redefined class. So these methods need to be adjusted before recreating the vtable. > Also fixed is a potential bug where a Method in a methodHandle in the default_methods array can be redefined in the safepoint caused by loader constraint checking, then added to the vtable. The window for this bug is very small so I couldn't write a test for it. This change was in my v1 patch for JDK-8256365, so reintroduced here. > > The test java/lang/instrument/IsModifiableClassAgent.java now passes with this patch. Rerunning tier1-6 tests in progress. Built with minimal VM to verify #if INCLUDE_JVMTIs were in the right places. Marked as reviewed by lfoltan (Reviewer). 
src/hotspot/share/oops/klassVtable.cpp line 507: > 505: // Set the vtable index before the constraint check safepoint potentially > 506: // redefines this method, which is possible if it is a default method belonging > 507: // to a super class or interface. Minor nit, wording is awkward. Maybe: Set the vtable index before the constraint check safepoint, which potentially redefines this method if this method is a default method belonging to a super class or interface. ------------- PR: https://git.openjdk.java.net/jdk/pull/1323 From coleenp at openjdk.java.net Thu Nov 19 19:41:01 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 19:41:01 GMT Subject: RFR: 8256692: Zero: remove obsolete block from ZeroInterpreter::native_entry In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 18:59:30 GMT, Aleksey Shipilev wrote: > There is a block that is protected by #if 0. It seems to be protected by it after [JDK-8239782](https://bugs.openjdk.java.net/browse/JDK-8239782), and it is completely unnecessary after [JDK-8255617](https://bugs.openjdk.java.net/browse/JDK-8255617). This is trivially good. ------------- Marked as reviewed by coleenp (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1328 From sviswanathan at openjdk.java.net Thu Nov 19 19:47:15 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 19 Nov 2020 19:47:15 GMT Subject: RFR: 8256585: Remove in-place conversion vector operators from Vector API [v2] In-Reply-To: References: Message-ID: > Remove partially implemented in-place conversion vector operators from Vector API: > ofNarrowing, ofWidening, INPLACE_XXX Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Update documentation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1305/files - new: https://git.openjdk.java.net/jdk/pull/1305/files/73122deb..db9d37de Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1305&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1305&range=00-01 Stats: 12 lines in 1 file changed: 0 ins; 12 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1305.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1305/head:pull/1305 PR: https://git.openjdk.java.net/jdk/pull/1305 From sviswanathan at openjdk.java.net Thu Nov 19 19:50:03 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 19 Nov 2020 19:50:03 GMT Subject: RFR: 8256585: Remove in-place conversion vector operators from Vector API [v2] In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 17:18:12 GMT, Paul Sandoz wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> Update documentation > > The documentation `Vector.convert` and `Vector.convertShape` needs to be updated to remove specification of in-place conversions, as does the class documentation on `Vector`. A search for the term `in-place` should find the relevant locations. Updated documentation to reflect removal of widening and contracting in-place conversions. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1305 From dcubed at openjdk.java.net Thu Nov 19 20:53:03 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 19 Nov 2020 20:53:03 GMT Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable In-Reply-To: References: Message-ID: <_AzDnG6UiwgLwKXGKoJVJC2bWPRzqEMO9QIUsY73qxk=.a706ee96-1fea-480c-b972-da897b41505f@github.com> On Thu, 19 Nov 2020 20:48:53 GMT, Daniel D. Daugherty wrote: >> I added an assert with https://bugs.openjdk.java.net/browse/JDK-8256365 to catch adding redefined methods into the vtable where they don't belong (except while redefining that class). The assert found that CDS was restoring classes with methods in the default_methods array which can be from a redefined class. So these methods need to be adjusted before recreating the vtable. >> Also fixed is a potential bug where a Method in a methodHandle in the default_methods array can be redefined in the safepoint caused by loader constraint checking, then added to the vtable. The window for this bug is very small so I couldn't write a test for it. This change was in my v1 patch for JDK-8256365, so reintroduced here. >> >> The test java/lang/instrument/IsModifiableClassAgent.java now passes with this patch. Rerunning tier1-6 tests in progress. Built with minimal VM to verify #if INCLUDE_JVMTIs were in the right places. > > Marked as reviewed by dcubed (Reviewer). Normally when fixing a regression, I would want one of the original reviewers to chime in on the thread, but it's the wrong time of day for @dholmes-ora or @fisk. I'm good with this fix. Your call on whether to address my one nit. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1323 From dcubed at openjdk.java.net Thu Nov 19 20:53:05 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 19 Nov 2020 20:53:05 GMT Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable In-Reply-To: <-ANOQiniVwL9O_YTP4G5v0uaY9vgV3bPRelq8Vb8tss=.7e15eee4-a08e-4f78-9783-df4bcc07cd46@github.com> References: <-ANOQiniVwL9O_YTP4G5v0uaY9vgV3bPRelq8Vb8tss=.7e15eee4-a08e-4f78-9783-df4bcc07cd46@github.com> Message-ID: On Thu, 19 Nov 2020 19:30:58 GMT, Lois Foltan wrote: >> I added an assert with https://bugs.openjdk.java.net/browse/JDK-8256365 to catch adding redefined methods into the vtable where they don't belong (except while redefining that class). The assert found that CDS was restoring classes with methods in the default_methods array which can be from a redefined class. So these methods need to be adjusted before recreating the vtable. >> Also fixed is a potential bug where a Method in a methodHandle in the default_methods array can be redefined in the safepoint caused by loader constraint checking, then added to the vtable. The window for this bug is very small so I couldn't write a test for it. This change was in my v1 patch for JDK-8256365, so reintroduced here. >> >> The test java/lang/instrument/IsModifiableClassAgent.java now passes with this patch. Rerunning tier1-6 tests in progress. Built with minimal VM to verify #if INCLUDE_JVMTIs were in the right places. > > src/hotspot/share/oops/klassVtable.cpp line 507: > >> 505: // Set the vtable index before the constraint check safepoint potentially >> 506: // redefines this method, which is possible if it is a default method belonging >> 507: // to a super class or interface. > > Minor nit, wording is awkward. 
Maybe: > Set the vtable index before the constraint check safepoint, which potentially redefines this method if this method is a default method belonging to a super class or interface. The suggested rewrite reads better. ------------- PR: https://git.openjdk.java.net/jdk/pull/1323 From dcubed at openjdk.java.net Thu Nov 19 20:53:02 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 19 Nov 2020 20:53:02 GMT Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 16:31:44 GMT, Coleen Phillimore wrote: > I added an assert with https://bugs.openjdk.java.net/browse/JDK-8256365 to catch adding redefined methods into the vtable where they don't belong (except while redefining that class). The assert found that CDS was restoring classes with methods in the default_methods array which can be from a redefined class. So these methods need to be adjusted before recreating the vtable. > Also fixed is a potential bug where a Method in a methodHandle in the default_methods array can be redefined in the safepoint caused by loader constraint checking, then added to the vtable. The window for this bug is very small so I couldn't write a test for it. This change was in my v1 patch for JDK-8256365, so reintroduced here. > > The test java/lang/instrument/IsModifiableClassAgent.java now passes with this patch. Rerunning tier1-6 tests in progress. Built with minimal VM to verify #if INCLUDE_JVMTIs were in the right places. Marked as reviewed by dcubed (Reviewer). src/hotspot/share/oops/klassVtable.cpp line 237: > 235: // needs new entry > 236: if (needs_new_entry) { > 237: // Refetch this default method in case of redefinition in safepoint above. Which code above can go to a safepoint? So perhaps reword the comment: // Refetch this default method in case of redefinition in a safepoint that // might happen in XXX above. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1323 From coleenp at openjdk.java.net Thu Nov 19 21:20:24 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 21:20:24 GMT Subject: RFR: 8246378: [Windows] assert on MethodHandle logging code [v2] In-Reply-To: References: Message-ID: > The MethodHandles verbose logging code tries to walk windows stacks without using the StackWalk API and fails miserably. It was the only consumer of os::current_frame() and get_sender_for_C_frame(). For Windows, the hs_err stack walking was fixed to use the StackWalk API and NMT was used to use RtlCaptureStackBackTrace. > This change removes the stub generated for getting the fp (which isn't valid anyway on windows). I also removed it from windows_aarch64 since it wasn't actually generated. > > Tested with tier1-3 for windows-x64. > Thanks, > Coleen Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: restore fp and sp printing. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1321/files - new: https://git.openjdk.java.net/jdk/pull/1321/files/0a344e32..6f56d678 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1321&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1321&range=00-01 Stats: 19 lines in 1 file changed: 7 ins; 2 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/1321.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1321/head:pull/1321 PR: https://git.openjdk.java.net/jdk/pull/1321 From coleenp at openjdk.java.net Thu Nov 19 21:25:05 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 21:25:05 GMT Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable In-Reply-To: References: <-ANOQiniVwL9O_YTP4G5v0uaY9vgV3bPRelq8Vb8tss=.7e15eee4-a08e-4f78-9783-df4bcc07cd46@github.com> Message-ID: On Thu, 19 Nov 2020 20:42:29 GMT, Daniel D. 
Daugherty wrote: >> src/hotspot/share/oops/klassVtable.cpp line 507: >> >>> 505: // Set the vtable index before the constraint check safepoint potentially >>> 506: // redefines this method, which is possible if it is a default method belonging >>> 507: // to a super class or interface. >> >> Minor nit, wording is awkward. Maybe: >> Set the vtable index before the constraint check safepoint, which potentially redefines this method if this method is a default method belonging to a super class or interface. > > The suggested rewrite reads better. ok, that wording is more clear. thank you! ------------- PR: https://git.openjdk.java.net/jdk/pull/1323 From psandoz at openjdk.java.net Thu Nov 19 21:25:06 2020 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 19 Nov 2020 21:25:06 GMT Subject: RFR: 8256585: Remove in-place conversion vector operators from Vector API [v2] In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 19:47:15 GMT, Sandhya Viswanathan wrote: >> Remove partially implemented in-place conversion vector operators from Vector API: >> ofNarrowing, ofWidening, INPLACE_XXX > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Update documentation Marked as reviewed by psandoz (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1305 From coleenp at openjdk.java.net Thu Nov 19 21:30:10 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 21:30:10 GMT Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable In-Reply-To: <_AzDnG6UiwgLwKXGKoJVJC2bWPRzqEMO9QIUsY73qxk=.a706ee96-1fea-480c-b972-da897b41505f@github.com> References: <_AzDnG6UiwgLwKXGKoJVJC2bWPRzqEMO9QIUsY73qxk=.a706ee96-1fea-480c-b972-da897b41505f@github.com> Message-ID: On Thu, 19 Nov 2020 20:50:12 GMT, Daniel D. Daugherty wrote: >> Marked as reviewed by dcubed (Reviewer). 
> > Normally when fixing a regression, I would want one of the original > reviewers to chime in on the thread, but it's the wrong time of day > for @dholmes-ora or @fisk. > > I'm good with this fix. Your call on whether to address my one nit. I thought it should get checked in today to stop the regression. Thank you for the reviews and comments Lois and Dan. ------------- PR: https://git.openjdk.java.net/jdk/pull/1323 From coleenp at openjdk.java.net Thu Nov 19 21:30:13 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 19 Nov 2020 21:30:13 GMT Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 20:45:39 GMT, Daniel D. Daugherty wrote: >> I added an assert with https://bugs.openjdk.java.net/browse/JDK-8256365 to catch adding redefined methods into the vtable where they don't belong (except while redefining that class). The assert found that CDS was restoring classes with methods in the default_methods array which can be from a redefined class. So these methods need to be adjusted before recreating the vtable. >> Also fixed is a potential bug where a Method in a methodHandle in the default_methods array can be redefined in the safepoint caused by loader constraint checking, then added to the vtable. The window for this bug is very small so I couldn't write a test for it. This change was in my v1 patch for JDK-8256365, so reintroduced here. >> >> The test java/lang/instrument/IsModifiableClassAgent.java now passes with this patch. Rerunning tier1-6 tests in progress. Built with minimal VM to verify #if INCLUDE_JVMTIs were in the right places. > > src/hotspot/share/oops/klassVtable.cpp line 237: > >> 235: // needs new entry >> 236: if (needs_new_entry) { >> 237: // Refetch this default method in case of redefinition in safepoint above. > > Which code above can go to a safepoint? 
So perhaps reword the comment: > > // Refetch this default method in case of redefinition in a safepoint that > // might happen in XXX above. // Refetch this default method in case of redefinition that might // happen during constraint checking in the update_inherited_vtable call above. How about this? ------------- PR: https://git.openjdk.java.net/jdk/pull/1323 From dcubed at openjdk.java.net Thu Nov 19 21:43:03 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 19 Nov 2020 21:43:03 GMT Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable In-Reply-To: References: <_AzDnG6UiwgLwKXGKoJVJC2bWPRzqEMO9QIUsY73qxk=.a706ee96-1fea-480c-b972-da897b41505f@github.com> Message-ID: On Thu, 19 Nov 2020 21:27:30 GMT, Coleen Phillimore wrote: >> Normally when fixing a regression, I would want one of the original >> reviewers to chime in on the thread, but it's the wrong time of day >> for @dholmes-ora or @fisk. >> >> I'm good with this fix. Your call on whether to address my one nit. > > I thought it should get checked in today to stop the regression. Thank you for the reviews and comments Lois and Dan. I would appreciate as quick an integration as you can manage given the needs to have Tier5 and Tier6 test results. ------------- PR: https://git.openjdk.java.net/jdk/pull/1323 From dcubed at openjdk.java.net Thu Nov 19 21:43:04 2020 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 19 Nov 2020 21:43:04 GMT Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 21:26:55 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/klassVtable.cpp line 237: >> >>> 235: // needs new entry >>> 236: if (needs_new_entry) { >>> 237: // Refetch this default method in case of redefinition in safepoint above. >> >> Which code above can go to a safepoint? 
So perhaps reword the comment: >> >> // Refetch this default method in case of redefinition in a safepoint that >> // might happen in XXX above. > > // Refetch this default method in case of redefinition that might > // happen during constraint checking in the update_inherited_vtable call above. > How about this? Looks good to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/1323 From iklam at openjdk.java.net Thu Nov 19 21:47:08 2020 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 19 Nov 2020 21:47:08 GMT Subject: RFR: 8246378: [Windows] assert on MethodHandle logging code [v2] In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 21:20:24 GMT, Coleen Phillimore wrote: >> The MethodHandles verbose logging code tries to walk windows stacks without using the StackWalk API and fails miserably. It was the only consumer of os::current_frame() and get_sender_for_C_frame(). For Windows, the hs_err stack walking was fixed to use the StackWalk API and NMT was used to use RtlCaptureStackBackTrace. >> This change removes the stub generated for getting the fp (which isn't valid anyway on windows). I also removed it from windows_aarch64 since it wasn't actually generated. >> >> Tested with tier1-3 for windows-x64. >> Thanks, >> Coleen > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > restore fp and sp printing. LGTM. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6720: > 6718: > 6719: // platform dependent > 6720: StubRoutines::x86::_get_previous_sp_entry = generate_get_previous_sp(); The body of `generate_get_previous_fp()` can also be removed. ------------- Marked as reviewed by iklam (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1321 From shade at openjdk.java.net Thu Nov 19 22:01:05 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 19 Nov 2020 22:01:05 GMT Subject: RFR: 8256692: Zero: remove obsolete block from ZeroInterpreter::native_entry In-Reply-To: References: Message-ID: <5OUyiIjqspC1ROvz9ZB2_QVog742IZWzGKFsnrst4Qg=.b07e0225-93c8-45d0-b1ab-afebf9db6b2c@github.com> On Thu, 19 Nov 2020 19:38:32 GMT, Coleen Phillimore wrote: >> There is a block that is protected by #if 0. It seems to be protected by it after [JDK-8239782](https://bugs.openjdk.java.net/browse/JDK-8239782), and it is completely unnecessary after [JDK-8255617](https://bugs.openjdk.java.net/browse/JDK-8255617). > > This is trivially good. Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/1328 From shade at openjdk.java.net Thu Nov 19 22:01:06 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 19 Nov 2020 22:01:06 GMT Subject: Integrated: 8256692: Zero: remove obsolete block from ZeroInterpreter::native_entry In-Reply-To: References: Message-ID: <-1CnCH_qDANK7Vt1fJSFJ9_NWXYrut-lSGBYdktxfrk=.7878f81e-a9f2-4b79-84c0-77c8b1d57f49@github.com> On Thu, 19 Nov 2020 18:59:30 GMT, Aleksey Shipilev wrote: > There is a block that is protected by #if 0. It seems to be protected by it after [JDK-8239782](https://bugs.openjdk.java.net/browse/JDK-8239782), and it is completely unnecessary after [JDK-8255617](https://bugs.openjdk.java.net/browse/JDK-8255617). This pull request has now been integrated.
Changeset: c1407733
Author: Aleksey Shipilev
URL: https://git.openjdk.java.net/jdk/commit/c1407733
Stats: 20 lines in 1 file changed: 0 ins; 20 del; 0 mod

8256692: Zero: remove obsolete block from ZeroInterpreter::native_entry

Reviewed-by: coleenp

-------------

PR: https://git.openjdk.java.net/jdk/pull/1328

From lfoltan at openjdk.java.net Thu Nov 19 22:10:06 2020
From: lfoltan at openjdk.java.net (Lois Foltan)
Date: Thu, 19 Nov 2020 22:10:06 GMT
Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable
In-Reply-To:
References:
Message-ID:

On Thu, 19 Nov 2020 16:31:44 GMT, Coleen Phillimore wrote:

> I added an assert with https://bugs.openjdk.java.net/browse/JDK-8256365 to catch adding redefined methods into the vtable where they don't belong (except while redefining that class). The assert found that CDS was restoring classes with methods in the default_methods array which can be from a redefined class. So these methods need to be adjusted before recreating the vtable.
> Also fixed is a potential bug where a Method in a methodHandle in the default_methods array can be redefined in the safepoint caused by loader constraint checking, then added to the vtable. The window for this bug is very small so I couldn't write a test for it. This change was in my v1 patch for JDK-8256365, so reintroduced here.
>
> The test java/lang/instrument/IsModifiableClassAgent.java now passes with this patch. Rerunning tier1-6 tests in progress. Built with minimal VM to verify #if INCLUDE_JVMTIs were in the right places.

Thanks for updating the comment. I'm fine with going ahead with this change.

-------------

Marked as reviewed by lfoltan (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1323

From coleenp at openjdk.java.net Thu Nov 19 22:14:23 2020
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 19 Nov 2020 22:14:23 GMT
Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable [v2]
In-Reply-To:
References:
Message-ID: <40VmISmPlYTohW4wuSXTAAH6R2xeWMD8Pq1fGSkv7dM=.53f24cef-0561-4bb0-b40f-36dab34e8997@github.com>

> I added an assert with https://bugs.openjdk.java.net/browse/JDK-8256365 to catch adding redefined methods into the vtable where they don't belong (except while redefining that class). The assert found that CDS was restoring classes with methods in the default_methods array which can be from a redefined class. So these methods need to be adjusted before recreating the vtable.
> Also fixed is a potential bug where a Method in a methodHandle in the default_methods array can be redefined in the safepoint caused by loader constraint checking, then added to the vtable. The window for this bug is very small so I couldn't write a test for it. This change was in my v1 patch for JDK-8256365, so reintroduced here.
>
> The test java/lang/instrument/IsModifiableClassAgent.java now passes with this patch. Rerunning tier1-6 tests in progress. Built with minimal VM to verify #if INCLUDE_JVMTIs were in the right places.

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Fix comments.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1323/files
  - new: https://git.openjdk.java.net/jdk/pull/1323/files/12c24a1d..9ebe0c6c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1323&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1323&range=00-01

  Stats: 5 lines in 1 file changed: 1 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1323.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1323/head:pull/1323

PR: https://git.openjdk.java.net/jdk/pull/1323

From dholmes at openjdk.java.net Thu Nov 19 22:44:07 2020
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 19 Nov 2020 22:44:07 GMT
Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable [v2]
In-Reply-To: <40VmISmPlYTohW4wuSXTAAH6R2xeWMD8Pq1fGSkv7dM=.53f24cef-0561-4bb0-b40f-36dab34e8997@github.com>
References: <40VmISmPlYTohW4wuSXTAAH6R2xeWMD8Pq1fGSkv7dM=.53f24cef-0561-4bb0-b40f-36dab34e8997@github.com>
Message-ID:

On Thu, 19 Nov 2020 22:14:23 GMT, Coleen Phillimore wrote:

>> I added an assert with https://bugs.openjdk.java.net/browse/JDK-8256365 to catch adding redefined methods into the vtable where they don't belong (except while redefining that class). The assert found that CDS was restoring classes with methods in the default_methods array which can be from a redefined class. So these methods need to be adjusted before recreating the vtable.
>> Also fixed is a potential bug where a Method in a methodHandle in the default_methods array can be redefined in the safepoint caused by loader constraint checking, then added to the vtable. The window for this bug is very small so I couldn't write a test for it. This change was in my v1 patch for JDK-8256365, so reintroduced here.
>>
>> The test java/lang/instrument/IsModifiableClassAgent.java now passes with this patch. Rerunning tier1-6 tests in progress. Built with minimal VM to verify #if INCLUDE_JVMTIs were in the right places.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix comments.

Hi Coleen,

This all seems quite reasonable in itself. I have some concerns about when/why CDS archives a redefined version of a class, but that is a different matter.

Thanks,
David
-----

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1323

From dcubed at openjdk.java.net Thu Nov 19 22:44:09 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Thu, 19 Nov 2020 22:44:09 GMT
Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable [v2]
In-Reply-To: <40VmISmPlYTohW4wuSXTAAH6R2xeWMD8Pq1fGSkv7dM=.53f24cef-0561-4bb0-b40f-36dab34e8997@github.com>
References: <40VmISmPlYTohW4wuSXTAAH6R2xeWMD8Pq1fGSkv7dM=.53f24cef-0561-4bb0-b40f-36dab34e8997@github.com>
Message-ID:

On Thu, 19 Nov 2020 22:14:23 GMT, Coleen Phillimore wrote:

>> I added an assert with https://bugs.openjdk.java.net/browse/JDK-8256365 to catch adding redefined methods into the vtable where they don't belong (except while redefining that class). The assert found that CDS was restoring classes with methods in the default_methods array which can be from a redefined class. So these methods need to be adjusted before recreating the vtable.
>> Also fixed is a potential bug where a Method in a methodHandle in the default_methods array can be redefined in the safepoint caused by loader constraint checking, then added to the vtable. The window for this bug is very small so I couldn't write a test for it. This change was in my v1 patch for JDK-8256365, so reintroduced here.
>>
>> The test java/lang/instrument/IsModifiableClassAgent.java now passes with this patch. Rerunning tier1-6 tests in progress. Built with minimal VM to verify #if INCLUDE_JVMTIs were in the right places.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix comments.

Thumbs up.

-------------

Marked as reviewed by dcubed (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1323

From coleenp at openjdk.java.net Thu Nov 19 22:44:10 2020
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 19 Nov 2020 22:44:10 GMT
Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable [v2]
In-Reply-To:
References: <40VmISmPlYTohW4wuSXTAAH6R2xeWMD8Pq1fGSkv7dM=.53f24cef-0561-4bb0-b40f-36dab34e8997@github.com>
Message-ID:

On Thu, 19 Nov 2020 22:37:54 GMT, Daniel D. Daugherty wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix comments.
>
> Thumbs up.

The archive doesn't have a redefined version. The super class that is in the archive is redefined before the subclass is loaded but the subclass's default_methods point to super class/interface owned default methods.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1323

From coleenp at openjdk.java.net Thu Nov 19 22:44:11 2020
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 19 Nov 2020 22:44:11 GMT
Subject: RFR: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable [v2]
In-Reply-To:
References: <40VmISmPlYTohW4wuSXTAAH6R2xeWMD8Pq1fGSkv7dM=.53f24cef-0561-4bb0-b40f-36dab34e8997@github.com>
Message-ID: <7H-9yXfhsXk9hV7EwcBorAW81C5nfsT7iQqEo0kgIcY=.0d9f0e69-ce27-4633-a798-36dabc850f47@github.com>

On Thu, 19 Nov 2020 22:38:02 GMT, Coleen Phillimore wrote:

>> Thumbs up.
>
> The archive doesn't have a redefined version. The super class that is in the archive is redefined before the subclass is loaded but the subclass's default_methods point to super class/interface owned default methods.

Thanks for the code reviews.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1323

From coleenp at openjdk.java.net Thu Nov 19 22:44:11 2020
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 19 Nov 2020 22:44:11 GMT
Subject: Integrated: 8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable
In-Reply-To:
References:
Message-ID:

On Thu, 19 Nov 2020 16:31:44 GMT, Coleen Phillimore wrote:

> I added an assert with https://bugs.openjdk.java.net/browse/JDK-8256365 to catch adding redefined methods into the vtable where they don't belong (except while redefining that class). The assert found that CDS was restoring classes with methods in the default_methods array which can be from a redefined class. So these methods need to be adjusted before recreating the vtable.
> Also fixed is a potential bug where a Method in a methodHandle in the default_methods array can be redefined in the safepoint caused by loader constraint checking, then added to the vtable. The window for this bug is very small so I couldn't write a test for it. This change was in my v1 patch for JDK-8256365, so reintroduced here.
>
> The test java/lang/instrument/IsModifiableClassAgent.java now passes with this patch. Rerunning tier1-6 tests in progress. Built with minimal VM to verify #if INCLUDE_JVMTIs were in the right places.

This pull request has now been integrated.

Changeset: fae68ff0
Author: Coleen Phillimore
URL: https://git.openjdk.java.net/jdk/commit/fae68ff0
Stats: 25 lines in 2 files changed: 20 ins; 1 del; 4 mod

8256640: assert(!m->is_old() || ik()->is_being_redefined()) failed: old methods should not be in vtable

Reviewed-by: lfoltan, dcubed, dholmes

-------------

PR: https://git.openjdk.java.net/jdk/pull/1323

From coleenp at openjdk.java.net Thu Nov 19 22:47:07 2020
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 19 Nov 2020 22:47:07 GMT
Subject: RFR: 8246378: [Windows] assert on MethodHandle logging code [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 19 Nov 2020 21:44:23 GMT, Ioi Lam wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   restore fp and sp printing.
>
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6720:
>
>> 6718:
>> 6719: // platform dependent
>> 6720: StubRoutines::x86::_get_previous_sp_entry = generate_get_previous_sp();
>
> The body of `generate_get_previous_fp()` can also be removed.

Thanks for noticing. I meant to delete it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1321

From coleenp at openjdk.java.net Fri Nov 20 00:03:18 2020
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 20 Nov 2020 00:03:18 GMT
Subject: RFR: 8246378: [Windows] assert on MethodHandle logging code [v3]
In-Reply-To:
References:
Message-ID:

> The MethodHandles verbose logging code tries to walk Windows stacks without using the StackWalk API and fails miserably. It was the only consumer of os::current_frame() and get_sender_for_C_frame(). For Windows, the hs_err stack walking was fixed to use the StackWalk API and NMT was changed to use RtlCaptureStackBackTrace.
> This change removes the stub generated for getting the fp (which isn't valid anyway on Windows). I also removed it from windows_aarch64 since it wasn't actually generated.
>
> Tested with tier1-3 for windows-x64.
> Thanks,
> Coleen

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Remove generate_get_previous_fp().

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1321/files
  - new: https://git.openjdk.java.net/jdk/pull/1321/files/6f56d678..f09c882b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1321&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1321&range=01-02

  Stats: 20 lines in 1 file changed: 0 ins; 20 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1321.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1321/head:pull/1321

PR: https://git.openjdk.java.net/jdk/pull/1321

From iklam at openjdk.java.net Fri Nov 20 06:27:07 2020
From: iklam at openjdk.java.net (Ioi Lam)
Date: Fri, 20 Nov 2020 06:27:07 GMT
Subject: RFR: 8246378: [Windows] assert on MethodHandle logging code [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 20 Nov 2020 00:03:18 GMT, Coleen Phillimore wrote:

>> The MethodHandles verbose logging code tries to walk Windows stacks without using the StackWalk API and fails miserably. It was the only consumer of os::current_frame() and get_sender_for_C_frame(). For Windows, the hs_err stack walking was fixed to use the StackWalk API and NMT was changed to use RtlCaptureStackBackTrace.
>> This change removes the stub generated for getting the fp (which isn't valid anyway on Windows). I also removed it from windows_aarch64 since it wasn't actually generated.
>>
>> Tested with tier1-3 for windows-x64.
>> Thanks,
>> Coleen
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove generate_get_previous_fp().

Marked as reviewed by iklam (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/1321

From vlivanov at openjdk.java.net Fri Nov 20 10:41:04 2020
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Fri, 20 Nov 2020 10:41:04 GMT
Subject: RFR: 8246378: [Windows] assert on MethodHandle logging code [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 20 Nov 2020 00:03:18 GMT, Coleen Phillimore wrote:

>> The MethodHandles verbose logging code tries to walk Windows stacks without using the StackWalk API and fails miserably. It was the only consumer of os::current_frame() and get_sender_for_C_frame(). For Windows, the hs_err stack walking was fixed to use the StackWalk API and NMT was changed to use RtlCaptureStackBackTrace.
>> This change removes the stub generated for getting the fp (which isn't valid anyway on Windows). I also removed it from windows_aarch64 since it wasn't actually generated.
>>
>> Tested with tier1-3 for windows-x64.
>> Thanks,
>> Coleen
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove generate_get_previous_fp().

Looks good.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1321

From shade at openjdk.java.net Fri Nov 20 11:10:10 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Fri, 20 Nov 2020 11:10:10 GMT
Subject: RFR: 8256736: Zero: GTest tests fail with "unsupported vm variant"
Message-ID:

Manifests as `tier1` test:

$ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=jtreg:gtest/MetaspaceGtests.java

00:16:40 java.lang.Error: TESTBUG: unsuppported vm variant
00:16:40 	at GTestWrapper.getJVMVariantSubDir(GTestWrapper.java:122)
00:16:40 	at GTestWrapper.main(GTestWrapper.java:50)
00:16:40 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
00:16:40 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
00:16:40 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
00:16:40 	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
00:16:40 	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
00:16:40 	at java.base/java.lang.Thread.run(Thread.java:831)

Let's make the wrapper know about the Zero variant.
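
The kind of mapping involved can be sketched as follows. This is a minimal illustration only, assuming the wrapper picks a launcher subdirectory from the VM variant; the class name, the method name, and the use of a VM name string for detection are hypothetical and are not taken from the actual GTestWrapper source, which is not shown in this thread.

```java
import java.util.Locale;

// Hypothetical helper, not the real GTestWrapper: maps a VM name string
// (for example the value of the "java.vm.name" system property) to the
// subdirectory assumed to hold the matching gtest launcher.
class VariantDirs {
    static String subDirFor(String vmName) {
        String name = vmName.toLowerCase(Locale.ROOT);
        if (name.contains("server"))  return "server";
        if (name.contains("client"))  return "client";
        if (name.contains("minimal")) return "minimal";
        if (name.contains("zero"))    return "zero"; // the previously missing case
        throw new Error("TESTBUG: unsupported vm variant: " + vmName);
    }

    public static void main(String[] args) {
        // Example VM name string; an assumption for illustration.
        System.out.println(subDirFor("OpenJDK 64-Bit Zero VM"));
    }
}
```

Without the `zero` branch the sketch falls through to the `Error`, which is the shape of the failure in the log above.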

Additional testing:
 - [x] Affected tests (they still fail, but because of Zero problems)

-------------

Commit messages:
 - 8256736: Zero: GTest tests fail with "unsupported vm variant"

Changes: https://git.openjdk.java.net/jdk/pull/1344/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1344&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8256736
  Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1344.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1344/head:pull/1344

PR: https://git.openjdk.java.net/jdk/pull/1344

From coleenp at openjdk.java.net Fri Nov 20 13:03:04 2020
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 20 Nov 2020 13:03:04 GMT
Subject: RFR: 8246378: [Windows] assert on MethodHandle logging code [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 20 Nov 2020 10:37:56 GMT, Vladimir Ivanov wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Remove generate_get_previous_fp().
>
> Looks good.

Thank you Ioi and Vladimir. @iwanowww I was hoping you'd see this since it's possible that this logging is/was useful to you.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1321

From coleenp at openjdk.java.net Fri Nov 20 13:03:06 2020
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 20 Nov 2020 13:03:06 GMT
Subject: Integrated: 8246378: [Windows] assert on MethodHandle logging code
In-Reply-To:
References:
Message-ID:

On Thu, 19 Nov 2020 16:09:58 GMT, Coleen Phillimore wrote:

> The MethodHandles verbose logging code tries to walk Windows stacks without using the StackWalk API and fails miserably. It was the only consumer of os::current_frame() and get_sender_for_C_frame(). For Windows, the hs_err stack walking was fixed to use the StackWalk API and NMT was changed to use RtlCaptureStackBackTrace.
> This change removes the stub generated for getting the fp (which isn't valid anyway on Windows). I also removed it from windows_aarch64 since it wasn't actually generated.
>
> Tested with tier1-3 for windows-x64.
> Thanks,
> Coleen

This pull request has now been integrated.

Changeset: e7c7469c
Author: Coleen Phillimore
URL: https://git.openjdk.java.net/jdk/commit/e7c7469c
Stats: 158 lines in 11 files changed: 14 ins; 98 del; 46 mod

8246378: [Windows] assert on MethodHandle logging code

Reviewed-by: iklam, vlivanov

-------------

PR: https://git.openjdk.java.net/jdk/pull/1321

From vlivanov at openjdk.java.net Fri Nov 20 15:00:18 2020
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Fri, 20 Nov 2020 15:00:18 GMT
Subject: RFR: 8254231: Implementation of Foreign Linker API (Incubator) [v28]
In-Reply-To:
References:
Message-ID:

On Tue, 17 Nov 2020 11:49:26 GMT, Maurizio Cimadamore wrote:

>> This patch contains the changes associated with the first incubation round of the foreign linker access API (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]).
>>
>> The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients.
>>
>> Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible.
>>
>> A big thank you to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank you to Paul Sandoz, who provided many insights (often by trying the bits first hand).
>>
>> Thanks
>> Maurizio
>>
>> Webrev:
>> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev
>>
>> Javadoc:
>>
>> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html
>>
>> Specdiff (relative to [3]):
>>
>> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html
>>
>> CSR:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8254232
>>
>>
>>
>> ### API Changes
>>
>> The API changes are actually rather slim:
>>
>> * `LibraryLookup`
>>   * This class allows clients to lookup symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then lookup symbols on that library.
>> * `FunctionDescriptor`
>>   * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function.
>> * `CLinker`
>>   * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls.
>>   * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place.
>>   * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back.
>> * `NativeScope`
>>   * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resource* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls.
>> * `MemorySegment`
>>   * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope.
>>
>> ### Safety
>>
>> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API).
>>
>> ### Implementation changes
>>
>> The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. same library cannot be loaded by different classloaders).
>>
>> As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`.
>>
>> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5].
>>
>> The main idea behind foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function.
>>
>> This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference.
>>
>> For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating points are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly.
>>
>> A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall.
>>
>> At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below:
>>
>> * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them into their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general invocation mode, the most "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place.
>>
>> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones).
>>
>> * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need to go for an intermediate buffer. This gives us back performance that is on par with JNI.
>>
>> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized).
>>
>> Again, for further reading on the internals of the foreign linker support, please refer to [5].
>>
>> #### Test changes
>>
>> Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on.
>>
>> Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI.
>>
>> [1] - https://openjdk.java.net/jeps/389
>> [2] - https://openjdk.java.net/jeps/393
>> [3] - https://git.openjdk.java.net/jdk/pull/548
>> [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md
>> [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html
>
> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 95 commits:
>
>  - Merge branch 'master' into 8254231_linker
>  - Add `final` modifier on NativeLibraries.defaultLookup
>  - Fix aarch64 test failure
>  - Fix signature mismatch on aarch64
>  - Merge pull request #9 from JornVernee/Windows_Warnings
>
>    Fix warnings on MSVC
>  - Fix warnings on MSVC
>  - Merge pull request #8 from JornVernee/Vlad_Comments
>
>    Address More Review comments
>  - - Don't print anything in nmehtod debug output for native invoker if there are none.
>    - Use memcpy to copy native stubs to nmethod data
>    - Simplify print code
>  - Merge branch '8254231_linker' into Vlad_Comments
>  - ... and 85 more: https://git.openjdk.java.net/jdk/compare/a7422ac2...40bd5df1

Compiler changes look good.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/634 From redestad at openjdk.java.net Fri Nov 20 21:03:14 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 20 Nov 2020 21:03:14 GMT Subject: RFR: 8256741: Reduce footprint of compiler interface data structures Message-ID: <_ueFk7sJ0AZFFkyoMgEl-st9DaN-8PRJWPnBHfXpRn0=.cd50148b-3ed0-437a-8508-adbe2a104a11@github.com> A few data structure in the ci allocate unconditionally created GrowableArrays out-of-line, have fields that are newer updated/read, or are unnecessarily cached. By cleaning this up we can slightly reduce memory used for JIT compilations while slightly speeding them up. ------------- Commit messages: - Copyrights, syntax polish - Adjust vmStructs/SA to changes in ciObjectFactory - Inline unconditionally created GAs in ciObjectFactory - Remove sad assert - Remove ciMethod fields cached in ciTypeFlow - Reduce footprint of compiler interface data structures Changes: https://git.openjdk.java.net/jdk/pull/1346/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1346&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256741 Stats: 226 lines in 9 files changed: 28 ins; 88 del; 110 mod Patch: https://git.openjdk.java.net/jdk/pull/1346.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1346/head:pull/1346 PR: https://git.openjdk.java.net/jdk/pull/1346 From vladimir.x.ivanov at oracle.com Fri Nov 20 22:19:22 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 21 Nov 2020 01:19:22 +0300 Subject: RFR: 8246378: [Windows] assert on MethodHandle logging code [v3] In-Reply-To: References: Message-ID: <86b9078b-2647-52eb-9659-fd02ac47a632@oracle.com> > Thank you Ioi and Vladimir. @iwanowww I was hoping you'd see this since it's possible that this logging is/was useful to you. Thanks for fixing it, Coleen! 
Best regards, Vladimir Ivanov From cjplummer at openjdk.java.net Fri Nov 20 22:24:16 2020 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Fri, 20 Nov 2020 22:24:16 GMT Subject: RFR: 8256741: Reduce footprint of compiler interface data structures In-Reply-To: <_ueFk7sJ0AZFFkyoMgEl-st9DaN-8PRJWPnBHfXpRn0=.cd50148b-3ed0-437a-8508-adbe2a104a11@github.com> References: <_ueFk7sJ0AZFFkyoMgEl-st9DaN-8PRJWPnBHfXpRn0=.cd50148b-3ed0-437a-8508-adbe2a104a11@github.com> Message-ID: On Fri, 20 Nov 2020 12:19:48 GMT, Claes Redestad wrote: > A few data structure in the ci allocate unconditionally created GrowableArrays out-of-line, have fields that are newer updated/read, or are unnecessarily cached. By cleaning this up we can slightly reduce memory used for JIT compilations while slightly speeding them up. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/ci/ciObjectFactory.java line 82: > 80: public GrowableArray symbols() { > 81: Address addr = getAddress().addOffsetTo(symbolsField.getOffset()); > 82: return GrowableArray.create(addr, ciSymbolConstructor); It's unclear to me why these two changes were needed. Don't they produce the same result before and after? ------------- PR: https://git.openjdk.java.net/jdk/pull/1346 From redestad at openjdk.java.net Fri Nov 20 22:42:02 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 20 Nov 2020 22:42:02 GMT Subject: RFR: 8256741: Reduce footprint of compiler interface data structures In-Reply-To: References: <_ueFk7sJ0AZFFkyoMgEl-st9DaN-8PRJWPnBHfXpRn0=.cd50148b-3ed0-437a-8508-adbe2a104a11@github.com> Message-ID: On Fri, 20 Nov 2020 22:21:04 GMT, Chris Plummer wrote: >> A few data structure in the ci allocate unconditionally created GrowableArrays out-of-line, have fields that are newer updated/read, or are unnecessarily cached. By cleaning this up we can slightly reduce memory used for JIT compilations while slightly speeding them up. 
> > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/ci/ciObjectFactory.java line 82: > >> 80: public GrowableArray symbols() { >> 81: Address addr = getAddress().addOffsetTo(symbolsField.getOffset()); >> 82: return GrowableArray.create(addr, ciSymbolConstructor); > > It's unclear to me why these two changes were needed. Don't they produce the same result before and after? When changing the fields from `GrowableArray<..>*` to `GrowableArray<..>` I looked around and found another use of an embedded `GrowableArray` in `InlineTree.java` and took my cues from that. If you're certain these approaches are semantically identical I can drop these changes, but as I'm not exactly sure how to verify and test this I went the safe(?) route of leaning on prior art here. ------------- PR: https://git.openjdk.java.net/jdk/pull/1346 From cjplummer at openjdk.java.net Fri Nov 20 22:53:59 2020 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Fri, 20 Nov 2020 22:53:59 GMT Subject: RFR: 8256741: Reduce footprint of compiler interface data structures In-Reply-To: <_ueFk7sJ0AZFFkyoMgEl-st9DaN-8PRJWPnBHfXpRn0=.cd50148b-3ed0-437a-8508-adbe2a104a11@github.com> References: <_ueFk7sJ0AZFFkyoMgEl-st9DaN-8PRJWPnBHfXpRn0=.cd50148b-3ed0-437a-8508-adbe2a104a11@github.com> Message-ID: On Fri, 20 Nov 2020 12:19:48 GMT, Claes Redestad wrote: > A few data structures in the ci allocate unconditionally created GrowableArrays out-of-line, have fields that are never updated/read, or are unnecessarily cached. By cleaning this up we can slightly reduce memory used for JIT compilations while slightly speeding them up. Marked as reviewed by cjplummer (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1346 From cjplummer at openjdk.java.net Fri Nov 20 22:54:01 2020 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Fri, 20 Nov 2020 22:54:01 GMT Subject: RFR: 8256741: Reduce footprint of compiler interface data structures In-Reply-To: References: <_ueFk7sJ0AZFFkyoMgEl-st9DaN-8PRJWPnBHfXpRn0=.cd50148b-3ed0-437a-8508-adbe2a104a11@github.com> Message-ID: On Fri, 20 Nov 2020 22:39:19 GMT, Claes Redestad wrote: >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/ci/ciObjectFactory.java line 82: >> >>> 80: public GrowableArray symbols() { >>> 81: Address addr = getAddress().addOffsetTo(symbolsField.getOffset()); >>> 82: return GrowableArray.create(addr, ciSymbolConstructor); >> >> It's unclear to me why these two changes were needed. Don't they produce the same result before and after? > > When changing the fields from `GrowableArray<..>*` to `GrowableArray<..>` I looked around and found another use of an embedded `GrowableArray` in `InlineTree.java` and took my cues from that. If you're certain these approaches are semantically identical I can drop these changes, but as I'm not exactly sure how to verify and test this I went the safe(?) route of leaning on prior art here. Ah, I see now. I didn't pick up on the change from a pointer type to an embedded type. Yes, I think your changes are correct and necessary. Rather than plucking the pointer value out of the field, you are now computing the address of the field, which seems like the right thing to be doing when changing the field from a pointer type to an embedded type. Consider the SA changes reviewed. I'm not reviewing any of the hotspot changes. 
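The distinction drawn above, between loading a pointer out of a field and computing the address of the field itself, can be sketched with a toy memory model. This is only an illustration under stated assumptions: `FieldAddressing`, `MEM`, and the 4-byte "length" word are hypothetical and are not part of the Serviceability Agent API or the actual `GrowableArray` layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Toy model of reading a field from raw memory; not the Serviceability Agent API. */
class FieldAddressing {
    // Hypothetical "target process memory": a flat buffer addressed by int offsets.
    static final ByteBuffer MEM = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);

    // Pointer-typed field (GrowableArray<..>*): the field holds the address of the
    // array header, so that address has to be loaded out of the field.
    static int headerViaPointerField(int objectBase, int fieldOffset) {
        return MEM.getInt(objectBase + fieldOffset);
    }

    // Embedded field (GrowableArray<..>): the array header lives inside the object,
    // so its address is the object's address plus the field offset, with no load.
    static int headerViaEmbeddedField(int objectBase, int fieldOffset) {
        return objectBase + fieldOffset;
    }

    // Reads a hypothetical 4-byte length stored at the start of an array header.
    static int lengthAt(int headerAddress) {
        return MEM.getInt(headerAddress);
    }
}
```

With a pointer-typed field the two helpers return different addresses for the same object and offset, which is why the accessor has to change when the field's type changes.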
------------- PR: https://git.openjdk.java.net/jdk/pull/1346 From kvn at openjdk.java.net Fri Nov 20 23:14:15 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 20 Nov 2020 23:14:15 GMT Subject: RFR: 8256741: Reduce footprint of compiler interface data structures In-Reply-To: <_ueFk7sJ0AZFFkyoMgEl-st9DaN-8PRJWPnBHfXpRn0=.cd50148b-3ed0-437a-8508-adbe2a104a11@github.com> References: <_ueFk7sJ0AZFFkyoMgEl-st9DaN-8PRJWPnBHfXpRn0=.cd50148b-3ed0-437a-8508-adbe2a104a11@github.com> Message-ID: <4Ifnx25KPdlz3Q8zgyahVXCgLoz3F4kYijl1tBzy__g=.21062af5-be49-485c-8c16-30e69feef668@github.com> On Fri, 20 Nov 2020 12:19:48 GMT, Claes Redestad wrote: > A few data structures in the ci allocate unconditionally created GrowableArrays out-of-line, have fields that are never updated/read, or are unnecessarily cached. By cleaning this up we can slightly reduce memory used for JIT compilations while slightly speeding them up. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1346 From thomas.stuefe at gmail.com Sat Nov 21 05:57:58 2020 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Sat, 21 Nov 2020 06:57:58 +0100 Subject: RFR: JDK-8256725: Metaspace: better blocktree and binlist asserts In-Reply-To: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> References: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> Message-ID: Coleen did ask me to crosspost this to hs-dev. I added the label to the gh issue but skara does not seem to pick it up, so here is the manual crosspost. ...Thomas On Fri, Nov 20, 2020 at 11:06 AM Thomas Stuefe wrote: > Hi, > > may I have reviews please for this small change. > > To analyze JDK-8256572, I'd like to see more information in asserts for > binlist and blocktree. 
> > This patch: > > - beefs up assertion messages when verifying binlist and blocktree > - adds a canary to the blocktree node to detect overwriters > - improves the blocktree printing > - adds a gtest death test to test the overwrite detection. > > Thanks! > > ------------- > > Commit messages: > - Initial > > Changes: https://git.openjdk.java.net/jdk/pull/1339/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1339&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8256725 > Stats: 120 lines in 5 files changed: 92 ins; 9 del; 19 mod > Patch: https://git.openjdk.java.net/jdk/pull/1339.diff > Fetch: git fetch https://git.openjdk.java.net/jdk > pull/1339/head:pull/1339 > > PR: https://git.openjdk.java.net/jdk/pull/1339 > From stuefe at openjdk.java.net Sat Nov 21 09:31:30 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 21 Nov 2020 09:31:30 GMT Subject: RFR: JDK-8256725: Metaspace: better blocktree and binlist asserts [v3] In-Reply-To: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> References: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> Message-ID: <6AJXvjkfpqDcqQgn9wUjqAUSQe5l1LLqQRxcgKbx1rw=.af8022d5-2283-47ed-b270-1f3a6c3fd8e0@github.com> > Hi, > > may I have reviews please for this small change. > > To analyze JDK-8256572, I'd like to see more information in asserts for binlist and blocktree. > > This patch: > > - beefs up assertion messages when verifying binlist and blocktree > - adds a canary to the blocktree node to detect overwriters > - improves the blocktree printing > - adds a gtest death test to test the overwrite detection. > > Thanks! 
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Quick node checks in blocktree on insert/removal ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1339/files - new: https://git.openjdk.java.net/jdk/pull/1339/files/491f9782..4e85fe65 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1339&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1339&range=01-02 Stats: 30 lines in 2 files changed: 19 ins; 5 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/1339.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1339/head:pull/1339 PR: https://git.openjdk.java.net/jdk/pull/1339 From stuefe at openjdk.java.net Sat Nov 21 09:31:31 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 21 Nov 2020 09:31:31 GMT Subject: RFR: JDK-8256725: Metaspace: better blocktree and binlist asserts [v3] In-Reply-To: References: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> <3kBfn2ZpoKAkirT6YjCY3b57QCZahpNWRPEDHkqePE4=.1d900069-320b-47ca-8ea6-1e81676ebd56@github.com> Message-ID: On Fri, 20 Nov 2020 16:12:18 GMT, Coleen Phillimore wrote: >>> Stylistic drive-by comments. >> >> Thanks Aleksey! All valid, all fixed. > > Can you add the hotspot-dev mailing list to these RFRs? I added some more checks - which should be very quick - for all passed nodes when inserting or removing into the BlockTree. Similar for BinList. 
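The canary idea mentioned in the patch description, combined with the quick per-node checks on insert and removal, can be sketched as follows. This is a minimal sketch under stated assumptions: the class name, the marker value, and the two-word node layout are hypothetical, since the thread does not show the real BlockTree node or its canary.

```java
/** Toy canary check; the real BlockTree node layout and canary value are not shown in this thread. */
class CanaryNodeSketch {
    // Hypothetical marker value written into every live node.
    static final long CANARY = 0x4E4F44454E4F4445L;

    // Node layout inside a long[] "memory": word 0 = canary, word 1 = block size in words.
    static void initNode(long[] mem, int at, long wordSize) {
        mem[at] = CANARY;
        mem[at + 1] = wordSize;
    }

    // Cheap check intended to run on every node passed to insert or remove.
    static boolean isIntact(long[] mem, int at) {
        return mem[at] == CANARY;
    }

    // Fails with a descriptive message when the canary has been overwritten.
    static void verifyNode(long[] mem, int at) {
        if (!isIntact(mem, at)) {
            throw new IllegalStateException("Invalid node at word " + at
                + ": canary is 0x" + Long.toHexString(mem[at])
                + ", expected 0x" + Long.toHexString(CANARY));
        }
    }
}
```

Because the check is a single comparison per node, it is cheap enough to run on every insert and removal, while the message carries enough detail to point at the overwritten location.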
------------- PR: https://git.openjdk.java.net/jdk/pull/1339 From jbhateja at openjdk.java.net Sat Nov 21 18:59:00 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sat, 21 Nov 2020 18:59:00 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v15] In-Reply-To: References: Message-ID: <0i0tjPQs0FixJzm1sk3a6CqTHQodF-U55lM__ePTI2c=.d6070473-9aa4-41cf-8e35-f280f5d4836d@github.com> > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. > 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. > 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. > 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
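The runtime length check described in points 1) to 3) can be illustrated in plain Java. This is only a sketch under stated assumptions: the real change emits the check and the AVX-512 masked copy in C2-generated code, the simple loop below merely stands in for that masked instruction sequence, and treating the 32-byte limit as inclusive is a guess, since the summary does not say. `PartialInlineSketch` and `copyBytes` are hypothetical names.

```java
/** Plain-Java illustration of the length check; the actual patch emits this in C2-generated code. */
class PartialInlineSketch {
    // Mirrors the default of the ArrayCopyPartialInlineSize flag, in bytes.
    static final int PARTIAL_INLINE_SIZE = 32;

    // Copies length bytes and returns which path was taken, purely for illustration.
    // Assumes src and dst are distinct arrays, so the forward loop is safe.
    static String copyBytes(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        // Assumption: the limit is inclusive; the thread does not state this.
        if (length <= PARTIAL_INLINE_SIZE) {
            // Stand-in for the masked vector load/store emitted at the call site.
            for (int i = 0; i < length; i++) {
                dst[dstPos + i] = src[srcPos + i];
            }
            return "inline";
        }
        // Larger copies fall back to the regular out-of-line copy routine.
        System.arraycopy(src, srcPos, dst, dstPos, length);
        return "stub";
    }
}
```

The point of the split is that short copies skip the call entirely, while longer copies keep using the existing stub unchanged.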
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt :
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/302/files - new: https://git.openjdk.java.net/jdk/pull/302/files/edb74db3..b83808a8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=14 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=13-14 Stats: 14 lines in 3 files changed: 3 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Sat Nov 21 19:31:13 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sat, 21 Nov 2020 19:31:13 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v16] In-Reply-To: References: Message-ID: Jatin Bhateja has updated the pull request incrementally with three additional commits since the last revision: - Reverting configure file - Merge branch 'JDK-8252848' of http://github.com/jatin-bhateja/jdk into JDK-8252848 - Review comments resolution. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/302/files - new: https://git.openjdk.java.net/jdk/pull/302/files/b83808a8..4a2a7897 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=15 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=14-15 Stats: 0 lines in 1 file changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From jbhateja at openjdk.java.net Sat Nov 21 19:40:12 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sat, 21 Nov 2020 19:40:12 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Wed, 18 Nov 2020 00:06:31 GMT, Vladimir Kozlov wrote: > Forgot to say that failure was on Windows with only avx512f, avx512cd Thanks Vladimir, I have resolved your review comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From kvn at openjdk.java.net Sun Nov 22 02:22:28 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 22 Nov 2020 02:22:28 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Sat, 21 Nov 2020 19:36:54 GMT, Jatin Bhateja wrote: >> Forgot to say that failure was on Windows with only avx512f, avx512cd.
> >> Forgot to say that failure was on Windows with only avx512f, avx512cd > > Thanks Vladimir, I have resolved your review comments.
Version 15 failed the following tests on linux-x64 with the -XX:+UseParallelGC -XX:+UseNUMA flags:
vmTestbase/metaspace/stressHierarchy/stressHierarchy015/TestDescription.java
vmTestbase/metaspace/stressHierarchy/stressHierarchy006/TestDescription.java
vmTestbase/metaspace/stressHierarchy/stressHierarchy005/TestDescription.java
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/workspace/open/src/hotspot/share/opto/macroArrayCopy.cpp:861), pid=8205, tid=8216
# assert(ArrayCopyNode::may_modify(dest_t, (*ctrl)->in(0)->as_MemBar(), &_igvn, ac)) failed: dependency on arraycopy lost
#
# Problematic frame:
# V [libjvm.so+0x134fbfc] PhaseMacroExpand::generate_arraycopy(ArrayCopyNode*, AllocateArrayNode*, Node**, MergeMemNode*, Node**, TypePtr const*, BasicType, Node*, Node*, Node*, Node*, Node*, bool, bool, RegionNode*)+0x30bc
#
Host: Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz, 8 cores, 58G, Oracle Linux Server release 7.9
Current CompileTask:
C2: 27392 5458 4 package_level34_num50.Dummy::composeString (10 bytes)
Stack: [0x00007f4c5a024000,0x00007f4c5a125000], sp=0x00007f4c5a120420, free space=1009k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x134fbfc] PhaseMacroExpand::generate_arraycopy(ArrayCopyNode*, AllocateArrayNode*, Node**, MergeMemNode*, Node**, TypePtr const*, BasicType, Node*, Node*, Node*, Node*, Node*, bool, bool, RegionNode*)+0x30bc
V [libjvm.so+0x1350741] PhaseMacroExpand::expand_arraycopy_node(ArrayCopyNode*)+0x641
V [libjvm.so+0x1340d7b] PhaseMacroExpand::expand_macro_nodes()+0xfdb
V [libjvm.so+0x9fe79b] Compile::Optimize()+0x177b
V [libjvm.so+0xa00268] Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x17e8
V [libjvm.so+0x8322ae] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1ce
V [libjvm.so+0xa103f8] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe08
V [libjvm.so+0xa10f48] CompileBroker::compiler_thread_loop()+0x5a8
------------- PR: https://git.openjdk.java.net/jdk/pull/302 From shade at openjdk.java.net Sun Nov 22 18:12:41 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sun, 22 Nov 2020 18:12:41 GMT Subject: Integrated: 8256497: Zero: enable G1 and Shenandoah GCs In-Reply-To: References: Message-ID: On Tue, 17 Nov 2020 19:02:03 GMT, Aleksey Shipilev wrote: > Following the [JDK-8255796](https://bugs.openjdk.java.net/browse/JDK-8255796) improvement that ditched the inline contiguous alloc use from Zero, we can now rely on GC interface to hook the GCs properly. G1 and Shenandoah are a bit special here, because they require special `Reference.get` handling. > > Note that it does not change the default GC for Zero, because Zero is implicitly `NeverActAsServerMachine`, which still selects Serial GC by default. After this change, Zero users can opt-in to G1 or Shenandoah. > > Additional testing: > - [x] Linux x86_64 Zero fastdebug `hotspot_gc_shenandoah` (some lingering failures about non-enabled compressed oops) > - [x] Linux x86_64 Zero fastdebug `tier1` with `-XX:+UseG1GC` This pull request has now been integrated. 
Changeset: e06a6839 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/e06a6839 Stats: 72 lines in 5 files changed: 54 ins; 16 del; 2 mod 8256497: Zero: enable G1 and Shenandoah GCs Reviewed-by: rkennke, erikj, ihse ------------- PR: https://git.openjdk.java.net/jdk/pull/1268 From shade at openjdk.java.net Sun Nov 22 18:12:39 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sun, 22 Nov 2020 18:12:39 GMT Subject: RFR: 8256497: Zero: enable G1 and Shenandoah GCs [v2] In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 09:55:04 GMT, Magnus Ihse Bursie wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - Merge branch 'master' into JDK-8256497-zero-g1-shenandoah >> - Remove TODO >> - 8256497: Zero: enable G1 and Shenandoah GCs > > Build changes look good. Thanks! By now I completed more thorough tests with Shenandoah, and those look fine. ------------- PR: https://git.openjdk.java.net/jdk/pull/1268 From jbhateja at openjdk.java.net Sun Nov 22 21:04:56 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sun, 22 Nov 2020 21:04:56 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v17] In-Reply-To: References: Message-ID: > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. > 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. > 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. > 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). 
Once PR-61 gets integrated we can extend this patch to cover all the primitive types.
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt : 
[http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt) Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Removing special handling for constant length, GVN will remove dead stub blocks in case constant length is less than partial inline size. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/302/files - new: https://git.openjdk.java.net/jdk/pull/302/files/4a2a7897..465c5f54 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=16 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=302&range=15-16 Stats: 68 lines in 2 files changed: 1 ins; 39 del; 28 mod Patch: https://git.openjdk.java.net/jdk/pull/302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/302/head:pull/302 PR: https://git.openjdk.java.net/jdk/pull/302 From kbarrett at openjdk.java.net Mon Nov 23 02:41:27 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 23 Nov 2020 02:41:27 GMT Subject: RFR: 8256517: (ref) Reference.clear during reference processing may lose notification Message-ID: Please review this change to Reference.clear() to address several issues. (JDK-8240696) For GCs using a SATB barrier, simply assigning the referent field to null may extend the lifetime of the referent value. (JDK-8240696) For GCs with concurrent reference processing, clearing the referent field during reference processing may discard the expected notification. Both of these are addressed by introducing a private native helper function for clearing the referent, rather than using an ordinary in-Java field assignment. Tests have been added for both of these issues. This required adding a new breakpoint in reference processing for ZGC. Of course, finalization adds some complexity to the problem. We deal with that by having FinalReference override clear. 
The implementation is provided by a new package-private method in Reference. (There are a number of alternatives, all of them clumsy; finalization is annoying that way.) While dealing with FinalReference clearing it was noted that the recent JDK-8256106 and JDK-8256370 have some problems. FinalizerHistogram was not updated to call the new Reference.getInactive(), instead still calling get() on FinalReferences, with the JDK-8256106 problems. Fixing that showed the assertion for inactive FinalReference added by JDK-8256370 used the wrong test. Rather than tracking down and changing all get() and clear() calls on final references and changing them to use getInactive and a new similar clear function, I've changed FinalReference to override get and clear, which call the helper functions in Reference. I've also renamed getInactive to be more explanatory and less convenient to call directly, and similarly named the helper for clear. This means that get/clear should never be called on an active FinalReference. That's already never done, and would have problems if it were. Testing: mach5 tier1-6 Local (linux-x64) tier1 using Shenandoah. New TestReferenceClearDuringMarking fails for G1 without these changes. New TestReferenceClearDuringReferenceProcessing fails for ZGC without these changes. 
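
The first issue above can be pictured with a toy model of a snapshot-at-the-beginning (SATB) pre-write barrier. This is only a sketch under simplifying assumptions: `ToyGc`, `store_with_barrier` and `clear_without_keepalive` are made-up names, not HotSpot code, and the barrier-free function merely stands in for the idea of a dedicated clearing path.

```cpp
#include <vector>

// Hypothetical stand-in for a heap object.
struct Obj {
  int id;
};

// Toy collector state: while marking is active, overwritten references are logged.
struct ToyGc {
  bool marking_active = false;
  std::vector<Obj*> satb_queue;  // old values that must still be treated as live
};

// Ordinary reference store guarded by a SATB pre-write barrier:
// the previous value of the field is recorded before it is overwritten.
void store_with_barrier(ToyGc& gc, Obj*& field, Obj* new_value) {
  if (gc.marking_active && field != nullptr) {
    gc.satb_queue.push_back(field);  // previous value is kept alive for this cycle
  }
  field = new_value;
}

// Barrier-free clear (assumption: this models a special-purpose helper
// that nulls the referent without logging the old value).
void clear_without_keepalive(Obj*& field) {
  field = nullptr;
}
```

With marking active, `store_with_barrier(gc, referent, nullptr)` leaves the old referent in `satb_queue`, so the toy collector would treat it as live for the rest of the cycle; `clear_without_keepalive(referent)` leaves the queue untouched.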
------------- Commit messages: - add private native Reference::clear0 - test clear during marking - test clear during reference processing Changes: https://git.openjdk.java.net/jdk/pull/1376/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1376&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256517 Stats: 304 lines in 13 files changed: 279 ins; 16 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/1376.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1376/head:pull/1376 PR: https://git.openjdk.java.net/jdk/pull/1376 From redestad at openjdk.java.net Mon Nov 23 10:20:59 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 23 Nov 2020 10:20:59 GMT Subject: RFR: 8256741: Reduce footprint of compiler interface data structures In-Reply-To: References: <_ueFk7sJ0AZFFkyoMgEl-st9DaN-8PRJWPnBHfXpRn0=.cd50148b-3ed0-437a-8508-adbe2a104a11@github.com> Message-ID: On Fri, 20 Nov 2020 22:51:23 GMT, Chris Plummer wrote: >> A few data structure in the ci allocate unconditionally created GrowableArrays out-of-line, have fields that are newer updated/read, or are unnecessarily cached. By cleaning this up we can slightly reduce memory used for JIT compilations while slightly speeding them up. > > Marked as reviewed by cjplummer (Reviewer). 
@plummercj @vnkozlov - thank you for reviewing ------------- PR: https://git.openjdk.java.net/jdk/pull/1346 From redestad at openjdk.java.net Mon Nov 23 10:21:00 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 23 Nov 2020 10:21:00 GMT Subject: Integrated: 8256741: Reduce footprint of compiler interface data structures In-Reply-To: <_ueFk7sJ0AZFFkyoMgEl-st9DaN-8PRJWPnBHfXpRn0=.cd50148b-3ed0-437a-8508-adbe2a104a11@github.com> References: <_ueFk7sJ0AZFFkyoMgEl-st9DaN-8PRJWPnBHfXpRn0=.cd50148b-3ed0-437a-8508-adbe2a104a11@github.com> Message-ID: On Fri, 20 Nov 2020 12:19:48 GMT, Claes Redestad wrote: > A few data structure in the ci allocate unconditionally created GrowableArrays out-of-line, have fields that are newer updated/read, or are unnecessarily cached. By cleaning this up we can slightly reduce memory used for JIT compilations while slightly speeding them up. This pull request has now been integrated. Changeset: c0689d25 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/c0689d25 Stats: 226 lines in 9 files changed: 28 ins; 88 del; 110 mod 8256741: Reduce footprint of compiler interface data structures Reviewed-by: cjplummer, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1346 From rrich at openjdk.java.net Mon Nov 23 10:26:02 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Mon, 23 Nov 2020 10:26:02 GMT Subject: RFR: JDK-8256725: Metaspace: better blocktree and binlist asserts [v3] In-Reply-To: <6AJXvjkfpqDcqQgn9wUjqAUSQe5l1LLqQRxcgKbx1rw=.af8022d5-2283-47ed-b270-1f3a6c3fd8e0@github.com> References: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> <6AJXvjkfpqDcqQgn9wUjqAUSQe5l1LLqQRxcgKbx1rw=.af8022d5-2283-47ed-b270-1f3a6c3fd8e0@github.com> Message-ID: On Sat, 21 Nov 2020 09:31:30 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I have reviews please for this small change. 
>>
>> To analyze JDK-8256572, I'd like to see more information in asserts for binlist and blocktree.
>>
>> This patch:
>>
>> - beefs up assertion messages when verifying binlist and blocktree
>> - adds a canary to the blocktree node to detect overwriters
>> - improves the blocktree printing
>> - adds a gtest death test to test the overwrite detection.
>>
>> Thanks!
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
> Quick node checks in blocktree on insert/removal

Just a few minor points. In general looks good.

src/hotspot/share/memory/metaspace/binList.hpp line 161:

> 159: assert(b->_word_size >= word_size &&
> 160: b->_word_size == real_word_size,
> 161: "bad block size in list[%u] (" BLOCK_FORMAT ")", index, BLOCK_FORMAT_ARGS(b));

index is signed but you use %u in the format.

src/hotspot/share/memory/metaspace/binList.hpp line 189:

> 187: for (Block* b = _blocks[i]; b != NULL; b = b->_next, pos++) {
> 188: assert(b->_word_size == s,
> 189: "bad block size in list[%u] at pos %d (" BLOCK_FORMAT ")",

index is signed but you use %u in the format.

src/hotspot/share/memory/metaspace/blockTree.cpp line 112:

> 110: // Assume a (ridiculously large) edge limit to catch cases
> 111: // of badly degenerated or circular trees.
> 112: tree_assert(info.depth < 10000, "too deep (%u)", info.depth);

info.depth is signed but in the format you use %u.

src/hotspot/share/memory/metaspace/blockTree.cpp line 56:

> 54: p2i((n) ? (n)->_right : NULL), \
> 55: p2i((n) ? (n)->_next : NULL), \
> 56: ((n) ? (n)->_word_size : 0)

The null check is, with one exception, redundant, since n is dereferenced before it. Also, the check does not help if n is a random invalid address. I think it would be better to print the node only if the canary check was successful and otherwise try a safe hexdump.
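
The last suggestion, printing node fields only when the canary is intact and falling back to a raw dump otherwise, could look roughly like the sketch below. It is a standalone illustration, not the code in blockTree.cpp: the `Node` layout, `kNodeCanary` and both function names are assumptions made for the example, and it presumes the memory is at least readable (a real VM would need a fault-tolerant read for wild pointers).

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Arbitrary marker value, chosen only for this sketch.
constexpr uint64_t kNodeCanary = 0x4E4F44454E4F4445ULL;

// Hypothetical stand-in for a tree node with a leading canary.
struct Node {
  uint64_t canary;
  Node* left;
  Node* right;
  size_t word_size;
};

// Reads the canary bytes via memcpy, without touching any other member.
// Assumes the memory at p is readable.
bool canary_intact(const void* p) {
  uint64_t c;
  std::memcpy(&c, p, sizeof(c));
  return c == kNodeCanary;
}

// Prints the fields if the canary is intact, otherwise only a hex dump.
std::string describe_node(const void* p) {
  assert(p != nullptr);
  char buf[128];
  if (canary_intact(p)) {
    const Node* n = static_cast<const Node*>(p);
    // Format specifiers match the argument types: PRIxPTR for uintptr_t, %zu for size_t.
    std::snprintf(buf, sizeof(buf),
                  "node: left=0x%" PRIxPTR " right=0x%" PRIxPTR " word_size=%zu",
                  reinterpret_cast<uintptr_t>(n->left),
                  reinterpret_cast<uintptr_t>(n->right),
                  n->word_size);
    return buf;
  }
  // Canary damaged: do not interpret the fields, dump the raw bytes instead.
  std::string out = "corrupt node, raw bytes:";
  const unsigned char* bytes = static_cast<const unsigned char*>(p);
  for (size_t i = 0; i < sizeof(Node); i++) {
    std::snprintf(buf, sizeof(buf), " %02x", static_cast<unsigned>(bytes[i]));
    out += buf;
  }
  return out;
}
```

The design point is that a damaged canary means the remaining fields cannot be trusted, so the fallback path never follows `left` or `right`.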
------------- PR: https://git.openjdk.java.net/jdk/pull/1339 From lkorinth at openjdk.java.net Mon Nov 23 10:42:01 2020 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Mon, 23 Nov 2020 10:42:01 GMT Subject: RFR: JDK-8256725: Metaspace: better blocktree and binlist asserts [v3] In-Reply-To: <6AJXvjkfpqDcqQgn9wUjqAUSQe5l1LLqQRxcgKbx1rw=.af8022d5-2283-47ed-b270-1f3a6c3fd8e0@github.com> References: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> <6AJXvjkfpqDcqQgn9wUjqAUSQe5l1LLqQRxcgKbx1rw=.af8022d5-2283-47ed-b270-1f3a6c3fd8e0@github.com> Message-ID: On Sat, 21 Nov 2020 09:31:30 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I have reviews please for this small change. >> >> To analyze JDK-8256572, I'd like to see more information in asserts for binlist and blocktree. >> >> This patch: >> >> - beefs up assertion messages when verifying binlist and blocktree >> - adds a canary to the blocktree node to detect overwriters >> - improves the blocktree printing >> - adds a gtest death test to test the overwrite detection. >> >> Thanks! > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Quick node checks in blocktree on insert/removal Looks good. I like the canary in Node. I have three minor comments you should take a look at. src/hotspot/share/memory/metaspace/binList.hpp line 162: > 160: b->_word_size == real_word_size, > 161: "bad block size in list[%u] (" BLOCK_FORMAT ")", index, BLOCK_FORMAT_ARGS(b)); > 162: MetaWord* const p = (MetaWord*)b; I suggest not creating a `p` variable and just casting the `b` variable at return. After your change, they have the same value and just differs in type. 
src/hotspot/share/memory/metaspace/binList.hpp line 189: > 187: for (Block* b = _blocks[i]; b != NULL; b = b->_next, pos++) { > 188: assert(b->_word_size == s, > 189: "bad block size in list[%u] at pos %d (" BLOCK_FORMAT ")", Both `i` and `pos` are signed integers, why not print them as such. I am a little surprised that the compiler did not catch it, so maybe I am missing something. src/hotspot/share/memory/metaspace/blockTree.cpp line 126: > 124: // check size and ordering > 125: tree_assert_invalid_node(n->_word_size >= MinWordSize && > 126: n->_word_size <= chunklevel::MAX_CHUNK_WORD_SIZE, n); I prefer one assert per line (that is, not using `&&` in asserts). In this case it does not matter as much, as we can deduce which sub-expression failed from the tree printout, but I think it is a better style, because in _general_ you do not want such an assert to fail. ------------- Changes requested by lkorinth (Committer). PR: https://git.openjdk.java.net/jdk/pull/1339 From mcimadamore at openjdk.java.net Mon Nov 23 11:04:09 2020 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 23 Nov 2020 11:04:09 GMT Subject: Integrated: 8254231: Implementation of Foreign Linker API (Incubator) In-Reply-To: References: Message-ID: On Tue, 13 Oct 2020 13:08:14 GMT, Maurizio Cimadamore wrote: > This patch contains the changes associated with the first incubation round of the foreign linker access API incubation > (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]). > > The main goal of this API is to provide a way to call native functions from Java code without the need of intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients. 
>
> Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and make the life of reviewers as simple as possible.
>
> A big thanks to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thanks to Paul Sandoz, who provided many insights (often by trying the bits first hand).
>
> Thanks
> Maurizio
>
> Webrev:
> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev
>
> Javadoc:
>
> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html
>
> Specdiff (relative to [3]):
>
> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html
>
> CSR:
>
> https://bugs.openjdk.java.net/browse/JDK-8254232
>
>
>
> ### API Changes
>
> The API changes are actually rather slim:
>
> * `LibraryLookup`
>   * This class allows clients to look up symbols in native libraries; the interface is fairly simple; you can load a library by name, or absolute path, and then look up symbols on that library.
> * `FunctionDescriptor`
>   * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function.
> * `CLinker`
>   * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be used to call the target native symbol. 
The second takes an existing method handle, and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls.
>   * This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. `C_LONG` and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to *infer* how Java arguments should be shuffled for the native call to take place.
>   * Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back.
> * `NativeScope`
>   * This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate *try-with-resources* constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into `malloc` calls.
> * `MemorySegment`
>   * Only one method added here - namely `handoff(NativeScope)` which allows a segment to be transferred onto an existing native scope.
>
> ### Safety
>
> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for other restricted methods in the foreign memory API).
>
> ### Implementation changes
>
> The Java changes associated with `LibraryLookup` are relatively straightforward; the only interesting thing to note here is that library loading does _not_ depend on class loaders, so `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g. the same library cannot be loaded by different classloaders).
>
> As for `NativeScope` the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing some kind of allocation service which shares the same underlying memory segment(s), and turns an allocation request into a segment slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`.
>
> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5].
>
> The main idea behind foreign linker is to infer, given a Java method type (expressed as a `MethodType` instance) and the description of the signature of a native function (expressed as a `FunctionDescriptor` instance) a _recipe_ that can be used to turn a Java call into the corresponding native call targeting the requested native function.
>
> This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported platform, do exactly that kind of inference.
>
> For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value, or an integral value; this knowledge is required because floating-point values are passed in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification accordingly.
>
> A native call is decomposed into a sequence of basic, primitive operations, called `Binding` (see the great javadoc on the `Binding.java` class for more info). There are many such bindings - for instance the `Move` binding is used to move a value into a specific machine register/stack slot. So, the main job of the various `CallArranger` classes is to determine, given a Java `MethodType` and `FunctionDescriptor` what is the set of bindings associated with the downcall/upcall.
>
> At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class effectively generates a `MethodHandle` which follows the steps described by the various bindings obtained by `CallArranger`. There are actually various strategies to interpret these bindings - listed below:
>
> * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings, the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them into their expected registers/stack slots (note that each position in the buffer corresponds to a different register). 
This is the most general invocation mode, the most "customizable" one, but also the slowest - since for every call there is some extra allocation which takes place.
>
> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except `Move` ones).
>
> * intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a `static`, `final` field), then the VM can generate specialized assembly code which interprets the `Move` binding without the need for an intermediate buffer. This gives us back performance that is on par with JNI.
>
> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized).
>
> Again, for further reading on the internals of the foreign linker support, please refer to [5].
>
> #### Test changes
>
> Many new tests have been added to validate the foreign linker support; we have high level tests (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to stress every corner of the ABI implementation. 
There are also some great tests (see the `callarranger` folder) which test the various `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on.
>
> Some additional microbenchmarks have been added to compare the performance of downcall/upcall with JNI.
>
> [1] - https://openjdk.java.net/jeps/389
> [2] - https://openjdk.java.net/jeps/393
> [3] - https://git.openjdk.java.net/jdk/pull/548
> [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md
> [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html

This pull request has now been integrated.

Changeset: 0fb31dbf
Author: Maurizio Cimadamore
URL: https://git.openjdk.java.net/jdk/commit/0fb31dbf
Stats: 67469 lines in 212 files changed: 67290 ins; 79 del; 100 mod

8254231: Implementation of Foreign Linker API (Incubator)

Reviewed-by: coleenp, ihse, dholmes, vlivanov

-------------

PR: https://git.openjdk.java.net/jdk/pull/634

From phedlin at openjdk.java.net Mon Nov 23 11:58:04 2020
From: phedlin at openjdk.java.net (Patric Hedlin)
Date: Mon, 23 Nov 2020 11:58:04 GMT
Subject: RFR: 8255479: [aarch64] assert(src->section_index_of(target) == CodeBuffer::SECT_NONE) failed: sanity
Message-ID:

The "byte map base" (CardTable) might be materialised as an external address but as such the current relocation support expects an address _external_ to the associated CodeBuffer. This might not be the case since "byte map base" /is not/need not be/ a proper address. 
Instead, the "byte map base" may be materialised as a constant, in order to avoid relocation (issues). (If PC-relative materialisation should be used, a new RFE is suggested.) Changing assert in "fix_relocation_after_move" to cover both target == NULL and target != NULL, for both source and destination code buffer. ------------- Commit messages: - 8255479: [aarch64] assert(src->section_index_of(target) == CodeBuffer::SECT_NONE) failed: sanity Changes: https://git.openjdk.java.net/jdk/pull/1382/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1382&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255479 Stats: 23 lines in 3 files changed: 1 ins; 11 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/1382.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1382/head:pull/1382 PR: https://git.openjdk.java.net/jdk/pull/1382 From phedlin at openjdk.java.net Mon Nov 23 12:18:55 2020 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Mon, 23 Nov 2020 12:18:55 GMT Subject: RFR: 8255479: [aarch64] assert(src->section_index_of(target) == CodeBuffer::SECT_NONE) failed: sanity In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 11:53:04 GMT, Patric Hedlin wrote: > The "byte map base" (CardTable) might be materialised as an external address but as such the current relocation support expects an address _external_ to the associated CodeBuffer. This might not be the case since "byte map base" /is not/need not be/ a proper address. Instead, the "byte map base" may be materialised as a constant, in order to avoid relocation (issues). > > (If PC-relative materialisation should be used, a new RFE is suggested.) > > Changing assert in "fix_relocation_after_move" to cover both target == NULL and target != NULL, for both source and destination code buffer. Testing tier1-3. 
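
To see why a "byte map base" need not be a proper address, it helps to recall how a biased card-table base is typically computed: the base is pre-adjusted by the heap start so that a card is found with a single shift and add. The sketch below uses made-up numbers and names (`kCardShift`, `biased_base`, `card_address`) and plain integer arithmetic; it illustrates the general technique and is not taken from the HotSpot sources.

```cpp
#include <cstdint>

// Assumed for illustration: 512-byte cards, i.e. a shift of 9.
constexpr unsigned kCardShift = 9;

// Biased base: chosen so that base + (addr >> shift) lands on the card for addr.
// Unsigned arithmetic wraps modulo 2^64, which is exactly what makes this work.
constexpr uint64_t biased_base(uint64_t byte_map_start, uint64_t heap_start) {
  return byte_map_start - (heap_start >> kCardShift);
}

// Address of the card covering addr, given the biased base.
constexpr uint64_t card_address(uint64_t base, uint64_t addr) {
  return base + (addr >> kCardShift);
}

// Made-up layout: byte map at 0x10000000, heap starting at 0x8000000000.
constexpr uint64_t kBase = biased_base(0x10000000ULL, 0x8000000000ULL);

static_assert(kBase == 0xFFFFFFFFD0000000ULL,
              "the biased base wraps around and names no real allocation");
static_assert(card_address(kBase, 0x8000000000ULL) == 0x10000000ULL,
              "the first heap address maps to the first card");
```

Because the biased value is just a number that happens to make the indexing work, it can fall anywhere, in principle even inside the range occupied by a code buffer, which is presumably why treating it as an external address for relocation purposes is fragile and a plain constant is safer.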
------------- PR: https://git.openjdk.java.net/jdk/pull/1382 From pliden at openjdk.java.net Mon Nov 23 12:52:56 2020 From: pliden at openjdk.java.net (Per Liden) Date: Mon, 23 Nov 2020 12:52:56 GMT Subject: RFR: 8256517: (ref) Reference.clear during reference processing may lose notification In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 01:43:39 GMT, Kim Barrett wrote: > Please review this change to Reference.clear() to address several issues. > > (JDK-8240696) For GCs using a SATB barrier, simply assigning the referent > field to null may extend the lifetime of the referent value. > > (JDK-8240696) For GCs with concurrent reference processing, clearing the > referent field during reference processing may discard the expected > notification. > > Both of these are addressed by introducing a private native helper function > for clearing the referent, rather than using an ordinary in-Java field > assignment. Tests have been added for both of these issues. This required > adding a new breakpoint in reference processing for ZGC. > > Of course, finalization adds some complexity to the problem. We deal with > that by having FinalReference override clear. The implementation is > provided by a new package-private method in Reference. (There are a number > of alternatives, all of them clumsy; finalization is annoying that way.) > > While dealing with FinalReference clearing it was noted that the recent > JDK-8256106 and JDK-8256370 have some problems. FinalizerHistogram was not > updated to call the new Reference.getInactive(), instead still calling get() > on FinalReferences, with the JDK-8256106 problems. Fixing that showed the > assertion for inactive FinalReference added by JDK-8256370 used the wrong > test. 
> > Rather than tracking down and changing all get() and clear() calls on final > references and changing them to use getInactive and a new similar clear > function, I've changed FinalReference to override get and clear, which call > the helper functions in Reference. I've also renamed getInactive to be more > explanatory and less convenient to call directly, and similarly named the > helper for clear. This means that get/clear should never be called on an > active FinalReference. That's already never done, and would have problems > if it were. > > Testing: > mach5 tier1-6 > Local (linux-x64) tier1 using Shenandoah. > New TestReferenceClearDuringMarking fails for G1 without these changes. > New TestReferenceClearDuringReferenceProcessing fails for ZGC without these changes. Looks good. Just want to request that you also remove the following comment in zReferenceProcessor.cpp, as it's no longer true. --- a/src/hotspot/share/gc/z/zReferenceProcessor.cpp +++ b/src/hotspot/share/gc/z/zReferenceProcessor.cpp @@ -184,12 +184,6 @@ bool ZReferenceProcessor::should_discover(oop reference, ReferenceType type) con } bool ZReferenceProcessor::should_drop(oop reference, ReferenceType type) const { - // This check is racing with a call to Reference.clear() from the application. - // If the application clears the reference after this check it will still end - // up on the pending list, and there's nothing we can do about that without - // changing the Reference.clear() API. This check is also racing with a call - // to Reference.enqueue() from the application, which is unproblematic, since - // the application wants the reference to be enqueued anyway. const oop referent = reference_referent(reference); if (referent == NULL) { // Reference has been cleared, by a call to Reference.enqueue() ------------- Changes requested by pliden (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1376 From shade at openjdk.java.net Mon Nov 23 13:19:11 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 23 Nov 2020 13:19:11 GMT Subject: RFR: 8256857: ARM32 builds broken after JDK-8254231 Message-ID: Foreign linker broke ARM32 builds. This change stubs out the new entry points, without implementing the actual support yet. Additional testing: - [x] Linux arm fastdebug cross-compilation ------------- Commit messages: - Add debug.hpp include as well - 8256857: ARM32 builds broken after JDK-8254231 - 8256857: ARM32 builds broken after JDK-8254231 Changes: https://git.openjdk.java.net/jdk/pull/1383/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1383&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256857 Stats: 149 lines in 9 files changed: 149 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1383.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1383/head:pull/1383 PR: https://git.openjdk.java.net/jdk/pull/1383 From aph at openjdk.java.net Mon Nov 23 13:47:59 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 23 Nov 2020 13:47:59 GMT Subject: RFR: 8255479: [aarch64] assert(src->section_index_of(target) == CodeBuffer::SECT_NONE) failed: sanity In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 11:53:04 GMT, Patric Hedlin wrote: > The "byte map base" (CardTable) might be materialised as an external address but as such the current relocation support expects an address _external_ to the associated CodeBuffer. This might not be the case since "byte map base" /is not/need not be/ a proper address. Instead, the "byte map base" may be materialised as a constant, in order to avoid relocation (issues). > > (If PC-relative materialisation should be used, a new RFE is suggested.) > > Changing assert in "fix_relocation_after_move" to cover both target == NULL and target != NULL, for both source and destination code buffer. 
Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1382 From stuefe at openjdk.java.net Mon Nov 23 14:17:11 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 23 Nov 2020 14:17:11 GMT Subject: RFR: JDK-8256725: Metaspace: better blocktree and binlist asserts [v4] In-Reply-To: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> References: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> Message-ID: > Hi, > > may I have reviews please for this small change. > > To analyze JDK-8256572, I'd like to see more information in asserts for binlist and blocktree. > > This patch: > > - beefs up assertion messages when verifying binlist and blocktree > - adds a canary to the blocktree node to detect overwriters > - improves the blocktree printing > - adds a gtest death test to test the overwrite detection. > > Thanks! Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Review feedback Richard+Leo ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1339/files - new: https://git.openjdk.java.net/jdk/pull/1339/files/4e85fe65..386550bb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1339&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1339&range=02-03 Stats: 49 lines in 3 files changed: 27 ins; 6 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/1339.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1339/head:pull/1339 PR: https://git.openjdk.java.net/jdk/pull/1339 From stuefe at openjdk.java.net Mon Nov 23 14:20:59 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 23 Nov 2020 14:20:59 GMT Subject: RFR: JDK-8256725: Metaspace: better blocktree and binlist asserts [v3] In-Reply-To: References: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> 
<6AJXvjkfpqDcqQgn9wUjqAUSQe5l1LLqQRxcgKbx1rw=.af8022d5-2283-47ed-b270-1f3a6c3fd8e0@github.com> Message-ID: On Mon, 23 Nov 2020 10:39:06 GMT, Leo Korinth wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Quick node checks in blocktree on insert/removal > > Looks good. I like the canary in Node. I have three minor comments you should take a look at. Richard, Leo, thanks for reviewing! I fixed your remarks. Changes in the last commit: - %d -> %u - split multiline assert into two asserts - simplified NODE_FORMAT_ARGS - When verifying a tree, I have now special handling for unreadable node pointers and node pointers whose canaries are invalid. In the former case I assert right away, in the latter I print the node as hex dump (I considered doing further analysis like "is in metaspace" which would be trivial but this class is earmarked as a potential future generic class so I did not want to add deps to metaspace). - When an error is detected, the whole tree is printed; now tree printing tippytoes around invalid pointers. ------------- PR: https://git.openjdk.java.net/jdk/pull/1339 From lkorinth at openjdk.java.net Mon Nov 23 14:36:00 2020 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Mon, 23 Nov 2020 14:36:00 GMT Subject: RFR: JDK-8256725: Metaspace: better blocktree and binlist asserts [v4] In-Reply-To: References: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> Message-ID: On Mon, 23 Nov 2020 14:17:11 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I have reviews please for this small change. >> >> To analyze JDK-8256572, I'd like to see more information in asserts for binlist and blocktree. >> >> This patch: >> >> - beefs up assertion messages when verifying binlist and blocktree >> - adds a canary to the blocktree node to detect overwriters >> - improves the blocktree printing >> - adds a gtest death test to test the overwrite detection. 
>> >> Thanks! > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Review feedback Richard+Leo Looks good to me. ------------- Marked as reviewed by lkorinth (Committer). PR: https://git.openjdk.java.net/jdk/pull/1339 From jvernee at openjdk.java.net Mon Nov 23 14:53:57 2020 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Mon, 23 Nov 2020 14:53:57 GMT Subject: RFR: 8256857: ARM32 builds broken after JDK-8254231 In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 13:14:06 GMT, Aleksey Shipilev wrote: > Foreign linker broke ARM32 builds. This change stubs out the new entry points, without implementing the actual support yet. > > Additional testing: > - [x] Linux arm fastdebug cross-compilation LGTM. Though, note that the style guide asks to prefer `nullptr` to `NULL`: https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#nullptr nullptr Prefer nullptr (n2431) to NULL. Don't use (constexpr or literal) 0 for pointers. For historical reasons there are widespread uses of both NULL and of integer 0 as a pointer value. ------------- Marked as reviewed by jvernee (Committer). PR: https://git.openjdk.java.net/jdk/pull/1383 From rkennke at openjdk.java.net Mon Nov 23 15:03:57 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 23 Nov 2020 15:03:57 GMT Subject: RFR: 8256517: (ref) Reference.clear during reference processing may lose notification In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 01:43:39 GMT, Kim Barrett wrote: > Please review this change to Reference.clear() to address several issues. > > (JDK-8240696) For GCs using a SATB barrier, simply assigning the referent > field to null may extend the lifetime of the referent value. > > (JDK-8240696) For GCs with concurrent reference processing, clearing the > referent field during reference processing may discard the expected > notification. 
> > Both of these are addressed by introducing a private native helper function > for clearing the referent, rather than using an ordinary in-Java field > assignment. Tests have been added for both of these issues. This required > adding a new breakpoint in reference processing for ZGC. > > Of course, finalization adds some complexity to the problem. We deal with > that by having FinalReference override clear. The implementation is > provided by a new package-private method in Reference. (There are a number > of alternatives, all of them clumsy; finalization is annoying that way.) > > While dealing with FinalReference clearing it was noted that the recent > JDK-8256106 and JDK-8256370 have some problems. FinalizerHistogram was not > updated to call the new Reference.getInactive(), instead still calling get() > on FinalReferences, with the JDK-8256106 problems. Fixing that showed the > assertion for inactive FinalReference added by JDK-8256370 used the wrong > test. > > Rather than tracking down and changing all get() and clear() calls on final > references and changing them to use getInactive and a new similar clear > function, I've changed FinalReference to override get and clear, which call > the helper functions in Reference. I've also renamed getInactive to be more > explanatory and less convenient to call directly, and similarly named the > helper for clear. This means that get/clear should never be called on an > active FinalReference. That's already never done, and would have problems > if it were. > > Testing: > mach5 tier1-6 > Local (linux-x64) tier1 using Shenandoah. > New TestReferenceClearDuringMarking fails for G1 without these changes. > New TestReferenceClearDuringReferenceProcessing fails for ZGC without these changes. Looks good, except one small nit in one of the test configs. 
test/hotspot/jtreg/gc/TestReferenceClearDuringReferenceProcessing.java line 28: > 26: /* @test > 27: * @bug 8256517 > 28: * @requires vm.gc.Z Please add | vm.gc.Shenandoah here. ------------- Changes requested by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1376 From shade at openjdk.java.net Mon Nov 23 15:09:21 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 23 Nov 2020 15:09:21 GMT Subject: RFR: 8256857: ARM32 builds broken after JDK-8254231 [v2] In-Reply-To: References: Message-ID: > Foreign linker broke ARM32 builds. This change stubs out the new entry points, without implementing the actual support yet. > > Additional testing: > - [x] Linux arm fastdebug cross-compilation Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Use nullptr instead ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1383/files - new: https://git.openjdk.java.net/jdk/pull/1383/files/d20a6624..01918d87 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1383&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1383&range=00-01 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1383.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1383/head:pull/1383 PR: https://git.openjdk.java.net/jdk/pull/1383 From shade at openjdk.java.net Mon Nov 23 15:09:21 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 23 Nov 2020 15:09:21 GMT Subject: RFR: 8256857: ARM32 builds broken after JDK-8254231 [v2] In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 14:51:03 GMT, Jorn Vernee wrote: > Though, note that the style guide asks to prefer `nullptr` to `NULL`: https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#nullptr Fixed. Please push your Zero integration first, and then I rebase this one? 
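For readers following the `nullptr` versus `NULL` remark quoted above, here is a minimal sketch of one commonly cited reason for the preference. It is not taken from the thread or from the ARM32 patch: the `describe` overloads and the `demo` function are hypothetical names invented for illustration only.

```cpp
#include <cassert>
#include <string>

// Hypothetical overload set, invented for this illustration only.
std::string describe(int)   { return "integer"; }
std::string describe(void*) { return "pointer"; }

// Shows which overload each spelling of "null" selects.
void demo() {
  // A literal 0 is an int first, so it picks the integer overload.
  assert(describe(0) == "integer");
  // nullptr has type std::nullptr_t, which converts to pointer types
  // but not to int, so it picks the pointer overload.
  assert(describe(nullptr) == "pointer");
  // describe(NULL) is deliberately not called: depending on how the
  // platform defines NULL, the call is either ambiguous or selects
  // the integer overload.
}
```

Under these assumptions, `nullptr` is the only spelling that reliably reaches the pointer overload, which is consistent with the style guide advice quoted in the review.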
------------- PR: https://git.openjdk.java.net/jdk/pull/1383 From rrich at openjdk.java.net Mon Nov 23 15:24:02 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Mon, 23 Nov 2020 15:24:02 GMT Subject: RFR: JDK-8256725: Metaspace: better blocktree and binlist asserts [v4] In-Reply-To: References: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> Message-ID: On Mon, 23 Nov 2020 14:17:11 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I have reviews please for this small change. >> >> To analyze JDK-8256572, I'd like to see more information in asserts for binlist and blocktree. >> >> This patch: >> >> - beefs up assertion messages when verifying binlist and blocktree >> - adds a canary to the blocktree node to detect overwriters >> - improves the blocktree printing >> - adds a gtest death test to test the overwrite detection. >> >> Thanks! > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Review feedback Richard+Leo Looks good. ------------- Marked as reviewed by rrich (Committer). PR: https://git.openjdk.java.net/jdk/pull/1339 From rrich at openjdk.java.net Mon Nov 23 15:30:00 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Mon, 23 Nov 2020 15:30:00 GMT Subject: RFR: 8256754: Deoptimization::revoke_for_object_deoptimization: stack processing start call is redundant Message-ID: This is a XS clean-up of Deoptimization::revoke_for_object_deoptimization() which removes the StackWatermarkSet::start_processing() call. This is correct because all paths leading to revoke_for_object_deoptimization() are equipped with a KeepStackGCProcessedMark. 
Call Tree: StackWatermarkSet::start_processing(JavaThread *, enum StackWatermarkKind) : void Deoptimization::revoke_for_object_deoptimization(JavaThread *, frame, RegisterMap *, JavaThread *) : void Deoptimization::deoptimize_objects_internal(JavaThread *, GrowableArray *, bool &) : bool EscapeBarrier::deoptimize_objects_internal(JavaThread *, intptr_t *) : bool EscapeBarrier::deoptimize_objects_all_threads() : bool // has KeepStackGCProcessedMark EscapeBarrier::deoptimize_objects(intptr_t *) : bool EscapeBarrier::deoptimize_objects(int, int) : bool // has KeepStackGCProcessedMark Testing: hotspot_serviceability, jdk_svc, jdk_jdi, vmTestbase_nsk_jdi, vmTestbase_nsk_jvmti, vmTestbase_nsk_jdwp with -XX:+UseZGC. ------------- Commit messages: - 8256754: Deoptimization::revoke_for_object_deoptimization: stack processing start call is redundant Changes: https://git.openjdk.java.net/jdk/pull/1381/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1381&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256754 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1381.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1381/head:pull/1381 PR: https://git.openjdk.java.net/jdk/pull/1381 From ihse at openjdk.java.net Mon Nov 23 15:59:07 2020 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 23 Nov 2020 15:59:07 GMT Subject: RFR: 8253757: Add LLVM-based backend for hsdis In-Reply-To: <9oXnHULCd76_J69CKMVVZl3FfDte1pnt38y06LVV4Sg=.26a4ab2c-5ff7-4e2f-9428-0d8cd931d243@github.com> References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> <8Eqswd7tsVaGEXHdKDncXqKpW2tBsSeuY0PV6aTB9_c=.a6cf4957-9d31-4e89-bf44-e7b7852205d5@github.com> <2S00ucaPGiAQLeLOejt1kfXeYEc7ctEPeRCIcq1N0N8=.dbf1ea7a-8de4-48a5-8759-03495e3e3c08@github.com> <9oXnHULCd76_J69CKMVVZl3FfDte1pnt38y06LVV4Sg=.26a4ab2c-5ff7-4e2f-9428-0d8cd931d243@github.com> Message-ID: On Mon, 26 Oct 2020 
11:41:16 GMT, Magnus Ihse Bursie wrote: >> Some notes (perhaps most to myself) about how this ties into the existing hsdis implementation, and with JDK-8188073 (Capstone porting). >> >> When printing disassembly from hotspot, the current solution tries to locate and load the hsdis library, which prints disassembly using bfd. The reason for using the separate library approach is, as far as I can understand, perhaps a mix of both incompatible licensing for bfd, and a wish to not burden the jvm library with additional bloat which is needed only for debugging. >> >> The Capstone approach, in the prototype patch presented by Jorn in the issue, is to create a new capstonedis library, and dispatch to it instead of hsdis. >> >> The approach used in this patch is to refactor the existing hsdis library into an abstract base class for hsdis backends, with two concrete implementations, one for bfd and one for llvm. >> >> Unfortunately, I think the resulting code in hsdis.cpp in this patch is hard to read and understand. > > I think a proper solution to both this and the Capstone implementation is to provide a proper framework for selecting the hsdis backend as a first step, and refactor the existing bfd implementation as the first such backend. After that, we can add llvm and capstone as alternative hsdis backend implementations. FWIW, I started working on a framework which would add support for selectable backends for hsdis. Unfortunately it was not as simple as I initially thought, so I had to put it on hold while directing my time to working on the winenv patch instead. I believe the proper way forward is to get the "selectable hsdis backend" framework in place, and then retrofit this patch to add LLVM support in that framework. If this means that this PR should be closed, or kept open until this is done, I don't know. 
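As a rough illustration of the "selectable hsdis backend" idea described above, the following sketch shows an abstract backend interface with two concrete implementations and a selection function. It is only a sketch under stated assumptions: `DisassemblerBackend`, `BfdBackend`, `LlvmBackend` and `make_backend` are hypothetical names, not the actual hsdis interface, and the `decode` bodies return a placeholder string instead of calling bfd or LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical backend interface; not the real hsdis API.
class DisassemblerBackend {
public:
  virtual ~DisassemblerBackend() = default;
  virtual std::string name() const = 0;
  // Placeholder: a real backend would print disassembly for the range.
  virtual std::string decode(const unsigned char* code,
                             std::size_t length) const = 0;
};

// Shared helper that stands in for actual disassembly output.
static std::string describe_range(const std::string& backend,
                                  const unsigned char* code,
                                  std::size_t length) {
  assert(code != nullptr || length == 0);
  return backend + ": " + std::to_string(length) + " bytes";
}

// Stand-in for the existing bfd-based implementation.
class BfdBackend : public DisassemblerBackend {
public:
  std::string name() const override { return "bfd"; }
  std::string decode(const unsigned char* code,
                     std::size_t length) const override {
    return describe_range(name(), code, length);
  }
};

// Stand-in for the LLVM-based implementation proposed in this PR.
class LlvmBackend : public DisassemblerBackend {
public:
  std::string name() const override { return "llvm"; }
  std::string decode(const unsigned char* code,
                     std::size_t length) const override {
    return describe_range(name(), code, length);
  }
};

// Selects a backend by name; returns a null pointer for unknown names.
std::unique_ptr<DisassemblerBackend> make_backend(const std::string& name) {
  if (name == "bfd")  return std::unique_ptr<DisassemblerBackend>(new BfdBackend());
  if (name == "llvm") return std::unique_ptr<DisassemblerBackend>(new LlvmBackend());
  return nullptr;
}
```

With this shape, a Capstone backend would presumably be one more subclass plus one more branch in `make_backend`, leaving callers unchanged. How the real framework would choose a backend (build-time option, runtime flag, or library lookup) is not settled in the thread.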
------------- PR: https://git.openjdk.java.net/jdk/pull/392 From mdoerr at openjdk.java.net Mon Nov 23 17:03:06 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 23 Nov 2020 17:03:06 GMT Subject: RFR: 8256843: [PPC64] runtime/logging/RedefineClasses.java fails with assert: registers not saved on stack Message-ID: Method handle logging is broken in fastdebug builds. Problem is that os::current_frame() doesn't return the right frame in fastdebug builds. ------------- Commit messages: - 8256843: [PPC64] runtime/logging/RedefineClasses.java fails with assert: registers not saved on stack Changes: https://git.openjdk.java.net/jdk/pull/1394/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1394&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256843 Stats: 13 lines in 3 files changed: 0 ins; 8 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/1394.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1394/head:pull/1394 PR: https://git.openjdk.java.net/jdk/pull/1394 From stuefe at openjdk.java.net Mon Nov 23 18:00:59 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 23 Nov 2020 18:00:59 GMT Subject: Withdrawn: JDK-8256725: Metaspace: better blocktree and binlist asserts In-Reply-To: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> References: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> Message-ID: On Fri, 20 Nov 2020 08:34:18 GMT, Thomas Stuefe wrote: > Hi, > > may I have reviews please for this small change. > > To analyze JDK-8256572, I'd like to see more information in asserts for binlist and blocktree. > > This patch: > > - beefs up assertion messages when verifying binlist and blocktree > - adds a canary to the blocktree node to detect overwriters > - improves the blocktree printing > - adds a gtest death test to test the overwrite detection. > > Thanks! 
This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/1339 From shade at openjdk.java.net Mon Nov 23 18:43:58 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 23 Nov 2020 18:43:58 GMT Subject: RFR: JDK-8256725: Metaspace: better blocktree and binlist asserts [v4] In-Reply-To: References: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> Message-ID: On Mon, 23 Nov 2020 14:17:11 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I have reviews please for this small change. >> >> To analyze JDK-8256572, I'd like to see more information in asserts for binlist and blocktree. >> >> This patch: >> >> - beefs up assertion messages when verifying binlist and blocktree >> - adds a canary to the blocktree node to detect overwriters >> - improves the blocktree printing >> - adds a gtest death test to test the overwrite detection. >> >> Thanks! > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Review feedback Richard+Leo I guess you miss a formal Reviewer ack. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1339 From sviswanathan at openjdk.java.net Mon Nov 23 18:51:57 2020 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Mon, 23 Nov 2020 18:51:57 GMT Subject: Integrated: 8256585: Remove in-place conversion vector operators from Vector API In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 03:26:20 GMT, Sandhya Viswanathan wrote: > Remove partially implemented in-place conversion vector operators from Vector API: > ofNarrowing, ofWidening, INPLACE_XXX This pull request has now been integrated. 
Changeset: 9de5d091 Author: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/9de5d091 Stats: 133 lines in 2 files changed: 0 ins; 130 del; 3 mod 8256585: Remove in-place conversion vector operators from Vector API Reviewed-by: psandoz ------------- PR: https://git.openjdk.java.net/jdk/pull/1305 From stuefe at openjdk.java.net Mon Nov 23 18:52:56 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 23 Nov 2020 18:52:56 GMT Subject: Integrated: JDK-8256725: Metaspace: better blocktree and binlist asserts In-Reply-To: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> References: <6DX2ZpqTMWMl__Tbi636rONPiWW8f8sJXZ66mXpehT8=.c1776be5-1a83-4eab-942e-d7f6646e4a24@github.com> Message-ID: On Fri, 20 Nov 2020 08:34:18 GMT, Thomas Stuefe wrote: > Hi, > > may I have reviews please for this small change. > > To analyze JDK-8256572, I'd like to see more information in asserts for binlist and blocktree. > > This patch: > > - beefs up assertion messages when verifying binlist and blocktree > - adds a canary to the blocktree node to detect overwriters > - improves the blocktree printing > - adds a gtest death test to test the overwrite detection. > > Thanks! This pull request has now been integrated. Changeset: fa75ad69 Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/fa75ad69 Stats: 175 lines in 5 files changed: 131 ins; 15 del; 29 mod 8256725: Metaspace: better blocktree and binlist asserts Reviewed-by: shade, rrich, lkorinth ------------- PR: https://git.openjdk.java.net/jdk/pull/1339 From pliden at openjdk.java.net Mon Nov 23 19:20:59 2020 From: pliden at openjdk.java.net (Per Liden) Date: Mon, 23 Nov 2020 19:20:59 GMT Subject: RFR: 8256517: (ref) Reference.clear during reference processing may lose notification In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 15:00:08 GMT, Roman Kennke wrote: >> Please review this change to Reference.clear() to address several issues. 
>> >> (JDK-8240696) For GCs using a SATB barrier, simply assigning the referent >> field to null may extend the lifetime of the referent value. >> >> (JDK-8240696) For GCs with concurrent reference processing, clearing the >> referent field during reference processing may discard the expected >> notification. >> >> Both of these are addressed by introducing a private native helper function >> for clearing the referent, rather than using an ordinary in-Java field >> assignment. Tests have been added for both of these issues. This required >> adding a new breakpoint in reference processing for ZGC. >> >> Of course, finalization adds some complexity to the problem. We deal with >> that by having FinalReference override clear. The implementation is >> provided by a new package-private method in Reference. (There are a number >> of alternatives, all of them clumsy; finalization is annoying that way.) >> >> While dealing with FinalReference clearing it was noted that the recent >> JDK-8256106 and JDK-8256370 have some problems. FinalizerHistogram was not >> updated to call the new Reference.getInactive(), instead still calling get() >> on FinalReferences, with the JDK-8256106 problems. Fixing that showed the >> assertion for inactive FinalReference added by JDK-8256370 used the wrong >> test. >> >> Rather than tracking down and changing all get() and clear() calls on final >> references and changing them to use getInactive and a new similar clear >> function, I've changed FinalReference to override get and clear, which call >> the helper functions in Reference. I've also renamed getInactive to be more >> explanatory and less convenient to call directly, and similarly named the >> helper for clear. This means that get/clear should never be called on an >> active FinalReference. That's already never done, and would have problems >> if it were. >> >> Testing: >> mach5 tier1-6 >> Local (linux-x64) tier1 using Shenandoah. 
>> New TestReferenceClearDuringMarking fails for G1 without these changes. >> New TestReferenceClearDuringReferenceProcessing fails for ZGC without these changes. > > test/hotspot/jtreg/gc/TestReferenceClearDuringReferenceProcessing.java line 28: > >> 26: /* @test >> 27: * @bug 8256517 >> 28: * @requires vm.gc.Z > > Please add | vm.gc.Shenandoah here. Note that for this test to be useful, the GC needs to support concurrent GC breakpoints, which Shenandoah doesn't do. ------------- PR: https://git.openjdk.java.net/jdk/pull/1376 From harold.seigel at oracle.com Mon Nov 23 20:27:21 2020 From: harold.seigel at oracle.com (Harold Seigel) Date: Mon, 23 Nov 2020 15:27:21 -0500 Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) In-Reply-To: <618aa897-18fd-fd70-1f0a-506e0e5a74d8@oracle.com> References: <618aa897-18fd-fd70-1f0a-506e0e5a74d8@oracle.com> Message-ID: Hi David, Thanks for looking at this. The intent was for method Class.permittedSubclasses() to be implemented similarly to Class.getNestMembers(). Are you suggesting that a security manager check be added to permittedSubclasses() similar to the security manager check in getNestMembers()? Thanks, Harold On 11/18/2020 12:31 AM, David Holmes wrote: > Hi Vincente, > > On 16/11/2020 11:36 pm, Vicente Romero wrote: >> Please review the code for the second iteration of sealed classes. In >> this iteration we are: >> >> - Enhancing narrowing reference conversion to allow for stricter >> checking of cast conversions with respect to sealed type hierarchies. >> - Also local classes are not considered when determining implicitly >> declared permitted direct subclasses of a sealed class or sealed >> interface > > The major change here seems to be that getPermittedSubclasses() now > returns actual Class objects instead of ClassDesc.
My recollection > from earlier discussions here was that the use of ClassDesc was very > deliberate as the permitted subclasses may not actually exist and > there may be security concerns with returning them! > > Cheers, > David > ----- > >> ------------- >> >> Commit messages: >> - 8246778: Compiler implementation for Sealed Classes (Second Preview) >> >> Changes: https://git.openjdk.java.net/jdk/pull/1227/files >> Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1227&range=00 >> Issue: https://bugs.openjdk.java.net/browse/JDK-8246778 >> Stats: 589 lines in 12 files changed: 508 ins; 18 del; 63 mod >> Patch: https://git.openjdk.java.net/jdk/pull/1227.diff >> Fetch: git fetch https://git.openjdk.java.net/jdk >> pull/1227/head:pull/1227 >> >> PR: https://git.openjdk.java.net/jdk/pull/1227 >> From rkennke at openjdk.java.net Mon Nov 23 20:39:01 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 23 Nov 2020 20:39:01 GMT Subject: RFR: 8256517: (ref) Reference.clear during reference processing may lose notification In-Reply-To: References: Message-ID: <4C7DyAgcXsSq3YEzbbwWeLaIbWwOyEriG8_4QrWNZ80=.4b75553f-7eaf-4d8b-9b47-007fc0609ba7@github.com> On Mon, 23 Nov 2020 01:43:39 GMT, Kim Barrett wrote: > Please review this change to Reference.clear() to address several issues. > > (JDK-8240696) For GCs using a SATB barrier, simply assigning the referent > field to null may extend the lifetime of the referent value. > > (JDK-8240696) For GCs with concurrent reference processing, clearing the > referent field during reference processing may discard the expected > notification. > > Both of these are addressed by introducing a private native helper function > for clearing the referent, rather than using an ordinary in-Java field > assignment. Tests have been added for both of these issues. This required > adding a new breakpoint in reference processing for ZGC. > > Of course, finalization adds some complexity to the problem.
We deal with > that by having FinalReference override clear. The implementation is > provided by a new package-private method in Reference. (There are a number > of alternatives, all of them clumsy; finalization is annoying that way.) > > While dealing with FinalReference clearing it was noted that the recent > JDK-8256106 and JDK-8256370 have some problems. FinalizerHistogram was not > updated to call the new Reference.getInactive(), instead still calling get() > on FinalReferences, with the JDK-8256106 problems. Fixing that showed the > assertion for inactive FinalReference added by JDK-8256370 used the wrong > test. > > Rather than tracking down and changing all get() and clear() calls on final > references and changing them to use getInactive and a new similar clear > function, I've changed FinalReference to override get and clear, which call > the helper functions in Reference. I've also renamed getInactive to be more > explanatory and less convenient to call directly, and similarly named the > helper for clear. This means that get/clear should never be called on an > active FinalReference. That's already never done, and would have problems > if it were. > > Testing: > mach5 tier1-6 > Local (linux-x64) tier1 using Shenandoah. > New TestReferenceClearDuringMarking fails for G1 without these changes. > New TestReferenceClearDuringReferenceProcessing fails for ZGC without these changes. Looks good! Thanks! ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1376 From rkennke at openjdk.java.net Mon Nov 23 20:39:03 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 23 Nov 2020 20:39:03 GMT Subject: RFR: 8256517: (ref) Reference.clear during reference processing may lose notification In-Reply-To: References: Message-ID: <2BmKewSdmP4lGhembmuF4d2hQbVtJFrW0Cv_XRVdD3U=.492b7c6a-c38e-4159-8c29-656073502176@github.com> On Mon, 23 Nov 2020 19:18:05 GMT, Per Liden wrote: >> test/hotspot/jtreg/gc/TestReferenceClearDuringReferenceProcessing.java line 28: >> >>> 26: /* @test >>> 27: * @bug 8256517 >>> 28: * @requires vm.gc.Z >> >> Please add | vm.gc.Shenandoah here. > > Note that for this test to be useful, the GC needs to support concurrent GC breakpoints, which Shenandoah doesn't do. Ok, right. Nevermind then! ------------- PR: https://git.openjdk.java.net/jdk/pull/1376 From mchung at openjdk.java.net Mon Nov 23 20:58:54 2020 From: mchung at openjdk.java.net (Mandy Chung) Date: Mon, 23 Nov 2020 20:58:54 GMT Subject: RFR: 8256517: (ref) Reference.clear during reference processing may lose notification In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 01:43:39 GMT, Kim Barrett wrote: > Please review this change to Reference.clear() to address several issues. > > (JDK-8240696) For GCs using a SATB barrier, simply assigning the referent > field to null may extend the lifetime of the referent value. > > (JDK-8240696) For GCs with concurrent reference processing, clearing the > referent field during reference processing may discard the expected > notification. > > Both of these are addressed by introducing a private native helper function > for clearing the referent, rather than using an ordinary in-Java field > assignment. Tests have been added for both of these issues. This required > adding a new breakpoint in reference processing for ZGC. > > Of course, finalization adds some complexity to the problem. We deal with > that by having FinalReference override clear. 
The implementation is > provided by a new package-private method in Reference. (There are a number > of alternatives, all of them clumsy; finalization is annoying that way.) > > While dealing with FinalReference clearing it was noted that the recent > JDK-8256106 and JDK-8256370 have some problems. FinalizerHistogram was not > updated to call the new Reference.getInactive(), instead still calling get() > on FinalReferences, with the JDK-8256106 problems. Fixing that showed the > assertion for inactive FinalReference added by JDK-8256370 used the wrong > test. > > Rather than tracking down and changing all get() and clear() calls on final > references and changing them to use getInactive and a new similar clear > function, I've changed FinalReference to override get and clear, which call > the helper functions in Reference. I've also renamed getInactive to be more > explanatory and less convenient to call directly, and similarly named the > helper for clear. This means that get/clear should never be called on an > active FinalReference. That's already never done, and would have problems > if it were. > > Testing: > mach5 tier1-6 > Local (linux-x64) tier1 using Shenandoah. > New TestReferenceClearDuringMarking fails for G1 without these changes. > New TestReferenceClearDuringReferenceProcessing fails for ZGC without these changes. Looks good. Thanks for correcting `assert next != null` inactive FinalReference check. ------------- Marked as reviewed by mchung (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1376 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 21:07:09 2020 From: github.com+42899633+eastig at openjdk.java.net (Eugene Astigeevich) Date: Mon, 23 Nov 2020 21:07:09 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs Message-ID: This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. 
The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. ------------- Commit messages: - 8255351: Add detection for Graviton 1 & 2 CPUs Changes: https://git.openjdk.java.net/jdk/pull/1315/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1315&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255351 Stats: 14 lines in 1 file changed: 14 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1315.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1315/head:pull/1315 PR: https://git.openjdk.java.net/jdk/pull/1315 From simonis at openjdk.java.net Mon Nov 23 21:07:09 2020 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 23 Nov 2020 21:07:09 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 12:39:47 GMT, Eugene Astigeevich wrote: > This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. > > The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. Yes, that's good. Handling potential regressions/improvements in #1293 is fine for me. Evgeny works for the Amazon Corretto team and is covered by the Amazon OCA. ------------- Marked as reviewed by simonis (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1315 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 21:07:10 2020 From: github.com+42899633+eastig at openjdk.java.net (Eugene Astigeevich) Date: Mon, 23 Nov 2020 21:07:10 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 16:47:01 GMT, Volker Simonis wrote: >> Evgeny works for the Amazon Corretto team and is covered by the Amazon OCA. > > Hi Evegeny, > > in general, your changes look good to me. > > You've disabled SIMD instructions for copying byte arrays <= 96 because you say there are no benefits from using them. 
> Have you seen regression in the microbenchmarks? As far as I can see, you've only pasted the results for char, int and long (where the improvements are quite nice, by the way :) > > Could you please also post some results for byte arrays (with and without SIMD). > > Thank you and best regards, > Volker I work for the Amazon Corretto team and am covered by the Amazon OCA. See the [comment](https://github.com/openjdk/jdk/pull/1315#issuecomment-731262807) from Volker above. > Hi Evegeny, > > in general, your changes look good to me. > > You've disabled SIMD instructions for copying byte arrays <= 96 because you say there are no benefits from using them. > Have you seen regression in the microbenchmarks? As far as I can see, you've only pasted the results for char, int and long (where the improvements are quite nice, by the way :) > > Could you please also post some results for byte arrays (with and without SIMD). > > Thank you and best regards, > Volker Thank you, Volker. I did more runs of testByte microbenchmarks with PR https://github.com/openjdk/jdk/pull/1293. They show that only ArrayCopyUnalignedDst.testByte has some regressions. I am running full range of copying from 65 to 96 to see which are more affected. I decided to enable UseSIMDForMemoryOps for all types of copying because overall it with PR https://github.com/openjdk/jdk/pull/1293 brings good improvements. I'll address ArrayCopyUnalignedDst.testByte regressions in a separate PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From simonis at openjdk.java.net Mon Nov 23 21:07:09 2020 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 23 Nov 2020 21:07:09 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 16:16:19 GMT, Volker Simonis wrote: >> This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. 
>> >> The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. > > Evgeny works for the Amazon Corretto team and is covered by the Amazon OCA. Hi Evegeny, in general, your changes look good to me. You've disabled SIMD instructions for copying byte arrays <= 96 because you say there are no benefits from using them. Have you seen regression in the microbenchmarks? As far as I can see, you've only pasted the results for char, int and long (where the improvements are quite nice, by the way :) Could you please also post some results for byte arrays (with and without SIMD). Thank you and best regards, Volker ------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 21:07:10 2020 From: github.com+42899633+eastig at openjdk.java.net (Eugene Astigeevich) Date: Mon, 23 Nov 2020 21:07:10 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: References: Message-ID: <_XSbz8D-TnDy1N1JfJC07OgEa6tfQ2GhRGdaJSkUAb4=.941ec975-9ca7-46fe-94ac-15b5b7c87ecb@github.com> On Fri, 20 Nov 2020 17:47:13 GMT, Eugene Astigeevich wrote: >> Hi Evegeny, >> >> in general, your changes look good to me. >> >> You've disabled SIMD instructions for copying byte arrays <= 96 because you say there are no benefits from using them. >> Have you seen regression in the microbenchmarks? As far as I can see, you've only pasted the results for char, int and long (where the improvements are quite nice, by the way :) >> >> Could you please also post some results for byte arrays (with and without SIMD). >> >> Thank you and best regards, >> Volker > > I work for the Amazon Corretto team and am covered by the Amazon OCA. See the [comment](https://github.com/openjdk/jdk/pull/1315#issuecomment-731262807) from Volker above. These are JMH microbenchmark results when UseSIMDForMemoryOps is on for all types of copying. 
Regressions are due to the use of ld4/st4 and are fixed in PR https://github.com/openjdk/jdk/pull/1293

|Benchmark|Length|Cnt|Units|Diff|Max Relative Error|
|-|-|-|-|-|-|
|ArrayCopy.arrayCopyChar|46|25|ns/op|5.26%🔴|0.01%|
|ArrayCopy.arrayCopyCharNonConst|46|25|ns/op|15.79%🔴|0.01%|
|ArrayCopy.arrayCopyObject|200|25|ns/op|-5.91%|1.76%|
|ArrayCopy.arrayCopyObjectNonConst|200|25|ns/op|-5.78%|0.73%|
|ArrayCopy.arrayCopyObjectSameArraysBackwa|200|25|ns/op|-1.27%|0.71%|
|ArrayCopy.arrayCopyObjectSameArraysForwar|200|25|ns/op|-2.05%|0.76%|
|ArrayCopyAligned.testByte|70|25|ns/op|48.59%🔴|0.48%|
|ArrayCopyAligned.testByte|150|25|ns/op|1.72%|4.19%|
|ArrayCopyAligned.testByte|300|25|ns/op|-3.20%|4.78%|
|ArrayCopyAligned.testByte|600|25|ns/op|-8.24%|2.62%|
|ArrayCopyAligned.testByte|1200|25|ns/op|-13.33%|3.13%|
|ArrayCopyAligned.testChar|20|25|ns/op|-5.57%|0.01%|
|ArrayCopyAligned.testChar|70|25|ns/op|-5.42%|3.64%|
|ArrayCopyAligned.testChar|150|25|ns/op|-4.96%|1.58%|
|ArrayCopyAligned.testChar|300|25|ns/op|-12.06%|0.83%|
|ArrayCopyAligned.testChar|600|25|ns/op|-16.13%|0.37%|
|ArrayCopyAligned.testChar|1200|25|ns/op|-16.12%|0.80%|
|ArrayCopyAligned.testInt|10|25|ns/op|-5.55%|0.04%|
|ArrayCopyAligned.testInt|20|25|ns/op|34.75%🔴|1.30%|
|ArrayCopyAligned.testInt|70|25|ns/op|-8.75%|2.12%|
|ArrayCopyAligned.testInt|150|25|ns/op|-11.74%|1.13%|
|ArrayCopyAligned.testInt|300|25|ns/op|-14.38%|0.69%|
|ArrayCopyAligned.testInt|600|25|ns/op|-17.87%|1.08%|
|ArrayCopyAligned.testInt|1200|25|ns/op|-18.01%|0.90%|
|ArrayCopyAligned.testLong|5|25|ns/op|-4.37%|0.77%|
|ArrayCopyAligned.testLong|10|25|ns/op|27.45%🔴|6.59%|
|ArrayCopyAligned.testLong|20|25|ns/op|-1.95%|2.76%|
|ArrayCopyAligned.testLong|70|25|ns/op|-11.46%|1.42%|
|ArrayCopyAligned.testLong|150|25|ns/op|-16.28%|0.68%|
|ArrayCopyAligned.testLong|300|25|ns/op|-18.02%|1.90%|
|ArrayCopyAligned.testLong|600|25|ns/op|-18.08%|0.90%|
|ArrayCopyAligned.testLong|1200|25|ns/op|-18.67%|1.16%|
|ArrayCopyUnalignedBoth.testByte|70|25|ns/op|38.98%🔴|0.47%|
|ArrayCopyUnalignedBoth.testByte|150|25|ns/op|0.32%|1.15%|
|ArrayCopyUnalignedBoth.testByte|300|25|ns/op|-1.94%|1.72%|
|ArrayCopyUnalignedBoth.testByte|600|25|ns/op|-4.96%|1.05%|
|ArrayCopyUnalignedBoth.testByte|1200|25|ns/op|-11.10%|1.11%|
|ArrayCopyUnalignedBoth.testChar|20|25|ns/op|-5.56%|0.07%|
|ArrayCopyUnalignedBoth.testChar|70|25|ns/op|-2.05%|1.43%|
|ArrayCopyUnalignedBoth.testChar|150|25|ns/op|-5.62%|1.32%|
|ArrayCopyUnalignedBoth.testChar|300|25|ns/op|-10.33%|0.70%|
|ArrayCopyUnalignedBoth.testChar|600|25|ns/op|-14.68%|0.39%|
|ArrayCopyUnalignedBoth.testChar|1200|25|ns/op|-16.43%|0.44%|
|ArrayCopyUnalignedBoth.testInt|10|25|ns/op|-5.55%|0.01%|
|ArrayCopyUnalignedBoth.testInt|20|25|ns/op|33.55%🔴|0.36%|
|ArrayCopyUnalignedBoth.testInt|70|25|ns/op|-5.35%|2.38%|
|ArrayCopyUnalignedBoth.testInt|150|25|ns/op|-11.15%|1.75%|
|ArrayCopyUnalignedBoth.testInt|300|25|ns/op|-14.18%|1.27%|
|ArrayCopyUnalignedBoth.testInt|600|25|ns/op|-15.84%|0.68%|
|ArrayCopyUnalignedBoth.testInt|1200|25|ns/op|-16.42%|0.45%|
|ArrayCopyUnalignedBoth.testLong|5|25|ns/op|-5.54%|0.01%|
|ArrayCopyUnalignedBoth.testLong|10|25|ns/op|35.60%🔴|1.69%|
|ArrayCopyUnalignedBoth.testLong|20|25|ns/op|-3.18%|2.26%|
|ArrayCopyUnalignedBoth.testLong|70|25|ns/op|-10.66%|0.99%|
|ArrayCopyUnalignedBoth.testLong|150|25|ns/op|-15.68%|1.01%|
|ArrayCopyUnalignedBoth.testLong|300|25|ns/op|-15.57%|0.47%|
|ArrayCopyUnalignedBoth.testLong|600|25|ns/op|-17.11%|0.23%|
|ArrayCopyUnalignedBoth.testLong|1200|25|ns/op|-17.00%|0.55%|
|ArrayCopyUnalignedDst.testByte|70|25|ns/op|48.32%🔴|0.30%|
|ArrayCopyUnalignedDst.testByte|150|25|ns/op|-0.68%|4.05%|
|ArrayCopyUnalignedDst.testByte|300|25|ns/op|-7.50%|1.35%|
|ArrayCopyUnalignedDst.testByte|600|25|ns/op|-10.04%|1.51%|
|ArrayCopyUnalignedDst.testByte|1200|25|ns/op|-14.07%|0.98%|
|ArrayCopyUnalignedDst.testChar|20|25|ns/op|-5.54%|0.01%|
|ArrayCopyUnalignedDst.testChar|70|25|ns/op|-5.70%|3.79%|
|ArrayCopyUnalignedDst.testChar|150|25|ns/op|-6.26%|1.59%|
|ArrayCopyUnalignedDst.testChar|300|25|ns/op|-12.78%|0.86%|
|ArrayCopyUnalignedDst.testChar|600|25|ns/op|-14.29%|0.54%|
|ArrayCopyUnalignedDst.testChar|1200|25|ns/op|-17.37%|1.18%|
|ArrayCopyUnalignedDst.testInt|10|25|ns/op|-5.55%|0.01%|
|ArrayCopyUnalignedDst.testInt|20|25|ns/op|34.84%🔴|1.46%|
|ArrayCopyUnalignedDst.testInt|70|25|ns/op|-8.32%|1.33%|
|ArrayCopyUnalignedDst.testInt|150|25|ns/op|-11.82%|0.61%|
|ArrayCopyUnalignedDst.testInt|300|25|ns/op|-13.78%|0.59%|
|ArrayCopyUnalignedDst.testInt|600|25|ns/op|-16.61%|0.94%|
|ArrayCopyUnalignedDst.testInt|1200|25|ns/op|-17.63%|0.78%|
|ArrayCopyUnalignedDst.testLong|5|25|ns/op|-5.51%|0.07%|
|ArrayCopyUnalignedDst.testLong|10|25|ns/op|37.01%🔴|1.03%|
|ArrayCopyUnalignedDst.testLong|20|25|ns/op|-3.52%|2.72%|
|ArrayCopyUnalignedDst.testLong|70|25|ns/op|-10.93%|1.16%|
|ArrayCopyUnalignedDst.testLong|150|25|ns/op|-16.99%|0.61%|
|ArrayCopyUnalignedDst.testLong|300|25|ns/op|-16.65%|0.45%|
|ArrayCopyUnalignedDst.testLong|600|25|ns/op|-16.55%|0.60%|
|ArrayCopyUnalignedDst.testLong|1200|25|ns/op|-17.37%|0.76%|
|ArrayCopyUnalignedSrc.testByte|70|25|ns/op|38.66%🔴|0.22%|
|ArrayCopyUnalignedSrc.testByte|150|25|ns/op|1.64%|1.69%|
|ArrayCopyUnalignedSrc.testByte|300|25|ns/op|-5.86%|0.64%|
|ArrayCopyUnalignedSrc.testByte|600|25|ns/op|-10.30%|1.71%|
|ArrayCopyUnalignedSrc.testByte|1200|25|ns/op|-14.25%|0.91%|
|ArrayCopyUnalignedSrc.testChar|20|25|ns/op|-5.73%|0.10%|
|ArrayCopyUnalignedSrc.testChar|70|25|ns/op|-3.69%|1.68%|
|ArrayCopyUnalignedSrc.testChar|150|25|ns/op|-8.36%|2.28%|
|ArrayCopyUnalignedSrc.testChar|300|25|ns/op|-9.90%|0.49%|
|ArrayCopyUnalignedSrc.testChar|600|25|ns/op|-15.08%|0.55%|
|ArrayCopyUnalignedSrc.testChar|1200|25|ns/op|-17.08%|0.49%|
|ArrayCopyUnalignedSrc.testInt|10|25|ns/op|-5.55%|0.01%|
|ArrayCopyUnalignedSrc.testInt|20|25|ns/op|33.53%🔴|0.28%|
|ArrayCopyUnalignedSrc.testInt|70|25|ns/op|-8.23%|2.06%|
|ArrayCopyUnalignedSrc.testInt|150|25|ns/op|-12.65%|1.27%|
|ArrayCopyUnalignedSrc.testInt|300|25|ns/op|-14.22%|0.41%|
|ArrayCopyUnalignedSrc.testInt|600|25|ns/op|-16.20%|0.37%|
|ArrayCopyUnalignedSrc.testInt|1200|25|ns/op|-15.81%|1.09%|
|ArrayCopyUnalignedSrc.testLong|5|25|ns/op|-5.54%|0.01%|
|ArrayCopyUnalignedSrc.testLong|10|25|ns/op|35.81%🔴|0.54%|
|ArrayCopyUnalignedSrc.testLong|20|25|ns/op|-3.96%|2.10%|
|ArrayCopyUnalignedSrc.testLong|70|25|ns/op|-10.90%|0.79%|
|ArrayCopyUnalignedSrc.testLong|150|25|ns/op|-15.83%|0.64%|
|ArrayCopyUnalignedSrc.testLong|300|25|ns/op|-17.88%|1.61%|
|ArrayCopyUnalignedSrc.testLong|600|25|ns/op|-18.03%|0.88%|
|ArrayCopyUnalignedSrc.testLong|1200|25|ns/op|-18.93%|0.04%|

------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From kvn at openjdk.java.net Mon Nov 23 21:22:04 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 23 Nov 2020 21:22:04 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: References: Message-ID: <75WHr84LNShtrbKsnvqAEL6RnHKiJk9uEx4xK3n42ms=.ad0cf5d4-2373-422f-a45e-bf0b9a8e1350@github.com> On Thu, 19 Nov 2020 12:39:47 GMT, Evgeny Astigeevich wrote: > This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. > > The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 169: > 167: if (_cpu == CPU_ARM && (_model == 0xd08 || _model2 == 0xd08)) { > 168: if (FLAG_IS_DEFAULT(UseSIMDForMemoryOps)) { > 169: FLAG_SET_DEFAULT(UseSIMDForMemoryOps, true); What about A73? Should this flag be true for it too?
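For readers who do not have the HotSpot sources at hand, the pattern in the quoted snippet (turn a flag on for known CPU models, but only if the user has not set it) can be sketched roughly as follows. This is a standalone illustration, not the code in vm_version_aarch64.cpp: `BoolFlag`, `CpuInfo`, `is_model` and `apply_simd_default` are hypothetical names, and apart from 0xd08, which appears in the quoted code, the part numbers and the implementer code are assumptions taken from Arm's published MIDR values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a VM flag that remembers whether it still has
// its default origin (i.e. the user did not set it on the command line).
struct BoolFlag {
    bool value;
    bool is_default;
};

// Hypothetical CPU description. model2 is the part number of a second core
// type on heterogeneous systems, or 0 when there is none.
struct CpuInfo {
    uint32_t implementer;
    uint32_t model;
    uint32_t model2;
};

// 0xd08 comes from the quoted code; the remaining constants are assumptions
// based on Arm's public MIDR_EL1 implementer and part-number values.
constexpr uint32_t kImplementerArm = 0x41;   // 'A'
constexpr uint32_t kCortexA72      = 0xd08;  // Graviton 1 is detected as this
constexpr uint32_t kCortexA73      = 0xd09;  // the core asked about in review
constexpr uint32_t kNeoverseN1     = 0xd0c;  // Graviton 2 is detected as this

// True if either core type of the CPU has the given part number.
bool is_model(const CpuInfo& cpu, uint32_t part) {
    assert(part != 0 && "0 means 'no second core type' and never matches");
    return cpu.model == part || cpu.model2 == part;
}

// Enable the flag for the known models, but never override a user setting.
// The flag keeps its default origin, mirroring the intent of FLAG_SET_DEFAULT.
void apply_simd_default(const CpuInfo& cpu, BoolFlag& use_simd_for_memory_ops) {
    const bool known_model =
        cpu.implementer == kImplementerArm &&
        (is_model(cpu, kCortexA72) || is_model(cpu, kNeoverseN1));
    if (known_model && use_simd_for_memory_ops.is_default) {
        use_simd_for_memory_ops.value = true;
    }
}
```

In this sketch, covering A73 as well would amount to adding `is_model(cpu, kCortexA73)` to the `known_model` condition; whether that is desirable is exactly the open question in the review.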
------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From github.com+42899633+eastig at openjdk.java.net Mon Nov 23 22:08:59 2020 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Mon, 23 Nov 2020 22:08:59 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: <75WHr84LNShtrbKsnvqAEL6RnHKiJk9uEx4xK3n42ms=.ad0cf5d4-2373-422f-a45e-bf0b9a8e1350@github.com> References: <75WHr84LNShtrbKsnvqAEL6RnHKiJk9uEx4xK3n42ms=.ad0cf5d4-2373-422f-a45e-bf0b9a8e1350@github.com> Message-ID: <8f_FAcyezL2GReTgxekHms_xlTTKK7aJUNGkcCRwbbs=.2cda3d72-55b8-487c-bf8a-b31ad555f212@github.com> On Mon, 23 Nov 2020 21:19:34 GMT, Vladimir Kozlov wrote: >> This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. >> >> The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. > > src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 169: > >> 167: if (_cpu == CPU_ARM && (_model == 0xd08 || _model2 == 0xd08)) { >> 168: if (FLAG_IS_DEFAULT(UseSIMDForMemoryOps)) { >> 169: FLAG_SET_DEFAULT(UseSIMDForMemoryOps, true); > > What about A73? Should this flag be true for it too? Hi Vladimir, Thank you for reviewing the changes. Yes, it can be enabled. However I found only HiKey960/970 by Linaro use A73. Other devices are phones. I can do this if Linaro engineers agree. ------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From dlong at openjdk.java.net Mon Nov 23 22:43:59 2020 From: dlong at openjdk.java.net (Dean Long) Date: Mon, 23 Nov 2020 22:43:59 GMT Subject: RFR: 8256754: Deoptimization::revoke_for_object_deoptimization: stack processing start call is redundant In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 11:40:49 GMT, Richard Reingruber wrote: > This is a XS clean-up of Deoptimization::revoke_for_object_deoptimization() which removes the StackWatermarkSet::start_processing() call. 
> > This is correct because all paths leading to revoke_for_object_deoptimization() are equipped with a KeepStackGCProcessedMark. > > Call Tree: > > StackWatermarkSet::start_processing(JavaThread *, enum StackWatermarkKind) : void > Deoptimization::revoke_for_object_deoptimization(JavaThread *, frame, RegisterMap *, JavaThread *) : void > Deoptimization::deoptimize_objects_internal(JavaThread *, GrowableArray *, bool &) : bool > EscapeBarrier::deoptimize_objects_internal(JavaThread *, intptr_t *) : bool > EscapeBarrier::deoptimize_objects_all_threads() : bool // has KeepStackGCProcessedMark > EscapeBarrier::deoptimize_objects(intptr_t *) : bool > EscapeBarrier::deoptimize_objects(int, int) : bool // has KeepStackGCProcessedMark > > Testing: hotspot_serviceability, jdk_svc, jdk_jdi, vmTestbase_nsk_jdi, vmTestbase_nsk_jvmti, vmTestbase_nsk_jdwp with -XX:+UseZGC. The justification for removing this code is that all callers use KeepStackGCProcessedMark. Is there an assert you can add in place of the removed code that checks this invariant? ------------- Changes requested by dlong (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1381 From redestad at openjdk.java.net Mon Nov 23 23:26:09 2020 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 23 Nov 2020 23:26:09 GMT Subject: RFR: 8256883: C2: Add a RegMask iterator [v2] In-Reply-To: References: Message-ID: <7_JNB2Gt9c-4p6IqcaKNDykS3w8-wndOMMYMXCO2sg0=.131fbf27-6c38-49e7-aeb7-0f6c8d54292b@github.com> > By implementing a simple RegMaskIterator we can speed this up and possibly make the code a bit clearer by doing so. > > As a data point, this reduces the `C2Compiler::initialize` overhead from 8.82M instructions to 8.58M instructions, from the improvement in `PhaseChaitin::post_allocate_copy_removal` (~16k insns/compilation). The gain varies with type of compilation, so on the naive tests in `SimpleRepeatCompilation` it's in the noise (~2k insns/compilation on `trivialMath`, for example).
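As a rough illustration of the idea behind such an iterator (and not the actual RegMaskIterator from this PR), the sketch below walks the set bits of a multi-word bit mask, skipping empty words and clearing the lowest set bit after each step. All names (`SimpleRegMask`, `SimpleRegMaskIterator`, `collect_registers`) and the four-word mask size are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical register mask: one bit per register, stored in 64-bit words.
class SimpleRegMask {
public:
    static constexpr int kWords = 4;
    static constexpr int kBitsPerWord = 64;

    void insert(int reg) {
        assert(reg >= 0 && reg < kWords * kBitsPerWord);
        words_[reg / kBitsPerWord] |= uint64_t{1} << (reg % kBitsPerWord);
    }

    uint64_t word(int index) const { return words_[index]; }

private:
    uint64_t words_[kWords] = {};
};

// Visits the registers in the mask in ascending order. The iterator keeps a
// private copy of the current word, so it never modifies the mask itself.
class SimpleRegMaskIterator {
public:
    explicit SimpleRegMaskIterator(const SimpleRegMask& mask) : mask_(mask) {
        skip_empty_words();
    }

    bool has_next() const { return current_ != 0; }

    int next() {
        assert(has_next());
        // Find the index of the lowest set bit in the current word.
        int bit = 0;
        for (uint64_t w = current_; (w & 1) == 0; w >>= 1) {
            ++bit;
        }
        const int reg = word_index_ * SimpleRegMask::kBitsPerWord + bit;
        current_ &= current_ - 1;  // clear the lowest set bit
        if (current_ == 0) {
            ++word_index_;
            skip_empty_words();
        }
        return reg;
    }

private:
    // Loads the next non-empty word, or leaves current_ == 0 at the end.
    void skip_empty_words() {
        while (word_index_ < SimpleRegMask::kWords) {
            current_ = mask_.word(word_index_);
            if (current_ != 0) {
                return;
            }
            ++word_index_;
        }
        current_ = 0;
    }

    const SimpleRegMask& mask_;
    int word_index_ = 0;
    uint64_t current_ = 0;
};

// Convenience helper for the example: gather all registers into a vector.
std::vector<int> collect_registers(const SimpleRegMask& mask) {
    std::vector<int> regs;
    for (SimpleRegMaskIterator it(mask); it.has_next();) {
        regs.push_back(it.next());
    }
    return regs;
}
```

The point of the design is that each register costs a constant amount of work per word rather than a fresh search from the start of the mask; a production version would likely use a count-trailing-zeros intrinsic instead of the shift loop used here for portability.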
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Use RegMaskIterator in zBarrierSetAssembler_x86+aarch ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1397/files - new: https://git.openjdk.java.net/jdk/pull/1397/files/4192cffe..b5345e9e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1397&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1397&range=00-01 Stats: 9 lines in 2 files changed: 1 ins; 3 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/1397.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1397/head:pull/1397 PR: https://git.openjdk.java.net/jdk/pull/1397 From kvn at openjdk.java.net Mon Nov 23 23:40:56 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 23 Nov 2020 23:40:56 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: References: Message-ID: <-Oiu2Cimcy4ZQpPIZIhVjsmKlDaSKC15fLdq-X3I7LU=.60531c0c-b1e9-4e22-9bae-d25d85a1144c@github.com> On Thu, 19 Nov 2020 12:39:47 GMT, Evgeny Astigeevich wrote: > This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. > > The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. Marked as reviewed by kvn (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From kvn at openjdk.java.net Mon Nov 23 23:40:58 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 23 Nov 2020 23:40:58 GMT Subject: RFR: 8255351: Add detection for Graviton 1 & 2 CPUs In-Reply-To: <8f_FAcyezL2GReTgxekHms_xlTTKK7aJUNGkcCRwbbs=.2cda3d72-55b8-487c-bf8a-b31ad555f212@github.com> References: <75WHr84LNShtrbKsnvqAEL6RnHKiJk9uEx4xK3n42ms=.ad0cf5d4-2373-422f-a45e-bf0b9a8e1350@github.com> <8f_FAcyezL2GReTgxekHms_xlTTKK7aJUNGkcCRwbbs=.2cda3d72-55b8-487c-bf8a-b31ad555f212@github.com> Message-ID: On Mon, 23 Nov 2020 22:06:39 GMT, Evgeny Astigeevich wrote: >> src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 169: >> >>> 167: if (_cpu == CPU_ARM && (_model == 0xd08 || _model2 == 0xd08)) { >>> 168: if (FLAG_IS_DEFAULT(UseSIMDForMemoryOps)) { >>> 169: FLAG_SET_DEFAULT(UseSIMDForMemoryOps, true); >> >> What about A73? Should this flag be true for it too? > > Hi Vladimir, > Thank you for reviewing the changes. > Yes, it can be enabled. However I found only HiKey960/970 by Linaro use A73. Other devices are phones. > I can do this if Linaro engineers agree. The ARM ecosystem is really strange :) In this case keep these changes as they are and let Linaro engineers add it later if they want. ------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From david.holmes at oracle.com Tue Nov 24 01:04:55 2020 From: david.holmes at oracle.com (David Holmes) Date: Tue, 24 Nov 2020 11:04:55 +1000 Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) In-Reply-To: References: <618aa897-18fd-fd70-1f0a-506e0e5a74d8@oracle.com> Message-ID: <96418b57-12fe-208e-c642-b04efeedca24@oracle.com> Hi Harold, On 24/11/2020 6:27 am, Harold Seigel wrote: > Hi David, > > Thanks for looking at this. > > The intent was for method Class.permittedSubclasses() to be implemented > similarly to Class.getNestMembers().
Are you suggesting that a security > manager check be added to permittedSubclasses() similar to the security > manager check in getNestMembers()? No, I'm suggesting the change to the API is plain wrong. :) Please see discussion in the CSR. Cheers, David > Thanks, Harold > > On 11/18/2020 12:31 AM, David Holmes wrote: >> Hi Vicente, >> >> On 16/11/2020 11:36 pm, Vicente Romero wrote: >>> Please review the code for the second iteration of sealed classes. In >>> this iteration we are: >>> >>> - Enhancing narrowing reference conversion to allow for stricter >>> checking of cast conversions with respect to sealed type hierarchies. >>> - Also local classes are not considered when determining implicitly >>> declared permitted direct subclasses of a sealed class or sealed >>> interface >> >> The major change here seems to be that getPermittedSubclasses() now >> returns actual Class objects instead of ClassDesc. My recollection >> from earlier discussions here was that the use of ClassDesc was very >> deliberate as the permitted subclasses may not actually exist and >> there may be security concerns with returning them! >> >> Cheers, >> David >> ----- >> >>> ------------- >>> >>> Commit messages: >>> - 8246778: Compiler implementation for Sealed Classes (Second Preview) >>> >>> Changes: https://git.openjdk.java.net/jdk/pull/1227/files >>> Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1227&range=00 >>> Issue: https://bugs.openjdk.java.net/browse/JDK-8246778 >>> Stats: 589 lines in 12 files changed: 508 ins; 18 del; 63 mod >>> Patch: https://git.openjdk.java.net/jdk/pull/1227.diff >>> 
Fetch: git fetch https://git.openjdk.java.net/jdk >>> pull/1227/head:pull/1227 >>> >>> PR: https://git.openjdk.java.net/jdk/pull/1227 >>> From dholmes at openjdk.java.net Tue Nov 24 02:10:56 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 24 Nov 2020 02:10:56 GMT Subject: RFR: 8256736: Zero: GTest tests fail with "unsuppported vm variant" In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 11:04:56 GMT, Aleksey Shipilev wrote: > Manifests as `tier1` test: > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=jtreg:gtest/MetaspaceGtests.java > > 00:16:40 java.lang.Error: TESTBUG: unsuppported vm variant > 00:16:40 at GTestWrapper.getJVMVariantSubDir(GTestWrapper.java:122) > 00:16:40 at GTestWrapper.main(GTestWrapper.java:50) > 00:16:40 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 00:16:40 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78) > 00:16:40 at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 00:16:40 at java.base/java.lang.reflect.Method.invoke(Method.java:564) > 00:16:40 at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > 00:16:40 at java.base/java.lang.Thread.run(Thread.java:831) > > Let's make the wrapper know about the Zero variant. > > Additional testing: > - [x] Affected tests (they still fail, but because of Zero problems) Seems fine and trivial. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1344 From jbhateja at openjdk.java.net Tue Nov 24 02:30:58 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 24 Nov 2020 02:30:58 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v14] In-Reply-To: References: Message-ID: On Sun, 22 Nov 2020 02:19:53 GMT, Vladimir Kozlov wrote: >>> Forgot to say that failure was on Windows with only avx512f, avx512cd >> >> Thanks Vladimir, I have resolved your review comments. > > Version 15 failed next tests on linux-x64 with -XX:+UseParallelGC -XX:+UseNUMA flags: > vmTestbase/metaspace/stressHierarchy/stressHierarchy015/TestDescription.java > vmTestbase/metaspace/stressHierarchy/stressHierarchy006/TestDescription.java > vmTestbase/metaspace/stressHierarchy/stressHierarchy005/TestDescription.java > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/workspace/open/src/hotspot/share/opto/macroArrayCopy.cpp:861), pid=8205, tid=8216 > # assert(ArrayCopyNode::may_modify(dest_t, (*ctrl)->in(0)->as_MemBar(), &_igvn, ac)) failed: dependency on arraycopy lost > # > # Problematic frame: > # V [libjvm.so+0x134fbfc] PhaseMacroExpand::generate_arraycopy(ArrayCopyNode*, AllocateArrayNode*, Node**, MergeMemNode*, Node**, TypePtr const*, BasicType, Node*, Node*, Node*, Node*, Node*, bool, bool, RegionNode*)+0x30bc > # > Host: Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz, 8 cores, 58G, Oracle Linux Server release 7.9 > > Current CompileTask: > C2: 27392 5458 4 package_level34_num50.Dummy::composeString (10 bytes) > > Stack: [0x00007f4c5a024000,0x00007f4c5a125000], sp=0x00007f4c5a120420, free space=1009k > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x134fbfc] PhaseMacroExpand::generate_arraycopy(ArrayCopyNode*, AllocateArrayNode*, Node**, MergeMemNode*, Node**, TypePtr const*, BasicType, 
Node*, Node*, Node*, Node*, Node*, bool, bool, RegionNode*)+0x30bc > V [libjvm.so+0x1350741] PhaseMacroExpand::expand_arraycopy_node(ArrayCopyNode*)+0x641 > V [libjvm.so+0x1340d7b] PhaseMacroExpand::expand_macro_nodes()+0xfdb > V [libjvm.so+0x9fe79b] Compile::Optimize()+0x177b > V [libjvm.so+0xa00268] Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x17e8 > V [libjvm.so+0x8322ae] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1ce > V [libjvm.so+0xa103f8] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xe08 > V [libjvm.so+0xa10f48] CompileBroker::compiler_thread_loop()+0x5a8 Hi @vnkozlov , Kindly let me know if there are any other comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/302 From kbarrett at openjdk.java.net Tue Nov 24 03:05:19 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 24 Nov 2020 03:05:19 GMT Subject: RFR: 8256517: (ref) Reference.clear during reference processing may lose notification [v2] In-Reply-To: References: Message-ID: > Please review this change to Reference.clear() to address several issues. > > (JDK-8240696) For GCs using a SATB barrier, simply assigning the referent > field to null may extend the lifetime of the referent value. > > (JDK-8240696) For GCs with concurrent reference processing, clearing the > referent field during reference processing may discard the expected > notification. > > Both of these are addressed by introducing a private native helper function > for clearing the referent, rather than using an ordinary in-Java field > assignment. Tests have been added for both of these issues. This required > adding a new breakpoint in reference processing for ZGC. > > Of course, finalization adds some complexity to the problem. We deal with > that by having FinalReference override clear. The implementation is > provided by a new package-private method in Reference. 
(There are a number > of alternatives, all of them clumsy; finalization is annoying that way.) > > While dealing with FinalReference clearing it was noted that the recent > JDK-8256106 and JDK-8256370 have some problems. FinalizerHistogram was not > updated to call the new Reference.getInactive(), instead still calling get() > on FinalReferences, with the JDK-8256106 problems. Fixing that showed the > assertion for inactive FinalReference added by JDK-8256370 used the wrong > test. > > Rather than tracking down and changing all get() and clear() calls on final > references and changing them to use getInactive and a new similar clear > function, I've changed FinalReference to override get and clear, which call > the helper functions in Reference. I've also renamed getInactive to be more > explanatory and less convenient to call directly, and similarly named the > helper for clear. This means that get/clear should never be called on an > active FinalReference. That's already never done, and would have problems > if it were. > > Testing: > mach5 tier1-6 > Local (linux-x64) tier1 using Shenandoah. > New TestReferenceClearDuringMarking fails for G1 without these changes. > New TestReferenceClearDuringReferenceProcessing fails for ZGC without these changes. 
Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: - new tests need ref in oldgen too - remove obsolete comment about races with clear and enqueue ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1376/files - new: https://git.openjdk.java.net/jdk/pull/1376/files/46e5b1f6..c19efd70 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1376&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1376&range=00-01 Stats: 17 lines in 3 files changed: 7 ins; 6 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1376.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1376/head:pull/1376 PR: https://git.openjdk.java.net/jdk/pull/1376 From kbarrett at openjdk.java.net Tue Nov 24 03:05:19 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 24 Nov 2020 03:05:19 GMT Subject: RFR: 8256517: (ref) Reference.clear during reference processing may lose notification [v2] In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 12:50:31 GMT, Per Liden wrote: > Looks good. Just want to request that you also remove the following comment in zReferenceProcessor.cpp, as it's no longer true. > > ``` > --- a/src/hotspot/share/gc/z/zReferenceProcessor.cpp > +++ b/src/hotspot/share/gc/z/zReferenceProcessor.cpp > @@ -184,12 +184,6 @@ bool ZReferenceProcessor::should_discover(oop reference, ReferenceType type) con > } > > bool ZReferenceProcessor::should_drop(oop reference, ReferenceType type) const { > - // This check is racing with a call to Reference.clear() from the application. > - // If the application clears the reference after this check it will still end > - // up on the pending list, and there's nothing we can do about that without > - // changing the Reference.clear() API. This check is also racing with a call > - // to Reference.enqueue() from the application, which is unproblematic, since > - // the application wants the reference to be enqueued anyway. 
> const oop referent = reference_referent(reference); > if (referent == NULL) { > // Reference has been cleared, by a call to Reference.enqueue() > ``` Done. ------------- PR: https://git.openjdk.java.net/jdk/pull/1376 From kbarrett at openjdk.java.net Tue Nov 24 03:05:19 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 24 Nov 2020 03:05:19 GMT Subject: RFR: 8256517: (ref) Reference.clear during reference processing may lose notification [v2] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 02:59:50 GMT, Kim Barrett wrote: >> Looks good. Just want to request that you also remove the following comment in zReferenceProcessor.cpp, as it's no longer true. >> >> --- a/src/hotspot/share/gc/z/zReferenceProcessor.cpp >> +++ b/src/hotspot/share/gc/z/zReferenceProcessor.cpp >> @@ -184,12 +184,6 @@ bool ZReferenceProcessor::should_discover(oop reference, ReferenceType type) con >> } >> >> bool ZReferenceProcessor::should_drop(oop reference, ReferenceType type) const { >> - // This check is racing with a call to Reference.clear() from the application. >> - // If the application clears the reference after this check it will still end >> - // up on the pending list, and there's nothing we can do about that without >> - // changing the Reference.clear() API. This check is also racing with a call >> - // to Reference.enqueue() from the application, which is unproblematic, since >> - // the application wants the reference to be enqueued anyway. >> const oop referent = reference_referent(reference); >> if (referent == NULL) { >> // Reference has been cleared, by a call to Reference.enqueue() > >> Looks good. Just want to request that you also remove the following comment in zReferenceProcessor.cpp, as it's no longer true. 
>> >> ``` >> --- a/src/hotspot/share/gc/z/zReferenceProcessor.cpp >> +++ b/src/hotspot/share/gc/z/zReferenceProcessor.cpp >> @@ -184,12 +184,6 @@ bool ZReferenceProcessor::should_discover(oop reference, ReferenceType type) con >> } >> >> bool ZReferenceProcessor::should_drop(oop reference, ReferenceType type) const { >> - // This check is racing with a call to Reference.clear() from the application. >> - // If the application clears the reference after this check it will still end >> - // up on the pending list, and there's nothing we can do about that without >> - // changing the Reference.clear() API. This check is also racing with a call >> - // to Reference.enqueue() from the application, which is unproblematic, since >> - // the application wants the reference to be enqueued anyway. >> const oop referent = reference_referent(reference); >> if (referent == NULL) { >> // Reference has been cleared, by a call to Reference.enqueue() >> ``` > > Done. I realized there was a theoretical problem with the new tests; they should also be ensuring the Reference objects are in oldgen. That's fixed in the latest push. ------------- PR: https://git.openjdk.java.net/jdk/pull/1376 From ysuenaga at openjdk.java.net Tue Nov 24 05:12:26 2020 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Tue, 24 Nov 2020 05:12:26 GMT Subject: RFR: 8252657: JVMTI agent is not unloaded when Agent_OnAttach is failed [v2] In-Reply-To: <1H1wUQdxCLU2qddqEIYSx2iOhIKL3b5etUmjsS6NBlU=.0bf1fe0c-8dcf-4ca0-bd57-b8794d5f2810@github.com> References: <1H1wUQdxCLU2qddqEIYSx2iOhIKL3b5etUmjsS6NBlU=.0bf1fe0c-8dcf-4ca0-bd57-b8794d5f2810@github.com> Message-ID: > If `Agent_OnAttach()` in JVMTI agent which is attempted to load via JVMTI.agent_load dcmd is failed, it would not be unloaded. > We've [discussed it on serviceability-dev](https://mail.openjdk.java.net/pipermail/serviceability-dev/2020-September/032839.html). This PR is a continuation of that. 
> > This PR also includes to call `Agent_OnUnload()` when `Agent_OnAttach()` failed. > > How to reproduce: > > 1. Build JVMTI agent for test > $ git clone https://github.com/YaSuenag/jvmti-examples.git > $ cd jvmti-examples/helloworld/out/build > $ cmake ../.. > > 2. Run JShell > > 3. Load JVMTI agent via `jcmd JVMTI.agent_load` with "error" ("error" means `Agent_OnAttach()` returns JNI_ERR) > $ jcmd > 89456 jdk.jshell.execution.RemoteExecutionControl 45651 > 89547 sun.tools.jcmd.JCmd > 89436 jdk.jshell/jdk.internal.jshell.tool.JShellToolProvider > $ jcmd 89436 JVMTI.agent_load `pwd`/libhelloworld.so error > 89436: > return code: -1 > > 4. Check loaded libraries via `jcmd VM.dynlibs` > $ jcmd 89436 VM.dynlibs | grep libhelloworld > 7f2f8b06b000-7f2f8b06c000 r--p 00000000 fd:00 11818202 /home/ysuenaga/github/jvmti-examples/helloworld/out/build/libhelloworld.so > 7f2f8b06c000-7f2f8b06d000 r-xp 00001000 fd:00 11818202 /home/ysuenaga/github/jvmti-examples/helloworld/out/build/libhelloworld.so > 7f2f8b06d000-7f2f8b06e000 r--p 00002000 fd:00 11818202 /home/ysuenaga/github/jvmti-examples/helloworld/out/build/libhelloworld.so > 7f2f8b06e000-7f2f8b06f000 r--p 00002000 fd:00 11818202 /home/ysuenaga/github/jvmti-examples/helloworld/out/build/libhelloworld.so > 7f2f8b06f000-7f2f8b070000 rw-p 00003000 fd:00 11818202 /home/ysuenaga/github/jvmti-examples/helloworld/out/build/libhelloworld.so Yasumasa Suenaga has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Update patch - Merge remote-tracking branch 'upstream/master' into JDK-8252657 - revert - JVMTI agent is not unloaded when Agent_OnAttach is failed ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/19/files - new: https://git.openjdk.java.net/jdk/pull/19/files/3cc731a9..640014dc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=19&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=19&range=00-01 Stats: 715494 lines in 6740 files changed: 549418 ins; 117424 del; 48652 mod Patch: https://git.openjdk.java.net/jdk/pull/19.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/19/head:pull/19 PR: https://git.openjdk.java.net/jdk/pull/19 From ysuenaga at openjdk.java.net Tue Nov 24 05:45:55 2020 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Tue, 24 Nov 2020 05:45:55 GMT Subject: RFR: 8252657: JVMTI agent is not unloaded when Agent_OnAttach is failed In-Reply-To: References: <1H1wUQdxCLU2qddqEIYSx2iOhIKL3b5etUmjsS6NBlU=.0bf1fe0c-8dcf-4ca0-bd57-b8794d5f2810@github.com> <80LJDTCsT_y-KlThryd5Bxu5RRyrjmKfs5p9vJUn61E=.68b594a0-fe58-4f4d-a49c-eec2e90f9373@github.com> <-Xrp6c94000-jE1p6NvzjsxFUW5ILrH_F1eT1i7esw8=.9d609f81-1b61-4ebf-9afd-73b834c1b18c@github.com> Message-ID: <96QB1WTG_puWZM9TqQusV_43lTbHoGxCIJKBpAG43o0=.0e67d662-7fd6-46d1-a571-81604669efb7@github.com> On Thu, 12 Nov 2020 01:34:33 GMT, Yasumasa Suenaga wrote: >> If we can change the spec that agent library would not be unloaded when `Agent_OnAttach()` failed, we can change like [webrev.00](https://cr.openjdk.java.net/~ysuenaga/JDK-8252657/webrev.00/). It is simple, and similar behavior with `Agent_OnLoad()`. It might be prefer for JVMTI agent developers. > > In case of `Agent_OnLoad()`, if it is failed (it returns other than zero), JVM is aborted and `Agent_OnUnload()` is not called. 
I think it is compliant with [JVMTI spec of Agent_OnUnload()](https://docs.oracle.com/en/java/javase/15/docs/specs/jvmti.html#onunload) which says uncontrolled shutdown (aborting JVM) is an exception to this rule. > > I will add CSR for this fix, but I want to discuss what we should do before. I like that `Agent_OnUnload()` wouldn't be called when `Agent_OnAttach()` is failed if we can change the spec - it is consistent and friendly with `Agent_OnLoad()`. I added a [CSR for this PR](https://wiki.openjdk.java.net/display/csr/Main) and updated the patch. Could you review it? I think it is reasonable to clarify in the spec that `Agent_OnUnload()` is not called when `Agent_OnLoad()` or `Agent_OnAttach()` fails. That seems natural to me, and it does not change the current behavior in HotSpot. ------------- PR: https://git.openjdk.java.net/jdk/pull/19 From shade at openjdk.java.net Tue Nov 24 06:52:55 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 06:52:55 GMT Subject: RFR: 8256736: Zero: GTest tests fail with "unsuppported vm variant" In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 02:07:43 GMT, David Holmes wrote: >> Manifests as `tier1` test: >> >> $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=jtreg:gtest/MetaspaceGtests.java >> >> 00:16:40 java.lang.Error: TESTBUG: unsuppported vm variant >> 00:16:40 at GTestWrapper.getJVMVariantSubDir(GTestWrapper.java:122) >> 00:16:40 at GTestWrapper.main(GTestWrapper.java:50) >> 00:16:40 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> 00:16:40 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78) >> 00:16:40 at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> 00:16:40 at java.base/java.lang.reflect.Method.invoke(Method.java:564) >> 00:16:40 at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) >> 00:16:40 at
java.base/java.lang.Thread.run(Thread.java:831) >> >> Let's make the wrapper know about the Zero variant. >> >> Additional testing: >> - [x] Affected tests (they still fail, but because of Zero problems) > > Seems fine and trivial. > > Thanks, > David Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/1344 From shade at openjdk.java.net Tue Nov 24 06:52:57 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 06:52:57 GMT Subject: Integrated: 8256736: Zero: GTest tests fail with "unsuppported vm variant" In-Reply-To: References: Message-ID: On Fri, 20 Nov 2020 11:04:56 GMT, Aleksey Shipilev wrote: > Manifests as `tier1` test: > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=jtreg:gtest/MetaspaceGtests.java > > 00:16:40 java.lang.Error: TESTBUG: unsuppported vm variant > 00:16:40 at GTestWrapper.getJVMVariantSubDir(GTestWrapper.java:122) > 00:16:40 at GTestWrapper.main(GTestWrapper.java:50) > 00:16:40 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 00:16:40 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78) > 00:16:40 at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 00:16:40 at java.base/java.lang.reflect.Method.invoke(Method.java:564) > 00:16:40 at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > 00:16:40 at java.base/java.lang.Thread.run(Thread.java:831) > > Let's make the wrapper know about the Zero variant. > > Additional testing: > - [x] Affected tests (they still fail, but because of Zero problems) This pull request has now been integrated. 
Changeset: b52f6c05 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/b52f6c05 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8256736: Zero: GTest tests fail with "unsuppported vm variant" Reviewed-by: dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/1344 From ysuenaga at openjdk.java.net Tue Nov 24 07:06:14 2020 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Tue, 24 Nov 2020 07:06:14 GMT Subject: RFR: 8256916: Add JFR event for OutOfMemoryError Message-ID: An OOM in Metaspace is reported by the `MetaspaceOOM` JFR event, but other OOMs (e.g. Java heap) are not reported. It would be useful if JFR reported those OOMs too. ------------- Commit messages: - 8256916: Add JFR event for OutOfMemoryError Changes: https://git.openjdk.java.net/jdk/pull/1403/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1403&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256916 Stats: 94 lines in 5 files changed: 94 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1403.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1403/head:pull/1403 PR: https://git.openjdk.java.net/jdk/pull/1403 From ysuenaga at openjdk.java.net Tue Nov 24 07:06:14 2020 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Tue, 24 Nov 2020 07:06:14 GMT Subject: RFR: 8256916: Add JFR event for OutOfMemoryError In-Reply-To: References: Message-ID: <73crkD-SepaAqEyV2wuGyO8tnAHjQmiIOHjN9zovm8M=.3ea49ef3-a2f6-487e-b86a-956832861a46@github.com> On Tue, 24 Nov 2020 04:31:53 GMT, Yasumasa Suenaga wrote: > An OOM in Metaspace is reported by the `MetaspaceOOM` JFR event, but other OOMs (e.g. Java heap) are not reported. > > It would be useful if JFR reported those OOMs too. None of the failures on GitHub Actions are caused by this PR.
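For comparison, an application can already record its own event through the public `jdk.jfr` API when it happens to catch an `OutOfMemoryError`. The sketch below is an application-level illustration only and is not what this PR implements (the PR adds a VM-side event); the class name, the event name `example.ApplicationOutOfMemory`, and the `message` field are all hypothetical.

```java
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

public class OomEventSketch {
    // Hypothetical application-defined event; the name and field are illustrative only.
    @Name("example.ApplicationOutOfMemory")
    @Label("Application OutOfMemoryError")
    @Category("Example")
    static class ApplicationOomEvent extends Event {
        @Label("Message")
        String message;
    }

    /** Tries to allocate a long[] of the given length; records an event if that fails. */
    static boolean allocateAndRecord(int length) {
        try {
            long[] block = new long[length];
            return block.length == length;
        } catch (OutOfMemoryError e) {
            ApplicationOomEvent event = new ApplicationOomEvent();
            event.message = String.valueOf(e.getMessage());
            event.commit(); // only written if a recording is running and the event is enabled
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(allocateAndRecord(16));
        // A request this large is expected to be refused on typical heap settings.
        System.out.println(allocateAndRecord(Integer.MAX_VALUE));
    }
}
```

This only covers errors the application itself catches, which is one reason a VM-level event is more useful.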
------------- PR: https://git.openjdk.java.net/jdk/pull/1403 From stuefe at openjdk.java.net Tue Nov 24 07:14:15 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 24 Nov 2020 07:14:15 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 Message-ID: Hi, may I please have reviews for these fixes to the ppc, ppcle builds after JDK-8254231. Once the build is fixed, the VM does not come up, but that is a separate issue (JDK-8256924). This roughly followed Aleksey's work on s390, but additionally I had to rename the _abi macro due to a name conflict with a class member in the new code. Thanks, Thomas ------------- Commit messages: - ppc-fixes Changes: https://git.openjdk.java.net/jdk/pull/1404/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1404&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256922 Stats: 211 lines in 18 files changed: 153 ins; 0 del; 58 mod Patch: https://git.openjdk.java.net/jdk/pull/1404.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1404/head:pull/1404 PR: https://git.openjdk.java.net/jdk/pull/1404 From dholmes at openjdk.java.net Tue Nov 24 07:21:55 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 24 Nov 2020 07:21:55 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 06:57:24 GMT, Thomas Stuefe wrote: > Hi, > > may I please have reviews for these fixes to the ppc, ppcle builds after JDK-8254231. > > Once the build is fixed, the VM does not come up, but that is a separate issue (JDK-8256924). > > This roughly followed Aleksey's work on s390, but additionally I had to rename the _abi macro due to a name conflict with a class member in the new code. > > Thanks, Thomas Hi Thomas, Generally looks okay but two of your new files have s390 in the name instead of ppc. Cheers, David ------------- Changes requested by dholmes (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1404 From stuefe at openjdk.java.net Tue Nov 24 07:30:20 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 24 Nov 2020 07:30:20 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 [v2] In-Reply-To: References: Message-ID: > Hi, > > may I please have reviews for these fixes to the ppc, ppcle builds after JDK-8254231. > > Once the build is fixed, VM does not come up, but that is a separate issue (JDK-8256924). > > This roughly followed Alekseys work on s390, but additionally I had to rename the _abi macro due to name conflict with a class member in the new code. > > Thanks, Thomas Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - include guard misnamed - Rename files ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1404/files - new: https://git.openjdk.java.net/jdk/pull/1404/files/4dfd04f9..4dc288c8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1404&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1404&range=00-01 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1404.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1404/head:pull/1404 PR: https://git.openjdk.java.net/jdk/pull/1404 From stuefe at openjdk.java.net Tue Nov 24 07:30:20 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 24 Nov 2020 07:30:20 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 [v2] In-Reply-To: References: Message-ID: <2rJ6l8aaw_AMAHDv2MZnNDH8cmJ4r_N6Vue4RocsvPw=.4465fc96-99bd-49f3-b91b-38013eb6741c@github.com> On Tue, 24 Nov 2020 07:19:07 GMT, David Holmes wrote: > Hi Thomas, > > Generally looks okay but two of your new files have s390 in the name instead of ppc. > > Cheers, > David Thanks David, fixed. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1404 From shade at openjdk.java.net Tue Nov 24 07:45:01 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 07:45:01 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 [v2] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 07:30:20 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for these fixes to the ppc, ppcle builds after JDK-8254231. >> >> Once the build is fixed, VM does not come up, but that is a separate issue (JDK-8256924). >> >> This roughly followed Alekseys work on s390, but additionally I had to rename the _abi macro due to name conflict with a class member in the new code. >> >> Thanks, Thomas > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - include guard misnamed > - Rename files A few minor nits. src/hotspot/cpu/ppc/foreign_globals_ppc.cpp line 39: > 37: return {}; > 38: } > 39: Excess empty line. src/hotspot/cpu/ppc/foreign_globals_ppc.hpp line 31: > 29: class ABIDescriptor {}; > 30: > 31: Excess double line. src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp line 3441: > 3439: return nullptr; > 3440: } > 3441: Excess new line. src/hotspot/cpu/ppc/frame_ppc.hpp line 121: > 119: }; > 120: > 121: #define _abi0(_component) \ When I was experimenting with this code, I thought to name it consistently with `_abi_reg_args_spill` definition a few lines below. That is, make it `_abi_reg_args`. Does that make more sense than `_abi0`? ------------- Changes requested by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1404 From pliden at openjdk.java.net Tue Nov 24 07:48:04 2020 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 24 Nov 2020 07:48:04 GMT Subject: RFR: 8256517: (ref) Reference.clear during reference processing may lose notification [v2] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 03:05:19 GMT, Kim Barrett wrote: >> Please review this change to Reference.clear() to address several issues. >> >> (JDK-8240696) For GCs using a SATB barrier, simply assigning the referent >> field to null may extend the lifetime of the referent value. >> >> (JDK-8240696) For GCs with concurrent reference processing, clearing the >> referent field during reference processing may discard the expected >> notification. >> >> Both of these are addressed by introducing a private native helper function >> for clearing the referent, rather than using an ordinary in-Java field >> assignment. Tests have been added for both of these issues. This required >> adding a new breakpoint in reference processing for ZGC. >> >> Of course, finalization adds some complexity to the problem. We deal with >> that by having FinalReference override clear. The implementation is >> provided by a new package-private method in Reference. (There are a number >> of alternatives, all of them clumsy; finalization is annoying that way.) >> >> While dealing with FinalReference clearing it was noted that the recent >> JDK-8256106 and JDK-8256370 have some problems. FinalizerHistogram was not >> updated to call the new Reference.getInactive(), instead still calling get() >> on FinalReferences, with the JDK-8256106 problems. Fixing that showed the >> assertion for inactive FinalReference added by JDK-8256370 used the wrong >> test. 
>> >> Rather than tracking down and changing all get() and clear() calls on final >> references and changing them to use getInactive and a new similar clear >> function, I've changed FinalReference to override get and clear, which call >> the helper functions in Reference. I've also renamed getInactive to be more >> explanatory and less convenient to call directly, and similarly named the >> helper for clear. This means that get/clear should never be called on an >> active FinalReference. That's already never done, and would have problems >> if it were. >> >> Testing: >> mach5 tier1-6 >> Local (linux-x64) tier1 using Shenandoah. >> New TestReferenceClearDuringMarking fails for G1 without these changes. >> New TestReferenceClearDuringReferenceProcessing fails for ZGC without these changes. > > Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: > > - new tests need ref in oldgen too > - remove obsolete comment about races with clear and enqueue Looks good! ------------- Marked as reviewed by pliden (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1376 From stuefe at openjdk.java.net Tue Nov 24 08:31:00 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 24 Nov 2020 08:31:00 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 [v2] In-Reply-To: References: Message-ID: <_9D-k2McpPS5JCghU8Em4sk9rYvZmOw4wDvmrKzmE84=.b4038142-ec82-4f27-9904-404298e3f5ba@github.com> On Tue, 24 Nov 2020 07:42:06 GMT, Aleksey Shipilev wrote: >> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: >> >> - include guard misnamed >> - Rename files > > src/hotspot/cpu/ppc/frame_ppc.hpp line 121: > >> 119: }; >> 120: >> 121: #define _abi0(_component) \ > > When I was experimenting with this code, I thought to name it consistently with `_abi_reg_args_spill` definition a few lines below. That is, make it `_abi_reg_args`. 
Does that make more sense than `_abi0`? Sure, but let's ask @RealLucy @TheRealMDoerr first to agree on a name. ------------- PR: https://git.openjdk.java.net/jdk/pull/1404 From eosterlund at openjdk.java.net Tue Nov 24 08:46:57 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Tue, 24 Nov 2020 08:46:57 GMT Subject: RFR: 8256754: Deoptimization::revoke_for_object_deoptimization: stack processing start call is redundant In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 11:40:49 GMT, Richard Reingruber wrote: > This is an XS clean-up of Deoptimization::revoke_for_object_deoptimization() which removes the StackWatermarkSet::start_processing() call. > > This is correct because all paths leading to revoke_for_object_deoptimization() are equipped with a KeepStackGCProcessedMark. > > Call Tree: > > StackWatermarkSet::start_processing(JavaThread *, enum StackWatermarkKind) : void > Deoptimization::revoke_for_object_deoptimization(JavaThread *, frame, RegisterMap *, JavaThread *) : void > Deoptimization::deoptimize_objects_internal(JavaThread *, GrowableArray *, bool &) : bool > EscapeBarrier::deoptimize_objects_internal(JavaThread *, intptr_t *) : bool > EscapeBarrier::deoptimize_objects_all_threads() : bool // has KeepStackGCProcessedMark > EscapeBarrier::deoptimize_objects(intptr_t *) : bool > EscapeBarrier::deoptimize_objects(int, int) : bool // has KeepStackGCProcessedMark > > Testing: hotspot_serviceability, jdk_svc, jdk_jdi, vmTestbase_nsk_jdi, vmTestbase_nsk_jvmti, vmTestbase_nsk_jdwp with -XX:+UseZGC. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1381 From github.com+779991+jaokim at openjdk.java.net Tue Nov 24 09:11:16 2020 From: github.com+779991+jaokim at openjdk.java.net (Joakim Nordström) Date: Tue, 24 Nov 2020 09:11:16 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size [v4] In-Reply-To: References: Message-ID: > ### Description > The logging for reserved heap space printed the `GenAlignment` value instead of the page size. > Before: >
> [0.369s][info][gc      ] Using Parallel
> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=512K size=2M
> 
> Besides changing to display page size using `os::vm_page_size()`, this fix also adds logging of `SpaceAlignment`, `GenAlignment`, and `HeapAlignment`. > After: >
> [0.367s][info][pagesize] Alignments:  space_align=512K gen_align=512K heap_align=2M
> [0.369s][info][gc      ] Using Parallel
> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=4K size=2M
> 
> Please advise whether the added logging of alignments is sufficient or irrelevant. There were no signs of either alignment being logged anywhere. > ### Testing > - Tested manually, and logs are showing with arguments -Xlog:pagesize=info -XX:+UseParallelGC > - Tested Tier1, Tier2, Tier3. Joakim Nordström has updated the pull request incrementally with two additional commits since the last revision: - Moved helper function actual_reserved_page_size to ReservedSpace - Changed so heap info is logged as heap info and not gc_specific ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1161/files - new: https://git.openjdk.java.net/jdk/pull/1161/files/cd9773d2..d2d29b46 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1161&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1161&range=02-03 Stats: 70 lines in 6 files changed: 20 ins; 32 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/1161.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1161/head:pull/1161 PR: https://git.openjdk.java.net/jdk/pull/1161 From forax at univ-mlv.fr Tue Nov 24 09:45:14 2020 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 24 Nov 2020 10:45:14 +0100 (CET) Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) In-Reply-To: <96418b57-12fe-208e-c642-b04efeedca24@oracle.com> References: <618aa897-18fd-fd70-1f0a-506e0e5a74d8@oracle.com> <96418b57-12fe-208e-c642-b04efeedca24@oracle.com> Message-ID: <353454426.1152629.1606211114626.JavaMail.zimbra@u-pem.fr> ----- Original message ----- > From: "David Holmes" > To: "Harold David Seigel" , "Vicente Romero" , "compiler-dev" > , "core-libs-dev" , "hotspot-dev" > > Sent: Tuesday, 24 November 2020 02:04:55 > Subject: Re: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) > Hi Harold, > > On 24/11/2020 6:27 am, Harold Seigel wrote: >> Hi David, >> >> Thanks for looking at this.
>> >> The intent was for method Class.permittedSubclasses() to be implemented >> similarly to Class.getNestMembers(). Are you suggesting that a security >> manager check be added to permittedSubclasses() similar to the security >> manager check in getNestMembers()? > > No I'm suggesting the change to the API is plain wrong. :) Please see > discussion in the CSR. Given that the CSR is closed, I will answer here. There are two issues with the previous implementation of permittedSubclasses. First, it's the only method that uses method desc, which means that people have to be aware of another non-trivial concept (an object that describes a constant pool constant) to understand how to use the method. Second, I've tested this API with my students, all but one what able to correctly derives the Class from a ClassDesc, so instead of asking every user of permittedSubclasses to go through the hoops of getting a Class from a ClassDesc, I think we can agree that it's better to move the burden from the user to the JDK implementors. > > Cheers, > David regards, Rémi > >> Thanks, Harold >> >> On 11/18/2020 12:31 AM, David Holmes wrote: >>> Hi Vicente, >>> >>> On 16/11/2020 11:36 pm, Vicente Romero wrote: >>>> Please review the code for the second iteration of sealed classes. In >>>> this iteration we are: >>>> >>>> - Enhancing narrowing reference conversion to allow for stricter >>>> checking of cast conversions with respect to sealed type hierarchies. >>>> - Also local classes are not considered when determining implicitly >>>> declared permitted direct subclasses of a sealed class or sealed >>>> interface >>> >>> The major change here seems to be that getPermittedSubclasses() now >>> returns actual Class objects instead of ClassDesc. My recollection >>> from earlier discussions here was that the use of ClassDesc was very >>> deliberate as the permitted subclasses may not actually exist and >>> there may be security concerns with returning them!
>>> >>> Cheers, >>> David >>> ----- >>> >>>> ------------- >>>> >>>> Commit messages: >>>> - 8246778: Compiler implementation for Sealed Classes (Second Preview) >>>> >>>> Changes: https://git.openjdk.java.net/jdk/pull/1227/files >>>> Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1227&range=00 >>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8246778 >>>> Stats: 589 lines in 12 files changed: 508 ins; 18 del; 63 mod >>>> Patch: https://git.openjdk.java.net/jdk/pull/1227.diff >>>> Fetch: git fetch https://git.openjdk.java.net/jdk >>>> pull/1227/head:pull/1227 >>>> >>>> PR: https://git.openjdk.java.net/jdk/pull/1227 From shade at openjdk.java.net Tue Nov 24 09:46:56 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 09:46:56 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 [v2] In-Reply-To: <_9D-k2McpPS5JCghU8Em4sk9rYvZmOw4wDvmrKzmE84=.b4038142-ec82-4f27-9904-404298e3f5ba@github.com> References: <_9D-k2McpPS5JCghU8Em4sk9rYvZmOw4wDvmrKzmE84=.b4038142-ec82-4f27-9904-404298e3f5ba@github.com> Message-ID: On Tue, 24 Nov 2020 08:28:32 GMT, Thomas Stuefe wrote: >> src/hotspot/cpu/ppc/frame_ppc.hpp line 121: >> >>> 119: }; >>> 120: >>> 121: #define _abi0(_component) \ >> >> When I was experimenting with this code, I thought to name it consistently with `_abi_reg_args_spill` definition a few lines below. That is, make it `_abi_reg_args`. Does that make more sense than `_abi0`? > > Sure, but lets ask @RealLucy @TheRealMDoerr first to agree on a name. Pardon me, I did not mean to stall this build-fixing integration. `_abi0` is an okay name, at least temporarily. We can rename it properly later.
------------- PR: https://git.openjdk.java.net/jdk/pull/1404 From github.com+779991+jaokim at openjdk.java.net Tue Nov 24 10:16:15 2020 From: github.com+779991+jaokim at openjdk.java.net (Joakim Nordström) Date: Tue, 24 Nov 2020 10:16:15 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size [v5] In-Reply-To: References: Message-ID: <7GFQmOfxp_L1E6MdyCZg1aBpAr569iz9Ceqjh_nDChc=.766ea03c-17ad-488d-b6df-03e90d2a2625@github.com> > ### Description > The logging for reserved heap space printed the `GenAlignment` value instead of the page size. > Before: >
> [0.369s][info][gc      ] Using Parallel
> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=512K size=2M
> 
> Besides changing to display page size using `os::vm_page_size()`, this fix also adds logging of `SpaceAlignment`, `GenAlignment`, and `HeapAlignment`. > After: >
> [0.367s][info][pagesize] Alignments:  space_align=512K gen_align=512K heap_align=2M
> [0.369s][info][gc      ] Using Parallel
> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=4K size=2M
> 
> Please advise whether the added logging of alignments is sufficient or irrelevant. There were no signs of either alignment being logged anywhere. > ### Testing > - Tested manually, and logs are showing with arguments -Xlog:pagesize=info -XX:+UseParallelGC > - Tested Tier1, Tier2, Tier3. Joakim Nordström has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - Merge remote-tracking branch 'origin/master' into JBS-8180069 - Moved helper function actual_reserved_page_size to ReservedSpace - Changed so heap info is logged as heap info and not gc_specific - Fixed whitespaces. - Added ParallelInitLogger with alignment logging - Added ParallelInitLogger with alignment logging - Added logging method that logs actual reserved page sizes, and not just vm_page_size. - 8180069: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size ------------- Changes: https://git.openjdk.java.net/jdk/pull/1161/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1161&range=04 Stats: 145 lines in 7 files changed: 117 ins; 24 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/1161.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1161/head:pull/1161 PR: https://git.openjdk.java.net/jdk/pull/1161 From lucy at openjdk.java.net Tue Nov 24 10:21:58 2020 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Tue, 24 Nov 2020 10:21:58 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 [v2] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 07:30:20 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for these fixes to the ppc, ppcle builds after JDK-8254231. >> >> Once the build is fixed, VM does not come up, but that is a separate issue (JDK-8256924).
>> >> This roughly followed Alekseys work on s390, but additionally I had to rename the _abi macro due to name conflict with a class member in the new code. >> >> Thanks, Thomas > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - include guard misnamed > - Rename files The changes look good to me. WRT the macro name (_abi0 or _abi_reg_args), I'm very much in favour of the latter. But that should not stall the build fix integration, as Aleksey notes correctly. Maybe there is a volunteer changing the name in a separate effort. ------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1404 From mdoerr at openjdk.java.net Tue Nov 24 10:21:59 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 24 Nov 2020 10:21:59 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 [v2] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 07:30:20 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for these fixes to the ppc, ppcle builds after JDK-8254231. >> >> Once the build is fixed, VM does not come up, but that is a separate issue (JDK-8256924). >> >> This roughly followed Alekseys work on s390, but additionally I had to rename the _abi macro due to name conflict with a class member in the new code. >> >> Thanks, Thomas > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - include guard misnamed > - Rename files Marked as reviewed by mdoerr (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1404 From mdoerr at openjdk.java.net Tue Nov 24 10:22:01 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 24 Nov 2020 10:22:01 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 [v2] In-Reply-To: References: <_9D-k2McpPS5JCghU8Em4sk9rYvZmOw4wDvmrKzmE84=.b4038142-ec82-4f27-9904-404298e3f5ba@github.com> Message-ID: On Tue, 24 Nov 2020 09:44:24 GMT, Aleksey Shipilev wrote: >> Sure, but lets ask @RealLucy @TheRealMDoerr first to agree on a name. > > Pardon me, I did not mean to stall this build-fixing integration. `_abi0` is an okay name, at least temporarily. We can rename it properly later. Please don't use `_abi_reg_args_spill` because it's often used for `abi_minframe` and other derived structures, too. We could name it `_ppc_abi` because it refers to the layout specified in the 64-bit PowerPC ELF Application Binary Interface. But I'm also fine with with `_abi0`. ------------- PR: https://git.openjdk.java.net/jdk/pull/1404 From lucy at openjdk.java.net Tue Nov 24 10:22:01 2020 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Tue, 24 Nov 2020 10:22:01 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 [v2] In-Reply-To: References: <_9D-k2McpPS5JCghU8Em4sk9rYvZmOw4wDvmrKzmE84=.b4038142-ec82-4f27-9904-404298e3f5ba@github.com> Message-ID: On Tue, 24 Nov 2020 10:14:16 GMT, Martin Doerr wrote: >> Pardon me, I did not mean to stall this build-fixing integration. `_abi0` is an okay name, at least temporarily. We can rename it properly later. > > Please don't use `_abi_reg_args_spill` because it's often used for `abi_minframe` and other derived structures, too. > We could name it `_ppc_abi` because it refers to the layout specified in the 64-bit PowerPC ELF Application Binary Interface. > But I'm also fine with with `_abi0`. @TheRealMDoerr Why not use `_abi_reg_args`, as Aleksey suggested? Do you see an issue with that name as well? 
------------- PR: https://git.openjdk.java.net/jdk/pull/1404 From sjohanss at openjdk.java.net Tue Nov 24 10:26:07 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 24 Nov 2020 10:26:07 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size [v5] In-Reply-To: <7GFQmOfxp_L1E6MdyCZg1aBpAr569iz9Ceqjh_nDChc=.766ea03c-17ad-488d-b6df-03e90d2a2625@github.com> References: <7GFQmOfxp_L1E6MdyCZg1aBpAr569iz9Ceqjh_nDChc=.766ea03c-17ad-488d-b6df-03e90d2a2625@github.com> Message-ID: On Tue, 24 Nov 2020 10:16:15 GMT, Joakim Nordstr?m wrote: >> ### Description >> The logging for reserved heap space, printed the `GenAlignment` value instead of page size. >> Before: >>
>> [0.369s][info][gc      ] Using Parallel
>> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=512K size=2M
>> 
>> Besides changing the log to display the page size obtained from `os::vm_page_size()`, this fix also adds logging of `SpaceAlignment`, `GenAlignment`, and `HeapAlignment`.
>> After:
>>
>> [0.367s][info][pagesize] Alignments:  space_align=512K gen_align=512K heap_align=2M
>> [0.369s][info][gc      ] Using Parallel
>> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=4K size=2M
>> 
>> Please advice whether the added logging of alignments is sufficient or irrelevant. There were no signs of either alignment being logged anywhere. >> ### Testing >> - Tested manually, and logs are showing with arguments -Xlog:pagesize=info -XX:+UseParallelGC >> - Tested Tier1, Tier2, Tier3. > > Joakim Nordstr?m has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge remote-tracking branch 'origin/master' into JBS-8180069 > - Moved helper function actual_reserved_page_size to ReservedSpace > - Changed so heap info is logged as heap info and not gc_specific > - Fixed whitespaces. > - Added ParallelInitLogger with alignment logging > - Added ParallelInitLogger with alignment logging > - Added logging method that logs actual reserved page sizes, and not just vm_page_size. > - 8180069: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size Thanks for addressing my comments, just one small nit left from me. src/hotspot/share/gc/parallel/parallelInitLogger.cpp line 38: > 36: byte_size_in_exact_unit(GenAlignment), exact_unit_for_byte_size(GenAlignment), > 37: byte_size_in_exact_unit(HeapAlignment), exact_unit_for_byte_size(HeapAlignment) > 38: ); Move `);` up to the line above. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1161 From mdoerr at openjdk.java.net Tue Nov 24 10:28:55 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 24 Nov 2020 10:28:55 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 [v2] In-Reply-To: References: <_9D-k2McpPS5JCghU8Em4sk9rYvZmOw4wDvmrKzmE84=.b4038142-ec82-4f27-9904-404298e3f5ba@github.com> Message-ID: On Tue, 24 Nov 2020 10:18:35 GMT, Lutz Schmidt wrote: >> Please don't use `_abi_reg_args_spill` because it's often used for `abi_minframe` and other derived structures, too. 
>> We could name it `_ppc_abi` because it refers to the layout specified in the 64-bit PowerPC ELF Application Binary Interface. >> But I'm also fine with with `_abi0`. > > @TheRealMDoerr Why not use `_abi_reg_args`, as Aleksey suggested? Do you see an issue with that name as well? I think that would be confusing, because `_abi_reg_args` sounds like we had a `abi_reg_args` structure on stack which is often not the case. We are using it for any frame structure which is based on `abi_minframe`. ------------- PR: https://git.openjdk.java.net/jdk/pull/1404 From stuefe at openjdk.java.net Tue Nov 24 11:03:17 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 24 Nov 2020 11:03:17 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 [v3] In-Reply-To: References: Message-ID: > Hi, > > may I please have reviews for these fixes to the ppc, ppcle builds after JDK-8254231. > > Once the build is fixed, VM does not come up, but that is a separate issue (JDK-8256924). > > This roughly followed Alekseys work on s390, but additionally I had to rename the _abi macro due to name conflict with a class member in the new code. 
> > Thanks, Thomas Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: remove excess newlines ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1404/files - new: https://git.openjdk.java.net/jdk/pull/1404/files/4dc288c8..f29b4129 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1404&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1404&range=01-02 Stats: 3 lines in 3 files changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1404.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1404/head:pull/1404 PR: https://git.openjdk.java.net/jdk/pull/1404 From stuefe at openjdk.java.net Tue Nov 24 11:03:18 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 24 Nov 2020 11:03:18 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 [v2] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 10:15:42 GMT, Martin Doerr wrote: >> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: >> >> - include guard misnamed >> - Rename files > > Marked as reviewed by mdoerr (Reviewer). Thanks guys, I fixed the newlines and will integrate. Renaming of the abi macro I leave to a follow up. ------------- PR: https://git.openjdk.java.net/jdk/pull/1404 From stuefe at openjdk.java.net Tue Nov 24 11:03:19 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 24 Nov 2020 11:03:19 GMT Subject: Integrated: JDK-8256922: ppc, ppcle build broken after JDK-8254231 In-Reply-To: References: Message-ID: <8OAX1rRXdVcYY3-AFGZf_1eeT8dpYVD9s4G1ppzpvcA=.72462050-cc28-4527-b132-551f12ed5725@github.com> On Tue, 24 Nov 2020 06:57:24 GMT, Thomas Stuefe wrote: > Hi, > > may I please have reviews for these fixes to the ppc, ppcle builds after JDK-8254231. > > Once the build is fixed, VM does not come up, but that is a separate issue (JDK-8256924). 
> > This roughly followed Alekseys work on s390, but additionally I had to rename the _abi macro due to name conflict with a class member in the new code. > > Thanks, Thomas This pull request has now been integrated. Changeset: f8d7c5a5 Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/f8d7c5a5 Stats: 208 lines in 18 files changed: 150 ins; 0 del; 58 mod 8256922: ppc, ppcle build broken after JDK-8254231 Reviewed-by: shade, lucy, mdoerr ------------- PR: https://git.openjdk.java.net/jdk/pull/1404 From shade at openjdk.java.net Tue Nov 24 11:03:18 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Nov 2020 11:03:18 GMT Subject: RFR: JDK-8256922: ppc, ppcle build broken after JDK-8254231 [v3] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 11:00:24 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for these fixes to the ppc, ppcle builds after JDK-8254231. >> >> Once the build is fixed, VM does not come up, but that is a separate issue (JDK-8256924). >> >> This roughly followed Alekseys work on s390, but additionally I had to rename the _abi macro due to name conflict with a class member in the new code. >> >> Thanks, Thomas > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > remove excess newlines Looks fine, let's push and unbreak the builds. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1404 From rrich at openjdk.java.net Tue Nov 24 11:27:10 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Tue, 24 Nov 2020 11:27:10 GMT Subject: RFR: 8256754: Deoptimization::revoke_for_object_deoptimization: stack processing start call is redundant [v2] In-Reply-To: References: Message-ID: > This is a XS clean-up of Deoptimization::revoke_for_object_deoptimization() which removes the StackWatermarkSet::start_processing() call. 
>
> This is correct because all paths leading to revoke_for_object_deoptimization() are equipped with a KeepStackGCProcessedMark.
>
> Call Tree:
>
> StackWatermarkSet::start_processing(JavaThread *, enum StackWatermarkKind) : void
> Deoptimization::revoke_for_object_deoptimization(JavaThread *, frame, RegisterMap *, JavaThread *) : void
> Deoptimization::deoptimize_objects_internal(JavaThread *, GrowableArray *, bool &) : bool
> EscapeBarrier::deoptimize_objects_internal(JavaThread *, intptr_t *) : bool
> EscapeBarrier::deoptimize_objects_all_threads() : bool // has KeepStackGCProcessedMark
> EscapeBarrier::deoptimize_objects(intptr_t *) : bool
> EscapeBarrier::deoptimize_objects(int, int) : bool // has KeepStackGCProcessedMark
>
> Testing: hotspot_serviceability, jdk_svc, jdk_jdi, vmTestbase_nsk_jdi, vmTestbase_nsk_jvmti, vmTestbase_nsk_jdwp with -XX:+UseZGC.

Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:

  Assert stack is kept gc processed.
-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1381/files
  - new: https://git.openjdk.java.net/jdk/pull/1381/files/f09aaf9b..7f61dbeb

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1381&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1381&range=00-01

  Stats: 21 lines in 4 files changed: 21 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1381.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1381/head:pull/1381

PR: https://git.openjdk.java.net/jdk/pull/1381

From rrich at openjdk.java.net  Tue Nov 24 11:31:56 2020
From: rrich at openjdk.java.net (Richard Reingruber)
Date: Tue, 24 Nov 2020 11:31:56 GMT
Subject: RFR: 8256754: Deoptimization::revoke_for_object_deoptimization: stack processing start call is redundant [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 23 Nov 2020 22:41:03 GMT, Dean Long wrote:

>> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Assert stack is kept gc processed.
>
> The justification for removing this code is that all callers use KeepStackGCProcessedMark. Is there an assert you can add in place of the removed code that checks this invariant?

Thanks for having a look @dean-long

>
>
> The justification for removing this code is that all callers use KeepStackGCProcessedMark. Is there an assert you can add in place of the removed code that checks this invariant?

I don't think there is an assert I could use. I added code that allows for asserting this. Let me know which version you prefer.
-------------

PR: https://git.openjdk.java.net/jdk/pull/1381

From rrich at openjdk.java.net  Tue Nov 24 11:34:59 2020
From: rrich at openjdk.java.net (Richard Reingruber)
Date: Tue, 24 Nov 2020 11:34:59 GMT
Subject: RFR: 8256754: Deoptimization::revoke_for_object_deoptimization: stack processing start call is redundant [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 24 Nov 2020 08:43:55 GMT, Erik Österlund wrote:

>
>
> Looks good.

Thank you @fisk

Note that I added code to assert that the remote stack is kept gc processed. Just in case you have comments on that.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1381

From david.holmes at oracle.com  Tue Nov 24 12:16:39 2020
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 24 Nov 2020 22:16:39 +1000
Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview)
In-Reply-To: <353454426.1152629.1606211114626.JavaMail.zimbra@u-pem.fr>
References: <618aa897-18fd-fd70-1f0a-506e0e5a74d8@oracle.com> <96418b57-12fe-208e-c642-b04efeedca24@oracle.com> <353454426.1152629.1606211114626.JavaMail.zimbra@u-pem.fr>
Message-ID:

Hi Remi,

On 24/11/2020 7:45 pm, Remi Forax wrote:
> ----- Original Message -----
>> From: "David Holmes"
>> To: "Harold David Seigel", "Vicente Romero", "compiler-dev", "core-libs-dev", "hotspot-dev"
>> Sent: Tuesday, 24 November 2020 02:04:55
>> Subject: Re: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview)
>
>> Hi Harold,
>>
>> On 24/11/2020 6:27 am, Harold Seigel wrote:
>>> Hi David,
>>>
>>> Thanks for looking at this.
>>>
>>> The intent was for method Class.permittedSubclasses() to be implemented
>>> similarly to Class.getNestMembers(). Are you suggesting that a security
>>> manager check be added to permittedSubclasses() similar to the security
>>> manager check in getNestMembers()?
>>
>> No I'm suggesting the change to the API is plain wrong. :) Please see
>> discussion in the CSR.
>
> Given that the CSR is closed, I will answer here.
> There are two issues with the previous implementation of permittedSubclasses, first it's the only method that using method desc which means that people has to be aware on another non trivial concept (object that describes constant pool constant) to understand how to use the method then i've tested this API with my students, all but one what able to correctly derives the Class from a ClassDesc, so instead of asking every users of permittedSubclasses to go through the oops of getting Class from a ClassDesc, i think we can agree that it's better to move the burden from the user to the JDK implementors. Why is the objective to get the Class objects? What purpose does that serve? The original API provided a descriptor for the contents of the permittedSubclasses attribute. I find it totally bizarre to have an API whose role is now to attempt to load all the subclasses of a sealed class. YMMV. David >> >> Cheers, >> David > > regards, > R?mi > >> >>> Thanks, Harold >>> >>> On 11/18/2020 12:31 AM, David Holmes wrote: >>>> Hi Vincente, >>>> >>>> On 16/11/2020 11:36 pm, Vicente Romero wrote: >>>>> Please review the code for the second iteration of sealed classes. In >>>>> this iteration we are: >>>>> >>>>> - Enhancing narrowing reference conversion to allow for stricter >>>>> checking of cast conversions with respect to sealed type hierarchies. >>>>> - Also local classes are not considered when determining implicitly >>>>> declared permitted direct subclasses of a sealed class or sealed >>>>> interface >>>> >>>> The major change here seems to be that getPermittedSubclasses() now >>>> returns actual Class objects instead of ClassDesc. My recollection >>>> from earlier discussions here was that the use of ClassDesc was very >>>> deliberate as the permitted subclasses may not actually exist and >>>> there may be security concerns with returning them! >>>> >>>> Cheers, >>>> David >>>> ----- >>>> >>>>> ------------- >>>>> >>>>> Commit messages: >>>>> ? 
- 8246778: Compiler implementation for Sealed Classes (Second Preview)
>>>>>
>>>>> Changes: https://git.openjdk.java.net/jdk/pull/1227/files
>>>>>   Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1227&range=00
>>>>>    Issue: https://bugs.openjdk.java.net/browse/JDK-8246778
>>>>>    Stats: 589 lines in 12 files changed: 508 ins; 18 del; 63 mod
>>>>>    Patch: https://git.openjdk.java.net/jdk/pull/1227.diff
>>>>>    Fetch: git fetch https://git.openjdk.java.net/jdk pull/1227/head:pull/1227
>>>>>
>>>>> PR: https://git.openjdk.java.net/jdk/pull/1227

From stefank at openjdk.java.net  Tue Nov 24 12:50:04 2020
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Tue, 24 Nov 2020 12:50:04 GMT
Subject: RFR: 8256382: Use try_lock for hs_err EventLog printing
Message-ID:

The EventLog locks are taken when the hs_err files are generated. Since crashes and asserts can occur when other locks are held, this can cause lock reordering problems if the held locks also are low-rank locks. There's no way to solve this if blocking locks are taken.

I hit this problem when investigating making the GCLogPrecious lock use the lowest lock rank (same as EventLog). See JDK-8254877.

Both GCLogPrecious and EventLog are considered "leaf" locks. No other locks should be taken when those locks are taken. However, if we crash in either of these sub-systems, there will be a lock-reordering error message in the hs_err file, and the rest of the logged info is skipped in the currently logged section.

The proposal is to use try_lock_without_range_check and only log information if the lock could be acquired without blocking. This relies on the new try_lock_without_range_check function from JDK-8255678.

I've tested this by injecting crashes while not holding locks in both GCLogPrecious, while holding locks during EventLog logging, and when not holding the locks, and verified that we get the expected behavior.
Example output while crashing during 'Internal exceptions' logging:

Classes redefined (0 events):
No events

Internal exceptions (5 events):
No events printed - crash while holding lock

Events (20 events):
Event: 1,437 loading class java/util/HashMap$KeyIterator
Event: 1,438 loading class java/util/HashMap$KeyIterator done
Event: 1,438 loading class java/lang/module/ModuleDescriptor$Exports

-------------

Commit messages:
 - 8256382: Use try_lock for hs_err EventLog printing

Changes: https://git.openjdk.java.net/jdk/pull/1408/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1408&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8256382
  Stats: 30 lines in 1 file changed: 26 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1408.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1408/head:pull/1408

PR: https://git.openjdk.java.net/jdk/pull/1408

From tschatzl at openjdk.java.net  Tue Nov 24 13:11:09 2020
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Tue, 24 Nov 2020 13:11:09 GMT
Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size [v5]
In-Reply-To: <7GFQmOfxp_L1E6MdyCZg1aBpAr569iz9Ceqjh_nDChc=.766ea03c-17ad-488d-b6df-03e90d2a2625@github.com>
References: <7GFQmOfxp_L1E6MdyCZg1aBpAr569iz9Ceqjh_nDChc=.766ea03c-17ad-488d-b6df-03e90d2a2625@github.com>
Message-ID:

On Tue, 24 Nov 2020 10:16:15 GMT, Joakim Nordström wrote:

>> ### Description
>> The logging for reserved heap space, printed the `GenAlignment` value instead of page size.
>> Before:
>>
>> [0.369s][info][gc      ] Using Parallel
>> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=512K size=2M
>> 
>> Besides changing the log to display the page size obtained from `os::vm_page_size()`, this fix also adds logging of `SpaceAlignment`, `GenAlignment`, and `HeapAlignment`.
>> After:
>>
>> [0.367s][info][pagesize] Alignments:  space_align=512K gen_align=512K heap_align=2M
>> [0.369s][info][gc      ] Using Parallel
>> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=4K size=2M
>> 
>> Please advice whether the added logging of alignments is sufficient or irrelevant. There were no signs of either alignment being logged anywhere. >> ### Testing >> - Tested manually, and logs are showing with arguments -Xlog:pagesize=info -XX:+UseParallelGC >> - Tested Tier1, Tier2, Tier3. > > Joakim Nordstr?m has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge remote-tracking branch 'origin/master' into JBS-8180069 > - Moved helper function actual_reserved_page_size to ReservedSpace > - Changed so heap info is logged as heap info and not gc_specific > - Fixed whitespaces. > - Added ParallelInitLogger with alignment logging > - Added ParallelInitLogger with alignment logging > - Added logging method that logs actual reserved page sizes, and not just vm_page_size. > - 8180069: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1161 From stuefe at openjdk.java.net Tue Nov 24 13:33:54 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 24 Nov 2020 13:33:54 GMT Subject: RFR: 8256382: Use try_lock for hs_err EventLog printing In-Reply-To: References: Message-ID: <69fkZdGvgYprbfVE4dKZfkhBgnOPiNEoyVy3FpsDGck=.0855b1bf-da14-4f18-9808-0f0fd1bf0cba@github.com> On Tue, 24 Nov 2020 12:46:08 GMT, Stefan Karlsson wrote: > The EventLog locks are taken when the hs_err files are generated. Since crashes and asserts can occur when other locks are held, this can cause lock reordering problems if the held locks also are low-rank locks. There's no way to solve this if blocking locks are taken. > > I hit this problem when investigating making the GCLogPrecious lock use the lowest lock rank (same as EventLog). See JDK-8254877. > > Both GCLogPrecious and EventLog are considered "leaf" locks. 
No other locks should be taken when those locks are taken. However, if we crash in either of these sub-systems, there will be a lock-reordering error message in the hs_err file, and the rest of the logged info is skipped in the currently logged section.
>
> The proposal is to use try_lock_without_range_check and only log information if the lock could be acquired without blocking. This relies on the new try_lock_without_range_check function from JDK-8255678.
>
> I've tested this by injecting crashes while not holding locks in both GCLogPrecious, while holding locks during EventLog logging, and when not holding the locks, and verified that we get the expected behavior.
>
> Example output while crashing during 'Internal exceptions' logging:
> Classes redefined (0 events):
> No events
>
> Internal exceptions (5 events):
> No events printed - crash while holding lock
>
> Events (20 events):
> Event: 1,437 loading class java/util/HashMap$KeyIterator
> Event: 1,438 loading class java/util/HashMap$KeyIterator done
> Event: 1,438 loading class java/lang/module/ModuleDescriptor$Exports

Hi Stefan,

can't we just print out unconditionally if VMError::is_error_reported()? What's the worst that can happen, we crash? Then we would continue with the next reporting step. But we still have a chance to see the event buffer contents even if the event logger itself crashed or asserted.

Side note, some time ago I rewrote the whole event system:

https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-March/033150.html

to a much simpler implementation, and I think that one would also be pretty safe to print even if unlocked, since it works on a static pre-allocated buffer. But that work somehow got bogged down in review and I never got around to driving it upstream. If there is enough interest I may take up the work again.
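
As a reader's aid, the try-lock idea debated in this thread can be sketched outside HotSpot. What follows is a minimal standalone C++ illustration under stated assumptions, not the patch itself: `TryLock`, `EventLogSketch`, `hold_lock_as_if_crashed`, `print_on_error` and `report` are hypothetical names invented here, and a `std::atomic_flag` stands in for the VM mutex. That substitution is deliberate, since calling `std::mutex::try_lock` from a thread that already owns the mutex is undefined behavior, and that is exactly the crash-while-holding-the-lock case being discussed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical non-blocking lock standing in for the VM mutex in this sketch.
class TryLock {
 public:
  // Returns true if the lock was acquired. Never blocks.
  bool try_lock() { return !_flag.test_and_set(std::memory_order_acquire); }
  void unlock() { _flag.clear(std::memory_order_release); }

 private:
  std::atomic_flag _flag = ATOMIC_FLAG_INIT;
};

// Hypothetical event log; this is not the HotSpot EventLog class.
class EventLogSketch {
 public:
  explicit EventLogSketch(std::string name) : _name(std::move(name)), _count(0) {}

  // Normal logging path. A real log would block on its lock here; the sketch
  // only asserts, so that it stays single-threaded and deterministic.
  void log(const std::string& msg) {
    const bool locked = _lock.try_lock();
    assert(locked && "sketch: log() called while the lock is held");
    (void)locked;
    _events.push_back(msg);
    _count.store(_events.size(), std::memory_order_relaxed);
    _lock.unlock();
  }

  // Simulates a crash that happens while the log lock is held:
  // the lock is taken and never released.
  void hold_lock_as_if_crashed() {
    const bool locked = _lock.try_lock();
    assert(locked && "sketch: lock already held");
    (void)locked;
  }

  // Error-reporting path: print the events only if the lock can be taken
  // without blocking; otherwise print a short notice and move on.
  void print_on_error(std::ostream& out) {
    out << _name << " (" << _count.load(std::memory_order_relaxed) << " events):\n";
    if (!_lock.try_lock()) {
      out << "No events printed - crash while holding lock\n";
      return;
    }
    if (_events.empty()) {
      out << "No events\n";
    }
    for (const std::string& e : _events) {
      out << "Event: " << e << "\n";
    }
    _lock.unlock();
  }

 private:
  std::string _name;
  TryLock _lock;
  std::vector<std::string> _events;
  std::atomic<std::size_t> _count;  // readable without taking the lock
};

// Convenience wrapper that returns the error report as a string.
std::string report(EventLogSketch& log) {
  std::ostringstream out;
  log.print_on_error(out);
  return out.str();
}
```

Calling `report(log)` on a log whose lock is free lists the events; calling it after `hold_lock_as_if_crashed()` prints only the header and the notice, so the reporter never blocks and can continue with its next section. The event count is kept in a separate atomic so the header can still be printed when the lock is unavailable, which is a design choice of this sketch rather than something stated in the thread.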
------------- PR: https://git.openjdk.java.net/jdk/pull/1408 From forax at univ-mlv.fr Tue Nov 24 13:56:22 2020 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Tue, 24 Nov 2020 14:56:22 +0100 (CET) Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) In-Reply-To: References: <618aa897-18fd-fd70-1f0a-506e0e5a74d8@oracle.com> <96418b57-12fe-208e-c642-b04efeedca24@oracle.com> <353454426.1152629.1606211114626.JavaMail.zimbra@u-pem.fr> Message-ID: <555245742.1403793.1606226182119.JavaMail.zimbra@u-pem.fr> ----- Mail original ----- > De: "David Holmes" > ?: "Remi Forax" > Cc: "Harold David Seigel" , "Vicente Romero" , "compiler-dev" > , "core-libs-dev" , "hotspot-dev" > > Envoy?: Mardi 24 Novembre 2020 13:16:39 > Objet: Re: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) > Hi Remi, > > On 24/11/2020 7:45 pm, Remi Forax wrote: >> ----- Mail original ----- >>> De: "David Holmes" >>> ?: "Harold David Seigel" , "Vicente Romero" >>> , "compiler-dev" >>> , "core-libs-dev" >>> , "hotspot-dev" >>> >>> Envoy?: Mardi 24 Novembre 2020 02:04:55 >>> Objet: Re: RFR: 8246778: Compiler implementation for Sealed Classes (Second >>> Preview) >> >>> Hi Harold, >>> >>> On 24/11/2020 6:27 am, Harold Seigel wrote: >>>> Hi David, >>>> >>>> Thanks for looking at this. >>>> >>>> The intent was for method Class.permittedSubclasses() to be implemented >>>> similarly to Class.getNestMembers().? Are you suggesting that a security >>>> manager check be added to permittedSubclasses() similar to the security >>>> manager check in getNestMembers()? >>> >>> No I'm suggesting the change to the API is plain wrong. :) Please see >>> discussion in the CSR. >> >> Given that the CSR is closed, i will answer here. 
>> There are two issues with the previous implementation of permittedSubclasses, >> first it's the only method that using method desc which means that people has >> to be aware on another non trivial concept (object that describes constant pool >> constant) to understand how to use the method then i've tested this API with my >> students, all but one what able to correctly derives the Class from a >> ClassDesc, so instead of asking every users of permittedSubclasses to go >> through the oops of getting Class from a ClassDesc, i think we can agree that >> it's better to move the burden from the user to the JDK implementors. > > Why is the objective to get the Class objects? What purpose does that > serve? The whole idea of the reflection is to provide the runtime view of Java the language. Even if such thing does not exist. > The original API provided a descriptor for the contents of the > permittedSubclasses attribute. I find it totally bizarre to have an API > whose role is now to attempt to load all the subclasses of a sealed class. It's as bizarre as Class.getClasses() loading all the member classes. You are discovering that the reflection API is bizarre, and it is :) It's not the view of the JVM, it's the view of Java at runtime, whatever it means. > > YMMV. > > David R?mi [1] https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/lang/Class.html#getClasses() > >>> >>> Cheers, >>> David >> >> regards, >> R?mi >> >>> >>>> Thanks, Harold >>>> >>>> On 11/18/2020 12:31 AM, David Holmes wrote: >>>>> Hi Vincente, >>>>> >>>>> On 16/11/2020 11:36 pm, Vicente Romero wrote: >>>>>> Please review the code for the second iteration of sealed classes. In >>>>>> this iteration we are: >>>>>> >>>>>> - Enhancing narrowing reference conversion to allow for stricter >>>>>> checking of cast conversions with respect to sealed type hierarchies. 
>>>>>> - Also local classes are not considered when determining implicitly
>>>>>> declared permitted direct subclasses of a sealed class or sealed
>>>>>> interface
>>>>>
>>>>> The major change here seems to be that getPermittedSubclasses() now
>>>>> returns actual Class objects instead of ClassDesc. My recollection
>>>>> from earlier discussions here was that the use of ClassDesc was very
>>>>> deliberate as the permitted subclasses may not actually exist and
>>>>> there may be security concerns with returning them!
>>>>>
>>>>> Cheers,
>>>>> David
>>>>> -----
>>>>>
>>>>>> -------------
>>>>>>
>>>>>> Commit messages:
>>>>>>   - 8246778: Compiler implementation for Sealed Classes (Second Preview)
>>>>>>
>>>>>> Changes: https://git.openjdk.java.net/jdk/pull/1227/files
>>>>>>   Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1227&range=00
>>>>>>    Issue: https://bugs.openjdk.java.net/browse/JDK-8246778
>>>>>>    Stats: 589 lines in 12 files changed: 508 ins; 18 del; 63 mod
>>>>>>    Patch: https://git.openjdk.java.net/jdk/pull/1227.diff
>>>>>>    Fetch: git fetch https://git.openjdk.java.net/jdk pull/1227/head:pull/1227
>>>>>>
>>>>>> PR: https://git.openjdk.java.net/jdk/pull/1227

From github.com+779991+jaokim at openjdk.java.net  Tue Nov 24 14:05:15 2020
From: github.com+779991+jaokim at openjdk.java.net (Joakim Nordström)
Date: Tue, 24 Nov 2020 14:05:15 GMT
Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size [v6]
In-Reply-To:
References:
Message-ID:

> ### Description
> The logging for reserved heap space, printed the `GenAlignment` value instead of page size.
> Before:
>
> [0.369s][info][gc      ] Using Parallel
> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=512K size=2M
> 
> Besides changing the log to display the page size obtained from `os::vm_page_size()`, this fix also adds logging of `SpaceAlignment`, `GenAlignment`, and `HeapAlignment`.
> After:
>
> [0.367s][info][pagesize] Alignments:  space_align=512K gen_align=512K heap_align=2M
> [0.369s][info][gc      ] Using Parallel
> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=4K size=2M
> 
Please advise whether the added logging of alignments is sufficient or irrelevant. There were no signs of any of these alignments being logged anywhere.
> ### Testing
> - Tested manually, and logs are showing with arguments -Xlog:pagesize=info -XX:+UseParallelGC
> - Tested Tier1, Tier2, Tier3.

Joakim Nordström has updated the pull request incrementally with one additional commit since the last revision:

  Fixed dangling semicolon

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1161/files
  - new: https://git.openjdk.java.net/jdk/pull/1161/files/a00a1262..8293b65d

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1161&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1161&range=04-05

  Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1161.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1161/head:pull/1161

PR: https://git.openjdk.java.net/jdk/pull/1161

From rrich at openjdk.java.net Tue Nov 24 14:21:58 2020
From: rrich at openjdk.java.net (Richard Reingruber)
Date: Tue, 24 Nov 2020 14:21:58 GMT
Subject: RFR: 8256843: [PPC64] runtime/logging/RedefineClasses.java fails with assert: registers not saved on stack
In-Reply-To:
References:
Message-ID:

On Mon, 23 Nov 2020 16:57:55 GMT, Martin Doerr wrote:

> Method handle logging is broken in fastdebug builds. Problem is that os::current_frame() doesn't return the right frame in fastdebug builds.

Looks good to me.

-------------

Marked as reviewed by rrich (Committer).
PR: https://git.openjdk.java.net/jdk/pull/1394 From neliasso at openjdk.java.net Tue Nov 24 14:31:02 2020 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 24 Nov 2020 14:31:02 GMT Subject: RFR: 8255479: [aarch64] assert(src->section_index_of(target) == CodeBuffer::SECT_NONE) failed: sanity In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 11:53:04 GMT, Patric Hedlin wrote: > The "byte map base" (CardTable) might be materialised as an external address but as such the current relocation support expects an address _external_ to the associated CodeBuffer. This might not be the case since "byte map base" /is not/need not be/ a proper address. Instead, the "byte map base" may be materialised as a constant, in order to avoid relocation (issues). > > (If PC-relative materialisation should be used, a new RFE is suggested.) > > Changing assert in "fix_relocation_after_move" to cover both target == NULL and target != NULL, for both source and destination code buffer. Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1382 From stefank at openjdk.java.net Tue Nov 24 14:48:57 2020 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 24 Nov 2020 14:48:57 GMT Subject: RFR: 8256382: Use try_lock for hs_err EventLog printing In-Reply-To: <69fkZdGvgYprbfVE4dKZfkhBgnOPiNEoyVy3FpsDGck=.0855b1bf-da14-4f18-9808-0f0fd1bf0cba@github.com> References: <69fkZdGvgYprbfVE4dKZfkhBgnOPiNEoyVy3FpsDGck=.0855b1bf-da14-4f18-9808-0f0fd1bf0cba@github.com> Message-ID: On Tue, 24 Nov 2020 13:31:28 GMT, Thomas Stuefe wrote: >> The EventLog locks are taken when the hs_err files are generated. Since crashes and asserts can occur when other locks are held, this can cause lock reordering problems if the held locks also are low-rank locks. There's no way to solve this if blocking locks are taken. >> >> I hit this problem when investigating making the GCLogPrecious lock use the lowest lock rank (same as EventLog). 
See JDK-8254877. >> >> Both GCLogPrecious and EventLog are considered "leaf" locks. No other locks should be taken when those locks are taken. However, if we crash in either of these sub-systems, there will be a lock-reordering error message in the hs_err file, and the rest of the logged info is skipped in the currently logged section. >> >> The proposal is to use try_lock_without_range_check and only log information if the lock could be acquired without blocking. This relies on the new try_lock_without_range_check function from JDK-8255678. >> >> I've tested this by injecting crashes while not holding locks in both GCLogPrecious, while holding locks during EventLog logging, and when not holding the locks, and verified that we get the expected behavior. >> >> Example output while crashing during 'Internal exceptions' logging: >> Classes redefined (0 events): >> No events >> >> Internal exceptions (5 events): >> No events printed - crash while holding lock >> >> Events (20 events): >> Event: 1,437 loading class java/util/HashMap$KeyIterator >> Event: 1,438 loading class java/util/HashMap$KeyIterator done >> Event: 1,438 loading class java/lang/module/ModuleDescriptor$Exports > > Hi Stefan, > > can't we just print out unconditionally if VMError::is_error_reported()? Whats the worst that can happen, we crash? Then we would continue with the next reporting step. But we still have a chance to see the event buffer contents even if the event logger itself crashed or asserted. > > Side note, some time ago I rewrote the whole event system: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-March/033150.html to a much simpler implementation and I think that one would also be pretty safe to get printed even if unlocked since it works on a static pre-allocated buffer. But that work somehow got bogged down in review and I never got around to drive it upstream. If there is enough interest I may take up the work again. 
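The try-lock idea from the quoted PR description can be sketched in a few lines of portable C++. This is only an illustration under stated assumptions: `TryLock`, `SketchEventLog` and `print_all_logs` are hypothetical names, the lock is a toy built on `std::atomic<bool>`, and nothing here is HotSpot's `EventLog`, `Mutex`, or the `try_lock_without_range_check` function itself.

```cpp
#include <atomic>
#include <string>
#include <vector>

// Hypothetical non-blocking lock, used only for this sketch.
class TryLock {
 public:
  // Returns true if the lock was free and is now held by the caller.
  bool try_lock() { return !_held.exchange(true, std::memory_order_acquire); }
  void unlock() { _held.store(false, std::memory_order_release); }
 private:
  std::atomic<bool> _held{false};
};

// Hypothetical stand-in for one event log; not HotSpot's EventLog class.
class SketchEventLog {
 public:
  explicit SketchEventLog(const std::string& name) : _name(name) {}

  // Normal logging path. A real log would block on its lock here;
  // the sketch simply drops the event if the lock is busy.
  bool log(const std::string& msg) {
    if (!_lock.try_lock()) return false;
    _events.push_back(msg);
    _lock.unlock();
    return true;
  }

  // Exposed so that a caller can simulate "crashed while holding the lock".
  TryLock& lock() { return _lock; }

  // Error-report path: never blocks. If the lock cannot be taken, a
  // placeholder line is emitted and the report moves on.
  void print_for_error_report(std::vector<std::string>& out) {
    out.push_back(_name + ":");
    if (!_lock.try_lock()) {
      out.push_back("No events printed - crash while holding lock");
      return;
    }
    if (_events.empty()) out.push_back("No events");
    for (const std::string& e : _events) out.push_back("Event: " + e);
    _lock.unlock();
  }

 private:
  std::string _name;
  TryLock _lock;
  std::vector<std::string> _events;
};

// Prints every log in turn; one unavailable log does not hide the others.
inline void print_all_logs(std::vector<SketchEventLog*>& logs,
                           std::vector<std::string>& out) {
  for (SketchEventLog* log : logs) log->print_for_error_report(out);
}
```

With three logs where the second one's lock is already held, the report still contains the first and third logs in full and a single placeholder line for the second.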
Hi @tstuefe

With your proposal, if we have a section that looks like this:
* event log instance 0
* event log instance 1
* event log instance 2
* other type of printing

and we crash at "event log instance 1", then we completely skip printing "event log instance 2" and "other type of printing". I found that unfortunate and wanted to localize the problem. We often have the problem that one hs_err crash hides important information that was supposed to be written later, because a lot of the code in the hs_err printing isn't hardened and the hs_err sections are at too high a level (IMHO).

With that said, I don't mind skipping the attempt to take the lock if we are printing the hs_err file. Do others agree with Thomas?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1408

From mdoerr at openjdk.java.net Tue Nov 24 14:52:57 2020
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Tue, 24 Nov 2020 14:52:57 GMT
Subject: RFR: 8256843: [PPC64] runtime/logging/RedefineClasses.java fails with assert: registers not saved on stack
In-Reply-To:
References:
Message-ID:

On Tue, 24 Nov 2020 14:19:05 GMT, Richard Reingruber wrote:

>> Method handle logging is broken in fastdebug builds. Problem is that os::current_frame() doesn't return the right frame in fastdebug builds.
>
> Looks good to me.

Thanks for the review! Unfortunately, it's still not fully tested because the PPC build is currently broken. I'll check again later.
------------- PR: https://git.openjdk.java.net/jdk/pull/1394 From kvn at openjdk.java.net Tue Nov 24 17:10:00 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 24 Nov 2020 17:10:00 GMT Subject: RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v17] In-Reply-To: References: Message-ID: On Sun, 22 Nov 2020 21:04:56 GMT, Jatin Bhateja wrote: >> Summary: >> >> 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. >> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site. >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining. >> 4) Based on the perf results seen in benchmarks currently partial in-lining is performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 gets integrated we can extend this patch to cover all the primitive types. 
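The length-based dispatch described in the summary above can be illustrated with a small portable C++ sketch. It is not the C2 implementation and uses no AVX-512 intrinsics: `partial_inline_copy`, `copy_stub` and `kPartialInlineSize` are hypothetical names, the masked vector move is emulated with a scalar loop over the active lanes, and treating a length equal to the threshold as "small" is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical threshold, mirroring the 32-byte default quoted above.
constexpr std::size_t kPartialInlineSize = 32;

// Stand-in for the out-of-line array-copy stub.
inline void copy_stub(std::uint8_t* dst, const std::uint8_t* src,
                      std::size_t len) {
  std::memmove(dst, src, len);
}

// Small copies take the inline path; larger ones pay for the stub call.
inline void partial_inline_copy(std::uint8_t* dst, const std::uint8_t* src,
                                std::size_t len) {
  if (len <= kPartialInlineSize) {
    // Emulates one masked 32-byte load followed by one masked store:
    // only the first 'len' lanes are active, the rest are left untouched.
    std::uint8_t lanes[kPartialInlineSize];
    for (std::size_t i = 0; i < len; ++i) lanes[i] = src[i];
    for (std::size_t i = 0; i < len; ++i) dst[i] = lanes[i];
  } else {
    copy_stub(dst, src, len);
  }
}
```

Copying 20 bytes stays on the inline path, while copying 70 bytes goes through `copy_stub`; in both cases bytes beyond the requested length are not written.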
>>
>> Performance Results:
>> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
>> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
>> ArrayCopyPartialInlineSize : 32
>>
>> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
>> -- | -- | -- | -- | --
>> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
>> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
>> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
>> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
>> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
>> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
>> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
>> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
>> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
>> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
>> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
>> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
>> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
>> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
>> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
>> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
>> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
>> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
>> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
>> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
>> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
>> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
>> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
>> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
>> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
>> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
>> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
>> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
>> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
>> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
>> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
>> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
>> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
>> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
>> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
>> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
>> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
>> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
>> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
>> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
>> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
>> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
>> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
>> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
>> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
>> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
>> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
>> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
>> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
>> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
>> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
>> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
>> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
>> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
>> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
>> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
>> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
>> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
>> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
>> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
>> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
>> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
>> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
>> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
>> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
>> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
>> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
>> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
>> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
>> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
>> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
>> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
>> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
>> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
>> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
>> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
>> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
>> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
>> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>>
>> Detailed Reports:
>> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
>> WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Removing special handling for constant length, GVN will remove dead stub blocks in case constant length is less than partial inline size.

tier1-tier4 passed without new failures

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/302

From egahlin at openjdk.java.net Tue Nov 24 17:45:55 2020
From: egahlin at openjdk.java.net (Erik Gahlin)
Date: Tue, 24 Nov 2020 17:45:55 GMT
Subject: RFR: 8256916: Add JFR event for OutOfMemoryError
In-Reply-To: <73crkD-SepaAqEyV2wuGyO8tnAHjQmiIOHjN9zovm8M=.3ea49ef3-a2f6-487e-b86a-956832861a46@github.com>
References: <73crkD-SepaAqEyV2wuGyO8tnAHjQmiIOHjN9zovm8M=.3ea49ef3-a2f6-487e-b86a-956832861a46@github.com>
Message-ID:

On Tue, 24 Nov 2020 07:01:56 GMT, Yasumasa Suenaga wrote:

>> OOM on Metaspace would be reported in `MetaspaceOOM` JFR event, however other OOM (e.g. Java heap) would not be reported.
>>
>> It is useful if JFR reports OOMs.
>
> None of the failures on GitHub Actions are caused by this PR.

Will this fix be able to handle all cases of OOM?
------------- PR: https://git.openjdk.java.net/jdk/pull/1403 From phedlin at openjdk.java.net Tue Nov 24 18:59:56 2020 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 24 Nov 2020 18:59:56 GMT Subject: RFR: 8255479: [aarch64] assert(src->section_index_of(target) == CodeBuffer::SECT_NONE) failed: sanity In-Reply-To: References: Message-ID: <79uX8L7KnOMBGe5nO96xrQbbqxN5NioMHqrWDE5edUU=.43442abe-a860-4b74-a1ee-b103f3ea2385@github.com> On Tue, 24 Nov 2020 14:27:41 GMT, Nils Eliasson wrote: >> The "byte map base" (CardTable) might be materialised as an external address but as such the current relocation support expects an address _external_ to the associated CodeBuffer. This might not be the case since "byte map base" /is not/need not be/ a proper address. Instead, the "byte map base" may be materialised as a constant, in order to avoid relocation (issues). >> >> (If PC-relative materialisation should be used, a new RFE is suggested.) >> >> Changing assert in "fix_relocation_after_move" to cover both target == NULL and target != NULL, for both source and destination code buffer. > > Looks good. Thanks for reviewing Andrew and Nils. ------------- PR: https://git.openjdk.java.net/jdk/pull/1382 From phedlin at openjdk.java.net Tue Nov 24 18:59:57 2020 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 24 Nov 2020 18:59:57 GMT Subject: Integrated: 8255479: [aarch64] assert(src->section_index_of(target) == CodeBuffer::SECT_NONE) failed: sanity In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 11:53:04 GMT, Patric Hedlin wrote: > The "byte map base" (CardTable) might be materialised as an external address but as such the current relocation support expects an address _external_ to the associated CodeBuffer. This might not be the case since "byte map base" /is not/need not be/ a proper address. Instead, the "byte map base" may be materialised as a constant, in order to avoid relocation (issues). 
>
> (If PC-relative materialisation should be used, a new RFE is suggested.)
>
> Changing assert in "fix_relocation_after_move" to cover both target == NULL and target != NULL, for both source and destination code buffer.

This pull request has now been integrated.

Changeset: 695117f8
Author:    Patric Hedlin
URL:       https://git.openjdk.java.net/jdk/commit/695117f8
Stats:     23 lines in 3 files changed: 1 ins; 11 del; 11 mod

8255479: [aarch64] assert(src->section_index_of(target) == CodeBuffer::SECT_NONE) failed: sanity

Reviewed-by: aph, neliasso

-------------

PR: https://git.openjdk.java.net/jdk/pull/1382

From david.holmes at oracle.com Tue Nov 24 22:49:25 2020
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 25 Nov 2020 08:49:25 +1000
Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview)
In-Reply-To: <555245742.1403793.1606226182119.JavaMail.zimbra@u-pem.fr>
References: <618aa897-18fd-fd70-1f0a-506e0e5a74d8@oracle.com> <96418b57-12fe-208e-c642-b04efeedca24@oracle.com> <353454426.1152629.1606211114626.JavaMail.zimbra@u-pem.fr> <555245742.1403793.1606226182119.JavaMail.zimbra@u-pem.fr>
Message-ID:

On 24/11/2020 11:56 pm, forax at univ-mlv.fr wrote:
> ----- Original Message -----
>> From: "David Holmes"
>> To: "Remi Forax"
>> Cc: "Harold David Seigel" , "Vicente Romero" , "compiler-dev" , "core-libs-dev" , "hotspot-dev"
>> Sent: Tuesday, 24 November 2020 13:16:39
>> Subject: Re: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview)
>
>> Hi Remi,
>>
>> On 24/11/2020 7:45 pm, Remi Forax wrote:
>>> ----- Original Message -----
>>>> From: "David Holmes"
>>>> To: "Harold David Seigel" , "Vicente Romero" , "compiler-dev" , "core-libs-dev" , "hotspot-dev"
>>>> Sent: Tuesday, 24 November 2020 02:04:55
>>>> Subject: Re: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview)
>>>
>>>> Hi Harold,
>>>>
>>>> On 24/11/2020 6:27 am, Harold Seigel wrote:
>>>>> Hi David,
>>>>>
>>>>> Thanks for looking at this.
>>>>>
>>>>> The intent was for method Class.permittedSubclasses() to be implemented
>>>>> similarly to Class.getNestMembers(). Are you suggesting that a security
>>>>> manager check be added to permittedSubclasses() similar to the security
>>>>> manager check in getNestMembers()?
>>>>
>>>> No I'm suggesting the change to the API is plain wrong. :) Please see
>>>> discussion in the CSR.
>>>
>>> Given that the CSR is closed, I will answer here.
>>> There are two issues with the previous implementation of permittedSubclasses,
>>> first it's the only method that is using method desc, which means that people have
>>> to be aware of another non-trivial concept (object that describes constant pool
>>> constant) to understand how to use the method, then I've tested this API with my
>>> students, all but one what able to correctly derives the Class from a
>>> ClassDesc, so instead of asking every user of permittedSubclasses to go
>>> through the hoops of getting Class from a ClassDesc, I think we can agree that
>>> it's better to move the burden from the user to the JDK implementors.
>>
>> Why is the objective to get the Class objects? What purpose does that
>> serve?
>
> The whole idea of the reflection is to provide the runtime view of Java the language.
> Even if such thing does not exist.

And providing some kind of descriptor for an attribute fulfills that role. Nothing says it has to produce Class objects.

>> The original API provided a descriptor for the contents of the
>> permittedSubclasses attribute. I find it totally bizarre to have an API
>> whose role is now to attempt to load all the subclasses of a sealed class.
>
> It's as bizarre as Class.getClasses() loading all the member classes.

Not at all. Nested types are considered to be part of the same implementation as the outer type. They are intimately related and all part of the same accessibility domain. Loading subtypes that likely exist outside of the current package (else why would you need to be using sealed types) is a completely different matter.

> You are discovering that the reflection API is bizarre, and it is :)

I don't find it that bizarre.

> It's not the view of the JVM, it's the view of Java at runtime, whatever it means.

It means whatever we implement it to mean. So if we implement it in bizarre ways then it will be perceived as bizarre.

David
-----

>>
>> YMMV.
>>
>> David
>
> Rémi
>
> [1] https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/lang/Class.html#getClasses()
>
>>
>>>>
>>>> Cheers,
>>>> David
>>>
>>> regards,
>>> Rémi
>>>
>>>>
>>>>> Thanks, Harold
>>>>>
>>>>> On 11/18/2020 12:31 AM, David Holmes wrote:
>>>>>> Hi Vicente,
>>>>>>
>>>>>> On 16/11/2020 11:36 pm, Vicente Romero wrote:
>>>>>>> Please review the code for the second iteration of sealed classes. In
>>>>>>> this iteration we are:
>>>>>>>
>>>>>>> - Enhancing narrowing reference conversion to allow for stricter
>>>>>>> checking of cast conversions with respect to sealed type hierarchies.
>>>>>>> - Also local classes are not considered when determining implicitly
>>>>>>> declared permitted direct subclasses of a sealed class or sealed
>>>>>>> interface
>>>>>>
>>>>>> The major change here seems to be that getPermittedSubclasses() now
>>>>>> returns actual Class objects instead of ClassDesc. My recollection
>>>>>> from earlier discussions here was that the use of ClassDesc was very
>>>>>> deliberate as the permitted subclasses may not actually exist and
>>>>>> there may be security concerns with returning them!
>>>>>>
>>>>>> Cheers,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> -------------
>>>>>>>
>>>>>>> Commit messages:
>>>>>>>  - 8246778: Compiler implementation for Sealed Classes (Second Preview)
>>>>>>>
>>>>>>> Changes: https://git.openjdk.java.net/jdk/pull/1227/files
>>>>>>>  Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1227&range=00
>>>>>>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8246778
>>>>>>>   Stats: 589 lines in 12 files changed: 508 ins; 18 del; 63 mod
>>>>>>>   Patch: https://git.openjdk.java.net/jdk/pull/1227.diff
>>>>>>>   Fetch: git fetch https://git.openjdk.java.net/jdk pull/1227/head:pull/1227
>>>>>>>
>>>>>>> PR: https://git.openjdk.java.net/jdk/pull/1227

From mchung at openjdk.java.net Tue Nov 24 23:02:53 2020
From: mchung at openjdk.java.net (Mandy Chung)
Date: Tue, 24 Nov 2020 23:02:53 GMT
Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview)
In-Reply-To: <3hcm-LPJG34kftsIY2_tgDJiPwuplmron5EQkJ4NT5s=.88ff5bc4-a2d8-4273-a958-e271aacd3358@github.com>
References: <3hcm-LPJG34kftsIY2_tgDJiPwuplmron5EQkJ4NT5s=.88ff5bc4-a2d8-4273-a958-e271aacd3358@github.com>
Message-ID: <9wJGqoEIqSX2DDUO5Y8qxv3lvU5QhCgwox5wzApcXwM=.b7cd85fc-018f-4a70-93fb-32218b1babe5@github.com>

On Tue, 17 Nov 2020 00:25:51 GMT, Mandy Chung wrote:

>> src/java.base/share/classes/java/lang/Package.java line 227:
>>
>>> 225: * This method reports on a distinct concept of sealing from
>>> 226: * {@link Class#isSealed() Class::isSealed}.
>>> 227: *
>>
>> This API note will be very confusing to readers. I think the javadoc will need to be fleshed out and probably will need to link to a section in the Package class description that defines the legacy concept of sealing.
>
> I agree. This @apiNote needs more clarification to help the readers to understand the context here. One thing we could do in the Package class description is to add a "Package Sealing" section that can also explain that it has no relationship to "sealed classes".

I added an API note in `Package::isSealed` [1] to clarify sealed package vs sealed class or interface.

The API note you added in `Class::isSealed` can be clarified in a similar fashion like: "Sealed class or interface has no relationship with {@linkplain Package#isSealed package sealing}".
[1] https://github.com/openjdk/jdk/commit/3c230b8a ------------- PR: https://git.openjdk.java.net/jdk/pull/1227 From dlong at openjdk.java.net Tue Nov 24 23:17:55 2020 From: dlong at openjdk.java.net (Dean Long) Date: Tue, 24 Nov 2020 23:17:55 GMT Subject: RFR: 8256754: Deoptimization::revoke_for_object_deoptimization: stack processing start call is redundant [v2] In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 11:27:10 GMT, Richard Reingruber wrote: >> This is a XS clean-up of Deoptimization::revoke_for_object_deoptimization() which removes the StackWatermarkSet::start_processing() call. >> >> This is correct because all paths leading to revoke_for_object_deoptimization() are equipped with a KeepStackGCProcessedMark. >> >> Call Tree: >> >> StackWatermarkSet::start_processing(JavaThread *, enum StackWatermarkKind) : void >> Deoptimization::revoke_for_object_deoptimization(JavaThread *, frame, RegisterMap *, JavaThread *) : void >> Deoptimization::deoptimize_objects_internal(JavaThread *, GrowableArray *, bool &) : bool >> EscapeBarrier::deoptimize_objects_internal(JavaThread *, intptr_t *) : bool >> EscapeBarrier::deoptimize_objects_all_threads() : bool // has KeepStackGCProcessedMark >> EscapeBarrier::deoptimize_objects(intptr_t *) : bool >> EscapeBarrier::deoptimize_objects(int, int) : bool // has KeepStackGCProcessedMark >> >> Testing: hotspot_serviceability, jdk_svc, jdk_jdi, vmTestbase_nsk_jdi, vmTestbase_nsk_jvmti, vmTestbase_nsk_jdwp with -XX:+UseZGC. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Assert stack is kept gc processed. Marked as reviewed by dlong (Reviewer). 
-------------

PR: https://git.openjdk.java.net/jdk/pull/1381

From dlong at openjdk.java.net Tue Nov 24 23:17:56 2020
From: dlong at openjdk.java.net (Dean Long)
Date: Tue, 24 Nov 2020 23:17:56 GMT
Subject: RFR: 8256754: Deoptimization::revoke_for_object_deoptimization: stack processing start call is redundant [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 24 Nov 2020 23:15:02 GMT, Dean Long wrote:

>> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Assert stack is kept gc processed.
>
> Marked as reviewed by dlong (Reviewer).

I like the version with the assert.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1381

From forax at univ-mlv.fr Tue Nov 24 23:18:55 2020
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 25 Nov 2020 00:18:55 +0100 (CET)
Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview)
In-Reply-To: <9wJGqoEIqSX2DDUO5Y8qxv3lvU5QhCgwox5wzApcXwM=.b7cd85fc-018f-4a70-93fb-32218b1babe5@github.com>
References: <3hcm-LPJG34kftsIY2_tgDJiPwuplmron5EQkJ4NT5s=.88ff5bc4-a2d8-4273-a958-e271aacd3358@github.com> <9wJGqoEIqSX2DDUO5Y8qxv3lvU5QhCgwox5wzApcXwM=.b7cd85fc-018f-4a70-93fb-32218b1babe5@github.com>
Message-ID: <1050079243.1769685.1606259935502.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Mandy Chung"
> To: "compiler-dev" , "core-libs-dev" , "hotspot-dev"
> Sent: Wednesday, 25 November 2020 00:02:53
> Subject: Re: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview)

> On Tue, 17 Nov 2020 00:25:51 GMT, Mandy Chung wrote:
>
>>> src/java.base/share/classes/java/lang/Package.java line 227:
>>>
>>>> 225: * This method reports on a distinct concept of sealing from
>>>> 226: * {@link Class#isSealed() Class::isSealed}.
>>>> 227: *
>>>
>>> This API note will be very confusing to readers. I think the javadoc will need
>>> to be fleshed out and probably will need to link to a section in the Package class
>>> description that defines the legacy concept of sealing.
>>
>> I agree. This @apiNote needs more clarification to help the readers to
>> understand the context here. One thing we could do in the Package class
>> description is to add a "Package Sealing" section that can also explain that it
>> has no relationship to "sealed classes".
>
> I added an API note in `Package::isSealed` [1] to clarify sealed package vs
> sealed class or interface.
>
> The API note you added in `Class::isSealed` can be clarified in a similar
> fashion like: "Sealed class or interface has no relationship with {@linkplain
> Package#isSealed package sealing}".

Hi Mandy,
given that almost nobody knows about sealed packages, I'm not sure that adding a reference to Package::isSealed in Class::isSealed actually helps, it might be confusing.

>
> [1] https://github.com/openjdk/jdk/commit/3c230b8a
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/1227

From coleenp at openjdk.java.net Tue Nov 24 23:45:01 2020
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Tue, 24 Nov 2020 23:45:01 GMT
Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random
Message-ID:

The functions os::init_random() and os::random() both set the _rand_seed. This test thinks nothing can change the seed while it is computing its expected value.
I changed the test to run in a VM operation safepoint. Alternatively, I can change the test to not verify the random value computed in the loop, or remove the test.
Ran tier1 tests on linux-x64 and windows-x64.
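The failure mode described above (a shared seed that another caller can advance while the test computes its expected value) can be reproduced in a single-threaded C++ sketch. It assumes the Park-Miller "minimal standard" recurrence, next = (16807 * seed) mod (2^31 - 1); `next_random`, `SharedSeed` and `draw_sequence` are hypothetical names, and the extra draw is injected by hand rather than by a real second thread.

```cpp
#include <cstdint>

// "Minimal standard" generator step: (16807 * seed) mod (2^31 - 1).
inline std::uint32_t next_random(std::uint32_t seed) {
  const std::uint64_t a = 16807;
  const std::uint64_t m = 2147483647;  // 2^31 - 1
  return static_cast<std::uint32_t>((a * seed) % m);
}

// Hypothetical shared state standing in for a process-wide seed.
struct SharedSeed {
  std::uint32_t seed = 1;
  // Every draw also updates the shared seed, which is what makes
  // concurrent callers visible to each other.
  std::uint32_t random() {
    seed = next_random(seed);
    return seed;
  }
};

// Performs 'reps' draws and returns the last one. If 'interfere_at' is a
// valid step index, one extra draw is injected at that step, as if another
// thread had called random() in the middle of the loop.
inline std::uint32_t draw_sequence(SharedSeed& state, int reps,
                                   int interfere_at) {
  std::uint32_t last = state.seed;
  for (int k = 0; k < reps; ++k) {
    if (k == interfere_at) {
      state.random();  // the unexpected, "concurrent" draw
    }
    last = state.random();
  }
  return last;
}
```

Starting from seed 1, 10000 undisturbed draws should end at Park and Miller's published check value 1043618065, while a single injected draw shifts the whole sequence by one step and yields a different final value. Running such a check where no other thread can draw, for example at a safepoint as proposed above, removes the interference.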
-------------

Commit messages:
 - 8254042: gtest/GTestWrapper.java failed os.test_random

Changes: https://git.openjdk.java.net/jdk/pull/1422/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1422&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8254042
  Stats: 68 lines in 8 files changed: 23 ins; 17 del; 28 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1422.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1422/head:pull/1422

PR: https://git.openjdk.java.net/jdk/pull/1422

From coleenp at openjdk.java.net Tue Nov 24 23:45:01 2020
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Tue, 24 Nov 2020 23:45:01 GMT
Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random
In-Reply-To:
References:
Message-ID:

On Tue, 24 Nov 2020 23:38:52 GMT, Coleen Phillimore wrote:

> The functions os::init_random() and os::random() both set the _rand_seed. This test thinks nothing can change the seed while it is computing its expected value.
> I changed the test to run in a VM operation safepoint. Alternatively, I can change the test to not verify the random value computed in the loop, or remove the test.
> Ran tier1 tests on linux-x64 and windows-x64.

I also changed the platform code to not initialize the random seed; instead, it is statically initialized. The only function that needs to call os::init_random() is the CDS archiver, so I added an assert that this can only be called inside a safepoint.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1422

From ysuenaga at openjdk.java.net Tue Nov 24 23:47:55 2020
From: ysuenaga at openjdk.java.net (Yasumasa Suenaga)
Date: Tue, 24 Nov 2020 23:47:55 GMT
Subject: RFR: 8256916: Add JFR event for OutOfMemoryError
In-Reply-To:
References: <73crkD-SepaAqEyV2wuGyO8tnAHjQmiIOHjN9zovm8M=.3ea49ef3-a2f6-487e-b86a-956832861a46@github.com>
Message-ID:

On Tue, 24 Nov 2020 17:43:30 GMT, Erik Gahlin wrote:

> Will this fix be able to handle all cases of OOM?
This PR can handle all OOMs which are thrown by `Exceptions::_throw()` e.g. * `Universe::out_of_memory_error_java_heap()` * `Universe::out_of_memory_error_c_heap()` * `Universe::out_of_memory_error_metaspace()` * `Universe::out_of_memory_error_class_metaspace()` * `Universe::out_of_memory_error_array_size()` * `Universe::out_of_memory_error_gc_overhead_limit()` * `Universe::out_of_memory_error_realloc_objects()` * `Universe::out_of_memory_error_retry()` However OOMs which are generated in Java ( `new OutOfMemoryError()` ) cannot be handled by this PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/1403 From dnsimon at openjdk.java.net Wed Nov 25 00:18:01 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 25 Nov 2020 00:18:01 GMT Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports Message-ID: A number of jtreg tests require a specific GC. These tests should be ignored when EnableJVMCI is true and the JVMCI compiler does not support the required GC. This PR adds `JVMCICompiler.isGCSupported` and makes use of it in `WhiteBox.isGCSupported`. 
Prior to this PR, a test requiring a GC not yet supported by a JVMCI compiler fail as follows: Error occurred during initialization of VM JVMCI Compiler does not support selected GC: epsilon gc ------------- Commit messages: - move logic to GCConfig - enable a JVMCICompiler to specify which GCs it supports Changes: https://git.openjdk.java.net/jdk/pull/1423/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1423&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257020 Stats: 107 lines in 12 files changed: 91 ins; 15 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1423.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1423/head:pull/1423 PR: https://git.openjdk.java.net/jdk/pull/1423 From dholmes at openjdk.java.net Wed Nov 25 01:15:57 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 25 Nov 2020 01:15:57 GMT Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 23:38:52 GMT, Coleen Phillimore wrote: > The function os::init_random() and os::random() both set the _rand_seed. This test thinks nothing can change the seed while it is computing its expected value. > I changed the test to run in a VM operation safepoint. Alternately, I can change the test to not verify the random value computed in the loop, or remove the test. > Ran tier1 tests on linux-x64 and windows-x64. If the problem is as suspected then the change to execute the test at a safepoint seems reasonable. Kim had doubts that any other thread should be running in a TEST case (versus TEST_VM case). The changes to the seed initialization seem unnecessary (in relation to the bug fix) but also harmless. Like all our VM initialization it is safe to do init_random() during VM initialization, so there should not have been any issue in that regard. But as all platforms set the same seed anyway, this is simpler. 
Thanks, David test/hotspot/gtest/logging/test_logOutputList.cpp line 69: > 67: LogLevelType expected_level_for_output[TestOutputCount]; > 68: > 69: os::init_random(0x4711); I assume this was done to try and get some measure of reproducibility, and now we have lost that, even though it wasn't guaranteed anyway. test/hotspot/gtest/runtime/test_os.cpp line 119: > 117: } > 118: > 119: class VM_TestRandom : public VM_GTestExecuteAtSafepoint { It took me a while to convince myself that none of the existing uses of os::random() can occur concurrently with a safepoint, but that does seem to be the case. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1422 From kvn at openjdk.java.net Wed Nov 25 01:41:54 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 01:41:54 GMT Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports In-Reply-To: References: Message-ID: <-RIuurnvW5M8EYNRiJejJY8yQt3o64X-xcJ82Zv2Q1I=.31873652-68bc-4194-943c-071d9065c210@github.com> On Tue, 24 Nov 2020 23:46:17 GMT, Doug Simon wrote: > A number of jtreg tests require a specific GC. These tests should be ignored when EnableJVMCI is true and the JVMCI compiler does not support the required GC. > > This PR adds `JVMCICompiler.isGCSupported` and makes use of it in `WhiteBox.isGCSupported`. 
>
> Prior to this PR, a test requiring a GC not yet supported by a JVMCI compiler fails as follows:
> Error occurred during initialization of VM
> JVMCI Compiler does not support selected GC: epsilon gc

For metropolis I have the following code in jvmci_globals.cpp, which switches off JVMCI if a test uses an unsupported GC:

static bool supported_gc() {
  return (UseSerialGC || UseParallelGC || UseG1GC);
}

void JVMCIGlobals::check_jvmci_supported_gc() {
  if (EnableJVMCI) {
    // Check if selected GC is supported by JVMCI and Java compiler
    if (!supported_gc()) {
      if (FLAG_IS_DEFAULT(UseJVMCINativeLibrary) && !UseJVMCINativeLibrary) { // Just disable Graal when libgraal is present
        vm_exit_during_initialization("JVMCI Compiler does not support selected GC", GCConfig::hs_err_name());
      }
      FLAG_SET_DEFAULT(EnableJVMCI, false);
      FLAG_SET_DEFAULT(UseJVMCICompiler, false);
      FLAG_SET_DEFAULT(UseJVMCINativeLibrary, false);
    }
  }
}

-------------

PR: https://git.openjdk.java.net/jdk/pull/1423

From kvn at openjdk.java.net Wed Nov 25 01:50:57 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 25 Nov 2020 01:50:57 GMT
Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports
In-Reply-To: <-RIuurnvW5M8EYNRiJejJY8yQt3o64X-xcJ82Zv2Q1I=.31873652-68bc-4194-943c-071d9065c210@github.com>
References: <-RIuurnvW5M8EYNRiJejJY8yQt3o64X-xcJ82Zv2Q1I=.31873652-68bc-4194-943c-071d9065c210@github.com>
Message-ID:

On Wed, 25 Nov 2020 01:39:11 GMT, Vladimir Kozlov wrote:

>> A number of jtreg tests require a specific GC. These tests should be ignored when EnableJVMCI is true and the JVMCI compiler does not support the required GC.
>>
>> This PR adds `JVMCICompiler.isGCSupported` and makes use of it in `WhiteBox.isGCSupported`.
>> >> Prior to this PR, a test requiring a GC not yet supported by a JVMCI compiler fail as follows: >> Error occurred during initialization of VM >> JVMCI Compiler does not support selected GC: epsilon gc > > For metropolis I have next code in jvmci_globals.cpp which switch off JVMCI if test uses unsupported GC: > > static bool supported_gc() { > return (UseSerialGC || UseParallelGC || UseG1GC); > } > > void JVMCIGlobals::check_jvmci_supported_gc() { > if (EnableJVMCI) { > // Check if selected GC is supported by JVMCI and Java compiler > if (!supported_gc()) { > if (FLAG_IS_DEFAULT(UseJVMCINativeLibrary) && !UseJVMCINativeLibrary) { // Just disable Graal when libgraal is present > vm_exit_during_initialization("JVMCI Compiler does not support selected GC", GCConfig::hs_err_name()); > } > FLAG_SET_DEFAULT(EnableJVMCI, false); > FLAG_SET_DEFAULT(UseJVMCICompiler, false); > FLAG_SET_DEFAULT(UseJVMCINativeLibrary, false); > } > } > } Why not call such `JVMCIGlobals::supported_gc()` from `GCConfig::is_gc_supported`? I don't understand why you need to go through Java JVMCI code for that? ------------- PR: https://git.openjdk.java.net/jdk/pull/1423 From coleenp at openjdk.java.net Wed Nov 25 02:16:59 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 25 Nov 2020 02:16:59 GMT Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random In-Reply-To: References: Message-ID: <2zrqcuo65UqXeIwxoHOBh78-Vv7a3vdBTiRkC_9yBzs=.c4d58115-ecf4-4c1d-9002-9d39fd6d0868@github.com> On Wed, 25 Nov 2020 01:00:57 GMT, David Holmes wrote: >> The function os::init_random() and os::random() both set the _rand_seed. This test thinks nothing can change the seed while it is computing its expected value. >> I changed the test to run in a VM operation safepoint. Alternately, I can change the test to not verify the random value computed in the loop, or remove the test. >> Ran tier1 tests on linux-x64 and windows-x64. 
> > test/hotspot/gtest/logging/test_logOutputList.cpp line 69: > >> 67: LogLevelType expected_level_for_output[TestOutputCount]; >> 68: >> 69: os::init_random(0x4711); > > I assume this was done to try and get some measure of reproducibility, and now we have lost that, even though it wasn't guaranteed anyway. Yes, I don't know why this was done here. > test/hotspot/gtest/runtime/test_os.cpp line 119: > >> 117: } >> 118: >> 119: class VM_TestRandom : public VM_GTestExecuteAtSafepoint { > > It took me a while to convince myself that none of the existing uses of os::random() can occur concurrently with a safepoint, but that does seem to be the case. In this test I'm assuming that once this thread gets to a safepoint, it should be the only thing running. Unless maybe some concurrent GC threads. I don't know what they would be doing for this test case though. ------------- PR: https://git.openjdk.java.net/jdk/pull/1422 From coleenp at openjdk.java.net Wed Nov 25 02:20:56 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 25 Nov 2020 02:20:56 GMT Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 01:13:14 GMT, David Holmes wrote: >> The function os::init_random() and os::random() both set the _rand_seed. This test thinks nothing can change the seed while it is computing its expected value. >> I changed the test to run in a VM operation safepoint. Alternately, I can change the test to not verify the random value computed in the loop, or remove the test. >> Ran tier1 tests on linux-x64 and windows-x64. > > If the problem is as suspected then the change to execute the test at a safepoint seems reasonable. Kim had doubts that any other thread should be running in a TEST case (versus TEST_VM case). > The changes to the seed initialization seem unnecessary (in relation to the bug fix) but also harmless. 
Like all our VM initialization it is safe to do init_random() during VM initialization, so there should not have been any issue in that regard. But as all platforms set the same seed anyway, this is simpler. > Thanks, > David Thanks for reviewing this, David. I experimented with a ShouldNotReachHere and found there were other vm threads running even though the test was TEST(). Unfortunately, this didn't reproduce enough to be able to experiment which os::random() made the test fail. ------------- PR: https://git.openjdk.java.net/jdk/pull/1422 From kbarrett at openjdk.java.net Wed Nov 25 03:21:58 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 25 Nov 2020 03:21:58 GMT Subject: RFR: 8256517: (ref) Reference.clear during reference processing may lose notification [v2] In-Reply-To: <4C7DyAgcXsSq3YEzbbwWeLaIbWwOyEriG8_4QrWNZ80=.4b75553f-7eaf-4d8b-9b47-007fc0609ba7@github.com> References: <4C7DyAgcXsSq3YEzbbwWeLaIbWwOyEriG8_4QrWNZ80=.4b75553f-7eaf-4d8b-9b47-007fc0609ba7@github.com> Message-ID: On Mon, 23 Nov 2020 20:36:57 GMT, Roman Kennke wrote: >> Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: >> >> - new tests need ref in oldgen too >> - remove obsolete comment about races with clear and enqueue > > Looks good! Thanks! Thanks @rkennke, @pliden, @mlchung for reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/1376 From kvn at openjdk.java.net Wed Nov 25 03:37:08 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 03:37:08 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo Message-ID: JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2. Initial patch was prepared by @fisk. Tested hs-tier1-4. Added new compiler tests to test intrinsics. Ran new test with Shenandoah. Found only one issue. 
As a result, I disabled the PhantomReference::refersTo intrinsic for the COOP + Shenandoah combination. Someone from the Shenandoah team will have to test whether these changes are enough.

-------------

Commit messages:
 - 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo

Changes: https://git.openjdk.java.net/jdk/pull/1425/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1425&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8256999
Stats: 381 lines in 16 files changed: 242 ins; 61 del; 78 mod
Patch: https://git.openjdk.java.net/jdk/pull/1425.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/1425/head:pull/1425

PR: https://git.openjdk.java.net/jdk/pull/1425

From kbarrett at openjdk.java.net Wed Nov 25 03:39:20 2020
From: kbarrett at openjdk.java.net (Kim Barrett)
Date: Wed, 25 Nov 2020 03:39:20 GMT
Subject: Integrated: 8256517: (ref) Reference.clear during reference processing may lose notification
In-Reply-To:
References:
Message-ID:

On Mon, 23 Nov 2020 01:43:39 GMT, Kim Barrett wrote:

> Please review this change to Reference.clear() to address several issues.
>
> (JDK-8240696) For GCs using a SATB barrier, simply assigning the referent
> field to null may extend the lifetime of the referent value.
>
> (JDK-8240696) For GCs with concurrent reference processing, clearing the
> referent field during reference processing may discard the expected
> notification.
>
> Both of these are addressed by introducing a private native helper function
> for clearing the referent, rather than using an ordinary in-Java field
> assignment. Tests have been added for both of these issues. This required
> adding a new breakpoint in reference processing for ZGC.
>
> Of course, finalization adds some complexity to the problem. We deal with
> that by having FinalReference override clear. The implementation is
> provided by a new package-private method in Reference. (There are a number
> of alternatives, all of them clumsy; finalization is annoying that way.)
> > While dealing with FinalReference clearing it was noted that the recent > JDK-8256106 and JDK-8256370 have some problems. FinalizerHistogram was not > updated to call the new Reference.getInactive(), instead still calling get() > on FinalReferences, with the JDK-8256106 problems. Fixing that showed the > assertion for inactive FinalReference added by JDK-8256370 used the wrong > test. > > Rather than tracking down and changing all get() and clear() calls on final > references and changing them to use getInactive and a new similar clear > function, I've changed FinalReference to override get and clear, which call > the helper functions in Reference. I've also renamed getInactive to be more > explanatory and less convenient to call directly, and similarly named the > helper for clear. This means that get/clear should never be called on an > active FinalReference. That's already never done, and would have problems > if it were. > > Testing: > mach5 tier1-6 > Local (linux-x64) tier1 using Shenandoah. > New TestReferenceClearDuringMarking fails for G1 without these changes. > New TestReferenceClearDuringReferenceProcessing fails for ZGC without these changes. This pull request has now been integrated. Changeset: 66943fef Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/66943fef Stats: 317 lines in 14 files changed: 286 ins; 22 del; 9 mod 8256517: (ref) Reference.clear during reference processing may lose notification 8240696: (ref) Reference.clear may extend the lifetime of the referent Use private native helper to implement Reference.clear. 
Reviewed-by: pliden, rkennke, mchung ------------- PR: https://git.openjdk.java.net/jdk/pull/1376 From kbarrett at openjdk.java.net Wed Nov 25 03:39:17 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 25 Nov 2020 03:39:17 GMT Subject: RFR: 8256517: (ref) Reference.clear during reference processing may lose notification [v3] In-Reply-To: References: Message-ID: > Please review this change to Reference.clear() to address several issues. > > (JDK-8240696) For GCs using a SATB barrier, simply assigning the referent > field to null may extend the lifetime of the referent value. > > (JDK-8240696) For GCs with concurrent reference processing, clearing the > referent field during reference processing may discard the expected > notification. > > Both of these are addressed by introducing a private native helper function > for clearing the referent, rather than using an ordinary in-Java field > assignment. Tests have been added for both of these issues. This required > adding a new breakpoint in reference processing for ZGC. > > Of course, finalization adds some complexity to the problem. We deal with > that by having FinalReference override clear. The implementation is > provided by a new package-private method in Reference. (There are a number > of alternatives, all of them clumsy; finalization is annoying that way.) > > While dealing with FinalReference clearing it was noted that the recent > JDK-8256106 and JDK-8256370 have some problems. FinalizerHistogram was not > updated to call the new Reference.getInactive(), instead still calling get() > on FinalReferences, with the JDK-8256106 problems. Fixing that showed the > assertion for inactive FinalReference added by JDK-8256370 used the wrong > test. 
> > Rather than tracking down and changing all get() and clear() calls on final > references and changing them to use getInactive and a new similar clear > function, I've changed FinalReference to override get and clear, which call > the helper functions in Reference. I've also renamed getInactive to be more > explanatory and less convenient to call directly, and similarly named the > helper for clear. This means that get/clear should never be called on an > active FinalReference. That's already never done, and would have problems > if it were. > > Testing: > mach5 tier1-6 > Local (linux-x64) tier1 using Shenandoah. > New TestReferenceClearDuringMarking fails for G1 without these changes. > New TestReferenceClearDuringReferenceProcessing fails for ZGC without these changes. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'master' into java_clear - new tests need ref in oldgen too - remove obsolete comment about races with clear and enqueue - add private native Reference::clear0 - test clear during marking - test clear during reference processing ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1376/files - new: https://git.openjdk.java.net/jdk/pull/1376/files/c19efd70..dfa51fb3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1376&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1376&range=01-02 Stats: 74214 lines in 450 files changed: 70893 ins; 2086 del; 1235 mod Patch: https://git.openjdk.java.net/jdk/pull/1376.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1376/head:pull/1376 PR: https://git.openjdk.java.net/jdk/pull/1376 From stuefe at openjdk.java.net Wed Nov 25 05:25:56 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 25 Nov 2020 05:25:56 GMT Subject: RFR: 8254042: 
gtest/GTestWrapper.java failed os.test_random In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 23:38:52 GMT, Coleen Phillimore wrote: > The function os::init_random() and os::random() both set the _rand_seed. This test thinks nothing can change the seed while it is computing its expected value. > I changed the test to run in a VM operation safepoint. Alternately, I can change the test to not verify the random value computed in the loop, or remove the test. > Ran tier1 tests on linux-x64 and windows-x64. Hi Coleen, Why is someone concurrently changing the seed? I thought "TEST" tests do not start the VM? Or is it that some earlier test already did? Could we not just change the test to use os::next_random(seed)? That would be a two-line change: remove the init_random call, and replace os::random() with `num = seed = os::next_random(seed);` That said, moving the initialization out of the platform files and initialize the seed directly makes sense. If you want to stay with your approach of making os::init_random() only available at safepoint, could you make it private? CDS could - or arguably should - use the same technique as outlined above, using os::next_random with an own seed instead of os::random. A third possibility would be to keep the seed THREAD_LOCAL. Cheers, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/1422 From jbhateja at openjdk.java.net Wed Nov 25 06:11:58 2020 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 25 Nov 2020 06:11:58 GMT Subject: Integrated: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions In-Reply-To: References: Message-ID: On Tue, 22 Sep 2020 15:24:41 GMT, Jatin Bhateja wrote: > Summary: > > 1) Partial in-lining technique avoids call overhead penalty for small array copy operations with size less than 32 bytes. 
> 2) At runtime, a conditional check based on copy length either calls an array-copy stub or executes an optimized instruction sequence using AVX-512 masked instructions emitted at the call site.
> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes determines the maximum size for partial in-lining.
> 4) Based on the perf results seen in benchmarks, partial in-lining is currently performed only for arraycopy involving sub-word types (bool/byte/char/short). Once PR-61 is integrated, we can extend this patch to cover all the primitive types.
>
> Performance Results:
> System : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> Micros : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> ArrayCopyPartialInlineSize : 32
>
> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain
> -- | -- | -- | -- | --
> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550836
> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
>
> Detailed Reports:
> Baseline : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt)
> WithOpt : [http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt](http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt)

This pull request has now been integrated.

Changeset: 0d91f0a1
Author: Jatin Bhateja
URL: https://git.openjdk.java.net/jdk/commit/0d91f0a1
Stats: 493 lines in 25 files changed: 448 ins; 23 del; 22 mod

8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions

Reviewed-by: neliasso, kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/302

From dholmes at openjdk.java.net Wed Nov 25 07:38:58 2020
From: dholmes at openjdk.java.net (David Holmes)
Date: Wed, 25 Nov 2020 07:38:58 GMT
Subject: RFR: 8256382: Use try_lock for hs_err EventLog printing
In-Reply-To:
References: <69fkZdGvgYprbfVE4dKZfkhBgnOPiNEoyVy3FpsDGck=.0855b1bf-da14-4f18-9808-0f0fd1bf0cba@github.com>
Message-ID:

On Tue, 24 Nov 2020 14:45:59 GMT, Stefan Karlsson wrote:

>> Hi Stefan,
>>
>> can't we just print out unconditionally if VMError::is_error_reported()? What's the worst that can happen, we crash? Then we would continue with the next reporting step. But we still have a chance to see the event buffer contents even if the event logger itself crashed or asserted.
>> >> Side note, some time ago I rewrote the whole event system: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2019-March/033150.html to a much simpler implementation and I think that one would also be pretty safe to get printed even if unlocked since it works on a static pre-allocated buffer. But that work somehow got bogged down in review and I never got around to drive it upstream. If there is enough interest I may take up the work again. > > Hi @tstuefe > > With your proposal, if we have a section that looks like this: > * event log instance 0 > * event log instance 1 > * event log instance 2 > * other type of printing > > and we crash at "event log instance 1", then we completely skip printing "event log instance 2" and "other type of printing". My thinking was that that was unfortunate, and I wanted to localize the problem. We often have this problem that one hs_err crash hides important information that was supposed to be written later, because a lot of the code in the hs_err printing isn't hardened and the hs_err sections are on a too high level (IMHO). > > With that said, I don't mind skipping the trying to take the lock if we are printing the hs_err file. Do others agree with Thomas? Don't we have to check not only VMError::is_error_reported, but that it is the current thread that is doing the reporting? I also think the Thread::current_or_null()==NULL case has to mean we are doing the error reporting very early in VM init - else how can we get in here in a "non attached" thread? Even then I'm not sure that is actually possible either - at what point in VM init have we installed our crash handler? 
-------------

PR: https://git.openjdk.java.net/jdk/pull/1408

From stuefe at openjdk.java.net Wed Nov 25 07:54:54 2020
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Wed, 25 Nov 2020 07:54:54 GMT
Subject: RFR: 8256382: Use try_lock for hs_err EventLog printing
In-Reply-To:
References: <69fkZdGvgYprbfVE4dKZfkhBgnOPiNEoyVy3FpsDGck=.0855b1bf-da14-4f18-9808-0f0fd1bf0cba@github.com>
Message-ID: <4iQgQm7IEGnV4-iRBygxcwLqLfTUwK1P8AzhvmRquT4=.f6db138c-e972-483f-8628-528c5ed5b1a3@github.com>

On Wed, 25 Nov 2020 07:36:04 GMT, David Holmes wrote:

> Don't we have to check not only VMError::is_error_reported, but that it is the current thread that is doing the reporting?
>

You mean my proposal of just not locking altogether? It's a calculated risk. The chance of someone printing out the event log concurrently with the hs-err reporter doing it is extremely low, and since we have secondary crash reporting, the only risk we run is an interrupted error reporting step.

> I also think the Thread::current_or_null()==NULL case has to mean we are doing the error reporting very early in VM init - else how can we get in here in a "non attached" thread?

Thread::current_or_null()==NULL if we crash in a non-attached thread or at/before VM init. The former case is typical if the VM is embedded into another launcher and foreign code crashes (not uncommon). Arguably, in both cases the event log is not very interesting, but I'd still attempt to print it.

> Even then I'm not sure that is actually possible either - at what point in VM init have we installed our crash handler?
-------------

PR: https://git.openjdk.java.net/jdk/pull/1408

From stefank at openjdk.java.net Wed Nov 25 07:54:55 2020
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Wed, 25 Nov 2020 07:54:55 GMT
Subject: RFR: 8256382: Use try_lock for hs_err EventLog printing
In-Reply-To: <4iQgQm7IEGnV4-iRBygxcwLqLfTUwK1P8AzhvmRquT4=.f6db138c-e972-483f-8628-528c5ed5b1a3@github.com>
References: <69fkZdGvgYprbfVE4dKZfkhBgnOPiNEoyVy3FpsDGck=.0855b1bf-da14-4f18-9808-0f0fd1bf0cba@github.com> <4iQgQm7IEGnV4-iRBygxcwLqLfTUwK1P8AzhvmRquT4=.f6db138c-e972-483f-8628-528c5ed5b1a3@github.com>
Message-ID:

On Wed, 25 Nov 2020 07:49:42 GMT, Thomas Stuefe wrote:

> Don't we have to check not only VMError::is_error_reported, but that it is the current thread that is doing the reporting?

I found inspiration for this in this piece of code:

    void ThreadsSMRSupport::print_info_on(outputStream* st) {
      // Only grab the Threads_lock if we don't already own it and if we
      // are not reporting an error.
      // Note: Not grabbing the Threads_lock during error reporting is
      // dangerous because the data structures we want to print can be
      // freed concurrently. However, grabbing the Threads_lock during
      // error reporting can be equally dangerous since this thread might
      // block during error reporting or a nested error could leave the
      // Threads_lock held. The classic no win scenario.
      //
      MutexLocker ml((Threads_lock->owned_by_self() || VMError::is_error_reported()) ? NULL : Threads_lock);

Looking at this a bit more I see that the EventLogs typically check this function instead:

    // check to see if fatal error reporting is in progress
    static bool fatal_error_in_progress() { return _first_error_tid != -1; }

which is used to turn off EventLog logging when we crash.

Both these variables are set here:

    if (_first_error_tid == -1 &&
        Atomic::cmpxchg(&_first_error_tid, (intptr_t)-1, mytid) == -1) {
      ...
      // first time
      _error_reported = true;

-------------

PR: https://git.openjdk.java.net/jdk/pull/1408

From shade at openjdk.java.net Wed Nov 25 08:01:03 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 25 Nov 2020 08:01:03 GMT
Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo
In-Reply-To:
References:
Message-ID:

On Wed, 25 Nov 2020 03:31:36 GMT, Vladimir Kozlov wrote:

> JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2.
>
> Initial patch was prepared by @fisk.
>
> Tested hs-tier1-4. Added new compiler tests to test intrinsics.
>
> Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough.

src/hotspot/share/opto/c2compiler.cpp line 476:

> 474:     if (UseCompressedOops && UseShenandoahGC) return false;
> 475: #endif
> 476:     break;

Is this intended to disable the intrinsic on all non-64-bit platforms? Is that only for Shenandoah 64-bit? I wonder if it should just be:

    case vmIntrinsics::_PhantomReference_refersTo0:
      if (UseCompressedOops && UseShenandoahGC) return false;
      break;

-------------

PR: https://git.openjdk.java.net/jdk/pull/1425

From vlivanov at openjdk.java.net Wed Nov 25 08:21:57 2020
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Wed, 25 Nov 2020 08:21:57 GMT
Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo
In-Reply-To:
References:
Message-ID:

On Wed, 25 Nov 2020 07:58:42 GMT, Aleksey Shipilev wrote:

>> JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2.
>>
>> Initial patch was prepared by @fisk.
>>
>> Tested hs-tier1-4. Added new compiler tests to test intrinsics.
>>
>> Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough.
>
> src/hotspot/share/opto/c2compiler.cpp line 476:
>
>> 474:     if (UseCompressedOops && UseShenandoahGC) return false;
>> 475: #endif
>> 476:     break;
>
> Is this intended to disable the intrinsic on all non-64-bit platforms? Is that only for Shenandoah 64-bit? I wonder if it should just be:
>
>     case vmIntrinsics::_PhantomReference_refersTo0:
>       if (UseCompressedOops && UseShenandoahGC) return false;
>       break;

Considering `UseCompressedOops` doesn't make much sense in 32-bit mode and is set to `false`, it seems `#ifdef` can be just dropped.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1425

From shade at openjdk.java.net Wed Nov 25 08:27:56 2020
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 25 Nov 2020 08:27:56 GMT
Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo
In-Reply-To:
References:
Message-ID:

On Wed, 25 Nov 2020 03:31:36 GMT, Vladimir Kozlov wrote:

> JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2.
>
> Initial patch was prepared by @fisk.
>
> Tested hs-tier1-4. Added new compiler tests to test intrinsics.
>
> Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough.

I just pulled the fresh master, applied this patch on top, enabled `_PhantomReference_refersTo0` in `c2compiler.cpp`, and ran `CONF=linux-x86_64-server-fastdebug make images run-test TEST=tier1 TEST_VM_OPTS="-XX:+UseShenandoahGC"` without problems. @vnkozlov, what Shenandoah failure did you see?

Attention @rkennke.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1425

From dnsimon at openjdk.java.net Wed Nov 25 08:31:59 2020
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Wed, 25 Nov 2020 08:31:59 GMT
Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports
In-Reply-To:
References: <-RIuurnvW5M8EYNRiJejJY8yQt3o64X-xcJ82Zv2Q1I=.31873652-68bc-4194-943c-071d9065c210@github.com>
Message-ID:

On Wed, 25 Nov 2020 01:48:17 GMT, Vladimir Kozlov wrote:

>> For metropolis I have next code in jvmci_globals.cpp which switch off JVMCI if test uses unsupported GC:
>>
>>     static bool supported_gc() {
>>       return (UseSerialGC || UseParallelGC || UseG1GC);
>>     }
>>
>>     void JVMCIGlobals::check_jvmci_supported_gc() {
>>       if (EnableJVMCI) {
>>         // Check if selected GC is supported by JVMCI and Java compiler
>>         if (!supported_gc()) {
>>           if (FLAG_IS_DEFAULT(UseJVMCINativeLibrary) && !UseJVMCINativeLibrary) { // Just disable Graal when libgraal is present
>>             vm_exit_during_initialization("JVMCI Compiler does not support selected GC", GCConfig::hs_err_name());
>>           }
>>           FLAG_SET_DEFAULT(EnableJVMCI, false);
>>           FLAG_SET_DEFAULT(UseJVMCICompiler, false);
>>           FLAG_SET_DEFAULT(UseJVMCINativeLibrary, false);
>>         }
>>       }
>>     }
>
> Why not call such `JVMCIGlobals::supported_gc()` from `GCConfig::is_gc_supported`?
> I don't understand why you need to go through Java JVMCI code for that?

The reason to go through Java JVMCI code is that JVMCI itself does not know which GCs are supported by a JVMCI compiler. For example, once Graal supports ZGC, the above `supported_gc` function will be incorrect.
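The argument above can be illustrated with a small C++ sketch, under loudly stated assumptions: `GcKind`, `CompilerSketch`, the two compiler classes and the free function `is_gc_supported` are all hypothetical and do not correspond to real HotSpot or JVMCI types. The point it shows is only that the list of supported collectors lives with each compiler rather than in a table hard-coded on the VM side, so the answer changes when the compiler changes.

```cpp
#include <cassert>

// Hypothetical enumeration of collectors; not a HotSpot type.
enum class GcKind { Serial, Parallel, G1, Z, Shenandoah, Epsilon };

// Hypothetical compiler interface: each compiler answers for itself.
struct CompilerSketch {
  virtual ~CompilerSketch() = default;
  virtual bool is_gc_supported(GcKind gc) const = 0;
};

// A compiler release that only knows the three collectors from the
// hard-coded supported_gc() quoted above.
struct OlderCompilerSketch : CompilerSketch {
  bool is_gc_supported(GcKind gc) const override {
    return gc == GcKind::Serial || gc == GcKind::Parallel || gc == GcKind::G1;
  }
};

// A later (imagined) release that has gained ZGC support.
struct NewerCompilerSketch : CompilerSketch {
  bool is_gc_supported(GcKind gc) const override {
    return gc == GcKind::Serial || gc == GcKind::Parallel ||
           gc == GcKind::G1 || gc == GcKind::Z;
  }
};

// VM-side query: the VM must support the GC itself and, if a JVMCI
// compiler is in use (non-null), that compiler must agree.
bool is_gc_supported(GcKind gc, bool vm_supports_gc,
                     const CompilerSketch* jvmci_compiler) {
  if (!vm_supports_gc) {
    return false;
  }
  if (jvmci_compiler == nullptr) {
    return true;  // JVMCI not enabled: nothing more to check
  }
  return jvmci_compiler->is_gc_supported(gc);
}
```

With a fixed VM-side list, the `NewerCompilerSketch` case would be answered incorrectly until the VM itself was updated; delegating the question avoids that coupling.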
-------------

PR: https://git.openjdk.java.net/jdk/pull/1423

From stefank at openjdk.java.net Wed Nov 25 08:31:56 2020
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Wed, 25 Nov 2020 08:31:56 GMT
Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports
In-Reply-To:
References:
Message-ID:

On Tue, 24 Nov 2020 23:46:17 GMT, Doug Simon wrote:

> A number of jtreg tests require a specific GC. These tests should be ignored when EnableJVMCI is true and the JVMCI compiler does not support the required GC.
>
> This PR adds `JVMCICompiler.isGCSupported` and makes use of it in `WhiteBox.isGCSupported`.
>
> Prior to this PR, a test requiring a GC not yet supported by a JVMCI compiler fails as follows:
> Error occurred during initialization of VM
> JVMCI Compiler does not support selected GC: epsilon gc

Could this be done without adding JVMCI code to GCConfig::is_gc_supported? Could you extend VMProps.java:

    protected void vmGC(SafeMap map) {
        var isGraalEnabled = Compiler.isGraalEnabled();
        for (GC gc: GC.values()) {
            map.put("vm.gc." + gc.name(),
                    () -> "" + (gc.isSupported()
                            && (!isGraalEnabled || isGcSupportedByGraal(gc))
                            && (gc.isSelected() || GC.isSelectedErgonomically())));
        }
    }

and add: `!isJvmciEnabled || isGcSupportedByJvmci(gc)` plus necessary JVMCI code to implement that?

-------------

Changes requested by stefank (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1423

From sjohanss at openjdk.java.net Wed Nov 25 08:50:56 2020
From: sjohanss at openjdk.java.net (Stefan Johansson)
Date: Wed, 25 Nov 2020 08:50:56 GMT
Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size
In-Reply-To:
References:
Message-ID:

On Wed, 11 Nov 2020 09:27:50 GMT, Joakim Nordström wrote:

> ### Description
> The logging for reserved heap space, printed the `GenAlignment` value instead of page size.
> Before:
>
> [0.369s][info][gc      ] Using Parallel
> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=512K size=2M
> 
> Besides changing to display page size using `os::vm_page_size()`, this fix also adds logging of `SpaceAlignment`, `GenAlignment`, and `HeapAlignment`.
> After:
>
> [0.367s][info][pagesize] Alignments:  space_align=512K gen_align=512K heap_align=2M
> [0.369s][info][gc      ] Using Parallel
> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=4K size=2M
> 
> Please advise whether the added logging of alignments is sufficient or irrelevant. There were no signs of either alignment being logged anywhere.
> ### Testing
> - Tested manually, and logs are showing with arguments -Xlog:pagesize=info -XX:+UseParallelGC
> - Tested Tier1, Tier2, Tier3.

@jaokim I've run some additional testing and haven't seen any problems so I'm ready to sponsor whenever you want to integrate.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1161

From rrich at openjdk.java.net Wed Nov 25 09:01:59 2020
From: rrich at openjdk.java.net (Richard Reingruber)
Date: Wed, 25 Nov 2020 09:01:59 GMT
Subject: RFR: 8256754: Deoptimization::revoke_for_object_deoptimization: stack processing start call is redundant [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 24 Nov 2020 23:15:27 GMT, Dean Long wrote:

> I like the version with the assert.

Good. Thanks for reviewing!

-------------

PR: https://git.openjdk.java.net/jdk/pull/1381

From vlivanov at openjdk.java.net Wed Nov 25 09:03:00 2020
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Wed, 25 Nov 2020 09:03:00 GMT
Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo
In-Reply-To:
References:
Message-ID:

On Wed, 25 Nov 2020 03:31:36 GMT, Vladimir Kozlov wrote:

> JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2.
>
> Initial patch was prepared by @fisk.
>
> Tested hs-tier1-4. Added new compiler tests to test intrinsics.
>
> Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough.
src/hotspot/cpu/x86/gc/z/z_x86_64.ad line 123:

> 121:
> 122:   ins_encode %{
> 123:     if (barrier_data() != 0) { // barrier could be elided by ZBarrierSetC2::analyze_dominating_barriers()

Maybe keep a bit reserved for `ZLoadBarrierElided` or just map it to `0`? The former is preferred because it keeps the info that there was a barrier data attached in the first place.

src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 623:

> 621:     // Also we need to add memory barrier to prevent commoning reads
> 622:     // from this field across safepoint since GC can change its value.
> 623:     bool need_read_barrier = (((on_weak || on_phantom) && !no_keepalive) ||

There's a slight change: `in_heap && (on_weak || ...)` turns into `(on_weak ...) || (in_heap ...)`. It will introduce a read barrier for `!in_heap && on_weak` case. Does it occur in practice?

Another one: `on_weak` turns into `((on_weak ...) && !no_keepalive)`. My interpretation is no read barrier needed when `NO_KEEPALIVE` flag is used and currently a redundant barrier is issued.

Maybe replace `!no_keepalive` with just `keep_alive`? The former is harder to parse.

The check grows bigger and bigger. Maybe it's time to split it?

Turn `on_weak || on_phantom` into `!is_strong`?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1425

From github.com+779991+jaokim at openjdk.java.net Wed Nov 25 09:39:59 2020
From: github.com+779991+jaokim at openjdk.java.net (Joakim Nordström)
Date: Wed, 25 Nov 2020 09:39:59 GMT
Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size
In-Reply-To:
References:
Message-ID:

On Wed, 25 Nov 2020 08:48:26 GMT, Stefan Johansson wrote:

>> ### Description
>> The logging for reserved heap space, printed the `GenAlignment` value instead of page size.
>> Before:
>>
>> [0.369s][info][gc      ] Using Parallel
>> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=512K size=2M
>> 
>> Besides changing to display page size using `os::vm_page_size()`, this fix also adds logging of `SpaceAlignment`, `GenAlignment`, and `HeapAlignment`.
>> After:
>>
>> [0.367s][info][pagesize] Alignments:  space_align=512K gen_align=512K heap_align=2M
>> [0.369s][info][gc      ] Using Parallel
>> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=4K size=2M
>> 
>> Please advise whether the added logging of alignments is sufficient or irrelevant. There were no signs of either alignment being logged anywhere.
>> ### Testing
>> - Tested manually, and logs are showing with arguments -Xlog:pagesize=info -XX:+UseParallelGC
>> - Tested Tier1, Tier2, Tier3.
>
> @jaokim I've run some additional testing and haven't seen any problems so I'm ready to sponsor whenever you want to integrate.

Thank you @kstefanj and @tschatzl for reviewing and comments!

Tested tier1, tier2, tier3 without problems.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1161

From github.com+779991+jaokim at openjdk.java.net Wed Nov 25 09:40:02 2020
From: github.com+779991+jaokim at openjdk.java.net (Joakim Nordström)
Date: Wed, 25 Nov 2020 09:40:02 GMT
Subject: Integrated: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size
In-Reply-To:
References:
Message-ID:

On Wed, 11 Nov 2020 09:27:50 GMT, Joakim Nordström wrote:

> ### Description
> The logging for reserved heap space, printed the `GenAlignment` value instead of page size.
> Before:
>
> [0.369s][info][gc      ] Using Parallel
> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=512K size=2M
> 
> Besides changing to display page size using `os::vm_page_size()`, this fix also adds logging of `SpaceAlignment`, `GenAlignment`, and `HeapAlignment`.
> After:
>
> [0.367s][info][pagesize] Alignments:  space_align=512K gen_align=512K heap_align=2M
> [0.369s][info][gc      ] Using Parallel
> [0.374s][info][pagesize] Heap:  min=2M max=2M base=0x00000000ffe00000 page_size=4K size=2M
> 
> Please advise whether the added logging of alignments is sufficient or irrelevant. There were no signs of either alignment being logged anywhere.
> ### Testing
> - Tested manually, and logs are showing with arguments -Xlog:pagesize=info -XX:+UseParallelGC
> - Tested Tier1, Tier2, Tier3.

This pull request has now been integrated.

Changeset: 8cd2e0f6
Author: Joakim Nordström
Committer: Stefan Johansson
URL: https://git.openjdk.java.net/jdk/commit/8cd2e0f6
Stats: 144 lines in 7 files changed: 116 ins; 24 del; 4 mod

8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size

Reviewed-by: sjohanss, tschatzl

-------------

PR: https://git.openjdk.java.net/jdk/pull/1161

From eosterlund at openjdk.java.net Wed Nov 25 09:49:57 2020
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Wed, 25 Nov 2020 09:49:57 GMT
Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo
In-Reply-To:
References:
Message-ID:

On Wed, 25 Nov 2020 08:30:46 GMT, Vladimir Ivanov wrote:

>> JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2.
>>
>> Initial patch was prepared by @fisk.
>>
>> Tested hs-tier1-4. Added new compiler tests to test intrinsics.
>>
>> Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough.
>
> src/hotspot/cpu/x86/gc/z/z_x86_64.ad line 123:
>
>> 121:
>> 122:   ins_encode %{
>> 123:     if (barrier_data() != 0) { // barrier could be elided by ZBarrierSetC2::analyze_dominating_barriers()
>
> Maybe keep a bit reserved for `ZLoadBarrierElided` or just map it to `0`? The former is preferred because it keeps the info that there was a barrier data attached in the first place.
The information that there was a barrier attached is implicit in the ins_encode block due to it being run at all. In other words, since we matched the mach node to our ZGC access instead of a normal access, we already know that there was barrier data attached, and that we no longer have such barrier data.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1425

From vlivanov at openjdk.java.net Wed Nov 25 10:03:56 2020
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Wed, 25 Nov 2020 10:03:56 GMT
Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo
In-Reply-To:
References:
Message-ID:

On Wed, 25 Nov 2020 09:47:19 GMT, Erik Österlund wrote:

>> src/hotspot/cpu/x86/gc/z/z_x86_64.ad line 123:
>>
>>> 121:
>>> 122:   ins_encode %{
>>> 123:     if (barrier_data() != 0) { // barrier could be elided by ZBarrierSetC2::analyze_dominating_barriers()
>>
>> Maybe keep a bit reserved for `ZLoadBarrierElided` or just map it to `0`? The former is preferred because it keeps the info that there was a barrier data attached in the first place.
>
> The information that there was a barrier attached is implicit in the ins_encode block due to it being run at all. In other words, since we matched the mach node to our ZGC access instead of a normal access, we already know that there was barrier data attached, and that we no longer have such barrier data.

Ok, makes sense. What do you think about making `ZLoadBarrierElided = 0` then?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1425

From pliden at openjdk.java.net Wed Nov 25 11:07:59 2020
From: pliden at openjdk.java.net (Per Liden)
Date: Wed, 25 Nov 2020 11:07:59 GMT
Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo
In-Reply-To:
References:
Message-ID:

On Wed, 25 Nov 2020 03:31:36 GMT, Vladimir Kozlov wrote:

> JDK-8188055 added the function Reference.refersTo.
For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2. > > Initial patch was prepared by @fisk. > > Tested hs-tier1-4. Added new compiler tests to test intrinsics. > > Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough. src/hotspot/share/opto/library_call.cpp line 5525: > 5523: Node* LibraryCallKit::load_field_from_object(Node* fromObj, const char* fieldName, const char* fieldTypeString, > 5524: DecoratorSet decorators = IN_HEAP, bool is_exact = false, bool is_static = false, > 5525: ciInstanceKlass* fromKls = NULL) { It looks like the `is_exact` argument here can be removed, as all call-sites use the default value, which is `false`, and the only use of it in the function is this assert, which will never fail. assert(!is_exact || tinst->klass_is_exact(), "klass not exact"); ------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From pliden at openjdk.java.net Wed Nov 25 11:11:56 2020 From: pliden at openjdk.java.net (Per Liden) Date: Wed, 25 Nov 2020 11:11:56 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo In-Reply-To: References: Message-ID: <19RAO8WaSz5jtGP-k8aNoHtNkWjVBFaa-QU97qHKmGc=.faf01bfc-7a71-45ee-b7e3-37aa2cc194d0@github.com> On Wed, 25 Nov 2020 03:31:36 GMT, Vladimir Kozlov wrote: > JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2. > > Initial patch was prepared by @fisk. > > Tested hs-tier1-4. Added new compiler tests to test intrinsics. > > Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. 
Someone from Shenandoah team have to test changes if that is enough.

ZGC changes look good!

-------------

Marked as reviewed by pliden (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1425

From stuefe at openjdk.java.net Wed Nov 25 11:44:58 2020
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Wed, 25 Nov 2020 11:44:58 GMT
Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size
In-Reply-To:
References:
Message-ID:

On Wed, 25 Nov 2020 09:34:56 GMT, Joakim Nordström wrote:

>> @jaokim I've run some additional testing and haven't seen any problems so I'm ready to sponsor whenever you want to integrate.
>
> Thank you @kstefanj and @tschatzl for reviewing and comments!
>
> Tested tier1, tier2, tier3 without problems.

Hi guys,

I was just reviewing this but see that it was already pushed in the meantime.

I have some concerns about this change. Most importantly, it perpetuates the notion that alignment has anything to do with pages, which is just wrong. I know we have still some coding in the hotspot which assumes this but it is not true. We should remove any coding assuming that, instead of building onto it.

ReservedSpace::alignment is just that, the alignment of the start address of the ReservedSpace. Arguably, there is no need to even keep it as a member in ReservedSpace, it is needed for reservation and should not be needed beyond that.

So, `page_size = MIN2(rs.alignment(), os::large_page_size());` is wrong, e.g. if I create a space with 128M aligned base, that says nothing about the underlying page size. Unfortunately we do not have tests to test this.

This coding had been less of a problem as long as it lived in gc land and was only called for heap areas. Now it is a general purpose function. (why is this function static btw?)

As an example for larger alignments which have nothing to do with page size: Metaspace reserves 4M-aligned ReservedSpaces.
Currently only uses small pages, but that may change.

Furthermore I think the notion of asking about the page size of a range is itself shaky. A range can and does consist of multiple page sizes (see e.g. what Linux reserves with UseHugeTLBFS).

The patch also assumes os::large_page_size() to be the one large page size used, and currently there are attempts by Intel to more or less fall back to a "small large page" if the large page is too large for a given area: see https://github.com/openjdk/jdk/pull/1153. This would increase the number of used page sizes, and could mean space reserved with reserve_memory_special() would silently return memory with small-large-pages instead of os::large_page_size().

I am sorry to make such a fuss.

-----

In an ideal world we would have something like

    pagesizes_t os::query_page_sizes(address range);

which would return information about all page sizes in the range (pagesizes_t could be a bitmask btw since all page sizes are pow2). We have something like this on AIX, see `os::Aix::query_pagesize`. On Linux, we have e.g.

    int getpagesize(void);

That'd be awesome since it would take the burden from us to keep information of page sizes or second-guessing reservation logic. That information could even be cached in a ReservedSpace if it's difficult to get.

Cheers, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/1161

From dnsimon at openjdk.java.net Wed Nov 25 12:03:58 2020
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Wed, 25 Nov 2020 12:03:58 GMT
Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports
In-Reply-To:
References:
Message-ID:

On Wed, 25 Nov 2020 08:29:01 GMT, Stefan Karlsson wrote:

> Could this be done without adding JVMCI code to GCConfig::is_gc_supported?

This has to go through the WhiteBox API as JVMCI is not accessible to normal Java code (such as VMProps). We could restrict the JVMCI interposition to WhiteBox.
This is what I did [initially](https://github.com/openjdk/jdk/pull/1423/commits/191d61d996dd08f43c2c946f02b0cce0d96f2ef4). However, I then saw that `GCConfig::is_gc_supported` is only called from whitebox.cpp so [moved](https://github.com/openjdk/jdk/pull/1423/commits/54b7ba8463dfee9a679abb09a5cc898ba8550d85) the logic directly into `GCConfig`. I think this is better because as soon as there's another caller of `GCConfig::is_gc_supported`, they should get an accurate answer and not have to worry about implications of `EnableJVMCI` separately. ------------- PR: https://git.openjdk.java.net/jdk/pull/1423 From stuefe at openjdk.java.net Wed Nov 25 12:05:00 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 25 Nov 2020 12:05:00 GMT Subject: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2] In-Reply-To: References: <-xtX9qSJHuD-qfp52XPToKhkl1HypRmNFHCJaupaync=.99285cd9-69a6-42cd-84b4-3c87fefc2cc5@github.com> <8f-BJdFip5yf0Rv4uw-qcXVk2uM3Lb6Hrq9VPR6UzF4=.04966477-8834-4fb9-aa77-8da86f104176@github.com> Message-ID: On Thu, 19 Nov 2020 13:08:05 GMT, Thomas Stuefe wrote: >>> Hi Stefan, >>> >>> Thanks so much for your review. >>> >>> > Hi and welcome :) >>> > I haven't started reviewing the code in detail but a first quick glance raised a couple of questions/comments: >>> > >>> > * Why do we have a special case for `exec` when selecting a large page size? >>> >>> To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. In most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. 
The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations. >>> >>> Perhaps I should just select the page size <= bytes requested and remove 'exec' special case. >>> >> Yes, I see no reason to keep that special case and we want to keep this code as general as possible. Looking at the code in `os::Linux::find_default_large_page_size()` it looks like S390 supports 1M large pages, so we cannot assume 2M. I suggest using a technique similar to the one used in `os::Linux::find_large_page_size` to find supported page sizes. If you scan `/sys/kernel/mm/hugepages` and populate `_page_sizes` using the information found we know we only get supported sizes. >> >>> > * If we need the special case, `exec_large_page_size()` should not be hard code to return 2m but rather use `os::default_large_page_size()`. >>> >>> os::default_large_page_size() will not necessarily be small enough for code memory reservations if the os::default_large_page_size() = 1G, in those cases we would get 4k on most linux x86_64 variants. My attempt is to ensure the smallest large_page_size availabe is used for code memory reservations. Perhaps my 2M hardcoding was a mistake and I should discover this size and select it based on the bytes being reserved. >> >> You are correct that the default size might indeed be 1G, so using something like I suggest above to figure out the available page sizes and then using an appropriate one given the size of the mapping sounds good. >> >> Please also avoid force pushing changes to open PRs since it makes it harder to follow what changes between updates. It is fine for a PR to contain multiple commits and if you need to update with things from the main branch you should merge rather than rebase. >> >> Cheers, >> Stefan > > Hi Markus, > > thanks, and a belated welcome! 
> > Some initial background: > > We at SAP are maintainers for a number of ports, among others AIX and linux ppc/s390 as well as some propietary ones (e.g. HPUX or ia64). So I wear my platform glasses when looking at this code. > > IMHO the virtual memory layer in hotspot - os::reserve_memory() and all its friends - could do with a revamp. At least a consistent API documentation :-/. Supposed to be an API-independent abstraction, its facade breaks in many places. See e.g. JDK-8255978, JDK-8253649 (all windows), AIX sysV shmem handling, @AntonKozlov's valiant attempt to add MAP_JIT code heap reservation on MacOS (https://github.com/openjdk/jdk/pull/294), or the relative difficulty with which support for JEP 316 (from Intel) had been added. > > Hence my initial caution. Every new feature increases complexity for us maintainers. Especially if it continues the bad tradition of not documenting or commenting anything. Since I do not know whether Intel sticks around to maintain this contribution (bit of a mixed track record there, see e.g. JDK-8256181), we must plan on maintenance falling to us. > > That said, now that I understand better what you want to do, your plan certainly makes sense and is useful. > > One of the more pressing concerns I have is that the changes to reserve_memory() would somehow be observable from the outside and/or leak back into the os layer when calling os::commit_memory/uncommit_memory/release_memory. This is the case with @AntonKozlov's MAP_JIT change: it requires a matching commit call in os::commit_memory() to be made for executable memory allocated with os::reserve_memory(), and therefore exposed one weakness of the os::reserve_memory() API, that its very difficult to pass along meta information about memory mappings. > > I think this is not the case here, but I'm not sure and we should be sure. > > **More remarks inline.** > >> Hi Thomas, >> >> Thanks so much for your review. Please bear with me as this if my first patch to JDK community. 
But I have pushed patches to other open source communities (OpenDaylight, ONAP, OpenStack) and worked as a committer in some. >> >> **Responses below inline:** >> >> > Hi, >> > this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping. >> > Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures. >> >> I appreciate the feedback. Perhaps the lack of detail in the pull request/JDK issue is a function of my zoomed focus on the specific purpose and lack of understanding about how much detail is normally included. The purpose of the patch/issue is to enable code hugepage memory reservations on Linux when the JDK is configured with 1G hugepages (LargePages in JDK parlance). > > Please beef up the JBS issue a bit. If you do not have access to it, you can send the text to me I will update it. Or even easier, just update the PR description and we copy the text to the JBS. > > JBS tickets are supposed to keep information about what we did and why for a long time. When formulating the text, just imagine the reader to be someone in the future with general knowledge in your field but without particular knowledge about this very case. I know this is a vague description though; for an example, see e.g. https://bugs.openjdk.java.net/browse/JDK-8255978. > >> >> To my knowledge, in most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. 
The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations. > > Right, and as Stefan suggested, this should be kept more "fluid" and not be hard coded to 2M, nor to just one additional large page. Maybe the system has four page sizes (our propietary HPUX has that, not that it matters here). > >> >> > I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec. >> > What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :) >> >> I'd appreciate any advice on writing a less vague spec. I have used exec as a stand-in for code memory reservations in my descriptions, mostly due to the fact that a 'bool exec' is used in functions that reserve HugePages and this later is translated into 'PROT_EXEC' when mmap is called, "exec" is passed in but not used in SHM. These are the particular memory reservations we wanted the patch to affect when using 1G LargePages. However I will remove those references if unwarranted. > > > >> >> > What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other". >> >> Code is the target memory region. However there are some other instances where large_page reservation is happening due to the addition of 2M pages as an option. Some calls fail and error when adding 2M pages to _page_sizes array in the company of 1G pages. See line https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR3790 >> This is where 2m pages are added. 
>> >> However at https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4077 >> we get failures after the addition of 2M sizes to _page_sizes array, due to some smaller reservations that happen regardless of the LargePageSizeInBytes=1G >> >> So in >> https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4229 >> we make sure that we select the largest page size that works with the bytes requested for reservation. Perhaps we shouldn't have exec as a special case, as the large page returned will be the same based on the size requested. > > > We need to decide on whether we want to do this for the code heap only or for every reservation done with reserve_memory_special (I really dislike that name btw). In your proposal you "piggyback" on the exec property as a standin for "code heap", which is not clean and also not necessarily true. So: > > a) If we only want to do this for the code heap, we could think about creating an own API for allocating the code heap. E.g. os::reserve_code_space() and os::release_code_space(). This is one of the ideas @AntonKozlov came up with to circumvent the need for a fully fledged revamp of these APIs while still being able to move his PR forward. > > b) If we want to do this for all callers of reserve_memory_special(), we should also remove any mention of "exec" and just implement that. > > I currently favour (b) but would like to know opinions of others. > >> >> > One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? 
For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size? >> > What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM? >> >> My architecture knowledge outside of x86_64 is limited. I've been looking into this and thinking about it and I will have some more comments in the next day or so. For UseSHM I ran some negative tests but I will do some more rigorous testing and report back. > > Okay. We do not expect every contributor to have exotic test machines, but this means we will have to do that testing. We need to know to plan in these efforts. > >> >> > The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages. >> > For SHM, I think you need to make sure that alignment matches SHMLBA? >> >> Looking into this. >> >> > It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR. >> >> I'm open to removing exec references and just enabling multiple page sizes which would allow for 2M pages to be used by code memory reservations. >> >> > Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder). >> >> Thanks. I will push an updated patch w/ comments. 
Will attempt clear API spec and regression tests for reserve_memory_special but will need some guidance on those. > > When I write API specs I basically mean "new code should comment better". That can be as simple as a one liner above your os::Linux::select_large_page_size() function. > > About regression tests, we have a google-test suite (see test/hotspot/gtest) which would be the appropriate point to put in tests. > >> >> > The linux-2m-page-specific code in the platform-generic G1 test seems wrong. >> >> Any advice here? My change specifically changes the behavior of the pages returned in the test for linux platforms but should not have effects on other platforms. I don't know how this would generally happen for JDK tests in this case. It seems to me that the JDK will act differently on different platforms. How is this normally handled? > > I defer to the G1 folks for that. > >> >> > Cheers, Thomas >> >> Thanks again for the review. > > Sure. Thanks for the much clearer information. > > Cheers, Thomas

Hi Markus,

the more I think about this the more I think your proposal makes sense.

In my opinion I would do it transparently for reserve_memory_special() (so, not tied to code heap).

Maybe one way to simplify this and move it forward would be to just do it for UseHugeTLBFS, and leave the UseSHM path unchanged. I consider this less risky since with UseHugeTLBFS we already reserve spaces with mixed page sizes and that seems to work - so here, callers are already wrong if they make any assumptions about the underlying page size. Note that UseHugeTLBFS is the default if +UseLargePages is set.

Just my 5 cents.
Cheers, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From stuefe at openjdk.java.net Wed Nov 25 12:14:56 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 25 Nov 2020 12:14:56 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 11:42:45 GMT, Thomas Stuefe wrote: >> Thank you @kstefanj and @tschatzl for reviewing and comments! >> >> Tested tier1, tier2, tier3 without problems. > > Hi guys, > > I was just reviewing this but see that it was already pushed in the meantime. > > I have some concerns about this change. > > Most importantly, it perpetuates the notion that alignment has anything to do with pages, which is just wrong. I know we have still some coding in the hotspot which assumes this but it is not true. We should remove any coding assuming that, instead of building onto it. > > ReservedSpace::alignment is just that, the alignment of the start address of the ReservedSpace. Arguably, there is no need to even keep it as a member in ReservedSpace, it is needed for reservation and should not be needed beyond that. > > So, `page_size = MIN2(rs.alignment(), os::large_page_size());` is wrong, e.g. if I create a space with 128M aligned base, that does say nothing about the underlying page size. Unfortunately we do not have tests to test this. This coding had been less of a problem as long as it lived in gc land and was only called for heap areas. Now it is a general purpose function. > > (why is this function static btw?) > > As an example for larger alignment which have nothing to do with page size: Metaspace reserves 4M-aligned ReservedSpaces. Currently only uses small pages, but that may change. > > Furthermore I think the notion of asking about the page size of a range is itself shaky. A range can and does consist of multiple page sizes (see e.g. 
what Linux reserves with UseHugeTLBFS). > > The patch also assumes os::large_page_size() to be the one large page size used, and currently there are attempts by Intel to more or less fall back to a "small large page" if the large page is too large for a given area: see https://github.com/openjdk/jdk/pull/1153. This would increase the number of used page sizes, and could mean space reserved with reserve_memory_special() would silently return memory with small-large-pages instead of os::large_page_size(). > > I am sorry to make such a fuss. > > ----- > > In an ideal world we would have something like > > pagesizes_t os::query_page_sizes(address range); > > which would return information about all page sizes in the range (pagesizes_t could be a bitmask btw since all page sizes are pow2). We have something like this on AIX, see `os::Aix::query_pagesize`. On Linux, we have e.g. int getpagesize(void); > > That'd be awesome since it would take the burden from us to keep information of page sizes or second-guessing reservation logic. That information could even be cached in a ReservedSpace if it's difficult to get. > > Cheers, Thomas

I wrote > On Linux, we have e.g. int getpagesize(void); > which is of course wrong since it just returns the global small page size. Querying the page size of an arbitrary range seems to require walking smaps, and that is way more cumbersome.
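
The remark that a page-size set "could be a bitmask ... since all page sizes are pow2" can be illustrated with a small sketch. This is only an illustration under assumptions: `PageSizeSet` and its member names are hypothetical and are not taken from HotSpot or from the patch under review; the sketch only shows why power-of-two sizes make the representation cheap, since each size can serve as its own bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type (not HotSpot code): a set of page sizes kept in a single
// 64-bit mask. Every page size is a power of two, so the size value itself
// is the bit that represents it.
class PageSizeSet {
  std::uint64_t _bits = 0;

  static bool is_power_of_2(std::uint64_t v) {
    return v != 0 && (v & (v - 1)) == 0;
  }

public:
  // Add a page size; the caller must pass a power of two.
  void add(std::uint64_t page_size) {
    assert(is_power_of_2(page_size));
    _bits |= page_size;
  }

  bool contains(std::uint64_t page_size) const {
    return is_power_of_2(page_size) && (_bits & page_size) != 0;
  }

  bool empty() const { return _bits == 0; }

  // Smallest page size in the set, or 0 if the set is empty.
  // (~x + 1) is the two's complement negation; x & -x isolates the lowest bit.
  std::uint64_t smallest() const { return _bits & (~_bits + 1); }

  // Largest page size in the set, or 0 if the set is empty.
  std::uint64_t largest() const {
    std::uint64_t v = _bits;
    std::uint64_t result = 0;
    while (v != 0) {
      result = v & (~v + 1);  // remember the current lowest bit
      v &= v - 1;             // clear it; the last one remembered is the highest
    }
    return result;
  }
};
```

A range backed by 4K, 2M and 1G pages would then be described by one value with three bits set, which is cheap to cache alongside a reservation.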
------------- PR: https://git.openjdk.java.net/jdk/pull/1161 From jlahoda at openjdk.java.net Wed Nov 25 12:43:59 2020 From: jlahoda at openjdk.java.net (Jan Lahoda) Date: Wed, 25 Nov 2020 12:43:59 GMT Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) In-Reply-To: <9wJGqoEIqSX2DDUO5Y8qxv3lvU5QhCgwox5wzApcXwM=.b7cd85fc-018f-4a70-93fb-32218b1babe5@github.com> References: <3hcm-LPJG34kftsIY2_tgDJiPwuplmron5EQkJ4NT5s=.88ff5bc4-a2d8-4273-a958-e271aacd3358@github.com> <9wJGqoEIqSX2DDUO5Y8qxv3lvU5QhCgwox5wzApcXwM=.b7cd85fc-018f-4a70-93fb-32218b1babe5@github.com> Message-ID: On Tue, 24 Nov 2020 23:00:05 GMT, Mandy Chung wrote: >> I agree. This @apiNote needs more clarification to help the readers to understand the context here. One thing we could do in the Package class description to add a "Package Sealing" section that can also explain that it has no relationship to "sealed classes". > > I added an API note in `Package::isSealed` [1] to clarify sealed package vs sealed class or interface. > > The API note you added in `Class::isSealed` can be clarified in a similar fashion like: "Sealed class or interface has no relationship with {@linkplain Package#isSealed package sealing}". > > [1] https://github.com/openjdk/jdk/commit/3c230b8a Thanks for that update, Mandy! I've tweaked the API note as per your recommendation. I'll publish a fixed PR later, reflecting the other review comments as well. ------------- PR: https://git.openjdk.java.net/jdk/pull/1227 From coleen.phillimore at oracle.com Wed Nov 25 12:57:07 2020 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Wed, 25 Nov 2020 07:57:07 -0500 Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random In-Reply-To: References: Message-ID: On 11/25/20 12:25 AM, Thomas Stuefe wrote: > On Tue, 24 Nov 2020 23:38:52 GMT, Coleen Phillimore wrote: > >> The function os::init_random() and os::random() both set the _rand_seed. 
This test thinks nothing can change the seed while it is computing its expected value. >> I changed the test to run in a VM operation safepoint. Alternately, I can change the test to not verify the random value computed in the loop, or remove the test. >> Ran tier1 tests on linux-x64 and windows-x64. > Hi Coleen, > > Why is someone concurrently changing the seed? I thought "TEST" tests do not start the VM? Or is it that some earlier test already did? Yes, people investigated this and the VM is started by an earlier test. > > Could we not just change the test to use os::next_random(seed)? > That would be a two-line change: remove the init_random call, and replace os::random() with `num = seed = os::next_random(seed);` This is a really good suggestion. > > That said, moving the initialization out of the platform files and initialize the seed directly makes sense. I'd like to keep this part because it seemed senseless to me, but I'll rework the rest. > > If you want to stay with your approach of making os::init_random() only available at safepoint, could you make it private? CDS could - or arguably should - use the same technique as outlined above, using os::next_random with an own seed instead of os::random. I'd like only the os.random test to know about next_random function and leave CDS alone. Thanks, Coleen > > A third possibility would be to keep the seed THREAD_LOCAL. 
> > Cheers, Thomas > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/1422 From sjohanss at openjdk.java.net Wed Nov 25 13:26:01 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 25 Nov 2020 13:26:01 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size In-Reply-To: References: Message-ID: <0t69lhO_zHj_sg0_doBgHyv5Fpu6Y2OuZsza1NsyE6A=.2a972bf7-5e12-4ce3-9ca2-b2add68d07c5@github.com> On Wed, 25 Nov 2020 12:12:19 GMT, Thomas Stuefe wrote: >> Hi guys, >> >> I was just reviewing this but see that it was already pushed in the meantime. >> >> I have some concerns about this change. >> >> Most importantly, it perpetuates the notion that alignment has anything to do with pages, which is just wrong. I know we have still some coding in the hotspot which assumes this but it is not true. We should remove any coding assuming that, instead of building onto it. >> >> ReservedSpace::alignment is just that, the alignment of the start address of the ReservedSpace. Arguably, there is no need to even keep it as a member in ReservedSpace, it is needed for reservation and should not be needed beyond that. >> >> So, `page_size = MIN2(rs.alignment(), os::large_page_size());` is wrong, e.g. if I create a space with 128M aligned base, that does say nothing about the underlying page size. Unfortunately we do not have tests to test this. This coding had been less of a problem as long as it lived in gc land and was only called for heap areas. Now it is a general purpose function. >> >> (why is this function static btw?) >> >> As an example for larger alignment which have nothing to do with page size: Metaspace reserves 4M-aligned ReservedSpaces. Currently only uses small pages, but that may change. >> >> Furthermore I think the notion of asking about the page size of a range is itself shaky. A range can and does consist of multiple page sizes (see e.g. 
what Linux reserves with UseHugeTLBFS). >> >> The patch also assumes os::large_page_size() to be the one large page size used, and currently there are attempts by Intel to more or less fall back to a "small large page" if the large page is too large for a given area: see https://github.com/openjdk/jdk/pull/1153. This would increase the number of used page sizes, and could mean space reserved with reserve_memory_special() would silently return memory with small-large-pages instead of os::large_page_size(). >> >> I am sorry to make such a fuss. >> >> ----- >> >> In an ideal world we would have something like >> >> pagesizes_t os::query_page_sizes(address range); >> >> which would return information about all page sizes in the range (pagesizes_t could be a bitmask btw since all page sizes are pow2). We have something like this on AIX, see `os::Aix::query_pagesize`. On Linux, we have e.g. int getpagesize(void); >> >> That'd be awesome since it would take the burden from us to keep information of page sizes or second-guessing reservation logic. That information could even be cached in a ReservedSpace if it's difficult to get. >> >> Cheers, Thomas > > I wrote > >> On Linux, we have e.g. int getpagesize(void); >> > > which is of course wrong since it just returns the global small page size. Querying the page size of an arbitrary range seems to require walking smaps, and that is way more cumbersome.

Hi Thomas,

Thanks for sharing your concerns. I agree that this becomes more of a problem now that it is exposed outside the GC, which is why I wasn't sure where we should put this: > Not entirely sure where the best location for such helper is, but a static function ReservedSpace::actual_page_size(ReservedSpace) could work.

This is also why it was added as a static helper rather than a member. An alternative is to move this helper to a shared GC utility, and maybe document it a bit more.
Adding multiple large page sizes will require this function to be updated, which is one of the reasons I wanted it shared rather than duplicated.

As you say, the best thing would be if we had a way to really query the page size rather than doing a good estimation given the looks of the reserved space. But for the current use-cases in the GC, this estimation seems to be enough. Going the other route is, as you say, way more work.

A different solution would be to add a `_largest_page_size` member to the reserved space, which would be the largest page size used by that reservation. That might make sense if we are moving towards having multiple large page sizes.

What would be your preferred way forward?

Cheers, Stefan

-------------

PR: https://git.openjdk.java.net/jdk/pull/1161

From stefank at openjdk.java.net Wed Nov 25 13:36:56 2020
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Wed, 25 Nov 2020 13:36:56 GMT
Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports
In-Reply-To:
References:
Message-ID:

On Wed, 25 Nov 2020 12:01:21 GMT, Doug Simon wrote: > > Could this be done without adding JVMCI code to GCConfig::is_gc_supported? > > This has to go through the WhiteBox API as JVMCI is not accessible to normal Java code (such as VMProps). > We could restrict the JVMCI interposition to WhiteBox. This is what I did [initially](https://github.com/openjdk/jdk/pull/1423/commits/191d61d996dd08f43c2c946f02b0cce0d96f2ef4). However, I then saw that `GCConfig::is_gc_supported` is only called from whitebox.cpp so [moved](https://github.com/openjdk/jdk/pull/1423/commits/54b7ba8463dfee9a679abb09a5cc898ba8550d85) the logic directly into `GCConfig`. I think this is better because as soon as there's another caller of `GCConfig::is_gc_supported`, they should get an accurate answer and not have to worry about implications of `EnableJVMCI` separately.

We already have code in place to ensure that EnableJVMCI is set to false when you run a GC that doesn't support JVMCI:

    void JVMCIGlobals::check_jvmci_supported_gc() {
      if (EnableJVMCI) {
        // Check if selected GC is supported by JVMCI and Java compiler
        if (!(UseSerialGC || UseParallelGC || UseG1GC)) {
          vm_exit_during_initialization("JVMCI Compiler does not support selected GC", GCConfig::hs_err_name());
          FLAG_SET_DEFAULT(EnableJVMCI, false);
          FLAG_SET_DEFAULT(UseJVMCICompiler, false);
        }
      }
    }

So, in our view we have "if you run this particular GC, which doesn't have support in JVMCI, then we turn off JVMCI forcefully", and we rely on that behavior in the HotSpot code. The proposed patch turns this inside out and the changed is_gc_supported function now becomes "if you've turned on EnableJVMCI, is this GC supported?". It sort-of inverts the dependency between the two sub-systems and we don't want to use that particular question in HotSpot. AFAICT, this is something you only need to ask in the context of jtreg's @requires, so I think it should be moved out of the GC code and put in the jtreg extension code.

We have other flags that we treat similarly. For example, UseCompressedOops. ZGC doesn't support it, so we turn it off forcefully, but we don't have code in GCConfig::is_gc_supported that returns false when UseCompressedOops is turned on. That is completely handled by the jtreg extensions (and the WhiteboxAPI).

I think you could easily solve this without adding calls through JVMCI from the GC code by:

1) Update JVMCIGlobals::supported_gc() to make a call through JVMCI into the used compiler that can tell if it supports the particular GC
2) Use Vladimir's proposed check_jvmci_supported_gc function, which uses supported_gc().
3) Add your own WhiteboxAPI function, say WB_IsJVMCISupportedByGC, and call your "supported_gc" function from that 4) Update VMProps.java like I suggested above ------------- PR: https://git.openjdk.java.net/jdk/pull/1423 From sjohanss at openjdk.java.net Wed Nov 25 13:38:02 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 25 Nov 2020 13:38:02 GMT Subject: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2] In-Reply-To: References: <-xtX9qSJHuD-qfp52XPToKhkl1HypRmNFHCJaupaync=.99285cd9-69a6-42cd-84b4-3c87fefc2cc5@github.com> <8f-BJdFip5yf0Rv4uw-qcXVk2uM3Lb6Hrq9VPR6UzF4=.04966477-8834-4fb9-aa77-8da86f104176@github.com> Message-ID: On Wed, 25 Nov 2020 12:02:43 GMT, Thomas Stuefe wrote: >> Hi Markus, >> >> thanks, and a belated welcome! >> >> Some initial background: >> >> We at SAP are maintainers for a number of ports, among others AIX and linux ppc/s390 as well as some propietary ones (e.g. HPUX or ia64). So I wear my platform glasses when looking at this code. >> >> IMHO the virtual memory layer in hotspot - os::reserve_memory() and all its friends - could do with a revamp. At least a consistent API documentation :-/. Supposed to be an API-independent abstraction, its facade breaks in many places. See e.g. JDK-8255978, JDK-8253649 (all windows), AIX sysV shmem handling, @AntonKozlov's valiant attempt to add MAP_JIT code heap reservation on MacOS (https://github.com/openjdk/jdk/pull/294), or the relative difficulty with which support for JEP 316 (from Intel) had been added. >> >> Hence my initial caution. Every new feature increases complexity for us maintainers. Especially if it continues the bad tradition of not documenting or commenting anything. Since I do not know whether Intel sticks around to maintain this contribution (bit of a mixed track record there, see e.g. JDK-8256181), we must plan on maintenance falling to us. 
>> >> That said, now that I understand better what you want to do, your plan certainly makes sense and is useful. >> >> One of the more pressing concerns I have is that the changes to reserve_memory() would somehow be observable from the outside and/or leak back into the os layer when calling os::commit_memory/uncommit_memory/release_memory. This is the case with @AntonKozlov's MAP_JIT change: it requires a matching commit call in os::commit_memory() to be made for executable memory allocated with os::reserve_memory(), and therefore exposed one weakness of the os::reserve_memory() API, that its very difficult to pass along meta information about memory mappings. >> >> I think this is not the case here, but I'm not sure and we should be sure. >> >> **More remarks inline.** >> >>> Hi Thomas, >>> >>> Thanks so much for your review. Please bear with me as this if my first patch to JDK community. But I have pushed patches to other open source communities (OpenDaylight, ONAP, OpenStack) and worked as a committer in some. >>> >>> **Responses below inline:** >>> >>> > Hi, >>> > this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping. >>> > Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures. >>> >>> I appreciate the feedback. Perhaps the lack of detail in the pull request/JDK issue is a function of my zoomed focus on the specific purpose and lack of understanding about how much detail is normally included. The purpose of the patch/issue is to enable code hugepage memory reservations on Linux when the JDK is configured with 1G hugepages (LargePages in JDK parlance). >> >> Please beef up the JBS issue a bit. 
If you do not have access to it, you can send the text to me I will update it. Or even easier, just update the PR description and we copy the text to the JBS. >> >> JBS tickets are supposed to keep information about what we did and why for a long time. When formulating the text, just imagine the reader to be someone in the future with general knowledge in your field but without particular knowledge about this very case. I know this is a vague description though; for an example, see e.g. https://bugs.openjdk.java.net/browse/JDK-8255978. >> >>> >>> To my knowledge, in most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations. >> >> Right, and as Stefan suggested, this should be kept more "fluid" and not be hard coded to 2M, nor to just one additional large page. Maybe the system has four page sizes (our propietary HPUX has that, not that it matters here). >> >>> >>> > I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec. >>> > What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :) >>> >>> I'd appreciate any advice on writing a less vague spec. I have used exec as a stand-in for code memory reservations in my descriptions, mostly due to the fact that a 'bool exec' is used in functions that reserve HugePages and this later is translated into 'PROT_EXEC' when mmap is called, "exec" is passed in but not used in SHM. 
These are the particular memory reservations we wanted the patch to affect when using 1G LargePages. However I will remove those references if unwarranted. >> >> >> >>> >>> > What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other". >>> >>> Code is the target memory region. However there are some other instances where large_page reservation is happening due to the addition of 2M pages as an option. Some calls fail and error when adding 2M pages to _page_sizes array in the company of 1G pages. See line https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR3790 >>> This is where 2m pages are added. >>> >>> However at https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4077 >>> we get failures after the addition of 2M sizes to _page_sizes array, due to some smaller reservations that happen regardless of the LargePageSizeInBytes=1G >>> >>> So in >>> https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4229 >>> we make sure that we select the largest page size that works with the bytes requested for reservation. Perhaps we shouldn't have exec as a special case, as the large page returned will be the same based on the size requested. >> >> >> We need to decide on whether we want to do this for the code heap only or for every reservation done with reserve_memory_special (I really dislike that name btw). In your proposal you "piggyback" on the exec property as a standin for "code heap", which is not clean and also not necessarily true. So: >> >> a) If we only want to do this for the code heap, we could think about creating an own API for allocating the code heap. E.g. 
os::reserve_code_space() and os::release_code_space(). This is one of the ideas @AntonKozlov came up with to circumvent the need for a fully fledged revamp of these APIs while still being able to move his PR forward. >> >> b) If we want to do this for all callers of reserve_memory_special(), we should also remove any mention of "exec" and just implement that. >> >> I currently favour (b) but would like to know opinions of others. >> >>> >>> > One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size? >>> > What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM? >>> >>> My architecture knowledge outside of x86_64 is limited. I've been looking into this and thinking about it and I will have some more comments in the next day or so. For UseSHM I ran some negative tests but I will do some more rigorous testing and report back. >> >> Okay. We do not expect every contributor to have exotic test machines, but this means we will have to do that testing. We need to know to plan in these efforts. >> >>> >>> > The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages. >>> > For SHM, I think you need to make sure that alignment matches SHMLBA? >>> >>> Looking into this. 
>>> >>> > It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR. >>> >>> I'm open to removing exec references and just enabling multiple page sizes which would allow for 2M pages to be used by code memory reservations. >>> >>> > Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder). >>> >>> Thanks. I will push an updated patch w/ comments. Will attempt clear API spec and regression tests for reserve_memory_special but will need some guidance on those. >> >> When I write API specs I basically mean "new code should comment better". That can be as simple as a one liner above your os::Linux::select_large_page_size() function. >> >> About regression tests, we have a google-test suite (see test/hotspot/gtest) which would be the appropiate point to put in tests. >> >>> >>> > The linux-2m-page-specific code in the platform-generic G1 test seems wrong. >>> >>> Any advice here. My change specifically changes the behavior of the pages returned in the test for linux platforms but should not have effects on other platforms. I don't know how this would generally happen for JDK tests in this case. It seems to me that the JDK will act differently on different platforms. How is this normally handled? >> >> I defer to the G1 folks for that. >> >>> >>> > Cheers, Thomas >>> >>> Thanks again for the review. >> >> Sure. Thanks for the much more clear information. >> >> Cheers, Thomas > > Hi Markus, > > the more I think about this the more I think it your proposal makes sense. 
> > In my opinion I would do it transparently for reserve_memory_special() (so, not tied to code heap). > > Maybe one way to simplify this and move it forward would be to just do it for UseHugeTLBFS, and leave the UseSHM path unchanged. I consider this less risky since with UseHugeTLBFS we already reserve spaces with mixed page sizes and that seems to work - so here, callers already are wrong if they make any assumptions about the underlying page size. Note that UseHugeTLBFS is the default if +UseLargePages is set. > > Just my 5 cent. > > Cheers, Thomas I agree with what Thomas is saying. This should be a generic thing for reservations, as I've suggested before, choosing the largest page size given the size of the mapping. I would also be good with starting with the `UseHugeTLBFS` case. When it comes to testing, we should not hard code these kind of things in the test, but add WhiteBox functions that return the correct numbers given the platform and environment. WhiteBox wb = WhiteBox.getWhiteBox(); smallPageSize = wb.getVMPageSize(); smallPageSize = wb.getVMPageSize(); largePageSize = wb.getVMLargePageSize(); largePageSize = wb.getVMLargePageSize(); largePageExecSize = 2097152; So instead of hard coding this, I guess the correct approach would be to return an array of available page sizes and verify that the correct one is used. 
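As a rough illustration of "choosing the largest page size given the size of the mapping", here is a minimal C++ sketch. The function name and the list of available page sizes are assumptions made for this example, not an existing HotSpot or WhiteBox API, and "largest" is read here as "largest page size that does not exceed the mapping size".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not a HotSpot or WhiteBox API: returns the largest
// entry of `available` that does not exceed `mapping_size`, or 0 if none fits.
inline std::size_t largest_page_size_for(std::size_t mapping_size,
                                         const std::vector<std::size_t>& available) {
  std::size_t best = 0;
  for (std::size_t page_size : available) {
    // Page sizes are expected to be non-zero powers of two.
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    if (page_size <= mapping_size && page_size > best) {
      best = page_size;
    }
  }
  return best;
}
```

A test could then compare the page size it observes against the value such a helper computes from the reported list, instead of hard-coding 2097152.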
------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From stuefe at openjdk.java.net Wed Nov 25 13:56:59 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 25 Nov 2020 13:56:59 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size In-Reply-To: <0t69lhO_zHj_sg0_doBgHyv5Fpu6Y2OuZsza1NsyE6A=.2a972bf7-5e12-4ce3-9ca2-b2add68d07c5@github.com> References: <0t69lhO_zHj_sg0_doBgHyv5Fpu6Y2OuZsza1NsyE6A=.2a972bf7-5e12-4ce3-9ca2-b2add68d07c5@github.com> Message-ID: On Wed, 25 Nov 2020 13:23:30 GMT, Stefan Johansson wrote: >> I wrote >> >>> On Linux, we have e.g. int getpagesize(void); >>> >> >> which is of course wrong since it just returns the global small page size. Querying the page size of an arbitrary range seems to require walking smaps, and that is way more cumbersome. > > Hi Thomas, > > Thanks for sharing your concerns. I agree that this becomes more of a problem now when it is exposed outside the GC, that's why I wasn't sure where we should put this: >> Not entirely sure where the best location for such helper is, but a static function ReservedSpace::actual_page_size(ReservedSpace) could work. > > This is also why it was added as a static helper rather than a member. An alternative is to move this helper to a shared GC utility, and maybe document it a bit more. Adding multiple large page sizes will require this function to be updated, that's one of the reasons I wanted it shared not duplicated. > > As you say the best thing would be if we had a way to really query the page size rather than doing a good estimation given the looks of the reserved space. But for the current use-cases in the GC, this estimation seems to be enough. Going the other route is, as you say, way more work. > > A different solution would be to add a `_largest_page_size` member to the reserved space, which would be the largest page size used by that reservation.
That might make sense if we are moving towards having multiple large page sizes. > > What would be your preferred way forward? > > Cheers, > Stefan Hi Stefan, Thanks for taking my concerns seriously. I know that this is all a bit messy, and has a long history. I briefly looked into some way to ask the OS about the page size of a memory region. That is certainly possible but annoyingly complicated. On AIX we have a thing already. The only platform that has one :) On Linux you would have to scan /proc/self/smaps. May be worthwhile but complex and expensive for large processes. On Windows I have no clue what to do. VirtualQuery() does not return page information. Neither do I know what to do on macOS.
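To give a feel for what scanning smaps would involve, here is a minimal C++ sketch. It assumes the usual smaps layout (a header line "start-end perms ..." per mapping, followed by "Name: value kB" lines including `KernelPageSize`); the helper name is hypothetical, it is not HotSpot code, and it takes the file content as a string to keep the example self-contained.

```cpp
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical helper (not HotSpot code). `smaps_text` is the content of
// /proc/self/smaps; returns the KernelPageSize, in bytes, of the mapping that
// contains `addr`, or 0 if no mapping covers it.
inline std::uint64_t kernel_page_size_at(const std::string& smaps_text,
                                         std::uint64_t addr) {
  std::istringstream in(smaps_text);
  std::string line;
  bool in_target = false;
  while (std::getline(in, line)) {
    unsigned long long start = 0, end = 0, kb = 0;
    // Mapping header lines start with "<start>-<end>" in hex.
    if (std::sscanf(line.c_str(), "%llx-%llx", &start, &end) == 2) {
      in_target = (addr >= start && addr < end);
    } else if (in_target &&
               std::sscanf(line.c_str(), "KernelPageSize: %llu kB", &kb) == 1) {
      return kb * 1024;
    }
  }
  return 0;
}
```

Even this simple version has to walk every mapping that precedes the target, which is where the cost for large processes comes from.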
Interestingly, I found that we have os::page_info() and os::scan_pages(), which look like they would do what we want but they are empty on all platforms. Solaris leftover. I filed JDK-8257076 for you guys to track this since you may remove some of the associated NUMA coding too. --- So the more practical and faster way would be to store this information ourselves. It would be better than trying to deduce what we did at reservation time. I am currently working on a prototype to expand os::reserve_memory_special() to optionally return information about the page sizes in the reserved range. Kinda like sigset_t. That would allow the os implementors to do whatever (eg mix in 2M pages), and you have the information about that set of page sizes, which could be stored in ReservedSpace as a member. Later, that info can be used as "what the underlying page sizes are to our best knowledge". That prototype would look a bit like this: char* os::reserve_memory_special(..., pagesizeset_t* page_sizes = NULL); This is a smaller version of what I think would be generally a good idea, which is for all os::reserve__... APIs to return meta information about the reserved space for the caller to hold on to. But this is only for os::reserve_memory_special(), and only for page sizes, so less daunting. I'll post an RFR if I have a prototype ready. Then we may take another look at this change and rework this actual_page_size(). 
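To make the sigset_t analogy a little more concrete, here is a minimal C++ sketch of what such a set could look like. The prototype's actual `pagesizeset_t` is not shown in this thread, so the class name, its methods, and the bitmask representation are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Hypothetical stand-in for the `pagesizeset_t` mentioned above. Page sizes
// are powers of two, so bit n of the mask represents the page size (1 << n).
class PageSizeSet {
  std::uint64_t _bits = 0;

  static unsigned exact_log2(std::size_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    unsigned n = 0;
    while ((std::size_t(1) << n) != page_size) ++n;
    return n;
  }

public:
  PageSizeSet() = default;
  PageSizeSet(std::initializer_list<std::size_t> sizes) {
    for (std::size_t s : sizes) add(s);
  }

  void add(std::size_t page_size) {
    _bits |= std::uint64_t(1) << exact_log2(page_size);
  }
  bool contains(std::size_t page_size) const {
    return ((_bits >> exact_log2(page_size)) & 1) != 0;
  }
  bool empty() const { return _bits == 0; }

  // Largest page size in the set, or 0 if the set is empty.
  std::size_t largest() const {
    for (int n = 63; n >= 0; --n) {
      if ((_bits >> n) & 1) return std::size_t(1) << n;
    }
    return 0;
  }

  // Smallest page size in the set, or 0 if the set is empty.
  std::size_t smallest() const {
    for (int n = 0; n < 64; ++n) {
      if ((_bits >> n) & 1) return std::size_t(1) << n;
    }
    return 0;
  }
};
```

A caller that insists on a single page size could then check that `largest()` equals `smallest()` for the set it got back, while other callers would just store the set for logging.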
Cheers, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/1161 From stuefe at openjdk.java.net Wed Nov 25 14:02:00 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 25 Nov 2020 14:02:00 GMT Subject: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2] In-Reply-To: References: <-xtX9qSJHuD-qfp52XPToKhkl1HypRmNFHCJaupaync=.99285cd9-69a6-42cd-84b4-3c87fefc2cc5@github.com> <8f-BJdFip5yf0Rv4uw-qcXVk2uM3Lb6Hrq9VPR6UzF4=.04966477-8834-4fb9-aa77-8da86f104176@github.com> Message-ID: On Wed, 25 Nov 2020 13:35:18 GMT, Stefan Johansson wrote: >> Hi Markus, >> >> the more I think about this the more I think it your proposal makes sense. >> >> In my opinion I would do it transparently for reserve_memory_special() (so, not tied to code heap). >> >> Maybe one way to simplify this and move it forward would be to just do it for UseHugeTLBFS, and leave the UseSHM path unchanged. I consider this less risky since with UseHugeTLBFS we already reserve spaces with mixed page sizes and that seems to work - so here, callers already are wrong if they make any assumptions about the underlying page size. Note that UseHugeTLBFS is the default if +UseLargePages is set. >> >> Just my 5 cent. >> >> Cheers, Thomas > > I agree with what Thomas is saying. This should be a generic thing for reservations, as I've suggested before, choosing the largest page size given the size of the mapping. I would also be good with starting with the `UseHugeTLBFS` case. > > When it comes to testing, we should not hard code these kind of things in the test, but add WhiteBox functions that return the correct numbers given the platform and environment. 
> > WhiteBox wb = WhiteBox.getWhiteBox(); > smallPageSize = wb.getVMPageSize(); > smallPageSize = wb.getVMPageSize(); > largePageSize = wb.getVMLargePageSize(); > largePageSize = wb.getVMLargePageSize(); > largePageExecSize = 2097152; > So instead of hard coding this, I guess the correct approach would be to return an array of available page sizes and verify that the correct one is used. I honestly don't even know why we have UseSHM. Seems redundant, and since it uses SystemV shared memory which has a different semantics from mmap, it is subtly broken in a number of places (eg https://bugs.openjdk.java.net/browse/JDK-8257040 or https://bugs.openjdk.java.net/browse/JDK-8257041). ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From sjohanss at openjdk.java.net Wed Nov 25 14:44:58 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 25 Nov 2020 14:44:58 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size In-Reply-To: References: <0t69lhO_zHj_sg0_doBgHyv5Fpu6Y2OuZsza1NsyE6A=.2a972bf7-5e12-4ce3-9ca2-b2add68d07c5@github.com> Message-ID: <3ZpyM1gAZ0HeDdWd4_Ps98de3IKXvCowyDeFCaso97U=.9694feb3-82d5-4da6-bda6-5431e3dd94d1@github.com> On Wed, 25 Nov 2020 13:54:23 GMT, Thomas Stuefe wrote: >> Hi Thomas, >> >> Thanks for sharing your concerns. I agree that this becomes more of a problem now when it is exposed outside the GC, that's why I wasn't sure where we should put this: >>> Not entirely sure where the best location for such helper is, but a static function ReservedSpace::actual_page_size(ReservedSpace) could work. >> >> This is also why it was added as a static helper rather than a member. An alternative is to move this helper to a shared GC utility, and maybe document it a bit more. Adding multiple large page sizes will require this function to be updated, that's one of the reasons I wanted it shared not duplicated. 
>> >> As you say the best thing would be if we had a way to really query the page size rather than doing a good estimation given the looks of the reserved space. But for the current use-cases in the GC, this estimation seems to be enough. Going the other route is, as you say, way more work. >> >> A different solution would be to add a `_largest_page_size` member to the reserved space, which would be the largest page size used by that reservation. That might make sense if we are moving towards having multiple large page sizes. >> >> What would be your preferred way forward? >> >> Cheers, >> Stefan > > Hi Stefan, > > Thanks for taking my concerns seriously. I know that this is all a bit messy, and has a long history.
> > I briefly looked into some way to ask the OS about the page size of a memory region. That is certainly possible but annoyingly complicated. > > On AIX we have a thing already. The only platform that has one :) > > On Linux you would have to scan /proc/self/smaps. May be worthwhile but complex and expensive for large processes. > > On Windows I have no clue what to do. VirtualQuery() does not return page information. Neither do I know what to do on macOS. > > Interestingly, I found that we have os::page_info() and os::scan_pages(), which look like they would do what we want but they are empty on all platforms. Solaris leftover. I filed JDK-8257076 for you guys to track this since you may remove some of the associated NUMA coding too. > > --- > > So the more practical and faster way would be to store this information ourselves. It would be better than trying to deduce what we did at reservation time. > > I am currently working on a prototype to expand os::reserve_memory_special() to optionally return information about the page sizes in the reserved range. Kinda like sigset_t. That would allow the os implementors to do whatever (eg mix in 2M pages), and you have the information about that set of page sizes, which could be stored in ReservedSpace as a member. Later, that info can be used as "what the underlying page sizes are to our best knowledge". > > That prototype would look a bit like this: > > char* os::reserve_memory_special(..., pagesizeset_t* page_sizes = NULL); > > This is a smaller version of what I think would be generally a good idea, which is for all os::reserve__... APIs to return meta information about the reserved space for the caller to hold on to. But this is only for os::reserve_memory_special(), and only for page sizes, so less daunting. > > I'll post an RFR if I have a prototype ready. Then we may take another look at this change and rework this actual_page_size().
> > Cheers, Thomas So Thomas in what cases do we need the `ReservedSpace` to be able to handle multiple page-sizes? A simpler approach would be to aim for only having one page size per `ReservedSpace`. For example, in G1 it would be problematic or at least inefficient to have multiple page sizes per mapping. ------------- PR: https://git.openjdk.java.net/jdk/pull/1161 From dnsimon at openjdk.java.net Wed Nov 25 14:55:56 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 25 Nov 2020 14:55:56 GMT Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 13:34:24 GMT, Stefan Karlsson wrote: >>> Could this be done without adding JVMCI code to GCConfig::is_gc_supported? >> >> This has to go through the WhiteBox API as JVMCI is not accessible to normal Java code (such as VMProps). >> We could restrict the JVMCI interposition to WhiteBox. This is what I did [initially](https://github.com/openjdk/jdk/pull/1423/commits/191d61d996dd08f43c2c946f02b0cce0d96f2ef4). However, I then saw that `GCConfig::is_gc_supported` is only called from whitebox.cpp so [moved](https://github.com/openjdk/jdk/pull/1423/commits/54b7ba8463dfee9a679abb09a5cc898ba8550d85) the logic directly into `GCConfig`. I think this is better because as soon as there's another caller of `GCConfig::is_gc_supported`, they should get an accurate answer and not have to worry about implications of `EnableJVMCI` separately. > >> > Could this be done without adding JVMCI code to GCConfig::is_gc_supported? >> >> This has to go through the WhiteBox API as JVMCI is not accessible to normal Java code (such as VMProps). >> We could restrict the JVMCI interposition to WhiteBox. This is what I did [initially](https://github.com/openjdk/jdk/pull/1423/commits/191d61d996dd08f43c2c946f02b0cce0d96f2ef4). 
However, I then saw that `GCConfig::is_gc_supported` is only called from whitebox.cpp so [moved](https://github.com/openjdk/jdk/pull/1423/commits/54b7ba8463dfee9a679abb09a5cc898ba8550d85) the logic directly into `GCConfig`. I think this is better because as soon as there's another caller of `GCConfig::is_gc_supported`, they should get an accurate answer and not have to worry about implications of `EnableJVMCI` separately. > > We already have code in place to ensure that EnableJVMCI is set to false when you run a GC that doesn't support JVMCI: > void JVMCIGlobals::check_jvmci_supported_gc() { > if (EnableJVMCI) { > // Check if selected GC is supported by JVMCI and Java compiler > if (!(UseSerialGC || UseParallelGC || UseG1GC)) { > vm_exit_during_initialization("JVMCI Compiler does not support selected GC", GCConfig::hs_err_name()); > FLAG_SET_DEFAULT(EnableJVMCI, false); > FLAG_SET_DEFAULT(UseJVMCICompiler, false); > } > } > } > > So, in our view we have "if you run this particular GC, that don't have support in JVMCI, then we turn off JVMCI forcefully", and we rely on that behavior in the HotSpot code. The proposed patch turns this inside out and the changed is_gc_supported function now becomes "if you've turned on EnableJVMCI is this GC supported". It sort-of inverts the dependency between the two sub-systems and we don't want to use that particular question in HotSpot. AFAICT, this is something you only need to ask in the context of jtreg's @requires, so I think it should be moved out of the GC code and put in the jtreg extension code. > > We've other flags that we treat similarly. For example, UseCompressedOops. ZGC doesn't support it, so we turn it off forcefully, but we don't have code in GCConfig::is_gc_supported that returns false when UseCompressedOops is turned on. That is completely handled by the jtreg extensions (and the WhiteboxAPI). 
> > I think you could easily solve this without adding calls through JVMCI from the GC code by: > 1) Update JVMCIGlobals::supported_gc() to make a call through JVMCI into the used compiler that can tell if it supports the particular GC > 2) Use Vladimir's proposed check_jvmci_supported_gc function, which uses supported_gc(). > 3) Add your own WhiteboxAPI function, say WB_IsJVMCISupportedByGC, and call your "supported_gc" function from that > 4) Update VMProps.java like I suggested above `JVMCIGlobals::check_jvmci_supported_gc` is incorrect in that it hard codes the GCs supported by JVMCI, which is not something JVMCI itself can accurately answer. What's more, the upcall into JVMCI Java code to get the right answer cannot be made here as it is too early in the VM boot process. There's not even a `JavaThread` available yet. This code should be removed, letting Graal exit the VM with an error should it be used in conjunction with a GC it does not support. For example: Error occurred during initialization of VM JVMCI Compiler does not support selected GC: epsilon gc Moving the check to later in the VM bootstrap doesn't really work either as it imposes the overhead of initializing JVMCI and Graal eagerly. This means a conflict between GC and JVMCI compiler cannot be finessed by overriding the compiler selection to use a non-JVMCI compiler. It's a fatal VM error, just like selecting multiple GCs on the command line is. If you agree with that, then I think the proposed PR is the right approach. We could revert to the version that leaves the JVMCI-specific logic in `WB_IsGCSupported` but as I stated, that means `GCConfig::is_gc_supported` can give a wrong answer.
------------- PR: https://git.openjdk.java.net/jdk/pull/1423 From eosterlund at openjdk.java.net Wed Nov 25 15:09:59 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Wed, 25 Nov 2020 15:09:59 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo In-Reply-To: References: Message-ID: <36Kj4FuE21c-OU5VhIWXbBvHqRTh5T3TrQSs7J1LJqw=.ea8ac32f-6646-4125-b11c-d3263134faf0@github.com> On Wed, 25 Nov 2020 10:01:15 GMT, Vladimir Ivanov wrote: >> The information that there was a barrier attached, is implicit in the ins_encode block due to it being run at all. In other words, since we matched the mach node to our ZGC access instead of a normal access, we already know that there was barrier data attached, and that we no longer have such barrier data. > > Ok, makes sense. What do you think about making `ZLoadBarrierElided = 0` then? I'm okay with that. I don't have a strong preference. ------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From stuefe at openjdk.java.net Wed Nov 25 15:14:05 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 25 Nov 2020 15:14:05 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size In-Reply-To: <3ZpyM1gAZ0HeDdWd4_Ps98de3IKXvCowyDeFCaso97U=.9694feb3-82d5-4da6-bda6-5431e3dd94d1@github.com> References: <0t69lhO_zHj_sg0_doBgHyv5Fpu6Y2OuZsza1NsyE6A=.2a972bf7-5e12-4ce3-9ca2-b2add68d07c5@github.com> <3ZpyM1gAZ0HeDdWd4_Ps98de3IKXvCowyDeFCaso97U=.9694feb3-82d5-4da6-bda6-5431e3dd94d1@github.com> Message-ID: On Wed, 25 Nov 2020 14:40:52 GMT, Stefan Johansson wrote: > So Thomas in what cases do we need the `ReservedSpace` to be able to handle multiple page-sizes? A simpler approach would be to aim for only having one page size per `ReservedSpace`. For example, in G1 it would be problematic or at least inefficient to have multiple page sizes per mapping.
Happens if you reserve a space with a size not aligned to the underlying large page size. See reserve_memory_special_huge_tlbfs_mixed() on Linux. In that case, we try to be smart and fill in the ends with small pages rather than failing. ------------- PR: https://git.openjdk.java.net/jdk/pull/1161 From stuefe at openjdk.java.net Wed Nov 25 15:14:06 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 25 Nov 2020 15:14:06 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size In-Reply-To: References: <0t69lhO_zHj_sg0_doBgHyv5Fpu6Y2OuZsza1NsyE6A=.2a972bf7-5e12-4ce3-9ca2-b2add68d07c5@github.com> <3ZpyM1gAZ0HeDdWd4_Ps98de3IKXvCowyDeFCaso97U=.9694feb3-82d5-4da6-bda6-5431e3dd94d1@github.com> Message-ID: On Wed, 25 Nov 2020 15:09:29 GMT, Thomas Stuefe wrote: >> So Thomas in what cases do we need the `ReservedSpace` to be able to handle multiple page-sizes? A simpler approach would be to aim for only having one page size per `ReservedSpace`. For example, in G1 it would be problematic or at least inefficient to have multiple page sizes per mapping. > >> So Thomas in what cases do we need the `ReservedSpace` to be able to handle multiple page-sizes? A simpler approach would be to aim for only having one page size per `ReservedSpace`. For example, in G1 it would be problematic or at least inefficient to have multiple page sizes per mapping. > > Happens if you reserve a space with a size not aligned to the underlying large page size. See reserve_memory_special_huge_tlbfs_mixed() on Linux. In that case, we try to be smart and fill in the ends with small pages rather than failing. Of which the Intel approach with the 2M pages could be seen as an expansion (e.g. if you have a heap of 1.5G, allocate 1G huge page, and fill in the ends with 2M pages, and if necessary with 4K pages). 
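The 1.5G example can be worked through with a small C++ sketch. This is only an illustration of the arithmetic, not what reserve_memory_special_huge_tlbfs_mixed() does: the names are hypothetical, the start address is assumed to be aligned to the largest page size, and the sketch ignores where in the range each run of pages would land.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PageRun {
  std::size_t page_size;  // size of each page in this run, in bytes
  std::size_t count;      // number of pages of that size
};

// Hypothetical illustration only: greedily covers `bytes` with the page sizes
// in `sizes_desc` (sorted largest first). Assumes the start address is aligned
// to the largest page size and `bytes` is a multiple of the smallest one.
inline std::vector<PageRun> stitch_pages(std::size_t bytes,
                                         const std::vector<std::size_t>& sizes_desc) {
  std::vector<PageRun> runs;
  std::size_t remaining = bytes;
  for (std::size_t page_size : sizes_desc) {
    assert(page_size != 0);
    std::size_t count = remaining / page_size;
    if (count > 0) {
      runs.push_back(PageRun{page_size, count});
      remaining -= count * page_size;
    }
  }
  assert(remaining == 0 && "size must be a multiple of the smallest page size");
  return runs;
}
```

For a 1.5G reservation with 1G, 2M and 4K pages available, this yields one 1G page plus 256 pages of 2M, and no 4K pages are needed.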
------------- PR: https://git.openjdk.java.net/jdk/pull/1161 From eosterlund at openjdk.java.net Wed Nov 25 15:16:02 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Wed, 25 Nov 2020 15:16:02 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo In-Reply-To: References: Message-ID: <2cJwK-GzCQ5fcT333q5GZPSlz5t7ghs4Bg5YGzCOwjA=.a577266a-6730-4cde-9b46-5953e800ef9e@github.com> On Wed, 25 Nov 2020 08:43:04 GMT, Vladimir Ivanov wrote: >> JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2. >> >> Initial patch was prepared by @fisk. >> >> Tested hs-tier1-4. Added new compiler tests to test intrinsics. >> >> Ran new test with Shenandoah. Found only one issue. As a result I disabled the PhantomReference::refersTo intrinsic for the COOP + Shenandoah combination. Someone from the Shenandoah team will have to test the changes to see whether that is enough. > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 623: > >> 621: // Also we need to add memory barrier to prevent commoning reads >> 622: // from this field across safepoint since GC can change its value. >> 623: bool need_read_barrier = (((on_weak || on_phantom) && !no_keepalive) || > > There's a slight change: `in_heap && (on_weak || ...)` turns into `(on_weak ...) || (in_heap ...)`. It will introduce a read barrier for `!in_heap && on_weak` case. Does it occur in practice? > > Another one: `on_weak` turns into ((on_weak ...) && !no_keepalive). > My interpretation is that no read barrier is needed when the `NO_KEEPALIVE` flag is used, and that currently a redundant barrier is issued. > > Maybe replace `!no_keepalive` with just `keep_alive`? The former is harder to parse. > > The check grows bigger and bigger. Maybe it's time to split it? > > Turn `on_weak || on_phantom` into `!is_strong`? I don't think we have any !in_heap && on_weak loads today.
But if we did, they would indeed need read barriers. We need a read barrier if the reference isn't provably strong, unless it's an AS_NO_KEEPALIVE access. That also reflects why the variable is called no_keepalive instead of keepalive; it is to reflect the shared decorator name used all over the place. I don't mind inverting it though, but I personally found it easier to read when the names match our decorators. ------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From sjohanss at openjdk.java.net Wed Nov 25 15:25:00 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 25 Nov 2020 15:25:00 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size In-Reply-To: References: <0t69lhO_zHj_sg0_doBgHyv5Fpu6Y2OuZsza1NsyE6A=.2a972bf7-5e12-4ce3-9ca2-b2add68d07c5@github.com> <3ZpyM1gAZ0HeDdWd4_Ps98de3IKXvCowyDeFCaso97U=.9694feb3-82d5-4da6-bda6-5431e3dd94d1@github.com> Message-ID: <_imjzmFgjMEABYlW2LnfuyJNqyil1kBrX_AFmq_LBlc=.7b70d4bd-6257-4436-81ee-19b40953ed10@github.com> On Wed, 25 Nov 2020 15:10:56 GMT, Thomas Stuefe wrote: >>> So Thomas in what cases do we need the `ReservedSpace` to be able to handle multiple page-sizes? A simpler approach would be to aim for only having one page size per `ReservedSpace`. For example, in G1 it would be problematic or at least inefficient to have multiple page sizes per mapping. >> >> Happens if you reserve a space with a size not aligned to the underlying large page size. See reserve_memory_special_huge_tlbfs_mixed() on Linux. In that case, we try to be smart and fill in the ends with small pages rather than failing. > > Of which the Intel approach with the 2M pages could be seen as an expansion (e.g. if you have a heap of 1.5G, allocate 1G huge page, and fill in the ends with 2M pages, and if necessary with 4K pages). Sorry, I was a bit unclear. I know about this case, but I was more thinking: do we need to support it?
Could we just always align correctly? ------------- PR: https://git.openjdk.java.net/jdk/pull/1161 From sjohanss at openjdk.java.net Wed Nov 25 15:25:01 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 25 Nov 2020 15:25:01 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size In-Reply-To: <_imjzmFgjMEABYlW2LnfuyJNqyil1kBrX_AFmq_LBlc=.7b70d4bd-6257-4436-81ee-19b40953ed10@github.com> References: <0t69lhO_zHj_sg0_doBgHyv5Fpu6Y2OuZsza1NsyE6A=.2a972bf7-5e12-4ce3-9ca2-b2add68d07c5@github.com> <3ZpyM1gAZ0HeDdWd4_Ps98de3IKXvCowyDeFCaso97U=.9694feb3-82d5-4da6-bda6-5431e3dd94d1@github.com> <_imjzmFgjMEABYlW2LnfuyJNqyil1kBrX_AFmq_LBlc=.7b70d4bd-6257-4436-81ee-19b40953ed10@github.com> Message-ID: On Wed, 25 Nov 2020 15:20:22 GMT, Stefan Johansson wrote: >> Of which the Intel approach with the 2M pages could be seen as an expansion (e.g. if you have a heap of 1.5G, allocate 1G huge page, and fill in the ends with 2M pages, and if necessary with 4K pages). > > Sorry, I was a bit unclear. I know about this case, but I was more thinking: do we need to support it? Could we just always align correctly? The model would be so much nicer if a `ReservedSpace` always just had one page-size.
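For a sense of what "always align correctly" would cost, here is a minimal C++ sketch of the arithmetic. The helpers are standalone illustrations written for this example; they are not HotSpot's own alignment utilities, and `alignment_padding` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `size` up to the next multiple of `alignment` (a power of two).
inline std::size_t align_up(std::size_t size, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}

// Bytes added by rounding a reservation up to a single page size.
inline std::size_t alignment_padding(std::size_t size, std::size_t page_size) {
  return align_up(size, page_size) - size;
}
```

With the 1.5G heap from the example above, aligning to 1G pages rounds the reservation up to 2G (512M of padding), while aligning to 2M pages adds no padding at all, so the single-page-size model is cheap only when the chosen page size divides the requested size.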
------------- PR: https://git.openjdk.java.net/jdk/pull/1161 From stuefe at openjdk.java.net Wed Nov 25 15:55:03 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 25 Nov 2020 15:55:03 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size In-Reply-To: References: <0t69lhO_zHj_sg0_doBgHyv5Fpu6Y2OuZsza1NsyE6A=.2a972bf7-5e12-4ce3-9ca2-b2add68d07c5@github.com> <3ZpyM1gAZ0HeDdWd4_Ps98de3IKXvCowyDeFCaso97U=.9694feb3-82d5-4da6-bda6-5431e3dd94d1@github.com> <_imjzmFgjMEABYlW2LnfuyJNqyil1kBrX_AFmq_LBlc=.7b70d4bd-6257-4436-81ee-19b40953ed10@github.com> Message-ID: On Wed, 25 Nov 2020 15:22:27 GMT, Stefan Johansson wrote: >> Sorry, I was a bit unclear. I know about this case, but I was more thinking: do we need to support it? Could we just always align correctly? > > The model would be so much nicer if a `ReservedSpace` always just had one page-size. Yes, but I think it makes sense to allow the os layer to mix page sizes for large paged areas, for performance reasons. The fact that this coding exists, and that Intel wants to further complicate it and add 2M pages, means we have a need here. Trying to avoid this just means that people add patches sideways to satisfy specific needs, which hurts maintainability. Also, I don't like to discourage first-time contributors with lots of concerns, therefore I'd like a cleaner, more flexible os layer. But no-one forces you to accept multi-page-sized areas. If you really want just one page size, you can query the largest page size available beforehand and align the reservation size accordingly, and with my proposed change you could now assert the result and log it correctly. But if one just generally wants large pages without caring for the precise layout, one could let os::reserve_memory_special() do its best, and would now get the information about what reserve_memory_special() did.
For example, were I to re-introduce large pages for Metaspace, I would like to have the luxury of just calling os::reserve_memory_special() without having to think about specific geometry - if the space is large enough for large pages, it should stitch the area together as best as it can. ------------- PR: https://git.openjdk.java.net/jdk/pull/1161 From shade at openjdk.java.net Wed Nov 25 15:55:28 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 25 Nov 2020 15:55:28 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes [v2] In-Reply-To: References: Message-ID: > JMH uses the [`Blackhole::consume`](https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java#l153) methods to avoid dead-code elimination of the code that produces benchmark values. It now relies on producing opaque side-effects and breaking inlining. While it was proved useful for many years, it unfortunately comes with several major drawbacks: > > 1. Call costs dominate nanobenchmarks. On TR 3970X, the call cost is several nanoseconds. > 2. The work spent in Blackhole.consume dominates nanobenchmarks too. It takes about a nanosecond on TR 3970X. > 3. Argument preparation for call makes different argument types behave differently. This is prominent on architectures where calling conventions for passing e.g. floating-point arguments require elaborate dance. > > Supporting this directly in compilers would improve nanobenchmark fidelity. > > Instead of introducing public APIs or special-casing JMH methods in JVM, we can hook a new command to compiler control, and let JMH sign up its Blackhole methods for it with `-XX:CompileCommand=blackhole,org.openjdk.jmh.infra.Blackhole::consume`. This is being prototyped as [CODETOOLS-7902762](https://bugs.openjdk.java.net/browse/CODETOOLS-7902762). It makes Blackholes behave [substantially better](http://cr.openjdk.java.net/~shade/8252505/bh-old-vs-new.png). 
> > Current prototype is for initial approach review and early testing. I am open for suggestions how to make it simpler; not that I haven't tried, but it is likely there is something I am overlooking here. > > C1 code is platform-independent, and it adds the new node which is then lowered to nothing. > > C2 code is more complicated. I tried to introduce new node and hook arguments there, but failed. There seems to be no way to model the effects we are after: consume the value, but have no observable side effects. Roland suggested we instead put the boolean flag onto `CallJavaNode`, and then match it to nothing in `.ad`. This drags the blackhole through C2 as if it has call-like side effects, and then emits nothing. On the downside, it requires fiddling with arch-specific code in every .ad. Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Redo C2 blackholes as CallBlackholeJavaNode ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1203/files - new: https://git.openjdk.java.net/jdk/pull/1203/files/0cced3d7..a5caa4c0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1203&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1203&range=00-01 Stats: 252 lines in 22 files changed: 129 ins; 96 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/1203.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1203/head:pull/1203 PR: https://git.openjdk.java.net/jdk/pull/1203 From shade at openjdk.java.net Wed Nov 25 16:07:24 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 25 Nov 2020 16:07:24 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes [v3] In-Reply-To: References: Message-ID: > JMH uses the [`Blackhole::consume`](https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java#l153) methods to avoid dead-code elimination of the code that produces benchmark values. 
It now relies on producing opaque side-effects and breaking inlining. While it was proved useful for many years, it unfortunately comes with several major drawbacks: > > 1. Call costs dominate nanobenchmarks. On TR 3970X, the call cost is several nanoseconds. > 2. The work spent in Blackhole.consume dominates nanobenchmarks too. It takes about a nanosecond on TR 3970X. > 3. Argument preparation for call makes different argument types behave differently. This is prominent on architectures where calling conventions for passing e.g. floating-point arguments require elaborate dance. > > Supporting this directly in compilers would improve nanobenchmark fidelity. > > Instead of introducing public APIs or special-casing JMH methods in JVM, we can hook a new command to compiler control, and let JMH sign up its Blackhole methods for it with `-XX:CompileCommand=blackhole,org.openjdk.jmh.infra.Blackhole::consume`. This is being prototyped as [CODETOOLS-7902762](https://bugs.openjdk.java.net/browse/CODETOOLS-7902762). It makes Blackholes behave [substantially better](http://cr.openjdk.java.net/~shade/8252505/bh-old-vs-new.png). > > Current prototype is for initial approach review and early testing. I am open for suggestions how to make it simpler; not that I haven't tried, but it is likely there is something I am overlooking here. > > C1 code is platform-independent, and it adds the new node which is then lowered to nothing. > > C2 code is more complicated. There were four attempts to implement this, and what you see in the PR is the fourth attempt. > > [First attempt](http://cr.openjdk.java.net/~shade/8252505/webrev.01/) was to introduce fake store like `StoreV` ("store void"), and then lower them to nothing. It runs into a series of funky problems: you would like to have at least two shapes of the store to match the store type width not to confuse the optimizer, or even have the whole mirror of `Store*` hierarchy. Additionally, optimizer tweaks were needed. 
The awkward problem of GC barrier verification showed up: if `StoreV*` is a subclass of `Store*`, then verificators rightfully expect GC barriers before them. If we emit GC, then we need to handle walking over `StoreV*` nodes in optimizers. > > [Second attempt](http://cr.openjdk.java.net/~shade/8252505/webrev.04/) was to introduce the special `Blackhole` node that consumes the values. Alas, the many attempts to make sure the new node is not DCE'd failed. Roland looked at it, and suggested that there seems to be no way to model the effects we are after: consume the value, but have no observable side effects. So, suggested we instead put the boolean flag onto `CallJavaNode`, and then match it to nothing in `.ad`. > > ...which is the essence of the third attempt. Drag the blackhole through C2 as if it has call-like side effects, and then emit nothing. Instead of boolean flag, the subsequent iteration introduced a full new `CallBlackholeJava` node, that is a call as far as optimizers are concerned, and then it is matched to nothing in `.ad`. This seems to require the least fiddling with C2 internals. 
Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Revert old changes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1203/files - new: https://git.openjdk.java.net/jdk/pull/1203/files/a5caa4c0..260b1ae5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1203&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1203&range=01-02 Stats: 10 lines in 2 files changed: 0 ins; 10 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1203.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1203/head:pull/1203 PR: https://git.openjdk.java.net/jdk/pull/1203 From shade at openjdk.java.net Wed Nov 25 16:15:59 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 25 Nov 2020 16:15:59 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 16:12:13 GMT, Aleksey Shipilev wrote: >> Even if there are no instructions issued, some of the unfortunate effects of a call may be still there (e.g., spills around the place where the node is scheduled, memory state is effectively killed). Fixing that would involve overriding calling conventions, in_RegMask()/out_RegMask(), customize memory effects. >> >> Can you elaborate on your experiment with introducing custom node you mentioned? >> Have you tried introducing new control node and just wire data nodes to it? > >> Even if there are no instructions issued, some of the unfortunate effects of a call may be still there (e.g., spills around the place where the node is scheduled, memory state is effectively killed). Fixing that would involve overriding calling conventions, in_RegMask()/out_RegMask(), customize memory effects. > > Right, that is what `Ideal` and `RegMask` handling in new `(Mach)CallBlackholeJava` node does. On the upside, it IMO makes Blackhole semantics close to what I want in JMH: it is like a call, but without the actual call. 
So obvious code generation quirks handled already, I think other effects are good to have. > _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_ > But Aleksey, there is an alternative: a store that doesn't do anything. > Did you consider that instead? I guess the problem is that there'd be a lot > more nodes. I did try that. It was the very first attempt at doing it in C2, but it is harder than it looks. I updated the PR description with some history of attempts. ------------- PR: https://git.openjdk.java.net/jdk/pull/1203 From shade at openjdk.java.net Wed Nov 25 16:15:58 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 25 Nov 2020 16:15:58 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 12:42:57 GMT, Vladimir Ivanov wrote: > Can you elaborate on your experiment with introducing custom node you mentioned? > Have you tried introducing new control node and just wire data nodes to it? See the updated PR description. Yes, I tried to introduce a new node and just wire the data nodes in it, but then I failed (miserably) to make sure the node is not considered dead by subsequent optimizations. Roland looked at it too, and did not think we can manage it. So we decided instead to piggyback on calls. New version hopefully makes it much cleaner: it is now `CallBlackholeJava` node. We can try and unhook it from `CallJava` hierarchy, and try to manage its effects more explicitly, but my prior experience tells me it is not as simple as it looks at the beginning. > Even if there are no instructions issued, some of the unfortunate effects of a call may be still there (e.g., spills around the place where the node is scheduled, memory state is effectively killed). Fixing that would involve overriding calling conventions, in_RegMask()/out_RegMask(), customize memory effects. 
Right, that is what `Ideal` and `RegMask` handling in new `(Mach)CallBlackholeJava` node does. On the upside, it IMO makes Blackhole semantics close to what I want in JMH: it is like a call, but without the actual call. So obvious code generation quirks handled already, I think other effects are good to have. ------------- PR: https://git.openjdk.java.net/jdk/pull/1203 From shade at openjdk.java.net Wed Nov 25 16:37:06 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 25 Nov 2020 16:37:06 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes [v4] In-Reply-To: References: Message-ID: > JMH uses the [`Blackhole::consume`](https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java#l153) methods to avoid dead-code elimination of the code that produces benchmark values. It now relies on producing opaque side-effects and breaking inlining. While it was proved useful for many years, it unfortunately comes with several major drawbacks: > > 1. Call costs dominate nanobenchmarks. On TR 3970X, the call cost is several nanoseconds. > 2. The work spent in Blackhole.consume dominates nanobenchmarks too. It takes about a nanosecond on TR 3970X. > 3. Argument preparation for call makes different argument types behave differently. This is prominent on architectures where calling conventions for passing e.g. floating-point arguments require elaborate dance. > > Supporting this directly in compilers would improve nanobenchmark fidelity. > > Instead of introducing public APIs or special-casing JMH methods in JVM, we can hook a new command to compiler control, and let JMH sign up its Blackhole methods for it with `-XX:CompileCommand=blackhole,org.openjdk.jmh.infra.Blackhole::consume`. This is being prototyped as [CODETOOLS-7902762](https://bugs.openjdk.java.net/browse/CODETOOLS-7902762). It makes Blackholes behave [substantially better](http://cr.openjdk.java.net/~shade/8252505/bh-old-vs-new.png). 
> > Current prototype is for initial approach review and early testing. I am open for suggestions how to make it simpler; not that I haven't tried, but it is likely there is something I am overlooking here. > > C1 code is platform-independent, and it adds the new node which is then lowered to nothing. > > C2 code is more complicated. There were four attempts to implement this, and what you see in the PR is the fourth attempt. > > [First attempt](http://cr.openjdk.java.net/~shade/8252505/webrev.01/) was to introduce fake store like `StoreV` ("store void"), and then lower them to nothing. It runs into a series of funky problems: you would like to have at least two shapes of the store to match the store type width not to confuse the optimizer, or even have the whole mirror of `Store*` hierarchy. Additionally, optimizer tweaks were needed. The awkward problem of GC barrier verification showed up: if `StoreV*` is a subclass of `Store*`, then verificators rightfully expect GC barriers before them. If we emit GC, then we need to handle walking over `StoreV*` nodes in optimizers. > > [Second attempt](http://cr.openjdk.java.net/~shade/8252505/webrev.04/) was to introduce the special `Blackhole` node that consumes the values -- basically like C1 implementation does it. Alas, the many attempts to make sure the new node is not DCE'd failed. Roland looked at it, and suggested that there seems to be no way to model the effects we are after: consume the value, but have no observable side effects. So, suggested we instead put the boolean flag onto `CallJavaNode`, and then match it to nothing in `.ad`. > > ...which is the essence of the third attempt. Drag the blackhole through C2 as if it has call-like side effects, and then emit nothing. Instead of boolean flag, the subsequent iteration introduced a full new `CallBlackholeJava` node, that is a call as far as optimizers are concerned, and then it is matched to nothing in `.ad`. 
This seems to require the least fiddling with C2 internals. Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Fixes after the merge - Merge branch 'master' into JDK-8252505-blackholes - More touchups: dead code, resolve TODOs - Revert old changes - Redo C2 blackholes as CallBlackholeJavaNode - 8252505: C1/C2 compiler support for blackholes ------------- Changes: https://git.openjdk.java.net/jdk/pull/1203/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1203&range=03 Stats: 1452 lines in 42 files changed: 1441 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/1203.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1203/head:pull/1203 PR: https://git.openjdk.java.net/jdk/pull/1203 From tschatzl at openjdk.java.net Wed Nov 25 17:04:02 2020 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 25 Nov 2020 17:04:02 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size In-Reply-To: References: <0t69lhO_zHj_sg0_doBgHyv5Fpu6Y2OuZsza1NsyE6A=.2a972bf7-5e12-4ce3-9ca2-b2add68d07c5@github.com> <3ZpyM1gAZ0HeDdWd4_Ps98de3IKXvCowyDeFCaso97U=.9694feb3-82d5-4da6-bda6-5431e3dd94d1@github.com> <_imjzmFgjMEABYlW2LnfuyJNqyil1kBrX_A Fmq_LBlc=.7b70d4bd-6257-4436-81ee-19b40953ed10@github.com> Message-ID: On Wed, 25 Nov 2020 15:52:24 GMT, Thomas Stuefe wrote: >> The model would be so much nicer if a `ReservedSpace` always just had one page-size. > > Yes, but I think it makes sense to allow the os layer to mix page sizes for large paged areas, for performance reasons. The fact that this coding exists, and that Intel wants to further complicate it and add 2M pages, means we have a need here. > > Trying to avoid this just means that people add patches sideways to satisfy specific needs, which hurts maintainability. 
Also, I don't like to discourage first time contributors with lots of concerns, therefore I'd like a cleaner, more flexible os layer. > > But no-one forces you to accept multi-page-sized-areas. If you really want just one page size, you can query the largest page size available beforehand and align the reservation size accordingly, and with my proposed change you could now assert the result and log it correctly. > > But if one just generally wants large pages without caring for the precise layout, one could let os::reserve_memory_special() do its best, and would now get the information about what reserve_memory_special() did. > > For example, were I to re-introduce large pages for Metaspace, I would like to have the luxury of just calling os::reserve_memory_special() without having to think about specific geometry - if the space is large enough for large pages, it should stitch the area together as best as it can. Hi, On 25.11.20 16:52, Thomas Stuefe wrote: > Yes, but I think it makes sense to allow the os layer to mix page sizes > for large paged areas, for performance reasons. The fact that this > coding exists, That code may be an artifact of 32 bit machines from 10+ years ago where address space fragmentation has been a real concern (typically Windows). Or RAM sizes were measured in hundreds of MB instead of GBs where (on Linux) pre-reserving hundreds of MB for huge pages is/has been an issue. > and that Intel wants to further complicate it and add 2M > pages, means we have a need here. The JDK-8256155 (Intel) CR does not state that requirement. Imho it only says that the author wants to use (any and all) configured pages for different reserved spaces. E.g. the machine has (on x64, to simplify the case) configured: 10 1G pages 1000 2M pages so the heap should use the 1G pages (assuming it's less than 10G), other reservations like code heap should first use the 50 2M pages before falling back to other page sizes to better use available TLB cache entries. 
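The page-pool example above can be sketched as a small selection routine. This is only an illustration under stated assumptions: `PagePool`, `round_up` and `pick_page_size` are hypothetical names, not HotSpot APIs, and the model ignores pool depletion by earlier reservations.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint64_t K = 1024, M = K * K, G = M * K;

// Hypothetical description of one configured large-page pool.
struct PagePool {
  std::uint64_t page_size;   // bytes per page
  std::uint64_t free_pages;  // pages configured and still available
};

// Rounds bytes up to the next multiple of unit (unit must be non-zero).
inline std::uint64_t round_up(std::uint64_t bytes, std::uint64_t unit) {
  assert(unit != 0);
  return (bytes + unit - 1) / unit * unit;
}

// Returns the largest page size that (a) is not larger than the reservation
// and (b) has enough free pages to back the whole, aligned reservation.
// Returns 0 when no pool qualifies, i.e. the caller falls back to small pages.
inline std::uint64_t pick_page_size(const std::vector<PagePool>& pools,
                                    std::uint64_t bytes) {
  std::uint64_t best = 0;
  for (const PagePool& p : pools) {
    if (p.page_size == 0 || p.page_size > bytes) continue;
    const std::uint64_t needed = round_up(bytes, p.page_size) / p.page_size;
    if (needed <= p.free_pages && p.page_size > best) best = p.page_size;
  }
  return best;
}
```

With pools of ten 1G pages and one thousand 2M pages, an 8G heap would select 1G pages and a 240M code heap would select 2M pages in this model, so each reservation ends up with exactly one page size.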
I would prefer if we do not overcomplicate the requirements :) Also probably this should be asked and followed up in the correct review thread. > Trying to avoid this just means that people add patches sideways to > satisfy specific needs, which hurts maintainability. Also, I don't like > to discourage first time contributors with lots of concerns, therefore > I'd like a cleaner, more flexible os layer. I'd like a simpler, maybe less flexible but understandable by mere mortals OS layer :) lower layer that does not make upper layers too complicated. > But no-one forces you to accept multi-page-sized-areas. If you really > want just one page size, you can query the largest page size available > beforehand and align the reservation size accordingly, and with my Which is no issue with 64 bit machines at all, but probably has been with the prevalence of 32 bit address spaces. > proposed change you could now assert the result and log it correctly. > > But if one just generally wants large pages without caring for the > precise layout, one could let os::reserve_memory_special() do its best, > and would now get the information about what reserve_memory_special() did. This is a kind of one-sided argument, taking only commit into account. Since actually giving back memory is expected nowadays, taking care of different random page sizes is complicated. E.g. when implementing G1's region memory management (in 8u40) the decision to only support a single page size for every one of its GC data structures has been a conscious one - because the complexity overhead did not justify the gains. Nobody complained yet. > For example, were I to re-introduce large pages for Metaspace, I would > like to have the luxury of just calling os::reserve_memory_special() > without having to think about specific geometry - if the space is large > enough for large pages, it should stitch the area together as best as it > can. 
That's true, but that Metaspace layer then needs to be aware of multiple page sizes when uncommitting, and (presumably) tracking liveness on the lowest granularity anyway. Which does not make the code easier. Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/1161 From shade at openjdk.java.net Wed Nov 25 17:04:13 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 25 Nov 2020 17:04:13 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes [v5] In-Reply-To: References: Message-ID: <7Lr3OOMmw27eesiH8xA9EdoTKTs1Yp4bznoC1ybubnY=.3f66c70a-a514-4c71-b901-b40479cb23a0@github.com> > JMH uses the [`Blackhole::consume`](https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java#l153) methods to avoid dead-code elimination of the code that produces benchmark values. It now relies on producing opaque side-effects and breaking inlining. While it was proved useful for many years, it unfortunately comes with several major drawbacks: > > 1. Call costs dominate nanobenchmarks. On TR 3970X, the call cost is several nanoseconds. > 2. The work spent in Blackhole.consume dominates nanobenchmarks too. It takes about a nanosecond on TR 3970X. > 3. Argument preparation for call makes different argument types behave differently. This is prominent on architectures where calling conventions for passing e.g. floating-point arguments require elaborate dance. > > Supporting this directly in compilers would improve nanobenchmark fidelity. > > Instead of introducing public APIs or special-casing JMH methods in JVM, we can hook a new command to compiler control, and let JMH sign up its Blackhole methods for it with `-XX:CompileCommand=blackhole,org.openjdk.jmh.infra.Blackhole::consume`. This is being prototyped as [CODETOOLS-7902762](https://bugs.openjdk.java.net/browse/CODETOOLS-7902762). It makes Blackholes behave [substantially better](http://cr.openjdk.java.net/~shade/8252505/bh-old-vs-new.png). 
> > Current prototype is for initial approach review and early testing. I am open for suggestions how to make it simpler; not that I haven't tried, but it is likely there is something I am overlooking here. > > C1 code is platform-independent, and it adds the new node which is then lowered to nothing. > > C2 code is more complicated. There were four attempts to implement this, and what you see in the PR is the fourth attempt. > > [First attempt](http://cr.openjdk.java.net/~shade/8252505/webrev.01/) was to introduce fake store like `StoreV` ("store void"), and then lower them to nothing. It runs into a series of funky problems: you would like to have at least two shapes of the store to match the store type width not to confuse the optimizer, or even have the whole mirror of `Store*` hierarchy. Additionally, optimizer tweaks were needed. The awkward problem of GC barrier verification showed up: if `StoreV*` is a subclass of `Store*`, then verificators rightfully expect GC barriers before them. If we emit GC, then we need to handle walking over `StoreV*` nodes in optimizers. > > [Second attempt](http://cr.openjdk.java.net/~shade/8252505/webrev.04/) was to introduce the special `Blackhole` node that consumes the values -- basically like C1 implementation does it. Alas, the many attempts to make sure the new node is not DCE'd failed. Roland looked at it, and suggested that there seems to be no way to model the effects we are after: consume the value, but have no observable side effects. So, suggested we instead put the boolean flag onto `CallJavaNode`, and then match it to nothing in `.ad`. > > ...which is the essence of the third attempt. Drag the blackhole through C2 as if it has call-like side effects, and then emit nothing. Instead of boolean flag, the subsequent iteration introduced a full new `CallBlackholeJava` node, that is a call as far as optimizers are concerned, and then it is matched to nothing in `.ad`. 
This seems to require the least fiddling with C2 internals. Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Fix AArch64 build and test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1203/files - new: https://git.openjdk.java.net/jdk/pull/1203/files/e035d744..71fbcb9d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1203&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1203&range=03-04 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1203.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1203/head:pull/1203 PR: https://git.openjdk.java.net/jdk/pull/1203 From coleenp at openjdk.java.net Wed Nov 25 17:23:10 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 25 Nov 2020 17:23:10 GMT Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random [v2] In-Reply-To: References: Message-ID: <_SKxbxqodbI4aWGdFGVdks6_VqeADnMgjMfi_oEK-Bc=.2cd493f4-1467-48dc-a1f2-2bb674649f2b@github.com> > The function os::init_random() and os::random() both set the _rand_seed. This test thinks nothing can change the seed while it is computing its expected value. > I changed the test to run in a VM operation safepoint. Alternately, I can change the test to not verify the random value computed in the loop, or remove the test. > Ran tier1 tests on linux-x64 and windows-x64. Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: - Restore old copyright - Refix os.random test using Thomas Stuefe's better suggestion. 
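One way to make such a test independent of a shared seed is to compute the expected value with a step function that takes the seed as a parameter. The sketch below assumes a Park-Miller "minimal standard" step (multiplier 16807, modulus 2^31 - 1); the names `next_random` and `advance` are hypothetical, and nothing in this thread confirms that this is the exact shape of the change in the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pure step function. The seed is an argument, so no other
// thread can modify it while the expected value is being computed.
inline std::uint32_t next_random(std::uint32_t seed) {
  const std::uint64_t a = 16807;        // multiplier
  const std::uint64_t m = 2147483647;   // modulus, 2^31 - 1
  return static_cast<std::uint32_t>((a * seed) % m);
}

// Applies the step reps times starting from seed, touching only locals.
inline std::uint32_t advance(std::uint32_t seed, int reps) {
  assert(reps >= 0);
  for (int i = 0; i < reps; i++) {
    seed = next_random(seed);
  }
  return seed;
}
```

Starting from seed 1, 10000 steps of this generator give 1043618065, the check value published with the Park-Miller generator, so a test can compare `advance(1, 10000)` against that constant without reading or writing any global state.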
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1422/files - new: https://git.openjdk.java.net/jdk/pull/1422/files/7961b368..3292c2f8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1422&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1422&range=00-01 Stats: 58 lines in 4 files changed: 8 ins; 25 del; 25 mod Patch: https://git.openjdk.java.net/jdk/pull/1422.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1422/head:pull/1422 PR: https://git.openjdk.java.net/jdk/pull/1422 From kvn at openjdk.java.net Wed Nov 25 17:43:59 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 17:43:59 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo In-Reply-To: <36Kj4FuE21c-OU5VhIWXbBvHqRTh5T3TrQSs7J1LJqw=.ea8ac32f-6646-4125-b11c-d3263134faf0@github.com> References: <36Kj4FuE21c-OU5VhIWXbBvHqRTh5T3TrQSs7J1LJqw=.ea8ac32f-6646-4125-b11c-d3263134faf0@github.com> Message-ID: On Wed, 25 Nov 2020 15:07:21 GMT, Erik Österlund wrote: >> Ok, makes sense. What do you think about making `ZLoadBarrierElided = 0` then? > > I'm okay with that. I don't have a strong preference. I also prefer to have ZLoadBarrierElided = 0. I will add it. ------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From kvn at openjdk.java.net Wed Nov 25 17:43:58 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 17:43:58 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 08:18:23 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/opto/c2compiler.cpp line 476: >> >>> 474: if (UseCompressedOops && UseShenandoahGC) return false; >>> 475: #endif >>> 476: break; >> >> Is this intended to disable the intrinsic on all non-64-bit platforms? Is that only for Shenandoah 64-bit?
I wonder if it should just be: >> >> case vmIntrinsics::_PhantomReference_refersTo0: >> if (UseCompressedOops && UseShenandoahGC) return false; >> break; > > Considering `UseCompressedOops` doesn't make much sense in 32-bit mode and is set to `false`, it seems `#ifdef` can be just dropped. You are right. I thought flag UseCompressedOops is defined only in 64-bit VM. @shipilev, #ifdef was placed incorrectly - it should be after `case:`. But as you both pointed, it is not needed. I will remove it. ------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From kvn at openjdk.java.net Wed Nov 25 17:44:01 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 17:44:01 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 11:05:39 GMT, Per Liden wrote: >> JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2. >> >> Initial patch was prepared by @fisk. >> >> Tested hs-tier1-4. Added new compiler tests to test intrinsics. >> >> Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough. > > src/hotspot/share/opto/library_call.cpp line 5525: > >> 5523: Node* LibraryCallKit::load_field_from_object(Node* fromObj, const char* fieldName, const char* fieldTypeString, >> 5524: DecoratorSet decorators = IN_HEAP, bool is_exact = false, bool is_static = false, >> 5525: ciInstanceKlass* fromKls = NULL) { > > It looks like the `is_exact` argument here can be removed, as all call-sites use the default value, which is `false`, and the only use of it in the function is this assert, which will never fail. > assert(!is_exact || tinst->klass_is_exact(), "klass not exact"); Good suggestion. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From dnsimon at openjdk.java.net Wed Nov 25 17:45:14 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 25 Nov 2020 17:45:14 GMT Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports [v2] In-Reply-To: References: Message-ID: > A number of jtreg tests require a specific GC. These tests should be ignored when EnableJVMCI is true and the JVMCI compiler does not support the required GC. > > This PR adds `JVMCICompiler.isGCSupported` and makes use of it in `WhiteBox.isGCSupported`. > > Prior to this PR, a test requiring a GC not yet supported by a JVMCI compiler fail as follows: > Error occurred during initialization of VM > JVMCI Compiler does not support selected GC: epsilon gc Doug Simon has updated the pull request incrementally with two additional commits since the last revision: - removed broken check_jvmci_supported_gc logic - removed redundant signature ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1423/files - new: https://git.openjdk.java.net/jdk/pull/1423/files/54b7ba84..bc7ee6c2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1423&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1423&range=00-01 Stats: 21 lines in 6 files changed: 0 ins; 19 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1423.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1423/head:pull/1423 PR: https://git.openjdk.java.net/jdk/pull/1423 From kvn at openjdk.java.net Wed Nov 25 17:51:59 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 17:51:59 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo In-Reply-To: <19RAO8WaSz5jtGP-k8aNoHtNkWjVBFaa-QU97qHKmGc=.faf01bfc-7a71-45ee-b7e3-37aa2cc194d0@github.com> References: <19RAO8WaSz5jtGP-k8aNoHtNkWjVBFaa-QU97qHKmGc=.faf01bfc-7a71-45ee-b7e3-37aa2cc194d0@github.com> Message-ID: On Wed, 25 Nov 2020 11:09:05 
GMT, Per Liden wrote:

>> JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2.
>>
>> Initial patch was prepared by @fisk.
>>
>> Tested hs-tier1-4. Added new compiler tests to test intrinsics.
>>
>> Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough.
>
> ZGC changes look good!

> I just pulled the fresh master, applied this patch on top, enabled `_PhantomReference_refersTo0` in `c2compiler.cpp`, and ran `CONF=linux-x86_64-server-fastdebug make images run-test TEST=tier1 TEST_VM_OPTS="-XX:+UseShenandoahGC"` without problems.
>
> @vnkozlov, what Shenandoah failure did you see? Attention @rkennke.

@shipilev The 2 new tests added by JDK-8188055 do not trigger C2 compilation. You need to run my new test to trigger the problem I see:

java -Xbatch -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseShenandoahGC TestReferenceRefersTo.java

# Internal Error (/open/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp:999), pid=2498681, tid=2498694
# assert(!is_narrow) failed: phantom access cannot be narrow
#
# JRE version: Java(TM) SE Runtime Environment (16.0) (fastdebug build 16-internal+0-2020-11-24)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16-internal+0-2020-11-24, mixed mode, sharing, compressed oops, shenandoah gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x1862090] ShenandoahBarrierC2Support::call_lrb_stub(Node*&, Node*&, Node*, Node*&, Node*, unsigned long, PhaseIdealLoop*)+0x7e0

-------------
PR: https://git.openjdk.java.net/jdk/pull/1425
From kvn at openjdk.java.net Wed Nov 25 18:11:00 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 18:11:00 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo
In-Reply-To: <2cJwK-GzCQ5fcT333q5GZPSlz5t7ghs4Bg5YGzCOwjA=.a577266a-6730-4cde-9b46-5953e800ef9e@github.com> References: <2cJwK-GzCQ5fcT333q5GZPSlz5t7ghs4Bg5YGzCOwjA=.a577266a-6730-4cde-9b46-5953e800ef9e@github.com> Message-ID: On Wed, 25 Nov 2020 15:13:11 GMT, Erik Österlund wrote: >> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 623: >> >>> 621: // Also we need to add memory barrier to prevent commoning reads >>> 622: // from this field across safepoint since GC can change its value. >>> 623: bool need_read_barrier = (((on_weak || on_phantom) && !no_keepalive) || >> >> There's a slight change: `in_heap && (on_weak || ...)` turns into `(on_weak ...) || (in_heap ...)`. It will introduce a read barrier for `!in_heap && on_weak` case. Does it occur in practice? >> >> Another one: `on_weak` turns into ((on_weak ...) && !no_keepalive). >> My interpretation is no read barrier needed when `NO_KEEPALIVE` flag is used and currently a redundant barrier is issued. >> >> Maybe replace `!no_keepalive` with just `keep_alive`? The former is harder to parse. >> >> The check grows bigger and bigger. Maybe it's time to split it? >> >> Turn `on_weak || on_phantom` into `!is_strong`? > > I don't think we have any !in_heap && on_weak loads today. But if we did, they would indeed need read barriers. > We need a read barrier if the reference isn't provably strong... unless it's an AS_NO_KEEPALIVE access. That also reflects why the variable is called no_keepalive instead of keepalive; it is to reflect the shared decorator name used all over the place. I don't mind inverting it though, but personally found it easier to read when the names match our decorators. From this conversation the only change I can do is 'Turn (on_weak || on_phantom) into !on_strong'. @fisk Is this correct? I am concerned that it will include the `unknown` decorator too. I agree with Erik to keep !no_keepalive because he prefers it and this is code supported by the GC group.
------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From eosterlund at openjdk.java.net Wed Nov 25 18:36:54 2020 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Wed, 25 Nov 2020 18:36:54 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo In-Reply-To: References: <2cJwK-GzCQ5fcT333q5GZPSlz5t7ghs4Bg5YGzCOwjA=.a577266a-6730-4cde-9b46-5953e800ef9e@github.com> Message-ID: On Wed, 25 Nov 2020 18:07:34 GMT, Vladimir Kozlov wrote: >> I don't think we have any !in_heap && on_weak loads today. But if we did, they would indeed need read barriers. >> We need a read barrier if the reference isn't provably strong... unless it's an AS_NO_KEEPALIVE access. That also reflects why the variable is called no_keepalive instead of keepalive; it is to reflect the shared decorator name used all over the place. I don't mind inverting it though, but personally found it easier to read when the names match our decorators. > > From this conversation the only change I can do is 'Turn (on_weak || on_phantom) into !on_strong'. > @fisk Is this correct? I am concerned that it will include the `unknown` decorator too. > I agree with Erik to keep !no_keepalive because he prefers it and this is code supported by the GC group. Well if on_weak || on_phantom then it is provably a weak access. But I think the absence of the strong decorator does not prove it is weak, as it could have an unknown strength (via unsafe), in which case we need some extra logic to see if we can prove that an unknown strength access can't be weak.
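To make the distinction in the reply above concrete, here is a minimal, self-contained sketch. The constant names and bit values are hypothetical stand-ins chosen for this illustration; they are not HotSpot's actual decorator definitions. The only point is that "not strong" and "provably weak" give different answers for an access of unknown strength.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for reference-strength decorators. The bit values are
// made up for this sketch and do not match HotSpot's real definitions.
constexpr uint32_t SKETCH_ON_STRONG    = 1u << 0;
constexpr uint32_t SKETCH_ON_WEAK      = 1u << 1;
constexpr uint32_t SKETCH_ON_PHANTOM   = 1u << 2;
constexpr uint32_t SKETCH_ON_UNKNOWN   = 1u << 3;  // strength not known statically
constexpr uint32_t SKETCH_NO_KEEPALIVE = 1u << 4;

// "Provably weak": the access is explicitly tagged weak or phantom.
constexpr bool provably_weak(uint32_t decorators) {
  return (decorators & (SKETCH_ON_WEAK | SKETCH_ON_PHANTOM)) != 0;
}

// "Not strong": merely the absence of the strong tag.
constexpr bool not_strong(uint32_t decorators) {
  return (decorators & SKETCH_ON_STRONG) == 0;
}

// Only the weak/phantom half of the condition discussed in the review:
// (on_weak || on_phantom) && !no_keepalive.
constexpr bool weak_part_needs_read_barrier(uint32_t decorators) {
  return provably_weak(decorators) && (decorators & SKETCH_NO_KEEPALIVE) == 0;
}

// An unknown-strength access is "not strong" yet not "provably weak", so the
// two predicates cannot be swapped without changing behaviour.
static_assert(not_strong(SKETCH_ON_UNKNOWN), "unknown is not tagged strong");
static_assert(!provably_weak(SKETCH_ON_UNKNOWN), "unknown is not provably weak");
```

In this sketch, replacing `provably_weak` with `not_strong` would start treating unknown-strength accesses as weak, which is the concern raised about the `unknown` decorator.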
------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From stuefe at openjdk.java.net Wed Nov 25 18:40:01 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 25 Nov 2020 18:40:01 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size In-Reply-To: References: <0t69lhO_zHj_sg0_doBgHyv5Fpu6Y2OuZsza1NsyE6A=.2a972bf7-5e12-4ce3-9ca2-b2add68d07c5@github.com> <3ZpyM1gAZ0HeDdWd4_Ps98de3IKXvCowyDeFCaso97U=.9694feb3-82d5-4da6-bda6-5431e3dd94d1@github.com> <_imjzmFgjMEABYlW2LnfuyJNqyil1kBrX_A Fmq_LBlc=.7b70d4bd-6257-4436-81ee-19b40953ed10@github.com> Message-ID: On Wed, 25 Nov 2020 17:01:13 GMT, Thomas Schatzl wrote: >> Yes, but I think it makes sense to allow the os layer to mix page sizes for large paged areas, for performance reasons. The fact that this coding exists, and that Intel wants to further complicate it and add 2M pages, means we have a need here. >> >> Trying to avoid this just means that people add patches sideways to satisfy specific needs, which hurts maintainability. Also, I don't like to discourage first time contributors with lots of concerns, therefore I'd like a cleaner, more flexible os layer. >> >> But no-one forces you to accept multi-page-sized-areas. If you really want just one page size, you can query the largest page size available beforehand and align the reservation size accordingly, and with my proposed change you could now assert the result and log it correctly. >> >> But if one just generally wants large pages without caring for the precise layout, one could let os::reserve_memory_special() do its best, and would now get the information about what reserve_memory_special() did. 
>> >> For example, were I to re-introduce large pages for Metaspace, I would like to have the luxury of just calling os::reserve_memory_special() without having to think about specific geometry - if the space is large enough for large pages, it should stitch the area together as best as it can. > > Hi, > > On 25.11.20 16:52, Thomas Stuefe wrote: >> Yes, but I think it makes sense to allow the os layer to mix page sizes >> for large paged areas, for performance reasons. The fact that this >> coding exists, > > That code may be an artifact of 32 bit machines from 10+ years ago where > address space fragmentation has been a real concern (typically Windows). > Or RAM sizes were measured in hundreds of MB instead of GBs where (on > Linux) pre-reserving hundreds of MB for huge pages is/has been an issue. > >> and that Intel wants to further complicate it and add 2M >> pages, means we have a need here. > > The JDK-8256155 (Intel) CR does not state that requirement. Imho it only > says that the author wants to use (any and all) configured pages for > different reserved spaces. > > E.g. the machine has (on x64, to simplify the case) configured: > > 10 1G pages > 1000 2M pages > > so the heap should use the 1G pages (assuming it's less than 10G), other > reservations like code heap should first use the 50 2M pages before > falling back to other page sizes to better use available TLB cache entries. > > I would prefer if we do not overcomplicate the requirements :) Also > probably this should be asked and followed up in the correct review thread. > >> Trying to avoid this just means that people add patches sideways to >> satisfy specific needs, which hurts maintainability. Also, I don't like >> to discourage first time contributors with lots of concerns, therefore >> I'd like a cleaner, more flexible os layer. > > I'd like a simpler, maybe less flexible but understandable by mere > mortals OS layer :) lower layer that does not make upper layers too > complicated. 
> >> But no-one forces you to accept multi-page-sized-areas. If you really >> want just one page size, you can query the largest page size available >> beforehand and align the reservation size accordingly, and with my > > Which is no issue with 64 bit machines at all, but probably has been > with the prevalence of 32 bit address spaces. > >> proposed change you could now assert the result and log it correctly. >> >> But if one just generally wants large pages without caring for the >> precise layout, one could let os::reserve_memory_special() do its best, >> and would now get the information about what reserve_memory_special() did. > > This is a kind of one-sided argument, taking only commit into account. > Since actually giving back memory is expected nowadays, taking care of > different random page sizes is complicated. > > E.g. when implementing G1's region memory management (in 8u40) the > decision to only support a single page size for every one of its GC data > structures has been a conscious one - because the complexity overhead > did not justify the gains. > > Nobody complained yet. > >> For example, were I to re-introduce large pages for Metaspace, I would >> like to have the luxury of just calling os::reserve_memory_special() >> without having to think about specific geometry - if the space is large >> enough for large pages, it should stitch the area together as best as it >> can. > > That's true, but that Metaspace layer then needs to be aware of multiple > page sizes when uncommitting, and (presumably) tracking liveness on the > lowest granularity anyway. Which does not make the code easier. > > Thanks, > Thomas Hi Thomas, first off, the last thing I want is for matters to become more complicated. God no. Today the virtual memory layer is inconsistent and undocumented and has subtle bugs. Among other things this means that PRs can take forever to discuss, which I guess is frustrating for newcomers. 
Or if they are pushed they can cause a lot of cleanup work afterwards. Just attempting to write a consistent API description of the virtual memory layer, parameters expected behavior etc, is very difficult. I know since I tried. And that is a bad sign. Tests are missing too, since what you cannot describe you cannot test. For an interesting example, see JDK-8255978. That bug had been in there since at least JDK8. One problem I see reoccurring with these APIs is that meta information about a reservation are needed later, but are lost. Many examples: - the exact flags with which a mapping was established (exec or not, MAP_JIT, etc) - which API has been used (AIX) - what the page sizes were at reservation time - whether the area had been stitched together with multiple mappings or a single one .. etc. The code then often tries to guess these information from "circumstantial evidence", which introduces lots of strange dependencies and makes the coding fragile. ReservedSpace::actual_page_size() in this patch is one example. A better way would be to have these information kept somewhere. Ideally, the OS would do this and I could ask it for properties of a memory mapping. In reality, the OS is not so forthcoming. So we should keep these information ourselves. One of these information is the page sizes of a region. In that light you can see my proposal. I did not propose to change a single atom about how os::reserve_memory_special() works. I just proposed for it to just be honest and return information about what it did. So that this information could be kept inside ReservedSpace and reused instead of guessed. And since the reality today is that os::reserve_memory_special() can reserve multiple page sizes, a correct implementation would have to reflect that or be wrong. You are right, reporting multiple page sizes is more difficult than just one. 
But even more complex is what we have today, where a mapping can have multiple page sizes but this is not acknowledged by upper layers. In this case, we have ReservedSpace::actual_page_size() which plain cannot be implemented correctly. I am fine with the idea of restricting RS to just one page size. But then let's do that and get rid of mixed TLBFS reservation (and whatever Windows does, they may be doing something similar). We also should specify the page size as an explicit parameter and not confuse this with the alignment. > That code may be an artifact of 32 bit machines from 10+ years ago where address space fragmentation has been a real concern (typically Windows). Or RAM sizes were measured in hundreds of MB instead of GBs where (on Linux) pre-reserving hundreds of MB for huge pages is/has been an issue. I am not sure what coding you refer to. >> and that Intel wants to further complicate it and add 2M pages, means we have a need here. > The JDK-8256155 (Intel) CR does not state that requirement. Imho it only says that the author wants to use (any and all) configured pages for different reserved spaces. E.g. the machine has (on x64, to simplify the case) configured: 10 1G pages 1000 2M pages so the heap should use the 1G pages (assuming it's less than 10G), other reservations like code heap should first use the 50 2M pages before falling back to other page sizes to better use available TLB cache entries. I would prefer if we do not overcomplicate the requirements :) Flexible stitching would have one advantage though, in that a huge page pool could be better used. In your example, if someone starts with a 12G heap, it would fail since no pool individually is large enough, but combining different page sizes would work. I won't insist on this. As I wrote, I am fine with one-pagesize-per-mapping if we can arrive there. > Also probably this should be asked and followed up in the correct review thread. Since it was relevant to this PR, I did mention it here.
> I'd like a simpler, maybe less flexible but understandable by mere mortals OS layer :) lower layer that does not make upper layers too complicated. I do too. > This is a kind of one-sided argument, taking only commit into account. Since actually giving back memory is expected nowadays, taking care of different random page sizes is complicated. E.g. when implementing G1's region memory management (in 8u40) the decision to only support a single page size for every one of its GC data structures has been a conscious one - because the complexity overhead did not justify the gains. Nobody complained yet. I may be missing something here, but is huge-paged space not pinned? How can this be uncommitted? Okay yes, if it can be uncommitted, multiple page sizes should be avoided. >> For example, were I to re-introduce large pages for Metaspace, I would like to have the luxury of just calling os::reserve_memory_special() without having to think about specific geometry - if the space is large enough for large pages, it should stitch the area together as best as it can. > That's true, but that Metaspace layer then needs to be aware of multiple page sizes when uncommitting, and (presumably) tracking liveness on the lowest granularity anyway. Which does not make the code easier. See above. I was under the assumption that uncommit goes out of the window the moment you choose explicit large pages.
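As a rough illustration of the idea argued for in this message (record what a reservation actually did instead of guessing it later), here is a small sketch. The type and member names are hypothetical and exist nowhere in HotSpot; it only shows that a record of the page sizes used can answer "is this mapping uniform?" honestly, whereas a single "actual page size" cannot describe a stitched mapping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of what a reservation actually did. None of these names
// are HotSpot APIs; they are assumptions made for this sketch.
struct ReservationInfoSketch {
  std::vector<size_t> page_sizes;  // distinct page sizes backing the mapping

  // Remember a page size the reservation used (duplicates are ignored).
  void note_page_size(size_t size) {
    if (std::find(page_sizes.begin(), page_sizes.end(), size) == page_sizes.end()) {
      page_sizes.push_back(size);
    }
  }

  // True only if the whole mapping is backed by exactly one page size.
  bool is_uniform() const { return page_sizes.size() == 1; }

  // The single page size of a uniform mapping, or 0 when the mapping mixes
  // page sizes (or is empty) and no single number would be correct.
  size_t single_page_size() const {
    return is_uniform() ? page_sizes.front() : 0;
  }
};
```

With a record like this, a caller that insists on one page size per mapping could assert `is_uniform()` right after reserving, and logging could print every entry in `page_sizes` rather than a guessed value.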
> Thanks, Thomas Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/1161 From stuefe at openjdk.java.net Wed Nov 25 18:42:58 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 25 Nov 2020 18:42:58 GMT Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random [v2] In-Reply-To: <_SKxbxqodbI4aWGdFGVdks6_VqeADnMgjMfi_oEK-Bc=.2cd493f4-1467-48dc-a1f2-2bb674649f2b@github.com> References: <_SKxbxqodbI4aWGdFGVdks6_VqeADnMgjMfi_oEK-Bc=.2cd493f4-1467-48dc-a1f2-2bb674649f2b@github.com> Message-ID: On Wed, 25 Nov 2020 17:23:10 GMT, Coleen Phillimore wrote: >> The function os::init_random() and os::random() both set the _rand_seed. This test thinks nothing can change the seed while it is computing its expected value. >> I changed the test to run in a VM operation safepoint. Alternately, I can change the test to not verify the random value computed in the loop, or remove the test. >> Ran tier1 tests on linux-x64 and windows-x64. > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - Restore old copyright > - Refix os.random test using Thomas Stuefe's better suggestion. LGTM. Thanks for taking my suggestion. Cheers, Thomas ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1422 From coleenp at openjdk.java.net Wed Nov 25 18:47:08 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 25 Nov 2020 18:47:08 GMT Subject: RFR: 8256830: misc tests failed with "assert(env->is_enabled(JVMTI_EVENT_OBJECT_FREE)) failed: checking" Message-ID: The ServiceThread cleaning used a stale ObjectFree state when calling remove_dead_entries, because another thread had concurrently set is_enabled to false. Add a lock around setting/resetting the lock event state and retest the state under a lock. Ran the test 100s of time without failure, where otherwise it fails very quickly. 
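The fix described above follows a common pattern: the thread that acts on a flag re-tests it while holding the same lock the flag's writer takes. A minimal sketch of that pattern, using `std::mutex` and hypothetical names (this is not the JVMTI code from the PR), might look like this:

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for per-environment event state; the class and its
// members are assumptions for this sketch, not code from the PR.
class EventStateSketch {
 public:
  // Writers change the flag only while holding the lock.
  void set_object_free_enabled(bool enabled) {
    std::lock_guard<std::mutex> guard(_lock);
    _object_free_enabled = enabled;
  }

  // The cleaner re-tests the flag under the same lock, so it cannot act on a
  // value that another thread changed after an earlier unlocked check.
  // Returns true if the cleanup ran.
  bool remove_dead_entries_if_enabled() {
    std::lock_guard<std::mutex> guard(_lock);
    if (!_object_free_enabled) {
      return false;  // disabled concurrently: skip instead of asserting
    }
    ++_cleanups;     // placeholder for the real cleanup work
    return true;
  }

  int cleanups() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _cleanups;
  }

 private:
  mutable std::mutex _lock;
  bool _object_free_enabled = false;
  int _cleanups = 0;
};
```

The design choice is that the check and the action form one critical section; a check made before taking the lock can be stale by the time the cleanup starts.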
Tested with tier2,3 and running tiers 4,5,6 in progress. Thanks to Kim for his previous feedback. ------------- Commit messages: - 8256830: misc tests failed with "assert(env->is_enabled(JVMTI_EVENT_OBJECT_FREE)) failed: checking" Changes: https://git.openjdk.java.net/jdk/pull/1439/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1439&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256830 Stats: 25 lines in 3 files changed: 23 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1439.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1439/head:pull/1439 PR: https://git.openjdk.java.net/jdk/pull/1439 From shade at openjdk.java.net Wed Nov 25 19:51:59 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 25 Nov 2020 19:51:59 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo In-Reply-To: References: <19RAO8WaSz5jtGP-k8aNoHtNkWjVBFaa-QU97qHKmGc=.faf01bfc-7a71-45ee-b7e3-37aa2cc194d0@github.com> Message-ID: On Wed, 25 Nov 2020 17:48:54 GMT, Vladimir Kozlov wrote: > @shipilev 2 new tests added by JDK-8188055 does not trigger C2 compilation. That sounds like a testbug to me! Since this PR adds C2 intrinsics, I thought it is expected that new tests trigger it in default test configs... > You need to run my new test to trigger problem I see: > > ``` > java -Xbatch -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseShenandoahGC TestReferenceRefersTo.java > > # Internal Error (/open/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp:999), pid=2498681, tid=2498694 > # assert(!is_narrow) failed: phantom access cannot be narrow > # Right. So new intrinsic introduces the "phantom" access from C2 intrinsic code, and it can be narrow. Shenandoah did not handle that path, because no existing code shapes were exercising it, and it was considered dead. 
Since it is not dead now, we can simply implement that part like this: http://cr.openjdk.java.net/~shade/shenandoah/8256999-shenandoah-fix.patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From kvn at openjdk.java.net Wed Nov 25 19:52:02 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 19:52:02 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo In-Reply-To: References: <2cJwK-GzCQ5fcT333q5GZPSlz5t7ghs4Bg5YGzCOwjA=.a577266a-6730-4cde-9b46-5953e800ef9e@github.com> Message-ID: <_LdM7ZnWXBn1K37mrFdWSjD3DCJEFTzVP8wuda0kJgA=.374b94f4-80ac-4a0e-8496-1c0005a5d29a@github.com> On Wed, 25 Nov 2020 18:34:28 GMT, Erik Österlund wrote: >> From this conversation the only change I can do is 'Turn (on_weak || on_phantom) into !on_strong'. >> @fisk Is this correct? I am concerned that it will include the `unknown` decorator too. >> I agree with Erik to keep !no_keepalive because he prefers it and this is code supported by the GC group. > > Well if on_weak || on_phantom then it is provably a weak access. But I think the absence of the strong decorator does not prove it is weak, as it could have an unknown strength (via unsafe), in which case we need some extra logic to see if we can prove that an unknown strength access can't be weak. Okay. I will leave the changes as they are. ------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From shade at openjdk.java.net Wed Nov 25 19:53:06 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 25 Nov 2020 19:53:06 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes [v6] In-Reply-To: References: Message-ID: > JMH uses the [`Blackhole::consume`](https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java#l153) methods to avoid dead-code elimination of the code that produces benchmark values. It now relies on producing opaque side-effects and breaking inlining.
While it was proved useful for many years, it unfortunately comes with several major drawbacks: > > 1. Call costs dominate nanobenchmarks. On TR 3970X, the call cost is several nanoseconds. > 2. The work spent in Blackhole.consume dominates nanobenchmarks too. It takes about a nanosecond on TR 3970X. > 3. Argument preparation for call makes different argument types behave differently. This is prominent on architectures where calling conventions for passing e.g. floating-point arguments require elaborate dance. > > Supporting this directly in compilers would improve nanobenchmark fidelity. > > Instead of introducing public APIs or special-casing JMH methods in JVM, we can hook a new command to compiler control, and let JMH sign up its Blackhole methods for it with `-XX:CompileCommand=blackhole,org.openjdk.jmh.infra.Blackhole::consume`. This is being prototyped as [CODETOOLS-7902762](https://bugs.openjdk.java.net/browse/CODETOOLS-7902762). It makes Blackholes behave [substantially better](http://cr.openjdk.java.net/~shade/8252505/bh-old-vs-new.png). > > Current prototype is for initial approach review and early testing. I am open for suggestions how to make it simpler; not that I haven't tried, but it is likely there is something I am overlooking here. > > C1 code is platform-independent, and it adds the new node which is then lowered to nothing. > > C2 code is more complicated. There were four attempts to implement this, and what you see in the PR is the fourth attempt. > > [First attempt](http://cr.openjdk.java.net/~shade/8252505/webrev.01/) was to introduce fake store like `StoreV` ("store void"), and then lower them to nothing. It runs into a series of funky problems: you would like to have at least two shapes of the store to match the store type width not to confuse the optimizer, or even have the whole mirror of `Store*` hierarchy. Additionally, optimizer tweaks were needed. 
The awkward problem of GC barrier verification showed up: if `StoreV*` is a subclass of `Store*`, then verificators rightfully expect GC barriers before them. If we emit GC, then we need to handle walking over `StoreV*` nodes in optimizers. > > [Second attempt](http://cr.openjdk.java.net/~shade/8252505/webrev.04/) was to introduce the special `Blackhole` node that consumes the values -- basically like C1 implementation does it. Alas, the many attempts to make sure the new node is not DCE'd failed. Roland looked at it, and suggested that there seems to be no way to model the effects we are after: consume the value, but have no observable side effects. So, suggested we instead put the boolean flag onto `CallJavaNode`, and then match it to nothing in `.ad`. > > ...which is the essence of the third attempt. Drag the blackhole through C2 as if it has call-like side effects, and then emit nothing. Instead of boolean flag, the subsequent iteration introduced a full new `CallBlackholeJava` node, that is a call as far as optimizers are concerned, and then it is matched to nothing in `.ad`. This seems to require the least fiddling with C2 internals. Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: - Support for ARM32, PPC, S390 - Fix x86_32 support - Merge branch 'master' into JDK-8252505-blackholes - Fix AArch64 build and test - Fixes after the merge - Merge branch 'master' into JDK-8252505-blackholes - More touchups: dead code, resolve TODOs - Revert old changes - Redo C2 blackholes as CallBlackholeJavaNode - 8252505: C1/C2 compiler support for blackholes ------------- Changes: https://git.openjdk.java.net/jdk/pull/1203/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1203&range=05 Stats: 1523 lines in 47 files changed: 1512 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/1203.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1203/head:pull/1203 PR: https://git.openjdk.java.net/jdk/pull/1203 From shade at openjdk.java.net Wed Nov 25 19:54:58 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 25 Nov 2020 19:54:58 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo In-Reply-To: References: <19RAO8WaSz5jtGP-k8aNoHtNkWjVBFaa-QU97qHKmGc=.faf01bfc-7a71-45ee-b7e3-37aa2cc194d0@github.com> Message-ID: On Wed, 25 Nov 2020 19:48:40 GMT, Aleksey Shipilev wrote: >>> I just pulled the fresh master, applied this patch on top, enabled `_PhantomReference_refersTo0` in `c2compiler.cpp`, and ran `CONF=linux-x86_64-server-fastdebug make images run-test TEST=tier1 TEST_VM_OPTS="-XX:+UseShenandoahGC"` without problems. >>> >>> @vnkozlov, what Shenandoah failure did you see? Attention @rkennke. >> >> @shipilev 2 new tests added by JDK-8188055 does not trigger C2 compilation. 
>> You need to run my new test to trigger problem I see: >> >> java -Xbatch -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseShenandoahGC TestReferenceRefersTo.java >> >> # Internal Error (/open/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp:999), pid=2498681, tid=2498694 >> # assert(!is_narrow) failed: phantom access cannot be narrow >> # >> # JRE version: Java(TM) SE Runtime Environment (16.0) (fastdebug build 16-internal+0-2020-11-24) >> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16-internal+0-2020-11-24, mixed mode, sharing, compressed oops, shenandoah gc, linux-amd64) >> # Problematic frame: >> # V [libjvm.so+0x1862090] ShenandoahBarrierC2Support::call_lrb_stub(Node*&, Node*&, Node*, Node*&, Node*, unsigned long, PhaseIdealLoop*)+0x7e0 > >> @shipilev 2 new tests added by JDK-8188055 does not trigger C2 compilation. > > That sounds like a testbug to me! Since this PR adds C2 intrinsics, I thought it is expected that new tests trigger it in default test configs... > >> You need to run my new test to trigger problem I see: >> >> ``` >> java -Xbatch -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseShenandoahGC TestReferenceRefersTo.java >> >> # Internal Error (/open/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp:999), pid=2498681, tid=2498694 >> # assert(!is_narrow) failed: phantom access cannot be narrow >> # > > Right. So new intrinsic introduces the "phantom" access from C2 intrinsic code, and it can be narrow. Shenandoah did not handle that path, because no existing code shapes were exercising it, and it was considered dead. Since it is not dead now, we can simply implement that part like this: http://cr.openjdk.java.net/~shade/shenandoah/8256999-shenandoah-fix.patch. Your PR have also been bitten by #1427, merge from master to get it fixed. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From kvn at openjdk.java.net Wed Nov 25 19:58:55 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 19:58:55 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo In-Reply-To: References: <19RAO8WaSz5jtGP-k8aNoHtNkWjVBFaa-QU97qHKmGc=.faf01bfc-7a71-45ee-b7e3-37aa2cc194d0@github.com> Message-ID: On Wed, 25 Nov 2020 19:52:21 GMT, Aleksey Shipilev wrote: > Your PR has also been bitten by #1427, merge from master to get it fixed. Thanks! I will apply your patch and merge the latest changes from master. ------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From xxinliu at amazon.com Wed Nov 25 20:51:01 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Wed, 25 Nov 2020 20:51:01 +0000 Subject: How to avoid git push --force to a pull request(PR)? In-Reply-To: <1606269567937.41228@amazon.com> References: <1606269567937.41228@amazon.com> Message-ID: <1606337460402.57835@amazon.com> cc hotspot-dev. I found that skara-dev is mainly for skara developers. My question is for general hotspot developers. thanks, --lx ________________________________________ From: Liu, Xin Sent: Tuesday, November 24, 2020 5:59 PM To: skara-dev at openjdk.java.net Cc: Tobias Hartmann Subject: How to avoid git push --force to a pull request(PR)? Hi, Skara developers, Tobias suggested not to use force push here https://github.com/openjdk/jdk/pull/1073#issuecomment-726549523 Sometimes, I use git push --force to a private branch, which maps to an ongoing PR. What I do is to update my branch to TIP, rebase my changes onto it and then "git push --force" to my branch remotely. Skara marks the PR as "force pushed", e.g. https://github.com/openjdk/jdk/pull/1179 Yes, I admit that it would ruin the "incremental webrev". I do it for the following two reasons. 1) The reviewing process lasts too long. I have to update the base of my private branch, or it isn't mergeable.
Other developers may have changed the common code when you are working on your PRs, right? /integrate will fail because of conflicts. 2) I have to update the base because of testing. Openjdk now contains the sanity check workflow. https://github.com/openjdk/jdk/blob/master/.github/workflows/submit.yml I'd like to pass them all before integrating. Sometimes, I run into failures but my PR is not the culprit. The build breakage and regression are usually rapidly fixed in the master branch. I understand I can always ditch the old PR and start over, but all comments in the old PR will lose in this way. On the side, I also feel guilty to use force push frequently. May I know if Skara has other option to help me out? I read this blog (https://julien.danjou.info/rant-about-github-pull-request-workflow-implementation/), it declares the dilemma comes from github PR mechanism. but that blog was 7-year-old, I am not sure that if github has sorted it out or not. Even github hasn't, is that possible to be solved by Skara? thanks, --lx From kvn at openjdk.java.net Wed Nov 25 21:19:54 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 21:19:54 GMT Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports [v2] In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 14:53:34 GMT, Doug Simon wrote: >>> > Could this be done without adding JVMCI code to GCConfig::is_gc_supported? >>> >>> This has to go through the WhiteBox API as JVMCI is not accessible to normal Java code (such as VMProps). >>> We could restrict the JVMCI interposition to WhiteBox. This is what I did [initially](https://github.com/openjdk/jdk/pull/1423/commits/191d61d996dd08f43c2c946f02b0cce0d96f2ef4). However, I then saw that `GCConfig::is_gc_supported` is only called from whitebox.cpp so [moved](https://github.com/openjdk/jdk/pull/1423/commits/54b7ba8463dfee9a679abb09a5cc898ba8550d85) the logic directly into `GCConfig`. 
I think this is better because as soon as there's another caller of `GCConfig::is_gc_supported`, they should get an accurate answer and not have to worry about implications of `EnableJVMCI` separately. >> >> We already have code in place to ensure that EnableJVMCI is set to false when you run a GC that doesn't support JVMCI: >> void JVMCIGlobals::check_jvmci_supported_gc() { >> if (EnableJVMCI) { >> // Check if selected GC is supported by JVMCI and Java compiler >> if (!(UseSerialGC || UseParallelGC || UseG1GC)) { >> vm_exit_during_initialization("JVMCI Compiler does not support selected GC", GCConfig::hs_err_name()); >> FLAG_SET_DEFAULT(EnableJVMCI, false); >> FLAG_SET_DEFAULT(UseJVMCICompiler, false); >> } >> } >> } >> >> So, in our view we have "if you run this particular GC, that don't have support in JVMCI, then we turn off JVMCI forcefully", and we rely on that behavior in the HotSpot code. The proposed patch turns this inside out and the changed is_gc_supported function now becomes "if you've turned on EnableJVMCI is this GC supported". It sort-of inverts the dependency between the two sub-systems and we don't want to use that particular question in HotSpot. AFAICT, this is something you only need to ask in the context of jtreg's @requires, so I think it should be moved out of the GC code and put in the jtreg extension code. >> >> We've other flags that we treat similarly. For example, UseCompressedOops. ZGC doesn't support it, so we turn it off forcefully, but we don't have code in GCConfig::is_gc_supported that returns false when UseCompressedOops is turned on. That is completely handled by the jtreg extensions (and the WhiteboxAPI). 
>> >> I think you could easily solve this without adding calls through JVMCI from the GC code by: >> 1) Update JVMCIGlobals::supported_gc() to make a call through JVMCI into the used compiler that can tell if it supports the particular GC >> 2) Use Vladimirs proposed check_jvmci_supported_gc function, which uses supported_gc(). >> 3) Add your own WhiteboxAPI function, say WB_IsJVMCISupportedByGC, and call your "supported_gc" function from that >> 4) Update VMProps.java like I suggested above > > `JVMCIGlobals::check_jvmci_supported_gc` is incorrect in that it hard codes the GCs supported by JVMCI which not something JVMCI itself can accurately answer. What's more, the upcall into JVMCI Java code to get the right answer cannot be made here as it is too early in the VM boot process. There's not even a `JavaThread` available yet. This code should be removed and let Graal exit the VM with an error should it be used in conjunction with a GC it does not support. For example: > Error occurred during initialization of VM > JVMCI Compiler does not support selected GC: epsilon gc > > Moving the check to later in the VM bootstrap doesn't really work either as it imposes the overhead of initializing JVMCI and Graal eagerly. > This means a conflict between GC and JVMCI compiler cannot be finessed by overriding the compiler selection to use a non-JVMCI compiler. It's a fatal VM error, just like selecting multiple GCs on the command line is. > > If you agree with that, then I think the proposed PR is the right approach. We could revert to the version that leaves the JVMCI specific logic in `WB_IsGCSupported` but as I stated, that means `GCConfig::is_gc_supported` can give a wrong answer. I think you merged 2 things into this PR which made me confused. The title says that Graal should specify which GC it supports (new JVMCI API). But your description is about fixing Graal testing with different GCs. Yes, they are connected but I think they should be reviewed separately. 
The reason currently we test Graal supported GC in C++ code during VM startup is because we want to avoid executing application in Interpreter and bailout later when Graal is loaded to compile a hot method. Note, Graal in JDK is loaded on first compilation request. To get answer from Graal about GC you would have to load and initialize it very early which will affect startup (even with libgraal). To add new JVMCI API just to resolve testing issues is overkill for me. ------------- PR: https://git.openjdk.java.net/jdk/pull/1423 From dnsimon at openjdk.java.net Wed Nov 25 21:34:54 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 25 Nov 2020 21:34:54 GMT Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports [v2] In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 21:16:51 GMT, Vladimir Kozlov wrote: >> `JVMCIGlobals::check_jvmci_supported_gc` is incorrect in that it hard codes the GCs supported by JVMCI which not something JVMCI itself can accurately answer. What's more, the upcall into JVMCI Java code to get the right answer cannot be made here as it is too early in the VM boot process. There's not even a `JavaThread` available yet. This code should be removed and let Graal exit the VM with an error should it be used in conjunction with a GC it does not support. For example: >> Error occurred during initialization of VM >> JVMCI Compiler does not support selected GC: epsilon gc >> >> Moving the check to later in the VM bootstrap doesn't really work either as it imposes the overhead of initializing JVMCI and Graal eagerly. >> This means a conflict between GC and JVMCI compiler cannot be finessed by overriding the compiler selection to use a non-JVMCI compiler. It's a fatal VM error, just like selecting multiple GCs on the command line is. >> >> If you agree with that, then I think the proposed PR is the right approach. 
We could revert to the version that leaves the JVMCI specific logic in `WB_IsGCSupported` but as I stated, that means `GCConfig::is_gc_supported` can give a wrong answer. > > I think you merged 2 things into this PR which made me confused. The title says that Graal should specify which GC it supports (new JVMCI API). But your description is about fixing Graal testing with different GCs. Yes, they are connected but I think they should be reviewed separately. > > The reason currently we test Graal supported GC in C++ code during VM startup is because we want to avoid executing application in Interpreter and bailout later when Graal is loaded to compile a hot method. Note, Graal in JDK is loaded on first compilation request. > To get answer from Graal about GC you would have to load and initialize it very early which will affect startup (even with libgraal). > > To add new JVMCI API just to resolve testing issues is overkill for me. Now I'm confused ;-) The title and description of this PR don't mention Graal at all. While Graal is obviously the concrete JVMCI compiler motivating this change, the problem relates to any Java based JVMCI compiler. This is why an API change is needed. I understand the motivation for wanting to test which GCs are supported by JVMCI in C++ code during VM startup. But I don't think that can justify giving a wrong answer. What happens when support for ZGC is added to Graal? The VM will incorrectly exit during startup based on the current code. With JVMCI, the problem of delayed VM exit upon misconfiguration already exists. For example, an incorrectly specified or unrecognized `-Dgraal.*` option will only be detected when compiling the first method with Graal (i.e. when Graal is lazily initialized). Maybe I'm missing something, but why is this a real problem? Does it matter whether the VM exits after 1ms of execution or 500ms? 
------------- PR: https://git.openjdk.java.net/jdk/pull/1423 From kvn at openjdk.java.net Wed Nov 25 21:34:55 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 21:34:55 GMT Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports [v2] In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 21:31:44 GMT, Doug Simon wrote: >> I think you merged 2 things into this PR which made me confused. The title says that Graal should specify which GC it supports (new JVMCI API). But your description is about fixing Graal testing with different GCs. Yes, they are connected but I think they should be reviewed separately. >> >> The reason currently we test Graal supported GC in C++ code during VM startup is because we want to avoid executing application in Interpreter and bailout later when Graal is loaded to compile a hot method. Note, Graal in JDK is loaded on first compilation request. >> To get answer from Graal about GC you would have to load and initialize it very early which will affect startup (even with libgraal). >> >> To add new JVMCI API just to resolve testing issues is overkill for me. > > Now I'm confused ;-) The title and description of this PR don't mention Graal at all. While Graal is obviously the concrete JVMCI compiler motivating this change, the problem relates to any Java based JVMCI compiler. This is why an API change is needed. > > I understand the motivation for wanting to test which GCs are supported by JVMCI in C++ code during VM startup. But I don't think that can justify giving a wrong answer. What happens when support for ZGC is added to Graal? The VM will incorrectly exit during startup based on the current code. > > With JVMCI, the problem of delayed VM exit upon misconfiguration already exists. For example, an incorrectly specified or unrecognized `-Dgraal.*` option will only be detected when compiling the first method with Graal (i.e. when Graal is lazily initialized). 
Maybe I'm missing something, but why is this a real problem? Does it matter whether the VM exits after 1ms of execution or 500ms? Actually here is fundamental question. Why not build GraalVM JDK without GCs which Graal does not support? It is all configurable. ------------- PR: https://git.openjdk.java.net/jdk/pull/1423 From dnsimon at openjdk.java.net Wed Nov 25 21:39:55 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Wed, 25 Nov 2020 21:39:55 GMT Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports [v2] In-Reply-To: References: Message-ID: <7k6ZfRR2Dc81jDPTmOtf7POaZ0jcEwCyvNBrViX6DXE=.ec7fca44-20f9-48a5-8252-484eb4affdbf@github.com> On Wed, 25 Nov 2020 21:31:51 GMT, Vladimir Kozlov wrote: > Actually here is fundamental question. Why not build GraalVM JDK without GCs which Graal does not support? It is all configurable. GraalVM could indeed do that but there are other OpenJDK community members who want to be able to use Graal on stock JDK binaries. Even for GraalVM, we want to deviate as little as possible from how the JDK underlying GraalVM is built. ------------- PR: https://git.openjdk.java.net/jdk/pull/1423 From kvn at openjdk.java.net Wed Nov 25 23:15:58 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 25 Nov 2020 23:15:58 GMT Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports [v2] In-Reply-To: <7k6ZfRR2Dc81jDPTmOtf7POaZ0jcEwCyvNBrViX6DXE=.ec7fca44-20f9-48a5-8252-484eb4affdbf@github.com> References: <7k6ZfRR2Dc81jDPTmOtf7POaZ0jcEwCyvNBrViX6DXE=.ec7fca44-20f9-48a5-8252-484eb4affdbf@github.com> Message-ID: On Wed, 25 Nov 2020 21:37:13 GMT, Doug Simon wrote: >> Actually here is fundamental question. Why not build GraalVM JDK without GCs which Graal does not support? It is all configurable. > >> Actually here is fundamental question. Why not build GraalVM JDK without GCs which Graal does not support? It is all configurable. 
> > GraalVM could indeed do that but there are other OpenJDK community members who want to be able to use Graal on stock JDK binaries. Even for GraalVM, we want to deviate as little as possible from how the JDK underlying GraalVM is built. Yes, it is reasonable case. We already had such situation with CMS before. You and they understand which GC can be used with Graal. I agree with removal of `JVMCIGlobals::check_jvmci_supported_gc()` and not do any C++ checks during VM startup. Now about testing and changes. I agree with @stefank that you should not modify `GCConfig::is_gc_supported()` because it is about Hotspot VM support (`Interpreter` only, for example). I suggest to modify `isGcSupportedByGraal()` in VMProps.java by adding new `WB_IsGCSupportedByGraal()` WB api to call JVMCI runtime. Then `vmGC()` in VMProps.java will work as it is. And you can change `Graal` with `JVMCICompiler` in methods names if you want. ------------- PR: https://git.openjdk.java.net/jdk/pull/1423 From kbarrett at openjdk.java.net Wed Nov 25 23:34:01 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 25 Nov 2020 23:34:01 GMT Subject: RFR: 8256830: misc tests failed with "assert(env->is_enabled(JVMTI_EVENT_OBJECT_FREE)) failed: checking" In-Reply-To: References: Message-ID: <1NGa5DMWUHpmh3JWelXH4X2bQRt7pymebSQ19tywORM=.95e3ef25-e234-4daf-89f3-ea481b9c2266@github.com> On Wed, 25 Nov 2020 18:40:49 GMT, Coleen Phillimore wrote: > The ServiceThread cleaning used a stale ObjectFree state when calling remove_dead_entries, because another thread had concurrently set is_enabled to false. Add a lock around setting/resetting the lock event state and retest the state under a lock. Ran the test 100s of time without failure, where otherwise it fails very quickly. > Tested with tier2,3 and running tiers 4,5,6 in progress. > Thanks to Kim for his previous feedback. Marked as reviewed by kbarrett (Reviewer). 
src/hotspot/share/prims/jvmtiEventController.cpp line 463:

> 461: ((now_enabled & OBJECT_FREE_BIT)) != 0) {
> 462: // Set/reset the event enabled under the tagmap lock.
> 463: set_enabled_events_with_lock(env, now_enabled);

You could tighten up the test to only handle specially when the state of the ObjectFree event is changing, i.e. `(((was_enabled ^ now_enabled) & OBJECT_FREE_BIT) != 0)`

Or you could not bother with the conditionalization at all, and just always call set_enabled_events_with_lock; I bet nobody would notice any performance difference. That would eliminate "benign" races between unlocked bit setting here and bit testing in remove_dead_entries_locked. Of course, the current implementation has such races on these bits all over the place; what's one race more or less among friends...

Or you could just leave it as you have it. Your call.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1439

From kvn at openjdk.java.net Wed Nov 25 23:35:14 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 25 Nov 2020 23:35:14 GMT
Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo [v2]
In-Reply-To:
References:
Message-ID:

> JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2.
>
> Initial patch was prepared by @fisk.
>
> Tested hs-tier1-4. Added new compiler tests to test intrinsics.
>
> Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough.

Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8256999 - Added ZLoadBarrierElided = 0 definition. Removed is_exact argument in load_field_from_object(). Added Shenandoah support for narrow phantom accesses. - 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1425/files - new: https://git.openjdk.java.net/jdk/pull/1425/files/7bfec378..08bdd307 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1425&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1425&range=00-01 Stats: 8771 lines in 206 files changed: 2656 ins; 872 del; 5243 mod Patch: https://git.openjdk.java.net/jdk/pull/1425.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1425/head:pull/1425 PR: https://git.openjdk.java.net/jdk/pull/1425 From kbarrett at openjdk.java.net Wed Nov 25 23:48:57 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 25 Nov 2020 23:48:57 GMT Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random [v2] In-Reply-To: <_SKxbxqodbI4aWGdFGVdks6_VqeADnMgjMfi_oEK-Bc=.2cd493f4-1467-48dc-a1f2-2bb674649f2b@github.com> References: <_SKxbxqodbI4aWGdFGVdks6_VqeADnMgjMfi_oEK-Bc=.2cd493f4-1467-48dc-a1f2-2bb674649f2b@github.com> Message-ID: On Wed, 25 Nov 2020 17:23:10 GMT, Coleen Phillimore wrote: >> The function os::init_random() and os::random() both set the _rand_seed. This test thinks nothing can change the seed while it is computing its expected value. >> I changed the test to run in a VM operation safepoint. Alternately, I can change the test to not verify the random value computed in the loop, or remove the test. >> Ran tier1 tests on linux-x64 and windows-x64. > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - Restore old copyright > - Refix os.random test using Thomas Stuefe's better suggestion. 
Marked as reviewed by kbarrett (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1422 From kim.barrett at oracle.com Wed Nov 25 23:51:48 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 25 Nov 2020 18:51:48 -0500 Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random In-Reply-To: References: Message-ID: <6A9B8550-8691-4566-986D-67E8973636D5@oracle.com> > On Nov 25, 2020, at 12:25 AM, Thomas Stuefe wrote: > Why is someone concurrently changing the seed? I thought "TEST" tests do not start the VM? Or is it that some earlier test already did? I was surprised by this too. "TEST" tests and "TEST_VM" tests are collected in a single list and run sequentially. The first "TEST_VM" test that gets run by that will trigger VM initialization, and all the remaining tests (whether "TEST" or "TEST_VM") get run in the resulting context. I don't much like that behavior. From stefan.karlsson at oracle.com Thu Nov 26 08:51:23 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 26 Nov 2020 09:51:23 +0100 Subject: How to avoid git push --force to a pull request(PR)? In-Reply-To: <1606337460402.57835@amazon.com> References: <1606269567937.41228@amazon.com> <1606337460402.57835@amazon.com> Message-ID: <4a17d143-7d35-2b7e-9ab4-0d8d20bbbfcf@oracle.com> Hi, On 2020-11-25 21:51, Liu, Xin wrote: > cc hotspot-dev. > I found that skara-dev is mainly for skara developers. my question is for general hotspot developers. > > thanks, > --lx > > > ________________________________________ > From: Liu, Xin > Sent: Tuesday, November 24, 2020 5:59 PM > To: skara-dev at openjdk.java.net > Cc: Tobias Hartmann > Subject: How to avoid git push --force to a pull request(PR)? > > Hi, Skara developers, > > > Tobias suggested not to use force push here > https://github.com/openjdk/jdk/pull/1073#issuecomment-726549523 > > > Sometimes, I use git push --force to a private branch, which maps to an ongoing PR. 
> What I do is to update my branch to TIP, rebase my changes to it and then "git push --force" to my branch remotely. Skara remarks the PR "force pushed" eg. https://github.com/openjdk/jdk/pull/1179
>
>
> Yes, I admit that it would ruin the "incremental webrev". I do it for the following two reasons.
> 1) the reviewing process lasts too long. I have to update the base of my private branch, or it isn't mergeable.
> Other developers may have changed the common code when you are working on your PRs, right? /integrate will fail because of conflicts.

Trying to understand what kind of problems you are encountering. What do you mean with "it isn't mergeable"? Do you mean that you get a conflict? If so, then resolve the conflict and then complete the merge.

>
> 2) I have to update the base because of testing.
> Openjdk now contains the sanity check workflow. https://github.com/openjdk/jdk/blob/master/.github/workflows/submit.yml
> I'd like to pass them all before integrating. Sometimes, I run into failures but my PR is not the culprit. The build breakage and regression are usually rapidly fixed in the master branch.

For this use-case just merge master into your local PR branch, do local testing, push it to the PR.

>
> I understand I can always ditch the old PR and start over, but all comments in the old PR will lose in this way. On the side, I also feel guilty to use force push frequently.

So, I think I've recently faced the same situation as you, where I really wanted to rebase my local PR branch (for reasons). What I did was I left the local PR branch and *did not* rebase it. Instead I created a new branch for the same commit and rebased that, completed some more changes, pushed it to GitHub for others to see. Then when it was time to move those changes over to the PR, I merged the rebased local branch into the local PR branch, resolved the conflicts by accepting the rebased version of the patch, and then pushed the result to the PR.
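
As a rough sketch of the rebase-a-copy-then-merge steps described above (not necessarily the exact commands that were used), the script below builds a throwaway repository and walks through them. The branch names `pr` and `pr-rebased`, the file contents, and the use of `-X theirs` as one way to "accept the rebased version" are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical demo in a temporary repository; nothing here touches a real PR.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# A base commit on master, then a PR branch with one change.
printf 'line1\nline2\nline3\n' > file.txt
git add file.txt
git commit -q -m "base"
git checkout -q -b pr
printf 'line1\nline2 (pr change)\nline3\n' > file.txt
git commit -q -am "PR change"
pr_head_before=$(git rev-parse pr)

# Meanwhile master moves on with an unrelated change.
git checkout -q master
echo "other" > other.txt
git add other.txt
git commit -q -m "unrelated master change"

# 1. Leave 'pr' untouched; rebase a copy of it instead.
git checkout -q -b pr-rebased pr
git rebase -q master

# 2. Keep working on the rebased copy.
printf 'line1\nline2 (pr change, v2)\nline3\n' > file.txt
git commit -q -am "follow-up change"

# 3. Merge the rebased copy back into 'pr', taking its side of any conflict.
git checkout -q pr
git merge -q -X theirs -m "Merge rebased work into PR branch" pr-rebased

# The old PR head is still an ancestor of 'pr', so in a real setup a plain
# 'git push' (without --force) would update the PR.
git merge-base --is-ancestor "$pr_head_before" pr && echo "no force push needed"
```

Since the merge commit has the old PR head as a parent, the history already on the PR is only extended, never rewritten. Note that `-X theirs` only decides conflicting hunks; in a larger change it may be worth comparing the resulting tree against the rebased branch before pushing.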
This is a bit cumbersome, so I only use that approach if I really need a pristine/rebased branch for, say, showing others the stand-alone commit. Usually, I just use merge as suggested by others. Cheers, StefanK > May I know if Skara has other option to help me out? > > > I read this blog (https://julien.danjou.info/rant-about-github-pull-request-workflow-implementation/), it declares the dilemma comes from github PR mechanism. > but that blog was 7-year-old, I am not sure that if github has sorted it out or not. Even github hasn't, is that possible to be solved by Skara? > > thanks, > --lx > > > > > From sjohanss at openjdk.java.net Thu Nov 26 08:51:58 2020 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 26 Nov 2020 08:51:58 GMT Subject: RFR: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size In-Reply-To: References: <0t69lhO_zHj_sg0_doBgHyv5Fpu6Y2OuZsza1NsyE6A=.2a972bf7-5e12-4ce3-9ca2-b2add68d07c5@github.com> <3ZpyM1gAZ0HeDdWd4_Ps98de3IKXvCowyDeFCaso97U=.9694feb3-82d5-4da6-bda6-5431e3dd94d1@github.com> <_imjzmFgjMEABYlW2LnfuyJNqyil1kBrX_A Fmq_LBlc=.7b70d4bd-6257-4436-81ee-19b40953ed10@github.com> Message-ID: On Wed, 25 Nov 2020 18:36:51 GMT, Thomas Stuefe wrote: >> Hi, >> >> On 25.11.20 16:52, Thomas Stuefe wrote: >>> Yes, but I think it makes sense to allow the os layer to mix page sizes >>> for large paged areas, for performance reasons. The fact that this >>> coding exists, >> >> That code may be an artifact of 32 bit machines from 10+ years ago where >> address space fragmentation has been a real concern (typically Windows). >> Or RAM sizes were measured in hundreds of MB instead of GBs where (on >> Linux) pre-reserving hundreds of MB for huge pages is/has been an issue. >> >>> and that Intel wants to further complicate it and add 2M >>> pages, means we have a need here. >> >> The JDK-8256155 (Intel) CR does not state that requirement. 
Imho it only >> says that the author wants to use (any and all) configured pages for >> different reserved spaces. >> >> E.g. the machine has (on x64, to simplify the case) configured: >> >> 10 1G pages >> 1000 2M pages >> >> so the heap should use the 1G pages (assuming it's less than 10G), other >> reservations like code heap should first use the 50 2M pages before >> falling back to other page sizes to better use available TLB cache entries. >> >> I would prefer if we do not overcomplicate the requirements :) Also >> probably this should be asked and followed up in the correct review thread. >> >>> Trying to avoid this just means that people add patches sideways to >>> satisfy specific needs, which hurts maintainability. Also, I don't like >>> to discourage first time contributors with lots of concerns, therefore >>> I'd like a cleaner, more flexible os layer. >> >> I'd like a simpler, maybe less flexible but understandable by mere >> mortals OS layer :) lower layer that does not make upper layers too >> complicated. >> >>> But no-one forces you to accept multi-page-sized-areas. If you really >>> want just one page size, you can query the largest page size available >>> beforehand and align the reservation size accordingly, and with my >> >> Which is no issue with 64 bit machines at all, but probably has been >> with the prevalence of 32 bit address spaces. >> >>> proposed change you could now assert the result and log it correctly. >>> >>> But if one just generally wants large pages without caring for the >>> precise layout, one could let os::reserve_memory_special() do its best, >>> and would now get the information about what reserve_memory_special() did. >> >> This is a kind of one-sided argument, taking only commit into account. >> Since actually giving back memory is expected nowadays, taking care of >> different random page sizes is complicated. >> >> E.g. 
when implementing G1's region memory management (in 8u40) the >> decision to only support a single page size for every one of its GC data >> structures has been a conscious one - because the complexity overhead >> did not justify the gains. >> >> Nobody complained yet. >> >>> For example, were I to re-introduce large pages for Metaspace, I would >>> like to have the luxury of just calling os::reserve_memory_special() >>> without having to think about specific geometry - if the space is large >>> enough for large pages, it should stitch the area together as best as it >>> can. >> >> That's true, but that Metaspace layer then needs to be aware of multiple >> page sizes when uncommitting, and (presumably) tracking liveness on the >> lowest granularity anyway. Which does not make the code easier. >> >> Thanks, >> Thomas > > Hi Thomas, > > first off, the last thing I want is for matters to become more complicated. God no. > > Today the virtual memory layer is inconsistent and undocumented and has subtle bugs. Among other things this means that PRs can take forever to discuss, which I guess is frustrating for newcomers. Or if they are pushed they can cause a lot of cleanup work afterwards. > > Just attempting to write a consistent API description of the virtual memory layer, parameters expected behavior etc, is very difficult. I know since I tried. And that is a bad sign. Tests are missing too, since what you cannot describe you cannot test. For an interesting example, see JDK-8255978. That bug had been in there since at least JDK8. > > One problem I see reoccurring with these APIs is that meta information about a reservation are needed later, but are lost. Many examples: > - the exact flags with which a mapping was established (exec or not, MAP_JIT, etc) > - which API has been used (AIX) > - what the page sizes were at reservation time > - whether the area had been stitched together with multiple mappings or a single one > .. etc. 
> > The code then often tries to guess these information from "circumstantial evidence", which introduces lots of strange dependencies and makes the coding fragile. ReservedSpace::actual_page_size() in this patch is one example. > > A better way would be to have these information kept somewhere. Ideally, the OS would do this and I could ask it for properties of a memory mapping. In reality, the OS is not so forthcoming. So we should keep these information ourselves. > > One of these information is the page sizes of a region. In that light you can see my proposal. I did not propose to change a single atom about how os::reserve_memory_special() works. I just proposed for it to just be honest and return information about what it did. So that this information could be kept inside ReservedSpace and reused instead of guessed. And since the reality today is that os::reserve_memory_special() can reserve multiple page sizes, a correct implementation would have to reflect that or be wrong. > > You are right, reporting multiple page sizes is more difficult than just one. But even more complex is what we have today, where a mapping can have multiple page sizes but this is not acknowledged by upper layers. In this case, we have ReservedSpace::actual_page_size() which plain cannot be implemented correctly. > > I am fine with the idea of restricting RS to just one page size. But then lets do that and get rid of mixed TLBFS reservation (and whatever Windows does, they may be doing something similar). We also should specify the page size as an explicit parameter and not confuse this with the alignment. > >> That code may be an artifact of 32 bit machines from 10+ years ago where address space fragmentation has been a real concern (typically Windows). Or RAM sizes were measured in hundreds of MB instead of GBs where (on Linux) pre-reserving hundreds of MB for huge pages is/has been an issue. > > I am not sure what coding you refer to. 
>
>> and that Intel wants to further complicate it and add 2M pages, means we have a need here.
>
>> The JDK-8256155 (Intel) CR does not state that requirement. Imho it only says that the author wants to use (any and all) configured pages for different reserved spaces. E.g. the machine has (on x64, to simplify the case) configured: 10 1G pages 1000 2M pages so the heap should use the 1G pages (assuming it's less than 10G), other reservations like code heap should first use the 50 2M pages before falling back to other page sizes to better use available TLB cache entries. I would prefer if we do not overcomplicate the requirements :)
>
> Flexible stitching would have one advantage though, in that a huge page pool could be better used. In your example, if someone starts with a 12G heap, it would fail since no pool individually is large enough, but combining different page sizes would work.
>
> I won't insist on this. As I wrote, I am fine with one-pagesize-per-mapping if we can arrive there.
>
>> Also probably this should be asked and followed up in the correct review thread.
>
> Since it was relevant to this PR, I did mention it here.
>
>> I'd like a simpler, maybe less flexible but understandable by mere mortals OS layer :) lower layer that does not make upper layers too complicated.
>
> I do too.
>
>>>
>> This is a kind of one-sided argument, taking only commit into account. Since actually giving back memory is expected nowadays, taking care of different random page sizes is complicated. E.g. when implementing G1's region memory management (in 8u40) the decision to only support a single page size for every one of its GC data structures has been a conscious one - because the complexity overhead did not justify the gains. Nobody complained yet.
>
> I may be missing something here, but is huge paged space not pinned? How can this be uncommitted? Okay yes, if it can be uncommitted, multiple page sizes should be avoided.
>
>>> For example, were I to re-introduce large pages for Metaspace, I would like to have the luxury of just calling os::reserve_memory_special() without having to think about specific geometry - if the space is large enough for large pages, it should stitch the area together as best as it can.
>
>> That's true, but that Metaspace layer then needs to be aware of multiple page sizes when uncommitting, and (presumably) tracking liveness on the lowest granularity anyway. Which does not make the code easier.
>
> See above. I was under the assumption that uncommit goes out of the window the moment you choose explicit large pages.
>
>> Thanks, Thomas
>
> Thanks, Thomas

You are correct that explicit huge pages are pinned and of course there might be situations where it would be beneficial to do the stitching, but I think the maintenance cost will be significantly higher. It sounds like you would prefer a simpler model as well and it would certainly be interesting to see if there are any problems connected to getting rid of the mixed mappings we have today. I also agree that we should do better bookkeeping since it is so hard to get the information later on.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1161

From stefank at openjdk.java.net Thu Nov 26 09:30:11 2020
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Thu, 26 Nov 2020 09:30:11 GMT
Subject: RFR: 8256382: Use try_lock for hs_err EventLog printing [v2]
In-Reply-To:
References:
Message-ID:

> The EventLog locks are taken when the hs_err files are generated. Since crashes and asserts can occur when other locks are held, this can cause lock reordering problems if the held locks also are low-rank locks. There's no way to solve this if blocking locks are taken.
>
> I hit this problem when investigating making the GCLogPrecious lock use the lowest lock rank (same as EventLog). See JDK-8254877.
>
> Both GCLogPrecious and EventLog are considered "leaf" locks.
No other locks should be taken when those locks are taken. However, if we crash in either of these sub-systems, there will be a lock-reordering error message in the hs_err file, and the rest of the logged info is skipped in the currently logged section. > > The proposal is to use try_lock_without_range_check and only log information if the lock could be acquired without blocking. This relies on the new try_lock_without_range_check function from JDK-8255678. > > I've tested this by injecting crashes while not holding locks in both GCLogPrecious, while holding locks during EventLog logging, and when not holding the locks, and verified that we get the expected behavior. > > Example output while crashing during 'Internal exceptions' logging: > Classes redefined (0 events): > No events > > Internal exceptions (5 events): > No events printed - crash while holding lock > > Events (20 events): > Event: 1,437 loading class java/util/HashMap$KeyIterator > Event: 1,438 loading class java/util/HashMap$KeyIterator done > Event: 1,438 loading class java/lang/module/ModuleDescriptor$Exports Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into 8256382_eventlog_try_lock - 8256382: Use try_lock for hs_err EventLog printing ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1408/files - new: https://git.openjdk.java.net/jdk/pull/1408/files/26eeee59..c4f461f9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1408&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1408&range=00-01 Stats: 10231 lines in 264 files changed: 3425 ins; 1289 del; 5517 mod Patch: https://git.openjdk.java.net/jdk/pull/1408.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1408/head:pull/1408 PR: https://git.openjdk.java.net/jdk/pull/1408 From stefank at openjdk.java.net Thu Nov 26 09:32:56 2020 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Thu, 26 Nov 2020 09:32:56 GMT Subject: RFR: 8256382: Use try_lock for hs_err EventLog printing [v2] In-Reply-To: <4iQgQm7IEGnV4-iRBygxcwLqLfTUwK1P8AzhvmRquT4=.f6db138c-e972-483f-8628-528c5ed5b1a3@github.com> References: <69fkZdGvgYprbfVE4dKZfkhBgnOPiNEoyVy3FpsDGck=.0855b1bf-da14-4f18-9808-0f0fd1bf0cba@github.com> <4iQgQm7IEGnV4-iRBygxcwLqLfTUwK1P8AzhvmRquT4=.f6db138c-e972-483f-8628-528c5ed5b1a3@github.com> Message-ID: <_3ifcry3VL0HaImQy5Ld-iUm8q_ytUjAG8P7khzNwJM=.4d28e333-800d-4dd6-b91c-f326ffb62035@github.com> On Wed, 25 Nov 2020 07:49:42 GMT, Thomas Stuefe wrote: >> Don't we have to check not only VMError::is_error_reported, but that it is the current thread that is doing the reporting? >> >> I also think the Thread::current_or_null()==NULL case has to mean we are doing the error reporting very early in VM init - else how can we get in here in a "non attached" thread? Even then I'm not sure that is actually possible either - at what point in VM init have we installed our crash handler? > >> Don't we have to check not only VMError::is_error_reported, but that it is the current thread that is doing the reporting? 
>> > > You mean my proposal of just not locking altogether? > It's a calculated risk. The chance of someone printing out the event log concurrently with the hs-err reporter doing it is extremely low, and since we have secondary crash reporting the only risk we run is an interrupted error reporting step. > >> I also think the Thread::current_or_null()==NULL case has to mean we are doing the error reporting very early in VM init - else how can we get in here in a "non attached" thread? > > Thread::current_or_null()==NULL if we crash in a non attached thread or at/before VM init. The former case is typical if the VM is embedded into another launcher and foreign code crashes (not uncommon). > > Arguably, in both cases the event log is not very interesting, but I'd still attempt to print it. > >> Even then I'm not sure that is actually possible either - at what point in VM init have we installed our crash handler? @tstuefe @dholmes-ora What's the path forward with this PR? Have you reconciled and come to a conclusion about what you want me to change? ------------- PR: https://git.openjdk.java.net/jdk/pull/1408 From stuefe at openjdk.java.net Thu Nov 26 11:16:00 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 26 Nov 2020 11:16:00 GMT Subject: RFR: 8256382: Use try_lock for hs_err EventLog printing [v2] In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 09:30:11 GMT, Stefan Karlsson wrote: >> The EventLog locks are taken when the hs_err files are generated. Since crashes and asserts can occur when other locks are held, this can cause lock reordering problems if the held locks also are low-rank locks. There's no way to solve this if blocking locks are taken. >> >> I hit this problem when investigating making the GCLogPrecious lock use the lowest lock rank (same as EventLog). See JDK-8254877. >> >> Both GCLogPrecious and EventLog are considered "leaf" locks. No other locks should be taken when those locks are taken.
However, if we crash in either of these sub-systems, there will be a lock-reordering error message in the hs_err file, and the rest of the logged info is skipped in the currently logged section. >> >> The proposal is to use try_lock_without_range_check and only log information if the lock could be acquired without blocking. This relies on the new try_lock_without_range_check function from JDK-8255678. >> >> I've tested this by injecting crashes while not holding locks in both GCLogPrecious, while holding locks during EventLog logging, and when not holding the locks, and verified that we get the expected behavior. >> >> Example output while crashing during 'Internal exceptions' logging: >> Classes redefined (0 events): >> No events >> >> Internal exceptions (5 events): >> No events printed - crash while holding lock >> >> Events (20 events): >> Event: 1,437 loading class java/util/HashMap$KeyIterator >> Event: 1,438 loading class java/util/HashMap$KeyIterator done >> Event: 1,438 loading class java/lang/module/ModuleDescriptor$Exports > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8256382_eventlog_try_lock > - 8256382: Use try_lock for hs_err EventLog printing LGTM ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1408 From vladimir.x.ivanov at oracle.com Thu Nov 26 11:26:14 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 26 Nov 2020 14:26:14 +0300 Subject: RFR: 8252505: C1/C2 compiler support for blackholes In-Reply-To: References: Message-ID: <5af01d43-4889-210e-e4d0-9c8e4b52cb69@oracle.com> Thanks for the clarifications, Aleksey. >> Can you elaborate on your experiment with introducing custom node you mentioned? 
>> Have you tried introducing new control node and just wire data nodes to it? > > See the updated PR description. Yes, I tried to introduce a new node and just wire the data nodes in it, but then I failed (miserably) to make sure the node is not considered dead by subsequent optimizations. Roland looked at it too, and did not think we can manage it. So we decided instead to piggyback on calls. New version hopefully makes it much cleaner: it is now `CallBlackholeJava` node. We can try and unhook it from `CallJava` hierarchy, and try to manage its effects more explicitly, but my prior experience tells me it is not as simple as it looks at the beginning. Some thoughts/suggestions on call-based approach: * you subclass CallJavaNode, but then disable most of the features it brings. * what you try to achieve is closer to CallLeaf: it doesn't have safepoint info attached. * but SafePoint is also close (except JVM state): it doesn't have any arguments, but still keeps values alive through debug inputs; you could try to turn arguments into debug info even for an ordinary call. * you can try to avoid AD changes and construct a MachNode directly; >> Even if there are no instructions issued, some of the unfortunate effects of a call may be still there (e.g., spills around the place where the node is scheduled, memory state is effectively killed). Fixing that would involve overriding calling conventions, in_RegMask()/out_RegMask(), customize memory effects. > > Right, that is what `Ideal` and `RegMask` handling in new `(Mach)CallBlackholeJava` node does. On the upside, it IMO makes Blackhole semantics close to what I want in JMH: it is like a call, but without the actual call. So obvious code generation quirks handled already, I think other effects are good to have. Ok, makes sense. 
Best regards, Vladimir Ivanov From vlivanov at openjdk.java.net Thu Nov 26 11:33:58 2020 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 26 Nov 2020 11:33:58 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo [v2] In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 23:35:14 GMT, Vladimir Kozlov wrote: >> JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2. >> >> Initial patch was prepared by @fisk. >> >> Tested hs-tier1-4. Added new compiler tests to test intrinsics. >> >> Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough. > > Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8256999 > - Added ZLoadBarrierElided = 0 definition. > Removed is_exact argument in load_field_from_object(). > Added Shenandoah support for narrow phantom accesses. > - 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo Marked as reviewed by vlivanov (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From github.com+42899633+eastig at openjdk.java.net Thu Nov 26 13:34:14 2020 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 26 Nov 2020 13:34:14 GMT Subject: RFR: 8255351: Add detection for Graviton 2 CPUs [v2] In-Reply-To: References: Message-ID: > This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. 
> > The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. Evgeny Astigeevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8255351: Add detection for Graviton 2 CPUs This commit adds detection for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for it. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1315/files - new: https://git.openjdk.java.net/jdk/pull/1315/files/204f1f68..60e28f57 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1315&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1315&range=00-01 Stats: 7 lines in 1 file changed: 0 ins; 7 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1315.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1315/head:pull/1315 PR: https://git.openjdk.java.net/jdk/pull/1315 From github.com+42899633+eastig at openjdk.java.net Thu Nov 26 13:42:55 2020 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 26 Nov 2020 13:42:55 GMT Subject: RFR: 8255351: Add detection for Graviton 2 CPUs [v2] In-Reply-To: <-Oiu2Cimcy4ZQpPIZIhVjsmKlDaSKC15fLdq-X3I7LU=.60531c0c-b1e9-4e22-9bae-d25d85a1144c@github.com> References: <-Oiu2Cimcy4ZQpPIZIhVjsmKlDaSKC15fLdq-X3I7LU=.60531c0c-b1e9-4e22-9bae-d25d85a1144c@github.com> Message-ID: On Mon, 23 Nov 2020 23:37:44 GMT, Vladimir Kozlov wrote: >> Evgeny Astigeevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> 8255351: Add detection for Graviton 2 CPUs >> >> This commit adds detection for Graviton 2 (as Neoverse N1) >> and enables UseSIMDForMemoryOps for it. 
> > Marked as reviewed by kvn (Reviewer). I've got additional performance data. It shows that Graviton 1 behaves differently from Graviton 2 regarding SIMD for memory ops, in particular in the array lengths at which it is worth using SIMD instructions. We will address the issues in a separate PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From rkennke at openjdk.java.net Thu Nov 26 13:53:59 2020 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 26 Nov 2020 13:53:59 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo [v2] In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 23:35:14 GMT, Vladimir Kozlov wrote: >> JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2. >> >> Initial patch was prepared by @fisk. >> >> Tested hs-tier1-4. Added new compiler tests to test intrinsics. >> >> Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough. > > Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8256999 > - Added ZLoadBarrierElided = 0 definition. > Removed is_exact argument in load_field_from_object(). > Added Shenandoah support for narrow phantom accesses. > - 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo Shenandoah parts look good to me! Thanks! ------------- Marked as reviewed by rkennke (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1425 From github.com+42899633+eastig at openjdk.java.net Thu Nov 26 19:58:59 2020 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 26 Nov 2020 19:58:59 GMT Subject: Integrated: 8255351: Add detection for Graviton 2 CPUs In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 12:39:47 GMT, Evgeny Astigeevich wrote: > This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. > > The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. This pull request has now been integrated. Changeset: 2215e5a4 Author: Evgeny Astigeevich Committer: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/2215e5a4 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod 8255351: Add detection for Graviton 2 CPUs Reviewed-by: simonis, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From kvn at openjdk.java.net Thu Nov 26 19:58:58 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 26 Nov 2020 19:58:58 GMT Subject: RFR: 8255351: Add detection for Graviton 2 CPUs [v2] In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 13:34:14 GMT, Evgeny Astigeevich wrote: >> This commit adds detection for Graviton 1 (as Cortex-A72) and for Graviton 2 (as Neoverse N1) and enables UseSIMDForMemoryOps for them. >> >> The patch passed jtreg tier1-2 and all gtest tests with linux-aarch64-server-release build. > > Evgeny Astigeevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Marked as reviewed by kvn (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1315 From shade at redhat.com Thu Nov 26 20:13:59 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 26 Nov 2020 21:13:59 +0100 Subject: RFR: 8252505: C1/C2 compiler support for blackholes In-Reply-To: <5af01d43-4889-210e-e4d0-9c8e4b52cb69@oracle.com> References: <5af01d43-4889-210e-e4d0-9c8e4b52cb69@oracle.com> Message-ID: <4c361db2-7f66-8f55-3771-e48b19ca575e@redhat.com> Hi Vladimir, Apparently Skara bots were not able to link this reply to the PR... On 11/26/20 12:26 PM, Vladimir Ivanov wrote: > * you subclass CallJavaNode, but then disable most of the features it > brings. Yeah. On the upside, that effectively allow me to push the CallBlackholeJava as if it is CallJavaNode, and create this special node by trivially hijacking two CallGenerators. But maybe that is not as clean, and we are better off lifting CallBlackhole to Call, and do its own BlackholeCallGenerator. For example, like this: https://cr.openjdk.java.net/~shade/8252505/remodel-callblackhole.patch Since we are now doing the whole setup for it ourselves, I think we can avoid safepoint and debug info, and therefore ditch Ideal for it. RegMask would still avoid register shuffles of "real" arguments. It still has some x86_32 failures which I would attend to tomorrow, so I cannot push it to the branch yet. > * what you try to achieve is closer to CallLeaf: it doesn't have > safepoint info attached. Yeah, but CallLeaf is CallRuntime, which is weird. > * but SafePoint is also close (except JVM state): it doesn't have any > arguments, but still keeps values alive through debug inputs; you could > try to turn arguments into debug info even for an ordinary call. I am probably stupid, but I cannot get this part to work yet. > * you can try to avoid AD changes and construct a MachNode directly; You mean hack Matcher to emit MachCallBlackholeNode for every CallBlackholeNode without involving .ad? 
I guess I can try, but simple .ad match rule seems a tad more clear and maintainable to me: it does not do special hacks in Matcher. -- Thanks, -Aleksey From dnsimon at openjdk.java.net Thu Nov 26 20:59:14 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 26 Nov 2020 20:59:14 GMT Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports [v3] In-Reply-To: References: Message-ID: > A number of jtreg tests require a specific GC. These tests should be ignored when EnableJVMCI is true and the JVMCI compiler does not support the required GC. > > This PR adds `JVMCICompiler.isGCSupported` and makes use of it in `WhiteBox.isGCSupported`. > > Prior to this PR, a test requiring a GC not yet supported by a JVMCI compiler fail as follows: > Error occurred during initialization of VM > JVMCI Compiler does not support selected GC: epsilon gc Doug Simon has updated the pull request incrementally with two additional commits since the last revision: - implemented isGCSupported in Graal compiler - removed JVMCI logic from GCConfig and introduced WhiteBox.isGCSupportedByJVMCI instead ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1423/files - new: https://git.openjdk.java.net/jdk/pull/1423/files/bc7ee6c2..384287b7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1423&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1423&range=01-02 Stats: 110 lines in 7 files changed: 70 ins; 28 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/1423.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1423/head:pull/1423 PR: https://git.openjdk.java.net/jdk/pull/1423 From dnsimon at openjdk.java.net Thu Nov 26 21:09:00 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 26 Nov 2020 21:09:00 GMT Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports [v3] In-Reply-To: References: 
<7k6ZfRR2Dc81jDPTmOtf7POaZ0jcEwCyvNBrViX6DXE=.ec7fca44-20f9-48a5-8252-484eb4affdbf@github.com> Message-ID: On Wed, 25 Nov 2020 23:13:21 GMT, Vladimir Kozlov wrote: >>> Actually here is fundamental question. Why not build GraalVM JDK without GCs which Graal does not support? It is all configurable. >> >> GraalVM could indeed do that but there are other OpenJDK community members who want to be able to use Graal on stock JDK binaries. Even for GraalVM, we want to deviate as little as possible from how the JDK underlying GraalVM is built. > > Yes, it is reasonable case. We already had such situation with CMS before. You and they understand which GC can be used with Graal. I agree with removal of `JVMCIGlobals::check_jvmci_supported_gc()` and not do any C++ checks during VM startup. > > Now about testing and changes. I agree with @stefank that you should not modify `GCConfig::is_gc_supported()` because it is about Hotspot VM support (`Interpreter` only, for example). I suggest to modify `isGcSupportedByGraal()` in VMProps.java by adding new `WB_IsGCSupportedByGraal()` WB api to call JVMCI runtime. Then `vmGC()` in VMProps.java will work as it is. And you can change `Graal` with `JVMCICompiler` in methods names if you want. Ok, I get the point about `GCConfig::is_gc_supported()` now. It's answering the question "is the given GC built into the VM" as opposed to "can the given GC work in the current VM runtime configuration". Based on your suggestions, I've removed the JVMCI logic from `GCConfig`, added `WB_isGCSupportedByJVMCI` (it's better to keep things JVMCI compiler neutral at this layer) and modified `VMProps` to use it (https://github.com/openjdk/jdk/pull/1423/commits/379f48f794f2f59e1b4ffd837025f9ea999d06ff). 
I've also implemented the Graal side of this interface so that it can be demonstrated to work in the JDK (https://github.com/openjdk/jdk/pull/1423/commits/384287b748acfbb3ebea7d9b5eb1cbbe106d912d): > sh configure --disable-precompiled-headers --enable-jvm-feature-graal > make images # With Graal > jtreg -v -noreport -jdk:build/macosx-x86_64-server-release/images/jdk -vmoptions:"-XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler" test/hotspot/jtreg/gc/epsilon/TestAlignment.java Test results: no tests selected Results written to /Users/dnsimon/jdk-jdk/open/JTwork # Without Graal > jtreg -v -noreport -jdk:build/macosx-x86_64-server-release/images/jdk test/hotspot/jtreg/gc/epsilon/TestAlignment.java runner starting test: gc/epsilon/TestAlignment.java runner finished test: gc/epsilon/TestAlignment.java Passed. Execution successful Test results: passed: 1 Results written to /Users/dnsimon/jdk-jdk/open/JTwork ------------- PR: https://git.openjdk.java.net/jdk/pull/1423 From david.holmes at oracle.com Thu Nov 26 22:54:14 2020 From: david.holmes at oracle.com (David Holmes) Date: Fri, 27 Nov 2020 08:54:14 +1000 Subject: How to avoid git push --force to a pull request(PR)? In-Reply-To: <1606337460402.57835@amazon.com> References: <1606269567937.41228@amazon.com> <1606337460402.57835@amazon.com> Message-ID: On 26/11/2020 6:51 am, Liu, Xin wrote: > cc hotspot-dev. > I found that skara-dev is mainly for skara developers. my question is for general hotspot developers. > > thanks, > --lx > > > ________________________________________ > From: Liu, Xin > Sent: Tuesday, November 24, 2020 5:59 PM > To: skara-dev at openjdk.java.net > Cc: Tobias Hartmann > Subject: How to avoid git push --force to a pull request(PR)? > > Hi, Skara developers, > > > Tobias suggested not to use force push here > https://github.com/openjdk/jdk/pull/1073#issuecomment-726549523 Right. There is generally no need for "force push" on an active PR because you can just merge and do a normal push. 
The skara tooling will flatten the set of commits into one single clean commit when you integrate, so none of the merges are evident. > Sometimes, I use git push --force to a private branch, which maps to an ongoing PR. > What I do is to update my branch to TIP, rebase my changes onto it and then "git push --force" to my branch remotely. Skara marks the PR as "force pushed", e.g. https://github.com/openjdk/jdk/pull/1179 > > > Yes, I admit that it would ruin the "incremental webrev". I do it for the following two reasons. > 1) the reviewing process lasts too long. I have to update the base of my private branch, or it isn't mergeable. Not sure what you mean by the "base" of your "private branch". You should have a personal fork on GitHub. You clone your personal fork locally and create as many branches as you like, merging as you desire, and push them to your PF when you need to create a PR. You can update your local master branch from upstream any time you like without affecting your working branches. If you need to update a working branch then just merge with master and push to your PF. Cheers, David ----- > Other developers may have changed the common code when you are working on your PRs, right? /integrate will fail because of conflicts. > > 2) I have to update the base because of testing. > OpenJDK now contains the sanity check workflow. https://github.com/openjdk/jdk/blob/master/.github/workflows/submit.yml > I'd like to pass them all before integrating. Sometimes, I run into failures but my PR is not the culprit. The build breakage and regression are usually rapidly fixed in the master branch. > > I understand I can always ditch the old PR and start over, but all comments in the old PR will be lost that way. On the side, I also feel guilty about using force push frequently. > May I know if Skara has another option to help me out?
> > > I read this blog (https://julien.danjou.info/rant-about-github-pull-request-workflow-implementation/), it declares the dilemma comes from github PR mechanism. > but that blog was 7-year-old, I am not sure that if github has sorted it out or not. Even github hasn't, is that possible to be solved by Skara? > > thanks, > --lx > > > > > From dholmes at openjdk.java.net Thu Nov 26 23:15:57 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 26 Nov 2020 23:15:57 GMT Subject: RFR: 8256382: Use try_lock for hs_err EventLog printing [v2] In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 11:12:57 GMT, Thomas Stuefe wrote: >> Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into 8256382_eventlog_try_lock >> - 8256382: Use try_lock for hs_err EventLog printing > > LGTM > > > @tstuefe @dholmes-ora What's the path forward with this PR? Have your reconciled and come to a conclusion about what you want me to change? I have no particular "wants" in this area I was just trying to evaluate the two proposals. Thomas seems fine with your original proposal now. I thought you were suggesting above you should be using `fatal_error_in_progress()` instead, so I'm not sure if this is your final proposal. If it is then I'm fine with it. There's no perfect solution when it comes to error reporting. Thanks. 
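For readers following the thread, the try-lock idea behind PR 1408 can be sketched outside HotSpot roughly as below. This is only an illustration under stated assumptions: `TryLockSketch` and `EventLogSketch` are hypothetical names, and a plain atomic flag stands in for HotSpot's own mutex type and the try-lock function named in the PR description. The point it shows is that the error reporter never blocks: if the log's lock is already held (for instance because the crash happened while logging), it prints a placeholder and moves on.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical non-blocking lock; NOT HotSpot's Mutex. An atomic flag is
// used so that a try_lock by the thread already holding it is well defined.
class TryLockSketch {
  std::atomic<bool> held{false};
 public:
  // Returns true if the lock was free and is now owned by the caller.
  bool try_lock() { return !held.exchange(true, std::memory_order_acquire); }
  void unlock() { held.store(false, std::memory_order_release); }
};

// Hypothetical stand-in for an event log protected by a leaf lock.
struct EventLogSketch {
  TryLockSketch lock;
  std::vector<std::string> events;

  // Appends the log contents to `out` without ever blocking.
  // Returns false and emits a placeholder if the lock could not be taken.
  bool print_on_error(std::string& out) {
    if (!lock.try_lock()) {
      out += "No events printed - crash while holding lock\n";
      return false;
    }
    if (events.empty()) {
      out += "No events\n";
    }
    for (const std::string& e : events) {
      out += "Event: " + e + "\n";
    }
    lock.unlock();
    return true;
  }
};
```

The trade-off matches what is said in the thread: a section of the report may be skipped, but the reporter cannot deadlock or trip a lock-ordering check on its way through.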
------------- PR: https://git.openjdk.java.net/jdk/pull/1408 From dholmes at openjdk.java.net Fri Nov 27 01:02:03 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 27 Nov 2020 01:02:03 GMT Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random [v2] In-Reply-To: <_SKxbxqodbI4aWGdFGVdks6_VqeADnMgjMfi_oEK-Bc=.2cd493f4-1467-48dc-a1f2-2bb674649f2b@github.com> References: <_SKxbxqodbI4aWGdFGVdks6_VqeADnMgjMfi_oEK-Bc=.2cd493f4-1467-48dc-a1f2-2bb674649f2b@github.com> Message-ID: On Wed, 25 Nov 2020 17:23:10 GMT, Coleen Phillimore wrote: >> The functions os::init_random() and os::random() both set the _rand_seed. This test thinks nothing can change the seed while it is computing its expected value. >> I changed the test to run in a VM operation safepoint. Alternately, I can change the test to not verify the random value computed in the loop, or remove the test. >> Ran tier1 tests on linux-x64 and windows-x64. > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - Restore old copyright > - Refix os.random test using Thomas Stuefe's better suggestion. Marked as reviewed by dholmes (Reviewer). test/hotspot/gtest/runtime/test_os.cpp line 125: > 123: int num; > 124: for (int k = 0; k < reps; k++) { > 125: num = seed = os::next_random(seed); I suggest adding a comment before this line: // Use next_random so the calculation is stateless. Or something to that effect.
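To make the "stateless" point concrete, here is a minimal standalone sketch. It assumes the generator is the classic Park-Miller "minimal standard" recurrence, next = (16807 * seed) mod (2^31 - 1); `next_random_sketch` and `nth_random_sketch` are hypothetical names, not the HotSpot functions. Because the seed is passed in and returned rather than kept in a shared global, no other thread can disturb the sequence while the loop runs.

```cpp
#include <cassert>
#include <cstdint>

// One step of the Park-Miller "minimal standard" generator (assumed here).
// Pure function: the result depends only on the argument.
inline uint32_t next_random_sketch(uint32_t seed) {
  const uint64_t a = 16807;
  const uint64_t m = 2147483647;  // 2^31 - 1
  return static_cast<uint32_t>((a * seed) % m);
}

// Advances the generator `reps` times, threading the seed through locally.
inline uint32_t nth_random_sketch(uint32_t seed, int reps) {
  for (int k = 0; k < reps; k++) {
    seed = next_random_sketch(seed);
  }
  return seed;
}
```

Starting from seed 1, the 10000th value of this recurrence is the well-known check value 1043618065, the same figure the C++ standard specifies for `std::minstd_rand0`, so a test written this way can assert on it regardless of what else touches a global seed.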
------------- PR: https://git.openjdk.java.net/jdk/pull/1422 From dholmes at openjdk.java.net Fri Nov 27 02:29:57 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 27 Nov 2020 02:29:57 GMT Subject: RFR: 8256830: misc tests failed with "assert(env->is_enabled(JVMTI_EVENT_OBJECT_FREE)) failed: checking" In-Reply-To: References: Message-ID: <4GRPSMKxwFONrO-YzHXh7z8WxPHJM2tP0Y5X5uWnV10=.03265f2f-a4f7-4ef0-99e3-37701aa6b7fa@github.com> On Wed, 25 Nov 2020 18:40:49 GMT, Coleen Phillimore wrote: > The ServiceThread cleaning used a stale ObjectFree state when calling remove_dead_entries, because another thread had concurrently set is_enabled to false. Add a lock around setting/resetting the lock event state and retest the state under a lock. Ran the test 100s of time without failure, where otherwise it fails very quickly. > Tested with tier2,3 and running tiers 4,5,6 in progress. > Thanks to Kim for his previous feedback. src/hotspot/share/prims/jvmtiTagMap.cpp line 1162: > 1160: if (_needs_cleaning) { > 1161: // Recheck whether to post object free events under the lock. > 1162: post_object_free = post_object_free && env()->is_enabled(JVMTI_EVENT_OBJECT_FREE); Where is `is_enabled` called without the lock being held in a caller of `remove_dead_entries()`? ------------- PR: https://git.openjdk.java.net/jdk/pull/1439 From stuefe at openjdk.java.net Fri Nov 27 06:23:01 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 27 Nov 2020 06:23:01 GMT Subject: RFR: 8256382: Use try_lock for hs_err EventLog printing [v2] In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 23:12:51 GMT, David Holmes wrote: >> LGTM > >> >> >> @tstuefe @dholmes-ora What's the path forward with this PR? Have your reconciled and come to a conclusion about what you want me to change? > > I have no particular "wants" in this area I was just trying to evaluate the two proposals. Thomas seems fine with your original proposal now. 
I thought you were suggesting above you should be using `fatal_error_in_progress()` instead, so I'm not sure if this is your final proposal. If it is then I'm fine with it. There's no perfect solution when it comes to error reporting. > Thanks. Yes, I am fine with this. I would have preferred a simpler solution as I originally proposed, but do not want to block this patch. Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/1408 From stuefe at openjdk.java.net Fri Nov 27 06:25:58 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 27 Nov 2020 06:25:58 GMT Subject: RFR: JDK-8256864: [windows] Improve tracing for mapping errors In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 15:26:45 GMT, Thomas Stuefe wrote: > To analyze JDK-8256729 further, we need more tracing: > > 1) We should print all mappings inside the split area if os::split_reserved_memory() fails > > 2) The print-mapping code on windows has some shortcomings: > - should not probe for mappings outside of what we know are valid address ranges for Windows > - should handle wrap-arounds - it should be possible to print the whole address space > - Protection information is printed wrong (MEMORY_BASIC_INFORMATION.Protect would be the correct member) > - should be printed in a more compact manner - base address should be on the same line as the first region > - maybe adorned with some basic range info, e.g. library mappings Gentle ping.
This is needed to get further information on the CDS/Metaspace initialization errors (https://bugs.openjdk.java.net/browse/JDK-8256729) ------------- PR: https://git.openjdk.java.net/jdk/pull/1390 From stuefe at openjdk.java.net Fri Nov 27 06:29:12 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 27 Nov 2020 06:29:12 GMT Subject: RFR: JDK-8256864: [windows] Improve tracing for mapping errors [v2] In-Reply-To: References: Message-ID: > To analyze JDK-8256729 further, we need more tracing: > > 1) We should print all mappings inside the split area if os::split_reserved_memory() fails > > 2) The print-mapping code on windows has some shortcomings: > - should not probe for mappings outside of what we know are valid address ranges for Windows > - should handle wrap-arounds - it should be possible to print the whole address space > - Protection information is printed wrong (MEMORY_BASIC_INFORMATION.Protect would be the correct member) > - should be printed in a more compact manner - base address should be on the same line as the first region > - maybe adorned with some basic range info, e.g. library mappings Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge - Initial patch ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1390/files - new: https://git.openjdk.java.net/jdk/pull/1390/files/7bba557d..71d8ef99 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1390&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1390&range=00-01 Stats: 13570 lines in 396 files changed: 5223 ins; 2162 del; 6185 mod Patch: https://git.openjdk.java.net/jdk/pull/1390.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1390/head:pull/1390 PR: https://git.openjdk.java.net/jdk/pull/1390 From thomas.stuefe at gmail.com Fri Nov 27 06:30:49 2020 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Fri, 27 Nov 2020 07:30:49 +0100 Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random In-Reply-To: <6A9B8550-8691-4566-986D-67E8973636D5@oracle.com> References: <6A9B8550-8691-4566-986D-67E8973636D5@oracle.com> Message-ID: On Thu, Nov 26, 2020 at 12:51 AM Kim Barrett wrote: > > On Nov 25, 2020, at 12:25 AM, Thomas Stuefe > wrote: > > Why is someone concurrently changing the seed? I thought "TEST" tests do > not start the VM? Or is it that some earlier test already did? > > I was surprised by this too. > > "TEST" tests and "TEST_VM" tests are collected in a single list and > run sequentially. The first "TEST_VM" test that gets run by that will > trigger VM initialization, and all the remaining tests (whether "TEST" > or "TEST_VM") get run in the resulting context. > > I don't much like that behavior. > > I don't either. One of the problems I keep encountering is that I use TEST only to later see that I accidentally used VM infrastructure; but you only notice when executing the test in a different order. googletests has all these nice options, eg --gtest_shuffle. Maybe we should randomize the execution order of these tests to shake loose these errors. 
Cheers, Thome From iignatyev at openjdk.java.net Fri Nov 27 06:58:58 2020 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 27 Nov 2020 06:58:58 GMT Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random [v2] In-Reply-To: References: <_SKxbxqodbI4aWGdFGVdks6_VqeADnMgjMfi_oEK-Bc=.2cd493f4-1467-48dc-a1f2-2bb674649f2b@github.com> Message-ID: On Fri, 27 Nov 2020 00:59:35 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: >> >> - Restore old copyright >> - Refix os.random test using Thomas Stuefe's better suggestion. > > Marked as reviewed by dholmes (Reviewer). > _Mailing list message from [Thomas Stüfe](mailto:thomas.stuefe at gmail.com) on [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_ > > On Thu, Nov 26, 2020 at 12:51 AM Kim Barrett wrote: > > > > On Nov 25, 2020, at 12:25 AM, Thomas Stuefe > > > wrote: > > > Why is someone concurrently changing the seed? I thought "TEST" tests do > > > not start the VM? Or is it that some earlier test already did? > > > > > > I was surprised by this too. > > "TEST" tests and "TEST_VM" tests are collected in a single list and > > run sequentially. The first "TEST_VM" test that gets run by that will > > trigger VM initialization, and all the remaining tests (whether "TEST" > > or "TEST_VM") get run in the resulting context. > > I don't much like that behavior. > > I don't either. One of the problems I keep encountering is that I use TEST > only to later see that I accidentally used VM infrastructure; but you only > notice when executing the test in a different order. > > googletests has all these nice options, eg --gtest_shuffle. Maybe we should > randomize the execution order of these tests to shake loose these errors. > > Cheers, Thome There is an easier way to find all *current* errors: you can just run each test individually, so every `TEST` that should have been `TEST_VM` will fail.
There is a script which I used to use to do that, and I actually thought we had integrated it alongside the previous rounds of such cleanup. As a historical note, one of the first iterations of JEP [281](https://bugs.openjdk.java.net/browse/JDK-8047975) ran `TEST` before all other tests, but at some point we decided that the complexity this brought just wasn't worth it. It may well be the case that this decision wasn't right (or isn't right anymore), and we should be able to change the behavior to ensure that `TEST` tests are run w/o the VM being initialized; the only problem I can foresee is w/ `--gtest_shuffle`. Cheers, -- Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/1422 From thomas.stuefe at gmail.com Fri Nov 27 07:09:54 2020 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Fri, 27 Nov 2020 08:09:54 +0100 Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random [v2] In-Reply-To: References: <_SKxbxqodbI4aWGdFGVdks6_VqeADnMgjMfi_oEK-Bc=.2cd493f4-1467-48dc-a1f2-2bb674649f2b@github.com> Message-ID: On Fri, Nov 27, 2020 at 7:59 AM Igor Ignatyev wrote: > On Fri, 27 Nov 2020 00:59:35 GMT, David Holmes > wrote: > > >> Coleen Phillimore has updated the pull request incrementally with two > additional commits since the last revision: > >> > >> - Restore old copyright > >> - Refix os.random test using Thomas Stuefe's better suggestion. > > > > Marked as reviewed by dholmes (Reviewer). > > > _Mailing list message from [Thomas Stüfe](mailto: > thomas.stuefe at gmail.com) on [hotspot-dev](mailto: > hotspot-dev at openjdk.java.net):_ > > > > On Thu, Nov 26, 2020 at 12:51 AM Kim Barrett > wrote: > > > > > > On Nov 25, 2020, at 12:25 AM, Thomas Stuefe > > > > wrote: > > > > Why is someone concurrently changing the seed? I thought "TEST" > tests do > > > > not start the VM? Or is it that some earlier test already did? > > > > > > > > > I was surprised by this too.
> > > "TEST" tests and "TEST_VM" tests are collected in a single list and > > > run sequentially. The first "TEST_VM" test that gets run by that will > > > trigger VM initialization, and all the remaining tests (whether "TEST" > > > or "TEST_VM") get run in the resulting context. > > > I don't much like that behavior. > > > > I don't either. One of the problems I keep encountering is that I use > TEST > > only to later see that I accidentally used VM infrastructure; but you > only > > notice when executing the test in a different order. > > > > googletests has all these nice options, eg --gtest_shuffle. Maybe we > should > > randomize the execution order of these tests to shake loose these errors. > > > > Cheers, Thome > > there is an easier way to find all *current* errors, you can just run each > test individually, so all `TEST` which should have been `TEST_VM` will > fail. there is a script which I used to use to do that, and I actually > thought we have integrated it alongside the prev. rounds of such clean. > > as a historical note, one of the first iterations of JEP [281]( > https://bugs.openjdk.java.net/browse/JDK-8047975) ran `TEST` before all > other tests, but at some point, we decided that the complexity this brought > just didn't worth it. it can easily be a case that this decision wasn't > right (or isn't right anymore), and we should be able to change the > behavior to ensure that `TEST` tests are run w/o VM being initialized, the > only problem I can foresee is w/ `--gtest_shuffle`. > > Cheers, > -- Igor > > I was aiming for a modification of the current tier1 tests, without increasing the run time. I don't think running each test as an individual process is worth it. But I can now see problems with shuffle too, esp. non-reproducibility. Errors would appear intermittent but aren't. Executing all TEST tests first, followed by TEST_VM, seems to be the simplest and cleanest solution. 
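As a rough illustration of the ordering proposed above (all plain TEST tests first, then the TEST_VM tests), here is a minimal C++ sketch. It assumes a hypothetical TestCase record with a needs_vm flag; neither that type nor order_tests() exists in googletest or in the HotSpot gtest launcher, they are made up for this example. std::stable_partition keeps the relative order inside each group, so unlike a shuffled run the resulting order is the same on every execution.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical description of a registered test; not a googletest type.
struct TestCase {
  std::string name;
  bool needs_vm;  // true for TEST_VM-style tests, false for plain TEST
};

// Return the tests reordered so that every test that must run without
// an initialized VM comes before the first test that needs the VM.
// stable_partition preserves the original relative order in each group,
// which keeps the run deterministic and failures reproducible.
std::vector<TestCase> order_tests(std::vector<TestCase> tests) {
  std::stable_partition(tests.begin(), tests.end(),
                        [](const TestCase& t) { return !t.needs_vm; });
  return tests;
}
```

With such an ordering, a plain TEST that accidentally touches VM infrastructure would always run before any VM has been initialized, so it should fail consistently rather than only under a particular execution order.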
Cheers, Thomas > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/1422 > From xxinliu at amazon.com Fri Nov 27 07:37:58 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Fri, 27 Nov 2020 07:37:58 +0000 Subject: How to avoid git push --force to a pull request(PR)? In-Reply-To: References: <1606269567937.41228@amazon.com> <1606337460402.57835@amazon.com>, Message-ID: <1606462678144.28075@amazon.com> Hi, Stefan and David, Thank you for the guidance. After reading your message, I realize that I had mentally rejected the most straightforward approach -- merge. After some tests, I am happy to see that Skara can recognize the merge commit and generate a correct webrev for me. Amazingly, GitHub provides a web UI to help me resolve conflicts. I can avoid force push in most cases. Awesome! Thanks to Stefan for teaching me a new approach to solve that tricky situation. You mix both the rebase model and the merge model; it's a great takeaway. thanks, --lx ________________________________________ From: David Holmes Sent: Thursday, November 26, 2020 2:54 PM To: Liu, Xin; skara-dev at openjdk.java.net Cc: hotspot-dev at openjdk.java.net Subject: RE: [EXTERNAL] How to avoid git push --force to a pull request(PR)? On 26/11/2020 6:51 am, Liu, Xin wrote: > cc hotspot-dev. > I found that skara-dev is mainly for skara developers. my question is for general hotspot developers. > > thanks, > --lx > > > ________________________________________ > From: Liu, Xin > Sent: Tuesday, November 24, 2020 5:59 PM > To: skara-dev at openjdk.java.net > Cc: Tobias Hartmann > Subject: How to avoid git push --force to a pull request(PR)? > > Hi, Skara developers, > > > Tobias suggested not to use force push here > https://github.com/openjdk/jdk/pull/1073#issuecomment-726549523 Right.
There is generally no need for "force push" on an active PR because you can just merge and do a normal push. The skara tooling will flatten the set of commits into one single clean commit when you integrate, so none of the merges are evident. > Sometimes, I use git push --force to a private branch, which maps to an ongoing PR. > What I do is to update my branch to TIP, rebase my changes to it and then "git push --force" to my branch remotely. Skara remarks the PR ?force pushed ? eg. https://github.com/openjdk/jdk/pull/1179 > > > Yes, I admit that it would ruin the "incremental webrev". I do it for the following two reasons. > 1) the reviewing process lasts too long. I have to update the base of my private branch, or it isn't mergeable. Not sure what you mean by the "base" of your "private branch. You should have a personal fork on Github. You clone your personal fork locally and create as many branches as you like, merging as you desire, and push them to your PF when you need to create a PR. You can update your local master to branch with upstream any time you like without affecting your working branches. If you need to update a working branch then just merge with master and push to your PF. Cheers, David ----- > Other developers may have changed the common code when you are working on your PRs, right? /integrate will fail because of conflicts. > > 2) I have to update the base because of testing. > Openjdk now contains the sanity check workflow. https://github.com/openjdk/jdk/blob/master/.github/workflows/submit.yml > I'd like to pass them all before integrating. Sometimes, I run into failures but my PR is not the culprit. The build breakage and regression are usually rapidly fixed in the master branch. > > I understand I can always ditch the old PR and start over, but all comments in the old PR will lose in this way. On the side, I also feel guilty to use force push frequently. > May I know if Skara has other option to help me out? 
> > > I read this blog (https://julien.danjou.info/rant-about-github-pull-request-workflow-implementation/), it declares the dilemma comes from github PR mechanism. > but that blog was 7-year-old, I am not sure that if github has sorted it out or not. Even github hasn't, is that possible to be solved by Skara? > > thanks, > --lx > > > > > From dnsimon at openjdk.java.net Fri Nov 27 10:20:17 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 27 Nov 2020 10:20:17 GMT Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports [v4] In-Reply-To: References: Message-ID: > A number of jtreg tests require a specific GC. These tests should be ignored when EnableJVMCI is true and the JVMCI compiler does not support the required GC. > > This PR adds `JVMCICompiler.isGCSupported` and makes use of it in `WhiteBox.isGCSupported`. > > Prior to this PR, a test requiring a GC not yet supported by a JVMCI compiler fail as follows: > Error occurred during initialization of VM > JVMCI Compiler does not support selected GC: epsilon gc Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8257020 - implemented isGCSupported in Graal compiler - removed JVMCI logic from GCConfig and introduced WhiteBox.isGCSupportedByJVMCI instead - removed broken check_jvmci_supported_gc logic - removed redundant signature - move logic to GCConfig - enable a JVMCICompiler to specify which GCs it supports ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1423/files - new: https://git.openjdk.java.net/jdk/pull/1423/files/384287b7..6e6aef0a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1423&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1423&range=02-03 Stats: 9695 lines in 261 files changed: 3265 ins; 1008 del; 5422 mod Patch: https://git.openjdk.java.net/jdk/pull/1423.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1423/head:pull/1423 PR: https://git.openjdk.java.net/jdk/pull/1423 From vladimir.x.ivanov at oracle.com Fri Nov 27 11:06:47 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 27 Nov 2020 14:06:47 +0300 Subject: RFR: 8252505: C1/C2 compiler support for blackholes In-Reply-To: <4c361db2-7f66-8f55-3771-e48b19ca575e@redhat.com> References: <5af01d43-4889-210e-e4d0-9c8e4b52cb69@oracle.com> <4c361db2-7f66-8f55-3771-e48b19ca575e@redhat.com> Message-ID: <5857edb5-4780-edf2-9e26-6d41dcc362c9@oracle.com> >> ??? * you subclass CallJavaNode, but then disable most of the features it >> brings. > > Yeah. On the upside, that effectively allow me to push the > CallBlackholeJava as if it is CallJavaNode, and create this special node > by trivially hijacking two CallGenerators. But maybe that is not as > clean, and we are better off lifting CallBlackhole to Call, and do its > own BlackholeCallGenerator. > > For example, like this: > ? 
https://cr.openjdk.java.net/~shade/8252505/remodel-callblackhole.patch > > Since we are now doing the whole setup for it ourselves, I think we can > avoid safepoint and debug info, and therefore ditch Ideal for it. > RegMask would still avoid register shuffles of "real" arguments. It > still has some x86_32 failures which I would attend to tomorrow, so I > cannot push it to the branch yet. Yeah, it looks better. Still, IMO LibraryCallKit is a better place for it. It would require additional logic to attach an intrinsic id to a method, but I think it's worth it. >> * what you try to achieve is closer to CallLeaf: it doesn't have >> safepoint info attached. > > Yeah, but CallLeaf is CallRuntime, which is weird. Representing something non-existent in the generated code as a Call is already weird, isn't it? ;-) >> * but SafePoint is also close (except JVM state): it doesn't have any >> arguments, but still keeps values alive through debug inputs; you could >> try to turn arguments into debug info even for an ordinary call. > > I am probably stupid, but I cannot get this part to work yet. On my side it is more about pointing out similarities with existing concepts. I still think it's worth considering a special node without any ties to the Call/Safepoint hierarchy. One more idea: would a new flavor of mem-bar (CPUOrder-based one?) with debug info support solve the problem? It would have the following features: (1) doesn't produce anything in generated code; (2) doesn't require explicit matching; (3) keeps values alive, but doesn't touch them; (4) orders both control and memory graph. (MemBarNode node extends MultiNode and is a sibling of SafePoint.) >> * you can try to avoid AD changes and construct a MachNode directly; > > You mean hack Matcher to emit MachCallBlackholeNode for every > CallBlackholeNode without involving .ad? I guess I can try, but simple > .ad match rule seems a tad more clear and maintainable to me: it does > not do special hacks in Matcher.
Sure, it works nicely for Calls/Safepoints and you would only save on trivial declarations across all platforms. But once you untie the node from Call/Safepoint hierarchy, you have to be more explicit about the effects of the particular mach node flavor. Special handling of Call/Safepoints by Matcher covers that for you now. Best regards, Vladimir Ivanov From stefank at openjdk.java.net Fri Nov 27 12:19:56 2020 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Fri, 27 Nov 2020 12:19:56 GMT Subject: RFR: 8256382: Use try_lock for hs_err EventLog printing [v2] In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 23:12:51 GMT, David Holmes wrote: >> LGTM > >> >> >> @tstuefe @dholmes-ora What's the path forward with this PR? Have your reconciled and come to a conclusion about what you want me to change? > > I have no particular "wants" in this area I was just trying to evaluate the two proposals. Thomas seems fine with your original proposal now. I thought you were suggesting above you should be using `fatal_error_in_progress()` instead, so I'm not sure if this is your final proposal. If it is then I'm fine with it. There's no perfect solution when it comes to error reporting. > Thanks. @dholmes-ora @tstuefe OK. Thanks. I'll go ahead with what I'm currently having, and we can continue to chip away at this in follow-up RFEs (if we find the time to). ------------- PR: https://git.openjdk.java.net/jdk/pull/1408 From shade at openjdk.java.net Fri Nov 27 13:16:14 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 27 Nov 2020 13:16:14 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes [v7] In-Reply-To: References: Message-ID: > JMH uses the [`Blackhole::consume`](https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java#l153) methods to avoid dead-code elimination of the code that produces benchmark values. It now relies on producing opaque side-effects and breaking inlining. 
While it was proved useful for many years, it unfortunately comes with several major drawbacks: > > 1. Call costs dominate nanobenchmarks. On TR 3970X, the call cost is several nanoseconds. > 2. The work spent in Blackhole.consume dominates nanobenchmarks too. It takes about a nanosecond on TR 3970X. > 3. Argument preparation for call makes different argument types behave differently. This is prominent on architectures where calling conventions for passing e.g. floating-point arguments require elaborate dance. > > Supporting this directly in compilers would improve nanobenchmark fidelity. > > Instead of introducing public APIs or special-casing JMH methods in JVM, we can hook a new command to compiler control, and let JMH sign up its Blackhole methods for it with `-XX:CompileCommand=blackhole,org.openjdk.jmh.infra.Blackhole::consume`. This is being prototyped as [CODETOOLS-7902762](https://bugs.openjdk.java.net/browse/CODETOOLS-7902762). It makes Blackholes behave [substantially better](http://cr.openjdk.java.net/~shade/8252505/bh-old-vs-new.png). > > Current prototype is for initial approach review and early testing. I am open for suggestions how to make it simpler; not that I haven't tried, but it is likely there is something I am overlooking here. > > C1 code is platform-independent, and it adds the new node which is then lowered to nothing. > > C2 code is more complicated. There were four attempts to implement this, and what you see in the PR is the fourth attempt. > > [First attempt](http://cr.openjdk.java.net/~shade/8252505/webrev.01/) was to introduce fake store like `StoreV` ("store void"), and then lower them to nothing. It runs into a series of funky problems: you would like to have at least two shapes of the store to match the store type width not to confuse the optimizer, or even have the whole mirror of `Store*` hierarchy. Additionally, optimizer tweaks were needed. 
The awkward problem of GC barrier verification showed up: if `StoreV*` is a subclass of `Store*`, then verificators rightfully expect GC barriers before them. If we emit GC, then we need to handle walking over `StoreV*` nodes in optimizers. > > [Second attempt](http://cr.openjdk.java.net/~shade/8252505/webrev.04/) was to introduce the special `Blackhole` node that consumes the values -- basically like C1 implementation does it. Alas, the many attempts to make sure the new node is not DCE'd failed. Roland looked at it, and suggested that there seems to be no way to model the effects we are after: consume the value, but have no observable side effects. So, suggested we instead put the boolean flag onto `CallJavaNode`, and then match it to nothing in `.ad`. > > ...which is the essence of the third attempt. Drag the blackhole through C2 as if it has call-like side effects, and then emit nothing. Instead of boolean flag, the subsequent iteration introduced a full new `CallBlackholeJava` node, that is a call as far as optimizers are concerned, and then it is matched to nothing in `.ad`. This seems to require the least fiddling with C2 internals. Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - Rehash tests: remove useless, strengthen existing ones - Merge branch 'master' into JDK-8252505-blackholes - Support for ARM32, PPC, S390 - Fix x86_32 support - Merge branch 'master' into JDK-8252505-blackholes - Fix AArch64 build and test - Fixes after the merge - Merge branch 'master' into JDK-8252505-blackholes - More touchups: dead code, resolve TODOs - Revert old changes - ... 
and 2 more: https://git.openjdk.java.net/jdk/compare/a3eec39b...0dab041e ------------- Changes: https://git.openjdk.java.net/jdk/pull/1203/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1203&range=06 Stats: 1082 lines in 46 files changed: 1071 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/1203.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1203/head:pull/1203 PR: https://git.openjdk.java.net/jdk/pull/1203 From smonteith at openjdk.java.net Fri Nov 27 15:17:04 2020 From: smonteith at openjdk.java.net (Stuart Monteith) Date: Fri, 27 Nov 2020 15:17:04 GMT Subject: RFR: 8248736: [aarch64] runtime/signal/TestSigpoll.java failed "fatal error: not an ldr (literal) instruction." Message-ID: Change mov to movptr in the nmethod entry barrier. movptr will generate a consistent number of mov/movk instructions that are necessary to consistently calculate the size of the nmethod barrier. ------------- Commit messages: - 8248736: Fix nmethod_entry_barrier alignment Changes: https://git.openjdk.java.net/jdk/pull/1481/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1481&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8248736 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1481.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1481/head:pull/1481 PR: https://git.openjdk.java.net/jdk/pull/1481 From jlahoda at openjdk.java.net Fri Nov 27 17:06:03 2020 From: jlahoda at openjdk.java.net (Jan Lahoda) Date: Fri, 27 Nov 2020 17:06:03 GMT Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) Message-ID: This pull request replaces https://github.com/openjdk/jdk/pull/1227. >From the original PR: > Please review the code for the second iteration of sealed classes. 
In this iteration we are: > > * Enhancing narrowing reference conversion to allow for stricter checking of cast conversions with respect to sealed type hierarchies > > * Also local classes are not considered when determining implicitly declared permitted direct subclasses of a sealed class or sealed interface > > * renaming Class::permittedSubclasses to Class::getPermittedSubclasses, still in the same method, the return type has been changed to Class[] instead of the previous ClassDesc[] > > * adding code to make sure that annotations can't be sealed > > * improving some tests > > > TIA > > Related specs: > [Sealed Classes JSL](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jls.html) > [Sealed Classes JVMS](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jvms.html) > [Additional: Contextual Keywords](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/contextual-keywords-jls.html) This PR strives to reflect the review comments from 1227: * adjustments to javadoc of j.l.Class methods * package access checks in Class.getPermittedSubclasses() * fixed to the narrowing conversion/castability as pointed out by Maurizio ------------- Commit messages: - Moving checkPackageAccess from getPermittedSubclasses to a separate method. - Improving getPermittedSubclasses() javadoc. - Enhancing the Class.getPermittedSubclasses() test to verify behavior both for sealed classes in named and unnamed modules. - Removing unnecessary file. - Tweaking javadoc. - Reflecting review comments w.r.t. narrowing conversion. - Improving checks in getPermittedSubclasses() - Merging master into JDK-8246778 - Adding checkPackageAccess to Class.getPermittedSubclasses(). 
- 8246778: Compiler implementation for Sealed Classes (Second Preview) Changes: https://git.openjdk.java.net/jdk/pull/1483/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1483&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8246778 Stats: 915 lines in 12 files changed: 834 ins; 9 del; 72 mod Patch: https://git.openjdk.java.net/jdk/pull/1483.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1483/head:pull/1483 PR: https://git.openjdk.java.net/jdk/pull/1483 From kvn at openjdk.java.net Fri Nov 27 19:04:00 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 27 Nov 2020 19:04:00 GMT Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports [v4] In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 10:20:17 GMT, Doug Simon wrote: >> A number of jtreg tests require a specific GC. These tests should be ignored when EnableJVMCI is true and the JVMCI compiler does not support the required GC. >> >> This PR adds `JVMCICompiler.isGCSupported` and makes use of it in `WhiteBox.isGCSupported`. >> >> Prior to this PR, a test requiring a GC not yet supported by a JVMCI compiler fail as follows: >> Error occurred during initialization of VM >> JVMCI Compiler does not support selected GC: epsilon gc > > Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8257020 > - implemented isGCSupported in Graal compiler > - removed JVMCI logic from GCConfig and introduced WhiteBox.isGCSupportedByJVMCI instead > - removed broken check_jvmci_supported_gc logic > - removed redundant signature > - move logic to GCConfig > - enable a JVMCICompiler to specify which GCs it supports Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1423 From kvn at openjdk.java.net Fri Nov 27 19:05:00 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 27 Nov 2020 19:05:00 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo [v2] In-Reply-To: References: Message-ID: On Thu, 26 Nov 2020 13:51:05 GMT, Roman Kennke wrote: >> Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8256999 >> - Added ZLoadBarrierElided = 0 definition. >> Removed is_exact argument in load_field_from_object(). >> Added Shenandoah support for narrow phantom accesses. >> - 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo > > Shenandoah parts look good to me! Thanks! @fisk and @shipilev are you fine with updated version? ------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From dnsimon at openjdk.java.net Fri Nov 27 19:17:02 2020 From: dnsimon at openjdk.java.net (Doug Simon) Date: Fri, 27 Nov 2020 19:17:02 GMT Subject: RFR: 8257020: [JVMCI] enable a JVMCICompiler to specify which GCs it supports [v4] In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 19:00:56 GMT, Vladimir Kozlov wrote: >> Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: >> >> - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8257020 >> - implemented isGCSupported in Graal compiler >> - removed JVMCI logic from GCConfig and introduced WhiteBox.isGCSupportedByJVMCI instead >> - removed broken check_jvmci_supported_gc logic >> - removed redundant signature >> - move logic to GCConfig >> - enable a JVMCICompiler to specify which GCs it supports > > Looks good. Thanks for the review @vnkozlov. Are you also good with the PR in its current state @stefank? ------------- PR: https://git.openjdk.java.net/jdk/pull/1423 From kim.barrett at oracle.com Fri Nov 27 20:03:59 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 27 Nov 2020 15:03:59 -0500 Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random [v2] In-Reply-To: References: <_SKxbxqodbI4aWGdFGVdks6_VqeADnMgjMfi_oEK-Bc=.2cd493f4-1467-48dc-a1f2-2bb674649f2b@github.com> Message-ID: <10F7EA8A-DD3B-4B13-AB81-59B0489862EC@oracle.com> > On Nov 27, 2020, at 2:09 AM, Thomas Stüfe wrote: > But I can now see problems with shuffle too, esp. non-reproducibility. > Errors would appear intermittent but aren't. Executing all TEST tests > first, followed by TEST_VM, seems to be the simplest and cleanest solution. RFE? From kbarrett at openjdk.java.net Sat Nov 28 06:42:01 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 28 Nov 2020 06:42:01 GMT Subject: RFR: 8254733: HotSpot Style Guide should permit using range-based for loops Message-ID: Please review and vote on this change to the HotSpot Style Guide to permit the use of range-based `for` loops in HotSpot code. Range-based `for` is a feature added in C++11. This is a modification of the Style Guide, so rough consensus among the HotSpot Group members is required to make this change. Only Group members should vote for approval (via the github PR), though reasoned objections or comments from anyone will be considered.
A decision on this proposal will not be made before Monday 7-Dec-2020 at 12h00 UTC. Since we're piggybacking on github PRs here, please use the PR review process to approve (click on Review Changes > Approve), rather than sending a "vote: yes" email reply that would be normal for a CFV. Other responses can still use email of course. ------------- Commit messages: - 8254733: HotSpot Style Guide should permit using range-based for loops Changes: https://git.openjdk.java.net/jdk/pull/1488/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1488&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254733 Stats: 5 lines in 2 files changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1488/head:pull/1488 PR: https://git.openjdk.java.net/jdk/pull/1488 From kbarrett at openjdk.java.net Sat Nov 28 06:54:59 2020 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 28 Nov 2020 06:54:59 GMT Subject: RFR: 8254733: HotSpot Style Guide should permit using range-based for loops In-Reply-To: References: Message-ID: <9pTqYH7Py4tVHNh1RpwDHv2y_IWgBpByKJfivQ-4Z5E=.2e72e13a-d2b2-4df7-ad9d-16edf9b1c423@github.com> On Sat, 28 Nov 2020 06:37:57 GMT, Kim Barrett wrote: > Please review and vote on this change to the HotSpot Style Guide to > permit the use of range-based `for` loops in HotSpot code. Range-based > `for` is a feature added in C++11. > > This is a modification of the Style Guide, so rough consensus among > the HotSpot Group members is required to make this change. Only Group > members should vote for approval (via the github PR), though reasoned > objections or comments from anyone will be considered. A decision on > this proposal will not be made before Monday 7-Dec-2020 at 12h00 UTC.
> > Since we're piggybacking on github PRs here, please use the PR review > process to approve (click on Review Changes > Approve), rather than > sending a "vote: yes" email reply that would be normal for a CFV. > Other responses can still use email of course. The range-based for loop feature is currently of limited utility, because HotSpot code mostly avoids using the Standard Library and existing HotSpot code provides relatively little support for the feature. However, GrowableArray and EnumRange both provide the necessary begin and end member functions returning iterators. There's a chicken and egg problem with the latter; there's no reason to provide support for the feature if it can't be used. By permitting use of the feature we encourage adding support. There is at least one use of the feature already present. It was used in a recent PR; the reviewers noted this but decided to allow it, with the expectation that the feature would be explicitly permitted soon. Most uses of the new EnumIterator facility could use and benefit from the feature. ------------- PR: https://git.openjdk.java.net/jdk/pull/1488 From stuefe at openjdk.java.net Sat Nov 28 08:09:58 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 28 Nov 2020 08:09:58 GMT Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random [v2] In-Reply-To: References: <_SKxbxqodbI4aWGdFGVdks6_VqeADnMgjMfi_oEK-Bc=.2cd493f4-1467-48dc-a1f2-2bb674649f2b@github.com> Message-ID: On Fri, 27 Nov 2020 06:56:33 GMT, Igor Ignatyev wrote: >> Marked as reviewed by dholmes (Reviewer). > >> _Mailing list message from [Thomas St??fe](mailto:thomas.stuefe at gmail.com) on [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_ >> >> On Thu, Nov 26, 2020 at 12:51 AM Kim Barrett wrote: >> >> > > On Nov 25, 2020, at 12:25 AM, Thomas Stuefe >> > > wrote: >> > > Why is someone concurrently changing the seed? I thought "TEST" tests do >> > > not start the VM? Or is it that some earlier test already did? 
>> > >> > >> > I was surprised by this too. >> > "TEST" tests and "TEST_VM" tests are collected in a single list and >> > run sequentially. The first "TEST_VM" test that gets run by that will >> > trigger VM initialization, and all the remaining tests (whether "TEST" >> > or "TEST_VM") get run in the resulting context. >> > I don't much like that behavior. >> >> I don't either. One of the problems I keep encountering is that I use TEST >> only to later see that I accidentally used VM infrastructure; but you only >> notice when executing the test in a different order. >> >> googletests has all these nice options, eg --gtest_shuffle. Maybe we should >> randomize the execution order of these tests to shake loose these errors. >> >> Cheers, Thome > > there is an easier way to find all *current* errors, you can just run each test individually, so all `TEST` which should have been `TEST_VM` will fail. there is a script which I used to use to do that, and I actually thought we have integrated it alongside the prev. rounds of such clean. > > as a historical note, one of the first iterations of JEP [281](https://bugs.openjdk.java.net/browse/JDK-8047975) ran `TEST` before all other tests, but at some point, we decided that the complexity this brought just didn't worth it. it can easily be a case that this decision wasn't right (or isn't right anymore), and we should be able to change the behavior to ensure that `TEST` tests are run w/o VM being initialized, the only problem I can foresee is w/ `--gtest_shuffle`. 
> > Cheers, > -- Igor I created https://bugs.openjdk.java.net/browse/JDK-8257226 to track that RFE ------------- PR: https://git.openjdk.java.net/jdk/pull/1422 From eosterlund at openjdk.java.net Sat Nov 28 09:04:03 2020 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Sat, 28 Nov 2020 09:04:03 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo [v2] In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 23:35:14 GMT, Vladimir Kozlov wrote: >> JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2. >> >> Initial patch was prepared by @fisk. >> >> Tested hs-tier1-4. Added new compiler tests to test intrinsics. >> >> Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough. > > Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8256999 > - Added ZLoadBarrierElided = 0 definition. > Removed is_exact argument in load_field_from_object(). > Added Shenandoah support for narrow phantom accesses. > - 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1425 From shade at openjdk.java.net Sat Nov 28 09:46:01 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sat, 28 Nov 2020 09:46:01 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo [v2] In-Reply-To: References: Message-ID: On Sat, 28 Nov 2020 09:43:15 GMT, Aleksey Shipilev wrote: >> Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8256999 >> - Added ZLoadBarrierElided = 0 definition. >> Removed is_exact argument in load_field_from_object(). >> Added Shenandoah support for narrow phantom accesses. >> - 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo > > Yes, still looks good for me. You might want to pull from master to get clean `x86_32` test runs. ------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From shade at openjdk.java.net Sat Nov 28 09:45:59 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sat, 28 Nov 2020 09:45:59 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo [v2] In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 23:35:14 GMT, Vladimir Kozlov wrote: >> JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2. >> >> Initial patch was prepared by @fisk. >> >> Tested hs-tier1-4. Added new compiler tests to test intrinsics. >> >> Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough. 
> > Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8256999 > - Added ZLoadBarrierElided = 0 definition. > Removed is_exact argument in load_field_from_object(). > Added Shenandoah support for narrow phantom accesses. > - 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo Yes, still looks good for me. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1425 From kvn at openjdk.java.net Sat Nov 28 23:30:13 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sat, 28 Nov 2020 23:30:13 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo [v3] In-Reply-To: References: Message-ID: > JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2. > > Initial patch was prepared by @fisk. > > Tested hs-tier1-4. Added new compiler tests to test intrinsics. > > Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough. Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8256999 - Merge branch 'master' into JDK-8256999 - Added ZLoadBarrierElided = 0 definition. Removed is_exact argument in load_field_from_object(). Added Shenandoah support for narrow phantom accesses. 
- 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1425/files - new: https://git.openjdk.java.net/jdk/pull/1425/files/08bdd307..962d54d5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1425&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1425&range=01-02 Stats: 5258 lines in 151 files changed: 2571 ins; 1881 del; 806 mod Patch: https://git.openjdk.java.net/jdk/pull/1425.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1425/head:pull/1425 PR: https://git.openjdk.java.net/jdk/pull/1425 From dholmes at openjdk.java.net Sun Nov 29 00:29:00 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 29 Nov 2020 00:29:00 GMT Subject: RFR: 8257233: Windows x86 build is broken by JDK-8252684 Message-ID: Moved the ifdef after the include of precompiled.hpp. Awaiting GHA results. ------------- Commit messages: - 8257233: Windows x86 build is broken by JDK-8252684 Changes: https://git.openjdk.java.net/jdk/pull/1500/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1500&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257233 Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1500.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1500/head:pull/1500 PR: https://git.openjdk.java.net/jdk/pull/1500 From mikael at openjdk.java.net Sun Nov 29 00:29:01 2020 From: mikael at openjdk.java.net (Mikael Vidstedt) Date: Sun, 29 Nov 2020 00:29:01 GMT Subject: RFR: 8257233: Windows x86 build is broken by JDK-8252684 In-Reply-To: References: Message-ID: On Sun, 29 Nov 2020 00:21:31 GMT, David Holmes wrote: > Moved the ifdef after the include of precompiled.hpp. > > Awaiting GHA results. Marked as reviewed by mikael (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1500 From ngasson at openjdk.java.net Sun Nov 29 00:42:57 2020 From: ngasson at openjdk.java.net (Nick Gasson) Date: Sun, 29 Nov 2020 00:42:57 GMT Subject: RFR: 8257233: Windows x86 build is broken by JDK-8252684 In-Reply-To: References: Message-ID: On Sun, 29 Nov 2020 00:25:50 GMT, Mikael Vidstedt wrote: >> Moved the ifdef after the include of precompiled.hpp. >> >> Awaiting GHA results. > > Marked as reviewed by mikael (Reviewer). Thanks for fixing this. Sorry I didn't check the windows build logs. ------------- PR: https://git.openjdk.java.net/jdk/pull/1500 From dholmes at openjdk.java.net Sun Nov 29 01:23:56 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 29 Nov 2020 01:23:56 GMT Subject: Integrated: 8257233: Windows x86 build is broken by JDK-8252684 In-Reply-To: References: Message-ID: On Sun, 29 Nov 2020 00:21:31 GMT, David Holmes wrote: > Moved the ifdef after the include of precompiled.hpp. > > Awaiting GHA results. This pull request has now been integrated. Changeset: 04eecf03 Author: David Holmes URL: https://git.openjdk.java.net/jdk/commit/04eecf03 Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod 8257233: Windows x86 build is broken by JDK-8252684 Reviewed-by: mikael ------------- PR: https://git.openjdk.java.net/jdk/pull/1500 From stuefe at openjdk.java.net Sun Nov 29 08:19:56 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sun, 29 Nov 2020 08:19:56 GMT Subject: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2] In-Reply-To: References: <-xtX9qSJHuD-qfp52XPToKhkl1HypRmNFHCJaupaync=.99285cd9-69a6-42cd-84b4-3c87fefc2cc5@github.com> <8f-BJdFip5yf0Rv4uw-qcXVk2uM3Lb6Hrq9VPR6UzF4=.04966477-8834-4fb9-aa77-8da86f104176@github.com> Message-ID: On Wed, 25 Nov 2020 13:58:49 GMT, Thomas Stuefe wrote: >> I agree with what Thomas is saying. 
This should be a generic thing for reservations, as I've suggested before, choosing the largest page size given the size of the mapping. I would also be good with starting with the `UseHugeTLBFS` case. >> >> When it comes to testing, we should not hard-code these kinds of things in the test, but add WhiteBox functions that return the correct numbers given the platform and environment. >> >> WhiteBox wb = WhiteBox.getWhiteBox(); >> smallPageSize = wb.getVMPageSize(); >> largePageSize = wb.getVMLargePageSize(); >> largePageExecSize = 2097152; >> So instead of hard-coding this, I guess the correct approach would be to return an array of available page sizes and verify that the correct one is used. > > I honestly don't even know why we have UseSHM. Seems redundant, and since it uses SystemV shared memory, which has different semantics from mmap, it is subtly broken in a number of places (e.g. https://bugs.openjdk.java.net/browse/JDK-8257040 or https://bugs.openjdk.java.net/browse/JDK-8257041). One thing I stumbled upon while looking at this code is why the CodeHeap always wants to have at least 8 pages covering its range: // If large page support is enabled, align code heaps according to large // page size to make sure that code cache is covered by large pages. const size_t alignment = MAX2(page_size(false, 8), (size_t) os::vm_allocation_granularity()); which means that for a desired page size of 1G, the code cache would have to cover at least 8G. I am not even sure this is possible, isn't it limited to 4G? Anyway, they don't uncommit. And the comment in codecache.cpp indicates this is to be able to step-wise commit, but with huge pages the space is committed right from the start anyway. So I do not see what good these 8 pages do. If we allowed the CodeCache to use just one page, it could be 1G in size and use a single 1G page.
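To make the arithmetic above concrete, here is a minimal sketch of a "largest page size such that the region still spans at least N pages" rule. The helper `pick_page_size`, the constants `K`/`M`/`G`, and the list of page sizes are assumptions made up for illustration; this is not HotSpot's `page_size()` implementation, only a model of the behavior described in the message.

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint64_t K = 1024, M = K * K, G = M * K;

// Hypothetical helper (not HotSpot code): return the largest entry of
// page_sizes for which the region still spans at least min_pages pages.
// page_sizes must be non-empty and sorted ascending; the smallest entry is
// the fallback when no entry satisfies the constraint.
inline std::uint64_t pick_page_size(const std::vector<std::uint64_t>& page_sizes,
                                    std::uint64_t region_size,
                                    std::uint64_t min_pages) {
  std::uint64_t chosen = page_sizes.front();
  for (std::uint64_t ps : page_sizes) {
    if (ps * min_pages <= region_size) {
      chosen = ps;  // later (larger) sizes overwrite earlier ones
    }
  }
  return chosen;
}
```

Under these assumptions, a 1G region with a minimum of 8 pages ends up on 2M pages, because 8 pages of 1G would need 8G; with a minimum of 1 page the same region could use a single 1G page.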
Note that there are similar min_page_size requests in GC, but I did not look closer into them. Also, this does not take away the usefulness of this proposal. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From kvn at openjdk.java.net Sun Nov 29 16:52:15 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 29 Nov 2020 16:52:15 GMT Subject: RFR: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo [v4] In-Reply-To: References: Message-ID: > JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2. > > Initial patch was prepared by @fisk. > > Tested hs-tier1-4. Added new compiler tests to test intrinsics. > > Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough. Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into JDK-8256999 - Merge branch 'master' into JDK-8256999 - Merge branch 'master' into JDK-8256999 - Added ZLoadBarrierElided = 0 definition. Removed is_exact argument in load_field_from_object(). Added Shenandoah support for narrow phantom accesses. 
- 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1425/files - new: https://git.openjdk.java.net/jdk/pull/1425/files/962d54d5..0e4596ce Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1425&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1425&range=02-03 Stats: 115 lines in 5 files changed: 8 ins; 98 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/1425.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1425/head:pull/1425 PR: https://git.openjdk.java.net/jdk/pull/1425 From kvn at openjdk.java.net Sun Nov 29 20:31:58 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Sun, 29 Nov 2020 20:31:58 GMT Subject: Integrated: 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo In-Reply-To: References: Message-ID: On Wed, 25 Nov 2020 03:31:36 GMT, Vladimir Kozlov wrote: > JDK-8188055 added the function Reference.refersTo. For performance, the supporting native methods Reference.refersTo0 and PhantomReference.refersTo0 should be intrinsified by C2. > > Initial patch was prepared by @fisk. > > Tested hs-tier1-4. Added new compiler tests to test intrinsics. > > Ran new test with Shenandoah. Found only one issue. As result I disable PhantomReference::refersTo intrinsic for COOP+ Shenandoah combination. Someone from Shenandoah team have to test changes if that is enough. This pull request has now been integrated. 
Changeset: 816e8f83 Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/816e8f83 Stats: 381 lines in 20 files changed: 248 ins; 62 del; 71 mod 8256999: Add C2 intrinsic for Reference.refersTo and PhantomReference::refersTo Reviewed-by: pliden, vlivanov, rkennke, eosterlund, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1425 From dholmes at openjdk.java.net Mon Nov 30 02:10:56 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 30 Nov 2020 02:10:56 GMT Subject: RFR: 8254733: HotSpot Style Guide should permit using range-based for loops In-Reply-To: References: Message-ID: <6S6VwhN9Q-SigneHE6VxbagwA_-9ZkpSH386D_ou-Ds=.9640a6ec-b373-451b-9e6a-a847f85f4250@github.com> On Sat, 28 Nov 2020 06:37:57 GMT, Kim Barrett wrote: > Please review and vote on this change to the HotSpot Style Guide to > permit the use of range-based `for` loops in HotSpot code. Range-based > `for` is a feature added in C++11. > > This is a modification of the Style Guide, so rough consensus among > the HotSpot Group members is required to make this change. Only Group > members should vote for approval (via the github PR), though reasoned > objectsions or comments from anyone will be considered. A decision on > this proposal will not be made before Monday 7-Dec-2020 at 12h00 UTC. > > Since we're piggybacking on github PRs here, please use the PR review > process to approve (click on Review Changes > Approve), rather than > sending a "vote: yes" email reply that would be normal for a CFV. > Other responses can still use email of course. Marked as reviewed by dholmes (Reviewer). 
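For readers unfamiliar with the feature under vote in the quoted proposal, a minimal sketch of a range-based `for` loop over a class that supplies `begin` and `end` member functions follows. `FixedArray` and `sum` are hypothetical names invented for this example; they are not GrowableArray, EnumRange, or any other HotSpot class.

```cpp
#include <cstddef>

// Hypothetical container for illustration only. Range-based for needs
// nothing more than begin() and end() returning iterators.
template <typename T, std::size_t N>
class FixedArray {
  T _data[N];
public:
  FixedArray() : _data() {}            // value-initialize all elements
  T& at(std::size_t i) { return _data[i]; }
  // Raw pointers satisfy the iterator requirements of range-based for.
  T* begin() { return _data; }
  T* end() { return _data + N; }
  const T* begin() const { return _data; }
  const T* end() const { return _data + N; }
};

// Sums the elements using a C++11 range-based for loop.
inline int sum(const FixedArray<int, 4>& a) {
  int total = 0;
  for (int v : a) {
    total += v;
  }
  return total;
}
```

The loop in `sum` is equivalent to an explicit iterator loop from `a.begin()` to `a.end()`, without the index or iterator bookkeeping.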
------------- PR: https://git.openjdk.java.net/jdk/pull/1488 From stuefe at openjdk.java.net Mon Nov 30 07:21:13 2020 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 30 Nov 2020 07:21:13 GMT Subject: RFR: JDK-8256864: [windows] Improve tracing for mapping errors [v3] In-Reply-To: References: Message-ID: > To analyze JDK-8256729 further, we need more tracing: > > 1) We should print all mappings inside the split area if os::split_reserved_memory() fails > > 2) The print-mapping code on windows has some shortcomings: > - should not probe for mappings outside of what we know are valid address ranges for Windows > - should handle wrap-arounds - it should be possible to print the whole address space > - Protection information is printed wrong (MEMORY_BASIC_INFORMATION.Protect would be the correct member) > - should be printed in a more compact manner - base address should be on the same line as the first region > - maybe adorned with some basic range info, e.g. library mappings Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Fix gtest for MacOS and AIX ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1390/files - new: https://git.openjdk.java.net/jdk/pull/1390/files/71d8ef99..be7acf6c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1390&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1390&range=01-02 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1390.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1390/head:pull/1390 PR: https://git.openjdk.java.net/jdk/pull/1390 From rrich at openjdk.java.net Mon Nov 30 08:39:56 2020 From: rrich at openjdk.java.net (Richard Reingruber) Date: Mon, 30 Nov 2020 08:39:56 GMT Subject: Integrated: 8256754: Deoptimization::revoke_for_object_deoptimization: stack processing start call is redundant In-Reply-To: References: Message-ID: On Mon, 23 Nov 2020 
11:40:49 GMT, Richard Reingruber wrote: > This is a XS clean-up of Deoptimization::revoke_for_object_deoptimization() which removes the StackWatermarkSet::start_processing() call. > > This is correct because all paths leading to revoke_for_object_deoptimization() are equipped with a KeepStackGCProcessedMark. > > Call Tree: > > StackWatermarkSet::start_processing(JavaThread *, enum StackWatermarkKind) : void > Deoptimization::revoke_for_object_deoptimization(JavaThread *, frame, RegisterMap *, JavaThread *) : void > Deoptimization::deoptimize_objects_internal(JavaThread *, GrowableArray *, bool &) : bool > EscapeBarrier::deoptimize_objects_internal(JavaThread *, intptr_t *) : bool > EscapeBarrier::deoptimize_objects_all_threads() : bool // has KeepStackGCProcessedMark > EscapeBarrier::deoptimize_objects(intptr_t *) : bool > EscapeBarrier::deoptimize_objects(int, int) : bool // has KeepStackGCProcessedMark > > Testing: hotspot_serviceability, jdk_svc, jdk_jdi, vmTestbase_nsk_jdi, vmTestbase_nsk_jvmti, vmTestbase_nsk_jdwp with -XX:+UseZGC. This pull request has now been integrated. Changeset: e77aed62 Author: Richard Reingruber URL: https://git.openjdk.java.net/jdk/commit/e77aed62 Stats: 24 lines in 4 files changed: 20 ins; 3 del; 1 mod 8256754: Deoptimization::revoke_for_object_deoptimization: stack processing start call is redundant Reviewed-by: dlong, eosterlund ------------- PR: https://git.openjdk.java.net/jdk/pull/1381 From shade at openjdk.java.net Mon Nov 30 08:58:04 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 30 Nov 2020 08:58:04 GMT Subject: RFR: 8257396: AArch64 Zero build is broken after JDK-8252684 Message-ID: Zero does not have AArch64 assembler, so attempt to use it from the test fails to build. The fix is trivial: sense if we are building Zero. 
Testing: - [x] Linux aarch64 zero build ------------- Commit messages: - 8257396: AArch64 Zero build is broken after JDK-8252684 Changes: https://git.openjdk.java.net/jdk/pull/1511/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1511&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257396 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1511.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1511/head:pull/1511 PR: https://git.openjdk.java.net/jdk/pull/1511 From shade at openjdk.java.net Mon Nov 30 09:21:02 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 30 Nov 2020 09:21:02 GMT Subject: RFR: 8248736: [aarch64] runtime/signal/TestSigpoll.java failed "fatal error: not an ldr (literal) instruction." In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 15:11:18 GMT, Stuart Monteith wrote: > Change mov to movptr in the nmethod entry barrier. movptr will generate a consistent number of mov/movk instructions that are necessary to consistently calculate the size of the nmethod barrier. src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.cpp line 257: > 255: __ br(Assembler::EQ, skip); > 256: > 257: __ movptr(rscratch1, (uintptr_t) StubRoutines::aarch64::method_entry_barrier()); Looking how the rest of AArch64 code calls into external things, this might be: __ lea(rscratch1, ExternalAddress(StubRoutines::aarch64::method_entry_barrier())); ...which would end up doing the same (right) thing? ------------- PR: https://git.openjdk.java.net/jdk/pull/1481 From smonteith at openjdk.java.net Mon Nov 30 09:34:56 2020 From: smonteith at openjdk.java.net (Stuart Monteith) Date: Mon, 30 Nov 2020 09:34:56 GMT Subject: RFR: 8248736: [aarch64] runtime/signal/TestSigpoll.java failed "fatal error: not an ldr (literal) instruction." In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 09:18:28 GMT, Aleksey Shipilev wrote: >> Change mov to movptr in the nmethod entry barrier. 
movptr will generate a consistent number of mov/movk instructions that are necessary to consistently calculate the size of the nmethod barrier. > > src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.cpp line 257: > >> 255: __ br(Assembler::EQ, skip); >> 256: >> 257: __ movptr(rscratch1, (uintptr_t) StubRoutines::aarch64::method_entry_barrier()); > > Looking at how the rest of AArch64 code calls into external things, this might be: > > __ lea(rscratch1, ExternalAddress(StubRoutines::aarch64::method_entry_barrier())); > > ...which would end up doing the same (right) thing? lea calls Address::lea, which itself might call movptr, but it can also construct the address through other means. It won't, not in the way it is currently written. However, movptr() should be called when generating patchable sequences. While this address isn't patched, we do rely on the property that patchable sequences are of a fixed length. The mistake I made before was in not ensuring that property. ------------- PR: https://git.openjdk.java.net/jdk/pull/1481 From aph at openjdk.java.net Mon Nov 30 09:38:58 2020 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 30 Nov 2020 09:38:58 GMT Subject: RFR: 8248736: [aarch64] runtime/signal/TestSigpoll.java failed "fatal error: not an ldr (literal) instruction." In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 09:31:55 GMT, Stuart Monteith wrote: > lea calls Address::lea, which itself might call movptr, but it can also construct the address through other means. True. LEA is a macro that generates the address in some unspecified way. ------------- PR: https://git.openjdk.java.net/jdk/pull/1481 From shade at openjdk.java.net Mon Nov 30 09:58:57 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 30 Nov 2020 09:58:57 GMT Subject: RFR: 8248736: [aarch64] runtime/signal/TestSigpoll.java failed "fatal error: not an ldr (literal) instruction."
In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 09:35:53 GMT, Andrew Haley wrote: >> lea calls Address::lea, which itself might call movptr, or but it can also construct the address through other means. It won't, not in the way it is currently written. However, movptr() should be call when generating patchable sequences. While this address isn't patched, we do rely on the property that patchable sequences are of a fixed length. The mistake I made before was in not ensuring that property. > >> lea calls Address::lea, which itself might call movptr, or but it can also construct the address through other means. > > True. LEA is a macro that generates the address in some unspecified way. All right! ------------- PR: https://git.openjdk.java.net/jdk/pull/1481 From alanb at openjdk.java.net Mon Nov 30 09:58:58 2020 From: alanb at openjdk.java.net (Alan Bateman) Date: Mon, 30 Nov 2020 09:58:58 GMT Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) In-Reply-To: References: Message-ID: On Fri, 27 Nov 2020 16:57:54 GMT, Jan Lahoda wrote: > This pull request replaces https://github.com/openjdk/jdk/pull/1227. > > From the original PR: > >> Please review the code for the second iteration of sealed classes. 
In this iteration we are: >> >> * Enhancing narrowing reference conversion to allow for stricter checking of cast conversions with respect to sealed type hierarchies >> >> * Also local classes are not considered when determining implicitly declared permitted direct subclasses of a sealed class or sealed interface >> >> * renaming Class::permittedSubclasses to Class::getPermittedSubclasses, still in the same method, the return type has been changed to Class[] instead of the previous ClassDesc[] >> >> * adding code to make sure that annotations can't be sealed >> >> * improving some tests >> >> >> TIA >> >> Related specs: >> [Sealed Classes JSL](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jls.html) >> [Sealed Classes JVMS](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jvms.html) >> [Additional: Contextual Keywords](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/contextual-keywords-jls.html) > > This PR strives to reflect the review comments from 1227: > * adjustments to javadoc of j.l.Class methods > * package access checks in Class.getPermittedSubclasses() > * fixed to the narrowing conversion/castability as pointed out by Maurizio src/java.base/share/classes/java/lang/Class.java line 4420: > 4418: * {@linkplain #getClassLoader() the defining class loader} of the current > 4419: * {@code Class} object. If a name cannot be converted to the {@code Class} > 4420: * instance, it is silently excluded from the result. I think this paragraph will need a little bit of wordsmithing. The 3rd paragraph of getNestMembers might be useful to examine as it more clearly describes how the method attempts to "obtain" the Class object for each of the class names in the NestMembers attribute and maybe some of that wording could be used instead of using the term "convert". 
Minor nit, but the prevailing style for the @throws SecurityException is to align the description with the exception; it is probably best to keep it consistent if you can. ------------- PR: https://git.openjdk.java.net/jdk/pull/1483 From mdoerr at openjdk.java.net Mon Nov 30 11:08:56 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 30 Nov 2020 11:08:56 GMT Subject: RFR: 8256843: [PPC64] runtime/logging/RedefineClasses.java fails with assert: registers not saved on stack In-Reply-To: References: Message-ID: <1fZj0MQrD0HSC1OGd11d4i6I9l3aZFnelGveurPDyr8=.785e9701-d64a-4b2d-a7d1-8c5033b63dee@github.com> On Tue, 24 Nov 2020 14:50:30 GMT, Martin Doerr wrote: >> Looks good to me. > > Thanks for the review! > Unfortunately, it's still not fully tested because the PPC build is currently broken. I'll check again later. Tests have passed in the meantime. ------------- PR: https://git.openjdk.java.net/jdk/pull/1394 From rraj at openjdk.java.net Mon Nov 30 12:03:58 2020 From: rraj at openjdk.java.net (Rohit Arul Raj) Date: Mon, 30 Nov 2020 12:03:58 GMT Subject: RFR: 8256536: Newer AMD 19h (EPYC) Processor family defaults In-Reply-To: References: Message-ID: On Thu, 19 Nov 2020 18:04:03 GMT, Vladimir Kozlov wrote: >> This patch sets the following flags as defaults for newer AMD 19h (EPYC) processors. >> bool UseFPUForSpilling = true >> bool UseUnalignedLoadStores = true >> bool UseXMMForArrayCopy = true >> bool UseXMMForObjInit = true >> bool UseFastStosb = false >> bool AlignVector = false >> >> Additional testing: make run-test TEST="tier1 tier2" JTREG="JOBS=128" CONF=linux-x86_64-server-fastdebug >> >> Please review this change. >> >> Thanks, >> Rohit > > Good. @vnkozlov Hello Vladimir, can you please sponsor this patch? Another question: to backport this patch to the JDK 11 branch, will opening an RFE be enough?
Regards, Rohit ------------- PR: https://git.openjdk.java.net/jdk/pull/1288 From dholmes at openjdk.java.net Mon Nov 30 12:37:54 2020 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 30 Nov 2020 12:37:54 GMT Subject: RFR: 8257396: AArch64 Zero build is broken after JDK-8252684 In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 08:49:54 GMT, Aleksey Shipilev wrote: > Zero does not have AArch64 assembler, so attempt to use it from the test fails to build. The fix is trivial: sense if we are building Zero. > > Testing: > - [x] Linux aarch64 zero build Lets hope this is the last adjustment needed in relation to this change. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1511 From thomas.stuefe at gmail.com Mon Nov 30 12:39:10 2020 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Mon, 30 Nov 2020 13:39:10 +0100 Subject: [11u] RFR: JDK-8255734: VM should ignore SIGXFSZ on ppc64, s390 too Message-ID: Hi, may I have reviews please for the following backport: Original issue: https://bugs.openjdk.java.net/browse/JDK-8255734 Original patch: https://github.com/openjdk/jdk/commit/54c88132.diff Modified patch: http://cr.openjdk.java.net/~stuefe/webrevs/backports/8255734-VM-should-ignore-SIGXFSZ-on-ppc64-s390-too.diff Patch is trivial and nothing changed, but did not apply correctly since "8252324: Signal related code should be shared among POSIX platforms" shuffled a lot of coding around. Thanks, Thomas From coleenp at openjdk.java.net Mon Nov 30 12:54:15 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 30 Nov 2020 12:54:15 GMT Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random [v3] In-Reply-To: References: Message-ID: > The function os::init_random() and os::random() both set the _rand_seed. This test thinks nothing can change the seed while it is computing its expected value. > I changed the test to run in a VM operation safepoint. 
Alternately, I can change the test to not verify the random value computed in the loop, or remove the test. > Ran tier1 tests on linux-x64 and windows-x64. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add comment about next_random. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1422/files - new: https://git.openjdk.java.net/jdk/pull/1422/files/3292c2f8..ba372ae6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1422&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1422&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/1422.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1422/head:pull/1422 PR: https://git.openjdk.java.net/jdk/pull/1422 From coleenp at openjdk.java.net Mon Nov 30 12:54:15 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 30 Nov 2020 12:54:15 GMT Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random [v2] In-Reply-To: References: <_SKxbxqodbI4aWGdFGVdks6_VqeADnMgjMfi_oEK-Bc=.2cd493f4-1467-48dc-a1f2-2bb674649f2b@github.com> Message-ID: On Sat, 28 Nov 2020 08:07:19 GMT, Thomas Stuefe wrote: >>> _Mailing list message from [Thomas Stüfe](mailto:thomas.stuefe at gmail.com) on [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_ >>> >>> On Thu, Nov 26, 2020 at 12:51 AM Kim Barrett wrote: >>> >>> > > On Nov 25, 2020, at 12:25 AM, Thomas Stuefe >>> > > wrote: >>> > > Why is someone concurrently changing the seed? I thought "TEST" tests do >>> > > not start the VM? Or is it that some earlier test already did? >>> > >>> > >>> > I was surprised by this too. >>> > "TEST" tests and "TEST_VM" tests are collected in a single list and >>> > run sequentially. The first "TEST_VM" test that gets run by that will >>> > trigger VM initialization, and all the remaining tests (whether "TEST" >>> > or "TEST_VM") get run in the resulting context.
>>> > I don't much like that behavior. >>> >>> I don't either. One of the problems I keep encountering is that I use TEST >>> only to later see that I accidentally used VM infrastructure; but you only >>> notice when executing the test in a different order. >>> >>> googletests has all these nice options, eg --gtest_shuffle. Maybe we should >>> randomize the execution order of these tests to shake loose these errors. >>> >>> Cheers, Thome >> >> there is an easier way to find all *current* errors, you can just run each test individually, so all `TEST` which should have been `TEST_VM` will fail. there is a script which I used to use to do that, and I actually thought we have integrated it alongside the prev. rounds of such clean. >> >> as a historical note, one of the first iterations of JEP [281](https://bugs.openjdk.java.net/browse/JDK-8047975) ran `TEST` before all other tests, but at some point, we decided that the complexity this brought just didn't worth it. it can easily be a case that this decision wasn't right (or isn't right anymore), and we should be able to change the behavior to ensure that `TEST` tests are run w/o VM being initialized, the only problem I can foresee is w/ `--gtest_shuffle`. >> >> Cheers, >> -- Igor > > I created https://bugs.openjdk.java.net/browse/JDK-8257226 to track that RFE Thanks for the reviews, and recommendation for the patch, Thomas. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1422 From coleenp at openjdk.java.net Mon Nov 30 12:54:16 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 30 Nov 2020 12:54:16 GMT Subject: RFR: 8254042: gtest/GTestWrapper.java failed os.test_random [v2] In-Reply-To: References: <_SKxbxqodbI4aWGdFGVdks6_VqeADnMgjMfi_oEK-Bc=.2cd493f4-1467-48dc-a1f2-2bb674649f2b@github.com> Message-ID: On Fri, 27 Nov 2020 00:59:27 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: >> >> - Restore old copyright >> - Refix os.random test using Thomas Stuefe's better suggestion. > > test/hotspot/gtest/runtime/test_os.cpp line 125: > >> 123: int num; >> 124: for (int k = 0; k < reps; k++) { >> 125: num = seed = os::next_random(seed); > > I suggest adding a comment before this line: > // Use next_random so the calculation is stateless. > Or something to that effect. Done. ------------- PR: https://git.openjdk.java.net/jdk/pull/1422 From coleenp at openjdk.java.net Mon Nov 30 12:54:17 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 30 Nov 2020 12:54:17 GMT Subject: Integrated: 8254042: gtest/GTestWrapper.java failed os.test_random In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 23:38:52 GMT, Coleen Phillimore wrote: > The function os::init_random() and os::random() both set the _rand_seed. This test thinks nothing can change the seed while it is computing its expected value. > I changed the test to run in a VM operation safepoint. Alternately, I can change the test to not verify the random value computed in the loop, or remove the test. > Ran tier1 tests on linux-x64 and windows-x64. This pull request has now been integrated.
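The point of the os.test_random fix discussed above (thread the seed through a stateless step instead of reading shared global state) can be sketched as follows. This is only an illustration: `next_random` and `iterate` here are hypothetical stand-ins, not the HotSpot functions, and the sketch assumes the generator is the Park-Miller "minimal standard" recurrence, next = (16807 * seed) mod (2^31 - 1).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a stateless generator step (not the HotSpot code).
// Assumption: Park-Miller minimal standard, next = (16807 * seed) mod (2^31 - 1).
// The result depends only on the argument, never on shared global state, so no
// other thread can disturb a computation that threads its own seed through.
inline uint32_t next_random(uint32_t seed) {
  const uint64_t a = 16807;
  const uint64_t m = 2147483647;  // 2^31 - 1
  return static_cast<uint32_t>((a * seed) % m);
}

// Runs `reps` steps starting from `seed`, keeping the state in a local variable.
inline uint32_t iterate(uint32_t seed, int reps) {
  for (int k = 0; k < reps; k++) {
    seed = next_random(seed);
  }
  return seed;
}
```

Under that assumption, `iterate(1, 10000)` yields 1043618065, the well-known check value for this recurrence, regardless of what any other thread does with a global seed in the meantime.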
Changeset: 4db05e99 Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/4db05e99 Stats: 15 lines in 7 files changed: 1 ins; 11 del; 3 mod 8254042: gtest/GTestWrapper.java failed os.test_random Reviewed-by: dholmes, stuefe, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/1422 From coleenp at openjdk.java.net Mon Nov 30 13:00:00 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 30 Nov 2020 13:00:00 GMT Subject: RFR: 8256830: misc tests failed with "assert(env->is_enabled(JVMTI_EVENT_OBJECT_FREE)) failed: checking" In-Reply-To: <4GRPSMKxwFONrO-YzHXh7z8WxPHJM2tP0Y5X5uWnV10=.03265f2f-a4f7-4ef0-99e3-37701aa6b7fa@github.com> References: <4GRPSMKxwFONrO-YzHXh7z8WxPHJM2tP0Y5X5uWnV10=.03265f2f-a4f7-4ef0-99e3-37701aa6b7fa@github.com> Message-ID: On Fri, 27 Nov 2020 02:26:44 GMT, David Holmes wrote: >> The ServiceThread cleaning used a stale ObjectFree state when calling remove_dead_entries, because another thread had concurrently set is_enabled to false. Add a lock around setting/resetting the lock event state and retest the state under a lock. Ran the test 100s of time without failure, where otherwise it fails very quickly. >> Tested with tier2,3 and running tiers 4,5,6 in progress. >> Thanks to Kim for his previous feedback. > > src/hotspot/share/prims/jvmtiTagMap.cpp line 1162: > >> 1160: if (_needs_cleaning) { >> 1161: // Recheck whether to post object free events under the lock. >> 1162: post_object_free = post_object_free && env()->is_enabled(JVMTI_EVENT_OBJECT_FREE); > > Where is `is_enabled` called without the lock being held in a caller of `remove_dead_entries()`? void JvmtiTagMap::flush_object_free_events() { assert_not_at_safepoint(); if (env()->is_enabled(JVMTI_EVENT_OBJECT_FREE)) { Called by JVMTI to disable events and called by the service thread. 
And here for get_objects_with_tags: if (collector.some_dead_found() && env()->is_enabled(JVMTI_EVENT_OBJECT_FREE)) { post_dead_objects_on_vm_thread(); } ------------- PR: https://git.openjdk.java.net/jdk/pull/1439 From coleenp at openjdk.java.net Mon Nov 30 12:59:58 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 30 Nov 2020 12:59:58 GMT Subject: RFR: 8256830: misc tests failed with "assert(env->is_enabled(JVMTI_EVENT_OBJECT_FREE)) failed: checking" In-Reply-To: <1NGa5DMWUHpmh3JWelXH4X2bQRt7pymebSQ19tywORM=.95e3ef25-e234-4daf-89f3-ea481b9c2266@github.com> References: <1NGa5DMWUHpmh3JWelXH4X2bQRt7pymebSQ19tywORM=.95e3ef25-e234-4daf-89f3-ea481b9c2266@github.com> Message-ID: On Wed, 25 Nov 2020 23:25:55 GMT, Kim Barrett wrote: >> The ServiceThread cleaning used a stale ObjectFree state when calling remove_dead_entries, because another thread had concurrently set is_enabled to false. Add a lock around setting/resetting the lock event state and retest the state under a lock. Ran the test 100s of time without failure, where otherwise it fails very quickly. >> Tested with tier2,3 and running tiers 4,5,6 in progress. >> Thanks to Kim for his previous feedback. > > src/hotspot/share/prims/jvmtiEventController.cpp line 463: > >> 461: ((now_enabled & OBJECT_FREE_BIT)) != 0) { >> 462: // Set/reset the event enabled under the tagmap lock. >> 463: set_enabled_events_with_lock(env, now_enabled); > > You could tighten up the test to only handle specially when the state of the ObjectFree event is changing, i.e. > > (((was_enabled ^ now_enabled) & OBJECT_FREE_BIT) != 0)``` > > Or you could not bother with the conditionalization at all, and just always call set_enabled_events_with_lock; I bet nobody would notice any performance difference. That would eliminate "benign" races between unlocked bit setting here and bit testing in remove_dead_entries_locked. 
Of course, the current implementation has such races on these bits all over the place; what's one race more or less among friends... > > Or you could just leave it as you have it. Your call. I like the idea of resetting the enabled bits under a lock unconditionally. I'll do that. ------------- PR: https://git.openjdk.java.net/jdk/pull/1439 From pliden at openjdk.java.net Mon Nov 30 13:03:01 2020 From: pliden at openjdk.java.net (Per Liden) Date: Mon, 30 Nov 2020 13:03:01 GMT Subject: RFR: 8257415: ZGC: Fix barrier_data types Message-ID: The `barrier_data` is a `uint8_t`, but we sometimes pass it around as an `int`. With this patch we always treat it as a `uint8_t`. ------------- Commit messages: - 8257415: ZGC: Fix barrier_data types Changes: https://git.openjdk.java.net/jdk/pull/1514/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1514&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257415 Stats: 7 lines in 4 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/1514.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1514/head:pull/1514 PR: https://git.openjdk.java.net/jdk/pull/1514 From aph at redhat.com Mon Nov 30 13:29:40 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 30 Nov 2020 13:29:40 +0000 Subject: RFR: 8257396: AArch64 Zero build is broken after JDK-8252684 In-Reply-To: References: Message-ID: On 11/30/20 12:37 PM, David Holmes wrote: > On Mon, 30 Nov 2020 08:49:54 GMT, Aleksey Shipilev wrote: > >> Zero does not have AArch64 assembler, so attempt to use it from the test fails to build. The fix is trivial: sense if we are building Zero. >> >> Testing: >> - [x] Linux aarch64 zero build > > Lets hope this is the last adjustment needed in relation to this change. Aren't the pre-integration tests supposed to pick up this kind of thing? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at openjdk.java.net Mon Nov 30 13:36:56 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 30 Nov 2020 13:36:56 GMT Subject: RFR: 8257396: AArch64 Zero build is broken after JDK-8252684 In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 12:35:35 GMT, David Holmes wrote: > Lets hope this is the last adjustment needed in relation to this change. All builds are green for me, except AArch64 Zero. > Aren't the pre-integration tests supposed to pick up this kind of thing? GH actions do not build AArch64 Zero. GH actions build both x86_64 Zero and AArch64 Server. [My CIs](https://builds.shipilev.net/openjdk-jdk/), on the other hand, make a point to build every configuration out there every night -- at expense of CPU time spent on it -- and thus capture even the odd corner cases like these. ------------- PR: https://git.openjdk.java.net/jdk/pull/1511 From coleenp at openjdk.java.net Mon Nov 30 13:57:14 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 30 Nov 2020 13:57:14 GMT Subject: RFR: 8256830: misc tests failed with "assert(env->is_enabled(JVMTI_EVENT_OBJECT_FREE)) failed: checking" [v2] In-Reply-To: References: Message-ID: <5Ed2eVORB8pJ0BmKISasvgLZCT4HIjJt-YA4sjZZW0k=.96a9d20c-f931-493a-83c2-2444e2646896@github.com> > The ServiceThread cleaning used a stale ObjectFree state when calling remove_dead_entries, because another thread had concurrently set is_enabled to false. Add a lock around setting/resetting the lock event state and retest the state under a lock. Ran the test 100s of time without failure, where otherwise it fails very quickly. > Tested with tier2,3 and running tiers 4,5,6 in progress. > Thanks to Kim for his previous feedback. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Make enable events lock unconditionally if tagmap present. 
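The locking scheme described in this thread (always update the enabled bits under the tag map lock, and have the cleaner recheck the ObjectFree state under that same lock) can be sketched roughly as below. All names are hypothetical and `std::mutex` stands in for whatever lock the VM uses; this is not the HotSpot implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical bit for the ObjectFree event; not the HotSpot constant.
constexpr uint64_t OBJECT_FREE_BIT = 1u << 0;

// Hypothetical, simplified tag map: only the locking discipline is modelled.
class TagMapSketch {
 public:
  // Writer side: every change to the enabled bits happens under the lock,
  // unconditionally, so a reader holding the lock never sees a half-updated view.
  void set_enabled_events(uint64_t now_enabled) {
    std::lock_guard<std::mutex> guard(_lock);
    _enabled = now_enabled;
  }

  // Cleaner side: the caller's `post_object_free` may be stale because it was
  // sampled before the lock was taken, so it is rechecked under the lock.
  // Returns true if ObjectFree events would be posted.
  bool remove_dead_entries(bool post_object_free) {
    std::lock_guard<std::mutex> guard(_lock);
    post_object_free = post_object_free && (_enabled & OBJECT_FREE_BIT) != 0;
    // ... dead entries would be removed here, posting events only if still enabled ...
    return post_object_free;
  }

 private:
  std::mutex _lock;
  uint64_t _enabled = 0;
};
```

The design choice this illustrates is that a flag read outside the lock is treated as a hint only; the decision that matters is made from the value observed while holding the same lock the writer uses.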
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1439/files - new: https://git.openjdk.java.net/jdk/pull/1439/files/1d9e6bef..ca1715b9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1439&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1439&range=00-01 Stats: 8 lines in 1 file changed: 0 ins; 6 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1439.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1439/head:pull/1439 PR: https://git.openjdk.java.net/jdk/pull/1439 From martin.doerr at sap.com Mon Nov 30 14:20:53 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 30 Nov 2020 14:20:53 +0000 Subject: [11u] RFR: JDK-8255734: VM should ignore SIGXFSZ on ppc64, s390 too In-Reply-To: References: Message-ID: Hi Thomas, backport looks good. Thanks for doing it. Best regards, Martin > -----Original Message----- > From: jdk-updates-dev On > Behalf Of Thomas Stüfe > Sent: Montag, 30. November 2020 13:39 > To: HotSpot Open Source Developers ; > jdk-updates-dev > Subject: [11u] RFR: JDK-8255734: VM should ignore SIGXFSZ on ppc64, s390 > too > > Hi, > > may I have reviews please for the following backport: > > Original issue: https://bugs.openjdk.java.net/browse/JDK-8255734 > Original patch: https://github.com/openjdk/jdk/commit/54c88132.diff > Modified patch: > http://cr.openjdk.java.net/~stuefe/webrevs/backports/8255734-VM- > should-ignore-SIGXFSZ-on-ppc64-s390-too.diff > > Patch is trivial and nothing changed, but did not apply correctly since > "8252324: Signal related code should be shared among POSIX platforms" > shuffled a lot of coding around.
> > Thanks, Thomas From smonteith at openjdk.java.net Mon Nov 30 14:48:58 2020 From: smonteith at openjdk.java.net (Stuart Monteith) Date: Mon, 30 Nov 2020 14:48:58 GMT Subject: RFR: 8257415: ZGC: Fix barrier_data types In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 12:42:00 GMT, Per Liden wrote: > The `barrier_data` is an `uin8_t`, but we sometimes pass it around as an `int`. With this patch we always treat it as an `uint8_t`. Looks good to me. (I'm not a reviewer). ------------- Marked as reviewed by smonteith (Author). PR: https://git.openjdk.java.net/jdk/pull/1514 From mdoerr at openjdk.java.net Mon Nov 30 15:42:03 2020 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 30 Nov 2020 15:42:03 GMT Subject: RFR: 8257423: [PPC64] Support -XX:-UseInlineCaches Message-ID: <66d_EAED3_BRpr9YmIrv1seQFfyUOIPdntMllUyJa-Q=.61c545be-b26d-4d3a-8ac6-e7239656b7a6@github.com> The JVM currently runs into Unimplemented() when using -XX:-UseInlineCaches in C2 code (postalloc_expand_java_dynamic_call_sched). I'd like to enable the existing code in postalloc_expand_java_dynamic_call_sched and fix MachCallDynamicJavaNode::ret_addr_offset() and MacroAssembler::instr_size_for_decode_klass_not_null(). I suggest to use scratch emit to determine the size, because there are too many cases and emitting it once is fast. 
------------- Commit messages: - 8257423: [PPC64] Support -XX:-UseInlineCaches Changes: https://git.openjdk.java.net/jdk/pull/1521/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1521&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8257423 Stats: 31 lines in 2 files changed: 15 ins; 7 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/1521.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1521/head:pull/1521 PR: https://git.openjdk.java.net/jdk/pull/1521 From jlahoda at openjdk.java.net Mon Nov 30 15:59:07 2020 From: jlahoda at openjdk.java.net (Jan Lahoda) Date: Mon, 30 Nov 2020 15:59:07 GMT Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) [v2] In-Reply-To: References: Message-ID: > This pull request replaces https://github.com/openjdk/jdk/pull/1227. > > From the original PR: > >> Please review the code for the second iteration of sealed classes. In this iteration we are: >> >> * Enhancing narrowing reference conversion to allow for stricter checking of cast conversions with respect to sealed type hierarchies >> >> * Also local classes are not considered when determining implicitly declared permitted direct subclasses of a sealed class or sealed interface >> >> * renaming Class::permittedSubclasses to Class::getPermittedSubclasses, still in the same method, the return type has been changed to Class[] instead of the previous ClassDesc[] >> >> * adding code to make sure that annotations can't be sealed >> >> * improving some tests >> >> >> TIA >> >> Related specs: >> [Sealed Classes JSL](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jls.html) >> [Sealed Classes JVMS](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jvms.html) >> [Additional: Contextual Keywords](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/contextual-keywords-jls.html) > > This PR strives to reflect the review comments from 1227: > * 
adjustments to javadoc of j.l.Class methods > * package access checks in Class.getPermittedSubclasses() > * fixed to the narrowing conversion/castability as pointed out by Maurizio Jan Lahoda has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - Improving getPermittedSubclasses javadoc. - Merge branch 'master' into JDK-8246778 - Moving checkPackageAccess from getPermittedSubclasses to a separate method. - Improving getPermittedSubclasses() javadoc. - Enhancing the Class.getPermittedSubclasses() test to verify behavior both for sealed classes in named and unnamed modules. - Removing unnecessary file. - Tweaking javadoc. - Reflecting review comments w.r.t. narrowing conversion. - Improving checks in getPermittedSubclasses() - Merging master into JDK-8246778 - ... and 2 more: https://git.openjdk.java.net/jdk/compare/6e006223...4d484179 ------------- Changes: https://git.openjdk.java.net/jdk/pull/1483/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1483&range=01 Stats: 918 lines in 12 files changed: 837 ins; 9 del; 72 mod Patch: https://git.openjdk.java.net/jdk/pull/1483.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1483/head:pull/1483 PR: https://git.openjdk.java.net/jdk/pull/1483 From jlahoda at openjdk.java.net Mon Nov 30 15:59:09 2020 From: jlahoda at openjdk.java.net (Jan Lahoda) Date: Mon, 30 Nov 2020 15:59:09 GMT Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) [v2] In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 09:55:56 GMT, Alan Bateman wrote: >> Jan Lahoda has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Improving getPermittedSubclasses javadoc. >> - Merge branch 'master' into JDK-8246778 >> - Moving checkPackageAccess from getPermittedSubclasses to a separate method. >> - Improving getPermittedSubclasses() javadoc. 
>> - Enhancing the Class.getPermittedSubclasses() test to verify behavior both for sealed classes in named and unnamed modules. >> - Removing unnecessary file. >> - Tweaking javadoc. >> - Reflecting review comments w.r.t. narrowing conversion. >> - Improving checks in getPermittedSubclasses() >> - Merging master into JDK-8246778 >> - ... and 2 more: https://git.openjdk.java.net/jdk/compare/6e006223...4d484179 > > src/java.base/share/classes/java/lang/Class.java line 4420: > >> 4418: * {@linkplain #getClassLoader() the defining class loader} of the current >> 4419: * {@code Class} object. If a name cannot be converted to the {@code Class} >> 4420: * instance, it is silently excluded from the result. > > I think this paragraph will need a little bit of wordsmithing. The 3rd paragraph of getNestMembers might be useful to examine as it more clearly describes how the method attempts to "obtain" the Class object for each of the class names in the NestMembers attribute and maybe some of that wording could be used instead of using the term "convert". > > Minor nit but the prevailing style for the @throws SecurityException is to align the description with the exception, probably best to keep it consistent if you can. Thanks, I've tried to improve the javadoc here: https://github.com/openjdk/jdk/pull/1483/commits/4d484179e6e4d64ed460b997d25b4dca5d964016 ------------- PR: https://git.openjdk.java.net/jdk/pull/1483 From thomas.stuefe at gmail.com Mon Nov 30 16:29:39 2020 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Mon, 30 Nov 2020 17:29:39 +0100 Subject: [11u] RFR: JDK-8255734: VM should ignore SIGXFSZ on ppc64, s390 too In-Reply-To: References: Message-ID: Thanks, Martin. On Mon, Nov 30, 2020 at 3:20 PM Doerr, Martin wrote: > Hi Thomas, > > backport looks good. Thanks for doing it. > > Best regards, > Martin > > > > -----Original Message----- > > From: jdk-updates-dev On > > Behalf Of Thomas St?fe > > Sent: Montag, 30. 
November 2020 13:39 > > To: HotSpot Open Source Developers ; > > jdk-updates-dev > > Subject: [11u] RFR: JDK-8255734: VM should ignore SIGXFSZ on ppc64, s390 > > too > > > > Hi, > > > > may I have reviews please for the following backport: > > > > Original issue: https://bugs.openjdk.java.net/browse/JDK-8255734 > > Original patch: https://github.com/openjdk/jdk/commit/54c88132.diff > > Modified patch: > > http://cr.openjdk.java.net/~stuefe/webrevs/backports/8255734-VM- > > should-ignore-SIGXFSZ-on-ppc64-s390-too.diff > > > > Patch is trivial and nothing changed, but did not apply correctly since > > "8252324: Signal related code should be shared among POSIX platforms" > > shuffled a lot of coding around. > > > > Thanks, Thomas > From pliden at openjdk.java.net Mon Nov 30 17:14:57 2020 From: pliden at openjdk.java.net (Per Liden) Date: Mon, 30 Nov 2020 17:14:57 GMT Subject: RFR: 8254733: HotSpot Style Guide should permit using range-based for loops In-Reply-To: References: Message-ID: On Sat, 28 Nov 2020 06:37:57 GMT, Kim Barrett wrote: > Please review and vote on this change to the HotSpot Style Guide to > permit the use of range-based `for` loops in HotSpot code. Range-based > `for` is a feature added in C++11. > > This is a modification of the Style Guide, so rough consensus among > the HotSpot Group members is required to make this change. Only Group > members should vote for approval (via the github PR), though reasoned > objections or comments from anyone will be considered. A decision on > this proposal will not be made before Monday 7-Dec-2020 at 12h00 UTC. > > Since we're piggybacking on github PRs here, please use the PR review > process to approve (click on Review Changes > Approve), rather than > sending a "vote: yes" email reply that would be normal for a CFV. > Other responses can still use email of course. Marked as reviewed by pliden (Reviewer).
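For readers unfamiliar with the feature being voted on, a generic illustration of a C++11 range-based `for` loop follows. It uses `std::vector` purely for brevity; that choice and the function names are assumptions of this sketch and say nothing about which containers HotSpot code would iterate over.

```cpp
#include <cassert>
#include <vector>

// Reads each element by value; no explicit iterator or index is needed.
inline int sum(const std::vector<int>& values) {
  int total = 0;
  for (int v : values) {
    total += v;
  }
  return total;
}

// Binding the loop variable by reference allows modifying elements in place.
inline void double_all(std::vector<int>& values) {
  for (int& v : values) {
    v *= 2;
  }
}
```

The loop works with any type for which `begin()` and `end()` can be found, as well as with built-in arrays.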
------------- PR: https://git.openjdk.java.net/jdk/pull/1488 From ioi.lam at oracle.com Mon Nov 30 17:35:48 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Mon, 30 Nov 2020 09:35:48 -0800 Subject: [PING] RFR: 8256254: Convert vmIntrinsics::ID to enum class [v2] In-Reply-To: References: Message-ID: <9e3d3023-430a-acd8-b952-49180c56e9c6@oracle.com> May I have a second reviewer? Thanks - Ioi On 11/18/20 2:25 AM, Claes Redestad wrote: > On Tue, 17 Nov 2020 23:16:17 GMT, Ioi Lam wrote: > >>> This PR follows the same style as https://github.com/openjdk/jdk/pull/276, except this time I am converting `vmIntrinsics::ID` to `vmIntrinsicID`: >>> >>> * Convert `vmIntrinsics::ID` to `enum class` to provide better type safety. >>> * Also, put this enum class in the top level, so it can be forward-declared. I.e., `enum class vmIntrinsicID : int;`. This avoids excessive inclusion of vmIntrinsics.hpp and vmSymbols.hpp (which were included indirectly by almost every hotspot source file). >>> * vmIntrinsics.hpp: was included 805 times, now included 414 times >>> * vmSymbols.hpp: was included 805 times, now included 394 times >>> * Note: more #include reduction will be done in [JDK-8256424](https://bugs.openjdk.java.net/browse/JDK-8256424) >>> >>> Many files are changed, but most of the changes are minor >>> * Added missing dependencies of vmSymbols.hpp and/or vmIntrinsics.hpp >>> * safe conversion between vmIntrinsicID and integer types (see comments around `vmIntrinsics::as_int()`) >>> >>> Since we have a lot of references like `vmIntrinsics::_invokeGeneric`, I added aliases like >>> >>> static const vmIntrinsicID vmIntrinsics::_invokeGeneric = vmIntrinsicID::_invokeGeneric >>> >>> so we don't need to change over a thousand `vmIntrinsics::XXX` to `vmIntrinsicID::XXX`. >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @cl4es reviews > Marked as reviewed by redestad (Reviewer).
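The three ideas in the quoted PR description (a top-level scoped enum that can be forward-declared, an explicit conversion to integer, and aliases that keep old-style references compiling) can be sketched as below. The names `IntrinsicID`, `Intrinsics` and `is_known` are hypothetical simplifications chosen for this illustration; they are not the actual HotSpot declarations.

```cpp
#include <cassert>

// Forward (opaque) declaration: allowed because the underlying type is fixed.
// A header that only mentions the type can stop here and skip the full definition.
enum class IntrinsicID : int;

// Example of a declaration that needs only the forward declaration.
bool is_known(IntrinsicID id);

// Full definition, which would normally live in its own header.
enum class IntrinsicID : int {
  _none = 0,
  _invokeGeneric,
  ID_LIMIT
};

// Hypothetical holder class providing aliases, so existing references written as
// Intrinsics::_invokeGeneric keep compiling after the enum moved to the top level.
struct Intrinsics {
  static constexpr IntrinsicID _none = IntrinsicID::_none;
  static constexpr IntrinsicID _invokeGeneric = IntrinsicID::_invokeGeneric;

  // Scoped enums do not convert implicitly, so the conversion is spelled out once.
  static constexpr int as_int(IntrinsicID id) { return static_cast<int>(id); }
};

bool is_known(IntrinsicID id) {
  return id != Intrinsics::_none &&
         Intrinsics::as_int(id) < Intrinsics::as_int(IntrinsicID::ID_LIMIT);
}
```

With this shape, code such as `Intrinsics::as_int(Intrinsics::_invokeGeneric)` still compiles, while an accidental `int x = IntrinsicID::_invokeGeneric;` is rejected by the compiler.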
> > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/1237 From alanb at openjdk.java.net Mon Nov 30 17:46:04 2020 From: alanb at openjdk.java.net (Alan Bateman) Date: Mon, 30 Nov 2020 17:46:04 GMT Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) [v2] In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 15:59:07 GMT, Jan Lahoda wrote: >> This pull request replaces https://github.com/openjdk/jdk/pull/1227. >> >> From the original PR: >> >>> Please review the code for the second iteration of sealed classes. In this iteration we are: >>> >>> * Enhancing narrowing reference conversion to allow for stricter checking of cast conversions with respect to sealed type hierarchies >>> >>> * Also local classes are not considered when determining implicitly declared permitted direct subclasses of a sealed class or sealed interface >>> >>> * renaming Class::permittedSubclasses to Class::getPermittedSubclasses, still in the same method, the return type has been changed to Class[] instead of the previous ClassDesc[] >>> >>> * adding code to make sure that annotations can't be sealed >>> >>> * improving some tests >>> >>> >>> TIA >>> >>> Related specs: >>> [Sealed Classes JSL](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jls.html) >>> [Sealed Classes JVMS](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jvms.html) >>> [Additional: Contextual Keywords](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/contextual-keywords-jls.html) >> >> This PR strives to reflect the review comments from 1227: >> * adjustments to javadoc of j.l.Class methods >> * package access checks in Class.getPermittedSubclasses() >> * fixed to the narrowing conversion/castability as pointed out by Maurizio > > Jan Lahoda has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 12 commits: > > - Improving getPermittedSubclasses javadoc. > - Merge branch 'master' into JDK-8246778 > - Moving checkPackageAccess from getPermittedSubclasses to a separate method. > - Improving getPermittedSubclasses() javadoc. > - Enhancing the Class.getPermittedSubclasses() test to verify behavior both for sealed classes in named and unnamed modules. > - Removing unnecessary file. > - Tweaking javadoc. > - Reflecting review comments w.r.t. narrowing conversion. > - Improving checks in getPermittedSubclasses() > - Merging master into JDK-8246778 > - ... and 2 more: https://git.openjdk.java.net/jdk/compare/6e006223...4d484179 src/java.base/share/classes/java/lang/Class.java line 4412: > 4410: * The {@code Class} objects which can be obtained using this procedure > 4411: * are indicated by elements of the returned array. If a {@code Class} object > 4412: * cannot be obtained, it is silently ignored, and not included in the result Thanks for the update, this reads much better. ------------- PR: https://git.openjdk.java.net/jdk/pull/1483 From kvn at openjdk.java.net Mon Nov 30 18:18:55 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 30 Nov 2020 18:18:55 GMT Subject: RFR: 8256536: Newer AMD 19h (EPYC) Processor family defaults In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 12:01:37 GMT, Rohit Arul Raj wrote: >> Good. > > @vnkozlov > > Hello Vladimir, can you please sponsor this patch? > Another query I have is that to back port this patch to JDK11 branch, will opening a RFE be enough? > > Regards, > Rohit For backports you have to use the same Bug ID 8256536 - do NOT create a new RFE.
------------- PR: https://git.openjdk.java.net/jdk/pull/1288 From kvn at openjdk.java.net Mon Nov 30 18:21:56 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 30 Nov 2020 18:21:56 GMT Subject: RFR: 8256536: Newer AMD 19h (EPYC) Processor family defaults In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 18:15:54 GMT, Vladimir Kozlov wrote: >> @vnkozlov >> >> Hello Vladimir, can you please sponsor this patch? >> Another query I have is that to back port this patch to JDK11 branch, will opening a RFE be enough? >> >> Regards, >> Rohit > > For backports you have to use the same Bug ID 8256536 - do NOT create a new RFE. You do need to add a backport 'Fix Request' to this RFE (8256536): http://openjdk.java.net/projects/jdk-updates/approval.html ------------- PR: https://git.openjdk.java.net/jdk/pull/1288 From kvn at openjdk.java.net Mon Nov 30 18:37:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 30 Nov 2020 18:37:57 GMT Subject: RFR: 8256254: Convert vmIntrinsics::ID to enum class [v2] In-Reply-To: References: Message-ID: <6MtLoYmFPbnvkoayFiKJ90WbxJgz7Ls-3v_DTohxyB4=.383e49f6-7d19-432e-b0d2-19f5edc1fc91@github.com> On Tue, 17 Nov 2020 23:16:17 GMT, Ioi Lam wrote: >> This PR follows the same style as https://github.com/openjdk/jdk/pull/276, except this time I am converting `vmIntrinsics::ID` to `vmIntrinsicID`: >> >> * Convert `vmIntrinsics::ID` to `enum class` to provide better type safety. >> * Also, put this enum class in the top level, so it can be forward-declared. I.e., `enum class vmIntrinsicID : int;`. This avoids excessive inclusion of vmIntrinsics.hpp and vmSymbols.hpp (which were included indirectly by almost every hotspot source file).
>> * vmIntrinsics.hpp: was included 805 times, now included 414 times >> * vmSymbols.hpp: was included 805 times, now include 394 times >> * Note: more #include reduction will be done in [JDK-8256424](https://bugs.openjdk.java.net/browse/JDK-8256424) >> >> Many files are changed, but most of them are minor >> * Added missing dependencies of vmSymbols.hpp and/or vmIntrinsics.hpp >> * safe conversion between vmIntrinsicID and integer types (see comments around `vmIntrinsics::as_int()`) >> >> Since we have a lot of references like `vmIntrinsics::_invokeGeneric`, I added aliases like >> >> static const vmIntrinsicID vmIntrinsics::_invokeGeneric = vmIntrinsicID::_invokeGeneric >> >> so we don't need to change over a thousand `vmIntrinsics::XXX` to `vmIntrinsicID::XXX`. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @cl4es reviews Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1237 From kvn at openjdk.java.net Mon Nov 30 19:03:57 2020 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 30 Nov 2020 19:03:57 GMT Subject: RFR: 8254733: HotSpot Style Guide should permit using range-based for loops In-Reply-To: References: Message-ID: On Sat, 28 Nov 2020 06:37:57 GMT, Kim Barrett wrote: > Please review and vote on this change to the HotSpot Style Guide to > permit the use of range-based `for` loops in HotSpot code. Range-based > `for` is a feature added in C++11. > > This is a modification of the Style Guide, so rough consensus among > the HotSpot Group members is required to make this change. Only Group > members should vote for approval (via the github PR), though reasoned > objectsions or comments from anyone will be considered. A decision on > this proposal will not be made before Monday 7-Dec-2020 at 12h00 UTC. 
>
> Since we're piggybacking on github PRs here, please use the PR review
> process to approve (click on Review Changes > Approve), rather than
> sending a "vote: yes" email reply that would be normal for a CFV.
> Other responses can still use email of course.

@kimbarrett
Unrelated to these changes, which are fine:
I looked again at the voting description for Style Guide changes. It refers to `rough consensus`, which is not in the OpenJDK bylaws:
https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.html#L69

I think that is a bug (separate from these changes) and should be fixed by using our rule http://openjdk.java.net/bylaws#three-vote-consensus
Counting the Project Lead's final vote, we would need votes from at least 2 other members during a 2-week review period. I think that is similar to `rough consensus`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1488

From mchung at openjdk.java.net Mon Nov 30 19:36:02 2020
From: mchung at openjdk.java.net (Mandy Chung)
Date: Mon, 30 Nov 2020 19:36:02 GMT
Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) [v2]
In-Reply-To:
References:
Message-ID: <1b8GH2EQDJvztsMclvskMZrsYcjnapSyqqGMGcutLTY=.b2ef4f3c-0e0d-43a9-9890-cdfbd12f53e9@github.com>

On Mon, 30 Nov 2020 15:59:07 GMT, Jan Lahoda wrote:

>> This pull request replaces https://github.com/openjdk/jdk/pull/1227.
>>
>> From the original PR:
>>
>>> Please review the code for the second iteration of sealed classes.
In this iteration we are: >>> >>> * Enhancing narrowing reference conversion to allow for stricter checking of cast conversions with respect to sealed type hierarchies >>> >>> * Also local classes are not considered when determining implicitly declared permitted direct subclasses of a sealed class or sealed interface >>> >>> * renaming Class::permittedSubclasses to Class::getPermittedSubclasses, still in the same method, the return type has been changed to Class[] instead of the previous ClassDesc[] >>> >>> * adding code to make sure that annotations can't be sealed >>> >>> * improving some tests >>> >>> >>> TIA >>> >>> Related specs: >>> [Sealed Classes JSL](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jls.html) >>> [Sealed Classes JVMS](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jvms.html) >>> [Additional: Contextual Keywords](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/contextual-keywords-jls.html) >> >> This PR strives to reflect the review comments from 1227: >> * adjustments to javadoc of j.l.Class methods >> * package access checks in Class.getPermittedSubclasses() >> * fixed to the narrowing conversion/castability as pointed out by Maurizio > > Jan Lahoda has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Improving getPermittedSubclasses javadoc. > - Merge branch 'master' into JDK-8246778 > - Moving checkPackageAccess from getPermittedSubclasses to a separate method. > - Improving getPermittedSubclasses() javadoc. > - Enhancing the Class.getPermittedSubclasses() test to verify behavior both for sealed classes in named and unnamed modules. > - Removing unnecessary file. > - Tweaking javadoc. > - Reflecting review comments w.r.t. narrowing conversion. > - Improving checks in getPermittedSubclasses() > - Merging master into JDK-8246778 > - ... 
and 2 more: https://git.openjdk.java.net/jdk/compare/6e006223...4d484179 src/java.base/share/classes/java/lang/Class.java line 4463: > 4461: * @apiNote > 4462: * Sealed class or interface has no relationship with > 4463: * {@linkplain Package#isSealed package sealing}. Package sealing is legacy. Remi suggests to take out this api note which sounds good to me. The API note in Package::isSealed has made this clear which has no relationship with sealed class or interface. src/java.base/share/classes/java/lang/Class.java line 4415: > 4413: * array. > 4414: * > 4415: * @return an array of class objects of the permitted subclasses of this class or interface Nit: s/class objects/{@code Class} objects/ ------------- PR: https://git.openjdk.java.net/jdk/pull/1483 From mchung at openjdk.java.net Mon Nov 30 19:48:00 2020 From: mchung at openjdk.java.net (Mandy Chung) Date: Mon, 30 Nov 2020 19:48:00 GMT Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) [v2] In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 15:59:07 GMT, Jan Lahoda wrote: >> This pull request replaces https://github.com/openjdk/jdk/pull/1227. >> >> From the original PR: >> >>> Please review the code for the second iteration of sealed classes. 
In this iteration we are: >>> >>> * Enhancing narrowing reference conversion to allow for stricter checking of cast conversions with respect to sealed type hierarchies >>> >>> * Also local classes are not considered when determining implicitly declared permitted direct subclasses of a sealed class or sealed interface >>> >>> * renaming Class::permittedSubclasses to Class::getPermittedSubclasses, still in the same method, the return type has been changed to Class[] instead of the previous ClassDesc[] >>> >>> * adding code to make sure that annotations can't be sealed >>> >>> * improving some tests >>> >>> >>> TIA >>> >>> Related specs: >>> [Sealed Classes JSL](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jls.html) >>> [Sealed Classes JVMS](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jvms.html) >>> [Additional: Contextual Keywords](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/contextual-keywords-jls.html) >> >> This PR strives to reflect the review comments from 1227: >> * adjustments to javadoc of j.l.Class methods >> * package access checks in Class.getPermittedSubclasses() >> * fixed to the narrowing conversion/castability as pointed out by Maurizio > > Jan Lahoda has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Improving getPermittedSubclasses javadoc. > - Merge branch 'master' into JDK-8246778 > - Moving checkPackageAccess from getPermittedSubclasses to a separate method. > - Improving getPermittedSubclasses() javadoc. > - Enhancing the Class.getPermittedSubclasses() test to verify behavior both for sealed classes in named and unnamed modules. > - Removing unnecessary file. > - Tweaking javadoc. > - Reflecting review comments w.r.t. narrowing conversion. > - Improving checks in getPermittedSubclasses() > - Merging master into JDK-8246778 > - ... 
and 2 more: https://git.openjdk.java.net/jdk/compare/6e006223...4d484179 src/java.base/share/classes/java/lang/Class.java line 4480: > 4478: } > 4479: > 4480: private native Class[] getPermittedSubclasses0(); Does this JVM method return the permitted subclasses or subinterfaces with the following conditions enforced by JLS: - If a sealed class C belongs to a named module, then every class named in the permits clause of the declaration of C must belong to the same module as C - If a sealed class C belongs to an unnamed module, then every class named in the permits clause of the declaration of C must belong to the same package as C Should the library implementation of `Class::getPermittedSubclasses` filter that if not done by `getPermittedSubclasses0`? If the return array contains only classes as specified above, `checkPackageAccessForClasses` can be simplified. ------------- PR: https://git.openjdk.java.net/jdk/pull/1483 From shade at openjdk.java.net Mon Nov 30 19:55:10 2020 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 30 Nov 2020 19:55:10 GMT Subject: RFR: 8252505: C1/C2 compiler support for blackholes [v8] In-Reply-To: References: Message-ID: > JMH uses the [`Blackhole::consume`](https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-core/src/main/java/org/openjdk/jmh/infra/Blackhole.java#l153) methods to avoid dead-code elimination of the code that produces benchmark values. It now relies on producing opaque side-effects and breaking inlining. While it was proved useful for many years, it unfortunately comes with several major drawbacks: > > 1. Call costs dominate nanobenchmarks. On TR 3970X, the call cost is several nanoseconds. > 2. The work spent in Blackhole.consume dominates nanobenchmarks too. It takes about a nanosecond on TR 3970X. > 3. Argument preparation for call makes different argument types behave differently. This is prominent on architectures where calling conventions for passing e.g. 
floating-point arguments require elaborate dance. > > Supporting this directly in compilers would improve nanobenchmark fidelity. > > Instead of introducing public APIs or special-casing JMH methods in JVM, we can hook a new command to compiler control, and let JMH sign up its Blackhole methods for it with `-XX:CompileCommand=blackhole,org.openjdk.jmh.infra.Blackhole::consume`. This is being prototyped as [CODETOOLS-7902762](https://bugs.openjdk.java.net/browse/CODETOOLS-7902762). It makes Blackholes behave [substantially better](http://cr.openjdk.java.net/~shade/8252505/bh-old-vs-new.png). > > Current prototype is for initial approach review and early testing. I am open for suggestions how to make it simpler; not that I haven't tried, but it is likely there is something I am overlooking here. > > C1 code is platform-independent, and it adds the new node which is then lowered to nothing. > > C2 code is more complicated. There were four attempts to implement this, and what you see in the PR is the fourth attempt. > > [First attempt](http://cr.openjdk.java.net/~shade/8252505/webrev.01/) was to introduce fake store like `StoreV` ("store void"), and then lower them to nothing. It runs into a series of funky problems: you would like to have at least two shapes of the store to match the store type width not to confuse the optimizer, or even have the whole mirror of `Store*` hierarchy. Additionally, optimizer tweaks were needed. The awkward problem of GC barrier verification showed up: if `StoreV*` is a subclass of `Store*`, then verificators rightfully expect GC barriers before them. If we emit GC, then we need to handle walking over `StoreV*` nodes in optimizers. > > [Second attempt](http://cr.openjdk.java.net/~shade/8252505/webrev.04/) was to introduce the special `Blackhole` node that consumes the values -- basically like C1 implementation does it. Alas, the many attempts to make sure the new node is not DCE'd failed. 
Roland looked at it, and suggested that there seems to be no way to model the effects we are after: consume the value, but have no observable side effects. So, suggested we instead put the boolean flag onto `CallJavaNode`, and then match it to nothing in `.ad`. > > ...which is the essence of the third attempt. Drag the blackhole through C2 as if it has call-like side effects, and then emit nothing. Instead of boolean flag, the subsequent iteration introduced a full new `CallBlackholeJava` node, that is a call as far as optimizers are concerned, and then it is matched to nothing in `.ad`. This seems to require the least fiddling with C2 internals. Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: - Merge branch 'master' into JDK-8252505-blackholes - Remodel CallBlackholeJava -> CallBlackhole - Rehash tests: remove useless, strengthen existing ones - Merge branch 'master' into JDK-8252505-blackholes - Support for ARM32, PPC, S390 - Fix x86_32 support - Merge branch 'master' into JDK-8252505-blackholes - Fix AArch64 build and test - Fixes after the merge - Merge branch 'master' into JDK-8252505-blackholes - ... 
and 4 more: https://git.openjdk.java.net/jdk/compare/41dbc139...8a84621b ------------- Changes: https://git.openjdk.java.net/jdk/pull/1203/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1203&range=07 Stats: 1109 lines in 47 files changed: 1103 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/1203.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1203/head:pull/1203 PR: https://git.openjdk.java.net/jdk/pull/1203 From sspitsyn at openjdk.java.net Mon Nov 30 20:08:57 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Mon, 30 Nov 2020 20:08:57 GMT Subject: RFR: 8256830: misc tests failed with "assert(env->is_enabled(JVMTI_EVENT_OBJECT_FREE)) failed: checking" [v2] In-Reply-To: <5Ed2eVORB8pJ0BmKISasvgLZCT4HIjJt-YA4sjZZW0k=.96a9d20c-f931-493a-83c2-2444e2646896@github.com> References: <5Ed2eVORB8pJ0BmKISasvgLZCT4HIjJt-YA4sjZZW0k=.96a9d20c-f931-493a-83c2-2444e2646896@github.com> Message-ID: <8XNFwVFqjXQ95gjVxcRvmopfjzqkhxeLwWpVv488Grs=.2c0f8a1d-7a51-487a-a3da-4e0ffa34acd7@github.com> On Mon, 30 Nov 2020 13:57:14 GMT, Coleen Phillimore wrote: >> The ServiceThread cleaning used a stale ObjectFree state when calling remove_dead_entries, because another thread had concurrently set is_enabled to false. Add a lock around setting/resetting the lock event state and retest the state under a lock. Ran the test 100s of time without failure, where otherwise it fails very quickly. >> Tested with tier2,3 and running tiers 4,5,6 in progress. >> Thanks to Kim for his previous feedback. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Make enable events lock unconditionally if tagmap present. Hi Coleen, The fix looks okay to me. Of course, there is some overhead but I'm not sure the impact is significant. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1439 From coleenp at openjdk.java.net Mon Nov 30 20:42:58 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 30 Nov 2020 20:42:58 GMT Subject: RFR: 8256830: misc tests failed with "assert(env->is_enabled(JVMTI_EVENT_OBJECT_FREE)) failed: checking" [v2] In-Reply-To: <8XNFwVFqjXQ95gjVxcRvmopfjzqkhxeLwWpVv488Grs=.2c0f8a1d-7a51-487a-a3da-4e0ffa34acd7@github.com> References: <5Ed2eVORB8pJ0BmKISasvgLZCT4HIjJt-YA4sjZZW0k=.96a9d20c-f931-493a-83c2-2444e2646896@github.com> <8XNFwVFqjXQ95gjVxcRvmopfjzqkhxeLwWpVv488Grs=.2c0f8a1d-7a51-487a-a3da-4e0ffa34acd7@github.com> Message-ID: <6gQu7Xd92VxquOSm5PHSFQ_OOYMiffQGaLuc_gT3H0o=.dc861e81-28a0-4510-9526-9339dc95e5d7@github.com> On Mon, 30 Nov 2020 20:05:43 GMT, Serguei Spitsyn wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Make enable events lock unconditionally if tagmap present. > > Hi Coleen, > The fix looks okay to me. > Of course, there is some overhead but I'm not sure the impact is significant. > Thanks, > Serguei Thank you Serguei! ------------- PR: https://git.openjdk.java.net/jdk/pull/1439 From mchung at openjdk.java.net Mon Nov 30 20:49:02 2020 From: mchung at openjdk.java.net (Mandy Chung) Date: Mon, 30 Nov 2020 20:49:02 GMT Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) [v2] In-Reply-To: References: Message-ID: On Mon, 30 Nov 2020 15:59:07 GMT, Jan Lahoda wrote: >> This pull request replaces https://github.com/openjdk/jdk/pull/1227. >> >> From the original PR: >> >>> Please review the code for the second iteration of sealed classes. 
In this iteration we are: >>> >>> * Enhancing narrowing reference conversion to allow for stricter checking of cast conversions with respect to sealed type hierarchies >>> >>> * Also local classes are not considered when determining implicitly declared permitted direct subclasses of a sealed class or sealed interface >>> >>> * renaming Class::permittedSubclasses to Class::getPermittedSubclasses, still in the same method, the return type has been changed to Class[] instead of the previous ClassDesc[] >>> >>> * adding code to make sure that annotations can't be sealed >>> >>> * improving some tests >>> >>> >>> TIA >>> >>> Related specs: >>> [Sealed Classes JSL](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jls.html) >>> [Sealed Classes JVMS](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/sealed-classes-jvms.html) >>> [Additional: Contextual Keywords](http://cr.openjdk.java.net/~gbierman/jep397/jep397-20201104/specs/contextual-keywords-jls.html) >> >> This PR strives to reflect the review comments from 1227: >> * adjustments to javadoc of j.l.Class methods >> * package access checks in Class.getPermittedSubclasses() >> * fixed to the narrowing conversion/castability as pointed out by Maurizio > > Jan Lahoda has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Improving getPermittedSubclasses javadoc. > - Merge branch 'master' into JDK-8246778 > - Moving checkPackageAccess from getPermittedSubclasses to a separate method. > - Improving getPermittedSubclasses() javadoc. > - Enhancing the Class.getPermittedSubclasses() test to verify behavior both for sealed classes in named and unnamed modules. > - Removing unnecessary file. > - Tweaking javadoc. > - Reflecting review comments w.r.t. narrowing conversion. > - Improving checks in getPermittedSubclasses() > - Merging master into JDK-8246778 > - ... 
and 2 more: https://git.openjdk.java.net/jdk/compare/6e006223...4d484179

src/java.base/share/classes/java/lang/Class.java line 3042:

> 3040: for (Class<?> c : classes) {
> 3041: // skip the package access check on a proxy class in default proxy package
> 3042: if (!Proxy.isProxyClass(c) || ReflectUtil.isNonPublicProxyClass(c)) {

If a sealed class is in a named module, the permitted subclasses/subinterfaces are in the same module as the sealed class. If a sealed class is in an unnamed module, they are in the same runtime package as the sealed class.

A proxy class is dynamically generated and is not intended to be statically named in the `permits` clause of a sealed class. It can be in a different module or a different package. So a permitted subclass or subinterface should never be a proxy class, and the package access check for permitted subclasses/subinterfaces can be simplified. I would suggest this check be inlined in `getPermittedSubclasses` as follows:

    SecurityManager sm = System.getSecurityManager();
    if (subclasses.length > 0 && sm != null) {
        ClassLoader ccl = ClassLoader.getClassLoader(Reflection.getCallerClass());
        ClassLoader cl = getClassLoader0();
        if (ReflectUtil.needsPackageAccessCheck(ccl, cl)) {
            Set<String> packages = new HashSet<>();
            for (Class<?> c : subclasses) {
                if (Proxy.isProxyClass(c))
                    throw new InternalError("a permitted subclass should not be a proxy class: " + c);
                String pkg = c.getPackageName();
                if (!pkg.isEmpty())
                    packages.add(pkg);
            }
            for (String pkg : packages) {
                sm.checkPackageAccess(pkg);
            }
        }
    }

-------------

PR: https://git.openjdk.java.net/jdk/pull/1483

From coleenp at openjdk.java.net Mon Nov 30 20:55:00 2020
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Mon, 30 Nov 2020 20:55:00 GMT
Subject: RFR: 8256830: misc tests failed with "assert(env->is_enabled(JVMTI_EVENT_OBJECT_FREE)) failed: checking" [v2]
In-Reply-To: <6gQu7Xd92VxquOSm5PHSFQ_OOYMiffQGaLuc_gT3H0o=.dc861e81-28a0-4510-9526-9339dc95e5d7@github.com>
References:
<5Ed2eVORB8pJ0BmKISasvgLZCT4HIjJt-YA4sjZZW0k=.96a9d20c-f931-493a-83c2-2444e2646896@github.com> <8XNFwVFqjXQ95gjVxcRvmopfjzqkhxeLwWpVv488Grs=.2c0f8a1d-7a51-487a-a3da-4e0ffa34acd7@github.com> <6gQu7Xd92VxquOSm5PHSFQ_OOYMiffQGaLuc_gT3H0o=.dc861e81-28a0-4510-9526-9339dc95e5d7@github.com> Message-ID: On Mon, 30 Nov 2020 20:40:16 GMT, Coleen Phillimore wrote: >> Hi Coleen, >> The fix looks okay to me. >> Of course, there is some overhead but I'm not sure the impact is significant. >> Thanks, >> Serguei > > Thank you Serguei! I don't know how to measure impact, to be honest. ------------- PR: https://git.openjdk.java.net/jdk/pull/1439 From hseigel at openjdk.java.net Mon Nov 30 20:59:59 2020 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 30 Nov 2020 20:59:59 GMT Subject: RFR: 8246778: Compiler implementation for Sealed Classes (Second Preview) [v2] In-Reply-To: References: Message-ID: <9cyMSNayyu63LkCt6r1Ltrzzv7TPvcsity5IYimZNa4=.f35daee0-3c52-4844-8354-d4b0507c57ff@github.com> On Mon, 30 Nov 2020 19:44:52 GMT, Mandy Chung wrote: >> Jan Lahoda has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Improving getPermittedSubclasses javadoc. >> - Merge branch 'master' into JDK-8246778 >> - Moving checkPackageAccess from getPermittedSubclasses to a separate method. >> - Improving getPermittedSubclasses() javadoc. >> - Enhancing the Class.getPermittedSubclasses() test to verify behavior both for sealed classes in named and unnamed modules. >> - Removing unnecessary file. >> - Tweaking javadoc. >> - Reflecting review comments w.r.t. narrowing conversion. >> - Improving checks in getPermittedSubclasses() >> - Merging master into JDK-8246778 >> - ... 
and 2 more: https://git.openjdk.java.net/jdk/compare/6e006223...4d484179 > > src/java.base/share/classes/java/lang/Class.java line 4480: > >> 4478: } >> 4479: >> 4480: private native Class[] getPermittedSubclasses0(); > > Does this JVM method return the permitted subclasses or subinterfaces with the following conditions enforced by JLS: > > - If a sealed class C belongs to a named module, then every class named in the permits clause of the declaration of C must belong to the same module as C > - If a sealed class C belongs to an unnamed module, then every class named in the permits clause of the declaration of C must belong to the same package as C > > I didn't check the VM implementation. > > If the return array contains only classes as specified above, `checkPackageAccessForClasses` can be simplified. The JVM method that returns the permitted subclasses (and interfaces) does not weed out permitted subclasses based on the above module requirements. It returns all the classes listed in the PermittedSubclasses attribute that it is able to load. ------------- PR: https://git.openjdk.java.net/jdk/pull/1483 From hseigel at openjdk.java.net Mon Nov 30 21:20:06 2020 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 30 Nov 2020 21:20:06 GMT Subject: RFR: 8256718: Obsolete the long term deprecated and aliased Trace flags Message-ID: Please review this change to obsolete the deprecated and aliased Trace flags. The now empty aliased_logging_flags support was left in arguments.cpp for use by trace flags that get deprecated and aliased in the future. With this change, users will get the following example messages when using these obsolete flags, depending on whether -XX:+... or -XX:-... was specified: VM warning: Ignoring option TraceClassPaths; support was removed in 16.0. Please use -Xlog:class+path=info instead. VM warning: Ignoring option TraceClassPaths; support was removed in 16.0. Please use -Xlog:class+path=off instead. 

The change was tested with tiers 1 and 2 on Linux, Windows, and macOS, and tiers 3-5 on Linux x64 and with JCK lang and vm tests.

Thanks, Harold

-------------

Commit messages:
 - 8256718: Obsolete the long term deprecated and aliased Trace flags

Changes: https://git.openjdk.java.net/jdk/pull/1525/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1525&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8256718
Stats: 190 lines in 22 files changed: 51 ins; 112 del; 27 mod
Patch: https://git.openjdk.java.net/jdk/pull/1525.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/1525/head:pull/1525

PR: https://git.openjdk.java.net/jdk/pull/1525

From rraj at openjdk.java.net Mon Nov 30 22:33:22 2020
From: rraj at openjdk.java.net (Rohit Arul Raj)
Date: Mon, 30 Nov 2020 22:33:22 GMT
Subject: Integrated: 8256536: Newer AMD 19h (EPYC) Processor family defaults
In-Reply-To:
References:
Message-ID:

On Wed, 18 Nov 2020 10:36:03 GMT, Rohit Arul Raj wrote:

> This patch sets the following flags as defaults for newer AMD 19h (EPYC) processors.
> bool UseFPUForSpilling = true
> bool UseUnalignedLoadStores = true
> bool UseXMMForArrayCopy = true
> bool UseXMMForObjInit = true
> bool UseFastStosb = false
> bool AlignVector = false
>
> Additional testing: make run-test TEST="tier1 tier2" JTREG="JOBS=128" CONF=linux-x86_64-server-fastdebug
>
> Please review this change.
>
> Thanks,
> Rohit

This pull request has now been integrated.

Changeset: 29f86e00
Author: Rohit Arul Raj
Committer: Vladimir Kozlov
URL: https://git.openjdk.java.net/jdk/commit/29f86e00
Stats: 12 lines in 1 file changed: 9 ins; 0 del; 3 mod

8256536: Newer AMD 19h (EPYC) Processor family defaults

Reviewed-by: kvn

-------------

PR: https://git.openjdk.java.net/jdk/pull/1288

From kvn at openjdk.java.net Mon Nov 30 22:33:10 2020
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Mon, 30 Nov 2020 22:33:10 GMT
Subject: RFR: 8256536: Newer AMD 19h (EPYC) Processor family defaults
In-Reply-To:
References:
Message-ID:

On Mon, 30 Nov 2020 18:19:13 GMT, Vladimir Kozlov wrote:

>> For backports you have to use the same Bug ID 8256536 - do NOT create a new RFE.
>
> You do need to add backport 'Fix Request' to this RFE (8256536):
> http://openjdk.java.net/projects/jdk-updates/approval.html

tier1-3 testing passed on x86.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1288

From jrose at openjdk.java.net Mon Nov 30 22:48:56 2020
From: jrose at openjdk.java.net (John R Rose)
Date: Mon, 30 Nov 2020 22:48:56 GMT
Subject: RFR: 8254733: HotSpot Style Guide should permit using range-based for loops
In-Reply-To:
References:
Message-ID:

On Sat, 28 Nov 2020 06:37:57 GMT, Kim Barrett wrote:

> Please review and vote on this change to the HotSpot Style Guide to
> permit the use of range-based `for` loops in HotSpot code. Range-based
> `for` is a feature added in C++11.
>
> This is a modification of the Style Guide, so rough consensus among
> the HotSpot Group members is required to make this change. Only Group
> members should vote for approval (via the github PR), though reasoned
> objections or comments from anyone will be considered. A decision on
> this proposal will not be made before Monday 7-Dec-2020 at 12h00 UTC.
>
> Since we're piggybacking on github PRs here, please use the PR review
> process to approve (click on Review Changes > Approve), rather than
> sending a "vote: yes" email reply that would be normal for a CFV.
> Other responses can still use email of course.

Marked as reviewed by jrose (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/1488

From dcubed at openjdk.java.net Mon Nov 30 22:48:57 2020
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Mon, 30 Nov 2020 22:48:57 GMT
Subject: RFR: 8254733: HotSpot Style Guide should permit using range-based for loops
In-Reply-To:
References:
Message-ID: <7pzLdET9W-uHeRoam6z3RjB92E_762EOeeXK12FNxu8=.7aa65995-e942-429a-bff7-ffa95a97a391@github.com>

On Sat, 28 Nov 2020 06:37:57 GMT, Kim Barrett wrote:

> Please review and vote on this change to the HotSpot Style Guide to
> permit the use of range-based `for` loops in HotSpot code. Range-based
> `for` is a feature added in C++11.
>
> This is a modification of the Style Guide, so rough consensus among
> the HotSpot Group members is required to make this change. Only Group
> members should vote for approval (via the github PR), though reasoned
> objections or comments from anyone will be considered. A decision on
> this proposal will not be made before Monday 7-Dec-2020 at 12h00 UTC.
>
> Since we're piggybacking on github PRs here, please use the PR review
> process to approve (click on Review Changes > Approve), rather than
> sending a "vote: yes" email reply that would be normal for a CFV.
> Other responses can still use email of course.

Marked as reviewed by dcubed (Reviewer).
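
For readers unfamiliar with the feature under vote in this thread, here is a minimal sketch of a C++11 range-based `for` loop next to the index-based loop it can replace. `std::vector` is used only to keep the sketch self-contained, and the function names `sum_indexed` and `sum_ranged` are hypothetical; none of this is taken from the PR or from HotSpot sources.

```cpp
#include <cstddef>
#include <vector>

// Classic index-based loop over a container.
int sum_indexed(const std::vector<int>& values) {
  int total = 0;
  for (std::size_t i = 0; i < values.size(); i++) {
    total += values[i];
  }
  return total;
}

// The same computation written as a C++11 range-based for loop:
// no index variable and no explicit bounds check to get wrong.
int sum_ranged(const std::vector<int>& values) {
  int total = 0;
  for (int v : values) {
    total += v;
  }
  return total;
}
```

Both functions return the same result for any input; the range-based form simply iterates from `begin()` to `end()` of the container.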
------------- PR: https://git.openjdk.java.net/jdk/pull/1488 From sspitsyn at openjdk.java.net Mon Nov 30 23:24:55 2020 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Mon, 30 Nov 2020 23:24:55 GMT Subject: RFR: 8256830: misc tests failed with "assert(env->is_enabled(JVMTI_EVENT_OBJECT_FREE)) failed: checking" [v2] In-Reply-To: References: <5Ed2eVORB8pJ0BmKISasvgLZCT4HIjJt-YA4sjZZW0k=.96a9d20c-f931-493a-83c2-2444e2646896@github.com> <8XNFwVFqjXQ95gjVxcRvmopfjzqkhxeLwWpVv488Grs=.2c0f8a1d-7a51-487a-a3da-4e0ffa34acd7@github.com> <6gQu7Xd92VxquOSm5PHSFQ_OOYMiffQGaLuc_gT3H0o=.dc861e81-28a0-4510-9526-9339dc95e5d7@github.com> Message-ID: On Mon, 30 Nov 2020 20:51:58 GMT, Coleen Phillimore wrote: >> Thank you Serguei! > > I don't know how to measure impact, to be honest. I'd suggest to separate this potential issue and work on it only if the impact is noticeable. ------------- PR: https://git.openjdk.java.net/jdk/pull/1439 From coleenp at openjdk.java.net Mon Nov 30 23:32:56 2020 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 30 Nov 2020 23:32:56 GMT Subject: RFR: 8256830: misc tests failed with "assert(env->is_enabled(JVMTI_EVENT_OBJECT_FREE)) failed: checking" [v2] In-Reply-To: References: <5Ed2eVORB8pJ0BmKISasvgLZCT4HIjJt-YA4sjZZW0k=.96a9d20c-f931-493a-83c2-2444e2646896@github.com> <8XNFwVFqjXQ95gjVxcRvmopfjzqkhxeLwWpVv488Grs=.2c0f8a1d-7a51-487a-a3da-4e0ffa34acd7@github.com> <6gQu7Xd92VxquOSm5PHSFQ_OOYMiffQGaLuc_gT3H0o=.dc861e81-28a0-4510-9526-9339dc95e5d7@github.com> Message-ID: <_oTCrSz5kPvuLdlthK7hLfIsMjFl8fZd6FNllFbjV0k=.b3dc9d22-4ed7-40cf-8e43-e944369f52b8@github.com> On Mon, 30 Nov 2020 23:22:14 GMT, Serguei Spitsyn wrote: >> I don't know how to measure impact, to be honest. > > I'd suggest to separate this potential issue and work on it only if the impact is noticeable. Ok, yes, Serguei. I don't think this is noticeable since one doesn't enable and disable events over and over in a loop. thanks for the code review and comments. 
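
The approach described for 8256830 in this thread (take a lock when changing the event-enabled state, and re-test that state under the same lock before doing the cleanup) can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual HotSpot change: the class `EventState`, its members, and the use of `std::mutex` are all hypothetical stand-ins.

```cpp
#include <mutex>

// Hypothetical stand-in for per-environment event state; not HotSpot code.
class EventState {
 public:
  // Enabling or disabling the event always takes the lock.
  void set_enabled(bool enabled) {
    std::lock_guard<std::mutex> guard(_lock);
    _enabled = enabled;
  }

  // Runs the cleanup only if the event is still enabled. The flag is
  // re-tested while holding the lock, so a concurrent set_enabled(false)
  // cannot land between the check and the work. Returns true if it ran.
  bool cleanup_if_enabled() {
    std::lock_guard<std::mutex> guard(_lock);
    if (!_enabled) {
      return false;
    }
    _cleanups++;  // placeholder for the real cleanup work
    return true;
  }

  int cleanups() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _cleanups;
  }

 private:
  mutable std::mutex _lock;
  bool _enabled = false;
  int _cleanups = 0;
};
```

In a scheme like this the extra cost is paid when the event is enabled or disabled and when the cleanup runs, which fits the observation above that events are not normally toggled in a tight loop.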

-------------

PR: https://git.openjdk.java.net/jdk/pull/1439

From sspitsyn at openjdk.java.net Mon Nov 30 23:44:55 2020
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Mon, 30 Nov 2020 23:44:55 GMT
Subject: RFR: 8256718: Obsolete the long term deprecated and aliased Trace flags
In-Reply-To:
References:
Message-ID:

On Mon, 30 Nov 2020 21:13:05 GMT, Harold Seigel wrote:

> Please review this change to obsolete the deprecated and aliased Trace flags. The now empty aliased_logging_flags support was left in arguments.cpp for use by trace flags that get deprecated and aliased in the future.
>
> With this change, users will get the following example messages when using these obsolete flags, depending on whether -XX:+... or -XX:-... was specified:
>
> VM warning: Ignoring option TraceClassPaths; support was removed in 16.0. Please use -Xlog:class+path=info instead.
>
> VM warning: Ignoring option TraceClassPaths; support was removed in 16.0. Please use -Xlog:class+path=off instead.
>
> The change was tested with tiers 1 and 2 on Linux, Windows, and macOS, and tiers 3-5 on Linux x64 and with JCK lang and vm tests.
>
> Thanks, Harold

Hi Harold,

The fix looks okay to me.
I was focusing mostly on the serviceability flags.
I'm not sure the flag `TraceJVMTIObjectTagging` should be mentioned in `src/java.base/share/man/java.1` the same way as `TraceRedefineClasses`.

Thanks,
Serguei

-------------

Marked as reviewed by sspitsyn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1525
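
The two example warnings quoted above differ only in the log level, chosen by whether `-XX:+...` or `-XX:-...` was given. A minimal sketch of that mapping is shown below. The helper `obsolete_trace_flag_warning` is hypothetical and is not the code in arguments.cpp; the `info` level is taken from the `TraceClassPaths` example only, and other flags may well map to a different level.

```cpp
#include <string>

// Builds warning text in the shape quoted in the review request.
// Hypothetical helper for illustration; not the HotSpot implementation.
//   flag     - name of the obsolete -XX flag, e.g. "TraceClassPaths"
//   log_tags - replacement unified-logging tags, e.g. "class+path"
//   enabled  - true for -XX:+Flag, false for -XX:-Flag
std::string obsolete_trace_flag_warning(const std::string& flag,
                                        const std::string& log_tags,
                                        bool enabled) {
  // Assumption: "+" maps to the info level, "-" maps to off.
  const std::string level = enabled ? "info" : "off";
  return "VM warning: Ignoring option " + flag +
         "; support was removed in 16.0. Please use -Xlog:" + log_tags +
         "=" + level + " instead.";
}
```

Calling it with `("TraceClassPaths", "class+path", true)` and then with `false` reproduces the two messages from the review request.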