From martin.doerr at sap.com Mon Jul 2 07:55:57 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 2 Jul 2018 07:55:57 +0000
Subject: 8205908: Unnecessarily strong memory barriers in ParNewGeneration::copy_to_survivor_space
In-Reply-To:
References:
Message-ID:

Hi Michihiro,

thanks for addressing this issue. The change looks good to me. I only have a comment on the coding style (oop.inline.hpp): "if ()" should be followed by braces "{ ... }".

It seems that a user of the forwardee needs to rely on memory_order_consume in the current implementation. I guess it will be appreciated that you're fixing this.

Please note that SAP still supports CMS in the commercial VM, so this change is still relevant and we'd like to push it to jdk11 if possible. But we definitely need an OK from a CMS expert (which I'm not).

Best regards,
Martin

From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
Sent: Wednesday, 27 June 2018 02:23
To: hotspot-gc-dev at openjdk.java.net
Cc: Doerr, Martin; Kim Barrett; Gustavo Romero
Subject: RFR: 8205908: Unnecessarily strong memory barriers in ParNewGeneration::copy_to_survivor_space

Dear all,

Would you please review the following change?
Bug: https://bugs.openjdk.java.net/browse/JDK-8205908
Webrev: http://cr.openjdk.java.net/~mhorie/8205908/webrev.00/

[Current implementation]
ParNewGeneration::copy_to_survivor_space tries to move live objects to a different location. There are two patterns for copying an object, depending on whether there is space to allocate new_obj in to-space or not.
If a thread cannot find space to allocate new_obj in to-space, the thread first executes the CAS with a dummy forwarding pointer "ClaimedForwardPtr", which is a sentinel to mark an object as claimed. After succeeding in the CAS, a thread can copy the new_obj into the old space. Here, suppose thread A succeeds in the CAS, while thread B fails in the CAS. When thread A finishes the copy, it replaces the dummy forwarding pointer with a real forwarding pointer.
After thread B fails in the CAS, thread B returns the forwardee after waiting until the copy of the forwardee is completed. This is observable by checking that the dummy forwarding pointer has been replaced with a real forwarding pointer by thread A.
In contrast, if a thread can find space to allocate new_obj in to-space, the thread first copies the new_obj and then executes the CAS with the new_obj. If a thread fails in the CAS, it deallocates the copied new_obj and returns the forwardee.

Procedure of ParNewGeneration::copy_to_survivor_space:
([L****] represents the line number in src/hotspot/share/gc/cms/parNewGeneration.cpp)
1. Try to allocate space for new_obj in to-space [L1110]
2. If the allocation in to-space fails [L1117]
  2.1. Execute the CAS with the dummy forwarding pointer [L1122] (A)
  2.2. If the CAS fails, return the forwardee via real_forwardee() [L1123]
  2.3. If the CAS succeeds [L1128]
    2.3.1. If promotion is allowed, copy new_obj into the old area [L1129]
    2.3.2. If promotion is not allowed, forward to obj itself [L1133]
  2.4. Set new_obj as forwardee [L1142]
3. If the allocation in to-space succeeds [L1144]
  3.1. Copy new_obj [L1146]
  3.2. Execute the CAS with new_obj [L1148] (B)
4. Dereference the new_obj for logging. Each new_obj copied by each thread at step 3.1 is used instead of forwardee() [L1159]
5. If either CAS (A) or CAS (B) succeeds, return new_obj [L1163]
6. If CAS (B) fails, get the forwardee via real_forwardee() and deallocate new_obj in to-space [L1193]
7. Return forwardee [L1203]

For reference, real_forwardee() is as shown below:

  oop ParNewGeneration::real_forwardee(oop obj) {
    oop forward_ptr = obj->forwardee();
    if (forward_ptr != ClaimedForwardPtr) {
      return forward_ptr;
    } else {
      // manually inlined for readability.
      oop forward_ptr = obj->forwardee();
      while (forward_ptr == ClaimedForwardPtr) {
        waste_some_time();
        forward_ptr = obj->forwardee();
      }
      return forward_ptr;
    }
  }

Regarding the CAS (A), there is no copy before the CAS.
Dereferencing the forwardee must be allowed after obtaining the forwardee.
Regarding the CAS (B), there is a copy before the CAS. Dereferencing the forwardee must be allowed after obtaining the forwardee.

[Observation on the current implementation]
No fence is necessary before or after the CAS (A).
A release barrier is necessary before the CAS (B).
The forwardee_acquire() must be used instead of forwardee() in real_forwardee().

[Performance measurement]
The critical-jOPS of SPECjbb2015 improved by 12% with this change.

Best regards,
--
Michihiro,
IBM Research - Tokyo

From thomas.schatzl at oracle.com Mon Jul 2 12:32:30 2018
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Mon, 02 Jul 2018 14:32:30 +0200
Subject: RFR (S): 8205426: Humongous continues remembered set does not match humongous start region one in Kitchensink
In-Reply-To: <69519db2cd7fe431357a5a05b89bba17cdd0eaaa.camel@oracle.com>
References: <69519db2cd7fe431357a5a05b89bba17cdd0eaaa.camel@oracle.com>
Message-ID: <0332d67eb5e0509367118eb99b9c84b280465918.camel@oracle.com>

Hi all,

can I have reviews for this fix that is scheduled for JDK 11?

Thanks,
Thomas

On Tue, 2018-06-26 at 19:15 +0200, Thomas Schatzl wrote:
> Hi all,
>
> can I have reviews for this bug in keeping remembered sets
> consistent between HC and HS regions, causing crashes with
> verification?
>
> The problem occurs during updating the remembered sets during the
> Remark pause. This process is parallel; it uses liveness information
> from marking to set the new remembered set states.
>
> However during marking G1 attributes all liveness information of a
> humongous object to the HS region; if that liveness information has
> not been updated yet for HC regions, and another thread is
> responsible for determining that HC region's remembered set state,
> the new remembered set state of the HC region will get a state as the
> HS region.
>
> The fix is to, for HC regions, just pass the liveness data of the HS
> region into the method that determines the new remembered set state.
> Further, in that latter method, make sure that the predicate for
> determining whether a region gets a remembered set assigned is
> completely disjoint for humongous and non-humongous regions.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8205426
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8205426/webrev/
> Testing:
> new test case, hs-tier1-4,jdk-tier1-3
>
> Thanks,
> Thomas
>

From per.liden at oracle.com Mon Jul 2 15:05:26 2018
From: per.liden at oracle.com (Per Liden)
Date: Mon, 2 Jul 2018 17:05:26 +0200
Subject: RFR: 8205924: ZGC: Premature OOME due to failure to expand backing file
Message-ID: <5564b685-22ab-c346-3fb9-f58a9aeee75b@oracle.com>

ZGC currently assumes that there will be enough space available on the backing file system to hold the max heap size (-Xmx). However, this might not be true. For example, the backing filesystem might have been misconfigured or space on that filesystem might be used by some other process. In this situation, ZGC will try (and fail) to map more memory every time a new page needs to be allocated (assuming that request can't be satisfied by the page cache). As a result, we fail to flush the page cache, which in turn means we throw a premature OOME and we continuously take the performance hit by making unnecessary fallocate() syscalls that will never succeed. We should instead detect this situation, flush the page cache and avoid making further fallocate() calls.

This issue has been seen now and then in various tests (e.g. RunThese30M and Kitchensink), typically on machines running older kernels without support for memfd_create(), where we fall back to using /dev/shm, which sometimes doesn't have enough space to hold the given max heap size (default tmpfs size is 50% of the RAM in the machine).
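
The behaviour described above (remember that expanding the backing file failed, flush the page cache, and stop retrying) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the webrev: `BackingFile`, `PageAllocator`, `try_expand` and `alloc_page` are hypothetical names, sizes are in arbitrary units, and the fallocate() call is simulated by a simple capacity check.

```cpp
#include <cassert>
#include <cstddef>

// Size of a cached "small" page, in arbitrary units (assumption for this sketch).
const size_t SmallPageSize = 2;

// Hypothetical stand-in for the heap backing file (memfd or tmpfs).
// `available` models the space actually left on the file system.
class BackingFile {
public:
  explicit BackingFile(size_t available) : _available(available), _expand_calls(0) {}

  // Models an fallocate()-style expansion: fails when the file system is full.
  bool try_expand(size_t bytes) {
    _expand_calls++;
    if (bytes > _available) {
      return false;
    }
    _available -= bytes;
    return true;
  }

  int expand_calls() const { return _expand_calls; }

private:
  size_t _available;
  int    _expand_calls;
};

// Hypothetical page allocator with a cache of free small pages.
class PageAllocator {
public:
  PageAllocator(BackingFile* file, size_t cached_small_pages)
    : _file(file), _cached_small_pages(cached_small_pages),
      _unused_bytes(0), _expand_failed(false) {}

  bool alloc_page(size_t bytes) {
    // 1. Try to satisfy the request directly from the page cache.
    if (bytes == SmallPageSize && _cached_small_pages > 0) {
      _cached_small_pages--;
      return true;
    }
    // 2. Try memory that was released by an earlier flush.
    if (take_unused(bytes)) {
      return true;
    }
    // 3. Try to expand the backing file, unless that is known to be futile.
    if (!_expand_failed) {
      if (_file->try_expand(bytes)) {
        return true;
      }
      _expand_failed = true;  // avoid further expansion attempts
    }
    // 4. Flush the page cache and retry before reporting out-of-memory.
    _unused_bytes += _cached_small_pages * SmallPageSize;
    _cached_small_pages = 0;
    return take_unused(bytes);
  }

private:
  bool take_unused(size_t bytes) {
    if (_unused_bytes < bytes) {
      return false;
    }
    _unused_bytes -= bytes;
    return true;
  }

  BackingFile* _file;
  size_t       _cached_small_pages;
  size_t       _unused_bytes;
  bool         _expand_failed;
};
```

With a full backing file and two cached small pages, a request for a page of size 4 in this sketch is served by flushing the cache after a single failed expansion, and a second request fails without another expansion attempt.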

Bug: https://bugs.openjdk.java.net/browse/JDK-8205924
Webrev: http://cr.openjdk.java.net/~pliden/8205924/webrev.0

Testing: Passed two iterations of tier{1,2,3,4,5,6} on linux-x64, passed multiple iterations of RunThese30M locally, and various manual testing to provoke the bad situations.

/Per

From HORIE at jp.ibm.com Tue Jul 3 08:25:41 2018
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Tue, 3 Jul 2018 17:25:41 +0900
Subject: 8205908: Unnecessarily strong memory barriers in ParNewGeneration::copy_to_survivor_space
In-Reply-To:
References:
Message-ID:

Hi Martin,

Thanks a lot for your review. Sure, we need an OK from a CMS expert. Following is the new webrev:
http://cr.openjdk.java.net/~mhorie/8205908/webrev.01/

>Seems like a user of the forwardee needs to rely on memory_order_consume in the current implementation. I guess it will be appreciated that you're fixing this.
Thank you for pointing out this issue in the original implementation. I added a release barrier at "2.4. Set new_obj as forwardee [L1142]".

With this webrev, the improvement of critical-jOPS in SPECjbb2015 was 10%, which is still a big number.

Best regards,
--
Michihiro,
IBM Research - Tokyo

From martin.doerr at sap.com Tue Jul 3 13:51:18 2018
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 3 Jul 2018 13:51:18 +0000
Subject: 8205908: Unnecessarily strong memory barriers in ParNewGeneration::copy_to_survivor_space
In-Reply-To:
References:
Message-ID: <073dbd9f9aef42c89954e12b5ad005b9@sap.com>

Hi Michihiro,

I think oopDesc::forward_to should not be changed with this change because it is used by many GCs. If you want to add a StoreStore barrier, you could add "OrderAccess::storestore();" before "old->forward_to(new_obj);", for example.

It would be nice to have a comment for your new case in oopDesc::forward_to_atomic.

Best regards,
Martin

From kim.barrett at oracle.com Tue Jul 3 20:40:54 2018
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 3 Jul 2018 16:40:54 -0400
Subject: 8205908: Unnecessarily strong memory barriers in ParNewGeneration::copy_to_survivor_space
In-Reply-To:
References:
Message-ID: <36321D6C-A8B7-48FB-8560-9B5807956A87@oracle.com>

> On Jul 3, 2018, at 4:25 AM, Michihiro Horie wrote:
>
> Hi Martin,
>
> Thanks a lot for your review. Sure, we need an OK from a CMS expert. Following is the new webrev:
> http://cr.openjdk.java.net/~mhorie/8205908/webrev.01/
>
> >Seems like a user of the forwardee needs to rely on memory_order_consume in the current implementation. I guess it will be appreciated that you're fixing this.
> Thank you for pointing out this issue in the original implementation. I newly inserted a release at "2.4. Set new_obj as forwardee [L1142]".
>
> Improvement of critical-jOPS in SPECjbb2015 was 10%, which is still a big number.
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo

CMS was deprecated in JDK 9, and has been on maintenance life-support for some time. This complex-to-review performance enhancement was proposed less than 48 hours before JDK 11 FC, and didn't receive any reviews until after FC. Because of these factors, I don't think it should be included in JDK 11. And if CMS gets removed in JDK 12 (I don't know if that will happen), then this change would be rendered entirely moot.

I haven't looked carefully at the change, though I did find one part that I don't like. The new test of "order" in forward_to_atomic not only affects CMS, but also (uselessly) affects G1.

I'm not going to be able to look at this carefully soon, as JDK 11 bug fixing has a higher priority for me. Since I think CMS might soon not be an issue, I'd really rather not look at it at all.

I think this change needs not just a CMS-expert reviewer, but someone who is willing to maintain CMS (including any potential bug tail from this change).
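
For readers following the 8205908 thread, the ordering requirement being discussed (a release before installing the forwarding pointer via CAS (B), and an acquire when reading it in real_forwardee()) can be sketched with standard C++ atomics. This is a minimal illustration under stated assumptions, not the HotSpot code: `Obj`, `copy_and_forward` and this simplified `real_forwardee` are hypothetical stand-ins, std::atomic is used instead of HotSpot's own Atomic/OrderAccess classes, and the ClaimedForwardPtr path of CAS (A) is left out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical, heavily simplified object with a forwarding pointer.
struct Obj {
  int payload;
  std::atomic<Obj*> forwardee;
  Obj() : payload(0), forwardee(nullptr) {}
};

// Reader side: the acquire load pairs with the release CAS below, so the
// contents of the returned copy are visible before it is dereferenced.
Obj* real_forwardee(Obj* from) {
  Obj* fwd = from->forwardee.load(std::memory_order_acquire);
  while (fwd == nullptr) {
    std::this_thread::yield();  // stand-in for waste_some_time()
    fwd = from->forwardee.load(std::memory_order_acquire);
  }
  return fwd;
}

// Writer side, corresponding to steps 3.1 and 3.2: copy first, then publish.
Obj* copy_and_forward(Obj* from, Obj* to_space_copy) {
  to_space_copy->payload = from->payload;  // plain stores that fill in the copy
  Obj* expected = nullptr;
  // Release on success: the copy above must be visible to any thread that
  // observes the forwarding pointer. No ordering is requested on failure,
  // because the loser re-reads the pointer with acquire in real_forwardee().
  if (from->forwardee.compare_exchange_strong(expected, to_space_copy,
                                              std::memory_order_release,
                                              std::memory_order_relaxed)) {
    return to_space_copy;  // this thread won the race
  }
  return real_forwardee(from);  // another thread won; use its copy
}
```

In this sketch, two workers calling copy_and_forward() on the same object both end up with the winner's copy, and the loser may safely read its payload because of the release/acquire pairing.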

From per.liden at oracle.com Tue Jul 3 21:28:16 2018
From: per.liden at oracle.com (Per Liden)
Date: Tue, 3 Jul 2018 23:28:16 +0200
Subject: 8206316: ZGC: Preferred tmpfs mount point not found on Debian
Message-ID: <51c4c4b7-8ec0-fd6b-dc18-a6fc4685caed@oracle.com>

On Linux kernels < 3.17 (where the memfd_create() syscall does not exist), ZGC falls back to searching for a suitable tmpfs mount point. If multiple mount points are found (which is the common case) we try to see if any of them matches the "preferred default" path (which is hard coded to be /dev/shm in ZGC). This works well, except on Debian and Debian derived distributions, which for some reason have chosen to use /run/shm instead of /dev/shm. As a result, ZGC will fail to initialize on some commonly used distributions (current Debian stable, Ubuntu 14.04-LTS, etc), forcing the user to manually mount a tmpfs file system on /dev/shm or use -XX:ZPath=/run/shm to explicitly select the mount point. ZGC should handle this situation much better, by having a list of preferred mount points (instead of just one) to allow for multiple alternatives covering differences between distributions. There is a high risk that this otherwise becomes a common problem, given the popularity of Debian and Debian derived distributions.

Bug: https://bugs.openjdk.java.net/browse/JDK-8206316
Webrev: http://cr.openjdk.java.net/~pliden/8206316/webrev.0

Testing: Manual testing of various mount point configurations.

/Per

From per.liden at oracle.com Tue Jul 3 22:04:39 2018
From: per.liden at oracle.com (Per Liden)
Date: Wed, 4 Jul 2018 00:04:39 +0200
Subject: RFR: 8206322: ZGC: Incorrect license header in gtests
Message-ID:

When the ZGC gtests were open-sourced, the license headers in these files were not updated accordingly.

Bug: https://bugs.openjdk.java.net/browse/JDK-8206322
Webrev: http://cr.openjdk.java.net/~pliden/8206322/webrev.0

/Per

From kim.barrett at oracle.com Tue Jul 3 23:41:51 2018
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 3 Jul 2018 19:41:51 -0400
Subject: RFR: 8206322: ZGC: Incorrect license header in gtests
In-Reply-To:
References:
Message-ID: <6567E49E-2512-4D1A-B446-21FEE50F9996@oracle.com>

> On Jul 3, 2018, at 6:04 PM, Per Liden wrote:
>
> When the ZGC gtests were open-sourced, the license headers in these files were not updated accordingly.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8206322
> Webrev: http://cr.openjdk.java.net/~pliden/8206322/webrev.0
>
> /Per

Looks good, and trivial.

From kim.barrett at oracle.com Tue Jul 3 23:47:48 2018
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 3 Jul 2018 19:47:48 -0400
Subject: 8206316: ZGC: Preferred tmpfs mount point not found on Debian
In-Reply-To: <51c4c4b7-8ec0-fd6b-dc18-a6fc4685caed@oracle.com>
References: <51c4c4b7-8ec0-fd6b-dc18-a6fc4685caed@oracle.com>
Message-ID:

> On Jul 3, 2018, at 5:28 PM, Per Liden wrote:
>
> On Linux kernels < 3.17 (where the memfd_create() syscall does not exist), ZGC falls back to searching for a suitable tmpfs mount point. If multiple mount points are found (which is the common case) we try to see if any of them matches the "preferred default" path (which is hard coded to be /dev/shm in ZGC). This works well, except on Debian and Debian derived distributions, which for some reason have chosen to use /run/shm instead of /dev/shm. As a result, ZGC will fail to initialize on some commonly used distributions (current Debian stable, Ubuntu 14.04-LTS, etc), forcing the user to manually mount a tmpfs file system on /dev/shm or use -XX:ZPath=/run/shm to explicitly select the mount point.
> ZGC should handle this situation much better, by having a list of preferred mount points (instead of just one) to allow for multiple alternatives covering differences between distributions. There is a high risk that this otherwise becomes a common problem, given the popularity of Debian and Debian derived distributions.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8206316
> Webrev: http://cr.openjdk.java.net/~pliden/8206316/webrev.0
>
> Testing: Manual testing of various mount point configurations.
>
> /Per

Looks good.

From per.liden at oracle.com Wed Jul 4 05:55:44 2018
From: per.liden at oracle.com (Per Liden)
Date: Wed, 4 Jul 2018 07:55:44 +0200
Subject: RFR: 8206322: ZGC: Incorrect license header in gtests
In-Reply-To: <6567E49E-2512-4D1A-B446-21FEE50F9996@oracle.com>
References: <6567E49E-2512-4D1A-B446-21FEE50F9996@oracle.com>
Message-ID: <23d712c2-f652-1966-d363-bbb7f2c873a3@oracle.com>

Thanks Kim!

/Per

On 07/04/2018 01:41 AM, Kim Barrett wrote:
>> On Jul 3, 2018, at 6:04 PM, Per Liden wrote:
>>
>> When the ZGC gtests were open-sourced, the license headers in these files were not updated accordingly.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8206322
>> Webrev: http://cr.openjdk.java.net/~pliden/8206322/webrev.0
>>
>> /Per
>
> Looks good, and trivial.
>

From per.liden at oracle.com Wed Jul 4 05:57:06 2018
From: per.liden at oracle.com (Per Liden)
Date: Wed, 4 Jul 2018 07:57:06 +0200
Subject: 8206316: ZGC: Preferred tmpfs mount point not found on Debian
In-Reply-To:
References: <51c4c4b7-8ec0-fd6b-dc18-a6fc4685caed@oracle.com>
Message-ID:

Thanks Kim!

/Per

On 07/04/2018 01:47 AM, Kim Barrett wrote:
>> On Jul 3, 2018, at 5:28 PM, Per Liden wrote:
>>
>> On Linux kernels < 3.17 (where the memfd_create() syscall does not exist), ZGC falls back to searching for a suitable tmpfs mount point.
>> If multiple mount points are found (which is the common case) we try to see if any of them matches the "preferred default" path (which is hard coded to be /dev/shm in ZGC). This works well, except on Debian and Debian derived distributions, which for some reason have chosen to use /run/shm instead of /dev/shm. As a result, ZGC will fail to initialize on some commonly used distributions (current Debian stable, Ubuntu 14.04-LTS, etc), forcing the user to manually mount a tmpfs file system on /dev/shm or use -XX:ZPath=/run/shm to explicitly select the mount point. ZGC should handle this situation much better, by having a list of preferred mount points (instead of just one) to allow for multiple alternatives covering differences between distributions. There is a high risk that this otherwise becomes a common problem, given the popularity of Debian and Debian derived distributions.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8206316
>> Webrev: http://cr.openjdk.java.net/~pliden/8206316/webrev.0
>>
>> Testing: Manual testing of various mount point configurations.
>>
>> /Per
>
> Looks good.
>

From thomas.schatzl at oracle.com Wed Jul 4 06:03:12 2018
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 04 Jul 2018 08:03:12 +0200
Subject: RFR: 8206322: ZGC: Incorrect license header in gtests
In-Reply-To:
References:
Message-ID: <07fd3ba37713cdebdf59e500c704123330728745.camel@oracle.com>

Hi,

On Wed, 2018-07-04 at 00:04 +0200, Per Liden wrote:
> When the ZGC gtests were open-sourced, the license headers in these
> files were not updated accordingly.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8206322
> Webrev: http://cr.openjdk.java.net/~pliden/8206322/webrev.0
>

good.

Thomas

From per.liden at oracle.com Wed Jul 4 06:06:13 2018
From: per.liden at oracle.com (Per Liden)
Date: Wed, 4 Jul 2018 08:06:13 +0200
Subject: RFR: 8206322: ZGC: Incorrect license header in gtests
In-Reply-To: <07fd3ba37713cdebdf59e500c704123330728745.camel@oracle.com>
References: <07fd3ba37713cdebdf59e500c704123330728745.camel@oracle.com>
Message-ID:

Thanks Thomas!

/Per

On 07/04/2018 08:03 AM, Thomas Schatzl wrote:
> Hi,
>
> On Wed, 2018-07-04 at 00:04 +0200, Per Liden wrote:
>> When the ZGC gtests were open-sourced, the license headers in these
>> files were not updated accordingly.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8206322
>> Webrev: http://cr.openjdk.java.net/~pliden/8206322/webrev.0
>>
>
> good.
>
> Thomas
>

From thomas.schatzl at oracle.com Wed Jul 4 06:55:29 2018
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 04 Jul 2018 08:55:29 +0200
Subject: 8206316: ZGC: Preferred tmpfs mount point not found on Debian
In-Reply-To: <51c4c4b7-8ec0-fd6b-dc18-a6fc4685caed@oracle.com>
References: <51c4c4b7-8ec0-fd6b-dc18-a6fc4685caed@oracle.com>
Message-ID: <5a83a02502ba8c42431c931dadbc8b73f385b27f.camel@oracle.com>

Hi,

On Tue, 2018-07-03 at 23:28 +0200, Per Liden wrote:
> On Linux kernels < 3.17 (where the memfd_create() syscall does not
> exist), ZGC falls back to searching for a suitable tmpfs mount point.
> If multiple mount points are found (which is the common case) we try
> to see if any of them matches the "preferred default" path (which is
> hard coded to be /dev/shm in ZGC). This works well, except on Debian
> and Debian derived distributions, which for some reason have chosen
> to use /run/shm instead of /dev/shm. As a result, ZGC will
> fail to initialize on some commonly used distributions (current
> Debian stable, Ubuntu 14.04-LTS, etc), forcing the user to manually
> mount a tmpfs file system on /dev/shm or use -XX:ZPath=/run/shm to
> explicitly select the mount point.
> ZGC should handle this situation
> much better, by having a list of preferred mount points (instead of
> just one) to allow for multiple alternatives covering differences
> between distributions. There is a high risk that this otherwise
> becomes a common problem, given the popularity of Debian and Debian
> derived distributions.

Looking at the various support documents, this does not seem to be a very significant issue. On Debian Stretch (latest stable) the kernel is 4.9 [1]. And the latest Ubuntu 14.04(.05) runs on a 4.4 kernel [2].

While Jessie (previous stable) is on 3.16, it is "almost" out of support (and so is 14.04), and will be even more so at GA. Also, memfd_create has been backported to Jessie afaict [3].

I am not sure that people who already need to go out of their way to install the latest JDK on these OSes to test won't also consider upgrading minor versions of their OS. (And I assume that for testing, people do not use live systems anyway.)

All in all I do not see this issue as being as urgent as the description makes it seem. It does not seem to be a problematic change either (to me it seems like an enhancement of existing code too).

> Bug: https://bugs.openjdk.java.net/browse/JDK-8206316
> Webrev: http://cr.openjdk.java.net/~pliden/8206316/webrev.0
>
> Testing: Manual testing of various mount point configurations.

looks good.
Thomas [1] https://lists.debian.org/debian-kernel/2016/08/msg00099.html [2] https://wiki.ubuntu.com/Kernel/Support [3] https://manpages.debian.org/jessie-backports/manpages-dev/memfd_cre ate.2.en.html (see the "other versions" table) From per.liden at oracle.com Wed Jul 4 07:52:34 2018 From: per.liden at oracle.com (Per Liden) Date: Wed, 4 Jul 2018 09:52:34 +0200 Subject: 8206316: ZGC: Preferred tmpfs mount point not found on Debian In-Reply-To: <5a83a02502ba8c42431c931dadbc8b73f385b27f.camel@oracle.com> References: <51c4c4b7-8ec0-fd6b-dc18-a6fc4685caed@oracle.com> <5a83a02502ba8c42431c931dadbc8b73f385b27f.camel@oracle.com> Message-ID: <812cc3b6-6423-a843-46b6-9ead2ada7a58@oracle.com> Hi Thomas, On 07/04/2018 08:55 AM, Thomas Schatzl wrote: > Hi, > > On Tue, 2018-07-03 at 23:28 +0200, Per Liden wrote: >> On Linux kernels < 3.17 (where memfd_create() syscall does not >> exist), ZGC falls back to searching for a suitable tmpfs mount point. >> If multiple mount points are found (which is the common case) we try >> to see if any of them matches the "preferred default" path (which is >> hard coded to be /dev/shm in ZGC). This work well, except on Debian >> and Debian derived distributions, which for some reason have chosen >> to use /run/shm instead instead of /dev/shm. As a result, ZGC will >> fail to initialize on some commonly used distributions (current >> Debian stable, Ubuntu 14.04-LTS, etc), forcing the user to manually >> mount a tmpfs file system on /dev/shm or use -XX:ZPath=/run/shm to >> explicitly select the mount point. ZGC should handle this situation >> much better, but having a list of preferred mount points (instead of >> just one) to allow for multiple alternatives covering differences >> between distributions. There is a high risk that this otherwise >> becomes a common problem, given the popularity of Debian and Debian >> derived distributions. > > Looking at the various support documents, this does not seem to be a > very significant issue. 
On Debian Stretch (latest stable) kernel is 4.9 > [1]. > And latest Ubuntu 14.04(.05) runs on a 4.4 kernel [2]. > > While Jessie (previous stable) is 3.16, it is "almost" out of support > (and so is 14.04), and will be even more at GA. Also memfd_create has > been backported to Jessie afaict [3]. > > I am not sure that people that already need to go out of their way to > install latest JDK on these OSes to test, won't also consider upgrading > minor versions their OS. (And I assume that for testing, people do not > use live systems anyway). > > All in all I do not see this issue as urgent as the description make it > seem. It does not seem to be a problematic change either (to me it > seems like an enhancement of existing code too). Thanks for digging up this information. In light of this, I agree that this is not as urgent as I first thought. I still think we should consider this for 11, given that I've already received reports from people running into this issue, and the fix is pretty straight forward. Objections or thoughts? > >> Bug: https://bugs.openjdk.java.net/browse/JDK-8206316 >> Webrev: http://cr.openjdk.java.net/~pliden/8206316/webrev.0 >> >> Testing: Manual testing of various mount point configurations. > > looks good. Thanks for reviewing! cheers, Per > > Thomas > > [1] https://lists.debian.org/debian-kernel/2016/08/msg00099.html > [2] https://wiki.ubuntu.com/Kernel/Support > [3] https://manpages.debian.org/jessie-backports/manpages-dev/memfd_cre > ate.2.en.html (see the "other versions" table) > From HORIE at jp.ibm.com Wed Jul 4 08:26:05 2018 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Wed, 4 Jul 2018 17:26:05 +0900 Subject: 8205908: Unnecessarily strong memory barriers in ParNewGeneration::copy_to_survivor_space In-Reply-To: <36321D6C-A8B7-48FB-8560-9B5807956A87@oracle.com> References:

<36321D6C-A8B7-48FB-8560-9B5807956A87@oracle.com>
Message-ID:

Hi Martin, Kim,

Thank you both for your comments.
I missed the point that oopDesc::forward_to is invoked from several
callers. Using OrderAccess::storestore() before the invocation of
forward_to() would be a great idea, thanks.

>I haven't looked carefully at the change, though I did find one part
>that I don't like. The new test of "order" in forward_to_atomic not
>only affects CMS, but also (uselessly) affects G1.
Please let me confirm your point. Do you mean I should pass
memory_order_acq_rel to forward_to_atomic, which would then use tests
like the following so that acquire/release keeps a consistent meaning
in forward_to_atomic? I agree it is not clear that the test with
release returns the forwardee with acquire.

oop oopDesc::forward_to_atomic(oop p, atomic_memory_order order) {
  :
  while (!oldMark->is_marked()) {
    if (order == memory_order_acq_rel) {
      curMark = cas_set_mark_raw(forwardPtrMark, oldMark, memory_order_release);
    } else {
      curMark = cas_set_mark_raw(forwardPtrMark, oldMark, order);
    }
    :
  }
  if (order == memory_order_acq_rel) {
    return forwardee_acquire();
  }
  return forwardee();
}

Best regards,
--
Michihiro,
IBM Research - Tokyo

From: Kim Barrett
To: Michihiro Horie
Cc: "Doerr, Martin" , "hotspot-gc-dev at openjdk.java.net" , Gustavo Romero
Date: 2018/07/04 05:41
Subject: Re: 8205908: Unnecessarily strong memory barriers in ParNewGeneration::copy_to_survivor_space

> On Jul 3, 2018, at 4:25 AM, Michihiro Horie wrote:
>
> Hi Martin,
>
> Thanks a lot for your review. Sure, we need an OK from a CMS expert. Following is the new webrev:
> http://cr.openjdk.java.net/~mhorie/8205908/webrev.01/
>
> >Seems like a user of the forwardee needs to rely on memory_order_consume in the current implementation. I guess it will be appreciated that you're fixing this.
> Thank you for pointing out this issue in the original implementation. I newly inserted a release at "2.4. Set new_obj as forwardee [L1142]".
>
> Improvement of critical-jOPS in SPECjbb2015 was 10%, which is still a big number.
>
>
> Best regards,
> --
> Michihiro,
> IBM Research - Tokyo

CMS was deprecated in JDK 9, and has been on maintenance life-support
for some time. This complex-to-review performance enhancement was
proposed less than 48 hours before JDK 11 FC, and didn't receive any
reviews until after FC. Because of these factors, I don't think it
should be included in JDK 11. And if CMS gets removed in JDK 12 (I
don't know if that will happen), then this change would be rendered
entirely moot.

I haven't looked carefully at the change, though I did find one part
that I don't like. The new test of "order" in forward_to_atomic not
only affects CMS, but also (uselessly) affects G1.

I'm not going to be able to look at this carefully soon, as JDK 11 bug
fixing has a higher priority for me. Since I think CMS might soon not
be an issue, I'd really rather not look at it at all. I think this
change needs not just a CMS-expert reviewer, but someone who is
willing to maintain CMS (including any potential bug tail from this
change).

From erik.helin at oracle.com Wed Jul 4 08:46:51 2018
From: erik.helin at oracle.com (Erik Helin)
Date: Wed, 4 Jul 2018 10:46:51 +0200
Subject: 8206316: ZGC: Preferred tmpfs mount point not found on Debian
In-Reply-To: <812cc3b6-6423-a843-46b6-9ead2ada7a58@oracle.com>
References: <51c4c4b7-8ec0-fd6b-dc18-a6fc4685caed@oracle.com>
 <5a83a02502ba8c42431c931dadbc8b73f385b27f.camel@oracle.com>
 <812cc3b6-6423-a843-46b6-9ead2ada7a58@oracle.com>
Message-ID: <8705fb90-3b25-0846-e76f-de1c5615e2ac@oracle.com>

On 07/04/2018 09:52 AM, Per Liden wrote:
> Hi Thomas,
>
> On 07/04/2018 08:55 AM, Thomas Schatzl wrote:
>> Hi,
>>
>> On Tue, 2018-07-03 at 23:28 +0200, Per Liden wrote:
>>> On Linux kernels < 3.17 (where memfd_create() syscall does not
>>> exist), ZGC falls back to searching for a suitable tmpfs mount point.
>>> If multiple mount points are found (which is the common case) we try
>>> to see if any of them matches the "preferred default" path (which is
>>> hard coded to be /dev/shm in ZGC). This works well, except on Debian
>>> and Debian derived distributions, which for some reason have chosen
>>> to use /run/shm instead of /dev/shm. As a result, ZGC will
>>> fail to initialize on some commonly used distributions (current
>>> Debian stable, Ubuntu 14.04-LTS, etc), forcing the user to manually
>>> mount a tmpfs file system on /dev/shm or use -XX:ZPath=/run/shm to
>>> explicitly select the mount point. ZGC should handle this situation
>>> much better, by having a list of preferred mount points (instead of
>>> just one) to allow for multiple alternatives covering differences
>>> between distributions. There is a high risk that this otherwise
>>> becomes a common problem, given the popularity of Debian and Debian
>>> derived distributions.
>>
>> Looking at the various support documents, this does not seem to be a
>> very significant issue.
On Debian Stretch (latest stable) kernel is 4.9 >> [1]. >> And latest Ubuntu 14.04(.05) runs on a 4.4 kernel [2]. >> >> While Jessie (previous stable) is 3.16, it is "almost" out of support >> (and so is 14.04), and will be even more at GA. Also memfd_create has >> been backported to Jessie afaict [3]. >> >> I am not sure that people that already need to go out of their way to >> install latest JDK on these OSes to test, won't also consider upgrading >> ? minor versions their OS. (And I assume that for testing, people do not >> use live systems anyway). >> >> All in all I do not see this issue as urgent as the description make it >> seem. It does not seem to be a problematic change either (to me it >> seems like an enhancement of existing code too). > > Thanks for digging up this information. In light of this, I agree that > this is not as urgent as I first thought. I still think we should > consider this for 11, given that I've already received reports from > people running into this issue, and the fix is pretty straight forward. > > Objections or thoughts? Given the fix is small, I think we should just fix it. That seems easier than explaining to users why we did not fix this :) The patch looks good to me, Reviewed. Thanks, Erik >> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8206316 >>> Webrev: http://cr.openjdk.java.net/~pliden/8206316/webrev.0 >>> >>> Testing: Manual testing of various mount point configurations. >> >> ?? looks good. > > Thanks for reviewing! 
> > cheers, > Per > >> >> Thomas >> >> [1] https://lists.debian.org/debian-kernel/2016/08/msg00099.html >> [2] https://wiki.ubuntu.com/Kernel/Support >> [3] https://manpages.debian.org/jessie-backports/manpages-dev/memfd_cre >> ate.2.en.html (see the "other versions" table) >> From per.liden at oracle.com Wed Jul 4 09:02:21 2018 From: per.liden at oracle.com (Per Liden) Date: Wed, 4 Jul 2018 11:02:21 +0200 Subject: 8206316: ZGC: Preferred tmpfs mount point not found on Debian In-Reply-To: <8705fb90-3b25-0846-e76f-de1c5615e2ac@oracle.com> References: <51c4c4b7-8ec0-fd6b-dc18-a6fc4685caed@oracle.com> <5a83a02502ba8c42431c931dadbc8b73f385b27f.camel@oracle.com> <812cc3b6-6423-a843-46b6-9ead2ada7a58@oracle.com> <8705fb90-3b25-0846-e76f-de1c5615e2ac@oracle.com> Message-ID: <6bad5283-a69a-31b1-cade-4d89bb8f2ee2@oracle.com> On 07/04/2018 10:46 AM, Erik Helin wrote: > On 07/04/2018 09:52 AM, Per Liden wrote: >> Hi Thomas, >> >> On 07/04/2018 08:55 AM, Thomas Schatzl wrote: >>> Hi, >>> >>> On Tue, 2018-07-03 at 23:28 +0200, Per Liden wrote: >>>> On Linux kernels < 3.17 (where memfd_create() syscall does not >>>> exist), ZGC falls back to searching for a suitable tmpfs mount point. >>>> If multiple mount points are found (which is the common case) we try >>>> to see if any of them matches the "preferred default" path (which is >>>> hard coded to be /dev/shm in ZGC). This work well, except on Debian >>>> and Debian derived distributions, which for some reason have chosen >>>> to use /run/shm instead instead of /dev/shm. As a result, ZGC will >>>> fail to initialize on some commonly used distributions (current >>>> Debian stable, Ubuntu 14.04-LTS, etc), forcing the user to manually >>>> mount a tmpfs file system? on /dev/shm or use -XX:ZPath=/run/shm to >>>> explicitly select the mount point. 
ZGC should handle this situation >>>> much better, but having a list of preferred mount points (instead of >>>> just one) to allow for multiple alternatives covering differences >>>> between distributions. There is a high risk that this otherwise >>>> becomes a common problem, given the popularity of Debian and Debian >>>> derived distributions. >>> >>> Looking at the various support documents, this does not seem to be a >>> very significant issue. On Debian Stretch (latest stable) kernel is 4.9 >>> [1]. >>> And latest Ubuntu 14.04(.05) runs on a 4.4 kernel [2]. >>> >>> While Jessie (previous stable) is 3.16, it is "almost" out of support >>> (and so is 14.04), and will be even more at GA. Also memfd_create has >>> been backported to Jessie afaict [3]. >>> >>> I am not sure that people that already need to go out of their way to >>> install latest JDK on these OSes to test, won't also consider upgrading >>> ? minor versions their OS. (And I assume that for testing, people do not >>> use live systems anyway). >>> >>> All in all I do not see this issue as urgent as the description make it >>> seem. It does not seem to be a problematic change either (to me it >>> seems like an enhancement of existing code too). >> >> Thanks for digging up this information. In light of this, I agree that >> this is not as urgent as I first thought. I still think we should >> consider this for 11, given that I've already received reports from >> people running into this issue, and the fix is pretty straight forward. >> >> Objections or thoughts? > > Given the fix is small, I think we should just fix it. That seems easier > than explaining to users why we did not fix this :) > > The patch looks good to me, Reviewed. Thanks Erik! 
/Per (For the record, Thomas told me off-line that he didn't have any objections, so I'll go ahead and push this to 11) > > Thanks, > Erik > >>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8206316 >>>> Webrev: http://cr.openjdk.java.net/~pliden/8206316/webrev.0 >>>> >>>> Testing: Manual testing of various mount point configurations. >>> >>> ?? looks good. >> >> Thanks for reviewing! >> >> cheers, >> Per >> >>> >>> Thomas >>> >>> [1] https://lists.debian.org/debian-kernel/2016/08/msg00099.html >>> [2] https://wiki.ubuntu.com/Kernel/Support >>> [3] https://manpages.debian.org/jessie-backports/manpages-dev/memfd_cre >>> ate.2.en.html (see the "other versions" table) >>> From erik.helin at oracle.com Wed Jul 4 09:44:44 2018 From: erik.helin at oracle.com (Erik Helin) Date: Wed, 4 Jul 2018 11:44:44 +0200 Subject: RFR: 8205924: ZGC: Premature OOME due to failure to expand backing file In-Reply-To: <5564b685-22ab-c346-3fb9-f58a9aeee75b@oracle.com> References: <5564b685-22ab-c346-3fb9-f58a9aeee75b@oracle.com> Message-ID: <58201055-4c6f-18af-2a06-183729d9cc6f@oracle.com> On 07/02/2018 05:05 PM, Per Liden wrote: > ZGC currently assumes that there will be enough space available on the > backing file system to hold the max heap size (-Xmx). However, this > might not be true. For example, the backing filesystem might have been > misconfigured or space on that filesystem might be used by some other > process. In this situation, ZGC will try (and fail) to map more memory > every time a new page needs to be allocated (assuming that request can't > be satisfied by the page case). As a result, we fail to flush the page > cache, which in turn means we throw a premature OOME and we continuously > take the performance hit by making unnecessary fallocate() syscalls that > will never succeed. We should instead detect this situation, flush the > page cache and avoid making further fallocate() calls. 
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8205924
> Webrev: http://cr.openjdk.java.net/~pliden/8205924/webrev.0

This looks good to me, I would have added an assert in
size_t ZPhysicalMemoryBacking::try_expand such as

+size_t ZPhysicalMemoryBacking::try_expand(size_t old_capacity, size_t new_capacity) {
+  assert(new_capacity > old_capacity, "invariant");
+  const size_t capacity = _file.try_expand(old_capacity, new_capacity - old_capacity, _granule_size);

Not because I spotted anything wrong with this patch, more because if
someone one day introduces such a bug, then it will be hell to debug
without an assert like the above one :)

In ZPageAllocator::try_ensure_unused_for_pre_mapped I would maybe have
designed ZPhysicalMemoryManager::try_ensure_unused_capacity so that it
is always valid to call (the method would just return in case _backing
isn't initialized).

I don't need to see another webrev if you just add the assert, but
please send out a new version if you rework ZPhysicalMemoryManager.

Thanks,
Erik

From per.liden at oracle.com Wed Jul 4 09:57:14 2018
From: per.liden at oracle.com (Per Liden)
Date: Wed, 4 Jul 2018 11:57:14 +0200
Subject: RFR: 8205924: ZGC: Premature OOME due to failure to expand backing file
In-Reply-To: <58201055-4c6f-18af-2a06-183729d9cc6f@oracle.com>
References: <5564b685-22ab-c346-3fb9-f58a9aeee75b@oracle.com>
 <58201055-4c6f-18af-2a06-183729d9cc6f@oracle.com>
Message-ID: <6c46d251-4d39-1c12-4f76-00b2f44b957f@oracle.com>

On 07/04/2018 11:44 AM, Erik Helin wrote:
> On 07/02/2018 05:05 PM, Per Liden wrote:
>> ZGC currently assumes that there will be enough space available on the
>> backing file system to hold the max heap size (-Xmx). However, this
>> might not be true. For example, the backing filesystem might have been
>> misconfigured or space on that filesystem might be used by some other
>> process.
>> In this situation, ZGC will try (and fail) to map more memory
>> every time a new page needs to be allocated (assuming that request
>> can't be satisfied by the page cache). As a result, we fail to flush
>> the page cache, which in turn means we throw a premature OOME and we
>> continuously take the performance hit by making unnecessary
>> fallocate() syscalls that will never succeed. We should instead detect
>> this situation, flush the page cache and avoid making further
>> fallocate() calls.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205924
>> Webrev: http://cr.openjdk.java.net/~pliden/8205924/webrev.0
>
> This looks good to me, I would have added an assert in
> size_t ZPhysicalMemoryBacking::try_expand such as
>
> +size_t ZPhysicalMemoryBacking::try_expand(size_t old_capacity, size_t new_capacity) {
> +  assert(new_capacity > old_capacity, "invariant");
> +  const size_t capacity = _file.try_expand(old_capacity, new_capacity - old_capacity, _granule_size);
>
> Not because I spotted anything wrong with this patch, more because if
> someone one day introduces such a bug, then it will be hell to debug
> without an assert like the above one :)

Sounds good, will add that.

>
> In ZPageAllocator::try_ensure_unused_for_pre_mapped I would maybe have
> designed ZPhysicalMemoryManager::try_ensure_unused_capacity so that it
> is always valid to call (the method would just return in case _backing
> isn't initialized).

I would prefer to keep that check in
ZPageAllocator::try_ensure_unused_for_pre_mapped(), since that function
is a special case: it is called during construction. The underlying
ZPhysicalMemoryManager::try_ensure_unused_capacity() should never be
called if the ZPhysicalMemoryManager isn't initialized, and I'd rather
crash hard than silently return if someone makes that mistake.

>
> I don't need to see another webrev if you just add the assert, but
> please send out a new version if you rework ZPhysicalMemoryManager.

Thanks for reviewing, Erik!
cheers,
Per

From erik.helin at oracle.com Wed Jul 4 10:01:09 2018
From: erik.helin at oracle.com (Erik Helin)
Date: Wed, 4 Jul 2018 12:01:09 +0200
Subject: RFR: 8205924: ZGC: Premature OOME due to failure to expand backing file
In-Reply-To: <6c46d251-4d39-1c12-4f76-00b2f44b957f@oracle.com>
References: <5564b685-22ab-c346-3fb9-f58a9aeee75b@oracle.com>
 <58201055-4c6f-18af-2a06-183729d9cc6f@oracle.com>
 <6c46d251-4d39-1c12-4f76-00b2f44b957f@oracle.com>
Message-ID: <8c08aa56-5808-09df-e4b7-38a36731568b@oracle.com>

On 07/04/2018 11:57 AM, Per Liden wrote:
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205924
>>> Webrev: http://cr.openjdk.java.net/~pliden/8205924/webrev.0
>>
>> This looks good to me, I would have added an assert in
>> size_t ZPhysicalMemoryBacking::try_expand such as
>>
>> +size_t ZPhysicalMemoryBacking::try_expand(size_t old_capacity, size_t new_capacity) {
>> +  assert(new_capacity > old_capacity, "invariant");
>> +  const size_t capacity = _file.try_expand(old_capacity, new_capacity - old_capacity, _granule_size);
>>
>> Not because I spotted anything wrong with this patch, more because if
>> someone one day introduces such a bug, then it will be hell to debug
>> without an assert like the above one :)
>
> Sounds good, will add that.

Ok, good!

>>
>> In ZPageAllocator::try_ensure_unused_for_pre_mapped I would maybe have
>> designed ZPhysicalMemoryManager::try_ensure_unused_capacity so that it
>> is always valid to call (the method would just return in case _backing
>> isn't initialized).
>
> I would prefer to keep that check in
> ZPageAllocator::try_ensure_unused_for_pre_mapped(), since that function
> is a special case since it's called during construction. The underlying
> ZPhysicalMemoryManager::try_ensure_unused_capacity() should never be
> called if the ZPhysicalMemoryManager isn't initialized and I'd rather
> crash hard than silently return if someone does that mistake.

Ok, that sounds good to me, just keep it the way it is then.
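The value of the assert discussed in this thread is easier to see in
isolation. The following is only a minimal sketch, not the ZGC code:
FakeBackingFile, FakeBacking and the "grant whole granules until space
runs out" behaviour are made up for illustration, and only the shape of
the try_expand call and the assert are taken from the snippet quoted
above. Since size_t is unsigned, new_capacity - old_capacity would wrap
to a huge value if the invariant were ever violated, which is presumably
the kind of bug that would be hard to track down without the assert.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the backing file; it only tracks how many
// bytes the underlying file system can still provide.
struct FakeBackingFile {
  size_t available;

  // Try to grow the file by 'length' bytes starting at 'offset'.
  // Only whole granules are granted. Returns the resulting capacity,
  // which may be smaller than offset + length (assumed semantics).
  size_t try_expand(size_t offset, size_t length, size_t granule) {
    assert(length % granule == 0 && "length must be granule aligned");
    const size_t grantable = available / granule * granule;
    const size_t granted = std::min(length, grantable);
    available -= granted;
    return offset + granted;
  }
};

// Hypothetical counterpart of the class in the review; names are assumptions.
struct FakeBacking {
  FakeBackingFile file;
  size_t granule_size;

  size_t try_expand(size_t old_capacity, size_t new_capacity) {
    // Without this check, the unsigned subtraction below would silently
    // wrap around if a caller passed new_capacity <= old_capacity.
    assert(new_capacity > old_capacity && "invariant");
    return file.try_expand(old_capacity, new_capacity - old_capacity,
                           granule_size);
  }
};
```

A caller can compare the returned capacity with the requested one to
notice a partial or failed expansion and stop retrying.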
>> I don't need to see another webrev if you just add the assert, but >> please send out a new version if you rework ZPhysicalMemoryManager. > > Thanks for reviewing, Erik! Since you only adding the assert, I'm fine with this now, Reviewed. Thanks, Erik > cheers, > Per From kim.barrett at oracle.com Wed Jul 4 23:17:00 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 4 Jul 2018 19:17:00 -0400 Subject: RFR (S): 8205426: Humongous continues remembered set does not match humongous start region one in Kitchensink In-Reply-To: <69519db2cd7fe431357a5a05b89bba17cdd0eaaa.camel@oracle.com> References: <69519db2cd7fe431357a5a05b89bba17cdd0eaaa.camel@oracle.com> Message-ID: <54BF88C5-5835-49B7-8E1F-E21A4E429D15@oracle.com> > On Jun 26, 2018, at 1:15 PM, Thomas Schatzl wrote: > > Hi all, > > can I have reviews for this bug in keeping remembered sets consistent > between HC and HS regions, causing crashes with verification? > > The problem occurs during updating the remembered sets during the > Remark pause. This process is parallel; it uses liveness information > from marking to set the new remembered set states. > > However during marking G1 attributes all liveness information of a > humongous object to the HS region; if that liveness information has not > been updated yet for HC regions, and another thread is responsible for > determining that HC region's remembered set state, the new remembered > set state of the HC region will get a state as the HS region. > > The fix is to, for HC regions, just pass the liveness data of the HS > region into the method that determines the new remembered set state. > Further, in that latter method, make sure that the predicate for > determining whether a region gets a remembered set assigned is > completely disjoint for humongous and non-humongous regions. 
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8205426
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8205426/webrev/
> Testing:
> new test case, hs-tier1-4,jdk-tier1-3
>
> Thanks,
> Thomas

The new live_bytes_for seems like it's overly verbose and complicated,
and could instead just be something like:

  size_t live_bytes_for(HeapRegion* r) {
    // For humongous regions, use liveness of associated starts region.
    HeapRegion* hr = r->is_humongous() ? r->humongous_start_region() : r;
    return _cm->liveness(hr->hrm_index()) * HeapWordSize;
  }

However, I wonder if this is the right way to go?

It seems to me that the underlying problem is that we're even asking
the live_bytes question of humongous_continues regions, when nobody
really cares about the answer (after fixing the policy). We're also
computing it for young regions and for humongous_start regions
containing an objarray, where again the (fixed) policy doesn't care.

It seems to me that things would be simpler if it were the policy that
asked the live_bytes question, after it has determined that it was
interested in the answer. The only downside I can think of is that
the G1RemSetTrackingPolicy would be additionally coupled to the
G1ConcurrentMark object.

From kim.barrett at oracle.com Thu Jul 5 00:13:00 2018
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 4 Jul 2018 20:13:00 -0400
Subject: RFR[JDK11]: 8204691: HeapRegion.apply_to_marked_objects_other_vm_test fails with assert(!hr->is_free() || hr->is_empty()) failed: Free region 0 is not empty for set Free list #
Message-ID:

Please review this fix of the HeapRegion gtest.

The test modifies a region's "top" to unexpected values without
ensuring that no allocation might use the region and no GC might run
while the region is in that invalid state. We solve this by executing
the test code in its very own safepoint, and by saving and then
restoring the region's top back to its original value before
completing the test.
And since we are doing all that, there's no longer any reason to run the test in a separate VM. CR: https://bugs.openjdk.java.net/browse/JDK-8204691 Webrev: http://cr.openjdk.java.net/~kbarrett/8204691/open.00/ Testing: mach5 tier1 (where gtests are run). I wasn't able to reproduce the failure, but the issues these changes address can account for the one failure that's been reported. From kim.barrett at oracle.com Thu Jul 5 05:15:44 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 5 Jul 2018 01:15:44 -0400 Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection In-Reply-To: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> Message-ID: <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> > On Jun 27, 2018, at 2:39 PM, Zhengyu Gu wrote: > > Hi, > > Please review this small enhancement base on paper [1], that keeps the last successfully stolen queue as one of best-of-2 candidates for work stealing. > > Based on experiments done by Thomas Schatzl and myself, it shows positive impacts on task termination and average pause time. > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8205921 > Webrev: http://cr.openjdk.java.net/~zgu/8205921/webrev.00/index.html > > > Test: > hotspot_gc on Linux 64 (fastdebug and release) > > > [1] Characterizing and Optimizing Hotspot Parallel Garbage > Collection on Multicore Systems > http://ranger.uta.edu/~jrao/papers/EuroSys18.pdf > > Thanks, > > -Zhengyu Once set, _last_stolen_queues entries are never invalidated. So we may as well initialize the entries to queue_num+1 mod num_queues. Then get rid of the is_valid test (and the whole notion of validity) and the (only used once per queue_num in the webrev change) random selection of k1. But I think that might not be desirable. The webrev change's behavior is to always use the queue chosen for the last steal attempt as one of the two, even if the last steal attempt failed. 
And because the choice of which of the two to try next prefers that one when they are both empty, we may be reduced to searching with only one random choice for a while, even though the one we keep using has repeatedly failed to yield a result. An alternative that might be better is, whenever a pop_global fails, reset the associated last_stolen id to invalid. This will revert to 2 random choices until we find (at least) one with something we can steal. Actually, it seems the referenced paper does something similar, and the webrev code doesn't match the referenced paper. Why do the last_queue array entries need to be padded? Why not just add a _last_stolen_queue member to TaskQueueSuper? I think it is a pre-existing bug that GenericTaskQueueSet::_n is of type uint, but the associated constructor argument is of type int. I think the constructor is wrong in this regard. From thomas.schatzl at oracle.com Thu Jul 5 07:16:49 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 05 Jul 2018 09:16:49 +0200 Subject: RFR (S): 8205426: Humongous continues remembered set does not match humongous start region one in Kitchensink In-Reply-To: <54BF88C5-5835-49B7-8E1F-E21A4E429D15@oracle.com> References: <69519db2cd7fe431357a5a05b89bba17cdd0eaaa.camel@oracle.com> <54BF88C5-5835-49B7-8E1F-E21A4E429D15@oracle.com> Message-ID: <15c89934f5f69de519bca42c9d7b049e621ccbae.camel@oracle.com> Hi Kim, thanks for your review. On Wed, 2018-07-04 at 19:17 -0400, Kim Barrett wrote: > > On Jun 26, 2018, at 1:15 PM, Thomas Schatzl > com> wrote: > > > > Hi all, > > > > can I have reviews for this bug in keeping remembered sets > > consistent between HC and HS regions, causing crashes with > > verification? > > [...] 
> > CR: > > https://bugs.openjdk.java.net/browse/JDK-8205426 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8205426/webrev/ > > Testing: > > new test case, hs-tier1-4,jdk-tier1-3 > > > > Thanks, > > Thomas > > The new live_bytes_for seems like it's overly verbose and > complicated, and could instead just be something like: The verbosity mainly comes from me just trying to add the comment about the approximation somehwere fitting. Cramming it into a "?" operator statement may cause confusion. But see further below. > > size_t live_bytes_for(HeapRegion* r) { > // For humongous regions, use liveness of associated starts region. > HeapRegion* hr = r->is_humongous() ? r->humongous_start_region() : > r; > return _cm->liveness(hr->hrm_index()) * HeapWordSize; > } > > However, I wonder if this is the right way to go? > > It seems to me that the underlying problem is that we're even asking > the live_bytes question of humongous_continues regions, when nobody > really cares about the answer (after fixing the policy). We're also > computing it for young regions and for humongous_start regions > containing an objarray, where again the (fixed) policy doesn't care. > > It seems to me that things would be simpler if it were the policy > that asked the live_bytes question, after it has determined that it > was interested in the answer. The only downside I can think of is > that the G1RemSetTrackingPolicy would be additionally coupled to the > G1ConcurrentMark object. I would like to have the G1RemSetTrackingPolicy mostly self-contained and not looking through things all over the place; however the main issue seems to be that we actually need to ask for the liveness and update the remembered sets for HC regions. It would be much nicer, and remove a lot of distinction between regular regions and humongous regions if the remembered sets were ready. 
So my master plan ;) had been to have the incremental mixed gcs ready by jdk11 and then fairly easily implement sharing of remembered sets between multiple regions. That not only solves this issue, but also quite much decreases remembered set overhead in many dimensions (e.g. if we process old gen regions during mixed gc in increments >> 1 anyway, why provide that possibility; of course there are some drawbacks to that in reducing flexibility that is not used at the moment anyway). (The exactly same thought came up when talking to ErikD about this change). There is a new webrev at http://cr.openjdk.java.net/~tschatzl/8205426/webrev.1 (full) http://cr.openjdk.java.net/~tschatzl/8205426/webrev.0_to_1 (diff, but almost useless due to many changes) That at least separates the concerns about humongous/regular region a bit. Thanks, Thomas From thomas.schatzl at oracle.com Thu Jul 5 07:54:42 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 05 Jul 2018 09:54:42 +0200 Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection In-Reply-To: <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> Message-ID: <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> Hi, On Thu, 2018-07-05 at 01:15 -0400, Kim Barrett wrote: > > On Jun 27, 2018, at 2:39 PM, Zhengyu Gu wrote: > > > > Hi, > > > > Please review this small enhancement base on paper [1], that keeps > > the last successfully stolen queue as one of best-of-2 candidates > > for work stealing. > > > > Based on experiments done by Thomas Schatzl and myself, it shows > > positive impacts on task termination and average pause time. 
> >
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8205921
> > Webrev: http://cr.openjdk.java.net/~zgu/8205921/webrev.00/index.html
> >
> >
> > Test:
> > hotspot_gc on Linux 64 (fastdebug and release)
> >
> >
> > [1] Characterizing and Optimizing Hotspot Parallel Garbage
> > Collection on Multicore Systems
> > http://ranger.uta.edu/~jrao/papers/EuroSys18.pdf
> >
> > Thanks,
> >
> > -Zhengyu
>
> Once set, _last_stolen_queues entries are never invalidated. So we
> may as well initialize the entries to queue_num+1 mod num_queues.
> Then get rid of the is_valid test (and the whole notion of validity)
> and the (only used once per queue_num in the webrev change) random
> selection of k1.
>
> But I think that might not be desirable. The webrev change's
> behavior is to always use the queue chosen for the last steal attempt
> as one of the two, even if the last steal attempt failed. And
> because the choice of which of the two to try next prefers that one
> when they are both empty, we may be reduced to searching with only
> one random choice for a while, even though the one we keep using has
> repeatedly failed to yield a result.
>
> An alternative that might be better is, whenever a pop_global fails,
> reset the associated last_stolen id to invalid. This will revert to
> 2 random choices until we find (at least) one with something we can
> steal. Actually, it seems the referenced paper does something
> similar, and the webrev code doesn't match the referenced paper.

That may explain why my perf results differ from the paper's, which I was planning to investigate :) Nice find.

> Why do the last_queue array entries need to be padded? Why not just
> add a _last_stolen_queue member to TaskQueueSuper?

The _last_stolen_queue is associated with a (stealing) thread, not the queue. Multiple threads might have the same queue as their current steal target.
One other option I discussed is instead of this array of PaddedQueueId (which I would rename as TaskQueueThreadLocal or TaskQueueStealLocals/Context because I can see adding more in the future) would be passing this around like the seed parameter to steal_best_of_2 (and actually put the seed parameter in there too). It's a bit weird to me to pass two different kinds of thread locals related to work stealing two different ways. The padding is to avoid potential false sharing issues as otherwise last_stolen_id's of different threads end up on the same cache line. And the writes of different threads to disjoint locations would likely invalidate the cache line all the time. Just to avoid a potential performance issue here. > I think it is a pre-existing bug that GenericTaskQueueSet::_n is of > type uint, but the associated constructor argument is of type int. I > think the constructor is wrong in this regard. - please use CamelCase for the INVALID_QUEUE_ID constant. - there are some superfluous spaces at the end-of-line, but that would be flushed out before pushing anyway. Thanks, Thomas From thomas.schatzl at oracle.com Thu Jul 5 07:57:01 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 05 Jul 2018 09:57:01 +0200 Subject: RFR[JDK11]: 8204691: HeapRegion.apply_to_marked_objects_other_vm_test fails with assert(!hr->is_free() || hr->is_empty()) failed: Free region 0 is not empty for set Free list # In-Reply-To: References: Message-ID: Hi, On Wed, 2018-07-04 at 20:13 -0400, Kim Barrett wrote: > Please review this fix of the HeapRegion gtest. > > The test modifies a region's "top" to unexpected values without > ensuring that no allocation might use the region and no GC might run > while the region is in that invalid state. We solve this by > executing the test code in its very own safepoint, and by saving and > then restoring the region's top back to its original value before > completing the test. 
And since we are doing all that, there's no > longer any reason to run the test in a separate VM.

Looks good, but the actual test is still run in a separate VM. Intentional?

Thanks,
Thomas

From rkennke at redhat.com Thu Jul 5 14:42:19 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 5 Jul 2018 16:42:19 +0200
Subject: RFR: JDK-8206407: Primitive atomic_cmpxchg_in_heap_at() in BarrierSet::Access needs to call non-oop raw method
Message-ID: <7bf834d8-3fe3-9458-21a1-02eac7e86897@redhat.com>

BarrierSet::Access:atomic_cmpxchg_in_heap_at() is currently calling Raw::oop_atomic_cmpxchg_at() which is obviously wrong.

We've been lucky because primitive is not bound in default OpenJDK.

Even in Shenandoah land we've been lucky because primitives don't match narrowOop and thus don't get (attempted to) encoded/decoded.

Lucky us.

Let's fix it anyway:
http://cr.openjdk.java.net/~rkennke/JDK-8206407/webrev.00/

Bug:
https://bugs.openjdk.java.net/browse/JDK-8206407

Can I get reviews?

Roman

From shade at redhat.com Thu Jul 5 14:44:43 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 5 Jul 2018 16:44:43 +0200
Subject: RFR: JDK-8206407: Primitive atomic_cmpxchg_in_heap_at() in BarrierSet::Access needs to call non-oop raw method
In-Reply-To: <7bf834d8-3fe3-9458-21a1-02eac7e86897@redhat.com>
References: <7bf834d8-3fe3-9458-21a1-02eac7e86897@redhat.com>
Message-ID: <894b27f7-2eae-ff9f-6a0a-48fcac07a48a@redhat.com>

On 07/05/2018 04:42 PM, Roman Kennke wrote:
> BarrierSet::Access:atomic_cmpxchg_in_heap_at() is currently calling
> Raw::oop_atomic_cmpxchg_at() which is obviously wrong.
>
> We've been lucky because primitive is not bound in default OpenJDK.
>
> Even in Shenandoah land we've been lucky because primitives don't match
> narrowOop and thus don't get (attempted to) encoded/decoded.
> > Lucky us. > > Let's fix it anyway: > http://cr.openjdk.java.net/~rkennke/JDK-8206407/webrev.00/ Fix looks good to me. -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From per.liden at oracle.com Thu Jul 5 14:58:43 2018 From: per.liden at oracle.com (Per Liden) Date: Thu, 5 Jul 2018 16:58:43 +0200 Subject: RFR: JDK-8206407: Primitive atomic_cmpxchg_in_heap_at() in BarrierSet::Access needs to call non-oop raw method In-Reply-To: <7bf834d8-3fe3-9458-21a1-02eac7e86897@redhat.com> References: <7bf834d8-3fe3-9458-21a1-02eac7e86897@redhat.com> Message-ID: <42e239d7-3006-a887-e81e-4fbeae80e2be@oracle.com> On 07/05/2018 04:42 PM, Roman Kennke wrote: > BarrierSet::Access:atomic_cmpxchg_in_heap_at() is currently calling > Raw::oop_atomic_cmpxchg_at() which is obviously wrong. > > We've been lucky because primitive is not bound in default OpenJDK. > > Even in Shenandoah land we've been lucky because primitives don't match > narrowOop and thus don't get (attempted to) encoded/decoded. > > Lucky us. > > Let's fix it anyway: > http://cr.openjdk.java.net/~rkennke/JDK-8206407/webrev.00/ Nice catch! Looks good! /Per > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8206407 > > Can I get reviews? > > Roman > From rkennke at redhat.com Thu Jul 5 15:00:38 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 5 Jul 2018 17:00:38 +0200 Subject: RFR: JDK-8206407: Primitive atomic_cmpxchg_in_heap_at() in BarrierSet::Access needs to call non-oop raw method In-Reply-To: <42e239d7-3006-a887-e81e-4fbeae80e2be@oracle.com> References: <7bf834d8-3fe3-9458-21a1-02eac7e86897@redhat.com> <42e239d7-3006-a887-e81e-4fbeae80e2be@oracle.com> Message-ID: <3513ec0b-45bb-8983-78be-1cfb81ac8e5c@redhat.com> Hi Per, >> BarrierSet::Access:atomic_cmpxchg_in_heap_at() is currently calling >> Raw::oop_atomic_cmpxchg_at() which is obviously wrong. 
>>
>> We've been lucky because primitive is not bound in default OpenJDK.
>>
>> Even in Shenandoah land we've been lucky because primitives don't match
>> narrowOop and thus don't get (attempted to) encoded/decoded.
>>
>> Lucky us.
>>
>> Let's fix it anyway:
>> http://cr.openjdk.java.net/~rkennke/JDK-8206407/webrev.00/
>
> Nice catch! Looks good!

Thanks for reviewing!

Does it qualify for trivial-doesn't-have-to-wait-24h-rule? I believe it does: it is one line that is not even touched by default.

Roman

From rkennke at redhat.com Thu Jul 5 15:12:55 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 5 Jul 2018 17:12:55 +0200
Subject: RFR: JDK-8206272: Remove stray BarrierSetAssembler call
Message-ID: <48d19ab8-25d1-7a94-c015-8db7b30fc1aa@redhat.com>

And while we are at trivial fixes, we have a call to get a BarrierSetAssembler* in methodHandles_x86.cpp that is subsequently not used anywhere.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8206272
Webrev:
http://cr.openjdk.java.net/~rkennke/JDK-8206272/webrev.00/

I assume this qualifies as trivial?

Roman

From shade at redhat.com Thu Jul 5 15:17:44 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 5 Jul 2018 17:17:44 +0200
Subject: RFR: JDK-8206272: Remove stray BarrierSetAssembler call
In-Reply-To: <48d19ab8-25d1-7a94-c015-8db7b30fc1aa@redhat.com>
References: <48d19ab8-25d1-7a94-c015-8db7b30fc1aa@redhat.com>
Message-ID:

On 07/05/2018 05:12 PM, Roman Kennke wrote:
> and while we are at trivial fixes, we've a call to get a
> BarrierSetAssembler* in methodHandles_x86.cpp that is subsequently not
> used anywhere.
> > Bug: > https://bugs.openjdk.java.net/browse/JDK-8206272 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8206272/webrev.00/ > > I assume this qualifies as trivial? I think so. Looks good! -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From per.liden at oracle.com Thu Jul 5 15:24:48 2018 From: per.liden at oracle.com (Per Liden) Date: Thu, 5 Jul 2018 17:24:48 +0200 Subject: RFR: JDK-8206272: Remove stray BarrierSetAssembler call In-Reply-To: <48d19ab8-25d1-7a94-c015-8db7b30fc1aa@redhat.com> References: <48d19ab8-25d1-7a94-c015-8db7b30fc1aa@redhat.com> Message-ID: On 07/05/2018 05:12 PM, Roman Kennke wrote: > and while we are at trivial fixes, we've a call to get a > BarrierSetAssembler* in methodHandles_x86.cpp that is subsequently not > used anywhere. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8206272 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8206272/webrev.00/ > > I assume this qualifies as trivial? Yep, looks good and trivial to me. /Per > > Roman > From per.liden at oracle.com Thu Jul 5 15:26:01 2018 From: per.liden at oracle.com (Per Liden) Date: Thu, 5 Jul 2018 17:26:01 +0200 Subject: RFR: JDK-8206407: Primitive atomic_cmpxchg_in_heap_at() in BarrierSet::Access needs to call non-oop raw method In-Reply-To: <3513ec0b-45bb-8983-78be-1cfb81ac8e5c@redhat.com> References: <7bf834d8-3fe3-9458-21a1-02eac7e86897@redhat.com> <42e239d7-3006-a887-e81e-4fbeae80e2be@oracle.com> <3513ec0b-45bb-8983-78be-1cfb81ac8e5c@redhat.com> Message-ID: <881e0ffd-4fbc-d7ad-ba2c-8b66b12e11f5@oracle.com> On 07/05/2018 05:00 PM, Roman Kennke wrote: > Hi Per, > >>> BarrierSet::Access:atomic_cmpxchg_in_heap_at() is currently calling >>> Raw::oop_atomic_cmpxchg_at() which is obviously wrong. >>> >>> We've been lucky because primitive is not bound in default OpenJDK. 
>>> >>> Even in Shenandoah land we've been lucky because primitives don't match >>> narrowOop and thus don't get (attempted to) encoded/decoded. >>> >>> Lucky us. >>> >>> Let's fix it anyway: >>> http://cr.openjdk.java.net/~rkennke/JDK-8206407/webrev.00/ >> >> Nice catch! Looks good! > > Thanks for reviewing! > > Does it qualify for trivial-doesn't-have-to-wait-24h-rule? I believe it > does, it's 1 line that is not even touched by default. Fine with me (assuming it still passes tier1). /Per > > Roman > From zgu at redhat.com Thu Jul 5 16:08:55 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 5 Jul 2018 12:08:55 -0400 Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection In-Reply-To: <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> Message-ID: <9e6d0156-ecbb-45e3-a345-fc7a0f5a14c3@redhat.com> Hi Kim and Thomas, Thanks for reviewing. On 07/05/2018 03:54 AM, Thomas Schatzl wrote: > Hi, > > On Thu, 2018-07-05 at 01:15 -0400, Kim Barrett wrote: >>> On Jun 27, 2018, at 2:39 PM, Zhengyu Gu wrote: >>> >>> Hi, >>> >>> Please review this small enhancement base on paper [1], that keeps >>> the last successfully stolen queue as one of best-of-2 candidates >>> for work stealing. >>> >>> Based on experiments done by Thomas Schatzl and myself, it shows >>> positive impacts on task termination and average pause time. 
>>> >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205921 >>> Webrev: http://cr.openjdk.java.net/~zgu/8205921/webrev.00/index.htm >>> l >>> >>> >>> Test: >>> hotspot_gc on Linux 64 (fastdebug and release) >>> >>> >>> [1] Characterizing and Optimizing Hotspot Parallel Garbage >>> Collection on Multicore Systems >>> http://ranger.uta.edu/~jrao/papers/EuroSys18.pdf >>> >>> Thanks, >>> >>> -Zhengyu >> >> Once set, _last_stolen_queues entries are never invalidated. So we >> may as well initialize the entries to queue_num+1 mod num_queues. >> Then get rid of the is_valid test (and the whole notion of validity) >> and the (only used once per queue_num in the webrev change) random >> selection of k1. >> >> But I think that might not be desirable. The webrev change's >> behavior is to always use the queue chosen for the last steal attempt >> as one of the two, even if the last steal attempt failed. And >> because the choice of which of the two to try next prefers that one >> when they are both empty, we may be reduced to searching with only >> one random choice for a while, even though the one we keep using has >> repeatedly failed to yield a result. >> >> An alternative that might be better is, whenever a pop_global fails, >> reset the associated last_stolen id to invalid. This will revert to >> 2 random choices until we find (at least) one with something we can >> steal. Actually, it seems the referenced paper does something >> similar, and the webrev code doesn't match the referenced paper. > > That may explain why my perf results are different to the paper that I > was planning to investigate :) Nice find. Sorry, my bad. > >> Why do the last_queue array entries need to be padded? Why not just >> add a _last_stolen_queue member to TaskQueueSuper? > > The _last_stolen_queue is associated to a (stealing) thread, not the > queue. Multiple threads might have the same queue as current steal > target. 
> > One other option I discussed is instead of this array of PaddedQueueId > (which I would rename as TaskQueueThreadLocal or > TaskQueueStealLocals/Context because I can see adding more in the > future) would be passing this around like the seed parameter to > steal_best_of_2 (and actually put the seed parameter in there too). > > It's a bit weird to me to pass two different kinds of thread locals > related to work stealing two different ways. I would prefer to pass down TaskQueueStealContext, just like seed, to avoid this padded queue id array. However, it means that we have to update all call sites, which I am not comfortable to do at this time. Could we make this a future item? Updated webrev: http://cr.openjdk.java.net/~zgu/8205921/webrev.01/index.html Thanks, -Zhengyu > > The padding is to avoid potential false sharing issues as otherwise > last_stolen_id's of different threads end up on the same cache line. > And the writes of different threads to disjoint locations would likely > invalidate the cache line all the time. Just to avoid a potential > performance issue here. > >> I think it is a pre-existing bug that GenericTaskQueueSet::_n is of >> type uint, but the associated constructor argument is of type int. I >> think the constructor is wrong in this regard. > > > - please use CamelCase for the INVALID_QUEUE_ID constant. > > - there are some superfluous spaces at the end-of-line, but that would > be flushed out before pushing anyway. 
> > Thanks, > Thomas > From kim.barrett at oracle.com Thu Jul 5 17:33:09 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 5 Jul 2018 13:33:09 -0400 Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection In-Reply-To: <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> Message-ID: <39B54C1B-815D-4A24-A4F5-F3660FE63E05@oracle.com> > On Jul 5, 2018, at 3:54 AM, Thomas Schatzl wrote: > > Hi, > > On Thu, 2018-07-05 at 01:15 -0400, Kim Barrett wrote: >>> On Jun 27, 2018, at 2:39 PM, Zhengyu Gu wrote: >>> >>> [?] >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205921 >>> Webrev: http://cr.openjdk.java.net/~zgu/8205921/webrev.00/index.htm >>> l >>> [?] >> Why do the last_queue array entries need to be padded? Why not just >> add a _last_stolen_queue member to TaskQueueSuper? > > The _last_stolen_queue is associated to a (stealing) thread, not the > queue. Multiple threads might have the same queue as current steal > target. The stealing thread should use its own queue to obtain and record this value, e.g. _queues[queue_num]->_last_stolen_queue It seems to me the random seed could also be there, addressing your other complaint (below). That might have false sharing issues with the volatile members of the queue, but the existing _elems member have similar issues. Maybe the volatile queue members ought to be padded? > One other option I discussed is instead of this array of PaddedQueueId > (which I would rename as TaskQueueThreadLocal or > TaskQueueStealLocals/Context because I can see adding more in the > future) would be passing this around like the seed parameter to > steal_best_of_2 (and actually put the seed parameter in there too). > > It's a bit weird to me to pass two different kinds of thread locals > related to work stealing two different ways. 
> > The padding is to avoid potential false sharing issues as otherwise > last_stolen_id's of different threads end up on the same cache line. > And the writes of different threads to disjoint locations would likely > invalidate the cache line all the time. Just to avoid a potential > performance issue here. From zgu at redhat.com Thu Jul 5 18:44:30 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 5 Jul 2018 14:44:30 -0400 Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection In-Reply-To: <39B54C1B-815D-4A24-A4F5-F3660FE63E05@oracle.com> References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> <39B54C1B-815D-4A24-A4F5-F3660FE63E05@oracle.com> Message-ID: <78c346e6-6c36-243a-e84d-16b2cee458d7@redhat.com> Hi Kim, On 07/05/2018 01:33 PM, Kim Barrett wrote: >> On Jul 5, 2018, at 3:54 AM, Thomas Schatzl wrote: >> >> Hi, >> >> On Thu, 2018-07-05 at 01:15 -0400, Kim Barrett wrote: >>>> On Jun 27, 2018, at 2:39 PM, Zhengyu Gu wrote: >>>> >>>> [?] >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205921 >>>> Webrev: http://cr.openjdk.java.net/~zgu/8205921/webrev.00/index.htm >>>> l >>>> [?] >>> Why do the last_queue array entries need to be padded? Why not just >>> add a _last_stolen_queue member to TaskQueueSuper? >> >> The _last_stolen_queue is associated to a (stealing) thread, not the >> queue. Multiple threads might have the same queue as current steal >> target. > > The stealing thread should use its own queue to obtain and record this > value, e.g. > > _queues[queue_num]->_last_stolen_queue > > It seems to me the random seed could also be there, addressing your > other complaint (below). > Is it a bit weird to have these two fields in queue? given they have nothing to do with queue itself? I intended to use thread local for last_stolen_queue in Shenandoah, since we do have extra spaces in GCThreadLocalData. 
> That might have false sharing issues with the volatile members of the > queue, but the existing _elems member have similar issues. Maybe the > volatile queue members ought to be padded? I can see we might need to pad Age and bottom. But I don't understand why _elems member has similar issues, could you explain? Thanks, -Zhengyu > >> One other option I discussed is instead of this array of PaddedQueueId >> (which I would rename as TaskQueueThreadLocal or >> TaskQueueStealLocals/Context because I can see adding more in the >> future) would be passing this around like the seed parameter to >> steal_best_of_2 (and actually put the seed parameter in there too). >> >> It's a bit weird to me to pass two different kinds of thread locals >> related to work stealing two different ways. >> >> The padding is to avoid potential false sharing issues as otherwise >> last_stolen_id's of different threads end up on the same cache line. >> And the writes of different threads to disjoint locations would likely >> invalidate the cache line all the time. Just to avoid a potential >> performance issue here. > From kim.barrett at oracle.com Thu Jul 5 18:49:46 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 5 Jul 2018 14:49:46 -0400 Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection In-Reply-To: <9e6d0156-ecbb-45e3-a345-fc7a0f5a14c3@redhat.com> References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> <9e6d0156-ecbb-45e3-a345-fc7a0f5a14c3@redhat.com> Message-ID: <90E9F360-602F-4E02-9800-C5C3231D1827@oracle.com> > On Jul 5, 2018, at 12:08 PM, Zhengyu Gu wrote: > > Hi Kim and Thomas, > > Thanks for reviewing. > > On 07/05/2018 03:54 AM, Thomas Schatzl wrote: >> Hi, >> On Thu, 2018-07-05 at 01:15 -0400, Kim Barrett wrote: >>>> On Jun 27, 2018, at 2:39 PM, Zhengyu Gu wrote: >>> [?] 
>>> An alternative that might be better is, whenever a pop_global fails,
>>> reset the associated last_stolen id to invalid. This will revert to
>>> 2 random choices until we find (at least) one with something we can
>>> steal. Actually, it seems the referenced paper does something
>>> similar, and the webrev code doesn't match the referenced paper.
>> That may explain why my perf results are different to the paper that I
>> was planning to investigate :) Nice find.
>
> Sorry, my bad.
>
> [...]
> Updated webrev:
>
> http://cr.openjdk.java.net/~zgu/8205921/webrev.01/index.html

src/hotspot/share/gc/shared/taskqueue.inline.hpp
 255     if (sz2 > sz1) {
 256       sel_k = k2;
 257       suc = _queues[k2]->pop_global(t);
 258     } else {
 259       sel_k = k1;
 260       suc = _queues[k1]->pop_global(t);
 261     }

The paper avoids the steal attempt when both potential victims have a size of zero, e.g. insert another clause:

    } else if (sz1 == 0) {
      sel_k = k1; // Might be needed to avoid uninitialized variable warnings?
      suc = false;
    } else {
      ...

There is a race condition between obtaining the size and checking it here, but I don't think that's important. The point is to avoid an expensive steal attempt when it is very likely to fail.

From zgu at redhat.com Thu Jul 5 19:16:06 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 5 Jul 2018 15:16:06 -0400
Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection
In-Reply-To: <90E9F360-602F-4E02-9800-C5C3231D1827@oracle.com>
References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> <9e6d0156-ecbb-45e3-a345-fc7a0f5a14c3@redhat.com> <90E9F360-602F-4E02-9800-C5C3231D1827@oracle.com>
Message-ID: <75070643-0492-c620-eb11-3225506ef1b7@redhat.com>

On 07/05/2018 02:49 PM, Kim Barrett wrote:
>> On Jul 5, 2018, at 12:08 PM, Zhengyu Gu wrote:
>>
>> Hi Kim and Thomas,
>>
>> Thanks for reviewing.
>> >> On 07/05/2018 03:54 AM, Thomas Schatzl wrote: >>> Hi, >>> On Thu, 2018-07-05 at 01:15 -0400, Kim Barrett wrote: >>>>> On Jun 27, 2018, at 2:39 PM, Zhengyu Gu wrote: >>>> [?] > >>>> An alternative that might be better is, whenever a pop_global fails, >>>> reset the associated last_stolen id to invalid. This will revert to >>>> 2 random choices until we find (at least) one with something we can >>>> steal. Actually, it seems the referenced paper does something >>>> similar, and the webrev code doesn't match the referenced paper. >>> That may explain why my perf results are different to the paper that I >>> was planning to investigate :) Nice find. >> >> Sorry, my bad. >> >> [?] >> Updated webrev: >> >> http://cr.openjdk.java.net/~zgu/8205921/webrev.01/index.html > > src/hotspot/share/gc/shared/taskqueue.inline.hpp > 255 if (sz2 > sz1) { > 256 sel_k = k2; > 257 suc = _queues[k2]->pop_global(t); > 258 } else { > 259 sel_k = k1; > 260 suc = _queues[k1]->pop_global(t); > 261 } > > The paper avoids the steal attempt when both potential victims have a > size of zero, e.g. insert another clause: > > } else if (sz1 == 0) { > sel_k = k1; // Might be needed to avoid uninitialized variable warnings? > suc = false; > } else { > ... > > There is a race condition between obtaining the size and checking it > here, but I don't think that's important. The point is to avoid an > expensive steal attempt when it is very likely to fail. Yes, I missed this. 
http://cr.openjdk.java.net/~zgu/8205921/webrev.02/index.html Thanks, -Zhengyu > From kim.barrett at oracle.com Thu Jul 5 19:26:04 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 5 Jul 2018 15:26:04 -0400 Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection In-Reply-To: <78c346e6-6c36-243a-e84d-16b2cee458d7@redhat.com> References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> <39B54C1B-815D-4A24-A4F5-F3660FE63E05@oracle.com> <78c346e6-6c36-243a-e84d-16b2cee458d7@redhat.com> Message-ID: > On Jul 5, 2018, at 2:44 PM, Zhengyu Gu wrote: > > Hi Kim, > > On 07/05/2018 01:33 PM, Kim Barrett wrote: >>> On Jul 5, 2018, at 3:54 AM, Thomas Schatzl wrote: >>> >>> Hi, >>> >>> On Thu, 2018-07-05 at 01:15 -0400, Kim Barrett wrote: >>>>> On Jun 27, 2018, at 2:39 PM, Zhengyu Gu wrote: >>>>> >>>>> [?] >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205921 >>>>> Webrev: http://cr.openjdk.java.net/~zgu/8205921/webrev.00/index.htm >>>>> l >>>>> [?] >>>> Why do the last_queue array entries need to be padded? Why not just >>>> add a _last_stolen_queue member to TaskQueueSuper? >>> >>> The _last_stolen_queue is associated to a (stealing) thread, not the >>> queue. Multiple threads might have the same queue as current steal >>> target. >> The stealing thread should use its own queue to obtain and record this >> value, e.g. >> _queues[queue_num]->_last_stolen_queue >> It seems to me the random seed could also be there, addressing your >> other complaint (below). > Is it a bit weird to have these two fields in queue? given they have nothing to do with queue itself? > > I intended to use thread local for last_stolen_queue in Shenandoah, since we do have extra spaces in GCThreadLocalData. I don't think it's weird. 
Both the last steal queue and the random seed are 1:1 associated with a specific queue, and are part of the implementation of operations on the queue. This is a common problem when there is a cooperating pair of class X and class "collection of X". Maybe if steal_best_of_2 were a member function of the queue, rather than implemented by the queue set operating on the data in a selected queue, it might seem more apparent that this information belongs with the queue. >> That might have false sharing issues with the volatile members of the >> queue, but the existing _elems member have similar issues. Maybe the >> volatile queue members ought to be padded? > > I can see we might need to pad Age and bottom. But I don't understand why _elems member has similar issues, could you explain? Unshared _elems may be in the same cache line as shared _age or _bottom, so reads of the _elems member may be impacted by writes to those shared members by other threads. The same is true for any new unshared members we might add. 
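The false-sharing concern raised here can be made concrete with a layout sketch. This is an illustration under assumptions, not the TaskQueueSuper layout: `ToyQueueLayout`, its member names, and the 64-byte `kCacheLineSize` are hypothetical, and standard C++ `alignas` is used in place of whatever padding helpers HotSpot provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache line size; the real value is platform-specific.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical queue layout: members written by other threads (bottom, age)
// each start on their own cache line, so writes to them do not invalidate
// the line holding members that only the owning thread touches.
struct ToyQueueLayout {
  alignas(kCacheLineSize) volatile uint32_t bottom;  // shared, updated often
  alignas(kCacheLineSize) volatile uint32_t age;     // shared, contended by thieves
  alignas(kCacheLineSize) int* elems;                // unshared pointer to storage
  uint32_t last_stolen_queue_id;                     // unshared, owner-only
  uint32_t seed;                                     // unshared, owner-only
};

// Index of the cache line a given byte offset falls into.
constexpr std::size_t line_of(std::size_t offset) {
  return offset / kCacheLineSize;
}

static_assert(line_of(offsetof(ToyQueueLayout, bottom)) !=
              line_of(offsetof(ToyQueueLayout, age)),
              "shared members sit on different cache lines");
static_assert(line_of(offsetof(ToyQueueLayout, age)) !=
              line_of(offsetof(ToyQueueLayout, elems)),
              "unshared members are kept off the shared lines");
```

Without the `alignas` specifiers all five members would fit in one line, so reads of `elems` or `last_stolen_queue_id` by the owner could be slowed by other threads writing `age` or `bottom`. The cost of the padding is a larger object (three lines instead of one in this sketch).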
From kim.barrett at oracle.com Thu Jul 5 19:33:03 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 5 Jul 2018 15:33:03 -0400 Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection In-Reply-To: <75070643-0492-c620-eb11-3225506ef1b7@redhat.com> References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> <9e6d0156-ecbb-45e3-a345-fc7a0f5a14c3@redhat.com> <90E9F360-602F-4E02-9800-C5C3231D1827@oracle.com> <75070643-0492-c620-eb11-3225506ef1b7@redhat.com> Message-ID: > On Jul 5, 2018, at 3:16 PM, Zhengyu Gu wrote: > >>> Updated webrev: >>> >>> http://cr.openjdk.java.net/~zgu/8205921/webrev.01/index.html >> src/hotspot/share/gc/shared/taskqueue.inline.hpp >> 255 if (sz2 > sz1) { >> 256 sel_k = k2; >> 257 suc = _queues[k2]->pop_global(t); >> 258 } else { >> 259 sel_k = k1; >> 260 suc = _queues[k1]->pop_global(t); >> 261 } >> The paper avoids the steal attempt when both potential victims have a >> size of zero, e.g. insert another clause: >> } else if (sz1 == 0) { >> sel_k = k1; // Might be needed to avoid uninitialized variable warnings? >> suc = false; >> } else { >> ... >> There is a race condition between obtaining the size and checking it >> here, but I don't think that's important. The point is to avoid an >> expensive steal attempt when it is very likely to fail. > > Yes, I missed this. > > http://cr.openjdk.java.net/~zgu/8205921/webrev.02/index.html > > Thanks, > > -Zhengyu I think that makes the change accurately reflect the paper. 
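For readers following along, the selection policy converged on in this thread can be sketched as a small single-threaded model. It is not the webrev code: `ToyQueue`, `StealContext`, `kInvalidQueueId`, `next_random`, and `random_other` are hypothetical names, the queues are plain deques rather than lock-free task queues, and the rule that the second candidate differs from the first is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

constexpr unsigned kInvalidQueueId = ~0u;  // hypothetical "no remembered victim"

struct ToyQueue {
  std::deque<int> tasks;
  unsigned size() const { return static_cast<unsigned>(tasks.size()); }
  // Stand-in for pop_global(): take one task from the stealing end.
  bool pop_global(int& t) {
    if (tasks.empty()) return false;
    t = tasks.front();
    tasks.pop_front();
    return true;
  }
};

// State owned by one stealing thread: its seed and its last successful victim.
struct StealContext {
  uint32_t seed = 17;
  unsigned last_stolen = kInvalidQueueId;
};

// Park-Miller style generator; any cheap per-thread PRNG would do.
inline uint32_t next_random(uint32_t& seed) {
  seed = static_cast<uint32_t>((static_cast<uint64_t>(seed) * 16807u) % 2147483647u);
  return seed;
}

// Random queue index different from self (requires n >= 2).
inline unsigned random_other(unsigned self, unsigned n, uint32_t& seed) {
  unsigned k = next_random(seed) % (n - 1);
  return k >= self ? k + 1 : k;
}

bool steal_best_of_2(std::vector<ToyQueue>& queues, unsigned self,
                     StealContext& ctx, int& t) {
  const unsigned n = static_cast<unsigned>(queues.size());
  if (n < 2) return false;
  // First candidate: the remembered victim if valid, otherwise random.
  const unsigned k1 = (ctx.last_stolen != kInvalidQueueId)
                          ? ctx.last_stolen
                          : random_other(self, n, ctx.seed);
  // Second candidate: random, distinct from k1 when that is possible.
  unsigned k2 = random_other(self, n, ctx.seed);
  while (n > 2 && k2 == k1) k2 = random_other(self, n, ctx.seed);

  const unsigned sz1 = queues[k1].size();
  const unsigned sz2 = queues[k2].size();
  unsigned sel_k = k1;
  bool suc = false;
  if (sz2 > sz1) {
    sel_k = k2;
    suc = queues[k2].pop_global(t);
  } else if (sz1 > 0) {
    suc = queues[k1].pop_global(t);
  }
  // Otherwise both candidates look empty, so no steal is attempted.

  // Remember the victim only after a successful steal; forget it on failure.
  ctx.last_stolen = suc ? sel_k : kInvalidQueueId;
  return suc;
}
```

Keeping `StealContext` separate from the queue corresponds to the "pass it like the seed" option; moving its two fields into the stealing thread's own queue would correspond to the other option discussed. The selection logic is the same either way.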
Just one minor nit: extraneous whitespace in "0 )":

 258     } else if (sz1 > 0 ) {

From thomas.schatzl at oracle.com Thu Jul 5 19:47:49 2018
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 05 Jul 2018 21:47:49 +0200
Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection
In-Reply-To: <90E9F360-602F-4E02-9800-C5C3231D1827@oracle.com>
References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> <9e6d0156-ecbb-45e3-a345-fc7a0f5a14c3@redhat.com> <90E9F360-602F-4E02-9800-C5C3231D1827@oracle.com>
Message-ID: <344187d3db4f3a069a70a730cc1c3b6555243f9d.camel@oracle.com>

Hi,

On Thu, 2018-07-05 at 14:49 -0400, Kim Barrett wrote:
> > On Jul 5, 2018, at 12:08 PM, Zhengyu Gu wrote:
> >
> > Hi Kim and Thomas,
> >
> > Thanks for reviewing.
> >
> > On 07/05/2018 03:54 AM, Thomas Schatzl wrote:
> > > Hi,
> > > On Thu, 2018-07-05 at 01:15 -0400, Kim Barrett wrote:
> > > > > On Jun 27, 2018, at 2:39 PM, Zhengyu Gu
> > > > > wrote:
> > > >
> > > > [...]
> > > > An alternative that might be better is, whenever a pop_global
> > > > fails, reset the associated last_stolen id to invalid. This
> > > > will revert to 2 random choices until we find (at least) one
> > > > with something we can steal. Actually, it seems the referenced
> > > > paper does something similar, and the webrev code doesn't match
> > > > the referenced paper.
> > >
> > > That may explain why my perf results are different to the paper
> > > that I was planning to investigate :) Nice
> > > find.
> >
> > Sorry, my bad.

I have been looking into this a bit, and finally, with a patch of mine that also fixes the changes and some additional probes (using the TASKQUEUE_STATS "infrastructure"), I am starting to get meaningful results. More about that later.
In any case the technique looks like a nice improvement at least in steal attempts and steal/steal attempts ratio on some bigger tests, but I need to update my code again it seems :) I can add the changes to the TASKQUEUE_STATS logging later btw. > > > > [?] > > Updated webrev: > > > > http://cr.openjdk.java.net/~zgu/8205921/webrev.01/index.html > > src/hotspot/share/gc/shared/taskqueue.inline.hpp > 255 if (sz2 > sz1) { > 256 sel_k = k2; > 257 suc = _queues[k2]->pop_global(t); > 258 } else { > 259 sel_k = k1; > 260 suc = _queues[k1]->pop_global(t); > 261 } > > The paper avoids the steal attempt when both potential victims have a > size of zero, e.g. insert another clause: > > } else if (sz1 == 0) { > sel_k = k1; // Might be needed to avoid uninitialized variable > warnings? > suc = false; > } else { > ... > > There is a race condition between obtaining the size and checking it > here, but I don't think that's important. The point is to avoid an > expensive steal attempt when it is very likely to fail. > There is another bug in the existing code: current Hotspot collectors all reuse a single task queue set. So since the queue id's are only initialized once at startup, there will be some initial use of a suboptimal queue. I assume Shenandoah does not need a reset because it creates new taskqueuesets whenever it needs them (and frees them afterwards). The current design of passing stealing-local information (the seed) makes it clear that the owner of that variable needs to initialize it. At this time I have no preference on Kim's suggestion to put these variables into the queue if you asked me. I would tend to encapsulate the mechanism as much as possible though. I do think if Shenandoah wants to put this information into the GCThreadLocalBlock (for what reason?) it is probably most flexible to pass these things as kind of context to the steal_best_of_2() method. 
I do not think it is desirable to have a second copy of the taskqueue* code around; I can't see how else one implementation can use the TaskQueueSet local queue ids and the other use the same information from somewhere else right now. Thanks, Thomas From zgu at redhat.com Thu Jul 5 19:56:54 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 5 Jul 2018 15:56:54 -0400 Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection In-Reply-To: References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> <39B54C1B-815D-4A24-A4F5-F3660FE63E05@oracle.com> <78c346e6-6c36-243a-e84d-16b2cee458d7@redhat.com> Message-ID: <196db311-d3d2-de16-9af2-593d82b17a29@redhat.com> On 07/05/2018 03:26 PM, Kim Barrett wrote: >> On Jul 5, 2018, at 2:44 PM, Zhengyu Gu wrote: >> >> Hi Kim, >> >> On 07/05/2018 01:33 PM, Kim Barrett wrote: >>>> On Jul 5, 2018, at 3:54 AM, Thomas Schatzl wrote: >>>> >>>> Hi, >>>> >>>> On Thu, 2018-07-05 at 01:15 -0400, Kim Barrett wrote: >>>>>> On Jun 27, 2018, at 2:39 PM, Zhengyu Gu wrote: >>>>>> >>>>>> [?] >>>>>> >>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8205921 >>>>>> Webrev: http://cr.openjdk.java.net/~zgu/8205921/webrev.00/index.htm >>>>>> l >>>>>> [?] >>>>> Why do the last_queue array entries need to be padded? Why not just >>>>> add a _last_stolen_queue member to TaskQueueSuper? >>>> >>>> The _last_stolen_queue is associated to a (stealing) thread, not the >>>> queue. Multiple threads might have the same queue as current steal >>>> target. >>> The stealing thread should use its own queue to obtain and record this >>> value, e.g. >>> _queues[queue_num]->_last_stolen_queue >>> It seems to me the random seed could also be there, addressing your >>> other complaint (below). >> Is it a bit weird to have these two fields in queue? given they have nothing to do with queue itself? 
>> >> I intended to use thread local for last_stolen_queue in Shenandoah, since we do have extra spaces in GCThreadLocalData. > > I don't think it's weird. Both the last steal queue and the random > seed are 1:1 associated with a specific queue, and are part of the > implementation of operations on the queue. This is a common problem > when there is a cooperating pair of class X and class "collection of > X". Maybe if steal_best_of_2 were a member function of the queue, > rather than implemented by the queue set operating on the data in a > selected queue, it might seem more apparent that this information > belongs with the queue. > >>> That might have false sharing issues with the volatile members of the >>> queue, but the existing _elems member have similar issues. Maybe the >>> volatile queue members ought to be padded? >> >> I can see we might need to pad Age and bottom. But I don't understand why _elems member has similar issues, could you explain? > > Unshared _elems may be in the same cache line as shared _age or > _bottom, so reads of the _elems member may be impacted by writes to > those shared members by other threads. The same is true for any new > unshared members we might add. Ah, I thought _elems is from additional allocation, it is not a concern, but I guess there is still a chance. 
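The false-sharing concern can be illustrated with a small layout sketch. The struct and field names below are hypothetical (they only loosely mirror the members named in this thread), and the 64-byte cache line is an assumption; real line sizes vary by platform.

```cpp
#include <cstddef>

// Assumed cache line size for this sketch only.
constexpr std::size_t kCacheLine = 64;

// Maps a byte offset within a struct to a cache line index,
// assuming the struct itself starts on a line boundary.
constexpr std::size_t line_of(std::size_t offset) {
  return offset / kCacheLine;
}

// No padding: the fields written by other threads (bottom, age) sit in
// the same line as fields that are only read or only owner-local.
struct UnpaddedQueue {
  volatile unsigned bottom;  // shared, frequently written
  volatile unsigned age;     // shared, frequently written
  int* elems;                // pointer itself is not rewritten
  int last_stolen_queue_id;  // owner-local bookkeeping
};

// Each group starts on its own (assumed) cache line.
struct PaddedQueue {
  alignas(kCacheLine) volatile unsigned bottom;
  alignas(kCacheLine) volatile unsigned age;
  alignas(kCacheLine) int* elems;
  int last_stolen_queue_id;  // shares a line with elems only
};

static_assert(line_of(offsetof(UnpaddedQueue, elems)) ==
              line_of(offsetof(UnpaddedQueue, age)),
              "unpadded: elems shares a line with age");
static_assert(line_of(offsetof(PaddedQueue, elems)) !=
              line_of(offsetof(PaddedQueue, age)),
              "padded: elems is on a different line than age");
```

The padded layout trades memory for isolation: the sketch grows the struct to three lines, which is one reason to treat padding as a separate change.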
Thanks, -Zhengyu > From zgu at redhat.com Thu Jul 5 19:57:30 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 5 Jul 2018 15:57:30 -0400 Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection In-Reply-To: References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> <9e6d0156-ecbb-45e3-a345-fc7a0f5a14c3@redhat.com> <90E9F360-602F-4E02-9800-C5C3231D1827@oracle.com> <75070643-0492-c620-eb11-3225506ef1b7@redhat.com> Message-ID: On 07/05/2018 03:33 PM, Kim Barrett wrote: >> On Jul 5, 2018, at 3:16 PM, Zhengyu Gu wrote: >> >>>> Updated webrev: >>>> >>>> http://cr.openjdk.java.net/~zgu/8205921/webrev.01/index.html >>> src/hotspot/share/gc/shared/taskqueue.inline.hpp >>> 255 if (sz2 > sz1) { >>> 256 sel_k = k2; >>> 257 suc = _queues[k2]->pop_global(t); >>> 258 } else { >>> 259 sel_k = k1; >>> 260 suc = _queues[k1]->pop_global(t); >>> 261 } >>> The paper avoids the steal attempt when both potential victims have a >>> size of zero, e.g. insert another clause: >>> } else if (sz1 == 0) { >>> sel_k = k1; // Might be needed to avoid uninitialized variable warnings? >>> suc = false; >>> } else { >>> ... >>> There is a race condition between obtaining the size and checking it >>> here, but I don't think that's important. The point is to avoid an >>> expensive steal attempt when it is very likely to fail. >> >> Yes, I missed this. >> >> http://cr.openjdk.java.net/~zgu/8205921/webrev.02/index.html >> >> Thanks, >> >> -Zhengyu > > I think that makes the change accurately reflect the paper. > > Just one minor nit: extraneous whitespace in ?0 )?: > 258 } else if (sz1 > 0 ) { > I will fix it before push. Thanks a lot! 
-Zhengyu
>

From kim.barrett at oracle.com Thu Jul 5 20:03:27 2018
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 5 Jul 2018 16:03:27 -0400
Subject: RFR[JDK11]: 8204691: HeapRegion.apply_to_marked_objects_other_vm_test fails with assert(!hr->is_free() || hr->is_empty()) failed: Free region 0 is not empty for set Free list #
In-Reply-To:
References:
Message-ID: <7BB45268-F475-41CA-BB2D-DFB2FE9AE6E4@oracle.com>

> On Jul 5, 2018, at 3:57 AM, Thomas Schatzl wrote:
>
> Hi,
>
> On Wed, 2018-07-04 at 20:13 -0400, Kim Barrett wrote:
>> Please review this fix of the HeapRegion gtest.
>>
>> The test modifies a region's "top" to unexpected values without
>> ensuring that no allocation might use the region and no GC might run
>> while the region is in that invalid state. We solve this by
>> executing the test code in its very own safepoint, and by saving and
>> then restoring the region's top back to its original value before
>> completing the test. And since we are doing all that, there's no
>> longer any reason to run the test in a separate VM.
>
> looks good, but the actual test is still run in a separate VM.
> Intentional?

Unintentional. And now I'm not sure what I last ran through mach5.
I'll re-test with TEST_OTHER_VM => TEST_VM. I know that failed in an obscure
way earlier, but I think that was because of an unrelated recently introduced
bug that's been fixed in the repo.

From kim.barrett at oracle.com Thu Jul 5 20:12:20 2018
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 5 Jul 2018 16:12:20 -0400
Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection
In-Reply-To:
References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> <9e6d0156-ecbb-45e3-a345-fc7a0f5a14c3@redhat.com> <90E9F360-602F-4E02-9800-C5C3231D1827@oracle.com> <75070643-0492-c620-eb11-3225506ef1b7@redhat.com>

Message-ID:

> On Jul 5, 2018, at 3:57 PM, Zhengyu Gu wrote:
> On 07/05/2018 03:33 PM, Kim Barrett wrote:
>>> On Jul 5, 2018, at 3:16 PM, Zhengyu Gu wrote:
>>> [...]
>>> http://cr.openjdk.java.net/~zgu/8205921/webrev.02/index.html
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>> I think that makes the change accurately reflect the paper.
>> Just one minor nit: extraneous whitespace in "0 )":
>> 258 } else if (sz1 > 0 ) {
> I will fix it before push.
>
> Thanks a lot!
>
> -Zhengyu

In case there's any confusion, that wasn't a "Looks good. Reviewed." There's still the
padding and where to put the last steal queue discussions to be resolved.

From kim.barrett at oracle.com Thu Jul 5 20:53:47 2018
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 5 Jul 2018 16:53:47 -0400
Subject: RFR (S): 8205426: Humongous continues remembered set does not match humongous start region one in Kitchensink
In-Reply-To: <15c89934f5f69de519bca42c9d7b049e621ccbae.camel@oracle.com>
References: <69519db2cd7fe431357a5a05b89bba17cdd0eaaa.camel@oracle.com> <54BF88C5-5835-49B7-8E1F-E21A4E429D15@oracle.com> <15c89934f5f69de519bca42c9d7b049e621ccbae.camel@oracle.com>
Message-ID: <68B2A585-08B4-4741-93C6-B68D3CC801CA@oracle.com>

> On Jul 5, 2018, at 3:16 AM, Thomas Schatzl wrote:
> There is a new webrev at
>
> http://cr.openjdk.java.net/~tschatzl/8205426/webrev.1 (full)
> http://cr.openjdk.java.net/~tschatzl/8205426/webrev.0_to_1 (diff, but
> almost useless due to many changes)
>
> That at least separates the concerns about humongous/regular region a
> bit.
>
> Thanks,
> Thomas

I like this much better. It eliminates the implicit logical coupling
that the before rebuild task "knows" the liveness of the starts region
is good enough, without introducing physical coupling from remset to
concurrentmark.

------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1RemSetTrackingPolicy.cpp
116 if (!r->is_old() && r->is_archive()) {

I think that should be || rather than &&.
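A tiny truth-table sketch shows how the two conditions differ. ToyRegion and both helper functions are hypothetical stand-ins, not HeapRegion; the sketch assumes the intent is that only old, non-archive regions should get past this early-out.

```cpp
// Hypothetical region with just the two properties under discussion.
struct ToyRegion {
  bool old;
  bool archive;
  bool is_old() const { return old; }
  bool is_archive() const { return archive; }
};

// Condition as written in the webrev: fires only for a region that is
// both "not old" and "archive".
bool early_out_with_and(const ToyRegion& r) {
  return !r.is_old() && r.is_archive();
}

// Suggested condition: fires for anything that is not old, and also for
// archive regions, so only old non-archive regions continue.
bool early_out_with_or(const ToyRegion& r) {
  return !r.is_old() || r.is_archive();
}
```

With `&&`, a region that is neither old nor archive and a region that is both old and archive would continue past the check; with `||` both are filtered out.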
------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1RemSetTrackingPolicy.cpp 111 bool G1RemSetTrackingPolicy::update_before_rebuild(HeapRegion* r, size_t live_bytes) { Consider adding "assert(!r->is_humongous(), ...)". The !r->is_old() will filter them out, but we shouldn't be here at all and should have instead called the associated update_humongous function. ------------------------------------------------------------------------------ From zgu at redhat.com Thu Jul 5 23:03:29 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 5 Jul 2018 19:03:29 -0400 Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection In-Reply-To: References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> <9e6d0156-ecbb-45e3-a345-fc7a0f5a14c3@redhat.com> <90E9F360-602F-4E02-9800-C5C3231D1827@oracle.com> <75070643-0492-c620-eb11-3225506ef1b7@redhat.com>

Message-ID: <7ba92840-d12f-0f4c-2b02-21ca690c3ca4@redhat.com> On 07/05/2018 04:12 PM, Kim Barrett wrote: >> On Jul 5, 2018, at 3:57 PM, Zhengyu Gu wrote: >> On 07/05/2018 03:33 PM, Kim Barrett wrote: >>>> On Jul 5, 2018, at 3:16 PM, Zhengyu Gu wrote: >>>> [?] >>>> http://cr.openjdk.java.net/~zgu/8205921/webrev.02/index.html >>>> >>>> Thanks, >>>> >>>> -Zhengyu >>> I think that makes the change accurately reflect the paper. >>> Just one minor nit: extraneous whitespace in ?0 )?: >>> 258 } else if (sz1 > 0 ) { >> I will fix it before push. >> >> Thanks a lot! >> >> -Zhengyu > > In case there?s any confusion, that wasn?t a ?Looks good. Reviewed.? There?s still the > padding and where to put the last steal queue discussions to be resolved. > Got it. I am fine with placing last steal queue inside stealing thread's queue. However, I think padding fields is beyond this RFE, we should file new one to address this issue. Thanks, -Zhengyu From zgu at redhat.com Thu Jul 5 23:22:59 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 5 Jul 2018 19:22:59 -0400 Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection In-Reply-To: <344187d3db4f3a069a70a730cc1c3b6555243f9d.camel@oracle.com> References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> <9e6d0156-ecbb-45e3-a345-fc7a0f5a14c3@redhat.com> <90E9F360-602F-4E02-9800-C5C3231D1827@oracle.com> <344187d3db4f3a069a70a730cc1c3b6555243f9d.camel@oracle.com> Message-ID: <61def7df-d441-4a65-ea04-18e282b94db9@redhat.com> Hi Thomas, > > I have been looking into this a bit and finally (with some patch from > me that fixes the changes too) and some additional probes (using the > TASKQUEUE_STATS "infrastructure") I am starting to get meaningful > results. > > More about that later. 
> > In any case the technique looks like a nice improvement at least in > steal attempts and steal/steal attempts ratio on some bigger tests, but > I need to update my code again it seems :) > > I can add the changes to the TASKQUEUE_STATS logging later btw. Great! Looking forward to seeing the results. > >>> >>> [?] >>> Updated webrev: >>> >>> http://cr.openjdk.java.net/~zgu/8205921/webrev.01/index.html >> >> src/hotspot/share/gc/shared/taskqueue.inline.hpp >> 255 if (sz2 > sz1) { >> 256 sel_k = k2; >> 257 suc = _queues[k2]->pop_global(t); >> 258 } else { >> 259 sel_k = k1; >> 260 suc = _queues[k1]->pop_global(t); >> 261 } >> >> The paper avoids the steal attempt when both potential victims have a >> size of zero, e.g. insert another clause: >> >> } else if (sz1 == 0) { >> sel_k = k1; // Might be needed to avoid uninitialized variable >> warnings? >> suc = false; >> } else { >> ... >> >> There is a race condition between obtaining the size and checking it >> here, but I don't think that's important. The point is to avoid an >> expensive steal attempt when it is very likely to fail. >> > > There is another bug in the existing code: current Hotspot collectors > all reuse a single task queue set. So since the queue id's are only > initialized once at startup, there will be some initial use of a > suboptimal queue. Technically, it is a bug. I doubt it will have material impact, cause the old value probably just as good as next random one. > > I assume Shenandoah does not need a reset because it creates new > taskqueuesets whenever it needs them (and frees them afterwards). > Shenandoah does reuse queues, we added clear() method inside our queue set implementation to clean up queue, overflow queue and buffer, etc. > The current design of passing stealing-local information (the seed) > makes it clear that the owner of that variable needs to initialize it. > > At this time I have no preference on Kim's suggestion to put these > variables into the queue if you asked me. 
I would tend to encapsulate
> the mechanism as much as possible though.
>
> I do think if Shenandoah wants to put this information into the
> GCThreadLocalBlock (for what reason?) it is probably most flexible to
> pass these things as kind of context to the steal_best_of_2() method. I
> do not think it is desirable to have a second copy of the taskqueue*
> code around; I can't see how else one implementation can use the
> TaskQueueSet local queue ids and the other use the same information
> from somewhere else right now.

Similar to what ZGC does, so we can avoid passing worker id and queue,
etc. all over the place. We don't want to use gnu style thread-local, so
GCThreadLocalBlock is the temporary place until compiler upgrade (?)

As I mentioned in an earlier email, I would prefer to pass
TaskQueueStealLocals/Context, but I am afraid of venturing into other GCs
that I am not familiar with.

Thomas, it seems you have made other changes/improvements, do you want to
take over this RFE? I am fine with either way.

Thanks,

-Zhengyu

>
> Thanks,
> Thomas
>

From ioi.lam at oracle.com Fri Jul 6 00:45:29 2018
From: ioi.lam at oracle.com (Ioi Lam)
Date: Thu, 5 Jul 2018 17:45:29 -0700
Subject: RFR(L): 8202035: Archive the set of ModuleDescriptor and ModuleReference objects for system modules
In-Reply-To: <386DA770-8E9D-43A0-87CE-0E380977F884@oracle.com>
References: <386DA770-8E9D-43A0-87CE-0E380977F884@oracle.com>
Message-ID: <64c6255c-7e5f-7688-6848-018504f479be@oracle.com>

Hi Jiangli,

Thank you so much for working on this. I think it's great that we can get the
start-up improvement by archiving the ModuleDescriptor.

I just have some coding style comments regarding heapShared.cpp. This file
contains the code for copying objects and relocating pointers. By its nature,
this kind of code is usually complicated, so I think we should try to make
it as easy to understand as possible.

[1] HeapShared::walk_from_field_and_archiving:

    This name is not grammatically correct.
    How about
    HeapShared::archive_reachable_objects_from_static_field

[2] How about changing the parameter field_offset -> static_field_offset
    When I first read the code I was confused whether it's talking
    about static or instance fields. Usually, "field"
    implies instance field, so it's better to specifically
    say "static field".

[3] This code would fail if "f" is already archived.

    473   // get the archived copy of the field referenced object
    474   oop af = MetaspaceShared::archive_heap_object(f, THREAD);
    475   WalkOopAndArchiveClosure walker(1, subgraph_info, f, af);
    476   f->oop_iterate(&walker);

[4] There's duplicated code between walk_from_field_and_archiving and
    WalkOopAndArchiveClosure::do_oop_work

    403   assert(relocated_k == MetaspaceShared::get_relocated_klass(orig_k),
    404          "must be the relocated Klass in the shared space");
    405   _subgraph_info->add_subgraph_object_klass(orig_k, relocated_k);

    - vs -

    484   assert(relocated_k == MetaspaceShared::get_relocated_klass(orig_k),
    485          "must be the relocated Klass in the shared space");
    486   subgraph_info->add_subgraph_object_klass(orig_k, relocated_k);

[5] This code is also duplicated:

    375   RawAccess::oop_store(new_p, archived);
    376   log.print("--- archived copy existing, store archived " PTR_FORMAT " in " PTR_FORMAT,
    377             p2i(archived), p2i(new_p));

    - vs -

    395  RawAccess::oop_store(new_p, archived);
    396  log.print("=== store archived " PTR_FORMAT " in " PTR_FORMAT,
    397            p2i(archived), p2i(new_p));

[6] This code, even though it's correct, is hard to understand --
    why are we calculating the distance between the two objects?

    368  size_t delta = pointer_delta((HeapWord*)_archived_referencing_obj,
    369                               (HeapWord*)_orig_referencing_obj);
    370  T* new_p = (T*)((HeapWord*)p + delta);

    I think it would be easier to understand if we change the order of the
    two arithmetic operations:

    // new_p is the address of the same field inside _archived_referencing_obj.
    size_t field_offset_in_bytes = pointer_delta(p, _orig_referencing_obj, 1);
    T* new_p = (T*)(address(_archived_referencing_obj) + field_offset_in_bytes);

[7] I have a hard time understanding this log:

    376   log.print("--- archived copy existing, store archived " PTR_FORMAT " in " PTR_FORMAT,
    377             p2i(archived), p2i(new_p));

    How about this?

    log.print("--- updated embedded pointer @[" PTR_FORMAT "] => " PTR_FORMAT,
              p2i(new_p), p2i(archived));

For your consideration, I've incorporated my comments above into heapShared.cpp.
I've not tested it so it most likely won't build :-(

http://cr.openjdk.java.net/~iklam/misc/heapShared.old.cpp [your version]
http://cr.openjdk.java.net/~iklam/misc/heapShared.new.cpp [my version]

Please take a look and see if you like it.

Thanks
- Ioi

On 6/28/18 4:15 PM, Jiangli Zhou wrote:
> This is a follow-up RFE of JDK-8201650 (Move iteration order randomization of unmodifiable Set and Map to iterators), which was resolved to allow Set/Map objects being archived at CDS dump time (thanks Claes and Stuart Marks). In the current RFE, it archives the set of system ModuleReference and ModuleDescriptor objects (including their referenced objects) in 'open' archive heap region at CDS dump time. It allows reusing of the objects and bypassing the process of creating the system ModuleDescriptors and ModuleReferences at runtime for startup improvement. My preliminary measurements on linux-x64 showed ~5% startup improvement when running HelloWorld from -cp using archived module objects at runtime (without extra tuning).
>
> The library changes in the following webrev are contributed by Alan Bateman. Thanks Alan and Mandy for discussions and help. Thanks Karen, Lois and Ioi for discussion and suggestions on initialization ordering.
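As an aside on item [6] above, the "field offset" formulation can be tried out in isolation. The sketch below uses a made-up Node type and a made-up helper, relocated_field_addr, rather than HotSpot's oop, address and pointer_delta. It computes the byte offset of a field inside the original object and applies that offset to the base of the copy, which is what the comment on new_p describes.

```cpp
#include <cstddef>

// Hypothetical object type standing in for a Java heap object.
struct Node {
  int id;
  Node* next;
};

// Given p, the address of a field inside orig_obj, returns the address
// of the same field inside archived_obj (a byte-for-byte copy).
template <typename T>
T* relocated_field_addr(T* p, const void* orig_obj, void* archived_obj) {
  // Byte offset of the field within the original object.
  const std::ptrdiff_t field_offset_in_bytes =
      reinterpret_cast<const char*>(p) - static_cast<const char*>(orig_obj);
  // Same offset, applied to the base of the archived copy.
  return reinterpret_cast<T*>(static_cast<char*>(archived_obj) +
                              field_offset_in_bytes);
}
```

Adding the offset to the original object's base would simply give back p, so the base used in the second step has to be the archived copy.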
> > The majority of the module object archiving code are in heapShared.hpp and heapShared.cpp. Thanks Coleen for pre-review and Eric Caspole for helping performance tests. > > webrev: http://cr.openjdk.java.net/~jiangli/8202035/webrev.00/ > RFE: https://bugs.openjdk.java.net/browse/JDK-8202035?filter=14921 > > Tested using tier1 - tier6 via mach5 including all new test cases added in the webrev. > > Following are the details of system module archiving, which are duplicated in above bug report. > --------------------------------------------------------------------------------------------------------------------------- > Support archiving system module graph when the initial module is unnamed module from -cp currently. > > Support G1 GC, 64-bit (non-Windows). Requires UseCompressedOops and UseCompressedClassPointers. > > Dump time system module object archiving > ================================= > At dump time, the following fields in ArchivedModuleGraph are set to record the system module information created by ModuleBootstrap for archiving. > > private static SystemModules archivedSystemModules; > private static ModuleFinder archivedSystemModuleFinder; > private static String archivedMainModule; > > The archiving process starts from a given static field in ArchivedModuleGraph class instance (java mirror object). The process archives the complete network of java heap objects that are reachable directly or indirectly from the starting object by following references. > > 1. Starts from a given static field within the Class instance (java mirror). If the static field is a refererence field and points to a non-null java object, proceed to the next step. The static field and it's value is recorded and stored outside the archived mirror. > 2. Archives the referenced java object. If an archived copy of the current object already exists, updates the pointer in the archived copy of the referencing object to point to the current archived object. 
Otherwise, proceed to the next step. > 3. Follows all references within the current java object and recursively archive the sub-graph of objects starting from each reference encountered within the object. > 4. Updates the pointer in the archived copy of referecing object to point to the current archived object. > 5. The Klass of the current java object is added to a list of Klasses for loading and initializing before any object in the archived graph can be accessed at runtime. > > Runtime initialization from archived system module objects > ============================================ > VM.initializeFromArchive() is called from ArchivedModuleGraph's static initializer to initialize from the archived module information. Klasses in the recorded list are loaded, linked and initialized. The static fields in ArchivedModuleGraph class instance are initialized using the archived field values. After initialization, the archived system module objects can be used directly. > > If the archived java heap data is not successfully mapped at runtime, or there is an error during VM.initializeFromArchive(), then all static fields in ArchivedModuleGraph are not initialized. In that case, system ModuleDescriptor and ModuleReference objects are created as normal. > > In non-CDS mode, VM.initializeFromArchive() returns immediately with minimum added overhead for normal execution. > > Thanks, > Jiangli > > From jiangli.zhou at Oracle.COM Fri Jul 6 02:38:38 2018 From: jiangli.zhou at Oracle.COM (Jiangli Zhou) Date: Thu, 5 Jul 2018 19:38:38 -0700 Subject: RFR(L): 8202035: Archive the set of ModuleDescriptor and ModuleReference objects for system modules In-Reply-To: <64c6255c-7e5f-7688-6848-018504f479be@oracle.com> References: <386DA770-8E9D-43A0-87CE-0E380977F884@oracle.com> <64c6255c-7e5f-7688-6848-018504f479be@oracle.com> Message-ID: Hi Ioi, Thanks for the review! > On Jul 5, 2018, at 5:45 PM, Ioi Lam wrote: > > Hi Jiangli, > > Thank you so much for working on this. 
I think it's great that we can get the
> start-up improvement by archiving the ModuleDescriptor.
>
> I just have some coding style comments regarding heapShared.cpp. This file
> contains the code for copying objects and relocating pointers. By its nature,
> this kind of code is usually complicated, so I think we should try to make
> it as easy to understand as possible.
>
>
> [1] HeapShared::walk_from_field_and_archiving:
>
>     This name is not grammatically correct. How about
>     HeapShared::archive_reachable_objects_from_static_field

Sounds good.

>
> [2] How about changing the parameter field_offset -> static_field_offset
>     When I first read the code I was confused whether it's talking
>     about static or instance fields. Usually, "field"
>     implies instance field, so it's better to specifically
>     say "static field".

Ok.

>
> [3] This code would fail if "f" is already archived.
>
>     473   // get the archived copy of the field referenced object
>     474   oop af = MetaspaceShared::archive_heap_object(f, THREAD);
>     475   WalkOopAndArchiveClosure walker(1, subgraph_info, f, af);
>     476   f->oop_iterate(&walker);

Hmmm, it's possible we might encounter an archived object during reference walking & archiving in future cases. I'll add a check.

>
> [4] There's duplicated code between walk_from_field_and_archiving and
>     WalkOopAndArchiveClosure::do_oop_work
>
>     403   assert(relocated_k == MetaspaceShared::get_relocated_klass(orig_k),
>     404          "must be the relocated Klass in the shared space");
>     405   _subgraph_info->add_subgraph_object_klass(orig_k, relocated_k);
>
>     - vs -
>
>     484   assert(relocated_k == MetaspaceShared::get_relocated_klass(orig_k),
>     485          "must be the relocated Klass in the shared space");
>     486   subgraph_info->add_subgraph_object_klass(orig_k, relocated_k);

I'll move the assert into add_subgraph_object_klass().
>
> [5] This code is also duplicated:
>
>     375   RawAccess::oop_store(new_p, archived);
>     376   log.print("--- archived copy existing, store archived " PTR_FORMAT " in " PTR_FORMAT,
>     377             p2i(archived), p2i(new_p));
>
>     - vs -
>
>     395   RawAccess::oop_store(new_p, archived);
>     396   log.print("=== store archived " PTR_FORMAT " in " PTR_FORMAT,
>     397             p2i(archived), p2i(new_p));

The first case is for an existing archived copy and the second is for a newly archived one. The different logging messages are helpful for debugging. Not sure if using a function to encapsulate the store & log is worth it in this case. Any suggestion?

>
> [6] This code, even though it's correct, is hard to understand --
>     why are we calculating the distance between the two objects?
>
>     368   size_t delta = pointer_delta((HeapWord*)_archived_referencing_obj,
>     369                                (HeapWord*)_orig_referencing_obj);
>     370   T* new_p = (T*)((HeapWord*)p + delta);
>
>     I think it would be easier to understand if we change the order of the
>     two arithmetic operations:
>
>     // new_p is the address of the same field inside _archived_referencing_obj.
>     size_t field_offset_in_bytes = pointer_delta(p, _orig_referencing_obj, 1);
>     T* new_p = (T*)(address(_archived_referencing_obj) + field_offset_in_bytes);

I think this works too. I'll change as you suggested.

>
> [7] I have a hard time understanding this log:
>
>     376   log.print("--- archived copy existing, store archived " PTR_FORMAT " in " PTR_FORMAT,
>     377             p2i(archived), p2i(new_p));
>
>     How about this?
>
>     log.print("--- updated embedded pointer @[" PTR_FORMAT "] => " PTR_FORMAT,
>               p2i(new_p), p2i(archived));

It is for the case where there is an existing copy of the archived object. Maybe "found existing archived copy" would help?

>
>
> For your consideration, I've incorporated my comments above into heapShared.cpp.
> I've not tested it so it most likely won't build :-(
>
>
> http://cr.openjdk.java.net/~iklam/misc/heapShared.old.cpp [your version]
> http://cr.openjdk.java.net/~iklam/misc/heapShared.new.cpp [my version]
>
> Please take a look and see if you like it.

Thanks a lot! I'll take a look and incorporate your suggestions.

Thanks again!

Jiangli

>
> Thanks
> - Ioi
>
> On 6/28/18 4:15 PM, Jiangli Zhou wrote:
>> This is a follow-up RFE of JDK-8201650 (Move iteration order randomization of unmodifiable Set and Map to iterators), which was resolved to allow Set/Map objects being archived at CDS dump time (thanks Claes and Stuart Marks). In the current RFE, it archives the set of system ModuleReference and ModuleDescriptor objects (including their referenced objects) in 'open' archive heap region at CDS dump time. It allows reusing of the objects and bypassing the process of creating the system ModuleDescriptors and ModuleReferences at runtime for startup improvement. My preliminary measurements on linux-x64 showed ~5% startup improvement when running HelloWorld from -cp using archived module objects at runtime (without extra tuning).
>>
>> The library changes in the following webrev are contributed by Alan Bateman. Thanks Alan and Mandy for discussions and help. Thanks Karen, Lois and Ioi for discussion and suggestions on initialization ordering.
>>
>> The majority of the module object archiving code are in heapShared.hpp and heapShared.cpp. Thanks Coleen for pre-review and Eric Caspole for helping performance tests.
>>
>> webrev: http://cr.openjdk.java.net/~jiangli/8202035/webrev.00/
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8202035?filter=14921
>>
>> Tested using tier1 - tier6 via mach5 including all new test cases added in the webrev.
>>
>> Following are the details of system module archiving, which are duplicated in above bug report.
>> --------------------------------------------------------------------------------------------------------------------------- >> Support archiving system module graph when the initial module is unnamed module from -cp currently. >> >> Support G1 GC, 64-bit (non-Windows). Requires UseCompressedOops and UseCompressedClassPointers. >> >> Dump time system module object archiving >> ================================= >> At dump time, the following fields in ArchivedModuleGraph are set to record the system module information created by ModuleBootstrap for archiving. >> >> private static SystemModules archivedSystemModules; >> private static ModuleFinder archivedSystemModuleFinder; >> private static String archivedMainModule; >> >> The archiving process starts from a given static field in ArchivedModuleGraph class instance (java mirror object). The process archives the complete network of java heap objects that are reachable directly or indirectly from the starting object by following references. >> >> 1. Starts from a given static field within the Class instance (java mirror). If the static field is a refererence field and points to a non-null java object, proceed to the next step. The static field and it's value is recorded and stored outside the archived mirror. >> 2. Archives the referenced java object. If an archived copy of the current object already exists, updates the pointer in the archived copy of the referencing object to point to the current archived object. Otherwise, proceed to the next step. >> 3. Follows all references within the current java object and recursively archive the sub-graph of objects starting from each reference encountered within the object. >> 4. Updates the pointer in the archived copy of referecing object to point to the current archived object. >> 5. The Klass of the current java object is added to a list of Klasses for loading and initializing before any object in the archived graph can be accessed at runtime. 
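The five archiving steps quoted above can be sketched as a small recursive graph copy. Obj, Archiver and the integer klass_id are hypothetical stand-ins for Java heap objects, the CDS archiving code and Klass pointers; none of this is the heapShared.cpp implementation. Registering the copy before recursing, so that cyclic references terminate, is a choice made for this sketch.

```cpp
#include <cstddef>
#include <memory>
#include <set>
#include <unordered_map>
#include <vector>

// Hypothetical heap object: a class id plus outgoing references.
struct Obj {
  int klass_id;
  std::vector<Obj*> refs;
};

struct Archiver {
  std::unordered_map<const Obj*, Obj*> archived;  // original -> archived copy
  std::vector<std::unique_ptr<Obj>> storage;      // owns the archived copies
  std::set<int> klasses;                          // classes to initialize at runtime

  // Archives o and everything reachable from it; returns the archived copy.
  Obj* archive(const Obj* o) {
    if (o == nullptr) return nullptr;              // step 1: only non-null references
    auto it = archived.find(o);
    if (it != archived.end()) return it->second;   // step 2: reuse the existing copy

    storage.push_back(std::unique_ptr<Obj>(new Obj(*o)));
    Obj* copy = storage.back().get();
    archived[o] = copy;                            // registered first so cycles terminate

    for (std::size_t i = 0; i < o->refs.size(); ++i) {
      // steps 3 and 4: archive the sub-graph, then point the copy at it
      copy->refs[i] = archive(o->refs[i]);
    }
    klasses.insert(o->klass_id);                   // step 5: record the class
    return copy;
  }
};
```

Starting the walk from a static field then amounts to calling archive() on the object that field refers to and recording the returned copy outside the mirror.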
>> >> Runtime initialization from archived system module objects >> ============================================ >> VM.initializeFromArchive() is called from ArchivedModuleGraph's static initializer to initialize from the archived module information. Klasses in the recorded list are loaded, linked and initialized. The static fields in ArchivedModuleGraph class instance are initialized using the archived field values. After initialization, the archived system module objects can be used directly. >> >> If the archived java heap data is not successfully mapped at runtime, or there is an error during VM.initializeFromArchive(), then all static fields in ArchivedModuleGraph are not initialized. In that case, system ModuleDescriptor and ModuleReference objects are created as normal. >> >> In non-CDS mode, VM.initializeFromArchive() returns immediately with minimum added overhead for normal execution. >> >> Thanks, >> Jiangli >> >> > From erik.helin at oracle.com Fri Jul 6 12:09:59 2018 From: erik.helin at oracle.com (Erik Helin) Date: Fri, 6 Jul 2018 14:09:59 +0200 Subject: RFR (S): 8205426: Humongous continues remembered set does not match humongous start region one in Kitchensink In-Reply-To: <15c89934f5f69de519bca42c9d7b049e621ccbae.camel@oracle.com> References: <69519db2cd7fe431357a5a05b89bba17cdd0eaaa.camel@oracle.com> <54BF88C5-5835-49B7-8E1F-E21A4E429D15@oracle.com> <15c89934f5f69de519bca42c9d7b049e621ccbae.camel@oracle.com> Message-ID: <5345e792-4a60-ea6d-e0b0-79aacae0e484@oracle.com> On 07/05/2018 09:16 AM, Thomas Schatzl wrote: > http://cr.openjdk.java.net/~tschatzl/8205426/webrev.1 (full) > http://cr.openjdk.java.net/~tschatzl/8205426/webrev.0_to_1 (diff, but > almost useless due to many changes) > > That at least separates the concerns about humongous/regular region a > bit. This version looks much better to me as well, thanks for refactoring the patch! 
I agree with Kim's comments and in addition I would also have "inlined"
is_interesting_humongous_region into update_humongous_before_rebuild, so
the `if` in update_humongous_before_rebuild becomes:

   bool is_type_array = oop(r->humongous_start_region()->bottom())->is_typeArray();
   if (is_live && is_type_array && !r->rem_set()->is_tracked()) {
     r->rem_set()->set_state_updating();
     selected_for_rebuild = true;
   }

This change makes the comment easier to follow (at least for me).

The patch also uses so-called "east-side-const" in g1ConcurrentMark.cpp, but that doesn't matter too much since g1ConcurrentMark.cpp seems to use both "west-side-const" and "east-side-const" in equal proportions (that however should probably be cleaned up).

Thanks,
Erik

From thomas.schatzl at oracle.com Fri Jul 6 13:10:16 2018
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 06 Jul 2018 15:10:16 +0200
Subject: RFR (S): 8205426: Humongous continues remembered set does not match humongous start region one in Kitchensink
In-Reply-To: <68B2A585-08B4-4741-93C6-B68D3CC801CA@oracle.com>
References: <69519db2cd7fe431357a5a05b89bba17cdd0eaaa.camel@oracle.com>
 <54BF88C5-5835-49B7-8E1F-E21A4E429D15@oracle.com>
 <15c89934f5f69de519bca42c9d7b049e621ccbae.camel@oracle.com>
 <68B2A585-08B4-4741-93C6-B68D3CC801CA@oracle.com>
Message-ID: <8f49bc7b4dbdf3255f4c2dc83354101c1b64ae2f.camel@oracle.com>

Hi,

On Thu, 2018-07-05 at 16:53 -0400, Kim Barrett wrote:
> > On Jul 5, 2018, at 3:16 AM, Thomas Schatzl wrote:
> > There is a new webrev at
> >
> > http://cr.openjdk.java.net/~tschatzl/8205426/webrev.1 (full)
> > http://cr.openjdk.java.net/~tschatzl/8205426/webrev.0_to_1 (diff,
> > but almost useless due to many changes)
> >
> > That at least separates the concerns about humongous/regular region
> > a bit.
> >
> > Thanks,
> > Thomas
>
> I like this much better.
It eliminates the implicit logical coupling > that the before rebuild task "knows" the liveness of the starts > region > is good enough, without introducing physical coupling from remset to > concurrentmark. > > ------------------------------------------------------------------- > ----------- > src/hotspot/share/gc/g1/g1RemSetTrackingPolicy.cpp > 116 if (!r->is_old() && r->is_archive()) { > > I think that should be || rather than &&. > > ------------------------------------------------------------------- > ----------- > src/hotspot/share/gc/g1/g1RemSetTrackingPolicy.cpp > 111 bool G1RemSetTrackingPolicy::update_before_rebuild(HeapRegion* > r, size_t live_bytes) { > > Consider adding "assert(!r->is_humongous(), ...)". The !r->is_old() > will filter them out, but we shouldn't be here at all and should have > instead called the associated update_humongous function. > > ------------------------------------------------------------------- > ----------- > fixed all that and Erik's suggestion. New webrev: http://cr.openjdk.java.net/~tschatzl/8205426/webrev.2 (full) http://cr.openjdk.java.net/~tschatzl/8205426/webrev.1_to_2 (diff) It passed hs-tier1-4,jdk-tier1-3 Thanks, Thomas From erik.helin at oracle.com Fri Jul 6 13:29:21 2018 From: erik.helin at oracle.com (Erik Helin) Date: Fri, 6 Jul 2018 15:29:21 +0200 Subject: RFR (S): 8205426: Humongous continues remembered set does not match humongous start region one in Kitchensink In-Reply-To: <8f49bc7b4dbdf3255f4c2dc83354101c1b64ae2f.camel@oracle.com> References: <69519db2cd7fe431357a5a05b89bba17cdd0eaaa.camel@oracle.com> <54BF88C5-5835-49B7-8E1F-E21A4E429D15@oracle.com> <15c89934f5f69de519bca42c9d7b049e621ccbae.camel@oracle.com> <68B2A585-08B4-4741-93C6-B68D3CC801CA@oracle.com> <8f49bc7b4dbdf3255f4c2dc83354101c1b64ae2f.camel@oracle.com> Message-ID: <1ad12e1f-1bc4-5ab5-1d51-838ac0bd980e@oracle.com> On 07/06/2018 03:10 PM, Thomas Schatzl wrote: > New webrev: > http://cr.openjdk.java.net/~tschatzl/8205426/webrev.2 (full) > 
http://cr.openjdk.java.net/~tschatzl/8205426/webrev.1_to_2 (diff) Looks good, Reviewed! Thanks, Erik > It passed hs-tier1-4,jdk-tier1-3 > > Thanks, > Thomas > From thomas.schatzl at oracle.com Fri Jul 6 13:39:23 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 06 Jul 2018 15:39:23 +0200 Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection In-Reply-To: <61def7df-d441-4a65-ea04-18e282b94db9@redhat.com> References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> <9e6d0156-ecbb-45e3-a345-fc7a0f5a14c3@redhat.com> <90E9F360-602F-4E02-9800-C5C3231D1827@oracle.com> <344187d3db4f3a069a70a730cc1c3b6555243f9d.camel@oracle.com> <61def7df-d441-4a65-ea04-18e282b94db9@redhat.com> Message-ID: <63818f4b77b2aee712e6001fd798adc97ba246bf.camel@oracle.com> Hi Zhengyu, On Thu, 2018-07-05 at 19:22 -0400, Zhengyu Gu wrote: > Hi Thomas, > > > [..] > > There is another bug in the existing code: current Hotspot > > collectors > > all reuse a single task queue set. So since the queue id's are only > > initialized once at startup, there will be some initial use of a > > suboptimal queue. > > Technically, it is a bug. I doubt it will have material impact, > cause the old value probably just as good as next random one. Agree. [...] > As I mentioned in early email, I would prefer to pass > TaskQueueStealLocals/Context, but I am afraid of venturing into > other GCs that I am not familiar with. > > Thomas, seems you have made other changes/improvements, do you want > to take over this RFE? I am fine with either ways. I assigned the issue to myselves. 
:) Thanks, Thomas From zgu at redhat.com Fri Jul 6 13:44:46 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 6 Jul 2018 09:44:46 -0400 Subject: RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection In-Reply-To: <63818f4b77b2aee712e6001fd798adc97ba246bf.camel@oracle.com> References: <904c2ea5-0935-2c4d-fcbd-6b90238b4dc4@redhat.com> <1A7B4B34-68A1-49D6-AA2D-39FD8A7502CB@oracle.com> <43e0e7278da5684daf450b9847c67362ec361b08.camel@oracle.com> <9e6d0156-ecbb-45e3-a345-fc7a0f5a14c3@redhat.com> <90E9F360-602F-4E02-9800-C5C3231D1827@oracle.com> <344187d3db4f3a069a70a730cc1c3b6555243f9d.camel@oracle.com> <61def7df-d441-4a65-ea04-18e282b94db9@redhat.com> <63818f4b77b2aee712e6001fd798adc97ba246bf.camel@oracle.com> Message-ID: >> Thomas, seems you have made other changes/improvements, do you want >> to take over this RFE? I am fine with either ways. > > I assigned the issue to myselves. :) Thank you! -Zhengyu > > Thanks, > Thomas > From thomas.schatzl at oracle.com Fri Jul 6 14:11:47 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 06 Jul 2018 16:11:47 +0200 Subject: RFR (S): 8206453: Taskqueue stats should count real steal attempts, not calls to GenericTaskQueueSet::steal Message-ID: <1992fab92d03fbbe64218aab7562a59e41d2b553.camel@oracle.com> Hi all, can I have reviews for some small change to how successful steals and steal attempts in taskqueue statistics are gathered? In particular, as the subject of the CR suggests, the "steal attempts" counter the number of calls to GenericTaskQueueSet::steal() which internally actually attempts stealing quite often. This makes a useful comparison of steal attempts to successful steals impossible and misleading. The calls to GenericTaskQueueSet::steal are mostly reflected to the number of termination attempts that is already counted elsewhere (ie. steal attempts - successful steals), so it does not really give new information. 
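To illustrate why the two ways of counting diverge, here is a minimal sketch, assuming a toy queue set in which one call to steal() probes other queues up to 2 * n times. The retry bound, `ToyQueueSet`, `StealStats`, and the victim selection are all assumptions made for this example and are not the GenericTaskQueueSet code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical per-thread statistics (names are illustrative only).
struct StealStats {
  std::size_t steal_calls = 0;     // old meaning of "attempts": calls to steal()
  std::size_t steal_attempts = 0;  // proposed meaning: each individual probe of a victim queue
  std::size_t steals = 0;          // successful steals
};

class ToyQueueSet {
 public:
  explicit ToyQueueSet(std::size_t n) : _queues(n) { assert(n > 0); }

  void push(std::size_t q, int task) { _queues[q].push_back(task); }

  // One call may probe several victims; only the probes are real steal attempts.
  bool steal(std::size_t thief, int& task, StealStats& stats) {
    ++stats.steal_calls;
    const std::size_t n = _queues.size();
    for (std::size_t i = 0; i < 2 * n; ++i) {    // assumed retry bound
      const std::size_t victim = (thief + 1 + i) % n;
      if (victim == thief) continue;             // never steal from ourselves
      ++stats.steal_attempts;
      if (!_queues[victim].empty()) {
        task = _queues[victim].front();
        _queues[victim].pop_front();
        ++stats.steals;
        return true;
      }
    }
    return false;                                // caller would go on to offer termination
  }

 private:
  std::vector<std::deque<int>> _queues;
};
```

With four queues and a single task available, two calls to steal() in this sketch produce nine probes and one success, so the ratio of successes to calls and the ratio of successes to probes tell very different stories.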
CR:
https://bugs.openjdk.java.net/browse/JDK-8206453
Webrev:
http://cr.openjdk.java.net/~tschatzl/8206453/webrev/
Testing:
local compilation and use with and without TASKQUEUE_STATS.

Thanks,
  Thomas

From rkennke at redhat.com Fri Jul 6 14:46:59 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 6 Jul 2018 16:46:59 +0200
Subject: RFR: JDK-8206457: Code paths from oop_iterate() must use barrier-free access
Message-ID: <28394f57-3590-3d74-b660-dcfa6b4648a2@redhat.com>

We have several code paths going out from oop_iterate() methods that lead to GC barriers. This is not only inefficient but outright wrong. oop_iterate() is normally used by the GC, and the GC needs to see the raw stuff, not some resolved objects. In Shenandoah's full-GC it's fatal to attempt to read objects' forwarding pointers, because they are temporarily pointing to nowhere land.

I propose to selectively use the _raw() variants of the various accessors that are used on oop_iterate() paths. This means introducing an oopDesc::int_field_raw().

I also propose to change the metadata_field() accessors to always use raw access wholesale. This is only used to load the Klass* field, which is immutable and thus doesn't require barriers.

The log_* statements in instanceRefKlass.inline.hpp surely don't need barriers. I turned them into raw accessors as well.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8206457?filter=-1
Webrev:
http://cr.openjdk.java.net/~rkennke/JDK-8206457/webrev.00/

Test: passes hotspot-tier1 here.

Can I please get a review?

Roman

From rkennke at redhat.com Fri Jul 6 15:28:11 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 6 Jul 2018 17:28:11 +0200
Subject: RFR: JDK-8204970: Remaing object comparisons need to use oopDesc::equals()
Message-ID: <2b92069f-f378-2984-8518-68230238eaec@redhat.com>

We found 2 more places where oopDesc::equals() should be used instead of
raw obj==obj.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8204970
Webrev:
http://cr.openjdk.java.net/~rkennke/JDK-8204970/webrev.00/

Passes tier1 tests

Can I get a review?

Thanks,
Roman

From calvin.cheung at oracle.com Fri Jul 6 16:15:39 2018
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Fri, 06 Jul 2018 09:15:39 -0700
Subject: RFR(L): 8202035: Archive the set of ModuleDescriptor and ModuleReference objects for system modules
In-Reply-To: <386DA770-8E9D-43A0-87CE-0E380977F884@oracle.com>
References: <386DA770-8E9D-43A0-87CE-0E380977F884@oracle.com>
Message-ID: <5B3F95AB.7060702@oracle.com>

Hi Jiangli,

Thanks for this start-up improvement. The changes look good overall. I've the following minor comments.

1) make/hotspot/symbols/symbols-unix

 134 JVM_InitializeFromArchive

If you want the symbols to be in alphabetical order, the above should be moved after JVM_InitStackTraceElementArray.

2) metaspaceShared.cpp

1927 oop MetaspaceShared::materialize_archived_object(oop obj) {
1928   if (obj != NULL) {
1929     return G1CollectedHeap::heap()->materialize_archived_object(obj);
1930   }
1931   return NULL;
1932 }

Instead of two return statements, how about replacing lines 1928 - 1931 with the following?

    return (obj != NULL) ? G1CollectedHeap::heap()->materialize_archived_object(obj) : NULL;

3) ArchivedModuleComboTest.java

  55 Path moduleDir = Files.createTempDirectory(userDir, "mods");

I don't see anything got placed under the "mods" dir, is it by design?

For the "dump with --module-path" cases, there seems to be a missing test case with "--show-module-resolution" (similar to Test case 2).

4) CheckArchivedModuleApp.java

  53 if (expectArchived && wb.isShared(md)) {
  54     System.out.println(name + " is archived. Expected.");
  55 } else if (!expectArchived && !wb.isShared(md)) {
  56     System.out.println(name + " is not archived. Expected.");
  57 } else if (expectArchived) {
  58     throw new RuntimeException(
  59         "FAILED. " + name + " is not archived. Expect archived.");
  60 } else {
  61     throw new RuntimeException(
  62         "FAILED. " + name + " is archived. Expect not archived.");
  63 }

I'd suggest the following so that the code is easier to understand:

    if (expectArchived) {
        if (wb.isShared(md)) {
            System.out.println(name + " is archived. Expected.");
        } else {
            throw new RuntimeException(
                "FAILED. " + name + " is not archived. Expect archived.");
        }
    } else {
        if (!wb.isShared(md)) {
            System.out.println(name + " is not archived. Expected.");
        } else {
            throw new RuntimeException(
                "FAILED. " + name + " is archived. Expect not archived.");
        }
    }

5) ArchivedModuleWithCustomImageTest.java

 178 private static void printCommand(String opts[]) {
 179     StringBuilder cmdLine = new StringBuilder();
 180     for (String cmd : opts)
 181         cmdLine.append(cmd).append(' ');
 182     System.out.println("Command line: [" + cmdLine.toString() + "]");
 183 }

Consider putting the above method in ProcessTools.java so that ProcessTools.createJavaProcessBuilder() and the above test can call it and avoiding duplicate code.
A separate follow-up bug to address this is fine.

6) PrintSystemModulesApp.java

I don't think it is being used?
thanks,
Calvin

From kim.barrett at oracle.com Fri Jul 6 16:33:28 2018
From: kim.barrett at oracle.com (Kim Barrett)
Date: Fri, 6 Jul 2018 12:33:28 -0400
Subject: RFR (S): 8206453: Taskqueue stats should count real steal attempts, not calls to GenericTaskQueueSet::steal
In-Reply-To: <1992fab92d03fbbe64218aab7562a59e41d2b553.camel@oracle.com>
References: <1992fab92d03fbbe64218aab7562a59e41d2b553.camel@oracle.com>
Message-ID: <93B88660-BA2D-4387-A311-B73F5EBFCC41@oracle.com>

> On Jul 6, 2018, at 10:11 AM, Thomas Schatzl wrote:
>
> Hi all,
>
> can I have reviews for some small change to how successful steals and
> steal attempts in taskqueue statistics are gathered? In particular, as
> the subject of the CR suggests, the "steal attempts" counter the number
> of calls to GenericTaskQueueSet::steal() which internally actually
> attempts stealing quite often.
>
> This makes a useful comparison of steal attempts to successful steals
> impossible and misleading.
>
> The calls to GenericTaskQueueSet::steal are mostly reflected to the
> number of termination attempts that is already counted elsewhere (ie.
> steal attempts - successful steals), so it does not really give new
> information.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8206453
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8206453/webrev/
> Testing:
> local compilation and use with and without TASKQUEUE_STATS.
> > Thanks, > Thomas Please change record_attempt to record_steal_attempt. Otherwise, looks good. I don't need a new webrev for that renaming. From zgu at redhat.com Fri Jul 6 16:56:05 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 6 Jul 2018 12:56:05 -0400 Subject: RFR(S) 8206467: Refactor G1ParallelCleaningTask into shared Message-ID: <15cc2aaf-54bc-fbe2-f9a1-e66168fa7a54@redhat.com> Hi, Shenandoah has a similar version that was derived from G1ParallelCleaningTask, with additional time tracking of each cleaning task. Due to code movement and renaming, the changeset appears to be large, but it is really not. Other than the renaming, followings are the actual diffs vs. current G1ParallelCleaningTask: - G1StringDedupUnlinkOrOopsDoClosure is passed as a parameter. - Counters (_string_processed, _string_removed, and etc.) in StringAndSymbolCleaningTask are volatile now, cause they are updated using atomic operations. - Added ParallelCleaningTimes and ParallelCleaningTaskTimer classes for tracking times. - ParallelCleaningTask::work() added time tracking code. Bug: https://bugs.openjdk.java.net/browse/JDK-8206467 Webrev: http://cr.openjdk.java.net/~zgu/8206467/webrev.00/ Test: hotspot_gc on Linux 64 (fastdebug and release) Thanks, -Zhengyu From jiangli.zhou at Oracle.COM Fri Jul 6 19:34:59 2018 From: jiangli.zhou at Oracle.COM (Jiangli Zhou) Date: Fri, 6 Jul 2018 12:34:59 -0700 Subject: RFR(L): 8202035: Archive the set of ModuleDescriptor and ModuleReference objects for system modules In-Reply-To: <5B3F95AB.7060702@oracle.com> References: <386DA770-8E9D-43A0-87CE-0E380977F884@oracle.com> <5B3F95AB.7060702@oracle.com> Message-ID: Hi Calvin, Thanks for the review! 
Here are the updated webrevs that address the feedback from you and Ioi:

http://cr.openjdk.java.net/~jiangli/8202035/webrev_inc.01/

Full webrev: http://cr.openjdk.java.net/~jiangli/8202035/webrev_full.01/

> On Jul 6, 2018, at 9:15 AM, Calvin Cheung wrote:
>
> Hi Jiangli,
>
> Thanks for this start-up improvement. The changes look good overall. I've the following minor comments.
>
> 1) make/hotspot/symbols/symbols-unix
>
>  134 JVM_InitializeFromArchive
>
> If you want the symbols to be in alphabetical order, the above should be moved after JVM_InitStackTraceElementArray.

Fixed.

>
> 2) metaspaceShared.cpp
>
> 1927 oop MetaspaceShared::materialize_archived_object(oop obj) {
> 1928   if (obj != NULL) {
> 1929     return G1CollectedHeap::heap()->materialize_archived_object(obj);
> 1930   }
> 1931   return NULL;
> 1932 }
>
> Instead of two return statements, how about replacing lines 1928 - 1931 with the following?
>
>     return (obj != NULL) ? G1CollectedHeap::heap()->materialize_archived_object(obj) : NULL;

The original format probably is slightly easier to read, so I left it unchanged. Hope that's okay with you.

>
> 3) ArchivedModuleComboTest.java
>
>   55 Path moduleDir = Files.createTempDirectory(userDir, "mods");
>
> I don't see anything got placed under the "mods" dir, is it by design?

Yes.

>
> For the "dump with --module-path" cases, there seems to be a missing test case with "--show-module-resolution" (similar to Test case 2).

When --module-path is specified at dump time, the system module graph is currently not archived. There is no need for an additional test case with --show-module-resolution in this case since all module objects are created as normal.

>
>
> 4) CheckArchivedModuleApp.java
>
>   53 if (expectArchived && wb.isShared(md)) {
>   54     System.out.println(name + " is archived. Expected.");
>   55 } else if (!expectArchived && !wb.isShared(md)) {
>   56     System.out.println(name + " is not archived. Expected.");
>   57 } else if (expectArchived) {
>   58     throw new RuntimeException(
>   59         "FAILED. " + name + " is not archived. Expect archived.");
>   60 } else {
>   61     throw new RuntimeException(
>   62         "FAILED. " + name + " is archived. Expect not archived.");
>   63 }
>
> I'd suggest the following so that the code is easier to understand:
>
>     if (expectArchived) {
>         if (wb.isShared(md)) {
>             System.out.println(name + " is archived. Expected.");
>         } else {
>             throw new RuntimeException(
>                 "FAILED. " + name + " is not archived. Expect archived.");
>         }
>     } else {
>         if (!wb.isShared(md)) {
>             System.out.println(name + " is not archived. Expected.");
>         } else {
>             throw new RuntimeException(
>                 "FAILED. " + name + " is archived. Expect not archived.");
>         }
>     }

Reformatted as suggested.

>
> 5) ArchivedModuleWithCustomImageTest.java
>
>  178 private static void printCommand(String opts[]) {
>  179     StringBuilder cmdLine = new StringBuilder();
>  180     for (String cmd : opts)
>  181         cmdLine.append(cmd).append(' ');
>  182     System.out.println("Command line: [" + cmdLine.toString() + "]");
>  183 }
>
> Consider putting the above method in ProcessTools.java so that ProcessTools.createJavaProcessBuilder() and the above test can call it and avoiding duplicate code.
> A separate follow-up bug to address this is fine.

That sounds good to me. We might need some reformatting for consolidation. I will file a follow-up RFE.

>
> 6) PrintSystemModulesApp.java
>
> I don't think it is being used?

It's used by ArchivedModuleCompareTest.java. Looks like it was missing from the earlier webrev. Thanks for catching that. The file is included in the updated webrev.

Thanks!
Jiangli

>
> thanks,
> Calvin

From mandy.chung at oracle.com Fri Jul 6 20:40:03 2018
From: mandy.chung at oracle.com (mandy chung)
Date: Fri, 6 Jul 2018 13:40:03 -0700
Subject: RFR(L): 8202035: Archive the set of ModuleDescriptor and ModuleReference objects for system modules
In-Reply-To: <386DA770-8E9D-43A0-87CE-0E380977F884@oracle.com>
References: <386DA770-8E9D-43A0-87CE-0E380977F884@oracle.com>
Message-ID: <9aafedcf-4abf-77fd-e455-823c9e11c8b0@oracle.com>

Hi Jiangli,

On 6/28/18 4:15 PM, Jiangli Zhou wrote:
> webrev: http://cr.openjdk.java.net/~jiangli/8202035/webrev.00/
> RFE: https://bugs.openjdk.java.net/browse/JDK-8202035?filter=14921

Good work. I'm glad to see a pretty good startup improvement.

I reviewed java.base change that looks good.

Mandy

From jiangli.zhou at oracle.com Fri Jul 6 20:41:30 2018
From: jiangli.zhou at oracle.com (Jiangli Zhou)
Date: Fri, 6 Jul 2018 13:41:30 -0700
Subject: RFR(L): 8202035: Archive the set of ModuleDescriptor and ModuleReference objects for system modules
In-Reply-To: <9aafedcf-4abf-77fd-e455-823c9e11c8b0@oracle.com>
References: <386DA770-8E9D-43A0-87CE-0E380977F884@oracle.com>
 <9aafedcf-4abf-77fd-e455-823c9e11c8b0@oracle.com>
Message-ID: <39F0EBB2-3721-4A08-9661-955E4B5E6920@oracle.com>

Thanks a lot for reviewing, Mandy!

Jiangli

> On Jul 6, 2018, at 1:40 PM, mandy chung wrote:
>
> Hi Jiangli,
>
> On 6/28/18 4:15 PM, Jiangli Zhou wrote:
>> webrev: http://cr.openjdk.java.net/~jiangli/8202035/webrev.00/
>> RFE: https://bugs.openjdk.java.net/browse/JDK-8202035?filter=14921
>
> Good work.
I'm glad to see a pretty good startup improvement. > > I reviewed java.base change that looks good. > > Mandy From kim.barrett at oracle.com Sat Jul 7 03:18:02 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 6 Jul 2018 23:18:02 -0400 Subject: RFR: JDK-8204970: Remaing object comparisons need to use oopDesc::equals() In-Reply-To: <2b92069f-f378-2984-8518-68230238eaec@redhat.com> References: <2b92069f-f378-2984-8518-68230238eaec@redhat.com> Message-ID: > On Jul 6, 2018, at 11:28 AM, Roman Kennke wrote: > > We found 2 more places where oopDesc::equals() should be used instead of > raw obj==obj. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8204970 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8204970/webrev.00/ > > Passes tier1 tests > > Can I get a review? > > Thanks, > Roman This looks good. How close are we to being able to remove operator== and operator!= from the oop class that is defined when CHECK_UNHANDLED_OOPS is defined? I suspect the main problem is checks for NULL? 
From kim.barrett at oracle.com Sat Jul 7 03:20:02 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 6 Jul 2018 23:20:02 -0400 Subject: RFR (S): 8205426: Humongous continues remembered set does not match humongous start region one in Kitchensink In-Reply-To: <8f49bc7b4dbdf3255f4c2dc83354101c1b64ae2f.camel@oracle.com> References: <69519db2cd7fe431357a5a05b89bba17cdd0eaaa.camel@oracle.com> <54BF88C5-5835-49B7-8E1F-E21A4E429D15@oracle.com> <15c89934f5f69de519bca42c9d7b049e621ccbae.camel@oracle.com> <68B2A585-08B4-4741-93C6-B68D3CC801CA@oracle.com> <8f49bc7b4dbdf3255f4c2dc83354101c1b64ae2f.camel@oracle.com> Message-ID: <29DEBCBF-A50B-44D7-BDE2-FF88BA99C3F7@oracle.com> > On Jul 6, 2018, at 9:10 AM, Thomas Schatzl wrote: > > Hi, > > On Thu, 2018-07-05 at 16:53 -0400, Kim Barrett wrote: >>> On Jul 5, 2018, at 3:16 AM, Thomas Schatzl >> om> wrote: >>> There is a new webrev at >>> >>> http://cr.openjdk.java.net/~tschatzl/8205426/webrev.1 (full) >>> http://cr.openjdk.java.net/~tschatzl/8205426/webrev.0_to_1 (diff, >>> but >>> almost useless due to many changes) >>> >>> That at least separates the concerns about humongous/regular region >>> a >>> bit. >>> >>> Thanks, >>> Thomas >> >> I like this much better. It eliminates the implicit logical coupling >> that the before rebuild task "knows" the liveness of the starts >> region >> is good enough, without introducing physical coupling from remset to >> concurrentmark. >> >> ------------------------------------------------------------------- >> ----------- >> src/hotspot/share/gc/g1/g1RemSetTrackingPolicy.cpp >> 116 if (!r->is_old() && r->is_archive()) { >> >> I think that should be || rather than &&. >> >> ------------------------------------------------------------------- >> ----------- >> src/hotspot/share/gc/g1/g1RemSetTrackingPolicy.cpp >> 111 bool G1RemSetTrackingPolicy::update_before_rebuild(HeapRegion* >> r, size_t live_bytes) { >> >> Consider adding "assert(!r->is_humongous(), ...)". 
The !r->is_old() >> will filter them out, but we shouldn't be here at all and should have >> instead called the associated update_humongous function. >> >> ------------------------------------------------------------------- >> ----------- >> > > fixed all that and Erik's suggestion. > > New webrev: > http://cr.openjdk.java.net/~tschatzl/8205426/webrev.2 (full) > http://cr.openjdk.java.net/~tschatzl/8205426/webrev.1_to_2 (diff) > > It passed hs-tier1-4,jdk-tier1-3 > > Thanks, > Thomas Looks good. From rkennke at redhat.com Sat Jul 7 10:52:44 2018 From: rkennke at redhat.com (Roman Kennke) Date: Sat, 7 Jul 2018 12:52:44 +0200 Subject: RFR: JDK-8204970: Remaing object comparisons need to use oopDesc::equals() In-Reply-To: References: <2b92069f-f378-2984-8518-68230238eaec@redhat.com> Message-ID: Am 07.07.2018 um 05:18 schrieb Kim Barrett: >> On Jul 6, 2018, at 11:28 AM, Roman Kennke wrote: >> >> We found 2 more places where oopDesc::equals() should be used instead of >> raw obj==obj. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8204970 >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8204970/webrev.00/ >> >> Passes tier1 tests >> >> Can I get a review? >> >> Thanks, >> Roman > > This looks good. > > How close are we to being able to remove operator== and operator!= from the oop class that > is defined when CHECK_UNHANDLED_OOPS is defined? I suspect the main problem is > checks for NULL? The main problems are all those places where we actually want to use naked comparisons, especially inside GC code. 
In Shenandoah, we actually put checks in the == and != operators to catch unintended raw == and !=: https://builds.shipilev.net/patch-openjdk-shenandoah-jdk/2018-07-06-v255-vs-dea7ce62c7b0/src/hotspot/share/oops/oopsHierarchy.hpp.udiff.html But this requires all *intended* raw comparisons to be expressed differently. In Shenandoah we have a special unsafe_equals() method that casts the oop to HeapWord* and compares that, but we could use RawAccessBarrier::equals() for this now. These verification checks have proven to be very useful to catch bad naked ==, so I'd like to upstream this soon if you agree. WDYT? Cheers, Roman From kim.barrett at oracle.com Sun Jul 8 15:52:32 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Sun, 8 Jul 2018 11:52:32 -0400 Subject: RFR[JDK11]: 8204691: HeapRegion.apply_to_marked_objects_other_vm_test fails with assert(!hr->is_free() || hr->is_empty()) failed: Free region 0 is not empty for set Free list # In-Reply-To: <7BB45268-F475-41CA-BB2D-DFB2FE9AE6E4@oracle.com> References: <7BB45268-F475-41CA-BB2D-DFB2FE9AE6E4@oracle.com> Message-ID: > On Jul 5, 2018, at 4:03 PM, Kim Barrett wrote: > >> On Jul 5, 2018, at 3:57 AM, Thomas Schatzl wrote: >> >> Hi, >> >> On Wed, 2018-07-04 at 20:13 -0400, Kim Barrett wrote: >>> Please review this fix of the HeapRegion gtest. >>> >>> The test modifies a region's "top" to unexpected values without >>> ensuring that no allocation might use the region and no GC might run >>> while the region is in that invalid state. We solve this by >>> executing the test code in its very own safepoint, and by saving and >>> then restoring the region's top back to its original value before >>> completing the test. And since we are doing all that, there's no >>> longer any reason to run the test in a separate VM. 
>> >> looks good, but the actual test is still run in a separate VM. >> Intentional? > > Unintentional. And now I?m not sure what I last ran through mach5. > I?ll re-test with TEST_OTHER_VM => TEST_VM. > > I know that failed in an obscure way earlier, but I think that was because > of an unrelated recently introduced bug that?s been fixed in the repo. Verified that I really have tested in same VM. New webrev: http://cr.openjdk.java.net/~kbarrett/8204691/open.01/ The only change is TEST_OTHER_VM => TEST_VM. From thomas.schatzl at oracle.com Mon Jul 9 08:30:43 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 09 Jul 2018 10:30:43 +0200 Subject: RFR[JDK11]: 8204691: HeapRegion.apply_to_marked_objects_other_vm_test fails with assert(!hr->is_free() || hr->is_empty()) failed: Free region 0 is not empty for set Free list # In-Reply-To: References: <7BB45268-F475-41CA-BB2D-DFB2FE9AE6E4@oracle.com> Message-ID: <55d3dd07125645c1f86b274a6eb7ceb28a1286d2.camel@oracle.com> Hi Kim, On Sun, 2018-07-08 at 11:52 -0400, Kim Barrett wrote: > > On Jul 5, 2018, at 4:03 PM, Kim Barrett > > wrote: > > > > > On Jul 5, 2018, at 3:57 AM, Thomas Schatzl > > > wrote: > > > > > > On Wed, 2018-07-04 at 20:13 -0400, Kim Barrett wrote: > > > > Please review this fix of the HeapRegion gtest. > > > > [...] > > > > > > looks good, but the actual test is still run in a separate VM. > > > Intentional? > > > > Unintentional. And now I?m not sure what I last ran through mach5. > > I?ll re-test with TEST_OTHER_VM => TEST_VM. > > > > I know that failed in an obscure way earlier, but I think that was > > because of an unrelated recently introduced bug that?s been fixed > > in the repo. > > Verified that I really have tested in same VM. New webrev: > http://cr.openjdk.java.net/~kbarrett/8204691/open.01/ > > The only change is TEST_OTHER_VM => TEST_VM. > thanks. Looks good. 
Thomas From thomas.schatzl at oracle.com Mon Jul 9 08:31:31 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 09 Jul 2018 10:31:31 +0200 Subject: RFR (S): 8205426: Humongous continues remembered set does not match humongous start region one in Kitchensink In-Reply-To: <29DEBCBF-A50B-44D7-BDE2-FF88BA99C3F7@oracle.com> References: <69519db2cd7fe431357a5a05b89bba17cdd0eaaa.camel@oracle.com> <54BF88C5-5835-49B7-8E1F-E21A4E429D15@oracle.com> <15c89934f5f69de519bca42c9d7b049e621ccbae.camel@oracle.com> <68B2A585-08B4-4741-93C6-B68D3CC801CA@oracle.com> <8f49bc7b4dbdf3255f4c2dc83354101c1b64ae2f.camel@oracle.com> <29DEBCBF-A50B-44D7-BDE2-FF88BA99C3F7@oracle.com> Message-ID: <1b398fbb77953e9b8b218b9d86e844027332b24c.camel@oracle.com> Kim, Erik, thanks for your reviews. Thomas From thomas.schatzl at oracle.com Mon Jul 9 08:52:46 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 09 Jul 2018 10:52:46 +0200 Subject: RFR (S): 8206453: Taskqueue stats should count real steal attempts, not calls to GenericTaskQueueSet::steal In-Reply-To: <93B88660-BA2D-4387-A311-B73F5EBFCC41@oracle.com> References: <1992fab92d03fbbe64218aab7562a59e41d2b553.camel@oracle.com> <93B88660-BA2D-4387-A311-B73F5EBFCC41@oracle.com> Message-ID: <8088066dfc637622a738def9bc955dcb9bc47760.camel@oracle.com> Hi Kim, On Fri, 2018-07-06 at 12:33 -0400, Kim Barrett wrote: > > On Jul 6, 2018, at 10:11 AM, Thomas Schatzl > com> wrote: > > [...] > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8206453 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8206453/webrev/ > > Testing: > > local compilation and use with and without TASKQUEUE_STATS. > > > > Thanks, > > Thomas > > Please change record_attempt to record_steal_attempt. > Otherwise, looks good. I don't need a new webrev for that renaming. > thanks for your review. I updated the existing webrev in-place for the second reviewer. 
Thomas From erik.helin at oracle.com Mon Jul 9 09:28:43 2018 From: erik.helin at oracle.com (Erik Helin) Date: Mon, 9 Jul 2018 11:28:43 +0200 Subject: RFR (S): 8206453: Taskqueue stats should count real steal attempts, not calls to GenericTaskQueueSet::steal In-Reply-To: <8088066dfc637622a738def9bc955dcb9bc47760.camel@oracle.com> References: <1992fab92d03fbbe64218aab7562a59e41d2b553.camel@oracle.com> <93B88660-BA2D-4387-A311-B73F5EBFCC41@oracle.com> <8088066dfc637622a738def9bc955dcb9bc47760.camel@oracle.com> Message-ID: <07c18df7-57bd-0790-72bb-457720bf2823@oracle.com> On 07/09/2018 10:52 AM, Thomas Schatzl wrote: > Hi Kim, > > On Fri, 2018-07-06 at 12:33 -0400, Kim Barrett wrote: >>> On Jul 6, 2018, at 10:11 AM, Thomas Schatzl >> com> wrote: >>> > [...] >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8206453 >>> Webrev: >>> http://cr.openjdk.java.net/~tschatzl/8206453/webrev/ >>> Testing: >>> local compilation and use with and without TASKQUEUE_STATS. >>> >>> Thanks, >>> Thomas >> >> Please change record_attempt to record_steal_attempt. >> Otherwise, looks good. I don't need a new webrev for that renaming. >> > > thanks for your review. I updated the existing webrev in-place for > the second reviewer. Looks good, Reviewed! 
Thanks, Erik > Thomas > From thomas.schatzl at oracle.com Mon Jul 9 09:39:01 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 09 Jul 2018 11:39:01 +0200 Subject: RFR (S): 8206453: Taskqueue stats should count real steal attempts, not calls to GenericTaskQueueSet::steal In-Reply-To: <07c18df7-57bd-0790-72bb-457720bf2823@oracle.com> References: <1992fab92d03fbbe64218aab7562a59e41d2b553.camel@oracle.com> <93B88660-BA2D-4387-A311-B73F5EBFCC41@oracle.com> <8088066dfc637622a738def9bc955dcb9bc47760.camel@oracle.com> <07c18df7-57bd-0790-72bb-457720bf2823@oracle.com> Message-ID: <6c02d2281e5355d99602cfd9a0a9b78118a47866.camel@oracle.com> Hi Erik, On Mon, 2018-07-09 at 11:28 +0200, Erik Helin wrote: > On 07/09/2018 10:52 AM, Thomas Schatzl wrote: > > Hi Kim, > > > > On Fri, 2018-07-06 at 12:33 -0400, Kim Barrett wrote: > > > > On Jul 6, 2018, at 10:11 AM, Thomas Schatzl > > > cle. > > > > com> wrote: > > > > > > > > [...] > > > > CR: > > > > https://bugs.openjdk.java.net/browse/JDK-8206453 > > > > Webrev: > > > > http://cr.openjdk.java.net/~tschatzl/8206453/webrev/ > > > > Testing: > > > > local compilation and use with and without TASKQUEUE_STATS. > > > > > > > > Thanks, > > > > Thomas > > > [...] > > > thanks for your review. I updated the existing webrev in-place > > for > > the second reviewer. > > Looks good, Reviewed! > > Thanks, > Erik thanks for your review. Thanks, Thomas From thomas.schatzl at oracle.com Mon Jul 9 10:13:30 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 09 Jul 2018 12:13:30 +0200 Subject: [11] RFR (XXS): 8206476: Wrong assert in phase_enum_2_phase_string() in referenceProcessorPhaseTimes.cpp Message-ID: <10ff9cdf14a2265e2437b4efc4f48eddc7a7c001.camel@oracle.com> Hi all, can I have reviews to this issue found with our C++ source code analysis tool that complains about one assert not being strict enough. 
In particular: static const char* phase_enum_2_phase_string(ReferenceProcessor::RefProcPhases phase) { assert(phase >= ReferenceProcessor::RefPhase1 && phase <= ReferenceProcessor::RefPhaseMax, "Invalid reference processing phase (%d)", phase); return PhaseNames[phase]; } The second "<=" should be a "<". Actually there is an existing (correct) macro for the whole assert. Replaced that line with the macro as follows: @@ -80,8 +80,7 @@ STATIC_ASSERT((REF_PHANTOM + 1) == ARRAY_SIZE(ReferenceTypeNames)); static const char* phase_enum_2_phase_string(ReferenceProcessor::RefProcPhases phase) { - assert(phase >= ReferenceProcessor::RefPhase1 && phase <= ReferenceProcessor::RefPhaseMax, - "Invalid reference processing phase (%d)", phase); + ASSERT_PHASE(phase); return PhaseNames[phase]; } There is no actual failure, and there are no known failures with the change either; the reason for putting this into 11 is to get rid of unnecessary noise in source code analysis tool results. CR: https://bugs.openjdk.java.net/browse/JDK-8206476 Webrev: http://cr.openjdk.java.net/~tschatzl/8206476/webrev/index.html Testing: hs-tier1-3,jdk-tier1 Thanks, Thomas From erik.helin at oracle.com Mon Jul 9 12:05:19 2018 From: erik.helin at oracle.com (Erik Helin) Date: Mon, 9 Jul 2018 14:05:19 +0200 Subject: [11] RFR (XXS): 8206476: Wrong assert in phase_enum_2_phase_string() in referenceProcessorPhaseTimes.cpp In-Reply-To: <10ff9cdf14a2265e2437b4efc4f48eddc7a7c001.camel@oracle.com> References: <10ff9cdf14a2265e2437b4efc4f48eddc7a7c001.camel@oracle.com> Message-ID: <3f5c14d4-c850-4481-1299-9c2ee1ade6c5@oracle.com> On 07/09/2018 12:13 PM, Thomas Schatzl wrote: > Hi all, > > can I have reviews to this issue found with our C++ source code > analysis tool that complains about one assert not being strict enough. 
> > In particular: > > static const char* > phase_enum_2_phase_string(ReferenceProcessor::RefProcPhases phase) { > assert(phase >= ReferenceProcessor::RefPhase1 && phase <= > ReferenceProcessor::RefPhaseMax, > "Invalid reference processing phase (%d)", phase); > return PhaseNames[phase]; > } > > The second "<=" should be a "<". > > Actually there is an existing (correct) macro for the whole assert. > Replaced that line with the macro as follows: > > @@ -80,8 +80,7 @@ > STATIC_ASSERT((REF_PHANTOM + 1) == ARRAY_SIZE(ReferenceTypeNames)); > > static const char* > phase_enum_2_phase_string(ReferenceProcessor::RefProcPhases phase) { > - assert(phase >= ReferenceProcessor::RefPhase1 && phase <= > ReferenceProcessor::RefPhaseMax, > - "Invalid reference processing phase (%d)", phase); > + ASSERT_PHASE(phase); > return PhaseNames[phase]; > } > > There is no actual failure, and there are no known failures with the > change either; the reason for putting this into 11 is to get rid of > unnecessary noise in source code analysis tool results. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8206476 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8206476/webrev/index.html Looks good, Reviewed. 
Thanks, Erik > Testing: > hs-tier1-3,jdk-tier1 > > Thanks, > Thomas > From thomas.schatzl at oracle.com Mon Jul 9 12:54:24 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 09 Jul 2018 14:54:24 +0200 Subject: [11] RFR (XXS): 8206476: Wrong assert in phase_enum_2_phase_string() in referenceProcessorPhaseTimes.cpp In-Reply-To: <3f5c14d4-c850-4481-1299-9c2ee1ade6c5@oracle.com> References: <10ff9cdf14a2265e2437b4efc4f48eddc7a7c001.camel@oracle.com> <3f5c14d4-c850-4481-1299-9c2ee1ade6c5@oracle.com> Message-ID: <285a4c976db53d5cf54832b8cf930d6e863cb8ad.camel@oracle.com> Hi, On Mon, 2018-07-09 at 14:05 +0200, Erik Helin wrote: > On 07/09/2018 12:13 PM, Thomas Schatzl wrote: > > Hi all, > > > > can I have reviews to this issue found with our C++ source code > > analysis tool that complains about one assert not being strict > > enough. > > > > [...] > > > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8206476 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8206476/webrev/index.html > > Looks good, Reviewed. > thanks for your review. Thomas From kim.barrett at oracle.com Mon Jul 9 14:51:51 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 9 Jul 2018 10:51:51 -0400 Subject: RFR[JDK11]: 8204691: HeapRegion.apply_to_marked_objects_other_vm_test fails with assert(!hr->is_free() || hr->is_empty()) failed: Free region 0 is not empty for set Free list # In-Reply-To: <55d3dd07125645c1f86b274a6eb7ceb28a1286d2.camel@oracle.com> References: <7BB45268-F475-41CA-BB2D-DFB2FE9AE6E4@oracle.com> <55d3dd07125645c1f86b274a6eb7ceb28a1286d2.camel@oracle.com> Message-ID: > On Jul 9, 2018, at 4:30 AM, Thomas Schatzl wrote: > > Hi Kim, > > On Sun, 2018-07-08 at 11:52 -0400, Kim Barrett wrote: >> [...] >> Verified that I really have tested in same VM. New webrev: >> http://cr.openjdk.java.net/~kbarrett/8204691/open.01/ >> >> The only change is TEST_OTHER_VM => TEST_VM. >> > > thanks. Looks good. > > Thomas Thanks. 
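The off-by-one in the 8206476 assert reviewed above can be illustrated with a small standalone sketch. It assumes, as the proposed fix suggests, that the last enumerator is a count rather than a valid phase; `Phase`, `phase_names`, `is_valid_phase` and `phase_name` are hypothetical names and not the HotSpot declarations or the `ASSERT_PHASE` macro.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical phase enum; the last enumerator counts the phases and is
// therefore not a valid index into the name table.
enum Phase { Phase1, Phase2, Phase3, PhaseMax };

static const char* const phase_names[] = { "Phase1", "Phase2", "Phase3" };

// The table must have exactly one entry per real phase.
static_assert(sizeof(phase_names) / sizeof(phase_names[0]) == PhaseMax,
              "one name per phase");

// Valid phases are [Phase1, PhaseMax). Using <= PhaseMax here would accept
// a value that indexes one past the end of phase_names.
constexpr bool is_valid_phase(int phase) {
  return phase >= Phase1 && phase < PhaseMax;
}

const char* phase_name(Phase phase) {
  assert(is_valid_phase(phase) && "invalid phase");
  return phase_names[phase];
}
```

Keeping the range check in one helper, as the existing macro mentioned in the thread does, means the bound only has to be right in one place.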
From kim.barrett at oracle.com Mon Jul 9 14:54:34 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 9 Jul 2018 10:54:34 -0400 Subject: [11] RFR (XXS): 8206476: Wrong assert in phase_enum_2_phase_string() in referenceProcessorPhaseTimes.cpp In-Reply-To: <10ff9cdf14a2265e2437b4efc4f48eddc7a7c001.camel@oracle.com> References: <10ff9cdf14a2265e2437b4efc4f48eddc7a7c001.camel@oracle.com> Message-ID: > On Jul 9, 2018, at 6:13 AM, Thomas Schatzl wrote: > [...] > Actually there is an existing (correct) macro for the whole assert. > Replaced that line with the macro as follows: > > [...] > CR: > https://bugs.openjdk.java.net/browse/JDK-8206476 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8206476/webrev/index.html > Testing: > hs-tier1-3,jdk-tier1 > > Thanks, > Thomas Looks good. Thanks for spotting and using the existing helper macro. From thomas.schatzl at oracle.com Mon Jul 9 14:58:55 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 09 Jul 2018 16:58:55 +0200 Subject: [11] RFR (XXS): 8206476: Wrong assert in phase_enum_2_phase_string() in referenceProcessorPhaseTimes.cpp In-Reply-To: References: <10ff9cdf14a2265e2437b4efc4f48eddc7a7c001.camel@oracle.com> Message-ID: <90e85ae58fdc6810d4b69200b99a746394a8912a.camel@oracle.com> Hi, On Mon, 2018-07-09 at 10:54 -0400, Kim Barrett wrote: > > On Jul 9, 2018, at 6:13 AM, Thomas Schatzl wrote: > > [...] > > Actually there is an existing (correct) macro for the whole assert. > > Replaced that line with the macro as follows: > > > > [...] > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8206476 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8206476/webrev/index.html > > Testing: > > hs-tier1-3,jdk-tier1 > > > > Thanks, > > Thomas > > Looks good. > > Thanks for spotting and using the existing helper macro. > thanks for your review. 
Thomas From erik.helin at oracle.com Mon Jul 9 15:49:48 2018 From: erik.helin at oracle.com (Erik Helin) Date: Mon, 9 Jul 2018 17:49:48 +0200 Subject: RFR[JDK11]: 8204691: HeapRegion.apply_to_marked_objects_other_vm_test fails with assert(!hr->is_free() || hr->is_empty()) failed: Free region 0 is not empty for set Free list # In-Reply-To: References: <7BB45268-F475-41CA-BB2D-DFB2FE9AE6E4@oracle.com> Message-ID: <2aa63d5e-d148-219c-2e77-928fd0964a64@oracle.com> On 07/08/2018 05:52 PM, Kim Barrett wrote: >> On Jul 5, 2018, at 4:03 PM, Kim Barrett wrote: >> >>> On Jul 5, 2018, at 3:57 AM, Thomas Schatzl wrote: >>> >>> Hi, >>> >>> On Wed, 2018-07-04 at 20:13 -0400, Kim Barrett wrote: >>>> Please review this fix of the HeapRegion gtest. >>>> >>>> The test modifies a region's "top" to unexpected values without >>>> ensuring that no allocation might use the region and no GC might run >>>> while the region is in that invalid state. We solve this by >>>> executing the test code in its very own safepoint, and by saving and >>>> then restoring the region's top back to its original value before >>>> completing the test. And since we are doing all that, there's no >>>> longer any reason to run the test in a separate VM. >>> >>> looks good, but the actual test is still run in a separate VM. >>> Intentional? >> >> Unintentional. And now I'm not sure what I last ran through mach5. >> I'll re-test with TEST_OTHER_VM => TEST_VM. >> >> I know that failed in an obscure way earlier, but I think that was because >> of an unrelated recently introduced bug that's been fixed in the repo. > > Verified that I really have tested in same VM. New webrev: > http://cr.openjdk.java.net/~kbarrett/8204691/open.01/ > > The only change is TEST_OTHER_VM => TEST_VM. 
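The "save and then restore the region's top" half of the fix quoted above can be sketched with a scope guard. This is only an illustration under stated assumptions: `Region`, `ScopedTop` and `observe_with_temporary_top` are hypothetical names, the thread does not say the actual patch uses RAII, and the safepoint part depends on VM internals and is not shown.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a heap region; only the field the test touches.
struct Region {
  std::size_t top;
};

// Sets a temporary top on construction and restores the saved value on
// destruction, so the region is valid again on every exit path.
class ScopedTop {
  Region& _region;
  std::size_t _saved;
public:
  ScopedTop(Region& region, std::size_t temporary_top)
    : _region(region), _saved(region.top) {
    _region.top = temporary_top;
  }
  ~ScopedTop() { _region.top = _saved; }

  ScopedTop(const ScopedTop&) = delete;
  ScopedTop& operator=(const ScopedTop&) = delete;
};

// Example test body: works on the temporary top and returns what it saw.
std::size_t observe_with_temporary_top(Region& region, std::size_t temporary_top) {
  ScopedTop guard(region, temporary_top);
  return region.top;
}
```

The guard alone does not stop other threads from observing the temporary value, which is why the thread pairs the restore with running the test body where no allocation or GC can happen concurrently.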
Hmmm, it is (very) unfortunate if we have native code in JVM allocating Java objects and triggering garbage collections _concurrently_ with the unit tests being run (there shouldn't be any Java code running when the unit tests are executing). I understand that we have to restore the top pointer in case there is some verification for example when the JVM exits (or if we assert in a destructor etc), but do we really need to run the test in a safepoint? There is nothing wrong with running the test in a safepoint, but it seems to me that we then would have to run almost all TEST_VM tests in a safepoint? Thanks, Erik From kim.barrett at oracle.com Mon Jul 9 19:51:15 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 9 Jul 2018 15:51:15 -0400 Subject: RFR[JDK11]: 8204691: HeapRegion.apply_to_marked_objects_other_vm_test fails with assert(!hr->is_free() || hr->is_empty()) failed: Free region 0 is not empty for set Free list # In-Reply-To: <2aa63d5e-d148-219c-2e77-928fd0964a64@oracle.com> References: <7BB45268-F475-41CA-BB2D-DFB2FE9AE6E4@oracle.com> <2aa63d5e-d148-219c-2e77-928fd0964a64@oracle.com> Message-ID: <92BE79FB-A982-46AB-80D7-61B459D88396@oracle.com> > On Jul 9, 2018, at 11:49 AM, Erik Helin wrote: > > On 07/08/2018 05:52 PM, Kim Barrett wrote: >>> On Jul 5, 2018, at 4:03 PM, Kim Barrett wrote: >>> >>>> On Jul 5, 2018, at 3:57 AM, Thomas Schatzl wrote: >>>> >>>> Hi, >>>> >>>> On Wed, 2018-07-04 at 20:13 -0400, Kim Barrett wrote: >>>>> Please review this fix of the HeapRegion gtest. >>>>> >>>>> The test modifies a region's "top" to unexpected values without >>>>> ensuring that no allocation might use the region and no GC might run >>>>> while the region is in that invalid state. We solve this by >>>>> executing the test code in its very own safepoint, and by saving and >>>>> then restoring the region's top back to its original value before >>>>> completing the test. 
And since we are doing all that, there's no >>>>> longer any reason to run the test in a separate VM. >>>> >>>> looks good, but the actual test is still run in a separate VM. >>>> Intentional? >>> >>> Unintentional. And now I?m not sure what I last ran through mach5. >>> I?ll re-test with TEST_OTHER_VM => TEST_VM. >>> >>> I know that failed in an obscure way earlier, but I think that was because >>> of an unrelated recently introduced bug that?s been fixed in the repo. >> Verified that I really have tested in same VM. New webrev: >> http://cr.openjdk.java.net/~kbarrett/8204691/open.01/ >> The only change is TEST_OTHER_VM => TEST_VM. > > Hmmm, it is (very) unfortunate if we have native code in JVM allocating Java objects and triggering garbage collections _concurrently_ with the unit tests being run (there shouldn't be any Java code running when the unit tests are executing). I understand that we have to restore the top pointer in case there is some verification for example when the JVM exits (or if we assert in a destructor etc), but do we really need to run the test in a safepoint? There is nothing wrong with running the test in a safepoint, but it seems to me that we then would have to run almost all TEST_VM tests in a safepoint? > > Thanks, > Erik I don't think it's quite *that* bad. As far as I can tell, TEST_VM tests have always been executed concurrently with the executing VM. That is, the VM is created (by calling JNI_CreateJavaVM), and then the same thread that made that call (which is now the main thread for the VM) executes the TEST_VM tests. That thread is a Java thread, initially "in native" (which is why we need to do the ThreadInVMfromNative transition first, before going to the safepoint). It's only a problem for tests that mess with VM data structures in unexpected ways. I guess many / most / nearly all(?) don't do that, since we haven't seen more of these kinds of failures. 
But I agree that it does mean one needs to take some additional care when writing TEST_VM tests. And in fact, there are some tests that rely on that behavior, e.g. the test of OopStorage::delete_empty_blocks_concurrent(). This particular test is doing something really nasty behind the collector's back. It was trying to protect against that by using TEST_OTHER_VM, but that just narrowed the window for failures. I looked at the 3 other uses of TEST_OTHER_VM, and none of them appear to have this kind of problem. They are run in another VM because they side-effect the VM in a way that we don't necessarily want to apply to the main test runner. But they don't seem to be bashing on VM data structures in non-approved ways. From Derek.White at cavium.com Mon Jul 9 20:48:16 2018 From: Derek.White at cavium.com (White, Derek) Date: Mon, 9 Jul 2018 20:48:16 +0000 Subject: 8205908: Unnecessarily strong memory barriers in ParNewGeneration::copy_to_survivor_space In-Reply-To: References:

<36321D6C-A8B7-48FB-8560-9B5807956A87@oracle.com> Message-ID: Hi Michihiro, FYI, this patch does seem to help AArch64 also on SPECjbb to a lesser degree. This was benchmarked with very large young gen, so GC overhead is kept lower than you'd see in typical applications. * Derek From: hotspot-gc-dev [mailto:hotspot-gc-dev-bounces at openjdk.java.net] On Behalf Of Michihiro Horie Sent: Wednesday, July 04, 2018 4:26 AM To: Kim Barrett Cc: hotspot-gc-dev at openjdk.java.net; Gustavo Romero Subject: Re: 8205908: Unnecessarily strong memory barriers in ParNewGeneration::copy_to_survivor_space External Email Hi Martin, Kim, Thank you for both of your comments. I missed the point that oopDesc::forward_to is invoked from several callers. Using OrderAccess::storestore() before the invocation of forward_to() would be a great idea, thanks. >I haven't looked carefully at the change, though I did find one part >that I don't like. The new test of "order" in forward_to_atomic not >only affects CMS, but also (uselessly) affects G1. Please let me confirm your point. You mean I should give memory_order_acq_rel to forward_to_atomic, which uses tests as follows to hold the consistent meaning of acquire/release in forward_to_atomic? I agree it is not clear the test with release returns the forwardee with acquire. 
oop oopDesc::forward_to_atomic(oop p, atomic_memory_order order) { : while (!oldMark->is_marked()) { if (order == memory_order_acq_rel) { curMark = cas_set_mark_raw(forwardPtrMark, oldMark, memory_order_release); } else { curMark = cas_set_mark_raw(forwardPtrMark, oldMark, order); } } : } if (order == memory_order_acq_rel) { return forwardee_acquire(); } return forwardee(); } Best regards, -- Michihiro, IBM Research - Tokyo From: Kim Barrett To: Michihiro Horie Cc: "Doerr, Martin", "hotspot-gc-dev at openjdk.java.net", Gustavo Romero Date: 2018/07/04 05:41 Subject: Re: 8205908: Unnecessarily strong memory barriers in ParNewGeneration::copy_to_survivor_space ________________________________ > On Jul 3, 2018, at 4:25 AM, Michihiro Horie wrote: > > Hi Martin, > > Thanks a lot for your review. Sure, we need an OK from a CMS expert. Following is the new webrev: > http://cr.openjdk.java.net/~mhorie/8205908/webrev.01/ > > >Seems like a user of the forwardee needs to rely on memory_order_consume in the current implementation. I guess it will be appreciated that you're fixing this. > Thank you for pointing out this issue in the original implementation. I newly inserted a release at "2.4. Set new_obj as forwardee [L1142]". > > Improvement of critical-jOPS in SPECjbb2015 was 10%, which is still a big number. > > > Best regards, > -- > Michihiro, > IBM Research - Tokyo CMS was deprecated in JDK 9, and has been on maintenance life-support for some time. This complex-to-review performance enhancement was proposed less than 48 hours before JDK 11 FC, and didn't receive any reviews until after FC. Because of these factors, I don't think it should be included in JDK 11. 
And if CMS gets removed in JDK 12 (I don't know if that will happen), then this change would be rendered entirely moot. I haven't looked carefully at the change, though I did find one part that I don't like. The new test of "order" in forward_to_atomic not only affects CMS, but also (uselessly) affects G1. I'm not going to be able to look at this carefully soon, as JDK 11 bug fixing has a higher priority for me. Since I think CMS might soon not be an issue, I'd really rather not look at it at all. I think this change needs not just a CMS-expert reviewer, but someone who is willing to maintain CMS (including any potential bug tail from this change). From HORIE at jp.ibm.com Tue Jul 10 07:36:11 2018 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Tue, 10 Jul 2018 16:36:11 +0900 Subject: 8205908: Unnecessarily strong memory barriers in ParNewGeneration::copy_to_survivor_space In-Reply-To: References:

<36321D6C-A8B7-48FB-8560-9B5807956A87@oracle.com> Message-ID: Hi Derek, Thank you very much for testing this change on AArch64 and sharing your observations on the result, which make sense. I uploaded a new webrev based on the comments from Martin and Kim. http://cr.openjdk.java.net/~mhorie/8205908/webrev.02/ Best regards, -- Michihiro, IBM Research - Tokyo From: "White, Derek" To: Michihiro Horie, Kim Barrett Cc: "hotspot-gc-dev at openjdk.java.net", Gustavo Romero Date: 2018/07/10 05:48 Subject: RE: 8205908: Unnecessarily strong memory barriers in ParNewGeneration::copy_to_survivor_space Hi Michihiro, FYI, this patch does seem to help AArch64 also on SPECjbb to a lesser degree. This was benchmarked with very large young gen, so GC overhead is kept lower than you'd see in typical applications. Derek From: hotspot-gc-dev [mailto:hotspot-gc-dev-bounces at openjdk.java.net] On Behalf Of Michihiro Horie Sent: Wednesday, July 04, 2018 4:26 AM To: Kim Barrett Cc: hotspot-gc-dev at openjdk.java.net; Gustavo Romero Subject: Re: 8205908: Unnecessarily strong memory barriers in ParNewGeneration::copy_to_survivor_space External Email Hi Martin, Kim, Thank you for both of your comments. I missed the point that oopDesc::forward_to is invoked from several callers. Using OrderAccess::storestore() before the invocation of forward_to() would be a great idea, thanks. >I haven't looked carefully at the change, though I did find one part >that I don't like. The new test of "order" in forward_to_atomic not >only affects CMS, but also (uselessly) affects G1. Please let me confirm your point. You mean I should give memory_order_acq_rel to forward_to_atomic, which uses tests as follows to hold the consistent meaning of acquire/release in forward_to_atomic? I agree it is not clear the test with release returns the forwardee with acquire. 
oop oopDesc::forward_to_atomic(oop p, atomic_memory_order order) { : while (!oldMark->is_marked()) { if (order == memory_order_acq_rel) { curMark = cas_set_mark_raw(forwardPtrMark, oldMark, memory_order_release); } else { curMark = cas_set_mark_raw(forwardPtrMark, oldMark, order); } } : } if (order == memory_order_acq_rel) { return forwardee_acquire(); } return forwardee(); } Best regards, -- Michihiro, IBM Research - Tokyo From: Kim Barrett To: Michihiro Horie Cc: "Doerr, Martin", "hotspot-gc-dev at openjdk.java.net", Gustavo Romero Date: 2018/07/04 05:41 Subject: Re: 8205908: Unnecessarily strong memory barriers in ParNewGeneration::copy_to_survivor_space > On Jul 3, 2018, at 4:25 AM, Michihiro Horie wrote: > > Hi Martin, > > Thanks a lot for your review. Sure, we need an OK from a CMS expert. Following is the new webrev: > http://cr.openjdk.java.net/~mhorie/8205908/webrev.01/ > > >Seems like a user of the forwardee needs to rely on memory_order_consume in the current implementation. I guess it will be appreciated that you're fixing this. > Thank you for pointing out this issue in the original implementation. I newly inserted a release at "2.4. Set new_obj as forwardee [L1142]". > > Improvement of critical-jOPS in SPECjbb2015 was 10%, which is still a big number. > > > Best regards, > -- > Michihiro, > IBM Research - Tokyo CMS was deprecated in JDK 9, and has been on maintenance life-support for some time. This complex-to-review performance enhancement was proposed less than 48 hours before JDK 11 FC, and didn't receive any reviews until after FC. Because of these factors, I don't think it should be included in JDK 11. And if CMS gets removed in JDK 12 (I don't know if that will happen), then this change would be rendered entirely moot. 
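The release/acquire pairing discussed around forward_to_atomic in this thread can be sketched with standard C++ atomics. This is a rough illustration, not the proposed patch: it uses `std::atomic` instead of HotSpot's own atomic and ordering primitives, it models the forwarding pointer as a separate field rather than as part of the mark word, and `Object`, `try_forward` and `forwardee_acquire` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object with a forwarding pointer that is null until a
// copying thread installs the address of the copy.
struct Object {
  int payload;
  std::atomic<Object*> forwardee;
  explicit Object(int p) : payload(p), forwardee(nullptr) {}
};

// Tries to install `copy` as the forwardee of `obj` and returns the copy
// that won the race. On success the CAS uses release ordering, so the
// writes that filled in `copy` become visible to any thread that later
// reads the pointer with acquire. On failure the current value is read
// with acquire, so the loser can safely use the winner's copy.
Object* try_forward(Object& obj, Object* copy) {
  Object* expected = nullptr;
  if (obj.forwardee.compare_exchange_strong(expected, copy,
                                            std::memory_order_release,
                                            std::memory_order_acquire)) {
    return copy;
  }
  return expected;
}

// Reader side: the acquire load pairs with the release CAS above.
Object* forwardee_acquire(const Object& obj) {
  return obj.forwardee.load(std::memory_order_acquire);
}
```

The point of the pairing is that neither side needs a full two-way fence: the installer only has to publish its earlier writes, and the reader only has to order its later reads after the pointer load.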
I haven't looked carefully at the change, though I did find one part that I don't like. The new test of "order" in forward_to_atomic not only affects CMS, but also (uselessly) affects G1. I'm not going to be able to look at this carefully soon, as JDK 11 bug fixing has a higher priority for me. Since I think CMS might soon not be an issue, I'd really rather not look at it at all. I think this change needs not just a CMS-expert reviewer, but someone who is willing to maintain CMS (including any potential bug tail from this change). From erik.helin at oracle.com Tue Jul 10 14:34:05 2018 From: erik.helin at oracle.com (Erik Helin) Date: Tue, 10 Jul 2018 16:34:05 +0200 Subject: RFR[JDK11]: 8204691: HeapRegion.apply_to_marked_objects_other_vm_test fails with assert(!hr->is_free() || hr->is_empty()) failed: Free region 0 is not empty for set Free list # In-Reply-To: <92BE79FB-A982-46AB-80D7-61B459D88396@oracle.com> References: <7BB45268-F475-41CA-BB2D-DFB2FE9AE6E4@oracle.com> <2aa63d5e-d148-219c-2e77-928fd0964a64@oracle.com> <92BE79FB-A982-46AB-80D7-61B459D88396@oracle.com> Message-ID: On 07/09/2018 09:51 PM, Kim Barrett wrote: >> On Jul 9, 2018, at 11:49 AM, Erik Helin wrote: >> >> On 07/08/2018 05:52 PM, Kim Barrett wrote: >>>> On Jul 5, 2018, at 4:03 PM, Kim Barrett wrote: >>>> >>>>> On Jul 5, 2018, at 3:57 AM, Thomas Schatzl wrote: >>>>> >>>>> Hi, >>>>> >>>>> On Wed, 2018-07-04 at 20:13 -0400, Kim Barrett wrote: >>>>>> Please review this fix of the HeapRegion gtest. >>>>>> >>>>>> The test modifies a region's "top" to unexpected values without >>>>>> ensuring that no allocation might use the region and no GC might run >>>>>> while the region is in that invalid state.
We solve this by >>>>>> executing the test code in its very own safepoint, and by saving and >>>>>> then restoring the region's top back to its original value before >>>>>> completing the test. And since we are doing all that, there's no >>>>>> longer any reason to run the test in a separate VM. >>>>> >>>>> looks good, but the actual test is still run in a separate VM. >>>>> Intentional? >>>> >>>> Unintentional. And now I'm not sure what I last ran through mach5. >>>> I'll re-test with TEST_OTHER_VM => TEST_VM. >>>> >>>> I know that failed in an obscure way earlier, but I think that was because >>>> of an unrelated recently introduced bug that's been fixed in the repo. >>> Verified that I really have tested in same VM. New webrev: >>> http://cr.openjdk.java.net/~kbarrett/8204691/open.01/ >>> The only change is TEST_OTHER_VM => TEST_VM. >> >> Hmmm, it is (very) unfortunate if we have native code in JVM allocating Java objects and triggering garbage collections _concurrently_ with the unit tests being run (there shouldn't be any Java code running when the unit tests are executing). I understand that we have to restore the top pointer in case there is some verification for example when the JVM exits (or if we assert in a destructor etc), but do we really need to run the test in a safepoint? There is nothing wrong with running the test in a safepoint, but it seems to me that we then would have to run almost all TEST_VM tests in a safepoint? >> >> Thanks, >> Erik > > I don't think it's quite *that* bad. > > As far as I can tell, TEST_VM tests have always been executed > concurrently with the executing VM. That is, the VM is created (by > calling JNI_CreateJavaVM), and then the same thread that made that > call (which is now the main thread for the VM) executes the TEST_VM > tests. That thread is a Java thread, initially "in native" (which is > why we need to do the ThreadInVMfromNative transition first, before > going to the safepoint).
> > It's only a problem for tests that mess with VM data structures in > unexpected ways. I guess many / most / nearly all(?) don't do that, > since we haven't seen more of these kinds of failures. But I agree > that it does mean one needs to take some additional care when writing > TEST_VM tests. > > And in fact, there are some tests that rely on that behavior, e.g. > the test of OopStorage::delete_empty_blocks_concurrent(). > > This particular test is doing something really nasty behind the > collector's back. It was trying to protect against that by using > TEST_OTHER_VM, but that just narrowed the window for failures. > > I looked at the 3 other uses of TEST_OTHER_VM, and none of them appear > to have this kind of problem. They are run in another VM because they > side-effect the VM in a way that we don't necessarily want to apply > to the main test runner. But they don't seem to be bashing on VM data > structures in non-approved ways. Thanks for doing another round of checking. I'm still a bit concerned about some of our TEST_VM tests, I know that there are tests that e.g. temporarily change the values of the flags in a way that would mess up a garbage collection. But let's leave that out of this patch, if there are additional problems with other tests then those can be solved in separate patches. This patch looks good, Reviewed.
Thanks, Erik From kim.barrett at oracle.com Tue Jul 10 17:21:19 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 10 Jul 2018 13:21:19 -0400 Subject: RFR[JDK11]: 8204691: HeapRegion.apply_to_marked_objects_other_vm_test fails with assert(!hr->is_free() || hr->is_empty()) failed: Free region 0 is not empty for set Free list # In-Reply-To: References: <7BB45268-F475-41CA-BB2D-DFB2FE9AE6E4@oracle.com> <2aa63d5e-d148-219c-2e77-928fd0964a64@oracle.com> <92BE79FB-A982-46AB-80D7-61B459D88396@oracle.com> Message-ID: <7F9CFD8E-566B-483B-BE72-54C8524E4680@oracle.com> > On Jul 10, 2018, at 10:34 AM, Erik Helin wrote: > > On 07/09/2018 09:51 PM, Kim Barrett wrote: >>> On Jul 9, 2018, at 11:49 AM, Erik Helin wrote: >>> >>> [...] >>> Hmmm, it is (very) unfortunate if we have native code in JVM allocating Java objects and triggering garbage collections _concurrently_ with the unit tests being run (there shouldn't be any Java code running when the unit tests are executing). I understand that we have to restore the top pointer in case there is some verification for example when the JVM exits (or if we assert in a destructor etc), but do we really need to run the test in a safepoint? There is nothing wrong with running the test in a safepoint, but it seems to me that we then would have to run almost all TEST_VM tests in a safepoint? >>> >>> Thanks, >>> Erik >> I don't think it's quite *that* bad.[...] > > Thanks for doing another round of checking. I'm still a bit concerned about some of our TEST_VM tests, I know that there are tests that e.g. temporarily change the values of the flags in a way that would mess up a garbage collection. But let's leave that out of this patch, if there are additional problems with other tests then those can be solved in separate patches. Yes. > This patch looks good, Reviewed. Thanks.
> > Thanks, > Erik From fairoz.matte at oracle.com Wed Jul 11 13:14:06 2018 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Wed, 11 Jul 2018 06:14:06 -0700 (PDT) Subject: [8u-backport] RFR: JDK-8114823: G1 doesn't honor request to disable class unloading Message-ID: <6371e589-466c-4b8e-b974-1dee8d2446c4@default> Hi, Kindly review the backport of "JDK-8114823: G1 doesn't honor request to disable class unloading" to 8u Webrev - http://cr.openjdk.java.net/~fmatte/8114823/webrev.00/ JDK 9 bug - https://bugs.openjdk.java.net/browse/JDK-8114823 JDK 9 changeset - http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/53a14fe65414 Review thread - http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2016-June/018298.html Thanks, Fairoz From boris.ulasevich at bell-sw.com Wed Jul 11 16:17:26 2018 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Wed, 11 Jul 2018 19:17:26 +0300 Subject: [11] RFR(XS) 8207044: minimal vm build fail: missing #include Message-ID: <80f472dd-6d16-0714-5f43-1b3d9512888a@bell-sw.com> Hi all, Please review the following patch: http://cr.openjdk.java.net/~bulasevich/8207044/webrev.01 https://bugs.openjdk.java.net/browse/JDK-8207044 thanks, Boris From zgu at redhat.com Wed Jul 11 16:28:15 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 11 Jul 2018 12:28:15 -0400 Subject: [12] RFR(XS) 8207056: Epsilon GC to support object pinning Message-ID: <8ffa03a3-de73-bafe-2664-3aa8a170db2d@redhat.com> Please review this small change to support object pinning in Epsilon GC. Pinning object in Epsilon GC is no-op, so it is simpler than doing GCLock dance.
Bug: https://bugs.openjdk.java.net/browse/JDK-8207056 Webrev: http://cr.openjdk.java.net/~zgu/8207056/webrev.00/ Test: gc/epsilon on Linux 64 (fastdebug + release) Thanks, -Zhengyu From shade at redhat.com Wed Jul 11 16:30:47 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 11 Jul 2018 18:30:47 +0200 Subject: [12] RFR(XS) 8207056: Epsilon GC to support object pinning In-Reply-To: <8ffa03a3-de73-bafe-2664-3aa8a170db2d@redhat.com> References: <8ffa03a3-de73-bafe-2664-3aa8a170db2d@redhat.com> Message-ID: On 07/11/2018 06:28 PM, Zhengyu Gu wrote: > Please review this small change to support object pinning in Epsilon GC. > > Pinning object in Epsilon GC is no-op, so it is simpler than doing GCLock dance. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8207056 > Webrev: http://cr.openjdk.java.net/~zgu/8207056/webrev.00/ This looks good to me, thanks! Make the comment more verbose: "Object pinning support: every object is implicitly pinned" -Aleksey From shade at redhat.com Wed Jul 11 16:34:59 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 11 Jul 2018 18:34:59 +0200 Subject: [11] RFR(XS) 8207044: minimal vm build fail: missing #include In-Reply-To: <80f472dd-6d16-0714-5f43-1b3d9512888a@bell-sw.com> References: <80f472dd-6d16-0714-5f43-1b3d9512888a@bell-sw.com> Message-ID: <74059557-9104-aa43-737f-023fd35ae052@redhat.com> On 07/11/2018 06:17 PM, Boris Ulasevich wrote: > Please review the following patch: > http://cr.openjdk.java.net/~bulasevich/8207044/webrev.01 > https://bugs.openjdk.java.net/browse/JDK-8207044 Looks good and trivial to me. -Aleksey
From zgu at redhat.com Wed Jul 11 16:37:45 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 11 Jul 2018 12:37:45 -0400 Subject: [12] RFR(XS) 8207056: Epsilon GC to support object pinning In-Reply-To: References: <8ffa03a3-de73-bafe-2664-3aa8a170db2d@redhat.com> Message-ID: <0a24a175-7b33-84bd-50f4-049ee9c22a8b@redhat.com> On 07/11/2018 12:30 PM, Aleksey Shipilev wrote: > On 07/11/2018 06:28 PM, Zhengyu Gu wrote: >> Please review this small change to support object pinning in Epsilon GC. >> >> Pinning object in Epsilon GC is no-op, so it is simpler than doing GCLock dance. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8207056 >> Webrev: http://cr.openjdk.java.net/~zgu/8207056/webrev.00/ > > This looks good to me, thanks! > > Make the comment more verbose: "Object pinning support: every object is implicitly pinned" Thanks for the review. Updated: http://cr.openjdk.java.net/~zgu/8207056/webrev.01/ -Zhengyu > > -Aleksey > From shade at redhat.com Wed Jul 11 16:44:57 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 11 Jul 2018 18:44:57 +0200 Subject: [12] RFR(XS) 8207056: Epsilon GC to support object pinning In-Reply-To: <0a24a175-7b33-84bd-50f4-049ee9c22a8b@redhat.com> References: <8ffa03a3-de73-bafe-2664-3aa8a170db2d@redhat.com> <0a24a175-7b33-84bd-50f4-049ee9c22a8b@redhat.com> Message-ID: <03e59567-be00-70be-c4de-ea0fb5818b7d@redhat.com> On 07/11/2018 06:37 PM, Zhengyu Gu wrote: > Updated: http://cr.openjdk.java.net/~zgu/8207056/webrev.01/ Looks good. This is actually a very simple patch, and it avoids going to GCLocker needlessly. Maybe we should push it to 11, as "Late Enhancement": http://openjdk.java.net/jeps/3#Late-Enhancement-Request-Process -Aleksey
From kim.barrett at oracle.com Wed Jul 11 17:14:02 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 11 Jul 2018 13:14:02 -0400 Subject: [11] RFR(XS) 8207044: minimal vm build fail: missing #include In-Reply-To: <80f472dd-6d16-0714-5f43-1b3d9512888a@bell-sw.com> References: <80f472dd-6d16-0714-5f43-1b3d9512888a@bell-sw.com> Message-ID: > On Jul 11, 2018, at 12:17 PM, Boris Ulasevich wrote: > > Hi all, > > Please review the following patch: > http://cr.openjdk.java.net/~bulasevich/8207044/webrev.01 > https://bugs.openjdk.java.net/browse/JDK-8207044 > > thanks, > Boris Looks good. I think you are not yet a committer? If so, I can push it for you. From rkennke at redhat.com Wed Jul 11 17:18:31 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Jul 2018 19:18:31 +0200 Subject: [12] RFR(XS) 8207056: Epsilon GC to support object pinning In-Reply-To: <8ffa03a3-de73-bafe-2664-3aa8a170db2d@redhat.com> References: <8ffa03a3-de73-bafe-2664-3aa8a170db2d@redhat.com> Message-ID: <69dfdff9-6e8b-6e11-6e94-b42ca45dc93e@redhat.com> Am 11.07.2018 um 18:28 schrieb Zhengyu Gu: > Please review this small change to support object pinning in Epsilon GC. > > Pinning object in Epsilon GC is no-op, so it is simpler than doing > GCLock dance. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8207056 > Webrev: http://cr.openjdk.java.net/~zgu/8207056/webrev.00/ > > Test: > > gc/epsilon on Linux 64 (fastdebug + release) > > Thanks, Looks good! Roman
From zgu at redhat.com Wed Jul 11 17:19:33 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 11 Jul 2018 13:19:33 -0400 Subject: [12] RFR(XS) 8207056: Epsilon GC to support object pinning In-Reply-To: <03e59567-be00-70be-c4de-ea0fb5818b7d@redhat.com> References: <8ffa03a3-de73-bafe-2664-3aa8a170db2d@redhat.com> <0a24a175-7b33-84bd-50f4-049ee9c22a8b@redhat.com> <03e59567-be00-70be-c4de-ea0fb5818b7d@redhat.com> Message-ID: <93f57b73-a5e1-bed4-19b6-0c1460f86a8e@redhat.com> On 07/11/2018 12:44 PM, Aleksey Shipilev wrote: > On 07/11/2018 06:37 PM, Zhengyu Gu wrote: >> Updated: http://cr.openjdk.java.net/~zgu/8207056/webrev.01/ > > Looks good. This is actually a very simple patch, and it avoids going to GCLocker needlessly. Maybe > we should push it to 11, as "Late Enhancement": > http://openjdk.java.net/jeps/3#Late-Enhancement-Request-Process Done. -Zhengyu > > -Aleksey > From zgu at redhat.com Wed Jul 11 17:21:48 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 11 Jul 2018 13:21:48 -0400 Subject: [12] RFR(XS) 8207056: Epsilon GC to support object pinning In-Reply-To: <69dfdff9-6e8b-6e11-6e94-b42ca45dc93e@redhat.com> References: <8ffa03a3-de73-bafe-2664-3aa8a170db2d@redhat.com> <69dfdff9-6e8b-6e11-6e94-b42ca45dc93e@redhat.com> Message-ID: <8cebb738-fd55-19a4-0079-741794b62a28@redhat.com> Thanks, Roman. -Zhengyu On 07/11/2018 01:18 PM, Roman Kennke wrote: > Am 11.07.2018 um 18:28 schrieb Zhengyu Gu: >> Please review this small change to support object pinning in Epsilon GC. >> >> Pinning object in Epsilon GC is no-op, so it is simpler than doing >> GCLock dance. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8207056 >> Webrev: http://cr.openjdk.java.net/~zgu/8207056/webrev.00/ >> >> Test: >> >> gc/epsilon on Linux 64 (fastdebug + release) >> >> Thanks, > > > Looks good!
> > Roman > > From boris.ulasevich at bell-sw.com Wed Jul 11 18:13:02 2018 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Wed, 11 Jul 2018 21:13:02 +0300 Subject: [11] RFR(XS) 8207044: minimal vm build fail: missing #include In-Reply-To: References: <80f472dd-6d16-0714-5f43-1b3d9512888a@bell-sw.com> Message-ID: Yes, push it please. Thank you! 11.07.2018 20:14, Kim Barrett wrote: >> On Jul 11, 2018, at 12:17 PM, Boris Ulasevich wrote: >> >> Hi all, >> >> Please review the following patch: >> http://cr.openjdk.java.net/~bulasevich/8207044/webrev.01 >> https://bugs.openjdk.java.net/browse/JDK-8207044 >> >> thanks, >> Boris > Looks good. > > I think you are not yet a committer? If so, I can push it for you. From vladimir.kozlov at oracle.com Thu Jul 12 21:28:51 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 12 Jul 2018 14:28:51 -0700 Subject: [11] RFR(S) 8207069: [AOT] we should check that VM uses the same GC as one used for AOT library generation. Message-ID: Including GC group since I added new method to GCConfig. http://cr.openjdk.java.net/~kvn/8207069/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8207069 Recent Graal's changes [1] added list of GC [2] which matches Hotspot GC list [3]. I used that to fix this issue by storing enum value from Graal in AOT config header and compare it with selected GC when AOT library is loaded into Hotspot. I verified the fix with all GCs combination when compiling AOT lib and using it. I also ran our hs-tier1-3 testing which includes AOT and Graal tests. These changes are for JDK 11 so I don't need to go through Graal PR now but I would need to do that for JDK 12 to make changes in AOT code.
Thanks, Vladimir [1] https://bugs.openjdk.java.net/browse/JDK-8205824 "[GR-10514] Use whitelist for GCs supported by Graal" [2] http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/HotSpotGraalRuntime.java#l137 [3] http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/hotspot/share/gc/shared/collectedHeap.hpp#l173 From erik.helin at oracle.com Fri Jul 13 08:18:19 2018 From: erik.helin at oracle.com (Erik Helin) Date: Fri, 13 Jul 2018 10:18:19 +0200 Subject: committed > max in MemoryMXBean#getHeapMemoryUsage() In-Reply-To: References: Message-ID: Hi Daniel, thanks for letting us know. Since you have only set -Xms512M and -Xmx512M and you are running on JDK 10 that means you are using the G1 garbage collector, so all the calls to pool->get_memory_usage() in the loop will end up in g1MemoryPool.cpp [0] which in turn will return cached values from the recalculate_sizes code in G1MonitoringSupport [1]. Since you are running with -Xmx512m you should have gotten 1 MB sized regions (see heapRegion.cpp for details [2]), so the 5 MB _could_ mean that five regions were accounted wrongly. Do you have any kind of GC logging from the test run where you encountered the bug? The code in G1MonitoringSupport::recalculate_sizes seems messy enough that there could be a small bug in there. I'm adding hotspot-gc-dev since all GC developers might not read serviceability-dev.
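For illustration only, the region arithmetic above can be checked with a small standalone Java sketch. It is not taken from the HotSpot sources: the class name CommittedVsMax and its two helper methods are made up for this example, and the 1 MB region size is the assumption stated in the mail. The only real API used is the public java.lang.management.MemoryUsage constructor, which is documented to throw IllegalArgumentException when committed is greater than a defined max.

```java
import java.lang.management.MemoryUsage;

public class CommittedVsMax {
    // Number of whole regions by which "committed" exceeds "max".
    // regionSize is an assumption (1 MB for a 512 MB G1 heap, per the mail).
    static long excessRegions(long committed, long max, long regionSize) {
        return (committed - max) / regionSize;
    }

    // Returns true if the MemoryUsage constructor rejects the given values.
    static boolean rejected(long init, long used, long committed, long max) {
        try {
            new MemoryUsage(init, used, committed, max);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        long max = 512L * 1024 * 1024;  // -Xmx512M = 536870912 bytes
        long committed = 542113792L;    // value reported in the exception
        long regionSize = 1024L * 1024; // assumed G1 region size of 1 MB

        System.out.println(excessRegions(committed, max, regionSize)); // prints 5
        System.out.println(rejected(max, 0L, committed, max));         // prints true
    }
}
```

The difference of 5242880 bytes is exactly five 1 MB regions, which is consistent with (but does not prove) the guess that whole regions were miscounted.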
Thanks, Erik [0]: http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/gc/g1/g1MemoryPool.cpp [1]: http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/gc/g1/g1MonitoringSupport.cpp#l182 [2]: http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/share/gc/g1/heapRegion.cpp#l63 On 07/12/2018 03:35 PM, Daniel Mitterdorfer wrote: > Hi, > > while working on a change in Elasticsearch, I discovered an interesting > situation related to the implementation of jmm_getMemoryUsage (see > [jdk-mem-usage]). In one of the test runs, a test failed with the following > exception: > > java.lang.IllegalArgumentException: committed = 542113792 should be < > max = 536870912 > at java.lang.management.MemoryUsage.<init>(MemoryUsage.java:166) > at sun.management.MemoryImpl.getMemoryUsage0(Native Method) > at sun.management.MemoryImpl.getHeapMemoryUsage(MemoryImpl.java:71) > at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.currentMemoryUsage(HierarchyCircuitBreakerService.java:246) > [...] > > This happened on MacOS 10.12.6 with JDK 10 (build 10.0.1+10). The only JVM flags > specified were -Xms512M -Xmx512M. So far this failure occurred only once and I > could not reproduce it yet. > > The values reported in the exception message are: > > * "max": 536870912 = 512MB (exactly) > * "committed": 542113792 = 517MB (exactly), i.e. 5MB more than "max". > > As the value of "max" is exactly what we have specified with -Xmx it seems to > indicate that the problem is the calculation of "committed". I do not > understand under which conditions this can happen thus I post this to the > mailing list in case anybody has ideas what might cause this.
> > I plan to run further tests with JVM trace logging enabled > (-Xlog:gc*=trace,heap*=trace,tlab*=off:stdout:time,pid,tid,level,tags to be > precise) in the hope that this problem will occur again and I can provide logs > that help to debug / fix the problem. > > Searching for that error message, there is [JDK-8020530] but that one is about > *non-heap* memory usage and has already been resolved a while ago. Several > sources (e.g. [apache-ignite-workaround] or [netbeans-bug]) seem to indicate > that this problem happened indeed in the wild but what I find odd is that I > could not find a single ticket in the OpenJDK bug tracker or a discussion on a > JDK mailing list about this problem. > > I'd be glad to get any pointers on what might cause this or requests for > additional info that I need to provide to help analyze this problem. > > Thanks, > Daniel > > [jdk-mem-usage] > http://hg.openjdk.java.net/jdk-updates/jdk10u/file/142f0ed9ff5b/src/hotspot/share/services/management.cpp#l728 > [JDK-8020530] https://bugs.openjdk.java.net/browse/JDK-8020530 > [apache-ignite-workaround] > https://github.com/apache/ignite/blob/df4fd65a32/modules/core/src/main/java/org/apache/ignite/internal/managers/discovery/GridDiscoveryManager.java#L336-L346 > [netbeans-bug] https://netbeans.org/bugzilla/show_bug.cgi?id=194733 > From daniel.mitterdorfer at gmail.com Fri Jul 13 08:30:17 2018 From: daniel.mitterdorfer at gmail.com (Daniel Mitterdorfer) Date: Fri, 13 Jul 2018 10:30:17 +0200 Subject: committed > max in MemoryMXBean#getHeapMemoryUsage() In-Reply-To: References:

Message-ID: Hi Erik, > > Do you have any kind of GC logging from the test run where you encountered > the bug? Unfortunately, we don't have GC logging enabled by default in our test suite so the exception trace is all I got. I am now repeatedly running the test suite with the original flags (-Xms512M -Xmx512M) and also added the following logging configuration: -Xlog:gc*=trace,heap*=trace,tlab*=off:stdout:time,pid,tid,level,tags As soon as I get another failure, I'll provide the full log file. Please let me know if you need any other logs (i.e. whether I should adjust my log configuration). Daniel From thomas.schatzl at oracle.com Fri Jul 13 08:33:33 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 13 Jul 2018 10:33:33 +0200 Subject: committed > max in MemoryMXBean#getHeapMemoryUsage() In-Reply-To: References:

Message-ID: On Fri, 2018-07-13 at 10:30 +0200, Daniel Mitterdorfer wrote: > Hi Erik, > > > > Do you have any kind of GC logging from the test run where you > > encountered the bug? > > Unfortunately, we don't have GC logging enabled by default in our > test suite so the exception trace is all I got. I am now repeatedly > running the test suite with the original flags (-Xms512M -Xmx512M) > and also added the following logging configuration: > > -Xlog:gc*=trace,heap*=trace,tlab*=off:stdout:time,pid,tid,level,tags > > As soon as I get another failure, I'll provide the full log file. > Please let me know if you need any other logs (i.e. whether I should > adjust my log configuration). I think these flags are fine. Since Erik and I strongly believe the issue is with the relevant G1 code Erik mentioned we will reassign the bug to us (he said there is already a bug reported on it). Thanks a lot, Thomas From daniel.mitterdorfer at gmail.com Fri Jul 13 14:10:37 2018 From: daniel.mitterdorfer at gmail.com (Daniel Mitterdorfer) Date: Fri, 13 Jul 2018 16:10:37 +0200 Subject: committed > max in MemoryMXBean#getHeapMemoryUsage() In-Reply-To: References:

Message-ID: Hi, I have good news. I was able to reproduce this issue but this time I have logs. A test failed with the following stack trace around 15:06:55 with: java.lang.IllegalArgumentException: committed = 537919488 should be < max = 536870912 > at java.lang.management.MemoryUsage.<init>(MemoryUsage.java:166) > at sun.management.MemoryImpl.getMemoryUsage0(Native Method) > at sun.management.MemoryImpl.getHeapMemoryUsage(MemoryImpl.java:71) > at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.currentMemoryUsage(HierarchyCircuitBreakerService.java:242) This time it happened on Linux (kernel 4.13.0-45-generic) with JDK 10 (build 10+46). The JVM arguments were: -Xms512M -Xmx512M -Xlog:gc*=trace,heap*=trace,tlab*=off:stdout:time,pid,tid,level,tags The logs are somewhat massive (~250MB uncompressed) and available at https://www.dropbox.com/s/wir9sv1dk5cf54y/JDK-8207200_test_log.txt.zip?dl=0 I hope that helps identifying the cause. Please let me know if you need anything else. Daniel Am Fr., 13. Juli 2018 um 10:33 Uhr schrieb Thomas Schatzl : > > On Fri, 2018-07-13 at 10:30 +0200, Daniel Mitterdorfer wrote: > > Hi Erik, > > > > > > Do you have any kind of GC logging from the test run where you > > > encountered the bug? > > > > Unfortunately, we don't have GC logging enabled by default in our > > test suite so the exception trace is all I got. I am now repeatedly > > running the test suite with the original flags (-Xms512M -Xmx512M) > > and also added the following logging configuration: > > > > -Xlog:gc*=trace,heap*=trace,tlab*=off:stdout:time,pid,tid,level,tags > > > > As soon as I get another failure, I'll provide the full log file. > > Please let me know if you need any other logs (i.e. whether I should > > adjust my log configuration). > > I think these flags are fine. > > Since Erik and I strongly believe the issue is with the relevant G1 > code Erik mentioned we will reassign the bug to us (he said there is > already a bug reported on it).
> > Thanks a lot, > Thomas > From goetz.lindenmaier at sap.com Tue Jul 17 09:49:24 2018 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 17 Jul 2018 09:49:24 +0000 Subject: Test IncompatibleOptions.java is failing if ZGC is not compiled Message-ID: <5831c166cd3a4cdebcf2124d01a8c92c@sap.com> Hi, did anybody notice that the test runtime/appcds/sharedStrings/IncompatibleOptions.java is failing if openJdk is compiled the default way, i.e., with INCLUDE_ZGC=0? Best regards, Goetz. [STDOUT] Error occurred during initialization of VM Option -XX:+UseZGC not supported ----------System.err:(23/1362)---------- stdout: [Error occurred during initialization of VM Option -XX:+UseZGC not supported ]; stderr: [] exitValue = 1 java.lang.RuntimeException: 'Cannot dump shared archive when UseCompressedOops or UseCompressedClassPointers is off' missing from stdout/stderr From cthalinger at twitter.com Wed Jul 18 00:17:21 2018 From: cthalinger at twitter.com (Christian Thalinger) Date: Tue, 17 Jul 2018 20:17:21 -0400 Subject: [11] RFR(S) 8207069: [AOT] we should check that VM uses the same GC as one used for AOT library generation. In-Reply-To: References: Message-ID: <6B58B3BC-C763-4799-98A8-16B68B0949E0@twitter.com> > On Jul 12, 2018, at 5:28 PM, Vladimir Kozlov wrote: > > Including GC group since I added new method to GCConfig. > > http://cr.openjdk.java.net/~kvn/8207069/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8207069 > > Recent Graal's changes [1] added list of GC [2] which matches Hotspot GC list [3]. > I used that to fix this issue by storing enum value from Graal in AOT config header and compare it with selected GC when AOT library is loaded into Hotspot. The fix is correct but too strict. For example, Serial and Parallel GC can use the same AOT library. CMS too. > > I verified the fix with all GCs combination when compiling AOT lib and using it. I also ran our hs-tier1-3 testing which includes AOT and Graal tests. 
> > These changes are for JDK 11 so I don't need to go through Graal PR now but I would need to do that for JDK 12 to make changes in AOT code. > > Thanks, > Vladimir > > [1] https://bugs.openjdk.java.net/browse/JDK-8205824 > "[GR-10514] Use whitelist for GCs supported by Graal" > [2] http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/HotSpotGraalRuntime.java#l137 > [3] http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/hotspot/share/gc/shared/collectedHeap.hpp#l173 From vladimir.kozlov at oracle.com Wed Jul 18 04:14:32 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Jul 2018 21:14:32 -0700 Subject: [11] RFR(S) 8207069: [AOT] we should check that VM uses the same GC as one used for AOT library generation. In-Reply-To: <6B58B3BC-C763-4799-98A8-16B68B0949E0@twitter.com> References: <6B58B3BC-C763-4799-98A8-16B68B0949E0@twitter.com> Message-ID: On 7/17/18 5:17 PM, Christian Thalinger wrote: > > >> On Jul 12, 2018, at 5:28 PM, Vladimir Kozlov wrote: >> >> Including GC group since I added new method to GCConfig. >> >> http://cr.openjdk.java.net/~kvn/8207069/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8207069 >> >> Recent Graal's changes [1] added list of GC [2] which matches Hotspot GC list [3]. >> I used that to fix this issue by storing enum value from Graal in AOT config header and compare it with selected GC when AOT library is loaded into Hotspot. > > The fix is correct but too strict. For example, Serial and Parallel GC can use the same AOT library. CMS too. Do you have other suggestions how to check compatibility? Thanks, Vladimir > >> >> I verified the fix with all GCs combination when compiling AOT lib and using it. I also ran our hs-tier1-3 testing which includes AOT and Graal tests. 
>> >> These changes are for JDK 11 so I don't need to go through Graal PR now but I would need to do that for JDK 12 to make changes in AOT code. >> >> Thanks, >> Vladimir >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8205824 >> "[GR-10514] Use whitelist for GCs supported by Graal" >> [2] http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/HotSpotGraalRuntime.java#l137 >> [3] http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/hotspot/share/gc/shared/collectedHeap.hpp#l173 > From cthalinger at twitter.com Wed Jul 18 13:15:47 2018 From: cthalinger at twitter.com (Christian Thalinger) Date: Wed, 18 Jul 2018 09:15:47 -0400 Subject: [11] RFR(S) 8207069: [AOT] we should check that VM uses the same GC as one used for AOT library generation. In-Reply-To: References: <6B58B3BC-C763-4799-98A8-16B68B0949E0@twitter.com> Message-ID: <2B6BD07E-7D66-4B3E-8B81-8528582FCC47@twitter.com> > On Jul 18, 2018, at 12:14 AM, Vladimir Kozlov wrote: > > On 7/17/18 5:17 PM, Christian Thalinger wrote: >>> On Jul 12, 2018, at 5:28 PM, Vladimir Kozlov wrote: >>> >>> Including GC group since I added new method to GCConfig. >>> >>> http://cr.openjdk.java.net/~kvn/8207069/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8207069 >>> >>> Recent Graal's changes [1] added list of GC [2] which matches Hotspot GC list [3]. >>> I used that to fix this issue by storing enum value from Graal in AOT config header and compare it with selected GC when AOT library is loaded into Hotspot. >> The fix is correct but too strict. For example, Serial and Parallel GC can use the same AOT library. CMS too. > > Do you have other suggestions how to check compatibility? I think the best way would be to use BarrierSet::Name: // Do something for each concrete barrier set part of the build. 
#define FOR_EACH_CONCRETE_BARRIER_SET_DO(f) \ f(CardTableBarrierSet) \ EPSILONGC_ONLY(f(EpsilonBarrierSet)) \ G1GC_ONLY(f(G1BarrierSet)) \ ZGC_ONLY(f(ZBarrierSet)) > > Thanks, > Vladimir > >>> >>> I verified the fix with all GCs combination when compiling AOT lib and using it. I also ran our hs-tier1-3 testing which includes AOT and Graal tests. >>> >>> These changes are for JDK 11 so I don't need to go through Graal PR now but I would need to do that for JDK 12 to make changes in AOT code. >>> >>> Thanks, >>> Vladimir >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8205824 >>> "[GR-10514] Use whitelist for GCs supported by Graal" >>> [2] http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/HotSpotGraalRuntime.java#l137 >>> [3] http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/hotspot/share/gc/shared/collectedHeap.hpp#l173 -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Wed Jul 18 19:08:09 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 18 Jul 2018 12:08:09 -0700 Subject: [11] RFR(S) 8207069: [AOT] we should check that VM uses the same GC as one used for AOT library generation. In-Reply-To: <2B6BD07E-7D66-4B3E-8B81-8528582FCC47@twitter.com> References: <6B58B3BC-C763-4799-98A8-16B68B0949E0@twitter.com> <2B6BD07E-7D66-4B3E-8B81-8528582FCC47@twitter.com> Message-ID: On 7/18/18 6:15 AM, Christian Thalinger wrote: > > >> On Jul 18, 2018, at 12:14 AM, Vladimir Kozlov >> > wrote: >> >> On 7/17/18 5:17 PM, Christian Thalinger wrote: >>>> On Jul 12, 2018, at 5:28 PM, Vladimir Kozlov >>>> > wrote: >>>> >>>> Including GC group since I added new method to GCConfig. 
>>>>
>>>> http://cr.openjdk.java.net/~kvn/8207069/webrev.00/
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8207069
>>>>
>>>> Recent Graal's changes [1] added list of GC [2] which matches
>>>> Hotspot GC list [3].
>>>> I used that to fix this issue by storing enum value from Graal in
>>>> AOT config header and compare it with selected GC when AOT library
>>>> is loaded into Hotspot.
>>> The fix is correct but too strict. For example, Serial and Parallel
>>> GC can use the same AOT library. CMS too.
>>
>> Do you have other suggestions how to check compatibility?
>
> I think the best way would be to use BarrierSet::Name:
>
> // Do something for each concrete barrier set part of the build.
> #define FOR_EACH_CONCRETE_BARRIER_SET_DO(f)          \
>   f(CardTableBarrierSet)                             \
>   EPSILONGC_ONLY(f(EpsilonBarrierSet))               \
>   G1GC_ONLY(f(G1BarrierSet))                         \
>   ZGC_ONLY(f(ZBarrierSet))

Thank you, Chris, for the suggestion.

Recording the barrier set in the AOT library would require much more complex changes (JVMCI) that are not suitable for JDK 11. Currently Graal checks only GC flags. To get information about the barrier set it needs to access Hotspot's data.

The only simple way to relax the check is to derive the BarrierSet::Name value from CollectedHeap::Name and compare them in the AOT library config check code. But I can't find any functionality in the GC code to do that. I asked the GC group.

Note, we never intended to support mixing GCs that have the same type of barriers. It was accidental and I am not comfortable supporting such a "feature".

Vladimir

>
>>
>> Thanks,
>> Vladimir
>>
>>>>
>>>> I verified the fix with all GCs combination when compiling AOT lib
>>>> and using it. I also ran our hs-tier1-3 testing which includes AOT
>>>> and Graal tests.
>>>>
>>>> These changes are for JDK 11 so I don't need to go through Graal PR
>>>> now but I would need to do that for JDK 12 to make changes in AOT code.
>>>> >>>> Thanks, >>>> Vladimir >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8205824 >>>> ?"[GR-10514] Use whitelist for GCs supported by Graal" >>>> [2] >>>> http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/HotSpotGraalRuntime.java#l137 >>>> [3] >>>> http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/hotspot/share/gc/shared/collectedHeap.hpp#l173 > From cthalinger at twitter.com Wed Jul 18 21:25:40 2018 From: cthalinger at twitter.com (Christian Thalinger) Date: Wed, 18 Jul 2018 17:25:40 -0400 Subject: [11] RFR(S) 8207069: [AOT] we should check that VM uses the same GC as one used for AOT library generation. In-Reply-To: References: <6B58B3BC-C763-4799-98A8-16B68B0949E0@twitter.com> <2B6BD07E-7D66-4B3E-8B81-8528582FCC47@twitter.com> Message-ID: > On Jul 18, 2018, at 3:08 PM, Vladimir Kozlov wrote: > > On 7/18/18 6:15 AM, Christian Thalinger wrote: >>> On Jul 18, 2018, at 12:14 AM, Vladimir Kozlov >> wrote: >>> >>> On 7/17/18 5:17 PM, Christian Thalinger wrote: >>>>> On Jul 12, 2018, at 5:28 PM, Vladimir Kozlov >> wrote: >>>>> >>>>> Including GC group since I added new method to GCConfig. >>>>> >>>>> http://cr.openjdk.java.net/~kvn/8207069/webrev.00/ > >>>>> https://bugs.openjdk.java.net/browse/JDK-8207069 >>>>> >>>>> Recent Graal's changes [1] added list of GC [2] which matches Hotspot GC list [3]. >>>>> I used that to fix this issue by storing enum value from Graal in AOT config header and compare it with selected GC when AOT library is loaded into Hotspot. >>>> The fix is correct but too strict. For example, Serial and Parallel GC can use the same AOT library. CMS too. >>> >>> Do you have other suggestions how to check compatibility? >> I think the best way would be to use BarrierSet::Name: >> // Do something for each concrete barrier set part of the build. 
>> #define FOR_EACH_CONCRETE_BARRIER_SET_DO(f) \
>>   f(CardTableBarrierSet) \
>>   EPSILONGC_ONLY(f(EpsilonBarrierSet)) \
>>   G1GC_ONLY(f(G1BarrierSet)) \
>>   ZGC_ONLY(f(ZBarrierSet))
>
> Thank you, Chris, for suggestion.
>
> To record barrier set in AOT library would require a lot more complex changes (JVMCI) not suitable for JDK 11. Currently Graal checks only GC flags. To get information about barrier set it needs to access Hotspot's data.

Yeah, that's a problem.

>
> The only simple way to relax the check is to get BarrierSet::Name value based on CollectedHeap::Name and compare them in aot library config check code. But I can't find a functionality in GC code to do that. I asked GC group.
>
> Note, we never intended to support mixed GCs with the same type of barriers. It was accidental and I am not comfortable to support such "feature".

You mean that two different GCs use the same barrier set? Yes, I agree, it would be better if each had their own.

Do you want to push your current patch?

>
> Vladimir
>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>>
>>>>> I verified the fix with all GCs combination when compiling AOT lib and using it. I also ran our hs-tier1-3 testing which includes AOT and Graal tests.
>>>>>
>>>>> These changes are for JDK 11 so I don't need to go through Graal PR now but I would need to do that for JDK 12 to make changes in AOT code.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8205824
>>>>> "[GR-10514] Use whitelist for GCs supported by Graal"
>>>>> [2] http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/HotSpotGraalRuntime.java#l137
>>>>> [3] http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/hotspot/share/gc/shared/collectedHeap.hpp#l173

From vladimir.kozlov at oracle.com Wed Jul 18 21:51:15 2018
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 18 Jul 2018 14:51:15 -0700
Subject: [11] RFR(S) 8207069: [AOT] we should check that VM uses the same GC as one used for AOT library generation.
In-Reply-To:
References: <6B58B3BC-C763-4799-98A8-16B68B0949E0@twitter.com> <2B6BD07E-7D66-4B3E-8B81-8528582FCC47@twitter.com>

Message-ID: <12a13dff-7474-5c5f-4026-578cc4c40ae4@oracle.com> On 7/18/18 2:25 PM, Christian Thalinger wrote: > > >> On Jul 18, 2018, at 3:08 PM, Vladimir Kozlov >> > wrote: >> >> On 7/18/18 6:15 AM, Christian Thalinger wrote: >>>> On Jul 18, 2018, at 12:14 AM, Vladimir Kozlov >>>> >>> > >>>> wrote: >>>> >>>> On 7/17/18 5:17 PM, Christian Thalinger wrote: >>>>>> On Jul 12, 2018, at 5:28 PM, Vladimir Kozlov >>>>>> >>>>> > >>>>>> wrote: >>>>>> >>>>>> Including GC group since I added new method to GCConfig. >>>>>> >>>>>> http://cr.openjdk.java.net/~kvn/8207069/webrev.00/ >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8207069 >>>>>> >>>>>> Recent Graal's changes [1] added list of GC [2] which matches >>>>>> Hotspot GC list [3]. >>>>>> I used that to fix this issue by storing enum value from Graal in >>>>>> AOT config header and compare it with selected GC when AOT library >>>>>> is loaded into Hotspot. >>>>> The fix is correct but too strict. ?For example, Serial and >>>>> Parallel GC can use the same AOT library. CMS too. >>>> >>>> Do you have other suggestions how to check compatibility? >>> I think the best way would be to use BarrierSet::Name: >>> // Do something for each concrete barrier set part of the build. >>> #define FOR_EACH_CONCRETE_BARRIER_SET_DO(f)? ? ? ? ? \ >>> f(CardTableBarrierSet) ? ? ? ? ? ? ? ? ? ? ? ? ? ? \ >>> EPSILONGC_ONLY(f(EpsilonBarrierSet)) ? ? ? ? ? ? ? \ >>> G1GC_ONLY(f(G1BarrierSet)) ? ? ? ? ? ? ? ? ? ? ? ? \ >>> ZGC_ONLY(f(ZBarrierSet)) >> >> Thank you, Chris, for suggestion. >> >> To record barrier set in AOT library would require a lot more complex >> changes (JVMCI) not suitable for JDK 11. ?Currently Graal checks only >> GC flags. To get information about barrier set it needs to access >> Hotspot's data. > > Yeah, that?s a problem. > >> >> The only simple way to relax the check is to get BarrierSet::Name >> value based on CollectedHeap::Name and compare them in aot library >> config check code. 
But I can't find a functionality in GC code to do >> that. I asked GC group. I got answer from GC that they don't have such mapping and don't think it is needed. >> >> Note, we never intended to support mixed GCs with the same type of >> barriers. It was accidental and I am not comfortable to support such >> "feature?. > > You mean that two different GCs use the same barrier set? ?Yes, I agree, > it would be better if each had their own. > > Do you want to push your current patch? Yes. I am waiting PR review from Labs since AOT code (jaotc) is now there. Thanks, Vladimir > >> >> Vladimir >> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>>> >>>>>> I verified the fix with all GCs combination when compiling AOT lib >>>>>> and using it. I also ran our hs-tier1-3 testing which includes AOT >>>>>> and Graal tests. >>>>>> >>>>>> These changes are for JDK 11 so I don't need to go through Graal >>>>>> PR now but I would need to do that for JDK 12 to make changes in >>>>>> AOT code. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8205824 >>>>>> ?"[GR-10514] Use whitelist for GCs supported by Graal" >>>>>> [2] >>>>>> http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/HotSpotGraalRuntime.java#l137 >>>>>> [3] >>>>>> http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/hotspot/share/gc/shared/collectedHeap.hpp#l173 > From cthalinger at twitter.com Wed Jul 18 23:56:03 2018 From: cthalinger at twitter.com (Christian Thalinger) Date: Wed, 18 Jul 2018 19:56:03 -0400 Subject: [11] RFR(S) 8207069: [AOT] we should check that VM uses the same GC as one used for AOT library generation. In-Reply-To: <12a13dff-7474-5c5f-4026-578cc4c40ae4@oracle.com> References: <6B58B3BC-C763-4799-98A8-16B68B0949E0@twitter.com> <2B6BD07E-7D66-4B3E-8B81-8528582FCC47@twitter.com>

<12a13dff-7474-5c5f-4026-578cc4c40ae4@oracle.com> Message-ID: <679F74F0-473D-4BA5-A651-8C45E1F99686@twitter.com> > On Jul 18, 2018, at 5:51 PM, Vladimir Kozlov wrote: > > On 7/18/18 2:25 PM, Christian Thalinger wrote: >>> On Jul 18, 2018, at 3:08 PM, Vladimir Kozlov >> wrote: >>> >>> On 7/18/18 6:15 AM, Christian Thalinger wrote: >>>>> On Jul 18, 2018, at 12:14 AM, Vladimir Kozlov >>> wrote: >>>>> >>>>> On 7/17/18 5:17 PM, Christian Thalinger wrote: >>>>>>> On Jul 12, 2018, at 5:28 PM, Vladimir Kozlov >>> wrote: >>>>>>> >>>>>>> Including GC group since I added new method to GCConfig. >>>>>>> >>>>>>> http://cr.openjdk.java.net/~kvn/8207069/webrev.00/ >> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207069 >>>>>>> >>>>>>> Recent Graal's changes [1] added list of GC [2] which matches Hotspot GC list [3]. >>>>>>> I used that to fix this issue by storing enum value from Graal in AOT config header and compare it with selected GC when AOT library is loaded into Hotspot. >>>>>> The fix is correct but too strict. For example, Serial and Parallel GC can use the same AOT library. CMS too. >>>>> >>>>> Do you have other suggestions how to check compatibility? >>>> I think the best way would be to use BarrierSet::Name: >>>> // Do something for each concrete barrier set part of the build. >>>> #define FOR_EACH_CONCRETE_BARRIER_SET_DO(f) \ >>>> f(CardTableBarrierSet) \ >>>> EPSILONGC_ONLY(f(EpsilonBarrierSet)) \ >>>> G1GC_ONLY(f(G1BarrierSet)) \ >>>> ZGC_ONLY(f(ZBarrierSet)) >>> >>> Thank you, Chris, for suggestion. >>> >>> To record barrier set in AOT library would require a lot more complex changes (JVMCI) not suitable for JDK 11. Currently Graal checks only GC flags. To get information about barrier set it needs to access Hotspot's data. >> Yeah, that?s a problem. >>> >>> The only simple way to relax the check is to get BarrierSet::Name value based on CollectedHeap::Name and compare them in aot library config check code. 
But I can't find a functionality in GC code to do that. I asked GC group. > > I got answer from GC that they don't have such mapping and don't think it is needed. > >>> >>> Note, we never intended to support mixed GCs with the same type of barriers. It was accidental and I am not comfortable to support such "feature?. >> You mean that two different GCs use the same barrier set? Yes, I agree, it would be better if each had their own. >> Do you want to push your current patch? > > Yes. I am waiting PR review from Labs since AOT code (jaotc) is now there. Sounds good. You can use me as a Reviewer, if needed. > > Thanks, > Vladimir > >>> >>> Vladimir >>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>>>> >>>>>>> I verified the fix with all GCs combination when compiling AOT lib and using it. I also ran our hs-tier1-3 testing which includes AOT and Graal tests. >>>>>>> >>>>>>> These changes are for JDK 11 so I don't need to go through Graal PR now but I would need to do that for JDK 12 to make changes in AOT code. >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8205824 >>>>>>> "[GR-10514] Use whitelist for GCs supported by Graal" >>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/HotSpotGraalRuntime.java#l137 >>>>>>> [3] http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/hotspot/share/gc/shared/collectedHeap.hpp#l173 -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Thu Jul 19 00:42:16 2018 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 18 Jul 2018 17:42:16 -0700 Subject: [11] RFR(S) 8207069: [AOT] we should check that VM uses the same GC as one used for AOT library generation. 
In-Reply-To: <679F74F0-473D-4BA5-A651-8C45E1F99686@twitter.com> References: <6B58B3BC-C763-4799-98A8-16B68B0949E0@twitter.com> <2B6BD07E-7D66-4B3E-8B81-8528582FCC47@twitter.com>

<12a13dff-7474-5c5f-4026-578cc4c40ae4@oracle.com> <679F74F0-473D-4BA5-A651-8C45E1F99686@twitter.com> Message-ID: <54f66d48-9d47-ca71-d06a-95b127da3146@oracle.com> Thank you, Chris Vladimir On 7/18/18 4:56 PM, Christian Thalinger wrote: > > >> On Jul 18, 2018, at 5:51 PM, Vladimir Kozlov >> > wrote: >> >> On 7/18/18 2:25 PM, Christian Thalinger wrote: >>>> On Jul 18, 2018, at 3:08 PM, Vladimir Kozlov >>>> >>> > >>>> wrote: >>>> >>>> On 7/18/18 6:15 AM, Christian Thalinger wrote: >>>>>> On Jul 18, 2018, at 12:14 AM, Vladimir Kozlov >>>>>> >>>>> > >>>>>> wrote: >>>>>> >>>>>> On 7/17/18 5:17 PM, Christian Thalinger wrote: >>>>>>>> On Jul 12, 2018, at 5:28 PM, Vladimir Kozlov >>>>>>>> >>>>>>> > >>>>>>>> wrote: >>>>>>>> >>>>>>>> Including GC group since I added new method to GCConfig. >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~kvn/8207069/webrev.00/ >>>>>>>> >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8207069 >>>>>>>> >>>>>>>> Recent Graal's changes [1] added list of GC [2] which matches >>>>>>>> Hotspot GC list [3]. >>>>>>>> I used that to fix this issue by storing enum value from Graal >>>>>>>> in AOT config header and compare it with selected GC when AOT >>>>>>>> library is loaded into Hotspot. >>>>>>> The fix is correct but too strict. ?For example, Serial and >>>>>>> Parallel GC can use the same AOT library. CMS too. >>>>>> >>>>>> Do you have other suggestions how to check compatibility? >>>>> I think the best way would be to use BarrierSet::Name: >>>>> // Do something for each concrete barrier set part of the build. >>>>> #define FOR_EACH_CONCRETE_BARRIER_SET_DO(f)? ? ? ? ? \ >>>>> f(CardTableBarrierSet) ? ? ? ? ? ? ? ? ? ? ? ? ? ? \ >>>>> EPSILONGC_ONLY(f(EpsilonBarrierSet)) ? ? ? ? ? ? ? \ >>>>> G1GC_ONLY(f(G1BarrierSet)) ? ? ? ? ? ? ? ? ? ? ? ? \ >>>>> ZGC_ONLY(f(ZBarrierSet)) >>>> >>>> Thank you, Chris, for suggestion. >>>> >>>> To record barrier set in AOT library would require a lot more >>>> complex changes (JVMCI) not suitable for JDK 11. 
?Currently Graal >>>> checks only GC flags. To get information about barrier set it needs >>>> to access Hotspot's data. >>> Yeah, that?s a problem. >>>> >>>> The only simple way to relax the check is to get BarrierSet::Name >>>> value based on CollectedHeap::Name and compare them in aot library >>>> config check code. But I can't find a functionality in GC code to do >>>> that. I asked GC group. >> >> I got answer from GC that they don't have such mapping and don't think >> it is needed. >> >>>> >>>> Note, we never intended to support mixed GCs with the same type of >>>> barriers. It was accidental and I am not comfortable to support such >>>> "feature?. >>> You mean that two different GCs use the same barrier set? ?Yes, I >>> agree, it would be better if each had their own. >>> Do you want to push your current patch? >> >> Yes. I am waiting PR review from Labs since AOT code (jaotc) is now there. > > Sounds good. ?You can use me as a Reviewer, if needed. > >> >> Thanks, >> Vladimir >> >>>> >>>> Vladimir >>>> >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>>>> >>>>>>>> I verified the fix with all GCs combination when compiling AOT >>>>>>>> lib and using it. I also ran our hs-tier1-3 testing which >>>>>>>> includes AOT and Graal tests. >>>>>>>> >>>>>>>> These changes are for JDK 11 so I don't need to go through Graal >>>>>>>> PR now but I would need to do that for JDK 12 to make changes in >>>>>>>> AOT code. 
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8205824
>>>>>>>> "[GR-10514] Use whitelist for GCs supported by Graal"
>>>>>>>> [2]
>>>>>>>> http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot/src/org/graalvm/compiler/hotspot/HotSpotGraalRuntime.java#l137
>>>>>>>> [3]
>>>>>>>> http://hg.openjdk.java.net/jdk/jdk11/file/bf686c47c109/src/hotspot/share/gc/shared/collectedHeap.hpp#l173
>

From manc at google.com Thu Jul 19 01:53:19 2018
From: manc at google.com (Man Cao)
Date: Wed, 18 Jul 2018 18:53:19 -0700
Subject: Patch to inline os::SpinPause() for X86 on non-Windows OS
Message-ID:

Hello,

The Java platform team at Google has maintained a local patch to inline os::SpinPause() since 2014. We would like to upstream this patch to OpenJDK. Could someone sponsor this patch?

It is difficult to demonstrate a performance improvement in Java benchmarks. The patch is more of a code refactoring to make better use of modern GCC. It partly addresses the comment about inlining SpinPause() above its declaration in os.hpp.

I found an interesting discussion about PAUSE and a microbenchmark in:
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2012-August/004352.html

However, the microbenchmark has a large variance in our experiments, making it difficult to tell if there's any benefit from inlining PAUSE. Inlining PAUSE does seem to reduce the variance a bit.
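To make the intent concrete, here is a minimal sketch of a spin-wait loop built on an inlinable pause hint, assuming GCC or Clang on x86. The names `spin_pause` and `spin_until_set` are hypothetical and used only for illustration; they are not HotSpot functions, and the non-x86 fallback is an assumption of this sketch, not part of the patch.

```cpp
#include <atomic>

// Hypothetical stand-in for an inlined SpinPause(); not HotSpot's definition.
static inline int spin_pause() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __asm__ __volatile__("pause");  // tell the CPU this is a spin-wait loop
  return 1;                       // a pause hint was issued
#else
  return 0;                       // no pause hint available in this sketch
#endif
}

// Spin until `flag` is observed set or `max_spins` iterations have passed.
// Returns true if the flag was observed set.
static inline bool spin_until_set(const std::atomic<bool>& flag, long max_spins) {
  for (long i = 0; i < max_spins; ++i) {
    if (flag.load(std::memory_order_acquire)) {
      return true;
    }
    spin_pause();  // small enough to be inlined into the loop body
  }
  return flag.load(std::memory_order_acquire);
}
```

Because the helper is defined in a header-style `static inline` function, the compiler can place the single instruction directly in the loop instead of emitting a call into a separate assembly routine.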
The patch is inlined and attached below:

diff --git a/src/hotspot/os_cpu/bsd_x86/bsd_x86_32.s b/src/hotspot/os_cpu/bsd_x86/bsd_x86_32.s
--- a/src/hotspot/os_cpu/bsd_x86/bsd_x86_32.s
+++ b/src/hotspot/os_cpu/bsd_x86/bsd_x86_32.s
@@ -63,15 +63,6 @@
         popl     %eax
         ret
 
-        .globl SYMBOL(SpinPause)
-        ELF_TYPE(SpinPause,@function)
-        .p2align 4,,15
-SYMBOL(SpinPause):
-        rep
-        nop
-        movl     $1, %eax
-        ret
-
         # Support for void Copy::conjoint_bytes(void* from,
         #                                       void* to,
         #                                       size_t count)
diff --git a/src/hotspot/os_cpu/bsd_x86/bsd_x86_64.s b/src/hotspot/os_cpu/bsd_x86/bsd_x86_64.s
--- a/src/hotspot/os_cpu/bsd_x86/bsd_x86_64.s
+++ b/src/hotspot/os_cpu/bsd_x86/bsd_x86_64.s
@@ -46,15 +46,6 @@
 
         .text
 
-        .globl SYMBOL(SpinPause)
-        .p2align 4,,15
-        ELF_TYPE(SpinPause,@function)
-SYMBOL(SpinPause):
-        rep
-        nop
-        movq     $1, %rax
-        ret
-
         # Support for void Copy::arrayof_conjoint_bytes(void* from,
         #                                               void* to,
         #                                               size_t count)
diff --git a/src/hotspot/os_cpu/linux_x86/linux_x86_32.s b/src/hotspot/os_cpu/linux_x86/linux_x86_32.s
--- a/src/hotspot/os_cpu/linux_x86/linux_x86_32.s
+++ b/src/hotspot/os_cpu/linux_x86/linux_x86_32.s
@@ -42,15 +42,6 @@
 
         .text
 
-        .globl SpinPause
-        .type SpinPause,@function
-        .p2align 4,,15
-SpinPause:
-        rep
-        nop
-        movl     $1, %eax
-        ret
-
         # Support for void Copy::conjoint_bytes(void* from,
         #                                       void* to,
         #                                       size_t count)
diff --git a/src/hotspot/os_cpu/linux_x86/linux_x86_64.s b/src/hotspot/os_cpu/linux_x86/linux_x86_64.s
--- a/src/hotspot/os_cpu/linux_x86/linux_x86_64.s
+++ b/src/hotspot/os_cpu/linux_x86/linux_x86_64.s
@@ -38,15 +38,6 @@
 
         .text
 
-        .globl SpinPause
-        .align 16
-        .type SpinPause,@function
-SpinPause:
-        rep
-        nop
-        movq     $1, %rax
-        ret
-
         # Support for void Copy::arrayof_conjoint_bytes(void* from,
         #                                               void* to,
         #                                               size_t count)
diff --git a/src/hotspot/os_cpu/solaris_x86/solaris_x86_64.s b/src/hotspot/os_cpu/solaris_x86/solaris_x86_64.s
--- a/src/hotspot/os_cpu/solaris_x86/solaris_x86_64.s
+++ b/src/hotspot/os_cpu/solaris_x86/solaris_x86_64.s
@@ -51,15 +51,6 @@
         movq     %fs:0x0,%rax
         ret
 
-        .globl SpinPause
-        .align 16
-SpinPause:
-        rep
-        nop
-        movq     $1, %rax
-        ret
-
-
         / Support for void Copy::arrayof_conjoint_bytes(void* from,
         /                                               void* to,
         /                                               size_t count)
diff --git a/src/hotspot/share/runtime/os.hpp b/src/hotspot/share/runtime/os.hpp
--- a/src/hotspot/share/runtime/os.hpp
+++ b/src/hotspot/share/runtime/os.hpp
@@ -1031,6 +1031,13 @@
 // of the global SpinPause() with C linkage.
 // It'd also be eligible for inlining on many platforms.
 
+#if defined(X86) && !defined(_WINDOWS)
+extern "C" int inline SpinPause() {
+  __asm__ __volatile__ ("pause");
+  return 1;
+}
+#else
 extern "C" int SpinPause();
+#endif
 
 #endif // SHARE_VM_RUNTIME_OS_HPP

-Man

From fairoz.matte at oracle.com Thu Jul 19 06:52:27 2018
From: fairoz.matte at oracle.com (Fairoz Matte)
Date: Wed, 18 Jul 2018 23:52:27 -0700 (PDT)
Subject: [8u-backport] RFR: JDK-8114823: G1 doesn't honor request to disable class unloading
In-Reply-To: <6371e589-466c-4b8e-b974-1dee8d2446c4@default>
References: <6371e589-466c-4b8e-b974-1dee8d2446c4@default>
Message-ID: <496e6c67-2b03-4e96-924b-eec90216ae20@default>

Hi All,

Just a gentle reminder for review request.
Thanks, Fairoz > -----Original Message----- > From: Fairoz Matte > Sent: Wednesday, July 11, 2018 6:44 PM > To: hotspot-gc-dev at openjdk.java.net > Subject: [8u-backport] RFR: JDK-8114823: G1 doesn't honor request to > disable class unloading > > Hi, > > Kindly review the backport of "JDK- 8114823: G1 doesn't honor request to > disable class unloading" to 8u > > Webrev - http://cr.openjdk.java.net/~fmatte/8114823/webrev.00/ > > JDK 9 bug - https://bugs.openjdk.java.net/browse/JDK-8114823 > > JDK 9 changeset - > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/53a14fe65414 > > Review thread - http://mail.openjdk.java.net/pipermail/hotspot-gc- > dev/2016-June/018298.html > > Thanks, > Fairoz From thomas.schatzl at oracle.com Thu Jul 19 13:27:02 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 19 Jul 2018 15:27:02 +0200 Subject: [8u-backport] RFR: JDK-8114823: G1 doesn't honor request to disable class unloading In-Reply-To: <496e6c67-2b03-4e96-924b-eec90216ae20@default> References: <6371e589-466c-4b8e-b974-1dee8d2446c4@default> <496e6c67-2b03-4e96-924b-eec90216ae20@default> Message-ID: <74e6b279260667f9dd443771def98ad9d0448358.camel@oracle.com> Hi Fairoz, On Wed, 2018-07-18 at 23:52 -0700, Fairoz Matte wrote: > Hi All, > > Just a gentle reminder for review request. - at g1RootProcessor.cpp:152, the call to process_string_table_roots() should be moved to after the process_vm_roots() to (as much as possible) keep in sync with JDK9 code. - in G1RootProcessor::process_string_table_roots(), the "if (weak_roots != NULL)" is superfluous given the assert above it (and better keeps in sync with JDK9 code). Otherwise looks good to me. Thanks, Thomas From erik.helin at oracle.com Thu Jul 19 14:57:25 2018 From: erik.helin at oracle.com (Erik Helin) Date: Thu, 19 Jul 2018 16:57:25 +0200 Subject: committed > max in MemoryMXBean#getHeapMemoryUsage() In-Reply-To: References:

Message-ID: <35365ee0-e158-ad99-93c0-6f60251573a1@oracle.com>

On 07/13/2018 04:10 PM, Daniel Mitterdorfer wrote:
> Hi,
>
> I have good news. I was able to reproduce this issue but this time I
> have logs. A test failed with the following stack trace around
> 15:06:55 with:
>
> java.lang.IllegalArgumentException: committed = 537919488 should be <
> max = 536870912
>
> at java.lang.management.MemoryUsage.<init>(MemoryUsage.java:166)
>
> at sun.management.MemoryImpl.getMemoryUsage0(Native Method)
>
> at sun.management.MemoryImpl.getHeapMemoryUsage(MemoryImpl.java:71)
>
> at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.currentMemoryUsage(HierarchyCircuitBreakerService.java:242)
>
> This time it happened on Linux (kernel 4.13.0-45-generic) with JDK 10
> (build 10+46). The JVM arguments were:
>
> -Xms512M -Xmx512M
> -Xlog:gc*=trace,heap*=trace,tlab*=off:stdout:time,pid,tid,level,tags
>
> The logs are somewhat massive (~250MB uncompressed) and available at
> https://www.dropbox.com/s/wir9sv1dk5cf54y/JDK-8207200_test_log.txt.zip?dl=0

Thanks for the logs Daniel, they helped a lot! Me and Thomas looked through the logs and the code and as we suspected, this code is a bit buggy :/ Please see the bug for more details:

https://bugs.openjdk.java.net/browse/JDK-8207200

Again, thanks for taking your time and reporting this issue and for getting us the logs, much appreciated!
Erik

> I hope that helps identifying the cause. Please let me know if you
> need anything else.
>
> Daniel
> On Fri, 13 Jul 2018 at 10:33, Thomas Schatzl wrote:
>>
>> On Fri, 2018-07-13 at 10:30 +0200, Daniel Mitterdorfer wrote:
>>> Hi Erik,
>>>>
>>>> Do you have any kind of GC logging from the test run where you
>>>> encountered the bug?
>>>
>>> Unfortunately, we don't have GC logging enabled by default in our
>>> test suite so the exception trace is all I got.
I am now repeatedly >>> running the test suite with the original flags (-Xms512M -Xmx512M) >>> and also added the following logging configuration: >>> >>> -Xlog:gc*=trace,heap*=trace,tlab*=off:stdout:time,pid,tid,level,tags >>> >>> As soon as I get another failure, I'll provide the full log file. >>> Please let me know if you need any other logs (i.e. whether I should >>> adjust my log configuration). >> >> I think these flags are fine. >> >> Since Erik and me strongly believe the issue is with the relevant G1 >> code Erik mentioned we will reassign the bug to us (he said there is >> already a bug reported on it). >> >> Thanks a lot, >> Thomas >> From fairoz.matte at oracle.com Thu Jul 19 15:34:28 2018 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Thu, 19 Jul 2018 08:34:28 -0700 (PDT) Subject: [8u-backport] RFR: JDK-8114823: G1 doesn't honor request to disable class unloading In-Reply-To: <74e6b279260667f9dd443771def98ad9d0448358.camel@oracle.com> References: <6371e589-466c-4b8e-b974-1dee8d2446c4@default> <496e6c67-2b03-4e96-924b-eec90216ae20@default> <74e6b279260667f9dd443771def98ad9d0448358.camel@oracle.com> Message-ID: <372b3602-e550-4f85-8ebd-2a2dfceb80a4@default> Hi Thomas, Thanks for the review. Here is the updated webrev http://cr.openjdk.java.net/~fmatte/8114823/webrev.01/ with suggested changes. Thanks, Fairoz > -----Original Message----- > From: Thomas Schatzl > Sent: Thursday, July 19, 2018 6:57 PM > To: Fairoz Matte ; hotspot-gc- > dev at openjdk.java.net > Subject: Re: [8u-backport] RFR: JDK-8114823: G1 doesn't honor request to > disable class unloading > > Hi Fairoz, > > On Wed, 2018-07-18 at 23:52 -0700, Fairoz Matte wrote: > > Hi All, > > > > Just a gentle reminder for review request. > > > - at g1RootProcessor.cpp:152, the call to process_string_table_roots() should > be moved to after the process_vm_roots() to (as much as > possible) keep in sync with JDK9 code. 
> > - in G1RootProcessor::process_string_table_roots(), the "if (weak_roots != > NULL)" is superfluous given the assert above it (and better keeps in sync with > JDK9 code). > > Otherwise looks good to me. > > Thanks, > Thomas > From thomas.schatzl at oracle.com Thu Jul 19 15:44:33 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 19 Jul 2018 17:44:33 +0200 Subject: [8u-backport] RFR: JDK-8114823: G1 doesn't honor request to disable class unloading In-Reply-To: <372b3602-e550-4f85-8ebd-2a2dfceb80a4@default> References: <6371e589-466c-4b8e-b974-1dee8d2446c4@default> <496e6c67-2b03-4e96-924b-eec90216ae20@default> <74e6b279260667f9dd443771def98ad9d0448358.camel@oracle.com> <372b3602-e550-4f85-8ebd-2a2dfceb80a4@default> Message-ID: <12f07c2bcaa51956486ae0d968211675903b85a1.camel@oracle.com> Hi, On Thu, 2018-07-19 at 08:34 -0700, Fairoz Matte wrote: > Hi Thomas, > > Thanks for the review. > Here is the updated webrev > http://cr.openjdk.java.net/~fmatte/8114823/webrev.01/ with suggested > changes. updates look good. Thanks, Thomas From daniel.mitterdorfer at gmail.com Thu Jul 19 17:10:09 2018 From: daniel.mitterdorfer at gmail.com (Daniel Mitterdorfer) Date: Thu, 19 Jul 2018 19:10:09 +0200 Subject: committed > max in MemoryMXBean#getHeapMemoryUsage() In-Reply-To: <35365ee0-e158-ad99-93c0-6f60251573a1@oracle.com> References:

<35365ee0-e158-ad99-93c0-6f60251573a1@oracle.com>
Message-ID:

Hi Erik,

I am quite happy that I could reproduce it after running the tests repeatedly for approximately a week after the first failure. Glad I could help and thank you all for your help as well!

Daniel

On Thu, 19 Jul 2018 at 16:57, Erik Helin wrote:
>
> On 07/13/2018 04:10 PM, Daniel Mitterdorfer wrote:
> > Hi,
> >
> > I have good news. I was able to reproduce this issue but this time I
> > have logs. A test failed with the following stack trace around
> > 15:06:55 with:
> >
> > java.lang.IllegalArgumentException: committed = 537919488 should be <
> > max = 536870912
>
> > at java.lang.management.MemoryUsage.<init>(MemoryUsage.java:166)
>
> > at sun.management.MemoryImpl.getMemoryUsage0(Native Method)
>
> > at sun.management.MemoryImpl.getHeapMemoryUsage(MemoryImpl.java:71)
>
> > at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.currentMemoryUsage(HierarchyCircuitBreakerService.java:242)
> >
> > This time it happened on Linux (kernel 4.13.0-45-generic) with JDK 10
> > (build 10+46). The JVM arguments were:
> >
> > -Xms512M -Xmx512M
> > -Xlog:gc*=trace,heap*=trace,tlab*=off:stdout:time,pid,tid,level,tags
> >
> > The logs are somewhat massive (~250MB uncompressed) and available at
> > https://www.dropbox.com/s/wir9sv1dk5cf54y/JDK-8207200_test_log.txt.zip?dl=0
>
> Thanks for the logs Daniel, they helped a lot! Me and Thomas looked
> through the logs and the code and as we suspected, this code is a bit
> buggy :/ Please see the bug for more details:
>
> https://bugs.openjdk.java.net/browse/JDK-8207200
>
> Again, thanks for taking your time and reporting this issue and for
> getting us the logs, much appreciated!
> Erik
>
> > I hope that helps identifying the cause. Please let me know if you
> > need anything else.
> >
> > Daniel
> > On Fri, 13 Jul 2018 at 10:33, Thomas Schatzl wrote:
> >>
> >> On Fri, 2018-07-13 at 10:30 +0200, Daniel Mitterdorfer wrote:
> >>> Hi Erik,
> >>>>
> >>>> Do you have any kind of GC logging from the test run where you
> >>>> encountered the bug?
> >>>
> >>> Unfortunately, we don't have GC logging enabled by default in our
> >>> test suite so the exception trace is all I got. I am now repeatedly
> >>> running the test suite with the original flags (-Xms512M -Xmx512M)
> >>> and also added the following logging configuration:
> >>>
> >>> -Xlog:gc*=trace,heap*=trace,tlab*=off:stdout:time,pid,tid,level,tags
> >>>
> >>> As soon as I get another failure, I'll provide the full log file.
> >>> Please let me know if you need any other logs (i.e. whether I should
> >>> adjust my log configuration).
> >>
> >> I think these flags are fine.
> >>
> >> Since Erik and me strongly believe the issue is with the relevant G1
> >> code Erik mentioned we will reassign the bug to us (he said there is
> >> already a bug reported on it).
> >>
> >> Thanks a lot,
> >> Thomas
> >>

From thomas.schatzl at oracle.com Fri Jul 20 09:58:30 2018
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 20 Jul 2018 11:58:30 +0200
Subject: RFR (XS): 8207953: Remove dead code in G1CopyingKeepAliveClosure
Message-ID: <0de8d55387eac077c43677a65bbc2c2a35407a55.camel@oracle.com>

Hi all,

can I have a review for this trivial change that removes dead code:

There is the following code in G1CopyingKeepAliveClosure::do_oop_work:

  if (_g1h->is_in_cset_or_humongous(obj)) {
    [...]
    if (_g1h->is_in_g1_reserved(p)) {
      _par_scan_state->push_on_queue(p);
    } else {
      assert(!Metaspace::contains((const void*)p),
             "Unexpectedly found a pointer from metadata: "
             PTR_FORMAT, p2i(p));
      _copy_non_heap_obj_cl->do_oop(p);
    }
  }

is_in_cset_or_humongous() implies is_in_g1_reserved(), so the condition and the else-part can be removed.
CR: https://bugs.openjdk.java.net/browse/JDK-8207953 Webrev: http://cr.openjdk.java.net/~tschatzl/8207953/webrev/ Testing: hs-tier1 Thanks, Thomas From kim.barrett at oracle.com Fri Jul 20 17:48:59 2018 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 20 Jul 2018 13:48:59 -0400 Subject: RFR (XS): 8207953: Remove dead code in G1CopyingKeepAliveClosure In-Reply-To: <0de8d55387eac077c43677a65bbc2c2a35407a55.camel@oracle.com> References: <0de8d55387eac077c43677a65bbc2c2a35407a55.camel@oracle.com> Message-ID: <058157FC-81FB-4F31-8F7F-B32B4C02B601@oracle.com> > On Jul 20, 2018, at 5:58 AM, Thomas Schatzl wrote: > > Hi all, > > can I have a review for this trivial change that removes dead code: > > There is the following code in G1CopyingKeepAliveClosure::do_oop_work: > > if (_g1h->is_in_cset_or_humongous(obj)) { > [...] > if (_g1h->is_in_g1_reserved(p)) { > _par_scan_state->push_on_queue(p); > } else { > assert(!Metaspace::contains((const void*)p), > "Unexpectedly found a pointer from metadata: " > PTR_FORMAT, p2i(p)); > _copy_non_heap_obj_cl->do_oop(p); > } > } > > is_in_cset_or_humongous() implies is_in_g1_reserved(), so the condition > and the else-part can be removed. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8207953 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8207953/webrev/ > Testing: > hs-tier1 > > Thanks, > Thomas Looks good. 
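[Editorial sketch, not part of the original thread.] The argument in the RFR above is that an inner test already guaranteed by the outer test has an unreachable else-part. The following is a minimal toy model of that reasoning, not HotSpot code: ToyHeap, visit_before, visit_after and versions_agree are hypothetical names, and the model assumes the collection set is a subrange of the reserved heap and that both predicates are applied to the same address. The quoted snippet applies the outer test to obj and the inner test to p, so this sketch only illustrates the implication as stated in the mail and says nothing about whether it holds for the real closure.

```cpp
#include <cstdint>
#include <string>

// Toy address-range model. ToyHeap and its members are hypothetical and are
// not part of HotSpot.
struct ToyHeap {
  std::uintptr_t reserved_begin, reserved_end;  // reserved heap: [begin, end)
  std::uintptr_t cset_begin, cset_end;          // collection set: assumed to lie inside the reserved range

  bool is_in_reserved(std::uintptr_t addr) const {
    return addr >= reserved_begin && addr < reserved_end;
  }
  bool is_in_cset(std::uintptr_t addr) const {
    return addr >= cset_begin && addr < cset_end;
  }
};

// Before: the inner test repeats what the outer test already guarantees.
inline std::string visit_before(const ToyHeap& h, std::uintptr_t addr) {
  if (h.is_in_cset(addr)) {
    if (h.is_in_reserved(addr)) {
      return "push";
    } else {
      return "non-heap";  // unreachable while the cset is a subrange of the reserved heap
    }
  }
  return "skip";
}

// After: inner condition and else-part removed.
inline std::string visit_after(const ToyHeap& h, std::uintptr_t addr) {
  if (h.is_in_cset(addr)) {
    return "push";
  }
  return "skip";
}

// Returns true if both versions give the same answer for every address in [lo, hi).
inline bool versions_agree(const ToyHeap& h, std::uintptr_t lo, std::uintptr_t hi) {
  for (std::uintptr_t a = lo; a < hi; ++a) {
    if (visit_before(h, a) != visit_after(h, a)) {
      return false;
    }
  }
  return true;
}
```

With a heap such as ToyHeap{100, 200, 120, 140}, versions_agree(h, 0, 300) holds because no address is in the collection set without also being in the reserved range. If that subrange assumption were violated, the two versions would diverge, which is why the simplification depends on the implication being true for the arguments actually tested.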
From thomas.schatzl at oracle.com Fri Jul 20 20:36:02 2018 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 20 Jul 2018 22:36:02 +0200 Subject: RFR (XS): 8207953: Remove dead code in G1CopyingKeepAliveClosure In-Reply-To: <058157FC-81FB-4F31-8F7F-B32B4C02B601@oracle.com> References: <0de8d55387eac077c43677a65bbc2c2a35407a55.camel@oracle.com> <058157FC-81FB-4F31-8F7F-B32B4C02B601@oracle.com> Message-ID: Hi Kim, On Fri, 2018-07-20 at 13:48 -0400, Kim Barrett wrote: > > On Jul 20, 2018, at 5:58 AM, Thomas Schatzl wrote: > > > > Hi all, > > > > can I have a review for this trivial change that removes dead > > code: > > > > There is the following code in > > G1CopyingKeepAliveClosure::do_oop_work: > > [...] > > Thanks, > > Thomas > > Looks good. > thanks for your review. Thomas From hohensee at amazon.com Fri Jul 20 22:37:14 2018 From: hohensee at amazon.com (Hohensee, Paul) Date: Fri, 20 Jul 2018 22:37:14 +0000 Subject: RFR(L): 8196889: Revamp G1 JMX MemoryPoolMXBean, GarbageCollectorMXBean, and jstat counter definitions Message-ID: Please review. Bug: https://bugs.openjdk.java.net/browse/JDK-8196989 CSR: https://bugs.openjdk.java.net/browse/JDK-8196991 Webrev: http://cr.openjdk.java.net/~phh/8196989/webrev.00 This webrev is marked "L" because it's a behavioral change (CSR in draft state, may I have a review of that too please?) and because the test change fanout is large. The actual code changes are "M". Passes the submit repo, Hotspot tier1, the JFR gc event tests and any other test set with "gc" or "serviceability" in the test directory name. I found it difficult to verify the accuracy of the reported values other than manually, since they can vary from run to run of the same program. I'd appreciate suggestions for how to go about writing accuracy tests. I set out originally to revamp only the MXBeans, but decided it would be incomplete if I didn't include the jstat counters and the output of the GC.heap_info jcmd option.
I can separate the latter two into their own RFEs, but I find it easier to understand it all in a single webrev and hope the reviewers will too. The basic approach is to add the new memory pools and collectors, the new jstat counters, and an archive region counter that stands in for an actual archive region set. HeapRegionSets are disjoint, so initially I tried to create a first-class archive region set (on the same level as the humongous region set), but that idea foundered on the fact that there's too much code I don't fully understand that depends on archive regions being in the existing old region set. Externally (i.e., in the MXBeans and the jstat counters), however, the old region set doesn't include archive regions (unless running in legacy mode). I used CMS's TraceCMSMemoryManagerStats class as the model for TraceConcMemoryManagerStats, which latter collects statistics on concurrent cycles. There are two STW pauses in each concurrent cycle: they are recorded separately and count as two sun.gc.collector.2 events. The humongous and archive space committed and used values are always identical, hence they are always 100% used. The revised output of jcmd GC.heap_info is in G1CollectedHeap::print_on(). I fixed a typo in src/hotspot/share/gc/g1/g1Policy.hpp by changing the result type of young_list_target_length() from size_t to uint, which latter is the type of the _young_list_target_length member. I updated the copyright date in src/hotspot/share/services/memoryService.hpp to 2018, as I neglected to do so in a previous push. Thanks, Paul From david.buck at oracle.com Sun Jul 22 01:10:35 2018 From: david.buck at oracle.com (David Buck) Date: Sun, 22 Jul 2018 10:10:35 +0900 Subject: JDK Memory Allocation In-Reply-To: References: Message-ID: <87801549-1b64-afa3-95dc-95994b0d0fea@oracle.com> Hi Max!
Your question does not seem to be related to building OpenJDK, so I have BCCed build-dev from the thread and added gc-dev. That said, I am not sure any of the development lists are really an ideal place to ask general "code walk through" questions. If really necessary, memAllocator.cpp [0] would probably be as good a place as any to start reading the source code. But unless you intend to hack on the JVM itself, trying to read this source code may not be the most productive use of your time. You may get a lot more out of reading some of the wikis [1], blogs [2], and books [3][4] that cover the HotSpot JVM in detail. Even if you ultimately choose to read the source code directly, reading these other types of resources first should really help you make better sense of what you see in the source code. Cheers, -Buck [0] http://hg.openjdk.java.net/jdk/jdk/file/b0fcf59be391/src/hotspot/share/gc/shared/memAllocator.cpp [1] https://wiki.openjdk.java.net/display/HotSpot/Main [2] https://shipilev.net/jvm-anatomy-park/ [3] https://www.goodreads.com/book/show/13227108-java-performance [4] https://www.goodreads.com/book/show/23316035-java-performance-companion On 2018/07/22 8:51, mr rupplin wrote: > Having looked for some while at the OpenJDK source code I am unable to find where the memory allocation occurs. I will be working very much with the JDK and would like to get a firm grasp on its underlying mechanisms. > > public class JustAsk > { > public static void main(String...args) > { > for(int i=0; i<100; i++) > { > new JustAsk(); > } > } > } > > This doesn't seem to rely on any of the functions in the libjli nor of the jni.h. So clearly where do we look for the handler here? > > Thanks, > > Your friend Max > From hohensee at amazon.com Mon Jul 23 21:33:28 2018 From: hohensee at amazon.com (Hohensee, Paul) Date: Mon, 23 Jul 2018 21:33:28 +0000 Subject: RFR(L): 8196989: Revamp G1 JMX MemoryPoolMXBean, GarbageCollectorMXBean, and jstat counter definitions Message-ID: