From rwestrel at redhat.com Mon Feb 3 08:00:44 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 03 Feb 2020 09:00:44 +0100 Subject: RFR(S): 8237776: Shenandoah: Wrong result with Lucene test In-Reply-To: <87wo98t3lb.fsf@redhat.com> References: <87wo98t3lb.fsf@redhat.com> Message-ID: <87ftfst80j.fsf@redhat.com> After some offline discussion with Aleksey, here is an updated webrev: http://cr.openjdk.java.net/~roland/8237776/webrev.01/ Only difference is an assert that checks the number of fp arguments. Roland. From shade at redhat.com Mon Feb 3 08:19:59 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 3 Feb 2020 09:19:59 +0100 Subject: RFR(S): 8237776: Shenandoah: Wrong result with Lucene test In-Reply-To: <87ftfst80j.fsf@redhat.com> References: <87wo98t3lb.fsf@redhat.com> <87ftfst80j.fsf@redhat.com> Message-ID: <93eed5d6-bae4-a919-a8a6-aa31294b0825@redhat.com> On 2/3/20 9:00 AM, Roland Westrelin wrote: > > After some offline discussion with Aleksey, here is an updated webrev: > > http://cr.openjdk.java.net/~roland/8237776/webrev.01/ Looks good! -- Thanks, -Aleksey From thomas.schatzl at oracle.com Mon Feb 3 09:19:21 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 3 Feb 2020 10:19:21 +0100 Subject: RFR (XS): 8234608: [TESTBUG] Memory leak in gc/g1/unloading/libdefine.cpp In-Reply-To: References: Message-ID: <5fb5f27c-7b4e-f72e-a01f-aebb619c9558@oracle.com> Hi, On 31.01.20 04:27, Man Cao wrote: > Hi, > > I have incorporated Thomas's changes, and fixed the tests and updated the > CR. > New webrev: https://cr.openjdk.java.net/~manc/8234608/webrev.01/ > > The issue is that the signature of makeRedefinition0() in libdefine.cpp was > wrong. > It missed the "jclass clazz" parameter. > > I have tested using 'make test > TEST="test/hotspot/jtreg/vmTestbase/gc/g1/unloading/tests/unloading_redefinition_*" > ', for both fastdebug and product builds. > > I suppose Submit repo would not run these tests, because it only runs > tier1. 
Am I correct? hs-tier1-5 passed. Looks good. Thomas From thomas.schatzl at oracle.com Mon Feb 3 09:53:41 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 3 Feb 2020 10:53:41 +0100 Subject: RFR (S): 8238220: Rename OWSTTaskTerminator to TaskTerminator In-Reply-To: <65ce518b-56da-92a8-010a-e58c5c015a7e@oracle.com> References: <5f99b054-e286-2a8c-5a37-d641eb4932f1@oracle.com> <10c01fdb-d6e3-01a3-6cee-a8f467fac372@oracle.com> <65ce518b-56da-92a8-010a-e58c5c015a7e@oracle.com> Message-ID: <69f9f6fe-8ec3-cf5f-2c0a-97bddee31624@oracle.com> Hi Sangheon, Stefan, On 31.01.20 18:54, sangheon.kim at oracle.com wrote: > Hi Thomas, > > On 1/31/20 2:41 AM, Thomas Schatzl wrote: >> Hi Sangheon, >> >> On 30.01.20 19:08, sangheon.kim at oracle.com wrote: >>> Hi Thomas, >>> >>> On 1/30/20 3:34 AM, Thomas Schatzl wrote: >>>> Hi all, >>>> >>>> ? can I have reviews for this renaming change of OWSTTaskTerminator >>>> to TaskTerminator now that there is only one task termination >>>> protocol implementation? >>>> >>>> I believe that the OWST prefix only makes the code harder to read >>>> without conveying interesting information at the uses. >>>> >>>> Based on JDK-8215297. >>>> >>>> CR: >>>> https://bugs.openjdk.java.net/browse/JDK-8238220 >>>> Webrev: >>>> http://cr.openjdk.java.net/~tschatzl/8238220/webrev/ >>> Looks good as is. >>> >>> One thing to note is the order of renamed header file. >>> It looks like you are treating uppercase first? :) >>> >>> e.g. at g1CollectedHeap.cpp >>> >>> +#include "gc/shared/taskTerminator.hpp" >>> ? #include "gc/shared/taskqueue.inline.hpp" >>> >>> >>> I expect alphabet order first and then upper-lowercase. :) >>> >> >> ? by default, upper case sorts before lower case in many if not all >> situations on computers since typically all upper case letters are >> "before" lower case letters in character sets. 
>> >> I would like to keep it as is unless you or somebody else really >> objects - there does not seem to be a precedence in hotspot files. > I'm fine with current order. > As you said personally, hotspot style just says "Keep the include lines > sorted". > > https://wiki.openjdk.java.net/display/HotSpot/StyleGuide > > Thanks, > Sangheon > thanks for your reviews. Thomas From thomas.schatzl at oracle.com Mon Feb 3 09:55:36 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 3 Feb 2020 10:55:36 +0100 Subject: RFR (XS): 8238229: Remove TRACESPINNING debug code In-Reply-To: References: <77430bd4-19d8-0c6e-edc8-750dae163d96@oracle.com> <48885c09-77c2-8924-d9ec-2a825fd60f29@oracle.com> <00eec1c7-d524-44c1-a331-95088bb74f3c@oracle.com> Message-ID: Hi Kim, Stefan, On 31.01.20 00:14, Kim Barrett wrote: >> On Jan 30, 2020, at 11:43 AM, Thomas Schatzl wrote: >> >> Hi, >> >> On 30.01.20 16:24, Stefan Johansson wrote: >>> Looks good, >>> StefanJ >> >> all fixed. Idk why these were missing in that webrev, I regenerated it. >> >> Thanks, >> Thomas >> >>> On 2020-01-30 12:56, Thomas Schatzl wrote: >>>> Hi all, >>>> >>>> can I have reviews for this removal of some debug code in the TaskTerminator class? [...] >>>> CR: >>>> https://bugs.openjdk.java.net/browse/JDK-8238229 >>>> Webrev: >>>> http://cr.openjdk.java.net/~tschatzl/8238229/webrev/ >>> I agree that this can be removed, and there is even more code that should go. The call from each collected heap: > > Looks good. > > thanks for your reviews. Thomas From zgu at redhat.com Mon Feb 3 20:59:28 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 3 Feb 2020 15:59:28 -0500 Subject: [14] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" Message-ID: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> Shenandoah uses oop mark word's "marked" pattern to indicate forwarding. 
Unfortunately, JVMTI heap walk (VM_HeapWalkOperation) also uses this pattern to indicate visited. The conflicts present serious problems during Shenandoah's concurrent evacuation and concurrent reference update phases, as it blindly treats "marked" pattern as "forwarding". There are invariants we can use to distinguish "forwarding" and "visited" pattern. 1. Marked pattern in collection set indicates forwarding 2. Marked pattern off collection set indicates visited by ObjectMarker (because oops seen by ObjectMarker were LRB'd) 3. No off collection set marked pattern at any shenandoah safepoint. In fact, no off collection set marked pattern at any safepoints except VM_HeapWalkOperation safepoints. This is an important invariant, since traversal degenerated GC drops collection set before entering degenerated GC cycle. Note: We only downgrade some debug assertions, but preserve full capacities of verifier, because verifier always runs at safepoints. Bug: https://bugs.openjdk.java.net/browse/JDK-8237632 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.00/ Test: hotspot_gc_shenandoah vmTestbase_nsk_jvmti vmTestbase_nsk_jdi Thanks, -Zhengyu From m.sundar85 at gmail.com Tue Feb 4 03:38:46 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Mon, 3 Feb 2020 22:38:46 -0500 Subject: Parallel GC Thread crash Message-ID: Hi, I am seeing following crashes frequently on our servers # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299 # # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel gc, linux-amd64) # Problematic frame: # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 # # No core dump will be written. Core dumps have been disabled. 
To enable core dumping, try "ulimit -c unlimited" before starting Java again # # If you would like to submit a bug report, please visit: # https://github.com/AdoptOpenJDK/openjdk-build/issues # --------------- T H R E A D --------------- Current thread (0x00007fca2c051000): GCTaskThread "ParGC Thread#8" [stack: 0x00007fca30277000,0x00007fca30377000] [id=108299] Stack: [0x00007fca30277000,0x00007fca30377000], sp=0x00007fca30374890, free space=1014k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap const*, OopClosure*)+0x2eb V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0x99 V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x187 V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, unsigned int)+0xb0 V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb V [libjvm.so+0xf707fd] Thread::call_run()+0x10d V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 JavaThread 0x00007fb85c004800 (nid = 111387) was being processed Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) v ~RuntimeStub::_new_array_Java J 225122 c2 ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V (207 bytes) @ 0x00007fca21f1a5d8 [0x00007fca21f17f20+0x00000000000026b8] J 62342 c2 webservice.exception.ExceptionLoggingWrapper.execute()V (1004 bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88] J 225129 c2 webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (105 bytes) @ 0x00007fca1da512ac [0x00007fca1da51100+0x00000000000001ac] J 131643 c2 webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (9 bytes) @ 0x00007fca20ce6190 
[0x00007fca20ce60c0+0x00000000000000d0] J 55114 c2 webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; (332 bytes) @ 0x00007fca2051fe64 [0x00007fca2051f820+0x0000000000000644] J 57859 c2 webservice.filters.ResponseSerializationWorker.execute()Z (272 bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8] J 16114% c2 com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; (486 bytes) @ 0x00007fca1ced465c [0x00007fca1ced4200+0x000000000000045c] j com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 J 11639 c2 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098] J 7560 c1 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 [0x00007fca15b23160+0x0000000000000df4] J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc [0x00007fca15b39a40+0x000000000000007c] J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134] v ~StubRoutines::call_stub siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000 Register to memory mapping: ... Can someone shed more info on when this can happen? I am seeing this on multiple servers with Java 13.0.1+9 on RHEL6 servers. There was another thread in hotspot runtime where David Holmes pointed this > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000 > This seems it may be related to: > https://bugs.openjdk.java.net/browse/JDK-8004124 Just wondering if this is same or something to do with GC specific. 
TIA Sundar From stefan.karlsson at oracle.com Tue Feb 4 10:47:32 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 4 Feb 2020 11:47:32 +0100 Subject: Parallel GC Thread crash In-Reply-To: References: Message-ID: Hi Sundar, The GC crashes when it encounters something bad on the stack: > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > const*, OopClosure*)+0x2eb > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, This is probably not a GC bug. It's more likely that this is caused by the JIT compiler. I see in your hotspot-runtime-dev thread, that you also get crashes in other compiler related areas. If you want to rule out the GC, you can run with -XX:+VerifyBeforeGC and -XX:+VerifyAfterGC, and see if this asserts before the GC has started running. StefanK On 2020-02-04 04:38, Sundara Mohan M wrote: > Hi, > I am seeing following crashes frequently on our servers > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299 > # > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel > gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > # > # No core dump will be written. Core dumps have been disabled. 
To enable > core dumping, try "ulimit -c unlimited" before starting Java again > # > # If you would like to submit a bug report, please visit: > # https://github.com/AdoptOpenJDK/openjdk-build/issues > # > > > --------------- T H R E A D --------------- > > Current thread (0x00007fca2c051000): GCTaskThread "ParGC Thread#8" [stack: > 0x00007fca30277000,0x00007fca30377000] [id=108299] > > Stack: [0x00007fca30277000,0x00007fca30377000], sp=0x00007fca30374890, > free space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > const*, OopClosure*)+0x2eb > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > CodeBlobClosure*, RegisterMap*, bool)+0x99 > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > CodeBlobClosure*)+0x187 > V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, > unsigned int)+0xb0 > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > v ~RuntimeStub::_new_array_Java > J 225122 c2 > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007fca21f1a5d8 [0x00007fca21f17f20+0x00000000000026b8] > J 62342 c2 webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88] > J 225129 c2 > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (105 bytes) @ 0x00007fca1da512ac [0x00007fca1da51100+0x00000000000001ac] > J 131643 c2 > 
webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (9 bytes) @ 0x00007fca20ce6190 [0x00007fca20ce60c0+0x00000000000000d0] > J 55114 c2 > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > (332 bytes) @ 0x00007fca2051fe64 [0x00007fca2051f820+0x0000000000000644] > J 57859 c2 webservice.filters.ResponseSerializationWorker.execute()Z (272 > bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8] > J 16114% c2 > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > (486 bytes) @ 0x00007fca1ced465c [0x00007fca1ced4200+0x000000000000045c] > j > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > J 11639 c2 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098] > J 7560 c1 > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 > [0x00007fca15b23160+0x0000000000000df4] > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc > [0x00007fca15b39a40+0x000000000000007c] > J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134] > v ~StubRoutines::call_stub > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > 0x0000000000000000 > > Register to memory mapping: > ... > > Can someone shed more info on when this can happen? I am seeing this on > multiple servers with Java 13.0.1+9 on RHEL6 servers. 
> > There was another thread in hotspot runtime where David Holmes pointed this >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > 0x0000000000000000 > >> This seems it may be related to: >> https://bugs.openjdk.java.net/browse/JDK-8004124 > > Just wondering if this is same or something to do with GC specific. > > > > TIA > Sundar > From zgu at redhat.com Tue Feb 4 13:35:43 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 4 Feb 2020 08:35:43 -0500 Subject: [15] RFR 8238162: Shenandoah: Remove ShenandoahTaskTerminator wrapper Message-ID: I can not recall why we still have terminator wrapper, probably a leftover after we upstreamed OWST terminator. Let's remove it. Bug: https://bugs.openjdk.java.net/browse/JDK-8238162 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8238162/webrev.00/index.html Test: hotspot_gc_shenandoah Thanks, -Zhengyu From thomas.schatzl at oracle.com Tue Feb 4 14:39:41 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 4 Feb 2020 15:39:41 +0100 Subject: RFR: 8237143: Eliminate DirtyCardQ_cbl_mon In-Reply-To: <40479EE1-74EF-4C5F-A04B-8877F0ED9ACB@oracle.com> References: <745E91C1-AE1A-4DA2-80EE-59B70897F4BF@oracle.com> <86BABDA8-E402-49F3-B478-ED0E70490015@oracle.com> <40479EE1-74EF-4C5F-A04B-8877F0ED9ACB@oracle.com> Message-ID: Hi Kim, On 31.01.20 23:25, Kim Barrett wrote: >> On Jan 23, 2020, at 3:10 PM, Kim Barrett wrote: >> >>> On Jan 22, 2020, at 11:12 AM, Thomas Schatzl wrote: >>> On 16.01.20 09:51, Kim Barrett wrote: >>>> Please review this change to eliminate the DirtyCardQ_cbl_mon. This >>>> is one of the two remaining super-special "access" ranked mutexes. >>>> (The other is the Shared_DirtyCardQ_lock, whose elimination is covered >>>> by JDK-8221360.) >>>> There are three main parts to this change. >>>> (1) Replace the under-a-lock FIFO queue in G1DirtyCardQueueSet with a >>>> lock-free FIFO queue. 
>>>> (2) Replace the use of a HotSpot monitor for signaling activation of >>>> concurrent refinement threads with a semaphore-based solution. >>>> (3) Handle pausing of buffer refinement in the middle of a buffer in >>>> order to handle a pending safepoint request. This can no longer just >>>> push the partially processed buffer back onto the queue, due to ABA >>>> problems now that the buffer is lock-free. >>>> CR: >>>> https://bugs.openjdk.java.net/browse/JDK-8237143 >>>> Webrev: >>>> https://cr.openjdk.java.net/~kbarrett/8237143/open.00/ >>>> Testing: >>>> mach5 tier1-5 >>>> Normal performance testing showed no significant change. >>>> specjbb2015 on a very big machine showed a 3.5% average critical-jOPS >>>> improvement, though not statistically significant; removing contention >>>> for that lock by many hardware threads may be a little bit noticeable. >>> >>> initial comments only, and so far only about comments :( The code itself looks good to me, but I want to look over it again. >> >> After some offline discussion with Thomas, I?m doing some restructuring that >> makes it probably not very efficient for anyone else to do a careful review of >> the open.00 version. > > Here's a new webrev: > > https://cr.openjdk.java.net/~kbarrett/8237143/open.02/ > I think this is good. Thanks for your extensive changes. Two minor issues. Do not need re-review: * s/unsufficient/insufficient in g1DirtyCardQueue.cpp * simple predicates returning bool tend to have an "is_" or "has_" prepended to it, i.e. s/PausedBuffers::empty()/...::is_empty()/ Thanks, Thomas From shade at redhat.com Tue Feb 4 19:15:16 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 4 Feb 2020 20:15:16 +0100 Subject: [15] RFR 8238162: Shenandoah: Remove ShenandoahTaskTerminator wrapper In-Reply-To: References: Message-ID: On 2/4/20 2:35 PM, Zhengyu Gu wrote: > I can not recall why we still have terminator wrapper, probably a > leftover after we upstreamed OWST terminator. Let's remove it. 
I think we have upstreamed our version here? If so, please link it to 8238162: https://bugs.openjdk.java.net/browse/JDK-8204947 > Bug: https://bugs.openjdk.java.net/browse/JDK-8238162 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8238162/webrev.00/index.html Looks good. -- Thanks, -Aleksey From shade at redhat.com Tue Feb 4 19:23:05 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 4 Feb 2020 20:23:05 +0100 Subject: [14] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> Message-ID: On 2/3/20 9:59 PM, Zhengyu Gu wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8237632 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.00/ Uh. It seems to me the cure is worse than the disease:
1) It rewires sensitive parts of the barrier paths, root handling, etc., which requires more thorough testing, and we are too deep in RDP2 for this;
2) It effectively disables asserts for anything not in the collection set, which means it disables most of the asserts. The fact that the Verifier still works is a small consolation.
I propose to accept this failure in 14, and rework the JVMTI heap walk to stop messing around with mark words in 15. Since this relates to concurrent root handling, 11-shenandoah is already safe.
-- Thanks, -Aleksey From zgu at redhat.com Tue Feb 4 19:29:52 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 4 Feb 2020 14:29:52 -0500 Subject: [14] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> Message-ID: <2121a97b-47fe-0205-51ad-a927576fbb93@redhat.com> On 2/4/20 2:23 PM, Aleksey Shipilev wrote: > On 2/3/20 9:59 PM, Zhengyu Gu wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237632 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.00/ > > Uh. It seems to me the cure is worse than the disease: > 1) It rewires sensitive parts of barrier paths, root handling, etc, which requires more thorough > testing, and we are too deep in RDP2 for this; > 2) It effectively disables asserts for anything not in collection set. Which means it disables > most of asserts. The fact that Verifier still works is a small consolation. > > I propose to accept this failure in 14, and rework the JVMTI heap walk to stop messing around with > mark words in 15. Since this relates to concurrent root handling, 11-shenandoah is already safe. I have yet to test 11-shenandoah. But performing JVMTI heap walk during evacuation phase, still sounds the alarm to me. 
-Zhengyu > From shade at redhat.com Tue Feb 4 19:33:28 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 4 Feb 2020 20:33:28 +0100 Subject: [14] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: <2121a97b-47fe-0205-51ad-a927576fbb93@redhat.com> References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> <2121a97b-47fe-0205-51ad-a927576fbb93@redhat.com> Message-ID: On 2/4/20 8:29 PM, Zhengyu Gu wrote: > On 2/4/20 2:23 PM, Aleksey Shipilev wrote: >> On 2/3/20 9:59 PM, Zhengyu Gu wrote: >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8237632 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.00/ >> >> Uh. It seems to me the cure is worse than the disease: >> 1) It rewires sensitive parts of barrier paths, root handling, etc, which requires more thorough >> testing, and we are too deep in RDP2 for this; >> 2) It effectively disables asserts for anything not in collection set. Which means it disables >> most of asserts. The fact that Verifier still works is a small consolation. >> >> I propose to accept this failure in 14, and rework the JVMTI heap walk to stop messing around with >> mark words in 15. Since this relates to concurrent root handling, 11-shenandoah is already safe. > > I have yet to test 11-shenandoah. But performing JVMTI heap walk during > evacuation phase, still sounds the alarm to me. Right. There is still plenty of time to fix 11. Let's not rush it in 14. -- Thanks, -Aleksey From m.sundar85 at gmail.com Tue Feb 4 20:21:26 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Tue, 4 Feb 2020 15:21:26 -0500 Subject: Parallel GC Thread crash In-Reply-To: References: Message-ID: Thanks for the tip! 
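For the record, here is roughly the re-run I will try with the suggested flags. The jar name below is a placeholder, not our actual command line; since VerifyBeforeGC/VerifyAfterGC are diagnostic flags, a product build also needs -XX:+UnlockDiagnosticVMOptions:

```shell
# Sketch of the suggested verification run; "webservice.jar" is a placeholder.
# VerifyBeforeGC/VerifyAfterGC should trip an assert before/after each GC if
# the heap is already corrupted, which rules the collector in or out.
VERIFY_OPTS="-XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC -XX:+VerifyAfterGC"
# Echo the full command first to double-check it before launching:
echo "java ${VERIFY_OPTS} -jar webservice.jar"
```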
On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson wrote: > Hi Sundar, > > The GC crashes when it encounters something bad on the stack: > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > > const*, OopClosure*)+0x2eb > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > This is probably not a GC bug. It's more likely that this is caused by > the JIT compiler. I see in your hotspot-runtime-dev thread, that you > also get crashes in other compiler related areas. > > If you want to rule out the GC, you can run with -XX:+VerifyBeforeGC and > -XX:+VerifyAfterGC, and see if this asserts before the GC has started > running. > > StefanK > > On 2020-02-04 04:38, Sundara Mohan M wrote: > > Hi, > > I am seeing following crashes frequently on our servers > > # > > # A fatal error has been detected by the Java Runtime Environment: > > # > > # SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299 > > # > > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) > > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, > parallel > > gc, linux-amd64) > > # Problematic frame: > > # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > # > > # No core dump will be written. Core dumps have been disabled. 
To enable > > core dumping, try "ulimit -c unlimited" before starting Java again > > # > > # If you would like to submit a bug report, please visit: > > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > # > > > > > > --------------- T H R E A D --------------- > > > > Current thread (0x00007fca2c051000): GCTaskThread "ParGC Thread#8" > [stack: > > 0x00007fca30277000,0x00007fca30377000] [id=108299] > > > > Stack: [0x00007fca30277000,0x00007fca30377000], sp=0x00007fca30374890, > > free space=1014k > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > j=interpreted, Vv=VM code, C=native code) > > V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > > const*, OopClosure*)+0x2eb > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > > CodeBlobClosure*)+0x187 > > V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, > > unsigned int)+0xb0 > > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > v ~RuntimeStub::_new_array_Java > > J 225122 c2 > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > (207 bytes) @ 0x00007fca21f1a5d8 [0x00007fca21f17f20+0x00000000000026b8] > > J 62342 c2 webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > > bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88] > > J 225129 c2 > > > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > (105 bytes) @ 0x00007fca1da512ac [0x00007fca1da51100+0x00000000000001ac] > 
> J 131643 c2 > > > webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > (9 bytes) @ 0x00007fca20ce6190 [0x00007fca20ce60c0+0x00000000000000d0] > > J 55114 c2 > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > (332 bytes) @ 0x00007fca2051fe64 [0x00007fca2051f820+0x0000000000000644] > > J 57859 c2 webservice.filters.ResponseSerializationWorker.execute()Z (272 > > bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8] > > J 16114% c2 > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > (486 bytes) @ 0x00007fca1ced465c [0x00007fca1ced4200+0x000000000000045c] > > j > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > J 11639 c2 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > > bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098] > > J 7560 c1 > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 > > [0x00007fca15b23160+0x0000000000000df4] > > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc > > [0x00007fca15b39a40+0x000000000000007c] > > J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134] > > v ~StubRoutines::call_stub > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > 0x0000000000000000 > > > > Register to memory mapping: > > ... > > > > Can someone shed more info on when this can happen? I am seeing this on > > multiple servers with Java 13.0.1+9 on RHEL6 servers. 
> > > > There was another thread in hotspot runtime where David Holmes pointed > this > >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > 0x0000000000000000 > > > >> This seems it may be related to: > >> https://bugs.openjdk.java.net/browse/JDK-8004124 > > > > Just wondering if this is same or something to do with GC specific. > > > > > > > > TIA > > Sundar > > > From thomas.schatzl at oracle.com Wed Feb 5 08:13:57 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 5 Feb 2020 09:13:57 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com> References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com> Message-ID: <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com> Hi Liang, apologies for the late reply - I did look at the patch immediately after you posted it, but initial tests showed that it does not work as (I) expected. More about that below. So I went ahead and hacked up something that comes closer to what I had in mind. Unfortunately other more urgent issues came up, which caused the delay on this work. Sorry. (And sorry for the long post.) Not having any kind of workload to work with for testing the change, I used a configuration of specjbb2015 with fixed ir [0] (taken from a colleague's unrelated recent internal test), simulating a constant load whose heap usage the user wants to control. In this situation I want to apologize for using specjbb2015 in this public reply, because it is not openly available; I only noticed when writing up this email. Finding a substitute and redoing measurements would probably take more time. I will start looking into this issue. Anyway, in my test scenario, after warmup, the user tries to first limit the heap to 2GB, and after a while to 3GB, and then back to 8GB.
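The runtime adjustments were done with jcmd, roughly along these lines. This is a sketch: <pid> is a placeholder for the target JVM's process id, and it assumes SoftMaxHeapSize is a manageable flag in the build under test (which is the point of this patch); VM.set_flag takes a plain numeric value in bytes:

```shell
# Compute the byte values for the soft max heap sizes used in the test,
# since jcmd VM.set_flag takes a plain number rather than "2g"/"3g".
SOFT_MAX_2G=$((2 * 1024 * 1024 * 1024))
SOFT_MAX_3G=$((3 * 1024 * 1024 * 1024))
# Echo the commands; <pid> is a placeholder for the JVM under test.
echo "jcmd <pid> VM.set_flag SoftMaxHeapSize ${SOFT_MAX_2G}"
echo "jcmd <pid> VM.set_flag SoftMaxHeapSize ${SOFT_MAX_3G}"
```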
The resulting graph [1] shows heap metrics over time: blue ("soft") is the current SoftMaxHeapSize, pink ("committed") represents committed memory, yellow ("goal") shows G1's current heap size goal, turquoise ("free") the amount of free heap and purple ("used") the amount of used memory. Ignoring the drop from ~second 30-100 where I finally managed to set Min/MaxHeapFreeRatio ;) you can see that G1 kind of stabilizes at around 3.8GB heap; at ~second 410 SoftMaxHeapSize ("soft") is set to 2GB. As you can see, G1 ignores the request. This corresponds to the code where apparently the heap is only reduced to SoftMaxHeapSize if there is enough free space to reduce to that value (I think). At ~second 620 I set SoftMaxHeapSize to 3GB, which gives the expected drop in memory usage. However, since the change does not modify G1 goals it ultimately just ignores the SoftMaxHeapSize goal. It would probably have worked if there were no further application activity. I created a webrev of an alternative attempt that modifies G1's goal/target heap size in the adaptive IHOP mechanism so that G1 automatically starts marking so that a space reclamation phase starts before reaching SoftMaxHeapSize. It basically changes the predictor's reserve according to current committed heap size, not only based on G1ReservePercent but also on the specified SoftMaxHeapSize. One complication in a generational setting is to adapt young gen (particularly survivor size) to that goal too, but I think the change does okay with that. However it is not finished yet; there is debugging code in it and one FIXME that is about shuffling around code properly. In the graph at [3] you can see the results, with the same metrics shown. In this case G1 follows the soft goal fairly well. For the 2g SoftMaxHeapSize goal it works perfectly in the example (*1); in the 3g SoftMaxHeapSize change we get some initial short overshoot in committed memory.
(*2/*3) There are however some problems/differences to your solution here which need to be discussed a bit more to see if it fits you and ultimately make it perform better: *0 this change uses existing sizing to uncommit memory, i.e. memory is not uncommitted immediately but as part of regular operation. This means that the garbage collection cycle needs to advance. In case of specjbb with fixed IR this is no issue, but completely quiescent applications need other mechanisms like the "Promptly Return Unused Committed Memory" (JEP 346) feature enabled. Some tuning is needed in that mechanism for almost-idle applications. *1 the problem with only setting SoftMaxHeapSize and relying on the regular uncommit mechanism is that due to other reasons, e.g. GCTimeRatio, G1 won't achieve this kind of compact heap. This is the reason why my setup includes GCTimeRatio=4 on the command line - otherwise in neither case would G1 achieve the 2g goal (it would settle around 3g with my changes, didn't test the original changes; max heap usage would be ~5.8GB without SoftMaxHeapSize fyi), and you can't modify it during runtime (i.e. when you want to select a different throughput/latency tradeoff to achieve lower heap usage). *2 looking at the results more closely, regarding the (first) overshoot in the 3g soft max heap size goal, I think this is a remaining issue in the heap sizing policy in conjunction with soft max heap size, i.e. temporarily the target gctimeratio is set to 10% for various reasons (in G1HeapSizingPolicy::expansion_amount()). In the log I have, the problem seems to be that we are re-setting the softmaxheapsize within the space reclamation phase (i.e. mixed gc) and G1 sizing policies got confused, i.e. it partially keeps on using the 2g goal for young gen sizing until the *2 problem expands it. That's a bug and needs to be fixed.
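The reserve adjustment described above (basing the adaptive-IHOP reserve on SoftMaxHeapSize as well as G1ReservePercent) might be sketched roughly as follows. All names here are illustrative assumptions, not the actual G1 code from the webrev:

```cpp
#include <algorithm>
#include <cstddef>

// Illustrative sketch only: derive the adaptive-IHOP marking threshold
// from the smaller of the committed heap and the SoftMaxHeapSize goal,
// so a space reclamation phase starts early enough to get below the
// soft limit.
static size_t conc_mark_start_threshold(size_t committed_bytes,
                                        size_t soft_max_bytes,
                                        size_t reserve_percent,
                                        size_t expected_promotion_bytes) {
  size_t target = std::min(committed_bytes, soft_max_bytes);
  size_t reserve = target * reserve_percent / 100;  // G1ReservePercent-style
  if (target < reserve + expected_promotion_bytes) {
    return 0;  // already over budget: start marking immediately
  }
  // Start marking while the promotion expected during the marking cycle
  // still fits below the target minus the reserve.
  return target - reserve - expected_promotion_bytes;
}
```

With an 8 GB committed heap but a 2 GB soft goal, the threshold is computed against the 2 GB target rather than the full committed size, which is why marking (and hence mixed GCs and uncommit) starts much earlier.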
So far the previous text only looked at the best case where everything fits together; there are some other issues which will prevent you from achieving a tight heap in some cases that I noticed during my testing. Something to think about. *4 GCTimeRatio/heap expansion during young gc has different goals than the (un-)commit at the end of full gc. In some cases, with SoftMaxHeapSize (but also without), the latter will undo the expansion at young gc, which will immediately start to expand again. *5 GCTimeRatio can't be adjusted during runtime, which means that you won't achieve as tight a heap as in this example. GCTimeRatio is also a bit unwieldy to use, i.e. since it is the denominator in the (default; nobody sets GCPauseIntervalMillis) time calculation, you get "good" granularity at low values, but pretty bad granularity at high values. *6 Min/MaxHeapFreeRatio default values are probably too high - with adaptive IHOP, G1 can typically meet its current goal very well; any excess is often just wasted committed memory. A similar issue to that is, don't set Min/MaxHeapFreeRatio to something below G1ReservePercent, i.e. the default reserve for the IHOP. In this case there will be significant memory commit/uncommit pauses. Here is my question to you (and any readers): are you using Min/MaxHeapFreeRatio? Using SoftMaxHeapSize to set a target heap size seems to be much more direct and better than Min/MaxHeapFreeRatio. Given the above (and assuming that there are no reasons to keep it), it may be useful to start the deprecation process (at least for the use in G1) when SoftMaxHeapSize is in. There are some more issues with heap sizing not really relevant to this discussion; I need to think about them a bit more and file appropriately worded CRs. Either way, what do you think about my suggested change? Can you try it on your workloads to see if it could do the job? Any other comments?
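The granularity issue in *5 is easy to see numerically: with the default pause interval, the target fraction of total time spent in GC works out to 1/(1+GCTimeRatio), so steps between low values are large while high values are nearly indistinguishable (illustrative calculation only, not the actual G1 code):

```cpp
// Target fraction of wall time spent in GC for a given GCTimeRatio:
// 1 / (1 + GCTimeRatio). A ratio of 1 means 50% GC time, 4 means 20%,
// 99 means 1% -- coarse steps at the low end, tiny ones at the top.
static double gc_time_fraction(int gc_time_ratio) {
  return 1.0 / (1 + gc_time_ratio);
}
```

So going from GCTimeRatio=1 to 4 moves the target from 50% down to 20%, while going from 99 to 100 changes it by less than a hundredth of a percent.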
More work is needed on this patch I think; also we might need to think about how the user can detect this change of the target better in the logs for troubleshooting. The original patch (webrev.2) also contained some minor unrelated cleanups (one constification of a method, one rename of the heap resizing phase) that might be easier to address separately more quickly ;) Thanks, Thomas [0] specjbb2015 settings: -Dspecjbb.comm.connect.type=HTTP_Jetty -Dspecjbb.controller.type=PRESET -Dspecjbb.controller.preset.ir=5000 -Dspecjbb.controller.preset.duration=10800000 VM settings: -Xms2g -Xmx8g -XX:GCTimeRatio=4 -XX:+UseStringDeduplication This gives ~1.5GB live set size, on my machine around 10-40ms pause time, so rather light load at least without setting any heap size goal; in my runs, G1 settles to around 3.8GB of committed heap. (with Min/MaxHeapFreeRatio=10 set after startup, but you can just put it into the VM startup options too) [1] http://cr.openjdk.java.net/~tschatzl/8236073/softmaxheapsize-alibaba.png [2] http://cr.openjdk.java.net/~tschatzl/8236073/webrev/ [3] http://cr.openjdk.java.net/~tschatzl/8236073/softmaxheapsize.png From sangheon.kim at oracle.com Wed Feb 5 22:52:49 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Wed, 5 Feb 2020 14:52:49 -0800 Subject: RFR: 8237143: Eliminate DirtyCardQ_cbl_mon In-Reply-To: <40479EE1-74EF-4C5F-A04B-8877F0ED9ACB@oracle.com> References: <745E91C1-AE1A-4DA2-80EE-59B70897F4BF@oracle.com> <86BABDA8-E402-49F3-B478-ED0E70490015@oracle.com> <40479EE1-74EF-4C5F-A04B-8877F0ED9ACB@oracle.com> Message-ID: Hi Kim, On 1/31/20 2:25 PM, Kim Barrett wrote: >> On Jan 23, 2020, at 3:10 PM, Kim Barrett wrote: >> >>> On Jan 22, 2020, at 11:12 AM, Thomas Schatzl wrote: >>> On 16.01.20 09:51, Kim Barrett wrote: >>>> Please review this change to eliminate the DirtyCardQ_cbl_mon. This >>>> is one of the two remaining super-special "access" ranked mutexes.
>>>> (The other is the Shared_DirtyCardQ_lock, whose elimination is covered >>>> by JDK-8221360.) >>>> There are three main parts to this change. >>>> (1) Replace the under-a-lock FIFO queue in G1DirtyCardQueueSet with a >>>> lock-free FIFO queue. >>>> (2) Replace the use of a HotSpot monitor for signaling activation of >>>> concurrent refinement threads with a semaphore-based solution. >>>> (3) Handle pausing of buffer refinement in the middle of a buffer in >>>> order to handle a pending safepoint request. This can no longer just >>>> push the partially processed buffer back onto the queue, due to ABA >>>> problems now that the buffer is lock-free. >>>> CR: >>>> https://bugs.openjdk.java.net/browse/JDK-8237143 >>>> Webrev: >>>> https://cr.openjdk.java.net/~kbarrett/8237143/open.00/ >>>> Testing: >>>> mach5 tier1-5 >>>> Normal performance testing showed no significant change. >>>> specjbb2015 on a very big machine showed a 3.5% average critical-jOPS >>>> improvement, though not statistically significant; removing contention >>>> for that lock by many hardware threads may be a little bit noticeable. >>> initial comments only, and so far only about comments :( The code itself looks good to me, but I want to look over it again. >> After some offline discussion with Thomas, I'm doing some restructuring that >> makes it probably not very efficient for anyone else to do a careful review of >> the open.00 version. > Here's a new webrev: > > https://cr.openjdk.java.net/~kbarrett/8237143/open.02/ Webrev.02 looks really good. Thanks, Sangheon > > Testing: > mach5 tier1-5 > Performance testing showed no significant change. > > I didn't bother providing an incremental webrev, because the changes > to g1DirtyCardQueue.[ch]pp are pretty substantial. Those are the only > files changed, except for the suggested move of the comment for > G1ConcurrentRefineThread::maybe_deactivate and some related comment > improvements nearby.
> > Most of this round of changes are refactoring within G1DirtyCardQueueSet, > mainly adding internal helper classes for the FIFO queue and for the paused > buffers, each with their own (commented) APIs. I think that has addressed a > lot of Thomas's comments about the comments, and I hope has made the code > easier to understand. > > I've also improved the mechanism for handling "paused" buffers, simplifying > it by making better use of some invariants. > >> On Jan 22, 2020, at 11:12 AM, Thomas Schatzl wrote: >> // The key idea to make this work is that pop (get_completed_buffer) >> // never returns an element of the queue if it is the only accessible >> // element, >> If I understand this correctly, maybe "if there is only one buffer in the FIFO" is easier to understand than "only accessible element". (or define "accessible element?). > I specifically don't want to say it that way because we could have a > situation like > > (1) Start with a queue having exactly one element. > > (2) Thread1 starts a push by updating tail, but has not yet linked the old > tail to the new. > > (3) Thread2 performs a push. > > The buffer pushed by Thread2 is "in the queue" by some reasonable > definition, so the queue contains two buffers. But that buffer is not yet > accessible, because Thread1 hasn't completed its push. The alternative is > to (in the description) somehow divorce a completed push from the notion of > the number of buffers in the queue, which seems worse to me. I expanded the > discussion a bit though, including what is meant by "accessible". > >> The code seems to unnecessarily use the NULL_buffer constant. Maybe use it here too. Overall I am not sure about the usefulness of using NULL_buffer in the code. The NULL value in Hotspot code is generally accepted as a special value, and the name "NULL_buffer" does not seem to add any information. > The point of NULL_buffer was to avoid casts of NULL in Atomic operations, > and I then used it consistently. 
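The two-step push race described in (1)-(3) above can be sketched as follows. This is a minimal illustration, not the actual G1DirtyCardQueueSet code: the tail is swung first, and only afterwards is the old tail linked to the new node, so between the two steps the new node is "in the queue" but not yet reachable from its predecessor.

```cpp
#include <atomic>

// Minimal sketch of a two-step lock-free FIFO push (illustrative, not
// the HotSpot implementation). Step 1 publishes the new tail; step 2
// links the old tail to the new node. A pop running between the two
// steps would see a NULL next pointer on the last accessible node,
// which is why pop must never hand out the only accessible element.
struct Node {
  std::atomic<Node*> next{nullptr};
};

struct LockFreeFifo {
  std::atomic<Node*> _tail;

  explicit LockFreeFifo(Node* dummy) : _tail(dummy) {}

  void push(Node* n) {
    n->next.store(nullptr, std::memory_order_relaxed);
    // Step 1: swing the tail. A concurrent push now appends behind n...
    Node* old_tail = _tail.exchange(n, std::memory_order_acq_rel);
    // ...but n only becomes reachable from its predecessor at step 2.
    old_tail->next.store(n, std::memory_order_release);
  }
};
```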
But I've changed to using such casts, > since it turned out there weren't that many and we can get rid of those > uniformly here and elsewhere when we have C++11 nullptr and nullptr_t. > > > From maoliang.ml at alibaba-inc.com Thu Feb 6 12:27:09 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Thu, 06 Feb 2020 20:27:09 +0800 Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com> References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com>, <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com> Message-ID: <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com> Hi Thomas, Thanks for the testing and evaluating! I tried your test with specjbb2015 and got a somewhat different result, maybe because of machine capability. The config I used is as below: -Xmx8g -Xms2g -Xlog:gc* -XX:GCTimeRatio=4 -XX:+UseStringDeduplication -Dspecjbb.comm.connect.type=HTTP_Jetty -Dspecjbb.controller.type=PRESET -Dspecjbb.controller.preset.ir=5000 -Dspecjbb.controller.preset.duration=10800000 The heap was around 6GB after running for a while (300s). And I was able to use SoftMaxHeapSize to let it shrink to 5GB. It should be like your scenario of shrinking the heap to 3GB. The behavior is as I expected, but I thought you might expect a more aggressive result. In my mind, for a constant load, the JVM might not need to shrink the heap at all, since the JVM is supposed to expand the heap to the right capacity. The soft limit I imagine is to bring the heap size down after a load spike. In Alibaba's workload, the heap shrink is controlled by the cluster's unified control center, which has the prediction data, and the soft limit works more like a *hard* limit in our 8u implementation. So I think it is acceptable that the heap could not be shrunk to 2GB in your test case.
You can see that G1HeapSizingPolicy::can_shrink_heap_size_to is a bit conservative and we may be able to make it more aggressive. For an almost idle application which doesn't have a GC for a rather long time, the shrink cannot happen. In our previous 8u patch, we had a timer to trigger GC, and the softmx is changed by a jcmd which will also trigger a GC (there was no SoftMaxHeapSize option in 8u yet). Shall we introduce a timer GC as well? Honestly, I don't think Min/MaxHeapFreeRatio is a good way to determine heap expansion/shrinkage in G1, and in our 8u practical experience we never have full GC, so Min/MaxHeapFreeRatio is useless. When I reproduced your test, the only exception was that the heap expanded back to 6GB after shrinking to SoftMaxHeapSize=5g, because we resize the heap at remark. BTW, I don't think remark is a good point to resize the heap, since in the remark phase regions full of garbage haven't been reclaimed yet. IMHO we don't even need to resize at remark but just resize after mixed GC according to GCTimeRatio. Your change to make SoftMaxHeapSize sensible in adaptive IHOP controlling seems to be a similar approach to ZGC's. ZGC is a single generation GC whose scenario is much simpler. Maybe we don't need SoftMaxHeapSize to guide GC decisions in G1. Since we already have a policy to determine the shrink of the heap by SoftMaxHeapSize, I'm not sure if we need to make adaptive IHOP according to SoftMaxHeapSize... We may encounter the situation that we cannot shrink the heap size to SoftMaxHeapSize but concurrent mark becomes frequent after affecting the IHOP policy. > In the log I have, the problem seems to be that we are re-setting the > softmaxheapsize within the space reclamation phase (i.e. mixed gc) and > G1 sizing policies got confused, i.e. it partially keeps on using the 2g > goal for young gen sizing until the *2 problem expands it. That's a bug > and needs to be fixed.
I don't think it's a problem that after mixed GC resize_heap_after_young_collection will evaluate if the heap can be shrunk to the new value of SoftMaxHeapSize. Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 5 (Wed.) 16:14 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi Liang, apologies for the late reply - I did look at the patch immediately after you posted it, but initial tests showed that it does not work as (I) expected. More about that below. So I went ahead and hacked up something that comes closer to what I had in mind. Unfortunately other more urgent issues came up, which caused the delay on this work. Sorry. (And sorry for the long post). Not having any kind of workload to work with for testing the change, I used some configuration of specjbb2015 with fixed ir [0] (taken from a colleague's unrelated recent internal test), simulating a constant load the user wants to control the heap usage of. At this point I want to apologize for using specjbb2015 in this public reply, because it is not openly available; I only noticed when writing up this email. Finding a substitute and redoing measurements would probably take more time. I will start looking into this issue. Anyway, in my test scenario, after warmup, the user tries to first limit the heap to 2GB, and after a while to 3GB, and then back to 8GB. The resulting graph [1] shows heap metrics over time: blue ("soft") is the current SoftMaxHeapSize, pink ("committed") represents committed memory, yellow ("goal") shows G1's current heap size goal, turquoise ("free") the amount of free heap and purple ("used") the amount of used memory. Ignoring the drop from ~second 30-100 where I finally managed to set Min/MaxHeapFreeRatio ;) you can see that G1 kind of stabilizes at around 3.8GB heap; at ~second 410 SoftMaxHeapSize ("soft") is set to 2GB. As you can see, G1 ignores the request.
This corresponds to the code where apparently the heap is only reduced to SoftMaxHeapSize if there is enough free space to reduce to that value (I think). At ~second 620 I set SoftMaxHeapSize to 3GB which gives the expected drop in memory usage. However, since the change does not modify G1 goals it ultimately just ignores the SoftMaxHeapSize goal. It probably worked if there were no further application activity. I created a webrev of an alternative attempt that modifies G1's goal/target heap size in the adaptive IHOP mechanism so that G1 automatically starts marking so that a space reclamation phase starts before reaching softmaxheapsize. It basically changes the predictor's reserve according to current committed heap size not only based on G1ReservePercent, but also on the specified SoftMaxHeapSize. One complication in a generational setting is to adapt young gen (particularly survivor size) to that goal too, but I think the change does okay with that. However it is not finished yet, there is debugging code in it and one FIXME that is about shuffling around code properly. In the graph at [3] you can see the results, with same metrics shown. In this case G1 fairly well follows the soft goal. For the 2g softmaxheapsize goal it works perfectly in the example (*1), in the 3g softmaxheapsize change we get some initial short overshoot in committed memory. (*2/*3) There are however some problems/differences to your solution here which need to be discussed a bit more to see if it fits you and ultimately make it perform better: *0 this change uses existing sizing to uncommit memory, i.e. memory is not uncommitted immediately but part of regular operation. This means that the garbage collection cycle needs to advance. In case of specjbb with fixed IR this is no issue, but completely quiescent applications need other mechanisms like the "Promptly Return Unused Committed Memory (JEP 346) feature enabled. Some tuning is needed in that mechanism for almost-idle applications. 
*1 the problem with only setting SoftMaxHeapSize and relying on the regular uncommit mechanism is that due to other reasons, e.g. GCTimeRatio, G1 won't achieve this kind of compact heap. This is the reason why my setup includes the GCTimeRatio=4 on the command line - otherwise in neither case G1 would achieve the 2g goal (it would settle around 3g with my changes, didn't test the original changes; max heap usage would be ~5.8GB without SoftMaxHeapSize fyi), and you can't modify it during runtime (i.e. when you want to select a different throughput/latency tradeoff to achieve lower heap usage). *2 looking at the results more closely the (first) overshoot in the 3g soft max heap size goal, I think this is a remaining issue in the heap sizing policy in conjunction with soft max heap size, i.e. temporarily the target gctimeratio is set to 10% for various reasons. (in G1HeapSizingPolicy::expansion_amount()). In the log I have, the problem seems to be that we are re-setting the softmaxheapsize within the space reclamation phase (i.e. mixed gc) and G1 sizing policies got confused, i.e. it partially keeps on using the 2g goal for young gen sizing until the *2 problem expands it. That's a bug and needs to be fixed. So far previous text only looked at the best case where everything fits together; there are some other issues which will prevent you from achieving a tight heap in some cases that I noticed during my testing. Something to think about. *4 GCTimeRatio/heap expansion during young gc has different goals than the (un-)commit at the end of full gc. In some cases, with SoftMaxHeapSize (but also without), the later will undo the expansion at young gc, which will immediately start to expand again. *5 GCTimeRatio can't be adjusted during runtime, which means that you won't achieve that tight of a heap as in this example. 
GCTimeRatio is also a bit unwieldy to use, i.e since it is the denominator in the (default; nobody sets GCPauseIntervalMillis) time calculation, you get "good" granularity of low values, but pretty bad granularity of high values. *6 Min/MaxHeapFreeRatio default values are probably too high - with adaptive IHOP, G1 can typically meet its current goal very well, any excess is often just wasted committed memory. A similar issue to that is, don't set Min/MaxHeapFreeRatio to something below G1ReservePercent, i.e. the default reserve for the IHOP. In this case there will be significant memory commit/uncommit pauses. Here is my question to you (and any readers), are you using Min/MaxHeapFreeRatio? Using SoftMaxHeapSize to set a target heap size seems to be much more direct and better than Min/MaxHeapFreeRatio. Given above (and assuming that there are no reasons to keep it), it may be useful to start deprecation process (at least for the use in G1) when SoftMaxHeapSize is in. There are some more issues with heap sizing not really relevant to this discussion, I need to think about them a bit more and file appropriately worded CRs. Either way, what do you think about my suggested change? Can you try it on your workloads to see if it could do the job? Any other comments? More work is needed on this patch I think; also we might need to think about how the user can detect this change of the target better in the logs for troubleshooting. 
The original patch (webrev.2) also contained some minor unrelated cleanups (one constification of a method, one rename of the heap resizing phase) that might be easier to address separately more quickly ;) Thanks, Thomas [0] specjbb2015 settings: -Dspecjbb.comm.connect.type=HTTP_Jetty -Dspecjbb.controller.type=PRESET -Dspecjbb.controller.presett.ir=5000 -Dspecjbb.controller.preset.duration=10800000" VM settings: -Xms2g -Xmx8g -XX:GCTimeRatio=4 -XX:+UseStringDeduplication This gives ~1.5GB live set size, on my machine around 10-40ms pause time, so rather light load at least without setting any heap size goal; in my runs, G1 settles to around 3.8GB of committed heap. (with Min/MaxHeapFreeRatio=10 set after startup, but you can just put it into the VM startup options too) [1] http://cr.openjdk.java.net/~tschatzl/8236073/softmaxheapsize-alibaba.png [2] http://cr.openjdk.java.net/~tschatzl/8236073/webrev/ [3] http://cr.openjdk.java.net/~tschatzl/8236073/softmaxheapsize.png From zgu at redhat.com Thu Feb 6 17:34:10 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 6 Feb 2020 12:34:10 -0500 Subject: [15] RFR 8238574: Shenandoah: Assertion failure due to missing null check Message-ID: Please review this small change that adds a null check before calling keep alive barrier to avoid assertion failure. Native barrier may return null for a not null oop, if it is dead. Bug: https://bugs.openjdk.java.net/browse/JDK-8238574 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8238574/webrev.00/ Test: hotspot_gc_shenandoah, vmTestbase_nsk_jdi where the problem was observed. 
Thanks, -Zhengyu From shade at redhat.com Thu Feb 6 17:42:44 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 6 Feb 2020 18:42:44 +0100 Subject: [15] RFR 8238574: Shenandoah: Assertion failure due to missing null check In-Reply-To: References: Message-ID: <43248314-c8b1-d78a-5bff-415aaaa957cd@redhat.com> On 2/6/20 6:34 PM, Zhengyu Gu wrote: > Please review this small change that adds a null check before calling > keep alive barrier to avoid assertion failure. > > Native barrier may return null for a not null oop, if it is dead. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8238574 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8238574/webrev.00/ The patch looks good. But I have a broader question: are all other paths that use the returned value from LRB-native safe? E.g. calling from assembler/C1/C2? Thanks, -Aleksey From zgu at redhat.com Thu Feb 6 17:56:11 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 6 Feb 2020 12:56:11 -0500 Subject: [15] RFR 8238574: Shenandoah: Assertion failure due to missing null check In-Reply-To: <43248314-c8b1-d78a-5bff-415aaaa957cd@redhat.com> References: <43248314-c8b1-d78a-5bff-415aaaa957cd@redhat.com> Message-ID: On 2/6/20 12:42 PM, Aleksey Shipilev wrote: > On 2/6/20 6:34 PM, Zhengyu Gu wrote: >> Please review this small change that adds a null check before calling >> keep alive barrier to avoid assertion failure. >> >> Native barrier may return null for a not null oop, if it is dead. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8238574 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8238574/webrev.00/ > The patch looks good. > > But I have a broader question: are all other paths that use the returned value from LRB-native safe? > E.g. calling from assembler/C1/C2? In C1/C2, we just make runtime call to this implementation. 
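For reference, the guard being added amounts to the following pattern, shown here as a mock with hypothetical names (the real change is in the webrev): since the native load-reference barrier can return NULL for a non-null oop whose referent is dead, the keep-alive barrier must only run on a non-NULL result.

```cpp
#include <cassert>
#include <cstddef>

// Mock sketch, not actual Shenandoah code: all names are hypothetical.
typedef void* oop;

static int keep_alive_calls = 0;

// Stand-in for the native LRB: a dead referent resolves to NULL even
// though the input oop itself was non-null.
static oop load_reference_barrier_native_mock(oop obj, bool referent_dead) {
  return referent_dead ? NULL : obj;
}

// Stand-in for the keep-alive (SATB pre-value) barrier, which asserts
// on a non-NULL argument -- the assertion that fired in the bug report.
static void keep_alive_barrier_mock(oop obj) {
  assert(obj != NULL);
  ++keep_alive_calls;
}

static void resolve_and_keep_alive(oop obj, bool referent_dead) {
  oop fwd = load_reference_barrier_native_mock(obj, referent_dead);
  if (fwd != NULL) {  // the added null check
    keep_alive_barrier_mock(fwd);
  }
}
```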
-Zhengyu > > Thanks, > -Aleksey > From zgu at redhat.com Thu Feb 6 18:32:46 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 6 Feb 2020 13:32:46 -0500 Subject: [15] RFR 8238574: Shenandoah: Assertion failure due to missing null check In-Reply-To: References: <43248314-c8b1-d78a-5bff-415aaaa957cd@redhat.com> Message-ID: <5085eff3-c6b4-c4d7-e3d6-7928ff77561a@redhat.com> On 2/6/20 12:56 PM, Zhengyu Gu wrote: > > > On 2/6/20 12:42 PM, Aleksey Shipilev wrote: >> On 2/6/20 6:34 PM, Zhengyu Gu wrote: >>> Please review this small change that adds a null check before calling >>> keep alive barrier to avoid assertion failure. >>> >>> Native barrier may return null for a not null oop, if it is dead. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238574 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8238574/webrev.00/ >> The patch looks good. >> >> But I have a broader question: are all other paths that use the >> returned value from LRB-native safe? >> E.g. calling from assembler/C1/C2? > > In C1/C2, we just make runtime call to this implementation. Sorry, jumped the gun too fast. I don't think I answered your question :-( C1/C2's pre-val barriers seem to have a null check. Roman, could you confirm? Thanks, -Zhengyu > > -Zhengyu > >> >> Thanks, >> -Aleksey >> From rkennke at redhat.com Thu Feb 6 19:39:08 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 6 Feb 2020 20:39:08 +0100 Subject: [15] RFR 8238574: Shenandoah: Assertion failure due to missing null check In-Reply-To: <5085eff3-c6b4-c4d7-e3d6-7928ff77561a@redhat.com> References: <43248314-c8b1-d78a-5bff-415aaaa957cd@redhat.com> <5085eff3-c6b4-c4d7-e3d6-7928ff77561a@redhat.com> Message-ID: <028fc92b-b660-354f-1c4b-4a78bae8319a@redhat.com> Hi folks, >>>> Please review this small change that adds a null check before calling >>>> keep alive barrier to avoid assertion failure. >>>> >>>> Native barrier may return null for a not null oop, if it is dead.
>>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238574 >>>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8238574/webrev.00/ >>> The patch looks good. >>> >>> But I have a broader question: are all other paths that use the >>> returned value from LRB-native safe? >>> E.g. calling from assembler/C1/C2? >> >> In C1/C2, we just make runtime call to this implementation. > > Sorry, jump the gun to fast. I don't think I answered your question :-( > > C1/C2's pre-val barriers seem to have null check. Roman, could you confirm? Yes, I think this is correct. Thanks, Roman From kim.barrett at oracle.com Fri Feb 7 00:00:22 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 6 Feb 2020 19:00:22 -0500 Subject: RFR: 8237143: Eliminate DirtyCardQ_cbl_mon In-Reply-To: References: <745E91C1-AE1A-4DA2-80EE-59B70897F4BF@oracle.com> <86BABDA8-E402-49F3-B478-ED0E70490015@oracle.com> <40479EE1-74EF-4C5F-A04B-8877F0ED9ACB@oracle.com> Message-ID: > On Feb 4, 2020, at 9:39 AM, Thomas Schatzl wrote: > > Hi Kim, > > On 31.01.20 23:25, Kim Barrett wrote: >>> On Jan 23, 2020, at 3:10 PM, Kim Barrett wrote: >>> >>>> On Jan 22, 2020, at 11:12 AM, Thomas Schatzl wrote: >>>> On 16.01.20 09:51, Kim Barrett wrote: >>>>> Please review this change to eliminate the DirtyCardQ_cbl_mon. This >>>>> is one of the two remaining super-special "access" ranked mutexes. >>>>> (The other is the Shared_DirtyCardQ_lock, whose elimination is covered >>>>> by JDK-8221360.) >>>>> There are three main parts to this change. >>>>> (1) Replace the under-a-lock FIFO queue in G1DirtyCardQueueSet with a >>>>> lock-free FIFO queue. >>>>> (2) Replace the use of a HotSpot monitor for signaling activation of >>>>> concurrent refinement threads with a semaphore-based solution. >>>>> (3) Handle pausing of buffer refinement in the middle of a buffer in >>>>> order to handle a pending safepoint request. 
This can no longer just >>>>> push the partially processed buffer back onto the queue, due to ABA >>>>> problems now that the buffer is lock-free. >>>>> CR: >>>>> https://bugs.openjdk.java.net/browse/JDK-8237143 >>>>> Webrev: >>>>> https://cr.openjdk.java.net/~kbarrett/8237143/open.00/ >>>>> Testing: >>>>> mach5 tier1-5 >>>>> Normal performance testing showed no significant change. >>>>> specjbb2015 on a very big machine showed a 3.5% average critical-jOPS >>>>> improvement, though not statistically significant; removing contention >>>>> for that lock by many hardware threads may be a little bit noticeable. >>>> >>>> initial comments only, and so far only about comments :( The code itself looks good to me, but I want to look over it again. >>> >>> After some offline discussion with Thomas, I?m doing some restructuring that >>> makes it probably not very efficient for anyone else to do a careful review of >>> the open.00 version. >> Here's a new webrev: >> https://cr.openjdk.java.net/~kbarrett/8237143/open.02/ > > I think this is good. Thanks for your extensive changes. Thanks. > Two minor issues. Do not need re-review: > > * s/unsufficient/insufficient in g1DirtyCardQueue.cpp Thanks for spotting that. > * simple predicates returning bool tend to have an "is_" or "has_" prepended to it, i.e. s/PausedBuffers::empty()/...::is_empty()/ Agreed; will change to is_empty. Old habits seem to die hard; I think someday we might want to be more consistent with the Standard Library, but not today. 
> > Thanks, > Thomas From kim.barrett at oracle.com Fri Feb 7 00:00:44 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 6 Feb 2020 19:00:44 -0500 Subject: RFR: 8237143: Eliminate DirtyCardQ_cbl_mon In-Reply-To: References: <745E91C1-AE1A-4DA2-80EE-59B70897F4BF@oracle.com> <86BABDA8-E402-49F3-B478-ED0E70490015@oracle.com> <40479EE1-74EF-4C5F-A04B-8877F0ED9ACB@oracle.com> Message-ID: <0B25580C-189C-4B09-88A9-6D1FBCD97C08@oracle.com> > On Feb 5, 2020, at 5:52 PM, sangheon.kim at oracle.com wrote: > > Hi Kim, > > On 1/31/20 2:25 PM, Kim Barrett wrote: >>> On Jan 23, 2020, at 3:10 PM, Kim Barrett wrote: >>> >>>> On Jan 22, 2020, at 11:12 AM, Thomas Schatzl wrote: >>>> On 16.01.20 09:51, Kim Barrett wrote: >>>>> Please review this change to eliminate the DirtyCardQ_cbl_mon. This >>>>> is one of the two remaining super-special "access" ranked mutexes. >>>>> (The other is the Shared_DirtyCardQ_lock, whose elimination is covered >>>>> by JDK-8221360.) >>>>> There are three main parts to this change. >>>>> (1) Replace the under-a-lock FIFO queue in G1DirtyCardQueueSet with a >>>>> lock-free FIFO queue. >>>>> (2) Replace the use of a HotSpot monitor for signaling activation of >>>>> concurrent refinement threads with a semaphore-based solution. >>>>> (3) Handle pausing of buffer refinement in the middle of a buffer in >>>>> order to handle a pending safepoint request. This can no longer just >>>>> push the partially processed buffer back onto the queue, due to ABA >>>>> problems now that the buffer is lock-free. >>>>> CR: >>>>> https://bugs.openjdk.java.net/browse/JDK-8237143 >>>>> Webrev: >>>>> https://cr.openjdk.java.net/~kbarrett/8237143/open.00/ >>>>> Testing: >>>>> mach5 tier1-5 >>>>> Normal performance testing showed no significant change. 
>>>>> specjbb2015 on a very big machine showed a 3.5% average critical-jOPS >>>>> improvement, though not statistically significant; removing contention >>>>> for that lock by many hardware threads may be a little bit noticeable. >>>> initial comments only, and so far only about comments :( The code itself looks good to me, but I want to look over it again. >>> After some offline discussion with Thomas, I'm doing some restructuring that >>> makes it probably not very efficient for anyone else to do a careful review of >>> the open.00 version. >> Here's a new webrev: >> >> https://cr.openjdk.java.net/~kbarrett/8237143/open.02/ > Webrev.02 looks really good. > > Thanks, > Sangheon Thanks. From maoliang.ml at alibaba-inc.com Fri Feb 7 05:39:28 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Fri, 07 Feb 2020 13:39:28 +0800 Subject: [Rare case] G1 mixed GC didn't reclaim garbages in 8u Message-ID: <2258aa97-c360-47f1-96d9-8a7ca98b2461.maoliang.ml@alibaba-inc.com> Hi All, I saw a rare case where G1 reclaimed almost nothing in a mixed GC but a later full GC reclaimed 70% of the heap. The version is 8u; is there any bug, or is this an extreme case of floating garbage because of SATB?
The GC log is as below: 2020-02-06T20:07:39.785+0800: 6805.381: [GC pause (G1 Evacuation Pause) (young) (initial-mark), 0.0381100 secs] [Parallel Time: 32.5 ms, GC Workers: 8] [GC Worker Start (ms): Min: 6805383.4, Avg: 6805383.5, Max: 6805383.5, Diff: 0.1] [Ext Root Scanning (ms): Min: 7.4, Avg: 12.1, Max: 19.6, Diff: 12.2, Sum: 97.0] [Update RS (ms): Min: 2.8, Avg: 6.3, Max: 11.1, Diff: 8.3, Sum: 50.7] [Processed Buffers: Min: 48, Avg: 116.9, Max: 180, Diff: 132, Sum: 935] [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.5] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Object Copy (ms): Min: 9.7, Avg: 13.6, Max: 17.6, Diff: 7.9, Sum: 109.1] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 8] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [GC Worker Total (ms): Min: 32.1, Avg: 32.2, Max: 32.2, Diff: 0.1, Sum: 257.5] [GC Worker End (ms): Min: 6805415.6, Avg: 6805415.6, Max: 6805415.6, Diff: 0.0] [Code Root Fixup: 0.0 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.2 ms] [Other: 5.3 ms] [Choose CSet: 0.0 ms] [Ref Proc: 1.1 ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.0 ms] [Eden: 608.0M(608.0M)->0.0B(608.0M) Survivors: 96.0M->96.0M Heap: 13.8G(14.2G)->13.2G(14.2G)] [Times: user=0.22 sys=0.00, real=0.04 secs] 2020-02-06T20:07:39.825+0800: 6805.421: [GC concurrent-root-region-scan-start] 2020-02-06T20:07:39.826+0800: 6805.422: Total time for which application threads were stopped: 0.0512361 seconds, Stopping threads took: 0.0009971 seconds 2020-02-06T20:07:39.845+0800: 6805.441: [GC concurrent-root-region-scan-end, 0.0199532 secs] 2020-02-06T20:07:39.845+0800: 6805.441: [GC concurrent-mark-start] 2020-02-06T20:07:43.459+0800: 6809.055: [GC concurrent-mark-end, 3.6139728 secs] 2020-02-06T20:07:43.467+0800: 6809.063: [GC remark 
2020-02-06T20:07:43.467+0800: 6809.063: [Finalize Marking, 0.0027913 secs] 2020-02-06T20:07:43.470+0800: 6809.066: [GC ref-proc, 0.0141510 secs] 2020-02-06T20:07:43.484+0800: 6809.080: [Unloading, 0.0562292 secs], 0.0987990 secs] [Times: user=0.60 sys=0.01, real=0.10 secs] 2020-02-06T20:07:43.568+0800: 6809.164: Total time for which application threads were stopped: 0.1087168 seconds, Stopping threads took: 0.0008774 seconds 2020-02-06T20:07:43.576+0800: 6809.172: [GC cleanup 13G->13G(14G), 0.0128258 secs] [Times: user=0.08 sys=0.01, real=0.01 secs] 2020-02-06T20:07:43.590+0800: 6809.186: Total time for which application threads were stopped: 0.0223063 seconds, Stopping threads took: 0.0005304 seconds 2020-02-06T20:07:45.145+0800: 6810.741: [GC pause (G1 Evacuation Pause) (young), 0.0299645 secs] [Parallel Time: 24.8 ms, GC Workers: 8] [GC Worker Start (ms): Min: 6810743.7, Avg: 6810744.1, Max: 6810746.8, Diff: 3.1] [Ext Root Scanning (ms): Min: 5.5, Avg: 9.0, Max: 15.3, Diff: 9.8, Sum: 72.2] [Update RS (ms): Min: 2.8, Avg: 6.8, Max: 9.3, Diff: 6.5, Sum: 54.2] [Processed Buffers: Min: 58, Avg: 120.5, Max: 175, Diff: 117, Sum: 964] [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.5] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Object Copy (ms): Min: 6.4, Avg: 8.4, Max: 9.7, Diff: 3.3, Sum: 66.9] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 8] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.3] [GC Worker Total (ms): Min: 21.5, Avg: 24.3, Max: 24.7, Diff: 3.1, Sum: 194.0] [GC Worker End (ms): Min: 6810768.3, Avg: 6810768.3, Max: 6810768.4, Diff: 0.1] [Code Root Fixup: 0.0 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.2 ms] [Other: 4.9 ms] [Choose CSet: 0.0 ms] [Ref Proc: 1.1 ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.0 ms] [Eden: 
608.0M(608.0M)->0.0B(608.0M) Survivors: 96.0M->96.0M Heap: 13.8G(14.2G)->13.3G(14.2G)] [Times: user=0.17 sys=0.00, real=0.03 secs] 2020-02-06T20:07:45.178+0800: 6810.774: Total time for which application threads were stopped: 0.0420222 seconds, Stopping threads took: 0.0009371 seconds 2020-02-06T20:07:47.186+0800: 6812.782: Total time for which application threads were stopped: 0.0081580 seconds, Stopping threads took: 0.0009037 seconds 2020-02-06T20:07:51.031+0800: 6816.627: [GC pause (G1 Evacuation Pause) (mixed), 0.0327771 secs] [Parallel Time: 27.2 ms, GC Workers: 8] [GC Worker Start (ms): Min: 6816629.1, Avg: 6816629.2, Max: 6816629.2, Diff: 0.1] [Ext Root Scanning (ms): Min: 4.9, Avg: 7.6, Max: 15.4, Diff: 10.5, Sum: 60.8] [Update RS (ms): Min: 2.8, Avg: 6.3, Max: 9.1, Diff: 6.2, Sum: 50.8] [Processed Buffers: Min: 18, Avg: 124.9, Max: 224, Diff: 206, Sum: 999] [Scan RS (ms): Min: 0.5, Avg: 0.8, Max: 1.2, Diff: 0.6, Sum: 6.0] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Object Copy (ms): Min: 6.4, Avg: 12.4, Max: 17.4, Diff: 11.0, Sum: 99.0] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 8] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [GC Worker Total (ms): Min: 27.1, Avg: 27.1, Max: 27.2, Diff: 0.1, Sum: 216.9] [GC Worker End (ms): Min: 6816656.3, Avg: 6816656.3, Max: 6816656.3, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.4 ms] [Other: 5.1 ms] [Choose CSet: 0.0 ms] [Ref Proc: 1.2 ms] [Ref Enq: 0.1 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.1 ms] [Eden: 608.0M(608.0M)->0.0B(608.0M) Survivors: 96.0M->96.0M Heap: 13.8G(14.2G)->13.2G(14.2G)] [Times: user=0.21 sys=0.01, real=0.04 secs] 2020-02-06T20:07:51.066+0800: 6816.662: Total time for which application threads were stopped: 0.0449305 seconds, Stopping threads took: 0.0009496 
seconds 2020-02-06T20:07:58.095+0800: 6823.691: [GC pause (G1 Evacuation Pause) (young), 0.0297937 secs] [Parallel Time: 24.3 ms, GC Workers: 8] [GC Worker Start (ms): Min: 6823693.1, Avg: 6823693.1, Max: 6823693.2, Diff: 0.1] [Ext Root Scanning (ms): Min: 4.9, Avg: 7.3, Max: 15.6, Diff: 10.7, Sum: 58.1] [Update RS (ms): Min: 5.6, Avg: 7.4, Max: 9.1, Diff: 3.5, Sum: 58.9] [Processed Buffers: Min: 30, Avg: 124.6, Max: 163, Diff: 133, Sum: 997] [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.5] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Object Copy (ms): Min: 2.4, Avg: 9.5, Max: 13.4, Diff: 11.0, Sum: 76.2] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 8] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [GC Worker Total (ms): Min: 24.2, Avg: 24.2, Max: 24.3, Diff: 0.1, Sum: 193.9] [GC Worker End (ms): Min: 6823717.4, Avg: 6823717.4, Max: 6823717.4, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.3 ms] [Other: 5.2 ms] [Choose CSet: 0.0 ms] [Ref Proc: 1.2 ms] [Ref Enq: 0.1 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.1 ms] [Eden: 608.0M(608.0M)->0.0B(608.0M) Survivors: 96.0M->96.0M Heap: 13.8G(14.2G)->13.3G(14.2G)] 2020-02-06T20:08:18.256+0800: 6843.852: [Full GC (Allocation Failure) 14G->4027M(14G), 7.5914236 secs] [Eden: 0.0B(704.0M)->0.0B(4480.0M) Survivors: 0.0B->0.0B Heap: 14.0G(14.2G)->4027.5M(14.2G)], [Metaspace: 401632K->401608K(1411072K)] [Times: user=11.26 sys=0.18, real=7.59 secs] Thanks, Liang From thomas.schatzl at oracle.com Fri Feb 7 11:09:20 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 7 Feb 2020 12:09:20 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com> References: 
<90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com> <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com> <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com> Message-ID: <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com> Hi, On 06.02.20 13:27, Liang Mao wrote: > Hi Thomas, > > Thanks for the testing and evaluating! > > I tried your test with specjbb2015 and had some little different > result maybe because of machine capability. The config I used is as below: > -Xmx8g -Xms2g -Xlog:gc* -XX:GCTimeRatio=4 > -XX:+UseStringDeduplication > -Dspecjbb.comm.connect.type=HTTP_Jetty > -Dspecjbb.controller.type=PRESET > -Dspecjbb.controller.preset.ir=5000 > -Dspecjbb.controller.preset.duration=10800000 > > The heap was around 6GB after running for a while (300s). And > I was able to use SoftMaxHeapSize to let it shrink to 5GB. It > should be like your scenario to shrink the heap to 3GB > > The behavior is as I expected. But I thought you might expect > more aggressive result. In my mind, for a constant load, > the jvm might not need to shrink the heap that JVM supposes to expand > the heap to the right capacity. Did you change Min/MaxHeapFreeRatio for your test? It does not look like that, as I get roughly the same results if I don't. Given that we agree that it is wrong to use Min/MaxHeapFreeRatio during Remark, the observation is interesting, but does not seem to help here except reinforcing that Min/MaxHeapFreeRatio are not a good thing to use. Also, I doubt that G1's current heap size selection is optimal. Some reasons off my head: - Min/MaxHeapFreeRatio has been chosen to avoid uncommit/commit ping-pong and frequent (un-)commits (i.e. performance), not heap compactness. - adaptive IHOP (or at least the knowledge about expected amount of memory used during gc operation) has not been available, hence the very conservative values. - the values have been chosen long before the uncommit at remark [2] has been implemented.
As author of that change I can authoritatively say that fixing the policy had been out of scope for that change ;) however it had been needed for JEP 346 Promptly Uncommit unused memory [1] to do *something* without disrupting existing behavior too much to avoid lengthy re-evaluation of sizing policies. The logic went something like: what concurrent mark does roughly equals full gc, so do the same sizing as during full gc. End.
Currently the best idea about what we are going to need in the near future is given by the IHOP goal value imho. So overall, please do not read too much into the existing heap sizing policy :) > The soft limit I imagine is > to bring the heap size down after a load peak. In Alibaba's > workload, the heap shrink is controlled by the cluster's unified > control center which has the prediction data and the soft limit > works more like a *hard* limit in our 8u implementation. > > So I think it is acceptable that the heap size failed to shrink > to 2GB in your test case. You can see that > G1HeapSizingPolicy::can_shrink_heap_size_to is a bit conservative > and we may be able to make it more aggressive. > > > For an almost idle application which doesn't have a GC for a > rather long time, the shrink cannot happen. In our previous 8u > patch, we have a timer to trigger GC and the softmx is changed by > a jcmd which will also trigger a GC (there was no SoftMaxHeapSize option > in 8u yet). Shall we introduce a timer GC as well? > Please give the functionality JEP 346 added a try if you haven't. It should achieve what you suggest except that Min/MaxHeapFreeRatio may prevent G1 from achieving the compact heap you expect (again). Min/MaxHeapFreeRatio were changed to be manageable exactly for this reason, i.e. if you are idle, and your control center knows that the machine is going to be idle, instead of adjusting (in this case) SoftMaxHeapSize it may as well set Min/MaxHeapFreeRatio to low values and JEP 346 would do the rest. Before JEP 346 you needed to send a manual system.gc in addition. So a simpler solution than the one suggested by you would be to just drop usage of Min/MaxHeapFreeRatio and/or incorporate SoftMaxHeapSize in the uncommit at remark in your case and let the JEP 346 functionality do its job. If JEP 346 does not work for your use case, we are eager to hear back from you about your experience.
We do know that it may be a little bit too much focused on what "idle" is, but that can be tweaked. The reason I am suggesting to try JEP 346 is that from my understanding the suggested implementation seems to cover only exactly the same case as JEP 346, but only with side effects e.g. - causing commit/uncommit ping-pong if the application is slightly active at worst, and no effect at best. While concurrent uncommit tries to mitigate this (and it is still very interesting to do), doing less commit/uncommit in the first place seems better. - not covering e.g. the case where an existing Remark finishes after the last GC that decreased the heap to SoftMaxHeapSize even in the idle case (could be fixed as you mentioned above with a timer, but JEP 346 covers this already) - only limited to reducing heap to SoftMaxHeapSize (why? Fixed as you said you were thinking about a more aggressive policy) In a SoftMaxHeapSize solution in the JVM that I envision, the change should cover a wide(r) range of usage scenarios. We need to look a bit further than this single use case (which afaict G1 should already handle). In the case you need a real hard limit I recommend looking at implementing that. There has been a proposal to do so some time ago, but it is inactive at this time [0]. > > Honestly, I don't think Min/MaxHeapFreeRatio is a good way to determine > the heap expand/shrink in G1 and in our 8u practical experience we never > have full GC so Min/MaxHeapFreeRatio is useless. Here when I reproduce > your test, the only exception is the heap will expand to 6GB after > shrinking to SoftMaxHeapSize=5g is because in remark we will resize the > heap. > BTW, I don't think remark is a good point to resize heap since in remark > phase regions full of garbage haven't been reclaimed yet. IMHO we even don't > need to resize in remark but just resize after mixed GC according to > GCTimeRatio.
> > Your change to make SoftMaxHeapSize sensible in adaptive IHOP controlling > seems a similar approach as ZGC. ZGC is a single generation GC whose > scenario > is much simpler. Maybe we don't need SoftMaxHeapSize to guide GC decision > in G1. Since we already have policy to determine the shrink of the heap > by SoftMaxHeapSize, I'm not sure if we need to make adaptive IHOP according > to SoftMaxHeapSize... We may encounter the situation that we cannot > shrink the > heap size to SoftMaxHeapSize but concurrent mark become frequent after > affecting > the IHOP policy. ZGC will be generational at some point. This has been on its roadmap since the beginning. Also, there is not much difference as you can see from the patch. The difference is currently 1 LOC to set young gen sizes in addition to the heap goal. I also thought about the last point, i.e. when the user sets SoftMaxHeapSize too low, then you get continuous marking cycles. My answer to the user would be that, well, feel free to shoot yourselves in the foot, but compared to an OOME with a hard limit, this behavior seems much better (but there are certainly situations where a hard limit is better for someone so both seem useful). Ultimately the only thing I can say is that there is no free lunch in the throughput/latency/memory triangle, but there may be situations where memory is more important than performance too (widening the appeal of SoftMaxHeapSize). In the test I gave, the 2g goal is maybe too low for this case, but the 3g (instead of 3.8g) looks really attractive (and G1 seems to find an "optimal" size of 2.2-2.8g at that point; I think I found the reason for the spikes above 3g and am looking into testing a fix).
The implementation suggested by me does not affect the idle case at all; JEP 346 functionality will clean up and compact the heap nicely (you would still need to fix the shrinking amount in the sizing policy, but we already agreed that it is not good, and that doing the evaluation at remark isn't the best idea either - but both are separate issues). > >> In the log I have, the problem seems to be that we are re-setting the >> softmaxheapsize within the space reclamation phase (i.e. mixed gc) and >> G1 sizing policies got confused, i.e. it partially keeps on using the 2g >> goal for young gen sizing until the *2 problem expands it. That's a bug >> and needs to be fixed. > > I don't think it's a problem that after mixed GC > resize_heap_after_young_collection > will evaluate if the heap can be shrunk to the new value of > SoftMaxHeapSize. Resizing (to SoftMaxHeapSize) after every gc will shrink and expand all the time unnecessarily. I.e. you expand one GC, the next gc it may happen that G1 can shrink to SoftMaxHeapSize again (e.g. because eager reclaim freed a lot), next gc G1 commits again because of failed pause time goal (or just commit during humongous allocation which can be immediately reversed because of eager reclaim). Even with concurrent uncommit, such behavior seems a waste of time. Imho with concurrent (un-)commit unnecessary resizing should be avoided if possible. One option is to base that decision on the value that adaptive IHOP gives you. It seems a very good start but there may be better approaches.
Fixed percentages like Min/MaxFreeRatio are too simple as it seems :) Thanks, Thomas [0] https://bugs.openjdk.java.net/browse/JDK-8204088 [1] https://bugs.openjdk.java.net/browse/JDK-8204089 [2] https://bugs.openjdk.java.net/browse/JDK-6490394 [3] https://bugs.openjdk.java.net/browse/JDK-6490394?focusedCommentId=14283475&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14283475 (only just noticed) From thomas.schatzl at oracle.com Fri Feb 7 11:09:46 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 7 Feb 2020 12:09:46 +0100 Subject: [Rare case] G1 mixed GC didn't reclaim garbages in 8u In-Reply-To: <2258aa97-c360-47f1-96d9-8a7ca98b2461.maoliang.ml@alibaba-inc.com> References: <2258aa97-c360-47f1-96d9-8a7ca98b2461.maoliang.ml@alibaba-inc.com> Message-ID: <3cafc27b-67bd-5f52-aa7d-3638104871c8@oracle.com> Hi, On 07.02.20 06:39, Liang Mao wrote: > Hi All, > > I saw a rare case that G1 almost clear nothing in mixed GC but later full GC > reclaimed 70% of the heap. The version is 8u and is there any bug or is it > an extreme case of floating garbage because of SATB? hard to say. It may just be the application keeping data alive as you indicate. I am not aware of a particular jdk8 bug that keeps objects alive unnecessarily. G1LogLevel=finest would give answer to why the mixed phase stopped early. It would not give insight about what exactly kept the data alive though. 
Thanks, Thomas From maoliang.ml at alibaba-inc.com Mon Feb 10 11:47:06 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Mon, 10 Feb 2020 19:47:06 +0800 Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com> References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com> <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com> <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com>, <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com> Message-ID: <97f72395-af73-41f7-98f3-8e22cce5b79b.maoliang.ml@alibaba-inc.com> Hi Thomas, In my testing, I didn't change the value of Min/MaxHeapFreeRatio. The heap had already shrunk to 5GB, but at remark it expanded to 6644M. The default value of MinHeapFreeRatio is 40, so the minimal commit size after remark is the used heap size * 1.67 (3979M * 1.67 = 6644M). 1.67 = 100/(100 - 40) [1031.322s][info][gc ] GC(741) Pause Young (Concurrent Start) (G1 Evacuation Pause) 4724M->4506M(5120M) 10.607ms [1031.322s][info][gc,cpu ] GC(741) User=0.42s Sys=0.00s Real=0.01s [1031.322s][info][gc ] GC(742) Concurrent Cycle [1031.322s][info][gc,marking ] GC(742) Concurrent Clear Claimed Marks [1031.322s][info][gc,marking ] GC(742) Concurrent Clear Claimed Marks 0.066ms [1031.322s][info][gc,marking ] GC(742) Concurrent Scan Root Regions [1031.322s][info][gc,stringdedup ] Concurrent String Deduplication (1031.322s) [1031.323s][info][gc,stringdedup ] Concurrent String Deduplication 14224.0B->0.0B(14224.0B) avg 51.1% (1031.322s, 1031.323s) 0.514ms [1031.326s][info][gc,marking ] GC(742) Concurrent Scan Root Regions 3.939ms [1031.326s][info][gc,marking ] GC(742) Concurrent Mark (1031.326s) [1031.326s][info][gc,marking ] GC(742) Concurrent Mark From Roots [1031.326s][info][gc,task ] GC(742) Using 16 workers of 16 for marking [1031.483s][info][gc,marking ] GC(742) Concurrent Mark From Roots 157.144ms
[1031.483s][info][gc,marking ] GC(742) Concurrent Preclean [1031.484s][info][gc,marking ] GC(742) Concurrent Preclean 0.404ms [1031.484s][info][gc,marking ] GC(742) Concurrent Mark (1031.326s, 1031.484s) 157.587ms [1031.485s][info][gc,start ] GC(742) Pause Remark [1031.496s][info][gc ] GC(742) Pause Remark 4625M->3979M(6644M) 10.953ms [1031.496s][info][gc,cpu ] GC(742) User=0.22s Sys=0.04s Real=0.01s In our production environment, we never use JEP 346, mainly because of the JDK version. So I cannot tell whether it would work. I agree the "idle" issue is not our main focus for now. Using SoftMaxHeapSize to guide adaptive IHOP in making decisions about the concurrent mark GC cycle can work well with JEP 346 and the resize logic in remark. I don't insist on shrinking the heap at every GC. The capacity in resize_heap_if_necessary will be Max2(min_desire_capacity_by_MinHeapFreeRatio, Min2(soft_max_capacity(), max_desire_capacity_by_MaxHeapFreeRatio)) But both approaches have the problem that the default MinHeapFreeRatio is too large in remark compared to full gc, as resize_heap_if_necessary will keep a minimum heap size of 1.667x the used heap size. After remark, the used size can be large because it includes not only those old regions with garbage but also the used young regions. ############################# void G1CollectedHeap::resize_heap_if_necessary() { ... const size_t capacity_after_gc = capacity(); const size_t used_after_gc = capacity_after_gc - unused_committed_regions_in_bytes(); ############################# The used_after_gc is reasonable for full gc but it can contain young regions in remark. Do you think it should be changed like this? ############################# const size_t used_after_gc = capacity_after_gc - unused_committed_regions_in_bytes() - young_regions_count() * HeapRegion::GrainWords; // young_regions_count is 0 after full GC #############################
But arbitrarily setting a fixed number seems is not a good way that the small number may not meet pause time goal in later young GC. I tried to use following number in resize_heap_if_necessary: ############################## void G1CollectedHeap::resize_heap_if_necessary() { ... // We can now safely turn them into size_t's. size_t minimum_desired_capacity = (size_t) minimum_desired_capacity_d; size_t maximum_desired_capacity = (size_t) maximum_desired_capacity_d; if (!collector_state()->in_full_gc()) { minimum_desired_capacity = MIN2(minimum_desired_capacity, policy()->minimum_desired_bytes(used_after_gc)); } ....size_t G1Policy::minimum_desired_bytes(size_t used_bytes) const { return _ihop_control->unrestrained_young_size() != 0 ? _ihop_control->unrestrained_young_size() : _young_list_max_length * HeapRegion::GrainBytes + _reserve_regions * HeapRegion::GrainBytes + used_bytes; } ############################# I made the minimum_desired_capacity small enough based on adaptive IHOP's _last_unrestrained_young_size. Even without SoftMaxHeapSize, the test can keep the memory under 3GB. It's a rough example and I didn't predict the promotion bytes of next young gc yet. Do you think a proper value of minimum_desired_capacity in remark resize + G1AdaptiveIHOPControl::actual_target_threshold according to soft_max_capacity is enough? Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 7 (Fri.) 19:09 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On 06.02.20 13:27, Liang Mao wrote: > Hi Thomas, > > Thanks for the testing and evaluating! > > I tried your test with specjbb2015 and had some little different > result maybe because of machine capability. 
The config I used is as below: > -Xmx8g -Xms2g -Xlog:gc* -XX:GCTimeRatio=4 > -XX:+UseStringDeduplication > -Dspecjbb.comm.connect.type=HTTP_Jetty > -Dspecjbb.controller.type=PRESET > -Dspecjbb.controller.preset.ir=5000 > -Dspecjbb.controller.preset.duration=10800000 > > The heap was around 6GB after running for a while (300s). And > I was able to use SoftMaxHeapSize to let it shrink to 5GB. It > should be like your scenario to shrink the heap to 3GB > > The behavior is as I expected. But I thought you might expect > more aggressive result. In my mind, for a constant load, > the jvm might not need to shrink the heap that JVM supposes to expand > the heap to the right capacity. Did you change Min/MaxHeapFreeRatio for your test? It does not look like that, as I get roughly the same results if I don't. Given that we agree that it is wrong to use Min/MaxHeapFreeRatio during Remark, the observation is interesting, but does not seem to help here except reinforcing that Min/MaxHeapFreeRatio are not a good thing to use. Also, I doubt that G1's current heap size selection is optimal. Some reasons off my head: - Min/MaxHeapFreeRatio has been chosen to avoid uncommit/commit ping-pong and frequent (un-)commits (i.e. performance), not heap compactness. - adaptive IHOP (or at least the knowledge about expected amount of memory used during gc operation) has not been available, hence the very conservative values. - the values have been chosen long before the uncommit at remark [2] has been implemented. As author of that change I can authoratively say that fixing the policy had been out of scope for that change ;) however it had been needed for JEP 346 Promptly Uncommit unused memory [1] to do *something* without disrupting existing behavior too much to avoid lengthy re-evaluation of sizing policies. The logic went something like: what concurrent mark does roughly equals full gc, so do the same sizing as during full gc. End. 
- there is (rough) consensus that Min/MaxHeapFreeRatio is/has been a bad idea, starting from the naming. ZGC and Shenandoah do not use it afaict. - optimal heap size depends on application phase (e.g. startup/operation/idle). Min/MaxHeapFreeRatio default values basically prevent shrinking in many cases. Sometimes they even expand the heap [3]. Given the high default value of MinHeapFreeRatio, G1 will most likely end up using too much memory. I.e. we apply MinHeapFreeRatio at Remark, which means that the heap size will be kept at heap size at Remark + 40%. Given that Remark is where heap usage almost peaked anyway, you get a really large commit size. Unnecessarily large because (beginning with modestly large heaps in few GBs) the actual peak memory usage *at optimal operation* is what adaptive IHOP determined. This is typically a lot less than 40% of existing usage at Remark. So G1 keeps a lot of memory around for no reason. This can be particularly significant in large heaps (say, double digit GB) where those 40% can be a lot in absolute terms while G1 only ever uses single digit additional GB during the cycle. In my tests, e.g. the suggested 10% seem sufficient for that particular case. We also agree that uncommit at end of mixed gc is probably better, but again, how much do you uncommit? To keep as much as you expect to not use would be a good start, maybe a bit more. Not less, because then you are going to do an unnecessary commit during that cycle for sure. Currently the best idea about what we are going to need in the next time is given by the IHOP goal value imho. So overall, please do not read too much into existing heap sizing policy :) > The soft limit I imagine is > to bring the heap size down after a load pike. In Alibaba's > workload, the heap shrink is controlled by cluster's unified > control center which has the predicition data and the soft limit > works more like a *hard* limit in our 8u implementation. 
> > So I think it is acceptable that heap size failed shrinked > to 2GB in your test case. You can see that > G1HeapSizingPolicy::can_shrink_heap_size_to is a bit conservative > and we may be able to make it more aggressive. > > > For almost idle application which doesn't have a GC for a > rather long time, the shrink cannot happen. In our previous 8u > patch, we have a timer to trigger GC and the softmx is changed by > a jcmd which will also trigger a GC(there was no SoftMaxHeapSize option > in 8u yet). Shall we introduce a timer GC as well? > Please give the functionality JEP 346 added a try if you haven't. It should achieve what you suggest except that Min/MaxHeapFreeRatio may prevent G1 to achive the compact heap you expect (again). Min/MaxHeapFreeRatio were changed to be manageable exactly for this reason, i.e. if you are idle, and your control center knows that the machine is going to be idle, instead of adjusting (in this case) SoftMaxHeapSize it may as well set Min/MaxHeapFreeRatio to low values and JEP 346 would do the rest. Before JEP 346 you needed to send a manual system.gc in addition. So a simpler solution than the one suggested by you would be to just drop usage of Min/MaxHeapFreeRatio and/or incorporate SoftMaxHeapSize in the uncommit at remark in your case and let JEP 346 functionality its job. If JEP 346 does not work for your use case, we are eager to hear back from you about your experience. We do know that it may be a little bit too much focused on what "idle" is, but that can be tweaked. The reason I am suggesting to try JEP 346 is that from my understanding the suggested implementation seems to cover only exactly the same case as JEP 346, but only with side effects e.g. - causing commit/uncommit ping-pong if the application is slightly active at worst, and no effect at best. While concurrent uncommit tries to mitigate this (and it is still very interesting to do), doing less commit/uncommit in the first place seems better. - not covering e.g. 
the case where an existing Remark finishes after the last GC that
decreased the heap to SoftMaxHeapSize, even in the idle case (could be
fixed, as you mentioned above, with a timer, but JEP 346 covers this
already)

- only limited to reducing the heap to SoftMaxHeapSize (why? Fixed, as
you said you were thinking about a more aggressive policy)

In the SoftMaxHeapSize solution in the JVM that I envision, the change
should cover a wide(r) range of usage scenarios. We need to look a bit
further than this single use case (which afaict G1 should already
handle). In case you need a real hard limit, I recommend looking at
implementing that. There has been a proposal to do so some time ago,
but it is inactive at this time [0].

> Honestly, I don't think Min/MaxHeapFreeRatio is a good way to
> determine heap expansion/shrinking in G1, and in our 8u practical
> experience we never have full GCs, so Min/MaxHeapFreeRatio is
> useless. Here, when I reproduce your test, the only exception is that
> the heap will expand to 6GB after shrinking to SoftMaxHeapSize=5g,
> because at remark we will resize the heap.
> BTW, I don't think remark is a good point to resize the heap, since
> in the remark phase regions full of garbage haven't been reclaimed
> yet. IMHO we don't even need to resize at remark but should just
> resize after mixed GC according to GCTimeRatio.
>
> Your change to make SoftMaxHeapSize sensible in adaptive IHOP control
> seems a similar approach to ZGC's. ZGC is a single-generation GC
> whose scenario is much simpler. Maybe we don't need SoftMaxHeapSize
> to guide GC decisions in G1. Since we already have a policy to
> determine the shrinking of the heap by SoftMaxHeapSize, I'm not sure
> if we need to make adaptive IHOP depend on SoftMaxHeapSize... We may
> encounter the situation that we cannot shrink the heap size to
> SoftMaxHeapSize, but concurrent marking becomes frequent after
> affecting the IHOP policy.

ZGC will be generational at some point. This has been on its roadmap
since the beginning.
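For reference, the Min/MaxHeapFreeRatio sizing rule discussed in this
thread can be sketched as a small model. This is a deliberately
simplified illustration of the documented flag semantics (keep the
free-to-committed ratio between the two bounds after a collection,
clamped to -Xms/-Xmx), not HotSpot's actual code; the function name and
the GB-sized example values are made up for the sketch.

```python
# Simplified model of post-GC heap resizing driven by MinHeapFreeRatio /
# MaxHeapFreeRatio (given as percentages). All sizes in the same unit
# (e.g. GB). Not HotSpot code, just the documented policy in miniature.
def resize_after_gc(used, committed, min_free_ratio, max_free_ratio,
                    xms, xmx):
    assert 0 <= min_free_ratio <= max_free_ratio < 100
    # Smallest committed size that still leaves min_free_ratio % free.
    min_desired = used / (1.0 - min_free_ratio / 100.0)
    # Largest committed size that leaves at most max_free_ratio % free.
    max_desired = used / (1.0 - max_free_ratio / 100.0)
    if committed < min_desired:        # too little free space -> expand
        committed = min_desired
    elif committed > max_desired:      # too much free space -> shrink
        committed = max_desired
    # Clamp to the configured heap bounds (-Xms / -Xmx).
    return max(xms, min(xmx, committed))
```

With the default-like 40/70 ratios, a 6 GB committed heap holding 2 GB of
live data stays where it is; lowering both ratios (as a control center
could do at runtime, since the flags are manageable) makes the model
shrink toward the live set, which is the "idle" effect described above.
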
Also, there is not much difference, as you can see from the patch. The
difference is currently 1 LOC to set young gen sizes in addition to the
heap goal.

I also thought about the last point, i.e. when the user sets
SoftMaxHeapSize too low, then you get continuous marking cycles. My
answer to the user would be that, well, feel free to shoot yourself in
the foot, but compared to an OOME with a hard limit this behavior seems
much better (though there are certainly situations where a hard limit
is better for someone, so both seem useful). Ultimately the only thing
I can say is that there is no free lunch in the
throughput/latency/memory triangle, but there may be situations where
memory is more important than performance too (widening the appeal of
SoftMaxHeapSize).

In the test I gave, the 2g goal is maybe too low for this case, but the
3g (instead of 3.8g) looks really attractive (and G1 seems to find an
"optimal" size of 2.2-2.8g at that point; I think I found the reason
for the spikes above 3g and am looking into testing a fix).

The implementation suggested by me does not affect the idle case at
all; the JEP 346 functionality will clean up and compact the heap
nicely (you would still need to fix the shrinking amount in the sizing
policy, but we already agreed that it is not good, and that doing the
evaluation at remark isn't the best idea either - but both are separate
issues).

> >> In the log I have, the problem seems to be that we are re-setting
> >> the SoftMaxHeapSize within the space reclamation phase (i.e. mixed
> >> gc) and G1 sizing policies got confused, i.e. it partially keeps
> >> on using the 2g goal for young gen sizing until the *2 problem
> >> expands it. That's a bug and needs to be fixed.
>
> I don't think it's a problem that after mixed GC
> resize_heap_after_young_collection will evaluate whether the heap can
> be shrunk to the new value of SoftMaxHeapSize.

Resizing (to SoftMaxHeapSize) after every GC will shrink and expand all
the time unnecessarily. I.e.
you expand one GC; the next GC it may happen that G1 can shrink to
SoftMaxHeapSize again (e.g. because eager reclaim freed a lot); the
next GC G1 commits again because of a failed pause time goal (or just
commits during a humongous allocation, which can be immediately
reversed because of eager reclaim). Even with concurrent uncommit,
such behavior seems a waste of time. Imho, with concurrent (un-)commit,
unnecessary resizing should be avoided if possible. One option is to
base that decision on the value that adaptive IHOP gives you. It seems
a very good start, but there may be better approaches. Fixed
percentages like Min/MaxFreeRatio are too simple, it seems :)

Thanks,
  Thomas

[0] https://bugs.openjdk.java.net/browse/JDK-8204088
[1] https://bugs.openjdk.java.net/browse/JDK-8204089
[2] https://bugs.openjdk.java.net/browse/JDK-6490394
[3] https://bugs.openjdk.java.net/browse/JDK-6490394?focusedCommentId=14283475&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14283475
(only just noticed)

From m.sundar85 at gmail.com  Mon Feb 10 18:32:32 2020
From: m.sundar85 at gmail.com (Sundara Mohan M)
Date: Mon, 10 Feb 2020 13:32:32 -0500
Subject: Parallel GC Thread crash
In-Reply-To:
References:
Message-ID:

Hi Stefan,
    We started seeing more crashes on JDK 13.0.1+9. Since we are seeing
it on a GC task thread, we assumed it is related to GC.

# Problematic frame:
# V  [libjvm.so+0xd183c0]  PSRootsClosure::do_oop(oopDesc**)+0x30

Command Line: -XX:+AlwaysPreTouch -Xms64000m -Xmx64000m
-XX:NewSize=40000m -XX:+DisableExplicitGC -Xnoclassgc -XX:+UseParallelGC
-XX:ParallelGCThreads=40 -XX:ConcGCThreads=5 ...
Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red
Hat Enterprise Linux Server release 6.10 (Santiago)
Time: Fri Feb  7 11:15:04 2020 UTC elapsed time: 286290 seconds (3d 7h 31m 30s)

---------------  T H R E A D  ---------------

Current thread (0x00007fca6c074000):  GCTaskThread "ParGC Thread#28"
[stack: 0x00007fba72ff1000,0x00007fba730f1000] [id=56530]

Stack: [0x00007fba72ff1000,0x00007fba730f1000],  sp=0x00007fba730ee850,  free space=1014k
Native frames: (J=compiled Java code, A=aot compiled Java code,
j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xd183c0]  PSRootsClosure::do_oop(oopDesc**)+0x30
V  [libjvm.so+0xc6bf0b]  OopMapSet::oops_do(frame const*, RegisterMap const*, OopClosure*)+0x2eb
V  [libjvm.so+0x765489]  frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0x99
V  [libjvm.so+0xf68b17]  JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x187
V  [libjvm.so+0xd190be]  ThreadRootsTask::do_it(GCTaskManager*, unsigned int)+0x6e
V  [libjvm.so+0x7f422b]  GCTaskThread::run()+0x1eb
V  [libjvm.so+0xf707fd]  Thread::call_run()+0x10d
V  [libjvm.so+0xc875b7]  thread_native_entry(Thread*)+0xe7

JavaThread 0x00007fb8f4036800 (nid = 60927) was being processed
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v  ~RuntimeStub::_new_array_Java
J 58520 c2 ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V (207 bytes) @ 0x00007fca5fd23dec [0x00007fca5fd1dbc0+0x000000000000622c]
J 66864 c2 webservice.exception.ExceptionLoggingWrapper.execute()V (1004 bytes) @ 0x00007fca60c02588 [0x00007fca60bffce0+0x00000000000028a8]
J 58224 c2 webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (105 bytes) @ 0x00007fca5f59bad8 [0x00007fca5f59b880+0x0000000000000258]
J 69992 c2 webservice.exception.mapper.JediRequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (9 bytes) @ 0x00007fca5e1019f4
[0x00007fca5e101940+0x00000000000000b4]
J 55265 c2 webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; (332 bytes) @ 0x00007fca5f6f58e0 [0x00007fca5f6f5700+0x00000000000001e0]
J 483122 c2 webservice.filters.ResponseSerializationWorker.execute()Z (272 bytes) @ 0x00007fca622fc2b4 [0x00007fca622fbc80+0x0000000000000634]
J 15811% c2 com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; (486 bytes) @ 0x00007fca5c108794 [0x00007fca5c1082a0+0x00000000000004f4]
j  com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1
J 4586 c1 java.util.concurrent.FutureTask.run()V java.base@13.0.1 (123 bytes) @ 0x00007fca54d27184 [0x00007fca54d26b00+0x0000000000000684]
J 7550 c1 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base@13.0.1 (187 bytes) @ 0x00007fca54fbb6d4 [0x00007fca54fba8e0+0x0000000000000df4]
J 7549 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V java.base@13.0.1 (9 bytes) @ 0x00007fca5454b93c [0x00007fca5454b8c0+0x000000000000007c]
J 4585 c1 java.lang.Thread.run()V java.base@13.0.1 (17 bytes) @ 0x00007fca54d250f4 [0x00007fca54d24fc0+0x0000000000000134]
v  ~StubRoutines::call_stub

siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000

Do JDK 11 and 13 have different GC code? Do you think downgrading (to
the stable JDK 11) or upgrading (to JDK 13.0.2) might help here? Any
insight to debug this will be helpful.

TIA
Sundar

On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson wrote:

> Hi Sundar,
>
> The GC crashes when it encounters something bad on the stack:
>
> > V  [libjvm.so+0xc6bf0b]  OopMapSet::oops_do(frame const*, RegisterMap
> > const*, OopClosure*)+0x2eb
> > V  [libjvm.so+0x765489]  frame::oops_do_internal(OopClosure*,
>
> This is probably not a GC bug. It's more likely that this is caused by
> the JIT compiler.
> I see in your hotspot-runtime-dev thread that you
> also get crashes in other compiler related areas.
>
> If you want to rule out the GC, you can run with -XX:+VerifyBeforeGC and
> -XX:+VerifyAfterGC, and see if this asserts before the GC has started
> running.
>
> StefanK
>
> On 2020-02-04 04:38, Sundara Mohan M wrote:
> > Hi,
> >     I am seeing following crashes frequently on our servers
> > #
> > # A fatal error has been detected by the Java Runtime Environment:
> > #
> > #  SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299
> > #
> > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9)
> > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel gc, linux-amd64)
> > # Problematic frame:
> > # V  [libjvm.so+0xcd3311]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> > #
> > # No core dump will be written. Core dumps have been disabled. To enable
> > # core dumping, try "ulimit -c unlimited" before starting Java again
> > #
> > # If you would like to submit a bug report, please visit:
> > #   https://github.com/AdoptOpenJDK/openjdk-build/issues
> > #
> >
> > ---------------  T H R E A D  ---------------
> >
> > Current thread (0x00007fca2c051000):  GCTaskThread "ParGC Thread#8" [stack: 0x00007fca30277000,0x00007fca30377000] [id=108299]
> >
> > Stack: [0x00007fca30277000,0x00007fca30377000],  sp=0x00007fca30374890,  free space=1014k
> > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
> > V  [libjvm.so+0xcd3311]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> > V  [libjvm.so+0xc6bf0b]  OopMapSet::oops_do(frame const*, RegisterMap const*, OopClosure*)+0x2eb
> > V  [libjvm.so+0x765489]  frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0x99
> > V  [libjvm.so+0xf68b17]  JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x187
> > V  [libjvm.so+0xcce2f0]  ThreadRootsMarkingTask::do_it(GCTaskManager*, unsigned int)+0xb0
> > V
> > [libjvm.so+0x7f422b]  GCTaskThread::run()+0x1eb
> > V  [libjvm.so+0xf707fd]  Thread::call_run()+0x10d
> > V  [libjvm.so+0xc875b7]  thread_native_entry(Thread*)+0xe7
> >
> > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed
> > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> > v  ~RuntimeStub::_new_array_Java
> > J 225122 c2 ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V (207 bytes) @ 0x00007fca21f1a5d8 [0x00007fca21f17f20+0x00000000000026b8]
> > J 62342 c2 webservice.exception.ExceptionLoggingWrapper.execute()V (1004 bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88]
> > J 225129 c2 webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (105 bytes) @ 0x00007fca1da512ac [0x00007fca1da51100+0x00000000000001ac]
> > J 131643 c2 webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; (9 bytes) @ 0x00007fca20ce6190 [0x00007fca20ce60c0+0x00000000000000d0]
> > J 55114 c2 webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; (332 bytes) @ 0x00007fca2051fe64 [0x00007fca2051f820+0x0000000000000644]
> > J 57859 c2 webservice.filters.ResponseSerializationWorker.execute()Z (272 bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8]
> > J 16114% c2 com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; (486 bytes) @ 0x00007fca1ced465c [0x00007fca1ced4200+0x000000000000045c]
> > j  com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1
> > J 11639 c2 java.util.concurrent.FutureTask.run()V java.base@13.0.1 (123 bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098]
> > J 7560 c1
> > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base@13.0.1 (187 bytes) @ 0x00007fca15b23f54 [0x00007fca15b23160+0x0000000000000df4]
> > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V java.base@13.0.1 (9 bytes) @ 0x00007fca15b39abc [0x00007fca15b39a40+0x000000000000007c]
> > J 4488 c1 java.lang.Thread.run()V java.base@13.0.1 (17 bytes) @ 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134]
> > v  ~StubRoutines::call_stub
> >
> > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
> >
> > Register to memory mapping:
> > ...
> >
> > Can someone shed more light on when this can happen? I am seeing
> > this on multiple servers with Java 13.0.1+9 on RHEL6 servers.
> >
> > There was another thread on hotspot-runtime-dev where David Holmes
> > pointed to this:
> >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
> >
> >> This seems it may be related to:
> >> https://bugs.openjdk.java.net/browse/JDK-8004124
> >
> > Just wondering if this is the same issue or something GC specific.
> >
> > TIA
> > Sundar

From sangheon.kim at oracle.com  Mon Feb 10 18:59:24 2020
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Mon, 10 Feb 2020 10:59:24 -0800
Subject: RFR (S): 8238160: Uniformize Parallel GC task queue variable names
In-Reply-To: <8d350538-9a82-b420-e7de-319edaf8605c@oracle.com>
References: <8d350538-9a82-b420-e7de-319edaf8605c@oracle.com>
Message-ID:

Hi Thomas,

On 1/30/20 3:08 AM, Thomas Schatzl wrote:
> Hi all,
>
>   can I have reviews for this small change that moves some global
> typedefs used only by Parallel GC from taskqueue.hpp to parallel gc
> files, and further makes naming of instances of these more uniform?
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8238160
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8238160/webrev/

Looks good to me.
If you are interested, the copyright year can be updated. I don't need
a new webrev for this.

Thanks,
Sangheon

> Testing:
> local compilation
>
> Thanks,
>   Thomas

From ecki at zusammenkunft.net  Mon Feb 10 19:29:26 2020
From: ecki at zusammenkunft.net (Bernd Eckenfels)
Date: Mon, 10 Feb 2020 19:29:26 +0000
Subject: Parallel GC Thread crash
In-Reply-To:
References:
Message-ID:

Hello,

not an answer, but just a question:

> -XX:+UseParallelGC -XX:ParallelGCThreads=40 -XX:ConcGCThreads=5

what part of ParallelGC is controlled by the concurrent threads setting?

Regards
Bernd
--
http://bernd.eckenfels.net

________________________________
From: hotspot-gc-dev on behalf of Sundara Mohan M
Sent: Monday, February 10, 2020 7:33 PM
To: Stefan Karlsson
Cc: hotspot-gc-dev at openjdk.java.net
Subject: Re: Parallel GC Thread crash
From stefan.karlsson at oracle.com  Mon Feb 10 19:42:49 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 10 Feb 2020 20:42:49 +0100
Subject: Parallel GC Thread crash
In-Reply-To:
References:
Message-ID: <5454bc87-1452-1402-3496-c3c8f128a499@oracle.com>

Hi Sundar,

On 2020-02-10 19:32, Sundara Mohan M wrote:
> Hi Stefan,
>     We started seeing more crashes on JDK13.0.1+9
>
> Since seeing it on GC Task Thread assumed it is related to GC.

As I said in my previous mail, I don't think this is caused by GC code.
More below.
> Does JDK11 and 13 have different code for GC. Do you think
> downgrading(JDK11 stable)/upgrading(JDK-13.0.2) might help here?
You should at least move to 13.0.2, to get the latest bug
fixes/patches. There have been a lot of changes in all areas of the JVM
between 11 and 13. We don't yet know the root cause of this crash, and
I can't say if this is caused by new changes or not. Have you or anyone
filed a bug report for this?

> Any insight to debug this will be helpful.

Did you try my previous suggestion to run with -XX:+VerifyBeforeGC and
-XX:+VerifyAfterGC? If you can tolerate the longer GC times it
introduces, then you could try to run with
-XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC -XX:+VerifyAfterGC.

Cheers,
StefanK
> To enable > > core dumping, try "ulimit -c unlimited" before starting Java again > > # > > # If you would like to submit a bug report, please visit: > > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > # > > > > > > ---------------? T H R E A D? --------------- > > > > Current thread (0x00007fca2c051000):? GCTaskThread "ParGC > Thread#8" [stack: > > 0x00007fca30277000,0x00007fca30377000] [id=108299] > > > > Stack: [0x00007fca30277000,0x00007fca30377000], > sp=0x00007fca30374890, > >? ?free space=1014k > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > j=interpreted, Vv=VM code, C=native code) > > V? [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > V? [libjvm.so+0xc6bf0b]? OopMapSet::oops_do(frame const*, > RegisterMap > > const*, OopClosure*)+0x2eb > > V? [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > V? [libjvm.so+0xf68b17]? JavaThread::oops_do(OopClosure*, > > CodeBlobClosure*)+0x187 > > V? [libjvm.so+0xcce2f0] > ThreadRootsMarkingTask::do_it(GCTaskManager*, > > unsigned int)+0xb0 > > V? [libjvm.so+0x7f422b]? GCTaskThread::run()+0x1eb > > V? [libjvm.so+0xf707fd]? Thread::call_run()+0x10d > > V? [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > v? 
~RuntimeStub::_new_array_Java > > J 225122 c2 > > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > (207 bytes) @ 0x00007fca21f1a5d8 > [0x00007fca21f17f20+0x00000000000026b8] > > J 62342 c2 > webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > > bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88] > > J 225129 c2 > > > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > (105 bytes) @ 0x00007fca1da512ac > [0x00007fca1da51100+0x00000000000001ac] > > J 131643 c2 > > > webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > (9 bytes) @ 0x00007fca20ce6190 > [0x00007fca20ce60c0+0x00000000000000d0] > > J 55114 c2 > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > (332 bytes) @ 0x00007fca2051fe64 > [0x00007fca2051f820+0x0000000000000644] > > J 57859 c2 > webservice.filters.ResponseSerializationWorker.execute()Z (272 > > bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8] > > J 16114% c2 > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > (486 bytes) @ 0x00007fca1ced465c > [0x00007fca1ced4200+0x000000000000045c] > > j > > > ?com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > J 11639 c2 java.util.concurrent.FutureTask.run()V > java.base at 13.0.1 (123 > > bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098] > > J 7560 c1 > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 > > [0x00007fca15b23160+0x0000000000000df4] > > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc > > 
[0x00007fca15b39a40+0x000000000000007c]
> > J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @
> > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134]
> > v  ~StubRoutines::call_stub
> >
> > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> > 0x0000000000000000
> >
> > Register to memory mapping:
> > ...
> >
> > Can someone shed more info on when this can happen? I am seeing this on
> > multiple servers with Java 13.0.1+9 on RHEL6 servers.
> >
> > There was another thread in hotspot runtime where David Holmes pointed
> > this out:
> >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> > 0x0000000000000000
> >
> >> This seems it may be related to:
> >> https://bugs.openjdk.java.net/browse/JDK-8004124
> >
> > Just wondering if this is the same issue or something GC-specific.
> >
> >
> >
> > TIA
> > Sundar
> >

From m.sundar85 at gmail.com  Mon Feb 10 19:44:35 2020
From: m.sundar85 at gmail.com (Sundara Mohan M)
Date: Mon, 10 Feb 2020 14:44:35 -0500
Subject: Parallel GC Thread crash
In-Reply-To: 
References: 
Message-ID: 

I believe it is not used in the case of Parallel GC.
We were experimenting with ZGC with these settings, and the flag is still
there.

Thanks
Sundar

On Mon, Feb 10, 2020 at 2:36 PM Bernd Eckenfels 
wrote:

> Hello,
>
> not an answer, but just a question,
>
> > -XX:+UseParallelGC -XX:ParallelGCThreads=40 -XX:ConcGCThreads=5
>
> what part of ParallelGC is controlled by the concurrent threads setting?
>
> Regards
> Bernd
> --
> http://bernd.eckenfels.net
> ________________________________
> From: hotspot-gc-dev  on behalf of
> Sundara Mohan M
> Sent: Monday, February 10, 2020 7:33 PM
> To: Stefan Karlsson
> Cc: hotspot-gc-dev at openjdk.java.net
> Subject: Re: Parallel GC Thread crash
>
> Hi Stefan,
> We started seeing more crashes on JDK13.0.1+9
>
> Since we are seeing it on a GC Task Thread, we assumed it is related to GC. 
> > # Problematic frame: > # V [libjvm.so+0xd183c0] PSRootsClosure::do_oop(oopDesc**)+0x30 > > Command Line: -XX:+AlwaysPreTouch -Xms64000m -Xmx64000m -XX:NewSize=40000m > -XX:+DisableExplicitGC -Xnoclassgc -XX:+UseParallelGC > -XX:ParallelGCThreads=40 -XX:ConcGCTh > reads=5 ... > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat > Enterprise Linux Server release 6.10 (Santiago) > Time: Fri Feb 7 11:15:04 2020 UTC elapsed time: 286290 seconds (3d 7h 31m > 30s) > > --------------- T H R E A D --------------- > > Current thread (0x00007fca6c074000): GCTaskThread "ParGC Thread#28" > [stack: 0x00007fba72ff1000,0x00007fba730f1000] [id=56530] > > Stack: [0x00007fba72ff1000,0x00007fba730f1000], sp=0x00007fba730ee850, > free space=1014k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xd183c0] PSRootsClosure::do_oop(oopDesc**)+0x30 > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > const*, OopClosure*)+0x2eb > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > CodeBlobClosure*, RegisterMap*, bool)+0x99 > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > CodeBlobClosure*)+0x187 > V [libjvm.so+0xd190be] ThreadRootsTask::do_it(GCTaskManager*, unsigned > int)+0x6e > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > JavaThread 0x00007fb8f4036800 (nid = 60927) was being processed > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > v ~RuntimeStub::_new_array_Java > J 58520 c2 > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > (207 bytes) @ 0x00007fca5fd23dec [0x00007fca5fd1dbc0+0x000000000000622c] > J 66864 c2 webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > bytes) @ 0x00007fca60c02588 [0x00007fca60bffce0+0x00000000000028a8] > J 58224 c2 > > 
webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (105 bytes) @ 0x00007fca5f59bad8 [0x00007fca5f59b880+0x0000000000000258] > J 69992 c2 > > webservice.exception.mapper.JediRequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > (9 bytes) @ 0x00007fca5e1019f4 [0x00007fca5e101940+0x00000000000000b4] > J 55265 c2 > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > (332 bytes) @ 0x00007fca5f6f58e0 [0x00007fca5f6f5700+0x00000000000001e0] > J 483122 c2 webservice.filters.ResponseSerializationWorker.execute()Z (272 > bytes) @ 0x00007fca622fc2b4 [0x00007fca622fbc80+0x0000000000000634] > J 15811% c2 > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > (486 bytes) @ 0x00007fca5c108794 [0x00007fca5c1082a0+0x00000000000004f4] > j > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > J 4586 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > bytes) @ 0x00007fca54d27184 [0x00007fca54d26b00+0x0000000000000684] > J 7550 c1 > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > java.base at 13.0.1 (187 bytes) @ 0x00007fca54fbb6d4 > [0x00007fca54fba8e0+0x0000000000000df4] > J 7549 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > java.base at 13.0.1 (9 bytes) @ 0x00007fca5454b93c > [0x00007fca5454b8c0+0x000000000000007c] > J 4585 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > 0x00007fca54d250f4 [0x00007fca54d24fc0+0x0000000000000134] > v ~StubRoutines::call_stub > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > 0x0000000000000000 > > Does JDK11 and 13 have different code for GC. Do you think > downgrading(JDK11 stable)/upgrading(JDK-13.0.2) might help here? 
> > Any insight to debug this will be helpful. > > TIA > Sundar > > On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson > > wrote: > > > Hi Sundar, > > > > The GC crashes when it encounters something bad on the stack: > > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > > > const*, OopClosure*)+0x2eb > > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > > This is probably not a GC bug. It's more likely that this is caused by > > the JIT compiler. I see in your hotspot-runtime-dev thread, that you > > also get crashes in other compiler related areas. > > > > If you want to rule out the GC, you can run with -XX:+VerifyBeforeGC and > > -XX:+VerifyAfterGC, and see if this asserts before the GC has started > > running. > > > > StefanK > > > > On 2020-02-04 04:38, Sundara Mohan M wrote: > > > Hi, > > > I am seeing following crashes frequently on our servers > > > # > > > # A fatal error has been detected by the Java Runtime Environment: > > > # > > > # SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299 > > > # > > > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9) > > > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, > > parallel > > > gc, linux-amd64) > > > # Problematic frame: > > > # V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > > # > > > # No core dump will be written. Core dumps have been disabled. 
To > enable > > > core dumping, try "ulimit -c unlimited" before starting Java again > > > # > > > # If you would like to submit a bug report, please visit: > > > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > > # > > > > > > > > > --------------- T H R E A D --------------- > > > > > > Current thread (0x00007fca2c051000): GCTaskThread "ParGC Thread#8" > > [stack: > > > 0x00007fca30277000,0x00007fca30377000] [id=108299] > > > > > > Stack: [0x00007fca30277000,0x00007fca30377000], sp=0x00007fca30374890, > > > free space=1014k > > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > > j=interpreted, Vv=VM code, C=native code) > > > V [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > > > const*, OopClosure*)+0x2eb > > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > > > CodeBlobClosure*)+0x187 > > > V [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*, > > > unsigned int)+0xb0 > > > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > > > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > > > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > > > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed > > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > > v ~RuntimeStub::_new_array_Java > > > J 225122 c2 > > > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > > (207 bytes) @ 0x00007fca21f1a5d8 > [0x00007fca21f17f20+0x00000000000026b8] > > > J 62342 c2 webservice.exception.ExceptionLoggingWrapper.execute()V > (1004 > > > bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88] > > > J 225129 c2 > > > > > > 
webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (105 bytes) @ 0x00007fca1da512ac > [0x00007fca1da51100+0x00000000000001ac] > > > J 131643 c2 > > > > > > webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (9 bytes) @ 0x00007fca20ce6190 [0x00007fca20ce60c0+0x00000000000000d0] > > > J 55114 c2 > > > > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > > (332 bytes) @ 0x00007fca2051fe64 > [0x00007fca2051f820+0x0000000000000644] > > > J 57859 c2 webservice.filters.ResponseSerializationWorker.execute()Z > (272 > > > bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8] > > > J 16114% c2 > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > > (486 bytes) @ 0x00007fca1ced465c > [0x00007fca1ced4200+0x000000000000045c] > > > j > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > > J 11639 c2 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 > (123 > > > bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098] > > > J 7560 c1 > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > > java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 > > > [0x00007fca15b23160+0x0000000000000df4] > > > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > > java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc > > > [0x00007fca15b39a40+0x000000000000007c] > > > J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > > > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134] > > > v ~StubRoutines::call_stub > > > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > > 
0x0000000000000000
> > >
> > > Register to memory mapping:
> > > ...
> > >
> > > Can someone shed more info on when this can happen? I am seeing this on
> > > multiple servers with Java 13.0.1+9 on RHEL6 servers.
> > >
> > > There was another thread in hotspot runtime where David Holmes pointed
> > this out:
> > >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> > > 0x0000000000000000
> > >
> > >> This seems it may be related to:
> > >> https://bugs.openjdk.java.net/browse/JDK-8004124
> > >
> > > Just wondering if this is the same issue or something GC-specific.
> > >
> > >
> > >
> > > TIA
> > > Sundar
> > >
> >

From m.sundar85 at gmail.com  Mon Feb 10 19:53:51 2020
From: m.sundar85 at gmail.com (Sundara Mohan M)
Date: Mon, 10 Feb 2020 14:53:51 -0500
Subject: Parallel GC Thread crash
In-Reply-To: <5454bc87-1452-1402-3496-c3c8f128a499@oracle.com>
References: 
	<5454bc87-1452-1402-3496-c3c8f128a499@oracle.com>
Message-ID: 

Hi Stefan,
    Yes, we are trying to move to 13.0.2. I wanted to verify whether anyone
else has seen this and whether upgrading will really solve the problem.

Can you share how to file a bug report for this? I don't have access to
https://bugs.openjdk.java.net/

I will try to run with -XX:+VerifyBeforeGC and -XX:+VerifyAfterGC to get
more information.

Thanks
Sundar

On Mon, Feb 10, 2020 at 2:42 PM Stefan Karlsson 
wrote:

> Hi Sundar,
>
> On 2020-02-10 19:32, Sundara Mohan M wrote:
> > Hi Stefan,
> > We started seeing more crashes on JDK13.0.1+9
> >
> > Since seeing it on GC Task Thread assumed it is related to GC.
>
> As I said in my previous mail, I don't think this is caused by GC code.
> More below.
>
> >
> > # Problematic frame:
> > # V [libjvm.so+0xd183c0] PSRootsClosure::do_oop(oopDesc**)+0x30
> >
> > Command Line: -XX:+AlwaysPreTouch -Xms64000m -Xmx64000m
> > -XX:NewSize=40000m -XX:+DisableExplicitGC -Xnoclassgc
> > -XX:+UseParallelGC -XX:ParallelGCThreads=40 -XX:ConcGCTh
> > reads=5 ... 
> > > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red > > Hat Enterprise Linux Server release 6.10 (Santiago) > > Time: Fri Feb 7 11:15:04 2020 UTC elapsed time: 286290 seconds (3d 7h > > 31m 30s) > > > > --------------- T H R E A D --------------- > > > > Current thread (0x00007fca6c074000): GCTaskThread "ParGC Thread#28" > > [stack: 0x00007fba72ff1000,0x00007fba730f1000] [id=56530] > > > > Stack: [0x00007fba72ff1000,0x00007fba730f1000], > > sp=0x00007fba730ee850, free space=1014k > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > j=interpreted, Vv=VM code, C=native code) > > V [libjvm.so+0xd183c0] PSRootsClosure::do_oop(oopDesc**)+0x30 > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap > > const*, OopClosure*)+0x2eb > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > > CodeBlobClosure*)+0x187 > > V [libjvm.so+0xd190be] ThreadRootsTask::do_it(GCTaskManager*, > > unsigned int)+0x6e > > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > JavaThread 0x00007fb8f4036800 (nid = 60927) was being processed > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > v ~RuntimeStub::_new_array_Java > > J 58520 c2 > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > (207 bytes) @ 0x00007fca5fd23dec [0x00007fca5fd1dbc0+0x000000000000622c] > > J 66864 c2 webservice.exception.ExceptionLoggingWrapper.execute()V > > (1004 bytes) @ 0x00007fca60c02588 [0x00007fca60bffce0+0x00000000000028a8] > > J 58224 c2 > > > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (105 bytes) @ 0x00007fca5f59bad8 [0x00007fca5f59b880+0x0000000000000258] > > J 69992 c2 
> > > webservice.exception.mapper.JediRequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (9 bytes) @ 0x00007fca5e1019f4 [0x00007fca5e101940+0x00000000000000b4] > > J 55265 c2 > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > > (332 bytes) @ 0x00007fca5f6f58e0 [0x00007fca5f6f5700+0x00000000000001e0] > > J 483122 c2 webservice.filters.ResponseSerializationWorker.execute()Z > > (272 bytes) @ 0x00007fca622fc2b4 [0x00007fca622fbc80+0x0000000000000634] > > J 15811% c2 > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > > (486 bytes) @ 0x00007fca5c108794 [0x00007fca5c1082a0+0x00000000000004f4] > > j > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > J 4586 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 > > bytes) @ 0x00007fca54d27184 [0x00007fca54d26b00+0x0000000000000684] > > J 7550 c1 > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > > java.base at 13.0.1 (187 bytes) @ 0x00007fca54fbb6d4 > > [0x00007fca54fba8e0+0x0000000000000df4] > > J 7549 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > java.base at 13.0.1 (9 bytes) @ 0x00007fca5454b93c > > [0x00007fca5454b8c0+0x000000000000007c] > > J 4585 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > > 0x00007fca54d250f4 [0x00007fca54d24fc0+0x0000000000000134] > > v ~StubRoutines::call_stub > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > 0x0000000000000000 > > > > Does JDK11 and 13 have different code for GC. Do you think > > downgrading(JDK11 stable)/upgrading(JDK-13.0.2) might help here? > > You should at least move to 13.0.2, to get the latest bug fixes/patches. > > There has been a lot of changes in all areas of the JVM between 11 and > 13. 
We don't yet know the root cause of this crash, and I can't say if > this is caused by new changes or not. Have you or anyone filed a bug > report for this? > > > Any insight to debug this will be helpful. > > Did you try my previous suggestion to run with -XX:+VerifyBeforeGC and > -XX:+VerifyAfterGC? If you can tolerate the longer GC times it > introduces, then you could try to run with > -XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC -XX:+VerifyAfterGC . > > Cheers, > StefanK > > > > > TIA > > Sundar > > > > On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson > > > wrote: > > > > Hi Sundar, > > > > The GC crashes when it encounters something bad on the stack: > > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, > > RegisterMap > > > const*, OopClosure*)+0x2eb > > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > > This is probably not a GC bug. It's more likely that this is > > caused by > > the JIT compiler. I see in your hotspot-runtime-dev thread, that you > > also get crashes in other compiler related areas. > > > > If you want to rule out the GC, you can run with > > -XX:+VerifyBeforeGC and > > -XX:+VerifyAfterGC, and see if this asserts before the GC has started > > running. > > > > StefanK > > > > On 2020-02-04 04:38, Sundara Mohan M wrote: > > > Hi, > > > I am seeing following crashes frequently on our servers > > > # > > > # A fatal error has been detected by the Java Runtime Environment: > > > # > > > # SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299 > > > # > > > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build > > 13.0.1+9) > > > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, > > tiered, parallel > > > gc, linux-amd64) > > > # Problematic frame: > > > # V [libjvm.so+0xcd3311] > > PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > > # > > > # No core dump will be written. Core dumps have been disabled. 
> > To enable > > > core dumping, try "ulimit -c unlimited" before starting Java again > > > # > > > # If you would like to submit a bug report, please visit: > > > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > > # > > > > > > > > > --------------- T H R E A D --------------- > > > > > > Current thread (0x00007fca2c051000): GCTaskThread "ParGC > > Thread#8" [stack: > > > 0x00007fca30277000,0x00007fca30377000] [id=108299] > > > > > > Stack: [0x00007fca30277000,0x00007fca30377000], > > sp=0x00007fca30374890, > > > free space=1014k > > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > > j=interpreted, Vv=VM code, C=native code) > > > V [libjvm.so+0xcd3311] > PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, > > RegisterMap > > > const*, OopClosure*)+0x2eb > > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > > > CodeBlobClosure*)+0x187 > > > V [libjvm.so+0xcce2f0] > > ThreadRootsMarkingTask::do_it(GCTaskManager*, > > > unsigned int)+0xb0 > > > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > > > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > > > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > > > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed > > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > > v ~RuntimeStub::_new_array_Java > > > J 225122 c2 > > > > > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > > (207 bytes) @ 0x00007fca21f1a5d8 > > [0x00007fca21f17f20+0x00000000000026b8] > > > J 62342 c2 > > webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > > > bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88] > > > J 225129 c2 > > > > > > 
webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (105 bytes) @ 0x00007fca1da512ac > > [0x00007fca1da51100+0x00000000000001ac] > > > J 131643 c2 > > > > > > webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (9 bytes) @ 0x00007fca20ce6190 > > [0x00007fca20ce60c0+0x00000000000000d0] > > > J 55114 c2 > > > > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > > (332 bytes) @ 0x00007fca2051fe64 > > [0x00007fca2051f820+0x0000000000000644] > > > J 57859 c2 > > webservice.filters.ResponseSerializationWorker.execute()Z (272 > > > bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8] > > > J 16114% c2 > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > > (486 bytes) @ 0x00007fca1ced465c > > [0x00007fca1ced4200+0x000000000000045c] > > > j > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > > J 11639 c2 java.util.concurrent.FutureTask.run()V > > java.base at 13.0.1 (123 > > > bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098] > > > J 7560 c1 > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > > java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 > > > [0x00007fca15b23160+0x0000000000000df4] > > > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > > java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc > > > [0x00007fca15b39a40+0x000000000000007c] > > > J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > > > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134] > > > v ~StubRoutines::call_stub > > > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > > 
0x0000000000000000
> > >
> > > Register to memory mapping:
> > > ...
> > >
> > > Can someone shed more info on when this can happen? I am seeing
> > this on
> > > multiple servers with Java 13.0.1+9 on RHEL6 servers.
> > >
> > > There was another thread in hotspot runtime where David Holmes
> > pointed this out:
> > >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> > > 0x0000000000000000
> > >
> > >> This seems it may be related to:
> > >> https://bugs.openjdk.java.net/browse/JDK-8004124
> > >
> > > Just wondering if this is the same issue or something GC-specific.
> > >
> > >
> > >
> > > TIA
> > > Sundar
> > >
> >

From kim.barrett at oracle.com  Mon Feb 10 19:59:56 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Mon, 10 Feb 2020 14:59:56 -0500
Subject: RFR (S): 8238160: Uniformize Parallel GC task queue variable names
In-Reply-To: <8d350538-9a82-b420-e7de-319edaf8605c@oracle.com>
References: <8d350538-9a82-b420-e7de-319edaf8605c@oracle.com>
Message-ID: <9881CC0B-D390-43D6-8C60-D6FDBF476DDA@oracle.com>

> On Jan 30, 2020, at 6:08 AM, Thomas Schatzl wrote:
>
> Hi all,
>
> can I have reviews for this small change that moves some global typedefs
> used only by Parallel GC from taskqueue.hpp to parallel gc files, and
> further makes naming of instances of these more uniform?
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8238160
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8238160/webrev/
> Testing:
> local compilation
>
> Thanks,
> Thomas

The various "guarantee" checks that operator new didn't return NULL are a
waste of time and space; CHeapObj's operator new exits rather than
returning NULL. They are culturally compatible with other nearby code
though; cleanup later?

Looks good as is. 
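[Editor's note] Kim's point about the redundant NULL checks can be sketched with a small standalone example. This is not HotSpot's actual CHeapObj code; the class names below are hypothetical stand-ins. It shows the pattern he describes: when an allocation base class's operator new terminates the process on failure instead of returning NULL, a guarantee that the pointer is non-NULL after `new` can never fire.

```cpp
#include <cstdio>
#include <cstdlib>
#include <new>

// Hypothetical allocation base class (NOT HotSpot's real CHeapObj):
// its operator new terminates the process on allocation failure
// instead of returning NULL to the caller.
class CHeapObjLike {
public:
  static void* operator new(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr) {
      std::fputs("native allocation failed, exiting\n", stderr);
      std::exit(1);  // the 'new' expression never observes NULL
    }
    return p;
  }
  static void operator delete(void* p) { std::free(p); }
};

// Stand-in for a task-queue object allocated on the C heap.
class TaskQueueLike : public CHeapObjLike {
public:
  explicit TaskQueueLike(int id) : _id(id) {}
  int id() const { return _id; }
private:
  int _id;
};
```

With this contract, a post-allocation check such as `guarantee(q != NULL, ...)` after `TaskQueueLike* q = new TaskQueueLike(0);` is dead code: control only reaches the check when the pointer is already valid.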
From stefan.karlsson at oracle.com  Mon Feb 10 20:13:06 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 10 Feb 2020 21:13:06 +0100
Subject: Parallel GC Thread crash
In-Reply-To: 
References: 
	<5454bc87-1452-1402-3496-c3c8f128a499@oracle.com>
Message-ID: <6afab3a3-92ab-f1bf-2022-9e21034cd28a@oracle.com>

On 2020-02-10 20:53, Sundara Mohan M wrote:
> Hi Stefan,
>     Yes we are trying to move to 13.0.2. Wanted to verify if anyone
> else seen this or upgrading will really solve this problem.
>
> Can you share how to file a bug report for this? I don't have access
> to https://bugs.openjdk.java.net/

There are directions in the hs_err crash file that point you to the web
page to file a bug. You seem to be running AdoptOpenJDK builds, so your
bug reports would end up in their system:

>     > # If you would like to submit a bug report, please visit:
>     > # https://github.com/AdoptOpenJDK/openjdk-build/issues

If you were running with Oracle binaries you would get lines like this:

# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp

>
> I will try to run with -XX:+VerifyBeforeGC and -XX:+VerifyAfterGC to
> get more information.

OK. Hopefully this gives us more information.

StefanK

>
>
> Thanks
> Sundar
>
> On Mon, Feb 10, 2020 at 2:42 PM Stefan Karlsson
> wrote:
>
>     Hi Sundar,
>
>     On 2020-02-10 19:32, Sundara Mohan M wrote:
>     > Hi Stefan,
>     >     We started seeing more crashes on JDK13.0.1+9
>     >
>     > Since seeing it on GC Task Thread assumed it is related to GC.
>
>     As I said in my previous mail, I don't think this is caused by GC
>     code.
>     More below.
>
>     >
>     > # Problematic frame:
>     > # V  [libjvm.so+0xd183c0]  PSRootsClosure::do_oop(oopDesc**)+0x30
>     >
>     > Command Line: -XX:+AlwaysPreTouch -Xms64000m -Xmx64000m
>     > -XX:NewSize=40000m -XX:+DisableExplicitGC -Xnoclassgc
>     > -XX:+UseParallelGC -XX:ParallelGCThreads=40 -XX:ConcGCTh
>     > reads=5 ...
> > > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, > Red > > Hat Enterprise Linux Server release 6.10 (Santiago) > > Time: Fri Feb ?7 11:15:04 2020 UTC elapsed time: 286290 seconds > (3d 7h > > 31m 30s) > > > > --------------- ?T H R E A D ?--------------- > > > > Current thread (0x00007fca6c074000): ?GCTaskThread "ParGC > Thread#28" > > [stack: 0x00007fba72ff1000,0x00007fba730f1000] [id=56530] > > > > Stack: [0x00007fba72ff1000,0x00007fba730f1000], > > ?sp=0x00007fba730ee850, ?free space=1014k > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > j=interpreted, Vv=VM code, C=native code) > > V ?[libjvm.so+0xd183c0] > ?PSRootsClosure::do_oop(oopDesc**)+0x30 > > V ?[libjvm.so+0xc6bf0b] ?OopMapSet::oops_do(frame const*, > RegisterMap > > const*, OopClosure*)+0x2eb > > V ?[libjvm.so+0x765489] ?frame::oops_do_internal(OopClosure*, > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > V ?[libjvm.so+0xf68b17] ?JavaThread::oops_do(OopClosure*, > > CodeBlobClosure*)+0x187 > > V ?[libjvm.so+0xd190be] ?ThreadRootsTask::do_it(GCTaskManager*, > > unsigned int)+0x6e > > V ?[libjvm.so+0x7f422b] ?GCTaskThread::run()+0x1eb > > V ?[libjvm.so+0xf707fd] ?Thread::call_run()+0x10d > > V ?[libjvm.so+0xc875b7] ?thread_native_entry(Thread*)+0xe7 > > > > JavaThread 0x00007fb8f4036800 (nid = 60927) was being processed > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > v ?~RuntimeStub::_new_array_Java > > J 58520 c2 > > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > > (207 bytes) @ 0x00007fca5fd23dec > [0x00007fca5fd1dbc0+0x000000000000622c] > > J 66864 c2 webservice.exception.ExceptionLoggingWrapper.execute()V > > (1004 bytes) @ 0x00007fca60c02588 > [0x00007fca60bffce0+0x00000000000028a8] > > J 58224 c2 > > > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (105 bytes) @ 0x00007fca5f59bad8 > 
[0x00007fca5f59b880+0x0000000000000258] > > J 69992 c2 > > > webservice.exception.mapper.JediRequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > (9 bytes) @ 0x00007fca5e1019f4 > [0x00007fca5e101940+0x00000000000000b4] > > J 55265 c2 > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > > (332 bytes) @ 0x00007fca5f6f58e0 > [0x00007fca5f6f5700+0x00000000000001e0] > > J 483122 c2 > webservice.filters.ResponseSerializationWorker.execute()Z > > (272 bytes) @ 0x00007fca622fc2b4 > [0x00007fca622fbc80+0x0000000000000634] > > J 15811% c2 > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > > (486 bytes) @ 0x00007fca5c108794 > [0x00007fca5c1082a0+0x00000000000004f4] > > j > > > ?com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > J 4586 c1 java.util.concurrent.FutureTask.run()V > java.base at 13.0.1 (123 > > bytes) @ 0x00007fca54d27184 [0x00007fca54d26b00+0x0000000000000684] > > J 7550 c1 > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > > java.base at 13.0.1 (187 bytes) @ 0x00007fca54fbb6d4 > > [0x00007fca54fba8e0+0x0000000000000df4] > > J 7549 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > java.base at 13.0.1 (9 bytes) @ 0x00007fca5454b93c > > [0x00007fca5454b8c0+0x000000000000007c] > > J 4585 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > > 0x00007fca54d250f4 [0x00007fca54d24fc0+0x0000000000000134] > > v ?~StubRoutines::call_stub > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > 0x0000000000000000 > > > > Does JDK11 and 13 have different code for GC. Do you think > > downgrading(JDK11 stable)/upgrading(JDK-13.0.2) might help here? > > You should at least move to 13.0.2, to get the latest bug > fixes/patches. 
> > There has been a lot of changes in all areas of the JVM between 11 > and > 13. We don't yet know the root cause of this crash, and I can't > say if > this is caused by new changes or not. Have you or anyone filed a bug > report for this? > > > Any insight to debug this will be helpful. > > Did you try my previous suggestion to run with -XX:+VerifyBeforeGC > and > -XX:+VerifyAfterGC? If you can tolerate the longer GC times it > introduces, then you could try to run with > -XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC > -XX:+VerifyAfterGC . > > Cheers, > StefanK > > > > > TIA > > Sundar > > > > On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson > > > >> wrote: > > > >? ? ?Hi Sundar, > > > >? ? ?The GC crashes when it encounters something bad on the stack: > >? ? ??> V? [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, > >? ? ?RegisterMap > >? ? ??> const*, OopClosure*)+0x2eb > >? ? ??> V? [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > >? ? ?This is probably not a GC bug. It's more likely that this is > >? ? ?caused by > >? ? ?the JIT compiler. I see in your hotspot-runtime-dev thread, > that you > >? ? ?also get crashes in other compiler related areas. > > > >? ? ?If you want to rule out the GC, you can run with > >? ? ?-XX:+VerifyBeforeGC and > >? ? ?-XX:+VerifyAfterGC, and see if this asserts before the GC > has started > >? ? ?running. > > > >? ? ?StefanK > > > >? ? ?On 2020-02-04 04:38, Sundara Mohan M wrote: > >? ? ?> Hi, > >? ? ?>? ? ?I am seeing following crashes frequently on our servers > >? ? ?> # > >? ? ?> # A fatal error has been detected by the Java Runtime > Environment: > >? ? ?> # > >? ? ?> #? SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, > tid=108299 > >? ? ?> # > >? ? ?> # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build > >? ? ?13.0.1+9) > >? ? ?> # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, > >? ? ?tiered, parallel > >? ? ?> gc, linux-amd64) > >? ? ?> # Problematic frame: > >? ? ?> # V? 
[libjvm.so+0xcd3311] > >? ? ?PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > >? ? ?> # > >? ? ?> # No core dump will be written. Core dumps have been disabled. > >? ? ?To enable > >? ? ?> core dumping, try "ulimit -c unlimited" before starting > Java again > >? ? ?> # > >? ? ?> # If you would like to submit a bug report, please visit: > >? ? ?> # https://github.com/AdoptOpenJDK/openjdk-build/issues > >? ? ?> # > >? ? ?> > >? ? ?> > >? ? ?> ---------------? T H R E A D? --------------- > >? ? ?> > >? ? ?> Current thread (0x00007fca2c051000): GCTaskThread "ParGC > >? ? ?Thread#8" [stack: > >? ? ?> 0x00007fca30277000,0x00007fca30377000] [id=108299] > >? ? ?> > >? ? ?> Stack: [0x00007fca30277000,0x00007fca30377000], > >? ? ?sp=0x00007fca30374890, > >? ? ?>? ?free space=1014k > >? ? ?> Native frames: (J=compiled Java code, A=aot compiled Java > code, > >? ? ?> j=interpreted, Vv=VM code, C=native code) > >? ? ?> V? [libjvm.so+0xcd3311] > PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > >? ? ?> V? [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, > >? ? ?RegisterMap > >? ? ?> const*, OopClosure*)+0x2eb > >? ? ?> V? [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > >? ? ?> CodeBlobClosure*, RegisterMap*, bool)+0x99 > >? ? ?> V? [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > >? ? ?> CodeBlobClosure*)+0x187 > >? ? ?> V? [libjvm.so+0xcce2f0] > >? ? ?ThreadRootsMarkingTask::do_it(GCTaskManager*, > >? ? ?> unsigned int)+0xb0 > >? ? ?> V? [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > >? ? ?> V? [libjvm.so+0xf707fd] Thread::call_run()+0x10d > >? ? ?> V? [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > >? ? ?> > >? ? ?> JavaThread 0x00007fb85c004800 (nid = 111387) was being > processed > >? ? ?> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > >? ? ?> v? ~RuntimeStub::_new_array_Java > >? ? ?> J 225122 c2 > >? ? ?> > > > ?ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > >? ? ?> (207 bytes) @ 0x00007fca21f1a5d8 > >? 
? ?[0x00007fca21f17f20+0x00000000000026b8] > >? ? ?> J 62342 c2 > > ?webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > >? ? ?> bytes) @ 0x00007fca20f0aec8 > [0x00007fca20f07f40+0x0000000000002f88] > >? ? ?> J 225129 c2 > >? ? ?> > > > ?webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > >? ? ?> (105 bytes) @ 0x00007fca1da512ac > >? ? ?[0x00007fca1da51100+0x00000000000001ac] > >? ? ?> J 131643 c2 > >? ? ?> > > > ?webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > >? ? ?> (9 bytes) @ 0x00007fca20ce6190 > >? ? ?[0x00007fca20ce60c0+0x00000000000000d0] > >? ? ?> J 55114 c2 > >? ? ?> > > > ?webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > >? ? ?> (332 bytes) @ 0x00007fca2051fe64 > >? ? ?[0x00007fca2051f820+0x0000000000000644] > >? ? ?> J 57859 c2 > > ?webservice.filters.ResponseSerializationWorker.execute()Z (272 > >? ? ?> bytes) @ 0x00007fca1ef2ed18 > [0x00007fca1ef2e140+0x0000000000000bd8] > >? ? ?> J 16114% c2 > >? ? ?> > > > ?com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > >? ? ?> (486 bytes) @ 0x00007fca1ced465c > >? ? ?[0x00007fca1ced4200+0x000000000000045c] > >? ? ?> j > >? ? ?> > > > ??com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > >? ? ?> J 11639 c2 java.util.concurrent.FutureTask.run()V > >? ? ?java.base at 13.0.1 (123 > >? ? ?> bytes) @ 0x00007fca1cd00858 > [0x00007fca1cd007c0+0x0000000000000098] > >? ? ?> J 7560 c1 > >? ? ?> > > > ?java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > >? ? ?> java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 > >? ? ?> [0x00007fca15b23160+0x0000000000000df4] > >? ? 
?> J 5143 c1 > java.util.concurrent.ThreadPoolExecutor$Worker.run()V > >? ? ?> java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc > >? ? ?> [0x00007fca15b39a40+0x000000000000007c] > >? ? ?> J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 > bytes) @ > >? ? ?> 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134] > >? ? ?> v? ~StubRoutines::call_stub > >? ? ?> > >? ? ?> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), > si_addr: > >? ? ?> 0x0000000000000000 > >? ? ?> > >? ? ?> Register to memory mapping: > >? ? ?> ... > >? ? ?> > >? ? ?> Can someone shed more info on when this can happen? I am > seeing > >? ? ?this on > >? ? ?> multiple servers with Java 13.0.1+9 on RHEL6 servers. > >? ? ?> > >? ? ?> There was another thread in hotspot runtime where David Holmes > >? ? ?pointed this > >? ? ?>> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 > (SI_KERNEL), si_addr: > >? ? ?> 0x0000000000000000 > >? ? ?> > >? ? ?>> This seems it may be related to: > >? ? ?>> https://bugs.openjdk.java.net/browse/JDK-8004124 > >? ? ?> > >? ? ?> Just wondering if this is same or something to do with GC > specific. > >? ? ?> > >? ? ?> > >? ? ?> > >? ? ?> TIA > >? ? ?> Sundar > >? ? 
 > > >

From ivan.walulya at oracle.com  Tue Feb 11 07:34:19 2020
From: ivan.walulya at oracle.com (Ivan Walulya)
Date: Tue, 11 Feb 2020 08:34:19 +0100
Subject: RFR: 8232686: Turn parallel gc develop tracing flags into unified logging
Message-ID: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com>

Hi all,

Please review a small modification to turn parallel gc develop tracing flags into unified logging

Bug: https://bugs.openjdk.java.net/browse/JDK-8232686
Webrev: http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00/

Testing: Tier 1 - Tier 3

//Ivan

From stefan.johansson at oracle.com  Tue Feb 11 10:26:32 2020
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Tue, 11 Feb 2020 11:26:32 +0100
Subject: RFR: 8232686: Turn parallel gc develop tracing flags into unified logging
In-Reply-To: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com>
References: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com>
Message-ID: <37724756-9D64-4DDB-9C8D-A5C4A24B23E9@oracle.com>

Hi Ivan,

> 11 feb. 2020 kl. 08:34 skrev Ivan Walulya :
>
> Hi all,
>
> Please review a small modification to turn parallel gc develop tracing flags into unified logging
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8232686
> Webrev: http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00/
>
When looking through the webrev again I realized that we can now remove the "#ifdef ASSERT" here:
1616 #ifdef ASSERT
1617     log_develop_debug(gc, marking)(
1618         "add_obj_count=" SIZE_FORMAT " "
1619         "add_obj_bytes=" SIZE_FORMAT,
1620         add_obj_count,
1621         add_obj_size * HeapWordSize);
1622     log_develop_debug(gc, marking)(
1623         "mark_bitmap_count=" SIZE_FORMAT " "
1624         "mark_bitmap_bytes=" SIZE_FORMAT,
1625         mark_bitmap_count,
1626         mark_bitmap_size * HeapWordSize);
1627 #endif // #ifdef ASSERT

Otherwise a very nice cleanup.
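As background, develop-level logging already compiles to nothing in product builds, which is why the extra guard adds nothing there. A minimal sketch of that pattern (these are illustrative macros, not the actual HotSpot logging implementation):

```cpp
#include <cassert>

// Sketch only: a develop-level log macro that disappears in product builds,
// mimicking the behavior (but not the implementation) of log_develop_debug.
// We count calls instead of printing so the effect is observable.
static int g_develop_log_calls = 0;

#ifdef PRODUCT
#define LOG_DEVELOP_SKETCH(msg) ((void)0)
#else
#define LOG_DEVELOP_SKETCH(msg) ((void)++g_develop_log_calls)
#endif

// With such a macro, an additional "#ifdef ASSERT" around the call sites is
// redundant: the calls are already free in product builds.
```

In a non-product build the macro expands to the real work; in a product build the call site vanishes entirely, guard or no guard.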
Thanks,
Stefan

> Testing: Tier 1 - Tier 3
>
> //Ivan

From thomas.schatzl at oracle.com  Tue Feb 11 10:42:34 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 11 Feb 2020 11:42:34 +0100
Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
In-Reply-To: <97f72395-af73-41f7-98f3-8e22cce5b79b.maoliang.ml@alibaba-inc.com>
References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com>
 <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com>
 <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com>
 <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com>
 <97f72395-af73-41f7-98f3-8e22cce5b79b.maoliang.ml@alibaba-inc.com>
Message-ID: 

Hi,

On 10.02.20 12:47, Liang Mao wrote:
> Hi Thomas,
>
> In my testing, I didn't change the value of Min/MaxHeapFreeRatio.
>
> The heap had already shrunk to 5GB but in remark it expanded to 6644M.
> The default value of MinHeapFreeRatio is 40, so the minimal commit size
> after remark is the heap size * 1.67 (3979M * 1.67 = 6644M).
> 1.67 = 100/(100 - 40)
>
>
> [1031.322s][info][gc             ] GC(741) Pause Young (Concurrent Start) (G1 Evacuation Pause) 4724M->4506M(5120M) 10.607ms
> [1031.322s][info][gc,cpu         ] GC(741) User=0.42s Sys=0.00s Real=0.01s
> [1031.322s][info][gc             ] GC(742) Concurrent Cycle
> [1031.322s][info][gc,marking     ] GC(742) Concurrent Clear Claimed Marks
> [1031.322s][info][gc,marking     ] GC(742) Concurrent Clear Claimed Marks 0.066ms
> [1031.322s][info][gc,marking     ] GC(742) Concurrent Scan Root Regions
> [1031.322s][info][gc,stringdedup ] Concurrent String Deduplication (1031.322s)
> [1031.323s][info][gc,stringdedup ] Concurrent String Deduplication 14224.0B->0.0B(14224.0B) avg 51.1% (1031.322s, 1031.323s) 0.514ms
> [1031.326s][info][gc,marking     ] GC(742) Concurrent Scan Root Regions 3.939ms
> [1031.326s][info][gc,marking     ] GC(742) Concurrent Mark (1031.326s)
> [1031.326s][info][gc,marking     ] GC(742) Concurrent Mark From Roots
> [1031.326s][info][gc,task        ] GC(742) Using 16 workers of 16 for marking
> [1031.483s][info][gc,marking     ] GC(742) Concurrent Mark From Roots 157.144ms
> [1031.483s][info][gc,marking     ] GC(742) Concurrent Preclean
> [1031.484s][info][gc,marking     ] GC(742) Concurrent Preclean 0.404ms
> [1031.484s][info][gc,marking     ] GC(742) Concurrent Mark (1031.326s, 1031.484s) 157.587ms
> [1031.485s][info][gc,start       ] GC(742) Pause Remark
> [1031.496s][info][gc             ] GC(742) Pause Remark 4625M->3979M(6644M) 10.953ms
> [1031.496s][info][gc,cpu         ] GC(742) User=0.22s Sys=0.04s Real=0.01s
>
>
> In our production environment, we never use JEP 346, mainly because of
> the JDK version, so I cannot tell whether it would work. I agree the
> "idle" issue is not our main focus for now.
>
> Using SoftMaxHeapSize to guide the adaptive IHOP's decision to start a
> concurrent mark GC cycle can work well with JEP 346 and the resize
> logic in remark. I don't stick to shrinking the heap in every GC.
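The 1.67 factor above follows directly from MinHeapFreeRatio = 40, and the floor it implies can be reproduced with a small sketch (`min_capacity_mb` is a hypothetical helper, not HotSpot code; the real code additionally rounds to region granularity, which presumably accounts for the slightly larger 6644M in the log):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not HotSpot code: the smallest capacity (in MB) that
// keeps at least min_free_ratio percent of the heap free, i.e. the shrink
// floor implied by MinHeapFreeRatio.
size_t min_capacity_mb(size_t used_mb, unsigned min_free_ratio) {
  // used <= capacity * (100 - ratio) / 100
  //   => capacity >= used * 100 / (100 - ratio), rounded up.
  unsigned denom = 100 - min_free_ratio;
  return (used_mb * 100 + denom - 1) / denom;
}
```

With 3979M used after Remark and the default ratio of 40, the floor is about 6632M, i.e. roughly the 1.67x observed above.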
>
> The capacity in resize_heap_if_necessary will be
> MAX2(min_desire_capacity_by_MinHeapFreeRatio, MIN2(soft_max_capacity(),
> max_desire_capacity_by_MaxHeapFreeRatio))
>
> But both approaches have the problem that the default MinHeapFreeRatio
> is too large in remark compared to full gc, as resize_heap_if_necessary
> will keep a minimal heap size of 1.667x the used heap size. After remark,
> the used size could be large, as it not only includes those old regions
> with garbage but also the used young regions.
>
> #############################
> void G1CollectedHeap::resize_heap_if_necessary() {
> ...
>   const size_t capacity_after_gc = capacity();
>   const size_t used_after_gc = capacity_after_gc - unused_committed_regions_in_bytes();
> #############################
>
> The used_after_gc is reasonable for full gc but it can contain young
> regions in remark.
> Do you think it should be changed like this?
> #############################
>   const size_t used_after_gc = capacity_after_gc - unused_committed_regions_in_bytes()
>                                - young_regions_count() * HeapRegion::GrainBytes;
>   // young_regions_count is 0 after full GC
> #############################

Apart from naming ("used_after_gc"), which has been wrong since that
method has been in use for Remark, this seems reasonable. Maybe
"old_used_after_gc"?

I think the comments need changes to reflect that we apply the
Min/MaxHeapFreeRatio on the old gen occupancy now (which is the same as
total occupancy after full gc) because it may be called with young
regions active.

I also think the whole code that calculates the expansion and shrinking
amount should be moved to the policy (with the g1CollectedHeap code just
calling that and then only reacting on the return value), but that can
be done separately.
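The proposed capacity rule quoted at the top of this message is effectively a clamp of soft_max_capacity() between the two free-ratio bounds. An illustrative sketch, with std::min/std::max standing in for HotSpot's MIN2/MAX2 and all parameter names being placeholders rather than G1 identifiers:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Illustrative sketch of the proposed capacity rule: clamp the soft max
// between the MinHeapFreeRatio-derived floor and the MaxHeapFreeRatio-derived
// ceiling. Not actual G1 code.
size_t desired_capacity(size_t min_desired, size_t max_desired, size_t soft_max) {
  return std::max(min_desired, std::min(soft_max, max_desired));
}
```

So the soft max wins whenever it lies between the two bounds, and is otherwise pulled back inside them.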
I tried to use > following number in resize_heap_if_necessary: > > ############################## > void?G1CollectedHeap::resize_heap_if_necessary()?{ > ... > //?We?can?now?safely?turn?them?into?size_t's. > ??size_t?minimum_desired_capacity?=?(size_t)?minimum_desired_capacity_d; > ??size_t?maximum_desired_capacity?=?(size_t)?maximum_desired_capacity_d; > > if?(!collector_state()->in_full_gc())?{ > ????minimum_desired_capacity?=?MIN2(minimum_desired_capacity,?policy()->minimum_desired_bytes(used_after_gc)); > ??} That looks a bit hacky... :) But I do not have a better policy for sizing after full gc either. Did you try always using the minimum_desired_bytes()? > > .... > size_t?G1Policy::minimum_desired_bytes(size_t?used_bytes)?const?{ > ??return?_ihop_control->unrestrained_young_size()?!=?0?? > ???????????_ihop_control->unrestrained_young_size()?: > ???????????_young_list_max_length?*?HeapRegion::GrainBytes > ?????????+?_reserve_regions?*?HeapRegion::GrainBytes?+?used_bytes; > } I think G1IHOPControl::_target_occupancy (add a getter) is what you want to use here (untested). > ############################# > > I made the minimum_desired_capacity small enough based on adaptive IHOP's > _last_unrestrained_young_size. Even without SoftMaxHeapSize, the test can > keep the memory under 3GB. It's a rough example and I didn't predict the > promotion bytes of next young gc yet. Do you think > a proper value of minimum_desired_capacity in remark resize > + > G1AdaptiveIHOPControl::actual_target_threshold according to > soft_max_capacity> is enough? Yes, both fixing the resizing logic and changing the IHOP target (and young gen size) according to SoftMaxHeapSize should be sufficient to let G1 keep that goal without too many commit activity. The resizing logic change could be handled under JDK-8238686, although this change does not modify the use of MaxHeapFreeRatio. 
There is a cleaned up version of my earlier change that implements the
latter at http://cr.openjdk.java.net/~tschatzl/8236073/webrev.1/ .

I will test your suggested changes and see their impact on our perf suite.

Thanks a lot,
  Thomas

P.S: it would be nice to send diffs of suggested changes for easier
application too.

From thomas.schatzl at oracle.com  Tue Feb 11 10:43:57 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 11 Feb 2020 11:43:57 +0100
Subject: RFR (S): 8238160: Uniformize Parallel GC task queue variable names
In-Reply-To: <9881CC0B-D390-43D6-8C60-D6FDBF476DDA@oracle.com>
References: <8d350538-9a82-b420-e7de-319edaf8605c@oracle.com>
 <9881CC0B-D390-43D6-8C60-D6FDBF476DDA@oracle.com>
Message-ID: 

Hi Sangheon, Kim,

On 10.02.20 20:59, Kim Barrett wrote:
>> On Jan 30, 2020, at 6:08 AM, Thomas Schatzl wrote:
>>
>> Hi all,
>>
>> can I have reviews for this small change that moves some global typedefs used only by Parallel GC from taskqueue.hpp to parallel gc files, and further makes naming of instances of these more uniform?
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8238160
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8238160/webrev/
>> Testing:
>> local compilation
>>
>> Thanks,
>>   Thomas
>
> The various "guarantee" checks that operator new didn't return NULL
> are a waste of time and space; CHeapObj's operator new exits rather
> than returning NULL. They are culturally compatible with other nearby
> code though; cleanup later?
>
> Looks good as is.
>

thanks for your reviews. I filed JDK-8238854 for looking through the new
exits - a prototype is currently running through testing.
Thanks, Thomas From ivan.walulya at oracle.com Tue Feb 11 10:47:03 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Tue, 11 Feb 2020 11:47:03 +0100 Subject: RFR: 8232686: Turn parallel gc develop tracing flags into unified logging In-Reply-To: <37724756-9D64-4DDB-9C8D-A5C4A24B23E9@oracle.com> References: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com> <37724756-9D64-4DDB-9C8D-A5C4A24B23E9@oracle.com> Message-ID: <7E333365-53C8-49C0-A0C8-5464E3C8BCDC@oracle.com> Thanks Stefan, find below patch with the suggested updates. http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00-01/ http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/01/ //Ivan > On 11 Feb 2020, at 11:26, Stefan Johansson wrote: > > H Ivan, > >> 11 feb. 2020 kl. 08:34 skrev Ivan Walulya : >> >> Hi all, >> >> Please review a small modification to turn parallel gc develop tracing flags into unified logging >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8232686 >> Webrev: http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00/ >> > When looking through the webrev again I realized that we can now remove the "#ifdef ASSERT? here: > 1616 #ifdef ASSERT > 1617 log_develop_debug(gc, marking)( > 1618 "add_obj_count=" SIZE_FORMAT " " > 1619 "add_obj_bytes=" SIZE_FORMAT, > 1620 add_obj_count, > 1621 add_obj_size * HeapWordSize); > 1622 log_develop_debug(gc, marking)( > 1623 "mark_bitmap_count=" SIZE_FORMAT " " > 1624 "mark_bitmap_bytes=" SIZE_FORMAT, > 1625 mark_bitmap_count, > 1626 mark_bitmap_size * HeapWordSize); > 1627 #endif // #ifdef ASSERT > > Otherwise a very nice cleanup. 
> Thanks,
> Stefan
>
>> Testing: Tier 1 - Tier 3
>>
>> //Ivan
>

From maoliang.ml at alibaba-inc.com  Tue Feb 11 11:46:21 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Tue, 11 Feb 2020 19:46:21 +0800
Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
In-Reply-To: 
References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com>
 <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com>
 <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com>
 <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com>
 <97f72395-af73-41f7-98f3-8e22cce5b79b.maoliang.ml@alibaba-inc.com>,
Message-ID: 

Hi Thomas,

>
>> ....
>> size_t G1Policy::minimum_desired_bytes(size_t used_bytes) const {
>>   return _ihop_control->unrestrained_young_size() != 0 ?
>>            _ihop_control->unrestrained_young_size() :
>>            _young_list_max_length * HeapRegion::GrainBytes
>>          + _reserve_regions * HeapRegion::GrainBytes + used_bytes;
>> }

> I think G1IHOPControl::_target_occupancy (add a getter) is what you want
> to use here (untested).

I'm not looking for _target_occupancy, which is the current heap
capacity, because the minimum bytes may exceed it. Since the memory
usage is almost at peak in remark,
old_use_bytes + promoted_bytes_in_next_gc + unrestrained_young_bytes
can be the minimum desired bytes.

> There is a cleaned up version of my earlier change that implements the
> latter at http://cr.openjdk.java.net/~tschatzl/8236073/webrev.1/ .

I have a question: the heap size can be shrunk even when the commit size
is not changed, so it could cause a waste of committed free regions.

Thanks,
Liang

------------------------------------------------------------------
From:Thomas Schatzl 
Send Time:2020 Feb. 11 (Tue.) 18:42
To:"MAO, Liang" ; hotspot-gc-dev 
Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics

Hi,

On 10.02.20 12:47, Liang Mao wrote:
> Hi Thomas,
>
> In my testing, I didn't change the value of Min/MaxHeapFreeRatio.
> > The heap had already shrinked to 5GB but in remark it expand to 6644M. > The fault value of MinHeapFreeRatio is 40, so the minimal commit size > after remark is the heap size * 1.67 (3979M * 1.67 = 6644M). > 1.67 = 100/(100 - 40) > > > [1031.322s][info][gc > ] GC(741) Pause Young (Concurrent Start) (G1 Evacuation Pause) 4724M->4506M(5120M) 10.607ms > [1031.322s][info][gc,cpu ] GC(741) User=0.42s Sys=0.00s Real=0.01s > [1031.322s][info][gc ] GC(742) Concurrent Cycle > [1031.322s][info][gc,marking ] GC(742) Concurrent Clear Claimed Marks > [1031.322s][info][gc,marking ] GC(742) Concurrent Clear Claimed Marks 0.066ms > [1031.322s][info][gc,marking ] GC(742) Concurrent Scan Root Regions > [1031.322s][info][gc,stringdedup ] Concurrent String Deduplication (1031.322s) > [1031.323s][info][gc,stringdedup ] Concurrent String Deduplication 14224.0B->0.0B(14224.0B) avg 51.1% (1031.322s, 1031.323s) 0.514ms > [1031.326s][info][gc,marking ] GC(742) Concurrent Scan Root Regions 3.939ms > [1031.326s][info][gc,marking ] GC(742) Concurrent Mark (1031.326s) > [1031.326s][info][gc,marking ] GC(742) Concurrent Mark From Roots > [1031.326s][info][gc,task ] GC(742) Using 16 workers of 16 for marking > [1031.483s][info][gc,marking ] GC(742) Concurrent Mark From Roots 157.144ms > [1031.483s][info][gc,marking ] GC(742) Concurrent Preclean > [1031.484s][info][gc,marking ] GC(742) Concurrent Preclean 0.404ms > [1031.484s][info][gc,marking ] GC(742) Concurrent Mark (1031.326s, 1031.484s) 157.587ms > [1031.485s][info][gc,start ] GC(742) Pause Remark > [1031.496s][info][gc ] GC(742) Pause Remark 4625M->3979M(6644M) 10.953ms > [1031.496s][info][gc,cpu ] GC(742) User=0.22s Sys=0.04s Real=0.01s > > > In our production environment, we never use JEP 346 mainly because of > JDK version. > So I cannot tell how if it would work. I agree the "idle" issue is not > our main focus for now. 
> > Using SoftMaxHeapSize to guide adaptive IHOP to make desicion of concurrent > mark GC cycle can work well with JEP 346 and the resize logic in remark. > I don't stick to shrink the heap in every GC. > > The capacity in resize_heap_if_necessary will be > Max2(min_desire_capacity_by_MinHeapFreeRatio, Min2(soft_max_capacity(), > max_desire_capacity_by_MaxHeapFreeRatio)) > > But both 2 approaches have the problem that default MinHeapFreeRatio is > too large > in remark comparing to full gc. As resize_heap_if_necessary > will keep a minimal heap size as 1.667X of used heap size. After remark, > the used size could be large that not only include those old regions > with garbages but > also the used young regions. > > ############################# > void G1CollectedHeap::resize_heap_if_necessary() { > ... > const size_t capacity_after_gc = capacity(); > const size_t used_after_gc = capacity_after_gc - unused_committed_regions_in_bytes(); > ############################# > > The used_after_gc is reasonable for full gc but it can contains young > regions in remark. > Do you think it should be changed like this? > ############################# > const size_t used_after_gc = capacity_after_gc - unused_committed_regions_in_bytes() > - young_regions_count() * HeapRegion::GrainWords; > // young_regions_count is 0 after full GC > ############################# Apart from naming ("used_after_gc") which has been wrong since that method has been in use for Remark, this seems reasonable. Maybe "old_used_after_gc"? I think the comments need changes to reflect that we apply the Min/MaxHeapFreeRatio on the old gen occupancy now (which is the same as total occupancy after full gc) because it may be called with young regions active. I also think the whole code that calculates the expansion and shrinking amount should be moved to the policy (and g1collectedheap code just calling that and then only react on the return value), but that can be done separately. 
> > Besides this, as you suggested, a lower MinHeapFreeRatio would be good. > But arbitrarily setting a fixed number seems is not a good way that the > small number may not meet pause time goal in later young GC. I tried to use > following number in resize_heap_if_necessary: > > ############################## > void G1CollectedHeap::resize_heap_if_necessary() { > ... > // We can now safely turn them into size_t's. > size_t minimum_desired_capacity = (size_t) minimum_desired_capacity_d; > size_t maximum_desired_capacity = (size_t) maximum_desired_capacity_d; > > if (!collector_state()->in_full_gc()) { > minimum_desired_capacity = MIN2(minimum_desired_capacity, policy()->minimum_desired_bytes(used_after_gc)); > } That looks a bit hacky... :) But I do not have a better policy for sizing after full gc either. Did you try always using the minimum_desired_bytes()? > > .... > size_t G1Policy::minimum_desired_bytes(size_t used_bytes) const { > return _ihop_control->unrestrained_young_size() != 0 ? > _ihop_control->unrestrained_young_size() : > _young_list_max_length * HeapRegion::GrainBytes > + _reserve_regions * HeapRegion::GrainBytes + used_bytes; > } I think G1IHOPControl::_target_occupancy (add a getter) is what you want to use here (untested). > ############################# > > I made the minimum_desired_capacity small enough based on adaptive IHOP's > _last_unrestrained_young_size. Even without SoftMaxHeapSize, the test can > keep the memory under 3GB. It's a rough example and I didn't predict the > promotion bytes of next young gc yet. Do you think > a proper value of minimum_desired_capacity in remark resize > + > G1AdaptiveIHOPControl::actual_target_threshold according to > soft_max_capacity> is enough? Yes, both fixing the resizing logic and changing the IHOP target (and young gen size) according to SoftMaxHeapSize should be sufficient to let G1 keep that goal without too many commit activity. 
The resizing logic change could be handled under JDK-8238686, although this change does not modify the use of MaxHeapFreeRatio. There is a cleaned up version of my earlier change that implements the latter at http://cr.openjdk.java.net/~tschatzl/8236073/webrev.1/ . I will test your suggested changes and see its impact on our perf suite. Thanks a lot, Thomas P.S: it would be nice to send diffs of suggested changes for easier application too. From thomas.schatzl at oracle.com Tue Feb 11 12:08:02 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 11 Feb 2020 13:08:02 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com> <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com> <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com> <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com> <97f72395-af73-41f7-98f3-8e22cce5b79b.maoliang.ml@alibaba-inc.com> Message-ID: <0cf69702-549b-9ef6-f13b-33a735536873@oracle.com> Hi, On 11.02.20 12:46, Liang Mao wrote: > Hi Thomas, > > >> >>>?.... >>>?size_t?G1Policy::minimum_desired_bytes(size_t?used_bytes)?const?{ >>>????return?_ihop_control->unrestrained_young_size()?!=?0?? >>>?????????????_ihop_control->unrestrained_young_size()?: >>>?????????????_young_list_max_length?*?HeapRegion::GrainBytes >>>???????????+?_reserve_regions?*?HeapRegion::GrainBytes?+?used_bytes; >>>?} > >> I?think?G1IHOPControl::_target_occupancy?(add?a?getter)?is?what?you?want >> to?use?here?(untested). > > I'm not looking for _target_occupancy which is current heap capacity > because the minimum bytes may exceed it. Since the memory > usage is almost at peak in remark, > old_use_bytes + promoted_bytes_in_next_gc + unrestrained_young_bytes > can be?the minimum desired bytes. You are right, I need to think about this some more. 
I think the calculation assumes that the next gc is the first mixed gc, which isn't true, there's that "Prepare Mixed" GC too. But as an initial approximation it should work. > > >> There?is?a?cleaned?up?version?of?my?earlier?change?that?implements?the >> latter?at http://cr.openjdk.java.net/~tschatzl/8236073/webrev.1/?. > > I have a question that heap size can be shrinked even commit size is not > changed so it could cause a waste of committed free regions. You mean because of regions being larger than the commit size or other reasons? I.e. you have 2M large pages but only 1M regions, so you may end up with G1 not being able to actually uncommit as only half of that 2M page is free? Thanks, Thomas From thomas.schatzl at oracle.com Tue Feb 11 12:28:28 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 11 Feb 2020 13:28:28 +0100 Subject: RFR: 8232686: Turn parallel gc develop tracing flags into unified logging In-Reply-To: <7E333365-53C8-49C0-A0C8-5464E3C8BCDC@oracle.com> References: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com> <37724756-9D64-4DDB-9C8D-A5C4A24B23E9@oracle.com> <7E333365-53C8-49C0-A0C8-5464E3C8BCDC@oracle.com> Message-ID: Hi, On 11.02.20 11:47, Ivan Walulya wrote: > Thanks Stefan, find below patch with the suggested updates. > > http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00-01/ > > http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/01/ lgtm. 
Thanks,
  Thomas

From maoliang.ml at alibaba-inc.com  Tue Feb 11 12:52:25 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Tue, 11 Feb 2020 20:52:25 +0800
Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
In-Reply-To: <0cf69702-549b-9ef6-f13b-33a735536873@oracle.com>
References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com>
 <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com>
 <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com>
 <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com>
 <97f72395-af73-41f7-98f3-8e22cce5b79b.maoliang.ml@alibaba-inc.com>,
 <0cf69702-549b-9ef6-f13b-33a735536873@oracle.com>
Message-ID: 

Hi Thomas,

> I think the calculation assumes that the next gc is the first mixed gc,
> which isn't true, there's that "Prepare Mixed" GC too. But as an initial
> approximation it should work.

I assumed the prepare mixed GC. But technically we need the promotion
bytes in the 1st mixed GC too, right? After I took a look at a gc log,
I found there could be several normal young GCs between remark and the
"Prepare Mixed" GC because it costs time to do some cleanup.
So do you think resizing in "Pause Cleanup" is a better way?

> You mean because of regions being larger than the commit size or other
> reasons? I.e. you have 2M large pages but only 1M regions, so you may
> end up with G1 not being able to actually uncommit as only half of that
> 2M page is free?

No. Sorry for my unclear description. My point is update_heap_target_size
can happen in every normal GC but in remark a real shrink may never happen
(the large MaxHeapFreeRatio will prevent the shrinking).
So we may use a smaller heap size but no regions are uncommitted.

Thanks,
Liang

------------------------------------------------------------------
From:Thomas Schatzl 
Send Time:2020 Feb. 11 (Tue.)
20:08 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On 11.02.20 12:46, Liang Mao wrote: > Hi Thomas, > > >> >>> .... >>> size_t G1Policy::minimum_desired_bytes(size_t used_bytes) const { >>> return _ihop_control->unrestrained_young_size() != 0 ? >>> _ihop_control->unrestrained_young_size() : >>> _young_list_max_length * HeapRegion::GrainBytes >>> + _reserve_regions * HeapRegion::GrainBytes + used_bytes; >>> } > >> I think G1IHOPControl::_target_occupancy (add a getter) is what you want >> to use here (untested). > > I'm not looking for _target_occupancy which is current heap capacity > because the minimum bytes may exceed it. Since the memory > usage is almost at peak in remark, > old_use_bytes + promoted_bytes_in_next_gc + unrestrained_young_bytes > can be the minimum desired bytes. You are right, I need to think about this some more. I think the calculation assumes that the next gc is the first mixed gc, which isn't true, there's that "Prepare Mixed" GC too. But as an initial approximation it should work. > > >> There is a cleaned up version of my earlier change that implements the >> latter at http://cr.openjdk.java.net/~tschatzl/8236073/webrev.1/ . > > I have a question that heap size can be shrinked even commit size is not > changed so it could cause a waste of committed free regions. You mean because of regions being larger than the commit size or other reasons? I.e. you have 2M large pages but only 1M regions, so you may end up with G1 not being able to actually uncommit as only half of that 2M page is free? 
Thanks, Thomas From ivan.walulya at oracle.com Tue Feb 11 13:10:23 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Tue, 11 Feb 2020 14:10:23 +0100 Subject: RFR: 8232686: Turn parallel gc develop tracing flags into unified logging In-Reply-To: References: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com> <37724756-9D64-4DDB-9C8D-A5C4A24B23E9@oracle.com> <7E333365-53C8-49C0-A0C8-5464E3C8BCDC@oracle.com> Message-ID: <3840FA83-22C6-4BA9-A7D3-7F3027653BD5@oracle.com> Thanks Thomas! > On 11 Feb 2020, at 13:28, Thomas Schatzl wrote: > > Hi, > > On 11.02.20 11:47, Ivan Walulya wrote: >> Thanks Stefan, find below patch with the suggested updates. >> http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00-01/ >> http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/01/ > > lgtm. > > Thanks, > Thomas From thomas.schatzl at oracle.com Tue Feb 11 13:27:36 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 11 Feb 2020 14:27:36 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: <90aa2259-afce-44af-abb2-31700caea4a0.maoliang.ml@alibaba-inc.com> <7085d9f4-d579-2fb1-c3ba-938a01ab7576@oracle.com> <6a4dfc59-217c-446d-94ec-f4796d44617c.maoliang.ml@alibaba-inc.com> <72f8bfb6-2039-1d6b-c312-2a9dafe0b735@oracle.com> <97f72395-af73-41f7-98f3-8e22cce5b79b.maoliang.ml@alibaba-inc.com> <0cf69702-549b-9ef6-f13b-33a735536873@oracle.com> Message-ID: <8bf05fec-3f74-f070-28f7-1c61335a3715@oracle.com> Hi, On 11.02.20 13:52, Liang Mao wrote: > Hi Thomas, > >> I think the calculation assumes that the next gc is the first mixed gc, >> which isn't true, there's that "Prepare Mixed" GC too. But as an initial >> approximation it should work. > > I assumed the prepare mixed GC. But technically we need the promotion > bytes in 1st mixed GC too, right? After I took a look at gc > log, I found there could be several normal young GC between remark > and "Prepare Mixed" GC because it costs time to do some cleanup.
Actually, building the remembered sets. > So do you think resize in "Pause Cleanup" is a better way? I am certainly not opposed to moving resizing to the cleanup pause or anywhere else (last mixed gc?) where it makes most sense. Moving to Cleanup would likely make prediction about the "needed" memory easier. >> You mean because of regions being larger than the commit size or other >> reasons? I.e. you have 2M large pages but only 1M regions, so you may >> end up with G1 not being able to actually uncommit as only half of that >> 2M page is free? > No. Sorry for my unclear description. My point is update_heap_target_size > can happen in every normal GC but in remark real shrink may never happen > (the large MaxHeapFreeRatio will prevent the shrinking). > So we may use smaller heap size but no regions are uncommitted. that is true, that is the same issue I wanted to point out with my earlier remark about "The resizing logic change could be handled under JDK-8238686, although this change does not modify the use of MaxHeapFreeRatio. " - i.e. does not uncommit space due to MaxHeapFreeRatio. That may be handled separately if it is easier; JDK-8238686 suggests to think about removing the use of Min/MaxHeapFreeRatio altogether (or maybe use only during full gc). > > Thanks, > Liang > Hth, Thomas From thomas.schatzl at oracle.com Tue Feb 11 13:30:09 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 11 Feb 2020 14:30:09 +0100 Subject: RFR (M): 8238854: Remove superfluous C heap allocation failure checks Message-ID: <51546c32-2d17-ed33-0b84-a56ca16a1227@oracle.com> Hi all, can I have reviews for this change that removes superfluous C heap allocation failure checks (basically hard-exiting the VM) because by default C allocation already exits the VM.
CR: https://bugs.openjdk.java.net/browse/JDK-8238854 Webrev: http://cr.openjdk.java.net/~tschatzl/8238854/webrev/ Testing: hs-tier1-5 without differences Thanks, Thomas From rkennke at redhat.com Tue Feb 11 15:38:38 2020 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 11 Feb 2020 16:38:38 +0100 Subject: RFR: JDK-8238851: Shenandoah: C1: Resolve into registers of correct type Message-ID: In ShBSC1::ensure_in_register() we are blindly creating registers of type T_OBJECT, even though in some cases we actually need T_ADDRESS. This blows up when we verify oop registers: when the argument is of type T_OBJECT we perform extra checks that fail when the value in register is not actually an object. Bug: https://bugs.openjdk.java.net/browse/JDK-8238851 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01/ Testing: the provided testcase passes now. hotspot_gc_shenandoah Ok? Thanks, Roman From shade at redhat.com Tue Feb 11 16:18:48 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 11 Feb 2020 17:18:48 +0100 Subject: RFR: JDK-8238851: Shenandoah: C1: Resolve into registers of correct type In-Reply-To: References: Message-ID: On 2/11/20 4:38 PM, Roman Kennke wrote: > In ShBSC1::ensure_in_register() we are blindly creating registers of > type T_OBJECT, even though in some cases we actually need T_ADDRESS. > This blows up when we verify oop registers: when the argument is of type > T_OBJECT we perform extra checks that fail when the value in register is > not actually an object. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8238851 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01/ > Testing: the provided testcase passes now. hotspot_gc_shenandoah This path probably needs adjustment too: 167 #ifdef AARCH64 168 // AArch64 expects double-size register. 
169 obj_reg = gen->new_pointer_register(); 170 #else -- Thanks, -Aleksey From rkennke at redhat.com Tue Feb 11 19:09:58 2020 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 11 Feb 2020 20:09:58 +0100 Subject: RFR: JDK-8238851: Shenandoah: C1: Resolve into registers of correct type In-Reply-To: References: Message-ID: >> In ShBSC1::ensure_in_register() we are blindly creating registers of >> type T_OBJECT, even though in some cases we actually need T_ADDRESS. >> This blows up when we verify oop registers: when the argument is of type >> T_OBJECT we perform extra checks that fail when the value in register is >> not actually an object. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8238851 >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01/ >> Testing: the provided testcase passes now. hotspot_gc_shenandoah > > This path probably needs adjustment too: > > 167 #ifdef AARCH64 > 168 // AArch64 expects double-size register. > 169 obj_reg = gen->new_pointer_register(); > 170 #else The provided test passes on aarch64 without any additional changes. I tried removing the block, hoping that the suggested change does perhaps make it unnecessary, but no. It's still needed. Further suggestions welcome. This whole thing kinda smells. Roman From kim.barrett at oracle.com Wed Feb 12 00:45:33 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 11 Feb 2020 19:45:33 -0500 Subject: RFR: 8238867: Improve G1DirtyCardQueueSet::Queue::pop Message-ID: <192C6AD3-E241-44B5-874A-3E4D6CF93A41@oracle.com> Please review this change to G1DirtyCardQueueSet::Queue::pop. Previously, if there was exactly one element in the queue, a pop operation could not return it, because doing so could break invariants for concurrent operations. Now, if there is one element and there are concurrent pop operations, one of those operations will win. Note that there are still races between pop and push/append that may prevent the pop operation from obtaining an element. 
CR: https://bugs.openjdk.java.net/browse/JDK-8238867 Webrev: https://cr.openjdk.java.net/~kbarrett/8238867/open.00/ Testing: mach5 tier1-3. mach5 tier1-5 (only linux-x64) in conjunction with other changes. Some performance testing didn't find any unexpected differences. From leo.korinth at oracle.com Wed Feb 12 08:12:48 2020 From: leo.korinth at oracle.com (Leo Korinth) Date: Wed, 12 Feb 2020 09:12:48 +0100 Subject: RFR: 8232686: Turn parallel gc develop tracing flags into unified logging In-Reply-To: <7E333365-53C8-49C0-A0C8-5464E3C8BCDC@oracle.com> References: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com> <37724756-9D64-4DDB-9C8D-A5C4A24B23E9@oracle.com> <7E333365-53C8-49C0-A0C8-5464E3C8BCDC@oracle.com> Message-ID: <9b402df8-7e9d-0bc3-dee8-99709d82117e@oracle.com> Hi Ivan, On 11/02/2020 11:47, Ivan Walulya wrote: > Thanks Stefan, find below patch with the suggested updates. > > http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00-01/ > > http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/01/ Looks good, I will help you push it. Thanks, Leo > > //Ivan > >> On 11 Feb 2020, at 11:26, Stefan Johansson wrote: >> >> Hi Ivan, >> >>> 11 feb. 2020 kl. 08:34 skrev Ivan Walulya : >>> >>> Hi all, >>> >>> Please review a small modification to turn parallel gc develop tracing flags into unified logging >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232686 >>> Webrev: http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00/ >>> >> When looking through the webrev again I realized that we can now remove the "#ifdef ASSERT"
here: >> 1616 #ifdef ASSERT >> 1617 log_develop_debug(gc, marking)( >> 1618 "add_obj_count=" SIZE_FORMAT " " >> 1619 "add_obj_bytes=" SIZE_FORMAT, >> 1620 add_obj_count, >> 1621 add_obj_size * HeapWordSize); >> 1622 log_develop_debug(gc, marking)( >> 1623 "mark_bitmap_count=" SIZE_FORMAT " " >> 1624 "mark_bitmap_bytes=" SIZE_FORMAT, >> 1625 mark_bitmap_count, >> 1626 mark_bitmap_size * HeapWordSize); >> 1627 #endif // #ifdef ASSERT >> >> Otherwise a very nice cleanup. >> >> Thanks, >> Stefan >> >>> Testing: Tier 1 - Tier 3 >>> >>> //Ivan >> > From shade at redhat.com Wed Feb 12 09:10:23 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 12 Feb 2020 10:10:23 +0100 Subject: RFR: JDK-8238851: Shenandoah: C1: Resolve into registers of correct type In-Reply-To: References: Message-ID: <80502580-be2e-7cc8-c0e8-e9a9d11adffc@redhat.com> On 2/11/20 8:09 PM, Roman Kennke wrote: >>> In ShBSC1::ensure_in_register() we are blindly creating registers of >>> type T_OBJECT, even though in some cases we actually need T_ADDRESS. >>> This blows up when we verify oop registers: when the argument is of type >>> T_OBJECT we perform extra checks that fail when the value in register is >>> not actually an object. >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8238851 >>> Webrev: >>> http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01/ >>> Testing: the provided testcase passes now. hotspot_gc_shenandoah >> >> This path probably needs adjustment too: >> >> 167 #ifdef AARCH64 >> 168 // AArch64 expects double-size register. >> 169 obj_reg = gen->new_pointer_register(); >> 170 #else > > The provided test passes on aarch64 without any additional changes. > > I tried removing the block, hoping that the suggested change does > perhaps make it unnecessary, but no. It's still needed. Gaawh. The non-AARCH64 path still looks good, so we can push it in current form. We really need to figure out AARCH64 thingie, please file the follow-up RFR? 
-- Thanks, -Aleksey From ivan.walulya at oracle.com Wed Feb 12 09:31:16 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Wed, 12 Feb 2020 10:31:16 +0100 Subject: RFR: 8232686: Turn parallel gc develop tracing flags into unified logging In-Reply-To: <9b402df8-7e9d-0bc3-dee8-99709d82117e@oracle.com> References: <5B542A0A-3477-40F9-9DD8-AC86E3870E60@oracle.com> <37724756-9D64-4DDB-9C8D-A5C4A24B23E9@oracle.com> <7E333365-53C8-49C0-A0C8-5464E3C8BCDC@oracle.com> <9b402df8-7e9d-0bc3-dee8-99709d82117e@oracle.com> Message-ID: <88DBCBBC-C355-4BD5-8DAA-1ECDD487D17A@oracle.com> Thanks Leo! //Ivan > On 12 Feb 2020, at 09:12, Leo Korinth wrote: > > Hi Ivan, > > On 11/02/2020 11:47, Ivan Walulya wrote: >> Thanks Stefan, find below patch with the suggested updates. >> http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00-01/ >> http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/01/ > > > Looks good, I will help you push it. > > Thanks, > Leo > > >> //Ivan >>> On 11 Feb 2020, at 11:26, Stefan Johansson wrote: >>> >>> Hi Ivan, >>> >>>> 11 feb. 2020 kl. 08:34 skrev Ivan Walulya : >>>> >>>> Hi all, >>>> >>>> Please review a small modification to turn parallel gc develop tracing flags into unified logging >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232686 >>>> Webrev: http://cr.openjdk.java.net/~sjohanss/iwalulya/8232686/00/ >>>> >>> When looking through the webrev again I realized that we can now remove the "#ifdef ASSERT" here: >>> 1616 #ifdef ASSERT >>> 1617 log_develop_debug(gc, marking)( >>> 1618 "add_obj_count=" SIZE_FORMAT " " >>> 1619 "add_obj_bytes=" SIZE_FORMAT, >>> 1620 add_obj_count, >>> 1621 add_obj_size * HeapWordSize); >>> 1622 log_develop_debug(gc, marking)( >>> 1623 "mark_bitmap_count=" SIZE_FORMAT " " >>> 1624 "mark_bitmap_bytes=" SIZE_FORMAT, >>> 1625 mark_bitmap_count, >>> 1626 mark_bitmap_size * HeapWordSize); >>> 1627 #endif // #ifdef ASSERT >>> >>> Otherwise a very nice cleanup.
>>> >>> Thanks, >>> Stefan >>> >>>> Testing: Tier 1 - Tier 3 >>>> >>>> //Ivan >>> From maoliang.ml at alibaba-inc.com Wed Feb 12 10:17:15 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Wed, 12 Feb 2020 18:17:15 +0800 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Message-ID: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> Hi Thomas, I made a new patch for the issues we listed in JDK-8238686 and JDK-8236073: http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ Main changes are: 1) Don't use MinHeapFreeRatio in concurrent mark stage to guarantee the minimal commit size as we discussed. I use the IHOP prediction instead. 2) Remove resize_heap_if_necessary in remark. Heap expansion will be based on 1) in concurrent mark cleanup pause but there is no shrink at that time 3) Heap shrink will happen after mixed GC(s). I use 3 numbers to determine the target capacity: a) maximum_desired_capacity by MaxHeapFreeRatio(here I still use MaxHeapFreeRatio because it is unified with full gc, 30% of live objects makes sense); b) minimum_desired_bytes predicted in 1) in cleanup pause(to make sure we will not do a shrink just after an expansion); c) soft_max_capacity 4) expand/shrink logic are moved into sizing policy. Apparently, it solves the issues in both JDK-8238686 and JDK-8236073. I have run the original SPECjbb2015 test and it looks fine. The test will not commit memory as aggressively as the original in remark and it is able to shrink heap after changing SoftMaxHeapSize(2500M) via jinfo. The heap capacity will drop from ~3G to 2500m. The new flow can work with JEP 346 and benefit it for better memory saving. The only remaining problem is shrinking heap after mixed GCs may not happen on time if application is in "idle". We may still need a timer to make sure mixed GC can happen?
Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 11 (Tue.) 21:27 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On 11.02.20 13:52, Liang Mao wrote: > Hi Thomas, > >> I think the calculation assumes that the next gc is the first mixed gc, >> which isn't true, there's that "Prepare Mixed" GC too. But as an initial >> approximation it should work. > > I assumed the prepare mixed GC. But technically we need the promotion > bytes in 1st mixed GC too, right? After I took a look at gc > log, I found there could be several normal young GC between remark > and "Prepare Mixed" GC because it costs time to do some cleanup. Actually, building the remembered sets. > So do you think resize in "Pause Cleanup" is a better way? I am certainly not opposed to moving resizing to the cleanup pause or anywhere else (last mixed gc?) where it makes most sense. Moving to Cleanup would likely make prediction about the "needed" memory easier. >> You mean because of regions being larger than the commit size or other >> reasons? I.e. you have 2M large pages but only 1M regions, so you may >> end up with G1 not being able to actually uncommit as only half of that >> 2M page is free? > > No. Sorry for my unclear description. My point is update_heap_target_size > can happen in every normal GC but in remark real shrink may never happen > (the large MaxHeapFreeRatio will prevent the shrinking). > So we may use smaller heap size but no regions are uncommitted. that is true, that is the same issue I wanted to point out with my earlier remark about "The resizing logic change could be handled under JDK-8238686, although this change does not modify the use of MaxHeapFreeRatio. " - i.e. does not uncommit space due to MaxHeapFreeRatio.
That may be handled separately if it is easier; JDK-8238686 suggests to think about removing the use of Min/MaxHeapFreeRatio altogether (or maybe use only during full gc). > > Thanks, > Liang > Hth, Thomas From richard.reingruber at sap.com Wed Feb 12 10:23:27 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Wed, 12 Feb 2020 10:23:27 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant Message-ID: // Repost including hotspot runtime and gc lists. // Dean Long suggested to do so, because the enhancement replaces a vm operation // with a handshake. // Original thread: http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-February/030359.html Hi, could I please get reviews for this small enhancement in hotspot's jvmti implementation: Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 The change avoids making all compiled methods on stack not_entrant when switching a java thread to interpreter only execution for jvmti purposes. It is sufficient to deoptimize the compiled frames on stack. Additionally a handshake is used instead of a vm operation to walk the stack and do the deoptimizations. Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and release builds on all platforms. Thanks, Richard.
See also my question if anyone knows a reason for making the compiled methods not_entrant: http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html From thomas.schatzl at oracle.com Wed Feb 12 11:16:50 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 12 Feb 2020 12:16:50 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> Message-ID: <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com> Hi Liang, On 12.02.20 11:17, Liang Mao wrote: > Hi Thomas, > > I made a new patch for the issues we listed in?JDK-8238686 and > JDK-8236073: > http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ thanks. I only had time to quickly browse the change, and started building and testing it internally. I will run it through our perf benchmarks to look for regressions of out-of-box behavior. I will need a day or two until I can get back to looking at the change in detail. There is currently something else I need to look at. Sorry. Thanks, Thomas From stefan.johansson at oracle.com Wed Feb 12 11:38:47 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 12 Feb 2020 12:38:47 +0100 Subject: RFR (M): 8238854: Remove superfluous C heap allocation failure checks In-Reply-To: <51546c32-2d17-ed33-0b84-a56ca16a1227@oracle.com> References: <51546c32-2d17-ed33-0b84-a56ca16a1227@oracle.com> Message-ID: <155C9D38-1F5C-452E-88A7-C24C9F41CF57@oracle.com> Hi Thomas, > 11 feb. 2020 kl. 14:30 skrev Thomas Schatzl : > > Hi all, > > can I have reviews for this change that removes superfluous C heap allocation failure checks (basically hard-exiting the VM) because by default C allocation already exits the VM. 
> > CR: > https://bugs.openjdk.java.net/browse/JDK-8238854 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8238854/webrev/ Nice cleanup in general, but MemRegion doesn't derive from CHeapObj and will return NULL on failure: void* MemRegion::operator new(size_t size) throw() { return (address)AllocateHeap(size, mtGC, CURRENT_PC, AllocFailStrategy::RETURN_NULL); } So we should either change this to use the default AllocFailStrategy or keep the checks. Otherwise it looks good, Stefan > Testing: > hs-tier1-5 without differences > > Thanks, > Thomas From thomas.schatzl at oracle.com Wed Feb 12 12:10:46 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 12 Feb 2020 13:10:46 +0100 Subject: RFR (M): 8238854: Remove superfluous C heap allocation failure checks In-Reply-To: <155C9D38-1F5C-452E-88A7-C24C9F41CF57@oracle.com> References: <51546c32-2d17-ed33-0b84-a56ca16a1227@oracle.com> <155C9D38-1F5C-452E-88A7-C24C9F41CF57@oracle.com> Message-ID: Hi Stefan, thanks for your review. On 12.02.20 12:38, Stefan Johansson wrote: > Hi Thomas, > >> 11 feb. 2020 kl. 14:30 skrev Thomas Schatzl : >> >> Hi all, >> >> can I have reviews for this change that removes superfluous C heap allocation failure checks (basically hard-exiting the VM) because by default C allocation already exits the VM. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8238854 >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8238854/webrev/ > Nice cleanup in general, but MemRegion doesn't derive from CHeapObj and will return NULL on failure: > void* MemRegion::operator new(size_t size) throw() { > return (address)AllocateHeap(size, mtGC, CURRENT_PC, > AllocFailStrategy::RETURN_NULL); > } > > So we should either change this to use the default AllocFailStrategy or keep the checks. > Nice catch. I opted to revert the changes for MemRegion allocation.
Although I think all users of new for MemRegion expect it to fail (the only other user in filemap.cpp will crash with NPE a few lines after allocation), this needs more investigation because the change introducing the new operator talks about some clang compatibility issue (from 2003). But it also indicates that the problem occurred only on a use that is not in the code base any more (JDK-8021954 ftr, it is a closed issue that can't be opened). (Note that my testing did not reproduce the failure, but, the code is not used in the crashing component, i.e. metaspace handling, any more). http://cr.openjdk.java.net/~tschatzl/8238854/webrev.0_to_1/ (diff) http://cr.openjdk.java.net/~tschatzl/8238854/webrev.1/ (full) Thanks, Thomas From stefan.johansson at oracle.com Wed Feb 12 12:16:15 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 12 Feb 2020 13:16:15 +0100 Subject: RFR (M): 8238854: Remove superfluous C heap allocation failure checks In-Reply-To: References: <51546c32-2d17-ed33-0b84-a56ca16a1227@oracle.com> <155C9D38-1F5C-452E-88A7-C24C9F41CF57@oracle.com> Message-ID: Hi Thomas, > 12 feb. 2020 kl. 13:10 skrev Thomas Schatzl : > > Hi Stefan, > > thanks for your review. > > On 12.02.20 12:38, Stefan Johansson wrote: >> Hi Thomas, >>> 11 feb. 2020 kl. 14:30 skrev Thomas Schatzl : >>> >>> Hi all, >>> >>> can I have reviews for this change that removes superfluous C heap allocation failure checks (basically hard-exiting the VM) because by default C allocation already exits the VM. 
>>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8238854 >>> Webrev: >>> http://cr.openjdk.java.net/~tschatzl/8238854/webrev/ >> Nice cleanup in general, but MemRegion doesn't derive from CHeapObj and will return NULL on failure: >> void* MemRegion::operator new(size_t size) throw() { >> return (address)AllocateHeap(size, mtGC, CURRENT_PC, >> AllocFailStrategy::RETURN_NULL); >> } >> So we should either change this to use the default AllocFailStrategy or keep the checks. > > Nice catch. I opted to revert the changes for MemRegion allocation. > > Although I think all users of new for MemRegion expect it to fail (the only other user in filemap.cpp will crash with NPE a few lines after allocation), this needs more investigation because the change introducing the new operator talks about some clang compatibility issue (from 2003). But it also indicates that the problem occurred only on a use that is not in the code base any more (JDK-8021954 ftr, it is a closed issue that can't be opened). > > (Note that my testing did not reproduce the failure, but, the code is not used in the crashing component, i.e. metaspace handling, any more). > > http://cr.openjdk.java.net/~tschatzl/8238854/webrev.0_to_1/ (diff) > http://cr.openjdk.java.net/~tschatzl/8238854/webrev.1/ (full) I agree with your reasoning above and think this is good. Thanks, Stefan > > Thanks, > Thomas From rkennke at redhat.com Wed Feb 12 14:21:12 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 12 Feb 2020 15:21:12 +0100 Subject: RFR: JDK-8238851: Shenandoah: C1: Resolve into registers of correct type In-Reply-To: <80502580-be2e-7cc8-c0e8-e9a9d11adffc@redhat.com> References: <80502580-be2e-7cc8-c0e8-e9a9d11adffc@redhat.com> Message-ID: >>>> In ShBSC1::ensure_in_register() we are blindly creating registers of >>>> type T_OBJECT, even though in some cases we actually need T_ADDRESS.
>>>> This blows up when we verify oop registers: when the argument is of type >>>> T_OBJECT we perform extra checks that fail when the value in register is >>>> not actually an object. >>>> >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8238851 >>>> Webrev: >>>> http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01/ >>>> Testing: the provided testcase passes now. hotspot_gc_shenandoah >>> >>> This path probably needs adjustment too: >>> >>> 167 #ifdef AARCH64 >>> 168 // AArch64 expects double-size register. >>> 169 obj_reg = gen->new_pointer_register(); >>> 170 #else >> >> The provided test passes on aarch64 without any additional changes. >> >> I tried removing the block, hoping that the suggested change does >> perhaps make it unnecessary, but no. It's still needed. > > Gaawh. The non-AARCH64 path still looks good, so we can push it in current form. We really need to > figure out AARCH64 thingie, please file the follow-up RFR? Turns out that we can fix this rather easily. It is the non-aarch64 path that is wrong though: Differential: http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01.diff/ Full: http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01/ This is also consistent with the implementations of LIRAssembler::leal() both x86 and aarch64. Testing: passes hotspot_gc_shenandoah both aarch64 and x86 Good? Roman From shade at redhat.com Wed Feb 12 14:25:23 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 12 Feb 2020 15:25:23 +0100 Subject: RFR: JDK-8238851: Shenandoah: C1: Resolve into registers of correct type In-Reply-To: References: <80502580-be2e-7cc8-c0e8-e9a9d11adffc@redhat.com> Message-ID: On 2/12/20 3:21 PM, Roman Kennke wrote: > Differential: > http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01.diff/ > Full: > http://cr.openjdk.java.net/~rkennke/JDK-8238851/webrev.01/ > > This is also consistent with the implementations of LIRAssembler::leal() > both x86 and aarch64. Looks good. 
-- Thanks, -Aleksey From mikael.vidstedt at oracle.com Wed Feb 12 17:27:33 2020 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Wed, 12 Feb 2020 09:27:33 -0800 Subject: RFR(XS): 8238932: Invalid tier1_gc_1 test group definition Message-ID: Please review this small change which fixes the definition of the tier1_gc_1 jtreg test group. JBS: https://bugs.openjdk.java.net/browse/JDK-8238932 Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8238932/webrev.00/open/webrev/ The issue was introduced as part of JDK-8212657[1] "Promptly Return Unused Committed Memory from G1". The missing backslash means "-gc/g1/TestTimelyCompaction.java" will actually be interpreted as a test group name by jtreg, resulting in an empty test group. There is no TestTimelyCompaction.java test/file, so either it was not added, or it's really supposed to be test/hotspot/jtreg/gc/g1/TestPeriodicCollection.java (which was added as part of the same change). In either case, since the test group definition has been used successfully for more than a year now it seems like simply removing the faulty line should do the trick... Cheers, Mikael [1] https://bugs.openjdk.java.net/browse/JDK-8212657 From kim.barrett at oracle.com Wed Feb 12 18:47:36 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 12 Feb 2020 13:47:36 -0500 Subject: RFR(XS): 8238932: Invalid tier1_gc_1 test group definition In-Reply-To: References: Message-ID: <71EAB7E4-2B66-4BCD-90D5-CF6AC2C814A5@oracle.com> > On Feb 12, 2020, at 12:27 PM, Mikael Vidstedt wrote: > > > Please review this small change which fixes the definition of the tier1_gc_1 jtreg test group. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8238932 > Webrev: http://cr.openjdk.java.net/~mikael/webrevs/8238932/webrev.00/open/webrev/ > > The issue was introduced as part of JDK-8212657[1] "Promptly Return Unused Committed Memory from G1". The missing backslash means "-gc/g1/TestTimelyCompaction.java"
will actually be interpreted as a test group name by jtreg, resulting in an empty test group. There is no TestTimelyCompaction.java test/file, so either it was not added, or it's really supposed to be test/hotspot/jtreg/gc/g1/TestPeriodicCollection.java (which was added as part of the same change). > > In either case, since the test group definition has been used successfully for more than a year now it seems like simply removing the faulty line should do the trick... > > Cheers, > Mikael > > [1] https://bugs.openjdk.java.net/browse/JDK-8212657 Looks good, and trivial. Maybe someone should contact the original authors of the change regarding the possibly missing test. From kim.barrett at oracle.com Thu Feb 13 00:51:18 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 12 Feb 2020 19:51:18 -0500 Subject: RFR (M): 8238854: Remove superfluous C heap allocation failure checks In-Reply-To: References: <51546c32-2d17-ed33-0b84-a56ca16a1227@oracle.com> <155C9D38-1F5C-452E-88A7-C24C9F41CF57@oracle.com> Message-ID: > On Feb 12, 2020, at 7:16 AM, Stefan Johansson wrote: > > Hi Thomas, > >> 12 feb. 2020 kl. 13:10 skrev Thomas Schatzl : >> >> Hi Stefan, >> >> thanks for your review. >> >> On 12.02.20 12:38, Stefan Johansson wrote: >>> Hi Thomas, >>>> 11 feb. 2020 kl. 14:30 skrev Thomas Schatzl : >>>> >>>> Hi all, >>>> >>>> can I have reviews for this change that removes superfluous C heap allocation failure checks (basically hard-exiting the VM) because by default C allocation already exits the VM.
>>>> CR: >>>> https://bugs.openjdk.java.net/browse/JDK-8238854 >>>> Webrev: >>>> http://cr.openjdk.java.net/~tschatzl/8238854/webrev/ >>> Nice cleanup in general, but MemRegion doesn't derive from CHeapObj and will return NULL on failure: >>> void* MemRegion::operator new(size_t size) throw() { >>> return (address)AllocateHeap(size, mtGC, CURRENT_PC, >>> AllocFailStrategy::RETURN_NULL); >>> } >>> So we should either change this to use the default AllocFailStrategy or keep the checks. >> >> Nice catch. I opted to revert the changes for MemRegion allocation. >> >> Although I think all users of new for MemRegion expect it to fail (the only other user in filemap.cpp will crash with NPE a few lines after allocation), this needs more investigation because the change introducing the new operator talks about some clang compatibility issue (from 2003). But it also indicates that the problem occurred only on a use that is not in the code base any more (JDK-8021954 ftr, it is a closed issue that can't be opened). >> >> (Note that my testing did not reproduce the failure, but, the code is not used in the crashing component, i.e. metaspace handling, any more). >> >> http://cr.openjdk.java.net/~tschatzl/8238854/webrev.0_to_1/ (diff) >> http://cr.openjdk.java.net/~tschatzl/8238854/webrev.1/ (full) > > I agree with your reasoning above and think this is good. > > Thanks, > Stefan I agree too. Looks good. Maybe file an RFE to look at this? MemRegion allocator functions are declared throw(), which is atypical and definitely strange for us. When building with gcc we use -fcheck-new. I'm not sure
From manc at google.com Thu Feb 13 01:34:43 2020 From: manc at google.com (Man Cao) Date: Wed, 12 Feb 2020 17:34:43 -0800 Subject: RFR (XS): 8234608: [TESTBUG] Memory leak in gc/g1/unloading/libdefine.cpp In-Reply-To: <5fb5f27c-7b4e-f72e-a01f-aebb619c9558@oracle.com> References: <5fb5f27c-7b4e-f72e-a01f-aebb619c9558@oracle.com> Message-ID: Could I have a second review? -Man From ivan.walulya at oracle.com Thu Feb 13 09:40:34 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Thu, 13 Feb 2020 10:40:34 +0100 Subject: RFR: 8238867: Improve G1DirtyCardQueueSet::Queue::pop In-Reply-To: <192C6AD3-E241-44B5-874A-3E4D6CF93A41@oracle.com> References: <192C6AD3-E241-44B5-874A-3E4D6CF93A41@oracle.com> Message-ID: <070328D8-808D-4F74-ACAD-2CD9DCA1C9FF@oracle.com> This is a good fix to blocking on the last element. (Not a reviewer). > On 12 Feb 2020, at 01:45, Kim Barrett wrote: > > Please review this change to G1DirtyCardQueueSet::Queue::pop. > Previously, if there was exactly one element in the queue, a pop > operation could not return it, because doing so could break invariants > for concurrent operations. Now, if there is one element and there are > concurrent pop operations, one of those operations will win. Note > that there are still races between pop and push/append that may > prevent the pop operation from obtaining an element. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8238867 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8238867/open.00/ > > Testing: > mach5 tier1-3. > mach5 tier1-5 (only linux-x64) in conjunction with other changes. > Some performance testing didn't find any unexpected differences. 
> From thomas.schatzl at oracle.com Thu Feb 13 09:59:24 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 13 Feb 2020 10:59:24 +0100 Subject: RFR (M): 8238854: Remove superfluous C heap allocation failure checks In-Reply-To: References: <51546c32-2d17-ed33-0b84-a56ca16a1227@oracle.com> <155C9D38-1F5C-452E-88A7-C24C9F41CF57@oracle.com> Message-ID: <4864d2fb-933b-7be3-14c5-7903a11c7ff0@oracle.com> Hi Kim, Stefan, On 13.02.20 01:51, Kim Barrett wrote: >> On Feb 12, 2020, at 7:16 AM, Stefan Johansson wrote: >> >> Hi Thomas, >> >>> On 12 Feb 2020, at 13:10, Thomas Schatzl wrote: [...] >>> Although I think all users of new for MemRegion expect it to fail (the only other user in filemap.cpp will crash with NPE a few lines after allocation), this needs more investigation because the change introducing the new operator talks about some clang compatibility issue (from 2003). But it also indicates that the problem occurred only on a use that is not in the code base any more (JDK-8021954 ftr, it is a closed issue that can't be opened). >>> >>> (Note that my testing did not reproduce the failure, but, the code is not used in the crashing component, i.e. metaspace handling, any more). >>> >>> http://cr.openjdk.java.net/~tschatzl/8238854/webrev.0_to_1/ (diff) >>> http://cr.openjdk.java.net/~tschatzl/8238854/webrev.1/ (full) >> >> I agree with your reasoning above and think this is good. >> >> Thanks, >> Stefan > > I agree too. Looks good. > > Maybe file an RFE to look at this? MemRegion allocator functions are declared throw(), which is > atypical and definitely strange for us. When building with gcc we use -fcheck-new. I'm not sure > how those interact, or exactly what -fcheck-new does, or whether we actually need -fcheck-new. > Filed JDK-8238999; thanks for your reviews. 
Thomas From stefan.johansson at oracle.com Thu Feb 13 11:23:52 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 13 Feb 2020 12:23:52 +0100 Subject: RFR: 8238867: Improve G1DirtyCardQueueSet::Queue::pop In-Reply-To: <192C6AD3-E241-44B5-874A-3E4D6CF93A41@oracle.com> References: <192C6AD3-E241-44B5-874A-3E4D6CF93A41@oracle.com> Message-ID: <6363B6D8-FECD-4F0B-B86B-0B493692D84B@oracle.com> Hi Kim, > On 12 Feb 2020, at 01:45, Kim Barrett wrote: > > Please review this change to G1DirtyCardQueueSet::Queue::pop. > Previously, if there was exactly one element in the queue, a pop > operation could not return it, because doing so could break invariants > for concurrent operations. Now, if there is one element and there are > concurrent pop operations, one of those operations will win. Note > that there are still races between pop and push/append that may > prevent the pop operation from obtaining an element. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8238867 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8238867/open.00/ Looks good, thanks for all the comments. Makes it easier to follow. Thanks, Stefan > > Testing: > mach5 tier1-3. > mach5 tier1-5 (only linux-x64) in conjunction with other changes. > Some performance testing didn't find any unexpected differences. From stefan.johansson at oracle.com Thu Feb 13 12:01:12 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 13 Feb 2020 13:01:12 +0100 Subject: RFR (XS): 8234608: [TESTBUG] Memory leak in gc/g1/unloading/libdefine.cpp In-Reply-To: References: <5fb5f27c-7b4e-f72e-a01f-aebb619c9558@oracle.com> Message-ID: <27E85C8A-5223-4A00-B5B3-71212087AE10@oracle.com> Looks good, Stefan > On 13 Feb 2020, at 02:34, Man Cao wrote: > > Could I have a second review? 
> > -Man From manc at google.com Thu Feb 13 18:57:58 2020 From: manc at google.com (Man Cao) Date: Thu, 13 Feb 2020 10:57:58 -0800 Subject: RFR (XS): 8234608: [TESTBUG] Memory leak in gc/g1/unloading/libdefine.cpp In-Reply-To: <27E85C8A-5223-4A00-B5B3-71212087AE10@oracle.com> References: <5fb5f27c-7b4e-f72e-a01f-aebb619c9558@oracle.com> <27E85C8A-5223-4A00-B5B3-71212087AE10@oracle.com> Message-ID: Thanks for the reviews! -Man From kim.barrett at oracle.com Thu Feb 13 19:53:12 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 13 Feb 2020 14:53:12 -0500 Subject: RFR: 8238867: Improve G1DirtyCardQueueSet::Queue::pop In-Reply-To: <070328D8-808D-4F74-ACAD-2CD9DCA1C9FF@oracle.com> References: <192C6AD3-E241-44B5-874A-3E4D6CF93A41@oracle.com> <070328D8-808D-4F74-ACAD-2CD9DCA1C9FF@oracle.com> Message-ID: <5BFAB89A-6476-42D4-AE04-92B053615E4C@oracle.com> > On Feb 13, 2020, at 4:40 AM, Ivan Walulya wrote: > > This is a good fix to blocking on the last element. (Not a reviewer). Thanks. > >> On 12 Feb 2020, at 01:45, Kim Barrett wrote: >> >> Please review this change to G1DirtyCardQueueSet::Queue::pop. >> Previously, if there was exactly one element in the queue, a pop >> operation could not return it, because doing so could break invariants >> for concurrent operations. Now, if there is one element and there are >> concurrent pop operations, one of those operations will win. Note >> that there are still races between pop and push/append that may >> prevent the pop operation from obtaining an element. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8238867 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8238867/open.00/ >> >> Testing: >> mach5 tier1-3. >> mach5 tier1-5 (only linux-x64) in conjunction with other changes. >> Some performance testing didn't find any unexpected differences. 
From kim.barrett at oracle.com Thu Feb 13 19:53:26 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 13 Feb 2020 14:53:26 -0500 Subject: RFR: 8238867: Improve G1DirtyCardQueueSet::Queue::pop In-Reply-To: <6363B6D8-FECD-4F0B-B86B-0B493692D84B@oracle.com> References: <192C6AD3-E241-44B5-874A-3E4D6CF93A41@oracle.com> <6363B6D8-FECD-4F0B-B86B-0B493692D84B@oracle.com> Message-ID: > On Feb 13, 2020, at 6:23 AM, Stefan Johansson wrote: > > Hi Kim, > >> 12 feb. 2020 kl. 01:45 skrev Kim Barrett : >> >> Please review this change to G1DirtyCardQueueSet::Queue::pop. >> Previously, if there was exactly one element in the queue, a pop >> operation could not return it, because doing so could break invariants >> for concurrent operations. Now, if there is one element and there are >> concurrent pop operations, one of those operations will win. Note >> that there are still races between pop and push/append that may >> prevent the pop operation from obtaining an element. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8238867 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8238867/open.00/ > Looks good, thanks for all the comments. Makes it easier to follow. Thanks. > > Thanks, > Stefan > >> >> Testing: >> mach5 tier1-3. >> mach5 tier1-5 (only linux-x64) in conjunction with other changes. >> Some performance testing didn't find any unexpected differences. From kim.barrett at oracle.com Fri Feb 14 01:46:46 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 13 Feb 2020 20:46:46 -0500 Subject: RFR: 8238979: Improve G1DirtyCardQueueSet handling of previously paused buffers Message-ID: Please review this simplification of the handling of previously paused buffers by G1DirtyCardQueueSet. This change moves the call to enqueue_previous_paused_buffers() into record_paused_buffer(). This ensures any paused buffers from a previous safepoint have been flushed out before recording a buffer for the next safepoint. 
This move eliminates the former precondition that the enqueue had to have been performed before recording. This move also permits the enqueue_previous_paused_buffers in get_completed_buffer() to be moved to a point where it will be called much more rarely, slightly improving the normal performance of get_dirtied_buffer. The old location of the call was in support of the call order invariant needed by record_paused_buffer(). As a consequence of the changed enqueue locations, the fast path check in enqueue_previous_paused_buffers() will now only rarely succeed, and is no longer worth the (very small) performance cost and (much more importantly) the largish block comment arguing its correctness. So that fast path is removed. And since the raison d'etre for PausedBuffers::is_empty() was to support that fast path, that function is also removed. CR: https://bugs.openjdk.java.net/browse/JDK-8238979 Webrev: https://cr.openjdk.java.net/~kbarrett/8238979/open.00/ Testing: mach5 tier1-5 in conjunction with other in-development changes. Local (linux-x64) hotspot:tier1 for this change in isolation. From suenaga at oss.nttdata.com Fri Feb 14 09:07:59 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Fri, 14 Feb 2020 18:07:59 +0900 Subject: Use DAX in ZGC Message-ID: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> Hi all, I tried to allocate the heap on DAX on Linux with -XX:AllocateHeapAt, but I couldn't. It seems to be allowed only when the filesystem is hugetlbfs or tmpfs. According to the kernel documentation [1], DAX is supported in ext2, ext4, and xfs. Also we need to mount it with "-o dax". I want to use ZGC on DAX, so I want to introduce a new option -XX:ZAllowHeapOnFileSystem to allow any filesystem to be used as backing storage. What do you think of this change? http://cr.openjdk.java.net/~ysuenaga/dax-z/ If it can be accepted, I will file it in JBS and will propose a CSR. 
Thanks, Yasumasa [1] https://www.kernel.org/doc/Documentation/filesystems/dax.txt From per.liden at oracle.com Fri Feb 14 11:52:42 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 14 Feb 2020 12:52:42 +0100 Subject: Use DAX in ZGC In-Reply-To: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> Message-ID: <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> Hi Yasumasa, On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: > Hi all, > > I tried to allocate the heap on DAX on Linux with -XX:AllocateHeapAt, but I > couldn't. > It seems to be allowed only when the filesystem is hugetlbfs or tmpfs. > > According to the kernel documentation [1], DAX is supported in ext2, ext4, and > xfs. > Also we need to mount it with "-o dax". > > I want to use ZGC on DAX, so I want to introduce a new option > -XX:ZAllowHeapOnFileSystem to allow any filesystem to be used as backing > storage. > What do you think of this change? + experimental(bool, ZAllowHeapOnFileSystem, false, \ + "Allow to use filesystem as Java heap backing storage " \ + "specified by -XX:AllocateHeapAt") \ + \ Instead of adding a new option it would be preferable to automatically detect that it's a dax mounted filesystem. But I haven't had a chance to look into the best way of doing that. const size_t expected_block_size = is_tmpfs() ? os::vm_page_size() : os::large_page_size(); - if (expected_block_size != _block_size) { + if (!ZAllowHeapOnFileSystem && (expected_block_size != _block_size)) { log_error(gc)("%s filesystem has unexpected block size " SIZE_FORMAT " (expected " SIZE_FORMAT ")", is_tmpfs() ? ZFILESYSTEM_TMPFS : ZFILESYSTEM_HUGETLBFS, _block_size, expected_block_size); return; } This part looks potentially dangerous, since we might then be working with an incorrect _block_size. 
int ZPhysicalMemoryBacking::create_file_fd(const char* name) const { + if (ZAllowHeapOnFileSystem && (AllocateHeapAt == NULL)) { + log_error(gc)("-XX:AllocateHeapAt is needed when ZAllowHeapOnFileSystem is specified"); + return -1; + } + const char* const filesystem = ZLargePages::is_explicit() ? ZFILESYSTEM_HUGETLBFS : ZFILESYSTEM_TMPFS; This part looks unnecessary, no? cheers, Per > > http://cr.openjdk.java.net/~ysuenaga/dax-z/ > > If it can be accepted, I will file it to JBS and will propose CSR. > > > Thanks, > > Yasumasa > > > [1] https://www.kernel.org/doc/Documentation/filesystems/dax.txt From richard.reingruber at sap.com Fri Feb 14 12:58:41 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 14 Feb 2020 12:58:41 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: <3c59b9f9-ec38-18c9-8f24-e1186a08a04a@oracle.com> References: <3c59b9f9-ec38-18c9-8f24-e1186a08a04a@oracle.com> Message-ID: Hi Patricio, thanks for having a look. > I'm only commenting on the handshake changes. > I see that operation VM_EnterInterpOnlyMode can be called inside > operation VM_SetFramePop which also allows nested operations. Here is a > comment in VM_SetFramePop definition: > > // Nested operation must be allowed for the VM_EnterInterpOnlyMode that is > // called from the JvmtiEventControllerPrivate::recompute_thread_enabled. > > So if we change VM_EnterInterpOnlyMode to be a handshake, then now we > could have a handshake inside a safepoint operation. The issue I see > there is that at the end of the handshake the polling page of the target > thread could be disarmed. So if the target thread happens to be in a > blocked state just transiently and wakes up then it will not stop for > the ongoing safepoint. Maybe I can file an RFE to assert that the > polling page is armed at the beginning of disarm_safepoint(). 
I'm really glad you noticed the problematic nesting. This seems to be a general issue: currently a handshake cannot be nested in a vm operation. Maybe it should be asserted in the Handshake::execute() methods that they are not called by the vm thread evaluating a vm operation? > Alternatively I think you could do something similar to what we do in > Deoptimization::deoptimize_all_marked(): > > EnterInterpOnlyModeClosure hs; > if (SafepointSynchronize::is_at_safepoint()) { > hs.do_thread(state->get_thread()); > } else { > Handshake::execute(&hs, state->get_thread()); > } > (you could pass "EnterInterpOnlyModeClosure" directly to the > HandshakeClosure() constructor) Maybe this could be used also in the Handshake::execute() methods as a general solution? > I don't know JVMTI code so I'm not sure if VM_EnterInterpOnlyMode is > always called in a nested operation or just sometimes. At least one execution path without vm operation exists: JvmtiEventControllerPrivate::enter_interp_only_mode(JvmtiThreadState *) : void JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState *) : jlong JvmtiEventControllerPrivate::recompute_enabled() : void JvmtiEventControllerPrivate::change_field_watch(jvmtiEvent, bool) : void (2 matches) JvmtiEventController::change_field_watch(jvmtiEvent, bool) : void JvmtiEnv::SetFieldAccessWatch(fieldDescriptor *) : jvmtiError jvmti_SetFieldAccessWatch(jvmtiEnv *, jclass, jfieldID) : jvmtiError I tend to revert back to VM_EnterInterpOnlyMode as it wasn't my main intent to replace it with a handshake, but to avoid making the compiled methods on stack not_entrant.... unless I'm further encouraged to do it with a handshake :) Thanks again, Richard. -----Original Message----- From: Patricio Chilano Sent: Donnerstag, 13. 
Februar 2020 18:47 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant Hi Richard, I'm only commenting on the handshake changes. I see that operation VM_EnterInterpOnlyMode can be called inside operation VM_SetFramePop which also allows nested operations. Here is a comment in VM_SetFramePop definition: // Nested operation must be allowed for the VM_EnterInterpOnlyMode that is // called from the JvmtiEventControllerPrivate::recompute_thread_enabled. So if we change VM_EnterInterpOnlyMode to be a handshake, then now we could have a handshake inside a safepoint operation. The issue I see there is that at the end of the handshake the polling page of the target thread could be disarmed. So if the target thread happens to be in a blocked state just transiently and wakes up then it will not stop for the ongoing safepoint. Maybe I can file an RFE to assert that the polling page is armed at the beginning of disarm_safepoint(). I think one option could be to remove SafepointMechanism::disarm_if_needed() in HandshakeState::clear_handshake() and let each JavaThread disarm itself for the handshake case. Alternatively I think you could do something similar to what we do in Deoptimization::deoptimize_all_marked(): EnterInterpOnlyModeClosure hs; if (SafepointSynchronize::is_at_safepoint()) { hs.do_thread(state->get_thread()); } else { Handshake::execute(&hs, state->get_thread()); } (you could pass "EnterInterpOnlyModeClosure" directly to the HandshakeClosure() constructor) I don't know JVMTI code so I'm not sure if VM_EnterInterpOnlyMode is always called in a nested operation or just sometimes. 
Thanks, Patricio On 2/12/20 7:23 AM, Reingruber, Richard wrote: > // Repost including hotspot runtime and gc lists. > // Dean Long suggested to do so, because the enhancement replaces a vm operation > // with a handshake. > // Original thread: http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-February/030359.html > > Hi, > > could I please get reviews for this small enhancement in hotspot's jvmti implementation: > > Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 > > The change avoids making all compiled methods on stack not_entrant when switching a java thread to > interpreter only execution for jvmti purposes. It is sufficient to deoptimize the compiled frames on stack. > > Additionally a handshake is used instead of a vm operation to walk the stack and do the deoptimizations. > > Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and release builds on all platforms. > > Thanks, Richard. > > See also my question if anyone knows a reason for making the compiled methods not_entrant: > http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html From suenaga at oss.nttdata.com Fri Feb 14 13:31:29 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Fri, 14 Feb 2020 22:31:29 +0900 Subject: Use DAX in ZGC In-Reply-To: <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> Message-ID: Hi Per, On 2020/02/14 20:52, Per Liden wrote: > Hi Yasumasa, > > On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >> Hi all, >> >> I tried to allocate heap to DAX on Linux with -XX:AllocateHeapAt, but it couldn't. >> It seems to allow when filesystem is hugetlbfs or tmpfs. >> >> According to kernel document [1], DAX is supported in ext2, ext4, and xfs. >> Also we need to mount it with "-o dax". 
>> >> I want to use ZGC on DAX, so I want to introduce a new option -XX:ZAllowHeapOnFileSystem to allow any filesystem to be used as backing storage. >> What do you think of this change? > > +  experimental(bool, ZAllowHeapOnFileSystem, false, \ > +          "Allow to use filesystem as Java heap backing storage " \ > +          "specified by -XX:AllocateHeapAt") \ > + \ > > Instead of adding a new option it would be preferable to automatically detect that it's a dax mounted filesystem. But I haven't had a chance to look into the best way of doing that. I thought so, but I guess it is difficult. PMDK also does not check it automatically. https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c#L18 In addition, we don't seem to be able to get the mount option ("-o dax") via syscall. I strace'ed `mount -o dax ...`, and I saw "-o dax" was passed to the 5th argument (const void *data). It would be handled in each filesystem, so I could not get it. Another solution, we can use /proc/mounts, but it might be complex. >   const size_t expected_block_size = is_tmpfs() ? os::vm_page_size() : os::large_page_size(); > -  if (expected_block_size != _block_size) { > +  if (!ZAllowHeapOnFileSystem && (expected_block_size != _block_size)) { >     log_error(gc)("%s filesystem has unexpected block size " SIZE_FORMAT " (expected " SIZE_FORMAT ")", >                   is_tmpfs() ? ZFILESYSTEM_TMPFS : ZFILESYSTEM_HUGETLBFS, _block_size, expected_block_size); >     return; >   } > > This part looks potentially dangerous, since we might then be working with an incorrect _block_size. I guess the block size in almost all filesystems is 4KB, even with DAX. (XFS allows variable block sizes...) https://nvdimm.wiki.kernel.org/2mib_fs_dax So I think we can limit _block_size to the OS page size (4KB). >  int ZPhysicalMemoryBacking::create_file_fd(const char* name) const { > +  if (ZAllowHeapOnFileSystem && (AllocateHeapAt == NULL)) { > +    
log_error(gc)("-XX:AllocateHeapAt is needed when ZAllowHeapOnFileSystem is specified"); > +    return -1; > +  } > + >   const char* const filesystem = ZLargePages::is_explicit() >                                    ? ZFILESYSTEM_HUGETLBFS >                                    : ZFILESYSTEM_TMPFS; > > This part looks unnecessary, no? I added ZAllowHeapOnFileSystem to use with AllocateHeapAt. So I want to warn if AllocateHeapAt == NULL. Thanks, Yasumasa > cheers, > Per > >> >> http://cr.openjdk.java.net/~ysuenaga/dax-z/ >> >> If it can be accepted, I will file it to JBS and will propose CSR. >> >> >> Thanks, >> >> Yasumasa >> >> >> [1] https://www.kernel.org/doc/Documentation/filesystems/dax.txt From per.liden at oracle.com Fri Feb 14 14:08:55 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 14 Feb 2020 15:08:55 +0100 Subject: Use DAX in ZGC In-Reply-To: References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> Message-ID: <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> Hi, On 2/14/20 2:31 PM, Yasumasa Suenaga wrote: > Hi Per, > > On 2020/02/14 20:52, Per Liden wrote: >> Hi Yasumasa, >> >> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >>> Hi all, >>> >>> I tried to allocate the heap on DAX on Linux with -XX:AllocateHeapAt, but >>> I couldn't. >>> It seems to be allowed only when the filesystem is hugetlbfs or tmpfs. >>> >>> According to the kernel documentation [1], DAX is supported in ext2, ext4, and >>> xfs. >>> Also we need to mount it with "-o dax". >>> >>> I want to use ZGC on DAX, so I want to introduce a new option >>> -XX:ZAllowHeapOnFileSystem to allow any filesystem to be used as backing >>> storage. >>> What do you think of this change? >> >> >> +  experimental(bool, ZAllowHeapOnFileSystem, false, \ >> +          "Allow to use filesystem as Java heap backing storage " \ >> +          "specified by -XX:AllocateHeapAt") \ >> + 
\ >> >> Instead of adding a new option it would be preferable to automatically >> detect that it's a dax mounted filesystem. But I haven't had a chance >> to look into the best way of doing that. > > I thought so, but I guess it is difficult. > PMDK also does not check it automatically. > > https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c#L18 > > In addition, we don't seem to be able to get the mount option ("-o dax") via > syscall. > I strace'ed `mount -o dax ...`, and I saw "-o dax" was passed to the 5th > argument (const void *data). It would be handled in each filesystem, so > I could not get it. > > Another solution, we can use /proc/mounts, but it might be complex. I was maybe hoping you could get this information through some ioctl() command on the file descriptor? > > >>   const size_t expected_block_size = is_tmpfs() ? os::vm_page_size() >> : os::large_page_size(); >> -  if (expected_block_size != _block_size) { >> +  if (!ZAllowHeapOnFileSystem && (expected_block_size != _block_size)) { >>     log_error(gc)("%s filesystem has unexpected block size " >> SIZE_FORMAT " (expected " SIZE_FORMAT ")", >>                   is_tmpfs() ? ZFILESYSTEM_TMPFS : >> ZFILESYSTEM_HUGETLBFS, _block_size, expected_block_size); >>     return; >>   } >> >> This part looks potentially dangerous, since we might then be working >> with an incorrect _block_size. > > I guess the block size in almost all filesystems is 4KB, even with DAX. > (XFS allows variable block sizes...) With your current patch, a user could use -XX:AllocateHeapAt to point to any kind of file system, which (at least in theory) could have any block size. For things to work down the road we must ensure that ZGranuleSize is a multiple of _block_size. 
> > https://nvdimm.wiki.kernel.org/2mib_fs_dax > > So I think we can limit _block_size to the OS page size (4KB). > > >>  int ZPhysicalMemoryBacking::create_file_fd(const char* name) const { >> +  if (ZAllowHeapOnFileSystem && (AllocateHeapAt == NULL)) { >> +    log_error(gc)("-XX:AllocateHeapAt is needed when >> ZAllowHeapOnFileSystem is specified"); >> +    return -1; >> +  } >> + >>   const char* const filesystem = ZLargePages::is_explicit() >>                                    ? ZFILESYSTEM_HUGETLBFS >>                                    : ZFILESYSTEM_TMPFS; >> >> This part looks unnecessary, no? > > I added ZAllowHeapOnFileSystem to use with AllocateHeapAt. > So I want to warn if AllocateHeapAt == NULL. Yes, but that seems unnecessary, and I suggest it's removed. cheers, /Per > > > Thanks, > > Yasumasa > > >> cheers, >> Per >> >>> >>> http://cr.openjdk.java.net/~ysuenaga/dax-z/ >>> >>> If it can be accepted, I will file it to JBS and will propose CSR. 
>>> >>> Thanks, >>> Yasumasa >>> [1] https://www.kernel.org/doc/Documentation/filesystems/dax.txt 
From suenaga at oss.nttdata.com Fri Feb 14 14:23:04 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Fri, 14 Feb 2020 23:23:04 +0900 Subject: Use DAX in ZGC In-Reply-To: <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> Message-ID: <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> On 2020/02/14 23:08, Per Liden wrote: > Hi, ... >>>> What do you think of this change? >>> >>> >>> +  experimental(bool, ZAllowHeapOnFileSystem, false, \ >>> +          "Allow to use filesystem as Java heap backing storage " \ >>> +          "specified by -XX:AllocateHeapAt") \ >>> + \ >>> Instead of adding a new option it would be preferable to automatically detect that it's a dax mounted filesystem. But I haven't had a chance to look into the best way of doing that. >> >> I thought so, but I guess it is difficult. >> PMDK also does not check it automatically. 
>> >> https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c#L18 >> In addition, we don't seem to be able to get the mount option ("-o dax") via syscall. >> I strace'ed `mount -o dax ...`, and I saw "-o dax" was passed to the 5th argument (const void *data). It would be handled in each filesystem, so I could not get it. >> >> Another solution, we can use /proc/mounts, but it might be complex. > > I was maybe hoping you could get this information through some ioctl() command on the file descriptor? I tried the FS_IOC_FSGETXATTR ioctl (FS_XFLAG_DAX might be set in fsx_xflags), but I couldn't get it. (I use ext4 with "-o dax") >>>   const size_t expected_block_size = is_tmpfs() ? os::vm_page_size() : os::large_page_size(); >>> -  if (expected_block_size != _block_size) { >>> +  if (!ZAllowHeapOnFileSystem && (expected_block_size != _block_size)) { >>>     log_error(gc)("%s filesystem has unexpected block size " SIZE_FORMAT " (expected " SIZE_FORMAT ")", >>>                   is_tmpfs() ? ZFILESYSTEM_TMPFS : ZFILESYSTEM_HUGETLBFS, _block_size, expected_block_size); >>>     return; >>>   } >>> >>> This part looks potentially dangerous, since we might then be working with an incorrect _block_size. >> >> I guess the block size in almost all filesystems is 4KB, even with DAX. >> (XFS allows variable block sizes...) > > With your current patch, a user could use -XX:AllocateHeapAt to point to any kind of file system, which (at least in theory) could have any block size. For things to work down the road we must ensure that ZGranuleSize is a multiple of _block_size. Ok. 
>> >> https://nvdimm.wiki.kernel.org/2mib_fs_dax >> So I think we can limit _block_size to the OS page size (4KB). 
>> >> >>>  int ZPhysicalMemoryBacking::create_file_fd(const char* name) const { >>> +  if (ZAllowHeapOnFileSystem && (AllocateHeapAt == NULL)) { >>> +    log_error(gc)("-XX:AllocateHeapAt is needed when ZAllowHeapOnFileSystem is specified"); >>> +    return -1; >>> +  } >>> + >>>   const char* const filesystem = ZLargePages::is_explicit() >>>                                    ? ZFILESYSTEM_HUGETLBFS >>>                                    : ZFILESYSTEM_TMPFS; >>> >>> This part looks unnecessary, no? >> I added ZAllowHeapOnFileSystem to use with AllocateHeapAt. >> So I want to warn if AllocateHeapAt == NULL. > Yes, but that seems unnecessary, and I suggest it's removed. Ok. BTW, is it worth filing this in JBS? Cheers, Yasumasa > cheers, > /Per ... From patricio.chilano.mateo at oracle.com Fri Feb 14 14:53:52 2020 From: patricio.chilano.mateo at oracle.com (Patricio Chilano) Date: Fri, 14 Feb 2020 11:53:52 -0300 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: References: <3c59b9f9-ec38-18c9-8f24-e1186a08a04a@oracle.com> Message-ID: <410eed04-e2ef-0f4f-1c56-19e6734a10f6@oracle.com> Hi Richard, On 2/14/20 9:58 AM, Reingruber, Richard wrote: > Hi Patricio, > thanks for having a look. > > I'm only commenting on the handshake changes. > > I see that operation VM_EnterInterpOnlyMode can be called inside > > operation VM_SetFramePop which also allows nested operations. 
Here is a > > comment in VM_SetFramePop definition: > > > > // Nested operation must be allowed for the VM_EnterInterpOnlyMode that is > > // called from the JvmtiEventControllerPrivate::recompute_thread_enabled. > > > > So if we change VM_EnterInterpOnlyMode to be a handshake, then now we > > could have a handshake inside a safepoint operation. The issue I see > > there is that at the end of the handshake the polling page of the target > > thread could be disarmed. So if the target thread happens to be in a > > blocked state just transiently and wakes up then it will not stop for > > the ongoing safepoint. Maybe I can file an RFE to assert that the > > polling page is armed at the beginning of disarm_safepoint(). > > I'm really glad you noticed the problematic nesting. This seems to be a general issue: currently a > handshake cannot be nested in a vm operation. Maybe it should be asserted in the > Handshake::execute() methods that they are not called by the vm thread evaluating a vm operation? > > > Alternatively I think you could do something similar to what we do in > > Deoptimization::deoptimize_all_marked(): > > > > EnterInterpOnlyModeClosure hs; > > if (SafepointSynchronize::is_at_safepoint()) { > > hs.do_thread(state->get_thread()); > > } else { > > Handshake::execute(&hs, state->get_thread()); > > } > > (you could pass 'EnterInterpOnlyModeClosure' directly to the > > HandshakeClosure() constructor) > > Maybe this could be used also in the Handshake::execute() methods as a general solution? Right, we could also do that. Avoiding clearing the polling page in HandshakeState::clear_handshake() should be enough to fix this issue and execute a handshake inside a safepoint, but adding that "if" statement in Handshake::execute() sounds good to avoid all the extra code that we go through when executing a handshake. I filed 8239084 to make that change.
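As a generic illustration of the dispatch pattern being discussed (run the operation directly when already inside a stop-the-world pause, otherwise go through the handshake mechanism), here is a sketch in plain C++. All types and globals below are stand-ins for HotSpot's real SafepointSynchronize/Handshake machinery, not the actual implementation:

```cpp
#include <functional>

struct Thread { int id; };

// Stand-in for SafepointSynchronize::is_at_safepoint().
bool g_at_safepoint = false;

// Counters so the two paths can be observed.
int g_direct_runs = 0;
int g_handshake_runs = 0;

// Stand-in for Handshake::execute(): a real handshake would arm the
// target thread's polling page and coordinate with it to run the closure.
void handshake_execute(const std::function<void(Thread*)>& op, Thread* target) {
  ++g_handshake_runs;
  op(target);
}

// The pattern from Deoptimization::deoptimize_all_marked(): when the VM
// is already at a safepoint, run the closure in place instead of
// starting a nested handshake.
void execute_or_handshake(const std::function<void(Thread*)>& op, Thread* target) {
  if (g_at_safepoint) {
    ++g_direct_runs;
    op(target);
  } else {
    handshake_execute(op, target);
  }
}
```

Folding this if-else into Handshake::execute() itself, as proposed for 8239084, means every caller gets the safepoint-safe behavior without repeating the check.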
> > I don't know JVMTI code so I'm not sure if VM_EnterInterpOnlyMode is > > always called in a nested operation or just sometimes. > > At least one execution path without vm operation exists: > > JvmtiEventControllerPrivate::enter_interp_only_mode(JvmtiThreadState *) : void > JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState *) : jlong > JvmtiEventControllerPrivate::recompute_enabled() : void > JvmtiEventControllerPrivate::change_field_watch(jvmtiEvent, bool) : void (2 matches) > JvmtiEventController::change_field_watch(jvmtiEvent, bool) : void > JvmtiEnv::SetFieldAccessWatch(fieldDescriptor *) : jvmtiError > jvmti_SetFieldAccessWatch(jvmtiEnv *, jclass, jfieldID) : jvmtiError > > I tend to revert back to VM_EnterInterpOnlyMode as it wasn't my main intent to replace it with a > handshake, but to avoid making the compiled methods on stack not_entrant.... unless I'm further > encouraged to do it with a handshake :) Ah! I think you can still do it with a handshake with the Deoptimization::deoptimize_all_marked() like solution. I can change the if-else statement with just the Handshake::execute() call in 8239084. But up to you. :) Thanks, Patricio > Thanks again, > Richard. > > -----Original Message----- > From: Patricio Chilano > Sent: Donnerstag, 13. Februar 2020 18:47 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > I'm only commenting on the handshake changes. > I see that operation VM_EnterInterpOnlyMode can be called inside > operation VM_SetFramePop which also allows nested operations.
Here is a > comment in VM_SetFramePop definition: > > // Nested operation must be allowed for the VM_EnterInterpOnlyMode that is > // called from the JvmtiEventControllerPrivate::recompute_thread_enabled. > > So if we change VM_EnterInterpOnlyMode to be a handshake, then now we > could have a handshake inside a safepoint operation. The issue I see > there is that at the end of the handshake the polling page of the target > thread could be disarmed. So if the target thread happens to be in a > blocked state just transiently and wakes up then it will not stop for > the ongoing safepoint. Maybe I can file an RFE to assert that the > polling page is armed at the beginning of disarm_safepoint(). > > I think one option could be to remove > SafepointMechanism::disarm_if_needed() in > HandshakeState::clear_handshake() and let each JavaThread disarm itself > for the handshake case. > > Alternatively I think you could do something similar to what we do in > Deoptimization::deoptimize_all_marked(): > >   EnterInterpOnlyModeClosure hs; >   if (SafepointSynchronize::is_at_safepoint()) { >     hs.do_thread(state->get_thread()); >   } else { >     Handshake::execute(&hs, state->get_thread()); >   } > (you could pass 'EnterInterpOnlyModeClosure' directly to the > HandshakeClosure() constructor) > > I don't know JVMTI code so I'm not sure if VM_EnterInterpOnlyMode is > always called in a nested operation or just sometimes. > > Thanks, > Patricio > > On 2/12/20 7:23 AM, Reingruber, Richard wrote: >> // Repost including hotspot runtime and gc lists. >> // Dean Long suggested to do so, because the enhancement replaces a vm operation >> // with a handshake.
>> // Original thread: http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-February/030359.html >> >> Hi, >> >> could I please get reviews for this small enhancement in hotspot's jvmti implementation: >> >> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >> >> The change avoids making all compiled methods on stack not_entrant when switching a java thread to >> interpreter only execution for jvmti purposes. It is sufficient to deoptimize the compiled frames on stack. >> >> Additionally a handshake is used instead of a vm operation to walk the stack and do the deoptimizations. >> >> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and release builds on all platforms. >> >> Thanks, Richard. >> >> See also my question if anyone knows a reason for making the compiled methods not_entrant: >> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html From thomas.schatzl at oracle.com Fri Feb 14 15:05:22 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 14 Feb 2020 16:05:22 +0100 Subject: RFR (S): 8238999: Remove MemRegion custom new/delete operator overloads Message-ID: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> Hi all, can I have reviews for this small change to the MemRegion class to remove unnecessary new/delete overloads from MemRegion. They return NULL if there is not enough memory. This is uncommon to do in Hotspot code. All uses in the code either checks whether the allocation is non-NULL and then terminates the VM, or will just crash too. It is easier to just replace the new[] calls with NEW_C_HEAP_ARRAY allocations and do the initialization manually. cc'ing runtime because Coleen added the new operator for working around a Metaspace issue in JDK-8021954 years ago. 
CR: https://bugs.openjdk.java.net/browse/JDK-8238999 Webrev: http://cr.openjdk.java.net/~tschatzl/8238999/webrev/ Testing: hs-tier1-4 Thanks, Thomas From thomas.schatzl at oracle.com Fri Feb 14 15:09:20 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 14 Feb 2020 16:09:20 +0100 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions Message-ID: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> Hi all, can I have reviews for this change that plugs a (tiny) memory leak when we unsuccessfully map CDS archives into the Java heap? The FileMapInfo::map_heap_data() method allocates some array of MemRegions, and in case we fail to map the archive, we return from that method without assigning it to something or deallocating that memory. Found while working on JDK-8238999, also out for review, and depending on it. CR: https://bugs.openjdk.java.net/browse/JDK-8239070 Webrev: http://cr.openjdk.java.net/~tschatzl/8239070/webrev/ Testing: hs-tier1-4 Thanks, Thomas From ioi.lam at oracle.com Fri Feb 14 16:08:45 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Fri, 14 Feb 2020 08:08:45 -0800 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions In-Reply-To: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> References: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> Message-ID: <605ad3e6-2d09-70ac-1aad-705b2b26ee6a@oracle.com> Hi Thomas, Thanks for fixing this issue. Freeing the array at each exit point seems error prone. How about: refactoring the function to a FileMapInfo::map_heap_data_impl function, allocate inside FileMapInfo::map_heap_data(), call map_heap_data_impl() and if it returns false, free the array in a single place. Thanks - Ioi On 2/14/20 7:09 AM, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that plugs a (tiny) memory leak > when we unsuccessfully map CDS archives into the Java heap?
> > The FileMapInfo::map_heap_data() method allocates some array of > MemRegions, and in case we fail to map the archive, we return from that > method without assigning it to something or deallocating that memory. > > Found while working on JDK-8238999, also out for review, and depending > on it. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8239070 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8239070/webrev/ > Testing: > hs-tier1-4 > > Thanks, > Thomas From ioi.lam at oracle.com Fri Feb 14 16:12:37 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Fri, 14 Feb 2020 08:12:37 -0800 Subject: RFR (S): 8238999: Remove MemRegion custom new/delete operator overloads In-Reply-To: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> References: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> Message-ID: <702ed73a-216c-a9b2-f19c-5f75f3d408c1@oracle.com> Hi Thomas, Maybe we can fold this into a MemRegion::create(int size) function? 1750   MemRegion* regions = NEW_C_HEAP_ARRAY(MemRegion, max, mtInternal); 1751   for (int i = 0; i < max; i++) { 1752     ::new (&regions[i]) MemRegion(); 1753   } Thanks - Ioi On 2/14/20 7:05 AM, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this small change to the MemRegion class to > remove unnecessary new/delete overloads from MemRegion. > > They return NULL if there is not enough memory. This is uncommon to do > in Hotspot code. > > All uses in the code either checks whether the allocation is non-NULL > and then terminates the VM, or will just crash too. > > It is easier to just replace the new[] calls with NEW_C_HEAP_ARRAY > allocations and do the initialization manually. > > cc'ing runtime because Coleen added the new operator for working > around a Metaspace issue in JDK-8021954 years ago. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8238999 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8238999/webrev/ > Testing: > hs-tier1-4 > > Thanks, >
Thomas From thomas.schatzl at oracle.com Fri Feb 14 17:06:10 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 14 Feb 2020 18:06:10 +0100 Subject: RFR (S): 8238999: Remove MemRegion custom new/delete operator overloads In-Reply-To: <702ed73a-216c-a9b2-f19c-5f75f3d408c1@oracle.com> References: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> <702ed73a-216c-a9b2-f19c-5f75f3d408c1@oracle.com> Message-ID: <4f585bb8-2d17-8eb1-2db0-6fff177389e6@oracle.com> Hi, On 14.02.20 17:12, Ioi Lam wrote: > Hi Thomas, > > Maybe we can fold this into a MemRegion::create(int size) function? > > 1750   MemRegion* regions = NEW_C_HEAP_ARRAY(MemRegion, max, mtInternal); > 1751   for (int i = 0; i < max; i++) { > 1752     ::new (&regions[i]) MemRegion(); > 1753   } > http://cr.openjdk.java.net/~tschatzl/8238999/webrev.0_to_1 http://cr.openjdk.java.net/~tschatzl/8238999/webrev.1 Thanks, Thomas :) From per.liden at oracle.com Fri Feb 14 17:08:59 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 14 Feb 2020 18:08:59 +0100 Subject: Use DAX in ZGC In-Reply-To: <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> Message-ID: Hi, On 2/14/20 3:23 PM, Yasumasa Suenaga wrote: > On 2020/02/14 23:08, Per Liden wrote: >> Hi, >> >> On 2/14/20 2:31 PM, Yasumasa Suenaga wrote: >>> Hi Per, >>> >>> On 2020/02/14 20:52, Per Liden wrote: >>>> Hi Yasumasa, >>>> >>>> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >>>>> Hi all, >>>>> >>>>> I tried to allocate heap to DAX on Linux with -XX:AllocateHeapAt, >>>>> but it couldn't. >>>>> It seems to allow when filesystem is hugetlbfs or tmpfs. >>>>> >>>>> According to kernel document [1], DAX is supported in ext2, ext4, >>>>> and xfs. >>>>> Also we need to mount it with "-o dax".
>>>>> >>>>> I want to use ZGC on DAX, so I want to introduce new option >>>>> -XX:ZAllowHeapOnFileSystem to allow to use all filesystem as >>>>> backing storage. >>>>> What do you think about this change? >>>> >>>> +  experimental(bool, ZAllowHeapOnFileSystem, false,   \ >>>> +          "Allow to use filesystem as Java heap backing storage "   \ >>>> +          "specified by -XX:AllocateHeapAt")   \ >>>> +   \ >>>> Instead of adding a new option it would be preferable to automatically detect that it's a dax mounted filesystem. But I haven't had a chance to look into the best way of doing that. >>> I thought so, but I guess it is difficult. >>> PMDK also does not check it automatically. >>> https://urldefense.com/v3/__https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c*L18__;Iw!!GqivPVa7Brio!PlQs19bQVBJF7PDA9RLZ9JLbXOQ2KYocNW6DJH-eOUqXZcYwl-cSvSjpfC316y0$ >>> In addition, we don't seem to be able to get the mount option ("-o dax") via syscall. >>> I strace'ed `mount -o dax ...`, I saw "-o dax" was passed to the 5th argument (const void *data). It would be handled in each filesystem, so I could not get it. >>> >>> Another solution, we can use /proc/mounts, but it might be complex. >> >> I was maybe hoping you could get this information through some ioctl() command on the file descriptor? > I tried the FS_IOC_FSGETXATTR ioctl (FS_XFLAG_DAX might be set in > fsx_xflags), but I couldn't get it. > (I use ext4 with "-o dax") Ok. It would be good to get to the bottom of why it's not set. cheers, Per From jianglizhou at google.com Fri Feb 14 17:15:23 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Fri, 14 Feb 2020 09:15:23 -0800 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions In-Reply-To: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> References: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> Message-ID: Hi Thomas, Thanks for finding the memory leak.
The leak fix probably should be applied to JDK 11 as well (as a modified backport). I'll try to request it. Ioi's suggestion of refactoring region mapping code into a FileMapInfo::map_heap_data_impl sounds okay to me. Best regards, Jiangli On Fri, Feb 14, 2020 at 7:09 AM Thomas Schatzl wrote: > > Hi all, > > can I have reviews for this change that plugs a (tiny) memory leak > when we unsuccessfully map CDS archives into the Java heap? > > The FileMapInfo::map_heap_data() method allocates some array of > MemRegions, and in case we fail to map the archive, we return from that > method without assigning it to something or deallocating that memory. > > Found while working on JDK-8238999, also out for review, and depending > on it. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8239070 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8239070/webrev/ > Testing: > hs-tier1-4 > > Thanks, > Thomas From ioi.lam at oracle.com Fri Feb 14 18:46:31 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Fri, 14 Feb 2020 10:46:31 -0800 Subject: RFR (S): 8238999: Remove MemRegion custom new/delete operator overloads In-Reply-To: <4f585bb8-2d17-8eb1-2db0-6fff177389e6@oracle.com> References: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> <702ed73a-216c-a9b2-f19c-5f75f3d408c1@oracle.com> <4f585bb8-2d17-8eb1-2db0-6fff177389e6@oracle.com> Message-ID: <25db6783-0a9c-b544-34ee-59d40f3e7f6c@oracle.com> Looks good to me. Thanks - Ioi On 2/14/20 9:06 AM, Thomas Schatzl wrote: > Hi, > > On 14.02.20 17:12, Ioi Lam wrote: >> Hi Thomas, >> >> Maybe we can fold this into a MemRegion::create(int size) function? >> >> 1750   MemRegion* regions = NEW_C_HEAP_ARRAY(MemRegion, max, >> mtInternal); >> 1751   for (int i = 0; i < max; i++) { >> 1752     ::new (&regions[i]) MemRegion(); >> 1753   } >> > > http://cr.openjdk.java.net/~tschatzl/8238999/webrev.0_to_1 > http://cr.openjdk.java.net/~tschatzl/8238999/webrev.1 > > Thanks, >
Thomas :) From richard.reingruber at sap.com Fri Feb 14 18:47:20 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 14 Feb 2020 18:47:20 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: <410eed04-e2ef-0f4f-1c56-19e6734a10f6@oracle.com> References: <3c59b9f9-ec38-18c9-8f24-e1186a08a04a@oracle.com> <410eed04-e2ef-0f4f-1c56-19e6734a10f6@oracle.com> Message-ID: Hi Patricio, > > I'm really glad you noticed the problematic nesting. This seems to be a general issue: currently a > > handshake cannot be nested in a vm operation. Maybe it should be asserted in the > > Handshake::execute() methods that they are not called by the vm thread evaluating a vm operation? > > > > > Alternatively I think you could do something similar to what we do in > > > Deoptimization::deoptimize_all_marked(): > > > > > > EnterInterpOnlyModeClosure hs; > > > if (SafepointSynchronize::is_at_safepoint()) { > > > hs.do_thread(state->get_thread()); > > > } else { > > > Handshake::execute(&hs, state->get_thread()); > > > } > > > (you could pass 'EnterInterpOnlyModeClosure' directly to the > > > HandshakeClosure() constructor) > > > > Maybe this could be used also in the Handshake::execute() methods as a general solution? > Right, we could also do that. Avoiding clearing the polling page in > HandshakeState::clear_handshake() should be enough to fix this issue and > execute a handshake inside a safepoint, but adding that "if" statement > in Handshake::execute() sounds good to avoid all the extra code that we > go through when executing a handshake. I filed 8239084 to make that change. Thanks for taking care of this and creating the RFE. > > > > I don't know JVMTI code so I'm not sure if VM_EnterInterpOnlyMode is > > > always called in a nested operation or just sometimes.
> > > > At least one execution path without vm operation exists: > > > > JvmtiEventControllerPrivate::enter_interp_only_mode(JvmtiThreadState *) : void > > JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState *) : jlong > > JvmtiEventControllerPrivate::recompute_enabled() : void > > JvmtiEventControllerPrivate::change_field_watch(jvmtiEvent, bool) : void (2 matches) > > JvmtiEventController::change_field_watch(jvmtiEvent, bool) : void > > JvmtiEnv::SetFieldAccessWatch(fieldDescriptor *) : jvmtiError > > jvmti_SetFieldAccessWatch(jvmtiEnv *, jclass, jfieldID) : jvmtiError > > > > I tend to revert back to VM_EnterInterpOnlyMode as it wasn't my main intent to replace it with a > > handshake, but to avoid making the compiled methods on stack not_entrant.... unless I'm further > > encouraged to do it with a handshake :) > Ah! I think you can still do it with a handshake with the > Deoptimization::deoptimize_all_marked() like solution. I can change the > if-else statement with just the Handshake::execute() call in 8239084. > But up to you. : ) Well, I think that's enough encouragement :) I'll wait for 8239084 and try then again. (no urgency and all) Thanks, Richard. -----Original Message----- From: Patricio Chilano Sent: Freitag, 14. Februar 2020 15:54 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant Hi Richard, On 2/14/20 9:58 AM, Reingruber, Richard wrote: > Hi Patricio, > > thanks for having a look. > > > I?m only commenting on the handshake changes. > > I see that operation VM_EnterInterpOnlyMode can be called inside > > operation VM_SetFramePop which also allows nested operations. 
Here is a > > comment in VM_SetFramePop definition: > > > > // Nested operation must be allowed for the VM_EnterInterpOnlyMode that is > > // called from the JvmtiEventControllerPrivate::recompute_thread_enabled. > > > > So if we change VM_EnterInterpOnlyMode to be a handshake, then now we > > could have a handshake inside a safepoint operation. The issue I see > > there is that at the end of the handshake the polling page of the target > > thread could be disarmed. So if the target thread happens to be in a > > blocked state just transiently and wakes up then it will not stop for > > the ongoing safepoint. Maybe I can file an RFE to assert that the > > polling page is armed at the beginning of disarm_safepoint(). > > I'm really glad you noticed the problematic nesting. This seems to be a general issue: currently a > handshake cannot be nested in a vm operation. Maybe it should be asserted in the > Handshake::execute() methods that they are not called by the vm thread evaluating a vm operation? > > > Alternatively I think you could do something similar to what we do in > > Deoptimization::deoptimize_all_marked(): > > > > EnterInterpOnlyModeClosure hs; > > if (SafepointSynchronize::is_at_safepoint()) { > > hs.do_thread(state->get_thread()); > > } else { > > Handshake::execute(&hs, state->get_thread()); > > } > > (you could pass 'EnterInterpOnlyModeClosure' directly to the > > HandshakeClosure() constructor) > > Maybe this could be used also in the Handshake::execute() methods as a general solution? Right, we could also do that. Avoiding clearing the polling page in HandshakeState::clear_handshake() should be enough to fix this issue and execute a handshake inside a safepoint, but adding that "if" statement in Handshake::execute() sounds good to avoid all the extra code that we go through when executing a handshake. I filed 8239084 to make that change.
> > I don't know JVMTI code so I'm not sure if VM_EnterInterpOnlyMode is > > always called in a nested operation or just sometimes. > > At least one execution path without vm operation exists: > > JvmtiEventControllerPrivate::enter_interp_only_mode(JvmtiThreadState *) : void > JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState *) : jlong > JvmtiEventControllerPrivate::recompute_enabled() : void > JvmtiEventControllerPrivate::change_field_watch(jvmtiEvent, bool) : void (2 matches) > JvmtiEventController::change_field_watch(jvmtiEvent, bool) : void > JvmtiEnv::SetFieldAccessWatch(fieldDescriptor *) : jvmtiError > jvmti_SetFieldAccessWatch(jvmtiEnv *, jclass, jfieldID) : jvmtiError > > I tend to revert back to VM_EnterInterpOnlyMode as it wasn't my main intent to replace it with a > handshake, but to avoid making the compiled methods on stack not_entrant.... unless I'm further > encouraged to do it with a handshake :) Ah! I think you can still do it with a handshake with the Deoptimization::deoptimize_all_marked() like solution. I can change the if-else statement with just the Handshake::execute() call in 8239084. But up to you. :) Thanks, Patricio > Thanks again, > Richard. > > -----Original Message----- > From: Patricio Chilano > Sent: Donnerstag, 13. Februar 2020 18:47 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > I'm only commenting on the handshake changes. > I see that operation VM_EnterInterpOnlyMode can be called inside > operation VM_SetFramePop which also allows nested operations.
Here is a > comment in VM_SetFramePop definition: > > // Nested operation must be allowed for the VM_EnterInterpOnlyMode that is > // called from the JvmtiEventControllerPrivate::recompute_thread_enabled. > > So if we change VM_EnterInterpOnlyMode to be a handshake, then now we > could have a handshake inside a safepoint operation. The issue I see > there is that at the end of the handshake the polling page of the target > thread could be disarmed. So if the target thread happens to be in a > blocked state just transiently and wakes up then it will not stop for > the ongoing safepoint. Maybe I can file an RFE to assert that the > polling page is armed at the beginning of disarm_safepoint(). > > I think one option could be to remove > SafepointMechanism::disarm_if_needed() in > HandshakeState::clear_handshake() and let each JavaThread disarm itself > for the handshake case. > > Alternatively I think you could do something similar to what we do in > Deoptimization::deoptimize_all_marked(): > >   EnterInterpOnlyModeClosure hs; >   if (SafepointSynchronize::is_at_safepoint()) { >     hs.do_thread(state->get_thread()); >   } else { >     Handshake::execute(&hs, state->get_thread()); >   } > (you could pass 'EnterInterpOnlyModeClosure' directly to the > HandshakeClosure() constructor) > > I don't know JVMTI code so I'm not sure if VM_EnterInterpOnlyMode is > always called in a nested operation or just sometimes. > > Thanks, > Patricio > > On 2/12/20 7:23 AM, Reingruber, Richard wrote: >> // Repost including hotspot runtime and gc lists. >> // Dean Long suggested to do so, because the enhancement replaces a vm operation >> // with a handshake.
>> // Original thread: http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-February/030359.html >> >> Hi, >> >> could I please get reviews for this small enhancement in hotspot's jvmti implementation: >> >> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >> >> The change avoids making all compiled methods on stack not_entrant when switching a java thread to >> interpreter only execution for jvmti purposes. It is sufficient to deoptimize the compiled frames on stack. >> >> Additionally a handshake is used instead of a vm operation to walk the stack and do the deoptimizations. >> >> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and release builds on all platforms. >> >> Thanks, Richard. >> >> See also my question if anyone knows a reason for making the compiled methods not_entrant: >> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html From rkennke at redhat.com Fri Feb 14 19:00:41 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 14 Feb 2020 20:00:41 +0100 Subject: RFR: JDK-8239081: Shenandoah: Consolidate C1 LRB and native barriers Message-ID: This is a fall-out from the recent Lucene debugging session. Currently, when emitting IN_NATIVE LRB in C1, we generate a simple runtime call directly in LIR. It'd arguably be more straightforward and maintainable to simply re-use what we do for regular LRB, with the only exception to call into a different runtime endpoint from the stub. It might also be more efficient because it checks heap-stable before calling into runtime. If we ever have to backport C1 IN_NATIVE barriers (JDK-8226695) to 11u (although, we should not strictly need native barriers there right now), it also means we can skip backporting JDK-8226822, that would no longer be needed. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8239081 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.00/ Testing: hotspot_gc_shenandoah (x86_64, x86_32 and aarch64) Can I please get a review? Thanks, Roman From rkennke at redhat.com Fri Feb 14 19:29:34 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 14 Feb 2020 20:29:34 +0100 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification Message-ID: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> This is another fallout from the Lucene debugging sessions :-) Our nmethod verification has a number of problems: - the assert(oops->length() == oop_count(), "Must match") is too strict. Weirdly, while we are registering an nmethod in one thread (under CodeCache_lock), another thread can already patch the same nmethod (under Patching_lock), which can throw off the counts. - We need to skip Universe::non_oop_word() because that is what the standard oop iterator would do too. It's fixed by: 1. counting actual oops, skipping Universe::non_oop_word() instead of comparing with oop_count() 2. relaxing the assert from == to >= I've also added a sanity check: + assert(nm == data->nm(), "must be same nmethod"); I've also left in some debug-output but under #ifdef ASSERT_DISABLED. I found that very useful and wouldn't want to throw it away. All of this has proven to be useful (if only to exclude the possibility that we mess up something in handling nmethods). Bug: https://bugs.openjdk.java.net/browse/JDK-8237780 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.00/ Testing: provided testcase passes now (failed before). hotspot_gc_shenandoah is fine Can I please get a review?
Thanks, Roman From thomas.schatzl at oracle.com Fri Feb 14 18:49:05 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 14 Feb 2020 19:49:05 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com> References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com> Message-ID: <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> Hi, On 12.02.20 12:16, Thomas Schatzl wrote: > Hi Liang, > > On 12.02.20 11:17, Liang Mao wrote: >> Hi Thomas, >> >> I made a new patch for the issues we listed in JDK-8238686 and >> JDK-8236073: >> http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ > > thanks. I only had time to quickly browse the change, and started > building and testing it internally. I will run it through our perf > benchmarks to look for regressions of out-of-box behavior. > > I will need a day or two until I can get back to looking at the change > in detail. There is currently something else I need to look at. Sorry. Initial results from testing: - gc/g1/TestPeriodicCollection.java fails consistently because the heap does not shrink as expected (but probably this is a test bug as it may expect that uncommit occurs at remark). - memory usage tends to be significantly higher with the change without improving scores. E.g. I have been running specjvm2008 out-of-box with no settings on different machine(s) (32gb ram min), and the build with the changes almost consistently uses more heap (i.e. committed size) than without, in the range of 10% without any performance increase. Specjvm2008 benchmarks are pretty simple applications in terms of behavior, i.e. they do the same things all the time. This also means that very likely the current sizing is already way beyond the point of diminishing returns (actually, this is a known issue :)); I would prefer if we did not add to that.
;) Unfortunately I lost the graphs I had generated (manually), and I do not have more time available right now so I can't show you right now. I started some dacapo 2009 runs (running them for 30 iterations each). Did not have time to look at the changes themselves any further or investigate the reasons for this memory usage increase beyond what I already did earlier; will continue on Tuesday as I'm taking the day off Monday. Thanks, Thomas From shade at redhat.com Fri Feb 14 21:18:11 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 14 Feb 2020 22:18:11 +0100 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification In-Reply-To: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> References: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> Message-ID: <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> On 2/14/20 8:29 PM, Roman Kennke wrote: > I've also left in some debug-output but under #ifdef ASSERT_DISABLED. I > found that very useful and wouldn't want to throw it away. I think the proper way to do this is: #if 0 // Helpful for debugging ...but then I wonder, why not turn it into the actual fastdebug diagnostics? Our verifier/asserts very helpfully include a lot of debugging info into hs_err when asserts fail. Surely if we are chasing a very rare bug, it would be more convenient for hs_err to include that right away, not require us to recompile the VM. > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.00/ *) Looks like you can just initialize "int count = _oop_count" and skip increments in the first loop. *) Capitalization in "Must", to match the style of other asserts: 305 assert(nm == data->nm(), "must be same nmethod"); *) assert(false, ...) is probably just fatal(...)
-- Thanks, -Aleksey From shade at redhat.com Fri Feb 14 21:23:30 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 14 Feb 2020 22:23:30 +0100 Subject: RFR: JDK-8239081: Shenandoah: Consolidate C1 LRB and native barriers In-Reply-To: References: Message-ID: <83a0a264-f958-6921-0ed7-7859bfe9505f@redhat.com> On 2/14/20 8:00 PM, Roman Kennke wrote: > https://bugs.openjdk.java.net/browse/JDK-8239081 > Webrev: > http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.00/ Only some stylistic nits: *) I believe the convention is to name these boolean arguments "is_native"? *) C1ShenandoahLoadReferenceBarrierCodeGenClosure::_native should probably be const? -- Thanks, -Aleksey From kim.barrett at oracle.com Fri Feb 14 23:05:03 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 14 Feb 2020 18:05:03 -0500 Subject: RFR (S): 8238999: Remove MemRegion custom new/delete operator overloads In-Reply-To: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> References: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> Message-ID: <23D0C1DC-E59E-43D9-A54A-467F49385429@oracle.com> > On Feb 14, 2020, at 10:05 AM, Thomas Schatzl wrote: > > Hi all, > > can I have reviews for this small change to the MemRegion class to remove unnecessary new/delete overloads from MemRegion. > > They return NULL if there is not enough memory. This is uncommon to do in Hotspot code. > > All uses in the code either checks whether the allocation is non-NULL and then terminates the VM, or will just crash too. > > It is easier to just replace the new[] calls with NEW_C_HEAP_ARRAY allocations and do the initialization manually. > > cc'ing runtime because Coleen added the new operator for working around a Metaspace issue in JDK-8021954 years ago. 
> > CR: > https://bugs.openjdk.java.net/browse/JDK-8238999 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8238999/webrev/ > Testing: > hs-tier1-4 > > Thanks, > Thomas ------------------------------------------------------------------------------ src/hotspot/share/memory/memRegion.hpp 96 // Creates and initializes an array of MemRegions of the given length. 97 static MemRegion* create(uint length, MEMFLAGS flags); A function named "create" suggests to me creating a single object, not an array. Perhaps "make_array" or "create_array" or "new_array"? ------------------------------------------------------------------------------ Other than that, looks good. I don't need a new webrev for using any of the suggested names. I noticed the memory leak in map_heap_data, but see that you filed a separate bug for that, and already have a reviewed fix for it. From jianglizhou at google.com Fri Feb 14 23:14:17 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Fri, 14 Feb 2020 15:14:17 -0800 Subject: RFR (S): 8238999: Remove MemRegion custom new/delete operator overloads In-Reply-To: <23D0C1DC-E59E-43D9-A54A-467F49385429@oracle.com> References: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> <23D0C1DC-E59E-43D9-A54A-467F49385429@oracle.com> Message-ID: On Fri, Feb 14, 2020 at 3:05 PM Kim Barrett wrote: > > > On Feb 14, 2020, at 10:05 AM, Thomas Schatzl wrote: > > > > Hi all, > > > > can I have reviews for this small change to the MemRegion class to remove unnecessary new/delete overloads from MemRegion. > > > > They return NULL if there is not enough memory. This is uncommon to do in Hotspot code. > > > > All uses in the code either checks whether the allocation is non-NULL and then terminates the VM, or will just crash too. > > > > It is easier to just replace the new[] calls with NEW_C_HEAP_ARRAY allocations and do the initialization manually. > > > > cc'ing runtime because Coleen added the new operator for working around a Metaspace issue in JDK-8021954 years ago. 
> > > > CR: > > https://bugs.openjdk.java.net/browse/JDK-8238999 > > Webrev: > > http://cr.openjdk.java.net/~tschatzl/8238999/webrev/ > > Testing: > > hs-tier1-4 > > > > Thanks, > > Thomas > > ------------------------------------------------------------------------------ > src/hotspot/share/memory/memRegion.hpp > 96 // Creates and initializes an array of MemRegions of the given length. > 97 static MemRegion* create(uint length, MEMFLAGS flags); > > A function named "create" suggests to me creating a single object, not > an array. Perhaps "make_array" or "create_array" or "new_array"? +1. I had the same thoughts when looking at the webrev.1. Best regards, Jiangli > > ------------------------------------------------------------------------------ > > Other than that, looks good. I don't need a new webrev for using any > of the suggested names. > > I noticed the memory leak in map_heap_data, but see that you filed a > separate bug for that, and already have a reviewed fix for it. > From kim.barrett at oracle.com Sat Feb 15 08:20:38 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Sat, 15 Feb 2020 03:20:38 -0500 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions In-Reply-To: <605ad3e6-2d09-70ac-1aad-705b2b26ee6a@oracle.com> References: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> <605ad3e6-2d09-70ac-1aad-705b2b26ee6a@oracle.com> Message-ID: > On Feb 14, 2020, at 11:08 AM, Ioi Lam wrote: > > Hi Thomas, > > Thanks for fixing this issue. Freeing the array at each exit point seems error prone. How about: refactoring the function to a FileMapInfo::map_heap_data_impl function, allocate inside FileMapInfo::map_heap_data(), call map_heap_data() and if it returns false, free the array in a single place. Rather than splitting up the function, one could add a local cleanup handler: ... create and initialize regions object ... 
struct Cleanup {
  MemRegion* _regions;
  bool _aborted;

  Cleanup(MemRegion* regions) : _regions(regions), _aborted(true) {}
  ~Cleanup() {
    if (_aborted) FREE_C_HEAP_ARRAY(MemRegion, _regions);
  }
} cleanup(regions);

...

cleanup._aborted = false;
return true;
}

or use std::unique_ptr :(
From rkennke at redhat.com Sat Feb 15 12:35:10 2020 From: rkennke at redhat.com (Roman Kennke) Date: Sat, 15 Feb 2020 13:35:10 +0100 Subject: RFR: JDK-8239081: Shenandoah: Consolidate C1 LRB and native barriers In-Reply-To: <83a0a264-f958-6921-0ed7-7859bfe9505f@redhat.com> References: <83a0a264-f958-6921-0ed7-7859bfe9505f@redhat.com> Message-ID: <3b4e78f3-a82c-2801-9d35-292ac18e6907@redhat.com> >> https://bugs.openjdk.java.net/browse/JDK-8239081 >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.00/ > > Only some stylistic nits: > > *) I believe the convention is to name these boolean arguments "is_native"? > > *) C1ShenandoahLoadReferenceBarrierCodeGenClosure::_native should probably be const? Right, good points! Both fixed here: http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.01/ Good now? Thanks for reviewing! Roman From suenaga at oss.nttdata.com Mon Feb 17 04:05:45 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Mon, 17 Feb 2020 13:05:45 +0900 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> Message-ID: <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> Hi, I filed this enhancement to JBS: JBS: https://bugs.openjdk.java.net/browse/JDK-8239129 CSR: https://bugs.openjdk.java.net/browse/JDK-8239130 webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ Could you review this change and CSR? It passed tests on submit repo (mach5-one-ysuenaga-JDK-8239129-20200217-0213-8777205).
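[Editorial note: Kim's Cleanup guard and his std::unique_ptr aside above are the same RAII idea. A standalone sketch, with plain new[]/delete[] standing in for HotSpot's NEW_C_HEAP_ARRAY/FREE_C_HEAP_ARRAY and a hypothetical map_regions() in the role of map_heap_data(), might look like:]

```cpp
#include <cassert>
#include <cstddef>

struct Region { void* start; std::size_t size; };

// Guard that frees the array on every early (failed) exit and is
// disarmed once ownership is handed off. delete[] stands in for
// HotSpot's FREE_C_HEAP_ARRAY in this sketch.
struct Cleanup {
  Region* _regions;
  bool _aborted;
  explicit Cleanup(Region* regions) : _regions(regions), _aborted(true) {}
  ~Cleanup() {
    if (_aborted) delete[] _regions;
  }
};

// Hypothetical caller in the shape of map_heap_data(): any "return false"
// before the disarm line releases the array automatically.
bool map_regions(std::size_t count, bool fail_early) {
  Region* regions = new Region[count]();
  Cleanup cleanup(regions);
  if (fail_early) {
    return false;  // destructor frees regions here
  }
  // ... on success the array stays alive for a longer-lived owner
  // (deliberately not freed in this sketch) ...
  cleanup._aborted = false;
  return true;
}
```

The disarm flag is what distinguishes this from an unconditional destructor: every failure path is covered without repeating the free at each return, which was the error-proneness Ioi objected to.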
Thanks, Yasumasa On 2020/02/15 2:08, Per Liden wrote: > Hi, > > On 2/14/20 3:23 PM, Yasumasa Suenaga wrote: >> On 2020/02/14 23:08, Per Liden wrote: >>> Hi, >>> >>> On 2/14/20 2:31 PM, Yasumasa Suenaga wrote: >>>> Hi Per, >>>> >>>> On 2020/02/14 20:52, Per Liden wrote: >>>>> Hi Yasumasa, >>>>> >>>>> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >>>>>> Hi all, >>>>>> >>>>>> I tried to allocate heap to DAX on Linux with -XX:AllocateHeapAt, but it couldn't. >>>>>> It seems to allow when filesystem is hugetlbfs or tmpfs. >>>>>> >>>>>> According to kernel document [1], DAX is supported in ext2, ext4, and xfs. >>>>>> Also we need to mount it with "-o dax". >>>>>> >>>>>> I want to use ZGC on DAX, so I want to introduce new option -XX:ZAllowHeapOnFileSystem to allow to use all filesystem as backing storage. >>>>>> What do you think this change? >>>>> >>>>> >>>>> +  experimental(bool, ZAllowHeapOnFileSystem, false, \ >>>>> +          "Allow to use filesystem as Java heap backing storage " \ >>>>> +          "specified by -XX:AllocateHeapAt") \ >>>>> + \ >>>>> >>>>> Instead of adding a new option it would be preferable to automatically detect that it's a dax mounted filesystem. But I haven't has a chance to look into the best way of doing that. >>>> >>>> I thought so, but I guess it is difficult. >>>> PMDK also does not check it automatically. >>>> >>>> https://urldefense.com/v3/__https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c*L18__;Iw!!GqivPVa7Brio!PlQs19bQVBJF7PDA9RLZ9JLbXOQ2KYocNW6DJH-eOUqXZcYwl-cSvSjpfC316y0$ >>>> In addition, we don't seem to be able to get mount option ("-o dax") via syscall. >>>> I strace'ed `mount -o dax ...`, I saw "-o dax" was passed to 5th argument (const void *data). It would be handled in each filesystem, so I could not get it. >>>> >>>> Another solution, we can use /proc/mounts, but it might be complex.
>>> I was maybe hoping you could get this information through some ioctl() command on the file descriptor? >> >> I tried to FS_IOC_FSGETXATTR ioctl (FS_XFLAG_DAX might be set in fsx_xflags), but I couldn't get. >> (I use ext4 with "-o dax") > > > Ok. It would be good to get to the bottom of why it's not set. > > cheers, > Per From maoliang.ml at alibaba-inc.com Mon Feb 17 06:03:05 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Mon, 17 Feb 2020 14:03:05 +0800 Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com>, <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> Message-ID: Hi Thomas, > - gc/g1/TestPeriodicCollection.java fails consistently because the heap > does not shrink as expected (but probably this is a test bug as it may > expect that uncommit occurs at remark). The reason should be that the patch makes shrinking after mixed GC but the mixed gc doesn't happen. It's the only issue I listed for the change. > - memory usage tends to be significantly higher with the change without > improving scores. > E.g. I have been running specjvm2008 out-of-box with no settings on > different machine(s) (32gb ram min), and the build with the changes > almost consistently uses more heap (i.e. committed size) than without, > in the range of 10% without any performance increase. > Specjvm2008 benchmarks are pretty simple application in terms of > behavior, i.e. does the same things all the time. This also means that > very likely the current sizing is already way beyond the point of > diminishing returns (actually, this is a known issue :)); I would prefer > if we did not add to that. ;) I have 2 questions here.
1) specjvm2008 cannot run with jdk9+: https://bugs.openjdk.java.net/browse/JDK-8202460 I face the same problem. Do you have any way to perform the test in JDK15? 2) I didn't understand : "This also means that > very likely the current sizing is already way beyond the point of > diminishing returns (actually, this is a known issue :));" Could you please explain more about this? Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 15 (Sat.) 03:51 To:hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On 12.02.20 12:16, Thomas Schatzl wrote: > Hi Liang, > > On 12.02.20 11:17, Liang Mao wrote: >> Hi Thomas, >> >> I made a new patch for the issues we listed in JDK-8238686 and >> JDK-8236073: >> http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ > > thanks. I only had time to quickly browse the change, and started > building and testing it internally. I will run it through our perf > benchmarks to look for regressions of out-of-box behavior. > > I will need a day or two until I can get back to looking at the change > in detail. There is currently something else I need to look at. Sorry. initial results from testing: - gc/g1/TestPeriodicCollection.java fails consistently because the heap does not shrink as expected (but probably this is a test bug as it may expect that uncommit occurs at remark). - memory usage tends to be significantly higher with the change without improving scores. E.g. I have been running specjvm2008 out-of-box with no settings on different machine(s) (32gb ram min), and the build with the changes almost consistently uses more heap (i.e. committed size) than without, in the range of 10% without any performance increase. Specjvm2008 benchmarks are pretty simple application in terms of behavior, i.e. does the same things all the time. 
This also means that very likely the current sizing is already way beyond the point of diminishing returns (actually, this is a known issue :)); I would prefer if we did not add to that. ;) Unfortunately I lost the graphs I had generated (manually), and I do not have more time available right now so can't show you right now. I started some dacapo 2009 runs (running them for 30 iterations each). Did not have time to look at the changes themselves any further or investigate the reasons for this memory usage increase than I already did earlier; will continue on Tuesday as I'm taking the day off Monday. Thanks, Thomas From per.liden at oracle.com Mon Feb 17 06:50:32 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 17 Feb 2020 07:50:32 +0100 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> Message-ID: <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> Hi, On 2/17/20 5:05 AM, Yasumasa Suenaga wrote: > Hi, > > I filed this enhancement to JBS: > > JBS: https://bugs.openjdk.java.net/browse/JDK-8239129 > CSR: https://bugs.openjdk.java.net/browse/JDK-8239130 We will not introduce a new option like this, so please withdraw the CSR (you also don't need a CSR for adding an experimental options). > webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ Before this patch can go forward, you need to get to the bottom of how to get that ioctl command to work. If it's not possible, you need to explain why and propose alternatives that we can discuss. cheers, Per > > Could you review this change and CSR? > It passed tests on submit repo (mach5-one-ysuenaga-JDK-8239129-20200217-0213-8777205).
> > > Thanks, > > Yasumasa > > > On 2020/02/15 2:08, Per Liden wrote: >> Hi, >> >> On 2/14/20 3:23 PM, Yasumasa Suenaga wrote: >>> On 2020/02/14 23:08, Per Liden wrote: >>>> Hi, >>>> >>>> On 2/14/20 2:31 PM, Yasumasa Suenaga wrote: >>>>> Hi Per, >>>>> >>>>> On 2020/02/14 20:52, Per Liden wrote: >>>>>> Hi Yasumasa, >>>>>> >>>>>> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >>>>>>> Hi all, >>>>>>> >>>>>>> I tried to allocate heap to DAX on Linux with -XX:AllocateHeapAt, >>>>>>> but it couldn't. >>>>>>> It seems to allow when filesystem is hugetlbfs or tmpfs. >>>>>>> >>>>>>> According to kernel document [1], DAX is supported in ext2, ext4, >>>>>>> and xfs. >>>>>>> Also we need to mount it with "-o dax". >>>>>>> >>>>>>> I want to use ZGC on DAX, so I want to introduce new option >>>>>>> -XX:ZAllowHeapOnFileSystem to allow to use all filesystem as >>>>>>> backing storage. >>>>>>> What do you think this change? >>>>>> >>>>>> >>>>>> +? experimental(bool, ZAllowHeapOnFileSystem, false, ??? \ >>>>>> +????????? "Allow to use filesystem as Java heap backing storage " >>>>>> ??? \ >>>>>> +????????? "specified by -XX:AllocateHeapAt") ??? \ >>>>>> + ??? \ >>>>>> >>>>>> Instead of adding a new option it would be preferable to >>>>>> automatically detect that it's a dax mounted filesystem. But I >>>>>> haven't has a chance to look into the best way of doing that. >>>>> >>>>> I thought so, but I guess it is difficult. >>>>> PMDK also does not check it automatically. >>>>> >>>>> https://urldefense.com/v3/__https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c*L18__;Iw!!GqivPVa7Brio!PlQs19bQVBJF7PDA9RLZ9JLbXOQ2KYocNW6DJH-eOUqXZcYwl-cSvSjpfC316y0$ >>>>> >>>>> In addition, we don't seem to be able to get mount option ("-o >>>>> dax") via syscall. >>>>> I strace'ed `mount -o dax ...`, I saw "-o dax" was passed to 5th >>>>> argument (const void *data). It would be handled in each >>>>> filesystem, so I could not get it. 
>>>>> >>>>> Another solution, we can use /proc/mounts, but it might be complex. >>>> >>>> I was maybe hoping you could get this information through some >>>> ioctl() command on the file descriptor? >>> >>> I tried to FS_IOC_FSGETXATTR ioctl (FS_XFLAG_DAX might be set in >>> fsx_xflags), but I couldn't get. >>> (I use ext4 with "-o dax") >> >> >> Ok. It would be good to get to the bottom of why it's not set. >> >> cheers, >> Per From suenaga at oss.nttdata.com Mon Feb 17 07:58:41 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Mon, 17 Feb 2020 16:58:41 +0900 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> Message-ID: Hi Per, On 2020/02/17 15:50, Per Liden wrote: > Hi, > > On 2/17/20 5:05 AM, Yasumasa Suenaga wrote: >> Hi, >> >> I filed this enhancement to JBS: >> >> ?? JBS: https://bugs.openjdk.java.net/browse/JDK-8239129 >> ?? CSR: https://bugs.openjdk.java.net/browse/JDK-8239130 > > We will not introduce a new option like this, so please withdraw the CSR (you also don't need a CSR for adding an experimental options). I withdrew it. >> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ > > Before this patch can go forward, you need to get to the bottom of how to get that ioctl command to work. If it's not possible, you need to explain why and propose alternatives that we can discuss. I guess it is caused by Linux kernel. In case of ext4, `ext4_iflags_to_xflags()` would set filesystem flags to `struct FS_IOC_FSGETXATTR`. However `FS_XFLAG_DAX` is not handled in it. 
https://github.com/torvalds/linux/blob/master/fs/ext4/ioctl.c#L525 Cheers, Yasumasa > cheers, > Per > >> >> Could you review this change and CSR? >> It passed tests on submit repo (mach5-one-ysuenaga-JDK-8239129-20200217-0213-8777205). >> >> >> Thanks, >> >> Yasumasa >> >> >> On 2020/02/15 2:08, Per Liden wrote: >>> Hi, >>> >>> On 2/14/20 3:23 PM, Yasumasa Suenaga wrote: >>>> On 2020/02/14 23:08, Per Liden wrote: >>>>> Hi, >>>>> >>>>> On 2/14/20 2:31 PM, Yasumasa Suenaga wrote: >>>>>> Hi Per, >>>>>> >>>>>> On 2020/02/14 20:52, Per Liden wrote: >>>>>>> Hi Yasumasa, >>>>>>> >>>>>>> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I tried to allocate heap to DAX on Linux with -XX:AllocateHeapAt, but it couldn't. >>>>>>>> It seems to allow when filesystem is hugetlbfs or tmpfs. >>>>>>>> >>>>>>>> According to kernel document [1], DAX is supported in ext2, ext4, and xfs. >>>>>>>> Also we need to mount it with "-o dax". >>>>>>>> >>>>>>>> I want to use ZGC on DAX, so I want to introduce new option -XX:ZAllowHeapOnFileSystem to allow to use all filesystem as backing storage. >>>>>>>> What do you think this change? >>>>>>> >>>>>>> >>>>>>> +? experimental(bool, ZAllowHeapOnFileSystem, false, ??? \ >>>>>>> +????????? "Allow to use filesystem as Java heap backing storage " ??? \ >>>>>>> +????????? "specified by -XX:AllocateHeapAt") ??? \ >>>>>>> + ??? \ >>>>>>> >>>>>>> Instead of adding a new option it would be preferable to automatically detect that it's a dax mounted filesystem. But I haven't has a chance to look into the best way of doing that. >>>>>> >>>>>> I thought so, but I guess it is difficult. >>>>>> PMDK also does not check it automatically. >>>>>> >>>>>> https://urldefense.com/v3/__https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c*L18__;Iw!!GqivPVa7Brio!PlQs19bQVBJF7PDA9RLZ9JLbXOQ2KYocNW6DJH-eOUqXZcYwl-cSvSjpfC316y0$ >>>>>> In addition, we don't seem to be able to get mount option ("-o dax") via syscall. 
>>>>>> I strace'ed `mount -o dax ...`, I saw "-o dax" was passed to 5th argument (const void *data). It would be handled in each filesystem, so I could not get it. >>>>>> >>>>>> Another solution, we can use /proc/mounts, but it might be complex. >>>>> >>>>> I was maybe hoping you could get this information through some ioctl() command on the file descriptor? >>>> >>>> I tried to FS_IOC_FSGETXATTR ioctl (FS_XFLAG_DAX might be set in fsx_xflags), but I couldn't get. >>>> (I use ext4 with "-o dax") >>> >>> >>> Ok. It would be good to get to the bottom of why it's not set. >>> >>> cheers, >>> Per From shade at redhat.com Mon Feb 17 08:12:18 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 17 Feb 2020 09:12:18 +0100 Subject: RFR: JDK-8239081: Shenandoah: Consolidate C1 LRB and native barriers In-Reply-To: <3b4e78f3-a82c-2801-9d35-292ac18e6907@redhat.com> References: <83a0a264-f958-6921-0ed7-7859bfe9505f@redhat.com> <3b4e78f3-a82c-2801-9d35-292ac18e6907@redhat.com> Message-ID: <3c5f8917-41f8-b312-4f30-6a9c137ed83b@redhat.com> On 2/15/20 1:35 PM, Roman Kennke wrote: >>> https://bugs.openjdk.java.net/browse/JDK-8239081 >>> Webrev: >>> http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.00/ >> >> Only some stylistic nits: >> >> *) I believe the convention is to name these boolean arguments "is_native"? >> >> *) C1ShenandoahLoadReferenceBarrierCodeGenClosure::_native should probably be const? > > Right, good points! Both fixed here: > > http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.01/ I think variables and fields should be "is_native" too. Here: 216 bool native = ShenandoahBarrierSet::use_load_reference_barrier_native(decorators, type); 217 tmp = load_reference_barrier(gen, tmp, access.resolved_addr(), native); ...and here: 255 class C1ShenandoahLoadReferenceBarrierCodeGenClosure : public StubAssemblerCodeGenClosure { 256 private: 257 const bool _native; ...and here: 89 class ShenandoahLoadReferenceBarrierStub: public CodeStub { ... 
97 bool _native; ...and probably somewhere else too? -- Thanks, -Aleksey From maoliang.ml at alibaba-inc.com Mon Feb 17 09:56:08 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Mon, 17 Feb 2020 17:56:08 +0800 Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com>, <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> Message-ID: Hi Thomas, I am able to run specjvm2008 by excluding the compiler subtests and reproduce the issue that the change commits more memory. The main cause is addressed that the tests have a lot of humongous objects which affect the evaluation of adaptive IHOP. _last_unrestrained_young_size and _last_allocated_bytes used in G1AdaptiveIHOPControl::predict_unstrained_buffer_size are very large. So the expansion after concurrent mark is rather aggressive. I made an enhancement to restrict this uncommon expansion with MinHeapFreeRatio: http://cr.openjdk.java.net/~luchsh/8236073.webrev.4/ Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 15 (Sat.) 03:51 To:hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On 12.02.20 12:16, Thomas Schatzl wrote: > Hi Liang, > > On 12.02.20 11:17, Liang Mao wrote: >> Hi Thomas, >> >> I made a new patch for the issues we listed in JDK-8238686 and >> JDK-8236073: >> http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ > > thanks. I only had time to quickly browse the change, and started > building and testing it internally. I will run it through our perf > benchmarks to look for regressions of out-of-box behavior. > > I will need a day or two until I can get back to looking at the change > in detail.
There is currently something else I need to look at. Sorry. initial results from testing: - gc/g1/TestPeriodicCollection.java fails consistently because the heap does not shrink as expected (but probably this is a test bug as it may expect that uncommit occurs at remark). - memory usage tends to be significantly higher with the change without improving scores. E.g. I have been running specjvm2008 out-of-box with no settings on different machine(s) (32gb ram min), and the build with the changes almost consistently uses more heap (i.e. committed size) than without, in the range of 10% without any performance increase. Specjvm2008 benchmarks are pretty simple application in terms of behavior, i.e. does the same things all the time. This also means that very likely the current sizing is already way beyond the point of diminishing returns (actually, this is a known issue :)); I would prefer if we did not add to that. ;) Unfortunately I lost the graphs I had generated (manually), and I do not have more time available right now so can't show you right now. I started some dacapo 2009 runs (running them for 30 iterations each). Did not have time to look at the changes themselves any further or investigate the reasons for this memory usage increase than I already did earlier; will continue on Tuesday as I'm taking the day off Monday. 
Thanks, Thomas From per.liden at oracle.com Mon Feb 17 10:06:51 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 17 Feb 2020 11:06:51 +0100 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> Message-ID: On 2/17/20 8:58 AM, Yasumasa Suenaga wrote: > Hi Per, > > On 2020/02/17 15:50, Per Liden wrote: >> Hi, >> >> On 2/17/20 5:05 AM, Yasumasa Suenaga wrote: >>> Hi, >>> >>> I filed this enhancement to JBS: >>> >>> ?? JBS: https://bugs.openjdk.java.net/browse/JDK-8239129 >>> ?? CSR: https://bugs.openjdk.java.net/browse/JDK-8239130 >> >> We will not introduce a new option like this, so please withdraw the >> CSR (you also don't need a CSR for adding an experimental options). > > I withdrew it. > > >>> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ >> >> Before this patch can go forward, you need to get to the bottom of how >> to get that ioctl command to work. If it's not possible, you need to >> explain why and propose alternatives that we can discuss. > > I guess it is caused by Linux kernel. > In case of ext4, `ext4_iflags_to_xflags()` would set filesystem flags to > `struct FS_IOC_FSGETXATTR`. > However `FS_XFLAG_DAX` is not handled in it. Did a bit of googleing and it seems the DAX flag is in a bit of flux at the moment. I guess this will be fixed down the road, when DAX in the kernel becomes a non-experimental feature. 
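[Editorial note: the FS_IOC_FSGETXATTR probe debated throughout this thread can be written as a standalone Linux-only sketch. The interface below is the real <linux/fs.h> one, but as Per and Yasumasa note, whether fsx_xflags actually carries FS_XFLAG_DAX depends on the kernel and filesystem, so a failed or negative probe is not conclusive:]

```cpp
#include <fcntl.h>
#include <cassert>
#include <linux/fs.h>
#include <sys/ioctl.h>

// Returns 1 if the file's extended attributes report DAX, 0 if they do
// not, and -1 if the query itself failed (bad descriptor, filesystem
// without FS_IOC_FSGETXATTR support, ...).
int query_dax_flag(int fd) {
  struct fsxattr fsx;
  if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) == -1) {
    return -1;
  }
  return (fsx.fsx_xflags & FS_XFLAG_DAX) ? 1 : 0;
}
```

A caller would open() the heap backing file first; in this thread such a probe returned no DAX bit even on a dax-mounted ext4, which is exactly the kernel-side gap Yasumasa points at in ext4_iflags_to_xflags().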
How about we just do like this for now: http://cr.openjdk.java.net/~pliden/8239129/webrev.0 /Per > > > https://urldefense.com/v3/__https://github.com/torvalds/linux/blob/master/fs/ext4/ioctl.c*L525__;Iw!!GqivPVa7Brio!KN3UJKZwdbjq6abJnSXLf78BAUX9742P2PJFHS6kO5_cAgG6kxQEBBBez7uFixk$ > > > Cheers, > > Yasumasa > > >> cheers, >> Per >> >>> >>> Could you review this change and CSR? >>> It passed tests on submit repo >>> (mach5-one-ysuenaga-JDK-8239129-20200217-0213-8777205). >>> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>> On 2020/02/15 2:08, Per Liden wrote: >>>> Hi, >>>> >>>> On 2/14/20 3:23 PM, Yasumasa Suenaga wrote: >>>>> On 2020/02/14 23:08, Per Liden wrote: >>>>>> Hi, >>>>>> >>>>>> On 2/14/20 2:31 PM, Yasumasa Suenaga wrote: >>>>>>> Hi Per, >>>>>>> >>>>>>> On 2020/02/14 20:52, Per Liden wrote: >>>>>>>> Hi Yasumasa, >>>>>>>> >>>>>>>> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I tried to allocate heap to DAX on Linux with >>>>>>>>> -XX:AllocateHeapAt, but it couldn't. >>>>>>>>> It seems to allow when filesystem is hugetlbfs or tmpfs. >>>>>>>>> >>>>>>>>> According to kernel document [1], DAX is supported in ext2, >>>>>>>>> ext4, and xfs. >>>>>>>>> Also we need to mount it with "-o dax". >>>>>>>>> >>>>>>>>> I want to use ZGC on DAX, so I want to introduce new option >>>>>>>>> -XX:ZAllowHeapOnFileSystem to allow to use all filesystem as >>>>>>>>> backing storage. >>>>>>>>> What do you think this change? >>>>>>>> >>>>>>>> >>>>>>>> +? experimental(bool, ZAllowHeapOnFileSystem, false, ??? \ >>>>>>>> +????????? "Allow to use filesystem as Java heap backing storage >>>>>>>> " ??? \ >>>>>>>> +????????? "specified by -XX:AllocateHeapAt") ??? \ >>>>>>>> + ??? \ >>>>>>>> >>>>>>>> Instead of adding a new option it would be preferable to >>>>>>>> automatically detect that it's a dax mounted filesystem. But I >>>>>>>> haven't has a chance to look into the best way of doing that. >>>>>>> >>>>>>> I thought so, but I guess it is difficult. 
>>>>>>> PMDK also does not check it automatically. >>>>>>> >>>>>>> https://urldefense.com/v3/__https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c*L18__;Iw!!GqivPVa7Brio!PlQs19bQVBJF7PDA9RLZ9JLbXOQ2KYocNW6DJH-eOUqXZcYwl-cSvSjpfC316y0$ >>>>>>> >>>>>>> In addition, we don't seem to be able to get mount option ("-o >>>>>>> dax") via syscall. >>>>>>> I strace'ed `mount -o dax ...`, I saw "-o dax" was passed to 5th >>>>>>> argument (const void *data). It would be handled in each >>>>>>> filesystem, so I could not get it. >>>>>>> >>>>>>> Another solution, we can use /proc/mounts, but it might be complex. >>>>>> >>>>>> I was maybe hoping you could get this information through some >>>>>> ioctl() command on the file descriptor? >>>>> >>>>> I tried to FS_IOC_FSGETXATTR ioctl (FS_XFLAG_DAX might be set in >>>>> fsx_xflags), but I couldn't get. >>>>> (I use ext4 with "-o dax") >>>> >>>> >>>> Ok. It would be good to get to the bottom of why it's not set. >>>> >>>> cheers, >>>> Per From stefan.johansson at oracle.com Mon Feb 17 10:10:07 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Mon, 17 Feb 2020 11:10:07 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com> <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> Message-ID: <5883F67D-CE92-4A40-977B-947B413ABF5F@oracle.com> Hi Liang, I've started looking at this patch as well and I have a question regarding the change to not allow shrinking after concurrent mark? Before we could shrink the heap at Remark, but now we only check to expand the heap after the concurrent cycle, why is that? I get that we will be able to shrink even more after the mixed collections but if a lot of regions are freed by the concurrent cycle why not check if we can shrink here?
Also good to hear you can run SPECjvm2008, we also avoid running any problematic benchmarks. Thanks, Stefan > 17 feb. 2020 kl. 10:56 skrev Liang Mao : > > Hi Thomas, > > I am able to run specjvm2008 by excluding the compiler subtests > and reproduce the issue that the change commits more memory. > The main cause is addressed that the tests have a lot of > humongous objects which affect the evaluation of adaptive > IHOP. _last_unrestrained_young_size and _last_allocated_bytes > used in G1AdaptiveIHOPControl::predict_unstrained_buffer_size are > very large. So the expansion after concurrent mark is rather > aggressive. I made an enhancement to restrict this uncommon > expansion with MinHeapFreeRatio: > http://cr.openjdk.java.net/~luchsh/8236073.webrev.4/ > > Thanks, > Liang > > > > > > > ------------------------------------------------------------------ > From:Thomas Schatzl > Send Time:2020 Feb. 15 (Sat.) 03:51 > To:hotspot-gc-dev > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > > Hi, > > On 12.02.20 12:16, Thomas Schatzl wrote: >> Hi Liang, >> >> On 12.02.20 11:17, Liang Mao wrote: >>> Hi Thomas, >>> >>> I made a new patch for the issues we listed in JDK-8238686 and >>> JDK-8236073: >>> http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ >> >> thanks. I only had time to quickly browse the change, and started >> building and testing it internally. I will run it through our perf >> benchmarks to look for regressions of out-of-box behavior. >> >> I will need a day or two until I can get back to looking at the change >> in detail. There is currently something else I need to look at. Sorry. > > initial results from testing: > > - gc/g1/TestPeriodicCollection.java fails consistently because the heap > does not shrink as expected (but probably this is a test bug as it may > expect that uncommit occurs at remark). > > - memory usage tends to be significantly higher with the change without > improving scores. > > E.g. 
I have been running specjvm2008 out-of-box with no settings on > different machine(s) (32gb ram min), and the build with the changes > almost consistently uses more heap (i.e. committed size) than without, > in the range of 10% without any performance increase. > > Specjvm2008 benchmarks are pretty simple application in terms of > behavior, i.e. does the same things all the time. This also means that > very likely the current sizing is already way beyond the point of > diminishing returns (actually, this is a known issue :)); I would prefer > if we did not add to that. ;) > > Unfortunately I lost the graphs I had generated (manually), and I do not > have more time available right now so can't show you right now. > > I started some dacapo 2009 runs (running them for 30 iterations each). > > Did not have time to look at the changes themselves any further or > investigate the reasons for this memory usage increase than I already > did earlier; will continue on Tuesday as I'm taking the day off Monday. > > Thanks, > Thomas > From rkennke at redhat.com Mon Feb 17 11:49:48 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 17 Feb 2020 12:49:48 +0100 Subject: RFR: JDK-8239081: Shenandoah: Consolidate C1 LRB and native barriers In-Reply-To: <3c5f8917-41f8-b312-4f30-6a9c137ed83b@redhat.com> References: <83a0a264-f958-6921-0ed7-7859bfe9505f@redhat.com> <3b4e78f3-a82c-2801-9d35-292ac18e6907@redhat.com> <3c5f8917-41f8-b312-4f30-6a9c137ed83b@redhat.com> Message-ID: <50043d72-86ff-df50-18e0-48b9f0a0bf0e@redhat.com> Hi Aleksey, >>>> https://bugs.openjdk.java.net/browse/JDK-8239081 >>>> Webrev: >>>> http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.00/ >>> >>> Only some stylistic nits: >>> >>> *) I believe the convention is to name these boolean arguments "is_native"? >>> >>> *) C1ShenandoahLoadReferenceBarrierCodeGenClosure::_native should probably be const? >> >> Right, good points! 
Both fixed here: >> >> http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.01/ > > I think variables and fields should be "is_native" too. > > Here: > > 216 bool native = ShenandoahBarrierSet::use_load_reference_barrier_native(decorators, type); > 217 tmp = load_reference_barrier(gen, tmp, access.resolved_addr(), native); > > ...and here: > > 255 class C1ShenandoahLoadReferenceBarrierCodeGenClosure : public StubAssemblerCodeGenClosure { > 256 private: > 257 const bool _native; > > ...and here: > > 89 class ShenandoahLoadReferenceBarrierStub: public CodeStub { > ... > 97 bool _native; > > ...and probably somewhere else too? Riiiiight.: http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.02/ (Note: where am I coming from? Java conventions for boolean properties where field is $property, getter is is$Property(), setter is set$Property().) Better now? Roman From shade at redhat.com Mon Feb 17 11:53:21 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 17 Feb 2020 12:53:21 +0100 Subject: RFR: JDK-8239081: Shenandoah: Consolidate C1 LRB and native barriers In-Reply-To: <50043d72-86ff-df50-18e0-48b9f0a0bf0e@redhat.com> References: <83a0a264-f958-6921-0ed7-7859bfe9505f@redhat.com> <3b4e78f3-a82c-2801-9d35-292ac18e6907@redhat.com> <3c5f8917-41f8-b312-4f30-6a9c137ed83b@redhat.com> <50043d72-86ff-df50-18e0-48b9f0a0bf0e@redhat.com> Message-ID: On 2/17/20 12:49 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/JDK-8239081/webrev.02/ Looks good. -- Thanks, -Aleksey From rkennke at redhat.com Mon Feb 17 12:13:03 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 17 Feb 2020 13:13:03 +0100 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification In-Reply-To: <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> References: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> Message-ID: Hi Aleksey, >> I've also left in some debug-output but under #ifdef ASSERT_DISABLED. 
I >> found that very useful and wouldn't want to throw it away. > > I think the proper way to do this is: > > #if 0 // Helpful for debugging > > ...but then I wonder, why not turn it into the actual fastdebug diagnostics? Our verifier/asserts > very helpfully include a lot of debugging info into hs_err when asserts fail. Surely if we are > chasing a very rare bug, it would be more convenient for hs_err to include that right away, not > require us recompile the VM. Right, let's leave it there for diagnostics. >> Webrev: >> http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.00/ > > *) Looks like you can just initialize "int count = _oop_count" and skip increments in the first loop. Right. > *) Capitalization in "Must", to match the style of other asserts: > > 305 assert(nm == data->nm(), "must be same nmethod"); Ok. > *) assert(false, ...) is probably just fatal(...) Ok. http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.01/ Good now? Thanks, Roman From shade at redhat.com Mon Feb 17 12:18:05 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 17 Feb 2020 13:18:05 +0100 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification In-Reply-To: References: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> Message-ID: On 2/17/20 1:13 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.01/ This is fine. Although I would probably be open for storing that diagnostics into stringStream (see how ShenandoahAsserts::print_failure does it), and putting it into the fatal message itself. Pros: customers would hand over hs_errs to us with the relevant diagnostics. Cons: we can overflow the stringStream and truncate parts of the data. Your call. 
-- Thanks, -Aleksey From suenaga at oss.nttdata.com Mon Feb 17 12:28:15 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Mon, 17 Feb 2020 21:28:15 +0900 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> Message-ID: On 2020/02/17 19:06, Per Liden wrote: > > > On 2/17/20 8:58 AM, Yasumasa Suenaga wrote: >> Hi Per, >> >> On 2020/02/17 15:50, Per Liden wrote: >>> Hi, >>> >>> On 2/17/20 5:05 AM, Yasumasa Suenaga wrote: >>>> Hi, >>>> >>>> I filed this enhancement to JBS: >>>> >>>> ?? JBS: https://bugs.openjdk.java.net/browse/JDK-8239129 >>>> ?? CSR: https://bugs.openjdk.java.net/browse/JDK-8239130 >>> >>> We will not introduce a new option like this, so please withdraw the CSR (you also don't need a CSR for adding an experimental options). >> >> I withdrew it. >> >> >>>> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ >>> >>> Before this patch can go forward, you need to get to the bottom of how to get that ioctl command to work. If it's not possible, you need to explain why and propose alternatives that we can discuss. >> >> I guess it is caused by Linux kernel. >> In case of ext4, `ext4_iflags_to_xflags()` would set filesystem flags to `struct FS_IOC_FSGETXATTR`. >> However `FS_XFLAG_DAX` is not handled in it. > > Did a bit of googleing and it seems the DAX flag is in a bit of flux at the moment. I guess this will be fixed down the road, when DAX in the kernel becomes a non-experimental feature. > > How about we just do like this for now: > > http://cr.openjdk.java.net/~pliden/8239129/webrev.0 I thought ZGC requires tmpfs or hugetlbfs due to performance reason. 
So I introduced new -XX option to make users aware of it. If not so, I agree with your change. Yasumasa > /Per > >> >> https://urldefense.com/v3/__https://github.com/torvalds/linux/blob/master/fs/ext4/ioctl.c*L525__;Iw!!GqivPVa7Brio!KN3UJKZwdbjq6abJnSXLf78BAUX9742P2PJFHS6kO5_cAgG6kxQEBBBez7uFixk$ >> >> Cheers, >> >> Yasumasa >> >> >>> cheers, >>> Per >>> >>>> >>>> Could you review this change and CSR? >>>> It passed tests on submit repo (mach5-one-ysuenaga-JDK-8239129-20200217-0213-8777205). >>>> >>>> >>>> Thanks, >>>> >>>> Yasumasa >>>> >>>> >>>> On 2020/02/15 2:08, Per Liden wrote: >>>>> Hi, >>>>> >>>>> On 2/14/20 3:23 PM, Yasumasa Suenaga wrote: >>>>>> On 2020/02/14 23:08, Per Liden wrote: >>>>>>> Hi, >>>>>>> >>>>>>> On 2/14/20 2:31 PM, Yasumasa Suenaga wrote: >>>>>>>> Hi Per, >>>>>>>> >>>>>>>> On 2020/02/14 20:52, Per Liden wrote: >>>>>>>>> Hi Yasumasa, >>>>>>>>> >>>>>>>>> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote: >>>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> I tried to allocate heap to DAX on Linux with -XX:AllocateHeapAt, but it couldn't. >>>>>>>>>> It seems to allow when filesystem is hugetlbfs or tmpfs. >>>>>>>>>> >>>>>>>>>> According to kernel document [1], DAX is supported in ext2, ext4, and xfs. >>>>>>>>>> Also we need to mount it with "-o dax". >>>>>>>>>> >>>>>>>>>> I want to use ZGC on DAX, so I want to introduce new option -XX:ZAllowHeapOnFileSystem to allow to use all filesystem as backing storage. >>>>>>>>>> What do you think this change? >>>>>>>>> >>>>>>>>> >>>>>>>>> +? experimental(bool, ZAllowHeapOnFileSystem, false, ??? \ >>>>>>>>> +????????? "Allow to use filesystem as Java heap backing storage " ??? \ >>>>>>>>> +????????? "specified by -XX:AllocateHeapAt") ??? \ >>>>>>>>> + ??? \ >>>>>>>>> >>>>>>>>> Instead of adding a new option it would be preferable to automatically detect that it's a dax mounted filesystem. But I haven't has a chance to look into the best way of doing that. >>>>>>>> >>>>>>>> I thought so, but I guess it is difficult. 
>>>>>>>> PMDK also does not check it automatically. >>>>>>>> >>>>>>>> https://urldefense.com/v3/__https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c*L18__;Iw!!GqivPVa7Brio!PlQs19bQVBJF7PDA9RLZ9JLbXOQ2KYocNW6DJH-eOUqXZcYwl-cSvSjpfC316y0$ >>>>>>>> In addition, we don't seem to be able to get mount option ("-o dax") via syscall. >>>>>>>> I strace'ed `mount -o dax ...`, I saw "-o dax" was passed to 5th argument (const void *data). It would be handled in each filesystem, so I could not get it. >>>>>>>> >>>>>>>> Another solution, we can use /proc/mounts, but it might be complex. >>>>>>> >>>>>>> I was maybe hoping you could get this information through some ioctl() command on the file descriptor? >>>>>> >>>>>> I tried to FS_IOC_FSGETXATTR ioctl (FS_XFLAG_DAX might be set in fsx_xflags), but I couldn't get. >>>>>> (I use ext4 with "-o dax") >>>>> >>>>> >>>>> Ok. It would be good to get to the bottom of why it's not set. >>>>> >>>>> cheers, >>>>> Per From rkennke at redhat.com Mon Feb 17 12:34:20 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 17 Feb 2020 13:34:20 +0100 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification In-Reply-To: References: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> Message-ID: >> http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.01/ > > This is fine. > > Although I would probably be open for storing that diagnostics into stringStream (see how > ShenandoahAsserts::print_failure does it), and putting it into the fatal message itself. Pros: > customers would hand over hs_errs to us with the relevant diagnostics. Cons: we can overflow the > stringStream and truncate parts of the data. Your call. Ok, let's do that then: http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.02/ Good? 
Roman From zgu at redhat.com Mon Feb 17 14:03:28 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 17 Feb 2020 09:03:28 -0500 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification In-Reply-To: References: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> Message-ID: <7850613d-3a9f-cece-9a1a-46b4e6823c7f@redhat.com> On 2/17/20 7:34 AM, Roman Kennke wrote: >>> http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.01/ >> >> This is fine. >> >> Although I would probably be open for storing that diagnostics into stringStream (see how >> ShenandoahAsserts::print_failure does it), and putting it into the fatal message itself. Pros: >> customers would hand over hs_errs to us with the relevant diagnostics. Cons: we can overflow the >> stringStream and truncate parts of the data. Your call. > > > Ok, let's do that then: > > http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.02/ > > Good? assert_same_oops() itself is assert only (has NOT_DEBUG_RETURN in definition), does not need nested ifdef ASSERT ... -Zhengyu > > Roman > From rkennke at redhat.com Mon Feb 17 15:27:05 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 17 Feb 2020 16:27:05 +0100 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification In-Reply-To: <7850613d-3a9f-cece-9a1a-46b4e6823c7f@redhat.com> References: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> <7850613d-3a9f-cece-9a1a-46b4e6823c7f@redhat.com> Message-ID: <5f4bc613-bb6c-7262-934f-5ddac38d3b24@redhat.com> >>>> http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.01/ >>> >>> This is fine. >>> >>> Although I would probably be open for storing that diagnostics into >>> stringStream (see how >>> ShenandoahAsserts::print_failure does it), and putting it into the >>> fatal message itself. Pros: >>> customers would hand over hs_errs to us with the relevant >>> diagnostics. 
Cons: we can overflow the >>> stringStream and truncate parts of the data. Your call. >> >> >> Ok, let's do that then: >> >> http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.02/ >> >> Good? > > assert_same_oops() itself is assert only (has NOT_DEBUG_RETURN in > definition), does not need nested ifdef ASSERT ... Right! Very good catch! http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.03/ Good now? Thanks for reviewing! Roman From david.holmes at oracle.com  Tue Feb 18 05:23:46 2020 From: david.holmes at oracle.com (David Holmes) Date: Tue, 18 Feb 2020 15:23:46 +1000 Subject: RFR: add parallel heap inspection support for jmap histo(G1) In-Reply-To: <11bca96c0e7745f5b2558cc49b42b996@tencent.com> References: <11bca96c0e7745f5b2558cc49b42b996@tencent.com> Message-ID: Hi Lin, Adding in hotspot-gc-dev as they need to see how this interacts with GC worker threads, and whether it needs to be extended beyond G1. I happened to spot one nit when browsing: src/hotspot/share/gc/shared/collectedHeap.hpp + virtual bool run_par_heap_inspect_task(KlassInfoTable* cit, +                                        BoolObjectClosure* filter, +                                        size_t* missed_count, +                                        size_t thread_num) { +   return NULL; s/NULL/false/ Cheers, David On 18/02/2020 2:15 pm, linzang(臧琳) wrote: > Dear All, > May I ask your help to review the following changes: > webrev: > http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/ > bug: https://bugs.openjdk.java.net/browse/JDK-8215624 > related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 > This patch enables parallel heap inspection of G1 for jmap histo. > My simple test showed it can speed up jmap -histo by 2x with > parallelThreadNum set to 2 for a heap at ~500M on a 4-core platform.
> > ------------------------------------------------------------------------ > BRs, > Lin From linzang at tencent.com  Tue Feb 18 06:29:38 2020 From: linzang at tencent.com (linzang(臧琳)) Date: Tue, 18 Feb 2020 06:29:38 +0000 Subject: RFR: add parallel heap inspection support for jmap histo(G1)(Internet mail) References: <11bca96c0e7745f5b2558cc49b42b996@tencent.com>, Message-ID: Dear David,

Thanks a lot!

I have updated the refined code to http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/.

IMHO the parallel heap inspection can be extended to all kinds of heap as long as the heap layout can support parallel iteration.

Maybe we can first use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with collectedHeap, then we can extend the solution to other kinds of heap.

Thanks,
--------------
Lin

>Hi Lin,
>
>Adding in hotspot-gc-dev as they need to see how this interacts with GC
>worker threads, and whether it needs to be extended beyond G1.
>
>I happened to spot one nit when browsing:
>
>src/hotspot/share/gc/shared/collectedHeap.hpp
>
>+  virtual bool run_par_heap_inspect_task(KlassInfoTable* cit,
>+                                         BoolObjectClosure* filter,
>+                                         size_t* missed_count,
>+                                         size_t thread_num) {
>+    return NULL;
>
>s/NULL/false/
>
>Cheers,
>David
>
>On 18/02/2020 2:15 pm, linzang(臧琳) wrote:
>> Dear All,
>> May I ask your help to review the following changes:
>> webrev:
>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/
>> bug: https://bugs.openjdk.java.net/browse/JDK-8215624
>> related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290
>> This patch enables parallel heap inspection of G1 for jmap histo.
>>
my simple test showed it can speed up jmap -histo by 2x with
>> parallelThreadNum set to 2 for a heap at ~500M on a 4-core platform.
>>
>> ------------------------------------------------------------------------
>> BRs,
>> Lin
> From maoliang.ml at alibaba-inc.com  Tue Feb 18 07:27:38 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Tue, 18 Feb 2020 15:27:38 +0800
Subject: Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
In-Reply-To: <5883F67D-CE92-4A40-977B-947B413ABF5F@oracle.com>
References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com>
 <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com>
 <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com>,
 <5883F67D-CE92-4A40-977B-947B413ABF5F@oracle.com>
Message-ID: <7f9be388-c177-4bb4-a6d7-fc4f989250c2.maoliang.ml@alibaba-inc.com>

Hi Stefan,

Thank you for your comments!

Based on previous discussion, the reasons are as below:
1) For the expansion after cm, I think we have the agreement that the
original MinHeapFreeRatio might be too large, and that predicting the
necessary size from adaptive IHOP for expansion sounds reasonable;
specjbb2015 showed good results.
2) About when to shrink the heap, I think a better spot is after mixed
collections. From my observation, heap use is still near its peak after
remark in most cases, e.g. Alibaba workloads and specjbb2015. There could
be some scenario with a lot of humongous regions where remark cleans up a
considerable number of regions, but why not shrink the heap once most of
the garbage has been cleaned after mixed GCs? We don't need to shrink
twice in an old GC cycle. A MaxHeapFreeRatio of 70, which keeps heap
capacity at 30% live objects, makes sense and is unified with the full GC
logic. If we only shrink the heap at remark, the maximum desired capacity
could be 100/30 times the peak heap usage, which is obviously not
efficient.
Thanks, Liang ------------------------------------------------------------------ From:Stefan Johansson Send Time:2020 Feb. 17 (Mon.) 18:10 To:"MAO, Liang" Cc:hotspot-gc-dev ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi Liang, I?ve started looking at this patch as well and I have a question regarding the change to not allow shrinking after concurrent mark? Before we could shrink the heap at Remark, but now we only check to expand the heap after the concurrent cycle, why is that? I get that we will be able to shrink even more after the mixed collections but if a lot of regions are freed by the concurrent cycle why not check if we can shrink here? Also good to hear you can run SPECjvm2008, we also avoid running any problematic benchmarks. Thanks, Stefan > 17 feb. 2020 kl. 10:56 skrev Liang Mao : > > Hi Thomas, > > I am able to run specjvm2008 by excluding the compiler subtests > and reproduce the issue that the change commits more memory. > The main cause is addressed that the tests have a lot of > humongous objects which affect the evaluation of adaptive > IHOP. _last_unrestrained_young_size and _last_allocated_bytes > used in G1AdaptiveIHOPControl::predict_unstrained_buffer_size are > very large. So the expansion after concurrent mark is rather > aggressive. I made an enhancement to restrict this uncommon > expansion with MinHeapFreeRatio: > http://cr.openjdk.java.net/~luchsh/8236073.webrev.4/ > > Thanks, > Liang > > > > > > > ------------------------------------------------------------------ > From:Thomas Schatzl > Send Time:2020 Feb. 15 (Sat.) 
03:51 > To:hotspot-gc-dev > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > > Hi, > > On 12.02.20 12:16, Thomas Schatzl wrote: >> Hi Liang, >> >> On 12.02.20 11:17, Liang Mao wrote: >>> Hi Thomas, >>> >>> I made a new patch for the issues we listed in JDK-8238686 and >>> JDK-8236073: >>> http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ >> >> thanks. I only had time to quickly browse the change, and started >> building and testing it internally. I will run it through our perf >> benchmarks to look for regressions of out-of-box behavior. >> >> I will need a day or two until I can get back to looking at the change >> in detail. There is currently something else I need to look at. Sorry. > > initial results from testing: > > - gc/g1/TestPeriodicCollection.java fails consistently because the heap > does not shrink as expected (but probably this is a test bug as it may > expect that uncommit occurs at remark). > > - memory usage tends to be significantly higher with the change without > improving scores. > > E.g. I have been running specjvm2008 out-of-box with no settings on > different machine(s) (32gb ram min), and the build with the changes > almost consistently uses more heap (i.e. committed size) than without, > in the range of 10% without any performance increase. > > Specjvm2008 benchmarks are pretty simple application in terms of > behavior, i.e. does the same things all the time. This also means that > very likely the current sizing is already way beyond the point of > diminishing returns (actually, this is a known issue :)); I would prefer > if we did not add to that. ;) > > Unfortunately I lost the graphs I had generated (manually), and I do not > have more time available right now so can't show you right now. > > I started some dacapo 2009 runs (running them for 30 iterations each). 
> > Did not have time to look at the changes themselves any further or > investigate the reasons for this memory usage increase than I already > did earlier; will continue on Tuesday as I'm taking the day off Monday. > > Thanks, > Thomas > From stefan.johansson at oracle.com  Tue Feb 18 09:16:56 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Tue, 18 Feb 2020 10:16:56 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <7f9be388-c177-4bb4-a6d7-fc4f989250c2.maoliang.ml@alibaba-inc.com> References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com> <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> <5883F67D-CE92-4A40-977B-947B413ABF5F@oracle.com> <7f9be388-c177-4bb4-a6d7-fc4f989250c2.maoliang.ml@alibaba-inc.com> Message-ID: <3D9F54B4-57AE-4328-B95D-12669EEBA6C7@oracle.com> Hi Liang, I've also recently been looking at shrinking the heap after the Mixed collections. I totally agree that we should try to uncommit at this point, since the usage should be the lowest. I'm however not convinced that we should only uncommit once. My findings the last time, and what I'm seeing with your patch, are some very long pauses when doing the uncommit. To try to avoid those I started looking at doing the uncommit concurrently, but didn't find enough time to really dig into the details around that. Another thing to investigate would be the suggestion in: https://bugs.openjdk.java.net/browse/JDK-8210709 My main point is that we need to ensure that uncommitting memory doesn't come with a too high cost. Thanks, Stefan > 18 feb. 2020 kl. 08:27 skrev Liang Mao : > > > Hi Stefan, > > Thank you for your comments!
> > Based on previous discussion, the reasons are as below: > 1) For the expansion after cm, I think we have the agreement that > original MinHeapFreeRatio might be too large and predicting the necessary > size from adaptive IHOP for expansion sounds reasonable and specjbb2015 > have the good result. > 2) About when to shrink the heap, I think it's a better spot after > mixed collections. From my observation, the heap use is still at nearly > peak after remark for most of cases like Alibaba workloads and specjbb2015. > There could be some senario which contains a lot of humongous regions that > remark will cleanup considerable regions. But why don't we decide to shrink > the heap size when most of garbages have been cleaned after mixed GCs. We > don't need to shrink twice in an old gc cycle. The MaxHeapFreeRatio 70 to > keep heap capacity with 30% live objects make sence and is unified with full > gc logic. If we only shrink the heap at remark, the maximum desired capacity > could be 100/30 times of peak heap usage which is obviously not efficient. > > Thanks, > Liang > > > > > ------------------------------------------------------------------ > From:Stefan Johansson > Send Time:2020 Feb. 17 (Mon.) 18:10 > To:"MAO, Liang" > Cc:hotspot-gc-dev ; hotspot-gc-dev > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > > Hi Liang, > > I?ve started looking at this patch as well and I have a question regarding the change to not allow shrinking after concurrent mark? Before we could shrink the heap at Remark, but now we only check to expand the heap after the concurrent cycle, why is that? I get that we will be able to shrink even more after the mixed collections but if a lot of regions are freed by the concurrent cycle why not check if we can shrink here? > > Also good to hear you can run SPECjvm2008, we also avoid running any problematic benchmarks. > > Thanks, > Stefan > > > 17 feb. 2020 kl. 
10:56 skrev Liang Mao : > > > > Hi Thomas, > > > > I am able to run specjvm2008 by excluding the compiler subtests > > and reproduce the issue that the change commits more memory. > > The main cause is addressed that the tests have a lot of > > humongous objects which affect the evaluation of adaptive > > IHOP. _last_unrestrained_young_size and _last_allocated_bytes > > used in G1AdaptiveIHOPControl::predict_unstrained_buffer_size are > > very large. So the expansion after concurrent mark is rather > > aggressive. I made an enhancement to restrict this uncommon > > expansion with MinHeapFreeRatio: > > http://cr.openjdk.java.net/~luchsh/8236073.webrev.4/ > > > > Thanks, > > Liang > > > > > > > > > > > > > > ------------------------------------------------------------------ > > From:Thomas Schatzl > > Send Time:2020 Feb. 15 (Sat.) 03:51 > > To:hotspot-gc-dev > > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > > > > Hi, > > > > On 12.02.20 12:16, Thomas Schatzl wrote: > >> Hi Liang, > >> > >> On 12.02.20 11:17, Liang Mao wrote: > >>> Hi Thomas, > >>> > >>> I made a new patch for the issues we listed in JDK-8238686 and > >>> JDK-8236073: > >>> http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ > >> > >> thanks. I only had time to quickly browse the change, and started > >> building and testing it internally. I will run it through our perf > >> benchmarks to look for regressions of out-of-box behavior. > >> > >> I will need a day or two until I can get back to looking at the change > >> in detail. There is currently something else I need to look at. Sorry. > > > > initial results from testing: > > > > - gc/g1/TestPeriodicCollection.java fails consistently because the heap > > does not shrink as expected (but probably this is a test bug as it may > > expect that uncommit occurs at remark). > > > > - memory usage tends to be significantly higher with the change without > > improving scores. > > > > E.g. 
I have been running specjvm2008 out-of-box with no settings on > > different machine(s) (32gb ram min), and the build with the changes > > almost consistently uses more heap (i.e. committed size) than without, > > in the range of 10% without any performance increase. > > > > Specjvm2008 benchmarks are pretty simple application in terms of > > behavior, i.e. does the same things all the time. This also means that > > very likely the current sizing is already way beyond the point of > > diminishing returns (actually, this is a known issue :)); I would prefer > > if we did not add to that. ;) > > > > Unfortunately I lost the graphs I had generated (manually), and I do not > > have more time available right now so can't show you right now. > > > > I started some dacapo 2009 runs (running them for 30 iterations each). > > > > Did not have time to look at the changes themselves any further or > > investigate the reasons for this memory usage increase than I already > > did earlier; will continue on Tuesday as I'm taking the day off Monday. > > > > Thanks, > > Thomas > > > From thomas.schatzl at oracle.com Tue Feb 18 10:01:35 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 18 Feb 2020 11:01:35 +0100 Subject: RFR (S): 8238999: Remove MemRegion custom new/delete operator overloads In-Reply-To: References: <9a383ba5-68f1-7ed2-5ea9-97b236d2d9a1@oracle.com> <23D0C1DC-E59E-43D9-A54A-467F49385429@oracle.com> Message-ID: Hi Jiangli, Kim, Ioi, thanks for your review. On 15.02.20 00:14, Jiangli Zhou wrote: > On Fri, Feb 14, 2020 at 3:05 PM Kim Barrett wrote: >> >>> On Feb 14, 2020, at 10:05 AM, Thomas Schatzl wrote: >>> >>> Hi all, >>> >>> can I have reviews for this small change to the MemRegion class to remove unnecessary new/delete overloads from MemRegion. >>>[...] 
>> ------------------------------------------------------------------------------ >> src/hotspot/share/memory/memRegion.hpp >> 96 // Creates and initializes an array of MemRegions of the given length. >> 97 static MemRegion* create(uint length, MEMFLAGS flags); >> >> A function named "create" suggests to me creating a single object, not >> an array. Perhaps "make_array" or "create_array" or "new_array"? > > +1. I had the same thoughts when looking at the webrev.1. > > Best regards, > Jiangli I pushed with "create_array"; for reference, the webrevs: http://cr.openjdk.java.net/~tschatzl/8238999/webrev.1_to_2/ (diff( http://cr.openjdk.java.net/~tschatzl/8238999/webrev.2/ (full) Thanks, Thomas From thomas.schatzl at oracle.com Tue Feb 18 10:19:07 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 18 Feb 2020 11:19:07 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: <7107c9f6-ba8e-48a0-830c-5383e2c17ef3.maoliang.ml@alibaba-inc.com> <9fdac7ff-bbef-1451-c951-a40dd6f216af@oracle.com> <51a95aba-5d31-0b99-87a7-485987f49f54@oracle.com> Message-ID: <796daaad-c07b-3ad0-7fd5-d8b5bd77c7c3@oracle.com> Hi Liang, On 17.02.20 07:03, Liang Mao wrote: > Hi Thomas, > >> - gc/g1/TestPeriodicCollection.java fails consistently because the heap >> does not shrink as expected (but probably this is a test bug as it may >> expect that uncommit occurs at remark). > > The reason should be that the patch makes shrinking after mixed GC > but the mixed gc doesn't happen. It's the only issue I listed for > the change. > I agree, but we still need to fix the test in some way ;) >> - memory usage tends to be significantly higher with the change without >> improving scores. > >> E.g. I have been running specjvm2008 out-of-box with no settings on >> different machine(s) (32gb ram min), and the build with the changes >> almost consistently uses more heap (i.e. 
committed size) than without, >> in the range of 10% without any performance increase. > >> Specjvm2008 benchmarks are pretty simple application in terms of >> behavior, i.e. does the same things all the time. This also means that >> very likely the current sizing is already way beyond the point of >> diminishing returns (actually, this is a known issue :)); I would prefer >> if we did not add to that. ;) > > I have 2 questions here. > 1) specjvm2008 cannot run with jdk9+: > https://bugs.openjdk.java.net/browse/JDK-8202460 > I face the same problem. Do you have any way to perform the test > in JDK15? > Just for reference for others, they (except compiler.compiler, but they do not work since jdk8) can be made working with the following options --add-exports=java.xml/com.sun.org.apache.xerces.internal.parsers=ALL-UNNAMED --add-exports=java.xml/com.sun.org.apache.xerces.internal.util=ALL-UNNAMED as described in the specjvm2008 faq q4.8 [0] > 2) I didn't understand : "This also means that >> very likely the current sizing is already way beyond the point of >> diminishing returns (actually, this is a known issue :));" > Could you please explain more about this? Although I think you already got what I meant: current heap sizing without the patch already by default sizes the heap too large, i.e. the same scores could be achieved using less heap. The change now increases the heap even more (obviously without increasing the performance either). This seems undesirable. I saw in the other email that you proposed a fix for that already, I will look into this. Thanks, Thomas [0] https://www.spec.org/jvm2008/docs/FAQ.html#Q4.8 From maoliang.ml at alibaba-inc.com Tue Feb 18 11:03:31 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Tue, 18 Feb 2020 19:03:31 +0800 Subject: CMS-concurrent-sweep spent extremely long time with 8u Message-ID: Hi, I saw a very unusual scenario where concurrent sweep took an extremely long time.
After a long time concurrent mode failure, the GC was recovered. In the previous CMS cycle at 05:32, sweep worked fine that we could see old gen occupancy continued to decrease until sweep completed. After the problematic sweep started, the old gen occupancy dropped very slowly in the early stage and then increased to promotion failure. The young GC slowed 30x at that time as well. 2020-02-18T05:32:12.740+0800: 14277958.735: [CMS-concurrent-sweep-start] 2020-02-18T05:32:13.178+0800: 14277959.173: [GC (Allocation Failure) 2020-02-18T05:32:13.178+0800: 14277959.173: [ParNew 247829K->126882K(3145728K), 0.0259766 secs] 57679525K->55581737K(82837504K), 0.0266680 secs] [Times: user=0.31 sys=0.00, real=0.03 secs] 2020-02-18T05:32:13.764+0800: 14277959.759: [GC (Allocation Failure) 2020-02-18T05:32:13.764+0800: 14277959.759: [ParNew 2224034K->100365K(3145728K), 0.0138495 secs] 56901617K->54782226K(82837504K), 0.0145248 secs] [Times: user=0.30 sys=0.00, real=0.01 secs] 2020-02-18T05:32:14.357+0800: 14277960.353: [GC (Allocation Failure) 2020-02-18T05:32:14.358+0800: 14277960.353: [ParNew 2197517K->107850K(3145728K), 0.0129100 secs] 56173691K->54088043K(82837504K), 0.0135664 secs] [Times: user=0.28 sys=0.00, real=0.01 secs] ... 2020-02-18T05:32:24.655+0800: 14277970.650: [GC (Allocation Failure) 2020-02-18T05:32:24.655+0800: 14277970.650: [ParNew 2218555K->137181K(3145728K), 0.0127018 secs] 45316426K->43236019K(82837504K), 0.0133486 secs] [Times: user=0.26 sys=0.00, real=0.01 secs] ... 2020-02-18T05:32:31.642+0800: 14277977.637: [GC (Allocation Failure) 2020-02-18T05:32:31.642+0800: 14277977.637: [ParNew 2257551K->159124K(3145728K), 0.0135751 secs] 38026885K->35932714K(82837504K), 0.0142222 secs] [Times: user=0.28 sys=0.00, real=0.02 secs] ... 
2020-02-18T05:32:36.630+0800: 14277982.625: [GC (Allocation Failure) 2020-02-18T05:32:36.630+0800: 14277982.625: [ParNew 2232867K->134026K(3145728K), 0.0132394 secs] 32928275K->30837706K(82837504K), 0.0138968 secs] [Times: user=0.27 sys=0.00, real=0.02 secs] 2020-02-18T05:32:36.812+0800: 14277982.807: [CMS-concurrent-sweep: 23.250/24.072 secs] [Times: user=137.69 sys=0.00, real=24.07 secs] ... 2020-02-18T06:31:11.378+0800: 14281497.373: [CMS-concurrent-sweep-start] 2020-02-18T06:31:14.862+0800: 14281500.857: [GC (Allocation Failure) 2020-02-18T06:31:14.862+0800: 14281500.857: [ParNew 2925648K->871894K(3145728K), 0.3871002 secs] 58747945K->56735151K(82837504K), 0.3877653 secs] [Times: user=0.84 sys=0.00, real=0.39 secs] 2020-02-18T06:31:18.103+0800: 14281504.098: [GC (Allocation Failure) 2020-02-18T06:31:18.103+0800: 14281504.098: [ParNew 2969046K->828180K(3145728K), 0.4009765 secs] 58768864K->56670502K(82837504K), 0.4016504 secs] [Times: user=0.86 sys=0.00, real=0.40 secs] ... 2020-02-18T06:31:53.530+0800: 14281539.525: [GC (Allocation Failure) 2020-02-18T06:31:53.530+0800: 14281539.525: [ParNew 2884806K->790526K(3145728K), 0.4082761 secs] 58450942K->56399496K(82837504K), 0.4089053 secs] [Times: user=0.86 sys=0.00, real=0.41 secs] 2020-02-18T06:31:56.913+0800: 14281542.908: [GC (Allocation Failure) 2020-02-18T06:31:56.914+0800: 14281542.909: [ParNew 2887678K->830520K(3145728K), 0.3762305 secs] 58449269K->56431129K(82837504K), 0.3768546 secs] [Times: user=0.80 sys=0.00, real=0.37 secs] ... 2020-02-18T06:39:10.412+0800: 14281976.407: [GC (Allocation Failure) 2020-02-18T06:39:10.412+0800: 14281976.407: [ParNew 2912121K->831832K(3145728K), 0.3765129 secs] 55456571K->53415654K(82837504K), 0.3771153 secs] [Times: user=0.90 sys=0.00, real=0.38 secs] ... 
2020-02-18T06:47:16.462+0800: 14282462.457: [GC (Allocation Failure) 2020-02-18T06:47:16.463+0800: 14282462.458: [ParNew 2931678K->872899K(3145728K), 0.3223769 secs] 56740808K->54718072K(82837504K), 0.3229753 secs] [Times: user=0.76 sys=0.00, real=0.32 secs] ... 2020-02-18T06:55:36.434+0800: 14282962.429: [GC (Allocation Failure) 2020-02-18T06:55:36.434+0800: 14282962.429: [ParNew 2932041K->837434K(3145728K), 0.3941144 secs] 59917770K->57864311K(82837504K), 0.3947332 secs] [Times: user=0.86 sys=0.00, real=0.39 secs] ... 2020-02-18T07:02:02.007+0800: 14283348.002: [GC (Allocation Failure) 2020-02-18T07:02:02.008+0800: 14283348.003: [ParNew 2871413K->847657K(3145728K), 0.3542050 secs] 68308980K->66321780K(82837504K), 0.3549298 secs] [Times: user=0.79 sys=0.00, real=0.36 secs] ... 2020-02-18T07:05:35.166+0800: 14283561.161: [GC (Allocation Failure) 2020-02-18T07:05:35.166+0800: 14283561.161: [ParNew 2859132K->870624K(3145728K), 0.3499013 secs] 75159088K->73210284K(82837504K), 0.3505328 secs] [Times: user=0.79 sys=0.00, real=0.35 secs] 2020-02-18T07:05:36.418+0800: 14283562.413: [GC (Allocation Failure) 2020-02-18T07:05:36.419+0800: 14283562.414: [ParNew (0: promotion failure size = 3286) (1: promotion failure size = 3288) (2: promotion failure size = 3285) (3: promotion failure size = 3286) (4: promotion failure size = 3286) (5: promotion failure size = 3288) (6: promotion failure size = 3290) (7: promotion failure size = 3286) (8: promotion failure size = 3286) (9: promotion failure size = 3287) (10: promotion failure size = 3292) (11: promotion failure size = 3286) (12: promotion failure size = 3285) (13: promotion failure size = 3288) (14: promotion failure size = 3286) (15: promotion failure size = 3286) (16: promotion failure size = 3287) (17: promotion failure size = 3287) (18: promotion failure size = 3285) (19: promotion failure size = 3286) (20: promotion failure size = 3285) (21: promotion failure size = 3289) (22: promotion failure size = 3289) (23: 
promotion failure size = 3286) (promotion failed): 2967776K->3077198K(3145728K), 0.8107304 secs]2020-02-18T07:05:37.229+0800: 14283563.224: [CMS2020-02-18T07:06:57.395+0800: 14283643.390: [CMS-concurrent-sweep: 1752.431/2146.017 secs] [Times: user=8483.92 sys=0.00, real=2146.02 secs] (concurrent mode failure) 72370103K->31883050K(79691776K), 95.8957007 secs] 75293380K->31883050K(82837504K), [Metaspace: 59347K->59341K(61440K)], 96.7139359 secs] [Times: user=97.25 sys=0.00, real=96.72 secs] 2020-02-18T07:08:04.605+0800: 14283710.600: [GC (Allocation Failure) 2020-02-18T07:08:04.605+0800: 14283710.600: [ParNew 2097152K->361723K(3145728K), 0.0195181 secs] 33980202K->32244773K(82837504K), 0.0198080 secs] [Times: user=0.36 sys=0.00, real=0.02 secs] There're no any suspicious GC options: -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSScavengeBeforeRemark -XX:GCLogFileSize=209715200 -XX:InitialHeapSize=85899345920 -XX:MaxHeapSize=85899345920 -XX:MaxNewSize=4294967296 -XX:MinHeapDeltaBytes=196608 -XX:NewSize=4294967296 -XX:NumberOfGCLogFiles=5 -XX:OldPLABSize=16 -XX:OldSize=81604378624 -XX:-OmitStackTraceInFastThrow -XX:OnOutOfMemoryError=kill -9 %p -XX:ParGCCardsPerStrideChunk=4096 -XX:ParallelGCThreads=24 -XX:-ParallelRefProcEnabled -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCRootsTraceTime -XX:+PrintGCTimeStamps -XX:+PrintPromotionFailure -XX:+PrintReferenceGC -XX:SurvivorRatio=2 -XX:+UnlockDiagnosticVMOptions -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation -XX:+UseParNewGC The instance has been running for months. Does anybody know if there is a specific bug? Or it's just because of the fragment issue of CMS? 
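For reference, the sweep durations quoted above can be pulled out of such logs mechanically. The sketch below is only an illustration based on the -XX:+PrintGCDetails lines shown (the function name and the assumption that every completed sweep prints a "[CMS-concurrent-sweep: cpu/wall secs]" entry are mine, not from any HotSpot tool); it extracts the cpu/wall seconds of each completed concurrent sweep phase:

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Returns (cpu_seconds, wall_seconds) for each completed sweep phase,
// matching entries like "[CMS-concurrent-sweep: 23.250/24.072 secs]".
std::vector<std::pair<double, double>>
sweep_times(const std::vector<std::string>& log_lines) {
  static const std::regex re(
      R"(\[CMS-concurrent-sweep: ([0-9.]+)/([0-9.]+) secs\])");
  std::vector<std::pair<double, double>> out;
  for (const std::string& line : log_lines) {
    std::smatch m;
    if (std::regex_search(line, m, re)) {
      out.emplace_back(std::stod(m[1].str()), std::stod(m[2].str()));
    }
  }
  return out;
}
```

Run over the rotated log files, this makes the regression above easy to spot: a ~24 s wall-clock sweep in the healthy cycle versus ~2146 s in the problematic one.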
Thanks, Liang From maoliang.ml at alibaba-inc.com Tue Feb 18 12:48:56 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Tue, 18 Feb 2020 20:48:56 +0800 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Message-ID: Hi Stefan, I don't think we need an earlier shrink if we are trying to do it just a bit later after mixed GCs. For the concurrent uncommit, I already had a patch http://cr.openjdk.java.net/~luchsh/8236073.webrev/ But need spend sometime to refine it according to Thomas' comments. Thanks, Liang ------------------------------------------------------------------ From:Stefan Johansson Send Time:2020 Feb. 18 (Tue.) 17:17 To:"MAO, Liang" Cc:hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi Liang, I've also recently been looking at shrinking the heap after the Mixed collections. I totally agree that we should try to uncommit at this point, since the usage should be the lowest. I'm however not convinced that we should only uncommit once. My findings the last time, and what I'm seeing with your patch is some very long pauses when doing the uncommit. To try to avoid those I started looking at doing the uncommit concurrently, but didn't find enough time to really dig into the details around that. An other thing to investigate would be the suggestion in: https://bugs.openjdk.java.net/browse/JDK-8210709 My main point is that we need to ensure that uncommitting memory don't come with a to high cost. Thanks, Stefan > 18 feb. 2020 kl. 08:27 skrev Liang Mao : > > > Hi Stefan, > > Thank you for your comments! > > Based on previous discussion, the reasons are as below: > 1) For the expansion after cm, I think we have the agreement that > original MinHeapFreeRatio might be too large and predicting the necessary > size from adaptive IHOP for expansion sounds reasonable and specjbb2015 > have the good result.
> 2) About when to shrink the heap, I think it's a better spot after > mixed collections. From my observation, the heap use is still at nearly > peak after remark for most of cases like Alibaba workloads and specjbb2015. > There could be some senario which contains a lot of humongous regions that > remark will cleanup considerable regions. But why don't we decide to shrink > the heap size when most of garbages have been cleaned after mixed GCs. We > don't need to shrink twice in an old gc cycle. The MaxHeapFreeRatio 70 to > keep heap capacity with 30% live objects make sence and is unified with full > gc logic. If we only shrink the heap at remark, the maximum desired capacity > could be 100/30 times of peak heap usage which is obviously not efficient. > > Thanks, > Liang > > > > > ------------------------------------------------------------------ > From:Stefan Johansson > Send Time:2020 Feb. 17 (Mon.) 18:10 > To:"MAO, Liang" > Cc:hotspot-gc-dev ; hotspot-gc-dev > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > > Hi Liang, > > I?ve started looking at this patch as well and I have a question regarding the change to not allow shrinking after concurrent mark? Before we could shrink the heap at Remark, but now we only check to expand the heap after the concurrent cycle, why is that? I get that we will be able to shrink even more after the mixed collections but if a lot of regions are freed by the concurrent cycle why not check if we can shrink here? > > Also good to hear you can run SPECjvm2008, we also avoid running any problematic benchmarks. > > Thanks, > Stefan > > > 17 feb. 2020 kl. 10:56 skrev Liang Mao : > > > > Hi Thomas, > > > > I am able to run specjvm2008 by excluding the compiler subtests > > and reproduce the issue that the change commits more memory. > > The main cause is addressed that the tests have a lot of > > humongous objects which affect the evaluation of adaptive > > IHOP. 
_last_unrestrained_young_size and _last_allocated_bytes > > used in G1AdaptiveIHOPControl::predict_unstrained_buffer_size are > > very large. So the expansion after concurrent mark is rather > > aggressive. I made an enhancement to restrict this uncommon > > expansion with MinHeapFreeRatio: > > http://cr.openjdk.java.net/~luchsh/8236073.webrev.4/ > > > > Thanks, > > Liang > > > > > > > > > > > > > > ------------------------------------------------------------------ > > From:Thomas Schatzl > > Send Time:2020 Feb. 15 (Sat.) 03:51 > > To:hotspot-gc-dev > > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > > > > Hi, > > > > On 12.02.20 12:16, Thomas Schatzl wrote: > >> Hi Liang, > >> > >> On 12.02.20 11:17, Liang Mao wrote: > >>> Hi Thomas, > >>> > >>> I made a new patch for the issues we listed in JDK-8238686 and > >>> JDK-8236073: > >>> http://cr.openjdk.java.net/~luchsh/8236073.webrev.3/ > >> > >> thanks. I only had time to quickly browse the change, and started > >> building and testing it internally. I will run it through our perf > >> benchmarks to look for regressions of out-of-box behavior. > >> > >> I will need a day or two until I can get back to looking at the change > >> in detail. There is currently something else I need to look at. Sorry. > > > > initial results from testing: > > > > - gc/g1/TestPeriodicCollection.java fails consistently because the heap > > does not shrink as expected (but probably this is a test bug as it may > > expect that uncommit occurs at remark). > > > > - memory usage tends to be significantly higher with the change without > > improving scores. > > > > E.g. I have been running specjvm2008 out-of-box with no settings on > > different machine(s) (32gb ram min), and the build with the changes > > almost consistently uses more heap (i.e. committed size) than without, > > in the range of 10% without any performance increase. 
> > > > Specjvm2008 benchmarks are pretty simple application in terms of > > behavior, i.e. does the same things all the time. This also means that > > very likely the current sizing is already way beyond the point of > > diminishing returns (actually, this is a known issue :)); I would prefer > > if we did not add to that. ;) > > > > Unfortunately I lost the graphs I had generated (manually), and I do not > > have more time available right now so can't show you right now. > > > > I started some dacapo 2009 runs (running them for 30 iterations each). > > > > Did not have time to look at the changes themselves any further or > > investigate the reasons for this memory usage increase than I already > > did earlier; will continue on Tuesday as I'm taking the day off Monday. > > > > Thanks, > > Thomas > > > From zgu at redhat.com Tue Feb 18 12:54:32 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 18 Feb 2020 07:54:32 -0500 Subject: RFR: JDK-8237780: Shenandoah: More reliable nmethod verification In-Reply-To: <5f4bc613-bb6c-7262-934f-5ddac38d3b24@redhat.com> References: <751cd87f-38e3-cd2f-43fc-f2ef95b41a50@redhat.com> <958d35c7-1dfd-39aa-6139-80c794af5791@redhat.com> <7850613d-3a9f-cece-9a1a-46b4e6823c7f@redhat.com> <5f4bc613-bb6c-7262-934f-5ddac38d3b24@redhat.com> Message-ID: <0da39bd5-8fde-b010-0c19-8973489b6abb@redhat.com> On 2/17/20 10:27 AM, Roman Kennke wrote: >>>>> http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.01/ >>>> >>>> This is fine. >>>> >>>> Although I would probably be open for storing that diagnostics into >>>> stringStream (see how >>>> ShenandoahAsserts::print_failure does it), and putting it into the >>>> fatal message itself. Pros: >>>> customers would hand over hs_errs to us with the relevant >>>> diagnostics. Cons: we can overflow the >>>> stringStream and truncate parts of the data. Your call. >>> >>> >>> Ok, let's do that then: >>> >>> http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.02/ >>> >>> Good? 
>> >> assert_same_oops() itself is assert only (has NOT_DEBUG_RETURN in >> definition), does not need nested ifdef ASSERT ... > > > Right! Very good catch! > > http://cr.openjdk.java.net/~rkennke/JDK-8237780/webrev.03/ > > Good now? > Yes, good to me. Thanks, -Zhengyu > Thanks for reviewing! > > Roman > From thomas.schatzl at oracle.com Tue Feb 18 14:17:21 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 18 Feb 2020 15:17:21 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: Message-ID: <7e7c946e-b35b-54e1-8fc2-53fcf0df5b48@oracle.com> Hi Liang, Stefan, let me summarize the current point of discussion a bit, because I believe there are some subtle misunderstandings. On 18.02.20 13:48, Liang Mao wrote: > Hi Stefan, > > I don't think we need an earlier shrink if we are trying to do it just a > bit later after mixed GCs. For the concurrent uncommit, I already had > a patch http://cr.openjdk.java.net/~luchsh/8236073.webrev/ That's fine, and Stefan agrees too. Let's keep these two separate. These changes can even be pushed in a single push if necessary, but I do not think so. Thanks a lot for your really quick responses, we really appreciate your effort. > But need spend some time to refine it according to Thomas' comments. Please give us a day to look at the current change (.4) in more detail and allow us to respond in a more coherent fashion too :) We also would like to do some short tests which take some time to suggest (hopefully) the best opportunities where/what to improve. > ------------------------------------------------------------------ > From:Stefan Johansson > Send Time:2020 Feb. 18 (Tue.) 
17:17 > To:"MAO, Liang" > Cc:hotspot-gc-dev > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > > Hi Liang, > > I've also recently been looking at shrinking the heap after the Mixed collections. I totally agree that we should try to uncommit at this point, since the usage should be the lowest. I'm however not convinced that we should only uncommit once. My findings the last time, and what I'm seeing with your patch is some very long pauses when doing the uncommit. To try to avoid those I started looking at doing the uncommit concurrently, but didn't find enough time to really dig into the details around that. An other thing to investigate would be the suggestion in: > https://bugs.openjdk.java.net/browse/JDK-8210709 > > My main point is that we need to ensure that uncommitting memory don't come with a to high cost. > > Thanks, > Stefan > > > > 18 feb. 2020 kl. 08:27 skrev Liang Mao : > > > > > > Hi Stefan, > > > > Thank you for your comments! > > > > Based on previous discussion, the reasons are as below: > > 1) For the expansion after cm, I think we have the agreement that > > original MinHeapFreeRatio might be too large and predicting the necessary > > size from adaptive IHOP for expansion sounds reasonable and specjbb2015 > > have the good result. (Being aware that I am ignoring above comment about premature comments two seconds later, but no new comments here, only an attempt on clarification :( ) I think Stefan wanted to ask why the heuristic _expands_ at Cleanup at all. There does not seem to be need to do that at that time given that at the end of mixed gc we resize the heap "optimally" anyway. Expansion at Cleanup (or Remark) seems to be not desired, so not doing anything might be the best option here. At worst G1 will intermittently expand automatically at one of the GCs between Cleanup pause and last mixed gc. There may be issues with that idea.
> > 2) About when to shrink the heap, I think it's a better spot after > > mixed collections. From my observation, the heap use is still at nearly > > peak after remark for most of cases like Alibaba workloads and specjbb2015. > > There could be some senario which contains a lot of humongous regions that > > remark will cleanup considerable regions. But why don't we decide to shrink > > the heap size when most of garbages have been cleaned after mixed GCs. We > > don't need to shrink twice in an old gc cycle. The MaxHeapFreeRatio 70 to > > keep heap capacity with 30% live objects make sence and is unified with full > > gc logic. If we only shrink the heap at remark, the maximum desired capacity > > could be 100/30 times of peak heap usage which is obviously not efficient. > > I think Stefan suggests to shrink both at Remark (but do not expand then), particularly if we happen to really have a lot of free data now. Then refine that result at the last mixed gc. I.e. not let the user wait that long. While that has disadvantages like you mentioned about maybe doing the uncommit twice in a single cycle, the increased responsiveness of the application due to memory demands elsewhere might be more important. How much to shrink? My opinion would be to only shrink if there is a huge amount of free memory at this point, i.e. keep that rule simple and do the more exact heuristics later (like only considering MaxHeapFreeRatio) My opinion is to free unused memory asap, so I have a slight preference towards also uncommitting during Remark. However that can be added later too.
Thanks, Thomas From thomas.schatzl at oracle.com Tue Feb 18 16:03:41 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 18 Feb 2020 17:03:41 +0100 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions In-Reply-To: References: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> <605ad3e6-2d09-70ac-1aad-705b2b26ee6a@oracle.com> Message-ID: <6bf87a68-bc48-42f5-5325-e11d6e1a9195@oracle.com> Hi, On 15.02.20 09:20, Kim Barrett wrote: >> On Feb 14, 2020, at 11:08 AM, Ioi Lam wrote: >> >> Hi Thomas, >> >> Thanks for fixing this issue. Freeing the array at each exit point seems error prone. How about: refactoring the function to a FileMapInfo::map_heap_data_impl function, allocate inside FileMapInfo::map_heap_data(), call map_heap_data() and if it returns false, free the array in a single place. > > Rather than splitting up the function, one could add a local cleanup handler: > > ... create and initialize regions object ... > struct Cleanup { > MemRegion* _regions; > bool _aborted; > Cleanup(MemRegion* regions) : _regions(regions), _aborted(true) {} > ~Cleanup() { if (_aborted) FREE_C_HEAP_ARRAY(MemRegion, _regions); } > } cleanup(regions); > ... > cleanup._aborted = false; > return true; > } > I implemented that as it is least intrusive in the end. http://cr.openjdk.java.net/~tschatzl/8239070/webrev.1/ (full, no point in providing diff) Thanks, Thomas From zgu at redhat.com Tue Feb 18 16:52:48 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 18 Feb 2020 11:52:48 -0500 Subject: [15] RFR 8239354: Shenandoah: minor enhancements to traversal GC Message-ID: 1) Added assertion to catch evacuation after completion of heap traversal. This should help catch the bug demonstrated in sh-jdk11 w/o JDK-8237396. 2) Retire TLAB/GCLAB after completion of heap traversal. Current code retires TLAB/GCLAB at the beginning final traversal, but STW traversal still uses GCLAB to evacuate remaining objects. 
3) Added comments regarding why need to retire TLAB/GCLAB, even we don't need heap to be parsable. Bug: https://bugs.openjdk.java.net/browse/JDK-8239354 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239354/webrev.00/index.html Test: hotspot_gc_shenandoah Thanks, -Zhengyu From ioi.lam at oracle.com Tue Feb 18 18:00:28 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 18 Feb 2020 10:00:28 -0800 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions In-Reply-To: <6bf87a68-bc48-42f5-5325-e11d6e1a9195@oracle.com> References: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> <605ad3e6-2d09-70ac-1aad-705b2b26ee6a@oracle.com> <6bf87a68-bc48-42f5-5325-e11d6e1a9195@oracle.com> Message-ID: <6ea4d12b-9f45-6d43-3635-86b6181a70be@oracle.com> The changes look OK to me. I think this line doesn't need to be changed. 1820?? *heap_mem = cleanup._regions; Thanks - Ioi On 2/18/20 8:03 AM, Thomas Schatzl wrote: > Hi, > > On 15.02.20 09:20, Kim Barrett wrote: >>> On Feb 14, 2020, at 11:08 AM, Ioi Lam wrote: >>> >>> Hi Thomas, >>> >>> Thanks for fixing this issue. Freeing the array at each exit point >>> seems error prone. How about: refactoring the function to a >>> FileMapInfo::map_heap_data_impl function, allocate inside >>> FileMapInfo::map_heap_data(), call map_heap_data() and if it returns >>> false, free the array in a single place. >> >> Rather than splitting up the function, one could add a local cleanup >> handler: >> >> ?? ... create and initialize regions object ... >> ?? struct Cleanup { >> ???? MemRegion* _regions; >> ???? bool _aborted; >> ???? Cleanup(MemRegion* regions) : _regions(regions), _aborted(true) {} >> ???? ~Cleanup() { if (_aborted) FREE_C_HEAP_ARRAY(MemRegion, >> _regions); } >> ?? } cleanup(regions); >> ?? ... >> ?? cleanup._aborted = false; >> ?? return true; >> } >> > > ? I implemented that as it is least intrusive in the end. 
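For reference, the local cleanup handler suggested by Kim and quoted above can be fleshed out into a self-contained sketch. MemRegion, the allocation calls, and map_heap_data below are simplified stand-ins for the HotSpot types (plain calloc/free instead of NEW/FREE_C_HEAP_ARRAY), so this only illustrates the RAII idiom, not the actual webrev code:

```cpp
#include <cstddef>
#include <cstdlib>

// Stand-in for the real MemRegion, just enough for the guard pattern.
struct MemRegion { void* _start; size_t _word_size; };

// Frees the regions array in its destructor unless the success path
// cleared _aborted first, so every early "return false" is covered.
struct Cleanup {
  MemRegion* _regions;
  bool _aborted;
  explicit Cleanup(MemRegion* regions) : _regions(regions), _aborted(true) {}
  ~Cleanup() { if (_aborted) free(_regions); }  // stand-in for FREE_C_HEAP_ARRAY
};

bool map_heap_data(size_t num_regions, bool simulate_map_failure,
                   MemRegion** heap_mem) {
  MemRegion* regions =
      static_cast<MemRegion*>(calloc(num_regions, sizeof(MemRegion)));
  Cleanup cleanup(regions);
  if (simulate_map_failure) {
    return false;            // destructor frees the array on this early exit
  }
  cleanup._aborted = false;  // success: ownership passes to the caller
  *heap_mem = regions;
  return true;
}
```

The appeal, as noted in the thread, is that the guard is the least intrusive option: none of the existing early exits need to be touched.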
> > http://cr.openjdk.java.net/~tschatzl/8239070/webrev.1/ (full, no point > in providing diff) > > Thanks, > ? Thomas From kim.barrett at oracle.com Tue Feb 18 20:30:28 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 18 Feb 2020 15:30:28 -0500 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions In-Reply-To: <6ea4d12b-9f45-6d43-3635-86b6181a70be@oracle.com> References: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> <605ad3e6-2d09-70ac-1aad-705b2b26ee6a@oracle.com> <6bf87a68-bc48-42f5-5325-e11d6e1a9195@oracle.com> <6ea4d12b-9f45-6d43-3635-86b6181a70be@oracle.com> Message-ID: > On Feb 18, 2020, at 1:00 PM, Ioi Lam wrote: > > The changes look OK to me. > > I think this line doesn't need to be changed. > > 1820 *heap_mem = cleanup._regions; +1 I don?t need a new webrev for revert of line 1820. > > Thanks > - Ioi > > On 2/18/20 8:03 AM, Thomas Schatzl wrote: >> Hi, >> >> On 15.02.20 09:20, Kim Barrett wrote: >>>> On Feb 14, 2020, at 11:08 AM, Ioi Lam wrote: >>>> >>>> Hi Thomas, >>>> >>>> Thanks for fixing this issue. Freeing the array at each exit point seems error prone. How about: refactoring the function to a FileMapInfo::map_heap_data_impl function, allocate inside FileMapInfo::map_heap_data(), call map_heap_data() and if it returns false, free the array in a single place. >>> >>> Rather than splitting up the function, one could add a local cleanup handler: >>> >>> ... create and initialize regions object ... >>> struct Cleanup { >>> MemRegion* _regions; >>> bool _aborted; >>> Cleanup(MemRegion* regions) : _regions(regions), _aborted(true) {} >>> ~Cleanup() { if (_aborted) FREE_C_HEAP_ARRAY(MemRegion, _regions); } >>> } cleanup(regions); >>> ... >>> cleanup._aborted = false; >>> return true; >>> } >>> >> >> I implemented that as it is least intrusive in the end. 
>> >> http://cr.openjdk.java.net/~tschatzl/8239070/webrev.1/ (full, no point in providing diff) >> >> Thanks, >> Thomas From thomas.schatzl at oracle.com Tue Feb 18 20:52:08 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 18 Feb 2020 21:52:08 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: Message-ID: <8ea20904-ca6d-f0cb-9cf7-a02dc717ebc7@oracle.com> Hi Liang, dug through the changes a bit, took longer and only managed to do cursory testing as there were a few issues. That (very) cursory testing showed that memory consumption on one specjvm2008 out-of-box application is as baselined, but currently running the full set. The change I used is available at http://cr.openjdk.java.net/~tschatzl/8236073/webrev.2/, I will step through what changed below. - not really a bug and pre-existing, but I changed the various resize_heap_* to always include the exact GC pause because particularly for the "after_concurrent_mark" suffix it is not clear what this means. I.e. in the Remark or Cleanup pauses, or at the real end of concurrent cycle (still concurrent)? This has not been done consistently yet. - I think there has been a copy&paste error in G1CollectedHeap::resize_heap_if_necessary, the two calculations to determine the min and max desired capacity were equal. I.e. 1178 size_t minimum_desired_capacity = _heap_sizing_policy->target_heap_capacity(used_after_gc, MinHeapFreeRatio); 1179 size_t maximum_desired_capacity = _heap_sizing_policy->target_heap_capacity(used_after_gc, MinHeapFreeRatio); Note the duplicate use of MinHeapFreeRatio. Fixed in above webrev. - CollectorState contains flags that basically indicate the type of GC, which should be set at the start of gc and updated at the end of gc. The new finish_of_mixed_gc does not fit here as it is basically a flag indicating that we need to do the resizing.
The previous implementation also lets the first young-only gc after the last mixed gc do the resizing which is probably not as intended. By adding an additional policy()->next_gc_should_be_mixed() call instead of the state check (and removing this pause state/type completely) fixes this (I think ;)). - the suggested change removes the expansion during Cleanup for the reasons stated earlier. This removes the need for some code in the G1HeapSizingPolicy where originally _minimum_desired_bytes_after_last_cm had been stored. It's better to move this to G1Policy (and pre-existing, G1Policy should be the owner of G1HeapSizingPolicy which I did not fix in this change) - (the suggested change does not add the shrinking at remark discussed earlier; I still think it would be nice and maybe fix that failing regression test) - there should be more gc+heap+ergo logging of calculated targets/desired sizes in the new methods in G1HeapSizingPolicy, otherwise the decisions are very hard to follow after the fact. - I believe there is an underestimation of the desired bytes after concurrent mark with adaptive IHOP enabled in the current code. If you look at the method G1Policy::desired_bytes_after_concurrent_mark(), the two terms returned by that method do not seem equal. I.e. G1AdaptiveIHOP::predict_unrestrained_buffer_size() does not contain the used bytes, the reserve and other parts used for the static IHOP (i.e. minimum_desired_buffer_size == 0). At most, G1AdaptiveIHOP::predict_unrestrained_buffer_size() covers the young gen part of the latter. Some better name for this should be found too =) As mentioned, currently running more tests until tomorrow (even with above known issues) to get some experience/data to look at with the sizing at mixed gc heuristic. 
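To illustrate the copy&paste error called out above: with the duplicate MinHeapFreeRatio fixed, the lower bound uses MinHeapFreeRatio and the upper bound MaxHeapFreeRatio. The sketch below uses a simplified stand-in for target_heap_capacity() (the formula and names here are my assumptions for illustration, not the actual webrev code):

```cpp
#include <cstddef>

// Stand-in for G1HeapSizingPolicy::target_heap_capacity(): the capacity
// at which free_percent of the heap is free given the current live set.
static size_t target_capacity(size_t used_bytes, unsigned free_percent) {
  return used_bytes * 100 / (100 - free_percent);
}

struct DesiredCapacity { size_t min_bytes; size_t max_bytes; };

// With the bug fixed, the two bounds use the two different ratios; the
// broken version passed min_heap_free_ratio for both.
DesiredCapacity desired_capacity(size_t used_after_gc,
                                 unsigned min_heap_free_ratio,
                                 unsigned max_heap_free_ratio) {
  return DesiredCapacity{
    target_capacity(used_after_gc, min_heap_free_ratio),
    target_capacity(used_after_gc, max_heap_free_ratio)
  };
}
```

This matches the rule of thumb mentioned earlier in the thread: MaxHeapFreeRatio=70 keeps a capacity in which 30% are live objects, i.e. 30 units of live data translate into an upper bound of 100 units of capacity.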
Thanks, Thomas From thomas.schatzl at oracle.com Tue Feb 18 20:54:12 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 18 Feb 2020 21:54:12 +0100 Subject: RFR (S): 8239070: Memory leak when unsuccessfully mapping in archive regions In-Reply-To: <6ea4d12b-9f45-6d43-3635-86b6181a70be@oracle.com> References: <8f935fd7-3cb3-d5ae-5352-804e51e410b5@oracle.com> <605ad3e6-2d09-70ac-1aad-705b2b26ee6a@oracle.com> <6bf87a68-bc48-42f5-5325-e11d6e1a9195@oracle.com> <6ea4d12b-9f45-6d43-3635-86b6181a70be@oracle.com> Message-ID: <62acfed8-5975-078f-7f26-f669ceabec50@oracle.com> Hi Ioi, On 18.02.20 19:00, Ioi Lam wrote: > The changes look OK to me. > > I think this line doesn't need to be changed. > > 1820?? *heap_mem = cleanup._regions; fixed and regenerated latest webrev (.1) http://cr.openjdk.java.net/~tschatzl/8239070/webrev.1/ Thanks, Thomas From serguei.spitsyn at oracle.com Tue Feb 18 20:59:10 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 18 Feb 2020 12:59:10 -0800 Subject: RFR: add parallel heap inspection support for jmap histo(G1)(Internet mail) In-Reply-To: References: <11bca96c0e7745f5b2558cc49b42b996@tencent.com> Message-ID: Hi Lin, Could you, please, re-post your RFR with the right enhancement number in the message subject? It will be more trackable this way. Thanks, Serguei On 2/17/20 10:29 PM, linzang(??) wrote: > Dear David, > ? ? ? Thanks a lot! > ? ? ? I have updated the refined code to?http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/. > ? ? ? IMHO the parallel heap inspection can be extended to all kinds of heap as long as the heap layout can support parallel iteration. > ? ? ? Maybe we can firstly use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with collectedHeap,?then we can extend the solution to other kinds of heap. 
> > Thanks, > -------------- > Lin >> Hi Lin, >> >> Adding in hotspot-gc-dev as they need to see how this interacts with GC >> worker threads, and whether it needs to be extended beyond G1. >> >> I happened to spot one nit when browsing: >> >> src/hotspot/share/gc/shared/collectedHeap.hpp >> >> +?? virtual bool run_par_heap_inspect_task(KlassInfoTable* cit, >> +????????????????????????????????????????? BoolObjectClosure* filter, >> +????????????????????????????????????????? size_t* missed_count, >> +????????????????????????????????????????? size_t thread_num) { >> +???? return NULL; >> >> s/NULL/false/ >> >> Cheers, >> David >> >> On 18/02/2020 2:15 pm, linzang(??) wrote: >>> Dear All, >>> ? ? ? ?May I ask your help to review the follow changes: >>> ? ? ? ?webrev: >>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/ >>> ? ? ?bug: https://bugs.openjdk.java.net/browse/JDK-8215624 >>> ? ? ?related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 >>> ? ? ? ?This patch enable parallel heap inspection of G1 for jmap histo. >>> ? ? ? ?my simple test shown it can speed up 2x of jmap -histo with >>> parallelThreadNum set to 2 for heap at ~500M on 4-core platform. >>> >>> ------------------------------------------------------------------------ >>> BRs, >>> Lin > > From linzang at tencent.com Wed Feb 19 01:34:40 2020 From: linzang at tencent.com (=?utf-8?B?bGluemFuZyjoh6fnkLMp?=) Date: Wed, 19 Feb 2020 01:34:40 +0000 Subject: RFR: JDK-8215264 add parallel heap inspection support for jmap histo(G1)(Internet mail) References: <11bca96c0e7745f5b2558cc49b42b996@tencent.com>, , , Message-ID: <7e215dc97a584554b3e854d8801dc256@tencent.com> Re-post this RFR with enhancement number to make it trackable. webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/ bug: https://bugs.openjdk.java.net/browse/JDK-8215624 CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 ? Thanks! 
-------------- Lin >Hi Lin, > >Could you, please, re-post your RFR with the right enhancement number in >the message subject? >It will be more trackable this way. > >Thanks, >Serguei > > >On 2/17/20 10:29 PM, linzang(??) wrote: >> Dear David, >>? ? ? ? Thanks a lot! >> ? ? ? I have updated the refined code to?http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/. >>? ? ? ? IMHO the parallel heap inspection can be extended to all kinds of heap as long as the heap layout can support parallel iteration. >>? ? ? ? Maybe we can firstly use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with collectedHeap,?then we can extend the solution to other kinds of heap. >>???? >> Thanks, >> -------------- >> Lin >>> Hi Lin, >>> >>> Adding in hotspot-gc-dev as they need to see how this interacts with GC >>> worker threads, and whether it needs to be extended beyond G1. >>> >>> I happened to spot one nit when browsing: >>> >>> src/hotspot/share/gc/shared/collectedHeap.hpp >>> >>> +?? virtual bool run_par_heap_inspect_task(KlassInfoTable* cit, >>> +????????????????????????????????????????? BoolObjectClosure* filter, >>> +????????????????????????????????????????? size_t* missed_count, >>> +????????????????????????????????????????? size_t thread_num) { >>> +???? return NULL; >>> >>> s/NULL/false/ >>> >>> Cheers, >>> David >>> >>> On 18/02/2020 2:15 pm, linzang(??) wrote: >>>> Dear All, >>>>? ? ? ? ?May I ask your help to review the follow changes: >>>>? ? ? ? ?webrev: >>>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/ >>>> ? ? ?bug: https://bugs.openjdk.java.net/browse/JDK-8215624 >>>> ? ? ?related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 >>>>? ? ? ? ?This patch enable parallel heap inspection of G1 for jmap histo. >>>>? ? ? ? ?my simple test shown it can speed up 2x of jmap -histo with >>>> parallelThreadNum set to 2 for heap at ~500M on 4-core platform. 
>>>> >>>> ------------------------------------------------------------------------ >>>> BRs, >>>> Lin >> > > From linzang at tencent.com Wed Feb 19 01:38:31 2020 From: linzang at tencent.com (=?utf-8?B?bGluemFuZyjoh6fnkLMp?=) Date: Wed, 19 Feb 2020 01:38:31 +0000 Subject: RFR: JDK-8215264 add parallel heap inspection support for jmap histo(G1)(Internet mail) References: <11bca96c0e7745f5b2558cc49b42b996@tencent.com>, , , , <7e215dc97a584554b3e854d8801dc256@tencent.com> Message-ID: So sorry the number in this title is wrong. please ignore it ! so sorry about making this mistake.? will re post with correct number.? -------------- Lin >Re-post this RFR with enhancement number to make it trackable. >webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/ >bug: https://bugs.openjdk.java.net/browse/JDK-8215624 >CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 >? >Thanks! >-------------- >Lin >>Hi Lin, >> >>Could you, please, re-post your RFR with the right enhancement number in >>the message subject? >>It will be more trackable this way. >> >>Thanks, >>Serguei >> >> >>On 2/17/20 10:29 PM, linzang(??) wrote: >>> Dear David, >>>? ? ? ? Thanks a lot! >>> ? ? ? I have updated the refined code to?http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/. >>>? ? ? ? IMHO the parallel heap inspection can be extended to all kinds of heap as long as the heap layout can support parallel iteration. >>>? ? ? ? Maybe we can firstly use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with collectedHeap,?then we can extend the solution to other kinds of heap. >>>???? >>> Thanks, >>> -------------- >>> Lin >>>> Hi Lin, >>>> >>>> Adding in hotspot-gc-dev as they need to see how this interacts with GC >>>> worker threads, and whether it needs to be extended beyond G1. 
>>>> >>>> I happened to spot one nit when browsing: >>>> >>>> src/hotspot/share/gc/shared/collectedHeap.hpp >>>> >>>> +?? virtual bool run_par_heap_inspect_task(KlassInfoTable* cit, >>>> +????????????????????????????????????????? BoolObjectClosure* filter, >>>> +????????????????????????????????????????? size_t* missed_count, >>>> +????????????????????????????????????????? size_t thread_num) { >>>> +???? return NULL; >>>> >>>> s/NULL/false/ >>>> >>>> Cheers, >>>> David >>>> >>>> On 18/02/2020 2:15 pm, linzang(??) wrote: >>>>> Dear All, >>>>>? ? ? ? ?May I ask your help to review the follow changes: >>>>>? ? ? ? ?webrev: >>>>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/ >>>>> ? ? ?bug: https://bugs.openjdk.java.net/browse/JDK-8215624 >>>>> ? ? ?related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 >>>>>? ? ? ? ?This patch enable parallel heap inspection of G1 for jmap histo. >>>>>? ? ? ? ?my simple test shown it can speed up 2x of jmap -histo with >>>>> parallelThreadNum set to 2 for heap at ~500M on 4-core platform. >>>>> >>>>> ------------------------------------------------------------------------ >>>>> BRs, >>>>> Lin >>> > >> From linzang at tencent.com Wed Feb 19 01:40:34 2020 From: linzang at tencent.com (=?utf-8?B?bGluemFuZyjoh6fnkLMp?=) Date: Wed, 19 Feb 2020 01:40:34 +0000 Subject: RFR: JDK-8215624 add parallel heap inspection support for jmap histo(G1)(Internet mail) References: <11bca96c0e7745f5b2558cc49b42b996@tencent.com>, , , Message-ID: Re-post this RFR with correct enhancement number to make it trackable. please ignore the previous wrong post. sorry for troubles.? webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/ bug: https://bugs.openjdk.java.net/browse/JDK-8215624 CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 -------------- Lin >Hi Lin, > >Could you, please, re-post your RFR with the right enhancement number in >the message subject? >It will be more trackable this way. 
> >Thanks, >Serguei > > >On 2/17/20 10:29 PM, linzang(??) wrote: >> Dear David, >>? ? ? ? Thanks a lot! >> ? ? ? I have updated the refined code to?http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/. >>? ? ? ? IMHO the parallel heap inspection can be extended to all kinds of heap as long as the heap layout can support parallel iteration. >>? ? ? ? Maybe we can firstly use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with collectedHeap,?then we can extend the solution to other kinds of heap. >>???? >> Thanks, >> -------------- >> Lin >>> Hi Lin, >>> >>> Adding in hotspot-gc-dev as they need to see how this interacts with GC >>> worker threads, and whether it needs to be extended beyond G1. >>> >>> I happened to spot one nit when browsing: >>> >>> src/hotspot/share/gc/shared/collectedHeap.hpp >>> >>> +?? virtual bool run_par_heap_inspect_task(KlassInfoTable* cit, >>> +????????????????????????????????????????? BoolObjectClosure* filter, >>> +????????????????????????????????????????? size_t* missed_count, >>> +????????????????????????????????????????? size_t thread_num) { >>> +???? return NULL; >>> >>> s/NULL/false/ >>> >>> Cheers, >>> David >>> >>> On 18/02/2020 2:15 pm, linzang(??) wrote: >>>> Dear All, >>>>? ? ? ? ?May I ask your help to review the follow changes: >>>>? ? ? ? ?webrev: >>>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/ >>>> ? ? ?bug: https://bugs.openjdk.java.net/browse/JDK-8215624 >>>> ? ? ?related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 >>>>? ? ? ? ?This patch enable parallel heap inspection of G1 for jmap histo. >>>>? ? ? ? ?my simple test shown it can speed up 2x of jmap -histo with >>>> parallelThreadNum set to 2 for heap at ~500M on 4-core platform. 
>>>> >>>> ------------------------------------------------------------------------ >>>> BRs, >>>> Lin >> > > From per.liden at oracle.com Wed Feb 19 08:07:43 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 19 Feb 2020 09:07:43 +0100 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> Message-ID: <15478dad-ccba-2bd8-006a-2c2cc5f2c5b9@oracle.com> On 2/17/20 1:28 PM, Yasumasa Suenaga wrote: [...] >>>>> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ >>>> >>>> Before this patch can go forward, you need to get to the bottom of >>>> how to get that ioctl command to work. If it's not possible, you >>>> need to explain why and propose alternatives that we can discuss. >>> >>> I guess it is caused by Linux kernel. >>> In case of ext4, `ext4_iflags_to_xflags()` would set filesystem flags >>> to `struct FS_IOC_FSGETXATTR`. >>> However `FS_XFLAG_DAX` is not handled in it. >> >> Did a bit of googleing and it seems the DAX flag is in a bit of flux >> at the moment. I guess this will be fixed down the road, when DAX in >> the kernel becomes a non-experimental feature. >> >> How about we just do like this for now: >> >> http://cr.openjdk.java.net/~pliden/8239129/webrev.0 > > I thought ZGC requires tmpfs or hugetlbfs due to performance reason. > So I introduced new -XX option to make users aware of it. The filesystem type check is there to help users avoid the mistake of placing the heap on an unintended/slow filesystem. However, most users will never use -XX:AllocateHeapAt, so I think that risk is fairly small to begin with. The bar for adding new options to ZGC is high, and I don't think it's high enough in this case. 
Also, other GCs happily allow you to place the heap on any filesystem and I don't mind having that flexibility in ZGC too. > > If not so, I agree with your change. > Ok, thanks. I updated the patch, added and adjusted some logging, and added a test. I also updated the bug title/description. http://cr.openjdk.java.net/~pliden/8239129/webrev.1 cheers, Per From maoliang.ml at alibaba-inc.com Wed Feb 19 08:09:46 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Wed, 19 Feb 2020 16:09:46 +0800 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Message-ID: <7b116be1-fba3-42f6-a9b1-0500dbabda1d.maoliang.ml@alibaba-inc.com> Hi Thomas and Stefan, Regarding the failed test case of JEP 346 and the potential idle scenario we discussed, I don't oppose to reserve the shrink in remark because introducing another periodic GC to make sure the mixed GCs may not be a good idea as well. Thank Thomas for fixing my mistakes. By looking into your patch, I didn't see the expansion after concurrent mark based on policy()->desired_bytes_after_concurrent_mark(). Is it missed? Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 19 (Wed.) 04:52 To:"MAO, Liang" ; Stefan Johansson ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi Liang, dug through the changes a bit, took longer and only managed to do cursory testing as there were a few issues. That (very) cursory testing showed that memory consumption on one specjvm2008 out-of-box application is as baselined, but currently running the full set. The change I used is available at http://cr.openjdk.java.net/~tschatzl/8236073/webrev.2/, I will step through what changed below.
- not really a bug and pre-existing, but I changed the various resize_heap_* to always include the exact GC pause because particularly for the "after_concurrent_mark" suffix it is not clear what this means. I.e. in the Remark or Cleanup pauses, or at the real end of concurrent cycle (still concurrent)? This has not been done consistently yet. - I think there has been a copy&paste error in G1CollectedHeap::resize_heap_if_necessary, the two calculations to determine the min and max desired capacity were equal. I.e. 1178 size_t minimum_desired_capacity = _heap_sizing_policy->target_heap_capacity(used_after_gc, MinHeapFreeRatio); 1179 size_t maximum_desired_capacity = _heap_sizing_policy->target_heap_capacity(used_after_gc, MinHeapFreeRatio); Note the duplicate use of MinHeapFreeRatio. Fixed in above webrev. - CollectorState contains flags that basically indicate they type of GC, which should be set at the start of gc and updated at the end of gc. The new finish_of_mixed_gc does not fit here as it is basically a flag indicating that we need to do the resizing. The previous implementation also lets the first young-only gc after the last mixed gc do the resizing which is probably not as intended. By adding an additional policy()->next_gc_should_be_mixed() call instead of the state check (and removing this pause state/type completely) fixes this (I think ;)). - the suggested change removes the expansion during Cleanup for the reasons stated earlier. This removes the need for some code in the G1HeapSizingPolicy where originally _minimum_desired_bytes_after_last_cm had been stored. 
It's better to move this to G1Policy (and pre-existing, G1Policy should be the owner of G1HeapSizingPolicy which I did not fix in this change) - (the suggested change does not add the shrinking at remark discussed earlier; I still think it would be nice and maybe fix that failing regression test) - there should be more gc+heap+ergo logging of calculated targets/desired sizes in the new methods in G1HeapSizingPolicy, otherwise the decisions are very hard to follow after the fact. - I believe there is an underestimation of the desired bytes after concurrent mark with adaptive IHOP enabled in the current code. If you look at the method G1Policy::desired_bytes_after_concurrent_mark(), the two terms returned by that method do not seem equal. I.e. G1AdaptiveIHOP::predict_unrestrained_buffer_size() does not contain the used bytes, the reserve and other parts used for the static IHOP (i.e. minimum_desired_buffer_size == 0). At most, G1AdaptiveIHOP::predict_unrestrained_buffer_size() covers the young gen part of the latter. Some better name for this should be found too =) As mentioned, currently running more tests until tomorrow (even with above known issues) to get some experience/data to look at with the sizing at mixed gc heuristic. Thanks, Thomas From ivan.walulya at oracle.com Wed Feb 19 08:35:51 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Wed, 19 Feb 2020 09:35:51 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc Message-ID: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> Hi all, Please review a minor modification to disable adaptive sizing when ForceNUMA is used with ParallelGC and UseLargePages on Linux OS.
Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ Testing: Tier 1 - 3 //Ivan From suenaga at oss.nttdata.com Wed Feb 19 08:43:53 2020 From: suenaga at oss.nttdata.com (Yasumasa Suenaga) Date: Wed, 19 Feb 2020 17:43:53 +0900 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: <15478dad-ccba-2bd8-006a-2c2cc5f2c5b9@oracle.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> <15478dad-ccba-2bd8-006a-2c2cc5f2c5b9@oracle.com> Message-ID: <0ff872cd-c3ff-1f9e-c119-b555e948d380@oss.nttdata.com> Hi Per, Thanks for updating JBS and for creating patch! Your change looks good to me. Please list me as Reviewer. Thanks, Yasumasa On 2020/02/19 17:07, Per Liden wrote: > On 2/17/20 1:28 PM, Yasumasa Suenaga wrote: > [...] >>>>>> ?? webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ >>>>> >>>>> Before this patch can go forward, you need to get to the bottom of how to get that ioctl command to work. If it's not possible, you need to explain why and propose alternatives that we can discuss. >>>> >>>> I guess it is caused by Linux kernel. >>>> In case of ext4, `ext4_iflags_to_xflags()` would set filesystem flags to `struct FS_IOC_FSGETXATTR`. >>>> However `FS_XFLAG_DAX` is not handled in it. >>> >>> Did a bit of googleing and it seems the DAX flag is in a bit of flux at the moment. I guess this will be fixed down the road, when DAX in the kernel becomes a non-experimental feature. >>> >>> How about we just do like this for now: >>> >>> http://cr.openjdk.java.net/~pliden/8239129/webrev.0 >> >> I thought ZGC requires tmpfs or hugetlbfs due to performance reason. >> So I introduced new -XX option to make users aware of it. 
> > The filesystem type check is there to help users avoid the mistake of placing the heap on an unintended/slow filesystem. However, most users will never use -XX:AllocateHeapAt, so I think that risk is fairly small to begin with. > > The bar for adding new options to ZGC is high, and I don't think it's high enough in this case. Also, other GCs happily allow you to place the heap on any filesystem and I don't mind having that flexibility in ZGC too. > >> >> If not so, I agree with your change. >> > > Ok, thanks. > > I updated the patch, added and adjusted some logging, and added a test. I also updated the bug title/description. > > http://cr.openjdk.java.net/~pliden/8239129/webrev.1 > > cheers, > Per From thomas.schatzl at oracle.com Wed Feb 19 08:45:12 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 19 Feb 2020 09:45:12 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <7b116be1-fba3-42f6-a9b1-0500dbabda1d.maoliang.ml@alibaba-inc.com> References: <7b116be1-fba3-42f6-a9b1-0500dbabda1d.maoliang.ml@alibaba-inc.com> Message-ID: <3a82dbc3-fe60-1ca4-c11f-e2c2b7f84527@oracle.com> On 19.02.20 09:09, Liang Mao wrote: > Hi Thomas and Stefan? > > Regarding the failed test case of JEP 346 and the potential idle > scenario we discussed, I don't oppose to reserve the shring in > remark because introducing another perodic GC to make sure the > mixed GCs may not be a good idea as well. > > Thank Thomas for fixing my mistakes. By looking into your patch, > I didn't see the expansion after concurrent mark based on > policy()->desired_bytes_after_concurrent_mark(). Is it missed? > in an earlier email Stefan asked why the heuristic expands during Cleanup. In our opinion this is unnecessary and an artifact of doing full gc sizing in the Remark pause. The reasoning goes as follows: at worst normal expansion between Cleanup and the first mixed gc will expand the heap anyway. 
There does not seem to be much difference in doing expansion during Cleanup or GC (or inbetween) except that it would arbitrarily move the cost into the Cleanup pause. (And in the stable state this shouldn't happen because we previously already sized the heap optimally ;) ) So the recent suggestion removes it. As mentioned, this is untested (and I am going to look at overnight results later today) but seems okay as the last mixed gc will size "optimally" later anyway. Cleanup pause still records the _minimum_desired_bytes_after_last_gc since it is still needed later (and when discussing this last time we thought that this is the "best" place, now if we do not expand during Cleanup we actually do not need to do that there any more). One more comment about one of the raised issues with the code further below. > > ------------------------------------------------------------------ > From:Thomas Schatzl > Send Time:2020 Feb. 19 (Wed.) 04:52 > To:"MAO, Liang" ; Stefan Johansson > ; hotspot-gc-dev > > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > [...] > > - I believe there is an underestimation of the desired bytes after > concurrent mark with adaptive IHOP enabled in the current code. If you > look at the method G1Policy::desired_bytes_after_concurrent_mark(), the > two terms returned by that method do not seem equal. I.e. > G1AdaptiveIHOP::predict_unrestrained_buffer_size() does not contain the > used bytes, the reserve and other parts used for the static IHOP (i.e. > minimum_desired_buffer_size == 0). > > At most, G1AdaptiveIHOP::predict_unrestrained_buffer_size() covers the > young gen part of the latter. I.e. size_t G1Policy::minimum_desired_bytes_after_concurrent_mark(size_t used_bytes) { size_t minimum_desired_buffer_size = _ihop_control->predict_unstrained_buffer_size(); return minimum_desired_buffer_size != 0 ?
minimum_desired_buffer_size : _young_list_max_length * HeapRegion::GrainBytes + _reserve_regions * HeapRegion::GrainBytes + used_bytes; is from what I understand the same as: if (minimum_desired_buffer_size != 0) { return minimum_desired_buffer_size; } else { return _young_list_max_length * ... + reserve_regions...; } I *think* the following has been intended: return (minimum_desired_buffer_size != 0 ? minimum_desired_buffer_size : _young_list_max_length * HeapRegion::GrainBytes) + _reserve_regions * HeapRegion::GrainBytes + used_bytes; It would be nicer to restructure the code a bit though. > As mentioned, currently running more tests until tomorrow (even with > above known issues) to get some experience/data to look at with the > sizing at mixed gc heuristic. > Thanks, Thomas From per.liden at oracle.com Wed Feb 19 08:48:17 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 19 Feb 2020 09:48:17 +0100 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: <0ff872cd-c3ff-1f9e-c119-b555e948d380@oss.nttdata.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> <15478dad-ccba-2bd8-006a-2c2cc5f2c5b9@oracle.com> <0ff872cd-c3ff-1f9e-c119-b555e948d380@oss.nttdata.com> Message-ID: <7c362a1e-c4c3-90b7-ff67-15a522072b4b@oracle.com> Hi Yasumasa, On 2/19/20 9:43 AM, Yasumasa Suenaga wrote: > Hi Per, > > Thanks for updating JBS and for creating patch! > Your change looks good to me. Great, thanks. > Please list me as Reviewer. I'll add you both as reviewer and contributor of the patch. cheers, Per > > > Thanks, > > Yasumasa > > > On 2020/02/19 17:07, Per Liden wrote: >> On 2/17/20 1:28 PM, Yasumasa Suenaga wrote: >> [...] >>>>>>>
webrev: >>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ >>>>>> >>>>>> Before this patch can go forward, you need to get to the bottom of >>>>>> how to get that ioctl command to work. If it's not possible, you >>>>>> need to explain why and propose alternatives that we can discuss. >>>>> >>>>> I guess it is caused by Linux kernel. >>>>> In case of ext4, `ext4_iflags_to_xflags()` would set filesystem >>>>> flags to `struct FS_IOC_FSGETXATTR`. >>>>> However `FS_XFLAG_DAX` is not handled in it. >>>> >>>> Did a bit of googleing and it seems the DAX flag is in a bit of flux >>>> at the moment. I guess this will be fixed down the road, when DAX in >>>> the kernel becomes a non-experimental feature. >>>> >>>> How about we just do like this for now: >>>> >>>> http://cr.openjdk.java.net/~pliden/8239129/webrev.0 >>> >>> I thought ZGC requires tmpfs or hugetlbfs due to performance reason. >>> So I introduced new -XX option to make users aware of it. >> >> The filesystem type check is there to help users avoid the mistake of >> placing the heap on an unintended/slow filesystem. However, most users >> will never use -XX:AllocateHeapAt, so I think that risk is fairly >> small to begin with. >> >> The bar for adding new options to ZGC is high, and I don't think it's >> high enough in this case. Also, other GCs happily allow you to place >> the heap on any filesystem and I don't mind having that flexibility in >> ZGC too. >> >>> >>> If not so, I agree with your change. >>> >> >> Ok, thanks. >> >> I updated the patch, added and adjusted some logging, and added a >> test. I also updated the bug title/description. 
>> >> http://cr.openjdk.java.net/~pliden/8239129/webrev.1 >> >> cheers, >> Per From thomas.schatzl at oracle.com Wed Feb 19 09:22:25 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 19 Feb 2020 10:22:25 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> Message-ID: Hi, On 19.02.20 09:35, Ivan Walulya wrote: > Hi all, > > Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 > Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ > Testing: Tier 1 - 3 > > > //Ivan > lgtm :) Thomas From ivan.walulya at oracle.com Wed Feb 19 09:27:34 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Wed, 19 Feb 2020 10:27:34 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> Message-ID: Thanks Thomas! > On 19 Feb 2020, at 10:22, Thomas Schatzl wrote: > > Hi, > > On 19.02.20 09:35, Ivan Walulya wrote: >> Hi all, >> Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. 
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 >> Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ >> Testing: Tier 1 - 3 >> //Ivan > > lgtm :) > > Thomas From maoliang.ml at alibaba-inc.com Wed Feb 19 10:44:19 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Wed, 19 Feb 2020 18:44:19 +0800 Subject: =?UTF-8?B?UmU6IFJGUjogODIzNjA3MzogRzE6IFVzZSBTb2Z0TWF4SGVhcFNpemUgdG8gZ3VpZGUgR0Mg?= =?UTF-8?B?aGV1cmlzdGljcw==?= In-Reply-To: <3a82dbc3-fe60-1ca4-c11f-e2c2b7f84527@oracle.com> References: <7b116be1-fba3-42f6-a9b1-0500dbabda1d.maoliang.ml@alibaba-inc.com>, <3a82dbc3-fe60-1ca4-c11f-e2c2b7f84527@oracle.com> Message-ID: <1724eb2b-5b47-4a5d-8153-9080de8c4391.maoliang.ml@alibaba-inc.com> Hi Thomas, When I was testing those benchmarks like specjbb2015 and specjvm2008, the expansions mostly happened at remark. So I guess the expansion after concurrent mark at peak usage based on a minimal capacity might prevent several expansions in normal young collections. It's only my thinking since I don't have much performance data. I don't have any problems with expanding after young collection:) BTW, do you and Stefan prefer to leave the shrink at remark for fixing the failure of JEP346 and handling the idle scenario? Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 19 (Wed.) 16:45 To:"MAO, Liang" ; Stefan Johansson ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics On 19.02.20 09:09, Liang Mao wrote: > Hi Thomas and Stefan? > > Regarding the failed test case of JEP 346 and the potential idle > scenario we discussed, I don't oppose to reserve the shring in > remark because introducing another perodic GC to make sure the > mixed GCs may not be a good idea as well. > > Thank Thomas for fixing my mistakes. By looking into your patch, > I didn't see the expansion after concurrent mark based on > policy()->desired_bytes_after_concurrent_mark(). 
Is it missed? > in an earlier email Stefan asked why the heuristic expands during Cleanup. In our opinion this is unnecessary and an artifact of doing full gc sizing in the Remark pause. The reasoning goes as follows: at worst normal expansion between Cleanup and the first mixed gc will expand the heap anyway. There does not seem to be much difference in doing expansion during Cleanup or GC (or inbetween) except that it would arbitrarily move the cost into the Cleanup pause. (And in the stable state this shouldn't happen because we previously already sized the heap optimally ;) ) So the recent suggestion removes it. As mentioned, this is untested (and I am going to look at overnight results later today) but seems okay as the last mixed gc will size "optimally" later anyway. Cleanup pause still records the _minimum_desired_bytes_after_last_gc since it is still needed later (and when discussing this last time we thought that this is the "best" place, now if we do not expand during Cleanup we actually do not need to do that there any more). One more comment about one of the raised issues with the code further below. > > ------------------------------------------------------------------ > From:Thomas Schatzl > Send Time:2020 Feb. 19 (Wed.) 04:52 > To:"MAO, Liang" ; Stefan Johansson > ; hotspot-gc-dev > > Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics > [...] > > - I believe there is an underestimation of the desired bytes after > concurrent mark with adaptive IHOP enabled in the current code. If you > look at the method G1Policy::desired_bytes_after_concurrent_mark(), the > two terms returned by that method do not seem equal. I.e. > G1AdaptiveIHOP::predict_unrestrained_buffer_size() does not contain the > used bytes, the reserve and other parts used for the static IHOP (i.e. > minimum_desired_buffer_size == 0). > > At most, G1AdaptiveIHOP::predict_unrestrained_buffer_size() covers the > young gen part of the latter. I.e. 
size_t G1Policy::minimum_desired_bytes_after_concurrent_mark(size_t used_bytes) { size_t minimum_desired_buffer_size = _ihop_control->predict_unstrained_buffer_size(); return minimum_desired_buffer_size != 0 ? minimum_desired_buffer_size : _young_list_max_length * HeapRegion::GrainBytes + _reserve_regions * HeapRegion::GrainBytes + used_bytes; is from what I understand the same as: if (minimum_desired_buffer_size != 0) { return minimum_desired_buffer_size; } else { return _young_list_max_length * ... + reserve_regions...; } I *think* the following has been intended: return (minimum_desired_buffer_size != 0 ? minimum_desired_buffer_size : _young_list_max_length * HeapRegion::GrainBytes) + _reserve_regions * HeapRegion::GrainBytes + used_bytes; It would be nicer to restructure the code a bit though. > As mentioned, currently running more tests until tomorrow (even with > above known issues) to get some experience/data to look at with the > sizing at mixed gc heuristic. > Thanks, Thomas From thomas.schatzl at oracle.com Wed Feb 19 10:54:29 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 19 Feb 2020 11:54:29 +0100 Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: <1724eb2b-5b47-4a5d-8153-9080de8c4391.maoliang.ml@alibaba-inc.com> References: <7b116be1-fba3-42f6-a9b1-0500dbabda1d.maoliang.ml@alibaba-inc.com> <3a82dbc3-fe60-1ca4-c11f-e2c2b7f84527@oracle.com> <1724eb2b-5b47-4a5d-8153-9080de8c4391.maoliang.ml@alibaba-inc.com> Message-ID: <6c8281c1-d2b3-073b-8984-8f030f105b14@oracle.com> Hi, On 19.02.20 11:44, Liang Mao wrote: > Hi Thomas, > > When I was testing those benchmarks like specjbb2015 and specjvm2008, > the expansions mostly happened at remark. So I guess the expansion after > concurrent mark at peak usage based on a minimal capacity might > prevent several expansions in normal young collections. It's only my > thinking since I don't have much performance data. 
I don't have any > problems with expanding after young collection:) We'll collect perf data about this. > > BTW, do you and Stefan prefer to leave the shrink at remark for fixing > the failure of JEP346 and handling the idle scenario? Yes, and since Stefan suggested that we should shrink during Remark already I think he agrees. Thomas From kim.barrett at oracle.com Wed Feb 19 15:23:09 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 19 Feb 2020 10:23:09 -0500 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> Message-ID: <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> > On Feb 19, 2020, at 3:35 AM, Ivan Walulya wrote: > > Hi all, > > Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 > Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ > Testing: Tier 1 - 3 > > > //Ivan Setting UseNUMA true when Linux::libnuma_init returns false seems unlikely to work. The description of ForceNUMA is Force NUMA optimizations on single-node/UMA systems which suggests how it's presently being used in numa_init is wrong. I think the current use should be removed and this conditional clause 5129 // If there's only one node (they start from 0) or if the process 5130 // is bound explicitly to a single node using membind, disable NUMA. 
5131 UseNUMA = false; should instead use UseNUMA = ForceNUMA From kim.barrett at oracle.com Wed Feb 19 15:30:52 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 19 Feb 2020 10:30:52 -0500 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: > On Feb 19, 2020, at 10:23 AM, Kim Barrett wrote: > >> On Feb 19, 2020, at 3:35 AM, Ivan Walulya wrote: >> >> Hi all, >> >> Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 >> Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ >> Testing: Tier 1 - 3 >> >> >> //Ivan > > Setting UseNUMA true when Linux::libnuma_init returns false seems > unlikely to work. The description of ForceNUMA is > > Force NUMA optimizations on single-node/UMA systems > > which suggests how it's presently being used in numa_init is wrong. I > think the current use should be removed and this conditional clause > > 5129 // If there's only one node (they start from 0) or if the process > 5130 // is bound explicitly to a single node using membind, disable NUMA. > 5131 UseNUMA = false; > > should instead use > > UseNUMA = ForceNUMA The Solaris use of ForceNUMA looks like it has a similar problem. On Windows, UseNUMA seems to get forced off unless ForceNUMA, because NUMA support isn't complete there. Which is an entirely different meaning for ForceNUMA from its description. That covers all the uses of ForceNUMA.
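Kim's suggestion can be modeled as a small standalone function. This is a sketch only: the flag and parameter names echo HotSpot's UseNUMA/ForceNUMA handling but are illustrative, not the real os_linux.cpp interface. On a single-node/membind system the flag falls back to ForceNUMA instead of being unconditionally cleared, while a failed libnuma_init still disables NUMA because the NUMA APIs are unusable either way:

```cpp
#include <cassert>

// Illustrative model of the proposed numa_init decision logic; the names
// mirror HotSpot's UseNUMA/ForceNUMA flags, but this is not HotSpot code.
bool decide_use_numa(bool use_numa, bool force_numa, bool libnuma_ok,
                     int highest_node_number, bool bound_to_single_node) {
  if (!use_numa) {
    return false;  // user did not request NUMA at all
  }
  if (!libnuma_ok) {
    return false;  // libnuma failed to initialize: forcing NUMA cannot work
  }
  if (highest_node_number < 1 || bound_to_single_node) {
    // Single-node/UMA or membind case: previously hard-wired to false;
    // Kim's suggestion is effectively `UseNUMA = ForceNUMA` here.
    return force_numa;
  }
  return true;
}
```

This keeps ForceNUMA's documented meaning ("force NUMA optimizations on single-node/UMA systems") while still refusing to enable NUMA when the underlying library support is absent.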
From ivan.walulya at oracle.com Wed Feb 19 15:44:31 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Wed, 19 Feb 2020 16:44:31 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: Thanks Kim, I agree it is might be redundant to ForceNUMA when Linux::libnuma_init fails. I will make the changes and make a new RFR. > On 19 Feb 2020, at 16:30, Kim Barrett wrote: > >> On Feb 19, 2020, at 10:23 AM, Kim Barrett wrote: >> >>> On Feb 19, 2020, at 3:35 AM, Ivan Walulya wrote: >>> >>> Hi all, >>> >>> Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 >>> Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ >>> Testing: Tier 1 - 3 >>> >>> >>> //Ivan >> >> Setting UseNUMA true when Linux::libnuma_init returns false seems >> unlikely to work. The description of ForceNUMA is >> >> Force NUMA optimizations on single-node/UMA systems >> >> which suggests how it's presently being used in numa_init is wrong. I >> think the current use should be removed and this conditional clause >> >> 5129 // If there's only one node (they start from 0) or if the process >> 5130 // is bound explicitly to a single node using membind, disable NUMA. >> 5131 UseNUMA = false; >> >> should instead use >> >> UseNUMA = ForceNUMA > > The Solaris use of ForceNUMA looks like it has a similar problem. > > On Windows, UseNUMA seems to get forced off unless ForceNUMA, because > NUMA support isn?t complete there. Which is an entirely different meaning for > ForceNUMA from its description. > > That covers all the uses of ForceNUMA. 
> From ivan.walulya at oracle.com Thu Feb 20 08:04:45 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Thu, 20 Feb 2020 09:04:45 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: Hi all, Here is the revised webrev: http://cr.openjdk.java.net/~iwalulya/8216975/01/ //Ivan > On 19 Feb 2020, at 16:30, Kim Barrett wrote: > >> On Feb 19, 2020, at 10:23 AM, Kim Barrett wrote: >> >>> On Feb 19, 2020, at 3:35 AM, Ivan Walulya wrote: >>> >>> Hi all, >>> >>> Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 >>> Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ >>> Testing: Tier 1 - 3 >>> >>> >>> //Ivan >> >> Setting UseNUMA true when Linux::libnuma_init returns false seems >> unlikely to work. The description of ForceNUMA is >> >> Force NUMA optimizations on single-node/UMA systems >> >> which suggests how it's presently being used in numa_init is wrong. I >> think the current use should be removed and this conditional clause >> >> 5129 // If there's only one node (they start from 0) or if the process >> 5130 // is bound explicitly to a single node using membind, disable NUMA. >> 5131 UseNUMA = false; >> >> should instead use >> >> UseNUMA = ForceNUMA > > The Solaris use of ForceNUMA looks like it has a similar problem. > > On Windows, UseNUMA seems to get forced off unless ForceNUMA, because > NUMA support isn?t complete there. Which is an entirely different meaning for > ForceNUMA from its description. > > That covers all the uses of ForceNUMA. 
> From per.liden at oracle.com Thu Feb 20 08:26:07 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 20 Feb 2020 09:26:07 +0100 Subject: RFR: 8239533: ZGC: Make the ZProactive flag non-diagnostic Message-ID: <566a6991-e02d-cc57-296b-c2a14dfed329@oracle.com> I propose that the ZProactive flag shouldn't be a diagnostic flag, since it's a feature you might want to permanently enable/disable (similar to ZUncommit), rather than something you enable/disable to diagnose an issue. Bug: https://bugs.openjdk.java.net/browse/JDK-8239533 Webrev: http://cr.openjdk.java.net/~pliden/8239533/webrev.0 /Per From erik.osterlund at oracle.com Thu Feb 20 10:03:39 2020 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Thu, 20 Feb 2020 11:03:39 +0100 Subject: RFR: 8239533: ZGC: Make the ZProactive flag non-diagnostic In-Reply-To: <566a6991-e02d-cc57-296b-c2a14dfed329@oracle.com> References: <566a6991-e02d-cc57-296b-c2a14dfed329@oracle.com> Message-ID: <50696985-7657-021e-bdcd-a463f5b79456@oracle.com> Hi Per, Looks good. Thanks, /Erik On 2/20/20 9:26 AM, Per Liden wrote: > I propose that the ZProactive flag shouldn't be a diagnostic flag, > since it's a feature you might want to permanently enable/disable > (similar to ZUncommit), rather than something you enable/disable to > diagnose an issue. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8239533 > Webrev: http://cr.openjdk.java.net/~pliden/8239533/webrev.0 > > /Per From per.liden at oracle.com Thu Feb 20 10:52:17 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 20 Feb 2020 11:52:17 +0100 Subject: RFC: JEP: ZGC: Production Ready Message-ID: Hi all, I've created a JEP draft to make ZGC a product (non-experimental) feature. https://bugs.openjdk.java.net/browse/JDK-8209683 Comments and feedback welcome. 
cheers, Per From per.liden at oracle.com Thu Feb 20 10:52:37 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 20 Feb 2020 11:52:37 +0100 Subject: RFR: 8239533: ZGC: Make the ZProactive flag non-diagnostic In-Reply-To: <50696985-7657-021e-bdcd-a463f5b79456@oracle.com> References: <566a6991-e02d-cc57-296b-c2a14dfed329@oracle.com> <50696985-7657-021e-bdcd-a463f5b79456@oracle.com> Message-ID: Thanks Erik! /Per On 2/20/20 11:03 AM, erik.osterlund at oracle.com wrote: > Hi Per, > > Looks good. > > Thanks, > /Erik > > On 2/20/20 9:26 AM, Per Liden wrote: >> I propose that the ZProactive flag shouldn't be a diagnostic flag, >> since it's a feature you might want to permanently enable/disable >> (similar to ZUncommit), rather than something you enable/disable to >> diagnose an issue. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8239533 >> Webrev: http://cr.openjdk.java.net/~pliden/8239533/webrev.0 >> >> /Per > From shade at redhat.com Thu Feb 20 12:24:15 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 20 Feb 2020 13:24:15 +0100 Subject: RFR (S) 8232100: GC timings should use proper units for heap sizes In-Reply-To: References: Message-ID: <4ad7db72-1a52-037f-37b1-558ec176a172@redhat.com> On 10/10/19 2:03 PM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8232100 > > Webrev: > https://cr.openjdk.java.net/~shade/8232100/webrev.01/ > > GC log prints heap sizes in selected GC events. Currently, it unconditionally uses "M" as the suffix > for heap sizes, which makes GC logs too coarse on smaller heaps. This loses performance data > accuracy, which is sometimes a dealbreaker in logs analysis. Let's make it into proper units. > > I ran many tests of my own, but would appreciate if somebody runs it through more comprehensive > suite of tests, looking for tests that parse the GC logs for whatever reason. > > Testing: eyeballing GC logs, jdk-submit, hotspot_gc {g1, shenandoah, parallel} No takers? 
:) -- Thanks, -Aleksey From kim.barrett at oracle.com Thu Feb 20 21:08:52 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 20 Feb 2020 16:08:52 -0500 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: > On Feb 20, 2020, at 3:04 AM, Ivan Walulya wrote: > > Hi all, > > Here is the revised webrev: http://cr.openjdk.java.net/~iwalulya/8216975/01/ Looks good. From ivan.walulya at oracle.com Fri Feb 21 09:08:26 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Fri, 21 Feb 2020 10:08:26 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: <974919FC-07EA-43DD-B71C-83DB1834BFF1@oracle.com> Thanks kim! > On 20 Feb 2020, at 22:08, Kim Barrett wrote: > >> On Feb 20, 2020, at 3:04 AM, Ivan Walulya wrote: >> >> Hi all, >> >> Here is the revised webrev: http://cr.openjdk.java.net/~iwalulya/8216975/01/ > > Looks good. > From stefan.karlsson at oracle.com Fri Feb 21 09:21:47 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 21 Feb 2020 10:21:47 +0100 Subject: RFR: 8239533: ZGC: Make the ZProactive flag non-diagnostic In-Reply-To: <566a6991-e02d-cc57-296b-c2a14dfed329@oracle.com> References: <566a6991-e02d-cc57-296b-c2a14dfed329@oracle.com> Message-ID: Looks good. StefanK On 2020-02-20 09:26, Per Liden wrote: > I propose that the ZProactive flag shouldn't be a diagnostic flag, > since it's a feature you might want to permanently enable/disable > (similar to ZUncommit), rather than something you enable/disable to > diagnose an issue. 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8239533 > Webrev: http://cr.openjdk.java.net/~pliden/8239533/webrev.0 > > /Per From thomas.schatzl at oracle.com Fri Feb 21 09:30:03 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 21 Feb 2020 10:30:03 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: Hi, On 20.02.20 09:04, Ivan Walulya wrote: > Hi all, > > Here is the revised webrev: http://cr.openjdk.java.net/~iwalulya/8216975/01/ I think this is even better :) Thomas From leo.korinth at oracle.com Fri Feb 21 09:45:27 2020 From: leo.korinth at oracle.com (Leo Korinth) Date: Fri, 21 Feb 2020 10:45:27 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: Looks good, I will push for you. Thanks, Leo On 20/02/2020 09:04, Ivan Walulya wrote: > Hi all, > > Here is the revised webrev: http://cr.openjdk.java.net/~iwalulya/8216975/01/ > > //Ivan > >> On 19 Feb 2020, at 16:30, Kim Barrett wrote: >> >>> On Feb 19, 2020, at 10:23 AM, Kim Barrett wrote: >>> >>>> On Feb 19, 2020, at 3:35 AM, Ivan Walulya wrote: >>>> >>>> Hi all, >>>> >>>> Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 >>>> Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ >>>> Testing: Tier 1 - 3 >>>> >>>> >>>> //Ivan >>> >>> Setting UseNUMA true when Linux::libnuma_init returns false seems >>> unlikely to work. The description of ForceNUMA is >>> >>> Force NUMA optimizations on single-node/UMA systems >>> >>> which suggests how it's presently being used in numa_init is wrong. 
I >>> think the current use should be removed and this conditional clause >>> >>> 5129 // If there's only one node (they start from 0) or if the process >>> 5130 // is bound explicitly to a single node using membind, disable NUMA. >>> 5131 UseNUMA = false; >>> >>> should instead use >>> >>> UseNUMA = ForceNUMA >> >> The Solaris use of ForceNUMA looks like it has a similar problem. >> >> On Windows, UseNUMA seems to get forced off unless ForceNUMA, because >> NUMA support isn?t complete there. Which is an entirely different meaning for >> ForceNUMA from its description. >> >> That covers all the uses of ForceNUMA. >> > From ivan.walulya at oracle.com Fri Feb 21 10:02:41 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Fri, 21 Feb 2020 11:02:41 +0100 Subject: FRF: 8216975 Using ForceNUMA does not disable adaptive sizing with parallel gc In-Reply-To: References: <8D0A181B-4A38-4E4F-B4EA-E21A28E2F27C@oracle.com> <083E9D0A-CB76-4630-B79C-F496A5296F65@oracle.com> Message-ID: Thanks Leo! > On 21 Feb 2020, at 10:45, Leo Korinth wrote: > > Looks good, I will push for you. > > Thanks, > Leo > > On 20/02/2020 09:04, Ivan Walulya wrote: >> Hi all, >> Here is the revised webrev: http://cr.openjdk.java.net/~iwalulya/8216975/01/ >> //Ivan >>> On 19 Feb 2020, at 16:30, Kim Barrett wrote: >>> >>>> On Feb 19, 2020, at 10:23 AM, Kim Barrett wrote: >>>> >>>>> On Feb 19, 2020, at 3:35 AM, Ivan Walulya wrote: >>>>> >>>>> Hi all, >>>>> >>>>> Please review a minor modification to disable adaptive sizing when ForceNuma is used with ParallelGC and UseLargePages on Linux OS. >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8216975 >>>>> Webrev: http://cr.openjdk.java.net/~iwalulya/8216975/00/ >>>>> Testing: Tier 1 - 3 >>>>> >>>>> >>>>> //Ivan >>>> >>>> Setting UseNUMA true when Linux::libnuma_init returns false seems >>>> unlikely to work. 
The description of ForceNUMA is >>>> >>>> Force NUMA optimizations on single-node/UMA systems >>>> >>>> which suggests how it's presently being used in numa_init is wrong. I >>>> think the current use should be removed and this conditional clause >>>> >>>> 5129 // If there's only one node (they start from 0) or if the process >>>> 5130 // is bound explicitly to a single node using membind, disable NUMA. >>>> 5131 UseNUMA = false; >>>> >>>> should instead use >>>> >>>> UseNUMA = ForceNUMA >>> >>> The Solaris use of ForceNUMA looks like it has a similar problem. >>> >>> On Windows, UseNUMA seems to get forced off unless ForceNUMA, because >>> NUMA support isn?t complete there. Which is an entirely different meaning for >>> ForceNUMA from its description. >>> >>> That covers all the uses of ForceNUMA. >>> From per.liden at oracle.com Fri Feb 21 10:10:18 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 21 Feb 2020 11:10:18 +0100 Subject: RFR: 8239533: ZGC: Make the ZProactive flag non-diagnostic In-Reply-To: References: <566a6991-e02d-cc57-296b-c2a14dfed329@oracle.com> Message-ID: Thanks Stefan! /Per On 2/21/20 10:21 AM, Stefan Karlsson wrote: > Looks good. > > StefanK > > On 2020-02-20 09:26, Per Liden wrote: >> I propose that the ZProactive flag shouldn't be a diagnostic flag, >> since it's a feature you might want to permanently enable/disable >> (similar to ZUncommit), rather than something you enable/disable to >> diagnose an issue. 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8239533 >> Webrev: http://cr.openjdk.java.net/~pliden/8239533/webrev.0 >> >> /Per > From stefan.karlsson at oracle.com Fri Feb 21 11:30:18 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 21 Feb 2020 12:30:18 +0100 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: <15478dad-ccba-2bd8-006a-2c2cc5f2c5b9@oracle.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> <15478dad-ccba-2bd8-006a-2c2cc5f2c5b9@oracle.com> Message-ID: <65a46af0-bc01-6354-6680-3459b2a06f23@oracle.com> Looks good. StefanK On 2020-02-19 09:07, Per Liden wrote: > On 2/17/20 1:28 PM, Yasumasa Suenaga wrote: > [...] >>>>>> ?? webrev: >>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ >>>>> >>>>> Before this patch can go forward, you need to get to the bottom of >>>>> how to get that ioctl command to work. If it's not possible, you >>>>> need to explain why and propose alternatives that we can discuss. >>>> >>>> I guess it is caused by Linux kernel. >>>> In case of ext4, `ext4_iflags_to_xflags()` would set filesystem >>>> flags to `struct FS_IOC_FSGETXATTR`. >>>> However `FS_XFLAG_DAX` is not handled in it. >>> >>> Did a bit of googleing and it seems the DAX flag is in a bit of flux >>> at the moment. I guess this will be fixed down the road, when DAX in >>> the kernel becomes a non-experimental feature. >>> >>> How about we just do like this for now: >>> >>> http://cr.openjdk.java.net/~pliden/8239129/webrev.0 >> >> I thought ZGC requires tmpfs or hugetlbfs due to performance reason. >> So I introduced new -XX option to make users aware of it. 
> > The filesystem type check is there to help users avoid the mistake of > placing the heap on an unintended/slow filesystem. However, most users > will never use -XX:AllocateHeapAt, so I think that risk is fairly > small to begin with. > > The bar for adding new options to ZGC is high, and I don't think it's > high enough in this case. Also, other GCs happily allow you to place > the heap on any filesystem and I don't mind having that flexibility in > ZGC too. > >> >> If not so, I agree with your change. >> > > Ok, thanks. > > I updated the patch, added and adjusted some logging, and added a > test. I also updated the bug title/description. > > http://cr.openjdk.java.net/~pliden/8239129/webrev.1 > > cheers, > Per From leonid.mesnik at oracle.com Fri Feb 21 19:48:06 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Fri, 21 Feb 2020 11:48:06 -0800 Subject: RFR: 8203239: [TESTBUG] remove vmTestbase/vm/gc/kind/parOld test Message-ID: Hi Could you please review following fix which removes parOld test. Test checks that ParOldGC is used if no GC is selected and new gen GC is PSYoungGen. Test is obsolete now and should be removed. 
webrev: http://cr.openjdk.java.net/~lmesnik/8203239/webrev.00/ bug: https://bugs.openjdk.java.net/browse/JDK-8203239 From per.liden at oracle.com Mon Feb 24 10:53:03 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 24 Feb 2020 11:53:03 +0100 Subject: RFR: 8239129: Use DAX in ZGC In-Reply-To: <65a46af0-bc01-6354-6680-3459b2a06f23@oracle.com> References: <64207ef5-fabb-748a-15c9-e96e4bc612d8@oss.nttdata.com> <07354697-3758-02b9-0cc2-5fe887449e2a@oracle.com> <2a781b6a-0277-3bd1-3d0a-f3b2ac8a93c6@oracle.com> <0ae8d397-99c4-a2b6-93bb-5ab59861e25f@oss.nttdata.com> <64f25d5e-e352-2210-718f-667d2c547de7@oss.nttdata.com> <5af0f20e-3909-c656-e1c0-276d0e3c72c3@oracle.com> <15478dad-ccba-2bd8-006a-2c2cc5f2c5b9@oracle.com> <65a46af0-bc01-6354-6680-3459b2a06f23@oracle.com> Message-ID: <2d1db6fb-cce1-65dc-0539-819419314fcc@oracle.com> Thanks Stefan! /Per On 2/21/20 12:30 PM, Stefan Karlsson wrote: > Looks good. > > StefanK > > On 2020-02-19 09:07, Per Liden wrote: >> On 2/17/20 1:28 PM, Yasumasa Suenaga wrote: >> [...] >>>>>>> ?? webrev: >>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8239129/webrev.00/ >>>>>> >>>>>> Before this patch can go forward, you need to get to the bottom of >>>>>> how to get that ioctl command to work. If it's not possible, you >>>>>> need to explain why and propose alternatives that we can discuss. >>>>> >>>>> I guess it is caused by Linux kernel. >>>>> In case of ext4, `ext4_iflags_to_xflags()` would set filesystem >>>>> flags to `struct FS_IOC_FSGETXATTR`. >>>>> However `FS_XFLAG_DAX` is not handled in it. >>>> >>>> Did a bit of googleing and it seems the DAX flag is in a bit of flux >>>> at the moment. I guess this will be fixed down the road, when DAX in >>>> the kernel becomes a non-experimental feature. >>>> >>>> How about we just do like this for now: >>>> >>>> http://cr.openjdk.java.net/~pliden/8239129/webrev.0 >>> >>> I thought ZGC requires tmpfs or hugetlbfs due to performance reason. 
>>> So I introduced new -XX option to make users aware of it. >> >> The filesystem type check is there to help users avoid the mistake of >> placing the heap on an unintended/slow filesystem. However, most users >> will never use -XX:AllocateHeapAt, so I think that risk is fairly >> small to begin with. >> >> The bar for adding new options to ZGC is high, and I don't think it's >> high enough in this case. Also, other GCs happily allow you to place >> the heap on any filesystem and I don't mind having that flexibility in >> ZGC too. >> >>> >>> If not so, I agree with your change. >>> >> >> Ok, thanks. >> >> I updated the patch, added and adjusted some logging, and added a >> test. I also updated the bug title/description. >> >> http://cr.openjdk.java.net/~pliden/8239129/webrev.1 >> >> cheers, >> Per > From shade at redhat.com Mon Feb 24 16:12:00 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 24 Feb 2020 17:12:00 +0100 Subject: RFR (XS) 8239868: Shenandoah: ditch C2 node limit adjustments Message-ID: <8eeac17f-a6ed-18c1-ef90-667e692e309a@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8239868 We have the block added to Shenandoah arguments code that adjusts MaxNodeLimit and friends (predates inclusion of Shenandoah into mainline): https://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-August/006983.html At the time, it was prompted by observing that lots of barriers everywhere really needed to have this limit bumped. Today, with simplified LRB scheme, more simple LRB due to SFX, etc, we do not need this. The change above used ShenandoahCompileCheck, which made it into upstream code under generic AbortVMOnCompilationFailure. With that, I was able to verify that dropping the block does not yield compilation failures due to exceeded node budget on hotspot_gc_shenandoah, specjvm2008, specjbb2015. Performance numbers are also not affected (as expected). 
Therefore, the adjustment can be removed: diff -r 5c5dcd036a76 src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp --- a/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Mon Feb 24 11:01:51 2020 +0100 +++ b/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Mon Feb 24 17:09:58 2020 +0100 @@ -193,13 +193,4 @@ } - // Shenandoah needs more C2 nodes to compile some methods with lots of barriers. - // NodeLimitFudgeFactor needs to stay the same relative to MaxNodeLimit. -#ifdef COMPILER2 - if (FLAG_IS_DEFAULT(MaxNodeLimit)) { - FLAG_SET_DEFAULT(MaxNodeLimit, MaxNodeLimit * 3); - FLAG_SET_DEFAULT(NodeLimitFudgeFactor, NodeLimitFudgeFactor * 3); - } -#endif - // Make sure safepoint deadlocks are failing predictably. This sets up VM to report // fatal error after 10 seconds of wait for safepoint syncronization (not the VM Testing: hotspot_gc_shenandoah; benchmarks, +AbortVMOnCompilationFailure testing -- Thanks, -Aleksey From rkennke at redhat.com Mon Feb 24 16:22:50 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 24 Feb 2020 17:22:50 +0100 Subject: RFR (XS) 8239868: Shenandoah: ditch C2 node limit adjustments In-Reply-To: <8eeac17f-a6ed-18c1-ef90-667e692e309a@redhat.com> References: <8eeac17f-a6ed-18c1-ef90-667e692e309a@redhat.com> Message-ID: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8239868 > > We have the block added to Shenandoah arguments code that adjusts MaxNodeLimit and friends (predates > inclusion of Shenandoah into mainline): > https://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-August/006983.html > > At the time, it was prompted by observing that lots of barriers everywhere really needed to have > this limit bumped. Today, with simplified LRB scheme, more simple LRB due to SFX, etc, we do not > need this. > > The change above used ShenandoahCompileCheck, which made it into upstream code under generic > AbortVMOnCompilationFailure. 
With that, I was able to verify that dropping the block does not yield > compilation failures due to exceeded node budget on hotspot_gc_shenandoah, specjvm2008, specjbb2015. > Performance numbers are also not affected (as expected). > > Therefore, the adjustment can be removed: > > diff -r 5c5dcd036a76 src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Mon Feb 24 11:01:51 2020 +0100 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp Mon Feb 24 17:09:58 2020 +0100 > @@ -193,13 +193,4 @@ > } > > - // Shenandoah needs more C2 nodes to compile some methods with lots of barriers. > - // NodeLimitFudgeFactor needs to stay the same relative to MaxNodeLimit. > -#ifdef COMPILER2 > - if (FLAG_IS_DEFAULT(MaxNodeLimit)) { > - FLAG_SET_DEFAULT(MaxNodeLimit, MaxNodeLimit * 3); > - FLAG_SET_DEFAULT(NodeLimitFudgeFactor, NodeLimitFudgeFactor * 3); > - } > -#endif > - > // Make sure safepoint deadlocks are failing predictably. This sets up VM to report > // fatal error after 10 seconds of wait for safepoint syncronization (not the VM > > Testing: hotspot_gc_shenandoah; benchmarks, +AbortVMOnCompilationFailure testing Ok. Thank you! Roman From sangheon.kim at oracle.com Mon Feb 24 22:02:20 2020 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Mon, 24 Feb 2020 14:02:20 -0800 Subject: RFR: 8238979: Improve G1DirtyCardQueueSet handling of previously paused buffers In-Reply-To: References: Message-ID: <2360042b-b9fa-9766-f235-a5ed62801191@oracle.com> Hi Kim, On 2/13/20 5:46 PM, Kim Barrett wrote: > Please review this simplification of the handling of previously paused > buffers by G1DirtyCardQueueSet. This change moves the call to > enqueue_previous_paused_buffers() into record_paused_buffer(). This > ensures any paused buffers from a previous safepoint have been flushed > out before recording a buffer for the next safepoint. 
> > This move eliminates the former precondition that the enqueue had to > have been performed before recording. > > This move also permits the enqueue_previous_paused_buffers in > get_completed_buffer() to be moved to a point where it will be called > much more rarely, slightly improving the normal performance of > get_dirtied_buffer. The old location of the call was in support of > the call order invariant needed by record_paused_buffer(). > > As a consequence of the changed enqueue locations, the fast path check > in enqueue_previous_paused_buffers() will now only rarely succeed, and > is no longer worth the (very small) performance cost and (much more > importantly) the largish block comment arguing its correctness. So > that fast path is removed. And since the raison d'etre for > PausedBuffers::is_empty() was to support that fast path, that function > is also removed. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8238979 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8238979/open.00/ > > Testing: > mach5 tier1-5 in conjunction with other in-development changes. > Local (linux-x64) hotspot:tier1 for this change in isolation. Looks good to me. Thanks, Sangheon > From kim.barrett at oracle.com Tue Feb 25 03:33:33 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 24 Feb 2020 22:33:33 -0500 Subject: RFR: 8238979: Improve G1DirtyCardQueueSet handling of previously paused buffers In-Reply-To: <2360042b-b9fa-9766-f235-a5ed62801191@oracle.com> References: <2360042b-b9fa-9766-f235-a5ed62801191@oracle.com> Message-ID: <2398C423-F6EA-497E-B3F1-929A48042C59@oracle.com> > On Feb 24, 2020, at 5:02 PM, sangheon.kim at oracle.com wrote: > On 2/13/20 5:46 PM, Kim Barrett wrote: >> [?] >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8238979 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8238979/open.00/ >> >> Testing: >> mach5 tier1-5 in conjunction with other in-development changes. 
>> Local (linux-x64) hotspot:tier1 for this change in isolation. > Looks good to me. > > Thanks, > Sangheon Thanks. From shade at redhat.com Tue Feb 25 08:05:03 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 25 Feb 2020 09:05:03 +0100 Subject: RFR (S) 8239904: Shenandoah: accumulated penalties should not be over 100% of capacity Message-ID: <2b73bc42-1d5b-1277-a6b2-382acf660ea2@redhat.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8239904 See details in the bug. Fix: https://cr.openjdk.java.net/~shade/8239904/webrev.01/ Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From rkennke at redhat.com Tue Feb 25 11:29:41 2020 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 25 Feb 2020 12:29:41 +0100 Subject: RFR (S) 8239904: Shenandoah: accumulated penalties should not be over 100% of capacity In-Reply-To: <2b73bc42-1d5b-1277-a6b2-382acf660ea2@redhat.com> References: <2b73bc42-1d5b-1277-a6b2-382acf660ea2@redhat.com> Message-ID: Yes, looks good! Roman > Bug: > https://bugs.openjdk.java.net/browse/JDK-8239904 > > See details in the bug. > > Fix: > https://cr.openjdk.java.net/~shade/8239904/webrev.01/ > > Testing: hotspot_gc_shenandoah > From maoliang.ml at alibaba-inc.com Tue Feb 25 11:28:41 2020 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Tue, 25 Feb 2020 19:28:41 +0800 Subject: =?UTF-8?B?UkZSOiA4MjM2MDczOiBHMTogVXNlIFNvZnRNYXhIZWFwU2l6ZSB0byBndWlkZSBHQyBoZXVy?= =?UTF-8?B?aXN0aWNz?= Message-ID: <6cdfc61a-1e91-42ac-b1d8-725e3c45ff97.maoliang.ml@alibaba-inc.com> Hi Thomas, Do you have any testing result of the patch? I made a little change based on your webrev: http://cr.openjdk.java.net/~tschatzl/8236073/webrev.2/ to retain the shrink in remark which fixed the failure of JEP 346 and should handle the "idle" scenario. http://cr.openjdk.java.net/~luchsh/8236073.webrev.5/ Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2020 Feb. 19 (Wed.) 
18:56 To:"MAO, Liang" ; Stefan Johansson ; hotspot-gc-dev Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Hi, On 19.02.20 11:44, Liang Mao wrote: > Hi Thomas, > > When I was testing those benchmarks like specjbb2015 and specjvm2008, > the expansions mostly happened at remark. So I guess the expansion after > concurrent mark at peak usage based on a minimal capacity might > prevent several expansions in normal young collections. It's only my > thinking since I don't have much performance data. I don't have any > problems with expanding after young collection:) We'll collect perf data about this. > > BTW, do you and Stefan prefer to leave the shrink at remark for fixing > the failure of JEP346 and handling the idle scenario? Yes, and since Stefan suggested that we should shrink during Remark already I think he agrees. Thomas From erik.osterlund at oracle.com Tue Feb 25 13:10:40 2020 From: erik.osterlund at oracle.com (Erik Österlund) Date: Tue, 25 Feb 2020 14:10:40 +0100 Subject: RFC: JEP: ZGC: Concurrent Execution Stack Processing Message-ID: Hi, I have created a JEP draft to add concurrent execution stack scanning to ZGC. https://bugs.openjdk.java.net/browse/JDK-8239600 Comments and feedback welcome. Thanks, /Erik From zgu at redhat.com Tue Feb 25 17:13:03 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 25 Feb 2020 12:13:03 -0500 Subject: [15] RFR 8239926: Shenandoah: Shenandoah needs to mark nmethod's metadata Message-ID: <75c20855-5234-ba00-f07b-f9da0f7b8047@redhat.com> Shenandoah encounters a few test failures with tools/javac. Verifier catches unmarked oops in nmethod's metadata during root evacuation in the final mark phase. The problem is that Shenandoah marks on-stack nmethods in the init mark pause, but it does not mark nmethod's metadata during the concurrent mark phase, when a new nmethod is about to be executed.
The solution: 1) Use nmethod_entry_barrier to keep nmethod's metadata alive when the nmethod is about to be executed, where nmethod entry barriers are supported. 2) Re-mark on-stack nmethods' metadata at the final mark pause. Bug: https://bugs.openjdk.java.net/browse/JDK-8239926 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239926/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) tools/javac with ShenandoahCodeRootsStyle = 1 and 2 (fastdebug and release) Thanks, -Zhengyu From hohensee at amazon.com Tue Feb 25 21:13:38 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Tue, 25 Feb 2020 21:13:38 +0000 Subject: RFR(XS): 8239916 - SA: delete dead code in jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java In-Reply-To: References: <6ccd3ea6fc974cecb202865c7528912e@tencent.com> Message-ID: That's indeed dead code, so lgtm. Thanks, Paul From: serviceability-dev on behalf of Chris Plummer Date: Tuesday, February 25, 2020 at 10:04 AM To: "linzang(臧琳)" , serviceability-dev , "hotspot-gc-dev at openjdk.java.net" Subject: Re: RFR(XS): 8239916 - SA: delete dead code in jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java Adding hotspot-gc-dev. Chris On 2/25/20 2:21 AM, linzang(臧琳) wrote: Hi, Please review the following change: Bugs: https://bugs.openjdk.java.net/browse/JDK-8239916 webrev: http://cr.openjdk.java.net/~lzang/8239916/webrev/ Thanks, Lin From stefan.karlsson at oracle.com Tue Feb 25 21:46:59 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 25 Feb 2020 22:46:59 +0100 Subject: RFR(XS): 8239916 - SA: delete dead code in jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java In-Reply-To: References: <6ccd3ea6fc974cecb202865c7528912e@tencent.com> Message-ID: <9a67c326-f693-99ed-0c51-4f6bf96dd9b3@oracle.com> Looks good. This is left-overs from the CMS removal. StefanK On 2020-02-25 19:02, Chris Plummer wrote: > Adding hotspot-gc-dev. > > Chris > > On 2/25/20 2:21 AM, linzang(臧琳)
wrote: >> Hi, >> Please review the following change: >> Bugs: https://bugs.openjdk.java.net/browse/JDK-8239916 >> webrev: http://cr.openjdk.java.net/~lzang/8239916/webrev/ >> >> Thanks, >> Lin > From linzang at tencent.com Wed Feb 26 02:47:35 2020 From: linzang at tencent.com (linzang(臧琳)) Date: Wed, 26 Feb 2020 02:47:35 +0000 Subject: RFR(XS): 8239916 - SA: delete dead code in jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java(Internet mail) In-Reply-To: <9a67c326-f693-99ed-0c51-4f6bf96dd9b3@oracle.com> References: <6ccd3ea6fc974cecb202865c7528912e@tencent.com> <9a67c326-f693-99ed-0c51-4f6bf96dd9b3@oracle.com> Message-ID: <8C8E0733-3076-49F1-9527-F11A8860661C@tencent.com> Thanks for reviewing, so can this change be merged now? BRs, Lin > On Feb 26, 2020, at 5:46 AM, Stefan Karlsson wrote: > > Looks good. This is left-overs from the CMS removal. > > StefanK > > On 2020-02-25 19:02, Chris Plummer wrote: >> Adding hotspot-gc-dev. >> >> Chris >> >> On 2/25/20 2:21 AM, linzang(臧琳) wrote: >>> Hi, >>> Please review the following change: >>> Bugs: https://bugs.openjdk.java.net/browse/JDK-8239916 >>> webrev: http://cr.openjdk.java.net/~lzang/8239916/webrev/ >>> >>> Thanks, >>> Lin >> > > From stefan.johansson at oracle.com Wed Feb 26 09:07:08 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 26 Feb 2020 10:07:08 +0100 Subject: RFR: 8238979: Improve G1DirtyCardQueueSet handling of previously paused buffers In-Reply-To: References: Message-ID: Hi Kim, > 14 feb. 2020 kl. 02:46 skrev Kim Barrett : > > Please review this simplification of the handling of previously paused > buffers by G1DirtyCardQueueSet. This change moves the call to > enqueue_previous_paused_buffers() into record_paused_buffer(). This > ensures any paused buffers from a previous safepoint have been flushed > out before recording a buffer for the next safepoint.
> > This move eliminates the former precondition that the enqueue had to > have been performed before recording. > > This move also permits the enqueue_previous_paused_buffers in > get_completed_buffer() to be moved to a point where it will be called > much more rarely, slightly improving the normal performance of > get_dirtied_buffer. The old location of the call was in support of > the call order invariant needed by record_paused_buffer(). > > As a consequence of the changed enqueue locations, the fast path check > in enqueue_previous_paused_buffers() will now only rarely succeed, and > is no longer worth the (very small) performance cost and (much more > importantly) the largish block comment arguing its correctness. So > that fast path is removed. And since the raison d'etre for > PausedBuffers::is_empty() was to support that fast path, that function > is also removed. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8238979 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8238979/open.00/ Looks good, StefanJ > > Testing: > mach5 tier1-5 in conjunction with other in-development changes. > Local (linux-x64) hotspot:tier1 for this change in isolation. > From shade at redhat.com Wed Feb 26 09:19:26 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Feb 2020 10:19:26 +0100 Subject: RFR (XS) 8240069: Shenandoah: turn more flags diagnostic Message-ID: <80e3c299-575d-1603-0341-6176738b1280@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8240069 Webrev: http://cr.openjdk.java.net/~shade/8240069/webrev.01/ Regular sweep of flags that are experimental, but have been used as diagnostic. Diagnostic flags are usually for features that are enabled by default, and are not expected to be disabled, unless someone is chasing the bug. 
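In code terms, each such re-categorization is a one-word change of the declaration macro in shenandoah_globals.hpp, roughly like the sketch below (the flag name here is made up, not one from the actual webrev; the macro table's continuation backslashes are omitted):

```cpp
// Before: declared experimental, behind -XX:+UnlockExperimentalVMOptions.
experimental(bool, ShenandoahSomeFeature, true,
        "Feature description")

// After: the same flag declared diagnostic, behind
// -XX:+UnlockDiagnosticVMOptions, so it can still be toggled
// when chasing a bug.
diagnostic(bool, ShenandoahSomeFeature, true,
        "Feature description")
```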
Testing: hotspot_gc_shenandoah {fastdebug,release} -- Thanks, -Aleksey From shade at redhat.com Wed Feb 26 09:38:21 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Feb 2020 10:38:21 +0100 Subject: RFR (S) 8240070: Shenandoah: remove obsolete ShenandoahCommonGCStateLoads Message-ID: <1705d27b-bdf7-ef28-edb7-84e804786798@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8240070 This is the leftover of the older experiment that optimized the frequently emitted barriers. With the switch to LRB and questionable performance improvements (sometimes hijacked by elevated register pressure), it makes less sense to keep the option exposed and C2 code more complicated. Removal webrev: https://cr.openjdk.java.net/~shade/8240070/webrev.01/ Testing: hotspot_gc_shenandoah {fastdebug,release} -- Thanks, -Aleksey From rkennke at redhat.com Wed Feb 26 10:00:18 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 26 Feb 2020 11:00:18 +0100 Subject: RFR (XS) 8240069: Shenandoah: turn more flags diagnostic In-Reply-To: <80e3c299-575d-1603-0341-6176738b1280@redhat.com> References: <80e3c299-575d-1603-0341-6176738b1280@redhat.com> Message-ID: <6009896c-853d-16fb-6769-fcf1b97387ac@redhat.com> Yes, that makes sense! Thank you! Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240069 > > Webrev: > http://cr.openjdk.java.net/~shade/8240069/webrev.01/ > > Regular sweep of flags that are experimental, but have been used as diagnostic. Diagnostic flags are > usually for features that are enabled by default, and are not expected to be disabled, unless > someone is chasing the bug. 
> > Testing: hotspot_gc_shenandoah {fastdebug,release} > From rkennke at redhat.com Wed Feb 26 10:03:47 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 26 Feb 2020 11:03:47 +0100 Subject: RFR (S) 8240070: Shenandoah: remove obsolete ShenandoahCommonGCStateLoads In-Reply-To: <1705d27b-bdf7-ef28-edb7-84e804786798@redhat.com> References: <1705d27b-bdf7-ef28-edb7-84e804786798@redhat.com> Message-ID: <58632b60-f466-d443-d520-78da2702b096@redhat.com> As far as I understand, this optimization pass would not work with LRB anyway. So yeah, please remove it. Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240070 > > This is the leftover of the older experiment that optimized the frequently emitted barriers. With > the switch to LRB and questionable performance improvements (sometimes hijacked by elevated register > pressure), it makes less sense to keep the option exposed and C2 code more complicated. > > Removal webrev: > https://cr.openjdk.java.net/~shade/8240070/webrev.01/ > > Testing: hotspot_gc_shenandoah {fastdebug,release} > From shade at redhat.com Wed Feb 26 11:52:39 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 26 Feb 2020 12:52:39 +0100 Subject: RFR (S) 8240076: Shenandoah: pacer should cover reset and preclean phases Message-ID: <30ee2c94-22dd-536e-7d59-c3d61ae87780@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8240076 See bug above for discussion. Webrev: https://cr.openjdk.java.net/~shade/8240076/webrev.01/ Testing: hotspot_gc_shenandoah, eyeballing logs -- Thanks, -Aleksey From erik.gahlin at oracle.com Wed Feb 26 12:50:45 2020 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Wed, 26 Feb 2020 13:50:45 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal Message-ID: Hi, Could I have a review of a JFR event that is emitted when System.gc() is called. Purpose is to collect the stack trace. It is not sufficient with the cause field that the GarbageCollection event has today. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8003216 Webrev: http://cr.openjdk.java.net/~egahlin/8003216/ Testing: tier1+tier2+jdk/jdk/jfr Thanks Erik From per.liden at oracle.com Wed Feb 26 12:56:45 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 26 Feb 2020 13:56:45 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: References: Message-ID: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> Hi Erik, On 2020-02-26 13:50, Erik Gahlin wrote: > Hi, > > Could I have a review of a JFR event that is emitted when System.gc() is > called. > > Purpose is to collect the stack trace. It is not sufficient with the > cause field that the GarbageCollection event has today. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8003216 > > Webrev: > http://cr.openjdk.java.net/~egahlin/8003216/ 489 EventSystemGC event; 490 event.commit(); 491 Universe::heap()->collect(GCCause::_java_lang_system_gc); Don't you want the commit() call after the call to collect(), to get the timing right? cheers, Per > > Testing: > tier1+tier2+jdk/jdk/jfr > > Thanks > Erik > > From stefan.johansson at oracle.com Wed Feb 26 13:21:16 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 26 Feb 2020 14:21:16 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> Message-ID: Hi Erik, > 26 feb. 2020 kl. 13:56 skrev Per Liden : > > Hi Erik, > > On 2020-02-26 13:50, Erik Gahlin wrote: >> Hi, >> Could I have a review of a JFR event that is emitted when System.gc() is called. >> Purpose is to collect the stack trace. It is not sufficient with the cause field that the GarbageCollection event has today. 
>> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8003216 >> Webrev: >> http://cr.openjdk.java.net/~egahlin/8003216/ > > 489 EventSystemGC event; > 490 event.commit(); > 491 Universe::heap()->collect(GCCause::_java_lang_system_gc); > > Don't you want the commit() call after the call to collect(), to get the timing right? I was thinking the same thing; it could also be nice to have the GC-id associated with the event to make it easy to match it to GC logs and other GC events. Not sure how to easily get the GC-id though, since it's not set at the time we commit the event. I guess if the event has the correct span with timestamps it will be easy to figure out which other events are associated with it, even without the GC-id. Cheers, Stefan > > cheers, > Per > >> Testing: >> tier1+tier2+jdk/jdk/jfr >> Thanks >> Erik From zgu at redhat.com Wed Feb 26 13:32:34 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 26 Feb 2020 08:32:34 -0500 Subject: RFR (S) 8240076: Shenandoah: pacer should cover reset and preclean phases In-Reply-To: <30ee2c94-22dd-536e-7d59-c3d61ae87780@redhat.com> References: <30ee2c94-22dd-536e-7d59-c3d61ae87780@redhat.com> Message-ID: Looks good to me. -Zhengyu On 2/26/20 6:52 AM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240076 > > See bug above for discussion. > > Webrev: > https://cr.openjdk.java.net/~shade/8240076/webrev.01/ > > Testing: hotspot_gc_shenandoah, eyeballing logs > From erik.gahlin at oracle.com Wed Feb 26 13:50:41 2020 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Wed, 26 Feb 2020 14:50:41 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> Message-ID: <409F6986-AAE1-4565-AA5D-78DDD9E4EC89@oracle.com> Hi Per, My thinking was that users expect the timestamp of the event to happen before the GarbageCollection event, so I made the event untimed.
I could make the event timed, but what happens if a concurrent gc is used and it is already in progress. Would a new gc cycle start, or will it complete the existing cycle before returning? Thanks Erik > On 26 Feb 2020, at 13:56, Per Liden wrote: > > Hi Erik, > > On 2020-02-26 13:50, Erik Gahlin wrote: >> Hi, >> Could I have a review of a JFR event that is emitted when System.gc() is called. >> Purpose is to collect the stack trace. It is not sufficient with the cause field that the GarbageCollection event has today. >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8003216 >> Webrev: >> http://cr.openjdk.java.net/~egahlin/8003216/ > > 489 EventSystemGC event; > 490 event.commit(); > 491 Universe::heap()->collect(GCCause::_java_lang_system_gc); > > Don't you want the commit() call after the call to collect(), to get the timing right? > > cheers, > Per > >> Testing: >> tier1+tier2+jdk/jdk/jfr >> Thanks >> Erik From kim.barrett at oracle.com Wed Feb 26 13:56:39 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 26 Feb 2020 08:56:39 -0500 Subject: RFR: 8238979: Improve G1DirtyCardQueueSet handling of previously paused buffers In-Reply-To: References: Message-ID: <9A987C14-942E-4CED-9E89-A6678AA6F9B0@oracle.com> > On Feb 26, 2020, at 4:07 AM, Stefan Johansson wrote: > > Hi Kim, > >> 14 feb. 2020 kl. 02:46 skrev Kim Barrett : >> >> Please review this simplification of the handling of previously paused >> buffers by G1DirtyCardQueueSet. This change moves the call to >> enqueue_previous_paused_buffers() into record_paused_buffer(). This >> ensures any paused buffers from a previous safepoint have been flushed >> out before recording a buffer for the next safepoint. >> >> [?] >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8238979 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8238979/open.00/ > Looks good, > StefanJ Thanks. > >> >> Testing: >> mach5 tier1-5 in conjunction with other in-development changes. 
>> Local (linux-x64) hotspot:tier1 for this change in isolation. From per.liden at oracle.com Wed Feb 26 14:02:01 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 26 Feb 2020 15:02:01 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: <409F6986-AAE1-4565-AA5D-78DDD9E4EC89@oracle.com> References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> <409F6986-AAE1-4565-AA5D-78DDD9E4EC89@oracle.com> Message-ID: <0487f8cf-e826-d991-12c6-f720818f50a1@oracle.com> Hi, On 2/26/20 2:50 PM, Erik Gahlin wrote: > Hi Per, > > My thinking was that users expect the timestamp of the event to happen before the GarbageCollection event, so I made the event untimed. I would have expected the event start time to be before the collection starts, and the end time when it's done. > > I could make the event timed, but what happens if a concurrent gc is used and it is already in progress. Would a new gc cycle start, or will it complete the existing cycle before returning? Unless -XX:+ExplicitGCInvokesConcurrent is used (off by default), they do the same, which is wait until the "System.gc" collection has completed. For ZGC, should a GC cycle be in progress when a call to System.gc() is made, it will first wait for the in progress cycle to finish, and then execute the "System.gc" cycle before returning. cheers, Per > > Thanks > Erik > >> On 26 Feb 2020, at 13:56, Per Liden wrote: >> >> Hi Erik, >> >> On 2020-02-26 13:50, Erik Gahlin wrote: >>> Hi, >>> Could I have a review of a JFR event that is emitted when System.gc() is called. >>> Purpose is to collect the stack trace. It is not sufficient with the cause field that the GarbageCollection event has today. 
>>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8003216 >>> Webrev: >>> http://cr.openjdk.java.net/~egahlin/8003216/ >> >> 489 EventSystemGC event; >> 490 event.commit(); >> 491 Universe::heap()->collect(GCCause::_java_lang_system_gc); >> >> Don't you want the commit() call after the call to collect(), to get the timing right? >> >> cheers, >> Per >> >>> Testing: >>> tier1+tier2+jdk/jdk/jfr >>> Thanks >>> Erik From erik.gahlin at oracle.com Wed Feb 26 15:17:29 2020 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Wed, 26 Feb 2020 16:17:29 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: <0487f8cf-e826-d991-12c6-f720818f50a1@oracle.com> References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> <409F6986-AAE1-4565-AA5D-78DDD9E4EC89@oracle.com> <0487f8cf-e826-d991-12c6-f720818f50a1@oracle.com> Message-ID: > On 26 Feb 2020, at 15:02, Per Liden wrote: > > Hi, > > On 2/26/20 2:50 PM, Erik Gahlin wrote: >> Hi Per, >> My thinking was that users expect the timestamp of the event to happen before the GarbageCollection event, so I made the event untimed. > > I would have expected the event start time to be before the collection starts, and the end time when it's done. We have sometimes sorted events by their end time, or when they are committed (written to the buffer). This works better than start time in some cases. > >> I could make the event timed, but what happens if a concurrent gc is used and it is already in progress. Would a new gc cycle start, or will it complete the existing cycle before returning? > > Unless -XX:+ExplicitGCInvokesConcurrent is used (off by default), they do the same, which is wait until the "System.gc" collection has completed. For ZGC, should a GC cycle be in progress when a call to System.gc() is made, it will first wait for the in progress cycle to finish, and then execute the "System.gc" cycle before returning.
If users expect the System GC event to measure the time the Java thread is blocked, making the event timed makes sense. If users expect the duration to be the length of the triggered GC, or even the pause time, it would mislead users to make it timed. Erik > > cheers, > Per > >> Thanks >> Erik >>> On 26 Feb 2020, at 13:56, Per Liden wrote: >>> >>> Hi Erik, >>> >>> On 2020-02-26 13:50, Erik Gahlin wrote: >>>> Hi, >>>> Could I have a review of a JFR event that is emitted when System.gc() is called. >>>> Purpose is to collect the stack trace. It is not sufficient with the cause field that the GarbageCollection event has today. >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8003216 >>>> Webrev: >>>> http://cr.openjdk.java.net/~egahlin/8003216/ >>> >>> 489 EventSystemGC event; >>> 490 event.commit(); >>> 491 Universe::heap()->collect(GCCause::_java_lang_system_gc); >>> >>> Don't you want the commit() call after the call to collect(), to get the timing right? >>> >>> cheers, >>> Per >>> >>>> Testing: >>>> tier1+tier2+jdk/jdk/jfr >>>> Thanks >>>> Erik From erik.gahlin at oracle.com Wed Feb 26 17:28:13 2020 From: erik.gahlin at oracle.com (Erik Gahlin) Date: Wed, 26 Feb 2020 18:28:13 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> Message-ID: Hi Stefan, GC-id would be nice, but perhaps not possible in all scenarios, i.e. -XX:+ExplicitGCInvokesConcurrent and Epsilon GC? Thanks Erik On 2020-02-26 14:21, Stefan Johansson wrote: > Hi Erik, > >> 26 feb. 2020 kl. 13:56 skrev Per Liden : >> >> Hi Erik, >> >> On 2020-02-26 13:50, Erik Gahlin wrote: >>> Hi, >>> Could I have a review of a JFR event that is emitted when System.gc() is called. >>> Purpose is to collect the stack trace. It is not sufficient with the cause field that the GarbageCollection event has today. 
>>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8003216 >>> Webrev: >>> http://cr.openjdk.java.net/~egahlin/8003216/ >> 489 EventSystemGC event; >> 490 event.commit(); >> 491 Universe::heap()->collect(GCCause::_java_lang_system_gc); >> >> Don't you want the commit() call after the call to collect(), to get the timing right? > I was thinking the same thing, could also be nice to have the GC-id associated with the event to make it easy to match it to GC-logs and other GC-events. Not sure how to easily get the GC-id though, since it?s not set at the time we commit the event. > > I guess if the event has the correct span with timestamps it will be easy to figure out which other events are associated with it, even without the GC-id. > > Cheers, > Stefan > >> cheers, >> Per >> >>> Testing: >>> tier1+tier2+jdk/jdk/jfr >>> Thanks >>> Erik From mikhailo.seledtsov at oracle.com Wed Feb 26 21:07:42 2020 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Wed, 26 Feb 2020 13:07:42 -0800 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: References: Message-ID: <565e3578-0b67-41e0-13a4-3c5ac0b86904@oracle.com> Looks good to me, Misha On 2/26/20 4:50 AM, Erik Gahlin wrote: > Hi, > > Could I have a review of a JFR event that is emitted when System.gc() > is called. > > Purpose is to collect the stack trace. It is not sufficient with the > cause field that the GarbageCollection event has today. 
> > Bug: > https://bugs.openjdk.java.net/browse/JDK-8003216 > > Webrev: > http://cr.openjdk.java.net/~egahlin/8003216/ > > Testing: > tier1+tier2+jdk/jdk/jfr > > Thanks > Erik > > From felixxfyang at tencent.com Thu Feb 27 08:41:24 2020 From: felixxfyang at tencent.com (felixxfyang(杨晓峰)) Date: Thu, 27 Feb 2020 08:41:24 +0000 Subject: RFR(XS): 8239916 - SA: delete dead code in jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java(Internet mail) In-Reply-To: <8C202D0D-DEB9-43F5-B06F-822AC9AF430F@tencent.com> References: <8C202D0D-DEB9-43F5-B06F-822AC9AF430F@tencent.com> Message-ID: Copy correct alias -Felix From: "felixxfyang(杨晓峰)" Date: Thursday, February 27, 2020, 4:29 PM To: "linzang(臧琳)" Cc: hotspot-gc-dev Subject: Re: RFR(XS): 8239916 - SA: delete dead code in jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java(Internet mail) Hi Lin, Suppose yes, this change looks trivial. I can sponsor to push it. Thanks, Felix From: linzang(臧琳) Date: 2020-02-26 10:47 To: Stefan Karlsson; Paul Hohensee CC: linzang(臧琳); Chris Plummer; serviceability-dev; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(XS): 8239916 - SA: delete dead code in jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java(Internet mail) Thanks for reviewing, so can this change be merged now? BRs, Lin > On Feb 26, 2020, at 5:46 AM, Stefan Karlsson wrote: > > Looks good. This is left-overs from the CMS removal. > > StefanK > > On 2020-02-25 19:02, Chris Plummer wrote: >> Adding hotspot-gc-dev. >> >> Chris >> >> On 2/25/20 2:21 AM, linzang(臧琳)
wrote: >>> Hi, >>> Please review the following change: >>> Bugs: https://bugs.openjdk.java.net/browse/JDK-8239916 >>> webrev: http://cr.openjdk.java.net/~lzang/8239916/webrev/ >>> >>> Thanks, >>> Lin >> > > From stefan.johansson at oracle.com Thu Feb 27 09:13:50 2020 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Thu, 27 Feb 2020 10:13:50 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> Message-ID: <73577b98-c59e-9e80-b966-a11d501952d8@oracle.com> Hi Erik, On 2020-02-26 18:28, Erik Gahlin wrote: > Hi Stefan, > > GC-id would be nice, but perhaps not possible in all scenarios, i.e. > -XX:+ExplicitGCInvokesConcurrent and Epsilon GC? For ExplicitGCInvokesConcurrent it would not be a big problem, that would start a concurrent cycle and we could use the id for that GC. I also realized that we can get the GC-id without any problem. For other events sent before the GC-id is properly setup, we use GCId::peek() which returns the id that will be used for the next collection. For Epsilon, I'm not sure an event should be sent at all since they are blocked, see: EpsilonHeap::collect(...) Thanks, Stefan > > Thanks > Erik > > On 2020-02-26 14:21, Stefan Johansson wrote: >> Hi Erik, >> >>> 26 feb. 2020 kl. 13:56 skrev Per Liden : >>> >>> Hi Erik, >>> >>> On 2020-02-26 13:50, Erik Gahlin wrote: >>>> Hi, >>>> Could I have a review of a JFR event that is emitted when >>>> System.gc() is called. >>>> Purpose is to collect the stack trace. It is not sufficient with the >>>> cause field that the GarbageCollection event has today. >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8003216 >>>> Webrev: >>>> http://cr.openjdk.java.net/~egahlin/8003216/ >>> 489 EventSystemGC event; >>> 490 event.commit(); >>> 491
Universe::heap()->collect(GCCause::_java_lang_system_gc); >>> >>> Don't you want the commit() call after the call to collect(), to get >>> the timing right? >> I was thinking the same thing, could also be nice to have the GC-id >> associated with the event to make it easy to match it to GC-logs and >> other GC-events. Not sure how to easily get the GC-id though, since >> it?s not set at the time we commit the event. >> >> I guess if the event has the correct span with timestamps it will be >> easy to figure out which other events are associated with it, even >> without the GC-id. >> >> Cheers, >> Stefan >> >>> cheers, >>> Per >>> >>>> Testing: >>>> tier1+tier2+jdk/jdk/jfr >>>> Thanks >>>> Erik From per.liden at oracle.com Thu Feb 27 12:32:20 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 27 Feb 2020 13:32:20 +0100 Subject: RFR: 8003216: Add JFR event indicating explicit System.gc() cal In-Reply-To: References: <07178c56-dde3-25eb-c95c-32fff443cb55@oracle.com> <409F6986-AAE1-4565-AA5D-78DDD9E4EC89@oracle.com> <0487f8cf-e826-d991-12c6-f720818f50a1@oracle.com> Message-ID: Hi, On 2/26/20 4:17 PM, Erik Gahlin wrote: > >> On 26 Feb 2020, at 15:02, Per Liden wrote: >> >> Hi, >> >> On 2/26/20 2:50 PM, Erik Gahlin wrote: >>> Hi Per, >>> My thinking was that users expect the timestamp of the event to happen before the GarbageCollection event, so I made the event untimed. >> >> I would have expected the event start time to be before the collection starts, and the end time when it's done. > > We have sometimes sorted events by their end time, or when they are committed (written to the buffer).This works better than start time in some cases. > >> >>> I could make the event timed, but what happens if a concurrent gc is used and it is already in progress. Would a new gc cycle start, or will it complete the existing cycle before returning? 
>> >> Unless -XX:+ExplicitGCInvokesConcurrent is used (off by default), they do the same, which is wait until the "System.gc" collection has completed. For ZGC, should a GC cycle be in progress when a call to System.gc() is made, it will first wait for the in progress cycle to finish, and then execute the "System.gc" cycle before returning. > > If users expect the System GC event to measure the time the Java thread is blocked, making the event timed makes sense. > > If users expect the duration to be the length of the triggered GC, or even the pause time, it would mislead users to make it timed. I agree. I'm thinking the event time should just reflect how long time the Java thread was blocked, waiting for System.gc() to complete. cheers, Per > > Erik > >> >> cheers, >> Per >> >>> Thanks >>> Erik >>>> On 26 Feb 2020, at 13:56, Per Liden wrote: >>>> >>>> Hi Erik, >>>> >>>> On 2020-02-26 13:50, Erik Gahlin wrote: >>>>> Hi, >>>>> Could I have a review of a JFR event that is emitted when System.gc() is called. >>>>> Purpose is to collect the stack trace. It is not sufficient with the cause field that the GarbageCollection event has today. >>>>> Bug: >>>>> https://bugs.openjdk.java.net/browse/JDK-8003216 >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~egahlin/8003216/ >>>> >>>> 489 EventSystemGC event; >>>> 490 event.commit(); >>>> 491 Universe::heap()->collect(GCCause::_java_lang_system_gc); >>>> >>>> Don't you want the commit() call after the call to collect(), to get the timing right? 
>>>> >>>> cheers, >>>> Per >>>> >>>>> Testing: >>>>> tier1+tier2+jdk/jdk/jfr >>>>> Thanks >>>>> Erik > From zgu at redhat.com Thu Feb 27 13:21:24 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 27 Feb 2020 08:21:24 -0500 Subject: [15] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> Message-ID: <2d619863-e6e2-2dde-29ed-ef95b13b7413@redhat.com> Hi, Based on Erik's suggestion from the JDK-8238633 review [1], we can filter out oops marked by JVMTI and the JFR leak profiler in the resolve_forwarded() barrier, by inserting a null check on the forwarding pointer. To reduce the performance impact, we split up the compiler and runtime resolve-forwarded barriers and only perform the extra null check in the runtime barrier, as the JVMTI and leak profiler heap walks are performed at safepoints, where mutators are stopped. Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.01/ Test: hotspot_gc_shenandoah vmTestbase_nsk_jvmti vmTestbase_nsk_jdi Thanks, -Zhengyu [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-February/040974.html On 2/4/20 2:23 PM, Aleksey Shipilev wrote: > On 2/3/20 9:59 PM, Zhengyu Gu wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8237632 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.00/ > > Uh. It seems to me the cure is worse than the disease: > 1) It rewires sensitive parts of barrier paths, root handling, etc, which requires more thorough > testing, and we are too deep in RDP2 for this; > 2) It effectively disables asserts for anything not in collection set. Which means it disables > most of asserts. The fact that Verifier still works is a small consolation. > > I propose to accept this failure in 14, and rework the JVMTI heap walk to stop messing around with > mark words in 15. Since this relates to concurrent root handling, 11-shenandoah is already safe.
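The runtime-path change described above boils down to tolerating a NULL forwarding pointer. A standalone sketch of the idea (simplified stand-in types and names, not the actual HotSpot code from the webrev):

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for an object whose header may carry a forwarding
// pointer. In Shenandoah the forwardee is decoded from the mark word, and
// a JVMTI/JFR heap-walk mark can leave that decoded value NULL.
struct ObjModel {
  ObjModel* fwd;  // forwardee, or nullptr when none is installed
};

// Runtime resolve-forwarded barrier: return the forwardee when present,
// otherwise the object itself. The nullptr check is the new part; keeping
// it off the compiler fast path is fine because the JVMTI/JFR heap walks
// run at safepoints, with mutators stopped.
inline ObjModel* resolve_forwarded_runtime(ObjModel* obj) {
  ObjModel* fwd = obj->fwd;
  if (fwd == nullptr) {
    return obj;  // e.g. an object marked by the JVMTI heap walk
  }
  return fwd;
}
```

Per the mail, only the runtime variant performs the check; the compiler-emitted barrier is left unchanged.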
> From shade at redhat.com Thu Feb 27 13:24:35 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 27 Feb 2020 14:24:35 +0100 Subject: [15] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: <2d619863-e6e2-2dde-29ed-ef95b13b7413@redhat.com> References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> <2d619863-e6e2-2dde-29ed-ef95b13b7413@redhat.com> Message-ID: On 2/27/20 2:21 PM, Zhengyu Gu wrote: > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.01/ This looks good to me. Let Roman look through it as well. -- Thanks, -Aleksey From shade at redhat.com Thu Feb 27 13:26:57 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 27 Feb 2020 14:26:57 +0100 Subject: [15] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> <2d619863-e6e2-2dde-29ed-ef95b13b7413@redhat.com> Message-ID: <5722e9ec-05c9-611c-f2ae-b112813997a1@redhat.com> On 2/27/20 2:24 PM, Aleksey Shipilev wrote: > On 2/27/20 2:21 PM, Zhengyu Gu wrote: >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.01/ > > This looks good to me. 
Suggestion to change the synopsis, though: "Shenandoah: accept NULL fwdptr to cooperate with JVMTI and JFR" -- Thanks, -Aleksey From zgu at redhat.com Thu Feb 27 13:29:08 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 27 Feb 2020 08:29:08 -0500 Subject: [15] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: <5722e9ec-05c9-611c-f2ae-b112813997a1@redhat.com> References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> <2d619863-e6e2-2dde-29ed-ef95b13b7413@redhat.com> <5722e9ec-05c9-611c-f2ae-b112813997a1@redhat.com> Message-ID: <019a2d3f-9bc1-f28d-4b69-bd3461d10c6d@redhat.com> On 2/27/20 8:26 AM, Aleksey Shipilev wrote: > On 2/27/20 2:24 PM, Aleksey Shipilev wrote: >> On 2/27/20 2:21 PM, Zhengyu Gu wrote: >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.01/ >> >> This looks good to me. > > Suggestion to change the synopsis, though: > "Shenandoah: accept NULL fwdptr to cooperate with JVMTI and JFR" Thanks for the review, Aleksey. I will fix the synopsis before push. -Zhengyu > From rkennke at redhat.com Thu Feb 27 14:54:04 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 27 Feb 2020 15:54:04 +0100 Subject: [15] 8237632: Shenandoah fails some vmTestbase_nsk_jvmti tests with "Forwardee must point to a heap address" In-Reply-To: <2d619863-e6e2-2dde-29ed-ef95b13b7413@redhat.com> References: <24a45316-25f2-8be5-004e-47ca59cd1f13@redhat.com> <2d619863-e6e2-2dde-29ed-ef95b13b7413@redhat.com> Message-ID: <893bb323-ee07-7621-e80f-41e899064a65@redhat.com> Looks good to me. Thank you! Roman > Hi, > > Based on Erik's suggestion from JDK-8238633 review [1], we can filter > out oops marked by JVMTI and JFR leak profiler via resolve_forwarded() > barrier, by inserting an null check on forwarding pointer. 
> > To reduce performance impact, we split up compiler and runtime resolve > forwarded barrier, only performs extra null check in runtime barrier, as > JVMTI and leak profiler heap walk are performed at safepoints, where > mutators are stopped. > > > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.01/ > > Test: > ? hotspot_gc_shenandoah > ? vmTestbase_nsk_jvmti > ? vmTestbase_nsk_jdi > > Thanks, > > -Zhengyu > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-February/040974.html > > > > > On 2/4/20 2:23 PM, Aleksey Shipilev wrote: >> On 2/3/20 9:59 PM, Zhengyu Gu wrote: >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8237632 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8237632/webrev.00/ >> >> Uh. It seems to me the cure is worse than the disease: >> ?? 1) It rewires sensitive parts of barrier paths, root handling, etc, >> which requires more thorough >> testing, and we are too deep in RDP2 for this; >> ?? 2) It effectively disables asserts for anything not in collection >> set. Which means it disables >> most of asserts. The fact that Verifier still works is a small >> consolation. >> >> I propose to accept this failure in 14, and rework the JVMTI heap walk >> to stop messing around with >> mark words in 15. Since this relates to concurrent root handling, >> 11-shenandoah is already safe. >> > From rkennke at redhat.com Thu Feb 27 14:54:16 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 27 Feb 2020 15:54:16 +0100 Subject: [15] RFR 8239354: Shenandoah: minor enhancements to traversal GC In-Reply-To: References: Message-ID: <60257cc5-6e76-cc71-79d8-42c96de669b6@redhat.com> Hi Zhengyu, This looks good to me, thank you! Roman > 1) Added assertion to catch evacuation after completion of heap > traversal. This should help catch the bug demonstrated in sh-jdk11 w/o > JDK-8237396. > > 2) Retire TLAB/GCLAB after completion of heap traversal. 
Current code > retires TLAB/GCLAB at the beginning of final traversal, but STW traversal > still uses GCLAB to evacuate remaining objects. > > 3) Added comments regarding why we need to retire TLAB/GCLAB, even though we don't > need the heap to be parsable. > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8239354 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8239354/webrev.00/index.html > > Test: > hotspot_gc_shenandoah > > Thanks, > > -Zhengyu >
From shade at redhat.com Fri Feb 28 09:53:14 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Feb 2020 10:53:14 +0100 Subject: RFR (XS) 8240216: Shenandoah: remove ShenandoahTerminationTrace Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8240216 This was the diagnostic option for working on improving the termination protocol. Now that the VM has moved globally to OWST as the termination protocol, this seems to only increase the maintenance burden. The option is turned off by default already. Zhengyu, do you agree? Webrev: https://cr.openjdk.java.net/~shade/8240216/webrev.01/ Testing: hotspot_gc_shenandoah {fastdebug,release} -- Thanks, -Aleksey
From shade at redhat.com Fri Feb 28 10:06:15 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Feb 2020 11:06:15 +0100 Subject: RFR (S) 8240217: Shenandoah: remove ShenandoahEvacAssist Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8240217 ShenandoahEvacAssist is an experimental option that strived to make calling into the WB/LRB slowpath less frequent. It implicitly relied on the WB/LRB midpath to check for the forwardee and shortcut from there. With the introduction of self-fixing barriers, this was intentionally removed. Therefore, Shenandoah would call into the slow path anyway, even when the evac-assist path had evacuated some objects. Also, with Traversal, the assist path breaks out of Traversal's intent to evacuate the objects in traversal order. There, it becomes actively harmful. We should consider removing it.
Webrev: https://cr.openjdk.java.net/~shade/8240217/webrev.01/ Testing: hotspot_gc_shenandoah {fastdebug,release} -- Thanks, -Aleksey From rkennke at redhat.com Fri Feb 28 10:52:01 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 28 Feb 2020 11:52:01 +0100 Subject: RFR (S) 8240217: Shenandoah: remove ShenandoahEvacAssist In-Reply-To: References: Message-ID: Have you done any performance experiments? A (not so long) while back, I ran SPECjbb2015 with and without the option, and couldn't measure a difference. If anything, latency slightly improved with evac-assist turned off. Other than that, good. Less code, less maintenance. Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240217 > > ShenandoahEvacAssist is an experimental option that strived to make calling into WB/LRB slowpath > less frequent. > > It implicitly relied on WB/LRB midpath to check for forwardee and shortcut from there. With the > introduction of self-fixing barriers, this was intentionally removed. Therefore, Shenandoah would > call into slow-path anyway, even when evac-assist path had evacuated some objects. > > Also, with Traversal, the assist path breaks out of Traversal's intent to evacuate the objects in > traversal order. There, it becomes actively harmful. We should consider removing it. > > Webrev: > https://cr.openjdk.java.net/~shade/8240217/webrev.01/ > > Testing: hotspot_gc_shenandoah {fastdebug,release} > From zgu at redhat.com Fri Feb 28 13:26:12 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 28 Feb 2020 08:26:12 -0500 Subject: RFR (XS) 8240216: Shenandoah: remove ShenandoahTerminationTrace In-Reply-To: References: Message-ID: <96a74fad-6463-8793-8ffb-60d62254cd0e@redhat.com> On 2/28/20 4:53 AM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8240216 > > This was the diagnostic option for working on improving the termination protocol. 
Now that VM had > moved globally to OWST as termination protocol, this seems to only increase the maintenance burden. > The option is turned off by default already. > > Zhengyu, do you agree? Okay. -Zhengyu > > Webrev: > https://cr.openjdk.java.net/~shade/8240216/webrev.01/ > > Testing: hotspot_gc_shenandoah {fastdebug,release} > From shade at redhat.com Fri Feb 28 14:24:06 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 28 Feb 2020 15:24:06 +0100 Subject: RFR (S) 8240217: Shenandoah: remove ShenandoahEvacAssist In-Reply-To: References: Message-ID: <633fc9e0-5b2c-9be4-8c74-0a29149cb0ef@redhat.com> On 2/28/20 11:52 AM, Roman Kennke wrote: > Have you done any performance experiments? Just finished: no improvement/regressions, unless it hides in the noise. -- Thanks, -Aleksey From m.sundar85 at gmail.com Fri Feb 28 18:57:12 2020 From: m.sundar85 at gmail.com (Sundara Mohan M) Date: Fri, 28 Feb 2020 13:57:12 -0500 Subject: Parallel GC Thread crash In-Reply-To: <6afab3a3-92ab-f1bf-2022-9e21034cd28a@oracle.com> References: <5454bc87-1452-1402-3496-c3c8f128a499@oracle.com> <6afab3a3-92ab-f1bf-2022-9e21034cd28a@oracle.com> Message-ID: Hi Stefan, I tried running with -XX:+VerifyBeforeGC and -XX:+VerifyAfterGC but it seems some of the operations are timing out (ex. ssl connect not sure if i have very low timeout or this flag increases the latency) Also from the crash error log i see following ... 0x00007f0588066000 GCTaskThread "ParGC Thread#20" [stack: 0x00007ef5c4d13000,0x00007ef5c4e13000] [id=87534] *=>0x00007f0588068000 GCTaskThread "ParGC Thread#21" [stack: 0x00007ef5c4c12000,0x00007ef5c4d12000] [id=87535]* 0x00007f0588069800 GCTaskThread "ParGC Thread#22" [stack: 0x00007ef5c4b11000,0x00007ef5c4c11000] [id=87536] ... 
Threads with active compile tasks: *VM state:at safepoint (normal execution)* VM Mutex/Monitor currently owned by a thread: ([mutex/lock_event]) [0x00007f0588015750] Threads_lock - owner thread: 0x00007f0588112800 [0x00007f0588016350] Heap_lock - owner thread: 0x00007ef4100f9000 Does this mean VM is in safepoint and executing GC operation or JVM related activity which requires to be in safepoint (ie. not executing user code) when it crashed? I am trying to see if library or application code is causing this behaviour. Thanks Sundar On Mon, Feb 10, 2020 at 3:13 PM Stefan Karlsson wrote: > On 2020-02-10 20:53, Sundara Mohan M wrote: > > Hi Stefan, > > Yes we are trying to move to 13.0.2. Wanted to verify if anyone > > else seen this or upgrading will really solve this problem. > > > > Can you share how to file a bug report for this? I don't have access > > to https://bugs.openjdk.java.net/ > > There are directions in the hs_err crash file that points you to the web > page to file a bug. > > You seem to be running AdoptJDK builds so your bug reports would end up > at their system: > > > # If you would like to submit a bug report, please visit: > > > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > > If you were running with Oracle binaries you would get lines like this: > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > > > > > I will try to run with -XX:+VerifyBeforeGC and -XX:+VerifyAfterGC to > > get more information. > > OK. Hopefully this gives us more information. > > StefanK > > > > > > Thanks > > Sundar > > > > On Mon, Feb 10, 2020 at 2:42 PM Stefan Karlsson > > > wrote: > > > > Hi Sundar, > > > > On 2020-02-10 19:32, Sundara Mohan M wrote: > > > Hi Stefan, > > > We started seeing more crashes on JDK13.0.1+9 > > > > > > Since seeing it on GC Task Thread assumed it is related to GC. > > > > As I said in my previous mail, I don't think this is caused by GC > > code. > > More below. 
> > > > > > > > # Problematic frame: > > > # V [libjvm.so+0xd183c0] > > PSRootsClosure::do_oop(oopDesc**)+0x30 > > > > > > Command Line: -XX:+AlwaysPreTouch -Xms64000m -Xmx64000m > > > -XX:NewSize=40000m -XX:+DisableExplicitGC -Xnoclassgc > > > -XX:+UseParallelGC -XX:ParallelGCThreads=40 -XX:ConcGCTh > > > reads=5 ... > > > > > > Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, > > Red > > > Hat Enterprise Linux Server release 6.10 (Santiago) > > > Time: Fri Feb 7 11:15:04 2020 UTC elapsed time: 286290 seconds > > (3d 7h > > > 31m 30s) > > > > > > --------------- T H R E A D --------------- > > > > > > Current thread (0x00007fca6c074000): GCTaskThread "ParGC > > Thread#28" > > > [stack: 0x00007fba72ff1000,0x00007fba730f1000] [id=56530] > > > > > > Stack: [0x00007fba72ff1000,0x00007fba730f1000], > > > sp=0x00007fba730ee850, free space=1014k > > > Native frames: (J=compiled Java code, A=aot compiled Java code, > > > j=interpreted, Vv=VM code, C=native code) > > > V [libjvm.so+0xd183c0] > > PSRootsClosure::do_oop(oopDesc**)+0x30 > > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, > > RegisterMap > > > const*, OopClosure*)+0x2eb > > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > > > CodeBlobClosure*)+0x187 > > > V [libjvm.so+0xd190be] ThreadRootsTask::do_it(GCTaskManager*, > > > unsigned int)+0x6e > > > V [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb > > > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > > > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > > > JavaThread 0x00007fb8f4036800 (nid = 60927) was being processed > > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > > v ~RuntimeStub::_new_array_Java > > > J 58520 c2 > > > > > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > > > > (207 bytes) @ 0x00007fca5fd23dec > > 
[0x00007fca5fd1dbc0+0x000000000000622c] > > > J 66864 c2 webservice.exception.ExceptionLoggingWrapper.execute()V > > > (1004 bytes) @ 0x00007fca60c02588 > > [0x00007fca60bffce0+0x00000000000028a8] > > > J 58224 c2 > > > > > > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > > > (105 bytes) @ 0x00007fca5f59bad8 > > [0x00007fca5f59b880+0x0000000000000258] > > > J 69992 c2 > > > > > > webservice.exception.mapper.JediRequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > > > (9 bytes) @ 0x00007fca5e1019f4 > > [0x00007fca5e101940+0x00000000000000b4] > > > J 55265 c2 > > > > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > > > > (332 bytes) @ 0x00007fca5f6f58e0 > > [0x00007fca5f6f5700+0x00000000000001e0] > > > J 483122 c2 > > webservice.filters.ResponseSerializationWorker.execute()Z > > > (272 bytes) @ 0x00007fca622fc2b4 > > [0x00007fca622fbc80+0x0000000000000634] > > > J 15811% c2 > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > > > > (486 bytes) @ 0x00007fca5c108794 > > [0x00007fca5c1082a0+0x00000000000004f4] > > > j > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > > J 4586 c1 java.util.concurrent.FutureTask.run()V > > java.base at 13.0.1 (123 > > > bytes) @ 0x00007fca54d27184 [0x00007fca54d26b00+0x0000000000000684] > > > J 7550 c1 > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > > > > java.base at 13.0.1 (187 bytes) @ 0x00007fca54fbb6d4 > > > [0x00007fca54fba8e0+0x0000000000000df4] > > > J 7549 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > > java.base at 13.0.1 (9 bytes) @ 0x00007fca5454b93c > > > [0x00007fca5454b8c0+0x000000000000007c] > > > J 
4585 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ > > > 0x00007fca54d250f4 [0x00007fca54d24fc0+0x0000000000000134] > > > v ~StubRoutines::call_stub > > > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: > > > 0x0000000000000000 > > > > > > Does JDK11 and 13 have different code for GC. Do you think > > > downgrading(JDK11 stable)/upgrading(JDK-13.0.2) might help here? > > > > You should at least move to 13.0.2, to get the latest bug > > fixes/patches. > > > > There has been a lot of changes in all areas of the JVM between 11 > > and > > 13. We don't yet know the root cause of this crash, and I can't > > say if > > this is caused by new changes or not. Have you or anyone filed a bug > > report for this? > > > > > Any insight to debug this will be helpful. > > > > Did you try my previous suggestion to run with -XX:+VerifyBeforeGC > > and > > -XX:+VerifyAfterGC? If you can tolerate the longer GC times it > > introduces, then you could try to run with > > -XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC > > -XX:+VerifyAfterGC . > > > > Cheers, > > StefanK > > > > > > > > TIA > > > Sundar > > > > > > On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson > > > > > > >> wrote: > > > > > > Hi Sundar, > > > > > > The GC crashes when it encounters something bad on the stack: > > > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, > > > RegisterMap > > > > const*, OopClosure*)+0x2eb > > > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > > > > This is probably not a GC bug. It's more likely that this is > > > caused by > > > the JIT compiler. I see in your hotspot-runtime-dev thread, > > that you > > > also get crashes in other compiler related areas. > > > > > > If you want to rule out the GC, you can run with > > > -XX:+VerifyBeforeGC and > > > -XX:+VerifyAfterGC, and see if this asserts before the GC > > has started > > > running. 
> > > > > > StefanK > > > > > > On 2020-02-04 04:38, Sundara Mohan M wrote: > > > > Hi, > > > > I am seeing following crashes frequently on our servers > > > > # > > > > # A fatal error has been detected by the Java Runtime > > Environment: > > > > # > > > > # SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, > > tid=108299 > > > > # > > > > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build > > > 13.0.1+9) > > > > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, > > > tiered, parallel > > > > gc, linux-amd64) > > > > # Problematic frame: > > > > # V [libjvm.so+0xcd3311] > > > PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > > > # > > > > # No core dump will be written. Core dumps have been > disabled. > > > To enable > > > > core dumping, try "ulimit -c unlimited" before starting > > Java again > > > > # > > > > # If you would like to submit a bug report, please visit: > > > > # https://github.com/AdoptOpenJDK/openjdk-build/issues > > > > # > > > > > > > > > > > > --------------- T H R E A D --------------- > > > > > > > > Current thread (0x00007fca2c051000): GCTaskThread "ParGC > > > Thread#8" [stack: > > > > 0x00007fca30277000,0x00007fca30377000] [id=108299] > > > > > > > > Stack: [0x00007fca30277000,0x00007fca30377000], > > > sp=0x00007fca30374890, > > > > free space=1014k > > > > Native frames: (J=compiled Java code, A=aot compiled Java > > code, > > > > j=interpreted, Vv=VM code, C=native code) > > > > V [libjvm.so+0xcd3311] > > PCMarkAndPushClosure::do_oop(oopDesc**)+0x51 > > > > V [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, > > > RegisterMap > > > > const*, OopClosure*)+0x2eb > > > > V [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*, > > > > CodeBlobClosure*, RegisterMap*, bool)+0x99 > > > > V [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*, > > > > CodeBlobClosure*)+0x187 > > > > V [libjvm.so+0xcce2f0] > > > ThreadRootsMarkingTask::do_it(GCTaskManager*, > > > > unsigned int)+0xb0 > > > > V [libjvm.so+0x7f422b] 
GCTaskThread::run()+0x1eb > > > > V [libjvm.so+0xf707fd] Thread::call_run()+0x10d > > > > V [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7 > > > > > > > > JavaThread 0x00007fb85c004800 (nid = 111387) was being > > processed > > > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM > code) > > > > v ~RuntimeStub::_new_array_Java > > > > J 225122 c2 > > > > > > > > > > ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > > > (207 bytes) @ 0x00007fca21f1a5d8 > > > [0x00007fca21f17f20+0x00000000000026b8] > > > > J 62342 c2 > > > webservice.exception.ExceptionLoggingWrapper.execute()V (1004 > > > > bytes) @ 0x00007fca20f0aec8 > > [0x00007fca20f07f40+0x0000000000002f88] > > > > J 225129 c2 > > > > > > > > > > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > > (105 bytes) @ 0x00007fca1da512ac > > > [0x00007fca1da51100+0x00000000000001ac] > > > > J 131643 c2 > > > > > > > > > > webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > > (9 bytes) @ 0x00007fca20ce6190 > > > [0x00007fca20ce60c0+0x00000000000000d0] > > > > J 55114 c2 > > > > > > > > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > > > (332 bytes) @ 0x00007fca2051fe64 > > > [0x00007fca2051f820+0x0000000000000644] > > > > J 57859 c2 > > > webservice.filters.ResponseSerializationWorker.execute()Z (272 > > > > bytes) @ 0x00007fca1ef2ed18 > > [0x00007fca1ef2e140+0x0000000000000bd8] > > > > J 16114% c2 > > > > > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > > > (486 bytes) @ 0x00007fca1ced465c > > > [0x00007fca1ced4200+0x000000000000045c] > > > > j > > > > > > > > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > > > > J 
11639 c2 java.util.concurrent.FutureTask.run()V > > > java.base at 13.0.1 (123 > > > > bytes) @ 0x00007fca1cd00858 > > [0x00007fca1cd007c0+0x0000000000000098] > > > > J 7560 c1 > > > > > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > > > java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54 > > > > [0x00007fca15b23160+0x0000000000000df4] > > > > J 5143 c1 > > java.util.concurrent.ThreadPoolExecutor$Worker.run()V > > > > java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc > > > > [0x00007fca15b39a40+0x000000000000007c] > > > > J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 > > bytes) @ > > > > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134] > > > > v ~StubRoutines::call_stub > > > > > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), > > si_addr: > > > > 0x0000000000000000 > > > > > > > > Register to memory mapping: > > > > ... > > > > > > > > Can someone shed more info on when this can happen? I am > > seeing > > > this on > > > > multiple servers with Java 13.0.1+9 on RHEL6 servers. > > > > > > > > There was another thread in hotspot runtime where David > Holmes > > > pointed this > > > >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 > > (SI_KERNEL), si_addr: > > > > 0x0000000000000000 > > > > > > > >> This seems it may be related to: > > > >> https://bugs.openjdk.java.net/browse/JDK-8004124 > > > > > > > > Just wondering if this is same or something to do with GC > > specific. 
> > > > > > > > > > > > > > > > TIA > > > > Sundar > > > > > > > > > > > From stefan.karlsson at oracle.com Fri Feb 28 19:18:15 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 28 Feb 2020 20:18:15 +0100 Subject: Parallel GC Thread crash In-Reply-To: References: <5454bc87-1452-1402-3496-c3c8f128a499@oracle.com> <6afab3a3-92ab-f1bf-2022-9e21034cd28a@oracle.com> Message-ID: <90c436e6-c678-7e12-a03f-1ad835c6d667@oracle.com> Hi Sundar, On 2020-02-28 19:57, Sundara Mohan M wrote: > Hi?Stefan, > ? ? I tried running with?-XX:+VerifyBeforeGC and -XX:+VerifyAfterGC > but it seems some of the operations are timing out (ex. ssl connect > not sure if i have very low timeout or this flag increases the latency) The flag increase the latencies, because it runs extra verification checks in the pauses. > > Also from the crash error log i see following > ... > ? 0x00007f0588066000 GCTaskThread "ParGC Thread#20" [stack: > 0x00007ef5c4d13000,0x00007ef5c4e13000] [id=87534] > *=>0x00007f0588068000 GCTaskThread "ParGC Thread#21" [stack: > 0x00007ef5c4c12000,0x00007ef5c4d12000] [id=87535] > *? 0x00007f0588069800 GCTaskThread "ParGC Thread#22" [stack: > 0x00007ef5c4b11000,0x00007ef5c4c11000] [id=87536] > ... > Threads with active compile tasks: > > *VM state:at safepoint (normal execution) > * > > VM Mutex/Monitor currently owned by a thread: ?([mutex/lock_event]) > [0x00007f0588015750] Threads_lock - owner thread: 0x00007f0588112800 > [0x00007f0588016350] Heap_lock - owner thread: 0x00007ef4100f9000 > > Does this mean VM is in safepoint and executing GC operation or JVM > related activity which requires to be in safepoint (ie. not executing > user code) when it crashed? Yes, exactly. The Parallel GC does all work in a stop-the-world pause. StefanK > I am trying to see if library or application code is causing this > behaviour. 
> > Thanks > Sundar > > On Mon, Feb 10, 2020 at 3:13 PM Stefan Karlsson > > wrote: > > On 2020-02-10 20:53, Sundara Mohan M wrote: > > Hi Stefan, > > ? ? Yes we are trying to move to 13.0.2. Wanted to verify if anyone > > else seen this or upgrading will really solve this?problem. > > > > Can you share how to file a bug report for this? I don't have > access > > to https://bugs.openjdk.java.net/ > > There are directions in the hs_err crash file that points you to > the web > page to file a bug. > > You seem to be running AdoptJDK builds so your bug reports would > end up > at their system: > ?>? ? ?> # If you would like to submit a bug report, please visit: > ?>? ? ?> # https://github.com/AdoptOpenJDK/openjdk-build/issues > > > > If you were running with Oracle binaries you would get lines like > this: > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > > > > > > I will try to run with -XX:+VerifyBeforeGC and > -XX:+VerifyAfterGC to > > get more information. > > OK. Hopefully this gives us more information. > > StefanK > > > > > > Thanks > > Sundar > > > > On Mon, Feb 10, 2020 at 2:42 PM Stefan Karlsson > > > >> wrote: > > > >? ? ?Hi Sundar, > > > >? ? ?On 2020-02-10 19:32, Sundara Mohan M wrote: > >? ? ?> Hi?Stefan, > >? ? ?> ? ? We started seeing more crashes on JDK13.0.1+9 > >? ? ?> > >? ? ?> Since seeing it on GC Task Thread assumed it is related to GC. > > > >? ? ?As I said in my previous mail, I don't think this is caused > by GC > >? ? ?code. > >? ? ?More below. > > > >? ? ?> > >? ? ?> # Problematic frame: > >? ? ?> # V ?[libjvm.so+0xd183c0] > >? ? ??PSRootsClosure::do_oop(oopDesc**)+0x30 > >? ? ?> > >? ? ?> Command Line: -XX:+AlwaysPreTouch -Xms64000m -Xmx64000m > >? ? ?> -XX:NewSize=40000m -XX:+DisableExplicitGC -Xnoclassgc > >? ? ?> -XX:+UseParallelGC -XX:ParallelGCThreads=40 -XX:ConcGCTh > >? ? ?> reads=5 ... > >? ? ?> > >? ? 
?> Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, > 125G, > >? ? ?Red > >? ? ?> Hat Enterprise Linux Server release 6.10 (Santiago) > >? ? ?> Time: Fri Feb ?7 11:15:04 2020 UTC elapsed time: 286290 > seconds > >? ? ?(3d 7h > >? ? ?> 31m 30s) > >? ? ?> > >? ? ?> --------------- ?T H R E A D ?--------------- > >? ? ?> > >? ? ?> Current thread (0x00007fca6c074000): ?GCTaskThread "ParGC > >? ? ?Thread#28" > >? ? ?> [stack: 0x00007fba72ff1000,0x00007fba730f1000] [id=56530] > >? ? ?> > >? ? ?> Stack: [0x00007fba72ff1000,0x00007fba730f1000], > >? ? ?> ?sp=0x00007fba730ee850, ?free space=1014k > >? ? ?> Native frames: (J=compiled Java code, A=aot compiled Java > code, > >? ? ?> j=interpreted, Vv=VM code, C=native code) > >? ? ?> V ?[libjvm.so+0xd183c0] > >? ? ??PSRootsClosure::do_oop(oopDesc**)+0x30 > >? ? ?> V ?[libjvm.so+0xc6bf0b] ?OopMapSet::oops_do(frame const*, > >? ? ?RegisterMap > >? ? ?> const*, OopClosure*)+0x2eb > >? ? ?> V ?[libjvm.so+0x765489] ?frame::oops_do_internal(OopClosure*, > >? ? ?> CodeBlobClosure*, RegisterMap*, bool)+0x99 > >? ? ?> V ?[libjvm.so+0xf68b17] ?JavaThread::oops_do(OopClosure*, > >? ? ?> CodeBlobClosure*)+0x187 > >? ? ?> V ?[libjvm.so+0xd190be] > ?ThreadRootsTask::do_it(GCTaskManager*, > >? ? ?> unsigned int)+0x6e > >? ? ?> V ?[libjvm.so+0x7f422b] ?GCTaskThread::run()+0x1eb > >? ? ?> V ?[libjvm.so+0xf707fd] ?Thread::call_run()+0x10d > >? ? ?> V ?[libjvm.so+0xc875b7] ?thread_native_entry(Thread*)+0xe7 > >? ? ?> > >? ? ?> JavaThread 0x00007fb8f4036800 (nid = 60927) was being > processed > >? ? ?> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > >? ? ?> v ?~RuntimeStub::_new_array_Java > >? ? ?> J 58520 c2 > >? ? ?> > > > ?ch.qos.logback.classic.spi.ThrowableProxy.(Ljava/lang/Throwable;)V > > > >? ? ?> (207 bytes) @ 0x00007fca5fd23dec > >? ? ?[0x00007fca5fd1dbc0+0x000000000000622c] > >? ? ?> J 66864 c2 > webservice.exception.ExceptionLoggingWrapper.execute()V > >? ? ?> (1004 bytes) @ 0x00007fca60c02588 > >? ? 
?[0x00007fca60bffce0+0x00000000000028a8] > >? ? ?> J 58224 c2 > >? ? ?> > > > ?webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > >? ? ?> (105 bytes) @ 0x00007fca5f59bad8 > >? ? ?[0x00007fca5f59b880+0x0000000000000258] > >? ? ?> J 69992 c2 > >? ? ?> > > > ?webservice.exception.mapper.JediRequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; > > > >? ? ?> (9 bytes) @ 0x00007fca5e1019f4 > >? ? ?[0x00007fca5e101940+0x00000000000000b4] > >? ? ?> J 55265 c2 > >? ? ?> > > > ?webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; > > > >? ? ?> (332 bytes) @ 0x00007fca5f6f58e0 > >? ? ?[0x00007fca5f6f5700+0x00000000000001e0] > >? ? ?> J 483122 c2 > > ?webservice.filters.ResponseSerializationWorker.execute()Z > >? ? ?> (272 bytes) @ 0x00007fca622fc2b4 > >? ? ?[0x00007fca622fbc80+0x0000000000000634] > >? ? ?> J 15811% c2 > >? ? ?> > > > ?com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; > > > >? ? ?> (486 bytes) @ 0x00007fca5c108794 > >? ? ?[0x00007fca5c1082a0+0x00000000000004f4] > >? ? ?> j > >? ? ?> > > > ??com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1 > >? ? ?> J 4586 c1 java.util.concurrent.FutureTask.run()V > >? ? ?java.base at 13.0.1 (123 > >? ? ?> bytes) @ 0x00007fca54d27184 > [0x00007fca54d26b00+0x0000000000000684] > >? ? ?> J 7550 c1 > >? ? ?> > > > ?java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > > > >? ? ?> java.base at 13.0.1 (187 bytes) @ 0x00007fca54fbb6d4 > >? ? ?> [0x00007fca54fba8e0+0x0000000000000df4] > >? ? ?> J 7549 c1 > java.util.concurrent.ThreadPoolExecutor$Worker.run()V > >? ? ?> java.base at 13.0.1 (9 bytes) @ 0x00007fca5454b93c > >? ? ?> [0x00007fca5454b8c0+0x000000000000007c] > >? ? 
> > J 4585 c1 java.lang.Thread.run()V java.base@13.0.1 (17 bytes) @
> > 0x00007fca54d250f4 [0x00007fca54d24fc0+0x0000000000000134]
> > v  ~StubRoutines::call_stub
> >
> > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> > 0x0000000000000000
> >
> > Does JDK11 and 13 have different code for GC. Do you think
> > downgrading(JDK11 stable)/upgrading(JDK-13.0.2) might help here?
>
> You should at least move to 13.0.2, to get the latest bug
> fixes/patches.
>
> There has been a lot of changes in all areas of the JVM between 11 and
> 13. We don't yet know the root cause of this crash, and I can't say if
> this is caused by new changes or not. Have you or anyone filed a bug
> report for this?
>
> > Any insight to debug this will be helpful.
>
> Did you try my previous suggestion to run with -XX:+VerifyBeforeGC and
> -XX:+VerifyAfterGC? If you can tolerate the longer GC times it
> introduces, then you could try to run with
> -XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC -XX:+VerifyAfterGC .
>
> Cheers,
> StefanK
>
> > TIA
> > Sundar
> >
> > On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson wrote:
> >
> > > Hi Sundar,
> > >
> > > The GC crashes when it encounters something bad on the stack:
> > > > V  [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap
> > > > const*, OopClosure*)+0x2eb
> > > > V  [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*,
> > >
> > > This is probably not a GC bug. It's more likely that this is caused by
> > > the JIT compiler. I see in your hotspot-runtime-dev thread, that you
> > > also get crashes in other compiler related areas.
> > >
> > > If you want to rule out the GC, you can run with -XX:+VerifyBeforeGC and
> > > -XX:+VerifyAfterGC, and see if this asserts before the GC has started
> > > running.
> > >
> > > StefanK
> > >
> > > On 2020-02-04 04:38, Sundara Mohan M wrote:
> > > > Hi,
> > > >     I am seeing following crashes frequently on our servers
> > > > #
> > > > # A fatal error has been detected by the Java Runtime Environment:
> > > > #
> > > > #  SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299
> > > > #
> > > > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9)
> > > > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel
> > > > gc, linux-amd64)
> > > > # Problematic frame:
> > > > # V  [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> > > > #
> > > > # No core dump will be written. Core dumps have been disabled. To enable
> > > > # core dumping, try "ulimit -c unlimited" before starting Java again
> > > > #
> > > > # If you would like to submit a bug report, please visit:
> > > > #   https://github.com/AdoptOpenJDK/openjdk-build/issues
> > > > #
> > > >
> > > >
> > > > ---------------  T H R E A D ---------------
> > > >
> > > > Current thread (0x00007fca2c051000): GCTaskThread "ParGC Thread#8" [stack:
> > > > 0x00007fca30277000,0x00007fca30377000] [id=108299]
> > > >
> > > > Stack: [0x00007fca30277000,0x00007fca30377000], sp=0x00007fca30374890,
> > > >   free space=1014k
> > > > Native frames: (J=compiled Java code, A=aot compiled Java code,
> > > > j=interpreted, Vv=VM code, C=native code)
> > > > V  [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> > > > V  [libjvm.so+0xc6bf0b] OopMapSet::oops_do(frame const*, RegisterMap
> > > > const*, OopClosure*)+0x2eb
> > > > V  [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*,
> > > > CodeBlobClosure*, RegisterMap*, bool)+0x99
> > > > V  [libjvm.so+0xf68b17] JavaThread::oops_do(OopClosure*,
> > > > CodeBlobClosure*)+0x187
> > > > V  [libjvm.so+0xcce2f0] ThreadRootsMarkingTask::do_it(GCTaskManager*,
> > > > unsigned int)+0xb0
> > > > V  [libjvm.so+0x7f422b] GCTaskThread::run()+0x1eb
> > > > V  [libjvm.so+0xf707fd] Thread::call_run()+0x10d
> > > > V  [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7
> > > >
> > > > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed
> > > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> > > > v  ~RuntimeStub::_new_array_Java
> > > > J 225122 c2
> > > > ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V
> > > > (207 bytes) @ 0x00007fca21f1a5d8 [0x00007fca21f17f20+0x00000000000026b8]
> > > > J 62342 c2 webservice.exception.ExceptionLoggingWrapper.execute()V (1004
> > > > bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88]
> > > > J 225129 c2
> > > > webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response;
> > > > (105 bytes) @ 0x00007fca1da512ac [0x00007fca1da51100+0x00000000000001ac]
> > > > J 131643 c2
> > > > webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response;
> > > > (9 bytes) @ 0x00007fca20ce6190 [0x00007fca20ce60c0+0x00000000000000d0]
> > > > J 55114 c2
> > > > webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream;
> > > > (332 bytes) @ 0x00007fca2051fe64 [0x00007fca2051f820+0x0000000000000644]
> > > > J 57859 c2 webservice.filters.ResponseSerializationWorker.execute()Z (272
> > > > bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8]
> > > > J 16114% c2
> > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState;
> > > > (486 bytes) @ 0x00007fca1ced465c [0x00007fca1ced4200+0x000000000000045c]
> > > > j
> > > > com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1
> > > > J 11639 c2 java.util.concurrent.FutureTask.run()V java.base@13.0.1 (123
> > > > bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098]
> > > > J 7560 c1
> > > > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
> > > > java.base@13.0.1 (187 bytes) @ 0x00007fca15b23f54
> > > > [0x00007fca15b23160+0x0000000000000df4]
> > > > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V
> > > > java.base@13.0.1 (9 bytes) @ 0x00007fca15b39abc
> > > > [0x00007fca15b39a40+0x000000000000007c]
> > > > J 4488 c1 java.lang.Thread.run()V java.base@13.0.1 (17 bytes) @
> > > > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134]
> > > > v  ~StubRoutines::call_stub
> > > >
> > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> > > > 0x0000000000000000
> > > >
> > > > Register to memory mapping:
> > > > ...
> > > >
> > > > Can someone shed more info on when this can happen? I am seeing this on
> > > > multiple servers with Java 13.0.1+9 on RHEL6 servers.
> > > >
> > > > There was another thread in hotspot runtime where David Holmes pointed this
> > > >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> > > > 0x0000000000000000
> > > >
> > > >> This seems it may be related to:
> > > >> https://bugs.openjdk.java.net/browse/JDK-8004124
> > > >
> > > > Just wondering if this is same or something to do with GC specific.
> > > >
> > > > TIA
> > > > Sundar
> > >

From leo.korinth at oracle.com  Fri Feb 28 19:24:28 2020
From: leo.korinth at oracle.com (Leo Korinth)
Date: Fri, 28 Feb 2020 20:24:28 +0100
Subject: RFR: 8203239: [TESTBUG] remove vmTestbase/vm/gc/kind/parOld test
In-Reply-To: 
References: 
Message-ID: <22b9e055-ab0f-75fb-bebf-d9955db018fb@oracle.com>

On 21/02/2020 20:48, Leonid Mesnik wrote:
> Hi
>
> Could you please review following fix which removes parOld test. Test
> checks that ParOldGC is used if no GC is selected and new gen GC is
> PSYoungGen. Test is obsolete now and should be removed.
>
> webrev: http://cr.openjdk.java.net/~lmesnik/8203239/webrev.00/

Looks good to me (I am not a reviewer).

Thanks for cleaning up!
/Leo

> bug: https://bugs.openjdk.java.net/browse/JDK-8203239
>

From shade at redhat.com  Fri Feb 28 19:36:24 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 28 Feb 2020 20:36:24 +0100
Subject: RFR: 8203239: [TESTBUG] remove vmTestbase/vm/gc/kind/parOld test
In-Reply-To: <22b9e055-ab0f-75fb-bebf-d9955db018fb@oracle.com>
References: <22b9e055-ab0f-75fb-bebf-d9955db018fb@oracle.com>
Message-ID: <05e2aace-200a-a6f8-43bb-d8bcb8977eab@redhat.com>

On 2/28/20 8:24 PM, Leo Korinth wrote:
> On 21/02/2020 20:48, Leonid Mesnik wrote:
>> Hi
>>
>> Could you please review following fix which removes parOld test. Test
>> checks that ParOldGC is used if no GC is selected and new gen GC is
>> PSYoungGen. Test is obsolete now and should be removed.
>>
>> webrev: http://cr.openjdk.java.net/~lmesnik/8203239/webrev.00/
>
> Looks good to me (I am not a reviewer).

Looks good.

-- 
Thanks,
-Aleksey

From kim.barrett at oracle.com  Fri Feb 28 21:48:08 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Fri, 28 Feb 2020 16:48:08 -0500
Subject: RFR: 8240239: Replace ConcurrentGCPhaseManager
Message-ID: <4C14B89F-1550-44DE-B738-0DBEE7A2E167@oracle.com>

Please review this change which removes the ConcurrentGCPhaseManager
class and replaces it with ConcurrentGCBreakpoints. This is joint work
with Per Liden.

This change provides a client API, used by WhiteBox. The usage model
for a client is

(1) Acquire control of concurrent collection cycles.
(2) Do work that must be performed while the collection cycle is in a
known state.
(3) Request the concurrent collector run to a named "breakpoint", or
run to completion, and then hold there, waiting for further commands.
(4) Optionally goto (2).
(5) Release control of concurrent collection cycles.

Tests have been updated to use the new WhiteBox API.

This change provides implementations of the new mechanism for G1 and
ZGC. A Shenandoah implementation is being left to others, but we don't
see any obvious reason for it to be difficult.
CR:
https://bugs.openjdk.java.net/browse/JDK-8240239

Webrev:
https://cr.openjdk.java.net/~kbarrett/8240239/open.03/

To possibly simplify the review, the open patch is also provided as a
pair of patches, one for removing the old mechanism and a second to add
the new mechanism.

https://cr.openjdk.java.net/~kbarrett/8240239/remove_phase_control.03/
Removes ConcurrentGCPhaseManager and its G1 implementation, except that
tests are not modified.

https://cr.openjdk.java.net/~kbarrett/8240239/control.03/
Adds ConcurrentGCBreakpoints, with G1 and ZGC implementations, and
updates tests to use it.

Testing:
mach5 tier1-5, which includes all the updated and new tests.

From leonid.mesnik at oracle.com  Fri Feb 28 23:56:17 2020
From: leonid.mesnik at oracle.com (Leonid Mesnik)
Date: Fri, 28 Feb 2020 15:56:17 -0800
Subject: RFR: 8203239: [TESTBUG] remove vmTestbase/vm/gc/kind/parOld test
In-Reply-To: <05e2aace-200a-a6f8-43bb-d8bcb8977eab@redhat.com>
References: <22b9e055-ab0f-75fb-bebf-d9955db018fb@oracle.com>
 <05e2aace-200a-a6f8-43bb-d8bcb8977eab@redhat.com>
Message-ID: <70b6a9c4-42a7-0cf4-cd34-2c597df67eff@oracle.com>

Aleksey, Leo

Thank you for review.

Leonid

On 2/28/20 11:36 AM, Aleksey Shipilev wrote:
> On 2/28/20 8:24 PM, Leo Korinth wrote:
>> On 21/02/2020 20:48, Leonid Mesnik wrote:
>>> Hi
>>>
>>> Could you please review following fix which removes parOld test. Test
>>> checks that ParOldGC is used if no GC is selected and new gen GC is
>>> PSYoungGen. Test is obsolete now and should be removed.
>>>
>>> webrev: http://cr.openjdk.java.net/~lmesnik/8203239/webrev.00/
>> Looks good to me (I am not a reviewer).
> Looks good.
>
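[Editor's note: the five-step ConcurrentGCBreakpoints usage model from Kim Barrett's RFR above can be sketched in client code roughly as follows. This is a hedged stand-in, not the actual API from the webrev: the real interface is exposed through the HotSpot WhiteBox test class and drives a live concurrent collector, while the `GCBreakpointControl` class, its method names, and the breakpoint name below are invented here purely to make the acquire / run-to / release ordering concrete.]

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the ConcurrentGCBreakpoints client protocol described in
 * JDK-8240239. It only records the sequence of calls so the protocol
 * ordering can be demonstrated; no real collector is involved.
 */
class GCBreakpointControl {
    private final List<String> calls = new ArrayList<>();
    private boolean held = false;

    // (1) Acquire control of concurrent collection cycles.
    void acquireControl() {
        held = true;
        calls.add("acquire");
    }

    // (3) Request the collector run to a named breakpoint and hold there.
    void runTo(String breakpoint) {
        if (!held) throw new IllegalStateException("control not acquired");
        calls.add("runTo:" + breakpoint);
    }

    // (3, variant) Run the current cycle to completion and hold idle.
    void runToIdle() {
        if (!held) throw new IllegalStateException("control not acquired");
        calls.add("runToIdle");
    }

    // (5) Release control; the collector resumes normal scheduling.
    void releaseControl() {
        held = false;
        calls.add("release");
    }

    List<String> calls() {
        return calls;
    }
}

class BreakpointClientSketch {
    public static void main(String[] args) {
        GCBreakpointControl wb = new GCBreakpointControl();
        wb.acquireControl();                // (1)
        // (2) work that needs the collector in a known state goes here
        wb.runTo("AFTER MARKING STARTED");  // (3) breakpoint name is illustrative
        // (4) more work, then run the cycle out before releasing
        wb.runToIdle();
        wb.releaseControl();                // (5)
        System.out.println(String.join(" -> ", wb.calls()));
        // prints: acquire -> runTo:AFTER MARKING STARTED -> runToIdle -> release
    }
}
```

One consequence of this shape, visible even in the toy version, is that steps (2)-(4) can loop: a test can hold the collector at one breakpoint, inspect state, then ask it to advance to the next, which is exactly what the updated WhiteBox tests rely on.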