From jianglizhou at google.com Mon Jun 1 03:27:08 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Sun, 31 May 2020 20:27:08 -0700 Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC In-Reply-To: <13836d5c-db91-6e6a-5022-0e7585722f77@oracle.com> References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com> <13836d5c-db91-6e6a-5022-0e7585722f77@oracle.com> Message-ID: On Fri, May 29, 2020 at 10:44 PM Ioi Lam wrote: > > > > On 5/29/20 8:40 PM, Jiangli Zhou wrote: > > On Fri, May 29, 2020 at 7:30 PM Ioi Lam wrote: > >> https://bugs.openjdk.java.net/browse/JDK-8245925 > >> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/ > >> > >> > >> Summary: > >> > >> CDS supports archived heap objects only for G1. During -Xshare:dump, > >> CDS executes a full GC so that G1 will compact the heap regions, leaving > >> maximum contiguous free space at the top of the heap. Then, the archived > >> heap regions are allocated from the top of the heap. > >> > >> Under some circumstances, java.lang.ref.Cleaners will execute > >> after the GC has completed. The cleaners may allocate or synchronize, which > >> will cause G1 to allocate an EDEN region at the top of the heap. > > This is an interesting one. Please give more details on the > > circumstances under which java.lang.ref.Cleaners cause the issue. It's unclear to > > me why it hasn't been showing up before. > > Hi Jiangli, > > Thanks for the review. It's very helpful. > > The assert (see my comment in JDK-8245925) happened in my prototype for > JDK-8244778 > > http://cr.openjdk.java.net/~iklam/jdk15/8244778-archive-full-module-graph.v00.8/ > > I have to archive AppClassLoader and PlatformClassLoader, but need to > clear their URLClassPath field (the "ucp" field). See > clear_loader_states in metaspaceShared.cpp. Because of this, some > java.util.zip.ZipFiles referenced by the URLClassPath become garbage, > and their Cleaners are executed after full GC has finished. > I haven't looked at your 8244778-archive-full-module-graph change yet. If you are going to archive class loader objects, you probably want to go with a solution that scrubs fields that are 'not archivable' and then restores them at runtime. Sounds like you are going with that. When I worked on the initial implementation for system module object archiving, I implemented a static field scrubber with the goal of archiving class loaders. I didn't complete it as it was not yet needed, but the code is probably helpful for you now. I might have sent you the pointer to one of the versions at the time, but try looking under my old /home directory if it's still around. It might be good to trigger runtime field restoration by Java code; that's the part I haven't fully explored yet. But hopefully these inputs will be useful for your current work. > I think the bug has always existed, but is just never triggered because > we have not activated the Cleaners. > > > > >> > >> The fix is simple -- after CDS has entered a safepoint, if EDEN regions > >> exist, > >> exit the safepoint, run GC, and try again. Eventually all the cleaners will > >> be executed and no more allocation can happen. > >> > >> For safety, I limit the retry count to 30 (or about 9 seconds total). > >> > > I think it's better to skip the top allocated region(s) in such cases > > and avoid retrying. Dump time performance is important, as we are > > moving the cost from runtime to CDS dump time. 
It's desirable to keep > > the dump time cost as low as possible, so using CDS delivers better > > net gain overall. > > > > Here are some comments for your current webrev itself. > > > > 1611 static bool has_unwanted_g1_eden_regions() { > > 1612 #if INCLUDE_G1GC > > 1613 return HeapShared::is_heap_object_archiving_allowed() && UseG1GC && > > 1614 G1CollectedHeap::heap()->eden_regions_count() > 0; > > 1615 #else > > 1616 return false; > > 1617 #endif > > 1618 } > > > > You can remove 'UseG1GC' from line 1613, as > > is_heap_object_archiving_allowed() check already covers it: > > > > static bool is_heap_object_archiving_allowed() { > > CDS_JAVA_HEAP_ONLY(return (UseG1GC && UseCompressedOops && > > UseCompressedClassPointers);) > > NOT_CDS_JAVA_HEAP(return false;) > > } > > > > Please include heap archiving code under #if INCLUDE_CDS_JAVA_HEAP. > > It's better to extract the GC handling code in > > VM_PopulateDumpSharedSpace::doit() into a separate API in > > heapShared.*. > > > > It's time to enhance heap archiving to use a separate buffer when > > copying the objects at dump time (discussed before), as a longer term > > fix. I'll file a RFE. > > Thanks for reminding me. I think that is a better way to fix this > problem. It should be fairly easy to do, as we can already relocate the > heap regions using HeapShared::patch_archived_heap_embedded_pointers(). > Let me try to implement it. > Sounds good. Thanks for doing that. > BTW, the GC speed is very fast, because the heap is not used very much > during -Xshare:dump. -Xlog:gc shows: > > [0.259s][info][gc ] GC(0) Pause Full (Full GC for -Xshare:dump) > 4M->1M(32M) 8.220ms > > So we have allocated only 4MB of objects, and only 1MB of those are > reachable. > > Anyway, I think we can even avoid running the GC altogether. We can scan > for contiguous free space from the top of the heap (below the EDEN > region). If there's more contiguous free space than the current > allocated heap regions, we know for sure that we can archive all the > heap objects that we need without doing the GC. That can be done as an > RFE after copying the objects. It won't save much though (8ms out of > about 700ms of total -Xshare:dump time). Looks like we had similar thoughts about finding free heap regions for copying. Here are the details: Solution 1): Allocate a buffer (no specific memory location requirement) and copy the heap objects to the buffer. Additional buffers can be allocated when needed, and they don't need to form a consecutive block of memory. The pointers within the copied Java objects need to be computed from the Java heap top as the 'real heap address' (so their runtime positions are at the heap top), instead of the buffer address. Region verification code also needs to be updated to reflect how the pointers are computed now. Solution 2): Find a range (consecutive heap regions) within the Java heap that is free. Copy the archive objects to that range. The pointer update and verification are similar to solution 1). I think you can go with solution 1 now. Solution 2) has the benefit of not requiring additional memory for copying archived objects. That's important as I did run into insufficient memory at dump time in real use cases, so any memory saving at dump time is desirable. It's better to go with 2) when the size of archive range is known before copying. With the planned work for class pre-initialization and enhanced object archiving support, it will be able to obtain (or have a good estimate of) the total size before copying. 
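To make the pointer computation in solution 1) concrete, here is a minimal sketch of the rebasing both solutions rely on (the names are hypothetical and not taken from the actual CDS sources):

#include <stdint.h>

// Sketch only: rebase a pointer from its dump-time buffer position to the
// address the object will have when the archived block sits at the top of
// the heap. 'buffer_to_runtime_addr' and its parameters are illustrative.
static uintptr_t buffer_to_runtime_addr(uintptr_t obj_in_buffer,
                                        uintptr_t buffer_base,
                                        uintptr_t runtime_base) {
  // The object keeps its offset within the archived block; only the base
  // changes from the temporary buffer to the intended heap-top position.
  return runtime_base + (obj_in_buffer - buffer_base);
}

(The real pointer patching also has to handle compressed oops, since heap object archiving requires UseCompressedOops, and region boundaries; this sketch only shows the base-plus-offset idea.)
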
Solution 1 can be enhanced to use heap memory when that happens. I'll log these details as a RFE on Monday. Best, Jiangli > > I'll withdraw this patch for now, and will try to implement the object > copying. > > Thanks > - Ioi > > > > > Best, > > Jiangli > > > >> Thanks > >> - Ioi > >> > >> > >> > From jianglizhou at google.com Mon Jun 1 03:33:20 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Sun, 31 May 2020 20:33:20 -0700 Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC In-Reply-To: References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com> <13836d5c-db91-6e6a-5022-0e7585722f77@oracle.com> Message-ID: On Sun, May 31, 2020 at 8:27 PM Jiangli Zhou wrote: > > On Fri, May 29, 2020 at 10:44 PM Ioi Lam wrote: > > > > > > > > On 5/29/20 8:40 PM, Jiangli Zhou wrote: > > > On Fri, May 29, 2020 at 7:30 PM Ioi Lam wrote: > > >> https://bugs.openjdk.java.net/browse/JDK-8245925 > > >> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/ > > >> > > >> > > >> Summary: > > >> > > >> CDS supports archived heap objects only for G1. During -Xshare:dump, > > >> CDS executes a full GC so that G1 will compact the heap regions, leaving > > >> maximum contiguous free space at the top of the heap. Then, the archived > > >> heap regions are allocated from the top of the heap. > > >> > > >> Under some circumstances, java.lang.ref.Cleaners will execute > > >> after the GC has completed. The cleaners may allocate or synchronized, which > > >> will cause G1 to allocate an EDEN region at the top of the heap. > > > This is an interesting one. Please give more details on under what > > > circumstances java.lang.ref.Cleaners causes the issue. It's unclear to > > > me why it hasn't been showing up before. > > > > Hi Jiangli, > > > > Thanks for the review. It's very helpful. > > > > The assert (see my comment in JDK-8245925) happened in my prototype for > > JDK-8244778 > > > > http://cr.openjdk.java.net/~iklam/jdk15/8244778-archive-full-module-graph.v00.8/ > > > > I have to archive AppClassLoader and PlatformClassLoader, but need to > > clear their URLClassPath field (the "ucp" field). See > > clear_loader_states in metaspaceShared.cpp. Because of this, some > > java.util.zip.ZipFiles referenced by the URLClassPath become garbage, > > and their Cleaners are executed after full GC has finished. > > > > I haven't looked at your 8244778-archive-full-module-graph change yet, > if you are going to archive class loader objects, you probably want to > go with a solution that scrubs fields that are 'not archivable' and > then restores at runtime. Sounds like you are going with that. When I > worked on the initial implementation for system module object > archiving, I implemented static field scrubber with the goal for > archiving class loaders. I didn't complete it as it was not yet > needed, but the code probably is helpful for you now. I might have > sent you the pointer to one of the versions at the time, but try > looking under my old /home directory if it's still around. It might be > good to trigger runtime field restoration by Java code, that's the > part I haven't fully explored yet. But, hopefully these inputs would > be useful for your current work. > > > I think the bug has always existed, but is just never triggered because > > we have not activated the Cleaners. > > > > > > > >> > > >> The fix is simple -- after CDS has entered a safepoint, if EDEN regions > > >> exist, > > >> exit the safepoint, run GC, and try again. 
Eventually all the cleaners will > > >> be executed and no more allocation can happen. > > >> > > >> For safety, I limit the retry count to 30 (or about total 9 seconds). > > >> > > > I think it's better to skip the top allocated region(s) in such cases > > > and avoid retrying. Dump time performance is important, as we are > > > moving the cost from runtime to CDS dump time. It's desirable to keep > > > the dump time cost as low as possible, so using CDS delivers better > > > net gain overall. > > > > > > Here are some comments for your current webrev itself. > > > > > > 1611 static bool has_unwanted_g1_eden_regions() { > > > 1612 #if INCLUDE_G1GC > > > 1613 return HeapShared::is_heap_object_archiving_allowed() && UseG1GC && > > > 1614 G1CollectedHeap::heap()->eden_regions_count() > 0; > > > 1615 #else > > > 1616 return false; > > > 1617 #endif > > > 1618 } > > > > > > You can remove 'UseG1GC' from line 1613, as > > > is_heap_object_archiving_allowed() check already covers it: > > > > > > static bool is_heap_object_archiving_allowed() { > > > CDS_JAVA_HEAP_ONLY(return (UseG1GC && UseCompressedOops && > > > UseCompressedClassPointers);) > > > NOT_CDS_JAVA_HEAP(return false;) > > > } > > > > > > Please include heap archiving code under #if INCLUDE_CDS_JAVA_HEAP. > > > It's better to extract the GC handling code in > > > VM_PopulateDumpSharedSpace::doit() into a separate API in > > > heapShared.*. > > > > > > It's time to enhance heap archiving to use a separate buffer when > > > copying the objects at dump time (discussed before), as a longer term > > > fix. I'll file a RFE. > > > > Thanks for reminding me. I think that is a better way to fix this > > problem. It should be fairly easy to do, as we can already relocate the > > heap regions using HeapShared::patch_archived_heap_embedded_pointers(). > > Let me try to implement it. > > > > Sounds good. Thanks for doing that. > > > BTW, the GC speed is very fast, because the heap is not used very much > > during -Xshare:dump. -Xlog:gc shows: > > > > [0.259s][info][gc ] GC(0) Pause Full (Full GC for -Xshare:dump) > > 4M->1M(32M) 8.220ms > > > > So we have allocated only 4MB of objects, and only 1MB of those are > > reachable. > > > > Anyway, I think we can even avoid running the GC altogether. We can scan > > for contiguous free space from the top of the heap (below the EDEN > > region). If there's more contiguous free space than the current > > allocated heap regions, we know for sure that we can archive all the > > heap objects that we need without doing the GC. That can be done as an > > RFE after copying the objects. It won't save much though (8ms out of > > about 700ms of total -Xshare:dump time). > > Looks like we had similar thoughts about finding free heap regions for > copying. Here are the details: > > Solution 1): > Allocate a buffer (no specific memory location requirement) and copy > the heap objects to the buffer. Additional buffers can be allocated > when needed, and they don't need to form a consecutive block of > memory. The pointers within the copied Java objects need to be > computed from the Java heap top as the 'real heap address' (so their > runtime positions are at the heap top), instead of the buffer address. > Region verification code also needs to be updated to reflect how the > pointers are computed now. > > Solution 2): > Find a range (consecutive heap regions) within the Java heap that is > free. Copy the archive objects to that range. The pointer update and > verification are similar to solution 1). 
> > I think you can go with solution 1 now. Solution 2) has the benefit of > not requiring additional memory for copying archived objects. That's > important as I did run into insufficient memory at dump time in real > use cases, so any memory saving at dump time is desirable. Some clarifications to avoid confusion: The insufficient memory is due to memory restriction for builds in cloud environment. Thanks, Jiangli > It's better > to go with 2) when the size of archive range is known before copying. > With the planned work for class pre-initialization and enhanced object > archiving support, it will be able to obtain (or have a good estimate > of) the total size before copying. Solution 1 can be enhanced to use > heap memory when that happens. > > I'll log these details as a RFE on Monday. > > Best, > Jiangli > > > > > I'll withdraw this patch for now, and will try to implement the object > > copying. > > > > Thanks > > - Ioi > > > > > > > > Best, > > > Jiangli > > > > > >> Thanks > > >> - Ioi > > >> > > >> > > >> > > From per.liden at oracle.com Mon Jun 1 05:32:27 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 1 Jun 2020 07:32:27 +0200 Subject: RFR: 8245203/8245204/8245208: ZGC: Don't hold the ZPageAllocator lock while committing/uncommitting memory In-Reply-To: <8ea6dc02-c518-9b6e-6038-589bd9ed86b1@oracle.com> References: <8ea6dc02-c518-9b6e-6038-589bd9ed86b1@oracle.com> Message-ID: On 5/18/20 11:23 PM, Per Liden wrote: [...] > 3) 8245208: ZGC: Don't hold the ZPageAllocator lock while > committing/uncommitting memory > > We're currently holding the ZPageAllocator lock while performing a > number of expensive operations, such as committing and uncommitting > memory. This can have a very negative impact on latency, for example, > when a Java thread is trying to allocate a page from the page cache > while the ZUncommitter thread is uncommitting a portion of the heap. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8245208 > Webrev: http://cr.openjdk.java.net/~pliden/8245208/webrev.0 Updated webrev based on feedback received so far. I've split the updates into many smaller ones for easier reviewing. --------------- * Misc. smaller adjustments. http://cr.openjdk.java.net/~pliden/8245208/webrev.1-misc/ * Don't use CollectedHeap::total_collections() in ZPageAllocation. http://cr.openjdk.java.net/~pliden/8245208/webrev.1-total_collections/ * Don't fiddle with MinHeapSize/InitialHeapSize after initialization. http://cr.openjdk.java.net/~pliden/8245208/webrev.1-min_initial_heap_size/ * Introduce "claimed" instead of fiddling with "used" when uncommitting. http://cr.openjdk.java.net/~pliden/8245208/webrev.1-claimed/ * Adjust JFR events. http://cr.openjdk.java.net/~pliden/8245208/webrev.1-events/ * Introduce ZConditionLock. http://cr.openjdk.java.net/~pliden/8245208/webrev.1-conditionlock/ * Restructure the ZUncommitter. http://cr.openjdk.java.net/~pliden/8245208/webrev.1-zuncommitter/ * Introduce ZUnmapper to asynchronous unmap pages (broken out into a separate bug, JDK-8246220) http://cr.openjdk.java.net/~pliden/8246220/webrev.0/ --------------- And finally, here's a combined diff, with all of the above patches: http://cr.openjdk.java.net/~pliden/8245208/webrev.1 --------------- Testing: Multiple runs of Tier1-6, multiple iterations of gc-test-suite. 
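For readers skimming the archive, the shape of the change being reviewed is roughly the following; a simplified sketch with made-up helper names, not the actual ZGC code (ZLocker/ZLock are real ZGC utilities, the rest is hypothetical):

void alloc_page_slow_path() {
  ZMemoryRange range;                    // hypothetical type
  {
    ZLocker<ZLock> locker(&_lock);       // cheap bookkeeping under the lock
    range = claim_range_to_commit();     // hypothetical helper
  }
  commit_memory(range);                  // expensive syscall, lock not held
  {
    ZLocker<ZLock> locker(&_lock);
    publish_committed_range(range);      // hypothetical helper
  }
}

The point of the pattern is that a Java thread allocating a page from the page cache never has to wait for a commit or uncommit syscall issued by another thread.
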
cheers, Per From per.liden at oracle.com Mon Jun 1 08:24:35 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 1 Jun 2020 10:24:35 +0200 Subject: RFR: 8246134: ZGC: Restructure hs_err sections In-Reply-To: <4c78bf70-3705-b9a6-6648-cce53138fd3b@oracle.com> References: <4c78bf70-3705-b9a6-6648-cce53138fd3b@oracle.com> Message-ID: On 5/29/20 12:13 PM, Stefan Karlsson wrote: > Hi all, > > Please review this small patch to restructure and cleanup the > information ZGC prints to hs_err files (and jcmd VM.info). > > https://cr.openjdk.java.net/~stefank/8246134/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8246134 > > The patch: > - Moves the Page Table dumping later, to make it easier to find the > other sections > - Pretty print some info > - Add barrier set print (mostly to get rid of awkward double new lines) > - Update titles and cleanup newlines Looks good. Some nits: 1) How about we move z_global_phase_string() to zGlobal.hpp/cpp and call it e.g. ZGlobalPhaseToString()? 2) I find code like this unnecessarily hard to read: + switch (ZGlobalPhase) { + case ZPhaseMark: return "Mark"; + case ZPhaseMarkCompleted: return "MarkCompleted"; + case ZPhaseRelocate: return "Relocate"; + default: assert(false, "Unknown ZGlobalPhase"); return "Unknown"; How about: switch (ZGlobalPhase) { case ZPhaseMark: return "Mark"; case ZPhaseMarkCompleted: return "MarkCompleted"; case ZPhaseRelocate: return "Relocate"; default: assert(false, "Unknown ZGlobalPhase"); return "Unknown"; } 3) I see it was like this before your change, but how about removing the extra space on all print_cr-lines, for example: 317 st->print_cr( "ZGC Globals:"); to: 317 st->print_cr("ZGC Globals:"); It also looks like the argument indentation here is off by one: 320 st->print_cr( " Offset Max: " SIZE_FORMAT "%s (" PTR_FORMAT ")", 321 byte_size_in_exact_unit(ZAddressOffsetMax), 322 exact_unit_for_byte_size(ZAddressOffsetMax), 323 ZAddressOffsetMax); 321 byte_size_in_exact_unit(ZAddressOffsetMax), 322 exact_unit_for_byte_size(ZAddressOffsetMax), 323 ZAddressOffsetMax); cheers, Per > > Thanks, > StefanK From shade at redhat.com Mon Jun 1 10:24:57 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 1 Jun 2020 12:24:57 +0200 Subject: RFR (S) 8246100: Shenandoah: walk roots in more efficient order Message-ID: <4a733745-6f94-b6aa-b8c9-8daedace22f2@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8246100 See the rationale in the RFE. Webrev: https://cr.openjdk.java.net/~shade/8246100/webrev.02/ Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From shade at redhat.com Mon Jun 1 10:38:44 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 1 Jun 2020 12:38:44 +0200 Subject: RFR (S) 8246097: Shenandoah: limit parallelism in CLDG root handling Message-ID: <31edeb90-b199-0ba2-e782-c38792d3bb4c@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8246097 See the details in the RFE. Webrev: https://cr.openjdk.java.net/~shade/8246097/webrev.02/ Testing: hotspot_gc_shenandoah, benchmarks. 
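To illustrate the kind of bound being discussed, here is a minimal sketch in plain C++ with invented names; it is not the actual Shenandoah code:

#include <atomic>

static std::atomic<int> cldg_slots{2};   // hypothetical cap: at most two workers

void worker_scan_roots() {
  if (cldg_slots.fetch_sub(1, std::memory_order_acq_rel) > 0) {
    // Only the first arrivals (up to the cap) walk the CLDG roots.
    scan_cldg_roots();                   // hypothetical phase function
  }
  scan_parallel_roots();                 // hypothetical: root groups that scale
}

The counter would be reset between GC cycles; workers beyond the cap skip straight to root groups that actually benefit from full parallelism.
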
-- Thanks, -Aleksey From stefan.karlsson at oracle.com Mon Jun 1 13:18:30 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 1 Jun 2020 15:18:30 +0200 Subject: RFR: 8246134: ZGC: Restructure hs_err sections In-Reply-To: References: <4c78bf70-3705-b9a6-6648-cce53138fd3b@oracle.com> Message-ID: Updated webrev after Per's comments: https://cr.openjdk.java.net/~stefank/8246134/webrev.02 StefanK On 2020-06-01 10:24, Per Liden wrote: > On 5/29/20 12:13 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this small patch to restructure and cleanup the >> information ZGC prints to hs_err files (and jcmd VM.info). >> >> https://cr.openjdk.java.net/~stefank/8246134/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8246134 >> >> The patch: >> - Moves the Page Table dumping later, to make it easier to find the >> other sections >> - Pretty print some info >> - Add barrier set print (mostly to get rid of awkward double new lines) >> - Update titles and cleanup newlines > > Looks good. Some nits: > > 1) How about we move z_global_phase_string() to zGlobal.hpp/cpp and > call it e.g. ZGlobalPhaseToString()? > > 2) I find code like this unnecessarily hard to read:
>
> +  switch (ZGlobalPhase) {
> +  case ZPhaseMark: return "Mark";
> +  case ZPhaseMarkCompleted: return "MarkCompleted";
> +  case ZPhaseRelocate: return "Relocate";
> +  default: assert(false, "Unknown ZGlobalPhase"); return "Unknown";
>
> How about:
>
> switch (ZGlobalPhase) {
> case ZPhaseMark:
>   return "Mark";
>
> case ZPhaseMarkCompleted:
>   return "MarkCompleted";
>
> case ZPhaseRelocate:
>   return "Relocate";
>
> default:
>   assert(false, "Unknown ZGlobalPhase");
>   return "Unknown";
> }
>
> 3) I see it was like this before your change, but how about removing > the extra space on all print_cr-lines, for example:
>
>  317   st->print_cr( "ZGC Globals:");
>
> to:
>
>  317   st->print_cr("ZGC Globals:");
>
> It also looks like the argument indentation here is off by one:
>
>  320   st->print_cr( " Offset Max:        " SIZE_FORMAT "%s (" PTR_FORMAT ")",
>  321                 byte_size_in_exact_unit(ZAddressOffsetMax),
>  322                 exact_unit_for_byte_size(ZAddressOffsetMax),
>  323                 ZAddressOffsetMax);
>  321                byte_size_in_exact_unit(ZAddressOffsetMax),
>  322                exact_unit_for_byte_size(ZAddressOffsetMax),
>  323                ZAddressOffsetMax);
>
> cheers, > Per > >> >> Thanks, >> StefanK From per.liden at oracle.com Mon Jun 1 14:00:19 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 1 Jun 2020 16:00:19 +0200 Subject: RFR: 8246134: ZGC: Restructure hs_err sections In-Reply-To: References: Message-ID: Looks good! /Per > On 1 Jun 2020, at 15:18, Stefan Karlsson wrote: > > Updated webrev after Per's comments: > https://cr.openjdk.java.net/~stefank/8246134/webrev.02 > > StefanK > >> On 2020-06-01 10:24, Per Liden wrote: >>> On 5/29/20 12:13 PM, Stefan Karlsson wrote: >>> Hi all, >>> >>> Please review this small patch to restructure and cleanup the information ZGC prints to hs_err files (and jcmd VM.info). >>> >>> https://cr.openjdk.java.net/~stefank/8246134/webrev.01/ >>> https://bugs.openjdk.java.net/browse/JDK-8246134 >>> >>> The patch: >>> - Moves the Page Table dumping later, to make it easier to find the other sections >>> - Pretty print some info >>> - Add barrier set print (mostly to get rid of awkward double new lines) >>> - Update titles and cleanup newlines >> >> Looks good. Some nits: >> >> 1) How about we move z_global_phase_string() to zGlobal.hpp/cpp and call it e.g. ZGlobalPhaseToString()? >> >> 2) I find code like this unnecessarily hard to read:
>>
>> +  switch (ZGlobalPhase) {
>> +  case ZPhaseMark: return "Mark";
>> +  case ZPhaseMarkCompleted: return "MarkCompleted";
>> +  case ZPhaseRelocate: return "Relocate";
>> +  default: assert(false, "Unknown ZGlobalPhase"); return "Unknown";
>>
>> How about:
>>
>> switch (ZGlobalPhase) {
>> case ZPhaseMark:
>>   return "Mark";
>>
>> case ZPhaseMarkCompleted:
>>   return "MarkCompleted";
>>
>> case ZPhaseRelocate:
>>   return "Relocate";
>>
>> default:
>>   assert(false, "Unknown ZGlobalPhase");
>>   return "Unknown";
>> }
>>
>> 3) I see it was like this before your change, but how about removing the extra space on all print_cr-lines, for example:
>>
>>  317   st->print_cr( "ZGC Globals:");
>>
>> to:
>>
>>  317   st->print_cr("ZGC Globals:");
>>
>> It also looks like the argument indentation here is off by one:
>>
>>  320   st->print_cr( " Offset Max:        " SIZE_FORMAT "%s (" PTR_FORMAT ")",
>>  321                 byte_size_in_exact_unit(ZAddressOffsetMax),
>>  322                 exact_unit_for_byte_size(ZAddressOffsetMax),
>>  323                 ZAddressOffsetMax);
>>  321                byte_size_in_exact_unit(ZAddressOffsetMax),
>>  322                exact_unit_for_byte_size(ZAddressOffsetMax),
>>  323                ZAddressOffsetMax);
>>
>> cheers, >> Per >> >>> >>> Thanks, >>> StefanK > From stefan.karlsson at oracle.com Mon Jun 1 14:38:43 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 1 Jun 2020 16:38:43 +0200 Subject: RFR: 8246135: ZGC: Save important log lines and print them when dumping hs_err files In-Reply-To: References: Message-ID: Updated webrev: https://cr.openjdk.java.net/~stefank/8246135/webrev.02 StefanJ asked if I could make this a utility that other GCs could use as well. I've moved the functionality to gc/shared/gcLogPrecious.[hc]pp, but I haven't implemented this for the other GCs. That part is left for separate RFEs. Thanks, StefanK On 2020-05-29 12:23, Stefan Karlsson wrote: > Hi all, > > Please review this patch to save some of the important ZGC log lines > and print them when dumping hs_err files. > > https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8246135 > > The patch adds a concept of "precious" log lines. What's typically > logged are GC initialization lines, but also error messages are saved. > These lines are then dumped in the hs_err file if the JVM crashes or > hits an assert. The lines can also be printed in a debugger to get a > quick overview when debugging. > > The precious lines are always saved, but just like any other Unified > Logging calls, only logged if the tags are enabled. > > The patch builds on the JDK-8246134 patch. The hs_err output looks > like this:
>
> ZGC Precious Log:
>  NUMA Support: Disabled
>  CPUs: 8 total, 8 available
>  Memory: 16384M
>  Large Page Support: Disabled
>  Medium Page Size: 32M
>  Workers: 5 parallel, 1 concurrent
>  Address Space Type: Contiguous/Unrestricted/Complete
>  Address Space Size: 65536M x 3 = 196608M
>  Min Capacity: 42M
>  Initial Capacity: 256M
>  Max Capacity: 4096M
>  Max Reserve: 42M
>  Pre-touch: Disabled
>  Uncommit: Enabled
>  Uncommit Delay: 300s
>  Runtime Workers: 5 parallel
>
> ZGC Globals:
>  GlobalPhase:       2 (Relocate)
>  GlobalSeqNum:      1
>  Offset Max:        4096G (0x0000040000000000)
>  Page Size Small:   2M
>  Page Size Medium:  32M
>
> ZGC Metadata Bits:
>  Good:              0x0000100000000000
>  Bad:               0x00002c0000000000
>  WeakBad:           0x00000c0000000000
>  Marked:            0x0000040000000000
>  Remapped:          0x0000100000000000
>
> Heap:
>  ZHeap           used 12M, capacity 256M, max capacity 4096M
>  Metaspace       used 6501K, capacity 6615K, committed 6784K, reserved 1056768K
>   class space    used 559K, capacity 588K, committed 640K, reserved 1048576K
>
> ZGC Page Table:
>  Small   0x0000000000000000 0x0000000000200000 0x0000000000200000 Allocating
>  Small   0x0000000000200000 0x0000000000240000 0x0000000000400000 Allocating
>  Small   0x0000000000400000 0x0000000000600000 0x0000000000600000 Allocating
>  Small   0x0000000000600000 0x0000000000800000 0x0000000000800000 Allocating
>  Small   0x0000000000800000 0x00000000009c0000 0x0000000000a00000 Allocating
>  Small   0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 Allocating
>
> Thanks, > StefanK From stefan.karlsson at oracle.com Mon Jun 1 15:55:10 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 1 Jun 2020 17:55:10 +0200 Subject: RFR: 8246258: Enable hs_err heap printing earlier during initialization Message-ID: Hi all, Please review this patch to enable the hs_err GC / heap printing directly after the heap has been set up. https://cr.openjdk.java.net/~stefank/8246258/webrev.01/ https://bugs.openjdk.java.net/browse/JDK-8246258 Changes in the patch: - Remove the Universe::is_fully_initialized - Add NULL initializations and checks in print paths I tested this patch by adding a temporary fatal(...) here:

jint Universe::initialize_heap() {
  assert(_collectedHeap == NULL, "Heap already created");
  _collectedHeap = GCConfig::arguments()->create_heap();
  // <<<< HERE
  log_info(gc)("Using %s", _collectedHeap->name());
  return _collectedHeap->initialize();
}

and manually looking at the result when running with all GCs. Will run this through tier1-3. Thanks, StefanK From thomas.stuefe at gmail.com Mon Jun 1 16:04:58 2020 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Mon, 1 Jun 2020 18:04:58 +0200 Subject: RFR: 8246258: Enable hs_err heap printing earlier during initialization In-Reply-To: References: Message-ID: Hi Stefan, looks good. Note that there are tests which test very early error handling (see TestVeryEarlyAssert), you could extend that to guard against bitrot. But I am fine with the patch as it is. Cheers, Thomas On Mon, Jun 1, 2020 at 5:57 PM Stefan Karlsson wrote: > Hi all, > > Please review this patch to enable the hs_err GC / heap printing > directly after the heap has been set up. > > https://cr.openjdk.java.net/~stefank/8246258/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8246258 > > Changes in the patch: > - Remove the Universe::is_fully_initialized > - Add NULL initializations and checks in print paths > > I tested this patch by adding a temporary fatal(...) here:
>
> jint Universe::initialize_heap() {
>   assert(_collectedHeap == NULL, "Heap already created");
>   _collectedHeap = GCConfig::arguments()->create_heap();
>   // <<<< HERE
>   log_info(gc)("Using %s", _collectedHeap->name());
>   return _collectedHeap->initialize();
> }
>
> and manually looking at the result when running with all GCs. Will run > this through tier1-3. 
> > Thanks, > StefanK > From per.liden at oracle.com Mon Jun 1 17:06:31 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 1 Jun 2020 19:06:31 +0200 Subject: RFR: 8245203/8245204/8245208: ZGC: Don't hold the ZPageAllocator lock while committing/uncommitting memory In-Reply-To: References: <8ea6dc02-c518-9b6e-6038-589bd9ed86b1@oracle.com> Message-ID: <49447851-8f64-8c99-8443-5b400e1851c0@oracle.com> On 6/1/20 7:32 AM, Per Liden wrote: > On 5/18/20 11:23 PM, Per Liden wrote: > [...] >> 3) 8245208: ZGC: Don't hold the ZPageAllocator lock while >> committing/uncommitting memory >> >> We're currently holding the ZPageAllocator lock while performing a >> number of expensive operations, such as committing and uncommitting >> memory. This can have a very negative impact on latency, for example, >> when a Java thread is trying to allocate a page from the page cache >> while the ZUncommitter thread is uncommitting a portion of the heap. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8245208 >> Webrev: http://cr.openjdk.java.net/~pliden/8245208/webrev.0 > Another update, after receiving some more comments from Stefan and Erik: * 8245208: ZGC: Don't hold the ZPageAllocator lock while committing/uncommitting memory Full: http://cr.openjdk.java.net/~pliden/8245208/webrev.2/ Diff: http://cr.openjdk.java.net/~pliden/8245208/webrev.2-diff/ * 8246220: ZGC: Introduce ZUnmapper to asynchronous unmap pages Full: http://cr.openjdk.java.net/~pliden/8246220/webrev.1/ Diff: http://cr.openjdk.java.net/~pliden/8246220/webrev.1-diff/ ... and the above patches sit on top of these, which have not been modified in this round: * 8246265: ZGC: Introduce ZConditionLock http://cr.openjdk.java.net/~pliden/8246265/webrev.0/ * 8245204: ZGC: Introduce ZListRemoveIterator http://cr.openjdk.java.net/~pliden/8245204/webrev.0/ * 8245203: ZGC: Don't track size in ZPhysicalMemoryBacking http://cr.openjdk.java.net/~pliden/8245203/webrev.0/ cheers, Per From zgu at redhat.com Mon Jun 1 18:23:15 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 1 Jun 2020 14:23:15 -0400 Subject: [15] RFR 8245961: Shenandoah: move some root marking to concurrent phase Message-ID: <61efdf04-5d74-8a59-a193-576b115738ef@redhat.com> Please review this patch that moves the following root scanning to the concurrent phase: 1) JNIHandle roots 2) VM global roots 3) CLD roots. It also consolidates code cache root marking. Bug: https://bugs.openjdk.java.net/browse/JDK-8245961 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8245961/webrev.00/ Test: hotspot_gc_shenandoah Thanks, -Zhengyu From stefan.johansson at oracle.com Mon Jun 1 18:50:50 2020 From: stefan.johansson at oracle.com (stefan.johansson at oracle.com) Date: Mon, 1 Jun 2020 20:50:50 +0200 Subject: RFR: 8246135: ZGC: Save important log lines and print them when dumping hs_err files In-Reply-To: References: Message-ID: <9f501f2c-bae7-16c5-f165-62ce2bcb59e6@oracle.com> Hi Stefan, On 2020-06-01 16:38, Stefan Karlsson wrote: > Updated webrev: > https://cr.openjdk.java.net/~stefank/8246135/webrev.02 Looks good, thanks for making the functionality shared. Filed https://bugs.openjdk.java.net/browse/JDK-8246272 for the other GCs. Cheers, Stefan > > StefanJ asked if I could make this a utility that other GCs could use as > well. I've moved the functionality to gc/shared/gcLogPrecious.[hc]pp, > but I haven't implemented this for the other GCs. That part is left for > separate RFEs. 
> > Thanks, > StefanK > > On 2020-05-29 12:23, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to save some of the important ZGC log lines >> and print them when dumping hs_err files. >> >> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8246135 >> >> The patch adds a concept of "precious" log lines. What's typically >> logged are GC initialization lines, but also error messages are saved. >> These lines are then dumped in the hs_err file if the JVM crashes or >> hits an assert. The lines can also be printed in a debugger to get a >> quick overview when debugging. >> >> The precious lines are always saved, but just like any other Unified >> Logging calls, only logged if the tags are enabled. >> >> The patch builds on the JDK-8246134 patch. The hs_err output looks >> like this:
>>
>> ZGC Precious Log:
>>  NUMA Support: Disabled
>>  CPUs: 8 total, 8 available
>>  Memory: 16384M
>>  Large Page Support: Disabled
>>  Medium Page Size: 32M
>>  Workers: 5 parallel, 1 concurrent
>>  Address Space Type: Contiguous/Unrestricted/Complete
>>  Address Space Size: 65536M x 3 = 196608M
>>  Min Capacity: 42M
>>  Initial Capacity: 256M
>>  Max Capacity: 4096M
>>  Max Reserve: 42M
>>  Pre-touch: Disabled
>>  Uncommit: Enabled
>>  Uncommit Delay: 300s
>>  Runtime Workers: 5 parallel
>>
>> ZGC Globals:
>>  GlobalPhase:       2 (Relocate)
>>  GlobalSeqNum:      1
>>  Offset Max:        4096G (0x0000040000000000)
>>  Page Size Small:   2M
>>  Page Size Medium:  32M
>>
>> ZGC Metadata Bits:
>>  Good:              0x0000100000000000
>>  Bad:               0x00002c0000000000
>>  WeakBad:           0x00000c0000000000
>>  Marked:            0x0000040000000000
>>  Remapped:          0x0000100000000000
>>
>> Heap:
>>  ZHeap           used 12M, capacity 256M, max capacity 4096M
>>  Metaspace       used 6501K, capacity 6615K, committed 6784K, reserved 1056768K
>>   class space    used 559K, capacity 588K, committed 640K, reserved 1048576K
>>
>> ZGC Page Table:
>>  Small   0x0000000000000000 0x0000000000200000 0x0000000000200000 Allocating
>>  Small   0x0000000000200000 0x0000000000240000 0x0000000000400000 Allocating
>>  Small   0x0000000000400000 0x0000000000600000 0x0000000000600000 Allocating
>>  Small   0x0000000000600000 0x0000000000800000 0x0000000000800000 Allocating
>>  Small   0x0000000000800000 0x00000000009c0000 0x0000000000a00000 Allocating
>>  Small   0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 Allocating
>>
>> Thanks, >> StefanK From zgu at redhat.com Mon Jun 1 19:03:47 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 1 Jun 2020 15:03:47 -0400 Subject: RFR (S) 8246100: Shenandoah: walk roots in more efficient order In-Reply-To: <4a733745-6f94-b6aa-b8c9-8daedace22f2@redhat.com> References: <4a733745-6f94-b6aa-b8c9-8daedace22f2@redhat.com> Message-ID: <3a8331ad-9ec1-dda9-058d-6edcdd594e4d@redhat.com> Looks good in general. But I am not sure why the vm/weak/dedup roots are called limited-parallel; I think they are fully parallel. -Zhengyu On 6/1/20 6:24 AM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8246100 > > See the rationale in the RFE. 
> > Webrev: > https://cr.openjdk.java.net/~shade/8246100/webrev.02/ > > Testing: hotspot_gc_shenandoah > From zgu at redhat.com Mon Jun 1 19:26:43 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 1 Jun 2020 15:26:43 -0400 Subject: RFR (S) 8246100: Shenandoah: walk roots in more efficient order In-Reply-To: <3a8331ad-9ec1-dda9-058d-6edcdd594e4d@redhat.com> References: <4a733745-6f94-b6aa-b8c9-8daedace22f2@redhat.com> <3a8331ad-9ec1-dda9-058d-6edcdd594e4d@redhat.com> Message-ID: Wait, should we process CLDG early? CLD likely uneven, e.g. boot class loader may take a lot longer than others. Thanks, -Zhengyu On 6/1/20 3:03 PM, Zhengyu Gu wrote: > Looks good in general. But not sure why call vm/weak/dedup roots limited > parallel, I think they are fully parallel. > > -Zhengyu > > On 6/1/20 6:24 AM, Aleksey Shipilev wrote: >> RFE: >> ?? https://bugs.openjdk.java.net/browse/JDK-8246100 >> >> See the rationale in the RFE. >> >> Webrev: >> ?? https://cr.openjdk.java.net/~shade/8246100/webrev.02/ >> >> Testing: hotspot_gc_shenandoah >> From per.liden at oracle.com Mon Jun 1 20:08:47 2020 From: per.liden at oracle.com (Per Liden) Date: Mon, 1 Jun 2020 22:08:47 +0200 Subject: RFR: 8245203/8245204/8245208: ZGC: Don't hold the ZPageAllocator lock while committing/uncommitting memory In-Reply-To: <49447851-8f64-8c99-8443-5b400e1851c0@oracle.com> References: <8ea6dc02-c518-9b6e-6038-589bd9ed86b1@oracle.com> <49447851-8f64-8c99-8443-5b400e1851c0@oracle.com> Message-ID: <06db729e-d804-a0b4-262a-aa70181d904b@oracle.com> On 6/1/20 7:06 PM, Per Liden wrote: > > On 6/1/20 7:32 AM, Per Liden wrote: >> On 5/18/20 11:23 PM, Per Liden wrote: >> [...] >>> 3) 8245208: ZGC: Don't hold the ZPageAllocator lock while >>> committing/uncommitting memory >>> >>> We're currently holding the ZPageAllocator lock while performing a >>> number of expensive operations, such as committing and uncommitting >>> memory. This can have a very negative impact on latency, for example, >>> when a Java thread is trying to allocate a page from the page cache >>> while the ZUncommitter thread is uncommitting a portion of the heap. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8245208 >>> Webrev: http://cr.openjdk.java.net/~pliden/8245208/webrev.0 >> > > Another update, after receiving some more comments from Stefan and Erik: > > * 8245208: ZGC: Don't hold the ZPageAllocator lock while > committing/uncommitting memory > Full: http://cr.openjdk.java.net/~pliden/8245208/webrev.2/ > Diff: http://cr.openjdk.java.net/~pliden/8245208/webrev.2-diff/ Just a heads up. I found a nicer way to structure the uncommit_run() function and updated the webrev in-place. cheers, Per > > * 8246220: ZGC: Introduce ZUnmapper to asynchronous unmap pages > Full: http://cr.openjdk.java.net/~pliden/8246220/webrev.1/ > Diff: http://cr.openjdk.java.net/~pliden/8246220/webrev.1-diff/ > > > ... 
and the above patches sits on top of these, which have not been > modified in this round: > > * 8246265: ZGC: Introduce ZConditionLock > http://cr.openjdk.java.net/~pliden/8246265/webrev.0/ > > * 8245204: ZGC: Introduce ZListRemoveIterator > http://cr.openjdk.java.net/~pliden/8245204/webrev.0/ > > * 8245203: ZGC: Don't track size in ZPhysicalMemoryBacking > http://cr.openjdk.java.net/~pliden/8245203/webrev.0/ > > cheers, > Per From luoziyi at amazon.com Mon Jun 1 23:12:25 2020 From: luoziyi at amazon.com (Luo, Ziyi) Date: Mon, 1 Jun 2020 23:12:25 +0000 Subject: RFR (S) 8246274: G1 old gen allocation tracking is not in a separate class Message-ID: <1D252CA6-AEF8-4E57-9B6C-37AACE9D7EC1@amazon.com> Hi, Could you please review this change which refactors G1Policy::_bytes_allocated_in_old_since_last_gc into a dedicated new tracking class G1OldGenAllocationTracker? Bug ID: https://bugs.openjdk.java.net/browse/JDK-8246274 Webrev: http://cr.openjdk.java.net/~phh/8246274/webrev.00/ Testing: Local run hotspot:tier1. This is the first step toward improving the G1 old gen allocation tracking. As described in JDK-8245511, we will further add humongous allocation tracking and refactor G1IHOPControl::update_allocation_info(). This is a clean refactoring of the original G1Policy::_bytes_allocated_in_old_since_last_gc field and G1Policy::add_bytes_allocated_in_old_since_last_gc() method. Thanks, Ziyi From ioi.lam at oracle.com Tue Jun 2 02:28:25 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Mon, 1 Jun 2020 19:28:25 -0700 Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC In-Reply-To: References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com> <13836d5c-db91-6e6a-5022-0e7585722f77@oracle.com> Message-ID: <726b3445-9531-a3c3-7629-2152bedce2d1@oracle.com> On 5/31/20 8:33 PM, Jiangli Zhou wrote: > On Sun, May 31, 2020 at 8:27 PM Jiangli Zhou wrote: >> On Fri, May 29, 2020 at 10:44 PM Ioi Lam wrote: >>> >>> >>> On 5/29/20 8:40 PM, Jiangli Zhou wrote: >>>> On Fri, May 29, 2020 at 7:30 PM Ioi Lam wrote: >>>>> https://bugs.openjdk.java.net/browse/JDK-8245925 >>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/ >>>>> >>>>> >>>>> Summary: >>>>> >>>>> CDS supports archived heap objects only for G1. During -Xshare:dump, >>>>> CDS executes a full GC so that G1 will compact the heap regions, leaving >>>>> maximum contiguous free space at the top of the heap. Then, the archived >>>>> heap regions are allocated from the top of the heap. >>>>> >>>>> Under some circumstances, java.lang.ref.Cleaners will execute >>>>> after the GC has completed. The cleaners may allocate or synchronized, which >>>>> will cause G1 to allocate an EDEN region at the top of the heap. >>>> This is an interesting one. Please give more details on under what >>>> circumstances java.lang.ref.Cleaners causes the issue. It's unclear to >>>> me why it hasn't been showing up before. >>> Hi Jiangli, >>> >>> Thanks for the review. It's very helpful. >>> >>> The assert (see my comment in JDK-8245925) happened in my prototype for >>> JDK-8244778 >>> >>> http://cr.openjdk.java.net/~iklam/jdk15/8244778-archive-full-module-graph.v00.8/ >>> >>> I have to archive AppClassLoader and PlatformClassLoader, but need to >>> clear their URLClassPath field (the "ucp" field). See >>> clear_loader_states in metaspaceShared.cpp. Because of this, some >>> java.util.zip.ZipFiles referenced by the URLClassPath become garbage, >>> and their Cleaners are executed after full GC has finished. 
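(To illustrate what such field scrubbing amounts to, here is a hypothetical sketch; the real logic lives in clear_loader_states in metaspaceShared.cpp and will differ:)

static void scrub_oop_field(oop holder, int field_offset_in_bytes) {
  // Null the reference so the graph behind it (URLClassPath -> ZipFile
  // -> Cleaner registration) is no longer reachable from the archived
  // class loader object.
  holder->obj_field_put(field_offset_in_bytes, NULL);
}

Once the field is nulled, the next dump-time GC frees the ZipFiles, which is why their Cleaners can run right after that GC. At runtime the field then has to be restored before the loader is used.
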
>>> >> I haven't looked at your 8244778-archive-full-module-graph change yet, >> if you are going to archive class loader objects, you probably want to >> go with a solution that scrubs fields that are 'not archivable' and >> then restores at runtime. Sounds like you are going with that. When I >> worked on the initial implementation for system module object >> archiving, I implemented static field scrubber with the goal for >> archiving class loaders. I didn't complete it as it was not yet >> needed, but the code probably is helpful for you now. I might have >> sent you the pointer to one of the versions at the time, but try >> looking under my old /home directory if it's still around. It might be >> good to trigger runtime field restoration by Java code, that's the >> part I haven't fully explored yet. But, hopefully these inputs would >> be useful for your current work. Hi Jiangli, I can't access your old home directory. I already implemented the field scrubbing in the above patch. It's in clear_loader_states in metaspaceShared.cpp. >>> I think the bug has always existed, but is just never triggered because >>> we have not activated the Cleaners. >>> >>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN regions >>>>> exist, >>>>> exit the safepoint, run GC, and try again. Eventually all the cleaners will >>>>> be executed and no more allocation can happen. >>>>> >>>>> For safety, I limit the retry count to 30 (or about total 9 seconds). >>>>> >>>> I think it's better to skip the top allocated region(s) in such cases >>>> and avoid retrying. Dump time performance is important, as we are >>>> moving the cost from runtime to CDS dump time. It's desirable to keep >>>> the dump time cost as low as possible, so using CDS delivers better >>>> net gain overall. >>>> >>>> Here are some comments for your current webrev itself. >>>> >>>> 1611 static bool has_unwanted_g1_eden_regions() { >>>> 1612 #if INCLUDE_G1GC >>>> 1613 return HeapShared::is_heap_object_archiving_allowed() && UseG1GC && >>>> 1614 G1CollectedHeap::heap()->eden_regions_count() > 0; >>>> 1615 #else >>>> 1616 return false; >>>> 1617 #endif >>>> 1618 } >>>> >>>> You can remove 'UseG1GC' from line 1613, as >>>> is_heap_object_archiving_allowed() check already covers it: >>>> >>>> static bool is_heap_object_archiving_allowed() { >>>> CDS_JAVA_HEAP_ONLY(return (UseG1GC && UseCompressedOops && >>>> UseCompressedClassPointers);) >>>> NOT_CDS_JAVA_HEAP(return false;) >>>> } >>>> >>>> Please include heap archiving code under #if INCLUDE_CDS_JAVA_HEAP. >>>> It's better to extract the GC handling code in >>>> VM_PopulateDumpSharedSpace::doit() into a separate API in >>>> heapShared.*. >>>> >>>> It's time to enhance heap archiving to use a separate buffer when >>>> copying the objects at dump time (discussed before), as a longer term >>>> fix. I'll file a RFE. >>> Thanks for reminding me. I think that is a better way to fix this >>> problem. It should be fairly easy to do, as we can already relocate the >>> heap regions using HeapShared::patch_archived_heap_embedded_pointers(). >>> Let me try to implement it. >>> >> Sounds good. Thanks for doing that. >> >>> BTW, the GC speed is very fast, because the heap is not used very much >>> during -Xshare:dump. -Xlog:gc shows: >>> >>> [0.259s][info][gc ] GC(0) Pause Full (Full GC for -Xshare:dump) >>> 4M->1M(32M) 8.220ms >>> >>> So we have allocated only 4MB of objects, and only 1MB of those are >>> reachable. >>> >>> Anyway, I think we can even avoid running the GC altogether. 
We can scan >>> for contiguous free space from the top of the heap (below the EDEN >>> region). If there's more contiguous free space than the current >>> allocated heap regions, we know for sure that we can archive all the >>> heap objects that we need without doing the GC. That can be done as an >>> RFE after copying the objects. It won't save much though (8ms out of >>> about 700ms of total -Xshare:dump time). >> Looks like we had similar thoughts about finding free heap regions for >> copying. Here are the details: >> >> Solution 1): >> Allocate a buffer (no specific memory location requirement) and copy >> the heap objects to the buffer. Additional buffers can be allocated >> when needed, and they don't need to form a consecutive block of >> memory. The pointers within the copied Java objects need to be >> computed from the Java heap top as the 'real heap address' (so their >> runtime positions are at the heap top), instead of the buffer address. >> Region verification code also needs to be updated to reflect how the >> pointers are computed now. >> >> Solution 2): >> Find a range (consecutive heap regions) within the Java heap that is >> free. Copy the archive objects to that range. The pointer update and >> verification are similar to solution 1). >> >> I am thinking of doing something simpler. During heap archiving, G1 just allocates the highest free region:

bool G1ArchiveAllocator::alloc_new_region() {
  // Allocate the highest free region in the reserved heap,
  // and add it to our list of allocated regions. It is marked
  // archive and added to the old set.
  HeapRegion* hr = _g1h->alloc_highest_free_region();

If there are used regions scattered around in the heap, we will end up with a few archive regions that are not contiguous, and the highest archive region may not be flushed to the top of the heap. Inside VM_PopulateDumpSharedSpace::dump_archive_heap_oopmaps, I will rewrite the pointers inside each of the archived regions, such that the contents would be the same as if all the archive regions were consecutively allocated at the top of the heap. This will require no more memory allocation at dump time than what it does today, and can be done with very little overhead. > I think you can go with solution 1 now. Solution 2) has the benefit of > not requiring additional memory for copying archived objects. That's > important as I did run into insufficient memory at dump time in real > use cases, so any memory saving at dump time is desirable. > > Some clarifications to avoid confusion: The insufficient memory is due > to memory restriction for builds in cloud environment. Could you elaborate? The Java heap space is reserved but initially not committed. Physical memory is allocated when we write into the archive heap regions. Were you getting OOM during virtual space reservation, or during os::commit_memory()? How many heap objects are you dumping? The current jdk repo needs only 2 G1 regions, so it's 2 * 4M of memory for small heaps like -Xmx128m, and 2 * 8M of memory for larger heaps. Thanks - Ioi > Thanks, > Jiangli > >> It's better >> to go with 2) when the size of archive range is known before copying. >> With the planned work for class pre-initialization and enhanced object >> archiving support, it will be able to obtain (or have a good estimate >> of) the total size before copying. Solution 1 can be enhanced to use >> heap memory when that happens. >> >> I'll log these details as a RFE on Monday. 
>> >> Best, >> Jiangli >> >>> I'll withdraw this patch for now, and will try to implement the object >>> copying. >>> >>> Thanks >>> - Ioi >>> >>>> Best, >>>> Jiangli >>>> >>>>> Thanks >>>>> - Ioi >>>>> >>>>> >>>>> From igor.ignatyev at oracle.com Tue Jun 2 03:00:18 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 1 Jun 2020 20:00:18 -0700 Subject: RFR(S/M) : 8243430 : use reproducible random in :vmTestbase_vm_gc In-Reply-To: <6D4C5C1A-5D83-4DA9-A093-D3E9799584AC@oracle.com> References: <587C03FD-EFF4-42C1-9860-0BDC4F3A800F@oracle.com> <1440ED97-6390-402A-B1E1-810DC9DEDBA3@oracle.com> <6D4C5C1A-5D83-4DA9-A093-D3E9799584AC@oracle.com> Message-ID: ping? -- Igor > On May 20, 2020, at 3:43 PM, Igor Ignatyev wrote: > > ping? > -- Igor > >> On May 5, 2020, at 10:18 AM, Igor Ignatyev wrote: >> >> can I get 2nd review for that? or ack. that it's fine to be pushed w/ just one review? >> >> -- Igor >> >>> On May 3, 2020, at 8:29 AM, Igor Ignatyev wrote: >>> >>> >>> >>>> On May 3, 2020, at 12:13 AM, Kim Barrett wrote: >>>> >>>>> On Apr 30, 2020, at 4:38 PM, Igor Ignatyev wrote: >>>>> >>>>> http://cr.openjdk.java.net/~iignatyev/8243430/webrev.00 >>>>>> 555 lines changed: 11 ins; 65 del; 479 mod; >>>>> >>>>> Hi all, >>>>> >>>>> could you please review this small patch? >>>>> from JBS: >>>>>> this subtask is to use j.t.l.Utils.getRandomInstance() as a random number generator, where applicable, in : vmTestbase_vm_gc test group and marking the tests which make use of "randomness" with a proper k/w. >>>>> >>>>> testing: : vmTestbase_vm_gc test group >>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8243430 >>>>> webrevs: >>>>> - code changes: http://cr.openjdk.java.net/~iignatyev//8243430/webrev.00.code >>>>>> 94 lines changed: 11 ins; 65 del; 18 mod; >>>>> - adding k/w: http://cr.openjdk.java.net/~iignatyev//8243430/webrev.00.kw >>>>>> 229 lines changed: 0 ins; 0 del; 229 mod; >>>>> - full: http://cr.openjdk.java.net/~iignatyev//8243430/webrev.00 >>>>>> 555 lines changed: 11 ins; 65 del; 479 mod; >>>>> >>>>> Thanks, >>>>> -- Igor >>>> >>> Hi Kim, >>> >>>> I could be missing something, bug I don't see where either of these >>>> use randomness: >>>> test/hotspot/jtreg/vmTestbase/gc/gctests/gctest03/gctest03.java >>> gctest03 starts Yellowthread at L#130, and Yellowthread uses nsk.share.test.LocalRandom in its 'run' method at L#167 of vmTestbase/gc/gctests/gctest03/appthread.java >>> >>>> test/hotspot/jtreg/vmTestbase/gc/gctests/gctest04/gctest04.java >>> gctest04 starts reqdisp thread at L#97, and reqdisp uses n.s.t.LocalRandom in its 'run' method at L#211 of vmTestbase/gc/gctests/gctest04/reqgen.java >>>> >>>> test/hotspot/jtreg/vmTestbase/gc/gctests/gctest02/gctest02.java >>>> Still importing java.util.Random, but that doesn't seem needed. >>>> 53 import java.util.Random; >>> you're right, I'll remove it before pushing. >>> >>>> >>>> Other than that, looks good. >>> thanks for review! >>>> >>>> I don't need a new webrev for changes related to the above. >>> >>> -- Igor >>>> >>> >> > From igor.ignatyev at oracle.com Tue Jun 2 03:00:45 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 1 Jun 2020 20:00:45 -0700 Subject: RFR(S/M) : 8243434 : use reproducible random in :vmTestbase_vm_g1classunloading In-Reply-To: References: <5AA3E4F2-AC0D-4B77-8111-56FC8C993FEC@oracle.com> <8DE9130B-2A84-4B95-8A1B-892CA3D8F01B@oracle.com> Message-ID: ping? -- Igor > On May 20, 2020, at 3:42 PM, Igor Ignatyev wrote: > > ping? 
> -- Igor > >> On May 5, 2020, at 9:56 AM, Igor Ignatyev wrote: >> >> ping? >> -- Igor >> >>> On Apr 30, 2020, at 4:10 PM, Igor Ignatyev wrote: >>> >>> http://cr.openjdk.java.net/~iignatyev/8243434/webrev.00 >>>> 132 lines changed: 8 ins; 0 del; 124 mod >>> >>> Hi all, >>> >>> could you please review this patch? >>> from JBS: >>>> this subtask is to use j.t.l.Utils.getRandomInstance() as a random number generator, where applicable, in : vmTestbase_vm_g1classunloading test group and marking the tests which make use of "randomness" with a proper k/w. >>> >>> testing: : vmTestbase_vm_g1classunloading test group >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8243434 >>> webrevs: >>> - code changes: http://cr.openjdk.java.net/~iignatyev//8243434/webrev.00.code >>>> 15 lines changed: 8 ins; 0 del; 7 mod; >>> - adding k/w: http://cr.openjdk.java.net/~iignatyev//8243434/webrev.00.kw >>>> 112 lines changed: 0 ins; 0 del; 112 mod; >>> - full: http://cr.openjdk.java.net/~iignatyev//8243434/webrev.00 >>>> 132 lines changed: 8 ins; 0 del; 124 mod >>> >>> Thanks, >>> -- Igor >> > From leonid.mesnik at oracle.com Tue Jun 2 03:04:41 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Mon, 1 Jun 2020 20:04:41 -0700 Subject: RFR(S/M) : 8243430 : use reproducible random in :vmTestbase_vm_gc In-Reply-To: References: <587C03FD-EFF4-42C1-9860-0BDC4F3A800F@oracle.com> <1440ED97-6390-402A-B1E1-810DC9DEDBA3@oracle.com> <6D4C5C1A-5D83-4DA9-A093-D3E9799584AC@oracle.com> Message-ID: <405909cf-39bb-75b0-0302-7b1f27625ea7@oracle.com> Looks good. Leonid On 6/1/20 8:00 PM, Igor Ignatyev wrote: > ping? > -- Igor > >> On May 20, 2020, at 3:43 PM, Igor Ignatyev wrote: >> >> ping? >> -- Igor >> >>> On May 5, 2020, at 10:18 AM, Igor Ignatyev wrote: >>> >>> can I get 2nd review for that? or ack. that it's fine to be pushed w/ just one review? >>> >>> -- Igor >>> >>>> On May 3, 2020, at 8:29 AM, Igor Ignatyev wrote: >>>> >>>> >>>> >>>>> On May 3, 2020, at 12:13 AM, Kim Barrett wrote: >>>>> >>>>>> On Apr 30, 2020, at 4:38 PM, Igor Ignatyev wrote: >>>>>> >>>>>> http://cr.openjdk.java.net/~iignatyev/8243430/webrev.00 >>>>>>> 555 lines changed: 11 ins; 65 del; 479 mod; >>>>>> Hi all, >>>>>> >>>>>> could you please review this small patch? >>>>>> from JBS: >>>>>>> this subtask is to use j.t.l.Utils.getRandomInstance() as a random number generator, where applicable, in : vmTestbase_vm_gc test group and marking the tests which make use of "randomness" with a proper k/w. 
>>>>>> testing: : vmTestbase_vm_gc test group >>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8243430 >>>>>> webrevs: >>>>>> - code changes: http://cr.openjdk.java.net/~iignatyev//8243430/webrev.00.code >>>>>>> 94 lines changed: 11 ins; 65 del; 18 mod; >>>>>> - adding k/w: http://cr.openjdk.java.net/~iignatyev//8243430/webrev.00.kw >>>>>>> 229 lines changed: 0 ins; 0 del; 229 mod; >>>>>> - full: http://cr.openjdk.java.net/~iignatyev//8243430/webrev.00 >>>>>>> 555 lines changed: 11 ins; 65 del; 479 mod; >>>>>> Thanks, >>>>>> -- Igor >>>> Hi Kim, >>>> >>>>> I could be missing something, bug I don't see where either of these >>>>> use randomness: >>>>> test/hotspot/jtreg/vmTestbase/gc/gctests/gctest03/gctest03.java >>>> gctest03 starts Yellowthread at L#130, and Yellowthread uses nsk.share.test.LocalRandom in its 'run' method at L#167 of vmTestbase/gc/gctests/gctest03/appthread.java >>>> >>>>> test/hotspot/jtreg/vmTestbase/gc/gctests/gctest04/gctest04.java >>>> gctest04 starts reqdisp thread at L#97, and reqdisp uses n.s.t.LocalRandom in its 'run' method at L#211 of vmTestbase/gc/gctests/gctest04/reqgen.java >>>>> test/hotspot/jtreg/vmTestbase/gc/gctests/gctest02/gctest02.java >>>>> Still importing java.util.Random, but that doesn't seem needed. >>>>> 53 import java.util.Random; >>>> you're right, I'll remove it before pushing. >>>> >>>>> Other than that, looks good. >>>> thanks for review! >>>>> I don't need a new webrev for changes related to the above. >>>> -- Igor From leonid.mesnik at oracle.com Tue Jun 2 03:05:36 2020 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Mon, 1 Jun 2020 20:05:36 -0700 Subject: RFR(S/M) : 8243434 : use reproducible random in :vmTestbase_vm_g1classunloading In-Reply-To: References: <5AA3E4F2-AC0D-4B77-8111-56FC8C993FEC@oracle.com> <8DE9130B-2A84-4B95-8A1B-892CA3D8F01B@oracle.com> Message-ID: <8c23edf7-e389-3fa0-669e-2c015b166065@oracle.com> Looks good. Leonid On 6/1/20 8:00 PM, Igor Ignatyev wrote: > ping? > -- Igor > >> On May 20, 2020, at 3:42 PM, Igor Ignatyev wrote: >> >> ping? >> -- Igor >> >>> On May 5, 2020, at 9:56 AM, Igor Ignatyev wrote: >>> >>> ping? >>> -- Igor >>> >>>> On Apr 30, 2020, at 4:10 PM, Igor Ignatyev wrote: >>>> >>>> http://cr.openjdk.java.net/~iignatyev/8243434/webrev.00 >>>>> 132 lines changed: 8 ins; 0 del; 124 mod >>>> Hi all, >>>> >>>> could you please review this patch? >>>> from JBS: >>>>> this subtask is to use j.t.l.Utils.getRandomInstance() as a random number generator, where applicable, in : vmTestbase_vm_g1classunloading test group and marking the tests which make use of "randomness" with a proper k/w. 
>>>> testing: : vmTestbase_vm_g1classunloading test group >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8243434 >>>> webrevs: >>>> - code changes: http://cr.openjdk.java.net/~iignatyev//8243434/webrev.00.code >>>>> 15 lines changed: 8 ins; 0 del; 7 mod; >>>> - adding k/w: http://cr.openjdk.java.net/~iignatyev//8243434/webrev.00.kw >>>>> 112 lines changed: 0 ins; 0 del; 112 mod; >>>> - full: http://cr.openjdk.java.net/~iignatyev//8243434/webrev.00 >>>>> 132 lines changed: 8 ins; 0 del; 124 mod >>>> Thanks, >>>> -- Igor From kim.barrett at oracle.com Tue Jun 2 05:09:39 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 2 Jun 2020 01:09:39 -0400 Subject: RFR(S/M) : 8243434 : use reproducible random in :vmTestbase_vm_g1classunloading In-Reply-To: <5AA3E4F2-AC0D-4B77-8111-56FC8C993FEC@oracle.com> References: <5AA3E4F2-AC0D-4B77-8111-56FC8C993FEC@oracle.com> Message-ID: <192B067F-707C-4E7F-9F7A-1283684BBCD7@oracle.com> > On Apr 30, 2020, at 7:10 PM, Igor Ignatyev wrote: > > http://cr.openjdk.java.net/~iignatyev/8243434/webrev.00 >> 132 lines changed: 8 ins; 0 del; 124 mod > > Hi all, > > could you please review this patch? > from JBS: >> this subtask is to use j.t.l.Utils.getRandomInstance() as a random number generator, where applicable, in : vmTestbase_vm_g1classunloading test group and marking the tests which make use of "randomness" with a proper k/w. > > testing: : vmTestbase_vm_g1classunloading test group > JBS: https://bugs.openjdk.java.net/browse/JDK-8243434 > webrevs: > - code changes: http://cr.openjdk.java.net/~iignatyev//8243434/webrev.00.code >> 15 lines changed: 8 ins; 0 del; 7 mod; > - adding k/w: http://cr.openjdk.java.net/~iignatyev//8243434/webrev.00.kw >> 112 lines changed: 0 ins; 0 del; 112 mod; > - full: http://cr.openjdk.java.net/~iignatyev//8243434/webrev.00 >> 132 lines changed: 8 ins; 0 del; 124 mod > > Thanks, > -- Igor Looks good. From stefan.karlsson at oracle.com Tue Jun 2 06:09:34 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 2 Jun 2020 08:09:34 +0200 Subject: RFR: 8246135: ZGC: Save important log lines and print them when dumping hs_err files In-Reply-To: <9f501f2c-bae7-16c5-f165-62ce2bcb59e6@oracle.com> References: <9f501f2c-bae7-16c5-f165-62ce2bcb59e6@oracle.com> Message-ID: <1a956a83-8993-f95f-87eb-6a5f477534f1@oracle.com> Thanks for reviewing. StefanK On 2020-06-01 20:50, stefan.johansson at oracle.com wrote: > Hi Stefan, > > On 2020-06-01 16:38, Stefan Karlsson wrote: >> Updated webrev: >> https://cr.openjdk.java.net/~stefank/8246135/webrev.02 > Looks good, thanks for making the functionality shared. Filed > https://bugs.openjdk.java.net/browse/JDK-8246272 for the other GCs. > > Cheers, > Stefan > >> >> StefanJ asked if I could make this a utility that other GCs could use >> as well. I've moved the functionality to >> gc/shared/gcLogPrecious.[hc]pp, but I haven't implemented this for >> the other GCs. That part is left for separate RFEs. >> >> Thanks, >> StefanK >> >> On 2020-05-29 12:23, Stefan Karlsson wrote: >>> Hi all, >>> >>> Please review this patch to save some of the important ZGC log lines >>> and print them when dumping hs_err files. >>> >>> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ >>> https://bugs.openjdk.java.net/browse/JDK-8246135 >>> >>> The patch adds a concept of "precious" log lines. What's typically >>> logged are GC initialization lines, but also error messages are >>> saved. These lines are then dumped in the hs_err file if the JVM >>> crashes or hits an assert. 
The lines can also be printed in a
>>> debugger to get a quick overview when debugging.
>>>
>>> The precious lines are always saved, but just like any other Unified
>>> Logging calls, only logged if the tags are enabled.
>>>
>>> The patch builds on the JDK-8246134 patch. The hs_err output looks
>>> like this:
>>>
>>> ZGC Precious Log:
>>>  NUMA Support: Disabled
>>>  CPUs: 8 total, 8 available
>>>  Memory: 16384M
>>>  Large Page Support: Disabled
>>>  Medium Page Size: 32M
>>>  Workers: 5 parallel, 1 concurrent
>>>  Address Space Type: Contiguous/Unrestricted/Complete
>>>  Address Space Size: 65536M x 3 = 196608M
>>>  Min Capacity: 42M
>>>  Initial Capacity: 256M
>>>  Max Capacity: 4096M
>>>  Max Reserve: 42M
>>>  Pre-touch: Disabled
>>>  Uncommit: Enabled
>>>  Uncommit Delay: 300s
>>>  Runtime Workers: 5 parallel
>>>
>>> ZGC Globals:
>>>  GlobalPhase:       2 (Relocate)
>>>  GlobalSeqNum:      1
>>>  Offset Max:        4096G (0x0000040000000000)
>>>  Page Size Small:   2M
>>>  Page Size Medium:  32M
>>>
>>> ZGC Metadata Bits:
>>>  Good:              0x0000100000000000
>>>  Bad:               0x00002c0000000000
>>>  WeakBad:           0x00000c0000000000
>>>  Marked:            0x0000040000000000
>>>  Remapped:          0x0000100000000000
>>>
>>> Heap:
>>>  ZHeap           used 12M, capacity 256M, max capacity 4096M
>>>  Metaspace       used 6501K, capacity 6615K, committed 6784K,
>>> reserved 1056768K
>>>   class space    used 559K, capacity 588K, committed 640K, reserved
>>> 1048576K
>>>
>>> ZGC Page Table:
>>>  Small   0x0000000000000000 0x0000000000200000 0x0000000000200000
>>> Allocating
>>>  Small   0x0000000000200000 0x0000000000240000 0x0000000000400000
>>> Allocating
>>>  Small   0x0000000000400000 0x0000000000600000 0x0000000000600000
>>> Allocating
>>>  Small   0x0000000000600000 0x0000000000800000 0x0000000000800000
>>> Allocating
>>>  Small   0x0000000000800000 0x00000000009c0000 0x0000000000a00000
>>> Allocating
>>>  Small   0x0000000000a00000 0x0000000000a40000 0x0000000000c00000
>>> Allocating
>>>
>>> Thanks,
>>> StefanK
>>

From stefan.karlsson at oracle.com  Tue Jun 2 06:51:50 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Tue, 2 Jun 2020 08:51:50 +0200
Subject: RFR: 8246258: Enable hs_err heap printing earlier during initialization
In-Reply-To: 
References: 
Message-ID: <1b14f9e6-70a2-b97b-1a11-b98c21895bf4@oracle.com>

Hi Thomas,

On 2020-06-01 18:04, Thomas Stüfe wrote:
> Hi Stefan,
>
> looks good.

Thanks for reviewing.

> Note that there are tests which test very early error handling (see
> TestVeryEarlyAssert), you could extend that to guard against bitrot.
> But I am fine with the patch as it is.
>

OK. I think I prefer to deal with any problems if/when they arise. Also
note that if the heap printing fails, then it's only the rest of the
heap section that gets truncated, the rest of the hs_err file still gets
printed. So, that limits the risk a bit.

StefanK

> Cheers, Thomas
>
> On Mon, Jun 1, 2020 at 5:57 PM Stefan Karlsson wrote:
>
>     Hi all,
>
>     Please review this patch to enable the hs_err GC / heap printing
>     directly after the heap has been set up.
>
>     https://cr.openjdk.java.net/~stefank/8246258/webrev.01/
>     https://bugs.openjdk.java.net/browse/JDK-8246258
>
>     Changes in the patch:
>     - Remove the Universe::is_fully_initialized
>     - Add NULL initializations and checks in print paths
>
>     I tested this patch by adding a temporary fatal(...) here:
>
>     jint Universe::initialize_heap() {
>        assert(_collectedHeap == NULL, "Heap already created");
>        _collectedHeap = GCConfig::arguments()->create_heap();
>        // <<<< HERE
>        log_info(gc)("Using %s", _collectedHeap->name());
>        return _collectedHeap->initialize();
>     }
>
>     and manually looking at the result when running with all GCs. Will run
>     this through tier1-3.
>
>     Thanks,
>     StefanK
>

From stefan.karlsson at oracle.com  Tue Jun 2 07:55:50 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Tue, 2 Jun 2020 09:55:50 +0200
Subject: RFR: 8246135: ZGC: Save important log lines and print them when dumping hs_err files
In-Reply-To: <1a956a83-8993-f95f-87eb-6a5f477534f1@oracle.com>
References: <9f501f2c-bae7-16c5-f165-62ce2bcb59e6@oracle.com> <1a956a83-8993-f95f-87eb-6a5f477534f1@oracle.com>
Message-ID: 

Found one more line that we should save:
 https://cr.openjdk.java.net/~stefank/8246135/webrev.03.delta/
 https://cr.openjdk.java.net/~stefank/8246135/webrev.03/

StefanK

On 2020-06-02 08:09, Stefan Karlsson wrote:
> Thanks for reviewing.
>
> StefanK
>
> On 2020-06-01 20:50, stefan.johansson at oracle.com wrote:
>> Hi Stefan,
>>
>> On 2020-06-01 16:38, Stefan Karlsson wrote:
>>> Updated webrev:
>>> https://cr.openjdk.java.net/~stefank/8246135/webrev.02
>> Looks good, thanks for making the functionality shared. Filed
>> https://bugs.openjdk.java.net/browse/JDK-8246272 for the other GCs.
>>
>> Cheers,
>> Stefan
>>
>>>
>>> StefanJ asked if I could make this a utility that other GCs could
>>> use as well. I've moved the functionality to
>>> gc/shared/gcLogPrecious.[hc]pp, but I haven't implemented this for
>>> the other GCs. That part is left for separate RFEs.
>>>
>>> Thanks,
>>> StefanK
>>>
>>> On 2020-05-29 12:23, Stefan Karlsson wrote:
>>>> Hi all,
>>>>
>>>> Please review this patch to save some of the important ZGC log
>>>> lines and print them when dumping hs_err files.
>>>>
>>>> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/
>>>> https://bugs.openjdk.java.net/browse/JDK-8246135
>>>>
>>>> The patch adds a concept of "precious" log lines. What's typically
>>>> logged are GC initialization lines, but also error messages are
>>>> saved. These lines are then dumped in the hs_err file if the JVM
>>>> crashes or hits an assert. The lines can also be printed in a
>>>> debugger to get a quick overview when debugging.
>>>>
>>>> The precious lines are always saved, but just like any other Unified
>>>> Logging calls, only logged if the tags are enabled.
>>>>
>>>> The patch builds on the JDK-8246134 patch. The hs_err output looks
>>>> like this:
>>>>
>>>> ZGC Precious Log:
>>>>  NUMA Support: Disabled
>>>>  CPUs: 8 total, 8 available
>>>>  Memory: 16384M
>>>>  Large Page Support: Disabled
>>>>  Medium Page Size: 32M
>>>>  Workers: 5 parallel, 1 concurrent
>>>>  Address Space Type: Contiguous/Unrestricted/Complete
>>>>  Address Space Size: 65536M x 3 = 196608M
>>>>  Min Capacity: 42M
>>>>  Initial Capacity: 256M
>>>>  Max Capacity: 4096M
>>>>  Max Reserve: 42M
>>>>  Pre-touch: Disabled
>>>>  Uncommit: Enabled
>>>>  Uncommit Delay: 300s
>>>>  Runtime Workers: 5 parallel
>>>>
>>>> ZGC Globals:
>>>>  GlobalPhase:       2 (Relocate)
>>>>  GlobalSeqNum:      1
>>>>  Offset Max:        4096G (0x0000040000000000)
>>>>  Page Size Small:   2M
>>>>  Page Size Medium:  32M
>>>>
>>>> ZGC Metadata Bits:
>>>>  Good:              0x0000100000000000
>>>>  Bad:               0x00002c0000000000
>>>>  WeakBad:           0x00000c0000000000
>>>>  Marked:            0x0000040000000000
>>>>  Remapped:          0x0000100000000000
>>>>
>>>> Heap:
>>>>  ZHeap          
used 12M, capacity 256M, max capacity 4096M >>>> ?Metaspace?????? used 6501K, capacity 6615K, committed 6784K, >>>> reserved 1056768K >>>> ? class space??? used 559K, capacity 588K, committed 640K, reserved >>>> 1048576K >>>> >>>> ZGC Page Table: >>>> ?Small?? 0x0000000000000000 0x0000000000200000 0x0000000000200000 >>>> Allocating >>>> ?Small?? 0x0000000000200000 0x0000000000240000 0x0000000000400000 >>>> Allocating >>>> ?Small?? 0x0000000000400000 0x0000000000600000 0x0000000000600000 >>>> Allocating >>>> ?Small?? 0x0000000000600000 0x0000000000800000 0x0000000000800000 >>>> Allocating >>>> ?Small?? 0x0000000000800000 0x00000000009c0000 0x0000000000a00000 >>>> Allocating >>>> ?Small?? 0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 >>>> Allocating >>>> >>>> Thanks, >>>> StefanK >>> > From stefan.johansson at oracle.com Tue Jun 2 08:47:31 2020 From: stefan.johansson at oracle.com (stefan.johansson at oracle.com) Date: Tue, 2 Jun 2020 10:47:31 +0200 Subject: RFR: 8246135: ZGC: Save important log lines and print them when dumping hs_err files In-Reply-To: References: <9f501f2c-bae7-16c5-f165-62ce2bcb59e6@oracle.com> <1a956a83-8993-f95f-87eb-6a5f477534f1@oracle.com> Message-ID: Still good, Stefan On 2020-06-02 09:55, Stefan Karlsson wrote: > Found one more line that we should save: > ?https://cr.openjdk.java.net/~stefank/8246135/webrev.03.delta/ > ?https://cr.openjdk.java.net/~stefank/8246135/webrev.03/ > > StefanK > > On 2020-06-02 08:09, Stefan Karlsson wrote: >> Thanks for reviewing. >> >> StefanK >> >> On 2020-06-01 20:50, stefan.johansson at oracle.com wrote: >>> Hi Stefan, >>> >>> On 2020-06-01 16:38, Stefan Karlsson wrote: >>>> Updated webrev: >>>> https://cr.openjdk.java.net/~stefank/8246135/webrev.02 >>> Looks good, thanks for making the functionality shared. Filed >>> https://bugs.openjdk.java.net/browse/JDK-8246272 for the other GCs. >>> >>> Cheers, >>> Stefan >>> >>>> >>>> StefanJ asked if I could make this a utility that other GCs could >>>> use as well. I've moved the functionality to >>>> gc/shared/gcLogPrecious.[hc]pp, but I haven't implemented this for >>>> the other GCs. That part is left for separate RFEs. >>>> >>>> Thanks, >>>> StefanK >>>> >>>> On 2020-05-29 12:23, Stefan Karlsson wrote: >>>>> Hi all, >>>>> >>>>> Please review this patch to save some of the important ZGC log >>>>> lines and print them when dumping hs_err files. >>>>> >>>>> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ >>>>> https://bugs.openjdk.java.net/browse/JDK-8246135 >>>>> >>>>> The patch adds a concept of "precious" log lines. What's typically >>>>> logged are GC initialization lines, but also error messages are >>>>> saved. These lines are then dumped in the hs_err file if the JVM >>>>> crashes or hits an assert. The lines can also be printed in a >>>>> debugger to get a quick overview when debugging. >>>>> >>>>> The precious lines are always saved, but just like any other >>>>> Unified Logging calls, only logged if the tags are enabled. >>>>> >>>>> The patch builds on the JDK-8246134 patch. 
The hs_err output looks >>>>> like this: >>>>> >>>>> ZGC Precious Log: >>>>> ?NUMA Support: Disabled >>>>> ?CPUs: 8 total, 8 available >>>>> ?Memory: 16384M >>>>> ?Large Page Support: Disabled >>>>> ?Medium Page Size: 32M >>>>> ?Workers: 5 parallel, 1 concurrent >>>>> ?Address Space Type: Contiguous/Unrestricted/Complete >>>>> ?Address Space Size: 65536M x 3 = 196608M >>>>> ?Min Capacity: 42M >>>>> ?Initial Capacity: 256M >>>>> ?Max Capacity: 4096M >>>>> ?Max Reserve: 42M >>>>> ?Pre-touch: Disabled >>>>> ?Uncommit: Enabled >>>>> ?Uncommit Delay: 300s >>>>> ?Runtime Workers: 5 parallel >>>>> >>>>> ZGC Globals: >>>>> ?GlobalPhase:?????? 2 (Relocate) >>>>> ?GlobalSeqNum:????? 1 >>>>> ?Offset Max:??????? 4096G (0x0000040000000000) >>>>> ?Page Size Small:?? 2M >>>>> ?Page Size Medium:? 32M >>>>> >>>>> ZGC Metadata Bits: >>>>> ?Good:????????????? 0x0000100000000000 >>>>> ?Bad:?????????????? 0x00002c0000000000 >>>>> ?WeakBad:?????????? 0x00000c0000000000 >>>>> ?Marked:??????????? 0x0000040000000000 >>>>> ?Remapped:????????? 0x0000100000000000 >>>>> >>>>> Heap: >>>>> ?ZHeap?????????? used 12M, capacity 256M, max capacity 4096M >>>>> ?Metaspace?????? used 6501K, capacity 6615K, committed 6784K, >>>>> reserved 1056768K >>>>> ? class space??? used 559K, capacity 588K, committed 640K, reserved >>>>> 1048576K >>>>> >>>>> ZGC Page Table: >>>>> ?Small?? 0x0000000000000000 0x0000000000200000 0x0000000000200000 >>>>> Allocating >>>>> ?Small?? 0x0000000000200000 0x0000000000240000 0x0000000000400000 >>>>> Allocating >>>>> ?Small?? 0x0000000000400000 0x0000000000600000 0x0000000000600000 >>>>> Allocating >>>>> ?Small?? 0x0000000000600000 0x0000000000800000 0x0000000000800000 >>>>> Allocating >>>>> ?Small?? 0x0000000000800000 0x00000000009c0000 0x0000000000a00000 >>>>> Allocating >>>>> ?Small?? 0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 >>>>> Allocating >>>>> >>>>> Thanks, >>>>> StefanK >>>> >> > From stefan.karlsson at oracle.com Tue Jun 2 08:51:51 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 2 Jun 2020 10:51:51 +0200 Subject: RFR: 8246135: ZGC: Save important log lines and print them when dumping hs_err files In-Reply-To: References: <9f501f2c-bae7-16c5-f165-62ce2bcb59e6@oracle.com> <1a956a83-8993-f95f-87eb-6a5f477534f1@oracle.com> Message-ID: Thanks. StefanK On 2020-06-02 10:47, stefan.johansson at oracle.com wrote: > Still good, > Stefan > > On 2020-06-02 09:55, Stefan Karlsson wrote: >> Found one more line that we should save: >> ??https://cr.openjdk.java.net/~stefank/8246135/webrev.03.delta/ >> ??https://cr.openjdk.java.net/~stefank/8246135/webrev.03/ >> >> StefanK >> >> On 2020-06-02 08:09, Stefan Karlsson wrote: >>> Thanks for reviewing. >>> >>> StefanK >>> >>> On 2020-06-01 20:50, stefan.johansson at oracle.com wrote: >>>> Hi Stefan, >>>> >>>> On 2020-06-01 16:38, Stefan Karlsson wrote: >>>>> Updated webrev: >>>>> https://cr.openjdk.java.net/~stefank/8246135/webrev.02 >>>> Looks good, thanks for making the functionality shared. Filed >>>> https://bugs.openjdk.java.net/browse/JDK-8246272 for the other GCs. >>>> >>>> Cheers, >>>> Stefan >>>> >>>>> >>>>> StefanJ asked if I could make this a utility that other GCs could >>>>> use as well. I've moved the functionality to >>>>> gc/shared/gcLogPrecious.[hc]pp, but I haven't implemented this for >>>>> the other GCs. That part is left for separate RFEs. 
>>>>> >>>>> Thanks, >>>>> StefanK >>>>> >>>>> On 2020-05-29 12:23, Stefan Karlsson wrote: >>>>>> Hi all, >>>>>> >>>>>> Please review this patch to save some of the important ZGC log >>>>>> lines and print them when dumping hs_err files. >>>>>> >>>>>> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ >>>>>> https://bugs.openjdk.java.net/browse/JDK-8246135 >>>>>> >>>>>> The patch adds a concept of "precious" log lines. What's >>>>>> typically logged are GC initialization lines, but also error >>>>>> messages are saved. These lines are then dumped in the hs_err >>>>>> file if the JVM crashes or hits an assert. The lines can also be >>>>>> printed in a debugger to get a quick overview when debugging. >>>>>> >>>>>> The precious lines are always saved, but just like any other >>>>>> Unified Logging calls, only logged if the tags are enabled. >>>>>> >>>>>> The patch builds on the JDK-8246134 patch. The hs_err output >>>>>> looks like this: >>>>>> >>>>>> ZGC Precious Log: >>>>>> ?NUMA Support: Disabled >>>>>> ?CPUs: 8 total, 8 available >>>>>> ?Memory: 16384M >>>>>> ?Large Page Support: Disabled >>>>>> ?Medium Page Size: 32M >>>>>> ?Workers: 5 parallel, 1 concurrent >>>>>> ?Address Space Type: Contiguous/Unrestricted/Complete >>>>>> ?Address Space Size: 65536M x 3 = 196608M >>>>>> ?Min Capacity: 42M >>>>>> ?Initial Capacity: 256M >>>>>> ?Max Capacity: 4096M >>>>>> ?Max Reserve: 42M >>>>>> ?Pre-touch: Disabled >>>>>> ?Uncommit: Enabled >>>>>> ?Uncommit Delay: 300s >>>>>> ?Runtime Workers: 5 parallel >>>>>> >>>>>> ZGC Globals: >>>>>> ?GlobalPhase:?????? 2 (Relocate) >>>>>> ?GlobalSeqNum:????? 1 >>>>>> ?Offset Max:??????? 4096G (0x0000040000000000) >>>>>> ?Page Size Small:?? 2M >>>>>> ?Page Size Medium:? 32M >>>>>> >>>>>> ZGC Metadata Bits: >>>>>> ?Good:????????????? 0x0000100000000000 >>>>>> ?Bad:?????????????? 0x00002c0000000000 >>>>>> ?WeakBad:?????????? 0x00000c0000000000 >>>>>> ?Marked:??????????? 0x0000040000000000 >>>>>> ?Remapped:????????? 0x0000100000000000 >>>>>> >>>>>> Heap: >>>>>> ?ZHeap?????????? used 12M, capacity 256M, max capacity 4096M >>>>>> ?Metaspace?????? used 6501K, capacity 6615K, committed 6784K, >>>>>> reserved 1056768K >>>>>> ? class space??? used 559K, capacity 588K, committed 640K, >>>>>> reserved 1048576K >>>>>> >>>>>> ZGC Page Table: >>>>>> ?Small?? 0x0000000000000000 0x0000000000200000 0x0000000000200000 >>>>>> Allocating >>>>>> ?Small?? 0x0000000000200000 0x0000000000240000 0x0000000000400000 >>>>>> Allocating >>>>>> ?Small?? 0x0000000000400000 0x0000000000600000 0x0000000000600000 >>>>>> Allocating >>>>>> ?Small?? 0x0000000000600000 0x0000000000800000 0x0000000000800000 >>>>>> Allocating >>>>>> ?Small?? 0x0000000000800000 0x00000000009c0000 0x0000000000a00000 >>>>>> Allocating >>>>>> ?Small?? 0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 >>>>>> Allocating >>>>>> >>>>>> Thanks, >>>>>> StefanK >>>>> >>> >> From shade at redhat.com Tue Jun 2 09:49:29 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 2 Jun 2020 11:49:29 +0200 Subject: [15] RFR 8245961: Shenandoah: move some root marking to concurrent phase In-Reply-To: <61efdf04-5d74-8a59-a193-576b115738ef@redhat.com> References: <61efdf04-5d74-8a59-a193-576b115738ef@redhat.com> Message-ID: On 6/1/20 8:23 PM, Zhengyu Gu wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8245961 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8245961/webrev.00/ Let me push 8246097 and 8246100 before this, so we would have the backport base. 
Mostly stylistic nits follow: *) Does it make sense to move ShenandoahConcurrentRootsIterator to shenandoahRootProcessor? And do something like ShenandoahConcurrentRootScanner? *) conc_mark_roots can be just " Concurrent Roots", no need to duplicate "Concurrent Mark": 70 f(conc_mark, "Concurrent Marking") \ 71 f(conc_mark_roots, " Concurrent Mark Roots ") \ 72 SHENANDOAH_PAR_PHASE_DO(conc_mark_roots, " CM: ", f) \ *) Typo: "Concurrnet": 145 f(full_gc_scan_conc_roots, " Scan Concurrnet Roots") \ -- Thanks, -Aleksey From shade at redhat.com Tue Jun 2 09:52:37 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 2 Jun 2020 11:52:37 +0200 Subject: RFR (S) 8246100: Shenandoah: walk roots in more efficient order In-Reply-To: References: <4a733745-6f94-b6aa-b8c9-8daedace22f2@redhat.com> <3a8331ad-9ec1-dda9-058d-6edcdd594e4d@redhat.com> Message-ID: <7945aaf1-da2c-83ef-44d5-caf34e7ed607@redhat.com> On 6/1/20 9:26 PM, Zhengyu Gu wrote: > Wait, should we process CLDG early? CLD likely uneven, e.g. boot class > loader may take a lot longer than others. It is not uneven in my tests. But even if it is, we want to do CLDG before thread/code roots, which is what the patch does. With 8246097, it would be moved to the "lightweight/limited" block before. > On 6/1/20 3:03 PM, Zhengyu Gu wrote: >> Looks good in general. But not sure why call vm/weak/dedup roots limited >> parallel, I think they are fully parallel. Yes. But they are also lightweight, that's what block does: both lightweight or limited-parallel. >>> Webrev: >>> ?? https://cr.openjdk.java.net/~shade/8246100/webrev.02/ -- Thanks, -Aleksey From thomas.schatzl at oracle.com Tue Jun 2 10:40:20 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 2 Jun 2020 12:40:20 +0200 Subject: RFR(S/M) : 8243430 : use reproducible random in :vmTestbase_vm_gc In-Reply-To: References: <587C03FD-EFF4-42C1-9860-0BDC4F3A800F@oracle.com> <1440ED97-6390-402A-B1E1-810DC9DEDBA3@oracle.com> <6D4C5C1A-5D83-4DA9-A093-D3E9799584AC@oracle.com> Message-ID: <183d2103-6a8d-2477-dad5-4d3c1e1bcf37@oracle.com> Hi, On 02.06.20 05:00, Igor Ignatyev wrote: > ping? > -- Igor looks good. Thomas From erik.osterlund at oracle.com Tue Jun 2 10:42:02 2020 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Tue, 2 Jun 2020 12:42:02 +0200 Subject: RFR: 8246135: ZGC: Save important log lines and print them when dumping hs_err files In-Reply-To: References: <9f501f2c-bae7-16c5-f165-62ce2bcb59e6@oracle.com> <1a956a83-8993-f95f-87eb-6a5f477534f1@oracle.com> Message-ID: <7fe6a3e3-1bab-7733-273f-47576d013e8c@oracle.com> +1 /Erik On 2020-06-02 09:55, Stefan Karlsson wrote: > Found one more line that we should save: > ?https://cr.openjdk.java.net/~stefank/8246135/webrev.03.delta/ > ?https://cr.openjdk.java.net/~stefank/8246135/webrev.03/ > > StefanK > > On 2020-06-02 08:09, Stefan Karlsson wrote: >> Thanks for reviewing. >> >> StefanK >> >> On 2020-06-01 20:50, stefan.johansson at oracle.com wrote: >>> Hi Stefan, >>> >>> On 2020-06-01 16:38, Stefan Karlsson wrote: >>>> Updated webrev: >>>> https://cr.openjdk.java.net/~stefank/8246135/webrev.02 >>> Looks good, thanks for making the functionality shared. Filed >>> https://bugs.openjdk.java.net/browse/JDK-8246272 for the other GCs. >>> >>> Cheers, >>> Stefan >>> >>>> >>>> StefanJ asked if I could make this a utility that other GCs could >>>> use as well. I've moved the functionality to >>>> gc/shared/gcLogPrecious.[hc]pp, but I haven't implemented this for >>>> the other GCs. 
That part is left for separate RFEs. >>>> >>>> Thanks, >>>> StefanK >>>> >>>> On 2020-05-29 12:23, Stefan Karlsson wrote: >>>>> Hi all, >>>>> >>>>> Please review this patch to save some of the important ZGC log >>>>> lines and print them when dumping hs_err files. >>>>> >>>>> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ >>>>> https://bugs.openjdk.java.net/browse/JDK-8246135 >>>>> >>>>> The patch adds a concept of "precious" log lines. What's typically >>>>> logged are GC initialization lines, but also error messages are >>>>> saved. These lines are then dumped in the hs_err file if the JVM >>>>> crashes or hits an assert. The lines can also be printed in a >>>>> debugger to get a quick overview when debugging. >>>>> >>>>> The precious lines are always saved, but just like any other >>>>> Unified Logging calls, only logged if the tags are enabled. >>>>> >>>>> The patch builds on the JDK-8246134 patch. The hs_err output looks >>>>> like this: >>>>> >>>>> ZGC Precious Log: >>>>> ?NUMA Support: Disabled >>>>> ?CPUs: 8 total, 8 available >>>>> ?Memory: 16384M >>>>> ?Large Page Support: Disabled >>>>> ?Medium Page Size: 32M >>>>> ?Workers: 5 parallel, 1 concurrent >>>>> ?Address Space Type: Contiguous/Unrestricted/Complete >>>>> ?Address Space Size: 65536M x 3 = 196608M >>>>> ?Min Capacity: 42M >>>>> ?Initial Capacity: 256M >>>>> ?Max Capacity: 4096M >>>>> ?Max Reserve: 42M >>>>> ?Pre-touch: Disabled >>>>> ?Uncommit: Enabled >>>>> ?Uncommit Delay: 300s >>>>> ?Runtime Workers: 5 parallel >>>>> >>>>> ZGC Globals: >>>>> ?GlobalPhase:?????? 2 (Relocate) >>>>> ?GlobalSeqNum:????? 1 >>>>> ?Offset Max:??????? 4096G (0x0000040000000000) >>>>> ?Page Size Small:?? 2M >>>>> ?Page Size Medium:? 32M >>>>> >>>>> ZGC Metadata Bits: >>>>> ?Good:????????????? 0x0000100000000000 >>>>> ?Bad:?????????????? 0x00002c0000000000 >>>>> ?WeakBad:?????????? 0x00000c0000000000 >>>>> ?Marked:??????????? 0x0000040000000000 >>>>> ?Remapped:????????? 0x0000100000000000 >>>>> >>>>> Heap: >>>>> ?ZHeap?????????? used 12M, capacity 256M, max capacity 4096M >>>>> ?Metaspace?????? used 6501K, capacity 6615K, committed 6784K, >>>>> reserved 1056768K >>>>> ? class space??? used 559K, capacity 588K, committed 640K, >>>>> reserved 1048576K >>>>> >>>>> ZGC Page Table: >>>>> ?Small?? 0x0000000000000000 0x0000000000200000 0x0000000000200000 >>>>> Allocating >>>>> ?Small?? 0x0000000000200000 0x0000000000240000 0x0000000000400000 >>>>> Allocating >>>>> ?Small?? 0x0000000000400000 0x0000000000600000 0x0000000000600000 >>>>> Allocating >>>>> ?Small?? 0x0000000000600000 0x0000000000800000 0x0000000000800000 >>>>> Allocating >>>>> ?Small?? 0x0000000000800000 0x00000000009c0000 0x0000000000a00000 >>>>> Allocating >>>>> ?Small?? 0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 >>>>> Allocating >>>>> >>>>> Thanks, >>>>> StefanK >>>> >> > From per.liden at oracle.com Tue Jun 2 11:18:34 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 2 Jun 2020 13:18:34 +0200 Subject: RFR: 8246135: ZGC: Save important log lines and print them when dumping hs_err files In-Reply-To: References: Message-ID: <5a9e6e8b-cccc-c70a-19e6-0fcbb5b80e9f@oracle.com> Looks good! /Per On 6/1/20 4:38 PM, Stefan Karlsson wrote: > Updated webrev: > https://cr.openjdk.java.net/~stefank/8246135/webrev.02 > > StefanJ asked if I could make this a utility that other GCs could use as > well. I've moved the functionality to gc/shared/gcLogPrecious.[hc]pp, > but I haven't implemented this for the other GCs. That part is left for > separate RFEs. 
> > Thanks, > StefanK > > On 2020-05-29 12:23, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to save some of the important ZGC log lines >> and print them when dumping hs_err files. >> >> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8246135 >> >> The patch adds a concept of "precious" log lines. What's typically >> logged are GC initialization lines, but also error messages are saved. >> These lines are then dumped in the hs_err file if the JVM crashes or >> hits an assert. The lines can also be printed in a debugger to get a >> quick overview when debugging. >> >> The precious lines are always saved, but just like any other Unified >> Logging calls, only logged if the tags are enabled. >> >> The patch builds on the JDK-8246134 patch. The hs_err output looks >> like this: >> >> ZGC Precious Log: >> ?NUMA Support: Disabled >> ?CPUs: 8 total, 8 available >> ?Memory: 16384M >> ?Large Page Support: Disabled >> ?Medium Page Size: 32M >> ?Workers: 5 parallel, 1 concurrent >> ?Address Space Type: Contiguous/Unrestricted/Complete >> ?Address Space Size: 65536M x 3 = 196608M >> ?Min Capacity: 42M >> ?Initial Capacity: 256M >> ?Max Capacity: 4096M >> ?Max Reserve: 42M >> ?Pre-touch: Disabled >> ?Uncommit: Enabled >> ?Uncommit Delay: 300s >> ?Runtime Workers: 5 parallel >> >> ZGC Globals: >> ?GlobalPhase:?????? 2 (Relocate) >> ?GlobalSeqNum:????? 1 >> ?Offset Max:??????? 4096G (0x0000040000000000) >> ?Page Size Small:?? 2M >> ?Page Size Medium:? 32M >> >> ZGC Metadata Bits: >> ?Good:????????????? 0x0000100000000000 >> ?Bad:?????????????? 0x00002c0000000000 >> ?WeakBad:?????????? 0x00000c0000000000 >> ?Marked:??????????? 0x0000040000000000 >> ?Remapped:????????? 0x0000100000000000 >> >> Heap: >> ?ZHeap?????????? used 12M, capacity 256M, max capacity 4096M >> ?Metaspace?????? used 6501K, capacity 6615K, committed 6784K, reserved >> 1056768K >> ? class space??? used 559K, capacity 588K, committed 640K, reserved >> 1048576K >> >> ZGC Page Table: >> ?Small?? 0x0000000000000000 0x0000000000200000 0x0000000000200000 >> Allocating >> ?Small?? 0x0000000000200000 0x0000000000240000 0x0000000000400000 >> Allocating >> ?Small?? 0x0000000000400000 0x0000000000600000 0x0000000000600000 >> Allocating >> ?Small?? 0x0000000000600000 0x0000000000800000 0x0000000000800000 >> Allocating >> ?Small?? 0x0000000000800000 0x00000000009c0000 0x0000000000a00000 >> Allocating >> ?Small?? 0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 >> Allocating >> >> Thanks, >> StefanK > From zgu at redhat.com Tue Jun 2 11:22:02 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 2 Jun 2020 07:22:02 -0400 Subject: RFR (S) 8246100: Shenandoah: walk roots in more efficient order In-Reply-To: <7945aaf1-da2c-83ef-44d5-caf34e7ed607@redhat.com> References: <4a733745-6f94-b6aa-b8c9-8daedace22f2@redhat.com> <3a8331ad-9ec1-dda9-058d-6edcdd594e4d@redhat.com> <7945aaf1-da2c-83ef-44d5-caf34e7ed607@redhat.com> Message-ID: <77219dc7-6dfe-9f45-61d1-20b5da789237@redhat.com> On 6/2/20 5:52 AM, Aleksey Shipilev wrote: > On 6/1/20 9:26 PM, Zhengyu Gu wrote: >> Wait, should we process CLDG early? CLD likely uneven, e.g. boot class >> loader may take a lot longer than others. > > It is not uneven in my tests. But even if it is, we want to do CLDG before thread/code roots, which > is what the patch does. With 8246097, it would be moved to the "lightweight/limited" block before. > >> On 6/1/20 3:03 PM, Zhengyu Gu wrote: >>> Looks good in general. 
But not sure why call vm/weak/dedup roots limited
>>> parallel, I think they are fully parallel.
>
> Yes. But they are also lightweight, that's what the block does: both lightweight or limited-parallel.
>

Okay, then.

-Zhengyu

>>>> Webrev:
>>>>    https://cr.openjdk.java.net/~shade/8246100/webrev.02/
>

From zgu at redhat.com  Tue Jun 2 12:44:23 2020
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 2 Jun 2020 08:44:23 -0400
Subject: RFR (S) 8246097: Shenandoah: limit parallelism in CLDG root handling
In-Reply-To: <31edeb90-b199-0ba2-e782-c38792d3bb4c@redhat.com>
References: <31edeb90-b199-0ba2-e782-c38792d3bb4c@redhat.com>
Message-ID: <819d513f-ad38-7287-6e0a-dfbbf0f179db@redhat.com>

Looks good.

-Zhengyu

On 6/1/20 6:38 AM, Aleksey Shipilev wrote:
> RFE:
>    https://bugs.openjdk.java.net/browse/JDK-8246097
>
> See the details in the RFE.
>
> Webrev:
>    https://cr.openjdk.java.net/~shade/8246097/webrev.02/
>
> Testing: hotspot_gc_shenandoah, benchmarks.
>

From zgu at redhat.com  Tue Jun 2 13:12:41 2020
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 2 Jun 2020 09:12:41 -0400
Subject: [15] RFR 8245961: Shenandoah: move some root marking to concurrent phase
In-Reply-To: 
References: <61efdf04-5d74-8a59-a193-576b115738ef@redhat.com>
Message-ID: <5b7385d8-8b08-fd52-4ef0-d846fc0c3e11@redhat.com>

On 6/2/20 5:49 AM, Aleksey Shipilev wrote:
> On 6/1/20 8:23 PM, Zhengyu Gu wrote:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8245961
>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8245961/webrev.00/
>
> Let me push 8246097 and 8246100 before this, so we would have the backport base.

Sure.

>
> Mostly stylistic nits follow:
>
> *) Does it make sense to move ShenandoahConcurrentRootsIterator to shenandoahRootProcessor? And do
> something like ShenandoahConcurrentRootScanner?

Okay.

>
> *) conc_mark_roots can be just " Concurrent Roots", no need to duplicate "Concurrent Mark":
>
> 70 f(conc_mark, "Concurrent Marking") \
> 71 f(conc_mark_roots, " Concurrent Mark Roots ") \
> 72 SHENANDOAH_PAR_PHASE_DO(conc_mark_roots, " CM: ", f) \

Fixed

>
> *) Typo: "Concurrnet":
>
> 145 f(full_gc_scan_conc_roots, " Scan Concurrnet Roots") \

Fixed.

http://cr.openjdk.java.net/~zgu/JDK-8245961/webrev.01/index.html

Thanks,

-Zhengyu

From shade at redhat.com  Tue Jun 2 13:34:45 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 2 Jun 2020 15:34:45 +0200
Subject: RFR (S) 8246097: Shenandoah: limit parallelism in CLDG root handling
In-Reply-To: <819d513f-ad38-7287-6e0a-dfbbf0f179db@redhat.com>
References: <31edeb90-b199-0ba2-e782-c38792d3bb4c@redhat.com> <819d513f-ad38-7287-6e0a-dfbbf0f179db@redhat.com>
Message-ID: 

On 6/2/20 2:44 PM, Zhengyu Gu wrote:
> Looks good.

Thanks, pushed.

-- 
-Aleksey

From stefan.karlsson at oracle.com  Tue Jun 2 13:51:51 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Tue, 2 Jun 2020 15:51:51 +0200
Subject: RFR: 8246135: ZGC: Save important log lines and print them when dumping hs_err files
In-Reply-To: 
References: 
Message-ID: <208f2f9e-4387-b851-7097-3f16e710a21d@oracle.com>

Hi all,

While working on getting more information when dumping early during
initialization, it became apparent that we don't print these log lines
as early as we could. In ZGC we can assert and/or fail during the set up
of the heap. I'd like to print the precious lines even when that
happens. The following patch moves the precious line printing out of
ZCollectedHeap and into a direct call from the VMError code. GCs that
populate these lines will now automatically get them printed.
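A minimal standalone sketch of the save-and-replay pattern described in
this thread, assuming nothing about HotSpot internals; the names below
(PreciousLog, log, print) are illustrative and are not the actual
gcLogPrecious API:

#include <cstdarg>
#include <cstdio>
#include <mutex>
#include <string>
#include <vector>

// Every "precious" line is unconditionally saved, so the error reporter
// can replay it into an hs_err-style dump regardless of which logging
// tags were enabled when the line was produced.
class PreciousLog {
 public:
  static void log(const char* fmt, ...) {
    char buf[512];
    va_list ap;
    va_start(ap, fmt);
    std::vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    std::lock_guard<std::mutex> guard(_mutex);
    _lines.push_back(buf);               // always saved ...
    std::printf("[gc,init] %s\n", buf);  // ... and also logged
  }

  // Called directly from the error reporting path, so it works even if
  // the failure happens while the heap is still being set up. A real VM
  // must be far more careful about allocation and locking here than
  // this sketch is.
  static void print(std::FILE* out) {
    std::lock_guard<std::mutex> guard(_mutex);
    std::fprintf(out, "GC Precious Log:\n");
    for (const std::string& line : _lines) {
      std::fprintf(out, " %s\n", line.c_str());
    }
  }

 private:
  static std::mutex _mutex;
  static std::vector<std::string> _lines;
};

std::mutex PreciousLog::_mutex;
std::vector<std::string> PreciousLog::_lines;

int main() {
  PreciousLog::log("Max Capacity: %dM", 4096);
  PreciousLog::log("Uncommit: %s", "Enabled");
  PreciousLog::print(stderr);  // what the hs_err path would do on a crash
  return 0;
}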
https://cr.openjdk.java.net/~stefank/8246135/webrev.04.delta https://cr.openjdk.java.net/~stefank/8246135/webrev.04 Thanks, StefanK On 2020-05-29 12:23, Stefan Karlsson wrote: > Hi all, > > Please review this patch to save some of the important ZGC log lines > and print them when dumping hs_err files. > > https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8246135 > > The patch adds a concept of "precious" log lines. What's typically > logged are GC initialization lines, but also error messages are saved. > These lines are then dumped in the hs_err file if the JVM crashes or > hits an assert. The lines can also be printed in a debugger to get a > quick overview when debugging. > > The precious lines are always saved, but just like any other Unified > Logging calls, only logged if the tags are enabled. > > The patch builds on the JDK-8246134 patch. The hs_err output looks > like this: > > ZGC Precious Log: > ?NUMA Support: Disabled > ?CPUs: 8 total, 8 available > ?Memory: 16384M > ?Large Page Support: Disabled > ?Medium Page Size: 32M > ?Workers: 5 parallel, 1 concurrent > ?Address Space Type: Contiguous/Unrestricted/Complete > ?Address Space Size: 65536M x 3 = 196608M > ?Min Capacity: 42M > ?Initial Capacity: 256M > ?Max Capacity: 4096M > ?Max Reserve: 42M > ?Pre-touch: Disabled > ?Uncommit: Enabled > ?Uncommit Delay: 300s > ?Runtime Workers: 5 parallel > > ZGC Globals: > ?GlobalPhase:?????? 2 (Relocate) > ?GlobalSeqNum:????? 1 > ?Offset Max:??????? 4096G (0x0000040000000000) > ?Page Size Small:?? 2M > ?Page Size Medium:? 32M > > ZGC Metadata Bits: > ?Good:????????????? 0x0000100000000000 > ?Bad:?????????????? 0x00002c0000000000 > ?WeakBad:?????????? 0x00000c0000000000 > ?Marked:??????????? 0x0000040000000000 > ?Remapped:????????? 0x0000100000000000 > > Heap: > ?ZHeap?????????? used 12M, capacity 256M, max capacity 4096M > ?Metaspace?????? used 6501K, capacity 6615K, committed 6784K, reserved > 1056768K > ? class space??? used 559K, capacity 588K, committed 640K, reserved > 1048576K > > ZGC Page Table: > ?Small?? 0x0000000000000000 0x0000000000200000 0x0000000000200000 > Allocating > ?Small?? 0x0000000000200000 0x0000000000240000 0x0000000000400000 > Allocating > ?Small?? 0x0000000000400000 0x0000000000600000 0x0000000000600000 > Allocating > ?Small?? 0x0000000000600000 0x0000000000800000 0x0000000000800000 > Allocating > ?Small?? 0x0000000000800000 0x00000000009c0000 0x0000000000a00000 > Allocating > ?Small?? 0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 > Allocating > > Thanks, > StefanK From per.liden at oracle.com Tue Jun 2 13:56:03 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 2 Jun 2020 15:56:03 +0200 Subject: RFR: 8246135: ZGC: Save important log lines and print them when dumping hs_err files In-Reply-To: <208f2f9e-4387-b851-7097-3f16e710a21d@oracle.com> References: <208f2f9e-4387-b851-7097-3f16e710a21d@oracle.com> Message-ID: <7a2d0d88-cbc4-bbac-8032-b0175412a394@oracle.com> Looks good! /Per On 6/2/20 3:51 PM, Stefan Karlsson wrote: > Hi all, > > While working on getting more information when dumping early during > initialization, it became apparent that we don't print these log lines > as early as we could. In ZGC we can assert and/or fail during the set up > of the heap. I'd like to print the precious lines even when that > happens. The following patch moves the the precious line printing out of > ZCollectedHeap and into a direct call from the VMError code. 
GCs that > populate these lines will now automatically get them printed. > > https://cr.openjdk.java.net/~stefank/8246135/webrev.04.delta > https://cr.openjdk.java.net/~stefank/8246135/webrev.04 > > Thanks, > StefanK > > On 2020-05-29 12:23, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to save some of the important ZGC log lines >> and print them when dumping hs_err files. >> >> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8246135 >> >> The patch adds a concept of "precious" log lines. What's typically >> logged are GC initialization lines, but also error messages are saved. >> These lines are then dumped in the hs_err file if the JVM crashes or >> hits an assert. The lines can also be printed in a debugger to get a >> quick overview when debugging. >> >> The precious lines are always saved, but just like any other Unified >> Logging calls, only logged if the tags are enabled. >> >> The patch builds on the JDK-8246134 patch. The hs_err output looks >> like this: >> >> ZGC Precious Log: >> ?NUMA Support: Disabled >> ?CPUs: 8 total, 8 available >> ?Memory: 16384M >> ?Large Page Support: Disabled >> ?Medium Page Size: 32M >> ?Workers: 5 parallel, 1 concurrent >> ?Address Space Type: Contiguous/Unrestricted/Complete >> ?Address Space Size: 65536M x 3 = 196608M >> ?Min Capacity: 42M >> ?Initial Capacity: 256M >> ?Max Capacity: 4096M >> ?Max Reserve: 42M >> ?Pre-touch: Disabled >> ?Uncommit: Enabled >> ?Uncommit Delay: 300s >> ?Runtime Workers: 5 parallel >> >> ZGC Globals: >> ?GlobalPhase:?????? 2 (Relocate) >> ?GlobalSeqNum:????? 1 >> ?Offset Max:??????? 4096G (0x0000040000000000) >> ?Page Size Small:?? 2M >> ?Page Size Medium:? 32M >> >> ZGC Metadata Bits: >> ?Good:????????????? 0x0000100000000000 >> ?Bad:?????????????? 0x00002c0000000000 >> ?WeakBad:?????????? 0x00000c0000000000 >> ?Marked:??????????? 0x0000040000000000 >> ?Remapped:????????? 0x0000100000000000 >> >> Heap: >> ?ZHeap?????????? used 12M, capacity 256M, max capacity 4096M >> ?Metaspace?????? used 6501K, capacity 6615K, committed 6784K, reserved >> 1056768K >> ? class space??? used 559K, capacity 588K, committed 640K, reserved >> 1048576K >> >> ZGC Page Table: >> ?Small?? 0x0000000000000000 0x0000000000200000 0x0000000000200000 >> Allocating >> ?Small?? 0x0000000000200000 0x0000000000240000 0x0000000000400000 >> Allocating >> ?Small?? 0x0000000000400000 0x0000000000600000 0x0000000000600000 >> Allocating >> ?Small?? 0x0000000000600000 0x0000000000800000 0x0000000000800000 >> Allocating >> ?Small?? 0x0000000000800000 0x00000000009c0000 0x0000000000a00000 >> Allocating >> ?Small?? 0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 >> Allocating >> >> Thanks, >> StefanK > From stefan.karlsson at oracle.com Tue Jun 2 13:56:45 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 2 Jun 2020 15:56:45 +0200 Subject: RFR: 8246135: ZGC: Save important log lines and print them when dumping hs_err files In-Reply-To: <7a2d0d88-cbc4-bbac-8032-b0175412a394@oracle.com> References: <208f2f9e-4387-b851-7097-3f16e710a21d@oracle.com> <7a2d0d88-cbc4-bbac-8032-b0175412a394@oracle.com> Message-ID: Thanks! StefanK On 2020-06-02 15:56, Per Liden wrote: > Looks good! > > /Per > > On 6/2/20 3:51 PM, Stefan Karlsson wrote: >> Hi all, >> >> While working on getting more information when dumping early during >> initialization, it became apparent that we don't print these log >> lines as early as we could. 
In ZGC we can assert and/or fail during >> the set up of the heap. I'd like to print the precious lines even >> when that happens. The following patch moves the the precious line >> printing out of ZCollectedHeap and into a direct call from the >> VMError code. GCs that populate these lines will now automatically >> get them printed. >> >> https://cr.openjdk.java.net/~stefank/8246135/webrev.04.delta >> https://cr.openjdk.java.net/~stefank/8246135/webrev.04 >> >> Thanks, >> StefanK >> >> On 2020-05-29 12:23, Stefan Karlsson wrote: >>> Hi all, >>> >>> Please review this patch to save some of the important ZGC log lines >>> and print them when dumping hs_err files. >>> >>> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ >>> https://bugs.openjdk.java.net/browse/JDK-8246135 >>> >>> The patch adds a concept of "precious" log lines. What's typically >>> logged are GC initialization lines, but also error messages are >>> saved. These lines are then dumped in the hs_err file if the JVM >>> crashes or hits an assert. The lines can also be printed in a >>> debugger to get a quick overview when debugging. >>> >>> The precious lines are always saved, but just like any other Unified >>> Logging calls, only logged if the tags are enabled. >>> >>> The patch builds on the JDK-8246134 patch. The hs_err output looks >>> like this: >>> >>> ZGC Precious Log: >>> ?NUMA Support: Disabled >>> ?CPUs: 8 total, 8 available >>> ?Memory: 16384M >>> ?Large Page Support: Disabled >>> ?Medium Page Size: 32M >>> ?Workers: 5 parallel, 1 concurrent >>> ?Address Space Type: Contiguous/Unrestricted/Complete >>> ?Address Space Size: 65536M x 3 = 196608M >>> ?Min Capacity: 42M >>> ?Initial Capacity: 256M >>> ?Max Capacity: 4096M >>> ?Max Reserve: 42M >>> ?Pre-touch: Disabled >>> ?Uncommit: Enabled >>> ?Uncommit Delay: 300s >>> ?Runtime Workers: 5 parallel >>> >>> ZGC Globals: >>> ?GlobalPhase:?????? 2 (Relocate) >>> ?GlobalSeqNum:????? 1 >>> ?Offset Max:??????? 4096G (0x0000040000000000) >>> ?Page Size Small:?? 2M >>> ?Page Size Medium:? 32M >>> >>> ZGC Metadata Bits: >>> ?Good:????????????? 0x0000100000000000 >>> ?Bad:?????????????? 0x00002c0000000000 >>> ?WeakBad:?????????? 0x00000c0000000000 >>> ?Marked:??????????? 0x0000040000000000 >>> ?Remapped:????????? 0x0000100000000000 >>> >>> Heap: >>> ?ZHeap?????????? used 12M, capacity 256M, max capacity 4096M >>> ?Metaspace?????? used 6501K, capacity 6615K, committed 6784K, >>> reserved 1056768K >>> ? class space??? used 559K, capacity 588K, committed 640K, reserved >>> 1048576K >>> >>> ZGC Page Table: >>> ?Small?? 0x0000000000000000 0x0000000000200000 0x0000000000200000 >>> Allocating >>> ?Small?? 0x0000000000200000 0x0000000000240000 0x0000000000400000 >>> Allocating >>> ?Small?? 0x0000000000400000 0x0000000000600000 0x0000000000600000 >>> Allocating >>> ?Small?? 0x0000000000600000 0x0000000000800000 0x0000000000800000 >>> Allocating >>> ?Small?? 0x0000000000800000 0x00000000009c0000 0x0000000000a00000 >>> Allocating >>> ?Small?? 
0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 >>> Allocating >>> >>> Thanks, >>> StefanK >> From stefan.johansson at oracle.com Tue Jun 2 13:58:19 2020 From: stefan.johansson at oracle.com (stefan.johansson at oracle.com) Date: Tue, 2 Jun 2020 15:58:19 +0200 Subject: RFR: 8246135: ZGC: Save important log lines and print them when dumping hs_err files In-Reply-To: <208f2f9e-4387-b851-7097-3f16e710a21d@oracle.com> References: <208f2f9e-4387-b851-7097-3f16e710a21d@oracle.com> Message-ID: <5b677c7b-d36e-3583-bdbf-0caf095d3a99@oracle.com> Hi Stefan, On 2020-06-02 15:51, Stefan Karlsson wrote: > Hi all, > > While working on getting more information when dumping early during > initialization, it became apparent that we don't print these log lines > as early as we could. In ZGC we can assert and/or fail during the set up > of the heap. I'd like to print the precious lines even when that > happens. The following patch moves the the precious line printing out of > ZCollectedHeap and into a direct call from the VMError code. GCs that > populate these lines will now automatically get them printed. > > https://cr.openjdk.java.net/~stefank/8246135/webrev.04.delta > https://cr.openjdk.java.net/~stefank/8246135/webrev.04 Very nice, looks great! Thanks, StefanJ > > Thanks, > StefanK > > On 2020-05-29 12:23, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to save some of the important ZGC log lines >> and print them when dumping hs_err files. >> >> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8246135 >> >> The patch adds a concept of "precious" log lines. What's typically >> logged are GC initialization lines, but also error messages are saved. >> These lines are then dumped in the hs_err file if the JVM crashes or >> hits an assert. The lines can also be printed in a debugger to get a >> quick overview when debugging. >> >> The precious lines are always saved, but just like any other Unified >> Logging calls, only logged if the tags are enabled. >> >> The patch builds on the JDK-8246134 patch. The hs_err output looks >> like this: >> >> ZGC Precious Log: >> ?NUMA Support: Disabled >> ?CPUs: 8 total, 8 available >> ?Memory: 16384M >> ?Large Page Support: Disabled >> ?Medium Page Size: 32M >> ?Workers: 5 parallel, 1 concurrent >> ?Address Space Type: Contiguous/Unrestricted/Complete >> ?Address Space Size: 65536M x 3 = 196608M >> ?Min Capacity: 42M >> ?Initial Capacity: 256M >> ?Max Capacity: 4096M >> ?Max Reserve: 42M >> ?Pre-touch: Disabled >> ?Uncommit: Enabled >> ?Uncommit Delay: 300s >> ?Runtime Workers: 5 parallel >> >> ZGC Globals: >> ?GlobalPhase:?????? 2 (Relocate) >> ?GlobalSeqNum:????? 1 >> ?Offset Max:??????? 4096G (0x0000040000000000) >> ?Page Size Small:?? 2M >> ?Page Size Medium:? 32M >> >> ZGC Metadata Bits: >> ?Good:????????????? 0x0000100000000000 >> ?Bad:?????????????? 0x00002c0000000000 >> ?WeakBad:?????????? 0x00000c0000000000 >> ?Marked:??????????? 0x0000040000000000 >> ?Remapped:????????? 0x0000100000000000 >> >> Heap: >> ?ZHeap?????????? used 12M, capacity 256M, max capacity 4096M >> ?Metaspace?????? used 6501K, capacity 6615K, committed 6784K, reserved >> 1056768K >> ? class space??? used 559K, capacity 588K, committed 640K, reserved >> 1048576K >> >> ZGC Page Table: >> ?Small?? 0x0000000000000000 0x0000000000200000 0x0000000000200000 >> Allocating >> ?Small?? 0x0000000000200000 0x0000000000240000 0x0000000000400000 >> Allocating >> ?Small?? 
0x0000000000400000 0x0000000000600000 0x0000000000600000 >> Allocating >> ?Small?? 0x0000000000600000 0x0000000000800000 0x0000000000800000 >> Allocating >> ?Small?? 0x0000000000800000 0x00000000009c0000 0x0000000000a00000 >> Allocating >> ?Small?? 0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 >> Allocating >> >> Thanks, >> StefanK > From stefan.karlsson at oracle.com Tue Jun 2 13:59:22 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 2 Jun 2020 15:59:22 +0200 Subject: RFR: 8246135: ZGC: Save important log lines and print them when dumping hs_err files In-Reply-To: <5b677c7b-d36e-3583-bdbf-0caf095d3a99@oracle.com> References: <208f2f9e-4387-b851-7097-3f16e710a21d@oracle.com> <5b677c7b-d36e-3583-bdbf-0caf095d3a99@oracle.com> Message-ID: <25bd3301-a743-b18f-c837-7103eadbbb8b@oracle.com> Thanks StefanJ! StefanK On 2020-06-02 15:58, stefan.johansson at oracle.com wrote: > Hi Stefan, > > On 2020-06-02 15:51, Stefan Karlsson wrote: >> Hi all, >> >> While working on getting more information when dumping early during >> initialization, it became apparent that we don't print these log >> lines as early as we could. In ZGC we can assert and/or fail during >> the set up of the heap. I'd like to print the precious lines even >> when that happens. The following patch moves the the precious line >> printing out of ZCollectedHeap and into a direct call from the >> VMError code. GCs that populate these lines will now automatically >> get them printed. >> >> https://cr.openjdk.java.net/~stefank/8246135/webrev.04.delta >> https://cr.openjdk.java.net/~stefank/8246135/webrev.04 > Very nice, looks great! > > Thanks, > StefanJ > >> >> Thanks, >> StefanK >> >> On 2020-05-29 12:23, Stefan Karlsson wrote: >>> Hi all, >>> >>> Please review this patch to save some of the important ZGC log lines >>> and print them when dumping hs_err files. >>> >>> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ >>> https://bugs.openjdk.java.net/browse/JDK-8246135 >>> >>> The patch adds a concept of "precious" log lines. What's typically >>> logged are GC initialization lines, but also error messages are >>> saved. These lines are then dumped in the hs_err file if the JVM >>> crashes or hits an assert. The lines can also be printed in a >>> debugger to get a quick overview when debugging. >>> >>> The precious lines are always saved, but just like any other Unified >>> Logging calls, only logged if the tags are enabled. >>> >>> The patch builds on the JDK-8246134 patch. The hs_err output looks >>> like this: >>> >>> ZGC Precious Log: >>> ?NUMA Support: Disabled >>> ?CPUs: 8 total, 8 available >>> ?Memory: 16384M >>> ?Large Page Support: Disabled >>> ?Medium Page Size: 32M >>> ?Workers: 5 parallel, 1 concurrent >>> ?Address Space Type: Contiguous/Unrestricted/Complete >>> ?Address Space Size: 65536M x 3 = 196608M >>> ?Min Capacity: 42M >>> ?Initial Capacity: 256M >>> ?Max Capacity: 4096M >>> ?Max Reserve: 42M >>> ?Pre-touch: Disabled >>> ?Uncommit: Enabled >>> ?Uncommit Delay: 300s >>> ?Runtime Workers: 5 parallel >>> >>> ZGC Globals: >>> ?GlobalPhase:?????? 2 (Relocate) >>> ?GlobalSeqNum:????? 1 >>> ?Offset Max:??????? 4096G (0x0000040000000000) >>> ?Page Size Small:?? 2M >>> ?Page Size Medium:? 32M >>> >>> ZGC Metadata Bits: >>> ?Good:????????????? 0x0000100000000000 >>> ?Bad:?????????????? 0x00002c0000000000 >>> ?WeakBad:?????????? 0x00000c0000000000 >>> ?Marked:??????????? 0x0000040000000000 >>> ?Remapped:????????? 0x0000100000000000 >>> >>> Heap: >>> ?ZHeap?????????? 
used 12M, capacity 256M, max capacity 4096M >>> ?Metaspace?????? used 6501K, capacity 6615K, committed 6784K, >>> reserved 1056768K >>> ? class space??? used 559K, capacity 588K, committed 640K, reserved >>> 1048576K >>> >>> ZGC Page Table: >>> ?Small?? 0x0000000000000000 0x0000000000200000 0x0000000000200000 >>> Allocating >>> ?Small?? 0x0000000000200000 0x0000000000240000 0x0000000000400000 >>> Allocating >>> ?Small?? 0x0000000000400000 0x0000000000600000 0x0000000000600000 >>> Allocating >>> ?Small?? 0x0000000000600000 0x0000000000800000 0x0000000000800000 >>> Allocating >>> ?Small?? 0x0000000000800000 0x00000000009c0000 0x0000000000a00000 >>> Allocating >>> ?Small?? 0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 >>> Allocating >>> >>> Thanks, >>> StefanK >> From shade at redhat.com Tue Jun 2 14:13:43 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 2 Jun 2020 16:13:43 +0200 Subject: [15] RFR 8245961: Shenandoah: move some root marking to concurrent phase In-Reply-To: <5b7385d8-8b08-fd52-4ef0-d846fc0c3e11@redhat.com> References: <61efdf04-5d74-8a59-a193-576b115738ef@redhat.com> <5b7385d8-8b08-fd52-4ef0-d846fc0c3e11@redhat.com> Message-ID: <06e40ce2-9f0d-e90a-8929-9600c3bbf4c5@redhat.com> On 6/2/20 3:12 PM, Zhengyu Gu wrote: > http://cr.openjdk.java.net/~zgu/JDK-8245961/webrev.01/index.html Looks OK. A few more minor nits (after rebasing to jdk/jdk): https://cr.openjdk.java.net/~shade/shenandoah/8245961-shade-updates.patch Does it pass hotspot_gc_shenandoah? Just got a failure while testing a patch above. Shouldn't we also except the newly handled concurrent roots from this verification? # Internal Error (/home/shade/trunks/jdk-jdk/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp:943), pid=14874, tid=14881 # Error: Verify Roots; Should not be forwarded Referenced from: interior location: 0x00007fb898447a18 outside of Java heap 0x00007fb898447a18 is at entry_point+440 in (nmethod*)0x00007fb898447690 Object: 0x00000000c0304350 - klass 0x000000080008bbf8 jdk.internal.loader.ClassLoaders$AppClassLoader not allocated after mark start marked in collection set mark: marked(0x00000000fff80003) region: | 6|CS |BTE c0300000, c0380000, c0380000|TAMS c0380000|UWM c0380000|U 512K|T 494K|G 0B|S 17472B|L 17472B|CP 0 Forwardee: 0x00000000fff80000 - klass 0x000000080008bbf8 jdk.internal.loader.ClassLoaders$AppClassLoader allocated after mark start marked not in collection set mark: mark(is_neutral hash=0x000000000c387f44 age=0) region: | 2047|R |BTE fff80000, 100000000, 100000000|TAMS fff80000|UWM 100000000|U 512K|T 0B|G 512K|S 0B|L 0B|CP 0 Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x17246ee] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x37e V [libjvm.so+0x17253ff] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f V [libjvm.so+0x8c8301] report_vm_error(char const*, int, char const*, char const*, ...)+0x111 V [libjvm.so+0x1420dbd] ShenandoahAsserts::print_failure(ShenandoahAsserts::SafeLevel, oop, void*, oop, char const*, char const*, char const*, int)+0x3ed V [libjvm.so+0x150fe53] ShenandoahVerifyNoForwared::do_oop(oop*)+0xa3 V [libjvm.so+0x11fe150] nmethod::oops_do(OopClosure*, bool)+0x120 V [libjvm.so+0xc41907] CodeBlobToOopClosure::do_code_blob(CodeBlob*)+0x37 V [libjvm.so+0x16576d7] JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x187 V 
[libjvm.so+0x1662dab] Threads::possibly_parallel_oops_do(bool, OopClosure*, CodeBlobClosure*)+0x7b V [libjvm.so+0x14e7730] ShenandoahRootVerifier::oops_do(OopClosure*)+0x290 V [libjvm.so+0x150e171] ShenandoahVerifier::verify_roots_no_forwarded_except(ShenandoahRootVerifier::RootTypes)+0x41 V [libjvm.so+0x149d1e0] ShenandoahHeap::op_final_mark()+0x4e0 V [libjvm.so+0x149d85f] ShenandoahHeap::entry_final_mark()+0xbf V [libjvm.so+0x150dd54] VM_ShenandoahFinalMarkStartEvac::doit()+0x34 V [libjvm.so+0x172683d] VM_Operation::evaluate()+0x1cd V [libjvm.so+0x175562b] VMThread::evaluate_operation(VM_Operation*) [clone .constprop.71]+0x13b V [libjvm.so+0x175606d] VMThread::loop()+0x7bd V [libjvm.so+0x17564ba] VMThread::run()+0xca V [libjvm.so+0x1664616] Thread::call_run()+0xf6 V [libjvm.so+0x12907de] thread_native_entry(Thread*)+0x10e -- Thanks, -Aleksey From serguei.spitsyn at oracle.com Tue Jun 2 16:54:42 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 2 Jun 2020 09:54:42 -0700 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: Hi Richard, This looks good to me. Thanks, Serguei On 5/28/20 09:02, Vladimir Kozlov wrote: > Vladimir Ivanov is on break currently. > It looks good to me. > > Thanks, > Vladimir K > > On 5/26/20 7:31 AM, Reingruber, Richard wrote: >> Hi Vladimir, >> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >> >>> Not an expert in JVMTI code base, so can't comment on the actual >>> changes. >> >>> ? From JIT-compilers perspective it looks good. >> >> I put out webrev.1 a while ago [1]: >> >> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >> Webrev(delta): >> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >> >> You originally suggested to use a handshake to switch a thread into >> interpreter mode [2]. I'm using >> a direct handshake now, because I think it is the best fit. >> >> May I ask if webrev.1 still looks good to you from JIT-compilers >> perspective? >> >> Can I list you as (partial) Reviewer? >> >> Thanks, Richard. >> >> [1] >> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >> [2] >> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >> >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Freitag, 7. Februar 2020 09:19 >> To: Reingruber, Richard ; >> serviceability-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(S) 8238585: Use handshake for >> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >> compiled methods on stack not_entrant >> >> >>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >> >> Not an expert in JVMTI code base, so can't comment on the actual >> changes. >> >> ? From JIT-compilers perspective it looks good. >> >> Best regards, >> Vladimir Ivanov >> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>> >>> The change avoids making all compiled methods on stack not_entrant >>> when switching a java thread to >>> interpreter only execution for jvmti purposes. It is sufficient to >>> deoptimize the compiled frames on stack. 
>>> >>> Additionally a handshake is used instead of a vm operation to walk >>> the stack and do the deoptimizations. >>> >>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>> release builds on all platforms. >>> >>> Thanks, Richard. >>> >>> See also my question if anyone knows a reason for making the >>> compiled methods not_entrant: >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html >>> >>> From zgu at redhat.com Tue Jun 2 17:50:50 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 2 Jun 2020 13:50:50 -0400 Subject: [15] RFR 8245961: Shenandoah: move some root marking to concurrent phase In-Reply-To: <06e40ce2-9f0d-e90a-8929-9600c3bbf4c5@redhat.com> References: <61efdf04-5d74-8a59-a193-576b115738ef@redhat.com> <5b7385d8-8b08-fd52-4ef0-d846fc0c3e11@redhat.com> <06e40ce2-9f0d-e90a-8929-9600c3bbf4c5@redhat.com> Message-ID: <57228404-acb5-9537-caef-35919cbff493@redhat.com> On 6/2/20 10:13 AM, Aleksey Shipilev wrote: > On 6/2/20 3:12 PM, Zhengyu Gu wrote: >> http://cr.openjdk.java.net/~zgu/JDK-8245961/webrev.01/index.html > > Looks OK. A few more minor nits (after rebasing to jdk/jdk): > https://cr.openjdk.java.net/~shade/shenandoah/8245961-shade-updates.patch Rebased webrev.01 to jdk/jdk and manually folded some of your suggested changes: http://cr.openjdk.java.net/~zgu/JDK-8245961/webrev.02/ Test: hotspot_gc_shenandoah (with +ShenandoahVerify), all clear. Thanks, -Zhengyu > > Does it pass hotspot_gc_shenandoah? > > Just got a failure while testing a patch above. Shouldn't we also except the newly handled > concurrent roots from this verification? > > # Internal Error > (/home/shade/trunks/jdk-jdk/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp:943), pid=14874, > tid=14881 > # Error: Verify Roots; Should not be forwarded > > Referenced from: > interior location: 0x00007fb898447a18 > outside of Java heap > 0x00007fb898447a18 is at entry_point+440 in (nmethod*)0x00007fb898447690 > > Object: > 0x00000000c0304350 - klass 0x000000080008bbf8 jdk.internal.loader.ClassLoaders$AppClassLoader > not allocated after mark start > marked > in collection set > mark: marked(0x00000000fff80003) > region: | 6|CS |BTE c0300000, c0380000, c0380000|TAMS c0380000|UWM > c0380000|U 512K|T 494K|G 0B|S 17472B|L 17472B|CP 0 > > Forwardee: > 0x00000000fff80000 - klass 0x000000080008bbf8 jdk.internal.loader.ClassLoaders$AppClassLoader > allocated after mark start > marked > not in collection set > mark: mark(is_neutral hash=0x000000000c387f44 age=0) > region: | 2047|R |BTE fff80000, 100000000, 100000000|TAMS fff80000|UWM > 100000000|U 512K|T 0B|G 512K|S 0B|L 0B|CP 0 > > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native > code) > V [libjvm.so+0x17246ee] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, > Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x37e > V [libjvm.so+0x17253ff] VMError::report_and_die(Thread*, void*, char const*, int, char const*, > char const*, __va_list_tag*)+0x2f > V [libjvm.so+0x8c8301] report_vm_error(char const*, int, char const*, char const*, ...)+0x111 > V [libjvm.so+0x1420dbd] ShenandoahAsserts::print_failure(ShenandoahAsserts::SafeLevel, oop, void*, > oop, char const*, char const*, char const*, int)+0x3ed > V [libjvm.so+0x150fe53] ShenandoahVerifyNoForwared::do_oop(oop*)+0xa3 > V [libjvm.so+0x11fe150] nmethod::oops_do(OopClosure*, bool)+0x120 > V [libjvm.so+0xc41907] CodeBlobToOopClosure::do_code_blob(CodeBlob*)+0x37 > V 
[libjvm.so+0x16576d7] JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x187 > V [libjvm.so+0x1662dab] Threads::possibly_parallel_oops_do(bool, OopClosure*, CodeBlobClosure*)+0x7b > V [libjvm.so+0x14e7730] ShenandoahRootVerifier::oops_do(OopClosure*)+0x290 > V [libjvm.so+0x150e171] > ShenandoahVerifier::verify_roots_no_forwarded_except(ShenandoahRootVerifier::RootTypes)+0x41 > V [libjvm.so+0x149d1e0] ShenandoahHeap::op_final_mark()+0x4e0 > V [libjvm.so+0x149d85f] ShenandoahHeap::entry_final_mark()+0xbf > V [libjvm.so+0x150dd54] VM_ShenandoahFinalMarkStartEvac::doit()+0x34 > V [libjvm.so+0x172683d] VM_Operation::evaluate()+0x1cd > V [libjvm.so+0x175562b] VMThread::evaluate_operation(VM_Operation*) [clone .constprop.71]+0x13b > V [libjvm.so+0x175606d] VMThread::loop()+0x7bd > V [libjvm.so+0x17564ba] VMThread::run()+0xca > V [libjvm.so+0x1664616] Thread::call_run()+0xf6 > V [libjvm.so+0x12907de] thread_native_entry(Thread*)+0x10e > > > From richard.reingruber at sap.com Tue Jun 2 17:57:26 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 2 Jun 2020 17:57:26 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: Hi Serguei, > This looks good to me. Thanks! From an earlier mail: > I'm thinking it would be more safe to run full tier5. I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would like to push. Thanks, Richard. -----Original Message----- From: serguei.spitsyn at oracle.com Sent: Tuesday, 2 June 2020 18:55 To: Vladimir Kozlov ; Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant Hi Richard, This looks good to me. Thanks, Serguei On 5/28/20 09:02, Vladimir Kozlov wrote: > Vladimir Ivanov is on break currently. > It looks good to me. > > Thanks, > Vladimir K > > On 5/26/20 7:31 AM, Reingruber, Richard wrote: >> Hi Vladimir, >> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >> >>> Not an expert in JVMTI code base, so can't comment on the actual >>> changes. >> >>> From JIT-compilers perspective it looks good. >> >> I put out webrev.1 a while ago [1]: >> >> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >> Webrev(delta): >> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >> >> You originally suggested to use a handshake to switch a thread into >> interpreter mode [2]. I'm using >> a direct handshake now, because I think it is the best fit. >> >> May I ask if webrev.1 still looks good to you from JIT-compilers >> perspective? >> >> Can I list you as (partial) Reviewer? >> >> Thanks, Richard. >> >> [1] >> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >> [2] >> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >> >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Friday, 7
February 2020 09:19 >> To: Reingruber, Richard ; >> serviceability-dev at openjdk.java.net; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(S) 8238585: Use handshake for >> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >> compiled methods on stack not_entrant >> >> >>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >> >> Not an expert in JVMTI code base, so can't comment on the actual >> changes. >> >> From JIT-compilers perspective it looks good. >> >> Best regards, >> Vladimir Ivanov >> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>> >>> The change avoids making all compiled methods on stack not_entrant >>> when switching a java thread to >>> interpreter only execution for jvmti purposes. It is sufficient to >>> deoptimize the compiled frames on stack. >>> >>> Additionally a handshake is used instead of a vm operation to walk >>> the stack and do the deoptimizations. >>> >>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>> release builds on all platforms. >>> >>> Thanks, Richard. >>> >>> See also my question if anyone knows a reason for making the >>> compiled methods not_entrant: >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html >>> >>> From serguei.spitsyn at oracle.com Tue Jun 2 18:01:42 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 2 Jun 2020 11:01:42 -0700 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: <4c92e183-6a3c-f8da-4330-c297ad2afef6@oracle.com> Hi Richard, On 6/2/20 10:57, Reingruber, Richard wrote: > Hi Serguei, > >> This looks good to me. > Thanks! > > From an earlier mail: > >> I'm thinking it would be more safe to run full tier5. > I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would > like to push. Okay, I'll submit a mach5 job with your fix and let you know about the results. Thanks, Serguei > Thanks, Richard. > > -----Original Message----- > From: serguei.spitsyn at oracle.com > Sent: Tuesday, 2 June 2020 18:55 > To: Vladimir Kozlov ; Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > This looks good to me. > > Thanks, > Serguei > > > On 5/28/20 09:02, Vladimir Kozlov wrote: >> Vladimir Ivanov is on break currently. >> It looks good to me. >> >> Thanks, >> Vladimir K >> >> On 5/26/20 7:31 AM, Reingruber, Richard wrote: >>> Hi Vladimir, >>> >>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>> Not an expert in JVMTI code base, so can't comment on the actual >>>> changes. >>>> From JIT-compilers perspective it looks good. >>> I put out webrev.1 a while ago [1]: >>> >>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >>> Webrev(delta): >>> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >>> >>> You originally suggested to use a handshake to switch a thread into >>> interpreter mode [2].
I'm using >>> a direct handshake now, because I think it is the best fit. >>> >>> May I ask if webrev.1 still looks good to you from JIT-compilers >>> perspective? >>> >>> Can I list you as (partial) Reviewer? >>> >>> Thanks, Richard. >>> >>> [1] >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >>> [2] >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >>> >>> -----Original Message----- >>> From: Vladimir Ivanov >>> Sent: Friday, 7 February 2020 09:19 >>> To: Reingruber, Richard ; >>> serviceability-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S) 8238585: Use handshake for >>> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >>> compiled methods on stack not_entrant >>> >>> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>> Not an expert in JVMTI code base, so can't comment on the actual >>> changes. >>> >>> From JIT-compilers perspective it looks good. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>>> >>>> The change avoids making all compiled methods on stack not_entrant >>>> when switching a java thread to >>>> interpreter only execution for jvmti purposes. It is sufficient to >>>> deoptimize the compiled frames on stack. >>>> >>>> Additionally a handshake is used instead of a vm operation to walk >>>> the stack and do the deoptimizations. >>>> >>>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>>> release builds on all platforms. >>>> >>>> Thanks, Richard. >>>> >>>> See also my question if anyone knows a reason for making the >>>> compiled methods not_entrant: >>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html >>>> >>>> From zgu at redhat.com Tue Jun 2 17:50:50 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 2 Jun 2020 13:50:50 -0400 Subject: [15] RFR 8245961: Shenandoah: move some root marking to concurrent phase In-Reply-To: <06e40ce2-9f0d-e90a-8929-9600c3bbf4c5@redhat.com> References: <61efdf04-5d74-8a59-a193-576b115738ef@redhat.com> <5b7385d8-8b08-fd52-4ef0-d846fc0c3e11@redhat.com> <06e40ce2-9f0d-e90a-8929-9600c3bbf4c5@redhat.com> Message-ID: <57228404-acb5-9537-caef-35919cbff493@redhat.com> On 6/2/20 10:13 AM, Aleksey Shipilev wrote: > On 6/2/20 3:12 PM, Zhengyu Gu wrote: >> http://cr.openjdk.java.net/~zgu/JDK-8245961/webrev.01/index.html > > Looks OK. A few more minor nits (after rebasing to jdk/jdk): > https://cr.openjdk.java.net/~shade/shenandoah/8245961-shade-updates.patch Rebased webrev.01 to jdk/jdk and manually folded some of your suggested changes: http://cr.openjdk.java.net/~zgu/JDK-8245961/webrev.02/ Looks OK.
-- Thanks, -Aleksey From shade at redhat.com Tue Jun 2 18:50:22 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 2 Jun 2020 20:50:22 +0200 Subject: [15] RFR(T) 8246342: Shenandoah: remove unused ShenandoahIsMarkedNextClosure In-Reply-To: <80d72c1b-700e-df7a-8f43-c0368139e541@redhat.com> References: <80d72c1b-700e-df7a-8f43-c0368139e541@redhat.com> Message-ID: <12e52d7d-1106-34e6-d462-ae7350e0e309@redhat.com> On 6/2/20 8:24 PM, Zhengyu Gu wrote: > Please review this trivial patch that removes unused > ShenandoahIsMarkedNextClosure. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8246342 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8246342/webrev.00/ Looks good! -- Thanks, -Aleksey From richard.reingruber at sap.com Tue Jun 2 19:14:08 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 2 Jun 2020 19:14:08 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: <4c92e183-6a3c-f8da-4330-c297ad2afef6@oracle.com> References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> <4c92e183-6a3c-f8da-4330-c297ad2afef6@oracle.com> Message-ID: Excellent. Thanks! Richard. -----Original Message----- From: serguei.spitsyn at oracle.com Sent: Tuesday, 2 June 2020 20:02 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant Hi Richard, On 6/2/20 10:57, Reingruber, Richard wrote: > Hi Serguei, > >> This looks good to me. > Thanks! > > From an earlier mail: > >> I'm thinking it would be more safe to run full tier5. > I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would > like to push. Okay, I'll submit a mach5 job with your fix and let you know about the results. Thanks, Serguei > Thanks, Richard. > > -----Original Message----- > From: serguei.spitsyn at oracle.com > Sent: Tuesday, 2 June 2020 18:55 > To: Vladimir Kozlov ; Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > This looks good to me. > > Thanks, > Serguei > > > On 5/28/20 09:02, Vladimir Kozlov wrote: >> Vladimir Ivanov is on break currently. >> It looks good to me. >> >> Thanks, >> Vladimir K >> >> On 5/26/20 7:31 AM, Reingruber, Richard wrote: >>> Hi Vladimir, >>> >>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>> Not an expert in JVMTI code base, so can't comment on the actual >>>> changes. >>>> From JIT-compilers perspective it looks good. >>> I put out webrev.1 a while ago [1]: >>> >>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >>> Webrev(delta): >>> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >>> >>> You originally suggested to use a handshake to switch a thread into >>> interpreter mode [2]. I'm using >>> a direct handshake now, because I think it is the best fit.
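For context, the contrast between the two mechanisms discussed in this thread, in illustrative form (assumed names, not taken from the webrev):

// A VM operation brings all Java threads to a global safepoint first:
//   VM_EnterInterpOnlyMode op;
//   VMThread::execute(&op);
//
// A direct handshake runs the closure on one target thread only, at a
// handshake-safe point, while the rest of the VM keeps running:
//   EnterInterpOnlyModeClosure hs;
//   Handshake::execute_direct(&hs, target_jthread);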
>>> >>> May I ask if webrev.1 still looks good to you from JIT-compilers >>> perspective? >>> >>> Can I list you as (partial) Reviewer? >>> >>> Thanks, Richard. >>> >>> [1] >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >>> [2] >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >>> >>> -----Original Message----- >>> From: Vladimir Ivanov >>> Sent: Friday, 7 February 2020 09:19 >>> To: Reingruber, Richard ; >>> serviceability-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S) 8238585: Use handshake for >>> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >>> compiled methods on stack not_entrant >>> >>> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>> Not an expert in JVMTI code base, so can't comment on the actual >>> changes. >>> >>> From JIT-compilers perspective it looks good. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>>> >>>> The change avoids making all compiled methods on stack not_entrant >>>> when switching a java thread to >>>> interpreter only execution for jvmti purposes. It is sufficient to >>>> deoptimize the compiled frames on stack. >>>> >>>> Additionally a handshake is used instead of a vm operation to walk >>>> the stack and do the deoptimizations. >>>> >>>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>>> release builds on all platforms. >>>> >>>> Thanks, Richard. >>>> >>>> See also my question if anyone knows a reason for making the >>>> compiled methods not_entrant: >>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html >>>> >>>> From igor.ignatyev at oracle.com Tue Jun 2 20:17:08 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 2 Jun 2020 13:17:08 -0700 Subject: RFR(S/M) : 8243430 : use reproducible random in :vmTestbase_vm_gc In-Reply-To: <405909cf-39bb-75b0-0302-7b1f27625ea7@oracle.com> References: <587C03FD-EFF4-42C1-9860-0BDC4F3A800F@oracle.com> <1440ED97-6390-402A-B1E1-810DC9DEDBA3@oracle.com> <6D4C5C1A-5D83-4DA9-A093-D3E9799584AC@oracle.com> <405909cf-39bb-75b0-0302-7b1f27625ea7@oracle.com> Message-ID: <49F83909-FF09-49B3-A7CD-7E36D3317AD8@oracle.com> Thomas, Leonid and Kim, thank you for your reviews, pushed. -- Igor > On Jun 2, 2020, at 3:40 AM, Thomas Schatzl wrote: > > looks good. > > Thomas > On Jun 1, 2020, at 8:04 PM, Leonid Mesnik wrote: > > Looks good. > > Leonid > > On 6/1/20 8:00 PM, Igor Ignatyev wrote: >> ping? >> -- Igor >> >>> On May 20, 2020, at 3:43 PM, Igor Ignatyev wrote: >>> >>> ping? >>> -- Igor >>> >>>> On May 5, 2020, at 10:18 AM, Igor Ignatyev wrote: >>>> >>>> can I get 2nd review for that? or ack. that it's fine to be pushed w/ just one review? >>>> >>>> -- Igor >>>> >>>>> On May 3, 2020, at 8:29 AM, Igor Ignatyev wrote: >>>>> >>>>> >>>>> >>>>>> On May 3, 2020, at 12:13 AM, Kim Barrett wrote: >>>>>> >>>>>>> On Apr 30, 2020, at 4:38 PM, Igor Ignatyev wrote: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~iignatyev/8243430/webrev.00 >>>>>>>> 555 lines changed: 11 ins; 65 del; 479 mod; >>>>>>> Hi all, >>>>>>> >>>>>>> could you please review this small patch? >>>>>>> from JBS: >>>>>>>> this subtask is to use j.t.l.Utils.getRandomInstance() as a random number generator, where applicable, in : vmTestbase_vm_gc test group and marking the tests which make use of "randomness" with a proper k/w.
>>>>>>> testing: : vmTestbase_vm_gc test group >>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8243430 >>>>>>> webrevs: >>>>>>> - code changes: http://cr.openjdk.java.net/~iignatyev//8243430/webrev.00.code >>>>>>>> 94 lines changed: 11 ins; 65 del; 18 mod; >>>>>>> - adding k/w: http://cr.openjdk.java.net/~iignatyev//8243430/webrev.00.kw >>>>>>>> 229 lines changed: 0 ins; 0 del; 229 mod; >>>>>>> - full: http://cr.openjdk.java.net/~iignatyev//8243430/webrev.00 >>>>>>>> 555 lines changed: 11 ins; 65 del; 479 mod; >>>>>>> Thanks, >>>>>>> -- Igor >>>>> Hi Kim, >>>>> >>>>>> I could be missing something, but I don't see where either of these >>>>>> use randomness: >>>>>> test/hotspot/jtreg/vmTestbase/gc/gctests/gctest03/gctest03.java >>>>> gctest03 starts Yellowthread at L#130, and Yellowthread uses nsk.share.test.LocalRandom in its 'run' method at L#167 of vmTestbase/gc/gctests/gctest03/appthread.java >>>>> >>>>>> test/hotspot/jtreg/vmTestbase/gc/gctests/gctest04/gctest04.java >>>>> gctest04 starts reqdisp thread at L#97, and reqdisp uses n.s.t.LocalRandom in its 'run' method at L#211 of vmTestbase/gc/gctests/gctest04/reqgen.java >>>>>> test/hotspot/jtreg/vmTestbase/gc/gctests/gctest02/gctest02.java >>>>>> Still importing java.util.Random, but that doesn't seem needed. >>>>>> 53 import java.util.Random; >>>>> you're right, I'll remove it before pushing. >>>>> >>>>>> Other than that, looks good. >>>>> thanks for review! >>>>>> I don't need a new webrev for changes related to the above. >>>>> -- Igor From igor.ignatyev at oracle.com Tue Jun 2 20:17:19 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 2 Jun 2020 13:17:19 -0700 Subject: RFR(S/M) : 8243434 : use reproducible random in :vmTestbase_vm_g1classunloading In-Reply-To: <8c23edf7-e389-3fa0-669e-2c015b166065@oracle.com> References: <5AA3E4F2-AC0D-4B77-8111-56FC8C993FEC@oracle.com> <8DE9130B-2A84-4B95-8A1B-892CA3D8F01B@oracle.com> <8c23edf7-e389-3fa0-669e-2c015b166065@oracle.com> Message-ID: <9B97AC83-07F2-4E91-96BF-E945BFC890EA@oracle.com> Kim, Leonid, thank you for your reviews, pushed. -- Igor > On Jun 1, 2020, at 10:09 PM, Kim Barrett wrote: > > Looks good. > On Jun 1, 2020, at 8:05 PM, Leonid Mesnik wrote: > > Looks good. > > Leonid > > On 6/1/20 8:00 PM, Igor Ignatyev wrote: >> ping? >> -- Igor >> >>> On May 20, 2020, at 3:42 PM, Igor Ignatyev wrote: >>> >>> ping? >>> -- Igor >>> >>>> On May 5, 2020, at 9:56 AM, Igor Ignatyev wrote: >>>> >>>> ping? >>>> -- Igor >>>> >>>>> On Apr 30, 2020, at 4:10 PM, Igor Ignatyev wrote: >>>>> >>>>> http://cr.openjdk.java.net/~iignatyev/8243434/webrev.00 >>>>>> 132 lines changed: 8 ins; 0 del; 124 mod >>>>> Hi all, >>>>> >>>>> could you please review this patch? >>>>> from JBS: >>>>>> this subtask is to use j.t.l.Utils.getRandomInstance() as a random number generator, where applicable, in : vmTestbase_vm_g1classunloading test group and marking the tests which make use of "randomness" with a proper k/w.
>>>>> testing: : vmTestbase_vm_g1classunloading test group >>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8243434 >>>>> webrevs: >>>>> - code changes: http://cr.openjdk.java.net/~iignatyev//8243434/webrev.00.code >>>>>> 15 lines changed: 8 ins; 0 del; 7 mod; >>>>> - adding k/w: http://cr.openjdk.java.net/~iignatyev//8243434/webrev.00.kw >>>>>> 112 lines changed: 0 ins; 0 del; 112 mod; >>>>> - full: http://cr.openjdk.java.net/~iignatyev//8243434/webrev.00 >>>>>> 132 lines changed: 8 ins; 0 del; 124 mod >>>>> Thanks, >>>>> -- Igor From jianglizhou at google.com Tue Jun 2 23:01:48 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Tue, 2 Jun 2020 16:01:48 -0700 Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC In-Reply-To: <726b3445-9531-a3c3-7629-2152bedce2d1@oracle.com> References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com> <13836d5c-db91-6e6a-5022-0e7585722f77@oracle.com> <726b3445-9531-a3c3-7629-2152bedce2d1@oracle.com> Message-ID: On Mon, Jun 1, 2020 at 7:28 PM Ioi Lam wrote: > > > > On 5/31/20 8:33 PM, Jiangli Zhou wrote: > > On Sun, May 31, 2020 at 8:27 PM Jiangli Zhou wrote: > >> On Fri, May 29, 2020 at 10:44 PM Ioi Lam wrote: > >>> > >>> > >>> On 5/29/20 8:40 PM, Jiangli Zhou wrote: > >>>> On Fri, May 29, 2020 at 7:30 PM Ioi Lam wrote: > >>>>> https://bugs.openjdk.java.net/browse/JDK-8245925 > >>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/ > >>>>> > >>>>> > >>>>> Summary: > >>>>> > >>>>> CDS supports archived heap objects only for G1. During -Xshare:dump, > >>>>> CDS executes a full GC so that G1 will compact the heap regions, leaving > >>>>> maximum contiguous free space at the top of the heap. Then, the archived > >>>>> heap regions are allocated from the top of the heap. > >>>>> > >>>>> Under some circumstances, java.lang.ref.Cleaners will execute > >>>>> after the GC has completed. The cleaners may allocate or synchronized, which > >>>>> will cause G1 to allocate an EDEN region at the top of the heap. > >>>> This is an interesting one. Please give more details on under what > >>>> circumstances java.lang.ref.Cleaners causes the issue. It's unclear to > >>>> me why it hasn't been showing up before. > >>> Hi Jiangli, > >>> > >>> Thanks for the review. It's very helpful. > >>> > >>> The assert (see my comment in JDK-8245925) happened in my prototype for > >>> JDK-8244778 > >>> > >>> http://cr.openjdk.java.net/~iklam/jdk15/8244778-archive-full-module-graph.v00.8/ > >>> > >>> I have to archive AppClassLoader and PlatformClassLoader, but need to > >>> clear their URLClassPath field (the "ucp" field). See > >>> clear_loader_states in metaspaceShared.cpp. Because of this, some > >>> java.util.zip.ZipFiles referenced by the URLClassPath become garbage, > >>> and their Cleaners are executed after full GC has finished. > >>> > >> I haven't looked at your 8244778-archive-full-module-graph change yet, > >> if you are going to archive class loader objects, you probably want to > >> go with a solution that scrubs fields that are 'not archivable' and > >> then restores at runtime. Sounds like you are going with that. When I > >> worked on the initial implementation for system module object > >> archiving, I implemented static field scrubber with the goal for > >> archiving class loaders. I didn't complete it as it was not yet > >> needed, but the code probably is helpful for you now. 
I might have > >> sent you the pointer to one of the versions at the time, but try > >> looking under my old /home directory if it's still around. It might be > >> good to trigger runtime field restoration by Java code, that's the > >> part I haven't fully explored yet. But, hopefully these inputs would > >> be useful for your current work. > > Hi Jiangli, > > I can't access your old home directory. I already implemented the field > scrubbing in the above patch. It's in clear_loader_states in > metaspaceShared.cpp. > I'll look at it after it's ready for code review. > >>> I think the bug has always existed, but is just never triggered because > >>> we have not activated the Cleaners. > >>> > >>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN regions > >>>>> exist, > >>>>> exit the safepoint, run GC, and try again. Eventually all the cleaners will > >>>>> be executed and no more allocation can happen. > >>>>> > >>>>> For safety, I limit the retry count to 30 (or about total 9 seconds). > >>>>> > >>>> I think it's better to skip the top allocated region(s) in such cases > >>>> and avoid retrying. Dump time performance is important, as we are > >>>> moving the cost from runtime to CDS dump time. It's desirable to keep > >>>> the dump time cost as low as possible, so using CDS delivers better > >>>> net gain overall. > >>>> > >>>> Here are some comments for your current webrev itself. > >>>> > >>>> 1611 static bool has_unwanted_g1_eden_regions() { > >>>> 1612 #if INCLUDE_G1GC > >>>> 1613 return HeapShared::is_heap_object_archiving_allowed() && UseG1GC && > >>>> 1614 G1CollectedHeap::heap()->eden_regions_count() > 0; > >>>> 1615 #else > >>>> 1616 return false; > >>>> 1617 #endif > >>>> 1618 } > >>>> > >>>> You can remove 'UseG1GC' from line 1613, as > >>>> is_heap_object_archiving_allowed() check already covers it: > >>>> > >>>> static bool is_heap_object_archiving_allowed() { > >>>> CDS_JAVA_HEAP_ONLY(return (UseG1GC && UseCompressedOops && > >>>> UseCompressedClassPointers);) > >>>> NOT_CDS_JAVA_HEAP(return false;) > >>>> } > >>>> > >>>> Please include heap archiving code under #if INCLUDE_CDS_JAVA_HEAP. > >>>> It's better to extract the GC handling code in > >>>> VM_PopulateDumpSharedSpace::doit() into a separate API in > >>>> heapShared.*. > >>>> > >>>> It's time to enhance heap archiving to use a separate buffer when > >>>> copying the objects at dump time (discussed before), as a longer term > >>>> fix. I'll file a RFE. > >>> Thanks for reminding me. I think that is a better way to fix this > >>> problem. It should be fairly easy to do, as we can already relocate the > >>> heap regions using HeapShared::patch_archived_heap_embedded_pointers(). > >>> Let me try to implement it. > >>> > >> Sounds good. Thanks for doing that. > >> > >>> BTW, the GC speed is very fast, because the heap is not used very much > >>> during -Xshare:dump. -Xlog:gc shows: > >>> > >>> [0.259s][info][gc ] GC(0) Pause Full (Full GC for -Xshare:dump) > >>> 4M->1M(32M) 8.220ms > >>> > >>> So we have allocated only 4MB of objects, and only 1MB of those are > >>> reachable. > >>> > >>> Anyway, I think we can even avoid running the GC altogether. We can scan > >>> for contiguous free space from the top of the heap (below the EDEN > >>> region). If there's more contiguous free space than the current > >>> allocated heap regions, we know for sure that we can archive all the > >>> heap objects that we need without doing the GC. That can be done as an > >>> RFE after copying the objects. 
It won't save much though (8ms out of > >>> about 700ms of total -Xshare:dump time). >> Looks like we had similar thoughts about finding free heap regions for >> copying. Here are the details: >> >> Solution 1): >> Allocate a buffer (no specific memory location requirement) and copy >> the heap objects to the buffer. Additional buffers can be allocated >> when needed, and they don't need to form a consecutive block of >> memory. The pointers within the copied Java objects need to be >> computed from the Java heap top as the 'real heap address' (so their >> runtime positions are at the heap top), instead of the buffer address. >> Region verification code also needs to be updated to reflect how the >> pointers are computed now. >> >> Solution 2): >> Find a range (consecutive heap regions) within the Java heap that is >> free. Copy the archive objects to that range. The pointer update and >> verification are similar to solution 1). >> >> > I am thinking of doing something simpler. During heap archiving, G1 just > allocates the highest free region > > bool G1ArchiveAllocator::alloc_new_region() { > // Allocate the highest free region in the reserved heap, > // and add it to our list of allocated regions. It is marked > // archive and added to the old set. > HeapRegion* hr = _g1h->alloc_highest_free_region(); > > If there are used regions scattered around in the heap, we will end up > with a few archive regions that are not contiguous, and the highest > archive region may not be flushed to the top of the heap. > > Inside VM_PopulateDumpSharedSpace::dump_archive_heap_oopmaps, I will > rewrite the pointers inside each of the archived regions, such that the > contents would be the same as if all the archive regions were > consecutively allocated at the top of the heap. One complication is the archive region verification. You would need to adjust that as well. I've created JDK-8246297. > > This will require no more memory allocation at dump time than what it > does today, and can be done with very little overhead. > > > I think you can go with solution 1 now. Solution 2) has the benefit of > > not requiring additional memory for copying archived objects. That's > > important as I did run into insufficient memory at dump time in real > > use cases, so any memory saving at dump time is desirable. > > > > Some clarifications to avoid confusion: The insufficient memory is due > > to memory restriction for builds in a cloud environment. > Could you elaborate? The Java heap space is reserved but initially not > committed. Physical memory is allocated when we write into the archive > heap regions. When -Xms and -Xmx are set to be the same, all heap memory is committed upfront, as far as I can tell. Setting -Xms and -Xmx to the same is a very common use case. > > Were you getting OOM during virtual space reservation, or during > os::commit_memory()? > > How many heap objects are you dumping? The current jdk repo needs only 2 > G1 regions, so it's 2 * 4M of memory for small heaps like -Xmx128m, and 2 > * 8M of memory for larger heaps. It is the total available memory, which has a limit. When archiving a very large number of classes, the total required memory (including Java heap, metaspace, archive space, etc) during the dumping process can exceed the memory limit. Best, Jiangli > > Thanks > - Ioi > > > Thanks, > > Jiangli > > > >> It's better > >> to go with 2) when the size of archive range is known before copying.
> >> With the planned work for class pre-initialization and enhanced object > >> archiving support, it will be able to obtain (or have a good estimate > >> of) the total size before copying. Solution 1 can be enhanced to use > >> heap memory when that happens. > >> > >> I'll log these details as an RFE on Monday. > >> > >> Best, > >> Jiangli > >> > >>> I'll withdraw this patch for now, and will try to implement the object > >>> copying. > >>> > >>> Thanks > >>> - Ioi > >>> > >>>> Best, > >>>> Jiangli > >>>> > >>>>> Thanks > >>>>> - Ioi > >>>>> > >>>>> > >>>>> > From stefan.karlsson at oracle.com Wed Jun 3 08:30:36 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 3 Jun 2020 10:30:36 +0200 Subject: [renamed] RFR: 8246135: Save important GC log lines and print them when dumping hs_err files In-Reply-To: <208f2f9e-4387-b851-7097-3f16e710a21d@oracle.com> References: <208f2f9e-4387-b851-7097-3f16e710a21d@oracle.com> Message-ID: <239afd1c-f2ef-a5ec-36e6-d9a3970ac40d@oracle.com> After further deliberations and discussions with Per, Erik and StefanJ, I'm going to make this patch more generic, and split this CR into two parts: JDK-8246135: Save important GC log lines and print them when dumping hs_err files JDK-8246404: ZGC: Use GCLogPrecious for important logging lines As StefanJ mentioned earlier there is a separate task for G1, Parallel, and Serial: JDK-8246272: Make use of GCLogPrecious for G1, Parallel and Serial Since this patch is becoming more generic than it started out as, I've done some more changes to accommodate that: https://cr.openjdk.java.net/~stefank/8246135/webrev.06 - Moved the ZGC usages out of this patch - Moved initialization out of ZGC so that all GCs can simply start to use this functionality - Added a lock to allow for concurrent usage - Don't print the GC Precious Log section if the GC isn't using it - Added a temp buffer to simplify generation of messages The ZGC part has not changed and has already been reviewed: https://cr.openjdk.java.net/~stefank/8246404/webrev.01/ Some of the stylistic changes in the patch are done to prepare for: https://cr.openjdk.java.net/~stefank/8246405/webrev.01/ Thanks, StefanK On 2020-06-02 15:51, Stefan Karlsson wrote: > Hi all, > > While working on getting more information when dumping early during > initialization, it became apparent that we don't print these log lines > as early as we could. In ZGC we can assert and/or fail during the set > up of the heap. I'd like to print the precious lines even when that > happens. The following patch moves the precious line printing out > of ZCollectedHeap and into a direct call from the VMError code. GCs > that populate these lines will now automatically get them printed. > > https://cr.openjdk.java.net/~stefank/8246135/webrev.04.delta > https://cr.openjdk.java.net/~stefank/8246135/webrev.04 > > Thanks, > StefanK > > On 2020-05-29 12:23, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to save some of the important ZGC log lines >> and print them when dumping hs_err files. >> >> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8246135 >> >> The patch adds a concept of "precious" log lines. What's typically >> logged are GC initialization lines, but also error messages are >> saved. These lines are then dumped in the hs_err file if the JVM >> crashes or hits an assert. The lines can also be printed in a >> debugger to get a quick overview when debugging.
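A stand-alone analogue of that concept — a minimal sketch of the mechanism, not the HotSpot implementation: always save the formatted line, and let the crash reporter replay the buffer later:

#include <cstdarg>
#include <cstdio>
#include <mutex>
#include <string>

class PreciousLog {
  std::string _lines;
  std::mutex  _lock;   // writers may be concurrent
 public:
  // Always record the formatted line, independent of log configuration;
  // the real code additionally routes it through Unified Logging.
  void write(const char* fmt, ...) {
    char buf[512];
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    std::lock_guard<std::mutex> g(_lock);
    _lines.append(buf).append("\n");
  }
  // Called from the error reporter while writing the hs_err file.
  void print_on(std::FILE* out) {
    std::lock_guard<std::mutex> g(_lock);
    if (!_lines.empty()) std::fputs(_lines.c_str(), out);
  }
};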
>> >> The precious lines are always saved, but just like any other Unified >> Logging calls, only logged if the tags are enabled. >> >> The patch builds on the JDK-8246134 patch. The hs_err output looks >> like this: >> >> ZGC Precious Log: >> NUMA Support: Disabled >> CPUs: 8 total, 8 available >> Memory: 16384M >> Large Page Support: Disabled >> Medium Page Size: 32M >> Workers: 5 parallel, 1 concurrent >> Address Space Type: Contiguous/Unrestricted/Complete >> Address Space Size: 65536M x 3 = 196608M >> Min Capacity: 42M >> Initial Capacity: 256M >> Max Capacity: 4096M >> Max Reserve: 42M >> Pre-touch: Disabled >> Uncommit: Enabled >> Uncommit Delay: 300s >> Runtime Workers: 5 parallel >> >> ZGC Globals: >> GlobalPhase: 2 (Relocate) >> GlobalSeqNum: 1 >> Offset Max: 4096G (0x0000040000000000) >> Page Size Small: 2M >> Page Size Medium: 32M >> >> ZGC Metadata Bits: >> Good: 0x0000100000000000 >> Bad: 0x00002c0000000000 >> WeakBad: 0x00000c0000000000 >> Marked: 0x0000040000000000 >> Remapped: 0x0000100000000000 >> >> Heap: >> ZHeap used 12M, capacity 256M, max capacity 4096M >> Metaspace used 6501K, capacity 6615K, committed 6784K, >> reserved 1056768K >> class space used 559K, capacity 588K, committed 640K, reserved >> 1048576K >> >> ZGC Page Table: >> Small 0x0000000000000000 0x0000000000200000 0x0000000000200000 >> Allocating >> Small 0x0000000000200000 0x0000000000240000 0x0000000000400000 >> Allocating >> Small 0x0000000000400000 0x0000000000600000 0x0000000000600000 >> Allocating >> Small 0x0000000000600000 0x0000000000800000 0x0000000000800000 >> Allocating >> Small 0x0000000000800000 0x00000000009c0000 0x0000000000a00000 >> Allocating >> Small 0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 >> Allocating >> >> Thanks, >> StefanK > From stefan.karlsson at oracle.com Wed Jun 3 08:42:07 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 3 Jun 2020 10:42:07 +0200 Subject: RFR: 8246405: Add GCLogPrecious functionality to log and report debug errors Message-ID: <08bb1203-39f9-cf25-7014-e32585767396@oracle.com> Hi all, Please review this patch to enhance the GCLogPrecious functionality (JDK-8246405) to add support for a way to both log and generate a crash report in debug builds. https://cr.openjdk.java.net/~stefank/8246405/webrev.01/ https://bugs.openjdk.java.net/browse/JDK-8246405 I've split out a patch where ZGC uses this functionality: https://cr.openjdk.java.net/~stefank/8246406/webrev.01/ https://bugs.openjdk.java.net/browse/JDK-8246406 Tested manually by running: (ulimit -v <low value>; ../build/fastdebug/jdk/bin/java -XX:+UseZGC -Xmx18m -Xlog:gc* -version) and verified that it generates a hs_err file with the appropriate information. On macOS the output points to the right file and line number: # Internal Error (src/hotspot/share/gc/z/zVirtualMemory.cpp:46), pid=67695, tid=8451 # Error: Failed to reserve enough address space for Java heap but since TOUCH_ASSERT_POISON isn't implemented we don't get registers and the output contains the GCLogPrecious code: V [libjvm.dylib+0xb3d95c] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x670 V [libjvm.dylib+0xb3e083] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x47 V [libjvm.dylib+0x334b48]
report_vm_error(char const*, int, char const*, char const*, ...)+0x145 V [libjvm.dylib+0x48d629] GCLogPrecious::vwrite_and_debug(LogTargetHandle, char const*, __va_list_tag*, char const*, int)+0x81 V [libjvm.dylib+0xbbdf70] GCLogPreciousHandle::write_and_debug(char const*, ...)+0x92 V [libjvm.dylib+0xbd833e] ZVirtualMemoryManager::ZVirtualMemoryManager(unsigned long)+0xb6 On Linux, where TOUCH_ASSERT_POISON is implemented, we get the last parts cut away: V [libjvm.so+0x1857179] ZVirtualMemoryManager::ZVirtualMemoryManager(unsigned long)+0x79 V [libjvm.so+0x182f84e] ZPageAllocator::ZPageAllocator(ZWorkers*, unsigned long, unsigned long, unsigned long, unsigned long)+0x6e V [libjvm.so+0x1808b61] ZHeap::ZHeap()+0x81 V [libjvm.so+0x1802559] ZCollectedHeap::ZCollectedHeap()+0x49 Thanks, StefanK From thomas.schatzl at oracle.com Wed Jun 3 09:04:30 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 3 Jun 2020 11:04:30 +0200 Subject: RFR (S) 8246274: G1 old gen allocation tracking is not in a separate class In-Reply-To: <1D252CA6-AEF8-4E57-9B6C-37AACE9D7EC1@amazon.com> References: <1D252CA6-AEF8-4E57-9B6C-37AACE9D7EC1@amazon.com> Message-ID: Hi, On 02.06.20 01:12, Luo, Ziyi wrote: > Hi, > > Could you please review this change which refactors > G1Policy::_bytes_allocated_in_old_since_last_gc into a dedicated new > tracking class G1OldGenAllocationTracker? > > Bug ID: > https://bugs.openjdk.java.net/browse/JDK-8246274 > Webrev: > http://cr.openjdk.java.net/~phh/8246274/webrev.00/ > > Testing: Local run hotspot:tier1. > > This is the first step toward improving the G1 old gen allocation tracking. As > described in JDK-8245511, we will further add humongous allocation tracking > and refactor G1IHOPControl::update_allocation_info(). This is a clean > refactoring of the original G1Policy::_bytes_allocated_in_old_since_last_gc > field and G1Policy::add_bytes_allocated_in_old_since_last_gc() method. > > Thanks, > Ziyi > - I suggest to keep the existing public interface in G1Policy, i.e. the add_bytes_allocated_in_old_since_last_gc. Making the old gen tracker object public does not seem to be advantageous. I.e. imo it is good to group everything related to old gen allocation tracking into that helper class, but we do not need to expose that fact. Maybe there is something in a follow up change that requires this? - the _old_gen_alloc_tracker instance can be aggregated within the G1Policy class directly, i.e. there is no need to make it a pointer and manage via new and delete afaics. Maybe there is something in a follow up change that requires this? - I would prefer if there were separate reset_after_[young_]gc() and a reset_after_full_gc() methods. Initially I asked myself why for full gc that first parameter passed to reset_after_gc() is zero, and only when looking into the code found that it is ignored anyway. I think the API of that class isn't that huge yet.
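Taken together, these three points suggest a shape roughly like the following sketch (illustrative names only; the actual class is in the webrev):

class G1OldGenAllocationTracker {
  size_t _bytes_allocated_since_last_gc;
 public:
  void add_allocated_bytes(size_t bytes) { _bytes_allocated_since_last_gc += bytes; }
  void reset_after_young_gc() { _bytes_allocated_since_last_gc = 0; }
  void reset_after_full_gc()  { _bytes_allocated_since_last_gc = 0; }
};

class G1Policy {
  // Aggregated by value: no pointer, no new/delete needed.
  G1OldGenAllocationTracker _old_gen_alloc_tracker;
 public:
  // Existing public interface kept; the tracker stays an
  // implementation detail of G1Policy.
  void add_bytes_allocated_in_old_since_last_gc(size_t bytes) {
    _old_gen_alloc_tracker.add_allocated_bytes(bytes);
  }
};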
Thanks, Thomas From per.liden at oracle.com Wed Jun 3 09:16:38 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 3 Jun 2020 11:16:38 +0200 Subject: [renamed] RFR: 8246135: Save important GC log lines and print them when dumping hs_err files In-Reply-To: <239afd1c-f2ef-a5ec-36e6-d9a3970ac40d@oracle.com> References: <208f2f9e-4387-b851-7097-3f16e710a21d@oracle.com> <239afd1c-f2ef-a5ec-36e6-d9a3970ac40d@oracle.com> Message-ID: On 6/3/20 10:30 AM, Stefan Karlsson wrote: > After further deliberations and discussions with Per, Erik and StefanJ, > I'm going to make this patch more generic, and split this CR into two > parts: > JDK-8246135: Save important GC log lines and print them when dumping > hs_err files > JDK-8246404: ZGC: Use GCLogPrecious for important logging lines > > As StefanJ mentioned earlier there is a separate task for G1, Parallel, > and Serial: > JDK-8246272: Make use of GCLogPrecious for G1, Parallel and Serial > > Since this patch is becoming more generic than it started out as, I've > done some more changes to accommodate that: > https://cr.openjdk.java.net/~stefank/8246135/webrev.06 Looks good! /Per > > - Moved the ZGC usages out of this patch > - Moved initialization out of ZGC so that all GCs can simply start to > use this functionality > - Added a lock to allow for concurrent usage > - Don't print the GC Precious Log section if the GC isn't using it > - Added a temp buffer to simplify generation of messages > > The ZGC part has not changed and has already been reviewed: > https://cr.openjdk.java.net/~stefank/8246404/webrev.01/ > > Some of the stylistic changes in the patch are done to prepare for: > https://cr.openjdk.java.net/~stefank/8246405/webrev.01/ > > Thanks, > StefanK > > > On 2020-06-02 15:51, Stefan Karlsson wrote: >> Hi all, >> >> While working on getting more information when dumping early during >> initialization, it became apparent that we don't print these log lines >> as early as we could. In ZGC we can assert and/or fail during the set >> up of the heap. I'd like to print the precious lines even when that >> happens. The following patch moves the precious line printing out >> of ZCollectedHeap and into a direct call from the VMError code. GCs >> that populate these lines will now automatically get them printed. >> >> https://cr.openjdk.java.net/~stefank/8246135/webrev.04.delta >> https://cr.openjdk.java.net/~stefank/8246135/webrev.04 >> >> Thanks, >> StefanK >> >> On 2020-05-29 12:23, Stefan Karlsson wrote: >>> Hi all, >>> >>> Please review this patch to save some of the important ZGC log lines >>> and print them when dumping hs_err files. >>> >>> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ >>> https://bugs.openjdk.java.net/browse/JDK-8246135 >>> >>> The patch adds a concept of "precious" log lines. What's typically >>> logged are GC initialization lines, but also error messages are >>> saved. These lines are then dumped in the hs_err file if the JVM >>> crashes or hits an assert. The lines can also be printed in a >>> debugger to get a quick overview when debugging. >>> >>> The precious lines are always saved, but just like any other Unified >>> Logging calls, only logged if the tags are enabled. >>> >>> The patch builds on the JDK-8246134 patch.
The hs_err output looks >>> like this: >>> >>> ZGC Precious Log: >>> NUMA Support: Disabled >>> CPUs: 8 total, 8 available >>> Memory: 16384M >>> Large Page Support: Disabled >>> Medium Page Size: 32M >>> Workers: 5 parallel, 1 concurrent >>> Address Space Type: Contiguous/Unrestricted/Complete >>> Address Space Size: 65536M x 3 = 196608M >>> Min Capacity: 42M >>> Initial Capacity: 256M >>> Max Capacity: 4096M >>> Max Reserve: 42M >>> Pre-touch: Disabled >>> Uncommit: Enabled >>> Uncommit Delay: 300s >>> Runtime Workers: 5 parallel >>> >>> ZGC Globals: >>> GlobalPhase: 2 (Relocate) >>> GlobalSeqNum: 1 >>> Offset Max: 4096G (0x0000040000000000) >>> Page Size Small: 2M >>> Page Size Medium: 32M >>> >>> ZGC Metadata Bits: >>> Good: 0x0000100000000000 >>> Bad: 0x00002c0000000000 >>> WeakBad: 0x00000c0000000000 >>> Marked: 0x0000040000000000 >>> Remapped: 0x0000100000000000 >>> >>> Heap: >>> ZHeap used 12M, capacity 256M, max capacity 4096M >>> Metaspace used 6501K, capacity 6615K, committed 6784K, >>> reserved 1056768K >>> class space used 559K, capacity 588K, committed 640K, reserved >>> 1048576K >>> >>> ZGC Page Table: >>> Small 0x0000000000000000 0x0000000000200000 0x0000000000200000 >>> Allocating >>> Small 0x0000000000200000 0x0000000000240000 0x0000000000400000 >>> Allocating >>> Small 0x0000000000400000 0x0000000000600000 0x0000000000600000 >>> Allocating >>> Small 0x0000000000600000 0x0000000000800000 0x0000000000800000 >>> Allocating >>> Small 0x0000000000800000 0x00000000009c0000 0x0000000000a00000 >>> Allocating >>> Small 0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 >>> Allocating >>> >>> Thanks, >>> StefanK >> > From per.liden at oracle.com Wed Jun 3 09:17:09 2020 From: per.liden at oracle.com (Per Liden) Date: Wed, 3 Jun 2020 11:17:09 +0200 Subject: RFR: 8246405: Add GCLogPrecious functionality to log and report debug errors In-Reply-To: <08bb1203-39f9-cf25-7014-e32585767396@oracle.com> References: <08bb1203-39f9-cf25-7014-e32585767396@oracle.com> Message-ID: Looks good! /Per On 6/3/20 10:42 AM, Stefan Karlsson wrote: > Hi all, > > Please review this patch to enhance the GCLogPrecious functionality > (JDK-8246405) to add support for a way to both log and generate a crash > report in debug builds. > > https://cr.openjdk.java.net/~stefank/8246405/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8246405 > > I've split out a patch where ZGC uses this functionality: > > https://cr.openjdk.java.net/~stefank/8246406/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8246406 > > Tested manually by running: > (ulimit -v <low value>; ../build/fastdebug/jdk/bin/java -XX:+UseZGC > -Xmx18m -Xlog:gc* -version) > > and verified that it generates a hs_err file with the appropriate > information. > > On macOS the output points to the right file and line number: > > # Internal Error (src/hotspot/share/gc/z/zVirtualMemory.cpp:46), > pid=67695, tid=8451 > # Error: Failed to reserve enough address space for Java heap > > but since TOUCH_ASSERT_POISON isn't implemented we don't get registers > and the output contains the GCLogPrecious code: > > V [libjvm.dylib+0xb3d95c] VMError::report_and_die(int, char const*, > char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char > const*, int, unsigned long)+0x670 > V [libjvm.dylib+0xb3e083]
VMError::report_and_die(Thread*, void*, char > const*, int, char const*, char const*, __va_list_tag*)+0x47 > V [libjvm.dylib+0x334b48] report_vm_error(char const*, int, char > const*, char const*, ...)+0x145 > V [libjvm.dylib+0x48d629] > GCLogPrecious::vwrite_and_debug(LogTargetHandle, char const*, > __va_list_tag*, char const*, int)+0x81 > V [libjvm.dylib+0xbbdf70] GCLogPreciousHandle::write_and_debug(char > const*, ...)+0x92 > V [libjvm.dylib+0xbd833e] > ZVirtualMemoryManager::ZVirtualMemoryManager(unsigned long)+0xb6 > > On Linux, where TOUCH_ASSERT_POISON is implemented, we get the last > parts cut away: > > V [libjvm.so+0x1857179] > ZVirtualMemoryManager::ZVirtualMemoryManager(unsigned long)+0x79 > V [libjvm.so+0x182f84e] ZPageAllocator::ZPageAllocator(ZWorkers*, > unsigned long, unsigned long, unsigned long, unsigned long)+0x6e > V [libjvm.so+0x1808b61] ZHeap::ZHeap()+0x81 > V [libjvm.so+0x1802559] ZCollectedHeap::ZCollectedHeap()+0x49 > > Thanks, > StefanK From stefan.karlsson at oracle.com Wed Jun 3 09:23:47 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 3 Jun 2020 11:23:47 +0200 Subject: [renamed] RFR: 8246135: Save important GC log lines and print them when dumping hs_err files In-Reply-To: References: <208f2f9e-4387-b851-7097-3f16e710a21d@oracle.com> <239afd1c-f2ef-a5ec-36e6-d9a3970ac40d@oracle.com> Message-ID: <2fe1d30e-58b6-f5c2-32ad-9939dda214ef@oracle.com> Thanks, Per! StefanK On 2020-06-03 11:16, Per Liden wrote: > On 6/3/20 10:30 AM, Stefan Karlsson wrote: >> After further deliberations and discussions with Per, Erik and >> StefanJ, I'm going to make this patch more generic, and split this CR >> into two parts: >> JDK-8246135: Save important GC log lines and print them when >> dumping hs_err files >> JDK-8246404: ZGC: Use GCLogPrecious for important logging lines >> >> As StefanJ mentioned earlier there is a separate task for G1, >> Parallel, and Serial: >> JDK-8246272: Make use of GCLogPrecious for G1, Parallel and Serial >> >> Since this patch is becoming more generic than it started out as, >> I've done some more changes to accommodate that: >> https://cr.openjdk.java.net/~stefank/8246135/webrev.06 > > Looks good! > > /Per > >> >> - Moved the ZGC usages out of this patch >> - Moved initialization out of ZGC so that all GCs can simply start to >> use this functionality >> - Added a lock to allow for concurrent usage >> - Don't print the GC Precious Log section if the GC isn't using it >> - Added a temp buffer to simplify generation of messages >> >> The ZGC part has not changed and has already been reviewed: >> https://cr.openjdk.java.net/~stefank/8246404/webrev.01/ >> >> Some of the stylistic changes in the patch are done to prepare for: >> https://cr.openjdk.java.net/~stefank/8246405/webrev.01/ >> >> Thanks, >> StefanK >> >> >> On 2020-06-02 15:51, Stefan Karlsson wrote: >>> Hi all, >>> >>> While working on getting more information when dumping early during >>> initialization, it became apparent that we don't print these log >>> lines as early as we could. In ZGC we can assert and/or fail during >>> the set up of the heap. I'd like to print the precious lines even >>> when that happens. The following patch moves the precious line >>> printing out of ZCollectedHeap and into a direct call from the >>> VMError code. GCs that populate these lines will now automatically >>> get them printed.
>>> >>> https://cr.openjdk.java.net/~stefank/8246135/webrev.04.delta >>> https://cr.openjdk.java.net/~stefank/8246135/webrev.04 >>> >>> Thanks, >>> StefanK >>> >>> On 2020-05-29 12:23, Stefan Karlsson wrote: >>>> Hi all, >>>> >>>> Please review this patch to save some of the important ZGC log >>>> lines and print them when dumping hs_err files. >>>> >>>> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/ >>>> https://bugs.openjdk.java.net/browse/JDK-8246135 >>>> >>>> The patch adds a concept of "precious" log lines. What's typically >>>> logged are GC initialization lines, but also error messages are >>>> saved. These lines are then dumped in the hs_err file if the JVM >>>> crashes or hits an assert. The lines can also be printed in a >>>> debugger to get a quick overview when debugging. >>>> >>>> The precious lines are always saved, but just like any other >>>> Unified Logging calls, only logged if the tags are enabled. >>>> >>>> The patch builds on the JDK-8246134 patch. The hs_err output looks >>>> like this: >>>> >>>> ZGC Precious Log: >>>> NUMA Support: Disabled >>>> CPUs: 8 total, 8 available >>>> Memory: 16384M >>>> Large Page Support: Disabled >>>> Medium Page Size: 32M >>>> Workers: 5 parallel, 1 concurrent >>>> Address Space Type: Contiguous/Unrestricted/Complete >>>> Address Space Size: 65536M x 3 = 196608M >>>> Min Capacity: 42M >>>> Initial Capacity: 256M >>>> Max Capacity: 4096M >>>> Max Reserve: 42M >>>> Pre-touch: Disabled >>>> Uncommit: Enabled >>>> Uncommit Delay: 300s >>>> Runtime Workers: 5 parallel >>>> >>>> ZGC Globals: >>>> GlobalPhase: 2 (Relocate) >>>> GlobalSeqNum: 1 >>>> Offset Max: 4096G (0x0000040000000000) >>>> Page Size Small: 2M >>>> Page Size Medium: 32M >>>> >>>> ZGC Metadata Bits: >>>> Good: 0x0000100000000000 >>>> Bad: 0x00002c0000000000 >>>> WeakBad: 0x00000c0000000000 >>>> Marked: 0x0000040000000000 >>>> Remapped: 0x0000100000000000 >>>> >>>> Heap: >>>> ZHeap used 12M, capacity 256M, max capacity 4096M >>>> Metaspace used 6501K, capacity 6615K, committed 6784K, >>>> reserved 1056768K >>>> class space used 559K, capacity 588K, committed 640K, reserved >>>> 1048576K >>>> >>>> ZGC Page Table: >>>> Small 0x0000000000000000 0x0000000000200000 0x0000000000200000 >>>> Allocating >>>> Small 0x0000000000200000 0x0000000000240000 0x0000000000400000 >>>> Allocating >>>> Small 0x0000000000400000 0x0000000000600000 0x0000000000600000 >>>> Allocating >>>> Small 0x0000000000600000 0x0000000000800000 0x0000000000800000 >>>> Allocating >>>> Small 0x0000000000800000 0x00000000009c0000 0x0000000000a00000 >>>> Allocating >>>> Small 0x0000000000a00000 0x0000000000a40000 0x0000000000c00000 >>>> Allocating >>>> >>>> Thanks, >>>> StefanK >>> >> From stefan.karlsson at oracle.com Wed Jun 3 09:24:22 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 3 Jun 2020 11:24:22 +0200 Subject: RFR: 8246405: Add GCLogPrecious functionality to log and report debug errors In-Reply-To: References: <08bb1203-39f9-cf25-7014-e32585767396@oracle.com> Message-ID: Thanks, Per! StefanK On 2020-06-03 11:17, Per Liden wrote: > Looks good! > > /Per > > On 6/3/20 10:42 AM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to enhance the GCLogPrecious functionality >> (JDK-8246405) to add support for a way to both log and generate a >> crash report in debug builds.
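The mechanism, as far as it can be read from the stack traces quoted below, sketched with assumed details: the precious line is saved and logged as usual and, in debug builds only, also handed to report_vm_error() with the caller's file and line, so the hs_err points at the GC call site:

// Sketch only; the real entry points are GCLogPrecious::vwrite_and_debug
// and GCLogPreciousHandle::write_and_debug from the webrev.
void vwrite_and_debug(LogTargetHandle log, const char* fmt, va_list args,
                      const char* file, int line) {
  vwrite(log, fmt, args);  // save the line and log it through UL
#ifdef ASSERT
  // Debug builds: produce a crash report at the failing call site.
  report_vm_error(file, line, "GCLogPrecious error");
#endif
}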
>>
>> https://cr.openjdk.java.net/~stefank/8246405/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8246405
>>
>> I've split out a patch where ZGC uses this functionality:
>>
>> https://cr.openjdk.java.net/~stefank/8246406/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8246406
>>
>> Tested manually by running:
>> (ulimit -v <low value>; ../build/fastdebug/jdk/bin/java -XX:+UseZGC
>> -Xmx18m -Xlog:gc* -version)
>>
>> and verified that it generates a hs_err file with the appropriate
>> information.
>>
>> On macOS the output points to the right file and line number:
>>
>> #  Internal Error (src/hotspot/share/gc/z/zVirtualMemory.cpp:46),
>> pid=67695, tid=8451
>> #  Error: Failed to reserve enough address space for Java heap
>>
>> but since TOUCH_ASSERT_POISON isn't implemented we don't get
>> registers and the output contains the GCLogPrecious code:
>>
>> V  [libjvm.dylib+0xb3d95c]  VMError::report_and_die(int, char const*,
>> char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*,
>> char const*, int, unsigned long)+0x670
>> V  [libjvm.dylib+0xb3e083]  VMError::report_and_die(Thread*, void*,
>> char const*, int, char const*, char const*, __va_list_tag*)+0x47
>> V  [libjvm.dylib+0x334b48]  report_vm_error(char const*, int, char
>> const*, char const*, ...)+0x145
>> V  [libjvm.dylib+0x48d629]
>> GCLogPrecious::vwrite_and_debug(LogTargetHandle, char const*,
>> __va_list_tag*, char const*, int)+0x81
>> V  [libjvm.dylib+0xbbdf70]  GCLogPreciousHandle::write_and_debug(char
>> const*, ...)+0x92
>> V  [libjvm.dylib+0xbd833e]
>> ZVirtualMemoryManager::ZVirtualMemoryManager(unsigned long)+0xb6
>>
>> On Linux, where TOUCH_ASSERT_POISON is implemented, we get the last
>> parts cut away:
>>
>> V  [libjvm.so+0x1857179]
>> ZVirtualMemoryManager::ZVirtualMemoryManager(unsigned long)+0x79
>> V  [libjvm.so+0x182f84e]  ZPageAllocator::ZPageAllocator(ZWorkers*,
>> unsigned long, unsigned long, unsigned long, unsigned long)+0x6e
>> V  [libjvm.so+0x1808b61]  ZHeap::ZHeap()+0x81
>> V  [libjvm.so+0x1802559]  ZCollectedHeap::ZCollectedHeap()+0x49
>>
>> Thanks,
>> StefanK

From shade at redhat.com  Wed Jun 3 10:45:24 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 3 Jun 2020 12:45:24 +0200
Subject: RFR (XS) 8246433: Shenandoah: walk roots in more efficient order
 in ShenandoahRootUpdater
Message-ID:

RFE:
  https://bugs.openjdk.java.net/browse/JDK-8246433

Missed the spot in ShenandoahRootUpdater while doing JDK-8246100.
Fix:

diff -r d0d06b8be678 src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.inline.hpp
--- a/src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.inline.hpp Fri May 29 11:58:00 2020 +0200
+++ b/src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.inline.hpp Wed Jun 03 12:44:47 2020 +0200
@@ -237,18 +237,21 @@
                                  static_cast<CodeBlobToOopClosure*>(&blobs_and_disarm_Cl) :
                                  static_cast<CodeBlobToOopClosure*>(&update_blobs);

   CLDToOopClosure clds(keep_alive, ClassLoaderData::_claim_strong);

+  // Process serial-claiming roots first
   _serial_roots.oops_do(keep_alive, worker_id);
+  _serial_weak_roots.weak_oops_do(is_alive, keep_alive, worker_id);
+
+  // Process light-weight/limited parallel roots then
   _vm_roots.oops_do(keep_alive, worker_id);
+  _weak_roots.weak_oops_do(is_alive, keep_alive, worker_id);
+  _dedup_roots.oops_do(is_alive, keep_alive, worker_id);
+  _cld_roots.cld_do(&clds, worker_id);

-  _cld_roots.cld_do(&clds, worker_id);
+  // Process heavy-weight/fully parallel roots the last
   _code_roots.code_blobs_do(codes_cl, worker_id);
   _thread_roots.oops_do(keep_alive, NULL, worker_id);
-
-  _serial_weak_roots.weak_oops_do(is_alive, keep_alive, worker_id);
-  _weak_roots.weak_oops_do(is_alive, keep_alive, worker_id);
-  _dedup_roots.oops_do(is_alive, keep_alive, worker_id);
 }

 #endif // SHARE_GC_SHENANDOAH_SHENANDOAHROOTPROCESSOR_INLINE_HPP

Testing: hotspot_gc_shenandoah

-- 
Thanks,
-Aleksey

From stefan.johansson at oracle.com  Wed Jun 3 11:24:57 2020
From: stefan.johansson at oracle.com (stefan.johansson at oracle.com)
Date: Wed, 3 Jun 2020 13:24:57 +0200
Subject: [renamed] RFR: 8246135: Save important GC log lines and print
 them when dumping hs_err files
In-Reply-To: <239afd1c-f2ef-a5ec-36e6-d9a3970ac40d@oracle.com>
References: <208f2f9e-4387-b851-7097-3f16e710a21d@oracle.com>
 <239afd1c-f2ef-a5ec-36e6-d9a3970ac40d@oracle.com>
Message-ID: <12cb5a82-813b-92db-56be-0c6dd574581a@oracle.com>

Hi Stefan,

On 2020-06-03 10:30, Stefan Karlsson wrote:
> After further deliberations and discussions with Per, Erik and StefanJ,
> I'm going to make this patch more generic, and split this CR into two
> parts:
>  JDK-8246135: Save important GC log lines and print them when dumping
> hs_err files
>  JDK-8246404: ZGC: Use GCLogPrecious for important logging lines
>
> As StefanJ mentioned earlier there is a separate task for G1, Parallel,
> and Serial:
>  JDK-8246272: Make use of GCLogPrecious for G1, Parallel and Serial
>
> Since this patch is becoming more generic than it started out as, I've
> done some more changes to accommodate that:
>  https://cr.openjdk.java.net/~stefank/8246135/webrev.06
>
> - Moved the ZGC usages out of this patch
> - Moved initialization out of ZGC so that all GCs can simply start to
> use this functionality
> - Added a lock to allow for concurrent usage
> - Don't print the GC Precious Log section if the GC isn't using it
> - Added a temp buffer to simplify generation of messages
Looks good!

StefanJ
>
> The ZGC part has not changed and has already been reviewed:
>  https://cr.openjdk.java.net/~stefank/8246404/webrev.01/
>
> Some of the stylistic changes in the patch are done to prepare for:
>  https://cr.openjdk.java.net/~stefank/8246405/webrev.01/
>
> Thanks,
> StefanK
>
>
> On 2020-06-02 15:51, Stefan Karlsson wrote:
>> Hi all,
>>
>> While working on getting more information when dumping early during
>> initialization, it became apparent that we don't print these log lines
>> as early as we could. In ZGC we can assert and/or fail during the set
>> up of the heap. I'd like to print the precious lines even when that
>> happens. The following patch moves the precious line printing out
>> of ZCollectedHeap and into a direct call from the VMError code. GCs
>> that populate these lines will now automatically get them printed.
>>
>> https://cr.openjdk.java.net/~stefank/8246135/webrev.04.delta
>> https://cr.openjdk.java.net/~stefank/8246135/webrev.04
>>
>> Thanks,
>> StefanK
>>
>> On 2020-05-29 12:23, Stefan Karlsson wrote:
>>> Hi all,
>>>
>>> Please review this patch to save some of the important ZGC log lines
>>> and print them when dumping hs_err files.
>>>
>>> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/
>>> https://bugs.openjdk.java.net/browse/JDK-8246135
>>>
>>> The patch adds a concept of "precious" log lines. What's typically
>>> logged are GC initialization lines, but also error messages are
>>> saved. These lines are then dumped in the hs_err file if the JVM
>>> crashes or hits an assert. The lines can also be printed in a
>>> debugger to get a quick overview when debugging.
>>>
>>> The precious lines are always saved, but just like any other Unified
>>> Logging calls, only logged if the tags are enabled.
>>>
>>> The patch builds on the JDK-8246134 patch. The hs_err output looks
>>> like this:
>>>
>>> ZGC Precious Log:
>>>  NUMA Support: Disabled
>>>  CPUs: 8 total, 8 available
>>>  Memory: 16384M
>>>  Large Page Support: Disabled
>>>  Medium Page Size: 32M
>>>  Workers: 5 parallel, 1 concurrent
>>>  Address Space Type: Contiguous/Unrestricted/Complete
>>>  Address Space Size: 65536M x 3 = 196608M
>>>  Min Capacity: 42M
>>>  Initial Capacity: 256M
>>>  Max Capacity: 4096M
>>>  Max Reserve: 42M
>>>  Pre-touch: Disabled
>>>  Uncommit: Enabled
>>>  Uncommit Delay: 300s
>>>  Runtime Workers: 5 parallel
>>>
>>> ZGC Globals:
>>>  GlobalPhase:       2 (Relocate)
>>>  GlobalSeqNum:      1
>>>  Offset Max:        4096G (0x0000040000000000)
>>>  Page Size Small:   2M
>>>  Page Size Medium:  32M
>>>
>>> ZGC Metadata Bits:
>>>  Good:              0x0000100000000000
>>>  Bad:               0x00002c0000000000
>>>  WeakBad:           0x00000c0000000000
>>>  Marked:            0x0000040000000000
>>>  Remapped:          0x0000100000000000
>>>
>>> Heap:
>>>  ZHeap           used 12M, capacity 256M, max capacity 4096M
>>>  Metaspace       used 6501K, capacity 6615K, committed 6784K,
>>> reserved 1056768K
>>>   class space    used 559K, capacity 588K, committed 640K, reserved
>>> 1048576K
>>>
>>> ZGC Page Table:
>>>  Small   0x0000000000000000 0x0000000000200000 0x0000000000200000
>>> Allocating
>>>  Small   0x0000000000200000 0x0000000000240000 0x0000000000400000
>>> Allocating
>>>  Small   0x0000000000400000 0x0000000000600000 0x0000000000600000
>>> Allocating
>>>  Small   0x0000000000600000 0x0000000000800000 0x0000000000800000
>>> Allocating
>>>  Small   0x0000000000800000 0x00000000009c0000 0x0000000000a00000
>>> Allocating
>>>  Small   0x0000000000a00000 0x0000000000a40000 0x0000000000c00000
>>> Allocating
>>>
>>> Thanks,
>>> StefanK
>>

From stefan.karlsson at oracle.com  Wed Jun 3 11:51:56 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Wed, 3 Jun 2020 13:51:56 +0200
Subject: [renamed] RFR: 8246135: Save important GC log lines and print
 them when dumping hs_err files
In-Reply-To: <12cb5a82-813b-92db-56be-0c6dd574581a@oracle.com>
References: <208f2f9e-4387-b851-7097-3f16e710a21d@oracle.com>
 <239afd1c-f2ef-a5ec-36e6-d9a3970ac40d@oracle.com>
 <12cb5a82-813b-92db-56be-0c6dd574581a@oracle.com>
Message-ID:

Thanks, StefanJ!
StefanK

On 2020-06-03 13:24, stefan.johansson at oracle.com wrote:
> Hi Stefan,
>
> On 2020-06-03 10:30, Stefan Karlsson wrote:
>> After further deliberations and discussions with Per, Erik and
>> StefanJ, I'm going to make this patch more generic, and split this CR
>> into two parts:
>>   JDK-8246135: Save important GC log lines and print them when
>> dumping hs_err files
>>   JDK-8246404: ZGC: Use GCLogPrecious for important logging lines
>>
>> As StefanJ mentioned earlier there is a separate task for G1,
>> Parallel, and Serial:
>>   JDK-8246272: Make use of GCLogPrecious for G1, Parallel and Serial
>>
>> Since this patch is becoming more generic than it started out as,
>> I've done some more changes to accommodate that:
>>   https://cr.openjdk.java.net/~stefank/8246135/webrev.06
>>
>> - Moved the ZGC usages out of this patch
>> - Moved initialization out of ZGC so that all GCs can simply start to
>> use this functionality
>> - Added a lock to allow for concurrent usage
>> - Don't print the GC Precious Log section if the GC isn't using it
>> - Added a temp buffer to simplify generation of messages
> Looks good!
>
> StefanJ
>>
>> The ZGC part has not changed and has already been reviewed:
>>   https://cr.openjdk.java.net/~stefank/8246404/webrev.01/
>>
>> Some of the stylistic changes in the patch are done to prepare for:
>>   https://cr.openjdk.java.net/~stefank/8246405/webrev.01/
>>
>> Thanks,
>> StefanK
>>
>>
>> On 2020-06-02 15:51, Stefan Karlsson wrote:
>>> Hi all,
>>>
>>> While working on getting more information when dumping early during
>>> initialization, it became apparent that we don't print these log
>>> lines as early as we could. In ZGC we can assert and/or fail during
>>> the set up of the heap. I'd like to print the precious lines even
>>> when that happens. The following patch moves the precious line
>>> printing out of ZCollectedHeap and into a direct call from the
>>> VMError code. GCs that populate these lines will now automatically
>>> get them printed.
>>>
>>> https://cr.openjdk.java.net/~stefank/8246135/webrev.04.delta
>>> https://cr.openjdk.java.net/~stefank/8246135/webrev.04
>>>
>>> Thanks,
>>> StefanK
>>>
>>> On 2020-05-29 12:23, Stefan Karlsson wrote:
>>>> Hi all,
>>>>
>>>> Please review this patch to save some of the important ZGC log
>>>> lines and print them when dumping hs_err files.
>>>>
>>>> https://cr.openjdk.java.net/~stefank/8246135/webrev.01/
>>>> https://bugs.openjdk.java.net/browse/JDK-8246135
>>>>
>>>> The patch adds a concept of "precious" log lines. What's typically
>>>> logged are GC initialization lines, but also error messages are
>>>> saved. These lines are then dumped in the hs_err file if the JVM
>>>> crashes or hits an assert. The lines can also be printed in a
>>>> debugger to get a quick overview when debugging.
>>>>
>>>> The precious lines are always saved, but just like any other
>>>> Unified Logging calls, only logged if the tags are enabled.
>>>>
>>>> The patch builds on the JDK-8246134 patch. The hs_err output looks
>>>> like this:
>>>>
>>>> ZGC Precious Log:
>>>>  NUMA Support: Disabled
>>>>  CPUs: 8 total, 8 available
>>>>  Memory: 16384M
>>>>  Large Page Support: Disabled
>>>>  Medium Page Size: 32M
>>>>  Workers: 5 parallel, 1 concurrent
>>>>  Address Space Type: Contiguous/Unrestricted/Complete
>>>>  Address Space Size: 65536M x 3 = 196608M
>>>>  Min Capacity: 42M
>>>>  Initial Capacity: 256M
>>>>  Max Capacity: 4096M
>>>>  Max Reserve: 42M
>>>>  Pre-touch: Disabled
>>>>  Uncommit: Enabled
>>>>  Uncommit Delay: 300s
>>>>  Runtime Workers: 5 parallel
>>>>
>>>> ZGC Globals:
>>>>  GlobalPhase:       2 (Relocate)
>>>>  GlobalSeqNum:      1
>>>>  Offset Max:        4096G (0x0000040000000000)
>>>>  Page Size Small:   2M
>>>>  Page Size Medium:  32M
>>>>
>>>> ZGC Metadata Bits:
>>>>  Good:              0x0000100000000000
>>>>  Bad:               0x00002c0000000000
>>>>  WeakBad:           0x00000c0000000000
>>>>  Marked:            0x0000040000000000
>>>>  Remapped:          0x0000100000000000
>>>>
>>>> Heap:
>>>>  ZHeap           used 12M, capacity 256M, max capacity 4096M
>>>>  Metaspace       used 6501K, capacity 6615K, committed 6784K,
>>>> reserved 1056768K
>>>>   class space    used 559K, capacity 588K, committed 640K, reserved
>>>> 1048576K
>>>>
>>>> ZGC Page Table:
>>>>  Small   0x0000000000000000 0x0000000000200000 0x0000000000200000
>>>> Allocating
>>>>  Small   0x0000000000200000 0x0000000000240000 0x0000000000400000
>>>> Allocating
>>>>  Small   0x0000000000400000 0x0000000000600000 0x0000000000600000
>>>> Allocating
>>>>  Small   0x0000000000600000 0x0000000000800000 0x0000000000800000
>>>> Allocating
>>>>  Small   0x0000000000800000 0x00000000009c0000 0x0000000000a00000
>>>> Allocating
>>>>  Small   0x0000000000a00000 0x0000000000a40000 0x0000000000c00000
>>>> Allocating
>>>>
>>>> Thanks,
>>>> StefanK
>>>
>>

From zgu at redhat.com  Wed Jun 3 12:01:38 2020
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 3 Jun 2020 08:01:38 -0400
Subject: RFR (XS) 8246433: Shenandoah: walk roots in more efficient order
 in ShenandoahRootUpdater
In-Reply-To:
References:
Message-ID: <2ebdb1b7-b9a3-301a-1cc6-5ab3e9966eec@redhat.com>

Looks good.

-Zhengyu

On 6/3/20 6:45 AM, Aleksey Shipilev wrote:
> RFE:
>   https://bugs.openjdk.java.net/browse/JDK-8246433
>
> Missed the spot in ShenandoahRootUpdater while doing JDK-8246100.
>
> Fix:
>
> diff -r d0d06b8be678 src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.inline.hpp
> --- a/src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.inline.hpp Fri May 29 11:58:00 2020 +0200
> +++ b/src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.inline.hpp Wed Jun 03 12:44:47 2020 +0200
> @@ -237,18 +237,21 @@
>                                   static_cast<CodeBlobToOopClosure*>(&blobs_and_disarm_Cl) :
>                                   static_cast<CodeBlobToOopClosure*>(&update_blobs);
>
>    CLDToOopClosure clds(keep_alive, ClassLoaderData::_claim_strong);
>
> +  // Process serial-claiming roots first
>    _serial_roots.oops_do(keep_alive, worker_id);
> +  _serial_weak_roots.weak_oops_do(is_alive, keep_alive, worker_id);
> +
> +  // Process light-weight/limited parallel roots then
>    _vm_roots.oops_do(keep_alive, worker_id);
> +  _weak_roots.weak_oops_do(is_alive, keep_alive, worker_id);
> +  _dedup_roots.oops_do(is_alive, keep_alive, worker_id);
> +  _cld_roots.cld_do(&clds, worker_id);
>
> -  _cld_roots.cld_do(&clds, worker_id);
> +  // Process heavy-weight/fully parallel roots the last
>    _code_roots.code_blobs_do(codes_cl, worker_id);
>    _thread_roots.oops_do(keep_alive, NULL, worker_id);
> -
> -  _serial_weak_roots.weak_oops_do(is_alive, keep_alive, worker_id);
> -  _weak_roots.weak_oops_do(is_alive, keep_alive, worker_id);
> -  _dedup_roots.oops_do(is_alive, keep_alive, worker_id);
>  }
>
>  #endif // SHARE_GC_SHENANDOAH_SHENANDOAHROOTPROCESSOR_INLINE_HPP
>
> Testing: hotspot_gc_shenandoah
>

From stefan.johansson at oracle.com  Wed Jun 3 12:14:15 2020
From: stefan.johansson at oracle.com (stefan.johansson at oracle.com)
Date: Wed, 3 Jun 2020 14:14:15 +0200
Subject: RFR: 8246258: Enable hs_err heap printing earlier during
 initialization
In-Reply-To:
References:
Message-ID: <51f9fdf9-99f2-a0c2-a1ff-f8f98e9305c3@oracle.com>

Hi Stefan,

On 2020-06-01 17:55, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to enable the hs_err GC / heap printing
> directly after the heap has been set up.
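A sketch of the shape such a change can take (assumed code, not the actual webrev): rather than gating hs_err printing on a VM-is-fully-initialized flag, the print paths NULL-check the pieces that may not exist yet, so whatever has already been set up can be printed.

#include <cstdio>

// Stand-ins for the real VM types; only the NULL-checking pattern matters.
struct Heap {
  void print_on(FILE* out) const { fprintf(out, "heap state...\n"); }
};

static Heap* g_heap = nullptr;  // created during heap initialization

// Called from the error reporter: safe to run at any point of startup.
static void print_heap_section(FILE* out) {
  if (g_heap != nullptr) {
    g_heap->print_on(out);
  } else {
    fprintf(out, "Heap not yet created\n");
  }
}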
>
> https://cr.openjdk.java.net/~stefank/8246258/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8246258
Looks good,
StefanJ

>
> Changes in the patch:
> - Remove the Universe::is_fully_initialized
> - Add NULL initializations and checks in print paths
>
> I tested this patch by adding a temporary fatal(...) here:
>
> jint Universe::initialize_heap() {
>   assert(_collectedHeap == NULL, "Heap already created");
>   _collectedHeap = GCConfig::arguments()->create_heap();
>   // <<<< HERE
>   log_info(gc)("Using %s", _collectedHeap->name());
>   return _collectedHeap->initialize();
> }
>
> and manually looking at the result when running with all GCs. Will run
> this through tier1-3.
>
> Thanks,
> StefanK

From stefan.karlsson at oracle.com  Wed Jun 3 12:15:29 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Wed, 3 Jun 2020 14:15:29 +0200
Subject: RFR: 8246258: Enable hs_err heap printing earlier during
 initialization
In-Reply-To: <51f9fdf9-99f2-a0c2-a1ff-f8f98e9305c3@oracle.com>
References: <51f9fdf9-99f2-a0c2-a1ff-f8f98e9305c3@oracle.com>
Message-ID: <31010b7f-4b3a-52ae-6c6b-5b0c6b98a432@oracle.com>

Thanks, StefanJ.

StefanK

On 2020-06-03 14:14, stefan.johansson at oracle.com wrote:
> Hi Stefan,
>
> On 2020-06-01 17:55, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this patch to enable the hs_err GC / heap printing
>> directly after the heap has been set up.
>>
>> https://cr.openjdk.java.net/~stefank/8246258/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8246258
> Looks good,
> StefanJ
>
>>
>> Changes in the patch:
>> - Remove the Universe::is_fully_initialized
>> - Add NULL initializations and checks in print paths
>>
>> I tested this patch by adding a temporary fatal(...) here:
>>
>> jint Universe::initialize_heap() {
>>   assert(_collectedHeap == NULL, "Heap already created");
>>   _collectedHeap = GCConfig::arguments()->create_heap();
>>   // <<<< HERE
>>   log_info(gc)("Using %s", _collectedHeap->name());
>>   return _collectedHeap->initialize();
>> }
>>
>> and manually looking at the result when running with all GCs. Will
>> run this through tier1-3.
>>
>> Thanks,
>> StefanK

From erik.osterlund at oracle.com  Wed Jun 3 15:01:53 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Wed, 3 Jun 2020 17:01:53 +0200
Subject: RFR: 8246405: Add GCLogPrecious functionality to log and report
 debug errors
In-Reply-To: <08bb1203-39f9-cf25-7014-e32585767396@oracle.com>
References: <08bb1203-39f9-cf25-7014-e32585767396@oracle.com>
Message-ID: <17def85c-103e-9b4d-dbec-2a5578e0672a@oracle.com>

Hi Stefan,

Looks good.

/Erik

On 2020-06-03 10:42, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to enhance the GCLogPrecious functionality
> (JDK-8246405) to add support for a way to both log and generate a
> crash report in debug builds.
>
> https://cr.openjdk.java.net/~stefank/8246405/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8246405
>
> I've split out a patch where ZGC uses this functionality:
>
> https://cr.openjdk.java.net/~stefank/8246406/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8246406
>
> Tested manually by running:
> (ulimit -v <low value>; ../build/fastdebug/jdk/bin/java -XX:+UseZGC
> -Xmx18m -Xlog:gc* -version)
>
> and verified that it generates a hs_err file with the appropriate
> information.
>
> On macOS the output points to the right file and line number:
>
> #  Internal Error (src/hotspot/share/gc/z/zVirtualMemory.cpp:46),
> pid=67695, tid=8451
> #  Error: Failed to reserve enough address space for Java heap
>
> but since TOUCH_ASSERT_POISON isn't implemented we don't get registers
> and the output contains the GCLogPrecious code:
>
> V  [libjvm.dylib+0xb3d95c]  VMError::report_and_die(int, char const*,
> char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*,
> char const*, int, unsigned long)+0x670
> V  [libjvm.dylib+0xb3e083]  VMError::report_and_die(Thread*, void*,
> char const*, int, char const*, char const*, __va_list_tag*)+0x47
> V  [libjvm.dylib+0x334b48]  report_vm_error(char const*, int, char
> const*, char const*, ...)+0x145
> V  [libjvm.dylib+0x48d629]
> GCLogPrecious::vwrite_and_debug(LogTargetHandle, char const*,
> __va_list_tag*, char const*, int)+0x81
> V  [libjvm.dylib+0xbbdf70]  GCLogPreciousHandle::write_and_debug(char
> const*, ...)+0x92
> V  [libjvm.dylib+0xbd833e]
> ZVirtualMemoryManager::ZVirtualMemoryManager(unsigned long)+0xb6
>
> On Linux, where TOUCH_ASSERT_POISON is implemented, we get the last
> parts cut away:
>
> V  [libjvm.so+0x1857179]
> ZVirtualMemoryManager::ZVirtualMemoryManager(unsigned long)+0x79
> V  [libjvm.so+0x182f84e]  ZPageAllocator::ZPageAllocator(ZWorkers*,
> unsigned long, unsigned long, unsigned long, unsigned long)+0x6e
> V  [libjvm.so+0x1808b61]  ZHeap::ZHeap()+0x81
> V
[libjvm.so+0x1802559] ZCollectedHeap::ZCollectedHeap()+0x49 > > Thanks, > StefanK From volker.simonis at gmail.com Wed Jun 3 15:02:52 2020 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 3 Jun 2020 17:02:52 +0200 Subject: Need help to fix a potential G1 crash in jdk11 Message-ID: Hi, I would appreciate some help/advice for debugging a potential G1 crash in jdk 11. The crash usually occurs when running a proprietary jar file for about 20-30 minutes and it happens in various parts of the VM (C1- or C2-compiled code, interpreter, GC). Because the crash locations are so different and because the customer which reported the issue claimed that it doesn't happen with Parallel GC, I thought it might be a G1 issue. I couldn't reproduce the crash with jdk 12 and 14 (but with jdk 11 and 11.0.7, OpenJDK and Oracle JDK). When looking at the G1 changes in jdk 12 I couldn't find any apparent bug fix which potentially solves this problem but it may have been solved by one of the many G1 changes which happened in jdk 12. I did run the reproducer with "-XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC -XX:+VerifyAfterGC -XX:+VerifyDuringGC -XX:+CheckJNICalls -XX:+G1VerifyRSetsDuringFullGC -XX:+G1VerifyHeapRegionCodeRoots" and I indeed got verification errors (see [1] for a complete hs_err file). Sometimes it's just a few fields pointing to dead objects: [1035.782s][error][gc,verify ] ---------- [1035.782s][error][gc,verify ] Field 0x00000000fb509148 of live obj 0x00000000fb509130 in region [0x00000000fb500000, 0x00000000fb600000) [1035.782s][error][gc,verify ] class name org.antlr.v4.runtime.atn.ATNConfig [1035.782s][error][gc,verify ] points to dead obj 0x00000000f9ba39b0 in region [0x00000000f9b00000, 0x00000000f9c00000) [1035.782s][error][gc,verify ] class name org.antlr.v4.runtime.atn.SingletonPredictionContext [1035.782s][error][gc,verify ] ---------- [1035.783s][error][gc,verify ] Field 0x00000000fb509168 of live obj 0x00000000fb509150 in region [0x00000000fb500000, 0x00000000fb600000) [1035.783s][error][gc,verify ] class name org.antlr.v4.runtime.atn.ATNConfig [1035.783s][error][gc,verify ] points to dead obj 0x00000000f9ba39b0 in region [0x00000000f9b00000, 0x00000000f9c00000) [1035.783s][error][gc,verify ] class name org.antlr.v4.runtime.atn.SingletonPredictionContext [1035.783s][error][gc,verify ] ---------- ... [1043.928s][error][gc,verify ] Heap Regions: E=young(eden), S=young(survivor), O=old, HS=humongous(starts), HC=humongous(continues), CS=collection set, F=free, A=archive, TAMS=top-at-mark-start (previous, next) ... [1043.929s][error][gc,verify ] | 79|0x00000000f9b00000, 0x00000000f9bfffe8, 0x00000000f9c00000| 99%| O| |TAMS 0x00000000f9bfffe8, 0x00000000f9b00000| Updating ... [1043.971s][error][gc,verify ] | 105|0x00000000fb500000, 0x00000000fb54fc08, 0x00000000fb600000| 31%| S|CS|TAMS 0x00000000fb500000, 0x00000000fb500000| Complete but I also got verification errors with more than 30000 fields of distinct objects pointing to more than 1000 dead objects. How can that happen? Is the verification always accurate or can this also be a problem with the verification itself and I'm hunting the wrong problem? 
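To make the failure mode concrete: conceptually, the check that produces these messages walks every live object and tests each of its references against the liveness information. A toy sketch of that idea only (not G1's actual verifier; all names here are invented):

#include <cstdio>
#include <vector>

// Toy model of the verification pass: every reference held by a live
// object must point to an object that is itself still live.
struct Obj {
  bool live;
  std::vector<Obj*> fields;
};

static int verify_heap(const std::vector<Obj*>& heap) {
  int failures = 0;
  for (Obj* o : heap) {
    if (!o->live) continue;                  // only verify live objects
    for (Obj* ref : o->fields) {
      if (ref != nullptr && !ref->live) {    // live obj -> dead obj
        std::printf("Field of live obj %p points to dead obj %p\n",
                    (void*)o, (void*)ref);
        failures++;
      }
    }
  }
  return failures;
}

int main() {
  Obj dead{false, {}};
  Obj alive{true, {&dead}};   // a live object referencing a dead one
  return verify_heap({&alive, &dead}) == 1 ? 0 : 1;
}

A failure of this kind means either the liveness information is stale (a real GC bug) or the verifier consulted metadata it should not trust at that point, which is exactly the question raised above.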
Sometimes I also saw verification errors where fields point to objects in regions with "Untracked remset": [673.762s][error][gc,verify] ---------- [673.762s][error][gc,verify] Field 0x00000000fca49298 of live obj 0x00000000fca49280 in region [0x00000000fca0000 0, 0x00000000fcb00000) [673.762s][error][gc,verify] class name org.antlr.v4.runtime.atn.ATNConfig [673.762s][error][gc,verify] points to obj 0x00000000f9d5a9a0 in region 81:(F)[0x00000000f9d00000,0x00000000f9d00000,0x00000000f9e00000] remset Untracked [673.762s][error][gc,verify] ---------- But they are by far not that common like the pointers to dead objects. Once I even saw a "Root location" pointing to a dead object: [369.808s][error][gc,verify] Root location 0x00007f35bb33f1f8 points to dead obj 0x00000000f87fa200 [369.808s][error][gc,verify] org.antlr.v4.runtime.atn.PredictionContextCache [369.808s][error][gc,verify] {0x00000000f87fa200} - klass: 'org/antlr/v4/runtime/atn/PredictionContextCache' [369.850s][error][gc,verify] ---------- [369.850s][error][gc,verify] Field 0x00000000fbc60900 of live obj 0x00000000fbc608f0 in region [0x00000000fbc00000, 0x00000000fbd00000) [369.850s][error][gc,verify] class name org.antlr.v4.runtime.atn.ParserATNSimulator [369.850s][error][gc,verify] points to dead obj 0x00000000f87fa200 in region [0x00000000f8700000, 0x00000000f8800000) [369.850s][error][gc,verify] class name org.antlr.v4.runtime.atn.PredictionContextCache [369.850s][error][gc,verify] ---------- All these verification errors occur after the Remark phase in G1ConcurrentMark::remark() at: verify_during_pause(G1HeapVerifier::G1VerifyRemark, VerifyOption_G1UsePrevMarking, "Remark after"); V [libjvm.so+0x6ca186] report_vm_error(char const*, int, char const*, char const*, ...)+0x106 V [libjvm.so+0x7d4a99] G1HeapVerifier::verify(VerifyOption)+0x399 V [libjvm.so+0xe128bb] Universe::verify(VerifyOption, char const*)+0x16b V [libjvm.so+0x7d44ee] G1HeapVerifier::verify(G1HeapVerifier::G1VerifyType, VerifyOption, char const*)+0x9e V [libjvm.so+0x7addcf] G1ConcurrentMark::verify_during_pause(G1HeapVerifier::G1VerifyType, VerifyOption, char const*)+0x9f V [libjvm.so+0x7b172e] G1ConcurrentMark::remark()+0x3be V [libjvm.so+0xe6a5e1] VM_CGC_Operation::doit()+0x211 V [libjvm.so+0xe69908] VM_Operation::evaluate()+0xd8 V [libjvm.so+0xe6713f] VMThread::evaluate_operation(VM_Operation*) [clone .constprop.54]+0xff V [libjvm.so+0xe6764e] VMThread::loop()+0x3be V [libjvm.so+0xe67a7b] VMThread::run()+0x7b The GC log output looks as follows: ... [1035.775s][info ][gc,verify,start ] Verifying During GC (Remark after) [1035.775s][debug][gc,verify ] Threads [1035.776s][debug][gc,verify ] Heap [1035.776s][debug][gc,verify ] Roots [1035.782s][debug][gc,verify ] HeapRegionSets [1035.782s][debug][gc,verify ] HeapRegions [1035.782s][error][gc,verify ] ---------- ... A more complete GC log can be found here [2]. 
For the field 0x00000000fb509148 of live obj 0x00000000fb509130 which
points to the dead object 0x00000000f9ba39b0 I get the following
information if I inspect them with clhsdb:

hsdb> inspect 0x00000000fb509130
instance of Oop for org/antlr/v4/runtime/atn/ATNConfig @ 0x00000000fb509130
@ 0x00000000fb509130 (size = 32)
_mark: 13
_metadata._compressed_klass: InstanceKlass for
org/antlr/v4/runtime/atn/ATNConfig
state: Oop for org/antlr/v4/runtime/atn/BasicState @ 0x00000000f83ecfa8 Oop
for org/antlr/v4/runtime/atn/BasicState @ 0x00000000f83ecfa8
alt: 1
context: Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @
0x00000000f9ba39b0 Oop for
org/antlr/v4/runtime/atn/SingletonPredictionContext @ 0x00000000f9ba39b0
reachesIntoOuterContext: 8
semanticContext: Oop for org/antlr/v4/runtime/atn/SemanticContext$Predicate
@ 0x00000000f83d57c0 Oop for
org/antlr/v4/runtime/atn/SemanticContext$Predicate @ 0x00000000f83d57c0

hsdb> inspect 0x00000000f9ba39b0
instance of Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @
0x00000000f9ba39b0 @ 0x00000000f9ba39b0 (size = 32)
_mark: 41551306041
_metadata._compressed_klass: InstanceKlass for
org/antlr/v4/runtime/atn/SingletonPredictionContext
id: 100635259
cachedHashCode: 2005943142
parent: Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @
0x00000000f9ba01b0 Oop for
org/antlr/v4/runtime/atn/SingletonPredictionContext @ 0x00000000f9ba01b0
returnState: 18228

I could also reproduce the verification errors with a fast debug build of
11.0.7 which I did run with "-XX:+CheckCompressedOops -XX:+VerifyOops
-XX:+G1VerifyCTCleanup -XX:+G1VerifyBitmaps" in addition to the options
mentioned before, but unfortunately the run triggered neither an
assertion nor a different verification error.

So to summarize, my basic questions are:
- has somebody else encountered similar crashes?
- is someone aware of specific changes in jdk12 which might solve this
problem?
- are the verification errors I'm seeing accurate or is it possible to get
false positives when running with -XX:Verify{Before,During,After}GC ?

Thanks for your patience,
Volker

[1]
http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/hs_err_pid28294.log
[2]
http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/verify-error.log

From erik.osterlund at oracle.com  Wed Jun 3 15:26:04 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Wed, 3 Jun 2020 17:26:04 +0200
Subject: Need help to fix a potential G1 crash in jdk11
In-Reply-To:
References:
Message-ID:

Hi Volker,

In JDK 12, I changed quite a bit how G1 performs class unloading, to a
new model. Since the verification runs just after class unloading, I
guess it could be interesting to check if the error happens with
-XX:-ClassUnloading as well. If not, then perhaps some of my class
unloading changes for G1 in JDK 12 fixed the problem.

Just a gut feeling...

Thanks,
/Erik

On 2020-06-03 17:02, Volker Simonis wrote:
> Hi,
>
> I would appreciate some help/advice for debugging a potential G1 crash in
> jdk 11. The crash usually occurs when running a proprietary jar file for
> about 20-30 minutes and it happens in various parts of the VM (C1- or
> C2-compiled code, interpreter, GC). Because the crash locations are so
> different and because the customer which reported the issue claimed that it
> doesn't happen with Parallel GC, I thought it might be a G1 issue. I
> couldn't reproduce the crash with jdk 12 and 14 (but with jdk 11 and
> 11.0.7, OpenJDK and Oracle JDK).
When looking at the G1 changes in jdk 12 I > couldn't find any apparent bug fix which potentially solves this problem > but it may have been solved by one of the many G1 changes which happened in > jdk 12. > > I did run the reproducer with "-XX:+UnlockDiagnosticVMOptions > -XX:+VerifyBeforeGC -XX:+VerifyAfterGC -XX:+VerifyDuringGC > -XX:+CheckJNICalls -XX:+G1VerifyRSetsDuringFullGC > -XX:+G1VerifyHeapRegionCodeRoots" and I indeed got verification errors (see > [1] for a complete hs_err file). Sometimes it's just a few fields pointing > to dead objects: > > [1035.782s][error][gc,verify ] ---------- > [1035.782s][error][gc,verify ] Field 0x00000000fb509148 of live obj > 0x00000000fb509130 in region [0x00000000fb500000, 0x00000000fb600000) > [1035.782s][error][gc,verify ] class name > org.antlr.v4.runtime.atn.ATNConfig > [1035.782s][error][gc,verify ] points to dead obj > 0x00000000f9ba39b0 in region [0x00000000f9b00000, 0x00000000f9c00000) > [1035.782s][error][gc,verify ] class name > org.antlr.v4.runtime.atn.SingletonPredictionContext > [1035.782s][error][gc,verify ] ---------- > [1035.783s][error][gc,verify ] Field 0x00000000fb509168 of live obj > 0x00000000fb509150 in region [0x00000000fb500000, 0x00000000fb600000) > [1035.783s][error][gc,verify ] class name > org.antlr.v4.runtime.atn.ATNConfig > [1035.783s][error][gc,verify ] points to dead obj > 0x00000000f9ba39b0 in region [0x00000000f9b00000, 0x00000000f9c00000) > [1035.783s][error][gc,verify ] class name > org.antlr.v4.runtime.atn.SingletonPredictionContext > [1035.783s][error][gc,verify ] ---------- > ... > [1043.928s][error][gc,verify ] Heap Regions: E=young(eden), > S=young(survivor), O=old, HS=humongous(starts), HC=humongous(continues), > CS=collection set, F=free, A=archive, TAMS=top-at-mark-start (previous, > next) > ... > [1043.929s][error][gc,verify ] | 79|0x00000000f9b00000, > 0x00000000f9bfffe8, 0x00000000f9c00000| 99%| O| |TAMS 0x00000000f9bfffe8, > 0x00000000f9b00000| Updating > ... > [1043.971s][error][gc,verify ] | 105|0x00000000fb500000, > 0x00000000fb54fc08, 0x00000000fb600000| 31%| S|CS|TAMS 0x00000000fb500000, > 0x00000000fb500000| Complete > > but I also got verification errors with more than 30000 fields of distinct > objects pointing to more than 1000 dead objects. How can that happen? Is > the verification always accurate or can this also be a problem with the > verification itself and I'm hunting the wrong problem? > > Sometimes I also saw verification errors where fields point to objects in > regions with "Untracked remset": > > [673.762s][error][gc,verify] ---------- > [673.762s][error][gc,verify] Field 0x00000000fca49298 of live obj > 0x00000000fca49280 in region [0x00000000fca0000 > 0, 0x00000000fcb00000) > [673.762s][error][gc,verify] class name org.antlr.v4.runtime.atn.ATNConfig > [673.762s][error][gc,verify] points to obj 0x00000000f9d5a9a0 in region > 81:(F)[0x00000000f9d00000,0x00000000f9d00000,0x00000000f9e00000] remset > Untracked > [673.762s][error][gc,verify] ---------- > > But they are by far not that common like the pointers to dead objects. 
Once > I even saw a "Root location" pointing to a dead object: > > [369.808s][error][gc,verify] Root location 0x00007f35bb33f1f8 points to > dead obj 0x00000000f87fa200 > [369.808s][error][gc,verify] org.antlr.v4.runtime.atn.PredictionContextCache > [369.808s][error][gc,verify] {0x00000000f87fa200} - klass: > 'org/antlr/v4/runtime/atn/PredictionContextCache' > [369.850s][error][gc,verify] ---------- > [369.850s][error][gc,verify] Field 0x00000000fbc60900 of live obj > 0x00000000fbc608f0 in region [0x00000000fbc00000, 0x00000000fbd00000) > [369.850s][error][gc,verify] class name > org.antlr.v4.runtime.atn.ParserATNSimulator > [369.850s][error][gc,verify] points to dead obj 0x00000000f87fa200 in > region [0x00000000f8700000, 0x00000000f8800000) > [369.850s][error][gc,verify] class name > org.antlr.v4.runtime.atn.PredictionContextCache > [369.850s][error][gc,verify] ---------- > > All these verification errors occur after the Remark phase in > G1ConcurrentMark::remark() at: > > verify_during_pause(G1HeapVerifier::G1VerifyRemark, > VerifyOption_G1UsePrevMarking, "Remark after"); > > V [libjvm.so+0x6ca186] report_vm_error(char const*, int, char const*, > char const*, ...)+0x106 > V [libjvm.so+0x7d4a99] G1HeapVerifier::verify(VerifyOption)+0x399 > V [libjvm.so+0xe128bb] Universe::verify(VerifyOption, char const*)+0x16b > V [libjvm.so+0x7d44ee] > G1HeapVerifier::verify(G1HeapVerifier::G1VerifyType, VerifyOption, char > const*)+0x9e > V [libjvm.so+0x7addcf] > G1ConcurrentMark::verify_during_pause(G1HeapVerifier::G1VerifyType, > VerifyOption, char const*)+0x9f > V [libjvm.so+0x7b172e] G1ConcurrentMark::remark()+0x3be > V [libjvm.so+0xe6a5e1] VM_CGC_Operation::doit()+0x211 > V [libjvm.so+0xe69908] VM_Operation::evaluate()+0xd8 > V [libjvm.so+0xe6713f] VMThread::evaluate_operation(VM_Operation*) [clone > .constprop.54]+0xff > V [libjvm.so+0xe6764e] VMThread::loop()+0x3be > V [libjvm.so+0xe67a7b] VMThread::run()+0x7b > > The GC log output looks as follows: > ... > [1035.775s][info ][gc,verify,start ] Verifying During GC (Remark after) > [1035.775s][debug][gc,verify ] Threads > [1035.776s][debug][gc,verify ] Heap > [1035.776s][debug][gc,verify ] Roots > [1035.782s][debug][gc,verify ] HeapRegionSets > [1035.782s][debug][gc,verify ] HeapRegions > [1035.782s][error][gc,verify ] ---------- > ... > A more complete GC log can be found here [2]. 
>
> For the field 0x00000000fb509148 of live obj 0x00000000fb509130 which
> points to the dead object 0x00000000f9ba39b0 I get the following
> information if I inspect them with clhsdb:
>
> hsdb> inspect 0x00000000fb509130
> instance of Oop for org/antlr/v4/runtime/atn/ATNConfig @ 0x00000000fb509130
> @ 0x00000000fb509130 (size = 32)
> _mark: 13
> _metadata._compressed_klass: InstanceKlass for
> org/antlr/v4/runtime/atn/ATNConfig
> state: Oop for org/antlr/v4/runtime/atn/BasicState @ 0x00000000f83ecfa8 Oop
> for org/antlr/v4/runtime/atn/BasicState @ 0x00000000f83ecfa8
> alt: 1
> context: Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @
> 0x00000000f9ba39b0 Oop for
> org/antlr/v4/runtime/atn/SingletonPredictionContext @ 0x00000000f9ba39b0
> reachesIntoOuterContext: 8
> semanticContext: Oop for org/antlr/v4/runtime/atn/SemanticContext$Predicate
> @ 0x00000000f83d57c0 Oop for
> org/antlr/v4/runtime/atn/SemanticContext$Predicate @ 0x00000000f83d57c0
>
> hsdb> inspect 0x00000000f9ba39b0
> instance of Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @
> 0x00000000f9ba39b0 @ 0x00000000f9ba39b0 (size = 32)
> _mark: 41551306041
> _metadata._compressed_klass: InstanceKlass for
> org/antlr/v4/runtime/atn/SingletonPredictionContext
> id: 100635259
> cachedHashCode: 2005943142
> parent: Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @
> 0x00000000f9ba01b0 Oop for
> org/antlr/v4/runtime/atn/SingletonPredictionContext @ 0x00000000f9ba01b0
> returnState: 18228
>
> I could also reproduce the verification errors with a fast debug build of
> 11.0.7 which I did run with "-XX:+CheckCompressedOops -XX:+VerifyOops
> -XX:+G1VerifyCTCleanup -XX:+G1VerifyBitmaps" in addition to the options
> mentioned before, but unfortunately the run triggered neither an
> assertion nor a different verification error.
>
> So to summarize, my basic questions are:
> - has somebody else encountered similar crashes?
> - is someone aware of specific changes in jdk12 which might solve this
> problem?
> - are the verification errors I'm seeing accurate or is it possible to get
> false positives when running with -XX:Verify{Before,During,After}GC ?
>
> Thanks for your patience,
> Volker
>
> [1]
> http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/hs_err_pid28294.log
> [2]
> http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/verify-error.log

From zgu at redhat.com  Wed Jun 3 15:45:01 2020
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 3 Jun 2020 11:45:01 -0400
Subject: [15] RFR 8246458: Shenandoah: TestAllocObjects.java test fail
 with -XX:+ShenandoahVerify
Message-ID:

We should not run the root verifier if we hit OOM during
evacuating/updating roots in the final mark phase, because there is no
guarantee that they are consistent.

Bug: https://bugs.openjdk.java.net/browse/JDK-8246458
Webrev: http://cr.openjdk.java.net/~zgu/JDK-8246458/weberv.00/

Test: hotspot_gc_shenandoah

Thanks,

-Zhengyu

From shade at redhat.com  Wed Jun 3 15:53:37 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 3 Jun 2020 17:53:37 +0200
Subject: [15] RFR 8246458: Shenandoah: TestAllocObjects.java test fail
 with -XX:+ShenandoahVerify
In-Reply-To:
References:
Message-ID:

On 6/3/20 5:45 PM, Zhengyu Gu wrote:
> We should not run the root verifier if we hit OOM during
> evacuating/updating roots in the final mark phase, because there is no
> guarantee that they are consistent.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8246458
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8246458/weberv.00/

Looks fine.
-- Thanks, -Aleksey From zgu at redhat.com Wed Jun 3 16:09:50 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 3 Jun 2020 12:09:50 -0400 Subject: [15] RFR 8246458: Shenandoah: TestAllocObjects.java test fail with -XX:+ShenandoahVerify In-Reply-To: References: Message-ID: <108557b2-0625-7369-3aa9-56a0de781f37@redhat.com> Thanks and pushed. -Zhengyu On 6/3/20 11:53 AM, Aleksey Shipilev wrote: > On 6/3/20 5:45 PM, Zhengyu Gu wrote: >> We should not run root verifier if OOM during evacuating/updating roots >> in final mark phase, cause there is no guarantee that they are consistent. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8246458 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8246458/weberv.00/ > > Looks fine. > From volker.simonis at gmail.com Wed Jun 3 17:14:38 2020 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 3 Jun 2020 19:14:38 +0200 Subject: Need help to fix a potential G1 crash in jdk11 In-Reply-To: References: Message-ID: Hi Erik, thanks a lot for the quick response and the hint with ClassUnloading. I've just started several runs of the test program with "-XX:-ClassUnloading". I'll report back instantly once I have some results. Best regards, Volker On Wed, Jun 3, 2020 at 5:26 PM Erik ?sterlund wrote: > Hi Volker, > > In JDK 12, I changed quite a bit how G1 performs class unloading, to a > new model. > Since the verification runs just after class unloading, I guess it could > be interesting > to check if the error happens with -XX:-ClassUnloading as well. If not, > then perhaps > some of my class unloading changes for G1 in JDK 12 fixed the problem. > > Just a gut feeling... > > Thanks, > /Erik > > On 2020-06-03 17:02, Volker Simonis wrote: > > Hi, > > > > I would appreciate some help/advice for debugging a potential G1 crash in > > jdk 11. The crash usually occurs when running a proprietary jar file for > > about 20-30 minutes and it happens in various parts of the VM (C1- or > > C2-compiled code, interpreter, GC). Because the crash locations are so > > different and because the customer which reported the issue claimed that > it > > doesn't happen with Parallel GC, I thought it might be a G1 issue. I > > couldn't reproduce the crash with jdk 12 and 14 (but with jdk 11 and > > 11.0.7, OpenJDK and Oracle JDK). When looking at the G1 changes in jdk > 12 I > > couldn't find any apparent bug fix which potentially solves this problem > > but it may have been solved by one of the many G1 changes which happened > in > > jdk 12. > > > > I did run the reproducer with "-XX:+UnlockDiagnosticVMOptions > > -XX:+VerifyBeforeGC -XX:+VerifyAfterGC -XX:+VerifyDuringGC > > -XX:+CheckJNICalls -XX:+G1VerifyRSetsDuringFullGC > > -XX:+G1VerifyHeapRegionCodeRoots" and I indeed got verification errors > (see > > [1] for a complete hs_err file). 
Sometimes it's just a few fields > pointing > > to dead objects: > > > > [1035.782s][error][gc,verify ] ---------- > > [1035.782s][error][gc,verify ] Field 0x00000000fb509148 of live > obj > > 0x00000000fb509130 in region [0x00000000fb500000, 0x00000000fb600000) > > [1035.782s][error][gc,verify ] class name > > org.antlr.v4.runtime.atn.ATNConfig > > [1035.782s][error][gc,verify ] points to dead obj > > 0x00000000f9ba39b0 in region [0x00000000f9b00000, 0x00000000f9c00000) > > [1035.782s][error][gc,verify ] class name > > org.antlr.v4.runtime.atn.SingletonPredictionContext > > [1035.782s][error][gc,verify ] ---------- > > [1035.783s][error][gc,verify ] Field 0x00000000fb509168 of live > obj > > 0x00000000fb509150 in region [0x00000000fb500000, 0x00000000fb600000) > > [1035.783s][error][gc,verify ] class name > > org.antlr.v4.runtime.atn.ATNConfig > > [1035.783s][error][gc,verify ] points to dead obj > > 0x00000000f9ba39b0 in region [0x00000000f9b00000, 0x00000000f9c00000) > > [1035.783s][error][gc,verify ] class name > > org.antlr.v4.runtime.atn.SingletonPredictionContext > > [1035.783s][error][gc,verify ] ---------- > > ... > > [1043.928s][error][gc,verify ] Heap Regions: E=young(eden), > > S=young(survivor), O=old, HS=humongous(starts), HC=humongous(continues), > > CS=collection set, F=free, A=archive, TAMS=top-at-mark-start (previous, > > next) > > ... > > [1043.929s][error][gc,verify ] | 79|0x00000000f9b00000, > > 0x00000000f9bfffe8, 0x00000000f9c00000| 99%| O| |TAMS > 0x00000000f9bfffe8, > > 0x00000000f9b00000| Updating > > ... > > [1043.971s][error][gc,verify ] | 105|0x00000000fb500000, > > 0x00000000fb54fc08, 0x00000000fb600000| 31%| S|CS|TAMS > 0x00000000fb500000, > > 0x00000000fb500000| Complete > > > > but I also got verification errors with more than 30000 fields of > distinct > > objects pointing to more than 1000 dead objects. How can that happen? Is > > the verification always accurate or can this also be a problem with the > > verification itself and I'm hunting the wrong problem? > > > > Sometimes I also saw verification errors where fields point to objects in > > regions with "Untracked remset": > > > > [673.762s][error][gc,verify] ---------- > > [673.762s][error][gc,verify] Field 0x00000000fca49298 of live obj > > 0x00000000fca49280 in region [0x00000000fca0000 > > 0, 0x00000000fcb00000) > > [673.762s][error][gc,verify] class name > org.antlr.v4.runtime.atn.ATNConfig > > [673.762s][error][gc,verify] points to obj 0x00000000f9d5a9a0 in region > > 81:(F)[0x00000000f9d00000,0x00000000f9d00000,0x00000000f9e00000] remset > > Untracked > > [673.762s][error][gc,verify] ---------- > > > > But they are by far not that common like the pointers to dead objects. 
> Once > > I even saw a "Root location" pointing to a dead object: > > > > [369.808s][error][gc,verify] Root location 0x00007f35bb33f1f8 points to > > dead obj 0x00000000f87fa200 > > [369.808s][error][gc,verify] > org.antlr.v4.runtime.atn.PredictionContextCache > > [369.808s][error][gc,verify] {0x00000000f87fa200} - klass: > > 'org/antlr/v4/runtime/atn/PredictionContextCache' > > [369.850s][error][gc,verify] ---------- > > [369.850s][error][gc,verify] Field 0x00000000fbc60900 of live obj > > 0x00000000fbc608f0 in region [0x00000000fbc00000, 0x00000000fbd00000) > > [369.850s][error][gc,verify] class name > > org.antlr.v4.runtime.atn.ParserATNSimulator > > [369.850s][error][gc,verify] points to dead obj 0x00000000f87fa200 in > > region [0x00000000f8700000, 0x00000000f8800000) > > [369.850s][error][gc,verify] class name > > org.antlr.v4.runtime.atn.PredictionContextCache > > [369.850s][error][gc,verify] ---------- > > > > All these verification errors occur after the Remark phase in > > G1ConcurrentMark::remark() at: > > > > verify_during_pause(G1HeapVerifier::G1VerifyRemark, > > VerifyOption_G1UsePrevMarking, "Remark after"); > > > > V [libjvm.so+0x6ca186] report_vm_error(char const*, int, char const*, > > char const*, ...)+0x106 > > V [libjvm.so+0x7d4a99] G1HeapVerifier::verify(VerifyOption)+0x399 > > V [libjvm.so+0xe128bb] Universe::verify(VerifyOption, char > const*)+0x16b > > V [libjvm.so+0x7d44ee] > > G1HeapVerifier::verify(G1HeapVerifier::G1VerifyType, VerifyOption, char > > const*)+0x9e > > V [libjvm.so+0x7addcf] > > G1ConcurrentMark::verify_during_pause(G1HeapVerifier::G1VerifyType, > > VerifyOption, char const*)+0x9f > > V [libjvm.so+0x7b172e] G1ConcurrentMark::remark()+0x3be > > V [libjvm.so+0xe6a5e1] VM_CGC_Operation::doit()+0x211 > > V [libjvm.so+0xe69908] VM_Operation::evaluate()+0xd8 > > V [libjvm.so+0xe6713f] VMThread::evaluate_operation(VM_Operation*) > [clone > > .constprop.54]+0xff > > V [libjvm.so+0xe6764e] VMThread::loop()+0x3be > > V [libjvm.so+0xe67a7b] VMThread::run()+0x7b > > > > The GC log output looks as follows: > > ... > > [1035.775s][info ][gc,verify,start ] Verifying During GC (Remark after) > > [1035.775s][debug][gc,verify ] Threads > > [1035.776s][debug][gc,verify ] Heap > > [1035.776s][debug][gc,verify ] Roots > > [1035.782s][debug][gc,verify ] HeapRegionSets > > [1035.782s][debug][gc,verify ] HeapRegions > > [1035.782s][error][gc,verify ] ---------- > > ... > > A more complete GC log can be found here [2]. 
> > > > For the field 0x00000000fb509148 of live obj 0x00000000fb509130 which > > points to the dead object 0x00000000f9ba39b0 I get the following > > information if I inspect them with clhsdb: > > > > hsdb> inspect 0x00000000fb509130 > > instance of Oop for org/antlr/v4/runtime/atn/ATNConfig @ > 0x00000000fb509130 > > @ 0x00000000fb509130 (size = 32) > > _mark: 13 > > _metadata._compressed_klass: InstanceKlass for > > org/antlr/v4/runtime/atn/ATNConfig > > state: Oop for org/antlr/v4/runtime/atn/BasicState @ 0x00000000f83ecfa8 > Oop > > for org/antlr/v4/runtime/atn/BasicState @ 0x00000000f83ecfa8 > > alt: 1 > > context: Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @ > > 0x00000000f9ba39b0 Oop for > > org/antlr/v4/runtime/atn/SingletonPredictionContext @ 0x00000000f9ba39b0 > > reachesIntoOuterContext: 8 > > semanticContext: Oop for > org/antlr/v4/runtime/atn/SemanticContext$Predicate > > @ 0x00000000f83d57c0 Oop for > > org/antlr/v4/runtime/atn/SemanticContext$Predicate @ 0x00000000f83d57c0 > > > > hsdb> inspect 0x00000000f9ba39b0 > > instance of Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @ > > 0x00000000f9ba39b0 @ 0x00000000f9ba39b0 (size = 32) > > _mark: 41551306041 > > _metadata._compressed_klass: InstanceKlass for > > org/antlr/v4/runtime/atn/SingletonPredictionContext > > id: 100635259 > > cachedHashCode: 2005943142 > > parent: Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @ > > 0x00000000f9ba01b0 Oop for > > org/antlr/v4/runtime/atn/SingletonPredictionContext @ 0x00000000f9ba01b0 > > returnState: 18228 > > > > I could also reproduce the verification errors with a fast debug build of > > 11.0.7 which I did run with "-XX:+CheckCompressedOops -XX:+VerifyOops > > -XX:+G1VerifyCTCleanup -XX:+G1VerifyBitmaps" in addition to the options > > mentioned before, but unfortunaltey the run didn't trigger neither an > > assertion nor a different verification error. > > > > So to summarize, my basic questions are: > > - has somebody else encountered similar crashes? > > - is someone aware of specific changes in jdk12 which might solve this > > problem? > > - are the verification errors I'm seeing accurate or is it possible to > get > > false positives when running with -XX:Verify{Before,During,After}GC ? > > > > Thanks for your patience, > > Volker > > > > [1] > > > http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/hs_err_pid28294.log > > [2] > > > http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/verify-error.log > > From volker.simonis at gmail.com Wed Jun 3 18:18:38 2020 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 3 Jun 2020 20:18:38 +0200 Subject: Need help to fix a potential G1 crash in jdk11 In-Reply-To: References: Message-ID: Unfortunately, "-XX:-ClassUnloading" doesn't help :( I already saw two new crashes. The first one has 6 distinct Root locations pointing to one dead object: [863.222s][info ][gc,verify,start ] Verifying During GC (Remark after) [863.222s][debug][gc,verify ] Threads [863.224s][debug][gc,verify ] Heap [863.224s][debug][gc,verify ] Roots [863.229s][error][gc,verify ] Root location 0x00007f11719174e7 points to dead obj 0x00000000f956dbd8 [863.229s][error][gc,verify ] org.antlr.v4.runtime.atn.PredictionContextCache [863.229s][error][gc,verify ] {0x00000000f956dbd8} - klass: 'org/antlr/v4/runtime/atn/PredictionContextCache' ... 
[863.229s][error][gc,verify ] Root location 0x00007f1171921978 points to dead obj 0x00000000f956dbd8 [863.229s][error][gc,verify ] org.antlr.v4.runtime.atn.PredictionContextCache [863.229s][error][gc,verify ] {0x00000000f956dbd8} - klass: 'org/antlr/v4/runtime/atn/PredictionContextCache' [863.231s][debug][gc,verify ] HeapRegionSets [863.231s][debug][gc,verify ] HeapRegions [863.349s][error][gc,verify ] Heap after failed verification (kind 0): The second crash has only two Root locations pointing to the same dead object but more than 40_000 fields in distinct objects pointing to more than 3_500 dead objects: [854.473s][info ][gc,verify,start ] Verifying During GC (Remark after) [854.473s][debug][gc,verify ] Threads [854.475s][debug][gc,verify ] Heap [854.475s][debug][gc,verify ] Roots [854.479s][error][gc,verify ] Root location 0x00007f6e60461d5f points to dead obj 0x00000000fa874528 [854.479s][error][gc,verify ] org.antlr.v4.runtime.atn.PredictionContextCache [854.479s][error][gc,verify ] {0x00000000fa874528} - klass: 'org/antlr/v4/runtime/atn/PredictionContextCache' [854.479s][error][gc,verify ] Root location 0x00007f6e60461d6d points to dead obj 0x00000000fa874528 [854.479s][error][gc,verify ] org.antlr.v4.runtime.atn.PredictionContextCache [854.479s][error][gc,verify ] {0x00000000fa874528} - klass: 'org/antlr/v4/runtime/atn/PredictionContextCache' [854.479s][error][gc,verify ] Root location 0x00007f6e60462138 points to dead obj 0x00000000fa874528 [854.479s][error][gc,verify ] org.antlr.v4.runtime.atn.PredictionContextCache [854.479s][error][gc,verify ] {0x00000000fa874528} - klass: 'org/antlr/v4/runtime/atn/PredictionContextCache' [854.482s][debug][gc,verify ] HeapRegionSets [854.482s][debug][gc,verify ] HeapRegions [854.484s][error][gc,verify ] ---------- [854.484s][error][gc,verify ] Field 0x00000000fd363c70 of live obj 0x00000000fd363c58 in region [0x00000000fd300000, 0x00000000fd400000) [854.484s][error][gc,verify ] class name org.antlr.v4.runtime.atn.ATNConfig [854.484s][error][gc,verify ] points to dead obj 0x00000000fa88a540 in region [0x00000000fa800000, 0x00000000fa900000) [854.484s][error][gc,verify ] class name org.antlr.v4.runtime.atn.ArrayPredictionContext [854.484s][error][gc,verify ] ---------- ... more than 40_000 fields in distinct objects pointing to more than 3_500 dead objects. So how can this happen. Is "-XX:+VerifyAfterGC" really reliable here? Thank you and best regards, Volker On Wed, Jun 3, 2020 at 7:14 PM Volker Simonis wrote: > Hi Erik, > > thanks a lot for the quick response and the hint with ClassUnloading. I've > just started several runs of the test program with "-XX:-ClassUnloading". > I'll report back instantly once I have some results. > > Best regards, > Volker > > On Wed, Jun 3, 2020 at 5:26 PM Erik ?sterlund > wrote: > >> Hi Volker, >> >> In JDK 12, I changed quite a bit how G1 performs class unloading, to a >> new model. >> Since the verification runs just after class unloading, I guess it could >> be interesting >> to check if the error happens with -XX:-ClassUnloading as well. If not, >> then perhaps >> some of my class unloading changes for G1 in JDK 12 fixed the problem. >> >> Just a gut feeling... >> >> Thanks, >> /Erik >> >> On 2020-06-03 17:02, Volker Simonis wrote: >> > Hi, >> > >> > I would appreciate some help/advice for debugging a potential G1 crash >> in >> > jdk 11. 
The crash usually occurs when running a proprietary jar file for >> > about 20-30 minutes and it happens in various parts of the VM (C1- or >> > C2-compiled code, interpreter, GC). Because the crash locations are so >> > different and because the customer which reported the issue claimed >> that it >> > doesn't happen with Parallel GC, I thought it might be a G1 issue. I >> > couldn't reproduce the crash with jdk 12 and 14 (but with jdk 11 and >> > 11.0.7, OpenJDK and Oracle JDK). When looking at the G1 changes in jdk >> 12 I >> > couldn't find any apparent bug fix which potentially solves this problem >> > but it may have been solved by one of the many G1 changes which >> happened in >> > jdk 12. >> > >> > I did run the reproducer with "-XX:+UnlockDiagnosticVMOptions >> > -XX:+VerifyBeforeGC -XX:+VerifyAfterGC -XX:+VerifyDuringGC >> > -XX:+CheckJNICalls -XX:+G1VerifyRSetsDuringFullGC >> > -XX:+G1VerifyHeapRegionCodeRoots" and I indeed got verification errors >> (see >> > [1] for a complete hs_err file). Sometimes it's just a few fields >> pointing >> > to dead objects: >> > >> > [1035.782s][error][gc,verify ] ---------- >> > [1035.782s][error][gc,verify ] Field 0x00000000fb509148 of live >> obj >> > 0x00000000fb509130 in region [0x00000000fb500000, 0x00000000fb600000) >> > [1035.782s][error][gc,verify ] class name >> > org.antlr.v4.runtime.atn.ATNConfig >> > [1035.782s][error][gc,verify ] points to dead obj >> > 0x00000000f9ba39b0 in region [0x00000000f9b00000, 0x00000000f9c00000) >> > [1035.782s][error][gc,verify ] class name >> > org.antlr.v4.runtime.atn.SingletonPredictionContext >> > [1035.782s][error][gc,verify ] ---------- >> > [1035.783s][error][gc,verify ] Field 0x00000000fb509168 of live >> obj >> > 0x00000000fb509150 in region [0x00000000fb500000, 0x00000000fb600000) >> > [1035.783s][error][gc,verify ] class name >> > org.antlr.v4.runtime.atn.ATNConfig >> > [1035.783s][error][gc,verify ] points to dead obj >> > 0x00000000f9ba39b0 in region [0x00000000f9b00000, 0x00000000f9c00000) >> > [1035.783s][error][gc,verify ] class name >> > org.antlr.v4.runtime.atn.SingletonPredictionContext >> > [1035.783s][error][gc,verify ] ---------- >> > ... >> > [1043.928s][error][gc,verify ] Heap Regions: E=young(eden), >> > S=young(survivor), O=old, HS=humongous(starts), HC=humongous(continues), >> > CS=collection set, F=free, A=archive, TAMS=top-at-mark-start (previous, >> > next) >> > ... >> > [1043.929s][error][gc,verify ] | 79|0x00000000f9b00000, >> > 0x00000000f9bfffe8, 0x00000000f9c00000| 99%| O| |TAMS >> 0x00000000f9bfffe8, >> > 0x00000000f9b00000| Updating >> > ... >> > [1043.971s][error][gc,verify ] | 105|0x00000000fb500000, >> > 0x00000000fb54fc08, 0x00000000fb600000| 31%| S|CS|TAMS >> 0x00000000fb500000, >> > 0x00000000fb500000| Complete >> > >> > but I also got verification errors with more than 30000 fields of >> distinct >> > objects pointing to more than 1000 dead objects. How can that happen? Is >> > the verification always accurate or can this also be a problem with the >> > verification itself and I'm hunting the wrong problem? 
>> > >> > Sometimes I also saw verification errors where fields point to objects >> in >> > regions with "Untracked remset": >> > >> > [673.762s][error][gc,verify] ---------- >> > [673.762s][error][gc,verify] Field 0x00000000fca49298 of live obj >> > 0x00000000fca49280 in region [0x00000000fca0000 >> > 0, 0x00000000fcb00000) >> > [673.762s][error][gc,verify] class name >> org.antlr.v4.runtime.atn.ATNConfig >> > [673.762s][error][gc,verify] points to obj 0x00000000f9d5a9a0 in region >> > 81:(F)[0x00000000f9d00000,0x00000000f9d00000,0x00000000f9e00000] remset >> > Untracked >> > [673.762s][error][gc,verify] ---------- >> > >> > But they are by far not that common like the pointers to dead objects. >> Once >> > I even saw a "Root location" pointing to a dead object: >> > >> > [369.808s][error][gc,verify] Root location 0x00007f35bb33f1f8 points to >> > dead obj 0x00000000f87fa200 >> > [369.808s][error][gc,verify] >> org.antlr.v4.runtime.atn.PredictionContextCache >> > [369.808s][error][gc,verify] {0x00000000f87fa200} - klass: >> > 'org/antlr/v4/runtime/atn/PredictionContextCache' >> > [369.850s][error][gc,verify] ---------- >> > [369.850s][error][gc,verify] Field 0x00000000fbc60900 of live obj >> > 0x00000000fbc608f0 in region [0x00000000fbc00000, 0x00000000fbd00000) >> > [369.850s][error][gc,verify] class name >> > org.antlr.v4.runtime.atn.ParserATNSimulator >> > [369.850s][error][gc,verify] points to dead obj 0x00000000f87fa200 in >> > region [0x00000000f8700000, 0x00000000f8800000) >> > [369.850s][error][gc,verify] class name >> > org.antlr.v4.runtime.atn.PredictionContextCache >> > [369.850s][error][gc,verify] ---------- >> > >> > All these verification errors occur after the Remark phase in >> > G1ConcurrentMark::remark() at: >> > >> > verify_during_pause(G1HeapVerifier::G1VerifyRemark, >> > VerifyOption_G1UsePrevMarking, "Remark after"); >> > >> > V [libjvm.so+0x6ca186] report_vm_error(char const*, int, char const*, >> > char const*, ...)+0x106 >> > V [libjvm.so+0x7d4a99] G1HeapVerifier::verify(VerifyOption)+0x399 >> > V [libjvm.so+0xe128bb] Universe::verify(VerifyOption, char >> const*)+0x16b >> > V [libjvm.so+0x7d44ee] >> > G1HeapVerifier::verify(G1HeapVerifier::G1VerifyType, VerifyOption, >> char >> > const*)+0x9e >> > V [libjvm.so+0x7addcf] >> > G1ConcurrentMark::verify_during_pause(G1HeapVerifier::G1VerifyType, >> > VerifyOption, char const*)+0x9f >> > V [libjvm.so+0x7b172e] G1ConcurrentMark::remark()+0x3be >> > V [libjvm.so+0xe6a5e1] VM_CGC_Operation::doit()+0x211 >> > V [libjvm.so+0xe69908] VM_Operation::evaluate()+0xd8 >> > V [libjvm.so+0xe6713f] VMThread::evaluate_operation(VM_Operation*) >> [clone >> > .constprop.54]+0xff >> > V [libjvm.so+0xe6764e] VMThread::loop()+0x3be >> > V [libjvm.so+0xe67a7b] VMThread::run()+0x7b >> > >> > The GC log output looks as follows: >> > ... >> > [1035.775s][info ][gc,verify,start ] Verifying During GC (Remark >> after) >> > [1035.775s][debug][gc,verify ] Threads >> > [1035.776s][debug][gc,verify ] Heap >> > [1035.776s][debug][gc,verify ] Roots >> > [1035.782s][debug][gc,verify ] HeapRegionSets >> > [1035.782s][debug][gc,verify ] HeapRegions >> > [1035.782s][error][gc,verify ] ---------- >> > ... >> > A more complete GC log can be found here [2]. 
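The mechanics behind these reports are straightforward: the verifier walks every object it considers live and applies an oop closure to each reference field, re-checking the same liveness predicate on the target. A stand-alone sketch of that shape -- hypothetical minimal types; the real closures live in g1HeapVerifier.cpp:

  #include <cstdint>
  #include <functional>
  #include <iostream>
  #include <vector>

  struct Obj {
    uintptr_t addr;
    std::vector<uintptr_t> fields;  // outgoing reference fields
  };

  // One verifier pass over a set of objects: only live objects are scanned,
  // and every live->dead edge is flagged. 'is_dead' stands in for the
  // TAMS/prev-bitmap test sketched earlier.
  int verify_objects(const std::vector<Obj>& objs,
                     const std::function<bool(uintptr_t)>& is_dead) {
    int failures = 0;
    for (const Obj& o : objs) {
      if (is_dead(o.addr)) continue;           // dead objects are not scanned
      for (uintptr_t target : o.fields) {
        if (target != 0 && is_dead(target)) {  // live obj points to dead obj
          std::cerr << std::hex << "Field of live obj 0x" << o.addr
                    << " points to dead obj 0x" << target << "\n";
          ++failures;
        }
      }
    }
    return failures;
  }

  int main() {
    auto is_dead = [](uintptr_t a) { return a == 0xf9ba39b0; };
    std::vector<Obj> objs = {{0xfb509130, {0xf9ba39b0}}};
    return verify_objects(objs, is_dead) == 0 ? 0 : 1;
  }

There are no heuristics in the scan itself, so tens of thousands of failures point more plausibly at wrong marking metadata than at the verification pass.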
>> > For the field 0x00000000fb509148 of live obj 0x00000000fb509130 which
>> > points to the dead object 0x00000000f9ba39b0 I get the following
>> > information if I inspect them with clhsdb:
>> >
>> > hsdb> inspect 0x00000000fb509130
>> > instance of Oop for org/antlr/v4/runtime/atn/ATNConfig @ 0x00000000fb509130
>> > @ 0x00000000fb509130 (size = 32)
>> > _mark: 13
>> > _metadata._compressed_klass: InstanceKlass for org/antlr/v4/runtime/atn/ATNConfig
>> > state: Oop for org/antlr/v4/runtime/atn/BasicState @ 0x00000000f83ecfa8 Oop
>> > for org/antlr/v4/runtime/atn/BasicState @ 0x00000000f83ecfa8
>> > alt: 1
>> > context: Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @
>> > 0x00000000f9ba39b0 Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @ 0x00000000f9ba39b0
>> > reachesIntoOuterContext: 8
>> > semanticContext: Oop for org/antlr/v4/runtime/atn/SemanticContext$Predicate
>> > @ 0x00000000f83d57c0 Oop for org/antlr/v4/runtime/atn/SemanticContext$Predicate @ 0x00000000f83d57c0
>> >
>> > hsdb> inspect 0x00000000f9ba39b0
>> > instance of Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @
>> > 0x00000000f9ba39b0 @ 0x00000000f9ba39b0 (size = 32)
>> > _mark: 41551306041
>> > _metadata._compressed_klass: InstanceKlass for org/antlr/v4/runtime/atn/SingletonPredictionContext
>> > id: 100635259
>> > cachedHashCode: 2005943142
>> > parent: Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @
>> > 0x00000000f9ba01b0 Oop for org/antlr/v4/runtime/atn/SingletonPredictionContext @ 0x00000000f9ba01b0
>> > returnState: 18228
>> >
>> > I could also reproduce the verification errors with a fast debug build of
>> > 11.0.7 which I did run with "-XX:+CheckCompressedOops -XX:+VerifyOops
>> > -XX:+G1VerifyCTCleanup -XX:+G1VerifyBitmaps" in addition to the options
>> > mentioned before, but unfortunately the run triggered neither an
>> > assertion nor a different verification error.
>> >
>> > So to summarize, my basic questions are:
>> > - has somebody else encountered similar crashes?
>> > - is someone aware of specific changes in jdk12 which might solve this
>> > problem?
>> > - are the verification errors I'm seeing accurate or is it possible to get
>> > false positives when running with -XX:Verify{Before,During,After}GC ?
>> >
>> > Thanks for your patience,
>> > Volker
>> >
>> > [1]
>> > http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/hs_err_pid28294.log
>> > [2]
>> > http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/verify-error.log

From igor.ignatyev at oracle.com Wed Jun 3 21:30:52 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 3 Jun 2020 14:30:52 -0700
Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property
Message-ID: 

http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00
> 70 lines changed: 66 ins; 0 del; 4 mod

Hi all,

could you please review the patch which introduces a new @requires property
to filter out the tests which ignore externally provided JVM flags?

the idea behind this patch is to have a way to clearly mark tests which
ignore flags, so
a) it's obvious that they don't execute flag-guarded code/features, and
extra care should be taken when using them to verify any flag-guarded
change;
b) they can be easily excluded from runs w/ flags.

@requires and VMProps allow us to achieve both, so it's been decided to add
a new property `vm.flagless`.
`vm.flagless` is set to false if there are any XX flags other than
`-XX:MaxRAMPercentage` and `-XX:CreateCoredumpOnCrash` (which are known to
be set almost always) or any X flags other than `-Xmixed`; in other words,
any tests w/ `@requires vm.flagless` will be excluded from runs w/ any
other X / XX flags passed via `-vmoption` / `-javaoption`. in rare cases,
when one still wants to run the tests marked by `vm.flagless` w/ external
flags, `vm.flagless` can be forcefully set to true by setting any value to
the `TEST_VM_FLAGLESS` env. variable.

this patch adds necessary common changes and marks common tests, namely
Scimark, GTestWrapper and TestNativeProcessBuilder. Component-specific
tests will be marked separately by the corresponding subtasks of 8151707[1].

please note, the patch depends on CODETOOLS-7902336[2], which will be
included in the next jtreg version, so this patch is to be integrated only
after jtreg 5.1 is promoted and we switch to use it by 8246387[3].

JBS: https://bugs.openjdk.java.net/browse/JDK-8246494
webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00
testing: marked tests w/ different XX and X flags w/ and w/o
TEST_VM_FLAGLESS env. var, and w/o any flags

[1] https://bugs.openjdk.java.net/browse/JDK-8151707
[2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336
[3] https://bugs.openjdk.java.net/browse/JDK-8246387

Thanks,
-- Igor

From david.holmes at oracle.com Wed Jun 3 23:02:28 2020
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 4 Jun 2020 09:02:28 +1000
Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property
In-Reply-To: 
References: 
Message-ID: <4584c046-ed5b-e1b9-f16b-3d4383cf1001@oracle.com>

Hi Igor,

On 4/06/2020 7:30 am, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00
>> 70 lines changed: 66 ins; 0 del; 4 mod
>
> Hi all,
>
> could you please review the patch which introduces a new @requires
> property to filter out the tests which ignore externally provided JVM
> flags?
>
> the idea behind this patch is to have a way to clearly mark tests which
> ignore flags, so
> a) it's obvious that they don't execute flag-guarded code/features, and
> extra care should be taken when using them to verify any flag-guarded
> change;
> b) they can be easily excluded from runs w/ flags.

So all such tests should be using driver mode, and further the VMs they
then exec don't use any of the APIs that include the jtreg test arguments.

Okay this seems reasonable in what it does.

Thanks,
David

> @requires and VMProps allow us to achieve both, so it's been decided to
> add a new property `vm.flagless`. `vm.flagless` is set to false if there
> are any XX flags other than `-XX:MaxRAMPercentage` and
> `-XX:CreateCoredumpOnCrash` (which are known to be set almost always) or
> any X flags other than `-Xmixed`; in other words, any tests w/ `@requires
> vm.flagless` will be excluded from runs w/ any other X / XX flags passed
> via `-vmoption` / `-javaoption`. in rare cases, when one still wants to
> run the tests marked by `vm.flagless` w/ external flags, `vm.flagless`
> can be forcefully set to true by setting any value to the
> `TEST_VM_FLAGLESS` env. variable.
>
> this patch adds necessary common changes and marks common tests, namely
> Scimark, GTestWrapper and TestNativeProcessBuilder. Component-specific
> tests will be marked separately by the corresponding subtasks of
> 8151707[1].
>
> please note, the patch depends on CODETOOLS-7902336[2], which will be
> included in the next jtreg version, so this patch is to be integrated
> only after jtreg 5.1 is promoted and we switch to use it by 8246387[3].
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8246494
> webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00
> testing: marked tests w/ different XX and X flags w/ and w/o
> TEST_VM_FLAGLESS env. var, and w/o any flags
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8151707
> [2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336
> [3] https://bugs.openjdk.java.net/browse/JDK-8246387
>
> Thanks,
> -- Igor
>

From luoziyi at amazon.com Wed Jun 3 23:15:51 2020
From: luoziyi at amazon.com (Luo, Ziyi)
Date: Wed, 3 Jun 2020 23:15:51 +0000
Subject: RFR (S) 8246274: G1 old gen allocation tracking is not in a separate class
In-Reply-To: 
References: <1D252CA6-AEF8-4E57-9B6C-37AACE9D7EC1@amazon.com>
Message-ID: 

Hi Thomas, thanks for your review. A second revision is published:
http://cr.openjdk.java.net/~phh/8246274/webrev.01 (all hotspot:tier1 passed)

On 6/3/20, 2:06 AM, Thomas Schatzl wrote:

> Hi,
>
> On 02.06.20 01:12, Luo, Ziyi wrote:
>> Hi,
>>
>> Could you please review this change which refactors
>> G1Policy::_bytes_allocated_in_old_since_last_gc into a dedicated new
>> tracking class G1OldGenAllocationTracker?
>>
>> Bug ID:
>> https://bugs.openjdk.java.net/browse/JDK-8246274
>> Webrev:
>> http://cr.openjdk.java.net/~phh/8246274/webrev.00/
>>
>> Testing: Local run hotspot:tier1.
>>
>> This is the first step toward improving the G1 old gen allocation
>> tracking. As described in JDK-8245511, we will further add humongous
>> allocation tracking and refactor G1IHOPControl::update_allocation_info().
>> This is a clean refactoring of the original
>> G1Policy::_bytes_allocated_in_old_since_last_gc field and
>> G1Policy::add_bytes_allocated_in_old_since_last_gc() method.
>>
>> Thanks,
>> Ziyi
>>
> - I suggest to keep the existing public interface in G1Policy, i.e.
> the add_bytes_allocated_in_old_since_last_gc. Making the old gen tracker
> object public does not seem to be advantageous.
> I.e. imo it is good to group everything related to old gen allocation
> tracking into that helper class, but we do not need to expose that fact.
> Maybe there is something in a follow up change that requires this?

Yes, the follow up change will introduce two more interfaces in
G1OldGenAllocationTracker to track the regular and on-collection-pause
humongous allocations respectively in G1CollectedHeap.

> - the _old_gen_alloc_tracker instance can be aggregated within the
> G1Policy class directly, i.e. there is no need to make it a pointer and
> manage via new and delete afaics.
> Maybe there is something in a follow up change that requires this?

You are right, fixed in rev.01

> - I would prefer if there were separate reset_after_[young_]gc() and a
> reset_after_full_gc() methods. Initially I asked myself why for full
> gc that first parameter passed to reset_after_gc() is zero, and only
> when looking into the code found that it is ignored anyway. I think the
> API of that class isn't that huge yet.

Makes sense.
I refactored it into two methods in rev.01:
* reset_after_full_gc() for full GC
* reset_after_incremental_gc() for young and mixed GC

Thanks,
Ziyi

From igor.ignatyev at oracle.com Thu Jun 4 01:05:07 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 3 Jun 2020 18:05:07 -0700
Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property
In-Reply-To: <4584c046-ed5b-e1b9-f16b-3d4383cf1001@oracle.com>
References: <4584c046-ed5b-e1b9-f16b-3d4383cf1001@oracle.com>
Message-ID: <5430D545-BE0C-4022-9468-D6EAFF7BAC78@oracle.com>

Hi David,

> So all such tests should be using driver mode, and further the VMs they
> then exec don't use any of the APIs that include the jtreg test arguments.

correct, and 8151707's subtasks are going to mark only such tests (and
tests which should be using driver mode, but can't due to external factors;
remember the follow-up fixes for my 'use driver mode' changes? ;) ). there
are two more (a bit controversial) use cases where we can consider usage of
vm.flagless:

- some of the debugger-debuggee tests have the debugger executed w/
external flags, but don't pass these flags to the debuggee; and in most
cases, it doesn't seem to be right, so arguably all such tests should be
updated to use driver mode to run the debugger and then marked w/
vm.flagless. I know that the svc team was doing some cleanup in this area
recently, and given it requires more investigation w.r.t. the tests'
intent, I don't plan to do it as a part of 8151707, and instead will create
follow up RFEs/tasks.

- unit-like tests which don't ignore flags, but weren't designed to be run
w/ external flags; most of the jfr tests can be used as an example: you can
run them w/ any flags, but they might fail as they assert things which
happen only in certain configurations, and these configurations are written
in the jtreg test descriptions. currently, these tests are marked w/ the
jfr k/w and it's advised not to run them w/ any external flags, yet I know
that some people successfully do that to test their configurations. given
the set of configurations which satisfies the needs of the jfr tests is
much bigger than the configurations listed in the tests, I kinda feel
sympathetic to people doing that; on the other hand, it's unsupported and
I'd prefer us to express (and enforce) that more clearly. again, given the
possible controversy and the need for a broader discussion, I'm planning to
file an issue for the jfr tests and follow up later w/ interested parties.

to sum up, 8151707's subtasks are going to mark *only* obvious and
non-controversial cases. for all other cases, the JBS entries are to be
filed and followed up on.

Cheers,
-- Igor

> On Jun 3, 2020, at 4:02 PM, David Holmes wrote:
>
> Hi Igor,
>
> On 4/06/2020 7:30 am, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00
>>> 70 lines changed: 66 ins; 0 del; 4 mod
>> Hi all,
>> could you please review the patch which introduces a new @requires
>> property to filter out the tests which ignore externally provided JVM
>> flags?
>> the idea behind this patch is to have a way to clearly mark tests which
>> ignore flags, so
>> a) it's obvious that they don't execute flag-guarded code/features, and
>> extra care should be taken when using them to verify any flag-guarded
>> change;
>> b) they can be easily excluded from runs w/ flags.
>
> So all such tests should be using driver mode, and further the VMs they
> then exec don't use any of the APIs that include the jtreg test
> arguments.
>
> Okay this seems reasonable in what it does.
> > Thanks, > David > >> @requires and VMProps allows us to achieve both, so it's been decided to add a new property `vm.flagless`. `vm.flagless` is set to false if there are any XX flags other than `-XX:MaxRAMPercentage` and `-XX:CreateCoredumpOnCrash` (which are known to be set almost always) or any X flags other `-Xmixed`; in other words any tests w/ `@requires vm.flagless` will be excluded from runs w/ any other X / XX flags passed via `-vmoption` / `-javaoption`. in rare cases, when one still wants to run the tests marked by `vm.flagless` w/ external flags, `vm.flagless` can be forcefully set to true by setting any value to `TEST_VM_FLAGLESS` env. variable. >> this patch adds necessary common changes and marks common tests, namely Scimark, GTestWrapper and TestNativeProcessBuilder. Component-specific tests will be marked separately by the corresponding subtasks of 8151707[1]. >> please note, the patch depends on CODETOOLS-7902336[2], which will be included in the next jtreg version, so this patch is to be integrated only after jtreg5.1 is promoted and we switch to use it by 8246387[3]. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8246494 >> webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 >> testing: marked tests w/ different XX and X flags w/ and w/o TEST_VM_FLAGLESS env. var, and w/o any flags >> [1] https://bugs.openjdk.java.net/browse/JDK-8151707 >> [2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336 >> [3] https://bugs.openjdk.java.net/browse/JDK-8246387 >> Thanks, >> -- Igor From serguei.spitsyn at oracle.com Thu Jun 4 02:07:20 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 3 Jun 2020 19:07:20 -0700 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: Hi Richard, The mach5 test run is good. Thanks, Serguei On 6/2/20 10:57, Reingruber, Richard wrote: > Hi Serguei, > >> This looks good to me. > Thanks! > > From an earlier mail: > >> I'm thinking it would be more safe to run full tier5. > I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would > like to push. > > Thanks, Richard. > > -----Original Message----- > From: serguei.spitsyn at oracle.com > Sent: Dienstag, 2. Juni 2020 18:55 > To: Vladimir Kozlov ; Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > This looks good to me. > > Thanks, > Serguei > > > On 5/28/20 09:02, Vladimir Kozlov wrote: >> Vladimir Ivanov is on break currently. >> It looks good to me. >> >> Thanks, >> Vladimir K >> >> On 5/26/20 7:31 AM, Reingruber, Richard wrote: >>> Hi Vladimir, >>> >>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>> Not an expert in JVMTI code base, so can't comment on the actual >>>> changes. >>>> ? From JIT-compilers perspective it looks good. 
>>> I put out webrev.1 a while ago [1]:
>>>
>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/
>>> Webrev(delta):
>>> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/
>>>
>>> You originally suggested to use a handshake to switch a thread into
>>> interpreter mode [2]. I'm using a direct handshake now, because I think
>>> it is the best fit.
>>>
>>> May I ask if webrev.1 still looks good to you from JIT-compilers
>>> perspective?
>>>
>>> Can I list you as (partial) Reviewer?
>>>
>>> Thanks, Richard.
>>>
>>> [1] http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html
>>> [2] http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html
>>>
>>> -----Original Message-----
>>> From: Vladimir Ivanov
>>> Sent: Freitag, 7. Februar 2020 09:19
>>> To: Reingruber, Richard ;
>>> serviceability-dev at openjdk.java.net;
>>> hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR(S) 8238585: Use handshake for
>>> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make
>>> compiled methods on stack not_entrant
>>>
>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/
>>>
>>> Not an expert in JVMTI code base, so can't comment on the actual
>>> changes.
>>>
>>> From JIT-compilers perspective it looks good.
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585
>>>>
>>>> The change avoids making all compiled methods on stack not_entrant
>>>> when switching a java thread to interpreter only execution for jvmti
>>>> purposes. It is sufficient to deoptimize the compiled frames on stack.
>>>>
>>>> Additionally a handshake is used instead of a vm operation to walk
>>>> the stack and do the deoptimizations.
>>>>
>>>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and
>>>> release builds on all platforms.
>>>>
>>>> Thanks, Richard.
>>>>
>>>> See also my question if anyone knows a reason for making the
>>>> compiled methods not_entrant:
>>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html

From thomas.schatzl at oracle.com Thu Jun 4 10:15:26 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 4 Jun 2020 12:15:26 +0200
Subject: RFR (XXS): 8246557: test_os_linux.cpp uses NULL instead of MAP_FAILED to check for failed mmap call
Message-ID: <16acf677-df46-3419-a851-d09bbd6f4e83@oracle.com>

Hi all,

  can I have reviews for this (trivial?) change that fixes wrong detection
logic for the mmap calls in the test_os_linux gtests?

Instead of

  ASSERT_TRUE(mapping != NULL) << " mmap failed, mapping_size = " << mapping_size;

it should use

  ASSERT_TRUE(mapping != MAP_FAILED) << " mmap failed, mapping_size = " << mapping_size;

since MAP_FAILED = (void*)-1.

All other uses of mmap seem to be okay.

CR:
https://bugs.openjdk.java.net/browse/JDK-8246557
Webrev:
http://cr.openjdk.java.net/~tschatzl/8246557/webrev/
Testing:
hs-tier1

Thanks,
  Thomas

From thomas.schatzl at oracle.com Thu Jun 4 10:48:01 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 4 Jun 2020 12:48:01 +0200
Subject: RFR (S) 8246274: G1 old gen allocation tracking is not in a separate class
In-Reply-To: 
References: <1D252CA6-AEF8-4E57-9B6C-37AACE9D7EC1@amazon.com>
Message-ID: 

Hi,

On 04.06.20 01:15, Luo, Ziyi wrote:
> Hi Thomas, thanks for your review.
A second revision is published: > > http://cr.openjdk.java.net/~phh/8246274/webrev.01 > (all hotspot:tier1 passed) > > On 6/3/20, 2:06 AM, Thomas Schatzl wrote: > >> Hi, [..] >> - I suggest to keep the existing public interface in G1Policy, i.e. >> the add_bytes_allocated_in_old_since_last_gc. Making the old gen tracker >> object public does not seem to be advantageous. >> I.e. imo it is good to group everything related to old gen allocation >> tracking into that helper class, but we do not need to expose that fact. > >> Maybe there is something in a follow up change that requires this? > > Yes, the follow up change will introduce two more interfaces in > G1OldGenAllocationTracker to track the regular and on-collection-pause > humongous allocations respectively in G1CollectedHeap. Okay, then let's wait for that. > >> - the _old_gen_alloc_tracker instance can be aggregated within the >> G1Policy class directly, i.e. there is no need to make it a pointer and >> manage via new and delete afaics. > >> Maybe there is something in a follow up change that requires this? > > You are right, fixed in rev.01 > >> - I would prefer if there were separate reset_after_[young_]gc() and a >> reset_after_full_gc() methods. Initially I asked myselves why for full >> gc that first parameter passed to reset_after_gc() is zero, and only >> when looking into the code found that it is ignored anyway. I think the >> API of that class isn't that huge yet. > > Make sense. I refactored it into two methods in rev.01: > * reset_after_full_gc() for full GC > * reset_after_incremental_gc() for young and mixed GC > I would prefer to use "reset_after_young_gc" for the latter - almost all existing code uses "young" already to indicate incremental gc (except the MBeans support code, but this is _very_ old code). The "correct" terms of the two types of young gc would be "young only" and "mixed" btw. Even "young only" has for a long time now also reclaimed some kind of old gen (humongous) regions. I do not expect this to go away in the future but the line blurring even more with potentially adding defragmentation work to young only to facilitate humongous allocation. And as an ultra-minor nit, please remove the "private" visibility specifier in the g1OldGenAllocationTracker class as its default. As for performance testing I will do that in conjunction with the changes to the actual IHOP calculations. Do you need a sponsor for this change or has Paul Hohensee offered to do that? Thanks, Thomas From stefan.karlsson at oracle.com Thu Jun 4 11:26:35 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 4 Jun 2020 13:26:35 +0200 Subject: RFR (XXS): 8246557: test_os_linux.cpp uses NULL instead of MAP_FAILED to check for failed mmap call In-Reply-To: <16acf677-df46-3419-a851-d09bbd6f4e83@oracle.com> References: <16acf677-df46-3419-a851-d09bbd6f4e83@oracle.com> Message-ID: Looks good. StefanK On 2020-06-04 12:15, Thomas Schatzl wrote: > Hi all, > > ? can I have reviews for this (trivial?) change that fixes wrong > detection logic for the mmap calls in the test_os_linux gtests? > > Instead of > > ?? ASSERT_TRUE(mapping != NULL) << " mmap failed, mapping_size = " << > mapping_size; > > it should use > > ?? ASSERT_TRUE(mapping != MAP_FAILED) << " mmap failed, mapping_size = > " << mapping_size; > > since MAP_FAILED = (void*)-1. > > All other uses of mmap seem to be okay. 
> > CR: > https://bugs.openjdk.java.net/browse/JDK-8246557 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8246557/webrev/ > Testing: > hs-tier1 > > Thanks, > ? Thomas From stefan.johansson at oracle.com Thu Jun 4 12:03:07 2020 From: stefan.johansson at oracle.com (stefan.johansson at oracle.com) Date: Thu, 4 Jun 2020 14:03:07 +0200 Subject: RFR (XXS): 8246557: test_os_linux.cpp uses NULL instead of MAP_FAILED to check for failed mmap call In-Reply-To: References: <16acf677-df46-3419-a851-d09bbd6f4e83@oracle.com> Message-ID: +1 On 2020-06-04 13:26, Stefan Karlsson wrote: > Looks good. > > StefanK > > On 2020-06-04 12:15, Thomas Schatzl wrote: >> Hi all, >> >> ? can I have reviews for this (trivial?) change that fixes wrong >> detection logic for the mmap calls in the test_os_linux gtests? >> >> Instead of >> >> ?? ASSERT_TRUE(mapping != NULL) << " mmap failed, mapping_size = " << >> mapping_size; >> >> it should use >> >> ?? ASSERT_TRUE(mapping != MAP_FAILED) << " mmap failed, mapping_size = >> " << mapping_size; >> >> since MAP_FAILED = (void*)-1. >> >> All other uses of mmap seem to be okay. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8246557 >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8246557/webrev/ >> Testing: >> hs-tier1 >> >> Thanks, >> ? Thomas > From thomas.schatzl at oracle.com Thu Jun 4 12:26:16 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 4 Jun 2020 14:26:16 +0200 Subject: RFR (XXS): 8246557: test_os_linux.cpp uses NULL instead of MAP_FAILED to check for failed mmap call In-Reply-To: References: <16acf677-df46-3419-a851-d09bbd6f4e83@oracle.com> Message-ID: Hi Stefan and Stefan, On 04.06.20 14:03, stefan.johansson at oracle.com wrote: > +1 > > On 2020-06-04 13:26, Stefan Karlsson wrote: >> Looks good. >> >> StefanK thanks for your reviews. Thomas From luoziyi at amazon.com Thu Jun 4 16:15:40 2020 From: luoziyi at amazon.com (Luo, Ziyi) Date: Thu, 4 Jun 2020 16:15:40 +0000 Subject: RFR (S) 8246274: G1 old gen allocation tracking is not in a separate class In-Reply-To: References: <1D252CA6-AEF8-4E57-9B6C-37AACE9D7EC1@amazon.com> Message-ID: <7A0D8D4A-E6CE-444C-863E-33E77A446B48@amazon.com> Hi Thomas, A third revision is here: http://cr.openjdk.java.net/~phh/8246274/webrev.02 On 6/4/20, 3:51 AM, Thomas Schatzl wrote: [..] >>> - I would prefer if there were separate reset_after_[young_]gc() and a >>> reset_after_full_gc() methods. Initially I asked myselves why for full >>> gc that first parameter passed to reset_after_gc() is zero, and only >>> when looking into the code found that it is ignored anyway. I think the >>> API of that class isn't that huge yet. >> >> Make sense. I refactored it into two methods in rev.01: >> * reset_after_full_gc() for full GC >> * reset_after_incremental_gc() for young and mixed GC >> > I would prefer to use "reset_after_young_gc" for the latter - almost all > existing code uses "young" already to indicate incremental gc (except > the MBeans support code, but this is _very_ old code). > > The "correct" terms of the two types of young gc would be "young only" > and "mixed" btw. Even "young only" has for a long time now also > reclaimed some kind of old gen (humongous) regions. > I do not expect this to go away in the future but the line blurring even > more with potentially adding defragmentation work to young only to > facilitate humongous allocation. Thanks for the explanation. Fixed in rev.02. 
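For readers following the API discussion: the class under review reduces to a small byte counter with two reset flavors. Below is a sketch using the method names agreed in this thread (reset_after_young_gc(), reset_after_full_gc()); the field and accessor names are stand-ins and the actual webrev may differ in detail:

  #include <cstddef>

  // Sketch of the G1OldGenAllocationTracker shape discussed in this review.
  // Only the two reset method names are taken from the thread; the fields
  // are assumptions for illustration.
  class G1OldGenAllocationTracker {
    size_t _old_bytes_since_last_gc;  // fed by G1Policy's existing
                                      // add_bytes_allocated_in_old_since_last_gc()
    size_t _last_period_old_bytes;    // what the IHOP update gets to see

  public:
    G1OldGenAllocationTracker()
      : _old_bytes_since_last_gc(0), _last_period_old_bytes(0) {}

    void add_allocated_bytes(size_t bytes) {
      _old_bytes_since_last_gc += bytes;
    }

    size_t last_period_old_bytes() const { return _last_period_old_bytes; }

    // Young (incremental) GC: publish the accumulated figure (e.g. for
    // G1IHOPControl::update_allocation_info()) and start a new period.
    void reset_after_young_gc() {
      _last_period_old_bytes = _old_bytes_since_last_gc;
      _old_bytes_since_last_gc = 0;
    }

    // Full GC: the heap is compacted, so nothing is published -- this
    // mirrors the "first parameter is zero and ignored" observation that
    // motivated splitting the two methods.
    void reset_after_full_gc() {
      _last_period_old_bytes = 0;
      _old_bytes_since_last_gc = 0;
    }
  };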
> And as an ultra-minor nit, please remove the "private" visibility > specifier in the g1OldGenAllocationTracker class as its default. Done > As for performance testing I will do that in conjunction with the > changes to the actual IHOP calculations. Yep. Besides, I'll add a new gtest for g1OldGenAllocationTracker in the next change. > Do you need a sponsor for this change or has Paul Hohensee offered to do > that? Thanks, Paul will sponsor me. Best, Ziyi From ioi.lam at oracle.com Thu Jun 4 16:22:37 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Thu, 4 Jun 2020 09:22:37 -0700 Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC In-Reply-To: <863b1b7d-bb1a-ae3c-2990-eca26803788a@oracle.com> References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com> <80ada90e-598b-c7ca-3782-0a779e195291@oracle.com> <7157bbee-03e7-9fc7-47e0-a25d98970875@oracle.com> <863b1b7d-bb1a-ae3c-2990-eca26803788a@oracle.com> Message-ID: CC-ing hotspot-gc-dev as well On 6/3/20 10:53 PM, Ioi Lam wrote: > >> On 5/29/20 9:40 PM, Yumin Qi wrote: >>> HI, Ioi >>> >>> ? If the allocation of EDEN happens between GC and dump, should we >>> put the GC action in VM_PopulateDumpSharedSpace? This way, at >>> safepoint there should no allocation happens. The stack trace showed >>> it happened with a Java Thread, which should be blocked at safepoint. >>> >> >> Hi Yumin, >> >> I think GCs cannot be executed inside a safepoint, because some parts >> of GC need to execute in a safepoint, so they will be blocked until >> VM_PopulateDumpSharedSpace::doit has returned. >> >> Anyway, as I mentioned in my reply to Jiangli, there's a better way >> to fix this, so I will withdraw the current patch. >> > > Hi Yumin, > > Actually, I changed my mind again, and implemented your suggestion :-) > > There's actually a way to invoke GC inside a safepoint (it's used by > "jcmd gc.heap_dump", for example). So I changed the CDS code to do the > same thing. It's a much simpler change and does what I want -- no > other thread will be able to make any heap allocation after the GC has > completed, so no EDEN region will be allocated: > > http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v02/ > > The check for "if (GCLocker::is_active())" should almost always be > false. I left it there just for safety: > > During -Xshare:dump, we execute Java code to build the module graph, > load classes, etc. So theoretically someone could try to parallelize > some of that Java code in the future. Theoretically when CDS has > entered the safepoint, another thread could be in the middle a JNI > method that has held the GCLock. > > Thanks > - Ioi > >> >>> >>> On 5/29/20 7:29 PM, Ioi Lam wrote: >>>> https://bugs.openjdk.java.net/browse/JDK-8245925 >>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/ >>>> >>>> >>>> >>>> Summary: >>>> >>>> CDS supports archived heap objects only for G1. During -Xshare:dump, >>>> CDS executes a full GC so that G1 will compact the heap regions, >>>> leaving >>>> maximum contiguous free space at the top of the heap. Then, the >>>> archived >>>> heap regions are allocated from the top of the heap. >>>> >>>> Under some circumstances, java.lang.ref.Cleaners will execute >>>> after the GC has completed. The cleaners may allocate or >>>> synchronized, which >>>> will cause G1 to allocate an EDEN region at the top of the heap. >>>> >>>> The fix is simple -- after CDS has entered a safepoint, if EDEN >>>> regions exist, >>>> exit the safepoint, run GC, and try again. 
Eventually all the >>>> cleaners will >>>> be executed and no more allocation can happen. >>>> >>>> For safety, I limit the retry count to 30 (or about total 9 seconds). >>>> >>>> Thanks >>>> - Ioi >>>> >>>> >>>> >>> >> > From thomas.schatzl at oracle.com Thu Jun 4 17:05:35 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 4 Jun 2020 19:05:35 +0200 Subject: RFR (S) 8246274: G1 old gen allocation tracking is not in a separate class In-Reply-To: <7A0D8D4A-E6CE-444C-863E-33E77A446B48@amazon.com> References: <1D252CA6-AEF8-4E57-9B6C-37AACE9D7EC1@amazon.com> <7A0D8D4A-E6CE-444C-863E-33E77A446B48@amazon.com> Message-ID: Hi, lgtm. Thanks. Thomas On 04.06.20 18:15, Luo, Ziyi wrote: > Hi Thomas, > > A third revision is here: > http://cr.openjdk.java.net/~phh/8246274/webrev.02 > > On 6/4/20, 3:51 AM, Thomas Schatzl wrote: > > [..] > >>>> - I would prefer if there were separate reset_after_[young_]gc() and a >>>> reset_after_full_gc() methods. Initially I asked myselves why for full >>>> gc that first parameter passed to reset_after_gc() is zero, and only >>>> when looking into the code found that it is ignored anyway. I think the >>>> API of that class isn't that huge yet. >>> >>> Make sense. I refactored it into two methods in rev.01: >>> * reset_after_full_gc() for full GC >>> * reset_after_incremental_gc() for young and mixed GC >>> > >> I would prefer to use "reset_after_young_gc" for the latter - almost all >> existing code uses "young" already to indicate incremental gc (except >> the MBeans support code, but this is _very_ old code). >> >> The "correct" terms of the two types of young gc would be "young only" >> and "mixed" btw. Even "young only" has for a long time now also >> reclaimed some kind of old gen (humongous) regions. >> I do not expect this to go away in the future but the line blurring even >> more with potentially adding defragmentation work to young only to >> facilitate humongous allocation. > > Thanks for the explanation. Fixed in rev.02. > >> And as an ultra-minor nit, please remove the "private" visibility >> specifier in the g1OldGenAllocationTracker class as its default. > > Done > >> As for performance testing I will do that in conjunction with the >> changes to the actual IHOP calculations. > > Yep. Besides, I'll add a new gtest for g1OldGenAllocationTracker in the next > change. > >> Do you need a sponsor for this change or has Paul Hohensee offered to do >> that? > > Thanks, Paul will sponsor me. > > Best, > Ziyi > From zgu at redhat.com Thu Jun 4 18:18:30 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 4 Jun 2020 14:18:30 -0400 Subject: [15] 8246612: Shenandoah: add timing tracking to ShenandoahStringDedupRoots Message-ID: <719c5c95-d3fa-7808-4360-588cd1af3b86@redhat.com> Please review this small patch that adds worker timing for ShenandoahStringDedupRoots. 
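For context, per-worker phase timing like this is usually implemented as a scoped tracker: construction samples a clock, destruction attributes the elapsed time to a (phase, worker) slot. Everything below is a generic, self-contained stand-in with hypothetical names -- Shenandoah's real helpers are the ShenandoahPhaseTimings machinery referenced in the webrev:

  #include <chrono>
  #include <cstdio>

  // Generic sketch of a scoped worker-timing tracker (hypothetical names).
  // The destructor runs when the root-processing scope ends, so the time
  // spent in the scope is charged to the given phase and worker.
  class WorkerTimingsTracker {
    int _phase;
    unsigned _worker_id;
    std::chrono::steady_clock::time_point _start;

  public:
    WorkerTimingsTracker(int phase, unsigned worker_id)
      : _phase(phase), _worker_id(worker_id),
        _start(std::chrono::steady_clock::now()) {}

    ~WorkerTimingsTracker() {
      auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
          std::chrono::steady_clock::now() - _start).count();
      std::printf("phase %d, worker %u: %lld ns\n",
                  _phase, _worker_id, (long long)ns);
    }
  };

  // Usage shape: time the string dedup root scan done by one worker.
  void string_dedup_roots_do(unsigned worker_id) {
    WorkerTimingsTracker t(/* StringDedupRoots */ 7, worker_id);
    // ... walk the dedup table/queue roots here ...
  }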
Bug: https://bugs.openjdk.java.net/browse/JDK-8246612
Webrev: http://cr.openjdk.java.net/~zgu/JDK-8246612/weberv.00/index.html

Test: hotspot_gc_shenandoah

Thanks,

-Zhengyu

From shade at redhat.com Thu Jun 4 18:20:43 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 4 Jun 2020 20:20:43 +0200
Subject: [15] 8246612: Shenandoah: add timing tracking to ShenandoahStringDedupRoots
In-Reply-To: <719c5c95-d3fa-7808-4360-588cd1af3b86@redhat.com>
References: <719c5c95-d3fa-7808-4360-588cd1af3b86@redhat.com>
Message-ID: <87b68609-8405-c88e-cf3a-dc8ab07f5640@redhat.com>

On 6/4/20 8:18 PM, Zhengyu Gu wrote:
> Please review this small patch that adds worker timing for
> ShenandoahStringDedupRoots.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8246612
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8246612/weberv.00/index.html

*) Please use the argument name here, "phase", to match other uses:
 212 ShenandoahConcurrentStringDedupRoots(ShenandoahPhaseTimings::Phase);

Otherwise looks good!

--
Thanks,
-Aleksey

From ioi.lam at oracle.com Thu Jun 4 18:36:54 2020
From: ioi.lam at oracle.com (Ioi Lam)
Date: Thu, 4 Jun 2020 11:36:54 -0700
Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC
In-Reply-To: 
References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com>
 <80ada90e-598b-c7ca-3782-0a779e195291@oracle.com>
 <7157bbee-03e7-9fc7-47e0-a25d98970875@oracle.com>
 <863b1b7d-bb1a-ae3c-2990-eca26803788a@oracle.com>
Message-ID: 

On 6/4/20 11:02 AM, Yumin Qi wrote:
> Hi, Ioi
>
>   Glad you made it simple.
>
> metaspaceShared.cpp:
>
> 1612 void VM_PopulateDumpSharedSpace::doit() {
> 1613   if (HeapShared::is_heap_object_archiving_allowed()) {
> 1614     // Avoid fragmentation while archiving heap objects.
> 1615     if (GCLocker::is_active()) {
> 1616       // This should rarely happen during -Xshare:dump, if at all, but just to be safe.
> 1617       log_debug(cds)("GCLocker::is_active() ... try again");
> 1618       return;
> 1619     }
> 1620
> 1621     log_debug(cds)("Run GC ...");
> 1622     Universe::heap()->collect_as_vm_thread(GCCause::_archive_time_gc);
> 1623     log_debug(cds)("Run GC done");
> 1624   }
>
> I think the check for GCLocker should be after the collection; the G1
> full collection will check for an active GCLocker:
>
> bool G1CollectedHeap::do_full_collection(bool explicit_gc, bool clear_all_soft_refs) {
>   assert_at_safepoint_on_vm_thread();
>   if (GCLocker::check_active_before_gc()) {
>     // Full GC was not completed.
>     return false;
>   }
>
> It should be very rare during dumping that the full collection aborts
> prematurely, but putting the check after the collection seems safer.

I copied the code from here (src/hotspot/share/services/heapDumper.cpp)

void VM_HeapDumper::doit() {
  CollectedHeap* ch = Universe::heap();

  ch->ensure_parsability(false); // must happen, even if collection does
                                 // not happen (e.g. due to GCLocker)

  if (_gc_before_heap_dump) {
    if (GCLocker::is_active()) {
      warning("GC locker is held; pre-heapdump GC was skipped");
    } else {
      ch->collect_as_vm_thread(GCCause::_heap_dump);
    }
  }

Could someone on the GC team comment whether this is the correct way to do
it? (I should probably add ensure_parsability() to my code as well?)

> For test GCDuringDump.java:
>
> 64 String extraArg = (i == 0) ? "-showversion" : "-javaagent:" + agentJar;
> 65 String extraOption = (i == 0) ? "-showversion" : "-XX:+AllowArchivingWithJavaAgent";
> 66 String extraOption2 = (i != 2) ? "-showversion" : "-Dtest.with.cleaner=true";
>
> Is that correct for line 65?
for i = 0, both extraArg and extraOption > will be same, looks a bug. "-showversion" is just a way to pass a "no-op" to TestCommon.testDump. It's a harmless option and can be repeated several times. Thanks - Ioi > > Others looks OK to me. Thanks Yumin > > On 6/3/20 10:53 PM, Ioi Lam wrote: >> >>> On 5/29/20 9:40 PM, Yumin Qi wrote: >>>> HI, Ioi >>>> >>>> ? If the allocation of EDEN happens between GC and dump, should we >>>> put the GC action in VM_PopulateDumpSharedSpace? This way, at >>>> safepoint there should no allocation happens. The stack trace >>>> showed it happened with a Java Thread, which should be blocked at >>>> safepoint. >>>> >>> >>> Hi Yumin, >>> >>> I think GCs cannot be executed inside a safepoint, because some >>> parts of GC need to execute in a safepoint, so they will be blocked >>> until VM_PopulateDumpSharedSpace::doit has returned. >>> >>> Anyway, as I mentioned in my reply to Jiangli, there's a better way >>> to fix this, so I will withdraw the current patch. >>> >> >> Hi Yumin, >> >> Actually, I changed my mind again, and implemented your suggestion :-) >> >> There's actually a way to invoke GC inside a safepoint (it's used by >> "jcmd gc.heap_dump", for example). So I changed the CDS code to do >> the same thing. It's a much simpler change and does what I want -- no >> other thread will be able to make any heap allocation after the GC >> has completed, so no EDEN region will be allocated: >> >> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v02/ >> >> >> The check for "if (GCLocker::is_active())" should almost always be >> false. I left it there just for safety: >> >> During -Xshare:dump, we execute Java code to build the module graph, >> load classes, etc. So theoretically someone could try to parallelize >> some of that Java code in the future. Theoretically when CDS has >> entered the safepoint, another thread could be in the middle a JNI >> method that has held the GCLock. >> >> Thanks >> - Ioi >> >>> >>>> >>>> On 5/29/20 7:29 PM, Ioi Lam wrote: >>>>> https://bugs.openjdk.java.net/browse/JDK-8245925 >>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/ >>>>> >>>>> >>>>> >>>>> Summary: >>>>> >>>>> CDS supports archived heap objects only for G1. During -Xshare:dump, >>>>> CDS executes a full GC so that G1 will compact the heap regions, >>>>> leaving >>>>> maximum contiguous free space at the top of the heap. Then, the >>>>> archived >>>>> heap regions are allocated from the top of the heap. >>>>> >>>>> Under some circumstances, java.lang.ref.Cleaners will execute >>>>> after the GC has completed. The cleaners may allocate or >>>>> synchronized, which >>>>> will cause G1 to allocate an EDEN region at the top of the heap. >>>>> >>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN >>>>> regions exist, >>>>> exit the safepoint, run GC, and try again. Eventually all the >>>>> cleaners will >>>>> be executed and no more allocation can happen. >>>>> >>>>> For safety, I limit the retry count to 30 (or about total 9 seconds). 
>>>>> >>>>> Thanks >>>>> - Ioi >>>>> >>>>> >>>>> >>>> >>> >> > From per.liden at oracle.com Thu Jun 4 18:42:16 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 4 Jun 2020 20:42:16 +0200 Subject: RFR: 8246622: Remove CollectedHeap::print_gc_threads_on() Message-ID: <51d8f47e-ac3b-8660-9df2-f98780a6eaa7@oracle.com> Instead of having all GCs implement CollectedHeap::print_gc_threads_on(), we can just let the single caller (Threads::print_on) provide a closure and use CollectedHeap::gc_threads_do(). That will better match what Threads::print_on_error() is already doing, and remove repetitive code in the GCs. Bug: https://bugs.openjdk.java.net/browse/JDK-8246622 Webrev: http://cr.openjdk.java.net/~pliden/8246622/webrev.0 /Per From stefan.karlsson at oracle.com Thu Jun 4 18:55:53 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 4 Jun 2020 20:55:53 +0200 Subject: RFR: 8246622: Remove CollectedHeap::print_gc_threads_on() In-Reply-To: <51d8f47e-ac3b-8660-9df2-f98780a6eaa7@oracle.com> References: <51d8f47e-ac3b-8660-9df2-f98780a6eaa7@oracle.com> Message-ID: Looks good. Did you fix a Shenandoah bug with this change? StefanK On 2020-06-04 20:42, Per Liden wrote: > Instead of having all GCs implement > CollectedHeap::print_gc_threads_on(), we can just let the single > caller (Threads::print_on) provide a closure and use > CollectedHeap::gc_threads_do(). That will better match what > Threads::print_on_error() is already doing, and remove repetitive code > in the GCs. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8246622 > Webrev: http://cr.openjdk.java.net/~pliden/8246622/webrev.0 > > /Per From per.liden at oracle.com Thu Jun 4 19:02:15 2020 From: per.liden at oracle.com (Per Liden) Date: Thu, 4 Jun 2020 21:02:15 +0200 Subject: RFR: 8246622: Remove CollectedHeap::print_gc_threads_on() In-Reply-To: References: <51d8f47e-ac3b-8660-9df2-f98780a6eaa7@oracle.com> Message-ID: On 6/4/20 8:55 PM, Stefan Karlsson wrote: > Looks good. Thanks! > > Did you fix a Shenandoah bug with this change? Yep cheers, Per > > StefanK > > On 2020-06-04 20:42, Per Liden wrote: >> Instead of having all GCs implement >> CollectedHeap::print_gc_threads_on(), we can just let the single >> caller (Threads::print_on) provide a closure and use >> CollectedHeap::gc_threads_do(). That will better match what >> Threads::print_on_error() is already doing, and remove repetitive code >> in the GCs. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8246622 >> Webrev: http://cr.openjdk.java.net/~pliden/8246622/webrev.0 >> >> /Per > From zgu at redhat.com Thu Jun 4 19:02:37 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 4 Jun 2020 15:02:37 -0400 Subject: [15] 8246612: Shenandoah: add timing tracking to ShenandoahStringDedupRoots In-Reply-To: <87b68609-8405-c88e-cf3a-dc8ab07f5640@redhat.com> References: <719c5c95-d3fa-7808-4360-588cd1af3b86@redhat.com> <87b68609-8405-c88e-cf3a-dc8ab07f5640@redhat.com> Message-ID: On 6/4/20 2:20 PM, Aleksey Shipilev wrote: > On 6/4/20 8:18 PM, Zhengyu Gu wrote: >> Please review this small patch that adds worker timing for >> ShenandoahStringDedupRoots. >> >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8246612 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8246612/weberv.00/index.html > > *) Please use the argument name here, "phase", to match other uses: > 212 ShenandoahConcurrentStringDedupRoots(ShenandoahPhaseTimings::Phase); Fixed and pushed. Thanks, -Zhengyu > > Otherwise looks good! 
> From hohensee at amazon.com Thu Jun 4 19:06:57 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Thu, 4 Jun 2020 19:06:57 +0000 Subject: RFR (S) 8246274: G1 old gen allocation tracking is not in a separate class Message-ID: <2023BBC2-9401-4D1D-8808-856822C65F4E@amazon.com> Thanks, Thomas. Looks good to me as well. I've submitted a submit repo run just to be sure, and will push when it successfully completes. Paul ?On 6/4/20, 10:09 AM, "hotspot-gc-dev on behalf of Thomas Schatzl" wrote: Hi, lgtm. Thanks. Thomas On 04.06.20 18:15, Luo, Ziyi wrote: > Hi Thomas, > > A third revision is here: > http://cr.openjdk.java.net/~phh/8246274/webrev.02 > > On 6/4/20, 3:51 AM, Thomas Schatzl wrote: > > [..] > >>>> - I would prefer if there were separate reset_after_[young_]gc() and a >>>> reset_after_full_gc() methods. Initially I asked myselves why for full >>>> gc that first parameter passed to reset_after_gc() is zero, and only >>>> when looking into the code found that it is ignored anyway. I think the >>>> API of that class isn't that huge yet. >>> >>> Make sense. I refactored it into two methods in rev.01: >>> * reset_after_full_gc() for full GC >>> * reset_after_incremental_gc() for young and mixed GC >>> > >> I would prefer to use "reset_after_young_gc" for the latter - almost all >> existing code uses "young" already to indicate incremental gc (except >> the MBeans support code, but this is _very_ old code). >> >> The "correct" terms of the two types of young gc would be "young only" >> and "mixed" btw. Even "young only" has for a long time now also >> reclaimed some kind of old gen (humongous) regions. >> I do not expect this to go away in the future but the line blurring even >> more with potentially adding defragmentation work to young only to >> facilitate humongous allocation. > > Thanks for the explanation. Fixed in rev.02. > >> And as an ultra-minor nit, please remove the "private" visibility >> specifier in the g1OldGenAllocationTracker class as its default. > > Done > >> As for performance testing I will do that in conjunction with the >> changes to the actual IHOP calculations. > > Yep. Besides, I'll add a new gtest for g1OldGenAllocationTracker in the next > change. > >> Do you need a sponsor for this change or has Paul Hohensee offered to do >> that? > > Thanks, Paul will sponsor me. > > Best, > Ziyi > From zgu at redhat.com Thu Jun 4 20:08:10 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 4 Jun 2020 16:08:10 -0400 Subject: [15] RFR 8246593: Shenandoah: string dedup roots should be processed during concurrent weak roots phase Message-ID: <181dc5ec-64b9-32d3-3c86-3e62e9aa6f02@redhat.com> String dedup roots are weak roots, they are mistakenly placed as strong roots during weak/strong roots split (JDK-8242643). This patch moves string dedup roots from concurrent strong roots to concurrent weak roots. 
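The distinction matters because a weak root must not keep an otherwise-dead object alive: weak roots are visited with an is-alive filter and cleared when their referent did not survive marking, while strong roots unconditionally keep their referents alive. A minimal sketch of that contract, with hypothetical stand-in types rather than the actual Shenandoah closures:

  #include <functional>
  #include <vector>

  using oop = void*;  // stand-in for a heap reference

  // Weak-root processing: each slot is either updated (referent survived,
  // possibly moved) or cleared (referent is dead). If the dedup table were
  // processed as a strong root set instead, dead strings would be retained.
  void process_weak_roots(std::vector<oop>& roots,
                          const std::function<bool(oop)>& is_alive,
                          const std::function<oop(oop)>& forwardee) {
    for (oop& slot : roots) {
      if (slot == nullptr) continue;
      if (is_alive(slot)) {
        slot = forwardee(slot);  // keep, updating to the new location
      } else {
        slot = nullptr;          // drop the entry for the dead object
      }
    }
  }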
Bug: https://bugs.openjdk.java.net/browse/JDK-8246593 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8246593/webrev.00/ Test: hotspot_gc_shenandoah with -XX:+UseStringDeduplication Thanks, -Zhengyu From shade at redhat.com Thu Jun 4 21:15:44 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 4 Jun 2020 23:15:44 +0200 Subject: [15] RFR 8246593: Shenandoah: string dedup roots should be processed during concurrent weak roots phase In-Reply-To: <181dc5ec-64b9-32d3-3c86-3e62e9aa6f02@redhat.com> References: <181dc5ec-64b9-32d3-3c86-3e62e9aa6f02@redhat.com> Message-ID: On 6/4/20 10:08 PM, Zhengyu Gu wrote: > String dedup roots are weak roots, they are mistakenly placed as strong > roots during weak/strong roots split (JDK-8242643). > > This patch moves string dedup roots from concurrent strong roots to > concurrent weak roots. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8246593 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8246593/webrev.00/ OK, good! -- -Aleksey From ioi.lam at oracle.com Thu Jun 4 23:54:26 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Thu, 4 Jun 2020 16:54:26 -0700 Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC In-Reply-To: References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com> <80ada90e-598b-c7ca-3782-0a779e195291@oracle.com> <7157bbee-03e7-9fc7-47e0-a25d98970875@oracle.com> <863b1b7d-bb1a-ae3c-2990-eca26803788a@oracle.com> Message-ID: <81cd3cec-e39d-8a16-6e07-2ff0b0b80dc8@oracle.com> On 6/4/20 12:04 PM, Jiangli Zhou wrote: > On Wed, Jun 3, 2020 at 10:56 PM Ioi Lam wrote: >> >>> On 5/29/20 9:40 PM, Yumin Qi wrote: >>>> HI, Ioi >>>> >>>> If the allocation of EDEN happens between GC and dump, should we >>>> put the GC action in VM_PopulateDumpSharedSpace? This way, at >>>> safepoint there should no allocation happens. The stack trace showed >>>> it happened with a Java Thread, which should be blocked at safepoint. >>>> >>> Hi Yumin, >>> >>> I think GCs cannot be executed inside a safepoint, because some parts >>> of GC need to execute in a safepoint, so they will be blocked until >>> VM_PopulateDumpSharedSpace::doit has returned. >>> >>> Anyway, as I mentioned in my reply to Jiangli, there's a better way to >>> fix this, so I will withdraw the current patch. >>> >> Hi Yumin, >> >> Actually, I changed my mind again, and implemented your suggestion :-) >> >> There's actually a way to invoke GC inside a safepoint (it's used by >> "jcmd gc.heap_dump", for example). So I changed the CDS code to do the >> same thing. It's a much simpler change and does what I want -- no other >> thread will be able to make any heap allocation after the GC has >> completed, so no EDEN region will be allocated: >> >> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v02/ > Going with the simple approach for the short term sounds ok. > > 1612 void VM_PopulateDumpSharedSpace::doit() { > ... > 1615 if (GCLocker::is_active()) { > 1616 // This should rarely happen during -Xshare:dump, if at > all, but just to be safe. > 1617 log_debug(cds)("GCLocker::is_active() ... try again"); > 1618 return; > 1619 } > > MetaspaceShared::preload_and_dump() > ... > 1945 while (true) { > 1946 { > 1947 MutexLocker ml(THREAD, Heap_lock); > 1948 VMThread::execute(&op); > 1949 } > 1950 // If dumping has finished, the VM would have exited. The > only reason to > 1951 // come back here is to wait for the GCLocker. 
> 1952 assert(HeapShared::is_heap_object_archiving_allowed(), "sanity");
> 1953 os::naked_short_sleep(1);
> 1954 }
> 1955 }
>
>
> Instead of doing the while/retry, calling
> GCLocker::stall_until_clear() in MetaspaceShared::preload_and_dump
> before VM_PopulateDumpSharedSpace probably is much cleaner?

Hi Jiangli, I tried your suggestion, but GCLocker::stall_until_clear()
cannot be called in the VM thread:

#  assert(thread->is_Java_thread()) failed: just checking

> Please also add a comment in MetaspaceShared::preload_and_dump to
> explain that Universe::heap()->collect_as_vm_thread expects that
> Heap_lock is already held by the thread, and that's the reason for the
> call to MutexLocker ml(THREAD, Heap_lock).

I changed the code to be like this:

    MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ?
                   Heap_lock : NULL);     // needed for collect_as_vm_thread

> Do you also need to call
> Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true)
> before GC?

There's no need:

void CollectedHeap::collect_as_vm_thread(GCCause::Cause cause) {
  ...
  case GCCause::_archive_time_gc:
  case GCCause::_metadata_GC_clear_soft_refs: {
    HandleMark hm;
    do_full_collection(true);         // do clear all soft refs

Thanks
- Ioi

> Best,
> Jiangli
>
>> The check for "if (GCLocker::is_active())" should almost always be
>> false. I left it there just for safety:
>>
>> During -Xshare:dump, we execute Java code to build the module graph,
>> load classes, etc. So theoretically someone could try to parallelize
>> some of that Java code in the future. Theoretically when CDS has entered
>> the safepoint, another thread could be in the middle a JNI method that
>> has held the GCLock.
>>
>> Thanks
>> - Ioi
>>
>>>> On 5/29/20 7:29 PM, Ioi Lam wrote:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8245925
>>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/
>>>>>
>>>>> Summary:
>>>>>
>>>>> CDS supports archived heap objects only for G1. During -Xshare:dump,
>>>>> CDS executes a full GC so that G1 will compact the heap regions,
>>>>> leaving maximum contiguous free space at the top of the heap. Then,
>>>>> the archived heap regions are allocated from the top of the heap.
>>>>>
>>>>> Under some circumstances, java.lang.ref.Cleaners will execute
>>>>> after the GC has completed. The cleaners may allocate or synchronized,
>>>>> which will cause G1 to allocate an EDEN region at the top of the heap.
>>>>>
>>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN
>>>>> regions exist, exit the safepoint, run GC, and try again. Eventually
>>>>> all the cleaners will be executed and no more allocation can happen.
>>>>>
>>>>> For safety, I limit the retry count to 30 (or about total 9 seconds).
>>>>>
>>>>> Thanks
>>>>> - Ioi

From jianglizhou at google.com Fri Jun 5 00:23:51 2020
From: jianglizhou at google.com (Jiangli Zhou)
Date: Thu, 4 Jun 2020 17:23:51 -0700
Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC
In-Reply-To: <81cd3cec-e39d-8a16-6e07-2ff0b0b80dc8@oracle.com>
References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com>
 <80ada90e-598b-c7ca-3782-0a779e195291@oracle.com>
 <7157bbee-03e7-9fc7-47e0-a25d98970875@oracle.com>
 <863b1b7d-bb1a-ae3c-2990-eca26803788a@oracle.com>
 <81cd3cec-e39d-8a16-6e07-2ff0b0b80dc8@oracle.com>
Message-ID: 

Ioi, do you have a new webrev?
Best, Jiangli On Thu, Jun 4, 2020 at 4:54 PM Ioi Lam wrote: > > > > On 6/4/20 12:04 PM, Jiangli Zhou wrote: > > On Wed, Jun 3, 2020 at 10:56 PM Ioi Lam wrote: > >> > >>> On 5/29/20 9:40 PM, Yumin Qi wrote: > >>>> HI, Ioi > >>>> > >>>> If the allocation of EDEN happens between GC and dump, should we > >>>> put the GC action in VM_PopulateDumpSharedSpace? This way, at > >>>> safepoint there should no allocation happens. The stack trace showed > >>>> it happened with a Java Thread, which should be blocked at safepoint. > >>>> > >>> Hi Yumin, > >>> > >>> I think GCs cannot be executed inside a safepoint, because some parts > >>> of GC need to execute in a safepoint, so they will be blocked until > >>> VM_PopulateDumpSharedSpace::doit has returned. > >>> > >>> Anyway, as I mentioned in my reply to Jiangli, there's a better way to > >>> fix this, so I will withdraw the current patch. > >>> > >> Hi Yumin, > >> > >> Actually, I changed my mind again, and implemented your suggestion :-) > >> > >> There's actually a way to invoke GC inside a safepoint (it's used by > >> "jcmd gc.heap_dump", for example). So I changed the CDS code to do the > >> same thing. It's a much simpler change and does what I want -- no other > >> thread will be able to make any heap allocation after the GC has > >> completed, so no EDEN region will be allocated: > >> > >> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v02/ > > Going with the simple approach for the short term sounds ok. > > > > 1612 void VM_PopulateDumpSharedSpace::doit() { > > ... > > 1615 if (GCLocker::is_active()) { > > 1616 // This should rarely happen during -Xshare:dump, if at > > all, but just to be safe. > > 1617 log_debug(cds)("GCLocker::is_active() ... try again"); > > 1618 return; > > 1619 } > > > > MetaspaceShared::preload_and_dump() > > ... > > 1945 while (true) { > > 1946 { > > 1947 MutexLocker ml(THREAD, Heap_lock); > > 1948 VMThread::execute(&op); > > 1949 } > > 1950 // If dumping has finished, the VM would have exited. The > > only reason to > > 1951 // come back here is to wait for the GCLocker. > > 1952 assert(HeapShared::is_heap_object_archiving_allowed(), "sanity"); > > 1953 os::naked_short_sleep(1); > > 1954 } > > 1955 } > > > > > > Instead of doing the while/retry, calling > > GCLocker::stall_until_clear() in MetaspaceShared::preload_and_dump > > before VM_PopulateDumpSharedSpace probably is much cleaner? > > Hi Jiangli, I tried your suggestion, but GCLocker::stall_until_clear() > cannot be called in the VM thread: > > # assert(thread->is_Java_thread()) failed: just checking > > > Please also add a comment in MetaspaceShared::preload_and_dump to > > explain that Universe::heap()->collect_as_vm_thread expects that > > Heap_lock is already held by the thread, and that's the reason for the > > call to MutexLocker ml(THREAD, Heap_lock). > > I changed the code to be like this: > > MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ? > Heap_lock : NULL); // needed for > collect_as_vm_thread > > > Do you also need to call > > Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true) > > before GC? > > There's no need: > > void CollectedHeap::collect_as_vm_thread(GCCause::Cause cause) { > ... > case GCCause::_archive_time_gc: > case GCCause::_metadata_GC_clear_soft_refs: { > HandleMark hm; > do_full_collection(true); // do clear all soft refs > > Thanks > - Ioi > > > Best, > > Jiangli > > > >> The check for "if (GCLocker::is_active())" should almost always be > >> false. 
I left it there just for safety:
> >>
> >> During -Xshare:dump, we execute Java code to build the module graph,
> >> load classes, etc. So theoretically someone could try to parallelize
> >> some of that Java code in the future. Theoretically when CDS has entered
> >> the safepoint, another thread could be in the middle of a JNI method that
> >> has held the GCLock.
> >>
> >> Thanks
> >> - Ioi
> >>
> >>>> On 5/29/20 7:29 PM, Ioi Lam wrote:
> >>>>> https://bugs.openjdk.java.net/browse/JDK-8245925
> >>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/
> >>>>>
> >>>>> Summary:
> >>>>>
> >>>>> CDS supports archived heap objects only for G1. During -Xshare:dump,
> >>>>> CDS executes a full GC so that G1 will compact the heap regions,
> >>>>> leaving maximum contiguous free space at the top of the heap. Then,
> >>>>> the archived heap regions are allocated from the top of the heap.
> >>>>>
> >>>>> Under some circumstances, java.lang.ref.Cleaners will execute
> >>>>> after the GC has completed. The cleaners may allocate or synchronize,
> >>>>> which will cause G1 to allocate an EDEN region at the top of the heap.
> >>>>>
> >>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN
> >>>>> regions exist, exit the safepoint, run GC, and try again. Eventually
> >>>>> all the cleaners will be executed and no more allocation can happen.
> >>>>>
> >>>>> For safety, I limit the retry count to 30 (or about 9 seconds total).
> >>>>>
> >>>>> Thanks
> >>>>> - Ioi
> >>>>>
>

From ioi.lam at oracle.com Fri Jun 5 04:37:18 2020
From: ioi.lam at oracle.com (Ioi Lam)
Date: Thu, 4 Jun 2020 21:37:18 -0700
Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC
In-Reply-To:
References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com>
 <80ada90e-598b-c7ca-3782-0a779e195291@oracle.com>
 <7157bbee-03e7-9fc7-47e0-a25d98970875@oracle.com>
 <863b1b7d-bb1a-ae3c-2990-eca26803788a@oracle.com>
 <81cd3cec-e39d-8a16-6e07-2ff0b0b80dc8@oracle.com>
Message-ID:

Hi Jiangli,

Updated webrev is here:
http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v03/

The only difference with the previous version is this part:

1975 MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ?
1976                Heap_lock : NULL);     // needed for collect_as_vm_thread

Thanks
- Ioi

On 6/4/20 5:23 PM, Jiangli Zhou wrote:
> Ioi, do you have a new webrev?
>
> Best,
> Jiangli
>
> On Thu, Jun 4, 2020 at 4:54 PM Ioi Lam wrote:
>>
>>
>> On 6/4/20 12:04 PM, Jiangli Zhou wrote:
>>> On Wed, Jun 3, 2020 at 10:56 PM Ioi Lam wrote:
>>>>> On 5/29/20 9:40 PM, Yumin Qi wrote:
>>>>>> Hi, Ioi
>>>>>>
>>>>>> If the allocation of EDEN happens between GC and dump, should we
>>>>>> put the GC action in VM_PopulateDumpSharedSpace? This way, no
>>>>>> allocation should happen at the safepoint. The stack trace showed
>>>>>> it happened with a Java Thread, which should be blocked at safepoint.
>>>>>>
>>>>> Hi Yumin,
>>>>>
>>>>> I think GCs cannot be executed inside a safepoint, because some parts
>>>>> of GC need to execute in a safepoint, so they will be blocked until
>>>>> VM_PopulateDumpSharedSpace::doit has returned.
>>>>>
>>>>> Anyway, as I mentioned in my reply to Jiangli, there's a better way to
>>>>> fix this, so I will withdraw the current patch.
>>>>> >>>> Hi Yumin, >>>> >>>> Actually, I changed my mind again, and implemented your suggestion :-) >>>> >>>> There's actually a way to invoke GC inside a safepoint (it's used by >>>> "jcmd gc.heap_dump", for example). So I changed the CDS code to do the >>>> same thing. It's a much simpler change and does what I want -- no other >>>> thread will be able to make any heap allocation after the GC has >>>> completed, so no EDEN region will be allocated: >>>> >>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v02/ >>> Going with the simple approach for the short term sounds ok. >>> >>> 1612 void VM_PopulateDumpSharedSpace::doit() { >>> ... >>> 1615 if (GCLocker::is_active()) { >>> 1616 // This should rarely happen during -Xshare:dump, if at >>> all, but just to be safe. >>> 1617 log_debug(cds)("GCLocker::is_active() ... try again"); >>> 1618 return; >>> 1619 } >>> >>> MetaspaceShared::preload_and_dump() >>> ... >>> 1945 while (true) { >>> 1946 { >>> 1947 MutexLocker ml(THREAD, Heap_lock); >>> 1948 VMThread::execute(&op); >>> 1949 } >>> 1950 // If dumping has finished, the VM would have exited. The >>> only reason to >>> 1951 // come back here is to wait for the GCLocker. >>> 1952 assert(HeapShared::is_heap_object_archiving_allowed(), "sanity"); >>> 1953 os::naked_short_sleep(1); >>> 1954 } >>> 1955 } >>> >>> >>> Instead of doing the while/retry, calling >>> GCLocker::stall_until_clear() in MetaspaceShared::preload_and_dump >>> before VM_PopulateDumpSharedSpace probably is much cleaner? >> Hi Jiangli, I tried your suggestion, but GCLocker::stall_until_clear() >> cannot be called in the VM thread: >> >> # assert(thread->is_Java_thread()) failed: just checking >> >>> Please also add a comment in MetaspaceShared::preload_and_dump to >>> explain that Universe::heap()->collect_as_vm_thread expects that >>> Heap_lock is already held by the thread, and that's the reason for the >>> call to MutexLocker ml(THREAD, Heap_lock). >> I changed the code to be like this: >> >> MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ? >> Heap_lock : NULL); // needed for >> collect_as_vm_thread >> >>> Do you also need to call >>> Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true) >>> before GC? >> There's no need: >> >> void CollectedHeap::collect_as_vm_thread(GCCause::Cause cause) { >> ... >> case GCCause::_archive_time_gc: >> case GCCause::_metadata_GC_clear_soft_refs: { >> HandleMark hm; >> do_full_collection(true); // do clear all soft refs >> >> Thanks >> - Ioi >> >>> Best, >>> Jiangli >>> >>>> The check for "if (GCLocker::is_active())" should almost always be >>>> false. I left it there just for safety: >>>> >>>> During -Xshare:dump, we execute Java code to build the module graph, >>>> load classes, etc. So theoretically someone could try to parallelize >>>> some of that Java code in the future. Theoretically when CDS has entered >>>> the safepoint, another thread could be in the middle a JNI method that >>>> has held the GCLock. >>>> >>>> Thanks >>>> - Ioi >>>> >>>>>> On 5/29/20 7:29 PM, Ioi Lam wrote: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245925 >>>>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/ >>>>>>> >>>>>>> >>>>>>> >>>>>>> Summary: >>>>>>> >>>>>>> CDS supports archived heap objects only for G1. During -Xshare:dump, >>>>>>> CDS executes a full GC so that G1 will compact the heap regions, >>>>>>> leaving >>>>>>> maximum contiguous free space at the top of the heap. 
>>>>>>> Then, the archived heap regions are allocated from the top of the heap.
>>>>>>>
>>>>>>> Under some circumstances, java.lang.ref.Cleaners will execute
>>>>>>> after the GC has completed. The cleaners may allocate or synchronize,
>>>>>>> which will cause G1 to allocate an EDEN region at the top of the heap.
>>>>>>>
>>>>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN
>>>>>>> regions exist, exit the safepoint, run GC, and try again. Eventually
>>>>>>> all the cleaners will be executed and no more allocation can happen.
>>>>>>>
>>>>>>> For safety, I limit the retry count to 30 (or about 9 seconds total).
>>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>>
>>>>>>>

From stefan.karlsson at oracle.com Fri Jun 5 05:53:56 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Fri, 5 Jun 2020 07:53:56 +0200
Subject: RFR: 8246405: Add GCLogPrecious functionality to log and report debug errors
In-Reply-To: <17def85c-103e-9b4d-dbec-2a5578e0672a@oracle.com>
References: <08bb1203-39f9-cf25-7014-e32585767396@oracle.com>
 <17def85c-103e-9b4d-dbec-2a5578e0672a@oracle.com>
Message-ID: <67e9369f-6696-4da6-7b68-845e44921c83@oracle.com>

Thanks, Erik.

StefanK

On 2020-06-03 17:01, Erik Österlund wrote:
> Hi Stefan,
>
> Looks good.
>
> /Erik
>
> On 2020-06-03 10:42, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this patch to enhance the GCLogPrecious functionality
>> (JDK-8246405) to add support for a way to both log and generate a
>> crash report in debug builds.
>>
>> https://cr.openjdk.java.net/~stefank/8246405/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8246405
>>
>> I've split out a patch where ZGC uses this functionality:
>>
>> https://cr.openjdk.java.net/~stefank/8246406/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8246406
>>
>> Tested manually by running:
>> (ulimit -v <low value>; ../build/fastdebug/jdk/bin/java -XX:+UseZGC
>> -Xmx18m -Xlog:gc* -version)
>>
>> and verified that it generates a hs_err file with the appropriate
>> information.
>>
>> On macOS the output points to the right file and line number:
>>
>> #  Internal Error (src/hotspot/share/gc/z/zVirtualMemory.cpp:46),
>> pid=67695, tid=8451
>> #  Error: Failed to reserve enough address space for Java heap
>>
>> but since TOUCH_ASSERT_POISON isn't implemented we don't get
>> registers and the output contains the GCLogPrecious code:
>>
>> V  [libjvm.dylib+0xb3d95c]  VMError::report_and_die(int, char const*,
>> char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*,
>> char const*, int, unsigned long)+0x670
>> V  [libjvm.dylib+0xb3e083]  VMError::report_and_die(Thread*, void*,
>> char const*, int, char const*, char const*, __va_list_tag*)+0x47
>> V  [libjvm.dylib+0x334b48]  report_vm_error(char const*, int, char
>> const*, char const*, ...)+0x145
>> V  [libjvm.dylib+0x48d629]  GCLogPrecious::vwrite_and_debug(LogTargetHandle,
>> char const*, __va_list_tag*, char const*, int)+0x81
>> V  [libjvm.dylib+0xbbdf70]  GCLogPreciousHandle::write_and_debug(char
>> const*, ...)+0x92
>> V  [libjvm.dylib+0xbd833e]  ZVirtualMemoryManager::ZVirtualMemoryManager(unsigned long)+0xb6
>>
>> On Linux, where TOUCH_ASSERT_POISON is implemented, we get the last
>> parts cut away:
>>
>> V  [libjvm.so+0x1857179]  ZVirtualMemoryManager::ZVirtualMemoryManager(unsigned long)+0x79
>> V  [libjvm.so+0x182f84e]  ZPageAllocator::ZPageAllocator(ZWorkers*,
>> unsigned long, unsigned long, unsigned long, unsigned long)+0x6e
>> V  [libjvm.so+0x1808b61]
ZHeap::ZHeap()+0x81 >> V? [libjvm.so+0x1802559] ZCollectedHeap::ZCollectedHeap()+0x49 >> >> Thanks, >> StefanK > From stefan.johansson at oracle.com Fri Jun 5 07:14:23 2020 From: stefan.johansson at oracle.com (stefan.johansson at oracle.com) Date: Fri, 5 Jun 2020 09:14:23 +0200 Subject: RFR: 8246622: Remove CollectedHeap::print_gc_threads_on() In-Reply-To: <51d8f47e-ac3b-8660-9df2-f98780a6eaa7@oracle.com> References: <51d8f47e-ac3b-8660-9df2-f98780a6eaa7@oracle.com> Message-ID: <8e70b200-661b-51a2-5ea6-580544ac7d47@oracle.com> Nice cleanup Per, On 2020-06-04 20:42, Per Liden wrote: > Instead of having all GCs implement > CollectedHeap::print_gc_threads_on(), we can just let the single caller > (Threads::print_on) provide a closure and use > CollectedHeap::gc_threads_do(). That will better match what > Threads::print_on_error() is already doing, and remove repetitive code > in the GCs. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8246622 > Webrev: http://cr.openjdk.java.net/~pliden/8246622/webrev.0 > Looks good, Stefan > /Per From richard.reingruber at sap.com Fri Jun 5 07:18:53 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 5 Jun 2020 07:18:53 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: Hi, > The mach5 test run is good. Thanks Serguei and thanks to everybody providing feedback! I just pushed the change. Just curious: is mach5 an alias for tier5? And is this mach5 the same as in "Job: mach5-one-rrich-JDK-8238585-2-20200604-1334-11519059" which is the (successful) submit repo job? Thanks, Richard. -----Original Message----- From: serguei.spitsyn at oracle.com Sent: Donnerstag, 4. Juni 2020 04:07 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant Hi Richard, The mach5 test run is good. Thanks, Serguei On 6/2/20 10:57, Reingruber, Richard wrote: > Hi Serguei, > >> This looks good to me. > Thanks! > > From an earlier mail: > >> I'm thinking it would be more safe to run full tier5. > I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would > like to push. > > Thanks, Richard. > > -----Original Message----- > From: serguei.spitsyn at oracle.com > Sent: Dienstag, 2. Juni 2020 18:55 > To: Vladimir Kozlov ; Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > This looks good to me. > > Thanks, > Serguei > > > On 5/28/20 09:02, Vladimir Kozlov wrote: >> Vladimir Ivanov is on break currently. >> It looks good to me. >> >> Thanks, >> Vladimir K >> >> On 5/26/20 7:31 AM, Reingruber, Richard wrote: >>> Hi Vladimir, >>> >>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>> Not an expert in JVMTI code base, so can't comment on the actual >>>> changes. >>>> ? 
From JIT-compilers perspective it looks good. >>> I put out webrev.1 a while ago [1]: >>> >>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >>> Webrev(delta): >>> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >>> >>> You originally suggested to use a handshake to switch a thread into >>> interpreter mode [2]. I'm using >>> a direct handshake now, because I think it is the best fit. >>> >>> May I ask if webrev.1 still looks good to you from JIT-compilers >>> perspective? >>> >>> Can I list you as (partial) Reviewer? >>> >>> Thanks, Richard. >>> >>> [1] >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >>> [2] >>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >>> >>> -----Original Message----- >>> From: Vladimir Ivanov >>> Sent: Freitag, 7. Februar 2020 09:19 >>> To: Reingruber, Richard ; >>> serviceability-dev at openjdk.java.net; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR(S) 8238585: Use handshake for >>> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >>> compiled methods on stack not_entrant >>> >>> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>> Not an expert in JVMTI code base, so can't comment on the actual >>> changes. >>> >>> ? From JIT-compilers perspective it looks good. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>>> >>>> The change avoids making all compiled methods on stack not_entrant >>>> when switching a java thread to >>>> interpreter only execution for jvmti purposes. It is sufficient to >>>> deoptimize the compiled frames on stack. >>>> >>>> Additionally a handshake is used instead of a vm operation to walk >>>> the stack and do the deoptimizations. >>>> >>>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>>> release builds on all platforms. >>>> >>>> Thanks, Richard. >>>> >>>> See also my question if anyone knows a reason for making the >>>> compiled methods not_entrant: >>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html >>>> >>>> From thomas.schatzl at oracle.com Fri Jun 5 07:26:03 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 5 Jun 2020 09:26:03 +0200 Subject: RFR: 8246622: Remove CollectedHeap::print_gc_threads_on() In-Reply-To: <51d8f47e-ac3b-8660-9df2-f98780a6eaa7@oracle.com> References: <51d8f47e-ac3b-8660-9df2-f98780a6eaa7@oracle.com> Message-ID: <016c5891-6edd-563e-54a0-318fab3a62c4@oracle.com> Hi, On 04.06.20 20:42, Per Liden wrote: > Instead of having all GCs implement > CollectedHeap::print_gc_threads_on(), we can just let the single caller > (Threads::print_on) provide a closure and use > CollectedHeap::gc_threads_do(). That will better match what > Threads::print_on_error() is already doing, and remove repetitive code > in the GCs. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8246622 > Webrev: http://cr.openjdk.java.net/~pliden/8246622/webrev.0 > > /Per thanks for this cleanup. Looks good. 
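For reference, the closure-based replacement described above might look
roughly like the following sketch (hotspot's existing ThreadClosure
interface is assumed; the actual webrev may differ):

  // Sketch only: the printing logic moves into the single caller,
  // Threads::print_on(), instead of being reimplemented by every GC.
  class PrintGCThreadClosure : public ThreadClosure {
    outputStream* const _st;
   public:
    PrintGCThreadClosure(outputStream* st) : _st(st) {}
    virtual void do_thread(Thread* thread) {
      thread->print_on(_st);
      _st->cr();
    }
  };

  // In Threads::print_on(), replacing the per-GC print_gc_threads_on():
  //   PrintGCThreadClosure cl(st);
  //   Universe::heap()->gc_threads_do(&cl);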
Thomas From thomas.schatzl at oracle.com Fri Jun 5 07:27:38 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 5 Jun 2020 09:27:38 +0200 Subject: RFR: 8246405: Add GCLogPrecious functionality to log and report debug errors In-Reply-To: <67e9369f-6696-4da6-7b68-845e44921c83@oracle.com> References: <08bb1203-39f9-cf25-7014-e32585767396@oracle.com> <17def85c-103e-9b4d-dbec-2a5578e0672a@oracle.com> <67e9369f-6696-4da6-7b68-845e44921c83@oracle.com> Message-ID: <74b11af2-a1ae-10e2-9f41-11d4346b4101@oracle.com> Hi, On 05.06.20 07:53, Stefan Karlsson wrote: > Thanks, Erik. > > StefanK > > On 2020-06-03 17:01, Erik ?sterlund wrote: >> Hi Stefan, >> >> Looks good. >> >> /Erik >> >> On 2020-06-03 10:42, Stefan Karlsson wrote: >>> Hi all, >>> >>> Please review this patch to enhance the GCLogPrecious functionality >>> (JDK-8246405) to add support for a way to both log and generate a >>> crash report in debug builds. >>> Looks good. Thomas >>> https://cr.openjdk.java.net/~stefank/8246405/webrev.01/ >>> https://bugs.openjdk.java.net/browse/JDK-8246405 >>> >>> I've split out a patch where ZGC uses this functionality: >>> >>> https://cr.openjdk.java.net/~stefank/8246406/webrev.01/ >>> https://bugs.openjdk.java.net/browse/JDK-8246406 >>> >>> Tested manually by running: >>> (ulimit -v >low value>; ../build/fastdebug/jdk/bin/java -XX:+UseZGC >>> -Xmx18m -Xlog:gc* -version) >>> >>> and verified that it generates a hs_err file with the appropriate >>> information. >>> >>> On macOS the output points to the right file and line number: >>> >>> #? Internal Error (src/hotspot/share/gc/z/zVirtualMemory.cpp:46), >>> pid=67695, tid=8451 >>> #? Error: Failed to reserve enough address space for Java heap >>> >>> but since TOUCH_ASSERT_POISON isn't implemented we don't get >>> registers and the output contains the GCLogPrecious code: >>> >>> V? [libjvm.dylib+0xb3d95c]? VMError::report_and_die(int, char const*, >>> char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, >>> char const*, int, unsigned long)+0x670 >>> V? [libjvm.dylib+0xb3e083] VMError::report_and_die(Thread*, void*, >>> char const*, int, char const*, char const*, __va_list_tag*)+0x47 >>> V? [libjvm.dylib+0x334b48]? report_vm_error(char const*, int, char >>> const*, char const*, ...)+0x145 >>> V? [libjvm.dylib+0x48d629] >>> GCLogPrecious::vwrite_and_debug(LogTargetHandle, char const*, >>> __va_list_tag*, char const*, int)+0x81 >>> V? [libjvm.dylib+0xbbdf70] GCLogPreciousHandle::write_and_debug(char >>> const*, ...)+0x92 >>> V? [libjvm.dylib+0xbd833e] >>> ZVirtualMemoryManager::ZVirtualMemoryManager(unsigned long)+0xb6 >>> >>> On Linux, where TOUCH_ASSERT_POISON is implemented, we get the last >>> parts cut away: >>> >>> V? [libjvm.so+0x1857179] >>> ZVirtualMemoryManager::ZVirtualMemoryManager(unsigned long)+0x79 >>> V? [libjvm.so+0x182f84e] ZPageAllocator::ZPageAllocator(ZWorkers*, >>> unsigned long, unsigned long, unsigned long, unsigned long)+0x6e >>> V? [libjvm.so+0x1808b61]? ZHeap::ZHeap()+0x81 >>> V? 
[libjvm.so+0x1802559] ZCollectedHeap::ZCollectedHeap()+0x49 >>> >>> Thanks, >>> StefanK >> > From serguei.spitsyn at oracle.com Fri Jun 5 07:31:01 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 5 Jun 2020 00:31:01 -0700 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> Message-ID: <42417262-7a4c-31cf-73af-55e22cd36627@oracle.com> Hi Richard, On 6/5/20 00:18, Reingruber, Richard wrote: > Hi, > >> The mach5 test run is good. > Thanks Serguei and thanks to everybody providing feedback! I just pushed the change. Great, thanks! > Just curious: is mach5 an alias for tier5? The mach5 is a build and test system which also provides CI. Tier5 is one of the testing levels. > And is this mach5 the same as in "Job: > mach5-one-rrich-JDK-8238585-2-20200604-1334-11519059" which is the (successful) submit repo job? Yes. I guess all mach5 jobs have this prefix. Thanks, Serguei > > Thanks, > Richard. > > -----Original Message----- > From: serguei.spitsyn at oracle.com > Sent: Donnerstag, 4. Juni 2020 04:07 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > The mach5 test run is good. > > Thanks, > Serguei > > > On 6/2/20 10:57, Reingruber, Richard wrote: >> Hi Serguei, >> >>> This looks good to me. >> Thanks! >> >> From an earlier mail: >> >>> I'm thinking it would be more safe to run full tier5. >> I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would >> like to push. >> >> Thanks, Richard. >> >> -----Original Message----- >> From: serguei.spitsyn at oracle.com >> Sent: Dienstag, 2. Juni 2020 18:55 >> To: Vladimir Kozlov ; Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net >> Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant >> >> Hi Richard, >> >> This looks good to me. >> >> Thanks, >> Serguei >> >> >> On 5/28/20 09:02, Vladimir Kozlov wrote: >>> Vladimir Ivanov is on break currently. >>> It looks good to me. >>> >>> Thanks, >>> Vladimir K >>> >>> On 5/26/20 7:31 AM, Reingruber, Richard wrote: >>>> Hi Vladimir, >>>> >>>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>>> Not an expert in JVMTI code base, so can't comment on the actual >>>>> changes. >>>>> ? From JIT-compilers perspective it looks good. >>>> I put out webrev.1 a while ago [1]: >>>> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >>>> Webrev(delta): >>>> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >>>> >>>> You originally suggested to use a handshake to switch a thread into >>>> interpreter mode [2]. I'm using >>>> a direct handshake now, because I think it is the best fit. >>>> >>>> May I ask if webrev.1 still looks good to you from JIT-compilers >>>> perspective? >>>> >>>> Can I list you as (partial) Reviewer? 
>>>> >>>> Thanks, Richard. >>>> >>>> [1] >>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >>>> [2] >>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >>>> >>>> -----Original Message----- >>>> From: Vladimir Ivanov >>>> Sent: Freitag, 7. Februar 2020 09:19 >>>> To: Reingruber, Richard ; >>>> serviceability-dev at openjdk.java.net; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: RFR(S) 8238585: Use handshake for >>>> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >>>> compiled methods on stack not_entrant >>>> >>>> >>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>> Not an expert in JVMTI code base, so can't comment on the actual >>>> changes. >>>> >>>> ? From JIT-compilers perspective it looks good. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>>>> >>>>> The change avoids making all compiled methods on stack not_entrant >>>>> when switching a java thread to >>>>> interpreter only execution for jvmti purposes. It is sufficient to >>>>> deoptimize the compiled frames on stack. >>>>> >>>>> Additionally a handshake is used instead of a vm operation to walk >>>>> the stack and do the deoptimizations. >>>>> >>>>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>>>> release builds on all platforms. >>>>> >>>>> Thanks, Richard. >>>>> >>>>> See also my question if anyone knows a reason for making the >>>>> compiled methods not_entrant: >>>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html >>>>> >>>>> From per.liden at oracle.com Fri Jun 5 07:43:07 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 5 Jun 2020 09:43:07 +0200 Subject: RFR: 8246622: Remove CollectedHeap::print_gc_threads_on() In-Reply-To: <016c5891-6edd-563e-54a0-318fab3a62c4@oracle.com> References: <51d8f47e-ac3b-8660-9df2-f98780a6eaa7@oracle.com> <016c5891-6edd-563e-54a0-318fab3a62c4@oracle.com> Message-ID: Thanks for reviewing, Thomas! /Per On 6/5/20 9:26 AM, Thomas Schatzl wrote: > Hi, > > On 04.06.20 20:42, Per Liden wrote: >> Instead of having all GCs implement >> CollectedHeap::print_gc_threads_on(), we can just let the single >> caller (Threads::print_on) provide a closure and use >> CollectedHeap::gc_threads_do(). That will better match what >> Threads::print_on_error() is already doing, and remove repetitive code >> in the GCs. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8246622 >> Webrev: http://cr.openjdk.java.net/~pliden/8246622/webrev.0 >> >> /Per > > ? thanks for this cleanup. > > Looks good. > > Thomas From per.liden at oracle.com Fri Jun 5 07:43:57 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 5 Jun 2020 09:43:57 +0200 Subject: RFR: 8246622: Remove CollectedHeap::print_gc_threads_on() In-Reply-To: <8e70b200-661b-51a2-5ea6-580544ac7d47@oracle.com> References: <51d8f47e-ac3b-8660-9df2-f98780a6eaa7@oracle.com> <8e70b200-661b-51a2-5ea6-580544ac7d47@oracle.com> Message-ID: <7c3ed6d7-56c3-952d-ff88-b58cb3af3a8d@oracle.com> Thanks for reviewing, Stefan! /Per On 6/5/20 9:14 AM, stefan.johansson at oracle.com wrote: > Nice cleanup Per, > > On 2020-06-04 20:42, Per Liden wrote: >> Instead of having all GCs implement >> CollectedHeap::print_gc_threads_on(), we can just let the single >> caller (Threads::print_on) provide a closure and use >> CollectedHeap::gc_threads_do(). 
That will better match what >> Threads::print_on_error() is already doing, and remove repetitive code >> in the GCs. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8246622 >> Webrev: http://cr.openjdk.java.net/~pliden/8246622/webrev.0 >> > Looks good, > Stefan > >> /Per From richard.reingruber at sap.com Fri Jun 5 08:05:46 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 5 Jun 2020 08:05:46 +0000 Subject: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant In-Reply-To: <42417262-7a4c-31cf-73af-55e22cd36627@oracle.com> References: <32f34616-cf17-8caa-5064-455e013e2313@oracle.com> <057dfdb4-74df-e0ec-198d-455aeb14d5a1@oracle.com> <42417262-7a4c-31cf-73af-55e22cd36627@oracle.com> Message-ID: I see. Thanks for the explanation :) Richard. -----Original Message----- From: serguei.spitsyn at oracle.com Sent: Freitag, 5. Juni 2020 09:31 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant Hi Richard, On 6/5/20 00:18, Reingruber, Richard wrote: > Hi, > >> The mach5 test run is good. > Thanks Serguei and thanks to everybody providing feedback! I just pushed the change. Great, thanks! > Just curious: is mach5 an alias for tier5? The mach5 is a build and test system which also provides CI. Tier5 is one of the testing levels. > And is this mach5 the same as in "Job: > mach5-one-rrich-JDK-8238585-2-20200604-1334-11519059" which is the (successful) submit repo job? Yes. I guess all mach5 jobs have this prefix. Thanks, Serguei > > Thanks, > Richard. > > -----Original Message----- > From: serguei.spitsyn at oracle.com > Sent: Donnerstag, 4. Juni 2020 04:07 > To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant > > Hi Richard, > > The mach5 test run is good. > > Thanks, > Serguei > > > On 6/2/20 10:57, Reingruber, Richard wrote: >> Hi Serguei, >> >>> This looks good to me. >> Thanks! >> >> From an earlier mail: >> >>> I'm thinking it would be more safe to run full tier5. >> I guess we're done with reviewing. Would be good if you could run full tier5 now. After that I would >> like to push. >> >> Thanks, Richard. >> >> -----Original Message----- >> From: serguei.spitsyn at oracle.com >> Sent: Dienstag, 2. Juni 2020 18:55 >> To: Vladimir Kozlov ; Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net >> Subject: Re: RFR(S) 8238585: Use handshake for JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make compiled methods on stack not_entrant >> >> Hi Richard, >> >> This looks good to me. >> >> Thanks, >> Serguei >> >> >> On 5/28/20 09:02, Vladimir Kozlov wrote: >>> Vladimir Ivanov is on break currently. >>> It looks good to me. 
>>> >>> Thanks, >>> Vladimir K >>> >>> On 5/26/20 7:31 AM, Reingruber, Richard wrote: >>>> Hi Vladimir, >>>> >>>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>>> Not an expert in JVMTI code base, so can't comment on the actual >>>>> changes. >>>>> ? From JIT-compilers perspective it looks good. >>>> I put out webrev.1 a while ago [1]: >>>> >>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1/ >>>> Webrev(delta): >>>> http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.1.inc/ >>>> >>>> You originally suggested to use a handshake to switch a thread into >>>> interpreter mode [2]. I'm using >>>> a direct handshake now, because I think it is the best fit. >>>> >>>> May I ask if webrev.1 still looks good to you from JIT-compilers >>>> perspective? >>>> >>>> Can I list you as (partial) Reviewer? >>>> >>>> Thanks, Richard. >>>> >>>> [1] >>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-April/031245.html >>>> [2] >>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030340.html >>>> >>>> -----Original Message----- >>>> From: Vladimir Ivanov >>>> Sent: Freitag, 7. Februar 2020 09:19 >>>> To: Reingruber, Richard ; >>>> serviceability-dev at openjdk.java.net; >>>> hotspot-compiler-dev at openjdk.java.net >>>> Subject: Re: RFR(S) 8238585: Use handshake for >>>> JvmtiEventControllerPrivate::enter_interp_only_mode() and don't make >>>> compiled methods on stack not_entrant >>>> >>>> >>>>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8238585/webrev.0/ >>>> Not an expert in JVMTI code base, so can't comment on the actual >>>> changes. >>>> >>>> ? From JIT-compilers perspective it looks good. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8238585 >>>>> >>>>> The change avoids making all compiled methods on stack not_entrant >>>>> when switching a java thread to >>>>> interpreter only execution for jvmti purposes. It is sufficient to >>>>> deoptimize the compiled frames on stack. >>>>> >>>>> Additionally a handshake is used instead of a vm operation to walk >>>>> the stack and do the deoptimizations. >>>>> >>>>> Testing: JCK and JTREG tests, also in Xcomp mode with fastdebug and >>>>> release builds on all platforms. >>>>> >>>>> Thanks, Richard. >>>>> >>>>> See also my question if anyone knows a reason for making the >>>>> compiled methods not_entrant: >>>>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2020-January/030339.html >>>>> >>>>> From per.liden at oracle.com Fri Jun 5 08:20:22 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 5 Jun 2020 10:20:22 +0200 Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property In-Reply-To: References: Message-ID: <5b1cac8c-7e9b-195a-edfa-6ab972e32bf0@oracle.com> Hi Igor, When looking at the follow-up sub-tasks for this, I see for example this: http://cr.openjdk.java.net/~iignatyev/8246499/webrev.00/test/hotspot/jtreg/gc/z/TestSmallHeap.java.udiff.html Maybe I'm misunderstanding how this is supposed to work, but it looks like this test would now _not_ be executed if I do: make TEST=test/hotspot/jtreg/gc/z/TestSmallHeap.java JTREG="VM_OPTIONS=-XX:+UseZGC" Is that so? In that case, that seems incorrect. 
cheers, Per On 6/3/20 11:30 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 >> 70 lines changed: 66 ins; 0 del; 4 mod > > Hi all, > > could you please review the patch which introduces a new @requires property to filter out the tests which ignore externally provided JVM flags? > > the idea behind this patch is to have a way to clearly mark tests which ignore flags, so > a) it's obvious that they don't execute a flag-guarded code/feature, and extra care should be taken to use them to verify any flag-guarded changed; > b) they can be easily excluded from runs w/ flags. > > @requires and VMProps allows us to achieve both, so it's been decided to add a new property `vm.flagless`. `vm.flagless` is set to false if there are any XX flags other than `-XX:MaxRAMPercentage` and `-XX:CreateCoredumpOnCrash` (which are known to be set almost always) or any X flags other `-Xmixed`; in other words any tests w/ `@requires vm.flagless` will be excluded from runs w/ any other X / XX flags passed via `-vmoption` / `-javaoption`. in rare cases, when one still wants to run the tests marked by `vm.flagless` w/ external flags, `vm.flagless` can be forcefully set to true by setting any value to `TEST_VM_FLAGLESS` env. variable. > > this patch adds necessary common changes and marks common tests, namely Scimark, GTestWrapper and TestNativeProcessBuilder. Component-specific tests will be marked separately by the corresponding subtasks of 8151707[1]. > > please note, the patch depends on CODETOOLS-7902336[2], which will be included in the next jtreg version, so this patch is to be integrated only after jtreg5.1 is promoted and we switch to use it by 8246387[3]. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8246494 > webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 > testing: marked tests w/ different XX and X flags w/ and w/o TEST_VM_FLAGLESS env. var, and w/o any flags > > [1] https://bugs.openjdk.java.net/browse/JDK-8151707 > [2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336 > [3] https://bugs.openjdk.java.net/browse/JDK-8246387 > > Thanks, > -- Igor > From erik.osterlund at oracle.com Fri Jun 5 08:55:14 2020 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 5 Jun 2020 10:55:14 +0200 Subject: Need help to fix a potential G1 crash in jdk11 In-Reply-To: References: Message-ID: Hi Volker, On 2020-06-03 20:18, Volker Simonis wrote: > Unfortunately, "-XX:-ClassUnloading" doesn't help :( I am actually happy that did not help. I suspect a bug in that code would be harder to track down; it is rather complicated. > I already saw two new crashes. The first one has 6 distinct Root > locations pointing to one dead object: > > [863.222s][info ][gc,verify,start ? ] Verifying During GC (Remark after) > [863.222s][debug][gc,verify ? ? ? ? ] Threads > [863.224s][debug][gc,verify ? ? ? ? ] Heap > [863.224s][debug][gc,verify ? ? ? ? ] Roots > [863.229s][error][gc,verify ? ? ? ? ] Root location 0x00007f11719174e7 > points to dead obj 0x00000000f956dbd8 > [863.229s][error][gc,verify ? ? ? ? ] > org.antlr.v4.runtime.atn.PredictionContextCache > [863.229s][error][gc,verify ? ? ? ? ] {0x00000000f956dbd8} - klass: > 'org/antlr/v4/runtime/atn/PredictionContextCache' > ... > [863.229s][error][gc,verify ? ? ? ? ] Root location 0x00007f1171921978 > points to dead obj 0x00000000f956dbd8 > [863.229s][error][gc,verify ? ? ? ? ] > org.antlr.v4.runtime.atn.PredictionContextCache > [863.229s][error][gc,verify ? ? ? ? 
] {0x00000000f956dbd8} - klass:
> 'org/antlr/v4/runtime/atn/PredictionContextCache'
> ...
> [863.229s][error][gc,verify          ] Root location 0x00007f1171921978
> points to dead obj 0x00000000f956dbd8
> [863.229s][error][gc,verify          ]
> org.antlr.v4.runtime.atn.PredictionContextCache
> [863.229s][error][gc,verify          ] {0x00000000f956dbd8} - klass:
> 'org/antlr/v4/runtime/atn/PredictionContextCache'
> [863.231s][debug][gc,verify          ] HeapRegionSets
> [863.231s][debug][gc,verify          ] HeapRegions
> [863.349s][error][gc,verify          ] Heap after failed verification (kind 0):
>
> The second crash has only two Root locations pointing to the same dead
> object but more than 40_000 fields in distinct objects pointing to
> more than 3_500 dead objects:
>
> [854.473s][info ][gc,verify,start    ] Verifying During GC (Remark after)
> [854.473s][debug][gc,verify          ] Threads
> [854.475s][debug][gc,verify          ] Heap
> [854.475s][debug][gc,verify          ] Roots
> [854.479s][error][gc,verify          ] Root location 0x00007f6e60461d5f
> points to dead obj 0x00000000fa874528
> [854.479s][error][gc,verify          ]
> org.antlr.v4.runtime.atn.PredictionContextCache
> [854.479s][error][gc,verify          ] {0x00000000fa874528} - klass:
> 'org/antlr/v4/runtime/atn/PredictionContextCache'
> [854.479s][error][gc,verify          ] Root location 0x00007f6e60461d6d
> points to dead obj 0x00000000fa874528
> [854.479s][error][gc,verify          ]
> org.antlr.v4.runtime.atn.PredictionContextCache
> [854.479s][error][gc,verify          ] {0x00000000fa874528} - klass:
> 'org/antlr/v4/runtime/atn/PredictionContextCache'
> [854.479s][error][gc,verify          ] Root location 0x00007f6e60462138
> points to dead obj 0x00000000fa874528
> [854.479s][error][gc,verify          ]
> org.antlr.v4.runtime.atn.PredictionContextCache
> [854.479s][error][gc,verify          ] {0x00000000fa874528} - klass:
> 'org/antlr/v4/runtime/atn/PredictionContextCache'
> [854.482s][debug][gc,verify          ] HeapRegionSets
> [854.482s][debug][gc,verify          ] HeapRegions
> [854.484s][error][gc,verify          ] ----------
> [854.484s][error][gc,verify          ] Field 0x00000000fd363c70 of live
> obj 0x00000000fd363c58 in region [0x00000000fd300000, 0x00000000fd400000)
> [854.484s][error][gc,verify          ] class name
> org.antlr.v4.runtime.atn.ATNConfig
> [854.484s][error][gc,verify          ] points to dead obj
> 0x00000000fa88a540 in region [0x00000000fa800000, 0x00000000fa900000)
> [854.484s][error][gc,verify          ] class name
> org.antlr.v4.runtime.atn.ArrayPredictionContext
> [854.484s][error][gc,verify          ] ----------
> ...
> more than 40_000 fields in distinct objects pointing to more than
> 3_500 dead objects.
>
> So how can this happen? Is "-XX:+VerifyAfterGC" really reliable here?

Naturally, it's hard to tell for certain what the issue is with only these
printouts. However, we can make some observations from the printouts:

Based on the address values of the "Root location" of the printouts, each
dead object reported is pointed at from at least one misaligned oop. The
only misaligned oops in HotSpot are nmethod oops embedded into the
instruction stream as immediates. So this smells like some kind of nmethod
oop processing bug in G1 to me.

The Abortable Mixed GCs (https://openjdk.java.net/jeps/344) that went into
12 changed quite a bit of the nmethod oop scanning code. Perhaps the reason
why this stopped reproducing in 12 is related to that.

The nmethod oop processing code introduced with AMGC actually had a word
tearing problem for nmethod oops, which was fixed later with
https://bugs.openjdk.java.net/browse/JDK-8235305

Hope these pointers help.
| 79|0x00000000f9b00000, > > 0x00000000f9bfffe8, 0x00000000f9c00000| 99%| O| |TAMS > 0x00000000f9bfffe8, > > 0x00000000f9b00000| Updating > > ... > > [1043.971s][error][gc,verify? ? ? ? ?] | 105|0x00000000fb500000, > > 0x00000000fb54fc08, 0x00000000fb600000| 31%| S|CS|TAMS > 0x00000000fb500000, > > 0x00000000fb500000| Complete > > > > but I also got verification errors with more than 30000 > fields of distinct > > objects pointing to more than 1000 dead objects. How can > that happen? Is > > the verification always accurate or can this also be a > problem with the > > verification itself and I'm hunting the wrong problem? > > > > Sometimes I also saw verification errors where fields point > to objects in > > regions with "Untracked remset": > > > > [673.762s][error][gc,verify] ---------- > > [673.762s][error][gc,verify] Field 0x00000000fca49298 of > live obj > > 0x00000000fca49280 in region [0x00000000fca0000 > > 0, 0x00000000fcb00000) > > [673.762s][error][gc,verify] class name > org.antlr.v4.runtime.atn.ATNConfig > > [673.762s][error][gc,verify] points to obj > 0x00000000f9d5a9a0 in region > > > 81:(F)[0x00000000f9d00000,0x00000000f9d00000,0x00000000f9e00000] > remset > > Untracked > > [673.762s][error][gc,verify] ---------- > > > > But they are by far not that common like the pointers to > dead objects. Once > > I even saw a "Root location" pointing to a dead object: > > > > [369.808s][error][gc,verify] Root location > 0x00007f35bb33f1f8 points to > > dead obj 0x00000000f87fa200 > > [369.808s][error][gc,verify] > org.antlr.v4.runtime.atn.PredictionContextCache > > [369.808s][error][gc,verify] {0x00000000f87fa200} - klass: > > 'org/antlr/v4/runtime/atn/PredictionContextCache' > > [369.850s][error][gc,verify] ---------- > > [369.850s][error][gc,verify] Field 0x00000000fbc60900 of > live obj > > 0x00000000fbc608f0 in region [0x00000000fbc00000, > 0x00000000fbd00000) > > [369.850s][error][gc,verify] class name > > org.antlr.v4.runtime.atn.ParserATNSimulator > > [369.850s][error][gc,verify] points to dead obj > 0x00000000f87fa200 in > > region [0x00000000f8700000, 0x00000000f8800000) > > [369.850s][error][gc,verify] class name > > org.antlr.v4.runtime.atn.PredictionContextCache > > [369.850s][error][gc,verify] ---------- > > > > All these verification errors occur after the Remark phase in > > G1ConcurrentMark::remark() at: > > > > verify_during_pause(G1HeapVerifier::G1VerifyRemark, > > VerifyOption_G1UsePrevMarking, "Remark after"); > > > > V? [libjvm.so+0x6ca186]? report_vm_error(char const*, int, > char const*, > > char const*, ...)+0x106 > > V? [libjvm.so+0x7d4a99] > G1HeapVerifier::verify(VerifyOption)+0x399 > > V? [libjvm.so+0xe128bb] Universe::verify(VerifyOption, char > const*)+0x16b > > V? [libjvm.so+0x7d44ee] > > ?G1HeapVerifier::verify(G1HeapVerifier::G1VerifyType, > VerifyOption, char > > const*)+0x9e > > V? [libjvm.so+0x7addcf] > > > ?G1ConcurrentMark::verify_during_pause(G1HeapVerifier::G1VerifyType, > > VerifyOption, char const*)+0x9f > > V? [libjvm.so+0x7b172e] G1ConcurrentMark::remark()+0x3be > > V? [libjvm.so+0xe6a5e1] VM_CGC_Operation::doit()+0x211 > > V? [libjvm.so+0xe69908] VM_Operation::evaluate()+0xd8 > > V? [libjvm.so+0xe6713f] > VMThread::evaluate_operation(VM_Operation*) [clone > > .constprop.54]+0xff > > V? [libjvm.so+0xe6764e]? VMThread::loop()+0x3be > > V? [libjvm.so+0xe67a7b]? VMThread::run()+0x7b > > > > The GC log output looks as follows: > > ... > > [1035.775s][info ][gc,verify,start? ?] Verifying During GC > (Remark after) > > [1035.775s][debug][gc,verify? ? 
? ? ?] Threads > > [1035.776s][debug][gc,verify? ? ? ? ?] Heap > > [1035.776s][debug][gc,verify? ? ? ? ?] Roots > > [1035.782s][debug][gc,verify? ? ? ? ?] HeapRegionSets > > [1035.782s][debug][gc,verify? ? ? ? ?] HeapRegions > > [1035.782s][error][gc,verify? ? ? ? ?] ---------- > > ... > > A more complete GC log can be found here [2]. > > > > For the field 0x00000000fb509148 of live obj > 0x00000000fb509130 which > > points to the dead object 0x00000000f9ba39b0 I get the following > > information if I inspect them with clhsdb: > > > > hsdb> inspect 0x00000000fb509130 > > instance of Oop for org/antlr/v4/runtime/atn/ATNConfig @ > 0x00000000fb509130 > > @ 0x00000000fb509130 (size = 32) > > _mark: 13 > > _metadata._compressed_klass: InstanceKlass for > > org/antlr/v4/runtime/atn/ATNConfig > > state: Oop for org/antlr/v4/runtime/atn/BasicState @ > 0x00000000f83ecfa8 Oop > > for org/antlr/v4/runtime/atn/BasicState @ 0x00000000f83ecfa8 > > alt: 1 > > context: Oop for > org/antlr/v4/runtime/atn/SingletonPredictionContext @ > > 0x00000000f9ba39b0 Oop for > > org/antlr/v4/runtime/atn/SingletonPredictionContext @ > 0x00000000f9ba39b0 > > reachesIntoOuterContext: 8 > > semanticContext: Oop for > org/antlr/v4/runtime/atn/SemanticContext$Predicate > > @ 0x00000000f83d57c0 Oop for > > org/antlr/v4/runtime/atn/SemanticContext$Predicate @ > 0x00000000f83d57c0 > > > > hsdb> inspect 0x00000000f9ba39b0 > > instance of Oop for > org/antlr/v4/runtime/atn/SingletonPredictionContext @ > > 0x00000000f9ba39b0 @ 0x00000000f9ba39b0 (size = 32) > > _mark: 41551306041 > > _metadata._compressed_klass: InstanceKlass for > > org/antlr/v4/runtime/atn/SingletonPredictionContext > > id: 100635259 > > cachedHashCode: 2005943142 > > parent: Oop for > org/antlr/v4/runtime/atn/SingletonPredictionContext @ > > 0x00000000f9ba01b0 Oop for > > org/antlr/v4/runtime/atn/SingletonPredictionContext @ > 0x00000000f9ba01b0 > > returnState: 18228 > > > > I could also reproduce the verification errors with a fast > debug build of > > 11.0.7 which I did run with "-XX:+CheckCompressedOops > -XX:+VerifyOops > > -XX:+G1VerifyCTCleanup -XX:+G1VerifyBitmaps" in addition to > the options > > mentioned before, but unfortunaltey the run didn't trigger > neither an > > assertion nor a different verification error. > > > > So to summarize, my basic questions are: > >? ?- has somebody else encountered similar crashes? > >? ?- is someone aware of specific changes in jdk12 which > might solve this > > problem? > >? ?- are the verification errors I'm seeing accurate or is it > possible to get > > false positives when running with > -XX:Verify{Before,During,After}GC ? > > > > Thanks for your patience, > > Volker > > > > [1] > > > http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/hs_err_pid28294.log > > [2] > > > http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/verify-error.log > From per.liden at oracle.com Fri Jun 5 10:52:03 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 5 Jun 2020 12:52:03 +0200 Subject: Discussion on ZGC's Page Cache Flush In-Reply-To: References: Message-ID: Hi, On 6/5/20 11:24 AM, Hao Tang wrote: > > Hi ZGC Team, > > We encountered "Page Cache Flushed" when we enable ZGC feature. Much longer response time can be observed at the time when "Page Cache Flushed" happened. There is a case that is able to reproduce this scenario. In this case, medium-sized objects are periodically cleaned up. 
Right after the clean-up, small pages are not sufficient for allocating
small-sized objects, which requires flushing medium pages into small pages.
We found that simply enlarging the max heap size cannot solve this problem.
We believe the "page cache flush" issue could be a general problem, because
the ratio of small/medium/large objects is not always constant.
>
> Sample code:
> import java.util.Random;
> import java.util.concurrent.locks.LockSupport;
> public class TestPageCacheFlush {
>     /*
>      * Options: -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:+UnlockDiagnosticVMOptions -Xms10g -Xmx10g -XX:ParallelGCThreads=2 -XX:ConcGCThreads=4 -Xlog:gc,gc+heap
>      * small object: fast allocation
>      * medium object: slow allocation, periodic deletion
>      */
>     public static void main(String[] args) throws Exception {
>         long heapSizeKB = Runtime.getRuntime().totalMemory() >> 10;
>         System.out.println(heapSizeKB);
>         SmallContainer smallContainer = new SmallContainer((long)(heapSizeKB * 0.4));    // 40% heap for live small objects
>         MediumContainer mediumContainer = new MediumContainer((long)(heapSizeKB * 0.4)); // 40% heap for live medium objects
>         int totalSmall = smallContainer.getTotalObjects();
>         int totalMedium = mediumContainer.getTotalObjects();
>         int addedSmall = 0;
>         int addedMedium = 1; // should not be divided by zero
>         while (addedMedium < totalMedium * 10) {
>             if (totalSmall / totalMedium > addedSmall / addedMedium) { // keep the ratio of allocated small/medium objects
>                 smallContainer.createAndSaveObject();
>                 addedSmall++;
>             } else {
>                 mediumContainer.createAndAppendObject();
>                 addedMedium++;
>             }
>             if ((addedSmall + addedMedium) % 50 == 0) {
>                 LockSupport.parkNanos(500); // make allocation slower
>             }
>         }
>     }
>     static class SmallContainer {
>         private final int KB_PER_OBJECT = 64; // 64KB per object
>         private final Random RANDOM = new Random();
>         private byte[][] smallObjectArray;
>         private long totalKB;
>         private int totalObjects;
>         SmallContainer(long totalKB) {
>             this.totalKB = totalKB;
>             totalObjects = (int)(totalKB / KB_PER_OBJECT);
>             smallObjectArray = new byte[totalObjects][];
>         }
>         int getTotalObjects() {
>             return totalObjects;
>         }
>         // random insertion (with random deletion)
>         void createAndSaveObject() {
>             smallObjectArray[RANDOM.nextInt(totalObjects)] = new byte[KB_PER_OBJECT << 10];
>         }
>     }
>     static class MediumContainer {
>         private final int KB_PER_OBJECT = 512; // 512KB per object
>         private byte[][] mediumObjectArray;
>         private int mediumObjectArrayCurrentIndex = 0;
>         private long totalKB;
>         private int totalObjects;
>         MediumContainer(long totalKB) {
>             this.totalKB = totalKB;
>             totalObjects = (int)(totalKB / KB_PER_OBJECT);
>             mediumObjectArray = new byte[totalObjects][];
>         }
>         int getTotalObjects() {
>             return totalObjects;
>         }
>         void createAndAppendObject() {
>             if (mediumObjectArrayCurrentIndex == totalObjects) { // periodic deletion
>                 mediumObjectArray = new byte[totalObjects][]; // also delete all medium objects in the old array
>                 mediumObjectArrayCurrentIndex = 0;
>             } else {
>                 mediumObjectArray[mediumObjectArrayCurrentIndex] = new byte[KB_PER_OBJECT << 10];
>                 mediumObjectArrayCurrentIndex++;
>             }
>         }
>     }
> }
>
> To avoid "page cache flush", we made a patch that converts small/medium
> pages to medium/small pages ahead of time. This patch works well on an
> application with a relatively stable allocation rate, and has not run
> into throughput problems. What do you think of this solution?
> > We notice that you are improving the efficiency for map/unmap operations (https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-June/029936.html). It may be a step for improving the delay caused by "page cache flush". Do you have further plan for eliminating or improving "page cache flush"? Yes, and as you might have seen, the latest incarnation of this patchset includes asynchronous unmapping, which helps reduce the time for page cache flushing. I ran your example program above, with these patches and can see ~30% reduction in average page allocation time, and ~60% reduction in worst case page allocation time. So, it will be an improvement. However, I'd be more than happy to take a look at your patch and see what you've done. Making page cache flushing even less expensive is something we're interested in going forward. cheers, Per > > Sincerely,Hao Tang > From per.liden at oracle.com Fri Jun 5 15:15:26 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 5 Jun 2020 17:15:26 +0200 Subject: RFR: 8245203/8245204/8245208: ZGC: Don't hold the ZPageAllocator lock while committing/uncommitting memory In-Reply-To: <06db729e-d804-a0b4-262a-aa70181d904b@oracle.com> References: <8ea6dc02-c518-9b6e-6038-589bd9ed86b1@oracle.com> <49447851-8f64-8c99-8443-5b400e1851c0@oracle.com> <06db729e-d804-a0b4-262a-aa70181d904b@oracle.com> Message-ID: Hi, Here are the latest patches for this. Contains more review comments from Stefan, and they have been rebased on today's jdk/jdk. * 8246220: ZGC: Introduce ZUnmapper to asynchronous unmap pages http://cr.openjdk.java.net/~pliden/8246220/webrev.2 * 8245208: ZGC: Don't hold the ZPageAllocator lock while committing/uncommitting memory http://cr.openjdk.java.net/~pliden/8245208/webrev.3 * 8246265: ZGC: Introduce ZConditionLock http://cr.openjdk.java.net/~pliden/8246265/webrev.0/ * (already review) 8245204: ZGC: Introduce ZListRemoveIterator http://cr.openjdk.java.net/~pliden/8245204/webrev.0/ * (already review) 8245203: ZGC: Don't track size in ZPhysicalMemoryBacking http://cr.openjdk.java.net/~pliden/8245203/webrev.0/ And for convenience, here's an all-in-one patch: http://cr.openjdk.java.net/~pliden/zgc/commit_uncommit/all/webrev.1 cheers, Per From igor.ignatyev at oracle.com Fri Jun 5 16:10:37 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 5 Jun 2020 09:10:37 -0700 Subject: RFR(S) : 8246494 : introduce vm.flagless at-requires property In-Reply-To: <5b1cac8c-7e9b-195a-edfa-6ab972e32bf0@oracle.com> References: <5b1cac8c-7e9b-195a-edfa-6ab972e32bf0@oracle.com> Message-ID: Hi Per, you are reading this correctly, make TEST=test/hotspot/jtreg/gc/z/TestSmallHeap.java JTREG="VM_OPTIONS=-XX:+UseZGC" won't execute gc/z/TestSmallHeap.java; and I don't see it to be incorrect. Let me try to explain why using gc/z/TestSmallHeap.java as a running example. A hotspot test is expected not to be just runnable in an out-of-box configuration, but also to serve its purpose as much as possible (which is not always 100% given some tests require special build flavor, environment setup, etc); in other words, a test is to at least have all necessary VM flags within it and not to hope that someone will provide them. gc/z/TestSmallHeap.java does that, it explicitly selects zGC, so there is no need for -XX:+UseZGC to achieve that. 
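To make the cost above concrete, here is a minimal standalone sketch
(illustration only; the type names and page sizes are assumptions, not
ZGC's real code) of a two-size page cache in which a small-page miss
"flushes" a cached medium page on the allocation path:

  #include <cstdint>
  #include <deque>

  // Pages are cached per size class. When the small list is empty, a
  // cached medium page is split ("flushed") before the small allocation
  // can proceed -- the slow path behind the latency spikes described
  // above. A proactive policy would do this conversion off the
  // allocation path instead.
  struct PageCacheSketch {
    static const uint64_t kSmallKB  = 2 * 1024;
    static const uint64_t kMediumKB = 32 * 1024;

    std::deque<uint64_t> small_pages;   // addresses of cached small pages
    std::deque<uint64_t> medium_pages;  // addresses of cached medium pages

    bool alloc_small(uint64_t* out) {
      if (small_pages.empty()) {
        if (medium_pages.empty()) {
          return false;                 // would have to grow the heap
        }
        uint64_t m = medium_pages.front();
        medium_pages.pop_front();
        // Flush: in a real collector this is where the expensive
        // unmap/remap work happens.
        for (uint64_t off = 0; off < kMediumKB; off += kSmallKB) {
          small_pages.push_back(m + off);
        }
      }
      *out = small_pages.front();
      small_pages.pop_front();
      return true;
    }
  };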
Given that this test can be run only when zGC can be selected, it carries @requires vm.gc.Z, which is set to true if zGC is already explicitly selected, or if zGC is available and no other GC is specified; the latter holds for an out-of-box configuration (assuming that zGC is available in the JVM under test). Thus, again, you don't have to specify -XX:+UseZGC to run this test. So there are no "technical" reasons to run gc/z/TestSmallHeap.java (or any other gc/z tests) with -XX:+UseZGC. The proposed patches don't change that fact in any way. The patches exclude the tests that ignore external VM flags from execution if any significant VM flags are specified. gc/z/TestSmallHeap.java ignores all externally provided VM flags, including -XX:+UseZGC. And although in the case of -XX:+UseZGC it's harmless, in almost all other cases it's not. Just to give you a few examples: Let's say you are fixing a bug in zGC which could be reproduced by gc/z/TestSmallHeap.java. You came up with two alternative solutions, one of which is guarded by `if (UseNewCode)`. To test these solutions, you ran the gc/z tests twice: with -XX:+UseZGC -XX:+UseNewCode, and all tests passed; and with -XX:+UseZGC, and many tests (but not gc/z/TestSmallHeap.java) failed. So based on these results, you decided that the guarded solution is perfect, cleaned up the code, sent it out for review, got it pushed, and minutes later found out that gc/z/TestSmallHeap.java and some other tests which ignore VM flags failed. It would take you some time to realize that these tests had never actually exercised your UseNewCode solution. Had these tests been excluded from your testing, it would have been much easier for you to spot that and react accordingly. Here is another scenario: you decided to change the default value of ZUncommit, so you ran different tests with `-XX:+UseZGC -XX:-ZUncommit`, all green; you pushed a trivial s/true/false change in z_globals.hpp, and the next thing you knew, a bunch of zGC-specific tests failed in CI. And again, these were the tests that silently ignored `-XX:+UseZGC -XX:-ZUncommit`. Or a slight variation: zGC support was added to a future JIT, the gc/z tests were run with the flag combination which enabled that JIT, all passed, and victory was declared; N releases later, the default JIT got changed to the future JIT, and the next CI build was a disaster, with lots of tests failing from bugs which could have been found N/2 years earlier. Although I understand that it might take some getting used to for you and others who are accustomed to running gc/X tests with -XX:+Use${X}GC, I am certain that this will improve the overall quality of hotspot, save not only machine time (from running these tests with other flags) but also engineers' time spent analyzing surprising failures, and increase confidence and trust in the hotspot test suite. In a word, I can see how this can be a bit surprising (yet still less surprising than the current behavior), but I don't see it as incorrect; it just surfaces the limitations of certain tests. From my (slightly biased) point of view, it's the right thing to do. Thanks. -- Igor > On Jun 5, 2020, at 1:20 AM, Per Liden wrote: > > Hi Igor, > > When looking at the follow-up sub-tasks for this, I see for example this: > > http://cr.openjdk.java.net/~iignatyev/8246499/webrev.00/test/hotspot/jtreg/gc/z/TestSmallHeap.java.udiff.html > > Maybe I'm misunderstanding how this is supposed to work, but it looks like this test would now _not_ be executed if I do: > > make TEST=test/hotspot/jtreg/gc/z/TestSmallHeap.java JTREG="VM_OPTIONS=-XX:+UseZGC" > > Is that so?
In that case, that seems incorrect. > > cheers, > Per > > On 6/3/20 11:30 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 >>> 70 lines changed: 66 ins; 0 del; 4 mod >> Hi all, >> could you please review the patch which introduces a new @requires property to filter out the tests which ignore externally provided JVM flags? >> the idea behind this patch is to have a way to clearly mark tests which ignore flags, so >> a) it's obvious that they don't execute flag-guarded code/features, and extra care should be taken when using them to verify any flag-guarded changes; >> b) they can be easily excluded from runs w/ flags. >> @requires and VMProps allow us to achieve both, so it's been decided to add a new property `vm.flagless`. `vm.flagless` is set to false if there are any XX flags other than `-XX:MaxRAMPercentage` and `-XX:CreateCoredumpOnCrash` (which are known to be set almost always) or any X flags other than `-Xmixed`; in other words, any tests w/ `@requires vm.flagless` will be excluded from runs w/ any other X / XX flags passed via `-vmoption` / `-javaoption`. in rare cases, when one still wants to run the tests marked by `vm.flagless` w/ external flags, `vm.flagless` can be forcefully set to true by setting any value to the `TEST_VM_FLAGLESS` env. variable. >> this patch adds the necessary common changes and marks common tests, namely Scimark, GTestWrapper and TestNativeProcessBuilder. Component-specific tests will be marked separately by the corresponding subtasks of 8151707[1]. >> please note, the patch depends on CODETOOLS-7902336[2], which will be included in the next jtreg version, so this patch is to be integrated only after jtreg5.1 is promoted and we switch to using it by 8246387[3]. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8246494 >> webrev: http://cr.openjdk.java.net/~iignatyev//8246494/webrev.00 >> testing: marked tests w/ different XX and X flags, w/ and w/o the TEST_VM_FLAGLESS env. var, and w/o any flags >> [1] https://bugs.openjdk.java.net/browse/JDK-8151707 >> [2] https://bugs.openjdk.java.net/browse/CODETOOLS-7902336 >> [3] https://bugs.openjdk.java.net/browse/JDK-8246387 >> Thanks, >> -- Igor From jianglizhou at google.com Fri Jun 5 21:38:41 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Fri, 5 Jun 2020 14:38:41 -0700 Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC In-Reply-To: References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com> <80ada90e-598b-c7ca-3782-0a779e195291@oracle.com> <7157bbee-03e7-9fc7-47e0-a25d98970875@oracle.com> <863b1b7d-bb1a-ae3c-2990-eca26803788a@oracle.com> <81cd3cec-e39d-8a16-6e07-2ff0b0b80dc8@oracle.com> Message-ID: Hi Ioi, Thanks for the updated webrev. To avoid the assert, could you call GCLocker::stall_until_clear in MetaspaceShared::preload_and_dump() before grabbing the 'Heap_lock' and VMThread::execute(&op), like the following. It is best to check with Thomas and other GC experts about the interaction between 'Heap_lock' and 'GCLocker'. if (HeapShared::is_heap_object_archiving_allowed()) { GCLocker::stall_until_clear(); } There is also a compiler error caused by the extra 'THREAD' arg for MutexLocker. Please fix: mutexLocker.hpp:205:22: note: no known conversion for argument 1 from 'Thread*' to 'Mutex*' 205 | MutexLocker(Mutex* mutex, Thread* thread, Mutex::SafepointCheckFlag flag = Mutex::_safepoint_check_flag) : | ~~~~~~~^~~~~ mutexLocker.hpp:191:3: note: candidate: 'MutexLocker::MutexLocker(Mutex*, Mutex::SafepointCheckFlag)'
191 | MutexLocker(Mutex* mutex, Mutex::SafepointCheckFlag flag = Mutex::_safepoint_check_flag) : ... (rest of output omitted) Best, Jiangli On Thu, Jun 4, 2020 at 9:37 PM Ioi Lam wrote: > > Hi Jiangli, > > Updated webrev is here: > > http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v03/ > > The only difference with the previous version is this part: > > 1975 MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ? > 1976 Heap_lock : NULL); // needed for > collect_as_vm_thread > > Thanks > - Ioi > > > On 6/4/20 5:23 PM, Jiangli Zhou wrote: > > Ioi, do you have a new webrev? > > > > Best, > > Jiangli > > > > On Thu, Jun 4, 2020 at 4:54 PM Ioi Lam wrote: > >> > >> > >> On 6/4/20 12:04 PM, Jiangli Zhou wrote: > >>> On Wed, Jun 3, 2020 at 10:56 PM Ioi Lam wrote: > >>>>> On 5/29/20 9:40 PM, Yumin Qi wrote: > >>>>>> HI, Ioi > >>>>>> > >>>>>> If the allocation of EDEN happens between GC and dump, should we > >>>>>> put the GC action in VM_PopulateDumpSharedSpace? This way, at > >>>>>> safepoint there should no allocation happens. The stack trace showed > >>>>>> it happened with a Java Thread, which should be blocked at safepoint. > >>>>>> > >>>>> Hi Yumin, > >>>>> > >>>>> I think GCs cannot be executed inside a safepoint, because some parts > >>>>> of GC need to execute in a safepoint, so they will be blocked until > >>>>> VM_PopulateDumpSharedSpace::doit has returned. > >>>>> > >>>>> Anyway, as I mentioned in my reply to Jiangli, there's a better way to > >>>>> fix this, so I will withdraw the current patch. > >>>>> > >>>> Hi Yumin, > >>>> > >>>> Actually, I changed my mind again, and implemented your suggestion :-) > >>>> > >>>> There's actually a way to invoke GC inside a safepoint (it's used by > >>>> "jcmd gc.heap_dump", for example). So I changed the CDS code to do the > >>>> same thing. It's a much simpler change and does what I want -- no other > >>>> thread will be able to make any heap allocation after the GC has > >>>> completed, so no EDEN region will be allocated: > >>>> > >>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v02/ > >>> Going with the simple approach for the short term sounds ok. > >>> > >>> 1612 void VM_PopulateDumpSharedSpace::doit() { > >>> ... > >>> 1615 if (GCLocker::is_active()) { > >>> 1616 // This should rarely happen during -Xshare:dump, if at > >>> all, but just to be safe. > >>> 1617 log_debug(cds)("GCLocker::is_active() ... try again"); > >>> 1618 return; > >>> 1619 } > >>> > >>> MetaspaceShared::preload_and_dump() > >>> ... > >>> 1945 while (true) { > >>> 1946 { > >>> 1947 MutexLocker ml(THREAD, Heap_lock); > >>> 1948 VMThread::execute(&op); > >>> 1949 } > >>> 1950 // If dumping has finished, the VM would have exited. The > >>> only reason to > >>> 1951 // come back here is to wait for the GCLocker. > >>> 1952 assert(HeapShared::is_heap_object_archiving_allowed(), "sanity"); > >>> 1953 os::naked_short_sleep(1); > >>> 1954 } > >>> 1955 } > >>> > >>> > >>> Instead of doing the while/retry, calling > >>> GCLocker::stall_until_clear() in MetaspaceShared::preload_and_dump > >>> before VM_PopulateDumpSharedSpace probably is much cleaner? 
> >> Hi Jiangli, I tried your suggestion, but GCLocker::stall_until_clear() > >> cannot be called in the VM thread: > >> > >> # assert(thread->is_Java_thread()) failed: just checking > >> > >>> Please also add a comment in MetaspaceShared::preload_and_dump to > >>> explain that Universe::heap()->collect_as_vm_thread expects that > >>> Heap_lock is already held by the thread, and that's the reason for the > >>> call to MutexLocker ml(THREAD, Heap_lock). > >> I changed the code to be like this: > >> > >> MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ? > >> Heap_lock : NULL); // needed for > >> collect_as_vm_thread > >> > >>> Do you also need to call > >>> Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true) > >>> before GC? > >> There's no need: > >> > >> void CollectedHeap::collect_as_vm_thread(GCCause::Cause cause) { > >> ... > >> case GCCause::_archive_time_gc: > >> case GCCause::_metadata_GC_clear_soft_refs: { > >> HandleMark hm; > >> do_full_collection(true); // do clear all soft refs > >> > >> Thanks > >> - Ioi > >> > >>> Best, > >>> Jiangli > >>> > >>>> The check for "if (GCLocker::is_active())" should almost always be > >>>> false. I left it there just for safety: > >>>> > >>>> During -Xshare:dump, we execute Java code to build the module graph, > >>>> load classes, etc. So theoretically someone could try to parallelize > >>>> some of that Java code in the future. Theoretically when CDS has entered > >>>> the safepoint, another thread could be in the middle a JNI method that > >>>> has held the GCLock. > >>>> > >>>> Thanks > >>>> - Ioi > >>>> > >>>>>> On 5/29/20 7:29 PM, Ioi Lam wrote: > >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245925 > >>>>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/ > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> Summary: > >>>>>>> > >>>>>>> CDS supports archived heap objects only for G1. During -Xshare:dump, > >>>>>>> CDS executes a full GC so that G1 will compact the heap regions, > >>>>>>> leaving > >>>>>>> maximum contiguous free space at the top of the heap. Then, the > >>>>>>> archived > >>>>>>> heap regions are allocated from the top of the heap. > >>>>>>> > >>>>>>> Under some circumstances, java.lang.ref.Cleaners will execute > >>>>>>> after the GC has completed. The cleaners may allocate or > >>>>>>> synchronized, which > >>>>>>> will cause G1 to allocate an EDEN region at the top of the heap. > >>>>>>> > >>>>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN > >>>>>>> regions exist, > >>>>>>> exit the safepoint, run GC, and try again. Eventually all the > >>>>>>> cleaners will > >>>>>>> be executed and no more allocation can happen. > >>>>>>> > >>>>>>> For safety, I limit the retry count to 30 (or about total 9 seconds). > >>>>>>> > >>>>>>> Thanks > >>>>>>> - Ioi > >>>>>>> > >>>>>>> > >>>>>>> > From ioi.lam at oracle.com Sat Jun 6 05:51:31 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Fri, 5 Jun 2020 22:51:31 -0700 Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC In-Reply-To: References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com> <80ada90e-598b-c7ca-3782-0a779e195291@oracle.com> <7157bbee-03e7-9fc7-47e0-a25d98970875@oracle.com> <863b1b7d-bb1a-ae3c-2990-eca26803788a@oracle.com> <81cd3cec-e39d-8a16-6e07-2ff0b0b80dc8@oracle.com> Message-ID: On 6/5/20 2:38 PM, Jiangli Zhou wrote: > Hi Ioi, > > Thanks for the updated webrev. 
> > To avoid the assert, could you call GCLocker::stall_until_clear in > MetaspaceShared::preload_and_dump() before grabbing the 'Heap_lock' > and VMThread::execute(&op), like following. It is the best to check > with Thomas and other GC experts about the interaction between > 'Heap_lock' and 'GCLocker'. > > if (HeapShared::is_heap_object_archiving_allowed()) { > GCLocker::stall_until_clear(); > } Hi Jiangli, Thanks for the review. I don't think this is sufficient. Immediately after GCLocker::stall_until_clear() has returned, another thread can invoke a JNI function that will activate the GCLocker. That's stated by the gcLocker.hpp comments: http://hg.openjdk.java.net/jdk/jdk/file/882b61be2c19/src/hotspot/share/gc/shared/gcLocker.hpp#l109 I've pinged the GC team for this to get their opinion on this. > There is also a compiler error caused by the extra 'THREAD' arg for > MutexLocker. Please fix: I think you are using an older repo. The MutexLocker constructor has been changed since Jan 2020: http://hg.openjdk.java.net/jdk/jdk/annotate/882b61be2c19/src/hotspot/share/runtime/mutexLocker.hpp#l210 Thanks - Ioi > mutexLocker.hpp:205:22: note: no known conversion for argument 1 > from ?Thread*? to ?Mutex*? > > 205 | MutexLocker(Mutex* mutex, Thread* thread, > Mutex::SafepointCheckFlag flag = Mutex::_safepoint_check_flag) : > > | ~~~~~~~^~~~~ > > mutexLocker.hpp:191:3: note: candidate: > ?MutexLocker::MutexLocker(Mutex*, Mutex::SafepointCheckFlag)? > > 191 | MutexLocker(Mutex* mutex, Mutex::SafepointCheckFlag flag = > Mutex::_safepoint_check_flag) : > > ... (rest of output omitted) > > Best, > Jiangli > > On Thu, Jun 4, 2020 at 9:37 PM Ioi Lam wrote: >> Hi Jiangli, >> >> Updated webrev is here: >> >> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v03/ >> >> The only difference with the previous version is this part: >> >> 1975 MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ? >> 1976 Heap_lock : NULL); // needed for >> collect_as_vm_thread >> >> Thanks >> - Ioi >> >> >> On 6/4/20 5:23 PM, Jiangli Zhou wrote: >>> Ioi, do you have a new webrev? >>> >>> Best, >>> Jiangli >>> >>> On Thu, Jun 4, 2020 at 4:54 PM Ioi Lam wrote: >>>> >>>> On 6/4/20 12:04 PM, Jiangli Zhou wrote: >>>>> On Wed, Jun 3, 2020 at 10:56 PM Ioi Lam wrote: >>>>>>> On 5/29/20 9:40 PM, Yumin Qi wrote: >>>>>>>> HI, Ioi >>>>>>>> >>>>>>>> If the allocation of EDEN happens between GC and dump, should we >>>>>>>> put the GC action in VM_PopulateDumpSharedSpace? This way, at >>>>>>>> safepoint there should no allocation happens. The stack trace showed >>>>>>>> it happened with a Java Thread, which should be blocked at safepoint. >>>>>>>> >>>>>>> Hi Yumin, >>>>>>> >>>>>>> I think GCs cannot be executed inside a safepoint, because some parts >>>>>>> of GC need to execute in a safepoint, so they will be blocked until >>>>>>> VM_PopulateDumpSharedSpace::doit has returned. >>>>>>> >>>>>>> Anyway, as I mentioned in my reply to Jiangli, there's a better way to >>>>>>> fix this, so I will withdraw the current patch. >>>>>>> >>>>>> Hi Yumin, >>>>>> >>>>>> Actually, I changed my mind again, and implemented your suggestion :-) >>>>>> >>>>>> There's actually a way to invoke GC inside a safepoint (it's used by >>>>>> "jcmd gc.heap_dump", for example). So I changed the CDS code to do the >>>>>> same thing. 
It's a much simpler change and does what I want -- no other >>>>>> thread will be able to make any heap allocation after the GC has >>>>>> completed, so no EDEN region will be allocated: >>>>>> >>>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v02/ >>>>> Going with the simple approach for the short term sounds ok. >>>>> >>>>> 1612 void VM_PopulateDumpSharedSpace::doit() { >>>>> ... >>>>> 1615 if (GCLocker::is_active()) { >>>>> 1616 // This should rarely happen during -Xshare:dump, if at >>>>> all, but just to be safe. >>>>> 1617 log_debug(cds)("GCLocker::is_active() ... try again"); >>>>> 1618 return; >>>>> 1619 } >>>>> >>>>> MetaspaceShared::preload_and_dump() >>>>> ... >>>>> 1945 while (true) { >>>>> 1946 { >>>>> 1947 MutexLocker ml(THREAD, Heap_lock); >>>>> 1948 VMThread::execute(&op); >>>>> 1949 } >>>>> 1950 // If dumping has finished, the VM would have exited. The >>>>> only reason to >>>>> 1951 // come back here is to wait for the GCLocker. >>>>> 1952 assert(HeapShared::is_heap_object_archiving_allowed(), "sanity"); >>>>> 1953 os::naked_short_sleep(1); >>>>> 1954 } >>>>> 1955 } >>>>> >>>>> >>>>> Instead of doing the while/retry, calling >>>>> GCLocker::stall_until_clear() in MetaspaceShared::preload_and_dump >>>>> before VM_PopulateDumpSharedSpace probably is much cleaner? >>>> Hi Jiangli, I tried your suggestion, but GCLocker::stall_until_clear() >>>> cannot be called in the VM thread: >>>> >>>> # assert(thread->is_Java_thread()) failed: just checking >>>> >>>>> Please also add a comment in MetaspaceShared::preload_and_dump to >>>>> explain that Universe::heap()->collect_as_vm_thread expects that >>>>> Heap_lock is already held by the thread, and that's the reason for the >>>>> call to MutexLocker ml(THREAD, Heap_lock). >>>> I changed the code to be like this: >>>> >>>> MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ? >>>> Heap_lock : NULL); // needed for >>>> collect_as_vm_thread >>>> >>>>> Do you also need to call >>>>> Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true) >>>>> before GC? >>>> There's no need: >>>> >>>> void CollectedHeap::collect_as_vm_thread(GCCause::Cause cause) { >>>> ... >>>> case GCCause::_archive_time_gc: >>>> case GCCause::_metadata_GC_clear_soft_refs: { >>>> HandleMark hm; >>>> do_full_collection(true); // do clear all soft refs >>>> >>>> Thanks >>>> - Ioi >>>> >>>>> Best, >>>>> Jiangli >>>>> >>>>>> The check for "if (GCLocker::is_active())" should almost always be >>>>>> false. I left it there just for safety: >>>>>> >>>>>> During -Xshare:dump, we execute Java code to build the module graph, >>>>>> load classes, etc. So theoretically someone could try to parallelize >>>>>> some of that Java code in the future. Theoretically when CDS has entered >>>>>> the safepoint, another thread could be in the middle a JNI method that >>>>>> has held the GCLock. >>>>>> >>>>>> Thanks >>>>>> - Ioi >>>>>> >>>>>>>> On 5/29/20 7:29 PM, Ioi Lam wrote: >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245925 >>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/ >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Summary: >>>>>>>>> >>>>>>>>> CDS supports archived heap objects only for G1. During -Xshare:dump, >>>>>>>>> CDS executes a full GC so that G1 will compact the heap regions, >>>>>>>>> leaving >>>>>>>>> maximum contiguous free space at the top of the heap. Then, the >>>>>>>>> archived >>>>>>>>> heap regions are allocated from the top of the heap. 
>>>>>>>>> >>>>>>>>> Under some circumstances, java.lang.ref.Cleaners will execute >>>>>>>>> after the GC has completed. The cleaners may allocate or >>>>>>>>> synchronized, which >>>>>>>>> will cause G1 to allocate an EDEN region at the top of the heap. >>>>>>>>> >>>>>>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN >>>>>>>>> regions exist, >>>>>>>>> exit the safepoint, run GC, and try again. Eventually all the >>>>>>>>> cleaners will >>>>>>>>> be executed and no more allocation can happen. >>>>>>>>> >>>>>>>>> For safety, I limit the retry count to 30 (or about total 9 seconds). >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> - Ioi >>>>>>>>> >>>>>>>>> >>>>>>>>> From kim.barrett at oracle.com Sun Jun 7 05:08:17 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Sun, 7 Jun 2020 01:08:17 -0400 Subject: RFR: 8246718: ParallelGC should not check for forward objects for copy task queue Message-ID: <7367A959-85C4-4CA7-9B4C-822ACD68AD63@oracle.com> Please review this change to the handling of copy tasks by ParallelGC. Formerly, before adding a task entry to the queue it would check whether the referenced object was already forwarded. If so, it would handle the reference immediately, inline, rather than pushing the task onto the queue. Measurements show that a no-worse and sometimes better (depending on the hardware configuration) approach for most applications is to not do the forwarding check, but to instead prefetch the start of the referenced object and then always push the task. The corresponding G1 code does a for-write prefetch on the mark word of the object, and a for-read prefetch past the object header. Measurements with ParallelGC found the for-read prefetch not very productive, and possibly even slightly counter-productive, so this change only does the for-write prefetch on the mark word. CR: https://bugs.openjdk.java.net/browse/JDK-8246718 Webrev: https://cr.openjdk.java.net/~kbarrett/8246718/ Testing: mach5 tier1-5 various performance tests From thomas.schatzl at oracle.com Sun Jun 7 10:25:44 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Sun, 7 Jun 2020 12:25:44 +0200 Subject: RFR: 8246718: ParallelGC should not check for forward objects for copy task queue In-Reply-To: <7367A959-85C4-4CA7-9B4C-822ACD68AD63@oracle.com> References: <7367A959-85C4-4CA7-9B4C-822ACD68AD63@oracle.com> Message-ID: <1af58421-25d3-666b-6a06-0167f20fa87b@oracle.com> Hi, On 07.06.20 07:08, Kim Barrett wrote: > Please review this change to the handling of copy tasks by ParallelGC. > Formerly, before adding a task entry to the queue it would check > whether the referenced object was already forwarded. If so, it would > handle the reference immediately, inline, rather than pushing the task > onto the queue. > > Measurements show that a no-worse and sometimes better (depending on > the hardware configuration) approach for most applications is to not > do the forwarding check, but to instead prefetch the start of the > referenced object and then always push the task. > > The corresponding G1 code does a for-write prefetch on the mark word > of the object, and a for-read prefetch past the object header. > Measurements with ParallelGC found the for-read prefetch not very > productive, and possibly even slightly counter-productive, so this > change only does the for-write prefetch on the mark word. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8246718 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8246718/ > > Testing: > mach5 tier1-5 > various performance tests looks good. 
Good find! :) Thomas From kim.barrett at oracle.com Sun Jun 7 20:54:34 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Sun, 7 Jun 2020 16:54:34 -0400 Subject: RFR: 8246718: ParallelGC should not check for forward objects for copy task queue In-Reply-To: <1af58421-25d3-666b-6a06-0167f20fa87b@oracle.com> References: <7367A959-85C4-4CA7-9B4C-822ACD68AD63@oracle.com> <1af58421-25d3-666b-6a06-0167f20fa87b@oracle.com> Message-ID: > On Jun 7, 2020, at 6:25 AM, Thomas Schatzl wrote: > > Hi, > > On 07.06.20 07:08, Kim Barrett wrote: >> Please review this change to the handling of copy tasks by ParallelGC. >> Formerly, before adding a task entry to the queue it would check >> whether the referenced object was already forwarded. If so, it would >> handle the reference immediately, inline, rather than pushing the task >> onto the queue. >> Measurements show that a no-worse and sometimes better (depending on >> the hardware configuration) approach for most applications is to not >> do the forwarding check, but to instead prefetch the start of the >> referenced object and then always push the task. >> The corresponding G1 code does a for-write prefetch on the mark word >> of the object, and a for-read prefetch past the object header. >> Measurements with ParallelGC found the for-read prefetch not very >> productive, and possibly even slightly counter-productive, so this >> change only does the for-write prefetch on the mark word. >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8246718 >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8246718/ >> Testing: >> mach5 tier1-5 >> various performance tests > > looks good. Good find! :) > > Thomas Thanks. From erik.osterlund at oracle.com Mon Jun 8 06:59:30 2020 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 8 Jun 2020 08:59:30 +0200 Subject: RFR: 8246718: ParallelGC should not check for forward objects for copy task queue In-Reply-To: <7367A959-85C4-4CA7-9B4C-822ACD68AD63@oracle.com> References: <7367A959-85C4-4CA7-9B4C-822ACD68AD63@oracle.com> Message-ID: <345c44c5-740d-a079-f20e-c11181f6b696@oracle.com> Hi Kim, Looks good. Thanks, /Erik On 2020-06-07 07:08, Kim Barrett wrote: > Please review this change to the handling of copy tasks by ParallelGC. > Formerly, before adding a task entry to the queue it would check > whether the referenced object was already forwarded. If so, it would > handle the reference immediately, inline, rather than pushing the task > onto the queue. > > Measurements show that a no-worse and sometimes better (depending on > the hardware configuration) approach for most applications is to not > do the forwarding check, but to instead prefetch the start of the > referenced object and then always push the task. > > The corresponding G1 code does a for-write prefetch on the mark word > of the object, and a for-read prefetch past the object header. > Measurements with ParallelGC found the for-read prefetch not very > productive, and possibly even slightly counter-productive, so this > change only does the for-write prefetch on the mark word. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8246718 > > Webrev: > https://cr.openjdk.java.net/~kbarrett/8246718/ > > Testing: > mach5 tier1-5 > various performance tests > From ofirg6 at gmail.com Mon Jun 8 07:56:34 2020 From: ofirg6 at gmail.com (Ofir Gordon) Date: Mon, 8 Jun 2020 10:56:34 +0300 Subject: Logging of all GC memory access Message-ID: Hello, Is there a way to enable logging of any access to memory that the gc process is performing? 
For the purpose of theoretical time analysis. Is there a way to do this using flags or with some external tool? Or is there a specific place in the code where adding a few lines would enable such logging? I'm familiar with the GC logging flags which provide information about the execution of collections, but I'm specifically looking for information about accesses to memory. (I'm working with the jdk-14 source code, if it's relevant). Thanks, Ofir From thomas.schatzl at oracle.com Mon Jun 8 09:17:29 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 8 Jun 2020 11:17:29 +0200 Subject: Logging of all GC memory access In-Reply-To: References: Message-ID: Hi, On 08.06.20 09:56, Ofir Gordon wrote: > Hello, > > Is there a way to enable logging of any access to memory that the gc > process is performing? For the purpose of theoretical time analysis. "Process" as in operating system process, or the general actions taken during gc? > Is there a way to do this using flags or with some external tool? Or is > there a specific place in the code where adding a few lines would enable > such logging? Your best option (as far as I understand the problem), if you want all accesses, is probably to use your machine's hardware performance counters to collect memory accesses, aggregating them by stack trace. On Linux there is the perf tool, and several vendors provide profilers that can read them (e.g. Intel VTune), some of them open source. A search for "performance counter" on one of the big development platforms (e.g. GitHub) gives hundreds of hits for potentially interesting tools that provide access to them. > I'm familiar with the GC logging flags which provide information about > the execution of collections, but I'm specifically looking for information > about accesses to memory. > > (I'm working with the jdk-14 source code, if it's relevant). In the case of accesses to the Java heap (and not general memory accesses) by the VM (that does not include compiled code, but includes "other" accesses), you could probably look at whether hooking into the Access class and its derivatives (RawAccess, HeapAccess, ArrayAccess) helps. > > Thanks, > Ofir > Hth, Thomas From aph at redhat.com Mon Jun 8 09:21:25 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 8 Jun 2020 10:21:25 +0100 Subject: Logging of all GC memory access In-Reply-To: References: Message-ID: <17cd9553-2379-fc73-19d5-58508c08db03@redhat.com> On 08/06/2020 10:17, Thomas Schatzl wrote: > In the case of accesses to the Java heap (and not general memory accesses) > by the VM (that does not include compiled code, but includes "other" > accesses), you could probably look at whether hooking into the Access class > and its derivatives (RawAccess, HeapAccess, ArrayAccess) helps. I was composing another reply. I'd add that I'd create a ring buffer and add logging to G1ParScanThreadState::copy_to_survivor_space and its friends. But it depends on exactly what OP wants. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
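To make the two suggestions above more concrete (hooking the heap-access paths, and logging into a ring buffer), here is a minimal, self-contained sketch of the logging side only. The hook-up into HotSpot's Access/RawAccess templates (or into copy_to_survivor_space) is deliberately left out, and all names below are made up for illustration:

  // Tiny fixed-size ring buffer recording (address, read/write) pairs,
  // to be called from instrumented heap-access paths. Old entries are
  // overwritten, so only the most recent N accesses are kept.
  #include <atomic>
  #include <cstdint>
  #include <cstdio>

  struct AccessRecord {
    uintptr_t addr;
    bool      is_write;
  };

  class AccessRingBuffer {
    static const size_t N = 1 << 16;          // capacity, power of two
    AccessRecord          _buf[N];
    std::atomic<uint64_t> _next{0};

  public:
    void record(uintptr_t addr, bool is_write) {
      uint64_t i = _next.fetch_add(1, std::memory_order_relaxed);
      _buf[i & (N - 1)] = {addr, is_write};   // wrap around, overwrite oldest
    }

    void dump(FILE* out) const {
      uint64_t end = _next.load(std::memory_order_relaxed);
      uint64_t begin = end > N ? end - N : 0;
      for (uint64_t i = begin; i < end; i++) {
        const AccessRecord& r = _buf[i & (N - 1)];
        fprintf(out, "%c 0x%zx\n", r.is_write ? 'W' : 'R', (size_t)r.addr);
      }
    }
  };

A lock-free counter plus wrap-around keeps the overhead per logged access very low, which matters if the goal is timing analysis rather than perturbing the very behavior being measured.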
From kim.barrett at oracle.com Mon Jun 8 10:16:00 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 8 Jun 2020 06:16:00 -0400 Subject: RFR: 8246718: ParallelGC should not check for forward objects for copy task queue In-Reply-To: <345c44c5-740d-a079-f20e-c11181f6b696@oracle.com> References: <7367A959-85C4-4CA7-9B4C-822ACD68AD63@oracle.com> <345c44c5-740d-a079-f20e-c11181f6b696@oracle.com> Message-ID: <9B488828-39A1-414D-B6DF-3E0D95DCC62A@oracle.com> > On Jun 8, 2020, at 2:59 AM, Erik Österlund wrote: > > Hi Kim, > > Looks good. Thanks. > > Thanks, > /Erik > > On 2020-06-07 07:08, Kim Barrett wrote: >> Please review this change to the handling of copy tasks by ParallelGC. >> Formerly, before adding a task entry to the queue it would check >> whether the referenced object was already forwarded. If so, it would >> handle the reference immediately, inline, rather than pushing the task >> onto the queue. >> >> Measurements show that a no-worse and sometimes better (depending on >> the hardware configuration) approach for most applications is to not >> do the forwarding check, but to instead prefetch the start of the >> referenced object and then always push the task. >> >> The corresponding G1 code does a for-write prefetch on the mark word >> of the object, and a for-read prefetch past the object header. >> Measurements with ParallelGC found the for-read prefetch not very >> productive, and possibly even slightly counter-productive, so this >> change only does the for-write prefetch on the mark word. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8246718 >> >> Webrev: >> https://cr.openjdk.java.net/~kbarrett/8246718/ >> >> Testing: >> mach5 tier1-5 >> various performance tests >> > From ofirg6 at gmail.com Mon Jun 8 12:12:09 2020 From: ofirg6 at gmail.com (Ofir Gordon) Date: Mon, 8 Jun 2020 15:12:09 +0300 Subject: Logging of all GC memory access In-Reply-To: <17cd9553-2379-fc73-19d5-58508c08db03@redhat.com> References: <17cd9553-2379-fc73-19d5-58508c08db03@redhat.com> Message-ID: Thank you both for your answers. I was referring to the memory accesses taking place during the gc, and also specifically access to the heap (i.e. access to an object on the heap as part of the marking process, etc.) Also, are you familiar with a way to get a "snapshot" of the heap at any gc activation? Log the addresses where objects are placed, and the pointers between them? Is such a thing possible? On Mon, Jun 8, 2020 at 12:22 Andrew Haley wrote: > On 08/06/2020 10:17, Thomas Schatzl wrote: > > In the case of accesses to the Java heap (and not general memory accesses) > > by the VM (that does not include compiled code, but includes "other" > > accesses), you could probably look at whether hooking into the Access class > > and its derivatives (RawAccess, HeapAccess, ArrayAccess) helps. > > I was composing another reply. I'd add that I'd create a ring buffer and > add logging to G1ParScanThreadState::copy_to_survivor_space and its > friends. > But it depends on exactly what OP wants. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
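For the "snapshot" question, one low-tech option inside the VM is a whole-heap object walk at a safepoint, logging each object's address plus its outgoing references. A rough sketch against HotSpot's iteration closures follows; the exact closure classes and signatures vary by JDK version, so treat this as pseudo-code built on the real building blocks (ObjectClosure, oop_iterate, object_iterate), not a drop-in patch:

  // HotSpot-internal sketch (needs the usual includes; simplified).
  class SnapshotOopClosure : public BasicOopIterateClosure {
  public:
    oop _holder;
    void do_oop(oop* p)       { log_edge(p); }
    void do_oop(narrowOop* p) { log_edge(p); }
  private:
    template <typename T> void log_edge(T* p) {
      oop referent = RawAccess<>::oop_load(p);
      if (referent != NULL) {
        log_info(gc)("edge " PTR_FORMAT " -> " PTR_FORMAT,
                     p2i(_holder), p2i(referent));
      }
    }
  };

  class SnapshotObjectClosure : public ObjectClosure {
    SnapshotOopClosure _oops;
  public:
    void do_object(oop obj) {
      log_info(gc)("obj " PTR_FORMAT " size %d (words) %s",
                   p2i(obj), obj->size(), obj->klass()->external_name());
      _oops._holder = obj;
      obj->oop_iterate(&_oops);   // visit all reference fields of obj
    }
  };

  // Usage, e.g. from a safepoint in the collector:
  //   SnapshotObjectClosure cl;
  //   Universe::heap()->object_iterate(&cl);

This only sees live-at-walk-time objects and is expensive on large heaps, but it directly produces the "addresses plus pointers between them" graph asked about above.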
From thomas.schatzl at oracle.com Mon Jun 8 14:12:37 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 8 Jun 2020 16:12:37 +0200 Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC In-Reply-To: References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com> <80ada90e-598b-c7ca-3782-0a779e195291@oracle.com> <7157bbee-03e7-9fc7-47e0-a25d98970875@oracle.com> <863b1b7d-bb1a-ae3c-2990-eca26803788a@oracle.com> <81cd3cec-e39d-8a16-6e07-2ff0b0b80dc8@oracle.com> Message-ID: Hi, On 06.06.20 07:51, Ioi Lam wrote: > > > On 6/5/20 2:38 PM, Jiangli Zhou wrote: >> Hi Ioi, >> >> Thanks for the updated webrev. >> >> To avoid the assert, could you call GCLocker::stall_until_clear in >> MetaspaceShared::preload_and_dump() before grabbing the 'Heap_lock' >> and VMThread::execute(&op), like the following. It is best to check >> with Thomas and other GC experts about the interaction between >> 'Heap_lock' and 'GCLocker'. >> >> if (HeapShared::is_heap_object_archiving_allowed()) { >> GCLocker::stall_until_clear(); >> } > Hi Jiangli, > > Thanks for the review. > > I don't think this is sufficient. Immediately after > GCLocker::stall_until_clear() has returned, another thread can invoke a > JNI function that will activate the GCLocker. That's stated by the > gcLocker.hpp comments: > > http://hg.openjdk.java.net/jdk/jdk/file/882b61be2c19/src/hotspot/share/gc/shared/gcLocker.hpp#l109 The GCLocker::stall_until_clear()/GCLocker::is_active() loop seems to try to achieve something different than what the CR tries to fix. The change without that loop/GCLocker::stall_until_clear() already fully ensures that there is no allocation between compaction and the CDS dump (at least for STW collectors, or collectors that can do a "full" collection during an STW pause). This additional code to wait for the GCLocker being inactive, in order to always get a contiguous range of memory, is a different matter: Using GCLocker::stall_until_clear() does not work: it would only wait for completion of all JNI critical sections if a gc had been requested while these JNI critical sections were active. Otherwise it would do nothing and let the caller continue, with some threads still in a critical section and the GCLocker held. This means that the following full gc may still "fail". (That gc would set needs_gc and then it would work. But since you did not know whether the first full gc was successful, you would need another one. Since at that point the return from the last JNI CS would trigger a young gc, you have two extra gcs in the common case...). Another serious problem is that the stall_until_clear() code is written to be called only in a Java thread. I do not think it works in the VM thread without a lot more thought (and at least some changes to asserts). As Jiangli mentions, there is likely also an issue with the ordering of Heap_lock and JNI_critical_lock. The GCLocker::is_active() loop is a bit better, as it makes sure that all threads have left their JNI critical sections, and since we are in a safepoint, when exiting the VM call these threads will block (and not be able to enter another critical section). So the full gc will likely be able to move away all objects from eden regions. However: - this part of the code is bound to break with a change of G1 to pin (eden) regions instead of using the GCLocker (I am aware that it is very g1 specific already). One option to decrease the direct reliance on the GCLocker a bit could be to have CollectedHeap::collect_as_vm_thread return a bool about whether the collection could be executed (or whether there is a pinned eden region?); a sketch of this shape follows after this mail. Since the only reason to abort is the GCLocker for all STW gcs at that point, this boils down to the same loop without directly using the GCLocker here. - one other issue with this piece of code is if a thread safepoints/blocks for another reason while being in a critical section. Then we have a deadlock. This is prohibited by the JNI spec ("In a code segment enclosed by Get/Release*Critical the native code must not ... or cause the current thread to block"), but it would be very hard to analyze, so I recommend at least erroring out (or in this case continuing with a suboptimal heap) after too many retries, or at least logging some warning that you are stalled because the GCLocker is held. Otherwise this error state will be too hard to understand and likely very, very hard to reproduce. - this seems to be an optimization only, if I understand the situation correctly. So overall, for this change I would tend to suggest only fixing the bug, i.e. removing the retries of any kind, not trying to optimize the memory layout here, and thinking about this in an extra CR. Thanks, Thomas
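As a sketch of the collect_as_vm_thread suggestion above (stand-in names, not the real HotSpot signatures; the retry and abort policy stays with the caller):

  // Sketch only: let the collection entry point report whether it ran,
  // keeping GCLocker knowledge inside the GC.
  #include <cstdio>

  // Stand-in for CollectedHeap::collect_as_vm_thread() changed to return a
  // status: false e.g. when a thread holds the GCLocker in a JNI critical
  // section, so the full collection could not be executed.
  bool collect_as_vm_thread_returning_status() {
    // ... run the full collection if nothing blocks it ...
    return true;
  }

  // Caller: bounded retry instead of an unbounded GCLocker poll loop, so a
  // thread stuck in a critical section produces a warning, not a silent hang.
  void run_full_gc_for_dump() {
    const int max_retries = 30;
    for (int i = 0; i < max_retries; i++) {
      if (collect_as_vm_thread_returning_status()) {
        return;  // heap compacted; archive regions can be allocated at the top
      }
      // GC was refused; yield briefly and try again (in HotSpot terms,
      // something like os::naked_short_sleep(1)).
    }
    fprintf(stderr, "warning: full GC never ran; continuing with suboptimal heap layout\n");
  }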
From thomas.schatzl at oracle.com Mon Jun 8 14:49:34 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 8 Jun 2020 16:49:34 +0200 Subject: RFR (L): 8244603 and 8238858: Improve young gen sizing In-Reply-To: References: Message-ID: Hi all, ping! Still looking for reviews... Thanks, Thomas On 19.05.20 15:37, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that improves young gen sizing a > lot, to prepare for heap shrinking during young gc (JDK-8238687) ;) > > In particular, with this change the following two related issues > are fixed: > > * 8238858: G1 Mixed gc young gen sizing might cause the first mixed gc > to immediately follow the prepare mixed gc > * 8244603: G1 incorrectly limiting young gen size when using the reserve > can result in repeated full gcs > > These have been grouped together because it is too hard to separate them > out, as fixing the bugs required a significant rewrite of the young gen sizing. > > This results in G1 following GCTimeRatio much better than before, > leading to less erratic heap expansion. That is, constant loads do not > result in that much overcommit any more. > > Some background: > > At the end of gc, and when the remembered set sampling thread (re-)evaluates > the young gen, G1 calculates two values: > > - the *desired* (unconstrained) size of the young gen; > desired/unconstrained meaning that these values are not limited by > actually existing free regions. This value is interesting for adaptive > IHOP so that it (better) converges to a fixed value. (There is a > bugfix, JDK-8238163, coming for this to fix a few issues with that) > > - the actual *target* size for the young gen, i.e. after taking > constraints in available free regions into account, and whether we are > currently already needing to use parts of the reserve or not. > > Some problems in the current code that this change fixes: > > - during calculation of the *desired* young gen length G1 basically > sizes the young gen during mixed gc to the minimum allowed size always. > This causes unnecessary spikes in short/long term ratios, causing lots > of heap increases even with a constant load.
> Since we do not shrink the heap yet during regular gcs, this typically > ended up in fully expanding the heap (unless the reclamations during > Remark actually reclaimed something, but the equilibrium of committed > heap between these two mechanisms is much higher). > > E.g. on specjbb2015 fixed IR (constant load) with "small" and "medium" > load G1 will use half the heap now while staying < GCTimeRatio. > > - at the end of gc g1 calculates the young gen for the *next* gc, i.e. > during the prepare mixed gc g1 should already use the "reduced" amount > of regions (that's JDK-8238858); similarly the *last* mixed gc in the > mixed gc phase should already use the calculation for the young phase. > The current code does not. This partially fixes some "at end of mixed gc > it takes a while for g1 to achieve the previous young gen size again" issues. > (There is a CR for that, but as mentioned, this change does not > completely solve it). > > - there were some calculations to ensure that "at least one region will > be allocated" every time g1 recalculates the young gen, but that really > resulted in g1 increasing the young gen by at least one. You do not > always want that, particularly since we regularly revise the young gen. > What you want is a minimum desired *eden* size. > > - the desired and target young gen size calculation was done in a single > huge method. This change splits up the code a bit. > > - the code that calculated the actual *target* young length has been > very inconsistent in handling the reserve. I.e. the limit to the > actually available free regions only applied in some cases, and notably > not if you were already using the reserve, causing strange behavior > where the calculated young gen target length was higher than the > available free regions. This could cause full gcs. > > - I added some detailed trace-level logging for the ergonomic decisions > which really helps when debugging issues, but might be too > large/intrusive for the code - I am not sure whether to leave it in. > > Reviewing: I think the best entry point for this change is > G1Policy::update_young_list_target_length(size_t) where the new code > first calculates a desired young gen length and then the target young > length (a toy model of this split follows after this mail). > > Some additional notes: > - eden before a mixed gc is calculated by predicting the time for the > minimum amount of old gen regions we will definitely take > (min_old_cset_length) and then letting eden fill up the remainder. > > Often this results in significantly larger young gens than before this > change; at worst, young gen will be limited to the minimum young gen size (as > before). Overall it works fairly well, i.e. gives much smoother cpu > usage. There is a caveat to that, in that it depends on the accuracy of > predictions. Since G1 predictions are often too high, we might want to > take a lot more optional regions in the future, to not be required > to early terminate the mixed gc. > I.e. I have often seen that we do not use up the 200ms pause time goal. > > - currently, and like before, MMU desired length overrides the pause > time desired length. I.e. *if* a GCPauseTimeIntervalMillis is set, the > spacing between gcs is more important than actual pause times. The > difference is that now there is an explicit comment about this > behavior there :) > > - when eating into the reserve (last 10% of heap), we at most allow use > of the sizer's min young length or half of the reserve regions (rounded > up!), whichever is higher. > This is an arbitrary decision, but since supposedly at that time we are > already close to the next mixed gc due to adaptive ihop, we can take more. > > - note that all this is about calculating a *target* length, not the > actual length. Actual length may be higher e.g. due to gclocker. > > - there may be some out-of-box performance regressions since G1 does not > expand the heap that much more. Performance can be restored by either > decreasing GCTimeRatio, or by setting appropriate minimum heap sizes. > > Actually, in the future, when shrinking is implemented (JDK-8238687), > these may be more severe (in some benchmarks, actual gc usage is still > <2%). I will likely try to balance that with decreasing the default > GCTimeRatio value in the future. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8244603 > https://bugs.openjdk.java.net/browse/JDK-8238858 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8244603/webrev/ > Testing: > mach5 tier1-5, perf testing > > Thanks, > Thomas
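A toy model of the desired/target split described above (made-up structure and names, not the actual G1Policy code, which has many more inputs; this only shows the clamping order):

  #include <algorithm>
  #include <cstddef>

  struct YoungSizingInputs {
    size_t min_young;        // minimum young length from the sizer (regions)
    size_t desired_eden;     // eden length derived from the pause time goal
    size_t survivor;         // regions needed for survivors
    size_t free_regions;     // free regions actually available right now
    size_t reserve_regions;  // the ~10% reserve at the top of the heap
  };

  // Desired length: unconstrained by free space; fed e.g. to adaptive IHOP.
  inline size_t desired_young_length(const YoungSizingInputs& in) {
    return std::max(in.min_young, in.desired_eden + in.survivor);
  }

  // Target length: the desired length clamped by what is really available,
  // handling the reserve consistently (the inconsistency fixed by 8244603).
  inline size_t target_young_length(const YoungSizingInputs& in) {
    size_t desired = desired_young_length(in);
    size_t unreserved = in.free_regions > in.reserve_regions
                        ? in.free_regions - in.reserve_regions : 0;
    if (desired <= unreserved) {
      return desired;                 // no need to touch the reserve
    }
    // Eating into the reserve: allow at most half of it (rounded up) or the
    // sizer minimum, whichever is higher, and never more than free space.
    size_t allowance = std::max(in.min_young, (in.reserve_regions + 1) / 2);
    return std::min(desired, std::min(in.free_regions, unreserved + allowance));
  }

The key property the change establishes is that the target is always bounded by actually available free regions, so the calculation can no longer hand out a young gen that does not fit, which is what previously led to repeated full gcs.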
From poonam.bajaj at oracle.com Mon Jun 8 14:57:09 2020 From: poonam.bajaj at oracle.com (Poonam Parhar) Date: Mon, 8 Jun 2020 07:57:09 -0700 Subject: Need help to fix a potential G1 crash in jdk11 In-Reply-To: References: Message-ID: Hi Volker, Did you try running with -XX:+VerifyRememberedSets? This might tell you whether the problem is related to remset updates. Thanks, Poonam On 6/5/20 1:55 AM, Erik Österlund wrote: > Hi Volker, > > On 2020-06-03 20:18, Volker Simonis wrote: >> Unfortunately, "-XX:-ClassUnloading" doesn't help :( > > I am actually happy that it did not help. I suspect a bug in that code > would be harder to track down; it is rather complicated. > >> I already saw two new crashes. The first one has 6 distinct Root >> locations pointing to one dead object: >> >> [863.222s][info ][gc,verify,start ] Verifying During GC (Remark after) >> [863.222s][debug][gc,verify ] Threads >> [863.224s][debug][gc,verify ] Heap >> [863.224s][debug][gc,verify ] Roots >> [863.229s][error][gc,verify ] Root location >> 0x00007f11719174e7 points to dead obj 0x00000000f956dbd8 >> [863.229s][error][gc,verify ] >> org.antlr.v4.runtime.atn.PredictionContextCache >> [863.229s][error][gc,verify ] {0x00000000f956dbd8} - klass: >> 'org/antlr/v4/runtime/atn/PredictionContextCache' >> ... >> [863.229s][error][gc,verify ] Root location >> 0x00007f1171921978 points to dead obj 0x00000000f956dbd8 >> [863.229s][error][gc,verify ] >> org.antlr.v4.runtime.atn.PredictionContextCache >> [863.229s][error][gc,verify ] {0x00000000f956dbd8} - klass: >> 'org/antlr/v4/runtime/atn/PredictionContextCache' >> [863.231s][debug][gc,verify ] HeapRegionSets >> [863.231s][debug][gc,verify ] HeapRegions >> [863.349s][error][gc,verify ] Heap after failed verification >> (kind 0): >> >> The second crash has only two Root locations pointing to the same >> dead object but more than 40_000 fields in distinct objects pointing >> to more than 3_500 dead objects: >> >> [854.473s][info ][gc,verify,start ] Verifying During GC (Remark after) >> [854.473s][debug][gc,verify ] Threads >> [854.475s][debug][gc,verify ] Heap >> [854.475s][debug][gc,verify ] Roots >> [854.479s][error][gc,verify ] Root location >> 0x00007f6e60461d5f points to dead obj 0x00000000fa874528 >> [854.479s][error][gc,verify ] >> org.antlr.v4.runtime.atn.PredictionContextCache >> [854.479s][error][gc,verify
] {0x00000000fa874528} - klass: >> 'org/antlr/v4/runtime/atn/PredictionContextCache' >> [854.479s][error][gc,verify ] Root location >> 0x00007f6e60461d6d points to dead obj 0x00000000fa874528 >> [854.479s][error][gc,verify ] >> org.antlr.v4.runtime.atn.PredictionContextCache >> [854.479s][error][gc,verify ] {0x00000000fa874528} - klass: >> 'org/antlr/v4/runtime/atn/PredictionContextCache' >> [854.479s][error][gc,verify ] Root location >> 0x00007f6e60462138 points to dead obj 0x00000000fa874528 >> [854.479s][error][gc,verify ] >> org.antlr.v4.runtime.atn.PredictionContextCache >> [854.479s][error][gc,verify ] {0x00000000fa874528} - klass: >> 'org/antlr/v4/runtime/atn/PredictionContextCache' >> [854.482s][debug][gc,verify ] HeapRegionSets >> [854.482s][debug][gc,verify ] HeapRegions >> [854.484s][error][gc,verify ] ---------- >> [854.484s][error][gc,verify ] Field 0x00000000fd363c70 of >> live obj 0x00000000fd363c58 in region [0x00000000fd300000, >> 0x00000000fd400000) >> [854.484s][error][gc,verify ] class name >> org.antlr.v4.runtime.atn.ATNConfig >> [854.484s][error][gc,verify ] points to dead obj >> 0x00000000fa88a540 in region [0x00000000fa800000, 0x00000000fa900000) >> [854.484s][error][gc,verify ] class name >> org.antlr.v4.runtime.atn.ArrayPredictionContext >> [854.484s][error][gc,verify ] ---------- >> ... >> more than 40_000 fields in distinct objects pointing to more than >> 3_500 dead objects. >> >> So how can this happen? Is "-XX:+VerifyAfterGC" really reliable here? > > Naturally, it's hard to tell definitively what the issue is from only > these printouts. > However, we can make some observations from the printouts: > > Based on the "Root location" address values in the printouts, each dead object > reported is pointed at by at least one misaligned oop. The only > misaligned oops in > HotSpot are nmethod oops embedded into the instruction stream as > immediates. > So this smells like some kind of nmethod oop processing bug in G1 to me. > > The Abortable Mixed GCs (https://openjdk.java.net/jeps/344) that went > into 12 changed > quite a bit of the nmethod oop scanning code. Perhaps the reason why > this stopped > reproducing in 12 is related to that. The nmethod oop processing code > introduced with > AMGC actually had a word tearing problem for nmethod oops, which was > fixed later with > https://bugs.openjdk.java.net/browse/JDK-8235305 > > Hope these pointers help. > > /Erik > >> Thank you and best regards, >> Volker >> >> >> On Wed, Jun 3, 2020 at 7:14 PM Volker Simonis >> > wrote: >> >> Hi Erik, >> >> thanks a lot for the quick response and the hint with >> ClassUnloading. I've just started several runs of the test program >> with "-XX:-ClassUnloading". I'll report back instantly once I have >> some results. >> >> Best regards, >> Volker >> >> On Wed, Jun 3, 2020 at 5:26 PM Erik Österlund >> > >> wrote: >> >> Hi Volker, >> >> In JDK 12, I changed quite a bit how G1 performs class >> unloading, to a >> new model. >> Since the verification runs just after class unloading, I >> guess it could >> be interesting >> to check if the error happens with -XX:-ClassUnloading as >> well. If not, >> then perhaps >> some of my class unloading changes for G1 in JDK 12 fixed the >> problem. >> >> Just a gut feeling... >>
Thanks, >> /Erik >> >> On 2020-06-03 17:02, Volker Simonis wrote: >> > Hi, >> > >> > I would appreciate some help/advice for debugging a >> potential G1 crash in >> > jdk 11. The crash usually occurs when running a proprietary >> jar file for >> > about 20-30 minutes and it happens in various parts of the >> VM (C1- or >> > C2-compiled code, interpreter, GC). Because the crash >> locations are so >> > different and because the customer who reported the issue >> claimed that it >> > doesn't happen with Parallel GC, I thought it might be a G1 >> issue. I >> > couldn't reproduce the crash with jdk 12 and 14 (but with >> jdk 11 and >> > 11.0.7, OpenJDK and Oracle JDK). When looking at the G1 >> changes in jdk 12 I >> > couldn't find any apparent bug fix which potentially solves >> this problem, >> > but it may have been solved by one of the many G1 changes >> that went into >> > jdk 12. >> > >> > I did run the reproducer with "-XX:+UnlockDiagnosticVMOptions >> > -XX:+VerifyBeforeGC -XX:+VerifyAfterGC -XX:+VerifyDuringGC >> > -XX:+CheckJNICalls -XX:+G1VerifyRSetsDuringFullGC >> > -XX:+G1VerifyHeapRegionCodeRoots" and I indeed got >> verification errors (see >> > [1] for a complete hs_err file). Sometimes it's just a few >> fields pointing >> > to dead objects: >> > >> > [1035.782s][error][gc,verify ] ---------- >> > [1035.782s][error][gc,verify ] Field >> 0x00000000fb509148 of live obj >> > 0x00000000fb509130 in region [0x00000000fb500000, >> 0x00000000fb600000) >> > [1035.782s][error][gc,verify ] class name >> > org.antlr.v4.runtime.atn.ATNConfig >> > [1035.782s][error][gc,verify ] points to dead obj >> > 0x00000000f9ba39b0 in region [0x00000000f9b00000, >> 0x00000000f9c00000) >> > [1035.782s][error][gc,verify ] class name >> > org.antlr.v4.runtime.atn.SingletonPredictionContext >> > [1035.782s][error][gc,verify ] ---------- >> > [1035.783s][error][gc,verify ] Field >> 0x00000000fb509168 of live obj >> > 0x00000000fb509150 in region [0x00000000fb500000, >> 0x00000000fb600000) >> > [1035.783s][error][gc,verify ] class name >> > org.antlr.v4.runtime.atn.ATNConfig >> > [1035.783s][error][gc,verify ] points to dead obj >> > 0x00000000f9ba39b0 in region [0x00000000f9b00000, >> 0x00000000f9c00000) >> > [1035.783s][error][gc,verify ] class name >> > org.antlr.v4.runtime.atn.SingletonPredictionContext >> > [1035.783s][error][gc,verify ] ---------- >> > ... >> > [1043.928s][error][gc,verify ] Heap Regions: >> E=young(eden), >> > S=young(survivor), O=old, HS=humongous(starts), >> HC=humongous(continues), >> > CS=collection set, F=free, A=archive, TAMS=top-at-mark-start >> (previous, >> > next) >> > ... >> > [1043.929s][error][gc,verify ] | >> 79|0x00000000f9b00000, >> > 0x00000000f9bfffe8, 0x00000000f9c00000| 99%| O| |TAMS >> 0x00000000f9bfffe8, >> > 0x00000000f9b00000| Updating >> > ...
> [1043.971s][error][gc,verify ] | >> 105|0x00000000fb500000, >> > 0x00000000fb54fc08, 0x00000000fb600000| 31%| S|CS|TAMS >> 0x00000000fb500000, >> > 0x00000000fb500000| Complete >> > >> > but I also got verification errors with more than 30000 >> fields of distinct >> > objects pointing to more than 1000 dead objects. How can >> that happen? Is >> > the verification always accurate or can this also be a >> problem with the >> > verification itself and I'm hunting the wrong problem? >> > >> > Sometimes I also saw verification errors where fields point >> to objects in >> > regions with "Untracked remset": >> > >> > [673.762s][error][gc,verify] ---------- >> > [673.762s][error][gc,verify] Field 0x00000000fca49298 of >> live obj >> > 0x00000000fca49280 in region [0x00000000fca0000 >> > 0, 0x00000000fcb00000) >> > [673.762s][error][gc,verify] class name >> org.antlr.v4.runtime.atn.ATNConfig >> > [673.762s][error][gc,verify] points to obj >> 0x00000000f9d5a9a0 in region >> > >> 81:(F)[0x00000000f9d00000,0x00000000f9d00000,0x00000000f9e00000] >> remset >> > Untracked >> > [673.762s][error][gc,verify] ---------- >> > >> > But they are by far not as common as the pointers to >> dead objects. Once >> > I even saw a "Root location" pointing to a dead object: >> > >> > [369.808s][error][gc,verify] Root location >> 0x00007f35bb33f1f8 points to >> > dead obj 0x00000000f87fa200 >> > [369.808s][error][gc,verify] >> org.antlr.v4.runtime.atn.PredictionContextCache >> > [369.808s][error][gc,verify] {0x00000000f87fa200} - klass: >> > 'org/antlr/v4/runtime/atn/PredictionContextCache' >> > [369.850s][error][gc,verify] ---------- >> > [369.850s][error][gc,verify] Field 0x00000000fbc60900 of >> live obj >> > 0x00000000fbc608f0 in region [0x00000000fbc00000, >> 0x00000000fbd00000) >> > [369.850s][error][gc,verify] class name >> > org.antlr.v4.runtime.atn.ParserATNSimulator >> > [369.850s][error][gc,verify] points to dead obj >> 0x00000000f87fa200 in >> > region [0x00000000f8700000, 0x00000000f8800000) >> > [369.850s][error][gc,verify] class name >> > org.antlr.v4.runtime.atn.PredictionContextCache >> > [369.850s][error][gc,verify] ---------- >> > >> > All these verification errors occur after the Remark phase in >> > G1ConcurrentMark::remark() at: >> > >> > verify_during_pause(G1HeapVerifier::G1VerifyRemark, >> > VerifyOption_G1UsePrevMarking, "Remark after"); >> > >> > V [libjvm.so+0x6ca186] report_vm_error(char const*, int, >> char const*, >> > char const*, ...)+0x106 >> > V [libjvm.so+0x7d4a99] >> G1HeapVerifier::verify(VerifyOption)+0x399 >> > V [libjvm.so+0xe128bb] Universe::verify(VerifyOption, char >> const*)+0x16b >> > V [libjvm.so+0x7d44ee] >> > G1HeapVerifier::verify(G1HeapVerifier::G1VerifyType, >> VerifyOption, char >> > const*)+0x9e >> > V [libjvm.so+0x7addcf] >> > G1ConcurrentMark::verify_during_pause(G1HeapVerifier::G1VerifyType, >> > VerifyOption, char const*)+0x9f >> > V
[libjvm.so+0x7b172e] G1ConcurrentMark::remark()+0x3be >> ??????? > V? [libjvm.so+0xe6a5e1] VM_CGC_Operation::doit()+0x211 >> ??????? > V? [libjvm.so+0xe69908] VM_Operation::evaluate()+0xd8 >> ??????? > V? [libjvm.so+0xe6713f] >> ??????? VMThread::evaluate_operation(VM_Operation*) [clone >> ??????? > .constprop.54]+0xff >> ??????? > V? [libjvm.so+0xe6764e]? VMThread::loop()+0x3be >> ??????? > V? [libjvm.so+0xe67a7b]? VMThread::run()+0x7b >> ??????? > >> ??????? > The GC log output looks as follows: >> ??????? > ... >> ??????? > [1035.775s][info ][gc,verify,start? ?] Verifying During GC >> ??????? (Remark after) >> ??????? > [1035.775s][debug][gc,verify? ? ? ? ?] Threads >> ??????? > [1035.776s][debug][gc,verify? ? ? ? ?] Heap >> ??????? > [1035.776s][debug][gc,verify? ? ? ? ?] Roots >> ??????? > [1035.782s][debug][gc,verify? ? ? ? ?] HeapRegionSets >> ??????? > [1035.782s][debug][gc,verify? ? ? ? ?] HeapRegions >> ??????? > [1035.782s][error][gc,verify? ? ? ? ?] ---------- >> ??????? > ... >> ??????? > A more complete GC log can be found here [2]. >> ??????? > >> ??????? > For the field 0x00000000fb509148 of live obj >> ??????? 0x00000000fb509130 which >> ??????? > points to the dead object 0x00000000f9ba39b0 I get the >> following >> ??????? > information if I inspect them with clhsdb: >> ??????? > >> ??????? > hsdb> inspect 0x00000000fb509130 >> ??????? > instance of Oop for org/antlr/v4/runtime/atn/ATNConfig @ >> ??????? 0x00000000fb509130 >> ??????? > @ 0x00000000fb509130 (size = 32) >> ??????? > _mark: 13 >> ??????? > _metadata._compressed_klass: InstanceKlass for >> ??????? > org/antlr/v4/runtime/atn/ATNConfig >> ??????? > state: Oop for org/antlr/v4/runtime/atn/BasicState @ >> ??????? 0x00000000f83ecfa8 Oop >> ??????? > for org/antlr/v4/runtime/atn/BasicState @ 0x00000000f83ecfa8 >> ??????? > alt: 1 >> ??????? > context: Oop for >> ??????? org/antlr/v4/runtime/atn/SingletonPredictionContext @ >> ??????? > 0x00000000f9ba39b0 Oop for >> ??????? > org/antlr/v4/runtime/atn/SingletonPredictionContext @ >> ??????? 0x00000000f9ba39b0 >> ??????? > reachesIntoOuterContext: 8 >> ??????? > semanticContext: Oop for >> ??????? org/antlr/v4/runtime/atn/SemanticContext$Predicate >> ??????? > @ 0x00000000f83d57c0 Oop for >> ??????? > org/antlr/v4/runtime/atn/SemanticContext$Predicate @ >> ??????? 0x00000000f83d57c0 >> ??????? > >> ??????? > hsdb> inspect 0x00000000f9ba39b0 >> ??????? > instance of Oop for >> ??????? org/antlr/v4/runtime/atn/SingletonPredictionContext @ >> ??????? > 0x00000000f9ba39b0 @ 0x00000000f9ba39b0 (size = 32) >> ??????? > _mark: 41551306041 >> ??????? > _metadata._compressed_klass: InstanceKlass for >> ??????? > org/antlr/v4/runtime/atn/SingletonPredictionContext >> ??????? > id: 100635259 >> ??????? > cachedHashCode: 2005943142 >> ??????? > parent: Oop for >> ??????? org/antlr/v4/runtime/atn/SingletonPredictionContext @ >> ??????? > 0x00000000f9ba01b0 Oop for >> ??????? > org/antlr/v4/runtime/atn/SingletonPredictionContext @ >> ??????? 0x00000000f9ba01b0 >> ??????? > returnState: 18228 >> ??????? > >> ??????? > I could also reproduce the verification errors with a fast >> ??????? debug build of >> ??????? > 11.0.7 which I did run with "-XX:+CheckCompressedOops >> ??????? -XX:+VerifyOops >> ??????? > -XX:+G1VerifyCTCleanup -XX:+G1VerifyBitmaps" in addition to >> ??????? the options >> ??????? > mentioned before, but unfortunaltey the run didn't trigger >> ??????? neither an >> ??????? > assertion nor a different verification error. >> ??????? > >> ??????? 
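>> >
>> > (For reference, the combined command line I used was roughly the following.
>> > This just collects the flags already mentioned above, fastdebug-only ones
>> > included, so treat it as a sketch rather than a verbatim copy; "app.jar"
>> > stands for the proprietary reproducer:
>> >
>> > java -XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC -XX:+VerifyAfterGC \
>> >      -XX:+VerifyDuringGC -XX:+CheckJNICalls -XX:+G1VerifyRSetsDuringFullGC \
>> >      -XX:+G1VerifyHeapRegionCodeRoots -XX:+CheckCompressedOops -XX:+VerifyOops \
>> >      -XX:+G1VerifyCTCleanup -XX:+G1VerifyBitmaps -Xlog:gc*,gc+verify=debug \
>> >      -jar app.jar )
>> >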
>> > So to summarize, my basic questions are:
>> >   - has somebody else encountered similar crashes?
>> >   - is someone aware of specific changes in jdk12 which might solve this problem?
>> >   - are the verification errors I'm seeing accurate or is it possible to get
>> >     false positives when running with -XX:Verify{Before,During,After}GC ?
>> >
>> > Thanks for your patience,
>> > Volker
>> >
>> > [1] http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/hs_err_pid28294.log
>> > [2] http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/verify-error.log
>>
>

From aph at redhat.com Mon Jun 8 15:31:09 2020
From: aph at redhat.com (Andrew Haley)
Date: Mon, 8 Jun 2020 16:31:09 +0100
Subject: Logging of all GC memory access
In-Reply-To:
References: <17cd9553-2379-fc73-19d5-58508c08db03@redhat.com>
Message-ID:

On 08/06/2020 13:12, Ofir Gordon wrote:
> Thank you both for your answers.
>
> I was referring to the memory accesses taking place during the gc, and also
> specifically access to the heap (i.e. access to an object on the heap as
> part of the marking process etc.)
>
> Also, are you familiar with a way to get a "snapshot" of the heap at any gc
> activation? log the addresses where objects are placed, and the pointers
> between them? is such a thing possible?

Partly. Look for heap->pre_full_gc_dump.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From jianglizhou at google.com Mon Jun 8 18:00:32 2020
From: jianglizhou at google.com (Jiangli Zhou)
Date: Mon, 8 Jun 2020 11:00:32 -0700
Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC
In-Reply-To:
References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com>
 <80ada90e-598b-c7ca-3782-0a779e195291@oracle.com>
 <7157bbee-03e7-9fc7-47e0-a25d98970875@oracle.com>
 <863b1b7d-bb1a-ae3c-2990-eca26803788a@oracle.com>
 <81cd3cec-e39d-8a16-6e07-2ff0b0b80dc8@oracle.com>
Message-ID:

Hi Ioi,

On Fri, Jun 5, 2020 at 10:51 PM Ioi Lam wrote:
>
>
> On 6/5/20 2:38 PM, Jiangli Zhou wrote:
> > Hi Ioi,
> >
> > Thanks for the updated webrev.
> >
> > To avoid the assert, could you call GCLocker::stall_until_clear in
> > MetaspaceShared::preload_and_dump() before grabbing the 'Heap_lock'
> > and VMThread::execute(&op), like the following. It is best to check
> > with Thomas and other GC experts about the interaction between
> > 'Heap_lock' and 'GCLocker'.
> >
> >         if (HeapShared::is_heap_object_archiving_allowed()) {
> >           GCLocker::stall_until_clear();
> >         }
> Hi Jiangli,
>
> Thanks for the review.
>
> I don't think this is sufficient. Immediately after
> GCLocker::stall_until_clear() has returned, another thread can invoke a
> JNI function that will activate the GCLocker. That's stated by the
> gcLocker.hpp comments:
>
> http://hg.openjdk.java.net/jdk/jdk/file/882b61be2c19/src/hotspot/share/gc/shared/gcLocker.hpp#l109
>
> I've pinged the GC team to get their opinion on this.
>

Thanks for pinging. Thomas pointed out in his email that
GCLocker::stall_until_clear() would only wait if a gc had been requested
while the JNI critical sections were active. Otherwise
GCLocker::stall_until_clear() would do nothing and let the caller
continue. My suggestion likely does not achieve what you intend to do.
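To make the timing window concrete: what activates the GCLocker is a JNI
critical section, roughly like this (a minimal illustrative sketch, not
code from the actual change):

    #include <jni.h>

    // Illustrative only: any thread inside a JNI critical section holds
    // the GCLocker; a GC requested in this window is deferred until the
    // last critical section has exited.
    void touch_array(JNIEnv* env, jbyteArray array) {
      jbyte* p = static_cast<jbyte*>(env->GetPrimitiveArrayCritical(array, nullptr));
      if (p != nullptr) {
        // ... direct access to the array memory; GCLocker is active here ...
        env->ReleasePrimitiveArrayCritical(array, p, 0);
      }
    }

And nothing stops another thread from entering a fresh critical section
right after stall_until_clear() returns.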
Could you please describe the specific issue that you want to address
with GCLocker?

> > There is also a compiler error caused by the extra 'THREAD' arg for
> > MutexLocker. Please fix:
>
> I think you are using an older repo. The MutexLocker constructor has
> been changed since Jan 2020:
>
> http://hg.openjdk.java.net/jdk/jdk/annotate/882b61be2c19/src/hotspot/share/runtime/mutexLocker.hpp#l210

Ok, thanks. My repo was not updated properly due to unresolved tooling
issues on my side.

Best,
Jiangli

> Thanks
> - Ioi
>
> > mutexLocker.hpp:205:22: note: no known conversion for argument 1
> > from 'Thread*' to 'Mutex*'
> >
> > 205 | MutexLocker(Mutex* mutex, Thread* thread, Mutex::SafepointCheckFlag flag = Mutex::_safepoint_check_flag) :
> >     |             ~~~~~~~^~~~~
> >
> > mutexLocker.hpp:191:3: note: candidate: 'MutexLocker::MutexLocker(Mutex*, Mutex::SafepointCheckFlag)'
> >
> > 191 | MutexLocker(Mutex* mutex, Mutex::SafepointCheckFlag flag = Mutex::_safepoint_check_flag) :
> >
> > ... (rest of output omitted)
> >
> > Best,
> > Jiangli
> >
> > On Thu, Jun 4, 2020 at 9:37 PM Ioi Lam wrote:
> >> Hi Jiangli,
> >>
> >> Updated webrev is here:
> >>
> >> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v03/
> >>
> >> The only difference with the previous version is this part:
> >>
> >> 1975 MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ?
> >> 1976                        Heap_lock : NULL); // needed for collect_as_vm_thread
> >>
> >> Thanks
> >> - Ioi
> >>
> >> On 6/4/20 5:23 PM, Jiangli Zhou wrote:
> >>> Ioi, do you have a new webrev?
> >>>
> >>> Best,
> >>> Jiangli
> >>>
> >>> On Thu, Jun 4, 2020 at 4:54 PM Ioi Lam wrote:
> >>>>
> >>>> On 6/4/20 12:04 PM, Jiangli Zhou wrote:
> >>>>> On Wed, Jun 3, 2020 at 10:56 PM Ioi Lam wrote:
> >>>>>>> On 5/29/20 9:40 PM, Yumin Qi wrote:
> >>>>>>>> HI, Ioi
> >>>>>>>>
> >>>>>>>> If the allocation of EDEN happens between GC and dump, should we
> >>>>>>>> put the GC action in VM_PopulateDumpSharedSpace? This way, at
> >>>>>>>> safepoint no allocation should happen. The stack trace showed
> >>>>>>>> it happened with a Java Thread, which should be blocked at safepoint.
> >>>>>>>>
> >>>>>>> Hi Yumin,
> >>>>>>>
> >>>>>>> I think GCs cannot be executed inside a safepoint, because some parts
> >>>>>>> of GC need to execute in a safepoint, so they will be blocked until
> >>>>>>> VM_PopulateDumpSharedSpace::doit has returned.
> >>>>>>>
> >>>>>>> Anyway, as I mentioned in my reply to Jiangli, there's a better way to
> >>>>>>> fix this, so I will withdraw the current patch.
> >>>>>>>
> >>>>>> Hi Yumin,
> >>>>>>
> >>>>>> Actually, I changed my mind again, and implemented your suggestion :-)
> >>>>>>
> >>>>>> There's actually a way to invoke GC inside a safepoint (it's used by
> >>>>>> "jcmd gc.heap_dump", for example). So I changed the CDS code to do the
> >>>>>> same thing. It's a much simpler change and does what I want -- no other
> >>>>>> thread will be able to make any heap allocation after the GC has
> >>>>>> completed, so no EDEN region will be allocated:
> >>>>>>
> >>>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v02/
> >>>>> Going with the simple approach for the short term sounds ok.
> >>>>>
> >>>>> 1612 void VM_PopulateDumpSharedSpace::doit() {
> >>>>> ...
> >>>>> 1615   if (GCLocker::is_active()) {
> >>>>> 1616     // This should rarely happen during -Xshare:dump, if at all, but just to be safe.
> >>>>> 1617     log_debug(cds)("GCLocker::is_active() ... try again");
> >>>>> 1618     return;
> >>>>> 1619   }
> >>>>>
> >>>>> MetaspaceShared::preload_and_dump()
> >>>>> ...
> >>>>> 1945   while (true) {
> >>>>> 1946     {
> >>>>> 1947       MutexLocker ml(THREAD, Heap_lock);
> >>>>> 1948       VMThread::execute(&op);
> >>>>> 1949     }
> >>>>> 1950     // If dumping has finished, the VM would have exited. The only reason to
> >>>>> 1951     // come back here is to wait for the GCLocker.
> >>>>> 1952     assert(HeapShared::is_heap_object_archiving_allowed(), "sanity");
> >>>>> 1953     os::naked_short_sleep(1);
> >>>>> 1954   }
> >>>>> 1955 }
> >>>>>
> >>>>> Instead of doing the while/retry, calling
> >>>>> GCLocker::stall_until_clear() in MetaspaceShared::preload_and_dump
> >>>>> before VM_PopulateDumpSharedSpace probably is much cleaner?
> >>>> Hi Jiangli, I tried your suggestion, but GCLocker::stall_until_clear()
> >>>> cannot be called in the VM thread:
> >>>>
> >>>> # assert(thread->is_Java_thread()) failed: just checking
> >>>>
> >>>>> Please also add a comment in MetaspaceShared::preload_and_dump to
> >>>>> explain that Universe::heap()->collect_as_vm_thread expects that
> >>>>> Heap_lock is already held by the thread, and that's the reason for the
> >>>>> call to MutexLocker ml(THREAD, Heap_lock).
> >>>> I changed the code to be like this:
> >>>>
> >>>> MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ?
> >>>>                Heap_lock : NULL); // needed for collect_as_vm_thread
> >>>>
> >>>>> Do you also need to call
> >>>>> Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true)
> >>>>> before GC?
> >>>> There's no need:
> >>>>
> >>>> void CollectedHeap::collect_as_vm_thread(GCCause::Cause cause) {
> >>>> ...
> >>>>   case GCCause::_archive_time_gc:
> >>>>   case GCCause::_metadata_GC_clear_soft_refs: {
> >>>>     HandleMark hm;
> >>>>     do_full_collection(true); // do clear all soft refs
> >>>>
> >>>> Thanks
> >>>> - Ioi
> >>>>
> >>>>> Best,
> >>>>> Jiangli
> >>>>>
> >>>>>> The check for "if (GCLocker::is_active())" should almost always be
> >>>>>> false. I left it there just for safety:
> >>>>>>
> >>>>>> During -Xshare:dump, we execute Java code to build the module graph,
> >>>>>> load classes, etc. So theoretically someone could try to parallelize
> >>>>>> some of that Java code in the future. Theoretically when CDS has entered
> >>>>>> the safepoint, another thread could be in the middle of a JNI method that
> >>>>>> has held the GCLocker.
> >>>>>>
> >>>>>> Thanks
> >>>>>> - Ioi
> >>>>>>
> >>>>>>>> On 5/29/20 7:29 PM, Ioi Lam wrote:
> >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245925
> >>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/
> >>>>>>>>>
> >>>>>>>>> Summary:
> >>>>>>>>>
> >>>>>>>>> CDS supports archived heap objects only for G1. During -Xshare:dump,
> >>>>>>>>> CDS executes a full GC so that G1 will compact the heap regions, leaving
> >>>>>>>>> maximum contiguous free space at the top of the heap. Then, the archived
> >>>>>>>>> heap regions are allocated from the top of the heap.
> >>>>>>>>>
> >>>>>>>>> Under some circumstances, java.lang.ref.Cleaners will execute
> >>>>>>>>> after the GC has completed. The cleaners may allocate or synchronize, which
> >>>>>>>>> will cause G1 to allocate an EDEN region at the top of the heap.
> >>>>>>>>>
> >>>>>>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN regions exist,
> >>>>>>>>> exit the safepoint, run GC, and try again.
> >>>>>>>>> Eventually all the cleaners will
> >>>>>>>>> be executed and no more allocation can happen.
> >>>>>>>>>
> >>>>>>>>> For safety, I limit the retry count to 30 (or about total 9 seconds).
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> - Ioi

From stefan.karlsson at oracle.com Mon Jun 8 18:48:36 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 8 Jun 2020 20:48:36 +0200
Subject: RFR: 8246272: Make use of GCLogPrecious for G1, Parallel and Serial
Message-ID:

Hi all,

Please review this patch to turn the GC init logging into "precious"
logging, so that those lines get dumped into the hs_err file.

https://cr.openjdk.java.net/~stefank/8246272/webrev.01/
https://bugs.openjdk.java.net/browse/JDK-8246272

Example output from a G1 hs_err file:

GC Precious Log:
 CPUs: 32 total, 32 available
 Memory: 125G
 Large Page Support: Disabled
 NUMA Support: Disabled
 Compressed Oops: Enabled (Zero based)
 Heap Region Size: 16M
 Heap Min Capacity: 16M
 Heap Initial Capacity: 2016M
 Heap Max Capacity: 30688M
 Pre-touch: Disabled
 Parallel Workers: 23
 Concurrent Workers: 6
 Concurrent Refinement Workers: 23
 Periodic GC: Disabled

I've changed the generic GCInitLogger so all GCs get these lines dumped.
I've also added the G1 and Parallel specific lines. I can add the
Shenandoah and Epsilon specific lines, if it is requested.

Thanks,
StefanK

From ioi.lam at oracle.com Mon Jun 8 22:21:17 2020
From: ioi.lam at oracle.com (Ioi Lam)
Date: Mon, 8 Jun 2020 15:21:17 -0700
Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC
In-Reply-To:
References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com>
 <80ada90e-598b-c7ca-3782-0a779e195291@oracle.com>
 <7157bbee-03e7-9fc7-47e0-a25d98970875@oracle.com>
 <863b1b7d-bb1a-ae3c-2990-eca26803788a@oracle.com>
 <81cd3cec-e39d-8a16-6e07-2ff0b0b80dc8@oracle.com>
Message-ID: <59c601d8-8db9-256c-b3be-2d9064ff0cef@oracle.com>

On 6/8/20 7:12 AM, Thomas Schatzl wrote:
> Hi,
>
> On 06.06.20 07:51, Ioi Lam wrote:
>>
>> On 6/5/20 2:38 PM, Jiangli Zhou wrote:
>>> Hi Ioi,
>>>
>>> Thanks for the updated webrev.
>>>
>>> To avoid the assert, could you call GCLocker::stall_until_clear in
>>> MetaspaceShared::preload_and_dump() before grabbing the 'Heap_lock'
>>> and VMThread::execute(&op), like the following. It is best to check
>>> with Thomas and other GC experts about the interaction between
>>> 'Heap_lock' and 'GCLocker'.
>>>
>>>         if (HeapShared::is_heap_object_archiving_allowed()) {
>>>           GCLocker::stall_until_clear();
>>>         }
>> Hi Jiangli,
>>
>> Thanks for the review.
>>
>> I don't think this is sufficient. Immediately after
>> GCLocker::stall_until_clear() has returned, another thread can invoke
>> a JNI function that will activate the GCLocker. That's stated by the
>> gcLocker.hpp comments:
>>
>> http://hg.openjdk.java.net/jdk/jdk/file/882b61be2c19/src/hotspot/share/gc/shared/gcLocker.hpp#l109
>
> The GCLocker::stall_until_clear()/GClocker::is_active() loop seems to
> try to achieve something different than what the CR tries to fix.
>
> The change without that loop/GClocker::stall_until_clear() already
> fully achieves that there is no allocation between compaction and the
> CDS dump (at least for STW collectors or collectors that can do a
> "full" collection during an STW pause.)
> > This additional code to wait for GCLocker being inactive to always get > a contiguous range of memory is a different matter: > > Using GCLocker::stall_until_clear() does not work: it would only wait > for completion of all JNI critical sections if a gc had been requested > while these JNI critical sections were active. Otherwise it would do > nothing and let the caller continue, some threads still being in a > critical section and the gclocker held. Which means that the following > full gc may still "fail". > > (That gc would set needs_gc and then it would work. But since you did > not know whether the first full gc has been successful, you would need > another one. Since at that point the return from the last JNI CS would > trigger a young gc, you have two extra gcs in the common case...). > > Another serious problem is that the stall_until_clear() code is > written to be only called in a Java thread. I do not think it works in > the VM thread without lots of more thought (and at least necessary > changes to asserts). As Jiangli mentions, there is likely also an > issue with ordering of Heap_lock and JNI_critical_lock. > > The GCLocker::is_active() loop is a bit better as it makes sure that > all threads left the JNI critical section and since we are in a > safepoint, when exiting the VM call these threads will block (and not > be able to go into another critical section). > > So the full gc will likely be able to move away all objects from eden > regions. > > However: > - this part of the code is bound to break with a change of G1 to pin > (eden) regions instead of using the gclocker (I am aware that it is > very g1 specific already). > > One option to decrease the direct reliance on the GCLocker a bit could > be to have CollectedHeap::collect_as_vm_thread return a bool about > whether the collection could be executed (or there is a pinned eden > region?). Since the only reason to abort is the GCLocker for all STW > gcs at that point, this boils down to the same loop without directly > using the GCLocker here. > > - one other issue I think there is with this piece code is that > safepoints/blocks for another reason while being in a critical > section. Then we have a deadlock. > > This is prohibited by the JNI spec ("In a code segment enclosed by > GetRelease*Critical the native code must not ... or cause the current > thread to block"), but would be very hard to analyze so I recommend to > at least error out (or in this case continue with a suboptimal heap) > after too many retries, or at least start logging some warning that > you are stalled because of GCLocker being held. > Otherwise this error state will be too hard to understand and likely > very very hard to reproduce. > > - this seems to be an optimization only if I understand the situation > correctly. > > So Overall, for this change I would tend to suggest to only fix the > bug, i.e. remove the retries of any kind and so not try to optimize > memory layout here and think about this in an extra CR. > > Thanks, > ? Thomas Hi Thomas & Jiangli, Thanks for the comments. I have created an updated webrev: http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v04/ Now the code works the same way as jcmd GC.heap_dump, which also check for GCLocker before calling collect_as_vm_thread(): http://hg.openjdk.java.net/jdk/jdk/file/523a504bb062/src/hotspot/share/gc/shared/gcVMOperations.cpp#l124 Question to Thomas -- is it necessary to call Universe::heap()->ensure_parsability(false)? 
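For reference, the shape of the new check is essentially this (a
simplified sketch with a made-up log message; the webrev above is the
authoritative version):

    // Inside the CDS dump safepoint: if a thread is inside a JNI
    // critical section we cannot run the compacting full GC, so skip
    // it instead of looping.
    if (GCLocker::is_active()) {
      // Sketch assumption: just log and carry on; the archived heap
      // layout may then be sub-optimal.
      log_warning(cds)("GCLocker is active, skipping the full GC before heap archiving");
    } else {
      Universe::heap()->collect_as_vm_thread(GCCause::_archive_time_gc);
    }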
Have removed the loop and just print a warning message that the archived heap may be sub-optimal. Jiangli> Could you please describe the specific issue that you want to address with GCLocker? With the current code base, we execute very simple Java code during -Xshare:dump, so after all classes are loaded, GCLocker should not be active. As noted in the code comments, GCLocker will be active only in the unlikely scenario where the Java code in core lib has been modified to do some sort of clean up that involves JNI code. What do you think? Thanks - Ioi ====== [PS] I am adding an assert like this to run tier1-4 in mach5 to see if this can happen. +? assert(!GCLocker::is_active(), "huh"); ?? if (GCLocker::is_active()) { From jianglizhou at google.com Mon Jun 8 22:53:09 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Mon, 8 Jun 2020 15:53:09 -0700 Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC In-Reply-To: <59c601d8-8db9-256c-b3be-2d9064ff0cef@oracle.com> References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com> <80ada90e-598b-c7ca-3782-0a779e195291@oracle.com> <7157bbee-03e7-9fc7-47e0-a25d98970875@oracle.com> <863b1b7d-bb1a-ae3c-2990-eca26803788a@oracle.com> <81cd3cec-e39d-8a16-6e07-2ff0b0b80dc8@oracle.com> <59c601d8-8db9-256c-b3be-2d9064ff0cef@oracle.com> Message-ID: Hi Ioi, The latest webrev looks ok. Nit: for cleaner API, could you please change VM_PopulateDumpSharedSpace::run_gc() to static HepShared::run_gc()? On Mon, Jun 8, 2020 at 3:22 PM Ioi Lam wrote: > > > > On 6/8/20 7:12 AM, Thomas Schatzl wrote: > > Hi, > > > > On 06.06.20 07:51, Ioi Lam wrote: > >> > >> > >> On 6/5/20 2:38 PM, Jiangli Zhou wrote: > >>> Hi Ioi, > >>> > >>> Thanks for the updated webrev. > >>> > >>> To avoid the assert, could you call GCLocker::stall_until_clear in > >>> MetaspaceShared::preload_and_dump() before grabbing the 'Heap_lock' > >>> and VMThread::execute(&op), like following. It is the best to check > >>> with Thomas and other GC experts about the interaction between > >>> 'Heap_lock' and 'GCLocker'. > >>> > >>> if (HeapShared::is_heap_object_archiving_allowed()) { > >>> GCLocker::stall_until_clear(); > >>> } > >> Hi Jiangli, > >> > >> Thanks for the review. > >> > >> I don't think this is sufficient. Immediately after > >> GCLocker::stall_until_clear() has returned, another thread can invoke > >> a JNI function that will activate the GCLocker. That's stated by the > >> gcLocker.hpp comments: > >> > >> http://hg.openjdk.java.net/jdk/jdk/file/882b61be2c19/src/hotspot/share/gc/shared/gcLocker.hpp#l109 > > > > > > The GCLocker::stall_until_clear()/GClocker::is_active() loop seems to > > try to achieve something different than what the CR tries to fix. > > > > The change without that loop/GClocker::stall_until_clear() already > > fully achieves that there is no allocation between compaction and the > > CDS dump (at least for STW collectors or collectors that can do a > > "full" collection during an STW pause.) > > > > This additional code to wait for GCLocker being inactive to always get > > a contiguous range of memory is a different matter: > > > > Using GCLocker::stall_until_clear() does not work: it would only wait > > for completion of all JNI critical sections if a gc had been requested > > while these JNI critical sections were active. Otherwise it would do > > nothing and let the caller continue, some threads still being in a > > critical section and the gclocker held. Which means that the following > > full gc may still "fail". 
> > > > (That gc would set needs_gc and then it would work. But since you did > > not know whether the first full gc has been successful, you would need > > another one. Since at that point the return from the last JNI CS would > > trigger a young gc, you have two extra gcs in the common case...). > > > > Another serious problem is that the stall_until_clear() code is > > written to be only called in a Java thread. I do not think it works in > > the VM thread without lots of more thought (and at least necessary > > changes to asserts). As Jiangli mentions, there is likely also an > > issue with ordering of Heap_lock and JNI_critical_lock. > > > > The GCLocker::is_active() loop is a bit better as it makes sure that > > all threads left the JNI critical section and since we are in a > > safepoint, when exiting the VM call these threads will block (and not > > be able to go into another critical section). > > > > So the full gc will likely be able to move away all objects from eden > > regions. > > > > However: > > - this part of the code is bound to break with a change of G1 to pin > > (eden) regions instead of using the gclocker (I am aware that it is > > very g1 specific already). > > > > One option to decrease the direct reliance on the GCLocker a bit could > > be to have CollectedHeap::collect_as_vm_thread return a bool about > > whether the collection could be executed (or there is a pinned eden > > region?). Since the only reason to abort is the GCLocker for all STW > > gcs at that point, this boils down to the same loop without directly > > using the GCLocker here. > > > > - one other issue I think there is with this piece code is that > > safepoints/blocks for another reason while being in a critical > > section. Then we have a deadlock. > > > > This is prohibited by the JNI spec ("In a code segment enclosed by > > GetRelease*Critical the native code must not ... or cause the current > > thread to block"), but would be very hard to analyze so I recommend to > > at least error out (or in this case continue with a suboptimal heap) > > after too many retries, or at least start logging some warning that > > you are stalled because of GCLocker being held. > > Otherwise this error state will be too hard to understand and likely > > very very hard to reproduce. > > > > - this seems to be an optimization only if I understand the situation > > correctly. > > > > So Overall, for this change I would tend to suggest to only fix the > > bug, i.e. remove the retries of any kind and so not try to optimize > > memory layout here and think about this in an extra CR. > > > > Thanks, > > Thomas > Hi Thomas & Jiangli, > > Thanks for the comments. I have created an updated webrev: > > http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v04/ > > Now the code works the same way as jcmd GC.heap_dump, which also check > for GCLocker before calling collect_as_vm_thread(): > > http://hg.openjdk.java.net/jdk/jdk/file/523a504bb062/src/hotspot/share/gc/shared/gcVMOperations.cpp#l124 > > Question to Thomas -- is it necessary to call > Universe::heap()->ensure_parsability(false)? > > Have removed the loop and just print a warning message that the archived > heap may be sub-optimal. > > Jiangli> Could you please describe the specific issue that you want to > address with GCLocker? > > With the current code base, we execute very simple Java code during > -Xshare:dump, so after all classes are loaded, GCLocker should not be > active. 
As noted in the code comments, GCLocker will be active only in > the unlikely scenario where the Java code in core lib has been modified > to do some sort of clean up that involves JNI code. > Ok, thanks. It would involve JNI critical, agreed that it's unlikely for static CDS dumping. > What do you think? > > Thanks > - Ioi > > > ====== > [PS] I am adding an assert like this to run tier1-4 in mach5 to see if > this can happen. > > + assert(!GCLocker::is_active(), "huh"); > if (GCLocker::is_active()) { > > Thanks, that will give some confidence before we can help do more tests for this. Best, Jiangli > > > > > From erik.osterlund at oracle.com Tue Jun 9 05:57:49 2020 From: erik.osterlund at oracle.com (=?utf-8?Q?Erik_=C3=96sterlund?=) Date: Tue, 9 Jun 2020 07:57:49 +0200 Subject: RFR: 8246272: Make use of GCLogPrecious for G1, Parallel and Serial In-Reply-To: References: Message-ID: Hi Stefan, Looks good. Thanks, /Erik > On 8 Jun 2020, at 20:49, Stefan Karlsson wrote: > > ?Hi all, > > Please review this patch to turn the GC init logging into "precious" logging, so that those lines get dumped into the hs_err file. > > https://cr.openjdk.java.net/~stefank/8246272/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8246272 > > Example output from a G1 hs_err file: > > GC Precious Log: > CPUs: 32 total, 32 available > Memory: 125G > Large Page Support: Disabled > NUMA Support: Disabled > Compressed Oops: Enabled (Zero based) > Heap Region Size: 16M > Heap Min Capacity: 16M > Heap Initial Capacity: 2016M > Heap Max Capacity: 30688M > Pre-touch: Disabled > Parallel Workers: 23 > Concurrent Workers: 6 > Concurrent Refinement Workers: 23 > Periodic GC: Disabled > > I've changed the generic GCInitLogger so all GCs get these lines dumped. I've also added the G1 and Parallel specific lines. I can add the Shenandoah and Epsilon specific lines, if it is requested. > > Thanks, > StefanK From per.liden at oracle.com Tue Jun 9 06:54:56 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 9 Jun 2020 08:54:56 +0200 Subject: RFR: 8246272: Make use of GCLogPrecious for G1, Parallel and Serial In-Reply-To: References: Message-ID: <3ba65736-ed9a-3dc8-8a86-4f4ca56555d0@oracle.com> Looks good! /Per On 6/8/20 8:48 PM, Stefan Karlsson wrote: > Hi all, > > Please review this patch to turn the GC init logging into "precious" > logging, so that those lines get dumped into the hs_err file. > > https://cr.openjdk.java.net/~stefank/8246272/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8246272 > > Example output from a G1 hs_err file: > > GC Precious Log: > ?CPUs: 32 total, 32 available > ?Memory: 125G > ?Large Page Support: Disabled > ?NUMA Support: Disabled > ?Compressed Oops: Enabled (Zero based) > ?Heap Region Size: 16M > ?Heap Min Capacity: 16M > ?Heap Initial Capacity: 2016M > ?Heap Max Capacity: 30688M > ?Pre-touch: Disabled > ?Parallel Workers: 23 > ?Concurrent Workers: 6 > ?Concurrent Refinement Workers: 23 > ?Periodic GC: Disabled > > I've changed the generic GCInitLogger so all GCs get these lines dumped. > I've also added the G1 and Parallel specific lines. I can add the > Shenandoah and Epsilon specific lines, if it is requested. 
> > Thanks, > StefanK From erik.osterlund at oracle.com Tue Jun 9 07:14:58 2020 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Tue, 9 Jun 2020 09:14:58 +0200 Subject: RFR: 8245203/8245204/8245208: ZGC: Don't hold the ZPageAllocator lock while committing/uncommitting memory In-Reply-To: References: <8ea6dc02-c518-9b6e-6038-589bd9ed86b1@oracle.com> <49447851-8f64-8c99-8443-5b400e1851c0@oracle.com> <06db729e-d804-a0b4-262a-aa70181d904b@oracle.com> Message-ID: Hi Per, Looks good. Thanks, /Erik On 2020-06-05 17:15, Per Liden wrote: > Hi, > > Here are the latest patches for this. Contains more review comments > from Stefan, and they have been rebased on today's jdk/jdk. > > * 8246220: ZGC: Introduce ZUnmapper to asynchronous unmap pages > http://cr.openjdk.java.net/~pliden/8246220/webrev.2 > > * 8245208: ZGC: Don't hold the ZPageAllocator lock while > committing/uncommitting memory > http://cr.openjdk.java.net/~pliden/8245208/webrev.3 > > * 8246265: ZGC: Introduce ZConditionLock > http://cr.openjdk.java.net/~pliden/8246265/webrev.0/ > > * (already review) 8245204: ZGC: Introduce ZListRemoveIterator > http://cr.openjdk.java.net/~pliden/8245204/webrev.0/ > > * (already review) 8245203: ZGC: Don't track size in > ZPhysicalMemoryBacking > http://cr.openjdk.java.net/~pliden/8245203/webrev.0/ > > > And for convenience, here's an all-in-one patch: > http://cr.openjdk.java.net/~pliden/zgc/commit_uncommit/all/webrev.1 > > cheers, > Per From per.liden at oracle.com Tue Jun 9 07:31:12 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 9 Jun 2020 09:31:12 +0200 Subject: RFR: 8245203/8245204/8245208: ZGC: Don't hold the ZPageAllocator lock while committing/uncommitting memory In-Reply-To: References: <8ea6dc02-c518-9b6e-6038-589bd9ed86b1@oracle.com> <49447851-8f64-8c99-8443-5b400e1851c0@oracle.com> <06db729e-d804-a0b4-262a-aa70181d904b@oracle.com> Message-ID: <00532b67-2a20-ecef-5127-845ba3b71361@oracle.com> Thanks for reviewing, Erik! /Per On 6/9/20 9:14 AM, Erik ?sterlund wrote: > Hi Per, > > Looks good. > > Thanks, > /Erik > > On 2020-06-05 17:15, Per Liden wrote: >> Hi, >> >> Here are the latest patches for this. Contains more review comments >> from Stefan, and they have been rebased on today's jdk/jdk. 
>> >> * 8246220: ZGC: Introduce ZUnmapper to asynchronous unmap pages >> http://cr.openjdk.java.net/~pliden/8246220/webrev.2 >> >> * 8245208: ZGC: Don't hold the ZPageAllocator lock while >> committing/uncommitting memory >> http://cr.openjdk.java.net/~pliden/8245208/webrev.3 >> >> * 8246265: ZGC: Introduce ZConditionLock >> http://cr.openjdk.java.net/~pliden/8246265/webrev.0/ >> >> * (already review) 8245204: ZGC: Introduce ZListRemoveIterator >> http://cr.openjdk.java.net/~pliden/8245204/webrev.0/ >> >> * (already review) 8245203: ZGC: Don't track size in >> ZPhysicalMemoryBacking >> http://cr.openjdk.java.net/~pliden/8245203/webrev.0/ >> >> >> And for convenience, here's an all-in-one patch: >> http://cr.openjdk.java.net/~pliden/zgc/commit_uncommit/all/webrev.1 >> >> cheers, >> Per > From stefan.karlsson at oracle.com Tue Jun 9 07:55:21 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 9 Jun 2020 09:55:21 +0200 Subject: RFR: 8245203/8245204/8245208: ZGC: Don't hold the ZPageAllocator lock while committing/uncommitting memory In-Reply-To: References: <8ea6dc02-c518-9b6e-6038-589bd9ed86b1@oracle.com> <49447851-8f64-8c99-8443-5b400e1851c0@oracle.com> <06db729e-d804-a0b4-262a-aa70181d904b@oracle.com> Message-ID: <718bf9e0-5d2f-ecd0-e3dc-d47d34e60d03@oracle.com> Looks good. StefanK On 2020-06-09 09:14, Erik ?sterlund wrote: > Hi Per, > > Looks good. > > Thanks, > /Erik > > On 2020-06-05 17:15, Per Liden wrote: >> Hi, >> >> Here are the latest patches for this. Contains more review comments >> from Stefan, and they have been rebased on today's jdk/jdk. >> >> * 8246220: ZGC: Introduce ZUnmapper to asynchronous unmap pages >> http://cr.openjdk.java.net/~pliden/8246220/webrev.2 >> >> * 8245208: ZGC: Don't hold the ZPageAllocator lock while >> committing/uncommitting memory >> http://cr.openjdk.java.net/~pliden/8245208/webrev.3 >> >> * 8246265: ZGC: Introduce ZConditionLock >> http://cr.openjdk.java.net/~pliden/8246265/webrev.0/ >> >> * (already review) 8245204: ZGC: Introduce ZListRemoveIterator >> http://cr.openjdk.java.net/~pliden/8245204/webrev.0/ >> >> * (already review) 8245203: ZGC: Don't track size in >> ZPhysicalMemoryBacking >> http://cr.openjdk.java.net/~pliden/8245203/webrev.0/ >> >> >> And for convenience, here's an all-in-one patch: >> http://cr.openjdk.java.net/~pliden/zgc/commit_uncommit/all/webrev.1 >> >> cheers, >> Per > From stefan.karlsson at oracle.com Tue Jun 9 07:55:40 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 9 Jun 2020 09:55:40 +0200 Subject: RFR: 8246272: Make use of GCLogPrecious for G1, Parallel and Serial In-Reply-To: References: Message-ID: <18adcb2d-39e3-5dc6-81d2-10fe7e0e5a07@oracle.com> Thanks, Erik. StefanK On 2020-06-09 07:57, Erik ?sterlund wrote: > Hi Stefan, > > Looks good. > > Thanks, > /Erik > >> On 8 Jun 2020, at 20:49, Stefan Karlsson wrote: >> >> ?Hi all, >> >> Please review this patch to turn the GC init logging into "precious" logging, so that those lines get dumped into the hs_err file. 
>> >> https://cr.openjdk.java.net/~stefank/8246272/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8246272 >> >> Example output from a G1 hs_err file: >> >> GC Precious Log: >> CPUs: 32 total, 32 available >> Memory: 125G >> Large Page Support: Disabled >> NUMA Support: Disabled >> Compressed Oops: Enabled (Zero based) >> Heap Region Size: 16M >> Heap Min Capacity: 16M >> Heap Initial Capacity: 2016M >> Heap Max Capacity: 30688M >> Pre-touch: Disabled >> Parallel Workers: 23 >> Concurrent Workers: 6 >> Concurrent Refinement Workers: 23 >> Periodic GC: Disabled >> >> I've changed the generic GCInitLogger so all GCs get these lines dumped. I've also added the G1 and Parallel specific lines. I can add the Shenandoah and Epsilon specific lines, if it is requested. >> >> Thanks, >> StefanK From stefan.karlsson at oracle.com Tue Jun 9 07:55:56 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 9 Jun 2020 09:55:56 +0200 Subject: RFR: 8246272: Make use of GCLogPrecious for G1, Parallel and Serial In-Reply-To: <3ba65736-ed9a-3dc8-8a86-4f4ca56555d0@oracle.com> References: <3ba65736-ed9a-3dc8-8a86-4f4ca56555d0@oracle.com> Message-ID: Thanks, Per. StefanK On 2020-06-09 08:54, Per Liden wrote: > Looks good! > > /Per > > On 6/8/20 8:48 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to turn the GC init logging into "precious" >> logging, so that those lines get dumped into the hs_err file. >> >> https://cr.openjdk.java.net/~stefank/8246272/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8246272 >> >> Example output from a G1 hs_err file: >> >> GC Precious Log: >> ??CPUs: 32 total, 32 available >> ??Memory: 125G >> ??Large Page Support: Disabled >> ??NUMA Support: Disabled >> ??Compressed Oops: Enabled (Zero based) >> ??Heap Region Size: 16M >> ??Heap Min Capacity: 16M >> ??Heap Initial Capacity: 2016M >> ??Heap Max Capacity: 30688M >> ??Pre-touch: Disabled >> ??Parallel Workers: 23 >> ??Concurrent Workers: 6 >> ??Concurrent Refinement Workers: 23 >> ??Periodic GC: Disabled >> >> I've changed the generic GCInitLogger so all GCs get these lines >> dumped. I've also added the G1 and Parallel specific lines. I can add >> the Shenandoah and Epsilon specific lines, if it is requested. >> >> Thanks, >> StefanK From per.liden at oracle.com Tue Jun 9 08:58:07 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 9 Jun 2020 10:58:07 +0200 Subject: RFR: 8245203/8245204/8245208: ZGC: Don't hold the ZPageAllocator lock while committing/uncommitting memory In-Reply-To: <718bf9e0-5d2f-ecd0-e3dc-d47d34e60d03@oracle.com> References: <8ea6dc02-c518-9b6e-6038-589bd9ed86b1@oracle.com> <49447851-8f64-8c99-8443-5b400e1851c0@oracle.com> <06db729e-d804-a0b4-262a-aa70181d904b@oracle.com> <718bf9e0-5d2f-ecd0-e3dc-d47d34e60d03@oracle.com> Message-ID: <245e4135-2fb6-2897-0052-3d1f273082e7@oracle.com> Thanks Stefan! /Per On 6/9/20 9:55 AM, Stefan Karlsson wrote: > Looks good. > > StefanK > > On 2020-06-09 09:14, Erik ?sterlund wrote: >> Hi Per, >> >> Looks good. >> >> Thanks, >> /Erik >> >> On 2020-06-05 17:15, Per Liden wrote: >>> Hi, >>> >>> Here are the latest patches for this. Contains more review comments >>> from Stefan, and they have been rebased on today's jdk/jdk. 
>>> >>> * 8246220: ZGC: Introduce ZUnmapper to asynchronous unmap pages >>> http://cr.openjdk.java.net/~pliden/8246220/webrev.2 >>> >>> * 8245208: ZGC: Don't hold the ZPageAllocator lock while >>> committing/uncommitting memory >>> http://cr.openjdk.java.net/~pliden/8245208/webrev.3 >>> >>> * 8246265: ZGC: Introduce ZConditionLock >>> http://cr.openjdk.java.net/~pliden/8246265/webrev.0/ >>> >>> * (already review) 8245204: ZGC: Introduce ZListRemoveIterator >>> http://cr.openjdk.java.net/~pliden/8245204/webrev.0/ >>> >>> * (already review) 8245203: ZGC: Don't track size in >>> ZPhysicalMemoryBacking >>> http://cr.openjdk.java.net/~pliden/8245203/webrev.0/ >>> >>> >>> And for convenience, here's an all-in-one patch: >>> http://cr.openjdk.java.net/~pliden/zgc/commit_uncommit/all/webrev.1 >>> >>> cheers, >>> Per >> > From stefan.johansson at oracle.com Tue Jun 9 09:18:02 2020 From: stefan.johansson at oracle.com (stefan.johansson at oracle.com) Date: Tue, 9 Jun 2020 11:18:02 +0200 Subject: RFR: 8246272: Make use of GCLogPrecious for G1, Parallel and Serial In-Reply-To: References: Message-ID: <01da116e-1fbb-4564-1cce-efc9e58a7a67@oracle.com> Hi Stefan, On 2020-06-08 20:48, Stefan Karlsson wrote: > Hi all, > > Please review this patch to turn the GC init logging into "precious" > logging, so that those lines get dumped into the hs_err file. > > https://cr.openjdk.java.net/~stefank/8246272/webrev.01/ Looks good. Thanks, StefanJ > https://bugs.openjdk.java.net/browse/JDK-8246272 > > Example output from a G1 hs_err file: > > GC Precious Log: > ?CPUs: 32 total, 32 available > ?Memory: 125G > ?Large Page Support: Disabled > ?NUMA Support: Disabled > ?Compressed Oops: Enabled (Zero based) > ?Heap Region Size: 16M > ?Heap Min Capacity: 16M > ?Heap Initial Capacity: 2016M > ?Heap Max Capacity: 30688M > ?Pre-touch: Disabled > ?Parallel Workers: 23 > ?Concurrent Workers: 6 > ?Concurrent Refinement Workers: 23 > ?Periodic GC: Disabled > > I've changed the generic GCInitLogger so all GCs get these lines dumped. > I've also added the G1 and Parallel specific lines. I can add the > Shenandoah and Epsilon specific lines, if it is requested. > > Thanks, > StefanK From stefan.karlsson at oracle.com Tue Jun 9 09:23:53 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 9 Jun 2020 11:23:53 +0200 Subject: RFR: 8246272: Make use of GCLogPrecious for G1, Parallel and Serial In-Reply-To: <01da116e-1fbb-4564-1cce-efc9e58a7a67@oracle.com> References: <01da116e-1fbb-4564-1cce-efc9e58a7a67@oracle.com> Message-ID: Thanks, StefanJ. StefanK On 2020-06-09 11:18, stefan.johansson at oracle.com wrote: > Hi Stefan, > > On 2020-06-08 20:48, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to turn the GC init logging into "precious" >> logging, so that those lines get dumped into the hs_err file. >> >> https://cr.openjdk.java.net/~stefank/8246272/webrev.01/ > Looks good. 
> > Thanks, > StefanJ > >> https://bugs.openjdk.java.net/browse/JDK-8246272 >> >> Example output from a G1 hs_err file: >> >> GC Precious Log: >> ??CPUs: 32 total, 32 available >> ??Memory: 125G >> ??Large Page Support: Disabled >> ??NUMA Support: Disabled >> ??Compressed Oops: Enabled (Zero based) >> ??Heap Region Size: 16M >> ??Heap Min Capacity: 16M >> ??Heap Initial Capacity: 2016M >> ??Heap Max Capacity: 30688M >> ??Pre-touch: Disabled >> ??Parallel Workers: 23 >> ??Concurrent Workers: 6 >> ??Concurrent Refinement Workers: 23 >> ??Periodic GC: Disabled >> >> I've changed the generic GCInitLogger so all GCs get these lines >> dumped. I've also added the G1 and Parallel specific lines. I can add >> the Shenandoah and Epsilon specific lines, if it is requested. >> >> Thanks, >> StefanK From zgu at redhat.com Tue Jun 9 12:14:35 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 9 Jun 2020 08:14:35 -0400 Subject: [15] RFR 8246591: Shenandoah: move string dedup roots scanning to concurrent phase Message-ID: Please review this patch that moves string deduplication roots scanning to concurrent phase. Bug: https://bugs.openjdk.java.net/browse/JDK-8246591 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8246591/webrev.00/ Test: hotspot_gc_shenandoah tier1 with "-XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:+ShenandoahVerify -XX:+UseStringDeduplication" Thanks, -Zhengyu From stefan.karlsson at oracle.com Tue Jun 9 12:35:47 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 9 Jun 2020 14:35:47 +0200 Subject: RFR: 8247214: ZGC: ZUncommit initialization should use precious logging Message-ID: <2c959e48-ce95-d025-450d-541d84d813cd@oracle.com> Hi all, Please review this trivial patch to convert the ZUncommit initialization logging to use precious logging. https://cr.openjdk.java.net/~stefank/8247214/webrev.01/ https://bugs.openjdk.java.net/browse/JDK-8247214 Thanks, StefanK From erik.osterlund at oracle.com Tue Jun 9 13:08:35 2020 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Tue, 9 Jun 2020 15:08:35 +0200 Subject: RFR: 8247214: ZGC: ZUncommit initialization should use precious logging In-Reply-To: <2c959e48-ce95-d025-450d-541d84d813cd@oracle.com> References: <2c959e48-ce95-d025-450d-541d84d813cd@oracle.com> Message-ID: <3876ec17-3904-8848-d51a-eebac38cfd54@oracle.com> Hi Stefan, Looks good and trivial. Thanks, /Erik On 2020-06-09 14:35, Stefan Karlsson wrote: > Hi all, > > Please review this trivial patch to convert the ZUncommit > initialization logging to use precious logging. > > https://cr.openjdk.java.net/~stefank/8247214/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8247214 > > Thanks, > StefanK From stefan.karlsson at oracle.com Tue Jun 9 13:12:51 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 9 Jun 2020 15:12:51 +0200 Subject: RFR: 8247214: ZGC: ZUncommit initialization should use precious logging In-Reply-To: <3876ec17-3904-8848-d51a-eebac38cfd54@oracle.com> References: <2c959e48-ce95-d025-450d-541d84d813cd@oracle.com> <3876ec17-3904-8848-d51a-eebac38cfd54@oracle.com> Message-ID: Thanks, Erik. StefanK On 2020-06-09 15:08, Erik ?sterlund wrote: > Hi Stefan, > > Looks good and trivial. > > Thanks, > /Erik > > On 2020-06-09 14:35, Stefan Karlsson wrote: >> Hi all, >> >> Please review this trivial patch to convert the ZUncommit >> initialization logging to use precious logging. 
>> >> https://cr.openjdk.java.net/~stefank/8247214/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8247214 >> >> Thanks, >> StefanK > From per.liden at oracle.com Tue Jun 9 13:21:07 2020 From: per.liden at oracle.com (Per Liden) Date: Tue, 9 Jun 2020 15:21:07 +0200 Subject: RFR: 8247214: ZGC: ZUncommit initialization should use precious logging In-Reply-To: <2c959e48-ce95-d025-450d-541d84d813cd@oracle.com> References: <2c959e48-ce95-d025-450d-541d84d813cd@oracle.com> Message-ID: Looks good! /Per On 6/9/20 2:35 PM, Stefan Karlsson wrote: > Hi all, > > Please review this trivial patch to convert the ZUncommit initialization > logging to use precious logging. > > https://cr.openjdk.java.net/~stefank/8247214/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8247214 > > Thanks, > StefanK From stefan.karlsson at oracle.com Tue Jun 9 13:26:24 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 9 Jun 2020 15:26:24 +0200 Subject: RFR: 8247214: ZGC: ZUncommit initialization should use precious logging In-Reply-To: References: <2c959e48-ce95-d025-450d-541d84d813cd@oracle.com> Message-ID: Thanks, Per! StefanK On 2020-06-09 15:21, Per Liden wrote: > Looks good! > > /Per > > On 6/9/20 2:35 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this trivial patch to convert the ZUncommit >> initialization logging to use precious logging. >> >> https://cr.openjdk.java.net/~stefank/8247214/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8247214 >> >> Thanks, >> StefanK From thomas.schatzl at oracle.com Tue Jun 9 13:55:40 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 9 Jun 2020 15:55:40 +0200 Subject: RFR: 8247214: ZGC: ZUncommit initialization should use precious logging In-Reply-To: <2c959e48-ce95-d025-450d-541d84d813cd@oracle.com> References: <2c959e48-ce95-d025-450d-541d84d813cd@oracle.com> Message-ID: <122184a9-9e74-33ed-286b-e55810716f19@oracle.com> Hi, On 09.06.20 14:35, Stefan Karlsson wrote: > Hi all, > > Please review this trivial patch to convert the ZUncommit initialization > logging to use precious logging. > > https://cr.openjdk.java.net/~stefank/8247214/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8247214 lgtm. Thomas From stefan.karlsson at oracle.com Tue Jun 9 13:57:17 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 9 Jun 2020 15:57:17 +0200 Subject: RFR: 8247214: ZGC: ZUncommit initialization should use precious logging In-Reply-To: <122184a9-9e74-33ed-286b-e55810716f19@oracle.com> References: <2c959e48-ce95-d025-450d-541d84d813cd@oracle.com> <122184a9-9e74-33ed-286b-e55810716f19@oracle.com> Message-ID: <1d803e70-7f8d-8400-8d4d-f9be2745782d@oracle.com> Thanks, Thomas. StefanK On 2020-06-09 15:55, Thomas Schatzl wrote: > Hi, > > On 09.06.20 14:35, Stefan Karlsson wrote: >> Hi all, >> >> Please review this trivial patch to convert the ZUncommit >> initialization logging to use precious logging. >> >> https://cr.openjdk.java.net/~stefank/8247214/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8247214 > > ? lgtm. 
> > Thomas > From ioi.lam at oracle.com Wed Jun 10 05:49:17 2020 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 9 Jun 2020 22:49:17 -0700 Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC In-Reply-To: <67cfca64-c83f-99b2-aabf-3be7cf4acb26@oracle.com> References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com> <80ada90e-598b-c7ca-3782-0a779e195291@oracle.com> <7157bbee-03e7-9fc7-47e0-a25d98970875@oracle.com> <863b1b7d-bb1a-ae3c-2990-eca26803788a@oracle.com> <81cd3cec-e39d-8a16-6e07-2ff0b0b80dc8@oracle.com> <59c601d8-8db9-256c-b3be-2d9064ff0cef@oracle.com> <67cfca64-c83f-99b2-aabf-3be7cf4acb26@oracle.com> Message-ID: <283668b6-23c8-ee26-ba8b-370f79d5d7d5@oracle.com> On 6/9/20 12:57 AM, Thomas Schatzl wrote: > Hi, > > On 09.06.20 00:21, Ioi Lam wrote: >> > Hi Thomas & Jiangli, >> >> Thanks for the comments. I have created an updated webrev: >> >> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v04/ >> >> >> Now the code works the same way as jcmd GC.heap_dump, which also >> check for GCLocker before calling collect_as_vm_thread(): >> >> http://hg.openjdk.java.net/jdk/jdk/file/523a504bb062/src/hotspot/share/gc/shared/gcVMOperations.cpp#l124 >> >> >> Question to Thomas -- is it necessary to call >> Universe::heap()->ensure_parsability(false)? > > Not for the GC, it will do that by itself. From what I understand from > the CDS dump code, it does not walk the heap memory linearly directly > from object to object, and does not rely on HeapRegion statistics > (used bytes etc), so no. > >> >> Have removed the loop and just print a warning message that the >> archived heap may be sub-optimal. > > Thanks. > >> >> Jiangli> Could you please describe the specific issue that you want >> to address with GCLocker? >> >> With the current code base, we execute very simple Java code during >> -Xshare:dump, so after all classes are loaded, GCLocker should not be >> active. As noted in the code comments, GCLocker will be active only >> in the unlikely scenario where the Java code in core lib has been >> modified to do some sort of clean up that involves JNI code. >> >> What do you think? >> >> Thanks >> - Ioi >> >> >> ====== >> [PS] I am adding an assert like this to run tier1-4 in mach5 to see >> if this can happen. >> >> +? assert(!GCLocker::is_active(), "huh"); >> ??? if (GCLocker::is_active()) { >> > > I agree with Jiangli that the run_gc() method should probably be put > in HeapShared. > > Maybe: in the warning that finds an active GC locker, the message may > be less intimidating if the message spoke of an extra GC or something. > I think the important part for the user here could be that this is an > extra GC. > > Something like: "GC locker is held, unable to start extra compacting > GC. This may produce suboptimal results.". > > Feel free to ignore this comment. Hi Thomas, I have moved the function as HeapShared::run_gc() and changed the warning message as you suggested. http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v05/ > > Also the code would look nicer if > CollectedHeap::collect_as_vm_thread() returned a bool about success > itself, but I'll leave that to you. > Since VM_GC_HeapInspection::collect() also has the same code pattern, I think it's best to change this in a separate RFE. Thanks - Ioi > Thanks, > ? 
Thomas From shade at redhat.com Wed Jun 10 06:53:17 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 10 Jun 2020 08:53:17 +0200 Subject: [15] RFR 8246591: Shenandoah: move string dedup roots scanning to concurrent phase In-Reply-To: References: Message-ID: <7bad54a4-d6f0-15bd-cd84-bad71ad3e9bd@redhat.com> On 6/9/20 2:14 PM, Zhengyu Gu wrote: > Please review this patch that moves string deduplication roots scanning > to concurrent phase. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8246591 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8246591/webrev.00/ Looks fine. -- Thanks, -Aleksey From shade at redhat.com Wed Jun 10 08:56:16 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 10 Jun 2020 10:56:16 +0200 Subject: RFR (S) 8247310: Shenandoah: pacer should not affect interrupt status Message-ID: <686314f1-d202-025a-bd88-3951d10dbf53@redhat.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8247310 This was originally found by Aditya Mandaleeka when running sh/jdk11 tests, kudos goes to him for analysing it. See the details in the bug report. Fix: https://cr.openjdk.java.net/~shade/8247310/webrev.01/ Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From thomas.schatzl at oracle.com Wed Jun 10 09:31:00 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 10 Jun 2020 11:31:00 +0200 Subject: RFC: Adaptively resize heap at any GC/SoftMaxHeapSize for G1 Message-ID: <4528908c-e34c-270b-89ec-df798c9a8e84@oracle.com> Hi all, Liang, after a few months of busy working in the area of G1 heap resizing and ultimately SoftMaxHeapSize support, I am fairly okay with a first preview of these associated changes. So I would like to ask for feedback on the current changes for what I intend to complete in the (early) jdk16 timeframe. This is not a request for review of the changes for pushing, although feedback on the code is also appreciated. From my point of view only tuning a few heuristics and code polishing is left to do as the change seems to do what it is intended to do. In particular it would be nice if Liang Mao, the original requestor of all this functionality, could help with feedback on his loads. :) Just to recap: Sometime around end of last year, Liang posted review(s) with functionality to: - concurrent uncommit of memory - implement SoftMaxHeapSize by uncommitting free memory That did not work well in some cases, so we agreed on us at Oracle taking over. Today I would like to talk about the progress on the second part :) The original proposal did not work well because it did not really change how G1 resized the heap - i.e. SoftMaxHeapSize related changes to the heap were quickly undone by regular heap expansion because it was too aggressive for several reasons (e.g. bugs like JDK-8243672, JDK-8244603), uncooperative (JDK-8238686) and never actually helped shrinking or keeping a particular heap size. This resulted in lots of unnecessary heap changes even on known constant load. After some analysis after fixing these issues (at least internally ;)) I thought that for G1 to keep a particular heap size G1 needs to have an element in its heap sizing control loop that pushes back on (excessive) heap expansion. The best approach I thought of has been to introduce a *lower* GCTimeRatio that G1 tries to stay *above* by resizing the heap. Effectively, G1 then tries to stay within ["LowerGCTimeRatio", GCTimeRatio] for its actual gc time ratio. 
That works out fairly well actually, and today I think the code is in a
state where, while still heavy in development (it does look like that :)
still), it can be provided for gathering feedback on more loads from you.

First, how to try it out before going into the details and questions I
have: this is a series of patches, which I put up on cr.openjdk.net, that
need to be applied on recent trunk.

These are the ones already out for review:

1) JDK-8243672: http://cr.openjdk.java.net/~tschatzl/8243672/webrev.1/
2) JDK-8244603: http://cr.openjdk.java.net/~tschatzl/8244603/webrev/

These are in the pipeline and not "fully complete" yet:

3) JDK-8238163: http://cr.openjdk.java.net/~tschatzl/8238163/webrev/ (optional)
4) JDK-8238686: http://cr.openjdk.java.net/~tschatzl/8238686/webrev/
5) JDK-8238687: http://cr.openjdk.java.net/~tschatzl/8238687/webrev/
6) JDK-8236073: http://cr.openjdk.java.net/~tschatzl/8236073/webrev/

All of the above: http://cr.openjdk.java.net/~tschatzl/8236073/webrev.preview/

What these do:

(1) and (2) make the input variables to the control loop more consistent.
Since they are out for review, I would defer to the review threads for
them.

(3) stabilizes the IHOP calculation a bit, trying to improve uncommon
situations. This change is optional.

(4) fixes the issue with resizing at Remark being totally disconnected
from actual load, causing some erratic expansions and shrinks. After some
time tinkering with that I decided to remove resizing at Remark - since we
check heap size at every gc anyway, this is not required any more (but it
also delays uncommit to the next gc).

(5) is the main change and implements what has been outlined above: G1
tries to keep the actual GC time ratio within the range of LowerGCTimeRatio
and GCTimeRatio. As long as the actual GC time ratio is within this range,
no action occurs. As soon as it finds that there is a trend of being
outside, it tries to correct for that, internally trying to reach an
actual gc time ratio in the middle of that range.

(6) implements SoftMaxHeapSize on top of that, trying to steer IHOP so
that G1 does not use more than that. (I.e. a complete mess of potentially
conflicting goals ;)

What I would like to ask you is to try out these changes on your load(s),
and potentially report back with at least gc*,gc=debug,gc+ergo+heap=trace
logging. Of course more feedback about how it works for you is even
better, and if you are adventurous, maybe try tuning the (internal) knobs
a bit, which I'll describe in a minute :)

As mentioned, the changes are not complete; here's what I think should
still be tuned a bit, and what I expect helps. The interesting method is
G1HeapSizingPolicy::resize_amount_after_young_gc().

- determining the desired gc time ratio range: there is a new (likely
temporary) option G1MinimumPercentOfGCTimeRatio that determines the lower
gc time ratio described above as a percentage of the original GCTimeRatio.
It is currently set at 50%, which seems a good value: a too tight range
will cause lots of resizing (which might be good), and a too large range
will effectively disable shrinking (which also might be desired). Either
way, this value works fairly well so far in my tests. Suggestions very
appreciated.

- detection of being outside of the expected gc time ratio range: this
somewhat works as before, separating short term and long term behavior.
Long term behavior: every X gcs without a heap resize, G1 checks whether
the long term gc time ratio is outside of the bounds and reacts if so. I
think this is fairly straightforward.
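Again only as a sketch (invented names, just to illustrate the rule):

    // Long term: only consulted after a while without any resize.
    if (gcs_since_last_resize >= long_term_interval &&
        !within_bounds(long_term_gc_time_ratio)) {
      resize_towards_middle_of_range();
    }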
Short term behavior: tracks the number of times the short term gc time ratio exceeds the bounds in a single variable, incrementing or decrementing it depending on whether the current gc time ratio is above or below the gc time ratio bounds. If that value exceeds certain thresholds, G1 resizes the heap. There is a new bias towards expansion at startup to make g1 react faster at that time, and some decay towards "no action to be taken" if for a "long" time nothing happens. I reused the same values for "short" time (+/-4) and "long" (10) as before, they seem to be okay. - actual resizing: expansion is supposed to be the same as before, relatively aggressive, which I intend to keep. Shrinking is based on the number of free regions at the moment. This is not optimal because e.g. you do not want to shrink below what is needed for the current eden (and the survivors of the next gc). Other than that it is bounded by a percentage of the number of free regions (G1ShrinkByPercentOfAvailable). That results in some heap size undershoot in some cases (i.e. temporarily uncommitting a bit too much), but in my tests it hasn't been too bad. Still rather (too) simple, expect some tunings and changes particularly here, deviating a bit more from the expansion code. Comments and ideas in this area, especially ones applied to your workloads, are particularly appreciated. Another big area not yet really tested is interaction with JEP 346: Promptly Return Unused Committed Memory from G1, but I am certain that with it you can reduce heap usage a lot (too much?). My performance (throughput) tests so far look almost always encouraging: 20-30% less heap size with statistically insignificant throughput changes. There are some exceptions; in these cases you lose 10% of throughput for around 90% less heap usage. The only really bad results come from tests that try to find the maximum throughput of g1 by incrementally increasing the load, finding out that it does not work, slightly backing off the load and then increasing it again to find an "equilibrium". From what I can tell it looks like the heap sizing follows the application (i.e. what it's supposed to do), making the application think it's already done while there is still more heap available to potentially increase performance (looking at you, specjbb2015 out-of-box performance!). Not yet sure how to counter that, but some decrease in the default GCTimeRatio to decrease the shrinking aggressiveness (and keep more heap longer) might fix this. Of course, if you disable this adaptive heap sizing by fixing the heap min/max in your benchmarks, there are no differences to before. One interesting upcoming change is to make MinHeapSize manageable (JDK-8224879) to help the algorithm a bit. As closing words, given that the email is quite long already, thanks for your attention and looking forward to feedback :) If you have questions, please chime in too, I am happy to answer them. Thanks, Thomas From zgu at redhat.com Wed Jun 10 12:41:01 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 10 Jun 2020 08:41:01 -0400 Subject: RFR (S) 8247310: Shenandoah: pacer should not affect interrupt status In-Reply-To: <686314f1-d202-025a-bd88-3951d10dbf53@redhat.com> References: <686314f1-d202-025a-bd88-3951d10dbf53@redhat.com> Message-ID: <708b11b1-b9ea-1b87-fc54-e134394a8514@redhat.com> Looks fine.
-Zhengyu On 6/10/20 4:56 AM, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8247310 > > This was originally found by Aditya Mandaleeka when running sh/jdk11 tests, kudos goes to him for > analysing it. See the details in the bug report. > > Fix: > https://cr.openjdk.java.net/~shade/8247310/webrev.01/ > > Testing: hotspot_gc_shenandoah > From leo.korinth at oracle.com Wed Jun 10 12:46:46 2020 From: leo.korinth at oracle.com (Leo Korinth) Date: Wed, 10 Jun 2020 14:46:46 +0200 Subject: RFR: 8247213: G1: Reduce usage of volatile in favour of Atomic operations Message-ID: <4611e386-2d3f-efaa-fdce-920c7d3d5b85@oracle.com> Hi, could I have a review for this change that adds AtomicValue<> to atomic.hpp and uses it in G1? I am adding an AtomicValue<> to atomic.hpp. This is an opaque type (the "value" is private) that protects against non-atomic operations being used by mistake. AtomicValue methods are proxied to the corresponding (all-static) Atomic:: methods. All operations are explicitly atomic and performed in a type-safe manner with semantics defined in enum atomic_memory_order (memory_order_conservative by default). Instead of using field variables as volatile, I change them to be of AtomicValue<> type. No need to verify that += (for example) will result in an atomic instruction on all supported compilers. I have some open questions regarding the exported fields in vmStructs_g1.hpp. Today, volatile fields are sometimes "exported" as volatile (and sometimes not, and I guess this is by mistake). I choose to export them all as non-volatile. From what I can see the volatile specific part only casts to void* (only documentation?). Java code is unchanged and still access them as the unwrapped values (static assert in atomic.hpp guarantees that memory layout is the same for T and AtomicValue). I think this is okay, but would like feedback on all this. The change is presented as a two part change. The first part changes all volatile to AtomicValue, the other part removes the AtomicValue part on non-field accesses. By doing it two part I will not forget to transform some operations by mistake. Copyright years will be updated when all other changes are approved. How about pushing this after 15 is branched off and thus have it for 16? Enhancement: https://bugs.openjdk.java.net/browse/JDK-8247213 Webrev: http://cr.openjdk.java.net/~lkorinth/8247213/0/part1 http://cr.openjdk.java.net/~lkorinth/8247213/0/part2 http://cr.openjdk.java.net/~lkorinth/8247213/0/full Testing: tier1-3 Thanks, Leo From shade at redhat.com Wed Jun 10 14:06:17 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 10 Jun 2020 16:06:17 +0200 Subject: RFR (S) 8247310: Shenandoah: pacer should not affect interrupt status In-Reply-To: <708b11b1-b9ea-1b87-fc54-e134394a8514@redhat.com> References: <686314f1-d202-025a-bd88-3951d10dbf53@redhat.com> <708b11b1-b9ea-1b87-fc54-e134394a8514@redhat.com> Message-ID: On 6/10/20 2:41 PM, Zhengyu Gu wrote: > Looks fine. Cheers, pushed.
-- -Aleksey From jianglizhou at google.com Wed Jun 10 16:38:19 2020 From: jianglizhou at google.com (Jiangli Zhou) Date: Wed, 10 Jun 2020 09:38:19 -0700 Subject: RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC In-Reply-To: <283668b6-23c8-ee26-ba8b-370f79d5d7d5@oracle.com> References: <793b0618-184a-eec6-4136-4344b221e4a7@oracle.com> <80ada90e-598b-c7ca-3782-0a779e195291@oracle.com> <7157bbee-03e7-9fc7-47e0-a25d98970875@oracle.com> <863b1b7d-bb1a-ae3c-2990-eca26803788a@oracle.com> <81cd3cec-e39d-8a16-6e07-2ff0b0b80dc8@oracle.com> <59c601d8-8db9-256c-b3be-2d9064ff0cef@oracle.com> <67cfca64-c83f-99b2-aabf-3be7cf4acb26@oracle.com> <283668b6-23c8-ee26-ba8b-370f79d5d7d5@oracle.com> Message-ID: The updated version looks good and cleaner. Thanks! Jiangli On Tue, Jun 9, 2020 at 10:50 PM Ioi Lam wrote: > > > > On 6/9/20 12:57 AM, Thomas Schatzl wrote: > > Hi, > > > > On 09.06.20 00:21, Ioi Lam wrote: > >> > Hi Thomas & Jiangli, > >> > >> Thanks for the comments. I have created an updated webrev: > >> > >> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v04/ > >> > >> > >> Now the code works the same way as jcmd GC.heap_dump, which also > >> check for GCLocker before calling collect_as_vm_thread(): > >> > >> http://hg.openjdk.java.net/jdk/jdk/file/523a504bb062/src/hotspot/share/gc/shared/gcVMOperations.cpp#l124 > >> > >> > >> Question to Thomas -- is it necessary to call > >> Universe::heap()->ensure_parsability(false)? > > > > Not for the GC, it will do that by itself. From what I understand from > > the CDS dump code, it does not walk the heap memory linearly directly > > from object to object, and does not rely on HeapRegion statistics > > (used bytes etc), so no. > > > >> > >> Have removed the loop and just print a warning message that the > >> archived heap may be sub-optimal. > > > > Thanks. > > > >> > >> Jiangli> Could you please describe the specific issue that you want > >> to address with GCLocker? > >> > >> With the current code base, we execute very simple Java code during > >> -Xshare:dump, so after all classes are loaded, GCLocker should not be > >> active. As noted in the code comments, GCLocker will be active only > >> in the unlikely scenario where the Java code in core lib has been > >> modified to do some sort of clean up that involves JNI code. > >> > >> What do you think? > >> > >> Thanks > >> - Ioi > >> > >> > >> ====== > >> [PS] I am adding an assert like this to run tier1-4 in mach5 to see > >> if this can happen. > >> > >> + assert(!GCLocker::is_active(), "huh"); > >> if (GCLocker::is_active()) { > >> > > > > I agree with Jiangli that the run_gc() method should probably be put > > in HeapShared. > > > > Maybe: in the warning that finds an active GC locker, the message may > > be less intimidating if the message spoke of an extra GC or something. > > I think the important part for the user here could be that this is an > > extra GC. > > > > Something like: "GC locker is held, unable to start extra compacting > > GC. This may produce suboptimal results.". > > > > Feel free to ignore this comment. > > Hi Thomas, I have moved the function as HeapShared::run_gc() and changed > the warning message as you suggested. > > http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v05/ > > > > Also the code would look nicer if > > CollectedHeap::collect_as_vm_thread() returned a bool about success > > itself, but I'll leave that to you. 
> > > > Since VM_GC_HeapInspection::collect() also has the same code pattern, I > think it's best to change this in a separate RFE. > > Thanks > - Ioi > > Thanks, > > Thomas > > From ralf.schmelter at sap.com Wed Jun 10 21:00:15 2020 From: ralf.schmelter at sap.com (Schmelter, Ralf) Date: Wed, 10 Jun 2020 21:00:15 +0000 Subject: RFR(S): JDK-8247362 HeapDumpCompressedTest.java#id0 fails due to "Multiple garbage collectors selected" Message-ID: Hi, https://bugs.openjdk.java.net/browse/JDK-8237354 added a test, which did not properly protect against explicitly set GCs (for serial, parallel and G1 GC). This fixes it by adding the corresponding @requires tag for each of the three GCs. bugreport: https://bugs.openjdk.java.net/browse/JDK-8247362 webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8247362/webrev.0/ Best regards, Ralf From daniel.daugherty at oracle.com Wed Jun 10 21:06:43 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Wed, 10 Jun 2020 17:06:43 -0400 Subject: RFR(S): JDK-8247362 HeapDumpCompressedTest.java#id0 fails due to "Multiple garbage collectors selected" In-Reply-To: References: Message-ID: <708c0e2c-7838-0cfb-ea1b-1de5ae43a830@oracle.com> Hi Ralf, This looks correct to me, but please wait for one of the GC folks to chime in on this thread... Dan On 6/10/20 5:00 PM, Schmelter, Ralf wrote: > Hi, > > https://bugs.openjdk.java.net/browse/JDK-8237354 added a test, which did not properly protect against explicitly set GCs (for serial, parallel and G1 GC). This fixes it by adding the corresponding @requires tag for each of the three GCs. > > bugreport: https://bugs.openjdk.java.net/browse/JDK-8247362 > webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8247362/webrev.0/ > > Best regards, > Ralf > From stefan.karlsson at oracle.com Wed Jun 10 21:32:35 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 10 Jun 2020 23:32:35 +0200 Subject: RFR(S): JDK-8247362 HeapDumpCompressedTest.java#id0 fails due to "Multiple garbage collectors selected" In-Reply-To: References: Message-ID: Looks good. StefanK On 2020-06-10 23:00, Schmelter, Ralf wrote: > > Hi, > > https://bugs.openjdk.java.net/browse/JDK-8237354 added a test, which > did not properly protect against explicitly set GCs (for serial, > parallel and G1 GC). This fixes it by adding the corresponding > @requires tag for each of the three GCs. > > bugreport: https://bugs.openjdk.java.net/browse/JDK-8247362 > > > webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8247362/webrev.0/ > > Best regards, > > Ralf > From ralf.schmelter at sap.com Wed Jun 10 21:38:33 2020 From: ralf.schmelter at sap.com (Schmelter, Ralf) Date: Wed, 10 Jun 2020 21:38:33 +0000 Subject: RFR(S): JDK-8247362 HeapDumpCompressedTest.java#id0 fails due to "Multiple garbage collectors selected" In-Reply-To: References: Message-ID: Hi Stefan and Daniel, Thanks for reviewing. I will push this change if there are no further concerns. Best regards, Ralf -----Original Message----- From: Stefan Karlsson Sent: Wednesday, 10 June 2020 23:33 To: Schmelter, Ralf ; serviceability-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net runtime ; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR(S): JDK-8247362 HeapDumpCompressedTest.java#id0 fails due to "Multiple garbage collectors selected" Looks good. 
StefanK On 2020-06-10 23:00, Schmelter, Ralf wrote: > > Hi, > > https://bugs.openjdk.java.net/browse/JDK-8237354 added a test, which > did not properly protect against explicitly set GCs (for serial, > parallel and G1 GC). This fixes it by adding the corresponding > @requires tag for each of the three GCs. > > bugreport: https://bugs.openjdk.java.net/browse/JDK-8247362 > > > webrev: http://cr.openjdk.java.net/~rschmelter/webrevs/8247362/webrev.0/ > > Best regards, > > Ralf > From stefan.johansson at oracle.com Thu Jun 11 07:51:00 2020 From: stefan.johansson at oracle.com (stefan.johansson at oracle.com) Date: Thu, 11 Jun 2020 09:51:00 +0200 Subject: RFR (L): 8244603 and 8238858: Improve young gen sizing In-Reply-To: References: Message-ID: <5da7c2e2-2d36-de11-d0b7-91cdf6fdc077@oracle.com> Hi Thomas, Sorry for not getting to this sooner. On 2020-05-19 15:37, Thomas Schatzl wrote: > Hi all, > > ? can I have reviews for this change that improves young gen sizing a > lot to prepare for heap shrinking during young gc (JDK-8238687) ;) > > In particular, with this change the following issues two related issues > are fixed: > > * 8238858: G1 Mixed gc young gen sizing might cause the first mixed gc > to immediately follow the prepare mixed gc > * 8244603: G1 incorrectly limiting young gen size when using the reserve > can result in repeated full gcs > > These have been grouped together because it is too hard to separate them > out as the bugs required significant rewrite in the young gen sizing. > > This results in G1 following GCTimeRatio much better than before, > leading to less erratic heap expansion. That is, constant loads do not > result in that much overcommit any more. > > Some background: > > At end of gc and when the remembered set sampling thread (re-)evaluates > the young gen G1 calculates two values: > > - the *desired* (unconstrained) size of the young gen; > desired/unconstrained meaning that these values are not limited by > actually existing free regions. This value is interesting for adaptive > IHOP so that it (better) converges to a fixed value. (There is some > bugfix, JDK-8238163, coming for this to fix a few issues with that) > > - the actual *target* size for the young gen, i.e. after taking > constraints in available free regions into account, and whether we are > currently already needing to use parts of the reserve or not. > > Some problems that were fixed with the current code: > > - during calculation of the *desired* young gen length G1 basically > sizes the young gen during mixed gc to the minimum allowed size always. > This causes unnecessary spikes in short/long term ratios, causing lots > of heap increases even with a constant load. > Since we do not shrink the heap yet during regular gcs, this typically > ended up in fully expanding the heap (unless the reclamations during > Remark actually reclaimed something, but the equilibrium of committed > heap between these two mechanisms is much higher). > > E.g. on specjbb2015 fixed IR (constant load) with "small" and "medium" > load G1 will use half the heap now while staying < GCTimeRatio. > > - at end of gc g1 calculates the young gen for the *next* gc, i.e. > during the prepare mixed gc g1 should already use the "reduced" amount > of regions (that's JDK-8238858); similarly the *last* mixed gc in the > mixed gc phase should already use the calculation for the young phase. > The current code does not. 
This partially fixes some "at end of mixed gc > it takes a while for g1 to achieve previous young gen size again" issues. > (There is a CR for that, but as mentioned, this change does not > completely solve it). > > - there were some calculations to ensure that "at least one region will > be allocated" every time g1 recalculates young gen but that really > resulted in g1 increasing the young gen by at least one. You do not > always want that, particularly since we regularly revise the young gen. > What you want is a minimum desired *eden* size. > > - the desired and target young gen size calculation was done in a single > huge method. This change splits up the code a bit. > > - the code that calculated the actual *target* young length has been > very inconsistent in handling the reserve. I.e. the limit to the > actually available free regions only applied in some cases, and notably > not if you were already using the reserve, causing strange behavior > where at least the calculated young gen target length was higher than > available free regions. This could cause full gcs. > > - I added some detailed trace-level logging for the ergonomic decisions > which really helps when debugging issues, but might be too > large/intrusive for the code - I am not sure whether to leave it in. > > Reviewing: I think the best entry point for this change is > G1Policy::update_young_list_target_length(size_t) where the new code > first calculates a desired young gen length and then target young length. > > Some additional notes: > - eden before a mixed gc is calculated by predicting the time for the > minimum amount of old gen regions we will definitely take > (min_old_cset_length) and then letting eden fill up the remainder. > > Often this results in significantly larger young gens than before this > change, at worst young gen will be limited to minimum young gen size (as > before). Overall it works fairly well, i.e. gives much smoother cpu > usage. There is a caveat to that in that it depends on accuracy of > predictions. Since G1 predictions are often too high, we might want to > take more a lot more optional regions in the future to not be required > to early terminate the mixed gc > I.e. I have often seen that we do not use up the 200ms pause time goal. > > - currently, and like before, MMU desired length overrides the pause > time desired length. I.e. *if* a GCPauseTimeIntervalMillis is set, the > spacing between gcs is more important than actual pause times. The > difference is that now that there is an explicit comment about this > behavior there :) > > - when eating into the reserve (last 10% of heap), we at most allow use > of the sizer's min young length or half of the reserve regions (rounded > up!), whichever is higher. This is an arbitrary decision, but since > supposedly at that time we are already close to the next mixed gc due to > adaptive ihop, so we can take more. > > - note that all this is about calculating a *target* length, not the > actual length. Actual length may be higher e.g. due to gclocker. > > - there may be some out-of-box performance regressions since G1 does not > expand the heap that much more. Performance can be restored by either > decreasing GCTimeRatio, or better setting minimum heap sizes. > > Actually, in the future, when shrinking is implemented (JDK-8238687), > these may be more severe (in some benchmarks, actual gc usage is still > <2%). I will likely try to balance that with decreasing default > GCTimeRatio value in the future. 
> > CR: > https://bugs.openjdk.java.net/browse/JDK-8244603 > https://bugs.openjdk.java.net/browse/JDK-8238858 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8244603/webrev/ Very nice change Thomas, really helpful with all comments. As I've mentioned to you offline I think we can re-structure the code a bit, to separate the updating of young length bounds from the returning of values. Here's a suggestion on how to do that: http://cr.openjdk.java.net/~sjohanss/8244603/rev-1/

src/hotspot/share/gc/g1/g1Analytics.cpp
---
226 double G1Analytics::predict_alloc_rate_ms() const {
227   if (!enough_samples_available(_alloc_rate_ms_seq)) {
228     return predict_zero_bounded(_alloc_rate_ms_seq);
229   } else {
230     return 0.0;
231   }
232 }

As discussed, on line 227 the ! should be removed. --- Apart from this I think it is all good. There are a few places in g1Policy.cpp where local variables could be either merged or skipped, but I think they add to the overall ease of understanding. Thanks, Stefan > Testing: > mach5 tier1-5, perf testing > > Thanks, > Thomas From shade at redhat.com Thu Jun 11 08:47:53 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 11 Jun 2020 10:47:53 +0200 Subject: RFR (XS) 8247358: Shenandoah: reconsider free budget slice for marking Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8247358 See the details in JIRA. Fix: https://cr.openjdk.java.net/~shade/8247358/webrev.01/ Testing: hotspot_gc_shenandoah, benchmarks -- Thanks, -Aleksey From shade at redhat.com Thu Jun 11 08:49:43 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 11 Jun 2020 10:49:43 +0200 Subject: RFR (S) 8247367: Shenandoah: pacer should wait on lock instead of exponential backoff Message-ID: <7e2fa715-211a-4459-f20d-1075afeec180@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8247367 After JDK-8247310, we can just use the wait/notify on the newly introduced lock to coordinate wakeups of paced threads. This avoids doing exponential backoff that introduces additional latency. Fix: https://cr.openjdk.java.net/~shade/8247367/webrev.01/ Testing: hotspot_gc_shenandoah; benchmarks -- Thanks, -Aleksey From leo.korinth at oracle.com Thu Jun 11 12:39:27 2020 From: leo.korinth at oracle.com (Leo Korinth) Date: Thu, 11 Jun 2020 14:39:27 +0200 Subject: RFR: 8247213: G1: Reduce usage of volatile in favour of Atomic operations In-Reply-To: <4611e386-2d3f-efaa-fdce-920c7d3d5b85@oracle.com> References: <4611e386-2d3f-efaa-fdce-920c7d3d5b85@oracle.com> Message-ID: <332626b7-5947-523a-2622-d2555da2c6e8@oracle.com> Let's take this review on hotspot-dev instead. Sorry for the confusion. Thanks, Leo On 10/06/2020 14:46, Leo Korinth wrote: > Hi, could I have a review for this change that adds AtomicValue<> to > atomic.hpp and uses it in G1? > > I am adding an AtomicValue<> to atomic.hpp. This is an opaque type (the > "value" is private) that protects against non-atomic operations being > used by mistake. AtomicValue methods are proxied to the corresponding > (all-static) Atomic:: methods. All operations are explicitly atomic and > performed in a type-safe manner with semantics defined in enum atomic_memory_order > (memory_order_conservative by default). > > Instead of using field variables as volatile, I change them to be of > AtomicValue<> type. No need to verify that += (for example) will result > in an atomic instruction on all supported compilers. > > I have some open questions regarding the exported fields in > vmStructs_g1.hpp.
Today, volatile fields are sometimes "exported" as > volatile (and sometimes not, and I guess this is by mistake). I choose > to export them all as non-volatile. From what I can see the volatile > specific part only casts to void* (only documentation?). Java code is > unchanged and still access them as the unwrapped values (static assert > in atomic.hpp guarantees that memory layout is the same for T and > AtomicValue). I think this is okay, but would like feedback on all this. > > The change is presented as a two part change. The first part changes all > volatile to AtomicValue, the other part removes the AtomicValue part on > non-field accesses. By doing it two part I will not forget to transform > some operations by mistake. > > Copyright years will be updated when all other changes are approved. > > How about pushing this after 15 is branched off and thus have it for 16? > > Enhancement: > https://bugs.openjdk.java.net/browse/JDK-8247213 > > Webrev: > http://cr.openjdk.java.net/~lkorinth/8247213/0/part1 > http://cr.openjdk.java.net/~lkorinth/8247213/0/part2 > http://cr.openjdk.java.net/~lkorinth/8247213/0/full > > Testing: > ? tier1-3 > > Thanks, > Leo From zgu at redhat.com Thu Jun 11 13:12:02 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 11 Jun 2020 09:12:02 -0400 Subject: RFR (S) 8247367: Shenandoah: pacer should wait on lock instead of exponential backoff In-Reply-To: <7e2fa715-211a-4459-f20d-1075afeec180@redhat.com> References: <7e2fa715-211a-4459-f20d-1075afeec180@redhat.com> Message-ID: <8d1b7ef4-427b-9460-0ec7-70a05b7738fc@redhat.com> Looks good. -Zhengyu On 6/11/20 4:49 AM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8247367 > > After JDK-8247310, we can just use the wait/notify on the newly introduced lock to coordinate > wakeups of paced threads. This avoids doing exponential backoff that introduces additional latency. > > Fix: > https://cr.openjdk.java.net/~shade/8247367/webrev.01/ > > Testing: hotspot_gc_shenandoah; benchmarks > From shade at redhat.com Fri Jun 12 10:56:18 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 12 Jun 2020 12:56:18 +0200 Subject: [15] RFR (XS) 8247474: Shenandoah: Windows build warning after JDK-8247310 Message-ID: <378c703d-3e8c-4550-241b-e44adb111638@redhat.com> (resending from proper email) Bug: https://bugs.openjdk.java.net/browse/JDK-8247474 Since the underlying cause is the changeset in 15, I am planning to push it to jdk/jdk15. Fix: diff -r a39eb5a4f1c1 src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp --- a/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp Thu Jun 11 18:16:32 2020 +0200 +++ b/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp Fri Jun 12 12:38:30 2020 +0200 @@ -279,10 +279,11 @@ } -void ShenandoahPacer::wait(long time_ms) { +void ShenandoahPacer::wait(size_t time_ms) { // Perform timed wait. It works like like sleep(), except without modifying // the thread interruptible status. MonitorLocker also checks for safepoints. 
assert(time_ms > 0, "Should not call this with zero argument, as it would stall until notify"); + assert(time_ms <= LONG_MAX, "Sanity"); MonitorLocker locker(_wait_monitor); - _wait_monitor->wait(time_ms); + _wait_monitor->wait((long)time_ms); } diff -r a39eb5a4f1c1 src/hotspot/share/gc/shenandoah/shenandoahPacer.hpp --- a/src/hotspot/share/gc/shenandoah/shenandoahPacer.hpp Thu Jun 11 18:16:32 2020 +0200 +++ b/src/hotspot/share/gc/shenandoah/shenandoahPacer.hpp Fri Jun 12 12:38:30 2020 +0200 @@ -102,5 +102,5 @@ size_t update_and_get_progress_history(); - void wait(long time_ms); + void wait(size_t time_ms); void notify_waiters(); }; Testing: Linux, Windows builds; hotspot_gc_shenandoah Thanks, -Aleksey From rkennke at redhat.com Sun Jun 14 09:13:16 2020 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 14 Jun 2020 11:13:16 +0200 Subject: [15] RFR (XS) 8247474: Shenandoah: Windows build warning after JDK-8247310 In-Reply-To: <378c703d-3e8c-4550-241b-e44adb111638@redhat.com> References: <378c703d-3e8c-4550-241b-e44adb111638@redhat.com> Message-ID: <1c96cdc1f7c6b73bb919c0d5cbb2da8fa41867d1.camel@redhat.com> Looks good! Thank you! Roman (resending from proper email) > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8247474 > > Since the underlying cause is the changeset in 15, I am planning to > push it to jdk/jdk15. > > Fix: > > diff -r a39eb5a4f1c1 > src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp Thu > Jun 11 18:16:32 2020 +0200 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp Fri > Jun 12 12:38:30 2020 +0200 > @@ -279,10 +279,11 @@ > } > > -void ShenandoahPacer::wait(long time_ms) { > +void ShenandoahPacer::wait(size_t time_ms) { > // Perform timed wait. It works like like sleep(), except without > modifying > // the thread interruptible status. MonitorLocker also checks for > safepoints. > assert(time_ms > 0, "Should not call this with zero argument, as > it would stall until notify"); > + assert(time_ms <= LONG_MAX, "Sanity"); > MonitorLocker locker(_wait_monitor); > - _wait_monitor->wait(time_ms); > + _wait_monitor->wait((long)time_ms); > } > > diff -r a39eb5a4f1c1 > src/hotspot/share/gc/shenandoah/shenandoahPacer.hpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahPacer.hpp Thu > Jun 11 18:16:32 2020 +0200 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahPacer.hpp Fri > Jun 12 12:38:30 2020 +0200 > @@ -102,5 +102,5 @@ > size_t update_and_get_progress_history(); > > - void wait(long time_ms); > + void wait(size_t time_ms); > void notify_waiters(); > }; > > > Testing: Linux, Windows builds; hotspot_gc_shenandoah > > Thanks, > -Aleksey > From shade at redhat.com Mon Jun 15 07:38:37 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Jun 2020 09:38:37 +0200 Subject: [15] RFR (XS) 8247560: Shenandoah: heap iteration holds root locks all the time Message-ID: Bug: https://bugs.openjdk.java.net/browse/JDK-8247560 Newly added compressed hprof test exposes a trouble with Shenandoah heap iteration: "Attempting to wait on monitor HProf Compression Backend/11 while holding lock CodeCache_lock/6 -- possible deadlock". ShenandoahHeapIterationRootScanner holds the CodeCache_lock for code roots iteration, and it lingers for the entirety of heap iteration. 
The fix is to scope it properly:

diff -r a39eb5a4f1c1 src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
--- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Thu Jun 11 18:16:32 2020 +0200
+++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Mon Jun 15 09:19:21 2020 +0200
@@ -1298,9 +1298,14 @@
   Stack oop_stack;

-  // First, we process GC roots according to current GC cycle. This populates the work stack with initial objects.
-  ShenandoahHeapIterationRootScanner rp;
   ObjectIterateScanRootClosure oops(&_aux_bit_map, &oop_stack);

-  rp.roots_do(&oops);
+  {
+    // First, we process GC roots according to current GC cycle.
+    // This populates the work stack with initial objects.
+    // It is important to relinquish the associated locks before diving
+    // into heap dumper.
+    ShenandoahHeapIterationRootScanner rp;
+    rp.roots_do(&oops);
+  }

   // Work through the oop stack to traverse heap.

Testing: hotspot_gc_shenandoah, affected tests (many times) -- Thanks, -Aleksey From thomas.schatzl at oracle.com Mon Jun 15 09:23:20 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 15 Jun 2020 11:23:20 +0200 Subject: RFR (L): 8244603 and 8238858: Improve young gen sizing In-Reply-To: <5da7c2e2-2d36-de11-d0b7-91cdf6fdc077@oracle.com> References: <5da7c2e2-2d36-de11-d0b7-91cdf6fdc077@oracle.com> Message-ID: <37a95d37-5253-4dff-ff46-3b82b4e4bcdf@oracle.com> Hi Stefan, On 11.06.20 09:51, stefan.johansson at oracle.com wrote: > Hi Thomas, > > Sorry for not getting to this sooner. > > On 2020-05-19 15:37, Thomas Schatzl wrote: >> Hi all, >> >> can I have reviews for this change that improves young gen sizing a >> lot to prepare for heap shrinking during young gc (JDK-8238687) ;) [...] >> Actually, in the future, when shrinking is implemented (JDK-8238687), >> these may be more severe (in some benchmarks, actual gc usage is still >> <2%). I will likely try to balance that with decreasing default >> GCTimeRatio value in the future. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8244603 >> https://bugs.openjdk.java.net/browse/JDK-8238858 >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8244603/webrev/ > Very nice change Thomas, really helpful with all comments. > > As I've mentioned to you offline I think we can re-structure the code a > bit, to separate the updating of young length bounds from the returning > of values. Here's a suggestion on how to do that: > http://cr.openjdk.java.net/~sjohanss/8244603/rev-1/ >
> src/hotspot/share/gc/g1/g1Analytics.cpp
> ---
> 226 double G1Analytics::predict_alloc_rate_ms() const {
> 227   if (!enough_samples_available(_alloc_rate_ms_seq)) {
> 228     return predict_zero_bounded(_alloc_rate_ms_seq);
> 229   } else {
> 230     return 0.0;
> 231   }
> 232 }
>
> As discussed, on line 227 the ! should be removed. > --- > > Apart from this I think it is all good. There are a few places in > g1Policy.cpp where local variables could be either merged or skipped, > but I think they add to the overall ease of understanding. Applied all your comments.
New webrev: http://cr.openjdk.java.net/~tschatzl/8244603/webrev.0_to_1 (diff) http://cr.openjdk.java.net/~tschatzl/8244603/webrev.1 (full) Thanks, Thomas From shade at redhat.com Mon Jun 15 09:58:44 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Jun 2020 11:58:44 +0200 Subject: RFR (XS) 8247575: serviceability/dcmd/gc/HeapDumpCompressedTest unlocks experimental options for Shenandoah and Z Message-ID: <6c65077f-65c5-764c-c68b-d8e695f3e28c@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8247575 I was looking at this new test, and thought we should be consistent here: diff -r 627cfc1935b7 test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTest.java --- a/test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTest.java Fri Jun 12 16:40:47 2020 +0200 +++ b/test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTest.java Mon Jun 15 11:55:47 2020 +0200 @@ -80,5 +80,5 @@ * java.management * jdk.internal.jvmstat/sun.jvmstat.monitor - * @run main/othervm -XX:+UnlockExperimentalVMOptions -XX:+UseZGC HeapDumpCompressedTest + * @run main/othervm -XX:+UseZGC HeapDumpCompressedTest */ @@ -92,5 +92,5 @@ * java.management * jdk.internal.jvmstat/sun.jvmstat.monitor - * @run main/othervm -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC HeapDumpCompressedTest + * @run main/othervm -XX:+UseShenandoahGC HeapDumpCompressedTest */ I can push it to either jdk/jdk15 or jdk/jdk. Testing: affected test on Linux x86_64 fastdebug, nothing else -- Thanks, -Aleksey From stefan.karlsson at oracle.com Mon Jun 15 10:01:17 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 15 Jun 2020 12:01:17 +0200 Subject: RFR (XS) 8247575: serviceability/dcmd/gc/HeapDumpCompressedTest unlocks experimental options for Shenandoah and Z In-Reply-To: <6c65077f-65c5-764c-c68b-d8e695f3e28c@redhat.com> References: <6c65077f-65c5-764c-c68b-d8e695f3e28c@redhat.com> Message-ID: <5aefdf0d-4630-20c7-d75b-d5eee6a5270a@oracle.com> Looks good. StefanK On 2020-06-15 11:58, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8247575 > > I was looking at this new test, and thought we should be consistent here: > > diff -r 627cfc1935b7 test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTest.java > --- a/test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTest.java Fri Jun 12 16:40:47 > 2020 +0200 > +++ b/test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTest.java Mon Jun 15 11:55:47 > 2020 +0200 > @@ -80,5 +80,5 @@ > * java.management > * jdk.internal.jvmstat/sun.jvmstat.monitor > - * @run main/othervm -XX:+UnlockExperimentalVMOptions -XX:+UseZGC HeapDumpCompressedTest > + * @run main/othervm -XX:+UseZGC HeapDumpCompressedTest > */ > > @@ -92,5 +92,5 @@ > * java.management > * jdk.internal.jvmstat/sun.jvmstat.monitor > - * @run main/othervm -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC HeapDumpCompressedTest > + * @run main/othervm -XX:+UseShenandoahGC HeapDumpCompressedTest > */ > > I can push it to either jdk/jdk15 or jdk/jdk. > > Testing: affected test on Linux x86_64 fastdebug, nothing else > From volker.simonis at gmail.com Mon Jun 15 10:19:04 2020 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 15 Jun 2020 12:19:04 +0200 Subject: Need help to fix a potential G1 crash in jdk11 In-Reply-To: References: Message-ID: Hi Poonam, thanks for your assistance. Unfortunately, -XX:+VerifyRememberedSets doesn't provide any additional information. 
I still get the verification errors at "Verifying During GC (Remark after)" Best regards, Volker On Mon, Jun 8, 2020 at 4:57 PM Poonam Parhar wrote: > > Hi Volker, > > Did you try running with -XX:+VerifyRememberedSets? This might tell if > the problem is related to remset updates. > > Thanks, > Poonam > > On 6/5/20 1:55 AM, Erik ?sterlund wrote: > > Hi Volker, > > > > On 2020-06-03 20:18, Volker Simonis wrote: > >> Unfortunately, "-XX:-ClassUnloading" doesn't help :( > > > > I am actually happy that did not help. I suspect a bug in that code > > would be harder to track down; it is rather complicated. > > > >> I already saw two new crashes. The first one has 6 distinct Root > >> locations pointing to one dead object: > >> > >> [863.222s][info ][gc,verify,start ] Verifying During GC (Remark after) > >> [863.222s][debug][gc,verify ] Threads > >> [863.224s][debug][gc,verify ] Heap > >> [863.224s][debug][gc,verify ] Roots > >> [863.229s][error][gc,verify ] Root location > >> 0x00007f11719174e7 points to dead obj 0x00000000f956dbd8 > >> [863.229s][error][gc,verify ] > >> org.antlr.v4.runtime.atn.PredictionContextCache > >> [863.229s][error][gc,verify ] {0x00000000f956dbd8} - klass: > >> 'org/antlr/v4/runtime/atn/PredictionContextCache' > >> ... > >> [863.229s][error][gc,verify ] Root location > >> 0x00007f1171921978 points to dead obj 0x00000000f956dbd8 > >> [863.229s][error][gc,verify ] > >> org.antlr.v4.runtime.atn.PredictionContextCache > >> [863.229s][error][gc,verify ] {0x00000000f956dbd8} - klass: > >> 'org/antlr/v4/runtime/atn/PredictionContextCache' > >> [863.231s][debug][gc,verify ] HeapRegionSets > >> [863.231s][debug][gc,verify ] HeapRegions > >> [863.349s][error][gc,verify ] Heap after failed verification > >> (kind 0): > >> > >> The second crash has only two Root locations pointing to the same > >> dead object but more than 40_000 fields in distinct objects pointing > >> to more than 3_500 dead objects: > >> > >> [854.473s][info ][gc,verify,start ] Verifying During GC (Remark after) > >> [854.473s][debug][gc,verify ] Threads > >> [854.475s][debug][gc,verify ] Heap > >> [854.475s][debug][gc,verify ] Roots > >> [854.479s][error][gc,verify ] Root location > >> 0x00007f6e60461d5f points to dead obj 0x00000000fa874528 > >> [854.479s][error][gc,verify ] > >> org.antlr.v4.runtime.atn.PredictionContextCache > >> [854.479s][error][gc,verify ] {0x00000000fa874528} - klass: > >> 'org/antlr/v4/runtime/atn/PredictionContextCache' > >> [854.479s][error][gc,verify ] Root location > >> 0x00007f6e60461d6d points to dead obj 0x00000000fa874528 > >> [854.479s][error][gc,verify ] > >> org.antlr.v4.runtime.atn.PredictionContextCache > >> [854.479s][error][gc,verify ] {0x00000000fa874528} - klass: > >> 'org/antlr/v4/runtime/atn/PredictionContextCache' > >> [854.479s][error][gc,verify ] Root location > >> 0x00007f6e60462138 points to dead obj 0x00000000fa874528 > >> [854.479s][error][gc,verify ] > >> org.antlr.v4.runtime.atn.PredictionContextCache > >> [854.479s][error][gc,verify ] {0x00000000fa874528} - klass: > >> 'org/antlr/v4/runtime/atn/PredictionContextCache' > >> [854.482s][debug][gc,verify ] HeapRegionSets > >> [854.482s][debug][gc,verify ] HeapRegions > >> [854.484s][error][gc,verify ] ---------- > >> [854.484s][error][gc,verify ] Field 0x00000000fd363c70 of > >> live obj 0x00000000fd363c58 in region [0x00000000fd300000, > >> 0x00000000fd400000) > >> [854.484s][error][gc,verify ] class name > >> org.antlr.v4.runtime.atn.ATNConfig > >> [854.484s][error][gc,verify ] points to dead obj > >> 
0x00000000fa88a540 in region [0x00000000fa800000, 0x00000000fa900000) > >> [854.484s][error][gc,verify ] class name > >> org.antlr.v4.runtime.atn.ArrayPredictionContext > >> [854.484s][error][gc,verify ] ---------- > >> ... > >> more than 40_000 fields in distinct objects pointing to more than > >> 3_500 dead objects. > >> > >> So how can this happen. Is "-XX:+VerifyAfterGC" really reliable here? > > > > Naturally, it's hard to tell for definite what the issue is with only > > these printouts. > > However, we can make some observations from the printouts: > > > > Based on the address values of the "Root location" of the printouts, > > each dead object > > reported is pointed at from at least one misaligned oop. The only > > misaligned oops in > > HotSpot are nmethod oops embedded into the instruction stream as > > immediates. > > So this smells like some kind of nmethod oop processing bug in G1 to me. > > > > The Abortable Mixed GCs (https://openjdk.java.net/jeps/344) that went > > into 12 changed > > quite a bit of the nmethod oop scanning code. Perhaps the reason why > > this stopped > > reproducing in 12 is related to that. The nmethod oop processing code > > introduced with > > AMGC actually had a word tearing problem for nmethod oops, which was > > fixed later with > > https://bugs.openjdk.java.net/browse/JDK-8235305 > > > > Hope these pointers help. > > > > /Erik > > > >> Thank you and best regards, > >> Volker > >> > >> > >> On Wed, Jun 3, 2020 at 7:14 PM Volker Simonis > >> > wrote: > >> > >> Hi Erik, > >> > >> thanks a lot for the quick response and the hint with > >> ClassUnloading. I've just started several runs of the test program > >> with "-XX:-ClassUnloading". I'll report back instantly once I have > >> some results. > >> > >> Best regards, > >> Volker > >> > >> On Wed, Jun 3, 2020 at 5:26 PM Erik ?sterlund > >> > > >> wrote: > >> > >> Hi Volker, > >> > >> In JDK 12, I changed quite a bit how G1 performs class > >> unloading, to a > >> new model. > >> Since the verification runs just after class unloading, I > >> guess it could > >> be interesting > >> to check if the error happens with -XX:-ClassUnloading as > >> well. If not, > >> then perhaps > >> some of my class unloading changes for G1 in JDK 12 fixed the > >> problem. > >> > >> Just a gut feeling... > >> > >> Thanks, > >> /Erik > >> > >> On 2020-06-03 17:02, Volker Simonis wrote: > >> > Hi, > >> > > >> > I would appreciate some help/advice for debugging a > >> potential G1 crash in > >> > jdk 11. The crash usually occurs when running a proprietary > >> jar file for > >> > about 20-30 minutes and it happens in various parts of the > >> VM (C1- or > >> > C2-compiled code, interpreter, GC). Because the crash > >> locations are so > >> > different and because the customer which reported the issue > >> claimed that it > >> > doesn't happen with Parallel GC, I thought it might be a G1 > >> issue. I > >> > couldn't reproduce the crash with jdk 12 and 14 (but with > >> jdk 11 and > >> > 11.0.7, OpenJDK and Oracle JDK). When looking at the G1 > >> changes in jdk 12 I > >> > couldn't find any apparent bug fix which potentially solves > >> this problem > >> > but it may have been solved by one of the many G1 changes > >> which happened in > >> > jdk 12. 
> >> > > >> > I did run the reproducer with "-XX:+UnlockDiagnosticVMOptions > >> > -XX:+VerifyBeforeGC -XX:+VerifyAfterGC -XX:+VerifyDuringGC > >> > -XX:+CheckJNICalls -XX:+G1VerifyRSetsDuringFullGC > >> > -XX:+G1VerifyHeapRegionCodeRoots" and I indeed got > >> verification errors (see > >> > [1] for a complete hs_err file). Sometimes it's just a few > >> fields pointing > >> > to dead objects: > >> > > >> > [1035.782s][error][gc,verify ] ---------- > >> > [1035.782s][error][gc,verify ] Field > >> 0x00000000fb509148 of live obj > >> > 0x00000000fb509130 in region [0x00000000fb500000, > >> 0x00000000fb600000) > >> > [1035.782s][error][gc,verify ] class name > >> > org.antlr.v4.runtime.atn.ATNConfig > >> > [1035.782s][error][gc,verify ] points to dead obj > >> > 0x00000000f9ba39b0 in region [0x00000000f9b00000, > >> 0x00000000f9c00000) > >> > [1035.782s][error][gc,verify ] class name > >> > org.antlr.v4.runtime.atn.SingletonPredictionContext > >> > [1035.782s][error][gc,verify ] ---------- > >> > [1035.783s][error][gc,verify ] Field > >> 0x00000000fb509168 of live obj > >> > 0x00000000fb509150 in region [0x00000000fb500000, > >> 0x00000000fb600000) > >> > [1035.783s][error][gc,verify ] class name > >> > org.antlr.v4.runtime.atn.ATNConfig > >> > [1035.783s][error][gc,verify ] points to dead obj > >> > 0x00000000f9ba39b0 in region [0x00000000f9b00000, > >> 0x00000000f9c00000) > >> > [1035.783s][error][gc,verify ] class name > >> > org.antlr.v4.runtime.atn.SingletonPredictionContext > >> > [1035.783s][error][gc,verify ] ---------- > >> > ... > >> > [1043.928s][error][gc,verify ] Heap Regions: > >> E=young(eden), > >> > S=young(survivor), O=old, HS=humongous(starts), > >> HC=humongous(continues), > >> > CS=collection set, F=free, A=archive, TAMS=top-at-mark-start > >> (previous, > >> > next) > >> > ... > >> > [1043.929s][error][gc,verify ] | > >> 79|0x00000000f9b00000, > >> > 0x00000000f9bfffe8, 0x00000000f9c00000| 99%| O| |TAMS > >> 0x00000000f9bfffe8, > >> > 0x00000000f9b00000| Updating > >> > ... > >> > [1043.971s][error][gc,verify ] | > >> 105|0x00000000fb500000, > >> > 0x00000000fb54fc08, 0x00000000fb600000| 31%| S|CS|TAMS > >> 0x00000000fb500000, > >> > 0x00000000fb500000| Complete > >> > > >> > but I also got verification errors with more than 30000 > >> fields of distinct > >> > objects pointing to more than 1000 dead objects. How can > >> that happen? Is > >> > the verification always accurate or can this also be a > >> problem with the > >> > verification itself and I'm hunting the wrong problem? > >> > > >> > Sometimes I also saw verification errors where fields point > >> to objects in > >> > regions with "Untracked remset": > >> > > >> > [673.762s][error][gc,verify] ---------- > >> > [673.762s][error][gc,verify] Field 0x00000000fca49298 of > >> live obj > >> > 0x00000000fca49280 in region [0x00000000fca0000 > >> > 0, 0x00000000fcb00000) > >> > [673.762s][error][gc,verify] class name > >> org.antlr.v4.runtime.atn.ATNConfig > >> > [673.762s][error][gc,verify] points to obj > >> 0x00000000f9d5a9a0 in region > >> > > >> 81:(F)[0x00000000f9d00000,0x00000000f9d00000,0x00000000f9e00000] > >> remset > >> > Untracked > >> > [673.762s][error][gc,verify] ---------- > >> > > >> > But they are by far not that common like the pointers to > >> dead objects. 
Once > >> > I even saw a "Root location" pointing to a dead object: > >> > > >> > [369.808s][error][gc,verify] Root location > >> 0x00007f35bb33f1f8 points to > >> > dead obj 0x00000000f87fa200 > >> > [369.808s][error][gc,verify] > >> org.antlr.v4.runtime.atn.PredictionContextCache > >> > [369.808s][error][gc,verify] {0x00000000f87fa200} - klass: > >> > 'org/antlr/v4/runtime/atn/PredictionContextCache' > >> > [369.850s][error][gc,verify] ---------- > >> > [369.850s][error][gc,verify] Field 0x00000000fbc60900 of > >> live obj > >> > 0x00000000fbc608f0 in region [0x00000000fbc00000, > >> 0x00000000fbd00000) > >> > [369.850s][error][gc,verify] class name > >> > org.antlr.v4.runtime.atn.ParserATNSimulator > >> > [369.850s][error][gc,verify] points to dead obj > >> 0x00000000f87fa200 in > >> > region [0x00000000f8700000, 0x00000000f8800000) > >> > [369.850s][error][gc,verify] class name > >> > org.antlr.v4.runtime.atn.PredictionContextCache > >> > [369.850s][error][gc,verify] ---------- > >> > > >> > All these verification errors occur after the Remark phase in > >> > G1ConcurrentMark::remark() at: > >> > > >> > verify_during_pause(G1HeapVerifier::G1VerifyRemark, > >> > VerifyOption_G1UsePrevMarking, "Remark after"); > >> > > >> > V [libjvm.so+0x6ca186] report_vm_error(char const*, int, > >> char const*, > >> > char const*, ...)+0x106 > >> > V [libjvm.so+0x7d4a99] > >> G1HeapVerifier::verify(VerifyOption)+0x399 > >> > V [libjvm.so+0xe128bb] Universe::verify(VerifyOption, char > >> const*)+0x16b > >> > V [libjvm.so+0x7d44ee] > >> > G1HeapVerifier::verify(G1HeapVerifier::G1VerifyType, > >> VerifyOption, char > >> > const*)+0x9e > >> > V [libjvm.so+0x7addcf] > >> > > >> G1ConcurrentMark::verify_during_pause(G1HeapVerifier::G1VerifyType, > >> > VerifyOption, char const*)+0x9f > >> > V [libjvm.so+0x7b172e] G1ConcurrentMark::remark()+0x3be > >> > V [libjvm.so+0xe6a5e1] VM_CGC_Operation::doit()+0x211 > >> > V [libjvm.so+0xe69908] VM_Operation::evaluate()+0xd8 > >> > V [libjvm.so+0xe6713f] > >> VMThread::evaluate_operation(VM_Operation*) [clone > >> > .constprop.54]+0xff > >> > V [libjvm.so+0xe6764e] VMThread::loop()+0x3be > >> > V [libjvm.so+0xe67a7b] VMThread::run()+0x7b > >> > > >> > The GC log output looks as follows: > >> > ... > >> > [1035.775s][info ][gc,verify,start ] Verifying During GC > >> (Remark after) > >> > [1035.775s][debug][gc,verify ] Threads > >> > [1035.776s][debug][gc,verify ] Heap > >> > [1035.776s][debug][gc,verify ] Roots > >> > [1035.782s][debug][gc,verify ] HeapRegionSets > >> > [1035.782s][debug][gc,verify ] HeapRegions > >> > [1035.782s][error][gc,verify ] ---------- > >> > ... > >> > A more complete GC log can be found here [2]. 
> >> > > >> > For the field 0x00000000fb509148 of live obj > >> 0x00000000fb509130 which > >> > points to the dead object 0x00000000f9ba39b0 I get the > >> following > >> > information if I inspect them with clhsdb: > >> > > >> > hsdb> inspect 0x00000000fb509130 > >> > instance of Oop for org/antlr/v4/runtime/atn/ATNConfig @ > >> 0x00000000fb509130 > >> > @ 0x00000000fb509130 (size = 32) > >> > _mark: 13 > >> > _metadata._compressed_klass: InstanceKlass for > >> > org/antlr/v4/runtime/atn/ATNConfig > >> > state: Oop for org/antlr/v4/runtime/atn/BasicState @ > >> 0x00000000f83ecfa8 Oop > >> > for org/antlr/v4/runtime/atn/BasicState @ 0x00000000f83ecfa8 > >> > alt: 1 > >> > context: Oop for > >> org/antlr/v4/runtime/atn/SingletonPredictionContext @ > >> > 0x00000000f9ba39b0 Oop for > >> > org/antlr/v4/runtime/atn/SingletonPredictionContext @ > >> 0x00000000f9ba39b0 > >> > reachesIntoOuterContext: 8 > >> > semanticContext: Oop for > >> org/antlr/v4/runtime/atn/SemanticContext$Predicate > >> > @ 0x00000000f83d57c0 Oop for > >> > org/antlr/v4/runtime/atn/SemanticContext$Predicate @ > >> 0x00000000f83d57c0 > >> > > >> > hsdb> inspect 0x00000000f9ba39b0 > >> > instance of Oop for > >> org/antlr/v4/runtime/atn/SingletonPredictionContext @ > >> > 0x00000000f9ba39b0 @ 0x00000000f9ba39b0 (size = 32) > >> > _mark: 41551306041 > >> > _metadata._compressed_klass: InstanceKlass for > >> > org/antlr/v4/runtime/atn/SingletonPredictionContext > >> > id: 100635259 > >> > cachedHashCode: 2005943142 > >> > parent: Oop for > >> org/antlr/v4/runtime/atn/SingletonPredictionContext @ > >> > 0x00000000f9ba01b0 Oop for > >> > org/antlr/v4/runtime/atn/SingletonPredictionContext @ > >> 0x00000000f9ba01b0 > >> > returnState: 18228 > >> > > >> > I could also reproduce the verification errors with a fast > >> debug build of > >> > 11.0.7 which I did run with "-XX:+CheckCompressedOops > >> -XX:+VerifyOops > >> > -XX:+G1VerifyCTCleanup -XX:+G1VerifyBitmaps" in addition to > >> the options > >> > mentioned before, but unfortunaltey the run didn't trigger > >> neither an > >> > assertion nor a different verification error. > >> > > >> > So to summarize, my basic questions are: > >> > - has somebody else encountered similar crashes? > >> > - is someone aware of specific changes in jdk12 which > >> might solve this > >> > problem? > >> > - are the verification errors I'm seeing accurate or is it > >> possible to get > >> > false positives when running with > >> -XX:Verify{Before,During,After}GC ? 
> >> > > >> > Thanks for your patience, > >> > Volker > >> > > >> > [1] > >> > > >> http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/hs_err_pid28294.log > >> > [2] > >> > > >> http://cr.openjdk.java.net/~simonis/webrevs/2020/jdk11-g1-crash/verify-error.log > >> > > > From thomas.schatzl at oracle.com Mon Jun 15 11:54:15 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 15 Jun 2020 13:54:15 +0200 Subject: RFR (XS) 8247575: serviceability/dcmd/gc/HeapDumpCompressedTest unlocks experimental options for Shenandoah and Z In-Reply-To: <6c65077f-65c5-764c-c68b-d8e695f3e28c@redhat.com> References: <6c65077f-65c5-764c-c68b-d8e695f3e28c@redhat.com> Message-ID: <018d6d13-667b-822e-0c85-e36b92c157e5@oracle.com> Hi, On 15.06.20 11:58, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8247575 > > I was looking at this new test, and thought we should be consistent here: > > diff -r 627cfc1935b7 test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTest.java > --- a/test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTest.java Fri Jun 12 16:40:47 > 2020 +0200 > +++ b/test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTest.java Mon Jun 15 11:55:47 > 2020 +0200 > @@ -80,5 +80,5 @@ > * java.management > * jdk.internal.jvmstat/sun.jvmstat.monitor > - * @run main/othervm -XX:+UnlockExperimentalVMOptions -XX:+UseZGC HeapDumpCompressedTest > + * @run main/othervm -XX:+UseZGC HeapDumpCompressedTest > */ > > @@ -92,5 +92,5 @@ > * java.management > * jdk.internal.jvmstat/sun.jvmstat.monitor > - * @run main/othervm -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC HeapDumpCompressedTest > + * @run main/othervm -XX:+UseShenandoahGC HeapDumpCompressedTest > */ lgtm. Feel free to push to 15, it's a "testbug" after all. Thomas From zgu at redhat.com Mon Jun 15 12:09:57 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 15 Jun 2020 08:09:57 -0400 Subject: [15] RFR (XS) 8247560: Shenandoah: heap iteration holds root locks all the time In-Reply-To: References: Message-ID: Yes. Thanks, -Zhengyu On 6/15/20 3:38 AM, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8247560 > > Newly added compressed hprof test exposes a trouble with Shenandoah heap iteration: "Attempting to > wait on monitor HProf Compression Backend/11 while holding lock CodeCache_lock/6 -- possible deadlock". > > ShenandoahHeapIterationRootScanner holds the CodeCache_lock for code roots iteration, and it lingers > for the entirety of heap iteration. The fix is to scope it properly: > > diff -r a39eb5a4f1c1 src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Thu Jun 11 18:16:32 2020 +0200 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Mon Jun 15 09:19:21 2020 +0200 > @@ -1298,9 +1298,14 @@ > Stack oop_stack; > > - // First, we process GC roots according to current GC cycle. This populates the work stack with > initial objects. > - ShenandoahHeapIterationRootScanner rp; > ObjectIterateScanRootClosure oops(&_aux_bit_map, &oop_stack); > > - rp.roots_do(&oops); > + { > + // First, we process GC roots according to current GC cycle. > + // This populates the work stack with initial objects. > + // It is important to relinquish the associated locks before diving > + // into heap dumper. > + ShenandoahHeapIterationRootScanner rp; > + rp.roots_do(&oops); > + } > > // Work through the oop stack to traverse heap. 
> > Testing: hotspot_gc_shenandoah, affected tests (many times) > From shade at redhat.com Mon Jun 15 12:12:25 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Jun 2020 14:12:25 +0200 Subject: [15] RFR (XS) 8247560: Shenandoah: heap iteration holds root locks all the time In-Reply-To: References: Message-ID: <47e10dfd-9414-5374-dd62-f95f65161940@redhat.com> On 6/15/20 2:09 PM, Zhengyu Gu wrote: > Yes. Thanks, pushed. -- -Aleksey From shade at redhat.com Mon Jun 15 14:17:01 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Jun 2020 16:17:01 +0200 Subject: RFR (XS) 8247575: serviceability/dcmd/gc/HeapDumpCompressedTest unlocks experimental options for Shenandoah and Z In-Reply-To: <5aefdf0d-4630-20c7-d75b-d5eee6a5270a@oracle.com> References: <6c65077f-65c5-764c-c68b-d8e695f3e28c@redhat.com> <5aefdf0d-4630-20c7-d75b-d5eee6a5270a@oracle.com> Message-ID: <306bf0c2-e53c-38dd-584f-096c811a7478@redhat.com> On 6/15/20 12:01 PM, Stefan Karlsson wrote: > Looks good. Thanks. Trivial, right? -- -Aleksey From thomas.schatzl at oracle.com Mon Jun 15 14:33:09 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 15 Jun 2020 16:33:09 +0200 Subject: RFR (XS) 8247575: serviceability/dcmd/gc/HeapDumpCompressedTest unlocks experimental options for Shenandoah and Z In-Reply-To: <306bf0c2-e53c-38dd-584f-096c811a7478@redhat.com> References: <6c65077f-65c5-764c-c68b-d8e695f3e28c@redhat.com> <5aefdf0d-4630-20c7-d75b-d5eee6a5270a@oracle.com> <306bf0c2-e53c-38dd-584f-096c811a7478@redhat.com> Message-ID: Hi, On 15.06.20 16:17, Aleksey Shipilev wrote: > On 6/15/20 12:01 PM, Stefan Karlsson wrote: >> Looks good. > > Thanks. Trivial, right? > looks good, and trivial to me. (I thought I had sent such an answer already, but apparently I did not send it) Thomas From shade at redhat.com Mon Jun 15 14:36:44 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Jun 2020 16:36:44 +0200 Subject: RFR (XS) 8247575: serviceability/dcmd/gc/HeapDumpCompressedTest unlocks experimental options for Shenandoah and Z In-Reply-To: References: <6c65077f-65c5-764c-c68b-d8e695f3e28c@redhat.com> <5aefdf0d-4630-20c7-d75b-d5eee6a5270a@oracle.com> <306bf0c2-e53c-38dd-584f-096c811a7478@redhat.com> Message-ID: <970b91fe-7691-e233-fd96-596f97bb832e@redhat.com> On 6/15/20 4:33 PM, Thomas Schatzl wrote: > On 15.06.20 16:17, Aleksey Shipilev wrote: >> On 6/15/20 12:01 PM, Stefan Karlsson wrote: >>> Looks good. >> >> Thanks. Trivial, right? >> > looks good, and trivial to me. (I thought I had sent such an answer > already, but apparently I did not send it) Ah, you did, my mailer got me confused. Pushed! -- Thanks, -Aleksey From shade at redhat.com Mon Jun 15 14:38:24 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Jun 2020 16:38:24 +0200 Subject: RFR (S) 8247593: Shenandoah: should not block pacing reporters Message-ID: Bug: https://bugs.openjdk.java.net/browse/JDK-8247593 After JDK-8247358, we are acquiring the _wait_monitor in ShenandoahPacer::report_internal. That runs into potential deadlocks with threads that are waiting on the same lock *and* safepointing at the same time, against the concurrent workers that want to report the progress before returning for subsequent safepoint.
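To illustrate the constraint (a sketch with made-up names, not necessarily the shape of the actual patch): the reporting side must stay non-blocking, e.g. by publishing progress with an atomic update instead of taking the monitor, so a reporting worker can always reach its next safepoint check while other threads sit on the lock:

#include <atomic>

std::atomic<long> pacer_progress{0};  // illustrative stand-in for pacer state

void report_progress(long words) {
  // No monitor acquisition here: reporters must never block.
  pacer_progress.fetch_add(words, std::memory_order_relaxed);
}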
Fix:
https://cr.openjdk.java.net/~shade/8247593/webrev.01/

Testing: hotspot_gc_shenandoah (some tests timed out before); ad-hoc benchmarks

-- 
Thanks,
-Aleksey

From rkennke at redhat.com  Tue Jun 16 09:31:31 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 16 Jun 2020 11:31:31 +0200
Subject: RFR (S) 8247593: Shenandoah: should not block pacing reporters
In-Reply-To: 
References: 
Message-ID: <98b8c42a2f420197a4dcc8cedd80d70bb6179728.camel@redhat.com>

Ok looks good.

Roman

On Mon, 2020-06-15 at 16:38 +0200, Aleksey Shipilev wrote:
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8247593
> 
> After JDK-8247358, we are acquiring the _wait_monitor in
> ShenandoahPacer::report_internal. That runs
> into potential deadlocks with threads that are waiting on the same
> lock *and* safepointing at the
> same time, against the concurrent workers that want to report the
> progress before returning for
> a subsequent safepoint.
> 
> Fix:
> https://cr.openjdk.java.net/~shade/8247593/webrev.01/
> 
> Testing: hotspot_gc_shenandoah (some tests timed out before); ad-hoc
> benchmarks
> 

From zgu at redhat.com  Tue Jun 16 19:48:18 2020
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 16 Jun 2020 15:48:18 -0400
Subject: RFR 8247670: Shenandoah: deadlock during class unloading OOME
Message-ID: 

The deadlock is caused by one thread holding a per-nmethod lock, then encountering evac-oom. At the same time, another thread enters the evac-oom scope, then tries to acquire the same per-nmethod lock. The first thread expects the second thread to see the evac-oom and exit the scope, but the second thread is blocked acquiring the per-nmethod lock.

The solution is to introduce an abortable locker on the per-nmethod lock. If the second thread cannot acquire the lock but sees evac-oom, it simply aborts, so it can exit the evac-oom scope.

The solution does come with penalties: if the second thread is a Java thread (via nmethod entry barrier), the nmethod will be deoptimized. If the second thread is a worker, it causes the current code root processing to abort, then restart.

Bug: https://bugs.openjdk.java.net/browse/JDK-8247670
Webrev: http://cr.openjdk.java.net/~zgu/JDK-8247670/webrev.00/

Test:
hotspot_gc_shenandoah (x86_64 and aarch64)

Thanks,

-Zhengyu

From shade at redhat.com  Wed Jun 17 11:44:35 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jun 2020 13:44:35 +0200
Subject: RFR (S) 8247751: Shenandoah: options tests should run with smaller heaps
Message-ID: <3935aa0d-0a10-efd0-4aba-e47eacf0f24a@redhat.com>

RFE:
https://bugs.openjdk.java.net/browse/JDK-8247751

On huge machines, the default heap sizes are quite large, and memory zapping takes quite some time at startup in fastdebug mode. We should be running at least "options" tests with smaller default heaps.
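For illustration only, the general shape of such a change as a hypothetical jtreg test with the heap pinned explicitly (the test name and sizes are invented; the actual webrev differs in details):

/*
 * @test
 * @summary Hypothetical options test: pin a small heap so a fastdebug VM
 *          does not spend startup time zapping a machine-sized default heap.
 * @run main/othervm -Xms64m -Xmx64m -XX:+UnlockDiagnosticVMOptions
 *      -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC TestOptionsSmallHeap
 */
public class TestOptionsSmallHeap {
    public static void main(String[] args) {
        // Reaching main() means the VM accepted the options and started up.
        System.out.println("VM started with a 64m heap");
    }
}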
Fix:
https://cr.openjdk.java.net/~shade/8247751/webrev.01/

Testing: hotspot_gc_shenandoah (faster now)

-- 
Thanks,
-Aleksey

From shade at redhat.com  Wed Jun 17 12:06:15 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jun 2020 14:06:15 +0200
Subject: RFR (XS) 8247754: Shenandoah: mxbeans tests can be shorter
Message-ID: <8e3fcd9c-0c28-9102-2eab-9f149ad09379@redhat.com>

RFE:
https://bugs.openjdk.java.net/browse/JDK-8247754

Fix:

diff -r 826804f83f85 test/hotspot/jtreg/gc/shenandoah/mxbeans/TestChurnNotifications.java
--- a/test/hotspot/jtreg/gc/shenandoah/mxbeans/TestChurnNotifications.java    Wed Jun 17 13:37:46 2020 +0200
+++ b/test/hotspot/jtreg/gc/shenandoah/mxbeans/TestChurnNotifications.java    Wed Jun 17 14:05:48 2020 +0200
@@ -92,5 +92,5 @@

     static final long HEAP_MB = 128;                           // adjust for test configuration above
-    static final long TARGET_MB = Long.getLong("target", 8_000); // 8 Gb allocation
+    static final long TARGET_MB = Long.getLong("target", 2_000); // 2 Gb allocation

     // Should we track the churn precisely?

diff -r 826804f83f85 test/hotspot/jtreg/gc/shenandoah/mxbeans/TestPauseNotifications.java
--- a/test/hotspot/jtreg/gc/shenandoah/mxbeans/TestPauseNotifications.java    Wed Jun 17 13:37:46 2020 +0200
+++ b/test/hotspot/jtreg/gc/shenandoah/mxbeans/TestPauseNotifications.java    Wed Jun 17 14:05:48 2020 +0200
@@ -86,5 +86,5 @@

     static final long HEAP_MB = 128;                           // adjust for test configuration above
-    static final long TARGET_MB = Long.getLong("target", 8_000); // 8 Gb allocation
+    static final long TARGET_MB = Long.getLong("target", 2_000); // 2 Gb allocation

     static volatile Object sink;

Testing: hotspot_gc_shenandoah (faster now)

-- 
Thanks,
-Aleksey

From shade at redhat.com  Wed Jun 17 12:26:22 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jun 2020 14:26:22 +0200
Subject: RFR (S) 8247757: Shenandoah: split heavy tests by heuristics to improve parallelism
Message-ID: <3580fadc-335e-17a8-9d1c-43779df0d1d3@redhat.com>

RFE:
https://bugs.openjdk.java.net/browse/JDK-8247757

There are plenty of tests that take a while and test multiple modes and heuristics. Their configurations are split by mode. Splitting them further by heuristics improves their parallelism even more.

Fix:
https://cr.openjdk.java.net/~shade/8247757/webrev.01/

Testing: hotspot_gc_shenandoah (even faster now)

-- 
Thanks,
-Aleksey

From kim.barrett at oracle.com  Wed Jun 17 12:32:18 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 17 Jun 2020 08:32:18 -0400
Subject: RFR: 8247740: Inline derived CollectedHeap access for G1 and ParallelGC
Message-ID: <15884ED9-3AF4-4F89-894D-5BD42C7796BA@oracle.com>

Please review this change to derived CollectedHeap access for the
various collectors.  Most of the collectors have a heap() function
that returns the derived CollectedHeap object, with the definitions of
these functions being nearly identical.  This change adds a helper
function in CollectedHeap for use by these derived heap() functions.

This change also inlines the heap() functions for G1 and ParallelGC.
These functions have a very simple definition in a release build.
Since both of these collectors have calls in relatively performance
critical places, inlining should be (a little bit) helpful, though I
haven't tried to measure it.  In some cases it may be better to get
the heap once and cache it in a variable or data member; indeed,
that's often done, but not always, and tracking down all the cases
that matter isn't a small task.

This change only does the inlining for G1 and ParallelGC.
Performance of Serial and Epsilon is less critical, and ZGC does a good job of avoiding hot calls. Shenandoah doesn't have a corresponding function; it seems to have not conformed to the JDK-8077415 change.

CR:
https://bugs.openjdk.java.net/browse/JDK-8247740

Webrev:
https://cr.openjdk.java.net/~kbarrett/8247740/open.00/

Testing:
mach5 tier1

From zgu at redhat.com  Wed Jun 17 12:33:37 2020
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 17 Jun 2020 08:33:37 -0400
Subject: RFR (S) 8247751: Shenandoah: options tests should run with smaller heaps
In-Reply-To: <3935aa0d-0a10-efd0-4aba-e47eacf0f24a@redhat.com>
References: <3935aa0d-0a10-efd0-4aba-e47eacf0f24a@redhat.com>
Message-ID: 

Looks good.

-Zhengyu

On 6/17/20 7:44 AM, Aleksey Shipilev wrote:
> RFE:
> https://bugs.openjdk.java.net/browse/JDK-8247751
> 
> On huge machines, the default heap sizes are quite large, and memory zapping takes quite some time
> at startup in fastdebug mode. We should be running at least "options" tests with smaller default heaps.
> 
> Fix:
> https://cr.openjdk.java.net/~shade/8247751/webrev.01/
> 
> Testing: hotspot_gc_shenandoah (faster now)
> 

From rkennke at redhat.com  Wed Jun 17 15:18:31 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 17 Jun 2020 17:18:31 +0200
Subject: RFR (S) 8247751: Shenandoah: options tests should run with smaller heaps
In-Reply-To: <3935aa0d-0a10-efd0-4aba-e47eacf0f24a@redhat.com>
References: <3935aa0d-0a10-efd0-4aba-e47eacf0f24a@redhat.com>
Message-ID: 

Yeah, it is generally a good idea to run tests with a specified smallish heap size. Patch looks good!

Thanks,
Roman

On Wed, 2020-06-17 at 13:44 +0200, Aleksey Shipilev wrote:
> RFE:
> https://bugs.openjdk.java.net/browse/JDK-8247751
> 
> On huge machines, the default heap sizes are quite large, and memory
> zapping takes quite some time
> at startup in fastdebug mode. We should be running at least "options"
> tests with smaller default heaps.
> 
> Fix:
> https://cr.openjdk.java.net/~shade/8247751/webrev.01/
> 
> Testing: hotspot_gc_shenandoah (faster now)
> 

From rkennke at redhat.com  Wed Jun 17 15:19:09 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 17 Jun 2020 17:19:09 +0200
Subject: RFR (XS) 8247754: Shenandoah: mxbeans tests can be shorter
In-Reply-To: <8e3fcd9c-0c28-9102-2eab-9f149ad09379@redhat.com>
References: <8e3fcd9c-0c28-9102-2eab-9f149ad09379@redhat.com>
Message-ID: <28b8c1398a1ca1078e99f9f0bd9d0dc6c72126b9.camel@redhat.com>

Ok, good.

Thanks,
Roman

On Wed, 2020-06-17 at 14:06 +0200, Aleksey Shipilev wrote:
> RFE:
> https://bugs.openjdk.java.net/browse/JDK-8247754
> 
> Fix:
> 
> diff -r 826804f83f85
> test/hotspot/jtreg/gc/shenandoah/mxbeans/TestChurnNotifications.java
> --- a/test/hotspot/jtreg/gc/shenandoah/mxbeans/TestChurnNotifications.java    Wed Jun 17 13:37:46 2020 +0200
> +++ b/test/hotspot/jtreg/gc/shenandoah/mxbeans/TestChurnNotifications.java    Wed Jun 17 14:05:48 2020 +0200
> @@ -92,5 +92,5 @@
> 
>      static final long HEAP_MB = 128;                           // adjust for test configuration above
> -    static final long TARGET_MB = Long.getLong("target", 8_000); // 8 Gb allocation
> +    static final long TARGET_MB = Long.getLong("target", 2_000); // 2 Gb allocation
> 
>      // Should we track the churn precisely?
> diff -r 826804f83f85
> test/hotspot/jtreg/gc/shenandoah/mxbeans/TestPauseNotifications.java
> --- a/test/hotspot/jtreg/gc/shenandoah/mxbeans/TestPauseNotifications.java    Wed Jun 17 13:37:46 2020 +0200
> +++ b/test/hotspot/jtreg/gc/shenandoah/mxbeans/TestPauseNotifications.java    Wed Jun 17 14:05:48 2020 +0200
> @@ -86,5 +86,5 @@
> 
>      static final long HEAP_MB = 128;                           // adjust for test configuration above
> -    static final long TARGET_MB = Long.getLong("target", 8_000); // 8 Gb allocation
> +    static final long TARGET_MB = Long.getLong("target", 2_000); // 2 Gb allocation
> 
>      static volatile Object sink;
> 
> Testing: hotspot_gc_shenandoah (faster now)
> 

From rkennke at redhat.com  Wed Jun 17 15:20:06 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 17 Jun 2020 17:20:06 +0200
Subject: RFR (S) 8247757: Shenandoah: split heavy tests by heuristics to improve parallelism
In-Reply-To: <3580fadc-335e-17a8-9d1c-43779df0d1d3@redhat.com>
References: <3580fadc-335e-17a8-9d1c-43779df0d1d3@redhat.com>
Message-ID: <33da45984d91f36298b10dd7a6ffc48b51e38195.camel@redhat.com>

Ok looks good.

Thanks,
Roman

On Wed, 2020-06-17 at 14:26 +0200, Aleksey Shipilev wrote:
> RFE:
> https://bugs.openjdk.java.net/browse/JDK-8247757
> 
> There are plenty of tests that take a while and test multiple modes
> and heuristics. Their
> configurations are split by mode. Splitting them further by
> heuristics improves their parallelism
> even more.
> 
> Fix:
> https://cr.openjdk.java.net/~shade/8247757/webrev.01/
> 
> Testing: hotspot_gc_shenandoah (even faster now)
> 

From shade at redhat.com  Wed Jun 17 16:13:03 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jun 2020 18:13:03 +0200
Subject: RFR (XS) 8247778: ZGC: More parallel gc/z/TestUncommit.java test configuration
Message-ID: 

RFE:
https://bugs.openjdk.java.net/browse/JDK-8247778

See the details in the RFE.

Fix:
https://cr.openjdk.java.net/~shade/8247778/webrev.01/

Testing: affected test (and nothing else, because I consider the change trivial)

-- 
Thanks,
-Aleksey

From igor.ignatyev at oracle.com  Wed Jun 17 17:15:29 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 17 Jun 2020 10:15:29 -0700
Subject: RFR (XS) 8247778: ZGC: More parallel gc/z/TestUncommit.java test configuration
In-Reply-To: 
References: 
Message-ID: <44AEDB93-EBFA-4D97-8F0E-6E03ECC8F660@oracle.com>

Hi Aleksey,

this looks good to me, and I agree that it's trivial. Given it's a test-only change, would you consider pushing it to jdk/jdk15?

-- Igor

> On Jun 17, 2020, at 9:13 AM, Aleksey Shipilev wrote:
> 
> RFE:
> https://bugs.openjdk.java.net/browse/JDK-8247778
> 
> See the details in the RFE.
> 
> Fix:
> https://cr.openjdk.java.net/~shade/8247778/webrev.01/
> 
> Testing: affected test (and nothing else, because I consider the change trivial)
> 
> -- 
> Thanks,
> -Aleksey
> 

From stefan.karlsson at oracle.com  Wed Jun 17 17:38:15 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Wed, 17 Jun 2020 19:38:15 +0200
Subject: RFR: 8247740: Inline derived CollectedHeap access for G1 and ParallelGC
In-Reply-To: <15884ED9-3AF4-4F89-894D-5BD42C7796BA@oracle.com>
References: <15884ED9-3AF4-4F89-894D-5BD42C7796BA@oracle.com>
Message-ID: 

Looks good.

Some nits that you might want to consider:

- Extraneous new line. Other places in the file don't add a newline after public:, private:, or protected:.

 181 protected:
 182 
 183   // Must follow Name enum. C++11 forward declaration of enum.

- I'm not sure the last part is complete.
I do understand what it tries to say, but I just don't think it's necessary, or that helpful. A descriptive comment about the function would be more helpful, IMHO.

  // Must follow Name enum. C++11 forward declaration of enum.

Thanks,
StefanK

On 2020-06-17 14:32, Kim Barrett wrote:
> Please review this change to derived CollectedHeap access for the
> various collectors.  Most of the collectors have a heap() function
> that returns the derived CollectedHeap object, with the definitions of
> these functions being nearly identical.  This change adds a helper
> function in CollectedHeap for use by these derived heap() functions.
>
> This change also inlines the heap() functions for G1 and ParallelGC.
> These functions have a very simple definition in a release build.
> Since both of these collectors have calls in relatively performance
> critical places, inlining should be (a little bit) helpful, though I
> haven't tried to measure it.  In some cases it may be better to get
> the heap once and cache it in a variable or data member; indeed,
> that's often done, but not always, and tracking down all the cases
> that matter isn't a small task.
>
> This change only does the inlining for G1 and ParallelGC.  Performance
> of Serial and Epsilon is less critical, and ZGC does a good job of
> avoiding hot calls.  Shenandoah doesn't have a corresponding function;
> it seems to have not conformed to the JDK-8077415 change.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8247740
>
> Webrev:
> https://cr.openjdk.java.net/~kbarrett/8247740/open.00/
>
> Testing:
> mach5 tier1
>

From kim.barrett at oracle.com  Thu Jun 18 02:21:05 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 17 Jun 2020 22:21:05 -0400
Subject: RFR: 8247740: Inline derived CollectedHeap access for G1 and ParallelGC
In-Reply-To: 
References: <15884ED9-3AF4-4F89-894D-5BD42C7796BA@oracle.com>
Message-ID: 

> On Jun 17, 2020, at 1:38 PM, Stefan Karlsson wrote:
> 
> Looks good.

Thanks.

> Some nits that you might want to consider:
> 
> - Extraneous new line. Other places in the file don't add a newline after public:, private:, or protected:.
> 
> 181 protected:
> 182 
> 183   // Must follow Name enum. C++11 forward declaration of enum.
> 
> - I'm not sure the last part is complete. I do understand what it tries to say, but I just don't think it's necessary, or that helpful. A descriptive comment about the function would be more helpful, IMHO.
> 
> // Must follow Name enum. C++11 forward declaration of enum.

I should just file a bug to clean this up later. I will do that.

From maoliang.ml at alibaba-inc.com  Thu Jun 18 09:01:49 2020
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Thu, 18 Jun 2020 17:01:49 +0800
Subject: RFC: Adaptively resize heap at any GC/SoftMaxHeapSize for G1
Message-ID: <7ae44395-5e9f-4da5-a543-73da740dea59.maoliang.ml@alibaba-inc.com>

Hi Thomas,

Sorry for replying so late. It's great to see the good progress of the approach we've discussed for a while. Resizing at any GC is definitely the right way. I have some questions in inline comments below.

BTW, I want to answer some questions in advance:
1) We may not be able to test this approach in our workloads recently since the versions are quite different. But we shall want to merge this and further concurrent uncommit stuff together later in JDK11.
2) JEP 346 is backported to our JDK11 and works fine as expected in some workloads.
I guess the new elastic solution in the future would be better :)
3) The previous humongous proposal by aborting initial mark solved some problems but still had the issue of frequent GC. We are now tuning this and verifying in our workloads.

> ------------------------------------------------------------------
> From:Thomas Schatzl 
> Send Time:2020 Jun. 10 (Wed.) 17:31
> To:hotspot-gc-dev at openjdk.java.net
> Subject:RFC: Adaptively resize heap at any GC/SoftMaxHeapSize for G1

> Hi all, Liang,

> after a few months of busy working in the area of G1 heap resizing
> and ultimately SoftMaxHeapSize support, I am fairly okay with a first
> preview of these associated changes. So I would like to ask for feedback
> on the current changes for what I intend to complete in the (early)
> jdk16 timeframe.

> This is not a request for review of the changes for pushing, although
> feedback on the code is also appreciated.
> From my point of view only tuning a few heuristics and code polishing
> is left to do as the change seems to do what it is intended to do.
> In particular it would be nice if Liang Mao, the original requestor of
> all this functionality, could help with feedback on his loads. :)

> Just to recap: Sometime around the end of last year, Liang posted review(s)
> with functionality to:
> - concurrent uncommit of memory
> - implement SoftMaxHeapSize by uncommitting free memory

> That did not work well in some cases, so we agreed on us at Oracle
> taking over. Today I would like to talk about the progress on the second
> part :)

> The original proposal did not work well because it did not really change
> how G1 resized the heap - i.e. SoftMaxHeapSize related changes to the
> heap were quickly undone by regular heap expansion because it was too
> aggressive for several reasons (e.g. bugs like JDK-8243672,
> JDK-8244603), uncooperative (JDK-8238686) and never actually helped
> shrinking or keeping a particular heap size.
> This resulted in lots of unnecessary heap changes even on known constant load.

> After some analysis after fixing these issues (at least internally ;)) I
> thought that for G1 to keep a particular heap size G1 needs to have an
> element in its heap sizing control loop that pushes back on (excessive)
> heap expansion.
> The best approach I thought of has been to introduce a *lower*
> GCTimeRatio that G1 tries to stay *above* by resizing the heap.
> Effectively, G1 then tries to stay within ["LowerGCTimeRatio",
> GCTimeRatio] for its actual gc time ratio.
> That works out fairly well actually, and today I thought that the code
> is in a state where, while still heavy in development (it does look like
> that :) still), it could be provided for gathering feedback on more loads
> from you.
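As an illustration of the control loop described above, here is a minimal standalone Java sketch; the names (GcTimeRatioController, minimumPercent) are invented for illustration and this is not the actual G1HeapSizingPolicy code - the real heuristics are considerably more involved.

// Keep the observed GC time ratio inside [lower, upper]; when outside,
// resize toward the midpoint of the band. A ratio below the band means
// too much time is spent in GC (expand the heap); above the band, too
// little (the heap can shrink).
public class GcTimeRatioController {
    private final double upper; // the GCTimeRatio analogue
    private final double lower; // e.g. 50% of upper, per the G1MinimumPercentOfGCTimeRatio idea

    public GcTimeRatioController(double gcTimeRatio, double minimumPercent) {
        this.upper = gcTimeRatio;
        this.lower = gcTimeRatio * minimumPercent / 100.0;
    }

    // Returns a heap resize factor: > 1.0 expand, < 1.0 shrink, 1.0 no action.
    public double resizeFactor(double actualRatio) {
        if (actualRatio >= lower && actualRatio <= upper) {
            return 1.0; // inside the band: leave the heap size alone
        }
        double target = (lower + upper) / 2.0; // correct toward the midpoint
        return target / actualRatio;
    }
}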
> First, how to try and use before going into the details and questions I
> have:
> This is a series of patches, which I put up on cr.openjdk.java.net, that need
> to be applied on recent trunk:
> These are the ones already out for review:
> 1) JDK-8243672: http://cr.openjdk.java.net/~tschatzl/8243672/webrev.1/
> 2) JDK-8244603: http://cr.openjdk.java.net/~tschatzl/8244603/webrev/
> These are in the pipeline and not "fully complete" yet:
> 3) JDK-8238163: http://cr.openjdk.java.net/~tschatzl/8238163/webrev/
> (optional)
> 4) JDK-8238686: http://cr.openjdk.java.net/~tschatzl/8238686/webrev/
> 5) JDK-8238687: http://cr.openjdk.java.net/~tschatzl/8238687/webrev/
> 6) JDK-8236073: http://cr.openjdk.java.net/~tschatzl/8236073/webrev/

> All of the above:
> http://cr.openjdk.java.net/~tschatzl/8236073/webrev.preview/

> What these do:
> (1) and (2) make the input variables to the control loop more
> consistent. Since they are out for review, I would defer to the review
> threads for them.
> (3) stabilizes IHOP calculation a bit, trying to improve uncommon
> situations. This change is optional.
> (4) fixes the issue with resizing at Remark being totally disconnected
> from actual load, causing some erratic expansions and shrinks.
> After some time tinkering with that I decided to remove resizing at
> Remark - since we check heap size at every gc anyway, this is not
> required any more (but also delaying uncommit to the next gc).
> (5) is the main change that implements what has been mentioned above:
> G1 tries to keep the actual GC time ratio within the range of
> LowerGCTimeRatio and GCTimeRatio. As long as the actual GC time ratio is
> within this range, no action occurs. As soon as it finds that there is a
> trend of being outside, it tries to correct for that, internally trying
> to reach an actual gc time ratio in the middle of that range.

Mostly I have some concerns in this change:
a) I didn't see you change the default GCTimeRatio in G1. Do you think the lower bound of 6 would be too low? I don't have a precise number, but intuitively at least around 10 seems safer for those online interactive applications. That means we have 20 as the default GCTimeRatio for G1.
b) It's a known issue about mixed GC. We know that mixed GC would severely decay the GC time ratio. (I have no test result for abortable mixed GC after JDK12.) I'm not sure if some workloads with heavy mixed GC would easily decrease the heap size. Or abortable mixed GC can roughly make sure the GC time ratio in mixed GC phases is above 50% of normal young GC?

> (6) implements SoftMaxHeapSize on top of that, trying to steer IHOP so
> that G1 does not use more than that. (I.e. a complete mess of
> potentially conflicting goals ;)

> What I would like to ask you is to try out these changes on your load(s),
> and potentially report back with at least
> gc*,gc=debug,gc+ergo+heap=trace
> logging.
> Of course more feedback about how it works for you is even better, and
> if you are adventurous, maybe try tuning (internal) knobs a bit, which
> I'll describe in a minute :)

> As mentioned, the changes are not complete, here's what I think should
> still be tuned a bit, and what I expect helps. The interesting method is
> G1HeapSizingPolicy::resize_amount_after_young_gc().
> - determining the desired gc time ratio range: there is a new (likely
> temporary) option G1MinimumPercentOfGCTimeRatio that determines the
> lower gc time ratio described above as a percentage of the original
> GCTimeRatio.
> Currently set at 50%, which seems a good value, as a too
> tight range will cause lots of resizing (which might be good), and a too
> large range will effectively disable shrinking (which also might be
> desired).
> Either way, this value works fairly well so far in my tests. Suggestions
> very appreciated.
> - detection of being outside of the expected gc time ratio range: this
> somewhat works as before, separating short term and long term behavior.
> Long term behavior: every X gcs without a heap resize, g1 checks if the long
> term gc ratio is outside of the bounds, and if so, reacts. I think this is
> fairly straightforward.
> Short term behavior: tracks the number of times the short term gc time ratio
> exceeds the bounds in a single variable, incrementing or decrementing it
> depending on whether the current gc time ratio is above or below the gc time
> ratio bounds. If that value exceeds certain thresholds, do something.
> There is a new bias towards expansion at startup to make g1 react faster
> at that time, and some decay towards "no action to be taken" if for a
> "long" time nothing happens.
> I reused the same values for "short" time (+/-4) and "long" (10) as
> before, they seem to be okay.
> - actual resizing: expansion is supposed to be the same as before,
> relatively aggressive, which I intend to keep.
> Shrinking is based on the number of free regions at the moment. This is
> not optimal because e.g. you do not want to shrink below what is needed
> for current eden (and the survivors of the next gc).
> Other than that it is bounded by a percentage of the number of free
> regions (G1ShrinkByPercentOfAvailable). That results in some heap size
> undershoot in some cases (i.e. temporarily uncommitting a bit too much),
> but in my tests it hasn't been too bad.
> Still rather (too) simple, expect some tunings and changes particularly
> here, deviating a bit more from the expansion code.
> Comments and ideas in this area, particularly ones applied to your
> workloads, particularly appreciated.
> Another big area not yet really tested is interaction with JEP 346:
> Promptly Return Unused Committed Memory from G1, but I am certain that
> with it you can reduce heap usage a lot (too much?).
> My performance (throughput) tests so far look almost always encouraging:
> 20-30% less heap size with statistically insignificant throughput
> changes. There are some exceptions, in these cases you lose 10% of
> throughput for like 90% less heap usage.
> The only really bad results come from tests that try to find the maximum
> throughput of g1 by incrementally increasing the load, finding out that
> it does not work, slightly backing off with the load and then increasing the
> load again to find an "equilibrium". From what I can tell it looks like
> the heap sizing follows the application (i.e. what it's supposed to do),
> making the application think it's already done while there is still more
> heap available to potentially increase performance (looking at you,
> specjbb2015 out-of-box performance!).
> Not yet sure how to counter that, but some decrease in default
> GCTimeRatio to decrease the shrinking aggressiveness (and keeping more
> heap longer) might fix this.
> Of course, if you disable this adaptive heap sizing by fixing the heap
> min/max in your benchmarks, there are no differences to before.
> One interesting upcoming change is to make MinHeapSize manageable
> (JDK-8224879) to help the algorithm a bit.
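The short-term detection described above, again as a minimal standalone Java sketch: all names are invented, and only the +/-4 threshold and the decay-after-10 idea mirror the values quoted in the mail.

// One signed counter drifts up while the recent GC time ratio sits above
// the band and down while it sits below; crossing +/-4 triggers a resize
// decision, and a long quiet stretch decays the counter back toward zero.
public class ShortTermRatioTracker {
    private static final int THRESHOLD = 4;
    private static final int DECAY_EPOCHS = 10;
    private int pressure;
    private int quietEpochs;

    /** @return +1 to shrink, -1 to expand, 0 for no action. */
    public int update(double actualRatio, double lower, double upper) {
        if (actualRatio > upper) {
            pressure++;            // trending "too little time in GC"
        } else if (actualRatio < lower) {
            pressure--;            // trending "too much time in GC"
        } else if (++quietEpochs >= DECAY_EPOCHS) {
            pressure -= Integer.signum(pressure); // decay toward "no action"
            quietEpochs = 0;
        }
        if (pressure >= THRESHOLD)  { pressure = 0; quietEpochs = 0; return +1; }
        if (pressure <= -THRESHOLD) { pressure = 0; quietEpochs = 0; return -1; }
        return 0;
    }
}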
> As closing words, given that the email is quite long already, thanks for
> your attention and looking forward to feedback :)
> If you have questions, please chime in too, I am happy to answer them.

> Thanks,
>   Thomas

From shade at redhat.com  Thu Jun 18 09:12:39 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 18 Jun 2020 11:12:39 +0200
Subject: RFR (XS) 8247778: ZGC: More parallel gc/z/TestUncommit.java test configuration
In-Reply-To: <44AEDB93-EBFA-4D97-8F0E-6E03ECC8F660@oracle.com>
References: <44AEDB93-EBFA-4D97-8F0E-6E03ECC8F660@oracle.com>
Message-ID: 

On 6/17/20 7:15 PM, Igor Ignatyev wrote:
> this looks good to me, and I agree that it's trivial. Given it's a test-only change, would you consider pushing it to jdk/jdk15?

Sure I can push it to jdk/jdk15.

Do ZGC people need to ack this too?

-- 
Thanks,
-Aleksey

From stefan.karlsson at oracle.com  Thu Jun 18 09:19:49 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Thu, 18 Jun 2020 11:19:49 +0200
Subject: RFR (XS) 8247778: ZGC: More parallel gc/z/TestUncommit.java test configuration
In-Reply-To: 
References: <44AEDB93-EBFA-4D97-8F0E-6E03ECC8F660@oracle.com>
Message-ID: <0f81259b-4318-e7c3-4492-252519f5ce84@oracle.com>

On 2020-06-18 11:12, Aleksey Shipilev wrote:
> On 6/17/20 7:15 PM, Igor Ignatyev wrote:
>> this looks good to me, and I agree that it's trivial. Given it's a test-only change, would you consider pushing it to jdk/jdk15?
> Sure I can push it to jdk/jdk15.
> 
> Do ZGC people need to ack this too?

If it works, then I think it's good. If it becomes problematic, then it's an easy thing to revert.

StefanK

From shade at redhat.com  Thu Jun 18 09:24:04 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 18 Jun 2020 11:24:04 +0200
Subject: RFR (XS) 8247778: ZGC: More parallel gc/z/TestUncommit.java test configuration
In-Reply-To: <0f81259b-4318-e7c3-4492-252519f5ce84@oracle.com>
References: <44AEDB93-EBFA-4D97-8F0E-6E03ECC8F660@oracle.com> <0f81259b-4318-e7c3-4492-252519f5ce84@oracle.com>
Message-ID: 

On 6/18/20 11:19 AM, Stefan Karlsson wrote:
> On 2020-06-18 11:12, Aleksey Shipilev wrote:
>> On 6/17/20 7:15 PM, Igor Ignatyev wrote:
>>> this looks good to me, and I agree that it's trivial. Given it's a test-only change, would you consider pushing it to jdk/jdk15?
>> Sure I can push it to jdk/jdk15.
>>
>> Do ZGC people need to ack this too?
> 
> If it works, then I think it's good. If it becomes problematic, then
> it's an easy thing to revert.

Okay then, pushed.

-- 
Thanks,
-Aleksey

From thomas.schatzl at oracle.com  Thu Jun 18 10:57:09 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 18 Jun 2020 12:57:09 +0200
Subject: RFC: Adaptively resize heap at any GC/SoftMaxHeapSize for G1
In-Reply-To: <7ae44395-5e9f-4da5-a543-73da740dea59.maoliang.ml@alibaba-inc.com>
References: <7ae44395-5e9f-4da5-a543-73da740dea59.maoliang.ml@alibaba-inc.com>
Message-ID: <877470a3-6154-31a3-7cf3-2369fd998f29@oracle.com>

Hi,

On 18.06.20 11:01, Liang Mao wrote:
> Hi Thomas,
> 
> Sorry for replying so late. It's great to see the good progress of the approach
> we've discussed for a while. Resizing at any GC is definitely the right way. I have
> some questions in inline comments below.
> 
> BTW, I want to answer some questions in advance:
> 1) We may not be able to test this approach in our workloads recently since the versions
> are quite different. But we shall want to merge this and further concurrent uncommit stuff
> together later in JDK11.

Okay.
We also expect this to land close together with the concurrent uncommit in latest jdk/jdk.

> 2) JEP 346 is backported to our JDK11 and works fine as expected in some workloads. I guess
> the new elastic solution in the future would be better :)
> 3) The previous humongous proposal by aborting initial mark solved some problems but still had
> the issue of frequent GC. We are now tuning this and verifying in our workloads.

Okay. We are still working on this, as it is a good idea to do; we just think that this change and the concurrent uncommit are more important.

Maybe you have noticed the discussions with Ziyi Luo from Amazon, who is also currently working on improving adaptive IHOP. There may be some synergies in the overall effect here.

>> ------------------------------------------------------------------
>> From:Thomas Schatzl 
>> Send Time:2020 Jun. 10 (Wed.) 17:31
>> To:hotspot-gc-dev at openjdk.java.net
>> Subject:RFC: Adaptively resize heap at any GC/SoftMaxHeapSize for G1
> 
>> Hi all, Liang,
> 
>> after a few months of busy working in the area of G1 heap resizing
>> and ultimately SoftMaxHeapSize support, I am fairly okay with a first
>> preview of these associated changes. So I would like to ask for feedback
>> on the current changes for what I intend to complete in the (early)
>> jdk16 timeframe.
> 
>> This is not a request for review of the changes for pushing, although
>> feedback on the code is also appreciated.
> [...]
>> (5) is the main change that implements what has been mentioned above:
>> G1 tries to keep the actual GC time ratio within the range of
>> LowerGCTimeRatio and GCTimeRatio. As long as the actual GC time ratio is
>> within this range, no action occurs. As soon as it finds that there is a
>> trend of being outside, it tries to correct for that, internally trying
>> to reach an actual gc time ratio in the middle of that range.
> 
> Mostly I have some concerns in this change:
> a) I didn't see you change the default GCTimeRatio in G1. Do you think the lower bound
> of 6 would be too low? I don't have a precise number, but intuitively at least around 10
> seems safer for those online interactive applications. That means we have 20
> as the default GCTimeRatio for G1.
Or > abortable mixed GC can roughly make sure the GC time ratio in mixed GC phases is above > 50% of normal young GC?gc time This issue has hopefully been fixed (or at least improved) in 8244603 and 8238858 which are out for review. In my testing, the increase in actual GCTimeRatio during mixed gc is now much much lower. The changes will likely land next week in jdk/jdk (just missed 15). Thanks, Thomas From rs at jelastic.com Thu Jun 18 12:16:19 2020 From: rs at jelastic.com (Ruslan Synytsky) Date: Thu, 18 Jun 2020 15:16:19 +0300 Subject: RFC: Adaptively resize heap at any GC/SoftMaxHeapSize for G1 In-Reply-To: References: Message-ID: Hi Thomas and Liang, thank you for moving this improvement forward. A quick question regarding the naming: did we agree on how this parameter should be called? What happens when heap usage goes higher than SoftMaxHeapSize - OOMError or JVM gets a little bit more memory? If JVM throws OOMError I believe the right naming should be HardMaxHeapSize. Sorry in advance if I missed this point in the previous conversations. Also, some news regarding analysis automation of memory usage efficiency I'm working on in the background. We built a relatively small script that collects memory usage metrics from many containers running inside the same large host machine. After executing it in one of our dev environments with about 150 containers we got interesting results - the used heap is very close to the committed heap while Xmx is much higher compared to committed value. Please note, almost all containers use JEP 346 improvement or javaagent which triggers GC at idle state in the older JDK versions. [image: Screenshot 2020-06-18 at 13.20.19.jpg] Zoomed [image: Screenshot 2020-06-18 at 14.40.18.jpg] There is a challenge to get metrics from a running java process. I as understand the most accurate metrics can be collected via MBean, for example JMXConnector c = JMXConnectorFactory.newJMXConnector(createConnectionURL(host, port), null); c.connect(); MBeanServerConnection mbsc = c.getMBeanServerConnection(); MemoryMXBean mem = ManagementFactory.getPlatformMXBean(mbsc, MemoryMXBean.class); MemoryUsage heap = mem.*getHeapMemoryUsage*(); However, enabling JMX ManagementAgent via jcmd and connecting to JVM with a JMX client is a quite complex operation for getting such a simple metric about heap memory usage. Also, some java processes may already start ManagementAgent on a custom port with auth protection, so we can't collect statistics from such java processes without contacting the application owner (you can see the gaps on the chart). Do you know any other way for collecting accurate heap usage statistics from a running java process? We plan to run this analysis tool on productions with a large number of containers, so we can get a more realistic picture. 
Thanks -- Ruslan Synytsky From shade at redhat.com Thu Jun 18 16:23:47 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 18 Jun 2020 18:23:47 +0200 Subject: RFR (XS) 8247860: Shenandoah: add update watermark line in rich assert failure message Message-ID: <5a07f10c-f105-8a5d-c27b-730cdcea54a0@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8247860 Patch: diff -r dbd95dd97289 src/hotspot/share/gc/shenandoah/shenandoahAsserts.cpp --- a/src/hotspot/share/gc/shenandoah/shenandoahAsserts.cpp Tue Jun 16 15:18:58 2020 -0400 +++ b/src/hotspot/share/gc/shenandoah/shenandoahAsserts.cpp Thu Jun 18 18:23:27 2020 +0200 @@ -68,4 +68,5 @@ msg.append(" " PTR_FORMAT " - klass " PTR_FORMAT " %s\n", p2i(obj), p2i(obj->klass()), obj->klass()->external_name()); msg.append(" %3s allocated after mark start\n", ctx->allocated_after_mark_start(obj) ? "" : "not"); + msg.append(" %3s after update watermark\n", cast_from_oop(obj) >= r->get_update_watermark() ? "" : "not"); msg.append(" %3s marked \n", ctx->is_marked(obj) ? "" : "not"); msg.append(" %3s in collection set\n", heap->in_collection_set(obj) ? "" : "not"); Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From shade at redhat.com Thu Jun 18 16:30:05 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 18 Jun 2020 18:30:05 +0200 Subject: RFR (S) 8247845: Shenandoah: refactor TLAB/GCLAB retirement code Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8247845 Fix; http://cr.openjdk.java.net/~shade/8247845/webrev.01/ Current TLAB/GCLAB retirement code is all over the place. Sometimes we retire GCLABs twice. Sometimes we resize TLABs twice. This hopefully makes the things more clear by lifting things out of CollectedHeap::ensure_parsability and specializing it for Shenandoah use cases. Testing: hotspot_gc_shenandoah {fastdebug,release}; tier{1,2} with Shenandoah; benchmarks (running) -- Thanks, -Aleksey From rkennke at redhat.com Thu Jun 18 16:30:27 2020 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 18 Jun 2020 18:30:27 +0200 Subject: RFR (XS) 8247860: Shenandoah: add update watermark line in rich assert failure message In-Reply-To: <5a07f10c-f105-8a5d-c27b-730cdcea54a0@redhat.com> References: <5a07f10c-f105-8a5d-c27b-730cdcea54a0@redhat.com> Message-ID: <59ecab1ac8b54bd96aa91c7df3078c9b39a9b697.camel@redhat.com> Looks good! Roman On Thu, 2020-06-18 at 18:23 +0200, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8247860 > > Patch: > > diff -r dbd95dd97289 > src/hotspot/share/gc/shenandoah/shenandoahAsserts.cpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahAsserts.cpp Tue > Jun 16 15:18:58 2020 -0400 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahAsserts.cpp Thu > Jun 18 18:23:27 2020 +0200 > @@ -68,4 +68,5 @@ > msg.append(" " PTR_FORMAT " - klass " PTR_FORMAT " %s\n", > p2i(obj), p2i(obj->klass()), > obj->klass()->external_name()); > msg.append(" %3s allocated after mark start\n", ctx- > >allocated_after_mark_start(obj) ? "" : > "not"); > + msg.append(" %3s after update > watermark\n", cast_from_oop(obj) >= > r->get_update_watermark() ? "" : "not"); > msg.append(" %3s marked \n", ctx- > >is_marked(obj) ? "" : "not"); > msg.append(" %3s in collection set\n", heap- > >in_collection_set(obj) ? 
"" : "not"); > > Testing: hotspot_gc_shenandoah > From thomas.schatzl at oracle.com Thu Jun 18 18:03:49 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 18 Jun 2020 20:03:49 +0200 Subject: RFR (XXS): 8247748: Use default alpha for G1 adaptive IHOP allocation rate calculation Message-ID: <99c1ca37-8e43-2c5f-4745-82441512333a@oracle.com> Hi all, can I have reviews for this change that tones down the g1 adaptive ihop allocation rate prediction to not follow the most recent allocation rate that much? Instead of 0.95, use the default 0.7. CR: https://bugs.openjdk.java.net/browse/JDK-8247748 Webrev: http://cr.openjdk.java.net/~tschatzl/8247748/webrev/ Testing: tier1-3 Thanks, Thomas From luoziyi at amazon.com Thu Jun 18 18:45:00 2020 From: luoziyi at amazon.com (Luo, Ziyi) Date: Thu, 18 Jun 2020 18:45:00 +0000 Subject: RFR (XXS): 8247748: Use default alpha for G1 adaptive IHOP allocation rate calculation In-Reply-To: <99c1ca37-8e43-2c5f-4745-82441512333a@oracle.com> References: <99c1ca37-8e43-2c5f-4745-82441512333a@oracle.com> Message-ID: Hi Thomas, I am a little bit confused here. According to the TruncatedSeq constructor, the alpha is used to construct AbsSeq: http://hg.openjdk.java.net/jdk/jdk/file/08211be640e9/src/hotspot/share/utilities/numberSeq.cpp#l31 When adding a new value to AbsSeq, the weight of the new val is (1.0 - _alpha): http://hg.openjdk.java.net/jdk/jdk/file/08211be640e9/src/hotspot/share/utilities/numberSeq.cpp#l146 http://hg.openjdk.java.net/jdk/jdk/file/08211be640e9/src/hotspot/share/utilities/numberSeq.cpp#l44 Right now, in Adaptive IHOP, the weight of the new value is (1-0.95) = 0.05. After changing alpha to the default 0.7, the weight is increased to 0.3, which actually emphasizes the most recent allocation rate and will potentially make the prediction spikier. Best, Ziyi On 6/18/20, 11:18 AM, Thomas Schatzl wrote: Hi all, can I have reviews for this change that tones down the g1 adaptive ihop allocation rate prediction to not follow the most recent allocation rate that much? Instead of 0.95, use the default 0.7. CR: https://bugs.openjdk.java.net/browse/JDK-8247748 Webrev: http://cr.openjdk.java.net/~tschatzl/8247748/webrev/ Testing: tier1-3 Thanks, Thomas From thomas.schatzl at oracle.com Thu Jun 18 19:18:49 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 18 Jun 2020 21:18:49 +0200 Subject: RFR (XXS): 8247748: Use default alpha for G1 adaptive IHOP allocation rate calculation In-Reply-To: References: <99c1ca37-8e43-2c5f-4745-82441512333a@oracle.com> Message-ID: <9e843752-7615-4a60-19b0-2d7a73211564@oracle.com> Hi, On 18.06.20 20:45, Luo, Ziyi wrote: > Hi Thomas, > > I am a little bit confused here. > > According to the TruncatedSeq constructor, the alpha is used to construct > AbsSeq: > http://hg.openjdk.java.net/jdk/jdk/file/08211be640e9/src/hotspot/share/utilities/numberSeq.cpp#l31 > > When adding a new value to AbsSeq, the weight of the new val is (1.0 - _alpha): > http://hg.openjdk.java.net/jdk/jdk/file/08211be640e9/src/hotspot/share/utilities/numberSeq.cpp#l146 > http://hg.openjdk.java.net/jdk/jdk/file/08211be640e9/src/hotspot/share/utilities/numberSeq.cpp#l44 > > Right now, in Adaptive IHOP, the weight of the new value is (1-0.95) = > 0.05. After changing alpha to the default 0.7, the weight is increased to 0.3, > which actually emphasizes the most recent allocation rate and will potentially > make the prediction spikier. > you are right. 
I mostly went with my memory which apparently has been the complete opposite of what it does/did, which led to the CR title/description :( In my tests I could not see an actual difference in behavior so I figured it would be better to not deviate from the other, existing predictors instead of some other number as we do not have any explanation for the default value (of 0.7) either. I.e. something like "one magic number is better than two". For this (very weak) reason I would still like to make this change, but I am open to just retract this change. Thanks for making me aware of this! Thanks, Thomas From thomas.schatzl at oracle.com Thu Jun 18 19:25:44 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 18 Jun 2020 21:25:44 +0200 Subject: RFR (XXS): 8247748: Use default alpha for G1 adaptive IHOP allocation rate calculation In-Reply-To: <9e843752-7615-4a60-19b0-2d7a73211564@oracle.com> References: <99c1ca37-8e43-2c5f-4745-82441512333a@oracle.com> <9e843752-7615-4a60-19b0-2d7a73211564@oracle.com> Message-ID: <7bcaec8a-adef-777f-6218-4362424c7eef@oracle.com> Hi, On 18.06.20 21:18, Thomas Schatzl wrote: > Hi, > > On 18.06.20 20:45, Luo, Ziyi wrote: >> Hi Thomas, >> >> I am a little bit confused here. >> >> According to the TruncatedSeq constructor, the alpha is used to construct >> AbsSeq: >> http://hg.openjdk.java.net/jdk/jdk/file/08211be640e9/src/hotspot/share/utilities/numberSeq.cpp#l31 >> >> >> When adding a new value to AbsSeq, the weight of the new val is (1.0 - >> _alpha): >> http://hg.openjdk.java.net/jdk/jdk/file/08211be640e9/src/hotspot/share/utilities/numberSeq.cpp#l146 >> >> http://hg.openjdk.java.net/jdk/jdk/file/08211be640e9/src/hotspot/share/utilities/numberSeq.cpp#l44 >> >> >> Right now, in Adaptive IHOP, the weight of the new value is (1-0.95) = >> 0.05. After changing alpha to the default 0.7, the weight is increased >> to 0.3, >> which actually emphasizes the most recent allocation rate and will >> potentially >> make the prediction spikier. >> > > ? you are right. I mostly went with my memory? which apparently has > been the complete opposite of what it does/did, which led to the CR > title/description :( > > In my tests I could not see an actual difference in behavior so I > figured it would be better to not deviate from the other, existing > predictors instead of some other number as we do not have any > explanation for the default value (of 0.7) either. I.e. something like > "one magic number is better than two". > > For this (very weak) reason I would still like to make this change, but > I am open to just retract this change. > > Thanks for making me aware of this! just let me retract this RFR for now. I will take more time to find a better reasoning for any change in that area. Thanks, Thomas From albert.th at alibaba-inc.com Fri Jun 19 06:34:33 2020 From: albert.th at alibaba-inc.com (Hao Tang) Date: Fri, 19 Jun 2020 14:34:33 +0800 Subject: =?UTF-8?B?UmU6IERpc2N1c3Npb24gb24gWkdDJ3MgUGFnZSBDYWNoZSBGbHVzaA==?= In-Reply-To: References: , Message-ID: <2b4f3dc4-002e-4967-85d0-945904eef27e.albert.th@alibaba-inc.com> Thanks for your reply. This is our patch for "balancing" page cache: https://github.com/tanghaoth90/jdk11u/commit/77631cf3 (based on jdk11u). We notice two cases that "page cache flush" frequently happens: * The number of cached pages is not sufficient for concurrent relocation. For example, 34 medium pages are "to-space" as the GC log shows below. 
"[2020-03-06T05:46:31.618+0800] GC(10406) Relocation Set (Medium Pages): 54->34, 91 skipped" In our scenario, hundreds of mutator threads is running. To my knowledge, these mutator can possibly relocate medium-sized objects in the relocation set. If there are less than 34 cached medium pages, "page cache flush" is likely to happen. Our strategy is to ensure at least 34 cached medium pages before relocation. * A lot of medium(small)-sized objects become unreachable at a moment (such as removing the root of these objects). Assume that the ratio of allocation rate of small and medium objects is 1:1. In this case, small-sized and medium-sized objects occupy 50% and 50% of the total memory, respectively. If medium-sized objects of 25% total memory are removed, there are still cached medium pages of 25% total memory when all small pages are used up. Since ZDriver does not trigger a new GC cycle at this moment, 12.5% total memory should be transformed from medium pages into small pages for allocating small -sized objects. Our strategy is to ensure the ratio of different types of cached pages to match the ratio of allocation rate. The patch works well on our application (by eliminating "page cache flush" and the corresponding delay). However, this approach have shortcomings as my previous mail mentioned. It might not be a complete solution for general cases, but still worth discussing. We are also thinking about alternative solutions, such as keep some cached page as buffer. Looking forward to your feedback. Thanks. Sincerely, Hao Tang ------------------------------------------------------------------ From:Per Liden Send Time:2020?6?5? 18:54 To:albert.th at alibaba-inc.com; hotspot-gc-dev openjdk.java.net ; zgc-dev Subject:Re: Discussion on ZGC's Page Cache Flush Hi, On 6/5/20 11:24 AM, Hao Tang wrote: > > Hi ZGC Team, > > We encountered "Page Cache Flushed" when we enable ZGC feature. Much longer response time can be observed at the time when "Page Cache Flushed" happened. There is a case that is able to reproduce this scenario. In this case, medium-sized objects are periodically cleaned up. Right after the clean-up, small pages is not sufficient for allocating small-sized objects, which needs to flush medium pages into small pages. We found that simply enlarging the max heap size cannot solve this problem. We believe that "page cache flush" issue could be a general problem, because the ratio of small/medium/large objects are not always constant. 
> > Sample code: > import java.util.Random; > import java.util.concurrent.locks.LockSupport; > public class TestPageCacheFlush { > /* > * Options: -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:+UnlockDiagnosticVMOptions -Xms10g -Xmx10g -XX:ParallelGCThreads=2 -XX:ConcGCThreads=4 -Xlog:gc,gc+heap > * small object: fast allocation > * medium object: slow allocation, periodic deletion > */ > public static void main(String[] args) throws Exception { > long heapSizeKB = Runtime.getRuntime().totalMemory() >> 10; > System.out.println(heapSizeKB); > SmallContainer smallContainer = new SmallContainer((long)(heapSizeKB * 0.4)); // 40% heap for live small objects > MediumContainer mediumContainer = new MediumContainer((long)(heapSizeKB * 0.4)); // 40% heap for live medium objects > int totalSmall = smallContainer.getTotalObjects(); > int totalMedium = mediumContainer.getTotalObjects(); > int addedSmall = 0; > int addedMedium = 1; // should not be divided by zero > while (addedMedium < totalMedium * 10) { > if (totalSmall / totalMedium > addedSmall / addedMedium) { // keep the ratio of allocated small/medium objects > smallContainer.createAndSaveObject(); > addedSmall ++; > } else { > mediumContainer.createAndAppendObject(); > addedMedium ++; > } > if ((addedSmall + addedMedium) % 50 == 0) { > LockSupport.parkNanos(500); // make allocation slower > } > } > } > static class SmallContainer { > private final int KB_PER_OBJECT = 64; // 64KB per object > private final Random RANDOM = new Random(); > private byte[][] smallObjectArray; > private long totalKB; > private int totalObjects; > SmallContainer(long totalKB) { > this.totalKB = totalKB; > totalObjects = (int)(totalKB / KB_PER_OBJECT); > smallObjectArray = new byte[totalObjects][]; > } > int getTotalObjects() { > return totalObjects; > } > // random insertion (with random deletion) > void createAndSaveObject() { > smallObjectArray[RANDOM.nextInt(totalObjects)] = new byte[KB_PER_OBJECT << 10]; > } > } > static class MediumContainer { > private final int KB_PER_OBJECT = 512; // 512KB per object > private byte[][] mediumObjectArray; > private int mediumObjectArrayCurrentIndex = 0; > private long totalKB; > private int totalObjects; > MediumContainer(long totalKB) { > this.totalKB = totalKB; > totalObjects = (int)(totalKB / KB_PER_OBJECT); > mediumObjectArray = new byte[totalObjects][]; > } > int getTotalObjects() { > return totalObjects; > } > void createAndAppendObject() { > if (mediumObjectArrayCurrentIndex == totalObjects) { // periodic deletion > mediumObjectArray = new byte[totalObjects][]; // also delete all medium objects in the old array > mediumObjectArrayCurrentIndex = 0; > } else { > mediumObjectArray[mediumObjectArrayCurrentIndex] = new byte[KB_PER_OBJECT << 10]; > mediumObjectArrayCurrentIndex ++; > } > } > } > } > > To avoid "page cache flush", we made a patch for converting small/medium pages to medium/small pages ahead of time. This patch works well on an application with relatively-stable allocation rate, which has not encountered throughput problem. How do you think of this solution? > > We notice that you are improving the efficiency for map/unmap operations (https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-June/029936.html). It may be a step for improving the delay caused by "page cache flush". Do you have further plan for eliminating or improving "page cache flush"? 
Yes, and as you might have seen, the latest incarnation of this patchset includes asynchronous unmapping, which helps reduce the time for page cache flushing. I ran your example program above, with these patches and can see ~30% reduction in average page allocation time, and ~60% reduction in worst case page allocation time. So, it will be an improvement. However, I'd be more than happy to take a look at your patch and see what you've done. Making page cache flushing even less expensive is something we're interested in going forward. cheers, Per > > Sincerely,Hao Tang > From zgu at redhat.com Mon Jun 22 14:12:25 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 22 Jun 2020 10:12:25 -0400 Subject: [16] 8247736: Shenandoah: assert(_nm->is_alive()) failed: only alive nmethods here Message-ID: <51c85d82-6bc1-56b9-9d5d-408bfcdbe87a@redhat.com> The assertion is unreliable, as a nmethod can become a zombie before it is unregistered, and nmethod's state change can race against concurrent nmethod iteration, since they are under two different locks. We did not see this assertion before JDK-8245961, because we used CodeCache::blobs_do() to scan code cache and did not have the assertion on its code path. Ideally, I would prefer to keep nmethod list hygienic: unregister the nmethod before making state transition. However, offline discussion with Erik, he convinced me that could have unexpected consequences and risky. Mark through and evacuate/disarm zombie nmethods, while undesirable, but harmless. So, let's just filter out dead nmethod (still racy) and remove the assertion. Bug: https://bugs.openjdk.java.net/browse/JDK-8247736 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8247736/webrev.00/index.html Test: hotspot_gc_shenandoah Thanks, -Zhengyu From rkennke at redhat.com Mon Jun 22 15:00:34 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 22 Jun 2020 17:00:34 +0200 Subject: [16] 8247736: Shenandoah: assert(_nm->is_alive()) failed: only alive nmethods here In-Reply-To: <51c85d82-6bc1-56b9-9d5d-408bfcdbe87a@redhat.com> References: <51c85d82-6bc1-56b9-9d5d-408bfcdbe87a@redhat.com> Message-ID: The patch looks ok to me. Thank you, Roman On Mon, 2020-06-22 at 10:12 -0400, Zhengyu Gu wrote: > The assertion is unreliable, as a nmethod can become a zombie before > it > is unregistered, and nmethod's state change can race against > concurrent > nmethod iteration, since they are under two different locks. > > We did not see this assertion before JDK-8245961, because we used > CodeCache::blobs_do() to scan code cache and did not have the > assertion > on its code path. > > Ideally, I would prefer to keep nmethod list hygienic: unregister > the > nmethod before making state transition. However, offline discussion > with > Erik, he convinced me that could have unexpected consequences and > risky. > Mark through and evacuate/disarm zombie nmethods, while undesirable, > but > harmless. > > So, let's just filter out dead nmethod (still racy) and remove the > assertion. 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8247736 > Webrev: > http://cr.openjdk.java.net/~zgu/JDK-8247736/webrev.00/index.html > > Test: > hotspot_gc_shenandoah > > Thanks, > > -Zhengyu > From shade at redhat.com Mon Jun 22 15:34:00 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 22 Jun 2020 17:34:00 +0200 Subject: [16] 8247736: Shenandoah: assert(_nm->is_alive()) failed: only alive nmethods here In-Reply-To: <51c85d82-6bc1-56b9-9d5d-408bfcdbe87a@redhat.com> References: <51c85d82-6bc1-56b9-9d5d-408bfcdbe87a@redhat.com> Message-ID: <250520d7-04c3-db0a-78da-922d616de5a6@redhat.com> On 6/22/20 4:12 PM, Zhengyu Gu wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8247736 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8247736/webrev.00/index.html Looks fine. -- Thanks, -Aleksey From daniel.daugherty at oracle.com Mon Jun 22 20:25:16 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 22 Jun 2020 16:25:16 -0400 Subject: RFR(T): 8248049: minor cleanups in gc/whitebox/TestWBGC.java Message-ID: <0696b255-152b-13d1-9471-accd478c5681@oracle.com> Greetings, During the code review for the following fix: ??? JDK-8246477 add whitebox support for deflating idle monitors David H. made a couple of nit comments on my new test. Since I created my test by copying and modifying TestWBGC.java, it makes sense to also fix those nits in that test. Here's the tracking bug: ??? JDK-8248049 minor cleanups in gc/whitebox/TestWBGC.java ??? https://bugs.openjdk.java.net/browse/JDK-8248049 And here's the context diffs for a trivial review: $ hg diff diff -r aa1a1a674ec6 test/hotspot/jtreg/gc/whitebox/TestWBGC.java --- a/test/hotspot/jtreg/gc/whitebox/TestWBGC.java Mon Jun 22 16:03:40 2020 -0400 +++ b/test/hotspot/jtreg/gc/whitebox/TestWBGC.java Mon Jun 22 16:06:29 2020 -0400 @@ -24,9 +24,9 @@ ?package gc.whitebox; ?/* - * @test TestWBGC + * @test ? * @bug 8055098 - * @summary Test verify that WB methods isObjectInOldGen and youngGC works correctly. + * @summary Test to verify that WB methods isObjectInOldGen and youngGC work correctly. ? * @requires vm.gc != "Z" & vm.gc != "Shenandoah" ? * @library /test/lib ? * @modules java.base/jdk.internal.misc Thanks, in advance, for any comments, questions or suggestions. Dan From daniel.daugherty at oracle.com Mon Jun 22 20:37:42 2020 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 22 Jun 2020 16:37:42 -0400 Subject: RFR(T): 8248049: minor cleanups in gc/whitebox/TestWBGC.java In-Reply-To: <9f80df1c-fddd-9cc3-f8c9-7bd44d035d94@oracle.com> References: <0696b255-152b-13d1-9471-accd478c5681@oracle.com> <6b3164ea-72e6-fc7c-19ef-24113031ff74@oracle.com> <9f80df1c-fddd-9cc3-f8c9-7bd44d035d94@oracle.com> Message-ID: <9c1d95d7-262c-9a3b-c510-e35264389774@oracle.com> Resending to add back the hotspot-gc-dev at ... alias... Dan On 6/22/20 4:36 PM, Daniel D. Daugherty wrote: > Harold, > > Thanks for the fast review! > > Dan > > > On 6/22/20 4:32 PM, Harold Seigel wrote: >> Looks good and trivial. >> >> Thanks, Harold >> >> On 6/22/2020 4:25 PM, Daniel D. Daugherty wrote: >>> Greetings, >>> >>> During the code review for the following fix: >>> >>> ??? JDK-8246477 add whitebox support for deflating idle monitors >>> >>> David H. made a couple of nit comments on my new test. >>> Since I created my test by copying and modifying TestWBGC.java, >>> it makes sense to also fix those nits in that test. >>> >>> Here's the tracking bug: >>> >>> ??? JDK-8248049 minor cleanups in gc/whitebox/TestWBGC.java >>> ??? 
>>>     https://bugs.openjdk.java.net/browse/JDK-8248049
>>>
>>> And here's the context diffs for a trivial review:
>>>
>>> $ hg diff
>>> diff -r aa1a1a674ec6 test/hotspot/jtreg/gc/whitebox/TestWBGC.java
>>> --- a/test/hotspot/jtreg/gc/whitebox/TestWBGC.java Mon Jun 22
>>> 16:03:40 2020 -0400
>>> +++ b/test/hotspot/jtreg/gc/whitebox/TestWBGC.java Mon Jun 22
>>> 16:06:29 2020 -0400
>>> @@ -24,9 +24,9 @@
>>>  package gc.whitebox;
>>>
>>>  /*
>>> - * @test TestWBGC
>>> + * @test
>>>   * @bug 8055098
>>> - * @summary Test verify that WB methods isObjectInOldGen and
>>> youngGC works correctly.
>>> + * @summary Test to verify that WB methods isObjectInOldGen and
>>> youngGC work correctly.
>>>   * @requires vm.gc != "Z" & vm.gc != "Shenandoah"
>>>   * @library /test/lib
>>>   * @modules java.base/jdk.internal.misc
>>>
>>>
>>> Thanks, in advance, for any comments, questions or suggestions.
>>>
>>> Dan
>>>
>

From per.liden at oracle.com Tue Jun 23 06:56:48 2020
From: per.liden at oracle.com (Per Liden)
Date: Tue, 23 Jun 2020 08:56:48 +0200
Subject: RFR: 8247740: Inline derived CollectedHeap access for G1 and
 ParallelGC
In-Reply-To: <15884ED9-3AF4-4F89-894D-5BD42C7796BA@oracle.com>
References: <15884ED9-3AF4-4F89-894D-5BD42C7796BA@oracle.com>
Message-ID: 

On 6/17/20 2:32 PM, Kim Barrett wrote:
> Please review this change to derived CollectedHeap access for the
> various collectors. Most of the collectors have a heap() function
> that returns the derived CollectedHeap object, with the definitions of
> these functions being nearly identical. This change adds a helper
> function in CollectedHeap for use by these derived heap() functions.
>
> This change also inlines the heap() functions for G1 and ParallelGC.
> These functions have a very simple definition in a release build.
> Since both of these collectors have calls in relatively performance
> critical places, inlining should be (a little bit) helpful, though I
> haven't tried to measure it. In some cases it may be better to get
> the heap once and cache it in a variable or data member; indeed,
> that's often done, but not always, and tracking down all the cases
> that matter isn't a small task.
>
> This change only does the inlining for G1 and ParallelGC. Performance
> of Serial and Epsilon is less critical, and ZGC does a good job of
> avoiding hot calls. Shenandoah doesn't have a corresponding function;
> it seems to have not conformed to the JDK-8077415 change.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8247740
>
> Webrev:
> https://cr.openjdk.java.net/~kbarrett/8247740/open.00/

Looks good!

/Per

>
> Testing:
> mach5 tier1
>

From stefan.karlsson at oracle.com Tue Jun 23 08:10:02 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Tue, 23 Jun 2020 10:10:02 +0200
Subject: RFR: 8248132: ZGC: Unify handling of all OopStorage instances in
 root processing
Message-ID: 

Hi all,

Please review this patch to unify handling of all OopStorage instances
in root processing.

https://cr.openjdk.java.net/~stefank/8248132/webrev.01/
https://bugs.openjdk.java.net/browse/JDK-8248132

This removes the explicit enumeration of "strong" OopStorages in ZGC.
This is a step towards allowing the Runtime code to add new OopStorages
without having to update all GCs.
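As a side note for readers of this archive: a minimal sketch, assuming only
the OopStorageSet iterator API that appears in the oopStorageSetParState
patch later in this thread, of what the unified handling amounts to. Instead
of naming each strong storage, a GC walks whatever the runtime has
registered (the function name here is illustrative, not from the webrev):

    #include "gc/shared/oopStorageSet.hpp"
    #include "memory/iterator.hpp"

    // Apply a closure to every oop in every strong OopStorage. The GC no
    // longer needs to know how many storages exist or what they are called;
    // a new OopStorage registered by the runtime is picked up automatically.
    static void strong_oops_do(OopClosure* cl) {
      for (OopStorageSet::Iterator it = OopStorageSet::strong_iterator();
           !it.is_end(); ++it) {
        (*it)->oops_do(cl);
      }
    }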
Tested with tier1-3

Thanks,
StefanK

From stefan.karlsson at oracle.com Tue Jun 23 08:12:42 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Tue, 23 Jun 2020 10:12:42 +0200
Subject: RFR: 8248133: SerialGC: Unify handling of all OopStorage instances
 in root processing
Message-ID: <491d4729-c36c-114e-db60-048adcd2e998@oracle.com>

Hi all,

Please review this patch to unify handling of all OopStorage instances
in root processing for the Serial GC.

https://cr.openjdk.java.net/~stefank/8248133/webrev.01/
https://bugs.openjdk.java.net/browse/JDK-8248133

This removes the explicit enumeration of "strong" OopStorages in the
Serial GC. This is a step towards allowing the Runtime code to add new
OopStorages without having to update all GCs.

Tested with tier1-tier3

Thanks,
StefanK

From stefan.karlsson at oracle.com Tue Jun 23 08:23:50 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Tue, 23 Jun 2020 10:23:50 +0200
Subject: RFR: 8247820: ParallelGC: Process strong OopStorage entries in
 parallel
Message-ID: <159caf9c-3a74-cbca-8c07-81e78f0c356f@oracle.com>

Hi all,

Please review this patch to both unify handling of all OopStorage
instances and parallelize it in the root processing of the Parallel GC.

https://cr.openjdk.java.net/~stefank/8247820/webrev.01/
https://bugs.openjdk.java.net/browse/JDK-8247820

This removes the explicit enumeration of "strong" OopStorages in the
Parallel GC. This is a step towards allowing the Runtime code to add
new OopStorages without having to update all GCs.

It also parallelizes the processing of the OopStorages, using the
class that's being introduced in:
https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-June/030152.html

Tested with tier1-3

Thanks,
StefanK

From per.liden at oracle.com Tue Jun 23 08:26:45 2020
From: per.liden at oracle.com (Per Liden)
Date: Tue, 23 Jun 2020 10:26:45 +0200
Subject: Discussion on ZGC's Page Cache Flush
In-Reply-To: <2b4f3dc4-002e-4967-85d0-945904eef27e.albert.th@alibaba-inc.com>
References: <2b4f3dc4-002e-4967-85d0-945904eef27e.albert.th@alibaba-inc.com>
Message-ID: <5534bd26-b080-3cb7-32dd-3c7f020d0253@oracle.com>

Hi,

On 6/19/20 8:34 AM, Hao Tang wrote:
> Thanks for your reply.
>
> This is our patch for "balancing" page cache:
> https://github.com/tanghaoth90/jdk11u/commit/77631cf3 (based on jdk11u).

Sorry, but for IP clarity could you please post that patch to
cr.openjdk.java.net, otherwise I'm afraid I can't look at the patch.

>
> We notice two cases in which "page cache flush" frequently happens:
>
> * The number of cached pages is not sufficient for concurrent relocation.
>
>     For example, 34 medium pages are "to-space" as the GC log shows below.
>     "[2020-03-06T05:46:31.618+0800] GC(10406) Relocation Set (Medium Pages): 54->34, 91 skipped"
>     In our scenario, hundreds of mutator threads are running. To my knowledge, these mutators can possibly relocate medium-sized
>     objects in the relocation set. If there are fewer than 34 cached medium pages, "page cache flush" is likely to happen.
>
>     Our strategy is to ensure at least 34 cached medium pages before relocation.
>
> * A lot of medium(small)-sized objects become unreachable at a moment (such as removing the root of these objects).
>     Assume that the ratio of allocation rate of small and medium objects is 1:1. In this case, small-sized and medium-sized
>     objects occupy 50% and 50% of the total memory, respectively. If medium-sized objects of 25% total memory are removed, there
>     are still cached medium pages of 25% total memory when all small pages are used up. Since ZDriver does not trigger a new
>     GC cycle at this moment, 12.5% total memory should be transformed from medium pages into small pages for allocating small
>     -sized objects.
>
>     Our strategy is to ensure that the ratio of different types of cached pages matches the ratio of allocation rates.
>
> The patch works well on our application (by eliminating "page cache flush" and the corresponding delay). However, this approach has
>
> shortcomings, as my previous mail mentioned. It might not be a complete solution for general cases, but it is still worth discussing. We are
>
> also thinking about alternative solutions, such as keeping some cached pages as a buffer.
>
> Looking forward to your feedback. Thanks.

As of JDK 13, having lots of medium/large pages in the page cache is
not a problem, since ZGC will split such pages into small pages (which
is inexpensive) when needed. However, going from small to medium/large
is more problematic, as it involves (re)mapping memory.

One possible solution to make this less expensive might be to fuse
small pages into medium (or large) pages when they are freed. Either by
1) just opportunistically fusing small pages that sit next to each
other in the address space (which would be relatively inexpensive), or
2) by remapping memory (which would be more expensive, but that work
would be done by GC threads).

Alt. 1 would require the page cache to keep pages sorted by virtual
address. While that's doable, it would be slightly complicated by
uncommit, which wants to keep pages sorted by LRU. Alt. 2 might be too
expensive to do all the time, but might perhaps be useful as a
complement to alt. 1, if a large set of cached small pages can't be fused.

Monitoring the distribution of small/medium page allocations (as you
mention) might be useful to guide alt. 1 & 2.

cheers,
Per

>
> Sincerely,
>
> Hao Tang
>
>
>
> ------------------------------------------------------------------
> From:Per Liden 
> Send Time:2020-06-05 18:54
> To:albert.th at alibaba-inc.com; hotspot-gc-dev openjdk.java.net
> ; zgc-dev 
> Subject:Re: Discussion on ZGC's Page Cache Flush
>
> Hi,
>
> On 6/5/20 11:24 AM, Hao Tang wrote:
> >
> > Hi ZGC Team,
> >
> > We encountered "Page Cache Flushed" when we enable the ZGC feature. Much longer response time can be observed at the time when "Page Cache Flushed" happened. There is a case that is able to reproduce this scenario. In this case, medium-sized objects are periodically cleaned up. Right after the clean-up, small pages are not sufficient for allocating small-sized objects, which requires flushing medium pages into small pages. We found that simply enlarging the max heap size cannot solve this problem. We believe that the "page cache flush" issue could be a general problem, because the ratio of small/medium/large objects is not always constant.
> >
> > Sample code:
> > import java.util.Random;
> > import java.util.concurrent.locks.LockSupport;
> > public class TestPageCacheFlush {
> >     /*
> >      * Options: -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:+UnlockDiagnosticVMOptions -Xms10g -Xmx10g -XX:ParallelGCThreads=2 -XX:ConcGCThreads=4 -Xlog:gc,gc+heap
> >      * small object: fast allocation
> >      * medium object: slow allocation, periodic deletion
> >      */
> >     public static void main(String[] args) throws Exception {
> >         long heapSizeKB = Runtime.getRuntime().totalMemory() >> 10;
> >         System.out.println(heapSizeKB);
> >         SmallContainer smallContainer = new SmallContainer((long)(heapSizeKB * 0.4));     // 40% heap for live small objects
> >         MediumContainer mediumContainer = new MediumContainer((long)(heapSizeKB * 0.4));  // 40% heap for live medium objects
> >         int totalSmall = smallContainer.getTotalObjects();
> >         int totalMedium = mediumContainer.getTotalObjects();
> >         int addedSmall = 0;
> >         int addedMedium = 1; // should not be divided by zero
> >         while (addedMedium < totalMedium * 10) {
> >             if (totalSmall / totalMedium > addedSmall / addedMedium) { // keep the ratio of allocated small/medium objects
> >                 smallContainer.createAndSaveObject();
> >                 addedSmall ++;
> >             } else {
> >                 mediumContainer.createAndAppendObject();
> >                 addedMedium ++;
> >             }
> >             if ((addedSmall + addedMedium) % 50 == 0) {
> >                 LockSupport.parkNanos(500); // make allocation slower
> >             }
> >         }
> >     }
> >     static class SmallContainer {
> >         private final int KB_PER_OBJECT = 64; // 64KB per object
> >         private final Random RANDOM = new Random();
> >         private byte[][] smallObjectArray;
> >         private long totalKB;
> >         private int totalObjects;
> >         SmallContainer(long totalKB) {
> >             this.totalKB = totalKB;
> >             totalObjects = (int)(totalKB / KB_PER_OBJECT);
> >             smallObjectArray = new byte[totalObjects][];
> >         }
> >         int getTotalObjects() {
> >             return totalObjects;
> >         }
> >         // random insertion (with random deletion)
> >         void createAndSaveObject() {
> >             smallObjectArray[RANDOM.nextInt(totalObjects)] = new byte[KB_PER_OBJECT << 10];
> >         }
> >     }
> >     static class MediumContainer {
> >         private final int KB_PER_OBJECT = 512; // 512KB per object
> >         private byte[][] mediumObjectArray;
> >         private int mediumObjectArrayCurrentIndex = 0;
> >         private long totalKB;
> >         private int totalObjects;
> >         MediumContainer(long totalKB) {
> >             this.totalKB = totalKB;
> >             totalObjects = (int)(totalKB / KB_PER_OBJECT);
> >             mediumObjectArray = new byte[totalObjects][];
> >         }
> >         int getTotalObjects() {
> >             return totalObjects;
> >         }
> >         void createAndAppendObject() {
> >             if (mediumObjectArrayCurrentIndex == totalObjects) { // periodic deletion
> >                 mediumObjectArray = new byte[totalObjects][]; // also delete all medium objects in the old array
> >                 mediumObjectArrayCurrentIndex = 0;
> >             } else {
> >                 mediumObjectArray[mediumObjectArrayCurrentIndex] = new byte[KB_PER_OBJECT << 10];
> >                 mediumObjectArrayCurrentIndex ++;
> >             }
> >         }
> >     }
> > }
> >
> > To avoid "page cache flush", we made a patch for converting small/medium pages to medium/small pages ahead of time. This patch works well on an application with a relatively stable allocation rate, which has not encountered throughput problems. What do you think of this solution?
> >
> > We notice that you are improving the efficiency of map/unmap operations (https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-June/029936.html). It may be a step towards improving the delay caused by "page cache flush". Do you have further plans for eliminating or improving "page cache flush"?
>
> Yes, and as you might have seen, the latest incarnation of this patchset
> includes asynchronous unmapping, which helps reduce the time for page
> cache flushing. I ran your example program above with these patches and
> can see ~30% reduction in average page allocation time, and ~60%
> reduction in worst case page allocation time. So, it will be an improvement.
>
> However, I'd be more than happy to take a look at your patch and see
> what you've done. Making page cache flushing even less expensive is
> something we're interested in going forward.
>
> cheers,
> Per
>
> >
> > Sincerely, Hao Tang
> >

From thomas.schatzl at oracle.com Tue Jun 23 09:32:36 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 23 Jun 2020 11:32:36 +0200
Subject: RFC: Adaptively resize heap at any GC/SoftMaxHeapSize for G1
In-Reply-To: 
References: 
Message-ID: <083dbe9c-cc84-0dc1-4c75-e583f81196fc@oracle.com>

Hi Ruslan,

On 18.06.20 14:16, Ruslan Synytsky wrote:
> Hi Thomas and Liang, thank you for moving this improvement forward. A quick

:) Thanks.

> question regarding the naming: did we agree on how this parameter should be
> called? What happens when heap usage goes higher than SoftMaxHeapSize -
> OOMError or JVM gets a little bit more memory? If JVM throws OOMError I
> believe the right naming should be HardMaxHeapSize. Sorry in advance if I
> missed this point in the previous conversations.

SoftMaxHeapSize is not what you describe here - SoftMaxHeapSize is only
an internal goal for heap sizing without guarantees. Hence the name
*Soft*MaxHeapSize. See https://bugs.openjdk.java.net/browse/JDK-8222145.

There has been no progress on anything like Current/HardMaxHeapSize.

>
> Also, some news regarding the analysis automation of memory usage
> efficiency that I'm working on in the background. We built a relatively
> small script that collects memory usage metrics from many containers
> running inside the same large host machine. After executing it in one of
> our dev environments with about 150 containers we got interesting
> results - the used heap is very close to the committed heap while Xmx is
> much higher compared to the committed value. Please note, almost all
> containers use the JEP 346 improvement or a javaagent which triggers GC
> at idle state in the older JDK versions.
>
> [image: Screenshot 2020-06-18 at 13.20.19.jpg]
>
> Zoomed
>
> [image: Screenshot 2020-06-18 at 14.40.18.jpg]

While the screenshots have been scrubbed by the mailing list, it's very
nice to hear that you have had success with these approaches.

> There is a challenge to get metrics from a running java process.
> As I understand, the most accurate metrics can be collected via MBean,
> for example
>
> JMXConnector c =
> JMXConnectorFactory.newJMXConnector(createConnectionURL(host, port),
> null);
> c.connect();
> MBeanServerConnection mbsc = c.getMBeanServerConnection();
> MemoryMXBean mem = ManagementFactory.getPlatformMXBean(mbsc,
> MemoryMXBean.class);
> MemoryUsage heap = mem.getHeapMemoryUsage();
>
> However, enabling the JMX ManagementAgent via jcmd and connecting to JVM
> with a JMX client is quite a complex operation for getting such a simple
> metric about heap memory usage. Also, some java processes may already
> start a ManagementAgent on a custom port with auth protection, so we can't
> collect statistics from such java processes without contacting the
> application owner (you can see the gaps on the chart). Do you know any
> other way of collecting accurate heap usage statistics from a running java
> process? We plan to run this analysis tool in production with a large
> number of containers, so we can get a more realistic picture.
>

Jcmd with the GC.heap_info command provides some information, though
probably not enough (I filed JDK-8248136). More information can be
retrieved with the "VM.info" command, though the detailed per-region
printout might be too much information.

There is also JFR with its event streaming API that could be an option,
however, it is JDK 14 only (https://openjdk.java.net/jeps/349).

Finally, there is jstat to gather some information.

Thanks,
  Thomas

From erik.osterlund at oracle.com Tue Jun 23 09:52:14 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Tue, 23 Jun 2020 11:52:14 +0200
Subject: RFR: 8248133: SerialGC: Unify handling of all OopStorage instances
 in root processing
In-Reply-To: <491d4729-c36c-114e-db60-048adcd2e998@oracle.com>
References: <491d4729-c36c-114e-db60-048adcd2e998@oracle.com>
Message-ID: <83bfdfb8-c86c-aceb-3d5c-97faa88a69a7@oracle.com>

Hi Stefan,

Looks good.

Thanks,
/Erik

On 2020-06-23 10:12, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to unify handling of all OopStorage instances
> in root processing for the Serial GC.
>
> https://cr.openjdk.java.net/~stefank/8248133/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8248133
>
> This removes the explicit enumeration of "strong" OopStorages in the
> Serial GC. This is a step towards allowing the Runtime code to add new
> OopStorages without having to update all GCs.
>
> Tested with tier1-tier3
>
> Thanks,
> StefanK

From erik.osterlund at oracle.com Tue Jun 23 09:50:46 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Tue, 23 Jun 2020 11:50:46 +0200
Subject: RFR: 8248132: ZGC: Unify handling of all OopStorage instances in
 root processing
In-Reply-To: 
References: 
Message-ID: 

Hi Stefan,

Looks good.

Thanks,
/Erik

On 2020-06-23 10:10, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to unify handling of all OopStorage instances
> in root processing.
>
> https://cr.openjdk.java.net/~stefank/8248132/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8248132
>
> This removes the explicit enumeration of "strong" OopStorages in ZGC.
> This is a step towards allowing the Runtime code to add new
> OopStorages without having to update all GCs.
>
> Tested with tier1-3
>
> Thanks,
> StefanK

From kim.barrett at oracle.com Tue Jun 23 09:53:54 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 23 Jun 2020 05:53:54 -0400
Subject: RFR: 8247740: Inline derived CollectedHeap access for G1 and
 ParallelGC
In-Reply-To: 
References: <15884ED9-3AF4-4F89-894D-5BD42C7796BA@oracle.com>
Message-ID: <48C9B828-45CD-4B30-B0B1-E2F39FD40E2D@oracle.com>

> On Jun 23, 2020, at 2:56 AM, Per Liden  wrote:
>
> On 6/17/20 2:32 PM, Kim Barrett wrote:
>> Please review this change to derived CollectedHeap access for the
>> various collectors. [...]
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8247740
>> Webrev:
>> https://cr.openjdk.java.net/~kbarrett/8247740/open.00/
>
> Looks good!
>
> /Per
>
>> Testing:
>> mach5 tier1

Thanks.

From kim.barrett at oracle.com Tue Jun 23 09:53:19 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 23 Jun 2020 05:53:19 -0400
Subject: RFR: 8248133: SerialGC: Unify handling of all OopStorage instances
 in root processing
In-Reply-To: <491d4729-c36c-114e-db60-048adcd2e998@oracle.com>
References: <491d4729-c36c-114e-db60-048adcd2e998@oracle.com>
Message-ID: <621EAE92-264D-48A0-94B9-E4DAB1A5BCBB@oracle.com>

> On Jun 23, 2020, at 4:12 AM, Stefan Karlsson  wrote:
>
> Hi all,
>
> Please review this patch to unify handling of all OopStorage instances in root processing for the Serial GC.
>
> https://cr.openjdk.java.net/~stefank/8248133/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8248133
>
> This removes the explicit enumeration of "strong" OopStorages in the Serial GC. This is a step towards allowing the Runtime code to add new OopStorages without having to update all GCs.
>
> Tested with tier1-tier3
>
> Thanks,
> StefanK

This change presumes we like JDK-8234502 :)

Looks good.

From erik.osterlund at oracle.com Tue Jun 23 09:55:50 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Tue, 23 Jun 2020 11:55:50 +0200
Subject: RFR: 8247820: ParallelGC: Process strong OopStorage entries in
 parallel
In-Reply-To: <159caf9c-3a74-cbca-8c07-81e78f0c356f@oracle.com>
References: <159caf9c-3a74-cbca-8c07-81e78f0c356f@oracle.com>
Message-ID: <19f93bfc-e773-6a19-c6c9-91c36f5bff72@oracle.com>

Hi Stefan,

Looks good.

Thanks,
/Erik

On 2020-06-23 10:23, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to both unify handling of all OopStorage
> instances and parallelize it in the root processing of the Parallel GC.
>
> https://cr.openjdk.java.net/~stefank/8247820/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8247820
>
> This removes the explicit enumeration of "strong" OopStorages in the
> Parallel GC. This is a step towards allowing the Runtime code to add
> new OopStorages without having to update all GCs.
>
> It also parallelizes the processing of the OopStorages, using the
> class that's being introduced in:
> https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-June/030152.html
>
> Tested with tier1-3
>
> Thanks,
> StefanK

From stefan.karlsson at oracle.com Tue Jun 23 10:02:42 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Tue, 23 Jun 2020 12:02:42 +0200
Subject: RFR: 8248132: ZGC: Unify handling of all OopStorage instances in
 root processing
In-Reply-To: 
References: 
Message-ID: 

Thanks, Erik.

StefanK

On 2020-06-23 11:50, Erik Österlund wrote:
> Hi Stefan,
>
> Looks good.
>
> Thanks,
> /Erik
>
> On 2020-06-23 10:10, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this patch to unify handling of all OopStorage instances
>> in root processing.
>>
>> https://cr.openjdk.java.net/~stefank/8248132/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8248132
>>
>> This removes the explicit enumeration of "strong" OopStorages in ZGC.
>> This is a step towards allowing the Runtime code to add new
>> OopStorages without having to update all GCs.
>>
>> Tested with tier1-3
>>
>> Thanks,
>> StefanK
>

From stefan.karlsson at oracle.com Tue Jun 23 10:06:10 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Tue, 23 Jun 2020 12:06:10 +0200
Subject: RFR: 8248133: SerialGC: Unify handling of all OopStorage instances
 in root processing
In-Reply-To: <621EAE92-264D-48A0-94B9-E4DAB1A5BCBB@oracle.com>
References: <491d4729-c36c-114e-db60-048adcd2e998@oracle.com>
 <621EAE92-264D-48A0-94B9-E4DAB1A5BCBB@oracle.com>
Message-ID: 

On 2020-06-23 11:53, Kim Barrett wrote:
>> On Jun 23, 2020, at 4:12 AM, Stefan Karlsson  wrote:
>>
>> Hi all,
>>
>> Please review this patch to unify handling of all OopStorage instances in root processing for the Serial GC.
>>
>> https://cr.openjdk.java.net/~stefank/8248133/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8248133
>>
>> This removes the explicit enumeration of "strong" OopStorages in the Serial GC. This is a step towards allowing the Runtime code to add new OopStorages without having to update all GCs.
>>
>> Tested with tier1-tier3
>>
>> Thanks,
>> StefanK
>
> This change presumes we like JDK-8234502 :)

Yeah, we do. ;) It would be nice to see someone start chipping away at
that. I'm not sure it has to be an all-in-one change.

>
> Looks good.

Thanks,
StefanK

>

From stefan.karlsson at oracle.com Tue Jun 23 10:06:21 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Tue, 23 Jun 2020 12:06:21 +0200
Subject: RFR: 8248133: SerialGC: Unify handling of all OopStorage instances
 in root processing
In-Reply-To: <83bfdfb8-c86c-aceb-3d5c-97faa88a69a7@oracle.com>
References: <491d4729-c36c-114e-db60-048adcd2e998@oracle.com>
 <83bfdfb8-c86c-aceb-3d5c-97faa88a69a7@oracle.com>
Message-ID: 

Thanks, Erik.

StefanK

On 2020-06-23 11:52, Erik Österlund wrote:
> Hi Stefan,
>
> Looks good.
>
> Thanks,
> /Erik
>
> On 2020-06-23 10:12, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this patch to unify handling of all OopStorage instances
>> in root processing for the Serial GC.
>>
>> https://cr.openjdk.java.net/~stefank/8248133/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8248133
>>
>> This removes the explicit enumeration of "strong" OopStorages in the
>> Serial GC. This is a step towards allowing the Runtime code to add new
>> OopStorages without having to update all GCs.
>>
>> Tested with tier1-tier3
>>
>> Thanks,
>> StefanK
>

From stefan.karlsson at oracle.com Tue Jun 23 10:06:33 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Tue, 23 Jun 2020 12:06:33 +0200
Subject: RFR: 8247820: ParallelGC: Process strong OopStorage entries in
 parallel
In-Reply-To: <19f93bfc-e773-6a19-c6c9-91c36f5bff72@oracle.com>
References: <159caf9c-3a74-cbca-8c07-81e78f0c356f@oracle.com>
 <19f93bfc-e773-6a19-c6c9-91c36f5bff72@oracle.com>
Message-ID: <7a9f44e8-54d2-454d-0ded-cd15cb26fa03@oracle.com>

Thanks, Erik.

StefanK

On 2020-06-23 11:55, Erik Österlund wrote:
> Hi Stefan,
>
> Looks good.
>
> Thanks,
> /Erik
>
> On 2020-06-23 10:23, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this patch to both unify handling of all OopStorage
>> instances and parallelize it in the root processing of the Parallel GC.
>> >> https://cr.openjdk.java.net/~stefank/8247820/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8247820 >> >> This removes the explicit enumeration of "strong" OopStorages in the >> Parallel GC. This is a step towards allowing the Runtime code to add >> new OopStorages without having to update all GCs. >> >> It also parallelizes the processing of the OopStorages, using the >> class that's being introduced in: >> https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-June/030152.html >> >> >> Tested with tier1-3 >> >> Thanks, >> StefanK > From kim.barrett at oracle.com Tue Jun 23 10:23:07 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 23 Jun 2020 06:23:07 -0400 Subject: RFR: 8247820: ParallelGC: Process strong OopStorage entries in parallel In-Reply-To: <159caf9c-3a74-cbca-8c07-81e78f0c356f@oracle.com> References: <159caf9c-3a74-cbca-8c07-81e78f0c356f@oracle.com> Message-ID: <347325E2-DE64-4308-BF31-0B9CF33D602B@oracle.com> > On Jun 23, 2020, at 4:23 AM, Stefan Karlsson wrote: > > Hi all, > > Please review this patch to both unify handling of all OopStorage instances and parallelize it in the root processing of the Parallel GC. > > https://cr.openjdk.java.net/~stefank/8247820/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8247820 > > This removes the explicit enumeration of "strong" OopStorages in the Parallel GC. This is a step towards allowing the Runtime code to add new OopStorages without having to update all GCs. > > It also parallelizes the processing of the OopStorages, using the class that's being introduced in: > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-June/030152.html > > Tested with tier1-3 > > Thanks, > StefanK ------------------------------------------------------------------------------ src/hotspot/share/gc/parallel/psScavenge.cpp 367 // Scavenge OopStorages ... 376 PSThreadRootsTaskClosure closure(worker_id); 377 Threads::possibly_parallel_threads_do(true /*parallel */, &closure); I think it's better to do these in the other order. Processing the OopStorages is very parallel, with relatively small work chunks. Thread processing could encounter a large thread late in the process, leaving the one thread processing it as the long pole, with other threads possibly not having much to do, other than (relatively expensive) stealing. Similarly for psParallelCompact. ------------------------------------------------------------------------------ Other than that one comment, looks good. Gosh, I thought that was going to be much harder. I'm guessing Leo's conversion of ParallelGC to use workgangs simplified things some. From stefan.karlsson at oracle.com Tue Jun 23 10:29:28 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 23 Jun 2020 12:29:28 +0200 Subject: RFR: 8247820: ParallelGC: Process strong OopStorage entries in parallel In-Reply-To: <347325E2-DE64-4308-BF31-0B9CF33D602B@oracle.com> References: <159caf9c-3a74-cbca-8c07-81e78f0c356f@oracle.com> <347325E2-DE64-4308-BF31-0B9CF33D602B@oracle.com> Message-ID: On 2020-06-23 12:23, Kim Barrett wrote: >> On Jun 23, 2020, at 4:23 AM, Stefan Karlsson wrote: >> >> Hi all, >> >> Please review this patch to both unify handling of all OopStorage instances and parallelize it in the root processing of the Parallel GC. >> >> https://cr.openjdk.java.net/~stefank/8247820/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8247820 >> >> This removes the explicit enumeration of "strong" OopStorages in the Parallel GC. 
This is a step towards allowing the Runtime code to add new OopStorages without having to update all GCs. >> >> It also parallelizes the processing of the OopStorages, using the class that's being introduced in: >> https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-June/030152.html >> >> Tested with tier1-3 >> >> Thanks, >> StefanK > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/parallel/psScavenge.cpp > 367 // Scavenge OopStorages > ... > 376 PSThreadRootsTaskClosure closure(worker_id); > 377 Threads::possibly_parallel_threads_do(true /*parallel */, &closure); > > I think it's better to do these in the other order. Processing the > OopStorages is very parallel, with relatively small work chunks. > Thread processing could encounter a large thread late in the process, > leaving the one thread processing it as the long pole, with other > threads possibly not having much to do, other than (relatively > expensive) stealing. > > Similarly for psParallelCompact. > > ------------------------------------------------------------------------------ > > Other than that one comment, looks good. Updated webrev: https://cr.openjdk.java.net/~stefank/8247820/webrev.02.delta/ https://cr.openjdk.java.net/~stefank/8247820/webrev.02/ > > Gosh, I thought that was going to be much harder. I'm guessing Leo's > conversion of ParallelGC to use workgangs simplified things some. :) Thanks for reviewing, StefanK > > From thomas.schatzl at oracle.com Tue Jun 23 10:33:52 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 23 Jun 2020 12:33:52 +0200 Subject: RFR: 8247820: ParallelGC: Process strong OopStorage entries in parallel In-Reply-To: <159caf9c-3a74-cbca-8c07-81e78f0c356f@oracle.com> References: <159caf9c-3a74-cbca-8c07-81e78f0c356f@oracle.com> Message-ID: Hi, On 23.06.20 10:23, Stefan Karlsson wrote: > Hi all, > > Please review this patch to both unify handling of all OopStorage > instances and parallelize it in the root processing of the Parallel GC. > > https://cr.openjdk.java.net/~stefank/8247820/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8247820 > > This removes the explicit enumeration of "strong" OopStorages in the > Parallel GC. This is a step towards allowing the Runtime code to add new > OopStorages without having to update all GCs. > > It also parallelizes the processing of the OopStorages, using the class > that's being introduced in: > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-June/030152.html > > > Tested with tier1-3 > lgtm Thomas From thomas.schatzl at oracle.com Tue Jun 23 10:32:52 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 23 Jun 2020 10:32:52 +0000 (UTC) Subject: RFR: 8248132: ZGC: Unify handling of all OopStorage instances in root processing In-Reply-To: References: Message-ID: Hi, On 23.06.20 10:10, Stefan Karlsson wrote: > Hi all, > > Please review this patch to unify handling of all OopStorage instances > in root processing. > > https://cr.openjdk.java.net/~stefank/8248132/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8248132 > > This removes the explicit enumeration of "strong" OopStorages in ZGC. > This is a step towards allowing the Runtime code to add new OopStorages > without having to update all GCs. > > Tested with tier1-3 looks good. 
Thomas

From thomas.schatzl at oracle.com Tue Jun 23 10:48:20 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 23 Jun 2020 12:48:20 +0200
Subject: RFR: 8247820: ParallelGC: Process strong OopStorage entries in
 parallel
In-Reply-To: 
References: <159caf9c-3a74-cbca-8c07-81e78f0c356f@oracle.com>
 <347325E2-DE64-4308-BF31-0B9CF33D602B@oracle.com>
Message-ID: <067b2ba3-5d72-d62d-47d4-c19ca1d09877@oracle.com>

Hi,

On 23.06.20 12:29, Stefan Karlsson wrote:
>
>
> On 2020-06-23 12:23, Kim Barrett wrote:
>>> On Jun 23, 2020, at 4:23 AM, Stefan Karlsson
>>> wrote:
>>> [...]
>>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/share/gc/parallel/psScavenge.cpp
>>   367     // Scavenge OopStorages
>> ...
>>   376     PSThreadRootsTaskClosure closure(worker_id);
>>   377     Threads::possibly_parallel_threads_do(true /*parallel */,
>> &closure);
>>
>> I think it's better to do these in the other order.  Processing the
>> OopStorages is very parallel, with relatively small work chunks.
>> Thread processing could encounter a large thread late in the process,
>> leaving the one thread processing it as the long pole, with other
>> threads possibly not having much to do, other than (relatively
>> expensive) stealing.
>>
>> Similarly for psParallelCompact.
>>
>> ------------------------------------------------------------------------------
>>
>>
>> Other than that one comment, looks good.
>
> Updated webrev:
>   https://cr.openjdk.java.net/~stefank/8247820/webrev.02.delta/
>   https://cr.openjdk.java.net/~stefank/8247820/webrev.02/
>

  still good.

Thomas

From stefan.karlsson at oracle.com Tue Jun 23 10:49:18 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Tue, 23 Jun 2020 12:49:18 +0200
Subject: RFR: 8248132: ZGC: Unify handling of all OopStorage instances in
 root processing
In-Reply-To: 
References: 
Message-ID: <47acf89f-2aa1-7e78-a1ae-426b4a10889c@oracle.com>

Thanks, Thomas.

StefanK

On 2020-06-23 12:32, Thomas Schatzl wrote:
> Hi,
>
> On 23.06.20 10:10, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this patch to unify handling of all OopStorage instances
>> in root processing.
>>
>> https://cr.openjdk.java.net/~stefank/8248132/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8248132
>>
>> This removes the explicit enumeration of "strong" OopStorages in ZGC.
>> This is a step towards allowing the Runtime code to add new
>> OopStorages without having to update all GCs.
>>
>> Tested with tier1-3
>
>   looks good.
>
> Thomas

From stefan.karlsson at oracle.com Tue Jun 23 11:03:12 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Tue, 23 Jun 2020 13:03:12 +0200
Subject: RFR: 8247820: ParallelGC: Process strong OopStorage entries in
 parallel
In-Reply-To: <067b2ba3-5d72-d62d-47d4-c19ca1d09877@oracle.com>
References: <159caf9c-3a74-cbca-8c07-81e78f0c356f@oracle.com>
 <347325E2-DE64-4308-BF31-0B9CF33D602B@oracle.com>
 <067b2ba3-5d72-d62d-47d4-c19ca1d09877@oracle.com>
Message-ID: <38afc511-7b6c-5e31-ee81-d09dd249d117@oracle.com>

Thanks, Thomas.

StefanK

On 2020-06-23 12:48, Thomas Schatzl wrote:
> Hi,
>
> On 23.06.20 12:29, Stefan Karlsson wrote:
>>
>>
>> On 2020-06-23 12:23, Kim Barrett wrote:
>>>> On Jun 23, 2020, at 4:23 AM, Stefan Karlsson
>>>> wrote:
>>>> [...]
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/share/gc/parallel/psScavenge.cpp
>>>   367     // Scavenge OopStorages
>>> ...
>>>   376     PSThreadRootsTaskClosure closure(worker_id);
>>>   377     Threads::possibly_parallel_threads_do(true /*parallel */,
>>> &closure);
>>>
>>> I think it's better to do these in the other order.  Processing the
>>> OopStorages is very parallel, with relatively small work chunks.
>>> Thread processing could encounter a large thread late in the process,
>>> leaving the one thread processing it as the long pole, with other
>>> threads possibly not having much to do, other than (relatively
>>> expensive) stealing.
>>>
>>> Similarly for psParallelCompact.
>>>
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> Other than that one comment, looks good.
>>
>> Updated webrev:
>>   https://cr.openjdk.java.net/~stefank/8247820/webrev.02.delta/
>>   https://cr.openjdk.java.net/~stefank/8247820/webrev.02/
>>
>
>   still good.
>
> Thomas

From kim.barrett at oracle.com Tue Jun 23 11:39:03 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 23 Jun 2020 07:39:03 -0400
Subject: RFR: 8247820: ParallelGC: Process strong OopStorage entries in
 parallel
In-Reply-To: 
References: <159caf9c-3a74-cbca-8c07-81e78f0c356f@oracle.com>
 <347325E2-DE64-4308-BF31-0B9CF33D602B@oracle.com>
Message-ID: 

> On Jun 23, 2020, at 6:29 AM, Stefan Karlsson  wrote:
>
>
>
> On 2020-06-23 12:23, Kim Barrett wrote:
>> Other than that one comment, looks good.
>
> Updated webrev:
> https://cr.openjdk.java.net/~stefank/8247820/webrev.02.delta/
> https://cr.openjdk.java.net/~stefank/8247820/webrev.02/

Looks good.

From kim.barrett at oracle.com Tue Jun 23 11:40:17 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 23 Jun 2020 07:40:17 -0400
Subject: RFR: 8248133: SerialGC: Unify handling of all OopStorage instances
 in root processing
In-Reply-To: 
References: <491d4729-c36c-114e-db60-048adcd2e998@oracle.com>
 <621EAE92-264D-48A0-94B9-E4DAB1A5BCBB@oracle.com>
Message-ID: <4B0A457E-27A9-4689-8037-4DF3E2A432F4@oracle.com>

> On Jun 23, 2020, at 6:06 AM, Stefan Karlsson  wrote:
>
>
>
> On 2020-06-23 11:53, Kim Barrett wrote:
>> This change presumes we like JDK-8234502 :)
>
> Yeah, we do. ;) It would be nice to see someone start chipping away at that. I'm not sure it has to be an all-in-one change.

Oh please, NOT an all-in-one change!

From per.liden at oracle.com Tue Jun 23 13:09:54 2020
From: per.liden at oracle.com (Per Liden)
Date: Tue, 23 Jun 2020 15:09:54 +0200
Subject: RFR: 8248132: ZGC: Unify handling of all OopStorage instances in
 root processing
In-Reply-To: 
References: 
Message-ID: <26bd783c-cffa-c292-3ce6-c31d866c422f@oracle.com>

Hi Stefan,

On 6/23/20 10:10 AM, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to unify handling of all OopStorage instances
> in root processing.
>
> https://cr.openjdk.java.net/~stefank/8248132/webrev.01/

I note that the size of the dynamic allocation is always known at
compile-time. So how about we just avoid it altogether with something
like this?
diff --git a/src/hotspot/share/gc/shared/oopStorageSetParState.hpp b/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
--- a/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
+++ b/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
@@ -26,16 +26,16 @@
 #define SHARE_GC_SHARED_OOPSTORAGESETPARSTATE_HPP

 #include "gc/shared/oopStorageParState.hpp"
+#include "gc/shared/oopStorageSet.hpp"

 template <bool concurrent, bool is_const>
 class OopStorageSetStrongParState {
 private:
   typedef OopStorage::ParState<concurrent, is_const> ParStateType;

-  ParStateType* _par_states;
+  char _par_states[sizeof(ParStateType) * OopStorageSet::strong_count];

-  static ParStateType* allocate();
-  static void deallocate(ParStateType* iter_set);
+  ParStateType* par_state(int index);

 public:
   OopStorageSetStrongParState();
diff --git a/src/hotspot/share/gc/shared/oopStorageSetParState.inline.hpp b/src/hotspot/share/gc/shared/oopStorageSetParState.inline.hpp
--- a/src/hotspot/share/gc/shared/oopStorageSetParState.inline.hpp
+++ b/src/hotspot/share/gc/shared/oopStorageSetParState.inline.hpp
@@ -26,25 +26,19 @@
 #define SHARE_GC_SHARED_OOPSTORAGESETPARSTATE_INLINE_HPP

 #include "gc/shared/oopStorageParState.inline.hpp"
-#include "gc/shared/oopStorageSet.hpp"
 #include "gc/shared/oopStorageSetParState.hpp"

 template <bool concurrent, bool is_const>
-typename OopStorageSetStrongParState<concurrent, is_const>::ParStateType* OopStorageSetStrongParState<concurrent, is_const>::allocate() {
-  return MallocArrayAllocator<ParStateType>::allocate(OopStorageSet::strong_count, mtGC);
-}
-
-template <bool concurrent, bool is_const>
-void OopStorageSetStrongParState<concurrent, is_const>::deallocate(ParStateType* iter_set) {
-  MallocArrayAllocator<ParStateType>::free(iter_set);
+typename OopStorageSetStrongParState<concurrent, is_const>::ParStateType* OopStorageSetStrongParState<concurrent, is_const>::par_state(int index) {
+  return reinterpret_cast<ParStateType*>(_par_states) + index;
 }

 template <bool concurrent, bool is_const>
 OopStorageSetStrongParState<concurrent, is_const>::OopStorageSetStrongParState() :
-  _par_states(allocate()) {
+  _par_states() {
   int counter = 0;
   for (OopStorageSet::Iterator it = OopStorageSet::strong_iterator(); !it.is_end(); ++it) {
-    new (&_par_states[counter++]) ParStateType(*it);
+    new (par_state(counter++)) ParStateType(*it);
   }
 }

@@ -52,17 +46,15 @@
 OopStorageSetStrongParState<concurrent, is_const>::~OopStorageSetStrongParState() {
   int counter = 0;
   for (OopStorageSet::Iterator it = OopStorageSet::strong_iterator(); !it.is_end(); ++it) {
-    _par_states[counter++].~ParStateType();
+    par_state(counter++)->~ParStateType();
   }
-
-  deallocate(_par_states);
 }

 template <bool concurrent, bool is_const>
 template <typename Closure>
 void OopStorageSetStrongParState<concurrent, is_const>::oops_do(Closure* cl) {
   for (size_t i = 0; i < OopStorageSet::strong_count; i++) {
-    _par_states[i].oops_do(cl);
+    par_state(i)->oops_do(cl);
   }
 }

/Per

> https://bugs.openjdk.java.net/browse/JDK-8248132
>
> This removes the explicit enumeration of "strong" OopStorages in ZGC.
> This is a step towards allowing the Runtime code to add new OopStorages
> without having to update all GCs.
>
> Tested with tier1-3
>
> Thanks,
> StefanK

From per.liden at oracle.com Tue Jun 23 13:53:09 2020
From: per.liden at oracle.com (Per Liden)
Date: Tue, 23 Jun 2020 15:53:09 +0200
Subject: RFR: 8248133: SerialGC: Unify handling of all OopStorage instances
 in root processing
In-Reply-To: <491d4729-c36c-114e-db60-048adcd2e998@oracle.com>
References: <491d4729-c36c-114e-db60-048adcd2e998@oracle.com>
Message-ID: 

Looks good!

/Per

On 6/23/20 10:12 AM, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to unify handling of all OopStorage instances in
> root processing for the Serial GC.
>
> https://cr.openjdk.java.net/~stefank/8248133/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8248133
>
> This removes the explicit enumeration of "strong" OopStorages in the
> Serial GC. This is a step towards allowing the Runtime code to add new
> OopStorages without having to update all GCs.
>
> Tested with tier1-tier3
>
> Thanks,
> StefanK

From per.liden at oracle.com Tue Jun 23 13:54:16 2020
From: per.liden at oracle.com (Per Liden)
Date: Tue, 23 Jun 2020 15:54:16 +0200
Subject: RFR: 8247820: ParallelGC: Process strong OopStorage entries in
 parallel
In-Reply-To: 
References: <159caf9c-3a74-cbca-8c07-81e78f0c356f@oracle.com>
 <347325E2-DE64-4308-BF31-0B9CF33D602B@oracle.com>
Message-ID: <32d0f8d4-b4e3-6ca1-badc-fbfa6e2b3796@oracle.com>

On 6/23/20 12:29 PM, Stefan Karlsson wrote:
>
>
> On 2020-06-23 12:23, Kim Barrett wrote:
>>> On Jun 23, 2020, at 4:23 AM, Stefan Karlsson
>>> wrote:
>>>
>>> Hi all,
>>>
>>> Please review this patch to both unify handling of all OopStorage
>>> instances and parallelize it in the root processing of the Parallel GC.
>>>
>>> https://cr.openjdk.java.net/~stefank/8247820/webrev.01/
>>> https://bugs.openjdk.java.net/browse/JDK-8247820
>>>
>>> This removes the explicit enumeration of "strong" OopStorages in the
>>> Parallel GC. This is a step towards allowing the Runtime code to add
>>> new OopStorages without having to update all GCs.
>>>
>>> It also parallelizes the processing of the OopStorages, using the
>>> class that's being introduced in:
>>> https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-June/030152.html
>>>
>>> Tested with tier1-3
>>>
>>> Thanks,
>>> StefanK
>>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/share/gc/parallel/psScavenge.cpp
>>   367     // Scavenge OopStorages
>> ...
>>   376     PSThreadRootsTaskClosure closure(worker_id);
>>   377     Threads::possibly_parallel_threads_do(true /*parallel */,
>> &closure);
>>
>> I think it's better to do these in the other order.  Processing the
>> OopStorages is very parallel, with relatively small work chunks.
>> Thread processing could encounter a large thread late in the process,
>> leaving the one thread processing it as the long pole, with other
>> threads possibly not having much to do, other than (relatively
>> expensive) stealing.
>>
>> Similarly for psParallelCompact.
>>
>> ------------------------------------------------------------------------------
>>
>>
>> Other than that one comment, looks good.
>
> Updated webrev:
>   https://cr.openjdk.java.net/~stefank/8247820/webrev.02.delta/
>   https://cr.openjdk.java.net/~stefank/8247820/webrev.02/

Looks good!

/Per

>
>>
>> Gosh, I thought that was going to be much harder. I'm guessing Leo's
>> conversion of ParallelGC to use workgangs simplified things some.
>
> :)
>
> Thanks for reviewing,
> StefanK
>
>>
>>

From kim.barrett at oracle.com Tue Jun 23 14:09:41 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 23 Jun 2020 10:09:41 -0400
Subject: RFR: 8248132: ZGC: Unify handling of all OopStorage instances in
 root processing
In-Reply-To: <26bd783c-cffa-c292-3ce6-c31d866c422f@oracle.com>
References: 
 <26bd783c-cffa-c292-3ce6-c31d866c422f@oracle.com>
Message-ID: 

> On Jun 23, 2020, at 9:09 AM, Per Liden  wrote:
>
> Hi Stefan,
>
> On 6/23/20 10:10 AM, Stefan Karlsson wrote:
>> Hi all,
>> Please review this patch to unify handling of all OopStorage instances in root processing.
>> https://cr.openjdk.java.net/~stefank/8248132/webrev.01/
>
> I note that the size of the dynamic allocation is always known at compile-time. So how about we just avoid it altogether with something like this?
>
> diff --git a/src/hotspot/share/gc/shared/oopStorageSetParState.hpp b/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
> --- a/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
> +++ b/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
> @@ -26,16 +26,16 @@
> #define SHARE_GC_SHARED_OOPSTORAGESETPARSTATE_HPP
>
> #include "gc/shared/oopStorageParState.hpp"
> +#include "gc/shared/oopStorageSet.hpp"
>
> template <bool concurrent, bool is_const>
> class OopStorageSetStrongParState {
> private:
>   typedef OopStorage::ParState<concurrent, is_const> ParStateType;
>
> -  ParStateType* _par_states;
> +  char _par_states[sizeof(ParStateType) * OopStorageSet::strong_count];

(Not a review, just a drive-by comment.)

This doesn't guarantee proper alignment of _par_states; with
_par_states being the only member, it only requires char alignment.
(It might happen to work because of vagaries of heap allocators and
stack alignment requirements, possibly always on some platforms, but
that's not guaranteed.)

-  ParStateType* _par_states;
+  char _par_states[sizeof(ParStateType) * OopStorageSet::strong_count];

Doing that correctly is a bit annoying, which is why C++11 added
std::aligned_storage.

From per.liden at oracle.com Tue Jun 23 14:29:46 2020
From: per.liden at oracle.com (Per Liden)
Date: Tue, 23 Jun 2020 16:29:46 +0200
Subject: RFR: 8248132: ZGC: Unify handling of all OopStorage instances in
 root processing
In-Reply-To: 
References: 
 <26bd783c-cffa-c292-3ce6-c31d866c422f@oracle.com>
Message-ID: <5aca219e-3119-f5e7-de0b-a38bc534baac@oracle.com>

Hi,

On 6/23/20 4:09 PM, Kim Barrett wrote:
>> On Jun 23, 2020, at 9:09 AM, Per Liden  wrote:
>>
>> Hi Stefan,
>>
>> On 6/23/20 10:10 AM, Stefan Karlsson wrote:
>>> Hi all,
>>> Please review this patch to unify handling of all OopStorage instances in root processing.
>>> https://cr.openjdk.java.net/~stefank/8248132/webrev.01/
>>
>> I note that the size of the dynamic allocation is always known at compile-time. So how about we just avoid it altogether with something like this?
>>
>> diff --git a/src/hotspot/share/gc/shared/oopStorageSetParState.hpp b/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
>> --- a/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
>> +++ b/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
>> @@ -26,16 +26,16 @@
>> #define SHARE_GC_SHARED_OOPSTORAGESETPARSTATE_HPP
>>
>> #include "gc/shared/oopStorageParState.hpp"
>> +#include "gc/shared/oopStorageSet.hpp"
>>
>> template <bool concurrent, bool is_const>
>> class OopStorageSetStrongParState {
>> private:
>>   typedef OopStorage::ParState<concurrent, is_const> ParStateType;
>>
>> -  ParStateType* _par_states;
>> +  char _par_states[sizeof(ParStateType) * OopStorageSet::strong_count];
>
> (Not a review, just a drive-by comment.)
>
> This doesn't guarantee proper alignment of _par_states; with
> _par_states being the only member, it only requires char alignment.
> (It might happen to work because of vagaries of heap allocators and
> stack alignment requirements, possibly always on some platforms, but
> that's not guaranteed.)
>
> -  ParStateType* _par_states;
> +  char _par_states[sizeof(ParStateType) * OopStorageSet::strong_count];

You're right, but I'm thinking ATTRIBUTE_ALIGNED should work here.

cheers,
Per

>
> Doing that correctly is a bit annoying, which is why C++11 added
> std::aligned_storage.
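A minimal standalone sketch of the alignment pattern being discussed above:
a raw char buffer with an explicit alignment request, populated with
placement new and torn down with explicit destructor calls. Plain C++
alignas stands in for HotSpot's ATTRIBUTE_ALIGNED macro so the example
compiles on its own; the element type and count are made up for
illustration:

    #include <new>

    struct ParState {                  // stand-in for OopStorage::ParState<...>
      void* _data;
      ParState() : _data(nullptr) {}
    };

    class ParStateArray {
      static const int N = 4;          // stand-in for OopStorageSet::strong_count
      // Without the alignas, a char array only guarantees char alignment.
      alignas(ParState) char _buf[sizeof(ParState) * N];

      ParState* at(int i) { return reinterpret_cast<ParState*>(_buf) + i; }

    public:
      ParStateArray() {
        for (int i = 0; i < N; i++) {
          new (at(i)) ParState();      // construct each element in place
        }
      }
      ~ParStateArray() {
        for (int i = 0; i < N; i++) {
          at(i)->~ParState();          // destroy each element in place
        }
      }
    };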
>

From zgu at redhat.com Tue Jun 23 14:56:56 2020
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 23 Jun 2020 10:56:56 -0400
Subject: [16] 8248041: Shenandoah: pre-Full GC root updates may miss some
 roots
Message-ID: 

Updating roots before full GC really belongs in the prepare phase.

The roots may contain forwarded pointers only when this full GC is
triggered by an evacuation OOM, which leaves forwarded objects in roots.

Moving the update to the prepare phase, while the has_forwarded_objects
flag is still valid, also allows us to bypass this phase entirely if
there are no forwarded objects in the heap.

Bug: https://bugs.openjdk.java.net/browse/JDK-8248041
Webrev: http://cr.openjdk.java.net/~zgu/JDK-8248041/webrev.00/

Test:
  hotspot_gc_shenandoah

Thanks,

-Zhengyu

From stefan.johansson at oracle.com Tue Jun 23 15:15:09 2020
From: stefan.johansson at oracle.com (stefan.johansson at oracle.com)
Date: Tue, 23 Jun 2020 17:15:09 +0200
Subject: RFR (L): 8244603 and 8238858: Improve young gen sizing
In-Reply-To: <37a95d37-5253-4dff-ff46-3b82b4e4bcdf@oracle.com>
References: <5da7c2e2-2d36-de11-d0b7-91cdf6fdc077@oracle.com>
 <37a95d37-5253-4dff-ff46-3b82b4e4bcdf@oracle.com>
Message-ID: <2cae48be-9fc4-7f3e-c907-5b225514e5aa@oracle.com>

On 2020-06-15 11:23, Thomas Schatzl wrote:
> Hi Stefan,
>
> On 11.06.20 09:51, stefan.johansson at oracle.com wrote:
>> Hi Thomas,
>>
>> Sorry for not getting to this sooner.
>>
>> On 2020-05-19 15:37, Thomas Schatzl wrote:
>>> Hi all,
>>>
>>>    can I have reviews for this change that improves young gen sizing
>>> a lot to prepare for heap shrinking during young gc (JDK-8238687) ;)
> [...]>> Actually, in the future, when shrinking is implemented
> (JDK-8238687),
>>> these may be more severe (in some benchmarks, actual gc usage is
>>> still <2%). I will likely try to balance that with decreasing default
>>> GCTimeRatio value in the future.
>>>
>>> CR:
>>> https://bugs.openjdk.java.net/browse/JDK-8244603
>>> https://bugs.openjdk.java.net/browse/JDK-8238858
>>> Webrev:
>>> http://cr.openjdk.java.net/~tschatzl/8244603/webrev/
>> Very nice change Thomas, really helpful with all the comments.
>>
>> As I've mentioned to you offline, I think we can re-structure the code
>> a bit, to separate the updating of young length bounds from the
>> returning of values. Here's a suggestion on how to do that:
>> http://cr.openjdk.java.net/~sjohanss/8244603/rev-1/
>>
>> src/hotspot/share/gc/g1/g1Analytics.cpp
>> ---
>>   226 double G1Analytics::predict_alloc_rate_ms() const {
>>   227   if (!enough_samples_available(_alloc_rate_ms_seq)) {
>>   228     return predict_zero_bounded(_alloc_rate_ms_seq);
>>   229   } else {
>>   230     return 0.0;
>>   231   }
>>   232 }
>>
>> As discussed, on line 227 the ! should be removed.
>> ---
>>
>> Apart from this I think it is all good. There are a few places in
>> g1Policy.cpp where local variables could be either merged or skipped,
>> but I think they add to the overall ease of understanding.
>
> Applied all your comments.
>
> New webrev:
> http://cr.openjdk.java.net/~tschatzl/8244603/webrev.0_to_1 (diff)
> http://cr.openjdk.java.net/~tschatzl/8244603/webrev.1 (full)

Looks good,
Stefan

>
> Thanks,
>   Thomas
From thomas.schatzl at oracle.com Tue Jun 23 15:16:56 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 23 Jun 2020 17:16:56 +0200
Subject: RFR (L): 8244603 and 8238858: Improve young gen sizing
In-Reply-To: <2cae48be-9fc4-7f3e-c907-5b225514e5aa@oracle.com>
References: <5da7c2e2-2d36-de11-d0b7-91cdf6fdc077@oracle.com>
 <37a95d37-5253-4dff-ff46-3b82b4e4bcdf@oracle.com>
 <2cae48be-9fc4-7f3e-c907-5b225514e5aa@oracle.com>
Message-ID: <0142ffd5-27c6-76c6-f0e8-688da1481ea3@oracle.com>

Hi,

On 23.06.20 17:15, stefan.johansson at oracle.com wrote:
>
>
> On 2020-06-15 11:23, Thomas Schatzl wrote:
>> Hi Stefan,
[...]
>>>
>>> Apart from this I think it is all good. There are a few places in
>>> g1Policy.cpp where local variables could be either merged or skipped,
>>> but I think they add to the overall ease of understanding.
>>
>> Applied all your comments.
>>
>> New webrev:
>> http://cr.openjdk.java.net/~tschatzl/8244603/webrev.0_to_1 (diff)
>> http://cr.openjdk.java.net/~tschatzl/8244603/webrev.1 (full)
> Looks good,
> Stefan

Thanks for your review. Could I get a second review from somebody?

Thanks,
  Thomas

From shade at redhat.com Tue Jun 23 15:46:50 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 23 Jun 2020 17:46:50 +0200
Subject: [16] 8248041: Shenandoah: pre-Full GC root updates may miss some
 roots
In-Reply-To: 
References: 
Message-ID: <988e6de1-411b-c289-96b4-690c9f83ff02@redhat.com>

On 6/23/20 4:56 PM, Zhengyu Gu wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8248041
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8248041/webrev.00/

Nice trick, looks good.

-- 
Thanks,
-Aleksey

From albert.th at alibaba-inc.com Wed Jun 24 03:16:00 2020
From: albert.th at alibaba-inc.com (Hao Tang)
Date: Wed, 24 Jun 2020 11:16:00 +0800
Subject: Re: Discussion on ZGC's Page Cache Flush
In-Reply-To: <5534bd26-b080-3cb7-32dd-3c7f020d0253@oracle.com>
References: <2b4f3dc4-002e-4967-85d0-945904eef27e.albert.th@alibaba-inc.com>,
 <5534bd26-b080-3cb7-32dd-3c7f020d0253@oracle.com>
Message-ID: <51f59517-26f1-4c86-8a90-5ec290640c16.albert.th@alibaba-inc.com>

Hi,

I have posted the patch here:
http://cr.openjdk.java.net/~ddong/haotang/balance_page_cache/webrev/

Thank you.


------------------------------------------------------------------
From:Per Liden 
Send Time:2020-06-23 (Tue) 16:27
To:Hao Tang ; hotspot-gc-dev openjdk.java.net
 ; zgc-dev 
Subject:Re: Discussion on ZGC's Page Cache Flush

Hi,

On 6/19/20 8:34 AM, Hao Tang wrote:
> Thanks for your reply.
>
> This is our patch for "balancing" page cache:
> https://github.com/tanghaoth90/jdk11u/commit/77631cf3 (based on jdk11u).

Sorry, but for IP clarity could you please post that patch to
cr.openjdk.java.net, otherwise I'm afraid I can't look at the patch.

>
> We notice two cases in which "page cache flush" frequently happens:
>
> * The number of cached pages is not sufficient for concurrent relocation.
>
>     For example, 34 medium pages are "to-space" as the GC log shows below.
>     "[2020-03-06T05:46:31.618+0800] GC(10406) Relocation Set (Medium Pages): 54->34, 91 skipped"
>     In our scenario, hundreds of mutator threads are running. To my knowledge, these mutators can possibly relocate medium-sized
>     objects in the relocation set. If there are fewer than 34 cached medium pages, "page cache flush" is likely to happen.
>
>     Our strategy is to ensure at least 34 cached medium pages before relocation.
>
> * A lot of medium(small)-sized objects become unreachable at a moment (such as removing the root of these objects).
> Assume that the ratio of the allocation rates of small and medium objects
> is 1:1. In this case, small-sized and medium-sized objects occupy 50% and
> 50% of the total memory, respectively. If medium-sized objects amounting to
> 25% of total memory are removed, there are still cached medium pages worth
> 25% of total memory when all small pages are used up. Since ZDriver does
> not trigger a new GC cycle at this moment, 12.5% of total memory needs to
> be transformed from medium pages into small pages for allocating
> small-sized objects.
>
> Our strategy is to ensure that the ratio of the different types of cached
> pages matches the ratio of the allocation rates.
>
> The patch works well on our application (by eliminating "page cache flush"
> and the corresponding delay). However, this approach has shortcomings, as
> my previous mail mentioned. It might not be a complete solution for general
> cases, but it is still worth discussing. We are also thinking about
> alternative solutions, such as keeping some cached pages as a buffer.
>
> Looking forward to your feedback. Thanks.

As of JDK 13, having lots of medium/large pages in the page cache is not a
problem, since ZGC will split such pages into small pages (which is
inexpensive) when needed. However, going from small to medium/large is more
problematic, as it involves (re)mapping memory.

One possible solution to make this less expensive might be to fuse small
pages into medium (or large) pages when they are freed. Either by 1) just
opportunistically fusing small pages that sit next to each other in the
address space (which would be relatively inexpensive), or 2) by remapping
memory (which would be more expensive, but that work would be done by GC
threads).

Alt. 1 would require the page cache to keep pages sorted by virtual address.
While that's doable, it would be slightly complicated by uncommit, which
wants to keep pages sorted by LRU. Alt. 2 might be too expensive to do all
the time, but might perhaps be useful as a complement to alt. 1, if a large
set of cached small pages can't be fused.

Monitoring the distribution of small/medium page allocations (as you
mention) might be useful to guide alt. 1 & 2.

cheers,
Per

>
> Sincerely,
>
> Hao Tang
>
>
>
> ------------------------------------------------------------------
> From: Per Liden
> Send Time: Fri, 5 Jun 2020 18:54
> To: albert.th at alibaba-inc.com; hotspot-gc-dev openjdk.java.net; zgc-dev
> Subject: Re: Discussion on ZGC's Page Cache Flush
>
> Hi,
>
> On 6/5/20 11:24 AM, Hao Tang wrote:
> >
> > Hi ZGC Team,
> >
> > We encountered "Page Cache Flushed" when we enabled the ZGC feature. Much
> > longer response times can be observed at the time when "Page Cache
> > Flushed" happens. There is a case that is able to reproduce this
> > scenario. In this case, medium-sized objects are periodically cleaned up.
> > Right after the clean-up, small pages are not sufficient for allocating
> > small-sized objects, which makes it necessary to flush medium pages into
> > small pages. We found that simply enlarging the max heap size cannot
> > solve this problem. We believe that the "page cache flush" issue could be
> > a general problem, because the ratio of small/medium/large objects is not
> > always constant.
> >
> > Sample code:
> > import java.util.Random;
> > import java.util.concurrent.locks.LockSupport;
> > public class TestPageCacheFlush {
> >     /*
> >      * Options: -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:+UnlockDiagnosticVMOptions -Xms10g -Xmx10g -XX:ParallelGCThreads=2 -XX:ConcGCThreads=4 -Xlog:gc,gc+heap
> >      * small object: fast allocation
> >      * medium object: slow allocation, periodic deletion
> >      */
> >     public static void main(String[] args) throws Exception {
> >         long heapSizeKB = Runtime.getRuntime().totalMemory() >> 10;
> >         System.out.println(heapSizeKB);
> >         SmallContainer smallContainer = new SmallContainer((long)(heapSizeKB * 0.4));  // 40% heap for live small objects
> >         MediumContainer mediumContainer = new MediumContainer((long)(heapSizeKB * 0.4)); // 40% heap for live medium objects
> >         int totalSmall = smallContainer.getTotalObjects();
> >         int totalMedium = mediumContainer.getTotalObjects();
> >         int addedSmall = 0;
> >         int addedMedium = 1; // should not be divided by zero
> >         while (addedMedium < totalMedium * 10) {
> >             if (totalSmall / totalMedium > addedSmall / addedMedium) { // keep the ratio of allocated small/medium objects
> >                 smallContainer.createAndSaveObject();
> >                 addedSmall++;
> >             } else {
> >                 mediumContainer.createAndAppendObject();
> >                 addedMedium++;
> >             }
> >             if ((addedSmall + addedMedium) % 50 == 0) {
> >                 LockSupport.parkNanos(500); // make allocation slower
> >             }
> >         }
> >     }
> >     static class SmallContainer {
> >         private final int KB_PER_OBJECT = 64; // 64KB per object
> >         private final Random RANDOM = new Random();
> >         private byte[][] smallObjectArray;
> >         private long totalKB;
> >         private int totalObjects;
> >         SmallContainer(long totalKB) {
> >             this.totalKB = totalKB;
> >             totalObjects = (int)(totalKB / KB_PER_OBJECT);
> >             smallObjectArray = new byte[totalObjects][];
> >         }
> >         int getTotalObjects() {
> >             return totalObjects;
> >         }
> >         // random insertion (with random deletion)
> >         void createAndSaveObject() {
> >             smallObjectArray[RANDOM.nextInt(totalObjects)] = new byte[KB_PER_OBJECT << 10];
> >         }
> >     }
> >     static class MediumContainer {
> >         private final int KB_PER_OBJECT = 512; // 512KB per object
> >         private byte[][] mediumObjectArray;
> >         private int mediumObjectArrayCurrentIndex = 0;
> >         private long totalKB;
> >         private int totalObjects;
> >         MediumContainer(long totalKB) {
> >             this.totalKB = totalKB;
> >             totalObjects = (int)(totalKB / KB_PER_OBJECT);
> >             mediumObjectArray = new byte[totalObjects][];
> >         }
> >         int getTotalObjects() {
> >             return totalObjects;
> >         }
> >         void createAndAppendObject() {
> >             if (mediumObjectArrayCurrentIndex == totalObjects) { // periodic deletion
> >                 mediumObjectArray = new byte[totalObjects][]; // also deletes all medium objects in the old array
> >                 mediumObjectArrayCurrentIndex = 0;
> >             } else {
> >                 mediumObjectArray[mediumObjectArrayCurrentIndex] = new byte[KB_PER_OBJECT << 10];
> >                 mediumObjectArrayCurrentIndex++;
> >             }
> >         }
> >     }
> > }
> >
> > To avoid "page cache flush", we made a patch for converting small/medium
> > pages to medium/small pages ahead of time. This patch works well on an
> > application with a relatively stable allocation rate, which has not
> > encountered throughput problems. What do you think of this solution?
> >
> > We notice that you are improving the efficiency of map/unmap operations
> > (https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-June/029936.html).
> > It may be a step towards reducing the delay caused by "page cache flush".
> > Do you have further plans for eliminating or reducing "page cache flush"?
>
> Yes, and as you might have seen, the latest incarnation of this patchset
> includes asynchronous unmapping, which helps reduce the time for page
> cache flushing. I ran your example program above with these patches, and
> can see a ~30% reduction in average page allocation time, and a ~60%
> reduction in worst-case page allocation time. So, it will be an improvement.
>
> However, I'd be more than happy to take a look at your patch and see
> what you've done. Making page cache flushing even less expensive is
> something we're interested in going forward.
>
> cheers,
> Per
>
> >
> > Sincerely, Hao Tang
> >

From thomas.schatzl at oracle.com Wed Jun 24 08:03:20 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 24 Jun 2020 10:03:20 +0200
Subject: RFR (M): 8245721: Refactor the TaskTerminator
Message-ID:

Hi all,

  can I have reviews for this refactoring of the (OWST) TaskTerminator to
make the algorithm more understandable?

The original implementation imho suffers from two issues:

- manual lock() and unlock() of the _blocker synchronization lock
everywhere, distributed across two separate methods.
- interspersing the actual spinning code somewhere inlined in between.

This change hopefully makes reasoning about the code *much* easier by
separating these two concerns differently, and by using scoped locks.

The final structure of the code has been intensively tested to not cause a
regression in performance; however, it made a few "obvious" further
refactorings undesirable due to significant perf regressions. I believe I
found a good tradeoff here, but I am of course open to improvements :) I
tried to sketch a few of those ultimately unsuccessful attempts in the CR.

CR:
https://bugs.openjdk.java.net/browse/JDK-8245721
Webrev:
http://cr.openjdk.java.net/~tschatzl/8245721/webrev/
Testing:
tier1-5, many many perf rounds, many tier1-X rounds with other patches

Thanks,
  Thomas

From ofirg6 at gmail.com Wed Jun 24 09:42:58 2020
From: ofirg6 at gmail.com (Ofir Gordon)
Date: Wed, 24 Jun 2020 12:42:58 +0300
Subject: How to run specific part of the SerialGC in a new thread?
Message-ID:

Hello,

I'm trying to run a specific procedure within the SerialGC in a separate
thread, i.e. replace the call to the procedure with the creation of a new
thread that runs the entire procedure and then returns to the main thread,
which continues (the main thread should wait for the new thread).

What is the correct way to do it? Is there a thread mechanism that the VM
uses in order to split tasks off to new threads?
I tried to simply use the pthread library, but when I add calls to pthread
methods (like pthread_create) the compilation fails (with a segfault..).

I'll appreciate your help,
Thanks,
Ofir

From stefan.karlsson at oracle.com Wed Jun 24 09:54:05 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Wed, 24 Jun 2020 11:54:05 +0200
Subject: RFR: 8248132: ZGC: Unify handling of all OopStorage instances in root processing
In-Reply-To: <5aca219e-3119-f5e7-de0b-a38bc534baac@oracle.com>
References: <26bd783c-cffa-c292-3ce6-c31d866c422f@oracle.com>
 <5aca219e-3119-f5e7-de0b-a38bc534baac@oracle.com>
Message-ID: <50f745de-9868-e5b5-e8ad-6770c721aa96@oracle.com>

Hi Per,

Good point about the strong_count being statically known. I wasn't entirely
happy about the type fiddling in the proposed change below, so I
experimented with alternative implementations.
The proposal I currently have is this:
https://cr.openjdk.java.net/~stefank/8248132/webrev.07/

The patch adds a few utility classes:
- ValueObjBlock stamps out a number of instances
- ValueObjArray provides an array over those instances

With this we can now stamp out the OopStorage::ParState instances into
OopStorageSetParState without dynamic allocation and without type casting.

A version without the gtest and with the out-of-bounds check was tested in
tier1-3.

Thanks,
StefanK

On 2020-06-23 16:29, Per Liden wrote:
> Hi,
>
> On 6/23/20 4:09 PM, Kim Barrett wrote:
>>> On Jun 23, 2020, at 9:09 AM, Per Liden wrote:
>>>
>>> Hi Stefan,
>>>
>>> On 6/23/20 10:10 AM, Stefan Karlsson wrote:
>>>> Hi all,
>>>> Please review this patch to unify handling of all OopStorage
>>>> instances in root processing.
>>>> https://cr.openjdk.java.net/~stefank/8248132/webrev.01/
>>>
>>> I note that the size of the dynamic allocation is always known at
>>> compile-time. So how about we just avoid it altogether with
>>> something like this?
>>>
>>> diff --git a/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
>>> b/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
>>> --- a/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
>>> +++ b/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
>>> @@ -26,16 +26,16 @@
>>> #define SHARE_GC_SHARED_OOPSTORAGESETPARSTATE_HPP
>>>
>>> #include "gc/shared/oopStorageParState.hpp"
>>> +#include "gc/shared/oopStorageSet.hpp"
>>>
>>> template
>>> class OopStorageSetStrongParState {
>>> private:
>>>   typedef OopStorage::ParState ParStateType;
>>>
>>> -  ParStateType* _par_states;
>>> +  char _par_states[sizeof(ParStateType) * OopStorageSet::strong_count];
>>
>> (Not a review, just a drive-by comment.)
>>
>> This doesn't guarantee proper alignment of _par_states; with
>> _par_states being the only member, it only requires char alignment.
>> (It might happen to work because of vagaries of heap allocators and
>> stack alignment requirements, possibly always on some platforms, but
>> that's not guaranteed.)
>>
>> -  ParStateType* _par_states;
>> +  char _par_states[sizeof(ParStateType) * OopStorageSet::strong_count];
>
> You're right, but I'm thinking ATTRIBUTE_ALIGNED should work here.
>
> cheers,
> Per
>
>
>>
>> Doing that correctly is a bit annoying, which is why C++11 added
>> std::aligned_storage.
>>

From patrick at os.amperecomputing.com Wed Jun 24 09:55:09 2020
From: patrick at os.amperecomputing.com (Patrick Zhang OS)
Date: Wed, 24 Jun 2020 09:55:09 +0000
Subject: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention
Message-ID:

Hi

Could I ask for a review of this simple patch which takes a tiny part from
the original ticket JDK-8243326 [1]. The reason that I do not want a full
backport is that the majority of the patch at jdk/jdk [2] cleans up the use
of volatile and may not be very meaningful to 11u; furthermore, the context
(dependencies on the atomic.hpp refactor) is too complicated to generate a
clean backport (I tried; ~81 files would need to be changed).

The purpose of having this one-line change in 11u is that the two volatile
variables in TaskQueueSuper, _bottom and _age, and the corresponding atomic
operations upon them, may cause severe cache contention inside the GC with a
larger number of threads, i.e., as specified by -XX:ParallelGCThreads=##.
Adding paddings (up to DEFAULT_CACHE_LINE_SIZE) in between can reduce the
possibility of false-sharing cache contention.
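As a minimal sketch of what this one-line change amounts to (assuming
HotSpot's existing DEFINE_PAD_MINUS_SIZE padding macro; the webrev below is
authoritative):

    volatile uint _bottom;     // written by the queue owner on push/pop
    // Padding so that _bottom and _age do not share a cache line, since
    // _age is CASed by stealing threads, which otherwise causes false
    // sharing between the owner and the stealers.
    DEFINE_PAD_MINUS_SIZE(0, DEFAULT_CACHE_LINE_SIZE, 0);
    volatile Age _age;         // CASed by stealers in pop_global()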
I do not need the paddings before _bottom and after _age from the original
patch [2], because the instances of TaskQueueSuper are usually (always)
allocated in a set of queues, in which they are naturally separated.

Please review, thanks.

JBS: https://bugs.openjdk.java.net/browse/JDK-8248214
Webrev: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.01/

Testing: tier1-2 pass with the patch; commercial benchmarks and small C++
test cases (to simulate the data structure and the work-stealing algorithm
atomics) validated the performance, no regression.

By the way, I am going to request an 8u backport as well once 11u has it.

[1] https://bugs.openjdk.java.net/browse/JDK-8243326 Cleanup use of volatile in taskqueue code
[2] https://hg.openjdk.java.net/jdk/jdk/rev/252a1602b4c6

Regards
Patrick

From per.liden at oracle.com Wed Jun 24 10:01:14 2020
From: per.liden at oracle.com (Per Liden)
Date: Wed, 24 Jun 2020 12:01:14 +0200
Subject: RFR: 8248132: ZGC: Unify handling of all OopStorage instances in root processing
In-Reply-To: <50f745de-9868-e5b5-e8ad-6770c721aa96@oracle.com>
References: <26bd783c-cffa-c292-3ce6-c31d866c422f@oracle.com>
 <5aca219e-3119-f5e7-de0b-a38bc534baac@oracle.com>
 <50f745de-9868-e5b5-e8ad-6770c721aa96@oracle.com>
Message-ID:

Hi Stefan,

On 6/24/20 11:54 AM, Stefan Karlsson wrote:
> Hi Per,
>
> Good point about the strong_count being statically known. I wasn't
> entirely happy about the type fiddling in the proposed change below, so
> I experimented with alternative implementations. The proposal I
> currently have is this:
> https://cr.openjdk.java.net/~stefank/8248132/webrev.07/
>
> The patch adds a few utility classes:
> - ValueObjBlock stamps out a number of instances
> - ValueObjArray provides an array over those instances
>
> With this we can now stamp out the OopStorage::ParState instances into
> OopStorageSetParState without dynamic allocation and without type casting.

Nice! Looks good!

cheers,
Per

>
> A version without the gtest and with the out-of-bounds check was tested
> in tier1-3.
>
> Thanks,
> StefanK
>
> On 2020-06-23 16:29, Per Liden wrote:
>> Hi,
>>
>> On 6/23/20 4:09 PM, Kim Barrett wrote:
>>>> On Jun 23, 2020, at 9:09 AM, Per Liden wrote:
>>>>
>>>> Hi Stefan,
>>>>
>>>> On 6/23/20 10:10 AM, Stefan Karlsson wrote:
>>>>> Hi all,
>>>>> Please review this patch to unify handling of all OopStorage
>>>>> instances in root processing.
>>>>> https://cr.openjdk.java.net/~stefank/8248132/webrev.01/
>>>>
>>>> I note that the size of the dynamic allocation is always known at
>>>> compile-time. So how about we just avoid it altogether with
>>>> something like this?
>>>>
>>>> diff --git a/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
>>>> b/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
>>>> --- a/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
>>>> +++ b/src/hotspot/share/gc/shared/oopStorageSetParState.hpp
>>>> @@ -26,16 +26,16 @@
>>>> #define SHARE_GC_SHARED_OOPSTORAGESETPARSTATE_HPP
>>>>
>>>> #include "gc/shared/oopStorageParState.hpp"
>>>> +#include "gc/shared/oopStorageSet.hpp"
>>>>
>>>> template
>>>> class OopStorageSetStrongParState {
>>>> private:
>>>>   typedef OopStorage::ParState ParStateType;
>>>>
>>>> -  ParStateType* _par_states;
>>>> +  char _par_states[sizeof(ParStateType) * OopStorageSet::strong_count];
>>>
>>> (Not a review, just a drive-by comment.)
>>>
>>> This doesn't guarantee proper alignment of _par_states; with
>>> _par_states being the only member, it only requires char alignment.
>>> (It might happen to work because of vagaries of heap allocators and
>>> stack alignment requirements, possibly always on some platforms, but
>>> that's not guaranteed.)
>>>
>>> -  ParStateType* _par_states;
>>> +  char _par_states[sizeof(ParStateType) * OopStorageSet::strong_count];
>>
>> You're right, but I'm thinking ATTRIBUTE_ALIGNED should work here.
>>
>> cheers,
>> Per
>>
>>
>>>
>>> Doing that correctly is a bit annoying, which is why C++11 added
>>> std::aligned_storage.
>>>

From kim.barrett at oracle.com Wed Jun 24 10:04:02 2020
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 24 Jun 2020 06:04:02 -0400
Subject: RFR: 8248132: ZGC: Unify handling of all OopStorage instances in root processing
In-Reply-To: <50f745de-9868-e5b5-e8ad-6770c721aa96@oracle.com>
References: <26bd783c-cffa-c292-3ce6-c31d866c422f@oracle.com>
 <5aca219e-3119-f5e7-de0b-a38bc534baac@oracle.com>
 <50f745de-9868-e5b5-e8ad-6770c721aa96@oracle.com>
Message-ID: <16307D62-BA3B-4E7A-99AC-19D7D310AF53@oracle.com>

> On Jun 24, 2020, at 5:54 AM, Stefan Karlsson wrote:
>
> Hi Per,
>
> Good point about the strong_count being statically known. I wasn't
> entirely happy about the type fiddling in the proposed change below, so I
> experimented with alternative implementations. The proposal I currently
> have is this:
> https://cr.openjdk.java.net/~stefank/8248132/webrev.07/

Looks good.

From stefan.karlsson at oracle.com Wed Jun 24 10:04:26 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Wed, 24 Jun 2020 12:04:26 +0200
Subject: RFR: 8248132: ZGC: Unify handling of all OopStorage instances in root processing
In-Reply-To: <16307D62-BA3B-4E7A-99AC-19D7D310AF53@oracle.com>
References: <26bd783c-cffa-c292-3ce6-c31d866c422f@oracle.com>
 <5aca219e-3119-f5e7-de0b-a38bc534baac@oracle.com>
 <50f745de-9868-e5b5-e8ad-6770c721aa96@oracle.com>
 <16307D62-BA3B-4E7A-99AC-19D7D310AF53@oracle.com>
Message-ID: <720edef4-2f20-aadc-b95c-5b3eb14295ba@oracle.com>

Thanks, Kim.

StefanK

On 2020-06-24 12:04, Kim Barrett wrote:
>> On Jun 24, 2020, at 5:54 AM, Stefan Karlsson wrote:
>>
>> Hi Per,
>>
>> Good point about the strong_count being statically known. I wasn't
>> entirely happy about the type fiddling in the proposed change below, so
>> I experimented with alternative implementations. The proposal I
>> currently have is this:
>> https://cr.openjdk.java.net/~stefank/8248132/webrev.07/
> Looks good.
>

From thomas.schatzl at oracle.com Wed Jun 24 10:05:51 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 24 Jun 2020 12:05:51 +0200
Subject: RFR: 8248132: ZGC: Unify handling of all OopStorage instances in root processing
In-Reply-To: <50f745de-9868-e5b5-e8ad-6770c721aa96@oracle.com>
References: <26bd783c-cffa-c292-3ce6-c31d866c422f@oracle.com>
 <5aca219e-3119-f5e7-de0b-a38bc534baac@oracle.com>
 <50f745de-9868-e5b5-e8ad-6770c721aa96@oracle.com>
Message-ID:

Hi,

On 24.06.20 11:54, Stefan Karlsson wrote:
> Hi Per,
>
> Good point about the strong_count being statically known. I wasn't
> entirely happy about the type fiddling in the proposed change below, so
> I experimented with alternative implementations. The proposal I
> currently have is this:
> https://cr.openjdk.java.net/~stefank/8248132/webrev.07/
>
> The patch adds a few utility classes:
> - ValueObjBlock stamps out a number of instances
> - ValueObjArray provides an array over those instances
>
> With this we can now stamp out the OopStorage::ParState instances into
> OopStorageSetParState without dynamic allocation and without type casting.
>
> A version without the gtest and with the out-of-bounds check was tested
> in tier1-3.
>

  looks good.

Thomas

From stefan.karlsson at oracle.com Wed Jun 24 10:08:36 2020
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Wed, 24 Jun 2020 12:08:36 +0200
Subject: RFR: 8248132: ZGC: Unify handling of all OopStorage instances in root processing
In-Reply-To:
References: <26bd783c-cffa-c292-3ce6-c31d866c422f@oracle.com>
 <5aca219e-3119-f5e7-de0b-a38bc534baac@oracle.com>
 <50f745de-9868-e5b5-e8ad-6770c721aa96@oracle.com>
Message-ID:

Thanks, Thomas.

StefanK

On 2020-06-24 12:05, Thomas Schatzl wrote:
> Hi,
>
> On 24.06.20 11:54, Stefan Karlsson wrote:
>> Hi Per,
>>
>> Good point about the strong_count being statically known. I wasn't
>> entirely happy about the type fiddling in the proposed change below,
>> so I experimented with alternative implementations. The proposal I
>> currently have is this:
>> https://cr.openjdk.java.net/~stefank/8248132/webrev.07/
>>
>> The patch adds a few utility classes:
>> - ValueObjBlock stamps out a number of instances
>> - ValueObjArray provides an array over those instances
>>
>> With this we can now stamp out the OopStorage::ParState instances
>> into OopStorageSetParState without dynamic allocation and without
>> type casting.
>>
>> A version without the gtest and with the out-of-bounds check was
>> tested in tier1-3.
>>
>
>   looks good.
>
> Thomas
>

From thomas.schatzl at oracle.com Wed Jun 24 11:46:37 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 24 Jun 2020 13:46:37 +0200
Subject: How to run specific part of the SerialGC in a new thread?
In-Reply-To:
References:
Message-ID:

Hi,

On 24.06.20 11:42, Ofir Gordon wrote:
> Hello,
>
> I'm trying to run a specific procedure within the SerialGC in a separate
> thread, i.e. replace the call to the procedure with the creation of a new
> thread that runs the entire procedure and then returns to the main thread,
> which continues (the main thread should wait for the new thread).
>
> What is the correct way to do it? Is there a thread mechanism that the VM
> uses in order to split tasks off to new threads?
> I tried to simply use the pthread library, but when I add calls to pthread
> methods (like pthread_create) the compilation fails (with a segfault..).
>
> I'll appreciate your help,
> Thanks,
> Ofir
>

probably the simplest way is to add a WorkGang with one thread and use the
run_task() method. There are a lot of uses of it in the code.

Thomas

From zgu at redhat.com Wed Jun 24 12:18:57 2020
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 24 Jun 2020 08:18:57 -0400
Subject: [16] RFR 8248227: Shenandoah: Refactor Shenandoah::heap() to match other GCs
Message-ID:

Please review this small patch that refactors Shenandoah::heap() to match
other GCs.
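As a rough sketch of the direction (this is only the common pattern used by
the other collectors; the exact asserts and naming in the webrev below may
differ):

    ShenandoahHeap* ShenandoahHeap::heap() {
      CollectedHeap* heap = Universe::heap();
      // Checked downcast, mirroring what the other CollectedHeap
      // subclasses do in their heap() accessors.
      assert(heap->kind() == CollectedHeap::Shenandoah, "Invalid heap kind");
      return (ShenandoahHeap*) heap;
    }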
Bug: https://bugs.openjdk.java.net/browse/JDK-8248227
Webrev: http://cr.openjdk.java.net/~zgu/JDK-8248227/webev.00/index.html

Test:
  hotspot_gc_shenandoah

Thanks,

-Zhengyu

From kdnilsen at amazon.com Wed Jun 24 13:54:13 2020
From: kdnilsen at amazon.com (Nilsen, Kelvin)
Date: Wed, 24 Jun 2020 13:54:13 +0000
Subject: RFR: 8232782: Shenandoah: streamline post-LRB CAS barrier (aarch64)
Message-ID:

See http://cr.openjdk.java.net/~kdnilsen/JDK-8232782/webrev.00/

This patch addresses the problem described in
https://bugs.openjdk.java.net/browse/JDK-8232782

The implementation mimics the behavior of the recently revised x86
implementation of cmpxchg_oop with slight refinements:

X86 version:
Step 1: Try CAS
Step 2: if CAS fails, check if original memory holds equivalent from-space pointer
Step 3: Use CAS to overwrite memory with equivalent to-space pointer
Step 4: Try CAS again
Step 5: Return boolean result to indicate success or failure

AARCH64 version:
Step 1: Try CAS
Step 2: if CAS fails, check if original memory holds equivalent from-space pointer
Step 3 (differs): Do not overwrite memory with the equivalent to-space pointer.
Instead, run the original CAS request with the from-space pointer as the
"expected" value. If this succeeds, we're done. If this fails, go back to
step 1 and try that again.
Step 5: Return boolean result to indicate success or failure

This patch satisfies tier1, tier2, and hotspot_gc_shenandoah regression
tests on Ubuntu 18.04.4 LTS (GNU/Linux 5.3.0-1023-aws aarch64). I have also
run an "extreme" garbage collection workload for 20 minutes without
problems.

Is this ok to merge?

(This is a repost of a message that was posted and bounced yesterday.)

From aph at redhat.com Wed Jun 24 14:29:04 2020
From: aph at redhat.com (Andrew Haley)
Date: Wed, 24 Jun 2020 15:29:04 +0100
Subject: RFR: 8232782: Shenandoah: streamline post-LRB CAS barrier (aarch64)
In-Reply-To:
References:
Message-ID: <34978650-be69-1e01-e11e-608f205338ff@redhat.com>

On 24/06/2020 14:54, Nilsen, Kelvin wrote:
> Is this ok to merge?

One thing:

Some CPUs, in particular those based on Neoverse N1, can perform very
badly when using ldxr/stxr. For that reason, all code doing CAS

I can't see any reason why your code needs to use ldxr/stxr. Is there
any?

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From rkennke at redhat.com Wed Jun 24 14:48:33 2020
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 24 Jun 2020 16:48:33 +0200
Subject: RFR: 8232782: Shenandoah: streamline post-LRB CAS barrier (aarch64)
In-Reply-To: <34978650-be69-1e01-e11e-608f205338ff@redhat.com>
References: <34978650-be69-1e01-e11e-608f205338ff@redhat.com>
Message-ID: <43f1a7648b5861d9a7ff16622ccea4a5e6164c3c.camel@redhat.com>

On Wed, 2020-06-24 at 15:29 +0100, Andrew Haley wrote:
> On 24/06/2020 14:54, Nilsen, Kelvin wrote:
> > Is this ok to merge?
>
> One thing:
>
> Some CPUs, in particular those based on Neoverse N1, can perform very
> badly when using ldxr/stxr. For that reason, all code doing CAS
>
> I can't see any reason why your code needs to use ldxr/stxr. Is there
> any?

As far as I know, Shenandoah's AArch64 CAS implementation always did it
that way (I don't remember why). If a regular CAS is generally better,
then we should go for it.
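For reference, the retry logic Kelvin describes above is essentially the
following (C++-style pseudocode for illustration only; the helper names are
hypothetical, not the actual assembler routines):

    // Sketch of the aarch64 cmpxchg_oop fallback path. At this point
    // "expected" is guaranteed to be a to-space pointer.
    bool cas_oop(oop* addr, oop expected, oop new_val) {
      while (true) {
        if (atomic_cas(addr, expected, new_val)) {
          return true;                  // step 1: plain CAS succeeded
        }
        oop witness = *addr;
        if (resolve_forwardee(witness) != expected) {
          return false;                 // step 2: a genuine failure
        }
        // step 3: false negative, i.e. memory holds the from-space copy
        // of the same object, so retry with that pointer as the expected
        // value.
        if (atomic_cas(addr, witness, new_val)) {
          return true;
        }
        // Another thread changed the location again; start over at step 1.
      }
    }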
Roman From rkennke at redhat.com Wed Jun 24 15:10:35 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 24 Jun 2020 17:10:35 +0200 Subject: RFR (S) 8247845: Shenandoah: refactor TLAB/GCLAB retirement code In-Reply-To: References: Message-ID: I forgot to review it, sorry for the late reply. The patch looks good to me. Thank you, Roman On Thu, 2020-06-18 at 18:30 +0200, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8247845 > > Fix; > http://cr.openjdk.java.net/~shade/8247845/webrev.01/ > > Current TLAB/GCLAB retirement code is all over the place. Sometimes > we retire GCLABs twice. > Sometimes we resize TLABs twice. This hopefully makes the things more > clear by lifting things out of > CollectedHeap::ensure_parsability and specializing it for Shenandoah > use cases. > > Testing: hotspot_gc_shenandoah {fastdebug,release}; tier{1,2} with > Shenandoah; benchmarks (running) > From aph at redhat.com Wed Jun 24 15:22:40 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 24 Jun 2020 16:22:40 +0100 Subject: RFR: 8232782: Shenandoah: streamline post-LRB CAS barrier (aarch64) In-Reply-To: <34978650-be69-1e01-e11e-608f205338ff@redhat.com> References: <34978650-be69-1e01-e11e-608f205338ff@redhat.com> Message-ID: On 24/06/2020 15:29, Andrew Haley wrote: > On 24/06/2020 14:54, Nilsen, Kelvin wrote: >> Is this ok to merge? > > One thing: > > Some CPUs, in particular those based on Neoverse N1, can perform very > badly when using ldxr/stxr. For that reason, all code doing CAS > > I can't see any reason why your code needs to use ldxr/stxr. Is there > any? I should have said, but didn't: please use MacroAssembler::cmpxchg() -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Wed Jun 24 15:22:39 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 24 Jun 2020 16:22:39 +0100 Subject: RFR: 8232782: Shenandoah: streamline post-LRB CAS barrier (aarch64) In-Reply-To: <43f1a7648b5861d9a7ff16622ccea4a5e6164c3c.camel@redhat.com> References: <34978650-be69-1e01-e11e-608f205338ff@redhat.com> <43f1a7648b5861d9a7ff16622ccea4a5e6164c3c.camel@redhat.com> Message-ID: On 24/06/2020 15:48, Roman Kennke wrote: > On Wed, 2020-06-24 at 15:29 +0100, Andrew Haley wrote: >> On 24/06/2020 14:54, Nilsen, Kelvin wrote: >>> Is this ok to merge? >> >> One thing: >> >> Some CPUs, in particular those based on Neoverse N1, can perform very >> badly when using ldxr/stxr. For that reason, all code doing CAS >> >> I can't see any reason why your code needs to use ldxr/stxr. Is there >> any? > > As far as I know, Shenandoah's AArch64-CAS-implementation always did it > that way (don't remember why). If regular CAS is generally better, then > we should go for it. Does this algorithm need a full barrier even when CAS fails? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Wed Jun 24 15:28:06 2020 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 24 Jun 2020 17:28:06 +0200 Subject: RFR: 8232782: Shenandoah: streamline post-LRB CAS barrier (aarch64) In-Reply-To: References: <34978650-be69-1e01-e11e-608f205338ff@redhat.com> <43f1a7648b5861d9a7ff16622ccea4a5e6164c3c.camel@redhat.com> Message-ID: On Wed, 2020-06-24 at 16:22 +0100, Andrew Haley wrote: > On 24/06/2020 15:48, Roman Kennke wrote: > > On Wed, 2020-06-24 at 15:29 +0100, Andrew Haley wrote: > > > On 24/06/2020 14:54, Nilsen, Kelvin wrote: > > > > Is this ok to merge? > > > > > > One thing: > > > > > > Some CPUs, in particular those based on Neoverse N1, can perform > > > very > > > badly when using ldxr/stxr. For that reason, all code doing CAS > > > > > > I can't see any reason why your code needs to use ldxr/stxr. Is > > > there > > > any? > > > > As far as I know, Shenandoah's AArch64-CAS-implementation always > > did it > > that way (don't remember why). If regular CAS is generally better, > > then > > we should go for it. > > Does this algorithm need a full barrier even when CAS fails? We need to do extra work *only* when CAS fails. We need to catch false negatives -- when the compare-value is to-space (that's guaranteed) and the value in memory is from-space copy of the same object. Roman From ivan.walulya at oracle.com Wed Jun 24 21:20:17 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Wed, 24 Jun 2020 23:20:17 +0200 Subject: RFR (L): 8244603 and 8238858: Improve young gen sizing In-Reply-To: <37a95d37-5253-4dff-ff46-3b82b4e4bcdf@oracle.com> References: <5da7c2e2-2d36-de11-d0b7-91cdf6fdc077@oracle.com> <37a95d37-5253-4dff-ff46-3b82b4e4bcdf@oracle.com> Message-ID: <1FDE4E65-1408-429B-ABD4-15B5DC37FC46@oracle.com> Looks good to me! //Ivan > On 15 Jun 2020, at 11:23, Thomas Schatzl wrote: > > Hi Stefan, > > On 11.06.20 09:51, stefan.johansson at oracle.com wrote: >> Hi Thomas, >> Sorry for not getting to this sooner. >> On 2020-05-19 15:37, Thomas Schatzl wrote: >>> Hi all, >>> >>> can I have reviews for this change that improves young gen sizing a lot to prepare for heap shrinking during young gc (JDK-8238687) ;) > [...]>> Actually, in the future, when shrinking is implemented (JDK-8238687), >>> these may be more severe (in some benchmarks, actual gc usage is still <2%). I will likely try to balance that with decreasing default GCTimeRatio value in the future. >>> >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8244603 >>> https://bugs.openjdk.java.net/browse/JDK-8238858 >>> Webrev: >>> http://cr.openjdk.java.net/~tschatzl/8244603/webrev/ >> Very nice change Thomas, really helpful with all comments. >> As I've mentioned to you offline I think we can re-structure the code a bit, to separate the updating of young length bounds from the returning of values. Here's a suggestion on how to do that: >> http://cr.openjdk.java.net/~sjohanss/8244603/rev-1/ >> src/hotspot/share/gc/g1/g1Analytics.cpp >> --- >> 226 double G1Analytics::predict_alloc_rate_ms() const { >> 227 if (!enough_samples_available(_alloc_rate_ms_seq)) { >> 228 return predict_zero_bounded(_alloc_rate_ms_seq); >> 229 } else { >> 230 return 0.0; >> 231 } >> 232 } >> As discussed, on line 227 the ! should be removed. >> --- >> Apart from this I think it is all good. 
>> There are a few places in g1Policy.cpp where local variables could be
>> either merged or skipped, but I think they add to the overall ease of
>> understanding.
>
> Applied all your comments.
>
> New webrev:
> http://cr.openjdk.java.net/~tschatzl/8244603/webrev.0_to_1 (diff)
> http://cr.openjdk.java.net/~tschatzl/8244603/webrev.1 (full)
>
> Thanks,
>   Thomas

From thomas.schatzl at oracle.com Thu Jun 25 07:35:01 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 25 Jun 2020 09:35:01 +0200
Subject: RFR (S): 8165501: Serial, Parallel and G1 should track the time since last gc for millis_since_last_gc() with full precision
Message-ID: <71723e44-d2f0-8c15-7a99-054c0dd870bc@oracle.com>

Hi all,

  can I get reviews for this small change that unifies code a bit between
Serial, Parallel and G1 with regard to tracking the time stamp for
CollectedHeap::millis_since_last_gc?

In particular, it names the members uniformly, and removes an unnecessary
division by internally storing raw os::javaTimeNanos() values instead of
some intermediate result.

Other collectors are not affected because they calculate
millis_since_last_gc from an internal clock source.

There is no functionality change.

CR:
https://bugs.openjdk.java.net/browse/JDK-8165501
Webrev:
http://cr.openjdk.java.net/~tschatzl/8165501
Testing:
tier1-5 with 8243974, 8248221; see also 8248221 for manual testing info

Thanks,
  Thomas

From thomas.schatzl at oracle.com Thu Jun 25 07:39:09 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 25 Jun 2020 09:39:09 +0200
Subject: RFR (S): 8243974: Move G1CollectedHeap::millis_since_last_gc support from G1Policy
Message-ID:

Hi all,

please review the following change that moves
G1CollectedHeap::millis_since_last_gc support away from G1Policy to
G1CollectedHeap. That functionality does not have much to do with policy
decisions at all.

Fwiw, I found the issue 8248221 when fixing this. Will send out that one
soon.

Based on 8165501.

CR:
https://bugs.openjdk.java.net/browse/JDK-8243974
Webrev:
http://cr.openjdk.java.net/~tschatzl/8243974/webrev/
Testing:
tier1-5 with 8165501, 8248221; see also 8248221 for manual testing info

Thanks,
  Thomas

From thomas.schatzl at oracle.com Thu Jun 25 08:04:09 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 25 Jun 2020 10:04:09 +0200
Subject: RFR (S): 8248221: G1: millis_since_last_gc updated at wrong time
Message-ID:

Hi all,

  can I have reviews for this change that fixes the location at which the
millis_since_last_gc timestamp is updated?

The spec of the only user in sun.rmi.transport.GC.maxObjectInspectionAge
says:

 * Returns the maximum object-inspection age, which is the number
 * of real-time milliseconds that have elapsed since the
 * least-recently-inspected heap object was last inspected by the garbage
 * collector.

Currently we do that update only at young gc, which is wrong. We should do
it if/when a complete liveness analysis cycle has finished instead, i.e. at
the end of marking/end of full gc.

I did some testing using the TestMillisSinceLastGC.java jtreg test in
http://cr.openjdk.java.net/~tschatzl/8248221/webrev.test/. As expected, it
failed for G1, but all other GCs tested there pass (Epsilon will probably
also fail, because it never does any gc, so I did not even try).

However, the test intrinsically depends on timing, i.e. the duration
between some GC and checking the CollectedHeap::millis_since_last_gc()
result, so I did not add it to this webrev. While it did not fail in my ci
testing, I deemed it too risky to add.
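To recap what the fix itself amounts to, schematically (an illustrative
sketch only; the actual call sites in the webrev are authoritative):

    // Before: stamped at the end of every young collection pause.
    // After:  stamped only when a complete liveness analysis finishes,
    //         i.e. at the end of the marking cycle and at the end of a
    //         full collection.
    _time_of_last_gc_ns = os::javaTimeNanos();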
I also eyeballed that the output is still reasonable (i.e. no unit problem
introduced by 8165501) using its log output for all affected collectors.

Depends on 8243974.

CR:
https://bugs.openjdk.java.net/browse/JDK-8248221
Webrev:
http://cr.openjdk.java.net/~tschatzl/8248221/webrev/
Testing:
tier1-5 with 8243974 and 8165501, manual testing as described above.

Thanks,
  Thomas

From thomas.schatzl at oracle.com Thu Jun 25 08:56:12 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 25 Jun 2020 10:56:12 +0200
Subject: RFR (M/L): 8247928: Refactor G1ConcurrentMarkThread for mark abort (JDK-8240556)
Message-ID: <78ac3032-1f8d-6724-37ec-712d95ecb518@oracle.com>

Hi all,

  can I have reviews for this change that refactors the
G1ConcurrentMarkThread class to prepare it better for mark abort
(JDK-8240556)?

The idea in the latter is to abort the concurrent cycle if, after a
concurrent start gc, we find that old gen occupancy actually went down below
the IHOP again due to e.g. eager reclaim. To do that, G1 needs to scrub any
marks on the next bitmap during the gc pause. The current, original change
just performs this work in the pause, which is very slow, so the idea is to
do the bitmap cleaning concurrently instead.

The problem is that G1ConcurrentMarkThread is a mess and you can't easily
"jump" to the end of the current marking. Additionally, the code was worth
refactoring without that requirement anyway ;)

This change refactors the code so that it is much easier to add a second
path through the concurrent cycle. Overall, there are two options to do
that:

1) provide an explicit state machine for the concurrent marking so that you
can jump to the end easily.
2) provide building blocks that can be easily put together to implement the
second path.

While 1) works (there is a sample POC at
http://cr.openjdk.java.net/~tschatzl/8247928/webrev.sm/), I found that it is
much more code and less understandable than just building two paths through
the marking cycle. As in the "abort marking" case we only need to do the
very tail end of the regular full marking cycle, I propose this option for
review.

This is refactoring (almost) only; the additional path should be added with
JDK-8240556. Almost, because I changed two things:

- the concurrent mark (control) thread has not done any marking for a long
time (long long ago it did root scanning/marking), so I removed the
_vtime_mark_accum accumulator.

- the "Concurrent Mark" finish message is now only printed at the end of all
marking, not every iteration. First, the restart case is very rare, so
anyone parsing the log will probably not handle this case correctly, and the
contents (times) of that finish message are confusing, which means anyone
handling it will most likely do the wrong thing. The "Concurrent Mark
Restart for Mark Overflow" message remains to indicate a restart.

Based on JDK-8248221, also out for review.

CR:
https://bugs.openjdk.java.net/browse/JDK-8247928
Webrev:
http://cr.openjdk.java.net/~tschatzl/8247928/webrev/
Testing:
tier1-5, a few local jtreg runs of the gc directory

Thanks,
  Thomas

From thomas.schatzl at oracle.com Thu Jun 25 09:40:24 2020
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 25 Jun 2020 11:40:24 +0200
Subject: [15, testbug]: 8248306: gc/stress/gclocker/TestExcessGCLockerCollections.java does not compile
Message-ID: <5a57528d-32f3-9dd3-70d9-3afa59a97f99@oracle.com>

Hi all,

  can I have reviews for this testbug?
Since JDK-8244010 ("Simplify usages of ProcessTools.createJavaProcessBuilder
in our tests"), the gc/stress/gclocker/TestExcessGCLockerCollections.java
test does not compile any more, because its use of
ProcessTools.createJavaProcessBuilder has not been updated.

I simply replaced the ProcessTools.createJavaProcessBuilder() call by
ProcessTools.executeTestJvm, which, according to the discussion for 8244010,
does the right thing.

I would like to fix this in 15, because 8244010 is in 15.

CR:
https://bugs.openjdk.java.net/browse/JDK-8248306
Webrev:
http://cr.openjdk.java.net/~tschatzl/8248306/webrev/
Testing:
local run of the test

Thanks,
  Thomas

From ivan.walulya at oracle.com Thu Jun 25 09:47:38 2020
From: ivan.walulya at oracle.com (Ivan Walulya)
Date: Thu, 25 Jun 2020 11:47:38 +0200
Subject: RFR (S): 8165501: Serial, Parallel and G1 should track the time since last gc for millis_since_last_gc() with full precision
In-Reply-To: <71723e44-d2f0-8c15-7a99-054c0dd870bc@oracle.com>
References: <71723e44-d2f0-8c15-7a99-054c0dd870bc@oracle.com>
Message-ID:

Hi Thomas,

A few comments below.

src/hotspot/share/gc/parallel/psParallelCompact.cpp

- jlong now = os::javaTimeNanos() / NANOSECS_PER_MILLISEC;
- jlong ret_val = now - _time_of_last_gc;
+ jlong now = os::javaTimeNanos();
+ jlong ret_val = (now - _time_of_last_gc_ns) / NANOSECS_PER_MILLISEC;

jlong now => jlong now_ns: I see you do this for most of the variable names
that refer to time in nanos.

src/hotspot/share/gc/g1/g1Policy.hpp

+ jlong time_of_last_gc() { return _time_of_last_gc_ns; }

Same as the previous comment; better to maintain the _ns.

src/hotspot/share/gc/shared/generation.hpp

// Time (in ms) when we were last collected or now if a collection is
// in progress.
virtual jlong time_of_last_gc(jlong now) {
  // Both _time_of_last_gc and now are set using a time source
  // that guarantees monotonically non-decreasing values provided
  // the underlying platform provides such a source. So we still
  // have to guard against non-monotonicity.
The main difference to other collectors is that G1 has some per-OopStorage timing, so a slightly different approach in iterating over the OopStorages has been taken. Also, this messes up G1GCPhaseTimes a bit (more), but I see fixing that, as it has been a mess before, a separate CR. Also since the internal names of OopStorage (e.g. "VM global") are now used in some log messages, I upper-cased them (ie. "VM Global") to match other, existing log messages. This work is based on a POC from Erik ?sterlund, crediting him for that. CR: https://bugs.openjdk.java.net/browse/JDK-8247819 Webrev: http://cr.openjdk.java.net/~tschatzl/8247819/webrev/ Testing: tier1-5 Thanks, Thomas From stefan.karlsson at oracle.com Thu Jun 25 10:37:34 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 25 Jun 2020 12:37:34 +0200 Subject: RFR (M): 8247819: G1: Process strong OopStorage entries in parallel In-Reply-To: <0d929030-5be0-c9f7-54af-ffa87f7c39c9@oracle.com> References: <0d929030-5be0-c9f7-54af-ffa87f7c39c9@oracle.com> Message-ID: Hi Thomas, This isn't needed after we rewrote the OopStorageSetParState: +// Needed by _oop_storage_set_strong_par_state as the definition is in the +// .inline.hpp file. +G1RootProcessor::~G1RootProcessor() {} --- This doesn't seem to be used: + + template + static void strong_oops_do(Closure* cl); }; --- Just a suggestion to lower the noise: + G1GCPhaseTimes::GCParPhases phase = G1GCPhaseTimes::GCParPhases(G1GCPhaseTimes::StrongOopStorageSetRoots + i); could be changed to: + G1GCPhaseTimes::GCParPhases phase(G1GCPhaseTimes::StrongOopStorageSetRoots + i); Thanks, StefanK On 2020-06-25 11:56, Thomas Schatzl wrote: > Hi all, > > ? can I have reviews for this follow-up change to JDK-8248132, adding > parallel OopStorage strong root processing for G1? > > The main difference to other collectors is that G1 has some > per-OopStorage timing, so a slightly different approach in iterating > over the OopStorages has been taken. Also, this messes up > G1GCPhaseTimes a bit (more), but I see fixing that, as it has been a > mess before, a separate CR. > > Also since the internal names of OopStorage (e.g. "VM global") are now > used in some log messages, I upper-cased them (ie. "VM Global") to > match other, existing log messages. > > This work is based on a POC from Erik ?sterlund, crediting him for that. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8247819 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8247819/webrev/ > Testing: > tier1-5 > > Thanks, > ? Thomas From stefan.karlsson at oracle.com Thu Jun 25 11:28:09 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 25 Jun 2020 13:28:09 +0200 Subject: RFR (M): 8247819: G1: Process strong OopStorage entries in parallel In-Reply-To: References: <0d929030-5be0-c9f7-54af-ffa87f7c39c9@oracle.com> Message-ID: <79dac1ee-0217-bd18-2ccd-8611693d88a7@oracle.com> The formatting looked weird. I'll try again: Hi Thomas, This isn't needed after we rewrote the OopStorageSetParState: +// Needed by _oop_storage_set_strong_par_state as the definition is in the +// .inline.hpp file. 
+G1RootProcessor::~G1RootProcessor() {} --- This doesn't seem to be used: ?+ ?+ template ?+ static void strong_oops_do(Closure* cl); }; --- Just a suggestion to lower the noise: + G1GCPhaseTimes::GCParPhases phase = G1GCPhaseTimes::GCParPhases(G1GCPhaseTimes::StrongOopStorageSetRoots + i); could be changed to: + G1GCPhaseTimes::GCParPhases phase(G1GCPhaseTimes::StrongOopStorageSetRoots + i); Thanks, StefanK On 2020-06-25 12:37, Stefan Karlsson wrote: > Hi Thomas, > > This isn't needed after we rewrote the OopStorageSetParState: > > +// Needed by _oop_storage_set_strong_par_state as the definition is > in the > +// .inline.hpp file. > +G1RootProcessor::~G1RootProcessor() {} --- This doesn't seem to be > used: + + template + static void > strong_oops_do(Closure* cl); }; --- Just a suggestion to lower the > noise: + G1GCPhaseTimes::GCParPhases phase = > G1GCPhaseTimes::GCParPhases(G1GCPhaseTimes::StrongOopStorageSetRoots + > i); could be changed to: + G1GCPhaseTimes::GCParPhases > phase(G1GCPhaseTimes::StrongOopStorageSetRoots + i); > > > Thanks, > StefanK From thomas.schatzl at oracle.com Thu Jun 25 11:44:42 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 25 Jun 2020 13:44:42 +0200 Subject: RFR (S): 8165501: Serial, Parallel and G1 should track the time since last gc for millis_since_last_gc() with full precision In-Reply-To: References: <71723e44-d2f0-8c15-7a99-054c0dd870bc@oracle.com> Message-ID: <2cf64df3-4bbb-b346-3b56-415479064c7f@oracle.com> Hi Ivan, thanks for your review! On 25.06.20 11:47, Ivan Walulya wrote: > Hi Thomas, > > A few comments below, > > src/hotspot/share/gc/parallel/psParallelCompact.cpp > > - jlong now = os::javaTimeNanos() / NANOSECS_PER_MILLISEC; > - jlong ret_val = now - _time_of_last_gc; > + jlong now = os::javaTimeNanos(); > + jlong ret_val = (now - _time_of_last_gc_ns) / NANOSECS_PER_MILLISEC; > > |jlong now|? => |jlong now_ns|? i see you do this for most of the > variable names that refer to time in > Nanossrc/hotspot/share/gc/g1/g1Policy.hpp > > + jlong time_of_last_gc() { return _time_of_last_gc_ns; } > > same as previous comment, better to maintain the > _nssrc/hotspot/share/gc/shared/generation.hpp > > // Time (in ms) when we were last collected or now if a collection is > // in progress. > virtual jlong time_of_last_gc(jlong now) { > // Both _time_of_last_gc and now are set using a time source > // that guarantees monotonically non-decreasing values provided > // the underlying platform provides such a source. So we still > // have to guard against non-monotonicity. > NOT_PRODUCT( > if (now < _time_of_last_gc_ns) { > log_warning(gc)("time warp: " JLONG_FORMAT " to " JLONG_FORMAT, _time_of_last_gc_ns, now); > } > ) > return _time_of_last_gc_ns; > } virtual void update_time_of_last_gc(jlong now) { > _time_of_last_gc_ns = now; > } > > comments should be edited to reflect the changes, in addition to _ns > I think I caught them all in http://cr.openjdk.java.net/~tschatzl/8165501/webrev.1 no incremental one since almost everything changed anyway, doing another pass at making the code look similar. Retested using the mentioned test. 
Thanks, Thomas From thomas.schatzl at oracle.com Thu Jun 25 11:53:07 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 25 Jun 2020 13:53:07 +0200 Subject: RFR (M): 8247819: G1: Process strong OopStorage entries in parallel In-Reply-To: <79dac1ee-0217-bd18-2ccd-8611693d88a7@oracle.com> References: <0d929030-5be0-c9f7-54af-ffa87f7c39c9@oracle.com> <79dac1ee-0217-bd18-2ccd-8611693d88a7@oracle.com> Message-ID: Hi Stefan, thanks for your review. On 25.06.20 13:28, Stefan Karlsson wrote: > The formatting looked weird. I'll try again: > > Hi Thomas, > > This isn't needed after we rewrote the OopStorageSetParState: > > +// Needed by _oop_storage_set_strong_par_state as the definition is in the > +// .inline.hpp file. > +G1RootProcessor::~G1RootProcessor() {} Removed. > > --- > This doesn't seem to be used: > ?+ > ?+ template > ?+ static void strong_oops_do(Closure* cl); }; The method is still used by Parallel and Serial GC. > > --- > Just a suggestion to lower the noise: > + G1GCPhaseTimes::GCParPhases phase = > G1GCPhaseTimes::GCParPhases(G1GCPhaseTimes::StrongOopStorageSetRoots + i); > > could be changed to: > + G1GCPhaseTimes::GCParPhases > phase(G1GCPhaseTimes::StrongOopStorageSetRoots + i); This does not work (compile). G1GCPhaseTimes::GCParPhases is an enum, not a class. New webrevs: http://cr.openjdk.java.net/~tschatzl/8247819/webrev.0_to_1/ (diff) http://cr.openjdk.java.net/~tschatzl/8247819/webrev.1/ (full) Testing: recompilation Thanks, Thomas From ivan.walulya at oracle.com Thu Jun 25 13:18:57 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Thu, 25 Jun 2020 15:18:57 +0200 Subject: RFR (S): 8165501: Serial, Parallel and G1 should track the time since last gc for millis_since_last_gc() with full precision In-Reply-To: <2cf64df3-4bbb-b346-3b56-415479064c7f@oracle.com> References: <71723e44-d2f0-8c15-7a99-054c0dd870bc@oracle.com> <2cf64df3-4bbb-b346-3b56-415479064c7f@oracle.com> Message-ID: Thanks for the clean up. Looks good. //Ivan > On 25 Jun 2020, at 13:44, Thomas Schatzl wrote: > > Hi Ivan, > > thanks for your review! > > On 25.06.20 11:47, Ivan Walulya wrote: >> Hi Thomas, >> A few comments below, >> src/hotspot/share/gc/parallel/psParallelCompact.cpp >> - jlong now = os::javaTimeNanos() / NANOSECS_PER_MILLISEC; >> - jlong ret_val = now - _time_of_last_gc; >> + jlong now = os::javaTimeNanos(); >> + jlong ret_val = (now - _time_of_last_gc_ns) / NANOSECS_PER_MILLISEC; >> |jlong now| => |jlong now_ns| i see you do this for most of the variable names that refer to time in Nanossrc/hotspot/share/gc/g1/g1Policy.hpp >> + jlong time_of_last_gc() { return _time_of_last_gc_ns; } >> same as previous comment, better to maintain the _nssrc/hotspot/share/gc/shared/generation.hpp >> // Time (in ms) when we were last collected or now if a collection is >> // in progress. >> virtual jlong time_of_last_gc(jlong now) { >> // Both _time_of_last_gc and now are set using a time source >> // that guarantees monotonically non-decreasing values provided >> // the underlying platform provides such a source. So we still >> // have to guard against non-monotonicity. 
>> NOT_PRODUCT( >> if (now < _time_of_last_gc_ns) { >> log_warning(gc)("time warp: " JLONG_FORMAT " to " JLONG_FORMAT, _time_of_last_gc_ns, now); >> } >> ) >> return _time_of_last_gc_ns; >> } virtual void update_time_of_last_gc(jlong now) { >> _time_of_last_gc_ns = now; >> } >> comments should be edited to reflect the changes, in addition to _ns > > I think I caught them all in > > http://cr.openjdk.java.net/~tschatzl/8165501/webrev.1 > > no incremental one since almost everything changed anyway, doing another pass at making the code look similar. > > Retested using the mentioned test. > > Thanks, > Thomas From thomas.schatzl at oracle.com Thu Jun 25 13:24:31 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 25 Jun 2020 15:24:31 +0200 Subject: RFR (M): 8247819: G1: Process strong OopStorage entries in parallel In-Reply-To: References: <0d929030-5be0-c9f7-54af-ffa87f7c39c9@oracle.com> <79dac1ee-0217-bd18-2ccd-8611693d88a7@oracle.com> Message-ID: <5a498936-62e5-dbfd-0c1d-d881876f8ef5@oracle.com> Hi all, On 25.06.20 13:53, Thomas Schatzl wrote: > Hi Stefan, > > ? thanks for your review. > > On 25.06.20 13:28, Stefan Karlsson wrote: >> The formatting looked weird. I'll try again: >> >> Hi Thomas, >> >> This isn't needed after we rewrote the OopStorageSetParState: >> >> +// Needed by _oop_storage_set_strong_par_state as the definition is >> in the >> +// .inline.hpp file. >> +G1RootProcessor::~G1RootProcessor() {} > > Removed. > >> >> --- >> This doesn't seem to be used: >> ??+ >> ??+ template >> ??+ static void strong_oops_do(Closure* cl); }; > > The method is still used by Parallel and Serial GC. Stefan made me aware that I looked at the wrong strong_oops_do(). Removed the new one, and regenerated the webrev. Thanks, Thomas From ivan.walulya at oracle.com Thu Jun 25 13:56:41 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Thu, 25 Jun 2020 15:56:41 +0200 Subject: RFR (S): 8243974: Move G1CollectedHeap::millis_since_last_gc support from G1Policy In-Reply-To: References: Message-ID: <3EF197D1-F89A-4B51-89F9-D4801EEE9719@oracle.com> Looks good to me! //Ivan > On 25 Jun 2020, at 09:39, Thomas Schatzl wrote: > > Hi all, > > please review the following change that moves G1CollectedHeap::millis_since_last_gc support away from G1Policy to G1CollectedHeap. That functionality has not much to do with policy decisions at all. > > Fwiw, found the issue 8248221 when fixing this. Will send out that one soon. > > Based on 8165501. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8243974 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8243974/webrev/ > Testing: > tier1-5 with 8165501, 8248221; see also 8248221 for manual testing info > > Thanks, > Thomas From patrick at os.amperecomputing.com Thu Jun 25 14:00:48 2020 From: patrick at os.amperecomputing.com (Patrick Zhang OS) Date: Thu, 25 Jun 2020 14:00:48 +0000 Subject: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Message-ID: Fixed the typo with TaskQueueSuper Regards Patrick -----Original Message----- From: hotspot-gc-dev On Behalf Of Patrick Zhang OS Sent: Wednesday, June 24, 2020 5:55 PM To: jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev Subject: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Hi Could I ask for a review of this simple patch which takes a tiny part from the original ticket JDK-8243326 [1]. 
The reason that I do not want a full backport is, the majority of the patch at jdk/jdk [2] is to clean up the volatile use and may be not very meaningful to 11u, furthermore the context (dependencies on atomic.hpp refactor) is too complicated to generate a clear backport (I tried, ~81 files need to be changed). The purpose of having this one-line change to 11u is, the two volatile variables in TaskQueueSuper: _bottom, _age and corresponding atomic operations upon, may cause severe cache contention inside GC with larger number of threads, i.e., specified by -XX:ParallelGCThreads=##, adding paddings (up to DEFAULT_CACHE_LINE_SIZE) in-between can reduce the possibility of false-sharing cache contention. I do not need the paddings before _bottom and after _age from the original patch [2], because the instances of TaskQueueSuper are usually (always) allocated in a set of queues, in which they are naturally separated. Please review, thanks. JBS: https://bugs.openjdk.java.net/browse/JDK-8248214 Webrev: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.01/ Testing: tier1-2 pass with the patch, commercial benchmarks and small C++ test cases (to simulate the data struct and work-stealing algorithm atomics) validated the performance, no regression. By the way, I am going to request for 8u backport as well once 11u would have it. [1] https://bugs.openjdk.java.net/browse/JDK-8243326 Cleanup use of volatile in taskqueue code [2] https://hg.openjdk.java.net/jdk/jdk/rev/252a1602b4c6 Regards Patrick From thomas.schatzl at oracle.com Thu Jun 25 14:01:25 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 25 Jun 2020 16:01:25 +0200 Subject: RFR (S): 8243974: Move G1CollectedHeap::millis_since_last_gc support from G1Policy In-Reply-To: <3EF197D1-F89A-4B51-89F9-D4801EEE9719@oracle.com> References: <3EF197D1-F89A-4B51-89F9-D4801EEE9719@oracle.com> Message-ID: Hi, On 25.06.20 15:56, Ivan Walulya wrote: > Looks good to me! > > //Ivan thanks for your review. Thomas From kim.barrett at oracle.com Thu Jun 25 18:17:06 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 25 Jun 2020 14:17:06 -0400 Subject: RFR (S): 8243974: Move G1CollectedHeap::millis_since_last_gc support from G1Policy In-Reply-To: References: Message-ID: <64F124D1-E3BB-4A8F-AB3C-B069EF9981C3@oracle.com> > On Jun 25, 2020, at 3:39 AM, Thomas Schatzl wrote: > > Hi all, > > please review the following change that moves G1CollectedHeap::millis_since_last_gc support away from G1Policy to G1CollectedHeap. That functionality has not much to do with policy decisions at all. > > Fwiw, found the issue 8248221 when fixing this. Will send out that one soon. > > Based on 8165501. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8243974 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8243974/webrev/ > Testing: > tier1-5 with 8165501, 8248221; see also 8248221 for manual testing info > > Thanks, > Thomas Looks good. From kim.barrett at oracle.com Thu Jun 25 18:38:58 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 25 Jun 2020 14:38:58 -0400 Subject: RFR (M): 8247819: G1: Process strong OopStorage entries in parallel In-Reply-To: References: <0d929030-5be0-c9f7-54af-ffa87f7c39c9@oracle.com> <79dac1ee-0217-bd18-2ccd-8611693d88a7@oracle.com> Message-ID: <4532B7D9-2351-4363-8823-4717F1B593C7@oracle.com> > On Jun 25, 2020, at 7:53 AM, Thomas Schatzl wrote: > > Hi Stefan, > > thanks for your review. > > On 25.06.20 13:28, Stefan Karlsson wrote: >> The formatting looked weird. 
I'll try again: >> Hi Thomas, >> This isn't needed after we rewrote the OopStorageSetParState:
>>
>> +// Needed by _oop_storage_set_strong_par_state as the definition is in the
>> +// .inline.hpp file.
>> +G1RootProcessor::~G1RootProcessor() {}
> > Removed. > >> >> --- >> This doesn't seem to be used:
>>
>> +
>> + template <typename Closure>
>> + static void strong_oops_do(Closure* cl); };
> > The method is still used by Parallel and Serial GC. > >> --- >> Just a suggestion to lower the noise:
>>
>> + G1GCPhaseTimes::GCParPhases phase = G1GCPhaseTimes::GCParPhases(G1GCPhaseTimes::StrongOopStorageSetRoots + i);
>>
>> could be changed to:
>>
>> + G1GCPhaseTimes::GCParPhases phase(G1GCPhaseTimes::StrongOopStorageSetRoots + i);
> > This does not work (compile). G1GCPhaseTimes::GCParPhases is an enum, not a class. > > New webrevs: > http://cr.openjdk.java.net/~tschatzl/8247819/webrev.0_to_1/ (diff) > http://cr.openjdk.java.net/~tschatzl/8247819/webrev.1/ (full) > > Testing: > recompilation > > Thanks, > Thomas Looks good. One minor thing, for which I don't need a new webrev if you take this suggestion:
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1GCPhaseTimes.cpp
  69 int counter = 0;
  70 for (OopStorageSet::Iterator it = OopStorageSet::strong_iterator(); !it.is_end(); ++it, ++counter) {
 ...
  75   uint index = G1GCPhaseTimes::StrongOopStorageSetRoots + counter;

Rather than separate counter and index, maybe

  uint index = G1GCPhaseTimes::StrongOopStorageSetRoots;
  for (... ++index) {
------------------------------------------------------------------------------
From kim.barrett at oracle.com Thu Jun 25 18:41:48 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 25 Jun 2020 14:41:48 -0400 Subject: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention In-Reply-To: References: Message-ID: <36FDC687-128D-44CF-8E63-97E12807D2BD@oracle.com> > On Jun 25, 2020, at 10:00 AM, Patrick Zhang OS wrote: > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8248214 > > Webrev: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.01/ > > Testing: tier1-2 pass with the patch, commercial benchmarks and small C++ test cases (to simulate the data struct and work-stealing algorithm atomics) validated the performance, no regression. Looks good.
From kim.barrett at oracle.com Thu Jun 25 18:49:48 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 25 Jun 2020 14:49:48 -0400 Subject: RFR (S): 8248221: G1: millis_since_last_gc updated at wrong time In-Reply-To: References: Message-ID: > On Jun 25, 2020, at 4:04 AM, Thomas Schatzl wrote: > > CR: > https://bugs.openjdk.java.net/browse/JDK-8248221 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8248221/webrev/ > Testing: > tier1-5 with 8243974 and 8165501, manual testing as described above.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1CollectedHeap.cpp
 2067     _time_of_last_gc_ns = os::javaTimeNanos();

This variable seems really poorly named. _time_for_millis_since_last_gc ? Of course, millis_since_last_gc's name seems pretty poor too. Maybe update those names as a followup.
------------------------------------------------------------------------------
Other than that naming issue, looks good.
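For readers following the naming discussion just above, here is a minimal sketch of the accessor/update split being suggested. The class and field names are illustrative assumptions, not the actual webrev code; os::javaTimeNanos() and NANOSECS_PER_MILLISEC are the existing HotSpot primitives:

    // Hypothetical shape: each collector records the end of a complete
    // liveness analysis cycle, and the sun.rmi hook converts the elapsed
    // time to milliseconds on demand.
    class LastGcTimeTracker {
      jlong _last_liveness_analysis_ns;  // clearer than _time_of_last_gc_ns
    public:
      void record_liveness_analysis_end() {
        _last_liveness_analysis_ns = os::javaTimeNanos();
      }
      jlong millis_since_last_gc() const {
        jlong elapsed_ns = os::javaTimeNanos() - _last_liveness_analysis_ns;
        return elapsed_ns / NANOSECS_PER_MILLISEC;
      }
    };

Keeping the update in a single non-virtual function called by each collector, as suggested, would make the monotonic-clock bookkeeping uniform across the GCs.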
From ivan.walulya at oracle.com Thu Jun 25 19:05:13 2020 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Thu, 25 Jun 2020 21:05:13 +0200 Subject: RFR (S): 8248221: G1: millis_since_last_gc updated at wrong time In-Reply-To: References: Message-ID: <1ABE6DE0-CC10-449F-9E5E-029821D631DE@oracle.com> Looks good! //Ivan > On 25 Jun 2020, at 10:04, Thomas Schatzl wrote: > > Hi all, > > can I have reviews for this change that fixes the location at which the millis_since_last_gc timestamp is updated. > > The spec of the only user in sun.rmi.transport.GC.maxObjectInspectionAge says:
>
>  * Returns the maximum object-inspection age, which is the number
>  * of real-time milliseconds that have elapsed since the
>  * least-recently-inspected heap object was last inspected by the garbage
>  * collector.
>
> Currently we do that update only at young gc, which is wrong. We should do that if/when a complete liveness analysis cycle has finished instead, i.e. end of marking/end of full gc. > > I did some testing using the TestMillisSinceLastGC.java jtreg test in http://cr.openjdk.java.net/~tschatzl/8248221/webrev.test/. As expected, it failed for G1 but all other GCs tested there pass (Epsilon will probably also fail, because it never does any gc so I did not even try). > > However the test intrinsically depends on timing, i.e. the duration between some GC and checking the CollectedHeap::millis_since_last_gc() result, so I did not add it to this webrev. While in my ci testing it did not fail, I deemed it too risky to add. > I also eyeballed that the output is still reasonable (i.e. no unit problem introduced by 8165501) using its log output for all affected collectors. > > Depends on 8243974. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8248221 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8248221/webrev/ > Testing: > tier1-5 with 8243974 and 8165501, manual testing as described above. > > Thanks, > Thomas
From stefan.karlsson at oracle.com Thu Jun 25 19:40:41 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 25 Jun 2020 21:40:41 +0200 Subject: RFR: 8248346: Move OopStorage mutex setup out from OopStorageSet Message-ID: <9d922bd0-f1ad-a058-7699-f8805752b0c6@oracle.com> Hi all, Please review this small patch to move the OopStorage mutex creation out from oopStorageSet.cpp and put it inside the OopStorage constructor. https://cr.openjdk.java.net/~stefank/8248346/webrev.01 https://bugs.openjdk.java.net/browse/JDK-8248346 So far I've only tested this locally with gtest. The product code should be the same, but the test used slightly different values when initializing the mutexes. That doesn't seem to affect the tests. Thanks, StefanK
From goetz.lindenmaier at sap.com Fri Jun 26 07:17:24 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 26 Jun 2020 07:17:24 +0000 Subject: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention In-Reply-To: References: Message-ID: Hi Patrick, I had a look at your change. I think it makes sense to bring this to 11, if there actually is the performance gain you mention. Reviewed. Please add the risk of downporting this in the "Fix request" comment in JBS. And I think it should be "Fix request (11u)" because different people will review your fix request for 11 and 8. Best regards, Goetz.
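As background for the 8244214 review above, a simplified sketch of the padding idea follows. This is illustrative only -- the real TaskQueueSuper is templated, _age is a tagged union rather than a plain uint, and the actual one-line change is the one in the webrev; DEFAULT_CACHE_LINE_SIZE is HotSpot's existing platform constant:

    // Keep the owner-written _bottom and the stealer-CASed _age on
    // separate cache lines, so CAS traffic from work-stealing threads
    // does not keep invalidating the queue owner's cache line.
    class TaskQueueSuperSketch {
      volatile uint _bottom;  // only pushed/popped by the owning thread
      // pad out the remainder of _bottom's cache line
      char _pad[DEFAULT_CACHE_LINE_SIZE - sizeof(uint)];
      volatile uint _age;     // CAS'ed by stealing threads
    };

As Patrick notes, padding before _bottom and after _age is unnecessary here because the queues are allocated as a set and the neighboring queue objects already separate those fields.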
> -----Original Message----- > From: jdk-updates-dev On > Behalf Of Patrick Zhang OS > Sent: Wednesday, June 24, 2020 11:55 AM > To: jdk-updates-dev at openjdk.java.net > Cc: hotspot-gc-dev > Subject: [DMARC FAILURE] [11u] RFR: 8244214: Add paddings for > TaskQueuSuper to reduce false-sharing cache contention > > Hi > > Could I ask for a review of this simple patch which takes a tiny part from the > original ticket JDK-8243326 [1]. The reason that I do not want a full backport > is, the majority of the patch at jdk/jdk [2] is to clean up the volatile use and > may not be very meaningful to 11u; furthermore the context (dependencies > on atomic.hpp refactor) is too complicated to generate a clear backport (I > tried, ~81 files need to be changed). > > The purpose of having this one-line change to 11u is, the two volatile > variables in TaskQueuSuper: _bottom, _age and corresponding atomic > operations upon, may cause severe cache contention inside GC with larger > number of threads, i.e., specified by -XX:ParallelGCThreads=##, adding > paddings (up to DEFAULT_CACHE_LINE_SIZE) in-between can reduce the > possibility of false-sharing cache contention. I do not need the paddings > before _bottom and after _age from the original patch [2], because the > instances of TaskQueuSuper are usually (always) allocated in a set of queues, > in which they are naturally separated. Please review, thanks. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8248214 > Webrev: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.01/ > Testing: tier1-2 pass with the patch, commercial benchmarks and small C++ > test cases (to simulate the data struct and work-stealing algorithm atomics) > validated the performance, no regression. > > By the way, I am going to request for 8u backport as well once 11u would > have it. > > [1] https://bugs.openjdk.java.net/browse/JDK-8243326 Cleanup use of > volatile in taskqueue code > [2] https://hg.openjdk.java.net/jdk/jdk/rev/252a1602b4c6 > > Regards > Patrick >
From stefan.karlsson at oracle.com Fri Jun 26 08:50:53 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 26 Jun 2020 10:50:53 +0200 Subject: [15] RFR: 8248048: ZGC: AArch64: SIGILL in load barrier register spilling Message-ID: Hi all, Please review this patch to fix a ZGC load barrier register spilling bug. https://cr.openjdk.java.net/~stefank/8248048/webrev.01/ https://bugs.openjdk.java.net/browse/JDK-8248048 The JVM crashed with an ILL_ILLOPC when executing this instruction in our load barrier stub:

  ldp q31, q31, [sp, #224]

The entire load barrier stub:

  0x0000ffff998ab964:  stp   x10, x13, [sp, #-32]!
  0x0000ffff998ab968:  stp   x14, x17, [sp, #16]
  0x0000ffff998ab96c:  stp   q1, q2, [sp, #-256]!
  0x0000ffff998ab970:  stp   q3, q19, [sp, #32]
  0x0000ffff998ab974:  stp   q20, q21, [sp, #64]
  0x0000ffff998ab978:  stp   q22, q23, [sp, #96]
  0x0000ffff998ab97c:  stp   q24, q25, [sp, #128]
  0x0000ffff998ab980:  stp   q26, q28, [sp, #160]
  0x0000ffff998ab984:  stp   q29, q30, [sp, #192]
  0x0000ffff998ab988:  stp   q31, q31, [sp, #224]
  0x0000ffff998ab98c:  sub   x1, x10, #0x0
  0x0000ffff998ab990:  mov   x0, x11
  0x0000ffff998ab994:  mov   x8, #0xfc28        // #64552
  0x0000ffff998ab998:  movk  x8, #0xaf11, lsl #16
  0x0000ffff998ab99c:  movk  x8, #0xffff, lsl #32
  0x0000ffff998ab9a0:  blr   x8 ; branch into the JVM
  0x0000ffff998ab9a4:  mov   x11, x0
  0x0000ffff998ab9a8:  ldp   q3, q19, [sp, #32]
  0x0000ffff998ab9ac:  ldp   q20, q21, [sp, #64]
  0x0000ffff998ab9b0:  ldp   q22, q23, [sp, #96]
  0x0000ffff998ab9b4:  ldp   q24, q25, [sp, #128]
  0x0000ffff998ab9b8:  ldp   q26, q28, [sp, #160]
  0x0000ffff998ab9bc:  ldp   q29, q30, [sp, #192]
=> 0x0000ffff998ab9c0:  ldp   q31, q31, [sp, #224]
  0x0000ffff998ab9c4:  ldp   q1, q2, [sp], #256
  0x0000ffff998ab9c8:  ldp   x14, x17, [sp, #16]
  0x0000ffff998ab9cc:  ldp   x10, x13, [sp], #32
  0x0000ffff998ab9d0:  b     0xffff998aa718

It seems to be illegal to use the same register twice when loading into a pair of registers. I verified that that was the problem, and not the usage of zr (see below) that caused some weird encoding, by changing the code to always generate stp/ldp with the same register:

=> 0x0000ffff757d22fc:  ldp   q20, q20, [sp, #32] ; Crash here as well
  0x0000ffff757d2300:  ldp   q21, q21, [sp, #48]
  0x0000ffff757d2304:  ldp   q22, q22, [sp, #64]

The code that generates this instruction is MacroAssembler::push_fp, which spills the necessary registers in pairs with stp/ldp calls. If the number of registers to spill is odd it needs to deal with one of the registers separately. This is done by adding a dummy register here:

 2136     regs[count++] = zr->encoding_nocheck();
 2137     count &= ~1;  // Only push an even number of regs

This scheme seems to work for the normal registers (MacroAssembler::push), but the usage of zr seems dubious when we're dealing with the fp/simd version of stp/ldp. My proposed patch replaces the stp/ldp for the odd numbered register with the single-register versions: str/ldr. I make sure to keep the stack 16 bytes aligned by still bumping 16 bytes, but skipping the store/load to the second 8 bytes half. Note that right now MacroAssembler::push_fp is only used by ZGC. This fixes the crash. I've run this code through jtreg groups :tier1, tier2, tier3, and an Oracle-internal stress suite without any new problems. The smallest reproducer I have is:

  make -C ../build/fastdebug test TEST=test/jdk/java/util/concurrent/ JTREG="JAVA_OPTIONS=-XX:+UseZGC"

Does this look OK? Thanks, StefanK
From kim.barrett at oracle.com Fri Jun 26 09:43:13 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 26 Jun 2020 05:43:13 -0400 Subject: RFR: 8248346: Move OopStorage mutex setup out from OopStorageSet In-Reply-To: <9d922bd0-f1ad-a058-7699-f8805752b0c6@oracle.com> References: <9d922bd0-f1ad-a058-7699-f8805752b0c6@oracle.com> Message-ID: <6F9FFDB8-0E4C-429C-8995-72C48C51E2FD@oracle.com> > On Jun 25, 2020, at 3:40 PM, Stefan Karlsson wrote: > > Hi all, > > Please review this small patch to move the OopStorage mutex creation out from oopStorageSet.cpp and put it inside the OopStorage constructor. > > https://cr.openjdk.java.net/~stefank/8248346/webrev.01 > https://bugs.openjdk.java.net/browse/JDK-8248346 > > So far I've only tested this locally with gtest. The product code should be the same, but the test used slightly different values when initializing the mutexes. That doesn't seem to affect the tests. > > Thanks, > StefanK Looks good.
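Returning to the push_fp fix described above: a toy model of the spill plan it implements, runnable outside HotSpot. This is not the actual MacroAssembler code, only the pairing logic -- offsets are in 16-byte slots per q register, matching the stub listing:

    #include <cstdio>

    // Spill FP registers in pairs; a trailing odd register gets a single
    // str into its own slot instead of being "paired with itself", which
    // is the encoding that raised SIGILL above.
    static void plan_fp_spill(const int* regs, int count) {
      int i = 0;
      for (; i + 1 < count; i += 2) {
        printf("stp q%d, q%d, [sp, #%d]\n", regs[i], regs[i + 1], i * 16);
      }
      if (i < count) {
        // Single-register store; SP is still bumped by a full slot so
        // the stack stays 16-byte aligned.
        printf("str q%d, [sp, #%d]\n", regs[i], i * 16);
      }
    }

For example, three registers {1, 2, 3} would produce "stp q1, q2" followed by "str q3", rather than the bogus "stp q3, q3" pair the old zr-dummy scheme generated.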
From kim.barrett at oracle.com Fri Jun 26 09:44:29 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 26 Jun 2020 05:44:29 -0400 Subject: [15, testbug]: 8248306: gc/stress/gclocker/TestExcessGCLockerCollections.java does not compile In-Reply-To: <5a57528d-32f3-9dd3-70d9-3afa59a97f99@oracle.com> References: <5a57528d-32f3-9dd3-70d9-3afa59a97f99@oracle.com> Message-ID: <4E1C6025-3684-4E04-B5A7-5FF437985FC6@oracle.com> > On Jun 25, 2020, at 5:40 AM, Thomas Schatzl wrote: > > Hi all, > > can I have reviews for this testbug? > > Since JDK-8244010: Simplify usages of ProcessTools.createJavaProcessBuilder in our tests the gc/stress/gclocker/TestExcessGCLockerCollections.java test does not compile any more because its use of ProcessTools.createJavaProcessBuilder has not been updated. > > I simply replaced the ProcessTools.createJavaProcessBuilder() call by ProcessTools.executeTestJvm which according to the discussion for 8244010 does the right thing. > > I would like to fix this in 15, because 8244010 is in 15. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8248306 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8248306/webrev/ > Testing: > local run of the test > > Thanks, > Thomas Looks good. From thomas.schatzl at oracle.com Fri Jun 26 09:46:45 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 26 Jun 2020 11:46:45 +0200 Subject: [15, testbug]: 8248306: gc/stress/gclocker/TestExcessGCLockerCollections.java does not compile In-Reply-To: <4E1C6025-3684-4E04-B5A7-5FF437985FC6@oracle.com> References: <5a57528d-32f3-9dd3-70d9-3afa59a97f99@oracle.com> <4E1C6025-3684-4E04-B5A7-5FF437985FC6@oracle.com> Message-ID: Hi Kim, On 26.06.20 11:44, Kim Barrett wrote: >> On Jun 25, 2020, at 5:40 AM, Thomas Schatzl wrote: >> >> Hi all, >> >> can I have reviews for this testbug? >> >> Since JDK-8244010: Simplify usages of ProcessTools.createJavaProcessBuilder in our tests the gc/stress/gclocker/TestExcessGCLockerCollections.java test does not compile any more because its use of ProcessTools.createJavaProcessBuilder has not been updated. >> >> I simply replaced the ProcessTools.createJavaProcessBuilder() call by ProcessTools.executeTestJvm which according to the discussion for 8244010 does the right thing. >> >> I would like to fix this in 15, because 8244010 is in 15. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8248306 >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8248306/webrev/ >> Testing: >> local run of the test >> >> Thanks, >> Thomas > > Looks good. > thanks for your review. Thomas From stefan.karlsson at oracle.com Fri Jun 26 09:48:10 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 26 Jun 2020 11:48:10 +0200 Subject: RFR: 8248346: Move OopStorage mutex setup out from OopStorageSet In-Reply-To: <6F9FFDB8-0E4C-429C-8995-72C48C51E2FD@oracle.com> References: <9d922bd0-f1ad-a058-7699-f8805752b0c6@oracle.com> <6F9FFDB8-0E4C-429C-8995-72C48C51E2FD@oracle.com> Message-ID: <5fa0d6d2-0a24-9a7d-eb1d-6f67f2894a17@oracle.com> Thanks, Kim. StefanK On 2020-06-26 11:43, Kim Barrett wrote: >> On Jun 25, 2020, at 3:40 PM, Stefan Karlsson wrote: >> >> Hi all, >> >> Please review this small patch to move the OopStorage mutex creation out from oopStorageSet.cpp and put it inside the OopStorage constructor. >> >> https://cr.openjdk.java.net/~stefank/8248346/webrev.01 >> https://bugs.openjdk.java.net/browse/JDK-8248346 >> >> So far I've only tested this locally with gtest. 
The product code should be the same, but the test used slightly different values when initializing the mutexes. That doesn't seem to affect the tests. >> >> Thanks, >> StefanK > Looks good. >
From thomas.schatzl at oracle.com Fri Jun 26 09:49:36 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 26 Jun 2020 11:49:36 +0200 Subject: RFR (S): 8243974: Move G1CollectedHeap::millis_since_last_gc support from G1Policy In-Reply-To: <64F124D1-E3BB-4A8F-AB3C-B069EF9981C3@oracle.com> References: <64F124D1-E3BB-4A8F-AB3C-B069EF9981C3@oracle.com> Message-ID: Hi, On 25.06.20 20:17, Kim Barrett wrote: >> On Jun 25, 2020, at 3:39 AM, Thomas Schatzl wrote: >> >> Hi all, >> >> please review the following change that moves G1CollectedHeap::millis_since_last_gc support away from G1Policy to G1CollectedHeap. That functionality has not much to do with policy decisions at all. >> >> Fwiw, found the issue 8248221 when fixing this. Will send out that one soon. >> >> Based on 8165501. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8243974 >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8243974/webrev/ >> Testing: >> tier1-5 with 8165501, 8248221; see also 8248221 for manual testing info >> >> Thanks, >> Thomas > > Looks good. > thanks for your review. Thomas
From thomas.schatzl at oracle.com Fri Jun 26 09:50:13 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 26 Jun 2020 11:50:13 +0200 Subject: RFR (S): 8165501: Serial, Parallel and G1 should track the time since last gc for millis_since_last_gc() with full precision In-Reply-To: References: <71723e44-d2f0-8c15-7a99-054c0dd870bc@oracle.com> <2cf64df3-4bbb-b346-3b56-415479064c7f@oracle.com> Message-ID: <02581f0f-f980-6d83-2f4f-88f98040e9a1@oracle.com> Hi Ivan, On 25.06.20 15:18, Ivan Walulya wrote: > Thanks for the clean up. > > Looks good. > > //Ivan thanks for your review. Thomas
From adinn at redhat.com Fri Jun 26 10:05:35 2020 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 26 Jun 2020 11:05:35 +0100 Subject: [15] RFR: 8248048: ZGC: AArch64: SIGILL in load barrier register spilling In-Reply-To: References: Message-ID: <5261f83d-46bf-e469-0e40-7b8b534d86c6@redhat.com> Hi Stefan, Yes, nice catch. zr is clearly the wrong choice here. In the context of an FP register it ends up being interpreted as q31 which, as you show, clashes when r31 is the last register in an odd register set. Your fix looks ok to me (so count it as reviewed). Just for your interest, another alternative would be to continue to use stpq instructions but replace zr with an fp register that is a) in the save set but b) guaranteed to differ from the last register -- the obvious choice being regs[0]. That will work in all cases where count >= 2 (well 3 actually since the problem only arises in odd cases). So, you would still need to special case count == 0/1 and only add regs[0] to the set after handling those cases. I'm agnostic over which of these two is better as I don't think either a stpq or a strq pre/post is preferable to the other. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill On 26/06/2020 09:50, Stefan Karlsson wrote: > Hi all, > > Please review this patch to fix a ZGC load barrier register spilling bug.
>
> https://cr.openjdk.java.net/~stefank/8248048/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8248048
>
> The JVM crashed with an ILL_ILLOPC when executing this instruction in our load barrier stub:
>
>   ldp q31, q31, [sp, #224]
>
> The entire load barrier stub:
>
>   0x0000ffff998ab964:  stp   x10, x13, [sp, #-32]!
>   0x0000ffff998ab968:  stp   x14, x17, [sp, #16]
>   0x0000ffff998ab96c:  stp   q1, q2, [sp, #-256]!
>   0x0000ffff998ab970:  stp   q3, q19, [sp, #32]
>   0x0000ffff998ab974:  stp   q20, q21, [sp, #64]
>   0x0000ffff998ab978:  stp   q22, q23, [sp, #96]
>   0x0000ffff998ab97c:  stp   q24, q25, [sp, #128]
>   0x0000ffff998ab980:  stp   q26, q28, [sp, #160]
>   0x0000ffff998ab984:  stp   q29, q30, [sp, #192]
>   0x0000ffff998ab988:  stp   q31, q31, [sp, #224]
>   0x0000ffff998ab98c:  sub   x1, x10, #0x0
>   0x0000ffff998ab990:  mov   x0, x11
>   0x0000ffff998ab994:  mov   x8, #0xfc28        // #64552
>   0x0000ffff998ab998:  movk  x8, #0xaf11, lsl #16
>   0x0000ffff998ab99c:  movk  x8, #0xffff, lsl #32
>   0x0000ffff998ab9a0:  blr   x8 ; branch into the JVM
>   0x0000ffff998ab9a4:  mov   x11, x0
>   0x0000ffff998ab9a8:  ldp   q3, q19, [sp, #32]
>   0x0000ffff998ab9ac:  ldp   q20, q21, [sp, #64]
>   0x0000ffff998ab9b0:  ldp   q22, q23, [sp, #96]
>   0x0000ffff998ab9b4:  ldp   q24, q25, [sp, #128]
>   0x0000ffff998ab9b8:  ldp   q26, q28, [sp, #160]
>   0x0000ffff998ab9bc:  ldp   q29, q30, [sp, #192]
> => 0x0000ffff998ab9c0:  ldp   q31, q31, [sp, #224]
>   0x0000ffff998ab9c4:  ldp   q1, q2, [sp], #256
>   0x0000ffff998ab9c8:  ldp   x14, x17, [sp, #16]
>   0x0000ffff998ab9cc:  ldp   x10, x13, [sp], #32
>   0x0000ffff998ab9d0:  b     0xffff998aa718
>
> It seems to be illegal to use the same register twice when loading into a pair of registers. I verified that that was the problem, and not the usage of zr (see below) that caused some weird encoding, by changing the code to always generate stp/ldp with the same register:
>
> => 0x0000ffff757d22fc:  ldp   q20, q20, [sp, #32] ; Crash here as well
>   0x0000ffff757d2300:  ldp   q21, q21, [sp, #48]
>   0x0000ffff757d2304:  ldp   q22, q22, [sp, #64]
>
> The code that generates this instruction is MacroAssembler::push_fp, which spills the necessary registers in pairs with stp/ldp calls. If the number of registers to spill is odd it needs to deal with one of the registers separately. This is done by adding a dummy register here:
>
>  2136     regs[count++] = zr->encoding_nocheck();
>  2137     count &= ~1;  // Only push an even number of regs
>
> This scheme seems to work for the normal registers (MacroAssembler::push), but the usage of zr seems dubious when we're dealing with the fp/simd version of stp/ldp.
>
> My proposed patch replaces the stp/ldp for the odd numbered register with the single-register versions: str/ldr. I make sure to keep the stack 16 bytes aligned by still bumping 16 bytes, but skipping the store/load to the second 8 bytes half.
>
> Note that right now MacroAssembler::push_fp is only used by ZGC.
>
> This fixes the crash. I've run this code through jtreg groups :tier1, tier2, tier3, and an Oracle-internal stress suite without any new problems.
>
> The smallest reproducer I have is:
> make -C ../build/fastdebug test TEST=test/jdk/java/util/concurrent/ JTREG="JAVA_OPTIONS=-XX:+UseZGC"
>
> Does this look OK?
> > Thanks, > StefanK >
From erik.osterlund at oracle.com Fri Jun 26 10:11:48 2020 From: erik.osterlund at oracle.com (Erik Österlund) Date: Fri, 26 Jun 2020 12:11:48 +0200 Subject: RFR: 8248346: Move OopStorage mutex setup out from OopStorageSet In-Reply-To: <9d922bd0-f1ad-a058-7699-f8805752b0c6@oracle.com> References: <9d922bd0-f1ad-a058-7699-f8805752b0c6@oracle.com> Message-ID: Hi Stefan, Looks good. Thanks, /Erik On 2020-06-25 21:40, Stefan Karlsson wrote: > Hi all, > > Please review this small patch to move the OopStorage mutex creation out from oopStorageSet.cpp and put it inside the OopStorage constructor. > > https://cr.openjdk.java.net/~stefank/8248346/webrev.01 > https://bugs.openjdk.java.net/browse/JDK-8248346 > > So far I've only tested this locally with gtest. The product code should be the same, but the test used slightly different values when initializing the mutexes. That doesn't seem to affect the tests. > > Thanks, > StefanK
From aph at redhat.com Fri Jun 26 10:43:55 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 26 Jun 2020 11:43:55 +0100 Subject: [15] RFR: 8248048: ZGC: AArch64: SIGILL in load barrier register spilling In-Reply-To: <5261f83d-46bf-e469-0e40-7b8b534d86c6@redhat.com> References: <5261f83d-46bf-e469-0e40-7b8b534d86c6@redhat.com> Message-ID: On 26/06/2020 11:05, Andrew Dinn wrote: > Yes, nice catch. zr is clearly the wrong choice here. In the context of > an FP register it ends up being interpreted as q31 which, as you show, > clashes when r31 is the last register in an odd register set. OK. I'm sure we've seen this bug years ago and fixed it. Maybe the fix was never pushed, or maybe it was another instance of the same error. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
Thanks for your review, Thomas From thomas.schatzl at oracle.com Fri Jun 26 11:00:51 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 26 Jun 2020 13:00:51 +0200 Subject: RFR (S): 8248221: G1: millis_since_last_gc updated at wrong time In-Reply-To: <1ABE6DE0-CC10-449F-9E5E-029821D631DE@oracle.com> References: <1ABE6DE0-CC10-449F-9E5E-029821D631DE@oracle.com> Message-ID: Hi Ivan, On 25.06.20 21:05, Ivan Walulya wrote: > Looks good! > > //Ivan thanks for your review! Thomas From thomas.schatzl at oracle.com Fri Jun 26 11:31:04 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 26 Jun 2020 13:31:04 +0200 Subject: RFR (M): 8247819: G1: Process strong OopStorage entries in parallel In-Reply-To: <4532B7D9-2351-4363-8823-4717F1B593C7@oracle.com> References: <0d929030-5be0-c9f7-54af-ffa87f7c39c9@oracle.com> <79dac1ee-0217-bd18-2ccd-8611693d88a7@oracle.com> <4532B7D9-2351-4363-8823-4717F1B593C7@oracle.com> Message-ID: <9fb4442a-a6d6-ba51-1d7c-d2cb29e4045a@oracle.com> Hi Kim, On 25.06.20 20:38, Kim Barrett wrote: >> On Jun 25, 2020, at 7:53 AM, Thomas Schatzl wrote: >> >> Hi Stefan, >> >> thanks for your review. >> >> On 25.06.20 13:28, Stefan Karlsson wrote: [...] > > Looks good. > > One minor thing, for which I don?t need a new webrev if you take this suggestion: > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1GCPhaseTimes.cpp > 69 int counter = 0; > 70 for (OopStorageSet::Iterator it = OopStorageSet::strong_iterator(); !it.is_end(); ++it, ++counter) { > ... > 75 uint index = G1GCPhaseTimes::StrongOopStorageSetRoots + counter; > > Rather than separate counter and index, maybe > > uint index = G1GCPhaseTimes::StrongOopStorageSetRoots; > for (... ++index) { > > ------------------------------------------------------------------------------ > will do that, thanks. For reference, here is the latest webrev: http://cr.openjdk.java.net/~tschatzl/8247819/webrev.1_to_2/ (diff) http://cr.openjdk.java.net/~tschatzl/8247819/webrev.2/ (full) Thanks, Thomas From stefan.karlsson at oracle.com Fri Jun 26 11:35:50 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 26 Jun 2020 13:35:50 +0200 Subject: [15] RFR: 8248048: ZGC: AArch64: SIGILL in load barrier register spilling In-Reply-To: <5261f83d-46bf-e469-0e40-7b8b534d86c6@redhat.com> References: <5261f83d-46bf-e469-0e40-7b8b534d86c6@redhat.com> Message-ID: <39e71fba-1b24-1c16-3747-49fad78842b0@oracle.com> Hi Andrew, On 2020-06-26 12:05, Andrew Dinn wrote: > Hi Stefan, > > Yes, nice catch. zr is clearly the wrong choice here. In the context of > an FP register it ends up being interpreted as q31 which, as you show, > clashes when r31 is the last register in an odd register set. > > Your fix looks ok to me (so count it as reviewed). Thanks for reviewing! > > Just for your interest, another alternative would be to continue to use > stpq instructions but replace zr with an fp register that is a) in the > save set but b) guaranteed to differ from the last register,-- the > obvious choice being regs[0]. That will work in all cases where count >= > 2 (well 3 actually since the problem only arises in odd cases). So, you > would still need to special case count == 0/1 and only add regs[0] to > the set after handling those cases. I was thinking along the same lines first. The problem with the count == 1 case made me look for another solution. 
> > I'm agnostic over which of these two is better as I don't think either a > stpq or a strq pre/post is preferable to the other. OK. Again, thanks for taking the time to think about the different ways to fix this. StefanK > > regards, > > > Andrew Dinn > ----------- > Red Hat Distinguished Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > > On 26/06/2020 09:50, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to fix a ZGC load barrier register spilling bug. >> >> https://cr.openjdk.java.net/~stefank/8248048/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8248048 >> >> The JVM crashed with an ILL_ILLOPC when executing this instruction in >> our load barrier stub: >> >> ? ldp q31, q31, [sp, #224] >> >> The entire load barrier stub: >> >> ?? 0x0000ffff998ab964:??? stp??? x10, x13, [sp, #-32]! >> ?? 0x0000ffff998ab968:??? stp??? x14, x17, [sp, #16] >> ?? 0x0000ffff998ab96c:??? stp??? q1, q2, [sp, #-256]! >> ?? 0x0000ffff998ab970:??? stp??? q3, q19, [sp, #32] >> ?? 0x0000ffff998ab974:??? stp??? q20, q21, [sp, #64] >> ?? 0x0000ffff998ab978:??? stp??? q22, q23, [sp, #96] >> ?? 0x0000ffff998ab97c:??? stp??? q24, q25, [sp, #128] >> ?? 0x0000ffff998ab980:??? stp??? q26, q28, [sp, #160] >> ?? 0x0000ffff998ab984:??? stp??? q29, q30, [sp, #192] >> ?? 0x0000ffff998ab988:??? stp??? q31, q31, [sp, #224] >> ?? 0x0000ffff998ab98c:??? sub??? x1, x10, #0x0 >> ?? 0x0000ffff998ab990:??? mov??? x0, x11 >> ?? 0x0000ffff998ab994:??? mov??? x8, #0xfc28??????????????? ??? // #64552 >> ?? 0x0000ffff998ab998:??? movk??? x8, #0xaf11, lsl #16 >> ?? 0x0000ffff998ab99c:??? movk??? x8, #0xffff, lsl #32 >> ?? 0x0000ffff998ab9a0:??? blr??? x8 ; branch into the JVM >> ?? 0x0000ffff998ab9a4:??? mov??? x11, x0 >> ?? 0x0000ffff998ab9a8:??? ldp??? q3, q19, [sp, #32] >> ?? 0x0000ffff998ab9ac:??? ldp??? q20, q21, [sp, #64] >> ?? 0x0000ffff998ab9b0:??? ldp??? q22, q23, [sp, #96] >> ?? 0x0000ffff998ab9b4:??? ldp??? q24, q25, [sp, #128] >> ?? 0x0000ffff998ab9b8:??? ldp??? q26, q28, [sp, #160] >> ?? 0x0000ffff998ab9bc:??? ldp??? q29, q30, [sp, #192] >> => 0x0000ffff998ab9c0:??? ldp??? q31, q31, [sp, #224] >> ?? 0x0000ffff998ab9c4:??? ldp??? q1, q2, [sp], #256 >> ?? 0x0000ffff998ab9c8:??? ldp??? x14, x17, [sp, #16] >> ?? 0x0000ffff998ab9cc:??? ldp??? x10, x13, [sp], #32 >> ?? 0x0000ffff998ab9d0:??? b??? 0xffff998aa718 >> >> It seems to be illegal to use the same register twice when loading into >> a pair of registers. I verified that that was the problem, and not the >> usage of zr (see below) that caused some weird encoding, by changing the >> code to always generate stp/ldp with the same register: >> >> => 0x0000ffff757d22fc:??? ldp??? q20, q20, [sp, #32] ; Crash here as well >> ?? 0x0000ffff757d2300:??? ldp??? q21, q21, [sp, #48] >> ?? 0x0000ffff757d2304:??? ldp??? q22, q22, [sp, #64] >> >> The code that generates this instruction is MacroAssembler::push_fp, >> which spills the necessary registers in pairs with stp/ldp calls. If the >> number of registers to spill is odd it needs to deal with one of the >> registers separately. This is done by adding a dummy register here: >> >> 2136 regs[count++] = zr->encoding_nocheck(); >> 2137 count &= ~1; // Only push an even number of regs >> >> This scheme seems to work for the normal registers >> (MacroAssembler::push), but the usage of zr seems dubious when we're >> dealing with the fp/simd version of stp/ldp. 
>> >> My proposed patch replaces the stp/ldp for the odd numbered register >> with the single-register versions: str/ldr. I make sure to keep the >> stack 16 bytes aligned by still bumping 16 bytes, but skipping the >> store/load to the second 8 bytes half. >> >> Note that right now MacroAssembler::push_fp is only used by ZGC. >> >> This fixes the crash. I've run this code through jtreg groups :tier1, >> tier2, tier3, and an Oracle-internal stress suite without any new problems. >> >> The smallest reproducer I have is: >> make -C ../build/fastdebug test TEST=test/jdk/java/util/concurrent/ >> JTREG="JAVA_OPTIONS=-XX:+UseZGC" >> >> Does this look OK? >> >> Thanks, >> StefanK >> > From stefan.karlsson at oracle.com Fri Jun 26 11:36:57 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 26 Jun 2020 13:36:57 +0200 Subject: RFR: 8248346: Move OopStorage mutex setup out from OopStorageSet In-Reply-To: References: <9d922bd0-f1ad-a058-7699-f8805752b0c6@oracle.com> Message-ID: <78d938d0-babb-e51f-96ba-54d5fbeaeb38@oracle.com> Thanks, Erik. StefanK On 2020-06-26 12:11, Erik ?sterlund wrote: > Hi Stefan, > > Looks good. > > Thanks, > /Erik > > On 2020-06-25 21:40, Stefan Karlsson wrote: >> Hi all, >> >> Please review this small patch to move the OopStorage mutex creation >> out from oopStorageSet.cpp and put it inside the OopStorage constructor. >> >> https://cr.openjdk.java.net/~stefank/8248346/webrev.01 >> https://bugs.openjdk.java.net/browse/JDK-8248346 >> >> So far I've only tested this locally with gtest. The product code >> should be the same, but the test used slightly different values when >> initializing the mutexes. That doesn't seem to affect the tests. >> >> Thanks, >> StefanK > From stefan.karlsson at oracle.com Fri Jun 26 11:36:41 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 26 Jun 2020 13:36:41 +0200 Subject: [15] RFR: 8248048: ZGC: AArch64: SIGILL in load barrier register spilling In-Reply-To: References: <5261f83d-46bf-e469-0e40-7b8b534d86c6@redhat.com> Message-ID: <33b5349f-a4f0-01cb-855d-c00646a83c05@oracle.com> Thanks for looking at this. StefanK On 2020-06-26 12:43, Andrew Haley wrote: > On 26/06/2020 11:05, Andrew Dinn wrote: >> Yes, nice catch. zr is clearly the wrong choice here. In the context of >> an FP register it ends up being interpreted as q31 which, as you show, >> clashes when r31 is the last register in an odd register set. > > OK. > > I'm sure we've seen this bug years ago and fixed it. maybe the fix was > never pushed, or maybe it was another instance of the same error. > From kim.barrett at oracle.com Fri Jun 26 12:02:43 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 26 Jun 2020 08:02:43 -0400 Subject: RFR (S): 8248221: G1: millis_since_last_gc updated at wrong time In-Reply-To: References: Message-ID: <529496BB-9F91-4AAE-BB8A-ADD7C7FF74DB@oracle.com> > On Jun 26, 2020, at 7:00 AM, Thomas Schatzl wrote: > > Hi, > > On 25.06.20 20:49, Kim Barrett wrote: >>> On Jun 25, 2020, at 4:04 AM, Thomas Schatzl wrote: >>> >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8248221 >>> Webrev: >>> http://cr.openjdk.java.net/~tschatzl/8248221/webrev/ >>> Testing: >>> tier1-5 with 8243974 and 8165501, manual testing as described above. >> ------------------------------------------------------------------------------ >> src/hotspot/share/gc/g1/g1CollectedHeap.cpp >> 2067 _time_of_last_gc_ns = os::javaTimeNanos(); >> This variable seems really poorly named. >> _time_for_millis_since_last_gc ? 
>> My proposed patch replaces the stp/ldp for the odd numbered register with the single-register versions: str/ldr. I make sure to keep the stack 16 bytes aligned by still bumping 16 bytes, but skipping the store/load to the second 8 bytes half. >> >> Note that right now MacroAssembler::push_fp is only used by ZGC. >> >> This fixes the crash. I've run this code through jtreg groups :tier1, tier2, tier3, and an Oracle-internal stress suite without any new problems. >> >> The smallest reproducer I have is: >> make -C ../build/fastdebug test TEST=test/jdk/java/util/concurrent/ JTREG="JAVA_OPTIONS=-XX:+UseZGC" >> >> Does this look OK? >> >> Thanks, >> StefanK >> >
From stefan.karlsson at oracle.com Fri Jun 26 11:36:57 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 26 Jun 2020 13:36:57 +0200 Subject: RFR: 8248346: Move OopStorage mutex setup out from OopStorageSet In-Reply-To: References: <9d922bd0-f1ad-a058-7699-f8805752b0c6@oracle.com> Message-ID: <78d938d0-babb-e51f-96ba-54d5fbeaeb38@oracle.com> Thanks, Erik. StefanK On 2020-06-26 12:11, Erik Österlund wrote: > Hi Stefan, > > Looks good. > > Thanks, > /Erik > > On 2020-06-25 21:40, Stefan Karlsson wrote: >> Hi all, >> >> Please review this small patch to move the OopStorage mutex creation >> out from oopStorageSet.cpp and put it inside the OopStorage constructor. >> >> https://cr.openjdk.java.net/~stefank/8248346/webrev.01 >> https://bugs.openjdk.java.net/browse/JDK-8248346 >> >> So far I've only tested this locally with gtest. The product code >> should be the same, but the test used slightly different values when >> initializing the mutexes. That doesn't seem to affect the tests. >> >> Thanks, >> StefanK
From stefan.karlsson at oracle.com Fri Jun 26 11:36:41 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 26 Jun 2020 13:36:41 +0200 Subject: [15] RFR: 8248048: ZGC: AArch64: SIGILL in load barrier register spilling In-Reply-To: References: <5261f83d-46bf-e469-0e40-7b8b534d86c6@redhat.com> Message-ID: <33b5349f-a4f0-01cb-855d-c00646a83c05@oracle.com> Thanks for looking at this. StefanK On 2020-06-26 12:43, Andrew Haley wrote: > On 26/06/2020 11:05, Andrew Dinn wrote: >> Yes, nice catch. zr is clearly the wrong choice here. In the context of >> an FP register it ends up being interpreted as q31 which, as you show, >> clashes when r31 is the last register in an odd register set. > > OK. > > I'm sure we've seen this bug years ago and fixed it. Maybe the fix was > never pushed, or maybe it was another instance of the same error. >
From kim.barrett at oracle.com Fri Jun 26 12:02:43 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 26 Jun 2020 08:02:43 -0400 Subject: RFR (S): 8248221: G1: millis_since_last_gc updated at wrong time In-Reply-To: References: Message-ID: <529496BB-9F91-4AAE-BB8A-ADD7C7FF74DB@oracle.com> > On Jun 26, 2020, at 7:00 AM, Thomas Schatzl wrote: > > Hi, > > On 25.06.20 20:49, Kim Barrett wrote: >>> On Jun 25, 2020, at 4:04 AM, Thomas Schatzl wrote: >>> >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8248221 >>> Webrev: >>> http://cr.openjdk.java.net/~tschatzl/8248221/webrev/ >>> Testing: >>> tier1-5 with 8243974 and 8165501, manual testing as described above. >> ------------------------------------------------------------------------------ >> src/hotspot/share/gc/g1/g1CollectedHeap.cpp >> 2067 _time_of_last_gc_ns = os::javaTimeNanos(); >> This variable seems really poorly named. >> _time_for_millis_since_last_gc ? >> Of course, millis_since_last_gc's name seems pretty poor too. >> Maybe update those names as a followup. >> ------------------------------------------------------------------------------ >> Other than that naming issue, looks good. > I totally agree that millis_since_last_gc is very much misnamed, but I would like to not redo the recent three changes in that area because of that. I will file a CR. Fine by me. > > Thanks for your review, > Thomas
From erik.osterlund at oracle.com Fri Jun 26 13:06:03 2020 From: erik.osterlund at oracle.com (Erik Österlund) Date: Fri, 26 Jun 2020 15:06:03 +0200 Subject: [16] RFR: 8248391: Unify handling of all OopStorage instances in weak root processing Message-ID: <8e831a42-61b0-1605-ea71-09f97d82f328@oracle.com> Hi, Today, when a weak OopStorage is added, you have to plug it in explicitly to ZGC, Shenandoah and the WeakProcessor, used by Shenandoah, Serial, Parallel and G1. This is especially true when the runtime data structure associated with an OopStorage needs a notification when oops die: then you have to explicitly plug in notification code in various places in GC code. It would be ideal if this process could be completely automated. This patch allows each OopStorage to have an associated notification function. This is a callback function into the runtime, stating how many oops have died this GC cycle. This allows runtime data structures to perform accounting for how large a part of the data structure needs cleaning, and whether to trigger such cleaning or not. So the interface between the GCs and the OopStorage is that during weak processing, the GC promises to call the callback function with how many oops died. Some shared infrastructure makes this very easy for the GCs. Weak processing now uses the OopStorageSet iterators across all GCs, so that adding a new weak OopStorage (even with notification functions) does not require touching any GC code. Kudos to Zhengyu for providing some Shenandoah code for this, and StefanK for pre-reviewing it. Also, I am about to go out of office now, so StefanK promised to take it from here. Big thanks for that! CR: https://bugs.openjdk.java.net/browse/JDK-8248391 Webrev: http://cr.openjdk.java.net/~eosterlund/8248391/webrev.00/ Thanks, /Erik
From kim.barrett at oracle.com Fri Jun 26 13:15:42 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 26 Jun 2020 09:15:42 -0400 Subject: RFR (S): 8165501: Serial, Parallel and G1 should track the time since last gc for millis_since_last_gc() with full precision In-Reply-To: <2cf64df3-4bbb-b346-3b56-415479064c7f@oracle.com> References: <71723e44-d2f0-8c15-7a99-054c0dd870bc@oracle.com> <2cf64df3-4bbb-b346-3b56-415479064c7f@oracle.com> Message-ID: <726C90BB-8474-4E6D-A9AC-DCAFDB0C9557@oracle.com> > On Jun 25, 2020, at 7:44 AM, Thomas Schatzl wrote: > > I think I caught them all in > > http://cr.openjdk.java.net/~tschatzl/8165501/webrev.1 > > no incremental one since almost everything changed anyway, doing another pass at making the code look similar. > > Retested using the mentioned test. > > Thanks, > Thomas Looks okay, so far as it goes. I have to wonder though, why is millis_since_last_gc a pure virtual? Indeed, why is it virtual at all? It's not obvious to me why it's not a relatively simple accessor with an associated (possibly non-public) update function that is called by each collector with appropriate values. I also wonder about the checking for non-monotonic os::javaTimeNanos.
I suspect there are much worse problems than millis_since_last_gc being wrong if that happens. Maybe yet more follow-ups in this area?
From zgu at redhat.com Fri Jun 26 13:25:53 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 26 Jun 2020 09:25:53 -0400 Subject: RFR (M): 8245721: Refactor the TaskTerminator In-Reply-To: References: Message-ID: Hi Thomas, I believe you can use MonitorLocker (vs. MutexLocker) to remove the naked _blocker->wait calls:

diff -r d76db3e96d46 src/hotspot/share/gc/shared/taskTerminator.cpp
--- a/src/hotspot/share/gc/shared/taskTerminator.cpp  Fri Jun 26 07:59:40 2020 -0400
+++ b/src/hotspot/share/gc/shared/taskTerminator.cpp  Fri Jun 26 09:23:00 2020 -0400
@@ -153,7 +153,7 @@
   Thread* the_thread = Thread::current();
   SpinContext spin_context;

-  MutexLocker x(_blocker, Mutex::_no_safepoint_check_flag);
+  MonitorLocker x(_blocker, Mutex::_no_safepoint_check_flag);
   _offered_termination++;

   if (_offered_termination == _n_threads) {
@@ -194,7 +194,7 @@
       // Give up spin master before sleeping.
       _spin_master = NULL;
     }
-    _blocker->wait_without_safepoint_check(WorkStealingSleepMillis);
+    x.wait(WorkStealingSleepMillis);

     // Immediately check exit conditions after re-acquiring the lock.
     if (_offered_termination == _n_threads) {

-Zhengyu On 6/24/20 4:03 AM, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this refactoring of the (OWST) TaskTerminator > to make the algorithm more understandable. > > The original implementation imho suffers from two issues: > > - manual lock() and unlock() of the _blocker synchronization lock > everywhere, distributed around two separate methods. > > - interspersing the actual spinning code somewhere inlined in between. > > This change hopefully makes reasoning about the > code *much* easier by a different separation of these two methods, and > using scoped locks. > > The final structure of the code has been intensively tested to not cause > a regression in performance, however it made a few "obvious" further > refactorings undesired due to significant perf regressions. > > I believe I found a good tradeoff here, but I am of course open to > improvements :) I tried to sketch a few of those ultimately unsuccessful > attempts in the CR. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8245721 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8245721/webrev/ > Testing: > tier1-5, many many perf rounds, many tier1-X rounds with other patches > > Thanks, > Thomas >
From per.liden at oracle.com Fri Jun 26 13:28:29 2020 From: per.liden at oracle.com (Per Liden) Date: Fri, 26 Jun 2020 15:28:29 +0200 Subject: RFR: 8248266: ZGC: TestUncommit.java fails due to "Exception: Uncommitted too fast" again Message-ID: <440d323d-fb6d-efb7-9b2e-ec056ac23be4@oracle.com> Hi, The test gc/z/TestUncommit.java sometimes fails because of bad timing, caused by what appears to be a heavily loaded machine, so the test thread doesn't get to execute in a timely manner. I've restructured the test a bit, to be less sensitive to this. Instead of sleeping and then checking if uncommit has happened, the test will now wait until uncommit happens and record the time. When checking the time it now uses TIMEOUT_FACTOR to control how strict the check should be. To keep things simple, I've also broken out the part that tests with uncommit disabled into a separate test. The change in ZPage/ZPageCache rounds up the last_used/last_committed timestamps to the nearest second.
Without this we would always be rounding the time down, which means we could sometimes uncommit one second too early. That is wrong but not a big problem in itself; however, it can also cause TestUncommit.java to fail with "Uncommitted too fast". Bug: https://bugs.openjdk.java.net/browse/JDK-8248266 Webrev: http://cr.openjdk.java.net/~pliden/8248266/webrev.0 Testing: Ran TestUncommit 100+ times on all Oracle-platforms. /Per
From thomas.schatzl at oracle.com Fri Jun 26 14:26:54 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 26 Jun 2020 16:26:54 +0200 Subject: RFR (M): 8245721: Refactor the TaskTerminator In-Reply-To: References: Message-ID: <0fa56471-bafc-340e-4ebd-9a6fa7f14007@oracle.com> Hi Zhengyu, thanks for your review. On 26.06.20 15:25, Zhengyu Gu wrote: > Hi Thomas, > > I believe you can use MonitorLocker (vs. MutexLocker) to remove the naked > _blocker->wait calls. I plan to drop the for-Java Monitor and use os::PlatformMonitor instead here. That removes all _no_safepoint_check_flag code. Also there is no MonitorUnlocker, so we'd still have to use MutexUnlocker. It will look a bit weird to not have matching locker/unlocker names. Would it be okay for you to wait for that? Thanks, Thomas >
> diff -r d76db3e96d46 src/hotspot/share/gc/shared/taskTerminator.cpp
> --- a/src/hotspot/share/gc/shared/taskTerminator.cpp  Fri Jun 26 07:59:40 2020 -0400
> +++ b/src/hotspot/share/gc/shared/taskTerminator.cpp  Fri Jun 26 09:23:00 2020 -0400
> @@ -153,7 +153,7 @@
>    Thread* the_thread = Thread::current();
>    SpinContext spin_context;
>
> -  MutexLocker x(_blocker, Mutex::_no_safepoint_check_flag);
> +  MonitorLocker x(_blocker, Mutex::_no_safepoint_check_flag);
>    _offered_termination++;
>
>    if (_offered_termination == _n_threads) {
> @@ -194,7 +194,7 @@
>        // Give up spin master before sleeping.
>        _spin_master = NULL;
>      }
> -    _blocker->wait_without_safepoint_check(WorkStealingSleepMillis);
> +    x.wait(WorkStealingSleepMillis);
>
>      // Immediately check exit conditions after re-acquiring the lock.
>      if (_offered_termination == _n_threads) {
>
> -Zhengyu
>
> On 6/24/20 4:03 AM, Thomas Schatzl wrote: >> Hi all, >> >> can I have reviews for this refactoring of the (OWST) >> TaskTerminator to make the algorithm more understandable. >> >> The original implementation imho suffers from two issues: >> >> - manual lock() and unlock() of the _blocker synchronization lock >> everywhere, distributed around two separate methods. >> >> - interspersing the actual spinning code somewhere inlined in between. >> >> This change hopefully makes reasoning about the >> code *much* easier by a different separation of these two methods, and >> using scoped locks. >> >> The final structure of the code has been intensively tested to not >> cause a regression in performance, however it made a few "obvious" >> further refactorings undesired due to significant perf regressions. >> >> I believe I found a good tradeoff here, but I am of course open to >> improvements :) I tried to sketch a few of those ultimately >> unsuccessful attempts in the CR. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8245721 >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8245721/webrev/ >> Testing: >> tier1-5, many many perf rounds, many tier1-X rounds with other patches >> >> Thanks,
>> Thomas >> >
From thomas.schatzl at oracle.com Fri Jun 26 14:30:17 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 26 Jun 2020 16:30:17 +0200 Subject: RFR (S): 8248322: G1: Refactor full collection sizing code Message-ID: <1a156b7e-f5fc-9910-1248-8d60f3cb38a1@oracle.com> Hi all, can I have reviews for this small refactoring change that moves some heap sizing policy related code into the "right" place (into the heap sizing policy class), and refactors it a bit. CR: https://bugs.openjdk.java.net/browse/JDK-8248322 Webrev: http://cr.openjdk.java.net/~tschatzl/8248322/webrev/ Testing: tier1 Thanks, Thomas
From zgu at redhat.com Fri Jun 26 14:32:49 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 26 Jun 2020 10:32:49 -0400 Subject: RFR (M): 8245721: Refactor the TaskTerminator In-Reply-To: <0fa56471-bafc-340e-4ebd-9a6fa7f14007@oracle.com> References: <0fa56471-bafc-340e-4ebd-9a6fa7f14007@oracle.com> Message-ID: On 6/26/20 10:26 AM, Thomas Schatzl wrote: > Hi Zhengyu, > > thanks for your review. > > On 26.06.20 15:25, Zhengyu Gu wrote: >> Hi Thomas, >> >> I believe you can use MonitorLocker (vs. MutexLocker) to remove the naked >> _blocker->wait calls. > > I plan to drop the for-Java Monitor and use os::PlatformMonitor instead > here. That removes all _no_safepoint_check_flag code. Yes, sounds like a good idea. > > Also there is no MonitorUnlocker, so we'd still have to use > MutexUnlocker. It will look a bit weird to not have matching > locker/unlocker names. > > Would it be okay for you to wait for that? Sure. Thanks, -Zhengyu > > Thanks, > Thomas > >>
>> diff -r d76db3e96d46 src/hotspot/share/gc/shared/taskTerminator.cpp
>> --- a/src/hotspot/share/gc/shared/taskTerminator.cpp  Fri Jun 26 07:59:40 2020 -0400
>> +++ b/src/hotspot/share/gc/shared/taskTerminator.cpp  Fri Jun 26 09:23:00 2020 -0400
>> @@ -153,7 +153,7 @@
>>    Thread* the_thread = Thread::current();
>>    SpinContext spin_context;
>>
>> -  MutexLocker x(_blocker, Mutex::_no_safepoint_check_flag);
>> +  MonitorLocker x(_blocker, Mutex::_no_safepoint_check_flag);
>>    _offered_termination++;
>>
>>    if (_offered_termination == _n_threads) {
>> @@ -194,7 +194,7 @@
>>        // Give up spin master before sleeping.
>>        _spin_master = NULL;
>>      }
>> -    _blocker->wait_without_safepoint_check(WorkStealingSleepMillis);
>> +    x.wait(WorkStealingSleepMillis);
>>
>>      // Immediately check exit conditions after re-acquiring the lock.
>>      if (_offered_termination == _n_threads) {
>>
>> On 6/24/20 4:03 AM, Thomas Schatzl wrote: >>> Hi all, >>> >>> can I have reviews for this refactoring of the (OWST) >>> TaskTerminator to make the algorithm more understandable. >>> >>> The original implementation imho suffers from two issues: >>> >>> - manual lock() and unlock() of the _blocker synchronization lock >>> everywhere, distributed around two separate methods. >>> >>> - interspersing the actual spinning code somewhere inlined in between. >>> >>> This change hopefully makes reasoning about the >>> code *much* easier by a different separation of these two methods, and >>> using scoped locks. >>> >>> The final structure of the code has been intensively tested to not >>> cause a regression in performance, however it made a few "obvious" >>> further refactorings undesired due to significant perf regressions.
>>> I believe I found a good tradeoff here, but I am of course open to >>> improvements :) I tried to sketch a few of those ultimately >>> unsuccessful attempts in the CR. >>> >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8245721 >>> Webrev: >>> http://cr.openjdk.java.net/~tschatzl/8245721/webrev/ >>> Testing: >>> tier1-5, many many perf rounds, many tier1-X rounds with other patches >>> >>> Thanks, >>> Thomas >>> >> >
From kdnilsen at amazon.com Fri Jun 26 20:21:10 2020 From: kdnilsen at amazon.com (Nilsen, Kelvin) Date: Fri, 26 Jun 2020 20:21:10 +0000 Subject: RFR: 8232782: Shenandoah: streamline post-LRB CAS barrier (aarch64) In-Reply-To: References: <34978650-be69-1e01-e11e-608f205338ff@redhat.com> <43f1a7648b5861d9a7ff16622ccea4a5e6164c3c.camel@redhat.com> Message-ID: Is there consensus that we should use the CAS instruction instead of ldxr/stxr? Presumably, there are some platforms where ldxr/stxr performs better than CAS, or at least there is the potential that such would exist. Perhaps the JIT and run-time should adjust their behavior depending on the host platform. Perhaps the whole issue of which synchronization primitives to use should be addressed in a different ticket. I am willing to rework this patch. Just need some clear guidance as to which direction to move it. Thanks. On 6/24/20, 8:28 AM, "Roman Kennke" wrote: On Wed, 2020-06-24 at 16:22 +0100, Andrew Haley wrote: > On 24/06/2020 15:48, Roman Kennke wrote: > > On Wed, 2020-06-24 at 15:29 +0100, Andrew Haley wrote: > > > On 24/06/2020 14:54, Nilsen, Kelvin wrote: > > > > Is this ok to merge? > > > > > > One thing: > > > > > > Some CPUs, in particular those based on Neoverse N1, can perform > > > very > > > badly when using ldxr/stxr. For that reason, all code doing CAS > > > > > > I can't see any reason why your code needs to use ldxr/stxr. Is > > > there > > > any? > > > > As far as I know, Shenandoah's AArch64-CAS-implementation always > > did it > > that way (don't remember why). If regular CAS is generally better, > > then > > we should go for it. > > Does this algorithm need a full barrier even when CAS fails? We need to do extra work *only* when CAS fails. We need to catch false negatives -- when the compare-value is to-space (that's guaranteed) and the value in memory is the from-space copy of the same object. Roman
From rkennke at redhat.com Fri Jun 26 21:00:34 2020 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 26 Jun 2020 23:00:34 +0200 Subject: RFR: 8232782: Shenandoah: streamline post-LRB CAS barrier (aarch64) In-Reply-To: References: <34978650-be69-1e01-e11e-608f205338ff@redhat.com> <43f1a7648b5861d9a7ff16622ccea4a5e6164c3c.camel@redhat.com> Message-ID: I believe if you do what Andrew Haley suggested and use MacroAssembler::cmpxchg() it will do a CAS if supported by the platform, or ldxr/stxr if not. So either we simply use that, or maybe come up with two different implementations and select one or the other like MacroAssembler::cmpxchg() does? Not sure if there would be any advantage in the latter. Roman On Fri, 2020-06-26 at 20:21 +0000, Nilsen, Kelvin wrote: > Is there consensus that we should use the CAS instruction instead of > ldxr/stxr? > > Presumably, there are some platforms where ldxr/stxr performs better > than CAS, or at least there is the potential that such would exist. > > Perhaps the JIT and run-time should adjust their behavior depending > on the host platform. > > Perhaps the whole issue of which synchronization primitives to use > should be addressed in a different ticket.
> I am willing to rework this patch. Just need some clear guidance as > to which direction to move it. > > Thanks. > > On 6/24/20, 8:28 AM, "Roman Kennke" wrote: > > On Wed, 2020-06-24 at 16:22 +0100, Andrew Haley wrote: > > On 24/06/2020 15:48, Roman Kennke wrote: > > > On Wed, 2020-06-24 at 15:29 +0100, Andrew Haley wrote: > > > > On 24/06/2020 14:54, Nilsen, Kelvin wrote: > > > > > Is this ok to merge? > > > > > > > > One thing: > > > > > > > > Some CPUs, in particular those based on Neoverse N1, can perform > > > > very > > > > badly when using ldxr/stxr. For that reason, all code doing CAS > > > > > > > > I can't see any reason why your code needs to use ldxr/stxr. Is > > > > there > > > > any? > > > > > > As far as I know, Shenandoah's AArch64-CAS-implementation always > > > did it > > > that way (don't remember why). If regular CAS is generally better, > > > then > > > we should go for it. > > > > Does this algorithm need a full barrier even when CAS fails? > > We need to do extra work *only* when CAS fails. We need to catch > false > negatives -- when the compare-value is to-space (that's > guaranteed) and > the value in memory is the from-space copy of the same object. > > Roman > >
From aph at redhat.com Sat Jun 27 08:02:39 2020 From: aph at redhat.com (Andrew Haley) Date: Sat, 27 Jun 2020 09:02:39 +0100 Subject: RFR: 8232782: Shenandoah: streamline post-LRB CAS barrier (aarch64) In-Reply-To: References: <34978650-be69-1e01-e11e-608f205338ff@redhat.com> <43f1a7648b5861d9a7ff16622ccea4a5e6164c3c.camel@redhat.com> Message-ID: <46e2400d-0613-f9e0-cb06-a89180ca6f5f@redhat.com> On 26/06/2020 21:21, Nilsen, Kelvin wrote: > Is there consensus that we should use the CAS instruction instead of ldxr/stxr? > > Presumably, there are some platforms where ldxr/stxr performs better than CAS, or at least there is the potential that such would exist. > > Perhaps the JIT and run-time should adjust their behavior depending on the host platform. That's exactly what it does. > Perhaps the whole issue of which synchronization primitives to use should be addressed in a different ticket. > > I am willing to rework this patch. Just need some clear guidance as to which direction to move it. Simple: don't use ldxr/stxr, call MacroAssembler::cmpxchg(). It will do the right thing on whatever platform it runs on. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
From aph at redhat.com Sat Jun 27 08:35:04 2020 From: aph at redhat.com (Andrew Haley) Date: Sat, 27 Jun 2020 09:35:04 +0100 Subject: RFR: 8232782: Shenandoah: streamline post-LRB CAS barrier (aarch64) In-Reply-To: References: <34978650-be69-1e01-e11e-608f205338ff@redhat.com> <43f1a7648b5861d9a7ff16622ccea4a5e6164c3c.camel@redhat.com> Message-ID: <842546fa-93b0-3d0f-e6f7-1c4dfc5b9931@redhat.com> On 26/06/2020 22:00, Roman Kennke wrote: > I believe if you do what Andrew Haley suggested and use > MacroAssembler::cmpxchg() it will do a CAS if supported by the > platform, or ldxr/stxr if not. So either we simply use that, or maybe > come up with two different implementations and select one or the other > like MacroAssembler::cmpxchg() does? Not sure if there would be any > advantage in the latter. Please don't. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
From patrick at os.amperecomputing.com Sat Jun 27 09:32:37 2020 From: patrick at os.amperecomputing.com (Patrick Zhang OS) Date: Sat, 27 Jun 2020 09:32:37 +0000 Subject: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention In-Reply-To: References: Message-ID: Thanks Goetz I updated the list of reviewers, http://cr.openjdk.java.net/~qpzhang/8248214/webrev.02/jdk11u-dev.changeset. Regarding the performance, I had tests on Linux systems with a couple of x86_64/aarch64 servers. I am not sure if mentioning specjbb here would be appropriate; by far, most results of this benchmark are positive, especially the metrics sensitive to GC stability (G1 or ParallelGC), and no obvious change with others, probably due to microarchitecture-level differences in handling exclusive load/store. This is similar to the original patch [1]. Updated "Fix request (11u)" with a risk estimation of this downporting, see JBS [1] please. I am not familiar with the process of jdk-updates. Is it ok to push this downporting patch now, or should I still wait for the maintainer's approval at JBS (jdk11u-fix-yes?)? [1] https://bugs.openjdk.java.net/browse/JDK-8248214?focusedCommentId=14349531&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14349531 Regards Patrick -----Original Message----- From: Lindenmaier, Goetz Sent: Friday, June 26, 2020 3:17 PM To: Patrick Zhang OS ; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Hi Patrick, I had a look at your change. I think it makes sense to bring this to 11, if there actually is the performance gain you mention. Reviewed. Please add in the "Fix request" comment in the JBS the risk of downporting this. And I think it should be "Fix request (11u)" because different people will review your fix request for 11 and 8. Best regards, Goetz. > -----Original Message----- > From: jdk-updates-dev > On > Behalf Of Patrick Zhang OS > Sent: Wednesday, June 24, 2020 11:55 AM > To: jdk-updates-dev at openjdk.java.net > Cc: hotspot-gc-dev > > Subject: [DMARC FAILURE] [11u] RFR: 8244214: Add paddings for > TaskQueueSuper to reduce false-sharing cache contention > > Hi > > Could I ask for a review of this simple patch which takes a tiny part > from the original ticket JDK-8243326 [1]. The reason that I do not > want a full backport is, the majority of the patch at jdk/jdk [2] is > to clean up the volatile use and may be not very meaningful to 11u, > furthermore the context (dependencies on atomic.hpp refactor) is too > complicated to generate a clear backport (I tried, ~81 files need to be changed). > > The purpose of having this one-line change to 11u is, the two volatile > variables in TaskQueueSuper: _bottom, _age and corresponding atomic > operations upon, may cause severe cache contention inside GC with > larger number of threads, i.e., specified by -XX:ParallelGCThreads=##, > adding paddings (up to DEFAULT_CACHE_LINE_SIZE) in-between can reduce > the possibility of false-sharing cache contention.
I do not need the > paddings before _bottom and after _age from the original patch [2], > because the instances of TaskQueueSuper are usually (always) allocated > in a set of queues, in which they are naturally separated. Please review, thanks. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8248214 > Webrev: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.01/ > Testing: tier1-2 pass with the patch, commercial benchmarks and small > C++ test cases (to simulate the data struct and work-stealing > algorithm atomics) validated the performance, no regression. > > By the way, I am going to request for 8u backport as well once 11u > would have it. > > [1] https://bugs.openjdk.java.net/browse/JDK-8243326 Cleanup use of > volatile in taskqueue code [2] > https://hg.openjdk.java.net/jdk/jdk/rev/252a1602b4c6 > > Regards > Patrick >
From shade at redhat.com Mon Jun 29 07:04:09 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 29 Jun 2020 09:04:09 +0200 Subject: [16] RFR 8248227: Shenandoah: Refactor Shenandoah::heap() to match other GCs In-Reply-To: References: Message-ID: On 6/24/20 2:18 PM, Zhengyu Gu wrote: > Please review this small patch that refactors Shenandoah::heap() to > match other GCs. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248227 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8248227/webev.00/index.html Wait. We do the same thing as ZHeap::heap(). This patch clashes with the intent of JDK-8241743. At the very least we have to prove it compiles down to the same. -- Thanks, -Aleksey
From christoph.langer at sap.com Mon Jun 29 07:48:19 2020 From: christoph.langer at sap.com (Langer, Christoph) Date: Mon, 29 Jun 2020 07:48:19 +0000 Subject: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention In-Reply-To: References: Message-ID: Hi Patrick, yes, you need to wait for the maintainer's approval before pushing. The reason for that process is that, apart from the technical review of your change, maintainers will also have a look focusing on whether a change is appropriate for an update release, e.g. risk assessment etc. But I've approved it now, so you can go ahead :) When pushing, you should also update the copyright year of the file. Best regards Christoph > -----Original Message----- > From: jdk-updates-dev On > Behalf Of Patrick Zhang OS > Sent: Samstag, 27. Juni 2020 11:33 > To: Lindenmaier, Goetz ; jdk-updates- > dev at openjdk.java.net > Cc: hotspot-gc-dev > Subject: [DMARC FAILURE] RE: [11u] RFR: 8244214: Add paddings for > TaskQueuSuper to reduce false-sharing cache contention > > Thanks Goetz > > I updated the of reviewers, > http://cr.openjdk.java.net/~qpzhang/8248214/webrev.02/jdk11u- > dev.changeset. Regarding the performance, I had tests on Linux system with > a couple of x86_64/aarch64 servers, I am not sure if mentioning specjbb here > would be appropriate, by far, most results of this benchmark are positive > especially the metrics sensitive to GC stability (G1 or ParallelGC), and no > obvious change with others probably due to microarchitecture level > differences in handling exclusive load/store. This is similar as the original > patch [1]. > > Updated "Fix request (11u)" with a risk estimation of this downporting, see > JBS [1] please.
> > I am not familiar with the process of jdk-updates. Is it ok to push this > downporting patch now? or I should still wait for maintainer's approval at JBS > (jdk11u-fix-yes?). > > [1] https://bugs.openjdk.java.net/browse/JDK- > 8248214?focusedCommentId=14349531&page=com.atlassian.jira.plugin.syst > em.issuetabpanels:comment-tabpanel#comment-14349531 > > > Regards > > Patrick > > > > -----Original Message----- > From: Lindenmaier, Goetz > Sent: Friday, June 26, 2020 3:17 PM > To: Patrick Zhang OS ; jdk-updates- > dev at openjdk.java.net > Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce > false-sharing cache contention > > > > Hi Patrick, > > > > I had a look at your change. > > I think it makes sense to bring this to 11, if there actually is the performance > gain you mention. > > Reviewed. > > > > Please add in the "Fix request" comment in the JBS the risk of downporting > this. And I think is should be "Fix request (11u)" > > because different people will review your fix request for 11 and 8. > > > > Best regards, > > Goetz. > > > > > -----Original Message----- > > > From: jdk-updates-dev bounces at openjdk.java.net bounces at openjdk.java.net>> On > > > Behalf Of Patrick Zhang OS > > > Sent: Wednesday, June 24, 2020 11:55 AM > > > To: jdk-updates-dev at openjdk.java.net dev at openjdk.java.net> > > > Cc: hotspot-gc-dev dev at openjdk.java.net>> > > > Subject: [DMARC FAILURE] [11u] RFR: 8244214: Add paddings for > > > TaskQueueSuper to reduce false-sharing cache contention > > > > > > Hi > > > > > > Could I ask for a review of this simple patch which takes a tiny part > > > from the original ticket JDK-8243326 [1]. The reason that I do not > > > want a full backport is, the majority of the patch at jdk/jdk [2] is > > > to clean up the volatile use and may be not very meaningful to 11u, > > > furthermore the context (dependencies on atomic.hpp refactor) is too > > > complicated to generate a clear backport (I tried, ~81 files need to be > changed). > > > > > > The purpose of having this one-line change to 11u is, the two volatile > > > variables in TaskQueueSuper: _bottom, _age and corresponding atomic > > > operations upon, may cause severe cache contention inside GC with > > > larger number of threads, i.e., specified by -XX:ParallelGCThreads=##, > > > adding paddings (up to DEFAULT_CACHE_LINE_SIZE) in-between can > reduce > > > the possibility of false-sharing cache contention. I do not need the > > > paddings before _bottom and after _age from the original patch [2], > > > because the instances of TaskQueueSuper are usually (always) allocated > > > in a set of queues, in which they are naturally separated. Please review, > thanks. > > > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8248214 > > > Webrev: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.01/ > > > Testing: tier1-2 pass with the patch, commercial benchmarks and small > > > C++ test cases (to simulate the data struct and work-stealing > > > algorithm atomics) validated the performance, no regression. > > > > > > By the way, I am going to request for 8u backport as well once 11u > > > would have it. 
> > > > > > [1] https://bugs.openjdk.java.net/browse/JDK-8243326 Cleanup use of > > > volatile in taskqueue code [2] > > > https://hg.openjdk.java.net/jdk/jdk/rev/252a1602b4c6 > > > > > > Regards > > > Patrick > > > > From goetz.lindenmaier at sap.com Mon Jun 29 08:19:46 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 29 Jun 2020 08:19:46 +0000 Subject: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention In-Reply-To: References: Message-ID: Hi Patrick, The change looks good now. Please remove the "Summary:" line. It only repeats the bug title, and thus is redundant. You can use the Summary line if you want to add more than the bug title. You might add "Summary: This is a downport of a part of JDK-8243326" Also, the Bugid in the mail Subject is wrong ... But no matter, it's all set now. Best regards, Goetz. From: Patrick Zhang OS Sent: Saturday, June 27, 2020 11:33 AM To: Lindenmaier, Goetz ; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention Thanks Goetz I updated the of reviewers, http://cr.openjdk.java.net/~qpzhang/8248214/webrev.02/jdk11u-dev.changeset. Regarding the performance, I had tests on Linux system with a couple of x86_64/aarch64 servers, I am not sure if mentioning specjbb here would be appropriate, by far, most results of this benchmark are positive especially the metrics sensitive to GC stability (G1 or ParallelGC), and no obvious change with others probably due to microarchitecture level differences in handling exclusive load/store. This is similar as the original patch [1]. Updated "Fix request (11u)" with a risk estimation of this downporting, see JBS [1] please. I am not familiar with the process of jdk-updates. Is it ok to push this downporting patch now? or I should still wait for maintainer's approval at JBS (jdk11u-fix-yes?). [1] https://bugs.openjdk.java.net/browse/JDK-8248214?focusedCommentId=14349531&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14349531 Regards Patrick -----Original Message----- From: Lindenmaier, Goetz > Sent: Friday, June 26, 2020 3:17 PM To: Patrick Zhang OS >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Hi Patrick, I had a look at your change. I think it makes sense to bring this to 11, if there actually is the performance gain you mention. Reviewed. Please add in the "Fix request" comment in the JBS the risk of downporting this. And I think is should be "Fix request (11u)" because different people will review your fix request for 11 and 8. Best regards, Goetz. > -----Original Message----- > From: jdk-updates-dev > On > Behalf Of Patrick Zhang OS > Sent: Wednesday, June 24, 2020 11:55 AM > To: jdk-updates-dev at openjdk.java.net > Cc: hotspot-gc-dev > > Subject: [DMARC FAILURE] [11u] RFR: 8244214: Add paddings for > TaskQueueSuper to reduce false-sharing cache contention > > Hi > > Could I ask for a review of this simple patch which takes a tiny part > from the original ticket JDK-8243326 [1]. The reason that I do not > want a full backport is, the majority of the patch at jdk/jdk [2] is > to clean up the volatile use and may be not very meaningful to 11u, > furthermore the context (dependencies on atomic.hpp refactor) is too > complicated to generate a clear backport (I tried, ~81 files need to be changed). 
> > The purpose of having this one-line change to 11u is, the two volatile > variables in TaskQueueSuper: _bottom, _age and corresponding atomic > operations upon, may cause severe cache contention inside GC with > larger number of threads, i.e., specified by -XX:ParallelGCThreads=##, > adding paddings (up to DEFAULT_CACHE_LINE_SIZE) in-between can reduce > the possibility of false-sharing cache contention. I do not need the > paddings before _bottom and after _age from the original patch [2], > because the instances of TaskQueueSuper are usually (always) allocated > in a set of queues, in which they are naturally separated. Please review, thanks. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8248214 > Webrev: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.01/ > Testing: tier1-2 pass with the patch, commercial benchmarks and small > C++ test cases (to simulate the data struct and work-stealing > algorithm atomics) validated the performance, no regression. > > By the way, I am going to request for 8u backport as well once 11u > would have it. > > [1] https://bugs.openjdk.java.net/browse/JDK-8243326 Cleanup use of > volatile in taskqueue code [2] > https://hg.openjdk.java.net/jdk/jdk/rev/252a1602b4c6 > > Regards > Patrick > From sgehwolf at redhat.com Mon Jun 29 08:46:18 2020 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Mon, 29 Jun 2020 10:46:18 +0200 Subject: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention In-Reply-To: References: Message-ID: On Sat, 2020-06-27 at 09:32 +0000, Patrick Zhang OS wrote: > I am not familiar with the process of jdk-updates. Is it ok to push this downporting patch now? or I should still wait for maintainer's approval at JBS (jdk11u-fix-yes?). Yes. Pushes to jdk11u-dev should wait for jdk11u-fix-yes label. Thanks, Severin From patrick at os.amperecomputing.com Mon Jun 29 09:38:38 2020 From: patrick at os.amperecomputing.com (Patrick Zhang OS) Date: Mon, 29 Jun 2020 09:38:38 +0000 Subject: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention In-Reply-To: References: Message-ID: Hi Goetz and Christoph, Thanks for reviewing. I updated the copyright year and summary line accordingly: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.03/jdk11u-dev.changeset. Very appreciate if any committer could do me a favor and help pushing it. Regards Patrick From: Lindenmaier, Goetz Sent: Monday, June 29, 2020 4:20 PM To: Patrick Zhang OS ; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention Hi Patrick, The change looks good now. Please remove the "Summary:" line. It only repeats the bug title, and thus is redundant. You can use the Summary line if you want to add more than the bug title. You might add "Summary: This is a downport of a part of JDK-8243326" Also, the Bugid in the mail Subject is wrong ... But no matter, it's all set now. [Patrick Zhang] sorry about the typos there, I could not 'fix' as it would get a new thread in mail list... Best regards, Goetz. From: Patrick Zhang OS > Sent: Saturday, June 27, 2020 11:33 AM To: Lindenmaier, Goetz >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention Thanks Goetz I updated the of reviewers, http://cr.openjdk.java.net/~qpzhang/8248214/webrev.02/jdk11u-dev.changeset. 
Regarding the performance, I had tests on Linux system with a couple of x86_64/aarch64 servers, I am not sure if mentioning specjbb here would be appropriate, by far, most results of this benchmark are positive especially the metrics sensitive to GC stability (G1 or ParallelGC), and no obvious change with others probably due to microarchitecture level differences in handling exclusive load/store. This is similar as the original patch [1]. Updated "Fix request (11u)" with a risk estimation of this downporting, see JBS [1] please. I am not familiar with the process of jdk-updates. Is it ok to push this downporting patch now? or I should still wait for maintainer's approval at JBS (jdk11u-fix-yes?). [1] https://bugs.openjdk.java.net/browse/JDK-8248214?focusedCommentId=14349531&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14349531 Regards Patrick -----Original Message----- From: Lindenmaier, Goetz > Sent: Friday, June 26, 2020 3:17 PM To: Patrick Zhang OS >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Hi Patrick, I had a look at your change. I think it makes sense to bring this to 11, if there actually is the performance gain you mention. Reviewed. Please add in the "Fix request" comment in the JBS the risk of downporting this. And I think is should be "Fix request (11u)" because different people will review your fix request for 11 and 8. Best regards, Goetz. > -----Original Message----- > From: jdk-updates-dev > On > Behalf Of Patrick Zhang OS > Sent: Wednesday, June 24, 2020 11:55 AM > To: jdk-updates-dev at openjdk.java.net > Cc: hotspot-gc-dev > > Subject: [DMARC FAILURE] [11u] RFR: 8244214: Add paddings for > TaskQueueSuper to reduce false-sharing cache contention > > Hi > > Could I ask for a review of this simple patch which takes a tiny part > from the original ticket JDK-8243326 [1]. The reason that I do not > want a full backport is, the majority of the patch at jdk/jdk [2] is > to clean up the volatile use and may be not very meaningful to 11u, > furthermore the context (dependencies on atomic.hpp refactor) is too > complicated to generate a clear backport (I tried, ~81 files need to be changed). > > The purpose of having this one-line change to 11u is, the two volatile > variables in TaskQueueSuper: _bottom, _age and corresponding atomic > operations upon, may cause severe cache contention inside GC with > larger number of threads, i.e., specified by -XX:ParallelGCThreads=##, > adding paddings (up to DEFAULT_CACHE_LINE_SIZE) in-between can reduce > the possibility of false-sharing cache contention. I do not need the > paddings before _bottom and after _age from the original patch [2], > because the instances of TaskQueueSuper are usually (always) allocated > in a set of queues, in which they are naturally separated. Please review, thanks. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8248214 > Webrev: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.01/ > Testing: tier1-2 pass with the patch, commercial benchmarks and small > C++ test cases (to simulate the data struct and work-stealing > algorithm atomics) validated the performance, no regression. > > By the way, I am going to request for 8u backport as well once 11u > would have it. 
> > [1] https://bugs.openjdk.java.net/browse/JDK-8243326 Cleanup use of > volatile in taskqueue code [2] > https://hg.openjdk.java.net/jdk/jdk/rev/252a1602b4c6 > > Regards > Patrick > From christoph.langer at sap.com Mon Jun 29 11:47:29 2020 From: christoph.langer at sap.com (Langer, Christoph) Date: Mon, 29 Jun 2020 11:47:29 +0000 Subject: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention In-Reply-To: References: Message-ID: Hi Patrick, Matthias just notified me of issues with the Mac build with your patch. So, in case anybody was about to push this, please hold off. We're looking into it... Thanks Christoph From: Patrick Zhang OS Sent: Montag, 29. Juni 2020 11:39 To: Lindenmaier, Goetz ; Langer, Christoph ; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention Hi Goetz and Christoph, Thanks for reviewing. I updated the copyright year and summary line accordingly: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.03/jdk11u-dev.changeset. Very appreciate if any committer could do me a favor and help pushing it. Regards Patrick From: Lindenmaier, Goetz > Sent: Monday, June 29, 2020 4:20 PM To: Patrick Zhang OS >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention Hi Patrick, The change looks good now. Please remove the "Summary:" line. It only repeats the bug title, and thus is redundant. You can use the Summary line if you want to add more than the bug title. You might add "Summary: This is a downport of a part of JDK-8243326" Also, the Bugid in the mail Subject is wrong ... But no matter, it's all set now. [Patrick Zhang] sorry about the typos there, I could not 'fix' as it would get a new thread in mail list... Best regards, Goetz. From: Patrick Zhang OS > Sent: Saturday, June 27, 2020 11:33 AM To: Lindenmaier, Goetz >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention Thanks Goetz I updated the of reviewers, http://cr.openjdk.java.net/~qpzhang/8248214/webrev.02/jdk11u-dev.changeset. Regarding the performance, I had tests on Linux system with a couple of x86_64/aarch64 servers, I am not sure if mentioning specjbb here would be appropriate, by far, most results of this benchmark are positive especially the metrics sensitive to GC stability (G1 or ParallelGC), and no obvious change with others probably due to microarchitecture level differences in handling exclusive load/store. This is similar as the original patch [1]. Updated "Fix request (11u)" with a risk estimation of this downporting, see JBS [1] please. I am not familiar with the process of jdk-updates. Is it ok to push this downporting patch now? or I should still wait for maintainer's approval at JBS (jdk11u-fix-yes?). [1] https://bugs.openjdk.java.net/browse/JDK-8248214?focusedCommentId=14349531&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14349531 Regards Patrick -----Original Message----- From: Lindenmaier, Goetz > Sent: Friday, June 26, 2020 3:17 PM To: Patrick Zhang OS >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Hi Patrick, I had a look at your change. 
I think it makes sense to bring this to 11, if there actually is the performance gain you mention. Reviewed. Please add in the "Fix request" comment in the JBS the risk of downporting this. And I think it should be "Fix request (11u)" because different people will review your fix request for 11 and 8. Best regards, Goetz. > -----Original Message----- > From: jdk-updates-dev > On > Behalf Of Patrick Zhang OS > Sent: Wednesday, June 24, 2020 11:55 AM > To: jdk-updates-dev at openjdk.java.net > Cc: hotspot-gc-dev > > Subject: [DMARC FAILURE] [11u] RFR: 8244214: Add paddings for > TaskQueueSuper to reduce false-sharing cache contention > > Hi > > Could I ask for a review of this simple patch which takes a tiny part > from the original ticket JDK-8243326 [1]. The reason that I do not > want a full backport is, the majority of the patch at jdk/jdk [2] is > to clean up the volatile use and may be not very meaningful to 11u, > furthermore the context (dependencies on atomic.hpp refactor) is too > complicated to generate a clear backport (I tried, ~81 files need to be changed). > > The purpose of having this one-line change to 11u is, the two volatile > variables in TaskQueueSuper: _bottom, _age and corresponding atomic > operations upon, may cause severe cache contention inside GC with > larger number of threads, i.e., specified by -XX:ParallelGCThreads=##, > adding paddings (up to DEFAULT_CACHE_LINE_SIZE) in-between can reduce > the possibility of false-sharing cache contention. I do not need the > paddings before _bottom and after _age from the original patch [2], > because the instances of TaskQueueSuper are usually (always) allocated > in a set of queues, in which they are naturally separated. Please review, thanks. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8248214 > Webrev: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.01/ > Testing: tier1-2 pass with the patch, commercial benchmarks and small > C++ test cases (to simulate the data struct and work-stealing > algorithm atomics) validated the performance, no regression. > > By the way, I am going to request for 8u backport as well once 11u > would have it. > > [1] https://bugs.openjdk.java.net/browse/JDK-8243326 Cleanup use of > volatile in taskqueue code [2] > https://hg.openjdk.java.net/jdk/jdk/rev/252a1602b4c6 > > Regards > Patrick >
From goetz.lindenmaier at sap.com Mon Jun 29 11:52:59 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 29 Jun 2020 11:52:59 +0000 Subject: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention In-Reply-To: References: Message-ID: Hi Patrick, Please give it a moment ...
we saw a build error on Mac: In file included from /usr/work/openjdk/nb/darwinintel64/nightly/jdk11u-dev/src/hotspot/share/gc/parallel/asPSYoungGen.cpp:29: In file included from /usr/work/openjdk/nb/darwinintel64/nightly/jdk11u-dev/src/hotspot/share/gc/parallel/psScavenge.inline.hpp:29: In file included from /usr/work/openjdk/nb/darwinintel64/nightly/jdk11u-dev/src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp:32: In file included from /usr/work/openjdk/nb/darwinintel64/nightly/jdk11u-dev/src/hotspot/share/gc/parallel/psPromotionManager.hpp:32: /usr/work/openjdk/nb/darwinintel64/nightly/jdk11u-dev/src/hotspot/share/gc/shared/taskqueue.hpp:153:60: error: invalid use of non-static data member '_bottom' DEFINE_PAD_MINUS_SIZE(0, DEFAULT_CACHE_LINE_SIZE, sizeof(_bottom)); ^~~~~~~ /usr/work/openjdk/nb/darwinintel64/nightly/jdk11u-dev/src/hotspot/share/memory/padded.hpp:87:44: note: expanded from macro 'DEFINE_PAD_MINUS_SIZE' char _pad_buf##id[(alignment) - (size)] ^~~~ In file included from /usr/work/openjdk/nb/darwinintel64/nightly/jdk11u-dev/src/hotspot/share/gc/parallel/asPSYoungGen.cpp:29: In file included from /usr/work/openjdk/nb/darwinintel64/nightly/jdk11u-dev/src/hotspot/share/gc/parallel/psScavenge.inline.hpp:29: In file included from /usr/work/openjdk/nb/darwinintel64/nightly/jdk11u-dev/src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp:32: In file included from /usr/work/openjdk/nb/darwinintel64/nightly/jdk11u-dev/src/hotspot/share/gc/parallel/psPromotionManager.hpp:32: /usr/work/openjdk/nb/darwinintel64/nightly/jdk11u-dev/src/hotspot/share/gc/shared/taskqueue.hpp:257:42: error: 'Age' is a protected member of 'TaskQueueSuper<131072, MemoryType::mtGC>' if test `/usr/bin/wc -l < /usr/work/openjdk/nb/darwinintel64/nightly/output-jdk11-dev-fastdebug/make-support/failure-logs/hotspot_variant-server_libjvm_objs_asPSYoungGen.o.log` -gt 15; then /bin/echo " ... (rest of output omitted)" ; fi ... (rest of output omitted) /usr/bin/printf "\n* All command lines available in /usr/work/openjdk/nb/darwinintel64/nightly/output-jdk11-dev-fastdebug/make-support/failure-logs.\n" * All command lines available in /usr/work/openjdk/nb/darwinintel64/nightly/output-jdk11-dev-fastdebug/make-support/failure-logs. Do you have a mac for testing at hand? Else I can have a look later on, currently I'm busy with a bug in 15... Best regards, Goetz. From: Patrick Zhang OS Sent: Monday, June 29, 2020 11:39 AM To: Lindenmaier, Goetz ; Langer, Christoph ; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention Hi Goetz and Christoph, Thanks for reviewing. I updated the copyright year and summary line accordingly: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.03/jdk11u-dev.changeset. Very appreciate if any committer could do me a favor and help pushing it. Regards Patrick From: Lindenmaier, Goetz > Sent: Monday, June 29, 2020 4:20 PM To: Patrick Zhang OS >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention Hi Patrick, The change looks good now. Please remove the "Summary:" line. It only repeats the bug title, and thus is redundant. You can use the Summary line if you want to add more than the bug title. You might add "Summary: This is a downport of a part of JDK-8243326" Also, the Bugid in the mail Subject is wrong ... But no matter, it's all set now. 
[Patrick Zhang] sorry about the typos there, I could not 'fix' as it would get a new thread in mail list... Best regards, Goetz. From: Patrick Zhang OS > Sent: Saturday, June 27, 2020 11:33 AM To: Lindenmaier, Goetz >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention Thanks Goetz I updated the of reviewers, http://cr.openjdk.java.net/~qpzhang/8248214/webrev.02/jdk11u-dev.changeset. Regarding the performance, I had tests on Linux system with a couple of x86_64/aarch64 servers, I am not sure if mentioning specjbb here would be appropriate, by far, most results of this benchmark are positive especially the metrics sensitive to GC stability (G1 or ParallelGC), and no obvious change with others probably due to microarchitecture level differences in handling exclusive load/store. This is similar as the original patch [1]. Updated "Fix request (11u)" with a risk estimation of this downporting, see JBS [1] please. I am not familiar with the process of jdk-updates. Is it ok to push this downporting patch now? or I should still wait for maintainer's approval at JBS (jdk11u-fix-yes?). [1] https://bugs.openjdk.java.net/browse/JDK-8248214?focusedCommentId=14349531&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14349531 Regards Patrick -----Original Message----- From: Lindenmaier, Goetz > Sent: Friday, June 26, 2020 3:17 PM To: Patrick Zhang OS >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Hi Patrick, I had a look at your change. I think it makes sense to bring this to 11, if there actually is the performance gain you mention. Reviewed. Please add in the "Fix request" comment in the JBS the risk of downporting this. And I think is should be "Fix request (11u)" because different people will review your fix request for 11 and 8. Best regards, Goetz. > -----Original Message----- > From: jdk-updates-dev > On > Behalf Of Patrick Zhang OS > Sent: Wednesday, June 24, 2020 11:55 AM > To: jdk-updates-dev at openjdk.java.net > Cc: hotspot-gc-dev > > Subject: [DMARC FAILURE] [11u] RFR: 8244214: Add paddings for > TaskQueueSuper to reduce false-sharing cache contention > > Hi > > Could I ask for a review of this simple patch which takes a tiny part > from the original ticket JDK-8243326 [1]. The reason that I do not > want a full backport is, the majority of the patch at jdk/jdk [2] is > to clean up the volatile use and may be not very meaningful to 11u, > furthermore the context (dependencies on atomic.hpp refactor) is too > complicated to generate a clear backport (I tried, ~81 files need to be changed). > > The purpose of having this one-line change to 11u is, the two volatile > variables in TaskQueueSuper: _bottom, _age and corresponding atomic > operations upon, may cause severe cache contention inside GC with > larger number of threads, i.e., specified by -XX:ParallelGCThreads=##, > adding paddings (up to DEFAULT_CACHE_LINE_SIZE) in-between can reduce > the possibility of false-sharing cache contention. I do not need the > paddings before _bottom and after _age from the original patch [2], > because the instances of TaskQueueSuper are usually (always) allocated > in a set of queues, in which they are naturally separated. Please review, thanks. 
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8248214 > Webrev: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.01/ > Testing: tier1-2 pass with the patch, commercial benchmarks and small > C++ test cases (to simulate the data struct and work-stealing > algorithm atomics) validated the performance, no regression. > > By the way, I am going to request for 8u backport as well once 11u > would have it. > > [1] https://bugs.openjdk.java.net/browse/JDK-8243326 Cleanup use of > volatile in taskqueue code [2] > https://hg.openjdk.java.net/jdk/jdk/rev/252a1602b4c6 > > Regards > Patrick >
From zgu at redhat.com Mon Jun 29 13:29:38 2020 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 29 Jun 2020 09:29:38 -0400 Subject: [16] RFR 8248227: Shenandoah: Refactor Shenandoah::heap() to match other GCs In-Reply-To: References: Message-ID: Hi Aleksey, On 6/29/20 3:04 AM, Aleksey Shipilev wrote: > On 6/24/20 2:18 PM, Zhengyu Gu wrote: >> Please review this small patch that refactors Shenandoah::heap() to >> match other GCs. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8248227 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8248227/webev.00/index.html > > Wait. We do the same thing as ZHeap::heap(). This patch clashes with the intent of JDK-8241743. At > the very least we have to prove it compiles down to the same. > Based on comments of JDK-8247740, it intends to make the call inline-able. ZGC is a different story; it is not derived from CollectedHeap. I checked a few places, and it does seem to inline ShenandoahHeap::heap() calls, e.g. 0x00007ffff780c29c <+28>: push %rbx 0x00007ffff780c29d <+29>: sub $0x48,%rsp 0x00007ffff780c2a1 <+33>: lea 0x54e8c8(%rip),%rax # 0x7ffff7d5ab70 <_ZN14ShenandoahHeap5_heapE> 0x00007ffff780c2a8 <+40>: mov (%rax),%rbx Please feel free to reject the patch if you think otherwise. Thanks, -Zhengyu
From shade at redhat.com Mon Jun 29 13:31:50 2020 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 29 Jun 2020 15:31:50 +0200 Subject: [16] RFR 8248227: Shenandoah: Refactor Shenandoah::heap() to match other GCs In-Reply-To: References: Message-ID: On 6/29/20 3:29 PM, Zhengyu Gu wrote: > Hi Aleksey, > > On 6/29/20 3:04 AM, Aleksey Shipilev wrote: >> On 6/24/20 2:18 PM, Zhengyu Gu wrote: >>> Please review this small patch that refactors Shenandoah::heap() to >>> match other GCs. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8248227 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8248227/webev.00/index.html >> >> Wait. We do the same thing as ZHeap::heap(). This patch clashes with the intent of JDK-8241743. At >> the very least we have to prove it compiles down to the same. >> > > Based on comments of JDK-8247740, it intends to make the call inline-able. > > ZGC is a different story; it is not derived from CollectedHeap. > > I checked a few places, and it does seem to inline ShenandoahHeap::heap() > calls, e.g. > > 0x00007ffff780c29c <+28>: push %rbx > 0x00007ffff780c29d <+29>: sub $0x48,%rsp > 0x00007ffff780c2a1 <+33>: lea 0x54e8c8(%rip),%rax # > 0x7ffff7d5ab70 <_ZN14ShenandoahHeap5_heapE> > 0x00007ffff780c2a8 <+40>: mov (%rax),%rbx OK then! Looks good. -- Thanks, -Aleksey
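To make the point of the refactoring concrete, a minimal sketch of the singleton pattern under discussion -- simplified from the real CollectedHeap/ShenandoahHeap code, with a toy class name -- shows why the accessor can compile down to the single load visible in the disassembly above:

// In the header: the singleton lives in a static field and the accessor
// is defined inline, so callers compile down to one load of _heap
// instead of a call through Universe::heap() plus a checked downcast.
class ToyHeap {
  static ToyHeap* _heap;               // set once during VM bootstrap
public:
  static ToyHeap* heap() { return _heap; }
  static void initialize(ToyHeap* h) { _heap = h; }
};

// In exactly one .cpp file:
ToyHeap* ToyHeap::_heap = nullptr;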
From rkennke at redhat.com Mon Jun 29 15:14:12 2020 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 29 Jun 2020 17:14:12 +0200 Subject: [16] RFR: 8248391: Unify handling of all OopStorage instances in weak root processing In-Reply-To: <8e831a42-61b0-1605-ea71-09f97d82f328@oracle.com> References: <8e831a42-61b0-1605-ea71-09f97d82f328@oracle.com> Message-ID: Hi Erik, This is great stuff. I've reviewed the Shenandoah parts -- they look good to me. I'll try to review the rest of it later this week. Thanks, Roman > Hi, > > Today, when a weak OopStorage is added, you have to plug it in > explicitly to ZGC, Shenandoah and the WeakProcessor, used by > Shenandoah, > Serial, Parallel and G1. Especially when the runtime data structure > associated with an OopStorage needs a notification when oops die. > Then > you have to explicitly plug in notification code in various places in > GC > code. > It would be ideal if this process could be completely automated. > > This patch allows each OopStorage to have an associated notification > function. This is a callback function into the runtime, stating how > many > oops have died this GC cycle. This allows runtime data structures to > perform accounting for how large a part of the data structure needs > cleaning, and whether to trigger such cleaning or not. > > So the interface between the GCs and the OopStorage is that during > weak > processing, the GC promises to call the callback function with how > many > oops died. Some shared infrastructure makes this very easy for the > GCs. > > Weak processing now uses the OopStorageSet iterators across all GCs, > so > that adding a new weak OopStorage (even with notification functions) > does not require touching any GC code. > > Kudos to Zhengyu for providing some Shenandoah code for this, and > StefanK for pre-reviewing it. Also, I am about to go out of office > now, > so StefanK promised to take it from here. Big thanks for that! > > CR: > https://bugs.openjdk.java.net/browse/JDK-8248391 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8248391/webrev.00/ > > Thanks, > /Erik >
From goetz.lindenmaier at sap.com Mon Jun 29 16:26:03 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 29 Jun 2020 16:26:03 +0000 Subject: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention In-Reply-To: References: Message-ID: Hi Patrick, If you use sizeof(uint) it works on Mac. Uint is also the term jdk/jdk uses here. I put it into our CI again to make sure all platforms build. I'll update you tomorrow (or ping me if I forget). Also I think we should move the line below the declaration of _bottom, as it now depends on the type used there. Best regards, Goetz.

diff --git a/src/hotspot/share/gc/shared/taskqueue.hpp b/src/hotspot/share/gc/shared/taskqueue.hpp
--- a/src/hotspot/share/gc/shared/taskqueue.hpp
+++ b/src/hotspot/share/gc/shared/taskqueue.hpp
@@ -113,6 +113,8 @@
   // The first free element after the last one pushed (mod N).
   volatile uint _bottom;
+  // Add paddings to reduce false-sharing cache contention between _bottom and _age
+  DEFINE_PAD_MINUS_SIZE(0, DEFAULT_CACHE_LINE_SIZE, sizeof(uint));
   enum { MOD_N_MASK = N - 1 };
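As a standalone illustration of what the DEFINE_PAD_MINUS_SIZE line buys (plain C++ in place of HotSpot's padded.hpp macro; the 64-byte line size and the exact field types are assumptions of the sketch, not the real taskqueue.hpp layout):

#include <cstddef>
#include <cstdint>

constexpr size_t CACHE_LINE = 64;   // assumed cache line size

// Without the pad, _bottom (written by the queue owner) and _age
// (CASed by stealing threads) can share one cache line, so every
// steal invalidates the owner's line and vice versa.
struct PaddedQueue {
  volatile uint32_t _bottom;                  // pushed/popped by the owner
  char _pad[CACHE_LINE - sizeof(uint32_t)];   // mirrors DEFINE_PAD_MINUS_SIZE
  volatile uint64_t _age;                     // tag + top, CASed by stealers
};

static_assert(offsetof(PaddedQueue, _age) >= CACHE_LINE,
              "_bottom and _age must not share a cache line");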
From: Patrick Zhang OS > Sent: Saturday, June 27, 2020 11:33 AM To: Lindenmaier, Goetz >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention Thanks Goetz I updated the list of reviewers, http://cr.openjdk.java.net/~qpzhang/8248214/webrev.02/jdk11u-dev.changeset. Regarding the performance, I had tests on Linux systems with a couple of x86_64/aarch64 servers. I am not sure if mentioning specjbb here would be appropriate; by far, most results of this benchmark are positive, especially the metrics sensitive to GC stability (G1 or ParallelGC), and no obvious change with others, probably due to microarchitecture-level differences in handling exclusive load/store. This is similar to the original patch [1]. Updated "Fix request (11u)" with a risk estimation of this downporting, see JBS [1] please. I am not familiar with the process of jdk-updates. Is it ok to push this downporting patch now, or should I still wait for the maintainer's approval at JBS (jdk11u-fix-yes?)? [1] https://bugs.openjdk.java.net/browse/JDK-8248214?focusedCommentId=14349531&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14349531 Regards Patrick -----Original Message----- From: Lindenmaier, Goetz > Sent: Friday, June 26, 2020 3:17 PM To: Patrick Zhang OS >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Hi Patrick, I had a look at your change. I think it makes sense to bring this to 11, if there actually is the performance gain you mention. Reviewed. Please add in the "Fix request" comment in the JBS the risk of downporting this. And I think it should be "Fix request (11u)" because different people will review your fix request for 11 and 8. Best regards, Goetz. > -----Original Message----- > From: jdk-updates-dev > On > Behalf Of Patrick Zhang OS > Sent: Wednesday, June 24, 2020 11:55 AM > To: jdk-updates-dev at openjdk.java.net > Cc: hotspot-gc-dev > > Subject: [DMARC FAILURE] [11u] RFR: 8244214: Add paddings for > TaskQueueSuper to reduce false-sharing cache contention > > Hi > > Could I ask for a review of this simple patch which takes a tiny part > from the original ticket JDK-8243326 [1]. The reason that I do not > want a full backport is, the majority of the patch at jdk/jdk [2] is > to clean up the volatile use and may be not very meaningful to 11u, > furthermore the context (dependencies on atomic.hpp refactor) is too > complicated to generate a clear backport (I tried, ~81 files need to be changed). > > The purpose of having this one-line change to 11u is, the two volatile > variables in TaskQueueSuper: _bottom, _age and corresponding atomic > operations upon, may cause severe cache contention inside GC with > larger number of threads, i.e., specified by -XX:ParallelGCThreads=##, > adding paddings (up to DEFAULT_CACHE_LINE_SIZE) in-between can reduce > the possibility of false-sharing cache contention. I do not need the > paddings before _bottom and after _age from the original patch [2], > because the instances of TaskQueueSuper are usually (always) allocated > in a set of queues, in which they are naturally separated. Please review, thanks.
> > [1] https://bugs.openjdk.java.net/browse/JDK-8243326 Cleanup use of > volatile in taskqueue code [2] > https://hg.openjdk.java.net/jdk/jdk/rev/252a1602b4c6 > > Regards > Patrick > From goetz.lindenmaier at sap.com Mon Jun 29 17:32:50 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 29 Jun 2020 17:32:50 +0000 Subject: [11u] RFR: 8244214: Add paddings for TaskQueuSuper to reduce false-sharing cache contention Message-ID: Hi, Christoph and I sent several replies to this mail. For some reason they are not delivered to the archives. I get them back with security scan issues ... So I write a clean answer here, maybe this works: We saw build problems on mac. It does not like sizeof(_bottom). If I use sizeof(uint) it works on mac. Uint is also the term jdk/jdk uses here. I put it into our CI again to make sure all platforms build. I'll update you tomorrow if it worked. Also I think we should move the line up, just below the declaration of _bottom, as it now depends on the type used there. Patrick, the bug id in the subject of this mail is wrong. I kept the wrong one in this answer so the mail thread does not break. Also, please remove the "Summary:" line in webrev.02. It only repeats the bug title, and thus is redundant. You can use the Summary line if you want to add more than the bug title. You might add "Summary: This is a downport of a part of JDK-8243326" Best regards, Goetz. Best regards, Goetz. From goetz.lindenmaier at sap.com Mon Jun 29 18:39:19 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 29 Jun 2020 18:39:19 +0000 Subject: [11u] RFR: 8244241: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Message-ID: Hi, Christoph and I sent several replies to this mail. For some reason they are not delivered to the archives. I get them back with security scan issues ... So I write a clean answer here, maybe this works: We saw build problems on mac. It does not like sizeof(_bottom). If I use sizeof(uint) it works on mac. Uint is also the term jdk/jdk uses here. I put it into our CI again to make sure all platforms build. I'll update you tomorrow if it worked. Also I think we should move the line up, just below the declaration of _bottom, as it now depends on the type used there. Patrick, the bug id in the subject of this mail is wrong. I kept the wrong one in this answer so the mail thread does not break. Also, please remove the "Summary:" line in webrev.02. It only repeats the bug title, and thus is redundant. You can use the Summary line if you want to add more than the bug title. You might add Summary: This is a downport of a part of JDK-8243326 Best regards, Goetz. From linzang at tencent.com Tue Jun 30 02:19:24 2020 From: linzang at tencent.com (=?utf-8?B?bGluemFuZyjoh6fnkLMp?=) Date: Tue, 30 Jun 2020 02:19:24 +0000 Subject: RFR(L): 8215624: add parallel heap inspection support for jmap histo(G1)(Internet mail) In-Reply-To: References: <94C0D11E-F395-4FE4-9ECE-5ECC84B3AE1B@tencent.com> <09702D94-F53C-413D-A156-B7390D689BC6@tencent.com> <4751f476-1e7a-490f-80c5-96b58eb25191@oracle.com> Message-ID: Dear All, Sorry to bother again, I just want to make sure that is this change worth to be continue to work on? If decision is made to not. I think I can drop this work and stop asking for help reviewing... Thanks for all your help about reviewing this previously. BRs, Lin ?On 2020/5/9, 3:47 PM, "linzang(??)" wrote: Dear All, May I ask your help again for review the latest change? Thanks! 
BRs, Lin On 2020/4/28, 1:54 PM, "linzang(臧琳)" wrote: Hi Stefan, >> - Adding Atomic::load/store. >> - Removing the time measurement in the run_task. I renamed G1's function >> to run_task_timed. If we need this outside of G1, we can rethink the API >> at that point. >> - ZGC style cleanups Thanks for revising the patch; the changes all look good to me, and I have made a tiny change based on it: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_04/ http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_04-delta/ It reduces the scope of the mutex in ParHeapInspectTask and deletes unnecessary comments. BRs, Lin On 2020/4/27, 4:34 PM, "Stefan Karlsson" wrote: Hi Lin, On 2020-04-26 05:10, linzang(臧琳) wrote: > Hi Stefan and Paul, > I have made a new patch based on your comments and Stefan's PoC code: > Webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_03/ > Delta (based on Stefan's change): http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_03-delta/webrev_03-delta/ Thanks for providing a delta patch. It makes it much easier to look at, and more likely for reviewers to continue reviewing. I'm going to continue focusing on the GC parts, and leave the rest to others to review. > > And here are the main changes I made and want to discuss with you: > 1. Changed "parallelThreadNum=" to "parallel=" for the jmap -histo options. > 2. Added logic to handle the case where parallel heap inspection fails, in heapInspection.cpp. > This is because ParHeapInspectTask creates a thread-local KlassInfoTable in its work() method, and this may fail because of native OOM; in this case the parallel inspection should fail and serial heap inspection can be tried. > One more thing I want to discuss with you is the member "_success" of ParHeapInspectTask: when native OOM happens, it is set to false. And since this "set" operation can be conducted in multiple threads, should it be an atomic op? IMO, this is not necessary because "_success" can only be set to false, and there is no way to change it back to true after the ParHeapInspectTask instance is created, so it is safe to be non-atomic; do you agree with that? In these situations you should be using the Atomic::load/store primitives. We're moving toward a later C++ standard where data races are considered undefined behavior. > 3. Made CollectedHeap::run_task() an abstract virtual function, so that every subclass of CollectedHeap has to support it, and later implementations of new collected heaps will not miss the "parallel" feature. > The problem I want to discuss with you is about EpsilonHeap and SerialHeap: as they may not need parallel heap iteration, I only call task->work(0), in case run_task() is invoked some way in the future. Another way is to leave run_task() unimplemented; which one do you think is better? I don't have a strong opinion about this. And also please help take a look at ZHeap: there is a class ZTask that wraps AbstractGangTask, and CollectedHeap::run_task() only accepts an AbstractGangTask* as argument, so I made a delegate class to adapt it; please see src/hotspot/share/gc/z/zHeap.cpp. > > There may be other, better ways to solve the above problems; any comments are welcome, Thanks! I've created a few cleanups and changes on top of your latest patch: https://cr.openjdk.java.net/~stefank/8215624/webrev.02.delta https://cr.openjdk.java.net/~stefank/8215624/webrev.02 - Adding Atomic::load/store. - Removing the time measurement in the run_task. I renamed G1's function to run_task_timed. If we need this outside of G1, we can rethink the API at that point. - ZGC style cleanups Thanks, StefanK
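A sketch of StefanK's point about the _success flag, using std::atomic in place of HotSpot's Atomic::load/Atomic::store (the relaxed ordering is an assumption of the sketch; the flag carries no other data):

#include <atomic>

// A _success flag written by any worker that hits native OOM. Even
// though it only ever goes true -> false, a plain bool store/load from
// multiple threads is a data race, which the C++ memory model treats
// as undefined behavior; an atomic keeps the cheap "just a flag"
// semantics while staying well-defined.
class ParTaskFlag {
  std::atomic<bool> _success{true};
public:
  void fail()           { _success.store(false, std::memory_order_relaxed); }
  bool succeeded() const { return _success.load(std::memory_order_relaxed); }
};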
> > BRs, > Lin > > On 2020/4/23, 11:08 AM, "linzang(臧琳)" wrote: > Thanks Paul! I agree with using "parallel" and will make the update in the next patch. Thanks for helping update the CSR. > > BRs, > Lin > > On 2020/4/23, 4:42 AM, "Hohensee, Paul" wrote: > > For the interface, I'd use "parallel" instead of "parallelThreadNum". All the other options are lower case, and it's a lot easier to type "parallel". I took the liberty of updating the CSR. If you're ok with it, you might want to change variable names and such, plus of course JMap.usage. > > Thanks, > Paul > > On 4/22/20, 2:29 AM, "serviceability-dev on behalf of linzang(臧琳)" wrote: > > Dear Stefan, > > Thanks a lot! I agree with you to decouple the heap inspection code from the GC's. > I will start from your PoC code, and may discuss with you later. > > > BRs, > Lin > > On 2020/4/22, 5:14 PM, "Stefan Karlsson" wrote: > > Hi Lin, > > I took a look at this earlier and saw that the heap inspection code is > strongly coupled with the CollectedHeap and G1CollectedHeap. I'd prefer > if we'd abstract this away, so that the GCs only provide a "parallel > object iteration" interface, and the heap inspection code is kept elsewhere. > > I started experimenting with doing that, but other higher-priority (to > me) tasks have had to take precedence. > > I've uploaded my work-in-progress / proof-of-concept: > https://cr.openjdk.java.net/~stefank/8215624/webrev.01.delta/ > https://cr.openjdk.java.net/~stefank/8215624/webrev.01/ > > The current code doesn't handle the lifecycle (deletion) of the > ParallelObjectIterators. There's also code left unimplemented in and around > CollectedHeap::run_task. However, I think this could work as a basis to > pull the heap inspection code out of the GCs. > > Thanks, > StefanK > > On 2020-04-22 02:21, linzang(臧琳) wrote: > > Dear all, > > May I ask your help to review? This RFR has been there for quite a while. > > Thanks! > > > > BRs, > > Lin > > > On 2020/3/16, 5:18 PM, "linzang(臧琳)" wrote: > > > >> Just updated a new patch; my preliminary measurements show about a 3.5x speedup of jmap histo on a nearly full 4GB G1 heap (8-core platform with parallel thread number set to 4). > >> webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_02/ > >> bug: https://bugs.openjdk.java.net/browse/JDK-8215624 > >> CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 > >> BRs, > >> Lin > >> > On 2020/3/2, 9:56 PM, "linzang(臧琳)" wrote: > >> > > >> > Dear all, > >> > Let me try to ease the reviewing work by some explanation :P > >> > The patch's target is to speed up jmap -histo for heap iteration; from my experience it is necessary for large heap investigation. E.g. in a big-data scenario I have tried to run jmap -histo against a 180GB heap, and it does take quite a while. > >> > And if my understanding is correct, even jmap -histo without the "live" option does heap inspection with the heap lock acquired, so it is very likely to block the mutator thread in an allocation-sensitive scenario. I would say the faster the heap inspection runs, the shorter the mutator is blocked. This is why parallel iteration for jmap is necessary. > >> > I think the parallel heap inspection should be applied to all kinds of heaps. However, considering the heap layouts are different across GCs, much time is required to understand all kinds of heap layouts to make the whole change.
IMO, it is not wise to have a huge patch for the whole solution at once, and it is even harder to review it. So I plan to implement it incrementally; the first patch (this one) is going to confirm the implementation details of how jmap accepts the new option and passes it to the attachListener of the JVM process, and then how to make the parallel inspection closure generic enough to be easy to extend to different heap layouts, and also how to implement the heap inspection in a specific GC's heap. This patch uses G1's heap as the beginning. > >> > This patch actually does several things: > >> > 1. Add an option "parallelThreadNum=" to jmap -histo; the default behavior is to set N to 0, which means letting the JVM decide how many threads to use for heap inspection. Setting this option to 1 will disable parallel heap inspection. (more details in CSR: https://bugs.openjdk.java.net/browse/JDK-8239290) > >> > 2. Make a change in how JMap passes arguments, changes in http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/src/jdk.jcmd/share/classes/sun/tools/jmap/JMap.java.udiff.html; originally it passed options as separate arguments to the attachListener, this patch changes it so that all options are composed into a single string. So the arg_count_max in attachListener.hpp does not need to be changed, which avoids the compatibility issue, as discussed at https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-March/027334.html > >> > 3. Add an abstract class ParHeapInspectTask in heapInspection.hpp / heapInspection.cpp. Its work(uint worker_id) method prepares the data structure (KlassInfoTable) needed for every parallel worker thread, and then calls do_object_iterate_parallel(), which is the heap-specific implementation. I also added some mechanism in KlassInfoTable to support parallel iteration, such as merge(). > >> > 4. In a specific heap (G1 in this patch), create a subclass of ParHeapInspectTask and implement do_object_iterate_parallel() for parallel heap inspection. For G1, it simply invokes G1CollectedHeap's object_iterate_parallel(). > >> > 5. Add a related test. > >> > 6. It may be easy to extend this patch to other kinds of heaps by creating a subclass of ParHeapInspectTask and implementing do_object_iterate_parallel(). > >> > > >> > Hope this info could help with code review and initiate the discussion :-) > >> > Thanks! > >> > > >> > BRs, > >> > Lin > >> > >On 2020/2/19, 9:40 AM, "linzang(臧琳)" wrote: > >> > > > >> > > Re-post this RFR with the correct enhancement number to make it trackable. > >> > > Please ignore the previous wrong post, sorry for the trouble. > >> > > > >> > > webrev: http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/ > >> > > bug: https://bugs.openjdk.java.net/browse/JDK-8215624 > >> > > CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 > >> > > -------------- > >> > > Lin > >> > > >Hi Lin, > > > > > > > >> > > >Could you, please, re-post your RFR with the right enhancement number in > >> > > >the message subject? > >> > > >It will be more trackable this way. > >> > > > > >> > > >Thanks, > >> > > >Serguei > >> > > > > >> > > > > >> > > >On 2/17/20 10:29 PM, linzang(臧琳) wrote: > >> > > >> Dear David, > >> > > >> Thanks a lot! > >> > > >> I have updated the refined code to http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/. > >> > > >> IMHO the parallel heap inspection can be extended to all kinds of heaps as long as the heap layout can support parallel iteration.
> >> > > >> Maybe we can first use this webrev to discuss how to implement it, because I am not sure my current implementation is an appropriate way to communicate with CollectedHeap; then we can extend the solution to other kinds of heaps. > >> > > >> > >> > > >> Thanks, > >> > > >> -------------- > >> > > >> Lin > >> > > >>> Hi Lin, > >> > > >>> > >> > > >>> Adding in hotspot-gc-dev as they need to see how this interacts with GC > >> > > >>> worker threads, and whether it needs to be extended beyond G1. > >> > > >>> > >> > > >>> I happened to spot one nit when browsing: > >> > > >>> > >> > > >>> src/hotspot/share/gc/shared/collectedHeap.hpp > >> > > >>> > >> > > >>> + virtual bool run_par_heap_inspect_task(KlassInfoTable* cit, > >> > > >>> + BoolObjectClosure* filter, > >> > > >>> + size_t* missed_count, > >> > > >>> + size_t thread_num) { > >> > > >>> + return NULL; > >> > > >>> > >> > > >>> s/NULL/false/ > >> > > >>> > >> > > >>> Cheers, > >> > > >>> David > >> > > >>> > >> > > >>> On 18/02/2020 2:15 pm, linzang (Lin Zang) wrote: > >> > > >>>> Dear All, > >> > > >>>> May I ask your help to review the following changes: > >> > > >>>> webrev: > >> > > >>>> http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/ > >> > > >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8215624 > >> > > >>>> related CSR: https://bugs.openjdk.java.net/browse/JDK-8239290 > >> > > >>>> This patch enables parallel heap inspection of G1 for jmap -histo. > >> > > >>>> My simple test showed it can speed up jmap -histo by 2x with > >> > > >>>> parallelThreadNum set to 2 for a heap of ~500M on a 4-core platform. > >> > > >>>> > >> > > >>>> ------------------------------------------------------------------------ > >> > > >>>> BRs, > >> > > >>>> Lin > >> > > >> > > >> > > >

From igor.ignatyev at oracle.com Tue Jun 30 03:23:57 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 29 Jun 2020 20:23:57 -0700 Subject: RFR(S) [15] : 8208207 : Test nsk/stress/jni/gclocker/gcl001 fails after co-location Message-ID: <975A00DA-EA4F-4150-ABBF-31FE81CD2F05@oracle.com> http://cr.openjdk.java.net/~iignatyev//8208207/webrev.00 > 38 lines changed: 1 ins; 4 del; 33 mod; Hi all, could you please review the small patch which fixes the gcl001 test and brings it back to work? The issue reported in the bug (assert(!JavaThread::current()->in_critical()) failed: Would deadlock) was caused by calls to JNI functions other than (Get|Release).*Critical within a critical region. After this got fixed by moving the Get.*Length calls outside of the critical regions, the test started to fail w/ a "Data validation failure" message. This was b/c the code returned 0 and skipped sorting the arrays in case isCopy was changed to TRUE by Get.*Critical. As there is no way to guarantee that we won't get a copy of the array, I decided to remove the checks of isCopy, which, although it might slightly change what this test checks, makes the test more robust.
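For reference, a minimal illustration of the rule behind the first fix; the function name is made up, and this is not the test's actual code:

#include <jni.h>

// Between Get*Critical and the matching Release*Critical no other JNI calls
// are allowed, so array lengths must be queried before entering the region.
extern "C" JNIEXPORT void JNICALL
Java_Example_touchCritical(JNIEnv* env, jclass, jintArray arr) {
  jsize len = env->GetArrayLength(arr);  // outside the critical region
  jint* p = (jint*)env->GetPrimitiveArrayCritical(arr, NULL);
  if (p != NULL) {
    for (jsize i = 0; i < len; i++) {
      p[i] += 1;  // only touch the raw array here; no JNI calls
    }
    env->ReleasePrimitiveArrayCritical(arr, p, 0);  // region ends here
  }
}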
JBS: https://bugs.openjdk.java.net/browse/JDK-8208207 webrev: http://cr.openjdk.java.net/~iignatyev//8208207/webrev.00 testing: - run the test against {linux,windows,macosx-x64}-{product,fastdebug} w/ default GC - run the test against macosx-x64-slowdebug w/ Serial,Parallel,G1,ZGC Thanks, -- Igor

From patrick at os.amperecomputing.com Tue Jun 30 03:40:56 2020 From: patrick at os.amperecomputing.com (Patrick Zhang OS) Date: Tue, 30 Jun 2020 03:40:56 +0000 Subject: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention In-Reply-To: References: Message-ID: Thanks for finding it out. I updated the patch: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.04/jdk11u-dev.changeset. Yes, jdk/jdk uses sizeof(uint), and placing the two variables side by side can remind people in case of type changes in the future. I don't have a mac system, so thanks a lot again for having it run in your CI. Regards Patrick From: Lindenmaier, Goetz Sent: Tuesday, June 30, 2020 12:26 AM To: Patrick Zhang OS ; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Hi Patrick, If you use sizeof(uint) it works on mac. uint is also the term jdk/jdk uses here. I put it into our CI again to make sure all platforms build. I'll update you tomorrow (or ping me if I forget). Also I think we should move the line below the declaration of _bottom, as it now depends on the type used there. Best regards, Goetz. diff --git a/src/hotspot/share/gc/shared/taskqueue.hpp b/src/hotspot/share/gc/shared/taskqueue.hpp --- a/src/hotspot/share/gc/shared/taskqueue.hpp +++ b/src/hotspot/share/gc/shared/taskqueue.hpp @@ -113,6 +113,8 @@ // The first free element after the last one pushed (mod N). volatile uint _bottom; + // Add paddings to reduce false-sharing cache contention between _bottom and _age + DEFINE_PAD_MINUS_SIZE(0, DEFAULT_CACHE_LINE_SIZE, sizeof(uint)); enum { MOD_N_MASK = N - 1 }; From: Patrick Zhang OS > Sent: Saturday, June 27, 2020 11:33 AM To: Lindenmaier, Goetz >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Thanks Goetz, I updated the list of reviewers: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.02/jdk11u-dev.changeset. Regarding the performance, I ran tests on Linux systems with a couple of x86_64/aarch64 servers. I am not sure if mentioning SPECjbb here would be appropriate; by far, most results of this benchmark are positive, especially for the metrics sensitive to GC stability (G1 or ParallelGC), with no obvious change in the others, probably due to microarchitecture-level differences in handling exclusive load/store. This is similar to the original patch [1]. Updated "Fix request (11u)" with a risk estimation of this downport; see JBS [1] please. I am not familiar with the process of jdk-updates. Is it ok to push this downport patch now, or should I still wait for the maintainer's approval at JBS (jdk11u-fix-yes)?
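As a stand-alone illustration of what the pad in the diff above buys (not HotSpot code; the 64-byte line size is an assumption standing in for DEFAULT_CACHE_LINE_SIZE):

#include <atomic>

// Stand-in for TaskQueueSuper's hot fields: the queue owner writes bottom,
// stealing threads CAS age. Keeping them on separate cache lines means one
// side's writes no longer invalidate the other side's cached copy.
struct QueueIndices {
  alignas(64) std::atomic<unsigned> bottom;  // owner pushes/pops here
  alignas(64) std::atomic<unsigned> age;     // thieves CAS here
};

static_assert(sizeof(QueueIndices) >= 2 * 64,
              "bottom and age land on distinct cache lines");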
[1] https://bugs.openjdk.java.net/browse/JDK-8248214?focusedCommentId=14349531&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14349531 Regards Patrick -----Original Message----- From: Lindenmaier, Goetz > Sent: Friday, June 26, 2020 3:17 PM To: Patrick Zhang OS >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Hi Patrick, I had a look at your change. I think it makes sense to bring this to 11, if there actually is the performance gain you mention. Reviewed. Please add the risk of downporting this to the "Fix request" comment in the JBS. And I think it should be "Fix request (11u)" because different people will review your fix request for 11 and 8. Best regards, Goetz. > -----Original Message----- > From: jdk-updates-dev > On > Behalf Of Patrick Zhang OS > Sent: Wednesday, June 24, 2020 11:55 AM > To: jdk-updates-dev at openjdk.java.net > Cc: hotspot-gc-dev > > Subject: [DMARC FAILURE] [11u] RFR: 8244214: Add paddings for > TaskQueueSuper to reduce false-sharing cache contention > > Hi > > Could I ask for a review of this simple patch, which takes a tiny part > of the original ticket JDK-8243326 [1]? The reason that I do not > want a full backport is that the majority of the patch in jdk/jdk [2] is > cleanup of the volatile use and may not be very meaningful to 11u; > furthermore, the context (dependencies on the atomic.hpp refactor) is too > complicated to generate a clean backport from (I tried; ~81 files would need to be changed). > > The purpose of having this one-line change in 11u is that the two volatile > variables in TaskQueueSuper, _bottom and _age, and the corresponding atomic > operations upon them may cause severe cache contention inside the GC with > larger numbers of threads, i.e., as specified by -XX:ParallelGCThreads=##; > adding paddings (up to DEFAULT_CACHE_LINE_SIZE) in between can reduce > the possibility of false-sharing cache contention. I do not need the > paddings before _bottom and after _age from the original patch [2], > because the instances of TaskQueueSuper are usually (always) allocated > in a set of queues, in which they are naturally separated. Please review, thanks. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8248214 > Webrev: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.01/ > Testing: tier1-2 pass with the patch; commercial benchmarks and small > C++ test cases (to simulate the data structure and the work-stealing > algorithm atomics) validated the performance, no regression. > > By the way, I am going to request an 8u backport as well once 11u > has it. > > [1] https://bugs.openjdk.java.net/browse/JDK-8243326 Cleanup use of > volatile in taskqueue code [2] > https://hg.openjdk.java.net/jdk/jdk/rev/252a1602b4c6 > > Regards > Patrick >

From ofirg6 at gmail.com Tue Jun 30 07:02:51 2020 From: ofirg6 at gmail.com (Ofir Gordon) Date: Tue, 30 Jun 2020 10:02:51 +0300 Subject: How to run specific part of the SerialGC in a new thread? In-Reply-To: References: Message-ID: Hi, Sorry for the late follow-up, and thank you for the answer. I looked at the WorkGang mechanism and it seems a little overkill for my purposes; I only want to be able to split the task off to a different thread and wait for it to complete. From what I understand, in order to do this with WorkGang I need to create a dedicated AbstractGangClass which will need to include all methods of the scenario I'm trying to run separately.
Is it not possible to use pthread or similar libraries within the code? Is there another way that doesn't require using WorkGang? Thanks again On Wed, Jun 24, 2020 at 14:52, Thomas Schatzl wrote: > Hi, > > On 24.06.20 11:42, Ofir Gordon wrote: > > Hello, > > > > I'm trying to run a specific procedure within the SerialGC in a > > separate thread, i.e. replace the call to the procedure with the creation of > > a new thread that will run the entire procedure and then return to the main > > thread and continue (the main thread should wait for the split-off thread). > > > > What is the correct way to do it? Is there any threading mechanism that the > > VM uses in order to split tasks off to new threads? > > I tried to simply use the pthread library, but when I add calls to pthread > > methods (like pthread_create) the compilation fails (with a segfault...) > > > > I'll appreciate your help, > > Thanks, > > Ofir > > > > Probably the simplest way is to add a WorkGang with one thread and > use the run_task() method. > > There are a lot of uses of it in the code. > > Thomas >

From kim.barrett at oracle.com Tue Jun 30 07:10:20 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 30 Jun 2020 03:10:20 -0400 Subject: [16] RFR: 8248391: Unify handling of all OopStorage instances in weak root processing In-Reply-To: <8e831a42-61b0-1605-ea71-09f97d82f328@oracle.com> References: <8e831a42-61b0-1605-ea71-09f97d82f328@oracle.com> Message-ID: <11EE1DA7-3265-4A40-BA37-95B66B129F50@oracle.com> > On Jun 26, 2020, at 9:06 AM, Erik Österlund wrote: > > Hi, > > Today, when a weak OopStorage is added, you have to plug it in explicitly to ZGC, Shenandoah and the WeakProcessor, which is used by Shenandoah, Serial, Parallel and G1. This is especially painful when the runtime data structure associated with an OopStorage needs a notification when oops die: then you have to explicitly plug in notification code in various places in GC code. > It would be ideal if this process could be completely automated. > > This patch allows each OopStorage to have an associated notification function. This is a callback function into the runtime, stating how many oops have died this GC cycle. This allows runtime data structures to perform accounting for how large a part of the data structure needs cleaning, and whether to trigger such cleaning or not. > > So the interface between the GCs and the OopStorage is that during weak processing, the GC promises to call the callback function with how many oops died. Some shared infrastructure makes this very easy for the GCs. > > Weak processing now uses the OopStorageSet iterators across all GCs, so that adding a new weak OopStorage (even with notification functions) does not require touching any GC code. > > Kudos to Zhengyu for providing some Shenandoah code for this, and StefanK for pre-reviewing it. Also, I am about to go out of office now, so StefanK promised to take it from here. Big thanks for that! > > CR: > https://bugs.openjdk.java.net/browse/JDK-8248391 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8248391/webrev.00/ > > Thanks, > /Erik While I approve of the goal of this change, I have a couple of issues of some significance with the approach taken. I think the way the GC notifications are being set up is backward. The GC (in particular, OopStorage) shouldn't know about client subsystems in order to record the notification function. That seems like a dependency inversion. Rather, I think client subsystems should be registering callbacks with the GC.
I've never liked the existing mechanism used to gather iteration statistics (number of entries cleared, skipped, processed...). It always seemed rather bolted on after the fact. This change continues that. I'd rather take the opportunity to improve that, by improving OopStorage in this area. I'm working on a reworked version of Erik's changes. So far it's looking pretty good. Some of Erik's work I'm using as-is, but some parts not so much. I'm not yet ready to send anything out for review, but I'm mentioning it here so others hopefully don't spend too much time reviewing the existing change. I've discussed this with Stefan, and he's agreed to hand off responsibility for this project to me.

From goetz.lindenmaier at sap.com Tue Jun 30 07:45:47 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 30 Jun 2020 07:45:47 +0000 Subject: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention In-Reply-To: References: Message-ID: Hi Patrick, This looks good now, and our builds all passed as well. I'll sponsor it. Best regards, Goetz PS: Sorry for the mail flood yesterday; they finally all showed up in the archive. From: Patrick Zhang OS Sent: Tuesday, June 30, 2020 5:41 AM To: Lindenmaier, Goetz ; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Thanks for finding it out. I updated the patch: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.04/jdk11u-dev.changeset. Yes, jdk/jdk uses sizeof(uint), and placing the two variables side by side can remind people in case of type changes in the future. I don't have a mac system, so thanks a lot again for having it run in your CI. Regards Patrick From: Lindenmaier, Goetz > Sent: Tuesday, June 30, 2020 12:26 AM To: Patrick Zhang OS >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Hi Patrick, If you use sizeof(uint) it works on mac. uint is also the term jdk/jdk uses here. I put it into our CI again to make sure all platforms build. I'll update you tomorrow (or ping me if I forget). Also I think we should move the line below the declaration of _bottom, as it now depends on the type used there. Best regards, Goetz. diff --git a/src/hotspot/share/gc/shared/taskqueue.hpp b/src/hotspot/share/gc/shared/taskqueue.hpp --- a/src/hotspot/share/gc/shared/taskqueue.hpp +++ b/src/hotspot/share/gc/shared/taskqueue.hpp @@ -113,6 +113,8 @@ // The first free element after the last one pushed (mod N). volatile uint _bottom; + // Add paddings to reduce false-sharing cache contention between _bottom and _age + DEFINE_PAD_MINUS_SIZE(0, DEFAULT_CACHE_LINE_SIZE, sizeof(uint)); enum { MOD_N_MASK = N - 1 }; From: Patrick Zhang OS > Sent: Saturday, June 27, 2020 11:33 AM To: Lindenmaier, Goetz >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Thanks Goetz, I updated the list of reviewers: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.02/jdk11u-dev.changeset.
Regarding the performance, I ran tests on Linux systems with a couple of x86_64/aarch64 servers. I am not sure if mentioning SPECjbb here would be appropriate; by far, most results of this benchmark are positive, especially for the metrics sensitive to GC stability (G1 or ParallelGC), with no obvious change in the others, probably due to microarchitecture-level differences in handling exclusive load/store. This is similar to the original patch [1]. Updated "Fix request (11u)" with a risk estimation of this downport; see JBS [1] please. I am not familiar with the process of jdk-updates. Is it ok to push this downport patch now, or should I still wait for the maintainer's approval at JBS (jdk11u-fix-yes)? [1] https://bugs.openjdk.java.net/browse/JDK-8248214?focusedCommentId=14349531&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14349531 Regards Patrick -----Original Message----- From: Lindenmaier, Goetz > Sent: Friday, June 26, 2020 3:17 PM To: Patrick Zhang OS >; jdk-updates-dev at openjdk.java.net Cc: hotspot-gc-dev > Subject: RE: [11u] RFR: 8244214: Add paddings for TaskQueueSuper to reduce false-sharing cache contention Hi Patrick, I had a look at your change. I think it makes sense to bring this to 11, if there actually is the performance gain you mention. Reviewed. Please add the risk of downporting this to the "Fix request" comment in the JBS. And I think it should be "Fix request (11u)" because different people will review your fix request for 11 and 8. Best regards, Goetz. > -----Original Message----- > From: jdk-updates-dev > On > Behalf Of Patrick Zhang OS > Sent: Wednesday, June 24, 2020 11:55 AM > To: jdk-updates-dev at openjdk.java.net > Cc: hotspot-gc-dev > > Subject: [DMARC FAILURE] [11u] RFR: 8244214: Add paddings for > TaskQueueSuper to reduce false-sharing cache contention > > Hi > > Could I ask for a review of this simple patch, which takes a tiny part > of the original ticket JDK-8243326 [1]? The reason that I do not > want a full backport is that the majority of the patch in jdk/jdk [2] is > cleanup of the volatile use and may not be very meaningful to 11u; > furthermore, the context (dependencies on the atomic.hpp refactor) is too > complicated to generate a clean backport from (I tried; ~81 files would need to be changed). > > The purpose of having this one-line change in 11u is that the two volatile > variables in TaskQueueSuper, _bottom and _age, and the corresponding atomic > operations upon them may cause severe cache contention inside the GC with > larger numbers of threads, i.e., as specified by -XX:ParallelGCThreads=##; > adding paddings (up to DEFAULT_CACHE_LINE_SIZE) in between can reduce > the possibility of false-sharing cache contention. I do not need the > paddings before _bottom and after _age from the original patch [2], > because the instances of TaskQueueSuper are usually (always) allocated > in a set of queues, in which they are naturally separated. Please review, thanks. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8248214 > Webrev: http://cr.openjdk.java.net/~qpzhang/8248214/webrev.01/ > Testing: tier1-2 pass with the patch; commercial benchmarks and small > C++ test cases (to simulate the data structure and the work-stealing > algorithm atomics) validated the performance, no regression. > > By the way, I am going to request an 8u backport as well once 11u > has it.
> > [1] https://bugs.openjdk.java.net/browse/JDK-8243326 Cleanup use of > volatile in taskqueue code [2] > https://hg.openjdk.java.net/jdk/jdk/rev/252a1602b4c6 > > Regards > Patrick >

From thomas.schatzl at oracle.com Tue Jun 30 08:09:50 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 30 Jun 2020 10:09:50 +0200 Subject: How to run specific part of the SerialGC in a new thread? In-Reply-To: References: Message-ID: <2b1a3443-ccbd-602d-ae4c-4cb97284b43d@oracle.com> Hi, On 30.06.20 09:02, Ofir Gordon wrote: > Hi, > Sorry for the late follow-up, and thank you for the answer. > I looked at the WorkGang mechanism and it seems a little overkill for my > purposes; I only want to be able to split the task off to a different thread > and wait for it to complete. From what I understand, in order to do this > with WorkGang I need to create a dedicated AbstractGangClass which will > need to include all methods of the scenario I'm trying to run separately. I am not completely clear about the problem here: of course the spawned thread needs to include (call) the method(s) you want it to run, so there seems to be no difference from any other mechanism. The original question also asked about a single ("a new thread") thread doing all the work, so the problem you have with WorkGang is even more puzzling to me. I agree that there is some boilerplate involved with AbstractGangClass, but pthreads with all the manual synchronization involved seem to be much worse. Maybe you want to do multiple tasks once each by multiple threads? What Hotspot code usually does in this case is that every worker claims a part of that work. There is e.g. SubTasksDone that could help you with that; see e.g. G1RootProcessor or any other use for an application. Otherwise please clarify your use case. > Is it not possible to use pthread or similar libraries within the code? > Is there another way that doesn't require using WorkGang? WorkGang is simply the simplest option ;) I think managing a Thread/NamedThread yourselves could be fine too if you want to keep the rest of Hotspot working, but that (like pthreads) involves coding the synchronization between main and worker thread yourselves. The reason why pthread most likely fails when trying to do existing work is that the existing code wants to implicitly access some members of the thread during processing, e.g. logging and others, but idk as I do not know the code. I have no idea why using pthreads breaks compilation: at least on Linux/BSD/lots of others it is in use already. So if that work you are trying to move into a separate thread is completely independent of existing work, pthreads should work just fine. Thanks, Thomas

From ofirg6 at gmail.com Tue Jun 30 08:27:38 2020 From: ofirg6 at gmail.com (Ofir Gordon) Date: Tue, 30 Jun 2020 11:27:38 +0300 Subject: How to run specific part of the SerialGC in a new thread? In-Reply-To: <2b1a3443-ccbd-602d-ae4c-4cb97284b43d@oracle.com> References: <2b1a3443-ccbd-602d-ae4c-4cb97284b43d@oracle.com> Message-ID: I'll try to explain my end goal: I simply want to "take out" the part that runs the marking phase in a full collection with the serial GC and run it on a dedicated CPU (on a simulator). So I figured the simpler way to do that would be to create a new thread which gets the "follow_stack" part as its task, and bind it to the dedicated CPU. Any aspect of performance or synchronization issues is currently not relevant (maybe later it will be).
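A minimal sketch of that wiring, using the one-worker WorkGang approach Thomas suggested; FollowStackTask and follow_stack_on_cpu are hypothetical names, and the Linux-only affinity call is an assumption since the thread doesn't name an OS:

#include "gc/serial/markSweep.hpp"
#include "gc/shared/workgroup.hpp"
#include <pthread.h>  // Linux-only affinity API -- an assumption here
#include <sched.h>

// Hypothetical task: run the whole follow_stack() phase in one gang worker,
// pinned to a chosen core; run_task() blocks the caller until it finishes.
class FollowStackTask : public AbstractGangTask {
  int _cpu;
 public:
  FollowStackTask(int cpu) : AbstractGangTask("Follow Stack"), _cpu(cpu) {}
  void work(uint worker_id) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(_cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    MarkSweep::follow_stack();  // the marking loop, entirely on this worker
  }
};

// At the former call site of follow_stack():
static void follow_stack_on_cpu(int cpu) {
  WorkGang* gang = new WorkGang("MarkGang", 1 /* workers */,
                                false /* are_GC_task_threads */,
                                false /* are_ConcurrentGC_threads */);
  gang->initialize_workers();
  FollowStackTask task(cpu);
  gang->run_task(&task);  // waits for the single worker to complete
}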
I'm just looking for the simplest way to split the "follow_stack" execution off to a separate thread, while the main thread waits on it. That's why I thought that creating this AbstractGangClass task is too complicated for my purposes. If there is a way to create the Task class for running only "follow_stack" in a simple way, I would really like to know. Regarding the compilation failure: when I'm trying to include <pthread.h> and use it, the build fails because it doesn't recognize it. I tried to add to the configuration: --with-extra-cxxflags="-std=c++0x -pthread", but then the "make images" command fails with the message "use of global operators new and delete is not allowed in Hotspot". Trying to use <thread> didn't work out as well (different errors). Any other suggestions? (: Thanks again, Ofir On Tue, Jun 30, 2020 at 11:10, Thomas Schatzl wrote: > Hi, > > On 30.06.20 09:02, Ofir Gordon wrote: > > Hi, > > Sorry for the late follow-up, and thank you for the answer. > > I looked at the WorkGang mechanism and it seems a little overkill for my > > purposes; I only want to be able to split the task off to a different thread > > and wait for it to complete. From what I understand, in order to do this > > with WorkGang I need to create a dedicated AbstractGangClass which will > > need to include all methods of the scenario I'm trying to run > separately. > > I am not completely clear about the problem here: of course the spawned > thread needs to include (call) the method(s) you want it to run, so > there seems to be no difference from any other mechanism. > > The original question also asked about a single ("a new thread") thread > doing all the work, so the problem you have with WorkGang is even more > puzzling to me. > > I agree that there is some boilerplate involved with AbstractGangClass, > but pthreads with all the manual synchronization involved seem to be > much worse. > > Maybe you want to do multiple tasks once each by multiple threads? What > Hotspot code usually does in this case is that every worker claims a > part of that work. > > There is e.g. SubTasksDone that could help you with that; see e.g. > G1RootProcessor or any other use for an application. > > Otherwise please clarify your use case. > > > Is it not possible to use pthread or similar libraries within the code? > > Is there another way that doesn't require using WorkGang? > >
> > Thanks, > Thomas > From thomas.schatzl at oracle.com Tue Jun 30 09:50:30 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 30 Jun 2020 11:50:30 +0200 Subject: RFR(S) [15] : 8208207 : Test nsk/stress/jni/gclocker/gcl001 fails after co-location In-Reply-To: <975A00DA-EA4F-4150-ABBF-31FE81CD2F05@oracle.com> References: <975A00DA-EA4F-4150-ABBF-31FE81CD2F05@oracle.com> Message-ID: <9391590c-f063-caaa-2b4b-91dc09a1005d@oracle.com> Hi, On 30.06.20 05:23, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8208207/webrev.00 >> 38 lines changed: 1 ins; 4 del; 33 mod; > > Hi all, > > could you please review the small patch which fixes gcl001 test and returns it back to work? > > the issue reported in the bug (assert(!JavaThread::current()->in_critical()) failed: Would deadlock) was caused by calls to JNI functions other than (Get|Release).*Critical within a critical region. after this get fixed by moving Get.*Length calls outside of critical regions, the test started to fail w/ "Data validation failure" message. this was b/c of returning 0 and w/o sorting arrays in case isCopy was changed to TRUE by Get.*Critical. as there is no way to guarantee that we won't get a copy of array, I decided to remove checks of isCopy which, although might slightly change that this test check, makes the test more robust. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8208207 > webrev: http://cr.openjdk.java.net/~iignatyev//8208207/webrev.00 > testing: > - run the test against {linux,windows,macosx-x64}-{product,fastdebug} w/ default GC > - run the test against macosx-x64-slowdebug w/ Serial,Parallel,G1,ZGC > - I would prefer if the test removed the debug code, i.e. the native EnterCS/ReleaseCS methods and associated data structures. At least instead of commenting out code to disable it, add an ifdef. - could the backslashes in the macro be lined up? Otherwise the code is even uglier than it already is :P - unless it is intentional, the code would be easier to read if the GetxxxCritical and ReleasexxxCritical were scoped better, i.e. now it is like: GetPrimitiveArrayCritical GetStringCritical ReleaseStringCritical GetStringCritical ReleasePrimitiveArrayCritical <--- this one ReleaseStringCritical and the question is why the ReleasePrimitiveArrayCritical is between the second get/releaseStringCritical and not the last call. Thanks, Thomas From thomas.schatzl at oracle.com Tue Jun 30 14:21:46 2020 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 30 Jun 2020 16:21:46 +0200 Subject: RFR (M): 8210462: Fix remaining mentions of initial mark Message-ID: Hi all, can I have reviews for this change that removes remaining use of the "initial mark pause" to the "newer" concurrent start in the code? This is mostly a straightforward s/initial mark/concurrent start/ replacement. 
CR: https://bugs.openjdk.java.net/browse/JDK-8210462 Webrev: http://cr.openjdk.java.net/~tschatzl/8210462/webrev/ Testing: tier1-5 Thanks, Thomas

From stefan.karlsson at oracle.com Tue Jun 30 16:35:31 2020 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 30 Jun 2020 18:35:31 +0200 Subject: [16] RFR: 8248391: Unify handling of all OopStorage instances in weak root processing In-Reply-To: <11EE1DA7-3265-4A40-BA37-95B66B129F50@oracle.com> References: <8e831a42-61b0-1605-ea71-09f97d82f328@oracle.com> <11EE1DA7-3265-4A40-BA37-95B66B129F50@oracle.com> Message-ID: I talked to Kim about this point: "The GC (in particular, OopStorage) shouldn't know about client subsystems in order to record the notification function" The current OopStorageSet implementation already knows about the different client subsystems, therefore I thought that adding the notification registration there actually made sense. I've seen the alternative where the notification registration happens in the client subsystem, and that patch didn't look that great either, IMHO. So, I decided to create a PoC to (almost) completely decouple the client subsystems and the OopStorageSet. With the patch, the OopStorageSet doesn't know anything about the existing subsystems. It only knows about the number of weak and strong OopStorages in the JVM. The need to know the exact numbers arises from the wish to statically know that number at compile time. I'm not sure it's important to keep this static nature of the number of OopStorages, instead of using a more dynamic approach, but I've left this as is for now. https://cr.openjdk.java.net/~stefank/8248391/webrev.02.delta/ https://cr.openjdk.java.net/~stefank/8248391/webrev.02 To see the effect of this patch, take a look at the StringTable changes. The StringTable now creates its own OopStorage by calling the OopStorageSet::create_weak() factory function, which both creates the OopStorage and registers it so that the GCs will visit it during their iterations over OopStorages: + _oop_storage = OopStorageSet::create_weak("StringTable Weak"); + _oop_storage->register_notification_function(&gc_notification); The associated WeakHandles use the StringTable's internal/private reference to its OopStorage: // Callers have already looked up the String using the jchar* name, so just go to add. - WeakHandle wh(OopStorageSet::string_table_weak(), string_h); + WeakHandle wh(_oop_storage, string_h); One problem with the current state of this patch is that during G1's heap initialization, it wants to eagerly figure out all OopStorage names. The problem is that with my patch OopStorages are initialized together with their subsystems, and the StringTable and ResolvedMethodTable are initialized after the heap. It's a bit unfortunate, and I would like to find a good solution for that, but for this PoC I've changed G1Policy::phase_times to a lazily created phase times instance. This moves the allocations to the first GC. I'm open to suggestions on what to do about that part. Thanks, StefanK On 2020-06-30 09:10, Kim Barrett wrote: >> On Jun 26, 2020, at 9:06 AM, Erik Österlund wrote: >> >> Hi, >> >> Today, when a weak OopStorage is added, you have to plug it in explicitly to ZGC, Shenandoah and the WeakProcessor, which is used by Shenandoah, Serial, Parallel and G1. This is especially painful when the runtime data structure associated with an OopStorage needs a notification when oops die: then you have to explicitly plug in notification code in various places in GC code.
>> It would be ideal if this process could be completely automated. >> >> This patch allows each OopStorage to have an associated notification function. This is a callback function into the runtime, stating how many oops have died this GC cycle. This allows runtime data structures to perform accounting for how large a part of the data structure needs cleaning, and whether to trigger such cleaning or not. >> >> So the interface between the GCs and the OopStorage is that during weak processing, the GC promises to call the callback function with how many oops died. Some shared infrastructure makes this very easy for the GCs. >> >> Weak processing now uses the OopStorageSet iterators across all GCs, so that adding a new weak OopStorage (even with notification functions) does not require touching any GC code. >> >> Kudos to Zhengyu for providing some Shenandoah code for this, and StefanK for pre-reviewing it. Also, I am about to go out of office now, so StefanK promised to take it from here. Big thanks for that! >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8248391 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8248391/webrev.00/ >> >> Thanks, >> /Erik > While I approve of the goal of this change, I have a couple of issues > of some significance with the approach taken. > > I think the way the GC notifications are being set up is backward. > The GC (in particular, OopStorage) shouldn't know about client > subsystems in order to record the notification function. That seems > like a dependency inversion. Rather, I think client subsystems should > be registering callbacks with the GC. > > I've never liked the existing mechanism used to gather iteration > statistics (number of entries cleared, skipped, processed...). It > always seemed rather bolted on after the fact. This change continues > that. I'd rather take the opportunity to improve that, by improving > OopStorage in this area. > > I'm working on a reworked version of Erik's changes. So far it's > looking pretty good. Some of Erik's work I'm using as-is, but some > parts not so much. I'm not yet ready to send anything out for review, > but I'm mentioning it here so others hopefully don't spend too much time > reviewing the existing change. > > I've discussed this with Stefan, and he's agreed to hand off > responsibility for this project to me. >
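A condensed sketch of the client-side pattern in StefanK's PoC; MyTable is a hypothetical subsystem, and create_weak()/register_notification_function() are taken from the diff quoted above, so their final signatures may differ:

#include "gc/shared/oopStorage.hpp"
#include "gc/shared/oopStorageSet.hpp"
#include "memory/allocation.hpp"

// Hypothetical subsystem following the PoC's StringTable pattern: it owns
// its weak OopStorage and is told how many of its oops died each GC cycle.
class MyTable : public AllStatic {
  static OopStorage* _oop_storage;

  static void gc_notification(size_t num_dead) {
    // Accounting only: remember the dead count and decide later whether
    // the table is dirty enough to schedule a cleanup pass.
  }

 public:
  static void initialize() {
    _oop_storage = OopStorageSet::create_weak("MyTable Weak");
    _oop_storage->register_notification_function(&gc_notification);
  }
};

OopStorage* MyTable::_oop_storage = NULL;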