From thomas.schatzl at oracle.com Tue Oct 1 08:59:30 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 1 Oct 2019 10:59:30 +0200
Subject: G1 patch of elastic Java heap
In-Reply-To:
References: <6270ce59-4a8e-431e-9ccf-f6d2c0f927eb.maoliang.ml@alibaba-inc.com> <1267a5dd2cf6cc1d03df64d07a06ba0f45195951.camel@oracle.com> <3140197d-8cab-4a86-af92-58431c74cb6b.maoliang.ml@alibaba-inc.com>
Message-ID: <73a32f24-3f6e-8396-779f-5f21284200e9@oracle.com>

Hi Liang,

just to you: I am looking into your changes. I need some time to think about what you wrote here and to find out how this works in the patch.

Thanks,
Thomas

From shade at redhat.com Tue Oct 1 10:48:35 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 1 Oct 2019 12:48:35 +0200
Subject: RFR (S) 8231667: Shenandoah: Full GC should take empty regions into slices for compaction
Message-ID: <0b56c1ea-00c5-dbdc-7e2a-556c4147f26f@redhat.com>

Bug:
https://bugs.openjdk.java.net/browse/JDK-8231667

Fix:
https://cr.openjdk.java.net/~shade/8231667/wevrev.01/

There is a problem with current Full GC that makes some tests fail with OOME unnecessarily. See details in the bug report.

Testing: {x86_64, x86_32} hotspot_gc_shenandoah, affected tests

--
Thanks,
-Aleksey

From rkennke at redhat.com Tue Oct 1 12:07:38 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 1 Oct 2019 14:07:38 +0200
Subject: RFR (S) 8231667: Shenandoah: Full GC should take empty regions into slices for compaction
In-Reply-To: <0b56c1ea-00c5-dbdc-7e2a-556c4147f26f@redhat.com>
References: <0b56c1ea-00c5-dbdc-7e2a-556c4147f26f@redhat.com>
Message-ID:

Ok.

Thanks,
Roman

> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8231667
>
> Fix:
> https://cr.openjdk.java.net/~shade/8231667/wevrev.01/
>
> There is a problem with current Full GC that makes some tests fail with OOME unnecessarily. See
> details in the bug report.
>
> Testing: {x86_64, x86_32} hotspot_gc_shenandoah, affected tests
>

From sangheon.kim at oracle.com Tue Oct 1 16:43:58 2019
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Tue, 1 Oct 2019 09:43:58 -0700
Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)
In-Reply-To: <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com>
References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com>
Message-ID: <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com>

Hi Kim and others,

webrev.2 is simplified a bit more after changing the 'heap expansion' approach. Previously the heap could expand with a preferred numa id, which means contiguous heap regions with the same numa id could exist, but the current version assumes evenly split heap regions. I.e., on a 4 numa node system, heap regions will be 012301230123, so if we know the address or heap region index, we know the preferred numa id. Much of the code supporting the previous style of expansion was removed.

On 9/24/19 6:44 PM, Kim Barrett wrote:
>> On Sep 21, 2019, at 1:19 AM,sangheon.kim at oracle.com wrote:
>>
>> webrev:
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.1
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.1.inc (this may not help much! :) )
>> Testing: hs-tier 1 ~ 5 (with/without UseNUMA)
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1AllocRegion.hpp
> 96 uint _node_index;
>
> Protected; should be private.

_node_index is used from derived classes. Are you suggesting to add a getter?

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1Allocator.cpp
> 42 _mutator_alloc_region(NULL),
>
> Should be _mutator_alloc_regions (plural), since it's now an array.
>
> Similarly, these should be pluralized:
> 67 void G1Allocator::init_mutator_alloc_region() {
> 74 void G1Allocator::release_mutator_alloc_region() {
>
> And this
> 48 // The number of MutatorAllocRegions used, one per memory node.
> 49 size_t _num_alloc_region;

Done

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1Allocator.cpp
> 53 G1Allocator::~G1Allocator() {
> 54   for (uint i = 0; i < _num_alloc_region; i++) {
> 55     _mutator_alloc_region[i].~MutatorAllocRegion();
> 56   }
> 57   FREE_C_HEAP_ARRAY(MutatorAllocRegion, _mutator_alloc_region);
> 58 }
>
> --- should also be calling _mutator_alloc_region[i].release() ??
> --- or does destructor do that?

No, release() is never called. release() does not actually release allocated resources; it sets pointers to null and increments/decrements some counters such as used bytes. So I think we don't need to call release().

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1Arguments.cpp
> 161 if (UseNUMA) {
> 162   if (FLAG_IS_DEFAULT(AlwaysPreTouch)) {
> 163     FLAG_SET_DEFAULT(AlwaysPreTouch, true);
> 164   }
> 165   if (!AlwaysPreTouch && FLAG_IS_CMDLINE(AlwaysPreTouch)) {
> 166     warning("Disabling AlwaysPreTouch is incompatible with UseNUMA. Disabling UseNUMA.");
> 167     FLAG_SET_ERGO(UseNUMA, false);
> 168   }
> 169 }
>
> Stefan asked about why AlwaysPreTouch is required when UseNUMA. I have
> a different question. Assuming UseNUMA does require AlwaysPreTouch,
> why is !AlwaysPreTouch winning here? Why not have UseNUMA win if they
> are conflicting?

As webrev.2 removes the above code, can we skip this discussion?

> But see discussion below about
> G1RegionsSmallerThanCommitSizeMapper::commit_regions(), which
> suggested AlwaysPreTouch is required.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp
> 83 G1PageBasedVirtualSpace::~G1PageBasedVirtualSpace() {
> ...
> 92   _numa = NULL;
> 93 }
>
> [pre-existing] Destructors are for resource management. Nulling out /
> zeroing out members in a destructor generally isn't useful. This is
> really a comment on the existing code rather than a request to change
> anything. The addition of line 92 is okay in context, just the context
> is not good.

Agreed on pre-existing. The intent here is to align with the existing context, so leave as is?

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp
> 108 _next_node_index(G1MemoryNodeManager::mgr()->num_active_nodes() - 1),
> 109 _max_node_index(G1MemoryNodeManager::mgr()->num_active_nodes()) {
>
> Consider reversing the order of these members and their initializers,
> so the _next_node_index can use _max_node_index rather than another
> call to num_active_nodes().

Good point! However, this newly added part has been removed.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp
> 113 uint next_node_index() const {
> 114   return _next_node_index;
> 115 }
>
> I think this is mis-named. It's the current index for the
> distributor. I think it should just be called "node_index".

Agree, but this line has also been removed.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp
>
> I'm confused by G1RegionsSmallerThanCommitSizeMapper::commit_regions().
>
> For the LargerThan mapper, we have a sequence of regions that
> completely covers a sequence of pages, and we commit all of the
> associated pages using the requested node_index.
>
> For the SmallerThan mapper, we have a sequence of regions split up
> into subsequences that are each contained in a single page. The first
> such subsequence might be on an already committed page. Similarly for
> the last subsequence. Nothing is done to those pages.
>
> In between there may be a series of region sequences, with each region
> sequence on a single page. If there are more than one of these region
> sequences then more than one page will need to be committed.
>
> As we step through the sequence of pages and commit them, we also step
> the numa index to use for each page.
>
> Stefan asked a question in this area about the mechanism by which the
> node stepping is provided, and you responded with what sounds like an
> improvement. But I have a different question.
>
> Why are we committing different pages on different numa nodes? The
> caller requested these regions be on the requested node. Why are we
> not honoring that request (as much as possible within the constraints
> of possible leading and trailing regions being on already committed
> pages.) The comment for G1NodeDistributor discusses (at a high level)
> what it's doing (e.g. a short summary of the above description), but
> there is no discussion of why that distribution is needed or
> desirable.

If I understand your question correctly, we do honor the 'requested node index' in the G1RegionSmallerThan case. Please look at 'G1NodeDistributor::next()'.

    void next() {
      if (_requested_node_index == G1MemoryNodeManager::AnyNodeIndex) {
        _node_index = (_node_index + 1) % _max_node_index;
      } else {
        _node_index = _requested_node_index;
      }
    }

If _requested_node_index is AnyNodeIndex, we cycle through valid node indices.

This code is also removed. So G1NUMA is now responsible for deciding the preferred node, and the upper-level APIs only decide whether to expand or not.

> There might be a good reason for this behavior, in which case your
> response with an improvement sounds good.
> But if so, I'm guessing I
> won't be the only one who doesn't know what that reason might be, and
> it would be good to provide an explanatory comment. And of course, if
> there isn't a good reason...
>
> I think there is also a problem here if AlwaysPreTouch is false. (As
> discussed earlier, maybe it isn't required to be true.) The node index
> for the committed regions gets set (in make_regions_available) via the
> result of the syscall, so we really need pretouch to have been done.
> The alternative would be to assume commit_regions used the requested
> numa node. But with the request stepping that wouldn't hold. Of
> course, it also doesn't hold for any leading or trailing regions that
> were covered by already committed pages.
>
> I think this is the basis of your argument that AlwaysPreTouch is
> required for UseNUMA, and I think I'm now agreeing. Otherwise we may
> think the leading and trailing regions in the sequence are on a
> different node than they actually are, since the associated pages may
> have already been committed on a different node than requested, but
> not yet touched.

The leading regions are only committed, so other regions which belong to the same page will not actually be committed, so the touching issue doesn't happen. The SmallerThan class is supposed to handle this situation.

> But I still don't know why we would want to cycle through nodes for
> the "middle" pages.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/heapRegionManager.cpp
> 127 if (G1VerifyNUMAIdOfHeapRegions) {
> 128   // Read actual node index via system call.
> 129   uint actual_node_index = mgr->index_of_address(hr->bottom());
> 130   if (hr->node_index() != actual_node_index) {
>
> Can we actually do this here? I thought the system call only gave a
> useful answer if the addressed location has been paged in. I'm not
> sure that's necessarily happened at this point.
At webrev.1: We can do this here because webrev.1 assumes AlwaysPreTouch is enabled. So at the time of commit, we pretouch as soon as the commit is finished, and we can check the actual node id here.

At webrev.2: AlwaysPreTouch is NOT coupled with UseNUMA. We trust the OS that the requested memory will be located on the preferred node, i.e. we don't actually touch the memory.

> I think Stefan suggested the logging of mismatches between requested
> and actual numa node for a region should occur at region retirement.
> We could log mismatches there and correct the region's information.
>
> But see discussion above about
> G1RegionsSmallerThanCommitSizeMapper::commit_regions(). If
> AlwaysPreTouch is indeed required, then this code is okay.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/heapRegionSet.inline.hpp
> 156 HeapRegion * cur;
>
> s/HeapRegion */HeapRegion*/

Done

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
> 38 static const uint InvalidNodeIndex = (uint)os::InvalidId;
> 39 static const uint AnyNodeIndex = (uint)os::AnyId;
>
> I complained about these in my review of webrev.0. These are making
> very strong assumptions about the values of the os Id values, for no
> good reason that I can see. You responded
>
> "But the intend is to make same value after casting for same meaning
> constants instead of randomly chosen ones."
>
> I don't buy that. There aren't any casts that I can see between NUMA
> ids and indexes. Nor should there be any such casts. If there were,
> I'd strongly question them, as being units mismatches.

Okay, fixed similarly to your previous comment.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
> 54 virtual const int* node_ids() const { static int dummy_id = 0; return &dummy_id; }
>
> dummy_id should be const.
>
> I would probably put that definition in the .cpp file. I've run into
> multiple compiler bugs with function scoped static variables in inline
> functions. Not recently, but I'm paranoid.

Good to know. Done

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
> 34 class G1MemoryNodeManager : public CHeapObj {
> ...
> 43 static G1MemoryNodeManager* create();
>
> Given that we have a factory function that should be used for
> creation, the constructor ought to be non-public. It needs to be
> protected so the derived G1MemoryNodeManager can refer to it.

Changed to protected.

> A different approach would have G1MemoryNodeManager be abstract (with
> all virtuals but the destructor being pure), with hidden (possibly
> private nested) classes for the single-node and multi-node cases.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.cpp
> 62 if (UseNUMA SOLARIS_ONLY(&& false)) {
>
> I thought we were only providing Linux support. This seems like it
> would attempt to work (and probably fail somewhere later) on other
> platforms (anything not Linux or Solaris).

You are right. Changed to use the LINUX_ONLY() macro. Please correct me if I misunderstood your point. :)

All platforms are allowed to set +UseNUMA and eventually get some benefit from UseNUMAInterleaving. Treating Windows and Mac is easy because those have only one active node. However, Solaris may have multiple active nodes, so the above line was added. I believe that is one of the simplest ways to filter out the Solaris case on top of the existing filtering logic (active node check). But as you pointed out, the previous one is not that clear, so I changed it to use LINUX_ONLY().
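
A minimal sketch of the factory-function point raised above: the base-class constructor is non-public, so callers must go through create(), but it is protected rather than private so a derived multi-node manager can still chain to it. The names NodeManager and MultiNodeManager are hypothetical stand-ins and are not the HotSpot classes, and std::unique_ptr is used only to keep the sketch self-contained (HotSpot has its own allocation conventions).

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a single-node manager with a factory function.
class NodeManager {
public:
  virtual ~NodeManager() = default;

  // The only public way to obtain a manager.
  static std::unique_ptr<NodeManager> create(unsigned active_nodes);

  virtual unsigned num_active_nodes() const { return 1; }

protected:
  // Non-public so users cannot bypass create(), but protected so that
  // derived classes can still invoke it from their own constructors.
  NodeManager() = default;
};

// Hypothetical multi-node variant; implicitly chains to the protected base ctor.
class MultiNodeManager : public NodeManager {
public:
  explicit MultiNodeManager(unsigned n) : _num_nodes(n) {}
  unsigned num_active_nodes() const override { return _num_nodes; }

private:
  unsigned _num_nodes;
};

std::unique_ptr<NodeManager> NodeManager::create(unsigned active_nodes) {
  if (active_nodes > 1) {
    return std::unique_ptr<NodeManager>(new MultiNodeManager(active_nodes));
  }
  // A member function may use the protected constructor directly.
  return std::unique_ptr<NodeManager>(new NodeManager());
}
```

With this shape, `NodeManager::create(4)->num_active_nodes()` yields 4, while `NodeManager m;` at namespace scope or in unrelated code does not compile. The abstract-base alternative mentioned in the review would instead make every virtual except the destructor pure and hide both concrete classes.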
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.cpp
> 64 G1NUMA* numa = new G1NUMA();
> 65
> 66 if (numa != NULL) {
>
> numa cannot be NULL here.

Done

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.cpp
> 86 G1MemoryMultiNodeManager::~G1MemoryMultiNodeManager() {
> 87   delete _numa;
> 88 }
>
> This is leaving a stale pointer to the G1NUMA object in wherever
> G1NUMA::set_numa stashed it.

G1NUMA::_inst = NULL is added in the dtor of G1NUMA because G1NUMA::set_numa() sets '_inst'. Correct me if I misunderstood.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1CollectedHeap.cpp
> 1500 _mem_node_mgr(G1MemoryNodeManager::create()),
>
> Maybe this manager should be created in G1CH::initialize() (and
> initialized here to NULL).

No. We need the number of active node ids in G1Allocator::G1Allocator. This is the reason why G1MemoryNodeManager is created earlier. Previously HeapRegionManager also had this dependency, but it is removed now.

> Then the page_size could be passed to create, and there wouldn't be a
> need to later set the page size of the manager and pass that along to
> the G1NUMA, instead both getting it as a constructor argument. Then
> the associated setters go away too.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1NUMA.hpp
> 95 // Gets a next valid numa id.
> 96 inline int next_numa_id();
>
> Appears to be unused.

Removed.
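
A minimal sketch of the stale-pointer concern discussed above: when an object registers itself in a static pointer, its destructor should clear that pointer so later lookups do not return a dangling address. The class name Numa and the accessor instance() are hypothetical; only the idea of resetting the static in the destructor comes from the thread.

```cpp
#include <cassert>

// Hypothetical stand-in for an object that registers itself in a static.
class Numa {
public:
  Numa() {
    assert(_inst == nullptr);  // at most one live instance in this sketch
    _inst = this;              // registration, analogous to a set_numa() call
  }

  ~Numa() {
    _inst = nullptr;           // avoid leaving a stale pointer behind
  }

  static Numa* instance() { return _inst; }

private:
  static Numa* _inst;
};

Numa* Numa::_inst = nullptr;
```

After `delete` on the object, `Numa::instance()` returns a null pointer instead of the freed address, so a late caller fails fast rather than touching freed memory.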
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.cpp
> 102 uint G1MemoryMultiNodeManager::index_of_current_thread() const {
> 103   int node_id = os::numa_get_group_id();
> 104   return _numa->index_of_numa_id(node_id);
> 105 }
>
> Other than here, os::numa_xxx usage is encapsulated in G1NUMA, with
> the manager forwarding to the G1NUMA object as needed. I suggest
> doing that here too. (Note that this file doesn't #include os.hpp.)
> I think doing so eliminates the need for G1NUMA::index_of_numa_id(),
> which also seems like a good thing.

Done. Added G1NUMA::index_of_current_thread() to remove the os call. However, we still need G1NUMA::index_of_numa_id(), which is used in G1NUMA::index_of_address(HeapWord*).

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1NUMA.cpp
> 92 void G1NUMA::touch_memory(address aligned_address, size_t size_in_bytes, uint numa_index) {
>
> Assert aligned_address is page aligned?
> Assert size_in_bytes is a page aligned?

Added 2 assertions as you commented. The first one, 'aligned_address', means page size aligned though.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1NUMA.cpp
> 42 memset(_numa_id_to_index_map,
> 43        G1MemoryNodeManager::InvalidNodeIndex,
> 44        sizeof(uint) * _len_numa_id_to_index_map);
>
> memset only works here because all bytes of InvalidNodeIndex happen to
> have the same value. I would prefer an explicit fill loop rather than
> memset here. Or a static assert on the value, but that's probably
> more code.

Changed to fill in a loop. I'm aware of this, and the only reason for changing InvalidNodeIndex from 0xfffe to 0xffff was to use memset here. I thought you were okay with memset, as you suggested using memset in your previous email.
:)

webrev:
http://cr.openjdk.java.net/~sangheki/8220310/webrev.2/
http://cr.openjdk.java.net/~sangheki/8220310/webrev.2.inc
Testing: hs-tier1 ~ 5 +-UseNUMA

Thanks,
Sangheon

> ------------------------------------------------------------------------------
>

From sangheon.kim at oracle.com Tue Oct 1 16:53:16 2019
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Tue, 1 Oct 2019 09:53:16 -0700
Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3)
In-Reply-To:
References:
Message-ID:

Hi all,

As JDK-8220310 changed a lot, I'm posting the next webrev. The previous webrev just conflicts.

Webrev:
http://cr.openjdk.java.net/~sangheki/8220311/webrev.1
http://cr.openjdk.java.net/~sangheki/8220311/webrev.1.inc
Testing: hs-tier 1 ~ 5 with +- UseNUMA

Thanks,
Sangheon

On 9/4/19 12:16 AM, sangheon.kim at oracle.com wrote:
> Hi all,
>
> Please review this patch making G1 NUMA aware.
> This is the second part of G1 NUMA implementation:
> - Making Survivor region NUMA aware.
>
> CR: https://bugs.openjdk.java.net/browse/JDK-8220311
> Webrev: http://cr.openjdk.java.net/~sangheki/8220311/webrev.0
> Testing: hs-tier 1 ~ 5 with +- UseNUMA
>
> Thanks,
> Sangheon

From kim.barrett at oracle.com Wed Oct 2 01:08:32 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 1 Oct 2019 21:08:32 -0400
Subject: RFR: 8231153: Improve concurrent refinement statistics
In-Reply-To: <4a851a19-0979-c696-0c80-1165bd755834@oracle.com>
References: <1DADC595-3106-4CE7-BA5D-7B6C7EE0E81E@oracle.com> <4a851a19-0979-c696-0c80-1165bd755834@oracle.com>
Message-ID:

> On Sep 25, 2019, at 5:40 AM, Thomas Schatzl wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8231153
>> Webrev:
>> https://cr.openjdk.java.net/~kbarrett/8231153/open.00/
>> Testing:
>> mach5 tier1-5
>> some local by-hand testing to look at the rate tracking.

I've made some updates to names, per some offline discussion with Thomas.
The main one is to use a "total_" prefix in various places for accumulated refinement time and accumulated number of cards refined. Also backed out any "num_" prefix removals from the original change. Removed the unsynchronized accumulation of the number of concurrently scanned cards.

I also fixed a bug in the card logging rate calculation. The function I was using to get the end time for the last GC pause (last_known_gc_end_time_sec) doesn't do any such thing, but has a confusing name (JDK-8231638).

> When reading the change I had the following thoughts to improve readability:
>
> - maybe some comment somewhere what "scanned" really means compared to "refined". Initially I was surprised with the change at G1RemSet::_num_conc_scanned_cards, but some thinking made me aware of the difference.

No longer relevant, as that's been removed.

> - the change reuses the "processed" term for counted cards in a few places, and it is unclear to me what the difference to just "refined" cards would be in some cases.

Fixed.

> - I would also suggest to add a "num_" prefix to numbers/counts of values.

Using "total_" prefix.

> - in G1Policy::_pending_cards should be renamed to "_pending_cards_at_start_of_gc" since we also now have a "_pending_cards_after_last_gc" to distinguish their use a little better?

Updated names:
pending_cards_at_gc_start
pending_cards_at_prev_gc_end

> - pre-existing: probably rename G1RemSet::_num_conc_scanned_cards and G1RemSetSummary::_conc_scanned_cards to "_concurrent_scanned_cards" to match the "_concurrent_refined_cards".

Fixed by code deletion.

> - not sure, but I think exposing size() and start() and in G1FreeIdSet seems unnecessary: the only user is G1DirtyCardQueueSet anyway, and it is already owner of G1FreeIdSet. I.e.
> it knows these values already (and passes it to the initializer of the G1FreeIdSet instance, and already has a getter for the size() value), so getting it back from G1FreeIdSet seems a bit strange to me, but I am okay with current code.

I've backed out the changes to G1FreeIdSet, and instead introduced in G1DirtyCardQueue a private function providing the start index (always returning 0) and used it in the two relevant places, along with noting that there is code elsewhere that is assuming a 0 value. That can be cleaned up later (JDK-8231734).

New webrevs:
full: https://cr.openjdk.java.net/~kbarrett/8231153/open.01/
incr: https://cr.openjdk.java.net/~kbarrett/8231153/open.01.inc/

Testing:
mach5 tier1-5
some local by-hand testing to look at the rate tracking.

From thomas.schatzl at oracle.com Wed Oct 2 09:57:06 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 2 Oct 2019 11:57:06 +0200
Subject: RFR: 8231153: Improve concurrent refinement statistics
In-Reply-To:
References: <1DADC595-3106-4CE7-BA5D-7B6C7EE0E81E@oracle.com> <4a851a19-0979-c696-0c80-1165bd755834@oracle.com>
Message-ID: <4ac5b37a-ba45-a0ec-359f-e15501af639e@oracle.com>

Hi,

On 02.10.19 03:08, Kim Barrett wrote:
>> On Sep 25, 2019, at 5:40 AM, Thomas Schatzl wrote:
>>> https://bugs.openjdk.java.net/browse/JDK-8231153
>>> Webrev:
>>> https://cr.openjdk.java.net/~kbarrett/8231153/open.00/
>>> Testing:
>>> mach5 tier1-5
>>> some local by-hand testing to look at the rate tracking.
>
> I've made some updates to names, per some offline discussion with
> Thomas. The main one is to use a "total_" prefix in various places
> for accumulated refinement time and accumulated number of cards
> refined. Also backed out any "num_" prefix removals from the original
> change. Removed the unsynchronized accumulation of the number of
> concurrently scanned cards.
>
> I also fixed a bug in the card logging rate calculation.
> The function
> I was using to get the end time for the last GC pause
> (last_known_gc_end_time_sec) doesn't do any such thing, but has a
> confusing name (JDK-8231638).
>
>
>> When reading the change I had the following thoughts to improve readability:
>> [...]

Thanks a lot for considering my comments.

>
>> - not sure, but I think exposing size() and start() and in G1FreeIdSet seems unnecessary: the only user is G1DirtyCardQueueSet anyway, and it is already owner of G1FreeIdSet. I.e. it knows these values already (and passes it to the initializer of the G1FreeIdSet instance, and already has a getter for the size() value), so getting it back from G1FreeIdSet seems a bit strange to me, but I am okay with current code.
>
> I've backed out the changes to G1FreeIdSet, and instead introduced in
> G1DirtyCardQueue a private function providing the start index (always
> returning 0) and used it in the two relevant places, along with noting
> that there is code elsewhere that is assuming a 0 value. That can be
> cleaned up later (JDK-8231734).
>
> New webrevs:
> full: https://cr.openjdk.java.net/~kbarrett/8231153/open.01/
> incr: https://cr.openjdk.java.net/~kbarrett/8231153/open.01.inc/
>
> Testing:
> mach5 tier1-5
> some local by-hand testing to look at the rate tracking.
>

looks good.

Thanks,
Thomas

From sangheon.kim at oracle.com Wed Oct 2 17:11:26 2019
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Wed, 2 Oct 2019 10:11:26 -0700
Subject: RFR(L): 8220312: Implementation: NUMA-Aware Memory Allocation for G1, Logging (3/3)
In-Reply-To:
References:
Message-ID:

Hi,

Here's the rebased webrev with minor changes.

Webrev:
http://cr.openjdk.java.net/~sangheki/8220312/webrev.1
http://cr.openjdk.java.net/~sangheki/8220312/webrev.1.inc
Testing: hs-tier 1 ~ 5 with +- UseNUMA

FYI, here's the full patch including JDK-8220310, 8220311, 8220312.
http://cr.openjdk.java.net/~sangheki/8220312/webrev.full/

Thanks,
Sangheon

On 9/4/19 12:16 AM, sangheon.kim at oracle.com wrote:
> Hi all,
>
> Please review this patch making G1 NUMA aware.
> This is the last part of G1 NUMA implementation:
> - Adding logs and stat.
>
> CR: https://bugs.openjdk.java.net/browse/JDK-8220312
> Webrev: http://cr.openjdk.java.net/~sangheki/8220312/webrev.0
> Testing: hs-tier 1 ~ 8 with +- UseNUMA
>
> Thanks,
> Sangheon

From per.liden at oracle.com Wed Oct 2 21:53:16 2019
From: per.liden at oracle.com (Per Liden)
Date: Wed, 2 Oct 2019 23:53:16 +0200
Subject: RFR: 8231774: ZGC: ZVirtualMemoryManager unmaps incorrect address
Message-ID:

When failing to map the requested address, map() in ZVirtualMemoryManager.cpp, incorrectly calls unmap(start, size) instead of unmap(res, size).

Bug: https://bugs.openjdk.java.net/browse/JDK-8231774
Webrev: http://cr.openjdk.java.net/~pliden/8231774/webrev.0

/Per

From per.liden at oracle.com Wed Oct 2 22:28:26 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 3 Oct 2019 00:28:26 +0200
Subject: RFR: 8231776: ZGC: Fix incorrect address space description
Message-ID: <0942d5a7-d6d7-2e81-97ef-b8fc55880f02@oracle.com>

After JDK-8224820, the space between the Remapped heap view and the Marked1 heap view is no longer reserved. The ASCII art describing the address space layout should be updated to reflect that.
Bug: https://bugs.openjdk.java.net/browse/JDK-8231776
Webrev: http://cr.openjdk.java.net/~pliden/8231776/webrev.0

/Per

From kim.barrett at oracle.com Wed Oct 2 23:55:05 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 2 Oct 2019 19:55:05 -0400
Subject: RFR: 8231774: ZGC: ZVirtualMemoryManager unmaps incorrect address
In-Reply-To:
References:
Message-ID: <8D5D60EF-5F2C-4C7A-A50C-79ABB1AE0254@oracle.com>

> On Oct 2, 2019, at 5:53 PM, Per Liden wrote:
>
> When failing to map the requested address, map() in ZVirtualMemoryManager.cpp, incorrectly calls unmap(start, size) instead of unmap(res, size).
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8231774
> Webrev: http://cr.openjdk.java.net/~pliden/8231774/webrev.0
>
> /Per

Looks good.

From per.liden at oracle.com Thu Oct 3 04:24:55 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 3 Oct 2019 06:24:55 +0200
Subject: RFR: 8231774: ZGC: ZVirtualMemoryManager unmaps incorrect address
In-Reply-To: <8D5D60EF-5F2C-4C7A-A50C-79ABB1AE0254@oracle.com>
References: <8D5D60EF-5F2C-4C7A-A50C-79ABB1AE0254@oracle.com>
Message-ID:

Thanks Kim!

/Per

> On 3 Oct 2019, at 01:55, Kim Barrett wrote:
>
>>
>> On Oct 2, 2019, at 5:53 PM, Per Liden wrote:
>>
>> When failing to map the requested address, map() in ZVirtualMemoryManager.cpp, incorrectly calls unmap(start, size) instead of unmap(res, size).
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231774
>> Webrev: http://cr.openjdk.java.net/~pliden/8231774/webrev.0
>>
>> /Per
>
> Looks good.
>

From stefan.karlsson at oracle.com Thu Oct 3 06:27:30 2019
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Thu, 3 Oct 2019 08:27:30 +0200
Subject: RFR: 8231774: ZGC: ZVirtualMemoryManager unmaps incorrect address
In-Reply-To:
References:
Message-ID: <0c38d6bd-bfb0-a16d-c247-ee28f3902883@oracle.com>

Looks good.
StefanK

On 2019-10-02 23:53, Per Liden wrote:
> When failing to map the requested address, map() in
> ZVirtualMemoryManager.cpp, incorrectly calls unmap(start, size) instead
> of unmap(res, size).
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8231774
> Webrev: http://cr.openjdk.java.net/~pliden/8231774/webrev.0
>
> /Per

From per.liden at oracle.com Thu Oct 3 06:39:46 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 3 Oct 2019 08:39:46 +0200
Subject: RFR: 8231774: ZGC: ZVirtualMemoryManager unmaps incorrect address
In-Reply-To: <0c38d6bd-bfb0-a16d-c247-ee28f3902883@oracle.com>
References: <0c38d6bd-bfb0-a16d-c247-ee28f3902883@oracle.com>
Message-ID: <727f9841-ef20-db44-6edf-d610a8b8d5c7@oracle.com>

Thanks!

/Per

On 10/3/19 8:27 AM, Stefan Karlsson wrote:
> Looks good.
>
> StefanK
>
> On 2019-10-02 23:53, Per Liden wrote:
>> When failing to map the requested address, map() in
>> ZVirtualMemoryManager.cpp, incorrectly calls unmap(start, size)
>> instead of unmap(res, size).
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231774
>> Webrev: http://cr.openjdk.java.net/~pliden/8231774/webrev.0
>>
>> /Per

From stefan.karlsson at oracle.com Thu Oct 3 07:48:25 2019
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Thu, 3 Oct 2019 09:48:25 +0200
Subject: RFR: 8231563: ZGC: Fails to warn when user sets the max heap size to larger than 16TB
In-Reply-To:
References:
Message-ID: <869788d7-3cf4-6891-81aa-eeb9508f0f5c@oracle.com>

Thanks, Thomas.

StefanK

On 2019-09-27 10:46, Thomas Schatzl wrote:
> Hi,
>
> On 27.09.19 09:16, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this small patch to fix the max heap size check in ZGC.
>>
>> https://cr.openjdk.java.net/~stefank/8231563/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8231563
>>
>> After this fix the JVM refuses to start if a too high -Xmx is set:
>>
>> $ java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx17t -version:
>> Error occurred during initialization of VM
>> Java heap too large
>
> looks good to me.
>
> Thomas
>

From stefan.karlsson at oracle.com Thu Oct 3 07:48:36 2019
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Thu, 3 Oct 2019 09:48:36 +0200
Subject: RFR: 8231563: ZGC: Fails to warn when user sets the max heap size to larger than 16TB
In-Reply-To: <18a67b8d-79b1-ccfd-daa5-9f8552ee2f9a@oracle.com>
References: <18a67b8d-79b1-ccfd-daa5-9f8552ee2f9a@oracle.com>
Message-ID: <6a72c5b7-f4a9-67a1-5b9e-cc9608c4b44a@oracle.com>

Thanks, Per.

StefanK

On 2019-09-27 15:18, Per Liden wrote:
> Looks good!
>
> /Per
>
> On 9/27/19 9:16 AM, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this small patch to fix the max heap size check in ZGC.
>>
>> https://cr.openjdk.java.net/~stefank/8231563/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8231563
>>
>> After this fix the JVM refuses to start if a too high -Xmx is set:
>>
>> $ java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx17t -version:
>> Error occurred during initialization of VM
>> Java heap too large
>>
>> Thanks,
>> StefanK

From per.liden at oracle.com Thu Oct 3 08:47:30 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 3 Oct 2019 10:47:30 +0200
Subject: RFR: 8231489: GC watermark_0_1 failed due to "metaspace.gc.Fault: GC has happened too rare"
Message-ID: <2aa33fe4-5d04-cd58-f786-d7a12b977ccf@oracle.com>

vmTestbase/metaspace/gc/HighWaterMarkTest relies on timing and fails when "Metaspace GC Threshold" isn't handled in a STW pause. The problem can be reproduced on both G1 and ZGC, but it's hard, as the window is small. However, it reproduces every time when injecting a 100ms delay to prolong the GC cycle a bit.
This test used to be disabled for G1 with ClassUnloadingWithConcurrentMark, but JDK-8204163 enabled it about a year ago. Fixing the test properly is tricky. As far as I can see, we can either: 1) Disable this test for G1+ClassUnloadingWithConcurrentMark and ZGC, or 2) Add a sleep in the test loop, to make the race less likely to happen, or 3) Remove the test completely, with the rationale that it's a buggy, low-value test. I've gone with 1) here. The test is already disabled for CMS today, with code in the test itself (i.e. not using @requires), so I did two alternative patches: A) Follows the existing style to disable the other GCs: http://cr.openjdk.java.net/~pliden/8231489/webrev.0-alt1 B) Adds @requires to the tests using the HighWaterMarkTest class, and removes the old check to disable CMS: http://cr.openjdk.java.net/~pliden/8231489/webrev.0-alt2 I prefer B, but I don't have a strong opinion on which way to go. Bug: https://bugs.openjdk.java.net/browse/JDK-8231489 /Per From erik.osterlund at oracle.com Thu Oct 3 08:56:48 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Thu, 3 Oct 2019 10:56:48 +0200 Subject: RFR: 8231776: ZGC: Fix incorrect address space description In-Reply-To: <0942d5a7-d6d7-2e81-97ef-b8fc55880f02@oracle.com> References: <0942d5a7-d6d7-2e81-97ef-b8fc55880f02@oracle.com> Message-ID: Hi Per, Looks good. /Erik On 10/3/19 12:28 AM, Per Liden wrote: > After JDK-8224820, the space between the Remapped heap view and the > Marked1 heap view is no longer reserved. The ASCII art describing the > address space layout should be updated to reflect that.
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8231776 > Webrev: http://cr.openjdk.java.net/~pliden/8231776/webrev.0 > > /Per From per.liden at oracle.com Thu Oct 3 08:59:58 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 3 Oct 2019 10:59:58 +0200 Subject: RFR: 8231776: ZGC: Fix incorrect address space description In-Reply-To: References: <0942d5a7-d6d7-2e81-97ef-b8fc55880f02@oracle.com> Message-ID: <3ccac031-22da-0b21-dfc6-23643eddf42b@oracle.com> Thanks! /Per On 10/3/19 10:56 AM, erik.osterlund at oracle.com wrote: > Hi Per, > > Looks good. > > /Erik > > On 10/3/19 12:28 AM, Per Liden wrote: >> After JDK-8224820, the space between the Remapped heap view and the >> Marked1 heap view is no longer reserved. The ASCII art describing the >> address space layout should be updated to reflect that. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8231776 >> Webrev: http://cr.openjdk.java.net/~pliden/8231776/webrev.0 >> >> /Per > From per.liden at oracle.com Thu Oct 3 09:34:13 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 3 Oct 2019 11:34:13 +0200 Subject: RFR: 8231825: ZGC: Remove ZMaxHeapSize and ZMaxHeapSizeShift Message-ID: The global constants ZMaxHeapSize, ZMaxHeapSizeShift and their respective platform versions can be removed. The single case where ZMaxHeapSize is used can be replaced by ZAddressOffsetMax. Bug: https://bugs.openjdk.java.net/browse/JDK-8231825 Webrev: http://cr.openjdk.java.net/~pliden/8231825/webrev.0 /Per From per.liden at oracle.com Thu Oct 3 09:45:47 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 3 Oct 2019 11:45:47 +0200 Subject: RFR: 8231552: ZGC: Refine address space reservation Message-ID: <5015ca7b-3e3e-b2bd-c3f8-0a83ecdb41d8@oracle.com> We could be slightly more sophisticated and do a better job reserving address space in situations where parts of the address space is already occupied or when the process is running with address space limitations. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8231552 Webrev: http://cr.openjdk.java.net/~pliden/8231552/webrev.0 /Per From erik.osterlund at oracle.com Thu Oct 3 10:30:56 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Thu, 3 Oct 2019 12:30:56 +0200 Subject: RFR: 8231825: ZGC: Remove ZMaxHeapSize and ZMaxHeapSizeShift In-Reply-To: References: Message-ID: <399f8de7-1e5b-5d4c-62f0-4ec4d173f1cf@oracle.com> Hi Per, Looks good. Thanks, /Erik On 10/3/19 11:34 AM, Per Liden wrote: > The global constants ZMaxHeapSize, ZMaxHeapSizeShift and their > respective platform versions can be removed. The single case where > ZMaxHeapSize is used can be replaced by ZAddressOffsetMax. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8231825 > Webrev: http://cr.openjdk.java.net/~pliden/8231825/webrev.0 > > /Per From thomas.schatzl at oracle.com Thu Oct 3 11:47:20 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 3 Oct 2019 13:47:20 +0200 Subject: RFR: 8231825: ZGC: Remove ZMaxHeapSize and ZMaxHeapSizeShift In-Reply-To: References: Message-ID: <4041d344-c605-0817-5aae-89f4e9ea48c2@oracle.com> Hi, On 03.10.19 11:34, Per Liden wrote: > The global constants ZMaxHeapSize, ZMaxHeapSizeShift and their > respective platform versions can be removed. The single case where > ZMaxHeapSize is used can be replaced by ZAddressOffsetMax. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8231825 > Webrev: http://cr.openjdk.java.net/~pliden/8231825/webrev.0 > > /Per looks good. Thomas From per.liden at oracle.com Thu Oct 3 12:17:17 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 3 Oct 2019 14:17:17 +0200 Subject: RFR: 8231825: ZGC: Remove ZMaxHeapSize and ZMaxHeapSizeShift In-Reply-To: <399f8de7-1e5b-5d4c-62f0-4ec4d173f1cf@oracle.com> References: <399f8de7-1e5b-5d4c-62f0-4ec4d173f1cf@oracle.com> Message-ID: <2756ea20-a408-6038-2592-6248573d4e66@oracle.com> Thanks Erik! 
/Per On 10/3/19 12:30 PM, erik.osterlund at oracle.com wrote: > Hi Per, > > Looks good. > > Thanks, > /Erik > > On 10/3/19 11:34 AM, Per Liden wrote: >> The global constants ZMaxHeapSize, ZMaxHeapSizeShift and their >> respective platform versions can be removed. The single case where >> ZMaxHeapSize is used can be replaced by ZAddressOffsetMax. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8231825 >> Webrev: http://cr.openjdk.java.net/~pliden/8231825/webrev.0 >> >> /Per > From per.liden at oracle.com Thu Oct 3 13:15:27 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 3 Oct 2019 15:15:27 +0200 Subject: RFR: 8231825: ZGC: Remove ZMaxHeapSize and ZMaxHeapSizeShift In-Reply-To: <4041d344-c605-0817-5aae-89f4e9ea48c2@oracle.com> References: <4041d344-c605-0817-5aae-89f4e9ea48c2@oracle.com> Message-ID: Thanks Thomas! /Per On 10/3/19 1:47 PM, Thomas Schatzl wrote: > Hi, > > On 03.10.19 11:34, Per Liden wrote: >> The global constants ZMaxHeapSize, ZMaxHeapSizeShift and their >> respective platform versions can be removed. The single case where >> ZMaxHeapSize is used can be replaced by ZAddressOffsetMax. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8231825 >> Webrev: http://cr.openjdk.java.net/~pliden/8231825/webrev.0 >> >> /Per > > > looks good. > > Thomas From mark.reinhold at oracle.com Thu Oct 3 22:11:44 2019 From: mark.reinhold at oracle.com (mark.reinhold at oracle.com) Date: Thu, 3 Oct 2019 15:11:44 -0700 (PDT) Subject: New candidate JEP: 363: Remove the Concurrent Mark Sweep (CMS) Garbage Collector Message-ID: <20191003221144.85C84309580@eggemoggin.niobe.net> https://openjdk.java.net/jeps/363 - Mark From kishor.kharbas at intel.com Fri Oct 4 01:00:16 2019 From: kishor.kharbas at intel.com (Kharbas, Kishor) Date: Fri, 4 Oct 2019 01:00:16 +0000 Subject: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps.
Message-ID: Hi, When I worked on JDK-8211425, there was a request for better abstraction for pinning G1's CM bitmaps. RFE for the request is here - JDK-8215893. Here is a proposal : http://cr.openjdk.java.net/~kkharbas/8215893/webrev.00/ Here G1PageBasedVirtualSpace pins the entire reserved memory to memory during construction. The constructor takes an additional bool flag which says "does it need to pin the memory". If the memory is pinned, '_special' flag is set to true. I piggy back on _special flag's behavior which is to not do actual OS (un-)commits on calls to (un)commit(). Rest of the changes is the mechanism to pass this flag from CM bitmaps creation in G1CollectedHeap all the way to G1PageBasedVirtualSpace. Let me know if this is a good abstraction and if there is any better way. Thanks Kishor From thomas.schatzl at oracle.com Fri Oct 4 12:11:42 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 4 Oct 2019 14:11:42 +0200 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> Message-ID: <43f26130-f340-f23a-11fd-773f696998a9@oracle.com> Hi Sangheon, thanks for your hard work on this! On 01.10.19 18:43, sangheon.kim at oracle.com wrote: > Hi Kim and others, > > This webrev.2 simplified a bit more after changing 'heap expansion' > approach. [...] > webrev: > http://cr.openjdk.java.net/~sangheki/8220310/webrev.2/ > http://cr.openjdk.java.net/~sangheki/8220310/webrev.2.inc > Testing: hs-tier1 ~ 5 +-UseNUMA > Comments: - os_solaris.cpp:2236: indentation addition - os_windows.cpp: os::get_address_id(): the "return 0" is in the same line as the method declaration, while the change uses extra lines in os_bsd. Please make this uniform. 
- g1_globals.cpp: unnecessary whitespace change - G1Allocator::unsafe_max_alloc(): I need to think some more if this is correct - should that really be node specific? (and not the max of all nodes). Otoh I think this is fine. - g1Allocator::used_in_alloc_regions(): not sure why the assert has been removed - os_linux.cpp: os::numa_get_address_id(): I think "id" should be an "int", not uint32_t according to http://man7.org/linux/man-pages/man2/get_mempolicy.2.html And I think you can initialize it with os::InvalidId; - in G1Allocator::current_node_index() retrieving the current node index is part of the G1MemoryNodeManager; I would really prefer if this were some property of a Thread. Not sure what others think. That value could be put into the G1ThreadLocalData. In any case, G1Allocator should probably cache the reference to the G1MemoryNodeManager for faster access. - G1CollectedHeap::expand_single_region(): the log output in the first line looks more like some debug code than generally interesting information. - G1CollectedHeap::expand_single_region(): pre-existing: it should add to the in-safepoint expansion time like expand(); okay to just file a CR. - instead of the "late initialization method" set_page_size() I would prefer to have this value passed in the constructor. It is not required to me to have the create() call in the initialization list of g1CollectedHeap at all costs... it could be put right after we determine the page size in the body of the G1CollectedHeap constructor. - g1CollectedHeap.hpp:940: no need to delete the newline. - g1MemoryNodeManager.cpp:41: that comment does not add information imho - G1NUMA::index_of_current_thread() needs a comment - G1NUMA::index_of_num_id/is_valid_numa_id/ should be private - not sure why G1NUMA::initialize()/set_numa() are needed. Its only call is right after instantiating a G1NUMA instance - G1NUMA::request_memory_on_node needs a comment.
- I observed that a *lot* of G1NUMA methods are only used by G1MemoryNodeManager; and G1MemoryNodeManager just forwards to G1NUMA a lot. Maybe these two can be merged? - G1NUMA::preferred_index_for_address/request_memory_on_node: I would prefer if these methods were not hardcoded with HeapRegion metrics as example. I.e. for preferred_index_for_address(), instead of the address it is probably better to pass it the zero-based index directly, that is used for calculating the node index. I.e. all callers know the HeapRegion's index anyway *and* this would make the method independent of G1CollectedHeap. I.e. something like preferred_node_index_for_index(), because then the same method can be reused for other data structures than the heap/heap region. G1NUMA::request_memory_on_node() could also be moved to G1PageBasedVirtualSpace, using the chunk sizes of page based virtual space instead of hardcoding HeapRegion::GrainBytes (i.e. hardcode the method to HeapRegion) - or pass in the "chunk size" calculated there from G1PageBasedVirtualSpace. I think this would increase the generality and usefulness of G1NUMA/G1MemoryNodeManager a lot without "passing in too many node indices everywhere". - G1PageBasedVirtualSpace: the _numa member seems to be used exactly in one method where performance does not look critical. Maybe it is better to reference it directly there via G1CollectedHeap. - heapRegionManager.cpp:verify_actual_node_index(): not sure if that should be debug level. I would also kind of prefer a method that iterates over all regions and prints a summary status (and potentially drop this per-region checking at least when allocating a new free region). It is sufficient to print a summary of expected/actual values, at most a summary per node. I.e. "NUMA Node index verification: Nodes: X_0/Y_0 X_1/Y_1 ... X_N/Y_N Unknown: Z Total: X/Y" where X(_i) is the number of matching indexes (for node index i) and Y(_i) the number of expected (for node index i).
Also, the correct word to use here is "mismatch" not "different" (to what?) - in some discussion we talked about the "node_index" lifecycle, and what I remember is the following: - initially, when we commit/make the region available, we set that HeapRegion's node_index to "Unknown" (with AlwaysPretouch on we can of course immediately set the correct one). - in HeapRegion::node_index() we do something like the following pseudo-code: { if (_node_index == Unknown) { // try to get actual node index from OS, and update _node_index if we could get the information } if (_node_index == Unknown) { // Still unknown // return _preferred_ node index *without* updating _node_index } return _node_index; } - now, during the "verification" pass, we use whether HeapRegion::node_index() == preferred_node_index to determine if the region is on the correct node. The change only sets the node index during making the region available, and immediately to the preferred node index. I.e. we eventually end up with the actual node index reported by the OS in HeapRegion::_node_index. - for the expression "G1MemoryNodeManager::num_active_nodes() > 1" it would be nice to have an extra method in G1MemoryNodeManager instead of repeating it over and over. - heapRegionManager.cpp:print_node_id_of_regions: that method will print a huge number of lines. Better to print the summary I sketched out above. - in FreeRegionList::remove_region_with_node_index(), the maximum search depth must take into account how many regions there are per page. Consider 1GB pages, 32M region size, meaning that we get 32 consecutive regions/page. Now with a node count of 2, the maximum search depth will be 6 - which is too low :) The intention is probably 3 * MAX(page_size / region_size, 1) * numa->num_active_numa_ids(). I think it is useful to put that expression into G1NUMA/G1MemoryNodeManager (or somewhere else appropriate - HeapRegionManager?) to avoid that part having too much info about page size.
- os.hpp: the new Enum values might or might not need some description. Btw, there is no regression in performance from the .0/.1 versions of this code in our benchmarks. Thanks, Thomas From stefan.johansson at oracle.com Fri Oct 4 12:23:40 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Fri, 4 Oct 2019 14:23:40 +0200 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> Message-ID: <25fe4295-7de8-bbc3-3dd3-6b750d4982f2@oracle.com> Hi Sangheon, First of all, thanks for this updated version incorporating a lot of our comments. I think we are getting closer to the goal, but I still have some more comments :) On 2019-10-01 18:43, sangheon.kim at oracle.com wrote: > Hi Kim and others, > > This webrev.2 simplified a bit more after changing 'heap expansion' > approach. > Previously heap may expand with preferred numa id which means contiguous > same numa id heap regions may exist but current version is assuming to > have evenly split heap regions. i.e. 4 numa node system, heap regions > will be 012301230123, so if we know address or heap region index, we can > know preferred numa id. > > Many codes related to support previous style expansion were removed. > > ... > > webrev: > http://cr.openjdk.java.net/~sangheki/8220310/webrev.2/ > http://cr.openjdk.java.net/~sangheki/8220310/webrev.2.inc src/hotspot/share/gc/g1/g1Allocator.cpp --- 31 #include "gc/g1/g1NUMA.hpp" I don't see why this include is needed, but you might want to include gc/g1/g1MemoryNodeManager.hpp instead.
--- hotspot/share/gc/g1/g1CollectedHeap.cpp --- 1518 _mem_node_mgr(G1MemoryNodeManager::create()), I saw your response to Kim regarding G1Allocator needing it to be initialized and I get that, but have you looked at moving the creation of G1Allocator to initialize() as well? I think its first use is actually below: 1802 _mem_node_mgr->set_page_size(page_size); here: 1851 _allocator->init_mutator_alloc_regions(); I might be missing some other place where it gets called, but I think it should be safe to create both the node manager and the allocator early in initialize(). --- src/hotspot/share/gc/g1/g1RegionToSpaceMapper.hpp --- 28 #include "gc/g1/g1MemoryNodeManager.hpp" Remove this include. --- src/hotspot/share/gc/g1/g1_globals.hpp --- 326 range(0, 100) Remove the backslash and add back the removed line to leave the file unchanged. --- src/hotspot/share/gc/g1/heapRegionManager.cpp --- 142 if (hr != NULL) { 143 assert(hr->next() == NULL, "Single region should not have next"); 144 assert(is_available(hr->hrm_index()), "Must be committed"); 145 146 verify_actual_node_index(hr->bottom(), hr->node_index()); 147 } I don't think this is a good place to do the verification, we allocate the free region while holding a lock and I think we should avoid doing a system call there. I would rather see this done during a safepoint, having a closure that iterates the heap and verifies all regions. I also think it would be nice to have two levels of the output, the one line for each region on trace level and on debug we can have a summary, something like: NUMA Node 1: expected=25, actual=23 NUMA Node 2: expected=25, actual=27 What do you (and others) think about that?
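(Editorial sketch, not part of the reviewed patch.) The per-node summary proposed above could be computed along these lines. This is a minimal standalone sketch, assuming each region records its preferred node index and the node index reported by the OS; RegionNodeInfo, NodeTally, summarize_node_placement and format_summary are hypothetical names that do not come from the webrev, and node indices are printed zero-based here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-region record: the node index we requested (preferred)
// and the node index the OS reports for the region's memory (actual).
struct RegionNodeInfo {
  unsigned preferred_node;
  unsigned actual_node;
};

// Per-node tally: regions expected on the node vs. regions actually there.
struct NodeTally {
  unsigned expected = 0;
  unsigned actual = 0;
};

// Walk all regions once and count expected/actual placements per node.
// Regions whose node index is out of range are simply not counted.
std::vector<NodeTally> summarize_node_placement(const std::vector<RegionNodeInfo>& regions,
                                                unsigned num_nodes) {
  assert(num_nodes > 0);
  std::vector<NodeTally> tally(num_nodes);
  for (const RegionNodeInfo& r : regions) {
    if (r.preferred_node < num_nodes) {
      tally[r.preferred_node].expected++;
    }
    if (r.actual_node < num_nodes) {
      tally[r.actual_node].actual++;
    }
  }
  return tally;
}

// Format one summary line per node, in the style suggested in the review.
std::string format_summary(const std::vector<NodeTally>& tally) {
  std::string out;
  for (std::size_t i = 0; i < tally.size(); i++) {
    out += "NUMA Node " + std::to_string(i) +
           ": expected=" + std::to_string(tally[i].expected) +
           ", actual=" + std::to_string(tally[i].actual) + "\n";
  }
  return out;
}
```

In HotSpot this would presumably be a heap region closure run at a safepoint, with the per-region lines on trace level and this summary on debug level; the sketch only shows the tallying.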
--- 216 static void print_node_id_of_regions(uint start, uint num_regions){ 217 LogTarget(Trace, gc, heap, numa) lt; I understand that it might make the test a bit more complicated, but have you thought about instead adding the node index to the heap printing done when is enabled on trace level? --- 235 static void set_heapregion_node_index(HeapRegion* hr) { I don't think we should special case for when AlwaysPreTouch is on and instead always just call hr->set_node_index(preferred_index) directly in make_regions_available. The reason is that I think the special case will make the NUMA support harder to understand and explain and it can potentially also hide problems with a system's configuration. It might also actually be worse than using the preferred id, because the OS might decide to move the pages back to the preferred node right after we checked this (not sure it will happen, but in theory). Another problem with this code is the call to: verify_actual_node_index(hr->bottom(), node_index) This function will only return the "actual" node index if logging for gc, heap, numa, verification is enabled on debug level. --- 346 bool HeapRegionManager::is_on_preferred_index(uint region_index, uint preferred_node_index) { 347 uint region_node_index = G1MemoryNodeManager::mgr()->preferred_index_for_address( 348 G1CollectedHeap::heap()->bottom_addr_for_region(region_index)); 349 return region_node_index == preferred_node_index || 350 preferred_node_index == G1MemoryNodeManager::AnyNodeIndex; I guess adding the AnyNodeIndex case here is because in this patch nobody is expanding on a preferred node, right? To me this is just another argument to not do any changes to the expand code in this patch. I know I suggested adding expand_on_preferred_node(), but I should have been clearer about when I think we should add it. --- src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp --- 56 // Returns memory node ids 57 virtual const int* node_ids() const; Doesn't seem to be used, remove.
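(Editorial sketch, not part of the reviewed patch.) For readers following the is_on_preferred_index() discussion above: with the evenly split layout described for webrev.2 (regions laid out as 012301230123 on a 4-node system), the preferred node can be derived from the region index alone. The function below is a minimal sketch under that assumption; preferred_node_index() and its parameters are hypothetical names, and the handling of pages larger than a region is my reading of the "32 consecutive regions/page" remark in this thread, not code from the webrev.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: map a zero-based heap region index to its preferred
// NUMA node index, assuming memory is interleaved across nodes in units of
// max(region_size, page_size).
unsigned preferred_node_index(unsigned region_index,
                              std::size_t region_size,
                              std::size_t page_size,
                              unsigned num_nodes) {
  assert(num_nodes > 0 && region_size > 0 && page_size > 0);
  if (region_size >= page_size) {
    // A region spans one or more whole pages: alternate nodes per region.
    return region_index % num_nodes;
  }
  // Several regions share one page, so all of them must prefer the same node.
  std::size_t regions_per_page = page_size / region_size;
  return static_cast<unsigned>((region_index / regions_per_page) % num_nodes);
}
```

With 32M regions and 4K pages on 4 nodes this yields 0, 1, 2, 3, 0, 1, ... per region; with 1G pages and 2 nodes, regions 0 to 31 prefer node 0 and regions 32 to 63 prefer node 1.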
--- src/hotspot/share/gc/g1/g1MemoryNodeManager.cpp --- 67 LINUX_ONLY(if (UseNUMA) { ... 79 delete numa; 80 }) A bit confusing with a multi-line LINUX_ONLY, I would prefer to hide this in a private helper, something like: if (UseNUMA) { LINUX_ONLY(create_numa_manager()); } if (_inst == NULL) { _inst = new G1MemoryNodeManager(); } Not really happy about this either, but we can look at simplifying the NUMA initialization as a follow up. --- src/hotspot/share/gc/g1/g1NUMA.hpp --- 87 // Returns numa id of the given numa index. 88 inline int numa_id_of_index(uint numa_index) const; Currently unused, either remove or make use of it when calling numa_make_local. --- 94 // Returns current active numa ids. 95 const int* numa_ids() const { return _numa_ids; } Only used by memory manager above, which in turn is unused, remove. --- src/hotspot/share/gc/g1/g1NUMA.hpp --- 55 // Request the given memory to locate on preferred node. 56 // There are 2 things to consider. 57 // First, size comparison for G1HeapRegionSize and page size. ... 62 // Examples of 4 numa ids with non-preferred numa id. What do you think about this instead: // Request to spread the given memory evenly across the available NUMA // nodes. Which node to request for a given address is given by the // region size and the page size. Below are two examples: I would also like a "NUMA node" row for each example showing which numa node the pages and regions end up on. --- Thanks, Stefan > Testing: hs-tier1 ~ 5 +-UseNUMA > > Thanks, > Sangheon > > >> ------------------------------------------------------------------------------ >> >> > From thomas.schatzl at oracle.com Fri Oct 4 13:34:20 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 4 Oct 2019 15:34:20 +0200 Subject: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps. 
In-Reply-To: References: Message-ID: Hi Kishor, On 04.10.19 03:00, Kharbas, Kishor wrote: > Hi, > When I worked on JDK-8211425, there was a request for better abstraction for pinning G1's CM bitmaps. RFE for the request is here - JDK-8215893. > > Here is a proposal : http://cr.openjdk.java.net/~kkharbas/8215893/webrev.00/ > > Here G1PageBasedVirtualSpace pins the entire reserved memory to memory during construction. The constructor takes an additional bool flag which says "does it need to pin the memory". > If the memory is pinned, '_special' flag is set to true. I piggy back on _special flag's behavior which is to not do actual OS (un-)commits on calls to (un)commit(). > Rest of the changes is the mechanism to pass this flag from CM bitmaps creation in G1CollectedHeap all the way to G1PageBasedVirtualSpace. > > Let me know if this is a good abstraction and if there is any better way. > > Thanks > Kishor > Some comments: - in the parameter lists, if the parameters are already laid out line-by-line, if adding a new one, please put it on a new line as well. - this code if (_special) { if (!rs.special()) { commit_internal(addr_to_page_index(_low_boundary), addr_to_page_index(_high_boundary)); } in g1PageBasedVirtualSpace looks very incomprehensible. :) I would prefer (pending the second reviewer's comment) to either use the "pinned" flag here, or even better, move the necessary commit calls into the (now removed) HeterogeneousHeapRegionManager::initialize(). - I would just purely from feeling prefer if the "pinned" flag parameter would be listed after the "type" parameter in the G1RegionToSpaceMapper. But that's probably just me. Also, finally one parameter per line for the declaration/definition of the constructor would improve readability. 
Thanks, Thomas From zgu at redhat.com Fri Oct 4 14:51:33 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 4 Oct 2019 10:51:33 -0400 Subject: RFR 8231324: Shenandoah: avoid duplicated weak root works during final traversal Message-ID: Please review this patch that avoids traversal GC to walk weak roots twice during final traversal. Also, it should process weak roots first, so that, fixup phase does not visit dead CLDs/codes, etc. Bug: https://bugs.openjdk.java.net/browse/JDK-8231324 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8231324/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) on Linux x86_64 Thanks, -Zhengyu From mark.reinhold at oracle.com Fri Oct 4 17:16:15 2019 From: mark.reinhold at oracle.com (mark.reinhold at oracle.com) Date: Fri, 4 Oct 2019 10:16:15 -0700 (PDT) Subject: New candidate JEP: 364: ZGC on macOS Message-ID: <20191004171615.2E34130971A@eggemoggin.niobe.net> https://openjdk.java.net/jeps/364 - Mark From kishor.kharbas at intel.com Fri Oct 4 23:15:50 2019 From: kishor.kharbas at intel.com (Kharbas, Kishor) Date: Fri, 4 Oct 2019 23:15:50 +0000 Subject: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps. In-Reply-To: References: Message-ID: Hi Stefan, Thanks for the review. Some comments inline. New webrev : http://cr.openjdk.java.net/~kkharbas/8215893/webrev.00_to_01/ http://cr.openjdk.java.net/~kkharbas/8215893/webrev.01/ > Hi Kishor, > > On 04.10.19 03:00, Kharbas, Kishor wrote: >> Hi, >> When I worked on JDK-8211425, there was a request for better abstraction for pinning G1's CM bitmaps. RFE for the request is here - JDK-8215893. >> >> Here is a proposal : http://cr.openjdk.java.net/~kkharbas/8215893/webrev.00/ >> >> Here G1PageBasedVirtualSpace pins the entire reserved memory to memory during construction. The constructor takes an additional bool flag which says "does it need to pin the memory". >> If the memory is pinned, '_special' flag is set to true. 
I piggy back on _special flag's behavior which is to not do actual OS (un-)commits on calls to (un)commit(). >> Rest of the changes is the mechanism to pass this flag from CM bitmaps creation in G1CollectedHeap all the way to G1PageBasedVirtualSpace. >> >> Let me know if this is a good abstraction and if there is any better way. >> >> Thanks >> Kishor >> > > Some comments: > > - in the parameter lists, if the parameters are already laid out > line-by-line, if adding a new one, please put it on a new line as well. > Fixed in the new webrev. > - this code > > if (_special) { > if (!rs.special()) { > commit_internal(addr_to_page_index(_low_boundary), > addr_to_page_index(_high_boundary)); > } > > in g1PageBasedVirtualSpace looks very incomprehensible. :) > > I would prefer (pending the second reviewer's comment) to either use the > "pinned" flag here, or even better, move the necessary commit calls into > the (now removed) HeterogeneousHeapRegionManager::initialize(). > Made it little more comprehensible. Will see what other reviewers think about moving it somewhere else. > - I would just purely from feeling prefer if the "pinned" flag parameter > would be listed after the "type" parameter in the G1RegionToSpaceMapper. > But that's probably just me. > I did it this way to logically group the parameters. MemTracker is a tracker used by the VM everywhere and does not pertain to this class as such, so I kept it in the end. > Also, finally one parameter per line for the declaration/definition of > the constructor would improve readability. > Done. 
Thank you, Kishor > Thanks, > Thomas From stefan.johansson at oracle.com Mon Oct 7 08:25:19 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Mon, 7 Oct 2019 10:25:19 +0200 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <43f26130-f340-f23a-11fd-773f696998a9@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <43f26130-f340-f23a-11fd-773f696998a9@oracle.com> Message-ID: <523dae88-11d8-1f88-044e-7f722eb3c84d@oracle.com> Hi Thomas and Sangheon, I have one comment on Thomas' comments =) On 2019-10-04 14:11, Thomas Schatzl wrote: > Hi Sangheon, > > thanks for your hard work on this! > > On 01.10.19 18:43, sangheon.kim at oracle.com wrote: >> Hi Kim and others, >> >> This webrev.2 simplified a bit more after changing 'heap expansion' >> approach. > [...] >> webrev: >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2/ >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2.inc >> Testing: hs-tier1 ~ 5 +-UseNUMA >> > > Comments: > > ... > > - in some discussion we talked about the "node_index" lifecycle, and > what I remember is the following: > > - initially, when we commit/make the region available, we set that > HeapRegion's node_index to "Unknown" (with AlwaysPretouch on we can of > course immediately set the correct one). > - in HeapRegion::node_index() we do something like the following > pseudo-code: > > { > if (_node_index == Unknown) { > // try to get actual node index from OS, and update _node_index > if we could get the information > } > > if (_node_index == Unknown) { // Still unknown > // return _preferred_ node index *without* updating _node_index > } > return _node_index; > } > >
- now, during the "verification" pass, we use whether > HeapRegion::node_index() == preferred_node_index to determine if the > region is on the correct node. > > The change only sets the node index during making the region available, > and immediately to the preferred node index. > > I.e. we eventually end up with the actual node index reported by the OS > in HeapRegion::_node_index. > I like the idea of being able to get the correct node index from HeapRegion, but I have two concerns about the above idea. First, this will cause us to do a syscall while holding the lock to get a new region. This might not be a big deal, but I would prefer to do this update during a safepoint. The second thing is that if pages get migrated by the OS we would not see this if we only request the actual node index one time. It's possible that both those concerns can be ignored, but I wanted to bring them up to hear others opinions. Thanks, Stefan From thomas.schatzl at oracle.com Mon Oct 7 08:45:38 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 7 Oct 2019 10:45:38 +0200 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <523dae88-11d8-1f88-044e-7f722eb3c84d@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <43f26130-f340-f23a-11fd-773f696998a9@oracle.com> <523dae88-11d8-1f88-044e-7f722eb3c84d@oracle.com> Message-ID: Hi, On 07.10.19 10:25, Stefan Johansson wrote: > Hi Thomas and Sangheon, > > I have one comment on Thomas comments =) > > On 2019-10-04 14:11, Thomas Schatzl wrote: >> Hi Sangheon, [...] >> >> I.e. we eventually end up with the actual node index reported by the >> OS in HeapRegion::_node_index. >> > > I like the idea of being able to get the correct node index from > HeapRegion, but I have two concerns about the above idea. 
First, this > will cause us to do a syscall while holding the lock to get a new > region. This might not be a big deal, but I would prefer to do this I am not completely into the actual code flow right now, but I do not think there is a need to get the node index in this code path from the HeapRegion. Maybe when allocating from the free list later? > update during a safepoint. The second thing is that if pages get Fine with me too to piggyback it on some existing region iteration to be 100% sure. > migrated by the OS we would not see this if we only request the actual > node index one time. This is what the logging/verification is for I guess at this time. If the migration is significant, we need to handle this and update the node index - but I think we can do this node index update as RFE. Above update of the actual node index values during safepoint could also "always" do the summary logging then (with gc+numa=debug or something) if NUMA is enabled. Overall I would agree with that too. > It's possible that both those concerns can be ignored, but I wanted to > bring them up to hear others opinions. Thanks, Thomas From stefan.johansson at oracle.com Mon Oct 7 08:58:04 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Mon, 7 Oct 2019 10:58:04 +0200 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <43f26130-f340-f23a-11fd-773f696998a9@oracle.com> <523dae88-11d8-1f88-044e-7f722eb3c84d@oracle.com> Message-ID: On 2019-10-07 10:45, Thomas Schatzl wrote: > Hi, > > On 07.10.19 10:25, Stefan Johansson wrote: >> Hi Thomas and Sangheon, >> >> I have one comment on Thomas comments =) >> >> On 2019-10-04 14:11, Thomas Schatzl wrote: >>> Hi Sangheon, > [...] >>> >>> I.e. 
we eventually end up with the actual node index reported by the >>> OS in HeapRegion::_node_index. >>> >> >> I like the idea of being able to get the correct node index from >> HeapRegion, but I have two concerns about the above idea. First, this >> will cause us to do a syscall while holding the lock to get a new >> region. This might not be a big deal, but I would prefer to do this > I am not completely into the actual code flow right now, but I do not > think there is a need to get the node index in this code path from the > HeapRegion. Maybe when allocating from the free list later? Allocating from the free list is also under the lock, but I think we are on the same page, just asking for the hr->node_index() should not cause a syscall. > >> update during a safepoint. The second thing is that if pages get > > Fine with me too to piggyback it on some existing region iteration to be > 100% sure. Yes, that should make the cost fairly low. > >> migrated by the OS we would not see this if we only request the actual >> node index one time. > > This is what the logging/verification is for I guess at this time. If > the migration is significant, we need to handle this and update the node > index - but I think we can do this node index update as RFE. Yes, I have no idea if migration is a real problem, so separate RFE is ok. > > Above update of the actual node index values during safepoint could also > "always" do the summary logging then (with gc+numa=debug or something) > if NUMA is enabled. Sounds reasonable. Thanks, Stefan > > Overall I would agree with that too. > >> It's possible that both those concerns can be ignored, but I wanted to >> bring them up to hear others opinions. > > Thanks, > ? 
Thomas From per.liden at oracle.com Mon Oct 7 11:36:31 2019 From: per.liden at oracle.com (Per Liden) Date: Mon, 7 Oct 2019 13:36:31 +0200 Subject: RFR: 8231940: ZGC: Print correct low/high capacity Message-ID: <2742b8ba-7fa0-b789-a250-4c9de40e1fc0@oracle.com> After JDK-8222480, heap capacity can go down, not just up. The heap logging should take that into account when printing capacity high/low numbers. Bug: https://bugs.openjdk.java.net/browse/JDK-8231940 Webrev: http://cr.openjdk.java.net/~pliden/8231940/webrev.0 /Per From per.liden at oracle.com Mon Oct 7 12:38:05 2019 From: per.liden at oracle.com (Per Liden) Date: Mon, 7 Oct 2019 14:38:05 +0200 Subject: RFR: 8231943: ZGC: Enable serviceability/dcmd/gc/RunGCTest Message-ID: This test is currently disabled for ZGC, but it can easily be enabled by adjusting the expected log string. ZGC doesn't print "Pause Full", but it still prints the "(Diagnostic Command)" part. Also, the test enables gc=debug logging, which is unnecessary since this is always printed on the gc=info level. Bug: https://bugs.openjdk.java.net/browse/JDK-8231943 Webrev: http://cr.openjdk.java.net/~pliden/8231943/webrev.0 Testing: Manually ran test with all GCs (except Epsilon) /Per From shade at redhat.com Mon Oct 7 12:51:19 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 7 Oct 2019 14:51:19 +0200 Subject: RFR (S) 8231932: Shenandoah: conc/par GC threads ergonomics overrides user settings Message-ID: <4e63d1e7-0c98-a491-f954-7c6b7048602c@redhat.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8231932 Fix: https://cr.openjdk.java.net/~shade/8231932/webrev.01/ This manifests in tier1 tests, and that is actually the UX problem. New test captures it directly. Patched code favors adjusting the setting that was selected ergonomically, which leaves the user setting alone.
Also, it is awkward to adjust the GC threads settings silently (which is why test failed without proper message), and we should fail on misconfiguration right away, which explains the adjustments in existing tests. Testing: new test, hotspot_gc (with Shenandoah), tier1 (with Shenandoah), hotspot_gc_shenandoah -- Thanks, -Aleksey From stefan.johansson at oracle.com Mon Oct 7 12:57:39 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Mon, 7 Oct 2019 14:57:39 +0200 Subject: RFR: 8231153: Improve concurrent refinement statistics In-Reply-To: References: <1DADC595-3106-4CE7-BA5D-7B6C7EE0E81E@oracle.com> <4a851a19-0979-c696-0c80-1165bd755834@oracle.com> Message-ID: <3bafef2f-1380-3105-5c54-5e8095c42409@oracle.com> Hi Kim, On 2019-10-02 03:08, Kim Barrett wrote: > New webrevs: > full: https://cr.openjdk.java.net/~kbarrett/8231153/open.01/ > incr: https://cr.openjdk.java.net/~kbarrett/8231153/open.01.inc/ The changes looks good, just one question around the calculation of total time and size. src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp --- 415 Tickspan G1ConcurrentRefine::total_refinement_time() const { ... 425 const_cast(this)->threads_do(&closure); 426 return closure._total_time; 427 } 428 429 size_t G1ConcurrentRefine::total_refined_cards() const { ... 439 const_cast(this)->threads_do(&closure); 440 return closure._total_cards; 441 } Did you consider grouping these two functions into one, to avoid iterating the threads twice? Not sure this is a big deal, and it might only make the code more complicated, but it feels a bit unnecessary to do two iteration right after each other. --- Thanks, Stefan > > Testing: > mach5 tier1-5 > some local by-hand testing to look at the rate tracking. 
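The single-iteration idea raised in the question above can be sketched as follows. This is only a minimal illustration, assuming simplified stand-in types: `RefineThreadModel`, `RefinementTotals` and `sum_refinement_totals` are hypothetical names and are not the HotSpot classes or the code in the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread counters; a stand-in for the real refinement threads.
struct RefineThreadModel {
  long        refinement_ticks;  // time spent refining, in arbitrary ticks
  std::size_t refined_cards;     // number of cards refined
};

// Both totals packaged together so one pass can produce them.
struct RefinementTotals {
  long        total_ticks;
  std::size_t total_cards;
};

// Visit each thread once and accumulate both values in the same loop,
// instead of running two separate iterations back to back.
RefinementTotals sum_refinement_totals(const std::vector<RefineThreadModel>& threads) {
  RefinementTotals totals = {0, 0};
  for (const RefineThreadModel& t : threads) {
    totals.total_ticks += t.refinement_ticks;
    totals.total_cards += t.refined_cards;
  }
  return totals;
}
```

A caller that needs only one of the two values can still read a single field of the returned struct, so the combined form does not remove any capability.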
> From rkennke at redhat.com Mon Oct 7 13:36:38 2019 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 07 Oct 2019 15:36:38 +0200 Subject: RFR (S) 8231932: Shenandoah: conc/par GC threads ergonomics overrides user settings In-Reply-To: <4e63d1e7-0c98-a491-f954-7c6b7048602c@redhat.com> References: <4e63d1e7-0c98-a491-f954-7c6b7048602c@redhat.com> Message-ID: <155246FC-4947-48BA-93BF-29B895F49B19@redhat.com> Looks OK to me. Thanks! Roman On 7 October 2019 14:51:19 CEST, Aleksey Shipilev wrote: >Bug: > https://bugs.openjdk.java.net/browse/JDK-8231932 > >Fix: > https://cr.openjdk.java.net/~shade/8231932/webrev.01/ > >This manifests in tier1 tests, and that is actually the UX problem. New >test captures it directly. >Patched code favors adjusting the setting that was selected >ergonomically, which leaves the user >setting alone. > >Also, it is awkward to adjust the GC threads settings silently (which >is why test failed without >proper message), and we should fail on misconfiguration right away, >which explains the adjustments >in existing tests. > >Testing: new test, hotspot_gc (with Shenandoah), tier1 (with >Shenandoah), hotspot_gc_shenandoah > >-- >Thanks, >-Aleksey -- This message was sent from my Android device with K-9 Mail. From stefan.johansson at oracle.com Mon Oct 7 13:41:21 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Mon, 7 Oct 2019 15:41:21 +0200 Subject: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps. In-Reply-To: References: Message-ID: <09ff53d7-cbc2-37e9-81b5-b3de6bc6ea16@oracle.com> Hi Kishor, On 2019-10-04 03:00, Kharbas, Kishor wrote: > Hi, > > When I worked on JDK-8211425 > , there was a request > for better abstraction for pinning G1's CM bitmaps. RFE for the request > is here - JDK-8215893 . > > Here is a proposal : http://cr.openjdk.java.net/~kkharbas/8215893/webrev.00/ > > Here G1PageBasedVirtualSpace pins the entire reserved memory to memory > during construction.
The constructor takes an additional bool flag which > says "does it need to pin the memory". > > If the memory is pinned, "_special" flag is set to true. I piggy back on > _special flag's behavior which is to not do actual OS (un-)commits on > calls to (un)commit(). > > Rest of the changes is the mechanism to pass this flag from CM bitmaps > creation in G1CollectedHeap all the way to G1PageBasedVirtualSpace. > > Let me know if this is a good abstraction and if there is any better way. > I'm not sure I like this approach better, and even though I'm not super fond of the commit_and_set_special function either, at least the old way kept the pinning code quite isolated. Moving the commit_internal() call into initialize_with_page_size() feels like a move in the wrong direction. I'm not sure I have a much better idea, but one thing to try would be to tell the underlying ReservedSpace that it should be special/pinned even if it is not mapped with large pages. That way the upper layers should just work. Another thing, can you remind me why we need the bitmaps to be pinned but not other structures such as the card table? Thanks, Stefan > Thanks > > Kishor > From shade at redhat.com Mon Oct 7 14:08:18 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 7 Oct 2019 16:08:18 +0200 Subject: RFR (XS/T) 8231946: Remove obsolete and unused ShenandoahVerifyObjectEquals flag Message-ID: <2ac7e61d-4f52-79aa-239f-80b4a1bbd019@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8231946 This flag was obsoleted and not used for a while. Let's remove it: diff -r de43643147c6 src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp --- a/src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp Mon Oct 07 15:30:29 2019 +0200 +++ b/src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp Mon Oct 07 16:07:09 2019 +0200 @@ -315,7 +315,4 @@ "Tracing task termination timings") \ \ - develop(bool, ShenandoahVerifyObjectEquals, false, \ - "Verify that == and != are not used on oops.
Only in fastdebug") \ - \ diagnostic(bool, ShenandoahAlwaysPreTouch, false, \ "Pre-touch heap memory, overrides global AlwaysPreTouch") \ Testing: x86_64 build, hotspot_gc_shenandoah (running) -- Thanks, -Aleksey From rkennke at redhat.com Mon Oct 7 14:19:11 2019 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 07 Oct 2019 16:19:11 +0200 Subject: RFR (XS/T) 8231946: Remove obsolete and unused ShenandoahVerifyObjectEquals flag In-Reply-To: <2ac7e61d-4f52-79aa-239f-80b4a1bbd019@redhat.com> References: <2ac7e61d-4f52-79aa-239f-80b4a1bbd019@redhat.com> Message-ID: <1D29E0A2-0EF2-4C52-80B9-9EB7229E202D@redhat.com> Yup. (I believe we have some more unused flags like ShStoreCheck) On 7 October 2019 16:08:18 CEST, Aleksey Shipilev wrote: >RFE: > https://bugs.openjdk.java.net/browse/JDK-8231946 > >This flag was obsoleted and not used for a while. Let's remove it: > >diff -r de43643147c6 >src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp >--- a/src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp Mon Oct >07 15:30:29 2019 +0200 >+++ b/src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp Mon Oct >07 16:07:09 2019 +0200 >@@ -315,7 +315,4 @@ > "Tracing task termination timings") \ > \ >- develop(bool, ShenandoahVerifyObjectEquals, false, > \ >- "Verify that == and != are not used on oops. Only in >fastdebug") \ >- > \ >diagnostic(bool, ShenandoahAlwaysPreTouch, false, > \ > "Pre-touch heap memory, overrides global AlwaysPreTouch") \ > >Testing: x86_64 build, hotspot_gc_shenandoah (running) > >-- >Thanks, >-Aleksey -- This message was sent from my Android device with K-9 Mail.
From kim.barrett at oracle.com Mon Oct 7 18:10:42 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 7 Oct 2019 14:10:42 -0400 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> Message-ID: > On Oct 1, 2019, at 12:43 PM, sangheon.kim at oracle.com wrote: > webrev: > http://cr.openjdk.java.net/~sangheki/8220310/webrev.2/ > http://cr.openjdk.java.net/~sangheki/8220310/webrev.2.inc > Testing: hs-tier1 ~ 5 +-UseNUMA I like the direction of this. I think there are some additional simplifications possible around G1NUMA, which are discussed below. I still need to respond to your earlier individual responses. That will be in another email. ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1MemoryNodeManager.cpp 67 LINUX_ONLY(if (UseNUMA) { Maybe instead use #ifdef LINUX. Either way, add a trailing comment at the end of the conditional block. ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1NUMA.cpp 79 // If we don't have preferred numa id, touch the given area with round-robin manner. This comment seems out of place / obsolete. ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1NUMA.cpp 138 uint region_index = G1CollectedHeap::heap()->addr_to_region(address); This requires the address be in the range reserved for the heap. That's okay; that's what we decided we want to do. But that should be part of the function's description, e.g. it should be mentioned as a precondition for prefered_index_for_address. 
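The precondition requested in the comment above can be illustrated with a small sketch. It assumes the evenly split layout described earlier in the thread (on a 4-node system the regions' preferred nodes repeat 0,1,2,3,0,1,2,3,...). `HeapLayoutModel` and all of its members are hypothetical names for illustration, not the G1NUMA or G1CollectedHeap API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified model of a reserved heap made of equal-sized
// regions whose preferred NUMA node indices repeat round-robin.
struct HeapLayoutModel {
  std::uintptr_t base;         // first byte of the reserved heap range
  std::size_t    region_size;  // bytes per region, must be > 0
  unsigned       num_regions;  // regions in the reserved range
  unsigned       num_nodes;    // active NUMA nodes, must be > 0

  // True if addr lies inside the reserved heap range.
  bool contains(std::uintptr_t addr) const {
    return addr >= base && (addr - base) / region_size < num_regions;
  }

  // Precondition: addr lies within the reserved heap range.
  unsigned region_index_for_address(std::uintptr_t addr) const {
    assert(contains(addr) && "address outside the reserved heap");
    return static_cast<unsigned>((addr - base) / region_size);
  }

  // Round-robin mapping from region index to preferred node index.
  unsigned preferred_index_for_region(unsigned region_index) const {
    assert(region_index < num_regions && "region index out of range");
    return region_index % num_nodes;
  }

  // Precondition (inherited): addr lies within the reserved heap range.
  unsigned preferred_index_for_address(std::uintptr_t addr) const {
    return preferred_index_for_region(region_index_for_address(addr));
  }
};
```

Stating the range requirement on the address-based function, and asserting it in the helper that performs the conversion, keeps the index-based function usable for callers that already know the region index.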
------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1NUMA.hpp 87 // Returns numa id of the given numa index. 88 inline int numa_id_of_index(uint numa_index) const; Unused function. ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1NUMA.hpp 83 inline uint index_of_numa_id(int numa_id) const; This function should be private. It is only needed in the implementation of index_of_current_thread and index_of_address. It should have a precondition that the argument is an active numa id, e.g. a definition something like uint G1NUMA::index_of_numa_id(int numa_id) const { assert(numa_id >= 0, "invalid numa id %d", numa_id); assert(numa_id < _len_numa_id_to_index_map, "invalid numa id %d", numa_id); uint numa_index = _numa_id_to_index_map[numa_id]; assert(numa_index != G1MemoryNodeManager::InvalidNodeIndex, "invalid numa id %d", numa_id); return numa_index; } To make this work, index_of_address should also be changed, to something like: uint G1NUMA::index_of_address(HeapWord* address) const { int numa_id = os::numa_get_address_id((uintptr_t)address); if (numa_id == os::InvalidId) { return G1MemoryNodeManager::InvalidNodeIndex; } else { return index_of_numa_id(numa_id); } } ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1NUMA.cpp 31 void G1NUMA::init_numa_id_to_index_map(const int* numa_ids, uint num_numa_ids) { This function is only called from one place, G1NUMA::initialize. The code would be simpler and more clear if the body of this function were just directly inlined into initialize and this function eliminated. And once that's done it becomes apparent that initialize could be hoisted into the (moved out of line) constructor. 
This also lets num_active_numa_ids just be a trivial accessor function in the header; there's no possibility of finding it uninitialized after the constructor returns, so no need for the assert that it has been set. ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1NUMA.inline.hpp 32 inline bool G1NUMA::is_valid_numa_id(int numa_id) { Only called by init_numa_to_index_map in a guarantee that would be more obviously vacuous after the earlier suggested merge of that function into initialize. ------------------------------------------------------------------------------ src/hotspot/share/runtime/os.hpp 393 enum NumaIdState { 394 InvalidId = -1, 395 AnyId = -2 396 }; The type NumaIdState is unused. The AnyId enumerator is unused. Suggest making InvalidId just a static const int in the class. ------------------------------------------------------------------------------ src/hotspot/share/runtime/os.hpp 398 static int numa_get_address_id(uintptr_t address); Why is the type of address uintptr_t rather than a pointer type? I see that the underlying Linux syscall (get_mempolicy) wants an unsigned long, but that detail ought to be isolated to the Linux implementation layer. Callers are going to want to pass in addresses (pointers) and should not need to cast. That cast should happen at the point where the syscall is being made. ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1Allocator.inline.hpp 37 inline MutatorAllocRegion* G1Allocator::mutator_alloc_region(uint node_index) { 38 assert(_g1h->mem_node_mgr()->is_valid_node_index(node_index), "Invariant, index %u", node_index); 39 return &_mutator_alloc_regions[node_index]; 40 } I think the assert here should be that node_index < _num_alloc_regions. is_valid_node_index gives a somewhat indirect (so weak) check of the validity of the array access. 
Such a change would also eliminate one of the two callers of is_valid_node_index, which I think can be eliminated (see next comment). ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/heapRegionManager.cpp 126 HeapRegion* HeapRegionManager::allocate_free_region(HeapRegionType type, uint requested_node_index) { ... 131 if (mgr->num_active_nodes() > 1 && mgr->is_valid_node_index(requested_node_index)) { I think a better test here would be if ((requested_node_index != G1MemoryNodeManager::AnyNodeIndex) && (mgr->num_active_nodes() > 1)) { This eliminates one of two calls to is_valid_node_index (which I think can be eliminated, see previous comment). And callers should not be passing in actually invalid indices. I think there are asserts lower down in the stack (in G1NUMA) to complain about such, but they shouldn't be getting in here anyway. ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp 42 static const uint InvalidNodeIndex = UINT_MAX; 43 static const uint AnyNodeIndex = InvalidNodeIndex - 1; These seem misplaced to me. Shouldn't they be in G1NUMA? Possibly reexported here for convenience? (Assuming it actually is convenient.) ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp 42 static const uint InvalidNodeIndex = UINT_MAX; I think the only place this arises is as the result of index_of_address when the numa id for the location isn't known. Which suggests the name should be "UnknownNodeIndex" rather than "InvalidNodeIndex". And the description of index_of_address should mention that it can return that value (whatever its name ends up being.) ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp I'm not sure G1MemoryNodeManager is useful. 
It seems to be just a thin wrapper over the G1NUMA API, with a virtual dispatch between a non-NUMA or single-node implementation and the multi-node implementation that uses a G1NUMA that is only created for multi-node support. The virtual dispatch can't be eliminated in most (all or nearly all?) cases. But I think most of the single-node implementation would just fall out as a 1-node boundary case for multi-node G1MemoryNodeManager / G1NUMA. So I think this might all be collapsed down to a G1NUMA that always exists. If there are any places that require actual distinction, that class can have a private member to select the appropriate behavior. (Or maybe it's just the number of active nodes.) ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1NUMA.inline.hpp I think that with the changes I've proposed above, I think there's not much left in this file, and it might not be worth having it. Consider moving any lingering remnents to the .hpp or .cpp file as appropriate. ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1NUMA.hpp Consider adding a page_size() accessor function (private for now) that asserts the associated data member is > 0 (e.g. initialized), since it is initialized after construction. Use that instead of direct uses of the data member. ------------------------------------------------------------------------------ src/hotspot/share/runtime/arguments.cpp 4108 // such as Parallel GC for Linux and Solaris or G1 GC for Linux will ... 4111 // Non NUMA-aware collectors such as CMS and Serial-GC on 4112 // all platforms and ParallelGC on Windows will interleave all I think that these comments about which configurations do or don't support NUMA are just a maintenance headache. I think it would be better here to just say NUMA-aware collectors will interleave ... Non NUMA-aware collectors will interleave ... 
And leave out mentions of configurations that may change (as is being done here) or be removed (as soon expected for CMS). ------------------------------------------------------------------------------ From kim.barrett at oracle.com Mon Oct 7 18:35:56 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 7 Oct 2019 14:35:56 -0400 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> Message-ID: > On Oct 7, 2019, at 2:10 PM, Kim Barrett wrote: > >> On Oct 1, 2019, at 12:43 PM, sangheon.kim at oracle.com wrote: >> webrev: >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2/ >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2.inc >> Testing: hs-tier1 ~ 5 +-UseNUMA > > I like the direction of this. I think there are some additional simplifications possible > around G1NUMA, which are discussed below. I forgot to mention: Some of my comments were a bit intertwined, so that I ended up making a couple of patches to help me keep track. Here are webrevs for those patches, which might be of some help to you; use any parts you find useful. 
https://cr.openjdk.java.net/~kbarrett/8220310/kab_g1numa/ https://cr.openjdk.java.net/~kbarrett/8220310/is_valid_numa_index/ From kim.barrett at oracle.com Mon Oct 7 18:48:21 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 7 Oct 2019 14:48:21 -0400 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> Message-ID: > On Oct 1, 2019, at 12:43 PM, sangheon.kim at oracle.com wrote: > Here are my inline responses to yours. > > On 9/24/19 6:44 PM, Kim Barrett wrote: >>> On Sep 21, 2019, at 1:19 AM, sangheon.kim at oracle.com >>> wrote: >>> >>> webrev: >>> >>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.1 >>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.1.inc >>> (this may not help much! :) ) >>> Testing: hs-tier 1 ~ 5 (with/without UseNUMA) >>> >> ------------------------------------------------------------------------------ >> src/hotspot/share/gc/g1/g1AllocRegion.hpp >> 96 uint _node_index; >> >> Protected; should be private. >> > _node_index is used from derived classes. > Are you suggesting to add a getter? Oops, missed that it was used in derived classes. I usually try to avoid non-private data members, and would add a getter here, but that's not a universal style in our code. >> src/hotspot/share/gc/g1/g1Allocator.cpp >> 53 G1Allocator::~G1Allocator() { >> 54 for (uint i = 0; i < _num_alloc_region; i++) { >> 55 _mutator_alloc_region[i].~MutatorAllocRegion(); >> 56 } >> 57 FREE_C_HEAP_ARRAY(MutatorAllocRegion, _mutator_alloc_region); >> 58 } >> >> --- should also be calling _mutator_alloc_region[i].release() ?? >> --- or does destructor do that? >> > No, release() is never called.
> release() is not actually releasing allocated resources but sets null to pointers and inc/dec some numbers such as used bytes. So I was thinking we don't need to call release(). Thanks for clarifying that. That was a reminder for me to go figure that out, but I forgot to do so before sending off that round of comments. >> src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp >> 83 G1PageBasedVirtualSpace::~G1PageBasedVirtualSpace() { >> ... >> 92 _numa = NULL; >> 93 } >> >> [pre-existing] Destructors are for resource management. Nulling out / >> zeroing out members in a destructor generally isn't useful. This is >> really a comment on the existing code rather than a request to change >> anything. The addition of line 92 is okay in context, just the context >> is not good. >> > Agreed on pre-existing. > The intent here is to align with existing context, so leave as is? You can leave as is. >> src/hotspot/share/gc/g1/g1NUMA.cpp >> 42 memset(_numa_id_to_index_map, >> 43 G1MemoryNodeManager::InvalidNodeIndex, >> 44 sizeof(uint) * _len_numa_id_to_index_map); >> >> memset only works here because all bytes of InvalidNodeIndex happen to >> have the same value. I would prefer an explicit fill loop rather than >> memset here. Or a static assert on the value, but that's probably >> more code. >> > Changed to fill during loop. > I'm aware of this and the only reason of changing InvalidNodeIndex from 0xfffe to 0xffff was to use memset here. > I was thinking you are okay with memset as you commented to use memset from your previous email. :) Seems like my earlier suggestion to use memset was a bad idea? 
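The memset point made above (it works only when every byte of the sentinel has the same value) can be shown with a short sketch. The sentinel constants and function names here are hypothetical and chosen only to show the byte patterns; they are not the G1 definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical sentinel values, chosen only to illustrate the byte patterns.
static const unsigned int kAllOnesSentinel = UINT_MAX;      // every byte is 0xff
static const unsigned int kMixedSentinel   = UINT_MAX - 1;  // lowest byte is 0xfe

// memset() writes one byte value into every byte of the buffer, so each
// element ends up as that byte repeated sizeof(unsigned int) times.
std::vector<unsigned int> fill_with_memset(std::size_t n, unsigned int value) {
  std::vector<unsigned int> map(n);
  if (n > 0) {
    unsigned char byte = static_cast<unsigned char>(value & 0xffu);
    std::memset(map.data(), byte, sizeof(unsigned int) * n);
  }
  return map;
}

// An explicit fill assigns whole elements, so it is correct for any value.
std::vector<unsigned int> fill_with_loop(std::size_t n, unsigned int value) {
  std::vector<unsigned int> map(n);
  std::fill(map.begin(), map.end(), value);
  return map;
}
```

With the all-ones sentinel both functions produce the same array. With the mixed-byte sentinel the memset version stores the low byte repeated in every position, which is a different number, so the explicit fill is the form that stays correct if the sentinel value ever changes.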
From kim.barrett at oracle.com Mon Oct 7 22:38:49 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 7 Oct 2019 18:38:49 -0400 Subject: RFR: 8231153: Improve concurrent refinement statistics In-Reply-To: <3bafef2f-1380-3105-5c54-5e8095c42409@oracle.com> References: <1DADC595-3106-4CE7-BA5D-7B6C7EE0E81E@oracle.com> <4a851a19-0979-c696-0c80-1165bd755834@oracle.com> <3bafef2f-1380-3105-5c54-5e8095c42409@oracle.com> Message-ID: <0280B88E-45D7-46C0-A0E5-2E708B0132ED@oracle.com> > On Oct 7, 2019, at 8:57 AM, Stefan Johansson wrote: > > Hi Kim, > > On 2019-10-02 03:08, Kim Barrett wrote: >> New webrevs: >> full: https://cr.openjdk.java.net/~kbarrett/8231153/open.01/ >> incr: https://cr.openjdk.java.net/~kbarrett/8231153/open.01.inc/ > > The changes looks good, just one question around the calculation of total time and size. > > src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp > --- > 415 Tickspan G1ConcurrentRefine::total_refinement_time() const { > ... > 425 const_cast(this)->threads_do(&closure); > 426 return closure._total_time; > 427 } > 428 > 429 size_t G1ConcurrentRefine::total_refined_cards() const { > ... > 439 const_cast(this)->threads_do(&closure); > 440 return closure._total_cards; > 441 } > > Did you consider grouping these two functions into one, to avoid iterating the threads twice? Not sure this is a big deal, and it might only make the code more complicated, but it feels a bit unnecessary to do two iteration right after each other. Thanks for the suggestion. I tried doing something like that in an earlier version of this change, but I didn't like how it turned out. But enough code has changed since then that I decided to try again. This time seems okay. So G1ConcurrentRefine now provides a RefinementStats class that packages up the time and card counts, and a new function total_refinment_stats() that returns one of those. Also removed total_refinement_time() and total_refined_cards(), which are no longer used. 
(If that were to change they are easily reinstated as wrappers over total_refinement_stats().) New webrevs: full: https://cr.openjdk.java.net/~kbarrett/8231153/open.02/ incr: https://cr.openjdk.java.net/~kbarrett/8231153/open.02.inc/ Testing: mach5 tier1 From sangheon.kim at oracle.com Tue Oct 8 04:13:00 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Mon, 7 Oct 2019 21:13:00 -0700 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <43f26130-f340-f23a-11fd-773f696998a9@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <43f26130-f340-f23a-11fd-773f696998a9@oracle.com> Message-ID: <2cb3774d-3ea9-19ef-1eaa-a224129bed93@oracle.com> Hi Thomas, Many thanks for this thorough review! On 10/4/19 5:11 AM, Thomas Schatzl wrote: > Hi Sangheon, > > thanks for your hard work on this! > > On 01.10.19 18:43, sangheon.kim at oracle.com wrote: >> Hi Kim and others, >> >> This webrev.2 simplified a bit more after changing 'heap expansion' >> approach. > [...] >> webrev: >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2/ >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2.inc >> Testing: hs-tier1 ~ 5 +-UseNUMA >> > > Comments: > > - os_solaris.cpp:2236: indentation addition I don't see any changes at line 2236? > > - os_windows.cpp: os::get_address_id(): the "return 0" is in the same > line as the method declaration, while the change uses extra lines in > os_bsd. Please make this uniform. Done. > > g1_globals.cpp: unnecessary whitespace change Done. > > - G1Allocator::unsafe_max_alloc(): I need to think some more if this > is correct - should that really be node specific? (and not the max of > all nodes). > > Otoh I think this is fine. Yes, according to the comment at the first line at the method:
// Return the remaining space in the cur alloc region, but not less than // the min TLAB size. 'cur alloc region' differs per numa node, so it reflects node index. > > - g1Allocator::used_in_alloc_regions(): not sure why the assert has > been removed Probably removed during rebasing the patch. > > - os_linux.cpp: os::numa_get_address_id(): I think "id" should be an > "int", not uint32_t according to > http://man7.org/linux/man-pages/man2/get_mempolicy.2.html > > And I think you can initialize it with os::InvalidId; All done, change to 'int' and 'InvalidId'. > > - in G1Allocator::current_node_index() retrieving the current node > index is part of the G1MemoryNodeManager; I would really prefer if > this were some property of a Thread. Not sure what others think. > > That value could be put into the G1ThreadLocalData. > > In any case, G1Allocator should probably cache the reference to the > G1MemoryNodeManager for faster access. Added new member of G1MemoryNodeManager* at G1Allocator. > > - G1CollectedHeap::expand_single_region(): the log output in the first > line looks more like some debug code than generally interesting > information. Removed the first line log. > > - G1CollectedHeap::expand_single_region(): pre-existing: it should add > to the in-safepoint expansion time like expand(); okay to just file a CR. Filed > > - instead of the "late initialization method" set_page_size() I would > prefer to have this value passed in the constructor. It is not > required to me to have the create() call in the initialization list of > g1CollectedHeap at all costs... it could be put right after we > determine the page size in the body of the G1CollectedHeap constructor. We do need G1MemoryNodeManager instance to get the number of active numa nodes when we construct G1Allocator. i.e. we create per numa node G1AllocRegion at G1Allocator. And we also need page size at G1MemoryNodeManager after G1CollectedHeap is initialized. I tried to add comment at G1NUMA::initialize().
> > - g1CollectedHeap.hpp:940: no need to delete the newline. Reverted the newline. > > - g1MemoryNodeManager.cpp:41: that comment does not add information imho :) Removed the comment. > > - G1NUMA::index_of_current_thread() needs a comment Added: // Returns numa index of current calling thread. Do you have any suggestions? I was thinking the method name is more than enough to explain itself. :) > > - G1NUMA::index_of_num_id/is_valid_numa_id/ should be private Done. Actually got same comment from Kim as well during private discussion. > > - not sure why G1NUMA::initialize()/set_numa() are needed. Its only > call is right after instantiating a G1NUMA instance Probably you are pointing G1NUMA::initialize() and set_page_size()? I tried to explain above why we need 2 calls. > > - G1NUMA::request_memory_on_node needs a comment. Added: // Request the given range of memory to be located at a specific numa node. // But OS doesn't guarantee to reside on the node. // The numa node is decided by preferred_index_for_address(). > > - I observed that a *lot* of G1NUMA methods are only used by > G1MemoryNodeManager; and G1MemoryNodeManager just forwards to G1NUMA a > lot. Maybe these two can be merged? Done. I agree since G1NUMA is getting smaller and smaller. Since this is relatively large change, I had to revisit all addressed comments above. :) > > - G1NUMA::preferred_index_for_address/request_memory_on_node: I would > prefer if these methods were not hardcoded with HeapRegion metrics as > example. > > I.e. for preferred_index_for_address(), instead of the address it is > probably better to pass it the zero-based index directly, that is used > for calculating the node index. I.e. all callers know the HeapRegion's > index anyway *and* this would make the method independent of > G1CollectedHeap. > > I.e. something like preferred_node_index_for_index(), > because then the same method can be reused for other data structures > than the heap/heap region. Done.
Removed all dependencies on G1CollectedHeap and HeapRegion from
G1MemoryNodeManager (previously G1NUMA).
Added 'size_t _region_size' to G1NUMA and then
G1NUMA::preferred_node_index_for_index(uint heap_region_index).
> G1NUMA::request_memory_on_node() could also be moved to
> G1PageBasedVirtualSpace, using the chunk sizes of page based virtual
> space instead of hardcoding HeapRegion::GrainBytes (i.e. hardcode the
> method to HeapRegion) - or pass in the "chunk size" calculated there
> from G1PageBasedVirtualSpace.
>
> I think this would increase the generality and usefulness of
> G1NUMA/G1MemoryNodeManager a lot without "passing in too many node
> indices everywhere".
>
> - G1PageBasedVirtualSpace: the _numa member seems to be used exactly
> in one method where performance does not look critical. Maybe it is
> better to reference it directly there via G1CollectedHeap.
Following the above comment about moving G1NUMA::request_memory_on_node()
to G1PageBasedVirtualSpace, I tried to change this a bit. _numa is removed
from G1PageBasedVS.
The intent of the new implementation is to avoid the HeapRegion dependency
as you mentioned.
>
> - heapRegionManager.cpp:verify_actual_node_index(): not sure if that
> should be debug level.
Are you suggesting 'trace' level?
>
> I would also kind of prefer a method that iterates over all regions
> and prints a summary status (and potentially drop this per-region
> checking at least when allocating a new free region).
Probably I'm missing something, but this is what I tried to mention during
the internal discussion. Someone told me that adding a different log level
should be sufficient, and I agree with that.
I don't have a strong opinion on verifying at HRM::allocate_free_region(),
so I will remove it if there's no other opinion on this.
Currently we can check the node index when the region is being used, at a
specific log level/tag.
Stefan (and probably you as well) mentioned verifying at safepoint, but it
would be better to address that in JDK-8220312 (3/3 which is part of this
JEP) since I would like to go forward. :)
> It is sufficient to print a summary of expected/actual values, at most
> a summary per node. I.e.
>
> "NUMA Node index verification: Nodes: X_0/Y_0 X_1/Y_1 ... X_N/Y_N
> Unknown: Z Total: X/Y"
>
> where X(_i) is the number of matching indexes (for node index i) and
> Y(_i) the number of expected (for node index i).
I'm okay with adding an additional log which is a simpler version than the
existing one. I would say the new one at Debug level while the existing one
remains at Trace level.
The benefit of printing the current way is that we can see how the heap
regions (and pages) are laid out. So the current jtreg test is utilizing it.
Do you have any recommendation for print timing for the new log?
>
> Also, the correct word to use here is "mismatch" not "different" (to
> what?)
Changed to 'mismatch'.
The logging was printing both the actual value and the preferred value, so I
was thinking of those 2 values as different. :)
>
> - in some discussion we talked about the "node_index" lifecycle, and
> what I remember is the following:
>
>   - initially, when we commit/make the region available, we set that
> HeapRegion's node_index to "Unknown" (with AlwaysPreTouch on we can of
> course immediately set the correct one).
>   - in HeapRegion::node_index() we do something like the following
> pseudo-code:
>
>   {
>     if (_node_index == Unknown) {
>       // try to get actual node index from OS, and update _node_index
>       // if we could get the information
>     }
>
>     if (_node_index == Unknown) { // Still unknown
>       // return _preferred_ node index *without* updating _node_index
>     }
>     return _node_index;
>   }
>
>   - now, during the "verification" pass, we use whether
> HeapRegion::node_index() == preferred_node_index to determine if the
> region is on the correct node.
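
The lifecycle described in the quoted list above can be sketched in
standalone C++ roughly as follows. This is not HotSpot code: RegionNodeIndex,
UnknownNodeIndex and the query callback are hypothetical names used only for
illustration, and the callback stands in for whatever OS query would report
the node a page actually resides on.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical sentinel meaning "the OS has not told us the node yet".
constexpr uint32_t UnknownNodeIndex = UINT32_MAX;

// Illustrative stand-in for the per-region node index state.
class RegionNodeIndex {
  uint32_t _node_index;                 // actual index, Unknown until resolved
  uint32_t _preferred;                  // index derived from the region index
  std::function<uint32_t()> _query_os;  // returns UnknownNodeIndex on failure

public:
  RegionNodeIndex(uint32_t preferred, std::function<uint32_t()> query_os)
    : _node_index(UnknownNodeIndex),
      _preferred(preferred),
      _query_os(std::move(query_os)) {}

  uint32_t node_index() {
    if (_node_index == UnknownNodeIndex) {
      // Try to get the actual node index; cache it only if the query worked.
      _node_index = _query_os();
    }
    if (_node_index == UnknownNodeIndex) {
      // Still unknown: report the preferred index without caching it.
      return _preferred;
    }
    return _node_index;
  }

  // Verification-style check: is the region where we wanted it to be?
  bool on_preferred_node() { return node_index() == _preferred; }
};
```

One consequence of this shape is that a region whose location cannot be
determined is reported as being on its preferred node, so a verification
pass built on it would need a separate way to count "unknown" regions.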
>
> The change only sets the node index during making the region
> available, and immediately to the preferred node index.
>
> I.e. we eventually end up with the actual node index reported by the
> OS in HeapRegion::_node_index.
Above is what I tried to implement from what we discussed internally. :)
I was aware of this bug that set_heapregion_node_index() is using
verify_actual_node_index(), but forgot to fix it before posting the webrev. :(

Changed like below:
static void set_heapregion_node_index(HeapRegion* hr) {
  uint node_index;
  if (AlwaysPreTouch) {
    // If we already pretouched, we can check actual node index here.
    node_index = G1MemoryNodeManager::mgr()->index_of_address(hr->bottom());
  } else {
    node_index = G1MemoryNodeManager::mgr()->preferred_node_index_for_index(hr->hrm_index());
  }
  hr->set_node_index(node_index);
}

BTW, setting the node index at make_regions_available() is intentional since
it is the only place HeapRegion is initialized, so the HeapRegion change is
limited. Am I missing something?
I would prefer HeapRegion::node_index() to remain a simple getter.

Another idea that I didn't choose is to let HeapRegion::initialize() do the
work of setting the node index and change HeapRegion::set_node_index() to
clear_node_index() (which sets it to InvalidIndex).
>
>
> - for the expression "G1MemoryNodeManager::num_active_nodes() > 1" it
> would be nice to have an extra method in G1MemoryNodeManager instead
> of repeating it over and over.
Done.
FYI, currently there are 2 locations but following patches may use more.
>
> - heapRegionManager.cpp:print_node_id_of_regions: that method will
> print a huge amount of lines. Better to print the summary I sketched
> out above.
This is why I added it at 'trace' level, and it is okay to me.
>
> - in FreeRegionList::remove_region_with_node_index(), the maximum
> search depth must take into account how many regions are there per page.
>
> Consider 1GB pages, 32M region size, meaning that we get 32
> consecutive regions/page.
> Now with a node amount of 2, the maximum search depth will be 6 -
> which is too low :)
> The intention is probably 3 * MAX(page_size / region size, 1) *
> numa->num_active_numa_ids().
>
> I think it is useful to put that expression into
> G1NUMA/G1MemoryNodeManager (or somewhere else appropriate -
> HeapRegionManager?) to avoid that part having too much info about page
> size.
Nice catch!
Added G1MemoryNodeManager::max_search_depth() which addresses your comment.
>
> - os.hpp: the new Enum values might or might not need some description.
The enum will be replaced with static const int as AnyId will be removed.
>
> Btw, there is no regression in performance from the .0/.1 versions of
> this code in our benchmarks.
Great! Many thanks for doing benchmark tests, Thomas!

Let me post the next webrev after addressing all of Stefan's and Kim's
comments as well.

Thanks,
Sangheon

>
> Thanks,
>   Thomas

From sangheon.kim at oracle.com  Tue Oct  8 05:44:02 2019
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Mon, 7 Oct 2019 22:44:02 -0700
Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)
In-Reply-To: <25fe4295-7de8-bbc3-3dd3-6b750d4982f2@oracle.com>
References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com>
 <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com>
 <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com>
 <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com>
 <25fe4295-7de8-bbc3-3dd3-6b750d4982f2@oracle.com>
Message-ID:

Hi Stefan,

On 10/4/19 5:23 AM, Stefan Johansson wrote:
> Hi Sangheon,
>
> First of all, thanks for this updated version incorporating a lot of
> our comments. I think we are getting closer to the goal, but I still
> have some more comments :)
Thanks for the nice suggestions!

>
> On 2019-10-01 18:43, sangheon.kim at oracle.com wrote:
>> Hi Kim and others,
>>
>> This webrev.2 simplified a bit more after changing 'heap expansion'
>> approach.
>> Previously heap may expand with preferred numa id which means >> contiguous same numa id heap regions may exist but current version is >> assuming to have evenly split heap regions. i.e. 4 numa node system, >> heap regions will be 012301230123, so if we know address or heap >> region index, we can know preferred numa id. >> >> Many codes related to support previous style expansion were removed. >> >> ... >> >> webrev: >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2/ >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2.inc > > src/hotspot/share/gc/g1/g1Allocator.cpp > --- > ? 31 #include "gc/g1/g1NUMA.hpp" > I don't see why this include is needed, but you might want to include > gc/g1/g1MemoryNodeManager.hpp instead. You're right. Done. > --- > > hotspot/share/gc/g1/g1CollectedHeap.cpp > --- > 1518?? _mem_node_mgr(G1MemoryNodeManager::create()), > > I saw your response to Kim regarding G1Allocator needing it do be > initialized and I get that, but have you looked at moving the creation > of G1Allocator to initialize() as well, I think it's first use is > actually below: > 1802?? _mem_node_mgr->set_page_size(page_size); > here: > 1851?? _allocator->init_mutator_alloc_regions(); > > I might be missing some other place where it gets called, but I think > it should be safe to create both the node manager and the allocator > early in initialize(). Yeah, we can consider this as well. But there are some other followup enhancements which may affect to this initialization order, so I would like to leave as is. And then file a separate CR. One of the example is separating free list, so HeapRegionManager also needs G1MemoryNodeManager instance to initialize free list. > --- > > src/hotspot/share/gc/g1/g1RegionToSpaceMapper.hpp > --- > 28 #include "gc/g1/g1MemoryNodeManager.hpp" > > Remove this include. Done. > --- > > src/hotspot/share/gc/g1/g1_globals.hpp > --- > 326??????????????? 
range(0, 100) > > Remove the backslash and add back the removed line to leave the file > gc, heap, numa, verificationunchanged. Done. > --- > > src/hotspot/share/gc/g1/heapRegionManager.cpp > --- > ?142?? if (hr != NULL) { > ?143???? assert(hr->next() == NULL, "Single region should not have > next"); > ?144???? assert(is_available(hr->hrm_index()), "Must be committed"); > ?145 > ?146???? verify_actual_node_index(hr->bottom(), hr->node_index()); > ?147?? } > > I don't think this is a good place to do the verification, we allocate > the free region while holding a lock and I think we should avoid doing > a system call there. I would rather see this done during a safepoint, > having a closure that iterates the heap and verify all regions. I tried to point out this during the discussion but probably not enough. :( My understanding of the result is okay as the logs are protected by log level+tag. But as replied to Thomas, I will remove the verification at HRM::allocate_free_region() if there's no more opinions. Any opinions? Thomas or Kim? > > I also think it would be nice to have two levels of the output, the > one line for each region on trace level and on debug we can have a > summary, something like: > NUMA Node 1: expected=25, actual=23 > NUMA Node 2: expected=25, actual=27 > > What do you (and others) think about that? Having 2 level log print seems good to me. And your suggestion is similar to Thomas' one and I would like to address it at the later patch #3 (JDK-8220312 which is also part of the JEP) > --- > ?216 static void print_node_id_of_regions(uint start, uint num_regions){ > ?217?? LogTarget(Trace, gc, heap, numa) lt; > > I understand that it might make the test a bit more complicated, but > have you thought about instead adding the node index to the heap > printing done when is enabled on trace level? So you are suggesting the log tag from gc+heap+numa to gc+heap+region? 
> --- > ?235 static void set_heapregion_node_index(HeapRegion* hr) { > > I don't think we should special case for when AlwaysPreTouch is on and > instead always just call hr->set_node_index(preferred_index) directly > in make_regions_available. The reason is that I think it will make the > NUMA support harder to understand and explain and it can potentially > also hide problems with a systems configuration. It might also > actually be worse then using the preferred id, because the OS might > decide to move the pages back to the preferred node right after we > checked this (not sure it will happen, but in theory). I have different opinion, sorry. I do believe when AlwaysPreTouch is enabled, we should check actual node and then use it because; 1. If don't check the actual node id when 'AlwayPreTouch' is enabled, we will loose a chance of having improvement if actual node is different from preferred node. (I know this will not happen frequently but in theory.. ) 2. I don't think acting differently with AlwayPreTouch is a problem. I think it is opposite that it is a good chance to analyze the behavior of VM earlier. Earlier means we are planning to add verification code at safepoint(not yet decided when, so please give me good suggestion) which is later than make_regions_available(). In addition, the default value of AlwaysPreTouch is false so it means user requested pages to faulted in. 3. We are already assuming we cannot immediately react when OS migrates the memory. So if OS migrates after checking, still we are consistent on that assumption. > > An other problem with this code is the call to: > verify_actual_node_index(hr->bottom(), node_index) > > This function will only return the "actual" node index if logging for > is enable on debug level. Yes, I'm aware of this problem so planned to fix before posting the webrev but completely forgot about it. My bad. Replaced to work as expected. > --- > > ?346? 
bool HeapRegionManager::is_on_preferred_index(uint region_index, > uint preferred_node_index) { > ?347??? uint region_node_index = > G1MemoryNodeManager::mgr()->preferred_index_for_address( > ?348 G1CollectedHeap::heap()->bottom_addr_for_region(region_index)); > ?349?? return region_node_index == preferred_node_index || > ?350????????? preferred_node_index == G1MemoryNodeManager::AnyNodeIndex; > > I guess adding the AnyNodeIndex case here is because in this patch > nobody is expanding on a preferred node, right? To me this is just > another argument to not do any changes to the expand code in this > patch. I know I suggested adding expand_on_preferred_node(), but I > should have been clearer about when I think we should add it. Got it. Removed AnyNodeIndex. > --- > > src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp > --- > ? 56?? // Returns memory node ids > ? 57?? virtual const int* node_ids() const; > > Doesn't seem to be used, remove. It will be used at patch 3/3, JDK-8220312. > --- > > src/hotspot/share/gc/g1/g1MemoryNodeManager.cpp > --- > ?67?? LINUX_ONLY(if (UseNUMA) { > ... > ?79???? delete numa; > ?80?? }) > > A bit confusing with a multi-line LINUX_ONLY, I would prefer to hide > this in a private helper, something like: > ? if (UseNUMA) { > ???? LINUX_ONLY(create_numa_manager()); > ? } > > ? if (_inst == NULL) { > ??? _inst = new G1MemoryNodeManager(); > ? } > > Not really happy about this either, but we can look at simplifying the > NUMA initialization as a follow up. Changed as Kim suggested, hope you are okay with this. #ifdef LINUX > --- > > src/hotspot/share/gc/g1/g1NUMA.hpp > --- > ? 87?? // Returns numa id of the given numa index. > ? 88?? inline int numa_id_of_index(uint numa_index) const; > > Currently unused, either remove or make use of it when calling > numa_make_local. Done. > --- > ? 94?? // Returns current active numa ids. > ? 95?? 
const int* numa_ids() const { return _numa_ids; } > > Only used by memory manager above, which in turn is unused, remove. It will be used at patch 3/3, JDK-8220312. > --- > > src/hotspot/share/gc/g1/g1NUMA.hpp > --- > ? 55 // Request the given memory to locate on preferred node. > ? 56 // There are 2 things to consider. > ? 57 // First, size comparison for G1HeapRegionSize and page size. > ?... > ? 62 // Examples of 4 numa ids with non-preferred numa id. > > What do you think about this instead: > // Request to spread the given memory evenly across the available NUMA > // nodes. Which node to request for a given address is given by the > // region size and the page size. Below are two examples: > > I would also like a "NUMA node" row for each example showing which > numa node the pages and regions end up on. Changed / added as you suggested. Will post the webrev.3 after addressing Kim's comments and tests finished. Thanks, Sangheon > --- > > Thanks, > Stefan > >> Testing: hs-tier1 ~ 5 +-UseNUMA >> >> Thanks, >> Sangheon >> >> >>> ------------------------------------------------------------------------------ >>> >>> >> From thomas.schatzl at oracle.com Tue Oct 8 07:45:45 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 8 Oct 2019 09:45:45 +0200 Subject: RFR: 8231153: Improve concurrent refinement statistics In-Reply-To: <0280B88E-45D7-46C0-A0E5-2E708B0132ED@oracle.com> References: <1DADC595-3106-4CE7-BA5D-7B6C7EE0E81E@oracle.com> <4a851a19-0979-c696-0c80-1165bd755834@oracle.com> <3bafef2f-1380-3105-5c54-5e8095c42409@oracle.com> <0280B88E-45D7-46C0-A0E5-2E708B0132ED@oracle.com> Message-ID: Hi, On 08.10.19 00:38, Kim Barrett wrote: >> On Oct 7, 2019, at 8:57 AM, Stefan Johansson wrote: >> >> Hi Kim, >> >> On 2019-10-02 03:08, Kim Barrett wrote: >>> New webrevs: >>> full: https://cr.openjdk.java.net/~kbarrett/8231153/open.01/ >>> incr: https://cr.openjdk.java.net/~kbarrett/8231153/open.01.inc/ >> >> The changes looks good, just one question around 
the calculation of total time and size. >> [...] >> >> Did you consider grouping these two functions into one, to avoid iterating the threads twice? Not sure this is a big deal, and it might only make the code more complicated, but it feels a bit unnecessary to do two iteration right after each other. > > Thanks for the suggestion. I tried doing something like that in an > earlier version of this change, but I didn't like how it turned out. > But enough code has changed since then that I decided to try again. > This time seems okay. > > So G1ConcurrentRefine now provides a RefinementStats class that > packages up the time and card counts, and a new function > total_refinment_stats() that returns one of those. Also removed > total_refinement_time() and total_refined_cards(), which are no longer > used. (If that were to change they are easily reinstated as wrappers > over total_refinement_stats().) > > New webrevs: > full: https://cr.openjdk.java.net/~kbarrett/8231153/open.02/ > incr: https://cr.openjdk.java.net/~kbarrett/8231153/open.02.inc/ > still good. Thomas From thomas.schatzl at oracle.com Tue Oct 8 07:50:38 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 8 Oct 2019 09:50:38 +0200 Subject: RFR (S): 8231956: Remove seq_add_card/reference from PerRegionTable class Message-ID: <5a94fbaa-08f1-5fca-62cc-030709b6ba13@oracle.com> Hi all, can I have reviews for this small change that removes some unused methods and performs associated cleanup of unnecessary parameters? 
There is one related cleanup that might raise some questions: 38 inline void PerRegionTable::add_card_work(CardIdx_t from_card, bool par) { 39 if (!_bm.at(from_card)) { 40 if (par) { 41 if (_bm.par_set_bit(from_card)) { 42 Atomic::inc(&_occupied); changed to 38 inline void PerRegionTable::add_card(CardIdx_t from_card_index) { 39 if (_bm.par_set_bit(from_card_index)) { The reason for this change is that BitMap::par_set_bit() implicitly performs the BitMap::at() check even without doing a cmpxchg, duplicating this functionality. CR: https://bugs.openjdk.java.net/browse/JDK-8231956 Webrev: http://cr.openjdk.java.net/~tschatzl/8231956/webrev/ Testing: hs-tier1-5 Thanks, Thomas From stefan.johansson at oracle.com Tue Oct 8 08:23:19 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Tue, 8 Oct 2019 10:23:19 +0200 Subject: RFR: 8231153: Improve concurrent refinement statistics In-Reply-To: <0280B88E-45D7-46C0-A0E5-2E708B0132ED@oracle.com> References: <1DADC595-3106-4CE7-BA5D-7B6C7EE0E81E@oracle.com> <4a851a19-0979-c696-0c80-1165bd755834@oracle.com> <3bafef2f-1380-3105-5c54-5e8095c42409@oracle.com> <0280B88E-45D7-46C0-A0E5-2E708B0132ED@oracle.com> Message-ID: <1c77cb41-2a70-c9d6-a600-2a87df24ae9c@oracle.com> On 2019-10-08 00:38, Kim Barrett wrote: >> On Oct 7, 2019, at 8:57 AM, Stefan Johansson wrote: >> >> Hi Kim, >> >> On 2019-10-02 03:08, Kim Barrett wrote: >>> New webrevs: >>> full: https://cr.openjdk.java.net/~kbarrett/8231153/open.01/ >>> incr: https://cr.openjdk.java.net/~kbarrett/8231153/open.01.inc/ >> >> The changes looks good, just one question around the calculation of total time and size. >> >> src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp >> --- >> 415 Tickspan G1ConcurrentRefine::total_refinement_time() const { >> ... >> 425 const_cast(this)->threads_do(&closure); >> 426 return closure._total_time; >> 427 } >> 428 >> 429 size_t G1ConcurrentRefine::total_refined_cards() const { >> ... 
>> 439 const_cast(this)->threads_do(&closure); >> 440 return closure._total_cards; >> 441 } >> >> Did you consider grouping these two functions into one, to avoid iterating the threads twice? Not sure this is a big deal, and it might only make the code more complicated, but it feels a bit unnecessary to do two iteration right after each other. > > Thanks for the suggestion. I tried doing something like that in an > earlier version of this change, but I didn't like how it turned out. > But enough code has changed since then that I decided to try again. > This time seems okay. > > So G1ConcurrentRefine now provides a RefinementStats class that > packages up the time and card counts, and a new function > total_refinment_stats() that returns one of those. Also removed > total_refinement_time() and total_refined_cards(), which are no longer > used. (If that were to change they are easily reinstated as wrappers > over total_refinement_stats().) > > New webrevs: > full: https://cr.openjdk.java.net/~kbarrett/8231153/open.02/ > incr: https://cr.openjdk.java.net/~kbarrett/8231153/open.02.inc/ > Thanks for trying it out a second time, this is more or less exactly what I had in mind. Looks good, Stefan > Testing: > mach5 tier1 > > From thomas.schatzl at oracle.com Tue Oct 8 08:39:04 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 8 Oct 2019 10:39:04 +0200 Subject: RFR: 8231489: GC watermark_0_1 failed due to "metaspace.gc.Fault: GC has happened too rare" In-Reply-To: <2aa33fe4-5d04-cd58-f786-d7a12b977ccf@oracle.com> References: <2aa33fe4-5d04-cd58-f786-d7a12b977ccf@oracle.com> Message-ID: Hi, On 03.10.19 10:47, Per Liden wrote: > vmTestbase/metaspace/gc/HighWaterMarkTest relies on timing and fails > when "Metaspace GC Threshold" isn't handled in a STW pause. > > The problem can be reproduced on both G1 and ZGC, but it's hard, as the > window is small. However, it reproduces every time when injecting a > 100ms delay to prolong the GC cycle a bit. 
This test used to be disabled > for G1 with ClassUnloadingWithConcurrentMark, but JDK-8204163 enabled it > about a year ago. > > Fixing the test properly is tricky. As far as I can see, we can either: > 1) Disable this test for G1+ClassUnloadingWithConcurrentMark and ZGC, or > 2) Add a sleep in the test loop, to make the race less likely to happen, or > 3) Remove the test completely, with the rational that it's a buggy low > value test. > > I've gone with 1) here. The test is already disabled for CMS today, with > code in the test itself (i.e. not using @requires), so I did two > alternative patches: > > A) Follows the existing style to disable the other GCs: > http://cr.openjdk.java.net/~pliden/8231489/webrev.0-alt1 > > B) Adds @requires to the tests using the HighWaterMarkTest class, and > removes the old check to disable CMS: > http://cr.openjdk.java.net/~pliden/8231489/webrev.0-alt2 > > I prefer B, but I don't have a strong opinion on which way to go. > B is fine with me. Looks good. Thomas From per.liden at oracle.com Tue Oct 8 09:12:09 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 8 Oct 2019 11:12:09 +0200 Subject: RFR: 8231489: GC watermark_0_1 failed due to "metaspace.gc.Fault: GC has happened too rare" In-Reply-To: References: <2aa33fe4-5d04-cd58-f786-d7a12b977ccf@oracle.com> Message-ID: <995df990-30ce-b2eb-e8c1-9199c8a6806d@oracle.com> Thanks for reviewing, Thomas! /Per On 10/8/19 10:39 AM, Thomas Schatzl wrote: > Hi, > > On 03.10.19 10:47, Per Liden wrote: >> vmTestbase/metaspace/gc/HighWaterMarkTest relies on timing and fails >> when "Metaspace GC Threshold" isn't handled in a STW pause. >> >> The problem can be reproduced on both G1 and ZGC, but it's hard, as >> the window is small. However, it reproduces every time when injecting >> a 100ms delay to prolong the GC cycle a bit. This test used to be >> disabled for G1 with ClassUnloadingWithConcurrentMark, but JDK-8204163 >> enabled it about a year ago. 
>> >> Fixing the test properly is tricky. As far as I can see, we can either: >> 1) Disable this test for G1+ClassUnloadingWithConcurrentMark and ZGC, or >> 2) Add a sleep in the test loop, to make the race less likely to >> happen, or >> 3) Remove the test completely, with the rational that it's a buggy low >> value test. >> >> I've gone with 1) here. The test is already disabled for CMS today, >> with code in the test itself (i.e. not using @requires), so I did two >> alternative patches: >> >> A) Follows the existing style to disable the other GCs: >> http://cr.openjdk.java.net/~pliden/8231489/webrev.0-alt1 >> >> B) Adds @requires to the tests using the HighWaterMarkTest class, and >> removes the old check to disable CMS: >> http://cr.openjdk.java.net/~pliden/8231489/webrev.0-alt2 >> >> I prefer B, but I don't have a strong opinion on which way to go. >> > > B is fine with me. > > Looks good. > > Thomas > From per.liden at oracle.com Tue Oct 8 09:15:45 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 8 Oct 2019 11:15:45 +0200 Subject: RFR (S): 8231956: Remove seq_add_card/reference from PerRegionTable class In-Reply-To: <5a94fbaa-08f1-5fca-62cc-030709b6ba13@oracle.com> References: <5a94fbaa-08f1-5fca-62cc-030709b6ba13@oracle.com> Message-ID: <785ee316-a389-2218-0e4c-d53db0120088@oracle.com> Looks good! /Per On 10/8/19 9:50 AM, Thomas Schatzl wrote: > Hi all, > > ? can I have reviews for this small change that removes some unused > methods and performs associated cleanup of unnecessary parameters? > > There is one related cleanup that might raise some questions: > > ? 38 inline void PerRegionTable::add_card_work(CardIdx_t from_card, > bool par) { > ? 39?? if (!_bm.at(from_card)) { > ? 40???? if (par) { > ? 41?????? if (_bm.par_set_bit(from_card)) { > ? 42???????? Atomic::inc(&_occupied); > > changed to > > ? 38 inline void PerRegionTable::add_card(CardIdx_t from_card_index) { > ? 39?? 
if (_bm.par_set_bit(from_card_index)) { > > > The reason for this change is that BitMap::par_set_bit() implicitly > performs the BitMap::at() check even without doing a cmpxchg, > duplicating this functionality. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8231956 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8231956/webrev/ > Testing: > hs-tier1-5 > > Thanks, > ? Thomas From stefan.johansson at oracle.com Tue Oct 8 09:25:52 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Tue, 8 Oct 2019 11:25:52 +0200 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <25fe4295-7de8-bbc3-3dd3-6b750d4982f2@oracle.com> Message-ID: Hi Sangheon, Thanks for addressing all out comments. Just some quick replies below. On 2019-10-08 07:44, sangheon.kim at oracle.com wrote: > Hi Stefan, > > On 10/4/19 5:23 AM, Stefan Johansson wrote: >> Hi Sangheon, >> >> First of all, thanks for this updated version incorporating a lot of >> our comments. I think we are getting closer to the goal, but I still >> have some more comments :) > Thanks for the nice suggestions! > >> >> On 2019-10-01 18:43, sangheon.kim at oracle.com wrote: >>> Hi Kim and others, >>> >>> This webrev.2 simplified a bit more after changing 'heap expansion' >>> approach. >>> Previously heap may expand with preferred numa id which means >>> contiguous same numa id heap regions may exist but current version is >>> assuming to have evenly split heap regions. i.e. 4 numa node system, >>> heap regions will be 012301230123, so if we know address or heap >>> region index, we can know preferred numa id. >>> >>> Many codes related to support previous style expansion were removed. >>> >>> ... 
>>> >>> webrev: >>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2/ >>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2.inc >> >> src/hotspot/share/gc/g1/g1Allocator.cpp >> --- >> ? 31 #include "gc/g1/g1NUMA.hpp" >> I don't see why this include is needed, but you might want to include >> gc/g1/g1MemoryNodeManager.hpp instead. > You're right. > Done. > >> --- >> >> hotspot/share/gc/g1/g1CollectedHeap.cpp >> --- >> 1518?? _mem_node_mgr(G1MemoryNodeManager::create()), >> >> I saw your response to Kim regarding G1Allocator needing it do be >> initialized and I get that, but have you looked at moving the creation >> of G1Allocator to initialize() as well, I think it's first use is >> actually below: >> 1802?? _mem_node_mgr->set_page_size(page_size); >> here: >> 1851?? _allocator->init_mutator_alloc_regions(); >> >> I might be missing some other place where it gets called, but I think >> it should be safe to create both the node manager and the allocator >> early in initialize(). > Yeah, we can consider this as well. But there are some other followup > enhancements which may affect to this initialization order, so I would > like to leave as is. And then file a separate CR. > One of the example is separating free list, so HeapRegionManager also > needs G1MemoryNodeManager instance to initialize free list. > >> --- >> >> src/hotspot/share/gc/g1/g1RegionToSpaceMapper.hpp >> --- >> 28 #include "gc/g1/g1MemoryNodeManager.hpp" >> >> Remove this include. > Done. > >> --- >> >> src/hotspot/share/gc/g1/g1_globals.hpp >> --- >> 326??????????????? range(0, 100) >> >> Remove the backslash and add back the removed line to leave the file >> gc, heap, numa, verificationunchanged. > Done. > >> --- >> >> src/hotspot/share/gc/g1/heapRegionManager.cpp >> --- >> ?142?? if (hr != NULL) { >> ?143???? assert(hr->next() == NULL, "Single region should not have >> next"); >> ?144???? assert(is_available(hr->hrm_index()), "Must be committed"); >> ?145 >> ?146???? 
verify_actual_node_index(hr->bottom(), hr->node_index()); >> ?147?? } >> >> I don't think this is a good place to do the verification, we allocate >> the free region while holding a lock and I think we should avoid doing >> a system call there. I would rather see this done during a safepoint, >> having a closure that iterates the heap and verify all regions. > I tried to point out this during the discussion but probably not enough. :( > My understanding of the result is okay as the logs are protected by log > level+tag. But as replied to Thomas, I will remove the verification at > HRM::allocate_free_region() if there's no more opinions. > > Any opinions? Thomas or Kim? > >> >> I also think it would be nice to have two levels of the output, the >> one line for each region on trace level and on debug we can have a >> summary, something like: >> NUMA Node 1: expected=25, actual=23 >> NUMA Node 2: expected=25, actual=27 >> >> What do you (and others) think about that? > Having 2 level log print seems good to me. > And your suggestion is similar to Thomas' one and I would like to > address it at the later patch #3 (JDK-8220312 which is also part of the > JEP) > >> --- >> ?216 static void print_node_id_of_regions(uint start, uint num_regions){ >> ?217?? LogTarget(Trace, gc, heap, numa) lt; >> >> I understand that it might make the test a bit more complicated, but >> have you thought about instead adding the node index to the heap >> printing done when is enabled on trace level? > So you are suggesting the log tag from gc+heap+numa to gc+heap+region? No, my suggestion is to add it to HeapRegion::print_on(outputStream* st), if numa is enable. Adding a new column for numa node id could be nice to have not only for testing. This would require the test to change a bit an possibly even add a WhiteBox method that prints all region information. This would be nice since it both gives useful output in the region printing and you can control when it is printed from the test. 
But it would make the parsing of the information a little bit harder.

>
>> ---
>>  235 static void set_heapregion_node_index(HeapRegion* hr) {
>>
>> I don't think we should special case for when AlwaysPreTouch is on and
>> instead always just call hr->set_node_index(preferred_index) directly
>> in make_regions_available. The reason is that I think it will make the
>> NUMA support harder to understand and explain and it can potentially
>> also hide problems with a system's configuration. It might also
>> actually be worse than using the preferred id, because the OS might
>> decide to move the pages back to the preferred node right after we
>> checked this (not sure it will happen, but in theory).
> I have a different opinion, sorry.
> I do believe when AlwaysPreTouch is enabled, we should check the actual
> node and then use it because:
> 1. If we don't check the actual node id when 'AlwaysPreTouch' is enabled,
> we will lose a chance of having an improvement if the actual node is
> different from the preferred node. (I know this will not happen
> frequently but in theory.. )
> 2. I don't think acting differently with AlwaysPreTouch is a problem. I
> think it is the opposite: it is a good chance to analyze the behavior of
> the VM earlier. Earlier means we are planning to add verification code at
> a safepoint (not yet decided when, so please give me a good suggestion)
> which is later than make_regions_available(). In addition, the default
> value of AlwaysPreTouch is false so it means the user requested pages to
> be faulted in.
> 3. We are already assuming we cannot immediately react when the OS
> migrates the memory. So if the OS migrates after checking, we are still
> consistent with that assumption.

It's ok that we have different opinions, and I'm fine with this if
everybody else agrees on it.

>
>>
>> Another problem with this code is the call to:
>> verify_actual_node_index(hr->bottom(), node_index)
>>
>> This function will only return the "actual" node index if logging for
>> is enabled on debug level.
> Yes, I'm aware of this problem so planned to fix before posting the
> webrev but completely forgot about it. My bad.
> Replaced to work as expected.
>
>> ---
>>
>>  346  bool HeapRegionManager::is_on_preferred_index(uint region_index, uint preferred_node_index) {
>>  347    uint region_node_index = G1MemoryNodeManager::mgr()->preferred_index_for_address(
>>  348 G1CollectedHeap::heap()->bottom_addr_for_region(region_index));
>>  349   return region_node_index == preferred_node_index ||
>>  350          preferred_node_index == G1MemoryNodeManager::AnyNodeIndex;
>>
>> I guess adding the AnyNodeIndex case here is because in this patch
>> nobody is expanding on a preferred node, right? To me this is just
>> another argument to not do any changes to the expand code in this
>> patch. I know I suggested adding expand_on_preferred_node(), but I
>> should have been clearer about when I think we should add it.
> Got it.
> Removed AnyNodeIndex.
>
>> ---
>>
>> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>> ---
>>   56   // Returns memory node ids
>>   57   virtual const int* node_ids() const;
>>
>> Doesn't seem to be used, remove.
> It will be used at patch 3/3, JDK-8220312.
>
>> ---
>>
>> src/hotspot/share/gc/g1/g1MemoryNodeManager.cpp
>> ---
>>  67   LINUX_ONLY(if (UseNUMA) {
>> ...
>>  79     delete numa;
>>  80   })
>>
>> A bit confusing with a multi-line LINUX_ONLY, I would prefer to hide
>> this in a private helper, something like:
>>   if (UseNUMA) {
>>     LINUX_ONLY(create_numa_manager());
>>   }
>>
>>   if (_inst == NULL) {
>>     _inst = new G1MemoryNodeManager();
>>   }
>>
>> Not really happy about this either, but we can look at simplifying the
>> NUMA initialization as a follow up.
> Changed as Kim suggested, hope you are okay with this.

Yes, using an #ifdef should be good enough.

>
> #ifdef LINUX
>
>
>> ---
>>
>> src/hotspot/share/gc/g1/g1NUMA.hpp
>> ---
>>   87   // Returns numa id of the given numa index.
>>   88   inline int numa_id_of_index(uint numa_index) const;
>>
>> Currently unused, either remove or make use of it when calling
>> numa_make_local.
> Done.
>
>> ---
>>   94   // Returns current active numa ids.
>>   95   const int* numa_ids() const { return _numa_ids; }
>>
>> Only used by memory manager above, which in turn is unused, remove.
> It will be used at patch 3/3, JDK-8220312.
>
>> ---
>>
>> src/hotspot/share/gc/g1/g1NUMA.hpp
>> ---
>>   55 // Request the given memory to locate on preferred node.
>>   56 // There are 2 things to consider.
>>   57 // First, size comparison for G1HeapRegionSize and page size.
>>  ...
>>   62 // Examples of 4 numa ids with non-preferred numa id.
>>
>> What do you think about this instead:
>> // Request to spread the given memory evenly across the available NUMA
>> // nodes. Which node to request for a given address is given by the
>> // region size and the page size. Below are two examples:
>>
>> I would also like a "NUMA node" row for each example showing which
>> numa node the pages and regions end up on.
> Changed / added as you suggested.
>
> Will post the webrev.3 after addressing Kim's comments and tests finished.
>
> Thanks,
> Sangheon
>
>
>> ---
>>
>> Thanks,
>> Stefan
>>
>>> Testing: hs-tier1 ~ 5 +-UseNUMA
>>>
>>> Thanks,
>>> Sangheon
>>>
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>>
>

From stefan.johansson at oracle.com Tue Oct 8 09:28:22 2019
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Tue, 8 Oct 2019 11:28:22 +0200
Subject: RFR: 8231489: GC watermark_0_1 failed due to "metaspace.gc.Fault: GC has happened too rare"
In-Reply-To:
References: <2aa33fe4-5d04-cd58-f786-d7a12b977ccf@oracle.com>
Message-ID:

On 2019-10-08 10:39, Thomas Schatzl wrote:
> Hi,
>
> On 03.10.19 10:47, Per Liden wrote:
>> vmTestbase/metaspace/gc/HighWaterMarkTest relies on timing and fails
>> when "Metaspace GC Threshold" isn't handled in a STW pause.
>>
>> The problem can be reproduced on both G1 and ZGC, but it's hard, as
>> the window is small. However, it reproduces every time when injecting
>> a 100ms delay to prolong the GC cycle a bit. This test used to be
>> disabled for G1 with ClassUnloadingWithConcurrentMark, but JDK-8204163
>> enabled it about a year ago.
>>
>> Fixing the test properly is tricky. As far as I can see, we can either:
>> 1) Disable this test for G1+ClassUnloadingWithConcurrentMark and ZGC, or
>> 2) Add a sleep in the test loop, to make the race less likely to
>> happen, or
>> 3) Remove the test completely, with the rationale that it's a buggy,
>> low-value test.
>>
>> I've gone with 1) here. The test is already disabled for CMS today,
>> with code in the test itself (i.e. not using @requires), so I did two
>> alternative patches:
>>
>> A) Follows the existing style to disable the other GCs:
>> http://cr.openjdk.java.net/~pliden/8231489/webrev.0-alt1
>>
>> B) Adds @requires to the tests using the HighWaterMarkTest class, and
>> removes the old check to disable CMS:
>> http://cr.openjdk.java.net/~pliden/8231489/webrev.0-alt2
>>
>> I prefer B, but I don't have a strong opinion on which way to go.
>>
>
> B is fine with me.

Same here, I think it is good to use @requires even if they are a bit
complicated in this case.

Looks good,
Stefan

>
> Looks good.
>
> Thomas
>

From per.liden at oracle.com Tue Oct 8 09:49:38 2019
From: per.liden at oracle.com (Per Liden)
Date: Tue, 8 Oct 2019 11:49:38 +0200
Subject: RFR: 8231489: GC watermark_0_1 failed due to "metaspace.gc.Fault: GC has happened too rare"
In-Reply-To:
References: <2aa33fe4-5d04-cd58-f786-d7a12b977ccf@oracle.com>
Message-ID:

Thanks Stefan!

/Per

On 10/8/19 11:28 AM, Stefan Johansson wrote:
>
>
> On 2019-10-08 10:39, Thomas Schatzl wrote:
>> Hi,
>>
>> On 03.10.19 10:47, Per Liden wrote:
>>> vmTestbase/metaspace/gc/HighWaterMarkTest relies on timing and fails
>>> when "Metaspace GC Threshold" isn't handled in a STW pause.
>>>
>>> The problem can be reproduced on both G1 and ZGC, but it's hard, as
>>> the window is small. However, it reproduces every time when injecting
>>> a 100ms delay to prolong the GC cycle a bit. This test used to be
>>> disabled for G1 with ClassUnloadingWithConcurrentMark, but
>>> JDK-8204163 enabled it about a year ago.
>>>
>>> Fixing the test properly is tricky. As far as I can see, we can either:
>>> 1) Disable this test for G1+ClassUnloadingWithConcurrentMark and ZGC, or
>>> 2) Add a sleep in the test loop, to make the race less likely to
>>> happen, or
>>> 3) Remove the test completely, with the rationale that it's a buggy,
>>> low-value test.
>>>
>>> I've gone with 1) here. The test is already disabled for CMS today,
>>> with code in the test itself (i.e. not using @requires), so I did two
>>> alternative patches:
>>>
>>> A) Follows the existing style to disable the other GCs:
>>> http://cr.openjdk.java.net/~pliden/8231489/webrev.0-alt1
>>>
>>> B) Adds @requires to the tests using the HighWaterMarkTest class, and
>>> removes the old check to disable CMS:
>>> http://cr.openjdk.java.net/~pliden/8231489/webrev.0-alt2
>>>
>>> I prefer B, but I don't have a strong opinion on which way to go.
>>>
>>
>> B is fine with me.
> Same here, I think it is good to use @requires even if they are a bit
> complicated in this case.
>
> Looks good,
> Stefan
>
>>
>> Looks good.
>>
>> Thomas
>>

From thomas.schatzl at oracle.com Tue Oct 8 09:54:56 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 8 Oct 2019 11:54:56 +0200
Subject: RFR (S): 8231956: Remove seq_add_card/reference from PerRegionTable class
In-Reply-To: <785ee316-a389-2218-0e4c-d53db0120088@oracle.com>
References: <5a94fbaa-08f1-5fca-62cc-030709b6ba13@oracle.com> <785ee316-a389-2218-0e4c-d53db0120088@oracle.com>
Message-ID: <1d1f3e1b-c886-155c-75f4-db33edf5b44b@oracle.com>

Hi Per,

On 08.10.19 11:15, Per Liden wrote:
> Looks good!
>
> /Per

thanks for your review.
Thomas

From stefan.johansson at oracle.com Tue Oct 8 10:49:27 2019
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Tue, 8 Oct 2019 12:49:27 +0200
Subject: [PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance
In-Reply-To:
References: <4F02DD53-EA98-4A1A-B871-C6E9D9610B2C@oracle.com> <9B69AFD1-7AE2-4B50-BFCF-C9C6E2594240@oracle.com>
Message-ID:

Hi Haoyu,

I've done some more testing and I haven't seen any issues with the patch
so far and the performance looks promising in most cases. For simple
tests I've seen some regressions, but I'm not really sure why. Will do
some more digging.

To move forward with this the first thing we need to do is making sure
that you being covered by the Oracle Contributor Agreement is enough.
From what we can see it is only you as an individual that has signed the
OCA and in that case it is important that this statement from the OCA is
fulfilled: "no other person or entity, including my employer, has or
will have rights with respect to my contributions"

Is this the case for this contribution or should we have the university
sign the OCA as well? For more information regarding the OCA please
refer to:
https://www.oracle.com/technetwork/oca-faq-405384.pdf

Thanks,
Stefan

On 2019-09-16 16:02, Haoyu Li wrote:
> FYI, the evaluation results on OpenJDK 14 are plotted in the attachment.
> I compute the full GC throughput by dividing the heap size before full
> GC by the GC pause time, and the results are arithmetic mean values of
> ten runs after a warm-up run. The evaluation is conducted on a machine
> with dual Intel Xeon E5-2618L v3 CPUs (2 sockets, 16 physical cores
> with SMT enabled) and 64G DRAM.
>
> Best Regards,
> Haoyu Li,
> Institute of Parallel and Distributed Systems (IPADS),
> School of Software,
> Shanghai Jiao Tong University
>
>
> Stefan Johansson wrote on 2019-09-12 at 5:34:
>
> Hi Haoyu,
>
> I recently came across your patch and I would like to pick up on
> some of the things Kim mentioned in his mails. I especially want to
> evaluate and investigate if this is a technique we can use to
> improve the other GCs as well. To start that work I want to take the
> patch for a spin in our internal performance testing. The patch
> doesn't apply cleanly to the latest JDK repository, so if you could
> provide an updated patch that would be very helpful.
>
> It would also be great if you could share some more information
> around the results presented in the paper. For example, it would be
> good to get the full command lines for the different benchmarks so
> we can run them locally and reproduce the results you've seen.
>
> Thanks,
> Stefan
>
>> On 12 March 2019 at 03:21, Haoyu Li wrote:
>>
>> Hi Kim,
>>
>> Thanks for reviewing and testing the patch. If there are any
>> failures or performance degradation relevant to the work, please
>> let me know and I'll be very happy to keep improving it. Also, any
>> suggestions about code improvements are well appreciated.
>>
>> I'm not quite sure if both G1 and Shenandoah have the similar
>> region dependency issue, since I haven't studied their GC
>> behaviors before. If they have, I'm also willing to propose a more
>> general optimization.
>>
>> As to the memory overhead, I believe it will be low because this
>> patch exploits empty regions in the young space rather than
>> off-heap memory to allocate shadow regions, and also reuses the
>> /_source_region/ field of each /RegionData/ to record the
>> corresponding shadow region index. We only introduce a new
>> integer field /_shadow/ in the RegionData class to indicate the
>> status of a region, a global /GrowableArray _free_shadow/ to store
>> the indices of shadow regions, and a global /Monitor/ to protect
>> the array. This information might help if the memory overhead
>> needs to be evaluated.
>>
>> Looking forward to your insight.
>>
>> Best Regards,
>> Haoyu Li,
>> Institute of Parallel and Distributed Systems (IPADS),
>> School of Software,
>> Shanghai Jiao Tong University
>>
>>
>> Kim Barrett wrote on 2019-03-12 at 6:11:
>>
>> > On Mar 11, 2019, at 1:45 AM, Kim Barrett wrote:
>> >
>> >> On Jan 24, 2019, at 3:58 AM, Haoyu Li wrote:
>> >>
>> >> Hi Kim,
>> >>
>> >> I have ported my patch to OpenJDK 13 according to your
>> >> instructions in your last mail, and the patch is attached in
>> >> this mail. The patch does not change much since PSGC is indeed
>> >> pretty stable.
>> >>
>> >> Also, I evaluate the correctness and performance of PS full
>> >> GC with benchmarks from DaCapo, SPECjvm2008, and JOlden suites
>> >> on a machine with dual Intel Xeon E5-2618L v3 CPUs (16 physical
>> >> cores), 64G DRAM and linux kernel 4.17. The evaluation result,
>> >> indicating 1.9X GC throughput improvement on average, is
>> >> attached, too.
>> >>
>> >> However, I have no idea how to further test this patch for
>> >> both correctness and performance. Can I please get any
>> >> guidance from you or some sponsor?
>> >
>> > Sorry I missed that you had sent an updated version of the
>> > patch.
>> >
>> > I've run the full regression suite across Oracle-supported
>> > platforms. There are some failures, but there are almost always
>> > some failures in the later tiers right now. I'll start looking
>> > at them tomorrow to figure out whether any of them are relevant.
>> >
>> > I'm also planning to run some of our performance benchmarks.
>> >
>> > I've lightly skimmed the proposed changes. There might be
>> > some code improvements to be made.
>> >
>> > I'm also wondering if this technique applies to other
>> > collectors. It seems like both G1 and Shenandoah full gc's
>> > might have similar issues? If so, a solution that is
>> > ParallelGC-specific is less interesting than one that has
>> > broader applicability. Though maybe this optimization is less
>> > important for G1 and Shenandoah, since they actively try to
>> > avoid full gc's.
>> >
>> > I'm also not clear on how much additional memory might be
>> > temporarily allocated by this mechanism.
>>
>> I've created a CR for this:
>> https://bugs.openjdk.java.net/browse/JDK-8220465
>>
>

From per.liden at oracle.com Tue Oct 8 13:02:21 2019
From: per.liden at oracle.com (Per Liden)
Date: Tue, 8 Oct 2019 15:02:21 +0200
Subject: RFR: 8232001: ZGC: Ignore metaspace GC threshold until GC is warm
Message-ID:

As reported here:
https://mail.openjdk.java.net/pipermail/zgc-dev/2019-September/000736.html

The ZDirector heuristics can get off to a bad start if the statistics
are contaminated by early "Metaspace GC Threshold" GC requests. To avoid
this, we could simply ignore such requests until the GC is warm, at the
potential cost of expanding metaspace a bit more during startup.

Bug: https://bugs.openjdk.java.net/browse/JDK-8232001
Webrev: http://cr.openjdk.java.net/~pliden/8232001/webrev.0

/Per

From stefan.johansson at oracle.com Tue Oct 8 13:08:11 2019
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Tue, 8 Oct 2019 15:08:11 +0200
Subject: RFR (S): 8231956: Remove seq_add_card/reference from PerRegionTable class
In-Reply-To: <785ee316-a389-2218-0e4c-d53db0120088@oracle.com>
References: <5a94fbaa-08f1-5fca-62cc-030709b6ba13@oracle.com> <785ee316-a389-2218-0e4c-d53db0120088@oracle.com>
Message-ID: <4c791423-cf93-0261-f9a0-b208b9baf10e@oracle.com>

On 2019-10-08 11:15, Per Liden wrote:
> Looks good!
>
+1
Stefan

> /Per
>
> On 10/8/19 9:50 AM, Thomas Schatzl wrote:
>> Hi all,
>>
>>    can I have reviews for this small change that removes some unused
>> methods and performs associated cleanup of unnecessary parameters?
>>
>> There is one related cleanup that might raise some questions:
>>
>>    38 inline void PerRegionTable::add_card_work(CardIdx_t from_card, bool par) {
>>    39   if (!_bm.at(from_card)) {
>>    40     if (par) {
>>    41       if (_bm.par_set_bit(from_card)) {
>>    42         Atomic::inc(&_occupied);
>>
>> changed to
>>
>>    38 inline void PerRegionTable::add_card(CardIdx_t from_card_index) {
>>    39   if (_bm.par_set_bit(from_card_index)) {
>>
>>
>> The reason for this change is that BitMap::par_set_bit() implicitly
>> performs the BitMap::at() check even without doing a cmpxchg,
>> duplicating this functionality.
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8231956
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8231956/webrev/
>> Testing:
>> hs-tier1-5
>>
>> Thanks,
>>    Thomas

From thomas.schatzl at oracle.com Tue Oct 8 13:28:25 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 8 Oct 2019 15:28:25 +0200
Subject: RFR (S): 8231956: Remove seq_add_card/reference from PerRegionTable class
In-Reply-To: <4c791423-cf93-0261-f9a0-b208b9baf10e@oracle.com>
References: <5a94fbaa-08f1-5fca-62cc-030709b6ba13@oracle.com> <785ee316-a389-2218-0e4c-d53db0120088@oracle.com> <4c791423-cf93-0261-f9a0-b208b9baf10e@oracle.com>
Message-ID: <813ac43a-c5a7-238c-d7bc-404b124f0b90@oracle.com>

Hi Stefan,

On 08.10.19 15:08, Stefan Johansson wrote:
>
>
> On 2019-10-08 11:15, Per Liden wrote:
>> Looks good!
>>
> +1

thanks for your review.
Thomas

From kim.barrett at oracle.com Tue Oct 8 19:05:44 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 8 Oct 2019 15:05:44 -0400
Subject: RFR: 8231153: Improve concurrent refinement statistics
In-Reply-To: <1c77cb41-2a70-c9d6-a600-2a87df24ae9c@oracle.com>
References: <1DADC595-3106-4CE7-BA5D-7B6C7EE0E81E@oracle.com> <4a851a19-0979-c696-0c80-1165bd755834@oracle.com> <3bafef2f-1380-3105-5c54-5e8095c42409@oracle.com> <0280B88E-45D7-46C0-A0E5-2E708B0132ED@oracle.com> <1c77cb41-2a70-c9d6-a600-2a87df24ae9c@oracle.com>
Message-ID: <56E306A1-45DC-4779-A4AD-62133B0A0D52@oracle.com>

> On Oct 8, 2019, at 4:23 AM, Stefan Johansson wrote:
>
>
>
> On 2019-10-08 00:38, Kim Barrett wrote:
>> New webrevs:
>> full: https://cr.openjdk.java.net/~kbarrett/8231153/open.02/
>> incr: https://cr.openjdk.java.net/~kbarrett/8231153/open.02.inc/
>>
> Thanks for trying it out a second time, this is more or less exactly what I had in mind.
>
> Looks good,
> Stefan
>
>> Testing:
>> mach5 tier1

Thanks.

From kim.barrett at oracle.com Tue Oct 8 19:06:00 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 8 Oct 2019 15:06:00 -0400
Subject: RFR: 8231153: Improve concurrent refinement statistics
In-Reply-To:
References: <1DADC595-3106-4CE7-BA5D-7B6C7EE0E81E@oracle.com> <4a851a19-0979-c696-0c80-1165bd755834@oracle.com> <3bafef2f-1380-3105-5c54-5e8095c42409@oracle.com> <0280B88E-45D7-46C0-A0E5-2E708B0132ED@oracle.com>
Message-ID: <3D3BA259-AE30-460A-9381-B6E67A2207EE@oracle.com>

> On Oct 8, 2019, at 3:45 AM, Thomas Schatzl wrote:
>
> Hi,
>
> On 08.10.19 00:38, Kim Barrett wrote:
>> New webrevs:
>> full: https://cr.openjdk.java.net/~kbarrett/8231153/open.02/
>> incr: https://cr.openjdk.java.net/~kbarrett/8231153/open.02.inc/
>
> still good.
>
> Thomas

Thanks.

From kim.barrett at oracle.com Tue Oct 8 23:48:06 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 8 Oct 2019 19:48:06 -0400
Subject: RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1
In-Reply-To: <5c6b06b1-de44-3cb7-7fc8-0b641df5f353@oracle.com>
References: <0F637570-EC97-47C5-B493-B33681133149@oracle.com> <5c6b06b1-de44-3cb7-7fc8-0b641df5f353@oracle.com>
Message-ID:

> On Sep 30, 2019, at 7:14 AM, Thomas Schatzl wrote:
> All fixed in new webrev:
>
> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.0_to_1 (diff)
> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.1 (full)
>
> Rerunning hs-tier1-5, almost done
>
> Thanks,
> Thomas

Because of NMETHOD_SENTINEL we already have a "lying to the type
system" problem for the nmethod link field, as it doesn't necessarily
contain an nmethod*. The introduction of the strongly claimed tagging
mechanism just emphasizes that. I think that should be cleaned up and
the "lying to the type system" should be eliminated. However, I also
think that can be done as a followup cleanup.

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.cpp

I initially thought there was a bug in the strong claim.

A weak claim is established by thread1, successfully setting the link
field to NMETHOD_SENTINEL. Before thread1 continues from there...

Thread2 tries to strongly mark, and sees NMETHOD_SENTINEL in the link
field. NMETHOD_SENTINEL == badAddress == -2, which happens to have
the low bit clear. So this seems to work after all.

Add STATIC_ASSERT(is_aligned_((uintptr_t)NMETHOD_SENTINEL, 2))

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.cpp

[pre-existing]
I think the comment in oops_do_marking_prolog about using cmpxchg makes
no sense. And why does oops_do_marking_epilogue use cmpxchg at the end?

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.cpp

I think using a self-loop to mark end of list would eliminate the need
for NMETHOD_SENTINEL. Also eliminates the need for
oops_do_marking_prologue. Requires changing oops_do_marking_epilogue
to recognize the self-loop.

[This can be deferred to later cleanup.]

------------------------------------------------------------------------------

oops_do_mark_merge_claim second argument is called "claim" but should
be "strongly_claim" or some such. Actually the whole new suite of
  oops_do_mark_is_claimed
  oops_do_mark_strip_claim
  oops_do_mark_merge_claim
all seem misnamed. The link field having a non-NULL value is a
(possibly weak) claim. The link field having a non-NULL not 2byte
aligned value is a strong claim. Those functions are all dealing with
strong claims.

is_claimed should use is_aligned
strip_claim should use align_down

------------------------------------------------------------------------------

With the introduction of the strongly claimed tag bit, the link field
ought not be of type nmethod*, because using that type means we're
constructing improperly aligned pointers, which is unspecified
behavior. Should now be char* or void* or some opaque pointer type.

  struct nmethod_claim; // nmethod_claimant ?

[This can be deferred to later cleanup.]

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.cpp
1884   // On fall through, another racing thread marked this nmethod before we did.

[pre-existing]
I think s/marked/claimed/ would be better.

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.cpp
1900     while (cur != oops_do_mark_strip_claim(NMETHOD_SENTINEL)) {

Why stripping the claim tag from NMETHOD_SENTINEL; it isn't tagged.
(And must not be, as discussed in an earlier comment.)
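
As an illustration of the low-bit tagging and "try_claim" style
discussed in the comments above, here is a minimal standalone sketch.
It is not HotSpot code: Node, kStrongTag, try_claim_weak and
try_claim_strong are hypothetical names, the link is stored as an
opaque integer word (so no misaligned pointer is ever formed), and a
self-loop value stands in for the end-of-list marker instead of
NMETHOD_SENTINEL.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for nmethod; only the claim link matters here.
struct Node {
  // Opaque link word rather than Node*, so a tagged (misaligned)
  // pointer value is never formed. 0 means unclaimed.
  std::atomic<uintptr_t> link{0};
};
static_assert(alignof(Node) >= 2, "low bit must be free for the tag");

// Assumption for this sketch: bit 0 set means "strongly claimed".
constexpr uintptr_t kStrongTag = 1;

inline bool is_strongly_claimed(uintptr_t v) { return (v & kStrongTag) != 0; }
inline uintptr_t strip_tag(uintptr_t v) { return v & ~kStrongTag; }

// Weak claim: returns true only for the thread that moves the link
// from unclaimed to an untagged self-loop value.
inline bool try_claim_weak(Node* n) {
  uintptr_t expected = 0;
  return n->link.compare_exchange_strong(expected,
                                         reinterpret_cast<uintptr_t>(n));
}

// Strong claim: returns true only for the thread that installs the
// strong tag, either on an unclaimed node or on top of a weak claim.
inline bool try_claim_strong(Node* n) {
  uintptr_t expected = 0;
  const uintptr_t self = reinterpret_cast<uintptr_t>(n);
  if (n->link.compare_exchange_strong(expected, self | kStrongTag)) {
    return true;
  }
  // 'expected' now holds the observed link value; retry while it is
  // still only weakly claimed.
  while (!is_strongly_claimed(expected)) {
    if (n->link.compare_exchange_strong(expected, expected | kStrongTag)) {
      return true;
    }
  }
  return false;  // another thread already holds the strong claim
}
```

In this sketch each function reports success to exactly one caller,
which is the property the "try_claim" naming is meant to convey.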

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.hpp
 494   bool test_set_oops_do_strongly_marked();
 495   bool test_set_oops_do_mark(bool strongly = false);

I found the naming and protocol here confusing. I'd prefer a
"try_claim" style that returns true if the claim attempt is
successful, similar to what we now (since JDK-8210119) do for
SubTasksDone and friends.

------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1ParScanThreadState.cpp
 131 bool G1ParScanThreadState::has_remembered_strong_nmethods() const {
 132   return _remembered_strong_nmethods != NULL && _remembered_strong_nmethods->length() > 0;
 133 }

Use !is_empty() rather than length() > 0.

------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1ParScanThreadState.cpp
 105   assert(_remembered_strong_nmethods == NULL || _remembered_strong_nmethods->is_empty(), "should be empty at this point.");

Use !has_remembered_strong_nmethods().

------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1CollectedHeap.cpp
3874   if (collector_state()->in_initial_mark_gc()) {
3875     remark_strong_nmethods(per_thread_states);
3876   }

I think this additional task and the associated pending strong nmethod
sets in the pss can be eliminated by using a 2-bit tag and a more
complex state machine earlier.

I think the additional task is acceptable at least for now, and this
could be examined as a followup. There's a tradeoff between the cost
of the additional task and the added complexity to remove it.

Below is some (obviously) untested pseudo-code (sort of pythonesque)
for what I have in mind.

The basic idea is that if thread A wants to strongly process an
nmethod while thread B is weakly processing it, thread A can mark the
nmethod as needing strong processing.
When thread B finishes the weak processing it notices the strong
request and performs the strong processing too.

Note that this code doesn't use NMETHOD_SENTINEL. The end of the
global list is indicated by having the last element have a self-looped
link value with appropriate tagging. That avoids both the sentinel
and tagged NULL values (which have their own potential problems).

States, encoded in the link member:
- unclaimed: NULL
- weak: tag 00
- weak done: tag 01
- weak, need strong: tag 10
- strong: tag 11

weak_processor(n):
  if n->link != NULL:
    # already claimed; nothing to do here.
    return
  elif not replace_if_null(tagged(n, 0), &n->link):
    # just claimed by another thread; nothing to do here.
    return
  # successfully claimed for weak processing.
  assert n->link == tagged(n, 0)
  do_weak_processing(n)
  # push onto global list. self-loop end of list to avoid tagged NULL.
  next = xchg(n, &_list_head)
  if next == NULL: next = n
  # try to install end of list + weak done tag.
  if cmpxchg(tagged(next, 1), &n->link, tagged(n, 0)) == tagged(n, 0):
    return
  # failed, which means some other thread added strong request.
  assert n->link == tagged(n, 2)
  # do deferred strong processing.
  n->link = tagged(next, 3)
  do_strong_processing(n)

strong_processor(n):
  if replace_if_null(tagged(n, 3), &n->link):
    # successfully claimed for strong processing.
    do_strong_processing(n)
    # push onto global list. self-loop end of list to avoid tagged NULL.
    next = xchg(n, &_list_head)
    if next == NULL: next = n
    n->link = tagged(next, 3)
    return
  # claim failed. figure out why and handle it.
  while true:
    raw_next = n->link
    next = strip_tag(raw_next)
    if raw_next - next >= 2:
      # already claimed for strong processing or requested for such.
      return
    elif cmpxchg(tagged(next, 2), &n->link, tagged(next, 0)) == tagged(next, 0):
      # added deferred strong request, so done.
      return
    elif cmpxchg(tagged(next, 3), &n->link, tagged(next, 1)) == tagged(next, 1):
      # claimed deferred strong request.
      do_strong_processing(n)
      return
    # concurrent changes interfered with us. try again.
    # number of retries is bounded and small, since the state
    # transitions are few and monotonic. (I think we cannot
    # reach here more than 2 times.)

------------------------------------------------------------------------------

From sangheon.kim at oracle.com Wed Oct 9 04:27:03 2019
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Tue, 8 Oct 2019 21:27:03 -0700
Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)
In-Reply-To:
References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com>
Message-ID:

Hi Kim,

On 10/7/19 11:10 AM, Kim Barrett wrote:
>> On Oct 1, 2019, at 12:43 PM, sangheon.kim at oracle.com wrote:
>> webrev:
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2/
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2.inc
>> Testing: hs-tier1 ~ 5 +-UseNUMA
> I like the direction of this. I think there are some additional simplifications possible
> around G1NUMA, which are discussed below.
>
> I still need to respond to your earlier individual responses. That will be in another email.
OK!

>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.cpp
> 67 LINUX_ONLY(if (UseNUMA) {
>
> Maybe instead use #ifdef LINUX. Either way, add a trailing comment at
> the end of the conditional block.
Changed to use #ifdef LINUX

>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1NUMA.cpp
> 79 // If we don't have preferred numa id, touch the given area with round-robin manner.
>
> This comment seems out of place / obsolete.
OK

>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1NUMA.cpp
> 138 uint region_index = G1CollectedHeap::heap()->addr_to_region(address);
>
> This requires the address be in the range reserved for the heap.
> That's okay; that's what we decided we want to do. But that should be
> part of the function's description, e.g. it should be mentioned as a
> precondition for preferred_index_for_address.
I think I addressed your point.

>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1NUMA.hpp
> 87 // Returns numa id of the given numa index.
> 88 inline int numa_id_of_index(uint numa_index) const;
>
> Unused function.
Removed.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1NUMA.hpp
> 83 inline uint index_of_numa_id(int numa_id) const;
>
> This function should be private. It is only needed in the
> implementation of index_of_current_thread and index_of_address.
> It should have a precondition that the argument is an active numa id,
> e.g. a definition something like
>
> uint G1NUMA::index_of_numa_id(int numa_id) const {
>   assert(numa_id >= 0, "invalid numa id %d", numa_id);
>   assert(numa_id < _len_numa_id_to_index_map, "invalid numa id %d", numa_id);
>   uint numa_index = _numa_id_to_index_map[numa_id];
>   assert(numa_index != G1MemoryNodeManager::InvalidNodeIndex,
>          "invalid numa id %d", numa_id);
>   return numa_index;
> }
>
> To make this work, index_of_address should also be changed, to
> something like:
>
> uint G1NUMA::index_of_address(HeapWord* address) const {
>   int numa_id = os::numa_get_address_id((uintptr_t)address);
>   if (numa_id == os::InvalidId) {
>     return G1MemoryNodeManager::InvalidNodeIndex;
>   } else {
>     return index_of_numa_id(numa_id);
>   }
> }
Changed as in your patch.
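
The id-to-index mapping discussed above can be illustrated with a small
standalone sketch. This is not the G1NUMA class: NumaIndexMap and its
member names are hypothetical, the constructor does the work the review
suggests folding into initialization, and preferred_index_for_region
assumes the simple even split described for webrev.2 (regions assigned
to nodes 0,1,2,3,0,1,... by region index), ignoring the page-size case.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical constants mirroring the names used in the review.
constexpr unsigned kInvalidNodeIndex = UINT_MAX;
constexpr int kInvalidNumaId = -1;

// Hypothetical stand-in for the numa-id-to-index mapping.
class NumaIndexMap {
public:
  // Builds the map once; after construction it is always initialized.
  explicit NumaIndexMap(const std::vector<int>& active_ids)
      : _num_active(static_cast<unsigned>(active_ids.size())) {
    int max_id = -1;
    for (int id : active_ids) {
      assert(id >= 0 && "numa ids must be non-negative");
      if (id > max_id) max_id = id;
    }
    _id_to_index.assign(static_cast<std::size_t>(max_id + 1), kInvalidNodeIndex);
    for (unsigned i = 0; i < _num_active; i++) {
      _id_to_index[static_cast<std::size_t>(active_ids[i])] = i;
    }
  }

  unsigned num_active_ids() const { return _num_active; }

  // Public entry point: tolerates the "unknown" id an OS query may
  // return and maps it to the invalid index.
  unsigned index_of_id_or_invalid(int numa_id) const {
    if (numa_id == kInvalidNumaId) return kInvalidNodeIndex;
    return index_of_numa_id(numa_id);
  }

  // Assumed even split: region i prefers node index i % num_active.
  unsigned preferred_index_for_region(unsigned region_index) const {
    return region_index % _num_active;
  }

private:
  // Precondition: numa_id is an active numa id.
  unsigned index_of_numa_id(int numa_id) const {
    assert(numa_id >= 0);
    assert(static_cast<std::size_t>(numa_id) < _id_to_index.size());
    unsigned index = _id_to_index[static_cast<std::size_t>(numa_id)];
    assert(index != kInvalidNodeIndex);
    return index;
  }

  unsigned _num_active;
  std::vector<unsigned> _id_to_index;
};
```

Keeping the strict lookup private and handling the invalid id in one
public wrapper is the split the review asks for; callers never need to
check for the sentinel themselves before calling.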
> ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1NUMA.cpp > 31 void G1NUMA::init_numa_id_to_index_map(const int* numa_ids, uint num_numa_ids) { > > This function is only called from one place, G1NUMA::initialize. The > code would be simpler and more clear if the body of this function were > just directly inlined into initialize and this function eliminated. > > And once that's done it becomes apparent that initialize could be > hoisted into the (moved out of line) constructor. > > This also lets num_active_numa_ids just be a trivial accessor function > in the header; there's no possibility of finding it uninitialized > after the constructor returns, so no need for the assert that it has > been set. Changed as your patch. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1NUMA.inline.hpp > 32 inline bool G1NUMA::is_valid_numa_id(int numa_id) { > > Only called by init_numa_to_index_map in a guarantee that would be > more obviously vacuous after the earlier suggested merge of that > function into initialize. Done. > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/os.hpp > 393 enum NumaIdState { > 394 InvalidId = -1, > 395 AnyId = -2 > 396 }; > > The type NumaIdState is unused. > The AnyId enumerator is unused. > > Suggest making InvalidId just a static const int in the class. Changed to static const int. > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/os.hpp > 398 static int numa_get_address_id(uintptr_t address); > > Why is the type of address uintptr_t rather than a pointer type? > > I see that the underlying Linux syscall (get_mempolicy) wants an > unsigned long, but that detail ought to be isolated to the Linux > implementation layer. Callers are going to want to pass in addresses > (pointers) and should not need to cast. 
That cast should happen at > the point where the syscall is being made. Changed to void*. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1Allocator.inline.hpp > 37 inline MutatorAllocRegion* G1Allocator::mutator_alloc_region(uint node_index) { > 38 assert(_g1h->mem_node_mgr()->is_valid_node_index(node_index), "Invariant, index %u", node_index); > 39 return &_mutator_alloc_regions[node_index]; > 40 } > > I think the assert here should be that node_index < _num_alloc_regions. > > is_valid_node_index gives a somewhat indirect (so weak) check of the > validity of the array access. > > Such a change would also eliminate one of the two callers of > is_valid_node_index, which I think can be eliminated (see next comment). Done. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/heapRegionManager.cpp > 126 HeapRegion* HeapRegionManager::allocate_free_region(HeapRegionType type, uint requested_node_index) { > ... > 131 if (mgr->num_active_nodes() > 1 && mgr->is_valid_node_index(requested_node_index)) { > > I think a better test here would be > if ((requested_node_index != G1MemoryNodeManager::AnyNodeIndex) && > (mgr->num_active_nodes() > 1)) { > > This eliminates one of two calls to is_valid_node_index (which I think > can be eliminated, see previous comment). And callers should not be > passing in actually invalid indices. I think there are asserts lower > down in the stack (in G1NUMA) to complain about such, but they > shouldn't be getting in here anyway. Done, but introduced G1MemoryNodeManager::has_multi_node() which is Thomas' comment. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp > 42 static const uint InvalidNodeIndex = UINT_MAX; > 43 static const uint AnyNodeIndex = InvalidNodeIndex - 1; > > These seem misplaced to me. Shouldn't they be in G1NUMA? 
Possibly > reexported here for convenience? (Assuming it actually is convenient.) Yes, for convenience. But G1NUMA is merged into G1MemoryNodeManager so no more argue here. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp > 42 static const uint InvalidNodeIndex = UINT_MAX; > > I think the only place this arises is as the result of > index_of_address when the numa id for the location isn't known. Which > suggests the name should be "UnknownNodeIndex" rather than > "InvalidNodeIndex". And the description of index_of_address should > mention that it can return that value (whatever its name ends up being.) Good idea. Changed to UnknownNodeIndex. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp > > I'm not sure G1MemoryNodeManager is useful. It seems to be just a thin > wrapper over the G1NUMA API, with a virtual dispatch between a > non-NUMA or single-node implementation and the multi-node > implementation that uses a G1NUMA that is only created for multi-node > support. The virtual dispatch can't be eliminated in most (all or > nearly all?) cases. > > But I think most of the single-node implementation would just fall out > as a 1-node boundary case for multi-node G1MemoryNodeManager / G1NUMA. > > So I think this might all be collapsed down to a G1NUMA that always > exists. If there are any places that require actual distinction, that > class can have a private member to select the appropriate behavior. > (Or maybe it's just the number of active nodes.) G1NUMA is merged to G1MemoryNodeManager. Previously G1MNM owned G1NUMA so I tried to keep this relation. Now G1MultiMemoryNodeManager has NUMA related implementations. Thomas also suggested merging these two. We discussed about virtual dispatch stuff, but I couldn't find anything better than now. More than welcome if you have any suggestion. 
Or keep for later enhancements. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1NUMA.inline.hpp > > I think that with the changes I've proposed above, I think there's not > much left in this file, and it might not be worth having it. Consider > moving any lingering remnents to the .hpp or .cpp file as appropriate. Removed g1NUMA.inline.hpp > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1NUMA.hpp > > Consider adding a page_size() accessor function (private for now) that > asserts the associated data member is > 0 (e.g. initialized), since it > is initialized after construction. Use that instead of direct uses of > the data member. Added page_size(). > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/arguments.cpp > 4108 // such as Parallel GC for Linux and Solaris or G1 GC for Linux will > ... > 4111 // Non NUMA-aware collectors such as CMS and Serial-GC on > 4112 // all platforms and ParallelGC on Windows will interleave all > > I think that these comments about which configurations do or don't > support NUMA are just a maintenance headache. I think it would be > better here to just say > > NUMA-aware collectors will interleave ... > Non NUMA-aware collectors will interleave ... > > And leave out mentions of configurations that may change (as is being > done here) or be removed (as soon expected for CMS). I just removed the mentions of configurations. // UseNUMAInterleaving is set to ON for all collectors and platforms when // UseNUMA is set to ON. NUMA-aware collectors will interleave old gen and // survivor spaces on top of NUMA allocation policy for the eden space. // Non NUMA-aware collectors will interleave all of the heap spaces across // NUMA nodes. Here's the major change list at the webrev.
Or arguable list :) 1) Verification at HRM::allocate_free_region() is removed and it will be added somewhere at safepoint by JDK-8220312 (3/3 which is part of this JEP). Probably at the end of young gc? 2) Node id printing is changed. Removed old one and added at HeapRegion::print_on() with new column. Node id is only printed when UseNUMA is enabled and gc+heap+region=trace. If there's single active node, it will print the node id and this is intentional. Another approach would be printing only if there are multiple nodes. 3) If AlwaysPreTouch is enabled, HeapRegion will have actual node index instead of preferred node index. 4) HeapRegion::_node_index is set at HRM::make_regions_available() as there is the only place initializing HeapRegion. Another approach would be setting the index at HeapRegion::initialize(we have to pollute HR with G1MNM stuff) or conditionally(*) setting the index at HeapRegion::node_index(). (*) if the index is unknown etc.. 5) G1NUMA class is merged into G1MemoryNodeManager. Webrev: http://cr.openjdk.java.net/~sangheki/8220310/webrev.3 http://cr.openjdk.java.net/~sangheki/8220310/webrev.3.inc Testing: hs-tier 1~5, with/without UseNUMA Thanks, Sangheon > > ------------------------------------------------------------------------------ > From zgu at redhat.com Wed Oct 9 11:47:59 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 9 Oct 2019 07:47:59 -0400 Subject: RFR 8232008: Shenandoah: C1 load barrier does not match interpreter version Message-ID: Bug: https://bugs.openjdk.java.net/browse/JDK-8232008 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232008/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) with x86_64 and x86-32 JVM on Linux. 
Thanks, -Zhengyu From thomas.schatzl at oracle.com Wed Oct 9 14:10:13 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 9 Oct 2019 16:10:13 +0200 Subject: G1 patch of elastic Java heap In-Reply-To: References: <6270ce59-4a8e-431e-9ccf-f6d2c0f927eb.maoliang.ml@alibaba-inc.com> <1267a5dd2cf6cc1d03df64d07a06ba0f45195951.camel@oracle.com> <3140197d-8cab-4a86-af92-58431c74cb6b.maoliang.ml@alibaba-inc.com> Message-ID: Hi, sorry for the late reply. First, I have a more general question: lots of changes deal with providing options to separately change properties of the generations at runtime, as if there were separate pools of young and old gen memory. G1 is kind of built upon the idea that you pass a pause time goal and it then modifies generation sizes and takes memory for the generations from a single memory pool as needed. To me this indicates that automatic sizing is not working correctly, but there are many(?) use cases where it does not work as expected. This requires manual tuning in generation sizes for whatever reason. Can you share your thoughts about this? There seems to be some bit of information missing to me - this is probably the reason for some of the dumb questions about the flags, and me being not too fond of them. On 26.09.19 08:49, Liang Mao wrote: > > Hi All, > > Here is the user guide of G1ElasticHeap patch. Hope it will help to > understand. > > G1ElasticHeap > G1ElasticHeap is a GC feature to return memory of Java heap to OS to reduce the > > memory footprint of Java process. To enable this feature, you need to use G1 GC > > by options: -XX:+UseG1GC -XX:+G1ElasticHeap. > > ## Usage > There are 3 modes which can be enabled in G1ElasticHeap.
> ### 1. Periodic uncommit > Memory will be uncommitted by periodic GC. To enable periodic uncommit, use option > > -XX:+ElasticHeapPeriodicUncommit or dynamically enable the option via jinfo: > > `jinfo -flag +ElasticHeapPeriodicUncommit PID` As far as I can tell, this setting periodically scans the heap for (too many?) uncommitted regions and, well, uncommits them. Not completely sure if that is better than doing periodic gcs - as we do not expect to gain memory outside of a GC; in JDK12+ (I think) G1 always uncommits at the remark pause which should give most of the benefits. There *may* be reason to also try to uncommit after the last mixed GC, but not sure if uncommit is that urgent - to some degree the existing JEP 346: Promptly return unused committed memory from G1 (https://openjdk.java.net/jeps/346) should cover some of the use cases. I.e. after some delay (and inactivity) there will be another Remark pause anyway. The main reason why Remark has been chosen to uncommit memory is because we assume that the heap size at Remark (this is what adaptive IHOP shoots for) is the "target heap size". > Related options: > >> ElasticHeapPeriodicYGCIntervalMillis, 15000 \ > (target young GC interval 15 seconds in default) \ > (eg, if Java runs with MaxNewSize=4g, young GC every 30 seconds, G1ElasticHeap will keep 15s > GC interval and make a max 2g young generation to uncommit 2g memory) > >> ElasticHeapPeriodicInitialMarkIntervalMillis, 3600000 \ > (Target initial mark interval, 1 hour in default. Unused memory of old generation will be uncommitted > after last mixed GC.) This seems to implement an unconditional concurrent cycle like with the CMSTriggerInterval flag for CMS. Maybe there is a more clever alternative on triggering concurrent cycles like ZGC does based on the ratio between time spent by the mutator and the gc.
> >> ElasticHeapPeriodicUncommitStartupDelay, 300 \ > (Delay after startup to do memory uncommit, 300 seconds in default) > >> ElasticHeapPeriodicMinYoungCommitPercent, 50 \ > (Percentage of young generation to keep, default 50% of the young generation will not be uncommitted) See above about separating young/old. > > ### 2. Generation limit > To limit the young/old generation separately. Use jcmd or MXBean to enable. I do not understand the reason for those, see above. [...] > > ### 3. Softmx mode > Dynamically to limit the heap as a percentage of origin Xmx. > > Use jcmd: > > `jcmd PID ElasticHeap softmx_percent=60` > > Use MXBean: > > `elasticHeapMXBean.setSoftmxPercent(70);` That one sounds good, and actually there is a flag SoftMaxHeapSize already in the VM. Only ZGC implements it though. I think this idea matches the specifications in https://bugs.openjdk.java.net/browse/JDK-8222145 (i.e. as far as I can tell, the softmxpercent is a "soft"/target heap size), so I think this could be implemented under the SoftMaxHeapSize flag. SoftMaxHeapSize is already manageable too, so could be modified already. Only the implementation is missing in G1 :) > > ### Other G1ElasticHeap advanced options: >> ElasticHeapMinYoungCommitPercent, 10 \ > (Mininum percentage of young generation) > >> ElasticHeapYGCIntervalMinMillis, 5000 \ > (Mininum young GC interval) > >> ElasticHeapInitialMarkIntervalMinMillis, 60000 \ > (Mininum initial mark interval) > >> ElasticHeapEagerMixedGCIntervalMillis, 15000 \ > (Guaranteed mixed GC interval, to make sure the mixed will happen in time to uncommit memory after last mixed GC) These options seem to be mostly useful for when the allocation rate of the mutator is not high enough to advance the collection cycle. Would that feature provide the requested feature? Maybe it needs some minor improvement, but to me it seems very burdensome to specify so many options...
> >> ElasticHeapOldGenReservePercent, 5 \ > (To keep a mininum percentage of Xmx for old generation in the uncommitment after last mixed GC) That seems to be related to some strict separation of young/old again. > >> ElasticHeapPeriodicYGCIntervalCeilingPercent, 25 \ > ElasticHeapPeriodicYGCIntervalFloorPercent, 25 \ > (The actual young GC interval will fluctuate between \ > ElasticHeapPeriodicYGCIntervalMillis * (100 - ElasticHeapPeriodicYGCIntervalFloorPercent) / 100 and \ > ElasticHeapPeriodicYGCIntervalMillis * (100 + ElasticHeapPeriodicYGCIntervalCeilingPercent) / 100 ) > Thanks, Thomas From shade at redhat.com Wed Oct 9 14:15:06 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 9 Oct 2019 16:15:06 +0200 Subject: RFR (S) 8232051: Epsilon should warn about Xms/Xmx/AlwaysPreTouch configuration Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8232051 This is arguably the UX bug: users expect low latency, but may not be aware that additional configuration is needed for GCs to perform well in those conditions. Epsilon already enables LSM, and should warn about Xms/Xmx/AlwaysPreTouch config too. It cannot adjust these settings, though, because it would affect startup time -- users would have to opt-in. Fix: https://cr.openjdk.java.net/~shade/8232051/webrev.01/ Testing: Linux x86_64 {fastdebug, release} gc/epsilon; jdk-submit (running) -- Thanks, -Aleksey From thomas.schatzl at oracle.com Wed Oct 9 20:32:58 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 09 Oct 2019 22:32:58 +0200 Subject: G1 patch of elastic Java heap In-Reply-To: References: <6270ce59-4a8e-431e-9ccf-f6d2c0f927eb.maoliang.ml@alibaba-inc.com> <1267a5dd2cf6cc1d03df64d07a06ba0f45195951.camel@oracle.com> <3140197d-8cab-4a86-af92-58431c74cb6b.maoliang.ml@alibaba-inc.com> Message-ID: <632f30814d5028cbc957d3ba2c04537aaa21bd41.camel@oracle.com> Hi, On Wed, 2019-10-09 at 16:10 +0200, Thomas Schatzl wrote: > Hi, > > sorry for the late reply.
> > First, I have a more general question: lots of changes deal with > providing options to separately change properties generations at > runtime. Like if there were separate pools of young and old gen > memory. > > G1 is kind of built upon the idea that you pass a pause time goal > and then modifies generation sizes and takes memory for the > generations from a single memory pool as needed. > > To me this indicates that automatic sizing is not working correctly, > but there are many(?) use cases where it does not work as expected. > This requires manual tuning in generation sizes for whatever reason. > > Can you share your thoughts about this? There seems to be some bit > of information missing to me - this is probably the reason for some > of the dumb questions about the flags, and me being not too fond of > them. > > On 26.09.19 08:49, Liang Mao wrote: > > > > Hi All, > > > > Here is the user guide of G1ElasticHeap patch. Hope it will help > > to > > understand. > > > > G1ElasticHeap > > G1ElasticHeap is a GC feature to return memory of Java heap to OS t > > o reduce the > > > > memory footprint of Java process. To enable this feature, you need > > to use G1 GC > > > > by options: -XX:+UseG1GC -XX:+G1ElasticHeap. > > > > ## Usage > > There are 3 modes which can be enabled in G1ElasticHeap. > > ### 1. Periodic uncommit > > Memory will be uncommitted by periodic GC. To enable periodic uncom > > mit, use option > > > > -XX:+ElasticHeapPeriodicUncommit or dynamically enable the option v > > ia jinfo: > > > > `jinfo -flag +ElasticHeapPeriodicUncommit PID` > > As far as I can tell, this setting periodically scans the heap for > (too many?) uncommitted regions and, well, uncommits them. > > Not completely sure if that is better than doing periodic gcs - as we > do not expect to gain memory outside of a GC; in JDK12+ (I think) G1 > alwasy uncommits at the remark pause which should give most of the > benefits. 
> > There *may* be reason to also try to uncommit after the last mixed > GC, but not sure if uncommit is that urgent - to some degree the > existing JEP 346: Promptly return unused committed memory from G1 > (https://openjdk.java.net/jeps/346) should cover some of the use > cases. > I.e. after some delay (and inactivity) there will be another Remark > pause anyway. > > The main reason why Remark has been chosen to uncommit memory is > because we assume that the heap size at Remark (this is what adaptive > IHOP shoots for) is the "target heap size". > > > > Related options: > > > > > ElasticHeapPeriodicYGCIntervalMillis, 15000 \ > > > > (target young GC interval 15 seconds in default) \ > > (eg, if Java runs with MaxNewSize=4g, young GC every 30 seconds, G1 > > ElasticHeap will keep 15s > > GC interval and make a max 2g young generation to uncommit 2g mem > > ory) > > > > > ElasticHeapPeriodicInitialMarkIntervalMillis, 3600000 \ > > > > (Target initial mark interval, 1 hour in default. Unused memory of > > old generation will be uncommitted > > after last mixed GC.) > > This sesm to implement an unconditional concurrent cycle like with > the CMSTriggerInterval flag for CMS. > > Maybe there is a more clever alternative on triggering concurrent > cycles like ZGC does based on the ratio between time spent by the > mutator and the gc. > > > > > > ElasticHeapPeriodicUncommitStartupDelay, 300 \ > > > > (Delay after startup to do memory uncommit, 300 seconds in default) > > > > > ElasticHeapPeriodicMinYoungCommitPercent, 50 \ > > > > (Percentage of young generation to keep, default 50% of the young g > > eneration will not be uncommitted) > > See above about separating young/old. > > > > > ### 2. Generation limit > > To limit the young/old generation separately. Use jcmd or MXBean to > > enable. > > I do not understand the reason for those, see above. > > [...] > > > > ### 3. Softmx mode > > Dynamically to limit the heap as a percentage of origin Xmx. 
> > > > Use jcmd: > > > > `jcmd PID ElasticHeap softmx_percent=60` > > > > Use MXBean: > > > > `elasticHeapMXBean.setSoftmxPercent(70);` > > That one sounds good, and actually there is a flag SoftMaxHeapSize > already in the VM. Only ZGC implements it though. > > I think this idea matches the specifications in > https://bugs.openjdk.java.net/browse/JDK-8222145 (i.e. as far as I > can tell, the softmxpercent is a "soft"/target heap size), so I think > this could be implemented under the SoftMaxHeapSize flag. > > SoftMaxHeapSize is already manageable too, so could be modified > already. Only the implementation is missing in G1 :) > > > > > ### Other G1ElasticHeap advanced options: > > > ElasticHeapMinYoungCommitPercent, 10 \ > > > > (Mininum percentage of young generation) > > > > > ElasticHeapYGCIntervalMinMillis, 5000 \ > > > > (Mininum young GC interval) > > > > > ElasticHeapInitialMarkIntervalMinMillis, 60000 \ > > > > (Mininum initial mark interval) > > > > > ElasticHeapEagerMixedGCIntervalMillis, 15000 \ > > > > (Guaranteed mixed GC interval, to make sure the mixed will happen i > > n time to uncommit memory after last mixed GC) > > These options seem to be mostly useful for when the allocation rate > of the mutator is not high enough to advance the collection cycle. > > Would the feature provide the requested feature? Maybe it needs > some minor improvement, but to me it seems very burdensome to specify > so many options... > The first sentence got mangled somewhere: A guaranteed concurrent cycle and/or the existing "Promptly return unused memory" feature would imho implicitly provide "guaranteed" advancement in the garbage collection cycle. Starting a particular kind of collection seems to be almost only useful for debugging; also while in jdk11+ triggering a mixed gc is still possible at any time, it may not yield the expected benefit as G1 does not maintain remembered sets all the time - i.e. 
most of the time there are no old regions with remembered sets around. Maybe the "Promptly return unused memory" feature could be adapted a bit in cases when there is "some but still not significant" activity to not trigger at all to cover such cases. Thanks, Thomas From per.liden at oracle.com Wed Oct 9 21:04:57 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 9 Oct 2019 23:04:57 +0200 Subject: RFR: 8232070: ZGC: Remove unused ZVerifyLoadBarriers Message-ID: <1df387c3-bae2-45b3-7930-1baf56dea03c@oracle.com> After JDK-8230565, we left the develop flag ZVerifyLoadBarriers around, which is no longer used and can be removed. Bug: https://bugs.openjdk.java.net/browse/JDK-8232070 Webrev: http://cr.openjdk.java.net/~pliden/8232070/webrev.0 /Per From kim.barrett at oracle.com Wed Oct 9 21:23:16 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 9 Oct 2019 17:23:16 -0400 Subject: RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1 In-Reply-To: References: <0F637570-EC97-47C5-B493-B33681133149@oracle.com> <5c6b06b1-de44-3cb7-7fc8-0b641df5f353@oracle.com> Message-ID: <80DA3FD5-C2FA-44BF-83C5-AE0EA6AA3684@oracle.com> > On Oct 8, 2019, at 7:48 PM, Kim Barrett wrote: > src/hotspot/share/gc/g1/g1CollectedHeap.cpp > 3874 if (collector_state()->in_initial_mark_gc()) { > 3875 remark_strong_nmethods(per_thread_states); > 3876 } > > I think this additional task and the associated pending strong nmethod > sets in the pss can be eliminated by using a 2-bit tag and a more > complex state machine earlier. I thought about this some more and have some improvements to the previous pseudo-code, including eliminating the loop in strong_processor. More careful consideration of the possible states showed them to be more limited than I'd previously thought they were. I hadn't noticed the benefit from delaying weak_processor's push onto the global list and combining it with the transition to the "weak done" state. 
States, encoded in the link member of nmethod N: - unclaimed: NULL - weak: N, tag 00 - weak done: NEXT, tag 01 - weak, need strong: N, tag 10 - strong: NEXT, tag 11 where NEXT is the next nmethod in the global list, or N if it is the last entry, e.g. self-loop indicates end of list. weak_processor(n): if n->link != NULL: # already claimed; nothing to do here. return elif not replace_if_null(tagged(n, 0), &n->link): # just claimed by another thread; nothing to do here. return # successfully claimed for weak processing. assert n->link == tagged(n, 0) do_weak_processing(n) # push onto global list. self-loop end of list to avoid tagged NULL. # not pushing onto global list until ready to mark weak processing # done significantly simplifies the set of states. next = xchg(n, &_list_head) if next == NULL: next = n # try to install end of list + weak done tag. if cmpxchg(tagged(next, 1), &n->link, tagged(n, 0)) == tagged(n, 0): return # failed, which means some other thread added strong request. assert n->link == tagged(n, 2) # do deferred strong processing. n->link = tagged(next, 3) do_strong_processing(n) strong_processor(n): raw_next = cmpxchg(tagged(n, 3), &n->link, NULL) if raw_next == NULL: # successfully claimed for strong processing. do_strong_processing(n) # push onto global list. self-loop end of list to avoid tagged NULL. next = xchg(n, &_list_head) if next == NULL: next = n n->link = tagged(next, 3) return # claim failed. figure out why and handle it. next = strip_tag(raw_next) if raw_next == next: # (raw_next - next) == 0 # claim failed because being weak processed (state == "weak"). # try to request deferred strong processing. assert next == tagged(n, 0) raw_next = cmpxchg(tagged(n, 2), &n->link, next) if (raw_next == next): # successfully requested deferred strong processing. return # failed because of a concurrent transition. # no longer in "weak" state. 
next = strip_tag(raw_next) if (raw_next - next) >= 2: # already claimed for strong processing or requested for such. return # weak processing is complete. # raw_next: tag == 1, NEXT == next list entry or N if cmpxchg(tagged(NEXT, 3), &N->link, raw_next) == raw_next: # claimed "weak done" to "strong". do_strong_processing(N) # if claim failed then some other thread got it. From stefan.johansson at oracle.com Wed Oct 9 21:40:37 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 9 Oct 2019 23:40:37 +0200 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> Message-ID: <93592401-FC69-4B7F-95BE-DE9A0F070F3A@oracle.com> Hi Sangheon, Thanks again for a much improved version. Some comments below. > 9 okt. 2019 kl. 06:27 skrev sangheon.kim at oracle.com: > > ... > > Here's the major change list at the webrev. Or arguable list :) > 1) Verification at HRM::allocate_free_region() is removed and it will be added somewhere at safepoint by JDK-8220312 (3/3 which is part of this JEP). Probably at the end of young gc? > 2) Node id printing is changed. Removed old one and added at HeapRegion::print_on() with new column. Node id is only printed when UseNUMA is enabled and gc+heap+region=trace. If there's single active node, it will print the node id and this is intentional. Another approach would be printing only if there are multiple nodes. > 3) If AlwaysPreTouch is enabled, HeapRegion will have actual node index instead of preferred node index. > 4) HeapRegion::_node_index is set at HRM::make_regions_available() as there is the only place initializing HeapRegion. 
Another approach would be setting the index at HeapRegion::initialize(we have to pollute HR with G1MNM stuff) or conditionally(*) setting the index at HeapRegion::node_index(). (*) if the index is unknown etc.. > 5) G1NUMA class is merged into G1MemoryNodeManager. I saw your comment above about suggestions around this area and I can try out one thought I had, something I think Thomas mentioned as well: making the non-NUMA case work exactly like the NUMA case with one node. I'll need some more time for that, but below are my comments on the current patch. > > Webrev: > http://cr.openjdk.java.net/~sangheki/8220310/webrev.3 > http://cr.openjdk.java.net/~sangheki/8220310/webrev.3.inc src/hotspot/os/linux/os_linux.cpp --- 3026 warning("Failed to get numa id at " PTR_FORMAT " with errno=%d", p2i((void*)address), errno); The cast here is no longer needed. --- src/hotspot/share/gc/g1/g1Allocator.hpp --- 44 G1MemoryNodeManager* _mnm; I would prefer a more descriptive name like _memory_node_manager. --- src/hotspot/share/gc/g1/g1CollectedHeap.hpp --- 196 // Manages single or multi node memory. 197 G1MemoryNodeManager* _mem_node_mgr; ... 558 G1MemoryNodeManager* mem_node_mgr() const { return _mem_node_mgr; } As above, I would prefer spelling out the names to memory_node_manager(). --- src/hotspot/share/gc/g1/g1_globals.hpp --- Last line still removed a '\', please revert this change. --- src/hotspot/share/gc/g1/heapRegion.cpp --- 462 if (UseNUMA) { 463 const int* node_ids = G1MemoryNodeManager::mgr()->node_ids(); 464 st->print("|Node ID %02d", node_ids[this->node_index()]); 465 } 466 st->print_cr(""); I would prefer having a function that returns the node id given the index. Like the inverse of index_of_node_id(). I also think it would be more informative to say "NUMA id" or "NUMA node". --- src/hotspot/share/gc/g1/heapRegionManager.cpp --- 195 // Set node index of the given HeapRegion. 196 // If AlwaysPreTouch is enabled, set with actual node index.
197 // If it is disabled, set with preferred node index which is already decided. 198 static void set_heapregion_node_index(HeapRegion* hr) { 199 uint node_index; 200 if(AlwaysPreTouch) { 201 // If we already pretouched, we can check actual node index here. 202 node_index = G1MemoryNodeManager::mgr()->index_of_address(hr->bottom()); 203 } else { 204 node_index = G1MemoryNodeManager::mgr()->preferred_node_index_for_index(hr->hrm_index()); 205 } 206 hr->set_node_index(node_index); 207 } I would prefer to have a helper for calculating the index to set not a helper for setting the index. If you agree, you could move this logic to G1MemoryNodeManager::index_for_region() and then you can change: 233 // Set node index of the heap region after initialization but before inserting 234 // to free list. 235 set_heapregion_node_index(hr); To just: 235 hr->set_node_index(G1MemoryNodeManager::mgr()->index_for_region(hr)); --- 309 bool HeapRegionManager::is_on_preferred_index(uint region_index, uint preferred_node_index) { 310 uint region_node_index = G1MemoryNodeManager::mgr()->preferred_node_index_for_index(region_index); 311 return region_node_index == preferred_node_index; 312 } Indentation on row 311. --- src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp --- 44 static G1MemoryNodeManager* mgr() { return _inst; } I think we should change the name of this getter to manager(), to avoid unnecessary shortenings. --- 57 virtual bool has_multi_nodes() const { return false; } Same as above I would prefer has_multiple_nodes() --- Thanks, Stefan > Testing: hs-tier 1~5, with/without UseNUMA > > Thanks, > Sangheon From sangheon.kim at oracle.com Wed Oct 9 21:41:33 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Wed, 9 Oct 2019 14:41:33 -0700 Subject: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps. In-Reply-To: References: Message-ID: Hi Kishor, On 10/4/19 4:15 PM, Kharbas, Kishor wrote: > > Hi Stefan, > > Thanks for the review.
> Some comments inline.
>
> New webrev:
> http://cr.openjdk.java.net/~kkharbas/8215893/webrev.00_to_01/
> http://cr.openjdk.java.net/~kkharbas/8215893/webrev.01/

I am reviewing the patch but have a question on top of Stefan's question[1].
Why are the bitmap mappers committed? I think all the trouble started from 'committing but treating as special' here. Couldn't we just treat the bitmap mappers as 'special' without committing?
If 'not committing' is doable, couldn't we simply create the ReservedSpace with 'special' enabled (independent of the large page setting, which is the same as Stefan's comment)? Or add PinnedResevedSpace to force 'special' enabled.

[1]: Another thing, can you remind me why we need the bitmaps to be pinned but not other structures such as the card table?

+HeterogeneousHeapRegionManager::initialize()
...
+ // We commit bitmap for all regions during initialization and mark the bitmap space as special.
+ // This allows regions to be un-committed while concurrent-marking threads are accesing the bitmap concurrently.

Thanks,
Sangheon

> > > Hi Kishor,
> > >
> > > On 04.10.19 03:00, Kharbas, Kishor wrote:
> > >> Hi,
> > >> When I worked on JDK-8211425, there was a request for better abstraction for pinning G1's CM bitmaps. RFE for the request is here - JDK-8215893.
> > >>
> > >> Here is a proposal: http://cr.openjdk.java.net/~kkharbas/8215893/webrev.00/
> > >>
> > >> Here G1PageBasedVirtualSpace pins the entire reserved memory to memory during construction. The constructor takes an additional bool flag which says "does it need to pin the memory".
> > >> If the memory is pinned, '_special' flag is set to true. I piggy back on _special flag's behavior which is to not do actual OS (un-)commits on calls to (un)commit().
> > >> Rest of the changes is the mechanism to pass this flag from CM bitmaps creation in G1CollectedHeap all the way to G1PageBasedVirtualSpace.
> > >>
> > >> Let me know if this is a good abstraction and if there is any better way.
> > >>
> > >> Thanks
> > >> Kishor
> > >
> > > Some comments:
> > >
> > > - in the parameter lists, if the parameters are already laid out
> > > line-by-line, if adding a new one, please put it on a new line as well.
>
> Fixed in the new webrev.
>
> > > - this code
> > >
> > >    if (_special) {
> > >      if (!rs.special()) {
> > >        commit_internal(addr_to_page_index(_low_boundary),
> > >                        addr_to_page_index(_high_boundary));
> > >      }
> > >
> > > in g1PageBasedVirtualSpace looks very incomprehensible. :)
> > >
> > > I would prefer (pending the second reviewer's comment) to either use the
> > > "pinned" flag here, or even better, move the necessary commit calls into
> > > the (now removed) HeterogeneousHeapRegionManager::initialize().
>
> Made it a little more comprehensible. Will see what other reviewers think about moving it somewhere else.
>
> > > - I would just purely from feeling prefer if the "pinned" flag parameter
> > > would be listed after the "type" parameter in the G1RegionToSpaceMapper.
> > > But that's probably just me.
>
> I did it this way to logically group the parameters. MemTracker is a tracker used by the VM everywhere and does not pertain to this class as such, so I kept it at the end.
>
> > > Also, finally one parameter per line for the declaration/definition of
> > > the constructor would improve readability.
>
> Done.
>
> Thank you,
> Kishor
>
> > > Thanks,
> > >   Thomas

From per.liden at oracle.com Thu Oct 10 08:55:45 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 10 Oct 2019 10:55:45 +0200
Subject: RFR: 8231943: ZGC: Enable serviceability/dcmd/gc/RunGCTest
In-Reply-To:
References:
Message-ID: <0a2eee49-9bb4-a1be-f8fc-b2efcc01fd59@oracle.com>

(CC:ing serviceability-dev)

On 10/7/19 2:38 PM, Per Liden wrote:
> This test is currently disabled for ZGC, but it can easily be enabled by
> adjusting the expected log string.
> ZGC doesn't print "Pause Full", but
> it still prints the "(Diagnostic Command)" part.
>
> Also, the test enables gc=debug logging, which is unnecessary since this
> is always printed on the gc=info level.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8231943
> Webrev: http://cr.openjdk.java.net/~pliden/8231943/webrev.0
>
> Testing: Manually ran test with all GCs (except Epsilon)
>
> /Per

From per.liden at oracle.com Thu Oct 10 09:04:13 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 10 Oct 2019 11:04:13 +0200
Subject: RFR: 8231996: ZGC: Replace ZStatisticsForceTrace with check if JFR event is enabled
Message-ID:

Remove and replace the diagnostic flag ZStatisticsForceTrace with a check if JFR event is enabled. This flag was introduced as a safety measure back when sending JFR events was problematic in some contexts. This is no longer the case, so we can just let the default.jfc/profile.jfc control when those events should be sent.

Bug: https://bugs.openjdk.java.net/browse/JDK-8231996
Webrev: http://cr.openjdk.java.net/~pliden/8231996/webrev.0

/Per

From thomas.schatzl at oracle.com Thu Oct 10 09:29:45 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 10 Oct 2019 11:29:45 +0200
Subject: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps.
In-Reply-To:
References:
Message-ID:

Hi,

On 09.10.19 23:41, sangheon.kim at oracle.com wrote:
> Hi Kishor,
>
> On 10/4/19 4:15 PM, Kharbas, Kishor wrote:
>>
>> Hi Stefan,
>>
>> Thanks for the review. Some comments inline.
>>
>> New webrev:
>> http://cr.openjdk.java.net/~kkharbas/8215893/webrev.00_to_01/
>> http://cr.openjdk.java.net/~kkharbas/8215893/webrev.01/
>>
> I am reviewing the patch but have a question on top of Stefan's question[1].
> Why are the bitmap mappers committed? I think all the trouble started from 'committing but treating as special' here. Couldn't we just treat the bitmap mappers as 'special' without committing?
> If 'not committing' is doable, couldn't we simply create the ReservedSpace with 'special' enabled (independent of the large page setting, which is the same as Stefan's comment)? Or add PinnedResevedSpace to force 'special' enabled.
>
> [1]: Another thing, can you remind me why we need the bitmaps to be
> pinned but not other structures such as the card table?
>
> +HeterogeneousHeapRegionManager::initialize() ...
>
> + // We commit bitmap for all regions during initialization and mark the bitmap space as special.
> + // This allows regions to be un-committed while concurrent-marking threads are accesing the bitmap concurrently.

What is the situation where G1 would uncommit parts of the heap while concurrent marking is running? Stale entries in the mark task queues?

Regular G1 limits uncommitting of regions (and associated data structures) to after concurrent marking.

Note that if never releasing mark bitmaps is really necessary, then never releasing the card/offset table is probably required as well.

Thanks,
  Thomas

From thomas.schatzl at oracle.com Thu Oct 10 09:48:24 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 10 Oct 2019 11:48:24 +0200
Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)
In-Reply-To: <93592401-FC69-4B7F-95BE-DE9A0F070F3A@oracle.com>
References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <93592401-FC69-4B7F-95BE-DE9A0F070F3A@oracle.com>
Message-ID:

Hi,

On 09.10.19 23:40, Stefan Johansson wrote:
> Hi Sangheon,
>
> Thanks again for a much improved version. Some comments below.

Agree, it looks quite nice now.

[...]
>
>> Webrev:
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.3
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.3.inc
[...]
>
> src/hotspot/share/gc/g1/heapRegion.cpp
> ---
>  462   if (UseNUMA) {
>  463     const int* node_ids = G1MemoryNodeManager::mgr()->node_ids();
>  464     st->print("|Node ID %02d", node_ids[this->node_index()]);
>  465   }
>  466   st->print_cr("");
>
> I would prefer having a function that returns the node id given the index. Like the inverse of index_of_node_id().
>
> I also think it would be more informative to say "NUMA id" or "NUMA node".

I would also remove the "Node ID" string here as it does not convey any information. Most other columns also do not carry their description.

Thanks,
  Thomas

From shade at redhat.com Thu Oct 10 10:53:56 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 10 Oct 2019 12:53:56 +0200
Subject: RFR (S) 8231947: Shenandoah: cleanup ShenandoahHumongousMoves flag treatment
Message-ID:

RFE:
  https://bugs.openjdk.java.net/browse/JDK-8231947

Fix:
  https://cr.openjdk.java.net/~shade/8231947/webrev.02/

This was enabled for a while now. Flag is changed to diagnostic, comment updated, the accessors renamed to make more sense.

Testing: hotspot_gc_shenandoah, new test

--
Thanks,
-Aleksey

From leihouyju at gmail.com Thu Oct 10 11:06:15 2019
From: leihouyju at gmail.com (Haoyu Li)
Date: Thu, 10 Oct 2019 19:06:15 +0800
Subject: [PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance
In-Reply-To:
References: <4F02DD53-EA98-4A1A-B871-C6E9D9610B2C@oracle.com> <9B69AFD1-7AE2-4B50-BFCF-C9C6E2594240@oracle.com>
Message-ID:

Hi Stefan,

Thanks for your testing! One possible reason for the regressions in simple tests is that the region dependencies may not be heavy enough. Because the locality of shadow regions is lower than that of heap regions, writing to shadow regions will be slower than to normal regions, and this is a part of the reason why I reuse shadow regions. Therefore, if only a few shadow regions are created and not reused, the overhead may not be amortized.

As to the OCA, it is the case that I'm the only person signing the agreement.
Please let me know if you have any further questions. Thanks again!

Best Regards,
Haoyu Li

Stefan Johansson wrote on 2019-10-08, 6:49:

> Hi Haoyu,
>
> I've done some more testing and I haven't seen any issues with the patch so far and the performance looks promising in most cases. For simple tests I've seen some regressions, but I'm not really sure why. Will do some more digging.
>
> To move forward with this the first thing we need to do is making sure that you being covered by the Oracle Contributor Agreement is enough. From what we can see it is only you as an individual that has signed the OCA and in that case it is important that this statement from the OCA is fulfilled: "no other person or entity, including my employer, has or will have rights with respect my contributions"
>
> Is this the case for this contribution or should we have the university sign the OCA as well? For more information regarding the OCA please refer to:
> https://www.oracle.com/technetwork/oca-faq-405384.pdf
>
> Thanks,
> Stefan
>
> On 2019-09-16 16:02, Haoyu Li wrote:
> > FYI, the evaluation results on OpenJDK 14 are plotted in the attachment. I compute the full GC throughput by dividing the heap size before full GC by the GC pause time, and the results are arithmetic mean values of ten runs after a warm-up run. The evaluation is conducted on a machine with dual Intel Xeon E5-2618L v3 CPUs (2 sockets, 16 physical cores with SMT enabled) and 64G DRAM.
> >
> > Best Regards,
> > Haoyu Li,
> > Institute of Parallel and Distributed Systems (IPADS),
> > School of Software,
> > Shanghai Jiao Tong University
> >
> > Stefan Johansson wrote on 2019-09-12, 5:34:
> >
> >     Hi Haoyu,
> >
> >     I recently came across your patch and I would like to pick up on some of the things Kim mentioned in his mails. I especially want to evaluate and investigate if this is a technique we can use to improve the other GCs as well. To start that work I want to take the patch for a spin in our internal performance testing. The patch doesn't apply clean to the latest JDK repository, so if you could provide an updated patch that would be very helpful.
> >
> >     It would also be great if you could share some more information around the results presented in the paper. For example, it would be good to get the full command lines for the different benchmarks so we can run them locally and reproduce the results you've seen.
> >
> >     Thanks,
> >     Stefan
> >
> >>     On 12 March 2019, at 03:21, Haoyu Li wrote:
> >>
> >>     Hi Kim,
> >>
> >>     Thanks for reviewing and testing the patch. If there are any failures or performance degradation relevant to the work, please let me know and I'll be very happy to keep improving it. Also, any suggestions about code improvements are well appreciated.
> >>
> >>     I'm not quite sure if both G1 and Shenandoah have the similar region dependency issue, since I haven't studied their GC behaviors before. If they have, I'm also willing to propose a more general optimization.
> >>
> >>     As to the memory overhead, I believe it will be low because this patch exploits empty regions in the young space rather than off-heap memory to allocate shadow regions, and also reuses the _source_region field of each RegionData to record the corresponding shadow region index. We only introduce a new integer field _shadow in the RegionData class to indicate the status of a region, a global GrowableArray _free_shadow to store the indices of shadow regions, and a global Monitor to protect the array. This information might help if the memory overhead needs to be evaluated.
> >>
> >>     Looking forward to your insight.
> >>
> >>     Best Regards,
> >>     Haoyu Li,
> >>     Institute of Parallel and Distributed Systems (IPADS),
> >>     School of Software,
> >>     Shanghai Jiao Tong University
> >>
> >>     Kim Barrett wrote on 2019-03-12, 6:11:
> >>
> >>         > On Mar 11, 2019, at 1:45 AM, Kim Barrett wrote:
> >>         >
> >>         >> On Jan 24, 2019, at 3:58 AM, Haoyu Li wrote:
> >>         >>
> >>         >> Hi Kim,
> >>         >>
> >>         >> I have ported my patch to OpenJDK 13 according to your instructions in your last mail, and the patch is attached in this mail. The patch does not change much since PSGC is indeed pretty stable.
> >>         >>
> >>         >> Also, I evaluate the correctness and performance of PS full GC with benchmarks from DaCapo, SPECjvm2008, and JOlden suites on a machine with dual Intel Xeon E5-2618L v3 CPUs (16 physical cores), 64G DRAM and linux kernel 4.17. The evaluation result, indicating 1.9X GC throughput improvement on average, is attached, too.
> >>         >>
> >>         >> However, I have no idea how to further test this patch for both correctness and performance. Can I please get any guidance from you or some sponsor?
> >>         >
> >>         > Sorry I missed that you had sent an updated version of the patch.
> >>         >
> >>         > I've run the full regression suite across Oracle-supported platforms. There are some failures, but there are almost always some failures in the later tiers right now. I'll start looking at them tomorrow to figure out whether any of them are relevant.
> >>         >
> >>         > I'm also planning to run some of our performance benchmarks.
> >>         >
> >>         > I've lightly skimmed the proposed changes. There might be some code improvements to be made.
> >>         >
> >>         > I'm also wondering if this technique applies to other collectors. It seems like both G1 and Shenandoah full gc's might have similar issues? If so, a solution that is ParallelGC-specific is less interesting than one that has broader applicability. Though maybe this optimization is less important for G1 and Shenandoah, since they actively try to avoid full gc's.
> >>         >
> >>         > I'm also not clear on how much additional memory might be temporarily allocated by this mechanism.
> >>
> >>         I've created a CR for this:
> >>         https://bugs.openjdk.java.net/browse/JDK-8220465

From rkennke at redhat.com Thu Oct 10 11:16:55 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 10 Oct 2019 13:16:55 +0200
Subject: RFR (S) 8231947: Shenandoah: cleanup ShenandoahHumongousMoves flag treatment
In-Reply-To:
References:
Message-ID:

Yup! Thanks!

Roman

> RFE:
>   https://bugs.openjdk.java.net/browse/JDK-8231947
>
> Fix:
>   https://cr.openjdk.java.net/~shade/8231947/webrev.02/
>
> This was enabled for a while now. Flag is changed to diagnostic, comment updated, the accessors
> renamed to make more sense.
>
> Testing: hotspot_gc_shenandoah, new test
>

From shade at redhat.com Thu Oct 10 11:32:28 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 10 Oct 2019 13:32:28 +0200
Subject: RFR (M) 8232102: Shenandoah: print everything in proper units
Message-ID:

RFE:
  https://bugs.openjdk.java.net/browse/JDK-8232102

Shenandoah is used on smaller heaps as well as large ones. There, the default unit of "M" makes the logs too coarse and lose information. We have already fixed up some of the uses where it is critical. This issue handles the rest of the cases. This makes more sense after JDK-8217315 is fixed. Note the GC timings/heap-sizes themselves are handled by shared code, see JDK-8232100.

Fix:
  https://cr.openjdk.java.net/~shade/8232102/webrev.01/

Testing: hotspot_gc_shenandoah {fastdebug, release}, eyeballing gc logs

--
Thanks,
-Aleksey

From rkennke at redhat.com Thu Oct 10 11:43:03 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 10 Oct 2019 13:43:03 +0200
Subject: RFR (M) 8232102: Shenandoah: print everything in proper units
In-Reply-To:
References:
Message-ID:

Looks good! Thanks!
Roman

> RFE:
>   https://bugs.openjdk.java.net/browse/JDK-8232102
>
> Shenandoah is used on smaller heaps as well as large ones. There, the default unit of "M" makes the
> logs too coarse and lose information. We have already fixed up some of the uses where it is
> critical. This issue handles the rest of the cases. This makes more sense after JDK-8217315 is
> fixed. Note the GC timings/heap-sizes themselves are handled by shared code, see JDK-8232100.
>
> Fix:
>   https://cr.openjdk.java.net/~shade/8232102/webrev.01/
>
> Testing: hotspot_gc_shenandoah {fastdebug, release}, eyeballing gc logs
>

From shade at redhat.com Thu Oct 10 12:03:00 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 10 Oct 2019 14:03:00 +0200
Subject: RFR (S) 8232100: GC timings should use proper units for heap sizes
Message-ID:

RFE:
  https://bugs.openjdk.java.net/browse/JDK-8232100

Webrev:
  https://cr.openjdk.java.net/~shade/8232100/webrev.01/

GC log prints heap sizes in selected GC events. Currently, it unconditionally uses "M" as the suffix for heap sizes, which makes GC logs too coarse on smaller heaps. This loses performance data accuracy, which is sometimes a dealbreaker in logs analysis. Let's make it into proper units.

I ran many tests of my own, but would appreciate if somebody runs it through more comprehensive suite of tests, looking for tests that parse the GC logs for whatever reason.
Testing: eyeballing GC logs, jdk-submit, hotspot_gc {g1, shenandoah, parallel}

--
Thanks,
-Aleksey

From stefan.johansson at oracle.com Thu Oct 10 12:37:18 2019
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Thu, 10 Oct 2019 14:37:18 +0200
Subject: [PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance
In-Reply-To:
References: <4F02DD53-EA98-4A1A-B871-C6E9D9610B2C@oracle.com> <9B69AFD1-7AE2-4B50-BFCF-C9C6E2594240@oracle.com>
Message-ID: <8ef5b52e-d6fc-3073-5ca7-44c87c1eb981@oracle.com>

Hi,

On 2019-10-10 13:06, Haoyu Li wrote:
> Hi Stefan,
>
> Thanks for your testing! One possible reason for the regressions in
> simple tests is that the region dependencies may not be heavy enough.
> Because the locality of shadow regions is lower than that of heap
> regions, writing to shadow regions will be slower than to normal
> regions, and this is a part of the reason why I reuse shadow regions.
> Therefore, if only a few shadow regions are created and not reused, the
> overhead may not be amortized.

I guess it is something like this. I thought that for "easy" heaps the shadow regions won't be used at all, and should therefore not really cost anything.

>
> As to the OCA, it is the case that I'm the only person signing the
> agreement. Please let me know if you have any further questions. Thanks
> again!

Ok, so you are the sole author of the patch. The important part, as the agreement states, is:
"no other person or entity, including my employer, has or will have rights with respect my contributions"

Is that the case?

Thanks,
Stefan

>
> Best Regards,
> Haoyu Li
>
> [...]

From leihouyju at gmail.com Thu Oct 10 13:10:52 2019
From: leihouyju at gmail.com (Haoyu Li)
Date: Thu, 10 Oct 2019 21:10:52 +0800
Subject: [PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance
In-Reply-To: <8ef5b52e-d6fc-3073-5ca7-44c87c1eb981@oracle.com>
References: <4F02DD53-EA98-4A1A-B871-C6E9D9610B2C@oracle.com> <9B69AFD1-7AE2-4B50-BFCF-C9C6E2594240@oracle.com> <8ef5b52e-d6fc-3073-5ca7-44c87c1eb981@oracle.com>
Message-ID:

Hi Stefan,

Thanks for your quick response! As to your concern about the OCA, I am the sole author of the patch, and the statement in the agreement holds for this contribution.

Best Regards,
Haoyu Li

Stefan Johansson wrote on 2019-10-10, 8:37:

> Hi,
>
> On 2019-10-10 13:06, Haoyu Li wrote:
> > Hi Stefan,
> >
> > Thanks for your testing! One possible reason for the regressions in
> > simple tests is that the region dependencies may not be heavy enough.
> > Because the locality of shadow regions is lower than that of heap
> > regions, writing to shadow regions will be slower than to normal
> > regions, and this is a part of the reason why I reuse shadow regions.
> > Therefore, if only a few shadow regions are created and not reused, the
> > overhead may not be amortized.
>
> I guess it is something like this. I thought that for "easy" heaps the
> shadow regions won't be used at all, and should therefore not really cost
> anything.
>
> >
> > As to the OCA, it is the case that I'm the only person signing the
> > agreement. Please let me know if you have any further questions. Thanks
> > again!
>
> Ok, so you are the sole author of the patch. The important part, as the
> agreement states, is:
> "no other person or entity, including my employer, has or will have
> rights with respect my contributions"
>
> Is that the case?
>
> Thanks,
> Stefan
>
> >
> > Best Regards,
> > Haoyu Li
> >
> > [...]

From maoliang.ml at alibaba-inc.com Thu Oct 10 13:48:42 2019
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Thu, 10 Oct 2019 21:48:42 +0800
Subject: Re: G1 patch of elastic Java heap
In-Reply-To:
References: <6270ce59-4a8e-431e-9ccf-f6d2c0f927eb.maoliang.ml@alibaba-inc.com> <1267a5dd2cf6cc1d03df64d07a06ba0f45195951.camel@oracle.com> <3140197d-8cab-4a86-af92-58431c74cb6b.maoliang.ml@alibaba-inc.com>
Message-ID: <6cc5bdd7-c076-472f-8a36-8294c6cbfe21.maoliang.ml@alibaba-inc.com>

Hi Thomas,

Thank you for the feedback. You are right about some points that the present code seems to separate the heap into young and old gen pools.
In OpenJDK 8 there is no adaptive IHOP, so a fixed IHOP and MaxNewSize can clearly separate the young gen from the old gen. I'm also thinking about how to design this better for upstream OpenJDK G1.

There is a tradeoff between memory and GC frequency: more frequent GCs use less memory. We found that our online service applications keep a large young generation for potential query traffic, but most of the time the young GC frequency is quite low. Memory can easily be saved by using a smaller young gen. In Shenandoah or ZGC there is only one generation, and it is straightforward to determine whether memory is wasted and can be returned. G1 has two generations, and in the remark phase MinHeapFreeRatio/MaxHeapFreeRatio cannot tell that the young generation is largely wasted when it has been running for 2 minutes without a young GC, in which case we could return a lot of memory. Each generation's GC interval, or the ratio of time spent in mutator vs. GC that you mentioned, seems more intuitive.

The explicit limitation of a generation may not be a good design from G1's perspective, but from the operations point of view it makes the JVM easy to manipulate. There is a simple relationship: larger network traffic -> higher memory allocation rate -> larger young generation. So cluster operations can easily set the young generation of every Java instance to 10% of the max young gen size if the network traffic is guaranteed to stay below 10% for a period of time.

I'm not sticking to the current implementation that creates a clear boundary between young and old gen, especially for newer OpenJDK versions, and I've been thinking of unifying the resizing of the two generations within the single memory pool of the heap, along with Xms. The periodic uncommit mode does not strictly separate the young/old gen. The current implementation calculates the average GC interval, keeps it in a certain range between a low bound and a high bound, and immediately triggers an expansion if a single GC interval is smaller than a threshold.
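
The interval policy just described can be illustrated with a small standalone sketch. This is not code from the G1ElasticHeap patch: the class, enum, and parameter names are hypothetical, the averaging window is an assumption, and the mapping of "average below the low bound" to expansion and "average above the high bound" to shrinking is one plausible reading of the description.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <numeric>

// Hypothetical names throughout; a sketch of the described policy only.
enum class ResizeAction { Shrink, Keep, Expand };

class YoungGcIntervalPolicy {
 public:
  YoungGcIntervalPolicy(double target_ms, double floor_percent,
                        double ceiling_percent, double min_interval_ms,
                        std::size_t window)
      : _low_ms(target_ms * (100.0 - floor_percent) / 100.0),
        _high_ms(target_ms * (100.0 + ceiling_percent) / 100.0),
        _min_interval_ms(min_interval_ms),
        _window(window) {
    assert(window > 0 && _low_ms <= _high_ms);
  }

  // Record the time since the previous young GC and pick an action.
  ResizeAction record(double interval_ms) {
    _samples.push_back(interval_ms);
    if (_samples.size() > _window) {
      _samples.pop_front();  // keep only the most recent samples
    }
    // A single very short interval triggers an expansion immediately,
    // without waiting for the average to move.
    if (interval_ms < _min_interval_ms) {
      return ResizeAction::Expand;
    }
    double avg = std::accumulate(_samples.begin(), _samples.end(), 0.0) /
                 static_cast<double>(_samples.size());
    if (avg < _low_ms) return ResizeAction::Expand;   // GCs too frequent
    if (avg > _high_ms) return ResizeAction::Shrink;  // GCs rare: memory idle
    return ResizeAction::Keep;
  }

 private:
  double _low_ms;
  double _high_ms;
  double _min_interval_ms;
  std::size_t _window;
  std::deque<double> _samples;
};
```

With a 15000 ms target and 25% bounds on either side, the average is kept between 11250 ms and 18750 ms; those numbers are only example inputs to the sketch.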
We can use a similar policy to estimate a target young generation capacity, and adjust the capacity of the old generation after a concurrent cycle. The two parts together can be the target heap capacity, which can vary between Xms and Xmx. The difference from current G1 is that the heap can be resized in a young GC, not only at remark.

In order to do swift heap resizing we have to overcome the overhead of requesting and releasing memory from the OS. Memory unmap and map (including the page faults) cost significant time, so we use a straightforward approach: a concurrent thread does the map/unmap/pretouch, and the free regions are synchronized in a GC pause. In our applications a typical G1 remark pause costs ~100ms. I haven't tested the latest G1, but based on our experimental data the pause can easily double if a considerable number of map/unmaps are done in it.

All of the above are our thoughts, and the present implementation is a kind of reference. Please let me know if I answered all your questions. I hope we can come to an agreement on some points and conceive a good design for the latest G1 GC :)

Thanks,
Liang

------------------------------------------------------------------
From:Thomas Schatzl
Send Time:2019 Oct. 9 (Wed.) 22:12
To:"MAO, Liang" ; hotspot-gc-dev
Subject:Re: G1 patch of elastic Java heap

Hi,

sorry for the late reply.

First, I have a more general question: lots of changes deal with providing options to separately change properties of generations at runtime. Like if there were separate pools of young and old gen memory.

G1 is kind of built upon the idea that you pass a pause time goal and then it modifies generation sizes and takes memory for the generations from a single memory pool as needed.

To me this indicates that automatic sizing is not working correctly, but there are many(?) use cases where it does not work as expected. This requires manual tuning in generation sizes for whatever reason.

Can you share your thoughts about this?
There seems to be some bit of information missing to me - this is probably the reason for some of the dumb questions about the flags, and me being not too fond of them.

On 26.09.19 08:49, Liang Mao wrote:
> Hi All,
>
> Here is the user guide of G1ElasticHeap patch. Hope it will help to
> understand.
>
> G1ElasticHeap
> G1ElasticHeap is a GC feature to return memory of Java heap to OS to reduce the
> memory footprint of Java process. To enable this feature, you need to use G1 GC
> by options: -XX:+UseG1GC -XX:+G1ElasticHeap.
>
> ## Usage
> There are 3 modes which can be enabled in G1ElasticHeap.
> ### 1. Periodic uncommit
> Memory will be uncommitted by periodic GC. To enable periodic uncommit, use option
> -XX:+ElasticHeapPeriodicUncommit or dynamically enable the option via jinfo:
>
> `jinfo -flag +ElasticHeapPeriodicUncommit PID`

As far as I can tell, this setting periodically scans the heap for (too many?) uncommitted regions and, well, uncommits them.

Not completely sure if that is better than doing periodic gcs - as we do not expect to gain memory outside of a GC; in JDK12+ (I think) G1 always uncommits at the remark pause which should give most of the benefits.

There *may* be reason to also try to uncommit after the last mixed GC, but not sure if uncommit is that urgent - to some degree the existing JEP 346: Promptly return unused committed memory from G1 (https://openjdk.java.net/jeps/346) should cover some of the use cases. I.e. after some delay (and inactivity) there will be another Remark pause anyway.

The main reason why Remark has been chosen to uncommit memory is because we assume that the heap size at Remark (this is what adaptive IHOP shoots for) is the "target heap size".
> Related options:
>
>> ElasticHeapPeriodicYGCIntervalMillis, 15000 \
> (target young GC interval 15 seconds in default) \
> (eg, if Java runs with MaxNewSize=4g, young GC every 30 seconds, G1ElasticHeap will keep 15s
> GC interval and make a max 2g young generation to uncommit 2g memory)
>
>> ElasticHeapPeriodicInitialMarkIntervalMillis, 3600000 \
> (Target initial mark interval, 1 hour in default. Unused memory of old generation will be uncommitted
> after last mixed GC.)

This seems to implement an unconditional concurrent cycle like with the CMSTriggerInterval flag for CMS. Maybe there is a more clever alternative on triggering concurrent cycles like ZGC does based on the ratio between time spent by the mutator and the gc.

>
>> ElasticHeapPeriodicUncommitStartupDelay, 300 \
> (Delay after startup to do memory uncommit, 300 seconds in default)
>
>> ElasticHeapPeriodicMinYoungCommitPercent, 50 \
> (Percentage of young generation to keep, default 50% of the young generation will not be uncommitted)

See above about separating young/old.

>
> ### 2. Generation limit
> To limit the young/old generation separately. Use jcmd or MXBean to enable.

I do not understand the reason for those, see above.

[...]
>
> ### 3. Softmx mode
> Dynamically to limit the heap as a percentage of origin Xmx.
>
> Use jcmd:
>
> `jcmd PID ElasticHeap softmx_percent=60`
>
> Use MXBean:
>
> `elasticHeapMXBean.setSoftmxPercent(70);`

That one sounds good, and actually there is a flag SoftMaxHeapSize already in the VM. Only ZGC implements it though. I think this idea matches the specifications in https://bugs.openjdk.java.net/browse/JDK-8222145 (i.e. as far as I can tell, the softmxpercent is a "soft"/target heap size), so I think this could be implemented under the SoftMaxHeapSize flag. SoftMaxHeapSize is already manageable too, so could be modified already.
Only the implementation is missing in G1 :)

>
> ### Other G1ElasticHeap advanced options:
>> ElasticHeapMinYoungCommitPercent, 10 \
> (Minimum percentage of young generation)
>
>> ElasticHeapYGCIntervalMinMillis, 5000 \
> (Minimum young GC interval)
>
>> ElasticHeapInitialMarkIntervalMinMillis, 60000 \
> (Minimum initial mark interval)
>
>> ElasticHeapEagerMixedGCIntervalMillis, 15000 \
> (Guaranteed mixed GC interval, to make sure the mixed will happen in time to uncommit memory after last mixed GC)

These options seem to be mostly useful for when the allocation rate of the mutator is not high enough to advance the collection cycle. Would that feature provide the requested feature? Maybe it needs some minor improvement, but to me it seems very burdensome to specify so many options...

>
>> ElasticHeapOldGenReservePercent, 5 \
> (To keep a minimum percentage of Xmx for old generation in the uncommitment after last mixed GC)

That seems to be related to some strict separation of young/old again.

>
>> ElasticHeapPeriodicYGCIntervalCeilingPercent, 25 \
> ElasticHeapPeriodicYGCIntervalFloorPercent, 25 \
> (The actual young GC interval will fluctuate between \
> ElasticHeapPeriodicYGCIntervalMillis * (100 - ElasticHeapPeriodicYGCIntervalFloorPercent) / 100 and \
> ElasticHeapPeriodicYGCIntervalMillis * (100 + ElasticHeapPeriodicYGCIntervalCeilingPercent) / 100 )
>

Thanks,
  Thomas

From stefan.johansson at oracle.com Thu Oct 10 13:50:56 2019
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Thu, 10 Oct 2019 15:50:56 +0200
Subject: [PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance
In-Reply-To:
References: <4F02DD53-EA98-4A1A-B871-C6E9D9610B2C@oracle.com> <9B69AFD1-7AE2-4B50-BFCF-C9C6E2594240@oracle.com> <8ef5b52e-d6fc-3073-5ca7-44c87c1eb981@oracle.com>
Message-ID: <92277aab-0578-9e2c-3f4f-55ae1e8c94a9@oracle.com>

Thanks for the clarification =)

Moving on to the next part, the code in the patch.
So this won't be a full review of the patch but just an initial comment that I would like to be addressed first. The new function PSParallelCompact::fill_shadow_region() is more or less a copy of PSParallelCompact::fill_region() and I understand that from a proof of concept point of view it was the easy (and right) way to do it. I would prefer if the code could be refactored so that fill_region() and fill_shadow_region() share more code. There might be reasons that I've missed that prevent it, but we should at least explore how much code can be shared.

Thanks,
Stefan

On 2019-10-10 15:10, Haoyu Li wrote:
> Hi Stefan,
>
> Thanks for your quick response! As to your concern about the OCA, I am
> the sole author of the patch. And it is the case as what the agreement
> states.
> Best Regards,
> Haoyu Li,
>
> On 2019-10-10 at 8:37, Stefan Johansson wrote:
>
>> Hi,
>>
>> On 2019-10-10 13:06, Haoyu Li wrote:
>>> Hi Stefan,
>>>
>>> Thanks for your testing! One possible reason for the regressions in
>>> simple tests is that the region dependencies may not be heavy enough.
>>> Because the locality of shadow regions is lower than that of heap
>>> regions, writing to shadow regions will be slower than to normal
>>> regions, and this is a part of the reason why I reuse shadow regions.
>>> Therefore, if only a few shadow regions are created and not reused, the
>>> overhead may not be amortized.
>>
>> I guess it is something like this. I thought that for "easy" heaps the
>> shadow regions won't be used at all, and should therefore not really cost
>> anything.
>>
>>> As to the OCA, it is the case that I'm the only person signing the
>>> agreement. Please let me know if you have any further questions. Thanks
>>> again!
>>
>> Ok, so you are the sole author of the patch. The important part, as the
>> agreement states, is:
>> "no other person or entity, including my employer, has or will have
>> rights with respect my contributions"
>>
>> Is that the case?
>>
>> Thanks,
>> Stefan

From thomas.schatzl at oracle.com Thu Oct 10 15:23:25 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 10 Oct 2019 17:23:25 +0200
Subject: RFR: 8232070: ZGC: Remove unused ZVerifyLoadBarriers
In-Reply-To: <1df387c3-bae2-45b3-7930-1baf56dea03c@oracle.com>
References: <1df387c3-bae2-45b3-7930-1baf56dea03c@oracle.com>
Message-ID: <0c0a91af-9be1-d879-bf60-fd3cf0888823@oracle.com>

Hi,

On 09.10.19 23:04, Per Liden wrote:
> After JDK-8230565, we left the develop flag ZVerifyLoadBarriers around,
> which is no longer used and can be removed.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8232070
> Webrev: http://cr.openjdk.java.net/~pliden/8232070/webrev.0

looks good and trivial.

Thomas

From erik.osterlund at oracle.com Thu Oct 10 15:31:14 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Thu, 10 Oct 2019 17:31:14 +0200
Subject: RFR: 8232116: ZGC: Remove redundant ZLock in ZNMethodTable
Message-ID: <99d79fed-1ec2-4e51-55eb-d6c7aa333a42@oracle.com>

Hi,

The safe memory reclamation technique used in the ZNMethodTable has an unnecessary ZLock. This lock is statically initialized, which creates some bootstrapping issues. We should remove the lock, as in the context it is used, we are always protected under the CodeCache_lock.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8232116

Webrev:
http://cr.openjdk.java.net/~eosterlund/8232116/webrev.00/

Thanks,
/Erik

From per.liden at oracle.com Thu Oct 10 16:41:05 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 10 Oct 2019 18:41:05 +0200
Subject: RFR: 8232116: ZGC: Remove redundant ZLock in ZNMethodTable
In-Reply-To: <99d79fed-1ec2-4e51-55eb-d6c7aa333a42@oracle.com>
References: <99d79fed-1ec2-4e51-55eb-d6c7aa333a42@oracle.com>
Message-ID:

Looks good!

/Per

On 10/10/19 5:31 PM, erik.osterlund at oracle.com wrote:
> Hi,
>
> The safe memory reclamation technique used in the ZNMethodTable has an
> unnecessary ZLock. This lock is statically initialized, which creates
> some bootstrapping issues. We should remove the lock, as in the context
> it is used, we are always protected under the CodeCache_lock.
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8232116
>
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8232116/webrev.00/
>
> Thanks,
> /Erik

From per.liden at oracle.com Thu Oct 10 16:42:02 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 10 Oct 2019 18:42:02 +0200
Subject: RFR: 8232070: ZGC: Remove unused ZVerifyLoadBarriers
In-Reply-To: <0c0a91af-9be1-d879-bf60-fd3cf0888823@oracle.com>
References: <1df387c3-bae2-45b3-7930-1baf56dea03c@oracle.com> <0c0a91af-9be1-d879-bf60-fd3cf0888823@oracle.com>
Message-ID: <5b180c79-3ad2-993b-b24b-bc69b963eeb7@oracle.com>

Thanks Thomas!

/Per

On 10/10/19 5:23 PM, Thomas Schatzl wrote:
> Hi,
>
> On 09.10.19 23:04, Per Liden wrote:
>> After JDK-8230565, we left the develop flag ZVerifyLoadBarriers
>> around, which is no longer used and can be removed.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232070
>> Webrev: http://cr.openjdk.java.net/~pliden/8232070/webrev.0
>
> looks good and trivial.
>
> Thomas
>

From erik.osterlund at oracle.com Thu Oct 10 16:51:27 2019
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Thu, 10 Oct 2019 18:51:27 +0200
Subject: RFR: 8232116: ZGC: Remove redundant ZLock in ZNMethodTable
In-Reply-To:
References: <99d79fed-1ec2-4e51-55eb-d6c7aa333a42@oracle.com>
Message-ID:

Hi Per,

Thanks for the review.

/Erik

> On 10 Oct 2019, at 18:41, Per Liden wrote:
>
> Looks good!
>
> /Per
>
>> On 10/10/19 5:31 PM, erik.osterlund at oracle.com wrote:
>> Hi,
>> The safe memory reclamation technique used in the ZNMethodTable has an unnecessary ZLock. This lock is statically initialized, which creates some bootstrapping issues. We should remove the lock, as in the context it is used, we are always protected under the CodeCache_lock.
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8232116
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8232116/webrev.00/
>> Thanks,
>> /Erik

From stefan.karlsson at oracle.com Thu Oct 10 17:00:26 2019
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Thu, 10 Oct 2019 19:00:26 +0200
Subject: RFR: 8232116: ZGC: Remove redundant ZLock in ZNMethodTable
In-Reply-To: <99d79fed-1ec2-4e51-55eb-d6c7aa333a42@oracle.com>
References: <99d79fed-1ec2-4e51-55eb-d6c7aa333a42@oracle.com>
Message-ID: <068b00ea-c8cb-606e-7d06-98a22bfcd214@oracle.com>

Looks good.

StefanK

On 2019-10-10 17:31, erik.osterlund at oracle.com wrote:
> Hi,
>
> The safe memory reclamation technique used in the ZNMethodTable has an
> unnecessary ZLock. This lock is statically initialized, which creates
> some bootstrapping issues. We should remove the lock, as in the
> context it is used, we are always protected under the CodeCache_lock.
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8232116
>
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8232116/webrev.00/
>
> Thanks,
> /Erik

From erik.osterlund at oracle.com Thu Oct 10 17:08:43 2019
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Thu, 10 Oct 2019 19:08:43 +0200
Subject: RFR: 8232116: ZGC: Remove redundant ZLock in ZNMethodTable
In-Reply-To: <068b00ea-c8cb-606e-7d06-98a22bfcd214@oracle.com>
References: <99d79fed-1ec2-4e51-55eb-d6c7aa333a42@oracle.com> <068b00ea-c8cb-606e-7d06-98a22bfcd214@oracle.com>
Message-ID:

Hi Stefan,

Thanks for the review.

/Erik

> On 10 Oct 2019, at 19:00, Stefan Karlsson wrote:
>
> Looks good.
>
> StefanK
>
>> On 2019-10-10 17:31, erik.osterlund at oracle.com wrote:
>> Hi,
>>
>> The safe memory reclamation technique used in the ZNMethodTable has an unnecessary ZLock. This lock is statically initialized, which creates some bootstrapping issues.
We should remove the lock, as in the context it is used, we are always protected under the CodeCache_lock.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8232116
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8232116/webrev.00/
>>
>> Thanks,
>> /Erik
>

From kim.barrett at oracle.com Thu Oct 10 23:34:27 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 10 Oct 2019 19:34:27 -0400
Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)
In-Reply-To:
References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com>
Message-ID:

> On Oct 9, 2019, at 12:27 AM, sangheon.kim at oracle.com wrote:
> Webrev:
> http://cr.openjdk.java.net/~sangheki/8220310/webrev.3
> http://cr.openjdk.java.net/~sangheki/8220310/webrev.3.inc
> Testing: hs-tier 1~5, with/without UseNUMA

I agree with Stefan and Thomas; this is looking pretty good.

There are some naming issues that I'm not going to comment on here. Stefan has already commented on some, and a bit of offline discussion suggests there's a larger naming discussion needed, but which can follow getting the functionality we want.

There has been further discussion offline toward collapsing G1MemoryNodeManager to one class without virtual dispatch, and using G1NUMA name. I won't bother to re-iterate any of that here.

------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1Allocator.cpp
 186   assert(Heap_lock->owner() != NULL, "Should be owned on this thread's behalf.");

Use assert_lock_strong(Heap_lock).

------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp
  82   _storage.request_memory_on_node(page, _pages_per_region, node_index);
...
 153   _storage.request_memory_on_node(idx, 1, node_index);

I'm not sure request_memory_on_node belongs on the _storage object. The current implementation just has the storage object (conditionally) forward the request to the memory node manager object. These places in the space mapper could just make the calls on the memory node manager object directly (it is already being used nearby). And these places don't need the conditionalization.

I think making the space mapper directly call the memory node manager here would remove the need for the proposed changes to the virtual space class.

------------------------------------------------------------------------------
src/hotspot/share/gc/g1/heapRegion.cpp
 464   st->print("|Node ID %02d", node_ids[this->node_index()]);

The unchecked use of node_index() here can run afoul of an unset (so UnknownNodeIndex) index.

Also, no need for `this->` in `this->node_index()`.

------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
  81   virtual const uint max_search_depth() const { return 1; }

s/const uint/uint/

Similarly for other declarations and definitions.

------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
  77   virtual void request_memory_on_node(char* aligned_address, size_t size_in_bytes, uint node_index) { }

Shouldn't the aligned_address argument be typed "void*" rather than "char*"?

------------------------------------------------------------------------------
src/hotspot/share/gc/g1/heapRegionManager.cpp
 112   if (mgr->has_multi_nodes() && requested_node_index != G1MemoryNodeManager::AnyNodeIndex) {

I think it would be better to test the requested_node_index value first. The "any" case is a common case.

------------------------------------------------------------------------------
src/hotspot/share/gc/g1/heapRegionManager.cpp
 200   if(AlwaysPreTouch) {

Add space after "if".

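Two of the points above, guarding against an unset node index before using it as a subscript and testing the cheap "any node" case first, can be sketched outside HotSpot roughly as follows. This is only an illustration: the function names and the sentinel values are hypothetical stand-ins, not the constants or API of the patch under review.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical sentinel values; the real constants are defined by the patch.
const unsigned UnknownNodeIndex = ~0u;
const unsigned AnyNodeIndex = UnknownNodeIndex - 1;

// Format the node column for a region. An unset or out-of-range index is
// reported explicitly instead of being used to index node_ids.
std::string node_column(const std::vector<int>& node_ids, unsigned node_index) {
  if (node_index == UnknownNodeIndex || node_index >= node_ids.size()) {
    return "|Node ID --";
  }
  char buf[32];
  std::snprintf(buf, sizeof(buf), "|Node ID %02d", node_ids[node_index]);
  return buf;
}

// Decide whether a node-specific search is needed. The integer comparison
// goes first; && short-circuits, so in the common "any" case the second
// operand is never evaluated.
bool wants_specific_node(unsigned requested_node_index, bool has_multi_nodes) {
  return requested_node_index != AnyNodeIndex && has_multi_nodes;
}
```

In the reviewed code the second operand is a member call (`mgr->has_multi_nodes()`), so the reordering would also skip that call whenever the request is for any node.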
------------------------------------------------------------------------------ src/hotspot/share/gc/g1/heapRegionManager.cpp 311 return region_node_index == preferred_node_index; Fix indentation. ------------------------------------------------------------------------------ src/hotspot/share/runtime/os.hpp 393 static const int InvalidId = -1; This should probably be "InvalidNUMAId" or something like that. ------------------------------------------------------------------------------ From sangheon.kim at oracle.com Fri Oct 11 03:23:47 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Thu, 10 Oct 2019 20:23:47 -0700 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <93592401-FC69-4B7F-95BE-DE9A0F070F3A@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <93592401-FC69-4B7F-95BE-DE9A0F070F3A@oracle.com> Message-ID: <48528a18-9da0-a69e-135d-8e56b78ecca3@oracle.com> Hi Stefan, On 10/9/19 2:40 PM, Stefan Johansson wrote: > Hi Sangheon, > > Thanks again for a much improved version. Some comments below. > >> 9 okt. 2019 kl. 06:27 skrev sangheon.kim at oracle.com: >> >> ... >> >> Here's the major change list at the webrev. Or arguable list :) >> 1) Verification at HRM::allocate_free_region() is removed and it will be added somewhere at safepoint by JDK-8220312 (3/3 which is part of this JEP). Probably at the end of young gc? >> 2) Node id printing is changed. Removed old one and added at HeapRegion::print_on() with new column. Node id is only printed when UseNUMA is enabled and gc+heap+region=trace. If there's single active node, it will print the node id and this is intentional. Another approach would be printing only if there are multiple nodes. 
>> 3) If AlwaysPreTouch is enabled, HeapRegion will have actual node index instead of preferred node index.
>> 4) HeapRegion::_node_index is set at HRM::make_regions_available() as that is the only place initializing HeapRegion. Another approach would be setting the index at HeapRegion::initialize (we have to pollute HR with G1MNM stuff) or conditionally(*) setting the index at HeapRegion::node_index(). (*) if the index is unknown etc.
>> 5) G1NUMA class is merged into G1MemoryNodeManager.
> I saw your comment above about suggestions around this area and I can try out one thought I had, something I think Thomas mentioned as well. Making the non-NUMA case work exactly as the NUMA case with one node. I'll need some more time for that, but below are my comments on the current patch.

For the record, Stefan provided me a patch showing the above idea of 'non-NUMA case work exactly as the NUMA case with one node'.
The next webrev will include this change.

>
>> Webrev:
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.3
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.3.inc
> src/hotspot/os/linux/os_linux.cpp
>
> 3026 warning("Failed to get numa id at " PTR_FORMAT " with errno=%d", p2i((void*)address), errno);
>
> The cast here is no longer needed.

Done

>
>
> src/hotspot/share/gc/g1/g1Allocator.hpp
>
> 44 G1MemoryNodeManager* _mnm;
>
> I would prefer a more descriptive name like _memory_node_manager.

After changing to G1NUMA, all members will be _numa.

>
>
> src/hotspot/share/gc/g1/g1CollectedHeap.hpp
>
> 196 // Manages single or multi node memory.
> 197 G1MemoryNodeManager* _mem_node_mgr;
> ...
> 558 G1MemoryNodeManager* mem_node_mgr() const { return _mem_node_mgr; }
>
> As above, I would prefer spelling out the names to memory_node_manager().

Same as above.

>
>
> src/hotspot/share/gc/g1/g1_globals.hpp
>
> Last line still removed a '\', please revert this change.

Done

>
>
> src/hotspot/share/gc/g1/heapRegion.cpp
>
> 462   if (UseNUMA) {
> 463     const int* node_ids = G1MemoryNodeManager::mgr()->node_ids();
> 464     st->print("|Node ID %02d", node_ids[this->node_index()]);
> 465   }
> 466   st->print_cr("");
>
> I would prefer having a function that returns the node id given the index. Like the inverse of index_of_node_id().
>
> I also think it would be more informative to say "NUMA id" or "NUMA node".

I don't have a strong opinion on this, but as Thomas suggests not having such a word, I removed it.
It will print something like "| 00".
Hope you are okay with this.

>
>
> src/hotspot/share/gc/g1/heapRegionManager.cpp
>
> 195 // Set node index of the given HeapRegion.
> 196 // If AlwaysPreTouch is enabled, set with actual node index.
> 197 // If it is disabled, set with preferred node index which is already decided.
> 198 static void set_heapregion_node_index(HeapRegion* hr) {
> 199   uint node_index;
> 200   if(AlwaysPreTouch) {
> 201     // If we already pretouched, we can check actual node index here.
> 202     node_index = G1MemoryNodeManager::mgr()->index_of_address(hr->bottom());
> 203   } else {
> 204     node_index = G1MemoryNodeManager::mgr()->preferred_node_index_for_index(hr->hrm_index());
> 205   }
> 206   hr->set_node_index(node_index);
> 207 }
>
> I would prefer to have a helper for calculating the index to set, not a helper for setting the index. If you agree, you could move this logic to G1MemoryNodeManager::index_for_region() and then you can change:
> 233 // Set node index of the heap region after initialization but before inserting
> 234 // to free list.
> 235 set_heapregion_node_index(hr);
>
> To just:
> 235 hr->set_node_index(G1MemoryNodeManager::mgr()->index_for_region(hr));
>
> 309 bool HeapRegionManager::is_on_preferred_index(uint region_index, uint preferred_node_index) {
> 310   uint region_node_index = G1MemoryNodeManager::mgr()->preferred_node_index_for_index(region_index);
> 311   return region_node_index == preferred_node_index;
> 312 }
>
> Indentation on row 311.
Changed as you suggested.
I had the same opinion, but the reason I didn't choose it was that I wanted to avoid a dependency on HeapRegion in G1NUMA.

>
>
> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>
> 44 static G1MemoryNodeManager* mgr() { return _inst; }
>
> I think we should change the name of this getter to manager(), to avoid unnecessary shortenings.

N/A

>
> 57 virtual bool has_multi_nodes() const { return false; }
>
> Same as above I would prefer has_multiple_nodes()

N/A

I will post next webrev after applying others' comments.

Thanks,
Sangheon

>
>
> Thanks,
> Stefan
>
>> Testing: hs-tier 1~5, with/without UseNUMA
>>
>> Thanks,
>> Sangheon

From sangheon.kim at oracle.com Fri Oct 11 03:24:44 2019
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Thu, 10 Oct 2019 20:24:44 -0700
Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)
In-Reply-To:
References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com>
 <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com>
 <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com>
 <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com>
 <93592401-FC69-4B7F-95BE-DE9A0F070F3A@oracle.com>
Message-ID: <041edc5a-73d5-27f4-68ab-32c497f930dd@oracle.com>

Hi Thomas,

On 10/10/19 2:48 AM, Thomas Schatzl wrote:
> Hi,
>
> On 09.10.19 23:40, Stefan Johansson wrote:
>> Hi Sangheon,
>>
>> Thanks again for a much improved version. Some comments below.
>
> I agree, it looks quite nice now. :)
>
> [...]
>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.3
>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.3.inc
> [...]
>>
>> src/hotspot/share/gc/g1/heapRegion.cpp
>>
>>   462   if (UseNUMA) {
>>   463     const int* node_ids = G1MemoryNodeManager::mgr()->node_ids();
>>   464     st->print("|Node ID %02d", node_ids[this->node_index()]);
>>   465   }
>>   466   st->print_cr("");
>>
>> I would prefer having a function that returns the node id given the
>> index.
Like the inverse of index_of_node_id(). >> >> I also think it would be more informative to say "NUMA id" or "NUMA >> node?. > > I would also remove the "Node ID" string here as it does not convey > any information. Most other columns also do not carry their description. Done. I will post the webrev after addressing Kim's comment. Thanks, Sangheon > > Thanks, > ? Thomas > > From thomas.schatzl at oracle.com Fri Oct 11 11:02:01 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 11 Oct 2019 13:02:01 +0200 Subject: G1 patch of elastic Java heap In-Reply-To: <6cc5bdd7-c076-472f-8a36-8294c6cbfe21.maoliang.ml@alibaba-inc.com> References: <6270ce59-4a8e-431e-9ccf-f6d2c0f927eb.maoliang.ml@alibaba-inc.com> <1267a5dd2cf6cc1d03df64d07a06ba0f45195951.camel@oracle.com> <3140197d-8cab-4a86-af92-58431c74cb6b.maoliang.ml@alibaba-inc.com> <6cc5bdd7-c076-472f-8a36-8294c6cbfe21.maoliang.ml@alibaba-inc.com> Message-ID: <5f02f337-f479-55f6-351e-867507845f65@oracle.com> Hi, On 10.10.19 15:48, Liang Mao wrote: > Hi Thomas, > > Thank you for the feedback. > You are right about some points that the present code seems to separate > the heap into young and old gen pools. In OpenJDK8, there's no adaptive-ihop so fixed ihop > and MaxNewSize can clearly separate young gen and old gen. I'm also thinking about how to design it better > in upstream of OpenJDK G1. > > There is a tradeoff between memory and GC frequency. More frequent GC > uses less memory. We found our online service applications keep large young generation for > potential query traffic but most of time the young GC frequency is quite low. Memory can be easily saved > by using smaller young gen > In Shenandoah or ZGC, there is only 1 generation and it's > straightforward to determine if memory is wasted and can be returned. 
G1 has 2 generations, in remark phase
> MinHeapFreeRatio/MaxHeapFreeRatio cannot tell the young generation is rather wasted for running 2 minutes
> without a young GC and we can return a lot of memory. Each generation's GC interval or time ratio
> spent on mutator/gc you mentioned seems more intuitive.
>
> The explicit limitation of generation may not be a good design from G1
> GC's perspective. From the operation's point of view, it is easy for manipulating JVM. There is a
> simple relationship: larger network traffic -> higher memory allocation rate -> larger young
> generation. So cluster operation can easily set the young generation as 10% of max young gen
> size to every Java instance if the network traffic is guaranteed to be below 10% for a period of time.
>
> I'm not sticking to the current implementation to create clear boundary
> between young and old gen, especially for newer OpenJDK versions and I've been thinking of unifying
> the 2 generations' resizing within the single memory pool of heap along with Xms. The periodic
> uncommit mode does not strictly separate the young/old gen. Current implementation calculates the
> average GC interval and keeps it in a certain range between a low bound and high bound and will immediately
> trigger an expansion if a single GC interval is smaller than a threshold. We can use a similar
> policy to estimate a target young generation capacity and adjust the capacity of old generation after a
> concurrent cycle. The 2 parts together can be the target heap capacity. The capacity can vary between
> Xms and Xmx. The difference with current G1 is it can be resized in a young GC not only remark.

Thank you for presenting your problem (and not insisting on a particular solution upfront).

Summary of this long text: In case of "low" activity the user wants to limit the heap resulting in giving back memory.

Currently, all the user can do is specify the maximum amount of work the gc is allowed to use (GCTimeRatio).
At least G1, as soon as the time spent in gc compared to mutator time is lower than GCTimeRatio (typically achieved by expanding the heap), "never" shrinks the heap back (at least not based on that ratio). This wastes lots of space, which is the problem.

We all agree that this is a problem :)

I believe we only differ on what knobs the user should have available to achieve this.

Here are my current suggestions:

One option that I suggested earlier is, instead of setting generation sizes (or heap sizes) manually (which could be fine in some cases for other reasons), to think a bit differently about GCTimeRatio than now: currently it is the maximum amount of GC activity the user can bear, so we should make the GC use less. The slight tweak here could be that we assume that any GC activity below that is fine :)

I.e. if current GC activity is very low compared to mutator activity (far below what GCTimeRatio allows), and expected additional GC activity caused by this forced GC cycle would not exceed that GCTimeRatio, why not do the GC?

Think of a "minimum" GCTimeRatio; in some way this is very much like minimum and maximum GC intervals, only with much more flexibility for the GC to meet (also this metric is independent of the environment, e.g. hardware, while setting actual values of sizes needs tuning).

I agree that there is then not an immediately obvious relation between external input (the traffic in your example) and what you should set that "minimum" GCTimeRatio to. However, since there is a relation between young gen size and GCTimeRatio, I think this can be figured out.

This is what ZGC does, and I think it would be worth trying out before thinking about adding a G1 specific way of achieving this or a similar effect.

The other option, which is more direct, would be implementing and changing target heap size during runtime: it would also automatically shrink the heap.
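[Illustration only.] The "minimum GCTimeRatio" idea described above can be sketched as a small predicate. This is a rough standalone sketch, not G1 code: the function names, the measurement window, and the way the extra cycle's cost is estimated are all assumptions. The only fact taken from HotSpot is the usual reading of -XX:GCTimeRatio=N as a GC time goal of 1 / (1 + N) of total time.

```cpp
// Fraction of total time the GC may use for GCTimeRatio = N, i.e. 1 / (1 + N).
double allowed_gc_fraction(unsigned gc_time_ratio) {
  return 1.0 / (1.0 + gc_time_ratio);
}

// Hypothetical policy check over a recent measurement window: would one more
// (forced) GC of the expected cost still keep GC overhead within the goal?
// gc_ms and mutator_ms are the times observed in the window;
// expected_extra_gc_ms is an estimate supplied by the caller.
bool extra_gc_is_affordable(double gc_ms, double mutator_ms,
                            double expected_extra_gc_ms,
                            unsigned gc_time_ratio) {
  double total_ms = gc_ms + mutator_ms;
  if (total_ms <= 0.0) {
    return false;  // no data yet, so do not force anything
  }
  // The extra cycle adds to GC time and to elapsed time alike.
  double projected = (gc_ms + expected_extra_gc_ms) /
                     (total_ms + expected_extra_gc_ms);
  return projected <= allowed_gc_fraction(gc_time_ratio);
}
```

For example, with GCTimeRatio=12 the goal is about 7.7% GC time; a window with 100 ms of GC against 9900 ms of mutator time can easily afford a 200 ms extra cycle, whereas one that already spent 700 ms in GC cannot.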
I believe that if you were able to modify the current adaptive IHOP's "target" heap size from outside, G1 would already automatically give back memory; in conjunction with the "Promptly Return ...", it would also make sure that in very low mutator activity cases the GC cycle would continue.

As for whether this feature would be accepted for inclusion into G1: there is already a SoftMaxHeapSize switch in the JDK, so I guess this is a non-issue.

Note that you can *already*, if you know that from a particular time on there will be little activity, modify the "Promptly Return..." settings so that it will immediately start cleaning up and compacting the heap; you can even force maximum compaction at that time by issuing a full gc if service interruption is not an issue.

>
> In order to do swift heap resizing we have to conquer the overhead of
> memory request/release from OS. The memory unmap and map (including the page fault) cost significant
> time. So we use an intuitive way to have a concurrent thread to do the map/unmap/pretouch. The free
> regions will be synchronized in GC pause. In our applications, a typical G1 remark costs ~100ms of pause. I
> haven't tested latest G1 but based on our experimental data, the pause can be easily doubled if
> considerable map/unmaps are done.
>

That's a related but distinct problem and a solution that seems at least worth trying :)

>
> All of above are our thoughts and the present implementation is kind of
> reference. Please let me know if
> I answered all your questions. Hope we can come to an agreement in some
> points and conceive a good design
> in latest G1 GC :)
>

Thanks,
  Thomas

From zgu at redhat.com Fri Oct 11 12:30:23 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 11 Oct 2019 08:30:23 -0400
Subject: RFR 8232010: Shenandoah: implement self-fixing native barrier
Message-ID: <6cecca8a-a477-53b4-48de-f504a2100955@redhat.com>

Please review this patch that implements self-fixing LRB for in native oops.
Bug: https://bugs.openjdk.java.net/browse/JDK-8232010 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232010/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) with x86_32 and x86_64 JVM on Linux. Thanks, -Zhengyu From leihouyju at gmail.com Fri Oct 11 12:49:17 2019 From: leihouyju at gmail.com (Haoyu Li) Date: Fri, 11 Oct 2019 20:49:17 +0800 Subject: [PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance In-Reply-To: <92277aab-0578-9e2c-3f4f-55ae1e8c94a9@oracle.com> References: <4F02DD53-EA98-4A1A-B871-C6E9D9610B2C@oracle.com> <9B69AFD1-7AE2-4B50-BFCF-C9C6E2594240@oracle.com> <8ef5b52e-d6fc-3073-5ca7-44c87c1eb981@oracle.com> <92277aab-0578-9e2c-3f4f-55ae1e8c94a9@oracle.com> Message-ID: Hi Stefan, Thanks for your suggestion! It is very redundant that PSParallelCompact::fill_shadow_region() copies most code from PSParallelCompact::fill_region(), and therefore I've refactored these two functions to share code as many as possible. And the attachment is the updated patch. Specifically, the closure, which moves objects, in PSParallelCompact::fill_region() is now declared as a template of either MoveAndUpdateClosure or ShadowClosure. So by controlling the type of closure when invoking the function, we can decide whether to fill a normal region or a shadow one. Thus, almost all code in PSParallelCompact::fill_region() can be reused. Besides, a virtual function named complete_region() is added in both closures to do some work after the filling, such setting states and copying the shadow region back. Thanks again for reviewing the patch, looking forward to your insights and suggestions! Best Regards, Haoyu Li 2019-10-10 21:50 GMT+08:00, Stefan Johansson : > Thanks for the clarification =) > > Moving on to the next part, the code in the patch. So this won't be a > full review of the patch but just an initial comment that I would like > to be addressed first. 
> > The new function PSParallelCompact::fill_shadow_region() is more or less > a copy of PSParallelCompact::fill_region() and I understand that from a > proof of concept point of view it was the easy (and right) way to do it. > I would prefer if the code could be refactored so that fill_region() and > fill_shadow_region() share more code. There might be reasons that I've > missed, that prevents it, but we should at least explore how much code > can be shared. > > Thanks, > Stefan > > On 2019-10-10 15:10, Haoyu Li wrote: >> Hi Stefan, >> >> Thanks for your quick response! As to your concern about the OCA, I am >> the sole author of the patch. And it is the case as what the agreement >> states. >> Best Regrads, >> Haoyu Li, >> >> >> Stefan Johansson > > ?2019?10?10??? ??8:37??? >> >> Hi, >> >> On 2019-10-10 13:06, Haoyu Li wrote: >> > Hi Stefan, >> > >> > Thanks for your testing! One possible reason for the regressions >> in >> > simple tests is that the region dependencies maybe not heavy >> enough. >> > Because the locality of shadow regions is lower than that of heap >> > regions, writing to shadow regions will be slower than to normal >> > regions, and this is a part of the reason why I reuse shadow >> regions. >> > Therefore, if only a few shadow regions are created and not >> reused, the >> > overhead may not be amortized. >> >> I guess it is something like this. I thought that for "easy" heaps >> the >> shadow regions won't be used at all, and should therefor not really >> cost >> anything. >> >> > >> > As to the OCA, it is the case that I'm the only person signing the >> > agreement. Please let me know if you have any further questions. >> Thanks >> > again! >> >> Ok, so you are the sole author of the patch. The important part, as >> the >> agreement states, is: >> "no other person or entity, including my employer, has or will have >> rights with respect my contributions" >> >> Is that the case? 
>> >> Thanks, >> Stefan >> >> > >> > Best Regrads, >> > Haoyu Li >> > >> > Stefan Johansson > >> > > >> ?2019?10?8??? ??6:49 >> ??? >> > >> > Hi Haoyu, >> > >> > I've done some more testing and I haven't seen any issues >> with the >> > patch >> > so far and the performance looks promising in most cases. For >> simple >> > tests I've seen some regressions, but I'm not really sure >> why. Will do >> > some more digging. >> > >> > To move forward with this the first thing we need to do is >> making sure >> > that you being covered by the Oracle Contributor Agreement is >> enough. >> > From what we can see it is only you as an individual that >> has signed >> > the OCA and in that case it is important that this statement >> from the >> > OCA is fulfilled: "no other person or entity, including my >> employer, >> > has >> > or will have rights with respect my contributions" >> > >> > Is this the case for this contribution or should we have the >> university >> > sign the OCA as well? For more information regarding the OCA >> please >> > refer to: >> > https://www.oracle.com/technetwork/oca-faq-405384.pdf >> > >> > Thanks, >> > Stefan >> > >> > On 2019-09-16 16:02, Haoyu Li wrote: >> > > FYI, the evaluation results on OpenJDK 14 are plotted in >> the >> > attachment. >> > > I compute the full GC throughput by dividing the heap size >> before >> > full >> > > GC by the GC pause time, and the results are arithmetic >> mean >> > values of >> > > ten runs after a warm-up run. The evaluation is conducted on >> a >> > machine >> > > with dual Intel ?XeonTM E5-2618L v3 CPUs (2 sockets, 16 >> physical >> > cores >> > > with SMT enabled) and 64G DRAM. >> > > >> > > Best Regrads, >> > > Haoyu Li, >> > > Institute of Parallel and Distributed Systems(IPADS), >> > > School of Software, >> > > Shanghai Jiao Tong University >> > > >> > > >> > > Stefan Johansson > >> > > > >> > > > >> > > >>> ?2019?9?12??? ??5:34 >> > ??? 
>> > > >> > > Hi Haoyu, >> > > >> > > I recently came across your patch and I would like to >> pick up on >> > > some of the things Kim mentioned in his mails. I >> especially want >> > > evaluate and investigate if this is a technique we can >> use to >> > > improve the other GCs as well. To start that work I >> want to >> > take the >> > > patch for a spin in our internal performance testing. >> The patch >> > > doesn?t apply clean to the latest JDK repository, so >> if you could >> > > provide an updated patch that would be very helpful. >> > > >> > > It would also be great if you could share some more >> information >> > > around the results presented in the paper. For example, >> it >> > would be >> > > good to get the full command lines for the different >> > benchmarks so >> > > we can run them locally and reproduce the >> results you?ve seen. >> > > >> > > Thanks, >> > > Stefan >> > > >> > >> 12 mars 2019 kl. 03:21 skrev Haoyu Li >> >> > > >> > >> > > >>>: >> > >> >> > >> Hi Kim, >> > >> >> > >> Thanks for reviewing and testing the patch. If there >> are any >> > >> failures or performance degradation relevant to the >> work, please >> > >> let me know and I'll be very happy to keep improving >> it. >> > Also, any >> > >> suggestions about code improvements are well >> appreciated. >> > >> >> > >> I'm not quite sure if both G1 and Shenandoah have the >> similar >> > >> region dependency issue, since I haven't studied their >> GC >> > >> behaviors before. If they have, I'm also willing to >> propose >> > a more >> > >> general optimization. >> > >> >> > >> As to the memory overhead, I believe it will be low >> because this >> > >> patch exploits empty regions in the young space >> rather than >> > >> off-heap memory to allocate shadow regions, and also >> reuses the >> > >> /_source_region/ field of each /RegionData /to record >> the >> > >> correspongding shadow region index. 
We only introduce >> a new >> > >> integer filed /_shadow /in the RegionData class to >> indicate the >> > >> status of a region, a global /GrowableArray >> _free_shadow/ to >> > store >> > >> the indices of shadow regions, and a global >> /Monitor/ to protect >> > >> the array. These information might help if the memory >> overhead >> > >> need to be evaluated. >> > >> >> > >> Looking forward to your insight. >> > >> >> > >> Best Regrads, >> > >> Haoyu Li, >> > >> Institute of Parallel and Distributed Systems(IPADS), >> > >> School of Software, >> > >> Shanghai Jiao Tong University >> > >> >> > >> >> > >> Kim Barrett > >> > > > >> > >> > >> > > >>> ?2019?3?12??? ??6:11??? >> > >> >> > >> > On Mar 11, 2019, at 1:45 AM, Kim Barrett >> > >> > > > >> > > > >>> wrote: >> > >> > >> > >> >> On Jan 24, 2019, at 3:58 AM, Haoyu Li >> > >> > >> > >> > >> > >>> >> wrote: >> > >> >> >> > >> >> Hi Kim, >> > >> >> >> > >> >> I have ported my patch to OpenJDK 13 according >> to your >> > >> instructions in your last mail, and the patch is >> attached in >> > >> this mail. The patch does not change much since >> PSGC is >> > indeed >> > >> pretty stable. >> > >> >> >> > >> >> Also, I evaluate the correctness and >> performance of >> > PS full >> > >> GC with benchmarks from DaCapo, SPECjvm2008, and >> JOlden >> > suits >> > >> on a machine with dual Intel Xeon E5-2618L v3 >> CPUs(16 >> > physical >> > >> cores), 64G DRAM and linux kernel 4.17. The >> evaluation >> > result, >> > >> indicating 1.9X GC throughput improvement on >> average, is >> > >> attached, too. >> > >> >> >> > >> >> However, I have no idea how to further test >> this >> > patch for >> > >> both correctness and performance. Can I please >> get any >> > >> guidance from you or some sponsor? >> > >> > >> > >> > Sorry I missed that you had sent an updated >> version of the >> > >> patch. >> > >> > >> > >> > I?ve run the full regression suite across >> Oracle-supported >> > >> platforms. 
There are some >> > >> > failures, but there are almost always some >> failures in the >> > >> later tiers right now. I?ll start >> > >> > looking at them tomorrow to figure out whether >> any of them >> > >> are relevant. >> > >> > >> > >> > I?m also planning to run some of our performance >> > benchmarks. >> > >> > >> > >> > I?ve lightly skimmed the proposed changes. >> There might be >> > >> some code improvements >> > >> > to be made. >> > >> > >> > >> > I?m also wondering if this technique applies to >> other >> > >> collectors. It seems like both G1 and >> > >> > Shenandoah full gc?s might have similar >> issues? If so, a >> > >> solution that is ParallelGC-specific >> > >> > is less interesting than one that has broader >> > >> applicability. Though maybe this optimization >> > >> > is less important for G1 and Shenandoah, since >> they >> > actively >> > >> try to avoid full gc?s. >> > >> > >> > >> > I?m also not clear on how much additional >> memory might be >> > >> temporarily allocated by this >> > >> > mechanism. >> > >> >> > >> I?ve created a CR for this: >> > >> https://bugs.openjdk.java.net/browse/JDK-8220465 >> > >> >> > > >> > >> > -- Best Regrads, Haoyu Li, Institute of Parallel and Distributed Systems(IPADS), School of Software, Shanghai Jiao Tong University -------------- next part -------------- A non-text attachment was scrubbed... Name: shadow-region.patch Type: text/x-patch Size: 23000 bytes Desc: not available URL: From zgu at redhat.com Fri Oct 11 17:11:57 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 11 Oct 2019 13:11:57 -0400 Subject: RFR 8232009: Shenandoah: C2 load barrier does not match interpreter version Message-ID: <689f5b52-de3b-6a1a-0032-365dedf58414@redhat.com> Please review this patch that matches C2 load barrier to interpreter's implementation. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8232009
Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232009/webrev.00/

Test:
  hotspot_gc_shenandoah (fastdebug and release)
  with x86_32 and x86_64 JVMs on Linux

Thanks,

-Zhengyu

From sangheon.kim at oracle.com Fri Oct 11 17:34:03 2019
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Fri, 11 Oct 2019 10:34:03 -0700
Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)
In-Reply-To:
References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com>
 <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com>
 <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com>
 <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com>
Message-ID: <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com>

Hi Kim,

On 10/10/19 4:34 PM, Kim Barrett wrote:
>> On Oct 9, 2019, at 12:27 AM, sangheon.kim at oracle.com wrote:
>> Webrev:
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.3
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.3.inc
>> Testing: hs-tier 1~5, with/without UseNUMA
> I agree with Stefan and Thomas; this is looking pretty good.

:)

>
> There are some naming issues that I'm not going to comment on here.
> Stefan has already commented on some, and a bit of offline discussion
> suggests there's a larger naming discussion needed, but which can
> follow getting the functionality we want.
>
> There has been further discussion offline toward collapsing
> G1MemoryNodeManager to one class without virtual dispatch, and using
> G1NUMA name. I won't bother to re-iterate any of that here.

Okay.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1Allocator.cpp
> 186 assert(Heap_lock->owner() != NULL, "Should be owned on this thread's behalf.");
>
> Use assert_lock_strong(Heap_lock).

It didn't work.
assert_lock_strong() checks "lock->owned_by_self()" which is not equivalent to "lock::owner() != NULL". Am I missing something?
Since this is pre-existing code, I would like to leave as is. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp > 82 _storage.request_memory_on_node(page, _pages_per_region, node_index); > ... > 153 _storage.request_memory_on_node(idx, 1, node_index); > > I'm not sure request_memory_on_node belongs on the _storage object. > The current implementation just has the storage object (conditionally) > forward the request to the memory node manager object. These places in > the space mapper could just make the calls on the memory node manager > object directly (it is already being used nearby). And these places > don't need the conditionalization. > > I think making the space mapper directly call the memory node manager > here would remove the need for the proposed changes to the virtual > space class. Fixed to directly call G1NUMA::request_memory_on_node() (previously G1MemoryNodeManager). But G1NUMA can't calculate raw address, so I had to add base address at G1NUMA to get that. When I implemented it, I had similar opinion (not good fit for _storage) but I also wanted to avoid adding extra dependency at G1NUMA. But anyway I realized we can achieve it easily if we have base address. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/heapRegion.cpp > 464 st->print("|Node ID %02d", node_ids[this->node_index()]); > > The unchecked use of node_index() here can run afoul of an unset (so > UnknownNodeIndex) index. Added such checking. > > Also, no need for `this->` in `this->node_index()`. Removed. I'm aware but tried to follow local style which uses 'this->' in that code. 
> > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp > 81 virtual const uint max_search_depth() const { return 1; } > > s/const uint/uint/ > > Similarly for other declarations and definitions. Done. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp > 77 virtual void request_memory_on_node(char* aligned_address, size_t size_in_bytes, uint node_index) { } > > Shouldn't the aligned_address argument be typed "void*" rather than "char*"? The signature of that method changed to page based and newly added member is void*. i.e. G1NUMA, void* _base_address But eventually we need char* to call numa_make_local(char*, , ). > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/heapRegionManager.cpp > 112 if (mgr->has_multi_nodes() && requested_node_index != G1MemoryNodeManager::AnyNodeIndex) { > > I think it would be better to test the requested_node_index value > first. The "any" case is a common case. Done > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/heapRegionManager.cpp > 200 if(AlwaysPreTouch) { > > Add space after "if". Done > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/heapRegionManager.cpp > 311 return region_node_index == preferred_node_index; > > Fix indentation. Done > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/os.hpp > 393 static const int InvalidId = -1; > > This should probably be "InvalidNUMAId" or something like that. Changed to InvalidNUMAId. FYI, I filed JDK-8232156 for further investigation of initialization order related to G1NUMA. i.e. about removing G1NUMA::set_region_info(). New webrev includes: 1. Addressed most comments from Kim, Stefan and Thomas. 
2. Renamed G1MemoryNodeManager to G1NUMA and removed virtual calls. webrev: http://cr.openjdk.java.net/~sangheki/8220310/webrev.4 http://cr.openjdk.java.net/~sangheki/8220310/webrev.4.inc Testing: hs-tier 1 ~ 5 with/without UseNUMA Thanks, Sangheon > > ------------------------------------------------------------------------------ > From kim.barrett at oracle.com Fri Oct 11 18:30:00 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 11 Oct 2019 14:30:00 -0400 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> Message-ID: <8CA80180-7C9A-423D-8804-653CA59E3DF1@oracle.com> > On Oct 11, 2019, at 1:34 PM, sangheon.kim at oracle.com wrote: > On 10/10/19 4:34 PM, Kim Barrett wrote: >> src/hotspot/share/gc/g1/g1Allocator.cpp >> 186 assert(Heap_lock->owner() != NULL, "Should be owned on this thread's behalf."); >> >> Use assert_lock_strong(Heap_lock). > It didn't work. > assert_lock_strong() checks "lock->owned_by_self()" which is not equivalent to "lock->owner() != NULL". Am I missing something? > > Since this is pre-existing code, I would like to leave as is. Oh, bleh, you are right. I didn't read the existing code carefully enough. >> src/hotspot/share/gc/g1/heapRegion.cpp >> 464 st->print("|Node ID %02d", node_ids[this->node_index()]); >> >> The unchecked use of node_index() here can run afoul of an unset (so >> UnknownNodeIndex) index. > Added such checking. >> >> Also, no need for `this->` in `this->node_index()`. > Removed. > I'm aware but tried to follow local style which uses 'this->' in that code. There is one other use of this-> in that function (and one additional one in the whole file).
The *vast* majority of accesses use the implicit this. So I wouldn't describe that as the local style, rather a couple of weirdnesses that probably should be cleaned up. > webrev: > http://cr.openjdk.java.net/~sangheki/8220310/webrev.4 > http://cr.openjdk.java.net/~sangheki/8220310/webrev.4.inc > > Testing: hs-tier 1 ~ 5 with/without UseNUMA I've started looking at the new webrev. Looking good, and no comments yet, but not done yet either. From shade at redhat.com Fri Oct 11 18:36:02 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 11 Oct 2019 20:36:02 +0200 Subject: RFR (XS/T) 8232176: Shenandoah: new assert in ShenandoahEvacuationTask is too strong Message-ID: Recent regression: https://bugs.openjdk.java.net/browse/JDK-8232176 JDK-8231947 added the assert in ShenandoahEvacuationTask that is too strong. There is a corner case when the region is collection-set-pinned (CSP), and the oom-evac-protocol waits for GC thread to complete the evacuation. There is a short window where GC thread can see the CSP region before seeing cancellation request. It seems easier to remove the too strong assert for now. is_conc_move_allowed() == true is a lie right now. We can add cancelled_gc() check inside of it, but that would only be safe if we know that caller holds oom-evac-scope. The assertion failure reliably reproduces with -XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahOOMDuringEvacALot on SPECjvm2008.
Fix: https://cr.openjdk.java.net/~shade/8232176/webrev.01/ Testing: broken tests -- Thanks, -Aleksey From kim.barrett at oracle.com Fri Oct 11 20:12:44 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 11 Oct 2019 16:12:44 -0400 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <8CA80180-7C9A-423D-8804-653CA59E3DF1@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <8CA80180-7C9A-423D-8804-653CA59E3DF1@oracle.com> Message-ID: <4D393A46-3ADC-42DC-8C3D-D2132AB68D67@oracle.com> > On Oct 11, 2019, at 2:30 PM, Kim Barrett wrote: > >>> src/hotspot/share/gc/g1/heapRegion.cpp >>> 464 st->print("|Node ID %02d", node_ids[this->node_index()]); >>> >>> The unchecked use of node_index() here can run afoul of an unset (so >>> UnknownNodeIndex) index. >> Added such checking. >>> >>> Also, no need for `this->` in `this->node_index()`. >> Removed. >> I'm aware but tried to follow local style which uses 'this->' in that code. > > There is one other use of this-> in that function (and one additional one in the whole file). > The *vast* majority of accesses use the implicit this. So I wouldn't describe that as the > local style, rather a couple of weirdnesses that probably should be cleaned up. Looks like this has been fixed in the latest version. >> webrev: >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.4 >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.4.inc >> >> Testing: hs-tier 1 ~ 5 with/without UseNUMA > > I've started looking at the new webrev. Looking good, and no comments yet, but not done yet either.
The only other thing I found was this: ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1NUMA.hpp 85 // Print current active memory node count. 86 uint num_active_nodes() const; "Print"? Also, "current"? It doesn't change, I think. ------------------------------------------------------------------------------ Other than that, looks good to me. I don't need another webrev for a fix to that comment. From sangheon.kim at oracle.com Fri Oct 11 22:07:55 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Fri, 11 Oct 2019 15:07:55 -0700 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <4D393A46-3ADC-42DC-8C3D-D2132AB68D67@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <8CA80180-7C9A-423D-8804-653CA59E3DF1@oracle.com> <4D393A46-3ADC-42DC-8C3D-D2132AB68D67@oracle.com> Message-ID: Hi Kim, On 10/11/19 1:12 PM, Kim Barrett wrote: >> On Oct 11, 2019, at 2:30 PM, Kim Barrett wrote: >> >>>> src/hotspot/share/gc/g1/heapRegion.cpp >>>> 464 st->print("|Node ID %02d", node_ids[this->node_index()]); >>>> >>>> The unchecked use of node_index() here can run afoul of an unset (so >>>> UnknownNodeIndex) index. >>> Added such checking. >>>> Also, no need for `this->` in `this->node_index()`. >>> Removed. >>> I'm aware but tried to follow local style which uses 'this->' in that code. >> There is one other use of this-> in that function (and one additional one in the whole file). >> The *vast* majority of accesses use the implicit this. So I wouldn't describe that as the >> local style, rather a couple of weirdnesses that probably should be cleaned up. > Looks like this has been fixed in the latest version.
Yes > >>> webrev: >>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.4 >>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.4.inc >>> >>> Testing: hs-tier 1 ~ 5 with/without UseNUMA >> I've started looking at the new webrev. Looking good, and no comments yet, but not done yet either. > The only other thing I found was this: > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1NUMA.hpp > 85 // Print current active memory node count. > 86 uint num_active_nodes() const; > > "Print"? Also, "current"? It doesn't change, I think. Okay, changed to 'Returns active memory node count'. > > ------------------------------------------------------------------------------ > > Other than that, looks good to me. I don't need another webrev for > a fix to that comment. Nice to hear! Many thanks for all your thorough reviews. Thanks, Sangheon From maoliang.ml at alibaba-inc.com Sat Oct 12 11:51:26 2019 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Sat, 12 Oct 2019 19:51:26 +0800 Subject: Re: G1 patch of elastic Java heap In-Reply-To: <5f02f337-f479-55f6-351e-867507845f65@oracle.com> References: <6270ce59-4a8e-431e-9ccf-f6d2c0f927eb.maoliang.ml@alibaba-inc.com> <1267a5dd2cf6cc1d03df64d07a06ba0f45195951.camel@oracle.com> <3140197d-8cab-4a86-af92-58431c74cb6b.maoliang.ml@alibaba-inc.com> <6cc5bdd7-c076-472f-8a36-8294c6cbfe21.maoliang.ml@alibaba-inc.com>, <5f02f337-f479-55f6-351e-867507845f65@oracle.com> Message-ID: <66393648-73b1-4a45-9d48-c8fcf94789fa.maoliang.ml@alibaba-inc.com> Hi Thomas, The manual generation limit can be put aside currently since we know it might not be so general for a GC. We can focus on how to change heap size and return memory in runtime first. GCTimeRatio is a good metric to measure the health of a Java application and I have considered using that. But finally I chose a simple way just like the periodic old GC.
Guaranteeing a long enough young GC interval is an alternative way to keep the GCTimeRatio at a healthy state. I'm absolutely ok to use GCTimeRatio instead of the fixed young GC interval. This part is the same as in ZGC or Shenandoah for how to balance the desired memory size and GC frequency. I'm open to any good solution and we are already on the same page for this issue I think:) A big difference of our implementation is evaluating heap resizing in any young GC instead of a concurrent gc cycle which I think is swifter and more immediate. The concurrent map/unmap mechanism gets rid of the additional pause time. My thought is the heap shrink/expand can be all determined in young GC pause and performed in concurrent thread which could exclude the considerable time cost by OS interface. Most of our Java users are intolerant to those pause spikes caused by page fault which can be up to seconds. And we also found the issue of time cost by map/unmap in ZGC. A direct advantage of the young GC resizing and concurrent memory free mechanism is for implementing SoftMaxHeapSize. The heap size can be changed after last mixed GC. The young GC won't have longer pause and the memory can be freed concurrently without side effect. Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2019 Oct. 11 (Fri.) 19:02 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: G1 patch of elastic Java heap Hi, On 10.10.19 15:48, Liang Mao wrote: > Hi Thomas, > > Thank you for the feedback. > You are right about some points that the present code seems to separate > the heap into young and old gen pools. In OpenJDK8, there's no adaptive-ihop so fixed ihop > and MaxNewSize can clearly separate young gen and old gen. I'm also thinking about how to design it better > in upstream of OpenJDK G1. > > There is a tradeoff between memory and GC frequency. More frequent GC > uses less memory.
We found our online service applications keep large young generation for > potential query traffic but most of time the young GC frequency is quite low. Memory can be easily saved > by using smaller young gen. > In Shenandoah or ZGC, there is only 1 generation and it's > straightforward to determine if memory is wasted and can be returned. G1 has 2 generations, in remark phase > MinHeapFreeRatio/MaxHeapFreeRatio cannot tell the young generation is rather wasted for running 2 minutes > without a young GC and we can return a lot of memory. Each generation's GC interval or time ratio > spent on mutator/gc you mentioned seems more intuitive. > > The explicit limitation of generation may not be a good design from G1 > GC's perspective. From the operation's point of view, it is easy for manipulating JVM. There is a > simple relationship: larger network traffic -> higher memory allocation rate -> larger young > generation. So cluster operation can easily set the young generation as 10% of max young gen > size to every Java instance if the network traffic is guaranteed to be below 10% for a period of time. > > I'm not sticking to the current implementation to create clear boundary > between young and old gen, especially for newer OpenJDK versions and I've been thinking of unifying > the 2 generations' resizing within the single memory pool of heap along with Xms. The periodic > uncommit mode does not strictly separate the young/old gen. Current implementation calculates the > average GC interval and keep it in a certain range between a low bound and high bound and will immediately > trigger an expansion if a single GC interval smaller than a threshold. We can use a similar > policy to estimate a target young generation capacity and adjust the capacity of old generation after a > concurrent cycle. The 2 parts together can be the target heap capacity. The capacity can vary between > Xms and Xmx.
The difference with current G1 is it can be resized in a young GC not only remark. Thank you for presenting your problem (and not insisting on a particular solution upfront). Summary of this long text: In case of "low" activity the user wants to limit the heap resulting in giving back memory. Currently, all the user can do is specifying the maximum amount of work the gc is allowed to use (GCTimeRatio). At least G1, as soon as the time spent in gc compared to mutator time is lower than GCTimeRatio (typically achieved by expanding the heap), it "never" shrinks the heap back (at least not based on that ratio). Which wastes lots of space, which is the problem. We all agree that this is a problem :) I believe we only differ on what knobs the user should have available to achieve this. Here are my current suggestions: One option that I suggested earlier, is that instead of setting generation sizes (or heap sizes) manually (which could be fine in some cases for other reasons) could be thinking a bit differently about GCTimeRatio than now: currently it is the maximum amount of GC activity the user can bear, so we should make the GC to use less. The slight tweak here could be that we assume that any GC activity below that is fine :) Ie. if current GC activity is very low compared to mutator activity (far below what GCTimeRatio allows), and expected additional GC activity caused by this forced GC cycle would not exceed that GCTimeRatio, why not do the GC? Think of a "minimum" GCTimeRatio; in some way this is very much like minimum and maximum GC intervals only with much more flexibility for the GC to meet (also this metric is independent of the environment, e.g. hardware, while setting actual values of sizes needs tuning). I agree that there is then not an immediately obvious relation between external input (the traffic in your example) to what you should set that "minimum" GCTimeRatio to. 
However since there is a relation to young gen size and GCTimeRatio I think this can be figured out. This is what ZGC does and I think would be worth trying out before thinking about adding a G1 specific way of achieving this or a similar effect. The other option which is more direct would be implementing and changing target heap size during runtime: it would also automatically shrink the heap. I believe that if you were able to modify the current adaptive IHOP's "target" heap size from outside, G1 would already automatically give back memory; in conjunction with the "Promptly Return ...", it would also make sure that in very low mutator activity cases the GC cycle would continue. As for whether this feature would be accepted for inclusion into G1: there is already a SoftMaxHeapSize switch in the JDK, so I guess this is a non-issue. Note that you can *already*, if you know that from a particular time on there will be little activity, modify the "Promptly Return..." settings so that it will immediately start cleaning up and compacting the heap; you can even force maximum compaction at that time by issuing a full gc if service interruption is not an issue. > > In order to do swift heap resizing we have to conquer the over head of > memory request/release from OS. The memory unmap and map(including the page fault) cost significant > time. So we use an intuitive way to have a concurrent thread to do the map/unmap/pretouch. The free > regions will be synchronized in GC pause. In our applications, a typical G1 remark cost ~100ms of pause. I > haven't tested latest G1 but based on our experimental data, the pause can be easily doubled if done > considerable map/unmaps. > That's a related but distinct problem and a solution that seems at least worth trying :) > > All of above are our thoughts and the present implementation is kind of > reference. Please let me know if > I answered all your questions. 
Hope we can come to an agreement in some > points and conceive a good design > in latest G1 GC :) > Thanks, Thomas From thomas.schatzl at oracle.com Sat Oct 12 15:00:19 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Sat, 12 Oct 2019 17:00:19 +0200 Subject: G1 patch of elastic Java heap In-Reply-To: <66393648-73b1-4a45-9d48-c8fcf94789fa.maoliang.ml@alibaba-inc.com> References: <6270ce59-4a8e-431e-9ccf-f6d2c0f927eb.maoliang.ml@alibaba-inc.com> <1267a5dd2cf6cc1d03df64d07a06ba0f45195951.camel@oracle.com> <3140197d-8cab-4a86-af92-58431c74cb6b.maoliang.ml@alibaba-inc.com> <6cc5bdd7-c076-472f-8a36-8294c6cbfe21.maoliang.ml@alibaba-inc.com> ,<5f02f337-f479-55f6-351e-867507845f65@oracle.com> <66393648-73b1-4a45-9d48-c8fcf94789fa.maoliang.ml@alibaba-inc.com> Message-ID: <858f43f72ea9325907a6bd6955768af3d64e57fc.camel@oracle.com> Hi, On Sat, 2019-10-12 at 19:51 +0800, Liang Mao wrote: > Hi Thomas, > > The manual generation limit can be put aside currently since we know > it might not be so general for a GC. We can focus on how to change > heap size and return memory in runtime first. > > GCTimeRatio is a good metric to measure the health of a Java > application and I have considered using that. But finally I chose > a simple way just like the periodic old GC. Guaranteeing a long > enough young GC interval is an alternative way to keep the > GCTimeRatio at a healthy state. > I'm absolutely ok to use GCTimeRatio instead of the fixed young GC > interval. This part is the same as in ZGC or Shenandoah for how to balance > the desired memory size and GC frequency. I'm open to any good > solution and we are already on the same page for this issue > I think:) +1 > A big difference of our implementation is evaluating heap resizing in > any young GC instead of a concurrent gc cycle which I think is > swifter and more immediate. The concurrent map/unmap > mechanism gets rid of the additional pause time.
My thought is the > heap shrink/expand can be all determined in young GC pause and > performed in concurrent thread which could exclude the > considerable time cost by OS interface. Most of our Java users are > intolerant to those pause spikes caused by page fault which can be up > to seconds. And we also found the issue of time cost by map/unmap in > ZGC. > > A direct advantage of the young GC resizing and concurrent memory > free mechanism is for implementing SoftMaxHeapSize. The heap size can > be changed after last mixed GC. The young GC won't have longer > pause and the memory can be freed concurrently without side effect. Agree and agree. Both evaluating and giving back memory at any gc sounds nice, and doing that without incurring the costs in the pause is even better :) Thanks, Thomas From sangheon.kim at oracle.com Sun Oct 13 06:00:18 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Sat, 12 Oct 2019 23:00:18 -0700 Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3) In-Reply-To: References: Message-ID: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> Hi all, Previous patch conflicts, so I'm posting rebased one. Webrev: http://cr.openjdk.java.net/~sangheki/8220311/webrev.2 Testing: hs-tier 1 ~ 5, with/without UseNUMA Thanks, Sangheon On 10/1/19 9:53 AM, sangheon.kim at oracle.com wrote: > Hi all, > > As JDK-8220310 changed a lot, I'm posting next webrev. > Previous webrev just conflicts. > > Webrev: > http://cr.openjdk.java.net/~sangheki/8220311/webrev.1 > http://cr.openjdk.java.net/~sangheki/8220311/webrev.1.inc > Testing: hs-tier 1 ~ 5 with +- UseNUMA > > Thanks, > Sangheon > > > On 9/4/19 12:16 AM, sangheon.kim at oracle.com wrote: >> Hi all, >> >> Please review this patch making G1 NUMA aware. >> This is the second part of G1 NUMA implementation: >> - Making Survivor region NUMA aware.
>> >> CR: https://bugs.openjdk.java.net/browse/JDK-8220311 >> Webrev: http://cr.openjdk.java.net/~sangheki/8220311/webrev.0 >> Testing: hs-tier 1 ~ 5 with +- UseNUMA >> >> Thanks, >> Sangheon > From sangheon.kim at oracle.com Sun Oct 13 06:16:27 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Sat, 12 Oct 2019 23:16:27 -0700 Subject: RFR(L): 8220312: Implementation: NUMA-Aware Memory Allocation for G1, Logging (3/3) In-Reply-To: References: Message-ID: Hi all, Previous patch conflicts because of JDK-8220310, I'm posting rebased one with some refactoring. Webrev: http://cr.openjdk.java.net/~sangheki/8220312/webrev.2 Testing: hs-tier 1 ~ 5, with/without UseNUMA Here's the full patch of 8220310, 8220311 and 8220312. http://cr.openjdk.java.net/~sangheki/8220312/webrev.full.2/ Thanks, Sangheon On 10/2/19 10:11 AM, sangheon.kim at oracle.com wrote: > Hi, > > Here's the rebased webrev with minor changes. > > Webrev: > http://cr.openjdk.java.net/~sangheki/8220312/webrev.1 > http://cr.openjdk.java.net/~sangheki/8220312/webrev.1.inc > Testing: hs-tier 1 ~ 5 with +- UseNUMA > > FYI, here's the full patch including JDK-8220310, 8220311, 8220312. > http://cr.openjdk.java.net/~sangheki/8220312/webrev.full/ > > Thanks, > Sangheon > > > On 9/4/19 12:16 AM, sangheon.kim at oracle.com wrote: >> Hi all, >> >> Please review this patch making G1 NUMA aware. >> This is the last part of G1 NUMA implementation: >> - Adding logs and stat. 
>> >> CR: https://bugs.openjdk.java.net/browse/JDK-8220312 >> Webrev: http://cr.openjdk.java.net/~sangheki/8220312/webrev.0 >> Testing: hs-tier 1 ~ 8 with +- UseNUMA >> >> Thanks, >> Sangheon > From maoliang.ml at alibaba-inc.com Mon Oct 14 03:52:19 2019 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Mon, 14 Oct 2019 11:52:19 +0800 Subject: Re: G1 patch of elastic Java heap In-Reply-To: <858f43f72ea9325907a6bd6955768af3d64e57fc.camel@oracle.com> References: <6270ce59-4a8e-431e-9ccf-f6d2c0f927eb.maoliang.ml@alibaba-inc.com> <1267a5dd2cf6cc1d03df64d07a06ba0f45195951.camel@oracle.com> <3140197d-8cab-4a86-af92-58431c74cb6b.maoliang.ml@alibaba-inc.com> <6cc5bdd7-c076-472f-8a36-8294c6cbfe21.maoliang.ml@alibaba-inc.com> , <5f02f337-f479-55f6-351e-867507845f65@oracle.com> <66393648-73b1-4a45-9d48-c8fcf94789fa.maoliang.ml@alibaba-inc.com>, <858f43f72ea9325907a6bd6955768af3d64e57fc.camel@oracle.com> Message-ID: <77e0e95e-8500-46e6-8b80-6f25b33f6c7f.maoliang.ml@alibaba-inc.com> Hi Thomas, Thank you for the recognition:) Since we both agree on some clear specific points, I will try to extract them from current implementation and create a patch in OpenJDK upstream branch so we can continue discussion on the code level. Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2019 Oct. 12 (Sat.) 23:00 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: G1 patch of elastic Java heap Hi, On Sat, 2019-10-12 at 19:51 +0800, Liang Mao wrote: > Hi Thomas, > > The manual generation limit can be put aside currently since we know > it might not be so general for a GC. We can focus on how to change > heap size and return memory in runtime first. > > GCTimeRatio is a good metric to measure the health of a Java > application and I have considered using that. But finally I chose > a simple way just like the periodic old GC.
Guaranteeing a long > enough young GC interval is an alternative way to keep the > GCTimeRatio at a healthy state. > I'm absolutely ok to use GCTimeRatio instead of the fixed young GC > interval. This part is the same as in ZGC or Shenandoah for how to balance > the desired memory size and GC frequency. I'm open to any good > solution and we are already on the same page for this issue > I think:) +1 > A big difference of our implementation is evaluating heap resizing in > any young GC instead of a concurrent gc cycle which I think is > swifter and more immediate. The concurrent map/unmap > mechanism gets rid of the additional pause time. My thought is the > heap shrink/expand can be all determined in young GC pause and > performed in concurrent thread which could exclude the > considerable time cost by OS interface. Most of our Java users are > intolerant to those pause spikes caused by page fault which can be up > to seconds. And we also found the issue of time cost by map/unmap in > ZGC. > > A direct advantage of the young GC resizing and concurrent memory > free mechanism is for implementing SoftMaxHeapSize. The heap size can > be changed after last mixed GC. The young GC won't have longer > pause and the memory can be freed concurrently without side effect. Agree and agree. Both evaluating and giving back memory at any gc sounds nice, and doing that without incurring the costs in the pause is even better :) Thanks, Thomas From rkennke at redhat.com Mon Oct 14 09:02:45 2019 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 14 Oct 2019 11:02:45 +0200 Subject: RFR (XS/T) 8232176: Shenandoah: new assert in ShenandoahEvacuationTask is too strong In-Reply-To: References: Message-ID: <82671652-4969-f7d2-2a0b-a8d869f9904e@redhat.com> Hmm, ok. Roman > Recent regression: > https://bugs.openjdk.java.net/browse/JDK-8232176 > > JDK-8231947 added the assert in ShenandoahEvacuationTask that is too strong.
There is a corner case > when the region is collection-set-pinned (CSP), and the oom-evac-protocol waits for GC thread to > complete the evacuation. There is a short window where GC thread can see the CSP region before > seeing cancellation request. > > It seems easier to remove the too strong assert for now. is_conc_move_allowed() == true is a lie > right now. We can add cancelled_gc() check inside of it, but that would only be safe if we know that > caller holds oom-evac-scope. > > The assertion failure reliably reproduces with -XX:ShenandoahGCHeuristics=aggressive > -XX:+ShenandoahOOMDuringEvacALot on SPECjvm2008. > > Fix: > https://cr.openjdk.java.net/~shade/8232176/webrev.01/ > > Testing: broken tests > From shade at redhat.com Mon Oct 14 09:07:12 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 14 Oct 2019 11:07:12 +0200 Subject: RFR (XS/T) 8232176: Shenandoah: new assert in ShenandoahEvacuationTask is too strong In-Reply-To: <82671652-4969-f7d2-2a0b-a8d869f9904e@redhat.com> References: <82671652-4969-f7d2-2a0b-a8d869f9904e@redhat.com> Message-ID: Thanks, pushed. -Aleksey On 10/14/19 11:02 AM, Roman Kennke wrote: > Hmm, ok. > > Roman > > >> Recent regression: >> https://bugs.openjdk.java.net/browse/JDK-8232176 >> >> JDK-8231947 added the assert in ShenandoahEvacuationTask that is too strong. There is a corner case >> when the region is collection-set-pinned (CSP), and the oom-evac-protocol waits for GC thread to >> complete the evacuation. There is a short window where GC thread can see the CSP region before >> seeing cancellation request. >> >> It seems easier to remove the too strong assert for now. is_conc_move_allowed() == true is a lie >> right now. We can add cancelled_gc() check inside of it, but that would only be safe if we know that >> caller holds oom-evac-scope. >> >> The assertion failure reliably reproduces with -XX:ShenandoahGCHeuristics=aggressive >> -XX:+ShenandoahOOMDuringEvacALot on SPECjvm2008. 
>> >> Fix: >> https://cr.openjdk.java.net/~shade/8232176/webrev.01/ >> >> Testing: broken tests >> > From shade at redhat.com Mon Oct 14 09:20:32 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 14 Oct 2019 11:20:32 +0200 Subject: RFR (XS) 8232205: Shenandoah: missing "Update References" -> "Update Roots" tracing Message-ID: Bug: https://bugs.openjdk.java.net/browse/JDK-8232205 Noticed that -Xlog:gc+stats does not print "Update Roots" section for "Update References". This is a regression since JDK-8223951. Fix: https://cr.openjdk.java.net/~shade/8232205/webrev.01/ Testing: hotspot_gc_shenandoah, eyeballing gc+stats -- Thanks, -Aleksey From stefan.johansson at oracle.com Mon Oct 14 13:00:22 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Mon, 14 Oct 2019 15:00:22 +0200 Subject: [PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance In-Reply-To: References: <4F02DD53-EA98-4A1A-B871-C6E9D9610B2C@oracle.com> <9B69AFD1-7AE2-4B50-BFCF-C9C6E2594240@oracle.com> <8ef5b52e-d6fc-3073-5ca7-44c87c1eb981@oracle.com> <92277aab-0578-9e2c-3f4f-55ae1e8c94a9@oracle.com> Message-ID: <400df998-171a-5bbe-9f3e-01af1781afb4@oracle.com> Thanks for the quick update Haoyu, This is a great improvement and I will try to find time to look into the patch in more detail in the coming weeks. Thanks, Stefan On 2019-10-11 14:49, Haoyu Li wrote: > Hi Stefan, > > Thanks for your suggestion! It is very redundant that > PSParallelCompact::fill_shadow_region() copies most code from > PSParallelCompact::fill_region(), and therefore I've refactored these > two functions to share as much code as possible. The updated patch is > attached. > > Specifically, the closure, which moves objects, in > PSParallelCompact::fill_region() is now declared as a template of > either MoveAndUpdateClosure or ShadowClosure. So by controlling the > type of closure when invoking the function, we can decide whether to > fill a normal region or a shadow one.
Thus, almost all code in > PSParallelCompact::fill_region() can be reused. > > Besides, a virtual function named complete_region() is added in both > closures to do some work after the filling, such as setting states and > copying the shadow region back. > > Thanks again for reviewing the patch, looking forward to your insights > and suggestions! > > Best Regards, > Haoyu Li > > 2019-10-10 21:50 GMT+08:00, Stefan Johansson : >> Thanks for the clarification =) >> >> Moving on to the next part, the code in the patch. So this won't be a >> full review of the patch but just an initial comment that I would like >> to be addressed first. >> >> The new function PSParallelCompact::fill_shadow_region() is more or less >> a copy of PSParallelCompact::fill_region() and I understand that from a >> proof of concept point of view it was the easy (and right) way to do it. >> I would prefer if the code could be refactored so that fill_region() and >> fill_shadow_region() share more code. There might be reasons that I've >> missed, that prevent it, but we should at least explore how much code >> can be shared. >> >> Thanks, >> Stefan >> >> On 2019-10-10 15:10, Haoyu Li wrote: >>> Hi Stefan, >>> >>> Thanks for your quick response! As to your concern about the OCA, I am >>> the sole author of the patch. And it is the case as what the agreement >>> states. >>> Best Regards, >>> Haoyu Li, >>> >>> >>> Stefan Johansson wrote on 2019-10-10, 8:37: >>> >>> Hi, >>> >>> On 2019-10-10 13:06, Haoyu Li wrote: >>> > Hi Stefan, >>> > >>> > Thanks for your testing! One possible reason for the regressions >>> in >>> > simple tests is that the region dependencies may not be heavy >>> enough. >>> > Because the locality of shadow regions is lower than that of heap >>> > regions, writing to shadow regions will be slower than to normal >>> > regions, and this is a part of the reason why I reuse shadow >>> regions.
>>> > Therefore, if only a few shadow regions are created and not >>> reused, the >>> > overhead may not be amortized. >>> >>> I guess it is something like this. I thought that for "easy" heaps >>> the >>> shadow regions won't be used at all, and should therefore not really >>> cost >>> anything. >>> >>> > >>> > As to the OCA, it is the case that I'm the only person signing the >>> > agreement. Please let me know if you have any further questions. >>> Thanks >>> > again! >>> >>> Ok, so you are the sole author of the patch. The important part, as >>> the >>> agreement states, is: >>> "no other person or entity, including my employer, has or will have >>> rights with respect my contributions" >>> >>> Is that the case? >>> >>> Thanks, >>> Stefan >>> >>> > >>> > Best Regards, >>> > Haoyu Li >>> > >>> > Stefan Johansson wrote on 2019-10-08, 6:49: >>> > >>> > Hi Haoyu, >>> > >>> > I've done some more testing and I haven't seen any issues >>> with the >>> > patch >>> > so far and the performance looks promising in most cases. For >>> simple >>> > tests I've seen some regressions, but I'm not really sure >>> why. Will do >>> > some more digging. >>> > >>> > To move forward with this the first thing we need to do is >>> making sure >>> > that you being covered by the Oracle Contributor Agreement is >>> enough. >>> > From what we can see it is only you as an individual that >>> has signed >>> > the OCA and in that case it is important that this statement >>> from the >>> > OCA is fulfilled: "no other person or entity, including my >>> employer, >>> > has >>> > or will have rights with respect my contributions" >>> > >>> > Is this the case for this contribution or should we have the >>> university >>> > sign the OCA as well?
For more information regarding the OCA >>> please >>> > refer to: >>> > https://www.oracle.com/technetwork/oca-faq-405384.pdf >>> > >>> > Thanks, >>> > Stefan >>> > >>> > On 2019-09-16 16:02, Haoyu Li wrote: >>> > > FYI, the evaluation results on OpenJDK 14 are plotted in >>> the >>> > attachment. >>> > > I compute the full GC throughput by dividing the heap size >>> before >>> > full >>> > > GC by the GC pause time, and the results are arithmetic >>> mean >>> > values of >>> > > ten runs after a warm-up run. The evaluation is conducted on >>> a >>> > machine >>> > > with dual Intel ?XeonTM E5-2618L v3 CPUs (2 sockets, 16 >>> physical >>> > cores >>> > > with SMT enabled) and 64G DRAM. >>> > > >>> > > Best Regrads, >>> > > Haoyu Li, >>> > > Institute of Parallel and Distributed Systems(IPADS), >>> > > School of Software, >>> > > Shanghai Jiao Tong University >>> > > >>> > > >>> > > Stefan Johansson >> >>> > >> > >>> > > >> >>> > >> >>> ?2019?9?12??? ??5:34 >>> > ??? >>> > > >>> > > Hi Haoyu, >>> > > >>> > > I recently came across your patch and I would like to >>> pick up on >>> > > some of the things Kim mentioned in his mails. I >>> especially want >>> > > evaluate and investigate if this is a technique we can >>> use to >>> > > improve the other GCs as well. To start that work I >>> want to >>> > take the >>> > > patch for a spin in our internal performance testing. >>> The patch >>> > > doesn?t apply clean to the latest JDK repository, so >>> if you could >>> > > provide an updated patch that would be very helpful. >>> > > >>> > > It would also be great if you could share some more >>> information >>> > > around the results presented in the paper. For example, >>> it >>> > would be >>> > > good to get the full command lines for the different >>> > benchmarks so >>> > > we can run them locally and reproduce the >>> results you?ve seen. >>> > > >>> > > Thanks, >>> > > Stefan >>> > > >>> > >> 12 mars 2019 kl. 
03:21 skrev Haoyu Li >>> >>> > > >>> > >> >> >> >>>: >>> > >> >>> > >> Hi Kim, >>> > >> >>> > >> Thanks for reviewing and testing the patch. If there >>> are any >>> > >> failures or performance degradation relevant to the >>> work, please >>> > >> let me know and I'll be very happy to keep improving >>> it. >>> > Also, any >>> > >> suggestions about code improvements are well >>> appreciated. >>> > >> >>> > >> I'm not quite sure if both G1 and Shenandoah have the >>> similar >>> > >> region dependency issue, since I haven't studied their >>> GC >>> > >> behaviors before. If they have, I'm also willing to >>> propose >>> > a more >>> > >> general optimization. >>> > >> >>> > >> As to the memory overhead, I believe it will be low >>> because this >>> > >> patch exploits empty regions in the young space >>> rather than >>> > >> off-heap memory to allocate shadow regions, and also >>> reuses the >>> > >> /_source_region/ field of each /RegionData /to record >>> the >>> > >> correspongding shadow region index. We only introduce >>> a new >>> > >> integer filed /_shadow /in the RegionData class to >>> indicate the >>> > >> status of a region, a global /GrowableArray >>> _free_shadow/ to >>> > store >>> > >> the indices of shadow regions, and a global >>> /Monitor/ to protect >>> > >> the array. These information might help if the memory >>> overhead >>> > >> need to be evaluated. >>> > >> >>> > >> Looking forward to your insight. >>> > >> >>> > >> Best Regrads, >>> > >> Haoyu Li, >>> > >> Institute of Parallel and Distributed Systems(IPADS), >>> > >> School of Software, >>> > >> Shanghai Jiao Tong University >>> > >> >>> > >> >>> > >> Kim Barrett >> >>> > >> > >>> > >> >> >>> > >> >>> ?2019?3?12??? ??6:11??? 
>>> > >> >>> > >> > On Mar 11, 2019, at 1:45 AM, Kim Barrett >>> > >> >> >> > >>> > >> >> >>> wrote: >>> > >> > >>> > >> >> On Jan 24, 2019, at 3:58 AM, Haoyu Li >>> > >>> > >>> > >> >> >>> > >>> >>> wrote: >>> > >> >> >>> > >> >> Hi Kim, >>> > >> >> >>> > >> >> I have ported my patch to OpenJDK 13 according >>> to your >>> > >> instructions in your last mail, and the patch is >>> attached in >>> > >> this mail. The patch does not change much since >>> PSGC is >>> > indeed >>> > >> pretty stable. >>> > >> >> >>> > >> >> Also, I evaluate the correctness and >>> performance of >>> > PS full >>> > >> GC with benchmarks from DaCapo, SPECjvm2008, and >>> JOlden >>> > suits >>> > >> on a machine with dual Intel Xeon E5-2618L v3 >>> CPUs(16 >>> > physical >>> > >> cores), 64G DRAM and linux kernel 4.17. The >>> evaluation >>> > result, >>> > >> indicating 1.9X GC throughput improvement on >>> average, is >>> > >> attached, too. >>> > >> >> >>> > >> >> However, I have no idea how to further test >>> this >>> > patch for >>> > >> both correctness and performance. Can I please >>> get any >>> > >> guidance from you or some sponsor? >>> > >> > >>> > >> > Sorry I missed that you had sent an updated >>> version of the >>> > >> patch. >>> > >> > >>> > >> > I?ve run the full regression suite across >>> Oracle-supported >>> > >> platforms. There are some >>> > >> > failures, but there are almost always some >>> failures in the >>> > >> later tiers right now. I?ll start >>> > >> > looking at them tomorrow to figure out whether >>> any of them >>> > >> are relevant. >>> > >> > >>> > >> > I?m also planning to run some of our performance >>> > benchmarks. >>> > >> > >>> > >> > I?ve lightly skimmed the proposed changes. >>> There might be >>> > >> some code improvements >>> > >> > to be made. >>> > >> > >>> > >> > I?m also wondering if this technique applies to >>> other >>> > >> collectors. It seems like both G1 and >>> > >> > Shenandoah full gc?s might have similar >>> issues? 
If so, a >>> > >> solution that is ParallelGC-specific >>> > >> > is less interesting than one that has broader >>> > >> applicability. Though maybe this optimization >>> > >> > is less important for G1 and Shenandoah, since >>> they >>> > actively >>> > >> try to avoid full gc?s. >>> > >> > >>> > >> > I?m also not clear on how much additional >>> memory might be >>> > >> temporarily allocated by this >>> > >> > mechanism. >>> > >> >>> > >> I?ve created a CR for this: >>> > >> https://bugs.openjdk.java.net/browse/JDK-8220465 >>> > >> >>> > > >>> > >>> >> > > From stefan.johansson at oracle.com Mon Oct 14 15:29:43 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Mon, 14 Oct 2019 17:29:43 +0200 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> Message-ID: Hi Sangheon (and Kim), On 2019-10-11 19:34, sangheon.kim at oracle.com wrote: >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp >> ?? 82?????? _storage.request_memory_on_node(page, _pages_per_region, >> node_index); >> ... >> ? 153???????? _storage.request_memory_on_node(idx, 1, node_index); >> >> I'm not sure request_memory_on_node belongs on the _storage object. >> The current implementation just has the storage object (conditionally) >> forward the request to the memory node manager object. These places in >> the space mapper could just make the calls on the memory node manager >> object directly (it is already being used nearby).? And these places >> don't need the conditionalization. 
>>
>> I think making the space mapper directly call the memory node manager
>> here would remove the need for the proposed changes to the virtual
>> space class.
> Fixed to directly call G1NUMA::request_memory_on_node() (previously
> G1MemoryNodeManager).
> But G1NUMA can't calculate raw address, so I had to add base address at
> G1NUMA to get that.
>
> When I implemented it, I had similar opinion (not good fit for _storage)
> but I also wanted to avoid adding extra dependency at G1NUMA. But anyway
> I realized we can achieve it easily if we have base address.

I don't fully agree here. I think having the storage do the call to G1NUMA does make sense because it knows how to translate a page index to a real address. It also goes along the same lines as the pretouch() call in commit_regions(), but I won't object if we want to leave it in the mapper.

If we do that, there are still some changes required, because we currently will call G1NUMA::request_memory_on_node() for all mappers and all mappers will then use the heap's base address when calling numa_make_local(). So I propose two changes:

1. Expose G1PageBasedVirtualSpace::page_start() or use G1CollectedHeap::bottom_addr_for_region(uint index) and let the mapper use it to call request_memory_on_node() with a real address rather than a page index. Another solution could be to change the function even more and call it request_heap_region_on_node() and just pass in the region index and then use G1CollectedHeap::bottom_addr_for_region(uint index) in G1NUMA.

2. Add a state to the mappers to say if they are NUMA aware or not, and currently only the heap mapper should be NUMA aware. We could either set this state to true using the mtJavaHeap type as we have checked before or add an explicit setter that we only call for the heap mapper.

I know that only doing 2) will fix the current problem, but I think it would be nice to avoid having the base address in G1NUMA, thoughts?
> > > FYI, I filed JDK-8232156 for further investigation of initialization > order related to G1NUMA. i.e. about removing G1NUMA::set_region_info(). > Thanks for filing this. > New webrev includes: > 1. Addressed most comments from Kim, Stefan and Thomas. > 2. Rename G1MemoryNodeManager to G1NUMA with removing virtual calls. > > webrev: > http://cr.openjdk.java.net/~sangheki/8220310/webrev.4 > http://cr.openjdk.java.net/~sangheki/8220310/webrev.4.inc Apart from my comment above I think this looks really good, just one small additional comment: src/hotspot/os/linux/os_linux.cpp --- 3021 #endif 3022 3023 int id = InvalidNUMAId; Extra whitespace on line 3022. --- Thanks, Stefan > > Testing: hs-tier 1 ~ 5 with/without UseNUMA > > Thanks, > Sangheon > > >> >> ------------------------------------------------------------------------------ >> >> > From kim.barrett at oracle.com Mon Oct 14 21:03:58 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 14 Oct 2019 17:03:58 -0400 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> Message-ID: <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> > On Oct 14, 2019, at 11:29 AM, Stefan Johansson wrote: > > Hi Sangheon (and Kim), > > On 2019-10-11 19:34, sangheon.kim at oracle.com wrote: >>> ------------------------------------------------------------------------------ >>> src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp >>> 82 _storage.request_memory_on_node(page, _pages_per_region, node_index); >>> ... >>> 153 _storage.request_memory_on_node(idx, 1, node_index); >>> >>> I'm not sure request_memory_on_node belongs on the _storage object. 
>>> The current implementation just has the storage object (conditionally) >>> forward the request to the memory node manager object. These places in >>> the space mapper could just make the calls on the memory node manager >>> object directly (it is already being used nearby). And these places >>> don't need the conditionalization. >>> >>> I think making the space mapper directly call the memory node manager >>> here would remove the need for the proposed changes to the virtual >>> space class. >> Fixed to directly call G1NUMA::request_memory_on_node() (previously G1MemoryNodeManager). >> But G1NUMA can't calculate raw address, so I had to add base address at G1NUMA to get that. >> When I implemented it, I had similar opinion (not good fit for _storage) but I also wanted to avoid adding extra dependency at G1NUMA. But anyway I realized we can achieve it easily if we have base address. > > I don't fully I agree here. I think having the storage do the call to G1NUMA does make sense because it knows how to translate a page index to a real address. It also goes along the same lines as the pretouch() call in commit_regions(), but I won't object if we want to leave it in the mapper. > > If we do that, there are still some changes required, because we currently will call G1NUMA::request_memory_on_node() for all mappers and all mappers will then use the heaps base address when calling numa_make_local(). So I propose two changes: > 1. Expose G1PageBasedVirtualSpace::page_start() or use G1CollectedHeap::bottom_addr_for_region(uint index) and let the mapper use it to call request_memory_on_node() with a real address rather than a page index. Another solution could be to change the function even more and call it request_heap_region_on_node() and just pass in the region index and then use G1CollectedHeap::bottom_addr_for_region(uint index) in G1NUMA. I overlooked part of how my suggestion was handled. Yeah, I don't think I like having the base address added to G1NUMA. 
I like Stefan's change #1 (specifically, adding G1PBVS::page_start()). I also missed that there seems to be a units mismatch in the call to request_memory_on_node in G1RegionsSmallerThanCommitSizeMapper. It's passing a region index rather than a page index (before above change) or address (after above change). > 2. Add a state to the mappers to say if they are NUMA aware or not, and currently only the heap mapper should be NUMA aware. We could either set this state to true using the mtJavaHeap type as we have checked before or add an explicit setter that we only call for the heap mapper. > > I know that only doing 2) will fix the current problem, but I think it would be nice to avoid having the base address in G1NUMA, thoughts? I don't understand the point about mappers needing to know if they are NUMA or not. request_memory_on_node is only called by the two relevant region->space mappers, with the memory involved always in the Java heap (after fixing the units mismatch mentioned above). That is, G1NUMA::request_memory_on_node should only be called for Java heap memory. (It might be able to assert is_in_reserved or something like that, though initialization order might prevent that.) 
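
The address-versus-index point discussed above can be illustrated with a small stand-alone sketch. It is not HotSpot code: PageSpaceSketch, NumaSketch, region_to_page() and commit_page_on_node() are hypothetical names invented for this illustration, and only the idea (the page-based space translates a page index into a real address, so the NUMA helper needs no base address of its own) comes from the thread. It assumes the case where several heap regions share one commit page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a page-based virtual space: it is the only
// component that knows the base address and the page size.
struct PageSpaceSketch {
  uintptr_t base;
  size_t    page_size;

  // Translate a page index into the address of the first byte of that page.
  uintptr_t page_start(size_t page_index) const {
    return base + page_index * page_size;
  }
};

// Hypothetical NUMA helper that accepts only real addresses, so it does not
// need to store a base address. It just records the last request here.
struct NumaSketch {
  uintptr_t last_addr = 0;
  size_t    last_size = 0;
  unsigned  last_node = 0;

  void request_memory_on_node(uintptr_t addr, size_t size_in_bytes, unsigned node_index) {
    last_addr = addr;
    last_size = size_in_bytes;
    last_node = node_index;
  }
};

// When regions are smaller than the commit (page) size, several regions
// share one page, so a region index must be converted to a page index.
inline size_t region_to_page(size_t region_index, size_t regions_per_page) {
  return region_index / regions_per_page;
}

// The mapper converts region index -> page index -> address before calling
// the NUMA helper, which keeps the units unambiguous at the call site.
inline void commit_page_on_node(const PageSpaceSketch& space, NumaSketch& numa,
                                size_t region_index, size_t regions_per_page,
                                unsigned node_index) {
  size_t page_index = region_to_page(region_index, regions_per_page);
  numa.request_memory_on_node(space.page_start(page_index), space.page_size, node_index);
}
```

Keeping the region index and the page index in separately named variables, as in commit_page_on_node(), is one way to avoid the kind of units confusion raised in the review.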
From kim.barrett at oracle.com Mon Oct 14 21:31:01 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Mon, 14 Oct 2019 17:31:01 -0400
Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)
In-Reply-To: <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com>
References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com>
 <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com>
 <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com>
 <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com>
 <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com>
 <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com>
Message-ID: <23E3BBAA-A298-42FE-B594-7061DC3E0FD9@oracle.com>

> On Oct 14, 2019, at 5:03 PM, Kim Barrett wrote:
> I also missed that there seems to be a units mismatch in the call to
> request_memory_on_node in G1RegionsSmallerThanCommitSizeMapper. It's
> passing a region index rather than a page index (before above change)
> or address (after above change).

That's wrong; there are some problematic variable namings here. start_idx is a heap region index, idx is a page index. Sangheon and I discussed this offline and he's planning to change some variable names here.

From kim.barrett at oracle.com Mon Oct 14 22:20:04 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Mon, 14 Oct 2019 18:20:04 -0400
Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)
In-Reply-To: <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com>
References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com>
 <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com>
 <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com>
 <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com>
 <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com>
 <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com>
Message-ID: <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com>

> On Oct 14, 2019, at 5:03 PM, Kim Barrett wrote:
>> 2.
Add a state to the mappers to say if they are NUMA aware or not, and currently only the heap mapper should be NUMA aware. We could either set this state to true using the mtJavaHeap type as we have checked before or add an explicit setter that we only call for the heap mapper.
>>
>> I know that only doing 2) will fix the current problem, but I think it would be nice to avoid having the base address in G1NUMA, thoughts?
>
> I don't understand the point about mappers needing to know if they are
> NUMA or not. request_memory_on_node is only called by the two relevant
> region->space mappers, with the memory involved always in the Java
> heap (after fixing the units mismatch mentioned above). That is,
> G1NUMA::request_memory_on_node should only be called for Java heap
> memory. (It might be able to assert is_in_reserved or something like
> that, though initialization order might prevent that.)

I was confused here too. Sangheon has repaired my confusion, and he's got another change in the works to tidy things up here in a way that I think will make both me and Stefan happy.

From rs at jelastic.com Mon Oct 14 22:46:07 2019
From: rs at jelastic.com (Ruslan Synytsky)
Date: Mon, 14 Oct 2019 18:46:07 -0400
Subject: G1 patch of elastic Java heap
In-Reply-To:
References:
Message-ID:

Dear Liang and Thomas, thank you for your contribution to Java elasticity.

I would like to draw attention to the softmx option, which, as I understand, is planned to be renamed to SoftMaxHeapSize. According to the feedback in another thread, if the memory usage reaches the softmx limit then the JVM will throw an OOM Error. This differs from the logic described at https://bugs.openjdk.java.net/browse/JDK-8222145. Personally I believe an OOM Error inside the JVM is a slightly safer approach than the potential termination of the Java process by the OOM Killer. But how can we avoid confusion? Should we use different naming?
Thanks -- Ruslan Synytsky Date: Mon, 14 Oct 2019 11:52:19 +0800 > Subject: Re: G1 patch of elastic Java heap > Hi Thomas, > > Thank you for the recognition:) Since we both agree on some clear specific > points, > I will try to extract them from current implementation and create a patch > in OpenJDK > upstream branch so we can continue discussion on the code level. > > Thanks, > Liang > > > > > > > ------------------------------------------------------------------ > From:Thomas Schatzl > Send Time:2019 Oct. 12 (Sat.) 23:00 > To:"MAO, Liang" ; hotspot-gc-dev < > hotspot-gc-dev at openjdk.java.net> > Subject:Re: G1 patch of elastic Java heap > > Hi, > > On Sat, 2019-10-12 at 19:51 +0800, Liang Mao wrote: > > Hi Thomas, > > > > The manual generation limit can be put aside currently since we know > > it might not be so general for a GC. We can focus on how to change > > heap size and return memory in runtime first. > > > > GCTimeRatio is a good metric to measure the health of a Java > > application and I have considered to use that. But finally I chose > > a simple way just like the periodic old GC. Guarantee a long > > enough young GC interval is an alternative way to make sure the > > GCTimeRatio at a heathy state. > > I'm absolutely ok to use GCTimeRatio instead of the fixed young GC > > interval. This part is same to ZGC or Shenandoah for how to balance > > the desired memory size and GC frequency. I'm open to any good > > solution and we are already in the same page for this issue > > I think:) > > +1 > > > A big difference of our implementation is evaluating heap resizing in > > any young GC instead of a concurrent gc cycle which I think is > > swifter and more immmediate. The concurrent map/unmap > > mechanism gets rid of the additional pause time. My thought is the > > heap shrink/expand can be all determined in young GC pause and > > performed in concurrent thread which could exclude the > > considerable time cost by OS interface. 
Most of our Java users are
> > intolerant to those pause spikes caused by page faults, which can be up
> > to seconds. And we also found the issue of time cost by map/unmap in
> > ZGC.
> >
> > A direct advantage of the young GC resizing and concurrent memory
> > free mechanism is for implementing SoftMaxHeapSize. The heap size can
> > be changed after last mixed GC. The young GC won't have longer
> > pause and the memory can be freed concurrently without side effect.
>
> Agree and agree. Both evaluating and giving back memory at any gc
> sounds nice, and doing that without incurring the costs in the pause is
> even better :)
>
> Thanks,
> Thomas
>
>

From timberonce at gmail.com Tue Oct 15 01:33:44 2019
From: timberonce at gmail.com (Mingyu Wu)
Date: Tue, 15 Oct 2019 09:33:44 +0800
Subject: G1GC: The design choice of prefetching
Message-ID:

Hi all,

I find that G1GC (in OpenJDK12) implements a method named *prefetch_and_push*, which prefetches the header and the first field of an object referenced by a pointer *p* while *p* is about to be enqueued. However, the effect of this prefetch instruction can be unstable, as the time when the object is processed is unknown. It is possible that many references are enqueued before *p* is processed (the data structure is actually First-In-Last-Out) and finally evict the cache line storing the object, making the prefetch useless. Therefore, what is the design choice behind those prefetch instructions? Do they represent some tradeoff related to the overhead of prefetching?

Thanks,
Mingyu

From manc at google.com Tue Oct 15 01:56:12 2019
From: manc at google.com (Man Cao)
Date: Mon, 14 Oct 2019 18:56:12 -0700
Subject: RFR(S): 8232232: G1RemSetSummary::_rs_threads_vtimes is not initialized to zero
Message-ID:

Hi all,

Can I have reviews for this fix for logging messages of "Concurrent refinement threads times (s)", and code cleanup?

Webrev: https://cr.openjdk.java.net/~manc/8232232/webrev.00/
Bug: https://bugs.openjdk.java.net/browse/JDK-8232232

-Man

From kim.barrett at oracle.com Tue Oct 15 05:57:33 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 15 Oct 2019 01:57:33 -0400
Subject: RFR(S): 8232232: G1RemSetSummary::_rs_threads_vtimes is not initialized to zero
In-Reply-To:
References:
Message-ID: <1A6E72C3-A0F1-4683-809A-EB8436485715@oracle.com>

> On Oct 14, 2019, at 9:56 PM, Man Cao wrote:
>
> Hi all,
>
> Can I have reviews for this fix for logging messages of "Concurrent
> refinement threads times (s)", and code cleanup?
>
> Webrev: https://cr.openjdk.java.net/~manc/8232232/webrev.00/
> Bug: https://bugs.openjdk.java.net/browse/JDK-8232232
>
> -Man

Looks good.

From maoliang.ml at alibaba-inc.com Tue Oct 15 06:10:52 2019
From: maoliang.ml at alibaba-inc.com (Liang Mao)
Date: Tue, 15 Oct 2019 14:10:52 +0800
Subject: Re: G1 patch of elastic Java heap
In-Reply-To:
References: ,
Message-ID: <9e1ea9d1-2340-4c47-9249-12cb04886230.maoliang.ml@alibaba-inc.com>

Hi Ruslan and OpenJDK developers,

I noticed this difference too. The softmx in OpenJ9 does not seem to allow the application to grow beyond the new limit, while JDK-8222145 treats SoftMaxHeapSize as a *soft* limit which can be exceeded. Personally I prefer the former a little bit. But introducing another name seems more confusing to users. Maybe use an option to control it? Like "bool SoftMaxHeapSizeOOM"?

Thanks,
Liang

------------------------------------------------------------------
From:Ruslan Synytsky
Send Time:2019 Oct. 15 (Tue.) 06:46
To:hotspot-gc-dev at openjdk.java.net; "MAO, Liang"; Thomas Schatzl
Subject:Re: G1 patch of elastic Java heap

Dear Liang and Thomas, thank you for your contribution to Java elasticity.

I would like to draw attention to the softmx option, which, as I understand, is planned to be renamed to SoftMaxHeapSize.
According to the feedback in another thread, if the memory usage reaches the softmx limit then JVM will throw OOM Error. It differs from the logic described at https://bugs.openjdk.java.net/browse/JDK-8222145. Personally I believe OOM Error inside JVM is a little bit safer approach compared to the potential termination of java process by OOM Killer. But how can we avoid confusions? Should we use different naming? Thanks -- Ruslan Synytsky Date: Mon, 14 Oct 2019 11:52:19 +0800 Subject: Re: G1 patch of elastic Java heap Hi Thomas, Thank you for the recognition:) Since we both agree on some clear specific points, I will try to extract them from current implementation and create a patch in OpenJDK upstream branch so we can continue discussion on the code level. Thanks, Liang ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2019 Oct. 12 (Sat.) 23:00 To:"MAO, Liang" ; hotspot-gc-dev Subject:Re: G1 patch of elastic Java heap Hi, On Sat, 2019-10-12 at 19:51 +0800, Liang Mao wrote: > Hi Thomas, > > The manual generation limit can be put aside currently since we know > it might not be so general for a GC. We can focus on how to change > heap size and return memory in runtime first. > > GCTimeRatio is a good metric to measure the health of a Java > application and I have considered to use that. But finally I chose > a simple way just like the periodic old GC. Guarantee a long > enough young GC interval is an alternative way to make sure the > GCTimeRatio at a heathy state. > I'm absolutely ok to use GCTimeRatio instead of the fixed young GC > interval. This part is same to ZGC or Shenandoah for how to balance > the desired memory size and GC frequency. I'm open to any good > solution and we are already in the same page for this issue > I think:) +1 > A big difference of our implementation is evaluating heap resizing in > any young GC instead of a concurrent gc cycle which I think is > swifter and more immmediate. 
The concurrent map/unmap > mechanism gets rid of the additional pause time. My thought is the > heap shrink/expand can be all determined in young GC pause and > performed in concurrent thread which could exclude the > considerable time cost by OS interface. Most of our Java users are > intolerant to those pause pikes caused by page fault which can be up > to seconds. And we also found the issue of time cost by map/unmap in > ZGC. > > A direct advantage of the young GC resizing and concurrent memory > free machanism is for implementing SoftMaxHeapSize. The heap size can > be changed after last mixed GC. The young GC won't have longer > pause and the memory can be freed concurrently without side effect. Agree and agree. Both evaluating and giving back memory at any gc sounds nice, and doing that without incurring the costs in the pause is even better :) Thanks, Thomas From maoliang.ml at alibaba-inc.com Tue Oct 15 06:18:47 2019 From: maoliang.ml at alibaba-inc.com (Liang Mao) Date: Tue, 15 Oct 2019 14:18:47 +0800 Subject: =?UTF-8?B?UmU6IEcxR0M6IFRoZSBkZXNpZ24gY2hvaWNlIG9mIHByZWZldGNoaW5n?= In-Reply-To: References: Message-ID: <4e9e89f4-69e7-4429-93e5-09f09088b64b.maoliang.ml@alibaba-inc.com> Hi Mingyu, The prefetch design is not only available in new versions of G1 GC but introduced in very early years in hotspot and other GCs like ParNew. It is kind of aggressive prefecting imho which prefetches all the addresses in the ref queue which contains *grey pointers* and also creates enough latency between issuing prefetch instructions and memory access to maximize the cache utilization. There could be the problem you mentioned that cache is evicted if overflowed. Maintaining the proper length of the ref queue is the way to avoid this. You can look into the issue below which fixed this problem and improved performance in G1. https://bugs.openjdk.java.net/browse/JDK-6672778 OpenJDK developers may correct me if there's something I misunderstood. 

Thanks,
Liang

------------------------------------------------------------------
From:Mingyu Wu
Send Time:2019 Oct. 15 (Tue.) 09:34
To:hotspot-gc-dev
Subject:G1GC: The design choice of prefetching

Hi all,

I find that G1GC (in OpenJDK12) implements a method named *prefetch_and_push*, which prefetches the header and the first field of an object referenced by a pointer *p* while *p* is about to be enqueued. However, the effect of this prefetch instruction can be unstable, as the time when the object is processed is unknown. It is possible that many references are enqueued before *p* is processed (the data structure is actually First-In-Last-Out) and finally evict the cache line storing the object, making the prefetch useless. Therefore, what is the design choice behind those prefetch instructions? Do they represent some tradeoff related to the overhead of prefetching?

Thanks,
Mingyu

From per.liden at oracle.com Tue Oct 15 06:46:46 2019
From: per.liden at oracle.com (Per Liden)
Date: Tue, 15 Oct 2019 08:46:46 +0200
Subject: RFR: 8232235: ZGC: Move ZValue inline funtions to zValue.inline.hpp
Message-ID: <0cafef55-7ae5-f2c5-e6b5-a6db5c0facc3@oracle.com>

Please review this clean up patch to move ZValue inline functions to zValue.inline.hpp.

Bug: https://bugs.openjdk.java.net/browse/JDK-8232235
Webrev: http://cr.openjdk.java.net/~pliden/8232235/webrev.0

/Per

From per.liden at oracle.com Tue Oct 15 06:47:06 2019
From: per.liden at oracle.com (Per Liden)
Date: Tue, 15 Oct 2019 08:47:06 +0200
Subject: RFR: 8232236: ZGC: Move ZThread inline funtions to zThread.inline.hpp
Message-ID: <9528eb9e-ae58-fb7b-c593-434f97ec1c0e@oracle.com>

Please review this clean up patch to move ZThread inline functions to zThread.inline.hpp.

Bug: https://bugs.openjdk.java.net/browse/JDK-8232236
Webrev: http://cr.openjdk.java.net/~pliden/8232236/webrev.0

/Per

From per.liden at oracle.com Tue Oct 15 06:47:21 2019
From: per.liden at oracle.com (Per Liden)
Date: Tue, 15 Oct 2019 08:47:21 +0200
Subject: RFR: 8232237: ZGC: Move ZArray inline funtions to zArray.inline.hpp
Message-ID: <02bf6bce-9e00-ed88-b5b1-f7e50c218446@oracle.com>

Please review this clean up patch to move ZArray inline functions to zArray.inline.hpp.

Bug: https://bugs.openjdk.java.net/browse/JDK-8232237
Webrev: http://cr.openjdk.java.net/~pliden/8232237/webrev.0

/Per

From per.liden at oracle.com Tue Oct 15 06:47:35 2019
From: per.liden at oracle.com (Per Liden)
Date: Tue, 15 Oct 2019 08:47:35 +0200
Subject: RFR: 8232238: ZGC: Move ZList inline funtions to zList.inline.hpp
Message-ID: <41ab05b7-01e3-3e3f-cf1b-7a5a358763ac@oracle.com>

Please review this clean up patch to move ZList inline functions to zList.inline.hpp.

Bug: https://bugs.openjdk.java.net/browse/JDK-8232238
Webrev: http://cr.openjdk.java.net/~pliden/8232238/webrev.0

/Per

From per.liden at oracle.com Tue Oct 15 06:48:57 2019
From: per.liden at oracle.com (Per Liden)
Date: Tue, 15 Oct 2019 08:48:57 +0200
Subject: RFR: 8232239: ZGC: Inline ZCPU::count() and ZCPU:id()
Message-ID:

Please review this patch to enable inlining of ZCPU::count() and ZCPU::id(), which are used in some fairly hot paths.

Bug: https://bugs.openjdk.java.net/browse/JDK-8232239
Webrev: http://cr.openjdk.java.net/~pliden/8232239/webrev.0

/Per

From per.liden at oracle.com Tue Oct 15 08:12:15 2019
From: per.liden at oracle.com (Per Liden)
Date: Tue, 15 Oct 2019 10:12:15 +0200
Subject: G1 patch of elastic Java heap
In-Reply-To: <9e1ea9d1-2340-4c47-9249-12cb04886230.maoliang.ml@alibaba-inc.com>
References: <9e1ea9d1-2340-4c47-9249-12cb04886230.maoliang.ml@alibaba-inc.com>
Message-ID:

Hi,

On 10/15/19 8:10 AM, Liang Mao wrote:
> Hi Ruslan and OpenJDK developers,
>
> I noticed this difference too.
The softmx in OpenJ9 seems to not allow the application beyong > the new limit while JDK-8222145 treats the SoftMaxHeapSize as a *soft* limit which can be > exceeded. Personally I prefer the former a little bit. But introducing another name seems more > confused to users. Maybe use an option to control? Like "bool SoftMaxHeapSizeOOM" ? I personally think the OpenJ9 softmx option is misnamed, as it's not a *soft* limit, but a *hard* limit. Hotspot's SoftMaxHeapSize is *soft* by design. Today's hard limit in Hotspot is of course MaxHeapSize (-Xmx). The only problem is that isn't not a manageable flag so it can't be changed at runtime. Making it manageable is tricky for GCs that size data structures at startup based on MaxHeapSize. One option could be to simply reject changes to MaxHeapSize unless the currently used GC declares that it supports changing it. Another option could be to keep MaxHeapSize as is, and introduce a separate flag (e.g. HardMaxHeapSize or CurrentMaxHeapSize). In that case MaxHeapSize would act as the upper limit for a the "hard limit" flag. cheers, Per > > Thanks, > Liang > > > > > > > ------------------------------------------------------------------ > From:Ruslan Synytsky > Send Time:2019 Oct. 15 (Tue.) 06:46 > To:hotspot-gc-dev at openjdk.java.net openjdk.java.net ; "MAO, Liang" ; Thomas Schatzl > Subject:Re: G1 patch of elastic Java heap > > Dear Liang and Thomas, thank you for your contribution to Java elasticity. > > I would like to pay attention to the softmx option which is planned to be renamed to SoftMaxHeapSize as I understand. According to the feedback in another thread, if the memory usage reaches the softmx limit then JVM will throw OOM Error. It differs from the logic described at https://bugs.openjdk.java.net/browse/JDK-8222145. Personally I believe OOM Error inside JVM is a little bit safer approach compared to the potential termination of java process by OOM Killer. But how can we avoid confusions? 
Should we use different naming? > > Thanks > From thomas.schatzl at oracle.com Tue Oct 15 08:17:20 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 15 Oct 2019 10:17:20 +0200 Subject: G1GC: The design choice of prefetching In-Reply-To: <4e9e89f4-69e7-4429-93e5-09f09088b64b.maoliang.ml@alibaba-inc.com> References: <4e9e89f4-69e7-4429-93e5-09f09088b64b.maoliang.ml@alibaba-inc.com> Message-ID: <05ce27a3-4e4b-0a46-b9b9-5a7155ac8314@oracle.com> Hi Mingyu, >> ------------------------------------------------------------------ >> From:Mingyu Wu >> Send Time:2019 Oct. 15 (Tue.) 09:34 >> To:hotspot-gc-dev >> Subject:G1GC: The design choice of prefetching >> >> Hi all, >> I find that G1GC (in OpenJDK12) implements a method named >> *prefetch_and_push*, which prefetches the header and the first field >> of an object referenced by a pointer *p* while *p* is about to be >> enqueued. >> However, the effect of this prefetch instruction can be unstable as >> the time when the object is processed is unknown. It is possible that >> many references are enqueued before *p* (the data structure is >> actually First-In-Last-Out) and finally evict the cache line storing >> the object, making the prefetch useless. Therefore, what is the >> design choice of those prefetch instructions? Do they stand for some >> tradeoffs related to the overhead of prefetching? >> >> Thanks, >> Mingyu > On 15.10.19 08:18, Liang Mao wrote: > Hi Mingyu, > > The prefetch design is not only available in new versions of G1 GC but introduced > in very early years in hotspot and other GCs like ParNew. It is kind of aggressive > prefetching imho which prefetches all the addresses in the ref queue which contains > *grey pointers* and also creates enough latency between issuing prefetch instructions > and memory access to maximize the cache utilization. > There could be the problem you mentioned that cache is evicted if overflowed.
> Maintaining the proper length of the ref queue is the way to avoid this. You can > look into the issue below which fixed this problem and improved performance in G1. > https://bugs.openjdk.java.net/browse/JDK-6672778 > OpenJDK developers may correct me if there's something I misunderstood. > > Thanks, > Liang > As Liang correctly pointed out, the current oop prefetch design in G1 is mostly based on existing precedent in other GCs and lots of testing. There are some differences noted below. As you also pointed out correctly, there is a tradeoff to be made with respect to the complexity of this code vs. the actual gains. This code path is in my experience *extremely* sensitive to changes, so adding some simple heuristic here might nullify all the gains from more timely prefetching. When implementing JDK-6672778 I performed many tests with variants of this scheme. The currently implemented one (with the upper/lower "trim" bound) proved to be fastest overall. Compared to other collectors, G1 also always prefetches and pushes as indicated in the // We're not going to even bother checking whether the object is // already forwarded or not, as this usually causes an immediate // stall. We'll try to prefetch the object (for write, given that // we might need to install the forwarding reference) and we'll // get back to it when pop it from the queue comment in G1ScanClosureBase::prefetch_and_push, contrary to the other collectors which first check whether the reference has already been forwarded. The current code proved better for G1 at the time. Other attempted changes like prepending a small entry FIFO in the push/pop path just made the whole evacuation slower (to induce some "fixed" latency between prefetching and work on these references). But maybe I did something wrong here. These measurements might be invalid at this time, particularly because of changes in how the java heap roots are traversed (JDK-8213108), so revisiting this may be interesting and fruitful.
It would be really interesting to me to hear back from you or anybody else in the future about experiments you did, whatever the results are; even "failed" attempts can be learned from. :) Thanks, Thomas From thomas.schatzl at oracle.com Tue Oct 15 08:21:33 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 15 Oct 2019 10:21:33 +0200 Subject: RFR: 8232235: ZGC: Move ZValue inline funtions to zValue.inline.hpp In-Reply-To: <0cafef55-7ae5-f2c5-e6b5-a6db5c0facc3@oracle.com> References: <0cafef55-7ae5-f2c5-e6b5-a6db5c0facc3@oracle.com> Message-ID: <0e415dad-e040-749b-b241-cbe798851aa4@oracle.com> Hi, On 15.10.19 08:46, Per Liden wrote: > Please review this clean up patch to move ZValue inline functions to > zValue.inline.hpp. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8232235 > Webrev: http://cr.openjdk.java.net/~pliden/8232235/webrev.0 > > /Per zObjectAllocator.hpp: only seems to change the copyright dates, no actual change. No need to re-review removal of this hunk for me. Thanks, Thomas From thomas.schatzl at oracle.com Tue Oct 15 08:23:02 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 15 Oct 2019 10:23:02 +0200 Subject: RFR: 8232236: ZGC: Move ZThread inline funtions to zThread.inline.hpp In-Reply-To: <9528eb9e-ae58-fb7b-c593-434f97ec1c0e@oracle.com> References: <9528eb9e-ae58-fb7b-c593-434f97ec1c0e@oracle.com> Message-ID: <6e82e947-598f-41db-96dd-eaaf5bc32a20@oracle.com> Hi, On 15.10.19 08:47, Per Liden wrote: > Please review this clean up patch to move ZThread inline functions to > zThread.inline.hpp. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8232236 > Webrev: http://cr.openjdk.java.net/~pliden/8232236/webrev.0 > > /Per looks good.
Thomas From thomas.schatzl at oracle.com Tue Oct 15 08:23:50 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 15 Oct 2019 10:23:50 +0200 Subject: RFR: 8232237: ZGC: Move ZArray inline funtions to zArray.inline.hpp In-Reply-To: <02bf6bce-9e00-ed88-b5b1-f7e50c218446@oracle.com> References: <02bf6bce-9e00-ed88-b5b1-f7e50c218446@oracle.com> Message-ID: <4da7de41-ddc8-bf25-27aa-abcee4dfeb14@oracle.com> Hi, On 15.10.19 08:47, Per Liden wrote: > Please review this clean up patch to move ZArray inline functions to > zArray.inline.hpp. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8232237 > Webrev: http://cr.openjdk.java.net/~pliden/8232237/webrev.0 > > /Per looks good (and trivial?). Thomas From thomas.schatzl at oracle.com Tue Oct 15 08:32:23 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 15 Oct 2019 10:32:23 +0200 Subject: RFR(S): 8232232: G1RemSetSummary::_rs_threads_vtimes is not initialized to zero In-Reply-To: References: Message-ID: <75800cc4-9d67-bc11-c0b7-4b0a56b85ab5@oracle.com> Hi Man, On 15.10.19 03:56, Man Cao wrote: > Hi all, > > Can I have reviews for this fix for logging messages of "Concurrent > refinement threads times (s)", and code cleanup? > > Webrev: https://cr.openjdk.java.net/~manc/8232232/webrev.00/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8232232 > > -Man > looks good. Thanks, Thomas From per.liden at oracle.com Tue Oct 15 09:13:19 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 15 Oct 2019 11:13:19 +0200 Subject: RFR: 8232237: ZGC: Move ZArray inline funtions to zArray.inline.hpp In-Reply-To: <4da7de41-ddc8-bf25-27aa-abcee4dfeb14@oracle.com> References: <02bf6bce-9e00-ed88-b5b1-f7e50c218446@oracle.com> <4da7de41-ddc8-bf25-27aa-abcee4dfeb14@oracle.com> Message-ID: Thanks Thomas! /Per On 10/15/19 10:23 AM, Thomas Schatzl wrote: > Hi, > > On 15.10.19 08:47, Per Liden wrote: >> Please review this clean up patch to move ZArray inline functions to >> zArray.inline.hpp.
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8232237 >> Webrev: http://cr.openjdk.java.net/~pliden/8232237/webrev.0 >> >> /Per > > looks good (and trivial?). > > Thomas From per.liden at oracle.com Tue Oct 15 09:13:30 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 15 Oct 2019 11:13:30 +0200 Subject: RFR: 8232236: ZGC: Move ZThread inline funtions to zThread.inline.hpp In-Reply-To: <6e82e947-598f-41db-96dd-eaaf5bc32a20@oracle.com> References: <9528eb9e-ae58-fb7b-c593-434f97ec1c0e@oracle.com> <6e82e947-598f-41db-96dd-eaaf5bc32a20@oracle.com> Message-ID: <0ad7e93c-af8a-d0be-202c-9dca7eafa27f@oracle.com> Thanks Thomas! /Per On 10/15/19 10:23 AM, Thomas Schatzl wrote: > Hi, > > On 15.10.19 08:47, Per Liden wrote: >> Please review this clean up patch to move ZThread inline functions to >> zThread.inline.hpp. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8232236 >> Webrev: http://cr.openjdk.java.net/~pliden/8232236/webrev.0 >> >> /Per > > looks good. > > Thomas From per.liden at oracle.com Tue Oct 15 09:14:28 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 15 Oct 2019 11:14:28 +0200 Subject: RFR: 8232235: ZGC: Move ZValue inline funtions to zValue.inline.hpp In-Reply-To: <0e415dad-e040-749b-b241-cbe798851aa4@oracle.com> References: <0cafef55-7ae5-f2c5-e6b5-a6db5c0facc3@oracle.com> <0e415dad-e040-749b-b241-cbe798851aa4@oracle.com> Message-ID: <8ec5a1ea-fb86-ae4c-86f1-95ac36ffe975@oracle.com> On 10/15/19 10:21 AM, Thomas Schatzl wrote: > Hi, > > On 15.10.19 08:46, Per Liden wrote: >> Please review this clean up patch to move ZValue inline functions to >> zValue.inline.hpp. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8232235 >> Webrev: http://cr.openjdk.java.net/~pliden/8232235/webrev.0 >> >> /Per > > zObjectAllocator.hpp: only seems to change the copyright dates, no > actual change. Good catch, I'll revert that. Thanks for reviewing, Thomas! /Per > > No need to re-review removal of this hunk for me. > > Thanks, >
Thomas From thomas.schatzl at oracle.com Tue Oct 15 09:21:37 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 15 Oct 2019 11:21:37 +0200 Subject: RFR: 8232239: ZGC: Inline ZCPU::count() and ZCPU:id() In-Reply-To: References: Message-ID: <89e96eb4-8ae3-ed24-8cf0-dccf95967ab6@oracle.com> Hi, On 15.10.19 08:48, Per Liden wrote: > Please review this patch to enable inlining of ZCPU::count() and > ZCPU::id(), which are used in some fairly hot paths. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8232239 > Webrev: http://cr.openjdk.java.net/~pliden/8232239/webrev.0 > > /Per looks good. Thomas From per.liden at oracle.com Tue Oct 15 09:33:41 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 15 Oct 2019 11:33:41 +0200 Subject: RFR: 8232239: ZGC: Inline ZCPU::count() and ZCPU:id() In-Reply-To: <89e96eb4-8ae3-ed24-8cf0-dccf95967ab6@oracle.com> References: <89e96eb4-8ae3-ed24-8cf0-dccf95967ab6@oracle.com> Message-ID: Thanks Thomas! /Per On 10/15/19 11:21 AM, Thomas Schatzl wrote: > Hi, > > On 15.10.19 08:48, Per Liden wrote: >> Please review this patch to enable inlining of ZCPU::count() and >> ZCPU::id(), which are used in some fairly hot paths. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8232239 >> Webrev: http://cr.openjdk.java.net/~pliden/8232239/webrev.0 >> >> /Per > > looks good. > > Thomas From erik.osterlund at oracle.com Tue Oct 15 10:43:15 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Tue, 15 Oct 2019 12:43:15 +0200 Subject: RFR: 8232238: ZGC: Move ZList inline funtions to zList.inline.hpp In-Reply-To: <41ab05b7-01e3-3e3f-cf1b-7a5a358763ac@oracle.com> References: <41ab05b7-01e3-3e3f-cf1b-7a5a358763ac@oracle.com> Message-ID: <93d9ec60-2458-e6f2-d64b-e2857225f46a@oracle.com> Hi Per, Looks good. Thanks, /Erik On 10/15/19 8:47 AM, Per Liden wrote: > Please review this clean up patch to move ZList inline functions to > zList.inline.hpp.
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8232238 > Webrev: http://cr.openjdk.java.net/~pliden/8232238/webrev.0 > > /Per From per.liden at oracle.com Tue Oct 15 13:07:56 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 15 Oct 2019 15:07:56 +0200 Subject: RFR: 8232238: ZGC: Move ZList inline funtions to zList.inline.hpp In-Reply-To: <93d9ec60-2458-e6f2-d64b-e2857225f46a@oracle.com> References: <93d9ec60-2458-e6f2-d64b-e2857225f46a@oracle.com> Message-ID: <76AD753C-42A0-4743-A85F-4BFA4BFC2551@oracle.com> Thanks Erik! /Per > On 15 Oct 2019, at 12:43, erik.osterlund at oracle.com wrote: > > Hi Per, > > Looks good. > > Thanks, > /Erik > >> On 10/15/19 8:47 AM, Per Liden wrote: >> Please review this clean up patch to move ZList inline functions to zList.inline.hpp. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8232238 >> Webrev: http://cr.openjdk.java.net/~pliden/8232238/webrev.0 >> >> /Per > From zgu at redhat.com Tue Oct 15 13:13:38 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 15 Oct 2019 09:13:38 -0400 Subject: RFR (XS) 8232205: Shenandoah: missing "Update References" -> "Update Roots" tracing In-Reply-To: References: Message-ID: Looks good to me. -Zhengyu On 10/14/19 5:20 AM, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8232205 > > Noticed that -Xlog:gc+stats does not print "Update Roots" section for "Update References". This is a > regression since JDK-8223951.
> > Fix: > https://cr.openjdk.java.net/~shade/8232205/webrev.01/ > > Testing: hotspot_gc_shenandoah, eyeballing gc+stats > From thomas.schatzl at oracle.com Tue Oct 15 13:13:47 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 15 Oct 2019 15:13:47 +0200 Subject: RFR (XXS): 8232260: Remove g1 prefix in G1CollectedHeap::g1_hot_card_cache() getter Message-ID: <6eda50a7-fe97-a101-5402-6a09c005e209@oracle.com> Hi, can I have reviews for this small cleanup that removes some "g1_" prefix from some getter and some related unnecessary friend declaration. CR: https://bugs.openjdk.java.net/browse/JDK-8232260 Webrev: http://cr.openjdk.java.net/~tschatzl/8232260/webrev/ Testing: local compilation Thanks, Thomas From stefan.johansson at oracle.com Tue Oct 15 13:20:25 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Tue, 15 Oct 2019 15:20:25 +0200 Subject: RFR (XXS): 8232260: Remove g1 prefix in G1CollectedHeap::g1_hot_card_cache() getter In-Reply-To: <6eda50a7-fe97-a101-5402-6a09c005e209@oracle.com> References: <6eda50a7-fe97-a101-5402-6a09c005e209@oracle.com> Message-ID: <89805899-3c84-4632-c995-4d130408aebf@oracle.com> Looks good! On 2019-10-15 15:13, Thomas Schatzl wrote: > Hi, > > can I have reviews for this small cleanup that removes some "g1_" > prefix from some getter and some related unnecessary friend declaration. > > > CR: > https://bugs.openjdk.java.net/browse/JDK-8232260 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8232260/webrev/ > Testing: > local compilation > > Thanks, >
Thomas From thomas.schatzl at oracle.com Tue Oct 15 13:23:34 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 15 Oct 2019 15:23:34 +0200 Subject: RFR (XXS): 8232260: Remove g1 prefix in G1CollectedHeap::g1_hot_card_cache() getter In-Reply-To: <89805899-3c84-4632-c995-4d130408aebf@oracle.com> References: <6eda50a7-fe97-a101-5402-6a09c005e209@oracle.com> <89805899-3c84-4632-c995-4d130408aebf@oracle.com> Message-ID: <9ecc0e8a-d1a1-7858-7778-7eb709873d9a@oracle.com> Hi Stefan, On 15.10.19 15:20, Stefan Johansson wrote: > Looks good! > > On 2019-10-15 15:13, Thomas Schatzl wrote: >> Hi, >> >> can I have reviews for this small cleanup that removes some "g1_" >> prefix from some getter and some related unnecessary friend declaration. >> Thanks for your review, Thomas From sangheon.kim at oracle.com Tue Oct 15 14:33:07 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Tue, 15 Oct 2019 07:33:07 -0700 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> Message-ID: Hi all, Here's a revised webrev which addresses: 1) G1RegionToSpaceMapper checks mtJavaHeap and then conditionally calls G1NUMA::request_memory_on_node() (Kim) 2) The signature of G1NUMA::request_memory_on_node(void* address, ,) is changed to take the actual address instead of a page index. (Stefan) 3) Some local variable name changes at G1RegionToSpaceMapper.
i -> region_idx, idx -> page_idx (for local style, used idx instead of index) webrev: http://cr.openjdk.java.net/~sangheki/8220310/webrev.5/ http://cr.openjdk.java.net/~sangheki/8220310/webrev.5.inc/ Testing: hs-tier 1 ~ 5, with/without UseNUMA Thanks, Sangheon On 10/14/19 3:20 PM, Kim Barrett wrote: >> On Oct 14, 2019, at 5:03 PM, Kim Barrett wrote: >>> 2. Add a state to the mappers to say if they are NUMA aware or not, and currently only the heap mapper should be NUMA aware. We could either set this state to true using the mtJavaHeap type as we have checked before or add an explicit setter that we only call for the heap mapper. >>> >>> I know that only doing 2) will fix the current problem, but I think it would be nice to avoid having the base address in G1NUMA, thoughts? >> I don't understand the point about mappers needing to know if they are >> NUMA or not. request_memory_on_node is only called by the two relevant >> region->space mappers, with the memory involved always in the Java >> heap (after fixing the units mismatch mentioned above). That is, >> G1NUMA::request_memory_on_node should only be called for Java heap >> memory. (It might be able to assert is_in_reserved or something like >> that, though initialization order might prevent that.) > I was confused here too. Sangheon has repaired my confusion, and he's > got another change in the works to tidy things up here in a way that I think > will make both me and Stefan happy.
> From kim.barrett at oracle.com Tue Oct 15 14:34:21 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 15 Oct 2019 10:34:21 -0400 Subject: RFR (XXS): 8232260: Remove g1 prefix in G1CollectedHeap::g1_hot_card_cache() getter In-Reply-To: <6eda50a7-fe97-a101-5402-6a09c005e209@oracle.com> References: <6eda50a7-fe97-a101-5402-6a09c005e209@oracle.com> Message-ID: <07917E2B-A8CC-43E2-BCA5-F4E0548EB6EE@oracle.com> > On Oct 15, 2019, at 9:13 AM, Thomas Schatzl wrote: > > Hi, > > can I have reviews for this small cleanup that removes some "g1_" prefix from some getter and some related unnecessary friend declaration. > > > CR: > https://bugs.openjdk.java.net/browse/JDK-8232260 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8232260/webrev/ > Testing: > local compilation > > Thanks, > Thomas Looks good. From thomas.schatzl at oracle.com Tue Oct 15 14:40:44 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 15 Oct 2019 16:40:44 +0200 Subject: RFR (XXS): 8232260: Remove g1 prefix in G1CollectedHeap::g1_hot_card_cache() getter In-Reply-To: <07917E2B-A8CC-43E2-BCA5-F4E0548EB6EE@oracle.com> References: <6eda50a7-fe97-a101-5402-6a09c005e209@oracle.com> <07917E2B-A8CC-43E2-BCA5-F4E0548EB6EE@oracle.com> Message-ID: <47bb3be2-dffc-3822-d19c-8fad3a5dd986@oracle.com> Hi Kim, On 15.10.19 16:34, Kim Barrett wrote: >> On Oct 15, 2019, at 9:13 AM, Thomas Schatzl wrote: >> >> Hi, >> >> can I have reviews for this small cleanup that removes some "g1_" prefix from some getter and some related unnecessary friend declaration. >> >>[...] >> >> Thanks, >> Thomas > > Looks good. > thanks for your review. Thomas From rkennke at redhat.com Tue Oct 15 14:45:12 2019 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 15 Oct 2019 16:45:12 +0200 Subject: RFR (XS) 8232205: Shenandoah: missing "Update References" -> "Update Roots" tracing In-Reply-To: References: Message-ID: <4da2e730-9649-4199-f7c2-a5fc1c64dde8@redhat.com> Ok. Thanks! 
Roman > Bug: > https://bugs.openjdk.java.net/browse/JDK-8232205 > > Noticed that -Xlog:gc+stats does not print "Update Roots" section for "Update References". This is a > regression since JDK-8223951. > > Fix: > https://cr.openjdk.java.net/~shade/8232205/webrev.01/ > > Testing: hotspot_gc_shenandoah, eyeballing gc+stats > From rs at jelastic.com Tue Oct 15 16:26:08 2019 From: rs at jelastic.com (Ruslan Synytsky) Date: Tue, 15 Oct 2019 12:26:08 -0400 Subject: G1 patch of elastic Java heap In-Reply-To: References: <9e1ea9d1-2340-4c47-9249-12cb04886230.maoliang.ml@alibaba-inc.com> Message-ID: HardMaxHeapSize sounds logical to me, so we will have SoftMaxHeapSize and HardMaxHeapSize - easier to understand and remember. Regards -- Ruslan Synytsky On Tue, 15 Oct 2019 at 04:12, Per Liden wrote: > Hi, > > On 10/15/19 8:10 AM, Liang Mao wrote: > > Hi Ruslan and OpenJDK developers, > > > > I noticed this difference too. The softmx in OpenJ9 seems to not allow > the application beyond > > the new limit while JDK-8222145 treats the SoftMaxHeapSize as a *soft* > limit which can be > > exceeded. Personally I prefer the former a little bit. But introducing > another name seems more > > confusing to users. Maybe use an option to control? Like "bool > SoftMaxHeapSizeOOM" ? > > I personally think the OpenJ9 softmx option is misnamed, as it's not a > *soft* limit, but a *hard* limit. Hotspot's SoftMaxHeapSize is *soft* by > design. Today's hard limit in Hotspot is of course MaxHeapSize (-Xmx). > The only problem is that it is not a manageable flag so it can't be > changed at runtime. Making it manageable is tricky for GCs that size > data structures at startup based on MaxHeapSize. One option could be to > simply reject changes to MaxHeapSize unless the currently used GC > declares that it supports changing it. Another option could be to keep > MaxHeapSize as is, and introduce a separate flag (e.g. HardMaxHeapSize > or CurrentMaxHeapSize).
In that case MaxHeapSize would act as the upper > limit for the "hard limit" flag. > > cheers, > Per > > > > > Thanks, > > Liang > > > > > > > > > > > > > > ------------------------------------------------------------------ > > From:Ruslan Synytsky > > Send Time:2019 Oct. 15 (Tue.) 06:46 > > To:hotspot-gc-dev at openjdk.java.net openjdk.java.net < > hotspot-gc-dev at openjdk.java.net>; "MAO, Liang" < > maoliang.ml at alibaba-inc.com>; Thomas Schatzl > > Subject:Re: G1 patch of elastic Java heap > > > > Dear Liang and Thomas, thank you for your contribution to Java > elasticity. > > > > I would like to draw attention to the softmx option which is planned to > be renamed to SoftMaxHeapSize as I understand. According to the feedback in > another thread, if the memory usage reaches the softmx limit then JVM will > throw OOM Error. It differs from the logic described at > https://bugs.openjdk.java.net/browse/JDK-8222145. Personally I believe > OOM Error inside JVM is a little bit safer approach compared to the > potential termination of java process by OOM Killer. But how can we avoid > confusion? Should we use different naming? > > > > Thanks > > > From shade at redhat.com Tue Oct 15 17:33:45 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 15 Oct 2019 19:33:45 +0200 Subject: RFR (S) 8232051: Epsilon should warn about Xms/Xmx/AlwaysPreTouch configuration In-Reply-To: References: Message-ID: <4d818e96-e0b4-00f2-6a5a-85cdfabb91c9@redhat.com> On 10/9/19 4:15 PM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8232051 > > This is arguably the UX bug: users expect low latency, but may not be aware that additional > configuration is needed for GCs to perform well in those conditions. Epsilon already enables LSM, > and should warn about Xms/Xmx/AlwaysPreTouch config too. It cannot adjust these settings, though, > because it would affect startup time -- users would have to opt-in.
> > Fix: > https://cr.openjdk.java.net/~shade/8232051/webrev.01/ Friendly reminder. -- Thanks, -Aleksey From zgu at redhat.com Tue Oct 15 17:42:40 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 15 Oct 2019 13:42:40 -0400 Subject: RFR (S) 8232051: Epsilon should warn about Xms/Xmx/AlwaysPreTouch configuration In-Reply-To: <4d818e96-e0b4-00f2-6a5a-85cdfabb91c9@redhat.com> References: <4d818e96-e0b4-00f2-6a5a-85cdfabb91c9@redhat.com> Message-ID: Looks good to me. Thanks, -Zhengyu On 10/15/19 1:33 PM, Aleksey Shipilev wrote: > On 10/9/19 4:15 PM, Aleksey Shipilev wrote: >> RFE: >> https://bugs.openjdk.java.net/browse/JDK-8232051 >> >> This is arguably the UX bug: users expect low latency, but may not be aware that additional >> configuration is needed for GCs to perform well in those conditions. Epsilon already enables LSM, >> and should warn about Xms/Xmx/AlwaysPreTouch config too. It cannot adjust these settings, though, >> because it would affect startup time -- users would have to opt-in. >> >> Fix: >> https://cr.openjdk.java.net/~shade/8232051/webrev.01/ > > Friendly reminder. > From shade at redhat.com Tue Oct 15 18:00:08 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 15 Oct 2019 20:00:08 +0200 Subject: RFR (S) 8232051: Epsilon should warn about Xms/Xmx/AlwaysPreTouch configuration In-Reply-To: References: <4d818e96-e0b4-00f2-6a5a-85cdfabb91c9@redhat.com> Message-ID: Thank you, pushed. -Aleksey On 10/15/19 7:42 PM, Zhengyu Gu wrote: > Looks good to me. > > Thanks, > > -Zhengyu > > > On 10/15/19 1:33 PM, Aleksey Shipilev wrote: >> On 10/9/19 4:15 PM, Aleksey Shipilev wrote: >>> RFE: >>> https://bugs.openjdk.java.net/browse/JDK-8232051 >>> >>> This is arguably the UX bug: users expect low latency, but may not be aware that additional >>> configuration is needed for GCs to perform well in those conditions. Epsilon already enables LSM, >>> and should warn about Xms/Xmx/AlwaysPreTouch config too.
It cannot adjust these settings, though, >>> because it would affect startup time -- users would have to opt-in. >>> >>> Fix: >>> https://cr.openjdk.java.net/~shade/8232051/webrev.01/ >> >> Friendly reminder. From kishor.kharbas at intel.com Wed Oct 16 01:23:30 2019 From: kishor.kharbas at intel.com (Kharbas, Kishor) Date: Wed, 16 Oct 2019 01:23:30 +0000 Subject: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps. In-Reply-To: References: Message-ID: Thank you for the suggestions. In this webrev I added a flag to ReservedSpace constructors to direct it to pin the memory space. So now G1PageBasedVirtualSpace does not have to do special handling. http://cr.openjdk.java.net/~kkharbas/8215893/webrev.02/ To add more to Sangheon's reply to Stefan's question, > Another thing, can you remind me why we need the bitmaps to be pinned but not other structures such as the card table? When I implemented this feature I had run into an issue with the default implementation of concurrent marking bitmaps. Thanks, Kishor From: sangheon.kim at oracle.com [mailto:sangheon.kim at oracle.com] Sent: Wednesday, October 9, 2019 2:42 PM To: Kharbas, Kishor ; hotspot-gc-dev at openjdk.java.net Cc: Stefan Johansson Subject: Re: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps. Hi Kishor, On 10/4/19 4:15 PM, Kharbas, Kishor wrote: Hi Stefan, Thanks for the review. Some comments inline. New webrev : http://cr.openjdk.java.net/~kkharbas/8215893/webrev.00_to_01/ http://cr.openjdk.java.net/~kkharbas/8215893/webrev.01/ I am reviewing the patch but have a question on top of Stefan's question[1]. Why are the bitmap mappers committed? I think all troubles started from 'committing but treating as special' here. Couldn't we just treat the bitmap mappers as 'special' without commit? If 'not committing' is doable, couldn't we simply create ReservedSpace with 'special' enabled (independent of the large page setting, which is the same as Stefan's comment)?
Or add PinnedReservedSpace to force 'special enabled'. [1]: Another thing, can you remind me why we need the bitmaps to be pinned but not other structures such as the card table? +HeterogeneousHeapRegionManager::initialize() ... + // We commit bitmap for all regions during initialization and mark the bitmap space as special. + // This allows regions to be un-committed while concurrent-marking threads are accesing the bitmap concurrently. Thanks, Sangheon > Hi Kishor, > > On 04.10.19 03:00, Kharbas, Kishor wrote: >> Hi, >> When I worked on JDK-8211425, there was a request for a better abstraction for pinning G1's CM bitmaps. The RFE for the request is JDK-8215893. >> >> Here is a proposal : http://cr.openjdk.java.net/~kkharbas/8215893/webrev.00/ >> >> Here G1PageBasedVirtualSpace pins the entire reserved memory to memory during construction. The constructor takes an additional bool flag which says "does it need to pin the memory". >> If the memory is pinned, '_special' flag is set to true. I piggy back on _special flag's behavior which is to not do actual OS (un-)commits on calls to (un)commit(). >> The rest of the changes are the mechanism to pass this flag from CM bitmaps creation in G1CollectedHeap all the way to G1PageBasedVirtualSpace. >> >> Let me know if this is a good abstraction and if there is any better way. >> >> Thanks >> Kishor >> > > Some comments: > > - in the parameter lists, if the parameters are already laid out > line-by-line, if adding a new one, please put it on a new line as well. > Fixed in the new webrev. > - this code > > if (_special) { > if (!rs.special()) { > commit_internal(addr_to_page_index(_low_boundary), > addr_to_page_index(_high_boundary)); > } > > in g1PageBasedVirtualSpace looks very incomprehensible. :) > > I would prefer (pending the second reviewer's comment) to either use the > "pinned" flag here, or even better, move the necessary commit calls into > the (now removed) HeterogeneousHeapRegionManager::initialize().
> Made it a little more comprehensible. Will see what other reviewers think about moving it somewhere else. > - I would just purely from feeling prefer if the "pinned" flag parameter > would be listed after the "type" parameter in the G1RegionToSpaceMapper. > But that's probably just me. > I did it this way to logically group the parameters. MemTracker is a tracker used by the VM everywhere and does not pertain to this class as such, so I kept it at the end. > Also, finally one parameter per line for the declaration/definition of > the constructor would improve readability. > Done. Thank you, Kishor > Thanks, > Thomas From erik.osterlund at oracle.com Wed Oct 16 06:51:08 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Wed, 16 Oct 2019 08:51:08 +0200 Subject: RFR: 8231940: ZGC: Print correct low/high capacity In-Reply-To: <2742b8ba-7fa0-b789-a250-4c9de40e1fc0@oracle.com> References: <2742b8ba-7fa0-b789-a250-4c9de40e1fc0@oracle.com> Message-ID: Hi Per, Looks good. Thanks, /Erik > On 7 Oct 2019, at 13:37, Per Liden wrote: > > After JDK-8222480, heap capacity can go down, not just up. The heap logging should take that into account when printing capacity high/low numbers. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8231940 > Webrev: http://cr.openjdk.java.net/~pliden/8231940/webrev.0 > > /Per From erik.osterlund at oracle.com Wed Oct 16 06:56:08 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Wed, 16 Oct 2019 08:56:08 +0200 Subject: RFR: 8231943: ZGC: Enable serviceability/dcmd/gc/RunGCTest In-Reply-To: <0a2eee49-9bb4-a1be-f8fc-b2efcc01fd59@oracle.com> References: <0a2eee49-9bb4-a1be-f8fc-b2efcc01fd59@oracle.com> Message-ID: <7FC905BD-8F45-4D04-9C2C-C473AB0FA3DD@oracle.com> +1 /Erik > On 10 Oct 2019, at 14:28, Per Liden wrote: > > (CC:ing serviceability-dev) > >> On 10/7/19 2:38 PM, Per Liden wrote: >> This test is currently disabled for ZGC, but it can easily be enabled by adjusting the expected log string.
ZGC doesn't print "Pause Full", but it still prints the "(Diagnostic Command)" part. >> Also, the test enables gc=debug logging, which is unnecessary since this is always printed on the gc=info level. >> Bug: https://bugs.openjdk.java.net/browse/JDK-8231943 >> Webrev: http://cr.openjdk.java.net/~pliden/8231943/webrev.0 >> Testing: Manually ran test with all GCs (except Epsilon) >> /Per From erik.osterlund at oracle.com Wed Oct 16 07:01:48 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Wed, 16 Oct 2019 09:01:48 +0200 Subject: RFR: 8232001: ZGC: Ignore metaspace GC threshold until GC is warm In-Reply-To: References: Message-ID: <489ABCE3-B37A-46D9-AC4B-5535957B7DCE@oracle.com> Hi Per, Looks good. /Erik > On 8 Oct 2019, at 15:03, Per Liden wrote: > > As reported here: > > https://mail.openjdk.java.net/pipermail/zgc-dev/2019-September/000736.html > > The ZDirector heuristics can get off to a bad start if the statistics are contaminated by early "Metaspace GC Threshold" GC requests. To avoid this, we could simply ignore such requests until the GC is warm, at the potential cost of expanding metaspace a bit more during startup. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8232001 > Webrev: http://cr.openjdk.java.net/~pliden/8232001/webrev.0 > > /Per From erik.osterlund at oracle.com Wed Oct 16 07:19:26 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Wed, 16 Oct 2019 09:19:26 +0200 Subject: RFR: 8231996: ZGC: Replace ZStatisticsForceTrace with check if JFR event is enabled In-Reply-To: References: Message-ID: <32019E41-9014-450F-BA62-AB1B71A9B886@oracle.com> Hi Per, Looks good. Thanks, /Erik > On 10 Oct 2019, at 12:28, Per Liden wrote: > > Remove and replace the diagnostic flag ZStatisticsForceTrace with a check if JFR event is enabled. This flag was introduced as a safety measure back when sending JFR events was problematic in some contexts.
This is no longer the case, so we can just let the default.jfc/profile.jfc control when those events should be sent. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8231996 > Webrev: http://cr.openjdk.java.net/~pliden/8231996/webrev.0 > > /Per From per.liden at oracle.com Wed Oct 16 07:44:13 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 16 Oct 2019 09:44:13 +0200 Subject: RFR: 8231940: ZGC: Print correct low/high capacity In-Reply-To: References: <2742b8ba-7fa0-b789-a250-4c9de40e1fc0@oracle.com> Message-ID: <80469a8e-da74-dc06-3a0c-7b1c3dbdbd08@oracle.com> Thanks Erik! /Per On 10/16/19 8:51 AM, Erik Osterlund wrote: > Hi Per, > > Looks good. > > Thanks, > /Erik > >> On 7 Oct 2019, at 13:37, Per Liden wrote: >> >> ?After JDK-8222480, heap capacity can go down, not just up. The heap logging should take that into account when when printing capacity high/low numbers. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8231940 >> Webrev: http://cr.openjdk.java.net/~pliden/8231940/webrev.0 >> >> /Per > From per.liden at oracle.com Wed Oct 16 07:44:33 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 16 Oct 2019 09:44:33 +0200 Subject: RFR: 8232001: ZGC: Ignore metaspace GC threshold until GC is warm In-Reply-To: <489ABCE3-B37A-46D9-AC4B-5535957B7DCE@oracle.com> References: <489ABCE3-B37A-46D9-AC4B-5535957B7DCE@oracle.com> Message-ID: <9e0cc3fe-3430-2503-295b-da3831ce7121@oracle.com> Thanks Erik! /Per On 10/16/19 9:01 AM, Erik Osterlund wrote: > Hi Per, > > Looks good. > > /Erik > >> On 8 Oct 2019, at 15:03, Per Liden wrote: >> >> ?As reported here: >> >> https://mail.openjdk.java.net/pipermail/zgc-dev/2019-September/000736.html >> >> The ZDirector heuristics can get of to a bad start if the statistics is contaminated by early "Metaspace GC Threshold" GC requests. To avoid this, we could simply ignore such requests until the GC is warm, at the potential cost of expanding metaspace a bit more during startup. 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8232001 >> Webrev: http://cr.openjdk.java.net/~pliden/8232001/webrev.0 >> >> /Per > From per.liden at oracle.com Wed Oct 16 07:44:43 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 16 Oct 2019 09:44:43 +0200 Subject: RFR: 8231996: ZGC: Replace ZStatisticsForceTrace with check if JFR event is enabled In-Reply-To: <32019E41-9014-450F-BA62-AB1B71A9B886@oracle.com> References: <32019E41-9014-450F-BA62-AB1B71A9B886@oracle.com> Message-ID: <17b35be2-a22f-f610-64ac-5c409890b6c5@oracle.com> Thanks Erik! /Per On 10/16/19 9:19 AM, Erik Osterlund wrote: > Hi Per, > > Looks good. > > Thanks, > /Erik > >> On 10 Oct 2019, at 12:28, Per Liden wrote: >> >> ?Remove and replace the diagnostic flag ZStatisticsForceTrace with a check if JFR event is enabled. This flag was introduced as a safety measure back when sending JFR events was problematic in some contexts. This is no longer the case, so we can just let the default.jfc/profile.jfc control when those events should be sent. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8231996 >> Webrev: http://cr.openjdk.java.net/~pliden/8231996/webrev.0 >> >> /Per > From per.liden at oracle.com Wed Oct 16 07:44:21 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 16 Oct 2019 09:44:21 +0200 Subject: RFR: 8231943: ZGC: Enable serviceability/dcmd/gc/RunGCTest In-Reply-To: <7FC905BD-8F45-4D04-9C2C-C473AB0FA3DD@oracle.com> References: <0a2eee49-9bb4-a1be-f8fc-b2efcc01fd59@oracle.com> <7FC905BD-8F45-4D04-9C2C-C473AB0FA3DD@oracle.com> Message-ID: Thanks Erik! /Per On 10/16/19 8:56 AM, Erik Osterlund wrote: > +1 > > /Erik > >> On 10 Oct 2019, at 14:28, Per Liden wrote: >> >> ?(CC:ing serviceability-dev) >> >>> On 10/7/19 2:38 PM, Per Liden wrote: >>> This test is currently disabled for ZGC, but it can easily be enabled by adjusting the expected log string. ZGC doesn't print "Pause Full", but it still prints the "(Diagnostic Command)" part. 
>>> Also, the test enables gc=debug logging, which is unnecessary since this is always printed on the gc=info level. >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231943 >>> Webrev: http://cr.openjdk.java.net/~pliden/8231943/webrev.0 >>> Testing: Manually ran test with all GCs (except Epsilon) >>> /Per > From thomas.schatzl at oracle.com Wed Oct 16 08:07:03 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 16 Oct 2019 10:07:03 +0200 Subject: RFR: 8231943: ZGC: Enable serviceability/dcmd/gc/RunGCTest In-Reply-To: References: Message-ID: <54d7c6e1-097d-0e70-0c6a-8fa12f788d74@oracle.com> Hi, On 07.10.19 14:38, Per Liden wrote: > This test is currently disabled for ZGC, but it can easily be enabled by > adjusting the expected log string. ZGC doesn't print "Pause Full", but > it still prints the "(Diagnostic Command)" part. > Not sure if that checking only for that satisfies the requirements of the test. I mean that this is a test to verify that jcmd executes (or starts) a GC. I do not think checking for "(Diagnostic Command)" is enough - it could be any diagnostic command that could be executed. What does ZGC print here? Can the check be made more specific? > Also, the test enables gc=debug logging, which is unnecessary since this > is always printed on the gc=info level. 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8231943 > Webrev: http://cr.openjdk.java.net/~pliden/8231943/webrev.0 > Thanks, Thomas From per.liden at oracle.com Wed Oct 16 08:41:57 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 16 Oct 2019 10:41:57 +0200 Subject: RFR: 8231552: ZGC: Refine address space reservation In-Reply-To: <5015ca7b-3e3e-b2bd-c3f8-0a83ecdb41d8@oracle.com> References: <5015ca7b-3e3e-b2bd-c3f8-0a83ecdb41d8@oracle.com> Message-ID: Latest version of this patch, rebased on today's jdk/jdk: http://cr.openjdk.java.net/~pliden/8231552/webrev.2 /Per On 10/3/19 11:45 AM, Per Liden wrote: > We could be slightly more sophisticated and do a better job reserving > address space in situations where parts of the address space is already > occupied or when the process is running with address space limitations. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8231552 > Webrev: http://cr.openjdk.java.net/~pliden/8231552/webrev.0 > > /Per From per.liden at oracle.com Wed Oct 16 10:27:32 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 16 Oct 2019 12:27:32 +0200 Subject: RFR: 8231943: ZGC: Enable serviceability/dcmd/gc/RunGCTest In-Reply-To: <54d7c6e1-097d-0e70-0c6a-8fa12f788d74@oracle.com> References: <54d7c6e1-097d-0e70-0c6a-8fa12f788d74@oracle.com> Message-ID: <674868d8-b391-9e86-698b-1e510b68dc36@oracle.com> Hi Thomas, On 10/16/19 10:07 AM, Thomas Schatzl wrote: > Hi, > > On 07.10.19 14:38, Per Liden wrote: >> This test is currently disabled for ZGC, but it can easily be enabled >> by adjusting the expected log string. ZGC doesn't print "Pause Full", >> but it still prints the "(Diagnostic Command)" part. >> > Not sure if that checking only for that satisfies the requirements of > the test. I mean that this is a test to verify that jcmd executes (or > starts) a GC. I do not think checking for "(Diagnostic Command)" is > enough - it could be any diagnostic command that could be executed. 
I don't think that's quite true, since the file we're grepping in is the GC log (not stdout), which we know only contains stuff from gc=info. So, only if the GC itself is printing "(Diagnostic Command)" on gc=info level somewhere else is this a problem, which I would find somewhat surprising, no?

>
> What does ZGC print here? Can the check be made more specific?

"Garbage Collection (Diagnostic Command)"

I opted to search for just "(Diagnostic Command)" mainly to keep the test GC agnostic. I don't have a strong opinion, but I don't believe making more specific greps will make the test more robust in practice, for the reason described above.

cheers,
Per

>
>> Also, the test enables gc=debug logging, which is unnecessary since
>> this is always printed on the gc=info level.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231943
>> Webrev: http://cr.openjdk.java.net/~pliden/8231943/webrev.0
>>
>
> Thanks,
>   Thomas

From thomas.schatzl at oracle.com Wed Oct 16 11:13:32 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Wed, 16 Oct 2019 13:13:32 +0200
Subject: RFR: 8231943: ZGC: Enable serviceability/dcmd/gc/RunGCTest
In-Reply-To: <674868d8-b391-9e86-698b-1e510b68dc36@oracle.com>
References: <54d7c6e1-097d-0e70-0c6a-8fa12f788d74@oracle.com>
 <674868d8-b391-9e86-698b-1e510b68dc36@oracle.com>
Message-ID: <2fed46aa-f281-94f1-4aec-c5e45aed8fbc@oracle.com>

Hi,

On 16.10.19 12:27, Per Liden wrote:
> Hi Thomas,
>
> On 10/16/19 10:07 AM, Thomas Schatzl wrote:
>> Hi,
>>
>> On 07.10.19 14:38, Per Liden wrote:
>>> This test is currently disabled for ZGC, but it can easily be enabled
>>> by adjusting the expected log string. ZGC doesn't print "Pause Full",
>>> but it still prints the "(Diagnostic Command)" part.
>>>
>> Not sure if that checking only for that satisfies the requirements of
>> the test. I mean that this is a test to verify that jcmd executes (or
>> starts) a GC.
I do not think checking for "(Diagnostic Command)" is >> enough - it could be any diagnostic command that could be executed. > > I don't think that's quite true, since the file we're greping in is the > GC log (not stdout), which we know only contains stuff from gc=info. So, > only if the GC itself is printing "(Diagnostic Command)" on gc=info > level somewhere else is this a problem, which I would find somewhat > surprising, no? Okay, ship it then :) Thanks for the clarification. Thomas From stefan.johansson at oracle.com Wed Oct 16 12:55:01 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 16 Oct 2019 14:55:01 +0200 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> Message-ID: Hi Sangheon, On 2019-10-15 16:33, sangheon.kim at oracle.com wrote: > Hi all, > > Here's revised webrev which addresses: > 1) G1RegionToSpaceMapper checks mtJavaHeap and then conditionally calls > G1NUMA::request_memory_on_node() (Kim) > 2) The signature of G1NUMA::request_memory_on_node(void* address, ,) is > changed to have actual address instead of page index. (Stefan) > 3) Some local variable name changes at G1RegionToSpaceMapper. i -> > region_idx, idx -> page_idx (for local style, used idx instead of index) > > webrev: > http://cr.openjdk.java.net/~sangheki/8220310/webrev.5/ > http://cr.openjdk.java.net/~sangheki/8220310/webrev.5.inc/ This looks good! 
Thanks for all your hard work,
Stefan

> Testing: hs-tier 1 ~ 5, with/without UseNUMA
>
> Thanks,
> Sangheon
>
>
> On 10/14/19 3:20 PM, Kim Barrett wrote:
>>> On Oct 14, 2019, at 5:03 PM, Kim Barrett wrote:
>>>> 2. Add a state to the mappers to say if they are NUMA aware or not,
>>>> and currently only the heap mapper should be NUMA aware. We could
>>>> either set this state to true using the mtJavaHeap type as we have
>>>> checked before or add an explicit setter that we only call for the
>>>> heap mapper.
>>>>
>>>> I know that only doing 2) will fix the current problem, but I think
>>>> it would be nice to avoid having the base address in G1NUMA, thoughts?
>>> I don't understand the point about mappers needing to know if they are
>>> NUMA or not. request_memory_on_node is only called by the two relevant
>>> region->space mappers, with the memory involved always in the Java
>>> heap (after fixing the units mismatch mentioned above). That is,
>>> G1NUMA::request_memory_on_node should only be called for Java heap
>>> memory. (It might be able to assert is_in_reserved or something like
>>> that, though initialization order might prevent that.)
>> I was confused here too. Sangheon has repaired my confusion, and he's
>> got another change in the works to tidy things up here in a way that I think
>> will make both me and Stefan happy.
>> > From per.liden at oracle.com Wed Oct 16 13:04:13 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 16 Oct 2019 15:04:13 +0200 Subject: RFR: 8231943: ZGC: Enable serviceability/dcmd/gc/RunGCTest In-Reply-To: <2fed46aa-f281-94f1-4aec-c5e45aed8fbc@oracle.com> References: <54d7c6e1-097d-0e70-0c6a-8fa12f788d74@oracle.com> <674868d8-b391-9e86-698b-1e510b68dc36@oracle.com> <2fed46aa-f281-94f1-4aec-c5e45aed8fbc@oracle.com> Message-ID: <6230542f-d42f-0672-a454-7cf65123e35e@oracle.com> On 10/16/19 1:13 PM, Thomas Schatzl wrote: > Hi, > > On 16.10.19 12:27, Per Liden wrote: >> Hi Thomas, >> >> On 10/16/19 10:07 AM, Thomas Schatzl wrote: >>> Hi, >>> >>> On 07.10.19 14:38, Per Liden wrote: >>>> This test is currently disabled for ZGC, but it can easily be >>>> enabled by adjusting the expected log string. ZGC doesn't print >>>> "Pause Full", but it still prints the "(Diagnostic Command)" part. >>>> >>> Not sure if that checking only for that satisfies the requirements of >>> the test. I mean that this is a test to verify that jcmd executes (or >>> starts) a GC. I do not think checking for "(Diagnostic Command)" is >>> enough - it could be any diagnostic command that could be executed. >> >> I don't think that's quite true, since the file we're greping in is >> the GC log (not stdout), which we know only contains stuff from >> gc=info. So, only if the GC itself is printing "(Diagnostic Command)" >> on gc=info level somewhere else is this a problem, which I would find >> somewhat surprising, no? > > Okay, ship it then :) Thanks for the clarification. Ok, thanks for reviewing, Thomas! 
/Per

From kim.barrett at oracle.com Wed Oct 16 14:00:35 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 16 Oct 2019 10:00:35 -0400
Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)
In-Reply-To:
References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com>
 <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com>
 <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com>
 <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com>
 <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com>
 <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com>
 <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com>
Message-ID:

> On Oct 15, 2019, at 10:33 AM, sangheon.kim at oracle.com wrote:
>
> Hi all,
>
> Here's revised webrev which addresses:
> 1) G1RegionToSpaceMapper checks mtJavaHeap and then conditionally calls G1NUMA::request_memory_on_node() (Kim)
> 2) The signature of G1NUMA::request_memory_on_node(void* address, ,) is changed to have actual address instead of page index. (Stefan)
> 3) Some local variable name changes at G1RegionToSpaceMapper. i -> region_idx, idx -> page_idx (for local style, used idx instead of index)
>
> webrev:
> http://cr.openjdk.java.net/~sangheki/8220310/webrev.5/
> http://cr.openjdk.java.net/~sangheki/8220310/webrev.5.inc/
> Testing: hs-tier 1 ~ 5, with/without UseNUMA

Looks good.

In g1PageBasedVirtualSpace.cpp, could the newly added definition of page_size()
be moved to be near the existing definition of page_start()? I don't need a new
webrev if you move it.
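
For readers following the thread, points 1) and 2) of the quoted summary can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not the code from the webrev: NumaSketch, MapperSketch, MemType and NodeRequest are hypothetical stand-ins for G1NUMA, G1RegionToSpaceMapper and the mtJavaHeap check, and the parameters after the address are invented because the summary elides them. The round-robin region-to-node striping follows the "012301230123" description given earlier in this thread and may not match what webrev.5 actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the memory type check (mtJavaHeap vs. anything else).
enum class MemType { JavaHeap, GC };

// One recorded placement request. A real implementation would ask the OS
// to place the memory; this sketch only records what was asked for.
struct NodeRequest {
  void*    address;
  size_t   size_in_bytes;
  unsigned node;
};

// Hypothetical stand-in for G1NUMA.
class NumaSketch {
  unsigned _num_nodes;
  std::vector<NodeRequest> _requests;

public:
  explicit NumaSketch(unsigned num_nodes) : _num_nodes(num_nodes) {
    assert(num_nodes > 0 && "need at least one node");
  }

  // Assumption: regions are striped round-robin over the nodes (0,1,2,3,0,1,...).
  unsigned preferred_node(size_t region_idx) const {
    return static_cast<unsigned>(region_idx % _num_nodes);
  }

  // Takes the actual address rather than a page index. The size and
  // region index parameters are invented for this sketch.
  void request_memory_on_node(void* address, size_t size_in_bytes, size_t region_idx) {
    _requests.push_back({address, size_in_bytes, preferred_node(region_idx)});
  }

  const std::vector<NodeRequest>& requests() const { return _requests; }
};

// Hypothetical stand-in for a region-to-space mapper.
class MapperSketch {
  char*       _base;
  size_t      _region_size;
  MemType     _type;
  NumaSketch* _numa;

public:
  MapperSketch(char* base, size_t region_size, MemType type, NumaSketch* numa)
    : _base(base), _region_size(region_size), _type(type), _numa(numa) {}

  void commit_regions(size_t start_idx, size_t num_regions) {
    for (size_t region_idx = start_idx; region_idx < start_idx + num_regions; region_idx++) {
      // Only the Java heap mapper asks for NUMA placement;
      // mappers for other memory types skip the call.
      if (_type == MemType::JavaHeap) {
        void* address = _base + region_idx * _region_size;
        _numa->request_memory_on_node(address, _region_size, region_idx);
      }
    }
  }
};
```

With four nodes, committing regions 0..7 through a Java heap mapper would record eight requests, each carrying the region's start address, while a mapper of any other type would record none.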
From thomas.schatzl at oracle.com Wed Oct 16 14:05:45 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 16 Oct 2019 16:05:45 +0200 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> Message-ID: <0cfc451b-292c-2ea1-f275-08b186c1e044@oracle.com> Hi, On 15.10.19 16:33, sangheon.kim at oracle.com wrote: > Hi all, > > Here's revised webrev which addresses: > 1) G1RegionToSpaceMapper checks mtJavaHeap and then conditionally calls > G1NUMA::request_memory_on_node() (Kim) > 2) The signature of G1NUMA::request_memory_on_node(void* address, ,) is > changed to have actual address instead of page index. (Stefan) > 3) Some local variable name changes at G1RegionToSpaceMapper. i -> > region_idx, idx -> page_idx (for local style, used idx instead of index) > > webrev: > http://cr.openjdk.java.net/~sangheki/8220310/webrev.5/ > http://cr.openjdk.java.net/~sangheki/8220310/webrev.5.inc/ > Testing: hs-tier 1 ~ 5, with/without UseNUMA > looks good. Thomas From zgu at redhat.com Wed Oct 16 14:44:13 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 16 Oct 2019 10:44:13 -0400 Subject: RFR 8231999: Shenandoah: Traversal failed compiler/jsr292/CallSiteDepContextTest.java Message-ID: <1721621f-a5df-58fc-e007-8d0bf713afa1@redhat.com> This patch partially reverts JDK-8231293's fix, because it hides dead oops from GC, by returning NULL, which causes the failure of this test case. The root cause of JDK-8231293 is that, Traversal deactivates SATB barrier too late, it should be turned off before weak root processing. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8231999
Webrev: http://cr.openjdk.java.net/~zgu/JDK-8231999/webrev.00/

Test:
  hotspot_gc_shenandoah (fastdebug and release) on Linux

After this fix, the CallSiteDepContextTest.java test hangs in traversal mode, but that is a separate issue, tracked by JDK-8232380.

Thanks,

-Zhengyu

From rkennke at redhat.com Wed Oct 16 15:25:08 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 16 Oct 2019 17:25:08 +0200
Subject: RFR 8231999: Shenandoah: Traversal failed compiler/jsr292/CallSiteDepContextTest.java
In-Reply-To: <1721621f-a5df-58fc-e007-8d0bf713afa1@redhat.com>
References: <1721621f-a5df-58fc-e007-8d0bf713afa1@redhat.com>
Message-ID:

So for traversal, it falls to the normal LRB, is that what you intended?

Roman

> This patch partially reverts JDK-8231293's fix, because it hides dead
> oops from GC, by returning NULL, which causes the failure of this test
> case.
>
> The root cause of JDK-8231293 is that, Traversal deactivates SATB
> barrier too late, it should be turned off before weak root processing.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8231999
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8231999/webrev.00/
>
> Test:
>   hotspot_gc_shenandoah (fastdebug and release) on Linux
>
> After this fix, the CallSiteDepContextTest.java test hangs in traversal
> mode, but that is a separate issue, tracked by JDK-8232380.
>
> Thanks,
>
> -Zhengyu
>
>
>

From zgu at redhat.com Wed Oct 16 15:40:33 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 16 Oct 2019 11:40:33 -0400
Subject: RFR 8231999: Shenandoah: Traversal failed compiler/jsr292/CallSiteDepContextTest.java
In-Reply-To:
References: <1721621f-a5df-58fc-e007-8d0bf713afa1@redhat.com>
Message-ID: <86750e0f-d71a-0ad6-26e9-866e6f022686@redhat.com>

On 10/16/19 11:25 AM, Roman Kennke wrote:
> So for traversal, it falls to the normal LRB, is that what you intended?

It is always the case, isn't it?
-Zhengyu > > Roman > >> This patch partially reverts JDK-8231293's fix, because it hides dead >> oops from GC, by returning NULL, which causes the failure of this test >> case. >> >> The root cause of JDK-8231293 is that, Traversal deactivates SATB >> barrier too late, it should be turned off before weak root processing. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8231999 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8231999/webrev.00/ >> >> Test: >> ? hotspot_gc_shenandoah (fastdebug and release) on Linux >> >> After this fix, CallSiteDepContextTest.java test hangs in traversal >> mode, but it is separate issue, tracked by JDK-8232380. >> >> Thanks, >> >> -Zhengyu >> >> >> > From rkennke at redhat.com Wed Oct 16 15:47:17 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 16 Oct 2019 17:47:17 +0200 Subject: RFR 8231999: Shenandoah: Traversal failed compiler/jsr292/CallSiteDepContextTest.java In-Reply-To: <86750e0f-d71a-0ad6-26e9-866e6f022686@redhat.com> References: <1721621f-a5df-58fc-e007-8d0bf713afa1@redhat.com> <86750e0f-d71a-0ad6-26e9-866e6f022686@redhat.com> Message-ID: >> So for traversal, it falls to the normal LRB, is that what you intended? > > It is always the case, isn't it? Yeah sure, just wanted to check if that is what you intended. The patch is ok. Ideally, the code that calls into GC path wouldn't go through the barrier to begin with, though. Can you keep a record of which code path does that? Roman > > > -Zhengyu > >> >> Roman >> >>> This patch partially reverts JDK-8231293's fix, because it hides dead >>> oops from GC, by returning NULL, which causes the failure of this test >>> case. >>> >>> The root cause of JDK-8231293 is that, Traversal deactivates SATB >>> barrier too late, it should be turned off before weak root processing. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231999 >>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8231999/webrev.00/ >>> >>> Test: >>> ?? 
hotspot_gc_shenandoah (fastdebug and release) on Linux >>> >>> After this fix, CallSiteDepContextTest.java test hangs in traversal >>> mode, but it is separate issue, tracked by JDK-8232380. >>> >>> Thanks, >>> >>> -Zhengyu >>> >>> >>> >> From sangheon.kim at oracle.com Wed Oct 16 17:54:02 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Wed, 16 Oct 2019 10:54:02 -0700 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> Message-ID: <9219a118-0c1d-2cee-10e5-f9bb87c72eb9@oracle.com> Hi Kim, Stefan and Thomas, Many thanks for the reviews and suggestions! Kim, I will move page_size() near page_start() before push as you suggested. As you know, all 3 patches will be pushed together though. Thanks, Sangheon On 10/16/19 7:00 AM, Kim Barrett wrote: >> On Oct 15, 2019, at 10:33 AM, sangheon.kim at oracle.com wrote: >> >> Hi all, >> >> Here's revised webrev which addresses: >> 1) G1RegionToSpaceMapper checks mtJavaHeap and then conditionally calls G1NUMA::request_memory_on_node() (Kim) >> 2) The signature of G1NUMA::request_memory_on_node(void* address, ,) is changed to have actual address instead of page index. (Stefan) >> 3) Some local variable name changes at G1RegionToSpaceMapper. i -> region_idx, idx -> page_idx (for local style, used idx instead of index) >> >> webrev: >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.5/ >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.5.inc/ >> Testing: hs-tier 1 ~ 5, with/without UseNUMA > Looks good. 
>
> In g1PageBasedVirtualSpace.cpp, could the newly added definition of page_size()
> be moved to be near the existing definition of page_start()? I don't need a new
> webrev if you move it.
>

From sangheon.kim at oracle.com Wed Oct 16 18:02:50 2019
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Wed, 16 Oct 2019 11:02:50 -0700
Subject: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps.
In-Reply-To:
References:
Message-ID:

Hi Kishor,

Before reviewing webrev.02, could you remind us what was the motivation of pinning the bitmap mappers here?
In addition to explanations of the problematic situation, any logs / stack-trace also may help.
We think that understanding of the root cause should be considered first.

Thanks,
Sangheon

On 10/15/19 6:23 PM, Kharbas, Kishor wrote:
>
> Thank you for the suggestions.
>
> In this webrev I added a flag to ReservedSpace constructors to direct
> it to pin the memory space. So now G1PageBasedVirtualSpace does not
> have to do special handling.
>
> http://cr.openjdk.java.net/~kkharbas/8215893/webrev.02/
>
> To add more to Sangheon's reply to Stefan's question,
>
> > Another thing, can you remind me why we need the bitmaps to be pinned but not other structures such as the card table?
>
> When I implemented this feature I had run into issue with the default
> implementation of concurrent marking bitmaps.
>
> Thanks,
>
> Kishor
>
> *From:*sangheon.kim at oracle.com [mailto:sangheon.kim at oracle.com]
> *Sent:* Wednesday, October 9, 2019 2:42 PM
> *To:* Kharbas, Kishor ;
> hotspot-gc-dev at openjdk.java.net
> *Cc:* Stefan Johansson
> *Subject:* Re: RFR(S): 8215893: Add better abstraction for pinning G1
> concurrent marking bitmaps.
>
> Hi Kishor,
>
> On 10/4/19 4:15 PM, Kharbas, Kishor wrote:
>
> Hi Stefan,
>
> Thanks for the review. Some comments inline.
> > New webrev : > http://cr.openjdk.java.net/~kkharbas/8215893/webrev.00_to_01/ > > http://cr.openjdk.java.net/~kkharbas/8215893/webrev.01/ > > > I am reviewing the patch but have a question on top of Stefan's > question[1]. > Why the bimap mappers are committed? I think all troubles started from > 'committing but treating as special here. Couldn't just treat the > bitmap mappers as 'special' without commit? > If 'not committing' is doable, couldn't simply create ReservedSpace > with 'special' enabled (independent to large page setting, which is > same to Stefan's comment)? Or add PinnedResevedSpace to force 'special > enabled'. > > [1]: Another thing, can you remind me why we need the bitmaps to be > pinned but not other structures such as the card table? > > +HeterogeneousHeapRegionManager::initialize() > ... > +? // We commit bitmap for all regions during initialization and mark > the bitmap space as special. > +? // This allows regions to be un-committed while concurrent-marking > threads are accesing the bitmap concurrently. > > Thanks, > Sangheon > > > > > Hi Kishor, > > > > > > On 04.10.19 03:00, Kharbas, Kishor wrote: > > >> Hi, > > >> When I worked on > JDK-8211425 > , there was a > request for better abstraction for pinning G1's CM bitmaps. RFE > for the request is here - > JDK-8215893 > . > > >> > > >> Here is a proposal : > http://cr.openjdk.java.net/~kkharbas/8215893/webrev.00/ > > > >> > > >> Here G1PageBasedVirtualSpace pins the entire reserved memory to > memory during construction. The constructor takes an additional > bool flag which says "does it need to pin the memory". > > >> If the memory is pinned, '_special' flag is set to true. I > piggy back on _special flag's behavior which is to not do actual > OS (un-)commits on calls to (un)commit(). > > >> Rest of the changes is the mechanism to pass this flag from CM > bitmaps creation in G1CollectedHeap all the way to > G1PageBasedVirtualSpace. 
> > >> > > >> Let me know if this is a good abstraction and if there is any > better way. > > >> > > >> Thanks > > >> Kishor > > >> > > > > > > Some comments: > > > > > > - in the parameter lists, if the parameters are already laid out > > > line-by-line, if adding a new one, please put it on a new line > as well. > > > > > Fixed in the new webrev. > > > - this code > > > > > >??? if (_special) { > > >????? if (!rs.special()) { > > > commit_internal(addr_to_page_index(_low_boundary), > > > addr_to_page_index(_high_boundary)); > > >????? } > > > > > > in g1PageBasedVirtualSpace looks very incomprehensible.? :) > > > > > > I would prefer (pending the second reviewer's comment) to either > use the > > > "pinned" flag here, or even better, move the necessary commit > calls into > > > the (now removed) HeterogeneousHeapRegionManager::initialize(). > > > > > Made it little more comprehensible. Will see what other reviewers > think about moving it somewhere else. > > > - I would just purely from feeling prefer if the "pinned" flag > parameter > > > would be listed after the "type" parameter in the > G1RegionToSpaceMapper. > > > But that's probably just me. > > > > > I did it this way to logically group the parameters. MemTracker is > a tracker used by the VM everywhere and does not pertain to this > class as such, so I kept it in the end. > > > Also, finally one parameter per line for the > declaration/definition of > > > the constructor would improve readability. > > > > > Done. > > Thank you, > > Kishor > > > Thanks, > > >? ??Thomas > From manc at google.com Wed Oct 16 18:16:46 2019 From: manc at google.com (Man Cao) Date: Wed, 16 Oct 2019 11:16:46 -0700 Subject: RFR(S): 8232232: G1RemSetSummary::_rs_threads_vtimes is not initialized to zero In-Reply-To: <75800cc4-9d67-bc11-c0b7-4b0a56b85ab5@oracle.com> References: <75800cc4-9d67-bc11-c0b7-4b0a56b85ab5@oracle.com> Message-ID: Thanks for the reviews. 
-Man From serguei.spitsyn at oracle.com Wed Oct 16 23:21:55 2019 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 16 Oct 2019 16:21:55 -0700 Subject: RFR: 8231943: ZGC: Enable serviceability/dcmd/gc/RunGCTest In-Reply-To: <7FC905BD-8F45-4D04-9C2C-C473AB0FA3DD@oracle.com> References: <0a2eee49-9bb4-a1be-f8fc-b2efcc01fd59@oracle.com> <7FC905BD-8F45-4D04-9C2C-C473AB0FA3DD@oracle.com> Message-ID: Hi Per, Looks good. Thanks, Serguei On 10/15/19 23:56, Erik Osterlund wrote: > +1 > > /Erik > >> On 10 Oct 2019, at 14:28, Per Liden wrote: >> >> ?(CC:ing serviceability-dev) >> >>> On 10/7/19 2:38 PM, Per Liden wrote: >>> This test is currently disabled for ZGC, but it can easily be enabled by adjusting the expected log string. ZGC doesn't print "Pause Full", but it still prints the "(Diagnostic Command)" part. >>> Also, the test enables gc=debug logging, which is unnecessary since this is always printed on the gc=info level. >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231943 >>> Webrev: http://cr.openjdk.java.net/~pliden/8231943/webrev.0 >>> Testing: Manually ran test with all GCs (except Epsilon) >>> /Per From kishor.kharbas at intel.com Thu Oct 17 01:39:48 2019 From: kishor.kharbas at intel.com (Kharbas, Kishor) Date: Thu, 17 Oct 2019 01:39:48 +0000 Subject: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps. In-Reply-To: References: Message-ID: Hi Sangheon, From: sangheon.kim at oracle.com [mailto:sangheon.kim at oracle.com] Sent: Wednesday, October 16, 2019 11:03 AM To: Kharbas, Kishor Cc: hotspot-gc-dev at openjdk.java.net; Stefan Johansson Subject: Re: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps. Hi Kishor, Before reviewing webrev.02, could you remind us what was the motivation of pinning the bitmap mappers here? In addition to explanations of the problematic situation, any logs / stack-trace also may help. 
We think that understanding of the root cause should be considered first.

Unfortunately, I do not have a log or stack trace of the problem I faced. I am trying to reproduce it by running the SPECjbb workload over and over again.
I haven't looked at the GC code since the end of last year, so I am having a difficult time pinning down what the problem was.

I am looking at G1ClearBitMapTask, which iterates over the bitmap for all available regions. I am not sure when this task is performed.
There is a comment in HeapRegionManager::par_iterate(), shown below:

// This also (potentially) iterates over regions newly allocated during GC. This
// is no problem except for some extra work.

This method is eventually called from G1ClearBitMapTask. The comment suggests that regions may be allocated concurrently while the function runs. This also means that, with the AllocateOldGenAt flag enabled, regions can also be un-committed.
Pardon me if my understanding is incorrect.

Regards,
Kishor

Thanks,
Sangheon

On 10/15/19 6:23 PM, Kharbas, Kishor wrote:

Thank you for the suggestions.

In this webrev I added a flag to ReservedSpace constructors to direct it to pin the memory space. So now G1PageBasedVirtualSpace does not have to do special handling.

http://cr.openjdk.java.net/~kkharbas/8215893/webrev.02/

To add more to Sangheon's reply to Stefan's question,

> Another thing, can you remind me why we need the bitmaps to be pinned but not other structures such as the card table?

When I implemented this feature I had run into issue with the default implementation of concurrent marking bitmaps.

Thanks,
Kishor

From: sangheon.kim at oracle.com [mailto:sangheon.kim at oracle.com]
Sent: Wednesday, October 9, 2019 2:42 PM
To: Kharbas, Kishor ; hotspot-gc-dev at openjdk.java.net
Cc: Stefan Johansson
Subject: Re: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps.

Hi Kishor,

On 10/4/19 4:15 PM, Kharbas, Kishor wrote:

Hi Stefan,

Thanks for the review. Some comments inline.
New webrev :
http://cr.openjdk.java.net/~kkharbas/8215893/webrev.00_to_01/
http://cr.openjdk.java.net/~kkharbas/8215893/webrev.01/

I am reviewing the patch but have a question on top of Stefan's question[1].
Why are the bitmap mappers committed?
I think all the trouble started from 'committing but treating as special' here.
Couldn't we just treat the bitmap mappers as 'special' without committing?
If 'not committing' is doable, couldn't we simply create a ReservedSpace with 'special' enabled (independent of the large page setting, which is the same as Stefan's comment)? Or add a PinnedReservedSpace to force 'special' enabled.

[1]: Another thing, can you remind me why we need the bitmaps to be pinned but not other structures such as the card table?

+HeterogeneousHeapRegionManager::initialize()
...
+  // We commit bitmap for all regions during initialization and mark the bitmap space as special.
+  // This allows regions to be un-committed while concurrent-marking threads are accesing the bitmap concurrently.

Thanks,
Sangheon

> Hi Kishor,
>
> On 04.10.19 03:00, Kharbas, Kishor wrote:
>> Hi,
>> When I worked on JDK-8211425, there was a request for better abstraction for pinning G1's CM bitmaps. RFE for the request is here - JDK-8215893.
>>
>> Here is a proposal : http://cr.openjdk.java.net/~kkharbas/8215893/webrev.00/
>>
>> Here G1PageBasedVirtualSpace pins the entire reserved memory to memory during construction. The constructor takes an additional bool flag which says "does it need to pin the memory".
>> If the memory is pinned, '_special' flag is set to true. I piggy back on _special flag's behavior which is to not do actual OS (un-)commits on calls to (un)commit().
>> Rest of the changes is the mechanism to pass this flag from CM bitmaps creation in G1CollectedHeap all the way to G1PageBasedVirtualSpace.
>>
>> Let me know if this is a good abstraction and if there is any better way.
>> >> Thanks >> Kishor >> > > Some comments: > > - in the parameter lists, if the parameters are already laid out > line-by-line, if adding a new one, please put it on a new line as well. > Fixed in the new webrev. > - this code > > if (_special) { > if (!rs.special()) { > commit_internal(addr_to_page_index(_low_boundary), > addr_to_page_index(_high_boundary)); > } > > in g1PageBasedVirtualSpace looks very incomprehensible. :) > > I would prefer (pending the second reviewer's comment) to either use the > "pinned" flag here, or even better, move the necessary commit calls into > the (now removed) HeterogeneousHeapRegionManager::initialize(). > Made it little more comprehensible. Will see what other reviewers think about moving it somewhere else. > - I would just purely from feeling prefer if the "pinned" flag parameter > would be listed after the "type" parameter in the G1RegionToSpaceMapper. > But that's probably just me. > I did it this way to logically group the parameters. MemTracker is a tracker used by the VM everywhere and does not pertain to this class as such, so I kept it in the end. > Also, finally one parameter per line for the declaration/definition of > the constructor would improve readability. > Done. Thank you, Kishor > Thanks, > Thomas From per.liden at oracle.com Thu Oct 17 08:44:03 2019 From: per.liden at oracle.com (Per Liden) Date: Thu, 17 Oct 2019 10:44:03 +0200 Subject: RFR: 8231943: ZGC: Enable serviceability/dcmd/gc/RunGCTest In-Reply-To: References: <0a2eee49-9bb4-a1be-f8fc-b2efcc01fd59@oracle.com> <7FC905BD-8F45-4D04-9C2C-C473AB0FA3DD@oracle.com> Message-ID: <9b63f39b-8fb6-a6e9-57be-a63df0e6ede4@oracle.com> Thanks Serguei! /Per On 2019-10-17 01:21, serguei.spitsyn at oracle.com wrote: > Hi Per, > > Looks good. 
>
> Thanks,
> Serguei
>
>
> On 10/15/19 23:56, Erik Osterlund wrote:
>> +1
>>
>> /Erik
>>
>>> On 10 Oct 2019, at 14:28, Per Liden wrote:
>>>
>>> (CC:ing serviceability-dev)
>>>
>>>> On 10/7/19 2:38 PM, Per Liden wrote:
>>>> This test is currently disabled for ZGC, but it can easily be
>>>> enabled by adjusting the expected log string. ZGC doesn't print
>>>> "Pause Full", but it still prints the "(Diagnostic Command)" part.
>>>> Also, the test enables gc=debug logging, which is unnecessary since
>>>> this is always printed on the gc=info level.
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231943
>>>> Webrev: http://cr.openjdk.java.net/~pliden/8231943/webrev.0
>>>> Testing: Manually ran test with all GCs (except Epsilon)
>>>> /Per
>

From stefan.johansson at oracle.com Thu Oct 17 11:34:00 2019
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Thu, 17 Oct 2019 13:34:00 +0200
Subject: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps.
In-Reply-To:
References:
Message-ID:

Hi Kishor,

On 2019-10-17 03:39, Kharbas, Kishor wrote:
> Hi Sangheon,
>
> *From:*sangheon.kim at oracle.com [mailto:sangheon.kim at oracle.com]
> *Sent:* Wednesday, October 16, 2019 11:03 AM
> *To:* Kharbas, Kishor
> *Cc:* hotspot-gc-dev at openjdk.java.net; Stefan Johansson
>
> *Subject:* Re: RFR(S): 8215893: Add better abstraction for pinning G1
> concurrent marking bitmaps.
>
>> Hi Kishor,
>>
>> Before reviewing webrev.02, could you remind us what was the motivation
>> of pinning the bitmap mappers here?
>> In addition to explanations of the problematic situation, any logs /
>> stack-trace also may help.
>>
>> We think that understanding of the root cause should be considered first.
>
> Unfortunately, I do not have log/stack-trace of the problem I had faced.
>
> I am trying to reproduce it by running SPECjbb workload over and over again.
>
> I haven't looked at GC code since end of last year.
> So I am having a
> difficult time pinning what the problem was.
>
> I am looking at G1ClearBitMapTask which iterates over bitmap for all
> available regions. I am not sure when this task is performed.
>
> There is comment in HeapRegionManager::par_iterate() as shown below,
>
> // This also (potentially) iterates over regions newly allocated during GC. This
> // is no problem except for some extra work.
>
> This method is eventually called from G1ClearBitMapTask. The comment
> suggests that regions are allocated concurrently when the function is
> run. This also means with AllocateOldGenAt flag enabled, regions can
> also be un-committed.

I don't understand how AllocateOldGenAt would make any difference, regions can be un-committed without it as well and there are mechanisms in place to make sure only the correct parts of the side structures are un-committed when that happens.

I want to reiterate what Sangheon said about identifying the root cause. If we don't know why this is needed and can't reproduce any failures without the special pinning of the bitmaps, I would rather see that we remove the pinning code to make things work more like normal G1.

Thanks,
Stefan

>
> Pardon me if my understanding is incorrect.
> > Regards, > > Kishor From shade at redhat.com Thu Oct 17 17:00:30 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 17 Oct 2019 19:00:30 +0200 Subject: RFR (XS) 8232534: Shenandoah: guard against reentrant ShenandoahHeapLock locking Message-ID: <75af279f-5a46-de65-9f16-dd064ac98210@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8232534 This one was very useful for debugging: diff -r 55fe0d93bdd3 src/hotspot/share/gc/shenandoah/shenandoahLock.hpp --- a/src/hotspot/share/gc/shenandoah/shenandoahLock.hpp Tue Oct 15 22:22:23 2019 -0400 +++ b/src/hotspot/share/gc/shenandoah/shenandoahLock.hpp Thu Oct 17 18:59:27 2019 +0200 @@ -39,10 +39,13 @@ public: ShenandoahLock() : _state(unlocked), _owner(NULL) {}; void lock() { +#ifdef ASSERT + assert(_owner != Thread::current(), "reentrant locking attempt, would deadlock"); +#endif Thread::SpinAcquire(&_state, "Shenandoah Heap Lock"); #ifdef ASSERT assert(_state == locked, "must be locked"); assert(_owner == NULL, "must not be owned"); _owner = Thread::current(); Testing: hotspot_gc_shenandoah; multiple assert failures due to bugs in development -- Thanks, -Aleksey From shade at redhat.com Thu Oct 17 18:10:44 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 17 Oct 2019 20:10:44 +0200 Subject: RFR (S) 8232573: Shenandoah: cleanup and add more logging for in-pause phases Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8232573 Fix: https://cr.openjdk.java.net/~shade/8232573/webrev.01 This improves profiling for pauses, fixing issues recently found when doing some performance investigations and development work. 
Testing: hotspot_gc_shenandoah, eyeballing -Xlog:gc+stats -- Thanks, -Aleksey From rkennke at redhat.com Thu Oct 17 18:13:53 2019 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 17 Oct 2019 20:13:53 +0200 Subject: RFR (XS) 8232534: Shenandoah: guard against reentrant ShenandoahHeapLock locking In-Reply-To: <75af279f-5a46-de65-9f16-dd064ac98210@redhat.com> References: <75af279f-5a46-de65-9f16-dd064ac98210@redhat.com> Message-ID: Yep, good and useful. Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8232534 > > This one was very useful for debugging: > > diff -r 55fe0d93bdd3 src/hotspot/share/gc/shenandoah/shenandoahLock.hpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahLock.hpp Tue Oct 15 22:22:23 2019 -0400 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahLock.hpp Thu Oct 17 18:59:27 2019 +0200 > @@ -39,10 +39,13 @@ > > public: > ShenandoahLock() : _state(unlocked), _owner(NULL) {}; > > void lock() { > +#ifdef ASSERT > + assert(_owner != Thread::current(), "reentrant locking attempt, would deadlock"); > +#endif > Thread::SpinAcquire(&_state, "Shenandoah Heap Lock"); > #ifdef ASSERT > assert(_state == locked, "must be locked"); > assert(_owner == NULL, "must not be owned"); > _owner = Thread::current(); > > Testing: hotspot_gc_shenandoah; multiple assert failures due to bugs in development > From rkennke at redhat.com Thu Oct 17 18:15:37 2019 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 17 Oct 2019 20:15:37 +0200 Subject: RFR (S) 8232573: Shenandoah: cleanup and add more logging for in-pause phases In-Reply-To: References: Message-ID: <65205df1-b696-6f0e-5ef8-29f6e57a45ca@redhat.com> Good, that seems useful. Patch looks good. Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8232573 > > Fix: > https://cr.openjdk.java.net/~shade/8232573/webrev.01 > > This improves profiling for pauses, fixing issues recently found when doing some performance > investigations and development work. 
>
> Testing: hotspot_gc_shenandoah, eyeballing -Xlog:gc+stats
>

From zgu at redhat.com Thu Oct 17 18:16:35 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 17 Oct 2019 14:16:35 -0400
Subject: RFR 8231324: Shenandoah: avoid duplicated weak root works during final traversal
In-Reply-To:
References:
Message-ID: <07f55f1f-15ac-339f-37aa-135be1ff2bde@redhat.com>

Updated after JDK-8231999.

Changed: heap->is_concurrent_traversal_in_progress() to heap->is_traversal_mode()

Webrev: http://cr.openjdk.java.net/~zgu/JDK-8231324/webrev.01/

Reran hotspot_gc_shenandoah test.

-Zhengyu

On 10/4/19 10:51 AM, Zhengyu Gu wrote:
> Please review this patch that avoids traversal GC to walk weak roots
> twice during final traversal.
>
> Also, it should process weak roots first, so that, fixup phase does not
> visit dead CLDs/codes, etc.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8231324
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8231324/webrev.00/
>
> Test:
>   hotspot_gc_shenandoah (fastdebug and release) on Linux x86_64
>
> Thanks,
>
> -Zhengyu

From kishor.kharbas at intel.com Thu Oct 17 21:28:10 2019
From: kishor.kharbas at intel.com (Kharbas, Kishor)
Date: Thu, 17 Oct 2019 21:28:10 +0000
Subject: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps.
In-Reply-To:
References:
Message-ID:

Hi Stefan,

> -----Original Message-----
> From: Stefan Johansson [mailto:stefan.johansson at oracle.com]
> Sent: Thursday, October 17, 2019 4:34 AM
> To: Kharbas, Kishor ; sangheon.kim at oracle.com
> Cc: hotspot-gc-dev at openjdk.java.net
> Subject: Re: RFR(S): 8215893: Add better abstraction for pinning G1
> concurrent marking bitmaps.
> > Hi Kishor, > > On 2019-10-17 03:39, Kharbas, Kishor wrote: > > Hi Sangheon, > > > > *From:*sangheon.kim at oracle.com [mailto:sangheon.kim at oracle.com] > > *Sent:* Wednesday, October 16, 2019 11:03 AM > > *To:* Kharbas, Kishor > > *Cc:* hotspot-gc-dev at openjdk.java.net; Stefan Johansson > > > > *Subject:* Re: RFR(S): 8215893: Add better abstraction for pinning G1 > > concurrent marking bitmaps. > > > >> Hi Kishor, > >> > >> Before reviewing webrev.02, could you remind us what was the > >> motivation of pinning the bitmap mappers here? > >> In addition to explanations of the problematic situation, any logs / > >> stack-trace also may help. > >> > >> We think that understanding of the root cause should be considered first. > > > > Unfortunately, I do not have log/stack-trace of the problem I had faced. > > > > I am trying to reproduce it by running SPECjbb workload over and over > again. > > > > I haven't looked at GC code since end of last year. So I am having a > > difficult time pinning what the problem was. > > > > I am looking at G1ClearBitMapTask which iterates over bitmap for all > > available regions. I am not sure when this task is performed. > > > > There is comment in HeapRegionManager::par_iterate() as shown below, > > > > /// This also (potentially) iterates over regions newly allocated > > during GC. This/ > > > > /? // is no problem except for some extra work./ > > > > This method is eventually called from G1ClearBitMapTask. The comment > > suggests that regions are allocated concurrently when the function is > > run. This also means with AllocateOldGenAt flag enabled, regions can > > also be un-committed. > > I don't understand how AllocateOldGenAt would make any difference, > regions can be un-committed without it as well and there are mechanisms in > place to make sure only the correct parts of the side structures are un- > committed when that happens. In the regular code un-commit is only done by VM thread during safepoint. 
Un-commit of region also causes its corresponding bitmap to be un-committed. But it never happens that CM threads are iterating over bitmap while regions are being un-committed concurrently. Whereas when AllocateOldGenAt is used, because of the way regions are managed between dram and nvdimms, regions can be un-committed by mutator threads and GC threads. 1. Mutator threads - during mutator region allocation and humongous region allocation. 2. GC worker threads - during survivor region and old region allocation. 3. VMThread - heap size adjustment as in default and after full GC to allocate enough regions in dram for young gen (may require to un-commit some regions from nvdimm). Could any of these be running concurrently when CM threads are iterating over the bitmap? > > I want to reiterate what Sangheon said about identifying the root cause. > If we don't know why this is needed and can't reproduce any failures without > the special pinning of the bitmaps, I would rather see that we remove the > pinning code to make things work more like normal G1. I am trying to reproduce but as you can imagine it is very rare and hard-to-reproduce bug, if it is. Thanks, Kishor > > Thanks, > Stefan > > > > > > Pardon me if my understanding is incorrect. > > > > Regards, > > > > Kishor From zgu at redhat.com Fri Oct 18 12:23:17 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 18 Oct 2019 08:23:17 -0400 Subject: RFR 8232009: Shenandoah: C2 load barrier does not match interpreter version In-Reply-To: <689f5b52-de3b-6a1a-0032-365dedf58414@redhat.com> References: <689f5b52-de3b-6a1a-0032-365dedf58414@redhat.com> Message-ID: <737de3b3-fb29-1795-89c7-99781da22a09@redhat.com> Updated: Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232009/webrev.01/ Added PHANTOM_OOP_REF references. Test: Reran tests Thanks, -Zhengyu On 10/11/19 1:11 PM, Zhengyu Gu wrote: > Please review this patch that matches C2 load barrier to interpreter's > implementation. 
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8232009
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232009/webrev.00/
>
> Test:
>   hotspot_gc_shenandoah (fastdebug and release) with x86_32 and x86_64
> JVMs on Linux
>
>
> Thanks,
>
> -Zhengyu

From rkennke at redhat.com Fri Oct 18 13:07:45 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 18 Oct 2019 15:07:45 +0200
Subject: RFR 8232009: Shenandoah: C2 load barrier does not match interpreter version
In-Reply-To: <737de3b3-fb29-1795-89c7-99781da22a09@redhat.com>
References: <689f5b52-de3b-6a1a-0032-365dedf58414@redhat.com> <737de3b3-fb29-1795-89c7-99781da22a09@redhat.com>
Message-ID: <1b753db4-70a8-2b77-fb44-c42dc34aec41@redhat.com>

Ok. Thank you!

Roman

> Updated: Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232009/webrev.01/
>
> Added PHANTOM_OOP_REF references.
>
> Test:
>   Reran tests
>
> Thanks,
>
> -Zhengyu
>
> On 10/11/19 1:11 PM, Zhengyu Gu wrote:
>> Please review this patch that matches C2 load barrier to interpreter's
>> implementation.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232009
>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232009/webrev.00/
>>
>> Test:
>>   hotspot_gc_shenandoah (fastdebug and release) with x86_32 and
>> x86_64 JVMs on Linux
>>
>>
>> Thanks,
>>
>> -Zhengyu
>

From rkennke at redhat.com Fri Oct 18 13:09:32 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 18 Oct 2019 15:09:32 +0200
Subject: RFR 8231324: Shenandoah: avoid duplicated weak root works during final traversal
In-Reply-To: <07f55f1f-15ac-339f-37aa-135be1ff2bde@redhat.com>
References: <07f55f1f-15ac-339f-37aa-135be1ff2bde@redhat.com>
Message-ID: <1ee00447-919e-c127-3f0f-d65c60aab057@redhat.com>

Looks good. (I was pretty sure I looked through it yesterday already. Hmm.)

Thanks,
Roman

> Updated after JDK-8231999.
>
> Changed: heap->is_concurrent_traversal_in_progress() to
> heap->is_traversal_mode()
>
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8231324/webrev.01/
>
> Reran hotspot_gc_shenandoah test.
>
> -Zhengyu
>
>
> On 10/4/19 10:51 AM, Zhengyu Gu wrote:
>> Please review this patch that avoids traversal GC to walk weak roots
>> twice during final traversal.
>>
>> Also, it should process weak roots first, so that, fixup phase does
>> not visit dead CLDs/codes, etc.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231324
>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8231324/webrev.00/
>>
>> Test:
>>   hotspot_gc_shenandoah (fastdebug and release) on Linux x86_64
>>
>> Thanks,
>>
>> -Zhengyu
>

From rkennke at redhat.com Fri Oct 18 13:10:09 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 18 Oct 2019 15:10:09 +0200
Subject: RFR 8232008: Shenandoah: C1 load barrier does not match interpreter version
In-Reply-To:
References:
Message-ID:

Looks good. Thanks!

Roman

>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8232008
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232008/webrev.00/
>
> Test:
>   hotspot_gc_shenandoah (fastdebug and release) with x86_64 and x86-32
> JVM on Linux.
>
> Thanks,
>
> -Zhengyu
>

From rkennke at redhat.com Fri Oct 18 13:13:52 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 18 Oct 2019 15:13:52 +0200
Subject: RFR 8232010: Shenandoah: implement self-fixing native barrier
In-Reply-To: <6cecca8a-a477-53b4-48de-f504a2100955@redhat.com>
References: <6cecca8a-a477-53b4-48de-f504a2100955@redhat.com>
Message-ID:

Would a similar implementation also work for the non-native LRB?

It's lacking an aarch64 implementation, right?

Roman

> Please review this patch that implements self-fixing LRB for in native
> oops.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8232010
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232010/webrev.00/
>
> Test:
>   hotspot_gc_shenandoah (fastdebug and release) with x86_32 and x86_64
> JVM on Linux.
>
> Thanks,
>
> -Zhengyu
>

From zgu at redhat.com Fri Oct 18 13:24:36 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 18 Oct 2019 09:24:36 -0400
Subject: RFR 8232010: Shenandoah: implement self-fixing native barrier
In-Reply-To:
References: <6cecca8a-a477-53b4-48de-f504a2100955@redhat.com>
Message-ID:

On 10/18/19 9:13 AM, Roman Kennke wrote:
> Would a similar implementation also work for the non-native LRB?

Yes, we just need to make the LRB stub take a second parameter.

>
> It's lacking an aarch64 implementation, right?

aarch64 misses all recent barrier changes. I intend to implement them after stabilizing x86.

Thanks,

-Zhengyu

>
> Roman
>
>
>> Please review this patch that implements self-fixing LRB for in native
>> oops.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8232010
>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232010/webrev.00/
>>
>> Test:
>>   hotspot_gc_shenandoah (fastdebug and release) with x86_32 and x86_64
>> JVM on Linux.
>>
>> Thanks,
>>
>> -Zhengyu
>>
>

From shade at redhat.com Fri Oct 18 13:58:23 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 18 Oct 2019 15:58:23 +0200
Subject: RFR (M) 8232575: Shenandoah: asynchronous object/region pinning
Message-ID:

RFE:
https://bugs.openjdk.java.net/browse/JDK-8232575

Current object/region pinning scheme bottlenecks on a lock, rendering some non-exceptional scenarios quite slow. The way out is to collect critical pins atomically, and then update the region states near the code that needs it (mostly selecting collection set). See the bug for more info.
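The idea in the paragraph above (count pins atomically, reconcile region state later) can be illustrated with a small stand-alone sketch. This is not the code from the webrev: `Region`, `pin`, `unpin` and `sync_pinned_state` are hypothetical names chosen for the illustration, and the sketch assumes that the reconciling step runs at a point where no pin or unpin can race with it (for example inside a pause).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a heap region; not Shenandoah's real class.
class Region {
public:
  enum State { Regular, Pinned };

  // Mutator side: entering a critical section only bumps a counter,
  // so no lock is taken on this path.
  void pin() { _pin_count.fetch_add(1); }

  // Mutator side: leaving a critical section.
  void unpin() {
    size_t prev = _pin_count.fetch_sub(1);
    assert(prev > 0 && "unpin without matching pin");
    (void)prev;
  }

  size_t pin_count() const { return _pin_count.load(); }

  // Collector side: fold the counter into the region state right before
  // the state is needed, e.g. before choosing the collection set.
  // Assumption: no pin()/unpin() runs concurrently with this call.
  void sync_pinned_state() {
    _state = (pin_count() > 0) ? Pinned : Regular;
  }

  State state() const { return _state; }
  bool can_be_evacuated() const { return _state == Regular; }

private:
  std::atomic<size_t> _pin_count{0};
  State _state = Regular;
};
```

The point of the split is that the frequently executed pin/unpin path touches only the counter, while the comparatively rare state transition is done by the single party that consumes the state.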
Fix: https://cr.openjdk.java.net/~shade/8232575/webrev.02/ Testing: hotspot_gc_shenandoah {fastdebug,release}; tier{1,2,3} with Shenandoah; GZIP workload with {normal, traversal} x {adaptive, aggressive} -- Thanks, -Aleksey From rkennke at redhat.com Fri Oct 18 14:03:50 2019 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 18 Oct 2019 16:03:50 +0200 Subject: RFR (M) 8232575: Shenandoah: asynchronous object/region pinning In-Reply-To: References: Message-ID: <63521dc4-53e8-72b4-852b-1d77def03c62@redhat.com> Patch looks good! Thank you! Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8232575 > > Current object/region pinning scheme bottlenecks on a lock, rendering some non-exceptional scenarios > quite slow. The way out is to collect critical pins atomically, and then update the region states > near the code that needs it (mostly selecting collection set). See the bug for more info. > > Fix: > https://cr.openjdk.java.net/~shade/8232575/webrev.02/ > > Testing: hotspot_gc_shenandoah {fastdebug,release}; tier{1,2,3} with Shenandoah; GZIP workload with > {normal, traversal} x {adaptive, aggressive} > From stefan.johansson at oracle.com Fri Oct 18 14:31:48 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Fri, 18 Oct 2019 16:31:48 +0200 Subject: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps. In-Reply-To: References: Message-ID: <2CB4D3B2-02E7-46A8-85D3-CCEA34C0695B@oracle.com> Hi Kishor, > 17 okt. 2019 kl. 23:28 skrev Kharbas, Kishor : > > Hi Stefan, > >> -----Original Message----- >> From: Stefan Johansson [mailto:stefan.johansson at oracle.com] >> Sent: Thursday, October 17, 2019 4:34 AM >> To: Kharbas, Kishor ; sangheon.kim at oracle.com >> Cc: hotspot-gc-dev at openjdk.java.net >> Subject: Re: RFR(S): 8215893: Add better abstraction for pinning G1 >> concurrent marking bitmaps. 
>> >> Hi Kishor, >> >> On 2019-10-17 03:39, Kharbas, Kishor wrote: >>> Hi Sangheon, >>> >>> *From:*sangheon.kim at oracle.com [mailto:sangheon.kim at oracle.com] >>> *Sent:* Wednesday, October 16, 2019 11:03 AM >>> *To:* Kharbas, Kishor >>> *Cc:* hotspot-gc-dev at openjdk.java.net; Stefan Johansson >>> >>> *Subject:* Re: RFR(S): 8215893: Add better abstraction for pinning G1 >>> concurrent marking bitmaps. >>> >>>> Hi Kishor, >>>> >>>> Before reviewing webrev.02, could you remind us what was the >>>> motivation of pinning the bitmap mappers here? >>>> In addition to explanations of the problematic situation, any logs / >>>> stack-trace also may help. >>>> >>>> We think that understanding of the root cause should be considered first. >>> >>> Unfortunately, I do not have log/stack-trace of the problem I had faced. >>> >>> I am trying to reproduce it by running SPECjbb workload over and over >> again. >>> >>> I haven't looked at GC code since end of last year. So I am having a >>> difficult time pinning what the problem was. >>> >>> I am looking at G1ClearBitMapTask which iterates over bitmap for all >>> available regions. I am not sure when this task is performed. >>> >>> There is comment in HeapRegionManager::par_iterate() as shown below, >>> >>> /// This also (potentially) iterates over regions newly allocated >>> during GC. This/ >>> >>> / // is no problem except for some extra work./ >>> >>> This method is eventually called from G1ClearBitMapTask. The comment >>> suggests that regions are allocated concurrently when the function is >>> run. This also means with AllocateOldGenAt flag enabled, regions can >>> also be un-committed. >> >> I don't understand how AllocateOldGenAt would make any difference, >> regions can be un-committed without it as well and there are mechanisms in >> place to make sure only the correct parts of the side structures are un- >> committed when that happens. > > In the regular code un-commit is only done by VM thread during safepoint. 
> Un-commit of region also causes its corresponding bitmap to be un-committed.
> But it never happens that CM threads are iterating over bitmap while regions are being un-committed concurrently.
>
> Whereas when AllocateOldGenAt is used, because of the way regions are managed between
> dram and nvdimms, regions can be un-committed by mutator threads and GC threads.
> 1. Mutator threads - during mutator region allocation and humongous region allocation.

This is the problem. I managed to reproduce this by adding a short sleep in the clearing code and forcing back-to-back concurrent cycles in SPECjvm2008 with a 2g heap. I think this is only a problem for humongous allocations, because we should never allocate more young regions than we have already made available at the end of the previous GC. But the humongous allocations can very well happen while we clear the bitmaps in the concurrent cycle, so that is probably why the pinning was added.

Thinking more about this, a different solution would be to not un-commit memory in this situation. This all depends on how one sees the amount of committed memory when using AllocateOldGenAt: should the amount of committed memory on dram + nvdimm never be more than Xmx, or is the important thing that the number of regions in use never exceeds Xmx? I think I'm leaning towards the latter, but there might be reasons I haven't thought about here. This would break the current invariant:

assert(total_committed_before == total_regions_committed(), "invariant not met");

But that might be ok.

If using that approach, instead of un-committing (shrink_dram), just remove from the freelist the same number of regions that you expand on nvdimm. The unused removed regions need to be kept track of so we can add them again during the GC. To me this is more or less the same concept we use when borrowing regions during the GC. There might be issues with this approach but I think it would be interesting to explore.
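To make the bookkeeping in that proposal concrete, here is a toy sketch of "set regions aside instead of un-committing them". It is only an illustration under stated assumptions: `DramFreeList`, `park` and `return_parked` are invented names, regions are reduced to plain indices, and nothing here reflects the real HeterogeneousHeapRegionManager interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping only; regions are represented by their index.
class DramFreeList {
public:
  explicit DramFreeList(uint32_t num_free) {
    for (uint32_t i = 0; i < num_free; i++) {
      _free.push_back(i);
    }
  }

  // Used where the existing code would shrink DRAM after expanding NVDIMM
  // by `count` regions. The regions stay committed (so their bitmap memory
  // stays mapped); they are merely hidden from the allocator.
  // Returns how many regions were actually set aside.
  uint32_t park(uint32_t count) {
    uint32_t parked = 0;
    while (parked < count && !_free.empty()) {
      _parked.push_back(_free.back());
      _free.pop_back();
      parked++;
    }
    return parked;
  }

  // Used during a GC pause: hand the set-aside regions back to the free list.
  void return_parked() {
    _free.insert(_free.end(), _parked.begin(), _parked.end());
    _parked.clear();
  }

  size_t num_free() const { return _free.size(); }
  size_t num_parked() const { return _parked.size(); }

private:
  std::vector<uint32_t> _free;    // regions the allocator may hand out
  std::vector<uint32_t> _parked;  // committed but temporarily unusable
};
```

With this shape the number of usable regions still goes down by the amount that NVDIMM grew, while the committed total temporarily exceeds what the current invariant allows, which is the trade-off described in the mail.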
I also wonder if we ever should need to expand_dram during allocate_new_region, I see that it happens now during GC and that is probably because we do this at the end of the GC: _manager->adjust_dram_regions((uint)young_list_target_length() ? If this adjustment included the expected number of survivors as well, we should have enough DRAM regions and if we then end up getting an NVDIMM region when asking for a survivor we should return NULL signaling that survivor is full. What do you think about that approach? Thanks, Stefan > 2. GC worker threads - during survivor region and old region allocation. > 3. VMThread - heap size adjustment as in default and after full GC to allocate enough regions in dram for young gen (may require to un-commit some regions from nvdimm). > > Could any of these be running concurrently when CM threads are iterating over the bitmap? > >> >> I want to reiterate what Sangheon said about identifying the root cause. >> If we don't know why this is needed and can't reproduce any failures without >> the special pinning of the bitmaps, I would rather see that we remove the >> pinning code to make things work more like normal G1. > > I am trying to reproduce but as you can imagine it is very rare and hard-to-reproduce bug, if it is. > > Thanks, > Kishor >> >> Thanks, >> Stefan >> >> >>> >>> Pardon me if my understanding is incorrect. 
>>>
>>> Regards,
>>>
>>> Kishor

From thomas.schatzl at oracle.com Sat Oct 19 13:06:09 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Sat, 19 Oct 2019 15:06:09 +0200
Subject: RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1
In-Reply-To: <80DA3FD5-C2FA-44BF-83C5-AE0EA6AA3684@oracle.com>
References: <0F637570-EC97-47C5-B493-B33681133149@oracle.com> <5c6b06b1-de44-3cb7-7fc8-0b641df5f353@oracle.com> <80DA3FD5-C2FA-44BF-83C5-AE0EA6AA3684@oracle.com>
Message-ID:

Hi all,

there is a new webrev at http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2/ (full only, there is no point in providing a diff) since I like this solution a lot as it removes a lot of additional post-processing.

Testing has been a bit of a headache: interference between strong and weak processing is extremely rare, so I had to make it pretty common by

1) only a single thread doing strong processing
2) the weak processing stage has to be moved right after the root processing so they overlap with a lot higher probability

hs-tier 1-5 passes with and without these changes, with a noticeable amount of overlap according to additional log messages.

That change can be viewed at http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2.testing/ . Obviously I am not going to push this.

Surprisingly there had to be no changes to Shenandoah as it does not use the claim mechanism changed here, implementing something else. Shenandoah also passed vmTestbase/gc with these changes with no problem.

Below this email is a copy of Kim's suggestion about the state machine again for reference. I also added documentation about why and how the code is supposed to work.
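For readers unfamiliar with the encoding the state machine relies on, the following sketch shows how a 2-bit tag can share a word with an aligned pointer, and how claiming an entry becomes a single compare-and-swap. It is an illustration only: `Node` is a hypothetical stand-in for nmethod, the helper names mirror `tagged` and `strip_tag` from the pseudo-code, and only the first claim step of each processor is shown, not the full protocol.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for nmethod; link == 0 plays the role of NULL.
struct Node {
  std::atomic<uintptr_t> link{0};
};

// The tag lives in the two low bits, which are zero in any pointer to Node.
static_assert(alignof(Node) >= 4, "two low pointer bits must be free");

enum Tag : uintptr_t {
  Weak           = 0,  // being weak processed, link points to self
  WeakDone       = 1,  // weak processing done, link points to next
  WeakNeedStrong = 2,  // weak in progress, strong requested, link to self
  Strong         = 3   // claimed for / done with strong processing
};

constexpr uintptr_t kTagMask = 3;

inline uintptr_t tagged(const Node* n, uintptr_t tag) {
  uintptr_t raw = reinterpret_cast<uintptr_t>(n);
  assert((raw & kTagMask) == 0 && "pointer must be 4-byte aligned");
  return raw | tag;
}

inline Node* strip_tag(uintptr_t raw) {
  return reinterpret_cast<Node*>(raw & ~kTagMask);
}

inline uintptr_t tag_of(uintptr_t raw) { return raw & kTagMask; }

// First step of weak processing: unclaimed -> weak.
inline bool try_claim_weak(Node* n) {
  uintptr_t expected = 0;
  return n->link.compare_exchange_strong(expected, tagged(n, Weak));
}

// First step of strong processing: unclaimed -> strong.
inline bool try_claim_strong(Node* n) {
  uintptr_t expected = 0;
  return n->link.compare_exchange_strong(expected, tagged(n, Strong));
}

// Strong processor found the node in "weak": ask the weak processor to
// do the strong work once it is finished (weak -> weak, need strong).
inline bool request_deferred_strong(Node* n) {
  uintptr_t expected = tagged(n, Weak);
  return n->link.compare_exchange_strong(expected, tagged(n, WeakNeedStrong));
}
```

Because "weak" is encoded as the node's own address with tag 00, a claimed entry never has a zero link, which is why the self-loop is used as the end-of-list marker instead of a tagged NULL.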
Thanks,
  Thomas

On Wed, 2019-10-09 at 17:23 -0400, Kim Barrett wrote:
> > On Oct 8, 2019, at 7:48 PM, Kim Barrett
> > wrote:
> > src/hotspot/share/gc/g1/g1CollectedHeap.cpp
> > 3874   if (collector_state()->in_initial_mark_gc()) {
> > 3875     remark_strong_nmethods(per_thread_states);
> > 3876   }
> >
> > I think this additional task and the associated pending strong nmethod
> > sets in the pss can be eliminated by using a 2-bit tag and a more
> > complex state machine earlier.
>
> I thought about this some more and have some improvements to the
> previous pseudo-code, including eliminating the loop in
> strong_processor. More careful consideration of the possible states
> showed them to be more limited than I'd previously thought they were.
> I hadn't noticed the benefit from delaying weak_processor's push onto
> the global list and combining it with the transition to the "weak
> done" state.
>
> States, encoded in the link member of nmethod N:
> - unclaimed: NULL
> - weak: N, tag 00
> - weak done: NEXT, tag 01
> - weak, need strong: N, tag 10
> - strong: NEXT, tag 11
>
> where NEXT is the next nmethod in the global list, or N if it is the
> last entry, e.g. self-loop indicates end of list.
>
> weak_processor(n):
>   if n->link != NULL:
>     # already claimed; nothing to do here.
>     return
>   elif not replace_if_null(tagged(n, 0), &n->link):
>     # just claimed by another thread; nothing to do here.
>     return
>   # successfully claimed for weak processing.
>   assert n->link == tagged(n, 0)
>   do_weak_processing(n)
>   # push onto global list. self-loop end of list to avoid tagged NULL.
>   # not pushing onto global list until ready to mark weak processing
>   # done significantly simplifies the set of states.
>   next = xchg(n, &_list_head)
>   if next == NULL: next = n
>   # try to install end of list + weak done tag.
>   if cmpxchg(tagged(next, 1), &n->link, tagged(n, 0)) == tagged(n, 0):
>     return
>   # failed, which means some other thread added strong request.
>   assert n->link == tagged(n, 2)
>   # do deferred strong processing.
>   n->link = tagged(next, 3)
>   do_strong_processing(n)
>
> strong_processor(n):
>   raw_next = cmpxchg(tagged(n, 3), &n->link, NULL)
>   if raw_next == NULL:
>     # successfully claimed for strong processing.
>     do_strong_processing(n)
>     # push onto global list. self-loop end of list to avoid tagged NULL.
>     next = xchg(n, &_list_head)
>     if next == NULL: next = n
>     n->link = tagged(next, 3)
>     return
>   # claim failed. figure out why and handle it.
>   next = strip_tag(raw_next)
>   if raw_next == next: # (raw_next - next) == 0
>     # claim failed because being weak processed (state == "weak").
>     # try to request deferred strong processing.
>     assert next == tagged(n, 0)
>     raw_next = cmpxchg(tagged(n, 2), &n->link, next)
>     if (raw_next == next):
>       # successfully requested deferred strong processing.
>       return
>     # failed because of a concurrent transition.
>     # no longer in "weak" state.
>     next = strip_tag(raw_next)
>   if (raw_next - next) >= 2:
>     # already claimed for strong processing or requested for such.
>     return
>   # weak processing is complete.
>   # raw_next: tag == 1, NEXT == next list entry or N
>   if cmpxchg(tagged(NEXT, 3), &N->link, raw_next) == raw_next:
>     # claimed "weak done" to "strong".
>     do_strong_processing(N)
>   # if claim failed then some other thread got it.
>

From shade at redhat.com Sun Oct 20 19:29:06 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Sun, 20 Oct 2019 21:29:06 +0200
Subject: RFR 8232010: Shenandoah: implement self-fixing native barrier
In-Reply-To: <6cecca8a-a477-53b4-48de-f504a2100955@redhat.com>
References: <6cecca8a-a477-53b4-48de-f504a2100955@redhat.com>
Message-ID: <7c73ed8d-ce53-c54b-28d5-6806f1000af7@redhat.com>

On 10/11/19 2:30 PM, Zhengyu Gu wrote:
> Please review this patch that implements self-fixing LRB for in native oops.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8232010
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232010/webrev.00/

This breaks Windows builds, see:
  https://bugs.openjdk.java.net/browse/JDK-8232674

--
Thanks,
-Aleksey

From shade at redhat.com Sun Oct 20 20:33:56 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Sun, 20 Oct 2019 22:33:56 +0200
Subject: RFR (S) 8232674: Fix build and rename ShenandoahBarrierSet::oop_load_from_native_barrier
Message-ID:

P1 bug:
  https://bugs.openjdk.java.net/browse/JDK-8232674

I believe this is caused by missing definition of this method:
  oop oop_load_from_native_barrier(oop obj, narrowOop* load_addr);

The way out is to generify it, the same way as we do it for SBS::load_reference_barrier. I also took this opportunity to rename the method to match the other LRB flavor: now the ShenandoahRuntime::load_reference_barrier_native wrapper looks right.

Fix:
  https://cr.openjdk.java.net/~shade/8232674/webrev.01/

Testing: Windows x86_64 build, Linux x86_64 build, hotspot_gc_shenandoah

--
Thanks,
-Aleksey

From zgu at redhat.com Mon Oct 21 00:19:21 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Sun, 20 Oct 2019 20:19:21 -0400
Subject: RFR (S) 8232674: Fix build and rename ShenandoahBarrierSet::oop_load_from_native_barrier
In-Reply-To:
References:
Message-ID:

Thanks for fixing it, Aleksey

On 10/20/19 4:33 PM, Aleksey Shipilev wrote:
> P1 bug:
> https://bugs.openjdk.java.net/browse/JDK-8232674
>
> I believe this is caused by missing definition of this method:
> oop oop_load_from_native_barrier(oop obj, narrowOop* load_addr);

This method should never be used. I thought I could get away with not implementing it (just like with gcc on Linux).

I think the method body should just be:

...
  ShouldNotReachHere();
  return NULL;
...

-Zhengyu

>
> The way out is to generify it, the same way as we do it for SBS::load_reference_barrier.
> I also took
> this opportunity to rename the method to match the other LRB flavor: now the
> ShenandoahRuntime::load_reference_barrier_native wrapper looks right.
>
> Fix:
> https://cr.openjdk.java.net/~shade/8232674/webrev.01/
>
> Testing: Windows x86_64 build, Linux x86_64 build, hotspot_gc_shenandoah
>

From shade at redhat.com Mon Oct 21 08:00:21 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 21 Oct 2019 10:00:21 +0200
Subject: RFR (S) 8232674: Fix build and rename ShenandoahBarrierSet::oop_load_from_native_barrier
In-Reply-To:
References:
Message-ID:

On 10/21/19 2:19 AM, Zhengyu Gu wrote:
> On 10/20/19 4:33 PM, Aleksey Shipilev wrote:
>> P1 bug:
>>   https://bugs.openjdk.java.net/browse/JDK-8232674
>>
>> I believe this is caused by missing definition of this method:
>>   oop oop_load_from_native_barrier(oop obj, narrowOop* load_addr);
>
> This method should never be used. I thought I could get away with not implementing it (just like
> with gcc on Linux).
>
> I think the method body should just be:
> ...
>   ShouldNotReachHere();
>   return NULL;
> ...

Right! Let's do that:
  https://cr.openjdk.java.net/~shade/8232674/webrev.02/

Testing: {Linux, Windows} x86_64 hotspot_gc_shenandoah; tier1 with Shenandoah

--
Thanks,
-Aleksey

From sakamoto.osamu at nttcom.co.jp Mon Oct 21 08:50:23 2019
From: sakamoto.osamu at nttcom.co.jp (Osamu Sakamoto)
Date: Mon, 21 Oct 2019 17:50:23 +0900
Subject: Segmentation Fault occurs when ClassLoader and Metaspace is released in JDK 8
Message-ID:

Hi all,

I have a problem with a segmentation fault (SEGV) during GC and I cannot determine the cause.
Could you help me solve the problem?

Our system uses OpenJDK 1.8.0.181, and it crashed with a SEGV while purging a ClassLoader at a safepoint.
We cannot reproduce the problem on demand, but it has happened 4 times in a few months.

The following is a summary of my investigation.
=============================================================================

First, I checked the hs_err file, which shows that a SEGV occurred.
The VM_Operation was GenCollectForAllocation at a safepoint.

-----------------------------------------------------------------------------
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f6080c97f88, pid=23931, tid=0x00007f607c3ed700
#
# JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build 1.8.0_181-b13)
# Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x84bf88]
#
# Core dump written. Default location: /opt/tomcate0/core or core.23931
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x00007f6078c00000):  VMThread [stack: 0x00007f607c2ed000,0x00007f607c3ee000] [id=23939]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000018

Registers:
RAX=0x0000000000000010, RBX=0x00007f5ff800ad30, RCX=0x0000000000000010, RDX=0x0000000000000000
RSP=0x00007f607c3ecb50, RBP=0x00007f607c3ecb80, RSI=0x0000000000000002, RDI=0x0000000001cfe570
R8 =0x00007f5ff80ae320, R9 =0x00007f5ff8052480, R10=0x0000000000000000, R11=0x0000000000000400
R12=0x0000000001cfe570, R13=0x00007f6081419470, R14=0x0000000000000002, R15=0x00007f6081418640
RIP=0x00007f6080c97f88, EFLAGS=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007f607c3ecb50)
0x00007f607c3ecb50:   00007f607c3ecba0 00007f5ff800ad30
0x00007f607c3ecb60:   00007f5ff800ad00 0000000000000000
0x00007f607c3ecb70:   0000000000000000 0000000000000001
0x00007f607c3ecb80:   00007f607c3ecba0 00007f6080c995fa
0x00007f607c3ecb90:   00007f5ff800ad00 00007f5ff800ac20
0x00007f607c3ecba0:   00007f607c3ecbc0 00007f60808bff5e
0x00007f607c3ecbb0:
00007f5ff800ac20 00007f5ff8052870
0x00007f607c3ecbc0:   00007f607c3ecbe0 00007f60808c0f0f
0x00007f607c3ecbd0:   00007f607c3ecbf0 00007f608140f308
0x00007f607c3ecbe0:   00007f607c3ecc30 00007f6080daa0b7
0x00007f607c3ecbf0:   00007f6069000100 0000000000000000
0x00007f607c3ecc00:   00007f607c3ecc20 00007f6080ed0800
0x00007f607c3ecc10:   00000000000000f9 88e95c3ba257ab00
0x00007f607c3ecc20:   431bde82d7b634db 00007f607800aa00
0x00007f607c3ecc30:   00007f607c3eccc0 00007f6080daa9d5
0x00007f607c3ecc40:   0000000000000000 00007f607803bf20
0x00007f607c3ecc50:   00007f607803be20 00000000000003e8
0x00007f607c3ecc60:   0000000000000001 00007f6078c00000
0x00007f607c3ecc70:   00007f607c3eccc0 0000000000000000
0x00007f607c3ecc80:   00000004000000f9 00007f60813e2b99
0x00007f607c3ecc90:   00007f607803bfa0 00007f6078c00000
0x00007f607c3ecca0:   0000000000000000 0000000000000000
0x00007f607c3eccb0:   00007f6081418bd0 00007f607803bf20
0x00007f607c3eccc0:   00007f607c3ece60 00007f6080f2048a
0x00007f607c3eccd0:   00007f607c3ecd20 00007f607c3ecce0
0x00007f607c3ecce0:   00007f6078c00000 00007f6078c00980
0x00007f607c3eccf0:   00007f6078c009c0 00007f6078c009d0
0x00007f607c3ecd00:   00007f6078c00aa8 00000000000000d8
0x00007f607c3ecd10:   00007f6078c00be0 0000000000000000
0x00007f607c3ecd20:   00007f607c3ecd28 6e69747563657845
0x00007f607c3ecd30:   65706f204d562067 203a6e6f69746172
0x00007f607c3ecd40:   656c6c6f436e6547 6c6c41726f467463

Instructions: (pc=0x00007f6080c97f88)
0x00007f6080c97f68:   b6 12 80 fa 00 74 01 f0 48 0f c1 01 31 c9 31 f6
0x00007f6080c97f78:   48 8b 44 0b 10 31 d2 48 85 c0 74 11 0f 1f 40 00
0x00007f6080c97f88:   48 8b 40 08 48 83 c2 01 48 85 c0 75 f3 48 83 c1
0x00007f6080c97f98:
08 48 01 d6 48 83 f9 20 75 d6 8b 7b 08 48 8b 05

Register to memory mapping:

RAX=0x0000000000000010 is an unknown value
RBX=0x00007f5ff800ad30 is an unknown value
RCX=0x0000000000000010 is an unknown value
RDX=0x0000000000000000 is an unknown value
RSP=0x00007f607c3ecb50 is an unknown value
RBP=0x00007f607c3ecb80 is an unknown value
RSI=0x0000000000000002 is an unknown value
RDI=0x0000000001cfe570 is an unknown value
R8 =0x00007f5ff80ae320 is an unknown value
R9 =0x00007f5ff8052480 is an unknown value
R10=0x0000000000000000 is an unknown value
R11=0x0000000000000400 is an unknown value
R12=0x0000000001cfe570 is an unknown value
R13=0x00007f6081419470: in /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/amd64/server/libjvm.so at 0x00007f608044c000
R14=0x0000000000000002 is an unknown value
R15=0x00007f6081418640: in /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/amd64/server/libjvm.so at 0x00007f608044c000


Stack: [0x00007f607c2ed000,0x00007f607c3ee000], sp=0x00007f607c3ecb50,  free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x84bf88]
V  [libjvm.so+0x84d5fa]
V  [libjvm.so+0x473f5e]
V  [libjvm.so+0x474f0f]
V  [libjvm.so+0x95e0b7]
V  [libjvm.so+0x95e9d5]
V  [libjvm.so+0xad448a]
V  [libjvm.so+0xad48f1]
V  [libjvm.so+0x8beb82]

VM_Operation (0x00007f5fd69e6120): GenCollectForAllocation, mode: safepoint, requested by thread 0x00007f6079013800

...
-----------------------------------------------------------------------------



Next, I used GDB to check the backtrace of the crashing thread from the core dump.
The following is the backtrace.
The SEGV occurred while a ClassLoader was being purged and its Metaspace was being destroyed.
Frame #7 shows that the signal (SEGV) handler was invoked while SpaceManager::~SpaceManager() was executing.

-----------------------------------------------------------------------------
(gdb) bt
#0
0x00007f608146f1f7 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f60814708e8 in __GI_abort () at abort.c:90
#2  0x00007f6080d0bc39 in os::abort (dump_core=) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1519
#3  0x00007f6080f1b816 in VMError::report_and_die (this=this at entry=0x7f607c3ebd10) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1060
#4  0x00007f6080d15927 in JVM_handle_linux_signal (sig=11, info=0x7f607c3ebfb0, ucVoid=0x7f607c3ebe80, abort_if_unrecognized=)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541
#5  0x00007f6080d09038 in signalHandler (sig=11, info=0x7f607c3ebfb0, uc=0x7f607c3ebe80) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:4446
#6
#7  SpaceManager::~SpaceManager (this=0x7f5ff800ad30, __in_chrg=) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2028
#8  0x00007f6080c995fa in Metaspace::~Metaspace (this=0x7f5ff800ad00, __in_chrg=)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2971
#9  0x00007f60808bff5e in ClassLoaderData::~ClassLoaderData (this=0x7f5ff800ac20, __in_chrg=)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:383
#10 0x00007f60808c0f0f in ClassLoaderDataGraph::purge () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:818
#11 0x00007f6080daa0b7 in ClassLoaderDataGraph::purge_if_needed () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.hpp:104
#12 SafepointSynchronize::do_cleanup_tasks () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/safepoint.cpp:551
#13 0x00007f6080daa9d5 in SafepointSynchronize::begin () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/safepoint.cpp:402
#14 0x00007f6080f2048a in VMThread::loop (this=this at entry=0x7f6078c00000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:501
#15 0x00007f6080f208f1 in VMThread::run (this=0x7f6078c00000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:276
#16 0x00007f6080d0ab82 in java_start (thread=0x7f6078c00000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:796
#17 0x00007f6081e2de25 in start_thread (arg=0x7f607c3ed700) at pthread_create.c:308
#18 0x00007f608153234d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
-----------------------------------------------------------------------------


In Frame #7, Line 2028 (chunk = chunk->next()) is the crash point.
The variable "chunk" is defined at Line 2025 (Metachunk* chunk = chunks_in_use(i);).
"chunks_in_use(i)" is defined at Line 648 (Metachunk* chunks_in_use(ChunkIndex index) const { return _chunks_in_use[index]; }).
So I checked the values of "_chunks_in_use" and found that "_chunks_in_use[2]" holds the illegal address "0x10".
Therefore, I think the SEGV occurred because the illegal address "0x10" was dereferenced at "chunk = chunk->next()".

-----------------------------------------------------------------------------
(gdb) f 7
#7  SpaceManager::~SpaceManager (this=0x7f5ff800ad30, __in_chrg=) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2028
2028        chunk = chunk->next();
(gdb) list
2023    size_t SpaceManager::sum_count_in_chunks_in_use(ChunkIndex i) {
2024      size_t count = 0;
2025      Metachunk* chunk = chunks_in_use(i);
2026      while (chunk != NULL) {
2027        count++;
2028        chunk = chunk->next();
2029      }
2030      return count;
2031    }
2032
(gdb) list SpaceManager::chunks_in_use
647       // Accessors
648       Metachunk* chunks_in_use(ChunkIndex index) const { return _chunks_in_use[index]; }
...
(gdb) p _chunks_in_use
$11 = {0x7f5fcd41c400, 0x7f5fcd41a000, 0x10, 0x0}
-----------------------------------------------------------------------------


The following is the disassembly of "SpaceManager::~SpaceManager()".
%rax holds 0x10 at "0x00007f6080c97f88 <+200>", but I don't understand how this "0x10" ended up in %rax.

-----------------------------------------------------------------------------
(gdb) disas
Dump of assembler code for function SpaceManager::~SpaceManager():
   0x00007f6080c97ec0 <+0>:     push   %rbp
   0x00007f6080c97ec1 <+1>:     mov    %rsp,%rbp
   0x00007f6080c97ec4 <+4>:     push   %r15
   0x00007f6080c97ec6 <+6>:     push   %r14
   0x00007f6080c97ec8 <+8>:     push   %r13
   0x00007f6080c97eca <+10>:    push   %r12
   0x00007f6080c97ecc <+12>:    push   %rbx
   0x00007f6080c97ecd <+13>:    mov    %rdi,%rbx
   0x00007f6080c97ed0 <+16>:    sub    $0x8,%rsp
   0x00007f6080c97ed4 <+20>:    mov    0x780785(%rip),%r12        # 0x7f6081418660 <_ZN12SpaceManager12_expand_lockE>
   0x00007f6080c97edb <+27>:    test   %r12,%r12
   0x00007f6080c97ede <+30>:    je     0x7f6080c97ee8
   0x00007f6080c97ee0 <+32>:    mov    %r12,%rdi
   0x00007f6080c97ee3 <+35>:    callq  0x7f6080cce2f0
   0x00007f6080c97ee8 <+40>:    movslq 0x8(%rbx),%rcx
   0x00007f6080c97eec <+44>:    lea    0x78075d(%rip),%rdx        # 0x7f6081418650 <_ZN12MetaspaceAux15_capacity_wordsE>
   0x00007f6080c97ef3 <+51>:    lea    0x781576(%rip),%r13        # 0x7f6081419470 <_ZN2os16_processor_countE>
   0x00007f6080c97efa <+58>:    lea    0x78073f(%rip),%r15        # 0x7f6081418640 <_ZN12MetaspaceAux11_used_wordsE>
   0x00007f6080c97f01 <+65>:    mov    (%rdx,%rcx,8),%rax
   0x00007f6080c97f05 <+69>:    sub    0x40(%rbx),%rax
   0x00007f6080c97f09 <+73>:    mov    %rax,(%rdx,%rcx,8)
   0x00007f6080c97f0d <+77>:    mov    0x38(%rbx),%rax
   0x00007f6080c97f11 <+81>:    movslq 0x8(%rbx),%rdx
   0x00007f6080c97f15 <+85>:    neg    %rax
   0x00007f6080c97f18 <+88>:    cmpl   $0x1,0x0(%r13)
   0x00007f6080c97f1d <+93>:    lea    (%r15,%rdx,8),%rcx
   0x00007f6080c97f21 <+97>:    mov    $0x1,%edx
   0x00007f6080c97f26 <+102>:   jne    0x7f6080c97f32
   0x00007f6080c97f28 <+104>:   lea    0x74acb4(%rip),%rdx        # 0x7f60813e2be3
   0x00007f6080c97f2f <+111>:   movzbl (%rdx),%edx
   0x00007f6080c97f32 <+114>:   cmp    $0x0,%dl
   0x00007f6080c97f35 <+117>:   je     0x7f6080c97f38
   0x00007f6080c97f37 <+119>:   lock xadd %rax,(%rcx)
   0x00007f6080c97f3c <+124>:   mov    0x48(%rbx),%r14
   0x00007f6080c97f40 <+128>:   callq  0x7f6080c951a0
   0x00007f6080c97f45 <+133>:   movslq 0x8(%rbx),%rdx
   0x00007f6080c97f49 <+137>:   imul   %r14,%rax
   0x00007f6080c97f4d <+141>:   lea    (%r15,%rdx,8),%rcx
   0x00007f6080c97f51 <+145>:   mov    $0x1,%edx
   0x00007f6080c97f56 <+150>:   neg    %rax
   0x00007f6080c97f59 <+153>:   cmpl   $0x1,0x0(%r13)
   0x00007f6080c97f5e <+158>:   jne    0x7f6080c97f6a
   0x00007f6080c97f60 <+160>:   lea    0x74ac7c(%rip),%rdx        # 0x7f60813e2be3
   0x00007f6080c97f67 <+167>:   movzbl (%rdx),%edx
   0x00007f6080c97f6a <+170>:   cmp    $0x0,%dl
   0x00007f6080c97f6d <+173>:   je     0x7f6080c97f70
   0x00007f6080c97f6f <+175>:   lock xadd %rax,(%rcx)
   0x00007f6080c97f74 <+180>:   xor    %ecx,%ecx
   0x00007f6080c97f76 <+182>:   xor    %esi,%esi
   0x00007f6080c97f78 <+184>:   mov    0x10(%rbx,%rcx,1),%rax
   0x00007f6080c97f7d <+189>:   xor    %edx,%edx
   0x00007f6080c97f7f <+191>:   test   %rax,%rax
   0x00007f6080c97f82 <+194>:   je     0x7f6080c97f95
   0x00007f6080c97f84 <+196>:   nopl   0x0(%rax)
=> 0x00007f6080c97f88 <+200>:   mov    0x8(%rax),%rax
   0x00007f6080c97f8c <+204>:   add    $0x1,%rdx
   0x00007f6080c97f90 <+208>:   test   %rax,%rax
...
(gdb) info registers
rax            0x10                16
rbx            0x7f5ff800ad30      140050159414576
rcx            0x10                16
rdx            0x0                 0
rsi            0x2                 2
rdi            0x1cfe570           30401904
rbp            0x7f607c3ecb80      0x7f607c3ecb80
rsp            0x7f607c3ecb50      0x7f607c3ecb50
r8             0x7f5ff80ae320      140050160083744
r9             0x7f5ff8052480      140050159707264
r10            0x0                 0
r11            0x400               1024
r12            0x1cfe570           30401904
r13            0x7f6081419470      140052462146672
r14            0x2                 2
r15            0x7f6081418640      140052462143040
rip            0x7f6080c97f88      0x7f6080c97f88
eflags         0x206               [ PF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
k0
k1
k2
k3
k4
k5
k6
k7
-----------------------------------------------------------------------------

=============================================================================

Does anyone know about this case?
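One way to read the numbers above: the faulting address in the hs_err file (si_addr 0x18) equals the bad pointer 0x10 plus the 8-byte displacement in "mov 0x8(%rax),%rax", and %rcx = 0x10 in "mov 0x10(%rbx,%rcx,1),%rax" corresponds to slot 2 of an array of 8-byte pointers, which matches "_chunks_in_use[2]". The C++ sketch below only spells out that arithmetic. `FakeChunk`, `next_field_address` and `slot_index` are hypothetical names for illustration; the field offsets are assumptions read off the disassembly on a 64-bit build, not taken from the HotSpot sources, and treating %rcx as the byte offset of the loop over the chunk slots is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for Metachunk. Only the layout matters: the
// disassembly loads the next pointer from offset 8 (mov 0x8(%rax),%rax).
struct FakeChunk {
  void*      _offset0;  // placeholder for whatever occupies offset 0 (assumption)
  FakeChunk* _next;     // assumed to sit at offset 8 on a 64-bit build
};

// Address that a load of chunk->_next would touch, computed without
// dereferencing the (possibly bogus) chunk pointer.
inline std::uintptr_t next_field_address(std::uintptr_t chunk_bits) {
  return chunk_bits + offsetof(FakeChunk, _next);
}

// Index into an array of pointers, given a byte offset such as %rcx.
inline std::size_t slot_index(std::uintptr_t byte_offset) {
  return byte_offset / sizeof(void*);
}
```

Under these assumptions, the load at <+184> is what places 0x10 in %rax: it reads the third slot of the array, and the crash at <+200> is then the first attempt to follow that value as a chunk pointer. This says nothing about how 0x10 got into the slot in the first place.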
Thanks,
Osamu

From shade at redhat.com Mon Oct 21 10:08:39 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 21 Oct 2019 12:08:39 +0200
Subject: RFR (S) 8232702: Shenandoah: gc/shenandoah/TestVerifyJCStress.java uses non-existent -XX:+VerifyObjectEquals
Message-ID: <5106d033-9d06-4190-8fcb-9fbc984ec736@redhat.com>

Testbug:
  https://bugs.openjdk.java.net/browse/JDK-8232702

Fix:
  https://cr.openjdk.java.net/~shade/8232702/webrev.01/

This is the left-over from the days when ShenandoahVerifyObjectEquals was just VerifyObjectEquals. It was removed by JDK-8231946. This test never noticed it, because it ignored unrecognized VM options wholesale, but should really only do it for the ShVerifyOptoBarriers.

Testing: affected test on Linux x86_64 {release, fastdebug, slowdebug}

--
Thanks,
-Aleksey

From rkennke at redhat.com Mon Oct 21 10:34:56 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 21 Oct 2019 12:34:56 +0200
Subject: RFR (S) 8232702: Shenandoah: gc/shenandoah/TestVerifyJCStress.java uses non-existent -XX:+VerifyObjectEquals
In-Reply-To: <5106d033-9d06-4190-8fcb-9fbc984ec736@redhat.com>
References: <5106d033-9d06-4190-8fcb-9fbc984ec736@redhat.com>
Message-ID: <702435be-2a6c-3a32-ce8f-7471454db7ac@redhat.com>

Looks good. Thanks!

Roman

> Testbug:
> https://bugs.openjdk.java.net/browse/JDK-8232702
>
> Fix:
> https://cr.openjdk.java.net/~shade/8232702/webrev.01/
>
> This is the left-over from the days when ShenandoahVerifyObjectEquals was just VerifyObjectEquals.
> It was removed by JDK-8231946. This test never noticed it, because it ignored unrecognized VM
> options wholesale, but should really only do it for the ShVerifyOptoBarriers.
>
> Testing: affected test on Linux x86_64 {release, fastdebug, slowdebug}
>

From thomas.schatzl at oracle.com Mon Oct 21 11:18:46 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Mon, 21 Oct 2019 13:18:46 +0200
Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3)
In-Reply-To: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com>
References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com>
Message-ID: <4b3026b8-10bd-7439-84c6-d906bacd7774@oracle.com>

Hi Sangheon,

On 13.10.19 08:00, sangheon.kim at oracle.com wrote:
> Hi all,
>
> Previous patch conflicts, so I'm posting rebased one.
>
> Webrev:
> http://cr.openjdk.java.net/~sangheki/8220311/webrev.2
> Testing: hs-tier 1 ~ 5, with/without UseNUMA
>

Looks good to me.

Thanks,
  Thomas

From stefan.karlsson at oracle.com Mon Oct 21 13:00:14 2019
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 21 Oct 2019 15:00:14 +0200
Subject: RFR: 8232601: ZGC: Parameterize the ZGranuleMap table size
Message-ID:

Hi all,

Please review this patch to parameterize the ZGranuleMap table size.

https://cr.openjdk.java.net/~stefank/8232601/webrev.01/
https://bugs.openjdk.java.net/browse/JDK-8232601

Previously, the maps were always bound by the range of a virtual address space view (ZAddressOffsetMax). We want to be able to use ZGranuleMap to map against physical memory offsets, so this RFE suggests that we allow users of ZGranuleMap to specify the max offset.

Thanks,
StefanK

From zgu at redhat.com Mon Oct 21 13:03:19 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Mon, 21 Oct 2019 09:03:19 -0400
Subject: RFR (S) 8232674: Fix build and rename ShenandoahBarrierSet::oop_load_from_native_barrier
In-Reply-To:
References:
Message-ID: <77e762ba-df60-ef10-b110-e6260e75cf77@redhat.com>

Looks good to me.

Thanks,

-Zhengyu

On 10/21/19 4:00 AM, Aleksey Shipilev wrote:
> On 10/21/19 2:19 AM, Zhengyu Gu wrote:
>> On 10/20/19 4:33 PM, Aleksey Shipilev wrote:
>>> P1 bug:
>>>   https://bugs.openjdk.java.net/browse/JDK-8232674
>>>
>>> I believe this is caused by missing definition of this method:
>>>   oop oop_load_from_native_barrier(oop obj, narrowOop* load_addr);
>>
>> This method should never be used. I thought I could get away with not implementing it (just like
>> with gcc on Linux).
>>
>> I think the method body should just be:
>> ...
>>   ShouldNotReachHere();
>>   return NULL;
>> ...
>
> Right! Let's do that:
>   https://cr.openjdk.java.net/~shade/8232674/webrev.02/
>
> Testing: {Linux, Windows} x86_64 hotspot_gc_shenandoah; tier1 with Shenandoah
>

From stefan.karlsson at oracle.com Mon Oct 21 13:09:57 2019
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 21 Oct 2019 15:09:57 +0200
Subject: RFR: 8232602: ZGC: Make ZGranuleMap ZAddress agnostic
Message-ID: <4638080b-9f2e-6965-6ed9-a17b32ad3b94@oracle.com>

Hi all,

Please review this patch to make ZGranuleMap ZAddress agnostic.

https://cr.openjdk.java.net/~stefank/8232602/webrev.01/
https://bugs.openjdk.java.net/browse/JDK-8232602

Currently, the ZGranuleMap get and put functions take an address in the heap as a parameter. The address is then converted into an offset (into a heap view), before being scaled to a granule. We want to be able to use the ZGranuleMap for physical memory offsets, and not only heap addresses. Therefore, I propose that we move the conversions from address to offset out of ZGranuleMap and into the current users of ZGranuleMap.

This patch applies on top of the patch for JDK-8232601.

Thanks,
StefanK

From stefan.karlsson at oracle.com Mon Oct 21 13:22:00 2019
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 21 Oct 2019 15:22:00 +0200
Subject: RFR: 8232648: ZGC: Move ATTRIBUTE_ALIGNED to the front of declarations
Message-ID:

Hi all,

Please review this patch to move ATTRIBUTE_ALIGNED to the front of declarations.

https://cr.openjdk.java.net/~stefank/8232648/webrev.01/
https://bugs.openjdk.java.net/browse/JDK-8232648

This is done because the Windows compiler requires ATTRIBUTE_ALIGNED to be put at the front of declarations. A new macro (ZCACHE_ALIGNED) is introduced, and used, to shorten the affected lines.

Thanks,
StefanK

From suenaga at oss.nttdata.com Mon Oct 21 13:29:22 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Mon, 21 Oct 2019 22:29:22 +0900
Subject: Segmentation Fault occurs when ClassLoader and Metaspace is released in JDK 8
In-Reply-To:
References:
Message-ID: <422c9ca2-5053-c761-cb61-f075877bb666@oss.nttdata.com>

Hi Osamu,

Which JVM options did you pass?
I guess you used CMS, because this problem seems to occur with CMS only [1] [2]. So not using CMS might be a workaround.

I'm not sure about the root cause of this issue, but it seems to break ClassLoaderDataGraph::_unloading (like a double free (delete) of a CLD).


Thanks,

Yasumasa


[1] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/classfile/classLoaderData.hpp#l100
[2] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp#l6384


On 2019/10/21 17:50, Osamu Sakamoto wrote:
> Hi all,
>
> I have a problem with a segmentation fault (SEGV) during GC and I cannot determine the cause.
> Could you help me solve the problem?
>
> Our system uses OpenJDK 1.8.0.181, and it crashed with a SEGV while purging a ClassLoader at a safepoint.
> We cannot reproduce the problem on demand, but it has happened 4 times in a few months.
>
> The following is a summary of my investigation.
>
> =============================================================================
>
> First, I checked the hs_err file, which shows that a SEGV occurred.
> The VM_Operation was GenCollectForAllocation at a safepoint.
> > ----------------------------------------------------------------------------- > # > # A fatal error has been detected by the Java Runtime Environment: > # > #? SIGSEGV (0xb) at pc=0x00007f6080c97f88, pid=23931, tid=0x00007f607c3ed700 > # > # JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build 1.8.0_181-b13) > # Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 compressed oops) > # Problematic frame: > # V? [libjvm.so+0x84bf88] > # > # Core dump written. Default location: /opt/tomcate0/core or core.23931 > # > # If you would like to submit a bug report, please visit: > #?? http://bugreport.java.com/bugreport/crash.jsp > # > > ---------------? T H R E A D? --------------- > > Current thread (0x00007f6078c00000):? VMThread [stack: 0x00007f607c2ed000,0x00007f607c3ee000] [id=23939] > > siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000018 > > Registers: > RAX=0x0000000000000010, RBX=0x00007f5ff800ad30, RCX=0x0000000000000010, RDX=0x0000000000000000 > RSP=0x00007f607c3ecb50, RBP=0x00007f607c3ecb80, RSI=0x0000000000000002, RDI=0x0000000001cfe570 > R8 =0x00007f5ff80ae320, R9 =0x00007f5ff8052480, R10=0x0000000000000000, R11=0x0000000000000400 > R12=0x0000000001cfe570, R13=0x00007f6081419470, R14=0x0000000000000002, R15=0x00007f6081418640 > RIP=0x00007f6080c97f88, EFLAGS=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000004 > ? TRAPNO=0x000000000000000e > > Top of Stack: (sp=0x00007f607c3ecb50) > 0x00007f607c3ecb50:?? 00007f607c3ecba0 00007f5ff800ad30 > 0x00007f607c3ecb60:?? 00007f5ff800ad00 0000000000000000 > 0x00007f607c3ecb70:?? 0000000000000000 0000000000000001 > 0x00007f607c3ecb80:?? 00007f607c3ecba0 00007f6080c995fa > 0x00007f607c3ecb90:?? 00007f5ff800ad00 00007f5ff800ac20 > 0x00007f607c3ecba0:?? 00007f607c3ecbc0 00007f60808bff5e > 0x00007f607c3ecbb0:?? 00007f5ff800ac20 00007f5ff8052870 > 0x00007f607c3ecbc0:?? 00007f607c3ecbe0 00007f60808c0f0f > 0x00007f607c3ecbd0:?? 
00007f607c3ecbf0 00007f608140f308 > 0x00007f607c3ecbe0:?? 00007f607c3ecc30 00007f6080daa0b7 > 0x00007f607c3ecbf0:?? 00007f6069000100 0000000000000000 > 0x00007f607c3ecc00:?? 00007f607c3ecc20 00007f6080ed0800 > 0x00007f607c3ecc10:?? 00000000000000f9 88e95c3ba257ab00 > 0x00007f607c3ecc20:?? 431bde82d7b634db 00007f607800aa00 > 0x00007f607c3ecc30:?? 00007f607c3eccc0 00007f6080daa9d5 > 0x00007f607c3ecc40:?? 0000000000000000 00007f607803bf20 > 0x00007f607c3ecc50:?? 00007f607803be20 00000000000003e8 > 0x00007f607c3ecc60:?? 0000000000000001 00007f6078c00000 > 0x00007f607c3ecc70:?? 00007f607c3eccc0 0000000000000000 > 0x00007f607c3ecc80:?? 00000004000000f9 00007f60813e2b99 > 0x00007f607c3ecc90:?? 00007f607803bfa0 00007f6078c00000 > 0x00007f607c3ecca0:?? 0000000000000000 0000000000000000 > 0x00007f607c3eccb0:?? 00007f6081418bd0 00007f607803bf20 > 0x00007f607c3eccc0:?? 00007f607c3ece60 00007f6080f2048a > 0x00007f607c3eccd0:?? 00007f607c3ecd20 00007f607c3ecce0 > 0x00007f607c3ecce0:?? 00007f6078c00000 00007f6078c00980 > 0x00007f607c3eccf0:?? 00007f6078c009c0 00007f6078c009d0 > 0x00007f607c3ecd00:?? 00007f6078c00aa8 00000000000000d8 > 0x00007f607c3ecd10:?? 00007f6078c00be0 0000000000000000 > 0x00007f607c3ecd20:?? 00007f607c3ecd28 6e69747563657845 > 0x00007f607c3ecd30:?? 65706f204d562067 203a6e6f69746172 > 0x00007f607c3ecd40:?? 656c6c6f436e6547 6c6c41726f467463 > > Instructions: (pc=0x00007f6080c97f88) > 0x00007f6080c97f68:?? b6 12 80 fa 00 74 01 f0 48 0f c1 01 31 c9 31 f6 > 0x00007f6080c97f78:?? 48 8b 44 0b 10 31 d2 48 85 c0 74 11 0f 1f 40 00 > 0x00007f6080c97f88:?? 48 8b 40 08 48 83 c2 01 48 85 c0 75 f3 48 83 c1 > 0x00007f6080c97f98:?? 
08 48 01 d6 48 83 f9 20 75 d6 8b 7b 08 48 8b 05 > > Register to memory mapping: > > RAX=0x0000000000000010 is an unknown value > RBX=0x00007f5ff800ad30 is an unknown value > RCX=0x0000000000000010 is an unknown value > RDX=0x0000000000000000 is an unknown value > RSP=0x00007f607c3ecb50 is an unknown value > RBP=0x00007f607c3ecb80 is an unknown value > RSI=0x0000000000000002 is an unknown value > RDI=0x0000000001cfe570 is an unknown value > R8 =0x00007f5ff80ae320 is an unknown value > R9 =0x00007f5ff8052480 is an unknown value > R10=0x0000000000000000 is an unknown value > R11=0x0000000000000400 is an unknown value > R12=0x0000000001cfe570 is an unknown value > R13=0x00007f6081419470: in /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/amd64/server/libjvm.so at 0x00007f608044c000 > R14=0x0000000000000002 is an unknown value > R15=0x00007f6081418640: in /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/amd64/server/libjvm.so at 0x00007f608044c000 > > > Stack: [0x00007f607c2ed000,0x00007f607c3ee000], sp=0x00007f607c3ecb50, free space=1022k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V? [libjvm.so+0x84bf88] > V? [libjvm.so+0x84d5fa] > V? [libjvm.so+0x473f5e] > V? [libjvm.so+0x474f0f] > V? [libjvm.so+0x95e0b7] > V? [libjvm.so+0x95e9d5] > V? [libjvm.so+0xad448a] > V? [libjvm.so+0xad48f1] > V? [libjvm.so+0x8beb82] > > VM_Operation (0x00007f5fd69e6120): GenCollectForAllocation, mode: safepoint, requested by thread 0x00007f6079013800 > > ... > ----------------------------------------------------------------------------- > > > > Next, I used GDB to check the backtrace of the SEGV thread from the coredump. > The following is the backtrace. > The SEGV occurred when ClassLoader is purged and Metaspace is destructed. > And frame #7 shows that a signal(SEGV) handler is called after SpaceManager::~SpaceManager() is executed. 
> > ----------------------------------------------------------------------------- > (gdb) bt > #0? 0x00007f608146f1f7 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 > #1? 0x00007f60814708e8 in __GI_abort () at abort.c:90 > #2? 0x00007f6080d0bc39 in os::abort (dump_core=) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1519 > #3? 0x00007f6080f1b816 in VMError::report_and_die (this=this at entry=0x7f607c3ebd10) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1060 > #4? 0x00007f6080d15927 in JVM_handle_linux_signal (sig=11, info=0x7f607c3ebfb0, ucVoid=0x7f607c3ebe80, abort_if_unrecognized=) > ??? at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541 > #5? 0x00007f6080d09038 in signalHandler (sig=11, info=0x7f607c3ebfb0, uc=0x7f607c3ebe80) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:4446 > #6? > #7? SpaceManager::~SpaceManager (this=0x7f5ff800ad30, __in_chrg=) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2028 > #8? 0x00007f6080c995fa in Metaspace::~Metaspace (this=0x7f5ff800ad00, __in_chrg=) > ??? at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2971 > #9? 0x00007f60808bff5e in ClassLoaderData::~ClassLoaderData (this=0x7f5ff800ac20, __in_chrg=) > ??? 
at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:383 > #10 0x00007f60808c0f0f in ClassLoaderDataGraph::purge () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:818 > #11 0x00007f6080daa0b7 in ClassLoaderDataGraph::purge_if_needed () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.hpp:104 > #12 SafepointSynchronize::do_cleanup_tasks () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/safepoint.cpp:551 > #13 0x00007f6080daa9d5 in SafepointSynchronize::begin () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/safepoint.cpp:402 > #14 0x00007f6080f2048a in VMThread::loop (this=this at entry=0x7f6078c00000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:501 > #15 0x00007f6080f208f1 in VMThread::run (this=0x7f6078c00000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:276 > #16 0x00007f6080d0ab82 in java_start (thread=0x7f6078c00000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:796 > #17 0x00007f6081e2de25 in start_thread (arg=0x7f607c3ed700) at pthread_create.c:308 > #18 0x00007f608153234d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 > ----------------------------------------------------------------------------- > > > In Frame #7, Line 2028 (chunk = chunk->next()) is the crash point. > The variable "chunk" is defined at Line 2025 (Metachunk* chunk = chunks_in_use(i);). > "chunks_in_use(i)" is defined at Line 648 (Metachunk* chunks_in_use(ChunkIndex index) const { return _chunks_in_use[index]; }). 
> So I checked the values of "_chunks_in_use" and found that "_chunks_in_use[2]" holds the illegal address "0x10". > Therefore, I think that the SEGV occurred because the illegal address "0x10" was dereferenced at "chunk = chunk->next()". > > ----------------------------------------------------------------------------- > (gdb) f 7 > #7  SpaceManager::~SpaceManager (this=0x7f5ff800ad30, __in_chrg=) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2028 > 2028        chunk = chunk->next(); > (gdb) list > 2023    size_t SpaceManager::sum_count_in_chunks_in_use(ChunkIndex i) { > 2024      size_t count = 0; > 2025      Metachunk* chunk = chunks_in_use(i); > 2026      while (chunk != NULL) { > 2027        count++; > 2028        chunk = chunk->next(); > 2029      } > 2030      return count; > 2031    } > 2032 > (gdb) list SpaceManager::chunks_in_use > 647      // Accessors > 648      Metachunk* chunks_in_use(ChunkIndex index) const { return _chunks_in_use[index]; } > ... > (gdb) p _chunks_in_use > $11 = {0x7f5fcd41c400, 0x7f5fcd41a000, 0x10, 0x0} > ----------------------------------------------------------------------------- > > > > The following is the disassembly of "SpaceManager::~SpaceManager()". > %rax holds 0x10 at "0x00007f6080c97f88 <+200>", but I don't understand how this "0x10" ended up in %rax. > > ----------------------------------------------------------------------------- > (gdb) disas > Dump of assembler code for function SpaceManager::~SpaceManager(): >    0x00007f6080c97ec0 <+0>:     push   %rbp >    0x00007f6080c97ec1 <+1>:     mov    %rsp,%rbp >    0x00007f6080c97ec4 <+4>:     push   %r15 >    0x00007f6080c97ec6 <+6>:     push   %r14 >    0x00007f6080c97ec8 <+8>:     push   %r13 >    0x00007f6080c97eca <+10>:    push   %r12 >    0x00007f6080c97ecc <+12>:    push   %rbx >    0x00007f6080c97ecd <+13>:    mov    %rdi,%rbx >    0x00007f6080c97ed0 <+16>:    sub    $0x8,%rsp >   
0x00007f6080c97ed4 <+20>:??? mov 0x780785(%rip),%r12??????? # 0x7f6081418660 <_ZN12SpaceManager12_expand_lockE> > ?? 0x00007f6080c97edb <+27>:??? test?? %r12,%r12 > ?? 0x00007f6080c97ede <+30>:??? je???? 0x7f6080c97ee8 > ?? 0x00007f6080c97ee0 <+32>:??? mov??? %r12,%rdi > ?? 0x00007f6080c97ee3 <+35>:??? callq? 0x7f6080cce2f0 > ?? 0x00007f6080c97ee8 <+40>:??? movslq 0x8(%rbx),%rcx > ?? 0x00007f6080c97eec <+44>:??? lea 0x78075d(%rip),%rdx??????? # 0x7f6081418650 <_ZN12MetaspaceAux15_capacity_wordsE> > ?? 0x00007f6080c97ef3 <+51>:??? lea 0x781576(%rip),%r13??????? # 0x7f6081419470 <_ZN2os16_processor_countE> > ?? 0x00007f6080c97efa <+58>:??? lea 0x78073f(%rip),%r15??????? # 0x7f6081418640 <_ZN12MetaspaceAux11_used_wordsE> > ?? 0x00007f6080c97f01 <+65>:??? mov??? (%rdx,%rcx,8),%rax > ?? 0x00007f6080c97f05 <+69>:??? sub??? 0x40(%rbx),%rax > ?? 0x00007f6080c97f09 <+73>:??? mov??? %rax,(%rdx,%rcx,8) > ?? 0x00007f6080c97f0d <+77>:??? mov??? 0x38(%rbx),%rax > ?? 0x00007f6080c97f11 <+81>:??? movslq 0x8(%rbx),%rdx > ?? 0x00007f6080c97f15 <+85>:??? neg??? %rax > ?? 0x00007f6080c97f18 <+88>:??? cmpl?? $0x1,0x0(%r13) > ?? 0x00007f6080c97f1d <+93>:??? lea??? (%r15,%rdx,8),%rcx > ?? 0x00007f6080c97f21 <+97>:??? mov??? $0x1,%edx > ?? 0x00007f6080c97f26 <+102>:??? jne??? 0x7f6080c97f32 > ?? 0x00007f6080c97f28 <+104>:??? lea 0x74acb4(%rip),%rdx??????? # 0x7f60813e2be3 > ?? 0x00007f6080c97f2f <+111>:??? movzbl (%rdx),%edx > ?? 0x00007f6080c97f32 <+114>:??? cmp??? $0x0,%dl > ?? 0x00007f6080c97f35 <+117>:??? je???? 0x7f6080c97f38 > ?? 0x00007f6080c97f37 <+119>:??? lock xadd %rax,(%rcx) > ?? 0x00007f6080c97f3c <+124>:??? mov??? 0x48(%rbx),%r14 > ?? 0x00007f6080c97f40 <+128>:??? callq? 0x7f6080c951a0 > ?? 0x00007f6080c97f45 <+133>:??? movslq 0x8(%rbx),%rdx > ?? 0x00007f6080c97f49 <+137>:??? imul?? %r14,%rax > ?? 0x00007f6080c97f4d <+141>:??? lea??? (%r15,%rdx,8),%rcx > ?? 0x00007f6080c97f51 <+145>:??? mov??? $0x1,%edx > ?? 0x00007f6080c97f56 <+150>:??? neg??? %rax > ?? 
0x00007f6080c97f59 <+153>:??? cmpl?? $0x1,0x0(%r13) > ?? 0x00007f6080c97f5e <+158>:??? jne??? 0x7f6080c97f6a > ?? 0x00007f6080c97f60 <+160>:??? lea 0x74ac7c(%rip),%rdx??????? # 0x7f60813e2be3 > ?? 0x00007f6080c97f67 <+167>:??? movzbl (%rdx),%edx > ?? 0x00007f6080c97f6a <+170>:??? cmp??? $0x0,%dl > ?? 0x00007f6080c97f6d <+173>:??? je???? 0x7f6080c97f70 > ?? 0x00007f6080c97f6f <+175>:??? lock xadd %rax,(%rcx) > ?? 0x00007f6080c97f74 <+180>:??? xor??? %ecx,%ecx > ?? 0x00007f6080c97f76 <+182>:??? xor??? %esi,%esi > ?? 0x00007f6080c97f78 <+184>:??? mov 0x10(%rbx,%rcx,1),%rax > ?? 0x00007f6080c97f7d <+189>:??? xor??? %edx,%edx > ?? 0x00007f6080c97f7f <+191>:??? test?? %rax,%rax > ?? 0x00007f6080c97f82 <+194>:??? je???? 0x7f6080c97f95 > ?? 0x00007f6080c97f84 <+196>:??? nopl?? 0x0(%rax) > => 0x00007f6080c97f88 <+200>:??? mov??? 0x8(%rax),%rax > ?? 0x00007f6080c97f8c <+204>:??? add??? $0x1,%rdx > ?? 0x00007f6080c97f90 <+208>:??? test?? %rax,%rax > ... > (gdb) info registers > rax??????????? 0x10??? 16 > rbx??????????? 0x7f5ff800ad30??? 140050159414576 > rcx??????????? 0x10??? 16 > rdx??????????? 0x0??? 0 > rsi??????????? 0x2??? 2 > rdi??????????? 0x1cfe570??? 30401904 > rbp??????????? 0x7f607c3ecb80??? 0x7f607c3ecb80 > rsp??????????? 0x7f607c3ecb50??? 0x7f607c3ecb50 > r8???????????? 0x7f5ff80ae320??? 140050160083744 > r9???????????? 0x7f5ff8052480??? 140050159707264 > r10??????????? 0x0??? 0 > r11??????????? 0x400??? 1024 > r12??????????? 0x1cfe570??? 30401904 > r13??????????? 0x7f6081419470??? 140052462146672 > r14??????????? 0x2??? 2 > r15??????????? 0x7f6081418640??? 140052462143040 > rip??????????? 0x7f6080c97f88??? 0x7f6080c97f88 > eflags???????? 0x206??? [ PF IF ] > cs???????????? 0x33??? 51 > ss???????????? 0x2b??? 43 > ds???????????? 0x0??? 0 > es???????????? 0x0??? 0 > fs???????????? 0x0??? 0 > gs???????????? 0x0??? 0 > k0???????????? > k1???????????? > k2???????????? > k3???????????? > k4???????????? > k5???????????? > k6???????????? > k7???????????? 
> ----------------------------------------------------------------------------- > > ============================================================================= > > > > Does anyone know about this case? > > Thanks, Osamu > > From stefan.karlsson at oracle.com Mon Oct 21 14:06:40 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 21 Oct 2019 16:06:40 +0200 Subject: RFR: 8232649: ZGC: Add callbacks to ZMemoryManager Message-ID: <8793cda6-bec6-dac7-5164-8fc34454286e@oracle.com> Hi all, Please review this patch to add callbacks to ZMemoryManager. https://cr.openjdk.java.net/~stefank/8232649/webrev.01/ https://bugs.openjdk.java.net/browse/JDK-8232649 This allows users of ZMemoryManager to get callbacks when memory regions are inserted, removed, split, and coalesced. This is needed to support Windows' stricter requirements for placeholder reserved memory. Thanks, StefanK From thomas.schatzl at oracle.com Mon Oct 21 14:09:16 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 21 Oct 2019 16:09:16 +0200 Subject: RFR(L): 8220312: Implementation: NUMA-Aware Memory Allocation for G1, Logging (3/3) In-Reply-To: References: Message-ID: Hi, some initial comments looking at the log output: On 13.10.19 08:16, sangheon.kim at oracle.com wrote: > Hi all, > > Previous patch conflicts because of JDK-8220310, I'm posting rebased one > with some refactoring. > > Webrev: > http://cr.openjdk.java.net/~sangheki/8220312/webrev.2 > Testing: hs-tier 1 ~ 5, with/without UseNUMA > > Here's the full patch of 8220310, 8220311 and 8220312. > http://cr.openjdk.java.net/~sangheki/8220312/webrev.full.2/ > - I did not performance impact test the additional logging yet, but I do not expect issues. - that's something from the first NUMA patch: There is this gc+heap+numa=debug log message "Request memory [address, address] to be numa id (X)." for every region. First, it seems to be on the wrong level, consider a heap with ten-thousands of regions. 
This imo clogs the log too much, and I would prefer to move this information to trace level. Second, the full stop at the end is not necessary :) - the G1HRPrinter should be made NUMA aware, i.e. print expected NUMA id for this region - the casing of NUMA changes depending on message, i.e. sometimes "NUMA" and other times "numa" in the log messages themselves. I would recommend uniformly use "NUMA". However I think that all the "NUMA id" in these messages should read "node id" as at that level we do not manage the OS level NUMA ids any more. - the "numa id" values in the various messages are formatted differently in the different messages with no apparent guideline: sometimes the code adds the leading zeros, sometimes not. Also the separator between node id and value is sometimes ":" and once "=" E.g. "NUMA id verification: preferred id (matched #): 00 (32), 01 (32), ..." "Region Allocated / Requested: 99% xxxx/yyyy (numa id 0: 99% ..." I am kind of undecided what is best, but probably simply leaving out the leading zeros is best for the large majority of cases. - just a suggestion: "Region Allocated / Requested" -> "Placement Match Ratio" or so. Maybe somebody else has a better name. Also in that message I would not print "numa id" at all to make the message shorter. - "Worker threads local object process rate" -> "Worker task locality match rate" seems shorter. Again, to make the message shorter I would prefer that "numa id" were not printed at all in the details. Not sure if that rate at this point is extremely interesting since G1 won't even try to improve it at this time, but you can leave it in if you want. - I would *probably* like to have most of these messages split into "recent" and "total" statistics. Maybe others think that the totals are okay. - Again, to save space I would prefer to have the per-node details in the region summaries in the same line as the original output. I.e. 
instead of Eden regions: 28->0 (29) From numa id 0: 18->0 From numa id 1: 10->0 the following would be much shorter: Eden regions: 28->0 (29) (0: 18->0, 1: 10->0) As with higher node counts you will get lots of lines with little content imho. Maybe others think differently? Also, this would "fix" the problem that when you enabled gc+heap+numa but not gc+heap, you will see these "From numa id" numbers in the log without their required context. Alternatively, gc+heap+numa could automatically enable gc+heap at the same level. Comments after some superficial look at the changes themselves: - G1Regions should be renamed as G1RegionCounts and get a single line comment like: "Contains per Node id region count". - G1NodeTimes::Stat: it would probably be useful to have a "rate()" getter that recalculates the value as needed instead of the member. - G1HeapTransition::Data::~Data: the "if (something != NULL)" checks are unnecessary. FREE_C_HEAP_ARRAY does that already. Same in G1ParScanThreadState::G1ParScanThreadState. - I do not understand the name "G1NodeTimes" :) What "time" is that referring to? - G1NUMA::clear_statistics() seems to be unused. - G1NodeTimes::print_mutator_alloc_stat_info() and G1NodeTimes::copy_to_sruvivor_stat_info() look very similar. Could the code be refactored a bit? Thanks, Thomas From stefan.karlsson at oracle.com Mon Oct 21 14:37:34 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 21 Oct 2019 16:37:34 +0200 Subject: RFR: 8232650: ZGC: Add initialization hooks for OS specific code Message-ID: <5cdd2722-26a4-8e6c-1262-5d97dfd7f46c@oracle.com> Hi all, Please review this patch to add initialization hooks for OS specific code. https://cr.openjdk.java.net/~stefank/8232650/webrev.01/ https://bugs.openjdk.java.net/browse/JDK-8232650 These hooks are needed for a Windows port. ZInitialize allows syscalls to be dynamically resolved. ZVirtualMemory allows callbacks from 8232649 to be initialized.
Thanks, StefanK From shade at redhat.com Mon Oct 21 16:55:33 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 21 Oct 2019 18:55:33 +0200 Subject: RFR (XS) 8232729: Shenandoah: assert ShenandoahHeap::cas_oop addresses are aligned Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8232729 Fix: https://cr.openjdk.java.net/~shade/8232729/webrev.01/ Current ShenandoahHeap::cas_oop routines perform CASes on given address, hoping the hardware would handle it properly. In most cases, this is guaranteed by callers who pass aligned addresses to it: those are aligned narrowOop*/oop* fields or the roots that we can update concurrently. However, we should assert the alignment directly to catch bugs. This would fail the asserts with proper message rather than obscure SIGBUS on some platforms like AArch64. These new asserts are known to legitimately fail with Traversal (JDK-8232730) on x86_64 and with jcstress on AArch64 (JDK-8232712), so I am going to push this after the fixes land to ensure clean test results. Testing: {x86_64, x86_32} hotspot_gc_shenandoah; x86_64 tier1 with Shenandoah (running) -- Thanks, -Aleksey From shade at redhat.com Mon Oct 21 16:55:47 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 21 Oct 2019 18:55:47 +0200 Subject: RFR (S) 8232730: Shenandoah: Traversal should not CAS the roots Message-ID: <09c30b70-fd00-d762-1760-5ff32fd01301@redhat.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8232730 Fix: https://cr.openjdk.java.net/~shade/8232730/webrev.01/ This is captured by asserts from JDK-8232729 with hotspot_gc_shenandoah on x86_64. See more details in the bug. The underlying reason for this failure is trying to CAS the roots that are not aligned to the pointer size, notably code roots. Normal concurrent cycle avoids this by updating the roots with plain stores, Traversal should do the same. 
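As a rough illustration of the alignment check discussed in the two messages above, here is a minimal sketch. It is not the Shenandoah code: `is_pointer_aligned` and `cas_slot` are hypothetical names, and `std::atomic` stands in for HotSpot's own atomics.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if addr is aligned to the size of a pointer.
static bool is_pointer_aligned(const void* addr) {
  return (reinterpret_cast<std::uintptr_t>(addr) % sizeof(void*)) == 0;
}

// Hypothetical stand-in for a cas_oop-style routine. It asserts alignment
// up front, so a misaligned slot fails with a readable message in debug
// builds instead of an obscure fault later on.
// Returns the value witnessed in the slot: equal to 'expected' on success,
// the conflicting value on failure.
static void* cas_slot(std::atomic<void*>* slot, void* expected, void* desired) {
  assert(is_pointer_aligned(slot) && "CAS address must be pointer-aligned");
  // On failure compare_exchange_strong writes the current value into 'expected'.
  slot->compare_exchange_strong(expected, desired);
  return expected;
}
```

A slot that cannot be guaranteed to be aligned, such as the code roots mentioned above, would be updated with a plain store rather than passed to a routine like this.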
Testing: {x86_64, x86_32} hotspot_gc_shenandoah; tier1 with Shenandoah (running) -- Thanks, -Aleksey From zgu at redhat.com Mon Oct 21 17:02:20 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 21 Oct 2019 13:02:20 -0400 Subject: RFR 8232712: Shenandoah: SIGBUS in load_reference_barrier_native Message-ID: <59396f6f-6ff8-3ac6-ab51-240f56298ab6@redhat.com> I missed aarch64 changes for JDK-8232010[1]. On aarch64, native barrier does not setup the second parameter (load_addr) for runtime call, therefore, the address to CAS is bogus. Bug: https://bugs.openjdk.java.net/browse/JDK-8232712 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232712/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) on aarch64 Linux. [1] https://bugs.openjdk.java.net/browse/JDK-8232010 Thanks, -Zhengyu From rkennke at redhat.com Mon Oct 21 17:18:41 2019 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 21 Oct 2019 19:18:41 +0200 Subject: RFR (XS) 8232729: Shenandoah: assert ShenandoahHeap::cas_oop addresses are aligned In-Reply-To: References: Message-ID: <5cb6c167-e4e5-07c6-c892-1df1d8700505@redhat.com> Yup. More asserts are always good. :-) Roman > RFE: > https://bugs.openjdk.java.net/browse/JDK-8232729 > > Fix: > https://cr.openjdk.java.net/~shade/8232729/webrev.01/ > > Current ShenandoahHeap::cas_oop routines perform CASes on given address, hoping the hardware would > handle it properly. In most cases, this is guaranteed by callers who pass aligned addresses to it: > those are aligned narrowOop*/oop* fields or the roots that we can update concurrently. > > However, we should assert the alignment directly to catch bugs. This would fail the asserts with > proper message rather than obscure SIGBUS on some platforms like AArch64. These new asserts are > known to legitimately fail with Traversal (JDK-8232730) on x86_64 and with jcstress on AArch64 > (JDK-8232712), so I am going to push this after the fixes land to ensure clean test results. 
> > Testing: {x86_64, x86_32} hotspot_gc_shenandoah; x86_64 tier1 with Shenandoah (running) > From shade at redhat.com Mon Oct 21 17:35:01 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 21 Oct 2019 19:35:01 +0200 Subject: RFR 8232712: Shenandoah: SIGBUS in load_reference_barrier_native In-Reply-To: <59396f6f-6ff8-3ac6-ab51-240f56298ab6@redhat.com> References: <59396f6f-6ff8-3ac6-ab51-240f56298ab6@redhat.com> Message-ID: <3d3061c5-7ab0-2905-2fe3-dc16ac3dd911@redhat.com> On 10/21/19 7:02 PM, Zhengyu Gu wrote: > I missed aarch64 changes for JDK-8232010[1]. > > On aarch64, native barrier does not setup the second parameter (load_addr) for runtime call, > therefore, the address to CAS is bogus. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8232712 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232712/webrev.00/ Roman needs to ack this. This patch allows me to pass the subset of jcstress tests that were previously failing on aarch64. -- Thanks, -Aleksey From zgu at redhat.com Mon Oct 21 17:42:18 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 21 Oct 2019 13:42:18 -0400 Subject: RFR (S) 8232730: Shenandoah: Traversal should not CAS the roots In-Reply-To: <09c30b70-fd00-d762-1760-5ff32fd01301@redhat.com> References: <09c30b70-fd00-d762-1760-5ff32fd01301@redhat.com> Message-ID: <3dc51ca9-d26b-6fe7-dcbd-d48169c55993@redhat.com> Good to me. Thanks, -Zhengyu On 10/21/19 12:55 PM, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8232730 > > Fix: > https://cr.openjdk.java.net/~shade/8232730/webrev.01/ > > This is captured by asserts from JDK-8232729 with hotspot_gc_shenandoah on x86_64. See more details > in the bug. The underlying reason for this failure is trying to CAS the roots that are not aligned > to the pointer size, notably code roots. Normal concurrent cycle avoids this by updating the roots > with plain stores, Traversal should do the same. 
> > Testing: {x86_64, x86_32} hotspot_gc_shenandoah; tier1 with Shenandoah (running) > From rkennke at redhat.com Mon Oct 21 18:24:17 2019 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 21 Oct 2019 20:24:17 +0200 Subject: RFR (S) 8232730: Shenandoah: Traversal should not CAS the roots In-Reply-To: <09c30b70-fd00-d762-1760-5ff32fd01301@redhat.com> References: <09c30b70-fd00-d762-1760-5ff32fd01301@redhat.com> Message-ID: Ok! Thanks, Roman > Bug: > https://bugs.openjdk.java.net/browse/JDK-8232730 > > Fix: > https://cr.openjdk.java.net/~shade/8232730/webrev.01/ > > This is captured by asserts from JDK-8232729 with hotspot_gc_shenandoah on x86_64. See more details > in the bug. The underlying reason for this failure is trying to CAS the roots that are not aligned > to the pointer size, notably code roots. Normal concurrent cycle avoids this by updating the roots > with plain stores, Traversal should do the same. > > Testing: {x86_64, x86_32} hotspot_gc_shenandoah; tier1 with Shenandoah (running) > From rkennke at redhat.com Mon Oct 21 18:24:51 2019 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 21 Oct 2019 20:24:51 +0200 Subject: RFR 8232712: Shenandoah: SIGBUS in load_reference_barrier_native In-Reply-To: <59396f6f-6ff8-3ac6-ab51-240f56298ab6@redhat.com> References: <59396f6f-6ff8-3ac6-ab51-240f56298ab6@redhat.com> Message-ID: <713f6623-17a0-1c6f-90b2-ea398a532bbf@redhat.com> Ok! Thanks, Roman > I missed aarch64 changes for JDK-8232010[1]. > > On aarch64, native barrier does not setup the second parameter > (load_addr) for runtime call, therefore, the address to CAS is bogus. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8232712 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232712/webrev.00/ > > Test: >   hotspot_gc_shenandoah (fastdebug and release) on aarch64 Linux.
> > [1] https://bugs.openjdk.java.net/browse/JDK-8232010 > > Thanks, > > -Zhengyu > From kim.barrett at oracle.com Mon Oct 21 23:24:33 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 21 Oct 2019 19:24:33 -0400 Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3) In-Reply-To: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> Message-ID: > On Oct 13, 2019, at 2:00 AM, sangheon.kim at oracle.com wrote: > > Hi all, > > Previous patch conflicts, so I'm posting rebased one. > > Webrev: > http://cr.openjdk.java.net/~sangheki/8220311/webrev.2 > Testing: hs-tier 1 ~ 5, with/without UseNUMA > > Thanks, > Sangheon ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1ParScanThreadState.hpp Removed: 190 // ... State is the original (source) cset state for the object 191 // that is allocated for. ... That simple removal doesn't seem right. Now "state" in the next sentence has no explanation. Maybe some better rewrite? ------------------------------------------------------------------------------ Looks good, other than that one comment issue. From kim.barrett at oracle.com Tue Oct 22 01:20:33 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 21 Oct 2019 21:20:33 -0400 Subject: RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1 In-Reply-To: References: <0F637570-EC97-47C5-B493-B33681133149@oracle.com> <5c6b06b1-de44-3cb7-7fc8-0b641df5f353@oracle.com> <80DA3FD5-C2FA-44BF-83C5-AE0EA6AA3684@oracle.com> Message-ID: <0D820E95-361A-4CAC-9BC3-99C39512D396@oracle.com> > On Oct 19, 2019, at 9:06 AM, Thomas Schatzl wrote: > > Hi all, > > there is a new webrev at > > http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2/ (full only, > there is no point in providing a diff) > > since I like this solution a lot as it removes a lot of additional > post-processing. 
> > Testing has been a bit of a headache: interference between strong and > weak processing is extremely rare, so I had to make it pretty common by > > 1) only a single thread doing strong processing > 2) the weak processing stage has to be moved right after the root > processing so they overlap with a lot higher probability > > hs-tier 1-5 passes with and without these changes, with a noticable > amount of overlap according to additional log messages. That change can > be looked at at > http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2.testing/ . > Obviously I am not going to push this. > > Surprisingly there had to be no changes to Shenandoah as it does not > use the claim mechanism changed here, implementing something else. > Shenandoah also passed vmTestbase/gc with these changes with no > problem. > > Below this email is a copy of Kim's suggestion about the state machine > again for reference. I also added documentation about why and how the > code is supposed to work. I'm glad the new state machine worked out, and allowed the extra task to be eliminated. Thanks for going the extra mile with the testing. And thanks for turning my pseudo-code into something more readable. My comments here mostly suggestions for more of that; I don't think I'd want to have to decipher this in 6 months without some helpful commentary. :) ------------------------------------------------------------------------------ src/hotspot/share/code/nmethod.cpp Removed: 1829 #define NMETHOD_SENTINEL ((nmethod*)badAddress) Yay! ------------------------------------------------------------------------------ src/hotspot/share/code/nmethod.hpp 118 // SR -> SD: the nmethod has been processed strongly from the beginning. I think this is just the tail of 114 // WR -> SR -> SD: during weak processing another thread found that the nmethod and is not what is needed here. I think what you are really looking for here is the unclaimed -> SD case. 
I think the state progressions are unclaimed -> WR -> WD unclaimed -> WR -> SR -> SD unclaimed -> WR -> WD -> SD unclaimed -> SD The first is terminal (at WD) if the nmethod doesn't need strong processing. ------------------------------------------------------------------------------ src/hotspot/share/code/nmethod.hpp 95 // We store state and claim information in the _oops_do_mark_link member, using 96 // the two LSBs for the state and the rest for linking together nmethods that 97 // were visited. There's no description of the upper bits in this comment. In particular, the self-loop to indicate end of list isn't mentioned. Also, the specific values for the upper bits in the transitions turned out to be important, as discussed in the pseudo-code. So if N is the nmethod and X is the "next" value (which is N at end of list), then the state progressions might be described as unclaimed -> WR(N) -> WD(X) unclaimed -> WR(N) -> SR(N) -> SD(X) unclaimed -> WR(N) -> WD(X) -> SD(X) unclaimed -> SD(N) -> SD(X) (The text descriptions of the progressions seem okay.) It also might help to indicate which thread performs each step. If C is the claiming thread, and O is some other thread, then something like unclaimed (C)-> WR(N) (C)-> WD(X) unclaimed (C)-> WR(N) (O)-> SR(N) (C)-> SD(X) unclaimed (C)-> WR(N) (C)-> WD(X) (O)-> SD(X) unclaimed (C)-> SD(N) (C)-> SD(X) (Admittedly, that's pretty dense notation.) I think the comments describing the various transition functions might be better if they explicitly state which of the above transitions they (attempt to) perform, e.g. // Attempt unclaimed -> WR(N) transition, returning true if successful. bool oops_do_try_claim_weak_request(); I found the existing text descriptions hard to map onto the specific steps, even though they are (mostly?) one-to-one. I was finding it easier to ignore the descriptions and just use the names, though that isn't trivial either. 
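To make the notation above concrete, here is a small sketch of how a state tag in the two low bits and a "next" pointer in the upper bits can share one word, with a self-loop marking the end of the list. It is an illustration only: `Node`, `Tag`, the tag values and the function names are assumptions and are not taken from the webrev, and `std::atomic` stands in for HotSpot's atomics.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical tag values stored in the two least significant bits.
enum Tag : std::uintptr_t { WeakRequest = 0, WeakDone = 1, StrongRequest = 2, StrongDone = 3 };
constexpr std::uintptr_t kTagMask = 3;

struct Node {
  // 0 means unclaimed; otherwise (pointer | tag). The pointer is the node
  // itself (N) while a request is pending, and the next node (X) once the
  // node is linked; X == N marks the end of the list (self-loop).
  std::atomic<std::uintptr_t> link{0};
};

inline std::uintptr_t make_link(Node* n, Tag t) {
  std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(n);
  assert((bits & kTagMask) == 0 && "node must be at least 4-byte aligned");
  return bits | t;
}
inline Node* link_node(std::uintptr_t l) { return reinterpret_cast<Node*>(l & ~kTagMask); }
inline Tag link_tag(std::uintptr_t l) { return static_cast<Tag>(l & kTagMask); }

// Attempt unclaimed -> WR(N); true if this thread became the claimer.
inline bool try_claim_weak_request(Node* n) {
  std::uintptr_t expected = 0;
  return n->link.compare_exchange_strong(expected, make_link(n, WeakRequest));
}

// Attempt WR(N) -> SR(N); performed by a thread other than the claimer.
inline bool try_add_strong_request(Node* n) {
  std::uintptr_t expected = make_link(n, WeakRequest);
  return n->link.compare_exchange_strong(expected, make_link(n, StrongRequest));
}

// Attempt WR(N) -> WD(X); fails if a strong request was added in between,
// in which case the claimer has to process the node strongly instead.
inline bool mark_weak_done(Node* n, Node* next_or_self) {
  std::uintptr_t expected = make_link(n, WeakRequest);
  return n->link.compare_exchange_strong(expected, make_link(next_or_self, WeakDone));
}
```

Because WR(N) carries the node's own address, it can never be confused with the unclaimed value 0 even when the tag bits are zero, which is one reason the specific upper-bit values matter in the transitions.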
------------------------------------------------------------------------------ src/hotspot/share/code/nmethod.hpp 160 oops_do_mark_link* oops_do_try_claim_weak_request_as_strong_request(oops_do_mark_link* next); I think this function is misnamed; it doesn't really claim anything. Instead it attempts to add a strong request (SR) to a weak request (WR), and should be called oops_do_try_add_strong_request. ------------------------------------------------------------------------------ src/hotspot/share/code/nmethod.cpp 1848 bool nmethod::oops_do_try_claim_weak_request() { 1849 assert(SafepointSynchronize::is_at_safepoint(), "only at safepoint"); 1850 1851 if (_oops_do_mark_link != NULL) { 1852 return false; 1853 } 1854 if (!Atomic::replace_if_null(mark_link(this, claim_weak_request_tag), &_oops_do_mark_link)) { 1855 return false; 1856 } 1857 oops_do_log_change("oops_do, mark weak request"); 1858 return true; 1859 } I found the various "!"s and early returns in the above made it hard to read. I think simpler is the following. YMMV. bool nmethod::oops_do_try_claim_weak_request() { assert(SafepointSynchronize::is_at_safepoint(), "only at safepoint"); if ((_oops_do_mark_link == NULL) && Atomic::replace_if_null(mark_link(this, claim_weak_request_tag), &_oops_do_mark_link)) { oops_do_log_change("oops_do, mark weak request"); return true; } return false; } That's also more similar to the style of the other functions. ------------------------------------------------------------------------------ src/hotspot/share/code/nmethod.hpp 130 assert(((uintptr_t)nm & 0x3) == 0, "nmethod pointer must have zero lower two LSB"); assert(is_aligned(nm, 2), ...); ------------------------------------------------------------------------------ src/hotspot/share/code/nmethod.cpp 1906 if (old_head == NULL) { 1907 old_head = this; 1908 } ... 1922 if (old_head == NULL) { 1923 old_head = this; 1924 } ... 
2013   } while (cur != next);

In none of these places nor in the header comments is there any
mention of the use of self-loop to indicate the end of the list (nor
why that's being done).

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.cpp
2014   _oops_do_mark_nmethods = NULL;

Maybe move this up to immediately following
1997   nmethod* next = _oops_do_mark_nmethods;
to make it more immediately obvious that we're taking and processing
the whole list.

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.cpp
1987 void nmethod::oops_do_marking_prologue() {
...
1991   _oops_do_mark_nmethods = NULL;

That assignment ought to be a nop, and could instead be an assert.

------------------------------------------------------------------------------

From sangheon.kim at oracle.com Tue Oct 22 05:50:26 2019
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Mon, 21 Oct 2019 22:50:26 -0700
Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3)
In-Reply-To: <4b3026b8-10bd-7439-84c6-d906bacd7774@oracle.com>
References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <4b3026b8-10bd-7439-84c6-d906bacd7774@oracle.com>
Message-ID: <98d6a618-c780-1ef1-35cd-8117f2af2a0b@oracle.com>

Hi Thomas,

On 10/21/19 4:18 AM, Thomas Schatzl wrote:
> Hi Sangheon,
>
> On 13.10.19 08:00, sangheon.kim at oracle.com wrote:
>> Hi all,
>>
>> Previous patch conflicts, so I'm posting rebased one.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~sangheki/8220311/webrev.2
>> Testing: hs-tier 1 ~ 5, with/without UseNUMA
>>
>
> Looks good to me.
Thanks for your review!

Thanks,
Sangheon

>
> Thanks,
>   Thomas

From sangheon.kim at oracle.com Tue Oct 22 05:52:05 2019
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Mon, 21 Oct 2019 22:52:05 -0700
Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3)
In-Reply-To:
References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com>
Message-ID: <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com>

Hi Kim,

Thanks for reviewing this part.

On 10/21/19 4:24 PM, Kim Barrett wrote:
>> On Oct 13, 2019, at 2:00 AM, sangheon.kim at oracle.com wrote:
>>
>> Hi all,
>>
>> Previous patch conflicts, so I'm posting rebased one.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~sangheki/8220311/webrev.2
>> Testing: hs-tier 1 ~ 5, with/without UseNUMA
>>
>> Thanks,
>> Sangheon
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1ParScanThreadState.hpp
> Removed:
> 190 // ... State is the original (source) cset state for the object
> 191 // that is allocated for. ...
>
> That simple removal doesn't seem right. Now "state" in the next
> sentence has no explanation. Maybe some better rewrite?
What do you think about below comment?

  // Tries to allocate word_sz in the PLAB of the next "generation" after trying to
  // allocate into dest. Previous_plab_refill_failed indicates whether previous
  // PLAB refill for the original (source) object was failed.
  // Returns a non-NULL pointer if successful, and updates dest if required.
  // Also determines whether we should continue to try to allocate into the various
  // generations or just end trying to allocate.
  HeapWord* allocate_in_next_plab(G1HeapRegionAttr* dest,
  ...

Let me post the webrev when we decide. :)

Thanks,
Sangheon

>
> ------------------------------------------------------------------------------
>
> Looks good, other than that one comment issue.
> From per.liden at oracle.com Tue Oct 22 06:14:27 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 22 Oct 2019 08:14:27 +0200 Subject: RFR: 8232601: ZGC: Parameterize the ZGranuleMap table size In-Reply-To: References: Message-ID: Looks good. /Per On 10/21/19 3:00 PM, Stefan Karlsson wrote: > Hi all, > > Please review this patch to parameterize the ZGranuleMap table size. > > https://cr.openjdk.java.net/~stefank/8232601/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8232601 > > Previously, the maps were always bound by the range of a virtual address > space view (ZAddressOffsetMax). We want to be able to use ZGranuleMap to > map against physical memory offsets, so this RFE suggests that we allow > users of ZGranuleMap to specify the max offset. > > Thanks, > StefanK From per.liden at oracle.com Tue Oct 22 06:18:37 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 22 Oct 2019 08:18:37 +0200 Subject: RFR: 8232602: ZGC: Make ZGranuleMap ZAddress agnostic In-Reply-To: <4638080b-9f2e-6965-6ed9-a17b32ad3b94@oracle.com> References: <4638080b-9f2e-6965-6ed9-a17b32ad3b94@oracle.com> Message-ID: Looks good. /Per On 10/21/19 3:09 PM, Stefan Karlsson wrote: > Hi all, > > Please review this patch to make ZGranuleMap ZAddress agnostic. > > https://cr.openjdk.java.net/~stefank/8232602/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8232602 > > Currently, the ZGranuleMap get and put functions take an address in the > heap as a parameter. The address is then converted into an offset (into > a heap view), before being scaled to a granule. > > We want to be able to use the ZGranuleMap for physical memory offsets, > and not only heap addresses. Therefore, I propose that we move the > conversions from address to offset out from ZGranuleMap, and move it to > the current users of ZGranuleMap. > > This patch applies on-top of the patch for JDK-8232601. 
> > Thanks, > StefanK > From per.liden at oracle.com Tue Oct 22 06:19:22 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 22 Oct 2019 08:19:22 +0200 Subject: RFR: 8232648: ZGC: Move ATTRIBUTE_ALIGNED to the front of declarations In-Reply-To: References: Message-ID: <5a6bf373-bb06-31fc-9493-8c93a7b21ba5@oracle.com> Looks good. /Per On 10/21/19 3:22 PM, Stefan Karlsson wrote: > Hi all, > > Please review this patch to move ATTRIBUTE_ALIGNED to the front of > declarations. > > https://cr.openjdk.java.net/~stefank/8232648/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8232648 > > This is done because the Windows compiler requires ATTRIBUTE_ALIGNED to > be put at the front of declarations. A new macro (ZCACHE_ALIGNED) is > introduced, and used, to shorten the affected lines. > > Thanks, > StefanK From per.liden at oracle.com Tue Oct 22 06:22:13 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 22 Oct 2019 08:22:13 +0200 Subject: RFR: 8232650: ZGC: Add initialization hooks for OS specific code In-Reply-To: <5cdd2722-26a4-8e6c-1262-5d97dfd7f46c@oracle.com> References: <5cdd2722-26a4-8e6c-1262-5d97dfd7f46c@oracle.com> Message-ID: Looks good. /Per On 10/21/19 4:37 PM, Stefan Karlsson wrote: > Hi all, > > Please review this patch to add initialization hooks for OS specific code. > > https://cr.openjdk.java.net/~stefank/8232650/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8232650 > > These hooks are needed to for a Windows port. ZInitialize allows > syscalls to be dynamically resolved. ZVirtualMemory allows callbacks > from 8232649 to be initialized. > > Thanks, > StefanK From per.liden at oracle.com Tue Oct 22 06:24:24 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 22 Oct 2019 08:24:24 +0200 Subject: RFR: 8232649: ZGC: Add callbacks to ZMemoryManager In-Reply-To: <8793cda6-bec6-dac7-5164-8fc34454286e@oracle.com> References: <8793cda6-bec6-dac7-5164-8fc34454286e@oracle.com> Message-ID: Looks good. 
/Per

On 10/21/19 4:06 PM, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to add callbacks to ZMemoryManager.
>
> https://cr.openjdk.java.net/~stefank/8232649/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8232649
>
> This allows users of ZMemoryManager to get callbacks when memory regions
> are inserted, removed, split, and coalesced. This is needed to support
> Windows' stricter requirements for placeholder reserved memory.
>
> Thanks,
> StefanK

From kim.barrett at oracle.com Tue Oct 22 07:19:22 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 22 Oct 2019 03:19:22 -0400
Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3)
In-Reply-To: <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com>
References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com>
Message-ID: <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com>

> On Oct 22, 2019, at 1:52 AM, sangheon.kim at oracle.com wrote:
> What do you think about below comment?
>
> // Tries to allocate word_sz in the PLAB of the next "generation" after trying to
> // allocate into dest. Previous_plab_refill_failed indicates whether previous
> // PLAB refill for the original (source) object was failed.

Drop "was". Otherwise looks good.

> // Returns a non-NULL pointer if successful, and updates dest if required.
> // Also determines whether we should continue to try to allocate into the various
> // generations or just end trying to allocate.
> HeapWord* allocate_in_next_plab(G1HeapRegionAttr* dest,
> ...
>
> Let me post the webrev when we decide. :)
>
> Thanks,
> Sangheon
>
>
>>
>> ------------------------------------------------------------------------------
>>
>> Looks good, other than that one comment issue.
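
As a rough illustration of the behaviour the proposed comment describes, here is a toy C++ model; it is a sketch under stated assumptions, not the G1 implementation. Only the names `allocate_in_next_plab`, `dest`, `word_sz` and `previous_plab_refill_failed` come from the comment under review. `Dest`, `ToyPlab`, `ToyScanState`, `try_survivor` and `try_old` are hypothetical, and the way the refill flag influences later attempts is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical destination kinds; G1 itself uses G1HeapRegionAttr.
enum class Dest { Young, Old };

// Hypothetical bump-pointer buffer standing in for a PLAB.
struct ToyPlab {
  std::size_t* top;
  std::size_t* end;

  ToyPlab(std::size_t* begin, std::size_t* limit) : top(begin), end(limit) {}

  // Returns the start of the new object, or nullptr if word_sz does not fit.
  std::size_t* allocate(std::size_t word_sz) {
    if (static_cast<std::size_t>(end - top) < word_sz) {
      return nullptr;
    }
    std::size_t* result = top;
    top += word_sz;
    return result;
  }
};

// Hypothetical per-thread state: one PLAB for the "next generation" (old).
struct ToyScanState {
  ToyPlab old_plab;
  bool try_survivor;  // assumption: cleared once a survivor PLAB refill failed
  bool try_old;       // assumption: cleared once old space is exhausted

  explicit ToyScanState(ToyPlab old_buffer)
    : old_plab(old_buffer), try_survivor(true), try_old(true) {}

  // Called after an allocation into *dest has already failed.
  std::size_t* allocate_in_next_plab(Dest* dest, std::size_t word_sz,
                                     bool previous_plab_refill_failed) {
    if (*dest == Dest::Young) {
      if (previous_plab_refill_failed) {
        try_survivor = false;        // stop attempting survivor allocations
      }
      std::size_t* obj = old_plab.allocate(word_sz);
      if (obj != nullptr) {
        *dest = Dest::Old;           // "updates dest if required"
      } else {
        try_old = false;             // nothing fits in the next generation
      }
      return obj;
    }
    // Old is the last generation in this model: no other space to try.
    if (previous_plab_refill_failed) {
      try_old = false;
    }
    return nullptr;
  }
};
```

In this model a successful fallback changes `*dest` from `Dest::Young` to `Dest::Old`, while a failed one leaves `*dest` untouched and records that further attempts are pointless.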
From kishor.kharbas at intel.com Tue Oct 22 07:22:59 2019 From: kishor.kharbas at intel.com (Kharbas, Kishor) Date: Tue, 22 Oct 2019 07:22:59 +0000 Subject: RFR(S): 8215893: Add better abstraction for pinning G1 concurrent marking bitmaps. In-Reply-To: <2CB4D3B2-02E7-46A8-85D3-CCEA34C0695B@oracle.com> References: <2CB4D3B2-02E7-46A8-85D3-CCEA34C0695B@oracle.com> Message-ID: Hi Stefan, > -----Original Message----- > From: Stefan Johansson [mailto:stefan.johansson at oracle.com] > Sent: Friday, October 18, 2019 7:32 AM > To: Kharbas, Kishor > Cc: sangheon.kim at oracle.com; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR(S): 8215893: Add better abstraction for pinning G1 > concurrent marking bitmaps. > > Hi Kishor, > > > 17 okt. 2019 kl. 23:28 skrev Kharbas, Kishor : > > > > Hi Stefan, > > > >> -----Original Message----- > >> From: Stefan Johansson [mailto:stefan.johansson at oracle.com] > >> Sent: Thursday, October 17, 2019 4:34 AM > >> To: Kharbas, Kishor ; > >> sangheon.kim at oracle.com > >> Cc: hotspot-gc-dev at openjdk.java.net > >> Subject: Re: RFR(S): 8215893: Add better abstraction for pinning G1 > >> concurrent marking bitmaps. > >> > >> Hi Kishor, > >> > >> On 2019-10-17 03:39, Kharbas, Kishor wrote: > >>> Hi Sangheon, > >>> > >>> *From:*sangheon.kim at oracle.com [mailto:sangheon.kim at oracle.com] > >>> *Sent:* Wednesday, October 16, 2019 11:03 AM > >>> *To:* Kharbas, Kishor > >>> *Cc:* hotspot-gc-dev at openjdk.java.net; Stefan Johansson > >>> > >>> *Subject:* Re: RFR(S): 8215893: Add better abstraction for pinning > >>> G1 concurrent marking bitmaps. > >>> > >>>> Hi Kishor, > >>>> > >>>> Before reviewing webrev.02, could you remind us what was the > >>>> motivation of pinning the bitmap mappers here? > >>>> In addition to explanations of the problematic situation, any logs > >>>> / stack-trace also may help. > >>>> > >>>> We think that understanding of the root cause should be considered > first. 
> >>> > >>> Unfortunately, I do not have log/stack-trace of the problem I had faced. > >>> > >>> I am trying to reproduce it by running SPECjbb workload over and > >>> over > >> again. > >>> > >>> I haven't looked at GC code since end of last year. So I am having a > >>> difficult time pinning what the problem was. > >>> > >>> I am looking at G1ClearBitMapTask which iterates over bitmap for all > >>> available regions. I am not sure when this task is performed. > >>> > >>> There is comment in HeapRegionManager::par_iterate() as shown > below, > >>> > >>> /// This also (potentially) iterates over regions newly allocated > >>> during GC. This/ > >>> > >>> / // is no problem except for some extra work./ > >>> > >>> This method is eventually called from G1ClearBitMapTask. The comment > >>> suggests that regions are allocated concurrently when the function > >>> is run. This also means with AllocateOldGenAt flag enabled, regions > >>> can also be un-committed. > >> > >> I don't understand how AllocateOldGenAt would make any difference, > >> regions can be un-committed without it as well and there are > >> mechanisms in place to make sure only the correct parts of the side > >> structures are un- committed when that happens. > > > > In the regular code un-commit is only done by VM thread during safepoint. > Un-commit of region also causes its corresponding bitmap to be un- > committed. > > But it never happens that CM threads are iterating over bitmap while > regions are being un-committed concurrently. > > > > Whereas when AllocateOldGenAt is used, because of the way regions are > > managed between dram and nvdimms, regions can be un-committed by > mutator threads and GC threads. > > 1. Mutator threads - during mutator region allocation and humongous > region allocation. > > This is the problem, I managed to reproduce this by adding a short sleep in > the clearing code and force back to back concurrent cycles in SPECjvm2008 > and a 2g heap. 
I think this is only a problem for humongous allocations,
> because we should never allocate more young regions than we have already
> made available at the end of the previous GC. But the humongous allocations
> can very well happen during we clear the bitmaps in the concurrent cycle so
> that is probably why the pinning was added.
>
> Thinking more about this, a different solution would be to not un-commit
> memory in this situation. This all depends on how one sees the amount of
> committed memory when using AllocateOldGenAt, should the amount of
> committed on dram + nvdimm never be more than Xmx or is the important
> thing that the number of regions use never exceeds Xmx. I think I'm leaning
> towards the latter, but there might be reasons I haven't thought about here.
> This would break the current invariant:
> assert(total_committed_before == total_regions_committed(), "invariant
> not met");
>
> But that might be ok. If using that approach, instead of un-committing
> (shrink_dram), just remove the same number of regions from the freelist,
> that you expand on nvdimm. The unused removed regions need to be kept
> track of so we can add them again during the GC. To me this is more or less
> the same concept we use when borrowing regions during the GC. There
> might be issues with this approach but I think it would be interesting to
> explore.
>
[Kharbas, Kishor] Thank you for looking into this and reproducing the bug.
I think I follow your suggestion. I will try to work on a solution using this.

> I also wonder if we ever should need to expand_dram during
> allocate_new_region, I see that it happens now during GC and that is
> probably because we do this at the end of the GC:
> _manager->adjust_dram_regions((uint)young_list_target_length() ...
> > If this adjustment included the expected number of survivors as well, we > should have enough DRAM regions and if we then end up getting an > NVDIMM region when asking for a survivor we should return NULL signaling > that survivor is full. > > What do you think about that approach? [Kharbas, Kishor] This approach is simpler to implement. I am afraid that it would change the behavior with respect to default case. Still I will give it a try. For now, can we close the bug if the abstraction is satisfactory and continue exploration in a separate issue? Thanks, Kishor > > Thanks, > Stefan > > > 2. GC worker threads - during survivor region and old region allocation. > > 3. VMThread - heap size adjustment as in default and after full GC to > allocate enough regions in dram for young gen (may require to un-commit > some regions from nvdimm). > > > > Could any of these be running concurrently when CM threads are iterating > over the bitmap? > > > >> > >> I want to reiterate what Sangheon said about identifying the root cause. > >> If we don't know why this is needed and can't reproduce any failures > >> without the special pinning of the bitmaps, I would rather see that > >> we remove the pinning code to make things work more like normal G1. > > > > I am trying to reproduce but as you can imagine it is very rare and hard-to- > reproduce bug, if it is. > > > > Thanks, > > Kishor > >> > >> Thanks, > >> Stefan > >> > >> > >>> > >>> Pardon me if my understanding is incorrect. > >>> > >>> Regards, > >>> > >>> Kishor From stefan.karlsson at oracle.com Tue Oct 22 08:18:52 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 22 Oct 2019 10:18:52 +0200 Subject: RFR: 8232601: ZGC: Parameterize the ZGranuleMap table size In-Reply-To: References: Message-ID: <1f2c9769-525b-d1d6-5e43-b9567ad6d070@oracle.com> Thanks, Per. StefanK On 2019-10-22 08:14, Per Liden wrote: > Looks good. 
> > /Per > > On 10/21/19 3:00 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to parameterize the ZGranuleMap table size. >> >> https://cr.openjdk.java.net/~stefank/8232601/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8232601 >> >> Previously, the maps were always bound by the range of a virtual >> address space view (ZAddressOffsetMax). We want to be able to use >> ZGranuleMap to map against physical memory offsets, so this RFE >> suggests that we allow users of ZGranuleMap to specify the max offset. >> >> Thanks, >> StefanK From stefan.karlsson at oracle.com Tue Oct 22 08:19:01 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 22 Oct 2019 10:19:01 +0200 Subject: RFR: 8232602: ZGC: Make ZGranuleMap ZAddress agnostic In-Reply-To: References: <4638080b-9f2e-6965-6ed9-a17b32ad3b94@oracle.com> Message-ID: Thanks, Per. StefanK On 2019-10-22 08:18, Per Liden wrote: > Looks good. > > /Per > > On 10/21/19 3:09 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to make ZGranuleMap ZAddress agnostic. >> >> https://cr.openjdk.java.net/~stefank/8232602/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8232602 >> >> Currently, the ZGranuleMap get and put functions take an address in >> the heap as a parameter. The address is then converted into an offset >> (into a heap view), before being scaled to a granule. >> >> We want to be able to use the ZGranuleMap for physical memory offsets, >> and not only heap addresses. Therefore, I propose that we move the >> conversions from address to offset out from ZGranuleMap, and move it >> to the current users of ZGranuleMap. >> >> This patch applies on-top of the patch for JDK-8232601. 
>> >> Thanks, >> StefanK >> From stefan.karlsson at oracle.com Tue Oct 22 08:19:26 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 22 Oct 2019 10:19:26 +0200 Subject: RFR: 8232650: ZGC: Add initialization hooks for OS specific code In-Reply-To: References: <5cdd2722-26a4-8e6c-1262-5d97dfd7f46c@oracle.com> Message-ID: <2c679db1-e2b0-20c6-665b-7d04acd0b03b@oracle.com> Thanks, Per. StefanK On 2019-10-22 08:22, Per Liden wrote: > Looks good. > > /Per > > On 10/21/19 4:37 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to add initialization hooks for OS specific >> code. >> >> https://cr.openjdk.java.net/~stefank/8232650/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8232650 >> >> These hooks are needed to for a Windows port. ZInitialize allows >> syscalls to be dynamically resolved. ZVirtualMemory allows callbacks >> from 8232649 to be initialized. >> >> Thanks, >> StefanK From stefan.karlsson at oracle.com Tue Oct 22 08:19:12 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 22 Oct 2019 10:19:12 +0200 Subject: RFR: 8232648: ZGC: Move ATTRIBUTE_ALIGNED to the front of declarations In-Reply-To: <5a6bf373-bb06-31fc-9493-8c93a7b21ba5@oracle.com> References: <5a6bf373-bb06-31fc-9493-8c93a7b21ba5@oracle.com> Message-ID: <97da9b0b-89aa-6480-1485-bab4ad5c3be1@oracle.com> Thanks, Per. StefanK On 2019-10-22 08:19, Per Liden wrote: > Looks good. > > /Per > > On 10/21/19 3:22 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to move ATTRIBUTE_ALIGNED to the front of >> declarations. >> >> https://cr.openjdk.java.net/~stefank/8232648/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8232648 >> >> This is done because the Windows compiler requires ATTRIBUTE_ALIGNED >> to be put at the front of declarations. A new macro (ZCACHE_ALIGNED) >> is introduced, and used, to shorten the affected lines. 
>> >> Thanks, >> StefanK From stefan.karlsson at oracle.com Tue Oct 22 08:19:37 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 22 Oct 2019 10:19:37 +0200 Subject: RFR: 8232649: ZGC: Add callbacks to ZMemoryManager In-Reply-To: References: <8793cda6-bec6-dac7-5164-8fc34454286e@oracle.com> Message-ID: <2e154c3a-bca6-cf5f-17f0-b9fbc6c079ad@oracle.com> Thanks, Per. StefanK On 2019-10-22 08:24, Per Liden wrote: > Looks good. > > /Per > > On 10/21/19 4:06 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to add callbacks to ZMemoryManager. >> >> https://cr.openjdk.java.net/~stefank/8232649/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8232649 >> >> This allows users of ZMemoryManager to get callbacks when memory >> regions are inserted, removed, split, and coalesced. This is needed to >> support Windows' stricter requirements for placeholder reserved memory. >> >> Thanks, >> StefanK From erik.osterlund at oracle.com Tue Oct 22 09:17:25 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Tue, 22 Oct 2019 11:17:25 +0200 Subject: RFR: 8232601: ZGC: Parameterize the ZGranuleMap table size In-Reply-To: References: Message-ID: Hi Stefan, Looks good. Thanks, /Erik On 10/21/19 3:00 PM, Stefan Karlsson wrote: > Hi all, > > Please review this patch to parameterize the ZGranuleMap table size. > > https://cr.openjdk.java.net/~stefank/8232601/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8232601 > > Previously, the maps were always bound by the range of a virtual > address space view (ZAddressOffsetMax). We want to be able to use > ZGranuleMap to map against physical memory offsets, so this RFE > suggests that we allow users of ZGranuleMap to specify the max offset. 
> > Thanks, > StefanK From erik.osterlund at oracle.com Tue Oct 22 09:17:50 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Tue, 22 Oct 2019 11:17:50 +0200 Subject: RFR: 8232602: ZGC: Make ZGranuleMap ZAddress agnostic In-Reply-To: <4638080b-9f2e-6965-6ed9-a17b32ad3b94@oracle.com> References: <4638080b-9f2e-6965-6ed9-a17b32ad3b94@oracle.com> Message-ID: Hi Stefan, Looks good. Thanks, /Erik On 10/21/19 3:09 PM, Stefan Karlsson wrote: > Hi all, > > Please review this patch to make ZGranuleMap ZAddress agnostic. > > https://cr.openjdk.java.net/~stefank/8232602/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8232602 > > Currently, the ZGranuleMap get and put functions take an address in > the heap as a parameter. The address is then converted into an offset > (into a heap view), before being scaled to a granule. > > We want to be able to use the ZGranuleMap for physical memory offsets, > and not only heap addresses. Therefore, I propose that we move the > conversions from address to offset out from ZGranuleMap, and move it > to the current users of ZGranuleMap. > > This patch applies on-top of the patch for JDK-8232601. > > Thanks, > StefanK > From erik.osterlund at oracle.com Tue Oct 22 09:18:14 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Tue, 22 Oct 2019 11:18:14 +0200 Subject: RFR: 8232648: ZGC: Move ATTRIBUTE_ALIGNED to the front of declarations In-Reply-To: References: Message-ID: <201ba7a6-a371-f4c0-340f-a5af14d0323f@oracle.com> Hi Stefan, Looks good. Thanks, /Erik On 10/21/19 3:22 PM, Stefan Karlsson wrote: > Hi all, > > Please review this patch to move ATTRIBUTE_ALIGNED to the front of > declarations. > > https://cr.openjdk.java.net/~stefank/8232648/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8232648 > > This is done because the Windows compiler requires ATTRIBUTE_ALIGNED > to be put at the front of declarations. 
A new macro (ZCACHE_ALIGNED) > is introduced, and used, to shorten the affected lines. > > Thanks, > StefanK From erik.osterlund at oracle.com Tue Oct 22 09:18:31 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Tue, 22 Oct 2019 11:18:31 +0200 Subject: RFR: 8232650: ZGC: Add initialization hooks for OS specific code In-Reply-To: <5cdd2722-26a4-8e6c-1262-5d97dfd7f46c@oracle.com> References: <5cdd2722-26a4-8e6c-1262-5d97dfd7f46c@oracle.com> Message-ID: Hi Stefan, Looks good. Thanks, /Erik On 10/21/19 4:37 PM, Stefan Karlsson wrote: > Hi all, > > Please review this patch to add initialization hooks for OS specific > code. > > https://cr.openjdk.java.net/~stefank/8232650/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8232650 > > These hooks are needed to for a Windows port. ZInitialize allows > syscalls to be dynamically resolved. ZVirtualMemory allows callbacks > from 8232649 to be initialized. > > Thanks, > StefanK From erik.osterlund at oracle.com Tue Oct 22 09:18:47 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Tue, 22 Oct 2019 11:18:47 +0200 Subject: RFR: 8232649: ZGC: Add callbacks to ZMemoryManager In-Reply-To: <8793cda6-bec6-dac7-5164-8fc34454286e@oracle.com> References: <8793cda6-bec6-dac7-5164-8fc34454286e@oracle.com> Message-ID: <17249488-d0f4-81ae-3a15-b120cac388af@oracle.com> Hi Stefan, Looks good. Thanks, /Erik On 10/21/19 4:06 PM, Stefan Karlsson wrote: > Hi all, > > Please review this patch to add callbacks to ZMemoryManager. > > https://cr.openjdk.java.net/~stefank/8232649/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8232649 > > This allows users of ZMemoryManager to get callbacks when memory > regions are inserted, removed, split, and coalesced. This is needed to > support Windows' stricter requirements for placeholder reserved memory. 
>
> Thanks,
> StefanK

From thomas.schatzl at oracle.com Tue Oct 22 09:53:30 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 22 Oct 2019 11:53:30 +0200
Subject: RFR (XS): 8232771: Revert JDK-8230794 because of environment changes
Message-ID: <31507a84-fb55-25e7-7ead-3955e0c66a31@oracle.com>

Hi all,

  can I have reviews for this small change that reverts JDK-8230794
because it let the failure reported in JDK-8227695 disappear? Also there
were some environment changes that we think fixes the issue in JDK-8227695.

I would like the original code bake again, and see if this hunch is
correct.
Later I still want to improve the assert, but first let's see about
JDK-8227695. So sorry for the back and forth.

This is a straight revert of JDK-8230794 that applied without issues.

CR:
https://bugs.openjdk.java.net/browse/JDK-8232771
Webrev:
http://cr.openjdk.java.net/~tschatzl/8232771/webrev/
Testing:
local compilation

Thanks,
  Thomas

From stefan.johansson at oracle.com Tue Oct 22 10:04:10 2019
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Tue, 22 Oct 2019 12:04:10 +0200
Subject: RFR (XS): 8232771: Revert JDK-8230794 because of environment changes
In-Reply-To: <31507a84-fb55-25e7-7ead-3955e0c66a31@oracle.com>
References: <31507a84-fb55-25e7-7ead-3955e0c66a31@oracle.com>
Message-ID:

Looks good,
Stefan

On 2019-10-22 11:53, Thomas Schatzl wrote:
> Hi all,
>
>   can I have reviews for this small change that reverts JDK-8230794
> because it let the failure reported in JDK-8227695 disappear? Also there
> were some environment changes that we think fixes the issue in JDK-8227695.
>
> I would like the original code bake again, and see if this hunch is
> correct.
> Later I still want to improve the assert, but first let's see about
> JDK-8227695. So sorry for the back and forth.
>
> This is a straight revert of JDK-8230794 that applied without issues.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8232771
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8232771/webrev/
> Testing:
> local compilation
>
> Thanks,
>   Thomas

From per.liden at oracle.com Tue Oct 22 10:12:11 2019
From: per.liden at oracle.com (Per Liden)
Date: Tue, 22 Oct 2019 12:12:11 +0200
Subject: RFR (XS): 8232771: Revert JDK-8230794 because of environment changes
In-Reply-To: <31507a84-fb55-25e7-7ead-3955e0c66a31@oracle.com>
References: <31507a84-fb55-25e7-7ead-3955e0c66a31@oracle.com>
Message-ID: <3b0b24e8-be45-6501-f1ce-112585087954@oracle.com>

Looks good.

/Per

On 10/22/19 11:53 AM, Thomas Schatzl wrote:
> Hi all,
>
>   can I have reviews for this small change that reverts JDK-8230794
> because it let the failure reported in JDK-8227695 disappear? Also there
> were some environment changes that we think fixes the issue in JDK-8227695.
>
> I would like the original code bake again, and see if this hunch is
> correct.
> Later I still want to improve the assert, but first let's see about
> JDK-8227695. So sorry for the back and forth.
>
> This is a straight revert of JDK-8230794 that applied without issues.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8232771
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8232771/webrev/
> Testing:
> local compilation
>
> Thanks,
>   Thomas

From thomas.schatzl at oracle.com Tue Oct 22 10:13:11 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 22 Oct 2019 12:13:11 +0200
Subject: RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1
In-Reply-To: <0D820E95-361A-4CAC-9BC3-99C39512D396@oracle.com>
References: <0F637570-EC97-47C5-B493-B33681133149@oracle.com> <5c6b06b1-de44-3cb7-7fc8-0b641df5f353@oracle.com> <80DA3FD5-C2FA-44BF-83C5-AE0EA6AA3684@oracle.com> <0D820E95-361A-4CAC-9BC3-99C39512D396@oracle.com>
Message-ID:

Hi Kim,

  thanks a lot for taking the time so quickly.

On 22.10.19 03:20, Kim Barrett wrote:
>> On Oct 19, 2019, at 9:06 AM, Thomas Schatzl wrote:
>>
>> Hi all,
>>
>> there is a new webrev at
>>
>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2/ (full only,
>> there is no point in providing a diff)
>>
>> since I like this solution a lot as it removes a lot of additional
>>
>> post-processing.
>>[...]
>>
> I'm glad the new state machine worked out, and allowed the extra task
> to be eliminated. Thanks for going the extra mile with the testing.
> And thanks for turning my pseudo-code into something more readable. My
> comments here mostly suggestions for more of that; I don't think I'd
> want to have to decipher this in 6 months without some helpful
> commentary. :)

I think I addressed all your comments, and thanks for your suggestions -
I agree about having this tricky code well documented.

Changes are currently running through hs-tier1-5 with the changes that
ease reproduction (the webrev.2.testing changes noted in the last
email). Since there are no significant code changes apart from
documentation, I am confident there will be no issues.

Webrevs:
http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2_to_3/ (diff)
http://cr.openjdk.java.net/~tschatzl/8230706/webrev.3/ (full)

Thanks,
  Thomas

From thomas.schatzl at oracle.com Tue Oct 22 10:13:54 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 22 Oct 2019 12:13:54 +0200
Subject: RFR (XS): 8232771: Revert JDK-8230794 because of environment changes
In-Reply-To: <3b0b24e8-be45-6501-f1ce-112585087954@oracle.com>
References: <31507a84-fb55-25e7-7ead-3955e0c66a31@oracle.com> <3b0b24e8-be45-6501-f1ce-112585087954@oracle.com>
Message-ID: <1f05b5f8-960f-99be-d518-63f6f7dfb3c2@oracle.com>

Thanks Per and Stefan for your reviews.

Thomas

On 22.10.19 12:12, Per Liden wrote:
> Looks good.
>
> /Per
>
> On 10/22/19 11:53 AM, Thomas Schatzl wrote:
>> Hi all,
>>
>>   can I have reviews for this small change that reverts JDK-8230794
>> because it let the failure reported in JDK-8227695 disappear? Also
>> there were some environment changes that we think fixes the issue in
>> JDK-8227695.
>>
>> I would like the original code bake again, and see if this hunch is
>> correct.
>> Later I still want to improve the assert, but first let's see about
>> JDK-8227695. So sorry for the back and forth.
>>
>> This is a straight revert of JDK-8230794 that applied without issues.
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8232771
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8232771/webrev/
>> Testing:
>> local compilation
>>
>> Thanks,
>>   Thomas

From shade at redhat.com Tue Oct 22 11:48:15 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 22 Oct 2019 13:48:15 +0200
Subject: RFR (XS) 8232778: Shenandoah: SBSA::arraycopy_prologue checks wrong register
Message-ID: <01ea4f42-e33f-64ee-3514-12d19d2dc820@redhat.com>

Bug:
  https://bugs.openjdk.java.net/browse/JDK-8232778

Fix:

diff -r 24d411cb3a90 src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp
--- a/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Tue Oct 22 08:57:41 2019 +0200
+++ b/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Tue Oct 22 13:39:05 2019 +0200
@@ -58,7 +58,7 @@
       Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset()));
       __ ldrb(rscratch1, gc_state);
       if (dest_uninitialized) {
-        __ tbz(rscratch2, ShenandoahHeap::HAS_FORWARDED_BITPOS, done);
+        __ tbz(rscratch1, ShenandoahHeap::HAS_FORWARDED_BITPOS, done);
       } else {
         __ mov(rscratch2, ShenandoahHeap::HAS_FORWARDED | ShenandoahHeap::MARKING);
         __ tst(rscratch1, rscratch2);

The load happens into rscratch1, yet we are testing rscratch2. I think this silently breaks
arraycopy to-space guarantees, as rscratch2 may contain garbage.
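
To make the failure mode concrete, the following stand-alone C++ model mimics the two variants of the check; it is a sketch, not HotSpot code. `Regs`, `tbz_takes_branch`, `skips_barrier_buggy` and `skips_barrier_fixed` are hypothetical names, and the bit position 0 is an arbitrary value chosen for the example, not the real `ShenandoahHeap::HAS_FORWARDED_BITPOS`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two scratch registers.
struct Regs {
  std::uint8_t rscratch1;
  std::uint8_t rscratch2;
};

// Assumed bit position, chosen only for this sketch.
const int HAS_FORWARDED_BITPOS = 0;

// Models "tbz reg, bit, done": the branch to done is taken when the bit is zero.
bool tbz_takes_branch(std::uint8_t reg, int bitpos) {
  return ((reg >> bitpos) & 1) == 0;
}

// Before the fix: gc_state is loaded into rscratch1, but rscratch2 is tested,
// so the outcome depends on whatever rscratch2 happened to hold.
bool skips_barrier_buggy(Regs& r, std::uint8_t gc_state) {
  r.rscratch1 = gc_state;  // ldrb rscratch1, gc_state
  return tbz_takes_branch(r.rscratch2, HAS_FORWARDED_BITPOS);
}

// After the fix: the register that was just loaded is the one tested.
bool skips_barrier_fixed(Regs& r, std::uint8_t gc_state) {
  r.rscratch1 = gc_state;  // ldrb rscratch1, gc_state
  return tbz_takes_branch(r.rscratch1, HAS_FORWARDED_BITPOS);
}
```

With the forwarded bit set in `gc_state` and `rscratch2` holding 0, the buggy variant takes the branch and skips the barrier work, while the fixed variant does not. With the bit clear and `rscratch2` holding 0xFF, the buggy variant does the work needlessly.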
Testing: aarch64 hotspot_gc_shenandoah -- Thanks, -Aleksey From per.liden at oracle.com Tue Oct 22 12:01:18 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 22 Oct 2019 14:01:18 +0200 Subject: RFR: 8231552: ZGC: Refine address space reservation In-Reply-To: References: <5015ca7b-3e3e-b2bd-c3f8-0a83ecdb41d8@oracle.com> Message-ID: <2b79829d-f577-819d-9577-91351c03fddb@oracle.com> Updated webrev after off-line comments from Stefan and Erik. Full: http://cr.openjdk.java.net/~pliden/8231552/webrev.3 Diff: http://cr.openjdk.java.net/~pliden/8231552/webrev.3-diff /Per On 10/16/19 10:41 AM, Per Liden wrote: > Latest version of this patch, rebased on today's jdk/jdk: > > http://cr.openjdk.java.net/~pliden/8231552/webrev.2 > > /Per > > On 10/3/19 11:45 AM, Per Liden wrote: >> We could be slightly more sophisticated and do a better job reserving >> address space in situations where parts of the address space is >> already occupied or when the process is running with address space >> limitations. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8231552 >> Webrev: http://cr.openjdk.java.net/~pliden/8231552/webrev.0 >> >> /Per From rkennke at redhat.com Tue Oct 22 12:04:37 2019 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 22 Oct 2019 14:04:37 +0200 Subject: RFR (XS) 8232778: Shenandoah: SBSA::arraycopy_prologue checks wrong register In-Reply-To: <01ea4f42-e33f-64ee-3514-12d19d2dc820@redhat.com> References: <01ea4f42-e33f-64ee-3514-12d19d2dc820@redhat.com> Message-ID: <8be9fb1d-1f8c-b121-026f-f40a12b2ca09@redhat.com> Good spot! Looks good, thanks! 
Roman

> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8232778
>
> Fix:
>
> diff -r 24d411cb3a90 src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp
> --- a/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Tue Oct 22
> 08:57:41 2019 +0200
> +++ b/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Tue Oct 22
> 13:39:05 2019 +0200
> @@ -58,7 +58,7 @@
>        Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset()));
>        __ ldrb(rscratch1, gc_state);
>        if (dest_uninitialized) {
> -        __ tbz(rscratch2, ShenandoahHeap::HAS_FORWARDED_BITPOS, done);
> +        __ tbz(rscratch1, ShenandoahHeap::HAS_FORWARDED_BITPOS, done);
>        } else {
>          __ mov(rscratch2, ShenandoahHeap::HAS_FORWARDED | ShenandoahHeap::MARKING);
>          __ tst(rscratch1, rscratch2);
>
> The load happens into rscratch1, yet we are testing rscratch2. I think this silently breaks
> arraycopy to-space guarantees, as rscratch2 may contain garbage.
>
> Testing: aarch64 hotspot_gc_shenandoah
>

From shade at redhat.com Tue Oct 22 12:12:24 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 22 Oct 2019 14:12:24 +0200
Subject: RFR (XS) 8232778: Shenandoah: SBSA::arraycopy_prologue checks wrong register
In-Reply-To: <8be9fb1d-1f8c-b121-026f-f40a12b2ca09@redhat.com>
References: <01ea4f42-e33f-64ee-3514-12d19d2dc820@redhat.com>
 <8be9fb1d-1f8c-b121-026f-f40a12b2ca09@redhat.com>
Message-ID: <238286cd-c420-59ab-fe61-d9b3169a4df6@redhat.com>

Thanks, I also think it is trivial. Pushed.

-Aleksey

On 10/22/19 2:04 PM, Roman Kennke wrote:
> Good spot!
>
> Looks good, thanks!
> Roman
>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8232778
>>
>> Fix:
>>
>> diff -r 24d411cb3a90 src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp
>> --- a/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Tue Oct 22
>> 08:57:41 2019 +0200
>> +++ b/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Tue Oct 22
>> 13:39:05 2019 +0200
>> @@ -58,7 +58,7 @@
>>        Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset()));
>>        __ ldrb(rscratch1, gc_state);
>>        if (dest_uninitialized) {
>> -        __ tbz(rscratch2, ShenandoahHeap::HAS_FORWARDED_BITPOS, done);
>> +        __ tbz(rscratch1, ShenandoahHeap::HAS_FORWARDED_BITPOS, done);
>>        } else {
>>          __ mov(rscratch2, ShenandoahHeap::HAS_FORWARDED | ShenandoahHeap::MARKING);
>>          __ tst(rscratch1, rscratch2);
>>
>> The load happens into rscratch1, yet we are testing rscratch2. I think this silently breaks
>> arraycopy to-space guarantees, as rscratch2 may contain garbage.
>>
>> Testing: aarch64 hotspot_gc_shenandoah
>>
>

--
Thanks,
-Aleksey

From erik.osterlund at oracle.com Tue Oct 22 12:39:26 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Tue, 22 Oct 2019 14:39:26 +0200
Subject: RFR: 8231552: ZGC: Refine address space reservation
In-Reply-To: <2b79829d-f577-819d-9577-91351c03fddb@oracle.com>
References: <5015ca7b-3e3e-b2bd-c3f8-0a83ecdb41d8@oracle.com>
 <2b79829d-f577-819d-9577-91351c03fddb@oracle.com>
Message-ID: <13fefca7-da7a-5f8b-ab5f-f208bdf33940@oracle.com>

Hi Per,

Looks good.

Thanks,
/Erik

On 10/22/19 2:01 PM, Per Liden wrote:
> Updated webrev after off-line comments from Stefan and Erik.
>
> Full: http://cr.openjdk.java.net/~pliden/8231552/webrev.3
> Diff: http://cr.openjdk.java.net/~pliden/8231552/webrev.3-diff
>
> /Per
>
> On 10/16/19 10:41 AM, Per Liden wrote:
>> Latest version of this patch, rebased on today's jdk/jdk:
>>
>> http://cr.openjdk.java.net/~pliden/8231552/webrev.2
>>
>> /Per
>>
>> On 10/3/19 11:45 AM, Per Liden wrote:
>>> We could be slightly more sophisticated and do a better job
>>> reserving address space in situations where parts of the address
>>> space is already occupied or when the process is running with
>>> address space limitations.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231552
>>> Webrev: http://cr.openjdk.java.net/~pliden/8231552/webrev.0
>>>
>>> /Per

From per.liden at oracle.com Tue Oct 22 12:55:24 2019
From: per.liden at oracle.com (Per Liden)
Date: Tue, 22 Oct 2019 14:55:24 +0200
Subject: RFR: 8231552: ZGC: Refine address space reservation
In-Reply-To: <13fefca7-da7a-5f8b-ab5f-f208bdf33940@oracle.com>
References: <5015ca7b-3e3e-b2bd-c3f8-0a83ecdb41d8@oracle.com>
 <2b79829d-f577-819d-9577-91351c03fddb@oracle.com>
 <13fefca7-da7a-5f8b-ab5f-f208bdf33940@oracle.com>
Message-ID: <4c6adb69-ce8b-5b35-2bb0-644b55ef229d@oracle.com>

Thanks Erik!

/Per

On 10/22/19 2:39 PM, erik.osterlund at oracle.com wrote:
> Hi Per,
>
> Looks good.
>
> Thanks,
> /Erik
>
> On 10/22/19 2:01 PM, Per Liden wrote:
>> Updated webrev after off-line comments from Stefan and Erik.
>>
>> Full: http://cr.openjdk.java.net/~pliden/8231552/webrev.3
>> Diff: http://cr.openjdk.java.net/~pliden/8231552/webrev.3-diff
>>
>> /Per
>>
>> On 10/16/19 10:41 AM, Per Liden wrote:
>>> Latest version of this patch, rebased on today's jdk/jdk:
>>>
>>> http://cr.openjdk.java.net/~pliden/8231552/webrev.2
>>>
>>> /Per
>>>
>>> On 10/3/19 11:45 AM, Per Liden wrote:
>>>> We could be slightly more sophisticated and do a better job
>>>> reserving address space in situations where parts of the address
>>>> space is already occupied or when the process is running with
>>>> address space limitations.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231552
>>>> Webrev: http://cr.openjdk.java.net/~pliden/8231552/webrev.0
>>>>
>>>> /Per
>

From zgu at redhat.com Tue Oct 22 13:38:57 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 22 Oct 2019 09:38:57 -0400
Subject: RFR 8232747: Shenandoah: Concurrent GC should deactivate SATB before processing weak roots
Message-ID: <9c29e649-eb27-be6e-2240-ad3ff99b7462@redhat.com>

This is the counterpart of JDK-8231999[1] for Shenandoah concurrent GC.
Shenandoah needs to deactivate SATB barrier before processing weak
roots, to avoid barrier side-effects on its paths.

Bug: https://bugs.openjdk.java.net/browse/JDK-8232747
Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232747/webrev.00/index.html

Test:
  hotspot_gc_shenandoah (fastdebug and release) on Linux x86_64

Thanks,

-Zhengyu

[1] https://bugs.openjdk.java.net/browse/JDK-8231999

From stefan.johansson at oracle.com Tue Oct 22 13:41:46 2019
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Tue, 22 Oct 2019 15:41:46 +0200
Subject: [PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance
In-Reply-To: <400df998-171a-5bbe-9f3e-01af1781afb4@oracle.com>
References: <4F02DD53-EA98-4A1A-B871-C6E9D9610B2C@oracle.com>
 <9B69AFD1-7AE2-4B50-BFCF-C9C6E2594240@oracle.com>
 <8ef5b52e-d6fc-3073-5ca7-44c87c1eb981@oracle.com>
 <92277aab-0578-9e2c-3f4f-55ae1e8c94a9@oracle.com>
 <400df998-171a-5bbe-9f3e-01af1781afb4@oracle.com>
Message-ID:

Hi Haoyu,

I've reviewed the patch now and have some comments and questions.

To simplify the review and have a common base to look at I've created a
webrev at:
http://cr.openjdk.java.net/~sjohanss/8220465/00/

One general note first, most of the new code uses four space
indentation, in hotspot the standard is two spaces, please change this.
Below are some file by file comments.

src/hotspot/share/gc/parallel/psCompactionManager.cpp
---
  53 GrowableArray* ParCompactionManager::_free_shadow = new (ResourceObj::C_HEAP, mtInternal) GrowableArray(10, true);
  54 Monitor* ParCompactionManager::_monitor = NULL;

Set _free_shadow to NULL here like the other statics and then create the
GrowableArray in initialize(). I also think _shadow_region_array or
something like that would be a better name and the monitor should also
be named something that signals that it is used for this array.
---
  70   if (_monitor == NULL) {
  71     _monitor = new Monitor(Mutex::barrier, "CompactionManager monitor",
  72                            Mutex::_allow_vm_block_flag, Monitor::_safepoint_check_never);
  73   }

Instead of doing the monitor creation here having to check for NULL, do
it in initialize() below together with the array creation.
---

src/hotspot/share/gc/parallel/psParallelCompact.cpp
---
2974     if (cur->push()) {

Correct me if I'm wrong, if this call to push() returns true it means
that nobody else has "stolen" it (used a shadow region to prepare it)
and we mark it as pushed. But when pushed in this code path this is the
end state for this RegionData? If this is the case I think it would be
easier to understand the code if we added another function and state for
when we "steal" it. Haven't thought very much about the names but I
think you understand what I want to achieve:

Normal path:
UNUSED -> push() -> NORMAL

Steal path:
UNUSED -> steal() -> STOLEN -> fill() -> FILLED -> copy() -> SHADOW

We could then also assert in set_completed() that the state is either
NORMAL or SHADOW (or if they have a shared end state DONE). As I said
the names can be improved (both for the states and the functions) but I
think we should have names and not just numbers.
---
3060 template
3061 void PSParallelCompact::fill_region(ParCompactionManager* cm, size_t region_idx, size_t shadow, size_t offset)

As I told you this was a big improvement from the first patch, but I
think there is room for even more improvements around the way we pass in
ignored parameters to MoveAndUpdateClosure. Explaining my idea in text
is harder than code, so I created a patch, what do you think about this?
http://cr.openjdk.java.net/~sjohanss/8220465/00-alt/

This alternative is based on 00 and does not take my other comments into
consideration. So it might have to be altered a bit if you address some
of my other comments/questions.
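
For the named-state suggestion (normal path versus steal path), a minimal standalone sketch of what such a state machine could look like is below. It uses std::atomic rather than HotSpot's own atomics, and the type name RegionState, the enumerators, and the method names are hypothetical placeholders, not what the patch or the alternative webrev contains.

```cpp
#include <atomic>

// Hypothetical named states replacing bare integers; names are placeholders.
enum class ShadowState : int { Unused, Normal, Stolen, Filled, Shadow };

class RegionState {
  std::atomic<ShadowState> _state{ShadowState::Unused};

  // Atomically move from `from` to `to`. Returns false if the current state
  // is not `from`, for example because another thread claimed the region first.
  bool transition(ShadowState from, ShadowState to) {
    return _state.compare_exchange_strong(from, to);
  }

public:
  // Normal path: UNUSED -> NORMAL
  bool try_push()    { return transition(ShadowState::Unused, ShadowState::Normal); }
  // Steal path: UNUSED -> STOLEN -> FILLED -> SHADOW
  bool try_steal()   { return transition(ShadowState::Unused, ShadowState::Stolen); }
  bool mark_filled() { return transition(ShadowState::Stolen, ShadowState::Filled); }
  bool try_copy()    { return transition(ShadowState::Filled, ShadowState::Shadow); }

  // The check an assert in set_completed() could use: only the two end states pass.
  bool is_end_state() const {
    ShadowState s = _state.load();
    return s == ShadowState::Normal || s == ShadowState::Shadow;
  }
};
```

Because every transition names its expected source state, a region that was pushed can no longer be stolen and vice versa, and an out-of-order call such as try_copy() before mark_filled() simply fails.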
---
3196 void PSParallelCompact::copy_back(HeapWord *region_addr, HeapWord *shadow_addr) {

I think the parameters should change places, so that they correspond with
the copy below.
---
3200 bool PSParallelCompact::steal_shadow_region(ParCompactionManager* cm, size_t &region_idx) {
3201   size_t& record = cm->shadow_record();

Did you consider to just let shadow_record() be a simple getter instead
of getting a reference and then have a next_shadow_record() which
advances it by active_workers?
---
3236 void PSParallelCompact::initialize_steal_record(uint which) {

I'm having a hard time understanding the details here, or I get that all
threads should have a separate shadow record, but I'm not sure why it is
not enough to just do:

size_t record = _summary_data.addr_to_region_idx(
    _space_info[old_space_id].dense_prefix());
cm->set_shadow_record(record + which);

As you can see I'm also suggesting adding a setter for shadow_record.
---
3434 ParMarkBitMapClosure::IterationStatus
3435 ShadowClosure::do_addr(HeapWord* addr, size_t words) {
3436   HeapWord* shadow_destination = destination() + _offset;

Using an offset instead of a given address feels a bit backwards, did
you consider letting the closure keep and update a _shadow_destination
instead? Or would it even be possible to just set destination to be the
shadow region address? In that case it should be possible to just use
the do_addr and other functions from the MoveAndUpdateClosure. I see
from looking at this particular function that there is one assert that
would have to change:

3408   assert(PSParallelCompact::summary_data().calc_new_pointer(source(), compaction_manager()) ==
3409          destination(), "wrong destination");

This should be easily fixed by adding a virtual function
check_destination, that has a special implementation for the ShadowClosure.
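
The shadow_record() / next_shadow_record() / set_shadow_record() idea from the comments above can be sketched in isolation as follows. This is only an illustration under assumptions: the ShadowRecord class and the regions_for_worker helper are hypothetical, plain std::size_t region indices stand in for the real summary data, and the worker count is passed in explicitly.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-worker cursor over region indices; not the patch's code.
class ShadowRecord {
  std::size_t _record;

public:
  // Worker `which` starts at the first region after the dense prefix plus its own id.
  ShadowRecord(std::size_t dense_prefix_region, unsigned which)
    : _record(dense_prefix_region + which) {}

  // Simple getter and setter, as suggested in the review.
  std::size_t shadow_record() const { return _record; }
  void set_shadow_record(std::size_t r) { _record = r; }

  // Advance by the number of active workers so that workers never overlap.
  std::size_t next_shadow_record(unsigned active_workers) {
    _record += active_workers;
    return _record;
  }
};

// Collects the region indices one worker would visit below `end_region`.
inline std::vector<std::size_t> regions_for_worker(std::size_t first_region,
                                                   std::size_t end_region,
                                                   unsigned which,
                                                   unsigned active_workers) {
  ShadowRecord rec(first_region, which);
  std::vector<std::size_t> out;
  for (std::size_t r = rec.shadow_record(); r < end_region;
       r = rec.next_shadow_record(active_workers)) {
    out.push_back(r);
  }
  return out;
}
```

With four workers starting at region 10, worker 0 visits 10, 14, 18 and worker 1 visits 11, 15, 19, so each region is considered by exactly one worker.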
---

src/hotspot/share/gc/parallel/psParallelCompact.hpp
---
 333     // Preempt the region to avoid double processes
 334     inline bool push();
 335     // Mark the region as filled and ready to be copied back
 336     inline bool fill();
 337     // Preempt the region to copy the shadow region content back
 338     inline bool copy();

As mentioned, I think there might be better names for those functions
and the comments. Maybe adding a prefix would make the code more self
explaining. try_push(), mark_filled(), try_copy() and the new try_steal().
---

Thanks again for providing this patch, I look forward to see an updated
version.

Cheers,
Stefan

On 2019-10-14 15:00, Stefan Johansson wrote:
> Thanks for the quick update Haoyu,
>
> This is a great improvement and I will try to find time to look into the
> patch in more detail the coming weeks.
>
> Thanks,
> Stefan
>
> On 2019-10-11 14:49, Haoyu Li wrote:
>> Hi Stefan,
>>
>> Thanks for your suggestion! It is very redundant that
>> PSParallelCompact::fill_shadow_region() copies most code from
>> PSParallelCompact::fill_region(), and therefore I've refactored these
>> two functions to share code as many as possible. And the attachment is
>> the updated patch.
>>
>> Specifically, the closure, which moves objects, in
>> PSParallelCompact::fill_region() is now declared as a template of
>> either MoveAndUpdateClosure or ShadowClosure. So by controlling the
>> type of closure when invoking the function, we can decide whether to
>> fill a normal region or a shadow one. Thus, almost all code in
>> PSParallelCompact::fill_region() can be reused.
>>
>> Besides, a virtual function named complete_region() is added in both
>> closures to do some work after the filling, such setting states and
>> copying the shadow region back.
>>
>> Thanks again for reviewing the patch, looking forward to your insights
>> and suggestions!
>>
>> Best Regards,
>> Haoyu Li
>>
>> 2019-10-10 21:50 GMT+08:00, Stefan Johansson:
>>> Thanks for the clarification =)
>>>
>>> Moving on to the next part, the code in the patch. So this won't be a
>>> full review of the patch but just an initial comment that I would like
>>> to be addressed first.
>>>
>>> The new function PSParallelCompact::fill_shadow_region() is more or less
>>> a copy of PSParallelCompact::fill_region() and I understand that from a
>>> proof of concept point of view it was the easy (and right) way to do it.
>>> I would prefer if the code could be refactored so that fill_region() and
>>> fill_shadow_region() share more code. There might be reasons that I've
>>> missed, that prevents it, but we should at least explore how much code
>>> can be shared.
>>>
>>> Thanks,
>>> Stefan
>>>
>>> On 2019-10-10 15:10, Haoyu Li wrote:
>>>> Hi Stefan,
>>>>
>>>> Thanks for your quick response! As to your concern about the OCA, I am
>>>> the sole author of the patch. And it is the case as what the agreement
>>>> states.
>>>> Best Regards,
>>>> Haoyu Li,
>>>>
>>>>
>>>> Stefan Johansson wrote on 2019-10-10 at 8:37:
>>>>
>>>>     Hi,
>>>>
>>>>     On 2019-10-10 13:06, Haoyu Li wrote:
>>>>     > Hi Stefan,
>>>>     >
>>>>     > Thanks for your testing! One possible reason for the regressions in
>>>>     > simple tests is that the region dependencies may not be heavy enough.
>>>>     > Because the locality of shadow regions is lower than that of heap
>>>>     > regions, writing to shadow regions will be slower than to normal
>>>>     > regions, and this is a part of the reason why I reuse shadow regions.
>>>>     > Therefore, if only a few shadow regions are created and not reused, the
>>>>     > overhead may not be amortized.
>>>>
>>>>     I guess it is something like this. I thought that for "easy" heaps the
>>>>     shadow regions won't be used at all, and should therefore not really
>>>>     cost anything.
>>>>
>>>>     >
>>>>     > As to the OCA, it is the case that I'm the only person signing the
>>>>     > agreement. Please let me know if you have any further questions.
>>>>     > Thanks again!
>>>>
>>>>     Ok, so you are the sole author of the patch. The important part, as the
>>>>     agreement states, is:
>>>>     "no other person or entity, including my employer, has or will have
>>>>     rights with respect my contributions"
>>>>
>>>>     Is that the case?
>>>>
>>>>     Thanks,
>>>>     Stefan
>>>>
>>>>     >
>>>>     > Best Regards,
>>>>     > Haoyu Li
>>>>     >
>>>>     > Stefan Johansson wrote on 2019-10-08 at 6:49:
>>>>     >
>>>>     >     Hi Haoyu,
>>>>     >
>>>>     >     I've done some more testing and I haven't seen any issues with the
>>>>     >     patch so far and the performance looks promising in most cases. For
>>>>     >     simple tests I've seen some regressions, but I'm not really sure why.
>>>>     >     Will do some more digging.
>>>>     >
>>>>     >     To move forward with this the first thing we need to do is making sure
>>>>     >     that you being covered by the Oracle Contributor Agreement is enough.
>>>>     >     From what we can see it is only you as an individual that has signed
>>>>     >     the OCA and in that case it is important that this statement from the
>>>>     >     OCA is fulfilled: "no other person or entity, including my employer,
>>>>     >     has or will have rights with respect my contributions"
>>>>     >
>>>>     >     Is this the case for this contribution or should we have the
>>>>     >     university sign the OCA as well? For more information regarding the
>>>>     >     OCA please refer to:
>>>>     >     https://www.oracle.com/technetwork/oca-faq-405384.pdf
>>>>     >
>>>>     >     Thanks,
>>>>     >     Stefan
>>>>     >
>>>>     >     On 2019-09-16 16:02, Haoyu Li wrote:
>>>>     >     > FYI, the evaluation results on OpenJDK 14 are plotted in the
>>>>     >     > attachment. I compute the full GC throughput by dividing the heap
>>>>     >     > size before full GC by the GC pause time, and the results are
>>>>     >     > arithmetic mean values of ten runs after a warm-up run. The
>>>>     >     > evaluation is conducted on a machine with dual Intel Xeon
>>>>     >     > E5-2618L v3 CPUs (2 sockets, 16 physical cores with SMT enabled)
>>>>     >     > and 64G DRAM.
>>>>     >     >
>>>>     >     > Best Regards,
>>>>     >     > Haoyu Li,
>>>>     >     > Institute of Parallel and Distributed Systems(IPADS),
>>>>     >     > School of Software,
>>>>     >     > Shanghai Jiao Tong University
>>>>     >     >
>>>>     >     >
>>>>     >     > Stefan Johansson wrote on 2019-09-12 at 5:34:
>>>>     >     >
>>>>     >     >     Hi Haoyu,
>>>>     >     >
>>>>     >     >     I recently came across your patch and I would like to pick up on
>>>>     >     >     some of the things Kim mentioned in his mails. I especially want
>>>>     >     >     to evaluate and investigate if this is a technique we can use to
>>>>     >     >     improve the other GCs as well. To start that work I want to take
>>>>     >     >     the patch for a spin in our internal performance testing. The
>>>>     >     >     patch doesn't apply clean to the latest JDK repository, so if you
>>>>     >     >     could provide an updated patch that would be very helpful.
>>>>     >     >
>>>>     >     >     It would also be great if you could share some more information
>>>>     >     >     around the results presented in the paper. For example, it would
>>>>     >     >     be good to get the full command lines for the different
>>>>     >     >     benchmarks so we can run them locally and reproduce the results
>>>>     >     >     you've seen.
>>>>     >     >
>>>>     >     >     Thanks,
>>>>     >     >     Stefan
>>>>     >     >
>>>>     >     >>    On 12 March 2019 at 03:21, Haoyu Li wrote:
>>>>     >     >>
>>>>     >     >>    Hi Kim,
>>>>     >     >>
>>>>     >     >>    Thanks for reviewing and testing the patch. If there are any
>>>>     >     >>    failures or performance degradation relevant to the work, please
>>>>     >     >>    let me know and I'll be very happy to keep improving it. Also,
>>>>     >     >>    any suggestions about code improvements are well appreciated.
>>>>     >     >>
>>>>     >     >>    I'm not quite sure if both G1 and Shenandoah have the similar
>>>>     >     >>    region dependency issue, since I haven't studied their GC
>>>>     >     >>    behaviors before. If they have, I'm also willing to propose a
>>>>     >     >>    more general optimization.
>>>>     >     >>
>>>>     >     >>    As to the memory overhead, I believe it will be low because this
>>>>     >     >>    patch exploits empty regions in the young space rather than
>>>>     >     >>    off-heap memory to allocate shadow regions, and also reuses the
>>>>     >     >>    /_source_region/ field of each /RegionData/ to record the
>>>>     >     >>    corresponding shadow region index. We only introduce a new
>>>>     >     >>    integer field /_shadow/ in the RegionData class to indicate the
>>>>     >     >>    status of a region, a global /GrowableArray _free_shadow/ to
>>>>     >     >>    store the indices of shadow regions, and a global /Monitor/ to
>>>>     >     >>    protect the array. This information might help if the memory
>>>>     >     >>    overhead needs to be evaluated.
>>>>     >     >>
>>>>     >     >>    Looking forward to your insight.
>>>>     >     >>
>>>>     >     >>    Best Regards,
>>>>     >     >>    Haoyu Li,
>>>>     >     >>    Institute of Parallel and Distributed Systems(IPADS),
>>>>     >     >>    School of Software,
>>>>     >     >>    Shanghai Jiao Tong University
>>>>     >     >>
>>>>     >     >>
>>>>     >     >>    Kim Barrett wrote on 2019-03-12 at 6:11:
>>>>     >     >>
>>>>     >     >>        > On Mar 11, 2019, at 1:45 AM, Kim Barrett wrote:
>>>>     >     >>        >
>>>>     >     >>        >> On Jan 24, 2019, at 3:58 AM, Haoyu Li wrote:
>>>>     >     >>        >>
>>>>     >     >>        >> Hi Kim,
>>>>     >     >>        >>
>>>>     >     >>        >> I have ported my patch to OpenJDK 13 according to your
>>>>     >     >>        >> instructions in your last mail, and the patch is attached in
>>>>     >     >>        >> this mail. The patch does not change much since PSGC is
>>>>     >     >>        >> indeed pretty stable.
>>>>     >     >>        >>
>>>>     >     >>        >> Also, I evaluate the correctness and performance of PS full
>>>>     >     >>        >> GC with benchmarks from DaCapo, SPECjvm2008, and JOlden
>>>>     >     >>        >> suites on a machine with dual Intel Xeon E5-2618L v3 CPUs(16
>>>>     >     >>        >> physical cores), 64G DRAM and linux kernel 4.17. The
>>>>     >     >>        >> evaluation result, indicating 1.9X GC throughput improvement
>>>>     >     >>        >> on average, is attached, too.
>>>>     >     >>        >>
>>>>     >     >>        >> However, I have no idea how to further test this patch for
>>>>     >     >>        >> both correctness and performance. Can I please get any
>>>>     >     >>        >> guidance from you or some sponsor?
>>>>     >     >>        >
>>>>     >     >>        > Sorry I missed that you had sent an updated version of the
>>>>     >     >>        > patch.
>>>>     >     >>        >
>>>>     >     >>        > I've run the full regression suite across Oracle-supported
>>>>     >     >>        > platforms. There are some failures, but there are almost
>>>>     >     >>        > always some failures in the later tiers right now. I'll start
>>>>     >     >>        > looking at them tomorrow to figure out whether any of them
>>>>     >     >>        > are relevant.
>>>>     >     >>        >
>>>>     >     >>        > I'm also planning to run some of our performance benchmarks.
>>>>     >     >>        >
>>>>     >     >>        > I've lightly skimmed the proposed changes. There might be
>>>>     >     >>        > some code improvements to be made.
>>>>     >     >>        >
>>>>     >     >>        > I'm also wondering if this technique applies to other
>>>>     >     >>        > collectors. It seems like both G1 and Shenandoah full gc's
>>>>     >     >>        > might have similar issues? If so, a solution that is
>>>>     >     >>        > ParallelGC-specific is less interesting than one that has
>>>>     >     >>        > broader applicability. Though maybe this optimization is
>>>>     >     >>        > less important for G1 and Shenandoah, since they actively
>>>>     >     >>        > try to avoid full gc's.
>>>>     >     >>        >
>>>>     >     >>        > I'm also not clear on how much additional memory might be
>>>>     >     >>        > temporarily allocated by this mechanism.
>>>>     >     >>
>>>>     >     >>        I've created a CR for this:
>>>>     >     >>        https://bugs.openjdk.java.net/browse/JDK-8220465
>>>>     >     >>
>>>>     >     >
>>>>     >
>>>>
>>>
>>
>>

From kim.barrett at oracle.com Tue Oct 22 13:44:22 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 22 Oct 2019 09:44:22 -0400
Subject: RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1
In-Reply-To:
References: <0F637570-EC97-47C5-B493-B33681133149@oracle.com>
 <5c6b06b1-de44-3cb7-7fc8-0b641df5f353@oracle.com>
 <80DA3FD5-C2FA-44BF-83C5-AE0EA6AA3684@oracle.com>
 <0D820E95-361A-4CAC-9BC3-99C39512D396@oracle.com>
Message-ID: <1898AC1E-0A8C-467C-9CA9-4B02C00A3A07@oracle.com>

> On Oct 22, 2019, at 6:13 AM, Thomas Schatzl wrote:
>
> Hi Kim,
>
> thanks a lot for taking the time so quickly.
>
> On 22.10.19 03:20, Kim Barrett wrote:
>>> On Oct 19, 2019, at 9:06 AM, Thomas Schatzl wrote:
>>>
>>> Hi all,
>>>
>>> there is a new webrev at
>>>
>>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2/ (full only,
>>> there is no point in providing a diff)
>>>
>>> since I like this solution a lot as it removes a lot of additional
>>> post-processing.
>>> [...]
>>
>> I'm glad the new state machine worked out, and allowed the extra task
>> to be eliminated. Thanks for going the extra mile with the testing.
>> And thanks for turning my pseudo-code into something more readable. My
>> comments here mostly suggestions for more of that; I don't think I'd
>> want to have to decipher this in 6 months without some helpful
>> commentary. :)
>
> I think I addressed all your comments, and thanks for your suggestions - I agree about having this tricky code well documented.
>
> Changes are currently running through hs-tier1-5 with the changes that ease reproduction (the webrev.2.testing changes noted in the last email). Since there are no significant code changes apart from documentation, I am confident there will be no issues.
>
> Webrevs:
> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2_to_3/ (diff)
> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.3/ (full)
>
> Thanks,
>   Thomas

Looks good.

From thomas.schatzl at oracle.com Tue Oct 22 13:45:52 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 22 Oct 2019 15:45:52 +0200
Subject: RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1
In-Reply-To: <1898AC1E-0A8C-467C-9CA9-4B02C00A3A07@oracle.com>
References: <0F637570-EC97-47C5-B493-B33681133149@oracle.com>
 <5c6b06b1-de44-3cb7-7fc8-0b641df5f353@oracle.com>
 <80DA3FD5-C2FA-44BF-83C5-AE0EA6AA3684@oracle.com>
 <0D820E95-361A-4CAC-9BC3-99C39512D396@oracle.com>
 <1898AC1E-0A8C-467C-9CA9-4B02C00A3A07@oracle.com>
Message-ID: <7f150234-4080-b2f9-a791-b456038af795@oracle.com>

Hi Kim,

On 22.10.19 15:44, Kim Barrett wrote:
>> On Oct 22, 2019, at 6:13 AM, Thomas Schatzl wrote:
>>
>> Hi Kim,
>>
>> thanks a lot for taking the time so quickly.
>>
>> On 22.10.19 03:20, Kim Barrett wrote:
>>>> On Oct 19, 2019, at 9:06 AM, Thomas Schatzl wrote:
>>>>
>>>> Hi all,
>>>>
>>>> there is a new webrev at
>>>>
>>>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2/ (full only,
>>>> there is no point in providing a diff)
>>>>
>>>> since I like this solution a lot as it removes a lot of additional
>>>> post-processing.
>>>> [...]
>>>>
>>> I'm glad the new state machine worked out, and allowed the extra task
>>> to be eliminated. Thanks for going the extra mile with the testing.
>>> And thanks for turning my pseudo-code into something more readable. My
>>> comments here mostly suggestions for more of that; I don't think I'd
>>> want to have to decipher this in 6 months without some helpful
>>> commentary. :)
>>
>> I think I addressed all your comments, and thanks for your suggestions - I agree about having this tricky code well documented.
>>
>> Changes are currently running through hs-tier1-5 with the changes that ease reproduction (the webrev.2.testing changes noted in the last email). Since there are no significant code changes apart from documentation, I am confident there will be no issues.
>>
>> Webrevs:
>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2_to_3/ (diff)
>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.3/ (full)
>>
>> Thanks,
>>   Thomas
>
> Looks good.
>

thanks for your review. As expected, the hs-tier1-5 testing found no
issues in the meantime.

Thanks,
  Thomas

From shade at redhat.com Tue Oct 22 13:55:04 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 22 Oct 2019 15:55:04 +0200
Subject: RFR 8232747: Shenandoah: Concurrent GC should deactivate SATB before processing weak roots
In-Reply-To: <9c29e649-eb27-be6e-2240-ad3ff99b7462@redhat.com>
References: <9c29e649-eb27-be6e-2240-ad3ff99b7462@redhat.com>
Message-ID:

On 10/22/19 3:38 PM, Zhengyu Gu wrote:
> This is the counterpart of JDK-8231999[1] for Shenandoah concurrent GC. Shenandoah needs to
> deactivate SATB barrier before processing weak roots, to avoid barrier side-effects on its paths.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8232747
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232747/webrev.00/index.html

*) Mmm... In ShenandoahConcurrentMark::finish_mark_from_roots, there is a call:
     _heap->parallel_cleaning(full_gc);

   Does it mean new code would perform cleaning twice?

*) This comment relates to keeping has_forwarded_objects set on cancelled path:

     // If we needed to update refs, and concurrent marking has been cancelled,
     // we need to finish updating references.

   ...current placement loses that connection. Suggestion:

     // If this cycle was updating references and got cancelled, we need to keep
     // the flag on, for subsequent phases to deal with it.

*) Maybe we should inline stop_concurrent_marking everywhere to make the flow more obvious...
-- Thanks, -Aleksey From shade at redhat.com Tue Oct 22 14:29:03 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 22 Oct 2019 16:29:03 +0200 Subject: RFR (XS) 8232791: Shenandoah: passive mode should disable pacing Message-ID: <9fb18df5-68fb-43e9-81fb-70318e67d8ba@redhat.com> RFE: https://bugs.openjdk.java.net/browse/JDK-8232791 The rationale is in the RFE description. Fix: https://cr.openjdk.java.net/~shade/8232791/webrev.01/ Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From zgu at redhat.com Tue Oct 22 14:30:59 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 22 Oct 2019 10:30:59 -0400 Subject: RFR (XS) 8232791: Shenandoah: passive mode should disable pacing In-Reply-To: <9fb18df5-68fb-43e9-81fb-70318e67d8ba@redhat.com> References: <9fb18df5-68fb-43e9-81fb-70318e67d8ba@redhat.com> Message-ID: Good and trivial. Thanks, -Zhengyu On 10/22/19 10:29 AM, Aleksey Shipilev wrote: > RFE: > https://bugs.openjdk.java.net/browse/JDK-8232791 > > The rationale is in the RFE description. > > Fix: > https://cr.openjdk.java.net/~shade/8232791/webrev.01/ > > Testing: hotspot_gc_shenandoah > From zgu at redhat.com Tue Oct 22 14:31:51 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 22 Oct 2019 10:31:51 -0400 Subject: RFR 8232747: Shenandoah: Concurrent GC should deactivate SATB before processing weak roots In-Reply-To: References: <9c29e649-eb27-be6e-2240-ad3ff99b7462@redhat.com> Message-ID: <3e64dfe9-ea9e-60ed-6a51-1c5c466078c0@redhat.com> Hi Aleksey, On 10/22/19 9:55 AM, Aleksey Shipilev wrote: > On 10/22/19 3:38 PM, Zhengyu Gu wrote: >> This is the counterpart of JDK-8231999[1] for Shenandoah concurrent GC. Shenandoah needs to >> deactivate SATB barrier before processing weak roots, to avoid barrier side-effects on its paths. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8232747 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232747/webrev.00/index.html > > *) Mmm... 
In ShenandoahConcurrentMark::finish_mark_from_roots, there is a call:
>   _heap->parallel_cleaning(full_gc);

It is removed by the following:

diff --git a/src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp b/src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp
--- a/src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp
+++ b/src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp
@@ -442,8 +442,6 @@
     weak_refs_work(full_gc);
   }

-  _heap->parallel_cleaning(full_gc);
-
   assert(task_queues()->is_empty(), "Should be empty");
   TASKQUEUE_STATS_ONLY(task_queues()->print_taskqueue_stats());
   TASKQUEUE_STATS_ONLY(task_queues()->reset_taskqueue_stats());

>
> Does it mean new code would perform cleaning twice?
>
> *) This comment relates to keeping has_forwarded_objects set on cancelled path:
>
>   // If we needed to update refs, and concurrent marking has been cancelled,
>   // we need to finish updating references.
>
> ...current placement loses that connection. Suggestion:
>
>   // If this cycle was updating references and got cancelled, we need to keep
>   // the flag on, for subsequent phases to deal with it.
>
> *) Maybe we should inline stop_concurrent_marking everywhere to make the flow more obvious...
>

Updated: http://cr.openjdk.java.net/~zgu/JDK-8232747/webrev.01/index.html

Thanks,

-Zhengyu

From shade at redhat.com Tue Oct 22 14:46:17 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 22 Oct 2019 16:46:17 +0200
Subject: RFR 8232747: Shenandoah: Concurrent GC should deactivate SATB before processing weak roots
In-Reply-To: <3e64dfe9-ea9e-60ed-6a51-1c5c466078c0@redhat.com>
References: <9c29e649-eb27-be6e-2240-ad3ff99b7462@redhat.com> <3e64dfe9-ea9e-60ed-6a51-1c5c466078c0@redhat.com>
Message-ID:

On 10/22/19 4:31 PM, Zhengyu Gu wrote:
> Updated: http://cr.openjdk.java.net/~zgu/JDK-8232747/webrev.01/index.html

Right. Looks much better.
Still, a few nits:

*) We don't need to assert these anymore (we never do in other places)

1481   assert(is_concurrent_mark_in_progress(), "How else could we get here?");
...
1585   assert(is_concurrent_mark_in_progress(), "How else could we get here?");

*) Newline between lines here, also capitalize "Marking..."

1479     concurrent_mark()->finish_mark_from_roots(/* full_gc = */ false);
1480     // marking is completed, deactivate SATB barrier

*) This is still awkwardly worded, that's my fault. Let's do this:

  concurrent_mark()->cancel();
  assert(is_concurrent_mark_in_progress(), "How else could we get here?");
  set_concurrent_mark_in_progress(false);

  // If this cycle was updating references, we need to keep the has_forwarded_objects
  // flag on, for subsequent phases to deal with it.

  if (process_references())

*) You tested hotspot_gc_shenandoah to verify that adding parallel_cleaning call in mark-compact
phase1 is safe, right?

Otherwise looks good.

--
Thanks,
-Aleksey

From shade at redhat.com Tue Oct 22 15:14:18 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 22 Oct 2019 17:14:18 +0200
Subject: RFR (XS) 8232802: Shenandoah: transition between "cset" and "pinned_cset" does not require cancelled gc
Message-ID: <5f3b9301-a873-7166-6416-8ba4cd358039@redhat.com>

Bug:
  https://bugs.openjdk.java.net/browse/JDK-8232802

Fix:
  https://cr.openjdk.java.net/~shade/8232802/webrev.01/

The failure caught in testing says that transition from cset to pinned-cset is invalid when GC was
not cancelled. However, this was only true before JDK-8232575 work. Now, this transition is done in
sync_pinned_region_status that is supposed to work on all paths. In this case, Degenerated GC
dropped the cancelled GC flag already, and thus blows up the check.

The check is excessive and should be removed.
Testing: hotspot_gc_shenandoah -- Thanks, -Aleksey From zgu at redhat.com Tue Oct 22 15:19:47 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 22 Oct 2019 11:19:47 -0400 Subject: RFR (XS) 8232802: Shenandoah: transition between "cset" and "pinned_cset" does not require cancelled gc In-Reply-To: <5f3b9301-a873-7166-6416-8ba4cd358039@redhat.com> References: <5f3b9301-a873-7166-6416-8ba4cd358039@redhat.com> Message-ID: <4258260f-4737-077e-fd39-12f880e8fc16@redhat.com> Good. Thanks, -Zhengyu On 10/22/19 11:14 AM, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8232802 > > Fix: > https://cr.openjdk.java.net/~shade/8232802/webrev.01/ > > The failure caught in testing says that transition from cset to pinned-cset is invalid when GC was > not cancelled. However, this was only true before JDK-8232575 work. Now, this transition is done in > sync_pinned_region_status that is supposed to work on all paths. In this case, Degenerated GC > dropped the cancelled GC flag already, and thus blows up the check. > > The check is excessive and should be removed. > > Testing: hotspot_gc_shenandoah > From zgu at redhat.com Tue Oct 22 16:00:39 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 22 Oct 2019 12:00:39 -0400 Subject: RFR 8232747: Shenandoah: Concurrent GC should deactivate SATB before processing weak roots In-Reply-To: References: <9c29e649-eb27-be6e-2240-ad3ff99b7462@redhat.com> <3e64dfe9-ea9e-60ed-6a51-1c5c466078c0@redhat.com> Message-ID: <2207188f-7d8c-3d3a-b1f6-3f4ead520c33@redhat.com> On 10/22/19 10:46 AM, Aleksey Shipilev wrote: > On 10/22/19 4:31 PM, Zhengyu Gu wrote: >> Updated: http://cr.openjdk.java.net/~zgu/JDK-8232747/webrev.01/index.html > > Right. Looks much better. Still, a few nits: > > *) We don't need to assert these anymore (we never do in other places) > > 1481 assert(is_concurrent_mark_in_progress(), "How else could we get here?"); > ... 
> 1585   assert(is_concurrent_mark_in_progress(), "How else could we get here?");
>
> *) Newline between lines here, also capitalize "Marking..."
>
> 1479     concurrent_mark()->finish_mark_from_roots(/* full_gc = */ false);
> 1480     // marking is completed, deactivate SATB barrier
>
> *) This is still awkwardly worded, that's my fault. Let's do this:
>
>   concurrent_mark()->cancel();
>   assert(is_concurrent_mark_in_progress(), "How else could we get here?");
>   set_concurrent_mark_in_progress(false);
>
>   // If this cycle was updating references, we need to keep the has_forwarded_objects
>   // flag on, for subsequent phases to deal with it.
>
>   if (process_references())

All fixed and pushed.

>
> *) You tested hotspot_gc_shenandoah to verify that adding parallel_cleaning call in mark-compact
> phase1 is safe, right?

Of course. And reran the tests after every iteration.

Thanks,

-Zhengyu

>
> Otherwise looks good.
>

From sangheon.kim at oracle.com Tue Oct 22 16:47:56 2019
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Tue, 22 Oct 2019 09:47:56 -0700
Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3)
In-Reply-To: <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com>
References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com> <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com>
Message-ID: <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com>

Hi Kim,

On 10/22/19 12:19 AM, Kim Barrett wrote:
>> On Oct 22, 2019, at 1:52 AM, sangheon.kim at oracle.com wrote:
>> What do you think about below comment?
>>
>>   // Tries to allocate word_sz in the PLAB of the next "generation" after trying to
>>   // allocate into dest. Previous_plab_refill_failed indicates whether previous
>>   // PLAB refill for the original (source) object was failed.
> Drop "was". Otherwise looks good.

Done.
Webrev:
http://cr.openjdk.java.net/~sangheki/8220311/webrev.3
http://cr.openjdk.java.net/~sangheki/8220311/webrev.3.inc

Thanks,
Sangheon

>
>>   // Returns a non-NULL pointer if successful, and updates dest if required.
>>   // Also determines whether we should continue to try to allocate into the various
>>   // generations or just end trying to allocate.
>>   HeapWord* allocate_in_next_plab(G1HeapRegionAttr* dest,
>> ...
>>
>> Let me post the webrev when we decide. :)
>>
>> Thanks,
>> Sangheon
>>
>>
>>> ------------------------------------------------------------------------------
>>>
>>> Looks good, other than that one comment issue.
>

From thomas.schatzl at oracle.com Tue Oct 22 17:06:45 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 22 Oct 2019 19:06:45 +0200
Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3)
In-Reply-To: <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com>
References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com> <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com> <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com>
Message-ID: <649a42fa-3a31-e86c-90c8-f5a408fcfe39@oracle.com>

Hi,

On 22.10.19 18:47, sangheon.kim at oracle.com wrote:
> Hi Kim,
>
> On 10/22/19 12:19 AM, Kim Barrett wrote:
>>> On Oct 22, 2019, at 1:52 AM, sangheon.kim at oracle.com wrote:
>>> What do you think about below comment?
>>>
>>>    // Tries to allocate word_sz in the PLAB of the next "generation"
>>> after trying to
>>>    // allocate into dest. Previous_plab_refill_failed indicates
>>> whether previous
>>>    // PLAB refill for the original (source) object was failed.
>> Drop "was". Otherwise looks good.
> Done.
>
> Webrev:
> http://cr.openjdk.java.net/~sangheki/8220311/webrev.3
> http://cr.openjdk.java.net/~sangheki/8220311/webrev.3.inc
>

  still good :)

Thomas

From thomas.schatzl at oracle.com Tue Oct 22 17:30:15 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 22 Oct 2019 19:30:15 +0200
Subject: RFR (M): 8228609: G1 copy cost prediction uses used vs. actual copied bytes
Message-ID:

Hi all,

  can I have reviews for this change that makes G1 calculate and then use the actual amount of bytes copied for Object Copy phase estimation?

The problem is that the "used" value that is currently used for this can differ a lot from the number of actually copied bytes during the parallel phases. Sources for differences are:
- TLAB sizing
- TLAB/region fragmentation
- all of that multiplied by the number of threads

Particularly if the amount of copied data is small compared to the number of regions all this can add up and disturb the prediction quite a lot, although overall it's not that bad. It's only that this and other small inaccuracies add up.

CR:
https://bugs.openjdk.java.net/browse/JDK-8228609
Webrev:
http://cr.openjdk.java.net/~tschatzl/8228609/webrev/
Testing:
hs-tier1-5

Thanks,
  Thomas

From thomas.schatzl at oracle.com Tue Oct 22 17:35:38 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Tue, 22 Oct 2019 19:35:38 +0200
Subject: RFR (S): 8232776: G1 should always take rs_length_diff into account when predicting rs_lengths
Message-ID: <2e973399-ce75-7ab4-ce21-58fe63c74f9c@oracle.com>

Hi all,

  can I have reviews for this small change that makes G1 always use the error term for rs-length prediction, not only if G1 sees fit.

While rs length prediction is still kind of bad even with this change (and seemingly a band-aid), with that change it is a bit better. While there is a "real" fix for RS length estimation coming that so far looks really good, this change decreases complexity of further changes in G1Policy enough while improving the estimation.
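
[Editorial illustration, not part of the original message.] The idea of "always use the error term" can be sketched roughly as follows. This is not the code from the webrev: the class `RunningAverage`, the function `predict_rs_length`, and the fixed-window averaging are hypothetical names and choices made only for this example, and G1's actual predictor may work quite differently.

```cpp
#include <cstddef>
#include <deque>
#include <numeric>

// Hypothetical helper (not a G1 class): average over the last `window` samples.
class RunningAverage {
  std::deque<double> _samples;
  std::size_t _window;

public:
  explicit RunningAverage(std::size_t window) : _window(window) {}

  // Record one observed value, dropping the oldest once the window is full.
  void add(double value) {
    _samples.push_back(value);
    if (_samples.size() > _window) {
      _samples.pop_front();
    }
  }

  // Mean of the recorded samples, or 0 when nothing was recorded yet.
  double predict() const {
    if (_samples.empty()) {
      return 0.0;
    }
    double sum = std::accumulate(_samples.begin(), _samples.end(), 0.0);
    return sum / static_cast<double>(_samples.size());
  }
};

// Assumption for illustration: the predicted remembered set length is the
// sampled length plus the predicted difference (the "error term"), applied
// unconditionally and clamped at zero.
std::size_t predict_rs_length(std::size_t sampled_rs_length,
                              const RunningAverage& rs_length_diff) {
  double predicted =
      static_cast<double>(sampled_rs_length) + rs_length_diff.predict();
  return predicted < 0.0 ? 0 : static_cast<std::size_t>(predicted);
}
```

The point of the sketch is only that the difference term is added on every prediction rather than behind a condition; how the term is actually maintained is left to the webrev.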
CR: https://bugs.openjdk.java.net/browse/JDK-8232776 Webrev: http://cr.openjdk.java.net/~tschatzl/8232776/webrev/ Testing: hs-tier1-5 Thanks, Thomas From sangheon.kim at oracle.com Tue Oct 22 17:36:54 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Tue, 22 Oct 2019 10:36:54 -0700 Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3) In-Reply-To: <649a42fa-3a31-e86c-90c8-f5a408fcfe39@oracle.com> References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com> <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com> <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com> <649a42fa-3a31-e86c-90c8-f5a408fcfe39@oracle.com> Message-ID: <9855fa14-ebf5-4c80-082f-4a26e578ee66@oracle.com> Thanks, Thomas! Sangheon On 10/22/19 10:06 AM, Thomas Schatzl wrote: > Hi, > > On 22.10.19 18:47, sangheon.kim at oracle.com wrote: >> Hi Kim, >> >> On 10/22/19 12:19 AM, Kim Barrett wrote: >>>> On Oct 22, 2019, at 1:52 AM, sangheon.kim at oracle.com wrote: >>>> What do you think about below comment? >>>> >>>> ?? // Tries to allocate word_sz in the PLAB of the next >>>> "generation" after trying to >>>> ?? // allocate into dest. Previous_plab_refill_failed indicates >>>> whether previous >>>> ?? // PLAB refill for the original (source) object was failed. >>> Drop ?was?.? Otherwise looks good. >> Done. >> >> Webrev: >> http://cr.openjdk.java.net/~sangheki/8220311/webrev.3 >> http://cr.openjdk.java.net/~sangheki/8220311/webrev.3.inc >> > > ? 
still good :) > > Thomas From thomas.schatzl at oracle.com Tue Oct 22 18:02:27 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 22 Oct 2019 20:02:27 +0200 Subject: RFR (S): 8232777: Rename G1Policy::_max_rs_length as it is no maximum Message-ID: <6d966a0f-2f56-4bae-4b55-47eeec7e9d81@oracle.com> Hi all, can I have reviews for this small cleanup that renames G1Policy::_max_rs_length to just _rs_length because the contained value is simply no maximum. This causes some confusion down the line in its use (imo). CR: https://bugs.openjdk.java.net/browse/JDK-8232777 Webrev: http://cr.openjdk.java.net/~tschatzl/8232777/webrev/ Testing: local compilation Thanks, Thomas From thomas.schatzl at oracle.com Tue Oct 22 18:05:13 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 22 Oct 2019 20:05:13 +0200 Subject: RFR (XS): 8232779: G1 current collection parallel time does not include optional evacuation Message-ID: <15f7fa18-0334-c9ff-be69-c8ecb114e363@oracle.com> Hi all, can I have reviews for this change that fixes the calculation of G1GCPhaseTimes::cur_collection_par_time_ms(): we forgot to consider the optional evacuation time. This causes too long Other time, having minor effects on pause time prediction. CR: https://bugs.openjdk.java.net/browse/JDK-8232779 Webrev: http://cr.openjdk.java.net/~tschatzl/8232779/webrev/ Testing: local compilation Thanks, Thomas From thomas.schatzl at oracle.com Tue Oct 22 18:26:22 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 22 Oct 2019 20:26:22 +0200 Subject: RFR (M): 8227739: Merge cost predictions for scanning cards and log buffer entries Message-ID: Hi all, can I have reviews for this change that aligns the cost predictions to the way we do evacuations, i.e. that we first drop all remembered sets onto the card table, and only a fraction of that will be scanned as introduced by JDK-8213108. 
This code adds all the predictions for ratios etc. to align to that code in our prediction model too.

After this change (and all the previous changes just sent out for review, mostly JDK-8228609, which is a prerequisite for this change), predictions are a bit (noticeably) better than before :)

CR:
https://bugs.openjdk.java.net/browse/JDK-8227739
Webrev:
http://cr.openjdk.java.net/~tschatzl/8227739/webrev/
Testing:
hs-tier1-5, perf testing, pause time keeping improves a little

Thanks,
  Thomas

From kim.barrett at oracle.com Tue Oct 22 19:08:09 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 22 Oct 2019 15:08:09 -0400
Subject: RFR (XS): 8232779: G1 current collection parallel time does not include optional evacuation
In-Reply-To: <15f7fa18-0334-c9ff-be69-c8ecb114e363@oracle.com>
References: <15f7fa18-0334-c9ff-be69-c8ecb114e363@oracle.com>
Message-ID: <800ED894-9A67-4590-8C32-51DCE38E9C47@oracle.com>

> On Oct 22, 2019, at 2:05 PM, Thomas Schatzl wrote:
>
> Hi all,
>
> can I have reviews for this change that fixes the calculation of G1GCPhaseTimes::cur_collection_par_time_ms(): we forgot to consider the optional evacuation time.
>
> This causes too long Other time, having minor effects on pause time prediction.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8232779
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8232779/webrev/
> Testing:
> local compilation
>
> Thanks,
> Thomas

Looks good.

From kim.barrett at oracle.com Tue Oct 22 19:16:37 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 22 Oct 2019 15:16:37 -0400
Subject: RFR (S): 8232777: Rename G1Policy::_max_rs_length as it is no maximum
In-Reply-To: <6d966a0f-2f56-4bae-4b55-47eeec7e9d81@oracle.com>
References: <6d966a0f-2f56-4bae-4b55-47eeec7e9d81@oracle.com>
Message-ID:

> On Oct 22, 2019, at 2:02 PM, Thomas Schatzl wrote:
>
> Hi all,
>
> can I have reviews for this small cleanup that renames G1Policy::_max_rs_length to just _rs_length because the contained value is simply no maximum.
This causes some confusion down the line in its use (imo).
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8232777
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8232777/webrev/
> Testing:
> local compilation
>
> Thanks,
> Thomas

You missed one in a comment:
src/hotspot/share/gc/g1/g1Policy.cpp
 757   // This is defensive. For a while _max_rs_length could get

Other than that, looks good, and trivial.

From kim.barrett at oracle.com Tue Oct 22 20:08:06 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 22 Oct 2019 16:08:06 -0400
Subject: RFR (S): 8232776: G1 should always take rs_length_diff into account when predicting rs_lengths
In-Reply-To: <2e973399-ce75-7ab4-ce21-58fe63c74f9c@oracle.com>
References: <2e973399-ce75-7ab4-ce21-58fe63c74f9c@oracle.com>
Message-ID: <29DA6617-8933-4184-9892-C55DA13989CF@oracle.com>

> On Oct 22, 2019, at 1:35 PM, Thomas Schatzl wrote:
>
> Hi all,
>
> can I have reviews for this small change that makes G1 always use the error term for rs-length prediction, not only if G1 sees fit.
>
> While rs length prediction is still kind of bad even with this change (and seemingly a band-aid), with that change it is a bit better. While there is a "real" fix for RS length estimation coming that so far looks really good, this change decreases complexity of further changes in G1Policy enough while improving the estimation.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8232776
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8232776/webrev/
> Testing:
> hs-tier1-5
>
> Thanks,
> Thomas

Looks good.

From sangheon.kim at oracle.com Tue Oct 22 20:46:45 2019
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Tue, 22 Oct 2019 13:46:45 -0700
Subject: RFR(L): 8220312: Implementation: NUMA-Aware Memory Allocation for G1, Logging (3/3)
In-Reply-To:
References:
Message-ID: <743d16cc-499d-784b-79fc-c006643f9ec5@oracle.com>

Hi Thomas,

Thanks for your review!

On 10/21/19 7:09 AM, Thomas Schatzl wrote:
> Hi,
>
>
some initial comments looking at the log output: > > On 13.10.19 08:16, sangheon.kim at oracle.com wrote: >> Hi all, >> >> Previous patch conflicts because of JDK-8220310, I'm posting rebased >> one with some refactoring. >> >> Webrev: >> http://cr.openjdk.java.net/~sangheki/8220312/webrev.2 >> Testing: hs-tier 1 ~ 5, with/without UseNUMA >> >> Here's the full patch of 8220310, 8220311 and 8220312. >> http://cr.openjdk.java.net/~sangheki/8220312/webrev.full.2/ >> > > ? - I did not performance impact test the additional logging yet, but > I do not expect issues. > > ? - that's something from the first NUMA patch: > > There is this gc+heap+numa=debug log message "Request memory [address, > address] to be numa id (X)." for every region. > > First, it seems to be on the wrong level, consider a heap with > ten-thousands of regions. This imo clogs the log too much, and I would > prefer to move this information to trace level. Moved to Trace level. > > Second, the full stop at the end is not necessary :) Removed. > > ? - the G1HRPrinter should be made NUMA aware, i.e. print expected > NUMA id for this region > > ? - the casing of NUMA changes depending on message, i.e. sometimes > "NUMA" and other times "numa" in the log messages themselves. I would > recommend uniformly use "NUMA". Changed to "NUMA". > > However I think that all the "NUMA id" in these messages should read > "node id" as at that level we do not manage the OS level NUMA ids any > more. We don't manage but users may configure OS level NUMA ids (e.g. via numactl), so I wanted to print all logs with NUMA id. > > ? - the "numa id" values in the various messages are formatted > differently in the different messages with no apparent guideline: > sometimes the code adds the leading zeros, sometimes not. Also the > separator between node id and value is sometimes ":" and once "=" > > E.g. > > "NUMA id verification: preferred id (matched #): 00 (32), 01 (32), ..." 
> "Region Allocated / Requested: 99% xxxx/yyyy (numa id 0: 99% ..." > > I am kind of undecided what is best, but probably simply leaving out > the leading zeros is best for the large majority of cases. Okay, will remove leading zeros. > > ? - just a suggestion: "Region Allocated / Requested" -> "Placement > Match Ratio" or so. Maybe somebody else has a better name. "Placement match ratio" feels better but to align with below message, changed to lower case. > > Also in that message I would not print "numa id" at all to make the > message shorter. > > ? - "Worker threads local object process rate" -> "Worker task > locality match rate" seems shorter. Changed to "Worker task locality match ratio" > > Again, to make the message shorter I would prefer that "numa id" were > not printed at all in the details. Tried to minimize but not zero occurrence. > > Not sure if that rate at this point is extremely interesting since G1 > won't even try to improve it at this time, but you can leave it in if > you want. Yeah, I know. But this is sort of logging framework for NUMA, so I would like to leave as is. > > ? - I would *probably* like to have most of these messages split into > "recent" and "total" statistics. Maybe others think that the totals > are okay. Interesting idea. Could you expand your suggestion a bit more? What is "recent"? Or do you mean per GC cycle? > > ? - Again, to save space I would prefer to have the per-node details > in the region summaries in the same line as the original output. I.e. > instead of > > Eden regions: 28->0 (29) > ? From numa id 0: 18->0 > ? From numa id 1: 10->0 > > the following would be much shorter: > > Eden regions: 28->0 (29) (0: 18->0, 1: 10->0) > > As with higher node counts you will get lots of lines with little > content imho. Maybe others think differently? I like your suggestion. 
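
[Editorial illustration, not part of the original message.] The single-line layout suggested above could be produced along these lines. This is a sketch only: `NodeRegionCounts`, `format_region_summary` and the parameter names are hypothetical and do not come from the webrev; the real patch would presumably go through HotSpot's unified logging rather than `std::ostringstream`.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-node pair of region counts (before and after the GC).
struct NodeRegionCounts {
  unsigned before;
  unsigned after;
};

// Builds a line such as "Eden regions: 28->0 (29) (0: 18->0, 1: 10->0)".
// `limit` is the value the existing summary line prints in parentheses;
// the index into `per_node` is used as the node id.
std::string format_region_summary(const std::string& name,
                                  unsigned before,
                                  unsigned after,
                                  unsigned limit,
                                  const std::vector<NodeRegionCounts>& per_node) {
  std::ostringstream out;
  out << name << " regions: " << before << "->" << after << " (" << limit << ")";
  if (!per_node.empty()) {
    // Append the per-node details on the same line, without a "numa id" label.
    out << " (";
    for (std::size_t i = 0; i < per_node.size(); i++) {
      if (i > 0) {
        out << ", ";
      }
      out << i << ": " << per_node[i].before << "->" << per_node[i].after;
    }
    out << ")";
  }
  return out.str();
}
```

Keeping the per-node details on the summary line also means they never appear without their context, which is the gc+heap+numa versus gc+heap concern raised in the review.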
>
> Also, this would "fix" the problem that when you enabled gc+heap+numa
> but not gc+heap, you will see these "From numa id" numbers in the log
> without their required context. Alternatively, gc+heap+numa could
> automatically enable gc+heap at the same level.
Yeah, I know this issue and this is why I like your suggestion! :)

>
> Comments after some superficial look at the changes themselves:
>
>   - G1Regions should be renamed as G1RegionCounts and get a single
> line comment like: "Contains per Node id region count".
Done.

>
>   - G1NodeTimes::Stat: it would probably be useful to have a "rate()"
> getter that recalculates the value as needed instead of the member.
I'm okay with your suggestion so I tried. :)

>
>   - G1HeapTransition::Data::~Data: the "if (something != NULL)" checks
> are unnecessary. FREE_C_HEAP_ARRAY does that already.
Done.

>
> Same in G1ParScanThreadState::G1ParscanThreadState.
Done.

>
>   - I do not understand the name "G1NodeTimes" :) What "time" is that
> referring to?
It meant 'Phase Times' similar to G1GCPhaseTimes or ReferenceProcessorPhaseTimes.
G1NUMAPhaseTimes is better?
Or any suggestion for a name?

>
>   - G1NUMA::clear_statistics() seems to be unused.
Removed G1NUMA::clear_statistics().

>
>   - G1NodeTimes::print_mutator_alloc_stat_info() and
> G1NodeTimes::copy_to_sruvivor_stat_info() look very similar. Could the
> code be refactored a bit?
Good catch. Done.

I mostly addressed your comments except the two below:
- I would *probably* like to have most of these messages split into "recent" and "total" statistics. Maybe others think that the totals are okay.
- I do not understand the name "G1NodeTimes" :) What "time" is that referring to?

I will post the next webrev if I get other reviews.

Thanks,
Sangheon

>
> Thanks,
>
Thomas From sangheon.kim at oracle.com Tue Oct 22 20:49:25 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Tue, 22 Oct 2019 13:49:25 -0700 Subject: RFR (S): 8232776: G1 should always take rs_length_diff into account when predicting rs_lengths In-Reply-To: <2e973399-ce75-7ab4-ce21-58fe63c74f9c@oracle.com> References: <2e973399-ce75-7ab4-ce21-58fe63c74f9c@oracle.com> Message-ID: Hi Thomas, On 10/22/19 10:35 AM, Thomas Schatzl wrote: > Hi all, > > ? can I have reviews for this small change that makes G1 always use > the error term for rs-length prediction, not only if G1 sees fit. > > While rs length prediction is still kind of bad even with this change > (and seemingly a band-aid), with that change it is a bit better. While > there is a "real" fix for RS length estimation coming that so far > looks really good, this change decreases complexity of further changes > in G1Policy enough while improving the estimation. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8232776 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8232776/webrev/ Looks good. Thanks, Sangheon > Testing: > hs-tier1-5 > > Thanks, > ? Thomas From sangheon.kim at oracle.com Tue Oct 22 20:50:45 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Tue, 22 Oct 2019 13:50:45 -0700 Subject: RFR (XS): 8232779: G1 current collection parallel time does not include optional evacuation In-Reply-To: <15f7fa18-0334-c9ff-be69-c8ecb114e363@oracle.com> References: <15f7fa18-0334-c9ff-be69-c8ecb114e363@oracle.com> Message-ID: Hi Thomas, On 10/22/19 11:05 AM, Thomas Schatzl wrote: > Hi all, > > ? can I have reviews for this change that fixes the calculation of > G1GCPhaseTimes::cur_collection_par_time_ms(): we forgot to consider > the optional evacuation time. > > This causes too long Other time, having minor effects on pause time > prediction. 
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8232779
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8232779/webrev/
Looks good.

Thanks,
Sangheon

> Testing:
> local compilation
>
> Thanks,
>   Thomas

From stefan.johansson at oracle.com Wed Oct 23 06:16:06 2019
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Wed, 23 Oct 2019 08:16:06 +0200
Subject: RFR (S): 8232777: Rename G1Policy::_max_rs_length as it is no maximum
In-Reply-To:
References: <6d966a0f-2f56-4bae-4b55-47eeec7e9d81@oracle.com>
Message-ID: <1366cea1-9e01-2982-e211-7417e63be46f@oracle.com>

On 2019-10-22 21:16, Kim Barrett wrote:
>> On Oct 22, 2019, at 2:02 PM, Thomas Schatzl wrote:
>>
>> Hi all,
>>
>> can I have reviews for this small cleanup that renames G1Policy::_max_rs_length to just _rs_length because the contained value is simply no maximum. This causes some confusion down the line in its use (imo).
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8232777
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8232777/webrev/
>> Testing:
>> local compilation
>>
>> Thanks,
>> Thomas
>
> You missed one in a comment:
> src/hotspot/share/gc/g1/g1Policy.cpp
> 757   // This is defensive. For a while _max_rs_length could get
>
> Other than that, looks good, and trivial.
>

Looks good,
Stefan

From sangheon.kim at oracle.com Wed Oct 23 06:39:19 2019
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Tue, 22 Oct 2019 23:39:19 -0700
Subject: RFR(L): 8220312: Implementation: NUMA-Aware Memory Allocation for G1, Logging (3/3)
In-Reply-To: <743d16cc-499d-784b-79fc-c006643f9ec5@oracle.com>
References: <743d16cc-499d-784b-79fc-c006643f9ec5@oracle.com>
Message-ID:

Hi Thomas,

I am posting the next webrev as Kim is waiting for it.

Webrev:
http://cr.openjdk.java.net/~sangheki/8220312/webrev.3
http://cr.openjdk.java.net/~sangheki/8220312/webrev.3.inc
Testing: hs-tier 1 ~ 4 with/without UseNUMA. hs-tier5 is almost finished without new failures.
Thanks, Sangheon On 10/22/19 1:46 PM, sangheon.kim at oracle.com wrote: > Hi Thomas, > > Thanks for your review! > > On 10/21/19 7:09 AM, Thomas Schatzl wrote: >> Hi, >> >> ? some initial comments looking at the log output: >> >> On 13.10.19 08:16, sangheon.kim at oracle.com wrote: >>> Hi all, >>> >>> Previous patch conflicts because of JDK-8220310, I'm posting rebased >>> one with some refactoring. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~sangheki/8220312/webrev.2 >>> Testing: hs-tier 1 ~ 5, with/without UseNUMA >>> >>> Here's the full patch of 8220310, 8220311 and 8220312. >>> http://cr.openjdk.java.net/~sangheki/8220312/webrev.full.2/ >>> >> >> ? - I did not performance impact test the additional logging yet, but >> I do not expect issues. >> >> ? - that's something from the first NUMA patch: >> >> There is this gc+heap+numa=debug log message "Request memory >> [address, address] to be numa id (X)." for every region. >> >> First, it seems to be on the wrong level, consider a heap with >> ten-thousands of regions. This imo clogs the log too much, and I >> would prefer to move this information to trace level. > Moved to Trace level. > >> >> Second, the full stop at the end is not necessary :) > Removed. > >> >> ? - the G1HRPrinter should be made NUMA aware, i.e. print expected >> NUMA id for this region >> >> ? - the casing of NUMA changes depending on message, i.e. sometimes >> "NUMA" and other times "numa" in the log messages themselves. I would >> recommend uniformly use "NUMA". > Changed to "NUMA". > >> >> However I think that all the "NUMA id" in these messages should read >> "node id" as at that level we do not manage the OS level NUMA ids any >> more. > We don't manage but users may configure OS level NUMA ids (e.g. via > numactl), so I wanted to print all logs with NUMA id. > >> >> ? 
- the "numa id" values in the various messages are formatted >> differently in the different messages with no apparent guideline: >> sometimes the code adds the leading zeros, sometimes not. Also the >> separator between node id and value is sometimes ":" and once "=" >> >> E.g. >> >> "NUMA id verification: preferred id (matched #): 00 (32), 01 (32), ..." >> "Region Allocated / Requested: 99% xxxx/yyyy (numa id 0: 99% ..." >> >> I am kind of undecided what is best, but probably simply leaving out >> the leading zeros is best for the large majority of cases. > Okay, will remove leading zeros. > >> >> ? - just a suggestion: "Region Allocated / Requested" -> "Placement >> Match Ratio" or so. Maybe somebody else has a better name. > "Placement match ratio" feels better but to align with below message, > changed to lower case. > >> >> Also in that message I would not print "numa id" at all to make the >> message shorter. >> >> ? - "Worker threads local object process rate" -> "Worker task >> locality match rate" seems shorter. > Changed to "Worker task locality match ratio" > >> >> Again, to make the message shorter I would prefer that "numa id" were >> not printed at all in the details. > Tried to minimize but not zero occurrence. > >> >> Not sure if that rate at this point is extremely interesting since G1 >> won't even try to improve it at this time, but you can leave it in if >> you want. > Yeah, I know. But this is sort of logging framework for NUMA, so I > would like to leave as is. > >> >> ? - I would *probably* like to have most of these messages split into >> "recent" and "total" statistics. Maybe others think that the totals >> are okay. > Interesting idea. > Could you expand your suggestion a bit more? > What is "recent"? Or do you mean per GC cycle? > >> >> ? - Again, to save space I would prefer to have the per-node details >> in the region summaries in the same line as the original output. I.e. >> instead of >> >> Eden regions: 28->0 (29) >> ? 
From numa id 0: 18->0 >> From numa id 1: 10->0 >> >> the following would be much shorter: >> >> Eden regions: 28->0 (29) (0: 18->0, 1: 10->0) >> >> As with higher node counts you will get lots of lines with little >> content imho. Maybe others think differently? > I like your suggestion. > >> >> Also, this would "fix" the problem that when you enabled gc+heap+numa >> but not gc+heap, you will see these "From numa id" numbers in the log >> without their required context. Alternatively, gc+heap+numa could >> automatically enable gc+heap at the same level. > Yeah, I know this issue and this is why I like your suggestion! :) > >> >> Comments after some superficial look at the changes themselves: >> >> - G1Regions should be renamed as G1RegionCounts and get a single >> line comment like: "Contains per Node id region count". > Done. > >> >> - G1NodeTimes::Stat: it would probably be useful to have a "rate()" >> getter that recalculates the value as needed instead of the member. > I'm okay with your suggestion so I tried. :) > >> >> - G1HeapTransition::Data::~Data: the "if (something != NULL)" >> checks are unnecessary. FREE_C_HEAP_ARRAY does that already. > Done. > >> >> Same in G1ParScanThreadState::G1ParscanThreadState. > Done. > >> >> - I do not understand the name "G1NodeTimes" :) What "time" is that >> referring to? > It meant 'Phase Times' similar to G1GCPhaseTimes or > ReferenceProcessorPhaseTimes. > G1NUMAPhaseTimes is better? > Or any suggestion for a name? > >> >> - G1NUMA::clear_statistics() seems to be unused. > Removed G1NUMA::clear_statistics(). > >> >> - G1NodeTimes::print_mutator_alloc_stat_info() and >> G1NodeTimes::copy_to_sruvivor_stat_info() look very similar. Could >> the code be refactored a bit? > Good catch. Done. > > I mostly addressed your comments except below two: > - I would *probably* like to have most of these messages split into > "recent" and "total" statistics. Maybe others think that the totals > are okay.
> - I do not understand the name "G1NodeTimes" :) What "time" is that > referring to? > > I will post next webrev, if I get other reviews. > > Thanks, > Sangheon > > >> >> Thanks, >> Thomas > From stefan.johansson at oracle.com Wed Oct 23 07:05:58 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 23 Oct 2019 09:05:58 +0200 Subject: RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1 In-Reply-To: <7f150234-4080-b2f9-a791-b456038af795@oracle.com> References: <0F637570-EC97-47C5-B493-B33681133149@oracle.com> <5c6b06b1-de44-3cb7-7fc8-0b641df5f353@oracle.com> <80DA3FD5-C2FA-44BF-83C5-AE0EA6AA3684@oracle.com> <0D820E95-361A-4CAC-9BC3-99C39512D396@oracle.com> <1898AC1E-0A8C-467C-9CA9-4B02C00A3A07@oracle.com> <7f150234-4080-b2f9-a791-b456038af795@oracle.com> Message-ID: <8126d900-714b-585a-f2f0-4ce13f71501c@oracle.com> Hi Thomas, On 2019-10-22 15:45, Thomas Schatzl wrote: > Hi Kim, > > On 22.10.19 15:44, Kim Barrett wrote: >>> On Oct 22, 2019, at 6:13 AM, Thomas Schatzl >>> wrote: >>> >>> Hi Kim, >>> >>> thanks a lot for taking the time so quickly. >>> >>> On 22.10.19 03:20, Kim Barrett wrote: >>>>> On Oct 19, 2019, at 9:06 AM, Thomas Schatzl >>>>> wrote: >>>>> >>>>> Hi all, >>>>> >>>>> there is a new webrev at >>>>> >>>>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2/ (full only, >>>>> there is no point in providing a diff) >>>>> >>>>> since I like this solution a lot as it removes a lot of additional >>>>>>> post-processing. >>>>> [...] >>>>> >>>> I'm glad the new state machine worked out, and allowed the extra task >>>> to be eliminated. Thanks for going the extra mile with the testing. >>>> And thanks for turning my pseudo-code into something more readable. My >>>> comments here mostly suggestions for more of that; I don't think I'd >>>> want to have to decipher this in 6 months without some helpful >>>> commentary.
:) >>> >>> I think I addressed all your comments, and thanks for your >>> suggestions - I agree about having this tricky code well documented. >>> >>> Changes are currently running through hs-tier1-5 with the changes >>> that ease reproduction (the webrev.2.testing changes noted in the >>> last email). Since there are no significant code changes apart from >>> documentation, I am confident there will be no issues. >>> >>> Webrevs: >>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2_to_3/ (diff) >>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.3/ (full) This looks good, and well documented :) One small thing: src/hotspot/share/gc/g1/g1SharedClosures.hpp --- 46 _codeblobs(pss->worker_id(), &_oops, Mark == G1MarkFromRoot) {} What do you think about adding a helper for Mark == G1MarkFromRoot, something like need_strong_processing() and a comment explaining that it will be true during initial mark. --- Thanks, Stefan >>> >>> Thanks, >>> Thomas >> >> Looks good. >> > > thanks for your review. > > As expected, the hs-tier1-5 testing found no issues in the meantime. > > Thanks, > Thomas From per.liden at oracle.com Wed Oct 23 08:21:40 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 23 Oct 2019 10:21:40 +0200 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <9219a118-0c1d-2cee-10e5-f9bb87c72eb9@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> <9219a118-0c1d-2cee-10e5-f9bb87c72eb9@oracle.com> Message-ID: Hi Sangheon, I noticed that this patch adds os::numa_get_address_id(). That name is misleading as it doesn't return an "address id", but a "numa node id".
However, the terminology used in the os class for numa node is "group" (for example, numa_get_groups_num, numa_get_group_id, etc). So I'd suggest we instead name this os::numa_get_group_id(void* address), i.e. an overload of os::numa_get_group_id(). Btw, I think that the numa related names used in the os class are odd, but I guess that are brought over from Solaris. We can refine those at some later time if we want, but for now I think we should follow the naming convention that we have there. Also, I don't think this function should print warnings, as that's up to the caller to decide what to do, what to print, etc. Furthermore, I suggest we remove os::InvalidNUMAId. Other numa functions in the os class returns -1 on error, so I think we should do that here too. Here's a patch with the proposed changes: diff --git a/src/hotspot/os/linux/os_linux.cpp b/src/hotspot/os/linux/os_linux.cpp --- a/src/hotspot/os/linux/os_linux.cpp +++ b/src/hotspot/os/linux/os_linux.cpp @@ -3007,7 +3007,7 @@ return 0; } -int os::numa_get_address_id(void* address) { +int os::numa_get_group_id(void* address) { #ifndef MPOL_F_NODE #define MPOL_F_NODE (1<<0) // Return next IL mode instead of node mask #endif @@ -3016,11 +3016,10 @@ #define MPOL_F_ADDR (1<<1) // Look up VMA using address #endif - int id = InvalidNUMAId; + int id = 0; if (syscall(SYS_get_mempolicy, &id, NULL, 0, address, MPOL_F_NODE | MPOL_F_ADDR) == -1) { - warning("Failed to get numa id at " PTR_FORMAT " with errno=%d", p2i(address), errno); - return InvalidNUMAId; + return -1; } return id; } diff --git a/src/hotspot/share/gc/g1/g1NUMA.cpp b/src/hotspot/share/gc/g1/g1NUMA.cpp --- a/src/hotspot/share/gc/g1/g1NUMA.cpp +++ b/src/hotspot/share/gc/g1/g1NUMA.cpp @@ -164,7 +164,7 @@ uint G1NUMA::index_of_address(HeapWord *address) const { int numa_id = os::numa_get_address_id((void*)address); - if (numa_id == os::InvalidNUMAId) { + if (numa_id == -1) { return UnknownNodeIndex; } else { return index_of_node_id(numa_id); @@ -201,7 
+201,7 @@ if (!is_enabled()) { return; } - + if (size_in_bytes == 0) { return; } diff --git a/src/hotspot/share/runtime/os.hpp b/src/hotspot/share/runtime/os.hpp --- a/src/hotspot/share/runtime/os.hpp +++ b/src/hotspot/share/runtime/os.hpp @@ -374,10 +374,7 @@ static size_t numa_get_leaf_groups(int *ids, size_t size); static bool numa_topology_changed(); static int numa_get_group_id(); - - static const int InvalidNUMAId = -1; - - static int numa_get_address_id(void* address); + static int numa_get_group_id(void* address); // Page manipulation struct page_info { cheers, Per On 10/16/19 7:54 PM, sangheon.kim at oracle.com wrote: > Hi Kim, Stefan and Thomas, > > Many thanks for the reviews and suggestions! > > Kim, > I will move page_size() near page_start() before push as you suggested. > As you know, all 3 patches will be pushed together though. > > Thanks, > Sangheon > > > On 10/16/19 7:00 AM, Kim Barrett wrote: >>> On Oct 15, 2019, at 10:33 AM, sangheon.kim at oracle.com wrote: >>> >>> Hi all, >>> >>> Here's revised webrev which addresses: >>> 1) G1RegionToSpaceMapper checks mtJavaHeap and then conditionally >>> calls G1NUMA::request_memory_on_node() (Kim) >>> 2) The signature of G1NUMA::request_memory_on_node(void* address, ,) >>> is changed to have actual address instead of page index. (Stefan) >>> 3) Some local variable name changes at G1RegionToSpaceMapper. i -> >>> region_idx, idx -> page_idx (for local style, used idx instead of index) >>> >>> webrev: >>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.5/ >>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.5.inc/ >>> Testing: hs-tier 1 ~ 5, with/without UseNUMA >> Looks good. >> >> In g1PageBasedVirtualSpace.cpp, could the newly added definition of >> page_size() >> be moved to be near the existing definition of page_start()? I don't >> need a new >> webrev if you move it.
>> > From thomas.schatzl at oracle.com Wed Oct 23 08:39:22 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 23 Oct 2019 10:39:22 +0200 Subject: RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1 In-Reply-To: <8126d900-714b-585a-f2f0-4ce13f71501c@oracle.com> References: <0F637570-EC97-47C5-B493-B33681133149@oracle.com> <5c6b06b1-de44-3cb7-7fc8-0b641df5f353@oracle.com> <80DA3FD5-C2FA-44BF-83C5-AE0EA6AA3684@oracle.com> <0D820E95-361A-4CAC-9BC3-99C39512D396@oracle.com> <1898AC1E-0A8C-467C-9CA9-4B02C00A3A07@oracle.com> <7f150234-4080-b2f9-a791-b456038af795@oracle.com> <8126d900-714b-585a-f2f0-4ce13f71501c@oracle.com> Message-ID: <531dc0fe-236d-110b-65ee-d224ac028130@oracle.com> Hi Stefan, On 23.10.19 09:05, Stefan Johansson wrote: > Hi Thomas, > > On 2019-10-22 15:45, Thomas Schatzl wrote: >> Hi Kim, >> >> On 22.10.19 15:44, Kim Barrett wrote: >>>> On Oct 22, 2019, at 6:13 AM, Thomas Schatzl [...]>>>> Webrevs: >>>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2_to_3/ (diff) >>>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.3/ (full) > > This looks good, and well documented :) > > One small thing: > src/hotspot/share/gc/g1/g1SharedClosures.hpp > --- > 46 _codeblobs(pss->worker_id(), &_oops, Mark == G1MarkFromRoot) {} > > What do you think about adding a helper for Mark == G1MarkFromRoot, > something like need_strong_processing() and a comment explaining that it > will be true during initial mark. Something like this? http://cr.openjdk.java.net/~tschatzl/8230706/webrev.3_to_4/ (diff) http://cr.openjdk.java.net/~tschatzl/8230706/webrev.4/ (full) Not completely sure if that is required as searching for G1MarkFromRoot shows that it is only used for the strong shared closures in the initial mark closure set. But I understand that it is nice to be reminded about this. Thanks for your and Kim's reviews.
Thanks, Thomas From stefan.johansson at oracle.com Wed Oct 23 08:47:45 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 23 Oct 2019 10:47:45 +0200 Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3) In-Reply-To: <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com> References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com> <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com> <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com> Message-ID: <01a9ebcf-34ed-06b2-2da8-18d84feae858@oracle.com> Hi Sangheon, On 2019-10-22 18:47, sangheon.kim at oracle.com wrote: > Hi Kim, > > On 10/22/19 12:19 AM, Kim Barrett wrote: >>> On Oct 22, 2019, at 1:52 AM, sangheon.kim at oracle.com wrote: >>> What do you think about below comment? >>> >>> // Tries to allocate word_sz in the PLAB of the next "generation" >>> after trying to >>> // allocate into dest. Previous_plab_refill_failed indicates >>> whether previous >>> // PLAB refill for the original (source) object was failed. >> Drop 'was'. Otherwise looks good. > Done. > > Webrev: > http://cr.openjdk.java.net/~sangheki/8220311/webrev.3 > http://cr.openjdk.java.net/~sangheki/8220311/webrev.3.inc Looks good in general, just one minor thing, no need for a new webrev though: src/hotspot/share/gc/g1/g1Allocator.cpp --- 144 for (uint nodex_index = 0; nodex_index < _num_alloc_regions; nodex_index++) { The name nodex_index has one too many x:es =) I would prefer node_index. --- Thanks, Stefan > > Thanks, > Sangheon > > >> >>> // Returns a non-NULL pointer if successful, and updates dest if >>> required. >>> // Also determines whether we should continue to try to allocate >>> into the various >>> // generations or just end trying to allocate. >>> HeapWord* allocate_in_next_plab(G1HeapRegionAttr* dest, >>> ... >>> >>> Let me post the webrev when we decide.
:) >>> >>> Thanks, >>> Sangheon >>> >>> >>>> ------------------------------------------------------------------------------ >>>> >>>> >>>> Looks good, other than that one comment issue. >> > From stefan.karlsson at oracle.com Wed Oct 23 08:56:08 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 23 Oct 2019 10:56:08 +0200 Subject: RFR: 8232649: ZGC: Add callbacks to ZMemoryManager In-Reply-To: <17249488-d0f4-81ae-3a15-b120cac388af@oracle.com> References: <8793cda6-bec6-dac7-5164-8fc34454286e@oracle.com> <17249488-d0f4-81ae-3a15-b120cac388af@oracle.com> Message-ID: Thanks, Erik. StefanK On 2019-10-22 11:18, erik.osterlund at oracle.com wrote: > Hi Stefan, > > Looks good. > > Thanks, > /Erik > > On 10/21/19 4:06 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to add callbacks to ZMemoryManager. >> >> https://cr.openjdk.java.net/~stefank/8232649/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8232649 >> >> This allows users of ZMemoryManager to get callbacks when memory >> regions are inserted, removed, split, and coalesced. This is needed >> to support Windows' stricter requirements for placeholder reserved >> memory. >> >> Thanks, >> StefanK > From stefan.karlsson at oracle.com Wed Oct 23 08:56:25 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 23 Oct 2019 10:56:25 +0200 Subject: RFR: 8232650: ZGC: Add initialization hooks for OS specific code In-Reply-To: References: <5cdd2722-26a4-8e6c-1262-5d97dfd7f46c@oracle.com> Message-ID: Thanks, Erik. StefanK On 2019-10-22 11:18, erik.osterlund at oracle.com wrote: > Hi Stefan, > > Looks good. > > Thanks, > /Erik > > On 10/21/19 4:37 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to add initialization hooks for OS specific >> code. >> >> https://cr.openjdk.java.net/~stefank/8232650/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8232650 >> >> These hooks are needed to for a Windows port. 
ZInitialize allows >> syscalls to be dynamically resolved. ZVirtualMemory allows callbacks >> from 8232649 to be initialized. >> >> Thanks, >> StefanK > From stefan.karlsson at oracle.com Wed Oct 23 08:56:51 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 23 Oct 2019 10:56:51 +0200 Subject: RFR: 8232648: ZGC: Move ATTRIBUTE_ALIGNED to the front of declarations In-Reply-To: <201ba7a6-a371-f4c0-340f-a5af14d0323f@oracle.com> References: <201ba7a6-a371-f4c0-340f-a5af14d0323f@oracle.com> Message-ID: <64c2f0ae-2330-3d60-4e44-3b03878aae9f@oracle.com> Thanks, Erik. StefanK On 2019-10-22 11:18, erik.osterlund at oracle.com wrote: > Hi Stefan, > > Looks good. > > Thanks, > /Erik > > On 10/21/19 3:22 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to move ATTRIBUTE_ALIGNED to the front of >> declarations. >> >> https://cr.openjdk.java.net/~stefank/8232648/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8232648 >> >> This is done because the Windows compiler requires ATTRIBUTE_ALIGNED >> to be put at the front of declarations. A new macro (ZCACHE_ALIGNED) >> is introduced, and used, to shorten the affected lines. >> >> Thanks, >> StefanK > From thomas.schatzl at oracle.com Wed Oct 23 08:57:08 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 23 Oct 2019 10:57:08 +0200 Subject: RFR (S): 8232777: Rename G1Policy::_max_rs_length as it is no maximum In-Reply-To: <1366cea1-9e01-2982-e211-7417e63be46f@oracle.com> References: <6d966a0f-2f56-4bae-4b55-47eeec7e9d81@oracle.com> <1366cea1-9e01-2982-e211-7417e63be46f@oracle.com> Message-ID: <3ed4d2f2-ffad-65e1-2d95-27e2d69ecba9@oracle.com> Hi Kim, Stefan, thanks for your reviews. For reference, I updated the webrev in place. Thanks, Thomas On 23.10.19 08:16, Stefan Johansson wrote: > > > On 2019-10-22 21:16, Kim Barrett wrote: >>> On Oct 22, 2019, at 2:02 PM, Thomas Schatzl >>> wrote: >>> >>> Hi all, >>> >>> ? 
can I have reviews for this small cleanup that renames >>> G1Policy::_max_rs_length to just _rs_length because the contained >>> value is simply no maximum. This causes some confusion down the line >>> in its use (imo). >>> >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8232777 >>> Webrev: >>> http://cr.openjdk.java.net/~tschatzl/8232777/webrev/ >>> Testing: >>> local compilation >>> >>> Thanks, >>> Thomas >> >> You missed one in a comment: >> src/hotspot/share/gc/g1/g1Policy.cpp >> 757 // This is defensive. For a while _max_rs_length could get >> >> Otherwise than that, looks good, and trivial. >> > Look good, > Stefan From stefan.karlsson at oracle.com Wed Oct 23 08:57:09 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 23 Oct 2019 10:57:09 +0200 Subject: RFR: 8232602: ZGC: Make ZGranuleMap ZAddress agnostic In-Reply-To: References: <4638080b-9f2e-6965-6ed9-a17b32ad3b94@oracle.com> Message-ID: <0711353f-5c0d-23b1-1ae7-c774aa20014a@oracle.com> Thanks, Erik. StefanK On 2019-10-22 11:17, erik.osterlund at oracle.com wrote: > Hi Stefan, > > Looks good. > > Thanks, > /Erik > > On 10/21/19 3:09 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to make ZGranuleMap ZAddress agnostic. >> >> https://cr.openjdk.java.net/~stefank/8232602/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8232602 >> >> Currently, the ZGranuleMap get and put functions take an address in >> the heap as a parameter. The address is then converted into an offset >> (into a heap view), before being scaled to a granule. >> >> We want to be able to use the ZGranuleMap for physical memory >> offsets, and not only heap addresses. Therefore, I propose that we >> move the conversions from address to offset out from ZGranuleMap, and >> move it to the current users of ZGranuleMap. >> >> This patch applies on-top of the patch for JDK-8232601.
>> >> Thanks, >> StefanK >> > From stefan.karlsson at oracle.com Wed Oct 23 08:57:26 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 23 Oct 2019 10:57:26 +0200 Subject: RFR: 8232601: ZGC: Parameterize the ZGranuleMap table size In-Reply-To: References: Message-ID: <4a213165-79ab-d285-e305-421d4bf5f27f@oracle.com> Thanks, Erik. StefanK On 2019-10-22 11:17, erik.osterlund at oracle.com wrote: > Hi Stefan, > > Looks good. > > Thanks, > /Erik > > On 10/21/19 3:00 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to parameterize the ZGranuleMap table size. >> >> https://cr.openjdk.java.net/~stefank/8232601/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8232601 >> >> Previously, the maps were always bound by the range of a virtual >> address space view (ZAddressOffsetMax). We want to be able to use >> ZGranuleMap to map against physical memory offsets, so this RFE >> suggests that we allow users of ZGranuleMap to specify the max offset. >> >> Thanks, >> StefanK > From sakamoto.osamu at nttcom.co.jp Wed Oct 23 09:57:42 2019 From: sakamoto.osamu at nttcom.co.jp (Osamu Sakamoto) Date: Wed, 23 Oct 2019 18:57:42 +0900 Subject: Segmentation Fault occurs when ClassLoader and Metaspace is released in JDK 8 In-Reply-To: <422c9ca2-5053-c761-cb61-f075877bb666@oss.nttdata.com> References: <422c9ca2-5053-c761-cb61-f075877bb666@oss.nttdata.com> Message-ID: <314f9ad2-17df-1082-8816-7af73a96e9fb@nttcom.co.jp_1> Hi Yasumasa, Thank you for answering. > What JVM options did you pass? The following is the JVM options I passed. 
----------------------------------------------------------------- -Xmx2048m -Xms2048m -XX:NewSize=412m -XX:MaxNewSize=412m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=15 -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80 -XX:+CMSClassUnloadingEnabled -XX:CompressedClassSpaceSize=64m -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:GCLogFileSize=0 -Xloggc:/var/log/tomcatm0/gc-%p.log -XX:+HeapDumpOnOutOfMemoryError -XX:+AlwaysLockClassLoader ----------------------------------------------------------------- > I guess you used CMS because this problem seems to occur on CMS only [1] [2]. Yes, I used CMS. > So it might be work around not to use CMS. Thank you for telling me work around. But it is difficult to change the GC method, so we would like to solve this issue with CMS GC if possible. > I'm not sure root cause of this issue, but it seems to break ClassLoaderDataGraph::_unloading. > (like double free (delete) of CLD) I checked whether the ClassLoaderDataGraph::_unloading is broken or not, but I didn't know because of the value has been cleaered by NULL or optimized out. Referring ClassLoaderDataGraph[1].cpp, it looks like that _unloading value is saved to ClassLoaderDataGraph::_saved_unloading. But _saved_unloading had been cleared by NULL, too. Is there any other way to check it? [1]http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/classfile/classLoaderData.cpp#l753 ----------------------------------------------------------------- (gdb) f 10 #10 0x00007f60808c0f0f in ClassLoaderDataGraph::purge () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:818 818??? ??? delete purge_me; (gdb) list ClassLoaderDataGraph::purge 810??? void ClassLoaderDataGraph::purge() { 811??? ? assert(SafepointSynchronize::is_at_safepoint(), "must be at safepoint!"); 812??? ? 
ClassLoaderData* list = _unloading; 813??? ? _unloading = NULL; 814??? ? ClassLoaderData* next = list; 815??? ? while (next != NULL) { 816??? ??? ClassLoaderData* purge_me = next; 817??? ??? next = purge_me->next(); 818??? ??? delete purge_me; 819??? ? } 820??? ? Metaspace::purge(); 821??? } (gdb) p _unloading $29 = (ClassLoaderData *) 0x0 (gdb) p list $30 = (gdb) p next $31 = (gdb) p ClassLoaderDataGraph::_saved_unloading $32 = (ClassLoaderData *) 0x0 ----------------------------------------------------------------- Thanks, Osamu On 10/21/19 22:29, Yasumasa Suenaga wrote: > Hi Osamu, > > What JVM options did you pass? > > I guess you used CMS because this problem seems to occur on CMS only > [1] [2]. > So it might be work around not to use CMS. > > I'm not sure root cause of this issue, but it seems to break > ClassLoaderDataGraph::_unloading. > (like double free (delete) of CLD) > > > Thanks, > > Yasumasa > > > [1] > http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/classfile/classLoaderData.hpp#l100 > [2] > http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp#l6384 > > > On 2019/10/21 17:50, Osamu Sakamoto wrote: >> Hi all, >> >> I have a problem about Segmentation Fault(SEGV) in GC and I can't >> make the cause clear. >> Could you help me solve the problem? >> >> Our System uses OpenJDK 1.8.0.181, and crashed by SEGV when purging >> ClassLoader at safepoint. >> This problem can't be reproduced, but this has happened 4 times in a >> few months. >> >> The following is the summary of my investigation. >> >> ============================================================================= >> >> >> First I checked hs_err, and that shows that the SEGV occurred. >> VM_Operation is GenCollectForAllocation at safepoint. 
>> >> ----------------------------------------------------------------------------- >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> #? SIGSEGV (0xb) at pc=0x00007f6080c97f88, pid=23931, >> tid=0x00007f607c3ed700 >> # >> # JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build >> 1.8.0_181-b13) >> # Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode >> linux-amd64 compressed oops) >> # Problematic frame: >> # V? [libjvm.so+0x84bf88] >> # >> # Core dump written. Default location: /opt/tomcate0/core or core.23931 >> # >> # If you would like to submit a bug report, please visit: >> #?? http://bugreport.java.com/bugreport/crash.jsp >> # >> >> ---------------? T H R E A D? --------------- >> >> Current thread (0x00007f6078c00000):? VMThread [stack: >> 0x00007f607c2ed000,0x00007f607c3ee000] [id=23939] >> >> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: >> 0x0000000000000018 >> >> Registers: >> RAX=0x0000000000000010, RBX=0x00007f5ff800ad30, >> RCX=0x0000000000000010, RDX=0x0000000000000000 >> RSP=0x00007f607c3ecb50, RBP=0x00007f607c3ecb80, >> RSI=0x0000000000000002, RDI=0x0000000001cfe570 >> R8 =0x00007f5ff80ae320, R9 =0x00007f5ff8052480, >> R10=0x0000000000000000, R11=0x0000000000000400 >> R12=0x0000000001cfe570, R13=0x00007f6081419470, >> R14=0x0000000000000002, R15=0x00007f6081418640 >> RIP=0x00007f6080c97f88, EFLAGS=0x0000000000010202, >> CSGSFS=0x0000000000000033, ERR=0x0000000000000004 >> ?? TRAPNO=0x000000000000000e >> >> Top of Stack: (sp=0x00007f607c3ecb50) >> 0x00007f607c3ecb50:?? 00007f607c3ecba0 00007f5ff800ad30 >> 0x00007f607c3ecb60:?? 00007f5ff800ad00 0000000000000000 >> 0x00007f607c3ecb70:?? 0000000000000000 0000000000000001 >> 0x00007f607c3ecb80:?? 00007f607c3ecba0 00007f6080c995fa >> 0x00007f607c3ecb90:?? 00007f5ff800ad00 00007f5ff800ac20 >> 0x00007f607c3ecba0:?? 00007f607c3ecbc0 00007f60808bff5e >> 0x00007f607c3ecbb0:?? 00007f5ff800ac20 00007f5ff8052870 >> 0x00007f607c3ecbc0:?? 
00007f607c3ecbe0 00007f60808c0f0f >> 0x00007f607c3ecbd0:?? 00007f607c3ecbf0 00007f608140f308 >> 0x00007f607c3ecbe0:?? 00007f607c3ecc30 00007f6080daa0b7 >> 0x00007f607c3ecbf0:?? 00007f6069000100 0000000000000000 >> 0x00007f607c3ecc00:?? 00007f607c3ecc20 00007f6080ed0800 >> 0x00007f607c3ecc10:?? 00000000000000f9 88e95c3ba257ab00 >> 0x00007f607c3ecc20:?? 431bde82d7b634db 00007f607800aa00 >> 0x00007f607c3ecc30:?? 00007f607c3eccc0 00007f6080daa9d5 >> 0x00007f607c3ecc40:?? 0000000000000000 00007f607803bf20 >> 0x00007f607c3ecc50:?? 00007f607803be20 00000000000003e8 >> 0x00007f607c3ecc60:?? 0000000000000001 00007f6078c00000 >> 0x00007f607c3ecc70:?? 00007f607c3eccc0 0000000000000000 >> 0x00007f607c3ecc80:?? 00000004000000f9 00007f60813e2b99 >> 0x00007f607c3ecc90:?? 00007f607803bfa0 00007f6078c00000 >> 0x00007f607c3ecca0:?? 0000000000000000 0000000000000000 >> 0x00007f607c3eccb0:?? 00007f6081418bd0 00007f607803bf20 >> 0x00007f607c3eccc0:?? 00007f607c3ece60 00007f6080f2048a >> 0x00007f607c3eccd0:?? 00007f607c3ecd20 00007f607c3ecce0 >> 0x00007f607c3ecce0:?? 00007f6078c00000 00007f6078c00980 >> 0x00007f607c3eccf0:?? 00007f6078c009c0 00007f6078c009d0 >> 0x00007f607c3ecd00:?? 00007f6078c00aa8 00000000000000d8 >> 0x00007f607c3ecd10:?? 00007f6078c00be0 0000000000000000 >> 0x00007f607c3ecd20:?? 00007f607c3ecd28 6e69747563657845 >> 0x00007f607c3ecd30:?? 65706f204d562067 203a6e6f69746172 >> 0x00007f607c3ecd40:?? 656c6c6f436e6547 6c6c41726f467463 >> >> Instructions: (pc=0x00007f6080c97f88) >> 0x00007f6080c97f68:?? b6 12 80 fa 00 74 01 f0 48 0f c1 01 31 c9 31 f6 >> 0x00007f6080c97f78:?? 48 8b 44 0b 10 31 d2 48 85 c0 74 11 0f 1f 40 00 >> 0x00007f6080c97f88:?? 48 8b 40 08 48 83 c2 01 48 85 c0 75 f3 48 83 c1 >> 0x00007f6080c97f98:?? 
08 48 01 d6 48 83 f9 20 75 d6 8b 7b 08 48 8b 05 >> >> Register to memory mapping: >> >> RAX=0x0000000000000010 is an unknown value >> RBX=0x00007f5ff800ad30 is an unknown value >> RCX=0x0000000000000010 is an unknown value >> RDX=0x0000000000000000 is an unknown value >> RSP=0x00007f607c3ecb50 is an unknown value >> RBP=0x00007f607c3ecb80 is an unknown value >> RSI=0x0000000000000002 is an unknown value >> RDI=0x0000000001cfe570 is an unknown value >> R8 =0x00007f5ff80ae320 is an unknown value >> R9 =0x00007f5ff8052480 is an unknown value >> R10=0x0000000000000000 is an unknown value >> R11=0x0000000000000400 is an unknown value >> R12=0x0000000001cfe570 is an unknown value >> R13=0x00007f6081419470: in >> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/amd64/server/libjvm.so >> at 0x00007f608044c000 >> R14=0x0000000000000002 is an unknown value >> R15=0x00007f6081418640: in >> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/amd64/server/libjvm.so >> at 0x00007f608044c000 >> >> >> Stack: [0x00007f607c2ed000,0x00007f607c3ee000], >> sp=0x00007f607c3ecb50, free space=1022k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, >> C=native code) >> V? [libjvm.so+0x84bf88] >> V? [libjvm.so+0x84d5fa] >> V? [libjvm.so+0x473f5e] >> V? [libjvm.so+0x474f0f] >> V? [libjvm.so+0x95e0b7] >> V? [libjvm.so+0x95e9d5] >> V? [libjvm.so+0xad448a] >> V? [libjvm.so+0xad48f1] >> V? [libjvm.so+0x8beb82] >> >> VM_Operation (0x00007f5fd69e6120): GenCollectForAllocation, mode: >> safepoint, requested by thread 0x00007f6079013800 >> >> ... >> ----------------------------------------------------------------------------- >> >> >> >> >> Next, I used GDB to check the backtrace of the SEGV thread from the >> coredump. >> The following is the backtrace. >> The SEGV occurred when ClassLoader is purged and Metaspace is >> destructed. >> And frame #7 shows that a signal(SEGV) handler is called after >> SpaceManager::~SpaceManager() is executed. 
>> >> ----------------------------------------------------------------------------- >> >> (gdb) bt >> #0? 0x00007f608146f1f7 in __GI_raise (sig=sig at entry=6) at >> ../nptl/sysdeps/unix/sysv/linux/raise.c:56 >> #1? 0x00007f60814708e8 in __GI_abort () at abort.c:90 >> #2? 0x00007f6080d0bc39 in os::abort (dump_core=) at >> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1519 >> #3? 0x00007f6080f1b816 in VMError::report_and_die >> (this=this at entry=0x7f607c3ebd10) at >> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1060 >> #4? 0x00007f6080d15927 in JVM_handle_linux_signal (sig=11, >> info=0x7f607c3ebfb0, ucVoid=0x7f607c3ebe80, >> abort_if_unrecognized=) >> ???? at >> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541 >> #5? 0x00007f6080d09038 in signalHandler (sig=11, info=0x7f607c3ebfb0, >> uc=0x7f607c3ebe80) at >> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:4446 >> #6? >> #7? SpaceManager::~SpaceManager (this=0x7f5ff800ad30, >> __in_chrg=) at >> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2028 >> #8? 0x00007f6080c995fa in Metaspace::~Metaspace (this=0x7f5ff800ad00, >> __in_chrg=) >> ???? at >> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2971 >> #9? 0x00007f60808bff5e in ClassLoaderData::~ClassLoaderData >> (this=0x7f5ff800ac20, __in_chrg=) >> ???? 
at
>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:383
>> #10 0x00007f60808c0f0f in ClassLoaderDataGraph::purge () at
>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:818
>> #11 0x00007f6080daa0b7 in ClassLoaderDataGraph::purge_if_needed () at
>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.hpp:104
>> #12 SafepointSynchronize::do_cleanup_tasks () at
>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/safepoint.cpp:551
>> #13 0x00007f6080daa9d5 in SafepointSynchronize::begin () at
>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/safepoint.cpp:402
>> #14 0x00007f6080f2048a in VMThread::loop
>> (this=this@entry=0x7f6078c00000) at
>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:501
>> #15 0x00007f6080f208f1 in VMThread::run (this=0x7f6078c00000) at
>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:276
>>
>> #16 0x00007f6080d0ab82 in java_start (thread=0x7f6078c00000) at
>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:796
>>
>> #17 0x00007f6081e2de25 in start_thread (arg=0x7f607c3ed700) at
>> pthread_create.c:308
>> #18 0x00007f608153234d in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
>> -----------------------------------------------------------------------------
>>
>>
>>
>> In Frame #7, Line 2028 (chunk = chunk->next()) is the crash point.
>> The variable "chunk" is defined at Line 2025 (Metachunk* chunk =
>> chunks_in_use(i);).
>> "chunks_in_use(i)" is defined at Line 648 (Metachunk*
>> chunks_in_use(ChunkIndex index) const { return _chunks_in_use[index];
>> }).
>> So I checked values of "_chunks_in_use", and understood that
>> "_chunks_in_use[2]" has Illegal Address "0x10".
>> Therefore, I think that the SEGV occurred because of referencing
>> Illegal Address "0x10" at "chunk = chunk->next()".
>>
>> -----------------------------------------------------------------------------
>>
>> (gdb) f 7
>> #7  SpaceManager::~SpaceManager (this=0x7f5ff800ad30,
>> __in_chrg=) at
>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2028
>> 2028        chunk = chunk->next();
>> (gdb) list
>> 2023    size_t SpaceManager::sum_count_in_chunks_in_use(ChunkIndex i) {
>> 2024      size_t count = 0;
>> 2025      Metachunk* chunk = chunks_in_use(i);
>> 2026      while (chunk != NULL) {
>> 2027        count++;
>> 2028        chunk = chunk->next();
>> 2029      }
>> 2030      return count;
>> 2031    }
>> 2032
>> (gdb) list SpaceManager::chunks_in_use
>> 647       // Accessors
>> 648       Metachunk* chunks_in_use(ChunkIndex index) const { return
>> _chunks_in_use[index]; }
>> ...
>> (gdb) p _chunks_in_use
>> $11 = {0x7f5fcd41c400, 0x7f5fcd41a000, 0x10, 0x0}
>> -----------------------------------------------------------------------------
>>
>>
>>
>>
>> The following is disassemble code of "SpaceManager::~SpaceManager()".
>> %rax has 0x10 at "0x00007f6080c97f88 <+200>", but I don't understand
>> why this "0x10" is inserted to %rax.
>>
>> -----------------------------------------------------------------------------
>>
>> (gdb) disas
>> Dump of assembler code for function SpaceManager::~SpaceManager():
>>    0x00007f6080c97ec0 <+0>:    push   %rbp
>>    0x00007f6080c97ec1 <+1>:    mov    %rsp,%rbp
>>    0x00007f6080c97ec4 <+4>:    push   %r15
>>    0x00007f6080c97ec6 <+6>:    push   %r14
>>    0x00007f6080c97ec8 <+8>:    push   %r13
>>
0x00007f6080c97eca <+10>:    push   %r12
>>    0x00007f6080c97ecc <+12>:    push   %rbx
>>    0x00007f6080c97ecd <+13>:    mov    %rdi,%rbx
>>    0x00007f6080c97ed0 <+16>:    sub    $0x8,%rsp
>>    0x00007f6080c97ed4 <+20>:    mov    0x780785(%rip),%r12        #
>> 0x7f6081418660 <_ZN12SpaceManager12_expand_lockE>
>>    0x00007f6080c97edb <+27>:    test   %r12,%r12
>>    0x00007f6080c97ede <+30>:    je     0x7f6080c97ee8
>>
>>    0x00007f6080c97ee0 <+32>:    mov    %r12,%rdi
>>    0x00007f6080c97ee3 <+35>:    callq  0x7f6080cce2f0
>>
>>    0x00007f6080c97ee8 <+40>:    movslq 0x8(%rbx),%rcx
>>    0x00007f6080c97eec <+44>:    lea    0x78075d(%rip),%rdx        #
>> 0x7f6081418650 <_ZN12MetaspaceAux15_capacity_wordsE>
>>    0x00007f6080c97ef3 <+51>:    lea    0x781576(%rip),%r13        #
>> 0x7f6081419470 <_ZN2os16_processor_countE>
>>    0x00007f6080c97efa <+58>:    lea    0x78073f(%rip),%r15        #
>> 0x7f6081418640 <_ZN12MetaspaceAux11_used_wordsE>
>>    0x00007f6080c97f01 <+65>:    mov    (%rdx,%rcx,8),%rax
>>    0x00007f6080c97f05 <+69>:    sub    0x40(%rbx),%rax
>>    0x00007f6080c97f09 <+73>:    mov    %rax,(%rdx,%rcx,8)
>>    0x00007f6080c97f0d <+77>:    mov    0x38(%rbx),%rax
>>    0x00007f6080c97f11 <+81>:    movslq 0x8(%rbx),%rdx
>>    0x00007f6080c97f15 <+85>:    neg    %rax
>>    0x00007f6080c97f18 <+88>:    cmpl   $0x1,0x0(%r13)
>>    0x00007f6080c97f1d <+93>:    lea    (%r15,%rdx,8),%rcx
>>    0x00007f6080c97f21 <+97>:    mov    $0x1,%edx
>>    0x00007f6080c97f26 <+102>:    jne    0x7f6080c97f32
>>
>>    0x00007f6080c97f28 <+104>:    lea    0x74acb4(%rip),%rdx        #
>> 0x7f60813e2be3
>>    0x00007f6080c97f2f <+111>:    movzbl (%rdx),%edx
>>    0x00007f6080c97f32 <+114>:    cmp    $0x0,%dl
>>    0x00007f6080c97f35 <+117>:    je     0x7f6080c97f38
>>
>>    0x00007f6080c97f37 <+119>:    lock xadd %rax,(%rcx)
>>    0x00007f6080c97f3c <+124>:    mov    0x48(%rbx),%r14
>>    0x00007f6080c97f40 <+128>:    callq  0x7f6080c951a0
>>
>>
0x00007f6080c97f45 <+133>:    movslq 0x8(%rbx),%rdx
>>    0x00007f6080c97f49 <+137>:    imul   %r14,%rax
>>    0x00007f6080c97f4d <+141>:    lea    (%r15,%rdx,8),%rcx
>>    0x00007f6080c97f51 <+145>:    mov    $0x1,%edx
>>    0x00007f6080c97f56 <+150>:    neg    %rax
>>    0x00007f6080c97f59 <+153>:    cmpl   $0x1,0x0(%r13)
>>    0x00007f6080c97f5e <+158>:    jne    0x7f6080c97f6a
>>
>>    0x00007f6080c97f60 <+160>:    lea    0x74ac7c(%rip),%rdx        #
>> 0x7f60813e2be3
>>    0x00007f6080c97f67 <+167>:    movzbl (%rdx),%edx
>>    0x00007f6080c97f6a <+170>:    cmp    $0x0,%dl
>>    0x00007f6080c97f6d <+173>:    je     0x7f6080c97f70
>>
>>    0x00007f6080c97f6f <+175>:    lock xadd %rax,(%rcx)
>>    0x00007f6080c97f74 <+180>:    xor    %ecx,%ecx
>>    0x00007f6080c97f76 <+182>:    xor    %esi,%esi
>>    0x00007f6080c97f78 <+184>:    mov    0x10(%rbx,%rcx,1),%rax
>>    0x00007f6080c97f7d <+189>:    xor    %edx,%edx
>>    0x00007f6080c97f7f <+191>:    test   %rax,%rax
>>    0x00007f6080c97f82 <+194>:    je     0x7f6080c97f95
>>
>>    0x00007f6080c97f84 <+196>:    nopl   0x0(%rax)
>> => 0x00007f6080c97f88 <+200>:    mov    0x8(%rax),%rax
>>    0x00007f6080c97f8c <+204>:    add    $0x1,%rdx
>>    0x00007f6080c97f90 <+208>:    test   %rax,%rax
>> ...
>> (gdb) info registers
>> rax            0x10    16
>> rbx            0x7f5ff800ad30    140050159414576
>> rcx            0x10    16
>> rdx            0x0    0
>> rsi            0x2    2
>> rdi            0x1cfe570    30401904
>> rbp            0x7f607c3ecb80    0x7f607c3ecb80
>> rsp            0x7f607c3ecb50    0x7f607c3ecb50
>> r8             0x7f5ff80ae320    140050160083744
>> r9             0x7f5ff8052480    140050159707264
>> r10            0x0    0
>> r11            0x400    1024
>> r12            0x1cfe570    30401904
>> r13            0x7f6081419470    140052462146672
>> r14            0x2    2
>> r15            0x7f6081418640    140052462143040
>> rip            0x7f6080c97f88
0x7f6080c97f88
>>
>> eflags         0x206    [ PF IF ]
>> cs             0x33    51
>> ss             0x2b    43
>> ds             0x0    0
>> es             0x0    0
>> fs             0x0    0
>> gs             0x0    0
>> k0
>> k1
>> k2
>> k3
>> k4
>> k5
>> k6
>> k7
>> -----------------------------------------------------------------------------
>>
>>
>> =============================================================================
>>
>>
>>
>>
>> Does anyone know about this case?
>>
>> Thanks, Osamu
>>
>>

From per.liden at oracle.com Wed Oct 23 10:38:09 2019
From: per.liden at oracle.com (Per Liden)
Date: Wed, 23 Oct 2019 12:38:09 +0200
Subject: RFR: 8231552: ZGC: Refine address space reservation
In-Reply-To: <2b79829d-f577-819d-9577-91351c03fddb@oracle.com>
References: <5015ca7b-3e3e-b2bd-c3f8-0a83ecdb41d8@oracle.com> <2b79829d-f577-819d-9577-91351c03fddb@oracle.com>
Message-ID:

Another update after Stefan found an incorrect comparison:

Full: http://cr.openjdk.java.net/~pliden/8231552/webrev.4
Diff: http://cr.openjdk.java.net/~pliden/8231552/webrev.4-diff

/Per

On 10/22/19 2:01 PM, Per Liden wrote:
> Updated webrev after off-line comments from Stefan and Erik.
>
> Full: http://cr.openjdk.java.net/~pliden/8231552/webrev.3
> Diff: http://cr.openjdk.java.net/~pliden/8231552/webrev.3-diff
>
> /Per
>
> On 10/16/19 10:41 AM, Per Liden wrote:
>> Latest version of this patch, rebased on today's jdk/jdk:
>>
>> http://cr.openjdk.java.net/~pliden/8231552/webrev.2
>>
>> /Per
>>
>> On 10/3/19 11:45 AM, Per Liden wrote:
>>> We could be slightly more sophisticated and do a better job reserving
>>> address space in situations where parts of the address space is
>>> already occupied or when the process is running with address space
>>> limitations.
>>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231552 >>> Webrev: http://cr.openjdk.java.net/~pliden/8231552/webrev.0 >>> >>> /Per From shade at redhat.com Wed Oct 23 10:56:54 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 23 Oct 2019 12:56:54 +0200 Subject: RFR (S) 8222766: Shenandoah: streamline post-LRB CAS barrier (x86) Message-ID: RFE: https://bugs.openjdk.java.net/browse/JDK-8222766 Fix: https://cr.openjdk.java.net/~shade/8222766/webrev.07/ I hope the comments in the new code are self-explanatory. This rewrite allows us to ditch resolve_fwd_ptr and its awkward borrowing scheme. Since it is removing two of three fwdptr resolves, it also considerably improves the generated code quality for CAS -- which is measurable on microbenchmarks. The AArch64 counterpart comes later in JDK-8232782. Compare: https://cr.openjdk.java.net/~shade/8222766/shenandoah-cas-before.perfasm https://cr.openjdk.java.net/~shade/8222766/shenandoah-cas-after.perfasm Testing: {x86_32, x86_64} hotspot_gc_shenandoah; jcstress runs -- Thanks, -Aleksey From rkennke at redhat.com Wed Oct 23 10:59:50 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 23 Oct 2019 12:59:50 +0200 Subject: RFR (S) 8222766: Shenandoah: streamline post-LRB CAS barrier (x86) In-Reply-To: References: Message-ID: <0794092a-76f4-fee3-c537-aa6533701a1c@redhat.com> > RFE: > https://bugs.openjdk.java.net/browse/JDK-8222766 > > Fix: > https://cr.openjdk.java.net/~shade/8222766/webrev.07/ > > I hope the comments in the new code are self-explanatory. This rewrite allows us to ditch > resolve_fwd_ptr and its awkward borrowing scheme. Since it is removing two of three fwdptr resolves, > it also considerably improves the generated code quality for CAS -- which is measurable on > microbenchmarks. > > The AArch64 counterpart comes later in JDK-8232782. 
> > Compare: > https://cr.openjdk.java.net/~shade/8222766/shenandoah-cas-before.perfasm > https://cr.openjdk.java.net/~shade/8222766/shenandoah-cas-after.perfasm > > Testing: {x86_32, x86_64} hotspot_gc_shenandoah; jcstress runs Looks good to me! Thanks, Roman From erik.osterlund at oracle.com Wed Oct 23 13:01:04 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Wed, 23 Oct 2019 15:01:04 +0200 Subject: RFR: 8231552: ZGC: Refine address space reservation In-Reply-To: References: <5015ca7b-3e3e-b2bd-c3f8-0a83ecdb41d8@oracle.com> <2b79829d-f577-819d-9577-91351c03fddb@oracle.com> Message-ID: Hi Per, Looks good. Thanks, /Erik On 10/23/19 12:38 PM, Per Liden wrote: > Another update after Stefan found an incorrect comparison: > > Full: http://cr.openjdk.java.net/~pliden/8231552/webrev.4 > Diff: http://cr.openjdk.java.net/~pliden/8231552/webrev.4-diff > > /Per > > On 10/22/19 2:01 PM, Per Liden wrote: >> Updated webrev after off-line comments from Stefan and Erik. >> >> Full: http://cr.openjdk.java.net/~pliden/8231552/webrev.3 >> Diff: http://cr.openjdk.java.net/~pliden/8231552/webrev.3-diff >> >> /Per >> >> On 10/16/19 10:41 AM, Per Liden wrote: >>> Latest version of this patch, rebased on today's jdk/jdk: >>> >>> http://cr.openjdk.java.net/~pliden/8231552/webrev.2 >>> >>> /Per >>> >>> On 10/3/19 11:45 AM, Per Liden wrote: >>>> We could be slightly more sophisticated and do a better job >>>> reserving address space in situations where parts of the address >>>> space is already occupied or when the process is running with >>>> address space limitations. 
>>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231552 >>>> Webrev: http://cr.openjdk.java.net/~pliden/8231552/webrev.0 >>>> >>>> /Per From stefan.karlsson at oracle.com Wed Oct 23 13:06:38 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 23 Oct 2019 15:06:38 +0200 Subject: RFC: JEP: ZGC on Windows Message-ID: <9f10be76-8f7f-662f-d4ac-dd96acc1f507@oracle.com> Hi all, ZGC is currently available on Linux/x64 and Linux/AArch64. There's Candidate JEP to add macOS support [1]. We would also like to add support for ZGC on Windows. I've prepared a JEP draft [2] for that work. Most of the ZGC code base is platform independent and requires no Windows-specific changes. The existing load barrier support for x64 is OS agnostic and can also be used on Windows. The platform specific code that needs to be ported relates to how address space is reserved and how physical memory is mapped into a reserved address space. Please see the details in the JEP for more information. Feedback is welcome! Thanks, StefanK [1] https://openjdk.java.net/jeps/364 [2] https://openjdk.java.net/jeps/8232364 From per.liden at oracle.com Wed Oct 23 13:28:30 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 23 Oct 2019 15:28:30 +0200 Subject: RFR: 8231552: ZGC: Refine address space reservation In-Reply-To: References: <5015ca7b-3e3e-b2bd-c3f8-0a83ecdb41d8@oracle.com> <2b79829d-f577-819d-9577-91351c03fddb@oracle.com> Message-ID: <8a9929f3-cad7-4f14-6315-dec78135d0bd@oracle.com> Thanks Erik! /Per On 2019-10-23 15:01, erik.osterlund at oracle.com wrote: > Hi Per, > > Looks good. > > Thanks, > /Erik > > On 10/23/19 12:38 PM, Per Liden wrote: >> Another update after Stefan found an incorrect comparison: >> >> Full: http://cr.openjdk.java.net/~pliden/8231552/webrev.4 >> Diff: http://cr.openjdk.java.net/~pliden/8231552/webrev.4-diff >> >> /Per >> >> On 10/22/19 2:01 PM, Per Liden wrote: >>> Updated webrev after off-line comments from Stefan and Erik. 
>>> >>> Full: http://cr.openjdk.java.net/~pliden/8231552/webrev.3 >>> Diff: http://cr.openjdk.java.net/~pliden/8231552/webrev.3-diff >>> >>> /Per >>> >>> On 10/16/19 10:41 AM, Per Liden wrote: >>>> Latest version of this patch, rebased on today's jdk/jdk: >>>> >>>> http://cr.openjdk.java.net/~pliden/8231552/webrev.2 >>>> >>>> /Per >>>> >>>> On 10/3/19 11:45 AM, Per Liden wrote: >>>>> We could be slightly more sophisticated and do a better job >>>>> reserving address space in situations where parts of the address >>>>> space is already occupied or when the process is running with >>>>> address space limitations. >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231552 >>>>> Webrev: http://cr.openjdk.java.net/~pliden/8231552/webrev.0 >>>>> >>>>> /Per > From rkennke at redhat.com Wed Oct 23 14:29:29 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 23 Oct 2019 16:29:29 +0200 Subject: [11u] RFR: 8231085: C2/GC: Better GC-interface for expanding clone Message-ID: <582a0140-7b8e-cdd6-d0d3-d58c0964ccb0@redhat.com> I would like to backport the recent GC interface for expanding clones to jdk11u. This is a prerequisite to backport related Shenandoah changes to 11u without making a mess. The change differs from the original jdk14 change because it basically skips the intermediate GC interface for the same thing that's been introduced in jdk12. This one wholly replaces that. Bug: https://bugs.openjdk.java.net/browse/JDK-8231085 Original webrev: http://cr.openjdk.java.net/~rkennke/JDK-8231085/webrev.00/ JDK11u webrev: http://cr.openjdk.java.net/~rkennke/JDK-8231085/webrev.00.jdk11u/ Testing: tier1 and tier2 no regressions Good? 
Roman

From leihouyju at gmail.com Wed Oct 23 15:15:52 2019
From: leihouyju at gmail.com (Haoyu Li)
Date: Wed, 23 Oct 2019 23:15:52 +0800
Subject: [PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance
In-Reply-To:
References: <4F02DD53-EA98-4A1A-B871-C6E9D9610B2C@oracle.com> <9B69AFD1-7AE2-4B50-BFCF-C9C6E2594240@oracle.com> <8ef5b52e-d6fc-3073-5ca7-44c87c1eb981@oracle.com> <92277aab-0578-9e2c-3f4f-55ae1e8c94a9@oracle.com> <400df998-171a-5bbe-9f3e-01af1781afb4@oracle.com>
Message-ID:

Hi Stefan,

Thanks for your constructive feedback. I've addressed all the issues you mentioned, and the updated patch is attached to this email.

While refining the patch, I came up with a couple of questions:

1) Now the MoveAndUpdateClosure and ShadowClosure assume the destination address is the very beginning of a region, instead of an arbitrary address as it used to be. However, there is an unused function named PSParallelCompact::move_and_update() that uses the MoveAndUpdateClosure to process a region from its middle, which conflicts with the assumption. I notice that you removed this function in your patch, and so did I in the updated patch. Does it matter?

2) Using the same do_addr() in MoveAndUpdateClosure and ShadowClosure is doable, but it does not reuse all the code neatly, because storing the address of the shadow region in _destination requires extra virtual functions to handle allocating blocks in the start_array and setting addresses of deferred objects. In particular, allocate_blocks() and set_deferred_object_for() are added in both closures. Is it worth avoiding the use of _offset to calculate the shadow_destination?

If there are any problems with this patch, please contact me anytime. I'm more than happy to keep improving the code. Thanks again for reviewing.

Best,
Haoyu Li

Stefan Johansson wrote on Tue, Oct 22, 2019 at 9:42:
> Hi Haoyu,
>
> I've reviewed the patch now and have some comments and questions.
> > To simplify the review and have a common base to look at I've created a > webrev at: > http://cr.openjdk.java.net/~sjohanss/8220465/00/ > > One general note first, most of the new code uses four space > indentation, in hotspot the standard is two spaces, please change this. > Below are some file by file comments. > > src/hotspot/share/gc/parallel/psCompactionManager.cpp > --- > 53 GrowableArray* ParCompactionManager::_free_shadow = new > (ResourceObj::C_HEAP, mtInternal) GrowableArray(10, true); > 54 Monitor* ParCompactionManager::_monitor = NULL; > > Set _free_shadow to NULL here like the other statics and then create the > GrowableArray in initialize(). I also think _shadow_region_array or > something like that would be a better name and the monitor should also > be named something that signals that it is used for this array. > --- > 70 if (_monitor == NULL) { > 71 _monitor = new Monitor(Mutex::barrier, "CompactionManager > monitor", > 72 Mutex::_allow_vm_block_flag, > Monitor::_safepoint_check_never); > 73 } > > Instead of doing the monitor creation here having to check for NULL, do > it in initialize() below together with the array creation. > --- > > src/hotspot/share/gc/parallel/psParallelCompact.cpp > --- > 2974 if (cur->push()) { > > Correct me if I'm wrong, if this call to push() returns true it means > that nobody else has "stolen" it (used a shadow region to prepare it) > and we mark it as pushed. But when pushed in this code path this is the > end state for this RegionData? If this is the case I think it would be > easier to understand the code if we added another function and state for > when we "steal" it. 
Haven't thought very much about the names but I
> think you understand what I want to achieve:
> Normal path:
> UNUSED -> push() -> NORMAL
> Steal path:
> UNUSED -> steal() -> STOLEN -> fill() -> FILLED -> copy() -> SHADOW
>
> We could then also assert in set_completed() that the state is either
> NORMAL or SHADOW (or if they have a shared end state DONE). As I said
> the names can be improved (both for the states and the functions) but I
> think we should have names and not just numbers.
> ---
>
> 3060 template
> 3061 void PSParallelCompact::fill_region(ParCompactionManager* cm,
> size_t region_idx, size_t shadow, size_t offset)
>
> As I told you this was a big improvement from the first patch, but I
> think there is room for even more improvements around the way we pass in
> ignored parameters to MoveAndUpdateClosure. Explaining my idea in text
> is harder than code, so I created a patch, what do you think about this?
> http://cr.openjdk.java.net/~sjohanss/8220465/00-alt/
>
> This alternative is based on 00 and does not take my other comments into
> consideration. So it might have to be altered a bit if you address some
> of my other comments/questions.
> ---
>
> 3196 void PSParallelCompact::copy_back(HeapWord *region_addr, HeapWord
> *shadow_addr) {
>
> I think the parameters should change places, so that they correspond with
> the copy below.
> ---
>
> 3200 bool PSParallelCompact::steal_shadow_region(ParCompactionManager*
> cm, size_t &region_idx) {
> 3201 size_t& record = cm->shadow_record();
>
> Did you consider to just let shadow_record() be a simple getter instead
> of getting a reference and then have a next_shadow_record() which
> advances it by active_workers?
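
The getter/advance split asked about for shadow_record() could look roughly like the sketch below. This is a minimal standalone illustration, not code from the patch: the class CompactionManagerSketch, its constructor, and the helper visited_regions() are hypothetical, and only the names shadow_record(), next_shadow_record() and active_workers come from the review comment. It assumes each worker starts at its own region index and advances by the number of active workers, so that no two workers probe the same region.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ParCompactionManager; only the two accessor
// names mirror the review comment.
class CompactionManagerSketch {
 public:
  CompactionManagerSketch(size_t first_record, size_t active_workers)
      : _shadow_record(first_record), _active_workers(active_workers) {}

  // Simple getter: the region index this worker should try next.
  size_t shadow_record() const { return _shadow_record; }

  // Advance to this worker's next candidate region and return it.
  size_t next_shadow_record() {
    _shadow_record += _active_workers;
    return _shadow_record;
  }

 private:
  size_t _shadow_record;
  size_t _active_workers;
};

// Hypothetical helper: collect the first `count` region indices that worker
// `which` would visit when starting at `base + which`.
std::vector<size_t> visited_regions(size_t base, size_t which,
                                    size_t active_workers, size_t count) {
  CompactionManagerSketch cm(base + which, active_workers);
  std::vector<size_t> out;
  for (size_t i = 0; i < count; i++) {
    out.push_back(cm.shadow_record());
    cm.next_shadow_record();
  }
  return out;
}
```

With four active workers and a base index of 100, worker 1 would visit regions 101, 105, 109, and so on, while worker 0 visits 100, 104, 108; the strides never overlap.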
> --- > > 3236 void PSParallelCompact::initialize_steal_record(uint which) { > > I'm having a hard time understanding the details here, or I get that all > threads should have a separate shadow record, but I'm not sure why it is > not enough to just do: > size_t record = _summary_data.addr_to_region_idx( > _space_info[old_space_id].dense_prefix()); > cm->set_shadow_record(record + which); > > As you can see I'm also suggesting adding a setter for shadow_record. > --- > > 3434 ParMarkBitMapClosure::IterationStatus > 3435 ShadowClosure::do_addr(HeapWord* addr, size_t words) { > 3436 HeapWord* shadow_destination = destination() + _offset; > > Using an offset instead of a given address feels a bit backwards, did > you consider letting the closure keep and update a _shadow_destination > instead? Or would it even be possible to just set destination to be the > shadow region address? In that case it should be possible to just use > the do_addr and other functions from the MoveAndUpdateClosure. > > I see from looking at this particular function that there is one assert > that would have to change: > 3408 > assert(PSParallelCompact::summary_data().calc_new_pointer(source(), > compaction_manager()) == > 3409 destination(), "wrong destination"); > > This should be easily fixed by adding a virtual function > check_destination, that has a special implementation for the ShadowClosure. > --- > > src/hotspot/share/gc/parallel/psParallelCompact.hpp > --- > 333 // Preempt the region to avoid double processes > 334 inline bool push(); > 335 // Mark the region as filled and ready to be copied back > 336 inline bool fill(); > 337 // Preempt the region to copy the shadow region content back > 338 inline bool copy(); > > As mentioned, I think there might be better names for those functions > and the comments. Maybe adding a prefix would make the code more self > explaining. try_push(), mark_filled(), try_copy() and the new try_steal(). 
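
The named states and prefixed transition functions suggested in the review can be sketched as a small state machine. This is a minimal standalone model under stated assumptions, not the HotSpot implementation: the class RegionStateSketch and the helpers transition(), state() and is_completed_state() are hypothetical, std::atomic stands in for whatever atomic primitive the patch uses, and the state names are the placeholder names from the review, which the reviewer says can still be improved.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of a region's shadow state. State and function names
// follow the review's suggestions; the real RegionData may differ.
class RegionStateSketch {
 public:
  enum State { UNUSED, NORMAL, STOLEN, FILLED, SHADOW };

  // Normal path: UNUSED -> NORMAL. Fails if the region was already claimed.
  bool try_push() { return transition(UNUSED, NORMAL); }

  // Steal path, step 1: UNUSED -> STOLEN (a shadow region will be filled).
  bool try_steal() { return transition(UNUSED, STOLEN); }

  // Steal path, step 2: STOLEN -> FILLED (shadow region holds the data).
  bool mark_filled() { return transition(STOLEN, FILLED); }

  // Steal path, step 3: FILLED -> SHADOW (caller copies the shadow back).
  bool try_copy() { return transition(FILLED, SHADOW); }

  // The two end states that a set_completed() assert could accept.
  bool is_completed_state() const {
    State s = _state.load();
    return s == NORMAL || s == SHADOW;
  }

  State state() const { return _state.load(); }

 private:
  // Atomically move from `from` to `to`; returns false if the current
  // state is anything other than `from`.
  bool transition(State from, State to) {
    State expected = from;
    return _state.compare_exchange_strong(expected, to);
  }

  std::atomic<State> _state{UNUSED};
};
```

Because every transition is a compare-and-exchange from one specific state, a region that has been pushed can no longer be stolen and vice versa, which is the mutual exclusion the numeric states in the patch are meant to provide.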
> --- > > Thanks again for providing this patch, I look forward to see an updated > version. > > Cheers, > Stefan > > > On 2019-10-14 15:00, Stefan Johansson wrote: > > Thanks for the quick update Haoyu, > > > > This is a great improvement and I will try to find time to look into the > > patch in more detail the coming weeks. > > > > Thanks, > > Stefan > > > > On 2019-10-11 14:49, Haoyu Li wrote: > >> Hi Stefan, > >> > >> Thanks for your suggestion! It is very redundant that > >> PSParallelCompact::fill_shadow_region() copies most code from > >> PSParallelCompact::fill_region(), and therefore I've refactored these > >> two functions to share code as many as possible. And the attachment is > >> the updated patch. > >> > >> Specifically, the closure, which moves objects, in > >> PSParallelCompact::fill_region() is now declared as a template of > >> either MoveAndUpdateClosure or ShadowClosure. So by controlling the > >> type of closure when invoking the function, we can decide whether to > >> fill a normal region or a shadow one. Thus, almost all code in > >> PSParallelCompact::fill_region() can be reused. > >> > >> Besides, a virtual function named complete_region() is added in both > >> closures to do some work after the filling, such setting states and > >> copying the shadow region back. > >> > >> Thanks again for reviewing the patch, looking forward to your insights > >> and suggestions! > >> > >> Best Regards, > >> Haoyu Li > >> > >> 2019-10-10 21:50 GMT+08:00, Stefan Johansson > >> : > >>> Thanks for the clarification =) > >>> > >>> Moving on to the next part, the code in the patch. So this won't be a > >>> full review of the patch but just an initial comment that I would like > >>> to be addressed first. > >>> > >>> The new function PSParallelCompact::fill_shadow_region() is more or > less > >>> a copy of PSParallelCompact::fill_region() and I understand that from a > >>> proof of concept point of view it was the easy (and right) way to do > it. 
> >>> I would prefer if the code could be refactored so that fill_region() > and > >>> fill_shadow_region() share more code. There might be reasons that I've > >>> missed, that prevents it, but we should at least explore how much code > >>> can be shared. > >>> > >>> Thanks, > >>> Stefan > >>> > >>> On 2019-10-10 15:10, Haoyu Li wrote: > >>>> Hi Stefan, > >>>> > >>>> Thanks for your quick response! As to your concern about the OCA, I am > >>>> the sole author of the patch. And it is the case as what the agreement > >>>> states. > >>>> Best Regrads, > >>>> Haoyu Li, > >>>> > >>>> > >>>> Stefan Johansson >>>> > ?2019?10?10??? ??8:37 > >>>> ??? > >>>> > >>>> Hi, > >>>> > >>>> On 2019-10-10 13:06, Haoyu Li wrote: > >>>> > Hi Stefan, > >>>> > > >>>> > Thanks for your testing! One possible reason for the > >>>> regressions > >>>> in > >>>> > simple tests is that the region dependencies maybe not heavy > >>>> enough. > >>>> > Because the locality of shadow regions is lower than that of > >>>> heap > >>>> > regions, writing to shadow regions will be slower than to > >>>> normal > >>>> > regions, and this is a part of the reason why I reuse shadow > >>>> regions. > >>>> > Therefore, if only a few shadow regions are created and not > >>>> reused, the > >>>> > overhead may not be amortized. > >>>> > >>>> I guess it is something like this. I thought that for "easy" > heaps > >>>> the > >>>> shadow regions won't be used at all, and should therefor not > >>>> really > >>>> cost > >>>> anything. > >>>> > >>>> > > >>>> > As to the OCA, it is the case that I'm the only person > >>>> signing the > >>>> > agreement. Please let me know if you have any further > >>>> questions. > >>>> Thanks > >>>> > again! > >>>> > >>>> Ok, so you are the sole author of the patch. 
The important > >>>> part, as > >>>> the > >>>> agreement states, is: > >>>> "no other person or entity, including my employer, has or will > >>>> have > >>>> rights with respect my contributions" > >>>> > >>>> Is that the case? > >>>> > >>>> Thanks, > >>>> Stefan > >>>> > >>>> > > >>>> > Best Regrads, > >>>> > Haoyu Li > >>>> > > >>>> > Stefan Johansson >>>> > >>>> > >>>> >> ?2019?10?8??? ?? > >>>> 6:49 > >>>> ??? > >>>> > > >>>> > Hi Haoyu, > >>>> > > >>>> > I've done some more testing and I haven't seen any issues > >>>> with the > >>>> > patch > >>>> > so far and the performance looks promising in most > >>>> cases. For > >>>> simple > >>>> > tests I've seen some regressions, but I'm not really sure > >>>> why. Will do > >>>> > some more digging. > >>>> > > >>>> > To move forward with this the first thing we need to do is > >>>> making sure > >>>> > that you being covered by the Oracle Contributor > >>>> Agreement is > >>>> enough. > >>>> > From what we can see it is only you as an individual > that > >>>> has signed > >>>> > the OCA and in that case it is important that this > >>>> statement > >>>> from the > >>>> > OCA is fulfilled: "no other person or entity, including my > >>>> employer, > >>>> > has > >>>> > or will have rights with respect my contributions" > >>>> > > >>>> > Is this the case for this contribution or should we have > >>>> the > >>>> university > >>>> > sign the OCA as well? For more information regarding the > >>>> OCA > >>>> please > >>>> > refer to: > >>>> > https://www.oracle.com/technetwork/oca-faq-405384.pdf > >>>> > > >>>> > Thanks, > >>>> > Stefan > >>>> > > >>>> > On 2019-09-16 16:02, Haoyu Li wrote: > >>>> > > FYI, the evaluation results on OpenJDK 14 are plotted > in > >>>> the > >>>> > attachment. 
> >>>> > > I compute the full GC throughput by dividing the heap > >>>> size > >>>> before > >>>> > full > >>>> > > GC by the GC pause time, and the results are arithmetic > >>>> mean > >>>> > values of > >>>> > > ten runs after a warm-up run. The evaluation is > >>>> conducted on > >>>> a > >>>> > machine > >>>> > > with dual Intel ?XeonTM E5-2618L v3 CPUs (2 sockets, 16 > >>>> physical > >>>> > cores > >>>> > > with SMT enabled) and 64G DRAM. > >>>> > > > >>>> > > Best Regrads, > >>>> > > Haoyu Li, > >>>> > > Institute of Parallel and Distributed Systems(IPADS), > >>>> > > School of Software, > >>>> > > Shanghai Jiao Tong University > >>>> > > > >>>> > > > >>>> > > Stefan Johansson >>>> > >>>> > >>>> > > >>>> > > >>>> > >>>> > >>>> >>> ?2019?9?12??? ? > >>>> ?5:34 > >>>> > ??? > >>>> > > > >>>> > > Hi Haoyu, > >>>> > > > >>>> > > I recently came across your patch and I would > >>>> like to > >>>> pick up on > >>>> > > some of the things Kim mentioned in his mails. I > >>>> especially want > >>>> > > evaluate and investigate if this is a technique > >>>> we can > >>>> use to > >>>> > > improve the other GCs as well. To start that work I > >>>> want to > >>>> > take the > >>>> > > patch for a spin in our internal performance > >>>> testing. > >>>> The patch > >>>> > > doesn?t apply clean to the latest JDK repository, > so > >>>> if you could > >>>> > > provide an updated patch that would be very > helpful. > >>>> > > > >>>> > > It would also be great if you could share some more > >>>> information > >>>> > > around the results presented in the paper. For > >>>> example, > >>>> it > >>>> > would be > >>>> > > good to get the full command lines for the > different > >>>> > benchmarks so > >>>> > > we can run them locally and reproduce the > >>>> results you?ve seen. > >>>> > > > >>>> > > Thanks, > >>>> > > Stefan > >>>> > > > >>>> > >> 12 mars 2019 kl. 
03:21 skrev Haoyu Li:
> >>>> > >>
> >>>> > >> Hi Kim,
> >>>> > >>
> >>>> > >> Thanks for reviewing and testing the patch. If there are any
> >>>> > >> failures or performance degradation relevant to the work, please
> >>>> > >> let me know and I'll be very happy to keep improving it. Also, any
> >>>> > >> suggestions about code improvements are much appreciated.
> >>>> > >>
> >>>> > >> I'm not quite sure whether G1 and Shenandoah have a similar
> >>>> > >> region dependency issue, since I haven't studied their GC
> >>>> > >> behaviors before. If they do, I'm also willing to propose a more
> >>>> > >> general optimization.
> >>>> > >>
> >>>> > >> As to the memory overhead, I believe it will be low because this
> >>>> > >> patch exploits empty regions in the young space rather than
> >>>> > >> off-heap memory to allocate shadow regions, and also reuses the
> >>>> > >> /_source_region/ field of each /RegionData/ to record the
> >>>> > >> corresponding shadow region index. We only introduce a new
> >>>> > >> integer field /_shadow/ in the RegionData class to indicate the
> >>>> > >> status of a region, a global /GrowableArray _free_shadow/ to store
> >>>> > >> the indices of shadow regions, and a global /Monitor/ to protect
> >>>> > >> the array. This information might help if the memory overhead
> >>>> > >> needs to be evaluated.
> >>>> > >>
> >>>> > >> Looking forward to your insight.
> >>>> > >>
> >>>> > >> Best Regards,
> >>>> > >> Haoyu Li,
> >>>> > >> Institute of Parallel and Distributed Systems (IPADS),
> >>>> > >> School of Software,
> >>>> > >> Shanghai Jiao Tong University
> >>>> > >>
> >>>> > >>
> >>>> > >> Kim Barrett wrote on Mar 12, 2019, at 6:11:
> >>>> > >>
> >>>> > >> > On Mar 11, 2019, at 1:45 AM, Kim Barrett wrote:
> >>>> > >> >
> >>>> > >> >> On Jan 24, 2019, at 3:58 AM, Haoyu Li wrote:
> >>>> > >> >>
> >>>> > >> >> Hi Kim,
> >>>> > >> >>
> >>>> > >> >> I have ported my patch to OpenJDK 13 according to your
> >>>> > >> >> instructions in your last mail, and the patch is attached in
> >>>> > >> >> this mail. The patch does not change much since PSGC is indeed
> >>>> > >> >> pretty stable.
> >>>> > >> >>
> >>>> > >> >> Also, I evaluated the correctness and performance of PS full
> >>>> > >> >> GC with benchmarks from the DaCapo, SPECjvm2008, and JOlden
> >>>> > >> >> suites on a machine with dual Intel Xeon E5-2618L v3 CPUs (16
> >>>> > >> >> physical cores), 64G DRAM and Linux kernel 4.17. The evaluation
> >>>> > >> >> result, indicating 1.9X GC throughput improvement on average,
> >>>> > >> >> is attached, too.
> >>>> > >> >>
> >>>> > >> >> However, I have no idea how to further test this patch for
> >>>> > >> >> both correctness and performance. Can I please get any
> >>>> > >> >> guidance from you or some sponsor?
> >>>> > >> >
> >>>> > >> > Sorry I missed that you had sent an updated version of the
> >>>> > >> > patch.
> >>>> > >> >
> >>>> > >> > I've run the full regression suite across Oracle-supported
> >>>> > >> > platforms. There are some failures, but there are almost always
> >>>> > >> > some failures in the later tiers right now. I'll start looking
> >>>> > >> > at them tomorrow to figure out whether any of them are relevant.
> >>>> > >> >
> >>>> > >> > I'm also planning to run some of our performance benchmarks.
> >>>> > >> >
> >>>> > >> > I've lightly skimmed the proposed changes.
> >>>> > >> > There might be some code improvements to be made.
> >>>> > >> >
> >>>> > >> > I'm also wondering if this technique applies to other
> >>>> > >> > collectors. It seems like both G1 and Shenandoah full gc's
> >>>> > >> > might have similar issues? If so, a solution that is
> >>>> > >> > ParallelGC-specific is less interesting than one that has
> >>>> > >> > broader applicability. Though maybe this optimization is less
> >>>> > >> > important for G1 and Shenandoah, since they actively try to
> >>>> > >> > avoid full gc's.
> >>>> > >> >
> >>>> > >> > I'm also not clear on how much additional memory might be
> >>>> > >> > temporarily allocated by this mechanism.
> >>>> > >>
> >>>> > >> I've created a CR for this:
> >>>> > >> https://bugs.openjdk.java.net/browse/JDK-8220465
> >>>> > >>
> >>>> > >
> >>>> >
> >>>>
> >>>
> >>
> >>
>

From sangheon.kim at oracle.com Wed Oct 23 16:20:47 2019
From: sangheon.kim at oracle.com (sangheon.kim at oracle.com)
Date: Wed, 23 Oct 2019 09:20:47 -0700
Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)
In-Reply-To:
References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com>
	<2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com>
	<74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com>
	<4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com>
	<77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com>
	<7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com>
	<3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com>
	<9219a118-0c1d-2cee-10e5-f9bb87c72eb9@oracle.com>
Message-ID: <521b3b8a-70e6-6fef-cb67-b6327fa08c03@oracle.com>

Hi Per,

Thanks for taking a look at this.
I agree with all your comments and here's the webrev.

- All comments from Per.
- Move G1PageBasedVirtualSpace::page_size() near to page_start() from Kim.
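
Per's comments, quoted further down in this message, concern the address-to-node lookup that the patch adds to the os class. As a rough standalone illustration (not the HotSpot code itself), the following Linux-only sketch asks the kernel which NUMA node backs a given address through the get_mempolicy system call, and returns -1 on failure as suggested in the review. The helper name numa_node_of_address is hypothetical; the MPOL_F_NODE and MPOL_F_ADDR values are the ones that appear in the quoted patch.

```cpp
#include <cassert>
#include <sys/syscall.h>
#include <unistd.h>

// Flag values as they appear in the patch under review; defined here only
// if no system header has provided them already.
#ifndef MPOL_F_NODE
#define MPOL_F_NODE (1 << 0)  // Return a node id instead of a node mask
#endif
#ifndef MPOL_F_ADDR
#define MPOL_F_ADDR (1 << 1)  // Look up the policy via the given address
#endif

// Hypothetical helper (not a HotSpot function): returns the id of the NUMA
// node backing 'address', or -1 if the kernel cannot or will not answer.
// The caller decides whether and how to report a failure.
static int numa_node_of_address(void* address) {
  assert(address != nullptr);
  int id = 0;
  // No node mask is requested, so nodemask is null and maxnode is 0.
  long rc = syscall(SYS_get_mempolicy, &id, nullptr, 0UL, address,
                    static_cast<unsigned long>(MPOL_F_NODE | MPOL_F_ADDR));
  return (rc == -1) ? -1 : id;
}
```

A caller would pass any mapped address, for example the address of a local variable, and treat -1 as "node unknown". In restricted environments, such as a container with a system call filter, the call may be denied, in which case this sketch simply reports -1.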

Webrev:
http://cr.openjdk.java.net/~sangheki/8220310/webrev.6
http://cr.openjdk.java.net/~sangheki/8220310/webrev.6.inc
Testing: build test for linux, solaris, windows and mac.

FYI, since I think the existing NUMA-related API names and the -1 convention are not good, I had planned to refine them later, after pushing.
But as you said, following the existing convention now and refining everything together later seems better.

Thanks,
Sangheon


On 10/23/19 1:21 AM, Per Liden wrote:
> Hi Sangheon,
>
> I noticed that this patch adds os::numa_get_address_id(). That name is
> misleading as it doesn't return an "address id", but a "numa node id".
> However, the terminology used in the os class for numa node is "group"
> (for example, numa_get_groups_num, numa_get_group_id, etc). So I'd
> suggest we instead name this os::numa_get_group_id(void* address),
> i.e. an overload of os::numa_get_group_id().
>
> Btw, I think that the numa related names used in the os class are odd,
> but I guess they were brought over from Solaris. We can refine those at
> some later time if we want, but for now I think we should follow the
> naming convention that we have there.
>
> Also, I don't think this function should print warnings, as that's up
> to the caller to decide what to do, what to print, etc.
>
> Furthermore, I suggest we remove os::InvalidNUMAId. Other numa
> functions in the os class return -1 on error, so I think we should do
> that here too.
>
> Here's a patch with the proposed changes:
>
>
> diff --git a/src/hotspot/os/linux/os_linux.cpp b/src/hotspot/os/linux/os_linux.cpp
> --- a/src/hotspot/os/linux/os_linux.cpp
> +++ b/src/hotspot/os/linux/os_linux.cpp
> @@ -3007,7 +3007,7 @@
>    return 0;
>  }
>
> -int os::numa_get_address_id(void* address) {
> +int os::numa_get_group_id(void* address) {
>  #ifndef MPOL_F_NODE
>  #define MPOL_F_NODE     (1<<0)  // Return next IL mode instead of node mask
>  #endif
> @@ -3016,11 +3016,10 @@
>  #define MPOL_F_ADDR     (1<<1)  // Look up VMA using address
>  #endif
>
> -  int id = InvalidNUMAId;
> +  int id = 0;
>
>    if (syscall(SYS_get_mempolicy, &id, NULL, 0, address, MPOL_F_NODE | MPOL_F_ADDR) == -1) {
> -    warning("Failed to get numa id at " PTR_FORMAT " with errno=%d", p2i(address), errno);
> -    return InvalidNUMAId;
> +    return -1;
>    }
>    return id;
>  }
> diff --git a/src/hotspot/share/gc/g1/g1NUMA.cpp b/src/hotspot/share/gc/g1/g1NUMA.cpp
> --- a/src/hotspot/share/gc/g1/g1NUMA.cpp
> +++ b/src/hotspot/share/gc/g1/g1NUMA.cpp
> @@ -164,7 +164,7 @@
>
>  uint G1NUMA::index_of_address(HeapWord *address) const {
>    int numa_id = os::numa_get_address_id((void*)address);
> -  if (numa_id == os::InvalidNUMAId) {
> +  if (numa_id == -1) {
>      return UnknownNodeIndex;
>    } else {
>      return index_of_node_id(numa_id);
> @@ -201,7 +201,7 @@
>    if (!is_enabled()) {
>      return;
>    }
> -
> +
>    if (size_in_bytes == 0) {
>      return;
>    }
> diff --git a/src/hotspot/share/runtime/os.hpp b/src/hotspot/share/runtime/os.hpp
> --- a/src/hotspot/share/runtime/os.hpp
> +++ b/src/hotspot/share/runtime/os.hpp
> @@ -374,10 +374,7 @@
>    static size_t numa_get_leaf_groups(int *ids, size_t size);
>    static bool   numa_topology_changed();
>    static int    numa_get_group_id();
> -
> -  static const int InvalidNUMAId = -1;
> -
> -  static int numa_get_address_id(void* address);
> +  static int    numa_get_group_id(void* address);
>
>    // Page manipulation
>    struct page_info {
>
>
> cheers,
> Per
>
>
> On 10/16/19 7:54 PM, sangheon.kim at oracle.com wrote:
>> Hi Kim, Stefan and Thomas,
>>
>> Many thanks for the reviews and suggestions!
>>
>> Kim,
>> I will move page_size() near page_start() before push as you suggested.
>> As you know, all 3 patches will be pushed together though.
>>
>> Thanks,
>> Sangheon
>>
>>
>> On 10/16/19 7:00 AM, Kim Barrett wrote:
>>>> On Oct 15, 2019, at 10:33 AM, sangheon.kim at oracle.com wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Here's revised webrev which addresses:
>>>> 1) G1RegionToSpaceMapper checks mtJavaHeap and then conditionally
>>>> calls G1NUMA::request_memory_on_node() (Kim)
>>>> 2) The signature of G1NUMA::request_memory_on_node(void* address,
>>>> ,) is changed to have actual address instead of page index. (Stefan)
>>>> 3) Some local variable name changes at G1RegionToSpaceMapper. i ->
>>>> region_idx, idx -> page_idx (for local style, used idx instead of
>>>> index)
>>>>
>>>> webrev:
>>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.5/
>>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.5.inc/
>>>> Testing: hs-tier 1 ~ 5, with/without UseNUMA
>>> Looks good.
>>>
>>> In g1PageBasedVirtualSpace.cpp, could the newly added definition of
>>> page_size()
>>> be moved to be near the existing definition of page_start()? I don't
>>> need a new
>>> webrev if you move it.
>>>
>>

From shade at redhat.com Wed Oct 23 18:17:45 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 23 Oct 2019 20:17:45 +0200
Subject: RFR (XS) 8232908: Shenandoah: compact heuristics has incorrect trigger "Free is lower than allocated recently"
Message-ID: <97e90b6e-524b-99d0-9d15-464c23553d28@redhat.com>

Bug:
  https://bugs.openjdk.java.net/browse/JDK-8232908

See the discussion in the bug. The fix is to remove the offending trigger:

diff -r da4578a0f73d src/hotspot/share/gc/shenandoah/heuristics/shenandoahCompactHeuristics.cpp
--- a/src/hotspot/share/gc/shenandoah/heuristics/shenandoahCompactHeuristics.cpp Mon Sep 30 22:39:11 2019 +0200
+++ b/src/hotspot/share/gc/shenandoah/heuristics/shenandoahCompactHeuristics.cpp Wed Oct 23 20:14:48 2019 +0200
@@ -66,11 +66,4 @@
   }

-  if (available < threshold_bytes_allocated) {
-    log_info(gc)("Trigger: Free (" SIZE_FORMAT "%s) is lower than allocated recently (" SIZE_FORMAT "%s)",
-                 byte_size_in_proper_unit(available), proper_unit_for_byte_size(available),
-                 byte_size_in_proper_unit(threshold_bytes_allocated), proper_unit_for_byte_size(threshold_bytes_allocated));
-    return true;
-  }
-
   size_t bytes_allocated = heap->bytes_allocated_since_gc_start();
   if (bytes_allocated > threshold_bytes_allocated) {

Testing: hotspot_gc_shenandoah

--
Thanks,
-Aleksey

From hohensee at amazon.com Wed Oct 23 20:37:35 2019
From: hohensee at amazon.com (Hohensee, Paul)
Date: Wed, 23 Oct 2019 20:37:35 +0000
Subject: [11u] RFR: 8231085: C2/GC: Better GC-interface for expanding clone
In-Reply-To: <582a0140-7b8e-cdd6-d0d3-d58c0964ccb0@redhat.com>
References: <582a0140-7b8e-cdd6-d0d3-d58c0964ccb0@redhat.com>
Message-ID: <360A0C79-6CBF-467A-AF49-EB9F9CD003AC@amazon.com>

Ok. Still a tiny skipped change.

Paul

On 10/23/19, 7:30 AM, "hotspot-compiler-dev on behalf of Roman Kennke" wrote:

I would like to backport the recent GC interface for expanding clones to
jdk11u. This is a prerequisite to backport related Shenandoah changes to
11u without making a mess.

The change differs from the original jdk14 change because it basically
skips the intermediate GC interface for the same thing that's been
introduced in jdk12. This one wholly replaces that.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8231085
Original webrev:
http://cr.openjdk.java.net/~rkennke/JDK-8231085/webrev.00/
JDK11u webrev:
http://cr.openjdk.java.net/~rkennke/JDK-8231085/webrev.00.jdk11u/

Testing: tier1 and tier2 no regressions

Good?

Roman

From stefan.johansson at oracle.com Wed Oct 23 20:48:14 2019
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Wed, 23 Oct 2019 22:48:14 +0200
Subject: RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1
In-Reply-To: <531dc0fe-236d-110b-65ee-d224ac028130@oracle.com>
References: <0F637570-EC97-47C5-B493-B33681133149@oracle.com>
	<5c6b06b1-de44-3cb7-7fc8-0b641df5f353@oracle.com>
	<80DA3FD5-C2FA-44BF-83C5-AE0EA6AA3684@oracle.com>
	<0D820E95-361A-4CAC-9BC3-99C39512D396@oracle.com>
	<1898AC1E-0A8C-467C-9CA9-4B02C00A3A07@oracle.com>
	<7f150234-4080-b2f9-a791-b456038af795@oracle.com>
	<8126d900-714b-585a-f2f0-4ce13f71501c@oracle.com>
	<531dc0fe-236d-110b-65ee-d224ac028130@oracle.com>
Message-ID: <3BD075B5-12B8-4516-AB0C-5CFAEC10BF30@oracle.com>

> On 23 Oct 2019, at 10:39, Thomas Schatzl wrote:
>
> Hi Stefan,
>
> On 23.10.19 09:05, Stefan Johansson wrote:
>> Hi Thomas,
>> On 2019-10-22 15:45, Thomas Schatzl wrote:
>>> Hi Kim,
>>>
>>> On 22.10.19 15:44, Kim Barrett wrote:
>>>>> On Oct 22, 2019, at 6:13 AM, Thomas Schatzl
> [...]
>>>> Webrevs:
>>>>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2_to_3/ (diff)
>>>>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.3/ (full)
>> This looks good, and well documented :)
>> One small thing:
>> src/hotspot/share/gc/g1/g1SharedClosures.hpp
>> ---
>> 46 _codeblobs(pss->worker_id(), &_oops, Mark == G1MarkFromRoot) {}
>> What do you think about adding a helper for Mark == G1MarkFromRoot, something like need_strong_processing() and a comment explaining that it will be true during initial mark.
>
> Something like this?
>
> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.3_to_4/ (diff)
> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.4/ (full)
>
> Not completely sure if that is required as searching for G1MarkFromRoot shows that it is only used for the strong shared closures in the initial mark closure set. But I understand that it is nice to be reminded about this.
>
Thanks for addressing it, looks good!

Stefan

> Thanks for your and Kim's reviews.
>
> Thanks,
> Thomas

From rkennke at redhat.com Wed Oct 23 20:50:11 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 23 Oct 2019 22:50:11 +0200
Subject: RFR (XS) 8232908: Shenandoah: compact heuristics has incorrect trigger "Free is lower than allocated recently"
In-Reply-To: <97e90b6e-524b-99d0-9d15-464c23553d28@redhat.com>
References: <97e90b6e-524b-99d0-9d15-464c23553d28@redhat.com>
Message-ID: <21225bdb-381b-44b7-030d-0b147ca1b5fd@redhat.com>

Ok, good.

Thanks,
Roman

> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8232908
>
> See the discussion in the bug.
> The fix is to remove the offending trigger:
>
> diff -r da4578a0f73d src/hotspot/share/gc/shenandoah/heuristics/shenandoahCompactHeuristics.cpp
> --- a/src/hotspot/share/gc/shenandoah/heuristics/shenandoahCompactHeuristics.cpp Mon Sep 30 22:39:11 2019 +0200
> +++ b/src/hotspot/share/gc/shenandoah/heuristics/shenandoahCompactHeuristics.cpp Wed Oct 23 20:14:48 2019 +0200
> @@ -66,11 +66,4 @@
>    }
>
> -  if (available < threshold_bytes_allocated) {
> -    log_info(gc)("Trigger: Free (" SIZE_FORMAT "%s) is lower than allocated recently (" SIZE_FORMAT "%s)",
> -                 byte_size_in_proper_unit(available), proper_unit_for_byte_size(available),
> -                 byte_size_in_proper_unit(threshold_bytes_allocated), proper_unit_for_byte_size(threshold_bytes_allocated));
> -    return true;
> -  }
> -
>    size_t bytes_allocated = heap->bytes_allocated_since_gc_start();
>    if (bytes_allocated > threshold_bytes_allocated) {
>
>
> Testing: hotspot_gc_shenandoah
>

From suenaga at oss.nttdata.com Thu Oct 24 00:49:55 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Thu, 24 Oct 2019 09:49:55 +0900
Subject: Segmentation Fault occurs when ClassLoader and Metaspace is released in JDK 8
In-Reply-To: <314f9ad2-17df-1082-8816-7af73a96e9fb@nttcom.co.jp_1>
References: <422c9ca2-5053-c761-cb61-f075877bb666@oss.nttdata.com>
	<314f9ad2-17df-1082-8816-7af73a96e9fb@nttcom.co.jp_1>
Message-ID: <1ccb4f35-7f21-4aa2-4cbb-b75244b6d12d@oss.nttdata.com>

Hi Osamu,

I guess this is a bug in the combination of Metaspace and CMS.
However, current jdk/jdk has a different implementation, so it might not occur in a modern JDK.

I want to hear comments from others.

My comments are below:

On 2019/10/23 18:57, Osamu Sakamoto wrote:
> Hi Yasumasa,
>
> Thank you for answering.
>
> > What JVM options did you pass?
> The following are the JVM options I passed.
> -----------------------------------------------------------------
> -Xmx2048m
> -Xms2048m
> -XX:NewSize=412m
> -XX:MaxNewSize=412m
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=15
> -XX:+UseConcMarkSweepGC
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSInitiatingOccupancyFraction=80
> -XX:+CMSClassUnloadingEnabled
> -XX:CompressedClassSpaceSize=64m
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+UseGCLogFileRotation
> -XX:GCLogFileSize=0
> -Xloggc:/var/log/tomcatm0/gc-%p.log
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:+AlwaysLockClassLoader
> -----------------------------------------------------------------
>
>
> > I guess you used CMS because this problem seems to occur on CMS only [1] [2].
> Yes, I used CMS.
>
> > So a workaround might be not to use CMS.
> Thank you for telling me about the workaround.
> But it is difficult to change the GC method, so we would like to solve this issue with CMS GC if possible.
>
>
> > I'm not sure of the root cause of this issue, but it seems to break ClassLoaderDataGraph::_unloading.
> > (like double free (delete) of CLD)
> I checked whether ClassLoaderDataGraph::_unloading is broken or not, but I could not tell, because the value has been cleared to NULL or optimized out.
>
> Referring to classLoaderData.cpp [1], it looks like the _unloading value is saved to ClassLoaderDataGraph::_saved_unloading.
> But _saved_unloading had been cleared to NULL, too.
>
> Is there any other way to check it?
>
> [1] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/classfile/classLoaderData.cpp#l753
>
> -----------------------------------------------------------------
> (gdb) f 10
> #10 0x00007f60808c0f0f in ClassLoaderDataGraph::purge () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:818
> 818        delete purge_me;
> (gdb) list ClassLoaderDataGraph::purge
> 810    void ClassLoaderDataGraph::purge() {
> 811      assert(SafepointSynchronize::is_at_safepoint(), "must be at safepoint!");
> 812      ClassLoaderData* list = _unloading;
> 813      _unloading = NULL;
> 814      ClassLoaderData* next = list;
> 815      while (next != NULL) {
> 816        ClassLoaderData* purge_me = next;
> 817        next = purge_me->next();
> 818        delete purge_me;
> 819      }
> 820      Metaspace::purge();
> 821    }
> (gdb) p _unloading
> $29 = (ClassLoaderData *) 0x0
> (gdb) p list
> $30 =
> (gdb) p next
> $31 =
> (gdb) p ClassLoaderDataGraph::_saved_unloading
> $32 = (ClassLoaderData *) 0x0
> -----------------------------------------------------------------

AFAICS you cannot find the head of _unloading at this point.
However, you can traverse the CLD list with purge_me->_next.

BTW, a CLD has an OOP for its class loader in ClassLoaderData::_class_loader.
If you check it in (CL)HSDB, you might get some hints from it.
For example, you could use the system class loader instead of a custom class loader from the framework.


Thanks,

Yasumasa


> Thanks,
> Osamu
>
> On 10/21/19 22:29, Yasumasa Suenaga wrote:
>> Hi Osamu,
>>
>> What JVM options did you pass?
>>
>> I guess you used CMS because this problem seems to occur on CMS only [1] [2].
>> So a workaround might be not to use CMS.
>>
>> I'm not sure of the root cause of this issue, but it seems to break ClassLoaderDataGraph::_unloading.
>> (like double free (delete) of CLD)
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>> [1] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/classfile/classLoaderData.hpp#l100
>> [2] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp#l6384
>>
>>
>> On 2019/10/21 17:50, Osamu Sakamoto wrote:
>>> Hi all,
>>>
>>> I have a problem with a segmentation fault (SEGV) in GC and I can't determine the cause.
>>> Could you help me solve the problem?
>>>
>>> Our system uses OpenJDK 1.8.0.181, and it crashed with a SEGV when purging a ClassLoader at a safepoint.
>>> This problem can't be reproduced, but it has happened 4 times in a few months.
>>>
>>> The following is the summary of my investigation.
>>>
>>> =============================================================================
>>>
>>> First I checked hs_err, and that shows that the SEGV occurred.
>>> VM_Operation is GenCollectForAllocation at safepoint.
>>>
>>> -----------------------------------------------------------------------------
>>> #
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> #  SIGSEGV (0xb) at pc=0x00007f6080c97f88, pid=23931, tid=0x00007f607c3ed700
>>> #
>>> # JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build 1.8.0_181-b13)
>>> # Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 compressed oops)
>>> # Problematic frame:
>>> # V  [libjvm.so+0x84bf88]
>>> #
>>> # Core dump written. Default location: /opt/tomcate0/core or core.23931
>>> #
>>> # If you would like to submit a bug report, please visit:
>>> #   http://bugreport.java.com/bugreport/crash.jsp
>>> #
>>>
>>> ---------------  T H R E A D  ---------------
>>>
>>> Current thread (0x00007f6078c00000):  VMThread [stack: 0x00007f607c2ed000,0x00007f607c3ee000] [id=23939]
>>>
>>> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000018
>>>
>>> Registers:
>>> RAX=0x0000000000000010, RBX=0x00007f5ff800ad30, RCX=0x0000000000000010, RDX=0x0000000000000000
>>> RSP=0x00007f607c3ecb50, RBP=0x00007f607c3ecb80, RSI=0x0000000000000002, RDI=0x0000000001cfe570
>>> R8 =0x00007f5ff80ae320, R9 =0x00007f5ff8052480, R10=0x0000000000000000, R11=0x0000000000000400
>>> R12=0x0000000001cfe570, R13=0x00007f6081419470, R14=0x0000000000000002, R15=0x00007f6081418640
>>> RIP=0x00007f6080c97f88, EFLAGS=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
>>>
TRAPNO=0x000000000000000e >>> >>> Top of Stack: (sp=0x00007f607c3ecb50) >>> 0x00007f607c3ecb50:?? 00007f607c3ecba0 00007f5ff800ad30 >>> 0x00007f607c3ecb60:?? 00007f5ff800ad00 0000000000000000 >>> 0x00007f607c3ecb70:?? 0000000000000000 0000000000000001 >>> 0x00007f607c3ecb80:?? 00007f607c3ecba0 00007f6080c995fa >>> 0x00007f607c3ecb90:?? 00007f5ff800ad00 00007f5ff800ac20 >>> 0x00007f607c3ecba0:?? 00007f607c3ecbc0 00007f60808bff5e >>> 0x00007f607c3ecbb0:?? 00007f5ff800ac20 00007f5ff8052870 >>> 0x00007f607c3ecbc0:?? 00007f607c3ecbe0 00007f60808c0f0f >>> 0x00007f607c3ecbd0:?? 00007f607c3ecbf0 00007f608140f308 >>> 0x00007f607c3ecbe0:?? 00007f607c3ecc30 00007f6080daa0b7 >>> 0x00007f607c3ecbf0:?? 00007f6069000100 0000000000000000 >>> 0x00007f607c3ecc00:?? 00007f607c3ecc20 00007f6080ed0800 >>> 0x00007f607c3ecc10:?? 00000000000000f9 88e95c3ba257ab00 >>> 0x00007f607c3ecc20:?? 431bde82d7b634db 00007f607800aa00 >>> 0x00007f607c3ecc30:?? 00007f607c3eccc0 00007f6080daa9d5 >>> 0x00007f607c3ecc40:?? 0000000000000000 00007f607803bf20 >>> 0x00007f607c3ecc50:?? 00007f607803be20 00000000000003e8 >>> 0x00007f607c3ecc60:?? 0000000000000001 00007f6078c00000 >>> 0x00007f607c3ecc70:?? 00007f607c3eccc0 0000000000000000 >>> 0x00007f607c3ecc80:?? 00000004000000f9 00007f60813e2b99 >>> 0x00007f607c3ecc90:?? 00007f607803bfa0 00007f6078c00000 >>> 0x00007f607c3ecca0:?? 0000000000000000 0000000000000000 >>> 0x00007f607c3eccb0:?? 00007f6081418bd0 00007f607803bf20 >>> 0x00007f607c3eccc0:?? 00007f607c3ece60 00007f6080f2048a >>> 0x00007f607c3eccd0:?? 00007f607c3ecd20 00007f607c3ecce0 >>> 0x00007f607c3ecce0:?? 00007f6078c00000 00007f6078c00980 >>> 0x00007f607c3eccf0:?? 00007f6078c009c0 00007f6078c009d0 >>> 0x00007f607c3ecd00:?? 00007f6078c00aa8 00000000000000d8 >>> 0x00007f607c3ecd10:?? 00007f6078c00be0 0000000000000000 >>> 0x00007f607c3ecd20:?? 00007f607c3ecd28 6e69747563657845 >>> 0x00007f607c3ecd30:?? 65706f204d562067 203a6e6f69746172 >>> 0x00007f607c3ecd40:?? 
656c6c6f436e6547 6c6c41726f467463 >>> >>> Instructions: (pc=0x00007f6080c97f88) >>> 0x00007f6080c97f68:?? b6 12 80 fa 00 74 01 f0 48 0f c1 01 31 c9 31 f6 >>> 0x00007f6080c97f78:?? 48 8b 44 0b 10 31 d2 48 85 c0 74 11 0f 1f 40 00 >>> 0x00007f6080c97f88:?? 48 8b 40 08 48 83 c2 01 48 85 c0 75 f3 48 83 c1 >>> 0x00007f6080c97f98:?? 08 48 01 d6 48 83 f9 20 75 d6 8b 7b 08 48 8b 05 >>> >>> Register to memory mapping: >>> >>> RAX=0x0000000000000010 is an unknown value >>> RBX=0x00007f5ff800ad30 is an unknown value >>> RCX=0x0000000000000010 is an unknown value >>> RDX=0x0000000000000000 is an unknown value >>> RSP=0x00007f607c3ecb50 is an unknown value >>> RBP=0x00007f607c3ecb80 is an unknown value >>> RSI=0x0000000000000002 is an unknown value >>> RDI=0x0000000001cfe570 is an unknown value >>> R8 =0x00007f5ff80ae320 is an unknown value >>> R9 =0x00007f5ff8052480 is an unknown value >>> R10=0x0000000000000000 is an unknown value >>> R11=0x0000000000000400 is an unknown value >>> R12=0x0000000001cfe570 is an unknown value >>> R13=0x00007f6081419470: in /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/amd64/server/libjvm.so at 0x00007f608044c000 >>> R14=0x0000000000000002 is an unknown value >>> R15=0x00007f6081418640: in /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/amd64/server/libjvm.so at 0x00007f608044c000 >>> >>> >>> Stack: [0x00007f607c2ed000,0x00007f607c3ee000], sp=0x00007f607c3ecb50, free space=1022k >>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >>> V? [libjvm.so+0x84bf88] >>> V? [libjvm.so+0x84d5fa] >>> V? [libjvm.so+0x473f5e] >>> V? [libjvm.so+0x474f0f] >>> V? [libjvm.so+0x95e0b7] >>> V? [libjvm.so+0x95e9d5] >>> V? [libjvm.so+0xad448a] >>> V? [libjvm.so+0xad48f1] >>> V? [libjvm.so+0x8beb82] >>> >>> VM_Operation (0x00007f5fd69e6120): GenCollectForAllocation, mode: safepoint, requested by thread 0x00007f6079013800 >>> >>> ... 
>>> ----------------------------------------------------------------------------- >>> >>> >>> >>> Next, I used GDB to check the backtrace of the SEGV thread from the coredump. >>> The following is the backtrace. >>> The SEGV occurred when ClassLoader is purged and Metaspace is destructed. >>> And frame #7 shows that a signal(SEGV) handler is called after SpaceManager::~SpaceManager() is executed. >>> >>> ----------------------------------------------------------------------------- >>> (gdb) bt >>> #0? 0x00007f608146f1f7 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 >>> #1? 0x00007f60814708e8 in __GI_abort () at abort.c:90 >>> #2? 0x00007f6080d0bc39 in os::abort (dump_core=) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1519 >>> #3? 0x00007f6080f1b816 in VMError::report_and_die (this=this at entry=0x7f607c3ebd10) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1060 >>> #4? 0x00007f6080d15927 in JVM_handle_linux_signal (sig=11, info=0x7f607c3ebfb0, ucVoid=0x7f607c3ebe80, abort_if_unrecognized=) >>> ???? at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541 >>> #5? 0x00007f6080d09038 in signalHandler (sig=11, info=0x7f607c3ebfb0, uc=0x7f607c3ebe80) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:4446 >>> #6? >>> #7? SpaceManager::~SpaceManager (this=0x7f5ff800ad30, __in_chrg=) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2028 >>> #8? 0x00007f6080c995fa in Metaspace::~Metaspace (this=0x7f5ff800ad00, __in_chrg=) >>> ???? at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2971 >>> #9? 
0x00007f60808bff5e in ClassLoaderData::~ClassLoaderData (this=0x7f5ff800ac20, __in_chrg=) >>> ???? at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:383 >>> #10 0x00007f60808c0f0f in ClassLoaderDataGraph::purge () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:818 >>> #11 0x00007f6080daa0b7 in ClassLoaderDataGraph::purge_if_needed () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.hpp:104 >>> #12 SafepointSynchronize::do_cleanup_tasks () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/safepoint.cpp:551 >>> #13 0x00007f6080daa9d5 in SafepointSynchronize::begin () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/safepoint.cpp:402 >>> #14 0x00007f6080f2048a in VMThread::loop (this=this at entry=0x7f6078c00000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:501 >>> #15 0x00007f6080f208f1 in VMThread::run (this=0x7f6078c00000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:276 >>> #16 0x00007f6080d0ab82 in java_start (thread=0x7f6078c00000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:796 >>> #17 0x00007f6081e2de25 in start_thread (arg=0x7f607c3ed700) at pthread_create.c:308 >>> #18 0x00007f608153234d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 >>> ----------------------------------------------------------------------------- >>> >>> >>> In Frame #7, Line 2028 (chunk = chunk->next()) is the crash point. >>> The variable "chunk" is defined at Line 2025 (Metachunk* chunk = chunks_in_use(i);). 
>>> "chunks_in_use(i)" is defined at Line 648 (Metachunk* chunks_in_use(ChunkIndex index) const { return _chunks_in_use[index]; }).
>>> So I checked values of "_chunks_in_use", and understood that "_chunks_in_use[2]" has Illegal Address "0x10".
>>> Therefore, I think that the SEGV occurred because of referencing Illegal Address "0x10" at "chunk = chunk->next()".
>>>
>>> -----------------------------------------------------------------------------
>>> (gdb) f 7
>>> #7  SpaceManager::~SpaceManager (this=0x7f5ff800ad30, __in_chrg=) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2028
>>> 2028        chunk = chunk->next();
>>> (gdb) list
>>> 2023    size_t SpaceManager::sum_count_in_chunks_in_use(ChunkIndex i) {
>>> 2024      size_t count = 0;
>>> 2025      Metachunk* chunk = chunks_in_use(i);
>>> 2026      while (chunk != NULL) {
>>> 2027        count++;
>>> 2028        chunk = chunk->next();
>>> 2029      }
>>> 2030      return count;
>>> 2031    }
>>> 2032
>>> (gdb) list SpaceManager::chunks_in_use
>>> 647      // Accessors
>>> 648      Metachunk* chunks_in_use(ChunkIndex index) const { return _chunks_in_use[index]; }
>>> ...
>>> (gdb) p _chunks_in_use
>>> $11 = {0x7f5fcd41c400, 0x7f5fcd41a000, 0x10, 0x0}
>>> -----------------------------------------------------------------------------
>>>
>>>
>>>
>>> The following is disassemble code of "SpaceManager::~SpaceManager()".
>>> %rax has 0x10 at "0x00007f6080c97f88 <+200>", but I don't understand why this "0x10" is inserted to %rax.
>>>
>>> -----------------------------------------------------------------------------
>>> (gdb) disas
>>> Dump of assembler code for function SpaceManager::~SpaceManager():
>>>    0x00007f6080c97ec0 <+0>:     push   %rbp
>>>    0x00007f6080c97ec1 <+1>:     mov    %rsp,%rbp
>>>    0x00007f6080c97ec4 <+4>:     push   %r15
>>>    0x00007f6080c97ec6 <+6>:     push   %r14
>>>    0x00007f6080c97ec8 <+8>:     push   %r13
>>>    0x00007f6080c97eca <+10>:    push   %r12
>>>    0x00007f6080c97ecc <+12>:    push   %rbx
>>>    0x00007f6080c97ecd <+13>:    mov    %rdi,%rbx
>>>    0x00007f6080c97ed0 <+16>:    sub    $0x8,%rsp
>>>    0x00007f6080c97ed4 <+20>:    mov    0x780785(%rip),%r12        # 0x7f6081418660 <_ZN12SpaceManager12_expand_lockE>
>>>    0x00007f6080c97edb <+27>:    test   %r12,%r12
>>>    0x00007f6080c97ede <+30>:    je     0x7f6080c97ee8
>>>    0x00007f6080c97ee0 <+32>:    mov    %r12,%rdi
>>>    0x00007f6080c97ee3 <+35>:    callq  0x7f6080cce2f0
>>>    0x00007f6080c97ee8 <+40>:    movslq 0x8(%rbx),%rcx
>>>    0x00007f6080c97eec <+44>:    lea    0x78075d(%rip),%rdx        # 0x7f6081418650 <_ZN12MetaspaceAux15_capacity_wordsE>
>>>    0x00007f6080c97ef3 <+51>:    lea    0x781576(%rip),%r13        # 0x7f6081419470 <_ZN2os16_processor_countE>
>>>    0x00007f6080c97efa <+58>:    lea    0x78073f(%rip),%r15        # 0x7f6081418640 <_ZN12MetaspaceAux11_used_wordsE>
>>>    0x00007f6080c97f01 <+65>:    mov    (%rdx,%rcx,8),%rax
>>>    0x00007f6080c97f05 <+69>:    sub    0x40(%rbx),%rax
>>>    0x00007f6080c97f09 <+73>:    mov    %rax,(%rdx,%rcx,8)
>>>    0x00007f6080c97f0d <+77>:    mov    0x38(%rbx),%rax
>>>    0x00007f6080c97f11 <+81>:    movslq 0x8(%rbx),%rdx
>>>    0x00007f6080c97f15 <+85>:    neg    %rax
>>>    0x00007f6080c97f18 <+88>:    cmpl   $0x1,0x0(%r13)
>>>    0x00007f6080c97f1d <+93>:    lea    (%r15,%rdx,8),%rcx
>>>    0x00007f6080c97f21 <+97>:    mov    $0x1,%edx
>>>    0x00007f6080c97f26 <+102>:   jne    0x7f6080c97f32
>>>    0x00007f6080c97f28 <+104>:   lea    0x74acb4(%rip),%rdx        # 0x7f60813e2be3
>>>    0x00007f6080c97f2f <+111>:   movzbl (%rdx),%edx
>>>    0x00007f6080c97f32 <+114>:   cmp    $0x0,%dl
>>>    0x00007f6080c97f35 <+117>:   je     0x7f6080c97f38
>>>    0x00007f6080c97f37 <+119>:   lock xadd %rax,(%rcx)
>>>    0x00007f6080c97f3c <+124>:   mov    0x48(%rbx),%r14
>>>    0x00007f6080c97f40 <+128>:   callq  0x7f6080c951a0
>>>    0x00007f6080c97f45 <+133>:   movslq 0x8(%rbx),%rdx
>>>    0x00007f6080c97f49 <+137>:   imul   %r14,%rax
>>>    0x00007f6080c97f4d <+141>:   lea    (%r15,%rdx,8),%rcx
>>>    0x00007f6080c97f51 <+145>:   mov    $0x1,%edx
>>>    0x00007f6080c97f56 <+150>:   neg    %rax
>>>    0x00007f6080c97f59 <+153>:   cmpl   $0x1,0x0(%r13)
>>>    0x00007f6080c97f5e <+158>:   jne    0x7f6080c97f6a
>>>    0x00007f6080c97f60 <+160>:   lea    0x74ac7c(%rip),%rdx        # 0x7f60813e2be3
>>>    0x00007f6080c97f67 <+167>:   movzbl (%rdx),%edx
>>>    0x00007f6080c97f6a <+170>:   cmp    $0x0,%dl
>>>    0x00007f6080c97f6d <+173>:   je     0x7f6080c97f70
>>>    0x00007f6080c97f6f <+175>:   lock xadd %rax,(%rcx)
>>>    0x00007f6080c97f74 <+180>:   xor    %ecx,%ecx
>>>    0x00007f6080c97f76 <+182>:   xor    %esi,%esi
>>>    0x00007f6080c97f78 <+184>:   mov    0x10(%rbx,%rcx,1),%rax
>>>    0x00007f6080c97f7d <+189>:   xor    %edx,%edx
>>>    0x00007f6080c97f7f <+191>:   test   %rax,%rax
>>>    0x00007f6080c97f82 <+194>:   je     0x7f6080c97f95
>>>    0x00007f6080c97f84 <+196>:   nopl   0x0(%rax)
>>> => 0x00007f6080c97f88 <+200>:   mov    0x8(%rax),%rax
>>>    0x00007f6080c97f8c <+204>:   add    $0x1,%rdx
>>>    0x00007f6080c97f90 <+208>:   test   %rax,%rax
>>> ...
>>> (gdb) info registers
>>> rax            0x10               16
>>> rbx            0x7f5ff800ad30     140050159414576
>>> rcx            0x10               16
>>> rdx            0x0                0
>>> rsi            0x2                2
>>> rdi            0x1cfe570          30401904
>>> rbp            0x7f607c3ecb80     0x7f607c3ecb80
>>> rsp            0x7f607c3ecb50     0x7f607c3ecb50
>>> r8             0x7f5ff80ae320     140050160083744
>>> r9             0x7f5ff8052480     140050159707264
>>> r10            0x0                0
>>> r11            0x400              1024
>>> r12            0x1cfe570          30401904
>>> r13            0x7f6081419470     140052462146672
>>> r14            0x2                2
>>> r15            0x7f6081418640     140052462143040
>>> rip            0x7f6080c97f88     0x7f6080c97f88
>>> eflags         0x206              [ PF IF ]
>>> cs             0x33               51
>>> ss             0x2b               43
>>> ds             0x0                0
>>> es             0x0                0
>>> fs             0x0                0
>>> gs             0x0                0
>>> k0
>>> k1
>>> k2
>>> k3
>>> k4
>>> k5
>>> k6
>>> k7
>>> -----------------------------------------------------------------------------
>>>
>>> =============================================================================
>>>
>>>
>>>
>>> Does anyone know about this case?
>>>
>>> Thanks, Osamu
>>>
>>>
>

From kim.barrett at oracle.com Thu Oct 24 04:26:04 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 24 Oct 2019 00:26:04 -0400
Subject: RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1
In-Reply-To: <531dc0fe-236d-110b-65ee-d224ac028130@oracle.com>
References: <0F637570-EC97-47C5-B493-B33681133149@oracle.com> <5c6b06b1-de44-3cb7-7fc8-0b641df5f353@oracle.com> <80DA3FD5-C2FA-44BF-83C5-AE0EA6AA3684@oracle.com> <0D820E95-361A-4CAC-9BC3-99C39512D396@oracle.com> <1898AC1E-0A8C-467C-9CA9-4B02C00A3A07@oracle.com> <7f150234-4080-b2f9-a791-b456038af795@oracle.com> <8126d900-714b-585a-f2f0-4ce13f71501c@oracle.com> <531dc0fe-236d-110b-65ee-d224ac028130@oracle.com>
Message-ID: <36DA8B60-2E22-4E91-8CD9-35AA0B3A53B3@oracle.com>

> On Oct 23, 2019, at 4:39 AM, Thomas Schatzl wrote:
>
> Hi Stefan,
>
> On 23.10.19 09:05, Stefan Johansson wrote:
>> Hi Thomas,
>> On 2019-10-22 15:45, Thomas Schatzl wrote:
>>> Hi Kim,
>>>
>>> On 22.10.19 15:44, Kim Barrett wrote:
>>>>> On Oct 22, 2019, at 6:13 AM, Thomas Schatzl
> [...]>>>> Webrevs:
>>>>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2_to_3/ (diff)
>>>>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.3/ (full)
>> This looks good, and well documented :)
>> One small thing:
>> src/hotspot/share/gc/g1/g1SharedClosures.hpp
>> ---
>> 
46 _codeblobs(pss->worker_id(), &_oops, Mark == G1MarkFromRoot) {}
>> What do you think about adding a helper for Mark == G1MarkFromRoot, something like need_strong_processing() and a comment explaining that it will be true during initial mark.
>
> Something like this?
>
> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.3_to_4/ (diff)
> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.4/ (full)
>
> Not completely sure if that is required as searching for G1MarkFromRoot shows that it is only used for the strong shared closures in the initial mark closure set. But I understand that it is nice to be reminded about this.
>
> Thanks for your and Kim's reviews.
>
> Thanks,
> Thomas

Still good.

From erik.osterlund at oracle.com Thu Oct 24 10:38:37 2019
From: erik.osterlund at oracle.com (erik.osterlund at oracle.com)
Date: Thu, 24 Oct 2019 12:38:37 +0200
Subject: RFR: 8224817: Implementation of JEP 364: ZGC on macOS
Message-ID:

Hi,

Now that some curling has been performed, paving way for this patch:

    8229027: Improve how JNIHandleBlock::oops_do distinguishes oops from non-oops
    8229278: Improve hs_err location printing to assume less about GC internals
    8229189: Improve JFR leak profiler tracing to deal with discontiguous heaps
    8224815: Remove non-GC uses of CollectedHeap::is_in_reserved()
    8224820: ZGC: Support discontiguous heap reservations

...the remaining thing to do is plugging in a few platform specific ZGC files. This patch does that.

Decided to go with mach_vm_map/mach_vm_remap to implement multi-mapping. Previously I didn't want to do that as I couldn't figure out how to mach_vm_remap memory on top of reserved VA (acquired using mmap). But apparently VM_FLAGS_OVERWRITE was the missing ingredient there. With that in place, dodging the terrible ftruncate implementation on macOS seemed like a good idea. That also implies this port supports large pages (unlike other GCs on macOS today). Yay!

CR: http://cr.openjdk.java.net/~eosterlund/8224817/webrev.00/
Bug: https://bugs.openjdk.java.net/browse/JDK-8229358

Thanks,
/Erik

From thomas.schatzl at oracle.com Thu Oct 24 10:44:39 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 24 Oct 2019 12:44:39 +0200
Subject: RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1
In-Reply-To: <36DA8B60-2E22-4E91-8CD9-35AA0B3A53B3@oracle.com>
References: <0F637570-EC97-47C5-B493-B33681133149@oracle.com> <5c6b06b1-de44-3cb7-7fc8-0b641df5f353@oracle.com> <80DA3FD5-C2FA-44BF-83C5-AE0EA6AA3684@oracle.com> <0D820E95-361A-4CAC-9BC3-99C39512D396@oracle.com> <1898AC1E-0A8C-467C-9CA9-4B02C00A3A07@oracle.com> <7f150234-4080-b2f9-a791-b456038af795@oracle.com> <8126d900-714b-585a-f2f0-4ce13f71501c@oracle.com> <531dc0fe-236d-110b-65ee-d224ac028130@oracle.com> <36DA8B60-2E22-4E91-8CD9-35AA0B3A53B3@oracle.com>
Message-ID: <56554e4c-d511-73d2-e29d-3b7b51260e51@oracle.com>

Hi Stefan, Kim,

thanks for your reviews. The change has finally been pushed :)

Thomas

On 24.10.19 06:26, Kim Barrett wrote:
>> On Oct 23, 2019, at 4:39 AM, Thomas Schatzl wrote:
>>
>> Hi Stefan,
>>
>> On 23.10.19 09:05, Stefan Johansson wrote:
>>> Hi Thomas,
>>> On 2019-10-22 15:45, Thomas Schatzl wrote:
>>>> Hi Kim,
>>>>
>>>> On 22.10.19 15:44, Kim Barrett wrote:
>>>>>> On Oct 22, 2019, at 6:13 AM, Thomas Schatzl
>> [...]>>>> Webrevs:
>>>>>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2_to_3/ (diff)
>>>>>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.3/ (full)
>>> This looks good, and well documented :)
>>> One small thing:
>>> src/hotspot/share/gc/g1/g1SharedClosures.hpp
>>> ---
>>> 46 _codeblobs(pss->worker_id(), &_oops, Mark == G1MarkFromRoot) {}
>>> What do you think about adding a helper for Mark == G1MarkFromRoot, something like need_strong_processing() and a comment explaining that it will be true during initial mark.
>>
>> Something like this?
>>
>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.3_to_4/ (diff)
>> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.4/ (full)
>>
>> Not completely sure if that is required as searching for G1MarkFromRoot shows that it is only used for the strong shared closures in the initial mark closure set. But I understand that it is nice to be reminded about this.
>>
>> Thanks for your and Kim's reviews.
>>
>> Thanks,
>> Thomas
>
> Still good.
>

From thomas.schatzl at oracle.com Thu Oct 24 11:50:27 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 24 Oct 2019 13:50:27 +0200
Subject: RFR (XS): 8232951: TestG1ParallelPhases.java fails with phase NonYoungFreeCSet not found
Message-ID: <27fa21ab-d8ac-95ea-3485-7a72116c22f2@oracle.com>

Hi all,

can I have reviews for this small fix to the TestG1ParallelPhases.java test so that it is more robust?

As far as I can tell from the failure and program execution the test tries to force mixed gcs expecting particular JFR events. In particular the failure is about the failure to get the "NonYoungFreeCSet" parallel phase. This event is sent when freeing the remembered sets of an old region (during mixed gc).

However due to how the test is set up, while it successfully forces mixed gcs, it does not make sure that there ever is waste in old regions (even if the test sets the threshold to 0.0). In my unsuccessful reproduction tries I noticed that the actual waste at the start of mixed gc is very close to 0.0 (or even 0.0) in all mixed gcs (often something like 8 bytes to reclaim only in total), and that while the threshold forces a mixed gc, sometimes during region selection no old gen regions are selected, so neither the freeing of an old gen region nor the JFR event occurs.

This seems to be correct to me, so I changed the test a little to be sure to actually generate waste.

The alternative would be to send a fake "NonYoungFreeCSet" parallel phase jfr event in the collector, but I do not like sending fake events for the sake of a test.

I also added some useful logging options in case this occurs again in CI, and curbed the amount of (young only) GCs performed.

Note that I did not manage to reproduce the issue myself - the only occurrence was a month ago and had been linked to a wrong bug. Obviously the failure (that we do not get any reclaimable old gen region) depends on a lot of other factors.

CR: https://bugs.openjdk.java.net/browse/JDK-8232951
Webrev: http://cr.openjdk.java.net/~tschatzl/8232951/webrev/
Testing: 400 runs of the changed test without issues

Thanks,
Thomas

From per.liden at oracle.com Thu Oct 24 12:00:28 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 24 Oct 2019 14:00:28 +0200
Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)
In-Reply-To: <521b3b8a-70e6-6fef-cb67-b6327fa08c03@oracle.com>
References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> <9219a118-0c1d-2cee-10e5-f9bb87c72eb9@oracle.com> <521b3b8a-70e6-6fef-cb67-b6327fa08c03@oracle.com>
Message-ID: <1fa725ed-7cc9-68f5-0976-5d588ccfec68@oracle.com>

Hi Sangheon,

On 10/23/19 6:20 PM, sangheon.kim at oracle.com wrote:
> Hi Per,
>
> Thanks for taking a look at this.
>
> I agree all your comments and here's the webrev.
> - All comments from Per.
> - Move G1PageBasedVirtualSpace::page_size() near to page_start() from Kim.
>
> Webrev:
> http://cr.openjdk.java.net/~sangheki/8220310/webrev.6
> http://cr.openjdk.java.net/~sangheki/8220310/webrev.6.inc
> Testing: build test for linux, solaris, windows and mac.
Thanks for fixing. os changes look good to me.

/Per

>
> FYI, as I think existing numa related API names and -1 stuff seem not
> good, I planned to refine those later after pushing. But as you said
> following existing rule and then refine all together later seems better.
>
> Thanks,
> Sangheon
>
>
> On 10/23/19 1:21 AM, Per Liden wrote:
>> Hi Sangheon,
>>
>> I noticed that this patch adds os::numa_get_address_id(). That name is
>> misleading as it doesn't return an "address id", but a "numa node id".
>> However, the terminology used in the os class for numa node is "group"
>> (for example, numa_get_groups_num, numa_get_group_id, etc). So I'd
>> suggest we instead name this os::numa_get_group_id(void* address),
>> i.e. an overload of os::numa_get_group_id().
>>
>> Btw, I think that the numa related names used in the os class are odd,
>> but I guess that are brought over from Solaris. We can refine those at
>> some later time if we want, but for now I think we should follow the
>> naming convention that we have there.
>>
>> Also, I don't think this function should print warnings, as that's up
>> to the caller to decide what to do, what to print, etc.
>>
>> Furthermore, I suggest we remove os::InvalidNUMAId. Other numa
>> functions in the os class returns -1 on error, so I think we should do
>> that here too.
>>
>> Here's a patch with the proposed changes:
>>
>>
>> diff --git a/src/hotspot/os/linux/os_linux.cpp b/src/hotspot/os/linux/os_linux.cpp
>> --- a/src/hotspot/os/linux/os_linux.cpp
>> +++ b/src/hotspot/os/linux/os_linux.cpp
>> @@ -3007,7 +3007,7 @@
>>    return 0;
>>  }
>>
>> -int os::numa_get_address_id(void* address) {
>> +int os::numa_get_group_id(void* address) {
>>  #ifndef MPOL_F_NODE
>>  #define MPOL_F_NODE     (1<<0)  // Return next IL mode instead of node mask
>>  #endif
>> @@ -3016,11 +3016,10 @@
>>  #define MPOL_F_ADDR     (1<<1)  // Look up VMA using address
>>  #endif
>>
>> -  int id = InvalidNUMAId;
>> +  int id = 0;
>>
>>    if (syscall(SYS_get_mempolicy, &id, NULL, 0, address, MPOL_F_NODE | MPOL_F_ADDR) == -1) {
>> -    warning("Failed to get numa id at " PTR_FORMAT " with errno=%d", p2i(address), errno);
>> -    return InvalidNUMAId;
>> +    return -1;
>>    }
>>    return id;
>>  }
>> diff --git a/src/hotspot/share/gc/g1/g1NUMA.cpp b/src/hotspot/share/gc/g1/g1NUMA.cpp
>> --- a/src/hotspot/share/gc/g1/g1NUMA.cpp
>> +++ b/src/hotspot/share/gc/g1/g1NUMA.cpp
>> @@ -164,7 +164,7 @@
>>
>>  uint G1NUMA::index_of_address(HeapWord *address) const {
>>    int numa_id = os::numa_get_address_id((void*)address);
>> -  if (numa_id == os::InvalidNUMAId) {
>> +  if (numa_id == -1) {
>>      return UnknownNodeIndex;
>>    } else {
>>      return index_of_node_id(numa_id);
>> @@ -201,7 +201,7 @@
>>    if (!is_enabled()) {
>>      return;
>>    }
>> -
>> +
>>    if (size_in_bytes == 0) {
>>      return;
>>    }
>> diff --git a/src/hotspot/share/runtime/os.hpp b/src/hotspot/share/runtime/os.hpp
>> --- a/src/hotspot/share/runtime/os.hpp
>> +++ b/src/hotspot/share/runtime/os.hpp
>> @@ -374,10 +374,7 @@
>>    static size_t numa_get_leaf_groups(int *ids, size_t size);
>>    static bool   numa_topology_changed();
>>    static int    numa_get_group_id();
>> -
>> -  static const int InvalidNUMAId = -1;
>> -
>> -  static int numa_get_address_id(void* address);
>> +  static int    numa_get_group_id(void* address);
>>
>>    // Page manipulation
>>    struct page_info {
>>
>>
>> cheers,
>> Per
>>
>>
>> On 10/16/19 7:54 PM, sangheon.kim at oracle.com wrote:
>>> Hi Kim, Stefan and Thomas,
>>>
>>> Many thanks for the reviews and suggestions!
>>>
>>> Kim,
>>> I will move page_size() near page_start() before push as you suggested.
>>> As you know, all 3 patches will be pushed together though.
>>>
>>> Thanks,
>>> Sangheon
>>>
>>>
>>> On 10/16/19 7:00 AM, Kim Barrett wrote:
>>>>> On Oct 15, 2019, at 10:33 AM, sangheon.kim at oracle.com wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Here's revised webrev which addresses:
>>>>> 1) G1RegionToSpaceMapper checks mtJavaHeap and then conditionally
>>>>> calls G1NUMA::request_memory_on_node() (Kim)
>>>>> 2) The signature of G1NUMA::request_memory_on_node(void* address,
>>>>> ,) is changed to have actual address instead of page index. (Stefan)
>>>>> 3) Some local variable name changes at G1RegionToSpaceMapper. i ->
>>>>> region_idx, idx -> page_idx (for local style, used idx instead of
>>>>> index)
>>>>>
>>>>> webrev:
>>>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.5/
>>>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.5.inc/
>>>>> Testing: hs-tier 1 ~ 5, with/without UseNUMA
>>>> Looks good.
>>>>
>>>> In g1PageBasedVirtualSpace.cpp, could the newly added definition of
>>>> page_size() be moved to be near the existing definition of
>>>> page_start()? I don't need a new webrev if you move it.
>>>>
>>>
>

From stefan.johansson at oracle.com Thu Oct 24 12:16:03 2019
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Thu, 24 Oct 2019 14:16:03 +0200
Subject: [PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance
In-Reply-To:
References: <4F02DD53-EA98-4A1A-B871-C6E9D9610B2C@oracle.com> <9B69AFD1-7AE2-4B50-BFCF-C9C6E2594240@oracle.com> <8ef5b52e-d6fc-3073-5ca7-44c87c1eb981@oracle.com> <92277aab-0578-9e2c-3f4f-55ae1e8c94a9@oracle.com> <400df998-171a-5bbe-9f3e-01af1781afb4@oracle.com>
Message-ID:

Hi Haoyu,

On 2019-10-23 17:15, Haoyu Li wrote:
> Hi Stefan,
>
> Thanks for your constructive feedback. I've addressed all the issues you
> mentioned, and the updated patch is attached in this email.

Nice, I will look at the patch next week, but I'll shortly answer your questions right away.

>
> During refining the patch, I have a couple of questions:
> 1) Now the MoveAndUpdateClosure and ShadowClosure assume the destination
> address is the very beginning of a region, instead of an arbitrary
> address like what it used to be. However, there is an unused function
> named PSParallelCompact::move_and_update() uses the MoveAndUpdateClosure
> to process a region from its middle, which conflicts with the
> assumption. I notice that you removed this function in your patch, and
> so did I in the updated patch. Does it matter?

Yes, I found this function during my code review and it should be removed, but I think that should be handled as a separate issue. We can do this removal before this patch goes in.

> 2) Using the same do_addr() in MoveAndUpdateClosure and ShadowClosure is
> doable, but it does not reuse all the code neatly. Because storing the
> address of the shadow region in _destination requires extra virtual
> functions to handle allocating blocks in the start_array and setting
> addresses of deferred objects. In particular, allocate_blocks() and
> set_deferred_object_for() in both closures are added. Is it worth
> avoiding to use _offset to calculate the shadow_destination?

Ok, sounds like it might be better to have specific do_addr() functions then. I'll think some more around this when reviewing the new patch in depth.

>
> If there are any problems with this patch, please contact me anytime.
> I'm more than happy to keep improving the code. Thanks again for reviewing.
>

Sounds good, thanks,
Stefan

> Best,
> Haoyu Li
>
>
> Stefan Johansson wrote on 2019-10-22 at 9:42:
>
> Hi Haoyu,
>
> I've reviewed the patch now and have some comments and questions.
>
> To simplify the review and have a common base to look at I've created a
> webrev at:
> http://cr.openjdk.java.net/~sjohanss/8220465/00/
>
> One general note first, most of the new code uses four space
> indentation, in hotspot the standard is two spaces, please change this.
> Below are some file by file comments.
>
> src/hotspot/share/gc/parallel/psCompactionManager.cpp
> ---
>    53 GrowableArray* ParCompactionManager::_free_shadow = new (ResourceObj::C_HEAP, mtInternal) GrowableArray(10, true);
>    54 Monitor*                ParCompactionManager::_monitor = NULL;
>
> Set _free_shadow to NULL here like the other statics and then create the
> GrowableArray in initialize(). I also think _shadow_region_array or
> something like that would be a better name and the monitor should also
> be named something that signals that it is used for this array.
> ---
>    70   if (_monitor == NULL) {
>    71       _monitor = new Monitor(Mutex::barrier, "CompactionManager monitor",
>    72                              Mutex::_allow_vm_block_flag, Monitor::_safepoint_check_never);
>    73   }
>
> Instead of doing the monitor creation here having to check for NULL, do
> it in initialize() below together with the array creation.
> ---
>
> src/hotspot/share/gc/parallel/psParallelCompact.cpp
> ---
> 2974       if (cur->push()) {
>
> Correct me if I'm wrong, if this call to push() returns true it means
> that nobody else has "stolen" it (used a shadow region to prepare it)
> and we mark it as pushed. But when pushed in this code path this is the
> end state for this RegionData? If this is the case I think it would be
> easier to understand the code if we added another function and state for
> when we "steal" it. Haven't thought very much about the names but I
> think you understand what I want to achieve:
> Normal path:
> UNUSED -> push() -> NORMAL
> Steal path:
> UNUSED -> steal() -> STOLEN -> fill() -> FILLED -> copy() -> SHADOW
>
> We could then also assert in set_completed() that the state is either
> NORMAL or SHADOW (or if they have a shared end state DONE). As I said
> the names can be improved (both for the states and the functions) but I
> think we should have names and not just numbers.
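
The two paths described in the review comment above can be sketched as a small named state machine. This is only an illustration under assumptions: the type and function names (RegionState, RegionClaim, try_push, try_steal, mark_filled, try_copy) are hypothetical and borrow from the naming ideas in this thread, and the sketch uses std::atomic rather than HotSpot's own Atomic primitives, so it is not the code in the patch.

```cpp
#include <atomic>

// Hypothetical state names; the reviewed patch uses plain integers instead.
enum class RegionState { Unused, Normal, Stolen, Filled, Shadow };

// Hypothetical stand-in for the per-region claim state kept in RegionData.
class RegionClaim {
  std::atomic<RegionState> _state{RegionState::Unused};

  // Atomically move from 'from' to 'to'. Fails if the current state is not
  // 'from', e.g. because another worker claimed the region first.
  bool transition(RegionState from, RegionState to) {
    return _state.compare_exchange_strong(from, to);
  }

public:
  // Normal path: UNUSED -> NORMAL.
  bool try_push()    { return transition(RegionState::Unused, RegionState::Normal); }

  // Steal path: UNUSED -> STOLEN -> FILLED -> SHADOW.
  bool try_steal()   { return transition(RegionState::Unused, RegionState::Stolen); }
  bool mark_filled() { return transition(RegionState::Stolen, RegionState::Filled); }
  bool try_copy()    { return transition(RegionState::Filled, RegionState::Shadow); }

  // What an assert in set_completed() could check: one of the two end states.
  bool is_end_state() const {
    RegionState s = _state.load();
    return s == RegionState::Normal || s == RegionState::Shadow;
  }
};
```

With named states, a region claimed through try_push() can no longer be stolen, and a stolen region has to pass through mark_filled() before try_copy() can succeed, which is the ordering the comment asks to make explicit.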
> ---
>
> 3060 template
> 3061 void PSParallelCompact::fill_region(ParCompactionManager* cm, size_t region_idx, size_t shadow, size_t offset)
>
> As I told you this was a big improvement from the first patch, but I
> think there is room for even more improvements around the way we pass in
> ignored parameters to MoveAndUpdateClosure. Explaining my idea in text
> is harder than code, so I created a patch, what do you think about this?
> http://cr.openjdk.java.net/~sjohanss/8220465/00-alt/
>
> This alternative is based on 00 and does not take my other comments into
> consideration. So it might have to be altered a bit if you address some
> of my other comments/questions.
> ---
>
> 3196 void PSParallelCompact::copy_back(HeapWord *region_addr, HeapWord *shadow_addr) {
>
> I think the parameters should change place, so that they correspond with
> the copy below.
> ---
>
> 3200 bool PSParallelCompact::steal_shadow_region(ParCompactionManager* cm, size_t &region_idx) {
> 3201     size_t& record = cm->shadow_record();
>
> Did you consider to just let shadow_record() be a simple getter instead
> of getting a reference and then have a next_shadow_record() which
> advances it by active_workers?
> ---
>
> 3236 void PSParallelCompact::initialize_steal_record(uint which) {
>
> I'm having a hard time understanding the details here, or I get that all
> threads should have a separate shadow record, but I'm not sure why it is
> not enough to just do:
> size_t record = _summary_data.addr_to_region_idx(
>    _space_info[old_space_id].dense_prefix());
> cm->set_shadow_record(record + which);
>
> As you can see I'm also suggesting adding a setter for shadow_record.
> ---
>
> 3434 ParMarkBitMapClosure::IterationStatus
> 3435 ShadowClosure::do_addr(HeapWord* addr, size_t words) {
> 3436     HeapWord* shadow_destination = destination() + _offset;
>
> Using an offset instead of a given address feels a bit backwards, did
> you consider letting the closure keep and update a _shadow_destination
> instead? Or would it even be possible to just set destination to be the
> shadow region address? In that case it should be possible to just use
> the do_addr and other functions from the MoveAndUpdateClosure.
>
> I see from looking at this particular function that there is one assert
> that would have to change:
> 3408 assert(PSParallelCompact::summary_data().calc_new_pointer(source(), compaction_manager()) ==
> 3409          destination(), "wrong destination");
>
> This should be easily fixed by adding a virtual function
> check_destination, that has a special implementation for the
> ShadowClosure.
> ---
>
> src/hotspot/share/gc/parallel/psParallelCompact.hpp
> ---
>   333     // Preempt the region to avoid double processes
>   334     inline bool push();
>   335     // Mark the region as filled and ready to be copied back
>   336     inline bool fill();
>   337     // Preempt the region to copy the shadow region content back
>   338     inline bool copy();
>
> As mentioned, I think there might be better names for those functions
> and the comments. Maybe adding a prefix would make the code more self
> explaining. try_push(), mark_filled(), try_copy() and the new
> try_steal().
> ---
>
> Thanks again for providing this patch, I look forward to see an updated
> version.
>
> Cheers,
> Stefan
>
>
> On 2019-10-14 15:00, Stefan Johansson wrote:
> > Thanks for the quick update Haoyu,
> >
> > This is a great improvement and I will try to find time to look into the
> > patch in more detail the coming weeks.
> >
> > Thanks,
> > Stefan
> >
> > On 2019-10-11 14:49, Haoyu Li wrote:
> >> Hi Stefan,
> >>
> >> Thanks for your suggestion! It is very redundant that
> >> PSParallelCompact::fill_shadow_region() copies most code from
> >> PSParallelCompact::fill_region(), and therefore I've refactored these
> >> two functions to share as much code as possible. And the attachment is
> >> the updated patch.
> >>
> >> Specifically, the closure, which moves objects, in
> >> PSParallelCompact::fill_region() is now declared as a template of
> >> either MoveAndUpdateClosure or ShadowClosure. So by controlling the
> >> type of closure when invoking the function, we can decide whether to
> >> fill a normal region or a shadow one. Thus, almost all code in
> >> PSParallelCompact::fill_region() can be reused.
> >>
> >> Besides, a virtual function named complete_region() is added in both
> >> closures to do some work after the filling, such as setting states and
> >> copying the shadow region back.
> >>
> >> Thanks again for reviewing the patch, looking forward to your insights
> >> and suggestions!
> >>
> >> Best Regards,
> >> Haoyu Li
> >>
> >> 2019-10-10 21:50 GMT+08:00, Stefan Johansson:
> >>> Thanks for the clarification =)
> >>>
> >>> Moving on to the next part, the code in the patch. So this won't be a
> >>> full review of the patch but just an initial comment that I would like
> >>> to be addressed first.
> >>>
> >>> The new function PSParallelCompact::fill_shadow_region() is more or less
> >>> a copy of PSParallelCompact::fill_region() and I understand that from a
> >>> proof of concept point of view it was the easy (and right) way to do it.
> >>> I would prefer if the code could be refactored so that fill_region() and
> >>> fill_shadow_region() share more code. There might be reasons that I've
> >>> missed, that prevents it, but we should at least explore how much code
> >>> can be shared.
> >>>
> >>> Thanks,
> >>> Stefan
> >>>
> >>> On 2019-10-10 15:10, Haoyu Li wrote:
> >>>> Hi Stefan,
> >>>>
> >>>> Thanks for your quick response!
As to your concern about the > OCA, I am > >>>> the sole author of the patch. And it is the case as what the > agreement > >>>> states. > >>>> Best Regrads, > >>>> Haoyu Li, > >>>> > >>>> > >>>> Stefan Johansson > >>>> >> ?2019?10?10??? ??8:37 > >>>> ??? > >>>> > >>>> ???? Hi, > >>>> > >>>> ???? On 2019-10-10 13:06, Haoyu Li wrote: > >>>> ????? > Hi Stefan, > >>>> ????? > > >>>> ????? > Thanks for your testing! One possible reason for the > >>>> regressions > >>>> in > >>>> ????? > simple tests is that the region dependencies maybe not > heavy > >>>> enough. > >>>> ????? > Because the locality of shadow regions is lower than > that of > >>>> heap > >>>> ????? > regions, writing to shadow regions will be slower than to > >>>> normal > >>>> ????? > regions, and this is a part of the reason why I reuse > shadow > >>>> ???? regions. > >>>> ????? > Therefore, if only a few shadow regions are created > and not > >>>> ???? reused, the > >>>> ????? > overhead may not be amortized. > >>>> > >>>> ???? I guess it is something like this. I thought that for > "easy" heaps > >>>> the > >>>> ???? shadow regions won't be used at all, and should therefor not > >>>> really > >>>> ???? cost > >>>> ???? anything. > >>>> > >>>> ????? > > >>>> ????? > As to the OCA, it is the case that I'm the only person > >>>> signing the > >>>> ????? > agreement. Please let me know if you have any further > >>>> questions. > >>>> ???? Thanks > >>>> ????? > again! > >>>> > >>>> ???? Ok, so you are the sole author of the patch. The important > >>>> part, as > >>>> the > >>>> ???? agreement states, is: > >>>> ???? "no other person or entity, including my employer, has or > will > >>>> have > >>>> ???? rights with respect my contributions" > >>>> > >>>> ???? Is that the case? > >>>> > >>>> ???? Thanks, > >>>> ???? Stefan > >>>> > >>>> ????? > > >>>> ????? > Best Regrads, > >>>> ????? > Haoyu Li > >>>> ????? > > >>>> ????? > Stefan Johansson > >>>> ???? > > >>>> ????? > > >>>> ???? >>> ?2019?10?8??? ?? 
> >>>> 6:49 > >>>> ???? ??? > >>>> ????? > > >>>> ????? >???? Hi Haoyu, > >>>> ????? > > >>>> ????? >???? I've done some more testing and I haven't seen any > issues > >>>> ???? with the > >>>> ????? >???? patch > >>>> ????? >???? so far and the performance looks promising in most > >>>> cases. For > >>>> ???? simple > >>>> ????? >???? tests I've seen some regressions, but I'm not > really sure > >>>> ???? why. Will do > >>>> ????? >???? some more digging. > >>>> ????? > > >>>> ????? >???? To move forward with this the first thing we need > to do is > >>>> ???? making sure > >>>> ????? >???? that you being covered by the Oracle Contributor > >>>> Agreement is > >>>> ???? enough. > >>>> ????? >?????? From what we can see it is only you as an > individual that > >>>> ???? has signed > >>>> ????? >???? the OCA and in that case it is important that this > >>>> statement > >>>> ???? from the > >>>> ????? >???? OCA is fulfilled: "no other person or entity, > including my > >>>> ???? employer, > >>>> ????? >???? has > >>>> ????? >???? or will have rights with respect my contributions" > >>>> ????? > > >>>> ????? >???? Is this the case for this contribution or should > we have > >>>> the > >>>> ???? university > >>>> ????? >???? sign the OCA as well? For more information > regarding the > >>>> OCA > >>>> ???? please > >>>> ????? >???? refer to: > >>>> ????? > https://www.oracle.com/technetwork/oca-faq-405384.pdf > >>>> ????? > > >>>> ????? >???? Thanks, > >>>> ????? >???? Stefan > >>>> ????? > > >>>> ????? >???? On 2019-09-16 16:02, Haoyu Li wrote: > >>>> ????? >????? > FYI, the evaluation results on OpenJDK 14 are > plotted in > >>>> the > >>>> ????? >???? attachment. > >>>> ????? >????? > I compute the full GC throughput by dividing > the heap > >>>> size > >>>> ???? before > >>>> ????? >???? full > >>>> ????? >????? > GC by the GC pause time, and the results are > arithmetic > >>>> mean > >>>> ????? >???? values of > >>>> ????? >????? > ten runs after a warm-up run. 
> >>>>   > > The evaluation is conducted on a machine with dual Intel Xeon
> >>>>   > > E5-2618L v3 CPUs (2 sockets, 16 physical cores with SMT
> >>>>   > > enabled) and 64G DRAM.
> >>>>   > >
> >>>>   > > Best Regards,
> >>>>   > > Haoyu Li,
> >>>>   > > Institute of Parallel and Distributed Systems (IPADS),
> >>>>   > > School of Software,
> >>>>   > > Shanghai Jiao Tong University
> >>>>   > >
> >>>>   > >
> >>>>   > > Stefan Johansson wrote on 2019-09-12 at 5:34:
> >>>>   > >
> >>>>   > >     Hi Haoyu,
> >>>>   > >
> >>>>   > >     I recently came across your patch and I would like to pick
> >>>>   > >     up on some of the things Kim mentioned in his mails. I
> >>>>   > >     especially want to evaluate and investigate if this is a
> >>>>   > >     technique we can use to improve the other GCs as well. To
> >>>>   > >     start that work I want to take the patch for a spin in our
> >>>>   > >     internal performance testing. The patch doesn't apply
> >>>>   > >     cleanly to the latest JDK repository, so if you could
> >>>>   > >     provide an updated patch that would be very helpful.
> >>>>   > >
> >>>>   > >     It would also be great if you could share some more
> >>>>   > >     information around the results presented in the paper. For
> >>>>   > >     example, it would be good to get the full command lines for
> >>>>   > >     the different benchmarks so we can run them locally and
> >>>>   > >     reproduce the results you've seen.
> >>>>   > >
> >>>>   > >     Thanks,
> >>>>   > >     Stefan
> >>>>   > >
> >>>>   > >>     On 12 March 2019 at 03:21, Haoyu Li wrote:
> >>>>   > >>
> >>>>   > >>     Hi Kim,
> >>>>   > >>
> >>>>   > >>     Thanks for reviewing and testing the patch. If there are
> >>>>   > >>     any failures or performance degradation relevant to the
> >>>>   > >>     work, please let me know and I'll be very happy to keep
> >>>>   > >>     improving it. Also, any suggestions about code
> >>>>   > >>     improvements are much appreciated.
> >>>>   > >>
> >>>>   > >>     I'm not quite sure whether G1 and Shenandoah have a
> >>>>   > >>     similar region dependency issue, since I haven't studied
> >>>>   > >>     their GC behaviors before. If they do, I'm also willing
> >>>>   > >>     to propose a more general optimization.
> >>>>   > >>
> >>>>   > >>     As to the memory overhead, I believe it will be low
> >>>>   > >>     because this patch exploits empty regions in the young
> >>>>   > >>     space rather than off-heap memory to allocate shadow
> >>>>   > >>     regions, and also reuses the _source_region field of each
> >>>>   > >>     RegionData to record the corresponding shadow region
> >>>>   > >>     index. We only introduce a new integer field _shadow in
> >>>>   > >>     the RegionData class to indicate the status of a region,
> >>>>   > >>     a global GrowableArray _free_shadow to store the indices
> >>>>   > >>     of shadow regions, and a global Monitor to protect the
> >>>>   > >>     array. This information might help if the memory overhead
> >>>>   > >>     needs to be evaluated.
> >>>>   > >>
> >>>>   > >>     Looking forward to your insight.
> >>>>   > >>
> >>>>   > >>     Best Regards,
> >>>>   > >>     Haoyu Li,
> >>>>   > >>     Institute of Parallel and Distributed Systems (IPADS),
> >>>>   > >>     School of Software,
> >>>>   > >>     Shanghai Jiao Tong University
> >>>>   > >>
> >>>>   > >>
> >>>>   > >>     Kim Barrett wrote on 2019-03-12 at 6:11:
> >>>>   > >>
> >>>>   > >>         > On Mar 11, 2019, at 1:45 AM, Kim Barrett wrote:
> >>>>   > >>         >
> >>>>   > >>         >> On Jan 24, 2019, at 3:58 AM, Haoyu Li wrote:
> >>>>   > >>         >>
> >>>>   > >>         >> Hi Kim,
> >>>>   > >>         >>
> >>>>   > >>         >> I have ported my patch to OpenJDK 13 according to
> >>>>   > >>         >> your instructions in your last mail, and the patch
> >>>>   > >>         >> is attached to this mail. The patch does not change
> >>>>   > >>         >> much since PSGC is indeed pretty stable.
> >>>>   > >>         >>
> >>>>   > >>         >> Also, I evaluated the correctness and performance
> >>>>   > >>         >> of PS full GC with benchmarks from the DaCapo,
> >>>>   > >>         >> SPECjvm2008, and JOlden suites on a machine with
> >>>>   > >>         >> dual Intel Xeon E5-2618L v3 CPUs (16 physical
> >>>>   > >>         >> cores), 64G DRAM and Linux kernel 4.17. The
> >>>>   > >>         >> evaluation result, indicating 1.9X GC throughput
> >>>>   > >>         >> improvement on average, is attached, too.
> >>>>   > >>         >>
> >>>>   > >>         >> However, I have no idea how to further test this
> >>>>   > >>         >> patch for both correctness and performance. Can I
> >>>>   > >>         >> please get any guidance from you or some sponsor?
> >>>>   > >>         >
> >>>>   > >>         > Sorry I missed that you had sent an updated version
> >>>>   > >>         > of the patch.
> >>>>   > >>         >
> >>>>   > >>         > I've run the full regression suite across
> >>>>   > >>         > Oracle-supported platforms.  There are some
> >>>>   > >>         > failures, but there are almost always some failures
> >>>>   > >>         > in the later tiers right now.  I'll start looking at
> >>>>   > >>         > them tomorrow to figure out whether any of them are
> >>>>   > >>         > relevant.
> >>>>   > >>         >
> >>>>   > >>         > I'm also planning to run some of our performance
> >>>>   > >>         > benchmarks.
> >>>>   > >>         >
> >>>>   > >>         > I've lightly skimmed the proposed changes.  There
> >>>>   > >>         > might be some code improvements to be made.
> >>>>   > >>         >
> >>>>   > >>         > I'm also wondering if this technique applies to
> >>>>   > >>         > other collectors.  It seems like both G1 and
> >>>>   > >>         > Shenandoah full gc's might have similar issues?  If
> >>>>   > >>         > so, a solution that is ParallelGC-specific is less
> >>>>   > >>         > interesting than one that has broader
> >>>>   > >>         > applicability.  Though maybe this optimization is
> >>>>   > >>         > less important for G1 and Shenandoah, since they
> >>>>   > >>         > actively try to avoid full gc's.
> >>>>   > >>         >
> >>>>   > >>         > I'm also not clear on how much additional memory
> >>>>   > >>         > might be temporarily allocated by this mechanism.
> >>>>   > >>
> >>>>   > >>         I've created a CR for this:
> >>>>   > >>         https://bugs.openjdk.java.net/browse/JDK-8220465
> >>>>   > >>
> >>>>   > >
> >>>>   >
> >>>>
> >>>
> >>
> >>
>

From per.liden at oracle.com  Thu Oct 24 15:47:38 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 24 Oct 2019 17:47:38 +0200
Subject: RFR: 8224817: Implementation of JEP 364: ZGC on macOS
In-Reply-To:
References:
Message-ID:

Hi,

On 10/24/19 12:38 PM, erik.osterlund at oracle.com wrote:
> Hi,
>
> Now that some curling has been performed, paving way for this patch:
>
>     8229027: Improve how JNIHandleBlock::oops_do distinguishes oops from non-oops
>     8229278: Improve hs_err location printing to assume less about GC internals
>     8229189: Improve JFR leak profiler tracing to deal with discontiguous heaps
>     8224815: Remove non-GC uses of CollectedHeap::is_in_reserved()
>     8224820: ZGC: Support discontiguous heap reservations
>
> ...the remaining thing to do is plugging in a few platform specific ZGC
> files. This patch does that.
> Decided to go with mach_vm_map/mach_vm_remap to implement multi-mapping.
> Previously I didn't want to do that as I couldn't figure out how to
> mach_vm_remap memory on top of reserved VA (acquired using mmap). But
> apparently VM_FLAGS_OVERWRITE was the missing ingredient there. With
> that in place, dodging the terrible ftruncate implementation on macOS
> seemed like a good idea. That also implies this port supports large
> pages (unlike other GCs on macOS today). Yay!
>
> CR:
> http://cr.openjdk.java.net/~eosterlund/8224817/webrev.00/

As I've pre-reviewed this code, all my comments have already been
addressed. Looks super!

/Per

>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8229358
>
> Thanks,
> /Erik

From thomas.schatzl at oracle.com  Thu Oct 24 15:57:12 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 24 Oct 2019 17:57:12 +0200
Subject: RFR: 8224817: Implementation of JEP 364: ZGC on macOS
In-Reply-To:
References:
Message-ID: <788a7929-28bc-23c3-d89e-0ced7286fc82@oracle.com>

Hi,

On 24.10.19 17:47, Per Liden wrote:
> Hi,
>
> On 10/24/19 12:38 PM, erik.osterlund at oracle.com wrote:
>> Hi,
>>
>> Now that some curling has been performed, paving way for this patch:
>>
>>     8229027: Improve how JNIHandleBlock::oops_do distinguishes oops from non-oops
>>     8229278: Improve hs_err location printing to assume less about GC internals
>>     8229189: Improve JFR leak profiler tracing to deal with discontiguous heaps
>>     8224815: Remove non-GC uses of CollectedHeap::is_in_reserved()
>>     8224820: ZGC: Support discontiguous heap reservations
>>
>> ...the remaining thing to do is plugging in a few platform specific
>> ZGC files. This patch does that.
>> Decided to go with mach_vm_map/mach_vm_remap to implement
>> multi-mapping. Previously I didn't want to do that as I couldn't
>> figure out how to mach_vm_remap memory on top of reserved VA (acquired
>> using mmap). But apparently VM_FLAGS_OVERWRITE was the missing
>> ingredient there. With that in place, dodging the terrible ftruncate
>> implementation on macOS seemed like a good idea. That also implies
>> this port supports large pages (unlike other GCs on macOS today). Yay!

Not completely related and not a review:

Please file an RFE with a link to this mechanism. It would be nice to do
such changes in a generic way so that all collectors benefit in the
future, not just one.

It's confusing as it already is, with one collector supporting this and
other collectors supporting that; adding to that is not nice.

Thanks,
   Thomas

From erik.osterlund at oracle.com  Thu Oct 24 16:27:21 2019
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Thu, 24 Oct 2019 18:27:21 +0200
Subject: RFR: 8224817: Implementation of JEP 364: ZGC on macOS
In-Reply-To:
References:
Message-ID:

Hi Per,

Thanks for the review.

/Erik

> On 24 Oct 2019, at 17:47, Per Liden wrote:
>
> Hi,
>
>> On 10/24/19 12:38 PM, erik.osterlund at oracle.com wrote:
>> Hi,
>> Now that some curling has been performed, paving way for this patch:
>>     8229027: Improve how JNIHandleBlock::oops_do distinguishes oops from non-oops
>>     8229278: Improve hs_err location printing to assume less about GC internals
>>     8229189: Improve JFR leak profiler tracing to deal with discontiguous heaps
>>     8224815: Remove non-GC uses of CollectedHeap::is_in_reserved()
>>     8224820: ZGC: Support discontiguous heap reservations
>> ...the remaining thing to do is plugging in a few platform specific
>> ZGC files. This patch does that.
>> Decided to go with mach_vm_map/mach_vm_remap to implement
>> multi-mapping. Previously I didn't want to do that as I couldn't
>> figure out how to mach_vm_remap memory on top of reserved VA (acquired
>> using mmap). But apparently VM_FLAGS_OVERWRITE was the missing
>> ingredient there. With that in place, dodging the terrible ftruncate
>> implementation on macOS seemed like a good idea. That also implies
>> this port supports large pages (unlike other GCs on macOS today). Yay!
>> CR:
>> http://cr.openjdk.java.net/~eosterlund/8224817/webrev.00/
>
> As I've pre-reviewed this code, all my comments have already been
> addressed. Looks super!
>
> /Per
>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8229358
>> Thanks,
>> /Erik

From stefan.karlsson at oracle.com  Thu Oct 24 16:36:42 2019
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Thu, 24 Oct 2019 18:36:42 +0200
Subject: RFR: 8232604: ZGC: Make ZVerifyViews mapping and unmapping precise
Message-ID: <813a25f8-4540-859d-502d-8ffdfc503f72@oracle.com>

Hi all,

Please review this patch to make the ZVerifyViews mapping and unmapping
precise.

https://cr.openjdk.java.net/~stefank/8232604/webrev.01/
https://bugs.openjdk.java.net/browse/JDK-8232604

Today, when the ZVerifyViews flag is turned on, we unmap all bad views.
The intention is to catch stray-pointer bugs.

The current implementation takes a shortcut and unmaps all memory en
masse. This works for Linux, but not on Windows, where we must be
precise in what we unmap.

There are three places where allocated pages are registered today:
- In the page table - actively used
- In the page cache - free pages waiting to be used
- In-flight from the alloc queue

The proposed patch registers all satisfied alloc requests, lets the
requesting threads deregister the satisfied request when the page is
received, and makes sure that the GC visits all in-flight satisfied
alloc requests when it performs the ZVerifyViews flip.
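
[Editorial illustration, not part of the original message.] The bookkeeping described above might be sketched roughly as follows. This is not the code in the webrev: every name here (`Page`, `SatisfiedRequests`, `register_page`, `deregister_page`, `pages_do`) is hypothetical. The sketch assumes a lock-protected list that the allocator appends to when it satisfies a request, that the requesting thread removes its entry from once it has received the page, and that the verification flip iterates over so it can also reach pages that are in neither the page table nor the page cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a heap page; this is not ZGC's ZPage.
struct Page {
  int id;
};

// Hypothetical registry of pages that have been handed to a requester
// but not yet picked up by it ("in-flight" satisfied requests).
class SatisfiedRequests {
public:
  // Called by the allocator when it satisfies a request.
  void register_page(Page* page) {
    std::lock_guard<std::mutex> guard(_lock);
    _pages.push_back(page);
  }

  // Called by the requesting thread once it has received the page.
  void deregister_page(Page* page) {
    std::lock_guard<std::mutex> guard(_lock);
    _pages.erase(std::remove(_pages.begin(), _pages.end(), page),
                 _pages.end());
  }

  // Called during the verification flip to visit every in-flight page.
  template <typename Visitor>
  void pages_do(Visitor visitor) const {
    std::lock_guard<std::mutex> guard(_lock);
    for (Page* page : _pages) {
      visitor(page);
    }
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _pages.size();
  }

private:
  mutable std::mutex _lock;
  std::vector<Page*> _pages;
};
```

In this sketch a flip would call `pages_do` with a visitor that maps or unmaps each visited page individually, in addition to walking the page table and the page cache, so no allocated page is missed or touched twice.
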

Thanks,
StefanK

From erik.osterlund at oracle.com  Thu Oct 24 16:39:34 2019
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Thu, 24 Oct 2019 18:39:34 +0200
Subject: RFR: 8224817: Implementation of JEP 364: ZGC on macOS
In-Reply-To: <788a7929-28bc-23c3-d89e-0ced7286fc82@oracle.com>
References: <788a7929-28bc-23c3-d89e-0ced7286fc82@oracle.com>
Message-ID: <27583414-08E0-4AEE-A779-F6F8CE2C0F0B@oracle.com>

Hi Thomas,

Sure, I can file an RFE. For anonymous mmapped memory, a seemingly
undocumented feature is that you can pass in superpage flags for the
mach VM system via the file descriptor parameter. Anyway, I will detail
it in the RFE.

/Erik

> On 24 Oct 2019, at 17:57, Thomas Schatzl wrote:
>
> Hi,
>
>> On 24.10.19 17:47, Per Liden wrote:
>> Hi,
>>> On 10/24/19 12:38 PM, erik.osterlund at oracle.com wrote:
>>> Hi,
>>>
>>> Now that some curling has been performed, paving way for this patch:
>>>
>>>     8229027: Improve how JNIHandleBlock::oops_do distinguishes oops from non-oops
>>>     8229278: Improve hs_err location printing to assume less about GC internals
>>>     8229189: Improve JFR leak profiler tracing to deal with discontiguous heaps
>>>     8224815: Remove non-GC uses of CollectedHeap::is_in_reserved()
>>>     8224820: ZGC: Support discontiguous heap reservations
>>>
>>> ...the remaining thing to do is plugging in a few platform specific
>>> ZGC files. This patch does that.
>>> Decided to go with mach_vm_map/mach_vm_remap to implement
>>> multi-mapping. Previously I didn't want to do that as I couldn't
>>> figure out how to mach_vm_remap memory on top of reserved VA
>>> (acquired using mmap). But apparently VM_FLAGS_OVERWRITE was the
>>> missing ingredient there. With that in place, dodging the terrible
>>> ftruncate implementation on macOS seemed like a good idea. That also
>>> implies this port supports large pages (unlike other GCs on macOS
>>> today). Yay!
>
> Not completely related and not a review:
>
> Please file an RFE with a link to this mechanism. It would be nice to
> do such changes in a generic way so that all collectors benefit in the
> future, not just one.
>
> It's confusing as it already is, with one collector supporting this and
> other collectors supporting that; adding to that is not nice.
>
> Thanks,
>  Thomas

From per.liden at oracle.com  Thu Oct 24 19:40:18 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 24 Oct 2019 21:40:18 +0200
Subject: RFR: 8232604: ZGC: Make ZVerifyViews mapping and unmapping precise
In-Reply-To: <813a25f8-4540-859d-502d-8ffdfc503f72@oracle.com>
References: <813a25f8-4540-859d-502d-8ffdfc503f72@oracle.com>
Message-ID:

Looks good! Just one minor nit:

   ZVerifyViewsFlip(ZPageAllocator* allocator);

could become:

   ZVerifyViewsFlip(const ZPageAllocator* allocator);

I don't need to see a new webrev.

cheers,
Per

On 10/24/19 6:36 PM, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to make the ZVerifyViews mapping and unmapping
> precise.
>
> https://cr.openjdk.java.net/~stefank/8232604/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8232604
>
> Today, when the ZVerifyViews flag is turned on, we unmap all bad views.
> The intention is to catch stray-pointer bugs.
>
> The current implementation takes a short-cut and unmap all memory en
> masse. This works for Linux, but not on Windows, where we must be
> precise in what we unmap.
>
> There are three places where allocated pages are registered today:
> - In the page table - actively used
> - In the page cache - free pages waiting to be used
> - In-flight from the alloc queue
>
> The proposed patch registers all satisfied alloc requests, lets the
> requesting threads deregister the satisfied request when the page is
> received, and makes sure that the GC visits all in-flight satisfied
> alloc requests when it performs the ZVerifyViews flip.
> > Thanks, > StefanK From stefan.karlsson at oracle.com Thu Oct 24 19:41:31 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 24 Oct 2019 21:41:31 +0200 Subject: RFR: 8232604: ZGC: Make ZVerifyViews mapping and unmapping precise In-Reply-To: References: <813a25f8-4540-859d-502d-8ffdfc503f72@oracle.com> Message-ID: <84f91120-aa7c-1981-01ad-b7edb9cf643d@oracle.com> OK. Thanks! StefanK On 2019-10-24 21:40, Per Liden wrote: > Looks good! Just one minor nit: > > ? ZVerifyViewsFlip(ZPageAllocator* allocator); > > could become: > > ? ZVerifyViewsFlip(const ZPageAllocator* allocator); > > I don't need to see a new webrev. > > cheers, > Per > > On 10/24/19 6:36 PM, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to make the ZVerifyViews mapping and >> unmapping precise. >> >> https://cr.openjdk.java.net/~stefank/8232604/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8232604 >> >> Today, when the ZVerifyViews flag is turned on, we unmap all bad >> views. The intention is to catch stray-pointer bugs. >> >> The current implementation takes a short-cut and unmap all memory en >> masse. This works for Linux, but not on Windows, where we must be >> precise in what we unmap. >> >> There are three places where allocated pages are registered today: >> - In the page table - actively used >> - In the page cache - free pages waiting to be used >> - In-flight from the alloc queue >> >> The proposed patch registers all satisfied alloc requests, lets the >> requesting threads deregister the satisfied request when the page is >> received, and makes sure that the GC visits all in-flight satisfied >> alloc requests when it performs the ZVerifyViews flip. 
>> >> Thanks, >> StefanK From kim.barrett at oracle.com Thu Oct 24 23:05:59 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 24 Oct 2019 19:05:59 -0400 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <521b3b8a-70e6-6fef-cb67-b6327fa08c03@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> <9219a118-0c1d-2cee-10e5-f9bb87c72eb9@oracle.com> <521b3b8a-70e6-6fef-cb67-b6327fa08c03@oracle.com> Message-ID: <0A9D98F3-479D-421D-A5E0-0AB8BB203717@oracle.com> > On Oct 23, 2019, at 12:20 PM, sangheon.kim at oracle.com wrote: > > Hi Per, > > Thanks for taking a look at this. > > I agree all your comments and here's the webrev. > - All comments from Per. > - Move G1PageBasedVirtualSpace::page_size() near to page_start() from Kim. > > Webrev: > http://cr.openjdk.java.net/~sangheki/8220310/webrev.6 > http://cr.openjdk.java.net/~sangheki/8220310/webrev.6.inc > Testing: build test for linux, solaris, windows and mac. > > FYI, as I think existing numa related API names and -1 stuff seem not good, I planned to refine those later after pushing. But as you said following existing rule and then refine all together later seems better. The type of the argument for numa_get_group_id(void* address) should be "const void*". Sorry I didn't notice that earlier. Of course, this will require a const_cast to remove the const qualifier when calling get_mempolicy, but it is better to isolate the workaround for that missing qualifier to that one place. I'm not sure I like the overload for os::numa_get_group_id. 
While both are getting the numa id associated with something, the associations involved seem pretty different to me. Spelling them out, they could be numa_get_group_id_for_current_thread() numa_get_group_id_for_address(const void* address) Those seem semantically unrelated to me, so violate the usual guidance of only overloading operations that are roughly equivalent (*). Or put another way, one should not need to determine which overload is selected to understand a call site. Of course, "roughly equivalent" is in the eye of the beholder. (*) Operator overloading sometimes violates this on the basis that the syntactic concision of using operators is more important, and there are a limited set of operators. Such violations are often used as an argument against using operator overloading at all. From dean.long at oracle.com Fri Oct 25 00:37:56 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 24 Oct 2019 17:37:56 -0700 Subject: RFR: 8231955: ARM32: Address displacement is 0 for volatile field access because of Unsafe field access. In-Reply-To: <20191015073212.7FCCA319074@aojmv0009> References: <20191010143426.BA4B6319F46@aojmv0009> <20191015073212.7FCCA319074@aojmv0009> Message-ID: The shared code used to call generate_address(), which correctly handles various displacements, but I guess it got lost in the barrier refactoring in jdk11.? I think the correct fix is for the caller to use generate_address() again.? CCing GC alias. Alternatively, the arm code could call generate_address rather than changing the shared code. dl On 10/15/19 12:30 AM, christoph.goettschkes at microdoc.com wrote: > Is there anyone who could take a look at this change and give feedback > please? 
> > Thanks, > Christoph > > "hotspot-compiler-dev" > wrote on 2019-10-10 16:29:11: > >> From: christoph.goettschkes at microdoc.com >> To: hotspot-compiler-dev at openjdk.java.net >> Date: 2019-10-10 16:35 >> Subject: RFR: 8231955: ARM32: Address displacement is 0 for volatile > field >> access because of Unsafe field access. >> Sent by: "hotspot-compiler-dev" > >> Hi, >> >> please review the following changeset. This patch fixes the volatile > field >> access for 32-bit ARM. The functions LIRGenerator::volatile_field_store >> and LIRGenerator::volatile_field_load both assume that the displacement >> for the given address is always 0. Both use the given address and pass > the >> values to add_large_constant() [1], which asserts that the given >> displacement is not 0. The change does not call add_large_constant if > the >> given displacement is 0. The displacement can be 0, because of the >> implementation of the unsafe intrinsics. This happens, because the > offset >> into the object from which the field is accessed is not a constant > value. >> This fixes the hotspot tier1 tests mentioned in the issue. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8231955 >> Webrev: https://cr.openjdk.java.net/~bulasevich/8231955/webrev.00/ >> >> Thanks, >> Christoph >> >> [1] >> > https://hg.openjdk.java.net/jdk/jdk/file/30a9612a657d/src/hotspot/cpu/arm/ >> c1_LIRGenerator_arm.cpp#l166 >> From dean.long at oracle.com Fri Oct 25 00:55:52 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 24 Oct 2019 17:55:52 -0700 Subject: RFR: 8231955: ARM32: Address displacement is 0 for volatile field access because of Unsafe field access. In-Reply-To: References: <20191010143426.BA4B6319F46@aojmv0009> <20191015073212.7FCCA319074@aojmv0009> Message-ID: <587f6363-bbdc-da12-9e50-82acc5bc5853@oracle.com> I see now that BarrierSetC1::resolve_address() is calling generate_address(), at least when access isn't patched.? 
So now I'm thinking that the address passed to
volatile_field_load/volatile_field_store should be correct, and the call
to add_large_constant() isn't necessary.

dl

On 10/24/19 5:37 PM, dean.long at oracle.com wrote:
> The shared code used to call generate_address(), which correctly
> handles various displacements, but I guess it got lost in the barrier
> refactoring in jdk11.  I think the correct fix is for the caller to
> use generate_address() again.  CCing GC alias. Alternatively, the arm
> code could call generate_address rather than changing the shared code.
>
> dl
>
> On 10/15/19 12:30 AM, christoph.goettschkes at microdoc.com wrote:
>> Is there anyone who could take a look at this change and give feedback
>> please?
>>
>> Thanks,
>>    Christoph
>>
>> "hotspot-compiler-dev" wrote on 2019-10-10 16:29:11:
>>
>>> From: christoph.goettschkes at microdoc.com
>>> To: hotspot-compiler-dev at openjdk.java.net
>>> Date: 2019-10-10 16:35
>>> Subject: RFR: 8231955: ARM32: Address displacement is 0 for volatile
>>> field access because of Unsafe field access.
>>> Sent by: "hotspot-compiler-dev"
>>>
>>> Hi,
>>>
>>> please review the following changeset. This patch fixes the volatile
>>> field access for 32-bit ARM. The functions
>>> LIRGenerator::volatile_field_store and
>>> LIRGenerator::volatile_field_load both assume that the displacement
>>> for the given address is always 0. Both use the given address and
>>> pass the values to add_large_constant() [1], which asserts that the
>>> given displacement is not 0. The change does not call
>>> add_large_constant if the given displacement is 0. The displacement
>>> can be 0, because of the implementation of the unsafe intrinsics.
>>> This happens, because the offset into the object from which the field
>>> is accessed is not a constant value.
>>> This fixes the hotspot tier1 tests mentioned in the issue.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231955
>>> Webrev: https://cr.openjdk.java.net/~bulasevich/8231955/webrev.00/
>>>
>>> Thanks,
>>>    Christoph
>>>
>>> [1] https://hg.openjdk.java.net/jdk/jdk/file/30a9612a657d/src/hotspot/cpu/arm/c1_LIRGenerator_arm.cpp#l166
>>>
>

From sakamoto.osamu at nttcom.co.jp  Fri Oct 25 08:53:35 2019
From: sakamoto.osamu at nttcom.co.jp (Osamu Sakamoto)
Date: Fri, 25 Oct 2019 17:53:35 +0900
Subject: Segmentation Fault occurs when ClassLoader and Metaspace is released in JDK 8
In-Reply-To: <1ccb4f35-7f21-4aa2-4cbb-b75244b6d12d@oss.nttdata.com>
References: <422c9ca2-5053-c761-cb61-f075877bb666@oss.nttdata.com> <314f9ad2-17df-1082-8816-7af73a96e9fb@nttcom.co.jp_1> <1ccb4f35-7f21-4aa2-4cbb-b75244b6d12d@oss.nttdata.com>
Message-ID:

Hi Yasumasa,

> I guess this is a bug in combination of Metaspace and CMS.
> However current jdk/jdk has different implementation, so it might not be occur in modern JDK.
> I want to hear the comments from others.

Thank you for your comment. I want to hear from others, too.

> AFAICS you cannot find head of _unloading at this point.
> However you can traverse CLD list with purge_me->_next .

Thank you for telling me how to traverse the CLD list.
I was able to start traversing the CLD list, but the list is too long to
traverse manually.
I followed _next -> _next -> _next ... about 500 times with the GDB
print command, but have not yet found a NULL terminator or an address
loop.
I'll try to find a good way to traverse the CLD list to the end.

> BTW, CLD has OOP for class loader in ClassLoaderData::_class_loader .
> If you check it on (CL)HSDB, you might get any hints from it.
> For example, use system class loader instead of custom class loader from framework.

I checked the CLD oop, but I don't understand what type of ClassLoader
it is.
The result is below. It looks like this ClassLoaderData::_class_loader
oop points to a character array. Is it normal?
If so, what is this class loader? (Bootstrap ClassLoader?)

---------------------------------------------------
(gdb) p ClassLoaderData::_class_loader
$21 = (oop) 0xa3afc1f0

hsdb> inspect 0xa3afc1f0
instance of [C @ 0x00000000a3afc1f0 @ 0x00000000a3afc1f0 (size = 72)
_mark: 1
_metadata._compressed_klass: TypeArrayKlass for [C
0: 'c'
1: 'o'
2: 'l'
3: 'u'
4: 'm'
5: 'n'
6: '1'
7: '5'
8: '6'
9: '5'
10: '7'
11: '5'
12: '5'
13: '9'
14: '8'
15: '6'
16: '3'
17: '3'
18: '1'
19: '_'
20: '8'
21: '0'
22: '0'
23: '3'
---------------------------------------------------

Thanks,
Osamu

On 10/24/19 09:49, Yasumasa Suenaga wrote:
> Hi Osamu,
>
> I guess this is a bug in combination of Metaspace and CMS.
> However current jdk/jdk has different implementation, so it might not
> be occur in modern JDK.
> I want to hear the comments from others.
>
> My comments is below:
>
> On 2019/10/23 18:57, Osamu Sakamoto wrote:
>> Hi Yasumasa,
>>
>> Thank you for answering.
>>
>>  > What JVM options did you pass?
>> The following are the JVM options I passed.
>> -----------------------------------------------------------------
>> -Xmx2048m
>> -Xms2048m
>> -XX:NewSize=412m
>> -XX:MaxNewSize=412m
>> -XX:SurvivorRatio=8
>> -XX:MaxTenuringThreshold=15
>> -XX:+UseConcMarkSweepGC
>> -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:CMSInitiatingOccupancyFraction=80
>> -XX:+CMSClassUnloadingEnabled
>> -XX:CompressedClassSpaceSize=64m
>> -XX:+PrintGCDetails
>> -XX:+PrintGCDateStamps
>> -XX:+UseGCLogFileRotation
>> -XX:GCLogFileSize=0
>> -Xloggc:/var/log/tomcatm0/gc-%p.log
>> -XX:+HeapDumpOnOutOfMemoryError
>> -XX:+AlwaysLockClassLoader
>> -----------------------------------------------------------------
>>
>>
>>  > I guess you used CMS because this problem seems to occur on CMS
>>  > only [1] [2].
>> Yes, I used CMS.
>>
>>  > So it might be work around not to use CMS.
>> Thank you for telling me the workaround.
>> But it is difficult to change the GC method, so we would like to
>> solve this issue with CMS GC if possible.
>>
>>
>>  > I'm not sure root cause of this issue, but it seems to break
>>  > ClassLoaderDataGraph::_unloading.
>>  > (like double free (delete) of CLD)
>> I checked whether ClassLoaderDataGraph::_unloading is broken or not,
>> but I couldn't tell because the value had been cleared to NULL or
>> optimized out.
>>
>> Referring to classLoaderData.cpp [1], it looks like the _unloading
>> value is saved to ClassLoaderDataGraph::_saved_unloading.
>> But _saved_unloading had been cleared to NULL, too.
>>
>> Is there any other way to check it?
>>
>> [1] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/classfile/classLoaderData.cpp#l753
>>
>>
>> -----------------------------------------------------------------
>> (gdb) f 10
>> #10 0x00007f60808c0f0f in ClassLoaderDataGraph::purge () at
>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:818
>> 818         delete purge_me;
>> (gdb) list ClassLoaderDataGraph::purge
>> 810     void ClassLoaderDataGraph::purge() {
>> 811       assert(SafepointSynchronize::is_at_safepoint(), "must be at safepoint!");
>> 812       ClassLoaderData* list = _unloading;
>> 813       _unloading = NULL;
>> 814       ClassLoaderData* next = list;
>> 815       while (next != NULL) {
>> 816         ClassLoaderData* purge_me = next;
>> 817         next = purge_me->next();
>> 818         delete purge_me;
>> 819       }
>> 820       Metaspace::purge();
>> 821     }
>> (gdb) p _unloading
>> $29 = (ClassLoaderData *) 0x0
>> (gdb) p list
>> $30 = <optimized out>
>> (gdb) p next
>> $31 = <optimized out>
>> (gdb) p ClassLoaderDataGraph::_saved_unloading
>> $32 = (ClassLoaderData *) 0x0
>> -----------------------------------------------------------------
>
> AFAICS you cannot find head of _unloading at this point.
> However you can traverse CLD list with purge_me->_next .
>
>
> BTW, CLD has OOP for class loader in ClassLoaderData::_class_loader .
> If you check it on (CL)HSDB, you might get any hints from it.
> For example, use system class loader instead of custom class loader
> from framework.
>
>
> Thanks,
>
> Yasumasa
>
>
>> Thanks,
>> Osamu
>>
>> On 10/21/19 22:29, Yasumasa Suenaga wrote:
>>> Hi Osamu,
>>>
>>> What JVM options did you pass?
>>>
>>> I guess you used CMS because this problem seems to occur on CMS only
>>> [1] [2].
>>> So it might be work around not to use CMS.
>>>
>>> I'm not sure root cause of this issue, but it seems to break
>>> ClassLoaderDataGraph::_unloading.
>>> (like double free (delete) of CLD)
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> [1] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/classfile/classLoaderData.hpp#l100
>>> [2] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp#l6384
>>>
>>>
>>> On 2019/10/21 17:50, Osamu Sakamoto wrote:
>>>> Hi all,
>>>>
>>>> I have a problem with a segmentation fault (SEGV) in GC and I can't
>>>> determine the cause.
>>>> Could you help me solve the problem?
>>>>
>>>> Our system uses OpenJDK 1.8.0.181, and crashed with a SEGV when
>>>> purging ClassLoader at a safepoint.
>>>> This problem can't be reproduced, but it has happened 4 times in
>>>> a few months.
>>>>
>>>> The following is the summary of my investigation.
>>>>
>>>> =============================================================================
>>>>
>>>>
>>>> First I checked hs_err, which shows that the SEGV occurred.
>>>> VM_Operation is GenCollectForAllocation at safepoint.
>>>>
>>>> -----------------------------------------------------------------------------
>>>>
>>>> #
>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>> #
>>>> #  SIGSEGV (0xb) at pc=0x00007f6080c97f88, pid=23931, tid=0x00007f607c3ed700
>>>> #
>>>> # JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build 1.8.0_181-b13)
>>>> # Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 compressed oops)
>>>> # Problematic frame:
>>>> # V  [libjvm.so+0x84bf88]
>>>> #
>>>> # Core dump written. Default location: /opt/tomcate0/core or core.23931
>>>> #
>>>> # If you would like to submit a bug report, please visit:
>>>> #   http://bugreport.java.com/bugreport/crash.jsp
>>>> #
>>>>
>>>> ---------------  T H R E A D  ---------------
>>>>
>>>> Current thread (0x00007f6078c00000):  VMThread [stack: 0x00007f607c2ed000,0x00007f607c3ee000] [id=23939]
>>>>
>>>> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000018
>>>>
>>>> Registers:
>>>> RAX=0x0000000000000010, RBX=0x00007f5ff800ad30, RCX=0x0000000000000010, RDX=0x0000000000000000
>>>> RSP=0x00007f607c3ecb50, RBP=0x00007f607c3ecb80, RSI=0x0000000000000002, RDI=0x0000000001cfe570
>>>> R8 =0x00007f5ff80ae320, R9 =0x00007f5ff8052480, R10=0x0000000000000000, R11=0x0000000000000400
>>>> R12=0x0000000001cfe570, R13=0x00007f6081419470, R14=0x0000000000000002, R15=0x00007f6081418640
>>>> RIP=0x00007f6080c97f88, EFLAGS=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
>>>>   TRAPNO=0x000000000000000e
>>>>
>>>> Top of Stack: (sp=0x00007f607c3ecb50)
>>>> 0x00007f607c3ecb50:   00007f607c3ecba0 00007f5ff800ad30
>>>> 0x00007f607c3ecb60:   00007f5ff800ad00 0000000000000000
>>>> 0x00007f607c3ecb70:   0000000000000000 0000000000000001
>>>> 0x00007f607c3ecb80:   00007f607c3ecba0 00007f6080c995fa
>>>> 0x00007f607c3ecb90:   00007f5ff800ad00 00007f5ff800ac20
>>>> 0x00007f607c3ecba0:   00007f607c3ecbc0 00007f60808bff5e
>>>> 0x00007f607c3ecbb0:   00007f5ff800ac20 00007f5ff8052870
>>>> 0x00007f607c3ecbc0:   00007f607c3ecbe0 00007f60808c0f0f
>>>> 0x00007f607c3ecbd0:   
00007f607c3ecbf0 00007f608140f308 >>>> 0x00007f607c3ecbe0:?? 00007f607c3ecc30 00007f6080daa0b7 >>>> 0x00007f607c3ecbf0:?? 00007f6069000100 0000000000000000 >>>> 0x00007f607c3ecc00:?? 00007f607c3ecc20 00007f6080ed0800 >>>> 0x00007f607c3ecc10:?? 00000000000000f9 88e95c3ba257ab00 >>>> 0x00007f607c3ecc20:?? 431bde82d7b634db 00007f607800aa00 >>>> 0x00007f607c3ecc30:?? 00007f607c3eccc0 00007f6080daa9d5 >>>> 0x00007f607c3ecc40:?? 0000000000000000 00007f607803bf20 >>>> 0x00007f607c3ecc50:?? 00007f607803be20 00000000000003e8 >>>> 0x00007f607c3ecc60:?? 0000000000000001 00007f6078c00000 >>>> 0x00007f607c3ecc70:?? 00007f607c3eccc0 0000000000000000 >>>> 0x00007f607c3ecc80:?? 00000004000000f9 00007f60813e2b99 >>>> 0x00007f607c3ecc90:?? 00007f607803bfa0 00007f6078c00000 >>>> 0x00007f607c3ecca0:?? 0000000000000000 0000000000000000 >>>> 0x00007f607c3eccb0:?? 00007f6081418bd0 00007f607803bf20 >>>> 0x00007f607c3eccc0:?? 00007f607c3ece60 00007f6080f2048a >>>> 0x00007f607c3eccd0:?? 00007f607c3ecd20 00007f607c3ecce0 >>>> 0x00007f607c3ecce0:?? 00007f6078c00000 00007f6078c00980 >>>> 0x00007f607c3eccf0:?? 00007f6078c009c0 00007f6078c009d0 >>>> 0x00007f607c3ecd00:?? 00007f6078c00aa8 00000000000000d8 >>>> 0x00007f607c3ecd10:?? 00007f6078c00be0 0000000000000000 >>>> 0x00007f607c3ecd20:?? 00007f607c3ecd28 6e69747563657845 >>>> 0x00007f607c3ecd30:?? 65706f204d562067 203a6e6f69746172 >>>> 0x00007f607c3ecd40:?? 656c6c6f436e6547 6c6c41726f467463 >>>> >>>> Instructions: (pc=0x00007f6080c97f88) >>>> 0x00007f6080c97f68:?? b6 12 80 fa 00 74 01 f0 48 0f c1 01 31 c9 31 f6 >>>> 0x00007f6080c97f78:?? 48 8b 44 0b 10 31 d2 48 85 c0 74 11 0f 1f 40 00 >>>> 0x00007f6080c97f88:?? 48 8b 40 08 48 83 c2 01 48 85 c0 75 f3 48 83 c1 >>>> 0x00007f6080c97f98:?? 
08 48 01 d6 48 83 f9 20 75 d6 8b 7b 08 48 8b 05 >>>> >>>> Register to memory mapping: >>>> >>>> RAX=0x0000000000000010 is an unknown value >>>> RBX=0x00007f5ff800ad30 is an unknown value >>>> RCX=0x0000000000000010 is an unknown value >>>> RDX=0x0000000000000000 is an unknown value >>>> RSP=0x00007f607c3ecb50 is an unknown value >>>> RBP=0x00007f607c3ecb80 is an unknown value >>>> RSI=0x0000000000000002 is an unknown value >>>> RDI=0x0000000001cfe570 is an unknown value >>>> R8 =0x00007f5ff80ae320 is an unknown value >>>> R9 =0x00007f5ff8052480 is an unknown value >>>> R10=0x0000000000000000 is an unknown value >>>> R11=0x0000000000000400 is an unknown value >>>> R12=0x0000000001cfe570 is an unknown value >>>> R13=0x00007f6081419470: in >>>> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/amd64/server/libjvm.so >>>> at 0x00007f608044c000 >>>> R14=0x0000000000000002 is an unknown value >>>> R15=0x00007f6081418640: in >>>> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/amd64/server/libjvm.so >>>> at 0x00007f608044c000 >>>> >>>> >>>> Stack: [0x00007f607c2ed000,0x00007f607c3ee000], >>>> sp=0x00007f607c3ecb50, free space=1022k >>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, >>>> C=native code) >>>> V? [libjvm.so+0x84bf88] >>>> V? [libjvm.so+0x84d5fa] >>>> V? [libjvm.so+0x473f5e] >>>> V? [libjvm.so+0x474f0f] >>>> V? [libjvm.so+0x95e0b7] >>>> V? [libjvm.so+0x95e9d5] >>>> V? [libjvm.so+0xad448a] >>>> V? [libjvm.so+0xad48f1] >>>> V? [libjvm.so+0x8beb82] >>>> >>>> VM_Operation (0x00007f5fd69e6120): GenCollectForAllocation, mode: >>>> safepoint, requested by thread 0x00007f6079013800 >>>> >>>> ... >>>> ----------------------------------------------------------------------------- >>>> >>>> >>>> >>>> >>>> Next, I used GDB to check the backtrace of the SEGV thread from the >>>> coredump. >>>> The following is the backtrace. >>>> The SEGV occurred when ClassLoader is purged and Metaspace is >>>> destructed. 
>>>> And frame #7 shows that a signal(SEGV) handler is called after >>>> SpaceManager::~SpaceManager() is executed. >>>> >>>> ----------------------------------------------------------------------------- >>>> >>>> (gdb) bt >>>> #0? 0x00007f608146f1f7 in __GI_raise (sig=sig at entry=6) at >>>> ../nptl/sysdeps/unix/sysv/linux/raise.c:56 >>>> #1? 0x00007f60814708e8 in __GI_abort () at abort.c:90 >>>> #2? 0x00007f6080d0bc39 in os::abort (dump_core=) at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1519 >>>> #3? 0x00007f6080f1b816 in VMError::report_and_die >>>> (this=this at entry=0x7f607c3ebd10) at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1060 >>>> #4? 0x00007f6080d15927 in JVM_handle_linux_signal (sig=11, >>>> info=0x7f607c3ebfb0, ucVoid=0x7f607c3ebe80, >>>> abort_if_unrecognized=) >>>> ???? at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541 >>>> #5? 0x00007f6080d09038 in signalHandler (sig=11, >>>> info=0x7f607c3ebfb0, uc=0x7f607c3ebe80) at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:4446 >>>> #6? >>>> #7? SpaceManager::~SpaceManager (this=0x7f5ff800ad30, >>>> __in_chrg=) at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2028 >>>> #8? 0x00007f6080c995fa in Metaspace::~Metaspace >>>> (this=0x7f5ff800ad00, __in_chrg=) >>>> ???? at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2971 >>>> #9? 0x00007f60808bff5e in ClassLoaderData::~ClassLoaderData >>>> (this=0x7f5ff800ac20, __in_chrg=) >>>> ???? 
at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:383 >>>> #10 0x00007f60808c0f0f in ClassLoaderDataGraph::purge () at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:818 >>>> >>>> #11 0x00007f6080daa0b7 in ClassLoaderDataGraph::purge_if_needed () >>>> at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.hpp:104 >>>> #12 SafepointSynchronize::do_cleanup_tasks () at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/safepoint.cpp:551 >>>> #13 0x00007f6080daa9d5 in SafepointSynchronize::begin () at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/safepoint.cpp:402 >>>> >>>> #14 0x00007f6080f2048a in VMThread::loop >>>> (this=this at entry=0x7f6078c00000) at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:501 >>>> #15 0x00007f6080f208f1 in VMThread::run (this=0x7f6078c00000) at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:276 >>>> #16 0x00007f6080d0ab82 in java_start (thread=0x7f6078c00000) at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:796 >>>> #17 0x00007f6081e2de25 in start_thread (arg=0x7f607c3ed700) at >>>> pthread_create.c:308 >>>> #18 0x00007f608153234d in clone () at >>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 >>>> ----------------------------------------------------------------------------- >>>> >>>> >>>> >>>> In Frame #7, Line 2028 (chunk = chunk->next()) is the crash point. >>>> The variable "chunk" is defined at Line 2025 (Metachunk* chunk = >>>> chunks_in_use(i);). 
>>>> "chunks_in_use(i)" is defined at Line 648 (Metachunk* >>>> chunks_in_use(ChunkIndex index) const { return >>>> _chunks_in_use[index]; }). >>>> So I checked values of "_chunks_in_use", and understood that >>>> "_chunks_in_use[2]" has Illegal Address "0x10". >>>> Therefore, I think that the SEGV occurred because of referencing >>>> Illegal Address "0x10" at "chunk = chunk->next()". >>>> >>>> ----------------------------------------------------------------------------- >>>> >>>> (gdb) f 7 >>>> #7? SpaceManager::~SpaceManager (this=0x7f5ff800ad30, >>>> __in_chrg=) at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2028 >>>> 2028??? ??? chunk = chunk->next(); >>>> (gdb) list >>>> 2023??? size_t SpaceManager::sum_count_in_chunks_in_use(ChunkIndex >>>> i) { >>>> 2024??? ? size_t count = 0; >>>> 2025??? ? Metachunk* chunk = chunks_in_use(i); >>>> 2026??? ? while (chunk != NULL) { >>>> 2027??? ??? count++; >>>> 2028??? ??? chunk = chunk->next(); >>>> 2029??? ? } >>>> 2030??? ? return count; >>>> 2031??? } >>>> 2032 >>>> (gdb) list SpaceManager::chunks_in_use >>>> 647??? ? // Accessors >>>> 648??? ? Metachunk* chunks_in_use(ChunkIndex index) const { return >>>> _chunks_in_use[index]; } >>>> ... >>>> (gdb) p _chunks_in_use >>>> $11 = {0x7f5fcd41c400, 0x7f5fcd41a000, 0x10, 0x0} >>>> ----------------------------------------------------------------------------- >>>> >>>> >>>> >>>> >>>> The following is disassemble code of "SpaceManager::~SpaceManager()". >>>> %rax has 0x10 at "0x00007f6080c97f88 <+200>", but I don't >>>> understand why this "0x10" is inserted to %rax. >>>> >>>> ----------------------------------------------------------------------------- >>>> >>>> (gdb) disas >>>> Dump of assembler code for function SpaceManager::~SpaceManager(): >>>> ??? 0x00007f6080c97ec0 <+0>:??? push?? %rbp >>>> ??? 0x00007f6080c97ec1 <+1>:??? mov??? %rsp,%rbp >>>> ??? 0x00007f6080c97ec4 <+4>:??? push?? %r15 >>>> ??? 
0x00007f6080c97ec6 <+6>:??? push?? %r14 >>>> ??? 0x00007f6080c97ec8 <+8>:??? push?? %r13 >>>> ??? 0x00007f6080c97eca <+10>:??? push?? %r12 >>>> ??? 0x00007f6080c97ecc <+12>:??? push?? %rbx >>>> ??? 0x00007f6080c97ecd <+13>:??? mov??? %rdi,%rbx >>>> ??? 0x00007f6080c97ed0 <+16>:??? sub??? $0x8,%rsp >>>> ??? 0x00007f6080c97ed4 <+20>:??? mov 0x780785(%rip),%r12??????? # >>>> 0x7f6081418660 <_ZN12SpaceManager12_expand_lockE> >>>> ??? 0x00007f6080c97edb <+27>:??? test?? %r12,%r12 >>>> ??? 0x00007f6080c97ede <+30>:??? je???? 0x7f6080c97ee8 >>>> >>>> ??? 0x00007f6080c97ee0 <+32>:??? mov??? %r12,%rdi >>>> ??? 0x00007f6080c97ee3 <+35>:??? callq? 0x7f6080cce2f0 >>>> >>>> ??? 0x00007f6080c97ee8 <+40>:??? movslq 0x8(%rbx),%rcx >>>> ??? 0x00007f6080c97eec <+44>:??? lea 0x78075d(%rip),%rdx??????? # >>>> 0x7f6081418650 <_ZN12MetaspaceAux15_capacity_wordsE> >>>> ??? 0x00007f6080c97ef3 <+51>:??? lea 0x781576(%rip),%r13??????? # >>>> 0x7f6081419470 <_ZN2os16_processor_countE> >>>> ??? 0x00007f6080c97efa <+58>:??? lea 0x78073f(%rip),%r15??????? # >>>> 0x7f6081418640 <_ZN12MetaspaceAux11_used_wordsE> >>>> ??? 0x00007f6080c97f01 <+65>:??? mov (%rdx,%rcx,8),%rax >>>> ??? 0x00007f6080c97f05 <+69>:??? sub 0x40(%rbx),%rax >>>> ??? 0x00007f6080c97f09 <+73>:??? mov %rax,(%rdx,%rcx,8) >>>> ??? 0x00007f6080c97f0d <+77>:??? mov 0x38(%rbx),%rax >>>> ??? 0x00007f6080c97f11 <+81>:??? movslq 0x8(%rbx),%rdx >>>> ??? 0x00007f6080c97f15 <+85>:??? neg??? %rax >>>> ??? 0x00007f6080c97f18 <+88>:??? cmpl?? $0x1,0x0(%r13) >>>> ??? 0x00007f6080c97f1d <+93>:??? lea (%r15,%rdx,8),%rcx >>>> ??? 0x00007f6080c97f21 <+97>:??? mov??? $0x1,%edx >>>> ??? 0x00007f6080c97f26 <+102>:??? jne 0x7f6080c97f32 >>>> >>>> ??? 0x00007f6080c97f28 <+104>:??? lea 0x74acb4(%rip),%rdx??????? # >>>> 0x7f60813e2be3 >>>> ??? 0x00007f6080c97f2f <+111>:??? movzbl (%rdx),%edx >>>> ??? 0x00007f6080c97f32 <+114>:??? cmp??? $0x0,%dl >>>> ??? 0x00007f6080c97f35 <+117>:??? je 0x7f6080c97f38 >>>> >>>> ??? 0x00007f6080c97f37 <+119>:??? 
lock xadd %rax,(%rcx) >>>> ??? 0x00007f6080c97f3c <+124>:??? mov 0x48(%rbx),%r14 >>>> ??? 0x00007f6080c97f40 <+128>:??? callq 0x7f6080c951a0 >>>> >>>> ??? 0x00007f6080c97f45 <+133>:??? movslq 0x8(%rbx),%rdx >>>> ??? 0x00007f6080c97f49 <+137>:??? imul?? %r14,%rax >>>> ??? 0x00007f6080c97f4d <+141>:??? lea (%r15,%rdx,8),%rcx >>>> ??? 0x00007f6080c97f51 <+145>:??? mov??? $0x1,%edx >>>> ??? 0x00007f6080c97f56 <+150>:??? neg??? %rax >>>> ??? 0x00007f6080c97f59 <+153>:??? cmpl $0x1,0x0(%r13) >>>> ??? 0x00007f6080c97f5e <+158>:??? jne 0x7f6080c97f6a >>>> >>>> ??? 0x00007f6080c97f60 <+160>:??? lea 0x74ac7c(%rip),%rdx??????? # >>>> 0x7f60813e2be3 >>>> ??? 0x00007f6080c97f67 <+167>:??? movzbl (%rdx),%edx >>>> ??? 0x00007f6080c97f6a <+170>:??? cmp??? $0x0,%dl >>>> ??? 0x00007f6080c97f6d <+173>:??? je 0x7f6080c97f70 >>>> >>>> ??? 0x00007f6080c97f6f <+175>:??? lock xadd %rax,(%rcx) >>>> ??? 0x00007f6080c97f74 <+180>:??? xor??? %ecx,%ecx >>>> ??? 0x00007f6080c97f76 <+182>:??? xor??? %esi,%esi >>>> ??? 0x00007f6080c97f78 <+184>:??? mov 0x10(%rbx,%rcx,1),%rax >>>> ??? 0x00007f6080c97f7d <+189>:??? xor??? %edx,%edx >>>> ??? 0x00007f6080c97f7f <+191>:??? test?? %rax,%rax >>>> ??? 0x00007f6080c97f82 <+194>:??? je 0x7f6080c97f95 >>>> >>>> ??? 0x00007f6080c97f84 <+196>:??? nopl?? 0x0(%rax) >>>> => 0x00007f6080c97f88 <+200>:??? mov 0x8(%rax),%rax >>>> ??? 0x00007f6080c97f8c <+204>:??? add??? $0x1,%rdx >>>> ??? 0x00007f6080c97f90 <+208>:??? test?? %rax,%rax >>>> ... >>>> (gdb) info registers >>>> rax??????????? 0x10??? 16 >>>> rbx??????????? 0x7f5ff800ad30??? 140050159414576 >>>> rcx??????????? 0x10??? 16 >>>> rdx??????????? 0x0??? 0 >>>> rsi??????????? 0x2??? 2 >>>> rdi??????????? 0x1cfe570??? 30401904 >>>> rbp??????????? 0x7f607c3ecb80??? 0x7f607c3ecb80 >>>> rsp??????????? 0x7f607c3ecb50??? 0x7f607c3ecb50 >>>> r8???????????? 0x7f5ff80ae320??? 140050160083744 >>>> r9???????????? 0x7f5ff8052480??? 140050159707264 >>>> r10??????????? 0x0??? 0 >>>> r11??????????? 0x400??? 
1024 >>>> r12??????????? 0x1cfe570??? 30401904 >>>> r13??????????? 0x7f6081419470??? 140052462146672 >>>> r14??????????? 0x2??? 2 >>>> r15??????????? 0x7f6081418640??? 140052462143040 >>>> rip??????????? 0x7f6080c97f88??? 0x7f6080c97f88 >>>> >>>> eflags???????? 0x206??? [ PF IF ] >>>> cs???????????? 0x33??? 51 >>>> ss???????????? 0x2b??? 43 >>>> ds???????????? 0x0??? 0 >>>> es???????????? 0x0??? 0 >>>> fs???????????? 0x0??? 0 >>>> gs???????????? 0x0??? 0 >>>> k0???????????? >>>> k1???????????? >>>> k2???????????? >>>> k3???????????? >>>> k4???????????? >>>> k5???????????? >>>> k6???????????? >>>> k7???????????? >>>> ----------------------------------------------------------------------------- >>>> >>>> >>>> ============================================================================= >>>> >>>> >>>> >>>> >>>> Does anyone know about this case? >>>> >>>> Thanks, Osamu >>>> >>>> >> From thomas.schatzl at oracle.com Fri Oct 25 10:20:03 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 25 Oct 2019 12:20:03 +0200 Subject: RFR (S): 8232776: G1 should always take rs_length_diff into account when predicting rs_lengths In-Reply-To: References: <2e973399-ce75-7ab4-ce21-58fe63c74f9c@oracle.com> Message-ID: <2dd515f2-225a-65fb-9939-0c75d1a9f09f@oracle.com> Hi Kim, Sangheon, On 22.10.19 22:49, sangheon.kim at oracle.com wrote: > Hi Thomas, > > On 10/22/19 10:35 AM, Thomas Schatzl wrote: >> Hi all, >> >> ? can I have reviews for this small change that makes G1 always use >> the error term for rs-length prediction, not only if G1 sees fit. >> >> While rs length prediction is still kind of bad even with this change >> (and seemingly a band-aid), with that change it is a bit better. While >> there is a "real" fix for RS length estimation coming that so far >> looks really good, this change decreases complexity of further changes >> in G1Policy enough while improving the estimation. 
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8232776
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8232776/webrev/
> Looks good.
>

  thanks for your review.

Thomas

From thomas.schatzl at oracle.com Fri Oct 25 10:19:24 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 25 Oct 2019 12:19:24 +0200
Subject: RFR (XS): 8232779: G1 current collection parallel time does not include optional evacuation
In-Reply-To:
References: <15f7fa18-0334-c9ff-be69-c8ecb114e363@oracle.com>
Message-ID: <6202aeaa-f40d-eaed-c46c-244a6bf0d7bc@oracle.com>

Hi Sangheon, Kim

On 22.10.19 22:50, sangheon.kim at oracle.com wrote:
> Hi Thomas,
>
> On 10/22/19 11:05 AM, Thomas Schatzl wrote:
>> Hi all,
>>
>>   can I have reviews for this change that fixes the calculation of
>> G1GCPhaseTimes::cur_collection_par_time_ms(): we forgot to consider
>> the optional evacuation time.
>>
>> This causes too long Other time, having minor effects on pause time
>> prediction.
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8232779
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8232779/webrev/
> Looks good.
>

  thanks for your review.

Thomas

From suenaga at oss.nttdata.com Fri Oct 25 12:20:20 2019
From: suenaga at oss.nttdata.com (Yasumasa Suenaga)
Date: Fri, 25 Oct 2019 21:20:20 +0900
Subject: Segmentation Fault occurs when ClassLoader and Metaspace is released in JDK 8
In-Reply-To:
References: <422c9ca2-5053-c761-cb61-f075877bb666@oss.nttdata.com> <314f9ad2-17df-1082-8816-7af73a96e9fb@nttcom.co.jp_1> <1ccb4f35-7f21-4aa2-4cbb-b75244b6d12d@oss.nttdata.com>
Message-ID:

It seems to be a bug.
Does anyone have any suggestions about this?

> (gdb) p ClassLoaderData::_class_loader
> $21 = (oop) 0xa3afc1f0

(CLD::_class_loader is not a static member, so this command would fail.)
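
The static-versus-instance distinction in the comment above can be sketched in plain C++. This is only an illustration under stated assumptions: `Node`, `Graph`, `loader`, and `collect_loaders` are hypothetical stand-ins, not HotSpot's real classes or fields. `Graph::head` plays the role of a static list head, and `Node::loader` the role of a per-instance field that is only reachable through a concrete object, for example by starting at the static head and following `next`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one per-class-loader record (not HotSpot's ClassLoaderData).
struct Node {
    std::uintptr_t loader;  // non-static: every Node instance has its own value
    Node* next;             // next record in the list, or nullptr at the end
};

// Hypothetical stand-in for the graph that owns the list.
struct Graph {
    static Node* head;      // static: one value for the whole program, no object needed
};

Node* Graph::head = nullptr;

// Collect the per-instance `loader` field of each node reachable from the static head.
// `max_nodes` bounds the walk so that a corrupted (cyclic) list cannot loop forever.
std::vector<std::uintptr_t> collect_loaders(std::size_t max_nodes) {
    std::vector<std::uintptr_t> out;
    for (Node* n = Graph::head; n != nullptr && out.size() < max_nodes; n = n->next) {
        out.push_back(n->loader);  // needs an instance: there is no Node::loader by itself
    }
    return out;
}
```

Reading `Graph::head` needs no object, while `loader` always has to be reached through a particular `Node`. The bound on the walk is a design choice for inspecting possibly damaged lists, where neither a NULL terminator nor a cycle can be assumed.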
> hsdb> inspect 0xa3afc1f0
> instance of [C @ 0x00000000a3afc1f0 @ 0x00000000a3afc1f0 (size = 72)
> _mark: 1
> _metadata._compressed_klass: TypeArrayKlass for [C
> 0: 'c'

I believe CLD::_class_loader should be the OOP for a class loader.
I guess memory corruption occurred for some reason. Is it a bug in HotSpot?

I checked 8u222 on Fedora 30, and my guess seems correct.

* GDB
```
(gdb) p ClassLoaderDataGraph::_head->_class_loader
$2 = (oop) 0xd67d0900
```

* CLHSDB
```
hsdb> inspect 0xd67d0900
instance of Oop for sun/misc/Launcher$AppClassLoader @ 0x00000000d67d0900 @ 0x00000000d67d0900 (size = 96)
_mark: 436443282689
_metadata._compressed_klass: InstanceKlass for sun/misc/Launcher$AppClassLoader
parent: Oop for sun/misc/Launcher$ExtClassLoader @ 0x00000000d67bb348 Oop for sun/misc/Launcher$ExtClassLoader @ 0x00000000d67bb348
 :
```


Yasumasa


On 2019/10/25 17:53, Osamu Sakamoto wrote:
> Hi Yasumasa,
>
>
> > I guess this is a bug in combination of Metaspace and CMS.
> > However current jdk/jdk has different implementation, so it might not be occur in modern JDK.
> > I want to hear the comments from others.
> Thank you for your comment.
> I want to hear from others, too
>
>
> > AFAICS you cannot find head of _unloading at this point.
> > However you can traverse CLD list with purge_me->_next .
> Thank you for telling me how to traverse CLD list.
> I could start to traverse the CLD list, but this list is too long to traverse manually.
> I recursively chekced _next -> _next -> next ... about 500 times with GDB print command, but NULL termination or address loop isn't found yet.
> I'll try to find a good way to traverse the CLD list to the end.
>
>
> > BTW, CLD has OOP for class loader in ClassLoaderData::_class_loader .
> > If you check it on (CL)HSDB, you might get any hints from it.
> > For example, use system class loader instead of custom class loader from framework.
> I checked CLD oop, but I don't understand what type of ClassLoader is.
> The result is below.
> It looks like that this ClassLoaderData::_class_loader oop indicates character array.
> Is it normal?
> If so, what is this class loader?(Bootstrap ClassLoader?)
>
> ---------------------------------------------------
> (gdb) p ClassLoaderData::_class_loader
> $21 = (oop) 0xa3afc1f0
>
> hsdb> inspect 0xa3afc1f0
> instance of [C @ 0x00000000a3afc1f0 @ 0x00000000a3afc1f0 (size = 72)
> _mark: 1
> _metadata._compressed_klass: TypeArrayKlass for [C
> 0: 'c'
> 1: 'o'
> 2: 'l'
> 3: 'u'
> 4: 'm'
> 5: 'n'
> 6: '1'
> 7: '5'
> 8: '6'
> 9: '5'
> 10: '7'
> 11: '5'
> 12: '5'
> 13: '9'
> 14: '8'
> 15: '6'
> 16: '3'
> 17: '3'
> 18: '1'
> 19: '_'
> 20: '8'
> 21: '0'
> 22: '0'
> 23: '3'
> ---------------------------------------------------
>
>
> Thanks,
>
> Osamu
>
>
> On 10/24/19 09:49, Yasumasa Suenaga wrote:
>> Hi Osamu,
>>
>> I guess this is a bug in combination of Metaspace and CMS.
>> However current jdk/jdk has different implementation, so it might not be occur in modern JDK.
>> I want to hear the comments from others.
>>
>> My comments is below:
>>
>> On 2019/10/23 18:57, Osamu Sakamoto wrote:
>>> Hi Yasumasa,
>>>
>>> Thank you for answering.
>>>
>>> > What JVM options did you pass?
>>> The following is the JVM options I passed.
>>> -----------------------------------------------------------------
>>> -Xmx2048m
>>> -Xms2048m
>>> -XX:NewSize=412m
>>> -XX:MaxNewSize=412m
>>> -XX:SurvivorRatio=8
>>> -XX:MaxTenuringThreshold=15
>>> -XX:+UseConcMarkSweepGC
>>> -XX:+UseCMSInitiatingOccupancyOnly
>>> -XX:CMSInitiatingOccupancyFraction=80
>>> -XX:+CMSClassUnloadingEnabled
>>> -XX:CompressedClassSpaceSize=64m
>>> -XX:+PrintGCDetails
>>> -XX:+PrintGCDateStamps
>>> -XX:+UseGCLogFileRotation
>>> -XX:GCLogFileSize=0
>>> -Xloggc:/var/log/tomcatm0/gc-%p.log
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:+AlwaysLockClassLoader
>>> -----------------------------------------------------------------
>>>
>>>
>>> > I guess you used CMS because this problem seems to occur on CMS only [1] [2].
>>> Yes, I used CMS.
>>>
>>> > So it might be work around not to use CMS.
>>> Thank you for telling me work around.
>>> But it is difficult to change the GC method, so we would like to solve this issue with CMS GC if possible.
>>>
>>>
>>> > I'm not sure root cause of this issue, but it seems to break ClassLoaderDataGraph::_unloading.
>>> > (like double free (delete) of CLD)
>>> I checked whether the ClassLoaderDataGraph::_unloading is broken or not, but I didn't know because of the value has been cleared by NULL or optimized out.
>>>
>>> Referring ClassLoaderDataGraph[1].cpp, it looks like that _unloading value is saved to ClassLoaderDataGraph::_saved_unloading.
>>> But _saved_unloading had been cleared by NULL, too.
>>>
>>> Is there any other way to check it?
>>>
>>> [1]http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/classfile/classLoaderData.cpp#l753
>>>
>>> -----------------------------------------------------------------
>>> (gdb) f 10
>>> #10 0x00007f60808c0f0f in ClassLoaderDataGraph::purge () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:818
>>> 818        delete purge_me;
>>> (gdb) list ClassLoaderDataGraph::purge
>>> 810    void ClassLoaderDataGraph::purge() {
>>> 811      assert(SafepointSynchronize::is_at_safepoint(), "must be at safepoint!");
>>> 812      ClassLoaderData* list = _unloading;
>>> 813      _unloading = NULL;
>>> 814      ClassLoaderData* next = list;
>>> 815      while (next != NULL) {
>>> 816        ClassLoaderData* purge_me = next;
>>> 817        next = purge_me->next();
>>> 818        delete purge_me;
>>> 819      }
>>> 820      Metaspace::purge();
>>> 821    }
>>> (gdb) p _unloading
>>> $29 = (ClassLoaderData *) 0x0
>>> (gdb) p list
>>> $30 = <optimized out>
>>> (gdb) p next
>>> $31 = <optimized out>
>>> (gdb) p ClassLoaderDataGraph::_saved_unloading
>>> $32 = (ClassLoaderData *) 0x0
>>> -----------------------------------------------------------------
>>
>> AFAICS you cannot find head of _unloading at this point.
>> However you can traverse CLD list with purge_me->_next .
>>
>>
>> BTW, CLD has OOP for class loader in ClassLoaderData::_class_loader .
>> If you check it on (CL)HSDB, you might get any hints from it.
>> For example, use system class loader instead of custom class loader from framework.
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>>> Thanks,
>>> Osamu
>>>
>>> On 10/21/19 22:29, Yasumasa Suenaga wrote:
>>>> Hi Osamu,
>>>>
>>>> What JVM options did you pass?
>>>>
>>>> I guess you used CMS because this problem seems to occur on CMS only [1] [2].
>>>> So it might be work around not to use CMS.
>>>>
>>>> I'm not sure root cause of this issue, but it seems to break ClassLoaderDataGraph::_unloading.
>>>> (like double free (delete) of CLD)
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> [1] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/classfile/classLoaderData.hpp#l100
>>>> [2] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp#l6384
>>>>
>>>>
>>>> On 2019/10/21 17:50, Osamu Sakamoto wrote:
>>>>> Hi all,
>>>>>
>>>>> I have a problem about Segmentation Fault(SEGV) in GC and I can't make the cause clear.
>>>>> Could you help me solve the problem?
>>>>>
>>>>> Our System uses OpenJDK 1.8.0.181, and crashed by SEGV when purging ClassLoader at safepoint.
>>>>> This problem can't be reproduced, but this has happened 4 times in a few months.
>>>>>
>>>>> The following is the summary of my investigation.
>>>>> >>>>> ============================================================================= >>>>> >>>>> First I checked hs_err, and that shows that the SEGV occurred. >>>>> VM_Operation is GenCollectForAllocation at safepoint. >>>>> >>>>> ----------------------------------------------------------------------------- >>>>> # >>>>> # A fatal error has been detected by the Java Runtime Environment: >>>>> # >>>>> #? SIGSEGV (0xb) at pc=0x00007f6080c97f88, pid=23931, tid=0x00007f607c3ed700 >>>>> # >>>>> # JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build 1.8.0_181-b13) >>>>> # Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 compressed oops) >>>>> # Problematic frame: >>>>> # V? [libjvm.so+0x84bf88] >>>>> # >>>>> # Core dump written. Default location: /opt/tomcate0/core or core.23931 >>>>> # >>>>> # If you would like to submit a bug report, please visit: >>>>> #?? http://bugreport.java.com/bugreport/crash.jsp >>>>> # >>>>> >>>>> ---------------? T H R E A D? --------------- >>>>> >>>>> Current thread (0x00007f6078c00000):? VMThread [stack: 0x00007f607c2ed000,0x00007f607c3ee000] [id=23939] >>>>> >>>>> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000018 >>>>> >>>>> Registers: >>>>> RAX=0x0000000000000010, RBX=0x00007f5ff800ad30, RCX=0x0000000000000010, RDX=0x0000000000000000 >>>>> RSP=0x00007f607c3ecb50, RBP=0x00007f607c3ecb80, RSI=0x0000000000000002, RDI=0x0000000001cfe570 >>>>> R8 =0x00007f5ff80ae320, R9 =0x00007f5ff8052480, R10=0x0000000000000000, R11=0x0000000000000400 >>>>> R12=0x0000000001cfe570, R13=0x00007f6081419470, R14=0x0000000000000002, R15=0x00007f6081418640 >>>>> RIP=0x00007f6080c97f88, EFLAGS=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000004 >>>>> ?? TRAPNO=0x000000000000000e >>>>> >>>>> Top of Stack: (sp=0x00007f607c3ecb50) >>>>> 0x00007f607c3ecb50:?? 00007f607c3ecba0 00007f5ff800ad30 >>>>> 0x00007f607c3ecb60:?? 00007f5ff800ad00 0000000000000000 >>>>> 0x00007f607c3ecb70:?? 
0000000000000000 0000000000000001 >>>>> 0x00007f607c3ecb80:?? 00007f607c3ecba0 00007f6080c995fa >>>>> 0x00007f607c3ecb90:?? 00007f5ff800ad00 00007f5ff800ac20 >>>>> 0x00007f607c3ecba0:?? 00007f607c3ecbc0 00007f60808bff5e >>>>> 0x00007f607c3ecbb0:?? 00007f5ff800ac20 00007f5ff8052870 >>>>> 0x00007f607c3ecbc0:?? 00007f607c3ecbe0 00007f60808c0f0f >>>>> 0x00007f607c3ecbd0:?? 00007f607c3ecbf0 00007f608140f308 >>>>> 0x00007f607c3ecbe0:?? 00007f607c3ecc30 00007f6080daa0b7 >>>>> 0x00007f607c3ecbf0:?? 00007f6069000100 0000000000000000 >>>>> 0x00007f607c3ecc00:?? 00007f607c3ecc20 00007f6080ed0800 >>>>> 0x00007f607c3ecc10:?? 00000000000000f9 88e95c3ba257ab00 >>>>> 0x00007f607c3ecc20:?? 431bde82d7b634db 00007f607800aa00 >>>>> 0x00007f607c3ecc30:?? 00007f607c3eccc0 00007f6080daa9d5 >>>>> 0x00007f607c3ecc40:?? 0000000000000000 00007f607803bf20 >>>>> 0x00007f607c3ecc50:?? 00007f607803be20 00000000000003e8 >>>>> 0x00007f607c3ecc60:?? 0000000000000001 00007f6078c00000 >>>>> 0x00007f607c3ecc70:?? 00007f607c3eccc0 0000000000000000 >>>>> 0x00007f607c3ecc80:?? 00000004000000f9 00007f60813e2b99 >>>>> 0x00007f607c3ecc90:?? 00007f607803bfa0 00007f6078c00000 >>>>> 0x00007f607c3ecca0:?? 0000000000000000 0000000000000000 >>>>> 0x00007f607c3eccb0:?? 00007f6081418bd0 00007f607803bf20 >>>>> 0x00007f607c3eccc0:?? 00007f607c3ece60 00007f6080f2048a >>>>> 0x00007f607c3eccd0:?? 00007f607c3ecd20 00007f607c3ecce0 >>>>> 0x00007f607c3ecce0:?? 00007f6078c00000 00007f6078c00980 >>>>> 0x00007f607c3eccf0:?? 00007f6078c009c0 00007f6078c009d0 >>>>> 0x00007f607c3ecd00:?? 00007f6078c00aa8 00000000000000d8 >>>>> 0x00007f607c3ecd10:?? 00007f6078c00be0 0000000000000000 >>>>> 0x00007f607c3ecd20:?? 00007f607c3ecd28 6e69747563657845 >>>>> 0x00007f607c3ecd30:?? 65706f204d562067 203a6e6f69746172 >>>>> 0x00007f607c3ecd40:?? 656c6c6f436e6547 6c6c41726f467463 >>>>> >>>>> Instructions: (pc=0x00007f6080c97f88) >>>>> 0x00007f6080c97f68:?? b6 12 80 fa 00 74 01 f0 48 0f c1 01 31 c9 31 f6 >>>>> 0x00007f6080c97f78:?? 
48 8b 44 0b 10 31 d2 48 85 c0 74 11 0f 1f 40 00 >>>>> 0x00007f6080c97f88:?? 48 8b 40 08 48 83 c2 01 48 85 c0 75 f3 48 83 c1 >>>>> 0x00007f6080c97f98:?? 08 48 01 d6 48 83 f9 20 75 d6 8b 7b 08 48 8b 05 >>>>> >>>>> Register to memory mapping: >>>>> >>>>> RAX=0x0000000000000010 is an unknown value >>>>> RBX=0x00007f5ff800ad30 is an unknown value >>>>> RCX=0x0000000000000010 is an unknown value >>>>> RDX=0x0000000000000000 is an unknown value >>>>> RSP=0x00007f607c3ecb50 is an unknown value >>>>> RBP=0x00007f607c3ecb80 is an unknown value >>>>> RSI=0x0000000000000002 is an unknown value >>>>> RDI=0x0000000001cfe570 is an unknown value >>>>> R8 =0x00007f5ff80ae320 is an unknown value >>>>> R9 =0x00007f5ff8052480 is an unknown value >>>>> R10=0x0000000000000000 is an unknown value >>>>> R11=0x0000000000000400 is an unknown value >>>>> R12=0x0000000001cfe570 is an unknown value >>>>> R13=0x00007f6081419470: in /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/amd64/server/libjvm.so at 0x00007f608044c000 >>>>> R14=0x0000000000000002 is an unknown value >>>>> R15=0x00007f6081418640: in /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/amd64/server/libjvm.so at 0x00007f608044c000 >>>>> >>>>> >>>>> Stack: [0x00007f607c2ed000,0x00007f607c3ee000], sp=0x00007f607c3ecb50, free space=1022k >>>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >>>>> V? [libjvm.so+0x84bf88] >>>>> V? [libjvm.so+0x84d5fa] >>>>> V? [libjvm.so+0x473f5e] >>>>> V? [libjvm.so+0x474f0f] >>>>> V? [libjvm.so+0x95e0b7] >>>>> V? [libjvm.so+0x95e9d5] >>>>> V? [libjvm.so+0xad448a] >>>>> V? [libjvm.so+0xad48f1] >>>>> V? [libjvm.so+0x8beb82] >>>>> >>>>> VM_Operation (0x00007f5fd69e6120): GenCollectForAllocation, mode: safepoint, requested by thread 0x00007f6079013800 >>>>> >>>>> ... 
>>>>> -----------------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>> Next, I used GDB to check the backtrace of the SEGV thread from the coredump.
>>>>> The following is the backtrace.
>>>>> The SEGV occurred while a ClassLoader was being purged and its Metaspace destroyed.
>>>>> Frame #7 shows that the signal (SEGV) handler was invoked during execution of SpaceManager::~SpaceManager().
>>>>>
>>>>> -----------------------------------------------------------------------------
>>>>> (gdb) bt
>>>>> #0  0x00007f608146f1f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>>>>> #1  0x00007f60814708e8 in __GI_abort () at abort.c:90
>>>>> #2  0x00007f6080d0bc39 in os::abort (dump_core=) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1519
>>>>> #3  0x00007f6080f1b816 in VMError::report_and_die (this=this@entry=0x7f607c3ebd10) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1060
>>>>> #4  0x00007f6080d15927 in JVM_handle_linux_signal (sig=11, info=0x7f607c3ebfb0, ucVoid=0x7f607c3ebe80, abort_if_unrecognized=)
>>>>>      at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541
>>>>> #5  0x00007f6080d09038 in signalHandler (sig=11, info=0x7f607c3ebfb0, uc=0x7f607c3ebe80) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:4446
>>>>> #6
>>>>> #7  SpaceManager::~SpaceManager (this=0x7f5ff800ad30, __in_chrg=) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2028
>>>>> #8  0x00007f6080c995fa in Metaspace::~Metaspace (this=0x7f5ff800ad00, __in_chrg=)
>>>>>      at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2971
>>>>> #9
0x00007f60808bff5e in ClassLoaderData::~ClassLoaderData (this=0x7f5ff800ac20, __in_chrg=)
>>>>>      at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:383
>>>>> #10 0x00007f60808c0f0f in ClassLoaderDataGraph::purge () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:818
>>>>> #11 0x00007f6080daa0b7 in ClassLoaderDataGraph::purge_if_needed () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.hpp:104
>>>>> #12 SafepointSynchronize::do_cleanup_tasks () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/safepoint.cpp:551
>>>>> #13 0x00007f6080daa9d5 in SafepointSynchronize::begin () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/safepoint.cpp:402
>>>>> #14 0x00007f6080f2048a in VMThread::loop (this=this@entry=0x7f6078c00000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:501
>>>>> #15 0x00007f6080f208f1 in VMThread::run (this=0x7f6078c00000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:276
>>>>> #16 0x00007f6080d0ab82 in java_start (thread=0x7f6078c00000) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:796
>>>>> #17 0x00007f6081e2de25 in start_thread (arg=0x7f607c3ed700) at pthread_create.c:308
>>>>> #18 0x00007f608153234d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
>>>>> -----------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> In Frame #7, Line 2028 (chunk = chunk->next()) is the crash point.
>>>>> The variable "chunk" is defined at Line 2025 (Metachunk* chunk = chunks_in_use(i);).
>>>>> "chunks_in_use(i)" is defined at Line 648 (Metachunk* chunks_in_use(ChunkIndex index) const { return _chunks_in_use[index]; }).
>>>>> So I checked the values of "_chunks_in_use" and found that "_chunks_in_use[2]" holds the illegal address "0x10".
>>>>> Therefore, I think the SEGV occurred because the illegal address "0x10" was dereferenced at "chunk = chunk->next()".
>>>>>
>>>>> -----------------------------------------------------------------------------
>>>>> (gdb) f 7
>>>>> #7  SpaceManager::~SpaceManager (this=0x7f5ff800ad30, __in_chrg=) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2028
>>>>> 2028        chunk = chunk->next();
>>>>> (gdb) list
>>>>> 2023    size_t SpaceManager::sum_count_in_chunks_in_use(ChunkIndex i) {
>>>>> 2024      size_t count = 0;
>>>>> 2025      Metachunk* chunk = chunks_in_use(i);
>>>>> 2026      while (chunk != NULL) {
>>>>> 2027        count++;
>>>>> 2028        chunk = chunk->next();
>>>>> 2029      }
>>>>> 2030      return count;
>>>>> 2031    }
>>>>> 2032
>>>>> (gdb) list SpaceManager::chunks_in_use
>>>>> 647       // Accessors
>>>>> 648       Metachunk* chunks_in_use(ChunkIndex index) const { return _chunks_in_use[index]; }
>>>>> ...
>>>>> (gdb) p _chunks_in_use
>>>>> $11 = {0x7f5fcd41c400, 0x7f5fcd41a000, 0x10, 0x0}
>>>>> -----------------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>> The following is the disassembly of "SpaceManager::~SpaceManager()".
>>>>> %rax holds 0x10 at "0x00007f6080c97f88 <+200>", but I don't understand how this "0x10" got into %rax.
>>>>>
>>>>> -----------------------------------------------------------------------------
>>>>> (gdb) disas
>>>>> Dump of assembler code for function SpaceManager::~SpaceManager():
>>>>>     0x00007f6080c97ec0 <+0>:    push   %rbp
>>>>>     0x00007f6080c97ec1 <+1>:    mov    %rsp,%rbp
>>>>>     0x00007f6080c97ec4 <+4>:    push   %r15
>>>>>
0x00007f6080c97ec6 <+6>:??? push?? %r14 >>>>> ??? 0x00007f6080c97ec8 <+8>:??? push?? %r13 >>>>> ??? 0x00007f6080c97eca <+10>:??? push?? %r12 >>>>> ??? 0x00007f6080c97ecc <+12>:??? push?? %rbx >>>>> ??? 0x00007f6080c97ecd <+13>:??? mov??? %rdi,%rbx >>>>> ??? 0x00007f6080c97ed0 <+16>:??? sub??? $0x8,%rsp >>>>> ??? 0x00007f6080c97ed4 <+20>:??? mov 0x780785(%rip),%r12??????? # 0x7f6081418660 <_ZN12SpaceManager12_expand_lockE> >>>>> ??? 0x00007f6080c97edb <+27>:??? test?? %r12,%r12 >>>>> ??? 0x00007f6080c97ede <+30>:??? je???? 0x7f6080c97ee8 >>>>> ??? 0x00007f6080c97ee0 <+32>:??? mov??? %r12,%rdi >>>>> ??? 0x00007f6080c97ee3 <+35>:??? callq? 0x7f6080cce2f0 >>>>> ??? 0x00007f6080c97ee8 <+40>:??? movslq 0x8(%rbx),%rcx >>>>> ??? 0x00007f6080c97eec <+44>:??? lea 0x78075d(%rip),%rdx??????? # 0x7f6081418650 <_ZN12MetaspaceAux15_capacity_wordsE> >>>>> ??? 0x00007f6080c97ef3 <+51>:??? lea 0x781576(%rip),%r13??????? # 0x7f6081419470 <_ZN2os16_processor_countE> >>>>> ??? 0x00007f6080c97efa <+58>:??? lea 0x78073f(%rip),%r15??????? # 0x7f6081418640 <_ZN12MetaspaceAux11_used_wordsE> >>>>> ??? 0x00007f6080c97f01 <+65>:??? mov (%rdx,%rcx,8),%rax >>>>> ??? 0x00007f6080c97f05 <+69>:??? sub 0x40(%rbx),%rax >>>>> ??? 0x00007f6080c97f09 <+73>:??? mov %rax,(%rdx,%rcx,8) >>>>> ??? 0x00007f6080c97f0d <+77>:??? mov 0x38(%rbx),%rax >>>>> ??? 0x00007f6080c97f11 <+81>:??? movslq 0x8(%rbx),%rdx >>>>> ??? 0x00007f6080c97f15 <+85>:??? neg??? %rax >>>>> ??? 0x00007f6080c97f18 <+88>:??? cmpl?? $0x1,0x0(%r13) >>>>> ??? 0x00007f6080c97f1d <+93>:??? lea (%r15,%rdx,8),%rcx >>>>> ??? 0x00007f6080c97f21 <+97>:??? mov??? $0x1,%edx >>>>> ??? 0x00007f6080c97f26 <+102>:??? jne 0x7f6080c97f32 >>>>> ??? 0x00007f6080c97f28 <+104>:??? lea 0x74acb4(%rip),%rdx??????? # 0x7f60813e2be3 >>>>> ??? 0x00007f6080c97f2f <+111>:??? movzbl (%rdx),%edx >>>>> ??? 0x00007f6080c97f32 <+114>:??? cmp??? $0x0,%dl >>>>> ??? 0x00007f6080c97f35 <+117>:??? je 0x7f6080c97f38 >>>>> ??? 0x00007f6080c97f37 <+119>:??? 
lock xadd %rax,(%rcx) >>>>> ??? 0x00007f6080c97f3c <+124>:??? mov 0x48(%rbx),%r14 >>>>> ??? 0x00007f6080c97f40 <+128>:??? callq 0x7f6080c951a0 >>>>> ??? 0x00007f6080c97f45 <+133>:??? movslq 0x8(%rbx),%rdx >>>>> ??? 0x00007f6080c97f49 <+137>:??? imul?? %r14,%rax >>>>> ??? 0x00007f6080c97f4d <+141>:??? lea (%r15,%rdx,8),%rcx >>>>> ??? 0x00007f6080c97f51 <+145>:??? mov??? $0x1,%edx >>>>> ??? 0x00007f6080c97f56 <+150>:??? neg??? %rax >>>>> ??? 0x00007f6080c97f59 <+153>:??? cmpl $0x1,0x0(%r13) >>>>> ??? 0x00007f6080c97f5e <+158>:??? jne 0x7f6080c97f6a >>>>> ??? 0x00007f6080c97f60 <+160>:??? lea 0x74ac7c(%rip),%rdx??????? # 0x7f60813e2be3 >>>>> ??? 0x00007f6080c97f67 <+167>:??? movzbl (%rdx),%edx >>>>> ??? 0x00007f6080c97f6a <+170>:??? cmp??? $0x0,%dl >>>>> ??? 0x00007f6080c97f6d <+173>:??? je 0x7f6080c97f70 >>>>> ??? 0x00007f6080c97f6f <+175>:??? lock xadd %rax,(%rcx) >>>>> ??? 0x00007f6080c97f74 <+180>:??? xor??? %ecx,%ecx >>>>> ??? 0x00007f6080c97f76 <+182>:??? xor??? %esi,%esi >>>>> ??? 0x00007f6080c97f78 <+184>:??? mov 0x10(%rbx,%rcx,1),%rax >>>>> ??? 0x00007f6080c97f7d <+189>:??? xor??? %edx,%edx >>>>> ??? 0x00007f6080c97f7f <+191>:??? test?? %rax,%rax >>>>> ??? 0x00007f6080c97f82 <+194>:??? je 0x7f6080c97f95 >>>>> ??? 0x00007f6080c97f84 <+196>:??? nopl?? 0x0(%rax) >>>>> => 0x00007f6080c97f88 <+200>:??? mov 0x8(%rax),%rax >>>>> ??? 0x00007f6080c97f8c <+204>:??? add??? $0x1,%rdx >>>>> ??? 0x00007f6080c97f90 <+208>:??? test?? %rax,%rax >>>>> ... >>>>> (gdb) info registers >>>>> rax??????????? 0x10??? 16 >>>>> rbx??????????? 0x7f5ff800ad30??? 140050159414576 >>>>> rcx??????????? 0x10??? 16 >>>>> rdx??????????? 0x0??? 0 >>>>> rsi??????????? 0x2??? 2 >>>>> rdi??????????? 0x1cfe570??? 30401904 >>>>> rbp??????????? 0x7f607c3ecb80??? 0x7f607c3ecb80 >>>>> rsp??????????? 0x7f607c3ecb50??? 0x7f607c3ecb50 >>>>> r8???????????? 0x7f5ff80ae320??? 140050160083744 >>>>> r9???????????? 0x7f5ff8052480??? 140050159707264 >>>>> r10??????????? 0x0??? 0 >>>>> r11??????????? 0x400??? 
1024 >>>>> r12??????????? 0x1cfe570??? 30401904 >>>>> r13??????????? 0x7f6081419470??? 140052462146672 >>>>> r14??????????? 0x2??? 2 >>>>> r15??????????? 0x7f6081418640??? 140052462143040 >>>>> rip??????????? 0x7f6080c97f88??? 0x7f6080c97f88 >>>>> eflags???????? 0x206??? [ PF IF ] >>>>> cs???????????? 0x33??? 51 >>>>> ss???????????? 0x2b??? 43 >>>>> ds???????????? 0x0??? 0 >>>>> es???????????? 0x0??? 0 >>>>> fs???????????? 0x0??? 0 >>>>> gs???????????? 0x0??? 0 >>>>> k0???????????? >>>>> k1???????????? >>>>> k2???????????? >>>>> k3???????????? >>>>> k4???????????? >>>>> k5???????????? >>>>> k6???????????? >>>>> k7???????????? >>>>> ----------------------------------------------------------------------------- >>>>> >>>>> ============================================================================= >>>>> >>>>> >>>>> >>>>> Does anyone know about this case? >>>>> >>>>> Thanks, Osamu >>>>> >>>>> >>> > From sangheon.kim at oracle.com Fri Oct 25 14:02:23 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Fri, 25 Oct 2019 07:02:23 -0700 Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3) In-Reply-To: <01a9ebcf-34ed-06b2-2da8-18d84feae858@oracle.com> References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com> <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com> <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com> <01a9ebcf-34ed-06b2-2da8-18d84feae858@oracle.com> Message-ID: <196b55d5-01f4-0202-effb-4495ae409df0@oracle.com> Hi Stefan, On 10/23/19 1:47 AM, Stefan Johansson wrote: > Hi Sangheon, > > On 2019-10-22 18:47, sangheon.kim at oracle.com wrote: >> Hi Kim, >> >> On 10/22/19 12:19 AM, Kim Barrett wrote: >>>> On Oct 22, 2019, at 1:52 AM, sangheon.kim at oracle.com wrote: >>>> What do you think about below comment? >>>> >>>> ?? // Tries to allocate word_sz in the PLAB of the next >>>> "generation" after trying to >>>> ?? // allocate into dest. 
Previous_plab_refill_failed indicates
>>>> whether previous
>>>>    // PLAB refill for the original (source) object was failed.
>>> Drop "was".  Otherwise looks good.
>> Done.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~sangheki/8220311/webrev.3
>> http://cr.openjdk.java.net/~sangheki/8220311/webrev.3.inc
> Looks good in general, just one minor thing, no need for a new webrev
> though:
> src/hotspot/share/gc/g1/g1Allocator.cpp
> ---
> 144   for (uint nodex_index = 0; nodex_index < _num_alloc_regions;
> nodex_index++) {
>
> The name nodex_index has one too many x:es =) I would prefer node_index.

Ouch! Fixed.

In addition, Stefan, Thomas and I had some discussion about making PLAB
NUMA-aware (only for survivor). Stefan provided a patch for it, and it is
simple enough to include under this CR.

Webrev:
http://cr.openjdk.java.net/~sangheki/8220311/webrev.4
http://cr.openjdk.java.net/~sangheki/8220311/webrev.4.inc
Testing: hs-tier 1 ~ 3, with/without UseNUMA

Thanks,
Sangheon

> ---
>
> Thanks,
> Stefan
>
>>
>> Thanks,
>> Sangheon
>>
>>
>>>
>>>>    // Returns a non-NULL pointer if successful, and updates dest if
>>>> required.
>>>>    // Also determines whether we should continue to try to allocate
>>>> into the various
>>>>    // generations or just end trying to allocate.
>>>>    HeapWord* allocate_in_next_plab(G1HeapRegionAttr* dest,
>>>> ...
>>>>
>>>> Let me post the webrev when we decide. :)
>>>>
>>>> Thanks,
>>>> Sangheon
>>>>
>>>>
>>>>> ------------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> Looks good, other than that one comment issue.
>>>
>>

From zgu at redhat.com  Fri Oct 25 14:29:08 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 25 Oct 2019 10:29:08 -0400
Subject: RFR 8232992: Shenandoah: Implement self-fixing interpreter LRB
Message-ID: <1648aef7-6df9-6f54-6601-fde9d7251187@redhat.com>

Please review this patch that implements self-fixing interpreter LRB.
Bug: https://bugs.openjdk.java.net/browse/JDK-8232992 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232992/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug and release) x86_64 and x86_32 on Linux aarch64 Linux Windows x86_64 Thanks, -Zhengyu From shade at redhat.com Fri Oct 25 14:48:17 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 25 Oct 2019 16:48:17 +0200 Subject: RFR 8232992: Shenandoah: Implement self-fixing interpreter LRB In-Reply-To: <1648aef7-6df9-6f54-6601-fde9d7251187@redhat.com> References: <1648aef7-6df9-6f54-6601-fde9d7251187@redhat.com> Message-ID: <6ba6da68-c48f-24b1-7ca3-d2bd8a46c4b8@redhat.com> On 10/25/19 4:29 PM, Zhengyu Gu wrote: > Please review this patch that implements self-fixing interpreter LRB. > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8232992 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232992/webrev.00/ *) I believe we can drop "fixup" from ShenandoahRuntime::load_reference_barrier_fixup(_narrow), as there no not-fixup versions left. *) shenandoahBarrierSetAssembler_x86.cpp nit: space before parenthesis 346 if(setup_addr_first) { *) shenandoahBarrierSetAssembler_x86.cpp: why pop(thread) is in the middle here? 315 __ testb(gc_state, ShenandoahHeap::HAS_FORWARDED); 316 #ifndef _LP64 317 __ pop(thread); 318 #endif 319 __ jccb(Assembler::zero, done); *) shenandoahBarrierSetAssembler_x86.cpp: seems to me it is cleaner to initialize the boolean variable first, and then use it. Also, suggestion for name: "need_addr_setup". // Use rsi for src address const Register src_addr = rsi; bool need_addr_setup = (src_addr != dst); if (need_addr_setup) { ... } else { ... } __ call(RuntimeAddress(CAST_FROM_FN_PTR(...); if (need_addr_setup) { ... *) shenandoahBarrierSetAssembler_x86.cpp, shenandoahBarrierSetAssembler_aarch64.cpp: since this code now uses rscratch1, it has to assert that registers do not clash. 
For example with: assert_different_registers(dst, rscratch1, rscratch2); *) shenandoahBarrierSetC2.cpp: this change looks like a bug fix for matching _narrow. Please RFR it separately, it should go in sooner. -- Thanks, -Aleksey From shade at redhat.com Fri Oct 25 15:24:03 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 25 Oct 2019 17:24:03 +0200 Subject: RFR (XS) 8233021: Shenandoah: SBSC2::is_shenandoah_lrb_call should match all LRB shapes Message-ID: Bug: https://bugs.openjdk.java.net/browse/JDK-8233021 See bug for explanation. Fix: http://cr.openjdk.java.net/~shade/8233021/webrev.01/ Testing: hotspot_gc_shenandoah; specjvm2008 with C2 verification turned on -- Thanks, -Aleksey From sangheon.kim at oracle.com Fri Oct 25 21:56:30 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Fri, 25 Oct 2019 14:56:30 -0700 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <0A9D98F3-479D-421D-A5E0-0AB8BB203717@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> <9219a118-0c1d-2cee-10e5-f9bb87c72eb9@oracle.com> <521b3b8a-70e6-6fef-cb67-b6327fa08c03@oracle.com> <0A9D98F3-479D-421D-A5E0-0AB8BB203717@oracle.com> Message-ID: <1615ad5b-6be7-7e7d-6815-68cfc338fd6f@oracle.com> Hi Kim, On 10/24/19 4:05 PM, Kim Barrett wrote: >> On Oct 23, 2019, at 12:20 PM, sangheon.kim at oracle.com wrote: >> >> Hi Per, >> >> Thanks for taking a look at this. >> >> I agree all your comments and here's the webrev. >> - All comments from Per. >> - Move G1PageBasedVirtualSpace::page_size() near to page_start() from Kim. 
>> >> Webrev: >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.6 >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.6.inc >> Testing: build test for linux, solaris, windows and mac. >> >> FYI, as I think existing numa related API names and -1 stuff seem not good, I planned to refine those later after pushing. But as you said following existing rule and then refine all together later seems better. > The type of the argument for numa_get_group_id(void* address) should > be "const void*". Sorry I didn't notice that earlier. Of course, > this will require a const_cast to remove the const qualifier when > calling get_mempolicy, but it is better to isolate the workaround for > that missing qualifier to that one place. > > I'm not sure I like the overload for os::numa_get_group_id. While > both are getting the numa id associated with something, the associations > involved seem pretty different to me. > > Spelling them out, they could be > > numa_get_group_id_for_current_thread() > numa_get_group_id_for_address(const void* address) > > Those seem semantically unrelated to me, so violate the usual guidance > of only overloading operations that are roughly equivalent (*). Or put > another way, one should not need to determine which overload is selected > to understand a call site. > > Of course, "roughly equivalent" is in the eye of the beholder. > > (*) Operator overloading sometimes violates this on the basis that the > syntactic concision of using operators is more important, and there > are a limited set of operators. Such violations are often used as an > argument against using operator overloading at all. I think the overload looks okay to me. But as you are not sure about it, I renamed the newly added one. 
- static int numa_get_group_id(void* address); + static int numa_get_group_id_for_address(const void* address); webrev: http://cr.openjdk.java.net/~sangheki/8220310/webrev.7 http://cr.openjdk.java.net/~sangheki/8220310/webrev.7.inc Testing: hs-tier1 Thanks, Sangheon > From zgu at redhat.com Sat Oct 26 00:34:38 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 25 Oct 2019 20:34:38 -0400 Subject: RFR 8232992: Shenandoah: Implement self-fixing interpreter LRB In-Reply-To: <6ba6da68-c48f-24b1-7ca3-d2bd8a46c4b8@redhat.com> References: <1648aef7-6df9-6f54-6601-fde9d7251187@redhat.com> <6ba6da68-c48f-24b1-7ca3-d2bd8a46c4b8@redhat.com> Message-ID: <942c5c5d-fa2b-e14b-3319-0092d782da24@redhat.com> On 10/25/19 10:48 AM, Aleksey Shipilev wrote: > On 10/25/19 4:29 PM, Zhengyu Gu wrote: >> Please review this patch that implements self-fixing interpreter LRB. >> >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8232992 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8232992/webrev.00/ > > *) I believe we can drop "fixup" from ShenandoahRuntime::load_reference_barrier_fixup(_narrow), as > there no not-fixup versions left. Sure. > > *) shenandoahBarrierSetAssembler_x86.cpp nit: space before parenthesis > > 346 if(setup_addr_first) { Fixed > > *) shenandoahBarrierSetAssembler_x86.cpp: why pop(thread) is in the middle here? > > 315 __ testb(gc_state, ShenandoahHeap::HAS_FORWARDED); > 316 #ifndef _LP64 > 317 __ pop(thread); > 318 #endif > 319 __ jccb(Assembler::zero, done); > I was worried about having to track 'thread', so just pop it after use and forget about it. But, yes, there is nothing to worry about, reverted. > *) shenandoahBarrierSetAssembler_x86.cpp: seems to me it is cleaner to initialize the boolean > variable first, and then use it. Also, suggestion for name: "need_addr_setup". > > // Use rsi for src address > const Register src_addr = rsi; > bool need_addr_setup = (src_addr != dst); > > if (need_addr_setup) { > ... > } else { > ... 
> } > > __ call(RuntimeAddress(CAST_FROM_FN_PTR(...); > > if (need_addr_setup) { > ... > Fixed > *) shenandoahBarrierSetAssembler_x86.cpp, shenandoahBarrierSetAssembler_aarch64.cpp: since this code > now uses rscratch1, it has to assert that registers do not clash. For example with: > > assert_different_registers(dst, rscratch1, rscratch2); We only need to use rscratch1 when dst == r1, and there is possibility that dst comes in in rscratch1 (see SBSA::load_at() method), I think current assertion (dst != rscratch2) is sufficient. However, we do need to ensure scratch registers are not used by load_addr, so added: assert_different_registers(load_addr.base(), load_addr.index(), rscratch1); assert_different_registers(load_addr.base(), load_addr.index(), rscratch2); Updated: http://cr.openjdk.java.net/~zgu/JDK-8232992/webrev.01/ Reran hotspot_gc_shenandoah tests (fastdebug and release) x86_64 and x86_32 on Linux aarch64 on Linux Thanks, -Zhengyu > > *) shenandoahBarrierSetC2.cpp: this change looks like a bug fix for matching _narrow. Please RFR it > separately, it should go in sooner. > From kim.barrett at oracle.com Sat Oct 26 01:51:43 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 25 Oct 2019 21:51:43 -0400 Subject: RFR (XS): 8232951: TestG1ParallelPhases.java fails with phase NonYoungFreeCSet not found In-Reply-To: <27fa21ab-d8ac-95ea-3485-7a72116c22f2@oracle.com> References: <27fa21ab-d8ac-95ea-3485-7a72116c22f2@oracle.com> Message-ID: <951EB603-F273-4787-9D8D-32D8194ECFAA@oracle.com> > On Oct 24, 2019, at 7:50 AM, Thomas Schatzl wrote: > [?] > CR: > https://bugs.openjdk.java.net/browse/JDK-8232951 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8232951/webrev/ > Testing: > 400 runs of the changed test without issues > > Thanks, > Thomas I'd not previously noticed the AlwaysTenure and NeverTenure options. So many options... Those options are documented as being ParallelGC only. 
But it looks like setting either of them forces a value for MaxTenuringThreshold, so it seems okay to change the test to use AlwaysTenure. The documentation for the options should be updated though. (That can be a separate RFE.) Please put the new -Xlog option on a separate line. I know we don't have an official line length limit, but 152 chars seems excessive to me, and forced me to scroll to see some of it. Other than that, looks good. I don't need a new webrev. From per.liden at oracle.com Sat Oct 26 08:36:44 2019 From: per.liden at oracle.com (Per Liden) Date: Sat, 26 Oct 2019 10:36:44 +0200 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <1615ad5b-6be7-7e7d-6815-68cfc338fd6f@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> <9219a118-0c1d-2cee-10e5-f9bb87c72eb9@oracle.com> <521b3b8a-70e6-6fef-cb67-b6327fa08c03@oracle.com> <0A9D98F3-479D-421D-A5E0-0AB8BB203717@oracle.com> <1615ad5b-6be7-7e7d-6815-68cfc338fd6f@oracle.com> Message-ID: <9d9494cd-82cd-6cf6-94e6-432a6ae187fb@oracle.com> On 10/25/19 11:56 PM, sangheon.kim at oracle.com wrote: > Hi Kim, > > On 10/24/19 4:05 PM, Kim Barrett wrote: >>> On Oct 23, 2019, at 12:20 PM,sangheon.kim at oracle.com wrote: >>> >>> Hi Per, >>> >>> Thanks for taking a look at this. >>> >>> I agree all your comments and here's the webrev. >>> - All comments from Per. >>> - Move G1PageBasedVirtualSpace::page_size() near to page_start() from Kim. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.6 >>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.6.inc >>> Testing: build test for linux, solaris, windows and mac. 
>>> >>> FYI, as I think existing numa related API names and -1 stuff seem not good, I planned to refine those later after pushing. But as you said following existing rule and then refine all together later seems better. >> The type of the argument for numa_get_group_id(void* address) should >> be "const void*". Sorry I didn't notice that earlier. Of course, >> this will require a const_cast to remove the const qualifier when >> calling get_mempolicy, but it is better to isolate the workaround for >> that missing qualifier to that one place. >> >> I'm not sure I like the overload for os::numa_get_group_id. While >> both are getting the numa id associated with something, the associations >> involved seem pretty different to me. >> >> Spelling them out, they could be >> >> numa_get_group_id_for_current_thread() >> numa_get_group_id_for_address(const void* address) >> >> Those seem semantically unrelated to me, so violate the usual guidance >> of only overloading operations that are roughly equivalent (*). Or put >> another way, one should not need to determine which overload is selected >> to understand a call site. >> >> Of course, "roughly equivalent" is in the eye of the beholder. >> >> (*) Operator overloading sometimes violates this on the basis that the >> syntactic concision of using operators is more important, and there >> are a limited set of operators. Such violations are often used as an >> argument against using operator overloading at all. > I think the overload looks okay to me. > But as you are not sure about it, I renamed the newly added one. > > - static int numa_get_group_id(void* address); > + static int numa_get_group_id_for_address(const void* address); > Works for me. 
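
A minimal C++ sketch of the naming point discussed above, not the actual HotSpot code: the two lookups get distinct names instead of sharing an overload, and the const_cast needed for a C API that takes a non-const pointer is confined to one function. legacy_query_node() is a hypothetical stand-in for such an API (get_mempolicy is the real one named in the review), and its page-interleaving rule is invented purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a C API whose pointer parameter is not
// const-qualified even though it never writes through the pointer.
// The rule below (4 KiB pages interleaved over 4 nodes) is invented.
static int legacy_query_node(void* addr) {
  const std::uintptr_t page = reinterpret_cast<std::uintptr_t>(addr) >> 12;
  return static_cast<int>(page % 4);
}

// Node of the calling thread. A fixed value here, purely for illustration.
static int numa_get_group_id_for_current_thread() {
  return 0;
}

// Node that backs the given address. Callers pass const pointers; the
// const_cast required by the legacy API is confined to this function.
static int numa_get_group_id_for_address(const void* address) {
  assert(address != nullptr);
  return legacy_query_node(const_cast<void*>(address));
}
```

With distinct names, a call site such as numa_get_group_id_for_address(p) can be read without working out which overload was selected.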
/Per > > webrev: > http://cr.openjdk.java.net/~sangheki/8220310/webrev.7 > http://cr.openjdk.java.net/~sangheki/8220310/webrev.7.inc > > Testing: hs-tier1 > > Thanks, > Sangheon > > > From kim.barrett at oracle.com Sun Oct 27 22:02:44 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Sun, 27 Oct 2019 18:02:44 -0400 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <1615ad5b-6be7-7e7d-6815-68cfc338fd6f@oracle.com> References: <06ACBF87-ADBE-499F-B668-0274E4925B26@oracle.com> <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> <9219a118-0c1d-2cee-10e5-f9bb87c72eb9@oracle.com> <521b3b8a-70e6-6fef-cb67-b6327fa08c03@oracle.com> <0A9D98F3-479D-421D-A5E0-0AB8BB203717@oracle.com> <1615ad5b-6be7-7e7d-6815-68cfc338fd6f@oracle.com> Message-ID: > On Oct 25, 2019, at 5:56 PM, sangheon.kim at oracle.com wrote: > > - static int numa_get_group_id(void* address); > + static int numa_get_group_id_for_address(const void* address); > > webrev: > http://cr.openjdk.java.net/~sangheki/8220310/webrev.7 > http://cr.openjdk.java.net/~sangheki/8220310/webrev.7.inc > > Testing: hs-tier1 > > Thanks, > Sangheon Looks good. From sakamoto.osamu at nttcom.co.jp Mon Oct 28 02:40:00 2019 From: sakamoto.osamu at nttcom.co.jp (Osamu Sakamoto) Date: Mon, 28 Oct 2019 11:40:00 +0900 Subject: Segmentation Fault occurs when ClassLoader and Metaspace is released in JDK 8 In-Reply-To: References: <422c9ca2-5053-c761-cb61-f075877bb666@oss.nttdata.com> <314f9ad2-17df-1082-8816-7af73a96e9fb@nttcom.co.jp_1> <1ccb4f35-7f21-4aa2-4cbb-b75244b6d12d@oss.nttdata.com> Message-ID: <121df7b1-b423-7790-5453-c14b545fa40b@nttcom.co.jp_1> Hi Yasumasa, > It seems a bug. > Anyone have any suggestions about this? 
I think so, too. Does anyone know about this?

> (CLD::_class_loader is not a static member, so this command should fail.)

I checked CLD::_class_loader after moving to frame 9 (ClassLoaderData::~ClassLoaderData), and it succeeded.

```
(gdb) f 9
#9  0x00007f60808bff5e in ClassLoaderData::~ClassLoaderData (this=0x7f5ff800ac20,
    __in_chrg=)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:383
383         delete m;
(gdb) p ClassLoaderData::_class_loader
$34 = (oop) 0xa3afc1f0
```

I rechecked the _class_loader in purge_me, and the value is the same.

```
(gdb) f 10
#10 0x00007f60808c0f0f in ClassLoaderDataGraph::purge ()
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:818
818         delete purge_me;
(gdb) p purge_me
$35 = (ClassLoaderData *) 0x7f5ff800ac20
(gdb) p purge_me->_class_loader
$36 = (oop) 0xa3afc1f0
```

> I believe CLD::_class_loader should be the OOP for class loader.
> I guess memory corruption occurred for some reason - is it a bug in HotSpot?

I checked ClassLoaderDataGraph::_head->_class_loader, too.
Its oop indicates sun.reflect.DelegatingClassLoader.
So it seems that _class_loader in purge_me holds an illegal oop value.

* GDB
```
(gdb) p ClassLoaderDataGraph::_head->_class_loader
$37 = (oop) 0xb10f3aa8
```

* CLHSDB
```
hsdb> inspect 0xb10f3aa8
instance of Oop for sun/reflect/DelegatingClassLoader @ 0x00000000b10f3aa8 @ 0x00000000b10f3aa8 (size = 72)
_mark: 184567784057
_metadata._compressed_klass: InstanceKlass for sun/reflect/DelegatingClassLoader
parent: Oop for org/apache/catalina/loader/ParallelWebappClassLoader @ 0x0000000099c00000 Oop for org/apache/catalina/loader/ParallelWebappClassLoader @ 0x0000000099c00000
    :
```

Thanks,
Osamu

On 10/25/19 21:20, Yasumasa Suenaga wrote:
> It seems a bug.
> Anyone have any suggestions about this?
> >> (gdb) p ClassLoaderData::_class_loader >> $21 = (oop) 0xa3afc1f0 > > (CLD::_class_loader is not static member, so this command would be > failed.) > >> hsdb> inspect 0xa3afc1f0 >> instance of [C @ 0x00000000a3afc1f0 @ 0x00000000a3afc1f0 (size = 72) >> _mark: 1 >> _metadata._compressed_klass: TypeArrayKlass for [C >> 0: 'c' > > I believe CLD::_class_loader should be the OOP for class loader. > I guess memory corruption was occurred in some reason - Is it a bug in > HotSpot? > > I checked 8u222 on Fedora 30, my guess seems correct. > > * GDB > ``` > (gdb) p ClassLoaderDataGraph::_head->_class_loader > $2 = (oop) 0xd67d0900 > ``` > > * CLHSDB > ``` > hsdb> inspect 0xd67d0900 > instance of Oop for sun/misc/Launcher$AppClassLoader @ > 0x00000000d67d0900 @ 0x00000000d67d0900 (size = 96) > _mark: 436443282689 > _metadata._compressed_klass: InstanceKlass for > sun/misc/Launcher$AppClassLoader > parent: Oop for sun/misc/Launcher$ExtClassLoader @ 0x00000000d67bb348 > Oop for sun/misc/Launcher$ExtClassLoader @ 0x00000000d67bb348 > ??????? : > ``` > > > Yasumasa > > > On 2019/10/25 17:53, Osamu Sakamoto wrote: >> Hi Yasumasa, >> >> >> ?> I guess this is a bug in combination of Metaspace and CMS. >> ?> However current jdk/jdk has different implementation, so it might >> not be occur in modern JDK. >> ?> I want to hear the comments from others. >> Thank you for your comment. >> I want to hear from others, too >> >> >> ?> AFAICS you cannot find head of _unloading at this point. >> ?> However you can traverse CLD list with purge_me->_next . >> Thank you for telling me how to traverse CLD list. >> I could start to traverse the CLD list, but this list is too long to >> traverse manually. >> I recursively chekced _next -> _next -> next ... about 500 times with >> GDB print command, but NULL termination or address loop isn't found yet. >> I'll try to find a good way to traverse the CLD list to the end. 
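
The manual check described above (follow `_next` until it either reaches NULL or revisits an address) is the classic linked-list cycle test. As a minimal sketch of that idea only, the C++ below uses a hypothetical `Node` type in place of `ClassLoaderData`; `Node`, `ListShape`, `WalkResult` and `classify_list` are illustrative names, not HotSpot or GDB APIs, and the code assumes every pointer it follows is valid, which may not hold in a corrupted core dump.

```cpp
#include <cstddef>

// Hypothetical stand-in for a list element such as ClassLoaderData;
// only the next pointer matters for this check.
struct Node {
  Node* next;
};

enum class ListShape { NullTerminated, Cyclic };

struct WalkResult {
  ListShape shape;
  std::size_t steps;  // number of slow-pointer steps taken before deciding
};

// Floyd's tortoise-and-hare: 'fast' advances two links per step, 'slow' one.
// If the list loops back on itself the two pointers eventually meet;
// otherwise 'fast' runs off the end at NULL.
inline WalkResult classify_list(const Node* head) {
  const Node* slow = head;
  const Node* fast = head;
  std::size_t steps = 0;
  while (fast != nullptr && fast->next != nullptr) {
    slow = slow->next;
    fast = fast->next->next;
    ++steps;
    if (slow == fast) {
      return {ListShape::Cyclic, steps};
    }
  }
  return {ListShape::NullTerminated, steps};
}
```

The same two-pointer walk could in principle be scripted inside a debugger session rather than compiled, which avoids typing `->_next` hundreds of times by hand.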
>> >> >> ?> BTW, CLD has OOP for class loader in ClassLoaderData::_class_loader . >> ?> If you check it on (CL)HSDB, you might get any hints from it. >> ?> For example, use system class loader instead of custom class >> loader from framework. >> I checked CLD oop, but I don't understand what type of ClassLoader is. >> The result is below. >> It looks like that this ClassLoaderData::_class_loader oop indicates >> character array. >> Is it normal? >> If so, what is this class loader?(Bootstrap ClassLoader?) >> >> --------------------------------------------------- >> (gdb) p ClassLoaderData::_class_loader >> $21 = (oop) 0xa3afc1f0 >> >> hsdb> inspect 0xa3afc1f0 >> instance of [C @ 0x00000000a3afc1f0 @ 0x00000000a3afc1f0 (size = 72) >> _mark: 1 >> _metadata._compressed_klass: TypeArrayKlass for [C >> 0: 'c' >> 1: 'o' >> 2: 'l' >> 3: 'u' >> 4: 'm' >> 5: 'n' >> 6: '1' >> 7: '5' >> 8: '6' >> 9: '5' >> 10: '7' >> 11: '5' >> 12: '5' >> 13: '9' >> 14: '8' >> 15: '6' >> 16: '3' >> 17: '3' >> 18: '1' >> 19: '_' >> 20: '8' >> 21: '0' >> 22: '0' >> 23: '3' >> --------------------------------------------------- >> >> >> Thanks, >> >> Osamu >> >> >> On 10/24/19 09:49, Yasumasa Suenaga wrote: >>> Hi Osamu, >>> >>> I guess this is a bug in combination of Metaspace and CMS. >>> However current jdk/jdk has different implementation, so it might >>> not be occur in modern JDK. >>> I want to hear the comments from others. >>> >>> My comments is below: >>> >>> On 2019/10/23 18:57, Osamu Sakamoto wrote: >>>> Hi Yasumasa, >>>> >>>> Thank you for answering. >>>> >>>> ?> What JVM options did you pass? >>>> The following is the JVM options I passed. 
>>>> ----------------------------------------------------------------- >>>> -Xmx2048m >>>> -Xms2048m >>>> -XX:NewSize=412m >>>> -XX:MaxNewSize=412m >>>> -XX:SurvivorRatio=8 >>>> -XX:MaxTenuringThreshold=15 >>>> -XX:+UseConcMarkSweepGC >>>> -XX:+UseCMSInitiatingOccupancyOnly >>>> -XX:CMSInitiatingOccupancyFraction=80 >>>> -XX:+CMSClassUnloadingEnabled >>>> -XX:CompressedClassSpaceSize=64m >>>> -XX:+PrintGCDetails >>>> -XX:+PrintGCDateStamps >>>> -XX:+UseGCLogFileRotation >>>> -XX:GCLogFileSize=0 >>>> -Xloggc:/var/log/tomcatm0/gc-%p.log >>>> -XX:+HeapDumpOnOutOfMemoryError >>>> -XX:+AlwaysLockClassLoader >>>> ----------------------------------------------------------------- >>>> >>>> >>>> ?> I guess you used CMS because this problem seems to occur on CMS >>>> only [1] [2]. >>>> Yes, I used CMS. >>>> >>>> ?> So it might be work around not to use CMS. >>>> Thank you for telling me work around. >>>> But it is difficult to change the GC method, so we would like to >>>> solve this issue with CMS GC if possible. >>>> >>>> >>>> ?> I'm not sure root cause of this issue, but it seems to break >>>> ClassLoaderDataGraph::_unloading. >>>> ?> (like double free (delete) of CLD) >>>> I checked whether the ClassLoaderDataGraph::_unloading is broken or >>>> not, but I didn't know because of the value has been cleaered by >>>> NULL or optimized out. >>>> >>>> Referring ClassLoaderDataGraph[1].cpp, it looks like that >>>> _unloading value is saved to ClassLoaderDataGraph::_saved_unloading. >>>> But _saved_unloading had been cleared by NULL, too. >>>> >>>> Is there any other way to check it? 
>>>> >>>> [1]http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/classfile/classLoaderData.cpp#l753 >>>> >>>> >>>> ----------------------------------------------------------------- >>>> (gdb) f 10 >>>> #10 0x00007f60808c0f0f in ClassLoaderDataGraph::purge () at >>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:818 >>>> >>>> 818??? ??? delete purge_me; >>>> (gdb) list ClassLoaderDataGraph::purge >>>> 810??? void ClassLoaderDataGraph::purge() { >>>> 811??? ? assert(SafepointSynchronize::is_at_safepoint(), "must be >>>> at safepoint!"); >>>> 812??? ? ClassLoaderData* list = _unloading; >>>> 813??? ? _unloading = NULL; >>>> 814??? ? ClassLoaderData* next = list; >>>> 815??? ? while (next != NULL) { >>>> 816??? ??? ClassLoaderData* purge_me = next; >>>> 817??? ??? next = purge_me->next(); >>>> 818??? ??? delete purge_me; >>>> 819??? ? } >>>> 820??? ? Metaspace::purge(); >>>> 821??? } >>>> (gdb) p _unloading >>>> $29 = (ClassLoaderData *) 0x0 >>>> (gdb) p list >>>> $30 = >>>> (gdb) p next >>>> $31 = >>>> (gdb) p ClassLoaderDataGraph::_saved_unloading >>>> $32 = (ClassLoaderData *) 0x0 >>>> ----------------------------------------------------------------- >>> >>> AFAICS you cannot find head of _unloading at this point. >>> However you can traverse CLD list with purge_me->_next . >>> >>> >>> BTW, CLD has OOP for class loader in ClassLoaderData::_class_loader . >>> If you check it on (CL)HSDB, you might get any hints from it. >>> For example, use system class loader instead of custom class loader >>> from framework. >>> >>> >>> Thanks, >>> >>> Yasumasa >>> >>> >>>> Thanks, >>>> Osamu >>>> >>>> On 10/21/19 22:29, Yasumasa Suenaga wrote: >>>>> Hi Osamu, >>>>> >>>>> What JVM options did you pass? >>>>> >>>>> I guess you used CMS because this problem seems to occur on CMS >>>>> only [1] [2]. >>>>> So it might be work around not to use CMS. 
>>>>> >>>>> I'm not sure root cause of this issue, but it seems to break >>>>> ClassLoaderDataGraph::_unloading. >>>>> (like double free (delete) of CLD) >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Yasumasa >>>>> >>>>> >>>>> [1] >>>>> http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/classfile/classLoaderData.hpp#l100 >>>>> [2] >>>>> http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/eed8e846c982/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp#l6384 >>>>> >>>>> >>>>> On 2019/10/21 17:50, Osamu Sakamoto wrote: >>>>>> Hi all, >>>>>> >>>>>> I have a problem about Segmentation Fault(SEGV) in GC and I can't >>>>>> make the cause clear. >>>>>> Could you help me solve the problem? >>>>>> >>>>>> Our System uses OpenJDK 1.8.0.181, and crashed by SEGV when >>>>>> purging ClassLoader at safepoint. >>>>>> This problem can't be reproduced, but this has happened 4 times >>>>>> in a few months. >>>>>> >>>>>> The following is the summary of my investigation. >>>>>> >>>>>> ============================================================================= >>>>>> >>>>>> >>>>>> First I checked hs_err, and that shows that the SEGV occurred. >>>>>> VM_Operation is GenCollectForAllocation at safepoint. >>>>>> >>>>>> ----------------------------------------------------------------------------- >>>>>> >>>>>> # >>>>>> # A fatal error has been detected by the Java Runtime Environment: >>>>>> # >>>>>> #? SIGSEGV (0xb) at pc=0x00007f6080c97f88, pid=23931, >>>>>> tid=0x00007f607c3ed700 >>>>>> # >>>>>> # JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build >>>>>> 1.8.0_181-b13) >>>>>> # Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode >>>>>> linux-amd64 compressed oops) >>>>>> # Problematic frame: >>>>>> # V? [libjvm.so+0x84bf88] >>>>>> # >>>>>> # Core dump written. Default location: /opt/tomcate0/core or >>>>>> core.23931 >>>>>> # >>>>>> # If you would like to submit a bug report, please visit: >>>>>> #?? 
http://bugreport.java.com/bugreport/crash.jsp >>>>>> # >>>>>> >>>>>> ---------------? T H R E A D? --------------- >>>>>> >>>>>> Current thread (0x00007f6078c00000):? VMThread [stack: >>>>>> 0x00007f607c2ed000,0x00007f607c3ee000] [id=23939] >>>>>> >>>>>> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), >>>>>> si_addr: 0x0000000000000018 >>>>>> >>>>>> Registers: >>>>>> RAX=0x0000000000000010, RBX=0x00007f5ff800ad30, >>>>>> RCX=0x0000000000000010, RDX=0x0000000000000000 >>>>>> RSP=0x00007f607c3ecb50, RBP=0x00007f607c3ecb80, >>>>>> RSI=0x0000000000000002, RDI=0x0000000001cfe570 >>>>>> R8 =0x00007f5ff80ae320, R9 =0x00007f5ff8052480, >>>>>> R10=0x0000000000000000, R11=0x0000000000000400 >>>>>> R12=0x0000000001cfe570, R13=0x00007f6081419470, >>>>>> R14=0x0000000000000002, R15=0x00007f6081418640 >>>>>> RIP=0x00007f6080c97f88, EFLAGS=0x0000000000010202, >>>>>> CSGSFS=0x0000000000000033, ERR=0x0000000000000004 >>>>>> ?? TRAPNO=0x000000000000000e >>>>>> >>>>>> Top of Stack: (sp=0x00007f607c3ecb50) >>>>>> 0x00007f607c3ecb50:?? 00007f607c3ecba0 00007f5ff800ad30 >>>>>> 0x00007f607c3ecb60:?? 00007f5ff800ad00 0000000000000000 >>>>>> 0x00007f607c3ecb70:?? 0000000000000000 0000000000000001 >>>>>> 0x00007f607c3ecb80:?? 00007f607c3ecba0 00007f6080c995fa >>>>>> 0x00007f607c3ecb90:?? 00007f5ff800ad00 00007f5ff800ac20 >>>>>> 0x00007f607c3ecba0:?? 00007f607c3ecbc0 00007f60808bff5e >>>>>> 0x00007f607c3ecbb0:?? 00007f5ff800ac20 00007f5ff8052870 >>>>>> 0x00007f607c3ecbc0:?? 00007f607c3ecbe0 00007f60808c0f0f >>>>>> 0x00007f607c3ecbd0:?? 00007f607c3ecbf0 00007f608140f308 >>>>>> 0x00007f607c3ecbe0:?? 00007f607c3ecc30 00007f6080daa0b7 >>>>>> 0x00007f607c3ecbf0:?? 00007f6069000100 0000000000000000 >>>>>> 0x00007f607c3ecc00:?? 00007f607c3ecc20 00007f6080ed0800 >>>>>> 0x00007f607c3ecc10:?? 00000000000000f9 88e95c3ba257ab00 >>>>>> 0x00007f607c3ecc20:?? 431bde82d7b634db 00007f607800aa00 >>>>>> 0x00007f607c3ecc30:?? 00007f607c3eccc0 00007f6080daa9d5 >>>>>> 0x00007f607c3ecc40:?? 
0000000000000000 00007f607803bf20 >>>>>> 0x00007f607c3ecc50:?? 00007f607803be20 00000000000003e8 >>>>>> 0x00007f607c3ecc60:?? 0000000000000001 00007f6078c00000 >>>>>> 0x00007f607c3ecc70:?? 00007f607c3eccc0 0000000000000000 >>>>>> 0x00007f607c3ecc80:?? 00000004000000f9 00007f60813e2b99 >>>>>> 0x00007f607c3ecc90:?? 00007f607803bfa0 00007f6078c00000 >>>>>> 0x00007f607c3ecca0:?? 0000000000000000 0000000000000000 >>>>>> 0x00007f607c3eccb0:?? 00007f6081418bd0 00007f607803bf20 >>>>>> 0x00007f607c3eccc0:?? 00007f607c3ece60 00007f6080f2048a >>>>>> 0x00007f607c3eccd0:?? 00007f607c3ecd20 00007f607c3ecce0 >>>>>> 0x00007f607c3ecce0:?? 00007f6078c00000 00007f6078c00980 >>>>>> 0x00007f607c3eccf0:?? 00007f6078c009c0 00007f6078c009d0 >>>>>> 0x00007f607c3ecd00:?? 00007f6078c00aa8 00000000000000d8 >>>>>> 0x00007f607c3ecd10:?? 00007f6078c00be0 0000000000000000 >>>>>> 0x00007f607c3ecd20:?? 00007f607c3ecd28 6e69747563657845 >>>>>> 0x00007f607c3ecd30:?? 65706f204d562067 203a6e6f69746172 >>>>>> 0x00007f607c3ecd40:?? 656c6c6f436e6547 6c6c41726f467463 >>>>>> >>>>>> Instructions: (pc=0x00007f6080c97f88) >>>>>> 0x00007f6080c97f68:?? b6 12 80 fa 00 74 01 f0 48 0f c1 01 31 c9 >>>>>> 31 f6 >>>>>> 0x00007f6080c97f78:?? 48 8b 44 0b 10 31 d2 48 85 c0 74 11 0f 1f >>>>>> 40 00 >>>>>> 0x00007f6080c97f88:?? 48 8b 40 08 48 83 c2 01 48 85 c0 75 f3 48 >>>>>> 83 c1 >>>>>> 0x00007f6080c97f98:?? 
08 48 01 d6 48 83 f9 20 75 d6 8b 7b 08 48 >>>>>> 8b 05 >>>>>> >>>>>> Register to memory mapping: >>>>>> >>>>>> RAX=0x0000000000000010 is an unknown value >>>>>> RBX=0x00007f5ff800ad30 is an unknown value >>>>>> RCX=0x0000000000000010 is an unknown value >>>>>> RDX=0x0000000000000000 is an unknown value >>>>>> RSP=0x00007f607c3ecb50 is an unknown value >>>>>> RBP=0x00007f607c3ecb80 is an unknown value >>>>>> RSI=0x0000000000000002 is an unknown value >>>>>> RDI=0x0000000001cfe570 is an unknown value >>>>>> R8 =0x00007f5ff80ae320 is an unknown value >>>>>> R9 =0x00007f5ff8052480 is an unknown value >>>>>> R10=0x0000000000000000 is an unknown value >>>>>> R11=0x0000000000000400 is an unknown value >>>>>> R12=0x0000000001cfe570 is an unknown value >>>>>> R13=0x00007f6081419470: in >>>>>> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/amd64/server/libjvm.so >>>>>> at 0x00007f608044c000 >>>>>> R14=0x0000000000000002 is an unknown value >>>>>> R15=0x00007f6081418640: in >>>>>> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/amd64/server/libjvm.so >>>>>> at 0x00007f608044c000 >>>>>> >>>>>> >>>>>> Stack: [0x00007f607c2ed000,0x00007f607c3ee000], >>>>>> sp=0x00007f607c3ecb50, free space=1022k >>>>>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, >>>>>> C=native code) >>>>>> V? [libjvm.so+0x84bf88] >>>>>> V? [libjvm.so+0x84d5fa] >>>>>> V? [libjvm.so+0x473f5e] >>>>>> V? [libjvm.so+0x474f0f] >>>>>> V? [libjvm.so+0x95e0b7] >>>>>> V? [libjvm.so+0x95e9d5] >>>>>> V? [libjvm.so+0xad448a] >>>>>> V? [libjvm.so+0xad48f1] >>>>>> V? [libjvm.so+0x8beb82] >>>>>> >>>>>> VM_Operation (0x00007f5fd69e6120): GenCollectForAllocation, mode: >>>>>> safepoint, requested by thread 0x00007f6079013800 >>>>>> >>>>>> ... >>>>>> ----------------------------------------------------------------------------- >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Next, I used GDB to check the backtrace of the SEGV thread from >>>>>> the coredump. 
>>>>>> The following is the backtrace. >>>>>> The SEGV occurred when ClassLoader is purged and Metaspace is >>>>>> destructed. >>>>>> And frame #7 shows that a signal(SEGV) handler is called after >>>>>> SpaceManager::~SpaceManager() is executed. >>>>>> >>>>>> ----------------------------------------------------------------------------- >>>>>> >>>>>> (gdb) bt >>>>>> #0? 0x00007f608146f1f7 in __GI_raise (sig=sig at entry=6) at >>>>>> ../nptl/sysdeps/unix/sysv/linux/raise.c:56 >>>>>> #1? 0x00007f60814708e8 in __GI_abort () at abort.c:90 >>>>>> #2? 0x00007f6080d0bc39 in os::abort (dump_core=) >>>>>> at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1519 >>>>>> #3? 0x00007f6080f1b816 in VMError::report_and_die >>>>>> (this=this at entry=0x7f607c3ebd10) at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1060 >>>>>> #4? 0x00007f6080d15927 in JVM_handle_linux_signal (sig=11, >>>>>> info=0x7f607c3ebfb0, ucVoid=0x7f607c3ebe80, >>>>>> abort_if_unrecognized=) >>>>>> ???? at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541 >>>>>> #5? 0x00007f6080d09038 in signalHandler (sig=11, >>>>>> info=0x7f607c3ebfb0, uc=0x7f607c3ebe80) at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:4446 >>>>>> #6? >>>>>> #7? SpaceManager::~SpaceManager (this=0x7f5ff800ad30, >>>>>> __in_chrg=) at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2028 >>>>>> #8? 0x00007f6080c995fa in Metaspace::~Metaspace >>>>>> (this=0x7f5ff800ad00, __in_chrg=) >>>>>> ???? at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2971 >>>>>> #9? 
0x00007f60808bff5e in ClassLoaderData::~ClassLoaderData >>>>>> (this=0x7f5ff800ac20, __in_chrg=) >>>>>> ???? at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:383 >>>>>> #10 0x00007f60808c0f0f in ClassLoaderDataGraph::purge () at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.cpp:818 >>>>>> #11 0x00007f6080daa0b7 in ClassLoaderDataGraph::purge_if_needed >>>>>> () at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/classLoaderData.hpp:104 >>>>>> #12 SafepointSynchronize::do_cleanup_tasks () at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/safepoint.cpp:551 >>>>>> #13 0x00007f6080daa9d5 in SafepointSynchronize::begin () at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/safepoint.cpp:402 >>>>>> #14 0x00007f6080f2048a in VMThread::loop >>>>>> (this=this at entry=0x7f6078c00000) at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:501 >>>>>> #15 0x00007f6080f208f1 in VMThread::run (this=0x7f6078c00000) at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/runtime/vmThread.cpp:276 >>>>>> #16 0x00007f6080d0ab82 in java_start (thread=0x7f6078c00000) at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:796 >>>>>> #17 0x00007f6081e2de25 in start_thread (arg=0x7f607c3ed700) at >>>>>> pthread_create.c:308 >>>>>> #18 0x00007f608153234d in clone () at >>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 >>>>>> ----------------------------------------------------------------------------- >>>>>> >>>>>> >>>>>> >>>>>> In Frame #7, Line 2028 (chunk = chunk->next()) is 
the crash point. >>>>>> The variable "chunk" is defined at Line 2025 (Metachunk* chunk = >>>>>> chunks_in_use(i);). >>>>>> "chunks_in_use(i)" is defined at Line 648 (Metachunk* >>>>>> chunks_in_use(ChunkIndex index) const { return >>>>>> _chunks_in_use[index]; }). >>>>>> So I checked values of "_chunks_in_use", and understood that >>>>>> "_chunks_in_use[2]" has Illegal Address "0x10". >>>>>> Therefore, I think that the SEGV occurred because of referencing >>>>>> Illegal Address "0x10" at "chunk = chunk->next()". >>>>>> >>>>>> ----------------------------------------------------------------------------- >>>>>> >>>>>> (gdb) f 7 >>>>>> #7? SpaceManager::~SpaceManager (this=0x7f5ff800ad30, >>>>>> __in_chrg=) at >>>>>> /usr/src/debug/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/memory/metaspace.cpp:2028 >>>>>> 2028??? ??? chunk = chunk->next(); >>>>>> (gdb) list >>>>>> 2023??? size_t >>>>>> SpaceManager::sum_count_in_chunks_in_use(ChunkIndex i) { >>>>>> 2024??? ? size_t count = 0; >>>>>> 2025??? ? Metachunk* chunk = chunks_in_use(i); >>>>>> 2026??? ? while (chunk != NULL) { >>>>>> 2027??? ??? count++; >>>>>> 2028??? ??? chunk = chunk->next(); >>>>>> 2029??? ? } >>>>>> 2030??? ? return count; >>>>>> 2031??? } >>>>>> 2032 >>>>>> (gdb) list SpaceManager::chunks_in_use >>>>>> 647??? ? // Accessors >>>>>> 648??? ? Metachunk* chunks_in_use(ChunkIndex index) const { >>>>>> return _chunks_in_use[index]; } >>>>>> ... >>>>>> (gdb) p _chunks_in_use >>>>>> $11 = {0x7f5fcd41c400, 0x7f5fcd41a000, 0x10, 0x0} >>>>>> ----------------------------------------------------------------------------- >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> The following is disassemble code of >>>>>> "SpaceManager::~SpaceManager()". >>>>>> %rax has 0x10 at "0x00007f6080c97f88 <+200>", but I don't >>>>>> understand why this "0x10" is inserted to %rax. 
>>>>>> >>>>>> ----------------------------------------------------------------------------- >>>>>> >>>>>> (gdb) disas >>>>>> Dump of assembler code for function SpaceManager::~SpaceManager(): >>>>>> ??? 0x00007f6080c97ec0 <+0>:??? push?? %rbp >>>>>> ??? 0x00007f6080c97ec1 <+1>:??? mov??? %rsp,%rbp >>>>>> ??? 0x00007f6080c97ec4 <+4>:??? push?? %r15 >>>>>> ??? 0x00007f6080c97ec6 <+6>:??? push?? %r14 >>>>>> ??? 0x00007f6080c97ec8 <+8>:??? push?? %r13 >>>>>> ??? 0x00007f6080c97eca <+10>:??? push?? %r12 >>>>>> ??? 0x00007f6080c97ecc <+12>:??? push?? %rbx >>>>>> ??? 0x00007f6080c97ecd <+13>:??? mov??? %rdi,%rbx >>>>>> ??? 0x00007f6080c97ed0 <+16>:??? sub??? $0x8,%rsp >>>>>> ??? 0x00007f6080c97ed4 <+20>:??? mov 0x780785(%rip),%r12??????? # >>>>>> 0x7f6081418660 <_ZN12SpaceManager12_expand_lockE> >>>>>> ??? 0x00007f6080c97edb <+27>:??? test?? %r12,%r12 >>>>>> ??? 0x00007f6080c97ede <+30>:??? je 0x7f6080c97ee8 >>>>>> >>>>>> ??? 0x00007f6080c97ee0 <+32>:??? mov??? %r12,%rdi >>>>>> ??? 0x00007f6080c97ee3 <+35>:??? callq 0x7f6080cce2f0 >>>>>> >>>>>> ??? 0x00007f6080c97ee8 <+40>:??? movslq 0x8(%rbx),%rcx >>>>>> ??? 0x00007f6080c97eec <+44>:??? lea 0x78075d(%rip),%rdx??????? # >>>>>> 0x7f6081418650 <_ZN12MetaspaceAux15_capacity_wordsE> >>>>>> ??? 0x00007f6080c97ef3 <+51>:??? lea 0x781576(%rip),%r13??????? # >>>>>> 0x7f6081419470 <_ZN2os16_processor_countE> >>>>>> ??? 0x00007f6080c97efa <+58>:??? lea 0x78073f(%rip),%r15??????? # >>>>>> 0x7f6081418640 <_ZN12MetaspaceAux11_used_wordsE> >>>>>> ??? 0x00007f6080c97f01 <+65>:??? mov (%rdx,%rcx,8),%rax >>>>>> ??? 0x00007f6080c97f05 <+69>:??? sub 0x40(%rbx),%rax >>>>>> ??? 0x00007f6080c97f09 <+73>:??? mov %rax,(%rdx,%rcx,8) >>>>>> ??? 0x00007f6080c97f0d <+77>:??? mov 0x38(%rbx),%rax >>>>>> ??? 0x00007f6080c97f11 <+81>:??? movslq 0x8(%rbx),%rdx >>>>>> ??? 0x00007f6080c97f15 <+85>:??? neg??? %rax >>>>>> ??? 0x00007f6080c97f18 <+88>:??? cmpl $0x1,0x0(%r13) >>>>>> ??? 0x00007f6080c97f1d <+93>:??? lea (%r15,%rdx,8),%rcx >>>>>> ??? 
0x00007f6080c97f21 <+97>:??? mov??? $0x1,%edx >>>>>> ??? 0x00007f6080c97f26 <+102>:??? jne 0x7f6080c97f32 >>>>>> >>>>>> ??? 0x00007f6080c97f28 <+104>:??? lea 0x74acb4(%rip),%rdx??????? >>>>>> # 0x7f60813e2be3 >>>>>> ??? 0x00007f6080c97f2f <+111>:??? movzbl (%rdx),%edx >>>>>> ??? 0x00007f6080c97f32 <+114>:??? cmp??? $0x0,%dl >>>>>> ??? 0x00007f6080c97f35 <+117>:??? je 0x7f6080c97f38 >>>>>> >>>>>> ??? 0x00007f6080c97f37 <+119>:??? lock xadd %rax,(%rcx) >>>>>> ??? 0x00007f6080c97f3c <+124>:??? mov 0x48(%rbx),%r14 >>>>>> ??? 0x00007f6080c97f40 <+128>:??? callq 0x7f6080c951a0 >>>>>> >>>>>> ??? 0x00007f6080c97f45 <+133>:??? movslq 0x8(%rbx),%rdx >>>>>> ??? 0x00007f6080c97f49 <+137>:??? imul?? %r14,%rax >>>>>> ??? 0x00007f6080c97f4d <+141>:??? lea (%r15,%rdx,8),%rcx >>>>>> ??? 0x00007f6080c97f51 <+145>:??? mov??? $0x1,%edx >>>>>> ??? 0x00007f6080c97f56 <+150>:??? neg??? %rax >>>>>> ??? 0x00007f6080c97f59 <+153>:??? cmpl $0x1,0x0(%r13) >>>>>> ??? 0x00007f6080c97f5e <+158>:??? jne 0x7f6080c97f6a >>>>>> >>>>>> ??? 0x00007f6080c97f60 <+160>:??? lea 0x74ac7c(%rip),%rdx??????? >>>>>> # 0x7f60813e2be3 >>>>>> ??? 0x00007f6080c97f67 <+167>:??? movzbl (%rdx),%edx >>>>>> ??? 0x00007f6080c97f6a <+170>:??? cmp??? $0x0,%dl >>>>>> ??? 0x00007f6080c97f6d <+173>:??? je 0x7f6080c97f70 >>>>>> >>>>>> ??? 0x00007f6080c97f6f <+175>:??? lock xadd %rax,(%rcx) >>>>>> ??? 0x00007f6080c97f74 <+180>:??? xor??? %ecx,%ecx >>>>>> ??? 0x00007f6080c97f76 <+182>:??? xor??? %esi,%esi >>>>>> ??? 0x00007f6080c97f78 <+184>:??? mov 0x10(%rbx,%rcx,1),%rax >>>>>> ??? 0x00007f6080c97f7d <+189>:??? xor??? %edx,%edx >>>>>> ??? 0x00007f6080c97f7f <+191>:??? test?? %rax,%rax >>>>>> ??? 0x00007f6080c97f82 <+194>:??? je 0x7f6080c97f95 >>>>>> >>>>>> ??? 0x00007f6080c97f84 <+196>:??? nopl?? 0x0(%rax) >>>>>> => 0x00007f6080c97f88 <+200>:??? mov 0x8(%rax),%rax >>>>>> ??? 0x00007f6080c97f8c <+204>:??? add??? $0x1,%rdx >>>>>> ??? 0x00007f6080c97f90 <+208>:??? test?? %rax,%rax >>>>>> ... 
>>>>>> (gdb) info registers >>>>>> rax??????????? 0x10??? 16 >>>>>> rbx??????????? 0x7f5ff800ad30??? 140050159414576 >>>>>> rcx??????????? 0x10??? 16 >>>>>> rdx??????????? 0x0??? 0 >>>>>> rsi??????????? 0x2??? 2 >>>>>> rdi??????????? 0x1cfe570??? 30401904 >>>>>> rbp??????????? 0x7f607c3ecb80??? 0x7f607c3ecb80 >>>>>> rsp??????????? 0x7f607c3ecb50??? 0x7f607c3ecb50 >>>>>> r8???????????? 0x7f5ff80ae320??? 140050160083744 >>>>>> r9???????????? 0x7f5ff8052480??? 140050159707264 >>>>>> r10??????????? 0x0??? 0 >>>>>> r11??????????? 0x400??? 1024 >>>>>> r12??????????? 0x1cfe570??? 30401904 >>>>>> r13??????????? 0x7f6081419470??? 140052462146672 >>>>>> r14??????????? 0x2??? 2 >>>>>> r15??????????? 0x7f6081418640??? 140052462143040 >>>>>> rip??????????? 0x7f6080c97f88??? 0x7f6080c97f88 >>>>>> >>>>>> eflags???????? 0x206??? [ PF IF ] >>>>>> cs???????????? 0x33??? 51 >>>>>> ss???????????? 0x2b??? 43 >>>>>> ds???????????? 0x0??? 0 >>>>>> es???????????? 0x0??? 0 >>>>>> fs???????????? 0x0??? 0 >>>>>> gs???????????? 0x0??? 0 >>>>>> k0???????????? >>>>>> k1???????????? >>>>>> k2???????????? >>>>>> k3???????????? >>>>>> k4???????????? >>>>>> k5???????????? >>>>>> k6???????????? >>>>>> k7???????????? >>>>>> ----------------------------------------------------------------------------- >>>>>> >>>>>> >>>>>> ============================================================================= >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Does anyone know about this case? 
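
The values in the dumps above are consistent with one another, and the arithmetic can be checked mechanically. The following is a minimal sketch, not HotSpot code: the constant and function names are made up for illustration, the register values are copied from the hs_err file and GDB session quoted above, and the two displacements are read off the instructions `mov 0x10(%rbx,%rcx,1),%rax` and `mov 0x8(%rax),%rax` in the disassembly. It assumes 8-byte pointers, as on linux-amd64.

```cpp
#include <cstdint>

// Values copied from the hs_err file and the GDB session quoted above.
constexpr std::uint64_t kThis      = 0x7f5ff800ad30;  // RBX: SpaceManager* this
constexpr std::uint64_t kRcx       = 0x10;            // RCX: byte offset of the current slot
constexpr std::uint64_t kSlotValue = 0x10;            // RAX: pointer value loaded from that slot
constexpr std::uint64_t kSiAddr    = 0x18;            // si_addr reported with the SIGSEGV

// Displacements taken from the two mov instructions (illustrative names).
constexpr std::uint64_t kArrayDisp   = 0x10;  // mov 0x10(%rbx,%rcx,1),%rax
constexpr std::uint64_t kNextDisp    = 0x8;   // mov 0x8(%rax),%rax
constexpr std::uint64_t kPointerSize = 8;     // assumption: 64-bit pointers

// Index of the array slot addressed by a given byte offset.
constexpr std::uint64_t slot_index(std::uint64_t byte_offset) {
  return byte_offset / kPointerSize;
}

// Address of the slot that the first mov reads.
constexpr std::uint64_t slot_address(std::uint64_t self, std::uint64_t disp,
                                     std::uint64_t byte_offset) {
  return self + disp + byte_offset;
}

// Address that the second mov dereferences, given the loaded pointer value.
constexpr std::uint64_t fault_address(std::uint64_t loaded, std::uint64_t disp) {
  return loaded + disp;
}

static_assert(slot_index(kRcx) == 2, "RCX = 0x10 selects slot 2 of the array");
static_assert(fault_address(kSlotValue, kNextDisp) == kSiAddr,
              "0x10 + 0x8 matches the reported si_addr of 0x18");
```

So the `0x10` in %rax is simply the content of slot 2 of `_chunks_in_use`, and the fault is the first `chunk->next()` on that bogus pointer. This confirms where the crash happens; it says nothing about how `0x10` got into the slot.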
>>>>>> >>>>>> Thanks, Osamu >>>>>> >>>>>> >>>> >> From stefan.johansson at oracle.com Mon Oct 28 08:35:53 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Mon, 28 Oct 2019 09:35:53 +0100 Subject: RFR (XS): 8232951: TestG1ParallelPhases.java fails with phase NonYoungFreeCSet not found In-Reply-To: <951EB603-F273-4787-9D8D-32D8194ECFAA@oracle.com> References: <27fa21ab-d8ac-95ea-3485-7a72116c22f2@oracle.com> <951EB603-F273-4787-9D8D-32D8194ECFAA@oracle.com> Message-ID: <7734C751-1D9B-45F5-86FB-D51D2BE8985F@oracle.com> Hi Thomas, > On 26 Oct 2019, at 03:51, Kim Barrett wrote: > >> On Oct 24, 2019, at 7:50 AM, Thomas Schatzl wrote: >> [...] >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8232951 >> Webrev: >> http://cr.openjdk.java.net/~tschatzl/8232951/webrev/ >> Testing: >> 400 runs of the changed test without issues >> >> Thanks, >> Thomas > > I'd not previously noticed the AlwaysTenure and NeverTenure options. > So many options... > > Those options are documented as being ParallelGC only. But it looks > like setting either of them forces a value for MaxTenuringThreshold, > so it seems okay to change the test to use AlwaysTenure. The > documentation for the options should be updated though. (That can be > a separate RFE.) > > Please put the new -Xlog option on a separate line. I know we don't > have an official line length limit, but 152 chars seems excessive to > me, and forced me to scroll to see some of it. > > Other than that, looks good. I don't need a new webrev. > Sounds like a good fix and it looks good, Stefan From erik.osterlund at oracle.com Mon Oct 28 09:16:58 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Mon, 28 Oct 2019 10:16:58 +0100 Subject: RFR: 8232604: ZGC: Make ZVerifyViews mapping and unmapping precise In-Reply-To: <813a25f8-4540-859d-502d-8ffdfc503f72@oracle.com> References: <813a25f8-4540-859d-502d-8ffdfc503f72@oracle.com> Message-ID: Hi Stefan, Looks good.
Thanks, /Erik On 2019-10-24 18:36, Stefan Karlsson wrote: > Hi all, > > Please review this patch to make the ZVerifyViews mapping and > unmapping precise. > > https://cr.openjdk.java.net/~stefank/8232604/webrev.01/ > https://bugs.openjdk.java.net/browse/JDK-8232604 > > Today, when the ZVerifyViews flag is turned on, we unmap all bad > views. The intention is to catch stray-pointer bugs. > > The current implementation takes a short-cut and unmap all memory en > masse. This works for Linux, but not on Windows, where we must be > precise in what we unmap. > > There are three places where allocated pages are registered today: > - In the page table - actively used > - In the page cache - free pages waiting to be used > - In-flight from the alloc queue > > The proposed patch registers all satisfied alloc requests, lets the > requesting threads deregister the satisfied request when the page is > received, and makes sure that the GC visits all in-flight satisfied > alloc requests when it performs the ZVerifyViews flip. > > Thanks, > StefanK From stefan.karlsson at oracle.com Mon Oct 28 10:04:45 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 28 Oct 2019 11:04:45 +0100 Subject: RFR: 8232604: ZGC: Make ZVerifyViews mapping and unmapping precise In-Reply-To: References: <813a25f8-4540-859d-502d-8ffdfc503f72@oracle.com> Message-ID: <9678016a-ee4b-84e5-67f1-89e5beb78231@oracle.com> Thanks, Erik. StefanK On 2019-10-28 10:16, Erik Österlund wrote: > Hi Stefan, > > Looks good. > > Thanks, > /Erik > > On 2019-10-24 18:36, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to make the ZVerifyViews mapping and >> unmapping precise. >> >> https://cr.openjdk.java.net/~stefank/8232604/webrev.01/ >> https://bugs.openjdk.java.net/browse/JDK-8232604 >> >> Today, when the ZVerifyViews flag is turned on, we unmap all bad >> views. The intention is to catch stray-pointer bugs.
>> >> The current implementation takes a short-cut and unmap all memory en >> masse. This works for Linux, but not on Windows, where we must be >> precise in what we unmap. >> >> There are three places where allocated pages are registered today: >> - In the page table - actively used >> - In the page cache - free pages waiting to be used >> - In-flight from the alloc queue >> >> The proposed patch registers all satisfied alloc requests, lets the >> requesting threads deregister the satisfied request when the page is >> received, and makes sure that the GC visits all in-flight satisfied >> alloc requests when it performs the ZVerifyViews flip. >> >> Thanks, >> StefanK > From leo.korinth at oracle.com Mon Oct 28 10:40:59 2019 From: leo.korinth at oracle.com (Leo Korinth) Date: Mon, 28 Oct 2019 11:40:59 +0100 Subject: RFR (XS): 8232951: TestG1ParallelPhases.java fails with phase NonYoungFreeCSet not found In-Reply-To: <7734C751-1D9B-45F5-86FB-D51D2BE8985F@oracle.com> References: <27fa21ab-d8ac-95ea-3485-7a72116c22f2@oracle.com> <951EB603-F273-4787-9D8D-32D8194ECFAA@oracle.com> <7734C751-1D9B-45F5-86FB-D51D2BE8985F@oracle.com> Message-ID: <1fff7f47-aebc-bbcd-bda6-bb6185c11c3a@oracle.com> Hi. Just want to add some information, because I think it will fail again. The buggy test case is written by me and the provoke mixed gc part is copied mostly either from TestOldGenCollectionUsage or TestLogging (as it is hard to share this code due to JTREG). However when I did "copy" the code I also did try to improve the code, this could be the reason for this failure. I did at least two "improvements" in that I removed magic constants when allocating the 20k arrays and instead calculated how many I would need; this made the algorithm allocate ~2M instead of ~3M which could be a problem although to my understanding it should not be. Another change I made is that I will not provoke a gc by allocating until out-of-memory. 
The original code seems to try to provoke a gc by starting concurrent marks and young gc, but kind of fail-safes with the code after the comment // allocate more objects to provoke GC. Having this code I guess would fix the problem with the test case, but on the other hand, we would not know why the youngGC() after concurrent mark does not provoke a mixed gc (I guess it should, but correct me if this is false). I have talked to Thomas off-list, and I think AlwaysTenure is not the solution to the problem we have. I think adding the debug options is great and should be done, and AlwaysTenure seems better than MaxTenuringThreshold=1 but we should expect the test case to continue to fail in the future. If you go by adding AlwaysTenure instead of MaxTenuringThreshold=1, please also remove one getWhiteBox().youngGC() in allocateOldObjects so that we do not leave "magic" lines in the test case. Also update the comment to // Do *one* young collection... and there is another "-XX:MaxTenuringThreshold=1" that needs to be updated. I need no webrev for these changes. I am sorry that my "improvements" probably caused this failure, though just having heaps of code and not understanding why, is probably worse in the long run --- at least that is my thinking. Thanks, Leo On 28/10/2019 09:35, Stefan Johansson wrote: > Hi Thomas, > >> On 26 Oct 2019, at 03:51, Kim Barrett wrote: >> >>> On Oct 24, 2019, at 7:50 AM, Thomas Schatzl wrote: >>> [...] >>> CR: >>> https://bugs.openjdk.java.net/browse/JDK-8232951 >>> Webrev: >>> http://cr.openjdk.java.net/~tschatzl/8232951/webrev/ >>> Testing: >>> 400 runs of the changed test without issues >>> >>> Thanks, >>> Thomas >> >> I'd not previously noticed the AlwaysTenure and NeverTenure options. >> So many options... >> >> Those options are documented as being ParallelGC only. But it looks >> like setting either of them forces a value for MaxTenuringThreshold, >> so it seems okay to change the test to use AlwaysTenure.
The >> documentation for the options should be updated though. (That can be >> a separate RFE.) >> >> Please put the new -Xlog option on a separate line. I know we don't >> have an official line length limit, but 152 chars seems excessive to >> me, and forced me to scroll to see some of it. >> >> Other than that, looks good. I don't need a new webrev. >> > > Sounds like a good fix and it looks good, > Stefan > From stefan.johansson at oracle.com Mon Oct 28 12:41:38 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Mon, 28 Oct 2019 13:41:38 +0100 Subject: RFR: 8233065: PSParallelCompact::move_and_update is unused and should be removed Message-ID: <7C04DE96-9918-4F5B-81C5-0ABA5AB6DEAB@oracle.com> Hi, Please review this small fix that removes an unused function. JBS: https://bugs.openjdk.java.net/browse/JDK-8233065 Webrev: http://cr.openjdk.java.net/~sjohanss/8233065/00/ Summary: The function move_and_update was not removed when its last use was removed during the removal of PermGen. Testing: Build and tested through mach5 (tier1) Thanks, Stefan From leo.korinth at oracle.com Mon Oct 28 13:06:43 2019 From: leo.korinth at oracle.com (Leo Korinth) Date: Mon, 28 Oct 2019 14:06:43 +0100 Subject: RFR: 8233065: PSParallelCompact::move_and_update is unused and should be removed In-Reply-To: <7C04DE96-9918-4F5B-81C5-0ABA5AB6DEAB@oracle.com> References: <7C04DE96-9918-4F5B-81C5-0ABA5AB6DEAB@oracle.com> Message-ID: <9592e6db-bfb7-894d-fa24-f5982e63b8fd@oracle.com> Looks good. Thanks for cleaning up! /Leo On 28/10/2019 13:41, Stefan Johansson wrote: > Hi, > > Please review this small fix that removes an unused function. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8233065 > Webrev: http://cr.openjdk.java.net/~sjohanss/8233065/00/ > > Summary: > The function move_and_update was not removed when its last use was removed during the removal of PermGen. 
> > Testing: > Build and tested through mach5 (tier1) > > Thanks, > Stefan > From thomas.schatzl at oracle.com Mon Oct 28 13:42:10 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 28 Oct 2019 14:42:10 +0100 Subject: RFR: 8233065: PSParallelCompact::move_and_update is unused and should be removed In-Reply-To: <7C04DE96-9918-4F5B-81C5-0ABA5AB6DEAB@oracle.com> References: <7C04DE96-9918-4F5B-81C5-0ABA5AB6DEAB@oracle.com> Message-ID: <9f2a0003-c33f-1d31-f4bb-8d491817a4be@oracle.com> Hi, On 28.10.19 13:41, Stefan Johansson wrote: > Hi, > > Please review this small fix that removes an unused function. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8233065 > Webrev: http://cr.openjdk.java.net/~sjohanss/8233065/00/ > > Summary: > The function move_and_update was not removed when its last use was removed during the removal of PermGen. > > Testing: > Build and tested through mach5 (tier1) > looks good. Thomas From shade at redhat.com Mon Oct 28 14:49:23 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 28 Oct 2019 15:49:23 +0100 Subject: RFR 8232992: Shenandoah: Implement self-fixing interpreter LRB In-Reply-To: <942c5c5d-fa2b-e14b-3319-0092d782da24@redhat.com> References: <1648aef7-6df9-6f54-6601-fde9d7251187@redhat.com> <6ba6da68-c48f-24b1-7ca3-d2bd8a46c4b8@redhat.com> <942c5c5d-fa2b-e14b-3319-0092d782da24@redhat.com> Message-ID: <15f36ca6-aa08-3ca6-dad8-314baa77fc75@redhat.com> On 10/26/19 2:34 AM, Zhengyu Gu wrote: > We only need to use rscratch1 when dst == r1, and there is possibility that dst comes in in > rscratch1 (see SBSA::load_at() method), I think current assertion (dst != rscratch2) is sufficient. > > However, we do need to ensure scratch registers are not used by load_addr, so added: > > ? assert_different_registers(load_addr.base(), load_addr.index(), rscratch1); > ? 
assert_different_registers(load_addr.base(), load_addr.index(), rscratch2); Why not just: assert_different_registers(load_addr.base(), load_addr.index(), rscratch1, rscratch2); > Updated: http://cr.openjdk.java.net/~zgu/JDK-8232992/webrev.01/ Looks fine to me otherwise. -- Thanks, -Aleksey From erik.osterlund at oracle.com Mon Oct 28 15:29:35 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 28 Oct 2019 16:29:35 +0100 Subject: RFR: 8233073: Make BitMap accessors more memory ordering friendly Message-ID: <1b19522f-534d-ca6b-4e97-f837e4ab7212@oracle.com> Hi, I have need for accessors on BitMap being more explicitly memory ordering aware, to fix a bug in ZGC for AArch64 (https://bugs.openjdk.java.net/browse/JDK-8233061). In particular, I need failed bit sets to still have acquire semantics, and I need a getter with acquire semantics. My intention is to solve the problem by making the relevant BitMap accessors accept explicitly passed in memory ordering parameters, and utilize them. I draw the line of conservativeness at supporting IRIW-consistent loads. Having spent a great deal of time finding a single algorithm that breaks due to IRIW-consistency violations, and knowing the complexity of algorithms that actually can break due to that, I would be *very* surprised if we got anywhere close to that. Therefore, acquiring loads are the most conservative loads I support. This is explicitly stated in the API, so that anyone that actually relies on IRIW consistency in the future can reconsider that, and add a mode that fences before loads on nMCA machines. The main points of controversy with this patch, where I expect people to have wildly different opinions and hopefully get at least a little bit upset are the following: 1) For the same reason that our implementation of Atomic::cmpxchg does not supply both one ordering for success and one for failed CAS, unlike the C++11 atomic counter part, I do not do so either in the par_set_bit API. 
In the Atomic API, this was very much intentional, because it is tricky to reason about the subtle effects of having relaxed failed CAS and conservative success. In fact, it's a bug of precisely that nature I am chasing. Therefore, I wish to transfer that same reasoning to the par_set_bit API, and not allow passing in a weaker failing memory ordering. A consequence of this is that I have made the uses of this API more conservative for failed bit flips than it was in the past. However, this new API allows relaxing the real pain point of the API: the success case (with it's bi-directional full fencing semantics). So I expect it can be applied to make RMO architectures happier where it really matters in the end. However, I will not attempt to prove that relaxing these calls is okay in various places with this patch: that is outside of my scope, I'm merely adding API hooks for allowing that. 2) The default strength on the getter is memory_order_relaxed and not memory_order_conservative. After looking at uses, I realize it's used mostly in single threaded contexts by compiler code, and there is seemingly only a single use in the VM that cares about having acquire (ZGC, and it's broken today). While letting the frequency of uses decide what is the default rather than what is safest is not something I would normally do, it does feel like since the norm is so vastly in favour of the relaxed variant, I don't want to let the one ZGC use case clutter half the VM with explicitly relaxing the load. I am okay with reverting that decision if people want me to. 
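A minimal sketch of points 1 and 2, assuming std::atomic in place of HotSpot's Atomic API. TinyBitMap, load_order and the one-word layout are hypothetical names for illustration, not code from the webrev. The setter takes one ordering that applies to both the success and the failure path, and the initial load is never weaker than acquire unless the caller asked for relaxed or release:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical one-word bitmap; a stand-in, not the HotSpot BitMap class.
struct TinyBitMap {
  std::atomic<std::uint64_t> word{0};

  static std::uint64_t bit_mask(std::size_t index) {
    assert(index < 64);
    return std::uint64_t{1} << index;
  }

  // Loads are capped at acquire: relaxed/release requests load relaxed,
  // everything stronger loads with acquire (no IRIW-consistent loads).
  static std::memory_order load_order(std::memory_order order) {
    return (order == std::memory_order_relaxed ||
            order == std::memory_order_release)
               ? std::memory_order_relaxed
               : std::memory_order_acquire;
  }

  // One ordering parameter only: there is deliberately no separate,
  // weaker ordering for the failing case.
  bool par_set_bit(std::size_t index,
                   std::memory_order order = std::memory_order_seq_cst) {
    const std::uint64_t mask = bit_mask(index);
    std::uint64_t old_val = word.load(load_order(order));
    for (;;) {
      const std::uint64_t new_val = old_val | mask;
      if (new_val == old_val) {
        return false;  // Bit already set; the ordered load still applies.
      }
      if (word.compare_exchange_strong(old_val, new_val, order)) {
        return true;   // We set the bit.
      }
      // old_val now holds the current word; retry.
    }
  }

  // Getter defaulting to a relaxed load, as in point 2.
  bool at(std::size_t index,
          std::memory_order order = std::memory_order_relaxed) const {
    return (word.load(load_order(order)) & bit_mask(index)) != 0;
  }
};
```

A caller that only needs acquire on the "already set" path would write bm.par_set_bit(i, std::memory_order_acquire); whether relaxing any existing call site is safe is a separate question that this sketch does not answer.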
CR: http://cr.openjdk.java.net/~eosterlund/8233073/webrev.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8233073 Thanks, /Erik From zgu at redhat.com Mon Oct 28 15:35:49 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 28 Oct 2019 11:35:49 -0400 Subject: RFR 8232992: Shenandoah: Implement self-fixing interpreter LRB In-Reply-To: <15f36ca6-aa08-3ca6-dad8-314baa77fc75@redhat.com> References: <1648aef7-6df9-6f54-6601-fde9d7251187@redhat.com> <6ba6da68-c48f-24b1-7ca3-d2bd8a46c4b8@redhat.com> <942c5c5d-fa2b-e14b-3319-0092d782da24@redhat.com> <15f36ca6-aa08-3ca6-dad8-314baa77fc75@redhat.com> Message-ID: <48c982fe-9dce-b122-c1fd-6e716778d4f2@redhat.com> On 10/28/19 10:49 AM, Aleksey Shipilev wrote: > On 10/26/19 2:34 AM, Zhengyu Gu wrote: >> We only need to use rscratch1 when dst == r1, and there is possibility that dst comes in in >> rscratch1 (see SBSA::load_at() method), I think current assertion (dst != rscratch2) is sufficient. >> >> However, we do need to ensure scratch registers are not used by load_addr, so added: >> >> ? assert_different_registers(load_addr.base(), load_addr.index(), rscratch1); >> ? assert_different_registers(load_addr.base(), load_addr.index(), rscratch2); > > Why not just: > assert_different_registers(load_addr.base(), load_addr.index(), rscratch1, rscratch2); Yep, fixed and pushed. Thanks, -Zhengyu > >> Updated: http://cr.openjdk.java.net/~zgu/JDK-8232992/webrev.01/ > > Looks fine to me otherwise. > From erik.osterlund at oracle.com Mon Oct 28 15:53:05 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 28 Oct 2019 16:53:05 +0100 Subject: RFR: 8233061: ZGC: Enforce memory ordering in segmented bit maps Message-ID: <311b863b-e2dc-c56c-7115-d13afb7c4f4b@oracle.com> Hi, In ZGC, bitmaps are lazily cleared in a segmented fashion. In this scheme, liveness is determined by looking at a counter, a segment bit map and finally the flat bit map structure. 
The accesses for the various stages need to be ordered properly. This patch sprinkles some OrderAccess calls to enforce this ordering. Out of curiosity, I disassembled libjvm.so with and without this patch to see if the reordering has bitten us in practice on x86_64. Fortunately, according to my analysis, it has not; we seem to have been lucky. But there is a lot of machine code, so I could have missed something. However, given that we now have an AArch64 port which is definitely affected by this problem, and compilers really are free to do whatever they want to in the future, it seems in order to enforce this explicitly. This patch depends on https://bugs.openjdk.java.net/browse/JDK-8233073 which exposes some memory ordering aware getters on BitMap. I did not want to just wrap the existing API in ZGC, so I split that out to a separate RFE. CR: http://cr.openjdk.java.net/~eosterlund/8233061/webrev.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8233061 Thanks, /Erik From erik.osterlund at oracle.com Mon Oct 28 16:11:42 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 28 Oct 2019 17:11:42 +0100 Subject: RFR: 8224817: Implementation of JEP 364: ZGC on macOS In-Reply-To: References: Message-ID: <837c9e23-6068-53b5-9c50-7880a0f375c7@oracle.com> Hi, After some internal discussions with Per and Stefan, some refactorings have been made: 1) Use mmap consistently wherever possibly, instead of mach_vm_map, for consistency. And only use mach_vm_remap from a wrapper function to map in views. 2) Move the pmem segments up one level so that producer and consumer of the segments is on the same level, and let the virtual "file" know only about offsets. 3) Minor polishing. 
Incremental:
http://cr.openjdk.java.net/~eosterlund/8224817/webrev.00..01/

Full:
http://cr.openjdk.java.net/~eosterlund/8224817/webrev.01/

Thanks,
/Erik

On 2019-10-24 12:38, erik.osterlund at oracle.com wrote:
> Hi,
>
> Now that some curling has been performed, paving way for this patch:
>
>     8229027: Improve how JNIHandleBlock::oops_do distinguishes oops from non-oops
>     8229278: Improve hs_err location printing to assume less about GC internals
>     8229189: Improve JFR leak profiler tracing to deal with discontiguous heaps
>     8224815: Remove non-GC uses of CollectedHeap::is_in_reserved()
>     8224820: ZGC: Support discontiguous heap reservations
>
> ...the remaining thing to do is plugging in a few platform specific
> ZGC files. This patch does that.
> Decided to go with mach_vm_map/mach_vm_remap to implement
> multi-mapping. Previously I didn't want to do that as I couldn't
> figure out how to mach_vm_remap memory on top of reserved VA (acquired
> using mmap). But apparently VM_FLAGS_OVERWRITE was the missing
> ingredient there. With that in place, dodging the terrible ftruncate
> implementation on macOS seemed like a good idea. That also implies
> this port supports large pages (unlike other GCs on macOS today). Yay!
>
> CR:
> http://cr.openjdk.java.net/~eosterlund/8224817/webrev.00/
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8229358
>
> Thanks,
> /Erik

From erik.osterlund at oracle.com Mon Oct 28 16:38:02 2019
From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=)
Date: Mon, 28 Oct 2019 17:38:02 +0100
Subject: RFR: 8230661: ZGC: Stop reloading oops in load barriers
Message-ID: <954589ac-147d-9419-f5eb-2a4fceb61cf5@oracle.com>

Hi,

In ZGC, an oop is first loaded somewhere, by e.g. JIT compiled code. Then it passes a load barrier that typically does not take a slow path. But when it does take a slow path, the oop is sometimes reloaded, at historically three different places, and now two places.
1) We used to do that as part of the mechanism that transferred execution to the slow path because it was easier to write that stub code if the original oop died. Since then, the compiler slow paths have been rewritten to not reload the oop. 2) Once in the slow path, we sometimes reload weak oops during the resurrection block window, because there used to be a race when it closed. After concurrent class unloading integrated, there is a thread-local handshake before closing the resurrection block window. Therefore, that race no longer exists (when class unloading is used). 3) Once the final oop of a slow path has been determined, self-healing kicks in. The self-healing CAS may fail. When it does, the oop is reloaded. But this is completely unnecessary. With obstacle 1 gone, and 2 and 3 having no reason to be in the code any more, I propose to get rid of all reloading of the oops in the slow paths, so that it becomes easier to reason about the code. The object captured by the original load, is then always the same object as the object found after the load barrier completes, although possibly with a new bit representation. Bug: https://bugs.openjdk.java.net/browse/JDK-8230661 CR: https://bugs.openjdk.java.net/browse/JDK-8230661 Thanks, /Erik From erik.osterlund at oracle.com Mon Oct 28 16:44:14 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 28 Oct 2019 17:44:14 +0100 Subject: RFR: 8230661: ZGC: Stop reloading oops in load barriers In-Reply-To: <954589ac-147d-9419-f5eb-2a4fceb61cf5@oracle.com> References: <954589ac-147d-9419-f5eb-2a4fceb61cf5@oracle.com> Message-ID: Oops. CR link was a bug link. For anyone that couldn't figure out what the CR link could possibly be, here it is: http://cr.openjdk.java.net/~eosterlund/8230661/webrev.00/ /Erik On 2019-10-28 17:38, Erik ?sterlund wrote: > Hi, > > In ZGC, an oop is first loaded somewhere, by e.g. JIT compiled code. 
> Then it passes a load barrier that typically does not take a slow > path. But when it does take a slow path, the oop is sometimes > reloaded, at historically three different places, and now two places. > > 1) We used to do that as part of the mechanism that transferred > execution to the slow path because it was easier to write that stub > code if the original oop died. Since then, the compiler slow paths > have been rewritten to not reload the oop. > > 2) Once in the slow path, we sometimes reload weak oops during the > resurrection block window, because there used to be a race when it > closed. After concurrent class unloading integrated, there is a > thread-local handshake before closing the resurrection block window. > Therefore, that race no longer exists (when class unloading is used). > > 3) Once the final oop of a slow path has been determined, self-healing > kicks in. The self-healing CAS may fail. When it does, the oop is > reloaded. But this is completely unnecessary. > > With obstacle 1 gone, and 2 and 3 having no reason to be in the code > any more, I propose to get rid of all reloading of the oops in the > slow paths, so that it becomes easier to reason about the code. The > object captured by the original load, is then always the same object > as the object found after the load barrier completes, although > possibly with a new bit representation. 
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8230661
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8230661
>
> Thanks,
> /Erik

From stefan.johansson at oracle.com Mon Oct 28 19:03:36 2019
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Mon, 28 Oct 2019 20:03:36 +0100
Subject: RFR: 8220465: Use shadow regions for faster ParallelGC full GCs
In-Reply-To:
References: <4F02DD53-EA98-4A1A-B871-C6E9D9610B2C@oracle.com> <9B69AFD1-7AE2-4B50-BFCF-C9C6E2594240@oracle.com> <8ef5b52e-d6fc-3073-5ca7-44c87c1eb981@oracle.com> <92277aab-0578-9e2c-3f4f-55ae1e8c94a9@oracle.com> <400df998-171a-5bbe-9f3e-01af1781afb4@oracle.com>
Message-ID: <955C4FA4-FC18-446C-851A-0D04A916D88D@oracle.com>

Hi Haoyu,

I've looked through the patch in detail now and created a new webrev at:
http://cr.openjdk.java.net/~sjohanss/8220465/01/

I took the liberty of dropping the removal of move_and_update from your patch, since I'm addressing that separately in JDK-8233065. The webrev above is still based on that removal, but I expect that to be pushed tomorrow or Wednesday so that should be fine. I also changed the subject to make it more clear that this is now a review of:
https://bugs.openjdk.java.net/browse/JDK-8220465

Regarding the current patch, I think that it looks good in general, but I thought a bit more around how to share stuff between the closures and I agree that adding those extra virtual functions doesn't really feel worth it. I'm wondering whether a solution would work where we revert back to letting destination be the "real destination" (not ever pointing to the shadow region) and add a copy_destination, which is destination + offset. To make this work the normal MoveAndUpdateClosure would also have an offset, but it would always be 0. If do_addr() is then updated to use the copy_destination() in some places we might end up with something pretty nice, but maybe I'm missing something.
I also realized that the current patch will trigger an assert because destination is expected not to be the shadow address:

# Internal Error (open/src/hotspot/share/gc/parallel/psParallelCompact.cpp:3045), pid=12649, tid=12728
# assert(src_cp->destination() == destination) failed: first live obj in the space must match the destination

So this also suggests that we should keep destination() returning the real destination.

Some other comments:

src/hotspot/share/gc/parallel/psParallelCompact.cpp

3383 void ShadowClosure::complete_region(ParCompactionManager *cm, HeapWord *dest_addr,
3384                                     PSParallelCompact::RegionData *region_ptr) {
3385   assert(region_ptr->shadow_state() == ParallelCompactData::RegionData::FINISH, "Region should be finished");

This assertion will also trigger when running with a debug build, and at this point the shadow state should be SHADOW, not FINISH.

src/hotspot/share/gc/parallel/psParallelCompact.hpp

 632 inline bool ParallelCompactData::RegionData::mark_filled() {
 633   return Atomic::cmpxchg(FILLED, &_shadow_state, SHADOW) == SHADOW;
 634 }

Since we never check the return value here we should make it void and maybe instead add an assert that the return value is SHADOW.

When you have addressed these comments, would it be possible to include both the full patch and the incremental changes from the current version? That makes it easier for the reviewers to see what changed between versions of the patch.

Thanks,
Stefan

> On 24 Oct 2019, at 14:16, Stefan Johansson wrote:
>
> Hi Haoyu,
>
> On 2019-10-23 17:15, Haoyu Li wrote:
>> Hi Stefan,
>> Thanks for your constructive feedback. I've addressed all the issues you mentioned, and the updated patch is attached in this email.
> Nice, I will look at the patch next week, but I'll shortly answer your questions right away.
> >> During refining the patch, I have a couple of questions: >> 1) Now the MoveAndUpdateClosure and ShadowClosure assume the destination address is the very beginning of a region, instead of an arbitrary address like what it used to be. However, there is an unused function named PSParallelCompact::move_and_update() uses the MoveAndUpdateClosure to process a region from its middle, which conflicts with the assumption. I notice that you removed this function in your patch, and so did I in the updated patch. Does it matter? > Yes, I found this function during my code review and it should be removed, but I think that should be handled as a separate issue. We can do this removal before this patch goes in. > >> 2) Using the same do_addr() in MoveAndUpdateClosure and ShadowClosure is doable, but it does not reuse all the code neatly. Because storing the address of the shadow region in _destination requires extra virtual functions to handle allocating blocks in the start_array and setting addresses of deferred objects. In particular, allocate_blocks() and set_deferred_object_for() in both closures are added. Is it worth avoiding to use _offset to calculate the shadow_destination? > Ok, sounds like it might be better to have specific do_addr() functions then. I'll think some more around this when reviewing the new patch in depth. > >> If there are any problems with this patch, please contact me anytime. I'm more than happy to keep improving the code. Thanks again for reviewing. 
>> > Sound good, thanks, > Stefan From kim.barrett at oracle.com Tue Oct 29 01:54:16 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 28 Oct 2019 21:54:16 -0400 Subject: RFR: 8233073: Make BitMap accessors more memory ordering friendly In-Reply-To: <1b19522f-534d-ca6b-4e97-f837e4ab7212@oracle.com> References: <1b19522f-534d-ca6b-4e97-f837e4ab7212@oracle.com> Message-ID: <104E3C62-435C-4F31-87EB-D3EBC34634EE@oracle.com> Should this review be happening on hotspot-dev rather than hotspot-gc-dev? GC is not the only BitMap client; compiler uses them too (and generally rather differently). > On Oct 28, 2019, at 11:29 AM, Erik ?sterlund wrote: > > Hi, > > I have need for accessors on BitMap being more explicitly memory ordering aware, to fix a bug in ZGC for AArch64 (https://bugs.openjdk.java.net/browse/JDK-8233061). > In particular, I need failed bit sets to still have acquire semantics, and I need a getter with acquire semantics. > > My intention is to solve the problem by making the relevant BitMap accessors accept explicitly passed in memory ordering parameters, and utilize them. I draw the line of conservativeness at supporting IRIW-consistent loads. Having spent a great deal of time finding a single algorithm that breaks due to IRIW-consistency violations, and knowing the complexity of algorithms that actually can break due to that, I would be *very* surprised if we got anywhere close to that. Therefore, acquiring loads are the most conservative loads I support. This is explicitly stated in the API, so that anyone that actually relies on IRIW consistency in the future can reconsider that, and add a mode that fences before loads on nMCA machines. 
> > The main points of controversy with this patch, where I expect people to have wildly different opinions and hopefully get at least a little bit upset are the following: > > 1) For the same reason that our implementation of Atomic::cmpxchg does not supply both one ordering for success and one for failed CAS, unlike the C++11 atomic counter part, I do not do so either in the par_set_bit API. In the Atomic API, this was very much intentional, because it is tricky to reason about the subtle effects of having relaxed failed CAS and conservative success. In fact, it's a bug of precisely that nature I am chasing. Therefore, I wish to transfer that same reasoning to the par_set_bit API, and not allow passing in a weaker failing memory ordering. A consequence of this is that I have made the uses of this API more conservative for failed bit flips than it was in the past. However, this new API allows relaxing the real pain point of the API: the success case (with it's bi-directional full fencing semantics). So I expect it can be applied to make RMO architectures happier where it really matters in the end. However, I will not attempt to prove that relaxing these calls is okay in various places with this patch: that is outside of my scope, I'm merely adding API hooks for allowing that. > > 2) The default strength on the getter is memory_order_relaxed and not memory_order_conservative. After looking at uses, I realize it's used mostly in single threaded contexts by compiler code, and there is seemingly only a single use in the VM that cares about having acquire (ZGC, and it's broken today). While letting the frequency of uses decide what is the default rather than what is safest is not something I would normally do, it does feel like since the norm is so vastly in favour of the relaxed variant, I don't want to let the one ZGC use case clutter half the VM with explicitly relaxing the load. I am okay with reverting that decision if people want me to. 
> > CR: > http://cr.openjdk.java.net/~eosterlund/8233073/webrev.00/ > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8233073 > > Thanks, > /Erik ------------------------------------------------------------------------------ src/hotspot/share/utilities/bitMap.hpp 207 bool at(idx_t index, atomic_memory_order memory_order = memory_order_relaxed) const; My initial reaction here is that I'd prefer adding par_at() rather than giving at() a memory order argument. This would also address the question of what the default should be. For at(), it's nonatomic. For par_at() it's acquire. That would also avoid imposing volatile ordering on at(). As you said, existing uses of at() are relaxed/nonatomic. The code rearrangement for MarkBitMap::is_marked() makes me wonder if any of the calls should be acquire ordered, but obviously none are now... ------------------------------------------------------------------------------ src/hotspot/share/utilities/bitMap.hpp 205 // The memory ordering goes up to memory_order_acquire, but not higher. It is 206 // assumed that users of the BitMap API will never rely on IRIW consistency. I think what this means is that memory_order_seq_cst (memory_order_conservative in HotSpot) isn't supported? So just as we only have Atomic::load (relaxed) and Atomic::load_acquire (acquire). That seems okay. But if we aren't going to support the stronger semantics, I don't think this should permit the corresponding memory order value. C++11 specifies that atomic load operations cannot have a memory order of memory_order_release or memory_order_acq_rel. (Similarly, store operations cannot have a memory order of memory_order_consume, memory_order_acquire, or memory_order_acq_rel. That isn't relevant for this change, as all the modifying operations here are RMW.) So I think we should just be explicit that only relaxed and acquire are supported here. (And actually make that true; see next comment.) 
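The two getter shapes discussed in the comments above can be sketched as follows, assuming std::atomic rather than HotSpot's Atomic and OrderAccess. SketchBitMap and is_supported_load_order are hypothetical names, and at() uses a relaxed atomic load here only as the closest portable stand-in for a plain, non-atomic read:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical one-word bitmap used only to illustrate the getter API shape.
struct SketchBitMap {
  std::atomic<std::uint64_t> word{0};

  static std::uint64_t bit_mask(std::size_t index) {
    assert(index < 64);
    return std::uint64_t{1} << index;
  }

  // Only relaxed and acquire are meaningful for the getter in this sketch;
  // release, acq_rel and seq_cst are rejected.
  static bool is_supported_load_order(std::memory_order order) {
    return order == std::memory_order_relaxed ||
           order == std::memory_order_acquire;
  }

  // Shape 1: a single getter with an ordering argument, restricted by assert.
  bool at(std::size_t index,
          std::memory_order order = std::memory_order_relaxed) const {
    assert(is_supported_load_order(order));
    return (word.load(order) & bit_mask(index)) != 0;
  }

  // Shape 2: a separate "parallel" getter that always uses acquire,
  // so at() never needs an ordering argument at call sites.
  bool par_at(std::size_t index) const {
    return (word.load(std::memory_order_acquire) & bit_mask(index)) != 0;
  }

  // Simple setter so the getters have something to observe.
  void set_bit(std::size_t index) {
    word.fetch_or(bit_mask(index), std::memory_order_release);
  }
};
```

With the second shape, single-threaded callers keep calling at(index) unchanged, and only concurrent readers opt in to par_at(index).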
------------------------------------------------------------------------------ src/hotspot/share/utilities/bitMap.inline.hpp 55 inline bool BitMap::at(idx_t index, atomic_memory_order memory_order) const { ... 58 return (load_word_ordered(addr, memory_order) & bit_mask(index)) != 0; This is using the load_word_ordered helper, but the behavior of that function is designed to support the RMW operations, and I think isn't really right for at() (see previous comment). The simplest solution to get what I'm suggesting would be to add an assert here that the memory_order is either relaxed or acquire. ------------------------------------------------------------------------------ src/hotspot/share/gc/shared/markBitMap.inline.hpp 71 return _bm.at(addr_to_offset(addr), memory_order_relaxed); The memory order argument isn't needed with the current default, and wouldn't even be permitted with the above suggestion add par_at. ------------------------------------------------------------------------------ I'm not understanding part of the problem description though. You say ... have made the uses of this API more conservative for failed bit flips than it was in the past. But the pre-existing unordered cases in the setting functions (e.g. don't go through cmpxchg) are those where the bit is already set to the desired value, so there's no failure to change the bit involved. It seems reasonable to me that an acquire (at least) is usually desirable on that path, for reasons similar to why one wants an acquire on the outside-the-lock test when using the Double Checked Locking pattern. But that's not what was said, so I'm not sure I'm understanding the point. ------------------------------------------------------------------------------ The par_xxx_range operations are not being directly modified by this change. 
When only 1 bit is actually involved, they'll delegate to the conservative single-bit operations, so are changed to pick up the acquire on the already set to the desired value path. Otherwise, they always go through conservative cmpxchg as before. That all seems fine. ------------------------------------------------------------------------------ From ioi.lam at oracle.com Tue Oct 29 04:19:40 2019 From: ioi.lam at oracle.com (Ioi Lam) Date: Mon, 28 Oct 2019 21:19:40 -0700 Subject: RFR: 8233073: Make BitMap accessors more memory ordering friendly In-Reply-To: <104E3C62-435C-4F31-87EB-D3EBC34634EE@oracle.com> References: <1b19522f-534d-ca6b-4e97-f837e4ab7212@oracle.com> <104E3C62-435C-4F31-87EB-D3EBC34634EE@oracle.com> Message-ID: <97d0b6b0-74bf-8a97-e399-f75582294cc3@oracle.com> CDS also uses BitMap -- always in single threaded code. So as long as single-threaded code doesn't get slowed down by this patch, I have no objection. Thanks - Ioi On 10/28/19 6:54 PM, Kim Barrett wrote: > Should this review be happening on hotspot-dev rather than > hotspot-gc-dev? GC is not the only BitMap client; compiler uses them > too (and generally rather differently). > >> On Oct 28, 2019, at 11:29 AM, Erik ?sterlund wrote: >> >> Hi, >> >> I have need for accessors on BitMap being more explicitly memory ordering aware, to fix a bug in ZGC for AArch64 (https://bugs.openjdk.java.net/browse/JDK-8233061). >> In particular, I need failed bit sets to still have acquire semantics, and I need a getter with acquire semantics. >> >> My intention is to solve the problem by making the relevant BitMap accessors accept explicitly passed in memory ordering parameters, and utilize them. I draw the line of conservativeness at supporting IRIW-consistent loads. 
Having spent a great deal of time finding a single algorithm that breaks due to IRIW-consistency violations, and knowing the complexity of algorithms that actually can break due to that, I would be *very* surprised if we got anywhere close to that. Therefore, acquiring loads are the most conservative loads I support. This is explicitly stated in the API, so that anyone that actually relies on IRIW consistency in the future can reconsider that, and add a mode that fences before loads on nMCA machines. >> >> The main points of controversy with this patch, where I expect people to have wildly different opinions and hopefully get at least a little bit upset are the following: >> >> 1) For the same reason that our implementation of Atomic::cmpxchg does not supply both one ordering for success and one for failed CAS, unlike the C++11 atomic counter part, I do not do so either in the par_set_bit API. In the Atomic API, this was very much intentional, because it is tricky to reason about the subtle effects of having relaxed failed CAS and conservative success. In fact, it's a bug of precisely that nature I am chasing. Therefore, I wish to transfer that same reasoning to the par_set_bit API, and not allow passing in a weaker failing memory ordering. A consequence of this is that I have made the uses of this API more conservative for failed bit flips than it was in the past. However, this new API allows relaxing the real pain point of the API: the success case (with it's bi-directional full fencing semantics). So I expect it can be applied to make RMO architectures happier where it really matters in the end. However, I will not attempt to prove that relaxing these calls is okay in various places with this patch: that is outside of my scope, I'm merely adding API hooks for allowing that. >> >> 2) The default strength on the getter is memory_order_relaxed and not memory_order_conservative. 
After looking at uses, I realize it's used mostly in single threaded contexts by compiler code, and there is seemingly only a single use in the VM that cares about having acquire (ZGC, and it's broken today). While letting the frequency of uses decide what is the default rather than what is safest is not something I would normally do, it does feel like since the norm is so vastly in favour of the relaxed variant, I don't want to let the one ZGC use case clutter half the VM with explicitly relaxing the load. I am okay with reverting that decision if people want me to. >> >> CR: >> http://cr.openjdk.java.net/~eosterlund/8233073/webrev.00/ >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8233073 >> >> Thanks, >> /Erik > ------------------------------------------------------------------------------ > src/hotspot/share/utilities/bitMap.hpp > 207 bool at(idx_t index, atomic_memory_order memory_order = memory_order_relaxed) const; > > My initial reaction here is that I'd prefer adding par_at() rather > than giving at() a memory order argument. This would also address the > question of what the default should be. For at(), it's nonatomic. > For par_at() it's acquire. > > That would also avoid imposing volatile ordering on at(). > > As you said, existing uses of at() are relaxed/nonatomic. The code > rearrangement for MarkBitMap::is_marked() makes me wonder if any of > the calls should be acquire ordered, but obviously none are now... > > ------------------------------------------------------------------------------ > src/hotspot/share/utilities/bitMap.hpp > 205 // The memory ordering goes up to memory_order_acquire, but not higher. It is > 206 // assumed that users of the BitMap API will never rely on IRIW consistency. > > I think what this means is that memory_order_seq_cst > (memory_order_conservative in HotSpot) isn't supported? So just as we > only have Atomic::load (relaxed) and Atomic::load_acquire (acquire). > That seems okay. 
But if we aren't going to support the stronger > semantics, I don't think this should permit the corresponding memory > order value. > > C++11 specifies that atomic load operations cannot have a memory order > of memory_order_release or memory_order_acq_rel. (Similarly, store > operations cannot have a memory order of memory_order_consume, > memory_order_acquire, or memory_order_acq_rel. That isn't relevant for > this change, as all the modifying operations here are RMW.) > > So I think we should just be explicit that only relaxed and acquire > are supported here. (And actually make that true; see next comment.) > > ------------------------------------------------------------------------------ > src/hotspot/share/utilities/bitMap.inline.hpp > 55 inline bool BitMap::at(idx_t index, atomic_memory_order memory_order) const { > ... > 58 return (load_word_ordered(addr, memory_order) & bit_mask(index)) != 0; > > This is using the load_word_ordered helper, but the behavior of that > function is designed to support the RMW operations, and I think isn't > really right for at() (see previous comment). The simplest solution to > get what I'm suggesting would be to add an assert here that the > memory_order is either relaxed or acquire. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/markBitMap.inline.hpp > 71 return _bm.at(addr_to_offset(addr), memory_order_relaxed); > > The memory order argument isn't needed with the current default, and > wouldn't even be permitted with the above suggestion add par_at. > > ------------------------------------------------------------------------------ > > I'm not understanding part of the problem description though. You say > > ... have made the uses of this API more conservative for failed bit > flips than it was in the past. > > But the pre-existing unordered cases in the setting functions (e.g. 
> don't go through cmpxchg) are those where the bit is already set to > the desired value, so there's no failure to change the bit involved. > It seems reasonable to me that an acquire (at least) is usually > desirable on that path, for reasons similar to why one wants an > acquire on the outside-the-lock test when using the Double Checked > Locking pattern. But that's not what was said, so I'm not sure I'm > understanding the point. > > ------------------------------------------------------------------------------ > > The par_xxx_range operations are not being directly modified by this > change. When only 1 bit is actually involved, they'll delegate to the > conservative single-bit operations, so are changed to pick up the > acquire on the already set to the desired value path. Otherwise, they > always go through conservative cmpxchg as before. That all seems fine. > > ------------------------------------------------------------------------------ > From thomas.schatzl at oracle.com Tue Oct 29 08:42:27 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 29 Oct 2019 09:42:27 +0100 Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3) In-Reply-To: <196b55d5-01f4-0202-effb-4495ae409df0@oracle.com> References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com> <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com> <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com> <01a9ebcf-34ed-06b2-2da8-18d84feae858@oracle.com> <196b55d5-01f4-0202-effb-4495ae409df0@oracle.com> Message-ID: <8eefcbf1-9c75-1fe1-14b8-df0f23a53518@oracle.com> Hi, On 25.10.19 16:02, sangheon.kim at oracle.com wrote: [...] > > In addition, Stefan, Thomas and I had some discussion about making > PLAB-NUMA aware (only for survivor). > Stefan provided a patch with it and it is simple enough to include under > this CR. 
> > Webrev: > http://cr.openjdk.java.net/~sangheki/8220311/webrev.4 > http://cr.openjdk.java.net/~sangheki/8220311/webrev.4.inc > > Testing: hs-tier 1 ~ 3, with/without UseNUMA > > Thanks, > Sangheon > >
- G1Allocator::nodes() -> G1Allocator::num_nodes()
- g1Allocator.hpp:167: s/depend/depending
- please file an RFE investigating adding the node index to the region attributes
Looks good otherwise. I do not need a re-review for these changes. Thomas
From per.liden at oracle.com Tue Oct 29 11:29:23 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 29 Oct 2019 12:29:23 +0100 Subject: RFR: 8224817: Implementation of JEP 364: ZGC on macOS In-Reply-To: <837c9e23-6068-53b5-9c50-7880a0f375c7@oracle.com> References: <837c9e23-6068-53b5-9c50-7880a0f375c7@oracle.com> Message-ID: <806e1ac7-fa7a-7058-c4cc-bdd90a9f2b86@oracle.com> Some suggested adjustments, already discussed with Erik off-line: http://cr.openjdk.java.net/~pliden/8224817/webrev.review.0 /Per On 10/28/19 5:11 PM, Erik Österlund wrote: > Hi, > > After some internal discussions with Per and Stefan, some refactorings > have been made: > > 1) Use mmap consistently wherever possible, instead of mach_vm_map, for > consistency. And only use mach_vm_remap from a wrapper function to map > in views. > 2) Move the pmem segments up one level so that producer and consumer of > the segments is on the same level, and let the virtual "file" know only > about offsets. > 3) Minor polishing. > > Incremental: > http://cr.openjdk.java.net/~eosterlund/8224817/webrev.00..01/ > > Full: > http://cr.openjdk.java.net/~eosterlund/8224817/webrev.01/ > > Thanks, > /Erik > > On 2019-10-24 12:38, erik.osterlund at oracle.com wrote: >> Hi, >> >> Now that some curling has been performed, paving way for this patch: >> >> 8229027: Improve how JNIHandleBlock::oops_do distinguishes oops >> from non-oops >> 8229278: Improve hs_err location printing to assume less about GC >> internals >>
8229189: Improve JFR leak profiler tracing to deal with >> discontiguous heaps >> 8224815: Remove non-GC uses of CollectedHeap::is_in_reserved() >> 8224820: ZGC: Support discontiguous heap reservations >> >> ...the remaining thing to do is plugging in a few platform specific >> ZGC files. This patch does that. >> Decided to go with mach_vm_map/mach_vm_remap to implement >> multi-mapping. Previously I didn't want to do that as I couldn't >> figure out how to mach_vm_remap memory on top of reserved VA (acquired >> using mmap). But apparently VM_FLAGS_OVERWRITE was the missing >> ingredient there. With that in place, dodging the terrible ftruncate >> implementation on macOS seemed like a good idea. That also implies >> this port supports large pages (unlike other GCs on macOS today). Yay! >> >> CR: >> http://cr.openjdk.java.net/~eosterlund/8224817/webrev.00/ >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8229358 >> >> Thanks, >> /Erik >
From erik.osterlund at oracle.com Tue Oct 29 11:31:24 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Tue, 29 Oct 2019 12:31:24 +0100 Subject: RFR: 8224817: Implementation of JEP 364: ZGC on macOS In-Reply-To: <806e1ac7-fa7a-7058-c4cc-bdd90a9f2b86@oracle.com> References: <837c9e23-6068-53b5-9c50-7880a0f375c7@oracle.com> <806e1ac7-fa7a-7058-c4cc-bdd90a9f2b86@oracle.com> Message-ID: <3ac7acf6-7154-1627-38b3-c186e9a18267@oracle.com> Seems reasonable. Thanks. /Erik On 10/29/19 12:29 PM, Per Liden wrote: > Some suggested adjustments, already discussed with Erik off-line: > > http://cr.openjdk.java.net/~pliden/8224817/webrev.review.0 > > /Per > > [...]
>>> >>> CR: >>> http://cr.openjdk.java.net/~eosterlund/8224817/webrev.00/ >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8229358 >>> >>> Thanks, >>> /Erik >>
From leihouyju at gmail.com Tue Oct 29 12:52:10 2019 From: leihouyju at gmail.com (Haoyu Li) Date: Tue, 29 Oct 2019 20:52:10 +0800 Subject: RFR: 8220465: Use shadow regions for faster ParallelGC full GCs In-Reply-To: <955C4FA4-FC18-446C-851A-0D04A916D88D@oracle.com> References: <4F02DD53-EA98-4A1A-B871-C6E9D9610B2C@oracle.com> <9B69AFD1-7AE2-4B50-BFCF-C9C6E2594240@oracle.com> <8ef5b52e-d6fc-3073-5ca7-44c87c1eb981@oracle.com> <92277aab-0578-9e2c-3f4f-55ae1e8c94a9@oracle.com> <400df998-171a-5bbe-9f3e-01af1781afb4@oracle.com> <955C4FA4-FC18-446C-851A-0D04A916D88D@oracle.com> Message-ID: Hi Stefan, Thanks for your constructive comments. I will address these issues in the next few days and provide both a full patch and the incremental changes. Best Regards, Haoyu Li, Institute of Parallel and Distributed Systems (IPADS), School of Software, Shanghai Jiao Tong University Stefan Johansson wrote on Tue, Oct 29, 2019 at 3:03 PM: > Hi Haoyu, > > I've looked through the patch in detail now and created a new webrev at: > http://cr.openjdk.java.net/~sjohanss/8220465/01/ > > I took the liberty of removing the removal of move_and_update from your > patch since I'm addressing that separately in JDK-8233065. The webrev above > is still based on that removal, but I expect that to be pushed tomorrow or > Wednesday so that should be fine. > > I also changed the subject to make it more clear that this is now a review > of: > https://bugs.openjdk.java.net/browse/JDK-8220465 > > Regarding the current patch, I think that it looks good in general, but I > thought a bit more around how to share stuff between the closures and I > agree that adding those extra virtual functions doesn't really feel worth > it. I'm wondering if a solution would work where we revert back to letting destination > be the "real destination" (not ever pointing to the shadow region) and add > a copy_destination which is destination + offset. To make this work the > normal MoveAndUpdateClosure would also have an offset, but it would always > be 0. If do_addr() is then updated to use the copy_destination() in some > places we might end up with something pretty nice, but maybe I'm missing > something. > > I also realized that the current patch will trigger an assert because > destination is expected not to be the shadow address: > # Internal Error > (open/src/hotspot/share/gc/parallel/psParallelCompact.cpp:3045), pid=12649, > tid=12728 > # assert(src_cp->destination() == destination) failed: first live obj in > the space must match the destination > > So this also suggests that we should keep destination() returning the real > destination. > > Some other comments: > src/hotspot/share/gc/parallel/psParallelCompact.cpp > > 3383 void ShadowClosure::complete_region(ParCompactionManager *cm, > HeapWord *dest_addr, > 3384 PSParallelCompact::RegionData > *region_ptr) { > 3385 assert(region_ptr->shadow_state() == > ParallelCompactData::RegionData::FINISH, "Region should be finished"); > > This assertion will also trigger when running with a debug build and at > this point the shadow state should be SHADOW not FINISH. > > src/hotspot/share/gc/parallel/psParallelCompact.hpp > > 632 inline bool ParallelCompactData::RegionData::mark_filled() { > 633 return Atomic::cmpxchg(FILLED, &_shadow_state, SHADOW) == SHADOW; > 634 } > > Since we never check the return value here we should make it void and > maybe instead add an assert that the return value is SHADOW. > > When you have addressed these comments, would it be possible to include both > the full patch and the incremental changes from the current version? > That makes it easier for the reviewers to see what changed between versions > of the patch. > > Thanks, > Stefan > > > On 24 Oct 2019, at 14:16, Stefan Johansson <stefan.johansson at oracle.com> wrote: > > > > Hi Haoyu, > > > > On 2019-10-23 17:15, Haoyu Li wrote: > >> Hi Stefan, > >> Thanks for your constructive feedback. I've addressed all the issues > you mentioned, and the updated patch is attached in this email. > > Nice, I will look at the patch next week, but I'll shortly answer your > questions right away. > > > >> During refining the patch, I have a couple of questions: > >> 1) Now the MoveAndUpdateClosure and ShadowClosure assume the > destination address is the very beginning of a region, instead of an > arbitrary address like what it used to be. However, there is an unused > function named PSParallelCompact::move_and_update() uses the > MoveAndUpdateClosure to process a region from its middle, which conflicts > with the assumption. I notice that you removed this function in your patch, > and so did I in the updated patch. Does it matter? > > Yes, I found this function during my code review and it should be > removed, but I think that should be handled as a separate issue. We can do > this removal before this patch goes in. > > > >> 2) Using the same do_addr() in MoveAndUpdateClosure and ShadowClosure > is doable, but it does not reuse all the code neatly. Because storing the > address of the shadow region in _destination requires extra virtual > functions to handle allocating blocks in the start_array and setting > addresses of deferred objects. In particular, allocate_blocks() and > set_deferred_object_for() in both closures are added. Is it worth avoiding > to use _offset to calculate the shadow_destination? > > Ok, sounds like it might be better to have specific do_addr() functions > then. I'll think some more around this when reviewing the new patch in > depth. > > > >> If there are any problems with this patch, please contact me anytime. > I'm more than happy to keep improving the code. Thanks again for reviewing.
> >> > > Sound good, thanks, > > Stefan > > From erik.osterlund at oracle.com Tue Oct 29 13:40:20 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Tue, 29 Oct 2019 14:40:20 +0100 Subject: RFR: 8233073: Make BitMap accessors more memory ordering friendly In-Reply-To: <104E3C62-435C-4F31-87EB-D3EBC34634EE@oracle.com> References: <1b19522f-534d-ca6b-4e97-f837e4ab7212@oracle.com> <104E3C62-435C-4F31-87EB-D3EBC34634EE@oracle.com> Message-ID: <4786ce40-4889-2862-2e2c-f11bd661076a@oracle.com> Hi Kim, On 10/29/19 2:54 AM, Kim Barrett wrote: > Should this review be happening on hotspot-dev rather than > hotspot-gc-dev? GC is not the only BitMap client; compiler uses them > too (and generally rather differently). Perhaps. I presumed that the way it is being changed is only interesting for GC folks... but I guess that depends on the direction this is going. Let's see... > ------------------------------------------------------------------------------ > src/hotspot/share/utilities/bitMap.hpp > 207 bool at(idx_t index, atomic_memory_order memory_order = memory_order_relaxed) const; > > My initial reaction here is that I'd prefer adding par_at() rather > than giving at() a memory order argument. This would also address the > question of what the default should be. For at(), it's nonatomic. > For par_at() it's acquire. > > That would also avoid imposing volatile ordering on at(). I like that idea. Let's give that a shot. > As you said, existing uses of at() are relaxed/nonatomic. The code > rearrangement for MarkBitMap::is_marked() makes me wonder if any of > the calls should be acquire ordered, but obviously none are now... Yeah, I also wonder about that... > ------------------------------------------------------------------------------ > src/hotspot/share/utilities/bitMap.hpp > 205 // The memory ordering goes up to memory_order_acquire, but not higher. It is > 206 // assumed that users of the BitMap API will never rely on IRIW consistency. 
> > I think what this means is that memory_order_seq_cst > (memory_order_conservative in HotSpot) isn't supported? So just as we > only have Atomic::load (relaxed) and Atomic::load_acquire (acquire). > That seems okay. But if we aren't going to support the stronger > semantics, I don't think this should permit the corresponding memory > order value. > > C++11 specifies that atomic load operations cannot have a memory order > of memory_order_release or memory_order_acq_rel. (Similarly, store > operations cannot have a memory order of memory_order_consume, > memory_order_acquire, or memory_order_acq_rel. That isn't relevant for > this change, as all the modifying operations here are RMW.) > > So I think we should just be explicit that only relaxed and acquire > are supported here. (And actually make that true; see next comment.) > > ------------------------------------------------------------------------------ > src/hotspot/share/utilities/bitMap.inline.hpp > 55 inline bool BitMap::at(idx_t index, atomic_memory_order memory_order) const { > ... > 58 return (load_word_ordered(addr, memory_order) & bit_mask(index)) != 0; > > This is using the load_word_ordered helper, but the behavior of that > function is designed to support the RMW operations, and I think isn't > really right for at() (see previous comment). The simplest solution to > get what I'm suggesting would be to add an assert here that the > memory_order is either relaxed or acquire. That makes sense. I added the assert. > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/markBitMap.inline.hpp > 71 return _bm.at(addr_to_offset(addr), memory_order_relaxed); > > The memory order argument isn't needed with the current default, and > wouldn't even be permitted with the above suggestion add par_at. Indeed. Reverted in favor of par_at. 
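The at()/par_at() split agreed on above might look roughly like the following. This is only a sketch under stated assumptions: it uses std::atomic and std::memory_order instead of HotSpot's Atomic and atomic_memory_order, and the class SketchBitMap with its members is hypothetical, not code from the webrev. Only the idea comes from the thread: at() stays a plain read for single-threaded callers, while par_at() defaults to acquire and asserts that the requested order is relaxed or acquire.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for BitMap, for illustration only.
class SketchBitMap {
  static constexpr std::size_t BitsPerWord = 64;
  std::vector<std::atomic<std::uint64_t>> _words;

  static std::uint64_t bit_mask(std::size_t index) {
    return std::uint64_t(1) << (index % BitsPerWord);
  }

public:
  explicit SketchBitMap(std::size_t bits)
    : _words((bits + BitsPerWord - 1) / BitsPerWord) {
    for (auto& w : _words) {
      w.store(0, std::memory_order_relaxed);
    }
  }

  // Publishes the bit with release ordering so that an acquiring reader
  // also sees the writes made before the bit was set.
  void set_bit(std::size_t index) {
    _words[index / BitsPerWord].fetch_or(bit_mask(index),
                                         std::memory_order_release);
  }

  // Getter for single-threaded use. Modelled as a relaxed load here because
  // std::atomic offers no non-atomic read.
  bool at(std::size_t index) const {
    return (_words[index / BitsPerWord].load(std::memory_order_relaxed)
            & bit_mask(index)) != 0;
  }

  // Getter for concurrent use. Only relaxed and acquire make sense for a
  // load, so anything else is rejected by the assert.
  bool par_at(std::size_t index,
              std::memory_order order = std::memory_order_acquire) const {
    assert(order == std::memory_order_relaxed ||
           order == std::memory_order_acquire);
    return (_words[index / BitsPerWord].load(order) & bit_mask(index)) != 0;
  }
};
```

With this shape, a caller such as a concurrent marking check would use par_at(i), and single-threaded compiler code would keep calling at(i) unchanged.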
> ------------------------------------------------------------------------------ > > I'm not understanding part of the problem description though. You say > > ... have made the uses of this API more conservative for failed bit > flips than it was in the past. > > But the pre-existing unordered cases in the setting functions (e.g. > don't go through cmpxchg) are those where the bit is already set to > the desired value, so there's no failure to change the bit involved. > It seems reasonable to me that an acquire (at least) is usually > desirable on that path, for reasons similar to why one wants an > acquire on the outside-the-lock test when using the Double Checked > Locking pattern. But that's not what was said, so I'm not sure I'm > understanding the point. By failed bit flips, I specifically meant that the bit flipping function (par_set_at) returns false. This happens for two reasons: 1) the bit was already set in the original (relaxed) load, or 2) a concurrent thread beat us to it in the subsequent CAS. So if the function returns false, previously you couldn't know if the load that made the function return false had acquire semantics or not. Now with this patch it will have acquire semantics (unless the whole operation is specified to have relaxed or release semantics), even when the original load already had the bit set already. That is what I meant made the API more conservative than before. And as you say, I think that is a good thing. Hope this explains our misunderstanding. > ------------------------------------------------------------------------------ > > The par_xxx_range operations are not being directly modified by this > change. When only 1 bit is actually involved, they'll delegate to the > conservative single-bit operations, so are changed to pick up the > acquire on the already set to the desired value path. Otherwise, they > always go through conservative cmpxchg as before. That all seems fine. Yeah. 
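The two ways a bit flip can "fail", as described above, can be sketched as follows. This is an illustration under assumptions, not the patch: it operates on a single std::atomic 64-bit word, the function name par_set_bit_sketch is hypothetical, and acq_rel stands in for HotSpot's conservative cmpxchg (which is stronger). It also covers only the default flavor, whereas the API under review takes a memory ordering argument.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot's BitMap::par_set_bit.
// Returns true only if this call changed the bit from 0 to 1.
inline bool par_set_bit_sketch(std::atomic<std::uint64_t>& word, unsigned bit) {
  assert(bit < 64);
  const std::uint64_t mask = std::uint64_t(1) << bit;

  // Failure case 1: the bit is already set in the initial load. Loading with
  // acquire means a false return still orders the caller's later reads after
  // whatever the thread that set the bit published.
  std::uint64_t old_val = word.load(std::memory_order_acquire);

  while ((old_val & mask) == 0) {
    // The failure ordering is acquire as well, and a failed exchange
    // refreshes old_val with the value currently in memory.
    if (word.compare_exchange_weak(old_val, old_val | mask,
                                   std::memory_order_acq_rel,
                                   std::memory_order_acquire)) {
      return true;
    }
    // Failure case 2: if another thread set the bit in the meantime, the
    // loop condition now fails and we fall through to return false.
  }
  return false;
}
```

Either way a false result comes back, the value that caused it was observed with acquire semantics, which is the "more conservative for failed bit flips" behaviour referred to in the reply.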
I made a new patch that reverts at() to do what it used to (in the .hpp file), and added a new par_at() accessor instead with explicit memory ordering (asserted to be acquire or relaxed), defaulting to acquire, as suggested. I left some innocent cleanups of missing includes in the area that I would like to keep anyway. New webrev: http://cr.openjdk.java.net/~eosterlund/8233073/webrev.01/ Incremental: http://cr.openjdk.java.net/~eosterlund/8233073/webrev.00_01/ Thanks, /Erik From per.liden at oracle.com Tue Oct 29 14:59:40 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 29 Oct 2019 15:59:40 +0100 Subject: RFR: 8233073: Make BitMap accessors more memory ordering friendly In-Reply-To: <4786ce40-4889-2862-2e2c-f11bd661076a@oracle.com> References: <1b19522f-534d-ca6b-4e97-f837e4ab7212@oracle.com> <104E3C62-435C-4F31-87EB-D3EBC34634EE@oracle.com> <4786ce40-4889-2862-2e2c-f11bd661076a@oracle.com> Message-ID: Hi, On 10/29/19 2:40 PM, erik.osterlund at oracle.com wrote: > Hi Kim, > > On 10/29/19 2:54 AM, Kim Barrett wrote: >> Should this review be happening on hotspot-dev rather than >> hotspot-gc-dev?? GC is not the only BitMap client; compiler uses them >> too (and generally rather differently). > > Perhaps. I presumed that the way it is being changed is only interesting > for GC folks... but I guess that depends on the direction this is going. > Let's see... > >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/utilities/bitMap.hpp >> 207?? bool at(idx_t index, atomic_memory_order memory_order = >> memory_order_relaxed) const; >> >> My initial reaction here is that I'd prefer adding par_at() rather >> than giving at() a memory order argument.? This would also address the >> question of what the default should be.? For at(), it's nonatomic. >> For par_at() it's acquire. >> >> That would also avoid imposing volatile ordering on at(). > > I like that idea. Let's give that a shot. 
> >> As you said, existing uses of at() are relaxed/nonatomic.? The code >> rearrangement for MarkBitMap::is_marked() makes me wonder if any of >> the calls should be acquire ordered, but obviously none are now... > > Yeah, I also wonder about that... > >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/utilities/bitMap.hpp >> 205?? // The memory ordering goes up to memory_order_acquire, but not >> higher. It is >> 206?? // assumed that users of the BitMap API will never rely on IRIW >> consistency. >> >> I think what this means is that memory_order_seq_cst >> (memory_order_conservative in HotSpot) isn't supported? So just as we >> only have Atomic::load (relaxed) and Atomic::load_acquire (acquire). >> That seems okay. But if we aren't going to support the stronger >> semantics, I don't think this should permit the corresponding memory >> order value. >> >> C++11 specifies that atomic load operations cannot have a memory order >> of memory_order_release or memory_order_acq_rel. (Similarly, store >> operations cannot have a memory order of memory_order_consume, >> memory_order_acquire, or memory_order_acq_rel. That isn't relevant for >> this change, as all the modifying operations here are RMW.) >> >> So I think we should just be explicit that only relaxed and acquire >> are supported here.? (And actually make that true; see next comment.) >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/utilities/bitMap.inline.hpp >> ? 55 inline bool BitMap::at(idx_t index, atomic_memory_order >> memory_order) const { >> ... >> ? 58?? return (load_word_ordered(addr, memory_order) & >> bit_mask(index)) != 0; >> >> This is using the load_word_ordered helper, but the behavior of that >> function is designed to support the RMW operations, and I think isn't >> really right for at() (see previous comment). 
The simplest solution to >> get what I'm suggesting would be to add an assert here that the >> memory_order is either relaxed or acquire. > > That makes sense. I added the assert. > >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/shared/markBitMap.inline.hpp >> 71?? return _bm.at(addr_to_offset(addr), memory_order_relaxed); >> >> The memory order argument isn't needed with the current default, and >> wouldn't even be permitted with the above suggestion add par_at. > > Indeed. Reverted in favor of par_at. > >> ------------------------------------------------------------------------------ >> >> >> I'm not understanding part of the problem description though.? You say >> >> ?? ... have made the uses of this API more conservative for failed bit >> ?? flips than it was in the past. >> >> But the pre-existing unordered cases in the setting functions (e.g. >> don't go through cmpxchg) are those where the bit is already set to >> the desired value, so there's no failure to change the bit involved. >> It seems reasonable to me that an acquire (at least) is usually >> desirable on that path, for reasons similar to why one wants an >> acquire on the outside-the-lock test when using the Double Checked >> Locking pattern. But that's not what was said, so I'm not sure I'm >> understanding the point. > > By failed bit flips, I specifically meant that the bit flipping function > (par_set_at) returns false. This happens for two reasons: 1) the bit was > already set in the original (relaxed) load, or 2) a concurrent thread > beat us to it in the subsequent CAS. So if the function returns false, > previously you couldn't know if the load that made the function return > false had acquire semantics or not. Now with this patch it will have > acquire semantics (unless the whole operation is specified to have > relaxed or release semantics), even when the original load already had > the bit set already. 
That is what I meant made the API more conservative > than before. And as you say, I think that is a good thing. Hope this > explains our misunderstanding. > >> ------------------------------------------------------------------------------ >> >> >> The par_xxx_range operations are not being directly modified by this >> change. When only 1 bit is actually involved, they'll delegate to the >> conservative single-bit operations, so are changed to pick up the >> acquire on the already set to the desired value path. Otherwise, they >> always go through conservative cmpxchg as before.? That all seems fine. > > Yeah. > > I made a new patch that reverts at() to do what it used to (in the .hpp > file), and added a new par_at() accessor instead with explicit memory > ordering (asserted to be acquire or relaxed), defaulting to acquire, as > suggested. I left some innocent cleanups of missing includes in the area > that I would like to keep anyway. > > New webrev: > http://cr.openjdk.java.net/~eosterlund/8233073/webrev.01/ Looks good to me. I like the par_at() approach. /Per > > Incremental: > http://cr.openjdk.java.net/~eosterlund/8233073/webrev.00_01/ > > Thanks, > /Erik From erik.osterlund at oracle.com Tue Oct 29 15:22:10 2019 From: erik.osterlund at oracle.com (erik.osterlund at oracle.com) Date: Tue, 29 Oct 2019 16:22:10 +0100 Subject: RFR: 8233073: Make BitMap accessors more memory ordering friendly In-Reply-To: References: <1b19522f-534d-ca6b-4e97-f837e4ab7212@oracle.com> <104E3C62-435C-4F31-87EB-D3EBC34634EE@oracle.com> <4786ce40-4889-2862-2e2c-f11bd661076a@oracle.com> Message-ID: Hi Per, Thanks for the review. /Erik On 10/29/19 3:59 PM, Per Liden wrote: > Hi, > > On 10/29/19 2:40 PM, erik.osterlund at oracle.com wrote: >> Hi Kim, >> >> On 10/29/19 2:54 AM, Kim Barrett wrote: >>> Should this review be happening on hotspot-dev rather than >>> hotspot-gc-dev?? GC is not the only BitMap client; compiler uses them >>> too (and generally rather differently). >> >> Perhaps. 
I presumed that the way it is being changed is only >> interesting for GC folks... but I guess that depends on the direction >> this is going. Let's see... >> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/utilities/bitMap.hpp >>> 207?? bool at(idx_t index, atomic_memory_order memory_order = >>> memory_order_relaxed) const; >>> >>> My initial reaction here is that I'd prefer adding par_at() rather >>> than giving at() a memory order argument.? This would also address the >>> question of what the default should be.? For at(), it's nonatomic. >>> For par_at() it's acquire. >>> >>> That would also avoid imposing volatile ordering on at(). >> >> I like that idea. Let's give that a shot. >> >>> As you said, existing uses of at() are relaxed/nonatomic.? The code >>> rearrangement for MarkBitMap::is_marked() makes me wonder if any of >>> the calls should be acquire ordered, but obviously none are now... >> >> Yeah, I also wonder about that... >> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/utilities/bitMap.hpp >>> 205?? // The memory ordering goes up to memory_order_acquire, but >>> not higher. It is >>> 206?? // assumed that users of the BitMap API will never rely on >>> IRIW consistency. >>> >>> I think what this means is that memory_order_seq_cst >>> (memory_order_conservative in HotSpot) isn't supported? So just as we >>> only have Atomic::load (relaxed) and Atomic::load_acquire (acquire). >>> That seems okay. But if we aren't going to support the stronger >>> semantics, I don't think this should permit the corresponding memory >>> order value. >>> >>> C++11 specifies that atomic load operations cannot have a memory order >>> of memory_order_release or memory_order_acq_rel. (Similarly, store >>> operations cannot have a memory order of memory_order_consume, >>> memory_order_acquire, or memory_order_acq_rel. 
That isn't relevant for >>> this change, as all the modifying operations here are RMW.) >>> >>> So I think we should just be explicit that only relaxed and acquire >>> are supported here. (And actually make that true; see next comment.) >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/utilities/bitMap.inline.hpp >>> 55 inline bool BitMap::at(idx_t index, atomic_memory_order >>> memory_order) const { >>> ... >>> 58 return (load_word_ordered(addr, memory_order) & >>> bit_mask(index)) != 0; >>> >>> This is using the load_word_ordered helper, but the behavior of that >>> function is designed to support the RMW operations, and I think isn't >>> really right for at() (see previous comment). The simplest solution to >>> get what I'm suggesting would be to add an assert here that the >>> memory_order is either relaxed or acquire. >> >> That makes sense. I added the assert. >> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/gc/shared/markBitMap.inline.hpp >>> 71 return _bm.at(addr_to_offset(addr), memory_order_relaxed); >>> >>> The memory order argument isn't needed with the current default, and >>> wouldn't even be permitted with the above suggestion to add par_at. >> >> Indeed. Reverted in favor of par_at. >> >>> ------------------------------------------------------------------------------ >>> >>> >>> I'm not understanding part of the problem description though. You say >>> >>> ... have made the uses of this API more conservative for failed bit >>> flips than it was in the past. >>> >>> But the pre-existing unordered cases in the setting functions (e.g. >>> don't go through cmpxchg) are those where the bit is already set to >>> the desired value, so there's no failure to change the bit involved.
>>> It seems reasonable to me that an acquire (at least) is usually >>> desirable on that path, for reasons similar to why one wants an >>> acquire on the outside-the-lock test when using the Double Checked >>> Locking pattern. But that's not what was said, so I'm not sure I'm >>> understanding the point. >> >> By failed bit flips, I specifically meant that the bit flipping >> function (par_set_at) returns false. This happens for two reasons: 1) >> the bit was already set in the original (relaxed) load, or 2) a >> concurrent thread beat us to it in the subsequent CAS. So if the >> function returns false, previously you couldn't know if the load that >> made the function return false had acquire semantics or not. Now with >> this patch it will have acquire semantics (unless the whole operation >> is specified to have relaxed or release semantics), even when the >> original load already had the bit set already. That is what I meant >> made the API more conservative than before. And as you say, I think >> that is a good thing. Hope this explains our misunderstanding. >> >>> ------------------------------------------------------------------------------ >>> >>> >>> The par_xxx_range operations are not being directly modified by this >>> change. When only 1 bit is actually involved, they'll delegate to the >>> conservative single-bit operations, so are changed to pick up the >>> acquire on the already set to the desired value path. Otherwise, they >>> always go through conservative cmpxchg as before.? That all seems fine. >> >> Yeah. >> >> I made a new patch that reverts at() to do what it used to (in the >> .hpp file), and added a new par_at() accessor instead with explicit >> memory ordering (asserted to be acquire or relaxed), defaulting to >> acquire, as suggested. I left some innocent cleanups of missing >> includes in the area that I would like to keep anyway. >> >> New webrev: >> http://cr.openjdk.java.net/~eosterlund/8233073/webrev.01/ > > Looks good to me. 
I like the par_at() approach. > > /Per > >> >> Incremental: >> http://cr.openjdk.java.net/~eosterlund/8233073/webrev.00_01/ >> >> Thanks, >> /Erik From stefan.johansson at oracle.com Tue Oct 29 17:59:13 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Tue, 29 Oct 2019 18:59:13 +0100 Subject: RFR: 8233065: PSParallelCompact::move_and_update is unused and should be removed In-Reply-To: <9f2a0003-c33f-1d31-f4bb-8d491817a4be@oracle.com> References: <7C04DE96-9918-4F5B-81C5-0ABA5AB6DEAB@oracle.com> <9f2a0003-c33f-1d31-f4bb-8d491817a4be@oracle.com> Message-ID: <645B4E39-E795-47C6-AF84-29506715D0A6@oracle.com> Thanks for the reviews, Thomas and Leo. Stefan > On 28 Oct 2019, at 14:42, Thomas Schatzl wrote: > > Hi, > > On 28.10.19 13:41, Stefan Johansson wrote: >> Hi, >> Please review this small fix that removes an unused function. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8233065 >> Webrev: http://cr.openjdk.java.net/~sjohanss/8233065/00/ >> Summary: >> The function move_and_update was not removed when its last use was removed during the removal of PermGen. >> Testing: >> Built and tested through mach5 (tier1) > > looks good.
> > Thomas From sangheon.kim at oracle.com Tue Oct 29 20:06:45 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Tue, 29 Oct 2019 13:06:45 -0700 Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3) In-Reply-To: <8eefcbf1-9c75-1fe1-14b8-df0f23a53518@oracle.com> References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com> <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com> <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com> <01a9ebcf-34ed-06b2-2da8-18d84feae858@oracle.com> <196b55d5-01f4-0202-effb-4495ae409df0@oracle.com> <8eefcbf1-9c75-1fe1-14b8-df0f23a53518@oracle.com> Message-ID: <4e872474-df42-069e-84dc-cfd5e8700914@oracle.com> Hi Thomas, On 10/29/19 1:42 AM, Thomas Schatzl wrote: > Hi, > > On 25.10.19 16:02, sangheon.kim at oracle.com wrote: > [...] >> >> In addition, Stefan, Thomas and I had some discussion about making >> PLAB-NUMA aware (only for survivor). >> Stefan provided a patch with it and it is simple enough to include >> under this CR. >> >> Webrev: >> http://cr.openjdk.java.net/~sangheki/8220311/webrev.4 >> http://cr.openjdk.java.net/~sangheki/8220311/webrev.4.inc >> >> Testing: hs-tier 1 ~ 3, with/without UseNUMA >> >> Thanks, >> Sangheon >> >> > > - G1Allocator::nodes() -> G1Allocator::num_nodes() Done. > > - g1Allocator.hpp:167: s/depend/depending Done. > > - please file an RFE investigating adding the node index to the region > attributes Filed https://bugs.openjdk.java.net/browse/JDK-8233149: Investigate adding node index at G1HeapRegionAttr. > > Looks good otherwise. I do not need a re-review for these changes. Thanks for your review. 
Sangheon > > Thomas From stefan.johansson at oracle.com Tue Oct 29 20:13:45 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Tue, 29 Oct 2019 21:13:45 +0100 Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3) In-Reply-To: <196b55d5-01f4-0202-effb-4495ae409df0@oracle.com> References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com> <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com> <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com> <01a9ebcf-34ed-06b2-2da8-18d84feae858@oracle.com> <196b55d5-01f4-0202-effb-4495ae409df0@oracle.com> Message-ID: <5A6C0668-86F6-4A3F-AC4D-75097D40A1C4@oracle.com> Hi Sangheon, > 25 okt. 2019 kl. 16:02 skrev sangheon.kim at oracle.com: > > Hi Stefan, > > On 10/23/19 1:47 AM, Stefan Johansson wrote: >> Hi Sangheon, >> >> On 2019-10-22 18:47, sangheon.kim at oracle.com wrote: >>> Hi Kim, >>> >>> On 10/22/19 12:19 AM, Kim Barrett wrote: >>>>> On Oct 22, 2019, at 1:52 AM, sangheon.kim at oracle.com wrote: >>>>> What do you think about below comment? >>>>> >>>>> // Tries to allocate word_sz in the PLAB of the next "generation" after trying to >>>>> // allocate into dest. Previous_plab_refill_failed indicates whether previous >>>>> // PLAB refill for the original (source) object was failed. >>>> Drop ?was?. Otherwise looks good. >>> Done. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~sangheki/8220311/webrev.3 >>> http://cr.openjdk.java.net/~sangheki/8220311/webrev.3.inc >> Looks good in general, just one minor thing, no need for a new webrev though: >> src/hotspot/share/gc/g1/g1Allocator.cpp >> --- >> 144 for (uint nodex_index = 0; nodex_index < _num_alloc_regions; nodex_index++) { >> >> The name nodex_index has one too many x:es =) I would prefer node_index. > Ouch! > Fixed.. > > In addition, Stefan, Thomas and I had some discussion about making PLAB-NUMA aware (only for survivor). 
> Stefan provided a patch with it and it is simple enough to include under this CR. > > Webrev: > http://cr.openjdk.java.net/~sangheki/8220311/webrev.4 > http://cr.openjdk.java.net/~sangheki/8220311/webrev.4.inc Looks good in general, just one comment. src/hotspot/share/gc/g1/g1Allocator.inline.hpp --- 78 assert(_alloc_buffers[dest.type()] != NULL, 79 "Allocation buffer is NULL: %s", dest.get_type_str()); 80 G1HeapRegionAttr::region_type_t type = dest.type(); 81 return alloc_buffer(type, node_index); As I mentioned to you offline, I think it is a bit unfortunate that we can't index our way to the correct PLAB in G1PLABAllocator::alloc_buffer(...) without the if-statement, but I agree that having multiple array slots pointing to the same PLAB isn't optimal either. So I think this approach is good for now, but I have a very minor comment on the code snippet above. I would prefer if line 80 was skipped and the call on 81 just did return alloc_buffer(dest.type(), node_index). I don't need a new webrev for this. Thanks, Stefan > > Testing: hs-tier 1 ~ 3, with/without UseNUMA > > Thanks, > Sangheon > > >> --- >> >> Thanks, >> Stefan >> >>> >>> Thanks, >>> Sangheon >>> >>> >>>> >>>>> // Returns a non-NULL pointer if successful, and updates dest if required. >>>>> // Also determines whether we should continue to try to allocate into the various >>>>> // generations or just end trying to allocate. >>>>> HeapWord* allocate_in_next_plab(G1HeapRegionAttr* dest, >>>>> ... >>>>> >>>>> Let me post the webrev when we decide. :) >>>>> >>>>> Thanks, >>>>> Sangheon >>>>> >>>>> >>>>>> ------------------------------------------------------------------------------ >>>>>> >>>>>> Looks good, other than that one comment issue.
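A minimal sketch of the lookup trade-off discussed in the message above, assuming (as the thread says) that only survivor PLABs are made NUMA-aware while old-generation allocation keeps a single buffer. The names PlabTable, DestType and Plab are hypothetical stand-ins and are not the G1PLABAllocator code from the webrev. It shows why an if-statement is needed when the two destinations have different numbers of slots, instead of filling an array with duplicate pointers to the same PLAB.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical destination kinds; the real code uses G1HeapRegionAttr types.
enum class DestType { Young, Old };

// Hypothetical stand-in for a PLAB; only carries an id for illustration.
struct Plab {
  int id;
};

// Sketch of a buffer table: one survivor PLAB per NUMA node, one shared old PLAB.
class PlabTable {
  std::vector<Plab> _young;  // indexed by node_index
  Plab _old;                 // shared by all nodes

public:
  explicit PlabTable(unsigned num_nodes) : _young(), _old{-1} {
    for (unsigned i = 0; i < num_nodes; i++) {
      _young.push_back(Plab{static_cast<int>(i)});
    }
  }

  // The branch is the "if-statement" from the review: young buffers are
  // selected by node, while every node maps to the same old buffer.
  Plab* alloc_buffer(DestType dest, unsigned node_index) {
    if (dest == DestType::Young) {
      assert(node_index < _young.size());
      return &_young[node_index];
    }
    return &_old;
  }
};
```

The alternative mentioned in the review, an array of num_nodes slots for the old destination that all point at one PLAB, would remove the branch at the cost of redundant slots.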
From sangheon.kim at oracle.com Tue Oct 29 20:39:00 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Tue, 29 Oct 2019 13:39:00 -0700 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <9d9494cd-82cd-6cf6-94e6-432a6ae187fb@oracle.com> References: <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> <9219a118-0c1d-2cee-10e5-f9bb87c72eb9@oracle.com> <521b3b8a-70e6-6fef-cb67-b6327fa08c03@oracle.com> <0A9D98F3-479D-421D-A5E0-0AB8BB203717@oracle.com> <1615ad5b-6be7-7e7d-6815-68cfc338fd6f@oracle.com> <9d9494cd-82cd-6cf6-94e6-432a6ae187fb@oracle.com> Message-ID: <8430eee6-8990-6367-8ede-0741de8fc836@oracle.com> Hi Kim and Per, Thanks for your reviews. ----------- To all reviewers, Stefan suggested a safer handling of node index so here's another webrev. Basically when we enable AlwaysPreTouch, we expect to get actual node id of the address. However, in theory we still may get something unknown id. So below change is added to have safer handling of node index. uint G1NUMA::index_for_region(HeapRegion* hr) const { if (!is_enabled()) { return 0; } if (AlwaysPreTouch) { // If we already pretouched, we can check actual node index here. - return index_of_address(hr->bottom()); + // However, if node index is still unknown, use preferred node index. 
+ uint node_index = index_of_address(hr->bottom()); + if (node_index != UnknownNodeIndex) { + return node_index; + } Webrev: http://cr.openjdk.java.net/~sangheki/8220310/webrev.8 http://cr.openjdk.java.net/~sangheki/8220310/webrev.8.inc Testing: local build Thanks, Sangheon On 10/26/19 1:36 AM, Per Liden wrote: > On 10/25/19 11:56 PM, sangheon.kim at oracle.com wrote: >> Hi Kim, >> >> On 10/24/19 4:05 PM, Kim Barrett wrote: >>>> On Oct 23, 2019, at 12:20 PM,sangheon.kim at oracle.com? wrote: >>>> >>>> Hi Per, >>>> >>>> Thanks for taking a look at this. >>>> >>>> I agree all your comments and here's the webrev. >>>> - All comments from Per. >>>> - Move G1PageBasedVirtualSpace::page_size() near to page_start() >>>> from Kim. >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.6 >>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.6.inc >>>> Testing: build test for linux, solaris, windows and mac. >>>> >>>> FYI, as I think existing numa related API names and -1 stuff seem >>>> not good, I planned to refine those later after pushing. But as you >>>> said following existing rule and then refine all together later >>>> seems better. >>> The type of the argument for numa_get_group_id(void* address) should >>> be "const void*".? Sorry I didn't notice that earlier.? Of course, >>> this will require a const_cast to remove the const qualifier when >>> calling get_mempolicy, but it is better to isolate the workaround for >>> that missing qualifier to that one place. >>> >>> I'm not sure I like the overload for os::numa_get_group_id. While >>> both are getting the numa id associated with something, the >>> associations >>> involved seem pretty different to me. >>> >>> Spelling them out, they could be >>> >>> numa_get_group_id_for_current_thread() >>> numa_get_group_id_for_address(const void* address) >>> >>> Those seem semantically unrelated to me, so violate the usual guidance >>> of only overloading operations that are roughly equivalent (*).? 
Or put >>> another way, one should not need to determine which overload is >>> selected >>> to understand a call site. >>> >>> Of course, "roughly equivalent" is in the eye of the beholder. >>> >>> (*) Operator overloading sometimes violates this on the basis that the >>> syntactic concision of using operators is more important, and there >>> are a limited set of operators.? Such violations are often used as an >>> argument against using operator overloading at all. >> I think the overload looks okay to me. >> But as you are not sure about it, I renamed the newly added one. >> >> - static int numa_get_group_id(void* address); >> + static int numa_get_group_id_for_address(const void* address); >> > > Works for me. > > /Per > >> >> webrev: >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.7 >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.7.inc >> >> Testing: hs-tier1 >> >> Thanks, >> Sangheon >> >> >> From sangheon.kim at oracle.com Tue Oct 29 20:44:40 2019 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Tue, 29 Oct 2019 13:44:40 -0700 Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3) In-Reply-To: <5A6C0668-86F6-4A3F-AC4D-75097D40A1C4@oracle.com> References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com> <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com> <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com> <01a9ebcf-34ed-06b2-2da8-18d84feae858@oracle.com> <196b55d5-01f4-0202-effb-4495ae409df0@oracle.com> <5A6C0668-86F6-4A3F-AC4D-75097D40A1C4@oracle.com> Message-ID: <85b282d1-0837-af5c-745f-efd0000d0ae1@oracle.com> Hi Stefan, Thanks for reviewing this. On 10/29/19 1:13 PM, Stefan Johansson wrote: > Hi Sangheon, > >> 25 okt. 2019 kl. 
16:02 skrev sangheon.kim at oracle.com: >> >> Hi Stefan, >> >> On 10/23/19 1:47 AM, Stefan Johansson wrote: >>> Hi Sangheon, >>> >>> On 2019-10-22 18:47, sangheon.kim at oracle.com wrote: >>>> Hi Kim, >>>> >>>> On 10/22/19 12:19 AM, Kim Barrett wrote: >>>>>> On Oct 22, 2019, at 1:52 AM, sangheon.kim at oracle.com wrote: >>>>>> What do you think about below comment? >>>>>> >>>>>> // Tries to allocate word_sz in the PLAB of the next "generation" after trying to >>>>>> // allocate into dest. Previous_plab_refill_failed indicates whether previous >>>>>> // PLAB refill for the original (source) object was failed. >>>>> Drop ?was?. Otherwise looks good. >>>> Done. >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~sangheki/8220311/webrev.3 >>>> http://cr.openjdk.java.net/~sangheki/8220311/webrev.3.inc >>> Looks good in general, just one minor thing, no need for a new webrev though: >>> src/hotspot/share/gc/g1/g1Allocator.cpp >>> --- >>> 144 for (uint nodex_index = 0; nodex_index < _num_alloc_regions; nodex_index++) { >>> >>> The name nodex_index has one too many x:es =) I would prefer node_index. >> Ouch! >> Fixed.. >> >> In addition, Stefan, Thomas and I had some discussion about making PLAB-NUMA aware (only for survivor). >> Stefan provided a patch with it and it is simple enough to include under this CR. >> >> Webrev: >> http://cr.openjdk.java.net/~sangheki/8220311/webrev.4 >> http://cr.openjdk.java.net/~sangheki/8220311/webrev.4.inc > Looks good in general, just one comment. > > src/hotspot/share/gc/g1/g1Allocator.inline.hpp > --- > 78 assert(_alloc_buffers[dest.type()] != NULL, > 79 "Allocation buffer is NULL: %s", dest.get_type_str()); > > 80 G1HeapRegionAttr::region_type_t type = dest.type(); > 81 return alloc_buffer(type, node_index); > > As I mentioned to you offline, I think it is a bit unfortunate that we can?t index our way to the correct PLAB in G1PLABAllocator::alloc_buffer(?) 
without the if-statement, but I agree that having multiple array slots pointing to the same PLAB isn't optimal either. So I think this approach is good for now, but I have a very minor comment on the code snippet above. I would prefer if line 80 was skipped and the call on 81 just did return alloc_buffer(dest.type(), node_index). Done. It was left over from testing code. You and Thomas didn't ask for a webrev, but here's the next one for the record. :) Webrev: http://cr.openjdk.java.net/~sangheki/8220311/webrev.5 http://cr.openjdk.java.net/~sangheki/8220311/webrev.5.inc Testing: local build Thanks, Sangheon > > > I don't need a new webrev for this. > > Thanks, > Stefan > > >> Testing: hs-tier 1 ~ 3, with/without UseNUMA >> >> Thanks, >> Sangheon >> >> >>> --- >>> >>> Thanks, >>> Stefan >>> >>>> Thanks, >>>> Sangheon >>>> >>>> >>>>>> // Returns a non-NULL pointer if successful, and updates dest if required. >>>>>> // Also determines whether we should continue to try to allocate into the various >>>>>> // generations or just end trying to allocate. >>>>>> HeapWord* allocate_in_next_plab(G1HeapRegionAttr* dest, >>>>>> ... >>>>>> >>>>>> Let me post the webrev when we decide. :) >>>>>> >>>>>> Thanks, >>>>>> Sangheon >>>>>> >>>>>> >>>>>>> ------------------------------------------------------------------------------ >>>>>>> >>>>>>> Looks good, other than that one comment issue.
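For the G1NUMA::index_for_region change in the neighbouring 8220310 thread (webrev.8), the fallback logic can be sketched roughly as below. This is an illustration under stated assumptions, not the patch itself: NumaSketch, the function-pointer field and the value chosen for UnknownNodeIndex are hypothetical, and the "preferred" index is assumed to follow the evenly interleaved layout (0123 0123 ...) described earlier in that thread, i.e. region index modulo the number of nodes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sentinel; the real constant's value is not shown in the thread.
constexpr uint32_t UnknownNodeIndex = UINT32_MAX;

struct NumaSketch {
  bool enabled;            // stands in for is_enabled()
  bool always_pretouch;    // stands in for the AlwaysPreTouch flag
  uint32_t num_nodes;      // number of active NUMA nodes (assumed > 0)
  // Stand-in for the OS query; may return UnknownNodeIndex.
  uint32_t (*index_of_address)(const void*);

  // Assumption: regions are interleaved evenly across nodes.
  uint32_t preferred_index_for(uint32_t region_index) const {
    return region_index % num_nodes;
  }

  uint32_t index_for_region(uint32_t region_index, const void* bottom) const {
    if (!enabled) {
      return 0;
    }
    if (always_pretouch) {
      // Pretouched memory should already have a node, so ask the OS first.
      uint32_t node_index = index_of_address(bottom);
      if (node_index != UnknownNodeIndex) {
        return node_index;
      }
      // Otherwise fall through to the preferred index.
    }
    return preferred_index_for(region_index);
  }
};

// Two fake OS queries used only to exercise both paths.
inline uint32_t os_reports_node_2(const void*) { return 2; }
inline uint32_t os_reports_unknown(const void*) { return UnknownNodeIndex; }
```

With four nodes, region 5 resolves to node 2 when the fake query answers, and to node 1 (5 mod 4) when the query reports an unknown node.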
From stefan.johansson at oracle.com Wed Oct 30 07:25:18 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 30 Oct 2019 08:25:18 +0100 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <8430eee6-8990-6367-8ede-0741de8fc836@oracle.com> References: <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> <9219a118-0c1d-2cee-10e5-f9bb87c72eb9@oracle.com> <521b3b8a-70e6-6fef-cb67-b6327fa08c03@oracle.com> <0A9D98F3-479D-421D-A5E0-0AB8BB203717@oracle.com> <1615ad5b-6be7-7e7d-6815-68cfc338fd6f@oracle.com> <9d9494cd-82cd-6cf6-94e6-432a6ae187fb@oracle.com> <8430eee6-8990-6367-8ede-0741de8fc836@oracle.com> Message-ID: <7268E931-1F9D-47CF-86CE-F7AA29D4D10D@oracle.com> > 29 okt. 2019 kl. 21:39 skrev sangheon.kim at oracle.com: > > Hi Kim and Per, > > Thanks for your reviews. > > ----------- > To all reviewers, > > Stefan suggested a safer handling of node index so here's another webrev. > Basically when we enable AlwaysPreTouch, we expect to get actual node id of the address. > However, in theory we still may get something unknown id. So below change is added to have safer handling of node index. > > uint G1NUMA::index_for_region(HeapRegion* hr) const { > if (!is_enabled()) { > return 0; > } > > > if (AlwaysPreTouch) { > // If we already pretouched, we can check actual node index here. > - return index_of_address(hr->bottom()); > > + // However, if node index is still unknown, use preferred node index. 
> + uint node_index = index_of_address(hr->bottom()); > + if (node_index != UnknownNodeIndex) { > + return node_index; > + } > > Webrev: > http://cr.openjdk.java.net/~sangheki/8220310/webrev.8 > http://cr.openjdk.java.net/~sangheki/8220310/webrev.8.inc Looks good, Stefan > Testing: local build > > Thanks, > Sangheon > > > On 10/26/19 1:36 AM, Per Liden wrote: >> On 10/25/19 11:56 PM, sangheon.kim at oracle.com wrote: >>> Hi Kim, >>> >>> On 10/24/19 4:05 PM, Kim Barrett wrote: >>>>> On Oct 23, 2019, at 12:20 PM,sangheon.kim at oracle.com wrote: >>>>> >>>>> Hi Per, >>>>> >>>>> Thanks for taking a look at this. >>>>> >>>>> I agree all your comments and here's the webrev. >>>>> - All comments from Per. >>>>> - Move G1PageBasedVirtualSpace::page_size() near to page_start() from Kim. >>>>> >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.6 >>>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.6.inc >>>>> Testing: build test for linux, solaris, windows and mac. >>>>> >>>>> FYI, as I think existing numa related API names and -1 stuff seem not good, I planned to refine those later after pushing. But as you said following existing rule and then refine all together later seems better. >>>> The type of the argument for numa_get_group_id(void* address) should >>>> be "const void*". Sorry I didn't notice that earlier. Of course, >>>> this will require a const_cast to remove the const qualifier when >>>> calling get_mempolicy, but it is better to isolate the workaround for >>>> that missing qualifier to that one place. >>>> >>>> I'm not sure I like the overload for os::numa_get_group_id. While >>>> both are getting the numa id associated with something, the associations >>>> involved seem pretty different to me. 
>>>> >>>> Spelling them out, they could be >>>> >>>> numa_get_group_id_for_current_thread() >>>> numa_get_group_id_for_address(const void* address) >>>> >>>> Those seem semantically unrelated to me, so violate the usual guidance >>>> of only overloading operations that are roughly equivalent (*). Or put >>>> another way, one should not need to determine which overload is selected >>>> to understand a call site. >>>> >>>> Of course, "roughly equivalent" is in the eye of the beholder. >>>> >>>> (*) Operator overloading sometimes violates this on the basis that the >>>> syntactic concision of using operators is more important, and there >>>> are a limited set of operators. Such violations are often used as an >>>> argument against using operator overloading at all. >>> I think the overload looks okay to me. >>> But as you are not sure about it, I renamed the newly added one. >>> >>> - static int numa_get_group_id(void* address); >>> + static int numa_get_group_id_for_address(const void* address); >>> >> >> Works for me. 
>> >> /Per >> >>> >>> webrev: >>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.7 >>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.7.inc >>> >>> Testing: hs-tier1 >>> >>> Thanks, >>> Sangheon >>> >>> >>> > From stefan.johansson at oracle.com Wed Oct 30 07:27:20 2019 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Wed, 30 Oct 2019 08:27:20 +0100 Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3) In-Reply-To: <85b282d1-0837-af5c-745f-efd0000d0ae1@oracle.com> References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com> <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com> <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com> <01a9ebcf-34ed-06b2-2da8-18d84feae858@oracle.com> <196b55d5-01f4-0202-effb-4495ae409df0@oracle.com> <5A6C0668-86F6-4A3F-AC4D-75097D40A1C4@oracle.com> <85b282d1-0837-af5c-745f-efd0000d0ae1@oracle.com> Message-ID: <967B0943-BD9F-451B-9A00-5FE7A200C620@oracle.com> Looks good, Stefan > 29 okt. 2019 kl. 21:44 skrev sangheon.kim at oracle.com: > > Hi Stefan, > > Thanks for reviewing this. > > On 10/29/19 1:13 PM, Stefan Johansson wrote: >> Hi Sangheon, >> >>> 25 okt. 2019 kl. 16:02 skrev sangheon.kim at oracle.com: >>> >>> Hi Stefan, >>> >>> On 10/23/19 1:47 AM, Stefan Johansson wrote: >>>> Hi Sangheon, >>>> >>>> On 2019-10-22 18:47, sangheon.kim at oracle.com wrote: >>>>> Hi Kim, >>>>> >>>>> On 10/22/19 12:19 AM, Kim Barrett wrote: >>>>>>> On Oct 22, 2019, at 1:52 AM, sangheon.kim at oracle.com wrote: >>>>>>> What do you think about below comment? >>>>>>> >>>>>>> // Tries to allocate word_sz in the PLAB of the next "generation" after trying to >>>>>>> // allocate into dest. Previous_plab_refill_failed indicates whether previous >>>>>>> // PLAB refill for the original (source) object was failed. >>>>>> Drop ?was?. Otherwise looks good. >>>>> Done. 
>>>>> >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~sangheki/8220311/webrev.3 >>>>> http://cr.openjdk.java.net/~sangheki/8220311/webrev.3.inc >>>> Looks good in general, just one minor thing, no need for a new webrev though: >>>> src/hotspot/share/gc/g1/g1Allocator.cpp >>>> --- >>>> 144 for (uint nodex_index = 0; nodex_index < _num_alloc_regions; nodex_index++) { >>>> >>>> The name nodex_index has one too many x:es =) I would prefer node_index. >>> Ouch! >>> Fixed.. >>> >>> In addition, Stefan, Thomas and I had some discussion about making PLAB-NUMA aware (only for survivor). >>> Stefan provided a patch with it and it is simple enough to include under this CR. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~sangheki/8220311/webrev.4 >>> http://cr.openjdk.java.net/~sangheki/8220311/webrev.4.inc >> Looks good in general, just one comment. >> >> src/hotspot/share/gc/g1/g1Allocator.inline.hpp >> --- >> 78 assert(_alloc_buffers[dest.type()] != NULL, >> 79 "Allocation buffer is NULL: %s", dest.get_type_str()); >> >> 80 G1HeapRegionAttr::region_type_t type = dest.type(); >> 81 return alloc_buffer(type, node_index); >> >> As I mentioned to you offline, I think it is a bit unfortunate that we can?t index our way to the correct PLAB in G1PLABAllocator::alloc_buffer(?) without the if-statement, but I agree that having multiple array slots pointing to the same PLAB isn?t a optimal either. So I think this is approach is good for now, but I have a very minor comment on the code snippet above. I would prefer if line 80 was skipped and the call on 81 just did return alloc_buffer(dest.type(), node_index). > Done. > It is leftover from testing code. > > You and Thomas didn't ask for webrev, but here's the next one for the record. :) > > Webrev: > http://cr.openjdk.java.net/~sangheki/8220311/webrev.5 > http://cr.openjdk.java.net/~sangheki/8220311/webrev.5.inc > > Testing: local build > > Thanks, > Sangheon > > >> ? >> >> I don?t need a new webrev for this. 
>> >> Thanks, >> Stefan >> >> >>> Testing: hs-tier 1 ~ 3, with/without UseNUMA >>> >>> Thanks, >>> Sangheon >>> >>> >>>> --- >>>> >>>> Thanks, >>>> Stefan >>>> >>>>> Thanks, >>>>> Sangheon >>>>> >>>>> >>>>>>> // Returns a non-NULL pointer if successful, and updates dest if required. >>>>>>> // Also determines whether we should continue to try to allocate into the various >>>>>>> // generations or just end trying to allocate. >>>>>>> HeapWord* allocate_in_next_plab(G1HeapRegionAttr* dest, >>>>>>> ... >>>>>>> >>>>>>> Let me post the webrev when we decide. :) >>>>>>> >>>>>>> Thanks, >>>>>>> Sangheon >>>>>>> >>>>>>> >>>>>>>> ------------------------------------------------------------------------------ >>>>>>>> >>>>>>>> Looks good, other than that one comment issue. From thomas.schatzl at oracle.com Wed Oct 30 08:02:07 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 30 Oct 2019 09:02:07 +0100 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <7268E931-1F9D-47CF-86CE-F7AA29D4D10D@oracle.com> References: <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> <9219a118-0c1d-2cee-10e5-f9bb87c72eb9@oracle.com> <521b3b8a-70e6-6fef-cb67-b6327fa08c03@oracle.com> <0A9D98F3-479D-421D-A5E0-0AB8BB203717@oracle.com> <1615ad5b-6be7-7e7d-6815-68cfc338fd6f@oracle.com> <9d9494cd-82cd-6cf6-94e6-432a6ae187fb@oracle.com> <8430eee6-8990-6367-8ede-0741de8fc836@oracle.com> <7268E931-1F9D-47CF-86CE-F7AA29D4D10D@oracle.com> Message-ID: <2055ec92-5116-1a86-4002-5c304e63c29d@oracle.com> Hi, On 30.10.19 08:25, Stefan Johansson wrote: > > >> 29 okt. 2019 kl. 21:39 skrev sangheon.kim at oracle.com: >> >> Hi Kim and Per, >> >> Thanks for your reviews. >> >> ----------- >> To all reviewers, >> >> Stefan suggested a safer handling of node index so here's another webrev. 
>> Basically when we enable AlwaysPreTouch, we expect to get actual node id of the address. >> However, in theory we still may get something unknown id. So below change is added to have safer handling of node index. >> >> uint G1NUMA::index_for_region(HeapRegion* hr) const { >> if (!is_enabled()) { >> return 0; >> } >> >> >> if (AlwaysPreTouch) { >> // If we already pretouched, we can check actual node index here. >> - return index_of_address(hr->bottom()); >> >> + // However, if node index is still unknown, use preferred node index. >> + uint node_index = index_of_address(hr->bottom()); >> + if (node_index != UnknownNodeIndex) { >> + return node_index; >> + } >> >> Webrev: >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.8 >> http://cr.openjdk.java.net/~sangheki/8220310/webrev.8.inc > Looks good, > Stefan +1 Thomas From thomas.schatzl at oracle.com Wed Oct 30 08:00:48 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 30 Oct 2019 09:00:48 +0100 Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3) In-Reply-To: <967B0943-BD9F-451B-9A00-5FE7A200C620@oracle.com> References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com> <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com> <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com> <01a9ebcf-34ed-06b2-2da8-18d84feae858@oracle.com> <196b55d5-01f4-0202-effb-4495ae409df0@oracle.com> <5A6C0668-86F6-4A3F-AC4D-75097D40A1C4@oracle.com> <85b282d1-0837-af5c-745f-efd0000d0ae1@oracle.com> <967B0943-BD9F-451B-9A00-5FE7A200C620@oracle.com> Message-ID: Hi, webrev.5 looks good. 
Thomas On 30.10.19 08:27, Stefan Johansson wrote: > Looks good, > Stefan > From stefan.karlsson at oracle.com Wed Oct 30 10:50:05 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 30 Oct 2019 11:50:05 +0100 Subject: RFR: 8224817: Implementation of JEP 364: ZGC on macOS In-Reply-To: <3ac7acf6-7154-1627-38b3-c186e9a18267@oracle.com> References: <837c9e23-6068-53b5-9c50-7880a0f375c7@oracle.com> <806e1ac7-fa7a-7058-c4cc-bdd90a9f2b86@oracle.com> <3ac7acf6-7154-1627-38b3-c186e9a18267@oracle.com> Message-ID: <446b360d-498d-18ed-7d6a-ce7fd2dba14f@oracle.com> Hi Erik, Reviewed: https://cr.openjdk.java.net/~eosterlund/8224817/webrev.02 Looks good. Thanks, StefanK On 2019-10-29 12:31, erik.osterlund at oracle.com wrote: > Seems reasonable. Thanks. > > /Erik > > On 10/29/19 12:29 PM, Per Liden wrote: >> Some suggested adjustments, already discussed with Erik off-line: >> >> http://cr.openjdk.java.net/~pliden/8224817/webrev.review.0 >> >> /Per >> >> On 10/28/19 5:11 PM, Erik Österlund wrote: >>> Hi, >>> >>> After some internal discussions with Per and Stefan, some >>> refactorings have been made: >>> >>> 1) Use mmap wherever possible, instead of mach_vm_map, >>> for consistency. And only use mach_vm_remap from a wrapper function >>> to map in views. >>> 2) Move the pmem segments up one level so that producer and consumer >>> of the segments are on the same level, and let the virtual "file" know >>> only about offsets. >>> 3) Minor polishing. >>> >>> Incremental: >>> http://cr.openjdk.java.net/~eosterlund/8224817/webrev.00..01/ >>> >>> Full: >>> http://cr.openjdk.java.net/~eosterlund/8224817/webrev.01/ >>> >>> Thanks, >>> /Erik >>> >>> On 2019-10-24 12:38, erik.osterlund at oracle.com wrote: >>>> Hi, >>>> >>>> Now that some curling has been performed, paving the way for this patch: >>>> >>>> 8229027: Improve how JNIHandleBlock::oops_do distinguishes oops >>>> from non-oops >>>>
8229278: Improve hs_err location printing to assume less about >>>> GC internals >>>> ??? 8229189: Improve JFR leak profiler tracing to deal with >>>> discontiguous heaps >>>> ??? 8224815: Remove non-GC uses of CollectedHeap::is_in_reserved() >>>> ??? 8224820: ZGC: Support discontiguous heap reservations >>>> >>>> ...the remaining thing to do is plugging in a few platform specific >>>> ZGC files. This patch does that. >>>> Decided to go with mach_vm_map/mach_vm_remap to implement >>>> multi-mapping. Previously I didn't want to do that as I couldn't >>>> figure out how to mach_vm_remap memory on top of reserved VA >>>> (acquired using mmap). But apparently VM_FLAGS_OVERWRITE was the >>>> missing ingredient there. With that in place, dodging the terrible >>>> ftruncate implementation on macOS seemed like a good idea. That also >>>> implies this port supports large pages (unlike other GCs on macOS >>>> today). Yay! >>>> >>>> CR: >>>> http://cr.openjdk.java.net/~eosterlund/8224817/webrev.00/ >>>> >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8229358 >>>> >>>> Thanks, >>>> /Erik >>> > From zgu at redhat.com Wed Oct 30 12:56:33 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 30 Oct 2019 08:56:33 -0400 Subject: RFR 8233165: Shenandoah:SBSA::gen_load_reference_barrier_stub() should use pointer register for address on aarch64 Message-ID: <82ee1d40-a7a8-1b68-1aec-5b15597d7cbb@redhat.com> The load address can come in in single-size or double-size register, as_pointer_register() can deal with both case. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8233165
Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233165/webrev.00/

Test:
  hotspot_gc_shenandoah (fastdebug and release)
  jcstress quick tests (fastdebug and release)
  on aarch64 Linux

Thanks,

-Zhengyu

From rkennke at redhat.com Wed Oct 30 13:40:04 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 30 Oct 2019 14:40:04 +0100
Subject: RFR 8233165: Shenandoah:SBSA::gen_load_reference_barrier_stub() should use pointer register for address on aarch64
In-Reply-To: <82ee1d40-a7a8-1b68-1aec-5b15597d7cbb@redhat.com>
References: <82ee1d40-a7a8-1b68-1aec-5b15597d7cbb@redhat.com>
Message-ID:

Nice.

Please push the fix.

Thanks,
Roman

> The load address can come in in a single-size or double-size register;
> as_pointer_register() can deal with both cases.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8233165
> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233165/webrev.00/
>
> Test:
>   hotspot_gc_shenandoah (fastdebug and release)
>   jcstress quick tests (fastdebug and release)
>   on aarch64 Linux
>
> Thanks,
>
> -Zhengyu
>

From zgu at redhat.com Wed Oct 30 13:43:51 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 30 Oct 2019 09:43:51 -0400
Subject: RFR 8233165: Shenandoah:SBSA::gen_load_reference_barrier_stub() should use pointer register for address on aarch64
In-Reply-To:
References: <82ee1d40-a7a8-1b68-1aec-5b15597d7cbb@redhat.com>
Message-ID:

Thanks for the review, and pushed.

-Zhengyu

On 10/30/19 9:40 AM, Roman Kennke wrote:
> Nice.
>
> Please push the fix.
>
> Thanks,
> Roman
>
>> The load address can come in in a single-size or double-size register;
>> as_pointer_register() can deal with both cases.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8233165
>> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233165/webrev.00/
>>
>> Test:
>>   hotspot_gc_shenandoah (fastdebug and release)
>>   jcstress quick tests (fastdebug and release)
>>
on aarch64 Linux >> >> Thanks, >> >> -Zhengyu >> > From kim.barrett at oracle.com Wed Oct 30 14:18:30 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 30 Oct 2019 10:18:30 -0400 Subject: RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3) In-Reply-To: <8430eee6-8990-6367-8ede-0741de8fc836@oracle.com> References: <2b37edd6-3e0f-013d-1616-9d003f8ac1ed@oracle.com> <74ACAF31-8233-482A-892E-0D2E7CA72F4F@oracle.com> <4afe9f43-4cfa-9384-f45f-f985399629dd@oracle.com> <77f6c57a-65a6-2727-cbe9-fbc1ed52a015@oracle.com> <7C1985BF-A769-49FB-A658-E1B1060B5897@oracle.com> <3F549477-A2DF-42CF-A0E5-586F78BBCC47@oracle.com> <9219a118-0c1d-2cee-10e5-f9bb87c72eb9@oracle.com> <521b3b8a-70e6-6fef-cb67-b6327fa08c03@oracle.com> <0A9D98F3-479D-421D-A5E0-0AB8BB203717@oracle.com> <1615ad5b-6be7-7e7d-6815-68cfc338fd6f@oracle.com> <9d9494cd-82cd-6cf6-94e6-432a6ae187fb@oracle.com> <8430eee6-8990-6367-8ede-0741de8fc836@oracle.com> Message-ID: <455CC0A4-D794-4BB9-9408-D1314E8CD008@oracle.com> > On Oct 29, 2019, at 4:39 PM, sangheon.kim at oracle.com wrote: > > Hi Kim and Per, > > Thanks for your reviews. > > ----------- > To all reviewers, > > Stefan suggested a safer handling of node index so here's another webrev. > Basically when we enable AlwaysPreTouch, we expect to get actual node id of the address. > However, in theory we still may get something unknown id. So below change is added to have safer handling of node index. > > uint G1NUMA::index_for_region(HeapRegion* hr) const { > if (!is_enabled()) { > return 0; > } > > > if (AlwaysPreTouch) { > // If we already pretouched, we can check actual node index here. > - return index_of_address(hr->bottom()); > > + // However, if node index is still unknown, use preferred node index. 
> +    uint node_index = index_of_address(hr->bottom());
> +    if (node_index != UnknownNodeIndex) {
> +      return node_index;
> +    }
>
> Webrev:
> http://cr.openjdk.java.net/~sangheki/8220310/webrev.8
> http://cr.openjdk.java.net/~sangheki/8220310/webrev.8.inc
> Testing: local build

Looks good.

From erik.osterlund at oracle.com Wed Oct 30 14:47:04 2019
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Wed, 30 Oct 2019 15:47:04 +0100
Subject: RFR: 8224817: Implementation of JEP 364: ZGC on macOS
In-Reply-To: <446b360d-498d-18ed-7d6a-ce7fd2dba14f@oracle.com>
References: <446b360d-498d-18ed-7d6a-ce7fd2dba14f@oracle.com>
Message-ID: <459ABC22-5A33-4A10-ADFD-61B9CE776B69@oracle.com>

Hi Stefan,

Thank you for the review.

/Erik

> On 30 Oct 2019, at 11:50, Stefan Karlsson wrote:
>
> Hi Erik,
>
> Reviewed:
> https://cr.openjdk.java.net/~eosterlund/8224817/webrev.02
>
> Looks good.
>
> Thanks,
> StefanK
>
>
>> On 2019-10-29 12:31, erik.osterlund at oracle.com wrote:
>> Seems reasonable. Thanks.
>> /Erik
>>> On 10/29/19 12:29 PM, Per Liden wrote:
>>> Some suggested adjustments, already discussed with Erik off-line:
>>>
>>> http://cr.openjdk.java.net/~pliden/8224817/webrev.review.0
>>>
>>> /Per
>>>
>>> On 10/28/19 5:11 PM, Erik Österlund wrote:
>>>> Hi,
>>>>
>>>> After some internal discussions with Per and Stefan, some refactorings have been made:
>>>>
>>>> 1) Use mmap consistently wherever possible, instead of mach_vm_map, for consistency. And only use mach_vm_remap from a wrapper function to map in views.
>>>> 2) Move the pmem segments up one level so that producer and consumer of the segments are on the same level, and let the virtual "file" know only about offsets.
>>>> 3) Minor polishing.
>>>> >>>> Incremental: >>>> http://cr.openjdk.java.net/~eosterlund/8224817/webrev.00..01/ >>>> >>>> Full: >>>> http://cr.openjdk.java.net/~eosterlund/8224817/webrev.01/ >>>> >>>> Thanks, >>>> /Erik >>>> >>>> On 2019-10-24 12:38, erik.osterlund at oracle.com wrote: >>>>> Hi, >>>>> >>>>> Now that some curling has been performed, paving way for this patch: >>>>> >>>>> 8229027: Improve how JNIHandleBlock::oops_do distinguishes oops from non-oops >>>>> 8229278: Improve hs_err location printing to assume less about GC internals >>>>> 8229189: Improve JFR leak profiler tracing to deal with discontiguous heaps >>>>> 8224815: Remove non-GC uses of CollectedHeap::is_in_reserved() >>>>> 8224820: ZGC: Support discontiguous heap reservations >>>>> >>>>> ...the remaining thing to do is plugging in a few platform specific ZGC files. This patch does that. >>>>> Decided to go with mach_vm_map/mach_vm_remap to implement multi-mapping. Previously I didn't want to do that as I couldn't figure out how to mach_vm_remap memory on top of reserved VA (acquired using mmap). But apparently VM_FLAGS_OVERWRITE was the missing ingredient there. With that in place, dodging the terrible ftruncate implementation on macOS seemed like a good idea. That also implies this port supports large pages (unlike other GCs on macOS today). Yay! 
>>>>> >>>>> CR: >>>>> http://cr.openjdk.java.net/~eosterlund/8224817/webrev.00/ >>>>> >>>>> Bug: >>>>> https://bugs.openjdk.java.net/browse/JDK-8229358 >>>>> >>>>> Thanks, >>>>> /Erik >>>> From aph at redhat.com Wed Oct 30 16:50:36 2019 From: aph at redhat.com (Andrew Haley) Date: Wed, 30 Oct 2019 16:50:36 +0000 Subject: RFR 8228532: Shenandoah: Implement SBSA::try_resolve_jobject_in_native() In-Reply-To: <2b7ec01b-085b-6e33-2946-2ad570d89ec6@redhat.com> References: <5a2ee72a-ea25-669e-226f-7eb62084068a@redhat.com> <2b7ec01b-085b-6e33-2946-2ad570d89ec6@redhat.com> Message-ID: On 7/26/19 2:18 AM, Zhengyu Gu wrote: > Updated Webrev: http://cr.openjdk.java.net/~zgu/JDK-8228532/webrev.01/ > > On X86 platforms, r15 does not have valid thread value, instead, it > should be derived from jni_env argument. > > Test: > hotspot_gc_shenandoah (fastdebug and release) on > Linux x86_64, x86_32 > Windows x86_64. FYI: I found a bug in AArch64. When we are resolving an object in native, rthread does not contain a valid thread value. Instead it should be derived from the jni_env argument. I believe this is true for all platforms: none will have a valid rthread when called from native code. Has this bug been backported? How should we handle it? Suggested patch: diff -r 6a05019acb67 src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp --- a/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Tue Sep 17 14:00:36 2019 -0400 +++ b/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Wed Oct 30 12:44:23 2019 -0400 @@ -424,9 +448,12 @@ // Check for null. 
__ cbz(obj, done); assert(obj != rscratch2, "need rscratch2"); - Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset())); - __ ldrb(rscratch2, gc_state); + Address gc_state(jni_env, ShenandoahThreadLocalData::gc_state_offset() - JavaThread::jni_environment_offset()); + __ lea(rscratch2, gc_state); + __ ldrb(rscratch2, Address(rscratch2)); // Check for heap in evacuation phase __ tbnz(rscratch2, ShenandoahHeap::EVACUATION_BITPOS, slowpath); -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at redhat.com Wed Oct 30 17:02:15 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 30 Oct 2019 18:02:15 +0100 Subject: RFR 8228532: Shenandoah: Implement SBSA::try_resolve_jobject_in_native() In-Reply-To: References: <5a2ee72a-ea25-669e-226f-7eb62084068a@redhat.com> <2b7ec01b-085b-6e33-2946-2ad570d89ec6@redhat.com> Message-ID: On 10/30/19 5:50 PM, Andrew Haley wrote: > Has this bug been backported? How should we handle it? JDK-8228532 is only in 14, it had not been backported. > Suggested patch: > > diff -r 6a05019acb67 src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp > --- a/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Tue Sep 17 14:00:36 2019 -0400 > +++ b/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Wed Oct 30 12:44:23 2019 -0400 > @@ -424,9 +448,12 @@ > // Check for null. 
> __ cbz(obj, done); > > assert(obj != rscratch2, "need rscratch2"); > - Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset())); > - __ ldrb(rscratch2, gc_state); > + Address gc_state(jni_env, ShenandoahThreadLocalData::gc_state_offset() - JavaThread::jni_environment_offset()); > + __ lea(rscratch2, gc_state); > + __ ldrb(rscratch2, Address(rscratch2)); > > // Check for heap in evacuation phase > __ tbnz(rscratch2, ShenandoahHeap::EVACUATION_BITPOS, slowpath); Yes, RFR that under new bug and link it to 8228532 :) I think x86 does it correctly already: https://hg.openjdk.java.net/jdk/jdk/rev/db740ced41c4 -- Thanks, -Aleksey From aph at redhat.com Wed Oct 30 17:07:24 2019 From: aph at redhat.com (Andrew Haley) Date: Wed, 30 Oct 2019 17:07:24 +0000 Subject: RFR 8233165: Shenandoah:SBSA::gen_load_reference_barrier_stub() should use pointer register for address on aarch64 In-Reply-To: <82ee1d40-a7a8-1b68-1aec-5b15597d7cbb@redhat.com> References: <82ee1d40-a7a8-1b68-1aec-5b15597d7cbb@redhat.com> Message-ID: On 10/30/19 12:56 PM, Zhengyu Gu wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8233165 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233165/webrev.00/ > > Test: > hotspot_gc_shenandoah (fastdebug and release) > jcstress quick tests (fastdebug and release) > on aarch64 Linux That looks right. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Wed Oct 30 17:38:05 2019 From: aph at redhat.com (Andrew Haley) Date: Wed, 30 Oct 2019 17:38:05 +0000 Subject: RFR: 8233232: AArch64: jni_fast_GetLongField is broken In-Reply-To: <2b7ec01b-085b-6e33-2946-2ad570d89ec6@redhat.com> References: <5a2ee72a-ea25-669e-226f-7eb62084068a@redhat.com> <2b7ec01b-085b-6e33-2946-2ad570d89ec6@redhat.com> Message-ID: I found a bug in AArch64. When we are resolving an object in native, rthread does not contain a valid thread value. 
Instead it should be derived from the jni_env argument. x86 does not use rthread, and is OK. I believe this is true for all platforms: none will have a valid rthread when called from native code. Fixed thusly, the same as x86. OK? diff -r 6a05019acb67 src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp --- a/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Tue Sep 17 14:00:36 2019 -0400 +++ b/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Wed Oct 30 12:44:23 2019 -0400 @@ -424,9 +448,12 @@ // Check for null. __ cbz(obj, done); assert(obj != rscratch2, "need rscratch2"); - Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset())); - __ ldrb(rscratch2, gc_state); + Address gc_state(jni_env, ShenandoahThreadLocalData::gc_state_offset() - JavaThread::jni_environment_offset()); + __ lea(rscratch2, gc_state); + __ ldrb(rscratch2, Address(rscratch2)); // Check for heap in evacuation phase __ tbnz(rscratch2, ShenandoahHeap::EVACUATION_BITPOS, slowpath); -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Wed Oct 30 17:45:09 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 30 Oct 2019 18:45:09 +0100 Subject: RFR: 8233232: AArch64: jni_fast_GetLongField is broken In-Reply-To: References: <5a2ee72a-ea25-669e-226f-7eb62084068a@redhat.com> <2b7ec01b-085b-6e33-2946-2ad570d89ec6@redhat.com> Message-ID: Is it not possible to use the gc_state Address directly in ldrb? Roman > I found a bug in AArch64. When we are resolving an object in native, > rthread does not contain a valid thread value. Instead it should be > derived from the jni_env argument. x86 does not use rthread, and is > OK. > > I believe this is true for all platforms: none will have a valid > rthread when called from native code. > > Fixed thusly, the same as x86. OK? 
> > diff -r 6a05019acb67 src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp > --- a/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Tue Sep 17 14:00:36 2019 -0400 > +++ b/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Wed Oct 30 12:44:23 2019 -0400 > @@ -424,9 +448,12 @@ > // Check for null. > __ cbz(obj, done); > > assert(obj != rscratch2, "need rscratch2"); > - Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset())); > - __ ldrb(rscratch2, gc_state); > + Address gc_state(jni_env, ShenandoahThreadLocalData::gc_state_offset() - JavaThread::jni_environment_offset()); > + __ lea(rscratch2, gc_state); > + __ ldrb(rscratch2, Address(rscratch2)); > > // Check for heap in evacuation phase > __ tbnz(rscratch2, ShenandoahHeap::EVACUATION_BITPOS, slowpath); > From zgu at redhat.com Wed Oct 30 17:48:23 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 30 Oct 2019 13:48:23 -0400 Subject: RFR 8233165: Shenandoah:SBSA::gen_load_reference_barrier_stub() should use pointer register for address on aarch64 In-Reply-To: References: <82ee1d40-a7a8-1b68-1aec-5b15597d7cbb@redhat.com> Message-ID: <6e455f3c-b306-31b4-5aa6-f065a3bdcf59@redhat.com> Thanks for the review, Andrew. It already pushed, so I can not add you as a reviewer. -Zhengyu On 10/30/19 1:07 PM, Andrew Haley wrote: > On 10/30/19 12:56 PM, Zhengyu Gu wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8233165 >> Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233165/webrev.00/ >> >> Test: >> hotspot_gc_shenandoah (fastdebug and release) >> jcstress quick tests (fastdebug and release) >> on aarch64 Linux > > That looks right. 
> From zgu at redhat.com Wed Oct 30 17:59:16 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 30 Oct 2019 13:59:16 -0400 Subject: RFR: 8233232: AArch64: jni_fast_GetLongField is broken In-Reply-To: References: <5a2ee72a-ea25-669e-226f-7eb62084068a@redhat.com> <2b7ec01b-085b-6e33-2946-2ad570d89ec6@redhat.com> Message-ID: <5ab024a1-0f34-ccf2-d77f-4d2ded3af38d@redhat.com> Hi Andrew, Fix looks good. Sorry for neglecting aarch64 during past barrier works, I will double check them. -Zhengyu On 10/30/19 1:38 PM, Andrew Haley wrote: > I found a bug in AArch64. When we are resolving an object in native, > rthread does not contain a valid thread value. Instead it should be > derived from the jni_env argument. x86 does not use rthread, and is > OK. > > I believe this is true for all platforms: none will have a valid > rthread when called from native code. > > Fixed thusly, the same as x86. OK? > > diff -r 6a05019acb67 src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp > --- a/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Tue Sep 17 14:00:36 2019 -0400 > +++ b/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Wed Oct 30 12:44:23 2019 -0400 > @@ -424,9 +448,12 @@ > // Check for null. 
>      __ cbz(obj, done);
>
>      assert(obj != rscratch2, "need rscratch2");
> -    Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset()));
> -    __ ldrb(rscratch2, gc_state);
> +    Address gc_state(jni_env, ShenandoahThreadLocalData::gc_state_offset() - JavaThread::jni_environment_offset());
> +    __ lea(rscratch2, gc_state);
> +    __ ldrb(rscratch2, Address(rscratch2));
>
>      // Check for heap in evacuation phase
>      __ tbnz(rscratch2, ShenandoahHeap::EVACUATION_BITPOS, slowpath);
>

From kim.barrett at oracle.com Wed Oct 30 19:53:19 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 30 Oct 2019 15:53:19 -0400
Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3)
In-Reply-To: <5A6C0668-86F6-4A3F-AC4D-75097D40A1C4@oracle.com>
References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com> <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com> <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com> <01a9ebcf-34ed-06b2-2da8-18d84feae858@oracle.com> <196b55d5-01f4-0202-effb-4495ae409df0@oracle.com> <5A6C0668-86F6-4A3F-AC4D-75097D40A1C4@oracle.com>
Message-ID: <479069C4-526E-47CB-A86D-3ADE04076A07@oracle.com>

> On Oct 29, 2019, at 4:13 PM, Stefan Johansson wrote:
>>
>> Webrev:
>> http://cr.openjdk.java.net/~sangheki/8220311/webrev.4
>> http://cr.openjdk.java.net/~sangheki/8220311/webrev.4.inc
>
> Looks good in general, just one comment.
>
> src/hotspot/share/gc/g1/g1Allocator.inline.hpp
> ---
> 78   assert(_alloc_buffers[dest.type()] != NULL,
> 79          "Allocation buffer is NULL: %s", dest.get_type_str());
>
> 80   G1HeapRegionAttr::region_type_t type = dest.type();
> 81   return alloc_buffer(type, node_index);
>
> As I mentioned to you offline, I think it is a bit unfortunate that we can't index our way to the correct PLAB in G1PLABAllocator::alloc_buffer(...) without the if-statement, but I agree that having multiple array slots pointing to the same PLAB isn't optimal either. So I think this approach is good for now,

I wondered about that too. Multiple array slots pointing to the same PLAB
doesn't seem bad to me, though it makes the PLAB management a little more
complicated. I agree this is good for now though, and can be investigated
further in a followup.

> but I have a very minor comment on the code snippet above. I would prefer if line 80 was skipped and the call on 81 just did return alloc_buffer(dest.type(), node_index).

+1

From kim.barrett at oracle.com Wed Oct 30 19:55:42 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 30 Oct 2019 15:55:42 -0400
Subject: RFR(M): 8220311: Implementation: NUMA-Aware Memory Allocation for G1, Survivor (2/3)
In-Reply-To: <85b282d1-0837-af5c-745f-efd0000d0ae1@oracle.com>
References: <9a78e353-7908-b546-8f6a-7acd92eb40ac@oracle.com> <846eb849-8a49-5872-73d7-6bbc8f98369c@oracle.com> <56788E04-DC92-461F-B3A7-DEEBC524DB5B@oracle.com> <3fe39096-43cb-4828-c042-0fc976a0307a@oracle.com> <01a9ebcf-34ed-06b2-2da8-18d84feae858@oracle.com> <196b55d5-01f4-0202-effb-4495ae409df0@oracle.com> <5A6C0668-86F6-4A3F-AC4D-75097D40A1C4@oracle.com> <85b282d1-0837-af5c-745f-efd0000d0ae1@oracle.com>
Message-ID:

> On Oct 29, 2019, at 4:44 PM, sangheon.kim at oracle.com wrote:
>
> Webrev:
> http://cr.openjdk.java.net/~sangheki/8220311/webrev.5
> http://cr.openjdk.java.net/~sangheki/8220311/webrev.5.inc

Looks good.

From aph at redhat.com Wed Oct 30 20:26:52 2019
From: aph at redhat.com (Andrew Haley)
Date: Wed, 30 Oct 2019 20:26:52 +0000
Subject: RFR: 8233232: AArch64: jni_fast_GetLongField is broken
In-Reply-To:
References: <5a2ee72a-ea25-669e-226f-7eb62084068a@redhat.com> <2b7ec01b-085b-6e33-2946-2ad570d89ec6@redhat.com>
Message-ID:

On 10/30/19 5:45 PM, Roman Kennke wrote:
> Is it not possible to use the gc_state Address directly in ldrb?

No, because it's a large negative offset.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Wed Oct 30 20:35:40 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 30 Oct 2019 21:35:40 +0100 Subject: RFR: 8233232: AArch64: jni_fast_GetLongField is broken In-Reply-To: References: <5a2ee72a-ea25-669e-226f-7eb62084068a@redhat.com> <2b7ec01b-085b-6e33-2946-2ad570d89ec6@redhat.com> Message-ID: <574a0d77-4c32-2d54-cf63-1a6c0fceed7e@redhat.com> > On 10/30/19 5:45 PM, Roman Kennke wrote: >> Is it not possible to use the gc_state Address directly in ldrb? > > No, because it's a large negative offset. Ah ok. Then it looks good. Thanks, Roman From shade at redhat.com Wed Oct 30 20:40:26 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 30 Oct 2019 21:40:26 +0100 Subject: RFR: 8233232: AArch64: jni_fast_GetLongField is broken In-Reply-To: References: <5a2ee72a-ea25-669e-226f-7eb62084068a@redhat.com> <2b7ec01b-085b-6e33-2946-2ad570d89ec6@redhat.com> Message-ID: <76364785-81b1-f1df-4c87-419bfcad9dc3@redhat.com> On 10/30/19 6:38 PM, Andrew Haley wrote: > I found a bug in AArch64. When we are resolving an object in native, > rthread does not contain a valid thread value. Instead it should be > derived from the jni_env argument. x86 does not use rthread, and is > OK. > > I believe this is true for all platforms: none will have a valid > rthread when called from native code. > > Fixed thusly, the same as x86. OK? > > diff -r 6a05019acb67 src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp > --- a/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Tue Sep 17 14:00:36 2019 -0400 > +++ b/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp Wed Oct 30 12:44:23 2019 -0400 > @@ -424,9 +448,12 @@ > // Check for null. 
> __ cbz(obj, done); > > assert(obj != rscratch2, "need rscratch2"); > - Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset())); > - __ ldrb(rscratch2, gc_state); > + Address gc_state(jni_env, ShenandoahThreadLocalData::gc_state_offset() - JavaThread::jni_environment_offset()); > + __ lea(rscratch2, gc_state); > + __ ldrb(rscratch2, Address(rscratch2)); > > // Check for heap in evacuation phase > __ tbnz(rscratch2, ShenandoahHeap::EVACUATION_BITPOS, slowpath); Looks good. -- Thanks, -Aleksey From mark.reinhold at oracle.com Wed Oct 30 21:45:22 2019 From: mark.reinhold at oracle.com (mark.reinhold at oracle.com) Date: Wed, 30 Oct 2019 14:45:22 -0700 (PDT) Subject: New candidate JEP: 365: ZGC on Windows Message-ID: <20191030214522.6102930C239@eggemoggin.niobe.net> https://openjdk.java.net/jeps/365 - Mark From mark.reinhold at oracle.com Wed Oct 30 22:05:19 2019 From: mark.reinhold at oracle.com (mark.reinhold at oracle.com) Date: Wed, 30 Oct 2019 15:05:19 -0700 (PDT) Subject: New candidate JEP: 366: Deprecate the ParallelScavenge + SerialOld GC Combination Message-ID: <20191030220519.8FE0830C244@eggemoggin.niobe.net> https://openjdk.java.net/jeps/366 - Mark From christoph.goettschkes at microdoc.com Thu Oct 31 09:12:05 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Thu, 31 Oct 2019 10:12:05 +0100 Subject: RFR: 8231955: ARM32: Address displacement is 0 for volatile field access because of Unsafe field access. In-Reply-To: <587f6363-bbdc-da12-9e50-82acc5bc5853@oracle.com> References: <20191010143426.BA4B6319F46@aojmv0009> <20191015073212.7FCCA319074@aojmv0009> <587f6363-bbdc-da12-9e50-82acc5bc5853@oracle.com> Message-ID: > I see now that BarrierSetC1::resolve_address() is calling > generate_address(), at least when access isn't patched. 
So now I'm
> thinking that the address passed to
> volatile_field_load/volatile_field_store should be correct, and the call
> to add_large_constant() isn't necessary.

Yes, this is correct. The LIR_Address is created by
LIRGenerator::generate_address and has a displacement of 0. I attached a
backtrace of the failing assert at the end of this mail.

Do you think the patch makes sense and can be pushed? The HotSpot tier1
JTreg tests are passing with this and other patches I am working on applied
with a debug VM.

-- Christoph

#0 0x7636b860 in LIRGenerator::add_large_constant (this=0x641ae2f0, src=0xe500b, c=0, dest=0xe900b) at src/hotspot/cpu/arm/c1_LIRGenerator_arm.cpp:166
#1 0x7636f266 in LIRGenerator::volatile_field_load (this=0x641ae2f0, address=0x6429c970, result=0xdd093, info=0x0) at src/hotspot/cpu/arm/c1_LIRGenerator_arm.cpp:1326
#2 0x762d9806 in BarrierSetC1::load_at_resolved (this=0x7602b1f0, access=..., result=0xdd093) at src/hotspot/share/gc/shared/c1/barrierSetC1.cpp:183
#3 0x762d929a in BarrierSetC1::load_at (this=0x7602b1f0, access=..., result=0xdd093) at src/hotspot/share/gc/shared/c1/barrierSetC1.cpp:94
#4 0x7635f6cc in LIRGenerator::access_load_at (this=0x641ae2f0, decorators=9127331840, type=T_LONG, base=..., offset=0xd900b, result=0xdd093, patch_info=0x0, load_emit_info=0x0) at src/hotspot/share/c1/c1_LIRGenerator.cpp:1618
#5 0x7636133e in LIRGenerator::do_UnsafeGetObject (this=0x641ae2f0, x=0x6429a0d0) at src/hotspot/share/c1/c1_LIRGenerator.cpp:2173
#6 0x76328bdc in UnsafeGetObject::visit (this=0x6429a0d0, v=0x641ae2f0) at src/hotspot/share/c1/c1_Instruction.hpp:2407
#7 0x7635b2d2 in LIRGenerator::do_root (this=0x641ae2f0, instr=0x6429a0d0) at src/hotspot/share/c1/c1_LIRGenerator.cpp:373
#8 0x7635b1f2 in LIRGenerator::block_do (this=0x641ae2f0, block=0x64299788) at src/hotspot/share/c1/c1_LIRGenerator.cpp:354
#9 0x76337d5a in BlockList::iterate_forward (this=0x6429bf00, closure=0x641ae2f4) at src/hotspot/share/c1/c1_Instruction.cpp:921
#10 0x76332936 in IR::iterate_linear_scan_order (this=0x642994d0, closure=0x641ae2f4) at src/hotspot/share/c1/c1_IR.cpp:1221
#11 0x7630ed10 in Compilation::emit_lir (this=0x641ae5c0) at src/hotspot/share/c1/c1_Compilation.cpp:259
#12 0x7630f2be in Compilation::compile_java_method (this=0x641ae5c0) at src/hotspot/share/c1/c1_Compilation.cpp:398
#13 0x7630f566 in Compilation::compile_method (this=0x641ae5c0) at src/hotspot/share/c1/c1_Compilation.cpp:460
#14 0x7630fabc in Compilation::Compilation (this=0x641ae5c0, compiler=0x760eb610, env=0x641ae848, method=0x63d2edc8, osr_bci=-1, buffer_blob=0x73eb7448, directive=0x760cf858) at src/hotspot/share/c1/c1_Compilation.cpp:583
#15 0x76312d6e in Compiler::compile_method (this=0x760eb610, env=0x641ae848, method=0x63d2edc8, entry_bci=-1, directive=0x760cf858) at src/hotspot/share/c1/c1_Compiler.cpp:247
#16 0x76453704 in CompileBroker::invoke_compiler_on_method (task=0x642cfa50) at src/hotspot/share/compiler/compileBroker.cpp:2115
#17 0x764529ba in CompileBroker::compiler_thread_loop () at src/hotspot/share/compiler/compileBroker.cpp:1800
#18 0x7693548c in compiler_thread_entry (thread=0x6423b400, __the_thread__=0x6423b400) at src/hotspot/share/runtime/thread.cpp:3401
#19 0x769315d4 in JavaThread::thread_main_inner (this=0x6423b400) at src/hotspot/share/runtime/thread.cpp:1917
#20 0x769314ac in JavaThread::run (this=0x6423b400) at src/hotspot/share/runtime/thread.cpp:1900
#21 0x7692e884 in Thread::call_run (this=0x6423b400) at src/hotspot/share/runtime/thread.cpp:398
#22 0x768285ce in thread_native_entry (thread=0x6423b400) at src/hotspot/os/linux/os_linux.cpp:790
#23 0x76f84568 in start_thread() from target:/usr/lib/libpthread.so.0
#24 0x76ef8ac8 in ?? () from target:/usr/lib/libc.so.6

From shade at redhat.com Thu Oct 31 09:15:51 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 31 Oct 2019 10:15:51 +0100
Subject: RFR (XS) 8233303: Shenandoah: verifier assert erroneously uses byte_size_in_exact_unit
Message-ID: <217faac8-b8bc-f499-8ea2-5b52768da5df@redhat.com>

Bug:
https://bugs.openjdk.java.net/browse/JDK-8233303

Typo in JDK-8232102 found by sh/jdk8 backports, where byte_size_in_exact_unit is not defined. Should
actually be "proper_unit".

Fix:

diff -r b026a43e1809 src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp
--- a/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp Tue Oct 29 09:34:23 2019 +0800
+++ b/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp Thu Oct 31 10:08:22 2019 +0100
@@ -693,12 +693,12 @@

     size_t heap_committed = _heap->committed();
     guarantee(cl.committed() == heap_committed,
               "%s: heap committed size must be consistent: heap-committed = " SIZE_FORMAT "%s, regions-committed = " SIZE_FORMAT "%s",
               label,
-              byte_size_in_exact_unit(heap_committed), proper_unit_for_byte_size(heap_committed),
-              byte_size_in_exact_unit(cl.committed()), proper_unit_for_byte_size(cl.committed()));
+              byte_size_in_proper_unit(heap_committed), proper_unit_for_byte_size(heap_committed),
+              byte_size_in_proper_unit(cl.committed()), proper_unit_for_byte_size(cl.committed()));
   }

   // Internal heap region checks
   if (ShenandoahVerifyLevel >= 1) {
     ShenandoahVerifyHeapRegionClosure cl(label, regions);

Testing: x86_64 build

--
Thanks,
-Aleksey

From rkennke at redhat.com Thu Oct 31 09:17:25 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 31 Oct 2019 10:17:25 +0100
Subject: RFR (XS) 8233303: Shenandoah: verifier assert erroneously uses byte_size_in_exact_unit
In-Reply-To: <217faac8-b8bc-f499-8ea2-5b52768da5df@redhat.com>
References: <217faac8-b8bc-f499-8ea2-5b52768da5df@redhat.com>
Message-ID: <28db9580-76b0-4eea-b6c9-252e69f67f51@redhat.com>

Ok.
Thanks, Roman > Bug: > https://bugs.openjdk.java.net/browse/JDK-8233303 > > Typo in JDK-8232102 found by sh/jdk8 backports, where byte_size_in_exact_unit is not defined. Should > actually be "proper_unit". > > Fix: > > diff -r b026a43e1809 src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp Tue Oct 29 09:34:23 2019 +0800 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp Thu Oct 31 10:08:22 2019 +0100 > @@ -693,12 +693,12 @@ > > size_t heap_committed = _heap->committed(); > guarantee(cl.committed() == heap_committed, > "%s: heap committed size must be consistent: heap-committed = " SIZE_FORMAT "%s, > regions-committed = " SIZE_FORMAT "%s", > label, > - byte_size_in_exact_unit(heap_committed), proper_unit_for_byte_size(heap_committed), > - byte_size_in_exact_unit(cl.committed()), proper_unit_for_byte_size(cl.committed())); > + byte_size_in_proper_unit(heap_committed), proper_unit_for_byte_size(heap_committed), > + byte_size_in_proper_unit(cl.committed()), proper_unit_for_byte_size(cl.committed())); > } > > // Internal heap region checks > if (ShenandoahVerifyLevel >= 1) { > ShenandoahVerifyHeapRegionClosure cl(label, regions); > > Testing: x86_64 build > From thomas.schatzl at oracle.com Thu Oct 31 09:51:58 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 31 Oct 2019 10:51:58 +0100 Subject: RFR (XS): 8232951: TestG1ParallelPhases.java fails with phase NonYoungFreeCSet not found In-Reply-To: <1fff7f47-aebc-bbcd-bda6-bb6185c11c3a@oracle.com> References: <27fa21ab-d8ac-95ea-3485-7a72116c22f2@oracle.com> <951EB603-F273-4787-9D8D-32D8194ECFAA@oracle.com> <7734C751-1D9B-45F5-86FB-D51D2BE8985F@oracle.com> <1fff7f47-aebc-bbcd-bda6-bb6185c11c3a@oracle.com> Message-ID: Hi, On 28.10.19 11:40, Leo Korinth wrote: > Hi. > > Just want to add some information, because I think it will fail again. 
> > The buggy test case is written by me and the provoke mixed gc part is > copied mostly either from TestOldGenCollectionUsage or TestLogging (as > it is hard to share this code due to JTREG). However when I did "copy" > the code I also did try to improve the code, this could be the reason > for this failure. I did at least two "improvements" in that I removed > magic constants when allocating the 20k arrays and instead calculated > how many I would need; this made the algorithm allocate ~2M instead of > ~3M which could be a problem although to my understanding it should not > be. Another change I made is that I will not provoke a gc by allocating > until out-of-memory. The original code seems to try to provoke a gc by > starting concurrent marks and young gc, but kind of fail-safes with the > code after the comment // allocate more objects to provoke GC. Having > this code I guess would fix the problem with the test case, but on the > other hand, we would not know why the youngGC() after concurrent mark > does not provoke a mixed gc (I guess it should, but correct me if this > is false). I do not think either change makes a difference. > > I have talked to Thomas off-list, and I think AlwaysTenure is not the > solution to the problem we have. I think adding the debug options is > great and should be done, and AlwaysTenure seems better than > MaxTenuringThreshold=1 but we should expect the test case to continue to > fail in the future. > > If you go by adding AlwaysTenure instead of MaxTenuringThreshold=1, > please also remove one getWhiteBox().youngGC() in allocateOldObjects so > that we do not leave "magic" lines in the test case. Also update the > comment to // Do *one* young collections... > and there is another "-XX:MaxTenuringThreshold=1" that needs to be > updated. I need no webrev for these changes. Updated in place; also fixed Kim's comment about line length. 
http://cr.openjdk.java.net/~tschatzl/8232951/webrev/ > > I am sorry that my "improvements" probably caused this failure, though > just having heaps of code and not understanding why, is probably worse > in the long run --- at least that is my thinking. The question I have is whether I can push these changes under this CR (and if it occurs again we at least have a log to look at) or use another CR for it? Thanks, Thomas From leo.korinth at oracle.com Thu Oct 31 10:06:59 2019 From: leo.korinth at oracle.com (Leo Korinth) Date: Thu, 31 Oct 2019 11:06:59 +0100 Subject: RFR (XS): 8232951: TestG1ParallelPhases.java fails with phase NonYoungFreeCSet not found In-Reply-To: References: <27fa21ab-d8ac-95ea-3485-7a72116c22f2@oracle.com> <951EB603-F273-4787-9D8D-32D8194ECFAA@oracle.com> <7734C751-1D9B-45F5-86FB-D51D2BE8985F@oracle.com> <1fff7f47-aebc-bbcd-bda6-bb6185c11c3a@oracle.com> Message-ID: <816cd016-b10c-dcfd-292a-99ab36685c04@oracle.com> On 31/10/2019 10:51, Thomas Schatzl wrote: > Hi, > > On 28.10.19 11:40, Leo Korinth wrote: >> Hi. >> >> Just want to add some information, because I think it will fail again. >> >> The buggy test case is written by me and the provoke mixed gc part is >> copied mostly either from TestOldGenCollectionUsage or TestLogging (as >> it is hard to share this code due to JTREG). However when I did "copy" >> the code I also did try to improve the code, this could be the reason >> for this failure. I did at least two "improvements" in that I removed >> magic constants when allocating the 20k arrays and instead calculated >> how many I would need; this made the algorithm allocate ~2M instead of >> ~3M which could be a problem although to my understanding it should >> not be. Another change I made is that I will not provoke a gc by >> allocating until out-of-memory. 
The original code seems to try to >> provoke a gc by starting concurrent marks and young gc, but kind of >> fail-safes with the code after the comment // allocate more objects to >> provoke GC. Having this code I guess would fix the problem with the >> test case, but on the other hand, we would not know why the youngGC() >> after concurrent mark does not provoke a mixed gc (I guess it should, >> but correct me if this is false). > > I do not think either change makes a difference. > >> >> I have talked to Thomas off-list, and I think AlwaysTenure is not the >> solution to the problem we have. I think adding the debug options is >> great and should be done, and AlwaysTenure seems better than >> MaxTenuringThreshold=1 but we should expect the test case to continue >> to fail in the future. >> >> If you go by adding AlwaysTenure instead of MaxTenuringThreshold=1, >> please also remove one getWhiteBox().youngGC() in allocateOldObjects >> so that we do not leave "magic" lines in the test case. Also update >> the comment to // Do *one* young collections... >> and there is another "-XX:MaxTenuringThreshold=1" that needs to be >> updated. I need no webrev for these changes. > > Updated in place; also fixed Kim's comment about line length. > > http://cr.openjdk.java.net/~tschatzl/8232951/webrev/ > >> >> I am sorry that my "improvements" probably caused this failure, though >> just having heaps of code and not understanding why, is probably worse >> in the long run --- at least that is my thinking. > > The question I have is whether I can push these changes under this CR > (and if it occurs again we at least have a log to look at) or use > another CR for it? I am fine with you pushing under the current CR. Thanks, Leo > > Thanks, > ? 
Thomas From stefan.karlsson at oracle.com Thu Oct 31 10:18:20 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 31 Oct 2019 11:18:20 +0100 Subject: RFR: 8233299: Implementation: JEP 365: ZGC on Windows Message-ID: <8fbffe58-7045-52e4-687c-35cb8c146365@oracle.com> Hi all, Please review this patch to add ZGC support on Windows. https://cr.openjdk.java.net/~stefank/8233299/webrev.01/ https://bugs.openjdk.java.net/browse/JDK-8233299 As mentioned in the JEP (https://openjdk.java.net/jeps/365), there were some preparation patches that needed to go in to pave the way for this patch: 8232601: ZGC: Parameterize the ZGranuleMap table size 8232602: ZGC: Make ZGranuleMap ZAddress agnostic 8232604: ZGC: Make ZVerifyViews mapping and unmapping precise 8232648: ZGC: Move ATTRIBUTE_ALIGNED to the front of declarations 8232649: ZGC: Add callbacks to ZMemoryManager 8232650: ZGC: Add initialization hooks for OS specific code 8232651: Add implementation of os::processor_id() for Windows ... they have all been pushed now. One important key-point to this implementation is to use the new Windows APIs that support reservation and mapping of memory through "placeholders": VirtualAlloc2, VirtualFreeEx, MapViewOfFile3, and UnmapViewOfFile2. These functions are available starting from version 1803 of Windows 10 and Windows Server. ZGC will lookup these symbols to determine if the Windows version supports these functions. Correlating the text in the JEP with the code: * '"Support for multi-mapping memory". ZGC's use of colored pointers requires support for heap multi-mapping, so that the same physical memory can be accessed from multiple different locations in the process address space. On Windows, paging-file backed memory provides physical memory with an identity (a handle), which is unrelated to the virtual address where it is mapped. Using this identity allows ZGC to map the same physical memory into multiple locations.' 
We commit memory via paging file mappings and map views into that memory. The function ZMapper::create_and_commit_paging_file_mapping uses CreateFileMappingW with SEC_RESERVE to create this mapping, MapViewOfFile3 to map a temporary view into the mapping, VirtualAlloc2 to commit the memory, and then UnmapViewOfFile2 to unmap the view. The reason to use SEC_RESERVE and the extra VirtualAlloc2, instead of SEC_COMMIT, is to ensure that the later multi-mappings of committed file mappings don't fail under low-memory situations. Earlier prototypes used SEC_COMMIT and saw this kind of OOME error when mapping new views to already committed memory. The current platform-independent ZGC code isn't prepared to handle OOME errors when mapping views, so we chose this solution.

MapViewOfFile3 is then used to multi-map into the committed memory.

* '"Support for mapping paging-file backed memory into a reserved address space". The Windows memory management API is not as flexible as POSIX's mmap/munmap, especially when it comes to mapping file backed memory into a previously reserved address space region. To do this, ZGC will use the Windows concept of address space placeholders. The placeholder concept was introduced in version 1803 of Windows 10 and Windows Server. ZGC support for older versions of Windows will not be implemented.'

Before the placeholder APIs there was no way to first reserve a specific virtual memory range, and then map a view of a committed paging file over that range. The VirtualAlloc function could be used to first reserve and then commit anonymous memory, but nothing similar existed for mapped views. Now with placeholders, we can create a placeholder reservation of memory with VirtualAlloc2, and then replace that reservation with MapViewOfFile3. When memory is unmapped, we can use UnmapViewOfFile2 to "preserve" the placeholder memory reservation.

* '"Support for mapping and unmapping arbitrary parts of the heap".
ZGC's heap layout in combination with its dynamic sizing (and re-sizing) of heap pages requires support for mapping and unmapping arbitrary heap granules. This requirement in combination with Windows address space placeholders requires special attention, since placeholders must be explicitly split/coalesced by the program, as opposed to being automatically split/coalesced by the operating system (as on Linux).'

Half of the preparation patches were put in place to support this. When replacing a placeholder with a view of the backing file, we need to exactly match the address and size of a placeholder. Also, when unmapping a view, we need to exactly match the address and size of the view, and replace it with a placeholder.

To make it easier to map and unmap arbitrary parts of the heap, we split reserved memory into ZGranuleSize-sized placeholders. So, whenever we perform any of these operations, we know that any given memory range could be dealt with as a number of granules.

When memory is reserved, but not mapped, it is registered in the ZVirtualMemoryManager. It splits memory into granule-sized placeholders when reserved memory is fetched, and coalesces placeholders when reserved memory is handed back.

* '"Support for committing and uncommitting arbitrary parts of the heap". ZGC can commit and uncommit physical memory dynamically while the Java program is running. To support these operations the physical memory will be divided into, and backed by, multiple paging-file segments. Each paging-file segment corresponds to a ZGC heap granule, and can be committed and uncommitted independently of other segments.'

Just like we can map and unmap in granules, we want to be able to commit and uncommit memory in granules. You can see how memory is committed and uncommitted in granules in ZBackingFile::commit_from_paging_file and ZBackingFile::uncommit_from_paging_file. Each committed granule is associated with one registered handle.
When memory for a granule is uncommitted, the handle is closed. At this point, no views exist to the mapping and the memory is handed back to the OS.

Final point about ZPhysicalMemoryBacking. We've tried to make this file similar on all OSes, with the hope to be able to combine them when both the Windows and macOS ports have been merged.

Thanks,
StefanK

From thomas.schatzl at oracle.com Thu Oct 31 13:07:25 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 31 Oct 2019 14:07:25 +0100
Subject: RFR (S): 8233301: Implementation of JEP 366: Deprecate the ParallelScavenge + SerialOld GC Combination
Message-ID: <727338ce-845c-067f-e56e-f066d2a602dd@oracle.com>

Hi all,

can I have reviews for this small change that implements deprecation as outlined in JEP 366: Deprecate the ParallelScavenge + SerialOld GC Combination?

CR:
https://bugs.openjdk.java.net/browse/JDK-8233301
Webrev:
http://cr.openjdk.java.net/~tschatzl/8233301/webrev/
Testing:
hs-tier1-5

Thanks,
Thomas

From per.liden at oracle.com Thu Oct 31 13:31:00 2019
From: per.liden at oracle.com (Per Liden)
Date: Thu, 31 Oct 2019 14:31:00 +0100
Subject: RFR (S): 8233301: Implementation of JEP 366: Deprecate the ParallelScavenge + SerialOld GC Combination
In-Reply-To: <727338ce-845c-067f-e56e-f066d2a602dd@oracle.com>
References: <727338ce-845c-067f-e56e-f066d2a602dd@oracle.com>
Message-ID:

Looks good!

/Per

On 10/31/19 2:07 PM, Thomas Schatzl wrote:
> Hi all,
>
>   can I have reviews for this small change that implements deprecation
> as outlined in JEP 366: Deprecate the ParallelScavenge + SerialOld GC
> Combination?
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8233301
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8233301/webrev/
> Testing:
> hs-tier1-5
>
> Thanks,
>
Thomas

From thomas.schatzl at oracle.com Thu Oct 31 13:43:17 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 31 Oct 2019 14:43:17 +0100
Subject: RFR (M): 8189737: Make HeapRegion not derive from Space
Message-ID: <4fdeb066-e9eb-c0e6-5fd2-5ec9a368bc23@oracle.com>

Hi all,

can I get reviews for this refactoring that removes the inheritance of HeapRegion from Space?

Since JDK10 we did not use much of the shared code in G1, so apart from inheriting a few trivial members (bottom, top, compaction_top) there is not much gain in inheriting from (Contiguous-)Space, except adding quite a few unused members and lots of legacy code.

In JDK10 we already considered removing this inheritance, but never got around to it until now :)

There will be a follow-up JDK-8233306 that cleans up the code a bit (sorting members and methods), but to keep this a bit more easily reviewable, the change is as it is.

The change is smaller than webrev indicates; for some reason the single-line include change in test_g1HeapVerifier.cpp caused it to be included as a "new" file. There is also a lot of one-line #include-wrangling.

CR:
https://bugs.openjdk.java.net/browse/JDK-8189737
Webrev:
http://cr.openjdk.java.net/~tschatzl/8189737/webrev/
Testing:
hs-tier-1-5

Thanks,
Thomas

From thomas.schatzl at oracle.com Thu Oct 31 13:47:04 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Thu, 31 Oct 2019 14:47:04 +0100
Subject: RFR (M): 8233306: Sort members in G1's HeapRegion after removal of Space dependency
Message-ID: <568f7bca-3c39-f554-b557-953e5f7f157c@oracle.com>

Hi all,

after the change to HeapRegion in JDK-8189737 the declaration of the HeapRegion class is a bit messed up (merging G1ContiguousSpace, adding a few members needed from ContiguousSpace).

This change tries to fix this as much as possible by shuffling around stuff (i.e. grouping allocation related methods, evacuation related methods, some helper pointers in HeapRegion, etc).
Depends on JDK-8189737 also out for review.

CR:
https://bugs.openjdk.java.net/browse/JDK-8233306
Webrev:
http://cr.openjdk.java.net/~tschatzl/8233306/webrev/
Testing:
hs-tier1-5

Thanks,
Thomas

From stefan.johansson at oracle.com Thu Oct 31 15:31:39 2019
From: stefan.johansson at oracle.com (Stefan Johansson)
Date: Thu, 31 Oct 2019 16:31:39 +0100
Subject: RFR(L): 8220312: Implementation: NUMA-Aware Memory Allocation for G1, Logging (3/3)
In-Reply-To:
References: <743d16cc-499d-784b-79fc-c006643f9ec5@oracle.com>
Message-ID: <6449da1d-6dcc-50ba-8ae8-7615e7ad35f9@oracle.com>

Hi Sangheon,

On 2019-10-23 08:39, sangheon.kim at oracle.com wrote:
> Hi Thomas,
>
> I am posting the next webrev as Kim is waiting it.
>
> Webrev:
> http://cr.openjdk.java.net/~sangheki/8220312/webrev.3
> http://cr.openjdk.java.net/~sangheki/8220312/webrev.3.inc

Here are my comments:

src/hotspot/share/gc/g1/g1CollectedHeap.hpp
---
2397   st->print(" remaining free region(s) from each node id: ");

What do you think about changing this to "... region(s) on each NUMA node: "? I think we should be clear about the logging being for NUMA.
---

src/hotspot/share/gc/g1/g1EdenRegions.hpp
---
  33 class G1EdenRegions : public G1RegionCounts {

I don't think G1EdenRegions is a G1RegionCounts but rather it should have one. So instead of using inheritance here I think G1EdenRegions should have a G1RegionCounts. Instead of overloading length I would then suggest adding a region_count(uint node_index) to get the count. Same goes for G1SurvivorRegions.
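
[Editorial sketch] The "has a" arrangement suggested above could look roughly like the following. This is only a minimal illustration under stated assumptions: the G1RegionCounts here is a simplified stand-in rather than the class from the webrev, the add() and count() members are hypothetical, and plain `unsigned` is used in place of HotSpot's `uint`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for the per-node counter in the webrev.
class G1RegionCounts {
  std::vector<unsigned> _counts;  // one slot per active NUMA node
public:
  explicit G1RegionCounts(unsigned num_nodes) : _counts(num_nodes, 0) {}

  void add(unsigned node_index) {
    assert(node_index < _counts.size());
    _counts[node_index]++;
  }

  unsigned count(unsigned node_index) const {
    assert(node_index < _counts.size());
    return _counts[node_index];
  }
};

// G1EdenRegions owns a G1RegionCounts instead of deriving from it.
class G1EdenRegions {
  unsigned       _length;         // total number of eden regions
  G1RegionCounts _region_counts;  // per-node breakdown of the same regions
public:
  explicit G1EdenRegions(unsigned num_nodes)
    : _length(0), _region_counts(num_nodes) {}

  // Registers one eden region that lives on the given node.
  void add(unsigned node_index) {
    _length++;
    _region_counts.add(node_index);
  }

  // length() keeps its single meaning; the per-node value has its own name.
  unsigned length() const { return _length; }
  unsigned region_count(unsigned node_index) const {
    return _region_counts.count(node_index);
  }
};
```

With this shape, callers that only care about the total keep using length(), and only NUMA-aware callers touch region_count(node_index).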
---

src/hotspot/share/gc/g1/g1NUMA.cpp
---
 279 bool NodeIndexCheckClosure::do_heap_region(HeapRegion* hr) {
 280   uint preferred_node_index = _numa->preferred_node_index_for_index(hr->hrm_index());
 281   uint active_node_index = _numa->index_of_address(hr->bottom());
 282
 283   if (preferred_node_index == active_node_index) {
 284     _matched[preferred_node_index]++;
 285   } else if (active_node_index == G1NUMA::UnknownNodeIndex) {
 286     _unknown++;
 287   }
 288   _total++;
 289
 290   return false;
 291 }

As we discussed offline, I would like to know the mismatches as well. I think the easiest approach would be to make the total count per node as well; that way we can see if there were any regions that didn't match. What do you think about printing the info like this:

[3,009s][trace][gc,heap,numa ] GC(6) NUMA region verification (actual/expected): 0: 1024/1024, 1: 270/1024, Unknown: 0

When testing this I also realized this output is problematic in the case where we have committed regions that have not yet been used. The manual for get_mempolicy (the way we get the NUMA id for the address) says:

"If no page has yet been allocated for the specified address, get_mempolicy() will allocate a page as if the thread had performed a read (load) access to that address, and return the ID of the node where that page was allocated."

Doing a read access seems to always get a page on NUMA node 0, so the accounting will not be correct in this case. One way to fix this would be to only do accounting for regions currently used (!hr->is_free()), but I'm not sure that is exactly what we want, at least not if we only do this after the GC, since then only the survivors and old will be checked. We could solve this by also doing verification before the GC. I think this might be the way to go, what do you think?
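
[Editorial sketch] The per-node accounting described above (matched and total per node, unknown counted separately, free regions skipped) can be modelled outside HotSpot as follows. Everything here is an assumption made for illustration: RegionInfo, NodeVerification, kUnknownNode and summary() are hypothetical names, and the real closure works on HeapRegion and G1NUMA rather than on plain structs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical marker for "the OS could not tell us the node".
const unsigned kUnknownNode = ~0u;

// Hypothetical stand-in for what the closure learns about one heap region.
struct RegionInfo {
  unsigned preferred_node;  // node the region is expected to be on
  unsigned active_node;     // node reported for the address, or kUnknownNode
  bool     is_free;         // untouched regions may not have pages yet
};

struct NodeVerification {
  std::vector<unsigned> matched;  // regions found on their preferred node
  std::vector<unsigned> total;    // regions expected on each node
  unsigned unknown;               // regions whose node could not be determined

  explicit NodeVerification(unsigned num_nodes)
    : matched(num_nodes, 0), total(num_nodes, 0), unknown(0) {}

  void do_region(const RegionInfo& r) {
    if (r.is_free) {
      return;  // skip regions that were never used, as discussed above
    }
    assert(r.preferred_node < total.size());
    total[r.preferred_node]++;
    if (r.active_node == r.preferred_node) {
      matched[r.preferred_node]++;
    } else if (r.active_node == kUnknownNode) {
      unknown++;
    }
  }

  // Formats the counters in the "actual/expected" style proposed above.
  std::string summary() const {
    std::string s = "NUMA region verification (actual/expected):";
    for (std::size_t i = 0; i < total.size(); i++) {
      s += " " + std::to_string(i) + ": " + std::to_string(matched[i]) +
           "/" + std::to_string(total[i]) + ",";
    }
    s += " Unknown: " + std::to_string(unknown);
    return s;
  }
};
```

A mismatch then shows up as actual being smaller than expected for that node, which is the information the single global total could not provide.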
If my proposal was hard to follow, here's a patch:
http://cr.openjdk.java.net/~sjohanss/numa/verify-alternative/

The output from this patch would be:

[9,233s][trace][gc,heap,numa ] GC(18) GC Start: NUMA region verification (actual/expected): 0: 358/358, 1: 361/361, Unknown: 0
[9,306s][trace][gc,heap,numa ] GC(18) GC End: NUMA region verification (actual/expected): 0: 348/348, 1: 347/347, Unknown: 0

One can also see that this verification takes some time, so maybe it would make sense to have this logging under gc+numa+verify.
---
 234   uint converted_req_index = requested_node_index;
 235   if(converted_req_index == AnyNodeIndex) {
 236     converted_req_index = _num_active_node_ids;
 237   }
 238   if (converted_req_index <= _num_active_node_ids) {
 239     _times->update(phase, converted_req_index, allocated_node_index);
 240   }

I had to read this more than once to understand what it really did and I think we can simplify it a bit, by just doing an if-else that checks for AnyNodeIndex and if so passes in _num_active_node_ids to update(). This should be ok since requested_node_index never can be larger than _num_active_node_ids.
---

src/hotspot/share/gc/g1/g1ParScanThreadState.cpp
---
I would prefer if we hide all the accounting in helper functions, but it might be good to declare them to be inlined.

  85   if (_numa->is_enabled()) {
  86     LogTarget(Info, gc, heap, numa) lt;
  87
  88     if (lt.is_enabled()) {
  89       uint num_nodes = _numa->num_active_nodes();
  90       // Record only if there are multiple active nodes.
  91       _obj_alloc_stat = NEW_C_HEAP_ARRAY(size_t, num_nodes, mtGC);
  92       memset((void*)_obj_alloc_stat, 0, sizeof(size_t) * num_nodes);
  93     }
  94   }

Move to something like initialize_numa_stats().

 108   if (_obj_alloc_stat != NULL) {
 109     uint node_index = _numa->index_of_current_thread();
 110     _numa->copy_statistics(G1NodeTimes::LocalObjProcessAtCopyToSurv, node_index, _obj_alloc_stat);
 111   }

This could be called flush_numa_stats().
 268   if (_obj_alloc_stat != NULL) {
 269     _obj_alloc_stat[node_index]++;
 270   }

And this something like update_numa_stats(uint).
---

heapRegionSet.hpp
---
 159   inline void update_length(HeapRegion* hr, bool increase);
 254   inline void update_length(HeapRegion* hr, bool increase);

Is there any reason for having update_length that takes a bool rather than having one function for increments and one for decrements? To me it looks like all uses are pretty well defined and it would make the code easier to read. I also think we could pass in the node index rather than the HeapRegion since the getter length() does this.
---

src/hotspot/share/gc/g1/g1NodeTimes.cpp
---
First, a question about the names: G1NodeTimes signals that it has to do with timing, but currently we don't really record any timings. Same thing with NodeStatPhases, not really the same type of phases that we have for the rest of the GC logging. What do you think about renaming the class to G1NUMAStats and the enum to NodeDataItems?

 166 void G1NodeTimes::print_phase_info(G1NodeTimes::NodeStatPhases phase) {
 167   LogTarget(Info, gc, heap, numa) lt;

I think this should be on debug level, but if you don't agree leave it as is.
---
 191 void G1NodeTimes::print_mutator_alloc_stat_debug() {
 192   LogTarget(Debug, gc, heap, numa) lt;

And if you agree on moving the above to debug I think this should be on trace level.
---

This is it for now.

Thanks,
Stefan

> Testing: hs-tier 1 ~ 4 with/without UseNUMA. hs-tier5 is almost finished
> without new failures.
>
> Thanks,
> Sangheon
>
>

From aph at redhat.com Thu Oct 31 16:45:55 2019
From: aph at redhat.com (Andrew Haley)
Date: Thu, 31 Oct 2019 16:45:55 +0000
Subject: RFR 8232992: Shenandoah: Implement self-fixing interpreter LRB
In-Reply-To: <1648aef7-6df9-6f54-6601-fde9d7251187@redhat.com>
References: <1648aef7-6df9-6f54-6601-fde9d7251187@redhat.com>
Message-ID: <15f764f4-8cbe-5a9c-88f9-1843b1e97d0c@redhat.com>

On 10/25/19 3:29 PM, Zhengyu Gu wrote:
> Test:
> hotspot_gc_shenandoah (fastdebug and release)
> x86_64 and x86_32 on Linux
> aarch64 Linux
> Windows x86_64

I didn't see this because I don't read all the Shenandoah and GC messages.

The AArch64 code is unidiomatic and cumbersome in places, not to mention extremely confusing, and I can help with that.

 236 void ShenandoahBarrierSetAssembler::load_reference_barrier_not_null(MacroAssembler* masm, Register dst, Address load_addr) {
 237   assert(ShenandoahLoadRefBarrier, "Should be enabled");
 238   assert(dst != rscratch2, "need rscratch2");
 239   assert_different_registers(load_addr.base(), load_addr.index(), rscratch1);
 240   assert_different_registers(load_addr.base(), load_addr.index(), rscratch2);
 241
 242   Label done;
 243   __ enter();
 244   Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset()));
 245   __ ldrb(rscratch2, gc_state);
 246
 247   // Check for heap stability
 248   __ tbz(rscratch2, ShenandoahHeap::HAS_FORWARDED_BITPOS, done);
 249
 250   // use r1 for load address
 251   Register result_dst = dst;
 252   if (dst == r1) {
 253     __ mov(rscratch1, dst);

This is pointless. On AArch64 mov(Rn, Rm) generates no code if Rn == Rm.

 254     dst = rscratch1;
 255   }
 256
 257   RegSet to_save_r1 = RegSet::of(r1);
 258   // If outgoing register is r1, we can clobber it
 259   if (result_dst != r1) {
 260     __ push(to_save_r1, sp);
 261   }

On AArch64 registers are always saved in pairs, so it makes no sense to push individual registers. You might as well push both if either is to be saved.
262 __ lea(r1, load_addr); 263 264 RegSet to_save_r0 = RegSet::of(r0); 265 if (dst != r0) { 266 __ push(to_save_r0, sp); 267 __ mov(r0, dst); 268 } 269 270 __ far_call(RuntimeAddress(CAST_FROM_FN_PTR(address, ShenandoahBarrierSetAssembler::shenandoah_lrb()))); 271 272 if (result_dst != r0) { 273 __ mov(result_dst, r0); 274 } 275 276 if (dst != r0) { 277 __ pop(to_save_r0, sp); 278 } 279 280 if (result_dst != r1) { 281 __ pop(to_save_r1, sp); 282 } 283 284 __ bind(done); 285 __ leave(); 286 } So, you want to save r1 and r0, but if either of those is the destination you don't want to save it. The code at ShenandoahBarrierSetAssembler::shenandoah_lrb() preserves everything but r1 and r0. I believe this is what you want: void ShenandoahBarrierSetAssembler::load_reference_barrier_not_null(MacroAssembler* masm, Register dst, Address load_addr) { assert(ShenandoahLoadRefBarrier, "Should be enabled"); assert(dst != rscratch2, "need rscratch2"); assert_different_registers(load_addr.base(), load_addr.index(), rscratch1, rscratch2); Label done; __ enter(); Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset())); __ ldrb(rscratch2, gc_state); // Check for heap stability __ tbz(rscratch2, ShenandoahHeap::HAS_FORWARDED_BITPOS, done); // use r1 for load address Register result_dst = dst; if (dst == r1) { __ mov(rscratch1, dst); dst = rscratch1; } RegSet to_save = RegSet::of(r0, r1) - result_dst; __ push(to_save, sp); __ lea(r1, load_addr); __ mov(r0, dst); __ far_call(RuntimeAddress(CAST_FROM_FN_PTR(address, ShenandoahBarrierSetAssembler::shenandoah_lrb()))); __ mov(result_dst, r0); __ pop(to_save, sp); __ bind(done); __ leave(); } Please forward any patches which contain AArch64 assembly code to the aarch64-port-dev at openjdk.java.net list. 
I don't mean any criticism of you personally, but the AArch64 code in the Shenandoah GC barriers is gnarly and some of the most difficult to read in the whole port, probably because its authors, while undoubtedly brilliant, were not experienced AArch64 programmers. Let me help. :-) -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Oct 31 16:50:04 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 31 Oct 2019 16:50:04 +0000 Subject: RFR 8232992: Shenandoah: Implement self-fixing interpreter LRB In-Reply-To: <1648aef7-6df9-6f54-6601-fde9d7251187@redhat.com> References: <1648aef7-6df9-6f54-6601-fde9d7251187@redhat.com> Message-ID: <8adb0165-383d-b18b-4a05-52828100d397@redhat.com> On 10/25/19 3:29 PM, Zhengyu Gu wrote: > Test: > hotspot_gc_shenandoah (fastdebug and release) > x86_64 and x86_32 on Linux > aarch64 Linux > Windows x86_64 I didn't see this because I don't read all the Shenandoah and GC messages. The AArch64 code is unidiomatic and cumbersome in places, not to mention extremely confusing, and I can help with that. 236 void ShenandoahBarrierSetAssembler::load_reference_barrier_not_null(MacroAssembler* masm, Register dst, Address load_addr) { 237 assert(ShenandoahLoadRefBarrier, "Should be enabled"); 238 assert(dst != rscratch2, "need rscratch2"); 239 assert_different_registers(load_addr.base(), load_addr.index(), rscratch1); 240 assert_different_registers(load_addr.base(), load_addr.index(), rscratch2); 241 242 Label done; 243 __ enter(); 244 Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset())); 245 __ ldrb(rscratch2, gc_state); 246 247 // Check for heap stability 248 __ tbz(rscratch2, ShenandoahHeap::HAS_FORWARDED_BITPOS, done); 249 250 // use r1 for load address 251 Register result_dst = dst; 252 if (dst == r1) { 253 __ mov(rscratch1, dst); This is pointless. 
On AArch64 mov(Rn, Rm) generates no code if Rn == Rm.

 254     dst = rscratch1;
 255   }
 256
 257   RegSet to_save_r1 = RegSet::of(r1);
 258   // If outgoing register is r1, we can clobber it
 259   if (result_dst != r1) {
 260     __ push(to_save_r1, sp);
 261   }

On AArch64 registers are always saved in pairs, so it makes no sense to push individual registers. You might as well push both if either is to be saved.

 262   __ lea(r1, load_addr);
 263
 264   RegSet to_save_r0 = RegSet::of(r0);
 265   if (dst != r0) {
 266     __ push(to_save_r0, sp);
 267     __ mov(r0, dst);
 268   }
 269
 270   __ far_call(RuntimeAddress(CAST_FROM_FN_PTR(address, ShenandoahBarrierSetAssembler::shenandoah_lrb())));
 271
 272   if (result_dst != r0) {
 273     __ mov(result_dst, r0);
 274   }
 275
 276   if (dst != r0) {
 277     __ pop(to_save_r0, sp);
 278   }
 279
 280   if (result_dst != r1) {
 281     __ pop(to_save_r1, sp);
 282   }
 283
 284   __ bind(done);
 285   __ leave();
 286 }

So, you want to save r1 and r0, but if either of those is the destination you don't want to save it. The code at ShenandoahBarrierSetAssembler::shenandoah_lrb() preserves everything but r1 and r0.
I believe this is what you want: void ShenandoahBarrierSetAssembler::load_reference_barrier_not_null(MacroAssembler* masm, Register dst, Address load_addr) { assert(ShenandoahLoadRefBarrier, "Should be enabled"); assert(dst != rscratch2, "need rscratch2"); assert_different_registers(load_addr.base(), load_addr.index(), rscratch1, rscratch2); Label done; __ enter(); Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset())); __ ldrb(rscratch2, gc_state); // Check for heap stability __ tbz(rscratch2, ShenandoahHeap::HAS_FORWARDED_BITPOS, done); // use r1 for load address Register result_dst = dst; if (dst == r1) { __ mov(rscratch1, dst); dst = rscratch1; } RegSet to_save = RegSet::of(r0, r1) - result_dst; __ push(to_save, sp); __ lea(r1, load_addr); __ mov(r0, dst); __ far_call(RuntimeAddress(CAST_FROM_FN_PTR(address, ShenandoahBarrierSetAssembler::shenandoah_lrb()))); __ mov(result_dst, r0); __ pop(to_save, sp); __ bind(done); __ leave(); } Please forward any patches which contain AArch64 assembly code to the aarch64-port-dev at openjdk.java.net list. I don't mean any criticism of you personally, but the AArch64 code in the Shenandoah GC barriers is gnarly and some of the most difficult to read in the whole port, probably because its authors, while undoubtedly brilliant, were not experienced AArch64 programmers. Let me help. :-) -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From zgu at redhat.com Thu Oct 31 18:09:09 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 31 Oct 2019 14:09:09 -0400 Subject: RFR 8232992: Shenandoah: Implement self-fixing interpreter LRB In-Reply-To: <8adb0165-383d-b18b-4a05-52828100d397@redhat.com> References: <1648aef7-6df9-6f54-6601-fde9d7251187@redhat.com> <8adb0165-383d-b18b-4a05-52828100d397@redhat.com> Message-ID: <540275f5-595a-faa0-2304-a95e657f92a0@redhat.com> Hi Andrew, Thanks for the suggestions. 
Filed JDK-8233337 to clean this up.

-Zhengyu

On 10/31/19 12:50 PM, Andrew Haley wrote:
> On 10/25/19 3:29 PM, Zhengyu Gu wrote:
>> Test:
>> hotspot_gc_shenandoah (fastdebug and release)
>> x86_64 and x86_32 on Linux
>> aarch64 Linux
>> Windows x86_64
>
> I didn't see this because I don't read all the Shenandoah and GC
> messages.
>
> The AArch64 code is unidiomatic and cumbersome in places, not to
> mention extremely confusing, and I can help with that.
>
> 236 void ShenandoahBarrierSetAssembler::load_reference_barrier_not_null(MacroAssembler* masm, Register dst, Address load_addr) {
> 237   assert(ShenandoahLoadRefBarrier, "Should be enabled");
> 238   assert(dst != rscratch2, "need rscratch2");
> 239   assert_different_registers(load_addr.base(), load_addr.index(), rscratch1);
> 240   assert_different_registers(load_addr.base(), load_addr.index(), rscratch2);
> 241
> 242   Label done;
> 243   __ enter();
> 244   Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset()));
> 245   __ ldrb(rscratch2, gc_state);
> 246
> 247   // Check for heap stability
> 248   __ tbz(rscratch2, ShenandoahHeap::HAS_FORWARDED_BITPOS, done);
> 249
> 250   // use r1 for load address
> 251   Register result_dst = dst;
> 252   if (dst == r1) {
> 253     __ mov(rscratch1, dst);
>
> This is pointless. On AArch64 mov(Rn, Rm) generates no code if Rn == Rm.
>
> 254     dst = rscratch1;
> 255   }
> 256
> 257   RegSet to_save_r1 = RegSet::of(r1);
> 258   // If outgoing register is r1, we can clobber it
> 259   if (result_dst != r1) {
> 260     __ push(to_save_r1, sp);
> 261   }
>
> On AArch64 registers are always saved in pairs, so it makes no sense to push
> individual registers. You might as well push both if either is to be saved.
>
> 262   __ lea(r1, load_addr);
> 263
> 264   RegSet to_save_r0 = RegSet::of(r0);
> 265   if (dst != r0) {
> 266     __ push(to_save_r0, sp);
> 267     __ mov(r0, dst);
> 268   }
> 269
> 270   __ far_call(RuntimeAddress(CAST_FROM_FN_PTR(address, ShenandoahBarrierSetAssembler::shenandoah_lrb())));
> 271
> 272   if (result_dst != r0) {
> 273     __ mov(result_dst, r0);
> 274   }
> 275
> 276   if (dst != r0) {
> 277     __ pop(to_save_r0, sp);
> 278   }
> 279
> 280   if (result_dst != r1) {
> 281     __ pop(to_save_r1, sp);
> 282   }
> 283
> 284   __ bind(done);
> 285   __ leave();
> 286 }
>
> So, you want to save r1 and r0, but if either of those is the destination you
> don't want to save it. The code at ShenandoahBarrierSetAssembler::shenandoah_lrb()
> preserves everything but r1 and r0.
>
> I believe this is what you want:
>
> void ShenandoahBarrierSetAssembler::load_reference_barrier_not_null(MacroAssembler* masm, Register dst, Address load_addr) {
>   assert(ShenandoahLoadRefBarrier, "Should be enabled");
>   assert(dst != rscratch2, "need rscratch2");
>   assert_different_registers(load_addr.base(), load_addr.index(), rscratch1, rscratch2);
>
>   Label done;
>   __ enter();
>   Address gc_state(rthread, in_bytes(ShenandoahThreadLocalData::gc_state_offset()));
>   __ ldrb(rscratch2, gc_state);
>
>   // Check for heap stability
>   __ tbz(rscratch2, ShenandoahHeap::HAS_FORWARDED_BITPOS, done);
>
>   // use r1 for load address
>   Register result_dst = dst;
>   if (dst == r1) {
>     __ mov(rscratch1, dst);
>     dst = rscratch1;
>   }
>
>   RegSet to_save = RegSet::of(r0, r1) - result_dst;
>   __ push(to_save, sp);
>   __ lea(r1, load_addr);
>   __ mov(r0, dst);
>
>   __ far_call(RuntimeAddress(CAST_FROM_FN_PTR(address, ShenandoahBarrierSetAssembler::shenandoah_lrb())));
>
>   __ mov(result_dst, r0);
>   __ pop(to_save, sp);
>
>   __ bind(done);
>   __ leave();
> }
>
>
> Please forward any patches which contain AArch64 assembly code to the
> aarch64-port-dev at openjdk.java.net list.
>
> I don't mean any criticism of you personally, but the AArch64 code in
> the Shenandoah GC barriers is gnarly and some of the most difficult to
> read in the whole port, probably because its authors, while
> undoubtedly brilliant, were not experienced AArch64 programmers. Let
> me help. :-)
>

From zgu at redhat.com Thu Oct 31 18:48:04 2019
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 31 Oct 2019 14:48:04 -0400
Subject: RFR 8233339: Shenandoah: Centralize load barrier decisions into ShenandoahBarrierSet
Message-ID: <6ef89df6-84db-0ffe-d1fc-7ffde7e622bf@redhat.com>

Right now, the decisions on whether a load barrier needs a load reference barrier (and if so, what kind) and whether the reference needs to be kept alive are scattered across the interpreter/C1/C2 load barrier code, which makes it hard to keep them consistent.

I would like to centralize the decision making into ShenandoahBarrierSet, so that the decisions are consistent and easy to maintain.

Bug: https://bugs.openjdk.java.net/browse/JDK-8233339
Webrev: http://cr.openjdk.java.net/~zgu/JDK-8233339/webrev.00/index.html

Test:
hotspot_gc_shenandoah (fastdebug and release)
x86_64 and x86_32 on Linux
AArch64 on Linux

Thanks,

-Zhengyu

From kim.barrett at oracle.com Thu Oct 31 20:53:01 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 31 Oct 2019 16:53:01 -0400
Subject: RFR: 8232588: G1 concurrent System.gc can return early or late
Message-ID: <8518409B-159B-48B2-97E4-D5D4C3B2BC5A@oracle.com>

RFR: 8232588: G1 concurrent System.gc can return early or late
RFR: 8233279: G1: GCLocker GC with +GCLockerInvokesConcurrent spins while cycle in progress

Please review this refactoring and fixing of the state machine used by G1CollectedHeap::collect for handling requests for concurrent collections.

The handling of concurrent collection requests is now split out into a helper function for that purpose.
All of the state machine logic for checking for completion, waiting for completions, and performing retries is now in that new helper function, rather than being distributed between try_collect() and various parts of the VMOp. Added a new VMOp, VM_G1TryInitiateConcMark. This simplified both the handling of this case and VM_G1CollectForAllocation. The new VMOp provides some additional information for use by the state machine. For user-requested concurrent GC requests, the previously intended behavior was to wait for an in-progress concurrent marking cycle (if any), then start a new concurrent marking cycle and wait for it to complete. However, there were various race conditions that might result in returning either sooner or later than intended. This change addresses those races, so that we get consistent behavior for such requests. (WhiteBox.g1StartConcMarkCycle is the function that uses _wb_conc_mark. With that name, it's not obvious that the full waiting behavior is intended, but that's what it used to do, so not changing it. Some tests follow it with a sleep-wait for !WB.g1InConcurrentMark(), while others seem to expect it to perform a complete collection.) A change is that waiting by a user-requested GC for a concurrent marking cycle to complete used to be performed with the thread transitioned to native and without safepoint checks on the associated monitor lock and wait. This was noted as having been cribbed from CMS. Coleen and I looked at this and could not come up with a reason for doing that for G1 (anymore, after the recent spate of locking improvements), so there's a new G1-specific monitor being used and the locking and waiting is now "normal". (This makes the FullGCCount_lock monitor largely CMS-specific.) For other concurrent GC requests, the only intentional change is for _gc_locker with GCLockerInvokesConcurrent. 
Previously it would spin in try_collect while there was a concurrent marking cycle in progress, also blocking any callers of GCLocker::stall_until_clear() (JDK-8233279). Now it returns in that situation, though it's not clear that's a great idea either. Indeed, even when that option was introduced (for CMS, as part of fixing a bad interaction between GCLocker GCs and +ExplicitGCInvokesConcurrent) it was not clear it was a good idea (see JDK-6919638). Fortunately it's off by default. JDK-8233280 has been filed to remove this option. CR: https://bugs.openjdk.java.net/browse/JDK-8233279 https://bugs.openjdk.java.net/browse/JDK-8232588 Webrev: https://cr.openjdk.java.net/~kbarrett/8232588/open.00/ Testing: mach5 tier1-6 Local (linux-x64) testing with a program that allocates some live data in the old gen, then has several threads all repeatedly looping on System.gc(). Looked at output from new logging in try_collect_concurrently and verified the interleavings of GC start/end and new log messages were as expected. From kim.barrett at oracle.com Thu Oct 31 21:08:51 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 31 Oct 2019 17:08:51 -0400 Subject: RFR (S): 8233301: Implementation of JEP 366: Deprecate the ParallelScavenge + SerialOld GC Combination In-Reply-To: <727338ce-845c-067f-e56e-f066d2a602dd@oracle.com> References: <727338ce-845c-067f-e56e-f066d2a602dd@oracle.com> Message-ID: <898970DE-A743-491B-9689-C1E3C2848755@oracle.com> > On Oct 31, 2019, at 9:07 AM, Thomas Schatzl wrote: > > Hi all, > > can I have reviews for this small change that implements deprecation as outlined in JEP 366: Deprecate the ParallelScavenge + SerialOld GC Combination? > > CR: > https://bugs.openjdk.java.net/browse/JDK-8233301 > Webrev: > http://cr.openjdk.java.net/~tschatzl/8233301/webrev/ > Testing: > hs-tier1-5 > > Thanks, > Thomas Looks good. 
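As a reading aid for the 8232588 RFR above: the intended behavior for a user-requested concurrent GC (wait for any in-progress marking cycle, start a new one, then wait for that one to complete) can be modelled as a small decision function. This is only a sketch under stated assumptions. The names CycleCounters, Step and next_step are hypothetical and do not come from the webrev or from G1, and the model assumes at most one marking cycle runs at a time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of marking-cycle counters (not a HotSpot type).
struct CycleCounters {
  std::uint32_t started;    // number of cycles started so far
  std::uint32_t completed;  // number of cycles completed so far
  bool in_progress() const { return started != completed; }
};

// Hypothetical actions for the requesting thread.
enum class Step { WaitForCurrentCycle, StartNewCycle, WaitForOwnCycle, Done };

// Decide what a user-requested concurrent GC should do next.
// own_cycle is 0 until the request has started its own cycle; afterwards
// it holds the 1-based sequence number of that cycle.
Step next_step(const CycleCounters& now, std::uint32_t own_cycle) {
  assert(now.completed <= now.started);
  if (own_cycle == 0) {
    // A cycle that was already running does not count for this request,
    // so completing it must not end the request ("returning early").
    return now.in_progress() ? Step::WaitForCurrentCycle : Step::StartNewCycle;
  }
  // Finish as soon as the request's own cycle is done, without waiting
  // for any later cycle ("returning late").
  return (now.completed >= own_cycle) ? Step::Done : Step::WaitForOwnCycle;
}
```

A caller would loop on next_step, re-reading the counters under whatever lock protects them, until it sees Step::Done.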
From kim.barrett at oracle.com Thu Oct 31 22:12:20 2019
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 31 Oct 2019 18:12:20 -0400
Subject: RFR (M): 8189737: Make HeapRegion not derive from Space
In-Reply-To: <4fdeb066-e9eb-c0e6-5fd2-5ec9a368bc23@oracle.com>
References: <4fdeb066-e9eb-c0e6-5fd2-5ec9a368bc23@oracle.com>
Message-ID:

> On Oct 31, 2019, at 9:43 AM, Thomas Schatzl wrote:
>
> Hi all,
>
> can I get reviews for this refactoring that removes the inheritance of HeapRegion from Space?
>
> Since JDK10 we did not use much of the shared code in G1, so apart from inheriting a few trivial members (bottom, top, compaction_top) there is not much gain in inheriting from (Contiguous-)Space, except adding quite a few unused members and lots of legacy code.
>
> In JDK10 we already considered removing this inheritance, but never got around until now :)
>
> There will be a follow-up JDK-8233306 that cleans up the code a bit (sorting members and methods), but to keep this a bit more easily reviewable, the change is as it is.
>
> The change is smaller than webrev indicates, for some reason the single-line include change in test_g1HeapVerifier.cpp caused it to be included as a "new" file. There is also a lot of one-line #include-wrangling.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8189737
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8189737/webrev/
> Testing:
> hs-tier-1-5
>
> Thanks,
> Thomas

It's a little unfortunate that you needed to touch the #includes in a
couple of cms files. Looks like it should be an easy merge for
whichever of you or Leo goes second though.

------------------------------------------------------------------------------
 102 inline HeapWord* HeapRegion::par_allocate(size_t min_word_size,
 103 size_t desired_word_size,
 104 size_t* actual_size) {

Parameter list indentation needs fixing.

------------------------------------------------------------------------------
src/hotspot/share/gc/g1/heapRegion.hpp

There's a big comment block about BlockOffsetTable divergence, time
stamps, etc. in front of G1ContiguousSpace that seems to have simply
disappeared. I take it this was leftover commentary that should have
been removed with JDK-8199326 and maybe others?

------------------------------------------------------------------------------

Looks good.

I don't need a new webrev for the parameter list indentation fix.

From ecki at zusammenkunft.net Thu Oct 31 23:05:18 2019
From: ecki at zusammenkunft.net (Bernd Eckenfels)
Date: Thu, 31 Oct 2019 23:05:18 +0000
Subject: RFR (S): 8233301: Implementation of JEP 366: Deprecate the ParallelScavenge + SerialOld GC Combination
In-Reply-To: <727338ce-845c-067f-e56e-f066d2a602dd@oracle.com>
References: <727338ce-845c-067f-e56e-f066d2a602dd@oracle.com>
Message-ID:

The help message:

Use the Parallel Old garbage collector. Deprecated.

Looks a bit misleading to me. I know it means the option is deprecated (especially the non-default negative value), but it could easily be understood as ParallelOld being deprecated.

There is no jtreg for +UseParallelOld. Would it need to document that the deprecation warning is expected for that as well?

Regards
Bernd
--
http://bernd.eckenfels.net
________________________________
From: hotspot-gc-dev on behalf of Thomas Schatzl
Sent: Thursday, October 31, 2019 2:07 PM
To: hotspot-gc-dev at openjdk.java.net
Subject: RFR (S): 8233301: Implementation of JEP 366: Deprecate the ParallelScavenge + SerialOld GC Combination

Hi all,

can I have reviews for this small change that implements deprecation as outlined in JEP 366: Deprecate the ParallelScavenge + SerialOld GC Combination?

CR:
https://bugs.openjdk.java.net/browse/JDK-8233301
Webrev:
http://cr.openjdk.java.net/~tschatzl/8233301/webrev/
Testing:
hs-tier1-5

Thanks,
Thomas

From thomas.schatzl at oracle.com Thu Oct 31 23:20:51 2019
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 01 Nov 2019 00:20:51 +0100
Subject: RFR (M): 8189737: Make HeapRegion not derive from Space
In-Reply-To:
References: <4fdeb066-e9eb-c0e6-5fd2-5ec9a368bc23@oracle.com>
Message-ID:

Hi Kim,

thanks for your review.

On Thu, 2019-10-31 at 18:12 -0400, Kim Barrett wrote:
> > On Oct 31, 2019, at 9:43 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> >
> > Hi all,
> >
> > can I get reviews for this refactoring that removes the
> > inheritance of HeapRegion from Space?
> >
> > [...]
> > CR:
> > https://bugs.openjdk.java.net/browse/JDK-8189737
> > Webrev:
> > http://cr.openjdk.java.net/~tschatzl/8189737/webrev/
> > Testing:
> > hs-tier-1-5
> >
> > Thanks,
> > Thomas
>
> It's a little unfortunate that you needed to touch the #includes in a
> couple of cms files. Looks like it should be an easy merge for
> whichever of you or Leo goes second though.
>

Yeah, np for either of us I guess.

> ------------------------------------------------------------------------------
> 102 inline HeapWord* HeapRegion::par_allocate(size_t min_word_size,
> 103 size_t desired_word_size,
> 104 size_t* actual_size) {
>
> Parameter list indentation needs fixing.

Will do.

>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/heapRegion.hpp
>
> There's a big comment block about BlockOffsetTable divergence, time
> stamps, &etc in front of G1ContiguousSpace that seems to have simply
> disappeared. I take it this was leftover commentary that should have
> been removed with JDK-8199326 and maybe others?

The first one about the divergence is obsolete because with that change we officially and intentionally abandon any way to converge.
The other about the time stamps should have, as you correctly noticed, been removed with JDK-8199326. > > ------------------------------------------------------------------- > ----------- > > Looks good. > > I don't need a new webrev for the parameter list indentation fix. > I will update the webrev later in place. Thanks, Thomas
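
For readers unfamiliar with the par_allocate signature quoted in this thread: a function with (min_word_size, desired_word_size, actual_size) parameters is commonly a lock-free bump-pointer allocator that hands out as much as it can between the minimum and the desired size. The following is a minimal sketch of that general technique, not the HotSpot code. RegionSketch and Word are hypothetical stand-ins, and std::atomic replaces HotSpot's own atomics. It also shows the parameter list aligned the way the review asks for.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Word = std::uintptr_t;  // hypothetical stand-in for HeapWord

// Hypothetical region covering [bottom, end); only the allocation pointer is modelled.
class RegionSketch {
  Word* const        _end;
  std::atomic<Word*> _top;

public:
  RegionSketch(Word* bottom, Word* end) : _end(end), _top(bottom) {
    assert(bottom <= end);
  }

  Word* top() const { return _top.load(); }

  // Try to allocate between min_word_size and desired_word_size words.
  // On success returns the start of the block and stores its size in
  // *actual_size; returns nullptr if fewer than min_word_size words remain.
  Word* par_allocate(std::size_t  min_word_size,
                     std::size_t  desired_word_size,
                     std::size_t* actual_size) {
    assert(min_word_size <= desired_word_size);
    Word* old_top = _top.load();
    for (;;) {
      std::size_t available = static_cast<std::size_t>(_end - old_top);
      std::size_t want = std::min(available, desired_word_size);
      if (want < min_word_size) {
        return nullptr;
      }
      // On failure compare_exchange_weak reloads old_top, so the loop
      // recomputes the request against the new top and retries.
      if (_top.compare_exchange_weak(old_top, old_top + want)) {
        *actual_size = want;
        return old_top;
      }
    }
  }
};
```

Returning the actual size lets a caller such as a thread-local buffer refill accept a smaller block instead of failing outright when the region is nearly full.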