RFR(XS) 8191369: NMT: Enhance thread stack tracking

Zhengyu Gu zgu at redhat.com
Tue Feb 6 18:17:07 UTC 2018


Hi Coleen,

On 02/05/2018 01:03 PM, coleen.phillimore at oracle.com wrote:
> 
> Hi Zhengyu,
> 
> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/src/hotspot/share/services/virtualMemoryTracker.cpp.udiff.html 
> 
> 
>       if (rgn->overlap_region(addr, size)) {
> - // Clear a space for this region in the case it overlaps with any 
> regions.
> - remove_uncommitted_region(addr, size);
> - break; // commit below
> + if (addr < rgn->base()) {
> + rgn->expand_region(addr, (rgn->base() - addr));
> + }
> +
> + if (addr + size > rgn->base() + rgn->size()) {
> + rgn->expand_region(rgn->base() + rgn->size(),
> + (addr + size) - (rgn->base() + rgn->size()));
>       }
> + return true;
> + }
> +
> 
> 
> The code to add the overlapping region does not look correct.  It 
> doesn't add subtract the overlapping part and add in the part that's 
> extended into the count of how big the section is.   Stefan has a 
> pre-review for another fix that I'm looking at now to merge regions.  I 
> think his design is simpler.   It adds the section, by deleting any 
> overlapping memory, then has a loop to potentially merge sections.  The 
> linked list of committed regions isn't generally very long.
>
Yes, I failed to adjust summary amount here, thanks for catching this!


> Instead of adding a committed region for thread reserved region, since 
> you call
> 
> + size_t committed_size = os::committed_stack_size(stack_bottom, 
> stack_size);
> 
> 
> Each time, why not remove the committed region rather than trying to 
> merge any additional committed area to it, ie:
> 
> remove_committed_region, call os::commited_stack_size(stack base, 
> stack_size), add the committed region returned for printing.
No, I can not, cause I don't know previous stack size to uncommit it.

Stefan's patch touched similar area, but not the same. I think we have 
some corner cases that are not handled properly. ex. once region is 
expanded, it can become adjacent region to next one, etc.

> 
> You expect the committed region to be contiguous, don't you?

Yes, for this case and most of cases at the time this code was written. 
But not anymore, Shenandoah can uncommit arbitrary regions and recommit 
them in arbitrary orders.

I can hold off this patch till Stefan's patch get in, if it will happen 
soon.

> 
> I can't comment on the os parts but they look legit.  If the os_linux. 
> pd_committed_stack_size calls code that is available on the other posix 
> platforms, can you move it there?  Can you call it 
> committed_stack_size() and not have the pd_* layer as well?

Linux pd_committed_stack_size() uses mincore(), and is Linux only API. 
There are only Linux and Windows implementation with this patch.

so, you want something like this?

#if !defined(LINUX) && !defined(_WINDOWS)
size_t os::committed_stack_size(address bottom, size_t size) {
   return size;
}
#endif

and leave Linux/Windows' implementation under os directory?

Thanks,

-Zhengyu




> 
> Thanks,
> Coleen
> 
> On 1/11/18 4:05 PM, Zhengyu Gu wrote:
>> Hi,
>>
>> Can I get second (R)eviewer for this? and I also need a sponsor.
>>
>> This is JDK11 bug.
>>
>> Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>> Bug: Bug: https://bugs.openjdk.java.net/browse/JDK-8191369
>>
>> Thanks,
>>
>> -Zhengyu
>>
>>
>>
>>
>> On 11/20/2017 01:46 PM, Zhengyu Gu wrote:
>>> Hi Andrew,
>>>
>>> Thanks for the review.
>>>
>>>>> I'm not sure about the logic of the test when you hit the the
>>>>> ENOMEM/uncommited case.
>>>>>
>>>>> +    mincore_return_value = mincore(test_addr, page_sz, vec);
>>>>> +
>>>>> +    if (mincore_return_value == -1 || (committed_only && (vec[0] &
>>>>> 0x01) == 0)) {
>>>>> +      // Page is not mapped/committed go up
>>>>> +      // to find first mapped/committed page
>>>>> +      if (errno != EAGAIN || (committed_only && (vec[0] & 0x01) == 
>>>>> 0)) {
>>>>> +        assert(mincore_return_value != -1 || errno == ENOMEM,
>>>>> "Unexpected mincore errno");
>>>>>
>>>>> Should that not be
>>>>>
>>>>>    +      if ((mincore_return_value != -1 && errno != EAGAIN) ||
>>>>> (committed_only && (vec[0] & 0x01) == 0)) {
>>>>
>>>> Sorry, that correction should have been:
>>>>
>>>>    +      if ((mincore_return_value == -1 && errno != EAGAIN) ||
>>>>    (committed_only && (vec[0] & 0x01) == 0)) {
>>>
>>> Thanks for pointing out. Updated accordingly:
>>>
>>> Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>>
>>> -Zhengyu
>>>
>>>>
>>>>> A successful call to mincore is not guaranteed to reset errno and it
>>>>> might by chance already have value EAGAIN before the call to 
>>>>> mincore. If
>>>>> we happened to hit a mapped, committed page first time then the vec 
>>>>> bit
>>>>> will be 1 and the code will loop and retest the same address. Unlikely
>>>>> but possible?
>>>>>
>>>>>
>>>>> The adjustment to the edge case logic now looks correct.
>>>>>
>>>>>> The test case for verifying fixes for above two issues can be found
>>>>>> here: http://cr.openjdk.java.net/~zgu/8191369/test_mmap.c
>>>>>
>>>>> Yes, that verifies the edge case handling e.g. I ran it with 256 pages
>>>>> and with 127/8/9 mapped and it gave the right results.
>>>>>
>>>>>> As mentioned in early email, new patch also contains 
>>>>>> implementation for
>>>>>> Windows.
>>>>>>
>>>>>> Updated Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.01/
>>>>>>
>>>>>> Test:
>>>>>>
>>>>>>    Reran hotspot_tier1 tests on Windows x64 and Linux x64 
>>>>>> (fastdebug and
>>>>>> release)
>>>>> I have no idea about the validity of the Windows fix but the Linux one
>>>>> looks good modulo that if condition.
>>>>>
>>>>> regards,
>>>>>
>>>>>
>>>>> Andrew Dinn
>>>>> -----------
>>>>> Senior Principal Software Engineer
>>>>> Red Hat UK Ltd
>>>>> Registered in England and Wales under Company Registration No. 
>>>>> 03798903
>>>>> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
>>>>>
>>>>
> 


More information about the hotspot-runtime-dev mailing list