RFR(XS) 8191369: NMT: Enhance thread stack tracking
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Mon Feb 12 14:24:37 UTC 2018
On 2/6/18 1:17 PM, Zhengyu Gu wrote:
> Hi Coleen,
>
> On 02/05/2018 01:03 PM, coleen.phillimore at oracle.com wrote:
>>
>> Hi Zhengyu,
>>
>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/src/hotspot/share/services/virtualMemoryTracker.cpp.udiff.html
>>
>>
>> if (rgn->overlap_region(addr, size)) {
>> - // Clear a space for this region in the case it overlaps with any
>> regions.
>> - remove_uncommitted_region(addr, size);
>> - break; // commit below
>> + if (addr < rgn->base()) {
>> + rgn->expand_region(addr, (rgn->base() - addr));
>> + }
>> +
>> + if (addr + size > rgn->base() + rgn->size()) {
>> + rgn->expand_region(rgn->base() + rgn->size(),
>> + (addr + size) - (rgn->base() + rgn->size()));
>> }
>> + return true;
>> + }
>> +
>>
>>
>> The code to add the overlapping region does not look correct. It
>> doesn't add subtract the overlapping part and add in the part that's
>> extended into the count of how big the section is. Stefan has a
>> pre-review for another fix that I'm looking at now to merge regions.
>> I think his design is simpler. It adds the section, by deleting any
>> overlapping memory, then has a loop to potentially merge sections.
>> The linked list of committed regions isn't generally very long.
>>
> Yes, I failed to adjust summary amount here, thanks for catching this!
>
>
>> Instead of adding a committed region for thread reserved region,
>> since you call
>>
>> + size_t committed_size = os::committed_stack_size(stack_bottom,
>> stack_size);
>>
>>
>> Each time, why not remove the committed region rather than trying to
>> merge any additional committed area to it, ie:
>>
>> remove_committed_region, call os::commited_stack_size(stack base,
>> stack_size), add the committed region returned for printing.
> No, I can not, cause I don't know previous stack size to uncommit it.
>
> Stefan's patch touched similar area, but not the same. I think we have
> some corner cases that are not handled properly. ex. once region is
> expanded, it can become adjacent region to next one, etc.
>
>>
>> You expect the committed region to be contiguous, don't you?
>
> Yes, for this case and most of cases at the time this code was
> written. But not anymore, Shenandoah can uncommit arbitrary regions
> and recommit them in arbitrary orders.
For the thread stack? But that's ok, it's better to handle all the
regions the same.
>
> I can hold off this patch till Stefan's patch get in, if it will
> happen soon.
Yes, please hold off until Stefan's patch is done. I see you've
continued that discussion.
>
>>
>> I can't comment on the os parts but they look legit. If the
>> os_linux. pd_committed_stack_size calls code that is available on the
>> other posix platforms, can you move it there? Can you call it
>> committed_stack_size() and not have the pd_* layer as well?
>
> Linux pd_committed_stack_size() uses mincore(), and is Linux only API.
> There are only Linux and Windows implementation with this patch.
>
> so, you want something like this?
>
> #if !defined(LINUX) && !defined(_WINDOWS)
> size_t os::committed_stack_size(address bottom, size_t size) {
> return size;
> }
> #endif
>
> and leave Linux/Windows' implementation under os directory?
No, what you have is better. I dislike the delegation to the duplicate
function signature in the pd_* layer. Can you make the
os::committed_stack_size() functions in the platforms where this isn't
supported just be empy functions?
If there are platforms that don't have the ability to track committed
regions of the mtThread stack, don't you still need the concept of
all_committed?
thanks,
Coleen
>
> Thanks,
>
> -Zhengyu
>
>
>
>
>>
>> Thanks,
>> Coleen
>>
>> On 1/11/18 4:05 PM, Zhengyu Gu wrote:
>>> Hi,
>>>
>>> Can I get second (R)eviewer for this? and I also need a sponsor.
>>>
>>> This is JDK11 bug.
>>>
>>> Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>> Bug: Bug: https://bugs.openjdk.java.net/browse/JDK-8191369
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>>
>>>
>>>
>>>
>>> On 11/20/2017 01:46 PM, Zhengyu Gu wrote:
>>>> Hi Andrew,
>>>>
>>>> Thanks for the review.
>>>>
>>>>>> I'm not sure about the logic of the test when you hit the the
>>>>>> ENOMEM/uncommited case.
>>>>>>
>>>>>> + mincore_return_value = mincore(test_addr, page_sz, vec);
>>>>>> +
>>>>>> + if (mincore_return_value == -1 || (committed_only && (vec[0] &
>>>>>> 0x01) == 0)) {
>>>>>> + // Page is not mapped/committed go up
>>>>>> + // to find first mapped/committed page
>>>>>> + if (errno != EAGAIN || (committed_only && (vec[0] & 0x01)
>>>>>> == 0)) {
>>>>>> + assert(mincore_return_value != -1 || errno == ENOMEM,
>>>>>> "Unexpected mincore errno");
>>>>>>
>>>>>> Should that not be
>>>>>>
>>>>>> + if ((mincore_return_value != -1 && errno != EAGAIN) ||
>>>>>> (committed_only && (vec[0] & 0x01) == 0)) {
>>>>>
>>>>> Sorry, that correction should have been:
>>>>>
>>>>> + if ((mincore_return_value == -1 && errno != EAGAIN) ||
>>>>> (committed_only && (vec[0] & 0x01) == 0)) {
>>>>
>>>> Thanks for pointing out. Updated accordingly:
>>>>
>>>> Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>>>
>>>> -Zhengyu
>>>>
>>>>>
>>>>>> A successful call to mincore is not guaranteed to reset errno and it
>>>>>> might by chance already have value EAGAIN before the call to
>>>>>> mincore. If
>>>>>> we happened to hit a mapped, committed page first time then the
>>>>>> vec bit
>>>>>> will be 1 and the code will loop and retest the same address.
>>>>>> Unlikely
>>>>>> but possible?
>>>>>>
>>>>>>
>>>>>> The adjustment to the edge case logic now looks correct.
>>>>>>
>>>>>>> The test case for verifying fixes for above two issues can be found
>>>>>>> here: http://cr.openjdk.java.net/~zgu/8191369/test_mmap.c
>>>>>>
>>>>>> Yes, that verifies the edge case handling e.g. I ran it with 256
>>>>>> pages
>>>>>> and with 127/8/9 mapped and it gave the right results.
>>>>>>
>>>>>>> As mentioned in early email, new patch also contains
>>>>>>> implementation for
>>>>>>> Windows.
>>>>>>>
>>>>>>> Updated Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.01/
>>>>>>>
>>>>>>> Test:
>>>>>>>
>>>>>>> Reran hotspot_tier1 tests on Windows x64 and Linux x64
>>>>>>> (fastdebug and
>>>>>>> release)
>>>>>> I have no idea about the validity of the Windows fix but the
>>>>>> Linux one
>>>>>> looks good modulo that if condition.
>>>>>>
>>>>>> regards,
>>>>>>
>>>>>>
>>>>>> Andrew Dinn
>>>>>> -----------
>>>>>> Senior Principal Software Engineer
>>>>>> Red Hat UK Ltd
>>>>>> Registered in England and Wales under Company Registration No.
>>>>>> 03798903
>>>>>> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric
>>>>>> Shander
>>>>>>
>>>>>
>>
More information about the hotspot-runtime-dev
mailing list