RFR(S) 8204557: NMT: Linux os::committed_in_range() does not break out outer loop when contiguous region is found

Zhengyu Gu zgu at redhat.com
Thu Jun 7 20:07:22 UTC 2018


Thanks, Thomas.

-Zhengyu

On 06/07/2018 04:05 PM, Thomas Stüfe wrote:
> On Thu, Jun 7, 2018 at 9:53 PM, Zhengyu Gu <zgu at redhat.com> wrote:
>> Hi Thomas,
>>
>> Thanks for the review.
>>
>> On 06/07/2018 02:51 PM, Thomas Stüfe wrote:
>>>
>>> Hi Zhengyu,
>>>
>>> thank you for addressing this so quick!
>>>
>>> The fix for (2) is nice and minimal.
>>>
>>> ---
>>>
>>> But about the second issue:
>>>
>>> 3131     // During shutdown, some memory goes away without properly
>>> notifying NMT,
>>> 3132     // E.g. ConcurrentGCThread/WatcherThread can exit without
>>> deleting thread object.
>>> 3133     // Bailout and return as not committed for now.
>>> 3134     if (mincore_return_value == -1 && errno == ENOMEM) {
>>> 3135       return false;
>>> 3136     }
>>>
>>> I still do not understand that part:
>>>
>>> if testing one stripe of pages (1024 pages amount to 4MB) turns out to
>>> be completely uncommitted, mincore will return ENOMEM.
>>> See man mincore(2): "ENOMEM addr to addr + length contained unmapped
>>> memory."
>>> But why return false in that case, why not simply continue with the next
>>> stripe?
>>>
>>
>> I believe *unmapped* here, means the address space is invalid, not just
>> without backing pages.
>>
>> The memory tracked by NMT, should always be mapped/valid. If it encounters
>> an unmapped/invalid address, that means it lost track somewhere (e.g.
>> mentioned in comments)
>>
> 
> Oh, you are right! Sorry for the noise, I misunderstood the man page completely.
> 
>>> ---
>>>
>>> May I suggest a further improvement, since you are touching the method
>>> anyway?
>>>
>>> I am always a bit queasy about mincore and its implicit output array
>>> size. Very unlikely, but if our notion of page size should differ by
>>> some weird coincidence from the notion mincore has (which is
>>> sysconf(_SC_PAGESIZE)), we may get stack overwrites. I once had such a
>>> case on AIX.
>>>
>>> So why not put a canary at the end:
>>>
>>> unsigned char vec[stripe + 1];
>>> vec[stripe] = 'X';
>>>
>>> ..
>>> mincore
>>> ..
>>> assert(vec[stripe] == 'X', "canary died");
>>
>>
>> Hmm, likely it will crash:-( I will add the assertion.
>>
>> Updated webrev: http://cr.openjdk.java.net/~zgu/8204557/webrev.01/
>>
>> Thanks,
>>
>> -Zhengyu
>>
> 
> This looks fine now. Thank you for fixing.
> 
> ..Thomas
> 
>>
>>
>>>
>>> ----
>>>
>>> Best Regards, Thomas
>>>
>>>
>>>
>>> On Thu, Jun 7, 2018 at 7:37 PM, Zhengyu Gu <zgu at redhat.com> wrote:
>>>>
>>>> Thomas found this bug while in discussion of JDK-8202772 [1]
>>>>
>>>> When contiguous committed region is found, it only breaks out inner loop.
>>>> Instead of returning the found range, it continues scanning the rest to
>>>> the
>>>> end.
>>>>
>>>> As the result, it squeezes all committed regions into one, even there are
>>>> holes.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8204557
>>>> Webrev: http://cr.openjdk.java.net/~zgu/8204557/webrev.00/
>>>>
>>>> Test:
>>>>
>>>>     hotspot_nmt and gtest on Linux x64 (fastdebug and release)
>>>>     Submit test also came back clean.
>>>>
>>>> [1]
>>>>
>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2018-June/028419.html
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> -Zhengyu


More information about the hotspot-runtime-dev mailing list