RFR(XS) 8191369: NMT: Enhance thread stack tracking
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Mon Mar 5 22:20:21 UTC 2018
On 3/5/18 5:05 PM, Zhengyu Gu wrote:
> Hi Coleen,
>>>
>>> Which platforms they failed on?
>>
>> They failed on linux x64 72 processor
>> model name : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
>
> I can reproduce on 56 processor machine. I am looking into it.
Thanks, I filed https://bugs.openjdk.java.net/browse/JDK-8199067
Coleen
>
> Thanks,
>
> -Zhengyu
>
>>
>> thanks,
>> Coleen
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>>
>>>
>>>>
>>>> thanks,
>>>> Coleen
>>>>
>>>>
>>>> On 3/5/18 11:21 AM, Thomas Stüfe wrote:
>>>>> Hi Zhengyu,
>>>>>
>>>>> about the os_windows.cpp part:
>>>>>
>>>>> Are you sure this works as intended? I was not able to test it
>>>>> yet. It may work but makes some assumptions: that the stack top
>>>>> you are calculating is actually the end of a heap region; that you
>>>>> can just pass any address as the first parameter to VirtualQuery
>>>>> (the documentation is a bit ambiguous here); even that all
>>>>> committed pages must be consecutive, which does not have to be the
>>>>> case (we had cases on HyperV where there were holes in the
>>>>> committed thread stack area).
>>>>>
>>>>> I probably would have reused the coding in
>>>>> os::current_stack_base(), because we know that works:
>>>>>
>>>>> - call VirtualQuery with an address on the stack to get
>>>>> MEMORY_BASIC_INFORMATION::AllocationBase. We could just use stack
>>>>> bottom here as input.
>>>>> - then walk all regions upward and sum up the sizes of the
>>>>> committed ones, until we hit stack top - which supposedly should
>>>>> coincide with the end of a heap region but who knows.
>>>>>
>>>>> Best Regards, Thomas
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Feb 23, 2018 at 5:52 PM, <coleen.phillimore at oracle.com
>>>>> <mailto:coleen.phillimore at oracle.com>> wrote:
>>>>>
>>>>>
>>>>> This looks good but I hope someone else can review the os_linux
>>>>> and os_windows parts.
>>>>>
>>>>> I will sponsor this once the reviews are complete.
>>>>>
>>>>> thanks,
>>>>> Coleen
>>>>>
>>>>>
>>>>> On 2/21/18 10:49 AM, Zhengyu Gu wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I rebased the patch after JDK-8196405[1]. Stefan's patch
>>>>> fixed
>>>>> virtual memory tracking issues I tried to address in early
>>>>> patch.
>>>>>
>>>>>
>>>>> In additions, I removed os::pd_committed_stack_size() methods
>>>>> as Coleen suggested. Also added new gtest.
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8191369
>>>>> <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>>>> Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.03/
>>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.03/>
>>>>>
>>>>> Test:
>>>>> hotspot_runtime, test-hotspot-gtest on Linux 64 with
>>>>> fastdebug and release builds.
>>>>>
>>>>>
>>>>> [1] http://cr.openjdk.java.net/~stefank/8196405/webrev.02/
>>>>> <http://cr.openjdk.java.net/%7Estefank/8196405/webrev.02/>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Zhengyu
>>>>>
>>>>>
>>>>> On 02/12/2018 09:33 AM, Zhengyu Gu wrote:
>>>>>
>>>>>
>>>>>
>>>>> No, what you have is better. I dislike the
>>>>> delegation to the duplicate function signature in the
>>>>> pd_* layer. Can you make the
>>>>> os::committed_stack_size() functions in the platforms
>>>>> where this isn't supported just be empy functions?
>>>>>
>>>>> Okay.
>>>>>
>>>>>
>>>>> If there are platforms that don't have the ability to
>>>>> track committed regions of the mtThread stack, don't
>>>>> you still need the concept of all_committed?
>>>>>
>>>>> No, it exists for convenience. However, stack tracking
>>>>> code does need adjustment.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Zhengyu
>>>>>
>>>>>
>>>>>
>>>>> thanks,
>>>>> Coleen
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Zhengyu
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>> On 1/11/18 4:05 PM, Zhengyu Gu wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Can I get second (R)eviewer for this? and
>>>>> I also need a sponsor.
>>>>>
>>>>> This is JDK11 bug.
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>>>> Bug: Bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8191369
>>>>> <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Zhengyu
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 11/20/2017 01:46 PM, Zhengyu Gu wrote:
>>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> Thanks for the review.
>>>>>
>>>>> I'm not sure about the logic
>>>>> of the test when you hit
>>>>> the the
>>>>> ENOMEM/uncommited case.
>>>>>
>>>>> + mincore_return_value =
>>>>> mincore(test_addr,
>>>>> page_sz, vec);
>>>>> +
>>>>> + if (mincore_return_value
>>>>> == -1 || (committed_only &&
>>>>> (vec[0] &
>>>>> 0x01) == 0)) {
>>>>> + // Page is not
>>>>> mapped/committed go up
>>>>> + // to find first
>>>>> mapped/committed page
>>>>> + if (errno != EAGAIN ||
>>>>> (committed_only && (vec[0] &
>>>>> 0x01) == 0)) {
>>>>> +
>>>>> assert(mincore_return_value !=
>>>>> -1 || errno == ENOMEM,
>>>>> "Unexpected mincore errno");
>>>>>
>>>>> Should that not be
>>>>>
>>>>> + if
>>>>> ((mincore_return_value != -1
>>>>> && errno != EAGAIN) ||
>>>>> (committed_only && (vec[0] &
>>>>> 0x01) == 0)) {
>>>>>
>>>>>
>>>>> Sorry, that correction should
>>>>> have
>>>>> been:
>>>>>
>>>>> + if
>>>>> ((mincore_return_value == -1 &&
>>>>> errno != EAGAIN) ||
>>>>> (committed_only && (vec[0] &
>>>>> 0x01) == 0)) {
>>>>>
>>>>>
>>>>> Thanks for pointing out. Updated
>>>>> accordingly:
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>>>>
>>>>> -Zhengyu
>>>>>
>>>>>
>>>>> A successful call to mincore
>>>>> is not guaranteed to reset
>>>>> errno and it
>>>>> might by chance already have
>>>>> value EAGAIN before the call
>>>>> to mincore. If
>>>>> we happened to hit a mapped,
>>>>> committed page first time
>>>>> then
>>>>> the vec bit
>>>>> will be 1 and the code will
>>>>> loop and retest the same
>>>>> address. Unlikely
>>>>> but possible?
>>>>>
>>>>>
>>>>> The adjustment to the edge
>>>>> case logic now looks correct.
>>>>>
>>>>> The test case for
>>>>> verifying fixes for above
>>>>> two issues can be found
>>>>> here:
>>>>> http://cr.openjdk.java.net/~zgu/8191369/test_mmap.c
>>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/test_mmap.c>
>>>>>
>>>>>
>>>>> Yes, that verifies the edge
>>>>> case handling e.g. I ran it
>>>>> with 256 pages
>>>>> and with 127/8/9 mapped
>>>>> and it
>>>>> gave the right results.
>>>>>
>>>>> As mentioned in early
>>>>> email, new patch also
>>>>> contains
>>>>> implementation for
>>>>> Windows.
>>>>>
>>>>> Updated Webrev:
>>>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.01/
>>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.01/>
>>>>>
>>>>> Test:
>>>>>
>>>>> Reran hotspot_tier1
>>>>> tests on Windows x64 and
>>>>> Linux x64 (fastdebug and
>>>>> release)
>>>>>
>>>>> I have no idea about the
>>>>> validity of the Windows fix
>>>>> but the Linux one
>>>>> looks good modulo that if
>>>>> condition.
>>>>>
>>>>> regards,
>>>>>
>>>>>
>>>>> Andrew Dinn
>>>>> -----------
>>>>> Senior Principal Software
>>>>> Engineer
>>>>> Red Hat UK Ltd
>>>>> Registered in England and
>>>>> Wales under Company
>>>>> Registration No. 03798903
>>>>> Directors: Michael
>>>>> Cunningham,
>>>>> Michael ("Mike") O'Neill,
>>>>> Eric
>>>>> Shander
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>
More information about the hotspot-runtime-dev
mailing list