RFR(XS) 8191369: NMT: Enhance thread stack tracking
Zhengyu Gu
zgu at redhat.com
Mon Mar 5 22:05:03 UTC 2018
Hi Coleen,
>>
>> Which platforms they failed on?
>
> They failed on linux x64 72 processor
> model name : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
I can reproduce on 56 processor machine. I am looking into it.
Thanks,
-Zhengyu
>
> thanks,
> Coleen
>>
>> Thanks,
>>
>> -Zhengyu
>>
>>
>>>
>>> thanks,
>>> Coleen
>>>
>>>
>>> On 3/5/18 11:21 AM, Thomas Stüfe wrote:
>>>> Hi Zhengyu,
>>>>
>>>> about the os_windows.cpp part:
>>>>
>>>> Are you sure this works as intended? I was not able to test it yet.
>>>> It may work but makes some assumptions: that the stack top you are
>>>> calculating is actually the end of a heap region; that you can just
>>>> pass any address as the first parameter to VirtualQuery (the
>>>> documentation is a bit ambiguous here); even that all committed
>>>> pages must be consecutive, which does not have to be the case (we
>>>> had cases on HyperV where there were holes in the committed thread
>>>> stack area).
>>>>
>>>> I probably would have reused the coding in os::current_stack_base(),
>>>> because we know that works:
>>>>
>>>> - call VirtualQuery with an address on the stack to get
>>>> MEMORY_BASIC_INFORMATION::AllocationBase. We could just use stack
>>>> bottom here as input.
>>>> - then walk all regions upward and sum up the sizes of the committed
>>>> ones, until we hit stack top - which supposedly should coincide with
>>>> the end of a heap region but who knows.
>>>>
>>>> Best Regards, Thomas
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Feb 23, 2018 at 5:52 PM, <coleen.phillimore at oracle.com
>>>> <mailto:coleen.phillimore at oracle.com>> wrote:
>>>>
>>>>
>>>> This looks good but I hope someone else can review the os_linux
>>>> and os_windows parts.
>>>>
>>>> I will sponsor this once the reviews are complete.
>>>>
>>>> thanks,
>>>> Coleen
>>>>
>>>>
>>>> On 2/21/18 10:49 AM, Zhengyu Gu wrote:
>>>>
>>>> Hi,
>>>>
>>>> I rebased the patch after JDK-8196405[1]. Stefan's patch fixed
>>>> virtual memory tracking issues I tried to address in early
>>>> patch.
>>>>
>>>>
>>>> In additions, I removed os::pd_committed_stack_size() methods
>>>> as Coleen suggested. Also added new gtest.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8191369
>>>> <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>>> Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.03/
>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.03/>
>>>>
>>>> Test:
>>>> hotspot_runtime, test-hotspot-gtest on Linux 64 with
>>>> fastdebug and release builds.
>>>>
>>>>
>>>> [1] http://cr.openjdk.java.net/~stefank/8196405/webrev.02/
>>>> <http://cr.openjdk.java.net/%7Estefank/8196405/webrev.02/>
>>>>
>>>> Thanks,
>>>>
>>>> -Zhengyu
>>>>
>>>>
>>>> On 02/12/2018 09:33 AM, Zhengyu Gu wrote:
>>>>
>>>>
>>>>
>>>> No, what you have is better. I dislike the
>>>> delegation to the duplicate function signature in the
>>>> pd_* layer. Can you make the
>>>> os::committed_stack_size() functions in the platforms
>>>> where this isn't supported just be empy functions?
>>>>
>>>> Okay.
>>>>
>>>>
>>>> If there are platforms that don't have the ability to
>>>> track committed regions of the mtThread stack, don't
>>>> you still need the concept of all_committed?
>>>>
>>>> No, it exists for convenience. However, stack tracking
>>>> code does need adjustment.
>>>>
>>>> Thanks,
>>>>
>>>> -Zhengyu
>>>>
>>>>
>>>>
>>>> thanks,
>>>> Coleen
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> -Zhengyu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>> On 1/11/18 4:05 PM, Zhengyu Gu wrote:
>>>>
>>>> Hi,
>>>>
>>>> Can I get second (R)eviewer for this? and
>>>> I also need a sponsor.
>>>>
>>>> This is JDK11 bug.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>>> Bug: Bug:
>>>> https://bugs.openjdk.java.net/browse/JDK-8191369
>>>> <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>>>
>>>> Thanks,
>>>>
>>>> -Zhengyu
>>>>
>>>>
>>>>
>>>>
>>>> On 11/20/2017 01:46 PM, Zhengyu Gu wrote:
>>>>
>>>> Hi Andrew,
>>>>
>>>> Thanks for the review.
>>>>
>>>> I'm not sure about the logic
>>>> of the test when you hit the
>>>> the
>>>> ENOMEM/uncommited case.
>>>>
>>>> + mincore_return_value =
>>>> mincore(test_addr, page_sz,
>>>> vec);
>>>> +
>>>> + if (mincore_return_value
>>>> == -1 || (committed_only &&
>>>> (vec[0] &
>>>> 0x01) == 0)) {
>>>> + // Page is not
>>>> mapped/committed go up
>>>> + // to find first
>>>> mapped/committed page
>>>> + if (errno != EAGAIN ||
>>>> (committed_only && (vec[0] &
>>>> 0x01) == 0)) {
>>>> +
>>>> assert(mincore_return_value !=
>>>> -1 || errno == ENOMEM,
>>>> "Unexpected mincore errno");
>>>>
>>>> Should that not be
>>>>
>>>> + if
>>>> ((mincore_return_value != -1
>>>> && errno != EAGAIN) ||
>>>> (committed_only && (vec[0] &
>>>> 0x01) == 0)) {
>>>>
>>>>
>>>> Sorry, that correction should have
>>>> been:
>>>>
>>>> + if
>>>> ((mincore_return_value == -1 &&
>>>> errno != EAGAIN) ||
>>>> (committed_only && (vec[0] &
>>>> 0x01) == 0)) {
>>>>
>>>>
>>>> Thanks for pointing out. Updated
>>>> accordingly:
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>>>
>>>> -Zhengyu
>>>>
>>>>
>>>> A successful call to mincore
>>>> is not guaranteed to reset
>>>> errno and it
>>>> might by chance already have
>>>> value EAGAIN before the call
>>>> to mincore. If
>>>> we happened to hit a mapped,
>>>> committed page first time then
>>>> the vec bit
>>>> will be 1 and the code will
>>>> loop and retest the same
>>>> address. Unlikely
>>>> but possible?
>>>>
>>>>
>>>> The adjustment to the edge
>>>> case logic now looks correct.
>>>>
>>>> The test case for
>>>> verifying fixes for above
>>>> two issues can be found
>>>> here:
>>>> http://cr.openjdk.java.net/~zgu/8191369/test_mmap.c
>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/test_mmap.c>
>>>>
>>>>
>>>> Yes, that verifies the edge
>>>> case handling e.g. I ran it
>>>> with 256 pages
>>>> and with 127/8/9 mapped and it
>>>> gave the right results.
>>>>
>>>> As mentioned in early
>>>> email, new patch also
>>>> contains implementation for
>>>> Windows.
>>>>
>>>> Updated Webrev:
>>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.01/
>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.01/>
>>>>
>>>> Test:
>>>>
>>>> Reran hotspot_tier1
>>>> tests on Windows x64 and
>>>> Linux x64 (fastdebug and
>>>> release)
>>>>
>>>> I have no idea about the
>>>> validity of the Windows fix
>>>> but the Linux one
>>>> looks good modulo that if
>>>> condition.
>>>>
>>>> regards,
>>>>
>>>>
>>>> Andrew Dinn
>>>> -----------
>>>> Senior Principal Software
>>>> Engineer
>>>> Red Hat UK Ltd
>>>> Registered in England and
>>>> Wales under Company
>>>> Registration No. 03798903
>>>> Directors: Michael Cunningham,
>>>> Michael ("Mike") O'Neill, Eric
>>>> Shander
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>
More information about the hotspot-runtime-dev
mailing list