RFR(XS) 8191369: NMT: Enhance thread stack tracking

Zhengyu Gu zgu at redhat.com
Mon Mar 5 22:05:03 UTC 2018


Hi Coleen,
>>
>> Which platforms they failed on?
> 
> They failed on linux x64 72 processor
> model name    : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

I can reproduce on 56 processor machine. I am looking into it.

Thanks,

-Zhengyu

> 
> thanks,
> Coleen
>>
>> Thanks,
>>
>> -Zhengyu
>>
>>
>>>
>>> thanks,
>>> Coleen
>>>
>>>
>>> On 3/5/18 11:21 AM, Thomas Stüfe wrote:
>>>> Hi Zhengyu,
>>>>
>>>> about the os_windows.cpp part:
>>>>
>>>> Are you sure this works as intended? I was not able to test it yet. 
>>>> It may work but makes some assumptions: that the stack top you are 
>>>> calculating is actually the end of a heap region; that you can just 
>>>> pass any address as the first parameter to VirtualQuery (the 
>>>> documentation is a bit ambiguous here); even that all committed 
>>>> pages must be consecutive, which does not have to be the case (we 
>>>> had cases on HyperV where there were holes in the committed thread 
>>>> stack area).
>>>>
>>>> I probably would have reused the coding in os::current_stack_base(), 
>>>> because we know that works:
>>>>
>>>> - call VirtualQuery with an address on the stack to get 
>>>> MEMORY_BASIC_INFORMATION::AllocationBase. We could just use stack 
>>>> bottom here as input.
>>>> - then walk all regions upward and sum up the sizes of the committed 
>>>> ones, until we hit stack top - which supposedly should coincide with 
>>>> the end of a heap region but who knows.
>>>>
>>>> Best Regards, Thomas
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Feb 23, 2018 at 5:52 PM, <coleen.phillimore at oracle.com 
>>>> <mailto:coleen.phillimore at oracle.com>> wrote:
>>>>
>>>>
>>>>     This looks good but I hope someone else can review the os_linux
>>>>     and os_windows parts.
>>>>
>>>>     I will sponsor this once the reviews are complete.
>>>>
>>>>     thanks,
>>>>     Coleen
>>>>
>>>>
>>>>     On 2/21/18 10:49 AM, Zhengyu Gu wrote:
>>>>
>>>>         Hi,
>>>>
>>>>         I rebased the patch after JDK-8196405[1]. Stefan's patch fixed
>>>>         virtual memory tracking issues I tried to address in early 
>>>> patch.
>>>>
>>>>
>>>>         In additions, I removed os::pd_committed_stack_size() methods
>>>>         as Coleen suggested. Also added new gtest.
>>>>
>>>>         Bug: https://bugs.openjdk.java.net/browse/JDK-8191369
>>>> <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>>>         Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.03/
>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.03/>
>>>>
>>>>         Test:
>>>>           hotspot_runtime, test-hotspot-gtest on Linux 64 with
>>>>         fastdebug and release builds.
>>>>
>>>>
>>>>         [1] http://cr.openjdk.java.net/~stefank/8196405/webrev.02/
>>>> <http://cr.openjdk.java.net/%7Estefank/8196405/webrev.02/>
>>>>
>>>>         Thanks,
>>>>
>>>>         -Zhengyu
>>>>
>>>>
>>>>         On 02/12/2018 09:33 AM, Zhengyu Gu wrote:
>>>>
>>>>
>>>>
>>>>                 No, what you have is better.   I dislike the
>>>>                 delegation to the duplicate function signature in the
>>>>                 pd_* layer.   Can you make the
>>>>                 os::committed_stack_size() functions in the platforms
>>>>                 where this isn't supported just be empy functions?
>>>>
>>>>             Okay.
>>>>
>>>>
>>>>                 If there are platforms that don't have the ability to
>>>>                 track committed regions of the mtThread stack, don't
>>>>                 you still need the concept of all_committed?
>>>>
>>>>             No, it exists for convenience. However, stack tracking
>>>>             code does need adjustment.
>>>>
>>>>             Thanks,
>>>>
>>>>             -Zhengyu
>>>>
>>>>
>>>>
>>>>                 thanks,
>>>>                 Coleen
>>>>
>>>>
>>>>                     Thanks,
>>>>
>>>>                     -Zhengyu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>                         Thanks,
>>>>                         Coleen
>>>>
>>>>                         On 1/11/18 4:05 PM, Zhengyu Gu wrote:
>>>>
>>>>                             Hi,
>>>>
>>>>                             Can I get second (R)eviewer for this? and
>>>>                             I also need a sponsor.
>>>>
>>>>                             This is JDK11 bug.
>>>>
>>>>                             Webrev:
>>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>>>                             Bug: Bug:
>>>> https://bugs.openjdk.java.net/browse/JDK-8191369
>>>> <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>>>
>>>>                             Thanks,
>>>>
>>>>                             -Zhengyu
>>>>
>>>>
>>>>
>>>>
>>>>                             On 11/20/2017 01:46 PM, Zhengyu Gu wrote:
>>>>
>>>>                                 Hi Andrew,
>>>>
>>>>                                 Thanks for the review.
>>>>
>>>>                                         I'm not sure about the logic
>>>>                                         of the test when you hit the 
>>>> the
>>>>                                         ENOMEM/uncommited case.
>>>>
>>>>                                         + mincore_return_value =
>>>>                                         mincore(test_addr, page_sz, 
>>>> vec);
>>>>                                         +
>>>>                                         +    if (mincore_return_value
>>>>                                         == -1 || (committed_only &&
>>>>                                         (vec[0] &
>>>>                                         0x01) == 0)) {
>>>>                                         +      // Page is not
>>>>                                         mapped/committed go up
>>>>                                         +      // to find first
>>>>                                         mapped/committed page
>>>>                                         +      if (errno != EAGAIN ||
>>>>                                         (committed_only && (vec[0] &
>>>>                                         0x01) == 0)) {
>>>>                                         + 
>>>> assert(mincore_return_value !=
>>>>                                         -1 || errno == ENOMEM,
>>>>                                         "Unexpected mincore errno");
>>>>
>>>>                                         Should that not be
>>>>
>>>>                                            +      if
>>>>                                         ((mincore_return_value != -1
>>>>                                         && errno != EAGAIN) ||
>>>>                                         (committed_only && (vec[0] &
>>>>                                         0x01) == 0)) {
>>>>
>>>>
>>>>                                     Sorry, that correction should have
>>>>                                     been:
>>>>
>>>>                                        +      if
>>>>                                     ((mincore_return_value == -1 &&
>>>>                                     errno != EAGAIN) ||
>>>>                                        (committed_only && (vec[0] &
>>>>                                     0x01) == 0)) {
>>>>
>>>>
>>>>                                 Thanks for pointing out. Updated
>>>>                                 accordingly:
>>>>
>>>>                                 Webrev:
>>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>>>
>>>>                                 -Zhengyu
>>>>
>>>>
>>>>                                         A successful call to mincore
>>>>                                         is not guaranteed to reset
>>>>                                         errno and it
>>>>                                         might by chance already have
>>>>                                         value EAGAIN before the call
>>>>                                         to mincore. If
>>>>                                         we happened to hit a mapped,
>>>>                                         committed page first time then
>>>>                                         the vec bit
>>>>                                         will be 1 and the code will
>>>>                                         loop and retest the same
>>>>                                         address. Unlikely
>>>>                                         but possible?
>>>>
>>>>
>>>>                                         The adjustment to the edge
>>>>                                         case logic now looks correct.
>>>>
>>>>                                             The test case for
>>>>                                             verifying fixes for above
>>>>                                             two issues can be found
>>>>                                             here:
>>>> http://cr.openjdk.java.net/~zgu/8191369/test_mmap.c
>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/test_mmap.c>
>>>>
>>>>
>>>>                                         Yes, that verifies the edge
>>>>                                         case handling e.g. I ran it
>>>>                                         with 256 pages
>>>>                                         and with 127/8/9 mapped and it
>>>>                                         gave the right results.
>>>>
>>>>                                             As mentioned in early
>>>>                                             email, new patch also
>>>>                                             contains implementation for
>>>>                                             Windows.
>>>>
>>>>                                             Updated Webrev:
>>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.01/
>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.01/>
>>>>
>>>>                                             Test:
>>>>
>>>>                                                Reran hotspot_tier1
>>>>                                             tests on Windows x64 and
>>>>                                             Linux x64 (fastdebug and
>>>>                                             release)
>>>>
>>>>                                         I have no idea about the
>>>>                                         validity of the Windows fix
>>>>                                         but the Linux one
>>>>                                         looks good modulo that if
>>>>                                         condition.
>>>>
>>>>                                         regards,
>>>>
>>>>
>>>>                                         Andrew Dinn
>>>>                                         -----------
>>>>                                         Senior Principal Software 
>>>> Engineer
>>>>                                         Red Hat UK Ltd
>>>>                                         Registered in England and
>>>>                                         Wales under Company
>>>>                                         Registration No. 03798903
>>>>                                         Directors: Michael Cunningham,
>>>>                                         Michael ("Mike") O'Neill, Eric
>>>>                                         Shander
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
> 


More information about the hotspot-runtime-dev mailing list