RFR(XS) 8191369: NMT: Enhance thread stack tracking

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Mon Mar 5 22:20:21 UTC 2018



On 3/5/18 5:05 PM, Zhengyu Gu wrote:
> Hi Coleen,
>>>
>>> Which platforms they failed on?
>>
>> They failed on linux x64 72 processor
>> model name    : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
>
> I can reproduce on 56 processor machine. I am looking into it.

Thanks, I filed https://bugs.openjdk.java.net/browse/JDK-8199067

Coleen

>
> Thanks,
>
> -Zhengyu
>
>>
>> thanks,
>> Coleen
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>>
>>>
>>>>
>>>> thanks,
>>>> Coleen
>>>>
>>>>
>>>> On 3/5/18 11:21 AM, Thomas Stüfe wrote:
>>>>> Hi Zhengyu,
>>>>>
>>>>> about the os_windows.cpp part:
>>>>>
>>>>> Are you sure this works as intended? I was not able to test it 
>>>>> yet. It may work but makes some assumptions: that the stack top 
>>>>> you are calculating is actually the end of a heap region; that you 
>>>>> can just pass any address as the first parameter to VirtualQuery 
>>>>> (the documentation is a bit ambiguous here); even that all 
>>>>> committed pages must be consecutive, which does not have to be the 
>>>>> case (we had cases on HyperV where there were holes in the 
>>>>> committed thread stack area).
>>>>>
>>>>> I probably would have reused the coding in 
>>>>> os::current_stack_base(), because we know that works:
>>>>>
>>>>> - call VirtualQuery with an address on the stack to get 
>>>>> MEMORY_BASIC_INFORMATION::AllocationBase. We could just use stack 
>>>>> bottom here as input.
>>>>> - then walk all regions upward and sum up the sizes of the 
>>>>> committed ones, until we hit stack top - which supposedly should 
>>>>> coincide with the end of a heap region but who knows.
>>>>>
>>>>> Best Regards, Thomas
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Feb 23, 2018 at 5:52 PM, <coleen.phillimore at oracle.com 
>>>>> <mailto:coleen.phillimore at oracle.com>> wrote:
>>>>>
>>>>>
>>>>>     This looks good but I hope someone else can review the os_linux
>>>>>     and os_windows parts.
>>>>>
>>>>>     I will sponsor this once the reviews are complete.
>>>>>
>>>>>     thanks,
>>>>>     Coleen
>>>>>
>>>>>
>>>>>     On 2/21/18 10:49 AM, Zhengyu Gu wrote:
>>>>>
>>>>>         Hi,
>>>>>
>>>>>         I rebased the patch after JDK-8196405[1]. Stefan's patch 
>>>>> fixed
>>>>>         virtual memory tracking issues I tried to address in early 
>>>>> patch.
>>>>>
>>>>>
>>>>>         In additions, I removed os::pd_committed_stack_size() methods
>>>>>         as Coleen suggested. Also added new gtest.
>>>>>
>>>>>         Bug: https://bugs.openjdk.java.net/browse/JDK-8191369
>>>>> <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>>>>         Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.03/
>>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.03/>
>>>>>
>>>>>         Test:
>>>>>           hotspot_runtime, test-hotspot-gtest on Linux 64 with
>>>>>         fastdebug and release builds.
>>>>>
>>>>>
>>>>>         [1] http://cr.openjdk.java.net/~stefank/8196405/webrev.02/
>>>>> <http://cr.openjdk.java.net/%7Estefank/8196405/webrev.02/>
>>>>>
>>>>>         Thanks,
>>>>>
>>>>>         -Zhengyu
>>>>>
>>>>>
>>>>>         On 02/12/2018 09:33 AM, Zhengyu Gu wrote:
>>>>>
>>>>>
>>>>>
>>>>>                 No, what you have is better.   I dislike the
>>>>>                 delegation to the duplicate function signature in the
>>>>>                 pd_* layer.   Can you make the
>>>>>                 os::committed_stack_size() functions in the platforms
>>>>>                 where this isn't supported just be empy functions?
>>>>>
>>>>>             Okay.
>>>>>
>>>>>
>>>>>                 If there are platforms that don't have the ability to
>>>>>                 track committed regions of the mtThread stack, don't
>>>>>                 you still need the concept of all_committed?
>>>>>
>>>>>             No, it exists for convenience. However, stack tracking
>>>>>             code does need adjustment.
>>>>>
>>>>>             Thanks,
>>>>>
>>>>>             -Zhengyu
>>>>>
>>>>>
>>>>>
>>>>>                 thanks,
>>>>>                 Coleen
>>>>>
>>>>>
>>>>>                     Thanks,
>>>>>
>>>>>                     -Zhengyu
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                         Thanks,
>>>>>                         Coleen
>>>>>
>>>>>                         On 1/11/18 4:05 PM, Zhengyu Gu wrote:
>>>>>
>>>>>                             Hi,
>>>>>
>>>>>                             Can I get second (R)eviewer for this? and
>>>>>                             I also need a sponsor.
>>>>>
>>>>>                             This is JDK11 bug.
>>>>>
>>>>>                             Webrev:
>>>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>>>>                             Bug: Bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8191369
>>>>> <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>>>>
>>>>>                             Thanks,
>>>>>
>>>>>                             -Zhengyu
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                             On 11/20/2017 01:46 PM, Zhengyu Gu wrote:
>>>>>
>>>>>                                 Hi Andrew,
>>>>>
>>>>>                                 Thanks for the review.
>>>>>
>>>>>                                         I'm not sure about the logic
>>>>>                                         of the test when you hit 
>>>>> the the
>>>>>                                         ENOMEM/uncommited case.
>>>>>
>>>>>                                         + mincore_return_value =
>>>>>                                         mincore(test_addr, 
>>>>> page_sz, vec);
>>>>>                                         +
>>>>>                                         +    if (mincore_return_value
>>>>>                                         == -1 || (committed_only &&
>>>>>                                         (vec[0] &
>>>>>                                         0x01) == 0)) {
>>>>>                                         +      // Page is not
>>>>>                                         mapped/committed go up
>>>>>                                         +      // to find first
>>>>>                                         mapped/committed page
>>>>>                                         +      if (errno != EAGAIN ||
>>>>>                                         (committed_only && (vec[0] &
>>>>>                                         0x01) == 0)) {
>>>>>                                         + 
>>>>> assert(mincore_return_value !=
>>>>>                                         -1 || errno == ENOMEM,
>>>>>                                         "Unexpected mincore errno");
>>>>>
>>>>>                                         Should that not be
>>>>>
>>>>>                                            +      if
>>>>> ((mincore_return_value != -1
>>>>>                                         && errno != EAGAIN) ||
>>>>>                                         (committed_only && (vec[0] &
>>>>>                                         0x01) == 0)) {
>>>>>
>>>>>
>>>>>                                     Sorry, that correction should 
>>>>> have
>>>>>                                     been:
>>>>>
>>>>>                                        +      if
>>>>>                                     ((mincore_return_value == -1 &&
>>>>>                                     errno != EAGAIN) ||
>>>>>                                        (committed_only && (vec[0] &
>>>>>                                     0x01) == 0)) {
>>>>>
>>>>>
>>>>>                                 Thanks for pointing out. Updated
>>>>>                                 accordingly:
>>>>>
>>>>>                                 Webrev:
>>>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>>>>
>>>>>                                 -Zhengyu
>>>>>
>>>>>
>>>>>                                         A successful call to mincore
>>>>>                                         is not guaranteed to reset
>>>>>                                         errno and it
>>>>>                                         might by chance already have
>>>>>                                         value EAGAIN before the call
>>>>>                                         to mincore. If
>>>>>                                         we happened to hit a mapped,
>>>>>                                         committed page first time 
>>>>> then
>>>>>                                         the vec bit
>>>>>                                         will be 1 and the code will
>>>>>                                         loop and retest the same
>>>>>                                         address. Unlikely
>>>>>                                         but possible?
>>>>>
>>>>>
>>>>>                                         The adjustment to the edge
>>>>>                                         case logic now looks correct.
>>>>>
>>>>>                                             The test case for
>>>>>                                             verifying fixes for above
>>>>>                                             two issues can be found
>>>>>                                             here:
>>>>> http://cr.openjdk.java.net/~zgu/8191369/test_mmap.c
>>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/test_mmap.c>
>>>>>
>>>>>
>>>>>                                         Yes, that verifies the edge
>>>>>                                         case handling e.g. I ran it
>>>>>                                         with 256 pages
>>>>>                                         and with 127/8/9 mapped 
>>>>> and it
>>>>>                                         gave the right results.
>>>>>
>>>>>                                             As mentioned in early
>>>>>                                             email, new patch also
>>>>>                                             contains 
>>>>> implementation for
>>>>>                                             Windows.
>>>>>
>>>>>                                             Updated Webrev:
>>>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.01/
>>>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.01/>
>>>>>
>>>>>                                             Test:
>>>>>
>>>>>                                                Reran hotspot_tier1
>>>>>                                             tests on Windows x64 and
>>>>>                                             Linux x64 (fastdebug and
>>>>>                                             release)
>>>>>
>>>>>                                         I have no idea about the
>>>>>                                         validity of the Windows fix
>>>>>                                         but the Linux one
>>>>>                                         looks good modulo that if
>>>>>                                         condition.
>>>>>
>>>>>                                         regards,
>>>>>
>>>>>
>>>>>                                         Andrew Dinn
>>>>>                                         -----------
>>>>>                                         Senior Principal Software 
>>>>> Engineer
>>>>>                                         Red Hat UK Ltd
>>>>>                                         Registered in England and
>>>>>                                         Wales under Company
>>>>>                                         Registration No. 03798903
>>>>>                                         Directors: Michael 
>>>>> Cunningham,
>>>>>                                         Michael ("Mike") O'Neill, 
>>>>> Eric
>>>>>                                         Shander
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>



More information about the hotspot-runtime-dev mailing list