RFR(XS) 8191369: NMT: Enhance thread stack tracking

Mon Mar 5 21:03:56 UTC 2018


On 3/5/18 3:02 PM, Zhengyu Gu wrote:
> Hi Coleen,
>
> On 03/05/2018 02:36 PM, coleen.phillimore at oracle.com wrote:
>>
>> Hi,
>> I think this change is broken because on my machine I get crashes 
>> while printing the stack region.
>>
>> FAILED: runtime/NMT/CheckForProperDetailStackTrace.java
>> FAILED: runtime/NMT/PrintNMTStatistics.java
>> FAILED: runtime/appcds/sharedStrings/FlagCombo.java
>>
>> When I reverted this change the tests pass.  I think we should 
>> backout this change.
>
>
> Sure.
>
> Which platforms they failed on?

They failed on linux x64 72 processor
model name    : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

thanks,
Coleen
>
> Thanks,
>
> -Zhengyu
>
>
>>
>> thanks,
>> Coleen
>>
>>
>> On 3/5/18 11:21 AM, Thomas Stüfe wrote:
>>> Hi Zhengyu,
>>>
>>> about the os_windows.cpp part:
>>>
>>> Are you sure this works as intended? I was not able to test it yet. 
>>> It may work but makes some assumptions: that the stack top you are 
>>> calculating is actually the end of a heap region; that you can just 
>>> pass any address as the first parameter to VirtualQuery (the 
>>> documentation is a bit ambiguous here); even that all committed 
>>> pages must be consecutive, which does not have to be the case (we 
>>> had cases on HyperV where there were holes in the committed thread 
>>> stack area).
>>>
>>> I probably would have reused the coding in os::current_stack_base(), 
>>> because we know that works:
>>>
>>> - call VirtualQuery with an address on the stack to get 
>>> MEMORY_BASIC_INFORMATION::AllocationBase. We could just use stack 
>>> bottom here as input.
>>> - then walk all regions upward and sum up the sizes of the committed 
>>> ones, until we hit stack top - which supposedly should coincide with 
>>> the end of a heap region but who knows.
>>>
>>> Best Regards, Thomas
>>>
>>>
>>>
>>>
>>> On Fri, Feb 23, 2018 at 5:52 PM, <coleen.phillimore at oracle.com 
>>> <mailto:coleen.phillimore at oracle.com>> wrote:
>>>
>>>
>>>     This looks good but I hope someone else can review the os_linux
>>>     and os_windows parts.
>>>
>>>     I will sponsor this once the reviews are complete.
>>>
>>>     thanks,
>>>     Coleen
>>>
>>>
>>>     On 2/21/18 10:49 AM, Zhengyu Gu wrote:
>>>
>>>         Hi,
>>>
>>>         I rebased the patch after JDK-8196405[1]. Stefan's patch fixed
>>>         virtual memory tracking issues I tried to address in early 
>>> patch.
>>>
>>>
>>>         In additions, I removed os::pd_committed_stack_size() methods
>>>         as Coleen suggested. Also added new gtest.
>>>
>>>         Bug: https://bugs.openjdk.java.net/browse/JDK-8191369
>>> <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>>         Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.03/
>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.03/>
>>>
>>>         Test:
>>>           hotspot_runtime, test-hotspot-gtest on Linux 64 with
>>>         fastdebug and release builds.
>>>
>>>
>>>         [1] http://cr.openjdk.java.net/~stefank/8196405/webrev.02/
>>> <http://cr.openjdk.java.net/%7Estefank/8196405/webrev.02/>
>>>
>>>         Thanks,
>>>
>>>         -Zhengyu
>>>
>>>
>>>         On 02/12/2018 09:33 AM, Zhengyu Gu wrote:
>>>
>>>
>>>
>>>                 No, what you have is better.   I dislike the
>>>                 delegation to the duplicate function signature in the
>>>                 pd_* layer.   Can you make the
>>>                 os::committed_stack_size() functions in the platforms
>>>                 where this isn't supported just be empy functions?
>>>
>>>             Okay.
>>>
>>>
>>>                 If there are platforms that don't have the ability to
>>>                 track committed regions of the mtThread stack, don't
>>>                 you still need the concept of all_committed?
>>>
>>>             No, it exists for convenience. However, stack tracking
>>>             code does need adjustment.
>>>
>>>             Thanks,
>>>
>>>             -Zhengyu
>>>
>>>
>>>
>>>                 thanks,
>>>                 Coleen
>>>
>>>
>>>                     Thanks,
>>>
>>>                     -Zhengyu
>>>
>>>
>>>
>>>
>>>
>>>                         Thanks,
>>>                         Coleen
>>>
>>>                         On 1/11/18 4:05 PM, Zhengyu Gu wrote:
>>>
>>>                             Hi,
>>>
>>>                             Can I get second (R)eviewer for this? and
>>>                             I also need a sponsor.
>>>
>>>                             This is JDK11 bug.
>>>
>>>                             Webrev:
>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>>                             Bug: Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8191369
>>> <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>>
>>>                             Thanks,
>>>
>>>                             -Zhengyu
>>>
>>>
>>>
>>>
>>>                             On 11/20/2017 01:46 PM, Zhengyu Gu wrote:
>>>
>>>                                 Hi Andrew,
>>>
>>>                                 Thanks for the review.
>>>
>>>                                         I'm not sure about the logic
>>>                                         of the test when you hit the 
>>> the
>>>                                         ENOMEM/uncommited case.
>>>
>>>                                         + mincore_return_value =
>>>                                         mincore(test_addr, page_sz, 
>>> vec);
>>>                                         +
>>>                                         +    if (mincore_return_value
>>>                                         == -1 || (committed_only &&
>>>                                         (vec[0] &
>>>                                         0x01) == 0)) {
>>>                                         +      // Page is not
>>>                                         mapped/committed go up
>>>                                         +      // to find first
>>>                                         mapped/committed page
>>>                                         +      if (errno != EAGAIN ||
>>>                                         (committed_only && (vec[0] &
>>>                                         0x01) == 0)) {
>>>                                         + 
>>> assert(mincore_return_value !=
>>>                                         -1 || errno == ENOMEM,
>>>                                         "Unexpected mincore errno");
>>>
>>>                                         Should that not be
>>>
>>>                                            +      if
>>>                                         ((mincore_return_value != -1
>>>                                         && errno != EAGAIN) ||
>>>                                         (committed_only && (vec[0] &
>>>                                         0x01) == 0)) {
>>>
>>>
>>>                                     Sorry, that correction should have
>>>                                     been:
>>>
>>>                                        +      if
>>>                                     ((mincore_return_value == -1 &&
>>>                                     errno != EAGAIN) ||
>>>                                        (committed_only && (vec[0] &
>>>                                     0x01) == 0)) {
>>>
>>>
>>>                                 Thanks for pointing out. Updated
>>>                                 accordingly:
>>>
>>>                                 Webrev:
>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>>
>>>                                 -Zhengyu
>>>
>>>
>>>                                         A successful call to mincore
>>>                                         is not guaranteed to reset
>>>                                         errno and it
>>>                                         might by chance already have
>>>                                         value EAGAIN before the call
>>>                                         to mincore. If
>>>                                         we happened to hit a mapped,
>>>                                         committed page first time then
>>>                                         the vec bit
>>>                                         will be 1 and the code will
>>>                                         loop and retest the same
>>>                                         address. Unlikely
>>>                                         but possible?
>>>
>>>
>>>                                         The adjustment to the edge
>>>                                         case logic now looks correct.
>>>
>>>                                             The test case for
>>>                                             verifying fixes for above
>>>                                             two issues can be found
>>>                                             here:
>>> http://cr.openjdk.java.net/~zgu/8191369/test_mmap.c
>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/test_mmap.c>
>>>
>>>
>>>                                         Yes, that verifies the edge
>>>                                         case handling e.g. I ran it
>>>                                         with 256 pages
>>>                                         and with 127/8/9 mapped and it
>>>                                         gave the right results.
>>>
>>>                                             As mentioned in early
>>>                                             email, new patch also
>>>                                             contains implementation for
>>>                                             Windows.
>>>
>>>                                             Updated Webrev:
>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.01/
>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.01/>
>>>
>>>                                             Test:
>>>
>>>                                                Reran hotspot_tier1
>>>                                             tests on Windows x64 and
>>>                                             Linux x64 (fastdebug and
>>>                                             release)
>>>
>>>                                         I have no idea about the
>>>                                         validity of the Windows fix
>>>                                         but the Linux one
>>>                                         looks good modulo that if
>>>                                         condition.
>>>
>>>                                         regards,
>>>
>>>
>>>                                         Andrew Dinn
>>>                                         -----------
>>>                                         Senior Principal Software 
>>> Engineer
>>>                                         Red Hat UK Ltd
>>>                                         Registered in England and
>>>                                         Wales under Company
>>>                                         Registration No. 03798903
>>>                                         Directors: Michael Cunningham,
>>>                                         Michael ("Mike") O'Neill, Eric
>>>                                         Shander
>>>
>>>
>>>
>>>
>>>
>>>
>>