RFR(XS) 8191369: NMT: Enhance thread stack tracking

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Mon Mar 5 19:36:59 UTC 2018


Hi,
I think this change is broken because on my machine I get crashes while 
printing the stack region.

FAILED: runtime/NMT/CheckForProperDetailStackTrace.java
FAILED: runtime/NMT/PrintNMTStatistics.java
FAILED: runtime/appcds/sharedStrings/FlagCombo.java

When I reverted this change the tests pass.  I think we should backout 
this change.

thanks,
Coleen


On 3/5/18 11:21 AM, Thomas Stüfe wrote:
> Hi Zhengyu,
>
> about the os_windows.cpp part:
>
> Are you sure this works as intended? I was not able to test it yet. It 
> may work but makes some assumptions: that the stack top you are 
> calculating is actually the end of a heap region; that you can just 
> pass any address as the first parameter to VirtualQuery (the 
> documentation is a bit ambiguous here); even that all committed pages 
> must be consecutive, which does not have to be the case (we had cases 
> on HyperV where there were holes in the committed thread stack area).
>
> I probably would have reused the coding in os::current_stack_base(), 
> because we know that works:
>
> - call VirtualQuery with an address on the stack to get 
> MEMORY_BASIC_INFORMATION::AllocationBase. We could just use stack 
> bottom here as input.
> - then walk all regions upward and sum up the sizes of the committed 
> ones, until we hit stack top - which supposedly should coincide with 
> the end of a heap region but who knows.
>
> Best Regards, Thomas
>
>
>
>
> On Fri, Feb 23, 2018 at 5:52 PM, <coleen.phillimore at oracle.com 
> <mailto:coleen.phillimore at oracle.com>> wrote:
>
>
>     This looks good but I hope someone else can review the os_linux
>     and os_windows parts.
>
>     I will sponsor this once the reviews are complete.
>
>     thanks,
>     Coleen
>
>
>     On 2/21/18 10:49 AM, Zhengyu Gu wrote:
>
>         Hi,
>
>         I rebased the patch after JDK-8196405[1]. Stefan's patch fixed
>         virtual memory tracking issues I tried to address in early patch.
>
>
>         In additions, I removed os::pd_committed_stack_size() methods
>         as Coleen suggested. Also added new gtest.
>
>         Bug: https://bugs.openjdk.java.net/browse/JDK-8191369
>         <https://bugs.openjdk.java.net/browse/JDK-8191369>
>         Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.03/
>         <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.03/>
>
>         Test:
>           hotspot_runtime, test-hotspot-gtest on Linux 64 with
>         fastdebug and release builds.
>
>
>         [1] http://cr.openjdk.java.net/~stefank/8196405/webrev.02/
>         <http://cr.openjdk.java.net/%7Estefank/8196405/webrev.02/>
>
>         Thanks,
>
>         -Zhengyu
>
>
>         On 02/12/2018 09:33 AM, Zhengyu Gu wrote:
>
>
>
>                 No, what you have is better.   I dislike the
>                 delegation to the duplicate function signature in the
>                 pd_* layer.   Can you make the
>                 os::committed_stack_size() functions in the platforms
>                 where this isn't supported just be empy functions?
>
>             Okay.
>
>
>                 If there are platforms that don't have the ability to
>                 track committed regions of the mtThread stack, don't
>                 you still need the concept of all_committed?
>
>             No, it exists for convenience. However, stack tracking
>             code does need adjustment.
>
>             Thanks,
>
>             -Zhengyu
>
>
>
>                 thanks,
>                 Coleen
>
>
>                     Thanks,
>
>                     -Zhengyu
>
>
>
>
>
>                         Thanks,
>                         Coleen
>
>                         On 1/11/18 4:05 PM, Zhengyu Gu wrote:
>
>                             Hi,
>
>                             Can I get second (R)eviewer for this? and
>                             I also need a sponsor.
>
>                             This is JDK11 bug.
>
>                             Webrev:
>                             http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>                             <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>                             Bug: Bug:
>                             https://bugs.openjdk.java.net/browse/JDK-8191369
>                             <https://bugs.openjdk.java.net/browse/JDK-8191369>
>
>                             Thanks,
>
>                             -Zhengyu
>
>
>
>
>                             On 11/20/2017 01:46 PM, Zhengyu Gu wrote:
>
>                                 Hi Andrew,
>
>                                 Thanks for the review.
>
>                                         I'm not sure about the logic
>                                         of the test when you hit the the
>                                         ENOMEM/uncommited case.
>
>                                         +    mincore_return_value =
>                                         mincore(test_addr, page_sz, vec);
>                                         +
>                                         +    if (mincore_return_value
>                                         == -1 || (committed_only &&
>                                         (vec[0] &
>                                         0x01) == 0)) {
>                                         +      // Page is not
>                                         mapped/committed go up
>                                         +      // to find first
>                                         mapped/committed page
>                                         +      if (errno != EAGAIN ||
>                                         (committed_only && (vec[0] &
>                                         0x01) == 0)) {
>                                         +       
>                                         assert(mincore_return_value !=
>                                         -1 || errno == ENOMEM,
>                                         "Unexpected mincore errno");
>
>                                         Should that not be
>
>                                            +      if
>                                         ((mincore_return_value != -1
>                                         && errno != EAGAIN) ||
>                                         (committed_only && (vec[0] &
>                                         0x01) == 0)) {
>
>
>                                     Sorry, that correction should have
>                                     been:
>
>                                        +      if
>                                     ((mincore_return_value == -1 &&
>                                     errno != EAGAIN) ||
>                                        (committed_only && (vec[0] &
>                                     0x01) == 0)) {
>
>
>                                 Thanks for pointing out. Updated
>                                 accordingly:
>
>                                 Webrev:
>                                 http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>                                 <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>
>                                 -Zhengyu
>
>
>                                         A successful call to mincore
>                                         is not guaranteed to reset
>                                         errno and it
>                                         might by chance already have
>                                         value EAGAIN before the call
>                                         to mincore. If
>                                         we happened to hit a mapped,
>                                         committed page first time then
>                                         the vec bit
>                                         will be 1 and the code will
>                                         loop and retest the same
>                                         address. Unlikely
>                                         but possible?
>
>
>                                         The adjustment to the edge
>                                         case logic now looks correct.
>
>                                             The test case for
>                                             verifying fixes for above
>                                             two issues can be found
>                                             here:
>                                             http://cr.openjdk.java.net/~zgu/8191369/test_mmap.c
>                                             <http://cr.openjdk.java.net/%7Ezgu/8191369/test_mmap.c>
>
>
>                                         Yes, that verifies the edge
>                                         case handling e.g. I ran it
>                                         with 256 pages
>                                         and with 127/8/9 mapped and it
>                                         gave the right results.
>
>                                             As mentioned in early
>                                             email, new patch also
>                                             contains implementation for
>                                             Windows.
>
>                                             Updated Webrev:
>                                             http://cr.openjdk.java.net/~zgu/8191369/webrev.01/
>                                             <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.01/>
>
>                                             Test:
>
>                                                Reran hotspot_tier1
>                                             tests on Windows x64 and
>                                             Linux x64 (fastdebug and
>                                             release)
>
>                                         I have no idea about the
>                                         validity of the Windows fix
>                                         but the Linux one
>                                         looks good modulo that if
>                                         condition.
>
>                                         regards,
>
>
>                                         Andrew Dinn
>                                         -----------
>                                         Senior Principal Software Engineer
>                                         Red Hat UK Ltd
>                                         Registered in England and
>                                         Wales under Company
>                                         Registration No. 03798903
>                                         Directors: Michael Cunningham,
>                                         Michael ("Mike") O'Neill, Eric
>                                         Shander
>
>
>
>
>
>



More information about the hotspot-runtime-dev mailing list