RFR(XS) 8191369: NMT: Enhance thread stack tracking

Zhengyu Gu zgu at redhat.com
Mon Mar 5 17:13:49 UTC 2018


Hi Thomas,


On 03/05/2018 11:21 AM, Thomas Stüfe wrote:
> Hi Zhengyu,
> 
> about the os_windows.cpp part:
> 
> Are you sure this works as intended? I was not able to test it yet. It 
> may work but makes some assumptions: that the stack top you are 
> calculating is actually the end of a heap region; that you can just pass 
> any address as the first parameter to VirtualQuery (the documentation is 
> a bit ambiguous here); even that all committed pages must be 
> consecutive, which does not have to be the case (we had cases on HyperV 
> where there were holes in the committed thread stack area).
> 
> I probably would have reused the coding in os::current_stack_base(), 
> because we know that works:
> 
> - call VirtualQuery with an address on the stack to get 
> MEMORY_BASIC_INFORMATION::AllocationBase. We could just use stack bottom 
> here as input.
> - then walk all regions upward and sum up the sizes of the committed 
> ones, until we hit stack top - which supposedly should coincide with the 
> end of a heap region but who knows.

The stack range we have here, is obtained from os::current_stack_base() 
and os::current_stack_size() during thread initialization, exactly the 
way as you mentioned above, I don't see why we can not trust it here.

I don't have any experiment on HyperV. If I recall correctly, on 
Windows, the stack has to be touched page by page, if you skip a page in 
the middle, you get SEGV. This may be different under Hypervisor? so we 
should walk whole range, instead of stopping at the first uncommitted 
region?

Thanks,

-Zhengyu




> 
> Best Regards, Thomas
> 
> 
> 
> 
> On Fri, Feb 23, 2018 at 5:52 PM, <coleen.phillimore at oracle.com 
> <mailto:coleen.phillimore at oracle.com>> wrote:
> 
> 
>     This looks good but I hope someone else can review the os_linux and
>     os_windows parts.
> 
>     I will sponsor this once the reviews are complete.
> 
>     thanks,
>     Coleen
> 
> 
>     On 2/21/18 10:49 AM, Zhengyu Gu wrote:
> 
>         Hi,
> 
>         I rebased the patch after JDK-8196405[1]. Stefan's patch fixed
>         virtual memory tracking issues I tried to address in early patch.
> 
> 
>         In additions, I removed os::pd_committed_stack_size() methods as
>         Coleen suggested. Also added new gtest.
> 
>         Bug: https://bugs.openjdk.java.net/browse/JDK-8191369
>         <https://bugs.openjdk.java.net/browse/JDK-8191369>
>         Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.03/
>         <http://cr.openjdk.java.net/~zgu/8191369/webrev.03/>
> 
>         Test:
>            hotspot_runtime, test-hotspot-gtest on Linux 64 with
>         fastdebug and release builds.
> 
> 
>         [1] http://cr.openjdk.java.net/~stefank/8196405/webrev.02/
>         <http://cr.openjdk.java.net/~stefank/8196405/webrev.02/>
> 
>         Thanks,
> 
>         -Zhengyu
> 
> 
>         On 02/12/2018 09:33 AM, Zhengyu Gu wrote:
> 
> 
> 
>                 No, what you have is better.   I dislike the delegation
>                 to the duplicate function signature in the pd_* layer.  
>                 Can you make the os::committed_stack_size() functions in
>                 the platforms where this isn't supported just be empy
>                 functions?
> 
>             Okay.
> 
> 
>                 If there are platforms that don't have the ability to
>                 track committed regions of the mtThread stack, don't you
>                 still need the concept of all_committed?
> 
>             No, it exists for convenience. However, stack tracking code
>             does need adjustment.
> 
>             Thanks,
> 
>             -Zhengyu
> 
> 
> 
>                 thanks,
>                 Coleen
> 
> 
>                     Thanks,
> 
>                     -Zhengyu
> 
> 
> 
> 
> 
>                         Thanks,
>                         Coleen
> 
>                         On 1/11/18 4:05 PM, Zhengyu Gu wrote:
> 
>                             Hi,
> 
>                             Can I get second (R)eviewer for this? and I
>                             also need a sponsor.
> 
>                             This is JDK11 bug.
> 
>                             Webrev:
>                             http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>                             <http://cr.openjdk.java.net/~zgu/8191369/webrev.02/>
>                             Bug: Bug:
>                             https://bugs.openjdk.java.net/browse/JDK-8191369
>                             <https://bugs.openjdk.java.net/browse/JDK-8191369>
> 
>                             Thanks,
> 
>                             -Zhengyu
> 
> 
> 
> 
>                             On 11/20/2017 01:46 PM, Zhengyu Gu wrote:
> 
>                                 Hi Andrew,
> 
>                                 Thanks for the review.
> 
>                                         I'm not sure about the logic of
>                                         the test when you hit the the
>                                         ENOMEM/uncommited case.
> 
>                                         +    mincore_return_value =
>                                         mincore(test_addr, page_sz, vec);
>                                         +
>                                         +    if (mincore_return_value ==
>                                         -1 || (committed_only && (vec[0] &
>                                         0x01) == 0)) {
>                                         +      // Page is not
>                                         mapped/committed go up
>                                         +      // to find first
>                                         mapped/committed page
>                                         +      if (errno != EAGAIN ||
>                                         (committed_only && (vec[0] &
>                                         0x01) == 0)) {
>                                         +       
>                                         assert(mincore_return_value !=
>                                         -1 || errno == ENOMEM,
>                                         "Unexpected mincore errno");
> 
>                                         Should that not be
> 
>                                             +      if
>                                         ((mincore_return_value != -1 &&
>                                         errno != EAGAIN) ||
>                                         (committed_only && (vec[0] &
>                                         0x01) == 0)) {
> 
> 
>                                     Sorry, that correction should have been:
> 
>                                         +      if ((mincore_return_value
>                                     == -1 && errno != EAGAIN) ||
>                                         (committed_only && (vec[0] &
>                                     0x01) == 0)) {
> 
> 
>                                 Thanks for pointing out. Updated
>                                 accordingly:
> 
>                                 Webrev:
>                                 http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>                                 <http://cr.openjdk.java.net/~zgu/8191369/webrev.02/>
> 
>                                 -Zhengyu
> 
> 
>                                         A successful call to mincore is
>                                         not guaranteed to reset errno and it
>                                         might by chance already have
>                                         value EAGAIN before the call to
>                                         mincore. If
>                                         we happened to hit a mapped,
>                                         committed page first time then
>                                         the vec bit
>                                         will be 1 and the code will loop
>                                         and retest the same address.
>                                         Unlikely
>                                         but possible?
> 
> 
>                                         The adjustment to the edge case
>                                         logic now looks correct.
> 
>                                             The test case for verifying
>                                             fixes for above two issues
>                                             can be found
>                                             here:
>                                             http://cr.openjdk.java.net/~zgu/8191369/test_mmap.c
>                                             <http://cr.openjdk.java.net/~zgu/8191369/test_mmap.c>
> 
> 
>                                         Yes, that verifies the edge case
>                                         handling e.g. I ran it with 256
>                                         pages
>                                         and with 127/8/9 mapped and it
>                                         gave the right results.
> 
>                                             As mentioned in early email,
>                                             new patch also contains
>                                             implementation for
>                                             Windows.
> 
>                                             Updated Webrev:
>                                             http://cr.openjdk.java.net/~zgu/8191369/webrev.01/
>                                             <http://cr.openjdk.java.net/~zgu/8191369/webrev.01/>
> 
>                                             Test:
> 
>                                                 Reran hotspot_tier1
>                                             tests on Windows x64 and
>                                             Linux x64 (fastdebug and
>                                             release)
> 
>                                         I have no idea about the
>                                         validity of the Windows fix but
>                                         the Linux one
>                                         looks good modulo that if condition.
> 
>                                         regards,
> 
> 
>                                         Andrew Dinn
>                                         -----------
>                                         Senior Principal Software Engineer
>                                         Red Hat UK Ltd
>                                         Registered in England and Wales
>                                         under Company Registration No.
>                                         03798903
>                                         Directors: Michael Cunningham,
>                                         Michael ("Mike") O'Neill, Eric
>                                         Shander
> 
> 
> 
> 
> 
> 


More information about the hotspot-runtime-dev mailing list