RFR(XS) 8191369: NMT: Enhance thread stack tracking

Zhengyu Gu zgu at redhat.com
Mon Mar 5 20:02:57 UTC 2018


Hi Coleen,

On 03/05/2018 02:36 PM, coleen.phillimore at oracle.com wrote:
> 
> Hi,
> I think this change is broken because on my machine I get crashes while 
> printing the stack region.
> 
> FAILED: runtime/NMT/CheckForProperDetailStackTrace.java
> FAILED: runtime/NMT/PrintNMTStatistics.java
> FAILED: runtime/appcds/sharedStrings/FlagCombo.java
> 
> When I reverted this change the tests pass.  I think we should backout 
> this change.


Sure.

Which platforms they failed on?

Thanks,

-Zhengyu


> 
> thanks,
> Coleen
> 
> 
> On 3/5/18 11:21 AM, Thomas Stüfe wrote:
>> Hi Zhengyu,
>>
>> about the os_windows.cpp part:
>>
>> Are you sure this works as intended? I was not able to test it yet. It 
>> may work but makes some assumptions: that the stack top you are 
>> calculating is actually the end of a heap region; that you can just 
>> pass any address as the first parameter to VirtualQuery (the 
>> documentation is a bit ambiguous here); even that all committed pages 
>> must be consecutive, which does not have to be the case (we had cases 
>> on HyperV where there were holes in the committed thread stack area).
>>
>> I probably would have reused the coding in os::current_stack_base(), 
>> because we know that works:
>>
>> - call VirtualQuery with an address on the stack to get 
>> MEMORY_BASIC_INFORMATION::AllocationBase. We could just use stack 
>> bottom here as input.
>> - then walk all regions upward and sum up the sizes of the committed 
>> ones, until we hit stack top - which supposedly should coincide with 
>> the end of a heap region but who knows.
>>
>> Best Regards, Thomas
>>
>>
>>
>>
>> On Fri, Feb 23, 2018 at 5:52 PM, <coleen.phillimore at oracle.com 
>> <mailto:coleen.phillimore at oracle.com>> wrote:
>>
>>
>>     This looks good but I hope someone else can review the os_linux
>>     and os_windows parts.
>>
>>     I will sponsor this once the reviews are complete.
>>
>>     thanks,
>>     Coleen
>>
>>
>>     On 2/21/18 10:49 AM, Zhengyu Gu wrote:
>>
>>         Hi,
>>
>>         I rebased the patch after JDK-8196405[1]. Stefan's patch fixed
>>         virtual memory tracking issues I tried to address in early patch.
>>
>>
>>         In additions, I removed os::pd_committed_stack_size() methods
>>         as Coleen suggested. Also added new gtest.
>>
>>         Bug: https://bugs.openjdk.java.net/browse/JDK-8191369
>>         <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>         Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.03/
>>         <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.03/>
>>
>>         Test:
>>           hotspot_runtime, test-hotspot-gtest on Linux 64 with
>>         fastdebug and release builds.
>>
>>
>>         [1] http://cr.openjdk.java.net/~stefank/8196405/webrev.02/
>>         <http://cr.openjdk.java.net/%7Estefank/8196405/webrev.02/>
>>
>>         Thanks,
>>
>>         -Zhengyu
>>
>>
>>         On 02/12/2018 09:33 AM, Zhengyu Gu wrote:
>>
>>
>>
>>                 No, what you have is better.   I dislike the
>>                 delegation to the duplicate function signature in the
>>                 pd_* layer.   Can you make the
>>                 os::committed_stack_size() functions in the platforms
>>                 where this isn't supported just be empy functions?
>>
>>             Okay.
>>
>>
>>                 If there are platforms that don't have the ability to
>>                 track committed regions of the mtThread stack, don't
>>                 you still need the concept of all_committed?
>>
>>             No, it exists for convenience. However, stack tracking
>>             code does need adjustment.
>>
>>             Thanks,
>>
>>             -Zhengyu
>>
>>
>>
>>                 thanks,
>>                 Coleen
>>
>>
>>                     Thanks,
>>
>>                     -Zhengyu
>>
>>
>>
>>
>>
>>                         Thanks,
>>                         Coleen
>>
>>                         On 1/11/18 4:05 PM, Zhengyu Gu wrote:
>>
>>                             Hi,
>>
>>                             Can I get second (R)eviewer for this? and
>>                             I also need a sponsor.
>>
>>                             This is JDK11 bug.
>>
>>                             Webrev:
>>                             http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>                             <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>                             Bug: Bug:
>>                             https://bugs.openjdk.java.net/browse/JDK-8191369
>>                             <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>
>>                             Thanks,
>>
>>                             -Zhengyu
>>
>>
>>
>>
>>                             On 11/20/2017 01:46 PM, Zhengyu Gu wrote:
>>
>>                                 Hi Andrew,
>>
>>                                 Thanks for the review.
>>
>>                                         I'm not sure about the logic
>>                                         of the test when you hit the the
>>                                         ENOMEM/uncommited case.
>>
>>                                         +    mincore_return_value =
>>                                         mincore(test_addr, page_sz, vec);
>>                                         +
>>                                         +    if (mincore_return_value
>>                                         == -1 || (committed_only &&
>>                                         (vec[0] &
>>                                         0x01) == 0)) {
>>                                         +      // Page is not
>>                                         mapped/committed go up
>>                                         +      // to find first
>>                                         mapped/committed page
>>                                         +      if (errno != EAGAIN ||
>>                                         (committed_only && (vec[0] &
>>                                         0x01) == 0)) {
>>                                         +       
>>                                         assert(mincore_return_value !=
>>                                         -1 || errno == ENOMEM,
>>                                         "Unexpected mincore errno");
>>
>>                                         Should that not be
>>
>>                                            +      if
>>                                         ((mincore_return_value != -1
>>                                         && errno != EAGAIN) ||
>>                                         (committed_only && (vec[0] &
>>                                         0x01) == 0)) {
>>
>>
>>                                     Sorry, that correction should have
>>                                     been:
>>
>>                                        +      if
>>                                     ((mincore_return_value == -1 &&
>>                                     errno != EAGAIN) ||
>>                                        (committed_only && (vec[0] &
>>                                     0x01) == 0)) {
>>
>>
>>                                 Thanks for pointing out. Updated
>>                                 accordingly:
>>
>>                                 Webrev:
>>                                 http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>                                 <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>
>>                                 -Zhengyu
>>
>>
>>                                         A successful call to mincore
>>                                         is not guaranteed to reset
>>                                         errno and it
>>                                         might by chance already have
>>                                         value EAGAIN before the call
>>                                         to mincore. If
>>                                         we happened to hit a mapped,
>>                                         committed page first time then
>>                                         the vec bit
>>                                         will be 1 and the code will
>>                                         loop and retest the same
>>                                         address. Unlikely
>>                                         but possible?
>>
>>
>>                                         The adjustment to the edge
>>                                         case logic now looks correct.
>>
>>                                             The test case for
>>                                             verifying fixes for above
>>                                             two issues can be found
>>                                             here:
>>                                             http://cr.openjdk.java.net/~zgu/8191369/test_mmap.c
>>                                             <http://cr.openjdk.java.net/%7Ezgu/8191369/test_mmap.c>
>>
>>
>>                                         Yes, that verifies the edge
>>                                         case handling e.g. I ran it
>>                                         with 256 pages
>>                                         and with 127/8/9 mapped and it
>>                                         gave the right results.
>>
>>                                             As mentioned in early
>>                                             email, new patch also
>>                                             contains implementation for
>>                                             Windows.
>>
>>                                             Updated Webrev:
>>                                             http://cr.openjdk.java.net/~zgu/8191369/webrev.01/
>>                                             <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.01/>
>>
>>                                             Test:
>>
>>                                                Reran hotspot_tier1
>>                                             tests on Windows x64 and
>>                                             Linux x64 (fastdebug and
>>                                             release)
>>
>>                                         I have no idea about the
>>                                         validity of the Windows fix
>>                                         but the Linux one
>>                                         looks good modulo that if
>>                                         condition.
>>
>>                                         regards,
>>
>>
>>                                         Andrew Dinn
>>                                         -----------
>>                                         Senior Principal Software Engineer
>>                                         Red Hat UK Ltd
>>                                         Registered in England and
>>                                         Wales under Company
>>                                         Registration No. 03798903
>>                                         Directors: Michael Cunningham,
>>                                         Michael ("Mike") O'Neill, Eric
>>                                         Shander
>>
>>
>>
>>
>>
>>
> 


More information about the hotspot-runtime-dev mailing list