RFR(XS) 8191369: NMT: Enhance thread stack tracking
Zhengyu Gu
zgu at redhat.com
Mon Mar 5 20:02:57 UTC 2018
Hi Coleen,
On 03/05/2018 02:36 PM, coleen.phillimore at oracle.com wrote:
>
> Hi,
> I think this change is broken because on my machine I get crashes while
> printing the stack region.
>
> FAILED: runtime/NMT/CheckForProperDetailStackTrace.java
> FAILED: runtime/NMT/PrintNMTStatistics.java
> FAILED: runtime/appcds/sharedStrings/FlagCombo.java
>
> When I reverted this change the tests pass. I think we should backout
> this change.
Sure.
Which platforms they failed on?
Thanks,
-Zhengyu
>
> thanks,
> Coleen
>
>
> On 3/5/18 11:21 AM, Thomas Stüfe wrote:
>> Hi Zhengyu,
>>
>> about the os_windows.cpp part:
>>
>> Are you sure this works as intended? I was not able to test it yet. It
>> may work but makes some assumptions: that the stack top you are
>> calculating is actually the end of a heap region; that you can just
>> pass any address as the first parameter to VirtualQuery (the
>> documentation is a bit ambiguous here); even that all committed pages
>> must be consecutive, which does not have to be the case (we had cases
>> on HyperV where there were holes in the committed thread stack area).
>>
>> I probably would have reused the coding in os::current_stack_base(),
>> because we know that works:
>>
>> - call VirtualQuery with an address on the stack to get
>> MEMORY_BASIC_INFORMATION::AllocationBase. We could just use stack
>> bottom here as input.
>> - then walk all regions upward and sum up the sizes of the committed
>> ones, until we hit stack top - which supposedly should coincide with
>> the end of a heap region but who knows.
>>
>> Best Regards, Thomas
>>
>>
>>
>>
>> On Fri, Feb 23, 2018 at 5:52 PM, <coleen.phillimore at oracle.com
>> <mailto:coleen.phillimore at oracle.com>> wrote:
>>
>>
>> This looks good but I hope someone else can review the os_linux
>> and os_windows parts.
>>
>> I will sponsor this once the reviews are complete.
>>
>> thanks,
>> Coleen
>>
>>
>> On 2/21/18 10:49 AM, Zhengyu Gu wrote:
>>
>> Hi,
>>
>> I rebased the patch after JDK-8196405[1]. Stefan's patch fixed
>> virtual memory tracking issues I tried to address in early patch.
>>
>>
>> In additions, I removed os::pd_committed_stack_size() methods
>> as Coleen suggested. Also added new gtest.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8191369
>> <https://bugs.openjdk.java.net/browse/JDK-8191369>
>> Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.03/
>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.03/>
>>
>> Test:
>> hotspot_runtime, test-hotspot-gtest on Linux 64 with
>> fastdebug and release builds.
>>
>>
>> [1] http://cr.openjdk.java.net/~stefank/8196405/webrev.02/
>> <http://cr.openjdk.java.net/%7Estefank/8196405/webrev.02/>
>>
>> Thanks,
>>
>> -Zhengyu
>>
>>
>> On 02/12/2018 09:33 AM, Zhengyu Gu wrote:
>>
>>
>>
>> No, what you have is better. I dislike the
>> delegation to the duplicate function signature in the
>> pd_* layer. Can you make the
>> os::committed_stack_size() functions in the platforms
>> where this isn't supported just be empy functions?
>>
>> Okay.
>>
>>
>> If there are platforms that don't have the ability to
>> track committed regions of the mtThread stack, don't
>> you still need the concept of all_committed?
>>
>> No, it exists for convenience. However, stack tracking
>> code does need adjustment.
>>
>> Thanks,
>>
>> -Zhengyu
>>
>>
>>
>> thanks,
>> Coleen
>>
>>
>> Thanks,
>>
>> -Zhengyu
>>
>>
>>
>>
>>
>> Thanks,
>> Coleen
>>
>> On 1/11/18 4:05 PM, Zhengyu Gu wrote:
>>
>> Hi,
>>
>> Can I get second (R)eviewer for this? and
>> I also need a sponsor.
>>
>> This is JDK11 bug.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>> Bug: Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8191369
>> <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>
>> Thanks,
>>
>> -Zhengyu
>>
>>
>>
>>
>> On 11/20/2017 01:46 PM, Zhengyu Gu wrote:
>>
>> Hi Andrew,
>>
>> Thanks for the review.
>>
>> I'm not sure about the logic
>> of the test when you hit the the
>> ENOMEM/uncommited case.
>>
>> + mincore_return_value =
>> mincore(test_addr, page_sz, vec);
>> +
>> + if (mincore_return_value
>> == -1 || (committed_only &&
>> (vec[0] &
>> 0x01) == 0)) {
>> + // Page is not
>> mapped/committed go up
>> + // to find first
>> mapped/committed page
>> + if (errno != EAGAIN ||
>> (committed_only && (vec[0] &
>> 0x01) == 0)) {
>> +
>> assert(mincore_return_value !=
>> -1 || errno == ENOMEM,
>> "Unexpected mincore errno");
>>
>> Should that not be
>>
>> + if
>> ((mincore_return_value != -1
>> && errno != EAGAIN) ||
>> (committed_only && (vec[0] &
>> 0x01) == 0)) {
>>
>>
>> Sorry, that correction should have
>> been:
>>
>> + if
>> ((mincore_return_value == -1 &&
>> errno != EAGAIN) ||
>> (committed_only && (vec[0] &
>> 0x01) == 0)) {
>>
>>
>> Thanks for pointing out. Updated
>> accordingly:
>>
>> Webrev:
>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>
>> -Zhengyu
>>
>>
>> A successful call to mincore
>> is not guaranteed to reset
>> errno and it
>> might by chance already have
>> value EAGAIN before the call
>> to mincore. If
>> we happened to hit a mapped,
>> committed page first time then
>> the vec bit
>> will be 1 and the code will
>> loop and retest the same
>> address. Unlikely
>> but possible?
>>
>>
>> The adjustment to the edge
>> case logic now looks correct.
>>
>> The test case for
>> verifying fixes for above
>> two issues can be found
>> here:
>> http://cr.openjdk.java.net/~zgu/8191369/test_mmap.c
>> <http://cr.openjdk.java.net/%7Ezgu/8191369/test_mmap.c>
>>
>>
>> Yes, that verifies the edge
>> case handling e.g. I ran it
>> with 256 pages
>> and with 127/8/9 mapped and it
>> gave the right results.
>>
>> As mentioned in early
>> email, new patch also
>> contains implementation for
>> Windows.
>>
>> Updated Webrev:
>> http://cr.openjdk.java.net/~zgu/8191369/webrev.01/
>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.01/>
>>
>> Test:
>>
>> Reran hotspot_tier1
>> tests on Windows x64 and
>> Linux x64 (fastdebug and
>> release)
>>
>> I have no idea about the
>> validity of the Windows fix
>> but the Linux one
>> looks good modulo that if
>> condition.
>>
>> regards,
>>
>>
>> Andrew Dinn
>> -----------
>> Senior Principal Software Engineer
>> Red Hat UK Ltd
>> Registered in England and
>> Wales under Company
>> Registration No. 03798903
>> Directors: Michael Cunningham,
>> Michael ("Mike") O'Neill, Eric
>> Shander
>>
>>
>>
>>
>>
>>
>
More information about the hotspot-runtime-dev
mailing list