RFR(XS) 8191369: NMT: Enhance thread stack tracking
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Mon Mar 5 21:03:56 UTC 2018
On 3/5/18 3:02 PM, Zhengyu Gu wrote:
> Hi Coleen,
>
> On 03/05/2018 02:36 PM, coleen.phillimore at oracle.com wrote:
>>
>> Hi,
>> I think this change is broken because on my machine I get crashes
>> while printing the stack region.
>>
>> FAILED: runtime/NMT/CheckForProperDetailStackTrace.java
>> FAILED: runtime/NMT/PrintNMTStatistics.java
>> FAILED: runtime/appcds/sharedStrings/FlagCombo.java
>>
>> When I reverted this change the tests pass. I think we should
>> backout this change.
>
>
> Sure.
>
> Which platforms they failed on?
They failed on linux x64 72 processor
model name : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
thanks,
Coleen
>
> Thanks,
>
> -Zhengyu
>
>
>>
>> thanks,
>> Coleen
>>
>>
>> On 3/5/18 11:21 AM, Thomas Stüfe wrote:
>>> Hi Zhengyu,
>>>
>>> about the os_windows.cpp part:
>>>
>>> Are you sure this works as intended? I was not able to test it yet.
>>> It may work but makes some assumptions: that the stack top you are
>>> calculating is actually the end of a heap region; that you can just
>>> pass any address as the first parameter to VirtualQuery (the
>>> documentation is a bit ambiguous here); even that all committed
>>> pages must be consecutive, which does not have to be the case (we
>>> had cases on HyperV where there were holes in the committed thread
>>> stack area).
>>>
>>> I probably would have reused the coding in os::current_stack_base(),
>>> because we know that works:
>>>
>>> - call VirtualQuery with an address on the stack to get
>>> MEMORY_BASIC_INFORMATION::AllocationBase. We could just use stack
>>> bottom here as input.
>>> - then walk all regions upward and sum up the sizes of the committed
>>> ones, until we hit stack top - which supposedly should coincide with
>>> the end of a heap region but who knows.
>>>
>>> Best Regards, Thomas
>>>
>>>
>>>
>>>
>>> On Fri, Feb 23, 2018 at 5:52 PM, <coleen.phillimore at oracle.com
>>> <mailto:coleen.phillimore at oracle.com>> wrote:
>>>
>>>
>>> This looks good but I hope someone else can review the os_linux
>>> and os_windows parts.
>>>
>>> I will sponsor this once the reviews are complete.
>>>
>>> thanks,
>>> Coleen
>>>
>>>
>>> On 2/21/18 10:49 AM, Zhengyu Gu wrote:
>>>
>>> Hi,
>>>
>>> I rebased the patch after JDK-8196405[1]. Stefan's patch fixed
>>> virtual memory tracking issues I tried to address in early
>>> patch.
>>>
>>>
>>> In additions, I removed os::pd_committed_stack_size() methods
>>> as Coleen suggested. Also added new gtest.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8191369
>>> <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>> Webrev: http://cr.openjdk.java.net/~zgu/8191369/webrev.03/
>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.03/>
>>>
>>> Test:
>>> hotspot_runtime, test-hotspot-gtest on Linux 64 with
>>> fastdebug and release builds.
>>>
>>>
>>> [1] http://cr.openjdk.java.net/~stefank/8196405/webrev.02/
>>> <http://cr.openjdk.java.net/%7Estefank/8196405/webrev.02/>
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>>
>>>
>>> On 02/12/2018 09:33 AM, Zhengyu Gu wrote:
>>>
>>>
>>>
>>> No, what you have is better. I dislike the
>>> delegation to the duplicate function signature in the
>>> pd_* layer. Can you make the
>>> os::committed_stack_size() functions in the platforms
>>> where this isn't supported just be empy functions?
>>>
>>> Okay.
>>>
>>>
>>> If there are platforms that don't have the ability to
>>> track committed regions of the mtThread stack, don't
>>> you still need the concept of all_committed?
>>>
>>> No, it exists for convenience. However, stack tracking
>>> code does need adjustment.
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>>
>>>
>>>
>>> thanks,
>>> Coleen
>>>
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>>
>>>
>>>
>>>
>>>
>>> Thanks,
>>> Coleen
>>>
>>> On 1/11/18 4:05 PM, Zhengyu Gu wrote:
>>>
>>> Hi,
>>>
>>> Can I get second (R)eviewer for this? and
>>> I also need a sponsor.
>>>
>>> This is JDK11 bug.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>> Bug: Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8191369
>>> <https://bugs.openjdk.java.net/browse/JDK-8191369>
>>>
>>> Thanks,
>>>
>>> -Zhengyu
>>>
>>>
>>>
>>>
>>> On 11/20/2017 01:46 PM, Zhengyu Gu wrote:
>>>
>>> Hi Andrew,
>>>
>>> Thanks for the review.
>>>
>>> I'm not sure about the logic
>>> of the test when you hit the
>>> the
>>> ENOMEM/uncommited case.
>>>
>>> + mincore_return_value =
>>> mincore(test_addr, page_sz,
>>> vec);
>>> +
>>> + if (mincore_return_value
>>> == -1 || (committed_only &&
>>> (vec[0] &
>>> 0x01) == 0)) {
>>> + // Page is not
>>> mapped/committed go up
>>> + // to find first
>>> mapped/committed page
>>> + if (errno != EAGAIN ||
>>> (committed_only && (vec[0] &
>>> 0x01) == 0)) {
>>> +
>>> assert(mincore_return_value !=
>>> -1 || errno == ENOMEM,
>>> "Unexpected mincore errno");
>>>
>>> Should that not be
>>>
>>> + if
>>> ((mincore_return_value != -1
>>> && errno != EAGAIN) ||
>>> (committed_only && (vec[0] &
>>> 0x01) == 0)) {
>>>
>>>
>>> Sorry, that correction should have
>>> been:
>>>
>>> + if
>>> ((mincore_return_value == -1 &&
>>> errno != EAGAIN) ||
>>> (committed_only && (vec[0] &
>>> 0x01) == 0)) {
>>>
>>>
>>> Thanks for pointing out. Updated
>>> accordingly:
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.02/
>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.02/>
>>>
>>> -Zhengyu
>>>
>>>
>>> A successful call to mincore
>>> is not guaranteed to reset
>>> errno and it
>>> might by chance already have
>>> value EAGAIN before the call
>>> to mincore. If
>>> we happened to hit a mapped,
>>> committed page first time then
>>> the vec bit
>>> will be 1 and the code will
>>> loop and retest the same
>>> address. Unlikely
>>> but possible?
>>>
>>>
>>> The adjustment to the edge
>>> case logic now looks correct.
>>>
>>> The test case for
>>> verifying fixes for above
>>> two issues can be found
>>> here:
>>> http://cr.openjdk.java.net/~zgu/8191369/test_mmap.c
>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/test_mmap.c>
>>>
>>>
>>> Yes, that verifies the edge
>>> case handling e.g. I ran it
>>> with 256 pages
>>> and with 127/8/9 mapped and it
>>> gave the right results.
>>>
>>> As mentioned in early
>>> email, new patch also
>>> contains implementation for
>>> Windows.
>>>
>>> Updated Webrev:
>>> http://cr.openjdk.java.net/~zgu/8191369/webrev.01/
>>> <http://cr.openjdk.java.net/%7Ezgu/8191369/webrev.01/>
>>>
>>> Test:
>>>
>>> Reran hotspot_tier1
>>> tests on Windows x64 and
>>> Linux x64 (fastdebug and
>>> release)
>>>
>>> I have no idea about the
>>> validity of the Windows fix
>>> but the Linux one
>>> looks good modulo that if
>>> condition.
>>>
>>> regards,
>>>
>>>
>>> Andrew Dinn
>>> -----------
>>> Senior Principal Software
>>> Engineer
>>> Red Hat UK Ltd
>>> Registered in England and
>>> Wales under Company
>>> Registration No. 03798903
>>> Directors: Michael Cunningham,
>>> Michael ("Mike") O'Neill, Eric
>>> Shander
>>>
>>>
>>>
>>>
>>>
>>>
>>
More information about the hotspot-runtime-dev
mailing list