RFR(m): 8185712: [windows] Multiple issues with the native symbol decoder
Thomas Stüfe
thomas.stuefe at gmail.com
Wed Aug 16 15:12:38 UTC 2017
Hi Ioi,
On Wed, Aug 16, 2017 at 4:43 PM, Ioi Lam <ioi.lam at oracle.com> wrote:
> Hi Thomas,
>
> I am planning to review this but there's a lot of changes so it would take
> me a little while :-(
>
>
thank you!
I currently wonder whether this is too massive and if I should split this
up. At least the centralized dbghelp.dll handling could easily moved into
an own patch, that would make this patch easier to digest. What do you
think?
> One quick question - what kind of testing has been done on these changes?
> My main worry is (1) stability during normal execution, (2) impacts to
> hs_err file generation.
>
>
I built windows x86 and x64, slodebug and release. Ran gtests manually. Our
nightly builds ran (TCK, hotspot jtreg) with no problems I could attribute
to my patch.
I also wrote a small selftest (SymbolEngine::selftest()) atop of the ones I
added to test_os.cpp which lives in symbol_api.cpp and stresses the
SymbolEngine class from the inside. I did not add this to the webrev. For
one because the webrev was already large and also because this test was not
a gtest but a selftest inside the VM and I did not think you guys would
accept this.
> I can see the improvements in the hs_err call stack, but would there be
> any regression in corner cases? How can we test for that?
I think the code with my patch is way more stable than before, that was the
aim of this patch. One could write loads of regression tests for corner
cases.
We actually do this at SAP with our own fork - we have extensive tests for
the hs-err file generation. But from experience I know that these kind of
tests are difficult to get 100% solid. For instance, one thing we do at SAP
is to the VM crash in many colorful ways and check the callstacks for
validity. However, this is fragile and depends on many factors - how the
compiler optimizes, whether we build with debug infos etc. I hesitate to
add tests like this to the OpenJDK which causes sporadic work for
maintainers, everyone would hate me :)
Thanks, Thomas
> Thanks
>
> - Ioi
>
>
>
> On 8/13/17 8:14 AM, Thomas Stüfe wrote:
>
>> Dear all,
>>
>> May I please have reviews for the following patch.
>>
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8185712
>> Webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8185712-windows-
>> improve-native-symbol-resolver/webrev.00/webrev/
>>
>> This fix makes the native symbol decoder on Windows more robust. This
>> increases the chance of getting useful error files. It also refactors the
>> code and contains some useful new features.
>>
>> ---
>>
>> Native symbol resolution on windows is done via the group of "SymXX" APIs
>> exported from dbghelp.dll. Currently this is done via a layer of
>> abstraction in decoder.cpp, and a "WindowsDecoder" class in
>> decoder_windows.cpp. This class can be instantianted twice, so there can
>> exist two objects of this class.
>>
>> The bugs:
>>
>> 1) Functions from dbghelp.dll are not threadsafe; calls to them must be
>> synchronized. But dbghelp.dll could be used by the two live WindowsDecoder
>> objects in parallel from different threads. In addition to that,
>> dbghelp.dll functions are used from within os_windows.cpp, when writing
>> minidumps.
>>
>> 2) the SymXX APIs need to be initialized by a call to SymInitialize().
>> This
>> can only happen once. The way it is now, each of the two WindowsDecoder
>> objects will call SymInitialize. The second object to do this will fail
>> and
>> hence not be usable. In practice this means if in the VM someone called
>> e.g. dll_from_address_name() - which will initialize and use the shared
>> WindowsDecoder object - and then the VM crashes, the hs-err file will
>> contain no useful stack because that would require the second
>> WindowsDecoder object, which would get initialized after the shared one,
>> and its initialization would fail.
>>
>> 3) Initialization dependencies:
>> - when building the pdb search path (WindowsDecoder::initialize),
>> Arguments::get_java_home() is used to deduce the jdk bin directory
>> containing the jdk shared objects. This will crash if invoked before the
>> system properties are set.
>> - Decoder::decode() calls (via the Monitor lock) Thread::current().
>> This
>> means the code will not work during initialization and where
>> Thread::current is not set (e.g. in an unattached thread). Admittedly a
>> bit
>> theoretical, as this only affects the shared WindowsDecoder object, which
>> is not used during error reporting.
>>
>> 4) WindowsDecoder::initialize(), pdb search path: Code uses MAX_PATH in
>> various places, which is wrong. NTFS paths can be longer than that.
>> - when calling SymGetSearchPath, code assumes a maximum path size of
>> MAX_PATH for the *combined* size of all directory names, which may be way
>> too small. Truncation is not handled, code will silently fail if output
>> buffer is too small, resulting in the existing pdb path to be overwritten
>> instead of being preserved.
>> - GetModuleFileName(): similar problem, output buffer is MAX_PATH len,
>> which may be too small. Truncation will lead to wrong results.
>> - Throughout the function "strncat" is used to assemble the search
>> path,
>> but is used wrong and does not guard against buffer overflows (if that was
>> the intent - otherwise, why not just use strcat?). The last argument to
>> strncat should be the remaining space in the destination buffer, not the
>> length of the source string.
>>
>> 5) WindowsDecoder::decode(): We call SymGetSymFromAddr64() using an output
>> buffer of size MAX_PATH - which makes no sense at all, as we are
>> retrieving
>> a symbol name, not a file name. Truncation is not handled. This is
>> actually
>> dangerous, because SymGetSymFromAddr64() handles truncations sloppily, it
>> will return success and fill the output buffer completely, so on
>> truncation
>> the symbol name will not be zero terminated.
>>
>> In addition to the bugs, there are a number of things which could be done
>> better or simpler:
>>
>> a) Setting the search path could be simplified and made more useful if we
>> would just add the directory of every loaded module to the search path
>> (currently we just add the two jdk bin directories, which also exposes us
>> to initialization dependency, see (3)). This would be more useful as a
>> common convention is to put pdb files beside binaries, and that way we
>> would catch all those. Including, but not limited to, our own jdk pdb
>> files.
>>
>> b) Dlls can be loaded and unloaded, and it would be nice to have an
>> updated
>> pdb search path - e.g. in case a late-loaded third party JNI library
>> crashes. So, it would be nice to run (a) whenever a library is loaded or
>> unloaded, and for the process to be fast if nothing changed (e.g. if the
>> new DLL was loaded from a directory which is already part of the search
>> path).
>>
>> c) It would be nice to have file name and line number in the callstack,
>> too.
>>
>> d) As pointed out in JDK-8144855, the function
>> Decoder::can_decode_C_frame_in_vm() is not necessary. This should be
>> handled in WindowsDecoder instead.
>>
>> ---
>>
>> What I did in this fix:
>>
>> - I pulled out dbghelp.dll handling into an own centralized singleton
>> (DbgHelpLoader). This class takes care of loading the Dll and synchonizes
>> access to all its functions, thus solving (1). DbgHelpLoader now replaces
>> all places where before the DLL was loaded manually.
>>
>> - Atop of DbgHelpLoader there is another new singleton class,
>> "SymbolEngine", which wraps the life cycle of the symbol APIs. This solves
>> problem (2). In addition to that, this class is the new interface to the
>> SymXX APIs.
>>
>> - this means that the existing code in decoder.cpp, which instantiates two
>> Decode objects and synchronizes access to one of them, makes no sense for
>> Windows. That whole layer is now bypassed in Windows - the static
>> Decoder::xxx() functions are now directly implemented in
>> decoder_windows.cpp and access the SymbolEngine singleton without any
>> added
>> layers inbetween.
>>
>> - SymbolEngine contains a new improved way to assemble the pdb search path
>> as described in (a) and (b). We iterate loaded modules, extract
>> directories
>> and add them to the search path. Implementation takes care to not do
>> unnecessary work (avoiding changing the path if list of loaded modules is
>> unchanged, or if only Dlls from known directories were loaded).
>>
>> - We now recalculate the pdb seach path if a DLL was loaded (via
>> os::dll_load). That makes it possible to debug third party dlls which do
>> not reside in jdk directories and were loaded after the SymbolEngine was
>> initialized.
>>
>> - When decoding a symbol from an unknown address and the decode fails, we
>> attempt to rebuild the pdb search path and attempt again.
>>
>> - Care was taken for this code to be robust. It may be called
>> pre-initialization, post-shutdown or under error conditions (low memory,
>> out of stack etc), so no VM infrastructure was used and memory use is
>> frugal (dynamic buffers allocated off the stack and reused were possible).
>>
>> - I did away with the "Decoder::can_decode_C_frame_in_vm()" code. See
>> argumentation in JDK-8144855. To me, it makes no sense. The function is a
>> workaround the problem that, when faced with stripped binaries which only
>> have debug symbols for public functions, symbol info may be confusing
>> (public symbols + large offsets). But this is not done consistently, and
>> most programmers know not to trust symbols with large offsets.
>>
>> - Finally, we now have source info in the callstack :)
>>
>> Stack: [0x00840000,0x00890000], sp=0x0088f884, free space=318k
>> Native frames: (J=compiled Java code, A=aot compiled Java code,
>> j=interpreted, Vv=VM code, C=native code)
>> V [jvm.dll+0xa26903] VMError::controlled_crash+0x2a3
>> (vmerror.cpp:1683)
>> V [jvm.dll+0xa2664e] VMError::test_error_handler+0xe
>> (vmerror.cpp:1632)
>> V [jvm.dll+0x6af796] JNI_CreateJavaVM_inner+0x196 (jni.cpp:3988)
>> V [jvm.dll+0x6af5c2] JNI_CreateJavaVM+0x52 (jni.cpp:4036)
>> V [jvm.dll+0x2cdfc] init_jvm+0xdc (gtestmain.cpp:94)
>> V [jvm.dll+0x2d18a] JVMInitializerListener::OnTestStart+0x7a
>> (gtestmain.cpp:114)
>> V [jvm.dll+0x829b] testing::internal::TestEventRe
>> peater::OnTestStart+0x5b
>> (gtest.cc:2979)
>> V [jvm.dll+0x6874] testing::TestInfo::Run+0x54 (gtest.cc:2312)
>> V [jvm.dll+0x6d8f] testing::TestCase::Run+0xbf (gtest.cc:2445)
>> V [jvm.dll+0xbe9e] testing::internal::UnitTestImpl::RunAllTests+0x25e
>> (gtest.cc:4316)
>> V [jvm.dll+0x2a410]
>> testing::internal::HandleSehExceptionsInMethodIfSupported<
>> testing::internal::UnitTestImpl,bool>+0x40
>> (gtest.cc:2063)
>> V [jvm.dll+0x2493e]
>> testing::internal::HandleExceptionsInMethodIfSupported<
>> testing::internal::UnitTestImpl,bool>+0x5e
>> (gtest.cc:2114)
>> V [jvm.dll+0xb0e9] testing::UnitTest::Run+0xe9 (gtest.cc:3929)
>> V [jvm.dll+0x2d0bf] RUN_ALL_TESTS+0xf (gtest.h:2289)
>> V [jvm.dll+0x2cc76] runUnitTestsInner+0x1f6 (gtestmain.cpp:249)
>> V [jvm.dll+0x2c958] runUnitTests+0x48 (gtestmain.cpp:319)
>> C [gtestLauncher.exe+0x1011] main+0x11 (gtestlauncher.cpp:32)
>> C [gtestLauncher.exe+0x1182] __tmainCRTStartup+0x122 (crtexe.c:555)
>> C [KERNEL32.DLL+0x138f4]
>> C [ntdll.dll+0x65de3]
>> C [ntdll.dll+0x65dae]
>>
>> .. which is implemented for Windows, but pipes are laid to plug in other
>> platforms as well (Decoder::get_source_info())
>>
>> ----
>>
>> Tell me what you think.
>>
>> Thanks, and Kind Regards, Thomas
>>
>
>
More information about the hotspot-runtime-dev
mailing list