RFR(m): 8185712: [windows] Multiple issues with the native symbol decoder
Ioi Lam
ioi.lam at oracle.com
Wed Aug 16 14:43:46 UTC 2017
Hi Thomas,
I am planning to review this but there's a lot of changes so it would
take me a little while :-(
One quick question - what kind of testing has been done on these
changes? My main worry is (1) stability during normal execution, (2)
impacts to hs_err file generation.
I can see the improvements in the hs_err call stack, but would there be
any regression in corner cases? How can we test for that?
Thanks
- Ioi
On 8/13/17 8:14 AM, Thomas Stüfe wrote:
> Dear all,
>
> May I please have reviews for the following patch.
>
> Issue: https://bugs.openjdk.java.net/browse/JDK-8185712
> Webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8185712-windows-improve-native-symbol-resolver/webrev.00/webrev/
>
> This fix makes the native symbol decoder on Windows more robust. This
> increases the chance of getting useful error files. It also refactors the
> code and contains some useful new features.
>
> ---
>
> Native symbol resolution on windows is done via the group of "SymXX" APIs
> exported from dbghelp.dll. Currently this is done via a layer of
> abstraction in decoder.cpp, and a "WindowsDecoder" class in
> decoder_windows.cpp. This class can be instantianted twice, so there can
> exist two objects of this class.
>
> The bugs:
>
> 1) Functions from dbghelp.dll are not threadsafe; calls to them must be
> synchronized. But dbghelp.dll could be used by the two live WindowsDecoder
> objects in parallel from different threads. In addition to that,
> dbghelp.dll functions are used from within os_windows.cpp, when writing
> minidumps.
>
> 2) the SymXX APIs need to be initialized by a call to SymInitialize(). This
> can only happen once. The way it is now, each of the two WindowsDecoder
> objects will call SymInitialize. The second object to do this will fail and
> hence not be usable. In practice this means if in the VM someone called
> e.g. dll_from_address_name() - which will initialize and use the shared
> WindowsDecoder object - and then the VM crashes, the hs-err file will
> contain no useful stack because that would require the second
> WindowsDecoder object, which would get initialized after the shared one,
> and its initialization would fail.
>
> 3) Initialization dependencies:
> - when building the pdb search path (WindowsDecoder::initialize),
> Arguments::get_java_home() is used to deduce the jdk bin directory
> containing the jdk shared objects. This will crash if invoked before the
> system properties are set.
> - Decoder::decode() calls (via the Monitor lock) Thread::current(). This
> means the code will not work during initialization and where
> Thread::current is not set (e.g. in an unattached thread). Admittedly a bit
> theoretical, as this only affects the shared WindowsDecoder object, which
> is not used during error reporting.
>
> 4) WindowsDecoder::initialize(), pdb search path: Code uses MAX_PATH in
> various places, which is wrong. NTFS paths can be longer than that.
> - when calling SymGetSearchPath, code assumes a maximum path size of
> MAX_PATH for the *combined* size of all directory names, which may be way
> too small. Truncation is not handled, code will silently fail if output
> buffer is too small, resulting in the existing pdb path to be overwritten
> instead of being preserved.
> - GetModuleFileName(): similar problem, output buffer is MAX_PATH len,
> which may be too small. Truncation will lead to wrong results.
> - Throughout the function "strncat" is used to assemble the search path,
> but is used wrong and does not guard against buffer overflows (if that was
> the intent - otherwise, why not just use strcat?). The last argument to
> strncat should be the remaining space in the destination buffer, not the
> length of the source string.
>
> 5) WindowsDecoder::decode(): We call SymGetSymFromAddr64() using an output
> buffer of size MAX_PATH - which makes no sense at all, as we are retrieving
> a symbol name, not a file name. Truncation is not handled. This is actually
> dangerous, because SymGetSymFromAddr64() handles truncations sloppily, it
> will return success and fill the output buffer completely, so on truncation
> the symbol name will not be zero terminated.
>
> In addition to the bugs, there are a number of things which could be done
> better or simpler:
>
> a) Setting the search path could be simplified and made more useful if we
> would just add the directory of every loaded module to the search path
> (currently we just add the two jdk bin directories, which also exposes us
> to initialization dependency, see (3)). This would be more useful as a
> common convention is to put pdb files beside binaries, and that way we
> would catch all those. Including, but not limited to, our own jdk pdb files.
>
> b) Dlls can be loaded and unloaded, and it would be nice to have an updated
> pdb search path - e.g. in case a late-loaded third party JNI library
> crashes. So, it would be nice to run (a) whenever a library is loaded or
> unloaded, and for the process to be fast if nothing changed (e.g. if the
> new DLL was loaded from a directory which is already part of the search
> path).
>
> c) It would be nice to have file name and line number in the callstack, too.
>
> d) As pointed out in JDK-8144855, the function
> Decoder::can_decode_C_frame_in_vm() is not necessary. This should be
> handled in WindowsDecoder instead.
>
> ---
>
> What I did in this fix:
>
> - I pulled out dbghelp.dll handling into an own centralized singleton
> (DbgHelpLoader). This class takes care of loading the Dll and synchonizes
> access to all its functions, thus solving (1). DbgHelpLoader now replaces
> all places where before the DLL was loaded manually.
>
> - Atop of DbgHelpLoader there is another new singleton class,
> "SymbolEngine", which wraps the life cycle of the symbol APIs. This solves
> problem (2). In addition to that, this class is the new interface to the
> SymXX APIs.
>
> - this means that the existing code in decoder.cpp, which instantiates two
> Decode objects and synchronizes access to one of them, makes no sense for
> Windows. That whole layer is now bypassed in Windows - the static
> Decoder::xxx() functions are now directly implemented in
> decoder_windows.cpp and access the SymbolEngine singleton without any added
> layers inbetween.
>
> - SymbolEngine contains a new improved way to assemble the pdb search path
> as described in (a) and (b). We iterate loaded modules, extract directories
> and add them to the search path. Implementation takes care to not do
> unnecessary work (avoiding changing the path if list of loaded modules is
> unchanged, or if only Dlls from known directories were loaded).
>
> - We now recalculate the pdb seach path if a DLL was loaded (via
> os::dll_load). That makes it possible to debug third party dlls which do
> not reside in jdk directories and were loaded after the SymbolEngine was
> initialized.
>
> - When decoding a symbol from an unknown address and the decode fails, we
> attempt to rebuild the pdb search path and attempt again.
>
> - Care was taken for this code to be robust. It may be called
> pre-initialization, post-shutdown or under error conditions (low memory,
> out of stack etc), so no VM infrastructure was used and memory use is
> frugal (dynamic buffers allocated off the stack and reused were possible).
>
> - I did away with the "Decoder::can_decode_C_frame_in_vm()" code. See
> argumentation in JDK-8144855. To me, it makes no sense. The function is a
> workaround the problem that, when faced with stripped binaries which only
> have debug symbols for public functions, symbol info may be confusing
> (public symbols + large offsets). But this is not done consistently, and
> most programmers know not to trust symbols with large offsets.
>
> - Finally, we now have source info in the callstack :)
>
> Stack: [0x00840000,0x00890000], sp=0x0088f884, free space=318k
> Native frames: (J=compiled Java code, A=aot compiled Java code,
> j=interpreted, Vv=VM code, C=native code)
> V [jvm.dll+0xa26903] VMError::controlled_crash+0x2a3 (vmerror.cpp:1683)
> V [jvm.dll+0xa2664e] VMError::test_error_handler+0xe (vmerror.cpp:1632)
> V [jvm.dll+0x6af796] JNI_CreateJavaVM_inner+0x196 (jni.cpp:3988)
> V [jvm.dll+0x6af5c2] JNI_CreateJavaVM+0x52 (jni.cpp:4036)
> V [jvm.dll+0x2cdfc] init_jvm+0xdc (gtestmain.cpp:94)
> V [jvm.dll+0x2d18a] JVMInitializerListener::OnTestStart+0x7a
> (gtestmain.cpp:114)
> V [jvm.dll+0x829b] testing::internal::TestEventRepeater::OnTestStart+0x5b
> (gtest.cc:2979)
> V [jvm.dll+0x6874] testing::TestInfo::Run+0x54 (gtest.cc:2312)
> V [jvm.dll+0x6d8f] testing::TestCase::Run+0xbf (gtest.cc:2445)
> V [jvm.dll+0xbe9e] testing::internal::UnitTestImpl::RunAllTests+0x25e
> (gtest.cc:4316)
> V [jvm.dll+0x2a410]
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,bool>+0x40
> (gtest.cc:2063)
> V [jvm.dll+0x2493e]
> testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,bool>+0x5e
> (gtest.cc:2114)
> V [jvm.dll+0xb0e9] testing::UnitTest::Run+0xe9 (gtest.cc:3929)
> V [jvm.dll+0x2d0bf] RUN_ALL_TESTS+0xf (gtest.h:2289)
> V [jvm.dll+0x2cc76] runUnitTestsInner+0x1f6 (gtestmain.cpp:249)
> V [jvm.dll+0x2c958] runUnitTests+0x48 (gtestmain.cpp:319)
> C [gtestLauncher.exe+0x1011] main+0x11 (gtestlauncher.cpp:32)
> C [gtestLauncher.exe+0x1182] __tmainCRTStartup+0x122 (crtexe.c:555)
> C [KERNEL32.DLL+0x138f4]
> C [ntdll.dll+0x65de3]
> C [ntdll.dll+0x65dae]
>
> .. which is implemented for Windows, but pipes are laid to plug in other
> platforms as well (Decoder::get_source_info())
>
> ----
>
> Tell me what you think.
>
> Thanks, and Kind Regards, Thomas
More information about the hotspot-runtime-dev
mailing list