RFR(m): 8185712: [windows] Multiple issues with the native symbol decoder

Thomas Stüfe thomas.stuefe at gmail.com
Wed Aug 16 15:53:07 UTC 2017


Hi all,

after discussing this with Ioi, I decided to remove this RFR and split the
patch up in multiple chunks, which are hopefully easier to review.

Thanks, Thomas

On Sun, Aug 13, 2017 at 5:14 PM, Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:

> Dear all,
>
> May I please have reviews for the following patch.
>
> Issue: https://bugs.openjdk.java.net/browse/JDK-8185712
> Webrev: http://cr.openjdk.java.net/~stuefe/webrevs/
> 8185712-windows-improve-native-symbol-resolver/webrev.00/webrev/
>
> This fix makes the native symbol decoder on Windows more robust. This
> increases the chance of getting useful error files. It also refactors the
> code and contains some useful new features.
>
> ---
>
> Native symbol resolution on windows is done via the group of "SymXX" APIs
> exported from dbghelp.dll. Currently this is done via a layer of
> abstraction in decoder.cpp, and a "WindowsDecoder" class in
> decoder_windows.cpp. This class can be instantianted twice, so there can
> exist two objects of this class.
>
> The bugs:
>
> 1) Functions from dbghelp.dll are not threadsafe; calls to them must be
> synchronized. But dbghelp.dll could be used by the two live WindowsDecoder
> objects in parallel from different threads. In addition to that,
> dbghelp.dll functions are used from within os_windows.cpp, when writing
> minidumps.
>
> 2) the SymXX APIs need to be initialized by a call to SymInitialize().
> This can only happen once. The way it is now, each of the two
> WindowsDecoder objects will call SymInitialize. The second object to do
> this will fail and hence not be usable. In practice this means if in the VM
> someone called e.g. dll_from_address_name() - which will initialize and use
> the shared WindowsDecoder object - and then the VM crashes, the hs-err file
> will contain no useful stack because that would require the second
> WindowsDecoder object, which would get initialized after the shared one,
> and its initialization would fail.
>
> 3) Initialization dependencies:
>   - when building the pdb search path (WindowsDecoder::initialize),
> Arguments::get_java_home() is used to deduce the jdk bin directory
> containing the jdk shared objects. This will crash if invoked before the
> system properties are set.
>   - Decoder::decode() calls (via the Monitor lock) Thread::current(). This
> means the code will not work during initialization and where
> Thread::current is not set (e.g. in an unattached thread). Admittedly a bit
> theoretical, as this only affects the shared WindowsDecoder object, which
> is not used during error reporting.
>
> 4) WindowsDecoder::initialize(), pdb search path: Code uses MAX_PATH in
> various places, which is wrong. NTFS paths can be longer than that.
>   - when calling SymGetSearchPath, code assumes a maximum path size of
> MAX_PATH for the *combined* size of all directory names, which may be way
> too small. Truncation is not handled, code will silently fail if output
> buffer is too small, resulting in the existing pdb path to be overwritten
> instead of being preserved.
>   - GetModuleFileName(): similar problem, output buffer is MAX_PATH len,
> which may be too small. Truncation will lead to wrong results.
>   - Throughout the function "strncat" is used to assemble the search path,
> but is used wrong and does not guard against buffer overflows (if that was
> the intent - otherwise, why not just use strcat?). The last argument to
> strncat should be the remaining space in the destination buffer, not the
> length of the source string.
>
> 5) WindowsDecoder::decode(): We call SymGetSymFromAddr64() using an output
> buffer of size MAX_PATH - which makes no sense at all, as we are retrieving
> a symbol name, not a file name. Truncation is not handled. This is actually
> dangerous, because SymGetSymFromAddr64() handles truncations sloppily, it
> will return success and fill the output buffer completely, so on truncation
> the symbol name will not be zero terminated.
>
> In addition to the bugs, there are a number of things which could be done
> better or simpler:
>
> a) Setting the search path could be simplified and made more useful if we
> would just add the directory of every loaded module to the search path
> (currently we just add the two jdk bin directories, which also exposes us
> to initialization dependency, see (3)). This would be more useful as a
> common convention is to put pdb files beside binaries, and that way we
> would catch all those. Including, but not limited to, our own jdk pdb files.
>
> b) Dlls can be loaded and unloaded, and it would be nice to have an
> updated pdb search path - e.g. in case a late-loaded third party JNI
> library crashes. So, it would be nice to run (a) whenever a library is
> loaded or unloaded, and for the process to be fast if nothing changed (e.g.
> if the new DLL was loaded from a directory which is already part of the
> search path).
>
> c) It would be nice to have file name and line number in the callstack,
> too.
>
> d) As pointed out in JDK-8144855, the function Decoder::can_decode_C_frame_in_vm()
> is not necessary. This should be handled in WindowsDecoder instead.
>
> ---
>
> What I did in this fix:
>
> - I pulled out dbghelp.dll handling into an own centralized singleton
> (DbgHelpLoader). This class takes care of loading the Dll and synchonizes
> access to all its functions, thus solving (1). DbgHelpLoader now replaces
> all places where before the DLL was loaded manually.
>
> - Atop of DbgHelpLoader there is another new singleton class,
> "SymbolEngine", which wraps the life cycle of the symbol APIs. This solves
> problem (2). In addition to that, this class is the new interface to the
> SymXX APIs.
>
> - this means that the existing code in decoder.cpp, which instantiates two
> Decode objects and synchronizes access to one of them, makes no sense for
> Windows. That whole layer is now bypassed in Windows - the static
> Decoder::xxx() functions are now directly implemented in
> decoder_windows.cpp and access the SymbolEngine singleton without any added
> layers inbetween.
>
> - SymbolEngine contains a new improved way to assemble the pdb search path
> as described in (a) and (b). We iterate loaded modules, extract directories
> and add them to the search path. Implementation takes care to not do
> unnecessary work (avoiding changing the path if list of loaded modules is
> unchanged, or if only Dlls from known directories were loaded).
>
> - We now recalculate the pdb seach path if a DLL was loaded (via
> os::dll_load). That makes it possible to debug third party dlls which do
> not reside in jdk directories and were loaded after the SymbolEngine was
> initialized.
>
> - When decoding a symbol from an unknown address and the decode fails, we
> attempt to rebuild the pdb search path and attempt again.
>
> - Care was taken for this code to be robust. It may be called
> pre-initialization, post-shutdown or under error conditions (low memory,
> out of stack etc), so no VM infrastructure was used and memory use is
> frugal (dynamic buffers allocated off the stack and reused were possible).
>
> - I did away with the "Decoder::can_decode_C_frame_in_vm()" code. See
> argumentation in JDK-8144855. To me, it makes no sense. The function is a
> workaround the problem that, when faced with stripped binaries which only
> have debug symbols for public functions, symbol info may be confusing
> (public symbols + large offsets). But this is not done consistently, and
> most programmers know not to trust symbols with large offsets.
>
> - Finally, we now have source info in the callstack :)
>
> Stack: [0x00840000,0x00890000],  sp=0x0088f884,  free space=318k
> Native frames: (J=compiled Java code, A=aot compiled Java code,
> j=interpreted, Vv=VM code, C=native code)
> V  [jvm.dll+0xa26903]  VMError::controlled_crash+0x2a3  (vmerror.cpp:1683)
> V  [jvm.dll+0xa2664e]  VMError::test_error_handler+0xe  (vmerror.cpp:1632)
> V  [jvm.dll+0x6af796]  JNI_CreateJavaVM_inner+0x196  (jni.cpp:3988)
> V  [jvm.dll+0x6af5c2]  JNI_CreateJavaVM+0x52  (jni.cpp:4036)
> V  [jvm.dll+0x2cdfc]  init_jvm+0xdc  (gtestmain.cpp:94)
> V  [jvm.dll+0x2d18a]  JVMInitializerListener::OnTestStart+0x7a
>  (gtestmain.cpp:114)
> V  [jvm.dll+0x829b]  testing::internal::TestEventRepeater::OnTestStart+0x5b
>  (gtest.cc:2979)
> V  [jvm.dll+0x6874]  testing::TestInfo::Run+0x54  (gtest.cc:2312)
> V  [jvm.dll+0x6d8f]  testing::TestCase::Run+0xbf  (gtest.cc:2445)
> V  [jvm.dll+0xbe9e]  testing::internal::UnitTestImpl::RunAllTests+0x25e
>  (gtest.cc:4316)
> V  [jvm.dll+0x2a410]  testing::internal::HandleSehExceptionsInMethodIfS
> upported<testing::internal::UnitTestImpl,bool>+0x40  (gtest.cc:2063)
> V  [jvm.dll+0x2493e]  testing::internal::HandleExceptionsInMethodIfSupp
> orted<testing::internal::UnitTestImpl,bool>+0x5e  (gtest.cc:2114)
> V  [jvm.dll+0xb0e9]  testing::UnitTest::Run+0xe9  (gtest.cc:3929)
> V  [jvm.dll+0x2d0bf]  RUN_ALL_TESTS+0xf  (gtest.h:2289)
> V  [jvm.dll+0x2cc76]  runUnitTestsInner+0x1f6  (gtestmain.cpp:249)
> V  [jvm.dll+0x2c958]  runUnitTests+0x48  (gtestmain.cpp:319)
> C  [gtestLauncher.exe+0x1011]  main+0x11  (gtestlauncher.cpp:32)
> C  [gtestLauncher.exe+0x1182]  __tmainCRTStartup+0x122  (crtexe.c:555)
> C  [KERNEL32.DLL+0x138f4]
> C  [ntdll.dll+0x65de3]
> C  [ntdll.dll+0x65dae]
>
> .. which is implemented for Windows, but pipes are laid to plug in other
> platforms as well (Decoder::get_source_info())
>
> ----
>
> Tell me what you think.
>
> Thanks, and Kind Regards, Thomas
>
>
>
>
>


More information about the hotspot-runtime-dev mailing list