RFR(m): 8185712: [windows] Multiple issues with the native symbol decoder

Thomas Stüfe thomas.stuefe at gmail.com
Sun Aug 13 15:14:30 UTC 2017


Dear all,

May I please have reviews for the following patch.

Issue: https://bugs.openjdk.java.net/browse/JDK-8185712
Webrev:
http://cr.openjdk.java.net/~stuefe/webrevs/8185712-windows-improve-native-symbol-resolver/webrev.00/webrev/

This fix makes the native symbol decoder on Windows more robust. This
increases the chance of getting useful error files. It also refactors the
code and contains some useful new features.

---

Native symbol resolution on windows is done via the group of "SymXX" APIs
exported from dbghelp.dll. Currently this is done via a layer of
abstraction in decoder.cpp, and a "WindowsDecoder" class in
decoder_windows.cpp. This class can be instantianted twice, so there can
exist two objects of this class.

The bugs:

1) Functions from dbghelp.dll are not threadsafe; calls to them must be
synchronized. But dbghelp.dll could be used by the two live WindowsDecoder
objects in parallel from different threads. In addition to that,
dbghelp.dll functions are used from within os_windows.cpp, when writing
minidumps.

2) the SymXX APIs need to be initialized by a call to SymInitialize(). This
can only happen once. The way it is now, each of the two WindowsDecoder
objects will call SymInitialize. The second object to do this will fail and
hence not be usable. In practice this means if in the VM someone called
e.g. dll_from_address_name() - which will initialize and use the shared
WindowsDecoder object - and then the VM crashes, the hs-err file will
contain no useful stack because that would require the second
WindowsDecoder object, which would get initialized after the shared one,
and its initialization would fail.

3) Initialization dependencies:
  - when building the pdb search path (WindowsDecoder::initialize),
Arguments::get_java_home() is used to deduce the jdk bin directory
containing the jdk shared objects. This will crash if invoked before the
system properties are set.
  - Decoder::decode() calls (via the Monitor lock) Thread::current(). This
means the code will not work during initialization and where
Thread::current is not set (e.g. in an unattached thread). Admittedly a bit
theoretical, as this only affects the shared WindowsDecoder object, which
is not used during error reporting.

4) WindowsDecoder::initialize(), pdb search path: Code uses MAX_PATH in
various places, which is wrong. NTFS paths can be longer than that.
  - when calling SymGetSearchPath, code assumes a maximum path size of
MAX_PATH for the *combined* size of all directory names, which may be way
too small. Truncation is not handled, code will silently fail if output
buffer is too small, resulting in the existing pdb path to be overwritten
instead of being preserved.
  - GetModuleFileName(): similar problem, output buffer is MAX_PATH len,
which may be too small. Truncation will lead to wrong results.
  - Throughout the function "strncat" is used to assemble the search path,
but is used wrong and does not guard against buffer overflows (if that was
the intent - otherwise, why not just use strcat?). The last argument to
strncat should be the remaining space in the destination buffer, not the
length of the source string.

5) WindowsDecoder::decode(): We call SymGetSymFromAddr64() using an output
buffer of size MAX_PATH - which makes no sense at all, as we are retrieving
a symbol name, not a file name. Truncation is not handled. This is actually
dangerous, because SymGetSymFromAddr64() handles truncations sloppily, it
will return success and fill the output buffer completely, so on truncation
the symbol name will not be zero terminated.

In addition to the bugs, there are a number of things which could be done
better or simpler:

a) Setting the search path could be simplified and made more useful if we
would just add the directory of every loaded module to the search path
(currently we just add the two jdk bin directories, which also exposes us
to initialization dependency, see (3)). This would be more useful as a
common convention is to put pdb files beside binaries, and that way we
would catch all those. Including, but not limited to, our own jdk pdb files.

b) Dlls can be loaded and unloaded, and it would be nice to have an updated
pdb search path - e.g. in case a late-loaded third party JNI library
crashes. So, it would be nice to run (a) whenever a library is loaded or
unloaded, and for the process to be fast if nothing changed (e.g. if the
new DLL was loaded from a directory which is already part of the search
path).

c) It would be nice to have file name and line number in the callstack, too.

d) As pointed out in JDK-8144855, the function
Decoder::can_decode_C_frame_in_vm() is not necessary. This should be
handled in WindowsDecoder instead.

---

What I did in this fix:

- I pulled out dbghelp.dll handling into an own centralized singleton
(DbgHelpLoader). This class takes care of loading the Dll and synchonizes
access to all its functions, thus solving (1). DbgHelpLoader now replaces
all places where before the DLL was loaded manually.

- Atop of DbgHelpLoader there is another new singleton class,
"SymbolEngine", which wraps the life cycle of the symbol APIs. This solves
problem (2). In addition to that, this class is the new interface to the
SymXX APIs.

- this means that the existing code in decoder.cpp, which instantiates two
Decode objects and synchronizes access to one of them, makes no sense for
Windows. That whole layer is now bypassed in Windows - the static
Decoder::xxx() functions are now directly implemented in
decoder_windows.cpp and access the SymbolEngine singleton without any added
layers inbetween.

- SymbolEngine contains a new improved way to assemble the pdb search path
as described in (a) and (b). We iterate loaded modules, extract directories
and add them to the search path. Implementation takes care to not do
unnecessary work (avoiding changing the path if list of loaded modules is
unchanged, or if only Dlls from known directories were loaded).

- We now recalculate the pdb seach path if a DLL was loaded (via
os::dll_load). That makes it possible to debug third party dlls which do
not reside in jdk directories and were loaded after the SymbolEngine was
initialized.

- When decoding a symbol from an unknown address and the decode fails, we
attempt to rebuild the pdb search path and attempt again.

- Care was taken for this code to be robust. It may be called
pre-initialization, post-shutdown or under error conditions (low memory,
out of stack etc), so no VM infrastructure was used and memory use is
frugal (dynamic buffers allocated off the stack and reused were possible).

- I did away with the "Decoder::can_decode_C_frame_in_vm()" code. See
argumentation in JDK-8144855. To me, it makes no sense. The function is a
workaround the problem that, when faced with stripped binaries which only
have debug symbols for public functions, symbol info may be confusing
(public symbols + large offsets). But this is not done consistently, and
most programmers know not to trust symbols with large offsets.

- Finally, we now have source info in the callstack :)

Stack: [0x00840000,0x00890000],  sp=0x0088f884,  free space=318k
Native frames: (J=compiled Java code, A=aot compiled Java code,
j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0xa26903]  VMError::controlled_crash+0x2a3  (vmerror.cpp:1683)
V  [jvm.dll+0xa2664e]  VMError::test_error_handler+0xe  (vmerror.cpp:1632)
V  [jvm.dll+0x6af796]  JNI_CreateJavaVM_inner+0x196  (jni.cpp:3988)
V  [jvm.dll+0x6af5c2]  JNI_CreateJavaVM+0x52  (jni.cpp:4036)
V  [jvm.dll+0x2cdfc]  init_jvm+0xdc  (gtestmain.cpp:94)
V  [jvm.dll+0x2d18a]  JVMInitializerListener::OnTestStart+0x7a
 (gtestmain.cpp:114)
V  [jvm.dll+0x829b]  testing::internal::TestEventRepeater::OnTestStart+0x5b
 (gtest.cc:2979)
V  [jvm.dll+0x6874]  testing::TestInfo::Run+0x54  (gtest.cc:2312)
V  [jvm.dll+0x6d8f]  testing::TestCase::Run+0xbf  (gtest.cc:2445)
V  [jvm.dll+0xbe9e]  testing::internal::UnitTestImpl::RunAllTests+0x25e
 (gtest.cc:4316)
V  [jvm.dll+0x2a410]
 testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,bool>+0x40
 (gtest.cc:2063)
V  [jvm.dll+0x2493e]
 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,bool>+0x5e
 (gtest.cc:2114)
V  [jvm.dll+0xb0e9]  testing::UnitTest::Run+0xe9  (gtest.cc:3929)
V  [jvm.dll+0x2d0bf]  RUN_ALL_TESTS+0xf  (gtest.h:2289)
V  [jvm.dll+0x2cc76]  runUnitTestsInner+0x1f6  (gtestmain.cpp:249)
V  [jvm.dll+0x2c958]  runUnitTests+0x48  (gtestmain.cpp:319)
C  [gtestLauncher.exe+0x1011]  main+0x11  (gtestlauncher.cpp:32)
C  [gtestLauncher.exe+0x1182]  __tmainCRTStartup+0x122  (crtexe.c:555)
C  [KERNEL32.DLL+0x138f4]
C  [ntdll.dll+0x65de3]
C  [ntdll.dll+0x65dae]

.. which is implemented for Windows, but pipes are laid to plug in other
platforms as well (Decoder::get_source_info())

----

Tell me what you think.

Thanks, and Kind Regards, Thomas


More information about the hotspot-runtime-dev mailing list