RFR(m): 8185712: [windows] Multiple issues with the native symbol decoder

Ioi Lam ioi.lam at oracle.com
Wed Aug 16 15:45:08 UTC 2017



On 8/16/17 8:12 AM, Thomas Stüfe wrote:
> Hi Ioi,
>
> On Wed, Aug 16, 2017 at 4:43 PM, Ioi Lam <ioi.lam at oracle.com 
> <mailto:ioi.lam at oracle.com>> wrote:
>
>     Hi Thomas,
>
>     I am planning to review this but there's a lot of changes so it
>     would take me a little while :-(
>
>
> thank you!
>
> I currently wonder whether this is too massive and if I should split 
> this up. At least the centralized dbghelp.dll handling could easily 
> moved into an own patch, that would make this patch easier to digest. 
> What do you think?
I think having multiple smaller patches would be better. I would do a 
subset of our nightly tests before pushing, and then we can let our 
nightly test run for a couple of weeks to make sure no issues come up. 
Then we can try the next patch, and so forth.

Given this is an area that's not used often, and must perform correctly 
when "used" (e.g., where a VM crash state needs to be correctly logged), 
I think we need to take extra precautions. I hope this wouldn't mean too 
much extra work for you.
>
>     One quick question - what kind of testing has been done on these
>     changes? My main worry is (1) stability during normal execution,
>     (2) impacts to hs_err file generation.
>
>
> I built windows x86 and x64, slodebug and release. Ran gtests 
> manually. Our nightly builds ran (TCK, hotspot jtreg) with no problems 
> I could attribute to my patch.
>
> I also wrote a small selftest (SymbolEngine::selftest()) atop of the 
> ones I added to test_os.cpp which lives in symbol_api.cpp and stresses 
> the SymbolEngine class from the inside. I did not add this to the 
> webrev. For one because the webrev was already large and also because 
> this test was not a gtest but a selftest inside the VM and I did not 
> think you guys would accept this.
If the code size is not too big, maybe we can add it to whitebox? That 
way we can start it using jtreg. It would be kind of like the various 
<Class>::verify() routines in the JVM, and can be restricted in 
non-product builds if necessary.

Thanks
- Ioi
>
>     I can see the improvements in the hs_err call stack, but would
>     there be any regression in corner cases? How can we test for that?
>
>
> I think the code with my patch is way more stable than before, that 
> was the aim of this patch. One could write loads of regression tests 
> for corner cases.
>
> We actually do this at SAP with our own fork - we have extensive tests 
> for the hs-err file generation. But from experience I know that these 
> kind of tests are difficult to get 100% solid. For instance, one thing 
> we do at SAP is to the VM crash in many colorful ways and check the 
> callstacks for validity. However, this is fragile and depends on many 
> factors - how the compiler optimizes, whether we build with debug 
> infos etc. I hesitate to add tests like this to the OpenJDK which 
> causes sporadic work for maintainers, everyone would hate me :)
>
> Thanks, Thomas
>
>
>     Thanks
>
>     - Ioi
>
>
>
>     On 8/13/17 8:14 AM, Thomas Stüfe wrote:
>
>         Dear all,
>
>         May I please have reviews for the following patch.
>
>         Issue: https://bugs.openjdk.java.net/browse/JDK-8185712
>         <https://bugs.openjdk.java.net/browse/JDK-8185712>
>         Webrev:
>         http://cr.openjdk.java.net/~stuefe/webrevs/8185712-windows-improve-native-symbol-resolver/webrev.00/webrev/
>         <http://cr.openjdk.java.net/%7Estuefe/webrevs/8185712-windows-improve-native-symbol-resolver/webrev.00/webrev/>
>
>         This fix makes the native symbol decoder on Windows more
>         robust. This
>         increases the chance of getting useful error files. It also
>         refactors the
>         code and contains some useful new features.
>
>         ---
>
>         Native symbol resolution on windows is done via the group of
>         "SymXX" APIs
>         exported from dbghelp.dll. Currently this is done via a layer of
>         abstraction in decoder.cpp, and a "WindowsDecoder" class in
>         decoder_windows.cpp. This class can be instantianted twice, so
>         there can
>         exist two objects of this class.
>
>         The bugs:
>
>         1) Functions from dbghelp.dll are not threadsafe; calls to
>         them must be
>         synchronized. But dbghelp.dll could be used by the two live
>         WindowsDecoder
>         objects in parallel from different threads. In addition to that,
>         dbghelp.dll functions are used from within os_windows.cpp,
>         when writing
>         minidumps.
>
>         2) the SymXX APIs need to be initialized by a call to
>         SymInitialize(). This
>         can only happen once. The way it is now, each of the two
>         WindowsDecoder
>         objects will call SymInitialize. The second object to do this
>         will fail and
>         hence not be usable. In practice this means if in the VM
>         someone called
>         e.g. dll_from_address_name() - which will initialize and use
>         the shared
>         WindowsDecoder object - and then the VM crashes, the hs-err
>         file will
>         contain no useful stack because that would require the second
>         WindowsDecoder object, which would get initialized after the
>         shared one,
>         and its initialization would fail.
>
>         3) Initialization dependencies:
>            - when building the pdb search path
>         (WindowsDecoder::initialize),
>         Arguments::get_java_home() is used to deduce the jdk bin directory
>         containing the jdk shared objects. This will crash if invoked
>         before the
>         system properties are set.
>            - Decoder::decode() calls (via the Monitor lock)
>         Thread::current(). This
>         means the code will not work during initialization and where
>         Thread::current is not set (e.g. in an unattached thread).
>         Admittedly a bit
>         theoretical, as this only affects the shared WindowsDecoder
>         object, which
>         is not used during error reporting.
>
>         4) WindowsDecoder::initialize(), pdb search path: Code uses
>         MAX_PATH in
>         various places, which is wrong. NTFS paths can be longer than
>         that.
>            - when calling SymGetSearchPath, code assumes a maximum
>         path size of
>         MAX_PATH for the *combined* size of all directory names, which
>         may be way
>         too small. Truncation is not handled, code will silently fail
>         if output
>         buffer is too small, resulting in the existing pdb path to be
>         overwritten
>         instead of being preserved.
>            - GetModuleFileName(): similar problem, output buffer is
>         MAX_PATH len,
>         which may be too small. Truncation will lead to wrong results.
>            - Throughout the function "strncat" is used to assemble the
>         search path,
>         but is used wrong and does not guard against buffer overflows
>         (if that was
>         the intent - otherwise, why not just use strcat?). The last
>         argument to
>         strncat should be the remaining space in the destination
>         buffer, not the
>         length of the source string.
>
>         5) WindowsDecoder::decode(): We call SymGetSymFromAddr64()
>         using an output
>         buffer of size MAX_PATH - which makes no sense at all, as we
>         are retrieving
>         a symbol name, not a file name. Truncation is not handled.
>         This is actually
>         dangerous, because SymGetSymFromAddr64() handles truncations
>         sloppily, it
>         will return success and fill the output buffer completely, so
>         on truncation
>         the symbol name will not be zero terminated.
>
>         In addition to the bugs, there are a number of things which
>         could be done
>         better or simpler:
>
>         a) Setting the search path could be simplified and made more
>         useful if we
>         would just add the directory of every loaded module to the
>         search path
>         (currently we just add the two jdk bin directories, which also
>         exposes us
>         to initialization dependency, see (3)). This would be more
>         useful as a
>         common convention is to put pdb files beside binaries, and
>         that way we
>         would catch all those. Including, but not limited to, our own
>         jdk pdb files.
>
>         b) Dlls can be loaded and unloaded, and it would be nice to
>         have an updated
>         pdb search path - e.g. in case a late-loaded third party JNI
>         library
>         crashes. So, it would be nice to run (a) whenever a library is
>         loaded or
>         unloaded, and for the process to be fast if nothing changed
>         (e.g. if the
>         new DLL was loaded from a directory which is already part of
>         the search
>         path).
>
>         c) It would be nice to have file name and line number in the
>         callstack, too.
>
>         d) As pointed out in JDK-8144855, the function
>         Decoder::can_decode_C_frame_in_vm() is not necessary. This
>         should be
>         handled in WindowsDecoder instead.
>
>         ---
>
>         What I did in this fix:
>
>         - I pulled out dbghelp.dll handling into an own centralized
>         singleton
>         (DbgHelpLoader). This class takes care of loading the Dll and
>         synchonizes
>         access to all its functions, thus solving (1). DbgHelpLoader
>         now replaces
>         all places where before the DLL was loaded manually.
>
>         - Atop of DbgHelpLoader there is another new singleton class,
>         "SymbolEngine", which wraps the life cycle of the symbol APIs.
>         This solves
>         problem (2). In addition to that, this class is the new
>         interface to the
>         SymXX APIs.
>
>         - this means that the existing code in decoder.cpp, which
>         instantiates two
>         Decode objects and synchronizes access to one of them, makes
>         no sense for
>         Windows. That whole layer is now bypassed in Windows - the static
>         Decoder::xxx() functions are now directly implemented in
>         decoder_windows.cpp and access the SymbolEngine singleton
>         without any added
>         layers inbetween.
>
>         - SymbolEngine contains a new improved way to assemble the pdb
>         search path
>         as described in (a) and (b). We iterate loaded modules,
>         extract directories
>         and add them to the search path. Implementation takes care to
>         not do
>         unnecessary work (avoiding changing the path if list of loaded
>         modules is
>         unchanged, or if only Dlls from known directories were loaded).
>
>         - We now recalculate the pdb seach path if a DLL was loaded (via
>         os::dll_load). That makes it possible to debug third party
>         dlls which do
>         not reside in jdk directories and were loaded after the
>         SymbolEngine was
>         initialized.
>
>         - When decoding a symbol from an unknown address and the
>         decode fails, we
>         attempt to rebuild the pdb search path and attempt again.
>
>         - Care was taken for this code to be robust. It may be called
>         pre-initialization, post-shutdown or under error conditions
>         (low memory,
>         out of stack etc), so no VM infrastructure was used and memory
>         use is
>         frugal (dynamic buffers allocated off the stack and reused
>         were possible).
>
>         - I did away with the "Decoder::can_decode_C_frame_in_vm()"
>         code. See
>         argumentation in JDK-8144855. To me, it makes no sense. The
>         function is a
>         workaround the problem that, when faced with stripped binaries
>         which only
>         have debug symbols for public functions, symbol info may be
>         confusing
>         (public symbols + large offsets). But this is not done
>         consistently, and
>         most programmers know not to trust symbols with large offsets.
>
>         - Finally, we now have source info in the callstack :)
>
>         Stack: [0x00840000,0x00890000],  sp=0x0088f884, free space=318k
>         Native frames: (J=compiled Java code, A=aot compiled Java code,
>         j=interpreted, Vv=VM code, C=native code)
>         V  [jvm.dll+0xa26903] VMError::controlled_crash+0x2a3
>         (vmerror.cpp:1683)
>         V  [jvm.dll+0xa2664e] VMError::test_error_handler+0xe
>         (vmerror.cpp:1632)
>         V  [jvm.dll+0x6af796]  JNI_CreateJavaVM_inner+0x196 (jni.cpp:3988)
>         V  [jvm.dll+0x6af5c2]  JNI_CreateJavaVM+0x52 (jni.cpp:4036)
>         V  [jvm.dll+0x2cdfc]  init_jvm+0xdc (gtestmain.cpp:94)
>         V  [jvm.dll+0x2d18a]  JVMInitializerListener::OnTestStart+0x7a
>           (gtestmain.cpp:114)
>         V  [jvm.dll+0x829b] 
>         testing::internal::TestEventRepeater::OnTestStart+0x5b
>           (gtest.cc:2979)
>         V  [jvm.dll+0x6874]  testing::TestInfo::Run+0x54 (gtest.cc:2312)
>         V  [jvm.dll+0x6d8f]  testing::TestCase::Run+0xbf (gtest.cc:2445)
>         V  [jvm.dll+0xbe9e] 
>         testing::internal::UnitTestImpl::RunAllTests+0x25e
>           (gtest.cc:4316)
>         V  [jvm.dll+0x2a410]
>          
>         testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,bool>+0x40
>           (gtest.cc:2063)
>         V  [jvm.dll+0x2493e]
>          
>         testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,bool>+0x5e
>           (gtest.cc:2114)
>         V  [jvm.dll+0xb0e9]  testing::UnitTest::Run+0xe9 (gtest.cc:3929)
>         V  [jvm.dll+0x2d0bf]  RUN_ALL_TESTS+0xf (gtest.h:2289)
>         V  [jvm.dll+0x2cc76]  runUnitTestsInner+0x1f6 (gtestmain.cpp:249)
>         V  [jvm.dll+0x2c958]  runUnitTests+0x48 (gtestmain.cpp:319)
>         C  [gtestLauncher.exe+0x1011]  main+0x11 (gtestlauncher.cpp:32)
>         C  [gtestLauncher.exe+0x1182] __tmainCRTStartup+0x122 
>         (crtexe.c:555)
>         C  [KERNEL32.DLL+0x138f4]
>         C  [ntdll.dll+0x65de3]
>         C  [ntdll.dll+0x65dae]
>
>         .. which is implemented for Windows, but pipes are laid to
>         plug in other
>         platforms as well (Decoder::get_source_info())
>
>         ----
>
>         Tell me what you think.
>
>         Thanks, and Kind Regards, Thomas
>
>
>



More information about the hotspot-runtime-dev mailing list