Use of C++ dynamic global object initialization with thread guards

Florian Weimer fweimer at redhat.com
Tue Feb 6 20:31:56 UTC 2024


* Kim Barrett:

>> On Dec 6, 2023, at 5:51 AM, Florian Weimer <fweimer at redhat.com> wrote:
>> 
>> * Kim Barrett:
>> 
>>>> The implementation of __cxa_guard_acquire is not entirely trivial
>>>> because it detects recursive initialization and throws
>>>> __gnu_cxx::recursive_init_error, which means that it pulls in the C++
>>>> unwinder (at least with a traditional GNU/Linux build of libstdc++.a).
>>> 
>>> Does it?  Seems like it shouldn’t.  We build with -fno-exceptions, and
>>> the definition of throw_recursive_init_exception is conditionalized on
>>> __cpp_exceptions, only throwing when that macro is defined.  It calls
>>> __builtin_trap() if that macro isn’t defined.
>> 
>> With upstream GCC (and presumably most distributions), there's one
>> libstdc++.a with one implementation of __cxa_guard_acquire, and it's
>> built with exception support.
>> 
>> It's supposed to be possible to build libstdc++ without exception
>> support, but upstream GCC doesn't do this automatically for you if the
>> target supports exception handling.  In principle, the GCC specs
>> mechanism allows you to treat -fno-exceptions as a linker flag and link
>> against a custom no-exceptions build of libstdc++.a.
>> 
>> Maybe this is what your toolchain is doing if you don't see the unwinder
>> symbols in your builds?  It should be easy enough to check if you have a
>> build with a symbol table: look for a call to __cxa_throw in the
>> disassembly of __cxa_guard_acquire.cold or __cxa_guard_acquire.  One of
>> our builds looks like this:
>
> I've verified that the same is happening in Oracle builds.  We don't build an
> exception-disabled libstdc++ as part of our devkit either.
>
> So my next question is, exactly what is the harm, and how serious is it? So
> far, I don't know of anyone noticing a problem arising from this.

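For context, the mechanism under discussion: for every dynamically
initialized object with static storage duration, the compiler emits
roughly the guard sequence below, which is what pulls
__cxa_guard_acquire into the link in the first place.  This is a
sketch of the Itanium C++ ABI scheme written as C, not libstdc++'s
actual implementation; it ignores the atomic ordering of the fast-path
load, and run_initializer stands in for the real dynamic initializer.

  #include <stdint.h>

  /* Guard functions provided by libstdc++.a (Itanium C++ ABI).  */
  extern int __cxa_guard_acquire (int64_t *guard);
  extern void __cxa_guard_release (int64_t *guard);

  static int64_t guard;

  static void run_initializer (void)
  {
    /* Stand-in for the object's dynamic initializer.  */
  }

  void lazy_init (void)
  {
    /* Fast path: the first guard byte is set once initialization has
       completed; the slow path serializes concurrent initializers and
       detects recursive initialization.  */
    if (*(volatile char *) &guard == 0 && __cxa_guard_acquire (&guard))
      {
        run_initializer ();
        __cxa_guard_release (&guard);
      }
  }
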
I had hoped that I could introduce this piece by piece. 8-)

I was investigating a way to backdate the glibc version requirement to
glibc 2.12 (or glibc 2.17 for AArch64 and POWER) without a devkit.  The
idea is to use a system toolchain on a modern system (I used glibc 2.34
with GCC 11 on RHEL 9) and end up with a binary that can run on ancient
systems.  This is based on the observation that OpenJDK uses a subset of
glibc which has a stable ABI across symbol versions (no glob, for
example), so it's only necessary to supply stub .so files for linking,
plus a few statically linked stubs for the old __xstat interfaces.  The
system headers can be used as is for compiling OpenJDK.  Even the
current glibc startup files work because the main executables do not
depend on the execution of their ELF constructors.  Among the shared
objects linked in directly, only glibc has symbol versioning
information, so no other stubs were needed.  All in all, it was
pleasantly straightforward.

But I did not expect the significant dependencies on libstdc++.  The
__cxa_guard_acquire function is just one aspect of this.  It forced me
to stick with a glibc 2.34 system because with GCC 12/glibc 2.35 and
later, the ELF-specific part of the unwinder is gone from libgcc_eh.a
and is now dynamically linked from glibc (via the _dl_find_object
function).  Obviously, _dl_find_object does not exist in older glibc
versions, so this rather trivial backdating approach does not work.  I
could supply an always-failing stub for _dl_find_object (see the sketch
after this paragraph) to address this issue because we do not expect
unwinding to occur, and this should allow building against later
glibc/GCC combinations.  It's still ugly because
it's not a forward-looking change.  With extensive dependencies on
libstdc++ functionality and more expected to come, glibc or GCC updates
will likely require further changes.  If it were just about adhering to
-fno-exceptions, we could probably change GCC upstream to link with a
-fno-exceptions build of libstdc++.a, turning -fno-exceptions into a
linker flag as well (I already asked).  But based on your comments, I
get the impression that limited use of libstdc++ is probably a temporary
affair.
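The stub in question could be as small as this -- a sketch, assuming
unwinding never actually runs, so uniformly failing lookups are
acceptable:

  /* Always-failing replacement for _dl_find_object (glibc 2.35+).
     The real function returns 0 on success and -1 if no mapped object
     contains the address; the stub always reports failure.  */
  struct dl_find_object;

  int _dl_find_object (void *address, struct dl_find_object *result)
  {
    (void) address;
    (void) result;
    return -1;
  }
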

A complicated dependency that cannot simply be stubbed out already
exists: thread_local variables with destructors introduce a dependency
on __cxa_thread_atexit_impl at GLIBC_2.18.  I believe use of thread_local
destructors is probably not worth the hidden complexity.  Hotspot
already uses pthread_key_create with a destructor callback function
elsewhere.  For deterministic shutdown ordering independent of glibc
version, all shutdown activities should probably be called from there.
(On glibc 2.17 and earlier, GCC uses a pthread_key_create-based
emulation internally, whose destructor may run in a different order
relative to the other pthread_key_create destructors.  That seems an
unnecessary divergence even with the existing devkit builds.)  For my
experiment I
just replaced the thread_local destructor callback with another
pthread_key_create (similar to what must happen with the devkit builds).
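A sketch of that pattern, with illustrative names rather than
Hotspot's actual code:

  #include <pthread.h>
  #include <stdlib.h>

  static pthread_key_t state_key;
  static pthread_once_t state_once = PTHREAD_ONCE_INIT;

  /* Called at thread exit for every thread whose key value is
     non-null, much as the thread_local destructor would be.  */
  static void destroy_state (void *state)
  {
    free (state);
  }

  static void make_key (void)
  {
    (void) pthread_key_create (&state_key, destroy_state);
  }

  /* Lazily attach per-thread state; the callback above replaces the
     thread_local destructor and avoids the
     __cxa_thread_atexit_impl at GLIBC_2.18 dependency.  */
  static void *get_state (void)
  {
    pthread_once (&state_once, make_key);
    void *state = pthread_getspecific (state_key);
    if (state == NULL)
      {
        state = calloc (1, 64);  /* placeholder per-thread payload */
        (void) pthread_setspecific (state_key, state);
      }
    return state;
  }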

The stub generation is really simple (see the sketch below), much
easier than maintaining a full toolchain (e.g., if you use a system
toolchain, you get the recent-ish AArch64 backend changes and
subsequent fixes automatically; they are not included in GCC 11.3).
But it seems the idea is rooted in a false premise regarding Hotspot's
libstdc++ usage.
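To show what I mean: a stub .so only has to get symbol names and
version tags right; the bodies are irrelevant because it is used at
link time only and never loaded.  A rough sketch with an abbreviated
symbol list (the real list would be extracted from a reference glibc
build):

  /* stub.c -- only symbols and version tags matter.  */
  void malloc (void) {}
  void free (void) {}

  /* stub.map -- additional version nodes follow the same pattern.  */
  GLIBC_2.2.5 {
    global:
      malloc; free;
  };

  $ gcc -shared -nostdlib -Wl,--version-script=stub.map \
        -Wl,-soname,libc.so.6 -o libc.so.6 stub.c

The resulting libc.so.6 goes into a directory that is searched before
the real one at link time; thanks to the soname, the built binary then
records only versioned references that glibc 2.12 can satisfy.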

I hope this explains why I was looking into this.

Thanks,
Florian


