RFR(xs): 8184339: Thread::current_or_null() should not assert if Posix TLS is not yet initialized

Mon Aug 7 03:26:52 UTC 2017

Hi Thomas,

On 1/08/2017 11:40 PM, Thomas Stüfe wrote:
> Hi David!
> 
> On Tue, Aug 1, 2017 at 9:33 AM, David Holmes <david.holmes at oracle.com 
> <mailto:david.holmes at oracle.com>> wrote:
> 
>     Hi Thomas,
> 
>     Back from vacation ...
> 
>     I appreciate the effort to fix this problem, however ... :)
> 
>         Thomas Stüfe thomas.stuefe at gmail.com <http://gmail.com>
>         Mon Jul 17 07:00:01 UTC 2017
>         Hi all,
> 
>         may I please have reviews + a sponsor for the following fix:
> 
>         bug: https://bugs.openjdk.java.net/browse/JDK-8184339
>         <https://bugs.openjdk.java.net/browse/JDK-8184339>
>         webrev:
>         http://cr.openjdk.java.net/~stuefe/webrevs/8184339-Thread-current-or-null-shall-not-assert/webrev.00/webrev/
>         <http://cr.openjdk.java.net/~stuefe/webrevs/8184339-Thread-current-or-null-shall-not-assert/webrev.00/webrev/>
> 
>         The problem is caused by the fact that Posix TLS cannot be used
>         before it
>         is initialized. It is initialized in os::init(). If we use Posix
>         TLS (eg
>         via Thread::current()) before, we assert. It is used now (since
>         JDK-8183925
>         <https://bugs.openjdk.java.net/browse/JDK-8183925
>         <https://bugs.openjdk.java.net/browse/JDK-8183925>>) before
>         os::init() (see
>         callstack in bug description).
> 
> 
>     ... those kind of initialization order changes are exactly what the
>     assertion was supposed to detect! It can be dangerous to use various
>     VM facilities "too soon" in the initialization sequence. The change
>     in JDK-8183925 needs a close examination IMHO before relaxing this
>     guard. :(
> 
> 
> I agree with you that using a VM facility "too soon" is the core of this 
> problem. But I am not sure that just forbidding that is a practical 
> solution, because these pre-init-uses have the tendency to creep up on you:
> 
> There are many functions in the VM which very basic and which are used 
> without regard for initialization sequence. os::malloc() is one example, 
> Thread::current_or_null() is another. It is difficult to find pre-init 
> uses with tests, especially if we have code paths which are only 
> executed on one platform, like in this case.
> 
> In this case, on AIX, we had two problems, Thread::current_or_null() was 
> asserting, then it was again used in error handling, asserting again, 
> which lead to an annoying recursion difficult to entangle (on AIX with 
> its less-than-optimal debugger at least).
> 
> To prevent this from happening, one would have to guard every 
> "Thread::current_or_null()" use with something like "if 
> ThreadLocalStorage.is_initialized()". But I think a more practical 
> solution would be to keep these basic functions safe to be called 
> anytime. And maybe make this a clear part of the interface, and test 
> this too. Similar to the Posix list of functions safe to be called in 
> signal handlers.

I don't agree :) I would prefer if nothing were executed prior to VM 
initialization - certainly "on load" actions and C++ static 
initialization actions are very fragile because there is nothing in any 
of that code to indicate that it is being executed in such an 
environment. So it is far too easy for people to make changes to code 
that is called from that environment in ways that are incompatible with 
that environment. Hence using asserts to try and catch when someone 
makes such a change.

The fact such assertion failures can trigger further problems with error 
reporting just reinforces the basic problem. Error reporting makes lots 
of assumptions about the state of things and should not be called "very 
early" in the VM init sequence.

> Thread::current_or_null() could return NULL before initialization, 
> os::malloc() could refrain from using NMT before its initialization ran, 
> and NMT in turn could tolerate "free/realloc-without-malloc" cases...
> 
> (Note that this discussion feels similar to the "CanUseSafeFetch()" 
> issue we had two years ago, and we did not agree then either :)

:)

Cheers,
David

> 
> 
>         There are two functions, Thread::current() and
>         Thread::current_or_null().
>         The latter should not assert if Posix TLS is not yet available
>         but return
>         NULL instead. This is what callers expect: this function is safe
>         to call,
>         but it might not return a valid Thread*.
> 
> 
>     Yes but because the current thread has not yet been set, not because
>     we're executing the code well before any VM initialization has been
>     performed! :(
> 
> 
> I think anyone using Thread::current_or_null() is aware that the thread 
> pointer returned may be NULL and is prepared to handle it, but does not 
> care for the reason it is NULL. Like, in error handling.
> 
>         Note that this issue currently only comes up at AIX, because AIX
>         is the
>         only platform using Posix TLS for Thread::current() - all other
>         platforms
>         use Compiler-based TLS (__thread variables).
> 
> 
>     It's also used in the mobile-dev project IIRC, but they don't have
>     JDK-8183925 applied.
> 
>         However, we want to keep the Posix TLS code path alive, so this
>         may also
>         affect other platforms. There have been compiler bugs in the
>         past (e.g. gcc
>         + glibc) leading to errors when using compiler-based TLS, so it
>         is good to
>         have a reliable fallback.
> 
> 
>     Definitely!
> 
> 
> Good to be in agreement here!
> Kind Regards, Thomas
> 
>     Cheers,
>     David
> 
>         Thanks,
> 
>         Thomas
> 
>