RFR: 8130212: Thread::current() might access freed memory on Solaris

Mon Aug 3 06:01:37 UTC 2015

Hi Vladimir,

On 31/07/2015 12:12 PM, Vladimir Kozlov wrote:
>  > I can add a 32-bit version of the change but am not sure how to test it.
>  > Can we still easily build 32-bit Solaris or do I need to hack the build
>  > systems somehow?
>
> You have to build first 32-bit 8u60 (using 7u80, for example, as boot
> jdk). And then use it as boot jdk for jdk9 32-bit build. Both builds
> have to be configured with --with-target-bits=32.
>
> And there are no way to do that in JPRT as I know. Only manually :(
>
> If it is hard for you I will manage (will prepare patch later).
>
> So I am fine with current code if you don't want to spend more time on it.

I tried applying my patch to jkd8u-dev and performing the 32-bit changes 
there to test it out. But when I did the 32-bit build the resulting 
binaries would not even run!

 > ./b00/se-solaris-i586-internal/images/j2sdk-image/bin/java -client 
-version
Error: dl failure on line 893
Error: failed 
/scratch/dh198349/jdk8u-dev/build/b00/se-solaris-i586-internal/images/j2sdk-image/jre/lib/i386/client/libjvm.so, 
because ld.so.1: java: fatal: 
/scratch/dh198349/jdk8u-dev/build/b00/se-solaris-i586-internal/images/j2sdk-image/jre/lib/i386/client/libjvm.so: 
open failed: No such file or directory

 > find b00/se-solaris-i586-internal/images/j2sdk-image/ -name libjvm.so
b00/se-solaris-i586-internal/images/j2sdk-image/jre/lib/i386/server/libjvm.so
b00/se-solaris-i586-internal/images/j2sdk-image/jre/lib/i386/client/libjvm.so

I have no idea what is going on. So, sorry, but I won't be doing the 
32-bit variant.

David
-----

> Thanks,
> Vladimir
>
> On 7/30/15 2:59 PM, David Holmes wrote:
>> Hi Vladimir,
>>
>> On 31/07/2015 4:30 AM, Vladimir Kozlov wrote:
>>> On 7/29/15 9:40 PM, David Holmes wrote:
>>>> Hi Vladimir,
>>>>
>>>> On 30/07/2015 9:45 AM, Vladimir Kozlov wrote:
>>>>> David,
>>>>>
>>>>> I don't see changes in macroAssembler_sparc.cpp for get_thread().
>>>>
>>>> There's no need it already simply calls ThreadLocalStorage::thread
>>>> (with an indirection for debug builds)
>>>
>>> You are right.
>>>
>>>>
>>>>> Did you look what assembler is generated for thread() method?:
>>>>>
>>>>> inline Thread* ThreadLocalStorage::thread()  {
>>>>>    return _thr_current;
>>>>> }
>>>>
>>>> In the object file it generates code of the form (for x86):
>>>>
>>>>   leaq   +0x0(%rip),%rdi
>>>>
>>>> followed by a call that is translated as a call to _tls_get_addr. But
>>>> there is a lot of link-edit time manipulation of
>>>> the code. I don't know how to find the exact runtime code.
>>>
>>> Yes, I realized it will be difficult to reproduce what C++ compiler and
>>> liker will do.
>>>
>>>>
>>>>>
>>>>> It would be nice to avoid runtime call in assembler if we know address
>>>>> of _thr_current.
>>>>
>>>> But the address itself would be a per-thread variable that we'd need
>>>> to get from somewhere ??
>>>
>>> Yes, yes, it is true. And we want to avoid any tables which led to this
>>> problem.
>>> Okay, since it is slow path we can do call.
>>>
>>> Last thing. Can you preserve separate versions for 32- and 64-bits?
>>> Sometimes I have to build 32-bit Sol-x86 for debugging with dbx.
>>
>> I was under the impression that 32-bit Solaris support had been flagged
>> for deletion.
>>
>> I can add a 32-bit version of the change but am not sure how to test it.
>> Can we still easily build 32-bit Solaris or do I need to hack the build
>> systems somehow?
>>
>> Thanks,
>> David
>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 7/29/15 3:38 PM, Christian Thalinger wrote:
>>>>>>
>>>>>>> On Jul 29, 2015, at 3:25 PM, David Holmes <david.holmes at oracle.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> <replying on hotspot-dev>
>>>>>>>
>>>>>>> On 30/07/2015 1:28 AM, Christian Thalinger wrote:
>>>>>>>> Now that threadLS_solaris_sparc.{cpp,hpp} and
>>>>>>>> threadLS_solaris_x86.{cpp,hop} look exactly the same it would be
>>>>>>>> nice to merge them into threadLS_solaris.{cpp,hpp}.
>>>>>>>
>>>>>>> In the next phase:
>>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8132510
>>>>>>>
>>>>>>> these files will disappear completely. Can this wait till then?
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>>> On Jul 28, 2015, at 10:56 PM, David Holmes
>>>>>>>>> <david.holmes at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>> Moved to hotspot-dev so the compiler folk also see this for the
>>>>>>>>> MacroAssembler changes.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> On 29/07/2015 3:53 PM, David Holmes wrote:
>>>>>>>>>> I forgot to credit Dave Dice with the suggestion to modernize
>>>>>>>>>> this
>>>>>>>>>> code.
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> On 29/07/2015 3:46 PM, David Holmes wrote:
>>>>>>>>>>> Summary: replace complex custom code for maintaining
>>>>>>>>>>> ThreadLocalStorage
>>>>>>>>>>> with compiler supported thread-local variables (Solaris only)
>>>>>>>>>>>
>>>>>>>>>>> This is a non-public bug so let me explain with some
>>>>>>>>>>> background, the
>>>>>>>>>>> bug, and then the fix - which involves lots of complex-code
>>>>>>>>>>> deletion and
>>>>>>>>>>> addition of some very simple code. :)
>>>>>>>>>>>
>>>>>>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8130212/webrev/
>>>>>>>>>>>
>>>>>>>>>>> In various parts of the runtime and in compiler generated
>>>>>>>>>>> code we
>>>>>>>>>>> need
>>>>>>>>>>> to get a reference to the (VM-level) Thread* of the currently
>>>>>>>>>>> executing
>>>>>>>>>>> thread. This is what Thread::current() returns. For performance
>>>>>>>>>>> reasons
>>>>>>>>>>> we also have a fast-path on 64-bit where the Thread* is stashed
>>>>>>>>>>> away in
>>>>>>>>>>> a register (g7 on sparc, r15 on x64).
>>>>>>>>>>>
>>>>>>>>>>> So Thread::current() is actually a slow-path mechanism and it
>>>>>>>>>>> delegates
>>>>>>>>>>> to ThreadLocalStorage::thread().
>>>>>>>>>>>
>>>>>>>>>>> On some systems ThreadLocalStorage::thread utilizes a caching
>>>>>>>>>>> mechanism
>>>>>>>>>>> to try and speed up access to the current thread. Otherwise it
>>>>>>>>>>> calls
>>>>>>>>>>> into yet another "slow" path which uses the available platform
>>>>>>>>>>> thread-specific-storage APIs.
>>>>>>>>>>>
>>>>>>>>>>> Compiled code also has a slow-path get_thread() method which
>>>>>>>>>>> uses
>>>>>>>>>>> assembly code to invoke the same platform
>>>>>>>>>>> thread-specific-storage
>>>>>>>>>>> APIs
>>>>>>>>>>> (in some cases - on sparc it simply calls
>>>>>>>>>>> ThreadLocalStorage::thread()).
>>>>>>>>>>>
>>>>>>>>>>> On Solaris 64-bit (which is all we support today) there is a
>>>>>>>>>>> simple
>>>>>>>>>>> 1-level thread cache which is an array of Thread*. If a thread
>>>>>>>>>>> doesn't
>>>>>>>>>>> find itself in the slot for the hash of its id it inserts itself
>>>>>>>>>>> there.
>>>>>>>>>>> As a thread terminates it clears out its ThreadLocalStorage
>>>>>>>>>>> values
>>>>>>>>>>> including any cached reference.
>>>>>>>>>>>
>>>>>>>>>>> The bug is that we have potential for a read-after-free error
>>>>>>>>>>> due to
>>>>>>>>>>> this code:
>>>>>>>>>>>
>>>>>>>>>>>    46   uintptr_t raw = pd_raw_thread_id();
>>>>>>>>>>>    47   int ix = pd_cache_index(raw);  // hashes id
>>>>>>>>>>>    48   Thread* candidate =
>>>>>>>>>>> ThreadLocalStorage::_get_thread_cache[ix];
>>>>>>>>>>>    49   if (candidate->self_raw_id() == raw) {
>>>>>>>>>>>    50     // hit
>>>>>>>>>>>    51     return candidate;
>>>>>>>>>>>    52   } else {
>>>>>>>>>>>    53     return
>>>>>>>>>>> ThreadLocalStorage::get_thread_via_cache_slowly(raw,
>>>>>>>>>>> ix);
>>>>>>>>>>>    54   }
>>>>>>>>>>>
>>>>>>>>>>> The problem is that the value read as candidate could be a
>>>>>>>>>>> thread
>>>>>>>>>>> that
>>>>>>>>>>> (after line 48) terminated and was freed. But line #49 then
>>>>>>>>>>> reads
>>>>>>>>>>> the
>>>>>>>>>>> raw id of that thread, which is then a read-after-free -
>>>>>>>>>>> which is
>>>>>>>>>>> a "Bad
>>>>>>>>>>> Thing (TM)".
>>>>>>>>>>>
>>>>>>>>>>> There's no simple fix for the caching code - you would need a
>>>>>>>>>>> completely
>>>>>>>>>>> different approach (or synchronization that would nullify the
>>>>>>>>>>> whole
>>>>>>>>>>> point of the cache).
>>>>>>>>>>>
>>>>>>>>>>> Now all this ThreadLocalStorage code is pretty old and was
>>>>>>>>>>> put in
>>>>>>>>>>> place
>>>>>>>>>>> to deal with inadequacies of the system provided
>>>>>>>>>>> thread-specific-storage
>>>>>>>>>>> API. In fact on Solaris we even by-pass the public API
>>>>>>>>>>> (thr_getspecific/thr_setspecific) when we can and implement our
>>>>>>>>>>> own
>>>>>>>>>>> version using lower-level APIs available in the T1/T2 threading
>>>>>>>>>>> libraries!
>>>>>>>>>>>
>>>>>>>>>>> In mid-2015 things have changed considerably and we have
>>>>>>>>>>> reliable
>>>>>>>>>>> and
>>>>>>>>>>> performant support for thread-local variables at the C+
>>>>>>>>>>> language-level.
>>>>>>>>>>> So the way to maintain the current thread is simply using:
>>>>>>>>>>>
>>>>>>>>>>>   // Declaration of thread-local variable
>>>>>>>>>>>   static __thread Thread * _thr_current
>>>>>>>>>>>
>>>>>>>>>>>   inline Thread* ThreadLocalStorage::thread()  {
>>>>>>>>>>>     return _thr_current;
>>>>>>>>>>>   }
>>>>>>>>>>>
>>>>>>>>>>>   inline void ThreadLocalStorage::set_thread(Thread* thread) {
>>>>>>>>>>>     _thr_current = thread;
>>>>>>>>>>>   }
>>>>>>>>>>>
>>>>>>>>>>> And all the complex ThreadLocalStorage code with caching etc all
>>>>>>>>>>> vanishes!
>>>>>>>>>>>
>>>>>>>>>>> For my next trick I plan to try and remove the
>>>>>>>>>>> ThreadLocalStorage
>>>>>>>>>>> class
>>>>>>>>>>> completely by using language-based thread-locals on all
>>>>>>>>>>> platforms. But
>>>>>>>>>>> for now this is just Solaris and so we still need the
>>>>>>>>>>> ThreadLocalStorage
>>>>>>>>>>> API. However a lot of that API is not needed any more on Solaris
>>>>>>>>>>> so I
>>>>>>>>>>> have excluded it from there in the shared code (ifndef SOLARIS).
>>>>>>>>>>> But to
>>>>>>>>>>> avoid changing other shared-code callsites of ThreadLocalStorage
>>>>>>>>>>> I've
>>>>>>>>>>> kept part of the API with trivial implementations on Solaris.
>>>>>>>>>>>
>>>>>>>>>>> Testing: JPRT
>>>>>>>>>>>           All hotspot regression tests
>>>>>>>>>>>
>>>>>>>>>>> I'm happy to run more tests but the nice thing about such
>>>>>>>>>>> low-level code
>>>>>>>>>>> is that if it is broken, it is always broken :) Every use of
>>>>>>>>>>> Thread::current or MacroAssembler::get_thread now hits this
>>>>>>>>>>> code.
>>>>>>>>>>>
>>>>>>>>>>> Performance: I've run a basic set of benchmarks that is readily
>>>>>>>>>>> available to me on our performance testing system. The best
>>>>>>>>>>> way to
>>>>>>>>>>> describe the result is neutral. There are some slight wins, and
>>>>>>>>>>> some
>>>>>>>>>>> slight losses, with most showing no statistical difference. And
>>>>>>>>>>> even the
>>>>>>>>>>> "wins" and "losses" are within the natural variations of the
>>>>>>>>>>> benchmarks.
>>>>>>>>>>> So a lot of complex code has been replaced by simple code and we
>>>>>>>>>>> haven't
>>>>>>>>>>> lost any observable performance - which seems like a win to me.
>>>>>>>>>>>
>>>>>>>>>>> Also product mode x64 libjvm.so has shrunk by 921KB - which is a
>>>>>>>>>>> little
>>>>>>>>>>> surprising but very nice.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>
>>>>>>