RFR: 8130212: Thread::current() might access freed memory on Solaris
Vladimir Kozlov
vladimir.kozlov at oracle.com
Mon Aug 3 16:12:21 UTC 2015
Okay, you can skip 32-bit. Thank you for trying.
Vladimir
On 8/2/15 11:01 PM, David Holmes wrote:
> Hi Vladimir,
>
> On 31/07/2015 12:12 PM, Vladimir Kozlov wrote:
>> > I can add a 32-bit version of the change but am not sure how to test it.
>> > Can we still easily build 32-bit Solaris or do I need to hack the build
>> > systems somehow?
>>
>> You have to build first 32-bit 8u60 (using 7u80, for example, as boot
>> jdk). And then use it as boot jdk for jdk9 32-bit build. Both builds
>> have to be configured with --with-target-bits=32.
>>
>> And there are no way to do that in JPRT as I know. Only manually :(
>>
>> If it is hard for you I will manage (will prepare patch later).
>>
>> So I am fine with current code if you don't want to spend more time on it.
>
> I tried applying my patch to jkd8u-dev and performing the 32-bit changes there to test it out. But when I did the 32-bit
> build the resulting binaries would not even run!
>
> > ./b00/se-solaris-i586-internal/images/j2sdk-image/bin/java -client -version
> Error: dl failure on line 893
> Error: failed
> /scratch/dh198349/jdk8u-dev/build/b00/se-solaris-i586-internal/images/j2sdk-image/jre/lib/i386/client/libjvm.so, because
> ld.so.1: java: fatal:
> /scratch/dh198349/jdk8u-dev/build/b00/se-solaris-i586-internal/images/j2sdk-image/jre/lib/i386/client/libjvm.so: open
> failed: No such file or directory
>
> > find b00/se-solaris-i586-internal/images/j2sdk-image/ -name libjvm.so
> b00/se-solaris-i586-internal/images/j2sdk-image/jre/lib/i386/server/libjvm.so
> b00/se-solaris-i586-internal/images/j2sdk-image/jre/lib/i386/client/libjvm.so
>
> I have no idea what is going on. So, sorry, but I won't be doing the 32-bit variant.
>
> David
> -----
>
>> Thanks,
>> Vladimir
>>
>> On 7/30/15 2:59 PM, David Holmes wrote:
>>> Hi Vladimir,
>>>
>>> On 31/07/2015 4:30 AM, Vladimir Kozlov wrote:
>>>> On 7/29/15 9:40 PM, David Holmes wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>> On 30/07/2015 9:45 AM, Vladimir Kozlov wrote:
>>>>>> David,
>>>>>>
>>>>>> I don't see changes in macroAssembler_sparc.cpp for get_thread().
>>>>>
>>>>> There's no need it already simply calls ThreadLocalStorage::thread
>>>>> (with an indirection for debug builds)
>>>>
>>>> You are right.
>>>>
>>>>>
>>>>>> Did you look what assembler is generated for thread() method?:
>>>>>>
>>>>>> inline Thread* ThreadLocalStorage::thread() {
>>>>>> return _thr_current;
>>>>>> }
>>>>>
>>>>> In the object file it generates code of the form (for x86):
>>>>>
>>>>> leaq +0x0(%rip),%rdi
>>>>>
>>>>> followed by a call that is translated as a call to _tls_get_addr. But
>>>>> there is a lot of link-edit time manipulation of
>>>>> the code. I don't know how to find the exact runtime code.
>>>>
>>>> Yes, I realized it will be difficult to reproduce what C++ compiler and
>>>> liker will do.
>>>>
>>>>>
>>>>>>
>>>>>> It would be nice to avoid runtime call in assembler if we know address
>>>>>> of _thr_current.
>>>>>
>>>>> But the address itself would be a per-thread variable that we'd need
>>>>> to get from somewhere ??
>>>>
>>>> Yes, yes, it is true. And we want to avoid any tables which led to this
>>>> problem.
>>>> Okay, since it is slow path we can do call.
>>>>
>>>> Last thing. Can you preserve separate versions for 32- and 64-bits?
>>>> Sometimes I have to build 32-bit Sol-x86 for debugging with dbx.
>>>
>>> I was under the impression that 32-bit Solaris support had been flagged
>>> for deletion.
>>>
>>> I can add a 32-bit version of the change but am not sure how to test it.
>>> Can we still easily build 32-bit Solaris or do I need to hack the build
>>> systems somehow?
>>>
>>> Thanks,
>>> David
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 7/29/15 3:38 PM, Christian Thalinger wrote:
>>>>>>>
>>>>>>>> On Jul 29, 2015, at 3:25 PM, David Holmes <david.holmes at oracle.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> <replying on hotspot-dev>
>>>>>>>>
>>>>>>>> On 30/07/2015 1:28 AM, Christian Thalinger wrote:
>>>>>>>>> Now that threadLS_solaris_sparc.{cpp,hpp} and
>>>>>>>>> threadLS_solaris_x86.{cpp,hop} look exactly the same it would be
>>>>>>>>> nice to merge them into threadLS_solaris.{cpp,hpp}.
>>>>>>>>
>>>>>>>> In the next phase:
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8132510
>>>>>>>>
>>>>>>>> these files will disappear completely. Can this wait till then?
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>>> On Jul 28, 2015, at 10:56 PM, David Holmes
>>>>>>>>>> <david.holmes at oracle.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Moved to hotspot-dev so the compiler folk also see this for the
>>>>>>>>>> MacroAssembler changes.
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> On 29/07/2015 3:53 PM, David Holmes wrote:
>>>>>>>>>>> I forgot to credit Dave Dice with the suggestion to modernize
>>>>>>>>>>> this
>>>>>>>>>>> code.
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> On 29/07/2015 3:46 PM, David Holmes wrote:
>>>>>>>>>>>> Summary: replace complex custom code for maintaining
>>>>>>>>>>>> ThreadLocalStorage
>>>>>>>>>>>> with compiler supported thread-local variables (Solaris only)
>>>>>>>>>>>>
>>>>>>>>>>>> This is a non-public bug so let me explain with some
>>>>>>>>>>>> background, the
>>>>>>>>>>>> bug, and then the fix - which involves lots of complex-code
>>>>>>>>>>>> deletion and
>>>>>>>>>>>> addition of some very simple code. :)
>>>>>>>>>>>>
>>>>>>>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8130212/webrev/
>>>>>>>>>>>>
>>>>>>>>>>>> In various parts of the runtime and in compiler generated
>>>>>>>>>>>> code we
>>>>>>>>>>>> need
>>>>>>>>>>>> to get a reference to the (VM-level) Thread* of the currently
>>>>>>>>>>>> executing
>>>>>>>>>>>> thread. This is what Thread::current() returns. For performance
>>>>>>>>>>>> reasons
>>>>>>>>>>>> we also have a fast-path on 64-bit where the Thread* is stashed
>>>>>>>>>>>> away in
>>>>>>>>>>>> a register (g7 on sparc, r15 on x64).
>>>>>>>>>>>>
>>>>>>>>>>>> So Thread::current() is actually a slow-path mechanism and it
>>>>>>>>>>>> delegates
>>>>>>>>>>>> to ThreadLocalStorage::thread().
>>>>>>>>>>>>
>>>>>>>>>>>> On some systems ThreadLocalStorage::thread utilizes a caching
>>>>>>>>>>>> mechanism
>>>>>>>>>>>> to try and speed up access to the current thread. Otherwise it
>>>>>>>>>>>> calls
>>>>>>>>>>>> into yet another "slow" path which uses the available platform
>>>>>>>>>>>> thread-specific-storage APIs.
>>>>>>>>>>>>
>>>>>>>>>>>> Compiled code also has a slow-path get_thread() method which
>>>>>>>>>>>> uses
>>>>>>>>>>>> assembly code to invoke the same platform
>>>>>>>>>>>> thread-specific-storage
>>>>>>>>>>>> APIs
>>>>>>>>>>>> (in some cases - on sparc it simply calls
>>>>>>>>>>>> ThreadLocalStorage::thread()).
>>>>>>>>>>>>
>>>>>>>>>>>> On Solaris 64-bit (which is all we support today) there is a
>>>>>>>>>>>> simple
>>>>>>>>>>>> 1-level thread cache which is an array of Thread*. If a thread
>>>>>>>>>>>> doesn't
>>>>>>>>>>>> find itself in the slot for the hash of its id it inserts itself
>>>>>>>>>>>> there.
>>>>>>>>>>>> As a thread terminates it clears out its ThreadLocalStorage
>>>>>>>>>>>> values
>>>>>>>>>>>> including any cached reference.
>>>>>>>>>>>>
>>>>>>>>>>>> The bug is that we have potential for a read-after-free error
>>>>>>>>>>>> due to
>>>>>>>>>>>> this code:
>>>>>>>>>>>>
>>>>>>>>>>>> 46 uintptr_t raw = pd_raw_thread_id();
>>>>>>>>>>>> 47 int ix = pd_cache_index(raw); // hashes id
>>>>>>>>>>>> 48 Thread* candidate =
>>>>>>>>>>>> ThreadLocalStorage::_get_thread_cache[ix];
>>>>>>>>>>>> 49 if (candidate->self_raw_id() == raw) {
>>>>>>>>>>>> 50 // hit
>>>>>>>>>>>> 51 return candidate;
>>>>>>>>>>>> 52 } else {
>>>>>>>>>>>> 53 return
>>>>>>>>>>>> ThreadLocalStorage::get_thread_via_cache_slowly(raw,
>>>>>>>>>>>> ix);
>>>>>>>>>>>> 54 }
>>>>>>>>>>>>
>>>>>>>>>>>> The problem is that the value read as candidate could be a
>>>>>>>>>>>> thread
>>>>>>>>>>>> that
>>>>>>>>>>>> (after line 48) terminated and was freed. But line #49 then
>>>>>>>>>>>> reads
>>>>>>>>>>>> the
>>>>>>>>>>>> raw id of that thread, which is then a read-after-free -
>>>>>>>>>>>> which is
>>>>>>>>>>>> a "Bad
>>>>>>>>>>>> Thing (TM)".
>>>>>>>>>>>>
>>>>>>>>>>>> There's no simple fix for the caching code - you would need a
>>>>>>>>>>>> completely
>>>>>>>>>>>> different approach (or synchronization that would nullify the
>>>>>>>>>>>> whole
>>>>>>>>>>>> point of the cache).
>>>>>>>>>>>>
>>>>>>>>>>>> Now all this ThreadLocalStorage code is pretty old and was
>>>>>>>>>>>> put in
>>>>>>>>>>>> place
>>>>>>>>>>>> to deal with inadequacies of the system provided
>>>>>>>>>>>> thread-specific-storage
>>>>>>>>>>>> API. In fact on Solaris we even by-pass the public API
>>>>>>>>>>>> (thr_getspecific/thr_setspecific) when we can and implement our
>>>>>>>>>>>> own
>>>>>>>>>>>> version using lower-level APIs available in the T1/T2 threading
>>>>>>>>>>>> libraries!
>>>>>>>>>>>>
>>>>>>>>>>>> In mid-2015 things have changed considerably and we have
>>>>>>>>>>>> reliable
>>>>>>>>>>>> and
>>>>>>>>>>>> performant support for thread-local variables at the C+
>>>>>>>>>>>> language-level.
>>>>>>>>>>>> So the way to maintain the current thread is simply using:
>>>>>>>>>>>>
>>>>>>>>>>>> // Declaration of thread-local variable
>>>>>>>>>>>> static __thread Thread * _thr_current
>>>>>>>>>>>>
>>>>>>>>>>>> inline Thread* ThreadLocalStorage::thread() {
>>>>>>>>>>>> return _thr_current;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> inline void ThreadLocalStorage::set_thread(Thread* thread) {
>>>>>>>>>>>> _thr_current = thread;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> And all the complex ThreadLocalStorage code with caching etc all
>>>>>>>>>>>> vanishes!
>>>>>>>>>>>>
>>>>>>>>>>>> For my next trick I plan to try and remove the
>>>>>>>>>>>> ThreadLocalStorage
>>>>>>>>>>>> class
>>>>>>>>>>>> completely by using language-based thread-locals on all
>>>>>>>>>>>> platforms. But
>>>>>>>>>>>> for now this is just Solaris and so we still need the
>>>>>>>>>>>> ThreadLocalStorage
>>>>>>>>>>>> API. However a lot of that API is not needed any more on Solaris
>>>>>>>>>>>> so I
>>>>>>>>>>>> have excluded it from there in the shared code (ifndef SOLARIS).
>>>>>>>>>>>> But to
>>>>>>>>>>>> avoid changing other shared-code callsites of ThreadLocalStorage
>>>>>>>>>>>> I've
>>>>>>>>>>>> kept part of the API with trivial implementations on Solaris.
>>>>>>>>>>>>
>>>>>>>>>>>> Testing: JPRT
>>>>>>>>>>>> All hotspot regression tests
>>>>>>>>>>>>
>>>>>>>>>>>> I'm happy to run more tests but the nice thing about such
>>>>>>>>>>>> low-level code
>>>>>>>>>>>> is that if it is broken, it is always broken :) Every use of
>>>>>>>>>>>> Thread::current or MacroAssembler::get_thread now hits this
>>>>>>>>>>>> code.
>>>>>>>>>>>>
>>>>>>>>>>>> Performance: I've run a basic set of benchmarks that is readily
>>>>>>>>>>>> available to me on our performance testing system. The best
>>>>>>>>>>>> way to
>>>>>>>>>>>> describe the result is neutral. There are some slight wins, and
>>>>>>>>>>>> some
>>>>>>>>>>>> slight losses, with most showing no statistical difference. And
>>>>>>>>>>>> even the
>>>>>>>>>>>> "wins" and "losses" are within the natural variations of the
>>>>>>>>>>>> benchmarks.
>>>>>>>>>>>> So a lot of complex code has been replaced by simple code and we
>>>>>>>>>>>> haven't
>>>>>>>>>>>> lost any observable performance - which seems like a win to me.
>>>>>>>>>>>>
>>>>>>>>>>>> Also product mode x64 libjvm.so has shrunk by 921KB - which is a
>>>>>>>>>>>> little
>>>>>>>>>>>> surprising but very nice.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> David
>>>>>>>>>
>>>>>>>
More information about the hotspot-dev
mailing list