From david.holmes at oracle.com Mon Nov 2 06:40:39 2015
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 2 Nov 2015 16:40:39 +1000
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
Message-ID: <56370567.3090801@oracle.com>

bug: https://bugs.openjdk.java.net/browse/JDK-8132510

Open webrev: http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/

A simple (in principle) but wide-ranging change which should appeal to
our Code Deletion Engineers. We implement Thread::current() using a
compiler/language-based thread-local variable, e.g.:

  static __thread Thread *_thr_current;

  inline Thread* Thread::current() {
    return _thr_current;
  }

with an appropriate setter of course. By doing this we can completely
remove the platform-specific ThreadLocalStorage implementations, and the
associated os::thread_local_storage* calls, plus all the uses of
ThreadLocalStorage::thread() and ThreadLocalStorage::get_thread_slow().
This extends the previous work done on Solaris to implement
ThreadLocalStorage::thread() using compiler-based thread-locals.

We can also consolidate nearly all the os_cpu versions of
MacroAssembler::get_thread on x86 into one cpu-specific one (a special
variant is still needed for 32-bit Windows).

As a result of this change we have further potential cleanups:
- all the src/os//vm/thread_.inline.hpp files are now completely
  empty and could also be removed
- the MINIMIZE_RAM_USAGE define (which avoids use of the linux sp-map
  "cache" on 32-bit) now has no effect and so could be completely removed
  from the build system

I plan to do the MINIMIZE_RAM_USAGE removal as a follow up CR, but could
add the removal of the "inline" files to this CR if people think it
worth removing them.

I have one missing piece on Aarch64 - I need to change
MacroAssembler::get_thread to simply call Thread::current() as on other
platforms, but I don't know how to write that.
I would appreciate it if someone could give me the right code for that.

I would also appreciate comments/testing by the AIX and PPC64 folk.

A concern about memory-leaks had previously been raised, but experiments
using simple C code on Linux x86 and Solaris showed no issues. Also note
that Aarch64 already uses this kind of thread-local.

Thanks,
David

From david.holmes at oracle.com Mon Nov 2 11:14:09 2015
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 2 Nov 2015 21:14:09 +1000
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To: <5637138F.8090800@oracle.com>
References: <56370567.3090801@oracle.com> <5637138F.8090800@oracle.com>
Message-ID: <56374581.7080601@oracle.com>

On 2/11/2015 5:41 PM, Per Liden wrote:
> Hi David,
>
> On 2015-11-02 07:40, David Holmes wrote:
>> bug: https://bugs.openjdk.java.net/browse/JDK-8132510
>>
>> Open webrev: http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/
>>
>> A simple (in principle) but wide-ranging change which should appeal to
>> our Code Deletion Engineers. We implement Thread::current() using a
>> compiler/language-based thread-local variable, e.g.:
>>
>>   static __thread Thread *_thr_current;
>>
>>   inline Thread* Thread::current() {
>>     return _thr_current;
>>   }
>
> Do we expect the cost of calling Thread::current() to go down with this
> change or will it remain about the same?

Depends on the platform and the exact circumstances, given the varied
caching and other "fast lookup" schemes previously employed (even though
this is slow-path code). I do not expect it to be slower in general but
we will be somewhat at the mercy of the particular platform
implementation. The earlier Solaris change showed mildly positive
results. I'll be starting some performance runs tomorrow.

> Btw, this looks like a really nice cleanup/simplification!

Thanks,
David

> cheers,
> /Per
>
>>
>> with an appropriate setter of course.

From thomas.stuefe at gmail.com Mon Nov 2 11:20:22 2015
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 2 Nov 2015 12:20:22 +0100
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To: <56370567.3090801@oracle.com>
References: <56370567.3090801@oracle.com>
Message-ID:

Hi David,

some small changes are needed to make this build and run on AIX. I
attached a patch file with the needed additions.

I did not run any extensive tests on AIX, so I cannot say for sure if
this is stable. We (SAP) also may face some problems later when we port
this to HP-UX, because there, shared libraries using __thread cannot be
loaded dynamically.

So, I admit to some small worries, besides the issue with memory leaks on
older glibc versions. For me, this feels like something which needs
tight compiler/thread library support from the OS, so it makes us
vulnerable to running on older systems (older glibc) or building with
outdated compilers. Therefore it would be nice to have a simple way to
re-add the pthread-based TLS implementation if needed.

Apart from that, I like the patch and think the simplification is good
and worth the effort.

Kind Regards, Thomas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aix.patch
Type: application/octet-stream
Size: 1114 bytes
Desc: not available
URL:

From david.holmes at oracle.com Mon Nov 2 20:01:04 2015
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 3 Nov 2015 06:01:04 +1000
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To:
References: <56370567.3090801@oracle.com>
Message-ID: <5637C100.8040405@oracle.com>

Hi Thomas,

On 2/11/2015 9:20 PM, Thomas Stüfe wrote:
> Hi David,
>
> some small changes are needed to make this build and run on AIX. I
> attached a patch file with the needed additions.

Thanks!

> I did not run any extensive tests on AIX, so I cannot say for sure if
> this is stable. We (SAP) also may face some problems later when we port
> this to HP-UX, because there, shared libraries using __thread cannot be
> loaded dynamically.

Ouch!

> So, I admit to some small worries, besides the issue with memory leaks on
> older glibc versions. For me, this feels like something which needs
> tight compiler/thread library support from the OS, so it makes us
> vulnerable to running on older systems (older glibc) or building with
> outdated compilers. Therefore it would be nice to have a simple way to
> re-add the pthread-based TLS implementation if needed.

I can't see how to do that without keeping all the existing layers of
code - even though they would be no-ops on all the platforms that
support the compiler-based TLS. Basically just extend what I did for
Solaris to the other platforms.

> Apart from that, I like the patch and think the simplification is good
> and worth the effort.

Even if you can't easily add back the pthread-based TLS if needed?

It is unfortunate that hotspot may still be shackled to the past that
way - we killed off hotspot-express (in part) to remove those shackles
and allow us to modernize the codebase.

Thanks,
David

From edward.nevill at gmail.com Tue Nov 3 09:11:57 2015
From: edward.nevill at gmail.com (Edward Nevill)
Date: Tue, 03 Nov 2015 09:11:57 +0000
Subject: Project proposal: AArch32 port
In-Reply-To: <1446040701.26259.11.camel@gmx.com>
References: <1444830648.7802.34.camel@mylittlepony.linaroharston>
 <5628A5D2.40909@oracle.com>
 <1445518014.29998.4.camel@mylittlepony.linaroharston>
 <562A2DBB.9000404@oracle.com>
 <1445607702.28722.19.camel@mint>
 <1446040701.26259.11.camel@gmx.com>
Message-ID: <1446541917.18905.12.camel@mylittlepony.linaroharston>

On Wed, 2015-10-28 at 13:58 +0000, Joseph Joyce wrote:
> Hi Ed and Dalibor,
>
> There's no code in there from other open source projects. As far as I
> can remember the only code from elsewhere was an algorithm for
> division, which came from
> http://www.chiark.greenend.org.uk/~theom/riscos/docs/ultimate/a252div.txt
> The code was modified a bit to work for the assembler, the url from
> which it came is mentioned in the source (MacroAssembler::divide32). If
> this is a problem I can easily replace the code with a call out to a C
> (or Ed suggested using the algorithm from the ARM32-Microjit).
>
> I have now signed and sent the OCA (today) and would like to continue
> contributing to this project. If I could be added to the committers that
> would be great.

Hi Joseph,

That's great news. I therefore propose Joseph Joyce as an additional
committer for the aarch32 project.

I have had a look at the divide routine in the template interpreter and
the original by Graeme Williams. I do not believe this should be an
issue as your implementation of the algorithm is completely different.
For a start yours is written in C calling the MacroAssembler methods to
generate the code, whereas his is written in some sort of BASIC
assembler. So although the algorithm is the same the implementation is
different and AIUI it is the implementation that is copyrighted.

Dalibor: If you are happy with this, may I proceed to a CFV for the
aarch32 project?

Thanks,
Ed.

From dalibor.topic at oracle.com Tue Nov 3 13:57:18 2015
From: dalibor.topic at oracle.com (dalibor topic)
Date: Tue, 3 Nov 2015 14:57:18 +0100
Subject: Project proposal: AArch32 port
In-Reply-To: <1446541917.18905.12.camel@mylittlepony.linaroharston>
References: <1444830648.7802.34.camel@mylittlepony.linaroharston>
 <5628A5D2.40909@oracle.com>
 <1445518014.29998.4.camel@mylittlepony.linaroharston>
 <562A2DBB.9000404@oracle.com>
 <1445607702.28722.19.camel@mint>
 <1446040701.26259.11.camel@gmx.com>
 <1446541917.18905.12.camel@mylittlepony.linaroharston>
Message-ID: <5638BD3E.5040209@oracle.com>

Thanks, Edward & Joseph - no objections from me on proceeding to the
next step.

I'll take a look at the incoming OCA queue to see where we are with
processing Joseph's OCA, and let you both know once it's processed.

cheers,
dalibor topic

--
Dalibor Topic | Principal Product Manager

From thomas.stuefe at gmail.com Tue Nov 3 14:14:29 2015
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Tue, 3 Nov 2015 15:14:29 +0100
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To: <5637C100.8040405@oracle.com>
References: <56370567.3090801@oracle.com> <5637C100.8040405@oracle.com>
Message-ID:

Hi David,

On Mon, Nov 2, 2015 at 9:01 PM, David Holmes wrote:

> Hi Thomas,
>
> On 2/11/2015 9:20 PM, Thomas Stüfe wrote:
>
>> Hi David,
>>
>> some small changes are needed to make this build and run on AIX. I
>> attached a patch file with the needed additions.
>>
>
> Thanks!
>

I also checked Linux ppc64, seems to work fine.

>> I did not run any extensive tests on AIX, so I cannot say for sure if
>> this is stable. We (SAP) also may face some problems later when we port
>> this to HP-UX, because there, shared libraries using __thread cannot be
>> loaded dynamically.
>>
>
> Ouch!
>
>> So, I admit to some small worries, besides the issue with memory leaks on
>> older glibc versions. For me, this feels like something which needs
>> tight compiler/thread library support from the OS, so it makes us
>> vulnerable to running on older systems (older glibc) or building with
>> outdated compilers. Therefore it would be nice to have a simple way to
>> re-add the pthread-based TLS implementation if needed.
>>
>
> I can't see how to do that without keeping all the existing layers of code
> - even though they would be no-ops on all the platforms that support the
> compiler-based TLS. Basically just extend what I did for Solaris to the
> other platforms.

I took a closer look and now I worry less. I am confident that in case
our old platforms experience problems with __thread, we can reintroduce
TLS without too many changes.
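
For illustration only, a minimal sketch of what a pthread-based fallback for the Thread entry points discussed in this thread might look like. It is not taken from the webrev or from aix.patch: the `Thread` class is a bare stand-in, and the key handling (`thread_key`, `create_thread_key`, `ensure_key`) uses hypothetical names chosen for this example. Only the standard POSIX calls `pthread_key_create`, `pthread_once`, `pthread_getspecific` and `pthread_setspecific` are relied on.

```cpp
#include <pthread.h>
#include <cassert>

// Stand-in for HotSpot's Thread class (hypothetical; the real one is far larger).
class Thread {
 public:
  static Thread* current();                          // may return nullptr
  static void initialize_thread_current(Thread* t);  // signature is an assumption
  static void clear_thread_current();
};

// One process-wide key; each thread stores its own Thread* under it.
static pthread_key_t  thread_key;
static pthread_once_t thread_key_once = PTHREAD_ONCE_INIT;

static void create_thread_key() {
  int rc = pthread_key_create(&thread_key, nullptr);  // no destructor
  assert(rc == 0);
  (void)rc;  // keep release builds warning-free
}

// Creates the key exactly once, no matter which thread gets here first.
static void ensure_key() {
  pthread_once(&thread_key_once, create_thread_key);
}

Thread* Thread::current() {
  ensure_key();
  // Returns nullptr if this thread never stored a value.
  return static_cast<Thread*>(pthread_getspecific(thread_key));
}

void Thread::initialize_thread_current(Thread* t) {
  ensure_key();
  int rc = pthread_setspecific(thread_key, t);
  assert(rc == 0);
  (void)rc;
}

void Thread::clear_thread_current() {
  ensure_key();
  pthread_setspecific(thread_key, nullptr);
}
```

Callers see the same three functions whichever storage mechanism sits behind them, which is what would keep such a fallback local to one file. On toolchains that need it, build with `-pthread`.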

Just as a test, I changed the AIX implementation from using __thread
back to pthread TLS just by changing the implementations of
Thread::current(), Thread::initialize_thread_current() and
Thread::clear_thread_current() in thread.cpp. Works fine as expected.

Of course this was just a hack, but if we need to go back to pthread TLS
for AIX or any platform, I think it can be done in a simpler way than
before and still be clean.

Not terribly important, but I would prefer if
Thread::initialize_thread_current() and Thread::clear_thread_current()
were not exposed from Thread at all or at least as private as possible.
Thread::initialize_thread_current() is called from the OS layer, but
Thread::clear_thread_current() is only called from within thread.cpp
itself and could be kept at file scope.

>> Apart from that, I like the patch and think the simplification is good
>> and worth the effort.
>>
>
> Even if you can't easily add back the pthread-based TLS if needed?
>

I think we can, if needed.

> It is unfortunate that hotspot may still be shackled to the past that way
> - we killed off hotspot-express (in part) to remove those shackles and
> allow us to modernize the codebase.
>
> Thanks,
> David
>

One question about your changes:

Before, Thread::current() would assert instead of returning NULL if
called before Thread::initialize_thread_current() or after
Thread::clear_thread_current(). Now, we just return NULL. Is this
intended?

Regards, Thomas

From david.holmes at oracle.com Wed Nov 4 04:08:48 2015
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 4 Nov 2015 14:08:48 +1000
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To:
References: <56370567.3090801@oracle.com> <5637C100.8040405@oracle.com>
Message-ID: <563984D0.8030501@oracle.com>

Hi Thomas,

On 4/11/2015 12:14 AM, Thomas Stüfe wrote:
> On Mon, Nov 2, 2015 at 9:01 PM, David Holmes wrote:
>     On 2/11/2015 9:20 PM, Thomas Stüfe wrote:
>         some small changes are needed to make this build and run on
>         AIX. I attached a patch file with the needed additions.
>
> I also checked Linux ppc64, seems to work fine.

Excellent! Thank you!

>         I did not run any extensive tests on AIX, so I cannot say for
>         sure if this is stable. We (SAP) also may face some problems
>         later when we port this to HP-UX, because there, shared
>         libraries using __thread cannot be loaded dynamically.
>
>     Ouch!
>
>         So, I admit to some small worries, besides the issue with
>         memory leaks on older glibc versions. For me, this feels like
>         something which needs tight compiler/thread library support
>         from the OS, so it makes us vulnerable to running on older
>         systems (older glibc) or building with outdated compilers.
>         Therefore it would be nice to have a simple way to re-add the
>         pthread-based TLS implementation if needed.
>
>     I can't see how to do that without keeping all the existing layers
>     of code - even though they would be no-ops on all the platforms that
>     support the compiler-based TLS. Basically just extend what I did for
>     Solaris to the other platforms.
>
> I took a closer look and now I worry less. I am confident that in case
> our old platforms experience problems with __thread, we can reintroduce
> TLS without too many changes.
>
> Just as a test, I changed the AIX implementation from using __thread
> back to pthread TLS just by changing the implementations of
> Thread::current(), Thread::initialize_thread_current() and
> Thread::clear_thread_current() in thread.cpp. Works fine as expected. Of
> course this was just a hack, but if we need to go back to pthread TLS
> for AIX or any platform, I think it can be done in a simpler way than
> before and still be clean.

Thanks for looking into this in detail! Yes I've been thinking about
this too, and I think three or four simple hooks will allow the basic
pthread-TLS mechanism to be reinstated, in shared code, but without any
of the per-platform fancy caching schemes. There will be a single
threadLocalStorage.cpp file in a platform-specific directory; and of
course MacroAssembler::get_thread may need to be os/cpu specific. I will
look further into this, but may defer its implementation to a follow up
issue.

> Not terribly important, but I would prefer if
> Thread::initialize_thread_current() and Thread::clear_thread_current()
> were not exposed from Thread at all or at least as private as possible.
> Thread::initialize_thread_current() is called from the OS layer, but
> Thread::clear_thread_current() is only called from within thread.cpp
> itself and could be kept at file scope.

As you note the initialize function has to be exposed as it is needed in
the OS thread startup code. But I can make the clear function private.

>         Apart from that, I like the patch and think the simplification
>         is good and worth the effort.
>
>     Even if you can't easily add back the pthread-based TLS if needed?
>
> I think we can, if needed.

Ok.

>     It is unfortunate that hotspot may still be shackled to the past
>     that way - we killed off hotspot-express (in part) to remove those
>     shackles and allow us to modernize the codebase.
>
>     Thanks,
>     David
>
> One question about your changes:
>
> Before, Thread::current() would assert instead of returning NULL if
> called before Thread::initialize_thread_current() or after
> Thread::clear_thread_current(). Now, we just return NULL. Is this
> intended?

Ah great question ... so before we had a mix of calls to:

- Thread::current() (asserts on NULL as does JavaThread::current)
- ThreadLocalStorage::thread() (can return NULL)
- ThreadLocalStorage::get_thread_slow() (can return NULL)

and now we only have Thread::current() which means we have to allow
returning NULL because it can be intentionally called when a thread is
not attached. That means we won't directly catch calls to
Thread::current() from code that doesn't realize it is calling it "too
soon" - though there do exist numerous assertions in the callers of
Thread::current() that check the result is not NULL.

I could add the assert to Thread::current() and also add
Thread::current_or_null() to be used by code that needs to use it to
check for attachment (i.e. JNI code). I'd also have to examine all the
changed ThreadLocalStorage::thread/get_thread_slow call-sites to see if
any of those legitimately expect the thread may not be attached.

What do you think?

I also need to look at the location of Thread::current in the .hpp file
rather than .inline.hpp and reconcile that with comments regarding the
noinline version (which is only used in g1HotCardCache.hpp).

Thanks,
David

From thomas.stuefe at gmail.com Wed Nov 4 08:17:29 2015
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Wed, 4 Nov 2015 09:17:29 +0100
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To: <563984D0.8030501@oracle.com>
References: <56370567.3090801@oracle.com> <5637C100.8040405@oracle.com> <563984D0.8030501@oracle.com>
Message-ID:

Hi David,

On Wed, Nov 4, 2015 at 5:08 AM, David Holmes wrote:

> Hi Thomas,
>
>> One question about your changes:
>>
>> Before, Thread::current() would assert instead of returning NULL if
>> called before Thread::initialize_thread_current() or after
>> Thread::clear_thread_current(). Now, we just return NULL. Is this
>> intended?
>>
>
> Ah great question ... so before we had a mix of calls to:
>
> - Thread::current() (asserts on NULL as does JavaThread::current)
> - ThreadLocalStorage::thread() (can return NULL)
> - ThreadLocalStorage::get_thread_slow() (can return NULL)
>
> and now we only have Thread::current() which means we have to allow
> returning NULL because it can be intentionally called when a thread is not
> attached. That means we won't directly catch calls to Thread::current()
> from code that doesn't realize it is calling it "too soon" - though there
> do exist numerous assertions in the callers of Thread::current() that check
> the result is not NULL.
>
> I could add the assert to Thread::current() and also add
> Thread::current_or_null() to be used by code that needs to use it to check
> for attachment (i.e. JNI code). I'd also have to examine all the changed
> ThreadLocalStorage::thread/get_thread_slow call-sites to see if any of
> those legitimately expect the thread may not be attached.
>
> What do you think?
>

I would prefer Thread::current() to assert, with a
Thread::current_or_null() for cases where NULL could occur. I tend to
hit that assert a lot in development, it is useful.
And the non-asserting version is already used in a number of places, also in our (not OpenJDK) code.

> I also need to look at the location of Thread::current in the .hpp file
> rather than .inline.hpp and reconcile that with comments regarding the
> noinline version (which is only used in g1HotCardCache.hpp).
>
>

Could we leave just the inline version in thread.hpp and remove the noinline version altogether? Now that Thread::current() is very simple, we may just as well keep it in the class body like the other accessors.

Thanks, Thomas

From david.holmes at oracle.com Wed Nov 4 09:26:16 2015
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 4 Nov 2015 19:26:16 +1000
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To:
References: <56370567.3090801@oracle.com> <5637C100.8040405@oracle.com> <563984D0.8030501@oracle.com>
Message-ID: <5639CF38.5060800@oracle.com>

On 4/11/2015 6:17 PM, Thomas Stüfe wrote:
> Hi David,
>
> On Wed, Nov 4, 2015 at 5:08 AM, David Holmes
> wrote:
>
> Hi Thomas,
>
>
> One question about your changes:
>
> Before, Thread::current() would assert instead of returning NULL if
> called before Thread::initialize_thread_current() or after
> Thread::clear_thread_current(). Now, we just return NULL. Is
> this intended?
>
>
> Ah great question ... so before we have a mix of calls to:
>
> - Thread::current() (asserts on NULL as does JavaThread::current)
> - ThreadLocalStorage::thread() (can return NULL)
> - ThreadLocalStorage::get_thread_slow() (can return NULL)
>
> and now we only have Thread::current() which means we have to allow
> returning NULL because it can be intentionally called when a thread
> is not attached.
> That means we won't directly catch calls to
> Thread::current() from code that doesn't realize it is calling it
> "too soon" - though there do exist numerous assertions in the
> callers of Thread::current() that check the result is not NULL.
>
> I could add the assert to Thread::current() and also add
> Thread::current_or_null() to be used by code that needs to use it to
> check for attachment (ie JNI code). I'd also have to examine all the
> changed ThreadLocalStorage::thread/get_thread_slow call-sites to see
> if any of those legitimately expect the thread may not be attached.
>
> What do you think?
>
>
> I would prefer having Thread::current() assert, and having a
> Thread::current_or_null() for cases where NULL could occur. I tend to
> hit that assert a lot in development; it is useful. And the
> non-asserting version is already used in a number of places, also in
> our (not OpenJDK) code.

Yes I agree. Most of the TLS::thread() and TLS::get_thread_slow() calls should actually call Thread::current_or_null(). I also found a couple of existing Thread::current() calls that should be current_or_null(). :)

> I also need to look at the location of Thread::current in the .hpp
> file rather than .inline.hpp and reconcile that with comments
> regarding the noinline version (which is only used in
> g1HotCardCache.hpp).
>
>
> Could we leave just the inline version in thread.hpp and remove the
> noinline version altogether? Now that Thread::current() is very simple,
> we may just as well keep it in the class body like the other accessors.

I'll see if the g1 code can tolerate that.

I'll update and prepare a new webrev tomorrow.
Thanks,
David

> Thanks, Thomas
>
>

From david.holmes at oracle.com Thu Nov 5 04:36:32 2015
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 5 Nov 2015 14:36:32 +1000
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To: <5639CF38.5060800@oracle.com>
References: <56370567.3090801@oracle.com> <5637C100.8040405@oracle.com> <563984D0.8030501@oracle.com> <5639CF38.5060800@oracle.com>
Message-ID: <563ADCD0.1070800@oracle.com>

Here's the updated webrev:

http://cr.openjdk.java.net/~dholmes/8132510/webrev.v3/

Changes since v2:

- Include Thomas's AIX fixes
- Add assertion for not-NULL into Thread::current()
- Add Thread::current_or_null() for when NULL can be expected, or for where failing an assert would cause more problems (eg crash reporting). Most uses of ThreadLocalStorage::thread()/get_thread_slow() now call current_or_null().
- Removed Thread::current_noinline() (it was only used in an assert in some G1 code, so the inline-or-not seems irrelevant)
- Made Thread::clear_thread_current() private

I'm debating whether the get_thread implementations should call Thread::current() or Thread::current_or_null(). We should never get NULL, but it seems unnecessary overhead to check that with an assert in this code. Opinions welcomed.

I still need some assistance from Aarch64 folk to write their get_thread function please!

I still have footprint and performance measurements to make before proposing a formal RFR.

I have also still to determine whether to include the ability to hook in a pthread_ based implementation instead.
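
For readers following the v3 change list, the split between an asserting Thread::current() and a NULL-tolerant Thread::current_or_null() might look roughly like the sketch below. This is only an illustration under stated assumptions, not the code in the webrev: the Thread class here is a bare stand-in, the standard C++11 thread_local keyword is used where the proposal uses __thread, and clear_thread_current() is left public purely so the sketch can be exercised (v3 makes it private).

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's Thread class (not the real declaration).
class Thread {
 public:
  // NULL-tolerant accessor: for callers that must cope with a thread that is
  // not (or not yet, or no longer) attached, e.g. JNI attach checks.
  static Thread* current_or_null() { return _thr_current; }

  // Asserting accessor: for callers that require an attached thread.
  static Thread* current() {
    Thread* t = _thr_current;
    assert(t != nullptr && "Thread::current() called on detached thread");
    return t;
  }

  // Setter, called once the thread has been created or attached.
  static void initialize_thread_current(Thread* t) { _thr_current = t; }

  // Public here only for the sketch; the v3 webrev makes this private.
  static void clear_thread_current() { _thr_current = nullptr; }

 private:
  // Compiler/language-based thread-local; one slot per OS thread.
  static thread_local Thread* _thr_current;
};

thread_local Thread* Thread::_thr_current = nullptr;
```

In a product build the assert compiles away, so current() costs no more than current_or_null(); the distinction only matters in debug builds, which is where calls made "too soon" are meant to be caught.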
Thanks, David On 4/11/2015 7:26 PM, David Holmes wrote: > On 4/11/2015 6:17 PM, Thomas St?fe wrote: >> Hi David, >> >> On Wed, Nov 4, 2015 at 5:08 AM, David Holmes > > wrote: >> >> Hi Thomas, >> >> >> One question about your changes: >> >> Before, Thread::current() would assert instead of returning >> NULL if >> called before Thread::initialize_thread_current() or after >> Thead::clear_thread_current() . Now, we just return NULL. Is >> this intended? >> >> >> Ah great question ... so before we have a mix of calls to: >> >> - Thread::current() (asserts on NULL as does JavaThread::current) >> - ThreadLocalStorage::thread() (can return NULL) >> - ThreadLocalStorage::get_thread_slow() (can return NULL) >> >> and now we only have Thread::current() which means we have to allow >> returning NULL because it can be intentionally called when a thread >> is not attached. That means we won't directly catch calls to >> Thread::current() from code that doesn't realize it is calling it >> "too soon" - though there do exist numerous assertions in the >> callers of Thread::current() that check the result is not NULL. >> >> I could add the assert to Thread::current() and also add >> Thread::current_or_null() to be used by code that needs to use it to >> check for attachment (ie JNI code). I'd also have to examine all the >> changed ThreadLocalStorage::thread/get_thread_slow call-sites to see >> if any of those legitimately expect the thread may not be attached. >> >> What do you think? >> >> >> I would prefer having Thread::current() to assert and to have a >> Thread::current_or_null() for cases where NULL could occurr. I tend to >> hit that assert a lot in development, it is useful. And the >> non-asserting version gets already used in a number of places, also in >> our (not OpenJDK) coding. > > Yes I agree. Most of the TLS::thread() and TLS::get_thread_slow() should > actually call Thread::current_or_null(). 
I also found a couple of > existing Thread::current()'s that should be current_or_null(). :) > >> I also need to look at the location of Thread::current in the .hpp >> file rather than .inline.hpp and reconcile that with comments >> regarding the noinline version (which is only used in >> g1HotCardCache.hpp). >> >> >> Could we leave just the inline version in thread.hpp and remove the >> noinline version altogether? Now that Thread::current() is very simple, >> we may just as well keep it in the class body like the other accessors. > > I'll see if the g1 code can tolerate that. > > I'll update a prepare a new webrev tomorrow. > > Thanks, > David > >> Thanks, Thomas >> >> From thomas.stuefe at gmail.com Thu Nov 5 10:18:00 2015 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 5 Nov 2015 11:18:00 +0100 Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables In-Reply-To: <563ADCD0.1070800@oracle.com> References: <56370567.3090801@oracle.com> <5637C100.8040405@oracle.com> <563984D0.8030501@oracle.com> <5639CF38.5060800@oracle.com> <563ADCD0.1070800@oracle.com> Message-ID: Hi David, looks good and works fine on AIX and Linux Power. We could now get rid of the thread__inline.hpp files too, right? Kind Regards, Thomas On Thu, Nov 5, 2015 at 5:36 AM, David Holmes wrote: > Here's updated webrev: > > http://cr.openjdk.java.net/~dholmes/8132510/webrev.v3/ > > Changes since v2: > > - include Thomas's AIX fixes > - Add assertion for not-NULL into Thread::current() > - Add Thread::current_or_null() for when NULL can be expected, or for > where failing an assert would cause more problems (eg crash reporting). > Most uses of ThreadLocalStorage::thread()/get_thread_slow() now call > current_or_null(). 
> - Removed Thread::current_noinline() (it was only used in an assert in > some G1 code, so the inline-or-not seems irrelevant) > - Made Thread::clear_thread_current() private > > I'm debating whether the get_thread implementations should call > Thread::current() or Thread::current_or_null(). We should never get NULL > but seems unnecessary overhead to check that with an assert in this code. > Opinions welcomed. > > I still need some assistance from Aarch64 folk to write their get_thread > function please! > > I still have footprint and performance measurements to make before > proposing formal RFR. > > I also am still to determine whether to include the ability to hook in a > pthread_ based implementation instead. > > Thanks, > David > > > On 4/11/2015 7:26 PM, David Holmes wrote: > >> On 4/11/2015 6:17 PM, Thomas St?fe wrote: >> >>> Hi David, >>> >>> On Wed, Nov 4, 2015 at 5:08 AM, David Holmes >> > wrote: >>> >>> Hi Thomas, >>> >>> >>> One question about your changes: >>> >>> Before, Thread::current() would assert instead of returning >>> NULL if >>> called before Thread::initialize_thread_current() or after >>> Thead::clear_thread_current() . Now, we just return NULL. Is >>> this intended? >>> >>> >>> Ah great question ... so before we have a mix of calls to: >>> >>> - Thread::current() (asserts on NULL as does JavaThread::current) >>> - ThreadLocalStorage::thread() (can return NULL) >>> - ThreadLocalStorage::get_thread_slow() (can return NULL) >>> >>> and now we only have Thread::current() which means we have to allow >>> returning NULL because it can be intentionally called when a thread >>> is not attached. That means we won't directly catch calls to >>> Thread::current() from code that doesn't realize it is calling it >>> "too soon" - though there do exist numerous assertions in the >>> callers of Thread::current() that check the result is not NULL. 
>>> >>> I could add the assert to Thread::current() and also add >>> Thread::current_or_null() to be used by code that needs to use it to >>> check for attachment (ie JNI code). I'd also have to examine all the >>> changed ThreadLocalStorage::thread/get_thread_slow call-sites to see >>> if any of those legitimately expect the thread may not be attached. >>> >>> What do you think? >>> >>> >>> I would prefer having Thread::current() to assert and to have a >>> Thread::current_or_null() for cases where NULL could occurr. I tend to >>> hit that assert a lot in development, it is useful. And the >>> non-asserting version gets already used in a number of places, also in >>> our (not OpenJDK) coding. >>> >> >> Yes I agree. Most of the TLS::thread() and TLS::get_thread_slow() should >> actually call Thread::current_or_null(). I also found a couple of >> existing Thread::current()'s that should be current_or_null(). :) >> >> I also need to look at the location of Thread::current in the .hpp >>> file rather than .inline.hpp and reconcile that with comments >>> regarding the noinline version (which is only used in >>> g1HotCardCache.hpp). >>> >>> >>> Could we leave just the inline version in thread.hpp and remove the >>> noinline version altogether? Now that Thread::current() is very simple, >>> we may just as well keep it in the class body like the other accessors. >>> >> >> I'll see if the g1 code can tolerate that. >> >> I'll update a prepare a new webrev tomorrow. >> >> Thanks, >> David >> >> Thanks, Thomas >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... 
From david.holmes at oracle.com Thu Nov 5 22:58:51 2015
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 6 Nov 2015 08:58:51 +1000
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To:
References: <56370567.3090801@oracle.com> <5637C100.8040405@oracle.com> <563984D0.8030501@oracle.com> <5639CF38.5060800@oracle.com> <563ADCD0.1070800@oracle.com>
Message-ID: <563BDF2B.1090506@oracle.com>

On 5/11/2015 8:18 PM, Thomas Stüfe wrote:
> Hi David,
>
> looks good and works fine on AIX and Linux Power.
>
> We could now get rid of the thread__inline.hpp files too, right?

Right. I was waiting to see if anyone would comment on that ("leave them in just in case we need some os-specific thread stuff ..."). But I will go ahead and remove them before the official RFR.

Thanks,
David

> Kind Regards, Thomas
>
>
>
> On Thu, Nov 5, 2015 at 5:36 AM, David Holmes
> wrote:
>
> Here's updated webrev:
>
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v3/
>
> Changes since v2:
>
> - include Thomas's AIX fixes
> - Add assertion for not-NULL into Thread::current()
> - Add Thread::current_or_null() for when NULL can be expected, or
> for where failing an assert would cause more problems (eg crash
> reporting). Most uses of
> ThreadLocalStorage::thread()/get_thread_slow() now call
> current_or_null().
> - Removed Thread::current_noinline() (it was only used in an assert
> in some G1 code, so the inline-or-not seems irrelevant)
> - Made Thread::clear_thread_current() private
>
> I'm debating whether the get_thread implementations should call
> Thread::current() or Thread::current_or_null(). We should never get
> NULL but seems unnecessary overhead to check that with an assert in
> this code. Opinions welcomed.
>
> I still need some assistance from Aarch64 folk to write their
> get_thread function please!
>
> I still have footprint and performance measurements to make before
> proposing formal RFR.
> > I also am still to determine whether to include the ability to hook > in a pthread_ based implementation instead. > > Thanks, > David > > > On 4/11/2015 7:26 PM, David Holmes wrote: > > On 4/11/2015 6:17 PM, Thomas St?fe wrote: > > Hi David, > > On Wed, Nov 4, 2015 at 5:08 AM, David Holmes > > >> wrote: > > Hi Thomas, > > > One question about your changes: > > Before, Thread::current() would assert instead of > returning > NULL if > called before Thread::initialize_thread_current() > or after > Thead::clear_thread_current() . Now, we just return > NULL. Is > this intended? > > > Ah great question ... so before we have a mix of calls to: > > - Thread::current() (asserts on NULL as does > JavaThread::current) > - ThreadLocalStorage::thread() (can return NULL) > - ThreadLocalStorage::get_thread_slow() (can return NULL) > > and now we only have Thread::current() which means we > have to allow > returning NULL because it can be intentionally called > when a thread > is not attached. That means we won't directly catch > calls to > Thread::current() from code that doesn't realize it is > calling it > "too soon" - though there do exist numerous assertions > in the > callers of Thread::current() that check the result is > not NULL. > > I could add the assert to Thread::current() and also add > Thread::current_or_null() to be used by code that needs > to use it to > check for attachment (ie JNI code). I'd also have to > examine all the > changed ThreadLocalStorage::thread/get_thread_slow > call-sites to see > if any of those legitimately expect the thread may not > be attached. > > What do you think? > > > I would prefer having Thread::current() to assert and to have a > Thread::current_or_null() for cases where NULL could occurr. > I tend to > hit that assert a lot in development, it is useful. And the > non-asserting version gets already used in a number of > places, also in > our (not OpenJDK) coding. > > > Yes I agree. 
Most of the TLS::thread() and > TLS::get_thread_slow() should > actually call Thread::current_or_null(). I also found a couple of > existing Thread::current()'s that should be current_or_null(). :) > > I also need to look at the location of Thread::current > in the .hpp > file rather than .inline.hpp and reconcile that with > comments > regarding the noinline version (which is only used in > g1HotCardCache.hpp). > > > Could we leave just the inline version in thread.hpp and > remove the > noinline version altogether? Now that Thread::current() is > very simple, > we may just as well keep it in the class body like the other > accessors. > > > I'll see if the g1 code can tolerate that. > > I'll update a prepare a new webrev tomorrow. > > Thanks, > David > > Thanks, Thomas > > > From david.holmes at oracle.com Fri Nov 6 03:09:03 2015 From: david.holmes at oracle.com (David Holmes) Date: Fri, 6 Nov 2015 13:09:03 +1000 Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables In-Reply-To: References: <56370567.3090801@oracle.com> Message-ID: <563C19CF.30001@oracle.com> Hi Jeremy, I was going to ask you to elaborate :) On 6/11/2015 12:24 PM, Jeremy Manson wrote: > I should probably elaborate on this. With glibc + ELF, the first time a > thread accesses a variable declared __thread, if that variable is in a > shared library (as opposed to the main executable), the system calls > malloc() to allocate the space for it. If that happens in a signal that > is being delivered during a call to malloc(), then you usually get a crash. My understanding of the ELF ABI for thread-locals - which I read about in the Solaris 11.1 Linkers and libraries guide - does require use of the dynamic TLS model for any dynamically loaded shared object which defines a thread-local, but that is what we use as I understand it. The docs state: "A shared object containing only dynamic TLS can be loaded following process startup without limitations. 
The runtime linker extends the list of initialization records to include the initialization template of the new object. The new object is given an index of m = M + 1. The counter M is incremented by 1. However, the allocation of new TLS blocks is deferred until the blocks are actually referenced."

Now I guess "extends the list" might be implemented using malloc ... but this will only occur in the main thread (the one started by the launcher to load the JVM and become the main thread), at the time libjvm is loaded - which will all be over before any agent etc can run and do anything. But "allocation ... is deferred" suggests we may have a problem until either the first call to Thread::current or the call to Thread::initialize_thread_current. If it is the former then that should occur well before any agent etc can be loaded. And I can easily inject an initial dummy call to initialize_thread_current(null) to force the TLS allocation.

> This may bite you if AsyncGetCallTrace uses Thread::current(), and you
> use system timers to do profiling. If a thread is doing a malloc() prior
> to the first time it accesses Thread::current(), and it gets delivered a
> signal, it might die. This is especially likely for pure native threads
> started by native code.
>
> I believe that this is a use case you support, so you might want to make
> sure it is okay.

For a VM embedded in a process which already contains native threads that will later attach to the VM, this may indeed be a problem. One would have hoped, however, that the implementation of TLS would be completely robust, at least for something as simple as getting a signal whilst in the allocator.

I'm unclear how to test for or check for this kind of problem. Arguably there could be many things that are async-unsafe in this way.

I need to think more about this and do some more research. I would also appreciate any insight from any glibc and/or ELF gurus.

Thanks.
David > Jeremy > > On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson > wrote: > > Something that's bitten me with __thread: it isn't async-safe when > called from a shared object on Linux. Have you vetted to make sure > this doesn't make HS less async-safe? > > Jeremy > > On Sun, Nov 1, 2015 at 10:40 PM, David Holmes > > wrote: > > bug: https://bugs.openjdk.java.net/browse/JDK-8132510 > > Open webrev: http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/ > > A simple (in principle) but wide-ranging change which should > appeal to our Code Deletion Engineer's. We implement > Thread::current() using a compiler/language-based thread-local > variable eg: > > > static __thread Thread *_thr_current; > > inline Thread* Thread::current() { > return _thr_current; > } > > with an appropriate setter of course. By doing this we can > completely remove the platform-specific ThreadLocalStorage > implementations, and the associated os::thread_local_storage* > calls, plus all the uses of ThreadLocalStorage::thread() and > ThreadLocalStorage::get_thread_slow(). This extends the previous > work done on Solaris to implement ThreadLocalStorage::thread() > using compiler-based thread-locals. > > We can also consolidate nearly all the os_cpu versions of > MacroAssembler::get_thread on x86 into one cpu specific one ( a > special variant is still needed for 32-bit Windows). > > As a result of this change we have further potential cleanups: > - all the src/os//vm/thread_.inline.hpp files are now > completely empty and could also be removed > - the MINIMIZE_RAM_USAGE define (which avoids use of the linux > sp-map "cache" on 32-bit) now has no affect and so could be > completely removed from the build system > > I plan to do the MINIMIZE_RAM_USAGE removal as a follow up CR, > but could add the removal of the "inline" files to this CR if > people think it worth removing them. 
>
> I have one missing piece on Aarch64 - I need to change
> MacroAssembler::get_thread to simply call Thread::current() as
> on other platforms, but I don't know how to write that. I would
> appreciate it if someone could give me the right code for that.
>
> I would also appreciate comments/testing by the AIX and PPC64
> folk as well.
>
> A concern about memory-leaks had previously been raised, but
> experiments using simple C code on linux 86 and Solaris showed
> no issues. Also note that Aarch64 already uses this kind of
> thread-local.
>
> Thanks,
> David
>
>

From david.holmes at oracle.com Fri Nov 6 06:26:44 2015
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 6 Nov 2015 16:26:44 +1000
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To: <563C19CF.30001@oracle.com>
References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com>
Message-ID: <563C4824.7040300@oracle.com>

Hi Jeremy,

Okay I have read:

https://sourceware.org/glibc/wiki/TLSandSignals

and the tree of mail postings referenced therefrom - great reading! :)

So basic problem: access to __thread variables is not async-signal-safe.

Exacerbator to problem: the first attempt to even read a __thread variable can lead to allocation, which is the real problem in relation to async-signal-safety.

I mention the exacerbator because pthread_getspecific and pthread_setspecific are also not async-signal-safe but we already use them. However, pthread_getspecific is in fact (per email threads linked above) effectively async-signal-safe, and further a call to pthread_getspecific never results in a call to pthread_setspecific or an allocation. Hence the pthread functions are almost, if not completely, safe in practice with reasonable uses (ie only read from signal handler).
Which explains this code in the existing Thread::current():

    #ifdef PARANOID
      // Signal handler should call ThreadLocalStorage::get_thread_slow()
      Thread* t = ThreadLocalStorage::get_thread_slow();
      assert(t != NULL && !t->is_inside_signal_handler(),
             "Don't use Thread::current() inside signal handler");
    #endif

So the problem scenario is: use of a __thread variable (that belongs to the shared library) in a signal handler.

Solution 0: don't do that. Seriously - like any other async-signal-unsafe stuff we should not be using it in real signal handlers. The crash handler is a different matter - we try all sorts there because it might work and you can't die twice.

Otherwise: narrow the window of exposure.

1. We ensure we initialize thread_current (even with a dummy value) as early as possible in the thread that loads libjvm. As we have no signal handlers installed at that point that might use the same variable, we cannot hit the problem scenario.

2. We ensure we initialize thread_current in a new thread with all signals blocked. This again avoids the problem scenario.

3. We initialize thread_current in an attaching thread as soon as possible and we again first block all signals.

That still leaves the problem of an unattached native thread taking a signal whilst in async-signal-unsafe code, and executing a signal handler which in turn tries to access thread_current for the first time. This signal handler need not be an actual JVM handler, but one attached by other native code eg an agent. I'm not clear in the latter case how reasonable it is for an agent's handler to try and do things from an unattached thread - and we don't claim any JNI interfaces can, or should, be called from a signal handler - but it is something you can probably get away with today.

Let me also point out that we effectively have this code in Solaris already (at the ThreadLocalStorage class level). So if there is something here that will prevent the current proposal we already have a problem on Solaris.
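
Mitigations 2 and 3 above (first touch of the thread-local with all signals blocked) could be sketched along the following lines. This is a minimal illustration under stated assumptions, not code from the webrev: the names thr_current and initialize_thread_current_blocked are hypothetical, Thread is an empty stand-in, and thread_local is used in place of __thread. Note also that when this is built into a main executable rather than a shared library, the lazy TLS allocation being guarded against would not normally arise; the sketch only shows the shape of the pattern.

```cpp
#include <pthread.h>
#include <signal.h>

// Hypothetical stand-in for the VM's Thread class.
struct Thread {};

// Compiler-based thread-local holding the current thread, as in the proposal.
static thread_local Thread* thr_current = nullptr;

// Plain read of the thread-local.
Thread* current_or_null() { return thr_current; }

// Performs the first write to thr_current with all blockable signals masked,
// so that any lazy TLS allocation triggered by this first access cannot be
// interrupted by a handler that touches the same variable.
// Returns false if the signal mask could not be changed or restored.
bool initialize_thread_current_blocked(Thread* t) {
  sigset_t all_signals;
  sigset_t saved_mask;
  sigfillset(&all_signals);
  if (pthread_sigmask(SIG_BLOCK, &all_signals, &saved_mask) != 0) {
    return false;
  }
  thr_current = t;  // first touch happens while signals are blocked
  return pthread_sigmask(SIG_SETMASK, &saved_mask, nullptr) == 0;
}
```

A new or attaching thread would call initialize_thread_current_blocked() as its first action. As noted above, this only narrows the window: a native thread that never attaches never runs this code, so a handler executing on such a thread can still be the first to touch the variable.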
:( Thoughts/comments/suggestions? Thanks, David On 6/11/2015 1:09 PM, David Holmes wrote: > Hi Jeremy, > > I was going to ask you to elaborate :) > > On 6/11/2015 12:24 PM, Jeremy Manson wrote: >> I should probably elaborate on this. With glibc + ELF, the first time a >> thread accesses a variable declared __thread, if that variable is in a >> shared library (as opposed to the main executable), the system calls >> malloc() to allocate the space for it. If that happens in a signal that >> is being delivered during a call to malloc(), then you usually get a >> crash. > > My understanding of the ELF ABI for thread-locals - which I read about > in the Solaris 11.1 Linkers and libraries guide - does require use of > the dynamic TLS model for any dynamically loaded shared object which > defines a thread-local, but that is what we use as I understand it. The > docs state: > > "A shared object containing only dynamic TLS can be loaded following > process startup without limitations. The runtime linker extends the list > of initialization records to include the initialization template of the > new object. The new object is given an index of m = M + 1. The > counter M is incremented by 1. However, the allocation of new TLS blocks > is deferred until the blocks are actually referenced." > > Now I guess "extends the list" might be implemented using malloc ... but > this will only occur in the main thread (the one started by the launcher > to load the JVM and become the main thread), at the time libjvm is > loaded - which will all be over before any agent etc can run and do > anything. But "allocation ... is deferred" suggests we may have a > problem until either the first call to Thread::current or the call to > Thread::initialize_thread_current. If it is the former then that should > occur well before any agent etc can be loaded. And I can easily inject > an initial dummy call to initialize_thread_current(null) to force the > TLS allocation. 
> >> This may bite you if AsyncGetCallTrace uses Thread::current(), and you >> use system timers to do profiling. If a thread doing a malloc() prior >> to the first time it accesses Thread::current(), and it gets delivered a >> signal, it might die. This is especially likely for pure native threads >> started by native code. >> >> I believe that this is a use case you support, so you might want to make >> sure it is okay. > > For a VM embedded in a process, which already contains native threads, > that will later attach to the VM, this may indeed be a problem. One > would have hoped however that the implementation of TLS would be > completely robust, at least for something as simple as getting a signal > whilst in the allocator. > > I'm unclear how to test for or check for this kind of problem. Arguably > there could be many things that are async-unsafe in this way. > > Need to think more about this and do some more research. Would also > appreciate any insight from any glibc and/or ELF gurus. > > Thanks. > David > >> Jeremy >> >> On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson > > wrote: >> >> Something that's bitten me with __thread: it isn't async-safe when >> called from a shared object on Linux. Have you vetted to make sure >> this doesn't make HS less async-safe? >> >> Jeremy >> >> On Sun, Nov 1, 2015 at 10:40 PM, David Holmes >> > wrote: >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8132510 >> >> Open webrev: >> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/ >> >> A simple (in principle) but wide-ranging change which should >> appeal to our Code Deletion Engineer's. We implement >> Thread::current() using a compiler/language-based thread-local >> variable eg: >> >> >> static __thread Thread *_thr_current; >> >> inline Thread* Thread::current() { >> return _thr_current; >> } >> >> with an appropriate setter of course. 
By doing this we can >> completely remove the platform-specific ThreadLocalStorage >> implementations, and the associated os::thread_local_storage* >> calls, plus all the uses of ThreadLocalStorage::thread() and >> ThreadLocalStorage::get_thread_slow(). This extends the previous >> work done on Solaris to implement ThreadLocalStorage::thread() >> using compiler-based thread-locals. >> >> We can also consolidate nearly all the os_cpu versions of >> MacroAssembler::get_thread on x86 into one cpu specific one ( a >> special variant is still needed for 32-bit Windows). >> >> As a result of this change we have further potential cleanups: >> - all the src/os//vm/thread_.inline.hpp files are now >> completely empty and could also be removed >> - the MINIMIZE_RAM_USAGE define (which avoids use of the linux >> sp-map "cache" on 32-bit) now has no affect and so could be >> completely removed from the build system >> >> I plan to do the MINIMIZE_RAM_USAGE removal as a follow up CR, >> but could add the removal of the "inline" files to this CR if >> people think it worth removing them. >> >> I have one missing piece on Aarch64 - I need to change >> MacroAssembler::get_thread to simply call Thread::current() as >> on other platforms, but I don't know how to write that. I would >> appreciate it if someone could give me the right code for that. >> >> I would also appreciate comments/testing by the AIX and PPC64 >> folk as well. >> >> A concern about memory-leaks had previously been raised, but >> experiments using simple C code on linux 86 and Solaris showed >> no issues. Also note that Aarch64 already uses this kind of >> thread-local. 
>> >> Thanks, >> David >> >> >> From david.holmes at oracle.com Fri Nov 6 07:48:43 2015 From: david.holmes at oracle.com (David Holmes) Date: Fri, 6 Nov 2015 17:48:43 +1000 Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables In-Reply-To: References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com> Message-ID: <563C5B5B.9060802@oracle.com> On 6/11/2015 4:55 PM, Jeremy Manson wrote: > FWIW, Google tried to convince the glibc maintainers to make this > async-safe, but they weren't biting: > > https://sourceware.org/ml/libc-alpha/2014-01/msg00033.html Yes I read all that. I wouldn't say they weren't biting, more of a disagreement on the right direction for the patch. glibc weren't prepared to take it directly as is, while google-folk didn't seem to think it worth their while to do whatever glibc folk wanted. The actual patch proposal just died out. :( Quite a pity. > Most of the things you can do are going to be mitigation rather than a > fix. I did what you suggest to mitigate, and no one complained, until > someone at Google started a sidecar C++ thread that did a boatload of > malloc'ing. Yes all mitigation. :( > My workaround was far sneakier, and fixes the problem entirely, but you > probably won't want to do it for code hygiene reasons. I declare the > __thread variable in the java launcher itself, and then export a > function that provides a pointer to it. In practice, in glibc, if it is > in the main executable, ELF is smart enough to declare it as part of the > stack, and is therefore async-safe. But even that only works for our own launchers - not for embedded in the JVM. :( > Fortunately, while this is a fun thing to think about, I don't think > there are any async paths in the JVM that use Thread::current() > (although I could be wrong - there is a comment in there saying not to > call Thread::current() in a signal handler...). 
> I would check the call paths in AsyncGetCallTrace to make sure.

So two things ...

First, using Thread::current() in a signal context was disallowed, but the alternative was ThreadLocalStorage::get_thread_slow(). The former may not work in a signal context because of the caching mechanisms layered in on different platforms, while the latter used the platform TLS API which, even if not guaranteed, worked well enough in a signal context. With __thread we don't have even a pretend signal-safe alternative :(

Second, AsyncGetCallTrace violates the first rule by using JavaThread::current() in an assertion.

Also, the problem may not be limited to something like AsyncGetCallTrace. Agents may get the current thread from the JNIEnv rather than invoking some JVM function that results in Thread::current() being used, but I can't be sure of that.

Anyway, more things to mull over on the weekend. :)

Thanks,
David

> Jeremy
>
> On Thu, Nov 5, 2015 at 10:26 PM, David Holmes wrote:
>
> Hi Jeremy,
>
> Okay I have read:
>
> https://sourceware.org/glibc/wiki/TLSandSignals
>
> and the tree of mail postings referenced therefrom - great reading! :)
>
> So basic problem: access to __thread variables is not async-signal-safe
>
> Exacerbator to problem: first attempt to even read a __thread variable can lead to allocation which is the real problem in relation to async-signal-safety
>
> I mention the exacerbator because pthread_getspecific and pthread_setspecific are also not async-signal-safe but we already use them. However, pthread_getspecific is in fact (per email threads linked above) effectively async-signal-safe, and further a call to pthread_getspecific never results in a call to pthread_setspecific or an allocation. Hence the pthread functions are almost, if not completely, safe in practice with reasonable uses (ie only read from signal handler).
Which explain this code in existing Thread::current() > > #ifdef PARANOID > // Signal handler should call ThreadLocalStorage::get_thread_slow() > Thread* t = ThreadLocalStorage::get_thread_slow(); > assert(t != NULL && !t->is_inside_signal_handler(), > "Don't use Thread::current() inside signal handler"); > #endif > > So problem scenario is: use of __thread variable (that belongs to > the shared-library) in a signal handler. > > Solution 0: don't do that. Seriously - like any other > async-signal-unsafe stuff we should not be using it in real signal > handlers. The crash handler is a different matter - we try all sorts > there because it might work and you can't die twice. > > Otherwise: narrow the window of exposure. > > 1. We ensure we initialize thread_current (even with a dummy value) > as early as possible in the thread that loads libjvm. As we have no > signal handlers installed at that point that might use the same > variable, we can not hit the problem scenario. > > 2. We ensure we initialize thread_current in a new thread with all > signals blocked. This again avoids the problem scenario. > > 3. We initialize thread_current in an attaching thread as soon as > possible and we again first block all signals. > > That still leaves the problem of an unattached native thread taking > a signal whilst in async-signal-unsafe code, and executing a signal > handler which in turns tries to access thread_current for the first > time. This signal handler need not be an actual JVM handler, but one > attached by other native code eg an agent. I'm not clear in the > latter case how reasonable it is for an agent's handler to try and > do things from an unattached thread - and we don't claim any JNI > interfaces can, or should, be called from a signal handler - but it > is something you can probably get away with today. > > Let me also point out that we already effectively have this code in > Solaris already (at the ThreadLocalStorage class level). 
So if there > is something here that will prevent the current proposal we already > have a problem on Solaris. :( > > Thoughts/comments/suggestions? > > Thanks, > David > > > On 6/11/2015 1:09 PM, David Holmes wrote: > > Hi Jeremy, > > I was going to ask you to elaborate :) > > On 6/11/2015 12:24 PM, Jeremy Manson wrote: > > I should probably elaborate on this. With glibc + ELF, the > first time a > thread accesses a variable declared __thread, if that > variable is in a > shared library (as opposed to the main executable), the > system calls > malloc() to allocate the space for it. If that happens in a > signal that > is being delivered during a call to malloc(), then you > usually get a > crash. > > > My understanding of the ELF ABI for thread-locals - which I read > about > in the Solaris 11.1 Linkers and libraries guide - does require > use of > the dynamic TLS model for any dynamically loaded shared object which > defines a thread-local, but that is what we use as I understand > it. The > docs state: > > "A shared object containing only dynamic TLS can be loaded following > process startup without limitations. The runtime linker extends > the list > of initialization records to include the initialization template > of the > new object. The new object is given an index of m = M + 1. The > counter M is incremented by 1. However, the allocation of new > TLS blocks > is deferred until the blocks are actually referenced." > > Now I guess "extends the list" might be implemented using malloc > ... but > this will only occur in the main thread (the one started by the > launcher > to load the JVM and become the main thread), at the time libjvm is > loaded - which will all be over before any agent etc can run and do > anything. But "allocation ... is deferred" suggests we may have a > problem until either the first call to Thread::current or the > call to > Thread::initialize_thread_current. 
If it is the former then that > should > occur well before any agent etc can be loaded. And I can easily > inject > an initial dummy call to initialize_thread_current(null) to > force the > TLS allocation. > > This may bite you if AsyncGetCallTrace uses > Thread::current(), and you > use system timers to do profiling. If a thread doing a > malloc() prior > to the first time it accesses Thread::current(), and it gets > delivered a > signal, it might die. This is especially likely for pure > native threads > started by native code. > > I believe that this is a use case you support, so you might > want to make > sure it is okay. > > > For a VM embedded in a process, which already contains native > threads, > that will later attach to the VM, this may indeed be a problem. One > would have hoped however that the implementation of TLS would be > completely robust, at least for something as simple as getting a > signal > whilst in the allocator. > > I'm unclear how to test for or check for this kind of problem. > Arguably > there could be many things that are async-unsafe in this way. > > Need to think more about this and do some more research. Would also > appreciate any insight from any glibc and/or ELF gurus. > > Thanks. > David > > Jeremy > > On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson > > >> wrote: > > Something that's bitten me with __thread: it isn't > async-safe when > called from a shared object on Linux. Have you vetted > to make sure > this doesn't make HS less async-safe? > > Jeremy > > On Sun, Nov 1, 2015 at 10:40 PM, David Holmes > > >> wrote: > > bug: https://bugs.openjdk.java.net/browse/JDK-8132510 > > Open webrev: > http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/ > > A simple (in principle) but wide-ranging change > which should > appeal to our Code Deletion Engineer's. 
We implement > Thread::current() using a compiler/language-based > thread-local > variable eg: > > > static __thread Thread *_thr_current; > > inline Thread* Thread::current() { > return _thr_current; > } > > with an appropriate setter of course. By doing this > we can > completely remove the platform-specific > ThreadLocalStorage > implementations, and the associated > os::thread_local_storage* > calls, plus all the uses of > ThreadLocalStorage::thread() and > ThreadLocalStorage::get_thread_slow(). This extends > the previous > work done on Solaris to implement > ThreadLocalStorage::thread() > using compiler-based thread-locals. > > We can also consolidate nearly all the os_cpu > versions of > MacroAssembler::get_thread on x86 into one cpu > specific one ( a > special variant is still needed for 32-bit Windows). > > As a result of this change we have further > potential cleanups: > - all the src/os//vm/thread_.inline.hpp > files are now > completely empty and could also be removed > - the MINIMIZE_RAM_USAGE define (which avoids use > of the linux > sp-map "cache" on 32-bit) now has no affect and so > could be > completely removed from the build system > > I plan to do the MINIMIZE_RAM_USAGE removal as a > follow up CR, > but could add the removal of the "inline" files to > this CR if > people think it worth removing them. > > I have one missing piece on Aarch64 - I need to change > MacroAssembler::get_thread to simply call > Thread::current() as > on other platforms, but I don't know how to write > that. I would > appreciate it if someone could give me the right > code for that. > > I would also appreciate comments/testing by the AIX > and PPC64 > folk as well. > > A concern about memory-leaks had previously been > raised, but > experiments using simple C code on linux 86 and > Solaris showed > no issues. Also note that Aarch64 already uses this > kind of > thread-local. 
> > Thanks, > David > > > > From thomas.stuefe at gmail.com Fri Nov 6 11:52:54 2015 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Fri, 6 Nov 2015 12:52:54 +0100 Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables In-Reply-To: <563C4824.7040300@oracle.com> References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com> Message-ID: Hi David, On Fri, Nov 6, 2015 at 7:26 AM, David Holmes wrote: > Hi Jeremy, > > Okay I have read: > > https://sourceware.org/glibc/wiki/TLSandSignals > > and the tree of mail postings referenced therefrom - great reading! :) > > So basic problem: access to __thread variables is not async-signal-safe > > Exacerbator to problem: first attempt to even read a __thread variable can > lead to allocation which is the real problem in relation to > async-signal-safety > > I mention the exacerbator because pthread_getspecific and > pthread_setSpecific are also not async-signal-safe but we already use them. > However, pthread_getspecific is in fact (per email threads linked above) > effectively async-signal-safe, and further a call to pthread_getspecific > never results in a call to pthread_setspecific or an allocation. Hence the > pthread functions are almost, if not completely, safe in practice with > reasonable uses (ie only read from signal handler). Which explain this code > in existing Thread::current() > > #ifdef PARANOID > // Signal handler should call ThreadLocalStorage::get_thread_slow() > Thread* t = ThreadLocalStorage::get_thread_slow(); > assert(t != NULL && !t->is_inside_signal_handler(), > "Don't use Thread::current() inside signal handler"); > #endif > > So problem scenario is: use of __thread variable (that belongs to the > shared-library) in a signal handler. > > Solution 0: don't do that. Seriously - like any other async-signal-unsafe > stuff we should not be using it in real signal handlers. 
> The crash handler is a different matter - we try all sorts there because it might work and you can't die twice.
>
> Otherwise: narrow the window of exposure.
>
> 1. We ensure we initialize thread_current (even with a dummy value) as early as possible in the thread that loads libjvm. As we have no signal handlers installed at that point that might use the same variable, we can not hit the problem scenario.
>
> 2. We ensure we initialize thread_current in a new thread with all signals blocked. This again avoids the problem scenario.
>
> 3. We initialize thread_current in an attaching thread as soon as possible and we again first block all signals.
>
> That still leaves the problem of an unattached native thread taking a signal whilst in async-signal-unsafe code, and executing a signal handler which in turns tries to access thread_current for the first time. This signal handler need not be an actual JVM handler, but one attached by other native code eg an agent. I'm not clear in the latter case how reasonable it is for an agent's handler to try and do things from an unattached thread - and we don't claim any JNI interfaces can, or should, be called from a signal handler - but it is something you can probably get away with today.
>
> Let me also point out that we already effectively have this code in Solaris already (at the ThreadLocalStorage class level). So if there is something here that will prevent the current proposal we already have a problem on Solaris. :(
>
> Thoughts/comments/suggestions?

The first problem: a thread initializes the TLS variable, gets interrupted, and accesses the half-initialized variable from within the signal handler. Couldn't this happen today too? I think we have never seen it, though.

In theory, it could be mitigated by some careful testing before using the Thread::current() value in the signal handler: for example, put an eyecatcher at the beginning of the Thread structure and check it using SafeFetch.
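A rough sketch of that eyecatcher check, under stated assumptions: FakeThread, kEyecatcher, read_word_or and looks_like_thread are hypothetical names made up for illustration, and read_word_or is a plain load standing in for a fault-tolerant read such as HotSpot's SafeFetch, whose real signature is not reproduced here.

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical magic value stored as the first word of every thread object.
static const uint32_t kEyecatcher = 0x54485244u;  // ASCII "THRD"

// Hypothetical stand-in for the VM's Thread class; only the first word matters.
struct FakeThread {
  uint32_t eyecatcher;
  int id;
};

// Placeholder for a fault-tolerant load. A real signal handler would need a
// read that survives an unmapped address; this sketch only guards against NULL.
static uint32_t read_word_or(const uint32_t* addr, uint32_t fallback) {
  return addr != NULL ? *addr : fallback;
}

// True only if the pointer is non-null and the object starts with the eyecatcher.
static bool looks_like_thread(const FakeThread* t) {
  const uint32_t* first_word = (t != NULL) ? &t->eyecatcher : NULL;
  return read_word_or(first_word, 0) == kEyecatcher;
}
```

A handler would call looks_like_thread() on whatever pointer it obtained before dereferencing anything else in the object; the check says nothing about how that pointer was obtained in the first place.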
As for the second problem - recursive malloc() deadlocks - I am at a loss. I do not fully understand though why pthread_getspecific is different - does it not have to allocate place for the TLS variable too? Regards, Thomas > Thanks, > David > > > On 6/11/2015 1:09 PM, David Holmes wrote: > >> Hi Jeremy, >> >> I was going to ask you to elaborate :) >> >> On 6/11/2015 12:24 PM, Jeremy Manson wrote: >> >>> I should probably elaborate on this. With glibc + ELF, the first time a >>> thread accesses a variable declared __thread, if that variable is in a >>> shared library (as opposed to the main executable), the system calls >>> malloc() to allocate the space for it. If that happens in a signal that >>> is being delivered during a call to malloc(), then you usually get a >>> crash. >>> >> >> My understanding of the ELF ABI for thread-locals - which I read about >> in the Solaris 11.1 Linkers and libraries guide - does require use of >> the dynamic TLS model for any dynamically loaded shared object which >> defines a thread-local, but that is what we use as I understand it. The >> docs state: >> >> "A shared object containing only dynamic TLS can be loaded following >> process startup without limitations. The runtime linker extends the list >> of initialization records to include the initialization template of the >> new object. The new object is given an index of m = M + 1. The >> counter M is incremented by 1. However, the allocation of new TLS blocks >> is deferred until the blocks are actually referenced." >> >> Now I guess "extends the list" might be implemented using malloc ... but >> this will only occur in the main thread (the one started by the launcher >> to load the JVM and become the main thread), at the time libjvm is >> loaded - which will all be over before any agent etc can run and do >> anything. But "allocation ... 
is deferred" suggests we may have a >> problem until either the first call to Thread::current or the call to >> Thread::initialize_thread_current. If it is the former then that should >> occur well before any agent etc can be loaded. And I can easily inject >> an initial dummy call to initialize_thread_current(null) to force the >> TLS allocation. >> >> This may bite you if AsyncGetCallTrace uses Thread::current(), and you >>> use system timers to do profiling. If a thread doing a malloc() prior >>> to the first time it accesses Thread::current(), and it gets delivered a >>> signal, it might die. This is especially likely for pure native threads >>> started by native code. >>> >>> I believe that this is a use case you support, so you might want to make >>> sure it is okay. >>> >> >> For a VM embedded in a process, which already contains native threads, >> that will later attach to the VM, this may indeed be a problem. One >> would have hoped however that the implementation of TLS would be >> completely robust, at least for something as simple as getting a signal >> whilst in the allocator. >> >> I'm unclear how to test for or check for this kind of problem. Arguably >> there could be many things that are async-unsafe in this way. >> >> Need to think more about this and do some more research. Would also >> appreciate any insight from any glibc and/or ELF gurus. >> >> Thanks. >> David >> >> Jeremy >>> >>> On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson >> > wrote: >>> >>> Something that's bitten me with __thread: it isn't async-safe when >>> called from a shared object on Linux. Have you vetted to make sure >>> this doesn't make HS less async-safe? 
>>> >>> Jeremy >>> >>> On Sun, Nov 1, 2015 at 10:40 PM, David Holmes >>> > wrote: >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8132510 >>> >>> Open webrev: >>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/ >>> >>> A simple (in principle) but wide-ranging change which should >>> appeal to our Code Deletion Engineer's. We implement >>> Thread::current() using a compiler/language-based thread-local >>> variable eg: >>> >>> >>> static __thread Thread *_thr_current; >>> >>> inline Thread* Thread::current() { >>> return _thr_current; >>> } >>> >>> with an appropriate setter of course. By doing this we can >>> completely remove the platform-specific ThreadLocalStorage >>> implementations, and the associated os::thread_local_storage* >>> calls, plus all the uses of ThreadLocalStorage::thread() and >>> ThreadLocalStorage::get_thread_slow(). This extends the previous >>> work done on Solaris to implement ThreadLocalStorage::thread() >>> using compiler-based thread-locals. >>> >>> We can also consolidate nearly all the os_cpu versions of >>> MacroAssembler::get_thread on x86 into one cpu specific one ( a >>> special variant is still needed for 32-bit Windows). >>> >>> As a result of this change we have further potential cleanups: >>> - all the src/os//vm/thread_.inline.hpp files are now >>> completely empty and could also be removed >>> - the MINIMIZE_RAM_USAGE define (which avoids use of the linux >>> sp-map "cache" on 32-bit) now has no affect and so could be >>> completely removed from the build system >>> >>> I plan to do the MINIMIZE_RAM_USAGE removal as a follow up CR, >>> but could add the removal of the "inline" files to this CR if >>> people think it worth removing them. >>> >>> I have one missing piece on Aarch64 - I need to change >>> MacroAssembler::get_thread to simply call Thread::current() as >>> on other platforms, but I don't know how to write that. I would >>> appreciate it if someone could give me the right code for that. 
>>>
>>> I would also appreciate comments/testing by the AIX and PPC64 folk as well.
>>>
>>> A concern about memory-leaks had previously been raised, but experiments using simple C code on linux 86 and Solaris showed no issues. Also note that Aarch64 already uses this kind of thread-local.
>>>
>>> Thanks,
>>> David

From david.holmes at oracle.com Fri Nov 6 22:20:40 2015
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 7 Nov 2015 08:20:40 +1000
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To:
References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com>
Message-ID: <563D27B8.4040501@oracle.com>

On 6/11/2015 9:52 PM, Thomas Stüfe wrote:
> Hi David,
>
> On Fri, Nov 6, 2015 at 7:26 AM, David Holmes wrote:
>
> Hi Jeremy,
>
> Okay I have read:
>
> https://sourceware.org/glibc/wiki/TLSandSignals
>
> and the tree of mail postings referenced therefrom - great reading! :)
>
> So basic problem: access to __thread variables is not async-signal-safe
>
> Exacerbator to problem: first attempt to even read a __thread variable can lead to allocation which is the real problem in relation to async-signal-safety
>
> I mention the exacerbator because pthread_getspecific and pthread_setspecific are also not async-signal-safe but we already use them. However, pthread_getspecific is in fact (per email threads linked above) effectively async-signal-safe, and further a call to pthread_getspecific never results in a call to pthread_setspecific or an allocation. Hence the pthread functions are almost, if not completely, safe in practice with reasonable uses (ie only read from signal handler).
Which explain this code in existing Thread::current() > > #ifdef PARANOID > // Signal handler should call ThreadLocalStorage::get_thread_slow() > Thread* t = ThreadLocalStorage::get_thread_slow(); > assert(t != NULL && !t->is_inside_signal_handler(), > "Don't use Thread::current() inside signal handler"); > #endif > > So problem scenario is: use of __thread variable (that belongs to > the shared-library) in a signal handler. > > Solution 0: don't do that. Seriously - like any other > async-signal-unsafe stuff we should not be using it in real signal > handlers. The crash handler is a different matter - we try all sorts > there because it might work and you can't die twice. > > Otherwise: narrow the window of exposure. > > 1. We ensure we initialize thread_current (even with a dummy value) > as early as possible in the thread that loads libjvm. As we have no > signal handlers installed at that point that might use the same > variable, we can not hit the problem scenario. > > 2. We ensure we initialize thread_current in a new thread with all > signals blocked. This again avoids the problem scenario. > > 3. We initialize thread_current in an attaching thread as soon as > possible and we again first block all signals. > > That still leaves the problem of an unattached native thread taking > a signal whilst in async-signal-unsafe code, and executing a signal > handler which in turns tries to access thread_current for the first > time. This signal handler need not be an actual JVM handler, but one > attached by other native code eg an agent. I'm not clear in the > latter case how reasonable it is for an agent's handler to try and > do things from an unattached thread - and we don't claim any JNI > interfaces can, or should, be called from a signal handler - but it > is something you can probably get away with today. > > Let me also point out that we already effectively have this code in > Solaris already (at the ThreadLocalStorage class level). 
> So if there is something here that will prevent the current proposal we already have a problem on Solaris. :(
>
> Thoughts/comments/suggestions?
>
>
> The first problem: thread initializes TLS variable, gets interrupted and accesses the half-initialized variable from within the signal handler. This could happen today too, or? but I think we never saw this.

That depends on the state of signal masks at the time of the initialization. For threads created in the VM and for threads attached to the VM it is likely not an issue. Unattached threads could in theory try to access a TLS variable from a signal handler, but they will never be initializing that variable. Of course the unattached thread could be initializing a completely distinct TLS variable, but reading a different TLS variable from the signal handler does not seem to be an issue (in theory it may be but this is an extreme corner case).

> In theory, it could be mitigated by some careful testing before using the Thread::current() value in the signal handler. Like, put an eyecatcher at the beginning of the Thread structure and check that using SafeFetch.

There is no way to access the Thread structure before calling Thread::current(). And the potential problem is with unattached threads which have no Thread structure. For threads attached to the VM, or attaching, my three steps will deal with any potential problems.

> As for the second problem - recursive malloc() deadlocks - I am at a loss. I do not fully understand though why pthread_getspecific is different - does it not have to allocate place for the TLS variable too?

No, pthread_getspecific does not have to allocate. Presumably it is written in a way that attempting to index a TLS variable that has not been allocated just returns an error (EINVAL?). The problem with __thread is that even a read will attempt to do the allocation - arguably (as the Google folk did argue) this is wrong, or else should be done in an async-safe way.
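To make the contrast concrete, here is a small sketch of the two lookup styles. POSIX specifies that pthread_getspecific() returns NULL when the calling thread has no value associated with the key, and defines no error returns for it, so the read path has nothing to allocate. ThreadStub, g_key, tls_current and the helper functions are names made up for illustration; this is not the HotSpot code, and the sketch assumes GCC or Clang for __thread.

```cpp
#include <pthread.h>
#include <cstddef>

struct ThreadStub { int id; };  // hypothetical stand-in for Thread

// Library-based TLS: one process-wide key, one slot per thread.
static pthread_key_t g_key;

// Must run once, early, before any signal handler could call current_pthread().
static bool init_key() {
  return pthread_key_create(&g_key, NULL) == 0;
}

// Setter: POSIX allows this call to fail with ENOMEM, so it may allocate.
static bool set_current_pthread(ThreadStub* t) {
  return pthread_setspecific(g_key, t) == 0;
}

// Getter: returns NULL if this thread never stored a value under g_key.
static ThreadStub* current_pthread() {
  return static_cast<ThreadStub*>(pthread_getspecific(g_key));
}

// Compiler-based TLS. Per the discussion in this thread, when such a variable
// lives in a shared library on glibc, a thread's first access (even a read)
// may allocate the TLS block lazily.
static __thread ThreadStub* tls_current = NULL;

static void set_current_tls(ThreadStub* t) { tls_current = t; }
static ThreadStub* current_tls() { return tls_current; }
```

Both getters return NULL on a thread that never called the matching setter; the difference discussed in this thread is what each does internally on that first read.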
This does leave me wondering exactly what effect the:

  static __thread Thread* _thr_current = NULL;

has in terms of any per-thread allocation. ??

Anyway, to reiterate the problem scenario:
- VM has been loaded in a process and signal handlers have been installed (maybe VM, maybe agent)
- unattached thread is doing a malloc when it takes a signal
- signal handler tries to read __thread variable and we get a malloc deadlock

As I said, I need to determine what signal handlers in the VM might ever run on an unattached thread, and what they might do. For a "third-party" signal handler there's really nothing I can do - they should not be accessing the VM's __thread variables though (and they can always introduce their own independent deadlocks by performing non-async-safe actions).

Thanks,
David

> Regards, Thomas
>
>
> Thanks,
> David
>
>
> On 6/11/2015 1:09 PM, David Holmes wrote:
>
> Hi Jeremy,
>
> I was going to ask you to elaborate :)
>
> On 6/11/2015 12:24 PM, Jeremy Manson wrote:
>
> I should probably elaborate on this. With glibc + ELF, the first time a thread accesses a variable declared __thread, if that variable is in a shared library (as opposed to the main executable), the system calls malloc() to allocate the space for it. If that happens in a signal that is being delivered during a call to malloc(), then you usually get a crash.
>
>
> My understanding of the ELF ABI for thread-locals - which I read about in the Solaris 11.1 Linkers and libraries guide - does require use of the dynamic TLS model for any dynamically loaded shared object which defines a thread-local, but that is what we use as I understand it. The docs state:
>
> "A shared object containing only dynamic TLS can be loaded following process startup without limitations. The runtime linker extends the list of initialization records to include the initialization template of the new object. The new object is given an index of m = M + 1.
The > counter M is incremented by 1. However, the allocation of new > TLS blocks > is deferred until the blocks are actually referenced." > > Now I guess "extends the list" might be implemented using malloc > ... but > this will only occur in the main thread (the one started by the > launcher > to load the JVM and become the main thread), at the time libjvm is > loaded - which will all be over before any agent etc can run and do > anything. But "allocation ... is deferred" suggests we may have a > problem until either the first call to Thread::current or the > call to > Thread::initialize_thread_current. If it is the former then that > should > occur well before any agent etc can be loaded. And I can easily > inject > an initial dummy call to initialize_thread_current(null) to > force the > TLS allocation. > > This may bite you if AsyncGetCallTrace uses > Thread::current(), and you > use system timers to do profiling. If a thread doing a > malloc() prior > to the first time it accesses Thread::current(), and it gets > delivered a > signal, it might die. This is especially likely for pure > native threads > started by native code. > > I believe that this is a use case you support, so you might > want to make > sure it is okay. > > > For a VM embedded in a process, which already contains native > threads, > that will later attach to the VM, this may indeed be a problem. One > would have hoped however that the implementation of TLS would be > completely robust, at least for something as simple as getting a > signal > whilst in the allocator. > > I'm unclear how to test for or check for this kind of problem. > Arguably > there could be many things that are async-unsafe in this way. > > Need to think more about this and do some more research. Would also > appreciate any insight from any glibc and/or ELF gurus. > > Thanks. 
> David > > Jeremy > > On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson > > >> wrote: > > Something that's bitten me with __thread: it isn't > async-safe when > called from a shared object on Linux. Have you vetted > to make sure > this doesn't make HS less async-safe? > > Jeremy > > On Sun, Nov 1, 2015 at 10:40 PM, David Holmes > > >> wrote: > > bug: https://bugs.openjdk.java.net/browse/JDK-8132510 > > Open webrev: > http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/ > > A simple (in principle) but wide-ranging change > which should > appeal to our Code Deletion Engineer's. We implement > Thread::current() using a compiler/language-based > thread-local > variable eg: > > > static __thread Thread *_thr_current; > > inline Thread* Thread::current() { > return _thr_current; > } > > with an appropriate setter of course. By doing this > we can > completely remove the platform-specific > ThreadLocalStorage > implementations, and the associated > os::thread_local_storage* > calls, plus all the uses of > ThreadLocalStorage::thread() and > ThreadLocalStorage::get_thread_slow(). This extends > the previous > work done on Solaris to implement > ThreadLocalStorage::thread() > using compiler-based thread-locals. > > We can also consolidate nearly all the os_cpu > versions of > MacroAssembler::get_thread on x86 into one cpu > specific one ( a > special variant is still needed for 32-bit Windows). > > As a result of this change we have further > potential cleanups: > - all the src/os//vm/thread_.inline.hpp > files are now > completely empty and could also be removed > - the MINIMIZE_RAM_USAGE define (which avoids use > of the linux > sp-map "cache" on 32-bit) now has no affect and so > could be > completely removed from the build system > > I plan to do the MINIMIZE_RAM_USAGE removal as a > follow up CR, > but could add the removal of the "inline" files to > this CR if > people think it worth removing them. 
> > I have one missing piece on Aarch64 - I need to change > MacroAssembler::get_thread to simply call > Thread::current() as > on other platforms, but I don't know how to write > that. I would > appreciate it if someone could give me the right > code for that. > > I would also appreciate comments/testing by the AIX > and PPC64 > folk as well. > > A concern about memory-leaks had previously been > raised, but > experiments using simple C code on linux 86 and > Solaris showed > no issues. Also note that Aarch64 already uses this > kind of > thread-local. > > Thanks, > David > > > > From david.holmes at oracle.com Sun Nov 8 22:54:18 2015 From: david.holmes at oracle.com (David Holmes) Date: Mon, 9 Nov 2015 08:54:18 +1000 Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables In-Reply-To: References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com> <563C5B5B.9060802@oracle.com> Message-ID: <563FD29A.3060103@oracle.com> On 7/11/2015 11:22 AM, Jeremy Manson wrote: > > > On Thu, Nov 5, 2015 at 11:48 PM, David Holmes > wrote: > > On 6/11/2015 4:55 PM, Jeremy Manson wrote: > > FWIW, Google tried to convince the glibc maintainers to make this > async-safe, but they weren't biting: > > https://sourceware.org/ml/libc-alpha/2014-01/msg00033.html > > > Yes I read all that. I wouldn't say they weren't biting, more of a > disagreement on the right direction for the patch. glibc weren't > prepared to take it directly as is, while google-folk didn't seem to > think it worth their while to do whatever glibc folk wanted. The > actual patch proposal just died out. :( Quite a pity. > > Most of the things you can do are going to be mitigation rather > than a > fix. I did what you suggest to mitigate, and no one complained, > until > someone at Google started a sidecar C++ thread that did a > boatload of > malloc'ing. > > > Yes all mitigation. 
:( > > My workaround was far sneakier, and fixes the problem entirely, > but you > probably won't want to do it for code hygiene reasons. I > declare the > __thread variable in the java launcher itself, and then export a > function that provides a pointer to it. In practice, in glibc, > if it is > in the main executable, ELF is smart enough to declare it as > part of the > stack, and is therefore async-safe. > > > But even that only works for our own launchers - not for embedded in > the JVM. :( > > > Yup. Fortunately, I can tell people at Google how to write launchers. > > Fortunately, while this is a fun thing to think about, I don't think > there are any async paths in the JVM that use Thread::current() > (although I could be wrong - there is a comment in there saying > not to > call Thread::current() in a signal handler...). I would check > the call > paths in AsyncGetCallTrace to make sure. > > > So two things ... > > First, using Thread::current() in a signal context was disallowed, > but the alternative was ThreadLocalStorage::get_thread_slow(). The > former may not work in a signal context due to the caching > mechanisms layered in on different platforms, while the latter used > the platform TLS API which, even if not guaranteed, worked well > enough in a signal context. With __thread we don't have even a > pretend signal-safe alternative :( > > > Right. > > Second, AsyncGetCallTrace violates the first rule by using > JavaThread::current() in an assertion. > > > While we're on the subject, the assertion in Method::bci_from is > reachable from AsyncGetCallTrace and calls err_msg, which can malloc(). > I meant to file a bug about that. > > Also, the problem may not be limited to something like > AsyncGetCallTrace. Though agents may get the current thread from the > JNIEnv rather than invoking some JVM function that results in > Thread::current() being used, I can't be sure of that. > > > Which JVM functions that get the thread are supposed to be async-safe? 
> There is no reason to think that any method that isn't explicitly marked
> async-safe is async safe, and most JNI methods I've tried to use from
> signal handlers die painfully if I try to use them from a signal handler.
>
> Generally, I don't think it is reasonable for a user to expect
> async-safety from an API that isn't expressly designed that way. POSIX
> has a list of async-safe methods (signal(7)).

Right - no JNI or JVM TI functions are designated as async-signal-safe
(the specs don't even mention signals).

Unfortunately my problem is more basic: pretty much the first thing the
JVM signal handler does is get the current thread. So if the signal is
handled on an unattached thread that happened to be doing a malloc then
we're toast. :( Most of the signals the JVM expects to handle are not
blocked by default, AFAICS, so any unattached thread could be selected.

David
-----

> FWIW, to use AsyncGetCallTrace, I get the JNIEnv in a ThreadStart hook
> from JVMTI and stash it in a __thread (and pull the trick I mentioned).
>
> Jeremy
>
> Anyway more things to mull over on the weekend. :)
>
> Thanks,
> David
>
> Jeremy
>
> On Thu, Nov 5, 2015 at 10:26 PM, David Holmes wrote:
>
> Hi Jeremy,
>
> Okay I have read:
>
> https://sourceware.org/glibc/wiki/TLSandSignals
>
> and the tree of mail postings referenced therefrom - great reading! :)
>
> So basic problem: access to __thread variables is not async-signal-safe
>
> Exacerbator to problem: first attempt to even read a __thread
> variable can lead to allocation which is the real problem in
> relation to async-signal-safety
>
> I mention the exacerbator because pthread_getspecific and
> pthread_setspecific are also not async-signal-safe but we already
> use them. However, pthread_getspecific is in fact (per email threads
> linked above) effectively async-signal-safe, and further a call to
> pthread_getspecific never results in a call to pthread_setspecific
> or an allocation.
Hence the pthread functions are almost, > if not > completely, safe in practice with reasonable uses (ie only > read from > signal handler). Which explain this code in existing > Thread::current() > > #ifdef PARANOID > // Signal handler should call > ThreadLocalStorage::get_thread_slow() > Thread* t = ThreadLocalStorage::get_thread_slow(); > assert(t != NULL && !t->is_inside_signal_handler(), > "Don't use Thread::current() inside signal handler"); > #endif > > So problem scenario is: use of __thread variable (that > belongs to > the shared-library) in a signal handler. > > Solution 0: don't do that. Seriously - like any other > async-signal-unsafe stuff we should not be using it in real > signal > handlers. The crash handler is a different matter - we try > all sorts > there because it might work and you can't die twice. > > Otherwise: narrow the window of exposure. > > 1. We ensure we initialize thread_current (even with a > dummy value) > as early as possible in the thread that loads libjvm. As we > have no > signal handlers installed at that point that might use the same > variable, we can not hit the problem scenario. > > 2. We ensure we initialize thread_current in a new thread > with all > signals blocked. This again avoids the problem scenario. > > 3. We initialize thread_current in an attaching thread as > soon as > possible and we again first block all signals. > > That still leaves the problem of an unattached native > thread taking > a signal whilst in async-signal-unsafe code, and executing > a signal > handler which in turns tries to access thread_current for > the first > time. This signal handler need not be an actual JVM > handler, but one > attached by other native code eg an agent. 
I'm not clear in the > latter case how reasonable it is for an agent's handler to > try and > do things from an unattached thread - and we don't claim > any JNI > interfaces can, or should, be called from a signal handler > - but it > is something you can probably get away with today. > > Let me also point out that we already effectively have this > code in > Solaris already (at the ThreadLocalStorage class level). So > if there > is something here that will prevent the current proposal we > already > have a problem on Solaris. :( > > Thoughts/comments/suggestions? > > Thanks, > David > > > On 6/11/2015 1:09 PM, David Holmes wrote: > > Hi Jeremy, > > I was going to ask you to elaborate :) > > On 6/11/2015 12:24 PM, Jeremy Manson wrote: > > I should probably elaborate on this. With glibc + > ELF, the > first time a > thread accesses a variable declared __thread, if that > variable is in a > shared library (as opposed to the main executable), the > system calls > malloc() to allocate the space for it. If that > happens in a > signal that > is being delivered during a call to malloc(), then you > usually get a > crash. > > > My understanding of the ELF ABI for thread-locals - > which I read > about > in the Solaris 11.1 Linkers and libraries guide - does > require > use of > the dynamic TLS model for any dynamically loaded shared > object which > defines a thread-local, but that is what we use as I > understand > it. The > docs state: > > "A shared object containing only dynamic TLS can be > loaded following > process startup without limitations. The runtime linker > extends > the list > of initialization records to include the initialization > template > of the > new object. The new object is given an index of m = M + > 1. The > counter M is incremented by 1. However, the allocation > of new > TLS blocks > is deferred until the blocks are actually referenced." > > Now I guess "extends the list" might be implemented > using malloc > ... 
but > this will only occur in the main thread (the one > started by the > launcher > to load the JVM and become the main thread), at the > time libjvm is > loaded - which will all be over before any agent etc > can run and do > anything. But "allocation ... is deferred" suggests we > may have a > problem until either the first call to Thread::current > or the > call to > Thread::initialize_thread_current. If it is the former > then that > should > occur well before any agent etc can be loaded. And I > can easily > inject > an initial dummy call to initialize_thread_current(null) to > force the > TLS allocation. > > This may bite you if AsyncGetCallTrace uses > Thread::current(), and you > use system timers to do profiling. If a thread doing a > malloc() prior > to the first time it accesses Thread::current(), > and it gets > delivered a > signal, it might die. This is especially likely > for pure > native threads > started by native code. > > I believe that this is a use case you support, so > you might > want to make > sure it is okay. > > > For a VM embedded in a process, which already contains > native > threads, > that will later attach to the VM, this may indeed be a > problem. One > would have hoped however that the implementation of TLS > would be > completely robust, at least for something as simple as > getting a > signal > whilst in the allocator. > > I'm unclear how to test for or check for this kind of > problem. > Arguably > there could be many things that are async-unsafe in > this way. > > Need to think more about this and do some more > research. Would also > appreciate any insight from any glibc and/or ELF gurus. > > Thanks. > David > > Jeremy > > On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson > > > > >>> wrote: > > Something that's bitten me with __thread: it isn't > async-safe when > called from a shared object on Linux. Have > you vetted > to make sure > this doesn't make HS less async-safe? 
> > Jeremy > > On Sun, Nov 1, 2015 at 10:40 PM, David Holmes > > > > > > >>> wrote: > > bug: > https://bugs.openjdk.java.net/browse/JDK-8132510 > > Open webrev: > http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/ > > A simple (in principle) but wide-ranging > change > which should > appeal to our Code Deletion Engineer's. We > implement > Thread::current() using a > compiler/language-based > thread-local > variable eg: > > > static __thread Thread *_thr_current; > > inline Thread* Thread::current() { > return _thr_current; > } > > with an appropriate setter of course. By > doing this > we can > completely remove the platform-specific > ThreadLocalStorage > implementations, and the associated > os::thread_local_storage* > calls, plus all the uses of > ThreadLocalStorage::thread() and > ThreadLocalStorage::get_thread_slow(). > This extends > the previous > work done on Solaris to implement > ThreadLocalStorage::thread() > using compiler-based thread-locals. > > We can also consolidate nearly all the os_cpu > versions of > MacroAssembler::get_thread on x86 into one cpu > specific one ( a > special variant is still needed for 32-bit > Windows). > > As a result of this change we have further > potential cleanups: > - all the > src/os//vm/thread_.inline.hpp > files are now > completely empty and could also be removed > - the MINIMIZE_RAM_USAGE define (which > avoids use > of the linux > sp-map "cache" on 32-bit) now has no > affect and so > could be > completely removed from the build system > > I plan to do the MINIMIZE_RAM_USAGE > removal as a > follow up CR, > but could add the removal of the "inline" > files to > this CR if > people think it worth removing them. > > I have one missing piece on Aarch64 - I > need to change > MacroAssembler::get_thread to simply call > Thread::current() as > on other platforms, but I don't know how > to write > that. I would > appreciate it if someone could give me the > right > code for that. 
>
> I would also appreciate comments/testing by the AIX and PPC64
> folk as well.
>
> A concern about memory-leaks had previously been raised, but
> experiments using simple C code on linux 86 and Solaris showed
> no issues. Also note that Aarch64 already uses this kind of
> thread-local.
>
> Thanks,
> David

From gil at azul.com Mon Nov 9 02:39:50 2015
From: gil at azul.com (Gil Tene)
Date: Mon, 9 Nov 2015 02:39:50 +0000
Subject: Project proposal: AArch32 port
In-Reply-To: <1446541917.18905.12.camel@mylittlepony.linaroharston>
References: <1444830648.7802.34.camel@mylittlepony.linaroharston> <5628A5D2.40909@oracle.com> <1445518014.29998.4.camel@mylittlepony.linaroharston> <562A2DBB.9000404@oracle.com> <1445607702.28722.19.camel@mint> <1446040701.26259.11.camel@gmx.com> <1446541917.18905.12.camel@mylittlepony.linaroharston>
Message-ID:

I'd like to voice both my own and Azul's enthusiastic support for this
new aarch32 project. We will be happy to participate and contribute
code & resources, including multiple additional committers.

-- Gil.

> On Nov 3, 2015, at 1:11 AM, Edward Nevill wrote:
>
> On Wed, 2015-10-28 at 13:58 +0000, Joseph Joyce wrote:
>> Hi Ed and Dalibor,
>>
>> There's no code in there from other open source projects. As far as I
>> can remember the only code from elsewhere was an algorithm for
>> division, which came from
>> http://www.chiark.greenend.org.uk/~theom/riscos/docs/ultimate/a252div.txt
>> The code was modified a bit to work for the assembler, the url from
>> which it came is mentioned in the source (MacroAssembler::divide32). If
>> this is a problem I can easily replace the code with a call out to a C
>> (or Ed suggested using the algorithm from the ARM32-Microjit).
>>
>> I have now signed and sent the OCA (today) and would like to continue
>> contributing to this project. If I could be added to the committers that
>> would be great.
>
> Hi Joseph,
>
> That's great news.
I therefore propose Joseph Joyce as an additional > committer for the aarch32 project. > > I have had a look at the divide routine in the template interpreter and > the original by Graeme Williams. I do not believe this should be an > issue as your implementation of the algorithm is completely different. > For a start yours is written in C calling the MacroAssembler methods to > generate the code, whereas his is written in some sort of BASIC > assembler. So although the algorithm is the same the implementation is > different and AIUI it is the implementation that is copyrighted. > > Dalibor: If you are happy with this may I proceed to a CFV for the > aarch32 project. > > Thanks, > Ed. > > From david.holmes at oracle.com Mon Nov 9 23:55:19 2015 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 Nov 2015 09:55:19 +1000 Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables In-Reply-To: <563FD29A.3060103@oracle.com> References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com> <563C5B5B.9060802@oracle.com> <563FD29A.3060103@oracle.com> Message-ID: <56413267.6000001@oracle.com> On 9/11/2015 8:54 AM, David Holmes wrote: > On 7/11/2015 11:22 AM, Jeremy Manson wrote: >> >> >> On Thu, Nov 5, 2015 at 11:48 PM, David Holmes > > wrote: >> >> On 6/11/2015 4:55 PM, Jeremy Manson wrote: >> >> FWIW, Google tried to convince the glibc maintainers to make this >> async-safe, but they weren't biting: >> >> https://sourceware.org/ml/libc-alpha/2014-01/msg00033.html >> >> >> Yes I read all that. I wouldn't say they weren't biting, more of a >> disagreement on the right direction for the patch. glibc weren't >> prepared to take it directly as is, while google-folk didn't seem to >> think it worth their while to do whatever glibc folk wanted. The >> actual patch proposal just died out. :( Quite a pity. 
>> >> Most of the things you can do are going to be mitigation rather >> than a >> fix. I did what you suggest to mitigate, and no one complained, >> until >> someone at Google started a sidecar C++ thread that did a >> boatload of >> malloc'ing. >> >> >> Yes all mitigation. :( >> >> My workaround was far sneakier, and fixes the problem entirely, >> but you >> probably won't want to do it for code hygiene reasons. I >> declare the >> __thread variable in the java launcher itself, and then export a >> function that provides a pointer to it. In practice, in glibc, >> if it is >> in the main executable, ELF is smart enough to declare it as >> part of the >> stack, and is therefore async-safe. >> >> >> But even that only works for our own launchers - not for embedded in >> the JVM. :( >> >> >> Yup. Fortunately, I can tell people at Google how to write launchers. >> >> Fortunately, while this is a fun thing to think about, I don't >> think >> there are any async paths in the JVM that use Thread::current() >> (although I could be wrong - there is a comment in there saying >> not to >> call Thread::current() in a signal handler...). I would check >> the call >> paths in AsyncGetCallTrace to make sure. >> >> >> So two things ... >> >> First, using Thread::current() in a signal context was disallowed, >> but the alternative was ThreadLocalStorage::get_thread_slow(). The >> former may not work in a signal context due to the caching >> mechanisms layered in on different platforms, while the latter used >> the platform TLS API which, even if not guaranteed, worked well >> enough in a signal context. With __thread we don't have even a >> pretend signal-safe alternative :( >> >> >> Right. >> >> Second, AsyncGetCallTrace violates the first rule by using >> JavaThread::current() in an assertion. >> >> >> While we're on the subject, the assertion in Method::bci_from is >> reachable from AsyncGetCallTrace and calls err_msg, which can malloc(). >> I meant to file a bug about that. 
>> Also, the problem may not be limited to something like
>> AsyncGetCallTrace. Though agents may get the current thread from the
>> JNIEnv rather than invoking some JVM function that results in
>> Thread::current() being used, I can't be sure of that.
>>
>> Which JVM functions that get the thread are supposed to be async-safe?
>> There is no reason to think that any method that isn't explicitly marked
>> async-safe is async safe, and most JNI methods I've tried to use from
>> signal handlers die painfully if I try to use them from a signal handler.
>>
>> Generally, I don't think it is reasonable for a user to expect
>> async-safety from an API that isn't expressly designed that way. POSIX
>> has a list of async-safe methods (signal(7)).
>
> Right - no JNI or JVM TI functions are designated as async-signal-safe
> (the specs don't even mention signals).
>
> Unfortunately my problem is more basic: pretty much the first thing the
> JVM signal handler does is get the current thread. So if the signal is
> handled on an unattached thread that happened to be doing a malloc then
> we're toast. :( Most of the signals the JVM expects to handle are not
> blocked by default, AFAICS, so any unattached thread could be selected.

Just to keep my thinking straight on this, the problem only exists for
threads that existed before the JVM was loaded. All threads created
after that will have space for all the TLS variables allocated directly.

So the problem scenario is:

- external process with existing threads loads the JVM
- existing thread is executing a critical library function, e.g. malloc,
  when it takes a process-directed signal
- JVM signal handler runs and accesses _thr_current, which triggers
  dynamic TLS allocation

David
-----
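The mitigation proposed earlier in this thread (touch the thread-local for the first time while all signals are blocked) might look roughly like the sketch below. This is not the webrev code: the free functions `initialize_thread_current` and `current`, the variable `thr_current`, and the stand-in `Thread` struct are hypothetical names chosen for illustration. As the thread itself points out, it only helps threads that run this initialization; a pre-existing native thread that never attaches is not covered.

```cpp
#include <pthread.h>
#include <signal.h>
#include <cassert>

struct Thread { int id; };   // hypothetical stand-in, not HotSpot's Thread

// __thread is the GCC/Clang extension used in the proposal.
static __thread Thread* thr_current = nullptr;

// Hypothetical sketch: perform the first access to the thread-local with
// every signal blocked, so a handler cannot run on this thread while any
// lazy TLS allocation is in progress. A null argument acts as the "dummy"
// initialization mentioned in the thread.
bool initialize_thread_current(Thread* t) {
  sigset_t all_signals, saved;
  int rc = sigfillset(&all_signals);
  assert(rc == 0);
  (void)rc;
  if (pthread_sigmask(SIG_SETMASK, &all_signals, &saved) != 0) {
    return false;
  }
  thr_current = t;   // first touch of the thread-local on this thread
  // Restore the caller's original mask.
  return pthread_sigmask(SIG_SETMASK, &saved, nullptr) == 0;
}

Thread* current() { return thr_current; }
```

Note that when this is built into a main executable rather than a shared library, the lazy allocation discussed above does not occur, so the sketch shows the shape of the mitigation rather than reproducing the failure.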
From thomas.stuefe at gmail.com Tue Nov 10 10:20:26 2015
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Tue, 10 Nov 2015 11:20:26 +0100
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To: <563D27B8.4040501@oracle.com>
References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com> <563D27B8.4040501@oracle.com>
Message-ID:

Hi David,

On Fri, Nov 6, 2015 at 11:20 PM, David Holmes wrote:

> On 6/11/2015 9:52 PM, Thomas Stüfe wrote:
>
>> Hi David,
>>
>> On Fri, Nov 6, 2015 at 7:26 AM, David Holmes wrote:
>>
>> Hi Jeremy,
>>
>> Okay I have read:
>>
>> https://sourceware.org/glibc/wiki/TLSandSignals
>>
>> and the tree of mail postings referenced therefrom - great reading! :)
>>
>> So basic problem: access to __thread variables is not
>> async-signal-safe
>>
>> Exacerbator to problem: first attempt to even read a __thread
>> variable can lead to allocation which is the real problem in
>> relation to async-signal-safety
>>
>> I mention the exacerbator because pthread_getspecific and
>> pthread_setspecific are also not async-signal-safe but we already
>> use them. However, pthread_getspecific is in fact (per email threads
>> linked above) effectively async-signal-safe, and further a call to
>> pthread_getspecific never results in a call to pthread_setspecific
>> or an allocation. Hence the pthread functions are almost, if not
>> completely, safe in practice with reasonable uses (ie only read from
>> signal handler).
>> Which explains this code in existing Thread::current()
>>
>> #ifdef PARANOID
>>   // Signal handler should call ThreadLocalStorage::get_thread_slow()
>>   Thread* t = ThreadLocalStorage::get_thread_slow();
>>   assert(t != NULL && !t->is_inside_signal_handler(),
>>          "Don't use Thread::current() inside signal handler");
>> #endif
>>
>> So problem scenario is: use of __thread variable (that belongs to
>> the shared-library) in a signal handler.
>>
>> Solution 0: don't do that. Seriously - like any other
>> async-signal-unsafe stuff we should not be using it in real signal
>> handlers. The crash handler is a different matter - we try all sorts
>> there because it might work and you can't die twice.
>>
>> Otherwise: narrow the window of exposure.
>>
>> 1. We ensure we initialize thread_current (even with a dummy value)
>> as early as possible in the thread that loads libjvm. As we have no
>> signal handlers installed at that point that might use the same
>> variable, we cannot hit the problem scenario.
>>
>> 2. We ensure we initialize thread_current in a new thread with all
>> signals blocked. This again avoids the problem scenario.
>>
>> 3. We initialize thread_current in an attaching thread as soon as
>> possible and we again first block all signals.
>>
>> That still leaves the problem of an unattached native thread taking
>> a signal whilst in async-signal-unsafe code, and executing a signal
>> handler which in turn tries to access thread_current for the first
>> time. This signal handler need not be an actual JVM handler, but one
>> attached by other native code eg an agent. I'm not clear in the
>> latter case how reasonable it is for an agent's handler to try and
>> do things from an unattached thread - and we don't claim any JNI
>> interfaces can, or should, be called from a signal handler - but it
>> is something you can probably get away with today.
>> >> Let me also point out that we already effectively have this code in >> Solaris already (at the ThreadLocalStorage class level). So if there >> is something here that will prevent the current proposal we already >> have a problem on Solaris. :( >> >> Thoughts/comments/suggestions? >> >> >> The first problem: thread initializes TLS variable, gets interrupted and >> accesses the half-initialized variable from within the signal handler. >> This could happen today too, or? but I think we never saw this. >> > > That depends on the state of signal masks at the time of the > initialization. For threads created in the VM and for threads attached to > the VM it is likely not an issue. Unattached threads could in theory try to > access a TLS variable from a signal handler, but they will never be > initializing that variable. Of course the unattached thread could be > initializing a completely distinct TLS variable, but reading a different > TLS variable from the signal handler does not seem to be an issue (in > theory it may be but this is an extreme corner case). > > In theory, it could be mitigated by some careful testing before using >> the Thread::current() value in the signal handler. Like, put an >> eyecatcher at the beginning of the Thread structure and check that using >> SafeFetch. >> > > There is no way to access the Thread structure before calling > Thread::current(). And the potential problem is with unattached threads > which have no Thread structure. For threads attached to the VM, or > attaching, my three steps will deal with any potential problems. > > As for the second problem - recursive malloc() deadlocks - I am at a >> loss. I do not fully understand though why pthread_getspecific is >> different - does it not have to allocate place for the TLS variable too? >> > > No, pthread_getspecific does not have to allocate. 
Presumably it is > written in a way that attempting to index a TLS variable that has not been > allocated just returns an error (EINVAL?). It would return NULL. > The problem with __thread is that even a read will attempt to do the > allocation - arguably (as the Google folk did argue) this is wrong, or else > should be done in an async-safe way. > I looked up the implementation of pthread_getspecific and pthread_setspecific in the glibc and now understand better why pthread tls is considered safe here. glibc allows for 1024 tls slots which are organized as a 32x32 sparse array, whose second level arrays are only allocated if the first slot in it is used by pthread_setspecific. So, only when writing the slot. It also means that the number of allocation calls is smaller than with __thread - instead of (presumably) calling malloc() for every instance of __thread, it only calls at the maximum 32 times. And the first 32 slots are already allocated in the pthread structure, so they are free. This means that even if one were to write to a TLS slot in the signal handler, chances of it mallocing are quite small. > > This does leave me wondering exactly what affect the: > > static __thread Thread* _thr_current = NULL; > > has in terms of any per-thread allocation. ?? > > Anyway to reiterate the problem scenario: > - VM has been loaded in a process and signal handlers have been installed > (maybe VM, maybe agent) > - unattached thread is doing a malloc when it takes a signal > - signal handler tries to read __thread variable and we get a malloc > deadlock > > As I said I need to determine what signal handlers in the VM might ever > run on an unattached thread, and what they might do. I don't understand - our signal handler is globally active, no? So any unattached thread may execute our signal handler at any time, and the first thing our signal handler does is Thread::current(). 
If there was a third party signal handler, it is getting called as chained handler, but only after our signal handler ran. Thanks, Thomas (My current feeling is that I'd prefer to keep the pthread TLS solution but I like your simplifications to the code and would like to keep that too...) > For a "third-party" signal handler there's really nothing I can do - they > should not be accessing the VM's __thread variables though (and they cal > always introduce their own independent deadlocks by performing > non-async-safe actions). > > Thanks, > David > > Regards, Thomas >> >> >> Thanks, >> David >> >> >> On 6/11/2015 1:09 PM, David Holmes wrote: >> >> Hi Jeremy, >> >> I was going to ask you to elaborate :) >> >> On 6/11/2015 12:24 PM, Jeremy Manson wrote: >> >> I should probably elaborate on this. With glibc + ELF, the >> first time a >> thread accesses a variable declared __thread, if that >> variable is in a >> shared library (as opposed to the main executable), the >> system calls >> malloc() to allocate the space for it. If that happens in a >> signal that >> is being delivered during a call to malloc(), then you >> usually get a >> crash. >> >> >> My understanding of the ELF ABI for thread-locals - which I read >> about >> in the Solaris 11.1 Linkers and libraries guide - does require >> use of >> the dynamic TLS model for any dynamically loaded shared object >> which >> defines a thread-local, but that is what we use as I understand >> it. The >> docs state: >> >> "A shared object containing only dynamic TLS can be loaded >> following >> process startup without limitations. The runtime linker extends >> the list >> of initialization records to include the initialization template >> of the >> new object. The new object is given an index of m = M + 1. The >> counter M is incremented by 1. However, the allocation of new >> TLS blocks >> is deferred until the blocks are actually referenced." 
>> >> Now I guess "extends the list" might be implemented using malloc >> ... but >> this will only occur in the main thread (the one started by the >> launcher >> to load the JVM and become the main thread), at the time libjvm is >> loaded - which will all be over before any agent etc can run and >> do >> anything. But "allocation ... is deferred" suggests we may have a >> problem until either the first call to Thread::current or the >> call to >> Thread::initialize_thread_current. If it is the former then that >> should >> occur well before any agent etc can be loaded. And I can easily >> inject >> an initial dummy call to initialize_thread_current(null) to >> force the >> TLS allocation. >> >> This may bite you if AsyncGetCallTrace uses >> Thread::current(), and you >> use system timers to do profiling. If a thread doing a >> malloc() prior >> to the first time it accesses Thread::current(), and it gets >> delivered a >> signal, it might die. This is especially likely for pure >> native threads >> started by native code. >> >> I believe that this is a use case you support, so you might >> want to make >> sure it is okay. >> >> >> For a VM embedded in a process, which already contains native >> threads, >> that will later attach to the VM, this may indeed be a problem. >> One >> would have hoped however that the implementation of TLS would be >> completely robust, at least for something as simple as getting a >> signal >> whilst in the allocator. >> >> I'm unclear how to test for or check for this kind of problem. >> Arguably >> there could be many things that are async-unsafe in this way. >> >> Need to think more about this and do some more research. Would >> also >> appreciate any insight from any glibc and/or ELF gurus. >> >> Thanks. >> David >> >> Jeremy >> >> On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson >> >> > >> wrote: >> >> Something that's bitten me with __thread: it isn't >> async-safe when >> called from a shared object on Linux. 
Have you vetted >> to make sure >> this doesn't make HS less async-safe? >> >> Jeremy >> >> On Sun, Nov 1, 2015 at 10:40 PM, David Holmes >> > >> > >> >> wrote: >> >> bug: >> https://bugs.openjdk.java.net/browse/JDK-8132510 >> >> Open webrev: >> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/ >> >> A simple (in principle) but wide-ranging change >> which should >> appeal to our Code Deletion Engineer's. We implement >> Thread::current() using a compiler/language-based >> thread-local >> variable eg: >> >> >> static __thread Thread *_thr_current; >> >> inline Thread* Thread::current() { >> return _thr_current; >> } >> >> with an appropriate setter of course. By doing this >> we can >> completely remove the platform-specific >> ThreadLocalStorage >> implementations, and the associated >> os::thread_local_storage* >> calls, plus all the uses of >> ThreadLocalStorage::thread() and >> ThreadLocalStorage::get_thread_slow(). This extends >> the previous >> work done on Solaris to implement >> ThreadLocalStorage::thread() >> using compiler-based thread-locals. >> >> We can also consolidate nearly all the os_cpu >> versions of >> MacroAssembler::get_thread on x86 into one cpu >> specific one ( a >> special variant is still needed for 32-bit Windows). >> >> As a result of this change we have further >> potential cleanups: >> - all the src/os//vm/thread_.inline.hpp >> files are now >> completely empty and could also be removed >> - the MINIMIZE_RAM_USAGE define (which avoids use >> of the linux >> sp-map "cache" on 32-bit) now has no affect and so >> could be >> completely removed from the build system >> >> I plan to do the MINIMIZE_RAM_USAGE removal as a >> follow up CR, >> but could add the removal of the "inline" files to >> this CR if >> people think it worth removing them. 
>> >> I have one missing piece on Aarch64 - I need to >> change >> MacroAssembler::get_thread to simply call >> Thread::current() as >> on other platforms, but I don't know how to write >> that. I would >> appreciate it if someone could give me the right >> code for that. >> >> I would also appreciate comments/testing by the AIX >> and PPC64 >> folk as well. >> >> A concern about memory-leaks had previously been >> raised, but >> experiments using simple C code on linux 86 and >> Solaris showed >> no issues. Also note that Aarch64 already uses this >> kind of >> thread-local. >> >> Thanks, >> David >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.holmes at oracle.com Tue Nov 10 11:26:10 2015 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 Nov 2015 21:26:10 +1000 Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables In-Reply-To: References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com> <563D27B8.4040501@oracle.com> Message-ID: <5641D452.6040503@oracle.com> Sorry the formatting of the replies is getting totally screwed up now :( On 10/11/2015 8:20 PM, Thomas St?fe wrote: > Hi David, > > On Fri, Nov 6, 2015 at 11:20 PM, David Holmes > wrote: > > On 6/11/2015 9:52 PM, Thomas St?fe wrote: > > Hi David, > > On Fri, Nov 6, 2015 at 7:26 AM, David Holmes > > >> wrote: > > Hi Jeremy, > > Okay I have read: > > https://sourceware.org/glibc/wiki/TLSandSignals > > and the tree of mail postings referenced therefrom - great > reading! 
:) > > So basic problem: access to __thread variables is not > async-signal-safe > > Exacerbator to problem: first attempt to even read a __thread > variable can lead to allocation which is the real problem in > relation to async-signal-safety > > I mention the exacerbator because pthread_getspecific and > pthread_setSpecific are also not async-signal-safe but we > already > use them. However, pthread_getspecific is in fact (per > email threads > linked above) effectively async-signal-safe, and further a > call to > pthread_getspecific never results in a call to > pthread_setspecific > or an allocation. Hence the pthread functions are almost, > if not > completely, safe in practice with reasonable uses (ie only > read from > signal handler). Which explain this code in existing > Thread::current() > > #ifdef PARANOID > // Signal handler should call > ThreadLocalStorage::get_thread_slow() > Thread* t = ThreadLocalStorage::get_thread_slow(); > assert(t != NULL && !t->is_inside_signal_handler(), > "Don't use Thread::current() inside signal handler"); > #endif > > So problem scenario is: use of __thread variable (that > belongs to > the shared-library) in a signal handler. > > Solution 0: don't do that. Seriously - like any other > async-signal-unsafe stuff we should not be using it in real > signal > handlers. The crash handler is a different matter - we try > all sorts > there because it might work and you can't die twice. > > Otherwise: narrow the window of exposure. > > 1. We ensure we initialize thread_current (even with a > dummy value) > as early as possible in the thread that loads libjvm. As we > have no > signal handlers installed at that point that might use the same > variable, we can not hit the problem scenario. > > 2. We ensure we initialize thread_current in a new thread > with all > signals blocked. This again avoids the problem scenario. > > 3. 
We initialize thread_current in an attaching thread as > soon as > possible and we again first block all signals. > > That still leaves the problem of an unattached native > thread taking > a signal whilst in async-signal-unsafe code, and executing > a signal > handler which in turns tries to access thread_current for > the first > time. This signal handler need not be an actual JVM > handler, but one > attached by other native code eg an agent. I'm not clear in the > latter case how reasonable it is for an agent's handler to > try and > do things from an unattached thread - and we don't claim > any JNI > interfaces can, or should, be called from a signal handler > - but it > is something you can probably get away with today. > > Let me also point out that we already effectively have this > code in > Solaris already (at the ThreadLocalStorage class level). So > if there > is something here that will prevent the current proposal we > already > have a problem on Solaris. :( > > Thoughts/comments/suggestions? > > > The first problem: thread initializes TLS variable, gets > interrupted and > accesses the half-initialized variable from within the signal > handler. > This could happen today too, or? but I think we never saw this. > > > That depends on the state of signal masks at the time of the > initialization. For threads created in the VM and for threads > attached to the VM it is likely not an issue. Unattached threads > could in theory try to access a TLS variable from a signal handler, > but they will never be initializing that variable. Of course the > unattached thread could be initializing a completely distinct TLS > variable, but reading a different TLS variable from the signal > handler does not seem to be an issue (in theory it may be but this > is an extreme corner case). > > In theory, it could be mitigated by some careful testing before > using > the Thread::current() value in the signal handler. 
Like, put an > eyecatcher at the beginning of the Thread structure and check > that using > SafeFetch. > > > There is no way to access the Thread structure before calling > Thread::current(). And the potential problem is with unattached > threads which have no Thread structure. For threads attached to the > VM, or attaching, my three steps will deal with any potential problems. > > As for the second problem - recursive malloc() deadlocks - I am at a > loss. I do not fully understand though why pthread_getspecific is > different - does it not have to allocate place for the TLS > variable too? > > > No, pthread_getspecific does not have to allocate. Presumably it is > written in a way that attempting to index a TLS variable that has > not been allocated just returns an error (EINVAL?). > > > It would return NULL. > > The problem with __thread is that even a read will attempt to do the > allocation - arguably (as the Google folk did argue) this is wrong, > or else should be done in an async-safe way. > > > I looked up the implementation of pthread_getspecific and > pthread_setspecific in the glibc and now understand better why pthread > tls is considered safe here. > > glibc allows for 1024 tls slots which are organized as a 32x32 sparse > array, whose second level arrays are only allocated if the first slot in > it is used by pthread_setspecific. So, only when writing the slot. It > also means that the number of allocation calls is smaller than with > __thread - instead of (presumably) calling malloc() for every instance > of __thread, it only calls at the maximum 32 times. And the first 32 > slots are already allocated in the pthread structure, so they are free. > This means that even if one were to write to a TLS slot in the signal > handler, chances of it mallocing are quite small. __thread will only need to malloc (in the context we are discussing) when a pre-existing thread first references a TLS variable from a lazily loaded DSO. 
Otherwise the space for the DSO's TLS variables is allocated "statically" when the thread control structures are created. > > This does leave me wondering exactly what effect the: > > static __thread Thread* _thr_current = NULL; > > has in terms of any per-thread allocation. ?? > > Anyway to reiterate the problem scenario: > - VM has been loaded in a process and signal handlers have been > installed (maybe VM, maybe agent) > - unattached thread is doing a malloc when it takes a signal > - signal handler tries to read __thread variable and we get a malloc > deadlock > > As I said I need to determine what signal handlers in the VM might > ever run on an unattached thread, and what they might do. > > > I don't understand - our signal handler is globally active, no? So any > unattached thread may execute our signal handler at any time, and the > first thing our signal handler does is Thread::current(). If there was a > third party signal handler, it is getting called as a chained handler, but > only after our signal handler ran. The current code uses ThreadLocalStorage::get_thread_slow() in the signal handler, which uses pthread_getspecific, which is "safe". The new code would access the __thread variable and, for a pre-existing unattached thread, have to malloc - which is unsafe. Using the JDK launchers there are no unattached threads created before libjvm is loaded - so the problem would never arise. I have been looking hard at our signal handlers though and found they don't seem to match how they are described in various parts of the code that set the signal masks. My main concern is with process-directed signals (SIGQUIT, SIGTERM) that trigger specific actions (thread dump, orderly shutdown). Synchronous signals are not an issue - if the unattached thread triggers a SEGV while in malloc then the process is doomed anyway and a deadlock is less problematic (not great but hardly in the same league as deadlocking an active fully working VM).
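The "safe" pthread-based lookup described above can be sketched as follows. This is an illustration only, assuming a POSIX platform; `tls_init`, `tls_set_thread` and `tls_get_thread_slow` are hypothetical names loosely modeled on ThreadLocalStorage::get_thread_slow(), not the actual HotSpot code. The point it shows is that the handler-side read never writes to the slot, and a thread that never attached simply sees NULL.

```cpp
#include <cassert>
#include <pthread.h>

class Thread;  // stand-in for HotSpot's Thread; only pointers are used here

static pthread_key_t  thread_key;                       // hypothetical key
static pthread_once_t thread_key_once = PTHREAD_ONCE_INIT;

static void create_thread_key() {
  int rc = pthread_key_create(&thread_key, nullptr);    // no destructor
  assert(rc == 0);
  (void)rc;
}

// Creates the key exactly once. Intended to run at startup, before any
// signal handler that reads the slot has been installed.
void tls_init() { pthread_once(&thread_key_once, create_thread_key); }

// Called by a thread when it attaches. This is the only place that writes.
int tls_set_thread(Thread* t) { return pthread_setspecific(thread_key, t); }

// Read-only lookup intended for signal handlers. For a thread that never
// called tls_set_thread() this returns NULL.
Thread* tls_get_thread_slow() {
  return static_cast<Thread*>(pthread_getspecific(thread_key));
}
```

A handler written against this sketch would treat a NULL result as "unattached thread" and bail out early rather than touching any VM state.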
But I'm still having trouble joining all the dots in this code and figuring out how an unattached thread might react today. I'll continue untangling this tomorrow. > Thanks, Thomas > > (My current feeling is that I'd prefer to keep the pthread TLS solution > but I like your simplifications to the code and would like to keep that > too...) It was all the complex, inconsistent caching mechanisms employed over the top of the library based TLS that motivated the cleanup - especially as the cache on Solaris was shown to be broken. If it was just a simple library based TLS layer, there would be less motivation to try the __thread approach - but __thread had the appeal of removing a lot of duplicated code. A simple library based scheme might be an alternative if it is performant enough - but not sure I have the time to go back to square one on this. Thanks, David > For a "third-party" signal handler there's really nothing I can do - > they should not be accessing the VM's __thread variables though (and > they cal always introduce their own independent deadlocks by > performing non-async-safe actions). > > Thanks, > David > > Regards, Thomas > > > Thanks, > David > > > On 6/11/2015 1:09 PM, David Holmes wrote: > > Hi Jeremy, > > I was going to ask you to elaborate :) > > On 6/11/2015 12:24 PM, Jeremy Manson wrote: > > I should probably elaborate on this. With glibc + > ELF, the > first time a > thread accesses a variable declared __thread, if that > variable is in a > shared library (as opposed to the main executable), the > system calls > malloc() to allocate the space for it. If that > happens in a > signal that > is being delivered during a call to malloc(), then you > usually get a > crash. 
> > > My understanding of the ELF ABI for thread-locals - > which I read > about > in the Solaris 11.1 Linkers and libraries guide - does > require > use of > the dynamic TLS model for any dynamically loaded shared > object which > defines a thread-local, but that is what we use as I > understand > it. The > docs state: > > "A shared object containing only dynamic TLS can be > loaded following > process startup without limitations. The runtime linker > extends > the list > of initialization records to include the initialization > template > of the > new object. The new object is given an index of m = M + > 1. The > counter M is incremented by 1. However, the allocation > of new > TLS blocks > is deferred until the blocks are actually referenced." > > Now I guess "extends the list" might be implemented > using malloc > ... but > this will only occur in the main thread (the one > started by the > launcher > to load the JVM and become the main thread), at the > time libjvm is > loaded - which will all be over before any agent etc > can run and do > anything. But "allocation ... is deferred" suggests we > may have a > problem until either the first call to Thread::current > or the > call to > Thread::initialize_thread_current. If it is the former > then that > should > occur well before any agent etc can be loaded. And I > can easily > inject > an initial dummy call to initialize_thread_current(null) to > force the > TLS allocation. > > This may bite you if AsyncGetCallTrace uses > Thread::current(), and you > use system timers to do profiling. If a thread doing a > malloc() prior > to the first time it accesses Thread::current(), > and it gets > delivered a > signal, it might die. This is especially likely > for pure > native threads > started by native code. > > I believe that this is a use case you support, so > you might > want to make > sure it is okay. 
> > > For a VM embedded in a process, which already contains > native > threads, > that will later attach to the VM, this may indeed be a > problem. One > would have hoped however that the implementation of TLS > would be > completely robust, at least for something as simple as > getting a > signal > whilst in the allocator. > > I'm unclear how to test for or check for this kind of > problem. > Arguably > there could be many things that are async-unsafe in > this way. > > Need to think more about this and do some more > research. Would also > appreciate any insight from any glibc and/or ELF gurus. > > Thanks. > David > > Jeremy > > On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson > > > > >>> wrote: > > Something that's bitten me with __thread: it isn't > async-safe when > called from a shared object on Linux. Have > you vetted > to make sure > this doesn't make HS less async-safe? > > Jeremy > > On Sun, Nov 1, 2015 at 10:40 PM, David Holmes > > > > > > >>> wrote: > > bug: > https://bugs.openjdk.java.net/browse/JDK-8132510 > > Open webrev: > http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/ > > A simple (in principle) but wide-ranging > change > which should > appeal to our Code Deletion Engineer's. We > implement > Thread::current() using a > compiler/language-based > thread-local > variable eg: > > > static __thread Thread *_thr_current; > > inline Thread* Thread::current() { > return _thr_current; > } > > with an appropriate setter of course. By > doing this > we can > completely remove the platform-specific > ThreadLocalStorage > implementations, and the associated > os::thread_local_storage* > calls, plus all the uses of > ThreadLocalStorage::thread() and > ThreadLocalStorage::get_thread_slow(). > This extends > the previous > work done on Solaris to implement > ThreadLocalStorage::thread() > using compiler-based thread-locals. 
> > We can also consolidate nearly all the os_cpu > versions of > MacroAssembler::get_thread on x86 into one cpu > specific one ( a > special variant is still needed for 32-bit > Windows). > > As a result of this change we have further > potential cleanups: > - all the > src/os//vm/thread_.inline.hpp > files are now > completely empty and could also be removed > - the MINIMIZE_RAM_USAGE define (which > avoids use > of the linux > sp-map "cache" on 32-bit) now has no > affect and so > could be > completely removed from the build system > > I plan to do the MINIMIZE_RAM_USAGE > removal as a > follow up CR, > but could add the removal of the "inline" > files to > this CR if > people think it worth removing them. > > I have one missing piece on Aarch64 - I > need to change > MacroAssembler::get_thread to simply call > Thread::current() as > on other platforms, but I don't know how > to write > that. I would > appreciate it if someone could give me the > right > code for that. > > I would also appreciate comments/testing > by the AIX > and PPC64 > folk as well. > > A concern about memory-leaks had > previously been > raised, but > experiments using simple C code on linux > 86 and > Solaris showed > no issues. Also note that Aarch64 already > uses this > kind of > thread-local. 
> > Thanks, > David > > > > > From david.holmes at oracle.com Wed Nov 11 08:19:19 2015 From: david.holmes at oracle.com (David Holmes) Date: Wed, 11 Nov 2015 18:19:19 +1000 Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables In-Reply-To: <5641D452.6040503@oracle.com> References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com> <563D27B8.4040501@oracle.com> <5641D452.6040503@oracle.com> Message-ID: <5642FA07.8050309@oracle.com> Hi Thomas, Okay here's the next revision: http://cr.openjdk.java.net/~dholmes/8132510/webrev.v5/ I've reinstated a basic ThreadLocalStorage class which will only need two implementations: a POSIX one, and a Windows one (still TBD). This class is always initialized and ThreadLocalStorage::thread() is used from the signal handlers (as today). For platforms that don't have __thread support they can define USE_LIBRARY_BASED_TLS_ONLY at build time to only use the ThreadLocalStorage implementation. Obviously still need to get some performance numbers. I'd appreciate it if you could retest AIX, though as all platforms currently use pthread_get/setspecific I'm confident there will be no platform issues. Thanks, David From aph at redhat.com Wed Nov 11 10:56:43 2015 From: aph at redhat.com (Andrew Haley) Date: Wed, 11 Nov 2015 10:56:43 +0000 Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables In-Reply-To: <56413267.6000001@oracle.com> References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com> <563C5B5B.9060802@oracle.com> <563FD29A.3060103@oracle.com> <56413267.6000001@oracle.com> Message-ID: <56431EEB.7090705@redhat.com> On 09/11/15 23:55, David Holmes wrote: > On 9/11/2015 8:54 AM, David Holmes wrote: > > Just to keep my thinking straight on this, the problem only exists for > threads that existed before the JVM was loaded. 
All threads allocated > after that will have space for all the TLS variables allocated directly. > So the problem scenario is: > > - external process with existing threads loads the JVM > - existing thread is executing critical library function eg malloc, when > it takes a process-directed signal. > - JVM signal handler runs and accesses _thr_current which triggers > dynamic TLS allocation Why not simply use pthread_* thread-local storage, but only in the signal handler? That would avoid the (fairly unlikely) race condition, at very little cost. Sure, we'd have to use pthread_setspecific() when attaching a thread, but that's no big deal. Andrew. From thomas.stuefe at gmail.com Wed Nov 11 15:03:44 2015 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 11 Nov 2015 16:03:44 +0100 Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables In-Reply-To: <56431EEB.7090705@redhat.com> References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com> <563C5B5B.9060802@oracle.com> <563FD29A.3060103@oracle.com> <56413267.6000001@oracle.com> <56431EEB.7090705@redhat.com> Message-ID: On Wed, Nov 11, 2015 at 11:56 AM, Andrew Haley wrote: > On 09/11/15 23:55, David Holmes wrote: > > On 9/11/2015 8:54 AM, David Holmes wrote: > > > > Just to keep my thinking straight on this, the problem only exists for > > threads that existed before the JVM was loaded. All threads allocated > > after that will have space for all the TLS variables allocated directly. > > So the problem scenario is: > > > > - external process with existing threads loads the JVM > > - existing thread is executing critical library function eg malloc, when > > it takes a process-directed signal. > > - JVM signal handler runs and accesses _thr_current which triggers > > dynamic TLS allocation > > Why not simply use pthread_* thread-local storage, but only in the > signal handler? 
That would avoid the (fairly unlikely) race > condition, at very little cost. Sure, we'd have to use > pthread_setspecific() when attaching a thread, but that's no big deal. > > This could work. So, initialize both the pthread TLS slot and the __thread variable on thread creation. We could name them Thread::current and Thread::current_safe or similar. However, we still do not know how big the performance advantage is in using __thread over pthread_getspecific(). It may not even be worth all the trouble of using __thread. Thomas > Andrew. > From thomas.stuefe at gmail.com Wed Nov 11 16:36:50 2015 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Wed, 11 Nov 2015 17:36:50 +0100 Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables In-Reply-To: <5642FA07.8050309@oracle.com> References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com> <563D27B8.4040501@oracle.com> <5641D452.6040503@oracle.com> <5642FA07.8050309@oracle.com> Message-ID: Hi David, I get build errors on all my platforms. I think the change misses #include "runtime/threadLocalStorage.hpp" in a couple of places, at least thread.cpp and possibly also the os_xx_yy.cpp files. Will take another look tomorrow. Thanks, Thomas On Wed, Nov 11, 2015 at 9:19 AM, David Holmes wrote: > Hi Thomas, > > Okay here's the next revision: > > http://cr.openjdk.java.net/~dholmes/8132510/webrev.v5/ > > I've reinstated a basic ThreadLocalStorage class which will only need two > implementations: a POSIX one, and a Windows one (still TBD). This class is > always initialized and ThreadLocalStorage::thread() is used from the signal > handlers (as today). > > For platforms that don't have __thread support they can define > USE_LIBRARY_BASED_TLS_ONLY at build time to only use the ThreadLocalStorage > implementation.
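The dual-storage arrangement discussed above (a `__thread` variable for ordinary code, a pthread slot that is always initialized for signal handlers, and `USE_LIBRARY_BASED_TLS_ONLY` as a build-time opt-out) could look roughly like the sketch below. It assumes a POSIX platform; `tls_init_once`, `set_current`, `current` and `current_safe` are hypothetical names chosen to echo the suggestions in this thread, and the code is not taken from webrev.v5.

```cpp
#include <cassert>
#include <pthread.h>

class Thread;  // stand-in for HotSpot's Thread; only pointers are used here

static pthread_key_t library_key;          // hypothetical library-based slot
static bool          library_key_ready = false;

#ifndef USE_LIBRARY_BASED_TLS_ONLY
// Compiler-based thread-local, used only outside signal handlers.
static __thread Thread* thr_current = nullptr;
#endif

// Not thread-safe: assumed to run once, single-threaded, during startup.
bool tls_init_once() {
  if (!library_key_ready) {
    library_key_ready = (pthread_key_create(&library_key, nullptr) == 0);
  }
  return library_key_ready;
}

// Records the current thread in both storages so they never disagree.
void set_current(Thread* t) {
  pthread_setspecific(library_key, t);
#ifndef USE_LIBRARY_BASED_TLS_ONLY
  thr_current = t;
#endif
}

// Lookup for signal handlers: always goes through the pthread slot.
Thread* current_safe() {
  return static_cast<Thread*>(pthread_getspecific(library_key));
}

// Lookup for ordinary code: the __thread variable when it is available.
Thread* current() {
#ifndef USE_LIBRARY_BASED_TLS_ONLY
  return thr_current;
#else
  return current_safe();
#endif
}
```

Whether the `__thread` path is measurably faster than `current_safe()` is exactly the open performance question raised in the thread; the sketch takes no position on it.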
> > Obviously still need to get some performance numbers. > > I'd appreciate it if you could retest AIX, though as all platforms > currently use pthread_get/setspecific I'm confident there will be no > platform issues. > > Thanks, > David > From thomas.stuefe at gmail.com Wed Nov 11 16:47:00 2015 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Wed, 11 Nov 2015 17:47:00 +0100 Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables In-Reply-To: References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com> <563D27B8.4040501@oracle.com> <5641D452.6040503@oracle.com> <5642FA07.8050309@oracle.com> Message-ID: On Wed, Nov 11, 2015 at 5:36 PM, Thomas Stüfe wrote: > Hi David, > > I get build errors on all my platforms. > > I think the change misses #include "runtime/threadLocalStorage.hpp" in a > couple of places, at least thread.cpp and possibly also the os_xx_yy.cpp > files. > > Sorry, I have to correct myself. It is a linker error: I cannot find the implementations for the ThreadLocalStorage class methods anywhere.
I applied your patch atop a freshly synced hs-rt repo:

- .../hotspot $ hg log -l 3
changeset: 9317:3b23f69bc887 8132510__thread_davids_change qbase qtip tip
user: stuefe
date: Wed Nov 11 16:12:14 2015 +0100
summary: imported patch 8132510__thread_davids_change

changeset: 9316:f17e5edbe761 qparent
user: tschatzl
date: Tue Nov 10 11:07:15 2015 +0100
summary: 8140689: Skip last young-only gc if nothing to do in the mixed gc phase
Reviewed-by: mgerdin, drwhite

On AIX I get:

ld: 0711-317 ERROR: Undefined symbol: .ThreadLocalStorage::thread()
ld: 0711-317 ERROR: Undefined symbol: .ThreadLocalStorage::is_initialized()
ld: 0711-317 ERROR: Undefined symbol: .ThreadLocalStorage::set_thread(Thread*)
ld: 0711-317 ERROR: Undefined symbol: .ThreadLocalStorage::init()

Am I building wrong?

Regards, Thomas
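The four undefined symbols correspond to the library-based ThreadLocalStorage interface described for webrev.v5. As a rough illustration only, and not the contents of the file that was missing from the webrev, a POSIX version could be sketched with pthread_key_create, pthread_getspecific and pthread_setspecific. The empty Thread class and the private members _key and _initialized are assumptions made for this sketch.

```cpp
#include <cassert>
#include <pthread.h>

// Stand-in for HotSpot's Thread class; hypothetical, for illustration only.
class Thread {};

// Sketch of a library-based ThreadLocalStorage exposing the four methods
// named in the linker errors. The private members are assumptions.
class ThreadLocalStorage {
 public:
  static void    init();
  static bool    is_initialized() { return _initialized; }
  static Thread* thread();
  static void    set_thread(Thread* t);
 private:
  static pthread_key_t _key;
  static bool          _initialized;
};

pthread_key_t ThreadLocalStorage::_key;
bool          ThreadLocalStorage::_initialized = false;

// Create the process-wide key once; every thread initially sees NULL for it.
void ThreadLocalStorage::init() {
  assert(!_initialized);
  int rslt = pthread_key_create(&_key, nullptr);  // no destructor registered
  assert(rslt == 0);
  (void)rslt;
  _initialized = true;
}

// Return the calling thread's Thread*, or nullptr if none has been set.
Thread* ThreadLocalStorage::thread() {
  assert(_initialized);
  return static_cast<Thread*>(pthread_getspecific(_key));
}

// Bind a Thread* to the calling thread.
void ThreadLocalStorage::set_thread(Thread* t) {
  assert(_initialized);
  int rslt = pthread_setspecific(_key, t);
  assert(rslt == 0);
  (void)rslt;
}
```

In this sketch init() has to run once before any thread calls thread() or set_thread(); a thread that never called set_thread() reads back nullptr.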
From david.holmes at oracle.com Wed Nov 11 20:23:32 2015
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 12 Nov 2015 06:23:32 +1000
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To:
References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com> <563D27B8.4040501@oracle.com> <5641D452.6040503@oracle.com> <5642FA07.8050309@oracle.com>
Message-ID: <5643A3C4.3050803@oracle.com>

Sorry Thomas, the all-important:

src/os/posix/vm/threadLocalStorage_posix.cpp

was missing from the webrev. Now adding.

David
From thomas.stuefe at gmail.com Thu Nov 12 13:51:12 2015
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 12 Nov 2015 14:51:12 +0100
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To: <5643A3C4.3050803@oracle.com>
References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com> <563D27B8.4040501@oracle.com> <5641D452.6040503@oracle.com> <5642FA07.8050309@oracle.com> <5643A3C4.3050803@oracle.com>
Message-ID:

Hi David,

builds and works on both variants (with and without USE_LIBRARY_BASED_TLS_ONLY) on AIX and Linux ppc.

Small nitpicks:

- I probably would have implemented Thread::current() using Thread::current_or_null().
- Also, instead of using the "raw" ThreadLocalStorage::thread(), I would have liked a Thread::current_safe() or similar.

Kind Regards, Thomas

On Wed, Nov 11, 2015 at 9:23 PM, David Holmes wrote:

> Sorry Thomas the all important:
>
> src/os/posix/vm/threadLocalStorage_posix.cpp
>
> was missing from the webrev. Now adding.
From david.holmes at oracle.com Thu Nov 12 20:35:01 2015
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 13 Nov 2015 06:35:01 +1000
Subject: (L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables
In-Reply-To:
References: <56370567.3090801@oracle.com> <563C19CF.30001@oracle.com> <563C4824.7040300@oracle.com> <563D27B8.4040501@oracle.com> <5641D452.6040503@oracle.com> <5642FA07.8050309@oracle.com> <5643A3C4.3050803@oracle.com>
Message-ID: <5644F7F5.9060202@oracle.com>

On 12/11/2015 11:51 PM, Thomas Stüfe wrote:
> Hi David,
>
> builds and works on both variants (with and without
> USE_LIBRARY_BASED_TLS_ONLY) on AIX and Linux ppc.

Great - thanks!

> Small nitpicks:
>
> - I probably would have implemented Thread::current() using
> Thread::current_or_null().

I can do that. I presume the compiler will be smart enough. :)

> - Also, instead of using the "raw" ThreadLocalStorage::thread(), I would
> have liked a Thread::current_safe() or similar.

That's a reasonable suggestion too - I was influenced by existing usage, but could change it.

I'll send out the formal RFR once I have checked performance and done more testing.

Thanks,
David
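The two nitpicks above could be sketched as follows. This is a hedged illustration, not code from the webrev: set_current(), the library_tls_* helpers and the pthread_once-based key setup are hypothetical names introduced here. Only Thread::current(), Thread::current_or_null(), Thread::current_safe(), _thr_current and USE_LIBRARY_BASED_TLS_ONLY come from the discussion. The setter writes both the __thread variable and the pthread slot, along the lines suggested earlier in the thread.

```cpp
#include <cassert>
#include <pthread.h>

class Thread {
 public:
  static Thread* current_or_null();       // may return nullptr (not attached)
  static Thread* current();               // asserts that a Thread is set
  static Thread* current_safe();          // always reads the library-based slot
  static void    set_current(Thread* t);  // hypothetical setter
 private:
#ifndef USE_LIBRARY_BASED_TLS_ONLY
  static __thread Thread* _thr_current;   // compiler-based thread-local
#endif
};

#ifndef USE_LIBRARY_BASED_TLS_ONLY
__thread Thread* Thread::_thr_current = nullptr;
#endif

// Hypothetical library-based slot, created lazily on first use.
static pthread_key_t  tls_key;
static pthread_once_t tls_once = PTHREAD_ONCE_INIT;
static void library_tls_make_key() { pthread_key_create(&tls_key, nullptr); }

static Thread* library_tls_get() {
  pthread_once(&tls_once, library_tls_make_key);
  return static_cast<Thread*>(pthread_getspecific(tls_key));
}

static void library_tls_set(Thread* t) {
  pthread_once(&tls_once, library_tls_make_key);
  pthread_setspecific(tls_key, t);
}

Thread* Thread::current_safe() { return library_tls_get(); }

Thread* Thread::current_or_null() {
#ifndef USE_LIBRARY_BASED_TLS_ONLY
  return _thr_current;
#else
  return library_tls_get();
#endif
}

// current() is layered on current_or_null() and adds only the null check.
Thread* Thread::current() {
  Thread* t = current_or_null();
  assert(t != nullptr);
  return t;
}

// Keep both storage mechanisms in step when a thread is attached or detached.
void Thread::set_current(Thread* t) {
#ifndef USE_LIBRARY_BASED_TLS_ONLY
  _thr_current = t;
#endif
  library_tls_set(t);
}
```

Building the same file with -DUSE_LIBRARY_BASED_TLS_ONLY removes the __thread variable, so all three accessors go through the pthread slot.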