From coleen.phillimore at oracle.com Mon Oct 2 14:55:14 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 2 Oct 2017 10:55:14 -0400 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com> <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com> Message-ID: <5b5219a5-960e-363b-2bdc-3613f1dae62c@oracle.com> I can sponsor this for you once you rebase and fix these compilation errors. Thanks, Coleen On 9/30/17 12:28 AM, Volker Simonis wrote: > Hi Vladimir, > > thanks a lot for remembering these changes! > > Regards, > Volker > > > Vladimir Kozlov > wrote on Fri, 29 Sep 2017 at > 15:47: > > I hit a build failure when I tried to push the changes: > > src/hotspot/share/code/codeBlob.hpp(162) : warning C4267: '=' : > conversion from 'size_t' to 'int', possible loss of data > src/hotspot/share/code/codeBlob.hpp(163) : warning C4267: '=' : > conversion from 'size_t' to 'int', possible loss of data > > I am going to fix it by casting to (int): > > +  void adjust_size(size_t used) { > +    _size = (int)used; > +    _data_offset = (int)used; > +    _code_end = (address)this + used; > +    _data_end = (address)this + used; > +  } > > Note, the CodeCache size can't be more than 2Gb (max_int) so such casting > is fine. > > Vladimir > > On 9/6/17 6:20 AM, Volker Simonis wrote: > > On Tue, Sep 5, 2017 at 9:36 PM, > wrote: > >> > >> I was going to make the same comment about the friend > declaration in v1, so > >> v2 looks better to me. Looks good. Thank you for finding a > solution to > >> this problem that we've had for a long time. I will sponsor > this (remind me > >> if I forget after the 18th). > >> > > > > Thanks Coleen! I've updated > > > > http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/ > > > > > in-place and added you as a second reviewer. > > > > Regards, > > Volker > > > > > >> thanks, > >> Coleen > >> > >> > >> > >> On 9/5/17 1:17 PM, Vladimir Kozlov wrote: > >>> > >>> On 9/5/17 9:49 AM, Volker Simonis wrote: > >>>> > >>>> On Fri, Sep 1, 2017 at 6:16 PM, Vladimir Kozlov > >>>> > wrote: > >>>>> > >>>>> Maybe add a new CodeBlob method to adjust the sizes instead of > directly > >>>>> setting > >>>>> them in CodeCache::free_unused_tail(). Then you would not > need friend > >>>>> class > >>>>> CodeCache in CodeBlob. > >>>>> > >>>> > >>>> Changed as suggested (I didn't like the friend declaration > either :) > >>>> > >>>>> Also I think the adjustment to header_size should be done in > >>>>> CodeCache::free_unused_tail() to limit the scope of code that > knows about > >>>>> blob > >>>>> layout. > >>>>> > >>>> > >>>> Yes, that's much cleaner. Please find the updated webrev here: > >>>> > >>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/ > > >>> > >>> > >>> Good. > >>> > >>>> > >>>> I've also found another "day 1" problem in StubQueue::next(): > >>>> > >>>>       Stub* next(Stub* s) const          { int i = > >>>> index_of(s) + stub_size(s); > >>>> -           if (i == > >>>> _buffer_limit) i = 0; > >>>> +           // Only wrap > >>>> around in the non-contiguous case (see stubss.cpp) > >>>> +           if (i == > >>>> _buffer_limit && _queue_end < _buffer_limit) i = 0; > >>>>            return (i == > >>>> _queue_end) ? NULL : stub_at(i); > >>>>
} > >>>> > >>>> The problem was that the method was not prepared to handle > the case > >>>> where _buffer_limit == _queue_end == _buffer_size which led > to an > >>>> infinite recursion when iterating over a StubQueue with > >>>> StubQueue::next() until next() returns NULL (as this was for > example > >>>> done with -XX:+PrintInterpreter). But with the new, trimmed > CodeBlob > >>>> we run into exactly this situation. > >>> > >>> > >>> Okay. > >>> > >>>> > >>>> While doing this last fix I also noticed that > "StubQueue::stubs_do()", > >>>> "StubQueue::queues_do()" and "StubQueue::register_queue()" > don't seem > >>>> to be used anywhere in the open code base (please correct me > if I'm > >>>> wrong). What do you think, maybe we should remove this code in a > >>>> follow-up change if it is really not needed? > >>> > >>> > >>> register_queue() is used in the constructor. The other 2 you can remove. > >>> stub_code_begin() and stub_code_end() are not used either - remove them. > >>> I thought we ran on linux with a flag which warns about unused code. > >>> > >>>> > >>>> Finally, could you please run the new version through JPRT > and sponsor > >>>> it once jdk10/hs is opened again? > >>> > >>> > >>> Will do when the jdk10 "consolidation" is finished. Please remind > me later if > >>> I forget. > >>> > >>> Thanks, > >>> Vladimir > >>> > >>>> > >>>> Thanks, > >>>> Volker > >>>> > >>>>> Thanks, > >>>>> Vladimir > >>>>> > >>>>> > >>>>> On 9/1/17 8:46 AM, Volker Simonis wrote: > >>>>>> > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> I've decided to split the fix for the > 'CodeHeap::contains_blob()' > >>>>>> problem into its own issue "8187091: > ReturnBlobToWrongHeapTest fails > >>>>>> because of problems in CodeHeap::contains_blob()" > >>>>>> (https://bugs.openjdk.java.net/browse/JDK-8187091) and > started a new > >>>>>> review thread for discussing it at: > >>>>>> > >>>>>> > >>>>>> > http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html > >>>>>> > >>>>>> So please let's keep this thread for discussing the > interpreter code > >>>>>> size issue only. I've prepared a new version of the webrev > which is > >>>>>> the same as the first one with the only difference that the > change to > >>>>>> 'CodeHeap::contains_blob()' has been removed: > >>>>>> > >>>>>> > http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/ > > >>>>>> > >>>>>> Thanks, > >>>>>> Volker > >>>>>> > >>>>>> > >>>>>> On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis > >>>>>> > wrote: > >>>>>>> > >>>>>>> > >>>>>>> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov > >>>>>>> > wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> Very good change. Thank you, Volker. > >>>>>>>> > >>>>>>>> About contains_blob(). The problem is that AOTCompiledMethod is > >>>>>>>> allocated > >>>>>>>> in > >>>>>>>> CHeap and not in the AOT code section (which is RO): > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 > >>>>>>>> > >>>>>>>> It is allocated in CHeap after the AOT library is loaded. Its > >>>>>>>> code_begin() > >>>>>>>> points to the AOT code section but the AOTCompiledMethod* points > outside it > >>>>>>>> (to > >>>>>>>> normal malloced space) so you can't use the (char*)blob address. > >>>>>>>> > >>>>>>> > >>>>>>> Thanks for the explanation - now I got it. > >>>>>>> > >>>>>>>> There are 2 ways to fix it, I think. > >>>>>>>> One is to add a new field to CodeBlobLayout and set it to the > blob* address > >>>>>>>> for > >>>>>>>> normal CodeCache blobs and to code_begin for AOT code.
> >>>>>>>> Second is to use contains(blob->code_end() - 1) assuming > that AOT > >>>>>>>> code > >>>>>>>> is > >>>>>>>> never zero. > >>>>>>>> > >>>>>>> > >>>>>>> I'll give it a try tomorrow and will send out a new webrev. > >>>>>>> > >>>>>>> Regards, > >>>>>>> Volker > >>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Vladimir > >>>>>>>> > >>>>>>>> > >>>>>>>> On 8/31/17 5:43 AM, Volker Simonis wrote: > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad > >>>>>>>>> > wrote: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On 2017-08-31 08:54, Volker Simonis wrote: > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> While working on this, I found another problem which > is related to > >>>>>>>>>>> the > >>>>>>>>>>> fix of JDK-8183573 and leads to crashes when executing > the JTreg > >>>>>>>>>>> test > >>>>>>>>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java. > >>>>>>>>>>> > >>>>>>>>>>> The problem is that JDK-8183573 replaced > >>>>>>>>>>> > >>>>>>>>>>>         virtual bool contains_blob(const CodeBlob* > blob) const { > >>>>>>>>>>> return > >>>>>>>>>>> low_boundary() <= (char*) blob && (char*) blob < high(); } > >>>>>>>>>>> > >>>>>>>>>>> by: > >>>>>>>>>>> > >>>>>>>>>>>         bool contains_blob(const CodeBlob* blob) const > { return > >>>>>>>>>>> contains(blob->code_begin()); } > >>>>>>>>>>> > >>>>>>>>>>> But that may be wrong in the corner case where the size > of the > >>>>>>>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists > only of the > >>>>>>>>>>> 'header' - i.e. the C++ object itself) because in that > case > >>>>>>>>>>> CodeBlob::code_begin() points right behind the > CodeBlob's header, > >>>>>>>>>>> which > >>>>>>>>>>> is a memory location that doesn't belong to the > CodeBlob anymore. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> I recall this change was somehow necessary to allow merging > >>>>>>>>>> AOTCodeHeap::contains_blob and CodeHeap::contains_blob into > >>>>>>>>>> one devirtualized method, so you need to ensure all AOT > tests > >>>>>>>>>> pass with this change (on linux-x64). > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> All of hotspot/test/aot and hotspot/test/jvmci executed > and passed > >>>>>>>>> successfully. Are there any other tests I should check? > >>>>>>>>> > >>>>>>>>> That said, it is a little hard to follow the stages of > your change. > >>>>>>>>> It > >>>>>>>>> seems like > >>>>>>>>> > http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/ > > >>>>>>>>> was reviewed [1] but then finally the slightly changed > version from > >>>>>>>>> > http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ > > >>>>>>>>> was > >>>>>>>>> checked in and linked to the bug report. > >>>>>>>>> > >>>>>>>>> The first, reviewed version of the change still had a > correct > >>>>>>>>> version > >>>>>>>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while > the second, > >>>>>>>>> checked-in version has the faulty version of that method. > >>>>>>>>> > >>>>>>>>> I don't know why you finally did that change to > 'contains_blob()' > >>>>>>>>> but > >>>>>>>>> I don't see any reason why we shouldn't be able to > directly use the > >>>>>>>>> blob's address for inclusion checking. From what I > understand, it > >>>>>>>>> should ALWAYS be contained in the corresponding CodeHeap > so no > >>>>>>>>> reason > >>>>>>>>> to mess with 'CodeBlob::code_begin()'. > >>>>>>>>> > >>>>>>>>> Please let me know if I'm missing something.
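A minimal standalone illustration of the zero-payload corner case described above, using a hypothetical stand-in type rather than the real HotSpot CodeBlob/CodeHeap classes:

```cpp
#include <cstdio>

// Hypothetical stand-in for a CodeBlob: a header followed by a payload.
// When the payload size is zero, code_begin() == one past the header,
// i.e. an address that no longer belongs to this blob, so an inclusion
// test based on code_begin() can land in the neighbouring blob, while a
// test on the blob address itself cannot.
struct FakeBlob {
  int header_size;
  int payload_size;
  const char* begin() const      { return reinterpret_cast<const char*>(this); }
  const char* code_begin() const { return begin() + header_size; }
  const char* end() const        { return code_begin() + payload_size; }
};

int main() {
  FakeBlob zero_payload{ static_cast<int>(sizeof(FakeBlob)), 0 };
  std::printf("blob [%p, %p), code_begin() = %p (already one past the end)\n",
              static_cast<const void*>(zero_payload.begin()),
              static_cast<const void*>(zero_payload.end()),
              static_cast<const void*>(zero_payload.code_begin()));
  return 0;
}
```

This only makes the pointer arithmetic visible; the real classes lay the payload out after the C++ object inside the same CodeHeap allocation.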
> >>>>>>>>> > >>>>>>>>> [1] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html > >>>>>>>>> > >>>>>>>>>> I can't help but wonder if we'd not be better served by > disallowing > >>>>>>>>>> zero-sized payloads. Is this something that can ever > actually > >>>>>>>>>> happen except by abuse of the white box API? > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> The corresponding test (ReturnBlobToWrongHeapTest.java) > specifically > >>>>>>>>> wants to allocate "segment sized" blocks which is most > easily > >>>>>>>>> achieved > >>>>>>>>> by allocating zero-sized CodeBlobs. And I think there's > nothing > >>>>>>>>> wrong > >>>>>>>>> with it if we handle the inclusion tests correctly. > >>>>>>>>> > >>>>>>>>> Thank you and best regards, > >>>>>>>>> Volker > >>>>>>>>> > >>>>>>>>>> /Claes > >> > >> > From harold.seigel at oracle.com Mon Oct 2 14:59:25 2017 From: harold.seigel at oracle.com (harold seigel) Date: Mon, 2 Oct 2017 10:59:25 -0400 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle In-Reply-To: <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> Message-ID: Hi Coleen, The hs runtime changes look good. Thanks! Harold On 9/28/2017 5:36 PM, coleen.phillimore at oracle.com wrote: > > Thank you to Stefan Karlsson offlist for pointing out that the > previous .01 version of this webrev breaks CMS in that it doesn't > remember ClassLoaderData::_handles that are changed and added while > concurrent marking is in progress. I've fixed this bug to move the > Klass::_modified_oops and _accumulated_modified_oops to the > ClassLoaderData and use these fields in the CMS remarking phase to > catch any new handles that are added. This also fixes this bug > https://bugs.openjdk.java.net/browse/JDK-8173988 . > > In addition, the previous version of this change removed an > optimization during young collection, which showed some uncertain > performance regression in young pause times, so I added this > optimization back to not walk ClassLoaderData during young collections > if all the oops are old. The performance results of SPECjbb2015 now > are slightly better, but not significantly. > > This latest patch has been tested on tier1-5 on linux x64 and windows > x64 in the mach5 test harness. > > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/ > > Can I get at least 3 reviewers? One from each of the compiler, gc, > and runtime group at least since there are changes to all 3. > > Thanks! > Coleen > > > On 9/6/17 12:04 PM, coleen.phillimore at oracle.com wrote: >> Summary: Add indirection for fetching mirror so that GC doesn't have >> to follow CLD::_klasses >> >> Thank you to Tom Rodriguez for Graal changes and Rickard for the C2 >> changes. >> >> Ran nightly tests through Mach5 and RBT. Early performance testing >> showed good performance improvement in GC class loader data processing >> time, but nmethod processing time continues to dominate. Also >> performance testing showed no throughput regression. I'm rerunning >> both of these performance tests and will post the numbers.
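A rough sketch of the indirection described in the quoted summary above, with hypothetical stand-in types rather than the real Klass/OopHandle classes: the Klass no longer embeds the mirror oop directly but holds a pointer to an oop slot owned by its ClassLoaderData, so fetching the mirror costs one extra load and the GC only has to visit the handle area.

```cpp
// Hypothetical stand-ins, not the actual HotSpot classes.
class oopDesc;
typedef oopDesc* oop;

struct OopHandleSketch {
  oop* _slot;                              // points into the ClassLoaderData's handle area
  oop resolve() const { return *_slot; }   // "resolving" the handle is a single extra load
};

struct KlassSketch {
  OopHandleSketch _java_mirror;            // was: oop _java_mirror
  oop java_mirror() const { return _java_mirror.resolve(); }
};

int main() {
  oop slot = nullptr;                      // stand-in for the mirror slot owned by the CLD
  KlassSketch k{ OopHandleSketch{ &slot } };
  return k.java_mirror() == nullptr ? 0 : 1;
}
```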
>> >> bug link https://bugs.openjdk.java.net/browse/JDK-8186777 >> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.01/webrev >> >> Thanks, >> Coleen From coleen.phillimore at oracle.com Mon Oct 2 15:05:51 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 2 Oct 2017 11:05:51 -0400 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle In-Reply-To: References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> Message-ID: <2abe18fc-9ff3-b17f-700e-4cd8ff5c7ee1@oracle.com> Thank you, Harold! Coleen On 10/2/17 10:59 AM, harold seigel wrote: > Hi Coleen, > > The hs runtime changes look good. > > Thanks! Harold > > > On 9/28/2017 5:36 PM, coleen.phillimore at oracle.com wrote: >> >> Thank you to Stefan Karlsson offlist for pointing out that the >> previous .01 version of this webrev breaks CMS in that it doesn't >> remember ClassLoaderData::_handles that are changed and added while >> concurrent marking is in progress.? I've fixed this bug to move the >> Klass::_modified_oops and _accumulated_modified_oops to the >> ClassLoaderData and use these fields in the CMS remarking phase to >> catch any new handles that are added.?? This also fixes this bug >> https://bugs.openjdk.java.net/browse/JDK-8173988 . >> >> In addition, the previous version of this change removed an >> optimization during young collection, which showed some uncertain >> performance regression in young pause times, so I added this >> optimization back to not walk ClassLoaderData during young >> collections if all the oops are old.? The performance results of >> SPECjbb2015 now are slightly better, but not significantly. >> >> This latest patch has been tested on tier1-5 on linux x64 and windows >> x64 in mach5 test harness. >> >> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/ >> >> Can I get at least 3 reviewers?? One from each of the compiler, gc, >> and runtime group at least since there are changes to all 3. >> >> Thanks! >> Coleen >> >> >> On 9/6/17 12:04 PM, coleen.phillimore at oracle.com wrote: >>> Summary: Add indirection for fetching mirror so that GC doesn't have >>> to follow CLD::_klasses >>> >>> Thank you to Tom Rodriguez for Graal changes and Rickard for the C2 >>> changes. >>> >>> Ran nightly tests through Mach5 and RBT.?? Early performance testing >>> showed good performance improvment in GC class loader data >>> processing time, but nmethod processing time continues to dominate. >>> Also performace testing showed no throughput regression.?? I'm >>> rerunning both of these performance testing and will post the numbers. >>> >>> bug link https://bugs.openjdk.java.net/browse/JDK-8186777 >>> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.01/webrev >>> >>> Thanks, >>> Coleen > From tobias.hartmann at oracle.com Mon Oct 2 15:18:58 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 2 Oct 2017 17:18:58 +0200 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle In-Reply-To: <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> Message-ID: <34c406b6-e993-d662-8fb8-4e7586775b53@oracle.com> Hi Coleen, On 28.09.2017 23:36, coleen.phillimore at oracle.com wrote: > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/ > > Can I get at least 3 reviewers?? One from each of the compiler, gc, and runtime group at least since there are changes > to all 3. 
The compiler changes look good to me. Found a little typo: - In line 1776 of memnode.cpp: it should be "loads" instead of "load" I just wanted to mention that SharkIntrinsics::do_Object_getClass() would need to be fixed as well but I've seen that you filed JDK-8171853 [1] to remove Shark which is broken with JDK 9 anyway. Best regards, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8171853 From coleen.phillimore at oracle.com Mon Oct 2 15:24:48 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 2 Oct 2017 11:24:48 -0400 Subject: CFV: New hotspot Group Member: Ioi Lam Message-ID: I hereby nominate Ioi Lam (OpenJDK user name: iklam) to Membership in the hotspot Group. Ioi has been working on the hotspot project for over 5 years and is a Reviewer in the JDK 9 Project with 79 changes.?? He is an expert in the area of class data sharing. Votes are due by Monday, October 16, 2017. Only current Members of the hotspot Group [1] are eligible to vote on this nomination. Votes must be cast in the open by replying to this mailing list. For Lazy Consensus voting instructions, see [2]. Coleen [1]http://openjdk.java.net/census#hotspot [2]http://openjdk.java.net/groups/#member-vote From coleen.phillimore at oracle.com Mon Oct 2 15:31:22 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 2 Oct 2017 11:31:22 -0400 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle In-Reply-To: <34c406b6-e993-d662-8fb8-4e7586775b53@oracle.com> References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> <34c406b6-e993-d662-8fb8-4e7586775b53@oracle.com> Message-ID: <394576b9-ff3a-acf3-fe6a-a0f924afaa8d@oracle.com> On 10/2/17 11:18 AM, Tobias Hartmann wrote: > Hi Coleen, > > On 28.09.2017 23:36, coleen.phillimore at oracle.com wrote: >> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/ >> >> Can I get at least 3 reviewers?? One from each of the compiler, gc, >> and runtime group at least since there are changes to all 3. > > The compiler changes look good to me. > > Found a little typo: > - In line 1776 of memnode.cpp: it should be "loads" instead of "load" Thank you Tobias.? I fixed this typo. > > I just wanted to mention that SharkIntrinsics::do_Object_getClass() > would need to be fixed as well but I've seen that you filed > JDK-8171853 [1] to remove Shark which is broken with JDK 9 anyway. Yes, I think we've broken shark for a while now and it should be removed, unless someone in the open wants to take it over.?? I don't have any idea how to build it anymore. Thanks! Coleen > > Best regards, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8171853 From daniel.daugherty at oracle.com Mon Oct 2 15:33:03 2017 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 2 Oct 2017 09:33:03 -0600 Subject: CFV: New hotspot Group Member: Ioi Lam In-Reply-To: References: Message-ID: <384bce26-f5ae-304e-4607-39e55f23ff11@oracle.com> Vote: yes Dan On 10/2/17 9:24 AM, coleen.phillimore at oracle.com wrote: > I hereby nominate Ioi Lam (OpenJDK user name: iklam) to Membership in > the hotspot Group. > > Ioi has been working on the hotspot project for over 5 years and is a > Reviewer in the JDK 9 Project with 79 changes.?? He is an expert in > the area of class data sharing. > > Votes are due by Monday, October 16, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on > this nomination. 
Votes must be cast in the open by replying to this > mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote > From bob.vandette at oracle.com Mon Oct 2 15:46:48 2017 From: bob.vandette at oracle.com (Bob Vandette) Date: Mon, 2 Oct 2017 11:46:48 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <833ba1a5-49fc-bb24-ff99-994011af52aa@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <2d9dd746-63e1-cade-28f9-5ca1ae1c253e@oracle.com> <200F07CB-35DA-492B-B78D-9EC033EE0431@oracle.com> <833ba1a5-49fc-bb24-ff99-994011af52aa@oracle.com> Message-ID: <10D254F1-ADA7-4EEB-A4AA-9BF6F42B72E0@oracle.com> > On Sep 27, 2017, at 9:20 PM, David Holmes wrote: > > Hi Bob, > > On 28/09/2017 1:45 AM, Bob Vandette wrote: >> David, Thank you for taking the time and providing a detailed review of these changes. >> Where I haven?t responded, I?ll update the implementation based on your comments. > > Okay. I've trimmed below to only leave things I have follow up on. > >>> If this is all confined to Linux only then this should be a linux-only flag and all the changes should be confined to linux code. No shared osContainer API is needed as it can be defined as a nested class of os::Linux, and will only be called from os_linux.cpp. >> I received feedback on my other Container work where I was asked to >> make sure it was possible to support other container technologies. >> The addition of the shared osContainer API is to prepare for this and >> recognize that this will eventually be supported other platforms. > > The problem is that the proposed osContainer API is totally cgroup centric. That API might not make sense for a different container technology. Even if Docker is used on different platforms, does it use cgroups on those other platforms? Until we have at least two examples we want to support we don't know how to formulate a generic API. So in my opinion we should initially keep this Linux specific as a proof-of-concept for future more general container support. I was trying to prepare for the JEP implementation where M&M and JFR hooks will need a shared API to call. I was expecting to return a not supported error code on platforms that didn?t have the os specific implementations. I did take a look at a few other types of containers (VMWare?s SDK for example) and they all had similar types of functions for retrieving the number of cpus and quotas along with the memory limits, swap and free space. I assumed that we could clean up the shared APIs once we did the second container support. In any case that work can be done by the JEP integration so I?m ok with making this os/linux specific but I still would like to keep this support in it?s own file (osContainer_linux.cpp and osContainer_linux.hpp) so all the cgroup processing is kept separate and these files don?t have to move later. This would make it easier to support alternate types of containers. I also wanted to avoid adding lots more size to os_linux.cpp. It?s already too big. Bob. > >>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares >>>> may not satisfy every users needs, I?ve added an additional flag to allow the >>>> number of CPUs to be overridden. This flag is named -XX:ActiveProcessorCount=xx. 
>>> >>> I would suggest that ActiveProcessorCount be constrained to being >1 - this is in line with our plans to get rid of AssumeMP/os::is_MP and always build in MP support. Otherwise a count of 1 today won't behave the same as a count of 1 in the future. >> What if I return true for is_MP anytime ActiveProcessorCount is set. I?d like to provide the ability of specifying a single processor. > > If I make the AssumeMP change for 18.3 as planned then this won't be an issue. I'd better get onto that :) > >>> >>> Also you have defined this globally but only accounted for it on Linux. I think it makes sense to support this flag on all platforms (a generalization of AssumeMP). Otherwise it needs to be defined as a Linux-only flag in the pd_globals.hpp file >> Good idea. > > You could even factor this out as a separate issue/task independent of the container work. > >>> Style issue: >>> >>> 2121 if (i < 0) st->print("OSContainer::active_processor_count() failed"); >>> 2122 else >>> >>> and elsewhere. Please move the st->print to its own line. Others may argue for always using blocks ({}) in if/else. >> There doesn?t seem to be consistency on this issue. > > No there's no consistency :( And this isn't in the hotspot style guide AFAICS. But I'm sure it's in some other coding guidelines ;-) > >>> 5024 // User has overridden the number of active processors >>> 5025 if (!FLAG_IS_DEFAULT(ActiveProcessorCount)) { >>> 5026 log_trace(os)("active_processor_count: " >>> 5027 "active processor count set by user : %d", >>> 5028 (int)ActiveProcessorCount); >>> 5029 return ActiveProcessorCount; >>> 5030 } >>> >>> We don't normally check flags in runtime code like this - this will be executed on every call, and you will see that logging each time. This should be handled during initialization (os::Posix::init()? - if applying this flag globally) - with logging occurring once. The above should just reduce to: >>> >>> if (ActiveProcessorCount > 0) { >>> return ActiveProcessorCount; // explicit user control of number of cpus >>> } >>> >>> Even then I do get concerned about having to always check for the least common cases before the most common one. :( >> This is not in a highly used function so it should be ok. > > I really don't like seeing the FLAG_IS_DEFAULT in there - and you need to move the logging anyway. > >>> >>> The osContainer_.hpp files seem to be unnecessary as they are all empty. >> I?ll remove them. I wasn?t sure if there was a convention to move more of osContainer_linux.cpp -> osContainer_linux.hpp. >> For example: classCgroupSubsystem > > The header is only needed to expose an API for other code to use. Locally defined classes can be kept in the .cpp file. > >>> 34 class CgroupSubsystem: CHeapObj { >>> >>> You defined this class as CHeapObj and added a destructor to free a few things, but I can't see where the instances of this class will themselves ever be freed >> What?s the latest thinking on freeing CHeap Objects on termination? Is it really worth wasting cpu cycles when our >> process is about to terminate? If not, I?ll just remove the destructors. > > Philosophically I prefer new APIs to play nice with the invocation API, even if existing API's don't play nice. But that's just me. > >>> >>> 62 void set_subsystem_path(char *cgroup_path) { >>> >>> If this takes a "const char*" will it save you from casting string literals to "char*" elsewhere? 
>> I tried several different ways of declaring the container accessor functions and >> always ended up with warnings due to scanf not being able to validate arguments >> since the format string didn?t end up being a string literal. I originally was using templates >> and then ended up with the macros. I tried several different casts but could resolve the problem. > > Sounds like something Kim Barrett should take a look at :) > > Thanks, > David From tobias.hartmann at oracle.com Mon Oct 2 16:07:40 2017 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 2 Oct 2017 18:07:40 +0200 Subject: CFV: New hotspot Group Member: Ioi Lam In-Reply-To: References: Message-ID: <4925faea-082c-4d8b-5e48-49c5299e3f6d@oracle.com> Vote: yes On 02.10.2017 17:24, coleen.phillimore at oracle.com wrote: > I hereby nominate Ioi Lam (OpenJDK user name: iklam) to Membership in the hotspot Group. > > Ioi has been working on the hotspot project for over 5 years and is a Reviewer in the JDK 9 Project with 79 changes. > He is an expert in the area of class data sharing. > > Votes are due by Monday, October 16, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on this nomination. Votes must be cast in the open by > replying to this mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote From jesper.wilhelmsson at oracle.com Mon Oct 2 16:45:48 2017 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Mon, 2 Oct 2017 18:45:48 +0200 Subject: CFV: New hotspot Group Member: Ioi Lam In-Reply-To: References: Message-ID: <6B9190C0-E9C8-42FC-9F73-38FAC49C1EDF@oracle.com> Vote: yes /Jesper > On 2 Oct 2017, at 17:24, coleen.phillimore at oracle.com wrote: > > I hereby nominate Ioi Lam (OpenJDK user name: iklam) to Membership in the hotspot Group. > > Ioi has been working on the hotspot project for over 5 years and is a Reviewer in the JDK 9 Project with 79 changes. He is an expert in the area of class data sharing. > > Votes are due by Monday, October 16, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on this nomination. Votes must be cast in the open by replying to this mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote From erik.osterlund at oracle.com Mon Oct 2 16:48:34 2017 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Mon, 2 Oct 2017 18:48:34 +0200 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle In-Reply-To: <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> Message-ID: Hi Coleen, I looked a bit at the code generation part of this change. It beats me that the indirect load required for resolution of the oop handle was somewhat encapsulated in a resolve oop handle call in the macro assembler (a bit like resolve jobject), but in the corresponding C1 and C2 code, there is no such abstraction. Instead the loads required for resolve are generated straight up. Therefore, if the logic involved in resolving an OopHandle ever changes, it might start to get tricky to chase down where it is being used too. So I wonder if you would find it useful to encapsulate that into some method on e.g. LIRGenerator for C1 and GraphKit for C2? 
In the case of C2 it might be a bit tricky to abstract due to the node matching logic, unless we want to macro expand a new ResolveOopHandleNode, or something like that. Or a matching function maybe. Just a thought that beat me reading through the changes. I like abstractions! Thanks, /Erik > On 28 Sep 2017, at 23:36, coleen.phillimore at oracle.com wrote: > > > Thank you to Stefan Karlsson offlist for pointing out that the previous .01 version of this webrev breaks CMS in that it doesn't remember ClassLoaderData::_handles that are changed and added while concurrent marking is in progress. I've fixed this bug to move the Klass::_modified_oops and _accumulated_modified_oops to the ClassLoaderData and use these fields in the CMS remarking phase to catch any new handles that are added. This also fixes this bug https://bugs.openjdk.java.net/browse/JDK-8173988 . > > In addition, the previous version of this change removed an optimization during young collection, which showed some uncertain performance regression in young pause times, so I added this optimization back to not walk ClassLoaderData during young collections if all the oops are old. The performance results of SPECjbb2015 now are slightly better, but not significantly. > > This latest patch has been tested on tier1-5 on linux x64 and windows x64 in mach5 test harness. > > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/ > > Can I get at least 3 reviewers? One from each of the compiler, gc, and runtime group at least since there are changes to all 3. > > Thanks! > Coleen > > >> On 9/6/17 12:04 PM, coleen.phillimore at oracle.com wrote: >> Summary: Add indirection for fetching mirror so that GC doesn't have to follow CLD::_klasses >> >> Thank you to Tom Rodriguez for Graal changes and Rickard for the C2 changes. >> >> Ran nightly tests through Mach5 and RBT. Early performance testing showed good performance improvment in GC class loader data processing time, but nmethod processing time continues to dominate. Also performace testing showed no throughput regression. I'm rerunning both of these performance testing and will post the numbers. >> >> bug link https://bugs.openjdk.java.net/browse/JDK-8186777 >> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.01/webrev >> >> Thanks, >> Coleen From coleen.phillimore at oracle.com Mon Oct 2 17:04:19 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 2 Oct 2017 13:04:19 -0400 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle In-Reply-To: References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> Message-ID: On 10/2/17 12:48 PM, Erik Osterlund wrote: > Hi Coleen, > > I looked a bit at the code generation part of this change. > It beats me that the indirect load required for resolution of the oop handle was somewhat encapsulated in a resolve oop handle call in the macro assembler (a bit like resolve jobject), but in the corresponding C1 and C2 code, there is no such abstraction. Instead the loads required for resolve are generated straight up. Therefore, if the logic involved in resolving an OopHandle ever changes, it might start to get tricky to chase down where it is being used too. Hi Erik,? I wanted the load encaspulated in resolve_oop_handle() in the macroAssembler, but I didn't know how to change the c1/c2 code (or graal) to do the same. > > So I wonder if you would find it useful to encapsulate that into some method on e.g. LIRGenerator for C1 and GraphKit for C2? 
> In the case of C2 it might be a bit tricky to abstract due to the node matching logic, unless we want to macro expand a new ResolveOopHandleNode, or something like that. Or a matching function maybe. Can I file a seperate RFE for this?? I like the idea very much but would like to push this larger change first. > > Just a thought that beat me reading through the changes. I like abstractions! Me too! Thanks, Coleen > > Thanks, > /Erik > >> On 28 Sep 2017, at 23:36, coleen.phillimore at oracle.com wrote: >> >> >> Thank you to Stefan Karlsson offlist for pointing out that the previous .01 version of this webrev breaks CMS in that it doesn't remember ClassLoaderData::_handles that are changed and added while concurrent marking is in progress. I've fixed this bug to move the Klass::_modified_oops and _accumulated_modified_oops to the ClassLoaderData and use these fields in the CMS remarking phase to catch any new handles that are added. This also fixes this bug https://bugs.openjdk.java.net/browse/JDK-8173988 . >> >> In addition, the previous version of this change removed an optimization during young collection, which showed some uncertain performance regression in young pause times, so I added this optimization back to not walk ClassLoaderData during young collections if all the oops are old. The performance results of SPECjbb2015 now are slightly better, but not significantly. >> >> This latest patch has been tested on tier1-5 on linux x64 and windows x64 in mach5 test harness. >> >> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/ >> >> Can I get at least 3 reviewers? One from each of the compiler, gc, and runtime group at least since there are changes to all 3. >> >> Thanks! >> Coleen >> >> >>> On 9/6/17 12:04 PM, coleen.phillimore at oracle.com wrote: >>> Summary: Add indirection for fetching mirror so that GC doesn't have to follow CLD::_klasses >>> >>> Thank you to Tom Rodriguez for Graal changes and Rickard for the C2 changes. >>> >>> Ran nightly tests through Mach5 and RBT. Early performance testing showed good performance improvment in GC class loader data processing time, but nmethod processing time continues to dominate. Also performace testing showed no throughput regression. I'm rerunning both of these performance testing and will post the numbers. >>> >>> bug link https://bugs.openjdk.java.net/browse/JDK-8186777 >>> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.01/webrev >>> >>> Thanks, >>> Coleen From coleen.phillimore at oracle.com Mon Oct 2 17:10:15 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 2 Oct 2017 13:10:15 -0400 Subject: CFV: New hotspot Group Member: Ioi Lam In-Reply-To: References: Message-ID: <3a248076-9b93-ab0d-0327-3c24931014e6@oracle.com> Vote: yes On 10/2/17 11:24 AM, coleen.phillimore at oracle.com wrote: > I hereby nominate Ioi Lam (OpenJDK user name: iklam) to Membership in > the hotspot Group. > > Ioi has been working on the hotspot project for over 5 years and is a > Reviewer in the JDK 9 Project with 79 changes.?? He is an expert in > the area of class data sharing. > > Votes are due by Monday, October 16, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on > this nomination. Votes must be cast in the open by replying to this > mailing list. > > For Lazy Consensus voting instructions, see [2]. 
> > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote From erik.osterlund at oracle.com Mon Oct 2 17:13:38 2017 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Mon, 2 Oct 2017 19:13:38 +0200 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle In-Reply-To: References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> Message-ID: Hi Coleen, > On 2 Oct 2017, at 19:04, coleen.phillimore at oracle.com wrote: > > > >> On 10/2/17 12:48 PM, Erik Osterlund wrote: >> Hi Coleen, >> >> I looked a bit at the code generation part of this change. >> It beats me that the indirect load required for resolution of the oop handle was somewhat encapsulated in a resolve oop handle call in the macro assembler (a bit like resolve jobject), but in the corresponding C1 and C2 code, there is no such abstraction. Instead the loads required for resolve are generated straight up. Therefore, if the logic involved in resolving an OopHandle ever changes, it might start to get tricky to chase down where it is being used too. > > Hi Erik, I wanted the load encaspulated in resolve_oop_handle() in the macroAssembler, but I didn't know how to change the c1/c2 code (or graal) to do the same. >> >> So I wonder if you would find it useful to encapsulate that into some method on e.g. LIRGenerator for C1 and GraphKit for C2? >> In the case of C2 it might be a bit tricky to abstract due to the node matching logic, unless we want to macro expand a new ResolveOopHandleNode, or something like that. Or a matching function maybe. > > Can I file a seperate RFE for this? I like the idea very much but would like to push this larger change first. Sure, I am fine with that. >> >> Just a thought that beat me reading through the changes. I like abstractions! > > Me too! :) Thanks, /Erik > Thanks, > Coleen >> >> Thanks, >> /Erik >> >>> On 28 Sep 2017, at 23:36, coleen.phillimore at oracle.com wrote: >>> >>> >>> Thank you to Stefan Karlsson offlist for pointing out that the previous .01 version of this webrev breaks CMS in that it doesn't remember ClassLoaderData::_handles that are changed and added while concurrent marking is in progress. I've fixed this bug to move the Klass::_modified_oops and _accumulated_modified_oops to the ClassLoaderData and use these fields in the CMS remarking phase to catch any new handles that are added. This also fixes this bug https://bugs.openjdk.java.net/browse/JDK-8173988 . >>> >>> In addition, the previous version of this change removed an optimization during young collection, which showed some uncertain performance regression in young pause times, so I added this optimization back to not walk ClassLoaderData during young collections if all the oops are old. The performance results of SPECjbb2015 now are slightly better, but not significantly. >>> >>> This latest patch has been tested on tier1-5 on linux x64 and windows x64 in mach5 test harness. >>> >>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/ >>> >>> Can I get at least 3 reviewers? One from each of the compiler, gc, and runtime group at least since there are changes to all 3. >>> >>> Thanks! >>> Coleen >>> >>> >>>> On 9/6/17 12:04 PM, coleen.phillimore at oracle.com wrote: >>>> Summary: Add indirection for fetching mirror so that GC doesn't have to follow CLD::_klasses >>>> >>>> Thank you to Tom Rodriguez for Graal changes and Rickard for the C2 changes. >>>> >>>> Ran nightly tests through Mach5 and RBT. 
Early performance testing showed good performance improvment in GC class loader data processing time, but nmethod processing time continues to dominate. Also performace testing showed no throughput regression. I'm rerunning both of these performance testing and will post the numbers. >>>> >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8186777 >>>> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.01/webrev >>>> >>>> Thanks, >>>> Coleen > From vladimir.kozlov at oracle.com Mon Oct 2 18:29:12 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 2 Oct 2017 11:29:12 -0700 Subject: CFV: New hotspot Group Member: Ioi Lam In-Reply-To: <3a248076-9b93-ab0d-0327-3c24931014e6@oracle.com> References: <3a248076-9b93-ab0d-0327-3c24931014e6@oracle.com> Message-ID: Vote: yes Vladimir > On Oct 2, 2017, at 10:10 AM, coleen.phillimore at oracle.com wrote: > > Vote: yes > >> On 10/2/17 11:24 AM, coleen.phillimore at oracle.com wrote: >> I hereby nominate Ioi Lam (OpenJDK user name: iklam) to Membership in the hotspot Group. >> >> Ioi has been working on the hotspot project for over 5 years and is a Reviewer in the JDK 9 Project with 79 changes. He is an expert in the area of class data sharing. >> >> Votes are due by Monday, October 16, 2017. >> >> Only current Members of the hotspot Group [1] are eligible to vote on this nomination. Votes must be cast in the open by replying to this mailing list. >> >> For Lazy Consensus voting instructions, see [2]. >> >> Coleen >> >> [1]http://openjdk.java.net/census#hotspot >> [2]http://openjdk.java.net/groups/#member-vote > From robbin.ehn at oracle.com Mon Oct 2 20:20:58 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 2 Oct 2017 22:20:58 +0200 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> Message-ID: <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> Hi Bob, As I said in your presentation for RT. If kernel if configured with cgroup this should always be read (otherwise we get wrong values). E.g. fedora have had cgroups default on several years (I believe most distros have it on). - No option is needed at all: right now we have wrong values your fix will provide right ones, why would you ever what to turn that off? - log target container would make little sense since almost all linuxes run with croups on. - For cpuset, the processes affinity mask already reflect cgroup setting so you don't need to look into cgroup for that If you do, you would miss any processes specific affinity mask. So _cpu_count() should already be returning the right number of CPU's. Thanks for trying to fixing this! /Robbin On 09/22/2017 04:27 PM, Bob Vandette wrote: > Please review these changes that improve on docker container detection and the > automatic configuration of the number of active CPUs and total and free memory > based on the containers resource limitation settings and metric data files. > > http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ > > These changes are enabled with -XX:+UseContainerSupport. > > You can enable logging for this support via -Xlog:os+container=trace. > > Since the dynamic selection of CPUs based on cpusets, quotas and shares > may not satisfy every users needs, I?ve added an additional flag to allow the > number of CPUs to be overridden. This flag is named -XX:ActiveProcessorCount=xx. > > > Bob. 
> > > From david.holmes at oracle.com Mon Oct 2 22:05:16 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 3 Oct 2017 08:05:16 +1000 Subject: CFV: New hotspot Group Member: Ioi Lam In-Reply-To: References: Message-ID: <5c43ea90-b50d-bfa9-1584-a5820b7040f2@oracle.com> Vote: yes David On 3/10/2017 1:24 AM, coleen.phillimore at oracle.com wrote: > I hereby nominate Ioi Lam (OpenJDK user name: iklam) to Membership in > the hotspot Group. > > Ioi has been working on the hotspot project for over 5 years and is a > Reviewer in the JDK 9 Project with 79 changes.?? He is an expert in the > area of class data sharing. > > Votes are due by Monday, October 16, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on > this nomination. Votes must be cast in the open by replying to this > mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote From david.holmes at oracle.com Mon Oct 2 22:46:20 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 3 Oct 2017 08:46:20 +1000 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> Message-ID: <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> Hi Robbin, I have some views on this :) On 3/10/2017 6:20 AM, Robbin Ehn wrote: > Hi Bob, > > As I said in your presentation for RT. > If kernel if configured with cgroup this should always be read > (otherwise we get wrong values). > E.g. fedora have had cgroups default on several years (I believe most > distros have it on). > > - No option is needed at all: right now we have wrong values your fix > will provide right ones, why would you ever what to turn that off? It's not that you would want to turn that off (necessarily) but just because cgroups capability exists it doesn't mean they have actually been enabled and configured - in which case reading all the cgroup info is unnecessary startup overhead. So for now this is opt-in - as was the experimental cgroup support we added. Once it becomes clearer how this needs to be used we can adjust the defaults. For now this is enabling technology only. > - log target container would make little sense since almost all linuxes > run with croups on. Again the capability is present but may not be enabled/configured. > - For cpuset, the processes affinity mask already reflect cgroup setting > so you don't need to look into cgroup for that > ? If you do, you would miss any processes specific affinity mask. So > _cpu_count() should already be returning the right number of CPU's. While the process affinity mask reflect cpusets (and we already use it for that reason), it doesn't reflect shares and quotas. And if shares/quotas are enforced and someone sets a custom affinity mask, what is it all supposed to mean? That's one of the main reasons to allow the number of cpu's to be hardwired via a flag. So it's better IMHO to read everything from the cgroups if configured to use cgroups. Cheers, David > > Thanks for trying to fixing this! 
> > /Robbin > > On 09/22/2017 04:27 PM, Bob Vandette wrote: >> Please review these changes that improve on docker container detection >> and the >> automatic configuration of the number of active CPUs and total and >> free memory >> based on the containers resource limitation settings and metric data >> files. >> >> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ >> >> >> These changes are enabled with -XX:+UseContainerSupport. >> >> You can enable logging for this support via -Xlog:os+container=trace. >> >> Since the dynamic selection of CPUs based on cpusets, quotas and shares >> may not satisfy every users needs, I?ve added an additional flag to >> allow the >> number of CPUs to be overridden.? This flag is named >> -XX:ActiveProcessorCount=xx. >> >> >> Bob. >> >> >> From david.holmes at oracle.com Tue Oct 3 01:33:55 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 3 Oct 2017 11:33:55 +1000 Subject: (XS) RFR: 8188246: Add test/hotspot/jtreg/gc/logging/TestPrintReferences.java to ProblemList.txt Message-ID: <86cbbd16-f353-5214-7a4c-aca3e74afbab@oracle.com> The test fails intermittently in tier1 testing so we need to exclude it until fixed. patch inline below. webrev: http://cr.openjdk.java.net/~dholmes/8188246/webrev/ Will push under trivial rules as soon as I have one Review. Thanks, David --- old/test/hotspot/jtreg/ProblemList.txt 2017-10-02 21:26:20.127717945 -0400 +++ new/test/hotspot/jtreg/ProblemList.txt 2017-10-02 21:26:18.043599357 -0400 @@ -64,6 +64,7 @@ gc/g1/humongousObjects/TestHeapCounters.java 8178918 generic-all gc/stress/gclocker/TestGCLockerWithG1.java 8179226 generic-all gc/survivorAlignment/TestPromotionFromSurvivorToTenuredAfterMinorGC.java 8177765 generic-all +gc/logging/TestPrintReferences.java 8188245 generic-all ############################################################################# From daniel.daugherty at oracle.com Tue Oct 3 01:56:54 2017 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Mon, 2 Oct 2017 19:56:54 -0600 Subject: (XS) RFR: 8188246: Add test/hotspot/jtreg/gc/logging/TestPrintReferences.java to ProblemList.txt In-Reply-To: <86cbbd16-f353-5214-7a4c-aca3e74afbab@oracle.com> References: <86cbbd16-f353-5214-7a4c-aca3e74afbab@oracle.com> Message-ID: On 10/2/17 7:33 PM, David Holmes wrote: > The test fails intermittently in tier1 testing so we need to exclude > it until fixed. patch inline below. > > webrev: http://cr.openjdk.java.net/~dholmes/8188246/webrev/ Thumbs up! Dan > > Will push under trivial rules as soon as I have one Review. > > Thanks, > David > > --- old/test/hotspot/jtreg/ProblemList.txt??? 2017-10-02 > 21:26:20.127717945 -0400 > +++ new/test/hotspot/jtreg/ProblemList.txt??? 2017-10-02 > 21:26:18.043599357 -0400 > @@ -64,6 +64,7 @@ > ?gc/g1/humongousObjects/TestHeapCounters.java 8178918 generic-all > ?gc/stress/gclocker/TestGCLockerWithG1.java 8179226 generic-all > > gc/survivorAlignment/TestPromotionFromSurvivorToTenuredAfterMinorGC.java > 8177765 generic-all > +gc/logging/TestPrintReferences.java 8188245 generic-all > > > ############################################################################# > From serguei.spitsyn at oracle.com Tue Oct 3 01:58:42 2017 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 2 Oct 2017 18:58:42 -0700 Subject: (XS) RFR: 8188246: Add test/hotspot/jtreg/gc/logging/TestPrintReferences.java to ProblemList.txt In-Reply-To: References: <86cbbd16-f353-5214-7a4c-aca3e74afbab@oracle.com> Message-ID: +1 Thanks, Serguei On 10/2/17 18:56, Daniel D. 
Daugherty wrote: > On 10/2/17 7:33 PM, David Holmes wrote: >> The test fails intermittently in tier1 testing so we need to exclude >> it until fixed. patch inline below. >> >> webrev: http://cr.openjdk.java.net/~dholmes/8188246/webrev/ > > Thumbs up! > > Dan > > >> >> Will push under trivial rules as soon as I have one Review. >> >> Thanks, >> David >> >> --- old/test/hotspot/jtreg/ProblemList.txt??? 2017-10-02 >> 21:26:20.127717945 -0400 >> +++ new/test/hotspot/jtreg/ProblemList.txt??? 2017-10-02 >> 21:26:18.043599357 -0400 >> @@ -64,6 +64,7 @@ >> ?gc/g1/humongousObjects/TestHeapCounters.java 8178918 generic-all >> ?gc/stress/gclocker/TestGCLockerWithG1.java 8179226 generic-all >> >> gc/survivorAlignment/TestPromotionFromSurvivorToTenuredAfterMinorGC.java >> 8177765 generic-all >> +gc/logging/TestPrintReferences.java 8188245 generic-all >> >> >> ############################################################################# >> > From david.holmes at oracle.com Tue Oct 3 02:00:04 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 3 Oct 2017 12:00:04 +1000 Subject: (XS) RFR: 8188246: Add test/hotspot/jtreg/gc/logging/TestPrintReferences.java to ProblemList.txt In-Reply-To: References: <86cbbd16-f353-5214-7a4c-aca3e74afbab@oracle.com> Message-ID: <5d9bde33-4e41-cbfe-a2c2-9f024610f222@oracle.com> Thanks Dan and Serguei! Sorry I already committed before Serguei's email came through. David On 3/10/2017 11:58 AM, serguei.spitsyn at oracle.com wrote: > +1 > > Thanks, > Serguei > > On 10/2/17 18:56, Daniel D. Daugherty wrote: >> On 10/2/17 7:33 PM, David Holmes wrote: >>> The test fails intermittently in tier1 testing so we need to exclude >>> it until fixed. patch inline below. >>> >>> webrev: http://cr.openjdk.java.net/~dholmes/8188246/webrev/ >> >> Thumbs up! >> >> Dan >> >> >>> >>> Will push under trivial rules as soon as I have one Review. >>> >>> Thanks, >>> David >>> >>> --- old/test/hotspot/jtreg/ProblemList.txt??? 2017-10-02 >>> 21:26:20.127717945 -0400 >>> +++ new/test/hotspot/jtreg/ProblemList.txt??? 2017-10-02 >>> 21:26:18.043599357 -0400 >>> @@ -64,6 +64,7 @@ >>> ?gc/g1/humongousObjects/TestHeapCounters.java 8178918 generic-all >>> ?gc/stress/gclocker/TestGCLockerWithG1.java 8179226 generic-all >>> >>> gc/survivorAlignment/TestPromotionFromSurvivorToTenuredAfterMinorGC.java >>> 8177765 generic-all >>> +gc/logging/TestPrintReferences.java 8188245 generic-all >>> >>> >>> ############################################################################# >>> >> > From serguei.spitsyn at oracle.com Tue Oct 3 02:10:55 2017 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Mon, 2 Oct 2017 19:10:55 -0700 Subject: (XS) RFR: 8188246: Add test/hotspot/jtreg/gc/logging/TestPrintReferences.java to ProblemList.txt In-Reply-To: <5d9bde33-4e41-cbfe-a2c2-9f024610f222@oracle.com> References: <86cbbd16-f353-5214-7a4c-aca3e74afbab@oracle.com> <5d9bde33-4e41-cbfe-a2c2-9f024610f222@oracle.com> Message-ID: <995cf43d-b3f7-b89f-ecfd-977591c2905a@oracle.com> On 10/2/17 19:00, David Holmes wrote: > Thanks Dan and Serguei! > > Sorry I already committed before Serguei's email came through. No problem. :) Thanks, Serguei > > David > > On 3/10/2017 11:58 AM, serguei.spitsyn at oracle.com wrote: >> +1 >> >> Thanks, >> Serguei >> >> On 10/2/17 18:56, Daniel D. Daugherty wrote: >>> On 10/2/17 7:33 PM, David Holmes wrote: >>>> The test fails intermittently in tier1 testing so we need to >>>> exclude it until fixed. patch inline below. 
>>>> >>>> webrev: http://cr.openjdk.java.net/~dholmes/8188246/webrev/ >>> >>> Thumbs up! >>> >>> Dan >>> >>> >>>> >>>> Will push under trivial rules as soon as I have one Review. >>>> >>>> Thanks, >>>> David >>>> >>>> --- old/test/hotspot/jtreg/ProblemList.txt??? 2017-10-02 >>>> 21:26:20.127717945 -0400 >>>> +++ new/test/hotspot/jtreg/ProblemList.txt??? 2017-10-02 >>>> 21:26:18.043599357 -0400 >>>> @@ -64,6 +64,7 @@ >>>> ?gc/g1/humongousObjects/TestHeapCounters.java 8178918 generic-all >>>> ?gc/stress/gclocker/TestGCLockerWithG1.java 8179226 generic-all >>>> >>>> gc/survivorAlignment/TestPromotionFromSurvivorToTenuredAfterMinorGC.java >>>> 8177765 generic-all >>>> +gc/logging/TestPrintReferences.java 8188245 generic-all >>>> >>>> >>>> ############################################################################# >>>> >>> >> From john.r.rose at oracle.com Tue Oct 3 06:54:24 2017 From: john.r.rose at oracle.com (John Rose) Date: Mon, 2 Oct 2017 23:54:24 -0700 Subject: CFV: New hotspot Group Member: Ioi Lam In-Reply-To: References: Message-ID: <83B6E0A3-7730-431D-B0A2-AF36A6E7C5CA@oracle.com> Vote: yes From robbin.ehn at oracle.com Tue Oct 3 08:00:31 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 3 Oct 2017 10:00:31 +0200 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> Message-ID: <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> Hi David, On 10/03/2017 12:46 AM, David Holmes wrote: > Hi Robbin, > > I have some views on this :) > > On 3/10/2017 6:20 AM, Robbin Ehn wrote: >> Hi Bob, >> >> As I said in your presentation for RT. >> If kernel if configured with cgroup this should always be read (otherwise we get wrong values). >> E.g. fedora have had cgroups default on several years (I believe most distros have it on). >> >> - No option is needed at all: right now we have wrong values your fix will provide right ones, why would you ever what to turn that off? > > It's not that you would want to turn that off (necessarily) but just because cgroups capability exists it doesn't mean they have actually been enabled and configured - in > which case reading all the cgroup info is unnecessary startup overhead. So for now this is opt-in - as was the experimental cgroup support we added. Once it becomes clearer > how this needs to be used we can adjust the defaults. For now this is enabling technology only. If cgroup are mounted they are on and the only way to know the configuration (such as no limits) is to actual read the cgroup filesystem. Therefore the flag make no sense. > >> - log target container would make little sense since almost all linuxes run with croups on. > > Again the capability is present but may not be enabled/configured. The capability is on if cgroup are mount and the only way to know the configuration is to read the cgroup filesystem. > >> - For cpuset, the processes affinity mask already reflect cgroup setting so you don't need to look into cgroup for that >> ?? If you do, you would miss any processes specific affinity mask. So _cpu_count() should already be returning the right number of CPU's. > > While the process affinity mask reflect cpusets (and we already use it for that reason), it doesn't reflect shares and quotas. 
And if shares/quotas are enforced and someone > sets a custom affinity mask, what is it all supposed to mean? That's one of the main reasons to allow the number of cpu's to be hardwired via a flag. So it's better IMHO to > read everything from the cgroups if configured to use cgroups. I'm not taking about shares and quotes, they should be read of course, but cpuset should be checked such as in _cpu_count. Here is the bug: [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp . ForEver | grep proc [0.002s][debug][os] Initial active processor count set to 4 ^C [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc [0.003s][debug][os] Initial active processor count set to 32 ^C _cpu_count already does the right thing. Thanks, Robbin > > Cheers, > David > >> >> Thanks for trying to fixing this! >> >> /Robbin >> >> On 09/22/2017 04:27 PM, Bob Vandette wrote: >>> Please review these changes that improve on docker container detection and the >>> automatic configuration of the number of active CPUs and total and free memory >>> based on the containers resource limitation settings and metric data files. >>> >>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ >>> >>> These changes are enabled with -XX:+UseContainerSupport. >>> >>> You can enable logging for this support via -Xlog:os+container=trace. >>> >>> Since the dynamic selection of CPUs based on cpusets, quotas and shares >>> may not satisfy every users needs, I?ve added an additional flag to allow the >>> number of CPUs to be overridden.? This flag is named -XX:ActiveProcessorCount=xx. >>> >>> >>> Bob. >>> >>> >>> From david.holmes at oracle.com Tue Oct 3 08:42:43 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 3 Oct 2017 18:42:43 +1000 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> Message-ID: <640fdf30-fc85-112f-ad11-b99cc071053e@oracle.com> On 3/10/2017 6:00 PM, Robbin Ehn wrote: > Hi David, > > On 10/03/2017 12:46 AM, David Holmes wrote: >> Hi Robbin, >> >> I have some views on this :) >> >> On 3/10/2017 6:20 AM, Robbin Ehn wrote: >>> Hi Bob, >>> >>> As I said in your presentation for RT. >>> If kernel if configured with cgroup this should always be read >>> (otherwise we get wrong values). >>> E.g. fedora have had cgroups default on several years (I believe most >>> distros have it on). >>> >>> - No option is needed at all: right now we have wrong values your fix >>> will provide right ones, why would you ever what to turn that off? >> >> It's not that you would want to turn that off (necessarily) but just >> because cgroups capability exists it doesn't mean they have actually >> been enabled and configured - in which case reading all the cgroup >> info is unnecessary startup overhead. So for now this is opt-in - as >> was the experimental cgroup support we added. Once it becomes clearer >> how this needs to be used we can adjust the defaults. For now this is >> enabling technology only. > > If cgroup are mounted they are on and the only way to know the > configuration (such as no limits) is to actual read the cgroup filesystem. > Therefore the flag make no sense. No that is exactly why it is opt-in! 
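For reference, the _cpu_count() behaviour Robbin demonstrates with taskset above comes down to asking the kernel for the process affinity mask, which already reflects any cpuset/taskset placement (but not cgroup quota or shares). A minimal standalone sketch of that query - illustration only, not the HotSpot code:

  // cpu_count.cpp - standalone illustration only, not HotSpot code.
  // Counts the CPUs this process may run on via its affinity mask,
  // which already reflects taskset/cpuset placement (but not cgroup
  // cpu quota or shares).  Build: g++ cpu_count.cpp -o cpu_count
  #include <sched.h>   // sched_getaffinity, CPU_ZERO, CPU_COUNT (g++ defines _GNU_SOURCE)
  #include <cstdio>

  int main() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
      perror("sched_getaffinity");
      return 1;
    }
    // e.g. prints 4 when run under "taskset --cpu-list 0-2,6"
    printf("permitted CPUs: %d\n", CPU_COUNT(&mask));
    return 0;
  }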
Why should we have to waste startup time reading a bunch of cgroup values just to determine that cgroups are not actually being used! >> >>> - log target container would make little sense since almost all >>> linuxes run with croups on. >> >> Again the capability is present but may not be enabled/configured. > > The capability is on if cgroup are mount and the only way to know the > configuration is to read the cgroup filesystem. > >> >>> - For cpuset, the processes affinity mask already reflect cgroup >>> setting so you don't need to look into cgroup for that >>> ?? If you do, you would miss any processes specific affinity mask. So >>> _cpu_count() should already be returning the right number of CPU's. >> >> While the process affinity mask reflect cpusets (and we already use it >> for that reason), it doesn't reflect shares and quotas. And if >> shares/quotas are enforced and someone sets a custom affinity mask, >> what is it all supposed to mean? That's one of the main reasons to >> allow the number of cpu's to be hardwired via a flag. So it's better >> IMHO to read everything from the cgroups if configured to use cgroups. > > I'm not taking about shares and quotes, they should be read of course, > but cpuset should be checked such as in _cpu_count. > > Here is the bug: > > [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp . > ForEver | grep proc > [0.002s][debug][os] Initial active processor count set to 4 > ^C > [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java > -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc > [0.003s][debug][os] Initial active processor count set to 32 > ^C > > _cpu_count already does the right thing. But how do you then combine that information with the use of shares and/or quotas? David ----- > Thanks, Robbin > > >> >> Cheers, >> David >> >>> >>> Thanks for trying to fixing this! >>> >>> /Robbin >>> >>> On 09/22/2017 04:27 PM, Bob Vandette wrote: >>>> Please review these changes that improve on docker container >>>> detection and the >>>> automatic configuration of the number of active CPUs and total and >>>> free memory >>>> based on the containers resource limitation settings and metric data >>>> files. >>>> >>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ >>>> >>>> >>>> These changes are enabled with -XX:+UseContainerSupport. >>>> >>>> You can enable logging for this support via -Xlog:os+container=trace. >>>> >>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares >>>> may not satisfy every users needs, I?ve added an additional flag to >>>> allow the >>>> number of CPUs to be overridden.? This flag is named >>>> -XX:ActiveProcessorCount=xx. >>>> >>>> >>>> Bob. >>>> >>>> >>>> From robbin.ehn at oracle.com Tue Oct 3 10:45:10 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 3 Oct 2017 12:45:10 +0200 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <640fdf30-fc85-112f-ad11-b99cc071053e@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <640fdf30-fc85-112f-ad11-b99cc071053e@oracle.com> Message-ID: <2df87576-cd2f-6d1d-4367-8a2956b88fea@oracle.com> Hi David, I think we are seen the issue from complete opposite. 
(this RFE could be pushed as a bug from my POV) On 10/03/2017 10:42 AM, David Holmes wrote: > On 3/10/2017 6:00 PM, Robbin Ehn wrote: >> Hi David, >> >> On 10/03/2017 12:46 AM, David Holmes wrote: >>> Hi Robbin, >>> >>> I have some views on this :) >>> >>> On 3/10/2017 6:20 AM, Robbin Ehn wrote: >>>> Hi Bob, >>>> >>>> As I said in your presentation for RT. >>>> If kernel if configured with cgroup this should always be read (otherwise we get wrong values). >>>> E.g. fedora have had cgroups default on several years (I believe most distros have it on). >>>> >>>> - No option is needed at all: right now we have wrong values your fix will provide right ones, why would you ever what to turn that off? >>> >>> It's not that you would want to turn that off (necessarily) but just because cgroups capability exists it doesn't mean they have actually been enabled and configured - >>> in which case reading all the cgroup info is unnecessary startup overhead. So for now this is opt-in - as was the experimental cgroup support we added. Once it becomes >>> clearer how this needs to be used we can adjust the defaults. For now this is enabling technology only. >> >> If cgroup are mounted they are on and the only way to know the configuration (such as no limits) is to actual read the cgroup filesystem. >> Therefore the flag make no sense. > > No that is exactly why it is opt-in! Why should we have to waste startup time reading a bunch of cgroup values just to determine that cgroups are not actually being used! If you have a cgroup enabled kernel they _are_ being used, no escaping that. cgroup is not a simple yes and no so for which resources depend on how you configured your kernel. To find out for what resource and what limits are set is we need to read them. I rather waste startup time (0.103292989 vs 0.103577139 seconds) and get values correct, so our heuristic works fine out-of-the-box. (and if you must, it opt-out) Also I notice that we don't read the numa values so the phys mem method does a poor job. Correct would be check at least cgroup and numa bindings. We also have this option UseCGroupMemoryLimitForHeap which should be removed. > >>> >>>> - log target container would make little sense since almost all linuxes run with croups on. >>> >>> Again the capability is present but may not be enabled/configured. >> >> The capability is on if cgroup are mount and the only way to know the configuration is to read the cgroup filesystem. >> >>> >>>> - For cpuset, the processes affinity mask already reflect cgroup setting so you don't need to look into cgroup for that >>>> ?? If you do, you would miss any processes specific affinity mask. So _cpu_count() should already be returning the right number of CPU's. >>> >>> While the process affinity mask reflect cpusets (and we already use it for that reason), it doesn't reflect shares and quotas. And if shares/quotas are enforced and >>> someone sets a custom affinity mask, what is it all supposed to mean? That's one of the main reasons to allow the number of cpu's to be hardwired via a flag. So it's >>> better IMHO to read everything from the cgroups if configured to use cgroups. >> >> I'm not taking about shares and quotes, they should be read of course, but cpuset should be checked such as in _cpu_count. >> >> Here is the bug: >> >> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp . 
ForEver | grep proc >> [0.002s][debug][os] Initial active processor count set to 4 >> ^C >> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc >> [0.003s][debug][os] Initial active processor count set to 32 >> ^C >> >> _cpu_count already does the right thing. > > But how do you then combine that information with the use of shares and/or quotas? That I don't know, wild naive guess would be: active count ~ MIN(OSContainer::pd_active_processor_count(), cpuset); :) I assume everything we need to know is in: https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt Thanks, Robbin > > David > ----- > >> Thanks, Robbin >> >> >>> >>> Cheers, >>> David >>> >>>> >>>> Thanks for trying to fixing this! >>>> >>>> /Robbin >>>> >>>> On 09/22/2017 04:27 PM, Bob Vandette wrote: >>>>> Please review these changes that improve on docker container detection and the >>>>> automatic configuration of the number of active CPUs and total and free memory >>>>> based on the containers resource limitation settings and metric data files. >>>>> >>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ >>>>> >>>>> These changes are enabled with -XX:+UseContainerSupport. >>>>> >>>>> You can enable logging for this support via -Xlog:os+container=trace. >>>>> >>>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares >>>>> may not satisfy every users needs, I?ve added an additional flag to allow the >>>>> number of CPUs to be overridden.? This flag is named -XX:ActiveProcessorCount=xx. >>>>> >>>>> >>>>> Bob. >>>>> >>>>> >>>>> From david.holmes at oracle.com Tue Oct 3 11:00:46 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 3 Oct 2017 21:00:46 +1000 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <2df87576-cd2f-6d1d-4367-8a2956b88fea@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <640fdf30-fc85-112f-ad11-b99cc071053e@oracle.com> <2df87576-cd2f-6d1d-4367-8a2956b88fea@oracle.com> Message-ID: Hi Robbin, On 3/10/2017 8:45 PM, Robbin Ehn wrote: > Hi David, I think we are seen the issue from complete opposite. (this > RFE could be pushed as a bug from my POV) Yes we see this completely opposite. I see this is a poorly integrated add-on API that we have to try to account for instead of being able to read an "always correct" value from a standard OS API. They at least got the cpuset support correct by having sched_getaffinity correctly account for it. Alas the rest is ad-hoc. > > On 10/03/2017 10:42 AM, David Holmes wrote: >> On 3/10/2017 6:00 PM, Robbin Ehn wrote: >>> Hi David, >>> >>> On 10/03/2017 12:46 AM, David Holmes wrote: >>>> Hi Robbin, >>>> >>>> I have some views on this :) >>>> >>>> On 3/10/2017 6:20 AM, Robbin Ehn wrote: >>>>> Hi Bob, >>>>> >>>>> As I said in your presentation for RT. >>>>> If kernel if configured with cgroup this should always be read >>>>> (otherwise we get wrong values). >>>>> E.g. fedora have had cgroups default on several years (I believe >>>>> most distros have it on). >>>>> >>>>> - No option is needed at all: right now we have wrong values your >>>>> fix will provide right ones, why would you ever what to turn that off? 
>>>> >>>> It's not that you would want to turn that off (necessarily) but just >>>> because cgroups capability exists it doesn't mean they have actually >>>> been enabled and configured - in which case reading all the cgroup >>>> info is unnecessary startup overhead. So for now this is opt-in - as >>>> was the experimental cgroup support we added. Once it becomes >>>> clearer how this needs to be used we can adjust the defaults. For >>>> now this is enabling technology only. >>> >>> If cgroup are mounted they are on and the only way to know the >>> configuration (such as no limits) is to actual read the cgroup >>> filesystem. >>> Therefore the flag make no sense. >> >> No that is exactly why it is opt-in! Why should we have to waste >> startup time reading a bunch of cgroup values just to determine that >> cgroups are not actually being used! > > If you have a cgroup enabled kernel they _are_ being used, no escaping > that. A cgroup set to unlimited is not being used from a practical perspective. > cgroup is not a simple yes and no so for which resources depend on how > you configured your kernel. > To find out for what resource and what limits are set is we need to read > them. > > I rather waste startup time (0.103292989 vs 0.103577139 seconds) and get > values correct, so our heuristic works fine out-of-the-box. (and if you > must, it opt-out) I'd rather people say "Hey I'm using this add-on resource management API so don't ask the OS but please query the add-on.". Yes that is a little harsh but the lack of integration at the OS level is a huge impediment in my opinion. > Also I notice that we don't read the numa values so the phys mem method > does a poor job. Correct would be check at least cgroup and numa bindings. NUMA is another minefield. > We also have this option UseCGroupMemoryLimitForHeap which should be > removed. Bob already addressed why he was not getting rid of that initially. >> >>>> >>>>> - log target container would make little sense since almost all >>>>> linuxes run with croups on. >>>> >>>> Again the capability is present but may not be enabled/configured. >>> >>> The capability is on if cgroup are mount and the only way to know the >>> configuration is to read the cgroup filesystem. >>> >>>> >>>>> - For cpuset, the processes affinity mask already reflect cgroup >>>>> setting so you don't need to look into cgroup for that >>>>> ?? If you do, you would miss any processes specific affinity mask. >>>>> So _cpu_count() should already be returning the right number of CPU's. >>>> >>>> While the process affinity mask reflect cpusets (and we already use >>>> it for that reason), it doesn't reflect shares and quotas. And if >>>> shares/quotas are enforced and someone sets a custom affinity mask, >>>> what is it all supposed to mean? That's one of the main reasons to >>>> allow the number of cpu's to be hardwired via a flag. So it's better >>>> IMHO to read everything from the cgroups if configured to use cgroups. >>> >>> I'm not taking about shares and quotes, they should be read of >>> course, but cpuset should be checked such as in _cpu_count. >>> >>> Here is the bug: >>> >>> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp >>> . ForEver | grep proc >>> [0.002s][debug][os] Initial active processor count set to 4 >>> ^C >>> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java >>> -XX:+UseContainerSupport -Xlog:os=debug -cp . 
ForEver | grep proc >>> [0.003s][debug][os] Initial active processor count set to 32 >>> ^C >>> >>> _cpu_count already does the right thing. >> >> But how do you then combine that information with the use of shares >> and/or quotas? > > That I don't know, wild naive guess would be: > active count ~ MIN(OSContainer::pd_active_processor_count(), cpuset); :) That would be one option but it may not be meaningful. That said I don't think the use of quota or shares to define the number of available CPUs makes sense anyway. Personally I don't think mixing direct use of cpusets with cgroup defined limits makes much sense. > I assume everything we need to know is in: > https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt Nope that only addresses cpusets. The one part of this that at least makes sense in isolation. Cheers, David > Thanks, Robbin > >> >> David >> ----- >> >>> Thanks, Robbin >>> >>> >>>> >>>> Cheers, >>>> David >>>> >>>>> >>>>> Thanks for trying to fixing this! >>>>> >>>>> /Robbin >>>>> >>>>> On 09/22/2017 04:27 PM, Bob Vandette wrote: >>>>>> Please review these changes that improve on docker container >>>>>> detection and the >>>>>> automatic configuration of the number of active CPUs and total and >>>>>> free memory >>>>>> based on the containers resource limitation settings and metric >>>>>> data files. >>>>>> >>>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ >>>>>> >>>>>> >>>>>> These changes are enabled with -XX:+UseContainerSupport. >>>>>> >>>>>> You can enable logging for this support via -Xlog:os+container=trace. >>>>>> >>>>>> Since the dynamic selection of CPUs based on cpusets, quotas and >>>>>> shares >>>>>> may not satisfy every users needs, I?ve added an additional flag >>>>>> to allow the >>>>>> number of CPUs to be overridden.? This flag is named >>>>>> -XX:ActiveProcessorCount=xx. >>>>>> >>>>>> >>>>>> Bob. >>>>>> >>>>>> >>>>>> From vladimir.x.ivanov at oracle.com Tue Oct 3 11:54:12 2017 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 3 Oct 2017 14:54:12 +0300 Subject: Questions about ... Lambda Form Compilation In-Reply-To: References: Message-ID: <57d5cf51-111f-d34a-e161-02df724b6577@oracle.com> Hi, > 2. For the same cluster, we also see over half of machines repeatedly > experiencing full GC due to Metaspace full. We dump JSTACK for every minute > during 30 minutes, and see many threads are trying to compile the exact > same lambda form throughout the 30-minute period. > > Here is an example stacktrace on one machine. The LambdaForm triggers the > compilation on that machine is always LambdaForm$MH/170067652. Once it's > compiled, it should use the new compiled lambda form. We don't know why > it's still trying to compile the same lambda form again and again. -- Would > it be because the compiled lambda form somehow failed to load? This might > relate to the negative number of loaded classes. What you are seeing here is LambdaForm customization (8069591 [1]). Customization creates a new LambdaForm instance specialized for a particular MethodHandle instance (no LF sharing possible). It was designed to alleviate performance penalty when inlining through a MH invoker doesn't happen and enables JIT-compilers to compile the whole method handle chain into a single nmethod. Without customization a method handle chain breaks up into a chain of small nmethods (1 nmethod per LambdaForm) and calls between them start dominate the execution time. (More details are available in [2].) 
Customization takes place once a method handle has been invoked through MH.invoke/invokeExact() more than 127 times. Considering you observe continuous customization, it means there are method handles being continuously instantiated and used which share the same lambda form (LambdaForm$MH/170067652). It leads to excessive generation of VM anonymous classes and creates memory pressure in Metaspace. As a workaround, you can try to disable LF customization (java.lang.invoke.MethodHandle.CUSTOMIZE_THRESHOLD=-1). But I'd suggest to look into why the application continuously creates method handles. As you noted, it doesn't play well with existing heuristics aimed at maximum throughput which assume the application behavior "stabilizes" over time. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8069591 [2] http://cr.openjdk.java.net/~vlivanov/talks/2015-JVMLS_State_of_JLI.pdf slides #45-#50 > "20170926_232912_39740_3vuuu.1.79-4-76640" #76640 prio=5 os_prio=0 > tid=0x00007f908006dbd0 nid=0x150a6 runnable [0x00007f8bddb1b000] > java.lang.Thread.State: RUNNABLE > at sun.misc.Unsafe.defineAnonymousClass(Native Method) > at java.lang.invoke.InvokerBytecodeGenerator. > loadAndInitializeInvokerClass(InvokerBytecodeGenerator.java:284) > at java.lang.invoke.InvokerBytecodeGenerator.loadMethod( > InvokerBytecodeGenerator.java:276) > at java.lang.invoke.InvokerBytecodeGenerator. > generateCustomizedCode(InvokerBytecodeGenerator.java:618) > at java.lang.invoke.LambdaForm.compileToBytecode(LambdaForm. > java:654) > at java.lang.invoke.LambdaForm.prepare(LambdaForm.java:635) > at java.lang.invoke.MethodHandle.updateForm(MethodHandle.java: > 1432) > at java.lang.invoke.MethodHandle.customize(MethodHandle.java: > 1442) > at java.lang.invoke.Invokers.maybeCustomize(Invokers.java:407) > at java.lang.invoke.Invokers.checkCustomized(Invokers.java:398) > at java.lang.invoke.LambdaForm$MH/170067652.invokeExact_MT( > LambdaForm$MH) > at com.facebook.presto.operator.aggregation.MinMaxHelper. > combineStateWithState(MinMaxHelper.java:141) > at com.facebook.presto.operator.aggregation. > MaxAggregationFunction.combine(MaxAggregationFunction.java:108) > at java.lang.invoke.LambdaForm$DMH/1607453282.invokeStatic_ > L3_V(LambdaForm$DMH) > at java.lang.invoke.LambdaForm$BMH/1118134445.reinvoke( > LambdaForm$BMH) > at java.lang.invoke.LambdaForm$MH/1971758264. > linkToTargetMethod(LambdaForm$MH) > at com.facebook.presto.$gen.IntegerIntegerMaxGroupedAccumu > lator_3439.addIntermediate(Unknown Source) > at com.facebook.presto.operator.aggregation.builder. > InMemoryHashAggregationBuilder$Aggregator.processPage( > InMemoryHashAggregationBuilder.java:367) > at com.facebook.presto.operator.aggregation.builder. > InMemoryHashAggregationBuilder.processPage(InMemoryHashAggregationBuilder > .java:138) > at com.facebook.presto.operator.HashAggregationOperator. > addInput(HashAggregationOperator.java:400) > at com.facebook.presto.operator.Driver.processInternal(Driver. > java:343) > at com.facebook.presto.operator.Driver.lambda$processFor$6( > Driver.java:241) > at com.facebook.presto.operator.Driver$$Lambda$765/442308692.get(Unknown > Source) > at com.facebook.presto.operator.Driver.tryWithLock(Driver. > java:614) > at com.facebook.presto.operator.Driver.processFor(Driver.java: > 235) > at com.facebook.presto.execution.SqlTaskExecution$ > DriverSplitRunner.processFor(SqlTaskExecution.java:622) > at com.facebook.presto.execution.executor. 
> PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163) > at com.facebook.presto.execution.executor.TaskExecutor$ > TaskRunner.run(TaskExecutor.java:485) > at java.util.concurrent.ThreadPoolExecutor.runWorker( > ThreadPoolExecutor.java:1142) > at java.util.concurrent.ThreadPoolExecutor$Worker.run( > ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > ... > > > > Both issues go away after we restart the JVM, and the same query won't > trigger the LambdaForm compilation issue, so it looks like the JVM enters > some weird state. We are wondering if there is any thoughts on what could > trigger these issues? Or is there any suggestions about how to further > investigate it next time we see the VM in this state? > > Thank you. > > From vladimir.x.ivanov at oracle.com Tue Oct 3 12:10:17 2017 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 3 Oct 2017 15:10:17 +0300 Subject: Questions about negative loaded classes ... In-Reply-To: References: Message-ID: <49bb66f9-49a7-3183-4410-b15176033e02@oracle.com> > 1. On more than half of the machines (200 out of 400 machines), we see he > JMX counter report negative LoadedClassCount, see attached jmxcounter.png. > > After some further dig, we note UnloadedClassCount is larger than > TotalLoadedClassCount. And LoadedClassCount (-695,710) = > TotalLoadedClassCount - UnloadedClassCount . PerfCounter reports the same > number, here is the result on the same machine: > > $ jcmd 307 PerfCounter.print | grep -i class | grep -i java.cls > java.cls.loadedClasses=192004392 > java.cls.sharedLoadedClasses=0 > java.cls.sharedUnloadedClasses=0 > java.cls.unloadedClasses=192700102 JVM performance counters aren't exact (e.g., updates aren't atomic [1]), so I wouldn't be surprised to see loadedClasses & unloadedClasses diverging during concurrent class loading. Best regards, Vladimir Ivanov [1] http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/file/92693f9dd704/src/share/vm/runtime/perfData.hpp#l425 From robbin.ehn at oracle.com Tue Oct 3 12:19:36 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 3 Oct 2017 14:19:36 +0200 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <640fdf30-fc85-112f-ad11-b99cc071053e@oracle.com> <2df87576-cd2f-6d1d-4367-8a2956b88fea@oracle.com> Message-ID: <8439163a-3a94-6804-0e27-ce384be821cf@oracle.com> Hi, I'll leave that discussion for a while, another thing is: In os::Linux::available_memory(), OSContainer::memory_limit_in_bytes() the limit can be larger than actual ram. So we also need to check sysinfo e.g. return MIN(avail_mem, si.freeram * si.mem_unit). So I think the check against "if (XXX == 9223372036854771712)" is not needed at all for any of those methods. Just return what cgroup says if that is larger then the actual value pick the lower one. /Robbin On 10/03/2017 01:00 PM, David Holmes wrote: > Hi Robbin, > > On 3/10/2017 8:45 PM, Robbin Ehn wrote: >> Hi David, I think we are seen the issue from complete opposite. (this RFE could be pushed as a bug from my POV) > > Yes we see this completely opposite. I see this is a poorly integrated add-on API that we have to try to account for instead of being able to read an "always correct" value > from a standard OS API. 
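On Robbin's available_memory() point a little further up: the suggestion is to treat whatever the cgroup reports purely as an upper bound and clamp it against sysinfo, which also makes the special-case check for the huge "unlimited" sentinel unnecessary. A standalone sketch of that idea (cgroup v1 file layout assumed, not the webrev code):

  // mem_clamp.cpp - illustration only: clamp a cgroup v1 memory limit
  // against the machine's real memory, so an "unlimited" limit
  // (a huge sentinel such as 9223372036854771712) needs no special case.
  #include <sys/sysinfo.h>
  #include <fstream>
  #include <cstdint>
  #include <cstdio>
  #include <algorithm>

  int main() {
    uint64_t limit = UINT64_MAX;
    std::ifstream f("/sys/fs/cgroup/memory/memory.limit_in_bytes");
    if (f) f >> limit;                         // may be a huge "no limit" value

    struct sysinfo si;
    if (sysinfo(&si) != 0) return 1;
    uint64_t phys     = (uint64_t)si.totalram * si.mem_unit;
    uint64_t free_ram = (uint64_t)si.freeram  * si.mem_unit;

    // Whatever cgroup says, never report more than the host actually has.
    printf("memory limit : %llu\n", (unsigned long long)std::min(limit, phys));
    printf("available    : %llu\n", (unsigned long long)std::min(limit, free_ram));
    return 0;
  }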
They at least got the cpuset support correct by having sched_getaffinity correctly account for it. Alas the rest is ad-hoc. > >> >> On 10/03/2017 10:42 AM, David Holmes wrote: >>> On 3/10/2017 6:00 PM, Robbin Ehn wrote: >>>> Hi David, >>>> >>>> On 10/03/2017 12:46 AM, David Holmes wrote: >>>>> Hi Robbin, >>>>> >>>>> I have some views on this :) >>>>> >>>>> On 3/10/2017 6:20 AM, Robbin Ehn wrote: >>>>>> Hi Bob, >>>>>> >>>>>> As I said in your presentation for RT. >>>>>> If kernel if configured with cgroup this should always be read (otherwise we get wrong values). >>>>>> E.g. fedora have had cgroups default on several years (I believe most distros have it on). >>>>>> >>>>>> - No option is needed at all: right now we have wrong values your fix will provide right ones, why would you ever what to turn that off? >>>>> >>>>> It's not that you would want to turn that off (necessarily) but just because cgroups capability exists it doesn't mean they have actually been enabled and configured - >>>>> in which case reading all the cgroup info is unnecessary startup overhead. So for now this is opt-in - as was the experimental cgroup support we added. Once it becomes >>>>> clearer how this needs to be used we can adjust the defaults. For now this is enabling technology only. >>>> >>>> If cgroup are mounted they are on and the only way to know the configuration (such as no limits) is to actual read the cgroup filesystem. >>>> Therefore the flag make no sense. >>> >>> No that is exactly why it is opt-in! Why should we have to waste startup time reading a bunch of cgroup values just to determine that cgroups are not actually being used! >> >> If you have a cgroup enabled kernel they _are_ being used, no escaping that. > > A cgroup set to unlimited is not being used from a practical perspective. > >> cgroup is not a simple yes and no so for which resources depend on how you configured your kernel. >> To find out for what resource and what limits are set is we need to read them. >> >> I rather waste startup time (0.103292989 vs 0.103577139 seconds) and get values correct, so our heuristic works fine out-of-the-box. (and if you must, it opt-out) > > I'd rather people say "Hey I'm using this add-on resource management API so don't ask the OS but please query the add-on.". Yes that is a little harsh but the lack of > integration at the OS level is a huge impediment in my opinion. > >> Also I notice that we don't read the numa values so the phys mem method does a poor job. Correct would be check at least cgroup and numa bindings. > > NUMA is another minefield. > >> We also have this option UseCGroupMemoryLimitForHeap which should be removed. > > Bob already addressed why he was not getting rid of that initially. > >>> >>>>> >>>>>> - log target container would make little sense since almost all linuxes run with croups on. >>>>> >>>>> Again the capability is present but may not be enabled/configured. >>>> >>>> The capability is on if cgroup are mount and the only way to know the configuration is to read the cgroup filesystem. >>>> >>>>> >>>>>> - For cpuset, the processes affinity mask already reflect cgroup setting so you don't need to look into cgroup for that >>>>>> ?? If you do, you would miss any processes specific affinity mask. So _cpu_count() should already be returning the right number of CPU's. >>>>> >>>>> While the process affinity mask reflect cpusets (and we already use it for that reason), it doesn't reflect shares and quotas. 
And if shares/quotas are enforced and >>>>> someone sets a custom affinity mask, what is it all supposed to mean? That's one of the main reasons to allow the number of cpu's to be hardwired via a flag. So it's >>>>> better IMHO to read everything from the cgroups if configured to use cgroups. >>>> >>>> I'm not taking about shares and quotes, they should be read of course, but cpuset should be checked such as in _cpu_count. >>>> >>>> Here is the bug: >>>> >>>> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp . ForEver | grep proc >>>> [0.002s][debug][os] Initial active processor count set to 4 >>>> ^C >>>> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc >>>> [0.003s][debug][os] Initial active processor count set to 32 >>>> ^C >>>> >>>> _cpu_count already does the right thing. >>> >>> But how do you then combine that information with the use of shares and/or quotas? >> >> That I don't know, wild naive guess would be: >> active count ~ MIN(OSContainer::pd_active_processor_count(), cpuset); :) > > That would be one option but it may not be meaningful. That said I don't think the use of quota or shares to define the number of available CPUs makes sense anyway. > > Personally I don't think mixing direct use of cpusets with cgroup defined limits makes much sense. > >> I assume everything we need to know is in: https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt > > Nope that only addresses cpusets. The one part of this that at least makes sense in isolation. > > Cheers, > David > >> Thanks, Robbin >> >>> >>> David >>> ----- >>> >>>> Thanks, Robbin >>>> >>>> >>>>> >>>>> Cheers, >>>>> David >>>>> >>>>>> >>>>>> Thanks for trying to fixing this! >>>>>> >>>>>> /Robbin >>>>>> >>>>>> On 09/22/2017 04:27 PM, Bob Vandette wrote: >>>>>>> Please review these changes that improve on docker container detection and the >>>>>>> automatic configuration of the number of active CPUs and total and free memory >>>>>>> based on the containers resource limitation settings and metric data files. >>>>>>> >>>>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ >>>>>>> >>>>>>> These changes are enabled with -XX:+UseContainerSupport. >>>>>>> >>>>>>> You can enable logging for this support via -Xlog:os+container=trace. >>>>>>> >>>>>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares >>>>>>> may not satisfy every users needs, I?ve added an additional flag to allow the >>>>>>> number of CPUs to be overridden.? This flag is named -XX:ActiveProcessorCount=xx. >>>>>>> >>>>>>> >>>>>>> Bob. >>>>>>> >>>>>>> >>>>>>> From bob.vandette at oracle.com Tue Oct 3 12:25:13 2017 From: bob.vandette at oracle.com (Bob Vandette) Date: Tue, 3 Oct 2017 08:25:13 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> Message-ID: <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> After talking to a number of folks and getting feedback, my current thinking is to enable the support by default. I still want to include the flag for at least one Java release in the event that the new behavior causes some regression in behavior. 
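To make the fallback idea concrete: a cheap way to decide whether there is anything to read at all is to look for mounted cgroup controllers before touching any limit files, and fall back to the plain os:: queries otherwise. A standalone heuristic sketch (cgroup v1 only, hypothetical helper name, not the actual webrev detection code):

  // cgroup_probe.cpp - illustration: cheap check for a mounted cgroup v1
  // controller by scanning /proc/self/mountinfo (format assumed as on Linux).
  #include <fstream>
  #include <string>
  #include <cstdio>

  static bool cgroup_mounted(const std::string& controller) {
    std::ifstream mi("/proc/self/mountinfo");
    std::string line;
    while (std::getline(mi, line)) {
      // cgroup v1 mounts show up with fstype "cgroup" after the " - "
      // separator and the controller name in the super options.
      if (line.find(" - cgroup ") != std::string::npos &&
          line.find(controller)  != std::string::npos) {
        return true;
      }
    }
    return false;
  }

  int main() {
    if (cgroup_mounted("memory") || cgroup_mounted("cpu")) {
      printf("cgroup controllers mounted - container support can engage\n");
    } else {
      printf("no cgroup mounts found - fall back to plain os:: queries\n");
    }
    return 0;
  }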
I?m trying to make the detection robust so that it will fallback to the current behavior in the event that cgroups is not configured as expected but I?d like to have a way of forcing the issue. JDK 10 is not supposed to be a long term support release which makes it a good target for this new behavior. I agree with David that once we commit to cgroups, we should extract all VM configuration data from that source. There?s more information available for cpusets than just processor affinity that we might want to consider when calculating the number of processors to assume for the VM. There?s exclusivity and effective cpu data available in addition to the cpuset string. Bob. > On Oct 3, 2017, at 4:00 AM, Robbin Ehn wrote: > > Hi David, > > On 10/03/2017 12:46 AM, David Holmes wrote: >> Hi Robbin, >> I have some views on this :) >> On 3/10/2017 6:20 AM, Robbin Ehn wrote: >>> Hi Bob, >>> >>> As I said in your presentation for RT. >>> If kernel if configured with cgroup this should always be read (otherwise we get wrong values). >>> E.g. fedora have had cgroups default on several years (I believe most distros have it on). >>> >>> - No option is needed at all: right now we have wrong values your fix will provide right ones, why would you ever what to turn that off? >> It's not that you would want to turn that off (necessarily) but just because cgroups capability exists it doesn't mean they have actually been enabled and configured - in which case reading all the cgroup info is unnecessary startup overhead. So for now this is opt-in - as was the experimental cgroup support we added. Once it becomes clearer how this needs to be used we can adjust the defaults. For now this is enabling technology only. > > If cgroup are mounted they are on and the only way to know the configuration (such as no limits) is to actual read the cgroup filesystem. > Therefore the flag make no sense. > >>> - log target container would make little sense since almost all linuxes run with croups on. >> Again the capability is present but may not be enabled/configured. > > The capability is on if cgroup are mount and the only way to know the configuration is to read the cgroup filesystem. > >>> - For cpuset, the processes affinity mask already reflect cgroup setting so you don't need to look into cgroup for that >>> If you do, you would miss any processes specific affinity mask. So _cpu_count() should already be returning the right number of CPU's. >> While the process affinity mask reflect cpusets (and we already use it for that reason), it doesn't reflect shares and quotas. And if shares/quotas are enforced and someone sets a custom affinity mask, what is it all supposed to mean? That's one of the main reasons to allow the number of cpu's to be hardwired via a flag. So it's better IMHO to read everything from the cgroups if configured to use cgroups. > > I'm not taking about shares and quotes, they should be read of course, but cpuset should be checked such as in _cpu_count. > > Here is the bug: > > [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp . ForEver | grep proc > [0.002s][debug][os] Initial active processor count set to 4 > ^C > [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc > [0.003s][debug][os] Initial active processor count set to 32 > ^C > > _cpu_count already does the right thing. > > Thanks, Robbin > > >> Cheers, >> David >>> >>> Thanks for trying to fixing this! 
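On the cpuset data Bob mentions earlier in this message: cpuset.cpus and cpuset.effective_cpus hold a list string such as "0-2,6", so turning one into a processor count is only a few lines. A hedged sketch of such a parser (illustration only, helper name made up):

  // cpuset_count.cpp - illustration: count CPUs in a cpuset list string
  // such as "0-2,6" (the format used by cpuset.cpus / cpuset.effective_cpus).
  #include <cstdio>
  #include <cstdlib>

  static int cpus_in_list(const char* s) {
    int count = 0;
    while (*s != '\0' && *s != '\n') {
      char* end = nullptr;
      long lo = strtol(s, &end, 10);
      long hi = lo;
      if (*end == '-') {                  // a range like "0-2"
        hi = strtol(end + 1, &end, 10);
      }
      count += (int)(hi - lo + 1);
      s = (*end == ',') ? end + 1 : end;  // skip the separator, if any
    }
    return count;
  }

  int main() {
    printf("%d\n", cpus_in_list("0-2,6"));   // prints 4
    return 0;
  }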
>>> >>> /Robbin >>> >>> On 09/22/2017 04:27 PM, Bob Vandette wrote: >>>> Please review these changes that improve on docker container detection and the >>>> automatic configuration of the number of active CPUs and total and free memory >>>> based on the containers resource limitation settings and metric data files. >>>> >>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ >>>> >>>> These changes are enabled with -XX:+UseContainerSupport. >>>> >>>> You can enable logging for this support via -Xlog:os+container=trace. >>>> >>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares >>>> may not satisfy every users needs, I?ve added an additional flag to allow the >>>> number of CPUs to be overridden. This flag is named -XX:ActiveProcessorCount=xx. >>>> >>>> >>>> Bob. >>>> >>>> >>>> From erik.osterlund at oracle.com Tue Oct 3 12:29:07 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Tue, 3 Oct 2017 14:29:07 +0200 Subject: RFR (M): 8188224: Generalize Atomic::load/store to use templates Message-ID: <59D38293.7030800@oracle.com> Hi, The time has come to generalize Atomic::load/store with templates - the last operation to generalize in Atomic. The design was inspired by Atomic::xchg and uses a similar mechanism to validate the passed in arguments. It was also designed with coming OrderAccess changes in mind. OrderAccess also contains loads and stores that will reuse the LoadImpl and StoreImpl infrastructure in Atomic::load/store. (the type checking for what is okay to pass in to Atomic::load/store is very much the same for OrderAccess::load_acquire/*store*). One thing worth mentioning is that the bsd zero port (but notably not the linux zero port) had a leading fence for atomic stores of jint when #if !defined(ARM) && !defined(M68K) is true without any comment describing why. So I took the liberty of removing it. Atomic should not have any fencing at all - that is what OrderAccess is for. In fact Atomic does not promise any memory ordering semantics for loads and stores. Atomic merely provides relaxed accesses that are atomic. Worth mentioning nevertheless in case anyone wants to keep that jint Atomic::store fence on bsd zero !M68K && !ARM. Bug: https://bugs.openjdk.java.net/browse/JDK-8188224 Webrev: http://cr.openjdk.java.net/~eosterlund/8188224/webrev.00/ Testing: JPRT, mach5 hs-tier3 Thanks, /Erik From robbin.ehn at oracle.com Tue Oct 3 12:39:38 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 3 Oct 2017 14:39:38 +0200 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> Message-ID: On 10/03/2017 02:25 PM, Bob Vandette wrote: > After talking to a number of folks and getting feedback, my current thinking is to enable the support by default. Great. > > I still want to include the flag for at least one Java release in the event that the new behavior causes some regression > in behavior. I?m trying to make the detection robust so that it will fallback to the current behavior in the event > that cgroups is not configured as expected but I?d like to have a way of forcing the issue. 
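For readers without the webrev open, the general shape being discussed is templated relaxed accessors that validate the type and then perform a plain volatile access, with platform specializations behind them. A much-simplified sketch with made-up names - the real patch routes through LoadImpl/StoreImpl and the per-platform layers, so this is only meant to convey the idea:

  // atomic_sketch.cpp - simplified illustration only, not the actual patch.
  #include <type_traits>
  #include <cstdint>

  template <typename T>
  inline T relaxed_load(const volatile T* src) {
    static_assert(std::is_trivially_copyable<T>::value, "needs a trivial type");
    static_assert(sizeof(T) <= sizeof(void*), "wider types need a platform specialization");
    return *src;                 // plain volatile load; no ordering implied
  }

  template <typename T>
  inline void relaxed_store(T value, volatile T* dest) {
    static_assert(std::is_trivially_copyable<T>::value, "needs a trivial type");
    static_assert(sizeof(T) <= sizeof(void*), "wider types need a platform specialization");
    *dest = value;               // plain volatile store; no ordering implied
  }

  int main() {
    volatile intptr_t cell = 0;
    relaxed_store<intptr_t>((intptr_t)42, &cell);
    return relaxed_load(&cell) == 42 ? 0 : 1;
  }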
JDK 10 is not > supposed to be a long term support release which makes it a good target for this new behavior. > > I agree with David that once we commit to cgroups, we should extract all VM configuration data from that > source. There?s more information available for cpusets than just processor affinity that we might want to > consider when calculating the number of processors to assume for the VM. There?s exclusivity and > effective cpu data available in addition to the cpuset string. cgroup only contains limits, not the real hard limits. You most consider the affinity mask. We that have numa nodes do: [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java -Xlog:os=debug -cp . ForEver | grep proc [0.001s][debug][os] Initial active processor count set to 16 [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | grep proc [0.001s][debug][os] Initial active processor count set to 32 when benchmarking all the time and that must be set to 16 otherwise the flag is really bad for us. So the flag actually breaks the little numa support we have now. Thanks, Robbin > > Bob. > > >> On Oct 3, 2017, at 4:00 AM, Robbin Ehn wrote: >> >> Hi David, >> >> On 10/03/2017 12:46 AM, David Holmes wrote: >>> Hi Robbin, >>> I have some views on this :) >>> On 3/10/2017 6:20 AM, Robbin Ehn wrote: >>>> Hi Bob, >>>> >>>> As I said in your presentation for RT. >>>> If kernel if configured with cgroup this should always be read (otherwise we get wrong values). >>>> E.g. fedora have had cgroups default on several years (I believe most distros have it on). >>>> >>>> - No option is needed at all: right now we have wrong values your fix will provide right ones, why would you ever what to turn that off? >>> It's not that you would want to turn that off (necessarily) but just because cgroups capability exists it doesn't mean they have actually been enabled and configured - in which case reading all the cgroup info is unnecessary startup overhead. So for now this is opt-in - as was the experimental cgroup support we added. Once it becomes clearer how this needs to be used we can adjust the defaults. For now this is enabling technology only. >> >> If cgroup are mounted they are on and the only way to know the configuration (such as no limits) is to actual read the cgroup filesystem. >> Therefore the flag make no sense. >> >>>> - log target container would make little sense since almost all linuxes run with croups on. >>> Again the capability is present but may not be enabled/configured. >> >> The capability is on if cgroup are mount and the only way to know the configuration is to read the cgroup filesystem. >> >>>> - For cpuset, the processes affinity mask already reflect cgroup setting so you don't need to look into cgroup for that >>>> If you do, you would miss any processes specific affinity mask. So _cpu_count() should already be returning the right number of CPU's. >>> While the process affinity mask reflect cpusets (and we already use it for that reason), it doesn't reflect shares and quotas. And if shares/quotas are enforced and someone sets a custom affinity mask, what is it all supposed to mean? That's one of the main reasons to allow the number of cpu's to be hardwired via a flag. So it's better IMHO to read everything from the cgroups if configured to use cgroups. >> >> I'm not taking about shares and quotes, they should be read of course, but cpuset should be checked such as in _cpu_count. 
>> >> Here is the bug: >> >> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp . ForEver | grep proc >> [0.002s][debug][os] Initial active processor count set to 4 >> ^C >> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc >> [0.003s][debug][os] Initial active processor count set to 32 >> ^C >> >> _cpu_count already does the right thing. >> >> Thanks, Robbin >> >> >>> Cheers, >>> David >>>> >>>> Thanks for trying to fixing this! >>>> >>>> /Robbin >>>> >>>> On 09/22/2017 04:27 PM, Bob Vandette wrote: >>>>> Please review these changes that improve on docker container detection and the >>>>> automatic configuration of the number of active CPUs and total and free memory >>>>> based on the containers resource limitation settings and metric data files. >>>>> >>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ >>>>> >>>>> These changes are enabled with -XX:+UseContainerSupport. >>>>> >>>>> You can enable logging for this support via -Xlog:os+container=trace. >>>>> >>>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares >>>>> may not satisfy every users needs, I?ve added an additional flag to allow the >>>>> number of CPUs to be overridden. This flag is named -XX:ActiveProcessorCount=xx. >>>>> >>>>> >>>>> Bob. >>>>> >>>>> >>>>> > From david.holmes at oracle.com Tue Oct 3 12:44:19 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 3 Oct 2017 22:44:19 +1000 Subject: RFR (M): 8188224: Generalize Atomic::load/store to use templates In-Reply-To: <59D38293.7030800@oracle.com> References: <59D38293.7030800@oracle.com> Message-ID: <712e1c4e-b38b-11c3-4b51-d88f1560a063@oracle.com> Hi Erik, A lot of jumping through hoops just to do a direct load/store in the bulk of cases - but okay, we're embracing templates. 66 // Atomically store to a location 67 // See comment above about using jlong atomics on 32-bit platforms The comment at #67 and the equivalent one for load can be deleted. The "comment above" should only be referring to r-m-w atomic ops not basic load and store. All platforms must have a means to do atomic load/store of 64-bit due to Java volatile variables (eg by using floating-point unit on 32-bit) but may not have cmpxchg<8> capability. (I failed to convince the author of this when those comments went in. ;-) ) Cheers, David On 3/10/2017 10:29 PM, Erik ?sterlund wrote: > Hi, > > The time has come to generalize Atomic::load/store with templates - the > last operation to generalize in Atomic. > The design was inspired by Atomic::xchg and uses a similar mechanism to > validate the passed in arguments. It was also designed with coming > OrderAccess changes in mind. OrderAccess also contains loads and stores > that will reuse the LoadImpl and StoreImpl infrastructure in > Atomic::load/store. (the type checking for what is okay to pass in to > Atomic::load/store is very much the same for > OrderAccess::load_acquire/*store*). > > One thing worth mentioning is that the bsd zero port (but notably not > the linux zero port) had a leading fence for atomic stores of jint when > #if !defined(ARM) && !defined(M68K) is true without any comment > describing why. So I took the liberty of removing it. Atomic should not > have any fencing at all - that is what OrderAccess is for. In fact > Atomic does not promise any memory ordering semantics for loads and > stores. Atomic merely provides relaxed accesses that are atomic. 
Worth > mentioning nevertheless in case anyone wants to keep that jint > Atomic::store fence on bsd zero !M68K && !ARM. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8188224 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8188224/webrev.00/ > > Testing: JPRT, mach5 hs-tier3 > > Thanks, > /Erik From erik.osterlund at oracle.com Tue Oct 3 12:58:11 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Tue, 3 Oct 2017 14:58:11 +0200 Subject: RFR (M): 8188224: Generalize Atomic::load/store to use templates In-Reply-To: <712e1c4e-b38b-11c3-4b51-d88f1560a063@oracle.com> References: <59D38293.7030800@oracle.com> <712e1c4e-b38b-11c3-4b51-d88f1560a063@oracle.com> Message-ID: <59D38963.2070806@oracle.com> Hi David, Thanks for the review. The comments have been removed. New full webrev: http://cr.openjdk.java.net/~eosterlund/8188224/webrev.01/ New incremental webrev: http://cr.openjdk.java.net/~eosterlund/8188224/webrev.00_01/ Thanks, /Erik On 2017-10-03 14:44, David Holmes wrote: > Hi Erik, > > A lot of jumping through hoops just to do a direct load/store in the > bulk of cases - but okay, we're embracing templates. > > 66 // Atomically store to a location > 67 // See comment above about using jlong atomics on 32-bit platforms > > The comment at #67 and the equivalent one for load can be deleted. The > "comment above" should only be referring to r-m-w atomic ops not basic > load and store. All platforms must have a means to do atomic > load/store of 64-bit due to Java volatile variables (eg by using > floating-point unit on 32-bit) but may not have cmpxchg<8> capability. > (I failed to convince the author of this when those comments went in. > ;-) ) > > Cheers, > David > > On 3/10/2017 10:29 PM, Erik ?sterlund wrote: >> Hi, >> >> The time has come to generalize Atomic::load/store with templates - >> the last operation to generalize in Atomic. >> The design was inspired by Atomic::xchg and uses a similar mechanism >> to validate the passed in arguments. It was also designed with coming >> OrderAccess changes in mind. OrderAccess also contains loads and >> stores that will reuse the LoadImpl and StoreImpl infrastructure in >> Atomic::load/store. (the type checking for what is okay to pass in to >> Atomic::load/store is very much the same for >> OrderAccess::load_acquire/*store*). >> >> One thing worth mentioning is that the bsd zero port (but notably not >> the linux zero port) had a leading fence for atomic stores of jint >> when #if !defined(ARM) && !defined(M68K) is true without any comment >> describing why. So I took the liberty of removing it. Atomic should >> not have any fencing at all - that is what OrderAccess is for. In >> fact Atomic does not promise any memory ordering semantics for loads >> and stores. Atomic merely provides relaxed accesses that are atomic. >> Worth mentioning nevertheless in case anyone wants to keep that jint >> Atomic::store fence on bsd zero !M68K && !ARM. 
>> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8188224 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8188224/webrev.00/ >> >> Testing: JPRT, mach5 hs-tier3 >> >> Thanks, >> /Erik From coleen.phillimore at oracle.com Tue Oct 3 14:23:26 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 3 Oct 2017 10:23:26 -0400 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle In-Reply-To: <1498efad-e443-5875-cc20-b0d0c926e883@oracle.com> References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> <1498efad-e443-5875-cc20-b0d0c926e883@oracle.com> Message-ID: <7982f8eb-e4ba-8c09-f15f-e33797553141@oracle.com> Here is an updated webrev with fixes for your comments. open webrev at http://cr.openjdk.java.net/~coleenp/8186777.03/webrev Thanks for reviewing and all your help with this! Coleen On 9/29/17 6:41 AM, Stefan Karlsson wrote: > Hi Coleen, > > I started looking at this, but will need a second round before I've > fully reviewed the GC parts. > > Here are some nits that would be nice to get cleaned up. > > ========== > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/classfile/classLoaderData.cpp.frames.html > > > ?788???? record_modified_oops();? // necessary? > > This could be removed. Only G1 cares about deleted "weak" references. > > Or we can wait until Erik?'s GC Barrier Interface is in place and > remove it then. > > ---------- > > ?#ifdef CLD_DUMP_KLASSES > ?? if (Verbose) { > ???? Klass* k = _klasses; > ???? while (k != NULL) { > -????? out->print_cr("klass " PTR_FORMAT ", %s, CT: %d, MUT: %d", k, > k->name()->as_C_string(), > -????????? k->has_modified_oops(), k->has_accumulated_modified_oops()); > +????? out->print_cr("klass " PTR_FORMAT ", %s", k, > k->name()->as_C_string()); > ?????? assert(k != k->next_link(), "no loops!"); > ?????? k = k->next_link(); > ???? } > ?? } > ?#endif? // CLD_DUMP_KLASSES > > Pre-existing: I don't think this will compile if you turn on > CLD_DUMP_KLASSES. k must be p2i(k). > > ========== > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/classfile/classLoaderData.hpp.udiff.html > > > +? // Remembered sets support for the oops in the class loader data. > +? jbyte _modified_oops;???????????? // Card Table Equivalent (YC/CMS > support) > +? jbyte _accumulated_modified_oops; // Mod Union Equivalent (CMS > support) > > We should create a follow-up bug to change these jbytes to bools. > > ========== > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/g1/g1HeapVerifier.cpp.frames.html > > > Spurious addition: > +? G1CollectedHeap* _g1h; > > ========== > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/g1/g1OopClosures.hpp.udiff.html > > > Spurious addition?: > +? G1CollectedHeap* g1() { return _g1; } > > ========== > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/parallel/psScavenge.inline.hpp.patch > > > ?? PSPromotionManager* _pm; > -? // Used to redirty a scanned klass if it has oops > +? // Used to redirty a scanned cld if it has oops > ?? // pointing to the young generation after being scanned. > -? Klass*???????????? _scanned_klass; > +? ClassLoaderData*???????????? _scanned_cld; > > Indentation. > > ========== > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/parallel/psTasks.cpp.frames.html > > > ? 80???? case class_loader_data: > ? 81???? { > ? 82?????? PSScavengeCLDClosure ps(pm); > ? 83?????? 
ClassLoaderDataGraph::cld_do(&ps); > ? 84???? } > > Would you mind changing the name ps to cld_closure? > > ========== > http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/shared/genOopClosures.hpp.patch > > > +? OopsInClassLoaderDataOrGenClosure*?? _scavenge_closure; > ?? // true if the the modified oops state should be saved. > ?? bool???????????????????? _accumulate_modified_oops; > > Indentation. > > ---------- > +? void do_cld(ClassLoaderData* k); > > Rename k? > > Thanks, > StefanK > > On 2017-09-28 23:36, coleen.phillimore at oracle.com wrote: >> >> Thank you to Stefan Karlsson offlist for pointing out that the >> previous .01 version of this webrev breaks CMS in that it doesn't >> remember ClassLoaderData::_handles that are changed and added while >> concurrent marking is in progress.? I've fixed this bug to move the >> Klass::_modified_oops and _accumulated_modified_oops to the >> ClassLoaderData and use these fields in the CMS remarking phase to >> catch any new handles that are added.?? This also fixes this bug >> https://bugs.openjdk.java.net/browse/JDK-8173988 . >> >> In addition, the previous version of this change removed an >> optimization during young collection, which showed some uncertain >> performance regression in young pause times, so I added this >> optimization back to not walk ClassLoaderData during young >> collections if all the oops are old.? The performance results of >> SPECjbb2015 now are slightly better, but not significantly. >> >> This latest patch has been tested on tier1-5 on linux x64 and windows >> x64 in mach5 test harness. >> >> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/ >> >> Can I get at least 3 reviewers?? One from each of the compiler, gc, >> and runtime group at least since there are changes to all 3. >> >> Thanks! >> Coleen >> >> >> On 9/6/17 12:04 PM, coleen.phillimore at oracle.com wrote: >>> Summary: Add indirection for fetching mirror so that GC doesn't have >>> to follow CLD::_klasses >>> >>> Thank you to Tom Rodriguez for Graal changes and Rickard for the C2 >>> changes. >>> >>> Ran nightly tests through Mach5 and RBT.?? Early performance testing >>> showed good performance improvment in GC class loader data >>> processing time, but nmethod processing time continues to dominate. >>> Also performace testing showed no throughput regression.?? I'm >>> rerunning both of these performance testing and will post the numbers. >>> >>> bug link https://bugs.openjdk.java.net/browse/JDK-8186777 >>> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.01/webrev >>> >>> Thanks, >>> Coleen From dmitry.chuyko at bell-sw.com Tue Oct 3 14:24:03 2017 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Tue, 3 Oct 2017 17:24:03 +0300 Subject: [10] RFR: 8186671 - AARCH64: Use `yield` instruction in SpinPause on linux-aarch64 In-Reply-To: References: Message-ID: <70a22c6b-3716-0355-b80c-c0c2b84ec3a2@bell-sw.com> Over the past time there have been no objections,, Andrew, can you please sponsor the change? Thanks, -Dmitry On 09/27/2017 08:04 PM, Dmitry Chuyko wrote: > > Hello, > > Re-sending this to hotspot-dev on the advice of Adrew, the patch is > updated for consolidated repo. > > rfe: https://bugs.openjdk.java.net/browse/JDK-8186671 > webrev: http://cr.openjdk.java.net/~dchuyko/8186671/webrev.01/ > original thread: > http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2017-August/004870.html > > The function was moved to platform .S file and now implemented with > yield instruction. 
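For context, the functional effect of the patch is tiny; expressed as C++ with inline assembly it is roughly the following. This is a sketch only - the actual change lives in the platform .S file, and the return value below just follows the usual SpinPause convention as I understand it:

  // spin_pause_sketch.cpp - rough equivalent of the proposed aarch64
  // SpinPause: issue a 'yield' hint in the spin loop.  Illustration only.
  extern "C" int SpinPause() {
  #if defined(__aarch64__)
    asm volatile("yield");
    return 1;   // non-zero: a pause hint was issued (assumed contract)
  #else
    return 0;   // no pause hint available on this target
  #endif
  }

  int main() { return SpinPause() >= 0 ? 0 : 1; }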
> > -Dmitry > > > -------- Forwarded Message -------- > Subject: Re: [aarch64-port-dev ] RFR: 8186671: Use `yield` > instruction in SpinPause on linux-aarch64 > Date: Sat, 2 Sep 2017 09:10:00 +0100 > From: Andrew Haley > To: Dmitry Chuyko , > aarch64-port-dev at openjdk.java.net > > > > On 01/09/17 17:26, Dmitry Chuyko wrote: > > There were no objections to this part (extern). I need sponsorship to > > push the change. > > I can do it, but it really needs to be sent to hotspot-dev. > > > It would be interesting to discuss the other (intrinsic) part a bit more > > at fireside chat. > > OK, but without any actual implementations we can test it'll be a very > short discussion. > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue Oct 3 14:30:13 2017 From: aph at redhat.com (Andrew Haley) Date: Tue, 3 Oct 2017 15:30:13 +0100 Subject: [10] RFR: 8186671 - AARCH64: Use `yield` instruction in SpinPause on linux-aarch64 In-Reply-To: <70a22c6b-3716-0355-b80c-c0c2b84ec3a2@bell-sw.com> References: <70a22c6b-3716-0355-b80c-c0c2b84ec3a2@bell-sw.com> Message-ID: <6a8b007f-2b1c-c8a8-5b5e-6025ccc6dbd6@redhat.com> On 03/10/17 15:24, Dmitry Chuyko wrote: > Over the past time there have been no objections,, > > Andrew, can you please sponsor the change? No, let's discuss it on Thursday. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From bob.vandette at oracle.com Tue Oct 3 14:41:38 2017 From: bob.vandette at oracle.com (Bob Vandette) Date: Tue, 3 Oct 2017 10:41:38 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> Message-ID: <82E66654-2AF3-45EB-B996-45C7DE4191D2@oracle.com> > On Oct 3, 2017, at 8:39 AM, Robbin Ehn wrote: > > On 10/03/2017 02:25 PM, Bob Vandette wrote: >> After talking to a number of folks and getting feedback, my current thinking is to enable the support by default. > > Great. > >> I still want to include the flag for at least one Java release in the event that the new behavior causes some regression >> in behavior. I?m trying to make the detection robust so that it will fallback to the current behavior in the event >> that cgroups is not configured as expected but I?d like to have a way of forcing the issue. JDK 10 is not >> supposed to be a long term support release which makes it a good target for this new behavior. >> I agree with David that once we commit to cgroups, we should extract all VM configuration data from that >> source. There?s more information available for cpusets than just processor affinity that we might want to >> consider when calculating the number of processors to assume for the VM. There?s exclusivity and >> effective cpu data available in addition to the cpuset string. > > cgroup only contains limits, not the real hard limits. > You most consider the affinity mask. We that have numa nodes do: > > [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java -Xlog:os=debug -cp . ForEver | grep proc > [0.001s][debug][os] Initial active processor count set to 16 > [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java -Xlog:os=debug -XX:+UseContainerSupport -cp . 
ForEver | grep proc > [0.001s][debug][os] Initial active processor count set to 32 > > when benchmarking all the time and that must be set to 16 otherwise the flag is really bad for us. > So the flag actually breaks the little numa support we have now. Thanks for sharing those results. I?ll look into this. I?m hoping this is due to the fact that I am not yet examining the memory node files in the cgroup file system. Bob. > > Thanks, Robbin > >> Bob. >>> On Oct 3, 2017, at 4:00 AM, Robbin Ehn wrote: >>> >>> Hi David, >>> >>> On 10/03/2017 12:46 AM, David Holmes wrote: >>>> Hi Robbin, >>>> I have some views on this :) >>>> On 3/10/2017 6:20 AM, Robbin Ehn wrote: >>>>> Hi Bob, >>>>> >>>>> As I said in your presentation for RT. >>>>> If kernel if configured with cgroup this should always be read (otherwise we get wrong values). >>>>> E.g. fedora have had cgroups default on several years (I believe most distros have it on). >>>>> >>>>> - No option is needed at all: right now we have wrong values your fix will provide right ones, why would you ever what to turn that off? >>>> It's not that you would want to turn that off (necessarily) but just because cgroups capability exists it doesn't mean they have actually been enabled and configured - in which case reading all the cgroup info is unnecessary startup overhead. So for now this is opt-in - as was the experimental cgroup support we added. Once it becomes clearer how this needs to be used we can adjust the defaults. For now this is enabling technology only. >>> >>> If cgroup are mounted they are on and the only way to know the configuration (such as no limits) is to actual read the cgroup filesystem. >>> Therefore the flag make no sense. >>> >>>>> - log target container would make little sense since almost all linuxes run with croups on. >>>> Again the capability is present but may not be enabled/configured. >>> >>> The capability is on if cgroup are mount and the only way to know the configuration is to read the cgroup filesystem. >>> >>>>> - For cpuset, the processes affinity mask already reflect cgroup setting so you don't need to look into cgroup for that >>>>> If you do, you would miss any processes specific affinity mask. So _cpu_count() should already be returning the right number of CPU's. >>>> While the process affinity mask reflect cpusets (and we already use it for that reason), it doesn't reflect shares and quotas. And if shares/quotas are enforced and someone sets a custom affinity mask, what is it all supposed to mean? That's one of the main reasons to allow the number of cpu's to be hardwired via a flag. So it's better IMHO to read everything from the cgroups if configured to use cgroups. >>> >>> I'm not taking about shares and quotes, they should be read of course, but cpuset should be checked such as in _cpu_count. >>> >>> Here is the bug: >>> >>> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp . ForEver | grep proc >>> [0.002s][debug][os] Initial active processor count set to 4 >>> ^C >>> [rehn at rehn-ws dev]$ taskset --cpu-list 0-2,6 java -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc >>> [0.003s][debug][os] Initial active processor count set to 32 >>> ^C >>> >>> _cpu_count already does the right thing. >>> >>> Thanks, Robbin >>> >>> >>>> Cheers, >>>> David >>>>> >>>>> Thanks for trying to fixing this! 
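To make the point above concrete: the scheduler affinity mask (which taskset, numactl and cpusets all feed into) and the cgroup cpu limits are two independent caps, and a container-aware processor count arguably has to honour both and take the smaller one. The sketch below is simplified and hypothetical, not the webrev code; the quota value is passed in because deriving it from the cgroup files is out of scope here.

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE
    #endif
    #include <sched.h>
    #include <algorithm>

    // Sketch: combine the affinity mask with a cgroup-derived limit.
    // 'quota_cpus' is assumed to be computed elsewhere from
    // cpu.cfs_quota_us / cpu.cfs_period_us (or cpuset.cpus).
    static int active_processor_count_sketch(int quota_cpus) {
      cpu_set_t mask;
      if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) != 0) {
        return quota_cpus;                    // fall back if the syscall fails
      }
      int affinity_cpus = CPU_COUNT(&mask);   // e.g. 4 for --cpu-list 0-2,6
      return std::min(affinity_cpus, quota_cpus);
    }

With that shape, the 'taskset --cpu-list 0-2,6' case above would report 4 whether or not container support is enabled, and a cgroup quota smaller than the mask would still win.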
>>>>> >>>>> /Robbin >>>>> >>>>> On 09/22/2017 04:27 PM, Bob Vandette wrote: >>>>>> Please review these changes that improve on docker container detection and the >>>>>> automatic configuration of the number of active CPUs and total and free memory >>>>>> based on the containers resource limitation settings and metric data files. >>>>>> >>>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ >>>>>> >>>>>> These changes are enabled with -XX:+UseContainerSupport. >>>>>> >>>>>> You can enable logging for this support via -Xlog:os+container=trace. >>>>>> >>>>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares >>>>>> may not satisfy every users needs, I?ve added an additional flag to allow the >>>>>> number of CPUs to be overridden. This flag is named -XX:ActiveProcessorCount=xx. >>>>>> >>>>>> >>>>>> Bob. >>>>>> >>>>>> >>>>>> From aph at redhat.com Tue Oct 3 14:56:11 2017 From: aph at redhat.com (Andrew Haley) Date: Tue, 3 Oct 2017 15:56:11 +0100 Subject: RFR (M): 8188224: Generalize Atomic::load/store to use templates In-Reply-To: <712e1c4e-b38b-11c3-4b51-d88f1560a063@oracle.com> References: <59D38293.7030800@oracle.com> <712e1c4e-b38b-11c3-4b51-d88f1560a063@oracle.com> Message-ID: <599fbf96-4439-ba00-e0a2-0599f0de057f@redhat.com> On 03/10/17 13:44, David Holmes wrote: > A lot of jumping through hoops just to do a direct load/store in the > bulk of cases - but okay, we're embracing templates. That doesn't really follow: embracing templates often makes generic code simpler, with fewer hoops. That's the idea, as I understand it. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From wenlei.xie at gmail.com Tue Oct 3 18:14:40 2017 From: wenlei.xie at gmail.com (Wenlei Xie) Date: Tue, 3 Oct 2017 11:14:40 -0700 Subject: Questions about negative loaded classes and Lambda Form Compilation In-Reply-To: References: Message-ID: Hi, We are still seeing this on 1.8.0_144. Just wondering these is any idea what might cause this, or what kind of thing we can do to investigate the VM is in this state? Thank you !! BTW: I note the attachment doesn't seem to work. So here is the link to screenshot about the negative number of loaded classes: https://imgur.com/a/ kGbto On Wed, Sep 27, 2017 at 11:03 AM, Wenlei Xie wrote: > Hi, > > We recently see some weird behavior of JVM in our production cluster. We > are running JDK 1.8.0_131. > > 1. On more than half of the machines (200 out of 400 machines), we see he > JMX counter report negative LoadedClassCount, see attached jmxcounter.png. > > After some further dig, we note UnloadedClassCount is larger than > TotalLoadedClassCount. And LoadedClassCount (-695,710) = > TotalLoadedClassCount - UnloadedClassCount . PerfCounter reports the same > number, here is the result on the same machine: > > $ jcmd 307 PerfCounter.print | grep -i class | grep -i java.cls > java.cls.loadedClasses=192004392 > java.cls.sharedLoadedClasses=0 > java.cls.sharedUnloadedClasses=0 > java.cls.unloadedClasses=192700102 > > > > 2. For the same cluster, we also see over half of machines repeatedly > experiencing full GC due to Metaspace full. We dump JSTACK for every minute > during 30 minutes, and see many threads are trying to compile the exact > same lambda form throughout the 30-minute period. > > Here is an example stacktrace on one machine. The LambdaForm triggers the > compilation on that machine is always LambdaForm$MH/170067652. Once it's > compiled, it should use the new compiled lambda form. 
We don't know why > it's still trying to compile the same lambda form again and again. -- Would > it be because the compiled lambda form somehow failed to load? This might > relate to the negative number of loaded classes. > > > "20170926_232912_39740_3vuuu.1.79-4-76640" #76640 prio=5 os_prio=0 > tid=0x00007f908006dbd0 nid=0x150a6 runnable [0x00007f8bddb1b000] > java.lang.Thread.State: RUNNABLE > at sun.misc.Unsafe.defineAnonymousClass(Native Method) > at java.lang.invoke.InvokerByteco > deGenerator.loadAndInitializeInvokerClass(InvokerBytecodeGen > erator.java:284) > at java.lang.invoke.InvokerByteco > deGenerator.loadMethod(InvokerBytecodeGenerator.java:276) > at java.lang.invoke.InvokerByteco > deGenerator.generateCustomizedCode(InvokerBytecodeGenerator.java:618) > at java.lang.invoke.LambdaForm.co > mpileToBytecode(LambdaForm.java:654) > at java.lang.invoke.LambdaForm.prepare(LambdaForm.java:635) > at java.lang.invoke.MethodHandle. > updateForm(MethodHandle.java:1432) > at java.lang.invoke.MethodHandle. > customize(MethodHandle.java:1442) > at java.lang.invoke.Invokers.maybeCustomize(Invokers.java:407) > at java.lang.invoke.Invokers.chec > kCustomized(Invokers.java:398) > at java.lang.invoke.LambdaForm$MH > /170067652.invokeExact_MT(LambdaForm$MH) > at com.facebook.presto.operator.a > ggregation.MinMaxHelper.combineStateWithState(MinMaxHelper.java:141) > at com.facebook.presto.operator.a > ggregation.MaxAggregationFunction.combine(MaxAggregationFunction.java:108) > at java.lang.invoke.LambdaForm$DM > H/1607453282.invokeStatic_L3_V(LambdaForm$DMH) > at java.lang.invoke.LambdaForm$BM > H/1118134445.reinvoke(LambdaForm$BMH) > at java.lang.invoke.LambdaForm$MH > /1971758264.linkToTargetMethod(LambdaForm$MH) > at com.facebook.presto.$gen.Integ > erIntegerMaxGroupedAccumulator_3439.addIntermediate(Unknown Source) > at com.facebook.presto.operator.a > ggregation.builder.InMemoryHashAggregationBuilder$Aggregator > .processPage(InMemoryHashAggregationBuilder.java:367) > at com.facebook.presto.operator.a > ggregation.builder.InMemoryHashAggregationBuilder.processPag > e(InMemoryHashAggregationBuilder.java:138) > at com.facebook.presto.operator.H > ashAggregationOperator.addInput(HashAggregationOperator.java:400) > at com.facebook.presto.operator.D > river.processInternal(Driver.java:343) > at com.facebook.presto.operator.D > river.lambda$processFor$6(Driver.java:241) > at com.facebook.presto.operator.Driver$$Lambda$765/ > 442308692.get(Unknown Source) > at com.facebook.presto.operator.D > river.tryWithLock(Driver.java:614) > at com.facebook.presto.operator.D > river.processFor(Driver.java:235) > at com.facebook.presto.execution. > SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:622) > at com.facebook.presto.execution. > executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163) > at com.facebook.presto.execution. > executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:485) > at java.util.concurrent.ThreadPoo > lExecutor.runWorker(ThreadPoolExecutor.java:1142) > at java.util.concurrent.ThreadPoo > lExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > ... > > > > Both issues go away after we restart the JVM, and the same query won't > trigger the LambdaForm compilation issue, so it looks like the JVM enters > some weird state. We are wondering if there is any thoughts on what could > trigger these issues? Or is there any suggestions about how to further > investigate it next time we see the VM in this state? > > Thank you. 
> > > -- > Best Regards, > Wenlei Xie > > Email: wenlei.xie at gmail.com > -- Best Regards, Wenlei Xie Email: wenlei.xie at gmail.com From alexander.harlap at oracle.com Tue Oct 3 18:44:35 2017 From: alexander.harlap at oracle.com (Alexander Harlap) Date: Tue, 3 Oct 2017 14:44:35 -0400 Subject: Request for review JDK-8187819 gc/TestFullGCALot.java fails on jdk10 started with "-XX:-UseCompressedOops" option Message-ID: Please review change for JDK-8187819 gc/TestFullGCALot.java fails on jdk10 started with "-XX:-UseCompressedOops" option. Change is located at http://cr.openjdk.java.net/~aharlap/8187819/webrev.00/ Initialized metaspace performance counters before their potential use. Tested - JPRT Alex From vladimir.kozlov at oracle.com Tue Oct 3 19:46:35 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 3 Oct 2017 12:46:35 -0700 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: <5b5219a5-960e-363b-2bdc-3613f1dae62c@oracle.com> References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com> <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com> <5b5219a5-960e-363b-2bdc-3613f1dae62c@oracle.com> Message-ID: I rebased it. But there is problem with changes. VM hit guarantee() in this code when run on SPARC in both, fastdebug and product, builds. Crash happens during build. We can't push this - problem should be investigated and fixed first. Thanks, Vladimir make/Main.gmk:443: recipe for target 'generate-link-opt-data' failed /usr/ccs/bin/bash: line 4: 9349 Abort (core dumped) /s/build/solaris-sparcv9-debug/support/interim-image/bin/java -XX:DumpLoadedClassList=/s/build/solaris-sparcv9-debug/support/link_opt/classlist -Djava.lang.invoke.MethodHandle.TRACE_RESOLVE=true -cp /s/build/solaris-sparcv9-debug/support/classlist.jar build.tools.classlist.HelloClasslist 2>&1 > /s/build/solaris-sparcv9-debug/support/link_opt/default_jli_trace.txt make[3]: *** [/s/build/solaris-sparcv9-debug/support/link_opt/classlist] Error 134 make[2]: *** [generate-link-opt-data] Error 1 # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/s/open/src/hotspot/share/memory/heap.cpp:233), pid=9349, tid=2 # guarantee(b == block_at(_next_segment - actual_number_of_segments)) failed: Intermediate allocation! # # JRE version: (10.0) (fastdebug build ) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 10-internal+0-2017-09-30-014154.8166317, mixed mode, tiered, compressed oops, g1 gc, solaris-sparc) # Core dump will be written. 
Default location: /s/open/make/core or core.9349 # # If you would like to submit a bug report, please visit: # http://bugreport.java.com/bugreport/crash.jsp # --------------- S U M M A R Y ------------ Command Line: -XX:DumpLoadedClassList=/s/build/solaris-sparcv9-debug/support/link_opt/classlist -Djava.lang.invoke.MethodHandle.TRACE_RESOLVE=true build.tools.classlist.HelloClasslist Host: sca00dbv, Sparcv9 64 bit 3600 MHz, 16 cores, 32G, Oracle Solaris 11.2 SPARC Time: Sat Sep 30 03:29:46 2017 UTC elapsed time: 0 seconds (0d 0h 0m 0s) --------------- T H R E A D --------------- Current thread (0x000000010012f000): JavaThread "Unknown thread" [_thread_in_vm, id=2, stack(0x0007fffef9700000,0x0007fffef9800000)] Stack: [0x0007fffef9700000,0x0007fffef9800000], sp=0x0007fffef97ff020, free space=1020k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x1f94508] void VMError::report_and_die(int,const char*,const char*,void*,Thread*,unsigned char*,void*,void*,const char*,int,unsigned long)+0xa58 V [libjvm.so+0x1f93a3c] void VMError::report_and_die(Thread*,const char*,int,const char*,const char*,void*)+0x3c V [libjvm.so+0xd02f38] void report_vm_error(const char*,int,const char*,const char*,...)+0x78 V [libjvm.so+0xfc219c] void CodeHeap::deallocate_tail(void*,unsigned long)+0xec V [libjvm.so+0xbf4f14] void CodeCache::free_unused_tail(CodeBlob*,unsigned long)+0xe4 V [libjvm.so+0x1e0ae70] void StubQueue::deallocate_unused_tail()+0x40 V [libjvm.so+0x1e7452c] void TemplateInterpreter::initialize()+0x19c V [libjvm.so+0x1051220] void interpreter_init()+0x20 V [libjvm.so+0x10116e0] int init_globals()+0xf0 V [libjvm.so+0x1ed8548] int Threads::create_vm(JavaVMInitArgs*,bool*)+0x4a8 V [libjvm.so+0x11c7b58] int JNI_CreateJavaVM_inner(JavaVM_**,void**,void*)+0x108 C [libjli.so+0x7950] InitializeJVM+0x100 On 10/2/17 7:55 AM, coleen.phillimore at oracle.com wrote: > > I can sponsor this for you once you rebase, and fix these compilation errors. > Thanks, > Coleen > > On 9/30/17 12:28 AM, Volker Simonis wrote: >> Hi Vladimir, >> >> thanks a lot for remembering these changes! >> >> Regards, >> Volker >> >> >> Vladimir Kozlov > schrieb am Fr. 29. Sep. 2017 um 15:47: >> >> I hit build failure when tried to push changes: >> >> src/hotspot/share/code/codeBlob.hpp(162) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data >> src/hotspot/share/code/codeBlob.hpp(163) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data >> >> I am going to fix it by casting (int): >> >> +? void adjust_size(size_t used) { >> +? ? _size = (int)used; >> +? ? _data_offset = (int)used; >> +? ? _code_end = (address)this + used; >> +? ? _data_end = (address)this + used; >> +? } >> >> Note, CodeCache size can't more than 2Gb (max_int) so such casting is fine. >> >> Vladimir >> >> On 9/6/17 6:20 AM, Volker Simonis wrote: >> > On Tue, Sep 5, 2017 at 9:36 PM,? > wrote: >> >> >> >> I was going to make the same comment about the friend declaration in v1, so >> >> v2 looks better to me.? Looks good.? Thank you for finding a solution to >> >> this problem that we've had for a long time.? I will sponsor this (remind me >> >> if I forget after the 18th). >> >> >> > >> > Thanks Coleen! I've updated >> > >> > http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/ >> > >> > in-place and added you as a second reviewer. 
>> > >> > Regards, >> > Volker >> > >> > >> >> thanks, >> >> Coleen >> >> >> >> >> >> >> >> On 9/5/17 1:17 PM, Vladimir Kozlov wrote: >> >>> >> >>> On 9/5/17 9:49 AM, Volker Simonis wrote: >> >>>> >> >>>> On Fri, Sep 1, 2017 at 6:16 PM, Vladimir Kozlov >> >>>> > wrote: >> >>>>> >> >>>>> May be add new CodeBlob's method to adjust sizes instead of directly >> >>>>> setting >> >>>>> them in? CodeCache::free_unused_tail(). Then you would not need friend >> >>>>> class >> >>>>> CodeCache in CodeBlob. >> >>>>> >> >>>> >> >>>> Changed as suggested (I didn't liked the friend declaration as well :) >> >>>> >> >>>>> Also I think adjustment to header_size should be done in >> >>>>> CodeCache::free_unused_tail() to limit scope of code who knows about >> >>>>> blob >> >>>>> layout. >> >>>>> >> >>>> >> >>>> Yes, that's much cleaner. Please find the updated webrev here: >> >>>> >> >>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/ >> >>> >> >>> >> >>> Good. >> >>> >> >>>> >> >>>> I've also found another "day 1" problem in StubQueue::next(): >> >>>> >> >>>>? ? ? Stub* next(Stub* s) const ? ? ? ? { int i = >> >>>> index_of(s) + stub_size(s); >> >>>> - ? ? ? ? ?if (i == >> >>>> _buffer_limit) i = 0; >> >>>> + ? ? ? ? ?// Only wrap >> >>>> around in the non-contiguous case (see stubss.cpp) >> >>>> + ? ? ? ? ?if (i == >> >>>> _buffer_limit && _queue_end < _buffer_limit) i = 0; >> >>>> ? ? ? ? ? ?return (i == >> >>>> _queue_end) ? NULL : stub_at(i); >> >>>> ? ? ? ? ?} >> >>>> >> >>>> The problem was that the method was not prepared to handle the case >> >>>> where _buffer_limit == _queue_end == _buffer_size which lead to an >> >>>> infinite recursion when iterating over a StubQueue with >> >>>> StubQueue::next() until next() returns NULL (as this was for example >> >>>> done with -XX:+PrintInterpreter). But with the new, trimmed CodeBlob >> >>>> we run into exactly this situation. >> >>> >> >>> >> >>> Okay. >> >>> >> >>>> >> >>>> While doing this last fix I also noticed that "StubQueue::stubs_do()", >> >>>> "StubQueue::queues_do()" and "StubQueue::register_queue()" don't seem >> >>>> to be used anywhere in the open code base (please correct me if I'm >> >>>> wrong). What do you think, maybe we should remove this code in a >> >>>> follow up change if it is really not needed? >> >>> >> >>> >> >>> register_queue() is used in constructor. Other 2 you can remove. >> >>> stub_code_begin() and stub_code_end() are not used too -remove. >> >>> I thought we run on linux with flag which warn about unused code. >> >>> >> >>>> >> >>>> Finally, could you please run the new version through JPRT and sponsor >> >>>> it once jdk10/hs will be opened again? >> >>> >> >>> >> >>> Will do when jdk10 "consolidation" is finished. Please, remind me later if >> >>> I forget. 
>> >>> >> >>> Thanks, >> >>> Vladimir >> >>> >> >>>> >> >>>> Thanks, >> >>>> Volker >> >>>> >> >>>>> Thanks, >> >>>>> Vladimir >> >>>>> >> >>>>> >> >>>>> On 9/1/17 8:46 AM, Volker Simonis wrote: >> >>>>>> >> >>>>>> >> >>>>>> Hi, >> >>>>>> >> >>>>>> I've decided to split the fix for the 'CodeHeap::contains_blob()' >> >>>>>> problem into its own issue "8187091: ReturnBlobToWrongHeapTest fails >> >>>>>> because of problems in CodeHeap::contains_blob()" >> >>>>>> (https://bugs.openjdk.java.net/browse/JDK-8187091) and started a new >> >>>>>> review thread for discussing it at: >> >>>>>> >> >>>>>> >> >>>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html >> >>>>>> >> >>>>>> So please lets keep this thread for discussing the interpreter code >> >>>>>> size issue only. I've prepared a new version of the webrev which is >> >>>>>> the same as the first one with the only difference that the change to >> >>>>>> 'CodeHeap::contains_blob()' has been removed: >> >>>>>> >> >>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/ >> >>>>>> >> >>>>>> Thanks, >> >>>>>> Volker >> >>>>>> >> >>>>>> >> >>>>>> On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis >> >>>>>> > wrote: >> >>>>>>> >> >>>>>>> >> >>>>>>> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov >> >>>>>>> > wrote: >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Very good change. Thank you, Volker. >> >>>>>>>> >> >>>>>>>> About contains_blob(). The problem is that AOTCompiledMethod >> >>>>>>>> allocated >> >>>>>>>> in >> >>>>>>>> CHeap and not in aot code section (which is RO): >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 >> >>>>>>>> >> >>>>>>>> It is allocated in CHeap after AOT library is loaded. Its >> >>>>>>>> code_begin() >> >>>>>>>> points to AOT code section but AOTCompiledMethod* points outside it >> >>>>>>>> (to >> >>>>>>>> normal malloced space) so you can't use (char*)blob address. >> >>>>>>>> >> >>>>>>> >> >>>>>>> Thanks for the explanation - now I got it. >> >>>>>>> >> >>>>>>>> There are 2 ways to fix it, I think. >> >>>>>>>> One is to add new field to CodeBlobLayout and set it to blob* address >> >>>>>>>> for >> >>>>>>>> normal CodeCache blobs and to code_begin for AOT code. >> >>>>>>>> Second is to use contains(blob->code_end() - 1) assuming that AOT >> >>>>>>>> code >> >>>>>>>> is >> >>>>>>>> never zero. >> >>>>>>>> >> >>>>>>> >> >>>>>>> I'll give it a try tomorrow and will send out a new webrev. >> >>>>>>> >> >>>>>>> Regards, >> >>>>>>> Volker >> >>>>>>> >> >>>>>>>> Thanks, >> >>>>>>>> Vladimir >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> On 8/31/17 5:43 AM, Volker Simonis wrote: >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad >> >>>>>>>>> > wrote: >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> On 2017-08-31 08:54, Volker Simonis wrote: >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> While working on this, I found another problem which is related to >> >>>>>>>>>>> the >> >>>>>>>>>>> fix of JDK-8183573 and leads to crashes when executing the JTreg >> >>>>>>>>>>> test >> >>>>>>>>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java. >> >>>>>>>>>>> >> >>>>>>>>>>> The problem is that JDK-8183573 replaced >> >>>>>>>>>>> >> >>>>>>>>>>>? ? ? ? 
virtual bool contains_blob(const CodeBlob* blob) const { >> >>>>>>>>>>> return >> >>>>>>>>>>> low_boundary() <= (char*) blob && (char*) blob < high(); } >> >>>>>>>>>>> >> >>>>>>>>>>> by: >> >>>>>>>>>>> >> >>>>>>>>>>>? ? ? ? bool contains_blob(const CodeBlob* blob) const { return >> >>>>>>>>>>> contains(blob->code_begin()); } >> >>>>>>>>>>> >> >>>>>>>>>>> But that my be wrong in the corner case where the size of the >> >>>>>>>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the >> >>>>>>>>>>> 'header' - i.e. the C++ object itself) because in that case >> >>>>>>>>>>> CodeBlob::code_begin() points right behind the CodeBlob's header >> >>>>>>>>>>> which >> >>>>>>>>>>> is a memory location which doesn't belong to the CodeBlob anymore. >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> I recall this change was somehow necessary to allow merging >> >>>>>>>>>> AOTCodeHeap::contains_blob and CodeHead::contains_blob into >> >>>>>>>>>> one devirtualized method, so you need to ensure all AOT tests >> >>>>>>>>>> pass with this change (on linux-x64). >> >>>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> All of hotspot/test/aot and hotspot/test/jvmci executed and passed >> >>>>>>>>> successful. Are there any other tests I should check? >> >>>>>>>>> >> >>>>>>>>> That said, it is a little hard to follow the stages of your change. >> >>>>>>>>> It >> >>>>>>>>> seems like >> >>>>>>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/ >> >>>>>>>>> was reviewed [1] but then finally the slightly changed version from >> >>>>>>>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ >> >>>>>>>>> was >> >>>>>>>>> checked in and linked to the bug report. >> >>>>>>>>> >> >>>>>>>>> The first, reviewed version of the change still had a correct >> >>>>>>>>> version >> >>>>>>>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while the second, >> >>>>>>>>> checked in version has the faulty version of that method. >> >>>>>>>>> >> >>>>>>>>> I don't know why you finally did that change to 'contains_blob()' >> >>>>>>>>> but >> >>>>>>>>> I don't see any reason why we shouldn't be able to directly use the >> >>>>>>>>> blob's address for inclusion checking. From what I understand, it >> >>>>>>>>> should ALWAYS be contained in the corresponding CodeHeap so no >> >>>>>>>>> reason >> >>>>>>>>> to mess with 'CodeBlob::code_begin()'. >> >>>>>>>>> >> >>>>>>>>> Please let me know if I'm missing something. >> >>>>>>>>> >> >>>>>>>>> [1] >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html >> >>>>>>>>> >> >>>>>>>>>> I can't help to wonder if we'd not be better served by disallowing >> >>>>>>>>>> zero-sized payloads. Is this something that can ever actually >> >>>>>>>>>> happen except by abuse of the white box API? >> >>>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> The corresponding test (ReturnBlobToWrongHeapTest.java) specifically >> >>>>>>>>> wants to allocate "segment sized" blocks which is most easily >> >>>>>>>>> achieved >> >>>>>>>>> by allocation zero-sized CodeBlobs. And I think there's nothing >> >>>>>>>>> wrong >> >>>>>>>>> about it if we handle the inclusion tests correctly. 
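The corner case being discussed is easy to picture with a toy example. The names below are stand-ins, not the real CodeBlob/CodeHeap layout; the only point is the address arithmetic: with an empty payload, code_begin() lands one byte past the blob, which can fall outside the heap (or into a neighbouring one) even though the blob itself is clearly inside it.

    // Toy illustration of the zero-payload corner case; not HotSpot code.
    struct ToyBlob {
      int   _header_size;     // size of the C++ object itself
      int   _payload_size;    // 0 for a "segment sized" blob
      char* code_begin() { return (char*)this + _header_size; }
    };

    // Half-open range check, as heap inclusion tests usually are.
    static bool range_contains(char* low, char* high, const void* p) {
      return low <= (const char*)p && (const char*)p < high;
    }

    // If a ToyBlob with _payload_size == 0 ends exactly at 'high', then:
    //   range_contains(low, high, blob)               -> true
    //   range_contains(low, high, blob->code_begin()) -> false
    // so an inclusion test based on code_begin() misreports exactly the
    // blobs that ReturnBlobToWrongHeapTest allocates.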
>> >>>>>>>>> >> >>>>>>>>> Thank you and best regards, >> >>>>>>>>> Volker >> >>>>>>>>> >> >>>>>>>>>> /Claes >> >> >> >> >> > From vladimir.kozlov at oracle.com Tue Oct 3 19:58:54 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 3 Oct 2017 12:58:54 -0700 Subject: JDK10/RFR(M): 8172232: SPARC ISA/CPU feature detection is broken/insufficient (on Linux). In-Reply-To: <908a6ae1-0d83-361c-9c1b-1b2a114884ff@oracle.com> References: <7d5e1ebb-7de8-66f1-a1f0-db465bcad4ab@oracle.com> <9f2896ca-65dc-557f-793c-4235499cc340@oracle.com> <908a6ae1-0d83-361c-9c1b-1b2a114884ff@oracle.com> Message-ID: On 10/2/17 8:52 AM, Patric Hedlin wrote: > Hi Vladimir, > > > On 09/29/2017 08:56 PM, Vladimir Kozlov wrote: >> In general it is fine. Few notes. >> You use ifdef DEBUG_SPARC_CAPS which is undefed at the beginning. Is it set by gcc by default? >> > I have not noticed any (obvious) convention in the code base for this case, when I have a entirely (file-) local, typically debug, definition that makes no sense to define except within a particular > file. I usually list those as undefines in the beginning of the file to make sure they are not exposed to the command line (the rationale being that they should not be of use if you are not actively > making changes to the particular file). And it sort of works as part of the local docs. Got it. But in such situation we have other mechanisms to print information about CPUs. I would suggest to use unified logging currently we use for this: -Xlog:os+cpu http://hg.openjdk.java.net/jdk10/hs/file/58931d9b2260/src/hotspot/share/runtime/vm_version.cpp#l300 There are different levels of output and for your case you can use Debug or Trace level (default is Info). Thanks, Vladimir > > Would it be an acceptable approach to add a comment like this: > > /* NOTE: Enable the local define 'DEBUG_LINUX_SPARC_CAPS' below (or define it > ?*?????? from the command line) as an aid when updating the feature table. > #define DEBUG_LINUX_SPARC_CAPS > ?*/ > > Close to its first use (?). (I changed the name since it will be exposed outside the file.) > >> Coding style for methods definitions - open parenthesis should be on the same line: >> >> +? bool match(const char* s) const >> +? { >> > > Old habits die hard... and it's so much more readable ;) > > /Patric >> Thanks, >> Vladimir >> >> On 9/29/17 6:08 AM, Patric Hedlin wrote: >>> Dear all, >>> >>> I would like to ask for help to review the following change/update: >>> >>> Issue:? https://bugs.openjdk.java.net/browse/JDK-8172232 >>> >>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8172232/ >>> >>> >>> 8172232: SPARC ISA/CPU feature detection is broken/insufficient (on Linux). >>> >>> ???? Subsumes (duplicate) JDK-8186579: VM_Version::platform_features() needs update on linux-sparc. >>> >>> >>> Caveat: >>> >>> ???? This update will introduce some redundancies into the code base, features and definitions >>> ???? currently not used, addressed by subsequent bug or feature updates/patches. Fujitsu HW is >>> ???? treated very conservatively. >>> >>> >>> Testing: >>> >>> ???? JDK9/JDK10 local jtreg/hotspot >>> >>> >>> Thanks to Adrian for additional test (and review) support. 
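Making the unified-logging suggestion above concrete: instead of a file-local #define guarding ad-hoc printing, a capability dump can sit behind the os+cpu tags at info/debug level, so it is available in any build via -Xlog:os+cpu=debug and costs nothing when the tags are off. The sketch below only shows the shape of such a helper; the function name, parameters and message text are made up, and it needs the HotSpot logging header to build.

    #include "logging/log.hpp"

    // Hypothetical replacement for DEBUG_SPARC_CAPS-style printing:
    // always compiled in, only emitted when enabled with e.g.
    //   java -Xlog:os+cpu=debug ...
    static void print_sparc_caps_sketch(uint64_t av1, uint64_t av2) {
      log_info(os, cpu)("SPARC capabilities: av1=0x%016llx av2=0x%016llx",
                        (unsigned long long)av1, (unsigned long long)av2);
      if (log_is_enabled(Debug, os, cpu)) {
        // Per-bit decoding of the feature table would go here.
        log_debug(os, cpu)("decoded feature list ...");
      }
    }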
>>> >>> Tested-By: John Paul Adrian Glaubitz >>> >>> >>> Best regards, >>> Patric >>> > From coleen.phillimore at oracle.com Tue Oct 3 20:02:38 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 3 Oct 2017 16:02:38 -0400 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle In-Reply-To: <7982f8eb-e4ba-8c09-f15f-e33797553141@oracle.com> References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> <1498efad-e443-5875-cc20-b0d0c926e883@oracle.com> <7982f8eb-e4ba-8c09-f15f-e33797553141@oracle.com> Message-ID: <124f386e-28ec-701a-111c-fcc15335feb6@oracle.com> Stefan found a problem that set_java_mirror() code could be unsafe if the java_mirror code changes, which the function allowed one to do.? There is code in jvmtiRedefineClasses that temporarily switches the java_mirrors for verification of the newly loaded class.? Since this simply swaps java_mirrors that are together in the ClassLoaderData::_handles area, I added an API for that and made set_java_mirror() more restrictive. I reran JVMTI, CDS and tier1 tests.?? New webrev with all changes are: open webrev at http://cr.openjdk.java.net/~coleenp/8186777.04/webrev Thanks, Coleen On 10/3/17 10:23 AM, coleen.phillimore at oracle.com wrote: > > Here is an updated webrev with fixes for your comments. > > open webrev at http://cr.openjdk.java.net/~coleenp/8186777.03/webrev > > Thanks for reviewing and all your help with this! > > Coleen > > On 9/29/17 6:41 AM, Stefan Karlsson wrote: >> Hi Coleen, >> >> I started looking at this, but will need a second round before I've >> fully reviewed the GC parts. >> >> Here are some nits that would be nice to get cleaned up. >> >> ========== >> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/classfile/classLoaderData.cpp.frames.html >> >> >> ?788???? record_modified_oops();? // necessary? >> >> This could be removed. Only G1 cares about deleted "weak" references. >> >> Or we can wait until Erik?'s GC Barrier Interface is in place and >> remove it then. >> >> ---------- >> >> ?#ifdef CLD_DUMP_KLASSES >> ?? if (Verbose) { >> ???? Klass* k = _klasses; >> ???? while (k != NULL) { >> -????? out->print_cr("klass " PTR_FORMAT ", %s, CT: %d, MUT: %d", k, >> k->name()->as_C_string(), >> -????????? k->has_modified_oops(), k->has_accumulated_modified_oops()); >> +????? out->print_cr("klass " PTR_FORMAT ", %s", k, >> k->name()->as_C_string()); >> ?????? assert(k != k->next_link(), "no loops!"); >> ?????? k = k->next_link(); >> ???? } >> ?? } >> ?#endif? // CLD_DUMP_KLASSES >> >> Pre-existing: I don't think this will compile if you turn on >> CLD_DUMP_KLASSES. k must be p2i(k). >> >> ========== >> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/classfile/classLoaderData.hpp.udiff.html >> >> >> +? // Remembered sets support for the oops in the class loader data. >> +? jbyte _modified_oops;???????????? // Card Table Equivalent (YC/CMS >> support) >> +? jbyte _accumulated_modified_oops; // Mod Union Equivalent (CMS >> support) >> >> We should create a follow-up bug to change these jbytes to bools. >> >> ========== >> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/g1/g1HeapVerifier.cpp.frames.html >> >> >> Spurious addition: >> +? G1CollectedHeap* _g1h; >> >> ========== >> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/g1/g1OopClosures.hpp.udiff.html >> >> >> Spurious addition?: >> +? 
G1CollectedHeap* g1() { return _g1; } >> >> ========== >> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/parallel/psScavenge.inline.hpp.patch >> >> >> ?? PSPromotionManager* _pm; >> -? // Used to redirty a scanned klass if it has oops >> +? // Used to redirty a scanned cld if it has oops >> ?? // pointing to the young generation after being scanned. >> -? Klass*???????????? _scanned_klass; >> +? ClassLoaderData*???????????? _scanned_cld; >> >> Indentation. >> >> ========== >> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/parallel/psTasks.cpp.frames.html >> >> >> ? 80???? case class_loader_data: >> ? 81???? { >> ? 82?????? PSScavengeCLDClosure ps(pm); >> ? 83?????? ClassLoaderDataGraph::cld_do(&ps); >> ? 84???? } >> >> Would you mind changing the name ps to cld_closure? >> >> ========== >> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/shared/genOopClosures.hpp.patch >> >> >> +? OopsInClassLoaderDataOrGenClosure*?? _scavenge_closure; >> ?? // true if the the modified oops state should be saved. >> ?? bool???????????????????? _accumulate_modified_oops; >> >> Indentation. >> >> ---------- >> +? void do_cld(ClassLoaderData* k); >> >> Rename k? >> >> Thanks, >> StefanK >> >> On 2017-09-28 23:36, coleen.phillimore at oracle.com wrote: >>> >>> Thank you to Stefan Karlsson offlist for pointing out that the >>> previous .01 version of this webrev breaks CMS in that it doesn't >>> remember ClassLoaderData::_handles that are changed and added while >>> concurrent marking is in progress.? I've fixed this bug to move the >>> Klass::_modified_oops and _accumulated_modified_oops to the >>> ClassLoaderData and use these fields in the CMS remarking phase to >>> catch any new handles that are added.?? This also fixes this bug >>> https://bugs.openjdk.java.net/browse/JDK-8173988 . >>> >>> In addition, the previous version of this change removed an >>> optimization during young collection, which showed some uncertain >>> performance regression in young pause times, so I added this >>> optimization back to not walk ClassLoaderData during young >>> collections if all the oops are old.? The performance results of >>> SPECjbb2015 now are slightly better, but not significantly. >>> >>> This latest patch has been tested on tier1-5 on linux x64 and >>> windows x64 in mach5 test harness. >>> >>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/ >>> >>> Can I get at least 3 reviewers?? One from each of the compiler, gc, >>> and runtime group at least since there are changes to all 3. >>> >>> Thanks! >>> Coleen >>> >>> >>> On 9/6/17 12:04 PM, coleen.phillimore at oracle.com wrote: >>>> Summary: Add indirection for fetching mirror so that GC doesn't >>>> have to follow CLD::_klasses >>>> >>>> Thank you to Tom Rodriguez for Graal changes and Rickard for the C2 >>>> changes. >>>> >>>> Ran nightly tests through Mach5 and RBT.?? Early performance >>>> testing showed good performance improvment in GC class loader data >>>> processing time, but nmethod processing time continues to dominate. >>>> Also performace testing showed no throughput regression.?? I'm >>>> rerunning both of these performance testing and will post the numbers. 
>>>> >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8186777 >>>> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.01/webrev >>>> >>>> Thanks, >>>> Coleen > From stefan.karlsson at oracle.com Tue Oct 3 20:15:06 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 3 Oct 2017 22:15:06 +0200 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle In-Reply-To: <124f386e-28ec-701a-111c-fcc15335feb6@oracle.com> References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> <1498efad-e443-5875-cc20-b0d0c926e883@oracle.com> <7982f8eb-e4ba-8c09-f15f-e33797553141@oracle.com> <124f386e-28ec-701a-111c-fcc15335feb6@oracle.com> Message-ID: <055f4b75-efaa-79a3-0b6f-83c13ab87896@oracle.com> On 2017-10-03 22:02, coleen.phillimore at oracle.com wrote: > > Stefan found a problem that set_java_mirror() code could be unsafe if > the java_mirror code changes, which the function allowed one to do.? > There is code in jvmtiRedefineClasses that temporarily switches the > java_mirrors for verification of the newly loaded class.? Since this > simply swaps java_mirrors that are together in the > ClassLoaderData::_handles area, I added an API for that and made > set_java_mirror() more restrictive. > > I reran JVMTI, CDS and tier1 tests.?? New webrev with all changes are: > > open webrev at http://cr.openjdk.java.net/~coleenp/8186777.04/webrev The GC parts look good to me. Thanks, StefanK > > Thanks, > Coleen > > On 10/3/17 10:23 AM, coleen.phillimore at oracle.com wrote: >> >> Here is an updated webrev with fixes for your comments. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.03/webrev >> >> Thanks for reviewing and all your help with this! >> >> Coleen >> >> On 9/29/17 6:41 AM, Stefan Karlsson wrote: >>> Hi Coleen, >>> >>> I started looking at this, but will need a second round before I've >>> fully reviewed the GC parts. >>> >>> Here are some nits that would be nice to get cleaned up. >>> >>> ========== >>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/classfile/classLoaderData.cpp.frames.html >>> >>> >>> ?788???? record_modified_oops();? // necessary? >>> >>> This could be removed. Only G1 cares about deleted "weak" references. >>> >>> Or we can wait until Erik?'s GC Barrier Interface is in place and >>> remove it then. >>> >>> ---------- >>> >>> ?#ifdef CLD_DUMP_KLASSES >>> ?? if (Verbose) { >>> ???? Klass* k = _klasses; >>> ???? while (k != NULL) { >>> -????? out->print_cr("klass " PTR_FORMAT ", %s, CT: %d, MUT: %d", k, >>> k->name()->as_C_string(), >>> -????????? k->has_modified_oops(), k->has_accumulated_modified_oops()); >>> +????? out->print_cr("klass " PTR_FORMAT ", %s", k, >>> k->name()->as_C_string()); >>> ?????? assert(k != k->next_link(), "no loops!"); >>> ?????? k = k->next_link(); >>> ???? } >>> ?? } >>> ?#endif? // CLD_DUMP_KLASSES >>> >>> Pre-existing: I don't think this will compile if you turn on >>> CLD_DUMP_KLASSES. k must be p2i(k). >>> >>> ========== >>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/classfile/classLoaderData.hpp.udiff.html >>> >>> >>> +? // Remembered sets support for the oops in the class loader data. >>> +? jbyte _modified_oops;???????????? // Card Table Equivalent >>> (YC/CMS support) >>> +? jbyte _accumulated_modified_oops; // Mod Union Equivalent (CMS >>> support) >>> >>> We should create a follow-up bug to change these jbytes to bools. 
>>> >>> ========== >>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/g1/g1HeapVerifier.cpp.frames.html >>> >>> >>> Spurious addition: >>> +? G1CollectedHeap* _g1h; >>> >>> ========== >>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/g1/g1OopClosures.hpp.udiff.html >>> >>> >>> Spurious addition?: >>> +? G1CollectedHeap* g1() { return _g1; } >>> >>> ========== >>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/parallel/psScavenge.inline.hpp.patch >>> >>> >>> ?? PSPromotionManager* _pm; >>> -? // Used to redirty a scanned klass if it has oops >>> +? // Used to redirty a scanned cld if it has oops >>> ?? // pointing to the young generation after being scanned. >>> -? Klass*???????????? _scanned_klass; >>> +? ClassLoaderData*???????????? _scanned_cld; >>> >>> Indentation. >>> >>> ========== >>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/parallel/psTasks.cpp.frames.html >>> >>> >>> ? 80???? case class_loader_data: >>> ? 81???? { >>> ? 82?????? PSScavengeCLDClosure ps(pm); >>> ? 83?????? ClassLoaderDataGraph::cld_do(&ps); >>> ? 84???? } >>> >>> Would you mind changing the name ps to cld_closure? >>> >>> ========== >>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/shared/genOopClosures.hpp.patch >>> >>> >>> +? OopsInClassLoaderDataOrGenClosure*?? _scavenge_closure; >>> ?? // true if the the modified oops state should be saved. >>> ?? bool???????????????????? _accumulate_modified_oops; >>> >>> Indentation. >>> >>> ---------- >>> +? void do_cld(ClassLoaderData* k); >>> >>> Rename k? >>> >>> Thanks, >>> StefanK >>> >>> On 2017-09-28 23:36, coleen.phillimore at oracle.com wrote: >>>> >>>> Thank you to Stefan Karlsson offlist for pointing out that the >>>> previous .01 version of this webrev breaks CMS in that it doesn't >>>> remember ClassLoaderData::_handles that are changed and added while >>>> concurrent marking is in progress. I've fixed this bug to move the >>>> Klass::_modified_oops and _accumulated_modified_oops to the >>>> ClassLoaderData and use these fields in the CMS remarking phase to >>>> catch any new handles that are added.?? This also fixes this bug >>>> https://bugs.openjdk.java.net/browse/JDK-8173988 . >>>> >>>> In addition, the previous version of this change removed an >>>> optimization during young collection, which showed some uncertain >>>> performance regression in young pause times, so I added this >>>> optimization back to not walk ClassLoaderData during young >>>> collections if all the oops are old.? The performance results of >>>> SPECjbb2015 now are slightly better, but not significantly. >>>> >>>> This latest patch has been tested on tier1-5 on linux x64 and >>>> windows x64 in mach5 test harness. >>>> >>>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/ >>>> >>>> Can I get at least 3 reviewers?? One from each of the compiler, gc, >>>> and runtime group at least since there are changes to all 3. >>>> >>>> Thanks! >>>> Coleen >>>> >>>> >>>> On 9/6/17 12:04 PM, coleen.phillimore at oracle.com wrote: >>>>> Summary: Add indirection for fetching mirror so that GC doesn't >>>>> have to follow CLD::_klasses >>>>> >>>>> Thank you to Tom Rodriguez for Graal changes and Rickard for the >>>>> C2 changes. >>>>> >>>>> Ran nightly tests through Mach5 and RBT.?? Early performance >>>>> testing showed good performance improvment in GC class loader data >>>>> processing time, but nmethod processing time continues to >>>>> dominate. 
Also performace testing showed no throughput >>>>> regression.?? I'm rerunning both of these performance testing and >>>>> will post the numbers. >>>>> >>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8186777 >>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.01/webrev >>>>> >>>>> Thanks, >>>>> Coleen >> > From coleen.phillimore at oracle.com Tue Oct 3 20:31:43 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 3 Oct 2017 16:31:43 -0400 Subject: RFR (L) 8186777: Make Klass::_java_mirror an OopHandle In-Reply-To: <055f4b75-efaa-79a3-0b6f-83c13ab87896@oracle.com> References: <9adb92ce-fb2d-11df-01dc-722e482a4d40@oracle.com> <383dcc42-47ea-3d3e-5565-15f8950c35ae@oracle.com> <1498efad-e443-5875-cc20-b0d0c926e883@oracle.com> <7982f8eb-e4ba-8c09-f15f-e33797553141@oracle.com> <124f386e-28ec-701a-111c-fcc15335feb6@oracle.com> <055f4b75-efaa-79a3-0b6f-83c13ab87896@oracle.com> Message-ID: On 10/3/17 4:15 PM, Stefan Karlsson wrote: > On 2017-10-03 22:02, coleen.phillimore at oracle.com wrote: >> >> Stefan found a problem that set_java_mirror() code could be unsafe if >> the java_mirror code changes, which the function allowed one to do.? >> There is code in jvmtiRedefineClasses that temporarily switches the >> java_mirrors for verification of the newly loaded class.? Since this >> simply swaps java_mirrors that are together in the >> ClassLoaderData::_handles area, I added an API for that and made >> set_java_mirror() more restrictive. >> >> I reran JVMTI, CDS and tier1 tests.?? New webrev with all changes are: >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.04/webrev > > The GC parts look good to me. Thanks for your help! Coleen > > Thanks, > StefanK > >> >> Thanks, >> Coleen >> >> On 10/3/17 10:23 AM, coleen.phillimore at oracle.com wrote: >>> >>> Here is an updated webrev with fixes for your comments. >>> >>> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.03/webrev >>> >>> Thanks for reviewing and all your help with this! >>> >>> Coleen >>> >>> On 9/29/17 6:41 AM, Stefan Karlsson wrote: >>>> Hi Coleen, >>>> >>>> I started looking at this, but will need a second round before I've >>>> fully reviewed the GC parts. >>>> >>>> Here are some nits that would be nice to get cleaned up. >>>> >>>> ========== >>>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/classfile/classLoaderData.cpp.frames.html >>>> >>>> >>>> ?788???? record_modified_oops();? // necessary? >>>> >>>> This could be removed. Only G1 cares about deleted "weak" references. >>>> >>>> Or we can wait until Erik?'s GC Barrier Interface is in place and >>>> remove it then. >>>> >>>> ---------- >>>> >>>> ?#ifdef CLD_DUMP_KLASSES >>>> ?? if (Verbose) { >>>> ???? Klass* k = _klasses; >>>> ???? while (k != NULL) { >>>> -????? out->print_cr("klass " PTR_FORMAT ", %s, CT: %d, MUT: %d", >>>> k, k->name()->as_C_string(), >>>> -????????? k->has_modified_oops(), >>>> k->has_accumulated_modified_oops()); >>>> +????? out->print_cr("klass " PTR_FORMAT ", %s", k, >>>> k->name()->as_C_string()); >>>> ?????? assert(k != k->next_link(), "no loops!"); >>>> ?????? k = k->next_link(); >>>> ???? } >>>> ?? } >>>> ?#endif? // CLD_DUMP_KLASSES >>>> >>>> Pre-existing: I don't think this will compile if you turn on >>>> CLD_DUMP_KLASSES. k must be p2i(k). >>>> >>>> ========== >>>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/classfile/classLoaderData.hpp.udiff.html >>>> >>>> >>>> +? 
// Remembered sets support for the oops in the class loader data. >>>> +? jbyte _modified_oops;???????????? // Card Table Equivalent >>>> (YC/CMS support) >>>> +? jbyte _accumulated_modified_oops; // Mod Union Equivalent (CMS >>>> support) >>>> >>>> We should create a follow-up bug to change these jbytes to bools. >>>> >>>> ========== >>>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/g1/g1HeapVerifier.cpp.frames.html >>>> >>>> >>>> Spurious addition: >>>> +? G1CollectedHeap* _g1h; >>>> >>>> ========== >>>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/g1/g1OopClosures.hpp.udiff.html >>>> >>>> >>>> Spurious addition?: >>>> +? G1CollectedHeap* g1() { return _g1; } >>>> >>>> ========== >>>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/parallel/psScavenge.inline.hpp.patch >>>> >>>> >>>> ?? PSPromotionManager* _pm; >>>> -? // Used to redirty a scanned klass if it has oops >>>> +? // Used to redirty a scanned cld if it has oops >>>> ?? // pointing to the young generation after being scanned. >>>> -? Klass*???????????? _scanned_klass; >>>> +? ClassLoaderData*???????????? _scanned_cld; >>>> >>>> Indentation. >>>> >>>> ========== >>>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/parallel/psTasks.cpp.frames.html >>>> >>>> >>>> ? 80???? case class_loader_data: >>>> ? 81???? { >>>> ? 82?????? PSScavengeCLDClosure ps(pm); >>>> ? 83?????? ClassLoaderDataGraph::cld_do(&ps); >>>> ? 84???? } >>>> >>>> Would you mind changing the name ps to cld_closure? >>>> >>>> ========== >>>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/src/hotspot/share/gc/shared/genOopClosures.hpp.patch >>>> >>>> >>>> +? OopsInClassLoaderDataOrGenClosure*?? _scavenge_closure; >>>> ?? // true if the the modified oops state should be saved. >>>> ?? bool???????????????????? _accumulate_modified_oops; >>>> >>>> Indentation. >>>> >>>> ---------- >>>> +? void do_cld(ClassLoaderData* k); >>>> >>>> Rename k? >>>> >>>> Thanks, >>>> StefanK >>>> >>>> On 2017-09-28 23:36, coleen.phillimore at oracle.com wrote: >>>>> >>>>> Thank you to Stefan Karlsson offlist for pointing out that the >>>>> previous .01 version of this webrev breaks CMS in that it doesn't >>>>> remember ClassLoaderData::_handles that are changed and added >>>>> while concurrent marking is in progress. I've fixed this bug to >>>>> move the Klass::_modified_oops and _accumulated_modified_oops to >>>>> the ClassLoaderData and use these fields in the CMS remarking >>>>> phase to catch any new handles that are added.?? This also fixes >>>>> this bug https://bugs.openjdk.java.net/browse/JDK-8173988 . >>>>> >>>>> In addition, the previous version of this change removed an >>>>> optimization during young collection, which showed some uncertain >>>>> performance regression in young pause times, so I added this >>>>> optimization back to not walk ClassLoaderData during young >>>>> collections if all the oops are old.? The performance results of >>>>> SPECjbb2015 now are slightly better, but not significantly. >>>>> >>>>> This latest patch has been tested on tier1-5 on linux x64 and >>>>> windows x64 in mach5 test harness. >>>>> >>>>> http://cr.openjdk.java.net/~coleenp/8186777.02/webrev/ >>>>> >>>>> Can I get at least 3 reviewers?? One from each of the compiler, >>>>> gc, and runtime group at least since there are changes to all 3. >>>>> >>>>> Thanks! 
>>>>> Coleen >>>>> >>>>> >>>>> On 9/6/17 12:04 PM, coleen.phillimore at oracle.com wrote: >>>>>> Summary: Add indirection for fetching mirror so that GC doesn't >>>>>> have to follow CLD::_klasses >>>>>> >>>>>> Thank you to Tom Rodriguez for Graal changes and Rickard for the >>>>>> C2 changes. >>>>>> >>>>>> Ran nightly tests through Mach5 and RBT.?? Early performance >>>>>> testing showed good performance improvment in GC class loader >>>>>> data processing time, but nmethod processing time continues to >>>>>> dominate. Also performace testing showed no throughput >>>>>> regression.?? I'm rerunning both of these performance testing and >>>>>> will post the numbers. >>>>>> >>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8186777 >>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8186777.01/webrev >>>>>> >>>>>> Thanks, >>>>>> Coleen >>> >> > From volker.simonis at gmail.com Wed Oct 4 07:19:49 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 04 Oct 2017 07:19:49 +0000 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com> <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com> <5b5219a5-960e-363b-2bdc-3613f1dae62c@oracle.com> Message-ID: Thanks Vladimir, I'll take a look at the problem next week when I'm back from JavaOne. Regards, Volker Vladimir Kozlov schrieb am Di. 3. Okt. 2017 um 12:43: > I rebased it. But there is problem with changes. VM hit guarantee() in > this code when run on SPARC in both, fastdebug and product, builds. > Crash happens during build. We can't push this - problem should be > investigated and fixed first. > > Thanks, > Vladimir > > make/Main.gmk:443: recipe for target 'generate-link-opt-data' failed > /usr/ccs/bin/bash: line 4: 9349 Abort (core dumped) > /s/build/solaris-sparcv9-debug/support/interim-image/bin/java > -XX:DumpLoadedClassList=/s/build/solaris-sparcv9-debug/support/link_opt/classlist > -Djava.lang.invoke.MethodHandle.TRACE_RESOLVE=true -cp > /s/build/solaris-sparcv9-debug/support/classlist.jar > build.tools.classlist.HelloClasslist 2>&1 > > /s/build/solaris-sparcv9-debug/support/link_opt/default_jli_trace.txt > make[3]: *** [/s/build/solaris-sparcv9-debug/support/link_opt/classlist] > Error 134 > make[2]: *** [generate-link-opt-data] Error 1 > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/s/open/src/hotspot/share/memory/heap.cpp:233), > pid=9349, tid=2 > # guarantee(b == block_at(_next_segment - actual_number_of_segments)) > failed: Intermediate allocation! > # > # JRE version: (10.0) (fastdebug build ) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug > 10-internal+0-2017-09-30-014154.8166317, mixed mode, tiered, compressed > oops, g1 gc, solaris-sparc) > # Core dump will be written. 
Default location: /s/open/make/core or > core.9349 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # > > --------------- S U M M A R Y ------------ > > Command Line: > -XX:DumpLoadedClassList=/s/build/solaris-sparcv9-debug/support/link_opt/classlist > -Djava.lang.invoke.MethodHandle.TRACE_RESOLVE=true > build.tools.classlist.HelloClasslist > > Host: sca00dbv, Sparcv9 64 bit 3600 MHz, 16 cores, 32G, Oracle Solaris > 11.2 SPARC > Time: Sat Sep 30 03:29:46 2017 UTC elapsed time: 0 seconds (0d 0h 0m 0s) > > --------------- T H R E A D --------------- > > Current thread (0x000000010012f000): JavaThread "Unknown thread" > [_thread_in_vm, id=2, stack(0x0007fffef9700000,0x0007fffef9800000)] > > Stack: [0x0007fffef9700000,0x0007fffef9800000], sp=0x0007fffef97ff020, > free space=1020k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x1f94508] void VMError::report_and_die(int,const > char*,const char*,void*,Thread*,unsigned char*,void*,void*,const > char*,int,unsigned long)+0xa58 > V [libjvm.so+0x1f93a3c] void VMError::report_and_die(Thread*,const > char*,int,const char*,const char*,void*)+0x3c > V [libjvm.so+0xd02f38] void report_vm_error(const char*,int,const > char*,const char*,...)+0x78 > V [libjvm.so+0xfc219c] void CodeHeap::deallocate_tail(void*,unsigned > long)+0xec > V [libjvm.so+0xbf4f14] void > CodeCache::free_unused_tail(CodeBlob*,unsigned long)+0xe4 > V [libjvm.so+0x1e0ae70] void StubQueue::deallocate_unused_tail()+0x40 > V [libjvm.so+0x1e7452c] void TemplateInterpreter::initialize()+0x19c > V [libjvm.so+0x1051220] void interpreter_init()+0x20 > V [libjvm.so+0x10116e0] int init_globals()+0xf0 > V [libjvm.so+0x1ed8548] int > Threads::create_vm(JavaVMInitArgs*,bool*)+0x4a8 > V [libjvm.so+0x11c7b58] int > JNI_CreateJavaVM_inner(JavaVM_**,void**,void*)+0x108 > C [libjli.so+0x7950] InitializeJVM+0x100 > > > On 10/2/17 7:55 AM, coleen.phillimore at oracle.com wrote: > > > > I can sponsor this for you once you rebase, and fix these compilation > errors. > > Thanks, > > Coleen > > > > On 9/30/17 12:28 AM, Volker Simonis wrote: > >> Hi Vladimir, > >> > >> thanks a lot for remembering these changes! > >> > >> Regards, > >> Volker > >> > >> > >> Vladimir Kozlov vladimir.kozlov at oracle.com>> schrieb am Fr. 29. Sep. 2017 um 15:47: > >> > >> I hit build failure when tried to push changes: > >> > >> src/hotspot/share/code/codeBlob.hpp(162) : warning C4267: '=' : > conversion from 'size_t' to 'int', possible loss of data > >> src/hotspot/share/code/codeBlob.hpp(163) : warning C4267: '=' : > conversion from 'size_t' to 'int', possible loss of data > >> > >> I am going to fix it by casting (int): > >> > >> + void adjust_size(size_t used) { > >> + _size = (int)used; > >> + _data_offset = (int)used; > >> + _code_end = (address)this + used; > >> + _data_end = (address)this + used; > >> + } > >> > >> Note, CodeCache size can't more than 2Gb (max_int) so such casting > is fine. > >> > >> Vladimir > >> > >> On 9/6/17 6:20 AM, Volker Simonis wrote: > >> > On Tue, Sep 5, 2017 at 9:36 PM, > wrote: > >> >> > >> >> I was going to make the same comment about the friend > declaration in v1, so > >> >> v2 looks better to me. Looks good. Thank you for finding a > solution to > >> >> this problem that we've had for a long time. I will sponsor > this (remind me > >> >> if I forget after the 18th). > >> >> > >> > > >> > Thanks Coleen! 
I've updated > >> > > >> > http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/ < > http://cr.openjdk.java.net/%7Esimonis/webrevs/2017/8166317.v2/> > >> > > >> > in-place and added you as a second reviewer. > >> > > >> > Regards, > >> > Volker > >> > > >> > > >> >> thanks, > >> >> Coleen > >> >> > >> >> > >> >> > >> >> On 9/5/17 1:17 PM, Vladimir Kozlov wrote: > >> >>> > >> >>> On 9/5/17 9:49 AM, Volker Simonis wrote: > >> >>>> > >> >>>> On Fri, Sep 1, 2017 at 6:16 PM, Vladimir Kozlov > >> >>>> > > wrote: > >> >>>>> > >> >>>>> May be add new CodeBlob's method to adjust sizes instead of > directly > >> >>>>> setting > >> >>>>> them in CodeCache::free_unused_tail(). Then you would not > need friend > >> >>>>> class > >> >>>>> CodeCache in CodeBlob. > >> >>>>> > >> >>>> > >> >>>> Changed as suggested (I didn't liked the friend declaration as > well :) > >> >>>> > >> >>>>> Also I think adjustment to header_size should be done in > >> >>>>> CodeCache::free_unused_tail() to limit scope of code who > knows about > >> >>>>> blob > >> >>>>> layout. > >> >>>>> > >> >>>> > >> >>>> Yes, that's much cleaner. Please find the updated webrev here: > >> >>>> > >> >>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/ < > http://cr.openjdk.java.net/%7Esimonis/webrevs/2017/8166317.v2/> > >> >>> > >> >>> > >> >>> Good. > >> >>> > >> >>>> > >> >>>> I've also found another "day 1" problem in StubQueue::next(): > >> >>>> > >> >>>> Stub* next(Stub* s) const { int i = > >> >>>> index_of(s) + stub_size(s); > >> >>>> - if (i == > >> >>>> _buffer_limit) i = 0; > >> >>>> + // Only wrap > >> >>>> around in the non-contiguous case (see stubss.cpp) > >> >>>> + if (i == > >> >>>> _buffer_limit && _queue_end < _buffer_limit) i = 0; > >> >>>> return (i == > >> >>>> _queue_end) ? NULL : stub_at(i); > >> >>>> } > >> >>>> > >> >>>> The problem was that the method was not prepared to handle the > case > >> >>>> where _buffer_limit == _queue_end == _buffer_size which lead > to an > >> >>>> infinite recursion when iterating over a StubQueue with > >> >>>> StubQueue::next() until next() returns NULL (as this was for > example > >> >>>> done with -XX:+PrintInterpreter). But with the new, trimmed > CodeBlob > >> >>>> we run into exactly this situation. > >> >>> > >> >>> > >> >>> Okay. > >> >>> > >> >>>> > >> >>>> While doing this last fix I also noticed that > "StubQueue::stubs_do()", > >> >>>> "StubQueue::queues_do()" and "StubQueue::register_queue()" > don't seem > >> >>>> to be used anywhere in the open code base (please correct me > if I'm > >> >>>> wrong). What do you think, maybe we should remove this code in > a > >> >>>> follow up change if it is really not needed? > >> >>> > >> >>> > >> >>> register_queue() is used in constructor. Other 2 you can remove. > >> >>> stub_code_begin() and stub_code_end() are not used too -remove. > >> >>> I thought we run on linux with flag which warn about unused > code. > >> >>> > >> >>>> > >> >>>> Finally, could you please run the new version through JPRT and > sponsor > >> >>>> it once jdk10/hs will be opened again? > >> >>> > >> >>> > >> >>> Will do when jdk10 "consolidation" is finished. Please, remind > me later if > >> >>> I forget. 
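As a side note for anyone following along, here is a standalone toy model of the StubQueue::next() corner case described above. It is a simplified sketch, not the real StubQueue (the field names just mirror the discussion), but it shows why the extra _queue_end < _buffer_limit test is needed once a trimmed, completely full queue can have _buffer_limit == _queue_end == _buffer_size:

#include <cstdio>

// Simplified stand-in for StubQueue: indices are in segments, and a stub of
// size s starting at index i is followed by the stub at index i + s.
struct ToyStubQueue {
  int _buffer_size;   // total capacity
  int _buffer_limit;  // end of the used part of the buffer
  int _queue_end;     // one past the last stub

  // Returns the index of the next stub, or -1 (standing in for NULL) at the end.
  int next(int i, int stub_size) const {
    i += stub_size;
    // Only wrap around in the non-contiguous case; without the second test a
    // full, trimmed queue (_queue_end == _buffer_limit) wraps back to 0 and
    // never reaches _queue_end.
    if (i == _buffer_limit && _queue_end < _buffer_limit) i = 0;
    return (i == _queue_end) ? -1 : i;
  }
};

int main() {
  ToyStubQueue q{8, 8, 8};  // trimmed queue: limit == end == size
  for (int i = 0; i != -1; i = q.next(i, 2)) {
    std::printf("stub at %d\n", i);  // 0, 2, 4, 6, then the loop terminates
  }
  return 0;
}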
> >> >>> > >> >>> Thanks, > >> >>> Vladimir > >> >>> > >> >>>> > >> >>>> Thanks, > >> >>>> Volker > >> >>>> > >> >>>>> Thanks, > >> >>>>> Vladimir > >> >>>>> > >> >>>>> > >> >>>>> On 9/1/17 8:46 AM, Volker Simonis wrote: > >> >>>>>> > >> >>>>>> > >> >>>>>> Hi, > >> >>>>>> > >> >>>>>> I've decided to split the fix for the > 'CodeHeap::contains_blob()' > >> >>>>>> problem into its own issue "8187091: > ReturnBlobToWrongHeapTest fails > >> >>>>>> because of problems in CodeHeap::contains_blob()" > >> >>>>>> (https://bugs.openjdk.java.net/browse/JDK-8187091) and > started a new > >> >>>>>> review thread for discussing it at: > >> >>>>>> > >> >>>>>> > >> >>>>>> > http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html > >> >>>>>> > >> >>>>>> So please lets keep this thread for discussing the > interpreter code > >> >>>>>> size issue only. I've prepared a new version of the webrev > which is > >> >>>>>> the same as the first one with the only difference that the > change to > >> >>>>>> 'CodeHeap::contains_blob()' has been removed: > >> >>>>>> > >> >>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/ > > >> >>>>>> > >> >>>>>> Thanks, > >> >>>>>> Volker > >> >>>>>> > >> >>>>>> > >> >>>>>> On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis > >> >>>>>> > > wrote: > >> >>>>>>> > >> >>>>>>> > >> >>>>>>> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov > >> >>>>>>> vladimir.kozlov at oracle.com>> wrote: > >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> Very good change. Thank you, Volker. > >> >>>>>>>> > >> >>>>>>>> About contains_blob(). The problem is that > AOTCompiledMethod > >> >>>>>>>> allocated > >> >>>>>>>> in > >> >>>>>>>> CHeap and not in aot code section (which is RO): > >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> > http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 > >> >>>>>>>> > >> >>>>>>>> It is allocated in CHeap after AOT library is loaded. Its > >> >>>>>>>> code_begin() > >> >>>>>>>> points to AOT code section but AOTCompiledMethod* points > outside it > >> >>>>>>>> (to > >> >>>>>>>> normal malloced space) so you can't use (char*)blob > address. > >> >>>>>>>> > >> >>>>>>> > >> >>>>>>> Thanks for the explanation - now I got it. > >> >>>>>>> > >> >>>>>>>> There are 2 ways to fix it, I think. > >> >>>>>>>> One is to add new field to CodeBlobLayout and set it to > blob* address > >> >>>>>>>> for > >> >>>>>>>> normal CodeCache blobs and to code_begin for AOT code. > >> >>>>>>>> Second is to use contains(blob->code_end() - 1) assuming > that AOT > >> >>>>>>>> code > >> >>>>>>>> is > >> >>>>>>>> never zero. > >> >>>>>>>> > >> >>>>>>> > >> >>>>>>> I'll give it a try tomorrow and will send out a new webrev. 
> >> >>>>>>> > >> >>>>>>> Regards, > >> >>>>>>> Volker > >> >>>>>>> > >> >>>>>>>> Thanks, > >> >>>>>>>> Vladimir > >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> On 8/31/17 5:43 AM, Volker Simonis wrote: > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad > >> >>>>>>>>> claes.redestad at oracle.com>> wrote: > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> On 2017-08-31 08:54, Volker Simonis wrote: > >> >>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>>>> While working on this, I found another problem which is > related to > >> >>>>>>>>>>> the > >> >>>>>>>>>>> fix of JDK-8183573 and leads to crashes when executing > the JTreg > >> >>>>>>>>>>> test > >> >>>>>>>>>>> > compiler/codecache/stress/ReturnBlobToWrongHeapTest.java. > >> >>>>>>>>>>> > >> >>>>>>>>>>> The problem is that JDK-8183573 replaced > >> >>>>>>>>>>> > >> >>>>>>>>>>> virtual bool contains_blob(const CodeBlob* blob) > const { > >> >>>>>>>>>>> return > >> >>>>>>>>>>> low_boundary() <= (char*) blob && (char*) blob < > high(); } > >> >>>>>>>>>>> > >> >>>>>>>>>>> by: > >> >>>>>>>>>>> > >> >>>>>>>>>>> bool contains_blob(const CodeBlob* blob) const { > return > >> >>>>>>>>>>> contains(blob->code_begin()); } > >> >>>>>>>>>>> > >> >>>>>>>>>>> But that my be wrong in the corner case where the size > of the > >> >>>>>>>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists > only of the > >> >>>>>>>>>>> 'header' - i.e. the C++ object itself) because in that > case > >> >>>>>>>>>>> CodeBlob::code_begin() points right behind the > CodeBlob's header > >> >>>>>>>>>>> which > >> >>>>>>>>>>> is a memory location which doesn't belong to the > CodeBlob anymore. > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> I recall this change was somehow necessary to allow > merging > >> >>>>>>>>>> AOTCodeHeap::contains_blob and CodeHead::contains_blob > into > >> >>>>>>>>>> one devirtualized method, so you need to ensure all AOT > tests > >> >>>>>>>>>> pass with this change (on linux-x64). > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> All of hotspot/test/aot and hotspot/test/jvmci executed > and passed > >> >>>>>>>>> successful. Are there any other tests I should check? > >> >>>>>>>>> > >> >>>>>>>>> That said, it is a little hard to follow the stages of > your change. > >> >>>>>>>>> It > >> >>>>>>>>> seems like > >> >>>>>>>>> > http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/ < > http://cr.openjdk.java.net/%7Eredestad/scratch/codeheap_contains.00/> > >> >>>>>>>>> was reviewed [1] but then finally the slightly changed > version from > >> >>>>>>>>> > http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ < > http://cr.openjdk.java.net/%7Eredestad/scratch/codeheap_contains.01/> > >> >>>>>>>>> was > >> >>>>>>>>> checked in and linked to the bug report. > >> >>>>>>>>> > >> >>>>>>>>> The first, reviewed version of the change still had a > correct > >> >>>>>>>>> version > >> >>>>>>>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while > the second, > >> >>>>>>>>> checked in version has the faulty version of that method. > >> >>>>>>>>> > >> >>>>>>>>> I don't know why you finally did that change to > 'contains_blob()' > >> >>>>>>>>> but > >> >>>>>>>>> I don't see any reason why we shouldn't be able to > directly use the > >> >>>>>>>>> blob's address for inclusion checking. 
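To make the corner case concrete, here is a toy illustration with simplified types (this is not the real CodeHeap/CodeBlob code, just the pointer arithmetic):

#include <cassert>
#include <cstddef>

// With an empty payload, code_begin() is the first byte *after* the blob, so a
// half-open [low, high) containment test on code_begin() can miss the heap the
// blob actually lives in.
struct ToyHeap {
  const char* _low;
  const char* _high;  // one past the last byte of the heap
  bool contains(const void* p) const {
    const char* c = static_cast<const char*>(p);
    return _low <= c && c < _high;
  }
};

struct ToyBlob {
  std::size_t _header_size;  // the C++ object itself
  std::size_t _code_size;    // payload, may be zero
  const char* code_begin() const {
    return reinterpret_cast<const char*>(this) + _header_size;
  }
};

int main() {
  ToyBlob blob{sizeof(ToyBlob), 0};  // zero-sized payload, blob fills the whole "heap"
  ToyHeap heap{reinterpret_cast<const char*>(&blob),
               reinterpret_cast<const char*>(&blob) + sizeof(blob)};
  assert(heap.contains(&blob));               // the blob address itself is inside
  assert(!heap.contains(blob.code_begin()));  // code_begin() already points past it
  return 0;
}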
From what I > understand, it > >> >>>>>>>>> should ALWAYS be contained in the corresponding CodeHeap > so no > >> >>>>>>>>> reason > >> >>>>>>>>> to mess with 'CodeBlob::code_begin()'. > >> >>>>>>>>> > >> >>>>>>>>> Please let me know if I'm missing something. > >> >>>>>>>>> > >> >>>>>>>>> [1] > >> >>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html > >> >>>>>>>>> > >> >>>>>>>>>> I can't help to wonder if we'd not be better served by > disallowing > >> >>>>>>>>>> zero-sized payloads. Is this something that can ever > actually > >> >>>>>>>>>> happen except by abuse of the white box API? > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>> The corresponding test (ReturnBlobToWrongHeapTest.java) > specifically > >> >>>>>>>>> wants to allocate "segment sized" blocks which is most > easily > >> >>>>>>>>> achieved > >> >>>>>>>>> by allocation zero-sized CodeBlobs. And I think there's > nothing > >> >>>>>>>>> wrong > >> >>>>>>>>> about it if we handle the inclusion tests correctly. > >> >>>>>>>>> > >> >>>>>>>>> Thank you and best regards, > >> >>>>>>>>> Volker > >> >>>>>>>>> > >> >>>>>>>>>> /Claes > >> >> > >> >> > >> > > > From patric.hedlin at oracle.com Wed Oct 4 09:04:18 2017 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Wed, 4 Oct 2017 11:04:18 +0200 Subject: JDK10/RFR(M): 8172232: SPARC ISA/CPU feature detection is broken/insufficient (on Linux). In-Reply-To: <9f2896ca-65dc-557f-793c-4235499cc340@oracle.com> References: <7d5e1ebb-7de8-66f1-a1f0-db465bcad4ab@oracle.com> <9f2896ca-65dc-557f-793c-4235499cc340@oracle.com> Message-ID: <3fcc865d-3eda-a341-e112-8417711ee3e5@oracle.com> Thanks for reviewing Vladimir. On 09/29/2017 08:56 PM, Vladimir Kozlov wrote: > In general it is fine. Few notes. > You use ifdef DEBUG_SPARC_CAPS which is undefed at the beginning. Is > it set by gcc by default? > Removed. > Coding style for methods definitions - open parenthesis should be on > the same line: > > + bool match(const char* s) const > + { > Updated/re-formated. Refreshed webrev. @Adrian: Please validate. Best regards, Patric > Thanks, > Vladimir > > On 9/29/17 6:08 AM, Patric Hedlin wrote: >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8172232 >> >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8172232/ >> >> >> 8172232: SPARC ISA/CPU feature detection is broken/insufficient (on >> Linux). >> >> Subsumes (duplicate) JDK-8186579: >> VM_Version::platform_features() needs update on linux-sparc. >> >> >> Caveat: >> >> This update will introduce some redundancies into the code base, >> features and definitions >> currently not used, addressed by subsequent bug or feature >> updates/patches. Fujitsu HW is >> treated very conservatively. >> >> >> Testing: >> >> JDK9/JDK10 local jtreg/hotspot >> >> >> Thanks to Adrian for additional test (and review) support. >> >> Tested-By: John Paul Adrian Glaubitz >> >> >> Best regards, >> Patric >> From glaubitz at physik.fu-berlin.de Wed Oct 4 09:39:35 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Wed, 4 Oct 2017 11:39:35 +0200 Subject: JDK10/RFR(M): 8172232: SPARC ISA/CPU feature detection is broken/insufficient (on Linux). 
In-Reply-To: <3fcc865d-3eda-a341-e112-8417711ee3e5@oracle.com> References: <7d5e1ebb-7de8-66f1-a1f0-db465bcad4ab@oracle.com> <9f2896ca-65dc-557f-793c-4235499cc340@oracle.com> <3fcc865d-3eda-a341-e112-8417711ee3e5@oracle.com> Message-ID: <55211504-0f3e-52a0-0930-f34babb5da14@physik.fu-berlin.de> On 10/04/2017 11:04 AM, Patric Hedlin wrote: > Refreshed webrev. > > @Adrian: Please validate. Done. Both the server and the zero variant build fine on linux-sparc with the updated webrev, hence: Tested-By: John Paul Adrian Glaubitz Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `. `' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From patric.hedlin at oracle.com Wed Oct 4 09:39:56 2017 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Wed, 4 Oct 2017 11:39:56 +0200 Subject: JDK10/RFR(M): 8172232: SPARC ISA/CPU feature detection is broken/insufficient (on Linux). In-Reply-To: <55211504-0f3e-52a0-0930-f34babb5da14@physik.fu-berlin.de> References: <7d5e1ebb-7de8-66f1-a1f0-db465bcad4ab@oracle.com> <9f2896ca-65dc-557f-793c-4235499cc340@oracle.com> <3fcc865d-3eda-a341-e112-8417711ee3e5@oracle.com> <55211504-0f3e-52a0-0930-f34babb5da14@physik.fu-berlin.de> Message-ID: Thanks Adrian. /Patric On 10/04/2017 11:39 AM, John Paul Adrian Glaubitz wrote: > On 10/04/2017 11:04 AM, Patric Hedlin wrote: >> Refreshed webrev. >> >> @Adrian: Please validate. > Done. Both the server and the zero variant build fine on linux-sparc > with the updated webrev, hence: > > Tested-By: John Paul Adrian Glaubitz > > Adrian > From glaubitz at physik.fu-berlin.de Wed Oct 4 09:58:17 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Wed, 4 Oct 2017 11:58:17 +0200 Subject: JDK10/RFR(M): 8172232: SPARC ISA/CPU feature detection is broken/insufficient (on Linux). In-Reply-To: References: <7d5e1ebb-7de8-66f1-a1f0-db465bcad4ab@oracle.com> <9f2896ca-65dc-557f-793c-4235499cc340@oracle.com> <3fcc865d-3eda-a341-e112-8417711ee3e5@oracle.com> <55211504-0f3e-52a0-0930-f34babb5da14@physik.fu-berlin.de> Message-ID: <2d1fd501-8ba3-7591-a360-2cdc114cfbe9@physik.fu-berlin.de> On 10/04/2017 11:39 AM, Patric Hedlin wrote: > Thanks Adrian. Thank you for your work on this :-). Hope this gets merged soon. After that, the linux-sparc builds won't need any external patches downstream anymore. Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `. `' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From HORIE at jp.ibm.com Wed Oct 4 10:13:58 2017 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Wed, 4 Oct 2017 19:13:58 +0900 Subject: RFR(S):8188757:PPC64:Disable VSR52-63 in ppc.ad Message-ID: Dear all, Would you please review the following change in hs? Bug: https://bugs.openjdk.java.net/browse/JDK-8188757 Webrev: http://cr.openjdk.java.net/~mhorie/8188757/webrev.00/ This change disables VSR52-63 because currently there is no support for these registers to be properly treated as nonvolatile. Also, this change removes redundant logical or with 1u to enforce to use VSR32- registers in assembler_ppc.inline.hpp, which was done in my previous webrev for 8188139. 
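For readers who do not have the VSX encoding in their head, a rough sketch of why the extra or with 1u is redundant. The exact instruction fields differ per opcode and this is not the assembler_ppc code, only the general idea: a 6-bit VSR number is encoded as a 5-bit field plus an extension bit, and VSR32-63 are exactly the registers whose extension bit is already 1, so forcing that bit is unnecessary once the full register number is passed through.

#include <cassert>
#include <cstdint>

struct VsxRegOperand {
  uint32_t field5;  // low five bits of the VSR number
  uint32_t ext;     // extension bit: 0 selects VSR0-31, 1 selects VSR32-63
};

inline VsxRegOperand encode_vsr(unsigned vsr) {
  assert(vsr < 64);
  return VsxRegOperand{ vsr & 0x1fu, (vsr >> 5) & 1u };
}

int main() {
  assert(encode_vsr(52).ext == 1);  // VSR52 already lands in the upper half
  assert(encode_vsr(20).ext == 0);  // VSR20 stays in the lower half
  return 0;
}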
Best regards, -- Michihiro, IBM Research - Tokyo From martin.doerr at sap.com Wed Oct 4 12:05:49 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 4 Oct 2017 12:05:49 +0000 Subject: RFR(S):8188757:PPC64:Disable VSR52-63 in ppc.ad In-Reply-To: References: Message-ID: <47f7c8e22e364223b3f049998cf2506f@sap.com> Hi Michihiro, thanks for fixing it so quickly. Reviewed and pushed. Best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Mittwoch, 4. Oktober 2017 12:14 To: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; Doerr, Martin Cc: Simonis, Volker ; Hiroshi H Horii ; Kazunori Ogata ; Gustavo Romero Subject: RFR(S):8188757:PPC64:Disable VSR52-63 in ppc.ad Dear all, Would you please review the following change in hs? Bug: https://bugs.openjdk.java.net/browse/JDK-8188757 Webrev: http://cr.openjdk.java.net/~mhorie/8188757/webrev.00/ This change disables VSR52-63 because currently there is no support for these registers to be properly treated as nonvolatile. Also, this change removes redundant logical or with 1u to enforce to use VSR32- registers in assembler_ppc.inline.hpp, which was done in my previous webrev for 8188139. Best regards, -- Michihiro, IBM Research - Tokyo From coleen.phillimore at oracle.com Wed Oct 4 12:08:43 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 4 Oct 2017 08:08:43 -0400 Subject: RFR (M): 8188224: Generalize Atomic::load/store to use templates In-Reply-To: <59D38963.2070806@oracle.com> References: <59D38293.7030800@oracle.com> <712e1c4e-b38b-11c3-4b51-d88f1560a063@oracle.com> <59D38963.2070806@oracle.com> Message-ID: So this change is becoming more familiar but I think it's because the comment is repeated now for cmpxchg, add, and now load and store.?? My scanning ability is too limited to spot the differences.? I don't like the duplicated comments at all. I don't know if this is possible and not with this change, but I think there should be a class platformAtomic.hpp which consolidates these comments and moves the platform* stuff out of atomic.hpp, to be included or subclassed by atomic.hpp.? Then we can find our desired Atomic::blah functions again.?? I would like an RFE for this. Otherwise, I've pattern matched this and it seems correct and am fine with checking this in. http://cr.openjdk.java.net/~eosterlund/8188224/webrev.01/src/hotspot/os_cpu/windows_x86/atomic_windows_x86.hpp.udiff.html These changes I really like because now we don't have to go hunting to see that atomic::load/store is just *thing. Thanks! Coleen On 10/3/17 8:58 AM, Erik ?sterlund wrote: > Hi David, > > Thanks for the review. > The comments have been removed. > > New full webrev: > http://cr.openjdk.java.net/~eosterlund/8188224/webrev.01/ > > New incremental webrev: > http://cr.openjdk.java.net/~eosterlund/8188224/webrev.00_01/ > > Thanks, > /Erik > > On 2017-10-03 14:44, David Holmes wrote: >> Hi Erik, >> >> A lot of jumping through hoops just to do a direct load/store in the >> bulk of cases - but okay, we're embracing templates. >> >> 66?? // Atomically store to a location >> 67?? // See comment above about using jlong atomics on 32-bit platforms >> >> The comment at #67 and the equivalent one for load can be deleted. >> The "comment above" should only be referring to r-m-w atomic ops not >> basic load and store. 
All platforms must have a means to do atomic >> load/store of 64-bit due to Java volatile variables (eg by using >> floating-point unit on 32-bit) but may not have cmpxchg<8> >> capability. (I failed to convince the author of this when those >> comments went in. ;-) ) >> >> Cheers, >> David >> >> On 3/10/2017 10:29 PM, Erik ?sterlund wrote: >>> Hi, >>> >>> The time has come to generalize Atomic::load/store with templates - >>> the last operation to generalize in Atomic. >>> The design was inspired by Atomic::xchg and uses a similar mechanism >>> to validate the passed in arguments. It was also designed with >>> coming OrderAccess changes in mind. OrderAccess also contains loads >>> and stores that will reuse the LoadImpl and StoreImpl infrastructure >>> in Atomic::load/store. (the type checking for what is okay to pass >>> in to Atomic::load/store is very much the same for >>> OrderAccess::load_acquire/*store*). >>> >>> One thing worth mentioning is that the bsd zero port (but notably >>> not the linux zero port) had a leading fence for atomic stores of >>> jint when #if !defined(ARM) && !defined(M68K) is true without any >>> comment describing why. So I took the liberty of removing it. Atomic >>> should not have any fencing at all - that is what OrderAccess is >>> for. In fact Atomic does not promise any memory ordering semantics >>> for loads and stores. Atomic merely provides relaxed accesses that >>> are atomic. Worth mentioning nevertheless in case anyone wants to >>> keep that jint Atomic::store fence on bsd zero !M68K && !ARM. >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8188224 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~eosterlund/8188224/webrev.00/ >>> >>> Testing: JPRT, mach5 hs-tier3 >>> >>> Thanks, >>> /Erik > From coleen.phillimore at oracle.com Wed Oct 4 12:09:55 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 4 Oct 2017 08:09:55 -0400 Subject: RFR (M): 8188224: Generalize Atomic::load/store to use templates In-Reply-To: References: <59D38293.7030800@oracle.com> <712e1c4e-b38b-11c3-4b51-d88f1560a063@oracle.com> <59D38963.2070806@oracle.com> Message-ID: <29af85a5-28c4-c617-abb8-42cda7dea371@oracle.com> On 10/4/17 8:08 AM, coleen.phillimore at oracle.com wrote: > > So this change is becoming more familiar but I think it's because the > comment is repeated now for cmpxchg, add, and now load and store.?? My > scanning ability is too limited to spot the differences.? I don't like > the duplicated comments at all. ^ long (> 1 line) > > I don't know if this is possible and not with this change, but I think > there should be a class platformAtomic.hpp which consolidates these > comments and moves the platform* stuff out of atomic.hpp, to be > included or subclassed by atomic.hpp.? Then we can find our desired > Atomic::blah functions again.?? I would like an RFE for this. > > Otherwise, I've pattern matched this and it seems correct and am fine > with checking this in. > > http://cr.openjdk.java.net/~eosterlund/8188224/webrev.01/src/hotspot/os_cpu/windows_x86/atomic_windows_x86.hpp.udiff.html > > > These changes I really like because now we don't have to go hunting to > see that atomic::load/store is just *thing. > > Thanks! > Coleen > > On 10/3/17 8:58 AM, Erik ?sterlund wrote: >> Hi David, >> >> Thanks for the review. >> The comments have been removed. 
>> >> New full webrev: >> http://cr.openjdk.java.net/~eosterlund/8188224/webrev.01/ >> >> New incremental webrev: >> http://cr.openjdk.java.net/~eosterlund/8188224/webrev.00_01/ >> >> Thanks, >> /Erik >> >> On 2017-10-03 14:44, David Holmes wrote: >>> Hi Erik, >>> >>> A lot of jumping through hoops just to do a direct load/store in the >>> bulk of cases - but okay, we're embracing templates. >>> >>> 66?? // Atomically store to a location >>> 67?? // See comment above about using jlong atomics on 32-bit platforms >>> >>> The comment at #67 and the equivalent one for load can be deleted. >>> The "comment above" should only be referring to r-m-w atomic ops not >>> basic load and store. All platforms must have a means to do atomic >>> load/store of 64-bit due to Java volatile variables (eg by using >>> floating-point unit on 32-bit) but may not have cmpxchg<8> >>> capability. (I failed to convince the author of this when those >>> comments went in. ;-) ) >>> >>> Cheers, >>> David >>> >>> On 3/10/2017 10:29 PM, Erik ?sterlund wrote: >>>> Hi, >>>> >>>> The time has come to generalize Atomic::load/store with templates - >>>> the last operation to generalize in Atomic. >>>> The design was inspired by Atomic::xchg and uses a similar >>>> mechanism to validate the passed in arguments. It was also designed >>>> with coming OrderAccess changes in mind. OrderAccess also contains >>>> loads and stores that will reuse the LoadImpl and StoreImpl >>>> infrastructure in Atomic::load/store. (the type checking for what >>>> is okay to pass in to Atomic::load/store is very much the same for >>>> OrderAccess::load_acquire/*store*). >>>> >>>> One thing worth mentioning is that the bsd zero port (but notably >>>> not the linux zero port) had a leading fence for atomic stores of >>>> jint when #if !defined(ARM) && !defined(M68K) is true without any >>>> comment describing why. So I took the liberty of removing it. >>>> Atomic should not have any fencing at all - that is what >>>> OrderAccess is for. In fact Atomic does not promise any memory >>>> ordering semantics for loads and stores. Atomic merely provides >>>> relaxed accesses that are atomic. Worth mentioning nevertheless in >>>> case anyone wants to keep that jint Atomic::store fence on bsd zero >>>> !M68K && !ARM. >>>> >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8188224 >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~eosterlund/8188224/webrev.00/ >>>> >>>> Testing: JPRT, mach5 hs-tier3 >>>> >>>> Thanks, >>>> /Erik >> > From erik.osterlund at oracle.com Wed Oct 4 13:06:17 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Wed, 4 Oct 2017 15:06:17 +0200 Subject: RFR (M): 8188224: Generalize Atomic::load/store to use templates In-Reply-To: References: <59D38293.7030800@oracle.com> <712e1c4e-b38b-11c3-4b51-d88f1560a063@oracle.com> <59D38963.2070806@oracle.com> Message-ID: <59D4DCC9.6080107@oracle.com> Hi Coleen, On 2017-10-04 14:08, coleen.phillimore at oracle.com wrote: > > So this change is becoming more familiar but I think it's because the > comment is repeated now for cmpxchg, add, and now load and store. My > scanning ability is too limited to spot the differences. I don't like > the duplicated comments at all. > > I don't know if this is possible and not with this change, but I think > there should be a class platformAtomic.hpp which consolidates these > comments and moves the platform* stuff out of atomic.hpp, to be > included or subclassed by atomic.hpp. 
Then we can find our desired > Atomic::blah functions again. I would like an RFE for this. I see what you are saying. When you think about it, is almost as if we want the comments themselves to be template expanded for each operation. (joking) I will file an RFE for this. > Otherwise, I've pattern matched this and it seems correct and am fine > with checking this in. > > http://cr.openjdk.java.net/~eosterlund/8188224/webrev.01/src/hotspot/os_cpu/windows_x86/atomic_windows_x86.hpp.udiff.html > > > These changes I really like because now we don't have to go hunting to > see that atomic::load/store is just *thing. Thank you for the review! /Erik > > Thanks! > Coleen > > On 10/3/17 8:58 AM, Erik ?sterlund wrote: >> Hi David, >> >> Thanks for the review. >> The comments have been removed. >> >> New full webrev: >> http://cr.openjdk.java.net/~eosterlund/8188224/webrev.01/ >> >> New incremental webrev: >> http://cr.openjdk.java.net/~eosterlund/8188224/webrev.00_01/ >> >> Thanks, >> /Erik >> >> On 2017-10-03 14:44, David Holmes wrote: >>> Hi Erik, >>> >>> A lot of jumping through hoops just to do a direct load/store in the >>> bulk of cases - but okay, we're embracing templates. >>> >>> 66 // Atomically store to a location >>> 67 // See comment above about using jlong atomics on 32-bit platforms >>> >>> The comment at #67 and the equivalent one for load can be deleted. >>> The "comment above" should only be referring to r-m-w atomic ops not >>> basic load and store. All platforms must have a means to do atomic >>> load/store of 64-bit due to Java volatile variables (eg by using >>> floating-point unit on 32-bit) but may not have cmpxchg<8> >>> capability. (I failed to convince the author of this when those >>> comments went in. ;-) ) >>> >>> Cheers, >>> David >>> >>> On 3/10/2017 10:29 PM, Erik ?sterlund wrote: >>>> Hi, >>>> >>>> The time has come to generalize Atomic::load/store with templates - >>>> the last operation to generalize in Atomic. >>>> The design was inspired by Atomic::xchg and uses a similar >>>> mechanism to validate the passed in arguments. It was also designed >>>> with coming OrderAccess changes in mind. OrderAccess also contains >>>> loads and stores that will reuse the LoadImpl and StoreImpl >>>> infrastructure in Atomic::load/store. (the type checking for what >>>> is okay to pass in to Atomic::load/store is very much the same for >>>> OrderAccess::load_acquire/*store*). >>>> >>>> One thing worth mentioning is that the bsd zero port (but notably >>>> not the linux zero port) had a leading fence for atomic stores of >>>> jint when #if !defined(ARM) && !defined(M68K) is true without any >>>> comment describing why. So I took the liberty of removing it. >>>> Atomic should not have any fencing at all - that is what >>>> OrderAccess is for. In fact Atomic does not promise any memory >>>> ordering semantics for loads and stores. Atomic merely provides >>>> relaxed accesses that are atomic. Worth mentioning nevertheless in >>>> case anyone wants to keep that jint Atomic::store fence on bsd zero >>>> !M68K && !ARM. 
>>>> >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8188224 >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~eosterlund/8188224/webrev.00/ >>>> >>>> Testing: JPRT, mach5 hs-tier3 >>>> >>>> Thanks, >>>> /Erik >> > From coleen.phillimore at oracle.com Wed Oct 4 13:34:44 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 4 Oct 2017 09:34:44 -0400 Subject: RFR (M): 8188224: Generalize Atomic::load/store to use templates In-Reply-To: <59D4DCC9.6080107@oracle.com> References: <59D38293.7030800@oracle.com> <712e1c4e-b38b-11c3-4b51-d88f1560a063@oracle.com> <59D38963.2070806@oracle.com> <59D4DCC9.6080107@oracle.com> Message-ID: <1610a769-7a46-7568-191f-5f480b7fea99@oracle.com> On 10/4/17 9:06 AM, Erik ?sterlund wrote: > Hi Coleen, > > On 2017-10-04 14:08, coleen.phillimore at oracle.com wrote: >> >> So this change is becoming more familiar but I think it's because the >> comment is repeated now for cmpxchg, add, and now load and store.?? >> My scanning ability is too limited to spot the differences.? I don't >> like the duplicated comments at all. >> >> I don't know if this is possible and not with this change, but I >> think there should be a class platformAtomic.hpp which consolidates >> these comments and moves the platform* stuff out of atomic.hpp, to be >> included or subclassed by atomic.hpp.? Then we can find our desired >> Atomic::blah functions again.?? I would like an RFE for this. > > I see what you are saying. When you think about it, is almost as if we > want the comments themselves to be template expanded for each > operation. (joking) LOL, I almost wrote this :) > I will file an RFE for this. Thanks! Coleen > > >> Otherwise, I've pattern matched this and it seems correct and am fine >> with checking this in. >> >> http://cr.openjdk.java.net/~eosterlund/8188224/webrev.01/src/hotspot/os_cpu/windows_x86/atomic_windows_x86.hpp.udiff.html >> >> >> These changes I really like because now we don't have to go hunting >> to see that atomic::load/store is just *thing. > > Thank you for the review! > > /Erik > >> >> Thanks! >> Coleen >> >> On 10/3/17 8:58 AM, Erik ?sterlund wrote: >>> Hi David, >>> >>> Thanks for the review. >>> The comments have been removed. >>> >>> New full webrev: >>> http://cr.openjdk.java.net/~eosterlund/8188224/webrev.01/ >>> >>> New incremental webrev: >>> http://cr.openjdk.java.net/~eosterlund/8188224/webrev.00_01/ >>> >>> Thanks, >>> /Erik >>> >>> On 2017-10-03 14:44, David Holmes wrote: >>>> Hi Erik, >>>> >>>> A lot of jumping through hoops just to do a direct load/store in >>>> the bulk of cases - but okay, we're embracing templates. >>>> >>>> 66?? // Atomically store to a location >>>> 67?? // See comment above about using jlong atomics on 32-bit >>>> platforms >>>> >>>> The comment at #67 and the equivalent one for load can be deleted. >>>> The "comment above" should only be referring to r-m-w atomic ops >>>> not basic load and store. All platforms must have a means to do >>>> atomic load/store of 64-bit due to Java volatile variables (eg by >>>> using floating-point unit on 32-bit) but may not have cmpxchg<8> >>>> capability. (I failed to convince the author of this when those >>>> comments went in. ;-) ) >>>> >>>> Cheers, >>>> David >>>> >>>> On 3/10/2017 10:29 PM, Erik ?sterlund wrote: >>>>> Hi, >>>>> >>>>> The time has come to generalize Atomic::load/store with templates >>>>> - the last operation to generalize in Atomic. 
>>>>> The design was inspired by Atomic::xchg and uses a similar >>>>> mechanism to validate the passed in arguments. It was also >>>>> designed with coming OrderAccess changes in mind. OrderAccess also >>>>> contains loads and stores that will reuse the LoadImpl and >>>>> StoreImpl infrastructure in Atomic::load/store. (the type checking >>>>> for what is okay to pass in to Atomic::load/store is very much the >>>>> same for OrderAccess::load_acquire/*store*). >>>>> >>>>> One thing worth mentioning is that the bsd zero port (but notably >>>>> not the linux zero port) had a leading fence for atomic stores of >>>>> jint when #if !defined(ARM) && !defined(M68K) is true without any >>>>> comment describing why. So I took the liberty of removing it. >>>>> Atomic should not have any fencing at all - that is what >>>>> OrderAccess is for. In fact Atomic does not promise any memory >>>>> ordering semantics for loads and stores. Atomic merely provides >>>>> relaxed accesses that are atomic. Worth mentioning nevertheless in >>>>> case anyone wants to keep that jint Atomic::store fence on bsd >>>>> zero !M68K && !ARM. >>>>> >>>>> Bug: >>>>> https://bugs.openjdk.java.net/browse/JDK-8188224 >>>>> >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~eosterlund/8188224/webrev.00/ >>>>> >>>>> Testing: JPRT, mach5 hs-tier3 >>>>> >>>>> Thanks, >>>>> /Erik >>> >> > From bob.vandette at oracle.com Wed Oct 4 18:14:29 2017 From: bob.vandette at oracle.com (Bob Vandette) Date: Wed, 4 Oct 2017 14:14:29 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> Message-ID: Robbin, I?ve looked into this issue and you are correct. I do have to examine both the sched_getaffinity results as well as the cgroup cpu subsystem configuration files in order to provide a reasonable value for active_processors. If I was only interested in cpusets, I could simply rely on the getaffinity call but I also want to factor in shares and quotas as well. I had assumed that when sched_setaffinity was called (in your case by numactl) that the cgroup cpu config files would be updated to reflect the current processor affinity for the running process. This is not correct. I have updated my changeset and have successfully run with your examples below. I?ll post a new webrev soon. Thanks, Bob. > >> I still want to include the flag for at least one Java release in the event that the new behavior causes some regression >> in behavior. I?m trying to make the detection robust so that it will fallback to the current behavior in the event >> that cgroups is not configured as expected but I?d like to have a way of forcing the issue. JDK 10 is not >> supposed to be a long term support release which makes it a good target for this new behavior. >> I agree with David that once we commit to cgroups, we should extract all VM configuration data from that >> source. There?s more information available for cpusets than just processor affinity that we might want to >> consider when calculating the number of processors to assume for the VM. There?s exclusivity and >> effective cpu data available in addition to the cpuset string. > > cgroup only contains limits, not the real hard limits. > You most consider the affinity mask. 
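To make that concrete, here is a rough sketch of the kind of combination being discussed. This is not the 8146115 patch; the cgroup v1 file names and the 1024-shares-per-cpu convention are assumptions taken from this thread, and it is Linux/glibc only. The idea is to start from the sched_getaffinity mask as the hard limit and then only narrow it by quota/period and shares, which also keeps the numactl examples quoted next working:

#include <sched.h>
#include <fstream>
#include <cstdio>

static long read_long(const char* path, long fallback) {
  std::ifstream in(path);
  long v = fallback;
  if (!(in >> v)) return fallback;
  return v;
}

int active_processor_count() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return 1;
  int limit = CPU_COUNT(&mask);  // what we can actually run on concurrently

  long quota  = read_long("/sys/fs/cgroup/cpu/cpu.cfs_quota_us",  -1);
  long period = read_long("/sys/fs/cgroup/cpu/cpu.cfs_period_us", -1);
  long shares = read_long("/sys/fs/cgroup/cpu/cpu.shares",        -1);

  if (quota > 0 && period > 0) {
    int quota_cpus = (int)((quota + period - 1) / period);  // round the fraction up
    if (quota_cpus < limit) limit = quota_cpus;
  }
  if (shares > 0) {
    // Convention from this thread: N*1024 shares means N cpus. Note that 1024
    // is also the unconfigured default, so a real implementation has to be
    // able to tell "not set" apart from "one cpu".
    int share_cpus = (int)(shares / 1024);
    if (share_cpus >= 1 && share_cpus < limit) limit = share_cpus;
  }
  return limit > 0 ? limit : 1;
}

int main() {
  std::printf("active processors: %d\n", active_processor_count());
  return 0;
}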
We that have numa nodes do: > > [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java -Xlog:os=debug -cp . ForEver | grep proc > [0.001s][debug][os] Initial active processor count set to 16 > [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | grep proc > [0.001s][debug][os] Initial active processor count set to 32 > > when benchmarking all the time and that must be set to 16 otherwise the flag is really bad for us. > So the flag actually breaks the little numa support we have now. > > Thanks, Robbin From robbin.ehn at oracle.com Wed Oct 4 18:30:34 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 4 Oct 2017 20:30:34 +0200 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> Message-ID: <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> Thanks Bob for looking into this. On 10/04/2017 08:14 PM, Bob Vandette wrote: > Robbin, > > I?ve looked into this issue and you are correct. I do have to examine both the > sched_getaffinity results as well as the cgroup cpu subsystem configuration > files in order to provide a reasonable value for active_processors. If I was only > interested in cpusets, I could simply rely on the getaffinity call but I also want to > factor in shares and quotas as well. We had a quick discussion at the office, we actually do think that you could skip reading the shares and quotas. It really depends on what the user expect, if he give us 4 cpu's with 50% or 2 full cpu what do he expect the differences would be? One could argue that he 'knows' that he will only use max 50% and thus we can act as if he is giving us 4 full cpu. But I'll leave that up to you, just a tough we had. > > I had assumed that when sched_setaffinity was called (in your case by numactl) that the > cgroup cpu config files would be updated to reflect the current processor affinity for the > running process. This is not correct. I have updated my changeset and have successfully > run with your examples below. I?ll post a new webrev soon. > I see, thanks again! /Robbin > Thanks, > Bob. > > >> >>> I still want to include the flag for at least one Java release in the event that the new behavior causes some regression >>> in behavior. I?m trying to make the detection robust so that it will fallback to the current behavior in the event >>> that cgroups is not configured as expected but I?d like to have a way of forcing the issue. JDK 10 is not >>> supposed to be a long term support release which makes it a good target for this new behavior. >>> I agree with David that once we commit to cgroups, we should extract all VM configuration data from that >>> source. There?s more information available for cpusets than just processor affinity that we might want to >>> consider when calculating the number of processors to assume for the VM. There?s exclusivity and >>> effective cpu data available in addition to the cpuset string. >> >> cgroup only contains limits, not the real hard limits. >> You most consider the affinity mask. We that have numa nodes do: >> >> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java -Xlog:os=debug -cp . 
ForEver | grep proc >> [0.001s][debug][os] Initial active processor count set to 16 >> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | grep proc >> [0.001s][debug][os] Initial active processor count set to 32 >> >> when benchmarking all the time and that must be set to 16 otherwise the flag is really bad for us. >> So the flag actually breaks the little numa support we have now. >> >> Thanks, Robbin > From bob.vandette at oracle.com Wed Oct 4 18:51:04 2017 From: bob.vandette at oracle.com (Bob Vandette) Date: Wed, 4 Oct 2017 14:51:04 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> Message-ID: > On Oct 4, 2017, at 2:30 PM, Robbin Ehn wrote: > > Thanks Bob for looking into this. > > On 10/04/2017 08:14 PM, Bob Vandette wrote: >> Robbin, >> I?ve looked into this issue and you are correct. I do have to examine both the >> sched_getaffinity results as well as the cgroup cpu subsystem configuration >> files in order to provide a reasonable value for active_processors. If I was only >> interested in cpusets, I could simply rely on the getaffinity call but I also want to >> factor in shares and quotas as well. > > We had a quick discussion at the office, we actually do think that you could skip reading the shares and quotas. > It really depends on what the user expect, if he give us 4 cpu's with 50% or 2 full cpu what do he expect the differences would be? > One could argue that he 'knows' that he will only use max 50% and thus we can act as if he is giving us 4 full cpu. > But I'll leave that up to you, just a tough we had. It?s my opinion that we should do something if someone makes the effort to configure their containers to use quotas or shares. There are many different opinions on what the right that right ?something? is. Many developers that are trying to deploy apps that use containers say they don?t like cpusets. This is too limiting for them especially when the server configurations vary within their organization. From everything I?ve read including source code, there seems to be a consensus that shares and quotas are being used as a way to specify a fraction of a system (number of cpus). Docker added ?cpus which is implemented using quotas and periods. They adjust these two parameters to provide a way of calculating the number of cpus that will be available to a process (quota/period). Amazon also documents that cpu shares are defined to be a multiple of 1024. Where 1024 represents a single cpu and a share value of N*1024 represents N cpus. Of course these are just conventions. This is why I provided a way of specifying the number of CPUs so folks deploying Java services can be certain they get what they want. Bob. > >> I had assumed that when sched_setaffinity was called (in your case by numactl) that the >> cgroup cpu config files would be updated to reflect the current processor affinity for the >> running process. This is not correct. I have updated my changeset and have successfully >> run with your examples below. I?ll post a new webrev soon. > > I see, thanks again! 
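As a concrete illustration of the two conventions discussed in this thread, with invented values rather than anything from the webrev: docker run --cpus=1.5 ends up as quota=150000us over the default period=100000us, and --cpu-shares=2048 reads as two CPUs under the N*1024 convention.

#include <cstdio>

int main() {
  long quota = 150000, period = 100000, shares = 2048;  // assumed example values
  std::printf("quota/period -> %.1f cpus\n", (double)quota / (double)period);  // 1.5
  std::printf("shares/1024  -> %.1f cpus\n", (double)shares / 1024.0);         // 2.0
  return 0;
}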
> > /Robbin > >> Thanks, >> Bob. >>> >>>> I still want to include the flag for at least one Java release in the event that the new behavior causes some regression >>>> in behavior. I?m trying to make the detection robust so that it will fallback to the current behavior in the event >>>> that cgroups is not configured as expected but I?d like to have a way of forcing the issue. JDK 10 is not >>>> supposed to be a long term support release which makes it a good target for this new behavior. >>>> I agree with David that once we commit to cgroups, we should extract all VM configuration data from that >>>> source. There?s more information available for cpusets than just processor affinity that we might want to >>>> consider when calculating the number of processors to assume for the VM. There?s exclusivity and >>>> effective cpu data available in addition to the cpuset string. >>> >>> cgroup only contains limits, not the real hard limits. >>> You most consider the affinity mask. We that have numa nodes do: >>> >>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java -Xlog:os=debug -cp . ForEver | grep proc >>> [0.001s][debug][os] Initial active processor count set to 16 >>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | grep proc >>> [0.001s][debug][os] Initial active processor count set to 32 >>> >>> when benchmarking all the time and that must be set to 16 otherwise the flag is really bad for us. >>> So the flag actually breaks the little numa support we have now. >>> >>> Thanks, Robbin From ceeaspb at gmail.com Wed Oct 4 20:01:09 2017 From: ceeaspb at gmail.com (Alex Bagehot) Date: Wed, 4 Oct 2017 21:01:09 +0100 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> Message-ID: Hi, On Wed, Oct 4, 2017 at 7:51 PM, Bob Vandette wrote: > > > On Oct 4, 2017, at 2:30 PM, Robbin Ehn wrote: > > > > Thanks Bob for looking into this. > > > > On 10/04/2017 08:14 PM, Bob Vandette wrote: > >> Robbin, > >> I?ve looked into this issue and you are correct. I do have to examine > both the > >> sched_getaffinity results as well as the cgroup cpu subsystem > configuration > >> files in order to provide a reasonable value for active_processors. If > I was only > >> interested in cpusets, I could simply rely on the getaffinity call but > I also want to > >> factor in shares and quotas as well. > > > > We had a quick discussion at the office, we actually do think that you > could skip reading the shares and quotas. > > It really depends on what the user expect, if he give us 4 cpu's with > 50% or 2 full cpu what do he expect the differences would be? > > One could argue that he 'knows' that he will only use max 50% and thus > we can act as if he is giving us 4 full cpu. > > But I'll leave that up to you, just a tough we had. > > It?s my opinion that we should do something if someone makes the effort to > configure their > containers to use quotas or shares. There are many different opinions on > what the right that > right ?something? is. > It might be interesting to look at some real instances of how java might[3] be deployed in containers. 
Marathon/Mesos[1] and Kubernetes[2] use shares and quotas so this is a vast chunk of deployments that need both of them today. > > Many developers that are trying to deploy apps that use containers say > they don?t like > cpusets. This is too limiting for them especially when the server > configurations vary > within their organization. > True, however Kubernetes has an alpha feature[5] where it allocates cpusets to containers that request a whole number of cpus. Previously without cpusets any container could run on any cpu which we know might not be good for some workloads that want isolation. A request for a fractional or burstable amount of cpu would be allocated from a shared cpu pool. So although manual allocation of cpusets will be flakey[3] , automation should be able to make it work. > > From everything I?ve read including source code, there seems to be a > consensus that > shares and quotas are being used as a way to specify a fraction of a > system (number of cpus). > A refinement[6] on this is: Shares can be used for guaranteed cpu - you will always get your share. Quota[4] is a limit/constraint - you can never get more than the quota. So given the below limit of how many shares will be allocated on a host you can have burstable(or overcommit) capacity if your shares are less than your quota. > > Docker added ?cpus which is implemented using quotas and periods. They > adjust these > two parameters to provide a way of calculating the number of cpus that > will be available > to a process (quota/period). Amazon also documents that cpu shares are > defined to be a multiple of 1024. > Where 1024 represents a single cpu and a share value of N*1024 represents > N cpus. > Kubernetes and Mesos/Marathon also use the N*1024 shares per host to allocate resources automatically. Hopefully this provides some background on what a couple of orchestration systems that will be running java are doing currently in this area. Thanks, Alex [1] https://github.com/apache/mesos/commit/346cc8dd528a28a6e 1f1cbdb4c95b8bdea2f6070 / (now out of date but appears to be a reasonable intro : https://zcox.wordpress.com/2014/09/17/cpu-resources-in-docke r-mesos-and-marathon/ ) [1a] https://youtu.be/hJyAfC-Z2xk?t=2439 [2] https://kubernetes.io/docs/concepts/configuration/manage -compute-resources-container/ [3] https://youtu.be/w1rZOY5gbvk?t=2479 [4] https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt https://landley.net/kdocs/ols/2010/ols2010-pages-245-254.pdf https://lwn.net/Articles/428175/ [5] https://github.com/kubernetes/community/blob/43ce57ac476b9f2ce3f0220354a075e095a0d469/contributors/design-proposals/node/cpu-manager.md / https://github.com/kubernetes/kubernetes/commit/ 00f0e0f6504ad8dd85fcbbd6294cd7cf2475fc72 / https://vimeo.com/226858314 [6] https://kubernetes.io/docs/concepts/configuration/manage- compute-resources-container/#how-pods-with-resource-limits-are-run > Of course these are just conventions. This is why I provided a way of > specifying the > number of CPUs so folks deploying Java services can be certain they get > what they want. > > Bob. > > > > >> I had assumed that when sched_setaffinity was called (in your case by > numactl) that the > >> cgroup cpu config files would be updated to reflect the current > processor affinity for the > >> running process. This is not correct. I have updated my changeset and > have successfully > >> run with your examples below. I?ll post a new webrev soon. > > > > I see, thanks again! > > > > /Robbin > > > >> Thanks, > >> Bob. 
> >>> > >>>> I still want to include the flag for at least one Java release in the > event that the new behavior causes some regression > >>>> in behavior. I?m trying to make the detection robust so that it will > fallback to the current behavior in the event > >>>> that cgroups is not configured as expected but I?d like to have a way > of forcing the issue. JDK 10 is not > >>>> supposed to be a long term support release which makes it a good > target for this new behavior. > >>>> I agree with David that once we commit to cgroups, we should extract > all VM configuration data from that > >>>> source. There?s more information available for cpusets than just > processor affinity that we might want to > >>>> consider when calculating the number of processors to assume for the > VM. There?s exclusivity and > >>>> effective cpu data available in addition to the cpuset string. > >>> > >>> cgroup only contains limits, not the real hard limits. > >>> You most consider the affinity mask. We that have numa nodes do: > >>> > >>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java > -Xlog:os=debug -cp . ForEver | grep proc > >>> [0.001s][debug][os] Initial active processor count set to 16 > >>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java > -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | grep proc > >>> [0.001s][debug][os] Initial active processor count set to 32 > >>> > >>> when benchmarking all the time and that must be set to 16 otherwise > the flag is really bad for us. > >>> So the flag actually breaks the little numa support we have now. > >>> > >>> Thanks, Robbin > > From david.holmes at oracle.com Wed Oct 4 21:51:47 2017 From: david.holmes at oracle.com (David Holmes) Date: Thu, 5 Oct 2017 07:51:47 +1000 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> Message-ID: <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> Hi Alex, Can you tell me how shares/quotas are actually implemented in terms of allocating "cpus" to processes when shares/quotas are being applied? For example in a 12 cpu system if I have a 50% share do I get all 12 CPUs for 50% of a "quantum" each, or do I get 6 CPUs for a full quantum each? When we try to use the "number of processors" to control the number of threads created, or the number of partitions in a task, then we really want to know how many CPUs we can actually be concurrently running on! Thanks, David On 5/10/2017 6:01 AM, Alex Bagehot wrote: > Hi, > > On Wed, Oct 4, 2017 at 7:51 PM, Bob Vandette > wrote: > >> >>> On Oct 4, 2017, at 2:30 PM, Robbin Ehn wrote: >>> >>> Thanks Bob for looking into this. >>> >>> On 10/04/2017 08:14 PM, Bob Vandette wrote: >>>> Robbin, >>>> I?ve looked into this issue and you are correct. I do have to examine >> both the >>>> sched_getaffinity results as well as the cgroup cpu subsystem >> configuration >>>> files in order to provide a reasonable value for active_processors. If >> I was only >>>> interested in cpusets, I could simply rely on the getaffinity call but >> I also want to >>>> factor in shares and quotas as well. 
>>> >>> We had a quick discussion at the office, we actually do think that you >> could skip reading the shares and quotas. >>> It really depends on what the user expect, if he give us 4 cpu's with >> 50% or 2 full cpu what do he expect the differences would be? >>> One could argue that he 'knows' that he will only use max 50% and thus >> we can act as if he is giving us 4 full cpu. >>> But I'll leave that up to you, just a tough we had. >> >> It?s my opinion that we should do something if someone makes the effort to >> configure their >> containers to use quotas or shares. There are many different opinions on >> what the right that >> right ?something? is. >> > > It might be interesting to look at some real instances of how java might[3] > be deployed in containers. > Marathon/Mesos[1] and Kubernetes[2] use shares and quotas so this is a vast > chunk of deployments that need both of them today. > > >> >> Many developers that are trying to deploy apps that use containers say >> they don?t like >> cpusets. This is too limiting for them especially when the server >> configurations vary >> within their organization. >> > > True, however Kubernetes has an alpha feature[5] where it allocates cpusets > to containers that request a whole number of cpus. Previously without > cpusets any container could run on any cpu which we know might not be good > for some workloads that want isolation. A request for a fractional or > burstable amount of cpu would be allocated from a shared cpu pool. So > although manual allocation of cpusets will be flakey[3] , automation should > be able to make it work. > > >> >> From everything I?ve read including source code, there seems to be a >> consensus that >> shares and quotas are being used as a way to specify a fraction of a >> system (number of cpus). >> > > A refinement[6] on this is: > Shares can be used for guaranteed cpu - you will always get your share. > Quota[4] is a limit/constraint - you can never get more than the quota. > So given the below limit of how many shares will be allocated on a host you > can have burstable(or overcommit) capacity if your shares are less than > your quota. > > >> >> Docker added ?cpus which is implemented using quotas and periods. They >> adjust these >> two parameters to provide a way of calculating the number of cpus that >> will be available >> to a process (quota/period). Amazon also documents that cpu shares are >> defined to be a multiple of 1024. >> Where 1024 represents a single cpu and a share value of N*1024 represents >> N cpus. >> > > Kubernetes and Mesos/Marathon also use the N*1024 shares per host to > allocate resources automatically. > > Hopefully this provides some background on what a couple of orchestration > systems that will be running java are doing currently in this area. 
> Thanks, > Alex > > > [1] https://github.com/apache/mesos/commit/346cc8dd528a28a6e > 1f1cbdb4c95b8bdea2f6070 / (now out of date but appears to be a reasonable > intro : https://zcox.wordpress.com/2014/09/17/cpu-resources-in-docke > r-mesos-and-marathon/ ) > [1a] https://youtu.be/hJyAfC-Z2xk?t=2439 > > [2] https://kubernetes.io/docs/concepts/configuration/manage > -compute-resources-container/ > > [3] https://youtu.be/w1rZOY5gbvk?t=2479 > > [4] https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt > https://landley.net/kdocs/ols/2010/ols2010-pages-245-254.pdf > https://lwn.net/Articles/428175/ > > [5] > https://github.com/kubernetes/community/blob/43ce57ac476b9f2ce3f0220354a075e095a0d469/contributors/design-proposals/node/cpu-manager.md > / https://github.com/kubernetes/kubernetes/commit/ > 00f0e0f6504ad8dd85fcbbd6294cd7cf2475fc72 / https://vimeo.com/226858314 > > [6] https://kubernetes.io/docs/concepts/configuration/manage- > compute-resources-container/#how-pods-with-resource-limits-are-run > > >> Of course these are just conventions. This is why I provided a way of >> specifying the >> number of CPUs so folks deploying Java services can be certain they get >> what they want. >> >> Bob. >> >>> >>>> I had assumed that when sched_setaffinity was called (in your case by >> numactl) that the >>>> cgroup cpu config files would be updated to reflect the current >> processor affinity for the >>>> running process. This is not correct. I have updated my changeset and >> have successfully >>>> run with your examples below. I?ll post a new webrev soon. >>> >>> I see, thanks again! >>> >>> /Robbin >>> >>>> Thanks, >>>> Bob. >>>>> >>>>>> I still want to include the flag for at least one Java release in the >> event that the new behavior causes some regression >>>>>> in behavior. I?m trying to make the detection robust so that it will >> fallback to the current behavior in the event >>>>>> that cgroups is not configured as expected but I?d like to have a way >> of forcing the issue. JDK 10 is not >>>>>> supposed to be a long term support release which makes it a good >> target for this new behavior. >>>>>> I agree with David that once we commit to cgroups, we should extract >> all VM configuration data from that >>>>>> source. There?s more information available for cpusets than just >> processor affinity that we might want to >>>>>> consider when calculating the number of processors to assume for the >> VM. There?s exclusivity and >>>>>> effective cpu data available in addition to the cpuset string. >>>>> >>>>> cgroup only contains limits, not the real hard limits. >>>>> You most consider the affinity mask. We that have numa nodes do: >>>>> >>>>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java >> -Xlog:os=debug -cp . ForEver | grep proc >>>>> [0.001s][debug][os] Initial active processor count set to 16 >>>>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java >> -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | grep proc >>>>> [0.001s][debug][os] Initial active processor count set to 32 >>>>> >>>>> when benchmarking all the time and that must be set to 16 otherwise >> the flag is really bad for us. >>>>> So the flag actually breaks the little numa support we have now. 
>>>>> >>>>> Thanks, Robbin >> >> From vladimir.kozlov at oracle.com Wed Oct 4 23:05:33 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 4 Oct 2017 16:05:33 -0700 Subject: [10] RFR(S) 8188775: Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.org.graalvm.compiler.hotspot Message-ID: https://bugs.openjdk.java.net/browse/JDK-8188775 Changes for 8182701[1] missed changes in default.policy for new module jdk.internal.vm.compiler.management. Add missing code: src/java.base/share/lib/security/default.policy @@ -154,6 +154,10 @@ permission java.security.AllPermission; }; +grant codeBase "jrt:/jdk.internal.vm.compiler.management" { + permission java.security.AllPermission; +}; + grant codeBase "jrt:/jdk.jsobject" { permission java.security.AllPermission; }; Verified with failed test. Thanks, Vladimir [1] http://hg.openjdk.java.net/jdk10/hs/rev/8b2054b7d02c From mandy.chung at oracle.com Wed Oct 4 23:07:07 2017 From: mandy.chung at oracle.com (mandy chung) Date: Wed, 4 Oct 2017 16:07:07 -0700 Subject: [10] RFR(S) 8188775: Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.org.graalvm.compiler.hotspot In-Reply-To: References: Message-ID: <2e050a60-0f6e-503f-df39-31108f0da6d1@oracle.com> +1 Mandy On 10/4/17 4:05 PM, Vladimir Kozlov wrote: > https://bugs.openjdk.java.net/browse/JDK-8188775 > > Changes for 8182701[1] missed changes in default.policy for new module > jdk.internal.vm.compiler.management. > > Add missing code: > > src/java.base/share/lib/security/default.policy > @@ -154,6 +154,10 @@ > ???? permission java.security.AllPermission; > ?}; > > +grant codeBase "jrt:/jdk.internal.vm.compiler.management" { > +??? permission java.security.AllPermission; > +}; > + > ?grant codeBase "jrt:/jdk.jsobject" { > ???? permission java.security.AllPermission; > ?}; > > Verified with failed test. > > Thanks, > Vladimir > > [1] http://hg.openjdk.java.net/jdk10/hs/rev/8b2054b7d02c From vladimir.kozlov at oracle.com Wed Oct 4 23:12:27 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 4 Oct 2017 16:12:27 -0700 Subject: [10] RFR(XS) 8188776: jdk.internal.vm.ci can't export package to upgradeable modules Message-ID: https://bugs.openjdk.java.net/browse/JDK-8188776 8182701 added exports for jdk.vm.ci.runtime package [1] but did not add new exception in the test. Added missing exception in JdkQualifiedExportTest.java test: --- a/test/jdk/jdk/modules/etc/JdkQualifiedExportTest.java +++ b/test/jdk/jdk/modules/etc/JdkQualifiedExportTest.java @@ -70,6 +70,7 @@ static Set KNOWN_EXCEPTIONS = Set.of("jdk.internal.vm.ci/jdk.vm.ci.services", + "jdk.internal.vm.ci/jdk.vm.ci.runtime", "jdk.jsobject/jdk.internal.netscape.javascript.spi"); static void checkExports(ModuleDescriptor md) { Verified with this test. 
Thanks, Vladimir [1] http://hg.openjdk.java.net/jdk10/hs/rev/8b2054b7d02c#l3.1 From vladimir.kozlov at oracle.com Wed Oct 4 23:12:55 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 4 Oct 2017 16:12:55 -0700 Subject: [10] RFR(S) 8188775: Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.org.graalvm.compiler.hotspot In-Reply-To: <2e050a60-0f6e-503f-df39-31108f0da6d1@oracle.com> References: <2e050a60-0f6e-503f-df39-31108f0da6d1@oracle.com> Message-ID: <7a5e843d-4da1-25e8-d21c-908977707d4c@oracle.com> Thank you, Mandy Vladimir On 10/4/17 4:07 PM, mandy chung wrote: > +1 > > Mandy > > On 10/4/17 4:05 PM, Vladimir Kozlov wrote: >> https://bugs.openjdk.java.net/browse/JDK-8188775 >> >> Changes for 8182701[1] missed changes in default.policy for new module jdk.internal.vm.compiler.management. >> >> Add missing code: >> >> src/java.base/share/lib/security/default.policy >> @@ -154,6 +154,10 @@ >> ???? permission java.security.AllPermission; >> ?}; >> >> +grant codeBase "jrt:/jdk.internal.vm.compiler.management" { >> +??? permission java.security.AllPermission; >> +}; >> + >> ?grant codeBase "jrt:/jdk.jsobject" { >> ???? permission java.security.AllPermission; >> ?}; >> >> Verified with failed test. >> >> Thanks, >> Vladimir >> >> [1] http://hg.openjdk.java.net/jdk10/hs/rev/8b2054b7d02c > From mandy.chung at oracle.com Wed Oct 4 23:15:43 2017 From: mandy.chung at oracle.com (mandy chung) Date: Wed, 4 Oct 2017 16:15:43 -0700 Subject: [10] RFR(XS) 8188776: jdk.internal.vm.ci can't export package to upgradeable modules In-Reply-To: References: Message-ID: <1f6bdab4-8c1d-61a4-4abf-f294590e2eff@oracle.com> +1 Looks like JDK regression tests were not run before pushing JDK-8182701? Mandy On 10/4/17 4:12 PM, Vladimir Kozlov wrote: > https://bugs.openjdk.java.net/browse/JDK-8188776 > > 8182701 added exports for jdk.vm.ci.runtime package [1] but did not > add new exception in the test. > > Added missing exception in JdkQualifiedExportTest.java test: > > --- a/test/jdk/jdk/modules/etc/JdkQualifiedExportTest.java > +++ b/test/jdk/jdk/modules/etc/JdkQualifiedExportTest.java > @@ -70,6 +70,7 @@ > > ???? static Set KNOWN_EXCEPTIONS = > ???????? Set.of("jdk.internal.vm.ci/jdk.vm.ci.services", > +?????????????? "jdk.internal.vm.ci/jdk.vm.ci.runtime", > "jdk.jsobject/jdk.internal.netscape.javascript.spi"); > > ???? static void checkExports(ModuleDescriptor md) { > > Verified with this test. > > Thanks, > Vladimir > > [1] http://hg.openjdk.java.net/jdk10/hs/rev/8b2054b7d02c#l3.1 From vladimir.kozlov at oracle.com Wed Oct 4 23:34:23 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 4 Oct 2017 16:34:23 -0700 Subject: [10] RFR(XS) 8188776: jdk.internal.vm.ci can't export package to upgradeable modules In-Reply-To: <1f6bdab4-8c1d-61a4-4abf-f294590e2eff@oracle.com> References: <1f6bdab4-8c1d-61a4-4abf-f294590e2eff@oracle.com> Message-ID: Thank you, Mandy On 10/4/17 4:15 PM, mandy chung wrote: > +1 > > Looks like JDK regression tests were not run before pushing JDK-8182701? Yes, only hotspot jtreg tests were run unfortunately before the push. We do run jdk_lang regularly in tier5 Nightly testing. Thanks, Vladimir > > Mandy > > On 10/4/17 4:12 PM, Vladimir Kozlov wrote: >> https://bugs.openjdk.java.net/browse/JDK-8188776 >> >> 8182701 added exports for jdk.vm.ci.runtime package [1] but did not add new exception in the test. 
>> >> Added missing exception in JdkQualifiedExportTest.java test: >> >> --- a/test/jdk/jdk/modules/etc/JdkQualifiedExportTest.java >> +++ b/test/jdk/jdk/modules/etc/JdkQualifiedExportTest.java >> @@ -70,6 +70,7 @@ >> >> ???? static Set KNOWN_EXCEPTIONS = >> ???????? Set.of("jdk.internal.vm.ci/jdk.vm.ci.services", >> +?????????????? "jdk.internal.vm.ci/jdk.vm.ci.runtime", >> "jdk.jsobject/jdk.internal.netscape.javascript.spi"); >> >> ???? static void checkExports(ModuleDescriptor md) { >> >> Verified with this test. >> >> Thanks, >> Vladimir >> >> [1] http://hg.openjdk.java.net/jdk10/hs/rev/8b2054b7d02c#l3.1 > From HORIE at jp.ibm.com Thu Oct 5 09:15:55 2017 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Thu, 5 Oct 2017 18:15:55 +0900 Subject: RFR:8188802:PPC64: Failure on assert(lrgmask.is_aligned_sets(RegMask::SlotsPerVecX)) Message-ID: Dear all, Would you please review the following change? Bug: https://bugs.openjdk.java.net/browse/JDK-8188802 Webrev: http://cr.openjdk.java.net/~mhorie/8188802/webrev.00/ This change fixes the assertion failures, which occur after introducing " 8188139:PPC64: Superword Level Parallelization with VSX". I exchanged the order of declarations of alloc_classes for SR and VSR. After this fix, another assertion in rc_class() in ppc.ad failed, I modified the assertion itself to take into account newly added VSRs. I would be happy to revise code if these changes do not make sense. Best regards, -- Michihiro, IBM Research - Tokyo From martin.doerr at sap.com Thu Oct 5 11:05:30 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 5 Oct 2017 11:05:30 +0000 Subject: RFR:8188802:PPC64: Failure on assert(lrgmask.is_aligned_sets(RegMask::SlotsPerVecX)) In-Reply-To: References: Message-ID: <058ad834758242c2a7bc9e39b1aa06df@sap.com> Hi Michihiro, pushed this change as it enables us to build and run the VM again. I have introduced a switch "SuperwordUseVSX" which I only enable on >=Power8. Reason is that you're using Power8 instructions which broke the VM for older processors. Regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Donnerstag, 5. Oktober 2017 11:16 To: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; Doerr, Martin Cc: Hiroshi H Horii ; Gustavo Romero ; Kazunori Ogata Subject: RFR:8188802:PPC64: Failure on assert(lrgmask.is_aligned_sets(RegMask::SlotsPerVecX)) Dear all, Would you please review the following change? Bug: https://bugs.openjdk.java.net/browse/JDK-8188802 Webrev: http://cr.openjdk.java.net/~mhorie/8188802/webrev.00/ This change fixes the assertion failures, which occur after introducing "8188139:PPC64: Superword Level Parallelization with VSX". I exchanged the order of declarations of alloc_classes for SR and VSR. After this fix, another assertion in rc_class() in ppc.ad failed, I modified the assertion itself to take into account newly added VSRs. I would be happy to revise code if these changes do not make sense. Best regards, -- Michihiro, IBM Research - Tokyo From erik.osterlund at oracle.com Thu Oct 5 13:55:45 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 5 Oct 2017 15:55:45 +0200 Subject: RFR (M): 8188813: Generalize OrderAccess to use templates Message-ID: <59D639E1.7070104@oracle.com> Hi, Now that Atomic has been generalized with templates, the same should to be done to OrderAccess. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8188813 Webrev: http://cr.openjdk.java.net/~eosterlund/8188813/webrev.00/ Testing: mach5 hs-tier3 Since Atomic already has a mechanism for type checking generic arguments for Atomic::load/store, and OrderAccess also is a bunch of semantically decorated loads and stores, I decided to reuse the template wheel that was already invented (Atomic::LoadImpl and Atomic::StoreImpl). Therefore, I made OrderAccess privately inherit Atomic so that this infrastructure could be reused. A whole bunch of code has been nuked with this generalization. It is worth noting that I have added PrimitiveConversion functionality for doubles and floats which translates to using the union trick for casting double to and from int64_t and float to and from int32_t when passing down doubles and ints to the API. I need the former two, because Java supports volatile double and volatile float, and therefore runtime support for that needs to be able to use floats and doubles. I also added PrimitiveConversion functionality for the subclasses of oop (instanceOop and friends). The base class oop already supported this, so it seemed natural that the subclasses should support it too. Thanks, /Erik From goetz.lindenmaier at sap.com Thu Oct 5 16:11:58 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 5 Oct 2017 16:11:58 +0000 Subject: Normalize help flags of tools in jdk? Message-ID: <794c228f8b3a4810ae7c885402dda687@sap.com> Hi, I would like to normalize the help flags of the tools in jdk/bin. java accepts -?, -h and --help. I think that's a good set the others should support, too. If this is appreciated, I would complete this webrev to cover all the cases where this is doable with acceptable effort: http://cr.openjdk.java.net/~goetz/wr17/helpMessage/webrev/ Some tools exit with '1' after displaying the help message, while most exit with '0'. Is that intended? See also the test I added, it's implemented similar to tools/launcher/VersionCheck.java. Best regards, Goetz. From ceeaspb at gmail.com Thu Oct 5 16:43:13 2017 From: ceeaspb at gmail.com (Alex Bagehot) Date: Thu, 5 Oct 2017 17:43:13 +0100 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> Message-ID: Hi David, On Wed, Oct 4, 2017 at 10:51 PM, David Holmes wrote: > Hi Alex, > > Can you tell me how shares/quotas are actually implemented in terms of > allocating "cpus" to processes when shares/quotas are being applied? The allocation of cpus to processes/threads(tasks as the kernel sees them) or the other way round is called balancing, which is done by Scheduling domains[3]. cpu shares use CFS "group" scheduling[1] to apply the share to all the tasks(threads) in the container. The container cpu shares weight maps directly to a task's weight in CFS, which given it is part of a group is divided by the number of tasks in the group (ie. a default container share of 1024 with 2 threads in the container/group would result in each thread/task having a 512 weight[4]). The same values used by nice[2] also. 
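To make that arithmetic concrete, a tiny sketch; this is purely illustrative, the kernel computes the weight itself:

#include <cstdio>

// Per-task CFS weight implied by a container's cpu.shares, split across the
// runnable tasks in the group, as described above.
static long per_task_weight(long container_shares, long tasks_in_group) {
  return tasks_in_group > 0 ? container_shares / tasks_in_group
                            : container_shares;
}

int main() {
  // A default container share of 1024 with 2 threads -> 512 each, which
  // matches the se->load.weight figure in the sched_debug dump below.
  printf("%ld\n", per_task_weight(1024, 2));
  return 0;
}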
You can observe the task weight and other scheduler numbers in /proc/sched_debug [4]. You can also kernel trace scheduler activity which typically tells you the tasks involved, the cpu, the event: switch or wakeup, etc. > For example in a 12 cpu system if I have a 50% share do I get all 12 CPUs > for 50% of a "quantum" each, or do I get 6 CPUs for a full quantum each? > You get 12 cpus for 50% of the time on the average if there is another workload that has the same weight as you and is consuming as much as it can. If there's nothing else running on the machine you get 12 cpus for 100% of the time with a cpu shares only config (ie. the burst capacity). I validated that the share was balanced over all the cpus by running linux perf events and checking that there were cpu samples on all cpus. There's bound to be other ways of doing it also. > > When we try to use the "number of processors" to control the number of > threads created, or the number of partitions in a task, then we really want > to know how many CPUs we can actually be concurrently running on! > Makes sense to check. Hopefully there aren't any major errors or omissions in the above. Thanks, Alex [1] https://lwn.net/Articles/240474/ [2] https://github.com/torvalds/linux/blob/368f89984bb971b9f8b69eeb85ab19 a89f985809/kernel/sched/core.c#L6735 [3] https://lwn.net/Articles/80911/ / http://www.i3s.unice.fr/~ jplozi/wastedcores/files/extended_talk.pdf [4] cfs_rq[13]:/system.slice/docker-f5681788d6daab249c90810fe60da4 29a2565b901ff34245922a578635b5d607.scope .exec_clock : 0.000000 .MIN_vruntime : 0.000001 .min_vruntime : 8090.087297 .max_vruntime : 0.000001 .spread : 0.000000 .spread0 : -124692718.052832 .nr_spread_over : 0 .nr_running : 1 .load : 1024 .runnable_load_avg : 1023 .blocked_load_avg : 0 .tg_load_avg : 2046 .tg_load_contrib : 1023 .tg_runnable_contrib : 1023 .tg->runnable_avg : 2036 .tg->cfs_bandwidth.timer_active: 0 .throttled : 0 .throttle_count : 0 .se->exec_start : 236081964.515645 .se->vruntime : 24403993.326934 .se->sum_exec_runtime : 8091.135873 .se->load.weight : 512 .se->avg.runnable_avg_sum : 45979 .se->avg.runnable_avg_period : 45979 .se->avg.load_avg_contrib : 511 .se->avg.decay_count : 0 > > Thanks, > David > > > On 5/10/2017 6:01 AM, Alex Bagehot wrote: > >> Hi, >> >> On Wed, Oct 4, 2017 at 7:51 PM, Bob Vandette >> wrote: >> >> >>> On Oct 4, 2017, at 2:30 PM, Robbin Ehn wrote: >>>> >>>> Thanks Bob for looking into this. >>>> >>>> On 10/04/2017 08:14 PM, Bob Vandette wrote: >>>> >>>>> Robbin, >>>>> I?ve looked into this issue and you are correct. I do have to examine >>>>> >>>> both the >>> >>>> sched_getaffinity results as well as the cgroup cpu subsystem >>>>> >>>> configuration >>> >>>> files in order to provide a reasonable value for active_processors. If >>>>> >>>> I was only >>> >>>> interested in cpusets, I could simply rely on the getaffinity call but >>>>> >>>> I also want to >>> >>>> factor in shares and quotas as well. >>>>> >>>> >>>> We had a quick discussion at the office, we actually do think that you >>>> >>> could skip reading the shares and quotas. >>> >>>> It really depends on what the user expect, if he give us 4 cpu's with >>>> >>> 50% or 2 full cpu what do he expect the differences would be? >>> >>>> One could argue that he 'knows' that he will only use max 50% and thus >>>> >>> we can act as if he is giving us 4 full cpu. >>> >>>> But I'll leave that up to you, just a tough we had. 
>>>> >>> >>> It?s my opinion that we should do something if someone makes the effort >>> to >>> configure their >>> containers to use quotas or shares. There are many different opinions on >>> what the right that >>> right ?something? is. >>> >>> >> It might be interesting to look at some real instances of how java >> might[3] >> be deployed in containers. >> Marathon/Mesos[1] and Kubernetes[2] use shares and quotas so this is a >> vast >> chunk of deployments that need both of them today. >> >> >> >>> Many developers that are trying to deploy apps that use containers say >>> they don?t like >>> cpusets. This is too limiting for them especially when the server >>> configurations vary >>> within their organization. >>> >>> >> True, however Kubernetes has an alpha feature[5] where it allocates >> cpusets >> to containers that request a whole number of cpus. Previously without >> cpusets any container could run on any cpu which we know might not be good >> for some workloads that want isolation. A request for a fractional or >> burstable amount of cpu would be allocated from a shared cpu pool. So >> although manual allocation of cpusets will be flakey[3] , automation >> should >> be able to make it work. >> >> >> >>> From everything I?ve read including source code, there seems to be a >>> consensus that >>> shares and quotas are being used as a way to specify a fraction of a >>> system (number of cpus). >>> >>> >> A refinement[6] on this is: >> Shares can be used for guaranteed cpu - you will always get your share. >> Quota[4] is a limit/constraint - you can never get more than the quota. >> So given the below limit of how many shares will be allocated on a host >> you >> can have burstable(or overcommit) capacity if your shares are less than >> your quota. >> >> >> >>> Docker added ?cpus which is implemented using quotas and periods. They >>> adjust these >>> two parameters to provide a way of calculating the number of cpus that >>> will be available >>> to a process (quota/period). Amazon also documents that cpu shares are >>> defined to be a multiple of 1024. >>> Where 1024 represents a single cpu and a share value of N*1024 represents >>> N cpus. >>> >>> >> Kubernetes and Mesos/Marathon also use the N*1024 shares per host to >> allocate resources automatically. >> >> Hopefully this provides some background on what a couple of orchestration >> systems that will be running java are doing currently in this area. 
>> Thanks, >> Alex >> >> >> [1] https://github.com/apache/mesos/commit/346cc8dd528a28a6e >> 1f1cbdb4c95b8bdea2f6070 / (now out of date but appears to be a reasonable >> intro : https://zcox.wordpress.com/2014/09/17/cpu-resources-in-docke >> r-mesos-and-marathon/ ) >> [1a] https://youtu.be/hJyAfC-Z2xk?t=2439 >> >> [2] https://kubernetes.io/docs/concepts/configuration/manage >> -compute-resources-container/ >> >> [3] https://youtu.be/w1rZOY5gbvk?t=2479 >> >> [4] https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt >> https://landley.net/kdocs/ols/2010/ols2010-pages-245-254.pdf >> https://lwn.net/Articles/428175/ >> >> [5] >> https://github.com/kubernetes/community/blob/43ce57ac476b9f2 >> ce3f0220354a075e095a0d469/contributors/design-proposals/node >> /cpu-manager.md >> / https://github.com/kubernetes/kubernetes/commit/ >> 00f0e0f6504ad8dd85fcbbd6294cd7cf2475fc72 / https://vimeo.com/226858314 >> >> >> [6] https://kubernetes.io/docs/concepts/configuration/manage- >> compute-resources-container/#how-pods-with-resource-limits-are-run >> >> >> Of course these are just conventions. This is why I provided a way of >>> specifying the >>> number of CPUs so folks deploying Java services can be certain they get >>> what they want. >>> >>> Bob. >>> >>> >>>> I had assumed that when sched_setaffinity was called (in your case by >>>>> >>>> numactl) that the >>> >>>> cgroup cpu config files would be updated to reflect the current >>>>> >>>> processor affinity for the >>> >>>> running process. This is not correct. I have updated my changeset and >>>>> >>>> have successfully >>> >>>> run with your examples below. I?ll post a new webrev soon. >>>>> >>>> >>>> I see, thanks again! >>>> >>>> /Robbin >>>> >>>> Thanks, >>>>> Bob. >>>>> >>>>>> >>>>>> I still want to include the flag for at least one Java release in the >>>>>>> >>>>>> event that the new behavior causes some regression >>> >>>> in behavior. I?m trying to make the detection robust so that it will >>>>>>> >>>>>> fallback to the current behavior in the event >>> >>>> that cgroups is not configured as expected but I?d like to have a way >>>>>>> >>>>>> of forcing the issue. JDK 10 is not >>> >>>> supposed to be a long term support release which makes it a good >>>>>>> >>>>>> target for this new behavior. >>> >>>> I agree with David that once we commit to cgroups, we should extract >>>>>>> >>>>>> all VM configuration data from that >>> >>>> source. There?s more information available for cpusets than just >>>>>>> >>>>>> processor affinity that we might want to >>> >>>> consider when calculating the number of processors to assume for the >>>>>>> >>>>>> VM. There?s exclusivity and >>> >>>> effective cpu data available in addition to the cpuset string. >>>>>>> >>>>>> >>>>>> cgroup only contains limits, not the real hard limits. >>>>>> You most consider the affinity mask. We that have numa nodes do: >>>>>> >>>>>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java >>>>>> >>>>> -Xlog:os=debug -cp . ForEver | grep proc >>> >>>> [0.001s][debug][os] Initial active processor count set to 16 >>>>>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java >>>>>> >>>>> -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | grep proc >>> >>>> [0.001s][debug][os] Initial active processor count set to 32 >>>>>> >>>>>> when benchmarking all the time and that must be set to 16 otherwise >>>>>> >>>>> the flag is really bad for us. >>> >>>> So the flag actually breaks the little numa support we have now. 
>>>>>> >>>>>> Thanks, Robbin >>>>>> >>>>> >>> >>> From jaroslav.tulach at oracle.com Thu Oct 5 15:32:39 2017 From: jaroslav.tulach at oracle.com (Jaroslav Tulach) Date: Thu, 05 Oct 2017 17:32:39 +0200 Subject: [10] RFR(S) 8188775: Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.org.graalvm.compiler.hotspot In-Reply-To: References: Message-ID: <2799842.XxlxnWyqlB@pracovni> Opps. Sorry for causing the problem. I haven't executed the test in question and thus I thought everything is OK. Thanks Vladimir for creating the fix. -jt On st?eda 4. ??jna 2017 16:05:33 CEST Vladimir Kozlov wrote: > https://bugs.openjdk.java.net/browse/JDK-8188775 > > Changes for 8182701[1] missed changes in default.policy for new module > jdk.internal.vm.compiler.management. > > Add missing code: > > src/java.base/share/lib/security/default.policy > @@ -154,6 +154,10 @@ > permission java.security.AllPermission; > }; > > +grant codeBase "jrt:/jdk.internal.vm.compiler.management" { > + permission java.security.AllPermission; > +}; > + > grant codeBase "jrt:/jdk.jsobject" { > permission java.security.AllPermission; > }; > > Verified with failed test. > > Thanks, > Vladimir > > [1] http://hg.openjdk.java.net/jdk10/hs/rev/8b2054b7d02c From bob.vandette at oracle.com Thu Oct 5 17:57:26 2017 From: bob.vandette at oracle.com (Bob Vandette) Date: Thu, 5 Oct 2017 13:57:26 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> Message-ID: <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> > On Oct 5, 2017, at 12:43 PM, Alex Bagehot wrote: > > Hi David, > > On Wed, Oct 4, 2017 at 10:51 PM, David Holmes > wrote: > Hi Alex, > > Can you tell me how shares/quotas are actually implemented in terms of allocating "cpus" to processes when shares/quotas are being applied? > > The allocation of cpus to processes/threads(tasks as the kernel sees them) or the other way round is called balancing, which is done by Scheduling domains[3]. > > cpu shares use CFS "group" scheduling[1] to apply the share to all the tasks(threads) in the container. The container cpu shares weight maps directly to a task's weight in CFS, which given it is part of a group is divided by the number of tasks in the group (ie. a default container share of 1024 with 2 threads in the container/group would result in each thread/task having a 512 weight[4]). The same values used by nice[2] also. > > You can observe the task weight and other scheduler numbers in /proc/sched_debug [4]. You can also kernel trace scheduler activity which typically tells you the tasks involved, the cpu, the event: switch or wakeup, etc. > > For example in a 12 cpu system if I have a 50% share do I get all 12 CPUs for 50% of a "quantum" each, or do I get 6 CPUs for a full quantum each? > > You get 12 cpus for 50% of the time on the average if there is another workload that has the same weight as you and is consuming as much as it can. > If there's nothing else running on the machine you get 12 cpus for 100% of the time with a cpu shares only config (ie. the burst capacity). 
> > I validated that the share was balanced over all the cpus by running linux perf events and checking that there were cpu samples on all cpus. There's bound to be other ways of doing it also. > > > When we try to use the "number of processors" to control the number of threads created, or the number of partitions in a task, then we really want to know how many CPUs we can actually be concurrently running on! I?m not sure that the primary question for serverless container execution. Just because you might happen to burst and have available to you more CPU time than you specified in your shares doesn?t mean that a multi-threaded application running in one of these containers should configure itself to use all available host processors. This would result in over-burdoning the system at times of high load. The Java runtime, at startup, configures several subsystems to use a number of threads for each system based on the number of available processors. These subsystems include things like the number of GC threads, JIT compiler and thread pools. The problem I am trying to solve is to come up with a single number of CPUs based on container knowledge that can be used for the Java runtime subsystem to configure itself. I believe that we should trust the implementor of the Mesos or Kubernetes setup and honor their wishes when coming up with this number and not just use the processor affinity or number of cpus in the cpuset. The challenge is determining the right algorithm that doesn?t penalize the VM. My current implementation does this: total available logical processors = min (cpusets,sched_getaffinity,shares/1024, quota/period) All fractional units are rounded up to the next whole number. Bob. > > Makes sense to check. Hopefully there aren't any major errors or omissions in the above. > Thanks, > Alex > > [1] https://lwn.net/Articles/240474/ > [2] https://github.com/torvalds/linux/blob/368f89984bb971b9f8b69eeb85ab19a89f985809/kernel/sched/core.c#L6735 > [3] https://lwn.net/Articles/80911/ / http://www.i3s.unice.fr/~jplozi/wastedcores/files/extended_talk.pdf > > [4] > cfs_rq[13]:/system.slice/docker-f5681788d6daab249c90810fe60da429a2565b901ff34245922a578635b5d607.scope > > .exec_clock : 0.000000 > > .MIN_vruntime : 0.000001 > > .min_vruntime : 8090.087297 > > .max_vruntime : 0.000001 > > .spread : 0.000000 > > .spread0 : -124692718.052832 > > .nr_spread_over : 0 > > .nr_running : 1 > > .load : 1024 > > .runnable_load_avg : 1023 > > .blocked_load_avg : 0 > > .tg_load_avg : 2046 > > .tg_load_contrib : 1023 > > .tg_runnable_contrib : 1023 > > .tg->runnable_avg : 2036 > > .tg->cfs_bandwidth.timer_active: 0 > > .throttled : 0 > > .throttle_count : 0 > > .se->exec_start : 236081964.515645 > > .se->vruntime : 24403993.326934 > > .se->sum_exec_runtime : 8091.135873 > > .se->load.weight : 512 > > .se->avg.runnable_avg_sum : 45979 > > .se->avg.runnable_avg_period : 45979 > > .se->avg.load_avg_contrib : 511 > > .se->avg.decay_count : 0 > > > > Thanks, > David > > > On 5/10/2017 6:01 AM, Alex Bagehot wrote: > Hi, > > On Wed, Oct 4, 2017 at 7:51 PM, Bob Vandette > > wrote: > > > On Oct 4, 2017, at 2:30 PM, Robbin Ehn > wrote: > > Thanks Bob for looking into this. > > On 10/04/2017 08:14 PM, Bob Vandette wrote: > Robbin, > I?ve looked into this issue and you are correct. I do have to examine > both the > sched_getaffinity results as well as the cgroup cpu subsystem > configuration > files in order to provide a reasonable value for active_processors. 
If > I was only > interested in cpusets, I could simply rely on the getaffinity call but > I also want to > factor in shares and quotas as well. > > We had a quick discussion at the office, we actually do think that you > could skip reading the shares and quotas. > It really depends on what the user expect, if he give us 4 cpu's with > 50% or 2 full cpu what do he expect the differences would be? > One could argue that he 'knows' that he will only use max 50% and thus > we can act as if he is giving us 4 full cpu. > But I'll leave that up to you, just a tough we had. > > It?s my opinion that we should do something if someone makes the effort to > configure their > containers to use quotas or shares. There are many different opinions on > what the right that > right ?something? is. > > > It might be interesting to look at some real instances of how java might[3] > be deployed in containers. > Marathon/Mesos[1] and Kubernetes[2] use shares and quotas so this is a vast > chunk of deployments that need both of them today. > > > > Many developers that are trying to deploy apps that use containers say > they don?t like > cpusets. This is too limiting for them especially when the server > configurations vary > within their organization. > > > True, however Kubernetes has an alpha feature[5] where it allocates cpusets > to containers that request a whole number of cpus. Previously without > cpusets any container could run on any cpu which we know might not be good > for some workloads that want isolation. A request for a fractional or > burstable amount of cpu would be allocated from a shared cpu pool. So > although manual allocation of cpusets will be flakey[3] , automation should > be able to make it work. > > > > From everything I?ve read including source code, there seems to be a > consensus that > shares and quotas are being used as a way to specify a fraction of a > system (number of cpus). > > > A refinement[6] on this is: > Shares can be used for guaranteed cpu - you will always get your share. > Quota[4] is a limit/constraint - you can never get more than the quota. > So given the below limit of how many shares will be allocated on a host you > can have burstable(or overcommit) capacity if your shares are less than > your quota. > > > > Docker added ?cpus which is implemented using quotas and periods. They > adjust these > two parameters to provide a way of calculating the number of cpus that > will be available > to a process (quota/period). Amazon also documents that cpu shares are > defined to be a multiple of 1024. > Where 1024 represents a single cpu and a share value of N*1024 represents > N cpus. > > > Kubernetes and Mesos/Marathon also use the N*1024 shares per host to > allocate resources automatically. > > Hopefully this provides some background on what a couple of orchestration > systems that will be running java are doing currently in this area. 
> Thanks, > Alex > > > [1] https://github.com/apache/mesos/commit/346cc8dd528a28a6e > 1f1cbdb4c95b8bdea2f6070 / (now out of date but appears to be a reasonable > intro : https://zcox.wordpress.com/2014/09/17/cpu-resources-in-docke > r-mesos-and-marathon/ ) > [1a] https://youtu.be/hJyAfC-Z2xk?t=2439 > > [2] https://kubernetes.io/docs/concepts/configuration/manage > -compute-resources-container/ > > [3] https://youtu.be/w1rZOY5gbvk?t=2479 > > [4] https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt > https://landley.net/kdocs/ols/2010/ols2010-pages-245-254.pdf > https://lwn.net/Articles/428175/ > > [5] > https://github.com/kubernetes/community/blob/43ce57ac476b9f2ce3f0220354a075e095a0d469/contributors/design-proposals/node/cpu-manager.md > / https://github.com/kubernetes/kubernetes/commit/ > 00f0e0f6504ad8dd85fcbbd6294cd7cf2475fc72 / https://vimeo.com/226858314 > > > [6] https://kubernetes.io/docs/concepts/configuration/manage- > compute-resources-container/#how-pods-with-resource-limits-are-run > > > Of course these are just conventions. This is why I provided a way of > specifying the > number of CPUs so folks deploying Java services can be certain they get > what they want. > > Bob. > > > I had assumed that when sched_setaffinity was called (in your case by > numactl) that the > cgroup cpu config files would be updated to reflect the current > processor affinity for the > running process. This is not correct. I have updated my changeset and > have successfully > run with your examples below. I?ll post a new webrev soon. > > I see, thanks again! > > /Robbin > > Thanks, > Bob. > > I still want to include the flag for at least one Java release in the > event that the new behavior causes some regression > in behavior. I?m trying to make the detection robust so that it will > fallback to the current behavior in the event > that cgroups is not configured as expected but I?d like to have a way > of forcing the issue. JDK 10 is not > supposed to be a long term support release which makes it a good > target for this new behavior. > I agree with David that once we commit to cgroups, we should extract > all VM configuration data from that > source. There?s more information available for cpusets than just > processor affinity that we might want to > consider when calculating the number of processors to assume for the > VM. There?s exclusivity and > effective cpu data available in addition to the cpuset string. > > cgroup only contains limits, not the real hard limits. > You most consider the affinity mask. We that have numa nodes do: > > [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java > -Xlog:os=debug -cp . ForEver | grep proc > [0.001s][debug][os] Initial active processor count set to 16 > [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java > -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | grep proc > [0.001s][debug][os] Initial active processor count set to 32 > > when benchmarking all the time and that must be set to 16 otherwise > the flag is really bad for us. > So the flag actually breaks the little numa support we have now. 
> > Thanks, Robbin > > > From karen.kinnear at oracle.com Thu Oct 5 19:13:30 2017 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Thu, 5 Oct 2017 15:13:30 -0400 Subject: CFV: New hotspot Group Member: Ioi Lam In-Reply-To: References: Message-ID: vote: yes Karen > On Oct 2, 2017, at 11:24 AM, coleen.phillimore at oracle.com wrote: > > I hereby nominate Ioi Lam (OpenJDK user name: iklam) to Membership in the hotspot Group. > > Ioi has been working on the hotspot project for over 5 years and is a Reviewer in the JDK 9 Project with 79 changes. He is an expert in the area of class data sharing. > > Votes are due by Monday, October 16, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on this nomination. Votes must be cast in the open by replying to this mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote From robbin.ehn at oracle.com Thu Oct 5 19:17:10 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Thu, 5 Oct 2017 21:17:10 +0200 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> Message-ID: <65492519-d2b2-82ae-37a0-4540d4c5b937@oracle.com> Hi Alex, just a short question, You said something about "Marathon/Mesos[1] and Kubernetes[2] use shares and quotas" If you only use shares and quotas, do you not care about numa? (read trust kernel) On would think that you would setup a cgroup per numa node and split those into cgroups with shares/quotas. Thanks, Robbin On 10/05/2017 06:43 PM, Alex Bagehot wrote: > Hi David, > > On Wed, Oct 4, 2017 at 10:51 PM, David Holmes > wrote: > >> Hi Alex, >> >> Can you tell me how shares/quotas are actually implemented in terms of >> allocating "cpus" to processes when shares/quotas are being applied? > > > The allocation of cpus to processes/threads(tasks as the kernel sees them) > or the other way round is called balancing, which is done by Scheduling > domains[3]. > > cpu shares use CFS "group" scheduling[1] to apply the share to all the > tasks(threads) in the container. The container cpu shares weight maps > directly to a task's weight in CFS, which given it is part of a group is > divided by the number of tasks in the group (ie. a default container share > of 1024 with 2 threads in the container/group would result in each > thread/task having a 512 weight[4]). The same values used by nice[2] also. > > You can observe the task weight and other scheduler numbers in > /proc/sched_debug [4]. You can also kernel trace scheduler activity which > typically tells you the tasks involved, the cpu, the event: switch or > wakeup, etc. > > >> For example in a 12 cpu system if I have a 50% share do I get all 12 CPUs >> for 50% of a "quantum" each, or do I get 6 CPUs for a full quantum each? >> > > You get 12 cpus for 50% of the time on the average if there is another > workload that has the same weight as you and is consuming as much as it can. > If there's nothing else running on the machine you get 12 cpus for 100% of > the time with a cpu shares only config (ie. the burst capacity). 
> > I validated that the share was balanced over all the cpus by running linux > perf events and checking that there were cpu samples on all cpus. There's > bound to be other ways of doing it also. > > >> >> When we try to use the "number of processors" to control the number of >> threads created, or the number of partitions in a task, then we really want >> to know how many CPUs we can actually be concurrently running on! >> > > Makes sense to check. Hopefully there aren't any major errors or omissions > in the above. > Thanks, > Alex > > [1] https://lwn.net/Articles/240474/ > [2] https://github.com/torvalds/linux/blob/368f89984bb971b9f8b69eeb85ab19 > a89f985809/kernel/sched/core.c#L6735 > [3] https://lwn.net/Articles/80911/ / http://www.i3s.unice.fr/~ > jplozi/wastedcores/files/extended_talk.pdf > > [4] > > cfs_rq[13]:/system.slice/docker-f5681788d6daab249c90810fe60da4 > 29a2565b901ff34245922a578635b5d607.scope > > .exec_clock : 0.000000 > > .MIN_vruntime : 0.000001 > > .min_vruntime : 8090.087297 > > .max_vruntime : 0.000001 > > .spread : 0.000000 > > .spread0 : -124692718.052832 > > .nr_spread_over : 0 > > .nr_running : 1 > > .load : 1024 > > .runnable_load_avg : 1023 > > .blocked_load_avg : 0 > > .tg_load_avg : 2046 > > .tg_load_contrib : 1023 > > .tg_runnable_contrib : 1023 > > .tg->runnable_avg : 2036 > > .tg->cfs_bandwidth.timer_active: 0 > > .throttled : 0 > > .throttle_count : 0 > > .se->exec_start : 236081964.515645 > > .se->vruntime : 24403993.326934 > > .se->sum_exec_runtime : 8091.135873 > > .se->load.weight : 512 > > .se->avg.runnable_avg_sum : 45979 > > .se->avg.runnable_avg_period : 45979 > > .se->avg.load_avg_contrib : 511 > > .se->avg.decay_count : 0 > > >> >> Thanks, >> David >> >> >> On 5/10/2017 6:01 AM, Alex Bagehot wrote: >> >>> Hi, >>> >>> On Wed, Oct 4, 2017 at 7:51 PM, Bob Vandette >>> wrote: >>> >>> >>>> On Oct 4, 2017, at 2:30 PM, Robbin Ehn wrote: >>>>> >>>>> Thanks Bob for looking into this. >>>>> >>>>> On 10/04/2017 08:14 PM, Bob Vandette wrote: >>>>> >>>>>> Robbin, >>>>>> I?ve looked into this issue and you are correct. I do have to examine >>>>>> >>>>> both the >>>> >>>>> sched_getaffinity results as well as the cgroup cpu subsystem >>>>>> >>>>> configuration >>>> >>>>> files in order to provide a reasonable value for active_processors. If >>>>>> >>>>> I was only >>>> >>>>> interested in cpusets, I could simply rely on the getaffinity call but >>>>>> >>>>> I also want to >>>> >>>>> factor in shares and quotas as well. >>>>>> >>>>> >>>>> We had a quick discussion at the office, we actually do think that you >>>>> >>>> could skip reading the shares and quotas. >>>> >>>>> It really depends on what the user expect, if he give us 4 cpu's with >>>>> >>>> 50% or 2 full cpu what do he expect the differences would be? >>>> >>>>> One could argue that he 'knows' that he will only use max 50% and thus >>>>> >>>> we can act as if he is giving us 4 full cpu. >>>> >>>>> But I'll leave that up to you, just a tough we had. >>>>> >>>> >>>> It?s my opinion that we should do something if someone makes the effort >>>> to >>>> configure their >>>> containers to use quotas or shares. There are many different opinions on >>>> what the right that >>>> right ?something? is. >>>> >>>> >>> It might be interesting to look at some real instances of how java >>> might[3] >>> be deployed in containers. >>> Marathon/Mesos[1] and Kubernetes[2] use shares and quotas so this is a >>> vast >>> chunk of deployments that need both of them today. 
>>> >>> >>> >>>> Many developers that are trying to deploy apps that use containers say >>>> they don?t like >>>> cpusets. This is too limiting for them especially when the server >>>> configurations vary >>>> within their organization. >>>> >>>> >>> True, however Kubernetes has an alpha feature[5] where it allocates >>> cpusets >>> to containers that request a whole number of cpus. Previously without >>> cpusets any container could run on any cpu which we know might not be good >>> for some workloads that want isolation. A request for a fractional or >>> burstable amount of cpu would be allocated from a shared cpu pool. So >>> although manual allocation of cpusets will be flakey[3] , automation >>> should >>> be able to make it work. >>> >>> >>> >>>> From everything I?ve read including source code, there seems to be a >>>> consensus that >>>> shares and quotas are being used as a way to specify a fraction of a >>>> system (number of cpus). >>>> >>>> >>> A refinement[6] on this is: >>> Shares can be used for guaranteed cpu - you will always get your share. >>> Quota[4] is a limit/constraint - you can never get more than the quota. >>> So given the below limit of how many shares will be allocated on a host >>> you >>> can have burstable(or overcommit) capacity if your shares are less than >>> your quota. >>> >>> >>> >>>> Docker added ?cpus which is implemented using quotas and periods. They >>>> adjust these >>>> two parameters to provide a way of calculating the number of cpus that >>>> will be available >>>> to a process (quota/period). Amazon also documents that cpu shares are >>>> defined to be a multiple of 1024. >>>> Where 1024 represents a single cpu and a share value of N*1024 represents >>>> N cpus. >>>> >>>> >>> Kubernetes and Mesos/Marathon also use the N*1024 shares per host to >>> allocate resources automatically. >>> >>> Hopefully this provides some background on what a couple of orchestration >>> systems that will be running java are doing currently in this area. >>> Thanks, >>> Alex >>> >>> >>> [1] https://github.com/apache/mesos/commit/346cc8dd528a28a6e >>> 1f1cbdb4c95b8bdea2f6070 / (now out of date but appears to be a reasonable >>> intro : https://zcox.wordpress.com/2014/09/17/cpu-resources-in-docke >>> r-mesos-and-marathon/ ) >>> [1a] https://youtu.be/hJyAfC-Z2xk?t=2439 >>> >>> [2] https://kubernetes.io/docs/concepts/configuration/manage >>> -compute-resources-container/ >>> >>> [3] https://youtu.be/w1rZOY5gbvk?t=2479 >>> >>> [4] https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt >>> https://landley.net/kdocs/ols/2010/ols2010-pages-245-254.pdf >>> https://lwn.net/Articles/428175/ >>> >>> [5] >>> https://github.com/kubernetes/community/blob/43ce57ac476b9f2 >>> ce3f0220354a075e095a0d469/contributors/design-proposals/node >>> /cpu-manager.md >>> / https://github.com/kubernetes/kubernetes/commit/ >>> 00f0e0f6504ad8dd85fcbbd6294cd7cf2475fc72 / https://vimeo.com/226858314 >>> >>> >>> [6] https://kubernetes.io/docs/concepts/configuration/manage- >>> compute-resources-container/#how-pods-with-resource-limits-are-run >>> >>> >>> Of course these are just conventions. This is why I provided a way of >>>> specifying the >>>> number of CPUs so folks deploying Java services can be certain they get >>>> what they want. >>>> >>>> Bob. 
>>>> >>>> >>>>> I had assumed that when sched_setaffinity was called (in your case by >>>>>> >>>>> numactl) that the >>>> >>>>> cgroup cpu config files would be updated to reflect the current >>>>>> >>>>> processor affinity for the >>>> >>>>> running process. This is not correct. I have updated my changeset and >>>>>> >>>>> have successfully >>>> >>>>> run with your examples below. I?ll post a new webrev soon. >>>>>> >>>>> >>>>> I see, thanks again! >>>>> >>>>> /Robbin >>>>> >>>>> Thanks, >>>>>> Bob. >>>>>> >>>>>>> >>>>>>> I still want to include the flag for at least one Java release in the >>>>>>>> >>>>>>> event that the new behavior causes some regression >>>> >>>>> in behavior. I?m trying to make the detection robust so that it will >>>>>>>> >>>>>>> fallback to the current behavior in the event >>>> >>>>> that cgroups is not configured as expected but I?d like to have a way >>>>>>>> >>>>>>> of forcing the issue. JDK 10 is not >>>> >>>>> supposed to be a long term support release which makes it a good >>>>>>>> >>>>>>> target for this new behavior. >>>> >>>>> I agree with David that once we commit to cgroups, we should extract >>>>>>>> >>>>>>> all VM configuration data from that >>>> >>>>> source. There?s more information available for cpusets than just >>>>>>>> >>>>>>> processor affinity that we might want to >>>> >>>>> consider when calculating the number of processors to assume for the >>>>>>>> >>>>>>> VM. There?s exclusivity and >>>> >>>>> effective cpu data available in addition to the cpuset string. >>>>>>>> >>>>>>> >>>>>>> cgroup only contains limits, not the real hard limits. >>>>>>> You most consider the affinity mask. We that have numa nodes do: >>>>>>> >>>>>>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java >>>>>>> >>>>>> -Xlog:os=debug -cp . ForEver | grep proc >>>> >>>>> [0.001s][debug][os] Initial active processor count set to 16 >>>>>>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java >>>>>>> >>>>>> -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | grep proc >>>> >>>>> [0.001s][debug][os] Initial active processor count set to 32 >>>>>>> >>>>>>> when benchmarking all the time and that must be set to 16 otherwise >>>>>>> >>>>>> the flag is really bad for us. >>>> >>>>> So the flag actually breaks the little numa support we have now. >>>>>>> >>>>>>> Thanks, Robbin >>>>>>> >>>>>> >>>> >>>> From zgu at redhat.com Thu Oct 5 19:47:37 2017 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 5 Oct 2017 15:47:37 -0400 Subject: RFR(XXS) 8187685: NMT: Tracking compiler memory usage of thread's resource area Message-ID: <69808d92-6ac8-9d83-61dc-6bb45936b4dc@redhat.com> Compiler uses resource area for compilation, let's bias it to mtCompiler for more accurate memory counting. Bug: https://bugs.openjdk.java.net/browse/JDK-8187685 Webrev: http://cr.openjdk.java.net/~zgu/8187685/webrev.00/index.html Discussion thread: http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028360.html Test: hotspot_tier1 fastdebug and release on Linux x64. Thanks, -Zhengyu From coleen.phillimore at oracle.com Thu Oct 5 21:55:31 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 5 Oct 2017 17:55:31 -0400 Subject: Result: New hotspot Group Member: Markus Gronlund Message-ID: The vote for Markus Gronlund [1] is now closed. Yes: 11 Veto: 0 Abstain: 0 According to the Bylaws definition of Lazy Consensus, this is sufficient to approve the nomination. 
Coleen Phillimore [1] http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028362.html From david.holmes at oracle.com Thu Oct 5 22:12:30 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 6 Oct 2017 08:12:30 +1000 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> Message-ID: <5f7f3d85-db48-fe6e-28f5-e3f4858f33e8@oracle.com> Hi Bob, On 6/10/2017 3:57 AM, Bob Vandette wrote: > >> On Oct 5, 2017, at 12:43 PM, Alex Bagehot > > wrote: >> >> Hi David, >> >> On Wed, Oct 4, 2017 at 10:51 PM, David Holmes > > wrote: >> >> Hi Alex, >> >> Can you tell me how shares/quotas are actually implemented in >> terms of allocating "cpus" to processes when shares/quotas are >> being applied? >> >> >> The allocation of cpus to processes/threads(tasks as the kernel sees >> them) or the other way round is called balancing, which is done by >> Scheduling domains[3]. >> >> cpu shares use CFS "group" scheduling[1] to apply the share to all the >> tasks(threads) in the container. The container cpu shares weight maps >> directly to a task's weight in CFS, which given it is part of a group >> is divided by the number of tasks in the group (ie. a default >> container share of 1024 with 2 threads in the container/group would >> result in each thread/task having a 512 weight[4]). The same values >> used by nice[2] also. >> >> You can observe the task weight and other scheduler numbers in >> /proc/sched_debug [4]. You can also kernel trace scheduler activity >> which typically tells you the tasks involved, the cpu, the event: >> switch or wakeup, etc. >> >> For example in a 12 cpu system if I have a 50% share do I get all >> 12 CPUs for 50% of a "quantum" each, or do I get 6 CPUs for a full >> quantum each? >> >> >> You get 12 cpus for 50% of the time on the average if there is another >> workload that has the same weight as you and is consuming as much as >> it can. >> If there's nothing else running on the machine you get 12 cpus for >> 100% of the time with a cpu shares only config (ie. the burst capacity). >> >> I validated that the share was balanced over all the cpus by running >> linux perf events and checking that there were cpu samples on all >> cpus. There's bound to be other ways of doing it also. >> >> >> When we try to use the "number of processors" to control the >> number of threads created, or the number of partitions in a task, >> then we really want to know how many CPUs we can actually be >> concurrently running on! > > I?m not sure that the primary question for serverless container > execution. Just because you might happen to burst and have available > to you more CPU time than you specified in your shares doesn?t mean > that a multi-threaded application running in one of these containers > should configure itself to use all available host processors. This > would result in over-burdoning the system at times of high load. 
And conversely if you restrict yourself to the "share" of processors you get over time (ie 6 instead of 12) then you can severely impact the performance (response time in particular) of the VM and the application running on the VM. But I don't see how this can overburden the system. If you app is running alone you get to use all 12 cpus for 100% of the time and life is good. If another app starts up then your 100% drops proportionately. If you schedule 12 apps all with a 1/12 share then everyone gets up to 12 cpus for 1/12 of the time. It's only if you try to schedule a set of apps with a utilization total greater than 1 does the system become overloaded. > The Java runtime, at startup, configures several subsystems to use a > number of threads for each system based on the number of available > processors. These subsystems include things like the number of GC > threads, JIT compiler and thread pools. > The problem I am trying to solve is to come up with a single number > of CPUs based on container knowledge that can be used for the Java > runtime subsystem to configure itself. I believe that we should > trust the implementor of the Mesos or Kubernetes setup and honor > their wishes when coming up with this number and not just use the > processor affinity or number of cpus in the cpuset. I don't agree, as has been discussed before. It's perfectly fine, even desirable, in my opinion to have 12 threads executing concurrently for 50% of the time, rather than only 6 threads for 100% (assuming the scheduling technology is even clever enough to realize it can grant your threads 100%). Over time the amount of work your app can execute is the same, but the time taken for an individual subtask can vary. If you are just doing one-shot batch processing then it makes no difference. If you're running an app that itself services incoming requests then the response time to individual requests can be impacted. To take the worst-case scenario, imagine you get 12 concurrent requests that would each take 1/12 of your cpu quota. With 12 threads on 12 cpus you can service all 12 requests with a response time of 1/12 time units. But with 6 threads on 6 cpus you can only service 6 requests with a 1/12 response time, and the other 6 will have a 1/6 response time. > The challenge is determining the right algorithm that doesn?t penalize > the VM. Agreed. But I think the current algorithm may penalize the VM, and more importantly the application it is running. > My current implementation does this: > > total available logical processors = min > (cpusets,sched_getaffinity,shares/1024, quota/period) > > All fractional units are rounded up to the next whole number. My point has always been that I just don't think producing a single number from all these factors is the right/best way to deal with this. I think we really want to be able to answer the question "how many processors can I concurrently execute on" distinct from the question of "how much of a time slice will I get on each of those processors". To me "how many" is the question that "availableProcessors" should be answering - and only that question. How much "share" do I get is a different question, and perhaps one that the VM and the application need to be able to ask. BTW sched_getaffinity should already account for cpusets ?? Cheers, David > Bob. > >> >> Makes sense to check. Hopefully there aren't any major errors or >> omissions in the above. 
>> Thanks, >> Alex >> >> [1] https://lwn.net/Articles/240474/ >> [2] >> https://github.com/torvalds/linux/blob/368f89984bb971b9f8b69eeb85ab19a89f985809/kernel/sched/core.c#L6735 >> >> [3] https://lwn.net/Articles/80911/ >> / http://www.i3s.unice.fr/~jplozi/wastedcores/files/extended_talk.pdf >> >> >> [4] >> >> cfs_rq[13]:/system.slice/docker-f5681788d6daab249c90810fe60da429a2565b901ff34245922a578635b5d607.scope >> >> .exec_clock: 0.000000 >> >> .MIN_vruntime: 0.000001 >> >> .min_vruntime: 8090.087297 >> >> .max_vruntime: 0.000001 >> >> .spread: 0.000000 >> >> .spread0 : -124692718.052832 >> >> .nr_spread_over: 0 >> >> .nr_running: 1 >> >> .load: 1024 >> >> .runnable_load_avg : 1023 >> >> .blocked_load_avg: 0 >> >> .tg_load_avg : 2046 >> >> .tg_load_contrib : 1023 >> >> .tg_runnable_contrib : 1023 >> >> .tg->runnable_avg: 2036 >> >> .tg->cfs_bandwidth.timer_active: 0 >> >> .throttled : 0 >> >> .throttle_count: 0 >> >> .se->exec_start: 236081964.515645 >> >> .se->vruntime: 24403993.326934 >> >> .se->sum_exec_runtime: 8091.135873 >> >> .se->load.weight : 512 >> >> .se->avg.runnable_avg_sum: 45979 >> >> .se->avg.runnable_avg_period : 45979 >> >> .se->avg.load_avg_contrib: 511 >> >> .se->avg.decay_count : 0 >> >> >> Thanks, >> David >> >> >> On 5/10/2017 6:01 AM, Alex Bagehot wrote: >> >> Hi, >> >> On Wed, Oct 4, 2017 at 7:51 PM, Bob Vandette >> > >> wrote: >> >> >> On Oct 4, 2017, at 2:30 PM, Robbin Ehn >> > >> wrote: >> >> Thanks Bob for looking into this. >> >> On 10/04/2017 08:14 PM, Bob Vandette wrote: >> >> Robbin, >> I?ve looked into this issue and you are correct. >> I do have to examine >> >> both the >> >> sched_getaffinity results as well as the cgroup >> cpu subsystem >> >> configuration >> >> files in order to provide a reasonable value for >> active_processors.? If >> >> I was only >> >> interested in cpusets, I could simply rely on the >> getaffinity call but >> >> I also want to >> >> factor in shares and quotas as well. >> >> >> We had a quick discussion at the office, we actually >> do think that you >> >> could skip reading the shares and quotas. >> >> It really depends on what the user expect, if he give >> us 4 cpu's with >> >> 50% or 2 full cpu what do he expect the differences would be? >> >> One could argue that he 'knows' that he will only use >> max 50% and thus >> >> we can act as if he is giving us 4 full cpu. >> >> But I'll leave that up to you, just a tough we had. >> >> >> It?s my opinion that we should do something if someone >> makes the effort to >> configure their >> containers to use quotas or shares.? There are many >> different opinions on >> what the right that >> right ?something? is. >> >> >> It might be interesting to look at some real instances of how >> java might[3] >> be deployed in containers. >> Marathon/Mesos[1] and Kubernetes[2] use shares and quotas so >> this is a vast >> chunk of deployments that need both of them today. >> >> >> >> Many developers that are trying to deploy apps that use >> containers say >> they don?t like >> cpusets.? This is too limiting for them especially when >> the server >> configurations vary >> within their organization. >> >> >> True, however Kubernetes has an alpha feature[5] where it >> allocates cpusets >> to containers that request a whole number of cpus. Previously >> without >> cpusets any container could run on any cpu which we know might >> not be good >> for some workloads that want isolation. A request for a >> fractional or >> burstable amount of cpu would be allocated from a shared cpu >> pool. 
So >> although manual allocation of cpusets will be flakey[3] , >> automation should >> be able to make it work. >> >> >> >> ?From everything I?ve read including source code, there >> seems to be a >> consensus that >> shares and quotas are being used as a way to specify a >> fraction of a >> system (number of cpus). >> >> >> A refinement[6] on this is: >> Shares can be used for guaranteed cpu - you will always get >> your share. >> Quota[4] is a limit/constraint - you can never get more than >> the quota. >> So given the below limit of how many shares will be allocated >> on a host you >> can have burstable(or overcommit) capacity if your shares are >> less than >> your quota. >> >> >> >> Docker added ?cpus which is implemented using quotas and >> periods.? They >> adjust these >> two parameters to provide a way of calculating the number >> of cpus that >> will be available >> to a process (quota/period).? Amazon also documents that >> cpu shares are >> defined to be a multiple of 1024. >> Where 1024 represents a single cpu and a share value of >> N*1024 represents >> N cpus. >> >> >> Kubernetes and Mesos/Marathon also use the N*1024 shares per >> host to >> allocate resources automatically. >> >> Hopefully this provides some background on what a couple of >> orchestration >> systems that will be running java are doing currently in this >> area. >> Thanks, >> Alex >> >> >> [1] https://github.com/apache/mesos/commit/346cc8dd528a28a6e >> >> 1f1cbdb4c95b8bdea2f6070 / (now out of date but appears to be a >> reasonable >> intro : >> https://zcox.wordpress.com/2014/09/17/cpu-resources-in-docke >> >> r-mesos-and-marathon/ ) >> [1a] https://youtu.be/hJyAfC-Z2xk?t=2439 >> >> >> [2] https://kubernetes.io/docs/concepts/configuration/manage >> >> -compute-resources-container/ >> >> [3] https://youtu.be/w1rZOY5gbvk?t=2479 >> >> >> [4] >> https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt >> >> https://landley.net/kdocs/ols/2010/ols2010-pages-245-254.pdf >> >> https://lwn.net/Articles/428175/ >> >> >> [5] >> https://github.com/kubernetes/community/blob/43ce57ac476b9f2ce3f0220354a075e095a0d469/contributors/design-proposals/node/cpu-manager.md >> >> / https://github.com/kubernetes/kubernetes/commit/ >> >> 00f0e0f6504ad8dd85fcbbd6294cd7cf2475fc72 / >> https://vimeo.com/226858314 >> >> >> [6] https://kubernetes.io/docs/concepts/configuration/manage- >> >> compute-resources-container/#how-pods-with-resource-limits-are-run >> >> >> Of course these are just conventions.? This is why I >> provided a way of >> specifying the >> number of CPUs so folks deploying Java services can be >> certain they get >> what they want. >> >> Bob. >> >> >> I had assumed that when sched_setaffinity was >> called (in your case by >> >> numactl) that the >> >> cgroup cpu config files would be updated to >> reflect the current >> >> processor affinity for the >> >> running process. This is not correct.? I have >> updated my changeset and >> >> have successfully >> >> run with your examples below.? I?ll post a new >> webrev soon. >> >> >> I see, thanks again! >> >> /Robbin >> >> Thanks, >> Bob. >> >> >> I still want to include the flag for at >> least one Java release in the >> >> event that the new behavior causes some regression >> >> in behavior.? I?m trying to make the >> detection robust so that it will >> >> fallback to the current behavior in the event >> >> that cgroups is not configured as expected >> but I?d like to have a way >> >> of forcing the issue.? 
JDK 10 is not >> >> supposed to be a long term support release >> which makes it a good >> >> target for this new behavior. >> >> I agree with David that once we commit to >> cgroups, we should extract >> >> all VM configuration data from that >> >> source.? There?s more information >> available for cpusets than just >> >> processor affinity that we might want to >> >> consider when calculating the number of >> processors to assume for the >> >> VM.? There?s exclusivity and >> >> effective cpu data available in addition >> to the cpuset string. >> >> >> cgroup only contains limits, not the real hard >> limits. >> You most consider the affinity mask. We that >> have numa nodes do: >> >> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 >> --membind=1 java >> >> -Xlog:os=debug -cp . ForEver | grep proc >> >> [0.001s][debug][os] Initial active processor >> count set to 16 >> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 >> --membind=1 java >> >> -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | >> grep proc >> >> [0.001s][debug][os] Initial active processor >> count set to 32 >> >> when benchmarking all the time and that must >> be set to 16 otherwise >> >> the flag is really bad for us. >> >> So the flag actually breaks the little numa >> support we have now. >> >> Thanks, Robbin >> >> >> >> > From david.holmes at oracle.com Fri Oct 6 06:01:46 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 6 Oct 2017 16:01:46 +1000 Subject: RFR (M): 8188813: Generalize OrderAccess to use templates In-Reply-To: <59D639E1.7070104@oracle.com> References: <59D639E1.7070104@oracle.com> Message-ID: <378cd133-e7c8-4ebb-b20e-cfbb2aa30c0d@oracle.com> Hi Erik, On 5/10/2017 11:55 PM, Erik ?sterlund wrote: > Hi, > > Now that Atomic has been generalized with templates, the same should to > be done to OrderAccess. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8188813 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8188813/webrev.00/ Well I didn't see anything too scary looking. :) I assume we'll drop the ptr variants at some stage. One query: src/hotspot/share/gc/shared/cardTableModRefBS.inline.hpp Can you declare "volatile jbyte* byte = ..." to avoid the volatile cast on the orderAccess call? > Testing: mach5 hs-tier3 > > Since Atomic already has a mechanism for type checking generic arguments > for Atomic::load/store, and OrderAccess also is a bunch of semantically > decorated loads and stores, I decided to reuse the template wheel that > was already invented (Atomic::LoadImpl and Atomic::StoreImpl). > Therefore, I made OrderAccess privately inherit Atomic so that this > infrastructure could be reused. A whole bunch of code has been nuked > with this generalization. Good! > It is worth noting that I have added PrimitiveConversion functionality > for doubles and floats which translates to using the union trick for > casting double to and from int64_t and float to and from int32_t when > passing down doubles and ints to the API. I need the former two, because > Java supports volatile double and volatile float, and therefore runtime > support for that needs to be able to use floats and doubles. I also I didn't quite follow that. What parts of the runtime need to operate on volatile float/double Java fields? > added PrimitiveConversion functionality for the subclasses of oop > (instanceOop and friends). The base class oop already supported this, so > it seemed natural that the subclasses should support it too. Ok. 
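For reference, the "union trick" being referred to amounts to the following bit-level cast. This is just a sketch with made-up helper names; the actual PrimitiveConversion code in the webrev is templated and more general.

  #include <stdint.h>

  // Sketch only: reinterpret the bits of a double as an int64_t and back,
  // so a volatile double store/load can be funneled through the integer
  // Atomic/OrderAccess paths without altering the bit pattern.
  inline int64_t double_as_bits(double v) {
    union { double d; int64_t i; } u;
    u.d = v;
    return u.i;
  }

  inline double bits_as_double(int64_t bits) {
    union { double d; int64_t i; } u;
    u.i = bits;
    return u.d;
  }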
Thanks, David ----- > Thanks, > /Erik From erik.osterlund at oracle.com Fri Oct 6 06:48:26 2017 From: erik.osterlund at oracle.com (Erik =?ISO-8859-1?Q?=D6sterlund?=) Date: Fri, 06 Oct 2017 08:48:26 +0200 Subject: RFR (M): 8188813: Generalize OrderAccess to use templates In-Reply-To: <378cd133-e7c8-4ebb-b20e-cfbb2aa30c0d@oracle.com> References: <59D639E1.7070104@oracle.com> <378cd133-e7c8-4ebb-b20e-cfbb2aa30c0d@oracle.com> Message-ID: <1507272506.23180.14.camel@oracle.com> Hi David, On fre, 2017-10-06 at 16:01 +1000, David Holmes wrote: > Hi Erik, > > On 5/10/2017 11:55 PM, Erik ?sterlund wrote: > > > > Hi, > > > > Now that Atomic has been generalized with templates, the same > > should to? > > be done to OrderAccess. > > > > Bug: > > https://bugs.openjdk.java.net/browse/JDK-8188813 > > > > Webrev: > > http://cr.openjdk.java.net/~eosterlund/8188813/webrev.00/ > Well I didn't see anything too scary looking. :) I assume we'll drop > the? > ptr variants at some stage. Yes, that is indeed the plan. > One query: > > src/hotspot/share/gc/shared/cardTableModRefBS.inline.hpp > > Can you declare "volatile jbyte* byte = ..." to avoid the volatile > cast? > on the orderAccess call? Sure. Fixed. > > > > > Testing: mach5 hs-tier3 > > > > Since Atomic already has a mechanism for type checking generic > > arguments? > > for Atomic::load/store, and OrderAccess also is a bunch of > > semantically? > > decorated loads and stores, I decided to reuse the template wheel > > that? > > was already invented (Atomic::LoadImpl and Atomic::StoreImpl). > > Therefore, I made OrderAccess privately inherit Atomic so that > > this? > > infrastructure could be reused. A whole bunch of code has been > > nuked? > > with this generalization. > Good! :) > > > > > It is worth noting that I have added PrimitiveConversion > > functionality? > > for doubles and floats which translates to using the union trick > > for? > > casting double to and from int64_t and float to and from int32_t > > when? > > passing down doubles and ints to the API. I need the former two, > > because? > > Java supports volatile double and volatile float, and therefore > > runtime? > > support for that needs to be able to use floats and doubles. I > > also? > I didn't quite follow that. What parts of the runtime need to operate > on? > volatile float/double Java fields? At the moment, there are multiple places that support the use of Java- volatile float/double. Some examples: * The static interpreter supports Java-volatile getfield/putfield (cf. cppInterpreter_zero.cpp:588, bytecodeInterpreter.cpp:2023) * unsafe supports getters and setters of Java-volatile doubles/floats (cf. unsafe.cpp:476). This support is not accidental. The Java language allows the use of volatile floats and doubles. Therefore we must support them in our runtime. Thanks for the review. /Erik > > > > > added PrimitiveConversion functionality for the subclasses of oop? > > (instanceOop and friends). The base class oop already supported > > this, so? > > it seemed natural that the subclasses should support it too. > Ok. 
> > Thanks, > David > ----- > > > > > Thanks, > > /Erik From ceeaspb at gmail.com Fri Oct 6 07:20:34 2017 From: ceeaspb at gmail.com (Alex Bagehot) Date: Fri, 6 Oct 2017 08:20:34 +0100 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <65492519-d2b2-82ae-37a0-4540d4c5b937@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> <65492519-d2b2-82ae-37a0-4540d4c5b937@oracle.com> Message-ID: Hi Robbin, On Thursday, October 5, 2017, Robbin Ehn wrote: > Hi Alex, just a short question, > > You said something about "Marathon/Mesos[1] and Kubernetes[2] use shares > and quotas" > If you only use shares and quotas, do you not care about numa? (read trust > kernel) > On would think that you would setup a cgroup per numa node and split those > into cgroups with shares/quotas. It's a good point. I certainly care about numa; we test I think similar to you numactl 'ing driver/server processes to be in control of that variable. Kubernetes doesn't, yet [1]. Neither mesos [2]. Thanks Alex [1] https://github.com/kubernetes/kubernetes/issues/49964 [2] https://issues.apache.org/jira/plugins/servlet/mobile#issue/MESOS-6548 / https://issues.apache.org/jira/plugins/servlet/mobile#issue/MESOS-5342 > Thanks, Robbin > > On 10/05/2017 06:43 PM, Alex Bagehot wrote: > >> Hi David, >> >> On Wed, Oct 4, 2017 at 10:51 PM, David Holmes >> wrote: >> >> Hi Alex, >>> >>> Can you tell me how shares/quotas are actually implemented in terms of >>> allocating "cpus" to processes when shares/quotas are being applied? >>> >> >> >> The allocation of cpus to processes/threads(tasks as the kernel sees them) >> or the other way round is called balancing, which is done by Scheduling >> domains[3]. >> >> cpu shares use CFS "group" scheduling[1] to apply the share to all the >> tasks(threads) in the container. The container cpu shares weight maps >> directly to a task's weight in CFS, which given it is part of a group is >> divided by the number of tasks in the group (ie. a default container share >> of 1024 with 2 threads in the container/group would result in each >> thread/task having a 512 weight[4]). The same values used by nice[2] also. >> >> You can observe the task weight and other scheduler numbers in >> /proc/sched_debug [4]. You can also kernel trace scheduler activity which >> typically tells you the tasks involved, the cpu, the event: switch or >> wakeup, etc. >> >> >> For example in a 12 cpu system if I have a 50% share do I get all 12 CPUs >>> for 50% of a "quantum" each, or do I get 6 CPUs for a full quantum each? >>> >>> >> You get 12 cpus for 50% of the time on the average if there is another >> workload that has the same weight as you and is consuming as much as it >> can. >> If there's nothing else running on the machine you get 12 cpus for 100% of >> the time with a cpu shares only config (ie. the burst capacity). >> >> I validated that the share was balanced over all the cpus by running linux >> perf events and checking that there were cpu samples on all cpus. There's >> bound to be other ways of doing it also. 
>> >> >> >>> When we try to use the "number of processors" to control the number of >>> threads created, or the number of partitions in a task, then we really >>> want >>> to know how many CPUs we can actually be concurrently running on! >>> >>> >> Makes sense to check. Hopefully there aren't any major errors or omissions >> in the above. >> Thanks, >> Alex >> >> [1] https://lwn.net/Articles/240474/ >> [2] https://github.com/torvalds/linux/blob/368f89984bb971b9f8b69eeb85ab19 >> a89f985809/kernel/sched/core.c#L6735 >> [3] https://lwn.net/Articles/80911/ / http://www.i3s.unice.fr/~ >> jplozi/wastedcores/files/extended_talk.pdf >> >> [4] >> >> cfs_rq[13]:/system.slice/docker-f5681788d6daab249c90810fe60da4 >> 29a2565b901ff34245922a578635b5d607.scope >> >> .exec_clock : 0.000000 >> >> .MIN_vruntime : 0.000001 >> >> .min_vruntime : 8090.087297 >> >> .max_vruntime : 0.000001 >> >> .spread : 0.000000 >> >> .spread0 : -124692718.052832 >> >> .nr_spread_over : 0 >> >> .nr_running : 1 >> >> .load : 1024 >> >> .runnable_load_avg : 1023 >> >> .blocked_load_avg : 0 >> >> .tg_load_avg : 2046 >> >> .tg_load_contrib : 1023 >> >> .tg_runnable_contrib : 1023 >> >> .tg->runnable_avg : 2036 >> >> .tg->cfs_bandwidth.timer_active: 0 >> >> .throttled : 0 >> >> .throttle_count : 0 >> >> .se->exec_start : 236081964.515645 >> >> .se->vruntime : 24403993.326934 >> >> .se->sum_exec_runtime : 8091.135873 >> >> .se->load.weight : 512 >> >> .se->avg.runnable_avg_sum : 45979 >> >> .se->avg.runnable_avg_period : 45979 >> >> .se->avg.load_avg_contrib : 511 >> >> .se->avg.decay_count : 0 >> >> >> >>> Thanks, >>> David >>> >>> >>> On 5/10/2017 6:01 AM, Alex Bagehot wrote: >>> >>> Hi, >>>> >>>> On Wed, Oct 4, 2017 at 7:51 PM, Bob Vandette >>>> wrote: >>>> >>>> >>>> On Oct 4, 2017, at 2:30 PM, Robbin Ehn wrote: >>>>> >>>>>> >>>>>> Thanks Bob for looking into this. >>>>>> >>>>>> On 10/04/2017 08:14 PM, Bob Vandette wrote: >>>>>> >>>>>> Robbin, >>>>>>> I?ve looked into this issue and you are correct. I do have to >>>>>>> examine >>>>>>> >>>>>>> both the >>>>>> >>>>> >>>>> sched_getaffinity results as well as the cgroup cpu subsystem >>>>>> >>>>>>> >>>>>>> configuration >>>>>> >>>>> >>>>> files in order to provide a reasonable value for active_processors. If >>>>>> >>>>>>> >>>>>>> I was only >>>>>> >>>>> >>>>> interested in cpusets, I could simply rely on the getaffinity call but >>>>>> >>>>>>> >>>>>>> I also want to >>>>>> >>>>> >>>>> factor in shares and quotas as well. >>>>>> >>>>>>> >>>>>>> >>>>>> We had a quick discussion at the office, we actually do think that you >>>>>> >>>>>> could skip reading the shares and quotas. >>>>> >>>>> It really depends on what the user expect, if he give us 4 cpu's with >>>>>> >>>>>> 50% or 2 full cpu what do he expect the differences would be? >>>>> >>>>> One could argue that he 'knows' that he will only use max 50% and thus >>>>>> >>>>>> we can act as if he is giving us 4 full cpu. >>>>> >>>>> But I'll leave that up to you, just a tough we had. >>>>>> >>>>>> >>>>> It?s my opinion that we should do something if someone makes the effort >>>>> to >>>>> configure their >>>>> containers to use quotas or shares. There are many different opinions >>>>> on >>>>> what the right that >>>>> right ?something? is. >>>>> >>>>> >>>>> It might be interesting to look at some real instances of how java >>>> might[3] >>>> be deployed in containers. >>>> Marathon/Mesos[1] and Kubernetes[2] use shares and quotas so this is a >>>> vast >>>> chunk of deployments that need both of them today. 
>>>> >>>> >>>> >>>> Many developers that are trying to deploy apps that use containers say >>>>> they don?t like >>>>> cpusets. This is too limiting for them especially when the server >>>>> configurations vary >>>>> within their organization. >>>>> >>>>> >>>>> True, however Kubernetes has an alpha feature[5] where it allocates >>>> cpusets >>>> to containers that request a whole number of cpus. Previously without >>>> cpusets any container could run on any cpu which we know might not be >>>> good >>>> for some workloads that want isolation. A request for a fractional or >>>> burstable amount of cpu would be allocated from a shared cpu pool. So >>>> although manual allocation of cpusets will be flakey[3] , automation >>>> should >>>> be able to make it work. >>>> >>>> >>>> >>>> From everything I?ve read including source code, there seems to be a >>>>> consensus that >>>>> shares and quotas are being used as a way to specify a fraction of a >>>>> system (number of cpus). >>>>> >>>>> >>>>> A refinement[6] on this is: >>>> Shares can be used for guaranteed cpu - you will always get your share. >>>> Quota[4] is a limit/constraint - you can never get more than the quota. >>>> So given the below limit of how many shares will be allocated on a host >>>> you >>>> can have burstable(or overcommit) capacity if your shares are less than >>>> your quota. >>>> >>>> >>>> >>>> Docker added ?cpus which is implemented using quotas and periods. They >>>>> adjust these >>>>> two parameters to provide a way of calculating the number of cpus that >>>>> will be available >>>>> to a process (quota/period). Amazon also documents that cpu shares are >>>>> defined to be a multiple of 1024. >>>>> Where 1024 represents a single cpu and a share value of N*1024 >>>>> represents >>>>> N cpus. >>>>> >>>>> >>>>> Kubernetes and Mesos/Marathon also use the N*1024 shares per host to >>>> allocate resources automatically. >>>> >>>> Hopefully this provides some background on what a couple of >>>> orchestration >>>> systems that will be running java are doing currently in this area. >>>> Thanks, >>>> Alex >>>> >>>> >>>> [1] https://github.com/apache/mesos/commit/346cc8dd528a28a6e >>>> 1f1cbdb4c95b8bdea2f6070 / (now out of date but appears to be a >>>> reasonable >>>> intro : https://zcox.wordpress.com/2014/09/17/cpu-resources-in-docke >>>> r-mesos-and-marathon/ ) >>>> [1a] https://youtu.be/hJyAfC-Z2xk?t=2439 >>>> >>>> [2] https://kubernetes.io/docs/concepts/configuration/manage >>>> -compute-resources-container/ >>>> >>>> [3] https://youtu.be/w1rZOY5gbvk?t=2479 >>>> >>>> [4] https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt >>>> https://landley.net/kdocs/ols/2010/ols2010-pages-245-254.pdf >>>> https://lwn.net/Articles/428175/ >>>> >>>> [5] >>>> https://github.com/kubernetes/community/blob/43ce57ac476b9f2 >>>> ce3f0220354a075e095a0d469/contributors/design-proposals/node >>>> /cpu-manager.md >>>> / https://github.com/kubernetes/kubernetes/commit/ >>>> 00f0e0f6504ad8dd85fcbbd6294cd7cf2475fc72 / https://vimeo.com/226858314 >>>> >>>> >>>> [6] https://kubernetes.io/docs/concepts/configuration/manage- >>>> compute-resources-container/#how-pods-with-resource-limits-are-run >>>> >>>> >>>> Of course these are just conventions. This is why I provided a way of >>>> >>>>> specifying the >>>>> number of CPUs so folks deploying Java services can be certain they get >>>>> what they want. >>>>> >>>>> Bob. 
>>>>> >>>>> >>>>> I had assumed that when sched_setaffinity was called (in your case by >>>>>> >>>>>>> >>>>>>> numactl) that the >>>>>> >>>>> >>>>> cgroup cpu config files would be updated to reflect the current >>>>>> >>>>>>> >>>>>>> processor affinity for the >>>>>> >>>>> >>>>> running process. This is not correct. I have updated my changeset and >>>>>> >>>>>>> >>>>>>> have successfully >>>>>> >>>>> >>>>> run with your examples below. I?ll post a new webrev soon. >>>>>> >>>>>>> >>>>>>> >>>>>> I see, thanks again! >>>>>> >>>>>> /Robbin >>>>>> >>>>>> Thanks, >>>>>> >>>>>>> Bob. >>>>>>> >>>>>>> >>>>>>>> I still want to include the flag for at least one Java release in >>>>>>>> the >>>>>>>> >>>>>>>>> >>>>>>>>> event that the new behavior causes some regression >>>>>>>> >>>>>>> >>>>> in behavior. I?m trying to make the detection robust so that it will >>>>>> >>>>>>> >>>>>>>>> fallback to the current behavior in the event >>>>>>>> >>>>>>> >>>>> that cgroups is not configured as expected but I?d like to have a way >>>>>> >>>>>>> >>>>>>>>> of forcing the issue. JDK 10 is not >>>>>>>> >>>>>>> >>>>> supposed to be a long term support release which makes it a good >>>>>> >>>>>>> >>>>>>>>> target for this new behavior. >>>>>>>> >>>>>>> >>>>> I agree with David that once we commit to cgroups, we should extract >>>>>> >>>>>>> >>>>>>>>> all VM configuration data from that >>>>>>>> >>>>>>> >>>>> source. There?s more information available for cpusets than just >>>>>> >>>>>>> >>>>>>>>> processor affinity that we might want to >>>>>>>> >>>>>>> >>>>> consider when calculating the number of processors to assume for the >>>>>> >>>>>>> >>>>>>>>> VM. There?s exclusivity and >>>>>>>> >>>>>>> >>>>> effective cpu data available in addition to the cpuset string. >>>>>> >>>>>>> >>>>>>>>> >>>>>>>> cgroup only contains limits, not the real hard limits. >>>>>>>> You most consider the affinity mask. We that have numa nodes do: >>>>>>>> >>>>>>>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java >>>>>>>> >>>>>>>> -Xlog:os=debug -cp . ForEver | grep proc >>>>>>> >>>>>> >>>>> [0.001s][debug][os] Initial active processor count set to 16 >>>>>> >>>>>>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 --membind=1 java >>>>>>>> >>>>>>>> -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | grep proc >>>>>>> >>>>>> >>>>> [0.001s][debug][os] Initial active processor count set to 32 >>>>>> >>>>>>> >>>>>>>> when benchmarking all the time and that must be set to 16 otherwise >>>>>>>> >>>>>>>> the flag is really bad for us. >>>>>>> >>>>>> >>>>> So the flag actually breaks the little numa support we have now. >>>>>> >>>>>>> >>>>>>>> Thanks, Robbin >>>>>>>> >>>>>>>> >>>>>>> >>>>> >>>>> From david.holmes at oracle.com Fri Oct 6 08:19:47 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 6 Oct 2017 18:19:47 +1000 Subject: RFR (M): 8188813: Generalize OrderAccess to use templates In-Reply-To: <1507272506.23180.14.camel@oracle.com> References: <59D639E1.7070104@oracle.com> <378cd133-e7c8-4ebb-b20e-cfbb2aa30c0d@oracle.com> <1507272506.23180.14.camel@oracle.com> Message-ID: On 6/10/2017 4:48 PM, Erik ?sterlund wrote: > Hi David, > > On fre, 2017-10-06 at 16:01 +1000, David Holmes wrote: >> Hi Erik, >> >> On 5/10/2017 11:55 PM, Erik ?sterlund wrote: >>> >>> Hi, >>> >>> Now that Atomic has been generalized with templates, the same >>> should to >>> be done to OrderAccess. 
>>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8188813 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~eosterlund/8188813/webrev.00/ >> Well I didn't see anything too scary looking. :) I assume we'll drop >> the >> ptr variants at some stage. > > Yes, that is indeed the plan. > >> One query: >> >> src/hotspot/share/gc/shared/cardTableModRefBS.inline.hpp >> >> Can you declare "volatile jbyte* byte = ..." to avoid the volatile >> cast >> on the orderAccess call? > > Sure. Fixed. > >> >>> >>> Testing: mach5 hs-tier3 >>> >>> Since Atomic already has a mechanism for type checking generic >>> arguments >>> for Atomic::load/store, and OrderAccess also is a bunch of >>> semantically >>> decorated loads and stores, I decided to reuse the template wheel >>> that >>> was already invented (Atomic::LoadImpl and Atomic::StoreImpl). >>> Therefore, I made OrderAccess privately inherit Atomic so that >>> this >>> infrastructure could be reused. A whole bunch of code has been >>> nuked >>> with this generalization. >> Good! > > :) > >> >>> >>> It is worth noting that I have added PrimitiveConversion >>> functionality >>> for doubles and floats which translates to using the union trick >>> for >>> casting double to and from int64_t and float to and from int32_t >>> when >>> passing down doubles and ints to the API. I need the former two, >>> because >>> Java supports volatile double and volatile float, and therefore >>> runtime >>> support for that needs to be able to use floats and doubles. I >>> also >> I didn't quite follow that. What parts of the runtime need to operate >> on >> volatile float/double Java fields? > > At the moment, there are multiple places that support the use of Java- > volatile float/double. > > Some examples: > * The static interpreter supports Java-volatile getfield/putfield (cf. > cppInterpreter_zero.cpp:588, bytecodeInterpreter.cpp:2023) Yes this is the _implementation_ of volatile field access for floats/doubles. I don't count that as a "use". :) > * unsafe supports getters and setters of Java-volatile doubles/floats > (cf. unsafe.cpp:476). Yes this is more of a "use" but again very specific. > This support is not accidental. The Java language allows the use of > volatile floats and doubles. Therefore we must support them in our > runtime. Not quite what I meant. :) Other than the implementation of the Java volatile field accesses (direct of via Unsafe or intrinsics) I was wondering where we might need to do this. The general runtime tends not to do arbitary orderAccess or atomic operations on floats/doubles. Cheers, David > Thanks for the review. > > /Erik > >> >>> >>> added PrimitiveConversion functionality for the subclasses of oop >>> (instanceOop and friends). The base class oop already supported >>> this, so >>> it seemed natural that the subclasses should support it too. >> Ok. >> >> Thanks, >> David >> ----- >> >>> >>> Thanks, >>> /Erik From volker.simonis at gmail.com Fri Oct 6 08:28:03 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Fri, 6 Oct 2017 10:28:03 +0200 Subject: CFV: New hotspot Group Member: Ioi Lam In-Reply-To: References: Message-ID: Vote: yes On Mon, Oct 2, 2017 at 5:24 PM, wrote: > I hereby nominate Ioi Lam (OpenJDK user name: iklam) to Membership in the > hotspot Group. > > Ioi has been working on the hotspot project for over 5 years and is a > Reviewer in the JDK 9 Project with 79 changes. He is an expert in the area > of class data sharing. > > Votes are due by Monday, October 16, 2017. 
> > Only current Members of the hotspot Group [1] are eligible to vote on this > nomination. Votes must be cast in the open by replying to this mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote From goetz.lindenmaier at sap.com Fri Oct 6 08:29:53 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 6 Oct 2017 08:29:53 +0000 Subject: New hotspot Group Member: Ioi Lam In-Reply-To: References: Message-ID: <81e479634d5b43b9a7253c666241d7ba@sap.com> vote: yes > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > Behalf Of coleen.phillimore at oracle.com > Sent: Montag, 2. Oktober 2017 17:25 > To: hotspot-dev developers > Subject: CFV: New hotspot Group Member: Ioi Lam > > I hereby nominate Ioi Lam (OpenJDK user name: iklam) to Membership in > the hotspot Group. > > Ioi has been working on the hotspot project for over 5 years and is a > Reviewer in the JDK 9 Project with 79 changes.?? He is an expert in the > area of class data sharing. > > Votes are due by Monday, October 16, 2017. > > Only current Members of the hotspot Group [1] are eligible to vote on > this nomination. Votes must be cast in the open by replying to this > mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote From erik.osterlund at oracle.com Fri Oct 6 08:49:07 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 6 Oct 2017 10:49:07 +0200 Subject: RFR (M): 8188813: Generalize OrderAccess to use templates In-Reply-To: References: <59D639E1.7070104@oracle.com> <378cd133-e7c8-4ebb-b20e-cfbb2aa30c0d@oracle.com> <1507272506.23180.14.camel@oracle.com> Message-ID: <59D74383.5000204@oracle.com> Hi David, Thanks for looking into this. On 2017-10-06 10:19, David Holmes wrote: > On 6/10/2017 4:48 PM, Erik ?sterlund wrote: >> Hi David, >> >> On fre, 2017-10-06 at 16:01 +1000, David Holmes wrote: >>> Hi Erik, >>> >>> On 5/10/2017 11:55 PM, Erik ?sterlund wrote: >>>> >>>> Hi, >>>> >>>> Now that Atomic has been generalized with templates, the same >>>> should to >>>> be done to OrderAccess. >>>> >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8188813 >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~eosterlund/8188813/webrev.00/ >>> Well I didn't see anything too scary looking. :) I assume we'll drop >>> the >>> ptr variants at some stage. >> >> Yes, that is indeed the plan. >> >>> One query: >>> >>> src/hotspot/share/gc/shared/cardTableModRefBS.inline.hpp >>> >>> Can you declare "volatile jbyte* byte = ..." to avoid the volatile >>> cast >>> on the orderAccess call? >> >> Sure. Fixed. >> >>> >>>> >>>> Testing: mach5 hs-tier3 >>>> >>>> Since Atomic already has a mechanism for type checking generic >>>> arguments >>>> for Atomic::load/store, and OrderAccess also is a bunch of >>>> semantically >>>> decorated loads and stores, I decided to reuse the template wheel >>>> that >>>> was already invented (Atomic::LoadImpl and Atomic::StoreImpl). >>>> Therefore, I made OrderAccess privately inherit Atomic so that >>>> this >>>> infrastructure could be reused. A whole bunch of code has been >>>> nuked >>>> with this generalization. >>> Good! 
>> >> :) >> >>> >>>> >>>> It is worth noting that I have added PrimitiveConversion >>>> functionality >>>> for doubles and floats which translates to using the union trick >>>> for >>>> casting double to and from int64_t and float to and from int32_t >>>> when >>>> passing down doubles and ints to the API. I need the former two, >>>> because >>>> Java supports volatile double and volatile float, and therefore >>>> runtime >>>> support for that needs to be able to use floats and doubles. I >>>> also >>> I didn't quite follow that. What parts of the runtime need to operate >>> on >>> volatile float/double Java fields? >> >> At the moment, there are multiple places that support the use of Java- >> volatile float/double. >> >> Some examples: >> * The static interpreter supports Java-volatile getfield/putfield (cf. >> cppInterpreter_zero.cpp:588, bytecodeInterpreter.cpp:2023) > > Yes this is the _implementation_ of volatile field access for > floats/doubles. I don't count that as a "use". :) > >> * unsafe supports getters and setters of Java-volatile doubles/floats >> (cf. unsafe.cpp:476). > > Yes this is more of a "use" but again very specific. Naturally, in order to support Java-volatile doubles and floats in the VM, we have the choice of 1) Flicking the PrimitiveConversion double/float switch allowing this to be automatically solved by the API and not rewriting uses of OrderAccess for supporting Java-volatile, or 2) Treating ordered accesses of double/float as special cases requiring manual (and very specific) casting to do the same thing. I thought alternative 1 was nicer, because I dislike unnecessary special cases. > >> This support is not accidental. The Java language allows the use of >> volatile floats and doubles. Therefore we must support them in our >> runtime. > > Not quite what I meant. :) Other than the implementation of the Java > volatile field accesses (direct of via Unsafe or intrinsics) I was > wondering where we might need to do this. The general runtime tends > not to do arbitary orderAccess or atomic operations on floats/doubles. We do not need it for anything else than supporting Java-volatile in the VM. Thanks, /Erik > Cheers, > David > >> Thanks for the review. >> >> /Erik >> >>> >>>> >>>> added PrimitiveConversion functionality for the subclasses of oop >>>> (instanceOop and friends). The base class oop already supported >>>> this, so >>>> it seemed natural that the subclasses should support it too. >>> Ok. >>> >>> Thanks, >>> David >>> ----- >>> >>>> >>>> Thanks, >>>> /Erik From goetz.lindenmaier at sap.com Fri Oct 6 09:13:18 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 6 Oct 2017 09:13:18 +0000 Subject: New hotspot Group Member: Markus Gronlund In-Reply-To: References: Message-ID: <542a2862ad4844908b31c415ff0e2447@sap.com> vote: yes Best regards, Goetz. > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > Behalf Of coleen.phillimore at oracle.com > Sent: Dienstag, 19. September 2017 19:55 > To: hotspot-dev developers > Subject: CFV: New hotspot Group Member: Markus Gronlund > > I hereby nominate Markus Gronlund (OpenJDK user name: mgronlun) to > Membership in the hotspot Group. > > Markus has been working on the hotspot project for over 5 years and is a > Reviewer in the JDK 9 Project with 51 changes.?? He is an expert in the > area of event based tracing of Java programs. > > Votes are due by Tuesday, October 3, 2017. 
> > Only current Members of the hotspot Group [1] are eligible to vote on > this nomination. Votes must be cast in the open by replying to this > mailing list. > > For Lazy Consensus voting instructions, see [2]. > > Coleen > > [1]http://openjdk.java.net/census#hotspot > [2]http://openjdk.java.net/groups/#member-vote > > From coleen.phillimore at oracle.com Fri Oct 6 15:09:38 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 6 Oct 2017 11:09:38 -0400 Subject: RFR (M): 8188813: Generalize OrderAccess to use templates In-Reply-To: <59D639E1.7070104@oracle.com> References: <59D639E1.7070104@oracle.com> Message-ID: http://cr.openjdk.java.net/~eosterlund/8188813/webrev.00/src/hotspot/os_cpu/linux_aarch64/orderAccess_linux_aarch64.inline.hpp.udiff.html +template +struct OrderAccess::PlatformOrderedStore + VALUE_OBJ_CLASS_SPEC +{ + template + void operator()(T v, volatile T* p) const { release_store(p, v); fence(); } +}; Isn't release_store() removed by this patch?? Or does this call back to OrderAccess::release_store, which seems circular (?) Otherwise this looks really nice. I'll remove the *_ptr versions with https://bugs.openjdk.java.net/browse/JDK-8188220 . It's been fun. Thanks, Coleen On 10/5/17 9:55 AM, Erik ?sterlund wrote: > Hi, > > Now that Atomic has been generalized with templates, the same should > to be done to OrderAccess. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8188813 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8188813/webrev.00/ > > Testing: mach5 hs-tier3 > > Since Atomic already has a mechanism for type checking generic > arguments for Atomic::load/store, and OrderAccess also is a bunch of > semantically decorated loads and stores, I decided to reuse the > template wheel that was already invented (Atomic::LoadImpl and > Atomic::StoreImpl). > Therefore, I made OrderAccess privately inherit Atomic so that this > infrastructure could be reused. A whole bunch of code has been nuked > with this generalization. > > It is worth noting that I have added PrimitiveConversion functionality > for doubles and floats which translates to using the union trick for > casting double to and from int64_t and float to and from int32_t when > passing down doubles and ints to the API. I need the former two, > because Java supports volatile double and volatile float, and > therefore runtime support for that needs to be able to use floats and > doubles. I also added PrimitiveConversion functionality for the > subclasses of oop (instanceOop and friends). The base class oop > already supported this, so it seemed natural that the subclasses > should support it too. 
> > Thanks, > /Erik From bob.vandette at oracle.com Fri Oct 6 15:34:37 2017 From: bob.vandette at oracle.com (Bob Vandette) Date: Fri, 6 Oct 2017 11:34:37 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <5f7f3d85-db48-fe6e-28f5-e3f4858f33e8@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> <5f7f3d85-db48-fe6e-28f5-e3f4858f33e8@oracle.com> Message-ID: > On Oct 5, 2017, at 6:12 PM, David Holmes wrote: > > Hi Bob, > > On 6/10/2017 3:57 AM, Bob Vandette wrote: >>> On Oct 5, 2017, at 12:43 PM, Alex Bagehot > wrote: >>> >>> Hi David, >>> >>> On Wed, Oct 4, 2017 at 10:51 PM, David Holmes > wrote: >>> >>> Hi Alex, >>> >>> Can you tell me how shares/quotas are actually implemented in >>> terms of allocating "cpus" to processes when shares/quotas are >>> being applied? >>> >>> The allocation of cpus to processes/threads(tasks as the kernel sees them) or the other way round is called balancing, which is done by Scheduling domains[3]. >>> >>> cpu shares use CFS "group" scheduling[1] to apply the share to all the tasks(threads) in the container. The container cpu shares weight maps directly to a task's weight in CFS, which given it is part of a group is divided by the number of tasks in the group (ie. a default container share of 1024 with 2 threads in the container/group would result in each thread/task having a 512 weight[4]). The same values used by nice[2] also. >>> >>> You can observe the task weight and other scheduler numbers in /proc/sched_debug [4]. You can also kernel trace scheduler activity which typically tells you the tasks involved, the cpu, the event: switch or wakeup, etc. >>> >>> For example in a 12 cpu system if I have a 50% share do I get all >>> 12 CPUs for 50% of a "quantum" each, or do I get 6 CPUs for a full >>> quantum each? >>> >>> >>> You get 12 cpus for 50% of the time on the average if there is another workload that has the same weight as you and is consuming as much as it can. >>> If there's nothing else running on the machine you get 12 cpus for 100% of the time with a cpu shares only config (ie. the burst capacity). >>> >>> I validated that the share was balanced over all the cpus by running linux perf events and checking that there were cpu samples on all cpus. There's bound to be other ways of doing it also. >>> >>> >>> When we try to use the "number of processors" to control the >>> number of threads created, or the number of partitions in a task, >>> then we really want to know how many CPUs we can actually be >>> concurrently running on! >> I?m not sure that the primary question for serverless container execution. Just because you might happen to burst and have available >> to you more CPU time than you specified in your shares doesn?t mean >> that a multi-threaded application running in one of these containers should configure itself to use all available host processors. This would result in over-burdoning the system at times of high load. 
> > And conversely if you restrict yourself to the "share" of processors you get over time (ie 6 instead of 12) then you can severely impact the performance (response time in particular) of the VM and the application running on the VM. So if someone configures an 88 way system to use 1/88 share, you don?t think they expect a highly threaded application to run slower than if they didn?t restrict the shares?? The whole idea about shares is to SHARE the system. Yes, you?d have better performance when the system is idle and only running a single application but that?s not what these container frameworks are trying to accomplish. They want to get the best performance when running many many processes. That?s what I?m optimizing for. > > But I don't see how this can overburden the system. If you app is running alone you get to use all 12 cpus for 100% of the time and life is good. If another app starts up then your 100% drops proportionately. If you schedule 12 apps all with a 1/12 share then everyone gets up to 12 cpus for 1/12 of the time. It's only if you try to schedule a set of apps with a utilization total greater than 1 does the system become overloaded. In my above example, If we run the VM ergonomics based on 88 CPUs, then we are wasting a lot of memory on thread stacks and when many of these processes are running, the system will context switch a lot more than it would if we restricted the creation of threads to the share amount. Bob. > >> The Java runtime, at startup, configures several subsystems to use a number of threads for each system based on the number of available >> processors. These subsystems include things like the number of GC >> threads, JIT compiler and thread pools. > >> The problem I am trying to solve is to come up with a single number >> of CPUs based on container knowledge that can be used for the Java >> runtime subsystem to configure itself. I believe that we should >> trust the implementor of the Mesos or Kubernetes setup and honor their wishes when coming up with this number and not just use the >> processor affinity or number of cpus in the cpuset. > > I don't agree, as has been discussed before. It's perfectly fine, even desirable, in my opinion to have 12 threads executing concurrently for 50% of the time, rather than only 6 threads for 100% (assuming the scheduling technology is even clever enough to realize it can grant your threads 100%). > > Over time the amount of work your app can execute is the same, but the time taken for an individual subtask can vary. If you are just doing one-shot batch processing then it makes no difference. If you're running an app that itself services incoming requests then the response time to individual requests can be impacted. To take the worst-case scenario, imagine you get 12 concurrent requests that would each take 1/12 of your cpu quota. With 12 threads on 12 cpus you can service all 12 requests with a response time of 1/12 time units. But with 6 threads on 6 cpus you can only service 6 requests with a 1/12 response time, and the other 6 will have a 1/6 response time. > >> The challenge is determining the right algorithm that doesn?t penalize the VM. > > Agreed. But I think the current algorithm may penalize the VM, and more importantly the application it is running. > >> My current implementation does this: >> total available logical processors = min (cpusets,sched_getaffinity,shares/1024, quota/period) >> All fractional units are rounded up to the next whole number. 
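Spelled out, that calculation is roughly the following. This is an illustrative sketch only -- the function and parameter names are hypothetical and it is not the actual patch; the inputs are assumed to come from sched_getaffinity() and the cgroup v1 files cpu.shares, cpu.cfs_quota_us and cpu.cfs_period_us.

  // Illustrative sketch only; names are hypothetical, not from the patch.
  static int container_processor_count(int affinity_cpus,  // affinity mask, incl. cpusets
                                        long shares,       // cpu.shares, -1 if unset
                                        long quota_us,     // cpu.cfs_quota_us, -1 if unlimited
                                        long period_us) {  // cpu.cfs_period_us
    int limit = affinity_cpus;
    if (shares > 0) {
      int share_cpus = (int)((shares + 1023) / 1024);                  // shares/1024, rounded up
      if (share_cpus < limit) limit = share_cpus;
    }
    if (quota_us > 0 && period_us > 0) {
      int quota_cpus = (int)((quota_us + period_us - 1) / period_us);  // quota/period, rounded up
      if (quota_cpus < limit) limit = quota_cpus;
    }
    return limit > 0 ? limit : 1;
  }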
> > My point has always been that I just don't think producing a single number from all these factors is the right/best way to deal with this. I think we really want to be able to answer the question "how many processors can I concurrently execute on" distinct from the question of "how much of a time slice will I get on each of those processors". To me "how many" is the question that "availableProcessors" should be answering - and only that question. How much "share" do I get is a different question, and perhaps one that the VM and the application need to be able to ask. > > BTW sched_getaffinity should already account for cpusets ?? > > Cheers, > David > >> Bob. >>> >>> Makes sense to check. Hopefully there aren't any major errors or omissions in the above. >>> Thanks, >>> Alex >>> >>> [1] https://lwn.net/Articles/240474/ >>> [2] https://github.com/torvalds/linux/blob/368f89984bb971b9f8b69eeb85ab19a89f985809/kernel/sched/core.c#L6735 >>> [3] https://lwn.net/Articles/80911/ / http://www.i3s.unice.fr/~jplozi/wastedcores/files/extended_talk.pdf >>> >>> [4] >>> >>> cfs_rq[13]:/system.slice/docker-f5681788d6daab249c90810fe60da429a2565b901ff34245922a578635b5d607.scope >>> >>> .exec_clock: 0.000000 >>> >>> .MIN_vruntime: 0.000001 >>> >>> .min_vruntime: 8090.087297 >>> >>> .max_vruntime: 0.000001 >>> >>> .spread: 0.000000 >>> >>> .spread0 : -124692718.052832 >>> >>> .nr_spread_over: 0 >>> >>> .nr_running: 1 >>> >>> .load: 1024 >>> >>> .runnable_load_avg : 1023 >>> >>> .blocked_load_avg: 0 >>> >>> .tg_load_avg : 2046 >>> >>> .tg_load_contrib : 1023 >>> >>> .tg_runnable_contrib : 1023 >>> >>> .tg->runnable_avg: 2036 >>> >>> .tg->cfs_bandwidth.timer_active: 0 >>> >>> .throttled : 0 >>> >>> .throttle_count: 0 >>> >>> .se->exec_start: 236081964.515645 >>> >>> .se->vruntime: 24403993.326934 >>> >>> .se->sum_exec_runtime: 8091.135873 >>> >>> .se->load.weight : 512 >>> >>> .se->avg.runnable_avg_sum: 45979 >>> >>> .se->avg.runnable_avg_period : 45979 >>> >>> .se->avg.load_avg_contrib: 511 >>> >>> .se->avg.decay_count : 0 >>> >>> >>> Thanks, >>> David >>> >>> >>> On 5/10/2017 6:01 AM, Alex Bagehot wrote: >>> >>> Hi, >>> >>> On Wed, Oct 4, 2017 at 7:51 PM, Bob Vandette >>> > >>> wrote: >>> >>> >>> On Oct 4, 2017, at 2:30 PM, Robbin Ehn >>> > >>> wrote: >>> >>> Thanks Bob for looking into this. >>> >>> On 10/04/2017 08:14 PM, Bob Vandette wrote: >>> >>> Robbin, >>> I?ve looked into this issue and you are correct. I do have to examine >>> >>> both the >>> >>> sched_getaffinity results as well as the cgroup >>> cpu subsystem >>> >>> configuration >>> >>> files in order to provide a reasonable value for >>> active_processors. If >>> >>> I was only >>> >>> interested in cpusets, I could simply rely on the >>> getaffinity call but >>> >>> I also want to >>> >>> factor in shares and quotas as well. >>> >>> >>> We had a quick discussion at the office, we actually >>> do think that you >>> >>> could skip reading the shares and quotas. >>> >>> It really depends on what the user expect, if he give >>> us 4 cpu's with >>> >>> 50% or 2 full cpu what do he expect the differences would be? >>> >>> One could argue that he 'knows' that he will only use >>> max 50% and thus >>> >>> we can act as if he is giving us 4 full cpu. >>> >>> But I'll leave that up to you, just a tough we had. >>> >>> >>> It?s my opinion that we should do something if someone >>> makes the effort to >>> configure their >>> containers to use quotas or shares. There are many >>> different opinions on >>> what the right that >>> right ?something? is. 
>>> >>> >>> It might be interesting to look at some real instances of how >>> java might[3] >>> be deployed in containers. >>> Marathon/Mesos[1] and Kubernetes[2] use shares and quotas so >>> this is a vast >>> chunk of deployments that need both of them today. >>> >>> >>> >>> Many developers that are trying to deploy apps that use >>> containers say >>> they don?t like >>> cpusets. This is too limiting for them especially when >>> the server >>> configurations vary >>> within their organization. >>> >>> >>> True, however Kubernetes has an alpha feature[5] where it >>> allocates cpusets >>> to containers that request a whole number of cpus. Previously >>> without >>> cpusets any container could run on any cpu which we know might >>> not be good >>> for some workloads that want isolation. A request for a >>> fractional or >>> burstable amount of cpu would be allocated from a shared cpu >>> pool. So >>> although manual allocation of cpusets will be flakey[3] , >>> automation should >>> be able to make it work. >>> >>> >>> >>> From everything I?ve read including source code, there >>> seems to be a >>> consensus that >>> shares and quotas are being used as a way to specify a >>> fraction of a >>> system (number of cpus). >>> >>> >>> A refinement[6] on this is: >>> Shares can be used for guaranteed cpu - you will always get >>> your share. >>> Quota[4] is a limit/constraint - you can never get more than >>> the quota. >>> So given the below limit of how many shares will be allocated >>> on a host you >>> can have burstable(or overcommit) capacity if your shares are >>> less than >>> your quota. >>> >>> >>> >>> Docker added ?cpus which is implemented using quotas and >>> periods. They >>> adjust these >>> two parameters to provide a way of calculating the number >>> of cpus that >>> will be available >>> to a process (quota/period). Amazon also documents that >>> cpu shares are >>> defined to be a multiple of 1024. >>> Where 1024 represents a single cpu and a share value of >>> N*1024 represents >>> N cpus. >>> >>> >>> Kubernetes and Mesos/Marathon also use the N*1024 shares per >>> host to >>> allocate resources automatically. >>> >>> Hopefully this provides some background on what a couple of >>> orchestration >>> systems that will be running java are doing currently in this >>> area. >>> Thanks, >>> Alex >>> >>> >>> [1] https://github.com/apache/mesos/commit/346cc8dd528a28a6e >>> >>> 1f1cbdb4c95b8bdea2f6070 / (now out of date but appears to be a >>> reasonable >>> intro : >>> https://zcox.wordpress.com/2014/09/17/cpu-resources-in-docke >>> >>> r-mesos-and-marathon/ ) >>> [1a] https://youtu.be/hJyAfC-Z2xk?t=2439 >>> >>> >>> [2] https://kubernetes.io/docs/concepts/configuration/manage >>> >>> -compute-resources-container/ >>> >>> [3] https://youtu.be/w1rZOY5gbvk?t=2479 >>> >>> >>> [4] >>> https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt >>> >>> https://landley.net/kdocs/ols/2010/ols2010-pages-245-254.pdf >>> >>> https://lwn.net/Articles/428175/ >>> >>> >>> [5] >>> https://github.com/kubernetes/community/blob/43ce57ac476b9f2ce3f0220354a075e095a0d469/contributors/design-proposals/node/cpu-manager.md >>> >>> / https://github.com/kubernetes/kubernetes/commit/ >>> >>> 00f0e0f6504ad8dd85fcbbd6294cd7cf2475fc72 / >>> https://vimeo.com/226858314 >>> >>> >>> [6] https://kubernetes.io/docs/concepts/configuration/manage- >>> >>> compute-resources-container/#how-pods-with-resource-limits-are-run >>> >>> >>> Of course these are just conventions. 
This is why I >>> provided a way of >>> specifying the >>> number of CPUs so folks deploying Java services can be >>> certain they get >>> what they want. >>> >>> Bob. >>> >>> >>> I had assumed that when sched_setaffinity was >>> called (in your case by >>> >>> numactl) that the >>> >>> cgroup cpu config files would be updated to >>> reflect the current >>> >>> processor affinity for the >>> >>> running process. This is not correct. I have >>> updated my changeset and >>> >>> have successfully >>> >>> run with your examples below. I?ll post a new >>> webrev soon. >>> >>> >>> I see, thanks again! >>> >>> /Robbin >>> >>> Thanks, >>> Bob. >>> >>> >>> I still want to include the flag for at >>> least one Java release in the >>> >>> event that the new behavior causes some regression >>> >>> in behavior. I?m trying to make the >>> detection robust so that it will >>> >>> fallback to the current behavior in the event >>> >>> that cgroups is not configured as expected >>> but I?d like to have a way >>> >>> of forcing the issue. JDK 10 is not >>> >>> supposed to be a long term support release >>> which makes it a good >>> >>> target for this new behavior. >>> >>> I agree with David that once we commit to >>> cgroups, we should extract >>> >>> all VM configuration data from that >>> >>> source. There?s more information >>> available for cpusets than just >>> >>> processor affinity that we might want to >>> >>> consider when calculating the number of >>> processors to assume for the >>> >>> VM. There?s exclusivity and >>> >>> effective cpu data available in addition >>> to the cpuset string. >>> >>> >>> cgroup only contains limits, not the real hard >>> limits. >>> You most consider the affinity mask. We that >>> have numa nodes do: >>> >>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 >>> --membind=1 java >>> >>> -Xlog:os=debug -cp . ForEver | grep proc >>> >>> [0.001s][debug][os] Initial active processor >>> count set to 16 >>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 >>> --membind=1 java >>> >>> -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | >>> grep proc >>> >>> [0.001s][debug][os] Initial active processor >>> count set to 32 >>> >>> when benchmarking all the time and that must >>> be set to 16 otherwise >>> >>> the flag is really bad for us. >>> >>> So the flag actually breaks the little numa >>> support we have now. >>> >>> Thanks, Robbin >>> >>> >>> >>> From vladimir.kozlov at oracle.com Fri Oct 6 17:22:24 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 6 Oct 2017 10:22:24 -0700 Subject: RFR(XXS) 8187685: NMT: Tracking compiler memory usage of thread's resource area In-Reply-To: <69808d92-6ac8-9d83-61dc-6bb45936b4dc@redhat.com> References: <69808d92-6ac8-9d83-61dc-6bb45936b4dc@redhat.com> Message-ID: Good. Thank you, Zhengyu. Vladimir On 10/5/17 12:47 PM, Zhengyu Gu wrote: > Compiler uses resource area for compilation, let's bias it to mtCompiler > for more accurate memory counting. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8187685 > Webrev: http://cr.openjdk.java.net/~zgu/8187685/webrev.00/index.html > > > Discussion thread: > http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028360.html > > > Test: > > ? hotspot_tier1? fastdebug and release on Linux x64. 
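As a toy illustration of what "bias it to mtCompiler" means here -- this is not HotSpot code and every name below is made up -- the thread's resource arena simply gets re-tagged so that NMT attributes its allocations to the compiler category rather than the default one.

  // Toy model only -- not HotSpot code; all names below are illustrative.
  enum ToyMemFlags { toyThread, toyCompiler };   // stand-ins for mtThread / mtCompiler

  struct ToyResourceArea {
    ToyMemFlags _flags;                          // category charged for this arena's allocations
    explicit ToyResourceArea(ToyMemFlags f) : _flags(f) {}
    void bias_to(ToyMemFlags f) { _flags = f; }  // re-tag future accounting
  };

  // A compiler thread would re-bias its resource area once during setup,
  // so memory tracking charges its resource allocations to the compiler:
  //   ToyResourceArea ra(toyThread);
  //   ra.bias_to(toyCompiler);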
> > Thanks, > > -Zhengyu From coleen.phillimore at oracle.com Fri Oct 6 17:53:53 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 6 Oct 2017 13:53:53 -0400 Subject: RFR(XXS) 8187685: NMT: Tracking compiler memory usage of thread's resource area In-Reply-To: References: <69808d92-6ac8-9d83-61dc-6bb45936b4dc@redhat.com> Message-ID: This seems fine.? I'll sponsor it for you. Coleen On 10/6/17 1:22 PM, Vladimir Kozlov wrote: > Good. Thank you, Zhengyu. > > Vladimir > > On 10/5/17 12:47 PM, Zhengyu Gu wrote: >> Compiler uses resource area for compilation, let's bias it to >> mtCompiler for more accurate memory counting. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8187685 >> Webrev: http://cr.openjdk.java.net/~zgu/8187685/webrev.00/index.html >> >> >> Discussion thread: >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028360.html >> >> >> Test: >> >> ?? hotspot_tier1? fastdebug and release on Linux x64. >> >> Thanks, >> >> -Zhengyu From ioi.lam at oracle.com Fri Oct 6 20:19:20 2017 From: ioi.lam at oracle.com (Ioi Lam) Date: Fri, 6 Oct 2017 13:19:20 -0700 Subject: RFR (XS) 8188828 Intermittent ClassNotFoundException: jdk.test.lib.Platform for compiler tests Message-ID: <1be927fa-fa4b-1964-93f3-1c72386acf7b@oracle.com> Please review this very simple change: https://bugs.openjdk.java.net/browse/JDK-8188828 http://ioilinux.us.oracle.com/webrev/jdk10/8188828_compiler_test_class_not_found.v01/ The dependency of ??? FileInstaller -> Utils -> JDKToolLauncher -> Platform has caused many intermittent ClassNotFoundException in the hotspot nightly runs. While this fix does not address the root cause (proper dependencies are not specified in the test cases -- which we are planning to fix), we will hopefully see much fewer occurrences of this annoying failure scenario. Thanks a lot to Igor for suggesting the simple fix! - Ioi From igor.ignatyev at oracle.com Fri Oct 6 20:28:58 2017 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 6 Oct 2017 13:28:58 -0700 Subject: RFR (XS) 8188828 Intermittent ClassNotFoundException: jdk.test.lib.Platform for compiler tests In-Reply-To: <1be927fa-fa4b-1964-93f3-1c72386acf7b@oracle.com> References: <1be927fa-fa4b-1964-93f3-1c72386acf7b@oracle.com> Message-ID: <56725166-18B0-47B4-A8FB-DED8B149604D@oracle.com> Hi Ioi, I'm really happy we found such a simple workaround for this annoying problem and hope it'll greatly reduce CNFE in our test runs. the fix looks good to me. Thanks, -- Igor > On Oct 6, 2017, at 1:19 PM, Ioi Lam wrote: > > Please review this very simple change: > > https://bugs.openjdk.java.net/browse/JDK-8188828 > http://ioilinux.us.oracle.com/webrev/jdk10/8188828_compiler_test_class_not_found.v01/ > > The dependency of > > FileInstaller -> Utils -> JDKToolLauncher -> Platform > > has caused many intermittent ClassNotFoundException in the hotspot nightly runs. > While this fix does not address the root cause (proper dependencies are not > specified in the test cases -- which we are planning to fix), we will hopefully > see much fewer occurrences of this annoying failure scenario. > > Thanks a lot to Igor for suggesting the simple fix! 
> > - Ioi > From george.triantafillou at oracle.com Fri Oct 6 20:39:16 2017 From: george.triantafillou at oracle.com (George Triantafillou) Date: Fri, 6 Oct 2017 16:39:16 -0400 Subject: RFR (XS) 8188828 Intermittent ClassNotFoundException: jdk.test.lib.Platform for compiler tests In-Reply-To: <56725166-18B0-47B4-A8FB-DED8B149604D@oracle.com> References: <1be927fa-fa4b-1964-93f3-1c72386acf7b@oracle.com> <56725166-18B0-47B4-A8FB-DED8B149604D@oracle.com> Message-ID: Hi Ioi, Looks good! -George On 10/6/2017 4:28 PM, Igor Ignatyev wrote: > Hi Ioi, > > I'm really happy we found such a simple workaround for this annoying problem and hope it'll greatly reduce CNFE in our test runs. > > the fix looks good to me. > > Thanks, > -- Igor > >> On Oct 6, 2017, at 1:19 PM, Ioi Lam wrote: >> >> Please review this very simple change: >> >> https://bugs.openjdk.java.net/browse/JDK-8188828 >> http://ioilinux.us.oracle.com/webrev/jdk10/8188828_compiler_test_class_not_found.v01/ >> >> The dependency of >> >> FileInstaller -> Utils -> JDKToolLauncher -> Platform >> >> has caused many intermittent ClassNotFoundException in the hotspot nightly runs. >> While this fix does not address the root cause (proper dependencies are not >> specified in the test cases -- which we are planning to fix), we will hopefully >> see much fewer occurrences of this annoying failure scenario. >> >> Thanks a lot to Igor for suggesting the simple fix! >> >> - Ioi >> From zgu at redhat.com Fri Oct 6 21:44:05 2017 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 6 Oct 2017 17:44:05 -0400 Subject: RFR(XXS) 8187685: NMT: Tracking compiler memory usage of thread's resource area In-Reply-To: References: <69808d92-6ac8-9d83-61dc-6bb45936b4dc@redhat.com> Message-ID: <16b25caf-8dc2-c899-3840-553908c5ebf5@redhat.com> Thanks for the review, Vladimir. -Zhengyu On 10/06/2017 01:22 PM, Vladimir Kozlov wrote: > Good. Thank you, Zhengyu. > > Vladimir > > On 10/5/17 12:47 PM, Zhengyu Gu wrote: >> Compiler uses resource area for compilation, let's bias it to >> mtCompiler for more accurate memory counting. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8187685 >> Webrev: http://cr.openjdk.java.net/~zgu/8187685/webrev.00/index.html >> >> >> Discussion thread: >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028360.html >> >> >> Test: >> >> hotspot_tier1 fastdebug and release on Linux x64. >> >> Thanks, >> >> -Zhengyu From zgu at redhat.com Fri Oct 6 21:45:42 2017 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 6 Oct 2017 17:45:42 -0400 Subject: RFR(XXS) 8187685: NMT: Tracking compiler memory usage of thread's resource area In-Reply-To: References: <69808d92-6ac8-9d83-61dc-6bb45936b4dc@redhat.com> Message-ID: Hi Coleen, Thanks for the review and sponsor! -Zhengyu On 10/06/2017 01:53 PM, coleen.phillimore at oracle.com wrote: > This seems fine. I'll sponsor it for you. > Coleen > > > On 10/6/17 1:22 PM, Vladimir Kozlov wrote: >> Good. Thank you, Zhengyu. >> >> Vladimir >> >> On 10/5/17 12:47 PM, Zhengyu Gu wrote: >>> Compiler uses resource area for compilation, let's bias it to >>> mtCompiler for more accurate memory counting. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8187685 >>> Webrev: http://cr.openjdk.java.net/~zgu/8187685/webrev.00/index.html >>> >>> >>> Discussion thread: >>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028360.html >>> >>> >>> Test: >>> >>> hotspot_tier1 fastdebug and release on Linux x64. 
>>> >>> Thanks, >>> >>> -Zhengyu > -------------- next part -------------- A non-text attachment was scrubbed... Name: 8187685.patch Type: text/x-patch Size: 2473 bytes Desc: not available URL: From david.holmes at oracle.com Fri Oct 6 23:10:34 2017 From: david.holmes at oracle.com (David Holmes) Date: Sat, 7 Oct 2017 09:10:34 +1000 Subject: RFR (M): 8188813: Generalize OrderAccess to use templates In-Reply-To: References: <59D639E1.7070104@oracle.com> Message-ID: On 7/10/2017 1:09 AM, coleen.phillimore at oracle.com wrote: > http://cr.openjdk.java.net/~eosterlund/8188813/webrev.00/src/hotspot/os_cpu/linux_aarch64/orderAccess_linux_aarch64.inline.hpp.udiff.html > > > +template > +struct OrderAccess::PlatformOrderedStore > + VALUE_OBJ_CLASS_SPEC > +{ > + template > + void operator()(T v, volatile T* p) const { release_store(p, v); > fence(); } > +}; > > Isn't release_store() removed by this patch?? Or does this call back to > OrderAccess::release_store, which seems circular (?) It's the same as the existing implementation. Without a specialization for a specific CPU the release_store_fence is just a release_store then a fence. David > Otherwise this looks really nice. > > I'll remove the *_ptr versions with > https://bugs.openjdk.java.net/browse/JDK-8188220 . It's been fun. > > Thanks, > Coleen > > > On 10/5/17 9:55 AM, Erik ?sterlund wrote: >> Hi, >> >> Now that Atomic has been generalized with templates, the same should >> to be done to OrderAccess. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8188813 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8188813/webrev.00/ >> >> Testing: mach5 hs-tier3 >> >> Since Atomic already has a mechanism for type checking generic >> arguments for Atomic::load/store, and OrderAccess also is a bunch of >> semantically decorated loads and stores, I decided to reuse the >> template wheel that was already invented (Atomic::LoadImpl and >> Atomic::StoreImpl). >> Therefore, I made OrderAccess privately inherit Atomic so that this >> infrastructure could be reused. A whole bunch of code has been nuked >> with this generalization. >> >> It is worth noting that I have added PrimitiveConversion functionality >> for doubles and floats which translates to using the union trick for >> casting double to and from int64_t and float to and from int32_t when >> passing down doubles and ints to the API. I need the former two, >> because Java supports volatile double and volatile float, and >> therefore runtime support for that needs to be able to use floats and >> doubles. I also added PrimitiveConversion functionality for the >> subclasses of oop (instanceOop and friends). The base class oop >> already supported this, so it seemed natural that the subclasses >> should support it too. 
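For the volatile double/float support mentioned above, the Java-level counterpart of that union-based conversion is the raw bits round trip below; this is only an analogy for what PrimitiveConversion does on the C++ side, not HotSpot code:

    public class FloatBitsRoundTrip {
        public static void main(String[] args) {
            double d = 3.141592653589793;
            long bits = Double.doubleToRawLongBits(d);    // reinterpret double as int64
            double back = Double.longBitsToDouble(bits);  // and back, bit pattern preserved
            System.out.println(d == back);                // true: no value conversion happened

            float f = 2.5f;
            int fbits = Float.floatToRawIntBits(f);       // reinterpret float as int32
            System.out.println(Float.intBitsToFloat(fbits) == f);
        }
    }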
>> >> Thanks, >> /Erik > From david.holmes at oracle.com Fri Oct 6 23:28:14 2017 From: david.holmes at oracle.com (David Holmes) Date: Sat, 7 Oct 2017 09:28:14 +1000 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> <5f7f3d85-db48-fe6e-28f5-e3f4858f33e8@oracle.com> Message-ID: On 7/10/2017 1:34 AM, Bob Vandette wrote: > >> On Oct 5, 2017, at 6:12 PM, David Holmes wrote: >> >> Hi Bob, >> >> On 6/10/2017 3:57 AM, Bob Vandette wrote: >>>> On Oct 5, 2017, at 12:43 PM, Alex Bagehot > wrote: >>>> >>>> Hi David, >>>> >>>> On Wed, Oct 4, 2017 at 10:51 PM, David Holmes > wrote: >>>> >>>> Hi Alex, >>>> >>>> Can you tell me how shares/quotas are actually implemented in >>>> terms of allocating "cpus" to processes when shares/quotas are >>>> being applied? >>>> >>>> The allocation of cpus to processes/threads(tasks as the kernel sees them) or the other way round is called balancing, which is done by Scheduling domains[3]. >>>> >>>> cpu shares use CFS "group" scheduling[1] to apply the share to all the tasks(threads) in the container. The container cpu shares weight maps directly to a task's weight in CFS, which given it is part of a group is divided by the number of tasks in the group (ie. a default container share of 1024 with 2 threads in the container/group would result in each thread/task having a 512 weight[4]). The same values used by nice[2] also. >>>> >>>> You can observe the task weight and other scheduler numbers in /proc/sched_debug [4]. You can also kernel trace scheduler activity which typically tells you the tasks involved, the cpu, the event: switch or wakeup, etc. >>>> >>>> For example in a 12 cpu system if I have a 50% share do I get all >>>> 12 CPUs for 50% of a "quantum" each, or do I get 6 CPUs for a full >>>> quantum each? >>>> >>>> >>>> You get 12 cpus for 50% of the time on the average if there is another workload that has the same weight as you and is consuming as much as it can. >>>> If there's nothing else running on the machine you get 12 cpus for 100% of the time with a cpu shares only config (ie. the burst capacity). >>>> >>>> I validated that the share was balanced over all the cpus by running linux perf events and checking that there were cpu samples on all cpus. There's bound to be other ways of doing it also. >>>> >>>> >>>> When we try to use the "number of processors" to control the >>>> number of threads created, or the number of partitions in a task, >>>> then we really want to know how many CPUs we can actually be >>>> concurrently running on! >>> I?m not sure that the primary question for serverless container execution. Just because you might happen to burst and have available >>> to you more CPU time than you specified in your shares doesn?t mean >>> that a multi-threaded application running in one of these containers should configure itself to use all available host processors. This would result in over-burdoning the system at times of high load. 
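One concrete reason the reported count matters either way: libraries and applications routinely size their worker pools straight off Runtime.availableProcessors(), so whatever number the VM settles on is multiplied through the whole process. A trivial illustration (the fixed-size pool is just an example consumer of that number):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolSizedByCpus {
        public static void main(String[] args) {
            int cpus = Runtime.getRuntime().availableProcessors();
            System.out.println("availableProcessors() = " + cpus);
            // One worker per reported CPU is a common pattern: report 6 instead
            // of 12 and the application starts half as many workers.
            ExecutorService pool = Executors.newFixedThreadPool(cpus);
            pool.shutdown();
        }
    }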
>> >> And conversely if you restrict yourself to the "share" of processors you get over time (ie 6 instead of 12) then you can severely impact the performance (response time in particular) of the VM and the application running on the VM. > > So if someone configures an 88 way system to use 1/88 share, you don?t think they expect a highly threaded > application to run slower than if they didn?t restrict the shares?? The whole idea about shares is to SHARE the > system. Yes, you?d have better performance when the system is idle and only running a single application but that?s > not what these container frameworks are trying to accomplish. They want to get the best performance when running many > many processes. That?s what I?m optimizing for. In what I described you are SHARING the system. You're also getting the most benefit from a lightly loaded system. To me the conceptual model for a 1/88 share of an 88-way system is that you get 88 processors that appear to run at 1/88 the speed of the physical ones. Not that you get 1 real full speed processor. >> >> But I don't see how this can overburden the system. If you app is running alone you get to use all 12 cpus for 100% of the time and life is good. If another app starts up then your 100% drops proportionately. If you schedule 12 apps all with a 1/12 share then everyone gets up to 12 cpus for 1/12 of the time. It's only if you try to schedule a set of apps with a utilization total greater than 1 does the system become overloaded. > > In my above example, If we run the VM ergonomics based on 88 CPUs, then we are wasting a lot of memory on thread stacks and when > many of these processes are running, the system will context switch a lot more than it would if we restricted the creation of threads to > the share amount. Context switching is a function of threads and time. My way uses more threads and less time (per unit of work); yours uses less threads and more time. Seems like zero sum to me. Memory use is a different matter, but only because you can restrict memory independently of cpus. So you will need to ensure your memory quotas can accommodate the number of threads you expect to run - regardless. David ----- > Bob. > > >> >>> The Java runtime, at startup, configures several subsystems to use a number of threads for each system based on the number of available >>> processors. These subsystems include things like the number of GC >>> threads, JIT compiler and thread pools. >> >>> The problem I am trying to solve is to come up with a single number >>> of CPUs based on container knowledge that can be used for the Java >>> runtime subsystem to configure itself. I believe that we should >>> trust the implementor of the Mesos or Kubernetes setup and honor their wishes when coming up with this number and not just use the >>> processor affinity or number of cpus in the cpuset. >> >> I don't agree, as has been discussed before. It's perfectly fine, even desirable, in my opinion to have 12 threads executing concurrently for 50% of the time, rather than only 6 threads for 100% (assuming the scheduling technology is even clever enough to realize it can grant your threads 100%). >> >> Over time the amount of work your app can execute is the same, but the time taken for an individual subtask can vary. If you are just doing one-shot batch processing then it makes no difference. If you're running an app that itself services incoming requests then the response time to individual requests can be impacted. 
To take the worst-case scenario, imagine you get 12 concurrent requests that would each take 1/12 of your cpu quota. With 12 threads on 12 cpus you can service all 12 requests with a response time of 1/12 time units. But with 6 threads on 6 cpus you can only service 6 requests with a 1/12 response time, and the other 6 will have a 1/6 response time. >> >>> The challenge is determining the right algorithm that doesn?t penalize the VM. >> >> Agreed. But I think the current algorithm may penalize the VM, and more importantly the application it is running. >> >>> My current implementation does this: >>> total available logical processors = min (cpusets,sched_getaffinity,shares/1024, quota/period) >>> All fractional units are rounded up to the next whole number. >> >> My point has always been that I just don't think producing a single number from all these factors is the right/best way to deal with this. I think we really want to be able to answer the question "how many processors can I concurrently execute on" distinct from the question of "how much of a time slice will I get on each of those processors". To me "how many" is the question that "availableProcessors" should be answering - and only that question. How much "share" do I get is a different question, and perhaps one that the VM and the application need to be able to ask. >> >> BTW sched_getaffinity should already account for cpusets ?? >> >> Cheers, >> David >> >>> Bob. >>>> >>>> Makes sense to check. Hopefully there aren't any major errors or omissions in the above. >>>> Thanks, >>>> Alex >>>> >>>> [1] https://lwn.net/Articles/240474/ >>>> [2] https://github.com/torvalds/linux/blob/368f89984bb971b9f8b69eeb85ab19a89f985809/kernel/sched/core.c#L6735 >>>> [3] https://lwn.net/Articles/80911/ / http://www.i3s.unice.fr/~jplozi/wastedcores/files/extended_talk.pdf >>>> >>>> [4] >>>> >>>> cfs_rq[13]:/system.slice/docker-f5681788d6daab249c90810fe60da429a2565b901ff34245922a578635b5d607.scope >>>> >>>> .exec_clock: 0.000000 >>>> >>>> .MIN_vruntime: 0.000001 >>>> >>>> .min_vruntime: 8090.087297 >>>> >>>> .max_vruntime: 0.000001 >>>> >>>> .spread: 0.000000 >>>> >>>> .spread0 : -124692718.052832 >>>> >>>> .nr_spread_over: 0 >>>> >>>> .nr_running: 1 >>>> >>>> .load: 1024 >>>> >>>> .runnable_load_avg : 1023 >>>> >>>> .blocked_load_avg: 0 >>>> >>>> .tg_load_avg : 2046 >>>> >>>> .tg_load_contrib : 1023 >>>> >>>> .tg_runnable_contrib : 1023 >>>> >>>> .tg->runnable_avg: 2036 >>>> >>>> .tg->cfs_bandwidth.timer_active: 0 >>>> >>>> .throttled : 0 >>>> >>>> .throttle_count: 0 >>>> >>>> .se->exec_start: 236081964.515645 >>>> >>>> .se->vruntime: 24403993.326934 >>>> >>>> .se->sum_exec_runtime: 8091.135873 >>>> >>>> .se->load.weight : 512 >>>> >>>> .se->avg.runnable_avg_sum: 45979 >>>> >>>> .se->avg.runnable_avg_period : 45979 >>>> >>>> .se->avg.load_avg_contrib: 511 >>>> >>>> .se->avg.decay_count : 0 >>>> >>>> >>>> Thanks, >>>> David >>>> >>>> >>>> On 5/10/2017 6:01 AM, Alex Bagehot wrote: >>>> >>>> Hi, >>>> >>>> On Wed, Oct 4, 2017 at 7:51 PM, Bob Vandette >>>> > >>>> wrote: >>>> >>>> >>>> On Oct 4, 2017, at 2:30 PM, Robbin Ehn >>>> > >>>> wrote: >>>> >>>> Thanks Bob for looking into this. >>>> >>>> On 10/04/2017 08:14 PM, Bob Vandette wrote: >>>> >>>> Robbin, >>>> I?ve looked into this issue and you are correct. I do have to examine >>>> >>>> both the >>>> >>>> sched_getaffinity results as well as the cgroup >>>> cpu subsystem >>>> >>>> configuration >>>> >>>> files in order to provide a reasonable value for >>>> active_processors. 
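A rough Java sketch of that min(cpusets, sched_getaffinity, shares/1024, quota/period) combination, reading the cgroup v1 cpu controller directly; the /sys/fs/cgroup/cpu paths and the use of availableProcessors() as a stand-in for the cpuset/affinity count are simplifying assumptions, and a real implementation would resolve the actual mount point and the process's own cgroup, and call sched_getaffinity():

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ContainerCpuEstimate {
        private static long readLong(Path p, long fallback) {
            try {
                return Long.parseLong(Files.readAllLines(p).get(0).trim());
            } catch (Exception e) {
                return fallback;   // missing/unreadable file: treat as "no limit"
            }
        }

        public static void main(String[] args) {
            // Stand-in for the cpuset / sched_getaffinity derived count.
            int limit = Runtime.getRuntime().availableProcessors();

            long shares = readLong(Paths.get("/sys/fs/cgroup/cpu/cpu.shares"), -1);
            long quota  = readLong(Paths.get("/sys/fs/cgroup/cpu/cpu.cfs_quota_us"), -1);
            long period = readLong(Paths.get("/sys/fs/cgroup/cpu/cpu.cfs_period_us"), -1);

            if (shares > 0) {               // convention: 1024 shares == one CPU
                limit = Math.min(limit, (int) Math.ceil(shares / 1024.0));
            }
            if (quota > 0 && period > 0) {  // a quota of -1 means unlimited
                limit = Math.min(limit, (int) Math.ceil((double) quota / period));
            }
            System.out.println("estimated active processors: " + Math.max(limit, 1));
        }
    }

Note that an untouched cgroup also reports the default 1024 shares, which this sketch would read as a one-CPU cap; telling "default" apart from "deliberately set to 1024" is exactly the kind of corner the real detection code has to be careful about.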
If >>>> >>>> I was only >>>> >>>> interested in cpusets, I could simply rely on the >>>> getaffinity call but >>>> >>>> I also want to >>>> >>>> factor in shares and quotas as well. >>>> >>>> >>>> We had a quick discussion at the office, we actually >>>> do think that you >>>> >>>> could skip reading the shares and quotas. >>>> >>>> It really depends on what the user expect, if he give >>>> us 4 cpu's with >>>> >>>> 50% or 2 full cpu what do he expect the differences would be? >>>> >>>> One could argue that he 'knows' that he will only use >>>> max 50% and thus >>>> >>>> we can act as if he is giving us 4 full cpu. >>>> >>>> But I'll leave that up to you, just a tough we had. >>>> >>>> >>>> It?s my opinion that we should do something if someone >>>> makes the effort to >>>> configure their >>>> containers to use quotas or shares. There are many >>>> different opinions on >>>> what the right that >>>> right ?something? is. >>>> >>>> >>>> It might be interesting to look at some real instances of how >>>> java might[3] >>>> be deployed in containers. >>>> Marathon/Mesos[1] and Kubernetes[2] use shares and quotas so >>>> this is a vast >>>> chunk of deployments that need both of them today. >>>> >>>> >>>> >>>> Many developers that are trying to deploy apps that use >>>> containers say >>>> they don?t like >>>> cpusets. This is too limiting for them especially when >>>> the server >>>> configurations vary >>>> within their organization. >>>> >>>> >>>> True, however Kubernetes has an alpha feature[5] where it >>>> allocates cpusets >>>> to containers that request a whole number of cpus. Previously >>>> without >>>> cpusets any container could run on any cpu which we know might >>>> not be good >>>> for some workloads that want isolation. A request for a >>>> fractional or >>>> burstable amount of cpu would be allocated from a shared cpu >>>> pool. So >>>> although manual allocation of cpusets will be flakey[3] , >>>> automation should >>>> be able to make it work. >>>> >>>> >>>> >>>> From everything I?ve read including source code, there >>>> seems to be a >>>> consensus that >>>> shares and quotas are being used as a way to specify a >>>> fraction of a >>>> system (number of cpus). >>>> >>>> >>>> A refinement[6] on this is: >>>> Shares can be used for guaranteed cpu - you will always get >>>> your share. >>>> Quota[4] is a limit/constraint - you can never get more than >>>> the quota. >>>> So given the below limit of how many shares will be allocated >>>> on a host you >>>> can have burstable(or overcommit) capacity if your shares are >>>> less than >>>> your quota. >>>> >>>> >>>> >>>> Docker added ?cpus which is implemented using quotas and >>>> periods. They >>>> adjust these >>>> two parameters to provide a way of calculating the number >>>> of cpus that >>>> will be available >>>> to a process (quota/period). Amazon also documents that >>>> cpu shares are >>>> defined to be a multiple of 1024. >>>> Where 1024 represents a single cpu and a share value of >>>> N*1024 represents >>>> N cpus. >>>> >>>> >>>> Kubernetes and Mesos/Marathon also use the N*1024 shares per >>>> host to >>>> allocate resources automatically. >>>> >>>> Hopefully this provides some background on what a couple of >>>> orchestration >>>> systems that will be running java are doing currently in this >>>> area. 
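As a worked example of those conventions (the figures are illustrative; 100000 microseconds is the usual default CFS period, and 1024 shares per CPU is a convention the orchestrators follow rather than anything the kernel enforces):

    public class CpuSpecToCgroup {
        public static void main(String[] args) {
            double cpusFlag = 1.5;        // e.g. docker run --cpus=1.5
            double requestedCpus = 0.5;   // e.g. a Kubernetes CPU request of 500m

            long periodUs = 100_000;                          // default cfs_period_us
            long quotaUs  = (long) (cpusFlag * periodUs);     // --cpus maps to quota/period
            long shares   = Math.round(requestedCpus * 1024); // N CPUs map to N * 1024 shares

            System.out.println("cpu.cfs_period_us = " + periodUs); // 100000
            System.out.println("cpu.cfs_quota_us  = " + quotaUs);  // 150000
            System.out.println("cpu.shares        = " + shares);   // 512
        }
    }

Reading those values back and dividing in the other direction (quota/period, shares/1024) yields the fractional CPU count that the detection code then has to round and clamp.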
>>>> Thanks, >>>> Alex >>>> >>>> >>>> [1] https://github.com/apache/mesos/commit/346cc8dd528a28a6e >>>> >>>> 1f1cbdb4c95b8bdea2f6070 / (now out of date but appears to be a >>>> reasonable >>>> intro : >>>> https://zcox.wordpress.com/2014/09/17/cpu-resources-in-docke >>>> >>>> r-mesos-and-marathon/ ) >>>> [1a] https://youtu.be/hJyAfC-Z2xk?t=2439 >>>> >>>> >>>> [2] https://kubernetes.io/docs/concepts/configuration/manage >>>> >>>> -compute-resources-container/ >>>> >>>> [3] https://youtu.be/w1rZOY5gbvk?t=2479 >>>> >>>> >>>> [4] >>>> https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt >>>> >>>> https://landley.net/kdocs/ols/2010/ols2010-pages-245-254.pdf >>>> >>>> https://lwn.net/Articles/428175/ >>>> >>>> >>>> [5] >>>> https://github.com/kubernetes/community/blob/43ce57ac476b9f2ce3f0220354a075e095a0d469/contributors/design-proposals/node/cpu-manager.md >>>> >>>> / https://github.com/kubernetes/kubernetes/commit/ >>>> >>>> 00f0e0f6504ad8dd85fcbbd6294cd7cf2475fc72 / >>>> https://vimeo.com/226858314 >>>> >>>> >>>> [6] https://kubernetes.io/docs/concepts/configuration/manage- >>>> >>>> compute-resources-container/#how-pods-with-resource-limits-are-run >>>> >>>> >>>> Of course these are just conventions. This is why I >>>> provided a way of >>>> specifying the >>>> number of CPUs so folks deploying Java services can be >>>> certain they get >>>> what they want. >>>> >>>> Bob. >>>> >>>> >>>> I had assumed that when sched_setaffinity was >>>> called (in your case by >>>> >>>> numactl) that the >>>> >>>> cgroup cpu config files would be updated to >>>> reflect the current >>>> >>>> processor affinity for the >>>> >>>> running process. This is not correct. I have >>>> updated my changeset and >>>> >>>> have successfully >>>> >>>> run with your examples below. I?ll post a new >>>> webrev soon. >>>> >>>> >>>> I see, thanks again! >>>> >>>> /Robbin >>>> >>>> Thanks, >>>> Bob. >>>> >>>> >>>> I still want to include the flag for at >>>> least one Java release in the >>>> >>>> event that the new behavior causes some regression >>>> >>>> in behavior. I?m trying to make the >>>> detection robust so that it will >>>> >>>> fallback to the current behavior in the event >>>> >>>> that cgroups is not configured as expected >>>> but I?d like to have a way >>>> >>>> of forcing the issue. JDK 10 is not >>>> >>>> supposed to be a long term support release >>>> which makes it a good >>>> >>>> target for this new behavior. >>>> >>>> I agree with David that once we commit to >>>> cgroups, we should extract >>>> >>>> all VM configuration data from that >>>> >>>> source. There?s more information >>>> available for cpusets than just >>>> >>>> processor affinity that we might want to >>>> >>>> consider when calculating the number of >>>> processors to assume for the >>>> >>>> VM. There?s exclusivity and >>>> >>>> effective cpu data available in addition >>>> to the cpuset string. >>>> >>>> >>>> cgroup only contains limits, not the real hard >>>> limits. >>>> You most consider the affinity mask. We that >>>> have numa nodes do: >>>> >>>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 >>>> --membind=1 java >>>> >>>> -Xlog:os=debug -cp . ForEver | grep proc >>>> >>>> [0.001s][debug][os] Initial active processor >>>> count set to 16 >>>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 >>>> --membind=1 java >>>> >>>> -Xlog:os=debug -XX:+UseContainerSupport -cp . 
ForEver | >>>> grep proc >>>> >>>> [0.001s][debug][os] Initial active processor >>>> count set to 32 >>>> >>>> when benchmarking all the time and that must >>>> be set to 16 otherwise >>>> >>>> the flag is really bad for us. >>>> >>>> So the flag actually breaks the little numa >>>> support we have now. >>>> >>>> Thanks, Robbin >>>> >>>> >>>> >>>> > From vladimir.kozlov at oracle.com Fri Oct 6 23:35:30 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 6 Oct 2017 16:35:30 -0700 Subject: RFR (XS) 8188828 Intermittent ClassNotFoundException: jdk.test.lib.Platform for compiler tests In-Reply-To: <56725166-18B0-47B4-A8FB-DED8B149604D@oracle.com> References: <1be927fa-fa4b-1964-93f3-1c72386acf7b@oracle.com> <56725166-18B0-47B4-A8FB-DED8B149604D@oracle.com> Message-ID: Looks good. Thanks, Vladimir On 10/6/17 1:28 PM, Igor Ignatyev wrote: > Hi Ioi, > > I'm really happy we found such a simple workaround for this annoying problem and hope it'll greatly reduce CNFE in our test runs. > > the fix looks good to me. > > Thanks, > -- Igor > >> On Oct 6, 2017, at 1:19 PM, Ioi Lam wrote: >> >> Please review this very simple change: >> >> https://bugs.openjdk.java.net/browse/JDK-8188828 >> http://ioilinux.us.oracle.com/webrev/jdk10/8188828_compiler_test_class_not_found.v01/ >> >> The dependency of >> >> FileInstaller -> Utils -> JDKToolLauncher -> Platform >> >> has caused many intermittent ClassNotFoundException in the hotspot nightly runs. >> While this fix does not address the root cause (proper dependencies are not >> specified in the test cases -- which we are planning to fix), we will hopefully >> see much fewer occurrences of this annoying failure scenario. >> >> Thanks a lot to Igor for suggesting the simple fix! >> >> - Ioi >> > From wenlei.xie at gmail.com Sat Oct 7 06:42:53 2017 From: wenlei.xie at gmail.com (Wenlei Xie) Date: Fri, 6 Oct 2017 23:42:53 -0700 Subject: Questions about ... Lambda Form Compilation In-Reply-To: <57d5cf51-111f-d34a-e161-02df724b6577@oracle.com> References: <57d5cf51-111f-d34a-e161-02df724b6577@oracle.com> Message-ID: Thank you Vladimir! We are aware of MethodHandle get customization after calling over 127 times (thank you for the explanation in http://mail.openjdk.java.net/pipermail/mlvm-dev/2017-May/006755.html as well! ). And thus we are trying to avoid continuously instantiating them. For this case, the MethodHandle get continuously instantiated should be cached by LoadingCache in Guava. We are looking into why the cache fails to work in the expected way. Will get back if we have any new observations or findings! Thank you for the help! Best, Wenlei On Tue, Oct 3, 2017 at 4:54 AM, Vladimir Ivanov < vladimir.x.ivanov at oracle.com> wrote: > Hi, > > 2. For the same cluster, we also see over half of machines repeatedly >> experiencing full GC due to Metaspace full. We dump JSTACK for every >> minute >> during 30 minutes, and see many threads are trying to compile the exact >> same lambda form throughout the 30-minute period. >> >> Here is an example stacktrace on one machine. The LambdaForm triggers the >> compilation on that machine is always LambdaForm$MH/170067652. Once it's >> compiled, it should use the new compiled lambda form. We don't know why >> it's still trying to compile the same lambda form again and again. -- >> Would >> it be because the compiled lambda form somehow failed to load? This might >> relate to the negative number of loaded classes. >> > > What you are seeing here is LambdaForm customization (8069591 [1]). 
> > Customization creates a new LambdaForm instance specialized for a > particular MethodHandle instance (no LF sharing possible). It was designed > to alleviate performance penalty when inlining through a MH invoker doesn't > happen and enables JIT-compilers to compile the whole method handle chain > into a single nmethod. Without customization a method handle chain breaks > up into a chain of small nmethods (1 nmethod per LambdaForm) and calls > between them start dominate the execution time. (More details are available > in [2].) > > Customization takes place once a method handle has been invoked through > MH.invoke/invokeExact() more than 127 times. > > Considering you observe continuous customization, it means there are > method handles being continuously instantiated and used which share the > same lambda form (LambdaForm$MH/170067652). It leads to excessive > generation of VM anonymous classes and creates memory pressure in Metaspace. > > As a workaround, you can try to disable LF customization > (java.lang.invoke.MethodHandle.CUSTOMIZE_THRESHOLD=-1). > > But I'd suggest to look into why the application continuously creates > method handles. As you noted, it doesn't play well with existing heuristics > aimed at maximum throughput which assume the application behavior > "stabilizes" over time. > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8069591 > > [2] http://cr.openjdk.java.net/~vlivanov/talks/2015-JVMLS_State_of_JLI.pdf > slides #45-#50 > > "20170926_232912_39740_3vuuu.1.79-4-76640" #76640 prio=5 os_prio=0 >> tid=0x00007f908006dbd0 nid=0x150a6 runnable [0x00007f8bddb1b000] >> java.lang.Thread.State: RUNNABLE >> at sun.misc.Unsafe.defineAnonymousClass(Native Method) >> at java.lang.invoke.InvokerBytecodeGenerator. >> loadAndInitializeInvokerClass(InvokerBytecodeGenerator.java:284) >> at java.lang.invoke.InvokerBytecodeGenerator.loadMethod( >> InvokerBytecodeGenerator.java:276) >> at java.lang.invoke.InvokerBytecodeGenerator. >> generateCustomizedCode(InvokerBytecodeGenerator.java:618) >> at java.lang.invoke.LambdaForm.compileToBytecode(LambdaForm. >> java:654) >> at java.lang.invoke.LambdaForm.prepare(LambdaForm.java:635) >> at java.lang.invoke.MethodHandle. >> updateForm(MethodHandle.java: >> 1432) >> at java.lang.invoke.MethodHandle. >> customize(MethodHandle.java: >> 1442) >> at java.lang.invoke.Invokers.mayb >> eCustomize(Invokers.java:407) >> at java.lang.invoke.Invokers.chec >> kCustomized(Invokers.java:398) >> at java.lang.invoke.LambdaForm$MH/170067652.invokeExact_MT( >> LambdaForm$MH) >> at com.facebook.presto.operator.aggregation.MinMaxHelper. >> combineStateWithState(MinMaxHelper.java:141) >> at com.facebook.presto.operator.aggregation. >> MaxAggregationFunction.combine(MaxAggregationFunction.java:108) >> at java.lang.invoke.LambdaForm$DMH/1607453282.invokeStatic_ >> L3_V(LambdaForm$DMH) >> at java.lang.invoke.LambdaForm$BMH/1118134445.reinvoke( >> LambdaForm$BMH) >> at java.lang.invoke.LambdaForm$MH/1971758264. >> linkToTargetMethod(LambdaForm$MH) >> at com.facebook.presto.$gen.IntegerIntegerMaxGroupedAccumu >> lator_3439.addIntermediate(Unknown Source) >> at com.facebook.presto.operator.aggregation.builder. >> InMemoryHashAggregationBuilder$Aggregator.processPage( >> InMemoryHashAggregationBuilder.java:367) >> at com.facebook.presto.operator.aggregation.builder. >> InMemoryHashAggregationBuilder.processPage(InMemoryHashAggregationBuilder >> .java:138) >> at com.facebook.presto.operator.HashAggregationOperator. 
>> addInput(HashAggregationOperator.java:400) >> at com.facebook.presto.operator.D >> river.processInternal(Driver. >> java:343) >> at com.facebook.presto.operator.Driver.lambda$processFor$6( >> Driver.java:241) >> at com.facebook.presto.operator.Driver$$Lambda$765/ >> 442308692.get(Unknown >> Source) >> at com.facebook.presto.operator.Driver.tryWithLock(Driver. >> java:614) >> at com.facebook.presto.operator.D >> river.processFor(Driver.java: >> 235) >> at com.facebook.presto.execution.SqlTaskExecution$ >> DriverSplitRunner.processFor(SqlTaskExecution.java:622) >> at com.facebook.presto.execution.executor. >> PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163) >> at com.facebook.presto.execution.executor.TaskExecutor$ >> TaskRunner.run(TaskExecutor.java:485) >> at java.util.concurrent.ThreadPoolExecutor.runWorker( >> ThreadPoolExecutor.java:1142) >> at java.util.concurrent.ThreadPoolExecutor$Worker.run( >> ThreadPoolExecutor.java:617) >> at java.lang.Thread.run(Thread.java:748) >> ... >> >> >> >> Both issues go away after we restart the JVM, and the same query won't >> trigger the LambdaForm compilation issue, so it looks like the JVM enters >> some weird state. We are wondering if there is any thoughts on what could >> trigger these issues? Or is there any suggestions about how to further >> investigate it next time we see the VM in this state? >> >> Thank you. >> >> >> -- Best Regards, Wenlei Xie (???) Email: wenlei.xie at gmail.com From forax at univ-mlv.fr Sat Oct 7 09:13:47 2017 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 7 Oct 2017 11:13:47 +0200 (CEST) Subject: Questions about ... Lambda Form Compilation In-Reply-To: References: <57d5cf51-111f-d34a-e161-02df724b6577@oracle.com> Message-ID: <2012378759.3898930.1507367627698.JavaMail.zimbra@u-pem.fr> Depending on what you want to do, you can also use java.lang.ClassValue as cache. cheers, R?mi ----- Mail original ----- > De: "Wenlei Xie" > ?: "Vladimir Ivanov" > Cc: hotspot-dev at openjdk.java.net > Envoy?: Samedi 7 Octobre 2017 08:42:53 > Objet: Re: Questions about ... Lambda Form Compilation > Thank you Vladimir! > > We are aware of MethodHandle get customization after calling over 127 times > (thank you for the explanation in > http://mail.openjdk.java.net/pipermail/mlvm-dev/2017-May/006755.html as > well! ). And thus we are trying to avoid continuously instantiating them. > > For this case, the MethodHandle get continuously instantiated should be > cached by LoadingCache in Guava. We are looking into why the cache fails to > work in the expected way. Will get back if we have any new observations or > findings! > > Thank you for the help! > > Best, > Wenlei > > On Tue, Oct 3, 2017 at 4:54 AM, Vladimir Ivanov < > vladimir.x.ivanov at oracle.com> wrote: > >> Hi, >> >> 2. For the same cluster, we also see over half of machines repeatedly >>> experiencing full GC due to Metaspace full. We dump JSTACK for every >>> minute >>> during 30 minutes, and see many threads are trying to compile the exact >>> same lambda form throughout the 30-minute period. >>> >>> Here is an example stacktrace on one machine. The LambdaForm triggers the >>> compilation on that machine is always LambdaForm$MH/170067652. Once it's >>> compiled, it should use the new compiled lambda form. We don't know why >>> it's still trying to compile the same lambda form again and again. -- >>> Would >>> it be because the compiled lambda form somehow failed to load? This might >>> relate to the negative number of loaded classes. 
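A minimal sketch of the ClassValue-based caching suggested above, so that exactly one MethodHandle per class is created and then reused by every caller; the lookup target here, a hypothetical static long combine(long, long) on the keyed class, is made up purely for illustration:

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;

    public class CombinerHandles {
        // computeValue() runs at most once per class; afterwards get() returns
        // the same MethodHandle instance, so customization also happens once.
        private static final ClassValue<MethodHandle> COMBINERS = new ClassValue<MethodHandle>() {
            @Override
            protected MethodHandle computeValue(Class<?> type) {
                try {
                    return MethodHandles.lookup().findStatic(type, "combine",
                            MethodType.methodType(long.class, long.class, long.class));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalArgumentException(type + " has no combine(long, long)", e);
                }
            }
        };

        static long combine(Class<?> type, long a, long b) throws Throwable {
            return (long) COMBINERS.get(type).invokeExact(a, b);
        }
    }

If recreating handles really is unavoidable, the -Djava.lang.invoke.MethodHandle.CUSTOMIZE_THRESHOLD=-1 workaround mentioned earlier in the thread at least stops the repeated spinning of new LambdaForm classes.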
>>> >> >> What you are seeing here is LambdaForm customization (8069591 [1]). >> >> Customization creates a new LambdaForm instance specialized for a >> particular MethodHandle instance (no LF sharing possible). It was designed >> to alleviate performance penalty when inlining through a MH invoker doesn't >> happen and enables JIT-compilers to compile the whole method handle chain >> into a single nmethod. Without customization a method handle chain breaks >> up into a chain of small nmethods (1 nmethod per LambdaForm) and calls >> between them start dominate the execution time. (More details are available >> in [2].) >> >> Customization takes place once a method handle has been invoked through >> MH.invoke/invokeExact() more than 127 times. >> >> Considering you observe continuous customization, it means there are >> method handles being continuously instantiated and used which share the >> same lambda form (LambdaForm$MH/170067652). It leads to excessive >> generation of VM anonymous classes and creates memory pressure in Metaspace. >> >> As a workaround, you can try to disable LF customization >> (java.lang.invoke.MethodHandle.CUSTOMIZE_THRESHOLD=-1). >> >> But I'd suggest to look into why the application continuously creates >> method handles. As you noted, it doesn't play well with existing heuristics >> aimed at maximum throughput which assume the application behavior >> "stabilizes" over time. >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8069591 >> >> [2] http://cr.openjdk.java.net/~vlivanov/talks/2015-JVMLS_State_of_JLI.pdf >> slides #45-#50 >> >> "20170926_232912_39740_3vuuu.1.79-4-76640" #76640 prio=5 os_prio=0 >>> tid=0x00007f908006dbd0 nid=0x150a6 runnable [0x00007f8bddb1b000] >>> java.lang.Thread.State: RUNNABLE >>> at sun.misc.Unsafe.defineAnonymousClass(Native Method) >>> at java.lang.invoke.InvokerBytecodeGenerator. >>> loadAndInitializeInvokerClass(InvokerBytecodeGenerator.java:284) >>> at java.lang.invoke.InvokerBytecodeGenerator.loadMethod( >>> InvokerBytecodeGenerator.java:276) >>> at java.lang.invoke.InvokerBytecodeGenerator. >>> generateCustomizedCode(InvokerBytecodeGenerator.java:618) >>> at java.lang.invoke.LambdaForm.compileToBytecode(LambdaForm. >>> java:654) >>> at java.lang.invoke.LambdaForm.prepare(LambdaForm.java:635) >>> at java.lang.invoke.MethodHandle. >>> updateForm(MethodHandle.java: >>> 1432) >>> at java.lang.invoke.MethodHandle. >>> customize(MethodHandle.java: >>> 1442) >>> at java.lang.invoke.Invokers.mayb >>> eCustomize(Invokers.java:407) >>> at java.lang.invoke.Invokers.chec >>> kCustomized(Invokers.java:398) >>> at java.lang.invoke.LambdaForm$MH/170067652.invokeExact_MT( >>> LambdaForm$MH) >>> at com.facebook.presto.operator.aggregation.MinMaxHelper. >>> combineStateWithState(MinMaxHelper.java:141) >>> at com.facebook.presto.operator.aggregation. >>> MaxAggregationFunction.combine(MaxAggregationFunction.java:108) >>> at java.lang.invoke.LambdaForm$DMH/1607453282.invokeStatic_ >>> L3_V(LambdaForm$DMH) >>> at java.lang.invoke.LambdaForm$BMH/1118134445.reinvoke( >>> LambdaForm$BMH) >>> at java.lang.invoke.LambdaForm$MH/1971758264. >>> linkToTargetMethod(LambdaForm$MH) >>> at com.facebook.presto.$gen.IntegerIntegerMaxGroupedAccumu >>> lator_3439.addIntermediate(Unknown Source) >>> at com.facebook.presto.operator.aggregation.builder. >>> InMemoryHashAggregationBuilder$Aggregator.processPage( >>> InMemoryHashAggregationBuilder.java:367) >>> at com.facebook.presto.operator.aggregation.builder. 
>>> InMemoryHashAggregationBuilder.processPage(InMemoryHashAggregationBuilder >>> .java:138) >>> at com.facebook.presto.operator.HashAggregationOperator. >>> addInput(HashAggregationOperator.java:400) >>> at com.facebook.presto.operator.D >>> river.processInternal(Driver. >>> java:343) >>> at com.facebook.presto.operator.Driver.lambda$processFor$6( >>> Driver.java:241) >>> at com.facebook.presto.operator.Driver$$Lambda$765/ >>> 442308692.get(Unknown >>> Source) >>> at com.facebook.presto.operator.Driver.tryWithLock(Driver. >>> java:614) >>> at com.facebook.presto.operator.D >>> river.processFor(Driver.java: >>> 235) >>> at com.facebook.presto.execution.SqlTaskExecution$ >>> DriverSplitRunner.processFor(SqlTaskExecution.java:622) >>> at com.facebook.presto.execution.executor. >>> PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163) >>> at com.facebook.presto.execution.executor.TaskExecutor$ >>> TaskRunner.run(TaskExecutor.java:485) >>> at java.util.concurrent.ThreadPoolExecutor.runWorker( >>> ThreadPoolExecutor.java:1142) >>> at java.util.concurrent.ThreadPoolExecutor$Worker.run( >>> ThreadPoolExecutor.java:617) >>> at java.lang.Thread.run(Thread.java:748) >>> ... >>> >>> >>> >>> Both issues go away after we restart the JVM, and the same query won't >>> trigger the LambdaForm compilation issue, so it looks like the JVM enters >>> some weird state. We are wondering if there is any thoughts on what could >>> trigger these issues? Or is there any suggestions about how to further >>> investigate it next time we see the VM in this state? >>> >>> Thank you. >>> >>> >>> > > > -- > Best Regards, > Wenlei Xie (???) > > Email: wenlei.xie at gmail.com From david.holmes at oracle.com Mon Oct 9 01:33:57 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 9 Oct 2017 11:33:57 +1000 Subject: RFR (XS) 8188828 Intermittent ClassNotFoundException: jdk.test.lib.Platform for compiler tests In-Reply-To: <1be927fa-fa4b-1964-93f3-1c72386acf7b@oracle.com> References: <1be927fa-fa4b-1964-93f3-1c72386acf7b@oracle.com> Message-ID: <4dea8e6c-34fc-83b6-8fe3-2905ec15b72b@oracle.com> Hi Ioi, This seems like a temporary workaround - fine for now - but what is the real fix here? It's crazy that one test library class can't use another class from the same test library! Thanks, David On 7/10/2017 6:19 AM, Ioi Lam wrote: > Please review this very simple change: > > https://bugs.openjdk.java.net/browse/JDK-8188828 > http://ioilinux.us.oracle.com/webrev/jdk10/8188828_compiler_test_class_not_found.v01/ > > > The dependency of > > ??? FileInstaller -> Utils -> JDKToolLauncher -> Platform > > has caused many intermittent ClassNotFoundException in the hotspot > nightly runs. > While this fix does not address the root cause (proper dependencies are not > specified in the test cases -- which we are planning to fix), we will > hopefully > see much fewer occurrences of this annoying failure scenario. > > Thanks a lot to Igor for suggesting the simple fix! > > - Ioi > From Alan.Bateman at oracle.com Mon Oct 9 07:55:49 2017 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 9 Oct 2017 08:55:49 +0100 Subject: [10] RFR(S) 8188775: Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.org.graalvm.compiler.hotspot In-Reply-To: References: Message-ID: On 05/10/2017 00:05, Vladimir Kozlov wrote: > https://bugs.openjdk.java.net/browse/JDK-8188775 > > Changes for 8182701[1] missed changes in default.policy for new module > jdk.internal.vm.compiler.management. 
> > Add missing code: > > src/java.base/share/lib/security/default.policy > @@ -154,6 +154,10 @@ > ???? permission java.security.AllPermission; > ?}; > > +grant codeBase "jrt:/jdk.internal.vm.compiler.management" { > +??? permission java.security.AllPermission; > +}; > + This looks okay to me although it would be nice if we could identify the minimal permissions rather than granting it AllPermission. -Alan From erik.osterlund at oracle.com Mon Oct 9 08:42:36 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 9 Oct 2017 10:42:36 +0200 Subject: RFR (M): 8188813: Generalize OrderAccess to use templates In-Reply-To: References: <59D639E1.7070104@oracle.com> Message-ID: <59DB367C.6040509@oracle.com> Hi Coleen, On 2017-10-06 17:09, coleen.phillimore at oracle.com wrote: > http://cr.openjdk.java.net/~eosterlund/8188813/webrev.00/src/hotspot/os_cpu/linux_aarch64/orderAccess_linux_aarch64.inline.hpp.udiff.html > > > +template > +struct OrderAccess::PlatformOrderedStore > + VALUE_OBJ_CLASS_SPEC > +{ > + template > + void operator()(T v, volatile T* p) const { release_store(p, v); > fence(); } > +}; > > Isn't release_store() removed by this patch? Or does this call back > to OrderAccess::release_store, which seems circular (?) It is as David says. This does the same as was done before. Without this specialization, release_store_fence() would turn into release() store() fence(). This specializes further with release_store() fence(), which will probably turn into stlr; dmb ish; with GCC intrinsics on AArch64, which is a bit more slim than release() store() fence() which would use more fencing. > Otherwise this looks really nice. Thank you! > I'll remove the *_ptr versions with > https://bugs.openjdk.java.net/browse/JDK-8188220 . It's been fun. Thanks for doing that Coleen. /Erik > Thanks, > Coleen > > > On 10/5/17 9:55 AM, Erik ?sterlund wrote: >> Hi, >> >> Now that Atomic has been generalized with templates, the same should >> to be done to OrderAccess. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8188813 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8188813/webrev.00/ >> >> Testing: mach5 hs-tier3 >> >> Since Atomic already has a mechanism for type checking generic >> arguments for Atomic::load/store, and OrderAccess also is a bunch of >> semantically decorated loads and stores, I decided to reuse the >> template wheel that was already invented (Atomic::LoadImpl and >> Atomic::StoreImpl). >> Therefore, I made OrderAccess privately inherit Atomic so that this >> infrastructure could be reused. A whole bunch of code has been nuked >> with this generalization. >> >> It is worth noting that I have added PrimitiveConversion >> functionality for doubles and floats which translates to using the >> union trick for casting double to and from int64_t and float to and >> from int32_t when passing down doubles and ints to the API. I need >> the former two, because Java supports volatile double and volatile >> float, and therefore runtime support for that needs to be able to use >> floats and doubles. I also added PrimitiveConversion functionality >> for the subclasses of oop (instanceOop and friends). The base class >> oop already supported this, so it seemed natural that the subclasses >> should support it too. 
>> >> Thanks, >> /Erik > From ioi.lam at oracle.com Mon Oct 9 17:54:26 2017 From: ioi.lam at oracle.com (Ioi Lam) Date: Mon, 9 Oct 2017 10:54:26 -0700 Subject: RFR (XS) 8188828 Intermittent ClassNotFoundException: jdk.test.lib.Platform for compiler tests In-Reply-To: <4dea8e6c-34fc-83b6-8fe3-2905ec15b72b@oracle.com> References: <1be927fa-fa4b-1964-93f3-1c72386acf7b@oracle.com> <4dea8e6c-34fc-83b6-8fe3-2905ec15b72b@oracle.com> Message-ID: There are several possibilities. One is to pre-compile a bunch of libraries during the build time, and put them in the classpath using the jtreg -cpa: option. Another possibility is to change jtreg to better express the dependency between different classes compiled by jtreg. Thanks - Ioi On 10/8/17 6:33 PM, David Holmes wrote: > Hi Ioi, > > This seems like a temporary workaround - fine for now - but what is > the real fix here? It's crazy that one test library class can't use > another class from the same test library! > > Thanks, > David > > On 7/10/2017 6:19 AM, Ioi Lam wrote: >> Please review this very simple change: >> >> https://bugs.openjdk.java.net/browse/JDK-8188828 >> http://ioilinux.us.oracle.com/webrev/jdk10/8188828_compiler_test_class_not_found.v01/ >> >> >> The dependency of >> >> ???? FileInstaller -> Utils -> JDKToolLauncher -> Platform >> >> has caused many intermittent ClassNotFoundException in the hotspot >> nightly runs. >> While this fix does not address the root cause (proper dependencies >> are not >> specified in the test cases -- which we are planning to fix), we will >> hopefully >> see much fewer occurrences of this annoying failure scenario. >> >> Thanks a lot to Igor for suggesting the simple fix! >> >> - Ioi >> From ioi.lam at oracle.com Mon Oct 9 17:55:34 2017 From: ioi.lam at oracle.com (Ioi Lam) Date: Mon, 9 Oct 2017 10:55:34 -0700 Subject: RFR (XS) 8188828 Intermittent ClassNotFoundException: jdk.test.lib.Platform for compiler tests In-Reply-To: <1be927fa-fa4b-1964-93f3-1c72386acf7b@oracle.com> References: <1be927fa-fa4b-1964-93f3-1c72386acf7b@oracle.com> Message-ID: <1b6a6353-9db6-9e1f-7b03-85bc3773055a@oracle.com> Sorry I used an internal URL. Here's the proper openjdk URL: http://cr.openjdk.java.net/~iklam/jdk10/8188828_compiler_test_class_not_found.v01/ Thanks - Ioi On 10/6/17 1:19 PM, Ioi Lam wrote: > Please review this very simple change: > > https://bugs.openjdk.java.net/browse/JDK-8188828 > http://ioilinux.us.oracle.com/webrev/jdk10/8188828_compiler_test_class_not_found.v01/ > > > The dependency of > > ??? FileInstaller -> Utils -> JDKToolLauncher -> Platform > > has caused many intermittent ClassNotFoundException in the hotspot > nightly runs. > While this fix does not address the root cause (proper dependencies > are not > specified in the test cases -- which we are planning to fix), we will > hopefully > see much fewer occurrences of this annoying failure scenario. > > Thanks a lot to Igor for suggesting the simple fix! > > - Ioi > From volker.simonis at gmail.com Mon Oct 9 19:24:57 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 9 Oct 2017 21:24:57 +0200 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com> <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com> <5b5219a5-960e-363b-2bdc-3613f1dae62c@oracle.com> Message-ID: Hi Vladimir, I've analyzed the crash. 
The problem is Sparc specific because on Sparc we do not call the SharedRuntime for G1 pre/post barriers (i.e. SharedRuntime::g1_wb_pre() / SharedRuntime::g1_wb_post()) like on other architectures. Instead we lazily create assembler stubs on the fly (generate_satb_log_enqueue_if_necessary() / generate_dirty_card_log_enqueue_if_necessary()) when they are needed. This happens during the generation of the interpreter and allocates more memory in the code cache such that we can't shrink the memory which was initially allocated for the interpreter any more. Unfortunately we can't easily generate these stubs during 'stubRoutines_init1()' because 'generate_dirty_card_log_enqueue_if_necessary()' needs the byte map base address which is only initialized in 'CardTableModRefBS::initialize()' during 'univers_init()' which happens after 'stubRoutines_init1()'. I'm still thinking about a good way to fix this without too many platfrom-specific ifdefs. Regards, Volker On Tue, Oct 3, 2017 at 9:46 PM, Vladimir Kozlov wrote: > I rebased it. But there is problem with changes. VM hit guarantee() in this > code when run on SPARC in both, fastdebug and product, builds. > Crash happens during build. We can't push this - problem should be > investigated and fixed first. > > Thanks, > Vladimir > > make/Main.gmk:443: recipe for target 'generate-link-opt-data' failed > /usr/ccs/bin/bash: line 4: 9349 Abort (core dumped) > /s/build/solaris-sparcv9-debug/support/interim-image/bin/java > -XX:DumpLoadedClassList=/s/build/solaris-sparcv9-debug/support/link_opt/classlist > -Djava.lang.invoke.MethodHandle.TRACE_RESOLVE=true -cp > /s/build/solaris-sparcv9-debug/support/classlist.jar > build.tools.classlist.HelloClasslist 2>&1 > > /s/build/solaris-sparcv9-debug/support/link_opt/default_jli_trace.txt > make[3]: *** [/s/build/solaris-sparcv9-debug/support/link_opt/classlist] > Error 134 > make[2]: *** [generate-link-opt-data] Error 1 > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/s/open/src/hotspot/share/memory/heap.cpp:233), pid=9349, > tid=2 > # guarantee(b == block_at(_next_segment - actual_number_of_segments)) > failed: Intermediate allocation! > # > # JRE version: (10.0) (fastdebug build ) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug > 10-internal+0-2017-09-30-014154.8166317, mixed mode, tiered, compressed > oops, g1 gc, solaris-sparc) > # Core dump will be written. 
Default location: /s/open/make/core or > core.9349 > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # > > --------------- S U M M A R Y ------------ > > Command Line: > -XX:DumpLoadedClassList=/s/build/solaris-sparcv9-debug/support/link_opt/classlist > -Djava.lang.invoke.MethodHandle.TRACE_RESOLVE=true > build.tools.classlist.HelloClasslist > > Host: sca00dbv, Sparcv9 64 bit 3600 MHz, 16 cores, 32G, Oracle Solaris 11.2 > SPARC > Time: Sat Sep 30 03:29:46 2017 UTC elapsed time: 0 seconds (0d 0h 0m 0s) > > --------------- T H R E A D --------------- > > Current thread (0x000000010012f000): JavaThread "Unknown thread" > [_thread_in_vm, id=2, stack(0x0007fffef9700000,0x0007fffef9800000)] > > Stack: [0x0007fffef9700000,0x0007fffef9800000], sp=0x0007fffef97ff020, > free space=1020k > Native frames: (J=compiled Java code, A=aot compiled Java code, > j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x1f94508] void VMError::report_and_die(int,const char*,const > char*,void*,Thread*,unsigned char*,void*,void*,const char*,int,unsigned > long)+0xa58 > V [libjvm.so+0x1f93a3c] void VMError::report_and_die(Thread*,const > char*,int,const char*,const char*,void*)+0x3c > V [libjvm.so+0xd02f38] void report_vm_error(const char*,int,const > char*,const char*,...)+0x78 > V [libjvm.so+0xfc219c] void CodeHeap::deallocate_tail(void*,unsigned > long)+0xec > V [libjvm.so+0xbf4f14] void CodeCache::free_unused_tail(CodeBlob*,unsigned > long)+0xe4 > V [libjvm.so+0x1e0ae70] void StubQueue::deallocate_unused_tail()+0x40 > V [libjvm.so+0x1e7452c] void TemplateInterpreter::initialize()+0x19c > V [libjvm.so+0x1051220] void interpreter_init()+0x20 > V [libjvm.so+0x10116e0] int init_globals()+0xf0 > V [libjvm.so+0x1ed8548] int > Threads::create_vm(JavaVMInitArgs*,bool*)+0x4a8 > V [libjvm.so+0x11c7b58] int > JNI_CreateJavaVM_inner(JavaVM_**,void**,void*)+0x108 > C [libjli.so+0x7950] InitializeJVM+0x100 > > > On 10/2/17 7:55 AM, coleen.phillimore at oracle.com wrote: >> >> >> I can sponsor this for you once you rebase, and fix these compilation >> errors. >> Thanks, >> Coleen >> >> On 9/30/17 12:28 AM, Volker Simonis wrote: >>> >>> Hi Vladimir, >>> >>> thanks a lot for remembering these changes! >>> >>> Regards, >>> Volker >>> >>> >>> Vladimir Kozlov >> > schrieb am Fr. 29. Sep. 2017 um 15:47: >>> >>> I hit build failure when tried to push changes: >>> >>> src/hotspot/share/code/codeBlob.hpp(162) : warning C4267: '=' : >>> conversion from 'size_t' to 'int', possible loss of data >>> src/hotspot/share/code/codeBlob.hpp(163) : warning C4267: '=' : >>> conversion from 'size_t' to 'int', possible loss of data >>> >>> I am going to fix it by casting (int): >>> >>> + void adjust_size(size_t used) { >>> + _size = (int)used; >>> + _data_offset = (int)used; >>> + _code_end = (address)this + used; >>> + _data_end = (address)this + used; >>> + } >>> >>> Note, CodeCache size can't more than 2Gb (max_int) so such casting is >>> fine. >>> >>> Vladimir >>> >>> On 9/6/17 6:20 AM, Volker Simonis wrote: >>> > On Tue, Sep 5, 2017 at 9:36 PM, >> > wrote: >>> >> >>> >> I was going to make the same comment about the friend declaration >>> in v1, so >>> >> v2 looks better to me. Looks good. Thank you for finding a >>> solution to >>> >> this problem that we've had for a long time. I will sponsor this >>> (remind me >>> >> if I forget after the 18th). >>> >> >>> > >>> > Thanks Coleen! 
I've updated >>> > >>> > http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/ >>> >>> > >>> > in-place and added you as a second reviewer. >>> > >>> > Regards, >>> > Volker >>> > >>> > >>> >> thanks, >>> >> Coleen >>> >> >>> >> >>> >> >>> >> On 9/5/17 1:17 PM, Vladimir Kozlov wrote: >>> >>> >>> >>> On 9/5/17 9:49 AM, Volker Simonis wrote: >>> >>>> >>> >>>> On Fri, Sep 1, 2017 at 6:16 PM, Vladimir Kozlov >>> >>>> > >>> wrote: >>> >>>>> >>> >>>>> May be add new CodeBlob's method to adjust sizes instead of >>> directly >>> >>>>> setting >>> >>>>> them in CodeCache::free_unused_tail(). Then you would not need >>> friend >>> >>>>> class >>> >>>>> CodeCache in CodeBlob. >>> >>>>> >>> >>>> >>> >>>> Changed as suggested (I didn't liked the friend declaration as >>> well :) >>> >>>> >>> >>>>> Also I think adjustment to header_size should be done in >>> >>>>> CodeCache::free_unused_tail() to limit scope of code who knows >>> about >>> >>>>> blob >>> >>>>> layout. >>> >>>>> >>> >>>> >>> >>>> Yes, that's much cleaner. Please find the updated webrev here: >>> >>>> >>> >>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/ >>> >>> >>> >>> >>> >>> >>> >>> Good. >>> >>> >>> >>>> >>> >>>> I've also found another "day 1" problem in StubQueue::next(): >>> >>>> >>> >>>> Stub* next(Stub* s) const { int i = >>> >>>> index_of(s) + stub_size(s); >>> >>>> - if (i == >>> >>>> _buffer_limit) i = 0; >>> >>>> + // Only wrap >>> >>>> around in the non-contiguous case (see stubss.cpp) >>> >>>> + if (i == >>> >>>> _buffer_limit && _queue_end < _buffer_limit) i = 0; >>> >>>> return (i == >>> >>>> _queue_end) ? NULL : stub_at(i); >>> >>>> } >>> >>>> >>> >>>> The problem was that the method was not prepared to handle the >>> case >>> >>>> where _buffer_limit == _queue_end == _buffer_size which lead to >>> an >>> >>>> infinite recursion when iterating over a StubQueue with >>> >>>> StubQueue::next() until next() returns NULL (as this was for >>> example >>> >>>> done with -XX:+PrintInterpreter). But with the new, trimmed >>> CodeBlob >>> >>>> we run into exactly this situation. >>> >>> >>> >>> >>> >>> Okay. >>> >>> >>> >>>> >>> >>>> While doing this last fix I also noticed that >>> "StubQueue::stubs_do()", >>> >>>> "StubQueue::queues_do()" and "StubQueue::register_queue()" don't >>> seem >>> >>>> to be used anywhere in the open code base (please correct me if >>> I'm >>> >>>> wrong). What do you think, maybe we should remove this code in a >>> >>>> follow up change if it is really not needed? >>> >>> >>> >>> >>> >>> register_queue() is used in constructor. Other 2 you can remove. >>> >>> stub_code_begin() and stub_code_end() are not used too -remove. >>> >>> I thought we run on linux with flag which warn about unused code. >>> >>> >>> >>>> >>> >>>> Finally, could you please run the new version through JPRT and >>> sponsor >>> >>>> it once jdk10/hs will be opened again? >>> >>> >>> >>> >>> >>> Will do when jdk10 "consolidation" is finished. Please, remind me >>> later if >>> >>> I forget. 
>>> >>> >>> >>> Thanks, >>> >>> Vladimir >>> >>> >>> >>>> >>> >>>> Thanks, >>> >>>> Volker >>> >>>> >>> >>>>> Thanks, >>> >>>>> Vladimir >>> >>>>> >>> >>>>> >>> >>>>> On 9/1/17 8:46 AM, Volker Simonis wrote: >>> >>>>>> >>> >>>>>> >>> >>>>>> Hi, >>> >>>>>> >>> >>>>>> I've decided to split the fix for the >>> 'CodeHeap::contains_blob()' >>> >>>>>> problem into its own issue "8187091: ReturnBlobToWrongHeapTest >>> fails >>> >>>>>> because of problems in CodeHeap::contains_blob()" >>> >>>>>> (https://bugs.openjdk.java.net/browse/JDK-8187091) and started >>> a new >>> >>>>>> review thread for discussing it at: >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html >>> >>>>>> >>> >>>>>> So please lets keep this thread for discussing the interpreter >>> code >>> >>>>>> size issue only. I've prepared a new version of the webrev >>> which is >>> >>>>>> the same as the first one with the only difference that the >>> change to >>> >>>>>> 'CodeHeap::contains_blob()' has been removed: >>> >>>>>> >>> >>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/ >>> >>> >>>>>> >>> >>>>>> Thanks, >>> >>>>>> Volker >>> >>>>>> >>> >>>>>> >>> >>>>>> On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis >>> >>>>>> > >>> wrote: >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov >>> >>>>>>> >> > wrote: >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> Very good change. Thank you, Volker. >>> >>>>>>>> >>> >>>>>>>> About contains_blob(). The problem is that AOTCompiledMethod >>> >>>>>>>> allocated >>> >>>>>>>> in >>> >>>>>>>> CHeap and not in aot code section (which is RO): >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> >>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 >>> >>>>>>>> >>> >>>>>>>> It is allocated in CHeap after AOT library is loaded. Its >>> >>>>>>>> code_begin() >>> >>>>>>>> points to AOT code section but AOTCompiledMethod* points >>> outside it >>> >>>>>>>> (to >>> >>>>>>>> normal malloced space) so you can't use (char*)blob address. >>> >>>>>>>> >>> >>>>>>> >>> >>>>>>> Thanks for the explanation - now I got it. >>> >>>>>>> >>> >>>>>>>> There are 2 ways to fix it, I think. >>> >>>>>>>> One is to add new field to CodeBlobLayout and set it to >>> blob* address >>> >>>>>>>> for >>> >>>>>>>> normal CodeCache blobs and to code_begin for AOT code. >>> >>>>>>>> Second is to use contains(blob->code_end() - 1) assuming >>> that AOT >>> >>>>>>>> code >>> >>>>>>>> is >>> >>>>>>>> never zero. >>> >>>>>>>> >>> >>>>>>> >>> >>>>>>> I'll give it a try tomorrow and will send out a new webrev. >>> >>>>>>> >>> >>>>>>> Regards, >>> >>>>>>> Volker >>> >>>>>>> >>> >>>>>>>> Thanks, >>> >>>>>>>> Vladimir >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> On 8/31/17 5:43 AM, Volker Simonis wrote: >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad >>> >>>>>>>>> >> > wrote: >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> On 2017-08-31 08:54, Volker Simonis wrote: >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> While working on this, I found another problem which is >>> related to >>> >>>>>>>>>>> the >>> >>>>>>>>>>> fix of JDK-8183573 and leads to crashes when executing >>> the JTreg >>> >>>>>>>>>>> test >>> >>>>>>>>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java. 
>>> >>>>>>>>>>> >>> >>>>>>>>>>> The problem is that JDK-8183573 replaced >>> >>>>>>>>>>> >>> >>>>>>>>>>> virtual bool contains_blob(const CodeBlob* blob) >>> const { >>> >>>>>>>>>>> return >>> >>>>>>>>>>> low_boundary() <= (char*) blob && (char*) blob < high(); >>> } >>> >>>>>>>>>>> >>> >>>>>>>>>>> by: >>> >>>>>>>>>>> >>> >>>>>>>>>>> bool contains_blob(const CodeBlob* blob) const { >>> return >>> >>>>>>>>>>> contains(blob->code_begin()); } >>> >>>>>>>>>>> >>> >>>>>>>>>>> But that my be wrong in the corner case where the size of >>> the >>> >>>>>>>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists >>> only of the >>> >>>>>>>>>>> 'header' - i.e. the C++ object itself) because in that >>> case >>> >>>>>>>>>>> CodeBlob::code_begin() points right behind the CodeBlob's >>> header >>> >>>>>>>>>>> which >>> >>>>>>>>>>> is a memory location which doesn't belong to the CodeBlob >>> anymore. >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> I recall this change was somehow necessary to allow >>> merging >>> >>>>>>>>>> AOTCodeHeap::contains_blob and CodeHead::contains_blob >>> into >>> >>>>>>>>>> one devirtualized method, so you need to ensure all AOT >>> tests >>> >>>>>>>>>> pass with this change (on linux-x64). >>> >>>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> All of hotspot/test/aot and hotspot/test/jvmci executed and >>> passed >>> >>>>>>>>> successful. Are there any other tests I should check? >>> >>>>>>>>> >>> >>>>>>>>> That said, it is a little hard to follow the stages of your >>> change. >>> >>>>>>>>> It >>> >>>>>>>>> seems like >>> >>>>>>>>> >>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/ >>> >>> >>>>>>>>> was reviewed [1] but then finally the slightly changed >>> version from >>> >>>>>>>>> >>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ >>> >>> >>> >>>>>>>>> was >>> >>>>>>>>> checked in and linked to the bug report. >>> >>>>>>>>> >>> >>>>>>>>> The first, reviewed version of the change still had a >>> correct >>> >>>>>>>>> version >>> >>>>>>>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while >>> the second, >>> >>>>>>>>> checked in version has the faulty version of that method. >>> >>>>>>>>> >>> >>>>>>>>> I don't know why you finally did that change to >>> 'contains_blob()' >>> >>>>>>>>> but >>> >>>>>>>>> I don't see any reason why we shouldn't be able to directly >>> use the >>> >>>>>>>>> blob's address for inclusion checking. From what I >>> understand, it >>> >>>>>>>>> should ALWAYS be contained in the corresponding CodeHeap so >>> no >>> >>>>>>>>> reason >>> >>>>>>>>> to mess with 'CodeBlob::code_begin()'. >>> >>>>>>>>> >>> >>>>>>>>> Please let me know if I'm missing something. >>> >>>>>>>>> >>> >>>>>>>>> [1] >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> >>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html >>> >>>>>>>>> >>> >>>>>>>>>> I can't help to wonder if we'd not be better served by >>> disallowing >>> >>>>>>>>>> zero-sized payloads. Is this something that can ever >>> actually >>> >>>>>>>>>> happen except by abuse of the white box API? >>> >>>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> The corresponding test (ReturnBlobToWrongHeapTest.java) >>> specifically >>> >>>>>>>>> wants to allocate "segment sized" blocks which is most >>> easily >>> >>>>>>>>> achieved >>> >>>>>>>>> by allocation zero-sized CodeBlobs. And I think there's >>> nothing >>> >>>>>>>>> wrong >>> >>>>>>>>> about it if we handle the inclusion tests correctly. 
>>> >>>>>>>>> >>> >>>>>>>>> Thank you and best regards, >>> >>>>>>>>> Volker >>> >>>>>>>>> >>> >>>>>>>>>> /Claes >>> >> >>> >> >>> >> > From cthalinger at twitter.com Mon Oct 9 19:45:58 2017 From: cthalinger at twitter.com (Christian Thalinger) Date: Mon, 9 Oct 2017 09:45:58 -1000 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com> <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com> <5b5219a5-960e-363b-2bdc-3613f1dae62c@oracle.com> Message-ID: <50CACA26-DF35-428F-8385-AB4CE74FFD6E@twitter.com> > On Oct 9, 2017, at 9:24 AM, Volker Simonis wrote: > > Hi Vladimir, > > I've analyzed the crash. The problem is Sparc specific because on > Sparc we do not call the SharedRuntime for G1 pre/post barriers (i.e. > SharedRuntime::g1_wb_pre() / SharedRuntime::g1_wb_post()) like on > other architectures. Instead we lazily create assembler stubs on the > fly (generate_satb_log_enqueue_if_necessary() / > generate_dirty_card_log_enqueue_if_necessary()) when they are needed. Why are we using these stubs on SPARC? Can we get rid of them and just call into the runtime instead? > This happens during the generation of the interpreter and allocates > more memory in the code cache such that we can't shrink the memory > which was initially allocated for the interpreter any more. > > Unfortunately we can't easily generate these stubs during > 'stubRoutines_init1()' because > 'generate_dirty_card_log_enqueue_if_necessary()' needs the byte map > base address which is only initialized in > 'CardTableModRefBS::initialize()' during 'univers_init()' which > happens after 'stubRoutines_init1()'. > > I'm still thinking about a good way to fix this without too many > platfrom-specific ifdefs. > > Regards, > Volker > > > On Tue, Oct 3, 2017 at 9:46 PM, Vladimir Kozlov > wrote: >> I rebased it. But there is problem with changes. VM hit guarantee() in this >> code when run on SPARC in both, fastdebug and product, builds. >> Crash happens during build. We can't push this - problem should be >> investigated and fixed first. >> >> Thanks, >> Vladimir >> >> make/Main.gmk:443: recipe for target 'generate-link-opt-data' failed >> /usr/ccs/bin/bash: line 4: 9349 Abort (core dumped) >> /s/build/solaris-sparcv9-debug/support/interim-image/bin/java >> -XX:DumpLoadedClassList=/s/build/solaris-sparcv9-debug/support/link_opt/classlist >> -Djava.lang.invoke.MethodHandle.TRACE_RESOLVE=true -cp >> /s/build/solaris-sparcv9-debug/support/classlist.jar >> build.tools.classlist.HelloClasslist 2>&1 > >> /s/build/solaris-sparcv9-debug/support/link_opt/default_jli_trace.txt >> make[3]: *** [/s/build/solaris-sparcv9-debug/support/link_opt/classlist] >> Error 134 >> make[2]: *** [generate-link-opt-data] Error 1 >> >> >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/s/open/src/hotspot/share/memory/heap.cpp:233), pid=9349, >> tid=2 >> # guarantee(b == block_at(_next_segment - actual_number_of_segments)) >> failed: Intermediate allocation! >> # >> # JRE version: (10.0) (fastdebug build ) >> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug >> 10-internal+0-2017-09-30-014154.8166317, mixed mode, tiered, compressed >> oops, g1 gc, solaris-sparc) >> # Core dump will be written. 
Default location: /s/open/make/core or >> core.9349 >> # >> # If you would like to submit a bug report, please visit: >> # http://bugreport.java.com/bugreport/crash.jsp >> # >> >> --------------- S U M M A R Y ------------ >> >> Command Line: >> -XX:DumpLoadedClassList=/s/build/solaris-sparcv9-debug/support/link_opt/classlist >> -Djava.lang.invoke.MethodHandle.TRACE_RESOLVE=true >> build.tools.classlist.HelloClasslist >> >> Host: sca00dbv, Sparcv9 64 bit 3600 MHz, 16 cores, 32G, Oracle Solaris 11.2 >> SPARC >> Time: Sat Sep 30 03:29:46 2017 UTC elapsed time: 0 seconds (0d 0h 0m 0s) >> >> --------------- T H R E A D --------------- >> >> Current thread (0x000000010012f000): JavaThread "Unknown thread" >> [_thread_in_vm, id=2, stack(0x0007fffef9700000,0x0007fffef9800000)] >> >> Stack: [0x0007fffef9700000,0x0007fffef9800000], sp=0x0007fffef97ff020, >> free space=1020k >> Native frames: (J=compiled Java code, A=aot compiled Java code, >> j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x1f94508] void VMError::report_and_die(int,const char*,const >> char*,void*,Thread*,unsigned char*,void*,void*,const char*,int,unsigned >> long)+0xa58 >> V [libjvm.so+0x1f93a3c] void VMError::report_and_die(Thread*,const >> char*,int,const char*,const char*,void*)+0x3c >> V [libjvm.so+0xd02f38] void report_vm_error(const char*,int,const >> char*,const char*,...)+0x78 >> V [libjvm.so+0xfc219c] void CodeHeap::deallocate_tail(void*,unsigned >> long)+0xec >> V [libjvm.so+0xbf4f14] void CodeCache::free_unused_tail(CodeBlob*,unsigned >> long)+0xe4 >> V [libjvm.so+0x1e0ae70] void StubQueue::deallocate_unused_tail()+0x40 >> V [libjvm.so+0x1e7452c] void TemplateInterpreter::initialize()+0x19c >> V [libjvm.so+0x1051220] void interpreter_init()+0x20 >> V [libjvm.so+0x10116e0] int init_globals()+0xf0 >> V [libjvm.so+0x1ed8548] int >> Threads::create_vm(JavaVMInitArgs*,bool*)+0x4a8 >> V [libjvm.so+0x11c7b58] int >> JNI_CreateJavaVM_inner(JavaVM_**,void**,void*)+0x108 >> C [libjli.so+0x7950] InitializeJVM+0x100 >> >> >> On 10/2/17 7:55 AM, coleen.phillimore at oracle.com wrote: >>> >>> >>> I can sponsor this for you once you rebase, and fix these compilation >>> errors. >>> Thanks, >>> Coleen >>> >>> On 9/30/17 12:28 AM, Volker Simonis wrote: >>>> >>>> Hi Vladimir, >>>> >>>> thanks a lot for remembering these changes! >>>> >>>> Regards, >>>> Volker >>>> >>>> >>>> Vladimir Kozlov >>> > schrieb am Fr. 29. Sep. 2017 um 15:47: >>>> >>>> I hit build failure when tried to push changes: >>>> >>>> src/hotspot/share/code/codeBlob.hpp(162) : warning C4267: '=' : >>>> conversion from 'size_t' to 'int', possible loss of data >>>> src/hotspot/share/code/codeBlob.hpp(163) : warning C4267: '=' : >>>> conversion from 'size_t' to 'int', possible loss of data >>>> >>>> I am going to fix it by casting (int): >>>> >>>> + void adjust_size(size_t used) { >>>> + _size = (int)used; >>>> + _data_offset = (int)used; >>>> + _code_end = (address)this + used; >>>> + _data_end = (address)this + used; >>>> + } >>>> >>>> Note, CodeCache size can't more than 2Gb (max_int) so such casting is >>>> fine. >>>> >>>> Vladimir >>>> >>>> On 9/6/17 6:20 AM, Volker Simonis wrote: >>>>> On Tue, Sep 5, 2017 at 9:36 PM, >>> > wrote: >>>>>> >>>>>> I was going to make the same comment about the friend declaration >>>> in v1, so >>>>>> v2 looks better to me. Looks good. Thank you for finding a >>>> solution to >>>>>> this problem that we've had for a long time. I will sponsor this >>>> (remind me >>>>>> if I forget after the 18th). 
>>>>>> >>>>> >>>>> Thanks Coleen! I've updated >>>>> >>>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/ >>>> >>>>> >>>>> in-place and added you as a second reviewer. >>>>> >>>>> Regards, >>>>> Volker >>>>> >>>>> >>>>>> thanks, >>>>>> Coleen >>>>>> >>>>>> >>>>>> >>>>>> On 9/5/17 1:17 PM, Vladimir Kozlov wrote: >>>>>>> >>>>>>> On 9/5/17 9:49 AM, Volker Simonis wrote: >>>>>>>> >>>>>>>> On Fri, Sep 1, 2017 at 6:16 PM, Vladimir Kozlov >>>>>>>> > >>>> wrote: >>>>>>>>> >>>>>>>>> May be add new CodeBlob's method to adjust sizes instead of >>>> directly >>>>>>>>> setting >>>>>>>>> them in CodeCache::free_unused_tail(). Then you would not need >>>> friend >>>>>>>>> class >>>>>>>>> CodeCache in CodeBlob. >>>>>>>>> >>>>>>>> >>>>>>>> Changed as suggested (I didn't liked the friend declaration as >>>> well :) >>>>>>>> >>>>>>>>> Also I think adjustment to header_size should be done in >>>>>>>>> CodeCache::free_unused_tail() to limit scope of code who knows >>>> about >>>>>>>>> blob >>>>>>>>> layout. >>>>>>>>> >>>>>>>> >>>>>>>> Yes, that's much cleaner. Please find the updated webrev here: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v2/ >>>> >>>> >>>>>>> >>>>>>> >>>>>>> Good. >>>>>>> >>>>>>>> >>>>>>>> I've also found another "day 1" problem in StubQueue::next(): >>>>>>>> >>>>>>>> Stub* next(Stub* s) const { int i = >>>>>>>> index_of(s) + stub_size(s); >>>>>>>> - if (i == >>>>>>>> _buffer_limit) i = 0; >>>>>>>> + // Only wrap >>>>>>>> around in the non-contiguous case (see stubss.cpp) >>>>>>>> + if (i == >>>>>>>> _buffer_limit && _queue_end < _buffer_limit) i = 0; >>>>>>>> return (i == >>>>>>>> _queue_end) ? NULL : stub_at(i); >>>>>>>> } >>>>>>>> >>>>>>>> The problem was that the method was not prepared to handle the >>>> case >>>>>>>> where _buffer_limit == _queue_end == _buffer_size which lead to >>>> an >>>>>>>> infinite recursion when iterating over a StubQueue with >>>>>>>> StubQueue::next() until next() returns NULL (as this was for >>>> example >>>>>>>> done with -XX:+PrintInterpreter). But with the new, trimmed >>>> CodeBlob >>>>>>>> we run into exactly this situation. >>>>>>> >>>>>>> >>>>>>> Okay. >>>>>>> >>>>>>>> >>>>>>>> While doing this last fix I also noticed that >>>> "StubQueue::stubs_do()", >>>>>>>> "StubQueue::queues_do()" and "StubQueue::register_queue()" don't >>>> seem >>>>>>>> to be used anywhere in the open code base (please correct me if >>>> I'm >>>>>>>> wrong). What do you think, maybe we should remove this code in a >>>>>>>> follow up change if it is really not needed? >>>>>>> >>>>>>> >>>>>>> register_queue() is used in constructor. Other 2 you can remove. >>>>>>> stub_code_begin() and stub_code_end() are not used too -remove. >>>>>>> I thought we run on linux with flag which warn about unused code. >>>>>>> >>>>>>>> >>>>>>>> Finally, could you please run the new version through JPRT and >>>> sponsor >>>>>>>> it once jdk10/hs will be opened again? >>>>>>> >>>>>>> >>>>>>> Will do when jdk10 "consolidation" is finished. Please, remind me >>>> later if >>>>>>> I forget. 
>>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Volker >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>> >>>>>>>>> On 9/1/17 8:46 AM, Volker Simonis wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I've decided to split the fix for the >>>> 'CodeHeap::contains_blob()' >>>>>>>>>> problem into its own issue "8187091: ReturnBlobToWrongHeapTest >>>> fails >>>>>>>>>> because of problems in CodeHeap::contains_blob()" >>>>>>>>>> (https://bugs.openjdk.java.net/browse/JDK-8187091) and started >>>> a new >>>>>>>>>> review thread for discussing it at: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-September/028206.html >>>>>>>>>> >>>>>>>>>> So please lets keep this thread for discussing the interpreter >>>> code >>>>>>>>>> size issue only. I've prepared a new version of the webrev >>>> which is >>>>>>>>>> the same as the first one with the only difference that the >>>> change to >>>>>>>>>> 'CodeHeap::contains_blob()' has been removed: >>>>>>>>>> >>>>>>>>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v1/ >>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Volker >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Aug 31, 2017 at 6:35 PM, Volker Simonis >>>>>>>>>> > >>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Aug 31, 2017 at 6:05 PM, Vladimir Kozlov >>>>>>>>>>> >>> > wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Very good change. Thank you, Volker. >>>>>>>>>>>> >>>>>>>>>>>> About contains_blob(). The problem is that AOTCompiledMethod >>>>>>>>>>>> allocated >>>>>>>>>>>> in >>>>>>>>>>>> CHeap and not in aot code section (which is RO): >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 >>>>>>>>>>>> >>>>>>>>>>>> It is allocated in CHeap after AOT library is loaded. Its >>>>>>>>>>>> code_begin() >>>>>>>>>>>> points to AOT code section but AOTCompiledMethod* points >>>> outside it >>>>>>>>>>>> (to >>>>>>>>>>>> normal malloced space) so you can't use (char*)blob address. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks for the explanation - now I got it. >>>>>>>>>>> >>>>>>>>>>>> There are 2 ways to fix it, I think. >>>>>>>>>>>> One is to add new field to CodeBlobLayout and set it to >>>> blob* address >>>>>>>>>>>> for >>>>>>>>>>>> normal CodeCache blobs and to code_begin for AOT code. >>>>>>>>>>>> Second is to use contains(blob->code_end() - 1) assuming >>>> that AOT >>>>>>>>>>>> code >>>>>>>>>>>> is >>>>>>>>>>>> never zero. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I'll give it a try tomorrow and will send out a new webrev. >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Volker >>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Vladimir >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 8/31/17 5:43 AM, Volker Simonis wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Aug 31, 2017 at 12:14 PM, Claes Redestad >>>>>>>>>>>>> >>> > wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 2017-08-31 08:54, Volker Simonis wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> While working on this, I found another problem which is >>>> related to >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> fix of JDK-8183573 and leads to crashes when executing >>>> the JTreg >>>>>>>>>>>>>>> test >>>>>>>>>>>>>>> compiler/codecache/stress/ReturnBlobToWrongHeapTest.java. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The problem is that JDK-8183573 replaced >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> virtual bool contains_blob(const CodeBlob* blob) >>>> const { >>>>>>>>>>>>>>> return >>>>>>>>>>>>>>> low_boundary() <= (char*) blob && (char*) blob < high(); >>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> by: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> bool contains_blob(const CodeBlob* blob) const { >>>> return >>>>>>>>>>>>>>> contains(blob->code_begin()); } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But that my be wrong in the corner case where the size of >>>> the >>>>>>>>>>>>>>> CodeBlob's payload is zero (i.e. the CodeBlob consists >>>> only of the >>>>>>>>>>>>>>> 'header' - i.e. the C++ object itself) because in that >>>> case >>>>>>>>>>>>>>> CodeBlob::code_begin() points right behind the CodeBlob's >>>> header >>>>>>>>>>>>>>> which >>>>>>>>>>>>>>> is a memory location which doesn't belong to the CodeBlob >>>> anymore. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I recall this change was somehow necessary to allow >>>> merging >>>>>>>>>>>>>> AOTCodeHeap::contains_blob and CodeHead::contains_blob >>>> into >>>>>>>>>>>>>> one devirtualized method, so you need to ensure all AOT >>>> tests >>>>>>>>>>>>>> pass with this change (on linux-x64). >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> All of hotspot/test/aot and hotspot/test/jvmci executed and >>>> passed >>>>>>>>>>>>> successful. Are there any other tests I should check? >>>>>>>>>>>>> >>>>>>>>>>>>> That said, it is a little hard to follow the stages of your >>>> change. >>>>>>>>>>>>> It >>>>>>>>>>>>> seems like >>>>>>>>>>>>> >>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.00/ >>>> >>>>>>>>>>>>> was reviewed [1] but then finally the slightly changed >>>> version from >>>>>>>>>>>>> >>>> http://cr.openjdk.java.net/~redestad/scratch/codeheap_contains.01/ >>>> >>>> >>>>>>>>>>>>> was >>>>>>>>>>>>> checked in and linked to the bug report. >>>>>>>>>>>>> >>>>>>>>>>>>> The first, reviewed version of the change still had a >>>> correct >>>>>>>>>>>>> version >>>>>>>>>>>>> of 'CodeHeap::contains_blob(const CodeBlob* blob)' while >>>> the second, >>>>>>>>>>>>> checked in version has the faulty version of that method. >>>>>>>>>>>>> >>>>>>>>>>>>> I don't know why you finally did that change to >>>> 'contains_blob()' >>>>>>>>>>>>> but >>>>>>>>>>>>> I don't see any reason why we shouldn't be able to directly >>>> use the >>>>>>>>>>>>> blob's address for inclusion checking. From what I >>>> understand, it >>>>>>>>>>>>> should ALWAYS be contained in the corresponding CodeHeap so >>>> no >>>>>>>>>>>>> reason >>>>>>>>>>>>> to mess with 'CodeBlob::code_begin()'. >>>>>>>>>>>>> >>>>>>>>>>>>> Please let me know if I'm missing something. >>>>>>>>>>>>> >>>>>>>>>>>>> [1] >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-July/026624.html >>>>>>>>>>>>> >>>>>>>>>>>>>> I can't help to wonder if we'd not be better served by >>>> disallowing >>>>>>>>>>>>>> zero-sized payloads. Is this something that can ever >>>> actually >>>>>>>>>>>>>> happen except by abuse of the white box API? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> The corresponding test (ReturnBlobToWrongHeapTest.java) >>>> specifically >>>>>>>>>>>>> wants to allocate "segment sized" blocks which is most >>>> easily >>>>>>>>>>>>> achieved >>>>>>>>>>>>> by allocation zero-sized CodeBlobs. And I think there's >>>> nothing >>>>>>>>>>>>> wrong >>>>>>>>>>>>> about it if we handle the inclusion tests correctly. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Thank you and best regards, >>>>>>>>>>>>> Volker >>>>>>>>>>>>> >>>>>>>>>>>>>> /Claes >>>>>> >>>>>> >>>> >>> >> From jesper.wilhelmsson at oracle.com Tue Oct 10 01:35:21 2017 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Tue, 10 Oct 2017 03:35:21 +0200 Subject: RFR (xs): JDK-8189071 - Require jtreg 4.2 b09 Message-ID: <231091EC-AF95-4C88-A5CC-F555FC2C9CC1@oracle.com> Hi, Can I have a review of this trivial fix to update the jib-profile to require the latest version of jtreg. This to get rid of the SocketTimeoutException that we see in the hotspot nightlies. Bug: https://bugs.openjdk.java.net/browse/JDK-8189071 The change is: diff --git a/make/conf/jib-profiles.js b/make/conf/jib-profiles.js --- a/make/conf/jib-profiles.js +++ b/make/conf/jib-profiles.js @@ -1063,7 +1063,7 @@ jtreg: { server: "javare", revision: "4.2", - build_number: "b08", + build_number: "b09", checksum_file: "MD5_VALUES", file: "jtreg_bin-4.2.zip", environment_name: "JT_HOME", Thanks, /Jesper From david.holmes at oracle.com Tue Oct 10 01:45:40 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 Oct 2017 11:45:40 +1000 Subject: RFR (xs): JDK-8189071 - Require jtreg 4.2 b09 In-Reply-To: <231091EC-AF95-4C88-A5CC-F555FC2C9CC1@oracle.com> References: <231091EC-AF95-4C88-A5CC-F555FC2C9CC1@oracle.com> Message-ID: <0c3c434f-8a4e-1c91-d21d-62028382c8d6@oracle.com> Reviewed! Push it under trivial rules. Thanks, David On 10/10/2017 11:35 AM, jesper.wilhelmsson at oracle.com wrote: > Hi, > > Can I have a review of this trivial fix to update the jib-profile to require the latest version of jtreg. This to get rid of the SocketTimeoutException that we see in the hotspot nightlies. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8189071 > > The change is: > > diff --git a/make/conf/jib-profiles.js b/make/conf/jib-profiles.js > --- a/make/conf/jib-profiles.js > +++ b/make/conf/jib-profiles.js > @@ -1063,7 +1063,7 @@ > jtreg: { > server: "javare", > revision: "4.2", > - build_number: "b08", > + build_number: "b09", > checksum_file: "MD5_VALUES", > file: "jtreg_bin-4.2.zip", > environment_name: "JT_HOME", > > > Thanks, > /Jesper > From george.triantafillou at oracle.com Tue Oct 10 01:50:45 2017 From: george.triantafillou at oracle.com (George Triantafillou) Date: Mon, 9 Oct 2017 21:50:45 -0400 Subject: RFR (xs): JDK-8189071 - Require jtreg 4.2 b09 In-Reply-To: <231091EC-AF95-4C88-A5CC-F555FC2C9CC1@oracle.com> References: <231091EC-AF95-4C88-A5CC-F555FC2C9CC1@oracle.com> Message-ID: Hi Jesper, Looks good. -George On 10/9/2017 9:35 PM, jesper.wilhelmsson at oracle.com wrote: > Hi, > > Can I have a review of this trivial fix to update the jib-profile to require the latest version of jtreg. This to get rid of the SocketTimeoutException that we see in the hotspot nightlies. 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8189071 > > The change is: > > diff --git a/make/conf/jib-profiles.js b/make/conf/jib-profiles.js > --- a/make/conf/jib-profiles.js > +++ b/make/conf/jib-profiles.js > @@ -1063,7 +1063,7 @@ > jtreg: { > server: "javare", > revision: "4.2", > - build_number: "b08", > + build_number: "b09", > checksum_file: "MD5_VALUES", > file: "jtreg_bin-4.2.zip", > environment_name: "JT_HOME", > > > Thanks, > /Jesper > From jesper.wilhelmsson at oracle.com Tue Oct 10 01:52:07 2017 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Tue, 10 Oct 2017 03:52:07 +0200 Subject: RFR (xs): JDK-8189071 - Require jtreg 4.2 b09 In-Reply-To: <0c3c434f-8a4e-1c91-d21d-62028382c8d6@oracle.com> References: <231091EC-AF95-4C88-A5CC-F555FC2C9CC1@oracle.com> <0c3c434f-8a4e-1c91-d21d-62028382c8d6@oracle.com> Message-ID: <1F417DB9-7275-4240-85CF-8F3AA2667E0D@oracle.com> Thanks David! /Jesper > On 10 Oct 2017, at 03:45, David Holmes wrote: > > Reviewed! > > Push it under trivial rules. > > Thanks, > David > > On 10/10/2017 11:35 AM, jesper.wilhelmsson at oracle.com wrote: >> Hi, >> Can I have a review of this trivial fix to update the jib-profile to require the latest version of jtreg. This to get rid of the SocketTimeoutException that we see in the hotspot nightlies. >> Bug: https://bugs.openjdk.java.net/browse/JDK-8189071 >> The change is: >> diff --git a/make/conf/jib-profiles.js b/make/conf/jib-profiles.js >> --- a/make/conf/jib-profiles.js >> +++ b/make/conf/jib-profiles.js >> @@ -1063,7 +1063,7 @@ >> jtreg: { >> server: "javare", >> revision: "4.2", >> - build_number: "b08", >> + build_number: "b09", >> checksum_file: "MD5_VALUES", >> file: "jtreg_bin-4.2.zip", >> environment_name: "JT_HOME", >> Thanks, >> /Jesper From jesper.wilhelmsson at oracle.com Tue Oct 10 01:52:26 2017 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Tue, 10 Oct 2017 03:52:26 +0200 Subject: RFR (xs): JDK-8189071 - Require jtreg 4.2 b09 In-Reply-To: References: <231091EC-AF95-4C88-A5CC-F555FC2C9CC1@oracle.com> Message-ID: Thanks George! /Jesper > On 10 Oct 2017, at 03:50, George Triantafillou wrote: > > Hi Jesper, > > Looks good. > > -George > On 10/9/2017 9:35 PM, jesper.wilhelmsson at oracle.com wrote: >> Hi, >> >> Can I have a review of this trivial fix to update the jib-profile to require the latest version of jtreg. This to get rid of the SocketTimeoutException that we see in the hotspot nightlies. 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8189071 >> >> The change is: >> >> diff --git a/make/conf/jib-profiles.js b/make/conf/jib-profiles.js >> --- a/make/conf/jib-profiles.js >> +++ b/make/conf/jib-profiles.js >> @@ -1063,7 +1063,7 @@ >> jtreg: { >> server: "javare", >> revision: "4.2", >> - build_number: "b08", >> + build_number: "b09", >> checksum_file: "MD5_VALUES", >> file: "jtreg_bin-4.2.zip", >> environment_name: "JT_HOME", >> >> >> Thanks, >> /Jesper >> > From aph at redhat.com Tue Oct 10 07:42:27 2017 From: aph at redhat.com (Andrew Haley) Date: Tue, 10 Oct 2017 08:42:27 +0100 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com> <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com> <5b5219a5-960e-363b-2bdc-3613f1dae62c@oracle.com> Message-ID: <4109f960-078f-e582-3c78-71f201a265fd@redhat.com> On 09/10/17 20:24, Volker Simonis wrote: > Unfortunately we can't easily generate these stubs during > 'stubRoutines_init1()' because > 'generate_dirty_card_log_enqueue_if_necessary()' needs the byte map > base address which is only initialized in > 'CardTableModRefBS::initialize()' during 'univers_init()' which > happens after 'stubRoutines_init1()'. Yes you can, you can do something like we do for narrow_ptrs_base: if (Universe::is_fully_initialized()) { mov(rheapbase, Universe::narrow_ptrs_base()); } else { lea(rheapbase, ExternalAddress((address)Universe::narrow_ptrs_base_addr())); ldr(rheapbase, Address(rheapbase)); } -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From kim.barrett at oracle.com Tue Oct 10 08:29:57 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 10 Oct 2017 04:29:57 -0400 Subject: RFR: 8189088: Add intrusive doubly-linked list utility Message-ID: RFR: 8189088: Add intrusive doubly-linked list utility Please review this new facility, providing a general mechanism for intrusive doubly-linked lists. A class supports inclusion in a list by having an IntrusiveListLink member, and providing structured information about how to access that member. A class supports inclusion in multiple lists by having multiple IntrusiveListLink members, with different lists specified to use different members. The IntrusiveList class template provides the list management. It is modelled on bidirectional containers such as std::list and boost::intrusive::list, providing many of the expected member types and functions. (Note that the member types use the Standard's naming conventions.) (Not all standard container requirements are met; some operations are not presently supported because they haven't been needed yet.) This includes iteration support using (mostly) standard-conforming iterator types (they are presently missing iterator_category member types, pending being able to include so we can use std::bidirectional_iterator_tag). This change only provides the new facility, and doesn't include any uses of it, though there is a suite of unit tests for it. I've extracted it from some in-progress work, as a useful tool in it's own right. I've converted a couple existing list implementations to use IntrusiveList, and will be submitting those changes once this infrastructure is in place. One place I haven't yet touched that I think will benefit is G1's region handling. 
There are various places where G1 iterates over all regions in order to do something with those which satisfy some property (humongous regions, regions in the collection set, &etc). If it were trivial to create new region sublists (and this facility makes that easy), some of these could be turned into direct iteration over only the regions of interest. CR: https://bugs.openjdk.java.net/browse/JDK-8189088 Webrev: http://cr.openjdk.java.net/~kbarrett/8189088 Testing: JPRT to build and run unit tests on supported platforms. From david.holmes at oracle.com Tue Oct 10 08:47:25 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 Oct 2017 18:47:25 +1000 Subject: RFR: 8189088: Add intrusive doubly-linked list utility In-Reply-To: References: Message-ID: <12515708-d1c3-b284-1117-44b4561d53cd@oracle.com> Hi Kim, I get the gist of this but am not going to pretend I can follow all the details. :) So what actually gets expanded into the target type to support this. Is it just a next/prev pointer or is there additional infrastructure needed? Thanks, David On 10/10/2017 6:29 PM, Kim Barrett wrote: > RFR: 8189088: Add intrusive doubly-linked list utility > > Please review this new facility, providing a general mechanism for > intrusive doubly-linked lists. A class supports inclusion in a list by > having an IntrusiveListLink member, and providing structured > information about how to access that member. A class supports > inclusion in multiple lists by having multiple IntrusiveListLink > members, with different lists specified to use different members. > > The IntrusiveList class template provides the list management. It is > modelled on bidirectional containers such as std::list and > boost::intrusive::list, providing many of the expected member types > and functions. (Note that the member types use the Standard's naming > conventions.) (Not all standard container requirements are met; some > operations are not presently supported because they haven't been > needed yet.) This includes iteration support using (mostly) > standard-conforming iterator types (they are presently missing > iterator_category member types, pending being able to include > so we can use std::bidirectional_iterator_tag). > > This change only provides the new facility, and doesn't include any > uses of it, though there is a suite of unit tests for it. I've > extracted it from some in-progress work, as a useful tool in it's own > right. > > I've converted a couple existing list implementations to use > IntrusiveList, and will be submitting those changes once this > infrastructure is in place. One place I haven't yet touched that I > think will benefit is G1's region handling. There are various places > where G1 iterates over all regions in order to do something with those > which satisfy some property (humongous regions, regions in the > collection set, &etc). If it were trivial to create new region > sublists (and this facility makes that easy), some of these could be > turned into direct iteration over only the regions of interest. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8189088 > > Webrev: > http://cr.openjdk.java.net/~kbarrett/8189088 > > Testing: > JPRT to build and run unit tests on supported platforms. > > From glaubitz at physik.fu-berlin.de Tue Oct 10 09:32:32 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Tue, 10 Oct 2017 11:32:32 +0200 Subject: JDK10/RFR(M): 8172232: SPARC ISA/CPU feature detection is broken/insufficient (on Linux). 
In-Reply-To: <2d1fd501-8ba3-7591-a360-2cdc114cfbe9@physik.fu-berlin.de> References: <7d5e1ebb-7de8-66f1-a1f0-db465bcad4ab@oracle.com> <9f2896ca-65dc-557f-793c-4235499cc340@oracle.com> <3fcc865d-3eda-a341-e112-8417711ee3e5@oracle.com> <55211504-0f3e-52a0-0930-f34babb5da14@physik.fu-berlin.de> <2d1fd501-8ba3-7591-a360-2cdc114cfbe9@physik.fu-berlin.de> Message-ID: <276c6e05-1732-90da-466f-6c84326e7984@physik.fu-berlin.de> Hi Patric! On 10/04/2017 11:58 AM, John Paul Adrian Glaubitz wrote: > Hope this gets merged soon. After that, the linux-sparc builds > won't need any external patches downstream anymore. Any news on this? Thanks, Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `. `' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From erik.joelsson at oracle.com Tue Oct 10 10:17:32 2017 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Tue, 10 Oct 2017 12:17:32 +0200 Subject: RFR (xs): JDK-8189071 - Require jtreg 4.2 b09 In-Reply-To: <231091EC-AF95-4C88-A5CC-F555FC2C9CC1@oracle.com> References: <231091EC-AF95-4C88-A5CC-F555FC2C9CC1@oracle.com> Message-ID: <8d0542fb-4594-7d5e-4ace-e1777d14de5b@oracle.com> Hello, This looks good, but in the future, please include build-dev on such changes. This one was trivial, but you never know. That way the build team is also better able to keep track of any build related changes. I only found out about this by stumbling over a conversation on internal chat. /Erik On 2017-10-10 03:35, jesper.wilhelmsson at oracle.com wrote: > Hi, > > Can I have a review of this trivial fix to update the jib-profile to require the latest version of jtreg. This to get rid of the SocketTimeoutException that we see in the hotspot nightlies. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8189071 > > The change is: > > diff --git a/make/conf/jib-profiles.js b/make/conf/jib-profiles.js > --- a/make/conf/jib-profiles.js > +++ b/make/conf/jib-profiles.js > @@ -1063,7 +1063,7 @@ > jtreg: { > server: "javare", > revision: "4.2", > - build_number: "b08", > + build_number: "b09", > checksum_file: "MD5_VALUES", > file: "jtreg_bin-4.2.zip", > environment_name: "JT_HOME", > > > Thanks, > /Jesper > From sean.mullan at oracle.com Tue Oct 10 12:26:12 2017 From: sean.mullan at oracle.com (Sean Mullan) Date: Tue, 10 Oct 2017 08:26:12 -0400 Subject: [10] RFR(S) 8188775: Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.org.graalvm.compiler.hotspot In-Reply-To: References: Message-ID: <449d883b-1208-9708-2da7-9cd6393a8db7@oracle.com> On 10/9/17 3:55 AM, Alan Bateman wrote: > On 05/10/2017 00:05, Vladimir Kozlov wrote: >> https://bugs.openjdk.java.net/browse/JDK-8188775 >> >> Changes for 8182701[1] missed changes in default.policy for new module >> jdk.internal.vm.compiler.management. >> >> Add missing code: >> >> src/java.base/share/lib/security/default.policy >> @@ -154,6 +154,10 @@ >> ???? permission java.security.AllPermission; >> ?}; >> >> +grant codeBase "jrt:/jdk.internal.vm.compiler.management" { >> +??? permission java.security.AllPermission; >> +}; >> + > This looks okay to me although it would be nice if we could identify the > minimal permissions rather than granting it AllPermission. +1. Is there any reason you did not just grant it RuntimePermission "accessClassInPackage.org.graalvm.compiler.hotspot"? 
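To make the suggestion concrete, the narrower grant Sean is describing would look something like the following default.policy entry, i.e. the same codeBase block as in the pushed fix but with the single runtime permission in place of AllPermission. This is only an illustrative sketch, not the reviewed change; whether that one permission is actually sufficient for the module is exactly the open question.

grant codeBase "jrt:/jdk.internal.vm.compiler.management" {
    permission java.lang.RuntimePermission
        "accessClassInPackage.org.graalvm.compiler.hotspot";
};
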
I see you have already pushed the fix, so I would recommend opening another issue to only grant the required permissions to the jdk.internal.vm.compiler.management module. Thanks, Sean From jbax at univocity.com Tue Oct 10 12:59:36 2017 From: jbax at univocity.com (Jeronimo Backes) Date: Tue, 10 Oct 2017 23:29:36 +1030 Subject: Issues with JDK 9 crashing itself and the operating system In-Reply-To: References: Message-ID: Hello Rohit Do you have any update regarding the cause of this? Looks like it is specific to Ryzen+Linux. On 25 September 2017 at 20:33, Rohit Arul Raj wrote: > Hello Jeronimo, > > Thanks for the detailed report. We were able to reproduce the issue on > our machine. > We will analyze this further and get back to you. > > Regards, > Rohit > > On Sat, Sep 23, 2017 at 4:46 PM, Jeronimo Backes > wrote: > > Hello, my name is Jeronimo and I'm the author of the univocity-parsers > > library (https://github.com/uniVocity/univocity-parsers) and I'm > writing to > > you by recommendation of Erik Duveblad. > > > > Basically, I recently installed the JDK 9 distributed by Oracle on my > > development computer and when I try to build my project (with a simple > `mvn > > clean install` command) the JVM crashes with: > > > > > > # A fatal error has been detected by the Java Runtime Environment: > > # > > # SIGSEGV (0xb) at pc=0x00007f18b96c52f0, pid=3865, tid=3904 > > # > > # JRE version: Java(TM) SE Runtime Environment (9.0+181) (build 9+181) > > # Java VM: Java HotSpot(TM) 64-Bit Server VM (9+181, mixed mode, tiered, > > compressed oops, g1 gc, linux-amd64) > > # Problematic frame: > > # V [libjvm.so+0x9292f0] > > JVMCIGlobals::check_jvmci_flags_are_consistent()+0x120 > > # > > # Core dump will be written. Default location: Core dumps may be > processed > > with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %e" (or > dumping to > > /home/jbax/dev/repository/univocity-parsers/core.3865) > > # > > # An error report file with more information is saved as: > > # /home/jbax/dev/repository/univocity-parsers/hs_err_pid3865.log > > # > > # Compiler replay data is saved as: > > # /home/jbax/dev/repository/univocity-parsers/replay_pid3865.log > > # > > # If you would like to submit a bug report, please visit: > > # http://bugreport.java.com/bugreport/crash.jsp > > # > > > > > > The hs_err files generated are available here > > https://github.com/uniVocity/univocity-parsers/files/ > 1326484/jdk_9_crash2.zip. > > This zip also contains the pom.xml file I used. The build succeeded 4 > times > > before the JVM crashed. > > > > Yesterday I had the crash happen 100% of the time, but the CPU was > > overclocked to 3.6Ghz (never had any issue with it though) and saved the > > error file here: > > https://github.com/uniVocity/univocity-parsers/files/ > 1324326/jdk_9_crash.zip. > > I created an issue on github to investigate this: > > https://github.com/uniVocity/univocity-parsers/issues/189. There Erik > > mentioned that: > > > > "Looking at the hs_err file, the stack trace is "wrong", a C2 Compiler > > Thread can't call JVMCIGlobals::check_jvmci_flags_are_consistent (and > the > > value of the register RIP does not correspond to any instruction in the > > compiled version of that function). This makes me suspect that something > > could be wrong with your CPU, the CPU should not have jumped to this > memory > > location." > > > > Things still fail with stock hardware settings. 
More details about my > > environment : > > > > OS, Maven and Java versions: > > > > [jbax at linux-pc ~]$ mvn -version > > Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; > > 2014-12-15T03:59:23+10:30) > > Maven home: /home/jbax/dev/apache-maven > > Java version: 9, vendor: Oracle Corporation > > Java home: /home/jbax/dev/jdk9 > > Default locale: en_AU, platform encoding: UTF-8 > > OS name: "linux", version: "4.12.13-1-manjaro", arch: "amd64", family: > > "unix" > > [jbax at linux-pc ~]$ > > > > Hardware: > > [jbax at linux-pc univocity-parsers]$ lscpu > > Architecture: x86_64 > > CPU op-mode(s): 32-bit, 64-bit > > Byte Order: Little Endian > > CPU(s): 16 > > On-line CPU(s) list: 0-15 > > Thread(s) per core: 2 > > Core(s) per socket: 8 > > Socket(s): 1 > > NUMA node(s): 1 > > Vendor ID: AuthenticAMD > > CPU family: 23 > > Model: 1 > > Model name: AMD Ryzen 7 1700 Eight-Core Processor > > Stepping: 1 > > CPU MHz: 1550.000 > > CPU max MHz: 3000.0000 > > CPU min MHz: 1550.0000 > > BogoMIPS: 6001.43 > > Virtualization: AMD-V > > L1d cache: 32K > > L1i cache: 64K > > L2 cache: 512K > > L3 cache: 8192K > > NUMA node0 CPU(s): 0-15 > > Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge > > mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext > fxsr_opt > > pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid > extd_apicid > > aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe > popcnt > > aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm > > sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core > > perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 > smep > > bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves > clzero > > irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid > > decodeassists pausefilter pfthreshold avic overflow_recov succor smca > > > > On an unrelated note, I use an old java application that crashes the > entire > > OS for me when Java 9 is used: http://www.jinchess.com/download > > > > It's just a matter of downloading, unpacking and trying to start it with > > jin-2.14.1/jin > > > > The OS crashes and I have to hard-reset the computer. It works just fine > if > > revert back to Java 6, 7 or 8. > > > > I thought you'd might want to investigate what is going on. Let me know > if > > you need more information. > > > > Best regards, > > > > Jeronimo. > > > > > > > > > > -- > > the uniVocity team > > www.univocity.com > -- the uniVocity team www.univocity.com From kim.barrett at oracle.com Tue Oct 10 14:56:08 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 10 Oct 2017 10:56:08 -0400 Subject: RFR: 8189088: Add intrusive doubly-linked list utility In-Reply-To: <12515708-d1c3-b284-1117-44b4561d53cd@oracle.com> References: <12515708-d1c3-b284-1117-44b4561d53cd@oracle.com> Message-ID: > On Oct 10, 2017, at 4:47 AM, David Holmes wrote: > > Hi Kim, > > I get the gist of this but am not going to pretend I can follow all the details. :) > > So what actually gets expanded into the target type to support this. Is it just a next/prev pointer or is there additional infrastructure needed? The target type gets a next/prev pair of pointers, plus a debug-only pointer to the currently containing list to support various assertions. Those are all packaged in the IntrusiveListLink class. Replicated for each simultaneous list an object may need to be in. 
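To picture what that answer means for a client class, here is a rough sketch. The class and member names below are made up, and the exact IntrusiveList template parameters are whatever the webrev defines (a pointer-to-member is just one plausible way to provide the "structured information about how to access that member" described in the RFR), so this only illustrates the shape of the facility, not its precise API:

  class Region {                          // hypothetical client class
    IntrusiveListLink _active_link;       // next/prev (plus debug-only owner) for one list
    IntrusiveListLink _humongous_link;    // a second, independent link for another list
    // ... payload fields ...
  };

  // Each list type names the embedded link member it manages, so the same
  // Region can sit on both lists at once, with no separate node allocation.
  typedef IntrusiveList<Region, &Region::_active_link>    ActiveRegionList;
  typedef IntrusiveList<Region, &Region::_humongous_link> HumongousRegionList;
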
From vladimir.kozlov at oracle.com Tue Oct 10 15:29:29 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 10 Oct 2017 08:29:29 -0700 Subject: [10] RFR(S) 8188775: Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.org.graalvm.compiler.hotspot In-Reply-To: <449d883b-1208-9708-2da7-9cd6393a8db7@oracle.com> References: <449d883b-1208-9708-2da7-9cd6393a8db7@oracle.com> Message-ID: <3fad30f1-0050-12c5-4e61-4bda9852457b@oracle.com> Thank you Alan and Sean, I copied preceding code for jdk.internal.vm.compiler because it is not clear for me if accessClassInPackage is enough for all cases. Anyway, I filed next issue to find minimum required permission as you suggested. https://bugs.openjdk.java.net/browse/JDK-8189116 Thanks, Vladimir On 10/10/17 5:26 AM, Sean Mullan wrote: > On 10/9/17 3:55 AM, Alan Bateman wrote: >> On 05/10/2017 00:05, Vladimir Kozlov wrote: >>> https://bugs.openjdk.java.net/browse/JDK-8188775 >>> >>> Changes for 8182701[1] missed changes in default.policy for new module jdk.internal.vm.compiler.management. >>> >>> Add missing code: >>> >>> src/java.base/share/lib/security/default.policy >>> @@ -154,6 +154,10 @@ >>> ???? permission java.security.AllPermission; >>> ?}; >>> >>> +grant codeBase "jrt:/jdk.internal.vm.compiler.management" { >>> +??? permission java.security.AllPermission; >>> +}; >>> + >> This looks okay to me although it would be nice if we could identify the minimal permissions rather than granting it >> AllPermission. > > +1. > > Is there any reason you did not just grant it RuntimePermission "accessClassInPackage.org.graalvm.compiler.hotspot"? > > I see you have already pushed the fix, so I would recommend opening another issue to only grant the required permissions > to the jdk.internal.vm.compiler.management module. > > Thanks, > Sean From volker.simonis at gmail.com Tue Oct 10 17:17:40 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 10 Oct 2017 19:17:40 +0200 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: <4109f960-078f-e582-3c78-71f201a265fd@redhat.com> References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com> <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com> <5b5219a5-960e-363b-2bdc-3613f1dae62c@oracle.com> <4109f960-078f-e582-3c78-71f201a265fd@redhat.com> Message-ID: On Tue, Oct 10, 2017 at 9:42 AM, Andrew Haley wrote: > On 09/10/17 20:24, Volker Simonis wrote: >> Unfortunately we can't easily generate these stubs during >> 'stubRoutines_init1()' because >> 'generate_dirty_card_log_enqueue_if_necessary()' needs the byte map >> base address which is only initialized in >> 'CardTableModRefBS::initialize()' during 'univers_init()' which >> happens after 'stubRoutines_init1()'. > > Yes you can, you can do something like we do for narrow_ptrs_base: > > if (Universe::is_fully_initialized()) { > mov(rheapbase, Universe::narrow_ptrs_base()); > } else { > lea(rheapbase, ExternalAddress((address)Universe::narrow_ptrs_base_addr())); > ldr(rheapbase, Address(rheapbase)); > } > Hi Andrew, thanks for your suggestion. Yes, I could do that, but that would replace a constant load in the barrier with a constant load plus a load from memory, because during stubRoutines_init1() heap won't be initialized. Not sure about this, but I think we want to avoid this overhead in the barriers. 
Also, Christian proposed in a previous mail to replace the G1 barrier stubs on SPARC with simple runtime calls like on other platforms. While I think that it is probably worthwhile thinking about such a change, I don't know the exact history of these stubs and probably some GC experts should decide if that's really a good idea. I'd be happy to open an extra issue for following up on that path. But for the moments I've simply added a new initialization step "g1_barrier_stubs_init()" between 'univers_init()' and interpreter_init() which is empty on all platforms except SPARC where it generates the corresponding stubs: http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v3/ I've built and smoke-tested the new change on Windows, MacOS, Solaris/SPARC, AIX, Linux/x86_64/ppc64/ppc64le/s390. Unfortunately I don't have access to ARM machines so I couldn't check arm,arm64 and aarch64 although I don't expect any problems there (actually I've just added an empty method there). But it would be great if somebody could check that for any case. @Vladimir: I've also rebased the change for "8187091: ReturnBlobToWrongHeapTest fails because of problems in CodeHeap::contains_blob()": http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091.v2/ Because it changes the same files like 8166317 it should be applied and pushed only after 8166317 was pushed. Thank you and best regards, Volker > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From coleen.phillimore at oracle.com Tue Oct 10 22:01:01 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 10 Oct 2017 18:01:01 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot Message-ID: Summary: With the new template functions these are unnecessary. The changes are mostly s/_ptr// and removing the cast to return type.? There weren't many types that needed to be improved to match the template version of the function.?? Some notes: 1. replaced CASPTR with Atomic::cmpxchg() in mutex.cpp, rearranging arguments. 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null.? I disliked the first name because it's not explicit from the callers that there's an underlying cas.? If people want to fight, I'll remove the function and use cmpxchg because there are only a couple places where this is a little nicer. 3. Added Atomic::sub() Tested with JPRT, mach5 tier1-5 on linux,windows and solaris. open webrev at http://cr.openjdk.java.net/~coleenp/8188220.01/webrev bug link https://bugs.openjdk.java.net/browse/JDK-8188220 Thanks, Coleen From kim.barrett at oracle.com Wed Oct 11 03:43:19 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 10 Oct 2017 23:43:19 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: References: Message-ID: > On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: > > Summary: With the new template functions these are unnecessary. > > 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null. I disliked the first name because it's not explicit from the callers that there's an underlying cas. If people want to fight, I'll remove the function and use cmpxchg because there are only a couple places where this is a little nicer. I'm still looking at other parts, but I want to respond to this now. I object to this change. I think the proposed new name is confusing, suggesting there are two different comparisons involved. 
I originally called it something else that I wasn't entirely happy with. When David suggested replace_if_null I quickly adopted that as I think that name exactly describes what it does. In particular, I think "atomic replace if" pretty clearly suggests a test-and-set / compare-and-swap type of operation. Further, I think any name involving "cmpxchg" is problematic because the result of this operation is intentionally different from cmpxchg, in order to better support the primary use-case, which is lazy initialization. I also object to your alternative suggestion of removing the operation entirely and just using cmpxchg directly instead. I don't recall how many occurrences there presently are, but I suspect more could easily be added; it's part of a lazy initialization pattern similar to DCLP but without the locks. From david.holmes at oracle.com Wed Oct 11 03:55:27 2017 From: david.holmes at oracle.com (David Holmes) Date: Wed, 11 Oct 2017 13:55:27 +1000 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: References: Message-ID: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> On 11/10/2017 1:43 PM, Kim Barrett wrote: >> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >> >> Summary: With the new template functions these are unnecessary. >> >> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null. I disliked the first name because it's not explicit from the callers that there's an underlying cas. If people want to fight, I'll remove the function and use cmpxchg because there are only a couple places where this is a little nicer. > > I'm still looking at other parts, but I want to respond to this now. > > I object to this change. I think the proposed new name is confusing, > suggesting there are two different comparisons involved. > > I originally called it something else that I wasn't entirely happy > with. When David suggested replace_if_null I quickly adopted that as > I think that name exactly describes what it does. In particular, I > think "atomic replace if" pretty clearly suggests a test-and-set / > compare-and-swap type of operation. I totally agree. It's an Atomic operation, the implementation will involve something atomic, it doesn't matter if it is cmpxchg or something else. The name replace_if_null describes exactly what the function does - it doesn't have to describe how it does it. David ----- > Further, I think any name involving "cmpxchg" is problematic because > the result of this operation is intentionally different from cmpxchg, > in order to better support the primary use-case, which is lazy > initialization. > > I also object to your alternative suggestion of removing the operation > entirely and just using cmpxchg directly instead. I don't recall how > many occurrences there presently are, but I suspect more could easily > be added; it's part of a lazy initialization pattern similar to DCLP > but without the locks. > From erik.osterlund at oracle.com Wed Oct 11 07:45:59 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Wed, 11 Oct 2017 09:45:59 +0200 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> Message-ID: <59DDCC37.8050306@oracle.com> Hi, First off - big thanks to Coleen for this cleanup. Nice! I think I have to take Coleen's side here regarding replace_if_null. 
Here is why: 1) I do not see how performing a CAS expecting NULL specifically is special enough that it warrants its own operation. It does not save many characters to just type it, and makes it less obvious what it does, which seems unnecessary to me. Atomic ought to have the minimum atomic operations required and not get cluttered with helpers. 2) To me it really does matter what each operation boils down to in Atomic, especially in terms of semantics. Will my replace_if_null have acquire semantics if it does not find null? Will it have trailing leading, or bidirectional fencing if it succeeds, or just release semantics on the store? Does it allow spurious failures? It matters to me, and should preferrably not be abstracted away in my opinion. And if we really depend on it behaving exactly like Atomic::cmpxchg semantically, I think (like Coleen) that either the name should reflect that, or preferrably for me, it should be removed and replaced with an explicit Atomic::cmpxchg. 3) I prefer not to have multiple APIs for doing the same thing. We all know what happens when programmers are given the choice of two different ways of expressing the same thing: they start disagreeing about how to express that thing. Now in this changeset, there are inconsistencies already. For example in classLoaderData.cpp:946 there is one occurence of an explicit cmpxchg that expects null (for the purposes of lazy initialization), while other places (e.g. nmethod.cpp:1662) use the abstraction. Should that be changed now (and in subsequent changesets) to use the abstraction to make the code consistent? I might think this should not matter and that the explicit CAS is okay, but I can almost promise somebody will have the opposite opinion. By having one way of performing a CAS that expects 0, we can spend less time disagreeing about which way we should CAS, and more time on other things of more importance. This is just my 50 cent, letting Coleen know she is not the only one with similar thoughts. I have not reviewed this completely yet - thought I'd wait with that until we agree about replace_if_null, if that is okay. Thanks, /Erik On 2017-10-11 05:55, David Holmes wrote: > On 11/10/2017 1:43 PM, Kim Barrett wrote: >>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>> >>> Summary: With the new template functions these are unnecessary. >>> >>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null. I >>> disliked the first name because it's not explicit from the callers >>> that there's an underlying cas. If people want to fight, I'll >>> remove the function and use cmpxchg because there are only a couple >>> places where this is a little nicer. >> >> I'm still looking at other parts, but I want to respond to this now. >> >> I object to this change. I think the proposed new name is confusing, >> suggesting there are two different comparisons involved. >> >> I originally called it something else that I wasn't entirely happy >> with. When David suggested replace_if_null I quickly adopted that as >> I think that name exactly describes what it does. In particular, I >> think "atomic replace if" pretty clearly suggests a test-and-set / >> compare-and-swap type of operation. > > I totally agree. It's an Atomic operation, the implementation will > involve something atomic, it doesn't matter if it is cmpxchg or > something else. The name replace_if_null describes exactly what the > function does - it doesn't have to describe how it does it. 
> > David > ----- > >> Further, I think any name involving "cmpxchg" is problematic because >> the result of this operation is intentionally different from cmpxchg, >> in order to better support the primary use-case, which is lazy >> initialization. >> >> I also object to your alternative suggestion of removing the operation >> entirely and just using cmpxchg directly instead. I don't recall how >> many occurrences there presently are, but I suspect more could easily >> be added; it's part of a lazy initialization pattern similar to DCLP >> but without the locks. >> From david.holmes at oracle.com Wed Oct 11 08:09:29 2017 From: david.holmes at oracle.com (David Holmes) Date: Wed, 11 Oct 2017 18:09:29 +1000 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <59DDCC37.8050306@oracle.com> References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> Message-ID: <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: > Hi, > > First off - big thanks to Coleen for this cleanup. Nice! > > I think I have to take Coleen's side here regarding replace_if_null. > Here is why: > > 1) I do not see how performing a CAS expecting NULL specifically is > special enough that it warrants its own operation. It does not save many > characters to just type it, and makes it less obvious what it does, > which seems unnecessary to me. Atomic ought to have the minimum atomic > operations required and not get cluttered with helpers. From the earlier review thread related to the initial templatization of Atomic: "(1) cmpxchg(v, p, NULL), to store a pointer if no pointer is already present. This can be used as an alternative to DCLP. One way to deal with this might be an overload on std::nullptr_t and use nullptr, but that requires C++11. We don't have any current uses of this that I could find, but it's a sufficiently interesting idiom that I'm relucant to forbid it. But such idiomatic usage could be wrapped up in its own little package that can deal with the restriction." "I've also added bool Atomic::conditional_store_ptr(T, D volatile*), for the idiom of storing a value if the old value is NULL. It turns out there are about 25 occurrences of this idiom in Hotspot, so a utility for it seems warranted. The current implementation is just a straightforward wrapper around cmpxchg, which means it can't take advantage of gcc's __sync_bool_compare_and_swap. That can be dealt with later if desired." > 2) To me it really does matter what each operation boils down to in > Atomic, especially in terms of semantics. Will my replace_if_null have > acquire semantics if it does not find null? Will it have trailing > leading, or bidirectional fencing if it succeeds, or just release > semantics on the store? Does it allow spurious failures? It matters to > me, and should preferrably not be abstracted away in my opinion. I can buy that partially but you're stretching things given you can't glean those details from the name cmpxchg either. > And if we really depend on it behaving exactly like Atomic::cmpxchg > semantically, I think (like Coleen) that either the name should reflect > that, or preferrably for me, it should be removed and replaced with an > explicit Atomic::cmpxchg. I don't think we do/need-to depend on that. > 3) I prefer not to have multiple APIs for doing the same thing. 
We all > know what happens when programmers are given the choice of two different > ways of expressing the same thing: they start disagreeing about how to > express that thing. Now in this changeset, there are inconsistencies > already. For example in classLoaderData.cpp:946 there is one occurence > of an explicit cmpxchg that expects null (for the purposes of lazy > initialization), while other places (e.g. nmethod.cpp:1662) use the > abstraction. Should that be changed now (and in subsequent changesets) > to use the abstraction to make the code consistent? I might think this > should not matter and that the explicit CAS is okay, but I can almost > promise somebody will have the opposite opinion. By having one way of > performing a CAS that expects 0, we can spend less time disagreeing > about which way we should CAS, and more time on other things of more > importance. > > This is just my 50 cent, letting Coleen know she is not the only one > with similar thoughts. Removing the operation is a different argument to renaming it. Most of the above argues for removing it. :) Cheers, David ----- > I have not reviewed this completely yet - thought I'd wait with that > until we agree about replace_if_null, if that is okay. > > Thanks, > /Erik > > On 2017-10-11 05:55, David Holmes wrote: >> On 11/10/2017 1:43 PM, Kim Barrett wrote: >>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>>> >>>> Summary: With the new template functions these are unnecessary. >>>> >>>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null.? I >>>> disliked the first name because it's not explicit from the callers >>>> that there's an underlying cas.? If people want to fight, I'll >>>> remove the function and use cmpxchg because there are only a couple >>>> places where this is a little nicer. >>> >>> I'm still looking at other parts, but I want to respond to this now. >>> >>> I object to this change.? I think the proposed new name is confusing, >>> suggesting there are two different comparisons involved. >>> >>> I originally called it something else that I wasn't entirely happy >>> with.? When David suggested replace_if_null I quickly adopted that as >>> I think that name exactly describes what it does.? In particular, I >>> think "atomic replace if" pretty clearly suggests a test-and-set / >>> compare-and-swap type of operation. >> >> I totally agree. It's an Atomic operation, the implementation will >> involve something atomic, it doesn't matter if it is cmpxchg or >> something else. The name replace_if_null describes exactly what the >> function does - it doesn't have to describe how it does it. >> >> David >> ----- >> >>> Further, I think any name involving "cmpxchg" is problematic because >>> the result of this operation is intentionally different from cmpxchg, >>> in order to better support the primary use-case, which is lazy >>> initialization. >>> >>> I also object to your alternative suggestion of removing the operation >>> entirely and just using cmpxchg directly instead.? I don't recall how >>> many occurrences there presently are, but I suspect more could easily >>> be added; it's part of a lazy initialization pattern similar to DCLP >>> but without the locks. 
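For readers following along, the idiom being debated is easy to sketch. The fragment below uses std::atomic rather than HotSpot's Atomic class so it stands alone, and every name in it is made up for illustration -- it shows the shape of the lazy-initialization pattern Kim describes (DCLP-like, but without a lock), not code from the webrev.

  #include <atomic>

  struct LazyTable {
    std::atomic<int*> _table{nullptr};

    int* table() {
      int* t = _table.load(std::memory_order_acquire);
      if (t == nullptr) {
        int* fresh = new int[64]();        // several threads may race to build one
        int* expected = nullptr;
        // Publish only if the field is still NULL -- the "CAS expecting NULL"
        // step that replace_if_null wraps and that a plain cmpxchg also expresses.
        if (_table.compare_exchange_strong(expected, fresh,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire)) {
          t = fresh;                       // we won the race
        } else {
          delete[] fresh;                  // another thread already published
          t = expected;                    // adopt the winner's table
        }
      }
      return t;
    }
  };

Whichever spelling wins, the essence of the pattern is that losers discard their copy and adopt the published value, which is why the operation's result only needs to say whether the install happened.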
>>> > From robbin.ehn at oracle.com Wed Oct 11 08:12:04 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 11 Oct 2017 10:12:04 +0200 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> Message-ID: <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> On 10/11/2017 10:09 AM, David Holmes wrote: > On 11/10/2017 5:45 PM, Erik ?sterlund wrote: > > Removing the operation is a different argument to renaming it. Most of the above argues for removing it. :) +1 on removing Thanks, Robbin > > Cheers, > David > ----- > >> I have not reviewed this completely yet - thought I'd wait with that until we agree about replace_if_null, if that is okay. >> >> Thanks, >> /Erik >> >> On 2017-10-11 05:55, David Holmes wrote: >>> On 11/10/2017 1:43 PM, Kim Barrett wrote: >>>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>>>> >>>>> Summary: With the new template functions these are unnecessary. >>>>> >>>>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null.? I disliked the first name because it's not explicit from the callers that there's an underlying cas. >>>>> If people want to fight, I'll remove the function and use cmpxchg because there are only a couple places where this is a little nicer. >>>> >>>> I'm still looking at other parts, but I want to respond to this now. >>>> >>>> I object to this change.? I think the proposed new name is confusing, >>>> suggesting there are two different comparisons involved. >>>> >>>> I originally called it something else that I wasn't entirely happy >>>> with.? When David suggested replace_if_null I quickly adopted that as >>>> I think that name exactly describes what it does.? In particular, I >>>> think "atomic replace if" pretty clearly suggests a test-and-set / >>>> compare-and-swap type of operation. >>> >>> I totally agree. It's an Atomic operation, the implementation will involve something atomic, it doesn't matter if it is cmpxchg or something else. The name >>> replace_if_null describes exactly what the function does - it doesn't have to describe how it does it. >>> >>> David >>> ----- >>> >>>> Further, I think any name involving "cmpxchg" is problematic because >>>> the result of this operation is intentionally different from cmpxchg, >>>> in order to better support the primary use-case, which is lazy >>>> initialization. >>>> >>>> I also object to your alternative suggestion of removing the operation >>>> entirely and just using cmpxchg directly instead.? I don't recall how >>>> many occurrences there presently are, but I suspect more could easily >>>> be added; it's part of a lazy initialization pattern similar to DCLP >>>> but without the locks. 
>>>> >> From coleen.phillimore at oracle.com Wed Oct 11 11:07:28 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 11 Oct 2017 07:07:28 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> Message-ID: On 10/11/17 4:12 AM, Robbin Ehn wrote: > On 10/11/2017 10:09 AM, David Holmes wrote: >> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: >> >> Removing the operation is a different argument to renaming it. Most >> of the above argues for removing it. :) > > +1 on removing Thank you for all your feedback.? Erik best described what I was thinking.? I will remove it then.? There were not that many instances and one instance that people thought would be useful, needed the old return value. Coleen > > Thanks, Robbin > >> >> Cheers, >> David >> ----- >> >>> I have not reviewed this completely yet - thought I'd wait with that >>> until we agree about replace_if_null, if that is okay. >>> >>> Thanks, >>> /Erik >>> >>> On 2017-10-11 05:55, David Holmes wrote: >>>> On 11/10/2017 1:43 PM, Kim Barrett wrote: >>>>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>>>>> >>>>>> Summary: With the new template functions these are unnecessary. >>>>>> >>>>>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null.? I >>>>>> disliked the first name because it's not explicit from the >>>>>> callers that there's an underlying cas.? If people want to fight, >>>>>> I'll remove the function and use cmpxchg because there are only a >>>>>> couple places where this is a little nicer. >>>>> >>>>> I'm still looking at other parts, but I want to respond to this now. >>>>> >>>>> I object to this change.? I think the proposed new name is confusing, >>>>> suggesting there are two different comparisons involved. >>>>> >>>>> I originally called it something else that I wasn't entirely happy >>>>> with.? When David suggested replace_if_null I quickly adopted that as >>>>> I think that name exactly describes what it does.? In particular, I >>>>> think "atomic replace if" pretty clearly suggests a test-and-set / >>>>> compare-and-swap type of operation. >>>> >>>> I totally agree. It's an Atomic operation, the implementation will >>>> involve something atomic, it doesn't matter if it is cmpxchg or >>>> something else. The name replace_if_null describes exactly what the >>>> function does - it doesn't have to describe how it does it. >>>> >>>> David >>>> ----- >>>> >>>>> Further, I think any name involving "cmpxchg" is problematic because >>>>> the result of this operation is intentionally different from cmpxchg, >>>>> in order to better support the primary use-case, which is lazy >>>>> initialization. >>>>> >>>>> I also object to your alternative suggestion of removing the >>>>> operation >>>>> entirely and just using cmpxchg directly instead.? I don't recall how >>>>> many occurrences there presently are, but I suspect more could easily >>>>> be added; it's part of a lazy initialization pattern similar to DCLP >>>>> but without the locks. 
>>>>> >>> From robbin.ehn at oracle.com Wed Oct 11 13:37:51 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 11 Oct 2017 15:37:51 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes Message-ID: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> Hi all, Starting the review of the code while JEP work is still not completed. JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not just all threads or none. Entire changeset: http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ Divided into 3-parts, SafepointMechanism abstraction: http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ Consolidating polling page allocation: http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ Handshakes: http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a handshake can be performed with that single JavaThread as well. The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. Example of potential use-cases: -Biased lock revocation -External requests for stack traces -Deoptimization -Async exception delivery -External suspension -Eliding memory barriers All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported platforms are Linux x64 and Solaris SPARC. Tested heavily with various test suits and comes with a few new tests. Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all JavaThreads in an array instead of a linked list. 
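To make the per-thread indirection concrete, here is a stripped-down sketch. The identifiers are invented for illustration and are not the SafepointMechanism/Handshake classes from the webrev; in the real VM the armed page is memory-protected so the poll load traps into a handler, whereas in this standalone fragment both pages are ordinary memory and the poll simply falls through.

  #include <atomic>

  static char disarmed_page[4096];   // readable: the poll is effectively a no-op
  static char armed_page[4096];      // stand-in for the guarded page

  struct JavaThreadPoll {
    // Every poll site loads through this per-thread pointer. Arming one thread
    // redirects only that thread's polls, so a single thread can be stopped
    // without a global safepoint.
    std::atomic<char*> poll_page{disarmed_page};

    void arm()    { poll_page.store(armed_page,    std::memory_order_release); }
    void disarm() { poll_page.store(disarmed_page, std::memory_order_release); }

    // Roughly what the generated poll corresponds to: one dependent load.
    void poll() {
      volatile char probe = *poll_page.load(std::memory_order_acquire);
      (void)probe;
    }
  };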
Thanks, Robbin From coleen.phillimore at oracle.com Wed Oct 11 13:50:08 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 11 Oct 2017 09:50:08 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> Message-ID: Please review version .02 which removes use of replace_if_null, but not the function.? A separate RFE can be filed to discuss that. open webrev at http://cr.openjdk.java.net/~coleenp/8188220.02/webrev Thanks, Coleen On 10/11/17 7:07 AM, coleen.phillimore at oracle.com wrote: > > > On 10/11/17 4:12 AM, Robbin Ehn wrote: >> On 10/11/2017 10:09 AM, David Holmes wrote: >>> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: >>> >>> Removing the operation is a different argument to renaming it. Most >>> of the above argues for removing it. :) >> >> +1 on removing > > Thank you for all your feedback.? Erik best described what I was > thinking.? I will remove it then.? There were not that many instances > and one instance that people thought would be useful, needed the old > return value. > > Coleen >> >> Thanks, Robbin >> >>> >>> Cheers, >>> David >>> ----- >>> >>>> I have not reviewed this completely yet - thought I'd wait with >>>> that until we agree about replace_if_null, if that is okay. >>>> >>>> Thanks, >>>> /Erik >>>> >>>> On 2017-10-11 05:55, David Holmes wrote: >>>>> On 11/10/2017 1:43 PM, Kim Barrett wrote: >>>>>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>>>>>> >>>>>>> Summary: With the new template functions these are unnecessary. >>>>>>> >>>>>>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null.? >>>>>>> I disliked the first name because it's not explicit from the >>>>>>> callers that there's an underlying cas.? If people want to >>>>>>> fight, I'll remove the function and use cmpxchg because there >>>>>>> are only a couple places where this is a little nicer. >>>>>> >>>>>> I'm still looking at other parts, but I want to respond to this now. >>>>>> >>>>>> I object to this change.? I think the proposed new name is >>>>>> confusing, >>>>>> suggesting there are two different comparisons involved. >>>>>> >>>>>> I originally called it something else that I wasn't entirely happy >>>>>> with.? When David suggested replace_if_null I quickly adopted >>>>>> that as >>>>>> I think that name exactly describes what it does.? In particular, I >>>>>> think "atomic replace if" pretty clearly suggests a test-and-set / >>>>>> compare-and-swap type of operation. >>>>> >>>>> I totally agree. It's an Atomic operation, the implementation will >>>>> involve something atomic, it doesn't matter if it is cmpxchg or >>>>> something else. The name replace_if_null describes exactly what >>>>> the function does - it doesn't have to describe how it does it. >>>>> >>>>> David >>>>> ----- >>>>> >>>>>> Further, I think any name involving "cmpxchg" is problematic because >>>>>> the result of this operation is intentionally different from >>>>>> cmpxchg, >>>>>> in order to better support the primary use-case, which is lazy >>>>>> initialization. >>>>>> >>>>>> I also object to your alternative suggestion of removing the >>>>>> operation >>>>>> entirely and just using cmpxchg directly instead.? 
I don't recall >>>>>> how >>>>>> many occurrences there presently are, but I suspect more could >>>>>> easily >>>>>> be added; it's part of a lazy initialization pattern similar to DCLP >>>>>> but without the locks. >>>>>> >>>> > From erik.osterlund at oracle.com Wed Oct 11 15:36:04 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Wed, 11 Oct 2017 17:36:04 +0200 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> Message-ID: <59DE3A64.7000009@oracle.com> Hi Coleen, In classLoaderData.cpp:~167: There is a cast to Chunk* when loading _head, but _head is already Chunk*, so it seems like that should not need a cast. In fact, _head should probably be declared as Chunk *volatile as it is accessed concurrently. In parNewGeneration.cpp:~1450: Atomic::add(-n, &_num_par_pushes); can now use Atomic::sub instead. g1PageBasedVirtualSpace.cpp:~249: Do you really need the (char*) cast for Atomic::add? Seems like it already is a char*, unless I missed something. cpCache.hpp: Noticed the casts for &_f1 (declared as volatile Metadata*) to Metadata *volatile*. It seems to me like _f1 should instead be declared as Metaata* volatile, and remove the casts. Also noticed some copyright headers have not been updated, might want to have a look at that. Otherwise, I think this looks good. Thank you again for doing this! Thanks, /Erik On 2017-10-11 15:50, coleen.phillimore at oracle.com wrote: > > Please review version .02 which removes use of replace_if_null, but > not the function. A separate RFE can be filed to discuss that. > > open webrev at http://cr.openjdk.java.net/~coleenp/8188220.02/webrev > > Thanks, > Coleen > > On 10/11/17 7:07 AM, coleen.phillimore at oracle.com wrote: >> >> >> On 10/11/17 4:12 AM, Robbin Ehn wrote: >>> On 10/11/2017 10:09 AM, David Holmes wrote: >>>> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: >>>> >>>> Removing the operation is a different argument to renaming it. Most >>>> of the above argues for removing it. :) >>> >>> +1 on removing >> >> Thank you for all your feedback. Erik best described what I was >> thinking. I will remove it then. There were not that many instances >> and one instance that people thought would be useful, needed the old >> return value. >> >> Coleen >>> >>> Thanks, Robbin >>> >>>> >>>> Cheers, >>>> David >>>> ----- >>>> >>>>> I have not reviewed this completely yet - thought I'd wait with >>>>> that until we agree about replace_if_null, if that is okay. >>>>> >>>>> Thanks, >>>>> /Erik >>>>> >>>>> On 2017-10-11 05:55, David Holmes wrote: >>>>>> On 11/10/2017 1:43 PM, Kim Barrett wrote: >>>>>>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>>>>>>> >>>>>>>> Summary: With the new template functions these are unnecessary. >>>>>>>> >>>>>>>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null. >>>>>>>> I disliked the first name because it's not explicit from the >>>>>>>> callers that there's an underlying cas. If people want to >>>>>>>> fight, I'll remove the function and use cmpxchg because there >>>>>>>> are only a couple places where this is a little nicer. >>>>>>> >>>>>>> I'm still looking at other parts, but I want to respond to this >>>>>>> now. >>>>>>> >>>>>>> I object to this change. 
I think the proposed new name is >>>>>>> confusing, >>>>>>> suggesting there are two different comparisons involved. >>>>>>> >>>>>>> I originally called it something else that I wasn't entirely happy >>>>>>> with. When David suggested replace_if_null I quickly adopted >>>>>>> that as >>>>>>> I think that name exactly describes what it does. In particular, I >>>>>>> think "atomic replace if" pretty clearly suggests a test-and-set / >>>>>>> compare-and-swap type of operation. >>>>>> >>>>>> I totally agree. It's an Atomic operation, the implementation >>>>>> will involve something atomic, it doesn't matter if it is cmpxchg >>>>>> or something else. The name replace_if_null describes exactly >>>>>> what the function does - it doesn't have to describe how it does it. >>>>>> >>>>>> David >>>>>> ----- >>>>>> >>>>>>> Further, I think any name involving "cmpxchg" is problematic >>>>>>> because >>>>>>> the result of this operation is intentionally different from >>>>>>> cmpxchg, >>>>>>> in order to better support the primary use-case, which is lazy >>>>>>> initialization. >>>>>>> >>>>>>> I also object to your alternative suggestion of removing the >>>>>>> operation >>>>>>> entirely and just using cmpxchg directly instead. I don't >>>>>>> recall how >>>>>>> many occurrences there presently are, but I suspect more could >>>>>>> easily >>>>>>> be added; it's part of a lazy initialization pattern similar to >>>>>>> DCLP >>>>>>> but without the locks. >>>>>>> >>>>> >> > From rohitarulraj at gmail.com Wed Oct 11 16:20:01 2017 From: rohitarulraj at gmail.com (Rohit Arul Raj) Date: Wed, 11 Oct 2017 21:50:01 +0530 Subject: Issues with JDK 9 crashing itself and the operating system In-Reply-To: References: Message-ID: Hello Jeronimo, Sorry for the late reply. We have already forwarded the issue to the relevant team here to confirm if it is indeed specific to Ryzen + Linux. Please give us some more time to confirm the same. Regards, Rohit On Tue, Oct 10, 2017 at 6:29 PM, Jeronimo Backes wrote: > Hello Rohit > Do you have any update regarding the cause of this? Looks like it is > specific to Ryzen+Linux. > > On 25 September 2017 at 20:33, Rohit Arul Raj > wrote: >> >> Hello Jeronimo, >> >> Thanks for the detailed report. We were able to reproduce the issue on >> our machine. >> We will analyze this further and get back to you. >> >> Regards, >> Rohit >> >> On Sat, Sep 23, 2017 at 4:46 PM, Jeronimo Backes >> wrote: >> > Hello, my name is Jeronimo and I'm the author of the univocity-parsers >> > library (https://github.com/uniVocity/univocity-parsers) and I'm writing >> > to >> > you by recommendation of Erik Duveblad. >> > >> > Basically, I recently installed the JDK 9 distributed by Oracle on my >> > development computer and when I try to build my project (with a simple >> > `mvn >> > clean install` command) the JVM crashes with: >> > >> > >> > # A fatal error has been detected by the Java Runtime Environment: >> > # >> > # SIGSEGV (0xb) at pc=0x00007f18b96c52f0, pid=3865, tid=3904 >> > # >> > # JRE version: Java(TM) SE Runtime Environment (9.0+181) (build 9+181) >> > # Java VM: Java HotSpot(TM) 64-Bit Server VM (9+181, mixed mode, tiered, >> > compressed oops, g1 gc, linux-amd64) >> > # Problematic frame: >> > # V [libjvm.so+0x9292f0] >> > JVMCIGlobals::check_jvmci_flags_are_consistent()+0x120 >> > # >> > # Core dump will be written. 
Default location: Core dumps may be >> > processed >> > with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %e" (or >> > dumping to >> > /home/jbax/dev/repository/univocity-parsers/core.3865) >> > # >> > # An error report file with more information is saved as: >> > # /home/jbax/dev/repository/univocity-parsers/hs_err_pid3865.log >> > # >> > # Compiler replay data is saved as: >> > # /home/jbax/dev/repository/univocity-parsers/replay_pid3865.log >> > # >> > # If you would like to submit a bug report, please visit: >> > # http://bugreport.java.com/bugreport/crash.jsp >> > # >> > >> > >> > The hs_err files generated are available here >> > >> > https://github.com/uniVocity/univocity-parsers/files/1326484/jdk_9_crash2.zip. >> > This zip also contains the pom.xml file I used. The build succeeded 4 >> > times >> > before the JVM crashed. >> > >> > Yesterday I had the crash happen 100% of the time, but the CPU was >> > overclocked to 3.6Ghz (never had any issue with it though) and saved the >> > error file here: >> > >> > https://github.com/uniVocity/univocity-parsers/files/1324326/jdk_9_crash.zip. >> > I created an issue on github to investigate this: >> > https://github.com/uniVocity/univocity-parsers/issues/189. There Erik >> > mentioned that: >> > >> > "Looking at the hs_err file, the stack trace is "wrong", a C2 Compiler >> > Thread can't call JVMCIGlobals::check_jvmci_flags_are_consistent (and >> > the >> > value of the register RIP does not correspond to any instruction in the >> > compiled version of that function). This makes me suspect that something >> > could be wrong with your CPU, the CPU should not have jumped to this >> > memory >> > location." >> > >> > Things still fail with stock hardware settings. More details about my >> > environment : >> > >> > OS, Maven and Java versions: >> > >> > [jbax at linux-pc ~]$ mvn -version >> > Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; >> > 2014-12-15T03:59:23+10:30) >> > Maven home: /home/jbax/dev/apache-maven >> > Java version: 9, vendor: Oracle Corporation >> > Java home: /home/jbax/dev/jdk9 >> > Default locale: en_AU, platform encoding: UTF-8 >> > OS name: "linux", version: "4.12.13-1-manjaro", arch: "amd64", family: >> > "unix" >> > [jbax at linux-pc ~]$ >> > >> > Hardware: >> > [jbax at linux-pc univocity-parsers]$ lscpu >> > Architecture: x86_64 >> > CPU op-mode(s): 32-bit, 64-bit >> > Byte Order: Little Endian >> > CPU(s): 16 >> > On-line CPU(s) list: 0-15 >> > Thread(s) per core: 2 >> > Core(s) per socket: 8 >> > Socket(s): 1 >> > NUMA node(s): 1 >> > Vendor ID: AuthenticAMD >> > CPU family: 23 >> > Model: 1 >> > Model name: AMD Ryzen 7 1700 Eight-Core Processor >> > Stepping: 1 >> > CPU MHz: 1550.000 >> > CPU max MHz: 3000.0000 >> > CPU min MHz: 1550.0000 >> > BogoMIPS: 6001.43 >> > Virtualization: AMD-V >> > L1d cache: 32K >> > L1i cache: 64K >> > L2 cache: 512K >> > L3 cache: 8192K >> > NUMA node0 CPU(s): 0-15 >> > Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr >> > pge >> > mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext >> > fxsr_opt >> > pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid >> > extd_apicid >> > aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe >> > popcnt >> > aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm >> > sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core >> > perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 >> > smep >> > bmi2 rdseed adx smap clflushopt 
sha_ni xsaveopt xsavec xgetbv1 xsaves >> > clzero >> > irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid >> > decodeassists pausefilter pfthreshold avic overflow_recov succor smca >> > >> > On an unrelated note, I use an old java application that crashes the >> > entire >> > OS for me when Java 9 is used: http://www.jinchess.com/download >> > >> > It's just a matter of downloading, unpacking and trying to start it with >> > jin-2.14.1/jin >> > >> > The OS crashes and I have to hard-reset the computer. It works just fine >> > if >> > revert back to Java 6, 7 or 8. >> > >> > I thought you'd might want to investigate what is going on. Let me know >> > if >> > you need more information. >> > >> > Best regards, >> > >> > Jeronimo. >> > >> > >> > >> > >> > -- >> > the uniVocity team >> > www.univocity.com > > > > > -- > the uniVocity team > www.univocity.com From coleen.phillimore at oracle.com Wed Oct 11 17:44:34 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 11 Oct 2017 13:44:34 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <59DE3A64.7000009@oracle.com> References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> <59DE3A64.7000009@oracle.com> Message-ID: <8f16bce9-8131-8ec6-18af-cba3d8234d71@oracle.com> On 10/11/17 11:36 AM, Erik ?sterlund wrote: > Hi Coleen, > > In classLoaderData.cpp:~167: > There is a cast to Chunk* when loading _head, but _head is already > Chunk*, so it seems like that should not need a cast. In fact, _head > should probably be declared as Chunk *volatile as it is accessed > concurrently. Yes, you are right.? I fixed it and now declare _head as Chunk* volatile (star goes on type I think). > > In parNewGeneration.cpp:~1450: > Atomic::add(-n, &_num_par_pushes); > can now use Atomic::sub instead. Fixed. > > g1PageBasedVirtualSpace.cpp:~249: > Do you really need the (char*) cast for Atomic::add? Seems like it > already is a char*, unless I missed something. > Nope.? Missed that one. > cpCache.hpp: > Noticed the casts for &_f1 (declared as volatile Metadata*) to > Metadata *volatile*. It seems to me like _f1 should instead be > declared as Metaata* volatile, and remove the casts. > Fixed.? You are right about the declaration for _f1.? It should be Metadata* volatile. > Also noticed some copyright headers have not been updated, might want > to have a look at that. > I forgot to say that I update the copyrights in my commit script. > Otherwise, I think this looks good. Thank you again for doing this! > Thank you so much for reviewing all of this and making the templates easy to use. Coleen > Thanks, > /Erik > > On 2017-10-11 15:50, coleen.phillimore at oracle.com wrote: >> >> Please review version .02 which removes use of replace_if_null, but >> not the function.? A separate RFE can be filed to discuss that. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.02/webrev >> >> Thanks, >> Coleen >> >> On 10/11/17 7:07 AM, coleen.phillimore at oracle.com wrote: >>> >>> >>> On 10/11/17 4:12 AM, Robbin Ehn wrote: >>>> On 10/11/2017 10:09 AM, David Holmes wrote: >>>>> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: >>>>> >>>>> Removing the operation is a different argument to renaming it. >>>>> Most of the above argues for removing it. :) >>>> >>>> +1 on removing >>> >>> Thank you for all your feedback.? 
Erik best described what I was >>> thinking.? I will remove it then.? There were not that many >>> instances and one instance that people thought would be useful, >>> needed the old return value. >>> >>> Coleen >>>> >>>> Thanks, Robbin >>>> >>>>> >>>>> Cheers, >>>>> David >>>>> ----- >>>>> >>>>>> I have not reviewed this completely yet - thought I'd wait with >>>>>> that until we agree about replace_if_null, if that is okay. >>>>>> >>>>>> Thanks, >>>>>> /Erik >>>>>> >>>>>> On 2017-10-11 05:55, David Holmes wrote: >>>>>>> On 11/10/2017 1:43 PM, Kim Barrett wrote: >>>>>>>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>>>>>>>> >>>>>>>>> Summary: With the new template functions these are unnecessary. >>>>>>>>> >>>>>>>>> 2. renamed Atomic::replace_if_null to >>>>>>>>> Atomic::cmpxchg_if_null.? I disliked the first name because >>>>>>>>> it's not explicit from the callers that there's an underlying >>>>>>>>> cas.? If people want to fight, I'll remove the function and >>>>>>>>> use cmpxchg because there are only a couple places where this >>>>>>>>> is a little nicer. >>>>>>>> >>>>>>>> I'm still looking at other parts, but I want to respond to this >>>>>>>> now. >>>>>>>> >>>>>>>> I object to this change.? I think the proposed new name is >>>>>>>> confusing, >>>>>>>> suggesting there are two different comparisons involved. >>>>>>>> >>>>>>>> I originally called it something else that I wasn't entirely happy >>>>>>>> with.? When David suggested replace_if_null I quickly adopted >>>>>>>> that as >>>>>>>> I think that name exactly describes what it does. In particular, I >>>>>>>> think "atomic replace if" pretty clearly suggests a test-and-set / >>>>>>>> compare-and-swap type of operation. >>>>>>> >>>>>>> I totally agree. It's an Atomic operation, the implementation >>>>>>> will involve something atomic, it doesn't matter if it is >>>>>>> cmpxchg or something else. The name replace_if_null describes >>>>>>> exactly what the function does - it doesn't have to describe how >>>>>>> it does it. >>>>>>> >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>>> Further, I think any name involving "cmpxchg" is problematic >>>>>>>> because >>>>>>>> the result of this operation is intentionally different from >>>>>>>> cmpxchg, >>>>>>>> in order to better support the primary use-case, which is lazy >>>>>>>> initialization. >>>>>>>> >>>>>>>> I also object to your alternative suggestion of removing the >>>>>>>> operation >>>>>>>> entirely and just using cmpxchg directly instead.? I don't >>>>>>>> recall how >>>>>>>> many occurrences there presently are, but I suspect more could >>>>>>>> easily >>>>>>>> be added; it's part of a lazy initialization pattern similar to >>>>>>>> DCLP >>>>>>>> but without the locks. 
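Since the volatile placement fixed above (for _head and _f1) is easy to get backwards, a two-line illustration with a made-up type:

  struct Chunk { Chunk* next; };

  volatile Chunk* pointee_is_volatile = nullptr;  // pointer to volatile Chunk: the
                                                  // Chunk object is treated as volatile,
                                                  // the pointer variable itself is not
  Chunk* volatile pointer_is_volatile = nullptr;  // volatile pointer to Chunk: the field
                                                  // itself is what is read and written
                                                  // concurrently

Read right to left, the second declaration is "a volatile pointer to Chunk", which matches the Chunk* volatile and Metadata* volatile declarations settled on in the review above.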
>>>>>>>> >>>>>> >>> >> > From bob.vandette at oracle.com Wed Oct 11 19:11:41 2017 From: bob.vandette at oracle.com (Bob Vandette) Date: Wed, 11 Oct 2017 15:11:41 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> <5f7f3d85-db48-fe6e-28f5-e3f4858f33e8@oracle.com> Message-ID: Here?s an updated webrev for this RFE that contains changes and cleanups based on feedback I?ve received so far. I?m still investigating the best approach for reacting to cpu shares and quotas. I do not believe doing nothing is the answer. http://cr.openjdk.java.net/~bobv/8146115/webrev.01 Updates: 1. I had to move the processing of AggressiveHeap since the container memory size needs to be known before this can be processed. 2. I no longer use the cpuset.cpus contents since sched_getaffinity reports the correct results even if someone manually updates the cgroup data. I originally didn?t think this was the case since sched_setaffinity didn?t automatically update the cpuset file contents but the inverse is true. 3. I ifdef?d the container function support in src/hotspot/share/runtime/os.hpp to avoid putting stubs in all other os platform directories. I can do this if it?s absolutely necessary. Bob. > On Oct 6, 2017, at 7:28 PM, David Holmes wrote: > > On 7/10/2017 1:34 AM, Bob Vandette wrote: >>> On Oct 5, 2017, at 6:12 PM, David Holmes wrote: >>> >>> Hi Bob, >>> >>> On 6/10/2017 3:57 AM, Bob Vandette wrote: >>>>> On Oct 5, 2017, at 12:43 PM, Alex Bagehot > wrote: >>>>> >>>>> Hi David, >>>>> >>>>> On Wed, Oct 4, 2017 at 10:51 PM, David Holmes > wrote: >>>>> >>>>> Hi Alex, >>>>> >>>>> Can you tell me how shares/quotas are actually implemented in >>>>> terms of allocating "cpus" to processes when shares/quotas are >>>>> being applied? >>>>> >>>>> The allocation of cpus to processes/threads(tasks as the kernel sees them) or the other way round is called balancing, which is done by Scheduling domains[3]. >>>>> >>>>> cpu shares use CFS "group" scheduling[1] to apply the share to all the tasks(threads) in the container. The container cpu shares weight maps directly to a task's weight in CFS, which given it is part of a group is divided by the number of tasks in the group (ie. a default container share of 1024 with 2 threads in the container/group would result in each thread/task having a 512 weight[4]). The same values used by nice[2] also. >>>>> >>>>> You can observe the task weight and other scheduler numbers in /proc/sched_debug [4]. You can also kernel trace scheduler activity which typically tells you the tasks involved, the cpu, the event: switch or wakeup, etc. >>>>> >>>>> For example in a 12 cpu system if I have a 50% share do I get all >>>>> 12 CPUs for 50% of a "quantum" each, or do I get 6 CPUs for a full >>>>> quantum each? >>>>> >>>>> >>>>> You get 12 cpus for 50% of the time on the average if there is another workload that has the same weight as you and is consuming as much as it can. >>>>> If there's nothing else running on the machine you get 12 cpus for 100% of the time with a cpu shares only config (ie. the burst capacity). 
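On Bob's point 2 above (trusting sched_getaffinity rather than parsing cpuset.cpus), the query itself is small. This is the plain glibc call shown standalone and Linux-specific, not the webrev's os:: wrapper:

  #ifndef _GNU_SOURCE
  #define _GNU_SOURCE
  #endif
  #include <sched.h>
  #include <cstdio>

  int main() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    // Ask the kernel which CPUs this process may currently run on. Unlike a
    // hand parse of cpuset.cpus, this also reflects later changes made via
    // sched_setaffinity()/numactl, which is the case discussed further down.
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
      perror("sched_getaffinity");
      return 1;
    }
    std::printf("runnable on %d cpus\n", CPU_COUNT(&mask));
    return 0;
  }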
>>>>> >>>>> I validated that the share was balanced over all the cpus by running linux perf events and checking that there were cpu samples on all cpus. There's bound to be other ways of doing it also. >>>>> >>>>> >>>>> When we try to use the "number of processors" to control the >>>>> number of threads created, or the number of partitions in a task, >>>>> then we really want to know how many CPUs we can actually be >>>>> concurrently running on! >>>> I?m not sure that the primary question for serverless container execution. Just because you might happen to burst and have available >>>> to you more CPU time than you specified in your shares doesn?t mean >>>> that a multi-threaded application running in one of these containers should configure itself to use all available host processors. This would result in over-burdoning the system at times of high load. >>> >>> And conversely if you restrict yourself to the "share" of processors you get over time (ie 6 instead of 12) then you can severely impact the performance (response time in particular) of the VM and the application running on the VM. >> So if someone configures an 88 way system to use 1/88 share, you don?t think they expect a highly threaded >> application to run slower than if they didn?t restrict the shares?? The whole idea about shares is to SHARE the >> system. Yes, you?d have better performance when the system is idle and only running a single application but that?s >> not what these container frameworks are trying to accomplish. They want to get the best performance when running many >> many processes. That?s what I?m optimizing for. > > In what I described you are SHARING the system. You're also getting the most benefit from a lightly loaded system. > > To me the conceptual model for a 1/88 share of an 88-way system is that you get 88 processors that appear to run at 1/88 the speed of the physical ones. Not that you get 1 real full speed processor. > >>> >>> But I don't see how this can overburden the system. If you app is running alone you get to use all 12 cpus for 100% of the time and life is good. If another app starts up then your 100% drops proportionately. If you schedule 12 apps all with a 1/12 share then everyone gets up to 12 cpus for 1/12 of the time. It's only if you try to schedule a set of apps with a utilization total greater than 1 does the system become overloaded. >> In my above example, If we run the VM ergonomics based on 88 CPUs, then we are wasting a lot of memory on thread stacks and when >> many of these processes are running, the system will context switch a lot more than it would if we restricted the creation of threads to >> the share amount. > > Context switching is a function of threads and time. My way uses more threads and less time (per unit of work); yours uses less threads and more time. Seems like zero sum to me. > > Memory use is a different matter, but only because you can restrict memory independently of cpus. So you will need to ensure your memory quotas can accommodate the number of threads you expect to run - regardless. > > David > ----- > >> Bob. >>> >>>> The Java runtime, at startup, configures several subsystems to use a number of threads for each system based on the number of available >>>> processors. These subsystems include things like the number of GC >>>> threads, JIT compiler and thread pools. 
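As one concrete example of what hangs off this number: the default GC worker count. The constants below are quoted from memory as an approximation of HotSpot's long-standing ergonomic default, so treat them as illustrative rather than authoritative.

  // Approximate shape of the default ParallelGCThreads ergonomics
  // (constants from memory -- illustrative only).
  unsigned parallel_gc_threads(unsigned active_processors) {
    if (active_processors <= 8) return active_processors;
    return 8 + (active_processors - 8) * 5 / 8;
  }
  // An 88-way host seen as-is: 58 GC workers. The same host seen through a
  // 1-CPU share: 1 worker -- which is why the number chosen here matters.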
>>> >>>> The problem I am trying to solve is to come up with a single number >>>> of CPUs based on container knowledge that can be used for the Java >>>> runtime subsystem to configure itself. I believe that we should >>>> trust the implementor of the Mesos or Kubernetes setup and honor their wishes when coming up with this number and not just use the >>>> processor affinity or number of cpus in the cpuset. >>> >>> I don't agree, as has been discussed before. It's perfectly fine, even desirable, in my opinion to have 12 threads executing concurrently for 50% of the time, rather than only 6 threads for 100% (assuming the scheduling technology is even clever enough to realize it can grant your threads 100%). >>> >>> Over time the amount of work your app can execute is the same, but the time taken for an individual subtask can vary. If you are just doing one-shot batch processing then it makes no difference. If you're running an app that itself services incoming requests then the response time to individual requests can be impacted. To take the worst-case scenario, imagine you get 12 concurrent requests that would each take 1/12 of your cpu quota. With 12 threads on 12 cpus you can service all 12 requests with a response time of 1/12 time units. But with 6 threads on 6 cpus you can only service 6 requests with a 1/12 response time, and the other 6 will have a 1/6 response time. >>> >>>> The challenge is determining the right algorithm that doesn?t penalize the VM. >>> >>> Agreed. But I think the current algorithm may penalize the VM, and more importantly the application it is running. >>> >>>> My current implementation does this: >>>> total available logical processors = min (cpusets,sched_getaffinity,shares/1024, quota/period) >>>> All fractional units are rounded up to the next whole number. >>> >>> My point has always been that I just don't think producing a single number from all these factors is the right/best way to deal with this. I think we really want to be able to answer the question "how many processors can I concurrently execute on" distinct from the question of "how much of a time slice will I get on each of those processors". To me "how many" is the question that "availableProcessors" should be answering - and only that question. How much "share" do I get is a different question, and perhaps one that the VM and the application need to be able to ask. >>> >>> BTW sched_getaffinity should already account for cpusets ?? >>> >>> Cheers, >>> David >>> >>>> Bob. >>>>> >>>>> Makes sense to check. Hopefully there aren't any major errors or omissions in the above. 
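Applied literally, the formula above reads roughly like the sketch below. The cgroup v1 paths are assumptions about a typical Docker host rather than what the webrev probes, a production version would still have to decide how to treat the default cpu.shares value of 1024, and the affinity count parameter stands in for the sched_getaffinity result shown earlier.

  #include <algorithm>
  #include <cmath>
  #include <cstdio>
  #include <fstream>

  static long read_long(const char* path, long fallback) {
    std::ifstream in(path);
    long v;
    return (in >> v) ? v : fallback;
  }

  // min(affinity, shares/1024, quota/period), fractions rounded up.
  int container_cpus(int affinity_count) {
    long shares = read_long("/sys/fs/cgroup/cpu/cpu.shares", -1);
    long quota  = read_long("/sys/fs/cgroup/cpu/cpu.cfs_quota_us", -1);
    long period = read_long("/sys/fs/cgroup/cpu/cpu.cfs_period_us", -1);

    int limit = affinity_count;                 // cpusets / process affinity
    if (shares > 0) {                           // 1024 == one CPU by convention
      limit = std::min(limit, (int)std::ceil(shares / 1024.0));
    }
    if (quota > 0 && period > 0) {              // e.g. docker --cpus=1.5
      limit = std::min(limit, (int)std::ceil((double)quota / period));
    }
    return std::max(limit, 1);
  }

  int main() {
    std::printf("%d\n", container_cpus(8));     // 8 stands in for CPU_COUNT(&mask)
  }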
>>>>> Thanks, >>>>> Alex >>>>> >>>>> [1] https://lwn.net/Articles/240474/ >>>>> [2] https://github.com/torvalds/linux/blob/368f89984bb971b9f8b69eeb85ab19a89f985809/kernel/sched/core.c#L6735 >>>>> [3] https://lwn.net/Articles/80911/ / http://www.i3s.unice.fr/~jplozi/wastedcores/files/extended_talk.pdf >>>>> >>>>> [4] >>>>> >>>>> cfs_rq[13]:/system.slice/docker-f5681788d6daab249c90810fe60da429a2565b901ff34245922a578635b5d607.scope >>>>> >>>>> .exec_clock: 0.000000 >>>>> >>>>> .MIN_vruntime: 0.000001 >>>>> >>>>> .min_vruntime: 8090.087297 >>>>> >>>>> .max_vruntime: 0.000001 >>>>> >>>>> .spread: 0.000000 >>>>> >>>>> .spread0 : -124692718.052832 >>>>> >>>>> .nr_spread_over: 0 >>>>> >>>>> .nr_running: 1 >>>>> >>>>> .load: 1024 >>>>> >>>>> .runnable_load_avg : 1023 >>>>> >>>>> .blocked_load_avg: 0 >>>>> >>>>> .tg_load_avg : 2046 >>>>> >>>>> .tg_load_contrib : 1023 >>>>> >>>>> .tg_runnable_contrib : 1023 >>>>> >>>>> .tg->runnable_avg: 2036 >>>>> >>>>> .tg->cfs_bandwidth.timer_active: 0 >>>>> >>>>> .throttled : 0 >>>>> >>>>> .throttle_count: 0 >>>>> >>>>> .se->exec_start: 236081964.515645 >>>>> >>>>> .se->vruntime: 24403993.326934 >>>>> >>>>> .se->sum_exec_runtime: 8091.135873 >>>>> >>>>> .se->load.weight : 512 >>>>> >>>>> .se->avg.runnable_avg_sum: 45979 >>>>> >>>>> .se->avg.runnable_avg_period : 45979 >>>>> >>>>> .se->avg.load_avg_contrib: 511 >>>>> >>>>> .se->avg.decay_count : 0 >>>>> >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> >>>>> On 5/10/2017 6:01 AM, Alex Bagehot wrote: >>>>> >>>>> Hi, >>>>> >>>>> On Wed, Oct 4, 2017 at 7:51 PM, Bob Vandette >>>>> > >>>>> wrote: >>>>> >>>>> >>>>> On Oct 4, 2017, at 2:30 PM, Robbin Ehn >>>>> > >>>>> wrote: >>>>> >>>>> Thanks Bob for looking into this. >>>>> >>>>> On 10/04/2017 08:14 PM, Bob Vandette wrote: >>>>> >>>>> Robbin, >>>>> I?ve looked into this issue and you are correct. I do have to examine >>>>> >>>>> both the >>>>> >>>>> sched_getaffinity results as well as the cgroup >>>>> cpu subsystem >>>>> >>>>> configuration >>>>> >>>>> files in order to provide a reasonable value for >>>>> active_processors. If >>>>> >>>>> I was only >>>>> >>>>> interested in cpusets, I could simply rely on the >>>>> getaffinity call but >>>>> >>>>> I also want to >>>>> >>>>> factor in shares and quotas as well. >>>>> >>>>> >>>>> We had a quick discussion at the office, we actually >>>>> do think that you >>>>> >>>>> could skip reading the shares and quotas. >>>>> >>>>> It really depends on what the user expect, if he give >>>>> us 4 cpu's with >>>>> >>>>> 50% or 2 full cpu what do he expect the differences would be? >>>>> >>>>> One could argue that he 'knows' that he will only use >>>>> max 50% and thus >>>>> >>>>> we can act as if he is giving us 4 full cpu. >>>>> >>>>> But I'll leave that up to you, just a tough we had. >>>>> >>>>> >>>>> It?s my opinion that we should do something if someone >>>>> makes the effort to >>>>> configure their >>>>> containers to use quotas or shares. There are many >>>>> different opinions on >>>>> what the right that >>>>> right ?something? is. >>>>> >>>>> >>>>> It might be interesting to look at some real instances of how >>>>> java might[3] >>>>> be deployed in containers. >>>>> Marathon/Mesos[1] and Kubernetes[2] use shares and quotas so >>>>> this is a vast >>>>> chunk of deployments that need both of them today. >>>>> >>>>> >>>>> >>>>> Many developers that are trying to deploy apps that use >>>>> containers say >>>>> they don?t like >>>>> cpusets. 
This is too limiting for them especially when >>>>> the server >>>>> configurations vary >>>>> within their organization. >>>>> >>>>> >>>>> True, however Kubernetes has an alpha feature[5] where it >>>>> allocates cpusets >>>>> to containers that request a whole number of cpus. Previously >>>>> without >>>>> cpusets any container could run on any cpu which we know might >>>>> not be good >>>>> for some workloads that want isolation. A request for a >>>>> fractional or >>>>> burstable amount of cpu would be allocated from a shared cpu >>>>> pool. So >>>>> although manual allocation of cpusets will be flakey[3] , >>>>> automation should >>>>> be able to make it work. >>>>> >>>>> >>>>> >>>>> From everything I?ve read including source code, there >>>>> seems to be a >>>>> consensus that >>>>> shares and quotas are being used as a way to specify a >>>>> fraction of a >>>>> system (number of cpus). >>>>> >>>>> >>>>> A refinement[6] on this is: >>>>> Shares can be used for guaranteed cpu - you will always get >>>>> your share. >>>>> Quota[4] is a limit/constraint - you can never get more than >>>>> the quota. >>>>> So given the below limit of how many shares will be allocated >>>>> on a host you >>>>> can have burstable(or overcommit) capacity if your shares are >>>>> less than >>>>> your quota. >>>>> >>>>> >>>>> >>>>> Docker added ?cpus which is implemented using quotas and >>>>> periods. They >>>>> adjust these >>>>> two parameters to provide a way of calculating the number >>>>> of cpus that >>>>> will be available >>>>> to a process (quota/period). Amazon also documents that >>>>> cpu shares are >>>>> defined to be a multiple of 1024. >>>>> Where 1024 represents a single cpu and a share value of >>>>> N*1024 represents >>>>> N cpus. >>>>> >>>>> >>>>> Kubernetes and Mesos/Marathon also use the N*1024 shares per >>>>> host to >>>>> allocate resources automatically. >>>>> >>>>> Hopefully this provides some background on what a couple of >>>>> orchestration >>>>> systems that will be running java are doing currently in this >>>>> area. >>>>> Thanks, >>>>> Alex >>>>> >>>>> >>>>> [1] https://github.com/apache/mesos/commit/346cc8dd528a28a6e >>>>> >>>>> 1f1cbdb4c95b8bdea2f6070 / (now out of date but appears to be a >>>>> reasonable >>>>> intro : >>>>> https://zcox.wordpress.com/2014/09/17/cpu-resources-in-docke >>>>> >>>>> r-mesos-and-marathon/ ) >>>>> [1a] https://youtu.be/hJyAfC-Z2xk?t=2439 >>>>> >>>>> >>>>> [2] https://kubernetes.io/docs/concepts/configuration/manage >>>>> >>>>> -compute-resources-container/ >>>>> >>>>> [3] https://youtu.be/w1rZOY5gbvk?t=2479 >>>>> >>>>> >>>>> [4] >>>>> https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt >>>>> >>>>> https://landley.net/kdocs/ols/2010/ols2010-pages-245-254.pdf >>>>> >>>>> https://lwn.net/Articles/428175/ >>>>> >>>>> >>>>> [5] >>>>> https://github.com/kubernetes/community/blob/43ce57ac476b9f2ce3f0220354a075e095a0d469/contributors/design-proposals/node/cpu-manager.md >>>>> >>>>> / https://github.com/kubernetes/kubernetes/commit/ >>>>> >>>>> 00f0e0f6504ad8dd85fcbbd6294cd7cf2475fc72 / >>>>> https://vimeo.com/226858314 >>>>> >>>>> >>>>> [6] https://kubernetes.io/docs/concepts/configuration/manage- >>>>> >>>>> compute-resources-container/#how-pods-with-resource-limits-are-run >>>>> >>>>> >>>>> Of course these are just conventions. This is why I >>>>> provided a way of >>>>> specifying the >>>>> number of CPUs so folks deploying Java services can be >>>>> certain they get >>>>> what they want. >>>>> >>>>> Bob. 
>>>>> >>>>> >>>>> I had assumed that when sched_setaffinity was >>>>> called (in your case by >>>>> >>>>> numactl) that the >>>>> >>>>> cgroup cpu config files would be updated to >>>>> reflect the current >>>>> >>>>> processor affinity for the >>>>> >>>>> running process. This is not correct. I have >>>>> updated my changeset and >>>>> >>>>> have successfully >>>>> >>>>> run with your examples below. I?ll post a new >>>>> webrev soon. >>>>> >>>>> >>>>> I see, thanks again! >>>>> >>>>> /Robbin >>>>> >>>>> Thanks, >>>>> Bob. >>>>> >>>>> >>>>> I still want to include the flag for at >>>>> least one Java release in the >>>>> >>>>> event that the new behavior causes some regression >>>>> >>>>> in behavior. I?m trying to make the >>>>> detection robust so that it will >>>>> >>>>> fallback to the current behavior in the event >>>>> >>>>> that cgroups is not configured as expected >>>>> but I?d like to have a way >>>>> >>>>> of forcing the issue. JDK 10 is not >>>>> >>>>> supposed to be a long term support release >>>>> which makes it a good >>>>> >>>>> target for this new behavior. >>>>> >>>>> I agree with David that once we commit to >>>>> cgroups, we should extract >>>>> >>>>> all VM configuration data from that >>>>> >>>>> source. There?s more information >>>>> available for cpusets than just >>>>> >>>>> processor affinity that we might want to >>>>> >>>>> consider when calculating the number of >>>>> processors to assume for the >>>>> >>>>> VM. There?s exclusivity and >>>>> >>>>> effective cpu data available in addition >>>>> to the cpuset string. >>>>> >>>>> >>>>> cgroup only contains limits, not the real hard >>>>> limits. >>>>> You most consider the affinity mask. We that >>>>> have numa nodes do: >>>>> >>>>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 >>>>> --membind=1 java >>>>> >>>>> -Xlog:os=debug -cp . ForEver | grep proc >>>>> >>>>> [0.001s][debug][os] Initial active processor >>>>> count set to 16 >>>>> [rehn at rehn-ws dev]$ numactl --cpunodebind=1 >>>>> --membind=1 java >>>>> >>>>> -Xlog:os=debug -XX:+UseContainerSupport -cp . ForEver | >>>>> grep proc >>>>> >>>>> [0.001s][debug][os] Initial active processor >>>>> count set to 32 >>>>> >>>>> when benchmarking all the time and that must >>>>> be set to 16 otherwise >>>>> >>>>> the flag is really bad for us. >>>>> >>>>> So the flag actually breaks the little numa >>>>> support we have now. >>>>> >>>>> Thanks, Robbin >>>>> >>>>> >>>>> >>>>> From goetz.lindenmaier at sap.com Wed Oct 11 20:06:45 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 11 Oct 2017 20:06:45 +0000 Subject: RFR(M): 8189102: All tools should support -?, -h and --help Message-ID: Hi The tools in jdk should all show the same behavior wrt. help flags. This change normalizes the help flags of a row of the tools in the jdk. Java accepts -?, -h and --help, thus I changed the tools to support these, too. Some tools exited with '1' after displaying the help message, I turned this to '0'. Maybe this is not the right mailing list for this, please advise. Please review this change. I please need a sponsor. http://cr.openjdk.java.net/~goetz/wr17/8189102-helpMessage/webrev.01/ In detail, this fixes the help message of the following tools: jar -? -h --help; added -?. jarsigner -? -h --help; added --help. -help accepted but not documented. javac -? --help; added -?. Removed -help. -h is taken for other purpose javadoc -? -h --help; added -h -?. Removed -help javap -? -h --help; added -h. -help accepted but no more documented. jcmd -? -h --help; added -? 
--help. -help accepted but no more documented. Changed return value to '0' jdb -? -h --help; added -? -h --help. -help accepted but no more documented. jdeprscan -? -h --help; added -? jinfo -? -h --help; added -? --help. -help accepted but no more documented. jjs -h --help; Replaced -help by --help. Adding more not straight forward. jps -? -h --help; added -? --help. -help accepted but no more documented. jshell -? -h --help; added -? jstat -? -h --help; added -h --help. -help accepted but no more documented. Best regards, Goetz. From joe.darcy at oracle.com Wed Oct 11 20:10:31 2017 From: joe.darcy at oracle.com (joe darcy) Date: Wed, 11 Oct 2017 13:10:31 -0700 Subject: RFR(M): 8189102: All tools should support -?, -h and --help In-Reply-To: References: Message-ID: <895e0f83-3b7f-f691-53d6-67a3d6257aa3@oracle.com> Hi Goetz, Note that a change like this require a CSR request for the command line updates and return code modification. The review should also occur on aliases where the various tools are discussed, for example, javac is discussed on compiler-dev and several other tools are discussed on core-libs-dev. Thanks, -Joe On 10/11/2017 1:06 PM, Lindenmaier, Goetz wrote: > Hi > > The tools in jdk should all show the same behavior wrt. help flags. > This change normalizes the help flags of a row of the tools in the jdk. > Java accepts -?, -h and --help, thus I changed the tools to support > these, too. Some tools exited with '1' after displaying the help message, > I turned this to '0'. > > Maybe this is not the right mailing list for this, please advise. > > Please review this change. I please need a sponsor. > http://cr.openjdk.java.net/~goetz/wr17/8189102-helpMessage/webrev.01/ > > In detail, this fixes the help message of the following tools: > jar -? -h --help; added -?. > jarsigner -? -h --help; added --help. -help accepted but not documented. > javac -? --help; added -?. Removed -help. -h is taken for other purpose > javadoc -? -h --help; added -h -?. Removed -help > javap -? -h --help; added -h. -help accepted but no more documented. > jcmd -? -h --help; added -? --help. -help accepted but no more documented. Changed return value to '0' > jdb -? -h --help; added -? -h --help. -help accepted but no more documented. > jdeprscan -? -h --help; added -? > jinfo -? -h --help; added -? --help. -help accepted but no more documented. > jjs -h --help; Replaced -help by --help. Adding more not straight forward. > jps -? -h --help; added -? --help. -help accepted but no more documented. > jshell -? -h --help; added -? > jstat -? -h --help; added -h --help. -help accepted but no more documented. > > Best regards, > Goetz. From david.holmes at oracle.com Wed Oct 11 21:38:40 2017 From: david.holmes at oracle.com (David Holmes) Date: Thu, 12 Oct 2017 07:38:40 +1000 Subject: RFR(M): 8189102: All tools should support -?, -h and --help In-Reply-To: <895e0f83-3b7f-f691-53d6-67a3d6257aa3@oracle.com> References: <895e0f83-3b7f-f691-53d6-67a3d6257aa3@oracle.com> Message-ID: On 12/10/2017 6:10 AM, joe darcy wrote: > Hi Goetz, > > Note that a change like this require a CSR request for the command line > updates and return code modification. The review should also occur on > aliases where the various tools are discussed, for example, javac is > discussed on compiler-dev and several other tools are discussed on > core-libs-dev. And none of the tools/launchers fall under hotspot directly. Some may be serviceability ... 
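Reduced to its essentials, the behavior the RFR normalizes is just this (a generic C++ sketch for brevity -- the real tools are Java and this is not their code): treat -?, -h and --help as an explicit request for help and exit with status 0 rather than 1.

  #include <cstring>
  #include <cstdio>

  int main(int argc, char** argv) {
    for (int i = 1; i < argc; i++) {
      // All three spellings are an explicit request for help, so printing
      // the usage text is a success, not an error.
      if (std::strcmp(argv[i], "-?") == 0 ||
          std::strcmp(argv[i], "-h") == 0 ||
          std::strcmp(argv[i], "--help") == 0) {
        std::puts("usage: tool [options] ...");
        return 0;
      }
    }
    // ... normal tool processing ...
    return 0;
  }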
David > Thanks, > > -Joe > > > On 10/11/2017 1:06 PM, Lindenmaier, Goetz wrote: >> Hi >> >> The tools in jdk should all show the same behavior wrt. help flags. >> This change normalizes the help flags of a row of the tools in the jdk. >> Java accepts -?, -h and --help, thus I changed the tools to support >> these, too.? Some tools exited with '1' after displaying the help >> message, >> I turned this to '0'. >> >> Maybe this is not the right mailing list for this, please advise. >> >> Please review this change. I please need a sponsor. >> http://cr.openjdk.java.net/~goetz/wr17/8189102-helpMessage/webrev.01/ >> >> In detail, this fixes the help message of the following tools: >> jar????????? -? -h --help;? added -?. >> jarsigner??? -? -h --help;? added --help. -help accepted but not >> documented. >> javac??????? -???? --help;? added -?. Removed -help. -h is taken for >> other purpose >> javadoc????? -? -h --help;? added -h -?. Removed -help >> javap??????? -? -h --help;? added -h. -help accepted but no more >> documented. >> jcmd???????? -? -h --help;? added -? --help. -help accepted but no >> more documented. Changed return value to '0' >> jdb????????? -? -h --help;? added -? -h --help. -help accepted but no >> more documented. >> jdeprscan??? -? -h --help;? added -? >> jinfo??????? -? -h --help;? added -? --help. -help accepted but no >> more documented. >> jjs???????????? -h --help;? Replaced -help by --help. Adding more not >> straight forward. >> jps????????? -? -h --help;? added -? --help. -help accepted but no >> more documented. >> jshell?????? -? -h --help;? added -? >> jstat??????? -? -h --help;? added -h --help. -help accepted but no >> more documented. >> >> Best regards, >> ?? Goetz. > From vladimir.kozlov at oracle.com Wed Oct 11 23:03:36 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 11 Oct 2017 16:03:36 -0700 Subject: RFR(M): 8189102: All tools should support -?, -h and --help In-Reply-To: References: Message-ID: You missed AOT tool jaotc: http://hg.openjdk.java.net/jdk10/hs/file/44117bc2bedf/src/jdk.aot/share/classes/jdk.tools.jaotc/src/jdk/tools/jaotc/Options.java#l230 }, new Option(" --help Print this usage message", false, "--help", "-h", "-?") { Vladimir On 10/11/17 1:06 PM, Lindenmaier, Goetz wrote: > Hi > > The tools in jdk should all show the same behavior wrt. help flags. > This change normalizes the help flags of a row of the tools in the jdk. > Java accepts -?, -h and --help, thus I changed the tools to support > these, too. Some tools exited with '1' after displaying the help message, > I turned this to '0'. > > Maybe this is not the right mailing list for this, please advise. > > Please review this change. I please need a sponsor. > http://cr.openjdk.java.net/~goetz/wr17/8189102-helpMessage/webrev.01/ > > In detail, this fixes the help message of the following tools: > jar -? -h --help; added -?. > jarsigner -? -h --help; added --help. -help accepted but not documented. > javac -? --help; added -?. Removed -help. -h is taken for other purpose > javadoc -? -h --help; added -h -?. Removed -help > javap -? -h --help; added -h. -help accepted but no more documented. > jcmd -? -h --help; added -? --help. -help accepted but no more documented. Changed return value to '0' > jdb -? -h --help; added -? -h --help. -help accepted but no more documented. > jdeprscan -? -h --help; added -? > jinfo -? -h --help; added -? --help. -help accepted but no more documented. > jjs -h --help; Replaced -help by --help. Adding more not straight forward. > jps -? 
-h --help; added -? --help. -help accepted but no more documented. > jshell -? -h --help; added -? > jstat -? -h --help; added -h --help. -help accepted but no more documented. > > Best regards, > Goetz. > From david.holmes at oracle.com Thu Oct 12 01:04:50 2017 From: david.holmes at oracle.com (David Holmes) Date: Thu, 12 Oct 2017 11:04:50 +1000 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> <5f7f3d85-db48-fe6e-28f5-e3f4858f33e8@oracle.com> Message-ID: <799205ae-ba9f-ce3a-8dd6-1a55e32689df@oracle.com> Hi Bob, On 12/10/2017 5:11 AM, Bob Vandette wrote: > Here?s an updated webrev for this RFE that contains changes and cleanups > based on feedback I?ve received so far. > > I?m still investigating the best approach for reacting to cpu shares and > quotas. ?I do not believe doing nothing is the answer. I do. :) Let me try this again. When you run outside of a container you don't get 100% of the CPUs - you have to share with whatever else is running on the system. You get a fraction of CPU time based on the load. We don't try to communicate load information to the VM/application so it can adapt. Within a container setting shares/quotas is just a way of setting an artificial load. So why should we be treating it any differently? That's not to say an API to provide load/shares/quota information may not be useful, but that is a separate issue to what the "active processor count" should report. > > http://cr.openjdk.java.net/~bobv/8146115/webrev.01 > > Updates: > > 1. I had to move the processing of AggressiveHeap since the container > memory size needs to be known before this can be processed. I don't like the placement of this - we don't call os:: init functions from inside Arguments - we manage the initialization sequence from Threads::create_vm. Seems to me that container initialization can/should happen in os::init_before_ergo, and the AggressiveHeap processing can occur at the start of Arguments::apply_ergo(). That said we need to be sure nothing touched by set_aggressive_heap_flags will be used before we now reach that code - there are a lot of flags being set in there. > > 2. I no longer use the cpuset.cpus contents since sched_getaffinity > reports the correct results > even if someone manually updates the cgroup data. ?I originally didn?t > think this was the case since > sched_setaffinity didn?t automatically update the cpuset file contents > but the inverse is true. Ok. > > 3. I ifdef?d the container function support in > src/hotspot/share/runtime/os.hpp to avoid putting stubs in all other os > platform directories. ?I can do this if it?s absolutely necessary. You should not need to do this if initialization moves as I suggested above. os::init_before_ergo() in os_linux.cpp can call OSContainer::init(). No need for os::initialize_container_support() or os::pd_initialize_container_support. Some further comments: src/hotspot/share/runtime/globals.hpp + "Optimize heap optnios Typo. + product(intx, ActiveProcessorCount, -1, Why intx? 
It can be int then the logging log_trace(os)("active_processor_count: " "active processor count set by user : %d", (int)ActiveProcessorCount); can use %d without casts. Or you can init to 0 and make it uint (and use %u). + product(bool, UseContainerSupport, true, \ + "(Linux Only) Sorry don't recall if we already covered this, but this should be in ./os/linux/globals_linux.hpp --- src/hotspot/os/linux/os_linux.cpp/.hpp 187 log_trace(os)("available container memory: " JULONG_FORMAT, avail_mem); 188 return avail_mem; 189 } else { 190 log_debug(os,container)("container memory usage call failed: " JLONG_FORMAT, mem_usage); Why "trace" (the third logging level) to show the information, but "debug" (the second level) to show failed calls? You use debug in other files for basic info. Overall I'm unclear on your use of debug versus trace for the logging. --- src/hotspot/os/linux/osContainer_linux.cpp Dead code: 376 #if 0 377 os::Linux::print_container_info(tty); ... 390 #endif Thanks, David > Bob. From david.holmes at oracle.com Thu Oct 12 07:23:24 2017 From: david.holmes at oracle.com (David Holmes) Date: Thu, 12 Oct 2017 17:23:24 +1000 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> Message-ID: <4fb119b8-cb0b-474c-ebbc-60841ef4aa46@oracle.com> Hi Coleen, Thanks for doing this tedious cleanup! It was good to see so many casts disappear; and sad to see so many have to now appear in the sync code. :( There were a few things that struck me ... Atomic::xchg_ptr turned into Atomic::xchg; yet for the stub generator routines atomic_xchg_ptr became atomic_xchg_long - but I can't see where that stub will now come into play? --- src/hotspot/share/gc/shared/taskqueue.inline.hpp + return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, + (volatile intptr_t *)&_data, + (intptr_t)old_age._data); The actual types here should be size_t, can we now change it to use the real type? --- src/hotspot/share/oops/cpCache.cpp 114 bool ConstantPoolCacheEntry::init_flags_atomic(intptr_t flags) { 115 intptr_t result = Atomic::cmpxchg(flags, &_flags, (intptr_t)0); 116 return (result == 0); 117 } _flags is actually intx, yet above we treat it as intptr_t. But then later: 156 if (_flags == 0) { 157 intx newflags = (value & parameter_size_mask); 158 Atomic::cmpxchg(newflags, &_flags, (intx)0); 159 } its intx again. This looks really odd to me. --- src/hotspot/share/runtime/objectMonitor.inline.hpp The addition of header_addr() made me a little nervous :) Can we add a sanity assert either inside it (or in synchronizer.cpp), to verify that this == &_header (or monitor == monitor->header_addr()) --- src/hotspot/share/runtime/synchronizer.cpp // global list of blocks of monitors -// gBlockList is really PaddedEnd *, but we don't -// want to expose the PaddedEnd template more than necessary. -ObjectMonitor * volatile ObjectSynchronizer::gBlockList = NULL; +PaddedEnd * volatile ObjectSynchronizer::gBlockList = NULL; Did this have to change? I'm not sure why we didn't want to expose PaddedEnd, but it is now being exposed. Thanks, David ----- On 11/10/2017 11:50 PM, coleen.phillimore at oracle.com wrote: > > Please review version .02 which removes use of replace_if_null, but not > the function.? A separate RFE can be filed to discuss that. 
> > open webrev at http://cr.openjdk.java.net/~coleenp/8188220.02/webrev > > Thanks, > Coleen > > On 10/11/17 7:07 AM, coleen.phillimore at oracle.com wrote: >> >> >> On 10/11/17 4:12 AM, Robbin Ehn wrote: >>> On 10/11/2017 10:09 AM, David Holmes wrote: >>>> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: >>>> >>>> Removing the operation is a different argument to renaming it. Most >>>> of the above argues for removing it. :) >>> >>> +1 on removing >> >> Thank you for all your feedback.? Erik best described what I was >> thinking.? I will remove it then.? There were not that many instances >> and one instance that people thought would be useful, needed the old >> return value. >> >> Coleen >>> >>> Thanks, Robbin >>> >>>> >>>> Cheers, >>>> David >>>> ----- >>>> >>>>> I have not reviewed this completely yet - thought I'd wait with >>>>> that until we agree about replace_if_null, if that is okay. >>>>> >>>>> Thanks, >>>>> /Erik >>>>> >>>>> On 2017-10-11 05:55, David Holmes wrote: >>>>>> On 11/10/2017 1:43 PM, Kim Barrett wrote: >>>>>>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>>>>>>> >>>>>>>> Summary: With the new template functions these are unnecessary. >>>>>>>> >>>>>>>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null. I >>>>>>>> disliked the first name because it's not explicit from the >>>>>>>> callers that there's an underlying cas.? If people want to >>>>>>>> fight, I'll remove the function and use cmpxchg because there >>>>>>>> are only a couple places where this is a little nicer. >>>>>>> >>>>>>> I'm still looking at other parts, but I want to respond to this now. >>>>>>> >>>>>>> I object to this change.? I think the proposed new name is >>>>>>> confusing, >>>>>>> suggesting there are two different comparisons involved. >>>>>>> >>>>>>> I originally called it something else that I wasn't entirely happy >>>>>>> with.? When David suggested replace_if_null I quickly adopted >>>>>>> that as >>>>>>> I think that name exactly describes what it does.? In particular, I >>>>>>> think "atomic replace if" pretty clearly suggests a test-and-set / >>>>>>> compare-and-swap type of operation. >>>>>> >>>>>> I totally agree. It's an Atomic operation, the implementation will >>>>>> involve something atomic, it doesn't matter if it is cmpxchg or >>>>>> something else. The name replace_if_null describes exactly what >>>>>> the function does - it doesn't have to describe how it does it. >>>>>> >>>>>> David >>>>>> ----- >>>>>> >>>>>>> Further, I think any name involving "cmpxchg" is problematic because >>>>>>> the result of this operation is intentionally different from >>>>>>> cmpxchg, >>>>>>> in order to better support the primary use-case, which is lazy >>>>>>> initialization. >>>>>>> >>>>>>> I also object to your alternative suggestion of removing the >>>>>>> operation >>>>>>> entirely and just using cmpxchg directly instead.? I don't recall >>>>>>> how >>>>>>> many occurrences there presently are, but I suspect more could >>>>>>> easily >>>>>>> be added; it's part of a lazy initialization pattern similar to DCLP >>>>>>> but without the locks. 
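(For illustration only: a minimal standalone sketch of the lazy-initialization pattern referred to above, i.e. "DCLP but without the locks". It is written against std::atomic rather than HotSpot's Atomic class, so every name below is a placeholder and not the API under review; the point it tries to show is that a caller of such a helper mostly cares whether its install won the race, not about the raw cmpxchg return value.)

// Lazy initialization via "install only if still null".
#include <atomic>

struct Cache { /* expensive-to-build state */ };

static std::atomic<Cache*> g_cache{nullptr};

Cache* get_cache() {
  Cache* c = g_cache.load(std::memory_order_acquire);
  if (c != nullptr) {
    return c;                        // already published by someone
  }
  Cache* fresh = new Cache();        // build a candidate
  Cache* expected = nullptr;
  // "replace if null": publish our candidate only if nobody beat us to it.
  if (g_cache.compare_exchange_strong(expected, fresh,
                                      std::memory_order_release,
                                      std::memory_order_acquire)) {
    return fresh;                    // we won the race
  }
  delete fresh;                      // another thread published first
  return expected;                   // use the winner's object
}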
>>>>>>> >>>>> >> > From kim.barrett at oracle.com Thu Oct 12 07:29:20 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 12 Oct 2017 03:29:20 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> Message-ID: > On Oct 11, 2017, at 7:07 AM, coleen.phillimore at oracle.com wrote: > > > > On 10/11/17 4:12 AM, Robbin Ehn wrote: >> On 10/11/2017 10:09 AM, David Holmes wrote: >>> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: >>> >>> Removing the operation is a different argument to renaming it. Most of the above argues for removing it. :) >> >> +1 on removing > > Thank you for all your feedback. Erik best described what I was thinking. I will remove it then. There were not that many instances and one instance that people thought would be useful, needed the old return value. I?ve already registered my objection to removal. I disagree with several of Erik?s points, which don?t address or miss the issues brought up in the original discussion that led to its introduction, as quoted by David. I?m still slogging my way through the review, maybe about 3/4 of the way through. I?ve found a number of real problems, some pre-existing and discovered by looking at the code around your changes; I think there are a couple of ABA bugs, for example. I?m worried that I?m missing some too, because I?m getting burned out from reading reams of lock-free code. This is *really* hard, and I very much wish it had been broken up into more easily digestible chunks. From coleen.phillimore at oracle.com Thu Oct 12 11:29:02 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 12 Oct 2017 07:29:02 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> Message-ID: <4b56a92a-474e-1aa8-f217-413e4f642b6d@oracle.com> On 10/12/17 3:29 AM, Kim Barrett wrote: >> On Oct 11, 2017, at 7:07 AM, coleen.phillimore at oracle.com wrote: >> >> >> >> On 10/11/17 4:12 AM, Robbin Ehn wrote: >>> On 10/11/2017 10:09 AM, David Holmes wrote: >>>> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: >>>> >>>> Removing the operation is a different argument to renaming it. Most of the above argues for removing it. :) >>> +1 on removing >> Thank you for all your feedback. Erik best described what I was thinking. I will remove it then. There were not that many instances and one instance that people thought would be useful, needed the old return value. > I?ve already registered my objection to removal. I disagree with several of Erik?s points, which don?t > address or miss the issues brought up in the original discussion that led to its introduction, as quoted > by David. You can file an RFE for it. > > I?m still slogging my way through the review, maybe about 3/4 of the way through. > > I?ve found a number of real problems, some pre-existing and discovered by looking at the code > around your changes; I think there are a couple of ABA bugs, for example. I?m worried that I?m > missing some too, because I?m getting burned out from reading reams of lock-free code. 
This > is *really* hard, and I very much wish it had been broken up into more easily digestible chunks. > > Were these bugs pre-existing or did I introduce them? Thanks, Coleen From david.holmes at oracle.com Thu Oct 12 11:35:50 2017 From: david.holmes at oracle.com (David Holmes) Date: Thu, 12 Oct 2017 21:35:50 +1000 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> Message-ID: <0740f4ce-9388-d225-75d3-47f4657dcac3@oracle.com> On 12/10/2017 5:29 PM, Kim Barrett wrote: >> On Oct 11, 2017, at 7:07 AM, coleen.phillimore at oracle.com wrote: >> >> >> >> On 10/11/17 4:12 AM, Robbin Ehn wrote: >>> On 10/11/2017 10:09 AM, David Holmes wrote: >>>> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: >>>> >>>> Removing the operation is a different argument to renaming it. Most of the above argues for removing it. :) >>> >>> +1 on removing >> >> Thank you for all your feedback. Erik best described what I was thinking. I will remove it then. There were not that many instances and one instance that people thought would be useful, needed the old return value. > > I?ve already registered my objection to removal. I disagree with several of Erik?s points, which don?t > address or miss the issues brought up in the original discussion that led to its introduction, as quoted > by David. > > I?m still slogging my way through the review, maybe about 3/4 of the way through. > > I?ve found a number of real problems, some pre-existing and discovered by looking at the code > around your changes; I think there are a couple of ABA bugs, for example. I?m worried that I?m > missing some too, because I?m getting burned out from reading reams of lock-free code. This > is *really* hard, and I very much wish it had been broken up into more easily digestible chunks. I can't see how Coleen's changes can have introduced any bugs like that. So if there are ABA or other issues, then I think we would deal with them separately. Cheers, David > From coleen.phillimore at oracle.com Thu Oct 12 11:52:43 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 12 Oct 2017 07:52:43 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <4fb119b8-cb0b-474c-ebbc-60841ef4aa46@oracle.com> References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> <4fb119b8-cb0b-474c-ebbc-60841ef4aa46@oracle.com> Message-ID: <5986a9d6-a27f-8462-d13e-5e11de8e358c@oracle.com> On 10/12/17 3:23 AM, David Holmes wrote: > Hi Coleen, > > Thanks for doing this tedious cleanup! > > It was good to see so many casts disappear; and sad to see so many > have to now appear in the sync code. :( The sync code has _owner field as void* because it can be several things.? I didn't try to > > There were a few things that struck me ... > > Atomic::xchg_ptr turned into Atomic::xchg; yet for the stub generator > routines atomic_xchg_ptr became atomic_xchg_long - but I can't see > where that stub will now come into play? http://cr.openjdk.java.net/~coleenp/8188220.02/webrev/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp.udiff.html I tried to remove it but windows x64 uses a stub for xchg (and others).? 
There was a preexisting stub for cmpxchg_long which I followed naming convention. ? static address _atomic_cmpxchg_entry; ? static address _atomic_cmpxchg_byte_entry; ? static address _atomic_cmpxchg_long_entry; Technically I think it should be long_long, as well as the cmpxchg_long_entry as well. I also missed renaming store_ptr_entry and add_ptr_entry.? What do you suggest? > > --- > > src/hotspot/share/gc/shared/taskqueue.inline.hpp > > +? return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, > +????????????????????????????????? (volatile intptr_t *)&_data, > +????????????????????????????????? (intptr_t)old_age._data); > > The actual types here should be size_t, can we now change it to use > the real type? Yes, fixed.? Missed that one. > > --- > > src/hotspot/share/oops/cpCache.cpp > > ?114 bool ConstantPoolCacheEntry::init_flags_atomic(intptr_t flags) { > ?115?? intptr_t result = Atomic::cmpxchg(flags, &_flags, (intptr_t)0); > ?116?? return (result == 0); > ?117 } > > _flags is actually intx, yet above we treat it as intptr_t. But then > later: > > ?156?? if (_flags == 0) { > ?157???? intx newflags = (value & parameter_size_mask); > ?158???? Atomic::cmpxchg(newflags, &_flags, (intx)0); > ?159?? } > > its intx again. This looks really odd to me. It's better as an intx, because that's what it's declared as.?? I'll patch up some other uses but don't promise total consistency because I don't want to pull on this particular sweater thread too much. intx and intptr_t I believe are typedefed to each other. typedef intptr_t? intx; Should we not have intx and uintx and change all their uses??? I've sworn off large changes after this though. ConstantPoolCacheEntry::make_flags returns an int.?? I fixed init_flags_atomic() because it's declared with an intx and defined with intptr_t. > > --- > > src/hotspot/share/runtime/objectMonitor.inline.hpp > > The addition of header_addr() made me a little nervous :) Can we add a > sanity assert either inside it (or in synchronizer.cpp), to verify > that this == &_header? (or monitor == monitor->header_addr()) Where I introduced it, looked like undefined behavior because it assumed that the header was the first field. So I should sanity check that other places with undefined behavior won't break?? Sure I'll do that. > > --- > > src/hotspot/share/runtime/synchronizer.cpp > > ?// global list of blocks of monitors > -// gBlockList is really PaddedEnd *, but we don't > -// want to expose the PaddedEnd template more than necessary. > -ObjectMonitor * volatile ObjectSynchronizer::gBlockList = NULL; > +PaddedEnd * volatile ObjectSynchronizer::gBlockList = > NULL; > > Did this have to change? I'm not sure why we didn't want to expose > PaddedEnd, but it is now being exposed. I didn't see why not and it avoided a bunch of ugly casts.?? I tested that the SA was fine with it because the SA manually did the address adjustment.? The SA could be fixed to know about PaddedEnd if it's somehting they want to do. Thanks for going through and reviewing all of this.?? Please answer question about the stub function name and I'll include the change with this patch. Coleen > > Thanks, > David > ----- > > > On 11/10/2017 11:50 PM, coleen.phillimore at oracle.com wrote: >> >> Please review version .02 which removes use of replace_if_null, but >> not the function.? A separate RFE can be filed to discuss that. 
>> >> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.02/webrev >> >> Thanks, >> Coleen >> >> On 10/11/17 7:07 AM, coleen.phillimore at oracle.com wrote: >>> >>> >>> On 10/11/17 4:12 AM, Robbin Ehn wrote: >>>> On 10/11/2017 10:09 AM, David Holmes wrote: >>>>> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: >>>>> >>>>> Removing the operation is a different argument to renaming it. >>>>> Most of the above argues for removing it. :) >>>> >>>> +1 on removing >>> >>> Thank you for all your feedback.? Erik best described what I was >>> thinking.? I will remove it then.? There were not that many >>> instances and one instance that people thought would be useful, >>> needed the old return value. >>> >>> Coleen >>>> >>>> Thanks, Robbin >>>> >>>>> >>>>> Cheers, >>>>> David >>>>> ----- >>>>> >>>>>> I have not reviewed this completely yet - thought I'd wait with >>>>>> that until we agree about replace_if_null, if that is okay. >>>>>> >>>>>> Thanks, >>>>>> /Erik >>>>>> >>>>>> On 2017-10-11 05:55, David Holmes wrote: >>>>>>> On 11/10/2017 1:43 PM, Kim Barrett wrote: >>>>>>>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>>>>>>>> >>>>>>>>> Summary: With the new template functions these are unnecessary. >>>>>>>>> >>>>>>>>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null. >>>>>>>>> I disliked the first name because it's not explicit from the >>>>>>>>> callers that there's an underlying cas.? If people want to >>>>>>>>> fight, I'll remove the function and use cmpxchg because there >>>>>>>>> are only a couple places where this is a little nicer. >>>>>>>> >>>>>>>> I'm still looking at other parts, but I want to respond to this >>>>>>>> now. >>>>>>>> >>>>>>>> I object to this change.? I think the proposed new name is >>>>>>>> confusing, >>>>>>>> suggesting there are two different comparisons involved. >>>>>>>> >>>>>>>> I originally called it something else that I wasn't entirely happy >>>>>>>> with.? When David suggested replace_if_null I quickly adopted >>>>>>>> that as >>>>>>>> I think that name exactly describes what it does. In particular, I >>>>>>>> think "atomic replace if" pretty clearly suggests a test-and-set / >>>>>>>> compare-and-swap type of operation. >>>>>>> >>>>>>> I totally agree. It's an Atomic operation, the implementation >>>>>>> will involve something atomic, it doesn't matter if it is >>>>>>> cmpxchg or something else. The name replace_if_null describes >>>>>>> exactly what the function does - it doesn't have to describe how >>>>>>> it does it. >>>>>>> >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>>> Further, I think any name involving "cmpxchg" is problematic >>>>>>>> because >>>>>>>> the result of this operation is intentionally different from >>>>>>>> cmpxchg, >>>>>>>> in order to better support the primary use-case, which is lazy >>>>>>>> initialization. >>>>>>>> >>>>>>>> I also object to your alternative suggestion of removing the >>>>>>>> operation >>>>>>>> entirely and just using cmpxchg directly instead.? I don't >>>>>>>> recall how >>>>>>>> many occurrences there presently are, but I suspect more could >>>>>>>> easily >>>>>>>> be added; it's part of a lazy initialization pattern similar to >>>>>>>> DCLP >>>>>>>> but without the locks. 
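(For illustration only: a standalone sketch of the kind of sanity assert being requested for header_addr() above. The class, typedef and field names here are stand-ins so the snippet compiles on its own; they are not the actual ObjectMonitor code from the webrev.)

#include <cassert>
#include <cstdint>

// Stand-in for the mark word type; in HotSpot this would be markOop.
typedef intptr_t markword_t;

class Monitor {
  volatile markword_t _header;   // must stay the first field for this to hold
  void*               _owner;
 public:
  volatile markword_t* header_addr() {
    // The sanity check under discussion: header_addr() is only equivalent to
    // a cast of the monitor address if _header really sits at offset 0.
    assert((void*)this == (void*)&_header &&
           "_header is expected to be the first field");
    return &_header;
  }
};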
>>>>>>>> >>>>>> >>> >> From coleen.phillimore at oracle.com Thu Oct 12 11:54:09 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 12 Oct 2017 07:54:09 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> Message-ID: <53384d01-094e-19cd-b0a7-d695386e8d4d@oracle.com> Kim, I have this change as an mq patchset.? If you teach me more mq commands, I'll post webrevs for each. :) thanks, Coleen On 10/12/17 3:29 AM, Kim Barrett wrote: >> On Oct 11, 2017, at 7:07 AM, coleen.phillimore at oracle.com wrote: >> >> >> >> On 10/11/17 4:12 AM, Robbin Ehn wrote: >>> On 10/11/2017 10:09 AM, David Holmes wrote: >>>> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: >>>> >>>> Removing the operation is a different argument to renaming it. Most of the above argues for removing it. :) >>> +1 on removing >> Thank you for all your feedback. Erik best described what I was thinking. I will remove it then. There were not that many instances and one instance that people thought would be useful, needed the old return value. > I?ve already registered my objection to removal. I disagree with several of Erik?s points, which don?t > address or miss the issues brought up in the original discussion that led to its introduction, as quoted > by David. > > I?m still slogging my way through the review, maybe about 3/4 of the way through. > > I?ve found a number of real problems, some pre-existing and discovered by looking at the code > around your changes; I think there are a couple of ABA bugs, for example. I?m worried that I?m > missing some too, because I?m getting burned out from reading reams of lock-free code. This > is *really* hard, and I very much wish it had been broken up into more easily digestible chunks. > > From david.holmes at oracle.com Thu Oct 12 12:21:36 2017 From: david.holmes at oracle.com (David Holmes) Date: Thu, 12 Oct 2017 22:21:36 +1000 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <5986a9d6-a27f-8462-d13e-5e11de8e358c@oracle.com> References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> <4fb119b8-cb0b-474c-ebbc-60841ef4aa46@oracle.com> <5986a9d6-a27f-8462-d13e-5e11de8e358c@oracle.com> Message-ID: <354409e5-8985-6710-1a1a-848a6b366d12@oracle.com> On 12/10/2017 9:52 PM, coleen.phillimore at oracle.com wrote: > On 10/12/17 3:23 AM, David Holmes wrote: >> Hi Coleen, >> >> Thanks for doing this tedious cleanup! >> >> It was good to see so many casts disappear; and sad to see so many >> have to now appear in the sync code. :( > > The sync code has _owner field as void* because it can be several > things.? I didn't try to Yeah I understood why this had to happen. >> >> There were a few things that struck me ... >> >> Atomic::xchg_ptr turned into Atomic::xchg; yet for the stub generator >> routines atomic_xchg_ptr became atomic_xchg_long - but I can't see >> where that stub will now come into play? > > http://cr.openjdk.java.net/~coleenp/8188220.02/webrev/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp.udiff.html > > > I tried to remove it but windows x64 uses a stub for xchg (and others). 
Ah so I think this is where it is used: ./os_cpu/windows_x86/atomic_windows_x86.hpp:DEFINE_STUB_XCHG(8, jlong, os::atomic_xchg_ptr_func) ie atomic_xchg_ptr is the stub for Atomic::xchg<8> > There was a preexisting stub for cmpxchg_long which I followed naming > convention. > > ? static address _atomic_cmpxchg_entry; > ? static address _atomic_cmpxchg_byte_entry; > ? static address _atomic_cmpxchg_long_entry; > > Technically I think it should be long_long, as well as the > cmpxchg_long_entry as well. Or int64_t > I also missed renaming store_ptr_entry and add_ptr_entry.? What do you > suggest? store_ptr_entry actually seems unused. add_ptr_entry looks like it needs to be the 64-bit Atomic::add<8> implementation - so probably add_int64_t_entry. >> >> --- >> >> src/hotspot/share/gc/shared/taskqueue.inline.hpp >> >> +? return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, >> +????????????????????????????????? (volatile intptr_t *)&_data, >> +????????????????????????????????? (intptr_t)old_age._data); >> >> The actual types here should be size_t, can we now change it to use >> the real type? > > Yes, fixed.? Missed that one. >> >> --- >> >> src/hotspot/share/oops/cpCache.cpp >> >> ?114 bool ConstantPoolCacheEntry::init_flags_atomic(intptr_t flags) { >> ?115?? intptr_t result = Atomic::cmpxchg(flags, &_flags, (intptr_t)0); >> ?116?? return (result == 0); >> ?117 } >> >> _flags is actually intx, yet above we treat it as intptr_t. But then >> later: >> >> ?156?? if (_flags == 0) { >> ?157???? intx newflags = (value & parameter_size_mask); >> ?158???? Atomic::cmpxchg(newflags, &_flags, (intx)0); >> ?159?? } >> >> its intx again. This looks really odd to me. > > It's better as an intx, because that's what it's declared as.?? I'll > patch up some other uses but don't promise total consistency because I > don't want to pull on this particular sweater thread too much. intx and > intptr_t I believe are typedefed to each other. > > typedef intptr_t? intx; > > Should we not have intx and uintx and change all their uses??? I've > sworn off large changes after this though. I don't know why we have intx/uintx other than someone not liking having to type intptr_t all the time. > ConstantPoolCacheEntry::make_flags returns an int.?? I fixed > init_flags_atomic() because it's declared with an intx and defined with > intptr_t. Ok. >> >> --- >> >> src/hotspot/share/runtime/objectMonitor.inline.hpp >> >> The addition of header_addr() made me a little nervous :) Can we add a >> sanity assert either inside it (or in synchronizer.cpp), to verify >> that this == &_header? (or monitor == monitor->header_addr()) > > Where I introduced it, looked like undefined behavior because it assumed > that the header was the first field. Assumes and expects, I think. Not sure if it is undefined behaviour or not. > So I should sanity check that other places with undefined behavior won't > break?? Sure I'll do that. No only sanity check that your change actually didn't change anything. :) >> >> --- >> >> src/hotspot/share/runtime/synchronizer.cpp >> >> ?// global list of blocks of monitors >> -// gBlockList is really PaddedEnd *, but we don't >> -// want to expose the PaddedEnd template more than necessary. >> -ObjectMonitor * volatile ObjectSynchronizer::gBlockList = NULL; >> +PaddedEnd * volatile ObjectSynchronizer::gBlockList = >> NULL; >> >> Did this have to change? I'm not sure why we didn't want to expose >> PaddedEnd, but it is now being exposed. > > I didn't see why not and it avoided a bunch of ugly casts.?? 
I tested > that the SA was fine with it because the SA manually did the address > adjustment.? The SA could be fixed to know about PaddedEnd if it's > somehting they want to do. Glad you mentioned SA as I forgot to mention that with the vmStructs changes. :) > Thanks for going through and reviewing all of this.?? Please answer > question about the stub function name and I'll include the change with > this patch. Would like to see an incremental webrev please. (Should be easy if you're using mq :) ) Thanks, David > Coleen > >> >> Thanks, >> David >> ----- >> >> >> On 11/10/2017 11:50 PM, coleen.phillimore at oracle.com wrote: >>> >>> Please review version .02 which removes use of replace_if_null, but >>> not the function.? A separate RFE can be filed to discuss that. >>> >>> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.02/webrev >>> >>> Thanks, >>> Coleen >>> >>> On 10/11/17 7:07 AM, coleen.phillimore at oracle.com wrote: >>>> >>>> >>>> On 10/11/17 4:12 AM, Robbin Ehn wrote: >>>>> On 10/11/2017 10:09 AM, David Holmes wrote: >>>>>> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: >>>>>> >>>>>> Removing the operation is a different argument to renaming it. >>>>>> Most of the above argues for removing it. :) >>>>> >>>>> +1 on removing >>>> >>>> Thank you for all your feedback.? Erik best described what I was >>>> thinking.? I will remove it then.? There were not that many >>>> instances and one instance that people thought would be useful, >>>> needed the old return value. >>>> >>>> Coleen >>>>> >>>>> Thanks, Robbin >>>>> >>>>>> >>>>>> Cheers, >>>>>> David >>>>>> ----- >>>>>> >>>>>>> I have not reviewed this completely yet - thought I'd wait with >>>>>>> that until we agree about replace_if_null, if that is okay. >>>>>>> >>>>>>> Thanks, >>>>>>> /Erik >>>>>>> >>>>>>> On 2017-10-11 05:55, David Holmes wrote: >>>>>>>> On 11/10/2017 1:43 PM, Kim Barrett wrote: >>>>>>>>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>>>>>>>>> >>>>>>>>>> Summary: With the new template functions these are unnecessary. >>>>>>>>>> >>>>>>>>>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null. >>>>>>>>>> I disliked the first name because it's not explicit from the >>>>>>>>>> callers that there's an underlying cas.? If people want to >>>>>>>>>> fight, I'll remove the function and use cmpxchg because there >>>>>>>>>> are only a couple places where this is a little nicer. >>>>>>>>> >>>>>>>>> I'm still looking at other parts, but I want to respond to this >>>>>>>>> now. >>>>>>>>> >>>>>>>>> I object to this change.? I think the proposed new name is >>>>>>>>> confusing, >>>>>>>>> suggesting there are two different comparisons involved. >>>>>>>>> >>>>>>>>> I originally called it something else that I wasn't entirely happy >>>>>>>>> with.? When David suggested replace_if_null I quickly adopted >>>>>>>>> that as >>>>>>>>> I think that name exactly describes what it does. In particular, I >>>>>>>>> think "atomic replace if" pretty clearly suggests a test-and-set / >>>>>>>>> compare-and-swap type of operation. >>>>>>>> >>>>>>>> I totally agree. It's an Atomic operation, the implementation >>>>>>>> will involve something atomic, it doesn't matter if it is >>>>>>>> cmpxchg or something else. The name replace_if_null describes >>>>>>>> exactly what the function does - it doesn't have to describe how >>>>>>>> it does it. 
>>>>>>>> >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>>> Further, I think any name involving "cmpxchg" is problematic >>>>>>>>> because >>>>>>>>> the result of this operation is intentionally different from >>>>>>>>> cmpxchg, >>>>>>>>> in order to better support the primary use-case, which is lazy >>>>>>>>> initialization. >>>>>>>>> >>>>>>>>> I also object to your alternative suggestion of removing the >>>>>>>>> operation >>>>>>>>> entirely and just using cmpxchg directly instead.? I don't >>>>>>>>> recall how >>>>>>>>> many occurrences there presently are, but I suspect more could >>>>>>>>> easily >>>>>>>>> be added; it's part of a lazy initialization pattern similar to >>>>>>>>> DCLP >>>>>>>>> but without the locks. >>>>>>>>> >>>>>>> >>>> >>> > From coleen.phillimore at oracle.com Thu Oct 12 12:55:56 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 12 Oct 2017 08:55:56 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <354409e5-8985-6710-1a1a-848a6b366d12@oracle.com> References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> <4fb119b8-cb0b-474c-ebbc-60841ef4aa46@oracle.com> <5986a9d6-a27f-8462-d13e-5e11de8e358c@oracle.com> <354409e5-8985-6710-1a1a-848a6b366d12@oracle.com> Message-ID: <1a37a25f-8a72-3990-4849-24dbfbc21b0a@oracle.com> On 10/12/17 8:21 AM, David Holmes wrote: > On 12/10/2017 9:52 PM, coleen.phillimore at oracle.com wrote: >> On 10/12/17 3:23 AM, David Holmes wrote: >>> Hi Coleen, >>> >>> Thanks for doing this tedious cleanup! >>> >>> It was good to see so many casts disappear; and sad to see so many >>> have to now appear in the sync code. :( >> >> The sync code has _owner field as void* because it can be several >> things.? I didn't try to > > Yeah I understood why this had to happen. > >>> >>> There were a few things that struck me ... >>> >>> Atomic::xchg_ptr turned into Atomic::xchg; yet for the stub >>> generator routines atomic_xchg_ptr became atomic_xchg_long - but I >>> can't see where that stub will now come into play? >> >> http://cr.openjdk.java.net/~coleenp/8188220.02/webrev/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp.udiff.html >> >> >> I tried to remove it but windows x64 uses a stub for xchg (and others). > > Ah so I think this is where it is used: > > ./os_cpu/windows_x86/atomic_windows_x86.hpp:DEFINE_STUB_XCHG(8, jlong, > os::atomic_xchg_ptr_func) > > ie atomic_xchg_ptr is the stub for Atomic::xchg<8> > >> There was a preexisting stub for cmpxchg_long which I followed naming >> convention. >> >> ?? static address _atomic_cmpxchg_entry; >> ?? static address _atomic_cmpxchg_byte_entry; >> ?? static address _atomic_cmpxchg_long_entry; >> >> Technically I think it should be long_long, as well as the >> cmpxchg_long_entry as well. > > Or int64_t > >> I also missed renaming store_ptr_entry and add_ptr_entry.? What do >> you suggest? > > store_ptr_entry actually seems unused. > > add_ptr_entry looks like it needs to be the 64-bit Atomic::add<8> > implementation - so probably add_int64_t_entry. https://bugs.openjdk.java.net/browse/JDK-8186903 I'm renaming to ptr => long for now to follow other code and fixing the name with this RFE to what it really is, and what we decide. It was pretty ugly as: ? static jint????? (*atomic_add_func)?????????? (jint,????? volatile jint*); ? static intptr_t? (*atomic_add_ptr_func)?????? (intptr_t,? 
volatile intptr_t*); When the other uses jint as an argument.?? Actually, I think add_ptr makes more sense in this context than long.? I think I should leave this name and not make it long. > >>> >>> --- >>> >>> src/hotspot/share/gc/shared/taskqueue.inline.hpp >>> >>> +? return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, >>> +????????????????????????????????? (volatile intptr_t *)&_data, >>> +????????????????????????????????? (intptr_t)old_age._data); >>> >>> The actual types here should be size_t, can we now change it to use >>> the real type? >> >> Yes, fixed.? Missed that one. >>> >>> --- >>> >>> src/hotspot/share/oops/cpCache.cpp >>> >>> ?114 bool ConstantPoolCacheEntry::init_flags_atomic(intptr_t flags) { >>> ?115?? intptr_t result = Atomic::cmpxchg(flags, &_flags, (intptr_t)0); >>> ?116?? return (result == 0); >>> ?117 } >>> >>> _flags is actually intx, yet above we treat it as intptr_t. But then >>> later: >>> >>> ?156?? if (_flags == 0) { >>> ?157???? intx newflags = (value & parameter_size_mask); >>> ?158???? Atomic::cmpxchg(newflags, &_flags, (intx)0); >>> ?159?? } >>> >>> its intx again. This looks really odd to me. >> >> It's better as an intx, because that's what it's declared as. I'll >> patch up some other uses but don't promise total consistency because >> I don't want to pull on this particular sweater thread too much. intx >> and intptr_t I believe are typedefed to each other. >> >> typedef intptr_t? intx; >> >> Should we not have intx and uintx and change all their uses? I've >> sworn off large changes after this though. > > I don't know why we have intx/uintx other than someone not liking > having to type intptr_t all the time. > >> ConstantPoolCacheEntry::make_flags returns an int.?? I fixed >> init_flags_atomic() because it's declared with an intx and defined >> with intptr_t. > > Ok. > >>> >>> --- >>> >>> src/hotspot/share/runtime/objectMonitor.inline.hpp >>> >>> The addition of header_addr() made me a little nervous :) Can we add >>> a sanity assert either inside it (or in synchronizer.cpp), to verify >>> that this == &_header? (or monitor == monitor->header_addr()) >> >> Where I introduced it, looked like undefined behavior because it >> assumed that the header was the first field. > > Assumes and expects, I think. Not sure if it is undefined behaviour or > not. Assumes without giving the static compiler a chance to check that what you've done is correct or not.? Maybe that's not undefined behavior. > >> So I should sanity check that other places with undefined behavior >> won't break?? Sure I'll do that. > > No only sanity check that your change actually didn't change anything. :) As well. > >>> >>> --- >>> >>> src/hotspot/share/runtime/synchronizer.cpp >>> >>> ?// global list of blocks of monitors >>> -// gBlockList is really PaddedEnd *, but we don't >>> -// want to expose the PaddedEnd template more than necessary. >>> -ObjectMonitor * volatile ObjectSynchronizer::gBlockList = NULL; >>> +PaddedEnd * volatile ObjectSynchronizer::gBlockList >>> = NULL; >>> >>> Did this have to change? I'm not sure why we didn't want to expose >>> PaddedEnd, but it is now being exposed. >> >> I didn't see why not and it avoided a bunch of ugly casts.?? I tested >> that the SA was fine with it because the SA manually did the address >> adjustment.? The SA could be fixed to know about PaddedEnd if it's >> somehting they want to do. > > Glad you mentioned SA as I forgot to mention that with the vmStructs > changes. :) > >> Thanks for going through and reviewing all of this.?? 
Please answer >> question about the stub function name and I'll include the change >> with this patch. > > Would like to see an incremental webrev please. (Should be easy if > you're using mq :) ) Will do. Thanks, Coleen > > Thanks, > David > >> Coleen >> >>> >>> Thanks, >>> David >>> ----- >>> >>> >>> On 11/10/2017 11:50 PM, coleen.phillimore at oracle.com wrote: >>>> >>>> Please review version .02 which removes use of replace_if_null, but >>>> not the function.? A separate RFE can be filed to discuss that. >>>> >>>> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.02/webrev >>>> >>>> Thanks, >>>> Coleen >>>> >>>> On 10/11/17 7:07 AM, coleen.phillimore at oracle.com wrote: >>>>> >>>>> >>>>> On 10/11/17 4:12 AM, Robbin Ehn wrote: >>>>>> On 10/11/2017 10:09 AM, David Holmes wrote: >>>>>>> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: >>>>>>> >>>>>>> Removing the operation is a different argument to renaming it. >>>>>>> Most of the above argues for removing it. :) >>>>>> >>>>>> +1 on removing >>>>> >>>>> Thank you for all your feedback.? Erik best described what I was >>>>> thinking.? I will remove it then.? There were not that many >>>>> instances and one instance that people thought would be useful, >>>>> needed the old return value. >>>>> >>>>> Coleen >>>>>> >>>>>> Thanks, Robbin >>>>>> >>>>>>> >>>>>>> Cheers, >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>>> I have not reviewed this completely yet - thought I'd wait with >>>>>>>> that until we agree about replace_if_null, if that is okay. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> /Erik >>>>>>>> >>>>>>>> On 2017-10-11 05:55, David Holmes wrote: >>>>>>>>> On 11/10/2017 1:43 PM, Kim Barrett wrote: >>>>>>>>>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Summary: With the new template functions these are unnecessary. >>>>>>>>>>> >>>>>>>>>>> 2. renamed Atomic::replace_if_null to >>>>>>>>>>> Atomic::cmpxchg_if_null. I disliked the first name because >>>>>>>>>>> it's not explicit from the callers that there's an >>>>>>>>>>> underlying cas.? If people want to fight, I'll remove the >>>>>>>>>>> function and use cmpxchg because there are only a couple >>>>>>>>>>> places where this is a little nicer. >>>>>>>>>> >>>>>>>>>> I'm still looking at other parts, but I want to respond to >>>>>>>>>> this now. >>>>>>>>>> >>>>>>>>>> I object to this change.? I think the proposed new name is >>>>>>>>>> confusing, >>>>>>>>>> suggesting there are two different comparisons involved. >>>>>>>>>> >>>>>>>>>> I originally called it something else that I wasn't entirely >>>>>>>>>> happy >>>>>>>>>> with.? When David suggested replace_if_null I quickly adopted >>>>>>>>>> that as >>>>>>>>>> I think that name exactly describes what it does. In >>>>>>>>>> particular, I >>>>>>>>>> think "atomic replace if" pretty clearly suggests a >>>>>>>>>> test-and-set / >>>>>>>>>> compare-and-swap type of operation. >>>>>>>>> >>>>>>>>> I totally agree. It's an Atomic operation, the implementation >>>>>>>>> will involve something atomic, it doesn't matter if it is >>>>>>>>> cmpxchg or something else. The name replace_if_null describes >>>>>>>>> exactly what the function does - it doesn't have to describe >>>>>>>>> how it does it. 
>>>>>>>>> >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>>> Further, I think any name involving "cmpxchg" is problematic >>>>>>>>>> because >>>>>>>>>> the result of this operation is intentionally different from >>>>>>>>>> cmpxchg, >>>>>>>>>> in order to better support the primary use-case, which is lazy >>>>>>>>>> initialization. >>>>>>>>>> >>>>>>>>>> I also object to your alternative suggestion of removing the >>>>>>>>>> operation >>>>>>>>>> entirely and just using cmpxchg directly instead.? I don't >>>>>>>>>> recall how >>>>>>>>>> many occurrences there presently are, but I suspect more >>>>>>>>>> could easily >>>>>>>>>> be added; it's part of a lazy initialization pattern similar >>>>>>>>>> to DCLP >>>>>>>>>> but without the locks. >>>>>>>>>> >>>>>>>> >>>>> >>>> >> From bob.vandette at oracle.com Thu Oct 12 15:43:17 2017 From: bob.vandette at oracle.com (Bob Vandette) Date: Thu, 12 Oct 2017 11:43:17 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <799205ae-ba9f-ce3a-8dd6-1a55e32689df@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <7e9322e8-274d-9fb2-f6a5-8cc612e3fe68@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> <5f7f3d85-db48-fe6e-28f5-e3f4858f33e8@oracle.com> <799205ae-ba9f-ce3a-8dd6-1a55e32689df@oracle.com> Message-ID: <9956F9D0-B01B-44FE-AE56-527907816436@oracle.com> > On Oct 11, 2017, at 9:04 PM, David Holmes wrote: > > Hi Bob, > > On 12/10/2017 5:11 AM, Bob Vandette wrote: >> Here?s an updated webrev for this RFE that contains changes and cleanups based on feedback I?ve received so far. >> I?m still investigating the best approach for reacting to cpu shares and quotas. I do not believe doing nothing is the answer. > > I do. :) Let me try this again. When you run outside of a container you don't get 100% of the CPUs - you have to share with whatever else is running on the system. You get a fraction of CPU time based on the load. We don't try to communicate load information to the VM/application so it can adapt. Within a container setting shares/quotas is just a way of setting an artificial load. So why should we be treating it any differently? Because today we optimize for a lightly loaded system and when running serverless applications in containers we should be optimizing for a fully loaded system. If developers don?t want this, then don?t use shares or quotas and you?ll have exactly the behavior you have today. I think we just have to document the new behavior (and how to turn it off) so people know what to expect. You seem to discount the added cost of 100s of VMs creating lots of un-necessaary threads. In the current JDK 10 code base, In a heavily loaded system with 88 processors, VmData grows from 60MBs (1 cpu) to 376MB (88 cpus). This is only mapped memory and it depends heavily on how deep in the stack these threads go before it impacts VmRSS but it shows the potential downside of having 100s of VMs thinking they each own the entire machine. I haven?t even done any experiments to determine the added context switching cost if the VM decides to use excessive pthreads. 
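(For illustration only: one possible way to turn a cgroup v1 cpu quota/period pair into a processor count, as context for the shares/quota debate above. The file paths, error-code convention and round-up policy are assumptions made for this sketch, not the actual JDK-8146115 implementation.)

#include <cstdio>
#include <algorithm>

// Read a single long from a cgroup file; -2 signals "could not read".
static long read_long(const char* path) {
  FILE* f = fopen(path, "r");
  if (f == NULL) return -2;
  long v = -2;
  if (fscanf(f, "%ld", &v) != 1) v = -2;
  fclose(f);
  return v;
}

// Derive a CPU count from quota/period; -1 means "no quota configured".
static int quota_cpu_count(int host_cpus) {
  long quota  = read_long("/sys/fs/cgroup/cpu/cpu.cfs_quota_us");
  long period = read_long("/sys/fs/cgroup/cpu/cpu.cfs_period_us");
  if (quota <= 0 || period <= 0) return -1;          // unlimited or unreadable
  int cpus = (int)((quota + period - 1) / period);   // round up
  return std::min(cpus, host_cpus);                  // never exceed the host
}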
> > That's not to say an API to provide load/shares/quota information may not be useful, but that is a separate issue to what the "active processor count" should report. I don?t have a problem with active processor count reporting the number of processors we have, but I do have a problem with our current usage of this information within the VM and Core libraries. > >> http://cr.openjdk.java.net/~bobv/8146115/webrev.01 >> Updates: >> 1. I had to move the processing of AggressiveHeap since the container memory size needs to be known before this can be processed. > > I don't like the placement of this - we don't call os:: init functions from inside Arguments - we manage the initialization sequence from Threads::create_vm. Seems to me that container initialization can/should happen in os::init_before_ergo, and the AggressiveHeap processing can occur at the start of Arguments::apply_ergo(). > > That said we need to be sure nothing touched by set_aggressive_heap_flags will be used before we now reach that code - there are a lot of flags being set in there. This is exactly the reason why I put the call where it did. I put the call to set_aggressive_heap_flags in finalize_vm_init_args because that is exactly what this call is doing. It?s finalizing flags used after the parsing. The impacted flags are definitely being used shortly after and before init_before_ergo is called. > >> 2. I no longer use the cpuset.cpus contents since sched_getaffinity reports the correct results >> even if someone manually updates the cgroup data. I originally didn?t think this was the case since >> sched_setaffinity didn?t automatically update the cpuset file contents but the inverse is true. > > Ok. > >> 3. I ifdef?d the container function support in src/hotspot/share/runtime/os.hpp to avoid putting stubs in all other os >> platform directories. I can do this if it?s absolutely necessary. > > You should not need to do this if initialization moves as I suggested above. os::init_before_ergo() in os_linux.cpp can call OSContainer::init(). > No need for os::initialize_container_support() or os::pd_initialize_container_support. But os::init_before_ergo is in shared code. > > > Some further comments: > > src/hotspot/share/runtime/globals.hpp > > + "Optimize heap optnios > > Typo. Thx. > > + product(intx, ActiveProcessorCount, -1, Cut and paste issue, fixed. > > Why intx? It can be int then the logging > > log_trace(os)("active_processor_count: " > "active processor count set by user : %d", > (int)ActiveProcessorCount); > > can use %d without casts. Or you can init to 0 and make it uint (and use %u). > > + product(bool, UseContainerSupport, true, \ > + "(Linux Only) > > Sorry don't recall if we already covered this, but this should be in ./os/linux/globals_linux.hpp Fixed. > > --- > > src/hotspot/os/linux/os_linux.cpp/.hpp > > 187 log_trace(os)("available container memory: " JULONG_FORMAT, avail_mem); > 188 return avail_mem; > 189 } else { > 190 log_debug(os,container)("container memory usage call failed: " JLONG_FORMAT, mem_usage); > > Why "trace" (the third logging level) to show the information, but "debug" (the second level) to show failed calls? You use debug in other files for basic info. Overall I'm unclear on your use of debug versus trace for the logging. I use trace for noisy information that is not reporting errors and debug for failures that are informational and not fatal. In this case, the call could return -1 or -2. -1 is unlimited and -2 is an error. 
In either case we fallback to the standard system call to get available memory. I would have used warning but since these messages were occurring during a test run causing test failures. > > --- > > src/hotspot/os/linux/osContainer_linux.cpp > > Dead code: > > 376 #if 0 > 377 os::Linux::print_container_info(tty); > ... > 390 #endif I left it in for standalone testing. Should I use some other #if? Bob. > > Thanks, > David > >> Bob. From mbrandy at linux.vnet.ibm.com Thu Oct 12 16:16:16 2017 From: mbrandy at linux.vnet.ibm.com (Matthew Brandyberry) Date: Thu, 12 Oct 2017 11:16:16 -0500 Subject: RFR(M) 8188165: PPC64: Optimize Unsafe.copyMemory and arraycopy In-Reply-To: References: Message-ID: [Ping] On 9/29/17 4:00 PM, Matthew Brandyberry wrote: > This is specific to PPC64LE only. > > The emphasis in the proposed code is on minimizing branches. Thus, > this code makes no attempt to avoid misaligned accesses and each block > is designed to copy as many elements as possible. > > As one data point, this yields as much as a 13x improvement in > jbyte_disjoint_arraycopy for certain misaligned scenarios. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8188165 > Webrev: http://cr.openjdk.java.net/~mbrandy/8188165/jdk10/v1/ > > Thanks, > -Matt > From mbrandy at linux.vnet.ibm.com Thu Oct 12 16:17:12 2017 From: mbrandy at linux.vnet.ibm.com (Matthew Brandyberry) Date: Thu, 12 Oct 2017 11:17:12 -0500 Subject: [8u] RFR (M) 8181809 PPC64: Leverage mtfprd/mffprd on POWER8 In-Reply-To: References: Message-ID: <1726202e-b051-6c46-73f8-2f2f5f01e418@linux.vnet.ibm.com> [Ping] On 9/28/17 12:53 PM, Matthew Brandyberry wrote: > Hi, > > Please review this backport of 8181809 for jdk8u. > > It applies cleanly to jdk8u except for the lack of C1 support on PPC > in 8u -- thus those changes are omitted here. > > This is a PPC-specific hotspot optimization that leverages the > mtfprd/mffprd instructions for for movement between general purpose > and floating point registers (rather than through memory). It yields a > ~35% improvement measured via a microbenchmark. > > webrev?????? :http://cr.openjdk.java.net/~mbrandy/8181809/jdk8u/v1 > > bug????????? :https://bugs.openjdk.java.net/browse/JDK-8181809 > > review > thread:http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-June/027226.html > > > Thank you. > -Matt > From coleen.phillimore at oracle.com Thu Oct 12 17:23:33 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 12 Oct 2017 13:23:33 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <1a37a25f-8a72-3990-4849-24dbfbc21b0a@oracle.com> References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> <4fb119b8-cb0b-474c-ebbc-60841ef4aa46@oracle.com> <5986a9d6-a27f-8462-d13e-5e11de8e358c@oracle.com> <354409e5-8985-6710-1a1a-848a6b366d12@oracle.com> <1a37a25f-8a72-3990-4849-24dbfbc21b0a@oracle.com> Message-ID: Here's the qseries in webrevs. 
open webrev at http://cr.openjdk.java.net/~coleenp/8188220.add_ptr/webrev open webrev at http://cr.openjdk.java.net/~coleenp/8188220.cmpxchg_ptr/webrev open webrev at http://cr.openjdk.java.net/~coleenp/8188220.cmpxchg_if_null/webrev open webrev at http://cr.openjdk.java.net/~coleenp/8188220.xchg_ptr/webrev open webrev at http://cr.openjdk.java.net/~coleenp/8188220.store_ptr/webrev open webrev at http://cr.openjdk.java.net/~coleenp/8188220.load_ptr_acquire/webrev open webrev at http://cr.openjdk.java.net/~coleenp/8188220.assembler_cmpxchg/webrev open webrev at http://cr.openjdk.java.net/~coleenp/8188220.casptr/webrev open webrev at http://cr.openjdk.java.net/~coleenp/8188220.review-comments/webrev assembler_cmpxchg should be release_store_ptr which got qrefreshed with trying to get the cmpxchg function pointer to compile. Thanks, Coleen On 10/12/17 8:55 AM, coleen.phillimore at oracle.com wrote: > > > On 10/12/17 8:21 AM, David Holmes wrote: >> On 12/10/2017 9:52 PM, coleen.phillimore at oracle.com wrote: >>> On 10/12/17 3:23 AM, David Holmes wrote: >>>> Hi Coleen, >>>> >>>> Thanks for doing this tedious cleanup! >>>> >>>> It was good to see so many casts disappear; and sad to see so many >>>> have to now appear in the sync code. :( >>> >>> The sync code has _owner field as void* because it can be several >>> things.? I didn't try to >> >> Yeah I understood why this had to happen. >> >>>> >>>> There were a few things that struck me ... >>>> >>>> Atomic::xchg_ptr turned into Atomic::xchg; yet for the stub >>>> generator routines atomic_xchg_ptr became atomic_xchg_long - but I >>>> can't see where that stub will now come into play? >>> >>> http://cr.openjdk.java.net/~coleenp/8188220.02/webrev/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp.udiff.html >>> >>> >>> I tried to remove it but windows x64 uses a stub for xchg (and others). >> >> Ah so I think this is where it is used: >> >> ./os_cpu/windows_x86/atomic_windows_x86.hpp:DEFINE_STUB_XCHG(8, >> jlong, os::atomic_xchg_ptr_func) >> >> ie atomic_xchg_ptr is the stub for Atomic::xchg<8> >> >>> There was a preexisting stub for cmpxchg_long which I followed >>> naming convention. >>> >>> ?? static address _atomic_cmpxchg_entry; >>> ?? static address _atomic_cmpxchg_byte_entry; >>> ?? static address _atomic_cmpxchg_long_entry; >>> >>> Technically I think it should be long_long, as well as the >>> cmpxchg_long_entry as well. >> >> Or int64_t >> >>> I also missed renaming store_ptr_entry and add_ptr_entry.? What do >>> you suggest? >> >> store_ptr_entry actually seems unused. >> >> add_ptr_entry looks like it needs to be the 64-bit Atomic::add<8> >> implementation - so probably add_int64_t_entry. > > https://bugs.openjdk.java.net/browse/JDK-8186903 > > I'm renaming to ptr => long for now to follow other code and fixing > the name with this RFE to what it really is, and what we decide. > > It was pretty ugly as: > > ? static jint????? (*atomic_add_func)?????????? (jint, volatile jint*); > ? static intptr_t? (*atomic_add_ptr_func)?????? (intptr_t, volatile > intptr_t*); > > When the other uses jint as an argument.?? Actually, I think add_ptr > makes more sense in this context than long.? I think I should leave > this name and not make it long. >> >>>> >>>> --- >>>> >>>> src/hotspot/share/gc/shared/taskqueue.inline.hpp >>>> >>>> +? return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, >>>> +????????????????????????????????? (volatile intptr_t *)&_data, >>>> +????????????????????????????????? 
(intptr_t)old_age._data); >>>> >>>> The actual types here should be size_t, can we now change it to use >>>> the real type? >>> >>> Yes, fixed.? Missed that one. >>>> >>>> --- >>>> >>>> src/hotspot/share/oops/cpCache.cpp >>>> >>>> ?114 bool ConstantPoolCacheEntry::init_flags_atomic(intptr_t flags) { >>>> ?115?? intptr_t result = Atomic::cmpxchg(flags, &_flags, (intptr_t)0); >>>> ?116?? return (result == 0); >>>> ?117 } >>>> >>>> _flags is actually intx, yet above we treat it as intptr_t. But >>>> then later: >>>> >>>> ?156?? if (_flags == 0) { >>>> ?157???? intx newflags = (value & parameter_size_mask); >>>> ?158???? Atomic::cmpxchg(newflags, &_flags, (intx)0); >>>> ?159?? } >>>> >>>> its intx again. This looks really odd to me. >>> >>> It's better as an intx, because that's what it's declared as. I'll >>> patch up some other uses but don't promise total consistency because >>> I don't want to pull on this particular sweater thread too much. >>> intx and intptr_t I believe are typedefed to each other. >>> >>> typedef intptr_t? intx; >>> >>> Should we not have intx and uintx and change all their uses? I've >>> sworn off large changes after this though. >> >> I don't know why we have intx/uintx other than someone not liking >> having to type intptr_t all the time. >> >>> ConstantPoolCacheEntry::make_flags returns an int.?? I fixed >>> init_flags_atomic() because it's declared with an intx and defined >>> with intptr_t. >> >> Ok. >> >>>> >>>> --- >>>> >>>> src/hotspot/share/runtime/objectMonitor.inline.hpp >>>> >>>> The addition of header_addr() made me a little nervous :) Can we >>>> add a sanity assert either inside it (or in synchronizer.cpp), to >>>> verify that this == &_header? (or monitor == monitor->header_addr()) >>> >>> Where I introduced it, looked like undefined behavior because it >>> assumed that the header was the first field. >> >> Assumes and expects, I think. Not sure if it is undefined behaviour >> or not. > > Assumes without giving the static compiler a chance to check that what > you've done is correct or not.? Maybe that's not undefined behavior. >> >>> So I should sanity check that other places with undefined behavior >>> won't break?? Sure I'll do that. >> >> No only sanity check that your change actually didn't change >> anything. :) > > As well. >> >>>> >>>> --- >>>> >>>> src/hotspot/share/runtime/synchronizer.cpp >>>> >>>> ?// global list of blocks of monitors >>>> -// gBlockList is really PaddedEnd *, but we don't >>>> -// want to expose the PaddedEnd template more than necessary. >>>> -ObjectMonitor * volatile ObjectSynchronizer::gBlockList = NULL; >>>> +PaddedEnd * volatile ObjectSynchronizer::gBlockList >>>> = NULL; >>>> >>>> Did this have to change? I'm not sure why we didn't want to expose >>>> PaddedEnd, but it is now being exposed. >>> >>> I didn't see why not and it avoided a bunch of ugly casts.?? I >>> tested that the SA was fine with it because the SA manually did the >>> address adjustment.? The SA could be fixed to know about PaddedEnd >>> if it's somehting they want to do. >> >> Glad you mentioned SA as I forgot to mention that with the vmStructs >> changes. :) >> >>> Thanks for going through and reviewing all of this.?? Please answer >>> question about the stub function name and I'll include the change >>> with this patch. >> >> Would like to see an incremental webrev please. (Should be easy if >> you're using mq :) ) > > Will do. 
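Picking up the taskqueue.inline.hpp point above, a toy sketch of the cleanup (the struct and the __sync builtin are only stand-ins for the real Age type and Atomic::cmpxchg): once all three operands are deduced from the size_t field, the intptr_t casts disappear.

#include <cstddef>

struct AgeSketch { size_t _data; };     // stand-in for the queue's Age type

struct TaskQueueSketch {
  volatile size_t _data;

  size_t cmpxchg_age(AgeSketch new_age, AgeSketch old_age) {
    // Old form, with the casts the templated version makes unnecessary:
    //   return (size_t) Atomic::cmpxchg((intptr_t)new_age._data,
    //                                   (volatile intptr_t *)&_data,
    //                                   (intptr_t)old_age._data);
    // New form, every operand already a size_t:
    //   return Atomic::cmpxchg(new_age._data, &_data, old_age._data);
    return __sync_val_compare_and_swap(&_data, old_age._data, new_age._data);
  }
};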
> > Thanks, > Coleen >> >> Thanks, >> David >> >>> Coleen >>> >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>> >>>> On 11/10/2017 11:50 PM, coleen.phillimore at oracle.com wrote: >>>>> >>>>> Please review version .02 which removes use of replace_if_null, >>>>> but not the function.? A separate RFE can be filed to discuss that. >>>>> >>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.02/webrev >>>>> >>>>> Thanks, >>>>> Coleen >>>>> >>>>> On 10/11/17 7:07 AM, coleen.phillimore at oracle.com wrote: >>>>>> >>>>>> >>>>>> On 10/11/17 4:12 AM, Robbin Ehn wrote: >>>>>>> On 10/11/2017 10:09 AM, David Holmes wrote: >>>>>>>> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: >>>>>>>> >>>>>>>> Removing the operation is a different argument to renaming it. >>>>>>>> Most of the above argues for removing it. :) >>>>>>> >>>>>>> +1 on removing >>>>>> >>>>>> Thank you for all your feedback.? Erik best described what I was >>>>>> thinking.? I will remove it then.? There were not that many >>>>>> instances and one instance that people thought would be useful, >>>>>> needed the old return value. >>>>>> >>>>>> Coleen >>>>>>> >>>>>>> Thanks, Robbin >>>>>>> >>>>>>>> >>>>>>>> Cheers, >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>>> I have not reviewed this completely yet - thought I'd wait >>>>>>>>> with that until we agree about replace_if_null, if that is okay. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> /Erik >>>>>>>>> >>>>>>>>> On 2017-10-11 05:55, David Holmes wrote: >>>>>>>>>> On 11/10/2017 1:43 PM, Kim Barrett wrote: >>>>>>>>>>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Summary: With the new template functions these are >>>>>>>>>>>> unnecessary. >>>>>>>>>>>> >>>>>>>>>>>> 2. renamed Atomic::replace_if_null to >>>>>>>>>>>> Atomic::cmpxchg_if_null. I disliked the first name because >>>>>>>>>>>> it's not explicit from the callers that there's an >>>>>>>>>>>> underlying cas. If people want to fight, I'll remove the >>>>>>>>>>>> function and use cmpxchg because there are only a couple >>>>>>>>>>>> places where this is a little nicer. >>>>>>>>>>> >>>>>>>>>>> I'm still looking at other parts, but I want to respond to >>>>>>>>>>> this now. >>>>>>>>>>> >>>>>>>>>>> I object to this change.? I think the proposed new name is >>>>>>>>>>> confusing, >>>>>>>>>>> suggesting there are two different comparisons involved. >>>>>>>>>>> >>>>>>>>>>> I originally called it something else that I wasn't entirely >>>>>>>>>>> happy >>>>>>>>>>> with.? When David suggested replace_if_null I quickly >>>>>>>>>>> adopted that as >>>>>>>>>>> I think that name exactly describes what it does. In >>>>>>>>>>> particular, I >>>>>>>>>>> think "atomic replace if" pretty clearly suggests a >>>>>>>>>>> test-and-set / >>>>>>>>>>> compare-and-swap type of operation. >>>>>>>>>> >>>>>>>>>> I totally agree. It's an Atomic operation, the implementation >>>>>>>>>> will involve something atomic, it doesn't matter if it is >>>>>>>>>> cmpxchg or something else. The name replace_if_null describes >>>>>>>>>> exactly what the function does - it doesn't have to describe >>>>>>>>>> how it does it. >>>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> Further, I think any name involving "cmpxchg" is problematic >>>>>>>>>>> because >>>>>>>>>>> the result of this operation is intentionally different from >>>>>>>>>>> cmpxchg, >>>>>>>>>>> in order to better support the primary use-case, which is lazy >>>>>>>>>>> initialization. 
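A minimal sketch of the lazy-initialization idiom being described, assuming the (new_value, destination) argument order used elsewhere in this thread and a hypothetical Foo payload; the boolean "did this thread install it" answer is exactly what the initializer needs:

#include "runtime/atomic.hpp"   // Atomic::replace_if_null, as discussed here

struct Foo { };                 // hypothetical lazily-created payload

static Foo* volatile _cached_foo = NULL;

Foo* get_foo() {
  if (_cached_foo == NULL) {
    Foo* f = new Foo();
    // A CAS against NULL under the covers; true means our value was installed.
    if (!Atomic::replace_if_null(f, &_cached_foo)) {
      delete f;                 // lost the race; keep the winner's object
    }
  }
  return _cached_foo;
}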
>>>>>>>>>>> >>>>>>>>>>> I also object to your alternative suggestion of removing the >>>>>>>>>>> operation >>>>>>>>>>> entirely and just using cmpxchg directly instead.? I don't >>>>>>>>>>> recall how >>>>>>>>>>> many occurrences there presently are, but I suspect more >>>>>>>>>>> could easily >>>>>>>>>>> be added; it's part of a lazy initialization pattern similar >>>>>>>>>>> to DCLP >>>>>>>>>>> but without the locks. >>>>>>>>>>> >>>>>>>>> >>>>>> >>>>> >>> > From david.holmes at oracle.com Thu Oct 12 21:56:24 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 13 Oct 2017 07:56:24 +1000 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: References: <7e4fbba3-4462-a729-b663-99fd6919360f@oracle.com> <59DDCC37.8050306@oracle.com> <3089b845-0532-d6a9-b68f-91b3b21c6ef3@oracle.com> <591c33b3-f9b1-55f3-2c4b-ddfad4ed9a39@oracle.com> <4fb119b8-cb0b-474c-ebbc-60841ef4aa46@oracle.com> <5986a9d6-a27f-8462-d13e-5e11de8e358c@oracle.com> <354409e5-8985-6710-1a1a-848a6b366d12@oracle.com> <1a37a25f-8a72-3990-4849-24dbfbc21b0a@oracle.com> Message-ID: On 13/10/2017 3:23 AM, coleen.phillimore at oracle.com wrote: > > Here's the qseries in webrevs. Are these the latest or do they match the big webrev you previously put out? > open webrev at http://cr.openjdk.java.net/~coleenp/8188220.add_ptr/webrev There are still two add(-n) instead of sub(n) cases. Also here: --- old/src/hotspot/share/services/mallocTracker.hpp 2017-10-12 12:15:32.951573341 -0400 +++ new/src/hotspot/share/services/mallocTracker.hpp 2017-10-12 12:15:32.386616320 -0400 @@ -68,7 +68,7 @@ if (sz > 0) { // unary minus operator applied to unsigned type, result still unsigned #pragma warning(suppress: 4146) - Atomic::add(-sz, &_size); + Atomic::sub(sz, &_size); You should be able to remove the comment and pragma now as no unary minus is being applied (at this level). Thanks, David > open webrev at > http://cr.openjdk.java.net/~coleenp/8188220.cmpxchg_ptr/webrev > open webrev at > http://cr.openjdk.java.net/~coleenp/8188220.cmpxchg_if_null/webrev > open webrev at http://cr.openjdk.java.net/~coleenp/8188220.xchg_ptr/webrev > open webrev at http://cr.openjdk.java.net/~coleenp/8188220.store_ptr/webrev > open webrev at > http://cr.openjdk.java.net/~coleenp/8188220.load_ptr_acquire/webrev > open webrev at > http://cr.openjdk.java.net/~coleenp/8188220.assembler_cmpxchg/webrev > open webrev at http://cr.openjdk.java.net/~coleenp/8188220.casptr/webrev > open webrev at > http://cr.openjdk.java.net/~coleenp/8188220.review-comments/webrev > > assembler_cmpxchg should be release_store_ptr which got qrefreshed with > trying to get the cmpxchg function pointer to compile. > > Thanks, > Coleen > > On 10/12/17 8:55 AM, coleen.phillimore at oracle.com wrote: >> >> >> On 10/12/17 8:21 AM, David Holmes wrote: >>> On 12/10/2017 9:52 PM, coleen.phillimore at oracle.com wrote: >>>> On 10/12/17 3:23 AM, David Holmes wrote: >>>>> Hi Coleen, >>>>> >>>>> Thanks for doing this tedious cleanup! >>>>> >>>>> It was good to see so many casts disappear; and sad to see so many >>>>> have to now appear in the sync code. :( >>>> >>>> The sync code has _owner field as void* because it can be several >>>> things.? I didn't try to >>> >>> Yeah I understood why this had to happen. >>> >>>>> >>>>> There were a few things that struck me ... 
>>>>> >>>>> Atomic::xchg_ptr turned into Atomic::xchg; yet for the stub >>>>> generator routines atomic_xchg_ptr became atomic_xchg_long - but I >>>>> can't see where that stub will now come into play? >>>> >>>> http://cr.openjdk.java.net/~coleenp/8188220.02/webrev/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp.udiff.html >>>> >>>> >>>> I tried to remove it but windows x64 uses a stub for xchg (and others). >>> >>> Ah so I think this is where it is used: >>> >>> ./os_cpu/windows_x86/atomic_windows_x86.hpp:DEFINE_STUB_XCHG(8, >>> jlong, os::atomic_xchg_ptr_func) >>> >>> ie atomic_xchg_ptr is the stub for Atomic::xchg<8> >>> >>>> There was a preexisting stub for cmpxchg_long which I followed >>>> naming convention. >>>> >>>> ?? static address _atomic_cmpxchg_entry; >>>> ?? static address _atomic_cmpxchg_byte_entry; >>>> ?? static address _atomic_cmpxchg_long_entry; >>>> >>>> Technically I think it should be long_long, as well as the >>>> cmpxchg_long_entry as well. >>> >>> Or int64_t >>> >>>> I also missed renaming store_ptr_entry and add_ptr_entry.? What do >>>> you suggest? >>> >>> store_ptr_entry actually seems unused. >>> >>> add_ptr_entry looks like it needs to be the 64-bit Atomic::add<8> >>> implementation - so probably add_int64_t_entry. >> >> https://bugs.openjdk.java.net/browse/JDK-8186903 >> >> I'm renaming to ptr => long for now to follow other code and fixing >> the name with this RFE to what it really is, and what we decide. >> >> It was pretty ugly as: >> >> ? static jint????? (*atomic_add_func)?????????? (jint, volatile jint*); >> ? static intptr_t? (*atomic_add_ptr_func)?????? (intptr_t, volatile >> intptr_t*); >> >> When the other uses jint as an argument.?? Actually, I think add_ptr >> makes more sense in this context than long.? I think I should leave >> this name and not make it long. >>> >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/gc/shared/taskqueue.inline.hpp >>>>> >>>>> +? return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, >>>>> +????????????????????????????????? (volatile intptr_t *)&_data, >>>>> +????????????????????????????????? (intptr_t)old_age._data); >>>>> >>>>> The actual types here should be size_t, can we now change it to use >>>>> the real type? >>>> >>>> Yes, fixed.? Missed that one. >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/oops/cpCache.cpp >>>>> >>>>> ?114 bool ConstantPoolCacheEntry::init_flags_atomic(intptr_t flags) { >>>>> ?115?? intptr_t result = Atomic::cmpxchg(flags, &_flags, (intptr_t)0); >>>>> ?116?? return (result == 0); >>>>> ?117 } >>>>> >>>>> _flags is actually intx, yet above we treat it as intptr_t. But >>>>> then later: >>>>> >>>>> ?156?? if (_flags == 0) { >>>>> ?157???? intx newflags = (value & parameter_size_mask); >>>>> ?158???? Atomic::cmpxchg(newflags, &_flags, (intx)0); >>>>> ?159?? } >>>>> >>>>> its intx again. This looks really odd to me. >>>> >>>> It's better as an intx, because that's what it's declared as. I'll >>>> patch up some other uses but don't promise total consistency because >>>> I don't want to pull on this particular sweater thread too much. >>>> intx and intptr_t I believe are typedefed to each other. >>>> >>>> typedef intptr_t? intx; >>>> >>>> Should we not have intx and uintx and change all their uses? I've >>>> sworn off large changes after this though. >>> >>> I don't know why we have intx/uintx other than someone not liking >>> having to type intptr_t all the time. >>> >>>> ConstantPoolCacheEntry::make_flags returns an int.?? 
I fixed >>>> init_flags_atomic() because it's declared with an intx and defined >>>> with intptr_t. >>> >>> Ok. >>> >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/runtime/objectMonitor.inline.hpp >>>>> >>>>> The addition of header_addr() made me a little nervous :) Can we >>>>> add a sanity assert either inside it (or in synchronizer.cpp), to >>>>> verify that this == &_header? (or monitor == monitor->header_addr()) >>>> >>>> Where I introduced it, looked like undefined behavior because it >>>> assumed that the header was the first field. >>> >>> Assumes and expects, I think. Not sure if it is undefined behaviour >>> or not. >> >> Assumes without giving the static compiler a chance to check that what >> you've done is correct or not.? Maybe that's not undefined behavior. >>> >>>> So I should sanity check that other places with undefined behavior >>>> won't break?? Sure I'll do that. >>> >>> No only sanity check that your change actually didn't change >>> anything. :) >> >> As well. >>> >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/runtime/synchronizer.cpp >>>>> >>>>> ?// global list of blocks of monitors >>>>> -// gBlockList is really PaddedEnd *, but we don't >>>>> -// want to expose the PaddedEnd template more than necessary. >>>>> -ObjectMonitor * volatile ObjectSynchronizer::gBlockList = NULL; >>>>> +PaddedEnd * volatile ObjectSynchronizer::gBlockList >>>>> = NULL; >>>>> >>>>> Did this have to change? I'm not sure why we didn't want to expose >>>>> PaddedEnd, but it is now being exposed. >>>> >>>> I didn't see why not and it avoided a bunch of ugly casts.?? I >>>> tested that the SA was fine with it because the SA manually did the >>>> address adjustment.? The SA could be fixed to know about PaddedEnd >>>> if it's somehting they want to do. >>> >>> Glad you mentioned SA as I forgot to mention that with the vmStructs >>> changes. :) >>> >>>> Thanks for going through and reviewing all of this.?? Please answer >>>> question about the stub function name and I'll include the change >>>> with this patch. >>> >>> Would like to see an incremental webrev please. (Should be easy if >>> you're using mq :) ) >> >> Will do. >> >> Thanks, >> Coleen >>> >>> Thanks, >>> David >>> >>>> Coleen >>>> >>>>> >>>>> Thanks, >>>>> David >>>>> ----- >>>>> >>>>> >>>>> On 11/10/2017 11:50 PM, coleen.phillimore at oracle.com wrote: >>>>>> >>>>>> Please review version .02 which removes use of replace_if_null, >>>>>> but not the function.? A separate RFE can be filed to discuss that. >>>>>> >>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.02/webrev >>>>>> >>>>>> Thanks, >>>>>> Coleen >>>>>> >>>>>> On 10/11/17 7:07 AM, coleen.phillimore at oracle.com wrote: >>>>>>> >>>>>>> >>>>>>> On 10/11/17 4:12 AM, Robbin Ehn wrote: >>>>>>>> On 10/11/2017 10:09 AM, David Holmes wrote: >>>>>>>>> On 11/10/2017 5:45 PM, Erik ?sterlund wrote: >>>>>>>>> >>>>>>>>> Removing the operation is a different argument to renaming it. >>>>>>>>> Most of the above argues for removing it. :) >>>>>>>> >>>>>>>> +1 on removing >>>>>>> >>>>>>> Thank you for all your feedback.? Erik best described what I was >>>>>>> thinking.? I will remove it then.? There were not that many >>>>>>> instances and one instance that people thought would be useful, >>>>>>> needed the old return value. 
>>>>>>> >>>>>>> Coleen >>>>>>>> >>>>>>>> Thanks, Robbin >>>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>>> I have not reviewed this completely yet - thought I'd wait >>>>>>>>>> with that until we agree about replace_if_null, if that is okay. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> /Erik >>>>>>>>>> >>>>>>>>>> On 2017-10-11 05:55, David Holmes wrote: >>>>>>>>>>> On 11/10/2017 1:43 PM, Kim Barrett wrote: >>>>>>>>>>>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Summary: With the new template functions these are >>>>>>>>>>>>> unnecessary. >>>>>>>>>>>>> >>>>>>>>>>>>> 2. renamed Atomic::replace_if_null to >>>>>>>>>>>>> Atomic::cmpxchg_if_null. I disliked the first name because >>>>>>>>>>>>> it's not explicit from the callers that there's an >>>>>>>>>>>>> underlying cas. If people want to fight, I'll remove the >>>>>>>>>>>>> function and use cmpxchg because there are only a couple >>>>>>>>>>>>> places where this is a little nicer. >>>>>>>>>>>> >>>>>>>>>>>> I'm still looking at other parts, but I want to respond to >>>>>>>>>>>> this now. >>>>>>>>>>>> >>>>>>>>>>>> I object to this change.? I think the proposed new name is >>>>>>>>>>>> confusing, >>>>>>>>>>>> suggesting there are two different comparisons involved. >>>>>>>>>>>> >>>>>>>>>>>> I originally called it something else that I wasn't entirely >>>>>>>>>>>> happy >>>>>>>>>>>> with.? When David suggested replace_if_null I quickly >>>>>>>>>>>> adopted that as >>>>>>>>>>>> I think that name exactly describes what it does. In >>>>>>>>>>>> particular, I >>>>>>>>>>>> think "atomic replace if" pretty clearly suggests a >>>>>>>>>>>> test-and-set / >>>>>>>>>>>> compare-and-swap type of operation. >>>>>>>>>>> >>>>>>>>>>> I totally agree. It's an Atomic operation, the implementation >>>>>>>>>>> will involve something atomic, it doesn't matter if it is >>>>>>>>>>> cmpxchg or something else. The name replace_if_null describes >>>>>>>>>>> exactly what the function does - it doesn't have to describe >>>>>>>>>>> how it does it. >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>>> Further, I think any name involving "cmpxchg" is problematic >>>>>>>>>>>> because >>>>>>>>>>>> the result of this operation is intentionally different from >>>>>>>>>>>> cmpxchg, >>>>>>>>>>>> in order to better support the primary use-case, which is lazy >>>>>>>>>>>> initialization. >>>>>>>>>>>> >>>>>>>>>>>> I also object to your alternative suggestion of removing the >>>>>>>>>>>> operation >>>>>>>>>>>> entirely and just using cmpxchg directly instead.? I don't >>>>>>>>>>>> recall how >>>>>>>>>>>> many occurrences there presently are, but I suspect more >>>>>>>>>>>> could easily >>>>>>>>>>>> be added; it's part of a lazy initialization pattern similar >>>>>>>>>>>> to DCLP >>>>>>>>>>>> but without the locks. >>>>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>>> >>>> >> > From kim.barrett at oracle.com Thu Oct 12 23:17:38 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 12 Oct 2017 19:17:38 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: References: Message-ID: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> > On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: > > Summary: With the new template functions these are unnecessary. > > The changes are mostly s/_ptr// and removing the cast to return type. 
There weren't many types that needed to be improved to match the template version of the function. Some notes: > 1. replaced CASPTR with Atomic::cmpxchg() in mutex.cpp, rearranging arguments. > 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null. I disliked the first name because it's not explicit from the callers that there's an underlying cas. If people want to fight, I'll remove the function and use cmpxchg because there are only a couple places where this is a little nicer. > 3. Added Atomic::sub() > > Tested with JPRT, mach5 tier1-5 on linux,windows and solaris. > > open webrev at http://cr.openjdk.java.net/~coleenp/8188220.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8188220 > > Thanks, > Coleen I looked harder at the potential ABA problems, and believe they are okay. There can be multiple threads doing pushes, and there can be multiple threads doing pops, but not both at the same time. ------------------------------------------------------------------------------ src/hotspot/cpu/zero/cppInterpreter_zero.cpp 279 if (Atomic::cmpxchg(monitor, lockee->mark_addr(), disp) != disp) { How does this work? monitor and disp seem like they have unrelated types? Given that this is zero-specific code, maybe this hasn't been tested? Similarly here: 423 if (Atomic::cmpxchg(header, rcvr->mark_addr(), lock) != lock) { ------------------------------------------------------------------------------ src/hotspot/share/asm/assembler.cpp 239 dcon->value_fn = cfn; Is it actually safe to remove the atomic update? If multiple threads performing the assignment *are* possible (and I don't understand the context yet, so don't know the answer to that), then a bare non-atomic assignment is a race, e.g. undefined behavior. Regardless of that, I think the CAST_FROM_FN_PTR should be retained. ------------------------------------------------------------------------------ src/hotspot/share/classfile/classLoaderData.cpp 167 Chunk* head = (Chunk*) OrderAccess::load_acquire(&_head); I think the cast to Chunk* is no longer needed. ------------------------------------------------------------------------------ src/hotspot/share/classfile/classLoaderData.cpp 946 ClassLoaderData* old = Atomic::cmpxchg(cld, cld_addr, (ClassLoaderData*)NULL); 947 if (old != NULL) { 948 delete cld; 949 // Returns the data. 950 return old; 951 } That could instead be if (!Atomic::replace_if_null(cld, cld_addr)) { delete cld; // Lost the race. return *cld_addr; // Use the winner's value. } And apparently the caller of CLDG::add doesn't care whether the returned CLD has actually been added to the graph yet. If that's not true, then there's a bug here, since a race loser might return a winner's value before the winner has actually done the insertion. ------------------------------------------------------------------------------ src/hotspot/share/classfile/verifier.cpp 71 static void* verify_byte_codes_fn() { 72 if (OrderAccess::load_acquire(&_verify_byte_codes_fn) == NULL) { 73 void *lib_handle = os::native_java_library(); 74 void *func = os::dll_lookup(lib_handle, "VerifyClassCodesForMajorVersion"); 75 OrderAccess::release_store(&_verify_byte_codes_fn, func); 76 if (func == NULL) { 77 _is_new_verify_byte_codes_fn = false; 78 func = os::dll_lookup(lib_handle, "VerifyClassCodes"); 79 OrderAccess::release_store(&_verify_byte_codes_fn, func); 80 } 81 } 82 return (void*)_verify_byte_codes_fn; 83 } [pre-existing] I think this code has race problems; a caller could unexpectedly and inappropriately return NULL. 
Consider the case where there is no VerifyClassCodesForMajorVersion, but there is VerifyClassCodes. The variable is initially NULL. Both Thread1 and Thread2 reach line 73, having both seen a NULL value for the variable. Thread1 reaches line 80, setting the variable to VerifyClassCodes. Thread2 reaches line 76, resetting the variable to NULL. Thread1 reads the now (momentarily) NULL value and returns it. I think the first release_store should be conditional on func != NULL. Also, the usage of _is_new_verify_byte_codes_fn seems suspect. And a minor additional nit: the cast in the return is unnecessary. ------------------------------------------------------------------------------ src/hotspot/share/code/nmethod.cpp 1664 nmethod* observed_mark_link = _oops_do_mark_link; 1665 if (observed_mark_link == NULL) { 1666 // Claim this nmethod for this thread to mark. 1667 if (Atomic::cmpxchg_if_null(NMETHOD_SENTINEL, &_oops_do_mark_link)) { With these changes, the only use of observed_mark_link is in the if. I'm not sure that variable is really useful anymore, e.g. just use if (_oops_do_mark_link == NULL) { ------------------------------------------------------------------------------ src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp In CMSCollector::par_take_from_overflow_list, if BUSY and prefix were of type oopDesc*, I think there would be a whole lot fewer casts and cast_to_oop's. Later on, I think suffix_head, observed_overflow_list, and curr_overflow_list could also be oopDesc* instead of oop to eliminate more casts. And some similar changes in CMSCollector::par_push_on_overflow_list. And similarly in parNewGeneration.cpp, in push_on_overflow_list and take_from_overflow_list_work. As noted in the comments for JDK-8165857, the lists and "objects" involved here aren't really oops, but rather the shattered remains of oops. The suggestion there was to use HeapWord* and carry through the fanout; what was actually done was to change _overflow_list to oopDesc* to minimize fanout, even though that's kind of lying to the type system. Now, with the cleanup of cmpxchg_ptr and such, we're paying the price of doing the minimal thing back then. ------------------------------------------------------------------------------ src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp 7960 Atomic::add(-n, &_num_par_pushes); Atomic::sub ------------------------------------------------------------------------------ src/hotspot/share/gc/cms/parNewGeneration.cpp 1455 Atomic::add(-n, &_num_par_pushes); Atomic::sub ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/dirtyCardQueue.cpp 283 void* actual = Atomic::cmpxchg(next, &_cur_par_buffer_node, nd); ... 289 nd = static_cast(actual); Change actual's type to BufferNode* and remove the cast on line 289. ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1CollectedHeap.cpp [pre-existing] 3499 old = (CompiledMethod*)_postponed_list; I think that cast is only needed because G1CodeCacheUnloadingTask::_postponed_list is incorrectly typed as "volatile CompiledMethod*", when I think it ought to be "CompiledMethod* volatile". 
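Since the two spellings are easy to mix up, a small illustration of the difference (the class name is only a stand-in):

struct CompiledMethodSketch { int dummy; };

// Pointer to a volatile object: accesses *through* the pointer are volatile.
volatile CompiledMethodSketch* pointee_is_volatile;

// Volatile pointer to an ordinary object: the pointer variable itself is the
// shared, concurrently-updated thing, which is what a claimed-list cursor
// wants and what lets "old = _postponed_list;" compile without a cast.
CompiledMethodSketch* volatile pointer_is_volatile;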
I think G1CodeCacheUnloading::_claimed_nmethod is similarly mis-typed, with a similar should not be needed cast: 3530 first = (CompiledMethod*)_claimed_nmethod; and another for _postponed_list here: 3552 claim = (CompiledMethod*)_postponed_list; ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1HotCardCache.cpp 77 jbyte* previous_ptr = (jbyte*)Atomic::cmpxchg(card_ptr, I think the cast of the cmpxchg result is no longer needed. ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp 254 char* touch_addr = (char*)Atomic::add(actual_chunk_size, &_cur_addr) - actual_chunk_size; I think the cast of the add result is no longer needed. ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1StringDedup.cpp 213 return (size_t)Atomic::add(partition_size, &_next_bucket) - partition_size; I think the cast of the add result is no longer needed. ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/heapRegionRemSet.cpp 200 PerRegionTable* res = 201 Atomic::cmpxchg(nxt, &_free_list, fl); Please remove the line break, now that the code has been simplified. But wait, doesn't this alloc exhibit classic ABA problems? I *think* this works because alloc and bulk_free are called in different phases, never overlapping. ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/sparsePRT.cpp 295 SparsePRT* res = 296 Atomic::cmpxchg(sprt, &_head_expanded_list, hd); and 307 SparsePRT* res = 308 Atomic::cmpxchg(next, &_head_expanded_list, hd); I'd rather not have the line breaks in these either. And get_from_expanded_list also appears to have classic ABA problems. I *think* this works because add_to_expanded_list and get_from_expanded_list are called in different phases, never overlapping. ------------------------------------------------------------------------------ src/hotspot/share/gc/shared/taskqueue.inline.hpp 262 return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, 263 (volatile intptr_t *)&_data, 264 (intptr_t)old_age._data); This should be return Atomic::cmpxchg(new_age._data, &_data, old_age._data); ------------------------------------------------------------------------------ src/hotspot/share/interpreter/bytecodeInterpreter.cpp This doesn't have any casts, which I think is correct. 708 if (Atomic::cmpxchg(header, rcvr->mark_addr(), mark) == mark) { but these do. 718 if (Atomic::cmpxchg((void*)new_header, rcvr->mark_addr(), mark) == mark) { 737 if (Atomic::cmpxchg((void*)new_header, rcvr->mark_addr(), header) == header) { I'm not sure how the ones with casts even compile? mark_addr() seems to be a markOop*, which is a markOopDesc**, where markOopDesc is a class. void* is not implicitly convertible to markOopDesc*. Hm, this entire file is #ifdef CC_INTERP. Is this zero-only code? Or something like that? 
Similarly here: 906 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) { and 917 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) { 935 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) { and here: 1847 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) { 1858 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) { 1878 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) { and here: 1847 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) { 1858 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) { 1878 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) { ------------------------------------------------------------------------------ src/hotspot/share/memory/metaspace.cpp 1502 size_t value = OrderAccess::load_acquire(&_capacity_until_GC); ... 1537 return (size_t)Atomic::sub((intptr_t)v, &_capacity_until_GC); These and other uses of _capacity_until_GC suggest that variable's type should be size_t rather than intptr_t. Note that I haven't done a careful check of uses to see if there are any places where such a change would cause problems. ------------------------------------------------------------------------------ src/hotspot/share/oops/constantPool.cpp 229 OrderAccess::release_store((Klass* volatile *)adr, k); 246 OrderAccess::release_store((Klass* volatile *)adr, k); 514 OrderAccess::release_store((Klass* volatile *)adr, k); Casts are not needed. ------------------------------------------------------------------------------ src/hotspot/share/oops/constantPool.hpp 148 volatile intptr_t adr = OrderAccess::load_acquire(obj_at_addr_raw(which)); [pre-existing] Why is adr declared volatile? ------------------------------------------------------------------------------ src/hotspot/share/oops/cpCache.cpp 157 intx newflags = (value & parameter_size_mask); 158 Atomic::cmpxchg(newflags, &_flags, (intx)0); This is a nice demonstration of why I wanted to include some value preserving integral conversions in cmpxchg, rather than requiring exact type matching in the integral case. There have been some others that I haven't commented on. Apparently we (I) got away with including such conversions in Atomic::add, which I'd forgotten about. And see comment regarding Atomic::sub below. ------------------------------------------------------------------------------ src/hotspot/share/oops/cpCache.hpp 139 volatile Metadata* _f1; // entry specific metadata field [pre-existing] I suspect the type should be Metadata* volatile. And that would eliminate the need for the cast here: 339 Metadata* f1_ord() const { return (Metadata *)OrderAccess::load_acquire(&_f1); } I don't know if there are any other changes needed or desirable around _f1 usage. ------------------------------------------------------------------------------ src/hotspot/share/oops/method.hpp 139 volatile address from_compiled_entry() const { return OrderAccess::load_acquire(&_from_compiled_entry); } 140 volatile address from_compiled_entry_no_trampoline() const; 141 volatile address from_interpreted_entry() const{ return OrderAccess::load_acquire(&_from_interpreted_entry); } [pre-existing] The volatile qualifiers here seem suspect to me. ------------------------------------------------------------------------------ src/hotspot/share/oops/oop.inline.hpp 391 narrowOop old = (narrowOop)Atomic::xchg(val, (narrowOop*)dest); Cast of return type is not needed. 
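To see why casts like the (intx)0 above are needed at all, a grossly simplified stand-in for the templated cmpxchg (the real one lives in atomic.hpp; the __sync builtin is only there to make the sketch complete): all three arguments must deduce the same T, so a bare literal 0, which is an int, clashes with the intx operands.

#include <stdint.h>

typedef intptr_t intx;                      // as in HotSpot's globalDefinitions

template <typename T>
T cmpxchg_sketch(T exchange_value, T volatile* dest, T compare_value) {
  return __sync_val_compare_and_swap(dest, compare_value, exchange_value);
}

int main() {
  volatile intx flags = 0;
  intx newflags = 0x10;
  // cmpxchg_sketch(newflags, &flags, 0);     // error: T deduced as both intx and int
  cmpxchg_sketch(newflags, &flags, (intx)0);  // exact match on all three operands
  return (int)flags;
}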
------------------------------------------------------------------------------ src/hotspot/share/prims/jni.cpp [pre-existing] copy_jni_function_table should be using Copy::disjoint_words_atomic. ------------------------------------------------------------------------------ src/hotspot/share/prims/jni.cpp [pre-existing] 3892 // We're about to use Atomic::xchg for synchronization. Some Zero 3893 // platforms use the GCC builtin __sync_lock_test_and_set for this, 3894 // but __sync_lock_test_and_set is not guaranteed to do what we want 3895 // on all architectures. So we check it works before relying on it. 3896 #if defined(ZERO) && defined(ASSERT) 3897 { 3898 jint a = 0xcafebabe; 3899 jint b = Atomic::xchg(0xdeadbeef, &a); 3900 void *c = &a; 3901 void *d = Atomic::xchg(&b, &c); 3902 assert(a == (jint) 0xdeadbeef && b == (jint) 0xcafebabe, "Atomic::xchg() works"); 3903 assert(c == &b && d == &a, "Atomic::xchg() works"); 3904 } 3905 #endif // ZERO && ASSERT It seems rather strange to be testing Atomic::xchg() here, rather than as part of unit testing Atomic? Fail unit testing => don't try to use... ------------------------------------------------------------------------------ src/hotspot/share/prims/jvmtiRawMonitor.cpp 130 if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { 142 if (_owner == NULL && Atomic::cmpxchg_if_null((void*)Self, &_owner)) { I think these casts aren't needed. _owner is void*, and Self is Thread*, which is implicitly convertible to void*. Similarly here, for the THREAD argument: 280 Contended = Atomic::cmpxchg((void*)THREAD, &_owner, (void*)NULL); 283 Contended = Atomic::cmpxchg((void*)THREAD, &_owner, (void*)NULL); ------------------------------------------------------------------------------ src/hotspot/share/prims/jvmtiRawMonitor.hpp This file is in the webrev, but seems to be unchanged. ------------------------------------------------------------------------------ src/hotspot/share/runtime/atomic.hpp 520 template 521 inline D Atomic::sub(I sub_value, D volatile* dest) { 522 STATIC_ASSERT(IsPointer::value || IsIntegral::value); 523 // Assumes two's complement integer representation. 524 #pragma warning(suppress: 4146) 525 return Atomic::add(-sub_value, dest); 526 } I'm pretty sure this implementation is incorrect. I think it produces the wrong result when I and D are both unsigned integer types and sizeof(I) < sizeof(D). ------------------------------------------------------------------------------ src/hotspot/share/runtime/mutex.cpp 304 intptr_t v = Atomic::cmpxchg((intptr_t)_LBIT, &_LockWord.FullWord, (intptr_t)0); // agro ... _LBIT should probably be intptr_t, rather than an enum. Note that the enum type is unused. The old value here is another place where an implicit widening of same signedness would have been nice. (Such implicit widening doesn't work for enums, since it's unspecified whether they default to signed or unsigned representation, and implementatinos differ.) ------------------------------------------------------------------------------ src/hotspot/share/runtime/mutex.hpp [pre-existing] I think the Address member of the SplitWord union is unused. Looking at AcquireOrPush (and others), I'm wondering whether it *should* be used there, or whether just using intptr_t casts and doing integral arithmetic (as is presently being done) is easier and clearer. Also the _LSBINDEX macro probably ought to be defined in mutex.cpp rather than polluting the global namespace. And technically, that name is reserved word. 
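The failure mode called out above for the negate-then-add formulation of Atomic::sub can be shown without any atomics at all; with the operand type unsigned and narrower than the destination, -sub_value wraps modulo 2^32 and then widens into a huge positive addend (plain arithmetic below stands in for the Atomic::add call):

#include <stdint.h>
#include <stdio.h>

int main() {
  uint64_t dest      = 100;   // D: the 64-bit destination value
  uint32_t sub_value = 1;     // I: narrower unsigned operand

  // What add(-sub_value, &dest) computes once -sub_value (0xFFFFFFFF) has been
  // widened to 64 bits: dest grows by 2^32 - 1 instead of shrinking by 1.
  uint64_t wrong = dest + (uint64_t)(uint32_t)(-sub_value);

  // What the caller actually asked for.
  uint64_t right = dest - sub_value;

  printf("wrong = %llu\n", (unsigned long long)wrong);   // 4294967395
  printf("right = %llu\n", (unsigned long long)right);   // 99
  return 0;
}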
------------------------------------------------------------------------------ src/hotspot/share/runtime/objectMonitor.cpp 252 void * cur = Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL); 409 if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { 1983 ox = (Thread*)Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL); I think the casts of Self aren't needed. ------------------------------------------------------------------------------ src/hotspot/share/runtime/objectMonitor.cpp 995 if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { 1020 if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { I think the casts of THREAD aren't needed. ------------------------------------------------------------------------------ src/hotspot/share/runtime/objectMonitor.hpp 254 markOopDesc* volatile* header_addr(); Why isn't this volatile markOop* ? ------------------------------------------------------------------------------ src/hotspot/share/runtime/synchronizer.cpp 242 Atomic::cmpxchg_if_null((void*)Self, &(m->_owner))) { I think the cast of Self isn't needed. ------------------------------------------------------------------------------ src/hotspot/share/runtime/synchronizer.cpp 992 for (; block != NULL; block = (PaddedEnd *)next(block)) { 1734 for (; block != NULL; block = (PaddedEnd *)next(block)) { [pre-existing] All calls to next() pass a PaddedEnd* and cast the result. How about moving all that behavior into next(). ------------------------------------------------------------------------------ src/hotspot/share/runtime/synchronizer.cpp 1970 if (monitor > (ObjectMonitor *)&block[0] && 1971 monitor < (ObjectMonitor *)&block[_BLOCKSIZE]) { [pre-existing] Are the casts needed here? I think PaddedEnd is derived from ObjectMonitor, so implicit conversions should apply. ------------------------------------------------------------------------------ src/hotspot/share/runtime/synchronizer.hpp 28 #include "memory/padded.hpp" 163 static PaddedEnd * volatile gBlockList; I was going to suggest as an alternative just making gBlockList a file scoped variable in synchronizer.cpp, since it isn't used outside of that file. Except that it is referenced by vmStructs. Curses! ------------------------------------------------------------------------------ src/hotspot/share/runtime/thread.cpp 4707 intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, (intptr_t)0); This and other places suggest LOCKBIT should be defined as intptr_t, rather than as an enum value. The MuxBits enum type is unused. And the cast of 0 is another case where implicit widening would be nice. ------------------------------------------------------------------------------ src/hotspot/share/services/mallocSiteTable.cpp 261 bool MallocSiteHashtableEntry::atomic_insert(const MallocSiteHashtableEntry* entry) { 262 return Atomic::cmpxchg_if_null(entry, (const MallocSiteHashtableEntry**)&_next); 263 } I think the problem here that is leading to the cast is that atomic_insert is taking a const T*. Note that it's only caller passes a non-const T*. 
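On the synchronizer.cpp question above about whether the (ObjectMonitor *) casts are needed: a generic sketch (not the real padded.hpp) of a padded wrapper that derives from its element type; the derived-to-base pointer conversion is implicit, which is the basis of the comment.

template <class T>
class PaddedSketch : public T {
  char _pad[64];   // keeps neighbouring elements on separate cache lines
};

struct MonitorSketch { void* _header; };

int main() {
  PaddedSketch<MonitorSketch> block[4];
  MonitorSketch* first = &block[0];  // implicit PaddedSketch<MonitorSketch>* to MonitorSketch*
  MonitorSketch* last  = &block[3];  // same conversion, still no cast
  return (first < last) ? 0 : 1;     // range-style comparisons work on the base type
}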
------------------------------------------------------------------------------ From david.holmes at oracle.com Fri Oct 13 00:55:53 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 13 Oct 2017 10:55:53 +1000 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> Message-ID: <2e8d66a6-24c3-b4de-e187-47a9e582238c@oracle.com> Hi Kim, Very detailed analysis! A few things have already been updated by Coleen. Many of the issues with possibly incorrect/inappropriate types really need to be dealt with separately - they go beyond the basic renaming - by their component teams. Similarly any ABA issues - which are likely non-issues but not clearly documented - should be handled separately. And the potential race you highlight below - though to be honest I couldn't match your statements with the code as shown. Thanks, David On 13/10/2017 9:17 AM, Kim Barrett wrote: >> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >> >> Summary: With the new template functions these are unnecessary. >> >> The changes are mostly s/_ptr// and removing the cast to return type. There weren't many types that needed to be improved to match the template version of the function. Some notes: >> 1. replaced CASPTR with Atomic::cmpxchg() in mutex.cpp, rearranging arguments. >> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null. I disliked the first name because it's not explicit from the callers that there's an underlying cas. If people want to fight, I'll remove the function and use cmpxchg because there are only a couple places where this is a little nicer. >> 3. Added Atomic::sub() >> >> Tested with JPRT, mach5 tier1-5 on linux,windows and solaris. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8188220 >> >> Thanks, >> Coleen > > I looked harder at the potential ABA problems, and believe they are > okay. There can be multiple threads doing pushes, and there can be > multiple threads doing pops, but not both at the same time. > > ------------------------------------------------------------------------------ > src/hotspot/cpu/zero/cppInterpreter_zero.cpp > 279 if (Atomic::cmpxchg(monitor, lockee->mark_addr(), disp) != disp) { > > How does this work? monitor and disp seem like they have unrelated > types? Given that this is zero-specific code, maybe this hasn't been > tested? > > Similarly here: > 423 if (Atomic::cmpxchg(header, rcvr->mark_addr(), lock) != lock) { > > ------------------------------------------------------------------------------ > src/hotspot/share/asm/assembler.cpp > 239 dcon->value_fn = cfn; > > Is it actually safe to remove the atomic update? If multiple threads > performing the assignment *are* possible (and I don't understand the > context yet, so don't know the answer to that), then a bare non-atomic > assignment is a race, e.g. undefined behavior. > > Regardless of that, I think the CAST_FROM_FN_PTR should be retained. > > ------------------------------------------------------------------------------ > src/hotspot/share/classfile/classLoaderData.cpp > 167 Chunk* head = (Chunk*) OrderAccess::load_acquire(&_head); > > I think the cast to Chunk* is no longer needed. 
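A toy illustration of why that cast can go: with an acquire load templated on the pointee (std::atomic here stands in for the OrderAccess primitive), the call site gets back a typed pointer directly.

#include <atomic>
#include <cassert>

// Toy stand-in for a templated acquire load; the deduced return type is what
// removes the "(Chunk*)" cast at call sites like the one quoted above.
template <typename T>
T* load_acquire_sketch(const std::atomic<T*>& slot) {
  return slot.load(std::memory_order_acquire);
}

struct Chunk { int payload; };

int main() {
  Chunk c = { 42 };
  std::atomic<Chunk*> head(&c);
  Chunk* h = load_acquire_sketch(head);   // no cast needed
  assert(h->payload == 42);
  return 0;
}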
> > ------------------------------------------------------------------------------ > src/hotspot/share/classfile/classLoaderData.cpp > 946 ClassLoaderData* old = Atomic::cmpxchg(cld, cld_addr, (ClassLoaderData*)NULL); > 947 if (old != NULL) { > 948 delete cld; > 949 // Returns the data. > 950 return old; > 951 } > > That could instead be > > if (!Atomic::replace_if_null(cld, cld_addr)) { > delete cld; // Lost the race. > return *cld_addr; // Use the winner's value. > } > > And apparently the caller of CLDG::add doesn't care whether the > returned CLD has actually been added to the graph yet. If that's not > true, then there's a bug here, since a race loser might return a > winner's value before the winner has actually done the insertion. > > ------------------------------------------------------------------------------ > src/hotspot/share/classfile/verifier.cpp > 71 static void* verify_byte_codes_fn() { > 72 if (OrderAccess::load_acquire(&_verify_byte_codes_fn) == NULL) { > 73 void *lib_handle = os::native_java_library(); > 74 void *func = os::dll_lookup(lib_handle, "VerifyClassCodesForMajorVersion"); > 75 OrderAccess::release_store(&_verify_byte_codes_fn, func); > 76 if (func == NULL) { > 77 _is_new_verify_byte_codes_fn = false; > 78 func = os::dll_lookup(lib_handle, "VerifyClassCodes"); > 79 OrderAccess::release_store(&_verify_byte_codes_fn, func); > 80 } > 81 } > 82 return (void*)_verify_byte_codes_fn; > 83 } > > [pre-existing] > > I think this code has race problems; a caller could unexpectedly and > inappropriately return NULL. Consider the case where there is no > VerifyClassCodesForMajorVersion, but there is VerifyClassCodes. > > The variable is initially NULL. > > Both Thread1 and Thread2 reach line 73, having both seen a NULL value > for the variable. > > Thread1 reaches line 80, setting the variable to VerifyClassCodes. > > Thread2 reaches line 76, resetting the variable to NULL. > > Thread1 reads the now (momentarily) NULL value and returns it. > > I think the first release_store should be conditional on func != NULL. > Also, the usage of _is_new_verify_byte_codes_fn seems suspect. > And a minor additional nit: the cast in the return is unnecessary. > > ------------------------------------------------------------------------------ > src/hotspot/share/code/nmethod.cpp > 1664 nmethod* observed_mark_link = _oops_do_mark_link; > 1665 if (observed_mark_link == NULL) { > 1666 // Claim this nmethod for this thread to mark. > 1667 if (Atomic::cmpxchg_if_null(NMETHOD_SENTINEL, &_oops_do_mark_link)) { > > With these changes, the only use of observed_mark_link is in the if. > I'm not sure that variable is really useful anymore, e.g. just use > > if (_oops_do_mark_link == NULL) { > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp > > In CMSCollector::par_take_from_overflow_list, if BUSY and prefix were > of type oopDesc*, I think there would be a whole lot fewer casts and > cast_to_oop's. Later on, I think suffix_head, observed_overflow_list, > and curr_overflow_list could also be oopDesc* instead of oop to > eliminate more casts. > > And some similar changes in CMSCollector::par_push_on_overflow_list. > > And similarly in parNewGeneration.cpp, in push_on_overflow_list and > take_from_overflow_list_work. > > As noted in the comments for JDK-8165857, the lists and "objects" > involved here aren't really oops, but rather the shattered remains of > oops. 
The suggestion there was to use HeapWord* and carry through the > fanout; what was actually done was to change _overflow_list to > oopDesc* to minimize fanout, even though that's kind of lying to the > type system. Now, with the cleanup of cmpxchg_ptr and such, we're > paying the price of doing the minimal thing back then. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp > 7960 Atomic::add(-n, &_num_par_pushes); > > Atomic::sub > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/cms/parNewGeneration.cpp > 1455 Atomic::add(-n, &_num_par_pushes); > > Atomic::sub > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/dirtyCardQueue.cpp > 283 void* actual = Atomic::cmpxchg(next, &_cur_par_buffer_node, nd); > ... > 289 nd = static_cast(actual); > > Change actual's type to BufferNode* and remove the cast on line 289. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1CollectedHeap.cpp > > [pre-existing] > 3499 old = (CompiledMethod*)_postponed_list; > > I think that cast is only needed because > G1CodeCacheUnloadingTask::_postponed_list is incorrectly typed as > "volatile CompiledMethod*", when I think it ought to be > "CompiledMethod* volatile". > > I think G1CodeCacheUnloading::_claimed_nmethod is similarly mis-typed, > with a similar should not be needed cast: > 3530 first = (CompiledMethod*)_claimed_nmethod; > > and another for _postponed_list here: > 3552 claim = (CompiledMethod*)_postponed_list; > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1HotCardCache.cpp > 77 jbyte* previous_ptr = (jbyte*)Atomic::cmpxchg(card_ptr, > > I think the cast of the cmpxchg result is no longer needed. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp > 254 char* touch_addr = (char*)Atomic::add(actual_chunk_size, &_cur_addr) - actual_chunk_size; > > I think the cast of the add result is no longer needed. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1StringDedup.cpp > 213 return (size_t)Atomic::add(partition_size, &_next_bucket) - partition_size; > > I think the cast of the add result is no longer needed. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/heapRegionRemSet.cpp > 200 PerRegionTable* res = > 201 Atomic::cmpxchg(nxt, &_free_list, fl); > > Please remove the line break, now that the code has been simplified. > > But wait, doesn't this alloc exhibit classic ABA problems? I *think* > this works because alloc and bulk_free are called in different phases, > never overlapping. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/sparsePRT.cpp > 295 SparsePRT* res = > 296 Atomic::cmpxchg(sprt, &_head_expanded_list, hd); > and > 307 SparsePRT* res = > 308 Atomic::cmpxchg(next, &_head_expanded_list, hd); > > I'd rather not have the line breaks in these either. > > And get_from_expanded_list also appears to have classic ABA problems. > I *think* this works because add_to_expanded_list and > get_from_expanded_list are called in different phases, never > overlapping. 
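For readers unfamiliar with the "classic ABA problems" referred to in these two comments, a minimal sketch of the hazard on a CAS-maintained list (std::atomic stands in for Atomic::cmpxchg); as noted, the HotSpot lists only get away with it because pushes and pops run in disjoint phases.

#include <atomic>

struct Node { Node* next; };

// A pop that is fine in isolation but ABA-prone if another thread can pop
// this node and push it back between our load of 'next' and the CAS: the CAS
// compares only the head pointer value, not the identity of the list behind it.
Node* pop_aba_prone(std::atomic<Node*>& head) {
  Node* old_head = head.load();
  while (old_head != nullptr) {
    Node* next = old_head->next;                  // may be stale by CAS time
    if (head.compare_exchange_weak(old_head, next)) {
      return old_head;                            // ABA: 'next' may no longer be on the list
    }
    // on failure compare_exchange_weak reloaded old_head; retry
  }
  return nullptr;
}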
> > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/taskqueue.inline.hpp > 262 return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, > 263 (volatile intptr_t *)&_data, > 264 (intptr_t)old_age._data); > > This should be > > return Atomic::cmpxchg(new_age._data, &_data, old_age._data); > > ------------------------------------------------------------------------------ > src/hotspot/share/interpreter/bytecodeInterpreter.cpp > This doesn't have any casts, which I think is correct. > 708 if (Atomic::cmpxchg(header, rcvr->mark_addr(), mark) == mark) { > > but these do. > 718 if (Atomic::cmpxchg((void*)new_header, rcvr->mark_addr(), mark) == mark) { > 737 if (Atomic::cmpxchg((void*)new_header, rcvr->mark_addr(), header) == header) { > > I'm not sure how the ones with casts even compile? mark_addr() seems > to be a markOop*, which is a markOopDesc**, where markOopDesc is a > class. void* is not implicitly convertible to markOopDesc*. > > Hm, this entire file is #ifdef CC_INTERP. Is this zero-only code? Or > something like that? > > Similarly here: > 906 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) { > and > 917 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) { > 935 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) { > > and here: > 1847 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) { > 1858 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) { > 1878 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) { > > and here: > 1847 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) { > 1858 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) { > 1878 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) { > > ------------------------------------------------------------------------------ > src/hotspot/share/memory/metaspace.cpp > 1502 size_t value = OrderAccess::load_acquire(&_capacity_until_GC); > ... > 1537 return (size_t)Atomic::sub((intptr_t)v, &_capacity_until_GC); > > These and other uses of _capacity_until_GC suggest that variable's > type should be size_t rather than intptr_t. Note that I haven't done > a careful check of uses to see if there are any places where such a > change would cause problems. > > ------------------------------------------------------------------------------ > src/hotspot/share/oops/constantPool.cpp > 229 OrderAccess::release_store((Klass* volatile *)adr, k); > 246 OrderAccess::release_store((Klass* volatile *)adr, k); > 514 OrderAccess::release_store((Klass* volatile *)adr, k); > > Casts are not needed. > > ------------------------------------------------------------------------------ > src/hotspot/share/oops/constantPool.hpp > 148 volatile intptr_t adr = OrderAccess::load_acquire(obj_at_addr_raw(which)); > > [pre-existing] > Why is adr declared volatile? > > ------------------------------------------------------------------------------ > src/hotspot/share/oops/cpCache.cpp > 157 intx newflags = (value & parameter_size_mask); > 158 Atomic::cmpxchg(newflags, &_flags, (intx)0); > > This is a nice demonstration of why I wanted to include some value > preserving integral conversions in cmpxchg, rather than requiring > exact type matching in the integral case. There have been some others > that I haven't commented on. 
Apparently we (I) got away with > including such conversions in Atomic::add, which I'd forgotten about. > And see comment regarding Atomic::sub below. > > ------------------------------------------------------------------------------ > src/hotspot/share/oops/cpCache.hpp > 139 volatile Metadata* _f1; // entry specific metadata field > > [pre-existing] > I suspect the type should be Metadata* volatile. And that would > eliminate the need for the cast here: > > 339 Metadata* f1_ord() const { return (Metadata *)OrderAccess::load_acquire(&_f1); } > > I don't know if there are any other changes needed or desirable around > _f1 usage. > > ------------------------------------------------------------------------------ > src/hotspot/share/oops/method.hpp > 139 volatile address from_compiled_entry() const { return OrderAccess::load_acquire(&_from_compiled_entry); } > 140 volatile address from_compiled_entry_no_trampoline() const; > 141 volatile address from_interpreted_entry() const{ return OrderAccess::load_acquire(&_from_interpreted_entry); } > > [pre-existing] > The volatile qualifiers here seem suspect to me. > > ------------------------------------------------------------------------------ > src/hotspot/share/oops/oop.inline.hpp > 391 narrowOop old = (narrowOop)Atomic::xchg(val, (narrowOop*)dest); > > Cast of return type is not needed. > > ------------------------------------------------------------------------------ > src/hotspot/share/prims/jni.cpp > > [pre-existing] > > copy_jni_function_table should be using Copy::disjoint_words_atomic. > > ------------------------------------------------------------------------------ > src/hotspot/share/prims/jni.cpp > > [pre-existing] > > 3892 // We're about to use Atomic::xchg for synchronization. Some Zero > 3893 // platforms use the GCC builtin __sync_lock_test_and_set for this, > 3894 // but __sync_lock_test_and_set is not guaranteed to do what we want > 3895 // on all architectures. So we check it works before relying on it. > 3896 #if defined(ZERO) && defined(ASSERT) > 3897 { > 3898 jint a = 0xcafebabe; > 3899 jint b = Atomic::xchg(0xdeadbeef, &a); > 3900 void *c = &a; > 3901 void *d = Atomic::xchg(&b, &c); > 3902 assert(a == (jint) 0xdeadbeef && b == (jint) 0xcafebabe, "Atomic::xchg() works"); > 3903 assert(c == &b && d == &a, "Atomic::xchg() works"); > 3904 } > 3905 #endif // ZERO && ASSERT > > It seems rather strange to be testing Atomic::xchg() here, rather than > as part of unit testing Atomic? Fail unit testing => don't try to > use... > > ------------------------------------------------------------------------------ > src/hotspot/share/prims/jvmtiRawMonitor.cpp > 130 if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { > 142 if (_owner == NULL && Atomic::cmpxchg_if_null((void*)Self, &_owner)) { > > I think these casts aren't needed. _owner is void*, and Self is > Thread*, which is implicitly convertible to void*. > > Similarly here, for the THREAD argument: > 280 Contended = Atomic::cmpxchg((void*)THREAD, &_owner, (void*)NULL); > 283 Contended = Atomic::cmpxchg((void*)THREAD, &_owner, (void*)NULL); > > ------------------------------------------------------------------------------ > src/hotspot/share/prims/jvmtiRawMonitor.hpp > > This file is in the webrev, but seems to be unchanged. 
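On the volatile-placement comments (G1CodeCacheUnloadingTask::_postponed_list and _claimed_nmethod above, and the ConstantPoolCacheEntry::_f1 one): the two declarations differ only in where volatile binds, but only one of them matches a load_acquire-style template without casts. A standalone illustration with invented field names and a simplified, ordering-free load_acquire stand-in (not the OrderAccess implementation):

  class Metadata;

  // Pointer to volatile Metadata: the pointee is volatile, the field is not.
  volatile Metadata* _f1_pointee_volatile;

  // Volatile pointer to Metadata: the field itself is volatile, which is what
  // a field written concurrently and read with load_acquire really wants.
  Metadata* volatile _f1_field_volatile;

  // Shape of a templated acquire-load (memory ordering elided; only the type
  // deduction matters for this illustration).
  template <typename T>
  T load_acquire(T const volatile* addr) { return *addr; }

  Metadata* f1_ord() {
    // return load_acquire(&_f1_pointee_volatile);  // T deduces to volatile Metadata*,
    //                                              // so the caller needs a cast of the
    //                                              // result -- the pattern flagged above
    return load_acquire(&_f1_field_volatile);       // T deduces to Metadata*, no cast
  }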
> > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/atomic.hpp > 520 template > 521 inline D Atomic::sub(I sub_value, D volatile* dest) { > 522 STATIC_ASSERT(IsPointer::value || IsIntegral::value); > 523 // Assumes two's complement integer representation. > 524 #pragma warning(suppress: 4146) > 525 return Atomic::add(-sub_value, dest); > 526 } > > I'm pretty sure this implementation is incorrect. I think it produces > the wrong result when I and D are both unsigned integer types and > sizeof(I) < sizeof(D). > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/mutex.cpp > 304 intptr_t v = Atomic::cmpxchg((intptr_t)_LBIT, &_LockWord.FullWord, (intptr_t)0); // agro ... > > _LBIT should probably be intptr_t, rather than an enum. Note that the > enum type is unused. The old value here is another place where an > implicit widening of same signedness would have been nice. (Such > implicit widening doesn't work for enums, since it's unspecified > whether they default to signed or unsigned representation, and > implementatinos differ.) > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/mutex.hpp > > [pre-existing] > > I think the Address member of the SplitWord union is unused. Looking > at AcquireOrPush (and others), I'm wondering whether it *should* be > used there, or whether just using intptr_t casts and doing integral > arithmetic (as is presently being done) is easier and clearer. > > Also the _LSBINDEX macro probably ought to be defined in mutex.cpp > rather than polluting the global namespace. And technically, that > name is reserved word. > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/objectMonitor.cpp > 252 void * cur = Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL); > 409 if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { > 1983 ox = (Thread*)Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL); > > I think the casts of Self aren't needed. > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/objectMonitor.cpp > 995 if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { > 1020 if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { > > I think the casts of THREAD aren't needed. > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/objectMonitor.hpp > 254 markOopDesc* volatile* header_addr(); > > Why isn't this volatile markOop* ? > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/synchronizer.cpp > 242 Atomic::cmpxchg_if_null((void*)Self, &(m->_owner))) { > > I think the cast of Self isn't needed. > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/synchronizer.cpp > 992 for (; block != NULL; block = (PaddedEnd *)next(block)) { > 1734 for (; block != NULL; block = (PaddedEnd *)next(block)) { > > [pre-existing] > All calls to next() pass a PaddedEnd* and cast the > result. How about moving all that behavior into next(). > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/synchronizer.cpp > 1970 if (monitor > (ObjectMonitor *)&block[0] && > 1971 monitor < (ObjectMonitor *)&block[_BLOCKSIZE]) { > > [pre-existing] > Are the casts needed here? 
I think PaddedEnd is > derived from ObjectMonitor, so implicit conversions should apply. > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/synchronizer.hpp > 28 #include "memory/padded.hpp" > 163 static PaddedEnd * volatile gBlockList; > > I was going to suggest as an alternative just making gBlockList a file > scoped variable in synchronizer.cpp, since it isn't used outside of > that file. Except that it is referenced by vmStructs. Curses! > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/thread.cpp > 4707 intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, (intptr_t)0); > > This and other places suggest LOCKBIT should be defined as intptr_t, > rather than as an enum value. The MuxBits enum type is unused. > > And the cast of 0 is another case where implicit widening would be nice. > > ------------------------------------------------------------------------------ > src/hotspot/share/services/mallocSiteTable.cpp > 261 bool MallocSiteHashtableEntry::atomic_insert(const MallocSiteHashtableEntry* entry) { > 262 return Atomic::cmpxchg_if_null(entry, (const MallocSiteHashtableEntry**)&_next); > 263 } > > I think the problem here that is leading to the cast is that > atomic_insert is taking a const T*. Note that it's only caller passes > a non-const T*. > > ------------------------------------------------------------------------------ > From david.holmes at oracle.com Fri Oct 13 03:08:23 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 13 Oct 2017 13:08:23 +1000 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <9956F9D0-B01B-44FE-AE56-527907816436@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> <5f7f3d85-db48-fe6e-28f5-e3f4858f33e8@oracle.com> <799205ae-ba9f-ce3a-8dd6-1a55e32689df@oracle.com> <9956F9D0-B01B-44FE-AE56-527907816436@oracle.com> Message-ID: <20ef0bac-1942-b29f-a9e2-4ea4d4f81cd2@oracle.com> Hi Bob, On 13/10/2017 1:43 AM, Bob Vandette wrote: > >> On Oct 11, 2017, at 9:04 PM, David Holmes wrote: >> >> Hi Bob, >> >> On 12/10/2017 5:11 AM, Bob Vandette wrote: >>> Here?s an updated webrev for this RFE that contains changes and cleanups based on feedback I?ve received so far. >>> I?m still investigating the best approach for reacting to cpu shares and quotas. I do not believe doing nothing is the answer. >> >> I do. :) Let me try this again. When you run outside of a container you don't get 100% of the CPUs - you have to share with whatever else is running on the system. You get a fraction of CPU time based on the load. We don't try to communicate load information to the VM/application so it can adapt. Within a container setting shares/quotas is just a way of setting an artificial load. So why should we be treating it any differently? > Because today we optimize for a lightly loaded system and when running serverless applications in containers we should be > optimizing for a fully loaded system. If developers don?t want this, then don?t use shares or quotas and you?ll have exactly > the behavior you have today. 
I think we just have to document the new behavior (and how to turn it off) so people know what > to expect. The person deploying the app may not have control over how the app is deployed in terms of shares/quotas. It all depends how (and who) manages the containers. This is a big part of my problem/concerns here that I don't know exactly how all this is organized and who knows what in advance and what they can control. But I'll let this drop, other than raising an additional concern. I don't think just allowing the user to hardwire the number of processors to use will necessarily solve the problem with what available_processors() returns. I'm concerned the execution of the VM may occur in a context where the number of processors is not known in advance, and the user can not disable shares/quotas. In that case we may need to have a flag that says to ignore shares/quotas in the processor count calculation. > You seem to discount the added cost of 100s of VMs creating lots of un-necessaary threads. In the current JDK 10 code base, > In a heavily loaded system with 88 processors, VmData grows from 60MBs (1 cpu) to 376MB (88 cpus). This is only mapped > memory and it depends heavily on how deep in the stack these threads go before it impacts VmRSS but it shows the potential downside > of having 100s of VMs thinking they each own the entire machine. I agree that the default ergonomics does not scale well. Anyone doing any serious Java deployment tunes the VM explicitly and does not rely on the defaults. How will they do that in a container environment? I don't know. I would love to see some actual deployment scenarios/experiences for this to understand things better. > I haven?t even done any experiments to determine the added context switching cost if the VM decides to use excessive > pthreads. > >> >> That's not to say an API to provide load/shares/quota information may not be useful, but that is a separate issue to what the "active processor count" should report. > I don?t have a problem with active processor count reporting the number of processors we have, but I do have a problem > with our current usage of this information within the VM and Core libraries. That is a somewhat separate issue. One worth pursuing separately. >> >>> http://cr.openjdk.java.net/~bobv/8146115/webrev.01 >>> Updates: >>> 1. I had to move the processing of AggressiveHeap since the container memory size needs to be known before this can be processed. >> >> I don't like the placement of this - we don't call os:: init functions from inside Arguments - we manage the initialization sequence from Threads::create_vm. Seems to me that container initialization can/should happen in os::init_before_ergo, and the AggressiveHeap processing can occur at the start of Arguments::apply_ergo(). >> >> That said we need to be sure nothing touched by set_aggressive_heap_flags will be used before we now reach that code - there are a lot of flags being set in there. > > This is exactly the reason why I put the call where it did. I put the call to set_aggressive_heap_flags in finalize_vm_init_args > because that is exactly what this call is doing. It?s finalizing flags used after the parsing. The impacted flags are definitely being > used shortly after and before init_before_ergo is called. I see that now and it is very unfortunate because I really do not like what you had to do here. 
As you can tell from the logic in create_vm we have always refactored to ensure we can progressively manage the interleaving of OS initialization with Arguments processing. So having a deep part of Argument processing go off and call some more OS initialization is not nice. That said I can't see a way around it without very unreasonable refactoring. But I do have a couple of changes I'd like to request please: 1. Move the call to os::initialize_container_support() up a level to before the call to finalize_vm_init_args(), with a more elaborate comment: // We need to ensure processor and memory resources have been properly // configured - which may rely on arguments we just processed - before // doing the final argument processing. Any argument processing that // needs to know about processor and memory resources must occur after // this point. os::initialize_container_support(); // Do final processing now that all arguments have been parsed result = finalize_vm_init_args(patch_mod_javabase); 2. Simplify and modify os.hpp as follows: + LINUX_ONLY(static void pd_initialize_container_support();) public: static void init(void); // Called before command line parsing + static void initialize_container_support() { // Called during command line parsing + LINUX_ONLY(pd_initialize_container_support();) + } static void init_before_ergo(void); // Called after command line parsing // before VM ergonomics 3. In thread.cpp add a comment here: // Parse arguments + // Note: this internally calls os::initialize_container_support() jint parse_result = Arguments::parse(args); Thanks. > >> >>> 2. I no longer use the cpuset.cpus contents since sched_getaffinity reports the correct results >>> even if someone manually updates the cgroup data. I originally didn?t think this was the case since >>> sched_setaffinity didn?t automatically update the cpuset file contents but the inverse is true. >> >> Ok. >> >>> 3. I ifdef?d the container function support in src/hotspot/share/runtime/os.hpp to avoid putting stubs in all other os >>> platform directories. I can do this if it?s absolutely necessary. >> >> You should not need to do this if initialization moves as I suggested above. os::init_before_ergo() in os_linux.cpp can call OSContainer::init(). > >> No need for os::initialize_container_support() or os::pd_initialize_container_support. > > But os::init_before_ergo is in shared code. Yep my bad - point is moot now anyway. >> src/hotspot/os/linux/os_linux.cpp/.hpp >> >> 187 log_trace(os)("available container memory: " JULONG_FORMAT, avail_mem); >> 188 return avail_mem; >> 189 } else { >> 190 log_debug(os,container)("container memory usage call failed: " JLONG_FORMAT, mem_usage); >> >> Why "trace" (the third logging level) to show the information, but "debug" (the second level) to show failed calls? You use debug in other files for basic info. Overall I'm unclear on your use of debug versus trace for the logging. > > I use trace for noisy information that is not reporting errors and debug for failures that are informational and not fatal. > In this case, the call could return -1 or -2. -1 is unlimited and -2 is an error. In either case we fallback to the > standard system call to get available memory. I would have used warning but since these messages were occurring > during a test run causing test failures. Okay. Thanks for clarifying. >> >> --- >> >> src/hotspot/os/linux/osContainer_linux.cpp >> >> Dead code: >> >> 376 #if 0 >> 377 os::Linux::print_container_info(tty); >> ... 
>> 390 #endif > > I left it in for standalone testing. Should I use some other #if? We don't generally leave in dead code in the runtime code. Do you see this as useful after you've finalized the changes? Is this testing just for showing the logging? Is it worth making this a logging controlled call? Is it suitable for a Gtest test? Thanks, David ----- > Bob. > >> >> Thanks, >> David >> >>> Bob. > From goetz.lindenmaier at sap.com Fri Oct 13 06:38:59 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 13 Oct 2017 06:38:59 +0000 Subject: RFR(M): 8189102: All tools should support -?, -h and --help In-Reply-To: References: Message-ID: <2cd7785d6dad442e90d403b2eb96c588@sap.com> Hi Vladimir, added that for jaotc, thanks! Best regards, Goetz. > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > Behalf Of Vladimir Kozlov > Sent: Donnerstag, 12. Oktober 2017 01:04 > To: hotspot-dev at openjdk.java.net > Subject: Re: RFR(M): 8189102: All tools should support -?, -h and --help > > You missed AOT tool jaotc: > > http://hg.openjdk.java.net/jdk10/hs/file/44117bc2bedf/src/jdk.aot/share/cl > asses/jdk.tools.jaotc/src/jdk/tools/jaotc/Options.java#l230 > > }, new Option(" --help Print this usage message", false, "--help", > "-h", "-?") { > > Vladimir > > On 10/11/17 1:06 PM, Lindenmaier, Goetz wrote: > > Hi > > > > The tools in jdk should all show the same behavior wrt. help flags. > > This change normalizes the help flags of a row of the tools in the jdk. > > Java accepts -?, -h and --help, thus I changed the tools to support > > these, too. Some tools exited with '1' after displaying the help message, > > I turned this to '0'. > > > > Maybe this is not the right mailing list for this, please advise. > > > > Please review this change. I please need a sponsor. > > http://cr.openjdk.java.net/~goetz/wr17/8189102- > helpMessage/webrev.01/ > > > > In detail, this fixes the help message of the following tools: > > jar -? -h --help; added -?. > > jarsigner -? -h --help; added --help. -help accepted but not documented. > > javac -? --help; added -?. Removed -help. -h is taken for other > purpose > > javadoc -? -h --help; added -h -?. Removed -help > > javap -? -h --help; added -h. -help accepted but no more documented. > > jcmd -? -h --help; added -? --help. -help accepted but no more > documented. Changed return value to '0' > > jdb -? -h --help; added -? -h --help. -help accepted but no more > documented. > > jdeprscan -? -h --help; added -? > > jinfo -? -h --help; added -? --help. -help accepted but no more > documented. > > jjs -h --help; Replaced -help by --help. Adding more not straight > forward. > > jps -? -h --help; added -? --help. -help accepted but no more > documented. > > jshell -? -h --help; added -? > > jstat -? -h --help; added -h --help. -help accepted but no more > documented. > > > > Best regards, > > Goetz. > > From thomas.schatzl at oracle.com Fri Oct 13 13:04:21 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 13 Oct 2017 15:04:21 +0200 Subject: RFR(M) 8186834:Expanding old area without full GC in parallel GC In-Reply-To: References: Message-ID: <1507899861.3162.12.camel@oracle.com> Hi, On Tue, 2017-08-29 at 00:20 +0900, Michihiro Horie wrote: > Dear all, > > Would you please review the following change? > bug: https://bugs.openjdk.java.net/browse/JDK-8186834 > webrev: http://cr.openjdk.java.net/~mhorie/8186834/webrev.00/ > > In parallel GC, old area is expanded only after a full GC occurs. 
> I am wondering if we could give an option to expand old area without > full GC. So, I added an option > UseAdaptiveGenerationSizePolicyBeforeMajorCollection Sorry for the late (and probably stupid) question, but what is the difference (in performance) to simply set -Xms==-Xmx here? And why not make the (first) full gc expand the heap more aggressively? (I think there is at least one way to do that, something like Min/MaxFreeHeapRatio or so, I can look it up if needed). Thanks, Thomas > Following is a simple micro benchmark I used to see the benefit of > this change. > As a result, pause time of full GC reduced by 30%. Full GC count > reduced by 54%. > Elapsed time reduced by 7%. > > import java.util.HashMap; > import java.util.Map; > public class HeapExpandTest { > ? static Map map = new HashMap<>(); > ? public static void main(String[] args) throws Exception { > ????long start = System.currentTimeMillis(); > ????for (int i = 0; i < 2200; ++i) { > ??????map.put(i, new byte[1024*1024]); // 1MB > ????} > ????System.out.println("elapsed= " + (System.currentTimeMillis() - > start)); > ? } > } > > JVM options: -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy > -XX:ParallelGCThreads=8 -Xms64m -Xmx3g > -XX:+UseAdaptiveGenerationSizePolicyBeforeMajorCollection From bob.vandette at oracle.com Fri Oct 13 13:14:19 2017 From: bob.vandette at oracle.com (Bob Vandette) Date: Fri, 13 Oct 2017 09:14:19 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <20ef0bac-1942-b29f-a9e2-4ea4d4f81cd2@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <33d617e7-7ec4-ebde-efa1-5602189e8470@oracle.com> <12909a67-6876-a40c-85b9-b959ed9f02df@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> <5f7f3d85-db48-fe6e-28f5-e3f4858f33e8@oracle.com> <799205ae-ba9f-ce3a-8dd6-1a55e32689df@oracle.com> <9956F9D0-B01B-44FE-AE56-527907816436@oracle.com> <20ef0bac-1942-b29f-a9e2-4ea4d4f81cd2@oracle.com> Message-ID: > On Oct 12, 2017, at 11:08 PM, David Holmes wrote: > > Hi Bob, > > On 13/10/2017 1:43 AM, Bob Vandette wrote: >>> On Oct 11, 2017, at 9:04 PM, David Holmes wrote: >>> >>> Hi Bob, >>> >>> On 12/10/2017 5:11 AM, Bob Vandette wrote: >>>> Here?s an updated webrev for this RFE that contains changes and cleanups based on feedback I?ve received so far. >>>> I?m still investigating the best approach for reacting to cpu shares and quotas. I do not believe doing nothing is the answer. >>> >>> I do. :) Let me try this again. When you run outside of a container you don't get 100% of the CPUs - you have to share with whatever else is running on the system. You get a fraction of CPU time based on the load. We don't try to communicate load information to the VM/application so it can adapt. Within a container setting shares/quotas is just a way of setting an artificial load. So why should we be treating it any differently? >> Because today we optimize for a lightly loaded system and when running serverless applications in containers we should be >> optimizing for a fully loaded system. If developers don?t want this, then don?t use shares or quotas and you?ll have exactly >> the behavior you have today. I think we just have to document the new behavior (and how to turn it off) so people know what >> to expect. 
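For readers following along, the knobs being argued about are cgroup v1's cpu.cfs_quota_us / cpu.cfs_period_us (a hard cap) and cpu.shares (a relative weight). The sketch below is only one possible mapping to an "effective CPU" count; it is not the code in the webrev, the /sys/fs/cgroup/cpu paths and the shares/1024 convention are assumptions, and whether shares should be treated as a cap at all is exactly the policy question being debated here:

  #include <algorithm>
  #include <cstdio>
  #include <thread>

  // Read a single integer from a cgroup file; -1 on any failure.
  static long read_cgroup_value(const char* path) {
    FILE* f = std::fopen(path, "r");
    if (f == nullptr) return -1;
    long v = -1;
    if (std::fscanf(f, "%ld", &v) != 1) v = -1;
    std::fclose(f);
    return v;
  }

  int effective_processor_count() {
    int limit = (int)std::thread::hardware_concurrency();   // host CPU count
    long quota  = read_cgroup_value("/sys/fs/cgroup/cpu/cpu.cfs_quota_us");
    long period = read_cgroup_value("/sys/fs/cgroup/cpu/cpu.cfs_period_us");
    long shares = read_cgroup_value("/sys/fs/cgroup/cpu/cpu.shares");

    if (quota > 0 && period > 0) {
      // The quota is a hard cap: round up, e.g. 150ms/100ms -> 2 CPUs.
      limit = std::min(limit, (int)((quota + period - 1) / period));
    } else if (shares > 0) {
      // Shares are only a weight; mapping shares/1024 to a CPU count is the
      // "optimize for a fully loaded system" policy choice argued about above.
      limit = std::min(limit, std::max(1, (int)(shares / 1024)));
    }
    return std::max(1, limit);
  }

A real implementation would discover the cgroup mount point from /proc/self/mountinfo rather than hard-coding the path, and would leave a switch to fall back to the host count.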
> > The person deploying the app may not have control over how the app is deployed in terms of shares/quotas. It all depends how (and who) manages the containers. This is a big part of my problem/concerns here that I don't know exactly how all this is organized and who knows what in advance and what they can control. > > But I'll let this drop, other than raising an additional concern. I don't think just allowing the user to hardwire the number of processors to use will necessarily solve the problem with what available_processors() returns. I'm concerned the execution of the VM may occur in a context where the number of processors is not known in advance, and the user can not disable shares/quotas. In that case we may need to have a flag that says to ignore shares/quotas in the processor count calculation. I?m not sure that?s a high probability issue. It?s my understanding that whoever is configuring the container management will be specifying the resources required to run these applications which comes along with a guarantee of these resources. If this issue does come up, I do have the -XX:-UseContainerSupport big switch that turns all of this off. It will however disable the memory support as well. > >> You seem to discount the added cost of 100s of VMs creating lots of un-necessaary threads. In the current JDK 10 code base, >> In a heavily loaded system with 88 processors, VmData grows from 60MBs (1 cpu) to 376MB (88 cpus). This is only mapped >> memory and it depends heavily on how deep in the stack these threads go before it impacts VmRSS but it shows the potential downside >> of having 100s of VMs thinking they each own the entire machine. > > I agree that the default ergonomics does not scale well. Anyone doing any serious Java deployment tunes the VM explicitly and does not rely on the defaults. How will they do that in a container environment? I don't know. > > I would love to see some actual deployment scenarios/experiences for this to understand things better. This is one of the reasons I want to get this support out in JDK 10, to get some feedback under real scenarios. > >> I haven?t even done any experiments to determine the added context switching cost if the VM decides to use excessive >> pthreads. >>> >>> That's not to say an API to provide load/shares/quota information may not be useful, but that is a separate issue to what the "active processor count" should report. >> I don?t have a problem with active processor count reporting the number of processors we have, but I do have a problem >> with our current usage of this information within the VM and Core libraries. > > That is a somewhat separate issue. One worth pursuing separately. We should look at this as part of the ?Container aware Java? JEP. > >>> >>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.01 >>>> Updates: >>>> 1. I had to move the processing of AggressiveHeap since the container memory size needs to be known before this can be processed. >>> >>> I don't like the placement of this - we don't call os:: init functions from inside Arguments - we manage the initialization sequence from Threads::create_vm. Seems to me that container initialization can/should happen in os::init_before_ergo, and the AggressiveHeap processing can occur at the start of Arguments::apply_ergo(). >>> >>> That said we need to be sure nothing touched by set_aggressive_heap_flags will be used before we now reach that code - there are a lot of flags being set in there. >> This is exactly the reason why I put the call where it did. 
I put the call to set_aggressive_heap_flags in finalize_vm_init_args >> because that is exactly what this call is doing. It?s finalizing flags used after the parsing. The impacted flags are definitely being >> used shortly after and before init_before_ergo is called. > > I see that now and it is very unfortunate because I really do not like what you had to do here. As you can tell from the logic in create_vm we have always refactored to ensure we can progressively manage the interleaving of OS initialization with Arguments processing. So having a deep part of Argument processing go off and call some more OS initialization is not nice. That said I can't see a way around it without very unreasonable refactoring. > > But I do have a couple of changes I'd like to request please: > > 1. Move the call to os::initialize_container_support() up a level to before the call to finalize_vm_init_args(), with a more elaborate comment: > > // We need to ensure processor and memory resources have been properly > // configured - which may rely on arguments we just processed - before > // doing the final argument processing. Any argument processing that > // needs to know about processor and memory resources must occur after > // this point. > > os::initialize_container_support(); > > // Do final processing now that all arguments have been parsed > result = finalize_vm_init_args(patch_mod_javabase); > > 2. Simplify and modify os.hpp as follows: > > + LINUX_ONLY(static void pd_initialize_container_support();) > > public: > static void init(void); // Called before command line parsing > > + static void initialize_container_support() { // Called during command line parsing > + LINUX_ONLY(pd_initialize_container_support();) > + } > > static void init_before_ergo(void); // Called after command line parsing > // before VM ergonomics > > 3. In thread.cpp add a comment here: > > // Parse arguments > + // Note: this internally calls os::initialize_container_support() > jint parse_result = Arguments::parse(args); All very reasonable changes. Thanks, Bob. > > Thanks. > >>> >>>> 2. I no longer use the cpuset.cpus contents since sched_getaffinity reports the correct results >>>> even if someone manually updates the cgroup data. I originally didn?t think this was the case since >>>> sched_setaffinity didn?t automatically update the cpuset file contents but the inverse is true. >>> >>> Ok. >>> >>>> 3. I ifdef?d the container function support in src/hotspot/share/runtime/os.hpp to avoid putting stubs in all other os >>>> platform directories. I can do this if it?s absolutely necessary. >>> >>> You should not need to do this if initialization moves as I suggested above. os::init_before_ergo() in os_linux.cpp can call OSContainer::init(). >>> No need for os::initialize_container_support() or os::pd_initialize_container_support. >> But os::init_before_ergo is in shared code. > > Yep my bad - point is moot now anyway. > > > >>> src/hotspot/os/linux/os_linux.cpp/.hpp >>> >>> 187 log_trace(os)("available container memory: " JULONG_FORMAT, avail_mem); >>> 188 return avail_mem; >>> 189 } else { >>> 190 log_debug(os,container)("container memory usage call failed: " JLONG_FORMAT, mem_usage); >>> >>> Why "trace" (the third logging level) to show the information, but "debug" (the second level) to show failed calls? You use debug in other files for basic info. Overall I'm unclear on your use of debug versus trace for the logging. 
>> I use trace for noisy information that is not reporting errors and debug for failures that are informational and not fatal. >> In this case, the call could return -1 or -2. -1 is unlimited and -2 is an error. In either case we fallback to the >> standard system call to get available memory. I would have used warning but since these messages were occurring >> during a test run causing test failures. > > Okay. Thanks for clarifying. > >>> >>> --- >>> >>> src/hotspot/os/linux/osContainer_linux.cpp >>> >>> Dead code: >>> >>> 376 #if 0 >>> 377 os::Linux::print_container_info(tty); >>> ... >>> 390 #endif >> I left it in for standalone testing. Should I use some other #if? > > We don't generally leave in dead code in the runtime code. Do you see this as useful after you've finalized the changes? > > Is this testing just for showing the logging? Is it worth making this a logging controlled call? Is it suitable for a Gtest test? > > Thanks, > David > ----- > >> Bob. >>> >>> Thanks, >>> David >>> >>>> Bob. From coleen.phillimore at oracle.com Fri Oct 13 13:25:06 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 13 Oct 2017 09:25:06 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> Message-ID: <49b7c5f7-2f6d-16ac-0b60-140619d0fffd@oracle.com> Hi Kim, Thank you for the detailed review and the time you've spent on it, and discussion yesterday. On 10/12/17 7:17 PM, Kim Barrett wrote: >> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >> >> Summary: With the new template functions these are unnecessary. >> >> The changes are mostly s/_ptr// and removing the cast to return type. There weren't many types that needed to be improved to match the template version of the function. Some notes: >> 1. replaced CASPTR with Atomic::cmpxchg() in mutex.cpp, rearranging arguments. >> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null. I disliked the first name because it's not explicit from the callers that there's an underlying cas. If people want to fight, I'll remove the function and use cmpxchg because there are only a couple places where this is a little nicer. >> 3. Added Atomic::sub() >> >> Tested with JPRT, mach5 tier1-5 on linux,windows and solaris. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8188220 >> >> Thanks, >> Coleen > I looked harder at the potential ABA problems, and believe they are > okay. There can be multiple threads doing pushes, and there can be > multiple threads doing pops, but not both at the same time. > > ------------------------------------------------------------------------------ > src/hotspot/cpu/zero/cppInterpreter_zero.cpp > 279 if (Atomic::cmpxchg(monitor, lockee->mark_addr(), disp) != disp) { > > How does this work? monitor and disp seem like they have unrelated > types? Given that this is zero-specific code, maybe this hasn't been > tested? > > Similarly here: > 423 if (Atomic::cmpxchg(header, rcvr->mark_addr(), lock) != lock) { I haven't built zero.? I don't know how to do this anymore (help?) I fixed the obvious type mismatches here and in bytecodeInterpreter.cpp.? I'll try to build it. 
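As an aside on what "obvious type mismatches" means in practice: the templated cmpxchg deduces a single pointee type from all of its arguments, so a (void*) argument next to a markOop destination no longer slides through. A standalone sketch of the shape of the problem (simplified std::atomic-based stand-in, not the interpreter sources):

  #include <atomic>

  class markOopDesc;                 // opaque, as in HotSpot
  typedef markOopDesc* markOop;

  // Simplified stand-in for the strictly typed template: one T for everything.
  template <typename T>
  T cmpxchg(T exchange_value, std::atomic<T>* dest, T compare_value) {
    dest->compare_exchange_strong(compare_value, exchange_value);
    return compare_value;            // holds the old value either way
  }

  std::atomic<markOop> mark_word;    // stand-in for *obj->mark_addr()

  bool try_install(markOop new_header, markOop mark) {
    // return cmpxchg((void*)new_header, &mark_word, mark) == mark;
    //   -> no single T fits (void* vs markOop): the mismatch flagged in the review
    return cmpxchg(new_header, &mark_word, mark) == mark;   // deduces T = markOop
  }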
> > ------------------------------------------------------------------------------ > src/hotspot/share/asm/assembler.cpp > 239 dcon->value_fn = cfn; > > Is it actually safe to remove the atomic update? If multiple threads > performing the assignment *are* possible (and I don't understand the > context yet, so don't know the answer to that), then a bare non-atomic > assignment is a race, e.g. undefined behavior. > > Regardless of that, I think the CAST_FROM_FN_PTR should be retained. I can find no uses of this code, ie. looking for "delayed_value".? I think it was early jsr292 code.? I could also not find any combination of casts that would make it compile, so in the end I believed the comment and took out the cmpxchg.?? The code appears to be intended to for bootstrapping, see the call to update_delayed_values() in JavaClasses::compute_offsets(). The CAST_FROM_FN_PTR was to get it to compile with cmpxchg, the new code does not require a cast.? If you can help with finding the right set of casts, I'd be happy to put the cmpxchg back in.? I just couldn't find one. > > ------------------------------------------------------------------------------ > src/hotspot/share/classfile/classLoaderData.cpp > 167 Chunk* head = (Chunk*) OrderAccess::load_acquire(&_head); > > I think the cast to Chunk* is no longer needed. Missed another, thanks.? No that's the same one David found. > > ------------------------------------------------------------------------------ > src/hotspot/share/classfile/classLoaderData.cpp > 946 ClassLoaderData* old = Atomic::cmpxchg(cld, cld_addr, (ClassLoaderData*)NULL); > 947 if (old != NULL) { > 948 delete cld; > 949 // Returns the data. > 950 return old; > 951 } > > That could instead be > > if (!Atomic::replace_if_null(cld, cld_addr)) { > delete cld; // Lost the race. > return *cld_addr; // Use the winner's value. > } > > And apparently the caller of CLDG::add doesn't care whether the > returned CLD has actually been added to the graph yet. If that's not > true, then there's a bug here, since a race loser might return a > winner's value before the winner has actually done the insertion. True, the race loser doesn't care whether the CLD has been added to the graph. Your instead code requires a comment that replace_if_null is really a compare exchange and has an extra read of the original value, so I am leaving what I have which is clearer to me. > > ------------------------------------------------------------------------------ > src/hotspot/share/classfile/verifier.cpp > 71 static void* verify_byte_codes_fn() { > 72 if (OrderAccess::load_acquire(&_verify_byte_codes_fn) == NULL) { > 73 void *lib_handle = os::native_java_library(); > 74 void *func = os::dll_lookup(lib_handle, "VerifyClassCodesForMajorVersion"); > 75 OrderAccess::release_store(&_verify_byte_codes_fn, func); > 76 if (func == NULL) { > 77 _is_new_verify_byte_codes_fn = false; > 78 func = os::dll_lookup(lib_handle, "VerifyClassCodes"); > 79 OrderAccess::release_store(&_verify_byte_codes_fn, func); > 80 } > 81 } > 82 return (void*)_verify_byte_codes_fn; > 83 } > > [pre-existing] > > I think this code has race problems; a caller could unexpectedly and > inappropriately return NULL. Consider the case where there is no > VerifyClassCodesForMajorVersion, but there is VerifyClassCodes. > > The variable is initially NULL. > > Both Thread1 and Thread2 reach line 73, having both seen a NULL value > for the variable. > > Thread1 reaches line 80, setting the variable to VerifyClassCodes. 
> > Thread2 reaches line 76, resetting the variable to NULL. > > Thread1 reads the now (momentarily) NULL value and returns it. > > I think the first release_store should be conditional on func != NULL. > Also, the usage of _is_new_verify_byte_codes_fn seems suspect. > And a minor additional nit: the cast in the return is unnecessary. Yes, this looks like a bug.?? I'll cut/paste this and file it.?? It may be that this is support for the old verifier in old jdk versions that can be cleaned up. > > ------------------------------------------------------------------------------ > src/hotspot/share/code/nmethod.cpp > 1664 nmethod* observed_mark_link = _oops_do_mark_link; > 1665 if (observed_mark_link == NULL) { > 1666 // Claim this nmethod for this thread to mark. > 1667 if (Atomic::cmpxchg_if_null(NMETHOD_SENTINEL, &_oops_do_mark_link)) { > > With these changes, the only use of observed_mark_link is in the if. > I'm not sure that variable is really useful anymore, e.g. just use > > if (_oops_do_mark_link == NULL) { Ok fixed. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp > > In CMSCollector::par_take_from_overflow_list, if BUSY and prefix were > of type oopDesc*, I think there would be a whole lot fewer casts and > cast_to_oop's. Later on, I think suffix_head, observed_overflow_list, > and curr_overflow_list could also be oopDesc* instead of oop to > eliminate more casts. I actually tried to make this change but ran into more fan out that way, so went back and just fixed the cmpxchg calls to cast oops to oopDesc* and things were less perturbed that way. > > And some similar changes in CMSCollector::par_push_on_overflow_list. > > And similarly in parNewGeneration.cpp, in push_on_overflow_list and > take_from_overflow_list_work. > > As noted in the comments for JDK-8165857, the lists and "objects" > involved here aren't really oops, but rather the shattered remains of Yes, somewhat horrified at the value of BUSY. > oops. The suggestion there was to use HeapWord* and carry through the > fanout; what was actually done was to change _overflow_list to > oopDesc* to minimize fanout, even though that's kind of lying to the > type system. Now, with the cleanup of cmpxchg_ptr and such, we're > paying the price of doing the minimal thing back then. I will file an RFE about cleaning this up.? I think what I've done was the minimal thing. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp > 7960 Atomic::add(-n, &_num_par_pushes); > > Atomic::sub fixed. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/cms/parNewGeneration.cpp > 1455 Atomic::add(-n, &_num_par_pushes); fixed. > Atomic::sub > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/dirtyCardQueue.cpp > 283 void* actual = Atomic::cmpxchg(next, &_cur_par_buffer_node, nd); > ... > 289 nd = static_cast(actual); > > Change actual's type to BufferNode* and remove the cast on line 289. fixed.? missed that one. gross. 
> > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1CollectedHeap.cpp > > [pre-existing] > 3499 old = (CompiledMethod*)_postponed_list; > > I think that cast is only needed because > G1CodeCacheUnloadingTask::_postponed_list is incorrectly typed as > "volatile CompiledMethod*", when I think it ought to be > "CompiledMethod* volatile". > > I think G1CodeCacheUnloading::_claimed_nmethod is similarly mis-typed, > with a similar should not be needed cast: > 3530 first = (CompiledMethod*)_claimed_nmethod; > > and another for _postponed_list here: > 3552 claim = (CompiledMethod*)_postponed_list; I've fixed this.?? C++ is so confusing about where to put the volatile.?? Everyone has been tripped up by it. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1HotCardCache.cpp > 77 jbyte* previous_ptr = (jbyte*)Atomic::cmpxchg(card_ptr, > > I think the cast of the cmpxchg result is no longer needed. fixed. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp > 254 char* touch_addr = (char*)Atomic::add(actual_chunk_size, &_cur_addr) - actual_chunk_size; > > I think the cast of the add result is no longer needed. got it already. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1StringDedup.cpp > 213 return (size_t)Atomic::add(partition_size, &_next_bucket) - partition_size; > > I think the cast of the add result is no longer needed. I was slacking in the g1 files.? fixed. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/heapRegionRemSet.cpp > 200 PerRegionTable* res = > 201 Atomic::cmpxchg(nxt, &_free_list, fl); > > Please remove the line break, now that the code has been simplified. > > But wait, doesn't this alloc exhibit classic ABA problems? I *think* > this works because alloc and bulk_free are called in different phases, > never overlapping. I don't know.? Do you want to file a bug to investigate this? fixed. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/sparsePRT.cpp > 295 SparsePRT* res = > 296 Atomic::cmpxchg(sprt, &_head_expanded_list, hd); > and > 307 SparsePRT* res = > 308 Atomic::cmpxchg(next, &_head_expanded_list, hd); > > I'd rather not have the line breaks in these either. > > And get_from_expanded_list also appears to have classic ABA problems. > I *think* this works because add_to_expanded_list and > get_from_expanded_list are called in different phases, never > overlapping. Fixed, same question as above?? Or one bug to investigate both? > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/taskqueue.inline.hpp > 262 return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, > 263 (volatile intptr_t *)&_data, > 264 (intptr_t)old_age._data); > > This should be > > return Atomic::cmpxchg(new_age._data, &_data, old_age._data); fixed. > > ------------------------------------------------------------------------------ > src/hotspot/share/interpreter/bytecodeInterpreter.cpp > This doesn't have any casts, which I think is correct. > 708 if (Atomic::cmpxchg(header, rcvr->mark_addr(), mark) == mark) { > > but these do. 
> 718 if (Atomic::cmpxchg((void*)new_header, rcvr->mark_addr(), mark) == mark) { > 737 if (Atomic::cmpxchg((void*)new_header, rcvr->mark_addr(), header) == header) { > > I'm not sure how the ones with casts even compile? mark_addr() seems > to be a markOop*, which is a markOopDesc**, where markOopDesc is a > class. void* is not implicitly convertible to markOopDesc*. > > Hm, this entire file is #ifdef CC_INTERP. Is this zero-only code? Or > something like that? > > Similarly here: > 906 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) { > and > 917 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) { > 935 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) { > > and here: > 1847 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) { > 1858 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) { > 1878 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) { > > and here: > 1847 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) { > 1858 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) { > 1878 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) { I've changed all these.?? This is part of Zero. > > ------------------------------------------------------------------------------ > src/hotspot/share/memory/metaspace.cpp > 1502 size_t value = OrderAccess::load_acquire(&_capacity_until_GC); > ... > 1537 return (size_t)Atomic::sub((intptr_t)v, &_capacity_until_GC); > > These and other uses of _capacity_until_GC suggest that variable's > type should be size_t rather than intptr_t. Note that I haven't done > a careful check of uses to see if there are any places where such a > change would cause problems. Yes, I had a hard time with metaspace.cpp because I agree _capacity_until_GC should be size_t.?? Tried to make this change and it cascaded a bit.? I'll file an RFE to change this type separately. > > ------------------------------------------------------------------------------ > src/hotspot/share/oops/constantPool.cpp > 229 OrderAccess::release_store((Klass* volatile *)adr, k); > 246 OrderAccess::release_store((Klass* volatile *)adr, k); > 514 OrderAccess::release_store((Klass* volatile *)adr, k); > > Casts are not needed. fixed. > > ------------------------------------------------------------------------------ > src/hotspot/share/oops/constantPool.hpp > 148 volatile intptr_t adr = OrderAccess::load_acquire(obj_at_addr_raw(which)); > > [pre-existing] > Why is adr declared volatile? golly beats me.? concurrency is scary, especially in the constant pool. The load_acquire() should make sure the value is fetched from memory so volatile is unneeded. > > ------------------------------------------------------------------------------ > src/hotspot/share/oops/cpCache.cpp > 157 intx newflags = (value & parameter_size_mask); > 158 Atomic::cmpxchg(newflags, &_flags, (intx)0); > > This is a nice demonstration of why I wanted to include some value > preserving integral conversions in cmpxchg, rather than requiring > exact type matching in the integral case. There have been some others > that I haven't commented on. Apparently we (I) got away with > including such conversions in Atomic::add, which I'd forgotten about. > And see comment regarding Atomic::sub below. 
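To see the exact-matching point for integrals (the reason the (intx)0 cast is needed at all) next to an add-style signature that tolerates value-preserving conversions, here is a standalone sketch with simplified stand-ins, not the real Atomic machinery:

  #include <atomic>
  #include <cstdint>

  typedef intptr_t intx;                       // HotSpot-style alias

  // cmpxchg stand-in: a single T deduced from all three arguments.
  template <typename T>
  T cmpxchg(T exchange_value, std::atomic<T>* dest, T compare_value) {
    dest->compare_exchange_strong(compare_value, exchange_value);
    return compare_value;
  }

  // add stand-in: separate template parameters, so a narrower integral addend
  // is simply converted, with no cast at the call site.
  template <typename I, typename D>
  D add(I add_value, std::atomic<D>* dest) {
    return dest->fetch_add(static_cast<D>(add_value)) + static_cast<D>(add_value);
  }

  std::atomic<intx> _flags;

  void set_parameter_size_flags(intx newflags) {
    // cmpxchg(newflags, &_flags, 0);          // on LP64: T is ambiguous (intx vs int)
    cmpxchg(newflags, &_flags, (intx)0);       // exact match, hence the cast
    add(1, &_flags);                           // int addend converts to intx, no cast
  }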
> > ------------------------------------------------------------------------------ > src/hotspot/share/oops/cpCache.hpp > 139 volatile Metadata* _f1; // entry specific metadata field > > [pre-existing] > I suspect the type should be Metadata* volatile. And that would > eliminate the need for the cast here: > > 339 Metadata* f1_ord() const { return (Metadata *)OrderAccess::load_acquire(&_f1); } > > I don't know if there are any other changes needed or desirable around > _f1 usage. yes, fixed this. > > ------------------------------------------------------------------------------ > src/hotspot/share/oops/method.hpp > 139 volatile address from_compiled_entry() const { return OrderAccess::load_acquire(&_from_compiled_entry); } > 140 volatile address from_compiled_entry_no_trampoline() const; > 141 volatile address from_interpreted_entry() const{ return OrderAccess::load_acquire(&_from_interpreted_entry); } > > [pre-existing] > The volatile qualifiers here seem suspect to me. Again much suspicion about concurrency and giant pain, which I remember, of debugging these when they were broken. > > ------------------------------------------------------------------------------ > src/hotspot/share/oops/oop.inline.hpp > 391 narrowOop old = (narrowOop)Atomic::xchg(val, (narrowOop*)dest); > > Cast of return type is not needed. fixed. > > ------------------------------------------------------------------------------ > src/hotspot/share/prims/jni.cpp > > [pre-existing] > > copy_jni_function_table should be using Copy::disjoint_words_atomic. yuck. > > ------------------------------------------------------------------------------ > src/hotspot/share/prims/jni.cpp > > [pre-existing] > > 3892 // We're about to use Atomic::xchg for synchronization. Some Zero > 3893 // platforms use the GCC builtin __sync_lock_test_and_set for this, > 3894 // but __sync_lock_test_and_set is not guaranteed to do what we want > 3895 // on all architectures. So we check it works before relying on it. > 3896 #if defined(ZERO) && defined(ASSERT) > 3897 { > 3898 jint a = 0xcafebabe; > 3899 jint b = Atomic::xchg(0xdeadbeef, &a); > 3900 void *c = &a; > 3901 void *d = Atomic::xchg(&b, &c); > 3902 assert(a == (jint) 0xdeadbeef && b == (jint) 0xcafebabe, "Atomic::xchg() works"); > 3903 assert(c == &b && d == &a, "Atomic::xchg() works"); > 3904 } > 3905 #endif // ZERO && ASSERT > > It seems rather strange to be testing Atomic::xchg() here, rather than > as part of unit testing Atomic? Fail unit testing => don't try to > use... This is zero.? I'm not touching this. > > ------------------------------------------------------------------------------ > src/hotspot/share/prims/jvmtiRawMonitor.cpp > 130 if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { > 142 if (_owner == NULL && Atomic::cmpxchg_if_null((void*)Self, &_owner)) { > > I think these casts aren't needed. _owner is void*, and Self is > Thread*, which is implicitly convertible to void*. > > Similarly here, for the THREAD argument: > 280 Contended = Atomic::cmpxchg((void*)THREAD, &_owner, (void*)NULL); > 283 Contended = Atomic::cmpxchg((void*)THREAD, &_owner, (void*)NULL); Okay, let me see if the compiler(s) eat that. (yes they do) > > ------------------------------------------------------------------------------ > src/hotspot/share/prims/jvmtiRawMonitor.hpp > > This file is in the webrev, but seems to be unchanged. It'll be cleaned up with the the commit and not be part of the changeset. 
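The converse, pointer case from the jvmtiRawMonitor/objectMonitor comments: with the new value and the destination as separate template parameters, an implicit Thread* to void* conversion is enough, which is why the (void*)Self and (void*)THREAD casts can simply go while the (void*)NULL compare value stays. A standalone sketch with stand-in types, not the monitor sources:

  #include <atomic>
  #include <cstddef>

  class Thread {};                              // stand-in

  std::atomic<void*> _owner(nullptr);           // stand-in for a void* _owner field

  // cmpxchg shape with independent parameters for new value and destination.
  template <typename T, typename D>
  D cmpxchg(T exchange_value, std::atomic<D>* dest, D compare_value) {
    dest->compare_exchange_strong(compare_value, exchange_value);   // T converts to D
    return compare_value;                                           // old value either way
  }

  bool raw_enter(Thread* self) {
    // No (void*)self cast needed: Thread* converts implicitly to void*.
    return cmpxchg(self, &_owner, (void*)NULL) == NULL;
  }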
> > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/atomic.hpp > 520 template > 521 inline D Atomic::sub(I sub_value, D volatile* dest) { > 522 STATIC_ASSERT(IsPointer::value || IsIntegral::value); > 523 // Assumes two's complement integer representation. > 524 #pragma warning(suppress: 4146) > 525 return Atomic::add(-sub_value, dest); > 526 } > > I'm pretty sure this implementation is incorrect. I think it produces > the wrong result when I and D are both unsigned integer types and > sizeof(I) < sizeof(D). Can you suggest a correction?? I just copied Atomic::dec(). > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/mutex.cpp > 304 intptr_t v = Atomic::cmpxchg((intptr_t)_LBIT, &_LockWord.FullWord, (intptr_t)0); // agro ... > > _LBIT should probably be intptr_t, rather than an enum. Note that the > enum type is unused. The old value here is another place where an > implicit widening of same signedness would have been nice. (Such > implicit widening doesn't work for enums, since it's unspecified > whether they default to signed or unsigned representation, and > implementatinos differ.) This would be a good/simple cleanup.? I changed it to const intptr_t _LBIT = 1; > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/mutex.hpp > > [pre-existing] > > I think the Address member of the SplitWord union is unused. Looking > at AcquireOrPush (and others), I'm wondering whether it *should* be > used there, or whether just using intptr_t casts and doing integral > arithmetic (as is presently being done) is easier and clearer. > > Also the _LSBINDEX macro probably ought to be defined in mutex.cpp > rather than polluting the global namespace. And technically, that > name is reserved word. I moved both this and _LBIT into the top of mutex.cpp since they are used there. Cant define const intptr_t _LBIT =1; in a class in our version of C++. > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/objectMonitor.cpp > 252 void * cur = Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL); > 409 if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { > 1983 ox = (Thread*)Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL); > > I think the casts of Self aren't needed. fixed. > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/objectMonitor.cpp > 995 if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { > 1020 if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { > > I think the casts of THREAD aren't needed. nope, fixed. > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/objectMonitor.hpp > 254 markOopDesc* volatile* header_addr(); > > Why isn't this volatile markOop* ? fixed. > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/synchronizer.cpp > 242 Atomic::cmpxchg_if_null((void*)Self, &(m->_owner))) { > > I think the cast of Self isn't needed. fixed. > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/synchronizer.cpp > 992 for (; block != NULL; block = (PaddedEnd *)next(block)) { > 1734 for (; block != NULL; block = (PaddedEnd *)next(block)) { > > [pre-existing] > All calls to next() pass a PaddedEnd* and cast the > result. 
How about moving all that behavior into next(). I fixed this next() function, but it necessitated a cast to FreeNext field.? The PaddedEnd<> type was intentionally not propagated to all the things that use it.?? Which is a shame because there are a lot more casts to PaddedEnd that could have been removed. > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/synchronizer.cpp > 1970 if (monitor > (ObjectMonitor *)&block[0] && > 1971 monitor < (ObjectMonitor *)&block[_BLOCKSIZE]) { > > [pre-existing] > Are the casts needed here? I think PaddedEnd is > derived from ObjectMonitor, so implicit conversions should apply. prob not.? removed them. > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/synchronizer.hpp > 28 #include "memory/padded.hpp" > 163 static PaddedEnd * volatile gBlockList; > > I was going to suggest as an alternative just making gBlockList a file > scoped variable in synchronizer.cpp, since it isn't used outside of > that file. Except that it is referenced by vmStructs. Curses! It's also used by the SA. > > ------------------------------------------------------------------------------ > src/hotspot/share/runtime/thread.cpp > 4707 intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, (intptr_t)0); > > This and other places suggest LOCKBIT should be defined as intptr_t, > rather than as an enum value. The MuxBits enum type is unused. > > And the cast of 0 is another case where implicit widening would be nice. Making LOCKBIT a const intptr_t = 1 removes a lot of casts. > > ------------------------------------------------------------------------------ > src/hotspot/share/services/mallocSiteTable.cpp > 261 bool MallocSiteHashtableEntry::atomic_insert(const MallocSiteHashtableEntry* entry) { > 262 return Atomic::cmpxchg_if_null(entry, (const MallocSiteHashtableEntry**)&_next); > 263 } > > I think the problem here that is leading to the cast is that > atomic_insert is taking a const T*. Note that it's only caller passes > a non-const T*. I'll change the type to non-const.? We try to use consts... Thanks for the detailed review!? The gcc compiler seems happy so far, I'll post a webrev of the result of these changes after fixing Atomic::sub() and seeing how the other compilers deal with these changes. Thanks, Coleen > > ------------------------------------------------------------------------------ > From david.holmes at oracle.com Fri Oct 13 13:34:03 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 13 Oct 2017 23:34:03 +1000 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> <5f7f3d85-db48-fe6e-28f5-e3f4858f33e8@oracle.com> <799205ae-ba9f-ce3a-8dd6-1a55e32689df@oracle.com> <9956F9D0-B01B-44FE-AE56-527907816436@oracle.com> <20ef0bac-1942-b29f-a9e2-4ea4d4f81cd2@oracle.com> Message-ID: <5d217c60-3049-30a6-c207-d6c9274a5ddf@oracle.com> Reading back through my suggestion for os.hpp initialize_container_support should just be init_container_support. 
Thanks, David On 13/10/2017 11:14 PM, Bob Vandette wrote: > >> On Oct 12, 2017, at 11:08 PM, David Holmes wrote: >> >> Hi Bob, >> >> On 13/10/2017 1:43 AM, Bob Vandette wrote: >>>> On Oct 11, 2017, at 9:04 PM, David Holmes wrote: >>>> >>>> Hi Bob, >>>> >>>> On 12/10/2017 5:11 AM, Bob Vandette wrote: >>>>> Here?s an updated webrev for this RFE that contains changes and cleanups based on feedback I?ve received so far. >>>>> I?m still investigating the best approach for reacting to cpu shares and quotas. I do not believe doing nothing is the answer. >>>> >>>> I do. :) Let me try this again. When you run outside of a container you don't get 100% of the CPUs - you have to share with whatever else is running on the system. You get a fraction of CPU time based on the load. We don't try to communicate load information to the VM/application so it can adapt. Within a container setting shares/quotas is just a way of setting an artificial load. So why should we be treating it any differently? >>> Because today we optimize for a lightly loaded system and when running serverless applications in containers we should be >>> optimizing for a fully loaded system. If developers don?t want this, then don?t use shares or quotas and you?ll have exactly >>> the behavior you have today. I think we just have to document the new behavior (and how to turn it off) so people know what >>> to expect. >> >> The person deploying the app may not have control over how the app is deployed in terms of shares/quotas. It all depends how (and who) manages the containers. This is a big part of my problem/concerns here that I don't know exactly how all this is organized and who knows what in advance and what they can control. >> >> But I'll let this drop, other than raising an additional concern. I don't think just allowing the user to hardwire the number of processors to use will necessarily solve the problem with what available_processors() returns. I'm concerned the execution of the VM may occur in a context where the number of processors is not known in advance, and the user can not disable shares/quotas. In that case we may need to have a flag that says to ignore shares/quotas in the processor count calculation. > > I?m not sure that?s a high probability issue. It?s my understanding that whoever is configuring the container > management will be specifying the resources required to run these applications which comes along with a > guarantee of these resources. If this issue does come up, I do have the -XX:-UseContainerSupport big > switch that turns all of this off. It will however disable the memory support as well. > >> >>> You seem to discount the added cost of 100s of VMs creating lots of un-necessaary threads. In the current JDK 10 code base, >>> In a heavily loaded system with 88 processors, VmData grows from 60MBs (1 cpu) to 376MB (88 cpus). This is only mapped >>> memory and it depends heavily on how deep in the stack these threads go before it impacts VmRSS but it shows the potential downside >>> of having 100s of VMs thinking they each own the entire machine. >> >> I agree that the default ergonomics does not scale well. Anyone doing any serious Java deployment tunes the VM explicitly and does not rely on the defaults. How will they do that in a container environment? I don't know. >> >> I would love to see some actual deployment scenarios/experiences for this to understand things better. > > This is one of the reasons I want to get this support out in JDK 10, to get some feedback under real scenarios. 
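For readers following the shares/quota discussion, a minimal self-contained sketch of the kind of calculation being debated: take the host CPU count and clamp it by quota/period and by shares/1024. The function name and signature are invented for illustration, the 1024-shares-per-one-CPU convention is the usual cgroup default, and none of this is quoted from the webrev; MIN2/MAX2 are the standard HotSpot macros.

static int container_aware_cpu_count(int host_cpus,
                                     jlong quota, jlong period, jlong shares) {
  int limit = host_cpus;
  if (quota > 0 && period > 0) {
    // Round up: e.g. quota 150ms with period 100ms counts as 2 CPUs.
    int quota_cpus = (int)((quota + period - 1) / period);
    limit = MIN2(limit, quota_cpus);
  }
  if (shares > 0) {
    // cgroup cpu.shares treats 1024 as one CPU's worth of weight.
    int share_cpus = (int)(shares / 1024);
    limit = MIN2(limit, MAX2(share_cpus, 1));
  }
  return MAX2(limit, 1);
}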
> >> >>> I haven?t even done any experiments to determine the added context switching cost if the VM decides to use excessive >>> pthreads. >>>> >>>> That's not to say an API to provide load/shares/quota information may not be useful, but that is a separate issue to what the "active processor count" should report. >>> I don?t have a problem with active processor count reporting the number of processors we have, but I do have a problem >>> with our current usage of this information within the VM and Core libraries. >> >> That is a somewhat separate issue. One worth pursuing separately. > > We should look at this as part of the ?Container aware Java? JEP. > >> >>>> >>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.01 >>>>> Updates: >>>>> 1. I had to move the processing of AggressiveHeap since the container memory size needs to be known before this can be processed. >>>> >>>> I don't like the placement of this - we don't call os:: init functions from inside Arguments - we manage the initialization sequence from Threads::create_vm. Seems to me that container initialization can/should happen in os::init_before_ergo, and the AggressiveHeap processing can occur at the start of Arguments::apply_ergo(). >>>> >>>> That said we need to be sure nothing touched by set_aggressive_heap_flags will be used before we now reach that code - there are a lot of flags being set in there. >>> This is exactly the reason why I put the call where it did. I put the call to set_aggressive_heap_flags in finalize_vm_init_args >>> because that is exactly what this call is doing. It?s finalizing flags used after the parsing. The impacted flags are definitely being >>> used shortly after and before init_before_ergo is called. >> >> I see that now and it is very unfortunate because I really do not like what you had to do here. As you can tell from the logic in create_vm we have always refactored to ensure we can progressively manage the interleaving of OS initialization with Arguments processing. So having a deep part of Argument processing go off and call some more OS initialization is not nice. That said I can't see a way around it without very unreasonable refactoring. >> >> But I do have a couple of changes I'd like to request please: >> >> 1. Move the call to os::initialize_container_support() up a level to before the call to finalize_vm_init_args(), with a more elaborate comment: >> >> // We need to ensure processor and memory resources have been properly >> // configured - which may rely on arguments we just processed - before >> // doing the final argument processing. Any argument processing that >> // needs to know about processor and memory resources must occur after >> // this point. >> >> os::initialize_container_support(); >> >> // Do final processing now that all arguments have been parsed >> result = finalize_vm_init_args(patch_mod_javabase); >> >> 2. Simplify and modify os.hpp as follows: >> >> + LINUX_ONLY(static void pd_initialize_container_support();) >> >> public: >> static void init(void); // Called before command line parsing >> >> + static void initialize_container_support() { // Called during command line parsing >> + LINUX_ONLY(pd_initialize_container_support();) >> + } >> >> static void init_before_ergo(void); // Called after command line parsing >> // before VM ergonomics >> >> 3. In thread.cpp add a comment here: >> >> // Parse arguments >> + // Note: this internally calls os::initialize_container_support() >> jint parse_result = Arguments::parse(args); > > All very reasonable changes. 
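For context, the enclosing sequence in Threads::create_vm() that these three pieces slot into is roughly the sketch below; this is a simplified illustration of the intended ordering, not a quote of the actual thread.cpp diff:

  // Parse arguments
  // Note: this internally calls os::initialize_container_support()
  jint parse_result = Arguments::parse(args);
  if (parse_result != JNI_OK) return parse_result;

  os::init_before_ergo();                         // OS state that ergonomics depends on

  jint ergo_result = Arguments::apply_ergo();     // VM ergonomics runs with container
  if (ergo_result != JNI_OK) return ergo_result;  // limits already known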
> > Thanks, > Bob. > >> >> Thanks. >> >>>> >>>>> 2. I no longer use the cpuset.cpus contents since sched_getaffinity reports the correct results >>>>> even if someone manually updates the cgroup data. I originally didn?t think this was the case since >>>>> sched_setaffinity didn?t automatically update the cpuset file contents but the inverse is true. >>>> >>>> Ok. >>>> >>>>> 3. I ifdef?d the container function support in src/hotspot/share/runtime/os.hpp to avoid putting stubs in all other os >>>>> platform directories. I can do this if it?s absolutely necessary. >>>> >>>> You should not need to do this if initialization moves as I suggested above. os::init_before_ergo() in os_linux.cpp can call OSContainer::init(). >>>> No need for os::initialize_container_support() or os::pd_initialize_container_support. >>> But os::init_before_ergo is in shared code. >> >> Yep my bad - point is moot now anyway. >> >> >> >>>> src/hotspot/os/linux/os_linux.cpp/.hpp >>>> >>>> 187 log_trace(os)("available container memory: " JULONG_FORMAT, avail_mem); >>>> 188 return avail_mem; >>>> 189 } else { >>>> 190 log_debug(os,container)("container memory usage call failed: " JLONG_FORMAT, mem_usage); >>>> >>>> Why "trace" (the third logging level) to show the information, but "debug" (the second level) to show failed calls? You use debug in other files for basic info. Overall I'm unclear on your use of debug versus trace for the logging. >>> I use trace for noisy information that is not reporting errors and debug for failures that are informational and not fatal. >>> In this case, the call could return -1 or -2. -1 is unlimited and -2 is an error. In either case we fallback to the >>> standard system call to get available memory. I would have used warning but since these messages were occurring >>> during a test run causing test failures. >> >> Okay. Thanks for clarifying. >> >>>> >>>> --- >>>> >>>> src/hotspot/os/linux/osContainer_linux.cpp >>>> >>>> Dead code: >>>> >>>> 376 #if 0 >>>> 377 os::Linux::print_container_info(tty); >>>> ... >>>> 390 #endif >>> I left it in for standalone testing. Should I use some other #if? >> >> We don't generally leave in dead code in the runtime code. Do you see this as useful after you've finalized the changes? >> >> Is this testing just for showing the logging? Is it worth making this a logging controlled call? Is it suitable for a Gtest test? >> >> Thanks, >> David >> ----- >> >>> Bob. >>>> >>>> Thanks, >>>> David >>>> >>>>> Bob. > From coleen.phillimore at oracle.com Fri Oct 13 14:09:42 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 13 Oct 2017 10:09:42 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <2e8d66a6-24c3-b4de-e187-47a9e582238c@oracle.com> References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> <2e8d66a6-24c3-b4de-e187-47a9e582238c@oracle.com> Message-ID: <756c8ab7-a63b-26e7-fbb9-79bc261068cd@oracle.com> On 10/12/17 8:55 PM, David Holmes wrote: > Hi Kim, > > Very detailed analysis! A few things have already been updated by Coleen. > > Many of the issues with possibly incorrect/inappropriate types really > need to be dealt with separately - they go beyond the basic renaming - > by their component teams. Yes, I fixed up some types that were trivial changes, but agree with you and left other types to be dealt with by the component teams. I filed some RFEs and bugs. 
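Several of the type fixes in this review come down to where volatile sits in a pointer declaration. A toy illustration with made-up names, not code from the patch:

struct ToyList {                    // invented type, purely illustrative
  ToyList* volatile _next;          // volatile *pointer*: the pointer itself is what gets
                                    // loaded/stored/CASed, which is the form the
                                    // Atomic/OrderAccess templates accept without casts
  // volatile ToyList* _next2;      // pointer to volatile data: forces a cast back to
                                    // "ToyList* volatile*" at every call site
};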
thanks, Coleen > > Similarly any ABA issues - which are likely non-issues but not clearly > documented - should be handled separately. And the potential race you > highlight below - though to be honest I couldn't match your statements > with the code as shown. > > Thanks, > David > > On 13/10/2017 9:17 AM, Kim Barrett wrote: >>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>> >>> Summary: With the new template functions these are unnecessary. >>> >>> The changes are mostly s/_ptr// and removing the cast to return >>> type.? There weren't many types that needed to be improved to match >>> the template version of the function.?? Some notes: >>> 1. replaced CASPTR with Atomic::cmpxchg() in mutex.cpp, rearranging >>> arguments. >>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null.? I >>> disliked the first name because it's not explicit from the callers >>> that there's an underlying cas.? If people want to fight, I'll >>> remove the function and use cmpxchg because there are only a couple >>> places where this is a little nicer. >>> 3. Added Atomic::sub() >>> >>> Tested with JPRT, mach5 tier1-5 on linux,windows and solaris. >>> >>> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.01/webrev >>> bug link https://bugs.openjdk.java.net/browse/JDK-8188220 >>> >>> Thanks, >>> Coleen >> >> I looked harder at the potential ABA problems, and believe they are >> okay.? There can be multiple threads doing pushes, and there can be >> multiple threads doing pops, but not both at the same time. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/cpu/zero/cppInterpreter_zero.cpp >> ? 279???? if (Atomic::cmpxchg(monitor, lockee->mark_addr(), disp) != >> disp) { >> >> How does this work?? monitor and disp seem like they have unrelated >> types?? Given that this is zero-specific code, maybe this hasn't been >> tested? >> >> Similarly here: >> ? 423?????? if (Atomic::cmpxchg(header, rcvr->mark_addr(), lock) != >> lock) { >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/asm/assembler.cpp >> ? 239???????? dcon->value_fn = cfn; >> >> Is it actually safe to remove the atomic update?? If multiple threads >> performing the assignment *are* possible (and I don't understand the >> context yet, so don't know the answer to that), then a bare non-atomic >> assignment is a race, e.g. undefined behavior. >> >> Regardless of that, I think the CAST_FROM_FN_PTR should be retained. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/classfile/classLoaderData.cpp >> ? 167?? Chunk* head = (Chunk*) OrderAccess::load_acquire(&_head); >> >> I think the cast to Chunk* is no longer needed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/classfile/classLoaderData.cpp >> ? 946???? ClassLoaderData* old = Atomic::cmpxchg(cld, cld_addr, >> (ClassLoaderData*)NULL); >> ? 947???? if (old != NULL) { >> ? 948?????? delete cld; >> ? 949?????? // Returns the data. >> ? 950?????? return old; >> ? 951???? } >> >> That could instead be >> >> ?? if (!Atomic::replace_if_null(cld, cld_addr)) { >> ???? delete cld;?????????? // Lost the race. >> ???? return *cld_addr;???? // Use the winner's value. >> ?? } >> >> And apparently the caller of CLDG::add doesn't care whether the >> returned CLD has actually been added to the graph yet.? 
If that's not >> true, then there's a bug here, since a race loser might return a >> winner's value before the winner has actually done the insertion. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/classfile/verifier.cpp >> ?? 71 static void* verify_byte_codes_fn() { >> ?? 72?? if (OrderAccess::load_acquire(&_verify_byte_codes_fn) == NULL) { >> ?? 73???? void *lib_handle = os::native_java_library(); >> ?? 74???? void *func = os::dll_lookup(lib_handle, >> "VerifyClassCodesForMajorVersion"); >> ?? 75???? OrderAccess::release_store(&_verify_byte_codes_fn, func); >> ?? 76???? if (func == NULL) { >> ?? 77?????? _is_new_verify_byte_codes_fn = false; >> ?? 78?????? func = os::dll_lookup(lib_handle, "VerifyClassCodes"); >> ?? 79 OrderAccess::release_store(&_verify_byte_codes_fn, func); >> ?? 80???? } >> ?? 81?? } >> ?? 82?? return (void*)_verify_byte_codes_fn; >> ?? 83 } >> >> [pre-existing] >> >> I think this code has race problems; a caller could unexpectedly and >> inappropriately return NULL.? Consider the case where there is no >> VerifyClassCodesForMajorVersion, but there is VerifyClassCodes. >> >> The variable is initially NULL. >> >> Both Thread1 and Thread2 reach line 73, having both seen a NULL value >> for the variable. >> >> Thread1 reaches line 80, setting the variable to VerifyClassCodes. >> >> Thread2 reaches line 76, resetting the variable to NULL. >> >> Thread1 reads the now (momentarily) NULL value and returns it. >> >> I think the first release_store should be conditional on func != NULL. >> Also, the usage of _is_new_verify_byte_codes_fn seems suspect. >> And a minor additional nit: the cast in the return is unnecessary. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/code/nmethod.cpp >> 1664?? nmethod* observed_mark_link = _oops_do_mark_link; >> 1665?? if (observed_mark_link == NULL) { >> 1666???? // Claim this nmethod for this thread to mark. >> 1667???? if (Atomic::cmpxchg_if_null(NMETHOD_SENTINEL, >> &_oops_do_mark_link)) { >> >> With these changes, the only use of observed_mark_link is in the if. >> I'm not sure that variable is really useful anymore, e.g. just use >> >> ?? if (_oops_do_mark_link == NULL) { >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >> >> In CMSCollector::par_take_from_overflow_list, if BUSY and prefix were >> of type oopDesc*, I think there would be a whole lot fewer casts and >> cast_to_oop's.? Later on, I think suffix_head, observed_overflow_list, >> and curr_overflow_list could also be oopDesc* instead of oop to >> eliminate more casts. >> >> And some similar changes in CMSCollector::par_push_on_overflow_list. >> >> And similarly in parNewGeneration.cpp, in push_on_overflow_list and >> take_from_overflow_list_work. >> >> As noted in the comments for JDK-8165857, the lists and "objects" >> involved here aren't really oops, but rather the shattered remains of >> oops.? The suggestion there was to use HeapWord* and carry through the >> fanout; what was actually done was to change _overflow_list to >> oopDesc* to minimize fanout, even though that's kind of lying to the >> type system.? Now, with the cleanup of cmpxchg_ptr and such, we're >> paying the price of doing the minimal thing back then. 
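For reference, a minimal self-contained sketch (not HotSpot code) of the push pattern these overflow lists use, written against the template cmpxchg(new_value, dest, compare_value) argument order used throughout this thread. It carries the same ABA caveat discussed elsewhere in this review unless pushers and poppers are confined to separate phases:

struct Node { Node* _next; };            // stand-in for the real list element type
static Node* volatile _list_head = NULL;

static void push(Node* n) {
  Node* old;
  do {
    old = _list_head;
    n->_next = old;
    // The template deduces Node* from &_list_head, so no casts are needed as
    // long as n and old have exactly that pointer type.
  } while (Atomic::cmpxchg(n, &_list_head, old) != old);
}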
>> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >> 7960?? Atomic::add(-n, &_num_par_pushes); >> >> Atomic::sub >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/cms/parNewGeneration.cpp >> 1455?? Atomic::add(-n, &_num_par_pushes); >> >> Atomic::sub >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/dirtyCardQueue.cpp >> ? 283???? void* actual = Atomic::cmpxchg(next, &_cur_par_buffer_node, >> nd); >> ... >> ? 289?????? nd = static_cast(actual); >> >> Change actual's type to BufferNode* and remove the cast on line 289. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/g1CollectedHeap.cpp >> >> [pre-existing] >> 3499???????? old = (CompiledMethod*)_postponed_list; >> >> I think that cast is only needed because >> G1CodeCacheUnloadingTask::_postponed_list is incorrectly typed as >> "volatile CompiledMethod*", when I think it ought to be >> "CompiledMethod* volatile". >> >> I think G1CodeCacheUnloading::_claimed_nmethod is similarly mis-typed, >> with a similar should not be needed cast: >> 3530?????? first = (CompiledMethod*)_claimed_nmethod; >> >> and another for _postponed_list here: >> 3552?????? claim = (CompiledMethod*)_postponed_list; >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/g1HotCardCache.cpp >> ?? 77?? jbyte* previous_ptr = (jbyte*)Atomic::cmpxchg(card_ptr, >> >> I think the cast of the cmpxchg result is no longer needed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp >> ? 254?????? char* touch_addr = (char*)Atomic::add(actual_chunk_size, >> &_cur_addr) - actual_chunk_size; >> >> I think the cast of the add result is no longer needed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/g1StringDedup.cpp >> ? 213?? return (size_t)Atomic::add(partition_size, &_next_bucket) - >> partition_size; >> >> I think the cast of the add result is no longer needed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/heapRegionRemSet.cpp >> ? 200?????? PerRegionTable* res = >> ? 201???????? Atomic::cmpxchg(nxt, &_free_list, fl); >> >> Please remove the line break, now that the code has been simplified. >> >> But wait, doesn't this alloc exhibit classic ABA problems?? I *think* >> this works because alloc and bulk_free are called in different phases, >> never overlapping. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/sparsePRT.cpp >> ? 295???? SparsePRT* res = >> ? 296?????? Atomic::cmpxchg(sprt, &_head_expanded_list, hd); >> and >> ? 307???? SparsePRT* res = >> ? 308?????? Atomic::cmpxchg(next, &_head_expanded_list, hd); >> >> I'd rather not have the line breaks in these either. >> >> And get_from_expanded_list also appears to have classic ABA problems. >> I *think* this works because add_to_expanded_list and >> get_from_expanded_list are called in different phases, never >> overlapping. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/shared/taskqueue.inline.hpp >> ? 262?? 
return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, >> ? 263?????????????????????????????????? (volatile intptr_t *)&_data, >> ? 264 (intptr_t)old_age._data); >> >> This should be >> >> ?? return Atomic::cmpxchg(new_age._data, &_data, old_age._data); >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/interpreter/bytecodeInterpreter.cpp >> This doesn't have any casts, which I think is correct. >> ? 708???????????? if (Atomic::cmpxchg(header, rcvr->mark_addr(), >> mark) == mark) { >> >> but these do. >> ? 718???????????? if (Atomic::cmpxchg((void*)new_header, >> rcvr->mark_addr(), mark) == mark) { >> ? 737???????????? if (Atomic::cmpxchg((void*)new_header, >> rcvr->mark_addr(), header) == header) { >> >> I'm not sure how the ones with casts even compile?? mark_addr() seems >> to be a markOop*, which is a markOopDesc**, where markOopDesc is a >> class.? void* is not implicitly convertible to markOopDesc*. >> >> Hm, this entire file is #ifdef CC_INTERP.? Is this zero-only code?? Or >> something like that? >> >> Similarly here: >> ? 906?????????? if (Atomic::cmpxchg(header, lockee->mark_addr(), >> mark) == mark) { >> and >> ? 917?????????? if (Atomic::cmpxchg((void*)new_header, >> lockee->mark_addr(), mark) == mark) { >> ? 935?????????? if (Atomic::cmpxchg((void*)new_header, >> lockee->mark_addr(), header) == header) { >> >> and here: >> 1847?????????????? if (Atomic::cmpxchg(header, lockee->mark_addr(), >> mark) == mark) { >> 1858?????????????? if (Atomic::cmpxchg((void*)new_header, >> lockee->mark_addr(), mark) == mark) { >> 1878?????????????? if (Atomic::cmpxchg((void*)new_header, >> lockee->mark_addr(), header) == header) { >> >> and here: >> 1847?????????????? if (Atomic::cmpxchg(header, lockee->mark_addr(), >> mark) == mark) { >> 1858?????????????? if (Atomic::cmpxchg((void*)new_header, >> lockee->mark_addr(), mark) == mark) { >> 1878?????????????? if (Atomic::cmpxchg((void*)new_header, >> lockee->mark_addr(), header) == header) { >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/memory/metaspace.cpp >> 1502?? size_t value = OrderAccess::load_acquire(&_capacity_until_GC); >> ... >> 1537?? return (size_t)Atomic::sub((intptr_t)v, &_capacity_until_GC); >> >> These and other uses of _capacity_until_GC suggest that variable's >> type should be size_t rather than intptr_t.? Note that I haven't done >> a careful check of uses to see if there are any places where such a >> change would cause problems. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/oops/constantPool.cpp >> ? 229?? OrderAccess::release_store((Klass* volatile *)adr, k); >> ? 246?? OrderAccess::release_store((Klass* volatile *)adr, k); >> ? 514?? OrderAccess::release_store((Klass* volatile *)adr, k); >> >> Casts are not needed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/oops/constantPool.hpp >> ? 148???? volatile intptr_t adr = >> OrderAccess::load_acquire(obj_at_addr_raw(which)); >> >> [pre-existing] >> Why is adr declared volatile? >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/oops/cpCache.cpp >> ? 157???? intx newflags = (value & parameter_size_mask); >> ? 158???? 
Atomic::cmpxchg(newflags, &_flags, (intx)0); >> >> This is a nice demonstration of why I wanted to include some value >> preserving integral conversions in cmpxchg, rather than requiring >> exact type matching in the integral case.? There have been some others >> that I haven't commented on.? Apparently we (I) got away with >> including such conversions in Atomic::add, which I'd forgotten about. >> And see comment regarding Atomic::sub below. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/oops/cpCache.hpp >> ? 139?? volatile Metadata*?? _f1;?????? // entry specific metadata field >> >> [pre-existing] >> I suspect the type should be Metadata* volatile.? And that would >> eliminate the need for the cast here: >> >> ? 339?? Metadata* f1_ord() const?????????????????????? { return >> (Metadata *)OrderAccess::load_acquire(&_f1); } >> >> I don't know if there are any other changes needed or desirable around >> _f1 usage. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/oops/method.hpp >> ? 139?? volatile address from_compiled_entry() const?? { return >> OrderAccess::load_acquire(&_from_compiled_entry); } >> ? 140?? volatile address from_compiled_entry_no_trampoline() const; >> ? 141?? volatile address from_interpreted_entry() const{ return >> OrderAccess::load_acquire(&_from_interpreted_entry); } >> >> [pre-existing] >> The volatile qualifiers here seem suspect to me. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/oops/oop.inline.hpp >> ? 391???? narrowOop old = (narrowOop)Atomic::xchg(val, >> (narrowOop*)dest); >> >> Cast of return type is not needed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/prims/jni.cpp >> >> [pre-existing] >> >> copy_jni_function_table should be using Copy::disjoint_words_atomic. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/prims/jni.cpp >> >> [pre-existing] >> >> 3892?? // We're about to use Atomic::xchg for synchronization. Some Zero >> 3893?? // platforms use the GCC builtin __sync_lock_test_and_set for >> this, >> 3894?? // but __sync_lock_test_and_set is not guaranteed to do what >> we want >> 3895?? // on all architectures.? So we check it works before relying >> on it. >> 3896 #if defined(ZERO) && defined(ASSERT) >> 3897?? { >> 3898???? jint a = 0xcafebabe; >> 3899???? jint b = Atomic::xchg(0xdeadbeef, &a); >> 3900???? void *c = &a; >> 3901???? void *d = Atomic::xchg(&b, &c); >> 3902???? assert(a == (jint) 0xdeadbeef && b == (jint) 0xcafebabe, >> "Atomic::xchg() works"); >> 3903???? assert(c == &b && d == &a, "Atomic::xchg() works"); >> 3904?? } >> 3905 #endif // ZERO && ASSERT >> >> It seems rather strange to be testing Atomic::xchg() here, rather than >> as part of unit testing Atomic?? Fail unit testing => don't try to >> use... >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/prims/jvmtiRawMonitor.cpp >> ? 130???? if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { >> ? 142???? if (_owner == NULL && Atomic::cmpxchg_if_null((void*)Self, >> &_owner)) { >> >> I think these casts aren't needed. _owner is void*, and Self is >> Thread*, which is implicitly convertible to void*. >> >> Similarly here, for the THREAD argument: >> ? 280???? 
Contended = Atomic::cmpxchg((void*)THREAD, &_owner, >> (void*)NULL); >> ? 283???? Contended = Atomic::cmpxchg((void*)THREAD, &_owner, >> (void*)NULL); >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/prims/jvmtiRawMonitor.hpp >> >> This file is in the webrev, but seems to be unchanged. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/atomic.hpp >> ? 520 template >> ? 521 inline D Atomic::sub(I sub_value, D volatile* dest) { >> ? 522?? STATIC_ASSERT(IsPointer::value || IsIntegral::value); >> ? 523?? // Assumes two's complement integer representation. >> ? 524?? #pragma warning(suppress: 4146) >> ? 525?? return Atomic::add(-sub_value, dest); >> ? 526 } >> >> I'm pretty sure this implementation is incorrect.? I think it produces >> the wrong result when I and D are both unsigned integer types and >> sizeof(I) < sizeof(D). >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/mutex.cpp >> ? 304?? intptr_t v = Atomic::cmpxchg((intptr_t)_LBIT, >> &_LockWord.FullWord, (intptr_t)0);? // agro ... >> >> _LBIT should probably be intptr_t, rather than an enum.? Note that the >> enum type is unused.? The old value here is another place where an >> implicit widening of same signedness would have been nice. (Such >> implicit widening doesn't work for enums, since it's unspecified >> whether they default to signed or unsigned representation, and >> implementatinos differ.) >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/mutex.hpp >> >> [pre-existing] >> >> I think the Address member of the SplitWord union is unused. Looking >> at AcquireOrPush (and others), I'm wondering whether it *should* be >> used there, or whether just using intptr_t casts and doing integral >> arithmetic (as is presently being done) is easier and clearer. >> >> Also the _LSBINDEX macro probably ought to be defined in mutex.cpp >> rather than polluting the global namespace.? And technically, that >> name is reserved word. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/objectMonitor.cpp >> ? 252?? void * cur = Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL); >> ? 409?? if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { >> 1983?????? ox = (Thread*)Atomic::cmpxchg((void*)Self, &_owner, >> (void*)NULL); >> >> I think the casts of Self aren't needed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/objectMonitor.cpp >> ? 995?????? if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { >> 1020???????? if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { >> >> I think the casts of THREAD aren't needed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/objectMonitor.hpp >> ? 254?? markOopDesc* volatile* header_addr(); >> >> Why isn't this volatile markOop* ? >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/synchronizer.cpp >> ? 242???????? Atomic::cmpxchg_if_null((void*)Self, &(m->_owner))) { >> >> I think the cast of Self isn't needed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/synchronizer.cpp >> ? 992?? 
for (; block != NULL; block = (PaddedEnd >> *)next(block)) { >> 1734???? for (; block != NULL; block = (PaddedEnd >> *)next(block)) { >> >> [pre-existing] >> All calls to next() pass a PaddedEnd* and cast the >> result.? How about moving all that behavior into next(). >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/synchronizer.cpp >> 1970???? if (monitor > (ObjectMonitor *)&block[0] && >> 1971???????? monitor < (ObjectMonitor *)&block[_BLOCKSIZE]) { >> >> [pre-existing] >> Are the casts needed here?? I think PaddedEnd is >> derived from ObjectMonitor, so implicit conversions should apply. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/synchronizer.hpp >> ?? 28 #include "memory/padded.hpp" >> ? 163?? static PaddedEnd * volatile gBlockList; >> >> I was going to suggest as an alternative just making gBlockList a file >> scoped variable in synchronizer.cpp, since it isn't used outside of >> that file. Except that it is referenced by vmStructs.? Curses! >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/thread.cpp >> 4707?? intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, >> (intptr_t)0); >> >> This and other places suggest LOCKBIT should be defined as intptr_t, >> rather than as an enum value.? The MuxBits enum type is unused. >> >> And the cast of 0 is another case where implicit widening would be nice. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/services/mallocSiteTable.cpp >> ? 261 bool MallocSiteHashtableEntry::atomic_insert(const >> MallocSiteHashtableEntry* entry) { >> ? 262?? return Atomic::cmpxchg_if_null(entry, (const >> MallocSiteHashtableEntry**)&_next); >> ? 263 } >> >> I think the problem here that is leading to the cast is that >> atomic_insert is taking a const T*.? Note that it's only caller passes >> a non-const T*. >> >> ------------------------------------------------------------------------------ >> >> From coleen.phillimore at oracle.com Fri Oct 13 18:34:43 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 13 Oct 2017 14:34:43 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <49b7c5f7-2f6d-16ac-0b60-140619d0fffd@oracle.com> References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> <49b7c5f7-2f6d-16ac-0b60-140619d0fffd@oracle.com> Message-ID: Hi, Here is the version with the changes from Kim's comments that has passed at least testing with JPRT and tier1, locally.?? More testing (tier2-5) is in progress. Also includes a corrected version of Atomic::sub care of Erik Osterlund. open webrev at http://cr.openjdk.java.net/~coleenp/8188220.kim-review-changes/webrev open webrev at http://cr.openjdk.java.net/~coleenp/8188220.review-comments/webrev Full version: http://cr.openjdk.java.net/~coleenp/8188220.03/webrev Thanks! Coleen On 10/13/17 9:25 AM, coleen.phillimore at oracle.com wrote: > > Hi Kim, Thank you for the detailed review and the time you've spent on > it, and discussion yesterday. > > On 10/12/17 7:17 PM, Kim Barrett wrote: >>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>> >>> Summary: With the new template functions these are unnecessary. >>> >>> The changes are mostly s/_ptr// and removing the cast to return >>> type.? 
There weren't many types that needed to be improved to match >>> the template version of the function.?? Some notes: >>> 1. replaced CASPTR with Atomic::cmpxchg() in mutex.cpp, rearranging >>> arguments. >>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null.? I >>> disliked the first name because it's not explicit from the callers >>> that there's an underlying cas.? If people want to fight, I'll >>> remove the function and use cmpxchg because there are only a couple >>> places where this is a little nicer. >>> 3. Added Atomic::sub() >>> >>> Tested with JPRT, mach5 tier1-5 on linux,windows and solaris. >>> >>> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.01/webrev >>> bug link https://bugs.openjdk.java.net/browse/JDK-8188220 >>> >>> Thanks, >>> Coleen >> I looked harder at the potential ABA problems, and believe they are >> okay.? There can be multiple threads doing pushes, and there can be >> multiple threads doing pops, but not both at the same time. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/cpu/zero/cppInterpreter_zero.cpp >> ? 279???? if (Atomic::cmpxchg(monitor, lockee->mark_addr(), disp) != >> disp) { >> >> How does this work?? monitor and disp seem like they have unrelated >> types?? Given that this is zero-specific code, maybe this hasn't been >> tested? >> >> Similarly here: >> ? 423?????? if (Atomic::cmpxchg(header, rcvr->mark_addr(), lock) != >> lock) { > > I haven't built zero.? I don't know how to do this anymore (help?) I > fixed the obvious type mismatches here and in > bytecodeInterpreter.cpp.? I'll try to build it. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/asm/assembler.cpp >> ? 239???????? dcon->value_fn = cfn; >> >> Is it actually safe to remove the atomic update?? If multiple threads >> performing the assignment *are* possible (and I don't understand the >> context yet, so don't know the answer to that), then a bare non-atomic >> assignment is a race, e.g. undefined behavior. >> >> Regardless of that, I think the CAST_FROM_FN_PTR should be retained. > > I can find no uses of this code, ie. looking for "delayed_value". I > think it was early jsr292 code.? I could also not find any combination > of casts that would make it compile, so in the end I believed the > comment and took out the cmpxchg.?? The code appears to be intended to > for bootstrapping, see the call to update_delayed_values() in > JavaClasses::compute_offsets(). > > The CAST_FROM_FN_PTR was to get it to compile with cmpxchg, the new > code does not require a cast.? If you can help with finding the right > set of casts, I'd be happy to put the cmpxchg back in. I just couldn't > find one. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/classfile/classLoaderData.cpp >> ? 167?? Chunk* head = (Chunk*) OrderAccess::load_acquire(&_head); >> >> I think the cast to Chunk* is no longer needed. > > Missed another, thanks.? No that's the same one David found. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/classfile/classLoaderData.cpp >> ? 946???? ClassLoaderData* old = Atomic::cmpxchg(cld, cld_addr, >> (ClassLoaderData*)NULL); >> ? 947???? if (old != NULL) { >> ? 948?????? delete cld; >> ? 949?????? // Returns the data. >> ? 950?????? return old; >> ? 951???? } >> >> That could instead be >> >> ?? 
if (!Atomic::replace_if_null(cld, cld_addr)) { >> ???? delete cld;?????????? // Lost the race. >> ???? return *cld_addr;???? // Use the winner's value. >> ?? } >> >> And apparently the caller of CLDG::add doesn't care whether the >> returned CLD has actually been added to the graph yet.? If that's not >> true, then there's a bug here, since a race loser might return a >> winner's value before the winner has actually done the insertion. > > True, the race loser doesn't care whether the CLD has been added to > the graph. > Your instead code requires a comment that replace_if_null is really a > compare exchange and has an extra read of the original value, so I am > leaving what I have which is clearer to me. > >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/classfile/verifier.cpp >> ?? 71 static void* verify_byte_codes_fn() { >> ?? 72?? if (OrderAccess::load_acquire(&_verify_byte_codes_fn) == NULL) { >> ?? 73???? void *lib_handle = os::native_java_library(); >> ?? 74???? void *func = os::dll_lookup(lib_handle, >> "VerifyClassCodesForMajorVersion"); >> ?? 75???? OrderAccess::release_store(&_verify_byte_codes_fn, func); >> ?? 76???? if (func == NULL) { >> ?? 77?????? _is_new_verify_byte_codes_fn = false; >> ?? 78?????? func = os::dll_lookup(lib_handle, "VerifyClassCodes"); >> ?? 79 OrderAccess::release_store(&_verify_byte_codes_fn, func); >> ?? 80???? } >> ?? 81?? } >> ?? 82?? return (void*)_verify_byte_codes_fn; >> ?? 83 } >> >> [pre-existing] >> >> I think this code has race problems; a caller could unexpectedly and >> inappropriately return NULL.? Consider the case where there is no >> VerifyClassCodesForMajorVersion, but there is VerifyClassCodes. >> >> The variable is initially NULL. >> >> Both Thread1 and Thread2 reach line 73, having both seen a NULL value >> for the variable. >> >> Thread1 reaches line 80, setting the variable to VerifyClassCodes. >> >> Thread2 reaches line 76, resetting the variable to NULL. >> >> Thread1 reads the now (momentarily) NULL value and returns it. >> >> I think the first release_store should be conditional on func != NULL. >> Also, the usage of _is_new_verify_byte_codes_fn seems suspect. >> And a minor additional nit: the cast in the return is unnecessary. > > Yes, this looks like a bug.?? I'll cut/paste this and file it. It may > be that this is support for the old verifier in old jdk versions that > can be cleaned up. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/code/nmethod.cpp >> 1664?? nmethod* observed_mark_link = _oops_do_mark_link; >> 1665?? if (observed_mark_link == NULL) { >> 1666???? // Claim this nmethod for this thread to mark. >> 1667???? if (Atomic::cmpxchg_if_null(NMETHOD_SENTINEL, >> &_oops_do_mark_link)) { >> >> With these changes, the only use of observed_mark_link is in the if. >> I'm not sure that variable is really useful anymore, e.g. just use >> >> ?? if (_oops_do_mark_link == NULL) { > > Ok fixed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >> >> In CMSCollector::par_take_from_overflow_list, if BUSY and prefix were >> of type oopDesc*, I think there would be a whole lot fewer casts and >> cast_to_oop's.? Later on, I think suffix_head, observed_overflow_list, >> and curr_overflow_list could also be oopDesc* instead of oop to >> eliminate more casts. 
> > I actually tried to make this change but ran into more fan out that > way, so went back and just fixed the cmpxchg calls to cast oops to > oopDesc* and things were less perturbed that way. >> >> And some similar changes in CMSCollector::par_push_on_overflow_list. >> >> And similarly in parNewGeneration.cpp, in push_on_overflow_list and >> take_from_overflow_list_work. >> >> As noted in the comments for JDK-8165857, the lists and "objects" >> involved here aren't really oops, but rather the shattered remains of > > Yes, somewhat horrified at the value of BUSY. >> oops.? The suggestion there was to use HeapWord* and carry through the >> fanout; what was actually done was to change _overflow_list to >> oopDesc* to minimize fanout, even though that's kind of lying to the >> type system.? Now, with the cleanup of cmpxchg_ptr and such, we're >> paying the price of doing the minimal thing back then. > > I will file an RFE about cleaning this up.? I think what I've done was > the minimal thing. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >> 7960?? Atomic::add(-n, &_num_par_pushes); >> >> Atomic::sub > > fixed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/cms/parNewGeneration.cpp >> 1455?? Atomic::add(-n, &_num_par_pushes); > fixed. >> Atomic::sub >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/dirtyCardQueue.cpp >> ? 283???? void* actual = Atomic::cmpxchg(next, &_cur_par_buffer_node, >> nd); >> ... >> ? 289?????? nd = static_cast(actual); >> >> Change actual's type to BufferNode* and remove the cast on line 289. > > fixed.? missed that one. gross. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/g1CollectedHeap.cpp >> >> [pre-existing] >> 3499???????? old = (CompiledMethod*)_postponed_list; >> >> I think that cast is only needed because >> G1CodeCacheUnloadingTask::_postponed_list is incorrectly typed as >> "volatile CompiledMethod*", when I think it ought to be >> "CompiledMethod* volatile". >> >> I think G1CodeCacheUnloading::_claimed_nmethod is similarly mis-typed, >> with a similar should not be needed cast: >> 3530?????? first = (CompiledMethod*)_claimed_nmethod; >> >> and another for _postponed_list here: >> 3552?????? claim = (CompiledMethod*)_postponed_list; > > I've fixed this.?? C++ is so confusing about where to put the > volatile.?? Everyone has been tripped up by it. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/g1HotCardCache.cpp >> ?? 77?? jbyte* previous_ptr = (jbyte*)Atomic::cmpxchg(card_ptr, >> >> I think the cast of the cmpxchg result is no longer needed. > > fixed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp >> ? 254?????? char* touch_addr = (char*)Atomic::add(actual_chunk_size, >> &_cur_addr) - actual_chunk_size; >> >> I think the cast of the add result is no longer needed. > got it already. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/g1StringDedup.cpp >> ? 213?? return (size_t)Atomic::add(partition_size, &_next_bucket) - >> partition_size; >> >> I think the cast of the add result is no longer needed. > > I was slacking in the g1 files.? 
fixed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/heapRegionRemSet.cpp >> ? 200?????? PerRegionTable* res = >> ? 201???????? Atomic::cmpxchg(nxt, &_free_list, fl); >> >> Please remove the line break, now that the code has been simplified. >> >> But wait, doesn't this alloc exhibit classic ABA problems?? I *think* >> this works because alloc and bulk_free are called in different phases, >> never overlapping. > > I don't know.? Do you want to file a bug to investigate this? > fixed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/g1/sparsePRT.cpp >> ? 295???? SparsePRT* res = >> ? 296?????? Atomic::cmpxchg(sprt, &_head_expanded_list, hd); >> and >> ? 307???? SparsePRT* res = >> ? 308?????? Atomic::cmpxchg(next, &_head_expanded_list, hd); >> >> I'd rather not have the line breaks in these either. >> >> And get_from_expanded_list also appears to have classic ABA problems. >> I *think* this works because add_to_expanded_list and >> get_from_expanded_list are called in different phases, never >> overlapping. > > Fixed, same question as above?? Or one bug to investigate both? >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/gc/shared/taskqueue.inline.hpp >> ? 262?? return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, >> ? 263?????????????????????????????????? (volatile intptr_t *)&_data, >> ? 264 (intptr_t)old_age._data); >> >> This should be >> >> ?? return Atomic::cmpxchg(new_age._data, &_data, old_age._data); > > fixed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/interpreter/bytecodeInterpreter.cpp >> This doesn't have any casts, which I think is correct. >> ? 708???????????? if (Atomic::cmpxchg(header, rcvr->mark_addr(), >> mark) == mark) { >> >> but these do. >> ? 718???????????? if (Atomic::cmpxchg((void*)new_header, >> rcvr->mark_addr(), mark) == mark) { >> ? 737???????????? if (Atomic::cmpxchg((void*)new_header, >> rcvr->mark_addr(), header) == header) { >> >> I'm not sure how the ones with casts even compile?? mark_addr() seems >> to be a markOop*, which is a markOopDesc**, where markOopDesc is a >> class.? void* is not implicitly convertible to markOopDesc*. >> >> Hm, this entire file is #ifdef CC_INTERP.? Is this zero-only code?? Or >> something like that? >> >> Similarly here: >> ? 906?????????? if (Atomic::cmpxchg(header, lockee->mark_addr(), >> mark) == mark) { >> and >> ? 917?????????? if (Atomic::cmpxchg((void*)new_header, >> lockee->mark_addr(), mark) == mark) { >> ? 935?????????? if (Atomic::cmpxchg((void*)new_header, >> lockee->mark_addr(), header) == header) { >> >> and here: >> 1847?????????????? if (Atomic::cmpxchg(header, lockee->mark_addr(), >> mark) == mark) { >> 1858?????????????? if (Atomic::cmpxchg((void*)new_header, >> lockee->mark_addr(), mark) == mark) { >> 1878?????????????? if (Atomic::cmpxchg((void*)new_header, >> lockee->mark_addr(), header) == header) { >> >> and here: >> 1847?????????????? if (Atomic::cmpxchg(header, lockee->mark_addr(), >> mark) == mark) { >> 1858?????????????? if (Atomic::cmpxchg((void*)new_header, >> lockee->mark_addr(), mark) == mark) { >> 1878?????????????? if (Atomic::cmpxchg((void*)new_header, >> lockee->mark_addr(), header) == header) { > > I've changed all these.?? This is part of Zero. 
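A stand-alone sketch of why giving new_header the markOop type removes the (void*) casts: the template cmpxchg requires the exchange value to convert to the destination's pointee type. The helper below is invented for illustration and is not the actual bytecodeInterpreter change:

class markOopDesc;                   // forward declaration, as in HotSpot
typedef markOopDesc* markOop;

static markOop cas_set_mark(markOop new_header, markOop* mark_addr, markOop old_mark) {
  // new_header matches *mark_addr (both markOop), so no (void*) cast is needed;
  // with the template cmpxchg such a cast would not even compile.
  return Atomic::cmpxchg(new_header, mark_addr, old_mark);
}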
>> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/memory/metaspace.cpp >> 1502?? size_t value = OrderAccess::load_acquire(&_capacity_until_GC); >> ... >> 1537?? return (size_t)Atomic::sub((intptr_t)v, &_capacity_until_GC); >> >> These and other uses of _capacity_until_GC suggest that variable's >> type should be size_t rather than intptr_t.? Note that I haven't done >> a careful check of uses to see if there are any places where such a >> change would cause problems. > > Yes, I had a hard time with metaspace.cpp because I agree > _capacity_until_GC should be size_t.?? Tried to make this change and > it cascaded a bit.? I'll file an RFE to change this type separately. > >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/oops/constantPool.cpp >> ? 229?? OrderAccess::release_store((Klass* volatile *)adr, k); >> ? 246?? OrderAccess::release_store((Klass* volatile *)adr, k); >> ? 514?? OrderAccess::release_store((Klass* volatile *)adr, k); >> >> Casts are not needed. > > fixed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/oops/constantPool.hpp >> ? 148???? volatile intptr_t adr = >> OrderAccess::load_acquire(obj_at_addr_raw(which)); >> >> [pre-existing] >> Why is adr declared volatile? > > golly beats me.? concurrency is scary, especially in the constant pool. > The load_acquire() should make sure the value is fetched from memory > so volatile is unneeded. > >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/oops/cpCache.cpp >> ? 157???? intx newflags = (value & parameter_size_mask); >> ? 158???? Atomic::cmpxchg(newflags, &_flags, (intx)0); >> >> This is a nice demonstration of why I wanted to include some value >> preserving integral conversions in cmpxchg, rather than requiring >> exact type matching in the integral case.? There have been some others >> that I haven't commented on.? Apparently we (I) got away with >> including such conversions in Atomic::add, which I'd forgotten about. >> And see comment regarding Atomic::sub below. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/oops/cpCache.hpp >> ? 139?? volatile Metadata*?? _f1;?????? // entry specific metadata field >> >> [pre-existing] >> I suspect the type should be Metadata* volatile.? And that would >> eliminate the need for the cast here: >> >> ? 339?? Metadata* f1_ord() const?????????????????????? { return >> (Metadata *)OrderAccess::load_acquire(&_f1); } >> >> I don't know if there are any other changes needed or desirable around >> _f1 usage. > > yes, fixed this. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/oops/method.hpp >> ? 139?? volatile address from_compiled_entry() const?? { return >> OrderAccess::load_acquire(&_from_compiled_entry); } >> ? 140?? volatile address from_compiled_entry_no_trampoline() const; >> ? 141?? volatile address from_interpreted_entry() const{ return >> OrderAccess::load_acquire(&_from_interpreted_entry); } >> >> [pre-existing] >> The volatile qualifiers here seem suspect to me. > > Again much suspicion about concurrency and giant pain, which I > remember, of debugging these when they were broken. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/oops/oop.inline.hpp >> ? 391???? 
narrowOop old = (narrowOop)Atomic::xchg(val, >> (narrowOop*)dest); >> >> Cast of return type is not needed. > > fixed. > >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/prims/jni.cpp >> >> [pre-existing] >> >> copy_jni_function_table should be using Copy::disjoint_words_atomic. > > yuck. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/prims/jni.cpp >> >> [pre-existing] >> >> 3892?? // We're about to use Atomic::xchg for synchronization. Some Zero >> 3893?? // platforms use the GCC builtin __sync_lock_test_and_set for >> this, >> 3894?? // but __sync_lock_test_and_set is not guaranteed to do what >> we want >> 3895?? // on all architectures.? So we check it works before relying >> on it. >> 3896 #if defined(ZERO) && defined(ASSERT) >> 3897?? { >> 3898???? jint a = 0xcafebabe; >> 3899???? jint b = Atomic::xchg(0xdeadbeef, &a); >> 3900???? void *c = &a; >> 3901???? void *d = Atomic::xchg(&b, &c); >> 3902???? assert(a == (jint) 0xdeadbeef && b == (jint) 0xcafebabe, >> "Atomic::xchg() works"); >> 3903???? assert(c == &b && d == &a, "Atomic::xchg() works"); >> 3904?? } >> 3905 #endif // ZERO && ASSERT >> >> It seems rather strange to be testing Atomic::xchg() here, rather than >> as part of unit testing Atomic?? Fail unit testing => don't try to >> use... > > This is zero.? I'm not touching this. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/prims/jvmtiRawMonitor.cpp >> ? 130???? if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { >> ? 142???? if (_owner == NULL && Atomic::cmpxchg_if_null((void*)Self, >> &_owner)) { >> >> I think these casts aren't needed. _owner is void*, and Self is >> Thread*, which is implicitly convertible to void*. >> >> Similarly here, for the THREAD argument: >> ? 280???? Contended = Atomic::cmpxchg((void*)THREAD, &_owner, >> (void*)NULL); >> ? 283???? Contended = Atomic::cmpxchg((void*)THREAD, &_owner, >> (void*)NULL); > > Okay, let me see if the compiler(s) eat that. (yes they do) >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/prims/jvmtiRawMonitor.hpp >> >> This file is in the webrev, but seems to be unchanged. > > It'll be cleaned up with the the commit and not be part of the changeset. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/atomic.hpp >> ? 520 template >> ? 521 inline D Atomic::sub(I sub_value, D volatile* dest) { >> ? 522?? STATIC_ASSERT(IsPointer::value || IsIntegral::value); >> ? 523?? // Assumes two's complement integer representation. >> ? 524?? #pragma warning(suppress: 4146) >> ? 525?? return Atomic::add(-sub_value, dest); >> ? 526 } >> >> I'm pretty sure this implementation is incorrect.? I think it produces >> the wrong result when I and D are both unsigned integer types and >> sizeof(I) < sizeof(D). > > Can you suggest a correction?? I just copied Atomic::dec(). >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/mutex.cpp >> ? 304?? intptr_t v = Atomic::cmpxchg((intptr_t)_LBIT, >> &_LockWord.FullWord, (intptr_t)0);? // agro ... >> >> _LBIT should probably be intptr_t, rather than an enum.? Note that the >> enum type is unused.? The old value here is another place where an >> implicit widening of same signedness would have been nice. 
(Such >> implicit widening doesn't work for enums, since it's unspecified >> whether they default to signed or unsigned representation, and >> implementatinos differ.) > > This would be a good/simple cleanup.? I changed it to const intptr_t > _LBIT = 1; >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/mutex.hpp >> >> [pre-existing] >> >> I think the Address member of the SplitWord union is unused. Looking >> at AcquireOrPush (and others), I'm wondering whether it *should* be >> used there, or whether just using intptr_t casts and doing integral >> arithmetic (as is presently being done) is easier and clearer. >> >> Also the _LSBINDEX macro probably ought to be defined in mutex.cpp >> rather than polluting the global namespace.? And technically, that >> name is reserved word. > > I moved both this and _LBIT into the top of mutex.cpp since they are > used there. > Cant define const intptr_t _LBIT =1; in a class in our version of C++. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/objectMonitor.cpp >> ? 252?? void * cur = Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL); >> ? 409?? if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { >> 1983?????? ox = (Thread*)Atomic::cmpxchg((void*)Self, &_owner, >> (void*)NULL); >> >> I think the casts of Self aren't needed. > > fixed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/objectMonitor.cpp >> ? 995?????? if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { >> 1020???????? if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { >> >> I think the casts of THREAD aren't needed. > > nope, fixed. >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/objectMonitor.hpp >> ? 254?? markOopDesc* volatile* header_addr(); >> >> Why isn't this volatile markOop* ? > > fixed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/synchronizer.cpp >> ? 242???????? Atomic::cmpxchg_if_null((void*)Self, &(m->_owner))) { >> >> I think the cast of Self isn't needed. > > fixed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/synchronizer.cpp >> ? 992?? for (; block != NULL; block = (PaddedEnd >> *)next(block)) { >> 1734???? for (; block != NULL; block = (PaddedEnd >> *)next(block)) { >> >> [pre-existing] >> All calls to next() pass a PaddedEnd* and cast the >> result.? How about moving all that behavior into next(). > > I fixed this next() function, but it necessitated a cast to FreeNext > field.? The PaddedEnd<> type was intentionally not propagated to all > the things that use it.?? Which is a shame because there are a lot > more casts to PaddedEnd that could have been removed. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/synchronizer.cpp >> 1970???? if (monitor > (ObjectMonitor *)&block[0] && >> 1971???????? monitor < (ObjectMonitor *)&block[_BLOCKSIZE]) { >> >> [pre-existing] >> Are the casts needed here?? I think PaddedEnd is >> derived from ObjectMonitor, so implicit conversions should apply. > > prob not.? removed them. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/synchronizer.hpp >> ?? 28 #include "memory/padded.hpp" >> ? 
163?? static PaddedEnd * volatile gBlockList; >> >> I was going to suggest as an alternative just making gBlockList a file >> scoped variable in synchronizer.cpp, since it isn't used outside of >> that file. Except that it is referenced by vmStructs.? Curses! > > It's also used by the SA. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/runtime/thread.cpp >> 4707?? intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, >> (intptr_t)0); >> >> This and other places suggest LOCKBIT should be defined as intptr_t, >> rather than as an enum value.? The MuxBits enum type is unused. >> >> And the cast of 0 is another case where implicit widening would be nice. > > Making LOCKBIT a const intptr_t = 1 removes a lot of casts. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/services/mallocSiteTable.cpp >> ? 261 bool MallocSiteHashtableEntry::atomic_insert(const >> MallocSiteHashtableEntry* entry) { >> ? 262?? return Atomic::cmpxchg_if_null(entry, (const >> MallocSiteHashtableEntry**)&_next); >> ? 263 } >> >> I think the problem here that is leading to the cast is that >> atomic_insert is taking a const T*.? Note that it's only caller passes >> a non-const T*. > > I'll change the type to non-const.? We try to use consts... > > Thanks for the detailed review!? The gcc compiler seems happy so far, > I'll post a webrev of the result of these changes after fixing > Atomic::sub() and seeing how the other compilers deal with these changes. > > Thanks, > Coleen > >> >> ------------------------------------------------------------------------------ >> >> > From david.holmes at oracle.com Sat Oct 14 12:32:14 2017 From: david.holmes at oracle.com (David Holmes) Date: Sat, 14 Oct 2017 22:32:14 +1000 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> <49b7c5f7-2f6d-16ac-0b60-140619d0fffd@oracle.com> Message-ID: <7265c30d-946b-19c4-a1b3-c3314a869ee8@oracle.com> Hi Coleen, These changes all seem okay to me - except I can't comment on the Atomic::sub implementation. :) Thanks for adding the assert to header_addr(). FYI from objectMonitor.hpp: // ObjectMonitor Layout Overview/Highlights/Restrictions: // // - The _header field must be at offset 0 because the displaced header // from markOop is stored there. We do not want markOop.hpp to include // ObjectMonitor.hpp to avoid exposing ObjectMonitor everywhere. This // means that ObjectMonitor cannot inherit from any other class nor can // it use any virtual member functions. This restriction is critical to // the proper functioning of the VM. so it is important we ensure this holds. Thanks, David On 14/10/2017 4:34 AM, coleen.phillimore at oracle.com wrote: > > Hi, Here is the version with the changes from Kim's comments that has > passed at least testing with JPRT and tier1, locally.?? More testing > (tier2-5) is in progress. > > Also includes a corrected version of Atomic::sub care of Erik Osterlund. > > open webrev at > http://cr.openjdk.java.net/~coleenp/8188220.kim-review-changes/webrev > open webrev at > http://cr.openjdk.java.net/~coleenp/8188220.review-comments/webrev > > Full version: > > http://cr.openjdk.java.net/~coleenp/8188220.03/webrev > > Thanks! 
> Coleen > > On 10/13/17 9:25 AM, coleen.phillimore at oracle.com wrote: >> >> Hi Kim, Thank you for the detailed review and the time you've spent on >> it, and discussion yesterday. >> >> On 10/12/17 7:17 PM, Kim Barrett wrote: >>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>>> >>>> Summary: With the new template functions these are unnecessary. >>>> >>>> The changes are mostly s/_ptr// and removing the cast to return >>>> type.? There weren't many types that needed to be improved to match >>>> the template version of the function.?? Some notes: >>>> 1. replaced CASPTR with Atomic::cmpxchg() in mutex.cpp, rearranging >>>> arguments. >>>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null.? I >>>> disliked the first name because it's not explicit from the callers >>>> that there's an underlying cas.? If people want to fight, I'll >>>> remove the function and use cmpxchg because there are only a couple >>>> places where this is a little nicer. >>>> 3. Added Atomic::sub() >>>> >>>> Tested with JPRT, mach5 tier1-5 on linux,windows and solaris. >>>> >>>> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.01/webrev >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8188220 >>>> >>>> Thanks, >>>> Coleen >>> I looked harder at the potential ABA problems, and believe they are >>> okay.? There can be multiple threads doing pushes, and there can be >>> multiple threads doing pops, but not both at the same time. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/cpu/zero/cppInterpreter_zero.cpp >>> ? 279???? if (Atomic::cmpxchg(monitor, lockee->mark_addr(), disp) != >>> disp) { >>> >>> How does this work?? monitor and disp seem like they have unrelated >>> types?? Given that this is zero-specific code, maybe this hasn't been >>> tested? >>> >>> Similarly here: >>> ? 423?????? if (Atomic::cmpxchg(header, rcvr->mark_addr(), lock) != >>> lock) { >> >> I haven't built zero.? I don't know how to do this anymore (help?) I >> fixed the obvious type mismatches here and in >> bytecodeInterpreter.cpp.? I'll try to build it. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/asm/assembler.cpp >>> ? 239???????? dcon->value_fn = cfn; >>> >>> Is it actually safe to remove the atomic update?? If multiple threads >>> performing the assignment *are* possible (and I don't understand the >>> context yet, so don't know the answer to that), then a bare non-atomic >>> assignment is a race, e.g. undefined behavior. >>> >>> Regardless of that, I think the CAST_FROM_FN_PTR should be retained. >> >> I can find no uses of this code, ie. looking for "delayed_value". I >> think it was early jsr292 code.? I could also not find any combination >> of casts that would make it compile, so in the end I believed the >> comment and took out the cmpxchg.?? The code appears to be intended to >> for bootstrapping, see the call to update_delayed_values() in >> JavaClasses::compute_offsets(). >> >> The CAST_FROM_FN_PTR was to get it to compile with cmpxchg, the new >> code does not require a cast.? If you can help with finding the right >> set of casts, I'd be happy to put the cmpxchg back in. I just couldn't >> find one. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/classfile/classLoaderData.cpp >>> ? 167?? 
Chunk* head = (Chunk*) OrderAccess::load_acquire(&_head); >>> >>> I think the cast to Chunk* is no longer needed. >> >> Missed another, thanks.? No that's the same one David found. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/classfile/classLoaderData.cpp >>> ? 946???? ClassLoaderData* old = Atomic::cmpxchg(cld, cld_addr, >>> (ClassLoaderData*)NULL); >>> ? 947???? if (old != NULL) { >>> ? 948?????? delete cld; >>> ? 949?????? // Returns the data. >>> ? 950?????? return old; >>> ? 951???? } >>> >>> That could instead be >>> >>> ?? if (!Atomic::replace_if_null(cld, cld_addr)) { >>> ???? delete cld;?????????? // Lost the race. >>> ???? return *cld_addr;???? // Use the winner's value. >>> ?? } >>> >>> And apparently the caller of CLDG::add doesn't care whether the >>> returned CLD has actually been added to the graph yet.? If that's not >>> true, then there's a bug here, since a race loser might return a >>> winner's value before the winner has actually done the insertion. >> >> True, the race loser doesn't care whether the CLD has been added to >> the graph. >> Your instead code requires a comment that replace_if_null is really a >> compare exchange and has an extra read of the original value, so I am >> leaving what I have which is clearer to me. >> >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/classfile/verifier.cpp >>> ?? 71 static void* verify_byte_codes_fn() { >>> ?? 72?? if (OrderAccess::load_acquire(&_verify_byte_codes_fn) == NULL) { >>> ?? 73???? void *lib_handle = os::native_java_library(); >>> ?? 74???? void *func = os::dll_lookup(lib_handle, >>> "VerifyClassCodesForMajorVersion"); >>> ?? 75???? OrderAccess::release_store(&_verify_byte_codes_fn, func); >>> ?? 76???? if (func == NULL) { >>> ?? 77?????? _is_new_verify_byte_codes_fn = false; >>> ?? 78?????? func = os::dll_lookup(lib_handle, "VerifyClassCodes"); >>> ?? 79 OrderAccess::release_store(&_verify_byte_codes_fn, func); >>> ?? 80???? } >>> ?? 81?? } >>> ?? 82?? return (void*)_verify_byte_codes_fn; >>> ?? 83 } >>> >>> [pre-existing] >>> >>> I think this code has race problems; a caller could unexpectedly and >>> inappropriately return NULL.? Consider the case where there is no >>> VerifyClassCodesForMajorVersion, but there is VerifyClassCodes. >>> >>> The variable is initially NULL. >>> >>> Both Thread1 and Thread2 reach line 73, having both seen a NULL value >>> for the variable. >>> >>> Thread1 reaches line 80, setting the variable to VerifyClassCodes. >>> >>> Thread2 reaches line 76, resetting the variable to NULL. >>> >>> Thread1 reads the now (momentarily) NULL value and returns it. >>> >>> I think the first release_store should be conditional on func != NULL. >>> Also, the usage of _is_new_verify_byte_codes_fn seems suspect. >>> And a minor additional nit: the cast in the return is unnecessary. >> >> Yes, this looks like a bug.?? I'll cut/paste this and file it. It may >> be that this is support for the old verifier in old jdk versions that >> can be cleaned up. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/code/nmethod.cpp >>> 1664?? nmethod* observed_mark_link = _oops_do_mark_link; >>> 1665?? if (observed_mark_link == NULL) { >>> 1666???? // Claim this nmethod for this thread to mark. >>> 1667???? 
if (Atomic::cmpxchg_if_null(NMETHOD_SENTINEL, >>> &_oops_do_mark_link)) { >>> >>> With these changes, the only use of observed_mark_link is in the if. >>> I'm not sure that variable is really useful anymore, e.g. just use >>> >>> if (_oops_do_mark_link == NULL) { >> >> Ok fixed. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >>> >>> In CMSCollector::par_take_from_overflow_list, if BUSY and prefix were >>> of type oopDesc*, I think there would be a whole lot fewer casts and >>> cast_to_oop's. Later on, I think suffix_head, observed_overflow_list, >>> and curr_overflow_list could also be oopDesc* instead of oop to >>> eliminate more casts. >> >> I actually tried to make this change but ran into more fan out that >> way, so went back and just fixed the cmpxchg calls to cast oops to >> oopDesc* and things were less perturbed that way. >>> >>> And some similar changes in CMSCollector::par_push_on_overflow_list. >>> >>> And similarly in parNewGeneration.cpp, in push_on_overflow_list and >>> take_from_overflow_list_work. >>> >>> As noted in the comments for JDK-8165857, the lists and "objects" >>> involved here aren't really oops, but rather the shattered remains of >> >> Yes, somewhat horrified at the value of BUSY. >>> oops. The suggestion there was to use HeapWord* and carry through the >>> fanout; what was actually done was to change _overflow_list to >>> oopDesc* to minimize fanout, even though that's kind of lying to the >>> type system. Now, with the cleanup of cmpxchg_ptr and such, we're >>> paying the price of doing the minimal thing back then. >> >> I will file an RFE about cleaning this up. I think what I've done was >> the minimal thing. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >>> 7960 Atomic::add(-n, &_num_par_pushes); >>> >>> Atomic::sub >> >> fixed. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/gc/cms/parNewGeneration.cpp >>> 1455 Atomic::add(-n, &_num_par_pushes); >> fixed. >>> Atomic::sub >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/gc/g1/dirtyCardQueue.cpp >>> 283 void* actual = Atomic::cmpxchg(next, &_cur_par_buffer_node, >>> nd); >>> ... >>> 289 nd = static_cast<BufferNode*>(actual); >>> >>> Change actual's type to BufferNode* and remove the cast on line 289. >> >> fixed. missed that one. gross. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/gc/g1/g1CollectedHeap.cpp >>> >>> [pre-existing] >>> 3499 old = (CompiledMethod*)_postponed_list; >>> >>> I think that cast is only needed because >>> G1CodeCacheUnloadingTask::_postponed_list is incorrectly typed as >>> "volatile CompiledMethod*", when I think it ought to be >>> "CompiledMethod* volatile". >>> >>> I think G1CodeCacheUnloading::_claimed_nmethod is similarly mis-typed, >>> with a similar should not be needed cast: >>> 3530 first = (CompiledMethod*)_claimed_nmethod; >>> >>> and another for _postponed_list here: >>> 3552 claim = (CompiledMethod*)_postponed_list; >> >> I've fixed this. C++ is so confusing about where to put the >> volatile. Everyone has been tripped up by it. 
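To make the volatile-placement point above concrete, here is a minimal sketch (not taken from the webrev; the forward declaration stands in for the real class) of why the position of the qualifier decides whether a cast is needed:

  struct CompiledMethod;        // placeholder forward declaration

  volatile CompiledMethod* a;   // pointer to volatile data: converting a
                                // load of 'a' to a plain CompiledMethod*
                                // discards the pointee's volatile, hence
                                // the casts flagged above
  CompiledMethod* volatile b;   // volatile pointer to plain data: the
                                // pointer itself is the volatile object,
                                // so it converts without a cast

  CompiledMethod* read_a() { return (CompiledMethod*)a; }  // cast required
  CompiledMethod* read_b() { return b; }                    // no cast needed

Declaring _postponed_list and _claimed_nmethod as CompiledMethod* volatile is what lets the loads at 3499/3530/3552 drop their casts.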
>>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/gc/g1/g1HotCardCache.cpp >>> ?? 77?? jbyte* previous_ptr = (jbyte*)Atomic::cmpxchg(card_ptr, >>> >>> I think the cast of the cmpxchg result is no longer needed. >> >> fixed. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp >>> ? 254?????? char* touch_addr = (char*)Atomic::add(actual_chunk_size, >>> &_cur_addr) - actual_chunk_size; >>> >>> I think the cast of the add result is no longer needed. >> got it already. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/gc/g1/g1StringDedup.cpp >>> ? 213?? return (size_t)Atomic::add(partition_size, &_next_bucket) - >>> partition_size; >>> >>> I think the cast of the add result is no longer needed. >> >> I was slacking in the g1 files.? fixed. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/gc/g1/heapRegionRemSet.cpp >>> ? 200?????? PerRegionTable* res = >>> ? 201???????? Atomic::cmpxchg(nxt, &_free_list, fl); >>> >>> Please remove the line break, now that the code has been simplified. >>> >>> But wait, doesn't this alloc exhibit classic ABA problems?? I *think* >>> this works because alloc and bulk_free are called in different phases, >>> never overlapping. >> >> I don't know.? Do you want to file a bug to investigate this? >> fixed. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/gc/g1/sparsePRT.cpp >>> ? 295???? SparsePRT* res = >>> ? 296?????? Atomic::cmpxchg(sprt, &_head_expanded_list, hd); >>> and >>> ? 307???? SparsePRT* res = >>> ? 308?????? Atomic::cmpxchg(next, &_head_expanded_list, hd); >>> >>> I'd rather not have the line breaks in these either. >>> >>> And get_from_expanded_list also appears to have classic ABA problems. >>> I *think* this works because add_to_expanded_list and >>> get_from_expanded_list are called in different phases, never >>> overlapping. >> >> Fixed, same question as above?? Or one bug to investigate both? >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/gc/shared/taskqueue.inline.hpp >>> ? 262?? return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, >>> ? 263?????????????????????????????????? (volatile intptr_t *)&_data, >>> ? 264 (intptr_t)old_age._data); >>> >>> This should be >>> >>> ?? return Atomic::cmpxchg(new_age._data, &_data, old_age._data); >> >> fixed. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/interpreter/bytecodeInterpreter.cpp >>> This doesn't have any casts, which I think is correct. >>> ? 708???????????? if (Atomic::cmpxchg(header, rcvr->mark_addr(), >>> mark) == mark) { >>> >>> but these do. >>> ? 718???????????? if (Atomic::cmpxchg((void*)new_header, >>> rcvr->mark_addr(), mark) == mark) { >>> ? 737???????????? if (Atomic::cmpxchg((void*)new_header, >>> rcvr->mark_addr(), header) == header) { >>> >>> I'm not sure how the ones with casts even compile?? mark_addr() seems >>> to be a markOop*, which is a markOopDesc**, where markOopDesc is a >>> class.? void* is not implicitly convertible to markOopDesc*. >>> >>> Hm, this entire file is #ifdef CC_INTERP.? Is this zero-only code?? Or >>> something like that? >>> >>> Similarly here: >>> ? 906?????????? 
if (Atomic::cmpxchg(header, lockee->mark_addr(), >>> mark) == mark) { >>> and >>> ? 917?????????? if (Atomic::cmpxchg((void*)new_header, >>> lockee->mark_addr(), mark) == mark) { >>> ? 935?????????? if (Atomic::cmpxchg((void*)new_header, >>> lockee->mark_addr(), header) == header) { >>> >>> and here: >>> 1847?????????????? if (Atomic::cmpxchg(header, lockee->mark_addr(), >>> mark) == mark) { >>> 1858?????????????? if (Atomic::cmpxchg((void*)new_header, >>> lockee->mark_addr(), mark) == mark) { >>> 1878?????????????? if (Atomic::cmpxchg((void*)new_header, >>> lockee->mark_addr(), header) == header) { >>> >>> and here: >>> 1847?????????????? if (Atomic::cmpxchg(header, lockee->mark_addr(), >>> mark) == mark) { >>> 1858?????????????? if (Atomic::cmpxchg((void*)new_header, >>> lockee->mark_addr(), mark) == mark) { >>> 1878?????????????? if (Atomic::cmpxchg((void*)new_header, >>> lockee->mark_addr(), header) == header) { >> >> I've changed all these.?? This is part of Zero. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/memory/metaspace.cpp >>> 1502?? size_t value = OrderAccess::load_acquire(&_capacity_until_GC); >>> ... >>> 1537?? return (size_t)Atomic::sub((intptr_t)v, &_capacity_until_GC); >>> >>> These and other uses of _capacity_until_GC suggest that variable's >>> type should be size_t rather than intptr_t.? Note that I haven't done >>> a careful check of uses to see if there are any places where such a >>> change would cause problems. >> >> Yes, I had a hard time with metaspace.cpp because I agree >> _capacity_until_GC should be size_t.?? Tried to make this change and >> it cascaded a bit.? I'll file an RFE to change this type separately. >> >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/oops/constantPool.cpp >>> ? 229?? OrderAccess::release_store((Klass* volatile *)adr, k); >>> ? 246?? OrderAccess::release_store((Klass* volatile *)adr, k); >>> ? 514?? OrderAccess::release_store((Klass* volatile *)adr, k); >>> >>> Casts are not needed. >> >> fixed. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/oops/constantPool.hpp >>> ? 148???? volatile intptr_t adr = >>> OrderAccess::load_acquire(obj_at_addr_raw(which)); >>> >>> [pre-existing] >>> Why is adr declared volatile? >> >> golly beats me.? concurrency is scary, especially in the constant pool. >> The load_acquire() should make sure the value is fetched from memory >> so volatile is unneeded. >> >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/oops/cpCache.cpp >>> ? 157???? intx newflags = (value & parameter_size_mask); >>> ? 158???? Atomic::cmpxchg(newflags, &_flags, (intx)0); >>> >>> This is a nice demonstration of why I wanted to include some value >>> preserving integral conversions in cmpxchg, rather than requiring >>> exact type matching in the integral case.? There have been some others >>> that I haven't commented on.? Apparently we (I) got away with >>> including such conversions in Atomic::add, which I'd forgotten about. >>> And see comment regarding Atomic::sub below. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/oops/cpCache.hpp >>> ? 139?? volatile Metadata*?? _f1;?????? // entry specific metadata field >>> >>> [pre-existing] >>> I suspect the type should be Metadata* volatile.? 
And that would >>> eliminate the need for the cast here: >>> >>> ? 339?? Metadata* f1_ord() const?????????????????????? { return >>> (Metadata *)OrderAccess::load_acquire(&_f1); } >>> >>> I don't know if there are any other changes needed or desirable around >>> _f1 usage. >> >> yes, fixed this. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/oops/method.hpp >>> ? 139?? volatile address from_compiled_entry() const?? { return >>> OrderAccess::load_acquire(&_from_compiled_entry); } >>> ? 140?? volatile address from_compiled_entry_no_trampoline() const; >>> ? 141?? volatile address from_interpreted_entry() const{ return >>> OrderAccess::load_acquire(&_from_interpreted_entry); } >>> >>> [pre-existing] >>> The volatile qualifiers here seem suspect to me. >> >> Again much suspicion about concurrency and giant pain, which I >> remember, of debugging these when they were broken. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/oops/oop.inline.hpp >>> ? 391???? narrowOop old = (narrowOop)Atomic::xchg(val, >>> (narrowOop*)dest); >>> >>> Cast of return type is not needed. >> >> fixed. >> >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/prims/jni.cpp >>> >>> [pre-existing] >>> >>> copy_jni_function_table should be using Copy::disjoint_words_atomic. >> >> yuck. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/prims/jni.cpp >>> >>> [pre-existing] >>> >>> 3892?? // We're about to use Atomic::xchg for synchronization. Some Zero >>> 3893?? // platforms use the GCC builtin __sync_lock_test_and_set for >>> this, >>> 3894?? // but __sync_lock_test_and_set is not guaranteed to do what >>> we want >>> 3895?? // on all architectures.? So we check it works before relying >>> on it. >>> 3896 #if defined(ZERO) && defined(ASSERT) >>> 3897?? { >>> 3898???? jint a = 0xcafebabe; >>> 3899???? jint b = Atomic::xchg(0xdeadbeef, &a); >>> 3900???? void *c = &a; >>> 3901???? void *d = Atomic::xchg(&b, &c); >>> 3902???? assert(a == (jint) 0xdeadbeef && b == (jint) 0xcafebabe, >>> "Atomic::xchg() works"); >>> 3903???? assert(c == &b && d == &a, "Atomic::xchg() works"); >>> 3904?? } >>> 3905 #endif // ZERO && ASSERT >>> >>> It seems rather strange to be testing Atomic::xchg() here, rather than >>> as part of unit testing Atomic?? Fail unit testing => don't try to >>> use... >> >> This is zero.? I'm not touching this. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/prims/jvmtiRawMonitor.cpp >>> ? 130???? if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { >>> ? 142???? if (_owner == NULL && Atomic::cmpxchg_if_null((void*)Self, >>> &_owner)) { >>> >>> I think these casts aren't needed. _owner is void*, and Self is >>> Thread*, which is implicitly convertible to void*. >>> >>> Similarly here, for the THREAD argument: >>> ? 280???? Contended = Atomic::cmpxchg((void*)THREAD, &_owner, >>> (void*)NULL); >>> ? 283???? Contended = Atomic::cmpxchg((void*)THREAD, &_owner, >>> (void*)NULL); >> >> Okay, let me see if the compiler(s) eat that. (yes they do) >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/prims/jvmtiRawMonitor.hpp >>> >>> This file is in the webrev, but seems to be unchanged. 
>> >> It'll be cleaned up with the the commit and not be part of the changeset. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/runtime/atomic.hpp >>> ? 520 template >>> ? 521 inline D Atomic::sub(I sub_value, D volatile* dest) { >>> ? 522?? STATIC_ASSERT(IsPointer::value || IsIntegral::value); >>> ? 523?? // Assumes two's complement integer representation. >>> ? 524?? #pragma warning(suppress: 4146) >>> ? 525?? return Atomic::add(-sub_value, dest); >>> ? 526 } >>> >>> I'm pretty sure this implementation is incorrect.? I think it produces >>> the wrong result when I and D are both unsigned integer types and >>> sizeof(I) < sizeof(D). >> >> Can you suggest a correction?? I just copied Atomic::dec(). >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/runtime/mutex.cpp >>> ? 304?? intptr_t v = Atomic::cmpxchg((intptr_t)_LBIT, >>> &_LockWord.FullWord, (intptr_t)0);? // agro ... >>> >>> _LBIT should probably be intptr_t, rather than an enum.? Note that the >>> enum type is unused.? The old value here is another place where an >>> implicit widening of same signedness would have been nice. (Such >>> implicit widening doesn't work for enums, since it's unspecified >>> whether they default to signed or unsigned representation, and >>> implementatinos differ.) >> >> This would be a good/simple cleanup.? I changed it to const intptr_t >> _LBIT = 1; >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/runtime/mutex.hpp >>> >>> [pre-existing] >>> >>> I think the Address member of the SplitWord union is unused. Looking >>> at AcquireOrPush (and others), I'm wondering whether it *should* be >>> used there, or whether just using intptr_t casts and doing integral >>> arithmetic (as is presently being done) is easier and clearer. >>> >>> Also the _LSBINDEX macro probably ought to be defined in mutex.cpp >>> rather than polluting the global namespace.? And technically, that >>> name is reserved word. >> >> I moved both this and _LBIT into the top of mutex.cpp since they are >> used there. >> Cant define const intptr_t _LBIT =1; in a class in our version of C++. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/runtime/objectMonitor.cpp >>> ? 252?? void * cur = Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL); >>> ? 409?? if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { >>> 1983?????? ox = (Thread*)Atomic::cmpxchg((void*)Self, &_owner, >>> (void*)NULL); >>> >>> I think the casts of Self aren't needed. >> >> fixed. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/runtime/objectMonitor.cpp >>> ? 995?????? if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { >>> 1020???????? if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { >>> >>> I think the casts of THREAD aren't needed. >> >> nope, fixed. >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/runtime/objectMonitor.hpp >>> ? 254?? markOopDesc* volatile* header_addr(); >>> >>> Why isn't this volatile markOop* ? >> >> fixed. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/runtime/synchronizer.cpp >>> ? 242???????? Atomic::cmpxchg_if_null((void*)Self, &(m->_owner))) { >>> >>> I think the cast of Self isn't needed. 
>> >> fixed. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/runtime/synchronizer.cpp >>> ? 992?? for (; block != NULL; block = (PaddedEnd >>> *)next(block)) { >>> 1734???? for (; block != NULL; block = (PaddedEnd >>> *)next(block)) { >>> >>> [pre-existing] >>> All calls to next() pass a PaddedEnd* and cast the >>> result.? How about moving all that behavior into next(). >> >> I fixed this next() function, but it necessitated a cast to FreeNext >> field.? The PaddedEnd<> type was intentionally not propagated to all >> the things that use it.?? Which is a shame because there are a lot >> more casts to PaddedEnd that could have been removed. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/runtime/synchronizer.cpp >>> 1970???? if (monitor > (ObjectMonitor *)&block[0] && >>> 1971???????? monitor < (ObjectMonitor *)&block[_BLOCKSIZE]) { >>> >>> [pre-existing] >>> Are the casts needed here?? I think PaddedEnd is >>> derived from ObjectMonitor, so implicit conversions should apply. >> >> prob not.? removed them. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/runtime/synchronizer.hpp >>> ?? 28 #include "memory/padded.hpp" >>> ? 163?? static PaddedEnd * volatile gBlockList; >>> >>> I was going to suggest as an alternative just making gBlockList a file >>> scoped variable in synchronizer.cpp, since it isn't used outside of >>> that file. Except that it is referenced by vmStructs.? Curses! >> >> It's also used by the SA. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/runtime/thread.cpp >>> 4707?? intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, >>> (intptr_t)0); >>> >>> This and other places suggest LOCKBIT should be defined as intptr_t, >>> rather than as an enum value.? The MuxBits enum type is unused. >>> >>> And the cast of 0 is another case where implicit widening would be nice. >> >> Making LOCKBIT a const intptr_t = 1 removes a lot of casts. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/services/mallocSiteTable.cpp >>> ? 261 bool MallocSiteHashtableEntry::atomic_insert(const >>> MallocSiteHashtableEntry* entry) { >>> ? 262?? return Atomic::cmpxchg_if_null(entry, (const >>> MallocSiteHashtableEntry**)&_next); >>> ? 263 } >>> >>> I think the problem here that is leading to the cast is that >>> atomic_insert is taking a const T*.? Note that it's only caller passes >>> a non-const T*. >> >> I'll change the type to non-const.? We try to use consts... >> >> Thanks for the detailed review!? The gcc compiler seems happy so far, >> I'll post a webrev of the result of these changes after fixing >> Atomic::sub() and seeing how the other compilers deal with these changes. >> >> Thanks, >> Coleen >> >>> >>> ------------------------------------------------------------------------------ >>> >>> >> > From rkennke at redhat.com Sat Oct 14 22:41:05 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 00:41:05 +0200 Subject: RFR: 8171853: Remove Shark compiler Message-ID: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> The JEP to remove the Shark compiler has received exclusively positive feedback (JDK-8189173) on zero-dev. So here comes the big patch to remove it. 
What I have done: grep -i -R shark src grep -i -R shark make grep -i -R shark doc grep -i -R shark doc and purged any reference to shark. Almost everything was straightforward. The only things I wasn't really sure of: - in globals.hpp, I re-arranged the KIND_* bits to account for the gap that removing KIND_SHARK left. I hope that's good? - in relocInfo_zero.hpp I put a ShouldNotCallThis() in pd_address_in_code(), I am not sure it is the right thing to do. If not, what *would* be the right thing? Then of course I did: rm -rf src/hotspot/share/shark I also went through the build machinery and removed stuff related to Shark and LLVM libs. Now the only references in the whole JDK tree to shark is a 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) I tested by building a regular x86 JVM and running JTREG tests. All looks fine. - I could not build zero because it seems broken because of the recent Atomic::* changes - I could not test any of the other arches that seemed to reference Shark (arm and sparc) Here's the full webrev: http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ Can I get a review on this? Thanks, Roman From kim.barrett at oracle.com Sat Oct 14 23:36:44 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Sat, 14 Oct 2017 19:36:44 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> <49b7c5f7-2f6d-16ac-0b60-140619d0fffd@oracle.com> Message-ID: <0784FA88-3D00-4DBA-8726-3A3B23C91B3E@oracle.com> > On Oct 13, 2017, at 2:34 PM, coleen.phillimore at oracle.com wrote: > > > Hi, Here is the version with the changes from Kim's comments that has passed at least testing with JPRT and tier1, locally. More testing (tier2-5) is in progress. > > Also includes a corrected version of Atomic::sub care of Erik Osterlund. > > open webrev at http://cr.openjdk.java.net/~coleenp/8188220.kim-review-changes/webrev > open webrev at http://cr.openjdk.java.net/~coleenp/8188220.review-comments/webrev > > Full version: > > http://cr.openjdk.java.net/~coleenp/8188220.03/webrev > > Thanks! > Coleen I still dislike and disagree with what is being proposed regarding replace_if_null. ------------------------------------------------------------------------------ I forgot that I'd promised you an updated Atomic::sub definition. Unfortunately, the new one still has problems, performing some conversions that should not be permitted (and are disallowed by Atomic::add). Try this instead. (This hasn't been tested, not even compiled; hopefully I don't have any typos or anything.) The intent is that this supports the same conversions as Atomic::add.

template<typename I, typename D>
inline D Atomic::sub(I sub_value, D volatile* dest) {
  STATIC_ASSERT(IsPointer<D>::value || IsIntegral<D>::value);
  STATIC_ASSERT(IsIntegral<I>::value);
  // If D is a pointer type, use [u]intptr_t as the addend type,
  // matching signedness of I. Otherwise, use D as the addend type.
  typedef typename Conditional<IsSigned<I>::value, intptr_t, uintptr_t>::type PI;
  typedef typename Conditional<IsPointer<D>::value, PI, D>::type AddendType;
  // Only allow conversions that can't change the value.
  STATIC_ASSERT(IsSigned<I>::value == IsSigned<AddendType>::value);
  STATIC_ASSERT(sizeof(I) <= sizeof(AddendType));
  AddendType addend = sub_value;
  // Assumes two's complement integer representation.
  #pragma warning(suppress: 4146) // In case AddendType is not signed.
return Atomic::add(-addend, dest); } >>> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >>> 7960 Atomic::add(-n, &_num_par_pushes); >>> >>> Atomic::sub >> >> fixed. Nope, not fixed in http://cr.openjdk.java.net/~coleenp/8188220.03/webrev >>> src/hotspot/share/gc/g1/heapRegionRemSet.cpp >>> 200 PerRegionTable* res = >>> 201 Atomic::cmpxchg(nxt, &_free_list, fl); >>> >>> Please remove the line break, now that the code has been simplified. >>> >>> But wait, doesn't this alloc exhibit classic ABA problems? I *think* >>> this works because alloc and bulk_free are called in different phases, >>> never overlapping. >> >> I don't know. Do you want to file a bug to investigate this? >> fixed. No, I now think it?s ok, though confusing. >>> src/hotspot/share/gc/g1/sparsePRT.cpp >>> 295 SparsePRT* res = >>> 296 Atomic::cmpxchg(sprt, &_head_expanded_list, hd); >>> and >>> 307 SparsePRT* res = >>> 308 Atomic::cmpxchg(next, &_head_expanded_list, hd); >>> >>> I'd rather not have the line breaks in these either. >>> >>> And get_from_expanded_list also appears to have classic ABA problems. >>> I *think* this works because add_to_expanded_list and >>> get_from_expanded_list are called in different phases, never >>> overlapping. >> >> Fixed, same question as above? Or one bug to investigate both? Again, I think it?s ok, though confusing. >>> src/hotspot/share/gc/shared/taskqueue.inline.hpp >>> 262 return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, >>> 263 (volatile intptr_t *)&_data, >>> 264 (intptr_t)old_age._data); >>> >>> This should be >>> >>> return Atomic::cmpxchg(new_age._data, &_data, old_age._data); >> >> fixed. Still casting the result. >>> src/hotspot/share/oops/method.hpp >>> 139 volatile address from_compiled_entry() const { return OrderAccess::load_acquire(&_from_compiled_entry); } >>> 140 volatile address from_compiled_entry_no_trampoline() const; >>> 141 volatile address from_interpreted_entry() const{ return OrderAccess::load_acquire(&_from_interpreted_entry); } >>> >>> [pre-existing] >>> The volatile qualifiers here seem suspect to me. >> >> Again much suspicion about concurrency and giant pain, which I remember, of debugging these when they were broken. Let me be more direct: the volatile qualifiers for the function return types are bogus and confusing, and should be removed. >>> src/hotspot/share/prims/jni.cpp >>> >>> [pre-existing] >>> >>> copy_jni_function_table should be using Copy::disjoint_words_atomic. >> >> yuck. Of course, neither is entirely technically correct, since both are treating conversion of function pointers to void* as okay in shared code, e.g. violating some of the raison d'etre of CAST_{TO,FROM}_FN_PTR. For way more detail than you probably care about, see the discussion starting here: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-March/018578.html through (5 messages in total) http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-March/018623.html Oh well. >>> src/hotspot/share/runtime/mutex.hpp >>> >>> [pre-existing] >>> >>> I think the Address member of the SplitWord union is unused. Looking >>> at AcquireOrPush (and others), I'm wondering whether it *should* be >>> used there, or whether just using intptr_t casts and doing integral >>> arithmetic (as is presently being done) is easier and clearer. >>> >>> Also the _LSBINDEX macro probably ought to be defined in mutex.cpp >>> rather than polluting the global namespace. And technically, that >>> name is reserved word. 
>> >> I moved both this and _LBIT into the top of mutex.cpp since they are used there. Good. >> Cant define const intptr_t _LBIT =1; in a class in our version of C++. Sorry, please explain? If you tried to move it into SplitWord, that doesn?t work; unions are not permitted to have static data members (I don?t off-hand know why, just that it?s explicitly forbidden). And you left the seemingly unused Address member in SplitWord. >>> src/hotspot/share/runtime/thread.cpp >>> 4707 intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, (intptr_t)0); >>> >>> This and other places suggest LOCKBIT should be defined as intptr_t, >>> rather than as an enum value. The MuxBits enum type is unused. >>> >>> And the cast of 0 is another case where implicit widening would be nice. >> >> Making LOCKBIT a const intptr_t = 1 removes a lot of casts. Because of the new definition of LOCKBIT I noticed the immediately preceeding typedef for MutexT, which seems to be unused. ------------------------------------------------------------------------------ src/hotspot/share/oops/cpCache.cpp 114 bool ConstantPoolCacheEntry::init_flags_atomic(intx flags) { 115 intptr_t result = Atomic::cmpxchg(flags, &_flags, (intx)0); 116 return (result == 0); 117 } [I missed this on earlier pass.] Should be bool ConstantPoolCacheEntry::init_flags_atomic(intx flags) { return Atomic::cmpxchg(flags, &_flags, (intx)0) == 0; } Otherwise, I end up asking why result is intptr_t when the cmpxchg is dealing with intx. Yeah, one's a typedef of the other, but mixing them like that in the same expression is not helpful. From glaubitz at physik.fu-berlin.de Sun Oct 15 06:06:12 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Sun, 15 Oct 2017 08:06:12 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> Message-ID: <87BC5241-9C27-457F-9856-3D969831DABC@physik.fu-berlin.de> Hi Roman! Please let me look at SPARC next week first before merging this. And thanks for notifying me that Zero is broken again *sigh*. People, please test your changes. Yes, I know you all just care about Hotspot. But please understand that there are many people out there who rely on Zero, i.e. they are using it. Breaking code that people actively use is not nice and should not happen in a project like OpenJDK. Building Zero takes maybe 5 minutes on a fast x86 machine, so I would like to ask everyone to please test their changes against Zero as well. These tests will keep the headaches for people relying on Zero low and also avoids that distributions have to ship many patches on top of OpenJDK upstream. If you cannot test your patch on a given platform X, please let me know. I have access to every platform supported by OpenJDK except AIX/PPC. Thanks, Adrian > On Oct 15, 2017, at 12:41 AM, Roman Kennke wrote: > > The JEP to remove the Shark compiler has received exclusively positive feedback (JDK-8189173) on zero-dev. So here comes the big patch to remove it. > > What I have done: > > grep -i -R shark src > grep -i -R shark make > grep -i -R shark doc > grep -i -R shark doc > > and purged any reference to shark. Almost everything was straightforward. > > The only things I wasn't really sure of: > > - in globals.hpp, I re-arranged the KIND_* bits to account for the gap that removing KIND_SHARK left. I hope that's good? 
> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in pd_address_in_code(), I am not sure it is the right thing to do. If not, what *would* be the right thing? > > Then of course I did: > > rm -rf src/hotspot/share/shark > > I also went through the build machinery and removed stuff related to Shark and LLVM libs. > > Now the only references in the whole JDK tree to shark is a 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) > > I tested by building a regular x86 JVM and running JTREG tests. All looks fine. > > - I could not build zero because it seems broken because of the recent Atomic::* changes > - I could not test any of the other arches that seemed to reference Shark (arm and sparc) > > Here's the full webrev: > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ > > Can I get a review on this? > > Thanks, Roman From rkennke at redhat.com Sun Oct 15 20:20:17 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 22:20:17 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <87BC5241-9C27-457F-9856-3D969831DABC@physik.fu-berlin.de> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <87BC5241-9C27-457F-9856-3D969831DABC@physik.fu-berlin.de> Message-ID: Hi Adrian, > Please let me look at SPARC next week first before merging this. Thanks! Will wait for your feedback! > And thanks for notifying me that Zero is broken again *sigh*. It seems to be only a little thing. I have a fix that I'm currently testing. Will file another bug and an RFR soon. Thanks, Roman From glaubitz at physik.fu-berlin.de Sun Oct 15 20:26:58 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Sun, 15 Oct 2017 22:26:58 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> Message-ID: Hi Roman! On 10/15/2017 12:41 AM, Roman Kennke wrote: > The JEP to remove the Shark compiler has received exclusively positive > feedback (JDK-8189173) on zero-dev. So here comes the big patch to remove it. I have now read through the JEP and I have to say, I'm sad to see Shark go. In my opinion, Shark should be a supported version of the JVM as LLVM is gaining code generation support for more and more architectures. I have always liked the idea to split out the code generation of compilers into a separate project and, in fact, the compilers for many other languages like Rust and Julia rely on LLVM. It's a pity that this value is not seen within the OpenJDK project. > I tested by building a regular x86 JVM and running JTREG tests. All looks fine. > > - I could not build zero because it seems broken because of the recent Atomic::* changes I just performed a Zero test build with the current HG revision of OpenJDK on x86_64 without any problems and Zero on SPARC builds fine as well, so the problem you are seeing has apparently been fixed now. I have not tested your patch yet though, I just wanted to verify whether Zero still builds fine. > - I could not test any of the other arches that seemed to reference Shark (arm and sparc) I will test this later. I am currently waiting for JDK-8186579 to get merged which fixes the last problem on Linux-SPARC. Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `. 
`' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From rkennke at redhat.com Sun Oct 15 20:34:46 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 22:34:46 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> Message-ID: <152a7a54-d30f-3c82-313a-608ef118628a@redhat.com> Am 15.10.2017 um 22:26 schrieb John Paul Adrian Glaubitz: > Hi Roman! > > On 10/15/2017 12:41 AM, Roman Kennke wrote: >> The JEP to remove the Shark compiler has received exclusively positive >> feedback (JDK-8189173) on zero-dev. So here comes the big patch to >> remove it. > > I have now read through the JEP and I have to say, I'm sad to see > Shark go. > > In my opinion, Shark should be a supported version of the JVM as LLVM > is gaining > code generation support for more and more architectures. I have always > liked the > idea to split out the code generation of compilers into a separate > project and, > in fact, the compilers for many other languages like Rust and Julia > rely on LLVM. > > It's a pity that this value is not seen within the OpenJDK project. Yes, I agree with you. However, at this point, fixing Shark amounts to almost complete rewrite of it. It would nowadays be based on jvmci. It would use the new and presumably much better JIT interface of LLVM. It would not use a shadow stack and a sane interface between LLVM and the GC (which hasn't existed back then). It's a project I'd personally like to do just for the fun of it, but I simply don't have enough time and the nerve to pull it off alone. In any case, as I said, it would probably make sense to start it from scratch. >> I tested by building a regular x86 JVM and running JTREG tests. All >> looks fine. >> >> - I could not build zero because it seems broken because of the >> recent Atomic::* changes > > I just performed a Zero test build with the current HG revision of > OpenJDK on x86_64 > without any problems and Zero on SPARC builds fine as well, so the > problem you are > seeing has apparently been fixed now. I have not tested your patch yet > though, I just > wanted to verify whether Zero still builds fine. I checked and noticed that it only affects debug builds. That's probably why it slipped through. I filed https://bugs.openjdk.java.net/browse/JDK-8189333 and will post an RFR later. >> - I could not test any of the other arches that seemed to reference >> Shark (arm and sparc) > > I will test this later. I am currently waiting for JDK-8186579 to get > merged which fixes > the last problem on Linux-SPARC. Okidoki, thanks a lot!! Cheers, Roman From glaubitz at physik.fu-berlin.de Sun Oct 15 20:44:16 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Sun, 15 Oct 2017 22:44:16 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <152a7a54-d30f-3c82-313a-608ef118628a@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <152a7a54-d30f-3c82-313a-608ef118628a@redhat.com> Message-ID: On 10/15/2017 10:34 PM, Roman Kennke wrote: >> It's a pity that this value is not seen within the OpenJDK project. > > Yes, I agree with you. However, at this point, fixing Shark amounts to almost > complete rewrite of it. It would nowadays be based on jvmci. It would use the > new and presumably much better JIT interface of LLVM. It would not use a shadow > stack and a sane interface between LLVM and the GC (which hasn't existed back then). 
Ok, that gives me some consolation, although I'm still sad about this decision. > It's a project I'd personally like to do just for the fun of it, but I simply don't > have enough time and the nerve to pull it off alone. In any case, as I said, it > would probably make sense to start it from scratch. FWIW, there are actually quite a number of users for Zero who would be happy to have a JIT-version of it. One major user for Zero is MIPS (big-, little-endian, 32 and 64 bit) which still doesn't have a native code generator in Hotspot. But we're also using Zero on architectures like m68k (yes, that still exists as people are upgrading their Amigas and Ataris with fast FPGA accelerators) and SuperH and it works fine. I have also contributed several patches already to get Zero into a better shape which allows it to build within Debian without additional patches, I would definitely be interested in helping with a new Shark JVM although I understand that would be a bigger project :). > I checked and noticed that it only affects debug builds. That's probably why it slipped through. > > I filed https://bugs.openjdk.java.net/browse/JDK-8189333 and will post an RFR later. Ok, I'll test it once you've posted it. >>> - I could not test any of the other arches that seemed to reference Shark (arm and sparc) >> >> I will test this later. I am currently waiting for JDK-8186579 to get merged which fixes >> the last problem on Linux-SPARC. > > Okidoki, thanks a lot!! Let me pull this in and test Zero and Server on Linux SPARC. Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `. `' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From david.holmes at oracle.com Sun Oct 15 20:48:23 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 06:48:23 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> Message-ID: <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> Hi Roman, The build changes must be reviewed on build-dev - now cc'd. Thanks, David On 15/10/2017 8:41 AM, Roman Kennke wrote: > The JEP to remove the Shark compiler has received exclusively positive > feedback (JDK-8189173) on zero-dev. So here comes the big patch to > remove it. > > What I have done: > > grep -i -R shark src > grep -i -R shark make > grep -i -R shark doc > grep -i -R shark doc > > and purged any reference to shark. Almost everything was straightforward. > > The only things I wasn't really sure of: > > - in globals.hpp, I re-arranged the KIND_* bits to account for the gap > that removing KIND_SHARK left. I hope that's good? > - in relocInfo_zero.hpp I put a ShouldNotCallThis() in > pd_address_in_code(), I am not sure it is the right thing to do. If not, > what *would* be the right thing? > > Then of course I did: > > rm -rf src/hotspot/share/shark > > I also went through the build machinery and removed stuff related to > Shark and LLVM libs. > > Now the only references in the whole JDK tree to shark is a 'Shark Bay' > in a timezone file, and 'Wireshark' in some tests ;-) > > I tested by building a regular x86 JVM and running JTREG tests. All > looks fine. 
> > - I could not build zero because it seems broken because of the recent > Atomic::* changes > - I could not test any of the other arches that seemed to reference > Shark (arm and sparc) > > Here's the full webrev: > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ > > > Can I get a review on this? > > Thanks, Roman > From glaubitz at physik.fu-berlin.de Sun Oct 15 20:51:08 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Sun, 15 Oct 2017 22:51:08 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> Message-ID: <6e68e18a-f13a-bfd5-f486-d75448538ceb@physik.fu-berlin.de> On 10/15/2017 12:41 AM, Roman Kennke wrote: > Here's the full webrev: > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ Hmm, I just tried importing it: glaubitz at deb4g:~/openjdk/hs$ hg import http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/jdk10-hs-single.changeset applying http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/jdk10-hs-single.changeset patching file make/autoconf/generated-configure.sh Hunk #7 FAILED at 5104 1 out of 19 hunks FAILED -- saving rejects to file make/autoconf/generated-configure.sh.rej abort: patch failed to apply glaubitz at deb4g:~/openjdk/hs$ Does it need to be rebased? Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `. `' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From rkennke at redhat.com Sun Oct 15 20:52:43 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 22:52:43 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <152a7a54-d30f-3c82-313a-608ef118628a@redhat.com> Message-ID: Am 15.10.2017 um 22:44 schrieb John Paul Adrian Glaubitz: > On 10/15/2017 10:34 PM, Roman Kennke wrote: >>> It's a pity that this value is not seen within the OpenJDK project. >> >> Yes, I agree with you. However, at this point, fixing Shark amounts >> to almost >> complete rewrite of it. It would nowadays be based on jvmci. It would >> use the >> new and presumably much better JIT interface of LLVM. It would not >> use a shadow >> stack and a sane interface between LLVM and the GC (which hasn't >> existed back then). > > Ok, that gives me some consolation, although I'm still sad about this > decision. > >> It's a project I'd personally like to do just for the fun of it, but >> I simply don't >> have enough time and the nerve to pull it off alone. In any case, as >> I said, it >> would probably make sense to start it from scratch. > > FWIW, there are actually quite a number of users for Zero who would be > happy to > have a JIT-version of it. One major user for Zero is MIPS (big-, > little-endian, > 32 and 64 bit) which still doesn't have a native code generator in > Hotspot. > > But we're also using Zero on architectures like m68k (yes, that still > exists > as people are upgrading their Amigas and Ataris with fast FPGA > accelerators) > and SuperH and it works fine. And here is another complication: the last time I checked, the LLVM JIT only support very few platforms. I don't remember from the top off my head, but I'm pretty sure it's a subset of those supported natively by hotspot now (x86, arm and probably ppc). I doubt that MIPS and m68k are on the list of LLVM JIT supported platforms. 
A quick search yields no current information about this though. > I have also contributed several patches already to get Zero into a better > shape which allows it to build within Debian without additional patches, > I would definitely be interested in helping with a new Shark JVM although > I understand that would be a bigger project :). Ok cool! If/when I ever get to do it (or somebody else) this will be very welcome :-) Cheers, Roman From rkennke at redhat.com Sun Oct 15 21:00:15 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 23:00:15 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <6e68e18a-f13a-bfd5-f486-d75448538ceb@physik.fu-berlin.de> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <6e68e18a-f13a-bfd5-f486-d75448538ceb@physik.fu-berlin.de> Message-ID: <67a4e380-64d3-d863-5b8f-53554158082f@redhat.com> Am 15.10.2017 um 22:51 schrieb John Paul Adrian Glaubitz: > On 10/15/2017 12:41 AM, Roman Kennke wrote: >> Here's the full webrev: >> >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >> > > Hmm, I just tried importing it: > > glaubitz at deb4g:~/openjdk/hs$ hg import > http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/jdk10-hs-single.changeset > applying > http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/jdk10-hs-single.changeset > patching file make/autoconf/generated-configure.sh > Hunk #7 FAILED at 5104 > 1 out of 19 hunks FAILED -- saving rejects to file > make/autoconf/generated-configure.sh.rej > abort: patch failed to apply > glaubitz at deb4g:~/openjdk/hs$ > > Does it need to be rebased? Shouldn't be the case, but just to be sure, my patch is based on: http://hg.openjdk.java.net/jdk10/hs/ Also, I've made a small fix that was related to Zero (now that I can actually build it), and I'm currently uploading to: http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ Notice that I have removed the generated-configure.sh part, which means you will be prompted to re-generate it. Roman From glaubitz at physik.fu-berlin.de Sun Oct 15 21:01:15 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Sun, 15 Oct 2017 23:01:15 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <67a4e380-64d3-d863-5b8f-53554158082f@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <6e68e18a-f13a-bfd5-f486-d75448538ceb@physik.fu-berlin.de> <67a4e380-64d3-d863-5b8f-53554158082f@redhat.com> Message-ID: <094e215b-150a-4859-427d-85a201f118e4@physik.fu-berlin.de> On 10/15/2017 11:00 PM, Roman Kennke wrote: >> Does it need to be rebased? > > Shouldn't be the case, but just to be sure, my patch is based on: > > http://hg.openjdk.java.net/jdk10/hs/ > > Also, I've made a small fix that was related to Zero (now that I can actually build it), and I'm currently uploading to: > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ > > Notice that I have removed the generated-configure.sh part, which means you will be prompted to re-generate it. Ok, I will pull that. Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `. 
`' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From rkennke at redhat.com Sun Oct 15 21:01:42 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 23:01:42 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> Message-ID: <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Hi David, thanks! I'm uploading a 2nd revision of the patch that excludes the generated-configure.sh part, and adds a smallish Zero-related fix. http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ Thanks, Roman > Hi Roman, > > The build changes must be reviewed on build-dev - now cc'd. > > Thanks, > David > > On 15/10/2017 8:41 AM, Roman Kennke wrote: >> The JEP to remove the Shark compiler has received exclusively >> positive feedback (JDK-8189173) on zero-dev. So here comes the big >> patch to remove it. >> >> What I have done: >> >> grep -i -R shark src >> grep -i -R shark make >> grep -i -R shark doc >> grep -i -R shark doc >> >> and purged any reference to shark. Almost everything was >> straightforward. >> >> The only things I wasn't really sure of: >> >> - in globals.hpp, I re-arranged the KIND_* bits to account for the >> gap that removing KIND_SHARK left. I hope that's good? >> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >> pd_address_in_code(), I am not sure it is the right thing to do. If >> not, what *would* be the right thing? >> >> Then of course I did: >> >> rm -rf src/hotspot/share/shark >> >> I also went through the build machinery and removed stuff related to >> Shark and LLVM libs. >> >> Now the only references in the whole JDK tree to shark is a 'Shark >> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >> >> I tested by building a regular x86 JVM and running JTREG tests. All >> looks fine. >> >> - I could not build zero because it seems broken because of the >> recent Atomic::* changes >> - I could not test any of the other arches that seemed to reference >> Shark (arm and sparc) >> >> Here's the full webrev: >> >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >> >> >> Can I get a review on this? >> >> Thanks, Roman >> From rkennke at redhat.com Sun Oct 15 21:12:23 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 23:12:23 +0200 Subject: RFR: 8189333: Fix Zero build after Atomic::xchg changes Message-ID: <003ff7d9-759f-1ef5-f580-18c2571b63e5@redhat.com> Zero debug build has been broken by: JDK-8187977: Generalize Atomic::xchg to use templates. This patch fixes it by casting the unsigned literal to jint: http://cr.openjdk.java.net/~rkennke/8189333/webrev.00/ Tested by building zero fastdebug and running some small test programs. Ok? Roman From david.holmes at oracle.com Sun Oct 15 21:23:52 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:23:52 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> Message-ID: <9d0b0656-c168-7e72-e272-893d0b475d56@oracle.com> Hi Roman, I've looked at all the changes for the build and hotspot and everything appears okay to me. Still need someone from compiler team and build team to sign off on this though. 
One observation in src/hotspot/cpu/zero/sharedRuntime_zero.cpp, these includes would seem to be impossible: 38 #ifdef COMPILER1 39 #include "c1/c1_Runtime1.hpp" 40 #endif 41 #ifdef COMPILER2 42 #include "opto/runtime.hpp" 43 #endif no? In src/hotspot/share/ci/ciEnv.cpp you can just delete the comment entirely as it's obviously C2: if (is_c2_compile(comp_level)) { // C2 Ditto in src/hotspot/share/compiler/compileBroker.cpp ! // C2 make_thread(name_buffer, _c2_compile_queue, counters, _compilers[1], compiler_thread, CHECK); Thanks, David ----- On 16/10/2017 6:48 AM, David Holmes wrote: > Hi Roman, > > The build changes must be reviewed on build-dev - now cc'd. > > Thanks, > David > > On 15/10/2017 8:41 AM, Roman Kennke wrote: >> The JEP to remove the Shark compiler has received exclusively positive >> feedback (JDK-8189173) on zero-dev. So here comes the big patch to >> remove it. >> >> What I have done: >> >> grep -i -R shark src >> grep -i -R shark make >> grep -i -R shark doc >> grep -i -R shark doc >> >> and purged any reference to shark. Almost everything was straightforward. >> >> The only things I wasn't really sure of: >> >> - in globals.hpp, I re-arranged the KIND_* bits to account for the gap >> that removing KIND_SHARK left. I hope that's good? >> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >> pd_address_in_code(), I am not sure it is the right thing to do. If >> not, what *would* be the right thing? >> >> Then of course I did: >> >> rm -rf src/hotspot/share/shark >> >> I also went through the build machinery and removed stuff related to >> Shark and LLVM libs. >> >> Now the only references in the whole JDK tree to shark is a 'Shark >> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >> >> I tested by building a regular x86 JVM and running JTREG tests. All >> looks fine. >> >> - I could not build zero because it seems broken because of the recent >> Atomic::* changes >> - I could not test any of the other arches that seemed to reference >> Shark (arm and sparc) >> >> Here's the full webrev: >> >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >> >> >> Can I get a review on this? >> >> Thanks, Roman >> From david.holmes at oracle.com Sun Oct 15 21:25:04 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:25:04 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: On 16/10/2017 7:01 AM, Roman Kennke wrote: > Hi David, > > thanks! > > I'm uploading a 2nd revision of the patch that excludes the > generated-configure.sh part, and adds a smallish Zero-related fix. > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ > Can you point me to the exact change please as I don't want to re-examine it all. :) I'll pull this in and do a test build run internally. Thanks, David > Thanks, Roman > > >> Hi Roman, >> >> The build changes must be reviewed on build-dev - now cc'd. >> >> Thanks, >> David >> >> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>> The JEP to remove the Shark compiler has received exclusively >>> positive feedback (JDK-8189173) on zero-dev. So here comes the big >>> patch to remove it. >>> >>> What I have done: >>> >>> grep -i -R shark src >>> grep -i -R shark make >>> grep -i -R shark doc >>> grep -i -R shark doc >>> >>> and purged any reference to shark. 
Almost everything was >>> straightforward. >>> >>> The only things I wasn't really sure of: >>> >>> - in globals.hpp, I re-arranged the KIND_* bits to account for the >>> gap that removing KIND_SHARK left. I hope that's good? >>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>> pd_address_in_code(), I am not sure it is the right thing to do. If >>> not, what *would* be the right thing? >>> >>> Then of course I did: >>> >>> rm -rf src/hotspot/share/shark >>> >>> I also went through the build machinery and removed stuff related to >>> Shark and LLVM libs. >>> >>> Now the only references in the whole JDK tree to shark is a 'Shark >>> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>> >>> I tested by building a regular x86 JVM and running JTREG tests. All >>> looks fine. >>> >>> - I could not build zero because it seems broken because of the >>> recent Atomic::* changes >>> - I could not test any of the other arches that seemed to reference >>> Shark (arm and sparc) >>> >>> Here's the full webrev: >>> >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>> >>> >>> Can I get a review on this? >>> >>> Thanks, Roman >>> > From david.holmes at oracle.com Sun Oct 15 21:29:33 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:29:33 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: Just spotted this: ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ David On 16/10/2017 7:25 AM, David Holmes wrote: > On 16/10/2017 7:01 AM, Roman Kennke wrote: >> Hi David, >> >> thanks! >> >> I'm uploading a 2nd revision of the patch that excludes the >> generated-configure.sh part, and adds a smallish Zero-related fix. >> >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >> > > Can you point me to the exact change please as I don't want to > re-examine it all. :) > > I'll pull this in and do a test build run internally. > > Thanks, > David > >> Thanks, Roman >> >> >>> Hi Roman, >>> >>> The build changes must be reviewed on build-dev - now cc'd. >>> >>> Thanks, >>> David >>> >>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>> The JEP to remove the Shark compiler has received exclusively >>>> positive feedback (JDK-8189173) on zero-dev. So here comes the big >>>> patch to remove it. >>>> >>>> What I have done: >>>> >>>> grep -i -R shark src >>>> grep -i -R shark make >>>> grep -i -R shark doc >>>> grep -i -R shark doc >>>> >>>> and purged any reference to shark. Almost everything was >>>> straightforward. >>>> >>>> The only things I wasn't really sure of: >>>> >>>> - in globals.hpp, I re-arranged the KIND_* bits to account for the >>>> gap that removing KIND_SHARK left. I hope that's good? >>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>> pd_address_in_code(), I am not sure it is the right thing to do. If >>>> not, what *would* be the right thing? >>>> >>>> Then of course I did: >>>> >>>> rm -rf src/hotspot/share/shark >>>> >>>> I also went through the build machinery and removed stuff related to >>>> Shark and LLVM libs. >>>> >>>> Now the only references in the whole JDK tree to shark is a 'Shark >>>> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>> >>>> I tested by building a regular x86 JVM and running JTREG tests. All >>>> looks fine. 
>>>> >>>> - I could not build zero because it seems broken because of the >>>> recent Atomic::* changes >>>> - I could not test any of the other arches that seemed to reference >>>> Shark (arm and sparc) >>>> >>>> Here's the full webrev: >>>> >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>> >>>> >>>> Can I get a review on this? >>>> >>>> Thanks, Roman >>>> >> From rkennke at redhat.com Sun Oct 15 21:31:51 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 23:31:51 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <9d0b0656-c168-7e72-e272-893d0b475d56@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <9d0b0656-c168-7e72-e272-893d0b475d56@oracle.com> Message-ID: <0c0f0e20-86d5-bd3e-06ec-5b4c103eb3e7@redhat.com> Hi David, thanks for reviewing! > > One observation in src/hotspot/cpu/zero/sharedRuntime_zero.cpp, these > includes would seem to be impossible: > > ? 38 #ifdef COMPILER1 > ? 39 #include "c1/c1_Runtime1.hpp" > ? 40 #endif > ? 41 #ifdef COMPILER2 > ? 42 #include "opto/runtime.hpp" > ? 43 #endif > > no? I have no idea. It is at least theoretically possible to have a platform with C1 and/or C2 support based on the Zero interpreter? I'm leaving that in for now as it was pre-existing and not related to Shark removal, ok? > > In src/hotspot/share/ci/ciEnv.cpp you can just delete the comment > entirely as it's obviously C2: > > if (is_c2_compile(comp_level)) { // C2 > > Ditto in src/hotspot/share/compiler/compileBroker.cpp > > !???? // C2 > ????? make_thread(name_buffer, _c2_compile_queue, counters, > _compilers[1], compiler_thread, CHECK); Ok, right. For consistency, I also remove // C1 in ciEnv.cpp similarily obvious is_c1_compile() call :-) New webrev: http://cr.openjdk.java.net/~rkennke/8171853/webrev.02/ Roman From david.holmes at oracle.com Sun Oct 15 21:32:26 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:32:26 +1000 Subject: RFR: 8189333: Fix Zero build after Atomic::xchg changes In-Reply-To: <003ff7d9-759f-1ef5-f580-18c2571b63e5@redhat.com> References: <003ff7d9-759f-1ef5-f580-18c2571b63e5@redhat.com> Message-ID: Hi Roman, On 16/10/2017 7:12 AM, Roman Kennke wrote: > Zero debug build has been broken by: JDK-8187977: Generalize > Atomic::xchg to use templates. > > This patch fixes it by casting the unsigned literal to jint: > > http://cr.openjdk.java.net/~rkennke/8189333/webrev.00/ > Looks fine. I can push this for you straight away (relatively speaking :) ) under the trivial rule. Thanks, David > Tested by building zero fastdebug and running some small test programs. > > Ok? > > > Roman > From david.holmes at oracle.com Sun Oct 15 21:33:44 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:33:44 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <0c0f0e20-86d5-bd3e-06ec-5b4c103eb3e7@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <9d0b0656-c168-7e72-e272-893d0b475d56@oracle.com> <0c0f0e20-86d5-bd3e-06ec-5b4c103eb3e7@redhat.com> Message-ID: <86c02492-ecf5-197b-7ca1-a411f68000c5@oracle.com> On 16/10/2017 7:31 AM, Roman Kennke wrote: > Hi David, > > thanks for reviewing! > >> >> One observation in src/hotspot/cpu/zero/sharedRuntime_zero.cpp, these >> includes would seem to be impossible: >> >> ? 38 #ifdef COMPILER1 >> ? 39 #include "c1/c1_Runtime1.hpp" >> ? 40 #endif >> ? 41 #ifdef COMPILER2 >> ? 
42 #include "opto/runtime.hpp" >> ? 43 #endif >> >> no? > > I have no idea. It is at least theoretically possible to have a platform > with C1 and/or C2 support based on the Zero interpreter? I'm leaving > that in for now as it was pre-existing and not related to Shark removal, > ok? Yep that's fine. Thanks. David >> >> In src/hotspot/share/ci/ciEnv.cpp you can just delete the comment >> entirely as it's obviously C2: >> >> if (is_c2_compile(comp_level)) { // C2 >> >> Ditto in src/hotspot/share/compiler/compileBroker.cpp >> >> !???? // C2 >> ????? make_thread(name_buffer, _c2_compile_queue, counters, >> _compilers[1], compiler_thread, CHECK); > > Ok, right. For consistency, I also remove // C1 in ciEnv.cpp similarily > obvious is_c1_compile() call :-) > > New webrev: > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.02/ > > > Roman From rkennke at redhat.com Sun Oct 15 21:39:54 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 23:39:54 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: <7deb690b-8b74-1ca2-948e-d76d0d133814@redhat.com> Am 15.10.2017 um 23:25 schrieb David Holmes: > On 16/10/2017 7:01 AM, Roman Kennke wrote: >> Hi David, >> >> thanks! >> >> I'm uploading a 2nd revision of the patch that excludes the >> generated-configure.sh part, and adds a smallish Zero-related fix. >> >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >> > > Can you point me to the exact change please as I don't want to > re-examine it all. :) Oops, sorry. The diff between 00 and 01 is this (apart from generated-configure.sh): diff --git a/src/hotspot/share/utilities/vmError.cpp b/src/hotspot/share/utilities/vmError.cpp --- a/src/hotspot/share/utilities/vmError.cpp +++ b/src/hotspot/share/utilities/vmError.cpp @@ -192,6 +192,7 @@ ???? st->cr(); ???? // Print the frames +??? StackFrameStream sfs(jt); ???? for(int i = 0; !sfs.is_done(); sfs.next(), i++) { ?????? sfs.current()->zero_print_on_error(i, st, buf, buflen); ?????? st->cr(); I.e. I added back the sfs variable that I accidentally removed in webrev.00. From rkennke at redhat.com Sun Oct 15 21:40:21 2017 From: rkennke at redhat.com (Roman Kennke) Date: Sun, 15 Oct 2017 23:40:21 +0200 Subject: RFR: 8189333: Fix Zero build after Atomic::xchg changes In-Reply-To: References: <003ff7d9-759f-1ef5-f580-18c2571b63e5@redhat.com> Message-ID: Am 15.10.2017 um 23:32 schrieb David Holmes: > Hi Roman, > > On 16/10/2017 7:12 AM, Roman Kennke wrote: >> Zero debug build has been broken by: JDK-8187977: Generalize >> Atomic::xchg to use templates. >> >> This patch fixes it by casting the unsigned literal to jint: >> >> http://cr.openjdk.java.net/~rkennke/8189333/webrev.00/ >> > > Looks fine. > > I can push this for you straight away (relatively speaking :) ) under > the trivial rule. Thanks! 
Roman From david.holmes at oracle.com Sun Oct 15 21:44:04 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 07:44:04 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <7deb690b-8b74-1ca2-948e-d76d0d133814@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <7deb690b-8b74-1ca2-948e-d76d0d133814@redhat.com> Message-ID: On 16/10/2017 7:39 AM, Roman Kennke wrote: > Am 15.10.2017 um 23:25 schrieb David Holmes: >> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>> Hi David, >>> >>> thanks! >>> >>> I'm uploading a 2nd revision of the patch that excludes the >>> generated-configure.sh part, and adds a smallish Zero-related fix. >>> >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>> >> >> Can you point me to the exact change please as I don't want to >> re-examine it all. :) > Oops, sorry. The diff between 00 and 01 is this (apart from > generated-configure.sh): > > diff --git a/src/hotspot/share/utilities/vmError.cpp > b/src/hotspot/share/utilities/vmError.cpp > --- a/src/hotspot/share/utilities/vmError.cpp > +++ b/src/hotspot/share/utilities/vmError.cpp > @@ -192,6 +192,7 @@ > ???? st->cr(); > > ???? // Print the frames > +??? StackFrameStream sfs(jt); > ???? for(int i = 0; !sfs.is_done(); sfs.next(), i++) { > ?????? sfs.current()->zero_print_on_error(i, st, buf, buflen); > ?????? st->cr(); > > I.e. I added back the sfs variable that I accidentally removed in > webrev.00. Looks good! David From rkennke at redhat.com Sun Oct 15 22:00:15 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 16 Oct 2017 00:00:15 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: Ok, I fixed all the comments you mentioned. Differential (against webrev.01): http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ Full webrev: http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ Roman > Just spotted this: > > ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** > {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ > > David > > On 16/10/2017 7:25 AM, David Holmes wrote: >> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>> Hi David, >>> >>> thanks! >>> >>> I'm uploading a 2nd revision of the patch that excludes the >>> generated-configure.sh part, and adds a smallish Zero-related fix. >>> >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>> >> >> Can you point me to the exact change please as I don't want to >> re-examine it all. :) >> >> I'll pull this in and do a test build run internally. >> >> Thanks, >> David >> >>> Thanks, Roman >>> >>> >>>> Hi Roman, >>>> >>>> The build changes must be reviewed on build-dev - now cc'd. >>>> >>>> Thanks, >>>> David >>>> >>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>> The JEP to remove the Shark compiler has received exclusively >>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the big >>>>> patch to remove it. >>>>> >>>>> What I have done: >>>>> >>>>> grep -i -R shark src >>>>> grep -i -R shark make >>>>> grep -i -R shark doc >>>>> grep -i -R shark doc >>>>> >>>>> and purged any reference to shark. Almost everything was >>>>> straightforward. 
>>>>> >>>>> The only things I wasn't really sure of: >>>>> >>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for the >>>>> gap that removing KIND_SHARK left. I hope that's good? >>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>> If not, what *would* be the right thing? >>>>> >>>>> Then of course I did: >>>>> >>>>> rm -rf src/hotspot/share/shark >>>>> >>>>> I also went through the build machinery and removed stuff related >>>>> to Shark and LLVM libs. >>>>> >>>>> Now the only references in the whole JDK tree to shark is a 'Shark >>>>> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>> >>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>> All looks fine. >>>>> >>>>> - I could not build zero because it seems broken because of the >>>>> recent Atomic::* changes >>>>> - I could not test any of the other arches that seemed to >>>>> reference Shark (arm and sparc) >>>>> >>>>> Here's the full webrev: >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>> >>>>> >>>>> Can I get a review on this? >>>>> >>>>> Thanks, Roman >>>>> >>> From david.holmes at oracle.com Sun Oct 15 22:08:52 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 08:08:52 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> Looks good. Thanks, David On 16/10/2017 8:00 AM, Roman Kennke wrote: > > Ok, I fixed all the comments you mentioned. > > Differential (against webrev.01): > http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ > > Full webrev: > http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ > > > Roman > >> Just spotted this: >> >> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >> >> David >> >> On 16/10/2017 7:25 AM, David Holmes wrote: >>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>> Hi David, >>>> >>>> thanks! >>>> >>>> I'm uploading a 2nd revision of the patch that excludes the >>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>> >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>> >>> >>> Can you point me to the exact change please as I don't want to >>> re-examine it all. :) >>> >>> I'll pull this in and do a test build run internally. >>> >>> Thanks, >>> David >>> >>>> Thanks, Roman >>>> >>>> >>>>> Hi Roman, >>>>> >>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the big >>>>>> patch to remove it. >>>>>> >>>>>> What I have done: >>>>>> >>>>>> grep -i -R shark src >>>>>> grep -i -R shark make >>>>>> grep -i -R shark doc >>>>>> grep -i -R shark doc >>>>>> >>>>>> and purged any reference to shark. Almost everything was >>>>>> straightforward. >>>>>> >>>>>> The only things I wasn't really sure of: >>>>>> >>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for the >>>>>> gap that removing KIND_SHARK left. I hope that's good? 
>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>> If not, what *would* be the right thing? >>>>>> >>>>>> Then of course I did: >>>>>> >>>>>> rm -rf src/hotspot/share/shark >>>>>> >>>>>> I also went through the build machinery and removed stuff related >>>>>> to Shark and LLVM libs. >>>>>> >>>>>> Now the only references in the whole JDK tree to shark is a 'Shark >>>>>> Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>> >>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>> All looks fine. >>>>>> >>>>>> - I could not build zero because it seems broken because of the >>>>>> recent Atomic::* changes >>>>>> - I could not test any of the other arches that seemed to >>>>>> reference Shark (arm and sparc) >>>>>> >>>>>> Here's the full webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>> >>>>>> >>>>>> Can I get a review on this? >>>>>> >>>>>> Thanks, Roman >>>>>> >>>> > From vladimir.kozlov at oracle.com Sun Oct 15 22:14:53 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sun, 15 Oct 2017 15:14:53 -0700 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> Message-ID: <85b68a77-f418-c619-0a51-c7389d7c5a86@oracle.com> +1 Thanks, Vladimir On 10/15/17 3:08 PM, David Holmes wrote: > Looks good. > > Thanks, > David > > On 16/10/2017 8:00 AM, Roman Kennke wrote: >> >> Ok, I fixed all the comments you mentioned. >> >> Differential (against webrev.01): >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >> >> Full webrev: >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >> >> Roman >> >>> Just spotted this: >>> >>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** {@code CompLevel::CompLevel_full_optimization} -- C2 >>> or Shark */ >>> >>> David >>> >>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>> Hi David, >>>>> >>>>> thanks! >>>>> >>>>> I'm uploading a 2nd revision of the patch that excludes the generated-configure.sh part, and adds a smallish >>>>> Zero-related fix. >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>> >>>> Can you point me to the exact change please as I don't want to re-examine it all. :) >>>> >>>> I'll pull this in and do a test build run internally. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, Roman >>>>> >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>> The JEP to remove the Shark compiler has received exclusively positive feedback (JDK-8189173) on zero-dev. So >>>>>>> here comes the big patch to remove it. >>>>>>> >>>>>>> What I have done: >>>>>>> >>>>>>> grep -i -R shark src >>>>>>> grep -i -R shark make >>>>>>> grep -i -R shark doc >>>>>>> grep -i -R shark doc >>>>>>> >>>>>>> and purged any reference to shark. Almost everything was straightforward. >>>>>>> >>>>>>> The only things I wasn't really sure of: >>>>>>> >>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for the gap that removing KIND_SHARK left. I hope >>>>>>> that's good? 
>>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in pd_address_in_code(), I am not sure it is the right thing >>>>>>> to do. If not, what *would* be the right thing? >>>>>>> >>>>>>> Then of course I did: >>>>>>> >>>>>>> rm -rf src/hotspot/share/shark >>>>>>> >>>>>>> I also went through the build machinery and removed stuff related to Shark and LLVM libs. >>>>>>> >>>>>>> Now the only references in the whole JDK tree to shark is a 'Shark Bay' in a timezone file, and 'Wireshark' in >>>>>>> some tests ;-) >>>>>>> >>>>>>> I tested by building a regular x86 JVM and running JTREG tests. All looks fine. >>>>>>> >>>>>>> - I could not build zero because it seems broken because of the recent Atomic::* changes >>>>>>> - I could not test any of the other arches that seemed to reference Shark (arm and sparc) >>>>>>> >>>>>>> Here's the full webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>> >>>>>>> Can I get a review on this? >>>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>> >> From david.holmes at oracle.com Mon Oct 16 00:31:55 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 10:31:55 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> Message-ID: <331579a0-29de-f152-2dd4-66987896c463@oracle.com> My internal JPRT run went fine. So this just needs a build team signoff from the perspective of the patch. However, as this has had a JEP submitted for it, the code changes can not be pushed until the JEP has been targeted. Thanks, David On 16/10/2017 8:08 AM, David Holmes wrote: > Looks good. > > Thanks, > David > > On 16/10/2017 8:00 AM, Roman Kennke wrote: >> >> Ok, I fixed all the comments you mentioned. >> >> Differential (against webrev.01): >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >> >> Full webrev: >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >> >> >> Roman >> >>> Just spotted this: >>> >>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>> >>> David >>> >>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>> Hi David, >>>>> >>>>> thanks! >>>>> >>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>> >>>> >>>> Can you point me to the exact change please as I don't want to >>>> re-examine it all. :) >>>> >>>> I'll pull this in and do a test build run internally. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, Roman >>>>> >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>> big patch to remove it. >>>>>>> >>>>>>> What I have done: >>>>>>> >>>>>>> grep -i -R shark src >>>>>>> grep -i -R shark make >>>>>>> grep -i -R shark doc >>>>>>> grep -i -R shark doc >>>>>>> >>>>>>> and purged any reference to shark. Almost everything was >>>>>>> straightforward. 
>>>>>>> >>>>>>> The only things I wasn't really sure of: >>>>>>> >>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>>> If not, what *would* be the right thing? >>>>>>> >>>>>>> Then of course I did: >>>>>>> >>>>>>> rm -rf src/hotspot/share/shark >>>>>>> >>>>>>> I also went through the build machinery and removed stuff related >>>>>>> to Shark and LLVM libs. >>>>>>> >>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>> >>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>> All looks fine. >>>>>>> >>>>>>> - I could not build zero because it seems broken because of the >>>>>>> recent Atomic::* changes >>>>>>> - I could not test any of the other arches that seemed to >>>>>>> reference Shark (arm and sparc) >>>>>>> >>>>>>> Here's the full webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>> >>>>>>> >>>>>>> Can I get a review on this? >>>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>> >> From david.holmes at oracle.com Mon Oct 16 01:18:10 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 11:18:10 +1000 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <7265c30d-946b-19c4-a1b3-c3314a869ee8@oracle.com> References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> <49b7c5f7-2f6d-16ac-0b60-140619d0fffd@oracle.com> <7265c30d-946b-19c4-a1b3-c3314a869ee8@oracle.com> Message-ID: <33af17b9-6dce-5a5e-cb94-b3c1afbe8532@oracle.com> One tiny follow up as I was looking at this code: src/hotspot/share/services/mallocSiteTable.hpp 65 MallocSiteHashtableEntry* _next; should be 65 MallocSiteHashtableEntry* volatile _next; as we operate on it with CAS. Thanks, David On 14/10/2017 10:32 PM, David Holmes wrote: > Hi Coleen, > > These changes all seem okay to me - except I can't comment on the > Atomic::sub implementation. :) > > Thanks for adding the assert to header_addr(). FYI from objectMonitor.hpp: > > // ObjectMonitor Layout Overview/Highlights/Restrictions: > // > // - The _header field must be at offset 0 because the displaced header > //?? from markOop is stored there. We do not want markOop.hpp to include > //?? ObjectMonitor.hpp to avoid exposing ObjectMonitor everywhere. This > //?? means that ObjectMonitor cannot inherit from any other class nor can > //?? it use any virtual member functions. This restriction is critical to > //?? the proper functioning of the VM. > > so it is important we ensure this holds. > > Thanks, > David > > On 14/10/2017 4:34 AM, coleen.phillimore at oracle.com wrote: >> >> Hi, Here is the version with the changes from Kim's comments that has >> passed at least testing with JPRT and tier1, locally.?? More testing >> (tier2-5) is in progress. >> >> Also includes a corrected version of Atomic::sub care of Erik Osterlund. >> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/8188220.kim-review-changes/webrev >> open webrev at >> http://cr.openjdk.java.net/~coleenp/8188220.review-comments/webrev >> >> Full version: >> >> http://cr.openjdk.java.net/~coleenp/8188220.03/webrev >> >> Thanks! 
>> Coleen >> >> On 10/13/17 9:25 AM, coleen.phillimore at oracle.com wrote: >>> >>> Hi Kim, Thank you for the detailed review and the time you've spent >>> on it, and discussion yesterday. >>> >>> On 10/12/17 7:17 PM, Kim Barrett wrote: >>>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>>>> >>>>> Summary: With the new template functions these are unnecessary. >>>>> >>>>> The changes are mostly s/_ptr// and removing the cast to return >>>>> type.? There weren't many types that needed to be improved to match >>>>> the template version of the function.?? Some notes: >>>>> 1. replaced CASPTR with Atomic::cmpxchg() in mutex.cpp, rearranging >>>>> arguments. >>>>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null.? I >>>>> disliked the first name because it's not explicit from the callers >>>>> that there's an underlying cas.? If people want to fight, I'll >>>>> remove the function and use cmpxchg because there are only a couple >>>>> places where this is a little nicer. >>>>> 3. Added Atomic::sub() >>>>> >>>>> Tested with JPRT, mach5 tier1-5 on linux,windows and solaris. >>>>> >>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.01/webrev >>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8188220 >>>>> >>>>> Thanks, >>>>> Coleen >>>> I looked harder at the potential ABA problems, and believe they are >>>> okay.? There can be multiple threads doing pushes, and there can be >>>> multiple threads doing pops, but not both at the same time. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/cpu/zero/cppInterpreter_zero.cpp >>>> ? 279???? if (Atomic::cmpxchg(monitor, lockee->mark_addr(), disp) != >>>> disp) { >>>> >>>> How does this work?? monitor and disp seem like they have unrelated >>>> types?? Given that this is zero-specific code, maybe this hasn't been >>>> tested? >>>> >>>> Similarly here: >>>> ? 423?????? if (Atomic::cmpxchg(header, rcvr->mark_addr(), lock) != >>>> lock) { >>> >>> I haven't built zero.? I don't know how to do this anymore (help?) I >>> fixed the obvious type mismatches here and in >>> bytecodeInterpreter.cpp.? I'll try to build it. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/asm/assembler.cpp >>>> ? 239???????? dcon->value_fn = cfn; >>>> >>>> Is it actually safe to remove the atomic update?? If multiple threads >>>> performing the assignment *are* possible (and I don't understand the >>>> context yet, so don't know the answer to that), then a bare non-atomic >>>> assignment is a race, e.g. undefined behavior. >>>> >>>> Regardless of that, I think the CAST_FROM_FN_PTR should be retained. >>> >>> I can find no uses of this code, ie. looking for "delayed_value". I >>> think it was early jsr292 code.? I could also not find any >>> combination of casts that would make it compile, so in the end I >>> believed the comment and took out the cmpxchg.?? The code appears to >>> be intended to for bootstrapping, see the call to >>> update_delayed_values() in JavaClasses::compute_offsets(). >>> >>> The CAST_FROM_FN_PTR was to get it to compile with cmpxchg, the new >>> code does not require a cast.? If you can help with finding the right >>> set of casts, I'd be happy to put the cmpxchg back in. I just >>> couldn't find one. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/classfile/classLoaderData.cpp >>>> ? 167?? 
Chunk* head = (Chunk*) OrderAccess::load_acquire(&_head); >>>> >>>> I think the cast to Chunk* is no longer needed. >>> >>> Missed another, thanks.? No that's the same one David found. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/classfile/classLoaderData.cpp >>>> ? 946???? ClassLoaderData* old = Atomic::cmpxchg(cld, cld_addr, >>>> (ClassLoaderData*)NULL); >>>> ? 947???? if (old != NULL) { >>>> ? 948?????? delete cld; >>>> ? 949?????? // Returns the data. >>>> ? 950?????? return old; >>>> ? 951???? } >>>> >>>> That could instead be >>>> >>>> ?? if (!Atomic::replace_if_null(cld, cld_addr)) { >>>> ???? delete cld;?????????? // Lost the race. >>>> ???? return *cld_addr;???? // Use the winner's value. >>>> ?? } >>>> >>>> And apparently the caller of CLDG::add doesn't care whether the >>>> returned CLD has actually been added to the graph yet.? If that's not >>>> true, then there's a bug here, since a race loser might return a >>>> winner's value before the winner has actually done the insertion. >>> >>> True, the race loser doesn't care whether the CLD has been added to >>> the graph. >>> Your instead code requires a comment that replace_if_null is really a >>> compare exchange and has an extra read of the original value, so I am >>> leaving what I have which is clearer to me. >>> >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/classfile/verifier.cpp >>>> ?? 71 static void* verify_byte_codes_fn() { >>>> ?? 72?? if (OrderAccess::load_acquire(&_verify_byte_codes_fn) == >>>> NULL) { >>>> ?? 73???? void *lib_handle = os::native_java_library(); >>>> ?? 74???? void *func = os::dll_lookup(lib_handle, >>>> "VerifyClassCodesForMajorVersion"); >>>> ?? 75???? OrderAccess::release_store(&_verify_byte_codes_fn, func); >>>> ?? 76???? if (func == NULL) { >>>> ?? 77?????? _is_new_verify_byte_codes_fn = false; >>>> ?? 78?????? func = os::dll_lookup(lib_handle, "VerifyClassCodes"); >>>> ?? 79 OrderAccess::release_store(&_verify_byte_codes_fn, func); >>>> ?? 80???? } >>>> ?? 81?? } >>>> ?? 82?? return (void*)_verify_byte_codes_fn; >>>> ?? 83 } >>>> >>>> [pre-existing] >>>> >>>> I think this code has race problems; a caller could unexpectedly and >>>> inappropriately return NULL.? Consider the case where there is no >>>> VerifyClassCodesForMajorVersion, but there is VerifyClassCodes. >>>> >>>> The variable is initially NULL. >>>> >>>> Both Thread1 and Thread2 reach line 73, having both seen a NULL value >>>> for the variable. >>>> >>>> Thread1 reaches line 80, setting the variable to VerifyClassCodes. >>>> >>>> Thread2 reaches line 76, resetting the variable to NULL. >>>> >>>> Thread1 reads the now (momentarily) NULL value and returns it. >>>> >>>> I think the first release_store should be conditional on func != NULL. >>>> Also, the usage of _is_new_verify_byte_codes_fn seems suspect. >>>> And a minor additional nit: the cast in the return is unnecessary. >>> >>> Yes, this looks like a bug.?? I'll cut/paste this and file it. It may >>> be that this is support for the old verifier in old jdk versions that >>> can be cleaned up. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/code/nmethod.cpp >>>> 1664?? nmethod* observed_mark_link = _oops_do_mark_link; >>>> 1665?? if (observed_mark_link == NULL) { >>>> 1666???? // Claim this nmethod for this thread to mark. >>>> 1667???? 
if (Atomic::cmpxchg_if_null(NMETHOD_SENTINEL, >>>> &_oops_do_mark_link)) { >>>> >>>> With these changes, the only use of observed_mark_link is in the if. >>>> I'm not sure that variable is really useful anymore, e.g. just use >>>> >>>> ?? if (_oops_do_mark_link == NULL) { >>> >>> Ok fixed. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >>>> >>>> In CMSCollector::par_take_from_overflow_list, if BUSY and prefix were >>>> of type oopDesc*, I think there would be a whole lot fewer casts and >>>> cast_to_oop's.? Later on, I think suffix_head, observed_overflow_list, >>>> and curr_overflow_list could also be oopDesc* instead of oop to >>>> eliminate more casts. >>> >>> I actually tried to make this change but ran into more fan out that >>> way, so went back and just fixed the cmpxchg calls to cast oops to >>> oopDesc* and things were less perturbed that way. >>>> >>>> And some similar changes in CMSCollector::par_push_on_overflow_list. >>>> >>>> And similarly in parNewGeneration.cpp, in push_on_overflow_list and >>>> take_from_overflow_list_work. >>>> >>>> As noted in the comments for JDK-8165857, the lists and "objects" >>>> involved here aren't really oops, but rather the shattered remains of >>> >>> Yes, somewhat horrified at the value of BUSY. >>>> oops.? The suggestion there was to use HeapWord* and carry through the >>>> fanout; what was actually done was to change _overflow_list to >>>> oopDesc* to minimize fanout, even though that's kind of lying to the >>>> type system.? Now, with the cleanup of cmpxchg_ptr and such, we're >>>> paying the price of doing the minimal thing back then. >>> >>> I will file an RFE about cleaning this up.? I think what I've done >>> was the minimal thing. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >>>> 7960?? Atomic::add(-n, &_num_par_pushes); >>>> >>>> Atomic::sub >>> >>> fixed. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/gc/cms/parNewGeneration.cpp >>>> 1455?? Atomic::add(-n, &_num_par_pushes); >>> fixed. >>>> Atomic::sub >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/gc/g1/dirtyCardQueue.cpp >>>> ? 283???? void* actual = Atomic::cmpxchg(next, >>>> &_cur_par_buffer_node, nd); >>>> ... >>>> ? 289?????? nd = static_cast(actual); >>>> >>>> Change actual's type to BufferNode* and remove the cast on line 289. >>> >>> fixed.? missed that one. gross. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/gc/g1/g1CollectedHeap.cpp >>>> >>>> [pre-existing] >>>> 3499???????? old = (CompiledMethod*)_postponed_list; >>>> >>>> I think that cast is only needed because >>>> G1CodeCacheUnloadingTask::_postponed_list is incorrectly typed as >>>> "volatile CompiledMethod*", when I think it ought to be >>>> "CompiledMethod* volatile". >>>> >>>> I think G1CodeCacheUnloading::_claimed_nmethod is similarly mis-typed, >>>> with a similar should not be needed cast: >>>> 3530?????? first = (CompiledMethod*)_claimed_nmethod; >>>> >>>> and another for _postponed_list here: >>>> 3552?????? claim = (CompiledMethod*)_postponed_list; >>> >>> I've fixed this.?? C++ is so confusing about where to put the >>> volatile.?? Everyone has been tripped up by it. 
>>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/gc/g1/g1HotCardCache.cpp >>>> ?? 77?? jbyte* previous_ptr = (jbyte*)Atomic::cmpxchg(card_ptr, >>>> >>>> I think the cast of the cmpxchg result is no longer needed. >>> >>> fixed. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp >>>> ? 254?????? char* touch_addr = (char*)Atomic::add(actual_chunk_size, >>>> &_cur_addr) - actual_chunk_size; >>>> >>>> I think the cast of the add result is no longer needed. >>> got it already. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/gc/g1/g1StringDedup.cpp >>>> ? 213?? return (size_t)Atomic::add(partition_size, &_next_bucket) - >>>> partition_size; >>>> >>>> I think the cast of the add result is no longer needed. >>> >>> I was slacking in the g1 files.? fixed. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/gc/g1/heapRegionRemSet.cpp >>>> ? 200?????? PerRegionTable* res = >>>> ? 201???????? Atomic::cmpxchg(nxt, &_free_list, fl); >>>> >>>> Please remove the line break, now that the code has been simplified. >>>> >>>> But wait, doesn't this alloc exhibit classic ABA problems?? I *think* >>>> this works because alloc and bulk_free are called in different phases, >>>> never overlapping. >>> >>> I don't know.? Do you want to file a bug to investigate this? >>> fixed. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/gc/g1/sparsePRT.cpp >>>> ? 295???? SparsePRT* res = >>>> ? 296?????? Atomic::cmpxchg(sprt, &_head_expanded_list, hd); >>>> and >>>> ? 307???? SparsePRT* res = >>>> ? 308?????? Atomic::cmpxchg(next, &_head_expanded_list, hd); >>>> >>>> I'd rather not have the line breaks in these either. >>>> >>>> And get_from_expanded_list also appears to have classic ABA problems. >>>> I *think* this works because add_to_expanded_list and >>>> get_from_expanded_list are called in different phases, never >>>> overlapping. >>> >>> Fixed, same question as above?? Or one bug to investigate both? >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/gc/shared/taskqueue.inline.hpp >>>> ? 262?? return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, >>>> ? 263?????????????????????????????????? (volatile intptr_t *)&_data, >>>> ? 264 (intptr_t)old_age._data); >>>> >>>> This should be >>>> >>>> ?? return Atomic::cmpxchg(new_age._data, &_data, old_age._data); >>> >>> fixed. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/interpreter/bytecodeInterpreter.cpp >>>> This doesn't have any casts, which I think is correct. >>>> ? 708???????????? if (Atomic::cmpxchg(header, rcvr->mark_addr(), >>>> mark) == mark) { >>>> >>>> but these do. >>>> ? 718???????????? if (Atomic::cmpxchg((void*)new_header, >>>> rcvr->mark_addr(), mark) == mark) { >>>> ? 737???????????? if (Atomic::cmpxchg((void*)new_header, >>>> rcvr->mark_addr(), header) == header) { >>>> >>>> I'm not sure how the ones with casts even compile?? mark_addr() seems >>>> to be a markOop*, which is a markOopDesc**, where markOopDesc is a >>>> class.? void* is not implicitly convertible to markOopDesc*. >>>> >>>> Hm, this entire file is #ifdef CC_INTERP.? Is this zero-only code?? 
Or >>>> something like that? >>>> >>>> Similarly here: >>>> ? 906?????????? if (Atomic::cmpxchg(header, lockee->mark_addr(), >>>> mark) == mark) { >>>> and >>>> ? 917?????????? if (Atomic::cmpxchg((void*)new_header, >>>> lockee->mark_addr(), mark) == mark) { >>>> ? 935?????????? if (Atomic::cmpxchg((void*)new_header, >>>> lockee->mark_addr(), header) == header) { >>>> >>>> and here: >>>> 1847?????????????? if (Atomic::cmpxchg(header, lockee->mark_addr(), >>>> mark) == mark) { >>>> 1858?????????????? if (Atomic::cmpxchg((void*)new_header, >>>> lockee->mark_addr(), mark) == mark) { >>>> 1878?????????????? if (Atomic::cmpxchg((void*)new_header, >>>> lockee->mark_addr(), header) == header) { >>>> >>>> and here: >>>> 1847?????????????? if (Atomic::cmpxchg(header, lockee->mark_addr(), >>>> mark) == mark) { >>>> 1858?????????????? if (Atomic::cmpxchg((void*)new_header, >>>> lockee->mark_addr(), mark) == mark) { >>>> 1878?????????????? if (Atomic::cmpxchg((void*)new_header, >>>> lockee->mark_addr(), header) == header) { >>> >>> I've changed all these.?? This is part of Zero. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/memory/metaspace.cpp >>>> 1502?? size_t value = OrderAccess::load_acquire(&_capacity_until_GC); >>>> ... >>>> 1537?? return (size_t)Atomic::sub((intptr_t)v, &_capacity_until_GC); >>>> >>>> These and other uses of _capacity_until_GC suggest that variable's >>>> type should be size_t rather than intptr_t.? Note that I haven't done >>>> a careful check of uses to see if there are any places where such a >>>> change would cause problems. >>> >>> Yes, I had a hard time with metaspace.cpp because I agree >>> _capacity_until_GC should be size_t.?? Tried to make this change and >>> it cascaded a bit.? I'll file an RFE to change this type separately. >>> >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/oops/constantPool.cpp >>>> ? 229?? OrderAccess::release_store((Klass* volatile *)adr, k); >>>> ? 246?? OrderAccess::release_store((Klass* volatile *)adr, k); >>>> ? 514?? OrderAccess::release_store((Klass* volatile *)adr, k); >>>> >>>> Casts are not needed. >>> >>> fixed. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/oops/constantPool.hpp >>>> ? 148???? volatile intptr_t adr = >>>> OrderAccess::load_acquire(obj_at_addr_raw(which)); >>>> >>>> [pre-existing] >>>> Why is adr declared volatile? >>> >>> golly beats me.? concurrency is scary, especially in the constant pool. >>> The load_acquire() should make sure the value is fetched from memory >>> so volatile is unneeded. >>> >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/oops/cpCache.cpp >>>> ? 157???? intx newflags = (value & parameter_size_mask); >>>> ? 158???? Atomic::cmpxchg(newflags, &_flags, (intx)0); >>>> >>>> This is a nice demonstration of why I wanted to include some value >>>> preserving integral conversions in cmpxchg, rather than requiring >>>> exact type matching in the integral case.? There have been some others >>>> that I haven't commented on.? Apparently we (I) got away with >>>> including such conversions in Atomic::add, which I'd forgotten about. >>>> And see comment regarding Atomic::sub below. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/oops/cpCache.hpp >>>> ? 139?? 
volatile Metadata*?? _f1;?????? // entry specific metadata >>>> field >>>> >>>> [pre-existing] >>>> I suspect the type should be Metadata* volatile.? And that would >>>> eliminate the need for the cast here: >>>> >>>> ? 339?? Metadata* f1_ord() const?????????????????????? { return >>>> (Metadata *)OrderAccess::load_acquire(&_f1); } >>>> >>>> I don't know if there are any other changes needed or desirable around >>>> _f1 usage. >>> >>> yes, fixed this. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/oops/method.hpp >>>> ? 139?? volatile address from_compiled_entry() const?? { return >>>> OrderAccess::load_acquire(&_from_compiled_entry); } >>>> ? 140?? volatile address from_compiled_entry_no_trampoline() const; >>>> ? 141?? volatile address from_interpreted_entry() const{ return >>>> OrderAccess::load_acquire(&_from_interpreted_entry); } >>>> >>>> [pre-existing] >>>> The volatile qualifiers here seem suspect to me. >>> >>> Again much suspicion about concurrency and giant pain, which I >>> remember, of debugging these when they were broken. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/oops/oop.inline.hpp >>>> ? 391???? narrowOop old = (narrowOop)Atomic::xchg(val, >>>> (narrowOop*)dest); >>>> >>>> Cast of return type is not needed. >>> >>> fixed. >>> >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/prims/jni.cpp >>>> >>>> [pre-existing] >>>> >>>> copy_jni_function_table should be using Copy::disjoint_words_atomic. >>> >>> yuck. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/prims/jni.cpp >>>> >>>> [pre-existing] >>>> >>>> 3892?? // We're about to use Atomic::xchg for synchronization. Some >>>> Zero >>>> 3893?? // platforms use the GCC builtin __sync_lock_test_and_set for >>>> this, >>>> 3894?? // but __sync_lock_test_and_set is not guaranteed to do what >>>> we want >>>> 3895?? // on all architectures.? So we check it works before relying >>>> on it. >>>> 3896 #if defined(ZERO) && defined(ASSERT) >>>> 3897?? { >>>> 3898???? jint a = 0xcafebabe; >>>> 3899???? jint b = Atomic::xchg(0xdeadbeef, &a); >>>> 3900???? void *c = &a; >>>> 3901???? void *d = Atomic::xchg(&b, &c); >>>> 3902???? assert(a == (jint) 0xdeadbeef && b == (jint) 0xcafebabe, >>>> "Atomic::xchg() works"); >>>> 3903???? assert(c == &b && d == &a, "Atomic::xchg() works"); >>>> 3904?? } >>>> 3905 #endif // ZERO && ASSERT >>>> >>>> It seems rather strange to be testing Atomic::xchg() here, rather than >>>> as part of unit testing Atomic?? Fail unit testing => don't try to >>>> use... >>> >>> This is zero.? I'm not touching this. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/prims/jvmtiRawMonitor.cpp >>>> ? 130???? if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { >>>> ? 142???? if (_owner == NULL && Atomic::cmpxchg_if_null((void*)Self, >>>> &_owner)) { >>>> >>>> I think these casts aren't needed. _owner is void*, and Self is >>>> Thread*, which is implicitly convertible to void*. >>>> >>>> Similarly here, for the THREAD argument: >>>> ? 280???? Contended = Atomic::cmpxchg((void*)THREAD, &_owner, >>>> (void*)NULL); >>>> ? 283???? Contended = Atomic::cmpxchg((void*)THREAD, &_owner, >>>> (void*)NULL); >>> >>> Okay, let me see if the compiler(s) eat that. 
(yes they do) >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/prims/jvmtiRawMonitor.hpp >>>> >>>> This file is in the webrev, but seems to be unchanged. >>> >>> It'll be cleaned up with the the commit and not be part of the >>> changeset. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/runtime/atomic.hpp >>>> ? 520 template >>>> ? 521 inline D Atomic::sub(I sub_value, D volatile* dest) { >>>> ? 522?? STATIC_ASSERT(IsPointer::value || IsIntegral::value); >>>> ? 523?? // Assumes two's complement integer representation. >>>> ? 524?? #pragma warning(suppress: 4146) >>>> ? 525?? return Atomic::add(-sub_value, dest); >>>> ? 526 } >>>> >>>> I'm pretty sure this implementation is incorrect.? I think it produces >>>> the wrong result when I and D are both unsigned integer types and >>>> sizeof(I) < sizeof(D). >>> >>> Can you suggest a correction?? I just copied Atomic::dec(). >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/runtime/mutex.cpp >>>> ? 304?? intptr_t v = Atomic::cmpxchg((intptr_t)_LBIT, >>>> &_LockWord.FullWord, (intptr_t)0);? // agro ... >>>> >>>> _LBIT should probably be intptr_t, rather than an enum.? Note that the >>>> enum type is unused.? The old value here is another place where an >>>> implicit widening of same signedness would have been nice. (Such >>>> implicit widening doesn't work for enums, since it's unspecified >>>> whether they default to signed or unsigned representation, and >>>> implementatinos differ.) >>> >>> This would be a good/simple cleanup.? I changed it to const intptr_t >>> _LBIT = 1; >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/runtime/mutex.hpp >>>> >>>> [pre-existing] >>>> >>>> I think the Address member of the SplitWord union is unused. Looking >>>> at AcquireOrPush (and others), I'm wondering whether it *should* be >>>> used there, or whether just using intptr_t casts and doing integral >>>> arithmetic (as is presently being done) is easier and clearer. >>>> >>>> Also the _LSBINDEX macro probably ought to be defined in mutex.cpp >>>> rather than polluting the global namespace.? And technically, that >>>> name is reserved word. >>> >>> I moved both this and _LBIT into the top of mutex.cpp since they are >>> used there. >>> Cant define const intptr_t _LBIT =1; in a class in our version of C++. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/runtime/objectMonitor.cpp >>>> ? 252?? void * cur = Atomic::cmpxchg((void*)Self, &_owner, >>>> (void*)NULL); >>>> ? 409?? if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { >>>> 1983?????? ox = (Thread*)Atomic::cmpxchg((void*)Self, &_owner, >>>> (void*)NULL); >>>> >>>> I think the casts of Self aren't needed. >>> >>> fixed. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/runtime/objectMonitor.cpp >>>> ? 995?????? if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { >>>> 1020???????? if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { >>>> >>>> I think the casts of THREAD aren't needed. >>> >>> nope, fixed. >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/runtime/objectMonitor.hpp >>>> ? 254?? 
markOopDesc* volatile* header_addr(); >>>> >>>> Why isn't this volatile markOop* ? >>> >>> fixed. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/runtime/synchronizer.cpp >>>> ? 242???????? Atomic::cmpxchg_if_null((void*)Self, &(m->_owner))) { >>>> >>>> I think the cast of Self isn't needed. >>> >>> fixed. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/runtime/synchronizer.cpp >>>> ? 992?? for (; block != NULL; block = (PaddedEnd >>>> *)next(block)) { >>>> 1734???? for (; block != NULL; block = (PaddedEnd >>>> *)next(block)) { >>>> >>>> [pre-existing] >>>> All calls to next() pass a PaddedEnd* and cast the >>>> result.? How about moving all that behavior into next(). >>> >>> I fixed this next() function, but it necessitated a cast to FreeNext >>> field.? The PaddedEnd<> type was intentionally not propagated to all >>> the things that use it.?? Which is a shame because there are a lot >>> more casts to PaddedEnd that could have been removed. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/runtime/synchronizer.cpp >>>> 1970???? if (monitor > (ObjectMonitor *)&block[0] && >>>> 1971???????? monitor < (ObjectMonitor *)&block[_BLOCKSIZE]) { >>>> >>>> [pre-existing] >>>> Are the casts needed here?? I think PaddedEnd is >>>> derived from ObjectMonitor, so implicit conversions should apply. >>> >>> prob not.? removed them. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/runtime/synchronizer.hpp >>>> ?? 28 #include "memory/padded.hpp" >>>> ? 163?? static PaddedEnd * volatile gBlockList; >>>> >>>> I was going to suggest as an alternative just making gBlockList a file >>>> scoped variable in synchronizer.cpp, since it isn't used outside of >>>> that file. Except that it is referenced by vmStructs.? Curses! >>> >>> It's also used by the SA. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/runtime/thread.cpp >>>> 4707?? intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, >>>> (intptr_t)0); >>>> >>>> This and other places suggest LOCKBIT should be defined as intptr_t, >>>> rather than as an enum value.? The MuxBits enum type is unused. >>>> >>>> And the cast of 0 is another case where implicit widening would be >>>> nice. >>> >>> Making LOCKBIT a const intptr_t = 1 removes a lot of casts. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/services/mallocSiteTable.cpp >>>> ? 261 bool MallocSiteHashtableEntry::atomic_insert(const >>>> MallocSiteHashtableEntry* entry) { >>>> ? 262?? return Atomic::cmpxchg_if_null(entry, (const >>>> MallocSiteHashtableEntry**)&_next); >>>> ? 263 } >>>> >>>> I think the problem here that is leading to the cast is that >>>> atomic_insert is taking a const T*.? Note that it's only caller passes >>>> a non-const T*. >>> >>> I'll change the type to non-const.? We try to use consts... >>> >>> Thanks for the detailed review!? The gcc compiler seems happy so far, >>> I'll post a webrev of the result of these changes after fixing >>> Atomic::sub() and seeing how the other compilers deal with these >>> changes. 
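As a standalone sketch of the atomic_insert point above (using std::atomic and invented names rather than HotSpot's Atomic class): once the inserted node and the next field share the same non-const pointer type, a CAS-based insert needs no casts at all.

    #include <atomic>

    struct Entry {
      int data;
      std::atomic<Entry*> next;

      Entry() : data(0), next(nullptr) {}

      // Link 'e' in if nothing has been linked yet; returns true on success,
      // false if another thread already installed a successor.
      bool atomic_insert(Entry* e) {      // non-const parameter, so no cast
        Entry* expected = nullptr;
        return next.compare_exchange_strong(expected, e);
      }
    };

Had the parameter stayed 'const Entry*', the compare-exchange against the non-const next field would force exactly the kind of cast the review comment is trying to remove.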
>>> >>> Thanks, >>> Coleen >>> >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> >>> >> From rkennke at redhat.com Mon Oct 16 05:49:26 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 16 Oct 2017 07:49:26 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <331579a0-29de-f152-2dd4-66987896c463@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> <331579a0-29de-f152-2dd4-66987896c463@oracle.com> Message-ID: Hi David, thanks for reviewing and testing! The interaction between JEPs and patches going in is not really clear to me, nor is it well documented. For example, we're already pushing patches for JEP 304: Garbage Collection Interface, even though it's only in 'candidate' state... In any case, I'll ping Mark Reinhold about moving the Shark JEP forward. Thanks again, Roman > My internal JPRT run went fine. So this just needs a build team > signoff from the perspective of the patch. > > However, as this has had a JEP submitted for it, the code changes can > not be pushed until the JEP has been targeted. > > Thanks, > David > > On 16/10/2017 8:08 AM, David Holmes wrote: >> Looks good. >> >> Thanks, >> David >> >> On 16/10/2017 8:00 AM, Roman Kennke wrote: >>> >>> Ok, I fixed all the comments you mentioned. >>> >>> Differential (against webrev.01): >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >>> >>> Full webrev: >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >>> >>> >>> Roman >>> >>>> Just spotted this: >>>> >>>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>>> >>>> David >>>> >>>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>>> Hi David, >>>>>> >>>>>> thanks! >>>>>> >>>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>>> >>>>> >>>>> Can you point me to the exact change please as I don't want to >>>>> re-examine it all. :) >>>>> >>>>> I'll pull this in and do a test build run internally. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks, Roman >>>>>> >>>>>> >>>>>>> Hi Roman, >>>>>>> >>>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>>> big patch to remove it. >>>>>>>> >>>>>>>> What I have done: >>>>>>>> >>>>>>>> grep -i -R shark src >>>>>>>> grep -i -R shark make >>>>>>>> grep -i -R shark doc >>>>>>>> grep -i -R shark doc >>>>>>>> >>>>>>>> and purged any reference to shark. Almost everything was >>>>>>>> straightforward. >>>>>>>> >>>>>>>> The only things I wasn't really sure of: >>>>>>>> >>>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>>> pd_address_in_code(), I am not sure it is the right thing to >>>>>>>> do. If not, what *would* be the right thing? 
>>>>>>>> >>>>>>>> Then of course I did: >>>>>>>> >>>>>>>> rm -rf src/hotspot/share/shark >>>>>>>> >>>>>>>> I also went through the build machinery and removed stuff >>>>>>>> related to Shark and LLVM libs. >>>>>>>> >>>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>>> >>>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>>> All looks fine. >>>>>>>> >>>>>>>> - I could not build zero because it seems broken because of the >>>>>>>> recent Atomic::* changes >>>>>>>> - I could not test any of the other arches that seemed to >>>>>>>> reference Shark (arm and sparc) >>>>>>>> >>>>>>>> Here's the full webrev: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>>> >>>>>>>> >>>>>>>> Can I get a review on this? >>>>>>>> >>>>>>>> Thanks, Roman >>>>>>>> >>>>>> >>> From david.holmes at oracle.com Mon Oct 16 06:10:19 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 16 Oct 2017 16:10:19 +1000 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <321ac223-1dc8-841c-93c0-c39c770b7e20@oracle.com> <331579a0-29de-f152-2dd4-66987896c463@oracle.com> Message-ID: <456436e4-955c-75f5-ac92-e2fd4a2fb280@oracle.com> On 16/10/2017 3:49 PM, Roman Kennke wrote: > > Hi David, > > thanks for reviewing and testing! > > The interaction between JEPs and patches going in is not really clear to > me, nor is it well documented. For example, we're already pushing > patches for JEP 304: Garbage Collection Interface, even though it's only > in 'candidate' state... If patches can be separated out into generally useful cleanup or enabling changes then it can be okay to push them independently of the JEP AFAIK. That's obviously a little subjective. In this case though we're talking about the whole thing at once, so AFAIK the JEP has to be targeted before the changes can be pushed. > In any case, I'll ping Mark Reinhold about moving the Shark JEP forward. Thanks. Should be simple enough, I hope. :) Cheers, David > Thanks again, > Roman > >> My internal JPRT run went fine. So this just needs a build team >> signoff from the perspective of the patch. >> >> However, as this has had a JEP submitted for it, the code changes can >> not be pushed until the JEP has been targeted. >> >> Thanks, >> David >> >> On 16/10/2017 8:08 AM, David Holmes wrote: >>> Looks good. >>> >>> Thanks, >>> David >>> >>> On 16/10/2017 8:00 AM, Roman Kennke wrote: >>>> >>>> Ok, I fixed all the comments you mentioned. >>>> >>>> Differential (against webrev.01): >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >>>> >>>> Full webrev: >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >>>> >>>> >>>> Roman >>>> >>>>> Just spotted this: >>>>> >>>>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>>>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>>>> >>>>> David >>>>> >>>>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>>>> Hi David, >>>>>>> >>>>>>> thanks! >>>>>>> >>>>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>>>> generated-configure.sh part, and adds a smallish Zero-related fix. 
>>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>>>> >>>>>> >>>>>> Can you point me to the exact change please as I don't want to >>>>>> re-examine it all. :) >>>>>> >>>>>> I'll pull this in and do a test build run internally. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>>>> >>>>>>>> Hi Roman, >>>>>>>> >>>>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>>>> big patch to remove it. >>>>>>>>> >>>>>>>>> What I have done: >>>>>>>>> >>>>>>>>> grep -i -R shark src >>>>>>>>> grep -i -R shark make >>>>>>>>> grep -i -R shark doc >>>>>>>>> grep -i -R shark doc >>>>>>>>> >>>>>>>>> and purged any reference to shark. Almost everything was >>>>>>>>> straightforward. >>>>>>>>> >>>>>>>>> The only things I wasn't really sure of: >>>>>>>>> >>>>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>>>> pd_address_in_code(), I am not sure it is the right thing to >>>>>>>>> do. If not, what *would* be the right thing? >>>>>>>>> >>>>>>>>> Then of course I did: >>>>>>>>> >>>>>>>>> rm -rf src/hotspot/share/shark >>>>>>>>> >>>>>>>>> I also went through the build machinery and removed stuff >>>>>>>>> related to Shark and LLVM libs. >>>>>>>>> >>>>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>>>> >>>>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>>>> All looks fine. >>>>>>>>> >>>>>>>>> - I could not build zero because it seems broken because of the >>>>>>>>> recent Atomic::* changes >>>>>>>>> - I could not test any of the other arches that seemed to >>>>>>>>> reference Shark (arm and sparc) >>>>>>>>> >>>>>>>>> Here's the full webrev: >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>>>> >>>>>>>>> >>>>>>>>> Can I get a review on this? >>>>>>>>> >>>>>>>>> Thanks, Roman >>>>>>>>> >>>>>>> >>>> > From aph at redhat.com Mon Oct 16 07:31:50 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 16 Oct 2017 08:31:50 +0100 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> Message-ID: <75f7713d-e1d6-46a8-820d-f4d76f1d722b@redhat.com> On 15/10/17 21:26, John Paul Adrian Glaubitz wrote: > On 10/15/2017 12:41 AM, Roman Kennke wrote: >> The JEP to remove the Shark compiler has received exclusively positive >> feedback (JDK-8189173) on zero-dev. So here comes the big patch to remove it. > > I have now read through the JEP and I have to say, I'm sad to see > Shark go. > > In my opinion, Shark should be a supported version of the JVM as > LLVM is gaining code generation support for more and more > architectures. I have always liked the idea to split out the code > generation of compilers into a separate project and, in fact, the > compilers for many other languages like Rust and Julia rely on LLVM. There's no reason that something like Shark couldn't be written again, but the problem at the time was that LLVM was a work in flux, and its interface to the JIT continually mutated. 
In addition, each LLVM version had bugs which broke HotSpot; these bugs would be fixed in the next version, but the next version had more bugs which broke HotSpot. It was impossible to keep it working. > It's a pity that this value is not seen within the OpenJDK project. It's seen, for sure. Otherwise I wouldn't have wanted us to do it. There's no reason something like Shark couldn't be done again, but you wouldn't start from here. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Mon Oct 16 07:33:37 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 16 Oct 2017 08:33:37 +0100 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <152a7a54-d30f-3c82-313a-608ef118628a@redhat.com> Message-ID: <0ff1b913-15d8-3fce-0381-c16076a8b0b5@redhat.com> On 15/10/17 21:44, John Paul Adrian Glaubitz wrote: > FWIW, there are actually quite a number of users for Zero who would be happy to > have a JIT-version of it. One major user for Zero is MIPS (big-, little-endian, > 32 and 64 bit) which still doesn't have a native code generator in Hotspot. The problem with LLVM was always that its JIT interface didn't have support for unpopular targets, thus negating its usefulness. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From erik.joelsson at oracle.com Mon Oct 16 08:24:56 2017 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Mon, 16 Oct 2017 10:24:56 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> Message-ID: <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> Hello Roman, In hotspot.m4, I believe the check on line 328 (pre changes) is still relevant for just the zero case. Otherwise build changes look good to me. /Erik On 2017-10-16 00:00, Roman Kennke wrote: > > Ok, I fixed all the comments you mentioned. > > Differential (against webrev.01): > http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ > > Full webrev: > http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ > > > Roman > >> Just spotted this: >> >> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >> >> David >> >> On 16/10/2017 7:25 AM, David Holmes wrote: >>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>> Hi David, >>>> >>>> thanks! >>>> >>>> I'm uploading a 2nd revision of the patch that excludes the >>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>> >>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>> >>> >>> Can you point me to the exact change please as I don't want to >>> re-examine it all. :) >>> >>> I'll pull this in and do a test build run internally. >>> >>> Thanks, >>> David >>> >>>> Thanks, Roman >>>> >>>> >>>>> Hi Roman, >>>>> >>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>> big patch to remove it. 
>>>>>> >>>>>> What I have done: >>>>>> >>>>>> grep -i -R shark src >>>>>> grep -i -R shark make >>>>>> grep -i -R shark doc >>>>>> grep -i -R shark doc >>>>>> >>>>>> and purged any reference to shark. Almost everything was >>>>>> straightforward. >>>>>> >>>>>> The only things I wasn't really sure of: >>>>>> >>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>> If not, what *would* be the right thing? >>>>>> >>>>>> Then of course I did: >>>>>> >>>>>> rm -rf src/hotspot/share/shark >>>>>> >>>>>> I also went through the build machinery and removed stuff related >>>>>> to Shark and LLVM libs. >>>>>> >>>>>> Now the only references in the whole JDK tree to shark is a >>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>> >>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>> All looks fine. >>>>>> >>>>>> - I could not build zero because it seems broken because of the >>>>>> recent Atomic::* changes >>>>>> - I could not test any of the other arches that seemed to >>>>>> reference Shark (arm and sparc) >>>>>> >>>>>> Here's the full webrev: >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>> >>>>>> >>>>>> Can I get a review on this? >>>>>> >>>>>> Thanks, Roman >>>>>> >>>> > From magnus.ihse.bursie at oracle.com Mon Oct 16 09:25:59 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Mon, 16 Oct 2017 11:25:59 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> Message-ID: On 2017-10-16 10:24, Erik Joelsson wrote: > Hello Roman, > > In hotspot.m4, I believe the check on line 328 (pre changes) is still > relevant for just the zero case. Yes, it is indeed. > > Otherwise build changes look good to me. Agree, looks good. /Magnus > > /Erik > > > On 2017-10-16 00:00, Roman Kennke wrote: >> >> Ok, I fixed all the comments you mentioned. >> >> Differential (against webrev.01): >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >> >> Full webrev: >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >> >> >> Roman >> >>> Just spotted this: >>> >>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>> >>> David >>> >>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>> Hi David, >>>>> >>>>> thanks! >>>>> >>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>> >>>> >>>> Can you point me to the exact change please as I don't want to >>>> re-examine it all. :) >>>> >>>> I'll pull this in and do a test build run internally. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, Roman >>>>> >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> The build changes must be reviewed on build-dev - now cc'd. 
>>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>> big patch to remove it. >>>>>>> >>>>>>> What I have done: >>>>>>> >>>>>>> grep -i -R shark src >>>>>>> grep -i -R shark make >>>>>>> grep -i -R shark doc >>>>>>> grep -i -R shark doc >>>>>>> >>>>>>> and purged any reference to shark. Almost everything was >>>>>>> straightforward. >>>>>>> >>>>>>> The only things I wasn't really sure of: >>>>>>> >>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>>> If not, what *would* be the right thing? >>>>>>> >>>>>>> Then of course I did: >>>>>>> >>>>>>> rm -rf src/hotspot/share/shark >>>>>>> >>>>>>> I also went through the build machinery and removed stuff >>>>>>> related to Shark and LLVM libs. >>>>>>> >>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>> >>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>> All looks fine. >>>>>>> >>>>>>> - I could not build zero because it seems broken because of the >>>>>>> recent Atomic::* changes >>>>>>> - I could not test any of the other arches that seemed to >>>>>>> reference Shark (arm and sparc) >>>>>>> >>>>>>> Here's the full webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>> >>>>>>> >>>>>>> Can I get a review on this? >>>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>> >> > From rkennke at redhat.com Mon Oct 16 10:26:43 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 16 Oct 2017 12:26:43 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> Message-ID: <6ed7e856-baee-a59d-6710-4ff143277dc9@redhat.com> Hi Erik, You mean like this? http://cr.openjdk.java.net/~rkennke/8171853/webrev.04.diff/ Full webrev here: http://cr.openjdk.java.net/~rkennke/8171853/webrev.04/ Thanks, Roman > Hello Roman, > > In hotspot.m4, I believe the check on line 328 (pre changes) is still > relevant for just the zero case. > > Otherwise build changes look good to me. > > /Erik > > > On 2017-10-16 00:00, Roman Kennke wrote: >> >> Ok, I fixed all the comments you mentioned. >> >> Differential (against webrev.01): >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >> >> Full webrev: >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >> >> >> Roman >> >>> Just spotted this: >>> >>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>> >>> David >>> >>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>> Hi David, >>>>> >>>>> thanks! >>>>> >>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>> generated-configure.sh part, and adds a smallish Zero-related fix. 
>>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>> >>>> >>>> Can you point me to the exact change please as I don't want to >>>> re-examine it all. :) >>>> >>>> I'll pull this in and do a test build run internally. >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, Roman >>>>> >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>> big patch to remove it. >>>>>>> >>>>>>> What I have done: >>>>>>> >>>>>>> grep -i -R shark src >>>>>>> grep -i -R shark make >>>>>>> grep -i -R shark doc >>>>>>> grep -i -R shark doc >>>>>>> >>>>>>> and purged any reference to shark. Almost everything was >>>>>>> straightforward. >>>>>>> >>>>>>> The only things I wasn't really sure of: >>>>>>> >>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>> pd_address_in_code(), I am not sure it is the right thing to do. >>>>>>> If not, what *would* be the right thing? >>>>>>> >>>>>>> Then of course I did: >>>>>>> >>>>>>> rm -rf src/hotspot/share/shark >>>>>>> >>>>>>> I also went through the build machinery and removed stuff >>>>>>> related to Shark and LLVM libs. >>>>>>> >>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>> >>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>> All looks fine. >>>>>>> >>>>>>> - I could not build zero because it seems broken because of the >>>>>>> recent Atomic::* changes >>>>>>> - I could not test any of the other arches that seemed to >>>>>>> reference Shark (arm and sparc) >>>>>>> >>>>>>> Here's the full webrev: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>> >>>>>>> >>>>>>> Can I get a review on this? >>>>>>> >>>>>>> Thanks, Roman >>>>>>> >>>>> >> > From erik.joelsson at oracle.com Mon Oct 16 10:55:28 2017 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Mon, 16 Oct 2017 12:55:28 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <6ed7e856-baee-a59d-6710-4ff143277dc9@redhat.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <836cd06d-d10a-a98e-d996-b5b92de94c4b@oracle.com> <980aa9ae-9c50-c1dc-d52f-e00234a2e3ca@redhat.com> <872910c6-a17b-d3df-bc80-fa850b9738d9@oracle.com> <6ed7e856-baee-a59d-6710-4ff143277dc9@redhat.com> Message-ID: That looks correct. Thanks! /Erik On 2017-10-16 12:26, Roman Kennke wrote: > > Hi Erik, > > You mean like this? > > http://cr.openjdk.java.net/~rkennke/8171853/webrev.04.diff/ > > > Full webrev here: > http://cr.openjdk.java.net/~rkennke/8171853/webrev.04/ > > > Thanks, > Roman > >> Hello Roman, >> >> In hotspot.m4, I believe the check on line 328 (pre changes) is still >> relevant for just the zero case. >> >> Otherwise build changes look good to me. >> >> /Erik >> >> >> On 2017-10-16 00:00, Roman Kennke wrote: >>> >>> Ok, I fixed all the comments you mentioned. 
>>> >>> Differential (against webrev.01): >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03.diff/ >>> >>> Full webrev: >>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.03/ >>> >>> >>> Roman >>> >>>> Just spotted this: >>>> >>>> ./hotspot/jtreg/compiler/whitebox/CompilerWhiteBoxTest.java: /** >>>> {@code CompLevel::CompLevel_full_optimization} -- C2 or Shark */ >>>> >>>> David >>>> >>>> On 16/10/2017 7:25 AM, David Holmes wrote: >>>>> On 16/10/2017 7:01 AM, Roman Kennke wrote: >>>>>> Hi David, >>>>>> >>>>>> thanks! >>>>>> >>>>>> I'm uploading a 2nd revision of the patch that excludes the >>>>>> generated-configure.sh part, and adds a smallish Zero-related fix. >>>>>> >>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.01/ >>>>>> >>>>> >>>>> Can you point me to the exact change please as I don't want to >>>>> re-examine it all. :) >>>>> >>>>> I'll pull this in and do a test build run internally. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks, Roman >>>>>> >>>>>> >>>>>>> Hi Roman, >>>>>>> >>>>>>> The build changes must be reviewed on build-dev - now cc'd. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>> On 15/10/2017 8:41 AM, Roman Kennke wrote: >>>>>>>> The JEP to remove the Shark compiler has received exclusively >>>>>>>> positive feedback (JDK-8189173) on zero-dev. So here comes the >>>>>>>> big patch to remove it. >>>>>>>> >>>>>>>> What I have done: >>>>>>>> >>>>>>>> grep -i -R shark src >>>>>>>> grep -i -R shark make >>>>>>>> grep -i -R shark doc >>>>>>>> grep -i -R shark doc >>>>>>>> >>>>>>>> and purged any reference to shark. Almost everything was >>>>>>>> straightforward. >>>>>>>> >>>>>>>> The only things I wasn't really sure of: >>>>>>>> >>>>>>>> - in globals.hpp, I re-arranged the KIND_* bits to account for >>>>>>>> the gap that removing KIND_SHARK left. I hope that's good? >>>>>>>> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in >>>>>>>> pd_address_in_code(), I am not sure it is the right thing to >>>>>>>> do. If not, what *would* be the right thing? >>>>>>>> >>>>>>>> Then of course I did: >>>>>>>> >>>>>>>> rm -rf src/hotspot/share/shark >>>>>>>> >>>>>>>> I also went through the build machinery and removed stuff >>>>>>>> related to Shark and LLVM libs. >>>>>>>> >>>>>>>> Now the only references in the whole JDK tree to shark is a >>>>>>>> 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >>>>>>>> >>>>>>>> I tested by building a regular x86 JVM and running JTREG tests. >>>>>>>> All looks fine. >>>>>>>> >>>>>>>> - I could not build zero because it seems broken because of the >>>>>>>> recent Atomic::* changes >>>>>>>> - I could not test any of the other arches that seemed to >>>>>>>> reference Shark (arm and sparc) >>>>>>>> >>>>>>>> Here's the full webrev: >>>>>>>> >>>>>>>> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >>>>>>>> >>>>>>>> >>>>>>>> Can I get a review on this? >>>>>>>> >>>>>>>> Thanks, Roman >>>>>>>> >>>>>> >>> >> > From coleen.phillimore at oracle.com Mon Oct 16 13:10:47 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 16 Oct 2017 09:10:47 -0400 Subject: RFR: 8189333: Fix Zero build after Atomic::xchg changes In-Reply-To: References: <003ff7d9-759f-1ef5-f580-18c2571b63e5@redhat.com> Message-ID: <441ed55f-6398-9fa1-d571-86548ed5a2a9@oracle.com> Hi Roman, Can you build zero with this changeset? http://cr.openjdk.java.net/~coleenp/8188220.03/webrev/index.html My scripts for building zero are broken now. 
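For anyone trying to reproduce the Zero build being asked about here, one common recipe is to select the zero JVM variant at configure time; the options below are a sketch and may need adjusting for a given JDK version and platform:

    # Interpreter-only (Zero) fastdebug build of the JDK
    bash configure --with-jvm-variants=zero --with-debug-level=fastdebug
    make images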
thanks, Coleen On 10/15/17 5:40 PM, Roman Kennke wrote: > Am 15.10.2017 um 23:32 schrieb David Holmes: >> Hi Roman, >> >> On 16/10/2017 7:12 AM, Roman Kennke wrote: >>> Zero debug build has been broken by: JDK-8187977: Generalize >>> Atomic::xchg to use templates. >>> >>> This patch fixes it by casting the unsigned literal to jint: >>> >>> http://cr.openjdk.java.net/~rkennke/8189333/webrev.00/ >>> >> >> Looks fine. >> >> I can push this for you straight away (relatively speaking :) ) under >> the trivial rule. > Thanks! > > Roman From coleen.phillimore at oracle.com Mon Oct 16 13:13:52 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 16 Oct 2017 09:13:52 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <0784FA88-3D00-4DBA-8726-3A3B23C91B3E@oracle.com> References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> <49b7c5f7-2f6d-16ac-0b60-140619d0fffd@oracle.com> <0784FA88-3D00-4DBA-8726-3A3B23C91B3E@oracle.com> Message-ID: <2f32124d-2428-678d-ef50-3306231aa848@oracle.com> On 10/14/17 7:36 PM, Kim Barrett wrote: >> On Oct 13, 2017, at 2:34 PM, coleen.phillimore at oracle.com wrote: >> >> >> Hi, Here is the version with the changes from Kim's comments that has passed at least testing with JPRT and tier1, locally. More testing (tier2-5) is in progress. >> >> Also includes a corrected version of Atomic::sub care of Erik Osterlund. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.kim-review-changes/webrev >> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.review-comments/webrev >> >> Full version: >> >> http://cr.openjdk.java.net/~coleenp/8188220.03/webrev >> >> Thanks! >> Coleen > I still dislike and disagree with what is being proposed regarding replace_if_null. We can discuss that seperately, please file an RFE. > > ------------------------------------------------------------------------------ > I forgot that I'd promised you an updated Atomic::sub definition. > Unfortunately, the new one still has problems, performing some > conversions that should not be permitted (and are disallowed by > Atomic::add). Try this instead. (This hasn't been tested, not even > compiled; hopefully I don't have any typos or anything.) The intent > is that this supports the same conversions as Atomic::add. > > template > inline D Atomic::sub(I sub_value, D volatile* dest) { > STATIC_ASSERT(IsPointer::value || IsIntegral::value); > STATIC_ASSERT(IsIntegral::value); > // If D is a pointer type, use [u]intptr_t as the addend type, > // matching signedness of I. Otherwise, use D as the addend type. > typedef typename Conditional::value, intptr_t, uintptr_t>::type PI; > typedef typename Conditional::value, PI, D>::type AddendType; > // Only allow conversions that can't change the value. > STATIC_ASSERT(IsSigned::value == IsSigned::value); > STATIC_ASSERT(sizeof(I) <= sizeof(AddendType)); > AddendType addend = sub_value; > // Assumes two's complement integer representation. > #pragma warning(suppress: 4146) // In case AddendType is not signed. > return Atomic::add(-addend, dest); > } Uh, Ok.? I'll try it out. > >>>> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >>>> 7960 Atomic::add(-n, &_num_par_pushes); >>>> >>>> Atomic::sub >>> fixed. > Nope, not fixed in http://cr.openjdk.java.net/~coleenp/8188220.03/webrev Missed it twice now.? I think I have it now. 
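The STATIC_ASSERTs and the AddendType widening in the definition quoted above guard against the narrowing pitfall raised earlier in the thread. A standalone illustration with plain integers (no atomics, invented variable names): negating an unsigned value at its own narrower width and then widening it is not the same as subtracting it at the destination's width.

    #include <cassert>
    #include <cstdint>

    int main() {
      uint32_t n = 1;            // narrower unsigned "sub_value"
      uint64_t counter = 100;    // wider unsigned destination

      // What a naive add(-n, &counter) computes: -n wraps to 0xFFFFFFFF as a
      // 32-bit value and is then zero-extended to 64 bits.
      uint64_t naive_addend = static_cast<uint64_t>(-n);
      assert(counter + naive_addend == 100 + 0xFFFFFFFFull);   // not 99

      // Converting to the destination width first and negating there gives
      // the intended result through ordinary modular arithmetic.
      uint64_t addend = n;
      assert(counter + (0 - addend) == 99);
      return 0;
    }

The quoted Atomic::sub avoids the trap by converting sub_value to an addend of the destination's width before negating, and by statically rejecting combinations whose size or signedness could change the value.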
>>>> src/hotspot/share/gc/g1/heapRegionRemSet.cpp >>>> 200 PerRegionTable* res = >>>> 201 Atomic::cmpxchg(nxt, &_free_list, fl); >>>> >>>> Please remove the line break, now that the code has been simplified. >>>> >>>> But wait, doesn't this alloc exhibit classic ABA problems? I *think* >>>> this works because alloc and bulk_free are called in different phases, >>>> never overlapping. >>> I don't know. Do you want to file a bug to investigate this? >>> fixed. > No, I now think it?s ok, though confusing. > >>>> src/hotspot/share/gc/g1/sparsePRT.cpp >>>> 295 SparsePRT* res = >>>> 296 Atomic::cmpxchg(sprt, &_head_expanded_list, hd); >>>> and >>>> 307 SparsePRT* res = >>>> 308 Atomic::cmpxchg(next, &_head_expanded_list, hd); >>>> >>>> I'd rather not have the line breaks in these either. >>>> >>>> And get_from_expanded_list also appears to have classic ABA problems. >>>> I *think* this works because add_to_expanded_list and >>>> get_from_expanded_list are called in different phases, never >>>> overlapping. >>> Fixed, same question as above? Or one bug to investigate both? > Again, I think it?s ok, though confusing. > >>>> src/hotspot/share/gc/shared/taskqueue.inline.hpp >>>> 262 return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, >>>> 263 (volatile intptr_t *)&_data, >>>> 264 (intptr_t)old_age._data); >>>> >>>> This should be >>>> >>>> return Atomic::cmpxchg(new_age._data, &_data, old_age._data); >>> fixed. > Still casting the result. I thought I fixed it.? I think I fixed it now. > >>>> src/hotspot/share/oops/method.hpp >>>> 139 volatile address from_compiled_entry() const { return OrderAccess::load_acquire(&_from_compiled_entry); } >>>> 140 volatile address from_compiled_entry_no_trampoline() const; >>>> 141 volatile address from_interpreted_entry() const{ return OrderAccess::load_acquire(&_from_interpreted_entry); } >>>> >>>> [pre-existing] >>>> The volatile qualifiers here seem suspect to me. >>> Again much suspicion about concurrency and giant pain, which I remember, of debugging these when they were broken. > Let me be more direct: the volatile qualifiers for the function return > types are bogus and confusing, and should be removed. Okay, sure. > >>>> src/hotspot/share/prims/jni.cpp >>>> >>>> [pre-existing] >>>> >>>> copy_jni_function_table should be using Copy::disjoint_words_atomic. >>> yuck. > Of course, neither is entirely technically correct, since both are > treating conversion of function pointers to void* as okay in shared > code, e.g. violating some of the raison d'etre of CAST_{TO,FROM}_FN_PTR. > For way more detail than you probably care about, see the discussion > starting here: > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-March/018578.html > through (5 messages in total) > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-March/018623.html > > Oh well. > >>>> src/hotspot/share/runtime/mutex.hpp >>>> >>>> [pre-existing] >>>> >>>> I think the Address member of the SplitWord union is unused. Looking >>>> at AcquireOrPush (and others), I'm wondering whether it *should* be >>>> used there, or whether just using intptr_t casts and doing integral >>>> arithmetic (as is presently being done) is easier and clearer. >>>> >>>> Also the _LSBINDEX macro probably ought to be defined in mutex.cpp >>>> rather than polluting the global namespace. And technically, that >>>> name is reserved word. >>> I moved both this and _LBIT into the top of mutex.cpp since they are used there. > Good. 
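On the question of where a constant like _LBIT can live with a pre-C++11 toolchain, a hedged sketch of the alternatives (illustrative names only):

    #include <stdint.h>

    // 1. A file-scope constant in the .cpp file -- portable to old compilers,
    //    and what the change described above does.
    static const intptr_t LBIT = 1;

    // 2. A static integral class constant with an in-class initializer --
    //    also fine pre-C++11, but not an option for SplitWord because a
    //    union cannot have static data members.
    struct LockBits {
      static const intptr_t BIT = 1;
    };

    // 3. A non-static member with an initializer is the C++11 feature
    //    (a non-static data member initializer) that an older -std mode
    //    rejects:
    //      struct SplitWord { const intptr_t _LBIT = 1; };  // needs C++11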
> >>> Cant define const intptr_t _LBIT =1; in a class in our version of C++. > Sorry, please explain? If you tried to move it into SplitWord, that doesn?t work; > unions are not permitted to have static data members (I don?t off-hand know why, > just that it?s explicitly forbidden). > > And you left the seemingly unused Address member in SplitWord. This is the compilation error I get: /scratch/cphillim/hg/10ptr2/open/src/hotspot/share/runtime/mutex.hpp:124:33: error: non-static data member initializers only available with -std=c++11 or -std=gnu++11 [-Werror] ?? const intptr_t _NEW_LOCKBIT = 1; I don't own this SplitWord code so do not want to remove the unused Address member. > >>>> src/hotspot/share/runtime/thread.cpp >>>> 4707 intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, (intptr_t)0); >>>> >>>> This and other places suggest LOCKBIT should be defined as intptr_t, >>>> rather than as an enum value. The MuxBits enum type is unused. >>>> >>>> And the cast of 0 is another case where implicit widening would be nice. >>> Making LOCKBIT a const intptr_t = 1 removes a lot of casts. > Because of the new definition of LOCKBIT I noticed the immediately > preceeding typedef for MutexT, which seems to be unused. Removed MutexT. > > ------------------------------------------------------------------------------ > src/hotspot/share/oops/cpCache.cpp > 114 bool ConstantPoolCacheEntry::init_flags_atomic(intx flags) { > 115 intptr_t result = Atomic::cmpxchg(flags, &_flags, (intx)0); > 116 return (result == 0); > 117 } > > [I missed this on earlier pass.] > > Should be > > bool ConstantPoolCacheEntry::init_flags_atomic(intx flags) { > return Atomic::cmpxchg(flags, &_flags, (intx)0) == 0; > } > > Otherwise, I end up asking why result is intptr_t when the cmpxchg is > dealing with intx. Yeah, one's a typedef of the other, but mixing > them like that in the same expression is not helpful. > > Sure why not? Actually init_flags_atomic is not used and neither is init_method_flags_atomic so I did one better and removed them. Thanks for the again thorough code review and Atomic::sub.?? I'll post incremental when it compiles. Coleen From coleen.phillimore at oracle.com Mon Oct 16 13:27:24 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 16 Oct 2017 09:27:24 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <33af17b9-6dce-5a5e-cb94-b3c1afbe8532@oracle.com> References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> <49b7c5f7-2f6d-16ac-0b60-140619d0fffd@oracle.com> <7265c30d-946b-19c4-a1b3-c3314a869ee8@oracle.com> <33af17b9-6dce-5a5e-cb94-b3c1afbe8532@oracle.com> Message-ID: <97afc964-b6e4-9937-94d0-06aa181919a2@oracle.com> On 10/15/17 9:18 PM, David Holmes wrote: > One tiny follow up as I was looking at this code: > > src/hotspot/share/services/mallocSiteTable.hpp > > 65?? MallocSiteHashtableEntry* _next; > > should be > > 65?? MallocSiteHashtableEntry* volatile _next; > > as we operate on it with CAS. Ok, got it. thanks. Coleen > > Thanks, > David > > On 14/10/2017 10:32 PM, David Holmes wrote: >> Hi Coleen, >> >> These changes all seem okay to me - except I can't comment on the >> Atomic::sub implementation. :) >> >> Thanks for adding the assert to header_addr(). FYI from >> objectMonitor.hpp: >> >> // ObjectMonitor Layout Overview/Highlights/Restrictions: >> // >> // - The _header field must be at offset 0 because the displaced header >> //?? from markOop is stored there. 
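A minimal standalone illustration of the distinction David is drawing (invented type name, not the HotSpot declaration): the position of volatile relative to the * decides whether the pointee or the pointer itself is volatile, and a field that is the target of a CAS needs the latter.

    struct Entry {
      volatile Entry* a;   // pointer to volatile Entry: the pointee is volatile
      Entry* volatile b;   // volatile pointer to Entry: the pointer field
                           // itself is volatile, which is what a CAS on &b
                           // actually wants
    };

The same reading is behind the earlier cpCache fix, where 'volatile Metadata* _f1' was really meant to be 'Metadata* volatile _f1'.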
We do not want markOop.hpp to include >> //?? ObjectMonitor.hpp to avoid exposing ObjectMonitor everywhere. This >> //?? means that ObjectMonitor cannot inherit from any other class nor >> can >> //?? it use any virtual member functions. This restriction is >> critical to >> //?? the proper functioning of the VM. >> >> so it is important we ensure this holds. >> >> Thanks, >> David >> >> On 14/10/2017 4:34 AM, coleen.phillimore at oracle.com wrote: >>> >>> Hi, Here is the version with the changes from Kim's comments that >>> has passed at least testing with JPRT and tier1, locally.?? More >>> testing (tier2-5) is in progress. >>> >>> Also includes a corrected version of Atomic::sub care of Erik >>> Osterlund. >>> >>> open webrev at >>> http://cr.openjdk.java.net/~coleenp/8188220.kim-review-changes/webrev >>> open webrev at >>> http://cr.openjdk.java.net/~coleenp/8188220.review-comments/webrev >>> >>> Full version: >>> >>> http://cr.openjdk.java.net/~coleenp/8188220.03/webrev >>> >>> Thanks! >>> Coleen >>> >>> On 10/13/17 9:25 AM, coleen.phillimore at oracle.com wrote: >>>> >>>> Hi Kim, Thank you for the detailed review and the time you've spent >>>> on it, and discussion yesterday. >>>> >>>> On 10/12/17 7:17 PM, Kim Barrett wrote: >>>>>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote: >>>>>> >>>>>> Summary: With the new template functions these are unnecessary. >>>>>> >>>>>> The changes are mostly s/_ptr// and removing the cast to return >>>>>> type.? There weren't many types that needed to be improved to >>>>>> match the template version of the function.?? Some notes: >>>>>> 1. replaced CASPTR with Atomic::cmpxchg() in mutex.cpp, >>>>>> rearranging arguments. >>>>>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null.? I >>>>>> disliked the first name because it's not explicit from the >>>>>> callers that there's an underlying cas.? If people want to fight, >>>>>> I'll remove the function and use cmpxchg because there are only a >>>>>> couple places where this is a little nicer. >>>>>> 3. Added Atomic::sub() >>>>>> >>>>>> Tested with JPRT, mach5 tier1-5 on linux,windows and solaris. >>>>>> >>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.01/webrev >>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8188220 >>>>>> >>>>>> Thanks, >>>>>> Coleen >>>>> I looked harder at the potential ABA problems, and believe they are >>>>> okay.? There can be multiple threads doing pushes, and there can be >>>>> multiple threads doing pops, but not both at the same time. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/cpu/zero/cppInterpreter_zero.cpp >>>>> ? 279???? if (Atomic::cmpxchg(monitor, lockee->mark_addr(), disp) >>>>> != disp) { >>>>> >>>>> How does this work?? monitor and disp seem like they have unrelated >>>>> types?? Given that this is zero-specific code, maybe this hasn't been >>>>> tested? >>>>> >>>>> Similarly here: >>>>> ? 423?????? if (Atomic::cmpxchg(header, rcvr->mark_addr(), lock) >>>>> != lock) { >>>> >>>> I haven't built zero.? I don't know how to do this anymore (help?) >>>> I fixed the obvious type mismatches here and in >>>> bytecodeInterpreter.cpp.? I'll try to build it. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/asm/assembler.cpp >>>>> ? 239???????? dcon->value_fn = cfn; >>>>> >>>>> Is it actually safe to remove the atomic update?? 
If multiple threads >>>>> performing the assignment *are* possible (and I don't understand the >>>>> context yet, so don't know the answer to that), then a bare >>>>> non-atomic >>>>> assignment is a race, e.g. undefined behavior. >>>>> >>>>> Regardless of that, I think the CAST_FROM_FN_PTR should be retained. >>>> >>>> I can find no uses of this code, ie. looking for "delayed_value". I >>>> think it was early jsr292 code.? I could also not find any >>>> combination of casts that would make it compile, so in the end I >>>> believed the comment and took out the cmpxchg.?? The code appears >>>> to be intended to for bootstrapping, see the call to >>>> update_delayed_values() in JavaClasses::compute_offsets(). >>>> >>>> The CAST_FROM_FN_PTR was to get it to compile with cmpxchg, the new >>>> code does not require a cast.? If you can help with finding the >>>> right set of casts, I'd be happy to put the cmpxchg back in. I just >>>> couldn't find one. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/classfile/classLoaderData.cpp >>>>> ? 167?? Chunk* head = (Chunk*) OrderAccess::load_acquire(&_head); >>>>> >>>>> I think the cast to Chunk* is no longer needed. >>>> >>>> Missed another, thanks.? No that's the same one David found. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/classfile/classLoaderData.cpp >>>>> ? 946???? ClassLoaderData* old = Atomic::cmpxchg(cld, cld_addr, >>>>> (ClassLoaderData*)NULL); >>>>> ? 947???? if (old != NULL) { >>>>> ? 948?????? delete cld; >>>>> ? 949?????? // Returns the data. >>>>> ? 950?????? return old; >>>>> ? 951???? } >>>>> >>>>> That could instead be >>>>> >>>>> ?? if (!Atomic::replace_if_null(cld, cld_addr)) { >>>>> ???? delete cld;?????????? // Lost the race. >>>>> ???? return *cld_addr;???? // Use the winner's value. >>>>> ?? } >>>>> >>>>> And apparently the caller of CLDG::add doesn't care whether the >>>>> returned CLD has actually been added to the graph yet.? If that's not >>>>> true, then there's a bug here, since a race loser might return a >>>>> winner's value before the winner has actually done the insertion. >>>> >>>> True, the race loser doesn't care whether the CLD has been added to >>>> the graph. >>>> Your instead code requires a comment that replace_if_null is really >>>> a compare exchange and has an extra read of the original value, so >>>> I am leaving what I have which is clearer to me. >>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/classfile/verifier.cpp >>>>> ?? 71 static void* verify_byte_codes_fn() { >>>>> ?? 72?? if (OrderAccess::load_acquire(&_verify_byte_codes_fn) == >>>>> NULL) { >>>>> ?? 73???? void *lib_handle = os::native_java_library(); >>>>> ?? 74???? void *func = os::dll_lookup(lib_handle, >>>>> "VerifyClassCodesForMajorVersion"); >>>>> ?? 75 OrderAccess::release_store(&_verify_byte_codes_fn, func); >>>>> ?? 76???? if (func == NULL) { >>>>> ?? 77?????? _is_new_verify_byte_codes_fn = false; >>>>> ?? 78?????? func = os::dll_lookup(lib_handle, "VerifyClassCodes"); >>>>> ?? 79 OrderAccess::release_store(&_verify_byte_codes_fn, func); >>>>> ?? 80???? } >>>>> ?? 81?? } >>>>> ?? 82?? return (void*)_verify_byte_codes_fn; >>>>> ?? 83 } >>>>> >>>>> [pre-existing] >>>>> >>>>> I think this code has race problems; a caller could unexpectedly and >>>>> inappropriately return NULL.? 
Consider the case where there is no >>>>> VerifyClassCodesForMajorVersion, but there is VerifyClassCodes. >>>>> >>>>> The variable is initially NULL. >>>>> >>>>> Both Thread1 and Thread2 reach line 73, having both seen a NULL value >>>>> for the variable. >>>>> >>>>> Thread1 reaches line 80, setting the variable to VerifyClassCodes. >>>>> >>>>> Thread2 reaches line 76, resetting the variable to NULL. >>>>> >>>>> Thread1 reads the now (momentarily) NULL value and returns it. >>>>> >>>>> I think the first release_store should be conditional on func != >>>>> NULL. >>>>> Also, the usage of _is_new_verify_byte_codes_fn seems suspect. >>>>> And a minor additional nit: the cast in the return is unnecessary. >>>> >>>> Yes, this looks like a bug.?? I'll cut/paste this and file it. It >>>> may be that this is support for the old verifier in old jdk >>>> versions that can be cleaned up. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/code/nmethod.cpp >>>>> 1664?? nmethod* observed_mark_link = _oops_do_mark_link; >>>>> 1665?? if (observed_mark_link == NULL) { >>>>> 1666???? // Claim this nmethod for this thread to mark. >>>>> 1667???? if (Atomic::cmpxchg_if_null(NMETHOD_SENTINEL, >>>>> &_oops_do_mark_link)) { >>>>> >>>>> With these changes, the only use of observed_mark_link is in the if. >>>>> I'm not sure that variable is really useful anymore, e.g. just use >>>>> >>>>> ?? if (_oops_do_mark_link == NULL) { >>>> >>>> Ok fixed. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >>>>> >>>>> In CMSCollector::par_take_from_overflow_list, if BUSY and prefix were >>>>> of type oopDesc*, I think there would be a whole lot fewer casts and >>>>> cast_to_oop's.? Later on, I think suffix_head, >>>>> observed_overflow_list, >>>>> and curr_overflow_list could also be oopDesc* instead of oop to >>>>> eliminate more casts. >>>> >>>> I actually tried to make this change but ran into more fan out that >>>> way, so went back and just fixed the cmpxchg calls to cast oops to >>>> oopDesc* and things were less perturbed that way. >>>>> >>>>> And some similar changes in CMSCollector::par_push_on_overflow_list. >>>>> >>>>> And similarly in parNewGeneration.cpp, in push_on_overflow_list and >>>>> take_from_overflow_list_work. >>>>> >>>>> As noted in the comments for JDK-8165857, the lists and "objects" >>>>> involved here aren't really oops, but rather the shattered remains of >>>> >>>> Yes, somewhat horrified at the value of BUSY. >>>>> oops.? The suggestion there was to use HeapWord* and carry through >>>>> the >>>>> fanout; what was actually done was to change _overflow_list to >>>>> oopDesc* to minimize fanout, even though that's kind of lying to the >>>>> type system.? Now, with the cleanup of cmpxchg_ptr and such, we're >>>>> paying the price of doing the minimal thing back then. >>>> >>>> I will file an RFE about cleaning this up.? I think what I've done >>>> was the minimal thing. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >>>>> 7960?? Atomic::add(-n, &_num_par_pushes); >>>>> >>>>> Atomic::sub >>>> >>>> fixed. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/gc/cms/parNewGeneration.cpp >>>>> 1455?? Atomic::add(-n, &_num_par_pushes); >>>> fixed. 
>>>>> Atomic::sub >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/gc/g1/dirtyCardQueue.cpp >>>>> ? 283???? void* actual = Atomic::cmpxchg(next, >>>>> &_cur_par_buffer_node, nd); >>>>> ... >>>>> ? 289?????? nd = static_cast(actual); >>>>> >>>>> Change actual's type to BufferNode* and remove the cast on line 289. >>>> >>>> fixed.? missed that one. gross. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/gc/g1/g1CollectedHeap.cpp >>>>> >>>>> [pre-existing] >>>>> 3499???????? old = (CompiledMethod*)_postponed_list; >>>>> >>>>> I think that cast is only needed because >>>>> G1CodeCacheUnloadingTask::_postponed_list is incorrectly typed as >>>>> "volatile CompiledMethod*", when I think it ought to be >>>>> "CompiledMethod* volatile". >>>>> >>>>> I think G1CodeCacheUnloading::_claimed_nmethod is similarly >>>>> mis-typed, >>>>> with a similar should not be needed cast: >>>>> 3530?????? first = (CompiledMethod*)_claimed_nmethod; >>>>> >>>>> and another for _postponed_list here: >>>>> 3552?????? claim = (CompiledMethod*)_postponed_list; >>>> >>>> I've fixed this.?? C++ is so confusing about where to put the >>>> volatile.?? Everyone has been tripped up by it. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/gc/g1/g1HotCardCache.cpp >>>>> ?? 77?? jbyte* previous_ptr = (jbyte*)Atomic::cmpxchg(card_ptr, >>>>> >>>>> I think the cast of the cmpxchg result is no longer needed. >>>> >>>> fixed. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp >>>>> ? 254?????? char* touch_addr = >>>>> (char*)Atomic::add(actual_chunk_size, &_cur_addr) - >>>>> actual_chunk_size; >>>>> >>>>> I think the cast of the add result is no longer needed. >>>> got it already. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/gc/g1/g1StringDedup.cpp >>>>> ? 213?? return (size_t)Atomic::add(partition_size, &_next_bucket) >>>>> - partition_size; >>>>> >>>>> I think the cast of the add result is no longer needed. >>>> >>>> I was slacking in the g1 files.? fixed. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/gc/g1/heapRegionRemSet.cpp >>>>> ? 200?????? PerRegionTable* res = >>>>> ? 201???????? Atomic::cmpxchg(nxt, &_free_list, fl); >>>>> >>>>> Please remove the line break, now that the code has been simplified. >>>>> >>>>> But wait, doesn't this alloc exhibit classic ABA problems?? I *think* >>>>> this works because alloc and bulk_free are called in different >>>>> phases, >>>>> never overlapping. >>>> >>>> I don't know.? Do you want to file a bug to investigate this? >>>> fixed. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/gc/g1/sparsePRT.cpp >>>>> ? 295???? SparsePRT* res = >>>>> ? 296?????? Atomic::cmpxchg(sprt, &_head_expanded_list, hd); >>>>> and >>>>> ? 307???? SparsePRT* res = >>>>> ? 308?????? Atomic::cmpxchg(next, &_head_expanded_list, hd); >>>>> >>>>> I'd rather not have the line breaks in these either. >>>>> >>>>> And get_from_expanded_list also appears to have classic ABA problems. 
>>>>> I *think* this works because add_to_expanded_list and >>>>> get_from_expanded_list are called in different phases, never >>>>> overlapping. >>>> >>>> Fixed, same question as above?? Or one bug to investigate both? >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/gc/shared/taskqueue.inline.hpp >>>>> ? 262?? return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, >>>>> ? 263?????????????????????????????????? (volatile intptr_t *)&_data, >>>>> ? 264 (intptr_t)old_age._data); >>>>> >>>>> This should be >>>>> >>>>> ?? return Atomic::cmpxchg(new_age._data, &_data, old_age._data); >>>> >>>> fixed. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/interpreter/bytecodeInterpreter.cpp >>>>> This doesn't have any casts, which I think is correct. >>>>> ? 708???????????? if (Atomic::cmpxchg(header, rcvr->mark_addr(), >>>>> mark) == mark) { >>>>> >>>>> but these do. >>>>> ? 718???????????? if (Atomic::cmpxchg((void*)new_header, >>>>> rcvr->mark_addr(), mark) == mark) { >>>>> ? 737???????????? if (Atomic::cmpxchg((void*)new_header, >>>>> rcvr->mark_addr(), header) == header) { >>>>> >>>>> I'm not sure how the ones with casts even compile? mark_addr() seems >>>>> to be a markOop*, which is a markOopDesc**, where markOopDesc is a >>>>> class.? void* is not implicitly convertible to markOopDesc*. >>>>> >>>>> Hm, this entire file is #ifdef CC_INTERP.? Is this zero-only >>>>> code?? Or >>>>> something like that? >>>>> >>>>> Similarly here: >>>>> ? 906?????????? if (Atomic::cmpxchg(header, lockee->mark_addr(), >>>>> mark) == mark) { >>>>> and >>>>> ? 917?????????? if (Atomic::cmpxchg((void*)new_header, >>>>> lockee->mark_addr(), mark) == mark) { >>>>> ? 935?????????? if (Atomic::cmpxchg((void*)new_header, >>>>> lockee->mark_addr(), header) == header) { >>>>> >>>>> and here: >>>>> 1847?????????????? if (Atomic::cmpxchg(header, >>>>> lockee->mark_addr(), mark) == mark) { >>>>> 1858?????????????? if (Atomic::cmpxchg((void*)new_header, >>>>> lockee->mark_addr(), mark) == mark) { >>>>> 1878?????????????? if (Atomic::cmpxchg((void*)new_header, >>>>> lockee->mark_addr(), header) == header) { >>>>> >>>>> and here: >>>>> 1847?????????????? if (Atomic::cmpxchg(header, >>>>> lockee->mark_addr(), mark) == mark) { >>>>> 1858?????????????? if (Atomic::cmpxchg((void*)new_header, >>>>> lockee->mark_addr(), mark) == mark) { >>>>> 1878?????????????? if (Atomic::cmpxchg((void*)new_header, >>>>> lockee->mark_addr(), header) == header) { >>>> >>>> I've changed all these.?? This is part of Zero. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/memory/metaspace.cpp >>>>> 1502?? size_t value = OrderAccess::load_acquire(&_capacity_until_GC); >>>>> ... >>>>> 1537?? return (size_t)Atomic::sub((intptr_t)v, &_capacity_until_GC); >>>>> >>>>> These and other uses of _capacity_until_GC suggest that variable's >>>>> type should be size_t rather than intptr_t.? Note that I haven't done >>>>> a careful check of uses to see if there are any places where such a >>>>> change would cause problems. >>>> >>>> Yes, I had a hard time with metaspace.cpp because I agree >>>> _capacity_until_GC should be size_t.?? Tried to make this change >>>> and it cascaded a bit.? I'll file an RFE to change this type >>>> separately. 
>>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/oops/constantPool.cpp >>>>> ? 229?? OrderAccess::release_store((Klass* volatile *)adr, k); >>>>> ? 246?? OrderAccess::release_store((Klass* volatile *)adr, k); >>>>> ? 514?? OrderAccess::release_store((Klass* volatile *)adr, k); >>>>> >>>>> Casts are not needed. >>>> >>>> fixed. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/oops/constantPool.hpp >>>>> ? 148???? volatile intptr_t adr = >>>>> OrderAccess::load_acquire(obj_at_addr_raw(which)); >>>>> >>>>> [pre-existing] >>>>> Why is adr declared volatile? >>>> >>>> golly beats me.? concurrency is scary, especially in the constant >>>> pool. >>>> The load_acquire() should make sure the value is fetched from >>>> memory so volatile is unneeded. >>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/oops/cpCache.cpp >>>>> ? 157???? intx newflags = (value & parameter_size_mask); >>>>> ? 158???? Atomic::cmpxchg(newflags, &_flags, (intx)0); >>>>> >>>>> This is a nice demonstration of why I wanted to include some value >>>>> preserving integral conversions in cmpxchg, rather than requiring >>>>> exact type matching in the integral case.? There have been some >>>>> others >>>>> that I haven't commented on.? Apparently we (I) got away with >>>>> including such conversions in Atomic::add, which I'd forgotten about. >>>>> And see comment regarding Atomic::sub below. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/oops/cpCache.hpp >>>>> ? 139?? volatile Metadata*?? _f1;?????? // entry specific metadata >>>>> field >>>>> >>>>> [pre-existing] >>>>> I suspect the type should be Metadata* volatile.? And that would >>>>> eliminate the need for the cast here: >>>>> >>>>> ? 339?? Metadata* f1_ord() const?????????????????????? { return >>>>> (Metadata *)OrderAccess::load_acquire(&_f1); } >>>>> >>>>> I don't know if there are any other changes needed or desirable >>>>> around >>>>> _f1 usage. >>>> >>>> yes, fixed this. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/oops/method.hpp >>>>> ? 139?? volatile address from_compiled_entry() const?? { return >>>>> OrderAccess::load_acquire(&_from_compiled_entry); } >>>>> ? 140?? volatile address from_compiled_entry_no_trampoline() const; >>>>> ? 141?? volatile address from_interpreted_entry() const{ return >>>>> OrderAccess::load_acquire(&_from_interpreted_entry); } >>>>> >>>>> [pre-existing] >>>>> The volatile qualifiers here seem suspect to me. >>>> >>>> Again much suspicion about concurrency and giant pain, which I >>>> remember, of debugging these when they were broken. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/oops/oop.inline.hpp >>>>> ? 391???? narrowOop old = (narrowOop)Atomic::xchg(val, >>>>> (narrowOop*)dest); >>>>> >>>>> Cast of return type is not needed. >>>> >>>> fixed. >>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/prims/jni.cpp >>>>> >>>>> [pre-existing] >>>>> >>>>> copy_jni_function_table should be using Copy::disjoint_words_atomic. >>>> >>>> yuck. 
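The volatile-placement point made above for G1CodeCacheUnloadingTask::_postponed_list and for the _f1 field in cpCache.hpp comes down to which side of the * the qualifier lands on. A minimal sketch with a stub type (not the real Metadata or CompiledMethod declarations):

// Stub type for illustration only.
class Metadata;

// "pointer to volatile Metadata": the pointee is volatile, so reading the
// field into a plain Metadata* needs the cast the review flags.
static volatile Metadata* _pointee_volatile;

// "volatile pointer to Metadata": the pointer itself is the shared datum,
// which is what a concurrently updated field wants.
static Metadata* volatile _pointer_volatile;

void read_fields() {
  Metadata* a = (Metadata*)_pointee_volatile;   // cast required
  Metadata* b = _pointer_volatile;              // no cast needed
  (void)a; (void)b;
}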
>>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/prims/jni.cpp >>>>> >>>>> [pre-existing] >>>>> >>>>> 3892?? // We're about to use Atomic::xchg for synchronization. >>>>> Some Zero >>>>> 3893?? // platforms use the GCC builtin __sync_lock_test_and_set >>>>> for this, >>>>> 3894?? // but __sync_lock_test_and_set is not guaranteed to do >>>>> what we want >>>>> 3895?? // on all architectures.? So we check it works before >>>>> relying on it. >>>>> 3896 #if defined(ZERO) && defined(ASSERT) >>>>> 3897?? { >>>>> 3898???? jint a = 0xcafebabe; >>>>> 3899???? jint b = Atomic::xchg(0xdeadbeef, &a); >>>>> 3900???? void *c = &a; >>>>> 3901???? void *d = Atomic::xchg(&b, &c); >>>>> 3902???? assert(a == (jint) 0xdeadbeef && b == (jint) 0xcafebabe, >>>>> "Atomic::xchg() works"); >>>>> 3903???? assert(c == &b && d == &a, "Atomic::xchg() works"); >>>>> 3904?? } >>>>> 3905 #endif // ZERO && ASSERT >>>>> >>>>> It seems rather strange to be testing Atomic::xchg() here, rather >>>>> than >>>>> as part of unit testing Atomic?? Fail unit testing => don't try to >>>>> use... >>>> >>>> This is zero.? I'm not touching this. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/prims/jvmtiRawMonitor.cpp >>>>> ? 130???? if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { >>>>> ? 142???? if (_owner == NULL && >>>>> Atomic::cmpxchg_if_null((void*)Self, &_owner)) { >>>>> >>>>> I think these casts aren't needed. _owner is void*, and Self is >>>>> Thread*, which is implicitly convertible to void*. >>>>> >>>>> Similarly here, for the THREAD argument: >>>>> ? 280???? Contended = Atomic::cmpxchg((void*)THREAD, &_owner, >>>>> (void*)NULL); >>>>> ? 283???? Contended = Atomic::cmpxchg((void*)THREAD, &_owner, >>>>> (void*)NULL); >>>> >>>> Okay, let me see if the compiler(s) eat that. (yes they do) >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/prims/jvmtiRawMonitor.hpp >>>>> >>>>> This file is in the webrev, but seems to be unchanged. >>>> >>>> It'll be cleaned up with the the commit and not be part of the >>>> changeset. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/runtime/atomic.hpp >>>>> ? 520 template >>>>> ? 521 inline D Atomic::sub(I sub_value, D volatile* dest) { >>>>> ? 522?? STATIC_ASSERT(IsPointer::value || IsIntegral::value); >>>>> ? 523?? // Assumes two's complement integer representation. >>>>> ? 524?? #pragma warning(suppress: 4146) >>>>> ? 525?? return Atomic::add(-sub_value, dest); >>>>> ? 526 } >>>>> >>>>> I'm pretty sure this implementation is incorrect.? I think it >>>>> produces >>>>> the wrong result when I and D are both unsigned integer types and >>>>> sizeof(I) < sizeof(D). >>>> >>>> Can you suggest a correction?? I just copied Atomic::dec(). >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/runtime/mutex.cpp >>>>> ? 304?? intptr_t v = Atomic::cmpxchg((intptr_t)_LBIT, >>>>> &_LockWord.FullWord, (intptr_t)0);? // agro ... >>>>> >>>>> _LBIT should probably be intptr_t, rather than an enum. Note that the >>>>> enum type is unused.? The old value here is another place where an >>>>> implicit widening of same signedness would have been nice. 
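The Atomic::sub concern above (wrong result when I and D are both unsigned and sizeof(I) < sizeof(D)) can be seen with plain integer arithmetic: the negation happens in the narrow type and is then zero-extended rather than treated as a subtraction. A standalone illustration, not the Atomic code itself:

#include <stdint.h>
#include <stdio.h>

int main() {
  uint64_t dest = 100;       // D: wider unsigned destination
  uint32_t sub_value = 1;    // I: narrower unsigned operand
  // What add(-sub_value, &dest) boils down to for these types:
  dest += -sub_value;        // -sub_value == 0xFFFFFFFF (uint32_t),
                             // widened to 0x00000000FFFFFFFF (uint64_t)
  printf("%llu\n", (unsigned long long)dest);  // prints 4294967395, not 99
  return 0;
}

// With a same-width dest the modular wraparound would give 99, which is
// why the same pattern looks fine in the same-type and dec() cases.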
(Such >>>>> implicit widening doesn't work for enums, since it's unspecified >>>>> whether they default to signed or unsigned representation, and >>>>> implementatinos differ.) >>>> >>>> This would be a good/simple cleanup.? I changed it to const >>>> intptr_t _LBIT = 1; >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/runtime/mutex.hpp >>>>> >>>>> [pre-existing] >>>>> >>>>> I think the Address member of the SplitWord union is unused. Looking >>>>> at AcquireOrPush (and others), I'm wondering whether it *should* be >>>>> used there, or whether just using intptr_t casts and doing integral >>>>> arithmetic (as is presently being done) is easier and clearer. >>>>> >>>>> Also the _LSBINDEX macro probably ought to be defined in mutex.cpp >>>>> rather than polluting the global namespace.? And technically, that >>>>> name is reserved word. >>>> >>>> I moved both this and _LBIT into the top of mutex.cpp since they >>>> are used there. >>>> Cant define const intptr_t _LBIT =1; in a class in our version of C++. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/runtime/objectMonitor.cpp >>>>> ? 252?? void * cur = Atomic::cmpxchg((void*)Self, &_owner, >>>>> (void*)NULL); >>>>> ? 409?? if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) { >>>>> 1983?????? ox = (Thread*)Atomic::cmpxchg((void*)Self, &_owner, >>>>> (void*)NULL); >>>>> >>>>> I think the casts of Self aren't needed. >>>> >>>> fixed. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/runtime/objectMonitor.cpp >>>>> ? 995?????? if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { >>>>> 1020???????? if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) { >>>>> >>>>> I think the casts of THREAD aren't needed. >>>> >>>> nope, fixed. >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/runtime/objectMonitor.hpp >>>>> ? 254?? markOopDesc* volatile* header_addr(); >>>>> >>>>> Why isn't this volatile markOop* ? >>>> >>>> fixed. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/runtime/synchronizer.cpp >>>>> ? 242???????? Atomic::cmpxchg_if_null((void*)Self, &(m->_owner))) { >>>>> >>>>> I think the cast of Self isn't needed. >>>> >>>> fixed. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/runtime/synchronizer.cpp >>>>> ? 992?? for (; block != NULL; block = (PaddedEnd >>>>> *)next(block)) { >>>>> 1734???? for (; block != NULL; block = (PaddedEnd >>>>> *)next(block)) { >>>>> >>>>> [pre-existing] >>>>> All calls to next() pass a PaddedEnd* and cast the >>>>> result.? How about moving all that behavior into next(). >>>> >>>> I fixed this next() function, but it necessitated a cast to >>>> FreeNext field.? The PaddedEnd<> type was intentionally not >>>> propagated to all the things that use it.?? Which is a shame >>>> because there are a lot more casts to PaddedEnd that >>>> could have been removed. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/runtime/synchronizer.cpp >>>>> 1970???? if (monitor > (ObjectMonitor *)&block[0] && >>>>> 1971???????? monitor < (ObjectMonitor *)&block[_BLOCKSIZE]) { >>>>> >>>>> [pre-existing] >>>>> Are the casts needed here?? 
I think PaddedEnd is >>>>> derived from ObjectMonitor, so implicit conversions should apply. >>>> >>>> prob not.? removed them. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/runtime/synchronizer.hpp >>>>> ?? 28 #include "memory/padded.hpp" >>>>> ? 163?? static PaddedEnd * volatile gBlockList; >>>>> >>>>> I was going to suggest as an alternative just making gBlockList a >>>>> file >>>>> scoped variable in synchronizer.cpp, since it isn't used outside of >>>>> that file. Except that it is referenced by vmStructs. Curses! >>>> >>>> It's also used by the SA. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/runtime/thread.cpp >>>>> 4707?? intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, >>>>> (intptr_t)0); >>>>> >>>>> This and other places suggest LOCKBIT should be defined as intptr_t, >>>>> rather than as an enum value.? The MuxBits enum type is unused. >>>>> >>>>> And the cast of 0 is another case where implicit widening would be >>>>> nice. >>>> >>>> Making LOCKBIT a const intptr_t = 1 removes a lot of casts. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> src/hotspot/share/services/mallocSiteTable.cpp >>>>> ? 261 bool MallocSiteHashtableEntry::atomic_insert(const >>>>> MallocSiteHashtableEntry* entry) { >>>>> ? 262?? return Atomic::cmpxchg_if_null(entry, (const >>>>> MallocSiteHashtableEntry**)&_next); >>>>> ? 263 } >>>>> >>>>> I think the problem here that is leading to the cast is that >>>>> atomic_insert is taking a const T*.? Note that it's only caller >>>>> passes >>>>> a non-const T*. >>>> >>>> I'll change the type to non-const.? We try to use consts... >>>> >>>> Thanks for the detailed review!? The gcc compiler seems happy so >>>> far, I'll post a webrev of the result of these changes after fixing >>>> Atomic::sub() and seeing how the other compilers deal with these >>>> changes. >>>> >>>> Thanks, >>>> Coleen >>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> >>>> >>> From rkennke at redhat.com Mon Oct 16 13:45:00 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 16 Oct 2017 15:45:00 +0200 Subject: RFR: 8189333: Fix Zero build after Atomic::xchg changes In-Reply-To: <441ed55f-6398-9fa1-d571-86548ed5a2a9@oracle.com> References: <003ff7d9-759f-1ef5-f580-18c2571b63e5@redhat.com> <441ed55f-6398-9fa1-d571-86548ed5a2a9@oracle.com> Message-ID: <9cd66129-3636-8de3-4624-a69bd8f28b99@redhat.com> Hi Coleen, Nope. It fails with this (and a bunch of similar) errors: https://paste.fedoraproject.org/paste/cWKozoxY23z72~EMm0BPBA It does build with this additional patch: http://cr.openjdk.java.net/~rkennke/fix-zero-coleen/webrev/ I.e.: - cast BasicLock to markOop by using markOopDesc::encode() - use oopDesc::cas_set_mark() instead of the raw Atomic ops (probably not strictly required for this change, but still much nicer) You should not require any build scripts for Zero though. Simply run configure with --with-jvm-variants=zero and build in the corresponding linux-x86_64-normal-zero-slowdebug or similar directory using the usual make calls. > > Hi Roman, Can you build zero with this changeset? > > http://cr.openjdk.java.net/~coleenp/8188220.03/webrev/index.html > > My scripts for building zero are broken now. 
>
> thanks,
> Coleen
>
> On 10/15/17 5:40 PM, Roman Kennke wrote:
>> On 15.10.2017 at 23:32, David Holmes wrote:
>>> Hi Roman,
>>>
>>> On 16/10/2017 7:12 AM, Roman Kennke wrote:
>>>> Zero debug build has been broken by: JDK-8187977: Generalize
>>>> Atomic::xchg to use templates.
>>>>
>>>> This patch fixes it by casting the unsigned literal to jint:
>>>>
>>>> http://cr.openjdk.java.net/~rkennke/8189333/webrev.00/
>>>>
>>>
>>> Looks fine.
>>>
>>> I can push this for you straight away (relatively speaking :) )
>>> under the trivial rule.
>> Thanks!
>>
>> Roman
>

From stefan.karlsson at oracle.com  Mon Oct 16 14:14:27 2017
From: stefan.karlsson at oracle.com (Stefan Karlsson)
Date: Mon, 16 Oct 2017 16:14:27 +0200
Subject: RFR: 8189359: Move native weak oops cleaning out of ReferenceProcessor
Message-ID: <8c0aafa1-ca06-105c-72f9-7bd11d382452@oracle.com>

Hi all,

Please review this patch to move the JNI global weak handle processing
out of the ReferenceProcessor into a new class, WeakProcessor, that
will be used to gather processing and cleaning of "native weak" oops.
After this patch the ReferenceProcessor will only deal with the Java
level java.lang.ref weak references.

http://cr.openjdk.java.net/~stefank/8189359/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8189359

Note that this patch only moves the JNIHandles::weak_oops_do calls into
the new WeakProcessor. A subsequent patch for JDK-8189359 will move the
JvmtiExport::weak_oops_do from JNIHandleBlock into the WeakProcessor.

Future patches, like JDK-8171119 for example, will be able to add their
sets of native weak oops into the new WeakProcessor functions and won't
have to duplicate the code for all GCs or add calls inside the
ReferenceProcessor.

Tested with JPRT.

Thanks,
StefanK

From nils.eliasson at oracle.com  Mon Oct 16 14:26:40 2017
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Mon, 16 Oct 2017 16:26:40 +0200
Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults
In-Reply-To: 
References: <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com>
	<47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com>
	<9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com>
	<45619e1a-9eb0-a540-193b-5187da3bf6bc@oracle.com>
	<66e4af43-c0e2-6d64-b69f-35166150ffa2@oracle.com>
	<11af0f62-ba6b-d533-d23c-750d2ca012c7@oracle.com>
Message-ID: <886a112d-fc55-34d5-6e70-1e6a78cf1b0f@oracle.com>

Hi,

I ran into a problem touching this area, so I'm hijacking this thread.

> #ifdef COMPILER2
> -   if (MaxVectorSize > 16) {
> -     // Limit vectors size to 16 bytes on current AMD cpus.
> +   if (cpu_family() < 0x17 && MaxVectorSize > 16) {
> +     // Limit vectors size to 16 bytes on AMD cpus < 17h.
>       FLAG_SET_DEFAULT(MaxVectorSize, 16);
>     }
> #endif // COMPILER2

The limitation of MaxVectorSize to 16 for some processors in this code
has the side effect that TypeVect::VECTY and mreg2type[Op_VecY] won't
be initialized even though the platform has the capability.

Type.cpp:~660
[...]
> if (Matcher::vector_size_supported(T_FLOAT,4)) {
>   TypeVect::VECTX = TypeVect::make(T_FLOAT,4);
> }
> if (Matcher::vector_size_supported(T_FLOAT,8)) {
>   TypeVect::VECTY = TypeVect::make(T_FLOAT,8);
> }
> if (Matcher::vector_size_supported(T_FLOAT,16)) {
>   TypeVect::VECTZ = TypeVect::make(T_FLOAT,16);
> }
[...]
> mreg2type[Op_VecX] = TypeVect::VECTX;
> mreg2type[Op_VecY] = TypeVect::VECTY;
> mreg2type[Op_VecZ] = TypeVect::VECTZ;

In the ad files, feature flags (UseAVX etc.) are used to control which
rules should be matched when they affect specific vector registers.
Here we have a mismatch.
On a platform that supports AVX2 but have MaxVectorSize limited to 16, the VM will fail in regalloc when the TypeVect::VECTY/mreg2type[Op_VecY] is uninitalized, we will also hit asserts in a few places like: assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), "sanity"); Shouldn't the type initalization in type.cpp be dependent on feature flag (UseAVX etc.) instead of MaxVectorLength? (The type for the vector registers are initalized if the platform supports them, but they might not be used if MaxVectorSize is limited.) I suggest something like this: http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ I will open a bug and and a separate RFR if this seems reasonable to you. Regards, Nils Eliasson On 2017-09-22 09:41, Rohit Arul Raj wrote: > Thanks Vladimir, > > On Wed, Sep 20, 2017 at 10:07 PM, Vladimir Kozlov > wrote: >>> __ cmpl(rax, 0x80000000); // Is cpuid(0x80000001) supported? >>> __ jcc(Assembler::belowEqual, done); >>> __ cmpl(rax, 0x80000004); // Is cpuid(0x80000005) supported? >>> - __ jccb(Assembler::belowEqual, ext_cpuid1); >>> + __ jcc(Assembler::belowEqual, ext_cpuid1); >> >> Good. You may need to increase size of the buffer too (to be safe) to 1100: >> >> static const int stub_size = 1000; >> > Please find the updated patch after the requested change. > > diff --git a/src/cpu/x86/vm/vm_version_x86.cpp > b/src/cpu/x86/vm/vm_version_x86.cpp > --- a/src/cpu/x86/vm/vm_version_x86.cpp > +++ b/src/cpu/x86/vm/vm_version_x86.cpp > @@ -46,7 +46,7 @@ > address VM_Version::_cpuinfo_cont_addr = 0; > > static BufferBlob* stub_blob; > -static const int stub_size = 1000; > +static const int stub_size = 1100; > > extern "C" { > typedef void (*get_cpu_info_stub_t)(void*); > @@ -70,7 +70,7 @@ > bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); > > Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; > - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, > done, wrapup; > + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, > ext_cpuid8, done, wrapup; > Label legacy_setup, save_restore_except, legacy_save_restore, > start_simd_check; > > StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); > @@ -267,14 +267,30 @@ > __ cmpl(rax, 0x80000000); // Is cpuid(0x80000001) supported? > __ jcc(Assembler::belowEqual, done); > __ cmpl(rax, 0x80000004); // Is cpuid(0x80000005) supported? > - __ jccb(Assembler::belowEqual, ext_cpuid1); > + __ jcc(Assembler::belowEqual, ext_cpuid1); > __ cmpl(rax, 0x80000006); // Is cpuid(0x80000007) supported? > __ jccb(Assembler::belowEqual, ext_cpuid5); > __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported? > __ jccb(Assembler::belowEqual, ext_cpuid7); > + __ cmpl(rax, 0x80000008); // Is cpuid(0x80000009 and above) supported? > + __ jccb(Assembler::belowEqual, ext_cpuid8); > + __ cmpl(rax, 0x8000001E); // Is cpuid(0x8000001E) supported? 
> + __ jccb(Assembler::below, ext_cpuid8); > + // > + // Extended cpuid(0x8000001E) > + // > + __ movl(rax, 0x8000001E); > + __ cpuid(); > + __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid1E_offset()))); > + __ movl(Address(rsi, 0), rax); > + __ movl(Address(rsi, 4), rbx); > + __ movl(Address(rsi, 8), rcx); > + __ movl(Address(rsi,12), rdx); > + > // > // Extended cpuid(0x80000008) > // > + __ bind(ext_cpuid8); > __ movl(rax, 0x80000008); > __ cpuid(); > __ lea(rsi, Address(rbp, in_bytes(VM_Version::ext_cpuid8_offset()))); > @@ -1109,11 +1125,27 @@ > } > > #ifdef COMPILER2 > - if (MaxVectorSize > 16) { > - // Limit vectors size to 16 bytes on current AMD cpus. > + if (cpu_family() < 0x17 && MaxVectorSize > 16) { > + // Limit vectors size to 16 bytes on AMD cpus < 17h. > FLAG_SET_DEFAULT(MaxVectorSize, 16); > } > #endif // COMPILER2 > + > + // Some defaults for AMD family 17h > + if ( cpu_family() == 0x17 ) { > + // On family 17h processors use XMM and UnalignedLoadStores for > Array Copy > + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { > + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); > + } > + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { > + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); > + } > +#ifdef COMPILER2 > + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { > + FLAG_SET_DEFAULT(UseFPUForSpilling, true); > + } > +#endif > + } > } > > if( is_intel() ) { // Intel cpus specific settings > diff --git a/src/cpu/x86/vm/vm_version_x86.hpp > b/src/cpu/x86/vm/vm_version_x86.hpp > --- a/src/cpu/x86/vm/vm_version_x86.hpp > +++ b/src/cpu/x86/vm/vm_version_x86.hpp > @@ -228,6 +228,15 @@ > } bits; > }; > > + union ExtCpuid1EEbx { > + uint32_t value; > + struct { > + uint32_t : 8, > + threads_per_core : 8, > + : 16; > + } bits; > + }; > + > union XemXcr0Eax { > uint32_t value; > struct { > @@ -398,6 +407,12 @@ > ExtCpuid8Ecx ext_cpuid8_ecx; > uint32_t ext_cpuid8_edx; // reserved > > + // cpuid function 0x8000001E // AMD 17h > + uint32_t ext_cpuid1E_eax; > + ExtCpuid1EEbx ext_cpuid1E_ebx; // threads per core (AMD17h) > + uint32_t ext_cpuid1E_ecx; > + uint32_t ext_cpuid1E_edx; // unused currently > + > // extended control register XCR0 (the XFEATURE_ENABLED_MASK register) > XemXcr0Eax xem_xcr0_eax; > uint32_t xem_xcr0_edx; // reserved > @@ -505,6 +520,14 @@ > result |= CPU_CLMUL; > if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) > result |= CPU_RTM; > + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > + result |= CPU_ADX; > + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > + result |= CPU_BMI2; > + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > + result |= CPU_SHA; > + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > + result |= CPU_FMA; > > // AMD features. > if (is_amd()) { > @@ -518,16 +541,8 @@ > } > // Intel features. 
> if(is_intel()) { > - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) > - result |= CPU_ADX; > - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) > - result |= CPU_BMI2; > - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) > - result |= CPU_SHA; > if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) > result |= CPU_LZCNT; > - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) > - result |= CPU_FMA; > // for Intel, ecx.bits.misalignsse bit (bit 8) indicates > support for prefetchw > if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { > result |= CPU_3DNOW_PREFETCH; > @@ -590,6 +605,7 @@ > static ByteSize ext_cpuid5_offset() { return > byte_offset_of(CpuidInfo, ext_cpuid5_eax); } > static ByteSize ext_cpuid7_offset() { return > byte_offset_of(CpuidInfo, ext_cpuid7_eax); } > static ByteSize ext_cpuid8_offset() { return > byte_offset_of(CpuidInfo, ext_cpuid8_eax); } > + static ByteSize ext_cpuid1E_offset() { return > byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } > static ByteSize tpl_cpuidB0_offset() { return > byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } > static ByteSize tpl_cpuidB1_offset() { return > byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } > static ByteSize tpl_cpuidB2_offset() { return > byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } > @@ -673,8 +689,12 @@ > if (is_intel() && supports_processor_topology()) { > result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; > } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { > - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / > - cores_per_cpu(); > + if (cpu_family() >= 0x17) { > + result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + 1; > + } else { > + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / > + cores_per_cpu(); > + } > } > return (result == 0 ? 1 : result); > } > > Regards, > Rohit > >>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>> b/src/cpu/x86/vm/vm_version_x86.cpp >>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>> @@ -70,7 +70,7 @@ >>> bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >>> >>> Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; >>> - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>> done, wrapup; >>> + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>> ext_cpuid8, done, wrapup; >>> Label legacy_setup, save_restore_except, legacy_save_restore, >>> start_simd_check; >>> >>> StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); >>> @@ -267,14 +267,30 @@ >>> __ cmpl(rax, 0x80000000); // Is cpuid(0x80000001) supported? >>> __ jcc(Assembler::belowEqual, done); >>> __ cmpl(rax, 0x80000004); // Is cpuid(0x80000005) supported? >>> - __ jccb(Assembler::belowEqual, ext_cpuid1); >>> + __ jcc(Assembler::belowEqual, ext_cpuid1); >>> __ cmpl(rax, 0x80000006); // Is cpuid(0x80000007) supported? >>> __ jccb(Assembler::belowEqual, ext_cpuid5); >>> __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported? >>> __ jccb(Assembler::belowEqual, ext_cpuid7); >>> + __ cmpl(rax, 0x80000008); // Is cpuid(0x80000009 and above) >>> supported? >>> + __ jccb(Assembler::belowEqual, ext_cpuid8); >>> + __ cmpl(rax, 0x8000001E); // Is cpuid(0x8000001E) supported? 
>>> + __ jccb(Assembler::below, ext_cpuid8); >>> + // >>> + // Extended cpuid(0x8000001E) >>> + // >>> + __ movl(rax, 0x8000001E); >>> + __ cpuid(); >>> + __ lea(rsi, Address(rbp, >>> in_bytes(VM_Version::ext_cpuid1E_offset()))); >>> + __ movl(Address(rsi, 0), rax); >>> + __ movl(Address(rsi, 4), rbx); >>> + __ movl(Address(rsi, 8), rcx); >>> + __ movl(Address(rsi,12), rdx); >>> + >>> // >>> // Extended cpuid(0x80000008) >>> // >>> + __ bind(ext_cpuid8); >>> __ movl(rax, 0x80000008); >>> __ cpuid(); >>> __ lea(rsi, Address(rbp, >>> in_bytes(VM_Version::ext_cpuid8_offset()))); >>> @@ -1109,11 +1125,27 @@ >>> } >>> >>> #ifdef COMPILER2 >>> - if (MaxVectorSize > 16) { >>> - // Limit vectors size to 16 bytes on current AMD cpus. >>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>> } >>> #endif // COMPILER2 >>> + >>> + // Some defaults for AMD family 17h >>> + if ( cpu_family() == 0x17 ) { >>> + // On family 17h processors use XMM and UnalignedLoadStores for >>> Array Copy >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>> + } >>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>> + } >>> +#ifdef COMPILER2 >>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>> + } >>> +#endif >>> + } >>> } >>> >>> if( is_intel() ) { // Intel cpus specific settings >>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>> b/src/cpu/x86/vm/vm_version_x86.hpp >>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>> @@ -228,6 +228,15 @@ >>> } bits; >>> }; >>> >>> + union ExtCpuid1EEbx { >>> + uint32_t value; >>> + struct { >>> + uint32_t : 8, >>> + threads_per_core : 8, >>> + : 16; >>> + } bits; >>> + }; >>> + >>> union XemXcr0Eax { >>> uint32_t value; >>> struct { >>> @@ -398,6 +407,12 @@ >>> ExtCpuid8Ecx ext_cpuid8_ecx; >>> uint32_t ext_cpuid8_edx; // reserved >>> >>> + // cpuid function 0x8000001E // AMD 17h >>> + uint32_t ext_cpuid1E_eax; >>> + ExtCpuid1EEbx ext_cpuid1E_ebx; // threads per core (AMD17h) >>> + uint32_t ext_cpuid1E_ecx; >>> + uint32_t ext_cpuid1E_edx; // unused currently >>> + >>> // extended control register XCR0 (the XFEATURE_ENABLED_MASK >>> register) >>> XemXcr0Eax xem_xcr0_eax; >>> uint32_t xem_xcr0_edx; // reserved >>> @@ -505,6 +520,14 @@ >>> result |= CPU_CLMUL; >>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>> result |= CPU_RTM; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> + result |= CPU_ADX; >>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> + result |= CPU_BMI2; >>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> + result |= CPU_SHA; >>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> + result |= CPU_FMA; >>> >>> // AMD features. >>> if (is_amd()) { >>> @@ -518,16 +541,8 @@ >>> } >>> // Intel features. 
>>> if(is_intel()) { >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>> - result |= CPU_ADX; >>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>> - result |= CPU_BMI2; >>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>> - result |= CPU_SHA; >>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>> result |= CPU_LZCNT; >>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>> - result |= CPU_FMA; >>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>> support for prefetchw >>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>> result |= CPU_3DNOW_PREFETCH; >>> @@ -590,6 +605,7 @@ >>> static ByteSize ext_cpuid5_offset() { return >>> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >>> static ByteSize ext_cpuid7_offset() { return >>> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >>> static ByteSize ext_cpuid8_offset() { return >>> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >>> + static ByteSize ext_cpuid1E_offset() { return >>> byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } >>> static ByteSize tpl_cpuidB0_offset() { return >>> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >>> static ByteSize tpl_cpuidB1_offset() { return >>> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >>> static ByteSize tpl_cpuidB2_offset() { return >>> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >>> @@ -673,8 +689,12 @@ >>> if (is_intel() && supports_processor_topology()) { >>> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >>> } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >>> - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>> - cores_per_cpu(); >>> + if (cpu_family() >= 0x17) { >>> + result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + 1; >>> + } else { >>> + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>> + cores_per_cpu(); >>> + } >>> } >>> return (result == 0 ? 1 : result); >>> } >>> >>> Please let me know your comments. >>> Thanks for your review. >>> >>> Regards, >>> Rohit >>> >>>> >>>> On 9/11/17 9:52 PM, Rohit Arul Raj wrote: >>>>> >>>>> Hello David, >>>>> >>>>>>> >>>>>>> 1. ExtCpuid1EEx >>>>>>> >>>>>>> Should this be ExtCpuid1EEbx? (I see the naming here is somewhat >>>>>>> inconsistent - and potentially confusing: I would have preferred to >>>>>>> see >>>>>>> things like ExtCpuid_1E_Ebx, to make it clear.) >>>>>> >>>>>> >>>>>> Yes, I can change it accordingly. >>>>>> >>>>> I have attached the updated, re-tested patch as per your comments above. >>>>> >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>> @@ -70,7 +70,7 @@ >>>>> bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >>>>> >>>>> Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; >>>>> - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>>> done, wrapup; >>>>> + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>>> ext_cpuid8, done, wrapup; >>>>> Label legacy_setup, save_restore_except, legacy_save_restore, >>>>> start_simd_check; >>>>> >>>>> StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); >>>>> @@ -272,9 +272,23 @@ >>>>> __ jccb(Assembler::belowEqual, ext_cpuid5); >>>>> __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) supported? >>>>> __ jccb(Assembler::belowEqual, ext_cpuid7); >>>>> + __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported? 
>>>>> + __ jccb(Assembler::belowEqual, ext_cpuid8); >>>>> + // >>>>> + // Extended cpuid(0x8000001E) >>>>> + // >>>>> + __ movl(rax, 0x8000001E); >>>>> + __ cpuid(); >>>>> + __ lea(rsi, Address(rbp, >>>>> in_bytes(VM_Version::ext_cpuid_1E_offset()))); >>>>> + __ movl(Address(rsi, 0), rax); >>>>> + __ movl(Address(rsi, 4), rbx); >>>>> + __ movl(Address(rsi, 8), rcx); >>>>> + __ movl(Address(rsi,12), rdx); >>>>> + >>>>> // >>>>> // Extended cpuid(0x80000008) >>>>> // >>>>> + __ bind(ext_cpuid8); >>>>> __ movl(rax, 0x80000008); >>>>> __ cpuid(); >>>>> __ lea(rsi, Address(rbp, >>>>> in_bytes(VM_Version::ext_cpuid8_offset()))); >>>>> @@ -1109,11 +1123,27 @@ >>>>> } >>>>> >>>>> #ifdef COMPILER2 >>>>> - if (MaxVectorSize > 16) { >>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>> } >>>>> #endif // COMPILER2 >>>>> + >>>>> + // Some defaults for AMD family 17h >>>>> + if ( cpu_family() == 0x17 ) { >>>>> + // On family 17h processors use XMM and UnalignedLoadStores for >>>>> Array Copy >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>> + } >>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>> + } >>>>> +#ifdef COMPILER2 >>>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>> + } >>>>> +#endif >>>>> + } >>>>> } >>>>> >>>>> if( is_intel() ) { // Intel cpus specific settings >>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>> @@ -228,6 +228,15 @@ >>>>> } bits; >>>>> }; >>>>> >>>>> + union ExtCpuid_1E_Ebx { >>>>> + uint32_t value; >>>>> + struct { >>>>> + uint32_t : 8, >>>>> + threads_per_core : 8, >>>>> + : 16; >>>>> + } bits; >>>>> + }; >>>>> + >>>>> union XemXcr0Eax { >>>>> uint32_t value; >>>>> struct { >>>>> @@ -398,6 +407,12 @@ >>>>> ExtCpuid8Ecx ext_cpuid8_ecx; >>>>> uint32_t ext_cpuid8_edx; // reserved >>>>> >>>>> + // cpuid function 0x8000001E // AMD 17h >>>>> + uint32_t ext_cpuid_1E_eax; >>>>> + ExtCpuid_1E_Ebx ext_cpuid_1E_ebx; // threads per core (AMD17h) >>>>> + uint32_t ext_cpuid_1E_ecx; >>>>> + uint32_t ext_cpuid_1E_edx; // unused currently >>>>> + >>>>> // extended control register XCR0 (the XFEATURE_ENABLED_MASK >>>>> register) >>>>> XemXcr0Eax xem_xcr0_eax; >>>>> uint32_t xem_xcr0_edx; // reserved >>>>> @@ -505,6 +520,14 @@ >>>>> result |= CPU_CLMUL; >>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>> result |= CPU_RTM; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> + result |= CPU_ADX; >>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> + result |= CPU_BMI2; >>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> + result |= CPU_SHA; >>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> + result |= CPU_FMA; >>>>> >>>>> // AMD features. >>>>> if (is_amd()) { >>>>> @@ -518,16 +541,8 @@ >>>>> } >>>>> // Intel features. 
>>>>> if(is_intel()) { >>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>> - result |= CPU_ADX; >>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>> - result |= CPU_BMI2; >>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>> - result |= CPU_SHA; >>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>> result |= CPU_LZCNT; >>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>> - result |= CPU_FMA; >>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>> support for prefetchw >>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>> result |= CPU_3DNOW_PREFETCH; >>>>> @@ -590,6 +605,7 @@ >>>>> static ByteSize ext_cpuid5_offset() { return >>>>> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >>>>> static ByteSize ext_cpuid7_offset() { return >>>>> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >>>>> static ByteSize ext_cpuid8_offset() { return >>>>> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >>>>> + static ByteSize ext_cpuid_1E_offset() { return >>>>> byte_offset_of(CpuidInfo, ext_cpuid_1E_eax); } >>>>> static ByteSize tpl_cpuidB0_offset() { return >>>>> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >>>>> static ByteSize tpl_cpuidB1_offset() { return >>>>> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >>>>> static ByteSize tpl_cpuidB2_offset() { return >>>>> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >>>>> @@ -673,8 +689,11 @@ >>>>> if (is_intel() && supports_processor_topology()) { >>>>> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >>>>> } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >>>>> - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>>> - cores_per_cpu(); >>>>> + if (cpu_family() >= 0x17) >>>>> + result = _cpuid_info.ext_cpuid_1E_ebx.bits.threads_per_core + >>>>> 1; >>>>> + else >>>>> + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>>> + cores_per_cpu(); >>>>> } >>>>> return (result == 0 ? 1 : result); >>>>> } >>>>> >>>>> >>>>> Please let me know your comments >>>>> >>>>> Thanks for your time. >>>>> >>>>> Regards, >>>>> Rohit >>>>> >>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>> >>>>>>>> Reference: >>>>>>>> >>>>>>>> >>>>>>>> https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf >>>>>>>> [Pg 82] >>>>>>>> >>>>>>>> CPUID_Fn8000001E_EBX [Core Identifiers] (CoreId) >>>>>>>> 15:8 ThreadsPerCore: threads per core. Read-only. Reset: >>>>>>>> XXh. >>>>>>>> The number of threads per core is ThreadsPerCore+1. >>>>>>>> >>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>> @@ -70,7 +70,7 @@ >>>>>>>> bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >>>>>>>> >>>>>>>> Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; >>>>>>>> - Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>>>>>> done, wrapup; >>>>>>>> + Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>>>>>> ext_cpuid8, done, wrapup; >>>>>>>> Label legacy_setup, save_restore_except, legacy_save_restore, >>>>>>>> start_simd_check; >>>>>>>> >>>>>>>> StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); >>>>>>>> @@ -272,9 +272,23 @@ >>>>>>>> __ jccb(Assembler::belowEqual, ext_cpuid5); >>>>>>>> __ cmpl(rax, 0x80000007); // Is cpuid(0x80000008) >>>>>>>> supported? >>>>>>>> __ jccb(Assembler::belowEqual, ext_cpuid7); >>>>>>>> + __ cmpl(rax, 0x80000008); // Is cpuid(0x8000001E) supported? 
>>>>>>>> + __ jccb(Assembler::belowEqual, ext_cpuid8); >>>>>>>> + // >>>>>>>> + // Extended cpuid(0x8000001E) >>>>>>>> + // >>>>>>>> + __ movl(rax, 0x8000001E); >>>>>>>> + __ cpuid(); >>>>>>>> + __ lea(rsi, Address(rbp, >>>>>>>> in_bytes(VM_Version::ext_cpuid1E_offset()))); >>>>>>>> + __ movl(Address(rsi, 0), rax); >>>>>>>> + __ movl(Address(rsi, 4), rbx); >>>>>>>> + __ movl(Address(rsi, 8), rcx); >>>>>>>> + __ movl(Address(rsi,12), rdx); >>>>>>>> + >>>>>>>> // >>>>>>>> // Extended cpuid(0x80000008) >>>>>>>> // >>>>>>>> + __ bind(ext_cpuid8); >>>>>>>> __ movl(rax, 0x80000008); >>>>>>>> __ cpuid(); >>>>>>>> __ lea(rsi, Address(rbp, >>>>>>>> in_bytes(VM_Version::ext_cpuid8_offset()))); >>>>>>>> @@ -1109,11 +1123,27 @@ >>>>>>>> } >>>>>>>> >>>>>>>> #ifdef COMPILER2 >>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>> } >>>>>>>> #endif // COMPILER2 >>>>>>>> + >>>>>>>> + // Some defaults for AMD family 17h >>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>> + // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>> for >>>>>>>> Array Copy >>>>>>>> + if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>> + } >>>>>>>> + if (supports_sse2() && >>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>> { >>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>> + } >>>>>>>> +#ifdef COMPILER2 >>>>>>>> + if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>> + } >>>>>>>> +#endif >>>>>>>> + } >>>>>>>> } >>>>>>>> >>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>> @@ -228,6 +228,15 @@ >>>>>>>> } bits; >>>>>>>> }; >>>>>>>> >>>>>>>> + union ExtCpuid1EEx { >>>>>>>> + uint32_t value; >>>>>>>> + struct { >>>>>>>> + uint32_t : 8, >>>>>>>> + threads_per_core : 8, >>>>>>>> + : 16; >>>>>>>> + } bits; >>>>>>>> + }; >>>>>>>> + >>>>>>>> union XemXcr0Eax { >>>>>>>> uint32_t value; >>>>>>>> struct { >>>>>>>> @@ -398,6 +407,12 @@ >>>>>>>> ExtCpuid8Ecx ext_cpuid8_ecx; >>>>>>>> uint32_t ext_cpuid8_edx; // reserved >>>>>>>> >>>>>>>> + // cpuid function 0x8000001E // AMD 17h >>>>>>>> + uint32_t ext_cpuid1E_eax; >>>>>>>> + ExtCpuid1EEx ext_cpuid1E_ebx; // threads per core (AMD17h) >>>>>>>> + uint32_t ext_cpuid1E_ecx; >>>>>>>> + uint32_t ext_cpuid1E_edx; // unused currently >>>>>>>> + >>>>>>>> // extended control register XCR0 (the XFEATURE_ENABLED_MASK >>>>>>>> register) >>>>>>>> XemXcr0Eax xem_xcr0_eax; >>>>>>>> uint32_t xem_xcr0_edx; // reserved >>>>>>>> @@ -505,6 +520,14 @@ >>>>>>>> result |= CPU_CLMUL; >>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>> result |= CPU_RTM; >>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>> + result |= CPU_ADX; >>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>> + result |= CPU_BMI2; >>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>> + result |= CPU_SHA; >>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>> + result |= CPU_FMA; >>>>>>>> >>>>>>>> // AMD features. 
>>>>>>>> if (is_amd()) { >>>>>>>> @@ -518,16 +541,8 @@ >>>>>>>> } >>>>>>>> // Intel features. >>>>>>>> if(is_intel()) { >>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>> - result |= CPU_ADX; >>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>> - result |= CPU_BMI2; >>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>> - result |= CPU_SHA; >>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>> result |= CPU_LZCNT; >>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>> - result |= CPU_FMA; >>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>>>> support for prefetchw >>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>> @@ -590,6 +605,7 @@ >>>>>>>> static ByteSize ext_cpuid5_offset() { return >>>>>>>> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >>>>>>>> static ByteSize ext_cpuid7_offset() { return >>>>>>>> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >>>>>>>> static ByteSize ext_cpuid8_offset() { return >>>>>>>> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >>>>>>>> + static ByteSize ext_cpuid1E_offset() { return >>>>>>>> byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } >>>>>>>> static ByteSize tpl_cpuidB0_offset() { return >>>>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >>>>>>>> static ByteSize tpl_cpuidB1_offset() { return >>>>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >>>>>>>> static ByteSize tpl_cpuidB2_offset() { return >>>>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >>>>>>>> @@ -673,8 +689,11 @@ >>>>>>>> if (is_intel() && supports_processor_topology()) { >>>>>>>> result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >>>>>>>> } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >>>>>>>> - result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>>>>>> - cores_per_cpu(); >>>>>>>> + if (cpu_family() >= 0x17) >>>>>>>> + result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + >>>>>>>> 1; >>>>>>>> + else >>>>>>>> + result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>>>>>> + cores_per_cpu(); >>>>>>>> } >>>>>>>> return (result == 0 ? 1 : result); >>>>>>>> } >>>>>>>> >>>>>>>> I have attached the patch for review. >>>>>>>> Please let me know your comments. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Rohit >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>> >>>>>>>>>> src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>> >>>>>>>>>> No comments on AMD specific changes. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>> On 5/09/2017 3:43 PM, David Holmes wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hello David, >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes >>>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>> >>>>>>>>>>>>> I was unable to apply your patch to latest jdk10/hs/hotspot >>>>>>>>>>>>> repo. >>>>>>>>>>>>> >>>>>>>>>>>> I checked out the latest jdk10/hs/hotspot [parent: >>>>>>>>>>>> 13548:1a9c2e07a826] >>>>>>>>>>>> and was able to apply the patch >>>>>>>>>>>> [epyc-amd17h-defaults-3Sept.patch] >>>>>>>>>>>> without any issues. >>>>>>>>>>>> Can you share the error message that you are getting? 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I was getting this: >>>>>>>>>>> >>>>>>>>>>> applying hotspot.patch >>>>>>>>>>> patching file src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> Hunk #1 FAILED at 1108 >>>>>>>>>>> 1 out of 1 hunks FAILED -- saving rejects to file >>>>>>>>>>> src/cpu/x86/vm/vm_version_x86.cpp.rej >>>>>>>>>>> patching file src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>> Hunk #2 FAILED at 522 >>>>>>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file >>>>>>>>>>> src/cpu/x86/vm/vm_version_x86.hpp.rej >>>>>>>>>>> abort: patch failed to apply >>>>>>>>>>> >>>>>>>>>>> but I started again and this time it applied fine, so not sure >>>>>>>>>>> what >>>>>>>>>>> was >>>>>>>>>>> going on there. >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Rohit >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hello Vladimir, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hello Vladimir, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Changes look good. Only question I have is about >>>>>>>>>>>>>>>>> MaxVectorSize. >>>>>>>>>>>>>>>>> It >>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>> set >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 16 only in presence of AVX: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Does that code works for AMD 17h too? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for pointing that out. Yes, the code works fine for >>>>>>>>>>>>>>>> AMD >>>>>>>>>>>>>>>> 17h. >>>>>>>>>>>>>>>> So >>>>>>>>>>>>>>>> I have removed the surplus check for MaxVectorSize from my >>>>>>>>>>>>>>>> patch. >>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>> have updated, re-tested and attached the patch. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Which check you removed? >>>>>>>>>>>>>>> >>>>>>>>>>>>>> My older patch had the below mentioned check which was required >>>>>>>>>>>>>> on >>>>>>>>>>>>>> JDK9 where the default MaxVectorSize was 64. It has been >>>>>>>>>>>>>> handled >>>>>>>>>>>>>> better in openJDK10. So this check is not required anymore. >>>>>>>>>>>>>> >>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>> ... >>>>>>>>>>>>>> ... >>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> .. >>>>>>>>>>>>>> .. 
>>>>>>>>>>>>>> + } >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have one query regarding the setting of UseSHA flag: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> AMD 17h has support for SHA. >>>>>>>>>>>>>>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag >>>>>>>>>>>>>>>> gets >>>>>>>>>>>>>>>> enabled for it based on the availability of BMI2 and AVX2. Is >>>>>>>>>>>>>>>> there >>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>> underlying reason for this? I have handled this in the patch >>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>> just >>>>>>>>>>>>>>>> wanted to confirm. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It was done with next changes which use only AVX2 and BMI2 >>>>>>>>>>>>>>> instructions >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> calculate SHA-256: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I don't know if AMD 15h supports these instructions and can >>>>>>>>>>>>>>> execute >>>>>>>>>>>>>>> that >>>>>>>>>>>>>>> code. You need to test it. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 >>>>>>>>>>>>>> instructions, >>>>>>>>>>>>>> it should work. >>>>>>>>>>>>>> Confirmed by running following sanity tests: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >>>>>>>>>>>>>> >>>>>>>>>>>>>> So I have removed those SHA checks from my patch too. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please find attached updated, re-tested patch. >>>>>>>>>>>>>> >>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>> @@ -1109,11 +1109,27 @@ >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>> for >>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>> + if (supports_sse4_2() && >>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseFPUForSpilling)) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>> + } >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>> >>>>>>>>>>>>>> // AMD features. >>>>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> // Intel features. >>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != >>>>>>>>>>>>>> 0) >>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>>>> indicates >>>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != >>>>>>>>>>>>>> 0) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please let me know your comments. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for your time. >>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for taking time to review the code. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>>> || >>>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>> + warning("SHA instructions are not available on this >>>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>>> @@ -1109,11 +1125,40 @@ >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 crypto >>>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> // AMD features. >>>>>>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> // Intel features. 
>>>>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel >>>>>>>>>>>>>>>> != >>>>>>>>>>>>>>>> 0) >>>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>>>>>> indicates >>>>>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse >>>>>>>>>>>>>>>> != >>>>>>>>>>>>>>>> 0) { >>>>>>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I think the patch needs updating for jdk10 as I already >>>>>>>>>>>>>>>>>>>> see >>>>>>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>>>> lot of >>>>>>>>>>>>>>>>>>>> logic >>>>>>>>>>>>>>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source >>>>>>>>>>>>>>>>>>> base, >>>>>>>>>>>>>>>>>>> test >>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>> resubmit for review. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>>>>>>>>>>>>> 13519:71337910df60), did regression testing using jtreg >>>>>>>>>>>>>>>>>> ($make >>>>>>>>>>>>>>>>>> default) and didnt find any regressions. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Can anyone please volunteer to review this patch which >>>>>>>>>>>>>>>>>> sets >>>>>>>>>>>>>>>>>> flag/ISA >>>>>>>>>>>>>>>>>> defaults for newer AMD 17h (EPYC) processor? 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ************************* Patch >>>>>>>>>>>>>>>>>> **************************** >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>>>>> || >>>>>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>>> + warning("SHA instructions are not available on >>>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD >>>>>>>>>>>>>>>>>> cpus. >>>>>>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < 17h. 
>>>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 >>>>>>>>>>>>>>>>>> crypto >>>>>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific >>>>>>>>>>>>>>>>>> settings >>>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>>>>>>> result |= CPU_CLMUL; >>>>>>>>>>>>>>>>>> if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>>>>>>> result |= CPU_RTM; >>>>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> // AMD features. >>>>>>>>>>>>>>>>>> if (is_amd()) { >>>>>>>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != >>>>>>>>>>>>>>>>>> 0) >>>>>>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> // Intel features. 
>>>>>>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>>>> - result |= CPU_ADX; >>>>>>>>>>>>>>>>>> - if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>>>> - result |= CPU_BMI2; >>>>>>>>>>>>>>>>>> - if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>>>> - result |= CPU_SHA; >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel >>>>>>>>>>>>>>>>>> != 0) >>>>>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>>>>> - if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>>>> - result |= CPU_FMA; >>>>>>>>>>>>>>>>>> // for Intel, ecx.bits.misalignsse bit (bit >>>>>>>>>>>>>>>>>> 8) >>>>>>>>>>>>>>>>>> indicates >>>>>>>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>>>>>>> if >>>>>>>>>>>>>>>>>> (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse >>>>>>>>>>>>>>>>>> != >>>>>>>>>>>>>>>>>> 0) { >>>>>>>>>>>>>>>>>> result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ************************************************************** >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I would like an volunteer to review this patch >>>>>>>>>>>>>>>>>>>>>>> (openJDK9) >>>>>>>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>>>>> sets >>>>>>>>>>>>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor >>>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>> help >>>>>>>>>>>>>>>>>>>>>>> us >>>>>>>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>>>>> the commit process. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Unfortunately patches can not be accepted from systems >>>>>>>>>>>>>>>>>>>>>> outside >>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>> OpenJDK >>>>>>>>>>>>>>>>>>>>>> infrastructure and ... >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for >>>>>>>>>>>>>>>>>>>>>>> reference. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ... 
unfortunately patches tend to get stripped by the >>>>>>>>>>>>>>>>>>>>>> mail >>>>>>>>>>>>>>>>>>>>>> servers. >>>>>>>>>>>>>>>>>>>>>> If >>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>> patch is small please include it inline. Otherwise you >>>>>>>>>>>>>>>>>>>>>> will >>>>>>>>>>>>>>>>>>>>>> need >>>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>> find >>>>>>>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>>>>>> OpenJDK Author who can host it for you on >>>>>>>>>>>>>>>>>>>>>> cr.openjdk.java.net. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make >>>>>>>>>>>>>>>>>>>>>>> default) >>>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>> didnt find any regressions. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to >>>>>>>>>>>>>>>>>>>>>> comment >>>>>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>>>>>>>> requirements. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks David, >>>>>>>>>>>>>>>>>>>>> Yes, it's a small patch. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(UseSSE42Intrinsics, >>>>>>>>>>>>>>>>>>>>> false); >>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>> + if (supports_sha()) { >>>>>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>>> + } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>>>>>>>> || >>>>>>>>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>>>>>> + if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>>>>>>>> + !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>>>>>> + warning("SHA instructions are not available on >>>>>>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> // some defaults for AMD family 15h >>>>>>>>>>>>>>>>>>>>> if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> #ifdef COMPILER2 >>>>>>>>>>>>>>>>>>>>> - if (MaxVectorSize > 16) { >>>>>>>>>>>>>>>>>>>>> - // Limit vectors size to 16 bytes on current AMD >>>>>>>>>>>>>>>>>>>>> cpus. >>>>>>>>>>>>>>>>>>>>> + if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>>>>>>>> + // Limit vectors size to 16 bytes on AMD cpus < >>>>>>>>>>>>>>>>>>>>> 17h. 
>>>>>>>>>>>>>>>>>>>>> FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>> #endif // COMPILER2 >>>>>>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>>>>>> + // Some defaults for AMD family 17h >>>>>>>>>>>>>>>>>>>>> + if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>>>>>>>> + // On family 17h processors use XMM and >>>>>>>>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>> + UseXMMForArrayCopy = true; >>>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>>> + if (supports_sse2() && >>>>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>> + UseUnalignedLoadStores = true; >>>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>>> + if (supports_bmi2() && >>>>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>> + UseBMI2Instructions = true; >>>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>>> + if (MaxVectorSize > 32) { >>>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>>> + if (UseSHA) { >>>>>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>>>>> + } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>>>>>> + warning("Intrinsics for SHA-384 and SHA-512 >>>>>>>>>>>>>>>>>>>>> crypto >>>>>>>>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>>>>>>>> + if (supports_sse4_2()) { >>>>>>>>>>>>>>>>>>>>> + if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>>>>>>>> + FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> if( is_intel() ) { // Intel cpus specific >>>>>>>>>>>>>>>>>>>>> settings >>>>>>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>>>>>>>>>>>>>> result |= CPU_LZCNT; >>>>>>>>>>>>>>>>>>>>> if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a >>>>>>>>>>>>>>>>>>>>> != >>>>>>>>>>>>>>>>>>>>> 0) >>>>>>>>>>>>>>>>>>>>> result |= CPU_SSE4A; >>>>>>>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>>>>>>> + result |= CPU_BMI2; >>>>>>>>>>>>>>>>>>>>> + if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>>>>>>>> + result |= CPU_HT; >>>>>>>>>>>>>>>>>>>>> + if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>>>>>>> + result |= CPU_ADX; >>>>>>>>>>>>>>>>>>>>> + if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>>>>>>> + result |= CPU_SHA; >>>>>>>>>>>>>>>>>>>>> + if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>>>>>>> + result |= CPU_FMA; >>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>> // Intel features. 
>>>>>>>>>>>>>>>>>>>>> if(is_intel()) { >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>>>>>>> From coleen.phillimore at oracle.com Mon Oct 16 14:56:35 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 16 Oct 2017 10:56:35 -0400 Subject: RFR: 8189333: Fix Zero build after Atomic::xchg changes In-Reply-To: <9cd66129-3636-8de3-4624-a69bd8f28b99@redhat.com> References: <003ff7d9-759f-1ef5-f580-18c2571b63e5@redhat.com> <441ed55f-6398-9fa1-d571-86548ed5a2a9@oracle.com> <9cd66129-3636-8de3-4624-a69bd8f28b99@redhat.com> Message-ID: <60f5a96a-8f91-8114-c8de-8a72004fcb75@oracle.com> Thank you for the patch.? I have scripts to remember these options. There used to be other options but it doesn't look like I need them now. thanks, Coleen On 10/16/17 9:45 AM, Roman Kennke wrote: > Hi Coleen, > > Nope. It fails with this (and a bunch of similar) errors: > https://paste.fedoraproject.org/paste/cWKozoxY23z72~EMm0BPBA > > > It does build with this additional patch: > http://cr.openjdk.java.net/~rkennke/fix-zero-coleen/webrev/ > > > I.e.: > - cast BasicLock to markOop by using markOopDesc::encode() > - use oopDesc::cas_set_mark() instead of the raw Atomic ops (probably > not strictly required for this change, but still much nicer) > > > You should not require any build scripts for Zero though. Simply run > configure with --with-jvm-variants=zero and build in the corresponding > linux-x86_64-normal-zero-slowdebug or similar directory using the > usual make calls. > > >> >> Hi Roman, Can you build zero with this changeset? >> >> http://cr.openjdk.java.net/~coleenp/8188220.03/webrev/index.html >> >> My scripts for building zero are broken now. >> >> thanks, >> Coleen >> >> On 10/15/17 5:40 PM, Roman Kennke wrote: >>> Am 15.10.2017 um 23:32 schrieb David Holmes: >>>> Hi Roman, >>>> >>>> On 16/10/2017 7:12 AM, Roman Kennke wrote: >>>>> Zero debug build has been broken by: JDK-8187977: Generalize >>>>> Atomic::xchg to use templates. >>>>> >>>>> This patch fixes it by casting the unsigned literal to jint: >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8189333/webrev.00/ >>>>> >>>> >>>> Looks fine. >>>> >>>> I can push this for you straight away (relatively speaking :) ) >>>> under the trivial rule. >>> Thanks! >>> >>> Roman >> > From stefan.karlsson at oracle.com Mon Oct 16 15:40:04 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 16 Oct 2017 17:40:04 +0200 Subject: 8189360: JvmtiExport::weak_oops_do is called for all JNIHandleBlock instances Message-ID: <8e8b2dd7-3e49-ef54-6e3b-f13fb847cbd8@oracle.com> Hi all, Please review this patch to move the call of the static JvmtiExport::weak_oops_do out of the JNIHandleBlock::weak_oops_do member function into the new WeakProcessor. Today, this isn't causing any bugs because there's only one instance of JNIHandleBlock, the _weak_global_handles. However, in prototypes with more than one JNIHandleBlock, this results in multiple calls to JvmtiExport::weak_oops_do. http://cr.openjdk.java.net/~stefank/8189360/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8189360 This patch builds upon the patch in: http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-October/028684.html Tested with JPRT. 
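For orientation, here is a minimal sketch of the shape of that refactoring, assuming a WeakProcessor with a single static entry point. The class name comes from the mail above, but the member list and the exact call sites shown here are illustrative guesses, not the contents of the webrev:

    // Sketch only: assumes HotSpot's BoolObjectClosure/OopClosure types and an
    // AllStatic WeakProcessor; the real patch may be organized differently.
    class WeakProcessor : AllStatic {
     public:
      static void weak_oops_do(BoolObjectClosure* is_alive, OopClosure* keep_alive) {
        // Visit each weak-root source exactly once per weak processing pass.
        JNIHandles::weak_oops_do(is_alive, keep_alive);
        JvmtiExport::weak_oops_do(is_alive, keep_alive);
      }
    };

With JvmtiExport::weak_oops_do called from a single place like this, JNIHandleBlock::weak_oops_do only has to walk its own handles, so having more than one block can no longer trigger repeated JVMTI callbacks.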
Thanks, StefanK From coleen.phillimore at oracle.com Mon Oct 16 15:59:45 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 16 Oct 2017 11:59:45 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <2f32124d-2428-678d-ef50-3306231aa848@oracle.com> References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> <49b7c5f7-2f6d-16ac-0b60-140619d0fffd@oracle.com> <0784FA88-3D00-4DBA-8726-3A3B23C91B3E@oracle.com> <2f32124d-2428-678d-ef50-3306231aa848@oracle.com> Message-ID: <5a787ec8-afe6-b8a6-23de-5d6a5b935035@oracle.com> The latest incremental based on these comments (now running tier1). http://cr.openjdk.java.net/~coleenp/8188220.review-comments.02/webrev/index.html plus what Roman sent in the "RFR: 8189333: Fix Zero build after Atomic::xchg changes" thread. thanks, Coleen On 10/16/17 9:13 AM, coleen.phillimore at oracle.com wrote: > > > On 10/14/17 7:36 PM, Kim Barrett wrote: >>> On Oct 13, 2017, at 2:34 PM, coleen.phillimore at oracle.com wrote: >>> >>> >>> Hi, Here is the version with the changes from Kim's comments that >>> has passed at least testing with JPRT and tier1, locally.?? More >>> testing (tier2-5) is in progress. >>> >>> Also includes a corrected version of Atomic::sub care of Erik >>> Osterlund. >>> >>> open webrev at >>> http://cr.openjdk.java.net/~coleenp/8188220.kim-review-changes/webrev >>> open webrev at >>> http://cr.openjdk.java.net/~coleenp/8188220.review-comments/webrev >>> >>> Full version: >>> >>> http://cr.openjdk.java.net/~coleenp/8188220.03/webrev >>> >>> Thanks! >>> Coleen >> I still dislike and disagree with what is being proposed regarding >> replace_if_null. > > We can discuss that seperately, please file an RFE. >> >> ------------------------------------------------------------------------------ >> >> I forgot that I'd promised you an updated Atomic::sub definition. >> Unfortunately, the new one still has problems, performing some >> conversions that should not be permitted (and are disallowed by >> Atomic::add).? Try this instead.? (This hasn't been tested, not even >> compiled; hopefully I don't have any typos or anything.)? The intent >> is that this supports the same conversions as Atomic::add. >> >> template >> inline D Atomic::sub(I sub_value, D volatile* dest) { >> ?? STATIC_ASSERT(IsPointer::value || IsIntegral::value); >> ?? STATIC_ASSERT(IsIntegral::value); >> ?? // If D is a pointer type, use [u]intptr_t as the addend type, >> ?? // matching signedness of I.? Otherwise, use D as the addend type. >> ?? typedef typename Conditional::value, intptr_t, >> uintptr_t>::type PI; >> ?? typedef typename Conditional::value, PI, D>::type >> AddendType; >> ?? // Only allow conversions that can't change the value. >> ?? STATIC_ASSERT(IsSigned::value == IsSigned::value); >> ?? STATIC_ASSERT(sizeof(I) <= sizeof(AddendType)); >> ?? AddendType addend = sub_value; >> ?? // Assumes two's complement integer representation. >> ?? #pragma warning(suppress: 4146) // In case AddendType is not signed. >> ?? return Atomic::add(-addend, dest); >> } > > Uh, Ok.? I'll try it out. >> >>>>> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >>>>> 7960?? Atomic::add(-n, &_num_par_pushes); >>>>> >>>>> Atomic::sub >>>> fixed. >> Nope, not fixed in http://cr.openjdk.java.net/~coleenp/8188220.03/webrev > > Missed it twice now.? I think I have it now. >>>>> src/hotspot/share/gc/g1/heapRegionRemSet.cpp >>>>> ?? 200?????? PerRegionTable* res = >>>>> ?? 201???????? 
Atomic::cmpxchg(nxt, &_free_list, fl); >>>>> >>>>> Please remove the line break, now that the code has been simplified. >>>>> >>>>> But wait, doesn't this alloc exhibit classic ABA problems?? I *think* >>>>> this works because alloc and bulk_free are called in different >>>>> phases, >>>>> never overlapping. >>>> I don't know.? Do you want to file a bug to investigate this? >>>> fixed. >> No, I now think it?s ok, though confusing. >> >>>>> src/hotspot/share/gc/g1/sparsePRT.cpp >>>>> ?? 295???? SparsePRT* res = >>>>> ?? 296?????? Atomic::cmpxchg(sprt, &_head_expanded_list, hd); >>>>> and >>>>> ?? 307???? SparsePRT* res = >>>>> ?? 308?????? Atomic::cmpxchg(next, &_head_expanded_list, hd); >>>>> >>>>> I'd rather not have the line breaks in these either. >>>>> >>>>> And get_from_expanded_list also appears to have classic ABA problems. >>>>> I *think* this works because add_to_expanded_list and >>>>> get_from_expanded_list are called in different phases, never >>>>> overlapping. >>>> Fixed, same question as above?? Or one bug to investigate both? >> Again, I think it?s ok, though confusing. >> >>>>> src/hotspot/share/gc/shared/taskqueue.inline.hpp >>>>> ?? 262?? return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, >>>>> ?? 263?????????????????????????????????? (volatile intptr_t *)&_data, >>>>> ?? 264 (intptr_t)old_age._data); >>>>> >>>>> This should be >>>>> >>>>> ??? return Atomic::cmpxchg(new_age._data, &_data, old_age._data); >>>> fixed. >> Still casting the result. > > I thought I fixed it.? I think I fixed it now. >> >>>>> src/hotspot/share/oops/method.hpp >>>>> ?? 139?? volatile address from_compiled_entry() const?? { return >>>>> OrderAccess::load_acquire(&_from_compiled_entry); } >>>>> ?? 140?? volatile address from_compiled_entry_no_trampoline() const; >>>>> ?? 141?? volatile address from_interpreted_entry() const{ return >>>>> OrderAccess::load_acquire(&_from_interpreted_entry); } >>>>> >>>>> [pre-existing] >>>>> The volatile qualifiers here seem suspect to me. >>>> Again much suspicion about concurrency and giant pain, which I >>>> remember, of debugging these when they were broken. >> Let me be more direct: the volatile qualifiers for the function return >> types are bogus and confusing, and should be removed. > > Okay, sure. > >> >>>>> src/hotspot/share/prims/jni.cpp >>>>> >>>>> [pre-existing] >>>>> >>>>> copy_jni_function_table should be using Copy::disjoint_words_atomic. >>>> yuck. >> Of course, neither is entirely technically correct, since both are >> treating conversion of function pointers to void* as okay in shared >> code, e.g. violating some of the raison d'etre of CAST_{TO,FROM}_FN_PTR. >> For way more detail than you probably care about, see the discussion >> starting here: >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-March/018578.html >> >> through (5 messages in total) >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-March/018623.html >> >> >> Oh well. >> >>>>> src/hotspot/share/runtime/mutex.hpp >>>>> >>>>> [pre-existing] >>>>> >>>>> I think the Address member of the SplitWord union is unused. Looking >>>>> at AcquireOrPush (and others), I'm wondering whether it *should* be >>>>> used there, or whether just using intptr_t casts and doing integral >>>>> arithmetic (as is presently being done) is easier and clearer. >>>>> >>>>> Also the _LSBINDEX macro probably ought to be defined in mutex.cpp >>>>> rather than polluting the global namespace.? And technically, that >>>>> name is reserved word. 
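On the reserved-name point just above: identifiers that begin with an underscore followed by an uppercase letter (such as _LSBINDEX or _LBIT) are reserved to the implementation for any use, in every scope, so keeping them out of headers only partially addresses the issue. A hedged illustration with made-up replacement names, not the actual mutex.cpp definitions:

    // Reserved to the implementation (underscore + capital): _LSBINDEX, _LBIT.
    // A conforming, file-local alternative inside mutex.cpp could look like:
    static const intptr_t LBIT      = 1;   // hypothetical name for the lock bit
    static const int      LSB_INDEX = 0;   // hypothetical name; an index of the
                                           // least-significant byte would depend
                                           // on the platform's byte order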
>>>> I moved both this and _LBIT into the top of mutex.cpp since they >>>> are used there. >> Good. >> >>>> Cant define const intptr_t _LBIT =1; in a class in our version of C++. >> Sorry, please explain?? If you tried to move it into SplitWord, that >> doesn?t work; >> unions are not permitted to have static data members (I don?t >> off-hand know why, >> just that it?s explicitly forbidden). >> >> And you left the seemingly unused Address member in SplitWord. > > This is the compilation error I get: > > /scratch/cphillim/hg/10ptr2/open/src/hotspot/share/runtime/mutex.hpp:124:33: > error: non-static data member initializers only available with > -std=c++11 or -std=gnu++11 [-Werror] > ?? const intptr_t _NEW_LOCKBIT = 1; > > > I don't own this SplitWord code so do not want to remove the unused > Address member. > >> >>>>> src/hotspot/share/runtime/thread.cpp >>>>> 4707?? intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, >>>>> (intptr_t)0); >>>>> >>>>> This and other places suggest LOCKBIT should be defined as intptr_t, >>>>> rather than as an enum value.? The MuxBits enum type is unused. >>>>> >>>>> And the cast of 0 is another case where implicit widening would be >>>>> nice. >>>> Making LOCKBIT a const intptr_t = 1 removes a lot of casts. >> Because of the new definition of LOCKBIT I noticed the immediately >> preceeding typedef for MutexT, which seems to be unused. > > Removed MutexT. >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/oops/cpCache.cpp >> ? 114 bool ConstantPoolCacheEntry::init_flags_atomic(intx flags) { >> ? 115?? intptr_t result = Atomic::cmpxchg(flags, &_flags, (intx)0); >> ? 116?? return (result == 0); >> ? 117 } >> >> [I missed this on earlier pass.] >> >> Should be >> >> bool ConstantPoolCacheEntry::init_flags_atomic(intx flags) { >> ?? return Atomic::cmpxchg(flags, &_flags, (intx)0) == 0; >> } >> >> Otherwise, I end up asking why result is intptr_t when the cmpxchg is >> dealing with intx.? Yeah, one's a typedef of the other, but mixing >> them like that in the same expression is not helpful. >> >> > Sure why not? > > Actually init_flags_atomic is not used and neither is > init_method_flags_atomic so I did one better and removed them. > > Thanks for the again thorough code review and Atomic::sub.?? I'll post > incremental when it compiles. > > Coleen From kim.barrett at oracle.com Mon Oct 16 17:14:48 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 16 Oct 2017 13:14:48 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <2f32124d-2428-678d-ef50-3306231aa848@oracle.com> References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> <49b7c5f7-2f6d-16ac-0b60-140619d0fffd@oracle.com> <0784FA88-3D00-4DBA-8726-3A3B23C91B3E@oracle.com> <2f32124d-2428-678d-ef50-3306231aa848@oracle.com> Message-ID: <36C02953-AF4E-4A89-92CE-70FE4293965A@oracle.com> > On Oct 16, 2017, at 9:13 AM, coleen.phillimore at oracle.com wrote: >>>> Cant define const intptr_t _LBIT =1; in a class in our version of C++. >> Sorry, please explain? If you tried to move it into SplitWord, that doesn?t work; >> unions are not permitted to have static data members (I don?t off-hand know why, >> just that it?s explicitly forbidden). >> >> And you left the seemingly unused Address member in SplitWord. 
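To make the language rules in this exchange concrete, a small illustration; the type and constant names are simplified stand-ins, not the real SplitWord or mutex.hpp code:

    struct LockConstants {                  // hypothetical holder class
      static const intptr_t LBIT = 1;       // OK even pre-C++11: in-class initializer
                                            // is allowed for a static const integral
      // const intptr_t LBIT2 = 1;          // error before C++11: a non-static data
                                            // member initializer needs -std=c++11
    };

    union SplitWordLike {                   // simplified stand-in for SplitWord
      volatile intptr_t FullWord;
      volatile char     Bytes[sizeof(intptr_t)];
      // static const intptr_t LBIT = 1;    // ill-formed: a union may not contain
                                            // a static data member
    };

So the constant can live as a static const member of an ordinary class, or as a file-scope static in mutex.cpp, but not inside the union itself, which is why the suggestion that follows is simply to add 'static' rather than rely on a C++11 member initializer.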
> > This is the compilation error I get: > > /scratch/cphillim/hg/10ptr2/open/src/hotspot/share/runtime/mutex.hpp:124:33: error: non-static data member initializers only available with -std=c++11 or -std=gnu++11 [-Werror] > const intptr_t _NEW_LOCKBIT = 1; Needs ?static? in a class. From coleen.phillimore at oracle.com Mon Oct 16 19:31:56 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 16 Oct 2017 15:31:56 -0400 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <87BC5241-9C27-457F-9856-3D969831DABC@physik.fu-berlin.de> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <87BC5241-9C27-457F-9856-3D969831DABC@physik.fu-berlin.de> Message-ID: <0ce8d126-1f0c-5cd3-edbc-b9bd36d17801@oracle.com> On 10/15/17 2:06 AM, John Paul Adrian Glaubitz wrote: > Hi Roman! > > Please let me look at SPARC next week first before merging this. > > And thanks for notifying me that Zero is broken again *sigh*. > > People, please test your changes. Yes, I know you all just care about Hotspot. But please understand that there are many people out there who rely on Zero, i.e. they are using it. Breaking code that people actively use is not nice and should not happen in a project like OpenJDK. > > Building Zero takes maybe 5 minutes on a fast x86 machine, so I would like to ask everyone to please test their changes against Zero as well. These tests will keep the headaches for people relying on Zero low and also avoids that distributions have to ship many patches on top of OpenJDK upstream. I used to be able to compile and link Zero and have fixed it but as an occasional task, it's something that stops working. At one point, I thought I'd filed an internal bug so that zero is built in JPRT. So I can compile zero again, but can't link on OL7 (Oracle's RedHat version of linux). Error: failed /scratch/cphillim/hg/10ptr3/build/linux-x64/jdk/lib/server/libjvm.so, because libffi.so.5: cannot open shared object file: No such file or directory I did a "yum install libffi" which seemed to succeed. Help? Coleen > > If you cannot test your patch on a given platform X, please let me know. I have access to every platform supported by OpenJDK except AIX/PPC. > > Thanks, > Adrian > >> On Oct 15, 2017, at 12:41 AM, Roman Kennke wrote: >> >> The JEP to remove the Shark compiler has received exclusively positive feedback (JDK-8189173) on zero-dev. So here comes the big patch to remove it. >> >> What I have done: >> >> grep -i -R shark src >> grep -i -R shark make >> grep -i -R shark doc >> grep -i -R shark doc >> >> and purged any reference to shark. Almost everything was straightforward. >> >> The only things I wasn't really sure of: >> >> - in globals.hpp, I re-arranged the KIND_* bits to account for the gap that removing KIND_SHARK left. I hope that's good? >> - in relocInfo_zero.hpp I put a ShouldNotCallThis() in pd_address_in_code(), I am not sure it is the right thing to do. If not, what *would* be the right thing? >> >> Then of course I did: >> >> rm -rf src/hotspot/share/shark >> >> I also went through the build machinery and removed stuff related to Shark and LLVM libs. >> >> Now the only references in the whole JDK tree to shark is a 'Shark Bay' in a timezone file, and 'Wireshark' in some tests ;-) >> >> I tested by building a regular x86 JVM and running JTREG tests. All looks fine. 
>> >> - I could not build zero because it seems broken because of the recent Atomic::* changes >> - I could not test any of the other arches that seemed to reference Shark (arm and sparc) >> >> Here's the full webrev: >> >> http://cr.openjdk.java.net/~rkennke/8171853/webrev.00/ >> >> Can I get a review on this? >> >> Thanks, Roman From rkennke at redhat.com Mon Oct 16 19:37:41 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 16 Oct 2017 21:37:41 +0200 Subject: RFR: 8171853: Remove Shark compiler In-Reply-To: <0ce8d126-1f0c-5cd3-edbc-b9bd36d17801@oracle.com> References: <92a2fec1-88f1-2579-1202-299b13062b7b@redhat.com> <87BC5241-9C27-457F-9856-3D969831DABC@physik.fu-berlin.de> <0ce8d126-1f0c-5cd3-edbc-b9bd36d17801@oracle.com> Message-ID: <9aa4fc18-0c52-6570-902f-586ef981020e@redhat.com> Am 16.10.2017 um 21:31 schrieb coleen.phillimore at oracle.com: > > > On 10/15/17 2:06 AM, John Paul Adrian Glaubitz wrote: >> Hi Roman! >> >> Please let me look at SPARC next week first before merging this. >> >> And thanks for notifying me that Zero is broken again *sigh*. >> >> People, please test your changes. Yes, I know you all just care about >> Hotspot. But please understand that there are many people out there >> who rely on Zero, i.e. they are using it. Breaking code that people >> actively use is not nice and should not happen in a project like >> OpenJDK. >> >> Building Zero takes maybe 5 minutes on a fast x86 machine, so I would >> like to ask everyone to please test their changes against Zero as >> well. These tests will keep the headaches for people relying on Zero >> low and also avoids that distributions have to ship many patches on >> top of OpenJDK upstream. > > I used to be able to compile and link Zero and have fixed it but as an > occasional task, it's something that stops working. > > At one point, I thought I'd filed an internal bug so that zero is > built in JPRT. > > So I can compile zero again, but can't link on OL7 (Oracle's RedHat > version of linux). > > Error: failed > /scratch/cphillim/hg/10ptr3/build/linux-x64/jdk/lib/server/libjvm.so, > because libffi.so.5: cannot open shared object file: No such file or > directory > > I did a "yum install libffi" which seemed to succeed. What you want is: "yum install libffi-devel" This is the only additional dependency that Zero has. And I'm doing this on CentOS7 (an open source version of RHEL7), which should practically be the same in this regard as OL7. Roman From mark.reinhold at oracle.com Mon Oct 16 21:38:18 2017 From: mark.reinhold at oracle.com (mark.reinhold at oracle.com) Date: Mon, 16 Oct 2017 14:38:18 -0700 (PDT) Subject: JEP 310: Application Class-Data Sharing Message-ID: <20171016213818.E4CE2EA182@eggemoggin.niobe.net> New JEP Candidate: http://openjdk.java.net/jeps/310 - Mark From david.holmes at oracle.com Mon Oct 16 21:58:01 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 17 Oct 2017 07:58:01 +1000 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <5a787ec8-afe6-b8a6-23de-5d6a5b935035@oracle.com> References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> <49b7c5f7-2f6d-16ac-0b60-140619d0fffd@oracle.com> <0784FA88-3D00-4DBA-8726-3A3B23C91B3E@oracle.com> <2f32124d-2428-678d-ef50-3306231aa848@oracle.com> <5a787ec8-afe6-b8a6-23de-5d6a5b935035@oracle.com> Message-ID: <2adcda24-1386-b5ee-81d3-2e4604b0f4d5@oracle.com> Seems okay. 
Thanks, David On 17/10/2017 1:59 AM, coleen.phillimore at oracle.com wrote: > > The latest incremental based on these comments (now running tier1). > http://cr.openjdk.java.net/~coleenp/8188220.review-comments.02/webrev/index.html > > > plus what Roman sent in the "RFR: 8189333: Fix Zero build after > Atomic::xchg changes" thread. > > thanks, > Coleen > > On 10/16/17 9:13 AM, coleen.phillimore at oracle.com wrote: >> >> >> On 10/14/17 7:36 PM, Kim Barrett wrote: >>>> On Oct 13, 2017, at 2:34 PM, coleen.phillimore at oracle.com wrote: >>>> >>>> >>>> Hi, Here is the version with the changes from Kim's comments that >>>> has passed at least testing with JPRT and tier1, locally.?? More >>>> testing (tier2-5) is in progress. >>>> >>>> Also includes a corrected version of Atomic::sub care of Erik >>>> Osterlund. >>>> >>>> open webrev at >>>> http://cr.openjdk.java.net/~coleenp/8188220.kim-review-changes/webrev >>>> open webrev at >>>> http://cr.openjdk.java.net/~coleenp/8188220.review-comments/webrev >>>> >>>> Full version: >>>> >>>> http://cr.openjdk.java.net/~coleenp/8188220.03/webrev >>>> >>>> Thanks! >>>> Coleen >>> I still dislike and disagree with what is being proposed regarding >>> replace_if_null. >> >> We can discuss that seperately, please file an RFE. >>> >>> ------------------------------------------------------------------------------ >>> >>> I forgot that I'd promised you an updated Atomic::sub definition. >>> Unfortunately, the new one still has problems, performing some >>> conversions that should not be permitted (and are disallowed by >>> Atomic::add).? Try this instead.? (This hasn't been tested, not even >>> compiled; hopefully I don't have any typos or anything.)? The intent >>> is that this supports the same conversions as Atomic::add. >>> >>> template >>> inline D Atomic::sub(I sub_value, D volatile* dest) { >>> ?? STATIC_ASSERT(IsPointer::value || IsIntegral::value); >>> ?? STATIC_ASSERT(IsIntegral::value); >>> ?? // If D is a pointer type, use [u]intptr_t as the addend type, >>> ?? // matching signedness of I.? Otherwise, use D as the addend type. >>> ?? typedef typename Conditional::value, intptr_t, >>> uintptr_t>::type PI; >>> ?? typedef typename Conditional::value, PI, D>::type >>> AddendType; >>> ?? // Only allow conversions that can't change the value. >>> ?? STATIC_ASSERT(IsSigned::value == IsSigned::value); >>> ?? STATIC_ASSERT(sizeof(I) <= sizeof(AddendType)); >>> ?? AddendType addend = sub_value; >>> ?? // Assumes two's complement integer representation. >>> ?? #pragma warning(suppress: 4146) // In case AddendType is not signed. >>> ?? return Atomic::add(-addend, dest); >>> } >> >> Uh, Ok.? I'll try it out. >>> >>>>>> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >>>>>> 7960?? Atomic::add(-n, &_num_par_pushes); >>>>>> >>>>>> Atomic::sub >>>>> fixed. >>> Nope, not fixed in http://cr.openjdk.java.net/~coleenp/8188220.03/webrev >> >> Missed it twice now.? I think I have it now. >>>>>> src/hotspot/share/gc/g1/heapRegionRemSet.cpp >>>>>> ?? 200?????? PerRegionTable* res = >>>>>> ?? 201???????? Atomic::cmpxchg(nxt, &_free_list, fl); >>>>>> >>>>>> Please remove the line break, now that the code has been simplified. >>>>>> >>>>>> But wait, doesn't this alloc exhibit classic ABA problems?? I *think* >>>>>> this works because alloc and bulk_free are called in different >>>>>> phases, >>>>>> never overlapping. >>>>> I don't know.? Do you want to file a bug to investigate this? >>>>> fixed. >>> No, I now think it?s ok, though confusing. 
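For readers who have not met the ABA pattern being referred to here and in the sparsePRT comment that follows, a generic sketch of the hazard on a cmpxchg-based free list; this is purely illustrative and is not the PerRegionTable or SparsePRT code:

    struct Node { Node* _next; Node* next() const { return _next; } };  // placeholder type

    // Lock-free pop from a singly linked free list.
    Node* pop(Node* volatile* head_addr) {
      while (true) {
        Node* hd = *head_addr;
        if (hd == NULL) return NULL;
        Node* nxt = hd->next();                           // (1) read successor
        // ABA: if another thread pops hd, recycles nxt, and pushes hd back
        // between (1) and (2), the compare below still sees hd ("A" again)
        // and succeeds, installing a stale nxt as the new head.
        if (Atomic::cmpxchg(nxt, head_addr, hd) == hd) {  // (2) swing the head
          return hd;
        }
      }
    }

The free lists above get away with this only because, as noted, the allocating and bulk-freeing phases never overlap.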
>>> >>>>>> src/hotspot/share/gc/g1/sparsePRT.cpp >>>>>> ?? 295???? SparsePRT* res = >>>>>> ?? 296?????? Atomic::cmpxchg(sprt, &_head_expanded_list, hd); >>>>>> and >>>>>> ?? 307???? SparsePRT* res = >>>>>> ?? 308?????? Atomic::cmpxchg(next, &_head_expanded_list, hd); >>>>>> >>>>>> I'd rather not have the line breaks in these either. >>>>>> >>>>>> And get_from_expanded_list also appears to have classic ABA problems. >>>>>> I *think* this works because add_to_expanded_list and >>>>>> get_from_expanded_list are called in different phases, never >>>>>> overlapping. >>>>> Fixed, same question as above?? Or one bug to investigate both? >>> Again, I think it?s ok, though confusing. >>> >>>>>> src/hotspot/share/gc/shared/taskqueue.inline.hpp >>>>>> ?? 262?? return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, >>>>>> ?? 263?????????????????????????????????? (volatile intptr_t *)&_data, >>>>>> ?? 264 (intptr_t)old_age._data); >>>>>> >>>>>> This should be >>>>>> >>>>>> ??? return Atomic::cmpxchg(new_age._data, &_data, old_age._data); >>>>> fixed. >>> Still casting the result. >> >> I thought I fixed it.? I think I fixed it now. >>> >>>>>> src/hotspot/share/oops/method.hpp >>>>>> ?? 139?? volatile address from_compiled_entry() const?? { return >>>>>> OrderAccess::load_acquire(&_from_compiled_entry); } >>>>>> ?? 140?? volatile address from_compiled_entry_no_trampoline() const; >>>>>> ?? 141?? volatile address from_interpreted_entry() const{ return >>>>>> OrderAccess::load_acquire(&_from_interpreted_entry); } >>>>>> >>>>>> [pre-existing] >>>>>> The volatile qualifiers here seem suspect to me. >>>>> Again much suspicion about concurrency and giant pain, which I >>>>> remember, of debugging these when they were broken. >>> Let me be more direct: the volatile qualifiers for the function return >>> types are bogus and confusing, and should be removed. >> >> Okay, sure. >> >>> >>>>>> src/hotspot/share/prims/jni.cpp >>>>>> >>>>>> [pre-existing] >>>>>> >>>>>> copy_jni_function_table should be using Copy::disjoint_words_atomic. >>>>> yuck. >>> Of course, neither is entirely technically correct, since both are >>> treating conversion of function pointers to void* as okay in shared >>> code, e.g. violating some of the raison d'etre of CAST_{TO,FROM}_FN_PTR. >>> For way more detail than you probably care about, see the discussion >>> starting here: >>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-March/018578.html >>> >>> through (5 messages in total) >>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-March/018623.html >>> >>> >>> Oh well. >>> >>>>>> src/hotspot/share/runtime/mutex.hpp >>>>>> >>>>>> [pre-existing] >>>>>> >>>>>> I think the Address member of the SplitWord union is unused. Looking >>>>>> at AcquireOrPush (and others), I'm wondering whether it *should* be >>>>>> used there, or whether just using intptr_t casts and doing integral >>>>>> arithmetic (as is presently being done) is easier and clearer. >>>>>> >>>>>> Also the _LSBINDEX macro probably ought to be defined in mutex.cpp >>>>>> rather than polluting the global namespace.? And technically, that >>>>>> name is reserved word. >>>>> I moved both this and _LBIT into the top of mutex.cpp since they >>>>> are used there. >>> Good. >>> >>>>> Cant define const intptr_t _LBIT =1; in a class in our version of C++. >>> Sorry, please explain?? 
If you tried to move it into SplitWord, that >>> doesn?t work; >>> unions are not permitted to have static data members (I don?t >>> off-hand know why, >>> just that it?s explicitly forbidden). >>> >>> And you left the seemingly unused Address member in SplitWord. >> >> This is the compilation error I get: >> >> /scratch/cphillim/hg/10ptr2/open/src/hotspot/share/runtime/mutex.hpp:124:33: >> error: non-static data member initializers only available with >> -std=c++11 or -std=gnu++11 [-Werror] >> ?? const intptr_t _NEW_LOCKBIT = 1; >> >> >> I don't own this SplitWord code so do not want to remove the unused >> Address member. >> >>> >>>>>> src/hotspot/share/runtime/thread.cpp >>>>>> 4707?? intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, >>>>>> (intptr_t)0); >>>>>> >>>>>> This and other places suggest LOCKBIT should be defined as intptr_t, >>>>>> rather than as an enum value.? The MuxBits enum type is unused. >>>>>> >>>>>> And the cast of 0 is another case where implicit widening would be >>>>>> nice. >>>>> Making LOCKBIT a const intptr_t = 1 removes a lot of casts. >>> Because of the new definition of LOCKBIT I noticed the immediately >>> preceeding typedef for MutexT, which seems to be unused. >> >> Removed MutexT. >>> >>> ------------------------------------------------------------------------------ >>> >>> src/hotspot/share/oops/cpCache.cpp >>> ? 114 bool ConstantPoolCacheEntry::init_flags_atomic(intx flags) { >>> ? 115?? intptr_t result = Atomic::cmpxchg(flags, &_flags, (intx)0); >>> ? 116?? return (result == 0); >>> ? 117 } >>> >>> [I missed this on earlier pass.] >>> >>> Should be >>> >>> bool ConstantPoolCacheEntry::init_flags_atomic(intx flags) { >>> ?? return Atomic::cmpxchg(flags, &_flags, (intx)0) == 0; >>> } >>> >>> Otherwise, I end up asking why result is intptr_t when the cmpxchg is >>> dealing with intx.? Yeah, one's a typedef of the other, but mixing >>> them like that in the same expression is not helpful. >>> >>> >> Sure why not? >> >> Actually init_flags_atomic is not used and neither is >> init_method_flags_atomic so I did one better and removed them. >> >> Thanks for the again thorough code review and Atomic::sub.?? I'll post >> incremental when it compiles. >> >> Coleen > From volker.simonis at gmail.com Mon Oct 16 23:07:39 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 16 Oct 2017 23:07:39 +0000 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: References: <50cda0ab-f403-372a-ce51-1a27d8821448@oracle.com> <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com> <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com> <5b5219a5-960e-363b-2bdc-3613f1dae62c@oracle.com> <4109f960-078f-e582-3c78-71f201a265fd@redhat.com> Message-ID: Volker Simonis schrieb am Di. 10. Okt. 2017 um 19:17: > On Tue, Oct 10, 2017 at 9:42 AM, Andrew Haley wrote: > > On 09/10/17 20:24, Volker Simonis wrote: > >> Unfortunately we can't easily generate these stubs during > >> 'stubRoutines_init1()' because > >> 'generate_dirty_card_log_enqueue_if_necessary()' needs the byte map > >> base address which is only initialized in > >> 'CardTableModRefBS::initialize()' during 'univers_init()' which > >> happens after 'stubRoutines_init1()'. 
> > > > Yes you can, you can do something like we do for narrow_ptrs_base: > > > > if (Universe::is_fully_initialized()) { > > mov(rheapbase, Universe::narrow_ptrs_base()); > > } else { > > lea(rheapbase, > ExternalAddress((address)Universe::narrow_ptrs_base_addr())); > > ldr(rheapbase, Address(rheapbase)); > > } > > > Hi, can somebody please take a look at the new version of the patch? Thanks, Volker > Hi Andrew, > > thanks for your suggestion. Yes, I could do that, but that would > replace a constant load in the barrier with a constant load plus a > load from memory, because during stubRoutines_init1() heap won't be > initialized. Not sure about this, but I think we want to avoid this > overhead in the barriers. > > Also, Christian proposed in a previous mail to replace the G1 barrier > stubs on SPARC with simple runtime calls like on other platforms. > While I think that it is probably worthwhile thinking about such a > change, I don't know the exact history of these stubs and probably > some GC experts should decide if that's really a good idea. I'd be > happy to open an extra issue for following up on that path. > > But for the moments I've simply added a new initialization step > "g1_barrier_stubs_init()" between 'univers_init()' and > interpreter_init() which is empty on all platforms except SPARC where > it generates the corresponding stubs: > > http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v3/ > > I've built and smoke-tested the new change on Windows, MacOS, > Solaris/SPARC, AIX, Linux/x86_64/ppc64/ppc64le/s390. Unfortunately I > don't have access to ARM machines so I couldn't check arm,arm64 and > aarch64 although I don't expect any problems there (actually I've just > added an empty method there). But it would be great if somebody could > check that for any case. > > @Vladimir: I've also rebased the change for "8187091: > ReturnBlobToWrongHeapTest fails because of problems in > CodeHeap::contains_blob()": > > http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091.v2/ > > Because it changes the same files like 8166317 it should be applied > and pushed only after 8166317 was pushed. > > Thank you and best regards, > Volker > > > -- > > Andrew Haley > > Java Platform Lead Engineer > > Red Hat UK Ltd. > > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From kim.barrett at oracle.com Mon Oct 16 23:29:06 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 16 Oct 2017 19:29:06 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <5a787ec8-afe6-b8a6-23de-5d6a5b935035@oracle.com> References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> <49b7c5f7-2f6d-16ac-0b60-140619d0fffd@oracle.com> <0784FA88-3D00-4DBA-8726-3A3B23C91B3E@oracle.com> <2f32124d-2428-678d-ef50-3306231aa848@oracle.com> <5a787ec8-afe6-b8a6-23de-5d6a5b935035@oracle.com> Message-ID: <8DB3C54F-EA41-4F08-A2DB-839A577A2A55@oracle.com> > On Oct 16, 2017, at 11:59 AM, coleen.phillimore at oracle.com wrote: > > > The latest incremental based on these comments (now running tier1). > http://cr.openjdk.java.net/~coleenp/8188220.review-comments.02/webrev/index.html > > plus what Roman sent in the "RFR: 8189333: Fix Zero build after Atomic::xchg changes" thread. Looks good. 
I?ll file an RFR for replace_if_null From coleen.phillimore at oracle.com Tue Oct 17 00:45:03 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 16 Oct 2017 20:45:03 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <2adcda24-1386-b5ee-81d3-2e4604b0f4d5@oracle.com> References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> <49b7c5f7-2f6d-16ac-0b60-140619d0fffd@oracle.com> <0784FA88-3D00-4DBA-8726-3A3B23C91B3E@oracle.com> <2f32124d-2428-678d-ef50-3306231aa848@oracle.com> <5a787ec8-afe6-b8a6-23de-5d6a5b935035@oracle.com> <2adcda24-1386-b5ee-81d3-2e4604b0f4d5@oracle.com> Message-ID: <06cb2315-a5ae-5c3d-365b-c24bbf0e5bdb@oracle.com> Thanks David! Coleen On 10/16/17 5:58 PM, David Holmes wrote: > Seems okay. > > Thanks, > David > > On 17/10/2017 1:59 AM, coleen.phillimore at oracle.com wrote: >> >> The latest incremental based on these comments (now running tier1). >> http://cr.openjdk.java.net/~coleenp/8188220.review-comments.02/webrev/index.html >> >> >> plus what Roman sent in the "RFR: 8189333: Fix Zero build after >> Atomic::xchg changes" thread. >> >> thanks, >> Coleen >> >> On 10/16/17 9:13 AM, coleen.phillimore at oracle.com wrote: >>> >>> >>> On 10/14/17 7:36 PM, Kim Barrett wrote: >>>>> On Oct 13, 2017, at 2:34 PM, coleen.phillimore at oracle.com wrote: >>>>> >>>>> >>>>> Hi, Here is the version with the changes from Kim's comments that >>>>> has passed at least testing with JPRT and tier1, locally.?? More >>>>> testing (tier2-5) is in progress. >>>>> >>>>> Also includes a corrected version of Atomic::sub care of Erik >>>>> Osterlund. >>>>> >>>>> open webrev at >>>>> http://cr.openjdk.java.net/~coleenp/8188220.kim-review-changes/webrev >>>>> open webrev at >>>>> http://cr.openjdk.java.net/~coleenp/8188220.review-comments/webrev >>>>> >>>>> Full version: >>>>> >>>>> http://cr.openjdk.java.net/~coleenp/8188220.03/webrev >>>>> >>>>> Thanks! >>>>> Coleen >>>> I still dislike and disagree with what is being proposed regarding >>>> replace_if_null. >>> >>> We can discuss that seperately, please file an RFE. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> I forgot that I'd promised you an updated Atomic::sub definition. >>>> Unfortunately, the new one still has problems, performing some >>>> conversions that should not be permitted (and are disallowed by >>>> Atomic::add).? Try this instead.? (This hasn't been tested, not even >>>> compiled; hopefully I don't have any typos or anything.) The intent >>>> is that this supports the same conversions as Atomic::add. >>>> >>>> template >>>> inline D Atomic::sub(I sub_value, D volatile* dest) { >>>> ?? STATIC_ASSERT(IsPointer::value || IsIntegral::value); >>>> ?? STATIC_ASSERT(IsIntegral::value); >>>> ?? // If D is a pointer type, use [u]intptr_t as the addend type, >>>> ?? // matching signedness of I.? Otherwise, use D as the addend type. >>>> ?? typedef typename Conditional::value, intptr_t, >>>> uintptr_t>::type PI; >>>> ?? typedef typename Conditional::value, PI, D>::type >>>> AddendType; >>>> ?? // Only allow conversions that can't change the value. >>>> ?? STATIC_ASSERT(IsSigned::value == IsSigned::value); >>>> ?? STATIC_ASSERT(sizeof(I) <= sizeof(AddendType)); >>>> ?? AddendType addend = sub_value; >>>> ?? // Assumes two's complement integer representation. >>>> ?? #pragma warning(suppress: 4146) // In case AddendType is not >>>> signed. >>>> ?? 
return Atomic::add(-addend, dest); >>>> } >>> >>> Uh, Ok.? I'll try it out. >>>> >>>>>>> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp >>>>>>> 7960?? Atomic::add(-n, &_num_par_pushes); >>>>>>> >>>>>>> Atomic::sub >>>>>> fixed. >>>> Nope, not fixed in >>>> http://cr.openjdk.java.net/~coleenp/8188220.03/webrev >>> >>> Missed it twice now.? I think I have it now. >>>>>>> src/hotspot/share/gc/g1/heapRegionRemSet.cpp >>>>>>> ?? 200?????? PerRegionTable* res = >>>>>>> ?? 201???????? Atomic::cmpxchg(nxt, &_free_list, fl); >>>>>>> >>>>>>> Please remove the line break, now that the code has been >>>>>>> simplified. >>>>>>> >>>>>>> But wait, doesn't this alloc exhibit classic ABA problems?? I >>>>>>> *think* >>>>>>> this works because alloc and bulk_free are called in different >>>>>>> phases, >>>>>>> never overlapping. >>>>>> I don't know.? Do you want to file a bug to investigate this? >>>>>> fixed. >>>> No, I now think it?s ok, though confusing. >>>> >>>>>>> src/hotspot/share/gc/g1/sparsePRT.cpp >>>>>>> ?? 295???? SparsePRT* res = >>>>>>> ?? 296?????? Atomic::cmpxchg(sprt, &_head_expanded_list, hd); >>>>>>> and >>>>>>> ?? 307???? SparsePRT* res = >>>>>>> ?? 308?????? Atomic::cmpxchg(next, &_head_expanded_list, hd); >>>>>>> >>>>>>> I'd rather not have the line breaks in these either. >>>>>>> >>>>>>> And get_from_expanded_list also appears to have classic ABA >>>>>>> problems. >>>>>>> I *think* this works because add_to_expanded_list and >>>>>>> get_from_expanded_list are called in different phases, never >>>>>>> overlapping. >>>>>> Fixed, same question as above?? Or one bug to investigate both? >>>> Again, I think it?s ok, though confusing. >>>> >>>>>>> src/hotspot/share/gc/shared/taskqueue.inline.hpp >>>>>>> ?? 262?? return (size_t) Atomic::cmpxchg((intptr_t)new_age._data, >>>>>>> ?? 263?????????????????????????????????? (volatile intptr_t >>>>>>> *)&_data, >>>>>>> ?? 264 (intptr_t)old_age._data); >>>>>>> >>>>>>> This should be >>>>>>> >>>>>>> ??? return Atomic::cmpxchg(new_age._data, &_data, old_age._data); >>>>>> fixed. >>>> Still casting the result. >>> >>> I thought I fixed it.? I think I fixed it now. >>>> >>>>>>> src/hotspot/share/oops/method.hpp >>>>>>> ?? 139?? volatile address from_compiled_entry() const?? { return >>>>>>> OrderAccess::load_acquire(&_from_compiled_entry); } >>>>>>> ?? 140?? volatile address from_compiled_entry_no_trampoline() >>>>>>> const; >>>>>>> ?? 141?? volatile address from_interpreted_entry() const{ return >>>>>>> OrderAccess::load_acquire(&_from_interpreted_entry); } >>>>>>> >>>>>>> [pre-existing] >>>>>>> The volatile qualifiers here seem suspect to me. >>>>>> Again much suspicion about concurrency and giant pain, which I >>>>>> remember, of debugging these when they were broken. >>>> Let me be more direct: the volatile qualifiers for the function return >>>> types are bogus and confusing, and should be removed. >>> >>> Okay, sure. >>> >>>> >>>>>>> src/hotspot/share/prims/jni.cpp >>>>>>> >>>>>>> [pre-existing] >>>>>>> >>>>>>> copy_jni_function_table should be using >>>>>>> Copy::disjoint_words_atomic. >>>>>> yuck. >>>> Of course, neither is entirely technically correct, since both are >>>> treating conversion of function pointers to void* as okay in shared >>>> code, e.g. violating some of the raison d'etre of >>>> CAST_{TO,FROM}_FN_PTR. 
>>>> For way more detail than you probably care about, see the discussion >>>> starting here: >>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-March/018578.html >>>> >>>> through (5 messages in total) >>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-March/018623.html >>>> >>>> >>>> Oh well. >>>> >>>>>>> src/hotspot/share/runtime/mutex.hpp >>>>>>> >>>>>>> [pre-existing] >>>>>>> >>>>>>> I think the Address member of the SplitWord union is unused. >>>>>>> Looking >>>>>>> at AcquireOrPush (and others), I'm wondering whether it *should* be >>>>>>> used there, or whether just using intptr_t casts and doing integral >>>>>>> arithmetic (as is presently being done) is easier and clearer. >>>>>>> >>>>>>> Also the _LSBINDEX macro probably ought to be defined in mutex.cpp >>>>>>> rather than polluting the global namespace.? And technically, that >>>>>>> name is reserved word. >>>>>> I moved both this and _LBIT into the top of mutex.cpp since they >>>>>> are used there. >>>> Good. >>>> >>>>>> Cant define const intptr_t _LBIT =1; in a class in our version of >>>>>> C++. >>>> Sorry, please explain?? If you tried to move it into SplitWord, >>>> that doesn?t work; >>>> unions are not permitted to have static data members (I don?t >>>> off-hand know why, >>>> just that it?s explicitly forbidden). >>>> >>>> And you left the seemingly unused Address member in SplitWord. >>> >>> This is the compilation error I get: >>> >>> /scratch/cphillim/hg/10ptr2/open/src/hotspot/share/runtime/mutex.hpp:124:33: >>> error: non-static data member initializers only available with >>> -std=c++11 or -std=gnu++11 [-Werror] >>> ?? const intptr_t _NEW_LOCKBIT = 1; >>> >>> >>> I don't own this SplitWord code so do not want to remove the unused >>> Address member. >>> >>>> >>>>>>> src/hotspot/share/runtime/thread.cpp >>>>>>> 4707?? intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, >>>>>>> (intptr_t)0); >>>>>>> >>>>>>> This and other places suggest LOCKBIT should be defined as >>>>>>> intptr_t, >>>>>>> rather than as an enum value.? The MuxBits enum type is unused. >>>>>>> >>>>>>> And the cast of 0 is another case where implicit widening would >>>>>>> be nice. >>>>>> Making LOCKBIT a const intptr_t = 1 removes a lot of casts. >>>> Because of the new definition of LOCKBIT I noticed the immediately >>>> preceeding typedef for MutexT, which seems to be unused. >>> >>> Removed MutexT. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> src/hotspot/share/oops/cpCache.cpp >>>> ? 114 bool ConstantPoolCacheEntry::init_flags_atomic(intx flags) { >>>> ? 115?? intptr_t result = Atomic::cmpxchg(flags, &_flags, (intx)0); >>>> ? 116?? return (result == 0); >>>> ? 117 } >>>> >>>> [I missed this on earlier pass.] >>>> >>>> Should be >>>> >>>> bool ConstantPoolCacheEntry::init_flags_atomic(intx flags) { >>>> ?? return Atomic::cmpxchg(flags, &_flags, (intx)0) == 0; >>>> } >>>> >>>> Otherwise, I end up asking why result is intptr_t when the cmpxchg is >>>> dealing with intx.? Yeah, one's a typedef of the other, but mixing >>>> them like that in the same expression is not helpful. >>>> >>>> >>> Sure why not? >>> >>> Actually init_flags_atomic is not used and neither is >>> init_method_flags_atomic so I did one better and removed them. >>> >>> Thanks for the again thorough code review and Atomic::sub. I'll post >>> incremental when it compiles. 
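(Aside, for readers puzzling over the -std=c++11 diagnostic quoted above: a minimal sketch of the two placements being discussed. Only the name _LBIT comes from the thread; the struct and helper names are invented, and this is not code from the webrev.)

#include <stdint.h>

// (a) In-class initializer: needs C++11 non-static data member initializers,
//     which is exactly the diagnostic quoted above.
//
//     struct SplitWordish {
//       const intptr_t _LBIT = 1;   // rejected at the pre-C++11 language level
//     };

// (b) The placement that works with the older language level: a file-scope
//     constant at the top of mutex.cpp, visible only to the code that uses it.
static const intptr_t _LBIT = 1;

// Invented helper, only to show the constant applied to a lock word.
static inline bool lock_bit_is_set(intptr_t word) {
  return (word & _LBIT) != 0;
}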
>>> >>> Coleen >> From coleen.phillimore at oracle.com Tue Oct 17 00:46:07 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 16 Oct 2017 20:46:07 -0400 Subject: RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot In-Reply-To: <8DB3C54F-EA41-4F08-A2DB-839A577A2A55@oracle.com> References: <7A475565-84D9-4F98-AE7B-2FDB206CC6E1@oracle.com> <49b7c5f7-2f6d-16ac-0b60-140619d0fffd@oracle.com> <0784FA88-3D00-4DBA-8726-3A3B23C91B3E@oracle.com> <2f32124d-2428-678d-ef50-3306231aa848@oracle.com> <5a787ec8-afe6-b8a6-23de-5d6a5b935035@oracle.com> <8DB3C54F-EA41-4F08-A2DB-839A577A2A55@oracle.com> Message-ID: Thanks, Kim. Coleen On 10/16/17 7:29 PM, Kim Barrett wrote: >> On Oct 16, 2017, at 11:59 AM, coleen.phillimore at oracle.com wrote: >> >> >> The latest incremental based on these comments (now running tier1). >> http://cr.openjdk.java.net/~coleenp/8188220.review-comments.02/webrev/index.html >> >> plus what Roman sent in the "RFR: 8189333: Fix Zero build after Atomic::xchg changes" thread. > Looks good. > > I?ll file an RFR for replace_if_null > From nils.eliasson at oracle.com Tue Oct 17 14:37:09 2017 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 17 Oct 2017 16:37:09 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> Message-ID: Hi Robbin, I have reviewed the compiler parts of the patch - c1, c2, jvmci and cpu*. Look great! Regards, Nils On 2017-10-11 15:37, Robbin Ehn wrote: > Hi all, > > Starting the review of the code while JEP work is still not completed. > > JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 > > This JEP introduces a way to execute a callback on threads without > performing a global VM safepoint. It makes it both possible and cheap > to stop individual threads and not just all threads or none. > > Entire changeset: > http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ > > Divided into 3-parts, > SafepointMechanism abstraction: > http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ > Consolidating polling page allocation: > http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ > Handshakes: > http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ > > A handshake operation is a callback that is executed for each > JavaThread while that thread is in a safepoint safe state. The > callback is executed either by the thread itself or by the VM thread > while keeping the thread in a blocked state. The big difference > between safepointing and handshaking is that the per thread operation > will be performed on all threads as soon as possible and they will > continue to execute as soon as it?s own operation is completed. If a > JavaThread is known to be running, then a handshake can be performed > with that single JavaThread as well. > > The current safepointing scheme is modified to perform an indirection > through a per-thread pointer which will allow a single thread's > execution to be forced to trap on the guard page. In order to force a > thread to yield the VM updates the per-thread pointer for the > corresponding thread to point to the guarded page. 
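(To make the indirection just described concrete, here is a small conceptual sketch. All names are invented and the guard page is modelled by an ordinary variable; in the VM it is a page that has been protected so that the load faults.)

#include <stdint.h>

static uintptr_t readable_word;   // stands in for the always-readable polling page
static uintptr_t guard_word;      // stands in for the protected page the VM really uses

struct JavaThreadish {
  volatile uintptr_t* _polling_word;   // per-thread pointer read by every poll
};

// The VM points one thread at the guard word; in the real scheme only that
// thread's next poll will then fault.
void arm(JavaThreadish* t)    { t->_polling_word = &guard_word; }
void disarm(JavaThreadish* t) { t->_polling_word = &readable_word; }

// What the emitted poll conceptually does: one extra load (the per-thread
// indirection) followed by the usual read of the polling page.
void poll(JavaThreadish* t) {
  volatile uintptr_t* page = t->_polling_word;
  (void)*page;   // in the VM this read faults when armed and enters the safepoint/handshake path
}

(That extra load through the per-thread pointer is the "load vs load load" cost mentioned in the performance numbers further down in the quoted text.)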
> > Example of potential use-cases: > -Biased lock revocation > -External requests for stack traces > -Deoptimization > -Async exception delivery > -External suspension > -Eliding memory barriers > > All of these will benefit the VM moving towards becoming more > low-latency friendly by reducing the number of global safepoints. > Platforms that do not yet implement the per JavaThread poll, a > fallback to normal safepoint is in place. HandshakeOneThread will then > be a normal safepoint. The supported platforms are Linux x64 and > Solaris SPARC. > > Tested heavily with various test suits and comes with a few new tests. > > Performance testing using standardized benchmark show no signification > changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris > SPARC (not statistically ensured). A minor regression for the load vs > load load on x64 is expected and a slight increase on SPARC due to the > cost of ?materializing? the page vs load load. > The time to trigger a safepoint was measured on a large machine to not > be an issue. The looping over threads and arming the polling page will > benefit from the work on JavaThread life-cycle (8167108 - SMR and > JavaThread Lifecycle: > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) > which puts all JavaThreads in an array instead of a linked list. > > Thanks, Robbin From erik.osterlund at oracle.com Tue Oct 17 15:30:30 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Tue, 17 Oct 2017 17:30:30 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> Message-ID: <59E62216.5070401@oracle.com> Hi Robbin, Looks fantastic. Thanks, /Erik On 2017-10-11 15:37, Robbin Ehn wrote: > Hi all, > > Starting the review of the code while JEP work is still not completed. > > JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 > > This JEP introduces a way to execute a callback on threads without > performing a global VM safepoint. It makes it both possible and cheap > to stop individual threads and not just all threads or none. > > Entire changeset: > http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ > > Divided into 3-parts, > SafepointMechanism abstraction: > http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ > Consolidating polling page allocation: > http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ > Handshakes: > http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ > > A handshake operation is a callback that is executed for each > JavaThread while that thread is in a safepoint safe state. The > callback is executed either by the thread itself or by the VM thread > while keeping the thread in a blocked state. The big difference > between safepointing and handshaking is that the per thread operation > will be performed on all threads as soon as possible and they will > continue to execute as soon as it?s own operation is completed. If a > JavaThread is known to be running, then a handshake can be performed > with that single JavaThread as well. > > The current safepointing scheme is modified to perform an indirection > through a per-thread pointer which will allow a single thread's > execution to be forced to trap on the guard page. In order to force a > thread to yield the VM updates the per-thread pointer for the > corresponding thread to point to the guarded page. 
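(As a rough illustration of the global-safepoint versus handshake distinction drawn above; this is a toy model and none of these names exist in the webrev.)

#include <vector>
#include <cstddef>

struct ToyThread { bool stopped; };

typedef void (*ToyOp)(ToyThread*);

// Global safepoint: every thread stays stopped until the operation has run
// on all of them, and they all resume together.
void at_toy_safepoint(std::vector<ToyThread*>& threads, ToyOp op) {
  for (std::size_t i = 0; i < threads.size(); i++) threads[i]->stopped = true;   // stop the world
  for (std::size_t i = 0; i < threads.size(); i++) op(threads[i]);               // run the operation everywhere
  for (std::size_t i = 0; i < threads.size(); i++) threads[i]->stopped = false;  // everyone resumes together
}

// Handshake: a single known-running thread can be targeted on its own, and it
// resumes as soon as its own callback is done.
void toy_handshake(ToyThread* t, ToyOp op) {
  t->stopped = true;    // target reaches a safepoint-safe state
  op(t);                // run by the thread itself or by the VM thread on its behalf
  t->stopped = false;   // resumes at once; no other thread was ever stopped
}

(The point of the second function is that only the targeted thread pays for the stop, which is what makes the per-thread use-cases listed above cheap.)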
> > Example of potential use-cases: > -Biased lock revocation > -External requests for stack traces > -Deoptimization > -Async exception delivery > -External suspension > -Eliding memory barriers > > All of these will benefit the VM moving towards becoming more > low-latency friendly by reducing the number of global safepoints. > Platforms that do not yet implement the per JavaThread poll, a > fallback to normal safepoint is in place. HandshakeOneThread will then > be a normal safepoint. The supported platforms are Linux x64 and > Solaris SPARC. > > Tested heavily with various test suits and comes with a few new tests. > > Performance testing using standardized benchmark show no signification > changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris > SPARC (not statistically ensured). A minor regression for the load vs > load load on x64 is expected and a slight increase on SPARC due to the > cost of ?materializing? the page vs load load. > The time to trigger a safepoint was measured on a large machine to not > be an issue. The looping over threads and arming the polling page will > benefit from the work on JavaThread life-cycle (8167108 - SMR and > JavaThread Lifecycle: > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) > which puts all JavaThreads in an array instead of a linked list. > > Thanks, Robbin From vladimir.kozlov at oracle.com Tue Oct 17 17:49:10 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Oct 2017 10:49:10 -0700 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: References: <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com> <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com> <5b5219a5-960e-363b-2bdc-3613f1dae62c@oracle.com> <4109f960-078f-e582-3c78-71f201a265fd@redhat.com> Message-ID: <1c2eeaa1-334a-4744-ba31-87e580faafa5@oracle.com> Hi, Volker You can do a trick with NOT_SPARC() macro to avoid defining empty method on all platforms: +#if INCLUDE_ALL_GCS +void g1_barrier_stubs_init() NOT_SPARC( {} ); // depends on universe_init, must be before interpreter_init +#endif I thought we pushed 8187091 already. I will keep it in mind. Thanks, Vladimir On 10/10/17 10:17 AM, Volker Simonis wrote: > On Tue, Oct 10, 2017 at 9:42 AM, Andrew Haley wrote: >> On 09/10/17 20:24, Volker Simonis wrote: >>> Unfortunately we can't easily generate these stubs during >>> 'stubRoutines_init1()' because >>> 'generate_dirty_card_log_enqueue_if_necessary()' needs the byte map >>> base address which is only initialized in >>> 'CardTableModRefBS::initialize()' during 'univers_init()' which >>> happens after 'stubRoutines_init1()'. >> >> Yes you can, you can do something like we do for narrow_ptrs_base: >> >> if (Universe::is_fully_initialized()) { >> mov(rheapbase, Universe::narrow_ptrs_base()); >> } else { >> lea(rheapbase, ExternalAddress((address)Universe::narrow_ptrs_base_addr())); >> ldr(rheapbase, Address(rheapbase)); >> } >> > > Hi Andrew, > > thanks for your suggestion. Yes, I could do that, but that would > replace a constant load in the barrier with a constant load plus a > load from memory, because during stubRoutines_init1() heap won't be > initialized. Not sure about this, but I think we want to avoid this > overhead in the barriers. > > Also, Christian proposed in a previous mail to replace the G1 barrier > stubs on SPARC with simple runtime calls like on other platforms. 
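(For readers unfamiliar with the NOT_SPARC() trick suggested above, a simplified sketch of how those convenience macros behave; the real definitions live in HotSpot's utilities/macros.hpp and key off the CPU define set by the build.)

#ifdef SPARC
#define SPARC_ONLY(code) code
#define NOT_SPARC(code)
#else
#define SPARC_ONLY(code)
#define NOT_SPARC(code) code
#endif

// The single shared line
//   void g1_barrier_stubs_init() NOT_SPARC( {} );
// expands to an empty inline definition on every platform except SPARC,
// where it is only a declaration and the stub-generating definition is
// supplied by the SPARC-specific sources.
void g1_barrier_stubs_init() NOT_SPARC( {} );

SPARC_ONLY(void g1_barrier_stubs_init() { /* generate the G1 barrier stubs here */ })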
> While I think that it is probably worthwhile thinking about such a > change, I don't know the exact history of these stubs and probably > some GC experts should decide if that's really a good idea. I'd be > happy to open an extra issue for following up on that path. > > But for the moments I've simply added a new initialization step > "g1_barrier_stubs_init()" between 'univers_init()' and > interpreter_init() which is empty on all platforms except SPARC where > it generates the corresponding stubs: > > http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v3/ > > I've built and smoke-tested the new change on Windows, MacOS, > Solaris/SPARC, AIX, Linux/x86_64/ppc64/ppc64le/s390. Unfortunately I > don't have access to ARM machines so I couldn't check arm,arm64 and > aarch64 although I don't expect any problems there (actually I've just > added an empty method there). But it would be great if somebody could > check that for any case. > > @Vladimir: I've also rebased the change for "8187091: > ReturnBlobToWrongHeapTest fails because of problems in > CodeHeap::contains_blob()": > > http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091.v2/ > > Because it changes the same files like 8166317 it should be applied > and pushed only after 8166317 was pushed. > > Thank you and best regards, > Volker > >> -- >> Andrew Haley >> Java Platform Lead Engineer >> Red Hat UK Ltd. >> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Tue Oct 17 17:58:47 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 17 Oct 2017 17:58:47 +0000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> Message-ID: <15dd917732444959b7785efbe6640952@sap.com> Hi Robbin, my first impression is very good. Thanks for providing the webrev. I only don't like that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one or the other mechanism. Would it be ok to move the decision between what to use to platform code? (Some platforms could still use both if this is beneficial.) E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. Would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion. Best regards, Martin -----Original Message----- From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Robbin Ehn Sent: Mittwoch, 11. Oktober 2017 15:38 To: hotspot-dev developers Subject: RFR(XL): 8185640: Thread-local handshakes Hi all, Starting the review of the code while JEP work is still not completed. JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not just all threads or none. Entire changeset: http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ Divided into 3-parts, SafepointMechanism abstraction: http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ Consolidating polling page allocation: http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ Handshakes: http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. 
The callback is executed either by the thread itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a handshake can be performed with that single JavaThread as well. The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. Example of potential use-cases: -Biased lock revocation -External requests for stack traces -Deoptimization -Async exception delivery -External suspension -Eliding memory barriers All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported platforms are Linux x64 and Solaris SPARC. Tested heavily with various test suits and comes with a few new tests. Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all JavaThreads in an array instead of a linked list. Thanks, Robbin From coleen.phillimore at oracle.com Tue Oct 17 18:18:42 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 17 Oct 2017 14:18:42 -0400 Subject: Result: New hotspot Group Member: Ioi Lam Message-ID: <7331f8aa-6396-6a62-069e-b13ebc12c8d3@oracle.com> The vote for Ioi Lam [1] is now closed. Yes: 10 Veto: 0 Abstain: 0 According to the Bylaws definition of Lazy Consensus, this is sufficient to approve the nomination. Coleen Phillimore [1] http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-October/028480.html From vladimir.kozlov at oracle.com Tue Oct 17 18:30:22 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 17 Oct 2017 11:30:22 -0700 Subject: RFR: Newer AMD 17h (EPYC) Processor family defaults In-Reply-To: <886a112d-fc55-34d5-6e70-1e6a78cf1b0f@oracle.com> References: <4d4fe028-ea6a-4f77-ab69-5c2bc752e1f5@oracle.com> <47bc0a90-ed6a-220a-c3d1-b4df2d8bbc74@oracle.com> <9c53f889-e58e-33ac-3c05-874779b469d6@oracle.com> <45619e1a-9eb0-a540-193b-5187da3bf6bc@oracle.com> <66e4af43-c0e2-6d64-b69f-35166150ffa2@oracle.com> <11af0f62-ba6b-d533-d23c-750d2ca012c7@oracle.com> <886a112d-fc55-34d5-6e70-1e6a78cf1b0f@oracle.com> Message-ID: Nils, I would like to review you changes as separate bug in separate thread. I don't like your current changes and want to discuss them. Please, send separate RFR. 
Thanks, Vladimir On 10/16/17 7:26 AM, Nils Eliasson wrote: > Hi, > > I ran into a problem touching this area, so I'm hijacking this thread. > > > #ifdef COMPILER2 > > - if (MaxVectorSize > 16) { > > - // Limit vectors size to 16 bytes on current AMD cpus. > >> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >> ?????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >> ???? } >> ?#endif // COMPILER2 > > > The limitation of MaxVecorSize to 16 for some processors in this code > has the sideeffect that the TypeVect::VECTY and mreg2type[Op_VecY] won't > be initalized even though the platform has the capability. > > Type.cpp:~660 > > [...] > >?? if (Matcher::vector_size_supported(T_FLOAT,4)) { > >???? TypeVect::VECTX = TypeVect::make(T_FLOAT,4); > >?? } > >?? if (Matcher::vector_size_supported(T_FLOAT,8)) { > >???? TypeVect::VECTY = TypeVect::make(T_FLOAT,8); > >?? } > >?? if (Matcher::vector_size_supported(T_FLOAT,16)) { > >???? TypeVect::VECTZ = TypeVect::make(T_FLOAT,16); > >?? } > [...] > >?? mreg2type[Op_VecX] = TypeVect::VECTX; > >?? mreg2type[Op_VecY] = TypeVect::VECTY; > >?? mreg2type[Op_VecZ] = TypeVect::VECTZ; > > In the ad-files feature flags (UseAVX etc.) are used to control what > rules should be matched if it has effects on specific vector registers. > Here we have a mismatch. > > On a platform that supports AVX2 but have MaxVectorSize limited to 16, > the VM will fail in regalloc when the TypeVect::VECTY/mreg2type[Op_VecY] > is uninitalized, we will also hit asserts in a few places like: > assert(Matcher::vector_size_supported(T_FLOAT,RegMask::SlotsPerVecY), > "sanity"); > > Shouldn't the type initalization in type.cpp be dependent on feature > flag (UseAVX etc.) instead of MaxVectorLength? (The type for the vector > registers are initalized if the platform supports them, but they might > not be used if MaxVectorSize is limited.) > > I suggest something like this: > > http://cr.openjdk.java.net/~neliasso/maxvectorsize/webrev/ > > I will open a bug and and a separate RFR if this seems reasonable to you. > > Regards, > Nils Eliasson > > On 2017-09-22 09:41, Rohit Arul Raj wrote: >> Thanks Vladimir, >> >> On Wed, Sep 20, 2017 at 10:07 PM, Vladimir Kozlov >> wrote: >>>> ?????? __ cmpl(rax, 0x80000000);???? // Is cpuid(0x80000001) supported? >>>> ?????? __ jcc(Assembler::belowEqual, done); >>>> ?????? __ cmpl(rax, 0x80000004);???? // Is cpuid(0x80000005) supported? >>>> -??? __ jccb(Assembler::belowEqual, ext_cpuid1); >>>> +?? __ jcc(Assembler::belowEqual, ext_cpuid1); >>> >>> Good. You may need to increase size of the buffer too (to be safe) to >>> 1100: >>> >>> static const int stub_size = 1000; >>> >> Please find the updated patch after the requested change. >> >> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >> b/src/cpu/x86/vm/vm_version_x86.cpp >> --- a/src/cpu/x86/vm/vm_version_x86.cpp >> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >> @@ -46,7 +46,7 @@ >> ? address VM_Version::_cpuinfo_cont_addr = 0; >> >> ? static BufferBlob* stub_blob; >> -static const int stub_size = 1000; >> +static const int stub_size = 1100; >> >> ? extern "C" { >> ??? typedef void (*get_cpu_info_stub_t)(void*); >> @@ -70,7 +70,7 @@ >> ????? bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >> >> ????? Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; >> -??? Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >> done, wrapup; >> +??? 
Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >> ext_cpuid8, done, wrapup; >> ????? Label legacy_setup, save_restore_except, legacy_save_restore, >> start_simd_check; >> >> ????? StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); >> @@ -267,14 +267,30 @@ >> ????? __ cmpl(rax, 0x80000000);???? // Is cpuid(0x80000001) supported? >> ????? __ jcc(Assembler::belowEqual, done); >> ????? __ cmpl(rax, 0x80000004);???? // Is cpuid(0x80000005) supported? >> -??? __ jccb(Assembler::belowEqual, ext_cpuid1); >> +??? __ jcc(Assembler::belowEqual, ext_cpuid1); >> ????? __ cmpl(rax, 0x80000006);???? // Is cpuid(0x80000007) supported? >> ????? __ jccb(Assembler::belowEqual, ext_cpuid5); >> ????? __ cmpl(rax, 0x80000007);???? // Is cpuid(0x80000008) supported? >> ????? __ jccb(Assembler::belowEqual, ext_cpuid7); >> +??? __ cmpl(rax, 0x80000008);???? // Is cpuid(0x80000009 and above) >> supported? >> +??? __ jccb(Assembler::belowEqual, ext_cpuid8); >> +??? __ cmpl(rax, 0x8000001E);???? // Is cpuid(0x8000001E) supported? >> +??? __ jccb(Assembler::below, ext_cpuid8); >> +??? // >> +??? // Extended cpuid(0x8000001E) >> +??? // >> +??? __ movl(rax, 0x8000001E); >> +??? __ cpuid(); >> +??? __ lea(rsi, Address(rbp, >> in_bytes(VM_Version::ext_cpuid1E_offset()))); >> +??? __ movl(Address(rsi, 0), rax); >> +??? __ movl(Address(rsi, 4), rbx); >> +??? __ movl(Address(rsi, 8), rcx); >> +??? __ movl(Address(rsi,12), rdx); >> + >> ????? // >> ????? // Extended cpuid(0x80000008) >> ????? // >> +??? __ bind(ext_cpuid8); >> ????? __ movl(rax, 0x80000008); >> ????? __ cpuid(); >> ????? __ lea(rsi, Address(rbp, >> in_bytes(VM_Version::ext_cpuid8_offset()))); >> @@ -1109,11 +1125,27 @@ >> ????? } >> >> ? #ifdef COMPILER2 >> -??? if (MaxVectorSize > 16) { >> -????? // Limit vectors size to 16 bytes on current AMD cpus. >> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >> ??????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >> ????? } >> ? #endif // COMPILER2 >> + >> +??? // Some defaults for AMD family 17h >> +??? if ( cpu_family() == 0x17 ) { >> +????? // On family 17h processors use XMM and UnalignedLoadStores for >> Array Copy >> +????? if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >> +??????? FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >> +????? } >> +????? if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >> +??????? FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >> +????? } >> +#ifdef COMPILER2 >> +????? if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >> +??????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >> +????? } >> +#endif >> +??? } >> ??? } >> >> ??? if( is_intel() ) { // Intel cpus specific settings >> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >> b/src/cpu/x86/vm/vm_version_x86.hpp >> --- a/src/cpu/x86/vm/vm_version_x86.hpp >> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >> @@ -228,6 +228,15 @@ >> ????? } bits; >> ??? }; >> >> +? union ExtCpuid1EEbx { >> +??? uint32_t value; >> +??? struct { >> +????? uint32_t????????????????? : 8, >> +?????????????? threads_per_core : 8, >> +??????????????????????????????? : 16; >> +??? } bits; >> +? }; >> + >> ??? union XemXcr0Eax { >> ????? uint32_t value; >> ????? struct { >> @@ -398,6 +407,12 @@ >> ????? ExtCpuid8Ecx ext_cpuid8_ecx; >> ????? uint32_t???? ext_cpuid8_edx; // reserved >> >> +??? // cpuid function 0x8000001E // AMD 17h >> +??? uint32_t????? ext_cpuid1E_eax; >> +??? 
ExtCpuid1EEbx ext_cpuid1E_ebx; // threads per core (AMD17h) >> +??? uint32_t????? ext_cpuid1E_ecx; >> +??? uint32_t????? ext_cpuid1E_edx; // unused currently >> + >> ????? // extended control register XCR0 (the XFEATURE_ENABLED_MASK >> register) >> ????? XemXcr0Eax?? xem_xcr0_eax; >> ????? uint32_t???? xem_xcr0_edx; // reserved >> @@ -505,6 +520,14 @@ >> ??????? result |= CPU_CLMUL; >> ????? if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >> ??????? result |= CPU_RTM; >> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> +?????? result |= CPU_ADX; >> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> +????? result |= CPU_BMI2; >> +??? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> +????? result |= CPU_SHA; >> +??? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> +????? result |= CPU_FMA; >> >> ????? // AMD features. >> ????? if (is_amd()) { >> @@ -518,16 +541,8 @@ >> ????? } >> ????? // Intel features. >> ????? if(is_intel()) { >> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >> -???????? result |= CPU_ADX; >> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >> -??????? result |= CPU_BMI2; >> -????? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >> -??????? result |= CPU_SHA; >> ??????? if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >> ????????? result |= CPU_LZCNT; >> -????? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >> -??????? result |= CPU_FMA; >> ??????? // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >> support for prefetchw >> ??????? if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >> ????????? result |= CPU_3DNOW_PREFETCH; >> @@ -590,6 +605,7 @@ >> ??? static ByteSize ext_cpuid5_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >> ??? static ByteSize ext_cpuid7_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >> ??? static ByteSize ext_cpuid8_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >> +? static ByteSize ext_cpuid1E_offset() { return >> byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } >> ??? static ByteSize tpl_cpuidB0_offset() { return >> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >> ??? static ByteSize tpl_cpuidB1_offset() { return >> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >> ??? static ByteSize tpl_cpuidB2_offset() { return >> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >> @@ -673,8 +689,12 @@ >> ????? if (is_intel() && supports_processor_topology()) { >> ??????? result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >> ????? } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >> -????? result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >> -?????????????? cores_per_cpu(); >> +????? if (cpu_family() >= 0x17) { >> +??????? result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + 1; >> +????? } else { >> +??????? result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >> +???????????????? cores_per_cpu(); >> +????? } >> ????? } >> ????? return (result == 0 ? 1 : result); >> ??? } >> >> Regards, >> Rohit >> >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>> @@ -70,7 +70,7 @@ >>>> ?????? bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >>>> >>>> ?????? Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; >>>> -??? Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>> done, wrapup; >>>> +??? Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>> ext_cpuid8, done, wrapup; >>>> ?????? 
Label legacy_setup, save_restore_except, legacy_save_restore, >>>> start_simd_check; >>>> >>>> ?????? StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); >>>> @@ -267,14 +267,30 @@ >>>> ?????? __ cmpl(rax, 0x80000000);???? // Is cpuid(0x80000001) supported? >>>> ?????? __ jcc(Assembler::belowEqual, done); >>>> ?????? __ cmpl(rax, 0x80000004);???? // Is cpuid(0x80000005) supported? >>>> -??? __ jccb(Assembler::belowEqual, ext_cpuid1); >>>> +??? __ jcc(Assembler::belowEqual, ext_cpuid1); >>>> ?????? __ cmpl(rax, 0x80000006);???? // Is cpuid(0x80000007) supported? >>>> ?????? __ jccb(Assembler::belowEqual, ext_cpuid5); >>>> ?????? __ cmpl(rax, 0x80000007);???? // Is cpuid(0x80000008) supported? >>>> ?????? __ jccb(Assembler::belowEqual, ext_cpuid7); >>>> +??? __ cmpl(rax, 0x80000008);???? // Is cpuid(0x80000009 and above) >>>> supported? >>>> +??? __ jccb(Assembler::belowEqual, ext_cpuid8); >>>> +??? __ cmpl(rax, 0x8000001E);???? // Is cpuid(0x8000001E) supported? >>>> +??? __ jccb(Assembler::below, ext_cpuid8); >>>> +??? // >>>> +??? // Extended cpuid(0x8000001E) >>>> +??? // >>>> +??? __ movl(rax, 0x8000001E); >>>> +??? __ cpuid(); >>>> +??? __ lea(rsi, Address(rbp, >>>> in_bytes(VM_Version::ext_cpuid1E_offset()))); >>>> +??? __ movl(Address(rsi, 0), rax); >>>> +??? __ movl(Address(rsi, 4), rbx); >>>> +??? __ movl(Address(rsi, 8), rcx); >>>> +??? __ movl(Address(rsi,12), rdx); >>>> + >>>> ?????? // >>>> ?????? // Extended cpuid(0x80000008) >>>> ?????? // >>>> +??? __ bind(ext_cpuid8); >>>> ?????? __ movl(rax, 0x80000008); >>>> ?????? __ cpuid(); >>>> ?????? __ lea(rsi, Address(rbp, >>>> in_bytes(VM_Version::ext_cpuid8_offset()))); >>>> @@ -1109,11 +1125,27 @@ >>>> ?????? } >>>> >>>> ?? #ifdef COMPILER2 >>>> -??? if (MaxVectorSize > 16) { >>>> -????? // Limit vectors size to 16 bytes on current AMD cpus. >>>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>> ???????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>> ?????? } >>>> ?? #endif // COMPILER2 >>>> + >>>> +??? // Some defaults for AMD family 17h >>>> +??? if ( cpu_family() == 0x17 ) { >>>> +????? // On family 17h processors use XMM and UnalignedLoadStores for >>>> Array Copy >>>> +????? if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>> +??????? FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>> +????? } >>>> +????? if (supports_sse2() && >>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>> +??????? FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>> +????? } >>>> +#ifdef COMPILER2 >>>> +????? if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>> +??????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>> +????? } >>>> +#endif >>>> +??? } >>>> ???? } >>>> >>>> ???? if( is_intel() ) { // Intel cpus specific settings >>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>> @@ -228,6 +228,15 @@ >>>> ?????? } bits; >>>> ???? }; >>>> >>>> +? union ExtCpuid1EEbx { >>>> +??? uint32_t value; >>>> +??? struct { >>>> +????? uint32_t????????????????? : 8, >>>> +?????????????? threads_per_core : 8, >>>> +??????????????????????????????? : 16; >>>> +??? } bits; >>>> +? }; >>>> + >>>> ???? union XemXcr0Eax { >>>> ?????? uint32_t value; >>>> ?????? struct { >>>> @@ -398,6 +407,12 @@ >>>> ?????? ExtCpuid8Ecx ext_cpuid8_ecx; >>>> ?????? uint32_t???? ext_cpuid8_edx; // reserved >>>> >>>> +??? 
// cpuid function 0x8000001E // AMD 17h >>>> +??? uint32_t????? ext_cpuid1E_eax; >>>> +??? ExtCpuid1EEbx ext_cpuid1E_ebx; // threads per core (AMD17h) >>>> +??? uint32_t????? ext_cpuid1E_ecx; >>>> +??? uint32_t????? ext_cpuid1E_edx; // unused currently >>>> + >>>> ?????? // extended control register XCR0 (the XFEATURE_ENABLED_MASK >>>> register) >>>> ?????? XemXcr0Eax?? xem_xcr0_eax; >>>> ?????? uint32_t???? xem_xcr0_edx; // reserved >>>> @@ -505,6 +520,14 @@ >>>> ???????? result |= CPU_CLMUL; >>>> ?????? if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>> ???????? result |= CPU_RTM; >>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> +?????? result |= CPU_ADX; >>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> +????? result |= CPU_BMI2; >>>> +??? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> +????? result |= CPU_SHA; >>>> +??? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> +????? result |= CPU_FMA; >>>> >>>> ?????? // AMD features. >>>> ?????? if (is_amd()) { >>>> @@ -518,16 +541,8 @@ >>>> ?????? } >>>> ?????? // Intel features. >>>> ?????? if(is_intel()) { >>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>> -???????? result |= CPU_ADX; >>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>> -??????? result |= CPU_BMI2; >>>> -????? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>> -??????? result |= CPU_SHA; >>>> ???????? if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>> ?????????? result |= CPU_LZCNT; >>>> -????? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>> -??????? result |= CPU_FMA; >>>> ???????? // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>> support for prefetchw >>>> ???????? if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>> ?????????? result |= CPU_3DNOW_PREFETCH; >>>> @@ -590,6 +605,7 @@ >>>> ???? static ByteSize ext_cpuid5_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >>>> ???? static ByteSize ext_cpuid7_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >>>> ???? static ByteSize ext_cpuid8_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >>>> +? static ByteSize ext_cpuid1E_offset() { return >>>> byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } >>>> ???? static ByteSize tpl_cpuidB0_offset() { return >>>> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >>>> ???? static ByteSize tpl_cpuidB1_offset() { return >>>> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >>>> ???? static ByteSize tpl_cpuidB2_offset() { return >>>> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >>>> @@ -673,8 +689,12 @@ >>>> ?????? if (is_intel() && supports_processor_topology()) { >>>> ???????? result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >>>> ?????? } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >>>> -????? result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>> -?????????????? cores_per_cpu(); >>>> +????? if (cpu_family() >= 0x17) { >>>> +??????? result = _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core >>>> + 1; >>>> +????? } else { >>>> +??????? result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>> +???????????????? cores_per_cpu(); >>>> +????? } >>>> ?????? } >>>> ?????? return (result == 0 ? 1 : result); >>>> ???? } >>>> >>>> Please let me know your comments. >>>> Thanks for your review. >>>> >>>> Regards, >>>> Rohit >>>> >>>>> >>>>> On 9/11/17 9:52 PM, Rohit Arul Raj wrote: >>>>>> >>>>>> Hello David, >>>>>> >>>>>>>> >>>>>>>> 1. ExtCpuid1EEx >>>>>>>> >>>>>>>> Should this be ExtCpuid1EEbx? 
(I see the naming here is somewhat >>>>>>>> inconsistent - and potentially confusing: I would have preferred to >>>>>>>> see >>>>>>>> things like ExtCpuid_1E_Ebx, to make it clear.) >>>>>>> >>>>>>> >>>>>>> Yes, I can change it accordingly. >>>>>>> >>>>>> I have attached the updated, re-tested patch as per your comments >>>>>> above. >>>>>> >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>> @@ -70,7 +70,7 @@ >>>>>> ??????? bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >>>>>> >>>>>> ??????? Label detect_486, cpu486, detect_586, std_cpuid1, std_cpuid4; >>>>>> -??? Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>>>> done, wrapup; >>>>>> +??? Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, ext_cpuid7, >>>>>> ext_cpuid8, done, wrapup; >>>>>> ??????? Label legacy_setup, save_restore_except, legacy_save_restore, >>>>>> start_simd_check; >>>>>> >>>>>> ??????? StubCodeMark mark(this, "VM_Version", "get_cpu_info_stub"); >>>>>> @@ -272,9 +272,23 @@ >>>>>> ??????? __ jccb(Assembler::belowEqual, ext_cpuid5); >>>>>> ??????? __ cmpl(rax, 0x80000007);???? // Is cpuid(0x80000008) >>>>>> supported? >>>>>> ??????? __ jccb(Assembler::belowEqual, ext_cpuid7); >>>>>> +??? __ cmpl(rax, 0x80000008);???? // Is cpuid(0x8000001E) supported? >>>>>> +??? __ jccb(Assembler::belowEqual, ext_cpuid8); >>>>>> +??? // >>>>>> +??? // Extended cpuid(0x8000001E) >>>>>> +??? // >>>>>> +??? __ movl(rax, 0x8000001E); >>>>>> +??? __ cpuid(); >>>>>> +??? __ lea(rsi, Address(rbp, >>>>>> in_bytes(VM_Version::ext_cpuid_1E_offset()))); >>>>>> +??? __ movl(Address(rsi, 0), rax); >>>>>> +??? __ movl(Address(rsi, 4), rbx); >>>>>> +??? __ movl(Address(rsi, 8), rcx); >>>>>> +??? __ movl(Address(rsi,12), rdx); >>>>>> + >>>>>> ??????? // >>>>>> ??????? // Extended cpuid(0x80000008) >>>>>> ??????? // >>>>>> +??? __ bind(ext_cpuid8); >>>>>> ??????? __ movl(rax, 0x80000008); >>>>>> ??????? __ cpuid(); >>>>>> ??????? __ lea(rsi, Address(rbp, >>>>>> in_bytes(VM_Version::ext_cpuid8_offset()))); >>>>>> @@ -1109,11 +1123,27 @@ >>>>>> ??????? } >>>>>> >>>>>> ??? #ifdef COMPILER2 >>>>>> -??? if (MaxVectorSize > 16) { >>>>>> -????? // Limit vectors size to 16 bytes on current AMD cpus. >>>>>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>> ????????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>> ??????? } >>>>>> ??? #endif // COMPILER2 >>>>>> + >>>>>> +??? // Some defaults for AMD family 17h >>>>>> +??? if ( cpu_family() == 0x17 ) { >>>>>> +????? // On family 17h processors use XMM and UnalignedLoadStores >>>>>> for >>>>>> Array Copy >>>>>> +????? if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>> +??????? FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>> +????? } >>>>>> +????? if (supports_sse2() && >>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>>> +??????? FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>> +????? } >>>>>> +#ifdef COMPILER2 >>>>>> +????? if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>> +??????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>> +????? } >>>>>> +#endif >>>>>> +??? } >>>>>> ????? } >>>>>> >>>>>> ????? 
if( is_intel() ) { // Intel cpus specific settings >>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>> @@ -228,6 +228,15 @@ >>>>>> ??????? } bits; >>>>>> ????? }; >>>>>> >>>>>> +? union ExtCpuid_1E_Ebx { >>>>>> +??? uint32_t value; >>>>>> +??? struct { >>>>>> +????? uint32_t????????????????? : 8, >>>>>> +?????????????? threads_per_core : 8, >>>>>> +??????????????????????????????? : 16; >>>>>> +??? } bits; >>>>>> +? }; >>>>>> + >>>>>> ????? union XemXcr0Eax { >>>>>> ??????? uint32_t value; >>>>>> ??????? struct { >>>>>> @@ -398,6 +407,12 @@ >>>>>> ??????? ExtCpuid8Ecx ext_cpuid8_ecx; >>>>>> ??????? uint32_t???? ext_cpuid8_edx; // reserved >>>>>> >>>>>> +??? // cpuid function 0x8000001E // AMD 17h >>>>>> +??? uint32_t??????? ext_cpuid_1E_eax; >>>>>> +??? ExtCpuid_1E_Ebx ext_cpuid_1E_ebx; // threads per core (AMD17h) >>>>>> +??? uint32_t??????? ext_cpuid_1E_ecx; >>>>>> +??? uint32_t??????? ext_cpuid_1E_edx; // unused currently >>>>>> + >>>>>> ??????? // extended control register XCR0 (the XFEATURE_ENABLED_MASK >>>>>> register) >>>>>> ??????? XemXcr0Eax?? xem_xcr0_eax; >>>>>> ??????? uint32_t???? xem_xcr0_edx; // reserved >>>>>> @@ -505,6 +520,14 @@ >>>>>> ????????? result |= CPU_CLMUL; >>>>>> ??????? if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>> ????????? result |= CPU_RTM; >>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> +?????? result |= CPU_ADX; >>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> +????? result |= CPU_BMI2; >>>>>> +??? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>> +????? result |= CPU_SHA; >>>>>> +??? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>> +????? result |= CPU_FMA; >>>>>> >>>>>> ??????? // AMD features. >>>>>> ??????? if (is_amd()) { >>>>>> @@ -518,16 +541,8 @@ >>>>>> ??????? } >>>>>> ??????? // Intel features. >>>>>> ??????? if(is_intel()) { >>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>> -???????? result |= CPU_ADX; >>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>> -??????? result |= CPU_BMI2; >>>>>> -????? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>> -??????? result |= CPU_SHA; >>>>>> ????????? if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>> ??????????? result |= CPU_LZCNT; >>>>>> -????? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>> -??????? result |= CPU_FMA; >>>>>> ????????? // for Intel, ecx.bits.misalignsse bit (bit 8) indicates >>>>>> support for prefetchw >>>>>> ????????? if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>> ??????????? result |= CPU_3DNOW_PREFETCH; >>>>>> @@ -590,6 +605,7 @@ >>>>>> ????? static ByteSize ext_cpuid5_offset() { return >>>>>> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >>>>>> ????? static ByteSize ext_cpuid7_offset() { return >>>>>> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >>>>>> ????? static ByteSize ext_cpuid8_offset() { return >>>>>> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >>>>>> +? static ByteSize ext_cpuid_1E_offset() { return >>>>>> byte_offset_of(CpuidInfo, ext_cpuid_1E_eax); } >>>>>> ????? static ByteSize tpl_cpuidB0_offset() { return >>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >>>>>> ????? static ByteSize tpl_cpuidB1_offset() { return >>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >>>>>> ????? static ByteSize tpl_cpuidB2_offset() { return >>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >>>>>> @@ -673,8 +689,11 @@ >>>>>> ??????? 
if (is_intel() && supports_processor_topology()) { >>>>>> ????????? result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >>>>>> ??????? } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >>>>>> -????? result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>>>> -?????????????? cores_per_cpu(); >>>>>> +????? if (cpu_family() >= 0x17) >>>>>> +??????? result = >>>>>> _cpuid_info.ext_cpuid_1E_ebx.bits.threads_per_core + >>>>>> 1; >>>>>> +????? else >>>>>> +??????? result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>>>> +???????????????? cores_per_cpu(); >>>>>> ??????? } >>>>>> ??????? return (result == 0 ? 1 : result); >>>>>> ????? } >>>>>> >>>>>> >>>>>> Please let me know your comments >>>>>> >>>>>> Thanks for your time. >>>>>> >>>>>> Regards, >>>>>> Rohit >>>>>> >>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>> >>>>>>>>> Reference: >>>>>>>>> >>>>>>>>> >>>>>>>>> https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf >>>>>>>>> >>>>>>>>> [Pg 82] >>>>>>>>> >>>>>>>>> ??????? CPUID_Fn8000001E_EBX [Core Identifiers] (CoreId) >>>>>>>>> ????????? 15:8 ThreadsPerCore: threads per core. Read-only. Reset: >>>>>>>>> XXh. >>>>>>>>> The number of threads per core is ThreadsPerCore+1. >>>>>>>>> >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>> @@ -70,7 +70,7 @@ >>>>>>>>> ???????? bool use_evex = FLAG_IS_DEFAULT(UseAVX) || (UseAVX > 2); >>>>>>>>> >>>>>>>>> ???????? Label detect_486, cpu486, detect_586, std_cpuid1, >>>>>>>>> std_cpuid4; >>>>>>>>> -??? Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, >>>>>>>>> ext_cpuid7, >>>>>>>>> done, wrapup; >>>>>>>>> +??? Label sef_cpuid, ext_cpuid, ext_cpuid1, ext_cpuid5, >>>>>>>>> ext_cpuid7, >>>>>>>>> ext_cpuid8, done, wrapup; >>>>>>>>> ???????? Label legacy_setup, save_restore_except, >>>>>>>>> legacy_save_restore, >>>>>>>>> start_simd_check; >>>>>>>>> >>>>>>>>> ???????? StubCodeMark mark(this, "VM_Version", >>>>>>>>> "get_cpu_info_stub"); >>>>>>>>> @@ -272,9 +272,23 @@ >>>>>>>>> ???????? __ jccb(Assembler::belowEqual, ext_cpuid5); >>>>>>>>> ???????? __ cmpl(rax, 0x80000007);???? // Is cpuid(0x80000008) >>>>>>>>> supported? >>>>>>>>> ???????? __ jccb(Assembler::belowEqual, ext_cpuid7); >>>>>>>>> +??? __ cmpl(rax, 0x80000008);???? // Is cpuid(0x8000001E) >>>>>>>>> supported? >>>>>>>>> +??? __ jccb(Assembler::belowEqual, ext_cpuid8); >>>>>>>>> +??? // >>>>>>>>> +??? // Extended cpuid(0x8000001E) >>>>>>>>> +??? // >>>>>>>>> +??? __ movl(rax, 0x8000001E); >>>>>>>>> +??? __ cpuid(); >>>>>>>>> +??? __ lea(rsi, Address(rbp, >>>>>>>>> in_bytes(VM_Version::ext_cpuid1E_offset()))); >>>>>>>>> +??? __ movl(Address(rsi, 0), rax); >>>>>>>>> +??? __ movl(Address(rsi, 4), rbx); >>>>>>>>> +??? __ movl(Address(rsi, 8), rcx); >>>>>>>>> +??? __ movl(Address(rsi,12), rdx); >>>>>>>>> + >>>>>>>>> ???????? // >>>>>>>>> ???????? // Extended cpuid(0x80000008) >>>>>>>>> ???????? // >>>>>>>>> +??? __ bind(ext_cpuid8); >>>>>>>>> ???????? __ movl(rax, 0x80000008); >>>>>>>>> ???????? __ cpuid(); >>>>>>>>> ???????? __ lea(rsi, Address(rbp, >>>>>>>>> in_bytes(VM_Version::ext_cpuid8_offset()))); >>>>>>>>> @@ -1109,11 +1123,27 @@ >>>>>>>>> ???????? } >>>>>>>>> >>>>>>>>> ???? #ifdef COMPILER2 >>>>>>>>> -??? if (MaxVectorSize > 16) { >>>>>>>>> -????? // Limit vectors size to 16 bytes on current AMD cpus. >>>>>>>>> +??? 
if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>> ?????????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>> ???????? } >>>>>>>>> ???? #endif // COMPILER2 >>>>>>>>> + >>>>>>>>> +??? // Some defaults for AMD family 17h >>>>>>>>> +??? if ( cpu_family() == 0x17 ) { >>>>>>>>> +????? // On family 17h processors use XMM and UnalignedLoadStores >>>>>>>>> for >>>>>>>>> Array Copy >>>>>>>>> +????? if (supports_sse2() && >>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>>> +??????? FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>> +????? } >>>>>>>>> +????? if (supports_sse2() && >>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>> { >>>>>>>>> +??????? FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>> +????? } >>>>>>>>> +#ifdef COMPILER2 >>>>>>>>> +????? if (supports_sse4_2() && >>>>>>>>> FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>> +??????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>> +????? } >>>>>>>>> +#endif >>>>>>>>> +??? } >>>>>>>>> ?????? } >>>>>>>>> >>>>>>>>> ?????? if( is_intel() ) { // Intel cpus specific settings >>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>> @@ -228,6 +228,15 @@ >>>>>>>>> ???????? } bits; >>>>>>>>> ?????? }; >>>>>>>>> >>>>>>>>> +? union ExtCpuid1EEx { >>>>>>>>> +??? uint32_t value; >>>>>>>>> +??? struct { >>>>>>>>> +????? uint32_t????????????????? : 8, >>>>>>>>> +?????????????? threads_per_core : 8, >>>>>>>>> +??????????????????????????????? : 16; >>>>>>>>> +??? } bits; >>>>>>>>> +? }; >>>>>>>>> + >>>>>>>>> ?????? union XemXcr0Eax { >>>>>>>>> ???????? uint32_t value; >>>>>>>>> ???????? struct { >>>>>>>>> @@ -398,6 +407,12 @@ >>>>>>>>> ???????? ExtCpuid8Ecx ext_cpuid8_ecx; >>>>>>>>> ???????? uint32_t???? ext_cpuid8_edx; // reserved >>>>>>>>> >>>>>>>>> +??? // cpuid function 0x8000001E // AMD 17h >>>>>>>>> +??? uint32_t???? ext_cpuid1E_eax; >>>>>>>>> +??? ExtCpuid1EEx ext_cpuid1E_ebx; // threads per core (AMD17h) >>>>>>>>> +??? uint32_t???? ext_cpuid1E_ecx; >>>>>>>>> +??? uint32_t???? ext_cpuid1E_edx; // unused currently >>>>>>>>> + >>>>>>>>> ???????? // extended control register XCR0 (the >>>>>>>>> XFEATURE_ENABLED_MASK >>>>>>>>> register) >>>>>>>>> ???????? XemXcr0Eax?? xem_xcr0_eax; >>>>>>>>> ???????? uint32_t???? xem_xcr0_edx; // reserved >>>>>>>>> @@ -505,6 +520,14 @@ >>>>>>>>> ?????????? result |= CPU_CLMUL; >>>>>>>>> ???????? if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>> ?????????? result |= CPU_RTM; >>>>>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>> +?????? result |= CPU_ADX; >>>>>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>> +????? result |= CPU_BMI2; >>>>>>>>> +??? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>> +????? result |= CPU_SHA; >>>>>>>>> +??? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>> +????? result |= CPU_FMA; >>>>>>>>> >>>>>>>>> ???????? // AMD features. >>>>>>>>> ???????? if (is_amd()) { >>>>>>>>> @@ -518,16 +541,8 @@ >>>>>>>>> ???????? } >>>>>>>>> ???????? // Intel features. >>>>>>>>> ???????? if(is_intel()) { >>>>>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>> -???????? result |= CPU_ADX; >>>>>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>> -??????? result |= CPU_BMI2; >>>>>>>>> -????? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>> -??????? 
result |= CPU_SHA; >>>>>>>>> ?????????? if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0) >>>>>>>>> ???????????? result |= CPU_LZCNT; >>>>>>>>> -????? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>> -??????? result |= CPU_FMA; >>>>>>>>> ?????????? // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>> indicates >>>>>>>>> support for prefetchw >>>>>>>>> ?????????? if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) { >>>>>>>>> ???????????? result |= CPU_3DNOW_PREFETCH; >>>>>>>>> @@ -590,6 +605,7 @@ >>>>>>>>> ?????? static ByteSize ext_cpuid5_offset() { return >>>>>>>>> byte_offset_of(CpuidInfo, ext_cpuid5_eax); } >>>>>>>>> ?????? static ByteSize ext_cpuid7_offset() { return >>>>>>>>> byte_offset_of(CpuidInfo, ext_cpuid7_eax); } >>>>>>>>> ?????? static ByteSize ext_cpuid8_offset() { return >>>>>>>>> byte_offset_of(CpuidInfo, ext_cpuid8_eax); } >>>>>>>>> +? static ByteSize ext_cpuid1E_offset() { return >>>>>>>>> byte_offset_of(CpuidInfo, ext_cpuid1E_eax); } >>>>>>>>> ?????? static ByteSize tpl_cpuidB0_offset() { return >>>>>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB0_eax); } >>>>>>>>> ?????? static ByteSize tpl_cpuidB1_offset() { return >>>>>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB1_eax); } >>>>>>>>> ?????? static ByteSize tpl_cpuidB2_offset() { return >>>>>>>>> byte_offset_of(CpuidInfo, tpl_cpuidB2_eax); } >>>>>>>>> @@ -673,8 +689,11 @@ >>>>>>>>> ???????? if (is_intel() && supports_processor_topology()) { >>>>>>>>> ?????????? result = _cpuid_info.tpl_cpuidB0_ebx.bits.logical_cpus; >>>>>>>>> ???????? } else if (_cpuid_info.std_cpuid1_edx.bits.ht != 0) { >>>>>>>>> -????? result = _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>>>>>>> -?????????????? cores_per_cpu(); >>>>>>>>> +????? if (cpu_family() >= 0x17) >>>>>>>>> +??????? result = >>>>>>>>> _cpuid_info.ext_cpuid1E_ebx.bits.threads_per_core + >>>>>>>>> 1; >>>>>>>>> +????? else >>>>>>>>> +??????? result = >>>>>>>>> _cpuid_info.std_cpuid1_ebx.bits.threads_per_cpu / >>>>>>>>> +???????????????? cores_per_cpu(); >>>>>>>>> ???????? } >>>>>>>>> ???????? return (result == 0 ? 1 : result); >>>>>>>>> ?????? } >>>>>>>>> >>>>>>>>> I have attached the patch for review. >>>>>>>>> Please let me know your comments. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Rohit >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Vladimir >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>> >>>>>>>>>>> No comments on AMD specific changes. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>> On 5/09/2017 3:43 PM, David Holmes wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hello David, >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes >>>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I was unable to apply your patch to latest jdk10/hs/hotspot >>>>>>>>>>>>>> repo. >>>>>>>>>>>>>> >>>>>>>>>>>>> I checked out the latest jdk10/hs/hotspot [parent: >>>>>>>>>>>>> 13548:1a9c2e07a826] >>>>>>>>>>>>> and was able to apply the patch >>>>>>>>>>>>> [epyc-amd17h-defaults-3Sept.patch] >>>>>>>>>>>>> without any issues. >>>>>>>>>>>>> Can you share the error message that you are getting? 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I was getting this: >>>>>>>>>>>> >>>>>>>>>>>> applying hotspot.patch >>>>>>>>>>>> patching file src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>> Hunk #1 FAILED at 1108 >>>>>>>>>>>> 1 out of 1 hunks FAILED -- saving rejects to file >>>>>>>>>>>> src/cpu/x86/vm/vm_version_x86.cpp.rej >>>>>>>>>>>> patching file src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>> Hunk #2 FAILED at 522 >>>>>>>>>>>> 1 out of 2 hunks FAILED -- saving rejects to file >>>>>>>>>>>> src/cpu/x86/vm/vm_version_x86.hpp.rej >>>>>>>>>>>> abort: patch failed to apply >>>>>>>>>>>> >>>>>>>>>>>> but I started again and this time it applied fine, so not sure >>>>>>>>>>>> what >>>>>>>>>>>> was >>>>>>>>>>>> going on there. >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> David >>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Rohit >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hello Vladimir, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hello Vladimir, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Changes look good. Only question I have is about >>>>>>>>>>>>>>>>>> MaxVectorSize. >>>>>>>>>>>>>>>>>> It >>>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>> set >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 16 only in presence of AVX: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Does that code works for AMD 17h too? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks for pointing that out. Yes, the code works fine for >>>>>>>>>>>>>>>>> AMD >>>>>>>>>>>>>>>>> 17h. >>>>>>>>>>>>>>>>> So >>>>>>>>>>>>>>>>> I have removed the surplus check for MaxVectorSize from my >>>>>>>>>>>>>>>>> patch. >>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>> have updated, re-tested and attached the patch. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Which check you removed? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> My older patch had the below mentioned check which was >>>>>>>>>>>>>>> required >>>>>>>>>>>>>>> on >>>>>>>>>>>>>>> JDK9 where the default MaxVectorSize was 64. It has been >>>>>>>>>>>>>>> handled >>>>>>>>>>>>>>> better in openJDK10. So this check is not required anymore. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +??? // Some defaults for AMD family 17h >>>>>>>>>>>>>>> +??? if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>> +????? if (MaxVectorSize > 32) { >>>>>>>>>>>>>>> +??????? FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>> .. >>>>>>>>>>>>>>> .. >>>>>>>>>>>>>>> +????? 
} >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have one query regarding the setting of UseSHA flag: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> AMD 17h has support for SHA. >>>>>>>>>>>>>>>>> AMD 15h doesn't have? support for SHA. Still "UseSHA" flag >>>>>>>>>>>>>>>>> gets >>>>>>>>>>>>>>>>> enabled for it based on the availability of BMI2 and >>>>>>>>>>>>>>>>> AVX2. Is >>>>>>>>>>>>>>>>> there >>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>> underlying reason for this? I have handled this in the >>>>>>>>>>>>>>>>> patch >>>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>>> just >>>>>>>>>>>>>>>>> wanted to confirm. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It was done with next changes which use only AVX2 and BMI2 >>>>>>>>>>>>>>>> instructions >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> calculate SHA-256: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I don't know if AMD 15h supports these instructions and can >>>>>>>>>>>>>>>> execute >>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>> code. You need to test it. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 >>>>>>>>>>>>>>> instructions, >>>>>>>>>>>>>>> it should work. >>>>>>>>>>>>>>> Confirmed by running following sanity tests: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So I have removed those SHA checks from my patch too. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please find attached updated, re-tested patch. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>> @@ -1109,11 +1109,27 @@ >>>>>>>>>>>>>>> ?????????? } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ?????? #ifdef COMPILER2 >>>>>>>>>>>>>>> -??? if (MaxVectorSize > 16) { >>>>>>>>>>>>>>> -????? // Limit vectors size to 16 bytes on current AMD >>>>>>>>>>>>>>> cpus. >>>>>>>>>>>>>>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>> +????? // Limit vectors size to 16 bytes on AMD cpus < 17h. >>>>>>>>>>>>>>> ???????????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>> ?????????? } >>>>>>>>>>>>>>> ?????? #endif // COMPILER2 >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> +??? // Some defaults for AMD family 17h >>>>>>>>>>>>>>> +??? if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>> +????? // On family 17h processors use XMM and >>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>> for >>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>> +????? if (supports_sse2() && >>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) { >>>>>>>>>>>>>>> +??????? FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>> +????? 
if (supports_sse2() && >>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> +??????? FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>> +????? if (supports_sse4_2() && >>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseFPUForSpilling)) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> +??????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>> +??? } >>>>>>>>>>>>>>> ???????? } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ???????? if( is_intel() ) { // Intel cpus specific settings >>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>>>> ???????????? result |= CPU_CLMUL; >>>>>>>>>>>>>>> ?????????? if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>>>> ???????????? result |= CPU_RTM; >>>>>>>>>>>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>> +?????? result |= CPU_ADX; >>>>>>>>>>>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>> +????? result |= CPU_BMI2; >>>>>>>>>>>>>>> +??? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>> +????? result |= CPU_SHA; >>>>>>>>>>>>>>> +??? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>> +????? result |= CPU_FMA; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ?????????? // AMD features. >>>>>>>>>>>>>>> ?????????? if (is_amd()) { >>>>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>>>> ?????????????? result |= CPU_LZCNT; >>>>>>>>>>>>>>> ???????????? if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0) >>>>>>>>>>>>>>> ?????????????? result |= CPU_SSE4A; >>>>>>>>>>>>>>> +????? if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>> +??????? result |= CPU_HT; >>>>>>>>>>>>>>> ?????????? } >>>>>>>>>>>>>>> ?????????? // Intel features. >>>>>>>>>>>>>>> ?????????? if(is_intel()) { >>>>>>>>>>>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>> -???????? result |= CPU_ADX; >>>>>>>>>>>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>> -??????? result |= CPU_BMI2; >>>>>>>>>>>>>>> -????? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>> -??????? result |= CPU_SHA; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != >>>>>>>>>>>>>>> 0) >>>>>>>>>>>>>>> ?????????????? result |= CPU_LZCNT; >>>>>>>>>>>>>>> -????? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>> -??????? result |= CPU_FMA; >>>>>>>>>>>>>>> ???????????? // for Intel, ecx.bits.misalignsse bit (bit 8) >>>>>>>>>>>>>>> indicates >>>>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>>>> ???????????? if >>>>>>>>>>>>>>> (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != >>>>>>>>>>>>>>> 0) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> ?????????????? result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please let me know your comments. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for your time. >>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks for taking time to review the code. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>>>>>>> ????????????? } >>>>>>>>>>>>>>>>> ????????????? 
FLAG_SET_DEFAULT(UseSSE42Intrinsics, false); >>>>>>>>>>>>>>>>> ??????????? } >>>>>>>>>>>>>>>>> +??? if (supports_sha()) { >>>>>>>>>>>>>>>>> +????? if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>>>> +??????? FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>> +??? } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>>>> || >>>>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>> +????? if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>> +??????? warning("SHA instructions are not available on >>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>> +??? } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ??????????? // some defaults for AMD family 15h >>>>>>>>>>>>>>>>> ??????????? if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>>>> @@ -1109,11 +1125,40 @@ >>>>>>>>>>>>>>>>> ??????????? } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ??????? #ifdef COMPILER2 >>>>>>>>>>>>>>>>> -??? if (MaxVectorSize > 16) { >>>>>>>>>>>>>>>>> -????? // Limit vectors size to 16 bytes on current AMD >>>>>>>>>>>>>>>>> cpus. >>>>>>>>>>>>>>>>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>>>> +????? // Limit vectors size to 16 bytes on AMD cpus < >>>>>>>>>>>>>>>>> 17h. >>>>>>>>>>>>>>>>> ????????????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>>>> ??????????? } >>>>>>>>>>>>>>>>> ??????? #endif // COMPILER2 >>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>> +??? // Some defaults for AMD family 17h >>>>>>>>>>>>>>>>> +??? if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>>>> +????? // On family 17h processors use XMM and >>>>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>>>> +????? if (supports_sse2() && >>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> +??????? FLAG_SET_DEFAULT(UseXMMForArrayCopy, true); >>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>> +????? if (supports_sse2() && >>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { >>>>>>>>>>>>>>>>> +??????? FLAG_SET_DEFAULT(UseUnalignedLoadStores, true); >>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>> +????? if (supports_bmi2() && >>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> +??????? FLAG_SET_DEFAULT(UseBMI2Instructions, true); >>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>> +????? if (UseSHA) { >>>>>>>>>>>>>>>>> +??????? if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>> +??????? } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>> +????????? warning("Intrinsics for SHA-384 and SHA-512 >>>>>>>>>>>>>>>>> crypto >>>>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>> +??????? } >>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>>>> +????? if (supports_sse4_2()) { >>>>>>>>>>>>>>>>> +??????? 
if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>>>> +??????? } >>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>>>> +??? } >>>>>>>>>>>>>>>>> ????????? } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ????????? if( is_intel() ) { // Intel cpus specific >>>>>>>>>>>>>>>>> settings >>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>>>>>> ????????????? result |= CPU_CLMUL; >>>>>>>>>>>>>>>>> ??????????? if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0) >>>>>>>>>>>>>>>>> ????????????? result |= CPU_RTM; >>>>>>>>>>>>>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>>> +?????? result |= CPU_ADX; >>>>>>>>>>>>>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>>> +????? result |= CPU_BMI2; >>>>>>>>>>>>>>>>> +??? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>>> +????? result |= CPU_SHA; >>>>>>>>>>>>>>>>> +??? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>>> +????? result |= CPU_FMA; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ??????????? // AMD features. >>>>>>>>>>>>>>>>> ??????????? if (is_amd()) { >>>>>>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>>>>>> ??????????????? result |= CPU_LZCNT; >>>>>>>>>>>>>>>>> ????????????? if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a >>>>>>>>>>>>>>>>> != 0) >>>>>>>>>>>>>>>>> ??????????????? result |= CPU_SSE4A; >>>>>>>>>>>>>>>>> +????? if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>>>> +??????? result |= CPU_HT; >>>>>>>>>>>>>>>>> ??????????? } >>>>>>>>>>>>>>>>> ??????????? // Intel features. >>>>>>>>>>>>>>>>> ??????????? if(is_intel()) { >>>>>>>>>>>>>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>>> -???????? result |= CPU_ADX; >>>>>>>>>>>>>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>>> -??????? result |= CPU_BMI2; >>>>>>>>>>>>>>>>> -????? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>>> -??????? result |= CPU_SHA; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel >>>>>>>>>>>>>>>>> != >>>>>>>>>>>>>>>>> 0) >>>>>>>>>>>>>>>>> ??????????????? result |= CPU_LZCNT; >>>>>>>>>>>>>>>>> -????? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>>> -??????? result |= CPU_FMA; >>>>>>>>>>>>>>>>> ????????????? // for Intel, ecx.bits.misalignsse bit >>>>>>>>>>>>>>>>> (bit 8) >>>>>>>>>>>>>>>>> indicates >>>>>>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>>>>>> ????????????? if >>>>>>>>>>>>>>>>> (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse >>>>>>>>>>>>>>>>> != >>>>>>>>>>>>>>>>> 0) { >>>>>>>>>>>>>>>>> ??????????????? 
result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I think the patch needs updating for jdk10 as I >>>>>>>>>>>>>>>>>>>>> already >>>>>>>>>>>>>>>>>>>>> see >>>>>>>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>>>>> lot of >>>>>>>>>>>>>>>>>>>>> logic >>>>>>>>>>>>>>>>>>>>> around UseSHA in vm_version_x86.cpp. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source >>>>>>>>>>>>>>>>>>>> base, >>>>>>>>>>>>>>>>>>>> test >>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>> resubmit for review. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent: >>>>>>>>>>>>>>>>>>> 13519:71337910df60), did regression testing using jtreg >>>>>>>>>>>>>>>>>>> ($make >>>>>>>>>>>>>>>>>>> default) and didnt find any regressions. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Can anyone please volunteer to review this patch? which >>>>>>>>>>>>>>>>>>> sets >>>>>>>>>>>>>>>>>>> flag/ISA >>>>>>>>>>>>>>>>>>> defaults for newer AMD 17h (EPYC) processor? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ************************* Patch >>>>>>>>>>>>>>>>>>> **************************** >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>> @@ -1088,6 +1088,22 @@ >>>>>>>>>>>>>>>>>>> ?????????????? } >>>>>>>>>>>>>>>>>>> ?????????????? FLAG_SET_DEFAULT(UseSSE42Intrinsics, >>>>>>>>>>>>>>>>>>> false); >>>>>>>>>>>>>>>>>>> ???????????? } >>>>>>>>>>>>>>>>>>> +??? if (supports_sha()) { >>>>>>>>>>>>>>>>>>> +????? if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>>>>>> +??????? FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>> +??? } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>>>>>> || >>>>>>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>>>> +????? if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>>>> +??????? warning("SHA instructions are not available on >>>>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>>>>>> +????? 
FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>>> +??? } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ???????????? // some defaults for AMD family 15h >>>>>>>>>>>>>>>>>>> ???????????? if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>>>>>> @@ -1109,11 +1125,43 @@ >>>>>>>>>>>>>>>>>>> ???????????? } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ???????? #ifdef COMPILER2 >>>>>>>>>>>>>>>>>>> -??? if (MaxVectorSize > 16) { >>>>>>>>>>>>>>>>>>> -????? // Limit vectors size to 16 bytes on current AMD >>>>>>>>>>>>>>>>>>> cpus. >>>>>>>>>>>>>>>>>>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>>>>>> +????? // Limit vectors size to 16 bytes on AMD cpus >>>>>>>>>>>>>>>>>>> < 17h. >>>>>>>>>>>>>>>>>>> ?????????????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>>>>>> ???????????? } >>>>>>>>>>>>>>>>>>> ???????? #endif // COMPILER2 >>>>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>>>> +??? // Some defaults for AMD family 17h >>>>>>>>>>>>>>>>>>> +??? if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>>>>>> +????? // On family 17h processors use XMM and >>>>>>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>>>>>> +????? if (supports_sse2() && >>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> +??????? UseXMMForArrayCopy = true; >>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>> +????? if (supports_sse2() && >>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> +??????? UseUnalignedLoadStores = true; >>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>> +????? if (supports_bmi2() && >>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) { >>>>>>>>>>>>>>>>>>> +??????? UseBMI2Instructions = true; >>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>> +????? if (MaxVectorSize > 32) { >>>>>>>>>>>>>>>>>>> +??????? FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>> +????? if (UseSHA) { >>>>>>>>>>>>>>>>>>> +??????? if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>>> +??????? } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>>>> +????????? warning("Intrinsics for SHA-384 and SHA-512 >>>>>>>>>>>>>>>>>>> crypto >>>>>>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>>> +??????? } >>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>>>>>> +????? if (supports_sse4_2()) { >>>>>>>>>>>>>>>>>>> +??????? if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>>>>>> +??????? } >>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>>>>>> +??? } >>>>>>>>>>>>>>>>>>> ?????????? } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ?????????? if( is_intel() ) { // Intel cpus specific >>>>>>>>>>>>>>>>>>> settings >>>>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>> @@ -505,6 +505,14 @@ >>>>>>>>>>>>>>>>>>> ?????????????? result |= CPU_CLMUL; >>>>>>>>>>>>>>>>>>> ???????????? 
if (_cpuid_info.sef_cpuid7_ebx.bits.rtm >>>>>>>>>>>>>>>>>>> != 0) >>>>>>>>>>>>>>>>>>> ?????????????? result |= CPU_RTM; >>>>>>>>>>>>>>>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>>>>> +?????? result |= CPU_ADX; >>>>>>>>>>>>>>>>>>> +??? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>>>>> +????? result |= CPU_BMI2; >>>>>>>>>>>>>>>>>>> +??? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>>>>> +????? result |= CPU_SHA; >>>>>>>>>>>>>>>>>>> +??? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>>>>> +????? result |= CPU_FMA; >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ???????????? // AMD features. >>>>>>>>>>>>>>>>>>> ???????????? if (is_amd()) { >>>>>>>>>>>>>>>>>>> @@ -515,19 +523,13 @@ >>>>>>>>>>>>>>>>>>> ???????????????? result |= CPU_LZCNT; >>>>>>>>>>>>>>>>>>> ?????????????? if >>>>>>>>>>>>>>>>>>> (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != >>>>>>>>>>>>>>>>>>> 0) >>>>>>>>>>>>>>>>>>> ???????????????? result |= CPU_SSE4A; >>>>>>>>>>>>>>>>>>> +????? if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>>>>>> +??????? result |= CPU_HT; >>>>>>>>>>>>>>>>>>> ???????????? } >>>>>>>>>>>>>>>>>>> ???????????? // Intel features. >>>>>>>>>>>>>>>>>>> ???????????? if(is_intel()) { >>>>>>>>>>>>>>>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>>>>> -???????? result |= CPU_ADX; >>>>>>>>>>>>>>>>>>> -????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>>>>> -??????? result |= CPU_BMI2; >>>>>>>>>>>>>>>>>>> -????? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>>>>> -??????? result |= CPU_SHA; >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel >>>>>>>>>>>>>>>>>>> != 0) >>>>>>>>>>>>>>>>>>> ???????????????? result |= CPU_LZCNT; >>>>>>>>>>>>>>>>>>> -????? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>>>>> -??????? result |= CPU_FMA; >>>>>>>>>>>>>>>>>>> ?????????????? // for Intel, ecx.bits.misalignsse bit >>>>>>>>>>>>>>>>>>> (bit >>>>>>>>>>>>>>>>>>> 8) >>>>>>>>>>>>>>>>>>> indicates >>>>>>>>>>>>>>>>>>> support for prefetchw >>>>>>>>>>>>>>>>>>> ?????????????? if >>>>>>>>>>>>>>>>>>> (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse >>>>>>>>>>>>>>>>>>> != >>>>>>>>>>>>>>>>>>> 0) { >>>>>>>>>>>>>>>>>>> ???????????????? 
result |= CPU_3DNOW_PREFETCH; >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ************************************************************** >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Hi Rohit, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I would like an volunteer to review this patch >>>>>>>>>>>>>>>>>>>>>>>> (openJDK9) >>>>>>>>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>>>>>> sets >>>>>>>>>>>>>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) >>>>>>>>>>>>>>>>>>>>>>>> processor >>>>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>>> help >>>>>>>>>>>>>>>>>>>>>>>> us >>>>>>>>>>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>>>>>> the commit process. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0 >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Unfortunately patches can not be accepted from >>>>>>>>>>>>>>>>>>>>>>> systems >>>>>>>>>>>>>>>>>>>>>>> outside >>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>> OpenJDK >>>>>>>>>>>>>>>>>>>>>>> infrastructure and ... >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for >>>>>>>>>>>>>>>>>>>>>>>> reference. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> ... unfortunately patches tend to get stripped by >>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>> mail >>>>>>>>>>>>>>>>>>>>>>> servers. >>>>>>>>>>>>>>>>>>>>>>> If >>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>> patch is small please include it inline. >>>>>>>>>>>>>>>>>>>>>>> Otherwise you >>>>>>>>>>>>>>>>>>>>>>> will >>>>>>>>>>>>>>>>>>>>>>> need >>>>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>> find >>>>>>>>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>>>>>>> OpenJDK Author who can host it for you on >>>>>>>>>>>>>>>>>>>>>>> cr.openjdk.java.net. 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> 3) I have done regression testing using jtreg >>>>>>>>>>>>>>>>>>>>>>>> ($make >>>>>>>>>>>>>>>>>>>>>>>> default) >>>>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>>> didnt find any regressions. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to >>>>>>>>>>>>>>>>>>>>>>> comment >>>>>>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>>>>>> testing >>>>>>>>>>>>>>>>>>>>>>> requirements. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks David, >>>>>>>>>>>>>>>>>>>>>> Yes, it's a small patch. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp >>>>>>>>>>>>>>>>>>>>>> @@ -1051,6 +1051,22 @@ >>>>>>>>>>>>>>>>>>>>>> ??????????????? } >>>>>>>>>>>>>>>>>>>>>> ??????????????? FLAG_SET_DEFAULT(UseSSE42Intrinsics, >>>>>>>>>>>>>>>>>>>>>> false); >>>>>>>>>>>>>>>>>>>>>> ????????????? } >>>>>>>>>>>>>>>>>>>>>> +??? if (supports_sha()) { >>>>>>>>>>>>>>>>>>>>>> +????? if (FLAG_IS_DEFAULT(UseSHA)) { >>>>>>>>>>>>>>>>>>>>>> +??????? FLAG_SET_DEFAULT(UseSHA, true); >>>>>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>>>>> +??? } else if (UseSHA || UseSHA1Intrinsics || >>>>>>>>>>>>>>>>>>>>>> UseSHA256Intrinsics >>>>>>>>>>>>>>>>>>>>>> || >>>>>>>>>>>>>>>>>>>>>> UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>>>>>>> +????? if (!FLAG_IS_DEFAULT(UseSHA) || >>>>>>>>>>>>>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA1Intrinsics) || >>>>>>>>>>>>>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA256Intrinsics) || >>>>>>>>>>>>>>>>>>>>>> +????????? !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>>>>>>> +??????? warning("SHA instructions are not >>>>>>>>>>>>>>>>>>>>>> available on >>>>>>>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>>>>> CPU"); >>>>>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA, false); >>>>>>>>>>>>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA1Intrinsics, false); >>>>>>>>>>>>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA256Intrinsics, false); >>>>>>>>>>>>>>>>>>>>>> +????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, false); >>>>>>>>>>>>>>>>>>>>>> +??? } >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ????????????? // some defaults for AMD family 15h >>>>>>>>>>>>>>>>>>>>>> ????????????? if ( cpu_family() == 0x15 ) { >>>>>>>>>>>>>>>>>>>>>> @@ -1072,11 +1088,43 @@ >>>>>>>>>>>>>>>>>>>>>> ????????????? } >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ????????? #ifdef COMPILER2 >>>>>>>>>>>>>>>>>>>>>> -??? if (MaxVectorSize > 16) { >>>>>>>>>>>>>>>>>>>>>> -????? // Limit vectors size to 16 bytes on >>>>>>>>>>>>>>>>>>>>>> current AMD >>>>>>>>>>>>>>>>>>>>>> cpus. >>>>>>>>>>>>>>>>>>>>>> +??? if (cpu_family() < 0x17 && MaxVectorSize > 16) { >>>>>>>>>>>>>>>>>>>>>> +????? // Limit vectors size to 16 bytes on AMD >>>>>>>>>>>>>>>>>>>>>> cpus < >>>>>>>>>>>>>>>>>>>>>> 17h. >>>>>>>>>>>>>>>>>>>>>> ??????????????? FLAG_SET_DEFAULT(MaxVectorSize, 16); >>>>>>>>>>>>>>>>>>>>>> ????????????? 
} >>>>>>>>>>>>>>>>>>>>>> ????????? #endif // COMPILER2 >>>>>>>>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>>>>>>> +??? // Some defaults for AMD family 17h >>>>>>>>>>>>>>>>>>>>>> +??? if ( cpu_family() == 0x17 ) { >>>>>>>>>>>>>>>>>>>>>> +????? // On family 17h processors use XMM and >>>>>>>>>>>>>>>>>>>>>> UnalignedLoadStores >>>>>>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>>>>> Array Copy >>>>>>>>>>>>>>>>>>>>>> +????? if (supports_sse2() && >>>>>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseXMMForArrayCopy)) >>>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>>> +??????? UseXMMForArrayCopy = true; >>>>>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>>>>> +????? if (supports_sse2() && >>>>>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) >>>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>>> +??????? UseUnalignedLoadStores = true; >>>>>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>>>>> +????? if (supports_bmi2() && >>>>>>>>>>>>>>>>>>>>>> FLAG_IS_DEFAULT(UseBMI2Instructions)) >>>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>>> +??????? UseBMI2Instructions = true; >>>>>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>>>>> +????? if (MaxVectorSize > 32) { >>>>>>>>>>>>>>>>>>>>>> +??????? FLAG_SET_DEFAULT(MaxVectorSize, 32); >>>>>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>>>>> +????? if (UseSHA) { >>>>>>>>>>>>>>>>>>>>>> +??????? if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) { >>>>>>>>>>>>>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, >>>>>>>>>>>>>>>>>>>>>> false); >>>>>>>>>>>>>>>>>>>>>> +??????? } else if (UseSHA512Intrinsics) { >>>>>>>>>>>>>>>>>>>>>> +????????? warning("Intrinsics for SHA-384 and >>>>>>>>>>>>>>>>>>>>>> SHA-512 >>>>>>>>>>>>>>>>>>>>>> crypto >>>>>>>>>>>>>>>>>>>>>> hash >>>>>>>>>>>>>>>>>>>>>> functions not available on this CPU."); >>>>>>>>>>>>>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseSHA512Intrinsics, >>>>>>>>>>>>>>>>>>>>>> false); >>>>>>>>>>>>>>>>>>>>>> +??????? } >>>>>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>>>>> +#ifdef COMPILER2 >>>>>>>>>>>>>>>>>>>>>> +????? if (supports_sse4_2()) { >>>>>>>>>>>>>>>>>>>>>> +??????? if (FLAG_IS_DEFAULT(UseFPUForSpilling)) { >>>>>>>>>>>>>>>>>>>>>> +????????? FLAG_SET_DEFAULT(UseFPUForSpilling, true); >>>>>>>>>>>>>>>>>>>>>> +??????? } >>>>>>>>>>>>>>>>>>>>>> +????? } >>>>>>>>>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>>>>>>>>> +??? } >>>>>>>>>>>>>>>>>>>>>> ??????????? } >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ??????????? if( is_intel() ) { // Intel cpus specific >>>>>>>>>>>>>>>>>>>>>> settings >>>>>>>>>>>>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp >>>>>>>>>>>>>>>>>>>>>> @@ -513,6 +513,16 @@ >>>>>>>>>>>>>>>>>>>>>> ????????????????? result |= CPU_LZCNT; >>>>>>>>>>>>>>>>>>>>>> ??????????????? if >>>>>>>>>>>>>>>>>>>>>> (_cpuid_info.ext_cpuid1_ecx.bits.sse4a >>>>>>>>>>>>>>>>>>>>>> != >>>>>>>>>>>>>>>>>>>>>> 0) >>>>>>>>>>>>>>>>>>>>>> ????????????????? result |= CPU_SSE4A; >>>>>>>>>>>>>>>>>>>>>> +????? if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0) >>>>>>>>>>>>>>>>>>>>>> +??????? result |= CPU_BMI2; >>>>>>>>>>>>>>>>>>>>>> +????? if(_cpuid_info.std_cpuid1_edx.bits.ht != 0) >>>>>>>>>>>>>>>>>>>>>> +??????? result |= CPU_HT; >>>>>>>>>>>>>>>>>>>>>> +????? if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0) >>>>>>>>>>>>>>>>>>>>>> +??????? result |= CPU_ADX; >>>>>>>>>>>>>>>>>>>>>> +????? if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0) >>>>>>>>>>>>>>>>>>>>>> +??????? 
result |= CPU_SHA; >>>>>>>>>>>>>>>>>>>>>> +????? if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0) >>>>>>>>>>>>>>>>>>>>>> +??????? result |= CPU_FMA; >>>>>>>>>>>>>>>>>>>>>> ????????????? } >>>>>>>>>>>>>>>>>>>>>> ????????????? // Intel features. >>>>>>>>>>>>>>>>>>>>>> ????????????? if(is_intel()) { >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>>>> Rohit >>>>>>>>>>>>>>>>>>>>>> > From kim.barrett at oracle.com Tue Oct 17 19:52:22 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 17 Oct 2017 15:52:22 -0400 Subject: RFR: 8189359: Move native weak oops cleaning out of ReferenceProcessor In-Reply-To: <8c0aafa1-ca06-105c-72f9-7bd11d382452@oracle.com> References: <8c0aafa1-ca06-105c-72f9-7bd11d382452@oracle.com> Message-ID: > On Oct 16, 2017, at 10:14 AM, Stefan Karlsson wrote: > > Hi all, > > Please review this patch to move the JNI global weak handle processing out of the ReferenceProcessor into a new class, WeakProcessor, that will be used to gather processing and cleaning of "native weak" oops. > > After this patch the ReferenceProcessor will only deal with the Java level java.lang.ref weak references. > > http://cr.openjdk.java.net/~stefank/8189359/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8189359 > > Note this patch only moves the JNIHandles::weak_oops_do calls into the new WeakProcessor. A subsequent patch for JDK-8189359 will move the JvmtiExport::weak_oops_do from JNIHandleBlock into the WeakProcessor. > > Future patches like JDK-8171119, for example, will be able to add it's set of native weak oops into the new WeakProcessor functions and won't have to duplicate the code for all GCs or add call inside the ReferenceProcessor. > > Tested with JPRT. > > Thanks, > StefanK Mostly OK, and nice to have this cleaned up, esp. with the JDK-8189359 followup. ------------------------------------------------------------------------------ src/hotspot/share/gc/shared/weakProcessor.hpp I don't understand the name of unlink_or_oops_do. A little hint as to the semantics of the two functions in WeakProcessor might help. Right now, it's not at all obvious how they differ, other than by signature. ------------------------------------------------------------------------------ src/hotspot/share/gc/g1/g1CollectedHeap.cpp This change seems to remove the only call to process_weak_jni_handles(). ------------------------------------------------------------------------------ From kim.barrett at oracle.com Tue Oct 17 19:55:06 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 17 Oct 2017 15:55:06 -0400 Subject: 8189360: JvmtiExport::weak_oops_do is called for all JNIHandleBlock instances In-Reply-To: <8e8b2dd7-3e49-ef54-6e3b-f13fb847cbd8@oracle.com> References: <8e8b2dd7-3e49-ef54-6e3b-f13fb847cbd8@oracle.com> Message-ID: > On Oct 16, 2017, at 11:40 AM, Stefan Karlsson wrote: > > Hi all, > > Please review this patch to move the call of the static JvmtiExport::weak_oops_do out of the JNIHandleBlock::weak_oops_do member function into the new WeakProcessor. > > Today, this isn't causing any bugs because there's only one instance of JNIHandleBlock, the _weak_global_handles. However, in prototypes with more than one JNIHandleBlock, this results in multiple calls to JvmtiExport::weak_oops_do. > > http://cr.openjdk.java.net/~stefank/8189360/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8189360 > > This patch builds upon the patch in: > http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-October/028684.html > > Tested with JPRT. 
> > Thanks, > StefanK src/hotspot/share/runtime/jniHandles.cpp Maybe remove #include ?prims/jvmtiExport.hpp? ? Otherwise looks good. I don?t need another webrev for that #include removal. From stefan.karlsson at oracle.com Tue Oct 17 20:57:15 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 17 Oct 2017 22:57:15 +0200 Subject: RFR: 8189359: Move native weak oops cleaning out of ReferenceProcessor In-Reply-To: References: <8c0aafa1-ca06-105c-72f9-7bd11d382452@oracle.com> Message-ID: Hi Kim, On 2017-10-17 21:52, Kim Barrett wrote: >> On Oct 16, 2017, at 10:14 AM, Stefan Karlsson wrote: >> >> Hi all, >> >> Please review this patch to move the JNI global weak handle processing out of the ReferenceProcessor into a new class, WeakProcessor, that will be used to gather processing and cleaning of "native weak" oops. >> >> After this patch the ReferenceProcessor will only deal with the Java level java.lang.ref weak references. >> >> http://cr.openjdk.java.net/~stefank/8189359/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8189359 >> >> Note this patch only moves the JNIHandles::weak_oops_do calls into the new WeakProcessor. A subsequent patch for JDK-8189359 will move the JvmtiExport::weak_oops_do from JNIHandleBlock into the WeakProcessor. >> >> Future patches like JDK-8171119, for example, will be able to add it's set of native weak oops into the new WeakProcessor functions and won't have to duplicate the code for all GCs or add call inside the ReferenceProcessor. >> >> Tested with JPRT. >> >> Thanks, >> StefanK > Mostly OK, and nice to have this cleaned up, esp. with the JDK-8189359 > followup. Thanks for reviewing! > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/weakProcessor.hpp > > I don't understand the name of unlink_or_oops_do. A little hint as to > the semantics of the two functions in WeakProcessor might help. Right > now, it's not at all obvious how they differ, other than by signature. I've renamed it to weak_oops_do and added comments to hopefully explain what they do. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/g1/g1CollectedHeap.cpp > > This change seems to remove the only call to process_weak_jni_handles(). Removed. Here are the updated webrevs: ?http://cr.openjdk.java.net/~stefank/8189359/webrev.01.delta ?http://cr.openjdk.java.net/~stefank/8189359/webrev.01 Thanks, StefanK > > ------------------------------------------------------------------------------ > From stefan.karlsson at oracle.com Tue Oct 17 20:59:20 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 17 Oct 2017 22:59:20 +0200 Subject: 8189360: JvmtiExport::weak_oops_do is called for all JNIHandleBlock instances In-Reply-To: References: <8e8b2dd7-3e49-ef54-6e3b-f13fb847cbd8@oracle.com> Message-ID: On 2017-10-17 21:55, Kim Barrett wrote: >> On Oct 16, 2017, at 11:40 AM, Stefan Karlsson wrote: >> >> Hi all, >> >> Please review this patch to move the call of the static JvmtiExport::weak_oops_do out of the JNIHandleBlock::weak_oops_do member function into the new WeakProcessor. >> >> Today, this isn't causing any bugs because there's only one instance of JNIHandleBlock, the _weak_global_handles. However, in prototypes with more than one JNIHandleBlock, this results in multiple calls to JvmtiExport::weak_oops_do. 
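
A minimal sketch of the call structure under review, assuming the shape of the code quoted later in this thread; the names are taken from the quoted snippets and webrev discussion, everything else is illustrative rather than the actual patch:

    // Illustrative only; the reviewed code is in the webrevs.
    // The point of JDK-8189360 is that JvmtiExport::weak_oops_do is a
    // process-wide pass, so it belongs beside JNIHandles::weak_oops_do in
    // WeakProcessor rather than inside JNIHandleBlock::weak_oops_do, where a
    // second JNIHandleBlock instance would make it run more than once.
    void WeakProcessor::weak_oops_do(BoolObjectClosure* is_alive,
                                     OopClosure* keep_alive) {
      JNIHandles::weak_oops_do(is_alive, keep_alive);   // JNI weak global handles
      JvmtiExport::weak_oops_do(is_alive, keep_alive);  // JVMTI weak oops, exactly once
    }
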
>> >> http://cr.openjdk.java.net/~stefank/8189360/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8189360 >> >> This patch builds upon the patch in: >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-October/028684.html >> >> Tested with JPRT. >> >> Thanks, >> StefanK > src/hotspot/share/runtime/jniHandles.cpp > Maybe remove #include ?prims/jvmtiExport.hpp? ? > > Otherwise looks good. I don?t need another webrev for that #include removal. Thanks! StefanK From coleen.phillimore at oracle.com Tue Oct 17 21:03:51 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 17 Oct 2017 17:03:51 -0400 Subject: RFR: 8184914: Use MacroAssembler::cmpoop() consistently when comparing heap objects In-Reply-To: <8d667010-f17e-7d1b-088b-106999e3b005@redhat.com> References: <8d667010-f17e-7d1b-088b-106999e3b005@redhat.com> Message-ID: <9b629556-b3f0-e52e-35e0-711c6a767e95@oracle.com> This looks reasonable to me.? Maybe the compiler group should review the c1 part.? I changed the mailing list to hotspot-dev. I can sponsor this for you. Thanks, Coleen On 10/17/17 4:22 PM, Roman Kennke wrote: > (Not sure if this is the correct list to ask.. if not, please let me > know and/or redirect me) > > Currently, cmpoop() is only declared for 32-bit x86, and only used in > 2 places in C1 to compare oops. In other places, oops are compared > using cmpptr(). It would be useful to distinguish normal pointer > comparisons from heap object comparisons, and use cmpoop() > consistently for heap object comparisons. This would remove clutter in > several places where we have #ifdef _LP64 around comparisons, and > would also allow to insert necessary barriers for GCs that need them > (e.g. Shenandoah) later. > > http://cr.openjdk.java.net/~rkennke/8184914/webrev.00/ > > > Tested by running hotspot_gc jtreg tests. > > Can I get a review please? > > Thanks, Roman > > From rkennke at redhat.com Tue Oct 17 21:05:29 2017 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 17 Oct 2017 23:05:29 +0200 Subject: RFR: 8184914: Use MacroAssembler::cmpoop() consistently when comparing heap objects In-Reply-To: <9b629556-b3f0-e52e-35e0-711c6a767e95@oracle.com> References: <8d667010-f17e-7d1b-088b-106999e3b005@redhat.com> <9b629556-b3f0-e52e-35e0-711c6a767e95@oracle.com> Message-ID: <55bb0f72-df71-44bc-53a0-7d982ab1ca04@redhat.com> > > This looks reasonable to me.? Maybe the compiler group should review > the c1 part.? I changed the mailing list to hotspot-dev. > I can sponsor this for you. Thanks, thanks and thanks! ;-) Roman > Thanks, > Coleen > > On 10/17/17 4:22 PM, Roman Kennke wrote: >> (Not sure if this is the correct list to ask.. if not, please let me >> know and/or redirect me) >> >> Currently, cmpoop() is only declared for 32-bit x86, and only used in >> 2 places in C1 to compare oops. In other places, oops are compared >> using cmpptr(). It would be useful to distinguish normal pointer >> comparisons from heap object comparisons, and use cmpoop() >> consistently for heap object comparisons. This would remove clutter >> in several places where we have #ifdef _LP64 around comparisons, and >> would also allow to insert necessary barriers for GCs that need them >> (e.g. Shenandoah) later. >> >> http://cr.openjdk.java.net/~rkennke/8184914/webrev.00/ >> >> >> Tested by running hotspot_gc jtreg tests. >> >> Can I get a review please? 
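
A minimal sketch of the kind of wrapper being proposed, assuming the existing x86 MacroAssembler::cmpptr entry points; the exact overloads and the 32-bit oop-immediate handling are in the webrev and are not reproduced here:

    // Illustrative only: a dedicated entry point for heap-object comparisons
    // that currently just delegates to a pointer compare, but gives collectors
    // that need barriers (e.g. Shenandoah) one place to hook in later.
    void MacroAssembler::cmpoop(Register src1, Register src2) {
      cmpptr(src1, src2);
    }

    void MacroAssembler::cmpoop(Register src1, Address src2) {
      cmpptr(src1, src2);
    }
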
>> >> Thanks, Roman >> >> > From kim.barrett at oracle.com Tue Oct 17 21:11:07 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 17 Oct 2017 17:11:07 -0400 Subject: RFR: 8189359: Move native weak oops cleaning out of ReferenceProcessor In-Reply-To: References: <8c0aafa1-ca06-105c-72f9-7bd11d382452@oracle.com> Message-ID: <5F447CCE-7412-43C3-A27E-F89B393B05D9@oracle.com> > On Oct 17, 2017, at 4:57 PM, Stefan Karlsson wrote: > Here are the updated webrevs: > http://cr.openjdk.java.net/~stefank/8189359/webrev.01.delta > http://cr.openjdk.java.net/~stefank/8189359/webrev.01 ------------------------------------------------------------------------------ src/hotspot/share/gc/shared/weakProcessor.hpp 33 // New contains of weak oops added to this class will automatically Sorry, but that's garbled, and I'm not sure what is intended. Previous version had "sets" instead of "contains", which seemed okay to me. ------------------------------------------------------------------------------ src/hotspot/share/gc/shared/weakProcessor.hpp 45 // Visit all oop*s and apply the given clousre. s/clousre/closure/ ------------------------------------------------------------------------------ src/hotspot/share/gc/shared/weakProcessor.hpp 41 // The complete closure is used as a post-processing step called 42 // after each container has been processed. I think a comma is needed between "step" and "called". But we were discussing in chat whether this closure is even needed. I think it isn't... ------------------------------------------------------------------------------ src/hotspot/share/gc/shared/weakProcessor.hpp 37 // Visit all oop*s and either apply the keep_alive closure if the referenced 38 // object is considered alive by the is_alive closure, otherwise do some 39 // container specific cleanup of element holding the oop. Suggest s/either// s/, otherwise/. Otherwise/ ------------------------------------------------------------------------------ From stefan.karlsson at oracle.com Tue Oct 17 21:22:54 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 17 Oct 2017 23:22:54 +0200 Subject: RFR: 8189359: Move native weak oops cleaning out of ReferenceProcessor In-Reply-To: <5F447CCE-7412-43C3-A27E-F89B393B05D9@oracle.com> References: <8c0aafa1-ca06-105c-72f9-7bd11d382452@oracle.com> <5F447CCE-7412-43C3-A27E-F89B393B05D9@oracle.com> Message-ID: <73dcae02-f18e-83aa-25ae-087dd1d917ca@oracle.com> On 2017-10-17 23:11, Kim Barrett wrote: >> On Oct 17, 2017, at 4:57 PM, Stefan Karlsson wrote: >> Here are the updated webrevs: >> http://cr.openjdk.java.net/~stefank/8189359/webrev.01.delta >> http://cr.openjdk.java.net/~stefank/8189359/webrev.01 Obviously, this is getting too late for me to do these kinds of changes. But let me try one more time. :) > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/weakProcessor.hpp > 33 // New contains of weak oops added to this class will automatically > > Sorry, but that's garbled, and I'm not sure what is intended. > > Previous version had "sets" instead of "contains", which seemed okay > to me. I used the word container in the comment for weak_oops_do, so I wanted to use the same word here. I can change to set(s) if that makes more sense. > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/weakProcessor.hpp > 45 // Visit all oop*s and apply the given clousre. > > s/clousre/closure/ Done. 
> > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/weakProcessor.hpp > 41 // The complete closure is used as a post-processing step called > 42 // after each container has been processed. > > I think a comma is needed between "step" and "called". Done. > But we were > discussing in chat whether this closure is even needed. I think it > isn't... I agree. But I'd rather think about that a bit more and then remove that as a separate patch. > > ------------------------------------------------------------------------------ > src/hotspot/share/gc/shared/weakProcessor.hpp > 37 // Visit all oop*s and either apply the keep_alive closure if the referenced > 38 // object is considered alive by the is_alive closure, otherwise do some > 39 // container specific cleanup of element holding the oop. > > Suggest > > s/either// > s/, otherwise/. Otherwise/ > > ------------------------------------------------------------------------------ Done. http://cr.openjdk.java.net/~stefank/8189359/webrev.02.delta http://cr.openjdk.java.net/~stefank/8189359/webrev.02 Thanks, StefanK > From per.liden at oracle.com Tue Oct 17 21:38:05 2017 From: per.liden at oracle.com (Per Liden) Date: Tue, 17 Oct 2017 23:38:05 +0200 Subject: RFR: 8189359: Move native weak oops cleaning out of ReferenceProcessor In-Reply-To: References: <8c0aafa1-ca06-105c-72f9-7bd11d382452@oracle.com> Message-ID: Hi, On 2017-10-17 22:57, Stefan Karlsson wrote: [...] > > Here are the updated webrevs: > ?http://cr.openjdk.java.net/~stefank/8189359/webrev.01.delta > ?http://cr.openjdk.java.net/~stefank/8189359/webrev.01 Looks good. Just two comments. share/gc/parallel/psScavenge.cpp: 446 { 447 GCTraceTime(Debug, gc, phases) tm("Weak Processing", &_gc_timer); 448 WeakProcessor::weak_oops_do(&_is_alive_closure, &root_closure); 449 } I see you've kept the "complete" closure in WeakProcessor::weak_oops_do(), which is fine and we can clean that out later, but here you don't seem to mimic exactly what the old code did. I think you want to pass in &evac_followers here, right? share/gc/serial/defNewGeneration.cpp: 662 WeakProcessor::weak_oops_do(&is_alive, &keep_alive); Same here, pass in &evacuate_followers? I don't need to see a new webrev. cheers, Per From per.liden at oracle.com Tue Oct 17 21:43:59 2017 From: per.liden at oracle.com (Per Liden) Date: Tue, 17 Oct 2017 23:43:59 +0200 Subject: 8189360: JvmtiExport::weak_oops_do is called for all JNIHandleBlock instances In-Reply-To: <8e8b2dd7-3e49-ef54-6e3b-f13fb847cbd8@oracle.com> References: <8e8b2dd7-3e49-ef54-6e3b-f13fb847cbd8@oracle.com> Message-ID: <67b8baf1-0e2b-7ebc-2826-de81da5cf770@oracle.com> Hi, On 2017-10-16 17:40, Stefan Karlsson wrote: > Hi all, > > Please review this patch to move the call of the static > JvmtiExport::weak_oops_do out of the JNIHandleBlock::weak_oops_do member > function into the new WeakProcessor. > > Today, this isn't causing any bugs because there's only one instance of > JNIHandleBlock, the _weak_global_handles. However, in prototypes with > more than one JNIHandleBlock, this results in multiple calls to > JvmtiExport::weak_oops_do. 
> > http://cr.openjdk.java.net/~stefank/8189360/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8189360 30 void WeakProcessor::unlink_or_oops_do(BoolObjectClosure* is_alive, OopClosure* keep_alive, VoidClosure* complete) { 31 JNIHandles::weak_oops_do(is_alive, keep_alive); 32 if (complete != NULL) { 33 complete->do_void(); 34 } 35 36 JvmtiExport::weak_oops_do(is_alive, keep_alive); 37 if (complete != NULL) { 38 complete->do_void(); 39 } 40 } Should you really be calling complete->do_void() twice here. It seems to me that doing it once, after both calls to weak_oops_do() would mimic what the old code did? cheers, Per > > This patch builds upon the patch in: > http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-October/028684.html > > Tested with JPRT. > > Thanks, > StefanK From kim.barrett at oracle.com Tue Oct 17 23:04:19 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 17 Oct 2017 19:04:19 -0400 Subject: RFR(XXS): 8187462: IntegralConstant should not be AllStatic Message-ID: <7B2A73A3-3D83-4D29-A6D0-42C158575E28@oracle.com> Please review this small change to the IntegralConstant class so that it actually behaves as documented. CR: https://bugs.openjdk.java.net/browse/JDK-8187462 Webrev: http://cr.openjdk.java.net/~kbarrett/8187462/open.00/ Testing: Built on all platforms supported by JPRT. From serguei.spitsyn at oracle.com Tue Oct 17 23:08:51 2017 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 17 Oct 2017 16:08:51 -0700 Subject: 8189360: JvmtiExport::weak_oops_do is called for all JNIHandleBlock instances In-Reply-To: References: <8e8b2dd7-3e49-ef54-6e3b-f13fb847cbd8@oracle.com> Message-ID: Hi Stefan, Looks good. +1 for the removal of #include ?prims/jvmtiExport.hpp?. Thanks, Serguei On 10/17/17 12:55, Kim Barrett wrote: >> On Oct 16, 2017, at 11:40 AM, Stefan Karlsson wrote: >> >> Hi all, >> >> Please review this patch to move the call of the static JvmtiExport::weak_oops_do out of the JNIHandleBlock::weak_oops_do member function into the new WeakProcessor. >> >> Today, this isn't causing any bugs because there's only one instance of JNIHandleBlock, the _weak_global_handles. However, in prototypes with more than one JNIHandleBlock, this results in multiple calls to JvmtiExport::weak_oops_do. >> >> http://cr.openjdk.java.net/~stefank/8189360/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8189360 >> >> This patch builds upon the patch in: >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-October/028684.html >> >> Tested with JPRT. >> >> Thanks, >> StefanK > src/hotspot/share/runtime/jniHandles.cpp > Maybe remove #include ?prims/jvmtiExport.hpp? ? > > Otherwise looks good. I don?t need another webrev for that #include removal. > From coleen.phillimore at oracle.com Tue Oct 17 23:09:04 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 17 Oct 2017 19:09:04 -0400 Subject: RFR(XXS): 8187462: IntegralConstant should not be AllStatic In-Reply-To: <7B2A73A3-3D83-4D29-A6D0-42C158575E28@oracle.com> References: <7B2A73A3-3D83-4D29-A6D0-42C158575E28@oracle.com> Message-ID: <4574a14e-5375-ab81-ac86-b13393f33f70@oracle.com> This looks good.? I'm pretty sure this can be checked in under the "trivial" rule. Coleen On 10/17/17 7:04 PM, Kim Barrett wrote: > Please review this small change to the IntegralConstant class so that > it actually behaves as documented. 
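
For readers unfamiliar with the class, "behaves as documented" here means being usable like std::integral_constant, whose instances are ordinary values, while an AllStatic base is meant for classes that are never instantiated. A sketch modeled on std::integral_constant, not the actual HotSpot declaration:

    // Illustrative only; see the webrev for the real change.
    template<typename T, T v>
    struct IntegralConstant {              // no AllStatic base, so instances are allowed
      typedef T value_type;
      typedef IntegralConstant<T, v> type;
      static const value_type value = v;
      operator value_type() const { return value; }  // an instance converts to its value
    };

    typedef IntegralConstant<bool, true>  TrueType;   // usable as tag-dispatch arguments
    typedef IntegralConstant<bool, false> FalseType;
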
> > CR: > https://bugs.openjdk.java.net/browse/JDK-8187462 > > Webrev: > http://cr.openjdk.java.net/~kbarrett/8187462/open.00/ > > Testing: > Built on all platforms supported by JPRT. > From kim.barrett at oracle.com Tue Oct 17 23:57:53 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 17 Oct 2017 19:57:53 -0400 Subject: RFR(XXS): 8187462: IntegralConstant should not be AllStatic In-Reply-To: <4574a14e-5375-ab81-ac86-b13393f33f70@oracle.com> References: <7B2A73A3-3D83-4D29-A6D0-42C158575E28@oracle.com> <4574a14e-5375-ab81-ac86-b13393f33f70@oracle.com> Message-ID: > On Oct 17, 2017, at 7:09 PM, coleen.phillimore at oracle.com wrote: > > This looks good. I'm pretty sure this can be checked in under the "trivial" rule. > Coleen Thanks. Agree that it?s trivial. > > On 10/17/17 7:04 PM, Kim Barrett wrote: >> Please review this small change to the IntegralConstant class so that >> it actually behaves as documented. >> >> CR: >> https://bugs.openjdk.java.net/browse/JDK-8187462 >> >> Webrev: >> http://cr.openjdk.java.net/~kbarrett/8187462/open.00/ >> >> Testing: >> Built on all platforms supported by JPRT. From kim.barrett at oracle.com Wed Oct 18 00:09:30 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 17 Oct 2017 20:09:30 -0400 Subject: RFR: 8189359: Move native weak oops cleaning out of ReferenceProcessor In-Reply-To: References: <8c0aafa1-ca06-105c-72f9-7bd11d382452@oracle.com> Message-ID: <8209F13B-72CA-4135-B589-09D72A0B54AA@oracle.com> > On Oct 17, 2017, at 5:38 PM, Per Liden wrote: > > Hi, > > On 2017-10-17 22:57, Stefan Karlsson wrote: > [...] >> Here are the updated webrevs: >> http://cr.openjdk.java.net/~stefank/8189359/webrev.01.delta >> http://cr.openjdk.java.net/~stefank/8189359/webrev.01 > > Looks good. Just two comments. > > share/gc/parallel/psScavenge.cpp: > > 446 { > 447 GCTraceTime(Debug, gc, phases) tm("Weak Processing", &_gc_timer); > 448 WeakProcessor::weak_oops_do(&_is_alive_closure, &root_closure); > 449 } > > I see you've kept the "complete" closure in WeakProcessor::weak_oops_do(), which is fine and we can clean that out later, but here you don't seem to mimic exactly what the old code did. I think you want to pass in &evac_followers here, right? > > share/gc/serial/defNewGeneration.cpp: > > 662 WeakProcessor::weak_oops_do(&is_alive, &keep_alive); > > Same here, pass in &evacuate_followers? > > I don't need to see a new webrev. > > cheers, > Per Oh, I missed that. Same thing in cms/parNewGeneration.cpp, I think. Otherwise, looks good. I don?t need a new webrev either. From thomas.stuefe at gmail.com Wed Oct 18 07:10:51 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 18 Oct 2017 09:10:51 +0200 Subject: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for non-java threads instead of explicitly setting it In-Reply-To: References: <368f252c8d5440e785e1ee341f4a918e@sap.com> Message-ID: Hi all, I am cleaning up my backlog of old issues which did not make it into the repo before the consolidation. Bug: https://bugs.openjdk.java.net/browse/JDK-8187230 Last Webrev (just rebased to the new repo structure, no changes): http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix-leave-os-guard-page-size-at-default-for-non-java-threads/webrev.02/webrev/ For your convenience, here the original message: <<< The change is very subtle. Before, we would set the OS guard page size for every thread - for java threads disable them, for non-java threads we'd set them to 4K. 
Now, we still disable them for java threads but leave them at the OS default size for non-java threads. The really important part is the disabling of OS guard pages for java threads, where we have a VM guard pages in place and do not want to spend more memory on OS guards. We do not really care for the exact size of the OS guard pages for non-java threads, and therefore should not set it - we should leave the size in place the OS deems sufficient. That also spares us the complexity of handling the thread stack page size, which on AIX may be different from os::vm_page_size(). >>> @Chris: you did ask whether this would make sense for Linux too. I think you are right, but as Goetz pointed out matters are more complicated as glibc pthread_create does not substract OS guard size from the user specified stack size, so it requires us to know the OS guard size and add it to the specified stack size (funny, the same issue we have with VM guards and -Xss). So, for now, I'd prefer this to keep AIX only. I think I need a second reviewer beside Goetz. Thanks! Thomas On Fri, Sep 8, 2017 at 10:48 AM, Thomas St?fe wrote: > Hi Guys, > > On Fri, Sep 8, 2017 at 9:51 AM, Lindenmaier, Goetz < > goetz.lindenmaier at sap.com> wrote: > >> Hi Chris, >> >> on linux the pthread implementation is a bit strange, or buggy. >> It takes the OS guard pages out of the stack size specified. >> We need to set it so we can predict the additional space >> that must be allocated for the stack. >> >> See also the comment in os_linux.cpp, create_thread(). >> > > Goetz, I know we talked about this off list yesterday, but now I am not > sure this is actually needed. Yes, to correctly calculate the stack size, > we need to know the OS guard page size, but we do not need to set it, we > just need to know it. So, for non-java threads (java threads get the OS > guard set to zero), it would probably be sufficient to: > > - pthread_attr_init() (sets default thread attribute values to the > attribute structure) and then > - pthread_attr_getguardsize() to read the guard size from that structure. > > That way we leave the OS guard page at the size glibc deems best. I think > that is a better option. Consider a situation where the glibc changes the > size of the OS guard pages, for whatever reason - we probably should follow > suit. > > See e.g. this security issue - admittedly only loosely related, since the > fix for this issue seemed to be a fix to stack banging, not changing the OS > guard size: https://access.redhat.com/security/vulnerabilities/stackguard > > So, in short, I think we could change this for Linux too. If you guys > agree, I'll add this to the patch. Since I am on vacation and the depot is > closed, it may take some time. > > Kind Regards, Thomas > > > > > >> >> Best regards, >> Goetz. >> >> > -----Original Message----- >> > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounc >> es at openjdk.java.net] >> > On Behalf Of Chris Plummer >> > Sent: Thursday, September 07, 2017 11:07 PM >> > To: Thomas St?fe ; ppc-aix-port- >> > dev at openjdk.java.net >> > Cc: HotSpot Open Source Developers >> > Subject: Re: RFR(xxs): 8187230: [aix] Leave OS guard page size at >> default for >> > non-java threads instead of explicitly setting it >> > >> > Hi Thomas, >> > >> > Is there a reason this shouldn't also be done for linux? 
>> > >> > thanks, >> > >> > Chris >> > >> > On 9/7/17 3:02 AM, Thomas St?fe wrote: >> > > Hi all, >> > > >> > > may I please have a review for this small change: >> > > >> > > Bug: >> > > https://bugs.openjdk.java.net/browse/JDK-8187230 >> > > >> > > Webrev: >> > > http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix- >> > > leave-os-guard-page-size-at-default-for-non-java- >> > threads/webrev.00/webrev/ >> > > >> > > The change is very subtle. >> > > >> > > Before, we would set the OS guard page size for every thread - for >> java >> > > threads disable them, for non-java threads we'd set them to 4K. >> > > >> > > Now, we still disable them for java threads but leave them at the OS >> > > default size for non-java threads. >> > > >> > > The really important part is the disabling of OS guard pages for java >> > > threads, where we have a VM guard pages in place and do not want to >> > spend >> > > more memory on OS guards. We do not really care for the exact size of >> the >> > > OS guard pages for non-java threads, and therefore should not set it >> - we >> > > should leave the size in place the OS deems sufficient. That also >> spares us >> > > the complexity of handling the thread stack page size, which on AIX >> may be >> > > different from os::vm_page_size(). >> > > >> > > Thank you and Kind Regards, Thomas >> > >> > >> >> > From david.holmes at oracle.com Wed Oct 18 07:12:50 2017 From: david.holmes at oracle.com (David Holmes) Date: Wed, 18 Oct 2017 17:12:50 +1000 Subject: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for non-java threads instead of explicitly setting it In-Reply-To: References: <368f252c8d5440e785e1ee341f4a918e@sap.com> Message-ID: Looks fine to me. Cheers, David On 18/10/2017 5:10 PM, Thomas St?fe wrote: > Hi all, > > I am cleaning up my backlog of old issues which did not make it into the > repo before the consolidation. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8187230 > > > Last Webrev (just rebased to the new repo structure, no changes): > http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix-leave-os-guard-page-size-at-default-for-non-java-threads/webrev.02/webrev/ > > For your convenience, here the original message: > > <<< > The change is very subtle. > > Before, we would set the OS guard page size for every thread - for java > threads disable them, for non-java threads we'd set them to 4K. > > Now, we still disable them for java threads but leave them at the OS > default size for non-java threads. > > The really important part is the disabling of OS guard pages for java > threads, where we have a VM guard pages in place and do not want to > spend more memory on OS guards. We do not really care for the exact size > of the OS guard pages for non-java threads, and therefore should not set > it - we should leave the size in place the OS deems sufficient. That > also spares us the complexity of handling the thread stack page size, > which on AIX may be different from os::vm_page_size(). > >>> > > @Chris: you did ask whether this would make sense for Linux too. I think > you are right, but as Goetz pointed out matters are more complicated as > glibc pthread_create does not substract OS guard size from the user > specified stack size, so it requires us to know the OS guard size and > add it to the specified stack size (funny, the same issue we have with > VM guards and -Xss). So, for now, I'd prefer this to keep AIX only. > > I think I need a second reviewer beside Goetz. > > Thanks! 
> > Thomas > > > > On Fri, Sep 8, 2017 at 10:48 AM, Thomas St?fe > wrote: > > Hi Guys, > > On Fri, Sep 8, 2017 at 9:51 AM, Lindenmaier, Goetz > > wrote: > > Hi Chris, > > on linux the pthread implementation is a bit strange, or buggy. > It takes the OS guard pages out of the stack size specified. > We need to set it so we can predict the additional space > that must be allocated for the stack. > > See also the comment in os_linux.cpp, create_thread(). > > > Goetz, I know we talked about this off list yesterday, but now I am > not sure this is actually needed. Yes, to correctly calculate the > stack size, we need to know the OS guard page size, but we do not > need to set it, we just need to know it. So, for non-java threads > (java threads get the OS guard set to zero), it would probably be > sufficient to: > > - pthread_attr_init() (sets default thread attribute values to the > attribute structure) and then > - pthread_attr_getguardsize() to read the guard size from that > structure. > > That way we leave the OS guard page at the size glibc deems best. I > think that is a better option. Consider a situation where the glibc > changes the size of the OS guard pages, for whatever reason - we > probably should follow suit. > > See e.g. this security issue - admittedly only loosely related, > since the fix for this issue seemed to be a fix to stack banging, > not changing the OS guard size: > https://access.redhat.com/security/vulnerabilities/stackguard > > > So, in short, I think we could change this for Linux too. If you > guys agree, I'll add this to the patch. Since I am on vacation and > the depot is closed, it may take some time. > > Kind Regards, Thomas > > > > > Best regards, > ? Goetz. > > > -----Original Message----- > > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces at openjdk.java.net > ] > > On Behalf Of Chris Plummer > > Sent: Thursday, September 07, 2017 11:07 PM > > To: Thomas St?fe >; > ppc-aix-port- > > dev at openjdk.java.net > > Cc: HotSpot Open Source Developers > > > Subject: Re: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for > > non-java threads instead of explicitly setting it > > > > Hi Thomas, > > > > Is there a reason this shouldn't also be done for linux? > > > > thanks, > > > > Chris > > > > On 9/7/17 3:02 AM, Thomas St?fe wrote: > > > Hi all, > > > > > > may I please have a review for this small change: > > > > > > Bug: > > > https://bugs.openjdk.java.net/browse/JDK-8187230 > > > > > > > Webrev: > > > http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix- > > > > leave-os-guard-page-size-at-default-for-non-java- > > threads/webrev.00/webrev/ > > > > > > The change is very subtle. > > > > > > Before, we would set the OS guard page size for every > thread - for java > > > threads disable them, for non-java threads we'd set them to 4K. > > > > > > Now, we still disable them for java threads but leave them > at the OS > > > default size for non-java threads. > > > > > > The really important part is the disabling of OS guard > pages for java > > > threads, where we have a VM guard pages in place and do not > want to > > spend > > > more memory on OS guards. We do not really care for the > exact size of the > > > OS guard pages for non-java threads, and therefore should > not set it - we > > > should leave the size in place the OS deems sufficient. > That also spares us > > > the complexity of handling the thread stack page size, > which on AIX may be > > > different from os::vm_page_size(). 
> > > > > > Thank you and Kind Regards, Thomas > > > > > > > From thomas.stuefe at gmail.com Wed Oct 18 07:27:24 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 18 Oct 2017 09:27:24 +0200 Subject: RFR(xxs): 8187230: [aix] Leave OS guard page size at default for non-java threads instead of explicitly setting it In-Reply-To: References: <368f252c8d5440e785e1ee341f4a918e@sap.com> Message-ID: On Wed, Oct 18, 2017 at 9:12 AM, David Holmes wrote: > Looks fine to me. > > Cheers, > David > > Thanks David! > On 18/10/2017 5:10 PM, Thomas St?fe wrote: > >> Hi all, >> >> I am cleaning up my backlog of old issues which did not make it into the >> repo before the consolidation. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8187230 < >> https://bugs.openjdk.java.net/browse/JDK-8187230> >> >> Last Webrev (just rebased to the new repo structure, no changes): >> http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix-leave >> -os-guard-page-size-at-default-for-non-java-threads/webrev.02/webrev/ >> >> For your convenience, here the original message: >> >> <<< >> The change is very subtle. >> >> Before, we would set the OS guard page size for every thread - for java >> threads disable them, for non-java threads we'd set them to 4K. >> >> Now, we still disable them for java threads but leave them at the OS >> default size for non-java threads. >> >> The really important part is the disabling of OS guard pages for java >> threads, where we have a VM guard pages in place and do not want to spend >> more memory on OS guards. We do not really care for the exact size of the >> OS guard pages for non-java threads, and therefore should not set it - we >> should leave the size in place the OS deems sufficient. That also spares us >> the complexity of handling the thread stack page size, which on AIX may be >> different from os::vm_page_size(). >> >>> >> >> @Chris: you did ask whether this would make sense for Linux too. I think >> you are right, but as Goetz pointed out matters are more complicated as >> glibc pthread_create does not substract OS guard size from the user >> specified stack size, so it requires us to know the OS guard size and add >> it to the specified stack size (funny, the same issue we have with VM >> guards and -Xss). So, for now, I'd prefer this to keep AIX only. >> >> I think I need a second reviewer beside Goetz. >> >> Thanks! >> >> Thomas >> >> >> >> On Fri, Sep 8, 2017 at 10:48 AM, Thomas St?fe > > wrote: >> >> Hi Guys, >> >> On Fri, Sep 8, 2017 at 9:51 AM, Lindenmaier, Goetz >> > wrote: >> >> Hi Chris, >> >> on linux the pthread implementation is a bit strange, or buggy. >> It takes the OS guard pages out of the stack size specified. >> We need to set it so we can predict the additional space >> that must be allocated for the stack. >> >> See also the comment in os_linux.cpp, create_thread(). >> >> >> Goetz, I know we talked about this off list yesterday, but now I am >> not sure this is actually needed. Yes, to correctly calculate the >> stack size, we need to know the OS guard page size, but we do not >> need to set it, we just need to know it. So, for non-java threads >> (java threads get the OS guard set to zero), it would probably be >> sufficient to: >> >> - pthread_attr_init() (sets default thread attribute values to the >> attribute structure) and then >> - pthread_attr_getguardsize() to read the guard size from that >> structure. >> >> That way we leave the OS guard page at the size glibc deems best. I >> think that is a better option. 
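(To make that concrete, a rough sketch of the two calls, with error handling omitted; the helper names and the is_java_thread parameter are made up for illustration, this is not the actual os_linux.cpp/os_aix.cpp code:)

    #include <pthread.h>
    #include <stddef.h>

    // Read the guard size the libc would use by default - query it, don't set it.
    static size_t default_guard_size() {
      pthread_attr_t attr;
      size_t guard = 0;
      pthread_attr_init(&attr);                  // fills in the implementation defaults
      pthread_attr_getguardsize(&attr, &guard);  // just read the default back
      pthread_attr_destroy(&attr);
      return guard;
    }

    // Only Java threads, which already carry VM guard pages, zero the OS guard;
    // non-Java threads keep whatever default the attribute object already holds.
    static void setup_os_guard(pthread_attr_t* attr, bool is_java_thread) {
      if (is_java_thread) {
        pthread_attr_setguardsize(attr, 0);
      }
    }

Knowing the default is also all that is needed for the stack-size bookkeeping mentioned earlier, without hard-coding a page size.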
Consider a situation where the glibc >> changes the size of the OS guard pages, for whatever reason - we >> probably should follow suit. >> >> See e.g. this security issue - admittedly only loosely related, >> since the fix for this issue seemed to be a fix to stack banging, >> not changing the OS guard size: >> https://access.redhat.com/security/vulnerabilities/stackguard >> >> >> So, in short, I think we could change this for Linux too. If you >> guys agree, I'll add this to the patch. Since I am on vacation and >> the depot is closed, it may take some time. >> >> Kind Regards, Thomas >> >> >> >> >> Best regards, >> Goetz. >> >> > -----Original Message----- >> > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounc >> es at openjdk.java.net >> ] >> > On Behalf Of Chris Plummer >> > Sent: Thursday, September 07, 2017 11:07 PM >> > To: Thomas St?fe > thomas.stuefe at gmail.com>>; >> ppc-aix-port- >> > dev at openjdk.java.net >> > Cc: HotSpot Open Source Developers < >> hotspot-dev at openjdk.java.net > >> > Subject: Re: RFR(xxs): 8187230: [aix] Leave OS guard page size >> at default for >> > non-java threads instead of explicitly setting it >> > >> > Hi Thomas, >> > >> > Is there a reason this shouldn't also be done for linux? >> > >> > thanks, >> > >> > Chris >> > >> > On 9/7/17 3:02 AM, Thomas St?fe wrote: >> > > Hi all, >> > > >> > > may I please have a review for this small change: >> > > >> > > Bug: >> > > https://bugs.openjdk.java.net/browse/JDK-8187230 >> >> > > >> > > Webrev: >> > > http://cr.openjdk.java.net/~stuefe/webrevs/8187230-aix- >> >> > > leave-os-guard-page-size-at-default-for-non-java- >> > threads/webrev.00/webrev/ >> > > >> > > The change is very subtle. >> > > >> > > Before, we would set the OS guard page size for every >> thread - for java >> > > threads disable them, for non-java threads we'd set them to >> 4K. >> > > >> > > Now, we still disable them for java threads but leave them >> at the OS >> > > default size for non-java threads. >> > > >> > > The really important part is the disabling of OS guard >> pages for java >> > > threads, where we have a VM guard pages in place and do not >> want to >> > spend >> > > more memory on OS guards. We do not really care for the >> exact size of the >> > > OS guard pages for non-java threads, and therefore should >> not set it - we >> > > should leave the size in place the OS deems sufficient. >> That also spares us >> > > the complexity of handling the thread stack page size, >> which on AIX may be >> > > different from os::vm_page_size(). >> > > >> > > Thank you and Kind Regards, Thomas >> > >> > >> >> >> >> From magnus.ihse.bursie at oracle.com Wed Oct 18 08:04:11 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Wed, 18 Oct 2017 10:04:11 +0200 Subject: RFR: JDK-8189607 Remove duplicated jvmticmlr.h Message-ID: The file jvmticmlr.h is stored twice in the repo, both in hotspot and in java.base. They are both identical, and only the java.base version is included in the final product. This might arguably have been useful in a pre-consolidated world, but makes absolutely no sense now. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8189607 WebRev: http://cr.openjdk.java.net/~ihse/JDK-8189607-remove-duplicated-jvmticmlr/webrev.01 /Magnus From thomas.schatzl at oracle.com Wed Oct 18 08:18:37 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 18 Oct 2017 10:18:37 +0200 Subject: RFR(M) 8186834:Expanding old area without full GC in parallel GC In-Reply-To: References: Message-ID: <1508314717.4435.5.camel@oracle.com> On Tue, 2017-10-17 at 21:09 +0900, Michihiro Horie wrote: > Hi Thomas, > > Thanks a lot for your response! > > >what is the difference (in performance) to simply set -Xms==-Xmx > here? > This change assumes -Xms==-Xmx is not set.? > > Please let me explain our situation. We have a real project where we > need to run multiple Java processes per node with limited memory > resource for job schedulers of parallel distributed computing > framework such as Spark. Arbitrary Java processes actually need the > Xmx heap, although the same JVM arguments are uniformly set for these > job schedulers. I am still trying to understand why in this situation the new (additional) flag would be preferable to the mentioned alternative. Maybe there is something about argument passing, but the description seems to be a bit unclear. Let me recap if I understood the problem and the need for this solution correctly: - there are at least two different kinds of VMs, job schedulers and the big data processing worker VMs - (assumption) the job schedulers and the worker VMs have different memory requirements - to ease VM management (assumption), both job schedulers and the worker VMs need to be passed the same VM arguments? So in your case you would add the new -XX:+UseAdaptiveGenerationSizePolicyBeforeMajorCollection to both, and the worker VM would benefit from it, while the job scheduler would never ever expand the heap anyway? Otherwise, if you were able to pass different VM arguments to the different VMs, the use of -Xms (instead of that new flag) would seem straightforward to me (Only specifying -Xms will not actually commit the memory, so there is no difference in actual memory use). Particularly if, as you mention, full gc will not yield a significant amount of freed memory, both methods seem to achieve the exact same effect. Or is there another difference between passing -Xms instead of -XX:UseAdaptiveGenerationSizePolicyBeforeMajorCollection? > Besides, only a limited number of objects are > collected in the full GCs that occur during the heap expansion. So, > full GC here is especially expensive. Did you ever try G1 for these workloads? There are some (old) reports [0] where G1 outperforms Parallel GC with some tuning. It generally does not use full gcs to expand the heap. With recent improvements in JDK9, it should perform even slightly better, but I am not sure if Spark already works with JDK9. > >And why not make the (first) full gc expand the heap more > > aggressively? > >(I think there is at least one way to do that, something like > >Min/MaxFreeHeapRatio or so, I can look it up if needed). > Thank you for telling the Min/MaxHeapFreeRatio. I think they surely > help for our purpose, but I think this change would be still > effective with them. 
> > Best regards, Thanks, Thomas [0] https://databricks.com/blog/2015/05/28/tuning-java-garbage-collecti on-for-spark-applications.html > -- > Michihiro, > IBM Research - Tokyo > > Thomas Schatzl ---2017/10/13 22:04:38---Hi, On Tue, 2017-08-29 at > 00:20 +0900, Michihiro Horie wrote: > > From: Thomas Schatzl > To: Michihiro Horie , hotspot-dev at openjdk.java.net > Cc: Hiroshi H Horii > Date: 2017/10/13 22:04 > Subject: Re: RFR(M) 8186834:Expanding old area without full GC in > parallel GC > > > > Hi, > > On Tue, 2017-08-29 at 00:20 +0900, Michihiro Horie wrote: > > Dear all, > >? > > Would you please review the following change? > > bug: https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.open > jdk.java.net_browse_JDK-2D8186834&d=DwIFaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=oecsIpYF-cifqq2i1JEH0Q&m=CaV8n9mhlYuwwkSthJ3tAKsxYWXA8YW- > A_scv5JwjxE&s=RN7_XLvlvAligv4Bmsj1fMFsKTHsrQQFEaLRIrjYm9Y&e= > > webrev: https://urldefense.proofpoint.com/v2/url?u=http-3A__cr.open > jdk.java.net_-7Emhorie_8186834_webrev.00_&d=DwIFaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=oecsIpYF-cifqq2i1JEH0Q&m=CaV8n9mhlYuwwkSthJ3tAKsxYWXA8YW- > A_scv5JwjxE&s=Lkjbx2hQv0H19iIiNH-7wwN0HKn5xxhXinMHhoPIvqI&e= > >? > > In parallel GC, old area is expanded only after a full GC occurs. > > I am wondering if we could give an option to expand old area > without > > full GC. So, I added an option > > UseAdaptiveGenerationSizePolicyBeforeMajorCollection > > Sorry for the late (and probably stupid) question, but what is the > difference (in performance) to simply set -Xms==-Xmx here? > > And why not make the (first) full gc expand the heap more > aggressively? > (I think there is at least one way to do that, something like > Min/MaxFreeHeapRatio or so, I can look it up if needed). > > Thanks, > ?Thomas > > > Following is a simple micro benchmark I used to see the benefit of > > this change. > > As a result, pause time of full GC reduced by 30%. Full GC count > > reduced by 54%. > > Elapsed time reduced by 7%. > >? > > import java.util.HashMap; > > import java.util.Map; > > public class HeapExpandTest { > > ? static Map map = new HashMap<>(); > > ? public static void main(String[] args) throws Exception { > > ????long start = System.currentTimeMillis(); > > ????for (int i = 0; i < 2200; ++i) { > > ??????map.put(i, new byte[1024*1024]); // 1MB > > ????} > > ????System.out.println("elapsed= " + (System.currentTimeMillis() - > > start)); > > ? } > > } > >? > > JVM options: -XX:+UseParallelGC -XX:+UseAdaptiveSizePolicy > > -XX:ParallelGCThreads=8 -Xms64m -Xmx3g > > -XX:+UseAdaptiveGenerationSizePolicyBeforeMajorCollection > > > From erik.joelsson at oracle.com Wed Oct 18 08:26:06 2017 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Wed, 18 Oct 2017 10:26:06 +0200 Subject: RFR: JDK-8189607 Remove duplicated jvmticmlr.h In-Reply-To: References: Message-ID: On 2017-10-18 10:04, Magnus Ihse Bursie wrote: > The file jvmticmlr.h is stored twice in the repo, both in hotspot and > in java.base. They are both identical, and only the java.base version > is included in the final product. This might arguably have been useful > in a pre-consolidated world, but makes absolutely no sense now. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8189607 > WebRev: > http://cr.openjdk.java.net/~ihse/JDK-8189607-remove-duplicated-jvmticmlr/webrev.01 > The question is, which file location makes the most sense. I think your pick of java.base/share/native/include probably makes more sense as that makes it much clearer that this is an exported header file. 
Looks good to me. /Erik From magnus.ihse.bursie at oracle.com Wed Oct 18 08:37:18 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Wed, 18 Oct 2017 10:37:18 +0200 Subject: RFR: JDK-8189607 Remove duplicated jvmticmlr.h In-Reply-To: References: Message-ID: On 2017-10-18 10:26, Erik Joelsson wrote: > On 2017-10-18 10:04, Magnus Ihse Bursie wrote: >> The file jvmticmlr.h is stored twice in the repo, both in hotspot and >> in java.base. They are both identical, and only the java.base version >> is included in the final product. This might arguably have been >> useful in a pre-consolidated world, but makes absolutely no sense now. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8189607 >> WebRev: >> http://cr.openjdk.java.net/~ihse/JDK-8189607-remove-duplicated-jvmticmlr/webrev.01 >> > The question is, which file location makes the most sense. I think > your pick of java.base/share/native/include probably makes more sense > as that makes it much clearer that this is an exported header file. Yes, that was my reasoning. Also, the file is not really tied to hotspot per se -- if you were to plug in another VM, you'd still need this file. Combined with the fact that this was the file that was exported to the world. (Which doesn't *really* make any difference in this case, since the files were identical...) > Looks good to me. Thanks. /Magnus > > /Erik > From robbin.ehn at oracle.com Wed Oct 18 08:51:33 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 18 Oct 2017 10:51:33 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <59E62216.5070401@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <59E62216.5070401@oracle.com> Message-ID: Thanks Erik, On 2017-10-17 17:30, Erik ?sterlund wrote: > Hi Robbin, > > Looks fantastic. We have to credit Mikael Gerdin for much of the work. Since you have been involved also, I count you as one of the contributors, and view your review as a bit biased but really appreciated of course :) /Robbin > > Thanks, > /Erik > > On 2017-10-11 15:37, Robbin Ehn wrote: >> Hi all, >> >> Starting the review of the code while JEP work is still not completed. >> >> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >> >> This JEP introduces a way to execute a callback on threads without performing >> a global VM safepoint. It makes it both possible and cheap to stop individual >> threads and not just all threads or none. >> >> Entire changeset: >> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >> >> Divided into 3-parts, >> SafepointMechanism abstraction: >> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >> Consolidating polling page allocation: >> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >> Handshakes: >> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >> >> A handshake operation is a callback that is executed for each JavaThread while >> that thread is in a safepoint safe state. The callback is executed either by >> the thread itself or by the VM thread while keeping the thread in a blocked >> state. The big difference between safepointing and handshaking is that the per >> thread operation will be performed on all threads as soon as possible and they >> will continue to execute as soon as it?s own operation is completed. If a >> JavaThread is known to be running, then a handshake can be performed with that >> single JavaThread as well. 
>> >> The current safepointing scheme is modified to perform an indirection through >> a per-thread pointer which will allow a single thread's execution to be forced >> to trap on the guard page. In order to force a thread to yield the VM updates >> the per-thread pointer for the corresponding thread to point to the guarded page. >> >> Example of potential use-cases: >> -Biased lock revocation >> -External requests for stack traces >> -Deoptimization >> -Async exception delivery >> -External suspension >> -Eliding memory barriers >> >> All of these will benefit the VM moving towards becoming more low-latency >> friendly by reducing the number of global safepoints. >> Platforms that do not yet implement the per JavaThread poll, a fallback to >> normal safepoint is in place. HandshakeOneThread will then be a normal >> safepoint. The supported platforms are Linux x64 and Solaris SPARC. >> >> Tested heavily with various test suits and comes with a few new tests. >> >> Performance testing using standardized benchmark show no signification >> changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not >> statistically ensured). A minor regression for the load vs load load on x64 is >> expected and a slight increase on SPARC due to the cost of ?materializing? the >> page vs load load. >> The time to trigger a safepoint was measured on a large machine to not be an >> issue. The looping over threads and arming the polling page will benefit from >> the work on JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) >> which puts all JavaThreads in an array instead of a linked list. >> >> Thanks, Robbin > From magnus.ihse.bursie at oracle.com Wed Oct 18 08:53:51 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Wed, 18 Oct 2017 10:53:51 +0200 Subject: RFR: JDK-8189608 Remove duplicated jni.h Message-ID: <05451b2c-9905-e1cc-7cfb-39fbe1d1c983@oracle.com> The file jni.h is stored twice in the repo, both in hotspot and in java.base. They are both identical, and only the java.base version is included in the final product. This bug is a part of the umbrella effort JDK-8167078 "Duplicate header files in hotspot and jdk". As for JDK-8189607, my reasoning is that the java.base version is the one to keep. (In this case, there was actually a small difference between the two files -- the hotspot version first copyright year was 1997, but the java.base version was 1996. It makes sense to keep the oldest one.) My assumption was that hotspot include files should be sorted according to the containing directory, and since jni.h no longer resides in "prims", I've rearranged the include line where needed. The -I path added in CompileJvm.gmk is identical to the one in JDK-8189607, and will be merged to the same change (depending on which fix enters first.) Bug: https://bugs.openjdk.java.net/browse/JDK-8189608 WebRev: http://cr.openjdk.java.net/~ihse/JDK-8189608-remove-duplicated-jni/webrev.01 /Magnus From serguei.spitsyn at oracle.com Wed Oct 18 09:00:09 2017 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 18 Oct 2017 02:00:09 -0700 Subject: RFR: JDK-8189607 Remove duplicated jvmticmlr.h In-Reply-To: References: Message-ID: Hi Magnus, The fix looks good to me. Thank you for doing this cleanup. Thanks, Serguei On 10/18/17 01:04, Magnus Ihse Bursie wrote: > The file jvmticmlr.h is stored twice in the repo, both in hotspot and > in java.base. 
They are both identical, and only the java.base version > is included in the final product. This might arguably have been useful > in a pre-consolidated world, but makes absolutely no sense now. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8189607 > WebRev: > http://cr.openjdk.java.net/~ihse/JDK-8189607-remove-duplicated-jvmticmlr/webrev.01 > > /Magnus From robbin.ehn at oracle.com Wed Oct 18 09:06:57 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 18 Oct 2017 11:06:57 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <15dd917732444959b7785efbe6640952@sap.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> Message-ID: Thanks for looking at this. On 2017-10-17 19:58, Doerr, Martin wrote: > Hi Robbin, > > my first impression is very good. Thanks for providing the webrev. Great! > > I only don't like that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one or the other mechanism. > Would it be ok to move the decision between what to use to platform code? > (Some platforms could still use both if this is beneficial.) > > E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. Would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion. I see no issue with this. Maybe SafepointMechanism::local_poll_armed should be possibly platform specific. Can we do this incremental when adding the platform support for PPC64? Thanks, Robbin > > Best regards, > Martin > > > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Robbin Ehn > Sent: Mittwoch, 11. Oktober 2017 15:38 > To: hotspot-dev developers > Subject: RFR(XL): 8185640: Thread-local handshakes > > Hi all, > > Starting the review of the code while JEP work is still not completed. > > JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 > > This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not > just all threads or none. > > Entire changeset: > http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ > > Divided into 3-parts, > SafepointMechanism abstraction: > http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ > Consolidating polling page allocation: > http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ > Handshakes: > http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ > > A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread > itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be > performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a > handshake can be performed with that single JavaThread as well. > > The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the > guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. 
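(For illustration, a stripped-down sketch of such a per-thread poll indirection. Everything below - the DemoThread type, the page variables, the function names - is invented for this example; it is not the actual SafepointMechanism code:)

    struct DemoThread {
      volatile char* _polling_page;   // each thread polls through its own pointer
    };

    static volatile char* good_page;  // assumed mapped readable
    static volatile char* bad_page;   // assumed mapped inaccessible (the guard page)

    inline void arm(DemoThread* t)    { t->_polling_page = bad_page;  }
    inline void disarm(DemoThread* t) { t->_polling_page = good_page; }

    // The emitted poll is conceptually a single load through the pointer; an armed
    // thread faults on the guard page and the signal handler routes it to the
    // safepoint/handshake machinery.
    inline void poll(DemoThread* t) { (void)*t->_polling_page; }

A global safepoint would arm every thread's pointer, while a handshake with one JavaThread arms only that thread, which is what makes the per-thread stop cheap.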
> > Example of potential use-cases: > -Biased lock revocation > -External requests for stack traces > -Deoptimization > -Async exception delivery > -External suspension > -Eliding memory barriers > > All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. > Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported > platforms are Linux x64 and Solaris SPARC. > > Tested heavily with various test suits and comes with a few new tests. > > Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically > ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. > The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on > JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all > JavaThreads in an array instead of a linked list. > > Thanks, Robbin > From robbin.ehn at oracle.com Wed Oct 18 09:09:31 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 18 Oct 2017 11:09:31 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> Message-ID: <72a1da33-4680-1570-7d43-8ce28788f01c@oracle.com> Thanks Nils for looking at that! /Robbin On 2017-10-17 16:37, Nils Eliasson wrote: > Hi Robbin, > > I have reviewed the compiler parts of the patch - c1, c2, jvmci and cpu*. > > Look great! > > Regards, > > Nils > > > On 2017-10-11 15:37, Robbin Ehn wrote: >> Hi all, >> >> Starting the review of the code while JEP work is still not completed. >> >> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >> >> This JEP introduces a way to execute a callback on threads without performing >> a global VM safepoint. It makes it both possible and cheap to stop individual >> threads and not just all threads or none. >> >> Entire changeset: >> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >> >> Divided into 3-parts, >> SafepointMechanism abstraction: >> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >> Consolidating polling page allocation: >> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >> Handshakes: >> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >> >> A handshake operation is a callback that is executed for each JavaThread while >> that thread is in a safepoint safe state. The callback is executed either by >> the thread itself or by the VM thread while keeping the thread in a blocked >> state. The big difference between safepointing and handshaking is that the per >> thread operation will be performed on all threads as soon as possible and they >> will continue to execute as soon as it?s own operation is completed. If a >> JavaThread is known to be running, then a handshake can be performed with that >> single JavaThread as well. >> >> The current safepointing scheme is modified to perform an indirection through >> a per-thread pointer which will allow a single thread's execution to be forced >> to trap on the guard page. 
In order to force a thread to yield the VM updates >> the per-thread pointer for the corresponding thread to point to the guarded page. >> >> Example of potential use-cases: >> -Biased lock revocation >> -External requests for stack traces >> -Deoptimization >> -Async exception delivery >> -External suspension >> -Eliding memory barriers >> >> All of these will benefit the VM moving towards becoming more low-latency >> friendly by reducing the number of global safepoints. >> Platforms that do not yet implement the per JavaThread poll, a fallback to >> normal safepoint is in place. HandshakeOneThread will then be a normal >> safepoint. The supported platforms are Linux x64 and Solaris SPARC. >> >> Tested heavily with various test suits and comes with a few new tests. >> >> Performance testing using standardized benchmark show no signification >> changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not >> statistically ensured). A minor regression for the load vs load load on x64 is >> expected and a slight increase on SPARC due to the cost of ?materializing? the >> page vs load load. >> The time to trigger a safepoint was measured on a large machine to not be an >> issue. The looping over threads and arming the polling page will benefit from >> the work on JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) >> which puts all JavaThreads in an array instead of a linked list. >> >> Thanks, Robbin > From erik.joelsson at oracle.com Wed Oct 18 09:15:31 2017 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Wed, 18 Oct 2017 11:15:31 +0200 Subject: RFR: JDK-8189608 Remove duplicated jni.h In-Reply-To: <05451b2c-9905-e1cc-7cfb-39fbe1d1c983@oracle.com> References: <05451b2c-9905-e1cc-7cfb-39fbe1d1c983@oracle.com> Message-ID: <3ce96f6a-e7fe-b7eb-2212-07bba5b5043f@oracle.com> Looks good to me. /Erik On 2017-10-18 10:53, Magnus Ihse Bursie wrote: > The file jni.h is stored twice in the repo, both in hotspot and in > java.base. They are both identical, and only the java.base version is > included in the final product. > > This bug is a part of the umbrella effort JDK-8167078 "Duplicate > header files in hotspot and jdk". As for JDK-8189607, my reasoning is > that the java.base version is the one to keep. (In this case, there > was actually a small difference between the two files -- the hotspot > version first copyright year was 1997, but the java.base version was > 1996. It makes sense to keep the oldest one.) > > My assumption was that hotspot include files should be sorted > according to the containing directory, and since jni.h no longer > resides in "prims", I've rearranged the include line where needed. > > The -I path added in CompileJvm.gmk is identical to the one in > JDK-8189607, and will be merged to the same change (depending on which > fix enters first.) 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8189608 > WebRev: > http://cr.openjdk.java.net/~ihse/JDK-8189608-remove-duplicated-jni/webrev.01 > > /Magnus From robbin.ehn at oracle.com Wed Oct 18 09:15:53 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 18 Oct 2017 11:15:53 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> Message-ID: <82848a04-21dd-119e-3d53-101a7f25cb54@oracle.com> Hi all, Update after re-base with new atomic implementation: http://cr.openjdk.java.net/~rehn/8185640/v1/Atomic-Update-Rebase-3/ This goes on top of the Handshakes-2. Let me know if you want some other kinds of webrevs. I would like to point out that Mikael Gerdin and Erik ?sterlund also are contributors of this changeset. Thanks, Robbin On 2017-10-11 15:37, Robbin Ehn wrote: > Hi all, > > Starting the review of the code while JEP work is still not completed. > > JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 > > This JEP introduces a way to execute a callback on threads without performing a > global VM safepoint. It makes it both possible and cheap to stop individual > threads and not just all threads or none. > > Entire changeset: > http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ > > Divided into 3-parts, > SafepointMechanism abstraction: > http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ > Consolidating polling page allocation: > http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ > Handshakes: > http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ > > A handshake operation is a callback that is executed for each JavaThread while > that thread is in a safepoint safe state. The callback is executed either by the > thread itself or by the VM thread while keeping the thread in a blocked state. > The big difference between safepointing and handshaking is that the per thread > operation will be performed on all threads as soon as possible and they will > continue to execute as soon as it?s own operation is completed. If a JavaThread > is known to be running, then a handshake can be performed with that single > JavaThread as well. > > The current safepointing scheme is modified to perform an indirection through a > per-thread pointer which will allow a single thread's execution to be forced to > trap on the guard page. In order to force a thread to yield the VM updates the > per-thread pointer for the corresponding thread to point to the guarded page. > > Example of potential use-cases: > -Biased lock revocation > -External requests for stack traces > -Deoptimization > -Async exception delivery > -External suspension > -Eliding memory barriers > > All of these will benefit the VM moving towards becoming more low-latency > friendly by reducing the number of global safepoints. > Platforms that do not yet implement the per JavaThread poll, a fallback to > normal safepoint is in place. HandshakeOneThread will then be a normal > safepoint. The supported platforms are Linux x64 and Solaris SPARC. > > Tested heavily with various test suits and comes with a few new tests. > > Performance testing using standardized benchmark show no signification changes, > the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not > statistically ensured). A minor regression for the load vs load load on x64 is > expected and a slight increase on SPARC due to the cost of ?materializing? the > page vs load load. 
> The time to trigger a safepoint was measured on a large machine to not be an > issue. The looping over threads and arming the polling page will benefit from > the work on JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) > which puts all JavaThreads in an array instead of a linked list. > > Thanks, Robbin From david.holmes at oracle.com Wed Oct 18 09:29:50 2017 From: david.holmes at oracle.com (David Holmes) Date: Wed, 18 Oct 2017 19:29:50 +1000 Subject: RFR: JDK-8189607 Remove duplicated jvmticmlr.h In-Reply-To: References: Message-ID: <20e2aee8-17cc-e668-ae45-3d782794f9d3@oracle.com> Hi Magnus, This seems fine to me. Sanity check: the various -Ixxx will be processed in order and the first file found will be used - right? ie we won't unintentionally pick up the java.base jni.h. Thanks, David On 18/10/2017 6:04 PM, Magnus Ihse Bursie wrote: > The file jvmticmlr.h is stored twice in the repo, both in hotspot and in > java.base. They are both identical, and only the java.base version is > included in the final product. This might arguably have been useful in a > pre-consolidated world, but makes absolutely no sense now. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8189607 > WebRev: > http://cr.openjdk.java.net/~ihse/JDK-8189607-remove-duplicated-jvmticmlr/webrev.01 > > > /Magnus From erik.joelsson at oracle.com Wed Oct 18 09:32:12 2017 From: erik.joelsson at oracle.com (Erik Joelsson) Date: Wed, 18 Oct 2017 11:32:12 +0200 Subject: RFR: JDK-8189607 Remove duplicated jvmticmlr.h In-Reply-To: <20e2aee8-17cc-e668-ae45-3d782794f9d3@oracle.com> References: <20e2aee8-17cc-e668-ae45-3d782794f9d3@oracle.com> Message-ID: Hello David, On 2017-10-18 11:29, David Holmes wrote: > > Sanity check: the various -Ixxx will be processed in order and the > first file found will be used - right? ie we won't unintentionally > pick up the java.base jni.h. > Correct, the search order is the order in which the -I parameters are listed on the command line. /Erik From david.holmes at oracle.com Wed Oct 18 09:34:41 2017 From: david.holmes at oracle.com (David Holmes) Date: Wed, 18 Oct 2017 19:34:41 +1000 Subject: RFR: JDK-8189608 Remove duplicated jni.h In-Reply-To: <05451b2c-9905-e1cc-7cfb-39fbe1d1c983@oracle.com> References: <05451b2c-9905-e1cc-7cfb-39fbe1d1c983@oracle.com> Message-ID: <0a8d2474-38eb-8dc1-aa39-5c541b466222@oracle.com> Looks good to me. Thanks, David On 18/10/2017 6:53 PM, Magnus Ihse Bursie wrote: > The file jni.h is stored twice in the repo, both in hotspot and in > java.base. They are both identical, and only the java.base version is > included in the final product. > > This bug is a part of the umbrella effort JDK-8167078 "Duplicate header > files in hotspot and jdk". As for JDK-8189607, my reasoning is that the > java.base version is the one to keep. (In this case, there was actually > a small difference between the two files -- the hotspot version first > copyright year was 1997, but the java.base version was 1996. It makes > sense to keep the oldest one.) > > My assumption was that hotspot include files should be sorted according > to the containing directory, and since jni.h no longer resides in > "prims", I've rearranged the include line where needed. > > The -I path added in CompileJvm.gmk is identical to the one in > JDK-8189607, and will be merged to the same change (depending on which > fix enters first.) 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8189608 > WebRev: > http://cr.openjdk.java.net/~ihse/JDK-8189608-remove-duplicated-jni/webrev.01 > > > /Magnus From martin.doerr at sap.com Wed Oct 18 10:11:14 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 18 Oct 2017 10:11:14 +0000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> Message-ID: <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> Hi Robbin, so you would like to push your version first (as it does not break other platforms) and then help us to push non-Oracle platform implementations which change shared code again? I'd be fine with that, too. While thinking a little longer about the interpreter implementation, a new idea came into my mind. I think we could significantly reduce impact on interpreter code size and performance by using safepoint polls only in a subset of bytecodes. E.g., we could use only bytecodes which perform any kind of jump by implementing something like if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch()) generate_safepoint_poll(); in TemplateInterpreterGenerator::generate_and_dispatch. Best regards, Martin -----Original Message----- From: Robbin Ehn [mailto:robbin.ehn at oracle.com] Sent: Mittwoch, 18. Oktober 2017 11:07 To: Doerr, Martin ; hotspot-dev developers Subject: Re: RFR(XL): 8185640: Thread-local handshakes Thanks for looking at this. On 2017-10-17 19:58, Doerr, Martin wrote: > Hi Robbin, > > my first impression is very good. Thanks for providing the webrev. Great! > > I only don't like that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one or the other mechanism. > Would it be ok to move the decision between what to use to platform code? > (Some platforms could still use both if this is beneficial.) > > E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. Would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion. I see no issue with this. Maybe SafepointMechanism::local_poll_armed should be possibly platform specific. Can we do this incremental when adding the platform support for PPC64? Thanks, Robbin > > Best regards, > Martin > > > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Robbin Ehn > Sent: Mittwoch, 11. Oktober 2017 15:38 > To: hotspot-dev developers > Subject: RFR(XL): 8185640: Thread-local handshakes > > Hi all, > > Starting the review of the code while JEP work is still not completed. > > JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 > > This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not > just all threads or none. > > Entire changeset: > http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ > > Divided into 3-parts, > SafepointMechanism abstraction: > http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ > Consolidating polling page allocation: > http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ > Handshakes: > http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ > > A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. 
The callback is executed either by the thread > itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be > performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a > handshake can be performed with that single JavaThread as well. > > The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the > guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. > > Example of potential use-cases: > -Biased lock revocation > -External requests for stack traces > -Deoptimization > -Async exception delivery > -External suspension > -Eliding memory barriers > > All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. > Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported > platforms are Linux x64 and Solaris SPARC. > > Tested heavily with various test suits and comes with a few new tests. > > Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically > ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. > The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on > JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all > JavaThreads in an array instead of a linked list. > > Thanks, Robbin > From martin.doerr at sap.com Wed Oct 18 10:43:37 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 18 Oct 2017 10:43:37 +0000 Subject: 8188131: [PPC] Increase inlining thresholds to the same as other platforms In-Reply-To: References: Message-ID: <6a5027ffe1c14f48a0bf39523c88aa4b@sap.com> Hi Ogata, sorry for the delay. I had missed this one. The change looks feasible to me. It may only impact the utilization of the Code Cache. Can you evaluate that (e.g. by running large benchmarks with -XX:+PrintCodeCache)? Thanks and best regards, Martin -----Original Message----- From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Kazunori Ogata Sent: Freitag, 29. September 2017 08:42 To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net Subject: RFR: 8188131: [PPC] Increase inlining thresholds to the same as other platforms Hi all, Please review a change for JDK-8188131. Bug report: https://bugs.openjdk.java.net/browse/JDK-8188131 Webrev: http://cr.openjdk.java.net/~horii/8188131/webrev.00/ This change increases the default values of FreqInlineSize and InlineSmallCode in ppc64 to 325 and 2500, respectively. These values are the same as aarch64. The performance of TPC-DS Q96 was improved by about 6% with this change. 
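(For context, platform defaults of this kind are expressed as platform-dependent globals in the CPU-specific globals headers; the two lines below are only a sketch using the values from the RFR, not the actual ppc64 change:)

    // Sketch only - the real definitions live in the ppc64 compiler globals headers.
    define_pd_global(intx, FreqInlineSize,   325);   // max bytecode size of a frequent method to inline
    define_pd_global(intx, InlineSmallCode, 2500);   // max compiled-code size of a callee still inlined

Running with -XX:+PrintCodeCache, as suggested above, would then show whether the larger inlining budget noticeably changes code cache occupancy.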
Regards, Ogata From thomas.schatzl at oracle.com Wed Oct 18 11:08:26 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 18 Oct 2017 13:08:26 +0200 Subject: Request for review JDK-8187819 gc/TestFullGCALot.java fails on jdk10 started with "-XX:-UseCompressedOops" option In-Reply-To: References: Message-ID: <1508324906.4435.15.camel@oracle.com> Hi, On Tue, 2017-10-03 at 14:44 -0400, Alexander Harlap wrote: > Please review the change for JDK-8187819 > > gc/TestFullGCALot.java fails on jdk10 started with > "-XX:-UseCompressedOops" option. > > Change is located at http://cr.openjdk.java.net/~aharlap/8187819/webrev.00/ > > Initialized metaspace performance counters before their potential > use. > > Tested - JPRT > - I think you should add the 8187819 number to the TestFullGCALot test at the @bug tag. Looks good otherwise. Thanks, Thomas From stefan.karlsson at oracle.com Wed Oct 18 11:55:34 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 18 Oct 2017 13:55:34 +0200 Subject: RFR: 8189359: Move native weak oops cleaning out of ReferenceProcessor In-Reply-To: <8209F13B-72CA-4135-B589-09D72A0B54AA@oracle.com> References: <8c0aafa1-ca06-105c-72f9-7bd11d382452@oracle.com> <8209F13B-72CA-4135-B589-09D72A0B54AA@oracle.com> Message-ID: Hi all, Updated webrevs: http://cr.openjdk.java.net/~stefank/8189359/webrev.03.delta http://cr.openjdk.java.net/~stefank/8189359/webrev.03 Changes in the webrevs: ------------------------------------------------------------------------ I've added back all the missing evacuate followers closures to try to mimic the original code as much as possible. This unveiled a bug for CMS with the original patch. I've re-added the following section to referenceProcessor.cpp: if (task_executor != NULL) { task_executor->set_single_threaded_mode(); } When running with CMS this executes the following code: void ParNewRefProcTaskExecutor::set_single_threaded_mode() { _state_set.flush(); GenCollectedHeap* gch = GenCollectedHeap::heap(); gch->save_marks(); } The missing call to GenCollectedHeap::save_marks() caused subsequent calls to the evacuate followers closure to assert that the same object was scanned twice. ------------------------------------------------------------------------ I also reverted to using PSKeepAliveClosure instead of PSScavengeRootsClosure in psScavenge.cpp. ------------------------------------------------------------------------ The comment I added for WeakProcessor::weak_oops_do previously stated that the function applied the "complete" closure after _each_ container had been processed. The next patch will move the call to JvmtiExport::weak_oops_do, and then the code wouldn't mimic the original code. I've updated the comment to state that we only apply the "complete" closure once, after _all_ containers have been processed. Thanks, StefanK On 2017-10-18 02:09, Kim Barrett wrote: >> On Oct 17, 2017, at 5:38 PM, Per Liden wrote: >> >> Hi, >> >> On 2017-10-17 22:57, Stefan Karlsson wrote: >> [...] >>> Here are the updated webrevs: >>> http://cr.openjdk.java.net/~stefank/8189359/webrev.01.delta >>> http://cr.openjdk.java.net/~stefank/8189359/webrev.01 >> >> Looks good. Just two comments.
>> >> share/gc/parallel/psScavenge.cpp: >> >> 446 { >> 447 GCTraceTime(Debug, gc, phases) tm("Weak Processing", &_gc_timer); >> 448 WeakProcessor::weak_oops_do(&_is_alive_closure, &root_closure); >> 449 } >> >> I see you've kept the "complete" closure in WeakProcessor::weak_oops_do(), which is fine and we can clean that out later, but here you don't seem to mimic exactly what the old code did. I think you want to pass in &evac_followers here, right? >> >> share/gc/serial/defNewGeneration.cpp: >> >> 662 WeakProcessor::weak_oops_do(&is_alive, &keep_alive); >> >> Same here, pass in &evacuate_followers? >> >> I don't need to see a new webrev. >> >> cheers, >> Per > > Oh, I missed that. Same thing in cms/parNewGeneration.cpp, I think. > > Otherwise, looks good. > > I don?t need a new webrev either. > > From per.liden at oracle.com Wed Oct 18 12:00:15 2017 From: per.liden at oracle.com (Per Liden) Date: Wed, 18 Oct 2017 14:00:15 +0200 Subject: RFR: 8189359: Move native weak oops cleaning out of ReferenceProcessor In-Reply-To: References: <8c0aafa1-ca06-105c-72f9-7bd11d382452@oracle.com> <8209F13B-72CA-4135-B589-09D72A0B54AA@oracle.com> Message-ID: <9b09e340-8956-019b-fcbe-6affb844c708@oracle.com> Looks good! /Per On 2017-10-18 13:55, Stefan Karlsson wrote: > Hi all, > > Updated webrevs: > http://cr.openjdk.java.net/~stefank/8189359/webrev.03.delta > http://cr.openjdk.java.net/~stefank/8189359/webrev.03 > > Changes in the webrevs: > ------------------------------------------------------------------------ > I've added back all the missing evacuate followers closure to try to > mimic the original code as much as possible. > > This unveiled a bug for CMS with the original patch. I've re-added the > following section to referenceProcessor.cpp: > > if (task_executor != NULL) { > task_executor->set_single_threaded_mode(); > } > > When running with CMS this executes the following code: > > void ParNewRefProcTaskExecutor::set_single_threaded_mode() { > _state_set.flush(); > GenCollectedHeap* gch = GenCollectedHeap::heap(); > gch->save_marks(); > } > > The missing call to GenCollectedHeap::save_marks() caused subsequent > calls to the evacuate followers closure to assert that the same object > were scanned twice. > > ------------------------------------------------------------------------ > I also reverted to using PSKeepAliveClosure instead of > PSScavengeRootsClosure in psScavenge.cpp. > > ------------------------------------------------------------------------ > The comment I add for WeakProcessor::weak_oops_do previously stated that > the function applied the "complete" closure after _each_ container had > been processed. The next patch will move the call to > JvmtiExport::weak_oops_do, and then the code wouldn't mimic the original > code. > > I've updated the comment to state that we only apply the "complete" > closure once, after _all_ containers have been processed. > > Thanks, > StefanK > > > > On 2017-10-18 02:09, Kim Barrett wrote: >>> On Oct 17, 2017, at 5:38 PM, Per Liden wrote: >>> >>> Hi, >>> >>> On 2017-10-17 22:57, Stefan Karlsson wrote: >>> [...] >>>> Here are the updated webrevs: >>>> http://cr.openjdk.java.net/~stefank/8189359/webrev.01.delta >>>> http://cr.openjdk.java.net/~stefank/8189359/webrev.01 >>> >>> Looks good. Just two comments. 
>>> >>> share/gc/parallel/psScavenge.cpp: >>> >>> 446 { >>> 447 GCTraceTime(Debug, gc, phases) tm("Weak Processing", >>> &_gc_timer); >>> 448 WeakProcessor::weak_oops_do(&_is_alive_closure, >>> &root_closure); >>> 449 } >>> >>> I see you've kept the "complete" closure in >>> WeakProcessor::weak_oops_do(), which is fine and we can clean that >>> out later, but here you don't seem to mimic exactly what the old code >>> did. I think you want to pass in &evac_followers here, right? >>> >>> share/gc/serial/defNewGeneration.cpp: >>> >>> 662 WeakProcessor::weak_oops_do(&is_alive, &keep_alive); >>> >>> Same here, pass in &evacuate_followers? >>> >>> I don't need to see a new webrev. >>> >>> cheers, >>> Per >> >> Oh, I missed that. Same thing in cms/parNewGeneration.cpp, I think. >> >> Otherwise, looks good. >> >> I don?t need a new webrev either. >> >> From stefan.karlsson at oracle.com Wed Oct 18 12:01:40 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 18 Oct 2017 14:01:40 +0200 Subject: 8189360: JvmtiExport::weak_oops_do is called for all JNIHandleBlock instances In-Reply-To: <67b8baf1-0e2b-7ebc-2826-de81da5cf770@oracle.com> References: <8e8b2dd7-3e49-ef54-6e3b-f13fb847cbd8@oracle.com> <67b8baf1-0e2b-7ebc-2826-de81da5cf770@oracle.com> Message-ID: <276b4eb8-1bb1-0ec7-72c6-6279665b58f5@oracle.com> Hi Per, On 2017-10-17 23:43, Per Liden wrote: > Hi, > > On 2017-10-16 17:40, Stefan Karlsson wrote: >> Hi all, >> >> Please review this patch to move the call of the static >> JvmtiExport::weak_oops_do out of the JNIHandleBlock::weak_oops_do >> member function into the new WeakProcessor. >> >> Today, this isn't causing any bugs because there's only one instance >> of JNIHandleBlock, the _weak_global_handles. However, in prototypes >> with more than one JNIHandleBlock, this results in multiple calls to >> JvmtiExport::weak_oops_do. >> >> http://cr.openjdk.java.net/~stefank/8189360/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8189360 > > ? 30 void WeakProcessor::unlink_or_oops_do(BoolObjectClosure* is_alive, > OopClosure* keep_alive, VoidClosure* complete) { > ? 31?? JNIHandles::weak_oops_do(is_alive, keep_alive); > ? 32?? if (complete != NULL) { > ? 33???? complete->do_void(); > ? 34?? } > ? 35 > ? 36?? JvmtiExport::weak_oops_do(is_alive, keep_alive); > ? 37?? if (complete != NULL) { > ? 38???? complete->do_void(); > ? 39?? } > ? 40 } > > Should you really be calling complete->do_void() twice here. It seems to > me that doing it once, after both calls to weak_oops_do() would mimic > what the old code did? You're right. I've update the code to only call the "complete" closure at the end of the function. FYI, the latest revision of the patch for 8189360 also updated the name unlink_or_oops_do to weak_oops_do. I've also taken the liberty to implement oops_do as a call to weak_oops_do. This way we only have to list the calls to the individual containers once. New webrevs: http://cr.openjdk.java.net/~stefank/8189360/webrev.01.delta http://cr.openjdk.java.net/~stefank/8189360/webrev.01 Thanks, StefanK > > cheers, > Per > >> >> This patch builds upon the patch in: >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-October/028684.html >> >> >> Tested with JPRT. 
>> >> Thanks, >> StefanK From stefan.karlsson at oracle.com Wed Oct 18 12:02:05 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 18 Oct 2017 14:02:05 +0200 Subject: RFR: 8189359: Move native weak oops cleaning out of ReferenceProcessor In-Reply-To: <9b09e340-8956-019b-fcbe-6affb844c708@oracle.com> References: <8c0aafa1-ca06-105c-72f9-7bd11d382452@oracle.com> <8209F13B-72CA-4135-B589-09D72A0B54AA@oracle.com> <9b09e340-8956-019b-fcbe-6affb844c708@oracle.com> Message-ID: <0d222a7b-0a00-737d-2100-ad96b58a1ee3@oracle.com> Thanks, Per! StefanK On 2017-10-18 14:00, Per Liden wrote: > Looks good! > > /Per > > On 2017-10-18 13:55, Stefan Karlsson wrote: >> Hi all, >> >> Updated webrevs: >> ?http://cr.openjdk.java.net/~stefank/8189359/webrev.03.delta >> ?http://cr.openjdk.java.net/~stefank/8189359/webrev.03 >> >> Changes in the webrevs: >> ------------------------------------------------------------------------ >> I've added back all the missing evacuate followers closure to try to >> mimic the original code as much as possible. >> >> This unveiled a bug for CMS with the original patch. I've re-added the >> following section to referenceProcessor.cpp: >> >> ? if (task_executor != NULL) { >> ??? task_executor->set_single_threaded_mode(); >> ? } >> >> When running with CMS this executes the following code: >> >> ? void ParNewRefProcTaskExecutor::set_single_threaded_mode() { >> ??? _state_set.flush(); >> ??? GenCollectedHeap* gch = GenCollectedHeap::heap(); >> ??? gch->save_marks(); >> ? } >> >> The missing call to GenCollectedHeap::save_marks() caused subsequent >> calls to the evacuate followers closure to assert that the same object >> were scanned twice. >> >> ------------------------------------------------------------------------ >> I also reverted to using PSKeepAliveClosure instead of >> PSScavengeRootsClosure in psScavenge.cpp. >> >> ------------------------------------------------------------------------ >> The comment I add for WeakProcessor::weak_oops_do previously stated that >> the function applied the "complete" closure after _each_ container had >> been processed. The next patch will move the call to >> JvmtiExport::weak_oops_do, and then the code wouldn't mimic the original >> code. >> >> I've updated the comment to state that we only apply the "complete" >> closure once, after _all_ containers have been processed. >> >> Thanks, >> StefanK >> >> >> >> On 2017-10-18 02:09, Kim Barrett wrote: >>>> On Oct 17, 2017, at 5:38 PM, Per Liden wrote: >>>> >>>> Hi, >>>> >>>> On 2017-10-17 22:57, Stefan Karlsson wrote: >>>> [...] >>>>> Here are the updated webrevs: >>>>> ? http://cr.openjdk.java.net/~stefank/8189359/webrev.01.delta >>>>> ? http://cr.openjdk.java.net/~stefank/8189359/webrev.01 >>>> >>>> Looks good. Just two comments. >>>> >>>> share/gc/parallel/psScavenge.cpp: >>>> >>>> 446???? { >>>> 447?????? GCTraceTime(Debug, gc, phases) tm("Weak Processing", >>>> &_gc_timer); >>>> 448?????? WeakProcessor::weak_oops_do(&_is_alive_closure, >>>> &root_closure); >>>> 449???? } >>>> >>>> I see you've kept the "complete" closure in >>>> WeakProcessor::weak_oops_do(), which is fine and we can clean that >>>> out later, but here you don't seem to mimic exactly what the old code >>>> did. I think you want to pass in &evac_followers here, right? >>>> >>>> share/gc/serial/defNewGeneration.cpp: >>>> >>>> 662?? WeakProcessor::weak_oops_do(&is_alive, &keep_alive); >>>> >>>> Same here, pass in &evacuate_followers? >>>> >>>> I don't need to see a new webrev. 
>>>> >>>> cheers, >>>> Per >>> >>> Oh, I missed that.? Same thing in cms/parNewGeneration.cpp, I think. >>> >>> Otherwise, looks good. >>> >>> I don?t need a new webrev either. >>> >>> From per.liden at oracle.com Wed Oct 18 12:18:42 2017 From: per.liden at oracle.com (Per Liden) Date: Wed, 18 Oct 2017 14:18:42 +0200 Subject: 8189360: JvmtiExport::weak_oops_do is called for all JNIHandleBlock instances In-Reply-To: <276b4eb8-1bb1-0ec7-72c6-6279665b58f5@oracle.com> References: <8e8b2dd7-3e49-ef54-6e3b-f13fb847cbd8@oracle.com> <67b8baf1-0e2b-7ebc-2826-de81da5cf770@oracle.com> <276b4eb8-1bb1-0ec7-72c6-6279665b58f5@oracle.com> Message-ID: Looks good! /Per On 2017-10-18 14:01, Stefan Karlsson wrote: > Hi Per, > > On 2017-10-17 23:43, Per Liden wrote: >> Hi, >> >> On 2017-10-16 17:40, Stefan Karlsson wrote: >>> Hi all, >>> >>> Please review this patch to move the call of the static >>> JvmtiExport::weak_oops_do out of the JNIHandleBlock::weak_oops_do >>> member function into the new WeakProcessor. >>> >>> Today, this isn't causing any bugs because there's only one instance >>> of JNIHandleBlock, the _weak_global_handles. However, in prototypes >>> with more than one JNIHandleBlock, this results in multiple calls to >>> JvmtiExport::weak_oops_do. >>> >>> http://cr.openjdk.java.net/~stefank/8189360/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8189360 >> >> 30 void WeakProcessor::unlink_or_oops_do(BoolObjectClosure* >> is_alive, OopClosure* keep_alive, VoidClosure* complete) { >> 31 JNIHandles::weak_oops_do(is_alive, keep_alive); >> 32 if (complete != NULL) { >> 33 complete->do_void(); >> 34 } >> 35 >> 36 JvmtiExport::weak_oops_do(is_alive, keep_alive); >> 37 if (complete != NULL) { >> 38 complete->do_void(); >> 39 } >> 40 } >> >> Should you really be calling complete->do_void() twice here. It seems >> to me that doing it once, after both calls to weak_oops_do() would >> mimic what the old code did? > > You're right. I've update the code to only call the "complete" closure > at the end of the function. > > FYI, the latest revision of the patch for 8189360 also updated the name > unlink_or_oops_do to weak_oops_do. > > I've also taken the liberty to implement oops_do as a call to > weak_oops_do. This way we only have to list the calls to the individual > containers once. > > New webrevs: > http://cr.openjdk.java.net/~stefank/8189360/webrev.01.delta > http://cr.openjdk.java.net/~stefank/8189360/webrev.01 > > Thanks, > StefanK > >> >> cheers, >> Per >> >>> >>> This patch builds upon the patch in: >>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-October/028684.html >>> >>> >>> Tested with JPRT. >>> >>> Thanks, >>> StefanK From coleen.phillimore at oracle.com Wed Oct 18 13:14:42 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 18 Oct 2017 09:14:42 -0400 Subject: RFR: JDK-8189608 Remove duplicated jni.h In-Reply-To: <05451b2c-9905-e1cc-7cfb-39fbe1d1c983@oracle.com> References: <05451b2c-9905-e1cc-7cfb-39fbe1d1c983@oracle.com> Message-ID: <237f7a02-73a8-a121-d4f4-5978c7479b79@oracle.com> This looks great.? There's also jvm.h too, which is a little more different but shouldn't be. Did/could you make this change in the jdk10/hs repository since it's primarily hotspot files??? I can't tell from the webrev. Thanks, Coleen On 10/18/17 4:53 AM, Magnus Ihse Bursie wrote: > The file jni.h is stored twice in the repo, both in hotspot and in > java.base. They are both identical, and only the java.base version is > included in the final product. 
> > This bug is a part of the umbrella effort JDK-8167078 "Duplicate > header files in hotspot and jdk". As for JDK-8189607, my reasoning is > that the java.base version is the one to keep. (In this case, there > was actually a small difference between the two files -- the hotspot > version first copyright year was 1997, but the java.base version was > 1996. It makes sense to keep the oldest one.) > > My assumption was that hotspot include files should be sorted > according to the containing directory, and since jni.h no longer > resides in "prims", I've rearranged the include line where needed. > > The -I path added in CompileJvm.gmk is identical to the one in > JDK-8189607, and will be merged to the same change (depending on which > fix enters first.) > > Bug: https://bugs.openjdk.java.net/browse/JDK-8189608 > WebRev: > http://cr.openjdk.java.net/~ihse/JDK-8189608-remove-duplicated-jni/webrev.01 > > /Magnus From stefan.karlsson at oracle.com Wed Oct 18 13:16:51 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 18 Oct 2017 15:16:51 +0200 Subject: 8189360: JvmtiExport::weak_oops_do is called for all JNIHandleBlock instances In-Reply-To: References: <8e8b2dd7-3e49-ef54-6e3b-f13fb847cbd8@oracle.com> <67b8baf1-0e2b-7ebc-2826-de81da5cf770@oracle.com> <276b4eb8-1bb1-0ec7-72c6-6279665b58f5@oracle.com> Message-ID: <6b8d32fe-592b-6087-3283-d46546aba044@oracle.com> Thanks, Per! StefanK On 2017-10-18 14:18, Per Liden wrote: > Looks good! > > /Per > > On 2017-10-18 14:01, Stefan Karlsson wrote: >> Hi Per, >> >> On 2017-10-17 23:43, Per Liden wrote: >>> Hi, >>> >>> On 2017-10-16 17:40, Stefan Karlsson wrote: >>>> Hi all, >>>> >>>> Please review this patch to move the call of the static >>>> JvmtiExport::weak_oops_do out of the JNIHandleBlock::weak_oops_do >>>> member function into the new WeakProcessor. >>>> >>>> Today, this isn't causing any bugs because there's only one instance >>>> of JNIHandleBlock, the _weak_global_handles. However, in prototypes >>>> with more than one JNIHandleBlock, this results in multiple calls to >>>> JvmtiExport::weak_oops_do. >>>> >>>> http://cr.openjdk.java.net/~stefank/8189360/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8189360 >>> >>> ?? 30 void WeakProcessor::unlink_or_oops_do(BoolObjectClosure* >>> is_alive, OopClosure* keep_alive, VoidClosure* complete) { >>> ?? 31?? JNIHandles::weak_oops_do(is_alive, keep_alive); >>> ?? 32?? if (complete != NULL) { >>> ?? 33???? complete->do_void(); >>> ?? 34?? } >>> ?? 35 >>> ?? 36?? JvmtiExport::weak_oops_do(is_alive, keep_alive); >>> ?? 37?? if (complete != NULL) { >>> ?? 38???? complete->do_void(); >>> ?? 39?? } >>> ?? 40 } >>> >>> Should you really be calling complete->do_void() twice here. It seems >>> to me that doing it once, after both calls to weak_oops_do() would >>> mimic what the old code did? >> >> You're right. I've update the code to only call the "complete" closure >> at the end of the function. >> >> FYI, the latest revision of the patch for 8189360 also updated the name >> unlink_or_oops_do to weak_oops_do. >> >> I've also taken the liberty to implement oops_do as a call to >> weak_oops_do. This way we only have to list the calls to the individual >> containers once. 
>> >> New webrevs: >> ?http://cr.openjdk.java.net/~stefank/8189360/webrev.01.delta >> ?http://cr.openjdk.java.net/~stefank/8189360/webrev.01 >> >> Thanks, >> StefanK >> >>> >>> cheers, >>> Per >>> >>>> >>>> This patch builds upon the patch in: >>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-October/028684.html >>>> >>>> >>>> >>>> Tested with JPRT. >>>> >>>> Thanks, >>>> StefanK From robbin.ehn at oracle.com Wed Oct 18 13:57:35 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 18 Oct 2017 15:57:35 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> Message-ID: <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> Hi Martin, On 2017-10-18 12:11, Doerr, Martin wrote: > Hi Robbin, > > so you would like to push your version first (as it does not break other platforms) and then help us to push non-Oracle platform implementations which change shared code again? > I'd be fine with that, too. Yes, great! > > While thinking a little longer about the interpreter implementation, a new idea came into my mind. > I think we could significantly reduce impact on interpreter code size and performance by using safepoint polls only in a subset of bytecodes. E.g., we could use only bytecodes which perform any kind of jump by implementing something like > if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch()) generate_safepoint_poll(); > in TemplateInterpreterGenerator::generate_and_dispatch. We have not seen any performance regression in simple benchmark with this. I will do a better benchmark and compare what difference it makes. Thanks, Robbin > > Best regards, > Martin > > > -----Original Message----- > From: Robbin Ehn [mailto:robbin.ehn at oracle.com] > Sent: Mittwoch, 18. Oktober 2017 11:07 > To: Doerr, Martin ; hotspot-dev developers > Subject: Re: RFR(XL): 8185640: Thread-local handshakes > > Thanks for looking at this. > > On 2017-10-17 19:58, Doerr, Martin wrote: >> Hi Robbin, >> >> my first impression is very good. Thanks for providing the webrev. > > Great! > >> >> I only don't like that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one or the other mechanism. >> Would it be ok to move the decision between what to use to platform code? >> (Some platforms could still use both if this is beneficial.) >> >> E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. Would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion. > > I see no issue with this. > Maybe SafepointMechanism::local_poll_armed should be possibly platform specific. > Can we do this incremental when adding the platform support for PPC64? > > Thanks, Robbin > >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Robbin Ehn >> Sent: Mittwoch, 11. Oktober 2017 15:38 >> To: hotspot-dev developers >> Subject: RFR(XL): 8185640: Thread-local handshakes >> >> Hi all, >> >> Starting the review of the code while JEP work is still not completed. >> >> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >> >> This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. 
It makes it both possible and cheap to stop individual threads and not >> just all threads or none. >> >> Entire changeset: >> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >> >> Divided into 3-parts, >> SafepointMechanism abstraction: >> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >> Consolidating polling page allocation: >> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >> Handshakes: >> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >> >> A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread >> itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be >> performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a >> handshake can be performed with that single JavaThread as well. >> >> The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the >> guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. >> >> Example of potential use-cases: >> -Biased lock revocation >> -External requests for stack traces >> -Deoptimization >> -Async exception delivery >> -External suspension >> -Eliding memory barriers >> >> All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. >> Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported >> platforms are Linux x64 and Solaris SPARC. >> >> Tested heavily with various test suits and comes with a few new tests. >> >> Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically >> ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. >> The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on >> JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all >> JavaThreads in an array instead of a linked list. >> >> Thanks, Robbin >> From coleen.phillimore at oracle.com Wed Oct 18 14:00:26 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 18 Oct 2017 10:00:26 -0400 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> Message-ID: <5b81e9e8-eb09-4598-6da2-212ad37cb1c1@oracle.com> On 10/18/17 9:57 AM, Robbin Ehn wrote: >> >> While thinking a little longer about the interpreter implementation, >> a new idea came into my mind. 
>> I think we could significantly reduce impact on interpreter code size >> and performance by using safepoint polls only in a subset of >> bytecodes. E.g., we could use only bytecodes which perform any kind >> of jump by implementing something like >> if (SafepointMechanism::uses_thread_local_poll() && >> t->does_dispatch()) generate_safepoint_poll(); >> in TemplateInterpreterGenerator::generate_and_dispatch. > > We have not seen any performance regression in simple benchmark with > this. > I will do a better benchmark and compare what difference it makes. I think this is a good suggestion for a further RFE.? At one point, I'd only enabled safepoints for backward branches and returns in the safepoint table but it had no effect on performance, but since this generates code in dispatch_epilogue, it might help with code bloat. Thanks, Coleen From martin.doerr at sap.com Wed Oct 18 14:05:49 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 18 Oct 2017 14:05:49 +0000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> Message-ID: Hi Robbin, thanks for the quick reply and for doing additional benchmarks. Please note that t->does_dispatch() was just a first idea, but doesn't really fit for the purpose because it's false for conditional branch bytecodes for example. I just didn't find an appropriate quick check in the existing code. I guess you will notice a performance impact when benchmarking with -Xint. (I don't know if Oracle usually runs startup performance benchmarks.) Best regards, Martin -----Original Message----- From: Robbin Ehn [mailto:robbin.ehn at oracle.com] Sent: Mittwoch, 18. Oktober 2017 15:58 To: Doerr, Martin ; hotspot-dev developers Subject: Re: RFR(XL): 8185640: Thread-local handshakes Hi Martin, On 2017-10-18 12:11, Doerr, Martin wrote: > Hi Robbin, > > so you would like to push your version first (as it does not break other platforms) and then help us to push non-Oracle platform implementations which change shared code again? > I'd be fine with that, too. Yes, great! > > While thinking a little longer about the interpreter implementation, a new idea came into my mind. > I think we could significantly reduce impact on interpreter code size and performance by using safepoint polls only in a subset of bytecodes. E.g., we could use only bytecodes which perform any kind of jump by implementing something like > if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch()) generate_safepoint_poll(); > in TemplateInterpreterGenerator::generate_and_dispatch. We have not seen any performance regression in simple benchmark with this. I will do a better benchmark and compare what difference it makes. Thanks, Robbin > > Best regards, > Martin > > > -----Original Message----- > From: Robbin Ehn [mailto:robbin.ehn at oracle.com] > Sent: Mittwoch, 18. Oktober 2017 11:07 > To: Doerr, Martin ; hotspot-dev developers > Subject: Re: RFR(XL): 8185640: Thread-local handshakes > > Thanks for looking at this. > > On 2017-10-17 19:58, Doerr, Martin wrote: >> Hi Robbin, >> >> my first impression is very good. Thanks for providing the webrev. > > Great! > >> >> I only don't like that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one or the other mechanism. 
>> Would it be ok to move the decision between what to use to platform code? >> (Some platforms could still use both if this is beneficial.) >> >> E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. Would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion. > > I see no issue with this. > Maybe SafepointMechanism::local_poll_armed should be possibly platform specific. > Can we do this incremental when adding the platform support for PPC64? > > Thanks, Robbin > >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Robbin Ehn >> Sent: Mittwoch, 11. Oktober 2017 15:38 >> To: hotspot-dev developers >> Subject: RFR(XL): 8185640: Thread-local handshakes >> >> Hi all, >> >> Starting the review of the code while JEP work is still not completed. >> >> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >> >> This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not >> just all threads or none. >> >> Entire changeset: >> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >> >> Divided into 3-parts, >> SafepointMechanism abstraction: >> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >> Consolidating polling page allocation: >> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >> Handshakes: >> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >> >> A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread >> itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be >> performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a >> handshake can be performed with that single JavaThread as well. >> >> The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the >> guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. >> >> Example of potential use-cases: >> -Biased lock revocation >> -External requests for stack traces >> -Deoptimization >> -Async exception delivery >> -External suspension >> -Eliding memory barriers >> >> All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. >> Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported >> platforms are Linux x64 and Solaris SPARC. >> >> Tested heavily with various test suits and comes with a few new tests. >> >> Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically >> ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? 
the page vs load load. >> The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on >> JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all >> JavaThreads in an array instead of a linked list. >> >> Thanks, Robbin >> From claes.redestad at oracle.com Wed Oct 18 14:28:34 2017 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 18 Oct 2017 16:28:34 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> Message-ID: <394ca03f-ce6b-5200-8bde-6c4bcb40d35f@oracle.com> Hi! On 2017-10-18 16:05, Doerr, Martin wrote: > [...] when benchmarking with -Xint. (I don't know if Oracle usually runs startup performance benchmarks.) we do a lot of benchmarking to measure startup, warmup and footprint on a variety of applications, and have been improving tooling to flag even very small regressions (statistically significant results on <0.5M instruction increases). -Xint is typically not explicitly used for any benchmarking other than as a diagnostic tool, and even if we did I'd imagine we'd not file bugs if they didn't also correlate with a regression in a mixed mode config. /Claes From martin.doerr at sap.com Wed Oct 18 14:43:13 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 18 Oct 2017 14:43:13 +0000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <394ca03f-ce6b-5200-8bde-6c4bcb40d35f@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <394ca03f-ce6b-5200-8bde-6c4bcb40d35f@oracle.com> Message-ID: Hi Claes, thanks for the explanation. We use -Xint benchmarking only when we make significant interpreter changes as quick regression check (not so relevant for real life, but delivers stable and quick results). Best regards, Martin -----Original Message----- From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Claes Redestad Sent: Mittwoch, 18. Oktober 2017 16:29 To: hotspot-dev at openjdk.java.net Subject: Re: RFR(XL): 8185640: Thread-local handshakes Hi! On 2017-10-18 16:05, Doerr, Martin wrote: > [...] when benchmarking with -Xint. (I don't know if Oracle usually runs startup performance benchmarks.) we do a lot of benchmarking to measure startup, warmup and footprint on a variety of applications, and have been improving tooling to flag even very small regressions (statistically significant results on <0.5M instruction increases). -Xint is typically not explicitly used for any benchmarking other than as a diagnostic tool, and even if we did I'd imagine we'd not file bugs if they didn't also correlate with a regression in a mixed mode config. /Claes From jesper.wilhelmsson at oracle.com Wed Oct 18 17:21:01 2017 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Wed, 18 Oct 2017 19:21:01 +0200 Subject: Integration blockers Message-ID: Hi, I've gone through all bugs filed based on nightly findings since the last integration to master and added the integration_blocker label to most of them. 
I tried to filter out infrastructure problems and bugs that seems to originate from master. The result is 14 blockers. There can obviously be several cases where the label can be removed but the rule is the same as it has been before: All bugs found by nightly testing should have the integration_blocker label, or a motivation as to why it is not a blocker. One (desperate) way to remove the integration_blocker label from a pure test bug is to add the test to the problem list. This is not recommended, but possible. The integration_blocker label is then moved to the subtask used to problem list the test. If a test is put on the problem list due to a VM issue (not recommended) the bug remains an integration blocker. Thanks, /Jesper From mandy.chung at oracle.com Wed Oct 18 17:57:26 2017 From: mandy.chung at oracle.com (mandy chung) Date: Wed, 18 Oct 2017 10:57:26 -0700 Subject: RFR: JDK-8189607 Remove duplicated jvmticmlr.h In-Reply-To: References: Message-ID: <1012e7eb-0509-ceb7-789d-658e87ad1a14@oracle.com> On 10/18/17 1:26 AM, Erik Joelsson wrote: > On 2017-10-18 10:04, Magnus Ihse Bursie wrote: >> The file jvmticmlr.h is stored twice in the repo, both in hotspot and >> in java.base. They are both identical, and only the java.base version >> is included in the final product. This might arguably have been >> useful in a pre-consolidated world, but makes absolutely no sense now. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8189607 >> WebRev: >> http://cr.openjdk.java.net/~ihse/JDK-8189607-remove-duplicated-jvmticmlr/webrev.01 >> > The question is, which file location makes the most sense. I think > your pick of java.base/share/native/include probably makes more sense > as that makes it much clearer that this is an exported header file. > jvmticmlr.h is an exported header file and java.base/share/native/include is a proper location as described in JEP 201 about the modular source layout. The change looks good to me too. Mandy From kim.barrett at oracle.com Wed Oct 18 18:21:50 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 18 Oct 2017 14:21:50 -0400 Subject: RFR: 8189359: Move native weak oops cleaning out of ReferenceProcessor In-Reply-To: References: <8c0aafa1-ca06-105c-72f9-7bd11d382452@oracle.com> <8209F13B-72CA-4135-B589-09D72A0B54AA@oracle.com> Message-ID: > On Oct 18, 2017, at 7:55 AM, Stefan Karlsson wrote: > > Hi all, > > Updated webrevs: > http://cr.openjdk.java.net/~stefank/8189359/webrev.03.delta > http://cr.openjdk.java.net/~stefank/8189359/webrev.03 Looks good. From kim.barrett at oracle.com Wed Oct 18 18:24:21 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 18 Oct 2017 14:24:21 -0400 Subject: 8189360: JvmtiExport::weak_oops_do is called for all JNIHandleBlock instances In-Reply-To: <276b4eb8-1bb1-0ec7-72c6-6279665b58f5@oracle.com> References: <8e8b2dd7-3e49-ef54-6e3b-f13fb847cbd8@oracle.com> <67b8baf1-0e2b-7ebc-2826-de81da5cf770@oracle.com> <276b4eb8-1bb1-0ec7-72c6-6279665b58f5@oracle.com> Message-ID: > On Oct 18, 2017, at 8:01 AM, Stefan Karlsson wrote: > New webrevs: > http://cr.openjdk.java.net/~stefank/8189360/webrev.01.delta > http://cr.openjdk.java.net/~stefank/8189360/webrev.01 Looks good. 
From stefan.karlsson at oracle.com Wed Oct 18 19:11:17 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 18 Oct 2017 21:11:17 +0200 Subject: 8189360: JvmtiExport::weak_oops_do is called for all JNIHandleBlock instances In-Reply-To: References: <8e8b2dd7-3e49-ef54-6e3b-f13fb847cbd8@oracle.com> <67b8baf1-0e2b-7ebc-2826-de81da5cf770@oracle.com> <276b4eb8-1bb1-0ec7-72c6-6279665b58f5@oracle.com> Message-ID: Thanks all for reviewing. StefanK On 2017-10-18 20:24, Kim Barrett wrote: >> On Oct 18, 2017, at 8:01 AM, Stefan Karlsson wrote: >> New webrevs: >> http://cr.openjdk.java.net/~stefank/8189360/webrev.01.delta >> http://cr.openjdk.java.net/~stefank/8189360/webrev.01 > Looks good. > From stefan.karlsson at oracle.com Wed Oct 18 19:11:48 2017 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 18 Oct 2017 21:11:48 +0200 Subject: RFR: 8189359: Move native weak oops cleaning out of ReferenceProcessor In-Reply-To: References: <8c0aafa1-ca06-105c-72f9-7bd11d382452@oracle.com> <8209F13B-72CA-4135-B589-09D72A0B54AA@oracle.com> Message-ID: Thanks all for reviewing. StefanK On 2017-10-18 20:21, Kim Barrett wrote: >> On Oct 18, 2017, at 7:55 AM, Stefan Karlsson wrote: >> >> Hi all, >> >> Updated webrevs: >> http://cr.openjdk.java.net/~stefank/8189359/webrev.03.delta >> http://cr.openjdk.java.net/~stefank/8189359/webrev.03 > Looks good. > From coleen.phillimore at oracle.com Wed Oct 18 20:44:28 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 18 Oct 2017 16:44:28 -0400 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> Message-ID: This looks really nice.? A few minor comments. http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/src/hotspot/share/runtime/handshake.hpp.html 51 // or the JavaThread it self. typo, "itself" Thank you for adding these comments.? I think they're just right in length and detail in the header. http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/src/hotspot/share/runtime/handshake.cpp.html The protocol in HandshakeState::process_self_inner and cancel_inner is: ??? clear_handshake(thread); ??? if (op != NULL) { ????? op->do_handshake(thread); ??? } But in HandshakeState::process_by_vmthread(), the order is reversed.? Can you explain why in the comments. ??? _operation->do_handshake(target); ??? clear_handshake(target); It looks like the thread can't continue while the handshake operation is in progress, so does the order matter? http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/test/hotspot/jtreg/runtime/handshake/HandshakeWalkStackNativeTest.java.html This has the wrong @test name.? These could use an @comment line about what you expect also.? I don't know what's "Native" about it though, isn't it testing what happens when you use -XX:+ThreadLocalHandshakes? http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/test/hotspot/jtreg/runtime/handshake/HandshakeWalkStackFallbackTest.java.html This one too an @comment that it's testing the fallback VM operation would be good. I don't need to see another webrev for the comment changes. Lastly, as I said before, I think putting the safepoint polls in the interpreter at return and backward branches would be a good follow on changeset. Thanks, Coleen On 10/11/17 9:37 AM, Robbin Ehn wrote: > Hi all, > > Starting the review of the code while JEP work is still not completed. 
> > JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 > > This JEP introduces a way to execute a callback on threads without > performing a global VM safepoint. It makes it both possible and cheap > to stop individual threads and not just all threads or none. > > Entire changeset: > http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ > > Divided into 3-parts, > SafepointMechanism abstraction: > http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ > Consolidating polling page allocation: > http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ > Handshakes: > http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ > > A handshake operation is a callback that is executed for each > JavaThread while that thread is in a safepoint safe state. The > callback is executed either by the thread itself or by the VM thread > while keeping the thread in a blocked state. The big difference > between safepointing and handshaking is that the per thread operation > will be performed on all threads as soon as possible and they will > continue to execute as soon as it?s own operation is completed. If a > JavaThread is known to be running, then a handshake can be performed > with that single JavaThread as well. > > The current safepointing scheme is modified to perform an indirection > through a per-thread pointer which will allow a single thread's > execution to be forced to trap on the guard page. In order to force a > thread to yield the VM updates the per-thread pointer for the > corresponding thread to point to the guarded page. > > Example of potential use-cases: > -Biased lock revocation > -External requests for stack traces > -Deoptimization > -Async exception delivery > -External suspension > -Eliding memory barriers > > All of these will benefit the VM moving towards becoming more > low-latency friendly by reducing the number of global safepoints. > Platforms that do not yet implement the per JavaThread poll, a > fallback to normal safepoint is in place. HandshakeOneThread will then > be a normal safepoint. The supported platforms are Linux x64 and > Solaris SPARC. > > Tested heavily with various test suits and comes with a few new tests. > > Performance testing using standardized benchmark show no signification > changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris > SPARC (not statistically ensured). A minor regression for the load vs > load load on x64 is expected and a slight increase on SPARC due to the > cost of ?materializing? the page vs load load. > The time to trigger a safepoint was measured on a large machine to not > be an issue. The looping over threads and arming the polling page will > benefit from the work on JavaThread life-cycle (8167108 - SMR and > JavaThread Lifecycle: > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) > which puts all JavaThreads in an array instead of a linked list. > > Thanks, Robbin From david.holmes at oracle.com Thu Oct 19 02:07:48 2017 From: david.holmes at oracle.com (David Holmes) Date: Thu, 19 Oct 2017 12:07:48 +1000 Subject: RFR: JDK-8189608 Remove duplicated jni.h In-Reply-To: <237f7a02-73a8-a121-d4f4-5978c7479b79@oracle.com> References: <05451b2c-9905-e1cc-7cfb-39fbe1d1c983@oracle.com> <237f7a02-73a8-a121-d4f4-5978c7479b79@oracle.com> Message-ID: On 18/10/2017 11:14 PM, coleen.phillimore at oracle.com wrote: > > This looks great.? There's also jvm.h too, which is a little more > different but shouldn't be. jvm.h plus the platform specific headers need a bit more work. 
There's a runtime bug open for that: https://bugs.openjdk.java.net/browse/JDK-8189610 > Did/could you make this change in the jdk10/hs repository since it's > primarily hotspot files??? I can't tell from the webrev. I suggested hs. David > Thanks, > Coleen > > > > On 10/18/17 4:53 AM, Magnus Ihse Bursie wrote: >> The file jni.h is stored twice in the repo, both in hotspot and in >> java.base. They are both identical, and only the java.base version is >> included in the final product. >> >> This bug is a part of the umbrella effort JDK-8167078 "Duplicate >> header files in hotspot and jdk". As for JDK-8189607, my reasoning is >> that the java.base version is the one to keep. (In this case, there >> was actually a small difference between the two files -- the hotspot >> version first copyright year was 1997, but the java.base version was >> 1996. It makes sense to keep the oldest one.) >> >> My assumption was that hotspot include files should be sorted >> according to the containing directory, and since jni.h no longer >> resides in "prims", I've rearranged the include line where needed. >> >> The -I path added in CompileJvm.gmk is identical to the one in >> JDK-8189607, and will be merged to the same change (depending on which >> fix enters first.) >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8189608 >> WebRev: >> http://cr.openjdk.java.net/~ihse/JDK-8189608-remove-duplicated-jni/webrev.01 >> >> >> /Magnus > From OGATAK at jp.ibm.com Thu Oct 19 06:43:19 2017 From: OGATAK at jp.ibm.com (Kazunori Ogata) Date: Thu, 19 Oct 2017 15:43:19 +0900 Subject: 8188131: [PPC] Increase inlining thresholds to the same as other platforms In-Reply-To: References: Message-ID: Hi Martin, Thank you for your comment. I checked the code cache size by running SPECjbb2015 (composite mode, i.e., single JVM mode, heap size is 31GB). The used code cache size was increased by 4.5MB from 41982Kb to 47006Kb (+12%). Is the increase too large? The raw output of -XX:+PrintCodeCache are: === Original === CodeHeap 'non-profiled nmethods': size=652480Kb used=13884Kb max_used=13884Kb free=638595Kb bounds [0x00001000356f0000, 0x0000100036480000, 0x000010005d420000] CodeHeap 'profiled nmethods': size=652480Kb used=26593Kb max_used=26593Kb free=625886Kb bounds [0x000010000d9c0000, 0x000010000f3c0000, 0x00001000356f0000] CodeHeap 'non-nmethods': size=5760Kb used=1505Kb max_used=1559Kb free=4254Kb bounds [0x000010000d420000, 0x000010000d620000, 0x000010000d9c0000] total_blobs=16606 nmethods=10265 adapters=653 compilation: enabled === Modified (webrev.00) === CodeHeap 'non-profiled nmethods': size=652480Kb used=18516Kb max_used=18516Kb free=633964Kb bounds [0x0000100035730000, 0x0000100036950000, 0x000010005d460000] CodeHeap 'profiled nmethods': size=652480Kb used=26963Kb max_used=26963Kb free=625516Kb bounds [0x000010000da00000, 0x000010000f460000, 0x0000100035730000] CodeHeap 'non-nmethods': size=5760Kb used=1527Kb max_used=1565Kb free=4232Kb bounds [0x000010000d460000, 0x000010000d660000, 0x000010000da00000] total_blobs=16561 nmethods=10295 adapters=653 compilation: enabled Regards, Ogata From: "Doerr, Martin" To: Kazunori Ogata , "hotspot-dev at openjdk.java.net" , "ppc-aix-port-dev at openjdk.java.net" Date: 2017/10/18 19:43 Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as other platforms Hi Ogata, sorry for the delay. I had missed this one. The change looks feasible to me. It may only impact the utilization of the Code Cache. Can you evaluate that (e.g. 
by running large benchmarks with -XX:+PrintCodeCache)? Thanks and best regards, Martin -----Original Message----- From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Kazunori Ogata Sent: Freitag, 29. September 2017 08:42 To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net Subject: RFR: 8188131: [PPC] Increase inlining thresholds to the same as other platforms Hi all, Please review a change for JDK-8188131. Bug report: https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8188131&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p-FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD73lAZxkNhGsrlDkk-YUYORQ&s=ic27Fb2_vyTSsUAPraEI89UDJy9cbodGojvMw9DNHiU&e= Webrev: https://urldefense.proofpoint.com/v2/url?u=http-3A__cr.openjdk.java.net_-7Ehorii_8188131_webrev.00_&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p-FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD73lAZxkNhGsrlDkk-YUYORQ&s=xS8PbLyuVtbOBRDMIB-i9r6lTggpGH3Np8kmONkkMAg&e= This change increases the default values of FreqInlineSize and InlineSmallCode in ppc64 to 325 and 2500, respectively. These values are the same as aarch64. The performance of TPC-DS Q96 was improved by about 6% with this change. Regards, Ogata From magnus.ihse.bursie at oracle.com Thu Oct 19 07:21:15 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Thu, 19 Oct 2017 09:21:15 +0200 Subject: RFR: JDK-8189608 Remove duplicated jni.h In-Reply-To: <237f7a02-73a8-a121-d4f4-5978c7479b79@oracle.com> References: <05451b2c-9905-e1cc-7cfb-39fbe1d1c983@oracle.com> <237f7a02-73a8-a121-d4f4-5978c7479b79@oracle.com> Message-ID: <075ad93f-cd7f-0147-e7a0-2c9ebc44acfd@oracle.com> On 2017-10-18 15:14, coleen.phillimore at oracle.com wrote: > > This looks great. Thank you! > There's also jvm.h too, which is a little more different but shouldn't > be. That needs a bit of work to make sure no relevant differences get lost. I opened JDK-8189610 for the hotspot team to fix this, before I can proceed with the unification. > > Did/could you make this change in the jdk10/hs repository since it's > primarily hotspot files??? I can't tell from the webrev. Sorry I was not clear on this. I started out by doing the patch in my jdk10/master clone, but I pushed it to jdk10/hs. /Magnus > > Thanks, > Coleen > > > > On 10/18/17 4:53 AM, Magnus Ihse Bursie wrote: >> The file jni.h is stored twice in the repo, both in hotspot and in >> java.base. They are both identical, and only the java.base version is >> included in the final product. >> >> This bug is a part of the umbrella effort JDK-8167078 "Duplicate >> header files in hotspot and jdk". As for JDK-8189607, my reasoning is >> that the java.base version is the one to keep. (In this case, there >> was actually a small difference between the two files -- the hotspot >> version first copyright year was 1997, but the java.base version was >> 1996. It makes sense to keep the oldest one.) >> >> My assumption was that hotspot include files should be sorted >> according to the containing directory, and since jni.h no longer >> resides in "prims", I've rearranged the include line where needed. >> >> The -I path added in CompileJvm.gmk is identical to the one in >> JDK-8189607, and will be merged to the same change (depending on >> which fix enters first.) 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8189608 >> WebRev: >> http://cr.openjdk.java.net/~ihse/JDK-8189608-remove-duplicated-jni/webrev.01 >> >> /Magnus > From goetz.lindenmaier at sap.com Thu Oct 19 11:03:16 2017 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 19 Oct 2017 11:03:16 +0000 Subject: 8188131: [PPC] Increase inlining thresholds to the same as other platforms In-Reply-To: References: Message-ID: Hi Kazunori, To me, this seems to be a very large increase. Considering that not only the required code cache size but also the compiler cpu time will increase in this magnitude, this seems to be a rather risky step that should be tested for its benefits on systems that are highly contended. In this case, you probably had enough space in the code cache so that no recompilation etc. happened. To further look at this I could think of 1. finding the minimal code cache size with the old flags where the JIT is not disabled 2. finding the same size for the new flag settings --> How much more is needed for the new settings? Then you should compare the performance with the bigger code cache size for both, and see whether there still is performance improvement, or whether it's eaten up by more compile time. I.e. you should have a setup where compiler threads and application threads compete for the available CPUs. What do you think? Best regards, Goetz. > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > Behalf Of Kazunori Ogata > Sent: Donnerstag, 19. Oktober 2017 08:43 > To: Doerr, Martin > Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as other > platforms > > Hi Martin, > > Thank you for your comment. I checked the code cache size by running > SPECjbb2015 (composite mode, i.e., single JVM mode, heap size is 31GB). > > The used code cache size was increased by 4.5MB from 41982Kb to 47006Kb > (+12%). Is the increase too large? 
> > > The raw output of -XX:+PrintCodeCache are: > > === Original === > CodeHeap 'non-profiled nmethods': size=652480Kb used=13884Kb > max_used=13884Kb free=638595Kb > bounds [0x00001000356f0000, 0x0000100036480000, 0x000010005d420000] > CodeHeap 'profiled nmethods': size=652480Kb used=26593Kb > max_used=26593Kb > free=625886Kb > bounds [0x000010000d9c0000, 0x000010000f3c0000, 0x00001000356f0000] > CodeHeap 'non-nmethods': size=5760Kb used=1505Kb max_used=1559Kb > free=4254Kb > bounds [0x000010000d420000, 0x000010000d620000, 0x000010000d9c0000] > total_blobs=16606 nmethods=10265 adapters=653 > compilation: enabled > > > === Modified (webrev.00) === > CodeHeap 'non-profiled nmethods': size=652480Kb used=18516Kb > max_used=18516Kb free=633964Kb > bounds [0x0000100035730000, 0x0000100036950000, 0x000010005d460000] > CodeHeap 'profiled nmethods': size=652480Kb used=26963Kb > max_used=26963Kb > free=625516Kb > bounds [0x000010000da00000, 0x000010000f460000, 0x0000100035730000] > CodeHeap 'non-nmethods': size=5760Kb used=1527Kb max_used=1565Kb > free=4232Kb > bounds [0x000010000d460000, 0x000010000d660000, 0x000010000da00000] > total_blobs=16561 nmethods=10295 adapters=653 > compilation: enabled > > > Regards, > Ogata > > > > > From: "Doerr, Martin" > To: Kazunori Ogata , "hotspot- > dev at openjdk.java.net" > , "ppc-aix-port-dev at openjdk.java.net" > > Date: 2017/10/18 19:43 > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the > same as other platforms > > > > Hi Ogata, > > sorry for the delay. I had missed this one. > > The change looks feasible to me. > > It may only impact the utilization of the Code Cache. Can you evaluate > that (e.g. by running large benchmarks with -XX:+PrintCodeCache)? > > Thanks and best regards, > Martin > > > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > Behalf > Of Kazunori Ogata > Sent: Freitag, 29. September 2017 08:42 > To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net > Subject: RFR: 8188131: [PPC] Increase inlining thresholds to the same as > other platforms > > Hi all, > > Please review a change for JDK-8188131. > > Bug report: > https://urldefense.proofpoint.com/v2/url?u=https- > 3A__bugs.openjdk.java.net_browse_JDK- > 2D8188131&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > 73lAZxkNhGsrlDkk- > YUYORQ&s=ic27Fb2_vyTSsUAPraEI89UDJy9cbodGojvMw9DNHiU&e= > > Webrev: > https://urldefense.proofpoint.com/v2/url?u=http- > 3A__cr.openjdk.java.net_- > 7Ehorii_8188131_webrev.00_&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > 73lAZxkNhGsrlDkk-YUYORQ&s=xS8PbLyuVtbOBRDMIB- > i9r6lTggpGH3Np8kmONkkMAg&e= > > > This change increases the default values of FreqInlineSize and > InlineSmallCode in ppc64 to 325 and 2500, respectively. These values are > the same as aarch64. The performance of TPC-DS Q96 was improved by > about > 6% with this change. > > > Regards, > Ogata > > > From robbin.ehn at oracle.com Thu Oct 19 12:36:34 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Thu, 19 Oct 2017 14:36:34 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> Message-ID: <8e9d6d66-8d5c-6605-f0b1-fdbfedef43cf@oracle.com> Thanks for looking at this Coleen, On 2017-10-18 22:44, coleen.phillimore at oracle.com wrote: > > This looks really nice.? A few minor comments. 
> > http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/src/hotspot/share/runtime/handshake.hpp.html
> >
> > Line 51: // or the JavaThread it self.
> >
> > typo, "itself"

Fixed

> > Thank you for adding these comments. I think they're just right in length and detail in the header.
> >
> > http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/src/hotspot/share/runtime/handshake.cpp.html
> >
> > The protocol in HandshakeState::process_self_inner and cancel_inner is:
> >
> >     clear_handshake(thread);
> >     if (op != NULL) {
> >       op->do_handshake(thread);
> >     }
> >
> > But in HandshakeState::process_by_vmthread(), the order is reversed. Can you explain why in the comments?
> >
> >     _operation->do_handshake(target);
> >     clear_handshake(target);
> >
> > It looks like the thread can't continue while the handshake operation is in progress, so does the order matter?

The key part here is that the operation must be cleared before signaling the semaphore. The early clearing is there because if the thread is doing its own operation, the VM thread can quickly skip that thread by checking whether it still has an operation.

> > http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/test/hotspot/jtreg/runtime/handshake/HandshakeWalkStackNativeTest.java.html
> >
> > This has the wrong @test name. These could use an @comment line about what you expect also. I don't know what's "Native" about it though, isn't it testing what happens when you use -XX:+ThreadLocalHandshakes?
> >
> > http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/test/hotspot/jtreg/runtime/handshake/HandshakeWalkStackFallbackTest.java.html
> >
> > For this one too, an @comment noting that it's testing the fallback VM operation would be good.
> >
> > I don't need to see another webrev for the comment changes.

Here it is; there were inconsistencies in the tests, I think it is better now.
http://cr.openjdk.java.net/~rehn/8185640/v2/Coleen-n-Test-Cleanup-4/webrev/

> > Lastly, as I said before, I think putting the safepoint polls in the interpreter at return and backward branches would be a good follow-on changeset.

I will let Claes R decide if that is an acceptable approach.

Thanks, Robbin

> > Thanks,
> > Coleen
> >
> > On 10/11/17 9:37 AM, Robbin Ehn wrote:
>> Hi all,
>>
>> Starting the review of the code while JEP work is still not completed.
>>
>> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640
>>
>> This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not just all threads or none.
>>
>> Entire changeset:
>> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/
>>
>> Divided into 3-parts,
>> SafepointMechanism abstraction:
>> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/
>> Consolidating polling page allocation:
>> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/
>> Handshakes:
>> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/
>>
>> A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be performed on all threads as soon as possible and they will continue to execute as soon as its own operation is completed. If a JavaThread is known to be running, then a handshake can be performed with that single JavaThread as well.
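As an illustration of the callback model just described, a handshake operation can be pictured as a closure that is run once per JavaThread; everything below is a made-up, self-contained sketch, not the API or types from the webrev:

#include <cstdio>
#include <vector>

// Stand-in for a JavaThread; only what the sketch needs.
struct Thread { int id; };

// A handshake operation: a callback applied to one thread at a time
// while that thread is in a safepoint-safe state (only simulated here).
struct ThreadClosureSketch {
  virtual void do_thread(Thread* t) = 0;
  virtual ~ThreadClosureSketch() {}
};

struct PrintStackSketch : ThreadClosureSketch {
  void do_thread(Thread* t) override {
    std::printf("would walk the stack of thread %d\n", t->id);
  }
};

// Unlike a global safepoint, each thread is released as soon as its own
// callback has run; no thread waits for the whole set to finish.
void execute_handshake(ThreadClosureSketch& op, std::vector<Thread>& threads) {
  for (Thread& t : threads) {
    op.do_thread(&t);   // run by the thread itself or by the VM thread
  }
}

int main() {
  std::vector<Thread> threads = { {1}, {2}, {3} };
  PrintStackSketch op;
  execute_handshake(op, threads);
  return 0;
}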
>> >> The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. >> >> Example of potential use-cases: >> -Biased lock revocation >> -External requests for stack traces >> -Deoptimization >> -Async exception delivery >> -External suspension >> -Eliding memory barriers >> >> All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. >> Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported platforms are Linux x64 and Solaris SPARC. >> >> Tested heavily with various test suits and comes with a few new tests. >> >> Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. >> The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all JavaThreads in an array instead of a linked list. >> >> Thanks, Robbin > From robbin.ehn at oracle.com Thu Oct 19 12:40:24 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Thu, 19 Oct 2017 14:40:24 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <82848a04-21dd-119e-3d53-101a7f25cb54@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <82848a04-21dd-119e-3d53-101a7f25cb54@oracle.com> Message-ID: <04bd05d6-7ce2-93a9-288f-a640fb4b2806@oracle.com> Here is the third incremental change: http://cr.openjdk.java.net/~rehn/8185640/v2/Coleen-n-Test-Cleanup-4/webrev/ Goes on top of Atomic-Update-Rebase-3. Let me know if anyone want to see some other kind of webrevs. Thanks, Robbin On 2017-10-18 11:15, Robbin Ehn wrote: > Hi all, > > Update after re-base with new atomic implementation: > http://cr.openjdk.java.net/~rehn/8185640/v1/Atomic-Update-Rebase-3/ > This goes on top of the Handshakes-2. > > Let me know if you want some other kinds of webrevs. > > I would like to point out that Mikael Gerdin and Erik ?sterlund also are contributors of this changeset. > > Thanks, Robbin > > On 2017-10-11 15:37, Robbin Ehn wrote: >> Hi all, >> >> Starting the review of the code while JEP work is still not completed. >> >> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >> >> This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not just all threads or none. 
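A deliberately simplified sketch of the per-thread indirection described above; the names are invented, and the real implementation arms an actual protected page (so the thread traps at its poll) rather than testing a flag word:

#include <cstdint>

// Simplified model of the per-thread polling indirection.
struct JavaThreadSketch {
  // Where this thread's poll reads from.
  volatile uintptr_t* poll_word;
};

static uintptr_t safe_word  = 0;   // readable: the poll falls through
static uintptr_t armed_word = 1;   // stands in for the guarded page

// Called by the VM thread: only this one thread notices at its next poll.
void arm_thread(JavaThreadSketch* t)    { t->poll_word = &armed_word; }
void disarm_thread(JavaThreadSketch* t) { t->poll_word = &safe_word; }

// What a poll site conceptually checks in this model.
bool poll_says_stop(JavaThreadSketch* t) { return *t->poll_word != 0; }

Arming every thread this way gives back the global safepoint; arming a single thread is what makes a one-thread handshake cheap.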
>> >> Entire changeset: >> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >> >> Divided into 3-parts, >> SafepointMechanism abstraction: >> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >> Consolidating polling page allocation: >> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >> Handshakes: >> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >> >> A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a handshake can be performed with that single JavaThread as well. >> >> The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. >> >> Example of potential use-cases: >> -Biased lock revocation >> -External requests for stack traces >> -Deoptimization >> -Async exception delivery >> -External suspension >> -Eliding memory barriers >> >> All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. >> Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported platforms are Linux x64 and Solaris SPARC. >> >> Tested heavily with various test suits and comes with a few new tests. >> >> Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. >> The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all JavaThreads in an array instead of a linked list. >> >> Thanks, Robbin From coleen.phillimore at oracle.com Thu Oct 19 13:56:17 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 19 Oct 2017 09:56:17 -0400 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <04bd05d6-7ce2-93a9-288f-a640fb4b2806@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <82848a04-21dd-119e-3d53-101a7f25cb54@oracle.com> <04bd05d6-7ce2-93a9-288f-a640fb4b2806@oracle.com> Message-ID: <868cd47e-f120-69d9-8932-45501794d4b5@oracle.com> http://cr.openjdk.java.net/~rehn/8185640/v2/Coleen-n-Test-Cleanup-4/webrev/test/hotspot/jtreg/runtime/handshake/HandshakeTransitionTest.java.udiff.html Thank you this is better. In this test, what happens if it fails? Everything looks better with this change. 
Thanks, Coleen On 10/19/17 8:40 AM, Robbin Ehn wrote: > Here is the third incremental change: > http://cr.openjdk.java.net/~rehn/8185640/v2/Coleen-n-Test-Cleanup-4/webrev/ > > Goes on top of Atomic-Update-Rebase-3. > > Let me know if anyone want to see some other kind of webrevs. > > Thanks, Robbin > > On 2017-10-18 11:15, Robbin Ehn wrote: >> Hi all, >> >> Update after re-base with new atomic implementation: >> http://cr.openjdk.java.net/~rehn/8185640/v1/Atomic-Update-Rebase-3/ >> This goes on top of the Handshakes-2. >> >> Let me know if you want some other kinds of webrevs. >> >> I would like to point out that Mikael Gerdin and Erik ?sterlund also >> are contributors of this changeset. >> >> Thanks, Robbin >> >> On 2017-10-11 15:37, Robbin Ehn wrote: >>> Hi all, >>> >>> Starting the review of the code while JEP work is still not completed. >>> >>> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >>> >>> This JEP introduces a way to execute a callback on threads without >>> performing a global VM safepoint. It makes it both possible and >>> cheap to stop individual threads and not just all threads or none. >>> >>> Entire changeset: >>> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >>> >>> Divided into 3-parts, >>> SafepointMechanism abstraction: >>> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >>> Consolidating polling page allocation: >>> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >>> Handshakes: >>> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >>> >>> A handshake operation is a callback that is executed for each >>> JavaThread while that thread is in a safepoint safe state. The >>> callback is executed either by the thread itself or by the VM thread >>> while keeping the thread in a blocked state. The big difference >>> between safepointing and handshaking is that the per thread >>> operation will be performed on all threads as soon as possible and >>> they will continue to execute as soon as it?s own operation is >>> completed. If a JavaThread is known to be running, then a handshake >>> can be performed with that single JavaThread as well. >>> >>> The current safepointing scheme is modified to perform an >>> indirection through a per-thread pointer which will allow a single >>> thread's execution to be forced to trap on the guard page. In order >>> to force a thread to yield the VM updates the per-thread pointer for >>> the corresponding thread to point to the guarded page. >>> >>> Example of potential use-cases: >>> -Biased lock revocation >>> -External requests for stack traces >>> -Deoptimization >>> -Async exception delivery >>> -External suspension >>> -Eliding memory barriers >>> >>> All of these will benefit the VM moving towards becoming more >>> low-latency friendly by reducing the number of global safepoints. >>> Platforms that do not yet implement the per JavaThread poll, a >>> fallback to normal safepoint is in place. HandshakeOneThread will >>> then be a normal safepoint. The supported platforms are Linux x64 >>> and Solaris SPARC. >>> >>> Tested heavily with various test suits and comes with a few new tests. >>> >>> Performance testing using standardized benchmark show no >>> signification changes, the latest number was -0.7% on Linux x64 and >>> +1.5% Solaris SPARC (not statistically ensured). A minor regression >>> for the load vs load load on x64 is expected and a slight increase >>> on SPARC due to the cost of ?materializing? the page vs load load. 
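As a reading aid for the "load vs load load" remark above, a schematic of the two poll shapes being compared, written as plain C++ rather than the emitted assembly (illustrative only):

#include <cstdint>

// Global poll: one access to a fixed, process-wide polling page.
bool global_poll(volatile uintptr_t* global_polling_page) {
  return *global_polling_page != 0;                 // single load
}

// Thread-local poll: first load the per-thread pointer, then touch it.
bool thread_local_poll(volatile uintptr_t* const* per_thread_poll_slot) {
  volatile uintptr_t* p = *per_thread_poll_slot;    // load
  return *p != 0;                                   // dependent load ("load load")
}

The extra dependent load per poll site is the small x64 regression being referred to.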
>>> The time to trigger a safepoint was measured on a large machine to >>> not be an issue. The looping over threads and arming the polling >>> page will benefit from the work on JavaThread life-cycle (8167108 - >>> SMR and JavaThread Lifecycle: >>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) >>> which puts all JavaThreads in an array instead of a linked list. >>> >>> Thanks, Robbin From coleen.phillimore at oracle.com Thu Oct 19 14:20:34 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 19 Oct 2017 10:20:34 -0400 Subject: RFR: 8184914: Use MacroAssembler::cmpoop() consistently when comparing heap objects In-Reply-To: <55bb0f72-df71-44bc-53a0-7d982ab1ca04@redhat.com> References: <8d667010-f17e-7d1b-088b-106999e3b005@redhat.com> <9b629556-b3f0-e52e-35e0-711c6a767e95@oracle.com> <55bb0f72-df71-44bc-53a0-7d982ab1ca04@redhat.com> Message-ID: I'm calling this as "trivial" and can be pushed now. Thanks, Coleen On 10/17/17 5:05 PM, Roman Kennke wrote: > >> >> This looks reasonable to me.? Maybe the compiler group should review >> the c1 part.? I changed the mailing list to hotspot-dev. >> I can sponsor this for you. > Thanks, thanks and thanks! ;-) > > Roman > >> Thanks, >> Coleen >> >> On 10/17/17 4:22 PM, Roman Kennke wrote: >>> (Not sure if this is the correct list to ask.. if not, please let me >>> know and/or redirect me) >>> >>> Currently, cmpoop() is only declared for 32-bit x86, and only used >>> in 2 places in C1 to compare oops. In other places, oops are >>> compared using cmpptr(). It would be useful to distinguish normal >>> pointer comparisons from heap object comparisons, and use cmpoop() >>> consistently for heap object comparisons. This would remove clutter >>> in several places where we have #ifdef _LP64 around comparisons, and >>> would also allow to insert necessary barriers for GCs that need them >>> (e.g. Shenandoah) later. >>> >>> http://cr.openjdk.java.net/~rkennke/8184914/webrev.00/ >>> >>> >>> Tested by running hotspot_gc jtreg tests. >>> >>> Can I get a review please? >>> >>> Thanks, Roman >>> >>> >> > From OGATAK at jp.ibm.com Fri Oct 20 06:31:47 2017 From: OGATAK at jp.ibm.com (Kazunori Ogata) Date: Fri, 20 Oct 2017 15:31:47 +0900 Subject: 8188131: [PPC] Increase inlining thresholds to the same as other platforms In-Reply-To: References: Message-ID: Hi Goetz, Thank you for your comment. OK, I'll evaluate the patch more by comparing the minimum code cache sizes and the performance on the cache size. It is helpful if you could explain what is the difference of the JIT behavior when the code cache is large enough and when it is the minimum size. It seems almost the same to me because all the methods that needed to be compiled should be compiled in both cases, but I may miss something. By the way, the benchmark I confirmed performance improvement was TPC-DS q96, but I measured the code cache size of SPECjbb2015 by my mistake. I'll compare the minimum code cache sizes and the performance of both benchmarks, as this patch will affect all applications. Regards, Ogata From: "Lindenmaier, Goetz" To: Kazunori Ogata , "Doerr, Martin" Cc: "ppc-aix-port-dev at openjdk.java.net" , "hotspot-dev at openjdk.java.net" Date: 2017/10/19 20:03 Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as other platforms Hi Kazunori, To me, this seems to be a very large increase. 
Considering that not only the required code cache size but also the compiler cpu time will increase in this magnitude, this seems to be a rather risky step that should be tested for its benefits on systems that are highly contended. In this case, you probably had enough space in the code cache so that no recompilation etc. happened. To further look at this I could think of 1. finding the minimal code cache size with the old flags where the JIT is not disabled 2. finding the same size for the new flag settings --> How much more is needed for the new settings? Then you should compare the performance with the bigger code cache size for both, and see whether there still is performance improvement, or whether it's eaten up by more compile time. I.e. you should have a setup where compiler threads and application threads compete for the available CPUs. What do you think? Best regards, Goetz. > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > Behalf Of Kazunori Ogata > Sent: Donnerstag, 19. Oktober 2017 08:43 > To: Doerr, Martin > Cc: ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the same as other > platforms > > Hi Martin, > > Thank you for your comment. I checked the code cache size by running > SPECjbb2015 (composite mode, i.e., single JVM mode, heap size is 31GB). > > The used code cache size was increased by 4.5MB from 41982Kb to 47006Kb > (+12%). Is the increase too large? > > > The raw output of -XX:+PrintCodeCache are: > > === Original === > CodeHeap 'non-profiled nmethods': size=652480Kb used=13884Kb > max_used=13884Kb free=638595Kb > bounds [0x00001000356f0000, 0x0000100036480000, 0x000010005d420000] > CodeHeap 'profiled nmethods': size=652480Kb used=26593Kb > max_used=26593Kb > free=625886Kb > bounds [0x000010000d9c0000, 0x000010000f3c0000, 0x00001000356f0000] > CodeHeap 'non-nmethods': size=5760Kb used=1505Kb max_used=1559Kb > free=4254Kb > bounds [0x000010000d420000, 0x000010000d620000, 0x000010000d9c0000] > total_blobs=16606 nmethods=10265 adapters=653 > compilation: enabled > > > === Modified (webrev.00) === > CodeHeap 'non-profiled nmethods': size=652480Kb used=18516Kb > max_used=18516Kb free=633964Kb > bounds [0x0000100035730000, 0x0000100036950000, 0x000010005d460000] > CodeHeap 'profiled nmethods': size=652480Kb used=26963Kb > max_used=26963Kb > free=625516Kb > bounds [0x000010000da00000, 0x000010000f460000, 0x0000100035730000] > CodeHeap 'non-nmethods': size=5760Kb used=1527Kb max_used=1565Kb > free=4232Kb > bounds [0x000010000d460000, 0x000010000d660000, 0x000010000da00000] > total_blobs=16561 nmethods=10295 adapters=653 > compilation: enabled > > > Regards, > Ogata > > > > > From: "Doerr, Martin" > To: Kazunori Ogata , "hotspot- > dev at openjdk.java.net" > , "ppc-aix-port-dev at openjdk.java.net" > > Date: 2017/10/18 19:43 > Subject: RE: 8188131: [PPC] Increase inlining thresholds to the > same as other platforms > > > > Hi Ogata, > > sorry for the delay. I had missed this one. > > The change looks feasible to me. > > It may only impact the utilization of the Code Cache. Can you evaluate > that (e.g. by running large benchmarks with -XX:+PrintCodeCache)? > > Thanks and best regards, > Martin > > > -----Original Message----- > From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On > Behalf > Of Kazunori Ogata > Sent: Freitag, 29. 
September 2017 08:42 > To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net > Subject: RFR: 8188131: [PPC] Increase inlining thresholds to the same as > other platforms > > Hi all, > > Please review a change for JDK-8188131. > > Bug report: > https://urldefense.proofpoint.com/v2/url?u=https- > 3A__bugs.openjdk.java.net_browse_JDK- > 2D8188131&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > 73lAZxkNhGsrlDkk- > YUYORQ&s=ic27Fb2_vyTSsUAPraEI89UDJy9cbodGojvMw9DNHiU&e= > > Webrev: > https://urldefense.proofpoint.com/v2/url?u=http- > 3A__cr.openjdk.java.net_- > 7Ehorii_8188131_webrev.00_&d=DwIFAg&c=jf_iaSHvJObTbx-siA1ZOg&r=p- > FJcrbNvnCOLkbIdmQ2tigCrcpdU77tlI2EIdaEcJw&m=ExKSiZAany_n7vS453MD > 73lAZxkNhGsrlDkk-YUYORQ&s=xS8PbLyuVtbOBRDMIB- > i9r6lTggpGH3Np8kmONkkMAg&e= > > > This change increases the default values of FreqInlineSize and > InlineSmallCode in ppc64 to 325 and 2500, respectively. These values are > the same as aarch64. The performance of TPC-DS Q96 was improved by > about > 6% with this change. > > > Regards, > Ogata > > > From bourges.laurent at gmail.com Fri Oct 20 08:19:53 2017 From: bourges.laurent at gmail.com (=?UTF-8?Q?Laurent_Bourg=C3=A8s?=) Date: Fri, 20 Oct 2017 10:19:53 +0200 Subject: Upgrading gcc arch ? In-Reply-To: References: Message-ID: Hi, I wonder if it is time to compile c/c++ code with a more recent cpu architecture (x86-64 is quite old: only SSE ?) to take benefit of performance optimizations offered by recent CPU and compilers (AVX...). Of course that means such builds would be specific to a CPU class and that will require build changes to make multiple flavors depending on the CPU classes ... See gcc -mtune argument: https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/x86-Options.html " ?sandybridge? Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support. ?ivybridge? Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C instruction set support. ?haswell? Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2 and F16C instruction set support. ?broadwell? Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set support. ?skylake? Intel Skylake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC and XSAVES instruction set support. ?bonnell? Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support. ?silvermont? Intel Silvermont CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND instruction set support. ?knl? Intel Knight's Landing CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, AVX512F, AVX512PF, AVX512ER and AVX512CD instruction set support. ?skylake-avx512? 
Intel Skylake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ and AVX512CD instruction set support. " Comments are welcome, Laurent From glaubitz at physik.fu-berlin.de Fri Oct 20 08:25:54 2017 From: glaubitz at physik.fu-berlin.de (John Paul Adrian Glaubitz) Date: Fri, 20 Oct 2017 10:25:54 +0200 Subject: Upgrading gcc arch ? In-Reply-To: References: Message-ID: <8d81eea9-0fff-5981-f885-acc66c69fb33@physik.fu-berlin.de> On 10/20/2017 10:19 AM, Laurent Bourg?s wrote: > I wonder if it is time to compile c/c++ code with a more recent cpu > architecture (x86-64 is quite old: only SSE ?) to take benefit of > performance optimizations offered by recent CPU and compilers (AVX...). Only if it's possible to make use of these features during runtime as it's being done on SPARC. > Of course that means such builds would be specific to a CPU class and that > will require build changes to make multiple flavors depending on the CPU > classes ... No, if this a compile time option, this is an absolute no go. It would be absolutely crazy to break compatibility with such widely available hardware with a piece of software which has one of the largest installation bases world wide. Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz at debian.org `. `' Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 From david.holmes at oracle.com Fri Oct 20 09:19:00 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 20 Oct 2017 19:19:00 +1000 Subject: Upgrading gcc arch ? In-Reply-To: References: Message-ID: bcc'ing the discuss list On 20/10/2017 6:19 PM, Laurent Bourg?s wrote: > Hi, > > I wonder if it is time to compile c/c++ code with a more recent cpu > architecture (x86-64 is quite old: only SSE ?) to take benefit of > performance optimizations offered by recent CPU and compilers (AVX...). The focus in hotspot is on JIT generated code which does take advantage of such optimizations based on the runtime CPU capabilities. Is there specific C code in the JDK that you think would benefit from them? Have you done comparison builds and run any benchmarks? Thanks, David > Of course that means such builds would be specific to a CPU class and that > will require build changes to make multiple flavors depending on the CPU > classes ... > > See gcc -mtune argument: > https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/x86-Options.html > > " > ?sandybridge? > Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, > SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support. > ?ivybridge? > Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, > SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C > instruction set support. > ?haswell? > Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, > SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, > FMA, BMI, BMI2 and F16C instruction set support. > ?broadwell? > Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, > SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, > RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set > support. > ?skylake? 
> Intel Skylake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, > SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, > FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC and > XSAVES instruction set support. > ?bonnell? > Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 > and SSSE3 instruction set support. > ?silvermont? > Intel Silvermont CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, > SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND instruction set > support. > ?knl? > Intel Knight's Landing CPU with 64-bit extensions, MOVBE, MMX, SSE, > SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, > FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, AVX512F, > AVX512PF, AVX512ER and AVX512CD instruction set support. > ?skylake-avx512? > Intel Skylake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, > SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE, > RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, > XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ and AVX512CD instruction set > support. > > " > > Comments are welcome, > Laurent > From thomas.schatzl at oracle.com Fri Oct 20 10:05:29 2017 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 20 Oct 2017 12:05:29 +0200 Subject: RFR(M) 8186834:Expanding old area without full GC in parallel GC In-Reply-To: References: Message-ID: <1508493929.2820.9.camel@oracle.com> Hi, On Fri, 2017-10-20 at 18:13 +0900, Michihiro Horie wrote: > Hi Thomas, > > Thanks a lot for the discussion, also sorry for my late reply. > > I think MinHeapFreeRatio tunes the size of heap expansion, while > UseAdaptiveGenerationSizePolicyBeforeMajorCollection decides to > expand heap, whose size is decided by MinHeapFreeRatio, without full > GC. I agree, but one could tune MinHeapFreeRatio so that the amount of full gcs and the time spent in there would be much smaller than by default. > >Particularly if, as you mention, full gc will not yield a > significant amount of freed memory, both methods seem to achieve the > exact same effect. > Yes, so I think heap once expands up to Xmx, both methods have the > same effect. > [...] > >Otherwise, if you were able to pass different VM arguments to the > >different VMs, the use of -Xms (instead of that new flag) would seem > >straightforward to me (Only specifying -Xms will not actually commit > >the memory, so there is no difference in actual memory use). > I did not tell this (sorry), but currently Xms and Xmx are set > explicitly in the VM arguments because we want to use only needed > memory. > As mentioned before, even with -Xms == -Xmx, memory is not actually backed with physical memory until actually touched by default. I could imagine that -Xms==-Xmx would yield better (initial) performance as the young gen will be sized larger. So the suggested change would only make a difference in case you also explicitly pre-touched that memory from what I understood. Not sure if that is what you do or desire (enabling memory pretouch is typically only used with -Xms==-Xmx, so not sure if that is a good use case). Thanks, Thomas From robbin.ehn at oracle.com Fri Oct 20 10:11:54 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Fri, 20 Oct 2017 12:11:54 +0200 Subject: Upgrading gcc arch ? 
In-Reply-To: References: Message-ID: <7bbf5a0d-ed69-6f6e-6c59-2373a465f65d@oracle.com> On 2017-10-20 11:19, David Holmes wrote: > bcc'ing the discuss list > > On 20/10/2017 6:19 PM, Laurent Bourg?s wrote: >> Hi, >> >> I wonder if it is time to compile c/c++ code with a more recent cpu >> architecture (x86-64 is quite old: only SSE ?) to take benefit of >> performance optimizations offered by recent CPU and compilers (AVX...). > > The focus in hotspot is on JIT generated code which does take advantage of such optimizations based on the runtime CPU capabilities. > > Is there specific C code in the JDK that you think would benefit from them? If there are specific code that preform much better with new some newer features we could utilize function multiversioning feature in the gcc. E.g.: __attribute__((target_clones("sse4.2","sse3","default"))) void stream_function(...) { Negative impact on size, so as David says, benchmark first. /Robbin > > Have you done comparison builds and run any benchmarks? > > Thanks, > David > >> Of course that means such builds would be specific to a CPU class and that >> will require build changes to make multiple flavors depending on the CPU >> classes ... >> >> See gcc -mtune argument: >> https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/x86-Options.html >> >> " >> ?sandybridge? >> ???? Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, >> SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support. >> ?ivybridge? >> ???? Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, >> SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C >> instruction set support. >> ?haswell? >> ???? Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, >> SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, >> FMA, BMI, BMI2 and F16C instruction set support. >> ?broadwell? >> ???? Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, >> SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, >> RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set >> support. >> ?skylake? >> ???? Intel Skylake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, >> SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, >> FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC and >> XSAVES instruction set support. >> ?bonnell? >> ???? Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 >> and SSSE3 instruction set support. >> ?silvermont? >> ???? Intel Silvermont CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, >> SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND instruction set >> support. >> ?knl? >> ???? Intel Knight's Landing CPU with 64-bit extensions, MOVBE, MMX, SSE, >> SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, >> FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, AVX512F, >> AVX512PF, AVX512ER and AVX512CD instruction set support. >> ?skylake-avx512? >> ???? Intel Skylake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, >> SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE, >> RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, >> XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ and AVX512CD instruction set >> support. 
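Filling out the target_clones fragment quoted above into something compilable (the function body and data layout are invented purely for illustration; requires gcc 6+ with ifunc support, as on Linux):

#include <stddef.h>

/* gcc emits one clone per listed target plus a resolver that picks the
   best matching clone at load time via an ifunc. */
__attribute__((target_clones("sse4.2", "sse3", "default")))
double stream_sum(const double* a, size_t n) {
  double s = 0.0;
  for (size_t i = 0; i < n; i++) {
    s += a[i];   /* simple loop each clone can vectorize differently */
  }
  return s;
}

The command line stays a plain -O2 x86-64 build; only the attributed function is specialized, which keeps the size cost local to the hot routines that actually benefit.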
>> >> " >> >> Comments are welcome, >> Laurent >> From thomas.stuefe at gmail.com Fri Oct 20 10:56:48 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Fri, 20 Oct 2017 12:56:48 +0200 Subject: Upgrading gcc arch ? In-Reply-To: <7bbf5a0d-ed69-6f6e-6c59-2373a465f65d@oracle.com> References: <7bbf5a0d-ed69-6f6e-6c59-2373a465f65d@oracle.com> Message-ID: On Fri, Oct 20, 2017 at 12:11 PM, Robbin Ehn wrote: > On 2017-10-20 11:19, David Holmes wrote: > >> bcc'ing the discuss list >> >> On 20/10/2017 6:19 PM, Laurent Bourg?s wrote: >> >>> Hi, >>> >>> I wonder if it is time to compile c/c++ code with a more recent cpu >>> architecture (x86-64 is quite old: only SSE ?) to take benefit of >>> performance optimizations offered by recent CPU and compilers (AVX...). >>> >> >> The focus in hotspot is on JIT generated code which does take advantage >> of such optimizations based on the runtime CPU capabilities. >> >> Is there specific C code in the JDK that you think would benefit from >> them? >> > > If there are specific code that preform much better with new some newer > features we could utilize function multiversioning feature in the gcc. > E.g.: > __attribute__((target_clones("sse4.2","sse3","default"))) > void stream_function(...) { > ry > Negative impact on size, so as David says, benchmark first. > > But how would this help with gcc specific optimizations ? You can provide your own implementation, but I thought the idea was to let gcc do the optimization work via -mtune. We still would have one global mtune setting for the compilation unit, right? ..Thomas > /Robbin > > > >> Have you done comparison builds and run any benchmarks? >> >> Thanks, >> David >> >> Of course that means such builds would be specific to a CPU class and that >>> will require build changes to make multiple flavors depending on the CPU >>> classes ... >>> >>> See gcc -mtune argument: >>> https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/x86-Options.html >>> >>> " >>> ?sandybridge? >>> Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, >>> SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set >>> support. >>> ?ivybridge? >>> Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, >>> SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C >>> instruction set support. >>> ?haswell? >>> Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, >>> SSE3, >>> SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, >>> FMA, BMI, BMI2 and F16C instruction set support. >>> ?broadwell? >>> Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, >>> SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, >>> RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set >>> support. >>> ?skylake? >>> Intel Skylake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, >>> SSE3, >>> SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, >>> FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC and >>> XSAVES instruction set support. >>> ?bonnell? >>> Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, >>> SSE3 >>> and SSSE3 instruction set support. >>> ?silvermont? >>> Intel Silvermont CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, >>> SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND instruction >>> set >>> support. >>> ?knl? 
>>> Intel Knight's Landing CPU with 64-bit extensions, MOVBE, MMX, SSE, >>> SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, >>> FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, AVX512F, >>> AVX512PF, AVX512ER and AVX512CD instruction set support. >>> ?skylake-avx512? >>> Intel Skylake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, >>> SSE2, >>> SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, >>> FSGSBASE, >>> RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, >>> XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ and AVX512CD instruction >>> set >>> support. >>> >>> " >>> >>> Comments are welcome, >>> Laurent >>> >>> From robbin.ehn at oracle.com Fri Oct 20 11:37:50 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Fri, 20 Oct 2017 13:37:50 +0200 Subject: Upgrading gcc arch ? In-Reply-To: References: <7bbf5a0d-ed69-6f6e-6c59-2373a465f65d@oracle.com> Message-ID: <9272af52-8d32-ff5d-49f2-098223096173@oracle.com> On 2017-10-20 12:56, Thomas St?fe wrote: > > > On Fri, Oct 20, 2017 at 12:11 PM, Robbin Ehn > wrote: > > On 2017-10-20 11:19, David Holmes wrote: > > bcc'ing the discuss list > > On 20/10/2017 6:19 PM, Laurent Bourg?s wrote: > > Hi, > > I wonder if it is time to compile c/c++ code with a more recent cpu > architecture (x86-64 is quite old: only SSE ?) to take benefit of > performance optimizations offered by recent CPU and compilers (AVX...). > > > The focus in hotspot is on JIT generated code which does take advantage of such optimizations based on the runtime CPU capabilities. > > Is there specific C code in the JDK that you think would benefit from them? > > > If there are specific code that preform much better with new some newer features we could utilize function multiversioning feature in the gcc. > E.g.: > __attribute__((target_clones("sse4.2","sse3","default"))) > void stream_function(...) { > ry > Negative impact on size, so as David says, benchmark first. > > > But how would this help with gcc specific optimizations ?? You can provide your own implementation, but I thought the idea was to let gcc do the optimization work via -mtune. We still would have one global mtune setting for the compilation unit, right? target_clones attribute "is used to specify that a function be cloned into multiple versions compiled with different target options than specified on the command line." gcc generates, in above, 3 functions, you can also do: __attribute__((target_clones("arch=znver1","arch=skylake", "default"))) So you get: [rehn at rehn-lt ~]$ nm a.out | grep stream_function 00000000004009e0 T stream_function 0000000000400bc0 t stream_function.arch_skylake.1 0000000000400b90 t stream_function.arch_znver1.0 0000000000400bf0 i stream_function.ifunc 0000000000400bf0 W stream_function.resolver /Robbin > > ..Thomas > > /Robbin > > > > Have you done comparison builds and run any benchmarks? > > Thanks, > David > > Of course that means such builds would be specific to a CPU class and that > will require build changes to make multiple flavors depending on the CPU > classes ... > > See gcc -mtune argument: > https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/x86-Options.html > > " > ?sandybridge? > ???? Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, > SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support. > ?ivybridge? > ???? 
Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, > SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C > instruction set support. > ?haswell? > ???? Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, > SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, > FMA, BMI, BMI2 and F16C instruction set support. > ?broadwell? > ???? Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, > SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, > RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set > support. > ?skylake? > ???? Intel Skylake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, > SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, > FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC and > XSAVES instruction set support. > ?bonnell? > ???? Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 > and SSSE3 instruction set support. > ?silvermont? > ???? Intel Silvermont CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, > SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND instruction set > support. > ?knl? > ???? Intel Knight's Landing CPU with 64-bit extensions, MOVBE, MMX, SSE, > SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, > FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, AVX512F, > AVX512PF, AVX512ER and AVX512CD instruction set support. > ?skylake-avx512? > ???? Intel Skylake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, > SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE, > RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, > XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ and AVX512CD instruction set > support. > > " > > Comments are welcome, > Laurent > > From karen.kinnear at oracle.com Fri Oct 20 16:24:17 2017 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Fri, 20 Oct 2017 12:24:17 -0400 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> Message-ID: <3018D48F-245A-4C92-9CED-5692BBD88E8C@oracle.com> Robbin, Erik, Mikael - Delighted to see this! Looks good. I don?t need to see any updates - these are minor comments. Thank you for the performance testing Couple of questions/comments: 1. platform support supports_thread_local_poll returns true for AMD64 or SPARC Your comment said Linux x64 and Sparc only. What about Mac and Windows? 2. safepointMechanism_inline.hpp - comment clarification line 42 - ?Mutexes can be taken but none JavaThread?. Are you saying: ?Non-JavaThreads do not support handshakes, but must stop for safepoints.? Not sure what the Mutex comment is about 3. globals.hpp The way I understand this - ThreadLocalHandshakes flag is not so much to enable use of ThreadLocalHandle operations, but to enable use of TLH for global safe point. If that is true, could you possibly at least clarify this in the comment if there is not a better name for the flag? 4. thank you for looking into startup performance and interpreter return/backward branch checks. 5. handshake.cpp Could you possibly add a comment that thread_has_completed and/or pool_for_completed_thread means that the thread has either done the operation or the operation has been cancelled? 
I get that we are polling this to tell when it is safe to return to the synchronous requestor not to determine if the thread actually performed the operation. The comment would make that clearer. thanks, Karen > On Oct 11, 2017, at 9:37 AM, Robbin Ehn wrote: > > Hi all, > > Starting the review of the code while JEP work is still not completed. > > JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 > > This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not just all threads or none. > > Entire changeset: > http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ > > Divided into 3-parts, > SafepointMechanism abstraction: > http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ > Consolidating polling page allocation: > http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ > Handshakes: > http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ > > A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a handshake can be performed with that single JavaThread as well. > > The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. > > Example of potential use-cases: > -Biased lock revocation > -External requests for stack traces > -Deoptimization > -Async exception delivery > -External suspension > -Eliding memory barriers > > All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. > Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported platforms are Linux x64 and Solaris SPARC. > > Tested heavily with various test suits and comes with a few new tests. > > Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. > The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all JavaThreads in an array instead of a linked list. > > Thanks, Robbin From thomas.stuefe at gmail.com Fri Oct 20 17:12:49 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Fri, 20 Oct 2017 19:12:49 +0200 Subject: Upgrading gcc arch ? 
In-Reply-To: <9272af52-8d32-ff5d-49f2-098223096173@oracle.com> References: <7bbf5a0d-ed69-6f6e-6c59-2373a465f65d@oracle.com> <9272af52-8d32-ff5d-49f2-098223096173@oracle.com> Message-ID: On Fri, Oct 20, 2017 at 1:37 PM, Robbin Ehn wrote: > On 2017-10-20 12:56, Thomas St?fe wrote: > >> >> >> On Fri, Oct 20, 2017 at 12:11 PM, Robbin Ehn > > wrote: >> >> On 2017-10-20 11:19, David Holmes wrote: >> >> bcc'ing the discuss list >> >> On 20/10/2017 6:19 PM, Laurent Bourg?s wrote: >> >> Hi, >> >> I wonder if it is time to compile c/c++ code with a more >> recent cpu >> architecture (x86-64 is quite old: only SSE ?) to take >> benefit of >> performance optimizations offered by recent CPU and compilers >> (AVX...). >> >> >> The focus in hotspot is on JIT generated code which does take >> advantage of such optimizations based on the runtime CPU capabilities. >> >> Is there specific C code in the JDK that you think would benefit >> from them? >> >> >> If there are specific code that preform much better with new some >> newer features we could utilize function multiversioning feature in the gcc. >> E.g.: >> __attribute__((target_clones("sse4.2","sse3","default"))) >> void stream_function(...) { >> ry >> Negative impact on size, so as David says, benchmark first. >> >> >> But how would this help with gcc specific optimizations ? You can >> provide your own implementation, but I thought the idea was to let gcc do >> the optimization work via -mtune. We still would have one global mtune >> setting for the compilation unit, right? >> > > target_clones attribute "is used to specify that a function be cloned into > multiple versions compiled with different target options than specified on > the command line." > gcc generates, in above, 3 functions, you can also do: > __attribute__((target_clones("arch=znver1","arch=skylake", "default"))) > > So you get: > [rehn at rehn-lt ~]$ nm a.out | grep stream_function > 00000000004009e0 T stream_function > 0000000000400bc0 t stream_function.arch_skylake.1 > 0000000000400b90 t stream_function.arch_znver1.0 > 0000000000400bf0 i stream_function.ifunc > 0000000000400bf0 W stream_function.resolver > > /Robbin > > Very interesting, thanks for the pointer. I did not know that was possible. Best Regards, Thomas > >> ..Thomas >> >> /Robbin >> >> >> >> Have you done comparison builds and run any benchmarks? >> >> Thanks, >> David >> >> Of course that means such builds would be specific to a CPU >> class and that >> will require build changes to make multiple flavors depending >> on the CPU >> classes ... >> >> See gcc -mtune argument: >> https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/x86-Options.html >> >> >> >> " >> ?sandybridge? >> Intel Sandy Bridge CPU with 64-bit extensions, MMX, >> SSE, SSE2, SSE3, >> SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL >> instruction set support. >> ?ivybridge? >> Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, >> SSE2, SSE3, >> SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, >> RDRND and F16C >> instruction set support. >> ?haswell? >> Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, >> SSE, SSE2, SSE3, >> SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, >> FSGSBASE, RDRND, >> FMA, BMI, BMI2 and F16C instruction set support. >> ?broadwell? >> Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, >> SSE, SSE2, >> SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, >> FSGSBASE, >> RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW >> instruction set >> support. >> ?skylake? 
>> Intel Skylake CPU with 64-bit extensions, MOVBE, MMX, >> SSE, SSE2, SSE3, >> SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, >> FSGSBASE, RDRND, >> FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, >> XSAVEC and >> XSAVES instruction set support. >> ?bonnell? >> Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, >> SSE, SSE2, SSE3 >> and SSSE3 instruction set support. >> ?silvermont? >> Intel Silvermont CPU with 64-bit extensions, MOVBE, >> MMX, SSE, SSE2, >> SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND >> instruction set >> support. >> ?knl? >> Intel Knight's Landing CPU with 64-bit extensions, >> MOVBE, MMX, SSE, >> SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, >> PCLMUL, >> FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, >> PREFETCHW, AVX512F, >> AVX512PF, AVX512ER and AVX512CD instruction set support. >> ?skylake-avx512? >> Intel Skylake Server CPU with 64-bit extensions, MOVBE, >> MMX, SSE, SSE2, >> SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, >> PCLMUL, FSGSBASE, >> RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, >> CLFLUSHOPT, XSAVEC, >> XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ and AVX512CD >> instruction set >> support. >> >> " >> >> Comments are welcome, >> Laurent >> >> >> From peter.lawrey at gmail.com Fri Oct 20 08:31:34 2017 From: peter.lawrey at gmail.com (Peter Lawrey) Date: Fri, 20 Oct 2017 09:31:34 +0100 Subject: Upgrading gcc arch ? In-Reply-To: References: Message-ID: I know the drive is toward smaller builds, but it would be good to auto select the CPU level at run time. I suspect however, this is something the OpenJDK (or a vendor supporting it) could do. Perhaps code which is CPU model sensitive could be placed in a small shared library with multiple versions and the appropriate build selected at runtime or on installation. Regards, Peter. ? On 20 October 2017 at 09:19, Laurent Bourg?s wrote: > Hi, > > I wonder if it is time to compile c/c++ code with a more recent cpu > architecture (x86-64 is quite old: only SSE ?) to take benefit of > performance optimizations offered by recent CPU and compilers (AVX...). > > Of course that means such builds would be specific to a CPU class and that > will require build changes to make multiple flavors depending on the CPU > classes ... > > See gcc -mtune argument: > https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/x86-Options.html > > " > ?sandybridge? > Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, > SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support. > ?ivybridge? > Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, > SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C > instruction set support. > ?haswell? > Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, > SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, > FMA, BMI, BMI2 and F16C instruction set support. > ?broadwell? > Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, > SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, > RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set > support. > ?skylake? > Intel Skylake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, > SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, > FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC and > XSAVES instruction set support. > ?bonnell? 
> Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 > and SSSE3 instruction set support. > ?silvermont? > Intel Silvermont CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, > SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND instruction set > support. > ?knl? > Intel Knight's Landing CPU with 64-bit extensions, MOVBE, MMX, SSE, > SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, > FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, AVX512F, > AVX512PF, AVX512ER and AVX512CD instruction set support. > ?skylake-avx512? > Intel Skylake Server CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, > SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2, AES, PCLMUL, FSGSBASE, > RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, > XSAVES, AVX512F, AVX512VL, AVX512BW, AVX512DQ and AVX512CD instruction set > support. > > " > > Comments are welcome, > Laurent > From bob.vandette at oracle.com Fri Oct 20 18:44:31 2017 From: bob.vandette at oracle.com (Bob Vandette) Date: Fri, 20 Oct 2017 14:44:31 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <5d217c60-3049-30a6-c207-d6c9274a5ddf@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <39AD9F8D-7E2B-4C15-8525-36DBA7C74302@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> <5f7f3d85-db48-fe6e-28f5-e3f4858f33e8@oracle.com> <799205ae-ba9f-ce3a-8dd6-1a55e32689df@oracle.com> <9956F9D0-B01B-44FE-AE56-527907816436@oracle.com> <20ef0bac-1942-b29f-a9e2-4ea4d4f81cd2@oracle.com> <5d217c60-3049-30a6-c207-d6c9274a5ddf@oracle.com> Message-ID: <1C03FCB5-969B-4C43-8BAD-EF939515FEC2@oracle.com> Here?s an updated webrev that hopefully takes care of all remaining comments. http://cr.openjdk.java.net/~bobv/8146115/webrev.02 I added the deprecation of the UseCGroupMemoryLimitForHeap option this round since this experimental option should no longer be necessary. Bob. > On Oct 13, 2017, at 9:34 AM, David Holmes wrote: > > Reading back through my suggestion for os.hpp initialize_container_support should just be init_container_support. > > Thanks, > David > > On 13/10/2017 11:14 PM, Bob Vandette wrote: >>> On Oct 12, 2017, at 11:08 PM, David Holmes wrote: >>> >>> Hi Bob, >>> >>> On 13/10/2017 1:43 AM, Bob Vandette wrote: >>>>> On Oct 11, 2017, at 9:04 PM, David Holmes wrote: >>>>> >>>>> Hi Bob, >>>>> >>>>> On 12/10/2017 5:11 AM, Bob Vandette wrote: >>>>>> Here?s an updated webrev for this RFE that contains changes and cleanups based on feedback I?ve received so far. >>>>>> I?m still investigating the best approach for reacting to cpu shares and quotas. I do not believe doing nothing is the answer. >>>>> >>>>> I do. :) Let me try this again. When you run outside of a container you don't get 100% of the CPUs - you have to share with whatever else is running on the system. You get a fraction of CPU time based on the load. We don't try to communicate load information to the VM/application so it can adapt. Within a container setting shares/quotas is just a way of setting an artificial load. So why should we be treating it any differently? >>>> Because today we optimize for a lightly loaded system and when running serverless applications in containers we should be >>>> optimizing for a fully loaded system. 
If developers don?t want this, then don?t use shares or quotas and you?ll have exactly >>>> the behavior you have today. I think we just have to document the new behavior (and how to turn it off) so people know what >>>> to expect. >>> >>> The person deploying the app may not have control over how the app is deployed in terms of shares/quotas. It all depends how (and who) manages the containers. This is a big part of my problem/concerns here that I don't know exactly how all this is organized and who knows what in advance and what they can control. >>> >>> But I'll let this drop, other than raising an additional concern. I don't think just allowing the user to hardwire the number of processors to use will necessarily solve the problem with what available_processors() returns. I'm concerned the execution of the VM may occur in a context where the number of processors is not known in advance, and the user can not disable shares/quotas. In that case we may need to have a flag that says to ignore shares/quotas in the processor count calculation. >> I?m not sure that?s a high probability issue. It?s my understanding that whoever is configuring the container >> management will be specifying the resources required to run these applications which comes along with a >> guarantee of these resources. If this issue does come up, I do have the -XX:-UseContainerSupport big >> switch that turns all of this off. It will however disable the memory support as well. >>> >>>> You seem to discount the added cost of 100s of VMs creating lots of un-necessaary threads. In the current JDK 10 code base, >>>> In a heavily loaded system with 88 processors, VmData grows from 60MBs (1 cpu) to 376MB (88 cpus). This is only mapped >>>> memory and it depends heavily on how deep in the stack these threads go before it impacts VmRSS but it shows the potential downside >>>> of having 100s of VMs thinking they each own the entire machine. >>> >>> I agree that the default ergonomics does not scale well. Anyone doing any serious Java deployment tunes the VM explicitly and does not rely on the defaults. How will they do that in a container environment? I don't know. >>> >>> I would love to see some actual deployment scenarios/experiences for this to understand things better. >> This is one of the reasons I want to get this support out in JDK 10, to get some feedback under real scenarios. >>> >>>> I haven?t even done any experiments to determine the added context switching cost if the VM decides to use excessive >>>> pthreads. >>>>> >>>>> That's not to say an API to provide load/shares/quota information may not be useful, but that is a separate issue to what the "active processor count" should report. >>>> I don?t have a problem with active processor count reporting the number of processors we have, but I do have a problem >>>> with our current usage of this information within the VM and Core libraries. >>> >>> That is a somewhat separate issue. One worth pursuing separately. >> We should look at this as part of the ?Container aware Java? JEP. >>> >>>>> >>>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.01 >>>>>> Updates: >>>>>> 1. I had to move the processing of AggressiveHeap since the container memory size needs to be known before this can be processed. >>>>> >>>>> I don't like the placement of this - we don't call os:: init functions from inside Arguments - we manage the initialization sequence from Threads::create_vm. 
Seems to me that container initialization can/should happen in os::init_before_ergo, and the AggressiveHeap processing can occur at the start of Arguments::apply_ergo(). >>>>> >>>>> That said we need to be sure nothing touched by set_aggressive_heap_flags will be used before we now reach that code - there are a lot of flags being set in there. >>>> This is exactly the reason why I put the call where it did. I put the call to set_aggressive_heap_flags in finalize_vm_init_args >>>> because that is exactly what this call is doing. It?s finalizing flags used after the parsing. The impacted flags are definitely being >>>> used shortly after and before init_before_ergo is called. >>> >>> I see that now and it is very unfortunate because I really do not like what you had to do here. As you can tell from the logic in create_vm we have always refactored to ensure we can progressively manage the interleaving of OS initialization with Arguments processing. So having a deep part of Argument processing go off and call some more OS initialization is not nice. That said I can't see a way around it without very unreasonable refactoring. >>> >>> But I do have a couple of changes I'd like to request please: >>> >>> 1. Move the call to os::initialize_container_support() up a level to before the call to finalize_vm_init_args(), with a more elaborate comment: >>> >>> // We need to ensure processor and memory resources have been properly >>> // configured - which may rely on arguments we just processed - before >>> // doing the final argument processing. Any argument processing that >>> // needs to know about processor and memory resources must occur after >>> // this point. >>> >>> os::initialize_container_support(); >>> >>> // Do final processing now that all arguments have been parsed >>> result = finalize_vm_init_args(patch_mod_javabase); >>> >>> 2. Simplify and modify os.hpp as follows: >>> >>> + LINUX_ONLY(static void pd_initialize_container_support();) >>> >>> public: >>> static void init(void); // Called before command line parsing >>> >>> + static void initialize_container_support() { // Called during command line parsing >>> + LINUX_ONLY(pd_initialize_container_support();) >>> + } >>> >>> static void init_before_ergo(void); // Called after command line parsing >>> // before VM ergonomics >>> >>> 3. In thread.cpp add a comment here: >>> >>> // Parse arguments >>> + // Note: this internally calls os::initialize_container_support() >>> jint parse_result = Arguments::parse(args); >> All very reasonable changes. >> Thanks, >> Bob. >>> >>> Thanks. >>> >>>>> >>>>>> 2. I no longer use the cpuset.cpus contents since sched_getaffinity reports the correct results >>>>>> even if someone manually updates the cgroup data. I originally didn?t think this was the case since >>>>>> sched_setaffinity didn?t automatically update the cpuset file contents but the inverse is true. >>>>> >>>>> Ok. >>>>> >>>>>> 3. I ifdef?d the container function support in src/hotspot/share/runtime/os.hpp to avoid putting stubs in all other os >>>>>> platform directories. I can do this if it?s absolutely necessary. >>>>> >>>>> You should not need to do this if initialization moves as I suggested above. os::init_before_ergo() in os_linux.cpp can call OSContainer::init(). >>>>> No need for os::initialize_container_support() or os::pd_initialize_container_support. >>>> But os::init_before_ergo is in shared code. >>> >>> Yep my bad - point is moot now anyway. 
>>> >>> >>> >>>>> src/hotspot/os/linux/os_linux.cpp/.hpp >>>>> >>>>> 187 log_trace(os)("available container memory: " JULONG_FORMAT, avail_mem); >>>>> 188 return avail_mem; >>>>> 189 } else { >>>>> 190 log_debug(os,container)("container memory usage call failed: " JLONG_FORMAT, mem_usage); >>>>> >>>>> Why "trace" (the third logging level) to show the information, but "debug" (the second level) to show failed calls? You use debug in other files for basic info. Overall I'm unclear on your use of debug versus trace for the logging. >>>> I use trace for noisy information that is not reporting errors and debug for failures that are informational and not fatal. >>>> In this case, the call could return -1 or -2. -1 is unlimited and -2 is an error. In either case we fallback to the >>>> standard system call to get available memory. I would have used warning but since these messages were occurring >>>> during a test run causing test failures. >>> >>> Okay. Thanks for clarifying. >>> >>>>> >>>>> --- >>>>> >>>>> src/hotspot/os/linux/osContainer_linux.cpp >>>>> >>>>> Dead code: >>>>> >>>>> 376 #if 0 >>>>> 377 os::Linux::print_container_info(tty); >>>>> ... >>>>> 390 #endif >>>> I left it in for standalone testing. Should I use some other #if? >>> >>> We don't generally leave in dead code in the runtime code. Do you see this as useful after you've finalized the changes? >>> >>> Is this testing just for showing the logging? Is it worth making this a logging controlled call? Is it suitable for a Gtest test? >>> >>> Thanks, >>> David >>> ----- >>> >>>> Bob. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Bob. From kim.barrett at oracle.com Sat Oct 21 05:23:47 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Sat, 21 Oct 2017 01:23:47 -0400 Subject: RFR: 8189088: Add intrusive doubly-linked list utility In-Reply-To: References: Message-ID: > On Oct 10, 2017, at 4:29 AM, Kim Barrett wrote: > > RFR: 8189088: Add intrusive doubly-linked list utility Based on some offline feedback, I?m withdrawing this change to do some rework. From david.holmes at oracle.com Sun Oct 22 21:52:12 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 23 Oct 2017 07:52:12 +1000 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <1C03FCB5-969B-4C43-8BAD-EF939515FEC2@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> <5f7f3d85-db48-fe6e-28f5-e3f4858f33e8@oracle.com> <799205ae-ba9f-ce3a-8dd6-1a55e32689df@oracle.com> <9956F9D0-B01B-44FE-AE56-527907816436@oracle.com> <20ef0bac-1942-b29f-a9e2-4ea4d4f81cd2@oracle.com> <5d217c60-3049-30a6-c207-d6c9274a5ddf@oracle.com> <1C03FCB5-969B-4C43-8BAD-EF939515FEC2@oracle.com> Message-ID: <51f57623-8ce5-0883-69cc-9ba6b39b5a65@oracle.com> Hi Bob, Changes seem fine. I'll take up the issue of whether this should be enabled by default in the CSR. Thanks, David On 21/10/2017 4:44 AM, Bob Vandette wrote: > Here?s an updated webrev that hopefully takes care of all remaining comments. > > http://cr.openjdk.java.net/~bobv/8146115/webrev.02 > > I added the deprecation of the UseCGroupMemoryLimitForHeap option this round since > this experimental option should no longer be necessary. > > > Bob. > > >> On Oct 13, 2017, at 9:34 AM, David Holmes wrote: >> >> Reading back through my suggestion for os.hpp initialize_container_support should just be init_container_support. 
>> >> Thanks, >> David >> >> On 13/10/2017 11:14 PM, Bob Vandette wrote: >>>> On Oct 12, 2017, at 11:08 PM, David Holmes wrote: >>>> >>>> Hi Bob, >>>> >>>> On 13/10/2017 1:43 AM, Bob Vandette wrote: >>>>>> On Oct 11, 2017, at 9:04 PM, David Holmes wrote: >>>>>> >>>>>> Hi Bob, >>>>>> >>>>>> On 12/10/2017 5:11 AM, Bob Vandette wrote: >>>>>>> Here?s an updated webrev for this RFE that contains changes and cleanups based on feedback I?ve received so far. >>>>>>> I?m still investigating the best approach for reacting to cpu shares and quotas. I do not believe doing nothing is the answer. >>>>>> >>>>>> I do. :) Let me try this again. When you run outside of a container you don't get 100% of the CPUs - you have to share with whatever else is running on the system. You get a fraction of CPU time based on the load. We don't try to communicate load information to the VM/application so it can adapt. Within a container setting shares/quotas is just a way of setting an artificial load. So why should we be treating it any differently? >>>>> Because today we optimize for a lightly loaded system and when running serverless applications in containers we should be >>>>> optimizing for a fully loaded system. If developers don?t want this, then don?t use shares or quotas and you?ll have exactly >>>>> the behavior you have today. I think we just have to document the new behavior (and how to turn it off) so people know what >>>>> to expect. >>>> >>>> The person deploying the app may not have control over how the app is deployed in terms of shares/quotas. It all depends how (and who) manages the containers. This is a big part of my problem/concerns here that I don't know exactly how all this is organized and who knows what in advance and what they can control. >>>> >>>> But I'll let this drop, other than raising an additional concern. I don't think just allowing the user to hardwire the number of processors to use will necessarily solve the problem with what available_processors() returns. I'm concerned the execution of the VM may occur in a context where the number of processors is not known in advance, and the user can not disable shares/quotas. In that case we may need to have a flag that says to ignore shares/quotas in the processor count calculation. >>> I?m not sure that?s a high probability issue. It?s my understanding that whoever is configuring the container >>> management will be specifying the resources required to run these applications which comes along with a >>> guarantee of these resources. If this issue does come up, I do have the -XX:-UseContainerSupport big >>> switch that turns all of this off. It will however disable the memory support as well. >>>> >>>>> You seem to discount the added cost of 100s of VMs creating lots of un-necessaary threads. In the current JDK 10 code base, >>>>> In a heavily loaded system with 88 processors, VmData grows from 60MBs (1 cpu) to 376MB (88 cpus). This is only mapped >>>>> memory and it depends heavily on how deep in the stack these threads go before it impacts VmRSS but it shows the potential downside >>>>> of having 100s of VMs thinking they each own the entire machine. >>>> >>>> I agree that the default ergonomics does not scale well. Anyone doing any serious Java deployment tunes the VM explicitly and does not rely on the defaults. How will they do that in a container environment? I don't know. >>>> >>>> I would love to see some actual deployment scenarios/experiences for this to understand things better. 
>>> This is one of the reasons I want to get this support out in JDK 10, to get some feedback under real scenarios. >>>> >>>>> I haven?t even done any experiments to determine the added context switching cost if the VM decides to use excessive >>>>> pthreads. >>>>>> >>>>>> That's not to say an API to provide load/shares/quota information may not be useful, but that is a separate issue to what the "active processor count" should report. >>>>> I don?t have a problem with active processor count reporting the number of processors we have, but I do have a problem >>>>> with our current usage of this information within the VM and Core libraries. >>>> >>>> That is a somewhat separate issue. One worth pursuing separately. >>> We should look at this as part of the ?Container aware Java? JEP. >>>> >>>>>> >>>>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.01 >>>>>>> Updates: >>>>>>> 1. I had to move the processing of AggressiveHeap since the container memory size needs to be known before this can be processed. >>>>>> >>>>>> I don't like the placement of this - we don't call os:: init functions from inside Arguments - we manage the initialization sequence from Threads::create_vm. Seems to me that container initialization can/should happen in os::init_before_ergo, and the AggressiveHeap processing can occur at the start of Arguments::apply_ergo(). >>>>>> >>>>>> That said we need to be sure nothing touched by set_aggressive_heap_flags will be used before we now reach that code - there are a lot of flags being set in there. >>>>> This is exactly the reason why I put the call where it did. I put the call to set_aggressive_heap_flags in finalize_vm_init_args >>>>> because that is exactly what this call is doing. It?s finalizing flags used after the parsing. The impacted flags are definitely being >>>>> used shortly after and before init_before_ergo is called. >>>> >>>> I see that now and it is very unfortunate because I really do not like what you had to do here. As you can tell from the logic in create_vm we have always refactored to ensure we can progressively manage the interleaving of OS initialization with Arguments processing. So having a deep part of Argument processing go off and call some more OS initialization is not nice. That said I can't see a way around it without very unreasonable refactoring. >>>> >>>> But I do have a couple of changes I'd like to request please: >>>> >>>> 1. Move the call to os::initialize_container_support() up a level to before the call to finalize_vm_init_args(), with a more elaborate comment: >>>> >>>> // We need to ensure processor and memory resources have been properly >>>> // configured - which may rely on arguments we just processed - before >>>> // doing the final argument processing. Any argument processing that >>>> // needs to know about processor and memory resources must occur after >>>> // this point. >>>> >>>> os::initialize_container_support(); >>>> >>>> // Do final processing now that all arguments have been parsed >>>> result = finalize_vm_init_args(patch_mod_javabase); >>>> >>>> 2. 
Simplify and modify os.hpp as follows: >>>> >>>> + LINUX_ONLY(static void pd_initialize_container_support();) >>>> >>>> public: >>>> static void init(void); // Called before command line parsing >>>> >>>> + static void initialize_container_support() { // Called during command line parsing >>>> + LINUX_ONLY(pd_initialize_container_support();) >>>> + } >>>> >>>> static void init_before_ergo(void); // Called after command line parsing >>>> // before VM ergonomics >>>> >>>> 3. In thread.cpp add a comment here: >>>> >>>> // Parse arguments >>>> + // Note: this internally calls os::initialize_container_support() >>>> jint parse_result = Arguments::parse(args); >>> All very reasonable changes. >>> Thanks, >>> Bob. >>>> >>>> Thanks. >>>> >>>>>> >>>>>>> 2. I no longer use the cpuset.cpus contents since sched_getaffinity reports the correct results >>>>>>> even if someone manually updates the cgroup data. I originally didn?t think this was the case since >>>>>>> sched_setaffinity didn?t automatically update the cpuset file contents but the inverse is true. >>>>>> >>>>>> Ok. >>>>>> >>>>>>> 3. I ifdef?d the container function support in src/hotspot/share/runtime/os.hpp to avoid putting stubs in all other os >>>>>>> platform directories. I can do this if it?s absolutely necessary. >>>>>> >>>>>> You should not need to do this if initialization moves as I suggested above. os::init_before_ergo() in os_linux.cpp can call OSContainer::init(). >>>>>> No need for os::initialize_container_support() or os::pd_initialize_container_support. >>>>> But os::init_before_ergo is in shared code. >>>> >>>> Yep my bad - point is moot now anyway. >>>> >>>> >>>> >>>>>> src/hotspot/os/linux/os_linux.cpp/.hpp >>>>>> >>>>>> 187 log_trace(os)("available container memory: " JULONG_FORMAT, avail_mem); >>>>>> 188 return avail_mem; >>>>>> 189 } else { >>>>>> 190 log_debug(os,container)("container memory usage call failed: " JLONG_FORMAT, mem_usage); >>>>>> >>>>>> Why "trace" (the third logging level) to show the information, but "debug" (the second level) to show failed calls? You use debug in other files for basic info. Overall I'm unclear on your use of debug versus trace for the logging. >>>>> I use trace for noisy information that is not reporting errors and debug for failures that are informational and not fatal. >>>>> In this case, the call could return -1 or -2. -1 is unlimited and -2 is an error. In either case we fallback to the >>>>> standard system call to get available memory. I would have used warning but since these messages were occurring >>>>> during a test run causing test failures. >>>> >>>> Okay. Thanks for clarifying. >>>> >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/os/linux/osContainer_linux.cpp >>>>>> >>>>>> Dead code: >>>>>> >>>>>> 376 #if 0 >>>>>> 377 os::Linux::print_container_info(tty); >>>>>> ... >>>>>> 390 #endif >>>>> I left it in for standalone testing. Should I use some other #if? >>>> >>>> We don't generally leave in dead code in the runtime code. Do you see this as useful after you've finalized the changes? >>>> >>>> Is this testing just for showing the logging? Is it worth making this a logging controlled call? Is it suitable for a Gtest test? >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>> Bob. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>>> Bob. 
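As a concrete illustration of the cgroup v1 reads being reviewed in this thread: each accessor boils down to opening a per-subsystem file and parsing a single value. The following is a minimal standalone sketch only, not the code from webrev.02; the function name and the mount point shown in the usage comment are assumptions for a typical cgroup v1 layout.

    #include <cstdio>

    // Read one integral value, e.g. memory.limit_in_bytes, from a cgroup v1 file.
    // Returns true on success. A missing file typically means "not containerized"
    // (or a different cgroup layout), and a huge value effectively means "unlimited".
    static bool read_cgroup_value(const char* path, long long* out) {
      FILE* f = fopen(path, "r");
      if (f == NULL) {
        return false;
      }
      long long v = 0;
      // Literal format string, so -Wformat-nonliteral (enabled by -Wformat=2)
      // is not triggered here.
      int matched = fscanf(f, "%lld", &v);
      fclose(f);
      if (matched != 1) {
        return false;
      }
      *out = v;
      return true;
    }

    // Example use (the path is an assumption for a cgroup v1 memory controller):
    //   long long limit;
    //   if (read_cgroup_value("/sys/fs/cgroup/memory/memory.limit_in_bytes", &limit)) { ... }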
> From david.holmes at oracle.com Mon Oct 23 01:59:55 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 23 Oct 2017 11:59:55 +1000 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <51f57623-8ce5-0883-69cc-9ba6b39b5a65@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> <5f7f3d85-db48-fe6e-28f5-e3f4858f33e8@oracle.com> <799205ae-ba9f-ce3a-8dd6-1a55e32689df@oracle.com> <9956F9D0-B01B-44FE-AE56-527907816436@oracle.com> <20ef0bac-1942-b29f-a9e2-4ea4d4f81cd2@oracle.com> <5d217c60-3049-30a6-c207-d6c9274a5ddf@oracle.com> <1C03FCB5-969B-4C43-8BAD-EF939515FEC2@oracle.com> <51f57623-8ce5-0883-69cc-9ba6b39b5a65@oracle.com> Message-ID: <46ef96d6-f10a-7da4-8101-08bfc281705d@oracle.com> Sorry just spotted a minor issue when actually running the code. Many of you log statements include \n in the format string. This is unnecessary and results in lots of blank lines in the logging output eg: [0.002s][trace][os,container] OSContainer::init: Initializing Container Support [0.003s][trace][os,container] Path to /memory.limit_in_bytes is /cgroup/memory//memory.limit_in_bytes [0.003s][trace][os,container] Memory Limit is: 9223372036854775807 [0.004s][trace][os,container] Memory Limit is: Unlimited [0.004s][trace][os ] active_processor_count: using static path - configured processors: 4 [0.004s][trace][os ] active_processor_count: sched_getaffinity processor count: 4 [0.004s][trace][os,container] Path to /cpu.shares is /cgroup/cpu//cpu.shares [0.005s][trace][os,container] CPU Shares is: 1024 [0.005s][trace][os,container] Path to /cpu.cfs_quota_us is /cgroup/cpu//cpu.cfs_quota_us [0.005s][debug][os,container] file not found /cgroup/cpu//cpu.cfs_quota_us [0.005s][debug][os,container] Error reading /cpu.cfs_quota_us [0.005s][trace][os,container] Path to /cpu.cfs_period_us is /cgroup/cpu//cpu.cfs_period_us [0.006s][debug][os,container] file not found /cgroup/cpu//cpu.cfs_period_us [0.006s][debug][os,container] Error reading /cpu.cfs_period_us Thanks, David On 23/10/2017 7:52 AM, David Holmes wrote: > Hi Bob, > > Changes seem fine. > > I'll take up the issue of whether this should be enabled by default in > the CSR. > > Thanks, > David > > On 21/10/2017 4:44 AM, Bob Vandette wrote: >> Here?s an updated webrev that hopefully takes care of all remaining >> comments. >> >> http://cr.openjdk.java.net/~bobv/8146115/webrev.02 >> >> I added the deprecation of the UseCGroupMemoryLimitForHeap option this >> round since >> this experimental option should no longer be necessary. >> >> >> Bob. >> >> >>> On Oct 13, 2017, at 9:34 AM, David Holmes >>> wrote: >>> >>> Reading back through my suggestion for os.hpp >>> initialize_container_support should just be init_container_support. >>> >>> Thanks, >>> David >>> >>> On 13/10/2017 11:14 PM, Bob Vandette wrote: >>>>> On Oct 12, 2017, at 11:08 PM, David Holmes >>>>> wrote: >>>>> >>>>> Hi Bob, >>>>> >>>>> On 13/10/2017 1:43 AM, Bob Vandette wrote: >>>>>>> On Oct 11, 2017, at 9:04 PM, David Holmes >>>>>>> wrote: >>>>>>> >>>>>>> Hi Bob, >>>>>>> >>>>>>> On 12/10/2017 5:11 AM, Bob Vandette wrote: >>>>>>>> Here?s an updated webrev for this RFE that contains changes and >>>>>>>> cleanups based on feedback I?ve received so far. >>>>>>>> I?m still investigating the best approach for reacting to cpu >>>>>>>> shares and quotas.? 
I do not believe doing nothing is the answer. >>>>>>> >>>>>>> I do. :) Let me try this again. When you run outside of a >>>>>>> container you don't get 100% of the CPUs - you have to share with >>>>>>> whatever else is running on the system. You get a fraction of CPU >>>>>>> time based on the load. We don't try to communicate load >>>>>>> information to the VM/application so it can adapt. Within a >>>>>>> container setting shares/quotas is just a way of setting an >>>>>>> artificial load. So why should we be treating it any differently? >>>>>> Because today we optimize for a lightly loaded system and when >>>>>> running serverless applications in containers we should be >>>>>> optimizing for a fully loaded system.? If developers don?t want >>>>>> this, then don?t use shares or quotas and you?ll have exactly >>>>>> the behavior you have today.? I think we just have to document the >>>>>> new behavior (and how to turn it off) so people know what >>>>>> to expect. >>>>> >>>>> The person deploying the app may not have control over how the app >>>>> is deployed in terms of shares/quotas. It all depends how (and who) >>>>> manages the containers. This is a big part of my problem/concerns >>>>> here that I don't know exactly how all this is organized and who >>>>> knows what in advance and what they can control. >>>>> >>>>> But I'll let this drop, other than raising an additional concern. I >>>>> don't think just allowing the user to hardwire the number of >>>>> processors to use will necessarily solve the problem with what >>>>> available_processors() returns. I'm concerned the execution of the >>>>> VM may occur in a context where the number of processors is not >>>>> known in advance, and the user can not disable shares/quotas. In >>>>> that case we may need to have a flag that says to ignore >>>>> shares/quotas in the processor count calculation. >>>> I?m not sure that?s a high probability issue.? It?s my understanding >>>> that whoever is configuring the container >>>> management will be specifying the resources required to run these >>>> applications which comes along with a >>>> guarantee of these resources.? If this issue does come up, I do have >>>> the -XX:-UseContainerSupport big >>>> switch that turns all of this off.? It will however disable the >>>> memory support as well. >>>>> >>>>>> You seem to discount the added cost of 100s of VMs creating lots >>>>>> of un-necessaary threads.? In the current JDK 10 code base, >>>>>> In a heavily loaded system with 88 processors, VmData grows from >>>>>> 60MBs (1 cpu) to 376MB (88 cpus).? This is only mapped >>>>>> memory and it depends heavily on how deep in the stack these >>>>>> threads go before it impacts VmRSS but it shows the potential >>>>>> downside >>>>>> of having 100s of VMs thinking they each own the entire machine. >>>>> >>>>> I agree that the default ergonomics does not scale well. Anyone >>>>> doing any serious Java deployment tunes the VM explicitly and does >>>>> not rely on the defaults. How will they do that in a container >>>>> environment? I don't know. >>>>> >>>>> I would love to see some actual deployment scenarios/experiences >>>>> for this to understand things better. >>>> This is one of the reasons I want to get this support out in JDK 10, >>>> to get some feedback under real scenarios. >>>>> >>>>>> I haven?t even done any experiments to determine the added context >>>>>> switching cost if the VM decides to use excessive >>>>>> pthreads. 
>>>>>>> >>>>>>> That's not to say an API to provide load/shares/quota information >>>>>>> may not be useful, but that is a separate issue to what the >>>>>>> "active processor count" should report. >>>>>> I don?t have a problem with active processor count reporting the >>>>>> number of processors we have, but I do have a problem >>>>>> with our current usage of this information within the VM and Core >>>>>> libraries. >>>>> >>>>> That is a somewhat separate issue. One worth pursuing separately. >>>> We should look at this as part of the ?Container aware Java? JEP. >>>>> >>>>>>> >>>>>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.01 >>>>>>>> Updates: >>>>>>>> 1. I had to move the processing of AggressiveHeap since the >>>>>>>> container memory size needs to be known before this can be >>>>>>>> processed. >>>>>>> >>>>>>> I don't like the placement of this - we don't call os:: init >>>>>>> functions from inside Arguments - we manage the initialization >>>>>>> sequence from Threads::create_vm. Seems to me that container >>>>>>> initialization can/should happen in os::init_before_ergo, and the >>>>>>> AggressiveHeap processing can occur at the start of >>>>>>> Arguments::apply_ergo(). >>>>>>> >>>>>>> That said we need to be sure nothing touched by >>>>>>> set_aggressive_heap_flags will be used before we now reach that >>>>>>> code - there are a lot of flags being set in there. >>>>>> This is exactly the reason why I put the call where it did.? I put >>>>>> the call to set_aggressive_heap_flags in finalize_vm_init_args >>>>>> because that is exactly what this call is doing.? It?s finalizing >>>>>> flags used after the parsing.? The impacted flags are definitely >>>>>> being >>>>>> used shortly after and before init_before_ergo is called. >>>>> >>>>> I see that now and it is very unfortunate because I really do not >>>>> like what you had to do here. As you can tell from the logic in >>>>> create_vm we have always refactored to ensure we can progressively >>>>> manage the interleaving of OS initialization with Arguments >>>>> processing. So having a deep part of Argument processing go off and >>>>> call some more OS initialization is not nice. That said I can't see >>>>> a way around it without very unreasonable refactoring. >>>>> >>>>> But I do have a couple of changes I'd like to request please: >>>>> >>>>> 1. Move the call to os::initialize_container_support() up a level >>>>> to before the call to finalize_vm_init_args(), with a more >>>>> elaborate comment: >>>>> >>>>> // We need to ensure processor and memory resources have been properly >>>>> // configured - which may rely on arguments we just processed - before >>>>> // doing the final argument processing. Any argument processing that >>>>> // needs to know about processor and memory resources must occur after >>>>> // this point. >>>>> >>>>> os::initialize_container_support(); >>>>> >>>>> // Do final processing now that all arguments have been parsed >>>>> result = finalize_vm_init_args(patch_mod_javabase); >>>>> >>>>> 2. Simplify and modify os.hpp as follows: >>>>> >>>>> +? LINUX_ONLY(static void pd_initialize_container_support();) >>>>> >>>>> ?? public: >>>>> ??? static void init(void);????????????????????? // Called before >>>>> command line parsing >>>>> >>>>> +?? static void initialize_container_support() { // Called during >>>>> command line parsing >>>>> +???? LINUX_ONLY(pd_initialize_container_support();) >>>>> +?? } >>>>> >>>>> ??? static void init_before_ergo(void);????????? 
// Called after >>>>> command line parsing >>>>> ???????????????????????????????????????????????? // before VM >>>>> ergonomics >>>>> >>>>> 3. In thread.cpp add a comment here: >>>>> >>>>> ?? // Parse arguments >>>>> +? // Note: this internally calls os::initialize_container_support() >>>>> ?? jint parse_result = Arguments::parse(args); >>>> All very reasonable changes. >>>> Thanks, >>>> Bob. >>>>> >>>>> Thanks. >>>>> >>>>>>> >>>>>>>> 2. I no longer use the cpuset.cpus contents since >>>>>>>> sched_getaffinity reports the correct results >>>>>>>> even if someone manually updates the cgroup data.? I originally >>>>>>>> didn?t think this was the case since >>>>>>>> sched_setaffinity didn?t automatically update the cpuset file >>>>>>>> contents but the inverse is true. >>>>>>> >>>>>>> Ok. >>>>>>> >>>>>>>> 3. I ifdef?d the container function support in >>>>>>>> src/hotspot/share/runtime/os.hpp to avoid putting stubs in all >>>>>>>> other os >>>>>>>> platform directories.? I can do this if it?s absolutely necessary. >>>>>>> >>>>>>> You should not need to do this if initialization moves as I >>>>>>> suggested above. os::init_before_ergo() in os_linux.cpp can call >>>>>>> OSContainer::init(). >>>>>>> No need for os::initialize_container_support() or >>>>>>> os::pd_initialize_container_support. >>>>>> But os::init_before_ergo is in shared code. >>>>> >>>>> Yep my bad - point is moot now anyway. >>>>> >>>>> >>>>> >>>>>>> src/hotspot/os/linux/os_linux.cpp/.hpp >>>>>>> >>>>>>> 187???????? log_trace(os)("available container memory: " >>>>>>> JULONG_FORMAT, avail_mem); >>>>>>> 188???????? return avail_mem; >>>>>>> 189?????? } else { >>>>>>> 190???????? log_debug(os,container)("container memory usage call >>>>>>> failed: " JLONG_FORMAT, mem_usage); >>>>>>> >>>>>>> Why "trace" (the third logging level) to show the information, >>>>>>> but "debug" (the second level) to show failed calls? You use >>>>>>> debug in other files for basic info. Overall I'm unclear on your >>>>>>> use of debug versus trace for the logging. >>>>>> I use trace for noisy information that is not reporting errors and >>>>>> debug for failures that are informational and not fatal. >>>>>> In this case, the call could return -1 or -2.? -1 is unlimited and >>>>>> -2 is an error.? In either case we fallback to the >>>>>> standard system call to get available memory.? I would have used >>>>>> warning but since these messages were occurring >>>>>> during a test run causing test failures. >>>>> >>>>> Okay. Thanks for clarifying. >>>>> >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/os/linux/osContainer_linux.cpp >>>>>>> >>>>>>> Dead code: >>>>>>> >>>>>>> 376 #if 0 >>>>>>> 377?? os::Linux::print_container_info(tty); >>>>>>> ... >>>>>>> 390 #endif >>>>>> I left it in for standalone testing.? Should I use some other #if? >>>>> >>>>> We don't generally leave in dead code in the runtime code. Do you >>>>> see this as useful after you've finalized the changes? >>>>> >>>>> Is this testing just for showing the logging? Is it worth making >>>>> this a logging controlled call? Is it suitable for a Gtest test? >>>>> >>>>> Thanks, >>>>> David >>>>> ----- >>>>> >>>>>> Bob. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>>> Bob. 
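Illustrating David's note at the top of this message about "\n" in the log statements: Unified Logging appends its own line terminator, so a trailing "\n" in the format string is exactly what produces the blank lines in the output shown above. The lines below are hedged examples only (limit is an illustrative jlong variable, not a line taken from the webrev):

    log_trace(os, container)("Memory Limit is: " JLONG_FORMAT, limit);        // one log line
    log_trace(os, container)("Memory Limit is: " JLONG_FORMAT "\n", limit);   // log line followed by a blank line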
From kim.barrett at oracle.com  Mon Oct 23 04:52:18 2017
From: kim.barrett at oracle.com (Kim Barrett)
Date: Mon, 23 Oct 2017 00:52:18 -0400
Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage
In-Reply-To: <833ba1a5-49fc-bb24-ff99-994011af52aa@oracle.com>
References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <2d9dd746-63e1-cade-28f9-5ca1ae1c253e@oracle.com> <200F07CB-35DA-492B-B78D-9EC033EE0431@oracle.com> <833ba1a5-49fc-bb24-ff99-994011af52aa@oracle.com>
Message-ID: 

> On Sep 27, 2017, at 9:20 PM, David Holmes wrote:
>>> 62 void set_subsystem_path(char *cgroup_path) {
>>>
>>> If this takes a "const char*" will it save you from casting string literals to "char*" elsewhere?
>> I tried several different ways of declaring the container accessor functions and
>> always ended up with warnings due to scanf not being able to validate arguments
>> since the format string didn't end up being a string literal. I originally was using templates
>> and then ended up with the macros. I tried several different casts but could not resolve the problem.
>
> Sounds like something Kim Barrett should take a look at :)

Fortunately, I just happened by.

The warnings are because we compile with -Wformat=2, which enables
-Wformat-nonliteral (among other things).

Use PRAGMA_FORMAT_NONLITERAL_IGNORED, e.g.

PRAGMA_DIAG_PUSH
PRAGMA_FORMAT_NONLITERAL_IGNORED
PRAGMA_DIAG_POP

That will silence warnings about sscanf (or anything else!) with a
non-literal format string within that scope.

Also, while I was looking at this, I noticed that in
get_subsytem_file_contents_##return_name, if the sum of the lengths of
get_subsystem_path() and filename is >= MAXBUF, then we can end up
reading from a file other than the one intended, if such a file exists.
That seems like it might be bad.

Also, the filename argument should be const char*.

From kim.barrett at oracle.com  Mon Oct 23 05:44:31 2017
From: kim.barrett at oracle.com (Kim Barrett)
Date: Mon, 23 Oct 2017 01:44:31 -0400
Subject: RFR: 8163897: oop_store has unnecessary memory barriers
Message-ID: 

Please review this change to the oop_store function template, which
removes some unnecessary memory barriers, moves CMS-specific code into
GC-specific (though not completely CMS-specific) areas, and cleans up
the API a bit. See the CR for more details about the problems.

[Note: CTMRBS expands to CardTableModRefBS below.]

As a preliminary cleanup, CTMRBS::inline_write_ref_field has been
merged into its only caller, CTMRBS::write_ref_field_work. This left
the file gc/shared/cardTableModRefBS.inline.hpp effectively empty, so
it has been removed. As a related cleanup,
CTMRBS::inline_write_ref_field_pre was found to be unused and has been
removed.

The volatile overload for oop_store has been renamed to
release_oop_store, to correspond to its purpose. oop_store no longer
examines always_do_update_barrier to conditionally call the (now
renamed) volatile overload. The only other caller of the volatile
overload was release_obj_field_put, which has been updated for the new
name.

The release argument for BarrierSet::write_ref_field and all the
related implementation has been removed. Instead,
CTMRBS::write_ref_field_work now uses a release_store to mark the card
if card marking is required to be ordered after the value store, e.g.
for CMS, per the value of always_do_update_barrier.
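Schematically, the card-marking path described above takes roughly the following shape. This is a simplified sketch for illustration only; requires_ordered_marking(), byte_for() and dirty_card are paraphrased from the description, and the actual names and signatures in the webrev may differ.

    inline void CardTableModRefBS::write_ref_field_work(void* field, oop new_val) {
      volatile jbyte* card = byte_for(field);                // card covering the updated field
      if (requires_ordered_marking()) {                      // e.g. CMS with precleaning enabled
        OrderAccess::release_store(card, jbyte(dirty_card)); // card mark ordered after the oop store
      } else {
        *card = dirty_card;                                  // plain dirty-card store otherwise
      }
    }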
Finally, the global variable always_do_update_barrier, which was only needed for CMS, has been replaced with member variable CTMRBS::_requires_ordered_marking (with accessor functions). (G1 had commented out manipulation of this variable, added in commented out state as part of fix for 6904516; looks like debugging leftovers. Those have been removed.) So we now have [release_]oop_store, which (1) calls the barrier set's pre-barrier handler (which is a nop except for G1), (2) then performs a [release_]store of the new value, (3) and finally calls the barrier set's post-barrier handler. The post-barrier handler shared by Serial, Parallel, and CMS performs the card marking with a release barrier when requested (only for CMS). With these changes, a release store of the new value is only done when that's what is actually required by the caller, without regard to some hidden global variable. Also with these changes, only CMS (not Serial or Parallel) uses a release store for the card marking, and then only when actually needed, irrespective of whether the value store needed to be a release store. Finally, _requires_ordered_marking is now only set true when both UseConcMarkSweepGC and CMSPrecleaningEnabled are true, which matches the behavior of JITed code. Precleaning is what requires the ordering, so there's no point if it's disabled. CR: https://bugs.openjdk.java.net/browse/JDK-8163897 Webrev: http://cr.openjdk.java.net/~kbarrett/8163897/open.00/ Testing: hs-tier1 through hs-tier5. From dmitry.samersoff at bell-sw.com Mon Oct 23 07:37:14 2017 From: dmitry.samersoff at bell-sw.com (Dmitry Samersoff) Date: Mon, 23 Oct 2017 10:37:14 +0300 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: <51f57623-8ce5-0883-69cc-9ba6b39b5a65@oracle.com> References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <1d05a95f-75db-e0ca-e069-12fe41502e4f@oracle.com> <615e504d-94af-bae3-b721-6ca1dac6a567@oracle.com> <1BD883DB-8C8B-405D-8F85-3A026B19286F@oracle.com> <5f7f3d85-db48-fe6e-28f5-e3f4858f33e8@oracle.com> <799205ae-ba9f-ce3a-8dd6-1a55e32689df@oracle.com> <9956F9D0-B01B-44FE-AE56-527907816436@oracle.com> <20ef0bac-1942-b29f-a9e2-4ea4d4f81cd2@oracle.com> <5d217c60-3049-30a6-c207-d6c9274a5ddf@oracle.com> <1C03FCB5-969B-4C43-8BAD-EF939515FEC2@oracle.com> <51f57623-8ce5-0883-69cc-9ba6b39b5a65@oracle.com> Message-ID: Bob, I compiled and run .02 on aarch64 linux and it works as expected. -Dmitry On 23.10.2017 00:52, David Holmes wrote: > Hi Bob, > > Changes seem fine. > > I'll take up the issue of whether this should be enabled by default in > the CSR. > > Thanks, > David > > On 21/10/2017 4:44 AM, Bob Vandette wrote: >> Here?s an updated webrev that hopefully takes care of all remaining >> comments. >> >> http://cr.openjdk.java.net/~bobv/8146115/webrev.02 >> >> I added the deprecation of the UseCGroupMemoryLimitForHeap option this >> round since >> this experimental option should no longer be necessary. >> >> >> Bob. >> >> >>> On Oct 13, 2017, at 9:34 AM, David Holmes >>> wrote: >>> >>> Reading back through my suggestion for os.hpp >>> initialize_container_support should just be init_container_support. 
>>> >>> Thanks, >>> David >>> >>> On 13/10/2017 11:14 PM, Bob Vandette wrote: >>>>> On Oct 12, 2017, at 11:08 PM, David Holmes >>>>> wrote: >>>>> >>>>> Hi Bob, >>>>> >>>>> On 13/10/2017 1:43 AM, Bob Vandette wrote: >>>>>>> On Oct 11, 2017, at 9:04 PM, David Holmes >>>>>>> wrote: >>>>>>> >>>>>>> Hi Bob, >>>>>>> >>>>>>> On 12/10/2017 5:11 AM, Bob Vandette wrote: >>>>>>>> Here?s an updated webrev for this RFE that contains changes and >>>>>>>> cleanups based on feedback I?ve received so far. >>>>>>>> I?m still investigating the best approach for reacting to cpu >>>>>>>> shares and quotas.? I do not believe doing nothing is the answer. >>>>>>> >>>>>>> I do. :) Let me try this again. When you run outside of a >>>>>>> container you don't get 100% of the CPUs - you have to share with >>>>>>> whatever else is running on the system. You get a fraction of CPU >>>>>>> time based on the load. We don't try to communicate load >>>>>>> information to the VM/application so it can adapt. Within a >>>>>>> container setting shares/quotas is just a way of setting an >>>>>>> artificial load. So why should we be treating it any differently? >>>>>> Because today we optimize for a lightly loaded system and when >>>>>> running serverless applications in containers we should be >>>>>> optimizing for a fully loaded system.? If developers don?t want >>>>>> this, then don?t use shares or quotas and you?ll have exactly >>>>>> the behavior you have today.? I think we just have to document the >>>>>> new behavior (and how to turn it off) so people know what >>>>>> to expect. >>>>> >>>>> The person deploying the app may not have control over how the app >>>>> is deployed in terms of shares/quotas. It all depends how (and who) >>>>> manages the containers. This is a big part of my problem/concerns >>>>> here that I don't know exactly how all this is organized and who >>>>> knows what in advance and what they can control. >>>>> >>>>> But I'll let this drop, other than raising an additional concern. I >>>>> don't think just allowing the user to hardwire the number of >>>>> processors to use will necessarily solve the problem with what >>>>> available_processors() returns. I'm concerned the execution of the >>>>> VM may occur in a context where the number of processors is not >>>>> known in advance, and the user can not disable shares/quotas. In >>>>> that case we may need to have a flag that says to ignore >>>>> shares/quotas in the processor count calculation. >>>> I?m not sure that?s a high probability issue.? It?s my understanding >>>> that whoever is configuring the container >>>> management will be specifying the resources required to run these >>>> applications which comes along with a >>>> guarantee of these resources.? If this issue does come up, I do have >>>> the -XX:-UseContainerSupport big >>>> switch that turns all of this off.? It will however disable the >>>> memory support as well. >>>>> >>>>>> You seem to discount the added cost of 100s of VMs creating lots >>>>>> of un-necessaary threads.? In the current JDK 10 code base, >>>>>> In a heavily loaded system with 88 processors, VmData grows from >>>>>> 60MBs (1 cpu) to 376MB (88 cpus).? This is only mapped >>>>>> memory and it depends heavily on how deep in the stack these >>>>>> threads go before it impacts VmRSS but it shows the potential >>>>>> downside >>>>>> of having 100s of VMs thinking they each own the entire machine. >>>>> >>>>> I agree that the default ergonomics does not scale well. 
Anyone >>>>> doing any serious Java deployment tunes the VM explicitly and does >>>>> not rely on the defaults. How will they do that in a container >>>>> environment? I don't know. >>>>> >>>>> I would love to see some actual deployment scenarios/experiences >>>>> for this to understand things better. >>>> This is one of the reasons I want to get this support out in JDK 10, >>>> to get some feedback under real scenarios. >>>>> >>>>>> I haven?t even done any experiments to determine the added context >>>>>> switching cost if the VM decides to use excessive >>>>>> pthreads. >>>>>>> >>>>>>> That's not to say an API to provide load/shares/quota information >>>>>>> may not be useful, but that is a separate issue to what the >>>>>>> "active processor count" should report. >>>>>> I don?t have a problem with active processor count reporting the >>>>>> number of processors we have, but I do have a problem >>>>>> with our current usage of this information within the VM and Core >>>>>> libraries. >>>>> >>>>> That is a somewhat separate issue. One worth pursuing separately. >>>> We should look at this as part of the ?Container aware Java? JEP. >>>>> >>>>>>> >>>>>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.01 >>>>>>>> Updates: >>>>>>>> 1. I had to move the processing of AggressiveHeap since the >>>>>>>> container memory size needs to be known before this can be >>>>>>>> processed. >>>>>>> >>>>>>> I don't like the placement of this - we don't call os:: init >>>>>>> functions from inside Arguments - we manage the initialization >>>>>>> sequence from Threads::create_vm. Seems to me that container >>>>>>> initialization can/should happen in os::init_before_ergo, and the >>>>>>> AggressiveHeap processing can occur at the start of >>>>>>> Arguments::apply_ergo(). >>>>>>> >>>>>>> That said we need to be sure nothing touched by >>>>>>> set_aggressive_heap_flags will be used before we now reach that >>>>>>> code - there are a lot of flags being set in there. >>>>>> This is exactly the reason why I put the call where it did.? I put >>>>>> the call to set_aggressive_heap_flags in finalize_vm_init_args >>>>>> because that is exactly what this call is doing.? It?s finalizing >>>>>> flags used after the parsing.? The impacted flags are definitely >>>>>> being >>>>>> used shortly after and before init_before_ergo is called. >>>>> >>>>> I see that now and it is very unfortunate because I really do not >>>>> like what you had to do here. As you can tell from the logic in >>>>> create_vm we have always refactored to ensure we can progressively >>>>> manage the interleaving of OS initialization with Arguments >>>>> processing. So having a deep part of Argument processing go off and >>>>> call some more OS initialization is not nice. That said I can't see >>>>> a way around it without very unreasonable refactoring. >>>>> >>>>> But I do have a couple of changes I'd like to request please: >>>>> >>>>> 1. Move the call to os::initialize_container_support() up a level >>>>> to before the call to finalize_vm_init_args(), with a more >>>>> elaborate comment: >>>>> >>>>> // We need to ensure processor and memory resources have been properly >>>>> // configured - which may rely on arguments we just processed - before >>>>> // doing the final argument processing. Any argument processing that >>>>> // needs to know about processor and memory resources must occur after >>>>> // this point. 
>>>>> >>>>> os::initialize_container_support(); >>>>> >>>>> // Do final processing now that all arguments have been parsed >>>>> result = finalize_vm_init_args(patch_mod_javabase); >>>>> >>>>> 2. Simplify and modify os.hpp as follows: >>>>> >>>>> +? LINUX_ONLY(static void pd_initialize_container_support();) >>>>> >>>>> ?? public: >>>>> ??? static void init(void);????????????????????? // Called before >>>>> command line parsing >>>>> >>>>> +?? static void initialize_container_support() { // Called during >>>>> command line parsing >>>>> +???? LINUX_ONLY(pd_initialize_container_support();) >>>>> +?? } >>>>> >>>>> ??? static void init_before_ergo(void);????????? // Called after >>>>> command line parsing >>>>> ???????????????????????????????????????????????? // before VM >>>>> ergonomics >>>>> >>>>> 3. In thread.cpp add a comment here: >>>>> >>>>> ?? // Parse arguments >>>>> +? // Note: this internally calls os::initialize_container_support() >>>>> ?? jint parse_result = Arguments::parse(args); >>>> All very reasonable changes. >>>> Thanks, >>>> Bob. >>>>> >>>>> Thanks. >>>>> >>>>>>> >>>>>>>> 2. I no longer use the cpuset.cpus contents since >>>>>>>> sched_getaffinity reports the correct results >>>>>>>> even if someone manually updates the cgroup data.? I originally >>>>>>>> didn?t think this was the case since >>>>>>>> sched_setaffinity didn?t automatically update the cpuset file >>>>>>>> contents but the inverse is true. >>>>>>> >>>>>>> Ok. >>>>>>> >>>>>>>> 3. I ifdef?d the container function support in >>>>>>>> src/hotspot/share/runtime/os.hpp to avoid putting stubs in all >>>>>>>> other os >>>>>>>> platform directories.? I can do this if it?s absolutely necessary. >>>>>>> >>>>>>> You should not need to do this if initialization moves as I >>>>>>> suggested above. os::init_before_ergo() in os_linux.cpp can call >>>>>>> OSContainer::init(). >>>>>>> No need for os::initialize_container_support() or >>>>>>> os::pd_initialize_container_support. >>>>>> But os::init_before_ergo is in shared code. >>>>> >>>>> Yep my bad - point is moot now anyway. >>>>> >>>>> >>>>> >>>>>>> src/hotspot/os/linux/os_linux.cpp/.hpp >>>>>>> >>>>>>> 187???????? log_trace(os)("available container memory: " >>>>>>> JULONG_FORMAT, avail_mem); >>>>>>> 188???????? return avail_mem; >>>>>>> 189?????? } else { >>>>>>> 190???????? log_debug(os,container)("container memory usage call >>>>>>> failed: " JLONG_FORMAT, mem_usage); >>>>>>> >>>>>>> Why "trace" (the third logging level) to show the information, >>>>>>> but "debug" (the second level) to show failed calls? You use >>>>>>> debug in other files for basic info. Overall I'm unclear on your >>>>>>> use of debug versus trace for the logging. >>>>>> I use trace for noisy information that is not reporting errors and >>>>>> debug for failures that are informational and not fatal. >>>>>> In this case, the call could return -1 or -2.? -1 is unlimited and >>>>>> -2 is an error.? In either case we fallback to the >>>>>> standard system call to get available memory.? I would have used >>>>>> warning but since these messages were occurring >>>>>> during a test run causing test failures. >>>>> >>>>> Okay. Thanks for clarifying. >>>>> >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/os/linux/osContainer_linux.cpp >>>>>>> >>>>>>> Dead code: >>>>>>> >>>>>>> 376 #if 0 >>>>>>> 377?? os::Linux::print_container_info(tty); >>>>>>> ... >>>>>>> 390 #endif >>>>>> I left it in for standalone testing.? Should I use some other #if? 
>>>>> >>>>> We don't generally leave in dead code in the runtime code. Do you >>>>> see this as useful after you've finalized the changes? >>>>> >>>>> Is this testing just for showing the logging? Is it worth making >>>>> this a logging controlled call? Is it suitable for a Gtest test? >>>>> >>>>> Thanks, >>>>> David >>>>> ----- >>>>> >>>>>> Bob. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>>> Bob. >> From rkennke at redhat.com Mon Oct 23 12:21:08 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 23 Oct 2017 14:21:08 +0200 Subject: RFR: 8184914: Use MacroAssembler::cmpoop() consistently when comparing heap objects In-Reply-To: References: <8d667010-f17e-7d1b-088b-106999e3b005@redhat.com> <9b629556-b3f0-e52e-35e0-711c6a767e95@oracle.com> <55bb0f72-df71-44bc-53a0-7d982ab1ca04@redhat.com> Message-ID: <810cfcd2-95ed-9df8-0910-dd2beecbdd48@redhat.com> Hi Coleen, thank you. Can you sponsor it? Do you need anything from me? Thanks, Roman > I'm calling this as "trivial" and can be pushed now. > Thanks, > Coleen > > On 10/17/17 5:05 PM, Roman Kennke wrote: >> >>> >>> This looks reasonable to me.? Maybe the compiler group should review >>> the c1 part.? I changed the mailing list to hotspot-dev. >>> I can sponsor this for you. >> Thanks, thanks and thanks! ;-) >> >> Roman >> >>> Thanks, >>> Coleen >>> >>> On 10/17/17 4:22 PM, Roman Kennke wrote: >>>> (Not sure if this is the correct list to ask.. if not, please let >>>> me know and/or redirect me) >>>> >>>> Currently, cmpoop() is only declared for 32-bit x86, and only used >>>> in 2 places in C1 to compare oops. In other places, oops are >>>> compared using cmpptr(). It would be useful to distinguish normal >>>> pointer comparisons from heap object comparisons, and use cmpoop() >>>> consistently for heap object comparisons. This would remove clutter >>>> in several places where we have #ifdef _LP64 around comparisons, >>>> and would also allow to insert necessary barriers for GCs that need >>>> them (e.g. Shenandoah) later. >>>> >>>> http://cr.openjdk.java.net/~rkennke/8184914/webrev.00/ >>>> >>>> >>>> Tested by running hotspot_gc jtreg tests. >>>> >>>> Can I get a review please? >>>> >>>> Thanks, Roman >>>> >>>> >>> >> > From coleen.phillimore at oracle.com Mon Oct 23 12:30:27 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 23 Oct 2017 08:30:27 -0400 Subject: RFR: 8184914: Use MacroAssembler::cmpoop() consistently when comparing heap objects In-Reply-To: <810cfcd2-95ed-9df8-0910-dd2beecbdd48@redhat.com> References: <8d667010-f17e-7d1b-088b-106999e3b005@redhat.com> <9b629556-b3f0-e52e-35e0-711c6a767e95@oracle.com> <55bb0f72-df71-44bc-53a0-7d982ab1ca04@redhat.com> <810cfcd2-95ed-9df8-0910-dd2beecbdd48@redhat.com> Message-ID: On 10/23/17 8:21 AM, Roman Kennke wrote: > Hi Coleen, > > thank you. Can you sponsor it? Do you need anything from me? I do not.? I'll push it now.? I'm curious why you didn't change any of the other platforms.? Or do you only need this for x86? thanks, Coleen > > Thanks, Roman > >> I'm calling this as "trivial" and can be pushed now. >> Thanks, >> Coleen >> >> On 10/17/17 5:05 PM, Roman Kennke wrote: >>> >>>> >>>> This looks reasonable to me.? Maybe the compiler group should >>>> review the c1 part.? I changed the mailing list to hotspot-dev. >>>> I can sponsor this for you. >>> Thanks, thanks and thanks! ;-) >>> >>> Roman >>> >>>> Thanks, >>>> Coleen >>>> >>>> On 10/17/17 4:22 PM, Roman Kennke wrote: >>>>> (Not sure if this is the correct list to ask.. 
if not, please let >>>>> me know and/or redirect me) >>>>> >>>>> Currently, cmpoop() is only declared for 32-bit x86, and only used >>>>> in 2 places in C1 to compare oops. In other places, oops are >>>>> compared using cmpptr(). It would be useful to distinguish normal >>>>> pointer comparisons from heap object comparisons, and use cmpoop() >>>>> consistently for heap object comparisons. This would remove >>>>> clutter in several places where we have #ifdef _LP64 around >>>>> comparisons, and would also allow to insert necessary barriers for >>>>> GCs that need them (e.g. Shenandoah) later. >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/8184914/webrev.00/ >>>>> >>>>> >>>>> Tested by running hotspot_gc jtreg tests. >>>>> >>>>> Can I get a review please? >>>>> >>>>> Thanks, Roman >>>>> >>>>> >>>> >>> >> > From rkennke at redhat.com Mon Oct 23 12:47:11 2017 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 23 Oct 2017 14:47:11 +0200 Subject: RFR: 8184914: Use MacroAssembler::cmpoop() consistently when comparing heap objects In-Reply-To: References: <8d667010-f17e-7d1b-088b-106999e3b005@redhat.com> <9b629556-b3f0-e52e-35e0-711c6a767e95@oracle.com> <55bb0f72-df71-44bc-53a0-7d982ab1ca04@redhat.com> <810cfcd2-95ed-9df8-0910-dd2beecbdd48@redhat.com> Message-ID: <9071d7a5-1837-51e0-ec09-ba5922415811@redhat.com> Am 23.10.2017 um 14:30 schrieb coleen.phillimore at oracle.com: > > > On 10/23/17 8:21 AM, Roman Kennke wrote: >> Hi Coleen, >> >> thank you. Can you sponsor it? Do you need anything from me? > > I do not.? I'll push it now.? I'm curious why you didn't change any of > the other platforms.? Or do you only need this for x86? > thanks, > Coleen x86 seemed most important for now because there was this odd discrepancy between 32 and 64 bit (cmpoop did exist before, but only for one special case in 32 bit). I will do something similar for aarch64 later (need it for Shenandoah). Others will have to fill in the required parts for Shenandoah to other platforms, if they need it. Thanks for all your help! Roman From robbin.ehn at oracle.com Mon Oct 23 15:16:41 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 23 Oct 2017 17:16:41 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <3018D48F-245A-4C92-9CED-5692BBD88E8C@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <3018D48F-245A-4C92-9CED-5692BBD88E8C@oracle.com> Message-ID: <47c2ac8e-151e-267f-28bd-f76ed5ef5357@oracle.com> Hi, On 2017-10-20 18:24, Karen Kinnear wrote: > Robbin, Erik, Mikael - > > Delighted to see this! Looks good. I don?t need to see any updates - these are minor comments. > Thank you for the performance testing > > Couple of questions/comments: > 1. platform support > supports_thread_local_poll returns true for AMD64 or SPARC > Your comment said Linux x64 and Sparc only. > What about Mac and Windows? Sorry it should be x64 and SPARC, OS is not important. (so yes mac and windows) > > 2. safepointMechanism_inline.hpp - comment clarification > line 42 - ?Mutexes can be taken but none JavaThread?. > Are you saying: ?Non-JavaThreads do not support handshakes, but must stop for > safepoints.? > Not sure what the Mutex comment is about Fixed: "// If the poll is on a non-java thread, we can only check the global state." This is possible from e.g. Monitor::TrySpin. > > 3. globals.hpp > The way I understand this - ThreadLocalHandshakes flag is not so much to enable > use of ThreadLocalHandle operations, but to enable use of TLH for global safe point. 
> If that is true, could you possibly at least clarify this in the comment if there is not > a better name for the flag? Fixed "Use thread-local polls instead of global poll for safepoints." We can also do better name of option, e.g. -XX:+(Use)ThreadLocalPoll ? Let me know. > > 4. thank you for looking into startup performance and interpreter return/backward branch checks. We are committed to fix this before 18.3! > > 5. handshake.cpp > Could you possibly add a comment that thread_has_completed and/or pool_for_completed_thread > means that the thread has either done the operation or the operation has been cancelled? > I get that we are polling this to tell when it is safe to return to the synchronous requestor not to > determine if the thread actually performed the operation. The comment would make that clearer. Fixed Incremental: http://cr.openjdk.java.net/~rehn/8185640/v3/Assorted-Karen-5/webrev/ Again let me know if anyone needs another kind! Thanks Karen! /Robbin > > thanks, > Karen > >> On Oct 11, 2017, at 9:37 AM, Robbin Ehn wrote: >> >> Hi all, >> >> Starting the review of the code while JEP work is still not completed. >> >> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >> >> This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not just all threads or none. >> >> Entire changeset: >> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >> >> Divided into 3-parts, >> SafepointMechanism abstraction: >> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >> Consolidating polling page allocation: >> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >> Handshakes: >> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >> >> A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a handshake can be performed with that single JavaThread as well. >> >> The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. >> >> Example of potential use-cases: >> -Biased lock revocation >> -External requests for stack traces >> -Deoptimization >> -Async exception delivery >> -External suspension >> -Eliding memory barriers >> >> All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. >> Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported platforms are Linux x64 and Solaris SPARC. >> >> Tested heavily with various test suits and comes with a few new tests. >> >> Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically ensured). 
A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. >> The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all JavaThreads in an array instead of a linked list. >> >> Thanks, Robbin > From robbin.ehn at oracle.com Mon Oct 23 15:26:26 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 23 Oct 2017 17:26:26 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> Message-ID: <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> Hi Martin, On 2017-10-18 16:05, Doerr, Martin wrote: > Hi Robbin, > > thanks for the quick reply and for doing additional benchmarks. > Please note that t->does_dispatch() was just a first idea, but doesn't really fit for the purpose because it's false for conditional branch bytecodes for example. I just didn't find an appropriate quick check in the existing code. > I guess you will notice a performance impact when benchmarking with -Xint. (I don't know if Oracle usually runs startup performance benchmarks.) Yes, we are seeing a performance regression, 2.5%-6% depending on benchmark. We are committed to fix this, but it might come as separate RFE/bug depending on the JEP's timeline. (If the fix, very unlikely, would not be done before next release, we would change the default to off) I hope this is an acceptable path? Thanks, Robbin > > Best regards, > Martin > > > -----Original Message----- > From: Robbin Ehn [mailto:robbin.ehn at oracle.com] > Sent: Mittwoch, 18. Oktober 2017 15:58 > To: Doerr, Martin ; hotspot-dev developers > Subject: Re: RFR(XL): 8185640: Thread-local handshakes > > Hi Martin, > > On 2017-10-18 12:11, Doerr, Martin wrote: >> Hi Robbin, >> >> so you would like to push your version first (as it does not break other platforms) and then help us to push non-Oracle platform implementations which change shared code again? >> I'd be fine with that, too. > > Yes, great! > >> >> While thinking a little longer about the interpreter implementation, a new idea came into my mind. >> I think we could significantly reduce impact on interpreter code size and performance by using safepoint polls only in a subset of bytecodes. E.g., we could use only bytecodes which perform any kind of jump by implementing something like >> if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch()) generate_safepoint_poll(); >> in TemplateInterpreterGenerator::generate_and_dispatch. > > We have not seen any performance regression in simple benchmark with this. > I will do a better benchmark and compare what difference it makes. > > Thanks, Robbin > >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >> Sent: Mittwoch, 18. Oktober 2017 11:07 >> To: Doerr, Martin ; hotspot-dev developers >> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >> >> Thanks for looking at this. >> >> On 2017-10-17 19:58, Doerr, Martin wrote: >>> Hi Robbin, >>> >>> my first impression is very good. Thanks for providing the webrev. >> >> Great! 
>> >>> >>> I only don't like that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one or the other mechanism. >>> Would it be ok to move the decision between what to use to platform code? >>> (Some platforms could still use both if this is beneficial.) >>> >>> E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. Would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion. >> >> I see no issue with this. >> Maybe SafepointMechanism::local_poll_armed should be possibly platform specific. >> Can we do this incremental when adding the platform support for PPC64? >> >> Thanks, Robbin >> >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Robbin Ehn >>> Sent: Mittwoch, 11. Oktober 2017 15:38 >>> To: hotspot-dev developers >>> Subject: RFR(XL): 8185640: Thread-local handshakes >>> >>> Hi all, >>> >>> Starting the review of the code while JEP work is still not completed. >>> >>> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >>> >>> This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not >>> just all threads or none. >>> >>> Entire changeset: >>> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >>> >>> Divided into 3-parts, >>> SafepointMechanism abstraction: >>> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >>> Consolidating polling page allocation: >>> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >>> Handshakes: >>> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >>> >>> A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread >>> itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be >>> performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a >>> handshake can be performed with that single JavaThread as well. >>> >>> The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the >>> guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. >>> >>> Example of potential use-cases: >>> -Biased lock revocation >>> -External requests for stack traces >>> -Deoptimization >>> -Async exception delivery >>> -External suspension >>> -Eliding memory barriers >>> >>> All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. >>> Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported >>> platforms are Linux x64 and Solaris SPARC. >>> >>> Tested heavily with various test suits and comes with a few new tests. 
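(Aside, to make the _poll_armed_value / _poll_disarmed_value discussion above concrete: a minimal, purely illustrative sketch of a per-thread poll word with platform-chosen armed and disarmed values could look roughly like this. None of the names below are taken from the webrev; they are placeholders.)

#include <cstdint>

// Sketch only. "Armed" points into the protected guard page (with the poll
// bit set), "disarmed" points at an ordinary readable page, so the same
// per-thread word works for a trapping scheme and for a load-and-test scheme.
struct PollValues {
  uintptr_t armed_value;     // e.g. guard_page_base | poll_bit
  uintptr_t disarmed_value;  // e.g. readable_page_base
};

struct PerThreadPoll {
  volatile uintptr_t polling_word;   // one word per JavaThread

  void arm(const PollValues& v)    { polling_word = v.armed_value; }
  void disarm(const PollValues& v) { polling_word = v.disarmed_value; }

  // What a generated poll effectively tests before taking the slow path:
  bool is_armed(uintptr_t poll_bit) const {
    return (polling_word & poll_bit) != 0;
  }
};
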
>>> >>> Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically >>> ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. >>> The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on >>> JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all >>> JavaThreads in an array instead of a linked list. >>> >>> Thanks, Robbin >>> From martin.doerr at sap.com Mon Oct 23 15:40:44 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 23 Oct 2017 15:40:44 +0000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> Message-ID: Hi Coleen and Robbin, I'm ok with putting it into a separate RFE. I understand that there are more fun activities than rebasing this XL change for a long time :-) So you don't need to delay it. It's acceptable for me. Thanks, Coleen, for sharing your proposal. I appreciate it. Best regards, Martin -----Original Message----- From: Robbin Ehn [mailto:robbin.ehn at oracle.com] Sent: Montag, 23. Oktober 2017 17:26 To: Doerr, Martin ; hotspot-dev developers Subject: Re: RFR(XL): 8185640: Thread-local handshakes Hi Martin, On 2017-10-18 16:05, Doerr, Martin wrote: > Hi Robbin, > > thanks for the quick reply and for doing additional benchmarks. > Please note that t->does_dispatch() was just a first idea, but doesn't really fit for the purpose because it's false for conditional branch bytecodes for example. I just didn't find an appropriate quick check in the existing code. > I guess you will notice a performance impact when benchmarking with -Xint. (I don't know if Oracle usually runs startup performance benchmarks.) Yes, we are seeing a performance regression, 2.5%-6% depending on benchmark. We are committed to fix this, but it might come as separate RFE/bug depending on the JEP's timeline. (If the fix, very unlikely, would not be done before next release, we would change the default to off) I hope this is an acceptable path? Thanks, Robbin > > Best regards, > Martin > > > -----Original Message----- > From: Robbin Ehn [mailto:robbin.ehn at oracle.com] > Sent: Mittwoch, 18. Oktober 2017 15:58 > To: Doerr, Martin ; hotspot-dev developers > Subject: Re: RFR(XL): 8185640: Thread-local handshakes > > Hi Martin, > > On 2017-10-18 12:11, Doerr, Martin wrote: >> Hi Robbin, >> >> so you would like to push your version first (as it does not break other platforms) and then help us to push non-Oracle platform implementations which change shared code again? >> I'd be fine with that, too. > > Yes, great! > >> >> While thinking a little longer about the interpreter implementation, a new idea came into my mind. >> I think we could significantly reduce impact on interpreter code size and performance by using safepoint polls only in a subset of bytecodes. 
E.g., we could use only bytecodes which perform any kind of jump by implementing something like >> if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch()) generate_safepoint_poll(); >> in TemplateInterpreterGenerator::generate_and_dispatch. > > We have not seen any performance regression in simple benchmark with this. > I will do a better benchmark and compare what difference it makes. > > Thanks, Robbin > >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >> Sent: Mittwoch, 18. Oktober 2017 11:07 >> To: Doerr, Martin ; hotspot-dev developers >> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >> >> Thanks for looking at this. >> >> On 2017-10-17 19:58, Doerr, Martin wrote: >>> Hi Robbin, >>> >>> my first impression is very good. Thanks for providing the webrev. >> >> Great! >> >>> >>> I only don't like that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one or the other mechanism. >>> Would it be ok to move the decision between what to use to platform code? >>> (Some platforms could still use both if this is beneficial.) >>> >>> E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. Would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion. >> >> I see no issue with this. >> Maybe SafepointMechanism::local_poll_armed should be possibly platform specific. >> Can we do this incremental when adding the platform support for PPC64? >> >> Thanks, Robbin >> >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Robbin Ehn >>> Sent: Mittwoch, 11. Oktober 2017 15:38 >>> To: hotspot-dev developers >>> Subject: RFR(XL): 8185640: Thread-local handshakes >>> >>> Hi all, >>> >>> Starting the review of the code while JEP work is still not completed. >>> >>> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >>> >>> This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not >>> just all threads or none. >>> >>> Entire changeset: >>> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >>> >>> Divided into 3-parts, >>> SafepointMechanism abstraction: >>> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >>> Consolidating polling page allocation: >>> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >>> Handshakes: >>> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >>> >>> A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread >>> itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be >>> performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a >>> handshake can be performed with that single JavaThread as well. >>> >>> The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the >>> guard page. 
In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. >>> >>> Example of potential use-cases: >>> -Biased lock revocation >>> -External requests for stack traces >>> -Deoptimization >>> -Async exception delivery >>> -External suspension >>> -Eliding memory barriers >>> >>> All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. >>> Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported >>> platforms are Linux x64 and Solaris SPARC. >>> >>> Tested heavily with various test suits and comes with a few new tests. >>> >>> Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically >>> ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. >>> The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on >>> JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all >>> JavaThreads in an array instead of a linked list. >>> >>> Thanks, Robbin >>> From karen.kinnear at oracle.com Mon Oct 23 15:58:55 2017 From: karen.kinnear at oracle.com (Karen Kinnear) Date: Mon, 23 Oct 2017 08:58:55 -0700 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> Message-ID: <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> Works for me Thanks, Karen > On Oct 23, 2017, at 8:40 AM, Doerr, Martin wrote: > > Hi Coleen and Robbin, > > I'm ok with putting it into a separate RFE. I understand that there are more fun activities than rebasing this XL change for a long time :-) > So you don't need to delay it. It's acceptable for me. > > Thanks, Coleen, for sharing your proposal. I appreciate it. > > Best regards, > Martin > > > -----Original Message----- > From: Robbin Ehn [mailto:robbin.ehn at oracle.com] > Sent: Montag, 23. Oktober 2017 17:26 > To: Doerr, Martin ; hotspot-dev developers > Subject: Re: RFR(XL): 8185640: Thread-local handshakes > > Hi Martin, > >> On 2017-10-18 16:05, Doerr, Martin wrote: >> Hi Robbin, >> >> thanks for the quick reply and for doing additional benchmarks. >> Please note that t->does_dispatch() was just a first idea, but doesn't really fit for the purpose because it's false for conditional branch bytecodes for example. I just didn't find an appropriate quick check in the existing code. >> I guess you will notice a performance impact when benchmarking with -Xint. (I don't know if Oracle usually runs startup performance benchmarks.) > > Yes, we are seeing a performance regression, 2.5%-6% depending on benchmark. > We are committed to fix this, but it might come as separate RFE/bug depending on > the JEP's timeline. > > (If the fix, very unlikely, would not be done before next release, we would > change the default to off) > > I hope this is an acceptable path? 
> > Thanks, Robbin > >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >> Sent: Mittwoch, 18. Oktober 2017 15:58 >> To: Doerr, Martin ; hotspot-dev developers >> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >> >> Hi Martin, >> >>> On 2017-10-18 12:11, Doerr, Martin wrote: >>> Hi Robbin, >>> >>> so you would like to push your version first (as it does not break other platforms) and then help us to push non-Oracle platform implementations which change shared code again? >>> I'd be fine with that, too. >> >> Yes, great! >> >>> >>> While thinking a little longer about the interpreter implementation, a new idea came into my mind. >>> I think we could significantly reduce impact on interpreter code size and performance by using safepoint polls only in a subset of bytecodes. E.g., we could use only bytecodes which perform any kind of jump by implementing something like >>> if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch()) generate_safepoint_poll(); >>> in TemplateInterpreterGenerator::generate_and_dispatch. >> >> We have not seen any performance regression in simple benchmark with this. >> I will do a better benchmark and compare what difference it makes. >> >> Thanks, Robbin >> >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>> Sent: Mittwoch, 18. Oktober 2017 11:07 >>> To: Doerr, Martin ; hotspot-dev developers >>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>> >>> Thanks for looking at this. >>> >>>> On 2017-10-17 19:58, Doerr, Martin wrote: >>>> Hi Robbin, >>>> >>>> my first impression is very good. Thanks for providing the webrev. >>> >>> Great! >>> >>>> >>>> I only don't like that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one or the other mechanism. >>>> Would it be ok to move the decision between what to use to platform code? >>>> (Some platforms could still use both if this is beneficial.) >>>> >>>> E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. Would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion. >>> >>> I see no issue with this. >>> Maybe SafepointMechanism::local_poll_armed should be possibly platform specific. >>> Can we do this incremental when adding the platform support for PPC64? >>> >>> Thanks, Robbin >>> >>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Robbin Ehn >>>> Sent: Mittwoch, 11. Oktober 2017 15:38 >>>> To: hotspot-dev developers >>>> Subject: RFR(XL): 8185640: Thread-local handshakes >>>> >>>> Hi all, >>>> >>>> Starting the review of the code while JEP work is still not completed. >>>> >>>> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >>>> >>>> This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not >>>> just all threads or none. 
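(Aside: a purely illustrative sketch of what "stop individual threads and not just all threads or none" means for a client of this machinery. The closure and function names below are assumptions chosen for illustration, not necessarily the final API.)

// A per-thread operation, run while the target JavaThread is in a
// safepoint-safe state, either by the target itself or by the VM thread
// while the target is kept blocked.
class JavaThread;   // HotSpot type, declaration only for the sketch

class HandshakeOperationSketch {
 public:
  virtual ~HandshakeOperationSketch() {}
  virtual void do_thread(JavaThread* target) = 0;   // the per-thread callback
};

class CollectStackTrace : public HandshakeOperationSketch {
 public:
  void do_thread(JavaThread* target) {
    // e.g. walk and record target's stack here; only this one thread is
    // stopped, every other JavaThread keeps running.
  }
};

// Hypothetical entry point: run the operation for one known-running thread.
void handshake_one_thread(HandshakeOperationSketch* op, JavaThread* target);
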
>>>> >>>> Entire changeset: >>>> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >>>> >>>> Divided into 3-parts, >>>> SafepointMechanism abstraction: >>>> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >>>> Consolidating polling page allocation: >>>> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >>>> Handshakes: >>>> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >>>> >>>> A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread >>>> itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be >>>> performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a >>>> handshake can be performed with that single JavaThread as well. >>>> >>>> The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the >>>> guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. >>>> >>>> Example of potential use-cases: >>>> -Biased lock revocation >>>> -External requests for stack traces >>>> -Deoptimization >>>> -Async exception delivery >>>> -External suspension >>>> -Eliding memory barriers >>>> >>>> All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. >>>> Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported >>>> platforms are Linux x64 and Solaris SPARC. >>>> >>>> Tested heavily with various test suits and comes with a few new tests. >>>> >>>> Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically >>>> ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. >>>> The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on >>>> JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all >>>> JavaThreads in an array instead of a linked list. >>>> >>>> Thanks, Robbin >>>> From aph at redhat.com Mon Oct 23 16:36:03 2017 From: aph at redhat.com (Andrew Haley) Date: Mon, 23 Oct 2017 17:36:03 +0100 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> Message-ID: <33aff570-5bdb-d1aa-bccd-f6122db61051@redhat.com> This is a bad way to handle supports_thread_local_poll(): static bool supports_thread_local_poll() { #if defined(AMD64) || defined(SPARC) return true; #else return false; #endif } Instead, it is better to use a flag which is #defined in the back ends, and allow each back end to specify if it supports thread-local handshakes. 
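(Aside: one way to read the suggestion above, sketched very roughly. The macro name and file placement are made up for illustration; the only point is that the decision moves from a CPU-family #if in shared code into something each back end declares for itself.)

// In each CPU back end that implements the per-thread poll, e.g. in its
// globalDefinitions_<cpu>.hpp (name illustrative):
//
//   #define THREAD_LOCAL_POLL
//
// Shared code then keys off the back-end macro instead of the CPU family:

static bool supports_thread_local_poll() {
#ifdef THREAD_LOCAL_POLL
  return true;
#else
  return false;
#endif
}

With something like this, a back end that does not (yet) implement the poll simply does not define the macro, and the shared fallback to a normal safepoint kicks in without any shared-code edit.
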
We have *two* AARCH64 back ends, and only one of them supports thread-local handshakes; both of them #define AARCH64. #if defined(BLAH) should be reserved for hardware-specific properties, not back-end-specific properties. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From volker.simonis at gmail.com Mon Oct 23 17:15:01 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 23 Oct 2017 19:15:01 +0200 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: <1c2eeaa1-334a-4744-ba31-87e580faafa5@oracle.com> References: <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com> <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com> <5b5219a5-960e-363b-2bdc-3613f1dae62c@oracle.com> <4109f960-078f-e582-3c78-71f201a265fd@redhat.com> <1c2eeaa1-334a-4744-ba31-87e580faafa5@oracle.com> Message-ID: Hi Vladimir, that's a good suggestion! I've did so and prepared a new webrev: http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v4/ I've also verified that: http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091.v2/ still applies after 8166317.v4 Thank you and best regards, Volker On Tue, Oct 17, 2017 at 7:49 PM, Vladimir Kozlov wrote: > Hi, Volker > > You can do a trick with NOT_SPARC() macro to avoid defining empty method on > all platforms: > > +#if INCLUDE_ALL_GCS > +void g1_barrier_stubs_init() NOT_SPARC( {} ); // depends on universe_init, > must be before interpreter_init > +#endif > > I thought we pushed 8187091 already. I will keep it in mind. > > Thanks, > Vladimir > > > On 10/10/17 10:17 AM, Volker Simonis wrote: >> >> On Tue, Oct 10, 2017 at 9:42 AM, Andrew Haley wrote: >>> >>> On 09/10/17 20:24, Volker Simonis wrote: >>>> >>>> Unfortunately we can't easily generate these stubs during >>>> 'stubRoutines_init1()' because >>>> 'generate_dirty_card_log_enqueue_if_necessary()' needs the byte map >>>> base address which is only initialized in >>>> 'CardTableModRefBS::initialize()' during 'univers_init()' which >>>> happens after 'stubRoutines_init1()'. >>> >>> >>> Yes you can, you can do something like we do for narrow_ptrs_base: >>> >>> if (Universe::is_fully_initialized()) { >>> mov(rheapbase, Universe::narrow_ptrs_base()); >>> } else { >>> lea(rheapbase, >>> ExternalAddress((address)Universe::narrow_ptrs_base_addr())); >>> ldr(rheapbase, Address(rheapbase)); >>> } >>> >> >> Hi Andrew, >> >> thanks for your suggestion. Yes, I could do that, but that would >> replace a constant load in the barrier with a constant load plus a >> load from memory, because during stubRoutines_init1() heap won't be >> initialized. Not sure about this, but I think we want to avoid this >> overhead in the barriers. >> >> Also, Christian proposed in a previous mail to replace the G1 barrier >> stubs on SPARC with simple runtime calls like on other platforms. >> While I think that it is probably worthwhile thinking about such a >> change, I don't know the exact history of these stubs and probably >> some GC experts should decide if that's really a good idea. I'd be >> happy to open an extra issue for following up on that path. 
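(Aside, on the NOT_SPARC() trick quoted above: the platform macros expand to their argument on the named platform and to nothing elsewhere, which is what lets a single declaration give SPARC a real out-of-line definition and every other platform an empty inline body. A stripped-down sketch of the pattern, not the actual macros.hpp definitions:)

#ifdef SPARC
#define SPARC_ONLY(code) code
#define NOT_SPARC(code)
#else
#define SPARC_ONLY(code)
#define NOT_SPARC(code) code
#endif

// Declared once in shared code; SPARC supplies the real body in its own
// file, every other platform gets the empty inline definition:
void example_platform_init() NOT_SPARC({});   // example name, not the patch
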
>> >> But for the moments I've simply added a new initialization step >> "g1_barrier_stubs_init()" between 'univers_init()' and >> interpreter_init() which is empty on all platforms except SPARC where >> it generates the corresponding stubs: >> >> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v3/ >> >> I've built and smoke-tested the new change on Windows, MacOS, >> Solaris/SPARC, AIX, Linux/x86_64/ppc64/ppc64le/s390. Unfortunately I >> don't have access to ARM machines so I couldn't check arm,arm64 and >> aarch64 although I don't expect any problems there (actually I've just >> added an empty method there). But it would be great if somebody could >> check that for any case. >> >> @Vladimir: I've also rebased the change for "8187091: >> ReturnBlobToWrongHeapTest fails because of problems in >> CodeHeap::contains_blob()": >> >> http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091.v2/ >> >> Because it changes the same files like 8166317 it should be applied >> and pushed only after 8166317 was pushed. >> >> Thank you and best regards, >> Volker >> >>> -- >>> Andrew Haley >>> Java Platform Lead Engineer >>> Red Hat UK Ltd. >>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From bob.vandette at oracle.com Mon Oct 23 18:28:31 2017 From: bob.vandette at oracle.com (Bob Vandette) Date: Mon, 23 Oct 2017 14:28:31 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <2d9dd746-63e1-cade-28f9-5ca1ae1c253e@oracle.com> <200F07CB-35DA-492B-B78D-9EC033EE0431@oracle.com> <833ba1a5-49fc-bb24-ff99-994011af52aa@oracle.com> Message-ID: Thanks Kim! Bob. > On Oct 23, 2017, at 12:52 AM, Kim Barrett wrote: > >> On Sep 27, 2017, at 9:20 PM, David Holmes wrote: >>>> 62 void set_subsystem_path(char *cgroup_path) { >>>> >>>> If this takes a "const char*" will it save you from casting string literals to "char*" elsewhere? >>> I tried several different ways of declaring the container accessor functions and >>> always ended up with warnings due to scanf not being able to validate arguments >>> since the format string didn?t end up being a string literal. I originally was using templates >>> and then ended up with the macros. I tried several different casts but could resolve the problem. >> >> Sounds like something Kim Barrett should take a look at :) > > Fortunately, I just happened by. > > The warnings are because we compile with -Wformat=2, which enables > -Wformat-nonliteral (among other things). > > Use PRAGMA_FORMAT_NONLITERAL_IGNORED, e.g. > > PRAGMA_DIAG_PUSH > PRAGMA_FORMAT_NONLITERAL_IGNORED > > PRAGMA_DIAG_POP > > That will silence warnings about sscanf (or anything else!) with a > non-literal format string within that . > > Also, while I was looking at this, I noticed that in > get_subsytem_file_contents_##return_name, if the sum of the lengths of > get_subsystem_path() and filename is >= MAXBUF, then we can end up > reading from a file other than the one intended, if such a file > exists. That seems like it might be bad. > > Also, the filename argument should be const char*. 
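(Aside, to make the MAXBUF concern above concrete: a hedged sketch of building the subsystem file path with an explicit truncation check, rather than silently clipping and opening the wrong file. This is a hypothetical helper for illustration, not the code in the webrev.)

#include <cstdio>

// 'subsystem_path' and 'filename' stand in for the values the container
// code derives from /proc/self/mountinfo and the cgroup files.
static FILE* open_subsystem_file(const char* subsystem_path,
                                 const char* filename) {
  char path[1024];   // stand-in for MAXBUF
  int n = snprintf(path, sizeof(path), "%s/%s", subsystem_path, filename);
  if (n < 0 || (size_t)n >= sizeof(path)) {
    // Would have been truncated: refuse rather than read some other file.
    return NULL;
  }
  return fopen(path, "r");
}

The same check also sidesteps the strncat pitfalls, since snprintf reports the length it wanted to write instead of quietly stopping at the buffer edge.
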
> From mark.reinhold at oracle.com Mon Oct 23 19:43:00 2017 From: mark.reinhold at oracle.com (mark.reinhold at oracle.com) Date: Mon, 23 Oct 2017 12:43:00 -0700 (PDT) Subject: JEP 312: Thread-Local Handshakes Message-ID: <20171023194300.CA616EB325@eggemoggin.niobe.net> New JEP Candidate: http://openjdk.java.net/jeps/312 - Mark From vladimir.kozlov at oracle.com Mon Oct 23 20:00:17 2017 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 23 Oct 2017 13:00:17 -0700 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: References: <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com> <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com> <5b5219a5-960e-363b-2bdc-3613f1dae62c@oracle.com> <4109f960-078f-e582-3c78-71f201a265fd@redhat.com> <1c2eeaa1-334a-4744-ba31-87e580faafa5@oracle.com> Message-ID: <9c35ed03-5a85-8b14-6874-cd828f123d16@oracle.com> Looks good. I start new testing. Thanks, Vladimir On 10/23/17 10:15 AM, Volker Simonis wrote: > Hi Vladimir, > > that's a good suggestion! I've did so and prepared a new webrev: > > http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v4/ > > I've also verified that: > > http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091.v2/ > > still applies after 8166317.v4 > > Thank you and best regards, > Volker > > > On Tue, Oct 17, 2017 at 7:49 PM, Vladimir Kozlov > wrote: >> Hi, Volker >> >> You can do a trick with NOT_SPARC() macro to avoid defining empty method on >> all platforms: >> >> +#if INCLUDE_ALL_GCS >> +void g1_barrier_stubs_init() NOT_SPARC( {} ); // depends on universe_init, >> must be before interpreter_init >> +#endif >> >> I thought we pushed 8187091 already. I will keep it in mind. >> >> Thanks, >> Vladimir >> >> >> On 10/10/17 10:17 AM, Volker Simonis wrote: >>> >>> On Tue, Oct 10, 2017 at 9:42 AM, Andrew Haley wrote: >>>> >>>> On 09/10/17 20:24, Volker Simonis wrote: >>>>> >>>>> Unfortunately we can't easily generate these stubs during >>>>> 'stubRoutines_init1()' because >>>>> 'generate_dirty_card_log_enqueue_if_necessary()' needs the byte map >>>>> base address which is only initialized in >>>>> 'CardTableModRefBS::initialize()' during 'univers_init()' which >>>>> happens after 'stubRoutines_init1()'. >>>> >>>> >>>> Yes you can, you can do something like we do for narrow_ptrs_base: >>>> >>>> if (Universe::is_fully_initialized()) { >>>> mov(rheapbase, Universe::narrow_ptrs_base()); >>>> } else { >>>> lea(rheapbase, >>>> ExternalAddress((address)Universe::narrow_ptrs_base_addr())); >>>> ldr(rheapbase, Address(rheapbase)); >>>> } >>>> >>> >>> Hi Andrew, >>> >>> thanks for your suggestion. Yes, I could do that, but that would >>> replace a constant load in the barrier with a constant load plus a >>> load from memory, because during stubRoutines_init1() heap won't be >>> initialized. Not sure about this, but I think we want to avoid this >>> overhead in the barriers. >>> >>> Also, Christian proposed in a previous mail to replace the G1 barrier >>> stubs on SPARC with simple runtime calls like on other platforms. >>> While I think that it is probably worthwhile thinking about such a >>> change, I don't know the exact history of these stubs and probably >>> some GC experts should decide if that's really a good idea. I'd be >>> happy to open an extra issue for following up on that path. 
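(Aside: the ordering constraint in the NOT_SPARC() snippet above, "depends on universe_init, must be before interpreter_init", is easiest to see as a sketch of the relevant slice of VM start-up. The function names are taken from the discussion, but the surrounding code is elided and the wrapper name is invented, so treat this as illustrative only.)

// Assumed declarations, provided elsewhere in the VM:
void universe_init();
void g1_barrier_stubs_init();
void interpreter_init();

// Sketch of the initialization order being discussed:
void init_sequence_sketch() {
  // ... stubRoutines_init1() etc.: card table byte map base not known yet ...
  universe_init();           // heap set up, CardTableModRefBS::initialize() runs
  g1_barrier_stubs_init();   // new step: needs the byte map base; empty
                             // everywhere except SPARC
  interpreter_init();        // the template interpreter can now use the stubs
  // ...
}
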
>>> >>> But for the moments I've simply added a new initialization step >>> "g1_barrier_stubs_init()" between 'univers_init()' and >>> interpreter_init() which is empty on all platforms except SPARC where >>> it generates the corresponding stubs: >>> >>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v3/ >>> >>> I've built and smoke-tested the new change on Windows, MacOS, >>> Solaris/SPARC, AIX, Linux/x86_64/ppc64/ppc64le/s390. Unfortunately I >>> don't have access to ARM machines so I couldn't check arm,arm64 and >>> aarch64 although I don't expect any problems there (actually I've just >>> added an empty method there). But it would be great if somebody could >>> check that for any case. >>> >>> @Vladimir: I've also rebased the change for "8187091: >>> ReturnBlobToWrongHeapTest fails because of problems in >>> CodeHeap::contains_blob()": >>> >>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091.v2/ >>> >>> Because it changes the same files like 8166317 it should be applied >>> and pushed only after 8166317 was pushed. >>> >>> Thank you and best regards, >>> Volker >>> >>>> -- >>>> Andrew Haley >>>> Java Platform Lead Engineer >>>> Red Hat UK Ltd. >>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From adeel.iqbal at hotmail.com Sun Oct 22 14:23:32 2017 From: adeel.iqbal at hotmail.com (Adeel Iqbal) Date: Sun, 22 Oct 2017 14:23:32 +0000 Subject: Modify / Add Instruction set of Java & Add New SuperInstruction to It Message-ID: Hi, i am working on a project where i have to modify the java bytecode instruction set by adding custom instruction (SuperInstructions) as a replacement of sequence of instructions in order to reduce the size of the generated file and to modify the JVM to recognize these new instructions. can you please guide me. From david.holmes at oracle.com Tue Oct 24 05:07:40 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 24 Oct 2017 15:07:40 +1000 Subject: Modify / Add Instruction set of Java & Add New SuperInstruction to It In-Reply-To: References: Message-ID: <4b506c75-eabd-8325-680a-0e1acb24bba7@oracle.com> Hi, On 23/10/2017 12:23 AM, Adeel Iqbal wrote: > Hi, > i am working on a project where i have to modify the java bytecode instruction set by adding custom instruction (SuperInstructions) as a replacement of sequence of instructions in order to reduce the size of the generated file and to modify the JVM to recognize these new instructions. > can you please guide me. That's a significant project. Not knowing how much you know about anything makes it hard to give guidance. But you're not the first to attempt such a thing so I suggest doing some initial research. This is a quick hit I got when I googled "bytecode compaction for the JVM": https://link.springer.com/chapter/10.1007/978-3-642-13651-1_2 It's only a preview, you'll need to get full access to the paper by some means. But their project has a wikipedia entry: https://en.wikipedia.org/wiki/TakaTuka Disclaimer: I know nothing about this system. The theory is simple enough: 1. Identify the sequences you want to replace 2. Write a tool (or modify javac) to recognize the sequences and replace them with the new bytecode. 3. Add the new bytecode to the interpreter. Before proceeding with step 3 run the tool over your benchmark application and see if you're really achieving your goals with regards to saving space. But please don't expect step-by-step assistance with this. 
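(Aside, a toy illustration of step 2 above, operating on a raw bytecode array. The superinstruction opcode value is invented, and a real rewriter must decode instruction by instruction, patch branch offsets and exception tables, and regenerate stack map frames; none of that is shown here.)

#include <vector>
#include <cstdint>
#include <cstddef>

// Toy: fuse every "aload_0, getfield #index" pair into one made-up opcode.
// Assumes the scan starts at an instruction boundary and that no operand
// byte happens to look like aload_0, which a real tool must not assume.
std::vector<uint8_t> fuse_aload0_getfield(const std::vector<uint8_t>& code) {
  const uint8_t ALOAD_0  = 0x2a;
  const uint8_t GETFIELD = 0xb4;
  const uint8_t SUPER_ALOAD0_GETFIELD = 0xe0;   // invented, unused opcode value
  std::vector<uint8_t> out;
  for (size_t i = 0; i < code.size(); ) {
    if (i + 4 <= code.size() && code[i] == ALOAD_0 && code[i + 1] == GETFIELD) {
      out.push_back(SUPER_ALOAD0_GETFIELD);     // new opcode keeps the 2-byte index
      out.push_back(code[i + 2]);
      out.push_back(code[i + 3]);
      i += 4;                                   // consumed aload_0 + getfield + index
    } else {
      out.push_back(code[i]);
      i += 1;
    }
  }
  return out;
}
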
Cheers, David From jini.george at oracle.com Tue Oct 24 07:31:37 2017 From: jini.george at oracle.com (Jini George) Date: Tue, 24 Oct 2017 13:01:37 +0530 Subject: RFR: SA: JDK-8189798: SA cleanup - part 1 In-Reply-To: <18501902-23db-de6c-b83d-640cd33df836@oracle.com> References: <18501902-23db-de6c-b83d-640cd33df836@oracle.com> Message-ID: Adding hotspot-dev too. Thanks, Jini. On 10/24/2017 12:05 PM, Jini George wrote: > Hello, > > As a part of SA next, I am working on writing a test case which compares > the fields and the types of the fields of the SA java classes with the > corresponding entries in the vmStructs tables. This, to some extent, > would help in preventing errors in SA due to the changes in hotspot. As > a precursor to this, I am in the process of making some cleanup related > changes (mostly in SA). I plan to have the changes done in parts. For > this webrev, most of the changes are for: > > 1. Avoiding having some values being redefined in SA. Instead have those > exported through vmStructs, and read it in SA. > (CompactibleFreeListSpace::_min_chunk_size_in_bytes, > CompactibleFreeListSpace::IndexSetSize) > > Redefinition of hotspot values in SA makes SA error prone, when the > value gets altered in hotspot and the corresponding modification gets > missed out in SA. > > 2. To remove some unused code (JNIid.java). > 3. Add the missing "CMSBitMap::_bmStartWord" in vmStructs. > 4. Modify variable names in SA and hotspot to match the counterpart > names, so that the comparison of the fields become easier. Most of the > changes belong to this group. > > Could I please get reviews done for these precursor changes ? > > JBS Id: https://bugs.openjdk.java.net/browse/JDK-8189798 > webrev: http://cr.openjdk.java.net/~jgeorge/8189798/webrev.00/ > > Thank you, > Jini. > From volker.simonis at gmail.com Tue Oct 24 07:35:37 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 24 Oct 2017 09:35:37 +0200 Subject: RFR(M): 8166317: InterpreterCodeSize should be computed In-Reply-To: <9c35ed03-5a85-8b14-6874-cd828f123d16@oracle.com> References: <6704868d-caa7-51e0-4741-5d62f90d837c@oracle.com> <8c522d38-90db-2864-0778-6d5948b1f50c@oracle.com> <7fee08f1-8304-3026-19e9-844e618e98ea@oracle.com> <2bb4136a-8c0e-ac4c-0c03-af38ff79ab40@oracle.com> <5b5219a5-960e-363b-2bdc-3613f1dae62c@oracle.com> <4109f960-078f-e582-3c78-71f201a265fd@redhat.com> <1c2eeaa1-334a-4744-ba31-87e580faafa5@oracle.com> <9c35ed03-5a85-8b14-6874-cd828f123d16@oracle.com> Message-ID: Thanks, Volker On Mon, Oct 23, 2017 at 10:00 PM, Vladimir Kozlov wrote: > Looks good. I start new testing. > > Thanks, > Vladimir > > > On 10/23/17 10:15 AM, Volker Simonis wrote: >> >> Hi Vladimir, >> >> that's a good suggestion! I've did so and prepared a new webrev: >> >> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v4/ >> >> I've also verified that: >> >> http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091.v2/ >> >> still applies after 8166317.v4 >> >> Thank you and best regards, >> Volker >> >> >> On Tue, Oct 17, 2017 at 7:49 PM, Vladimir Kozlov >> wrote: >>> >>> Hi, Volker >>> >>> You can do a trick with NOT_SPARC() macro to avoid defining empty method >>> on >>> all platforms: >>> >>> +#if INCLUDE_ALL_GCS >>> +void g1_barrier_stubs_init() NOT_SPARC( {} ); // depends on >>> universe_init, >>> must be before interpreter_init >>> +#endif >>> >>> I thought we pushed 8187091 already. I will keep it in mind. 
>>> >>> Thanks, >>> Vladimir >>> >>> >>> On 10/10/17 10:17 AM, Volker Simonis wrote: >>>> >>>> >>>> On Tue, Oct 10, 2017 at 9:42 AM, Andrew Haley wrote: >>>>> >>>>> >>>>> On 09/10/17 20:24, Volker Simonis wrote: >>>>>> >>>>>> >>>>>> Unfortunately we can't easily generate these stubs during >>>>>> 'stubRoutines_init1()' because >>>>>> 'generate_dirty_card_log_enqueue_if_necessary()' needs the byte map >>>>>> base address which is only initialized in >>>>>> 'CardTableModRefBS::initialize()' during 'univers_init()' which >>>>>> happens after 'stubRoutines_init1()'. >>>>> >>>>> >>>>> >>>>> Yes you can, you can do something like we do for narrow_ptrs_base: >>>>> >>>>> if (Universe::is_fully_initialized()) { >>>>> mov(rheapbase, Universe::narrow_ptrs_base()); >>>>> } else { >>>>> lea(rheapbase, >>>>> ExternalAddress((address)Universe::narrow_ptrs_base_addr())); >>>>> ldr(rheapbase, Address(rheapbase)); >>>>> } >>>>> >>>> >>>> Hi Andrew, >>>> >>>> thanks for your suggestion. Yes, I could do that, but that would >>>> replace a constant load in the barrier with a constant load plus a >>>> load from memory, because during stubRoutines_init1() heap won't be >>>> initialized. Not sure about this, but I think we want to avoid this >>>> overhead in the barriers. >>>> >>>> Also, Christian proposed in a previous mail to replace the G1 barrier >>>> stubs on SPARC with simple runtime calls like on other platforms. >>>> While I think that it is probably worthwhile thinking about such a >>>> change, I don't know the exact history of these stubs and probably >>>> some GC experts should decide if that's really a good idea. I'd be >>>> happy to open an extra issue for following up on that path. >>>> >>>> But for the moments I've simply added a new initialization step >>>> "g1_barrier_stubs_init()" between 'univers_init()' and >>>> interpreter_init() which is empty on all platforms except SPARC where >>>> it generates the corresponding stubs: >>>> >>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8166317.v3/ >>>> >>>> I've built and smoke-tested the new change on Windows, MacOS, >>>> Solaris/SPARC, AIX, Linux/x86_64/ppc64/ppc64le/s390. Unfortunately I >>>> don't have access to ARM machines so I couldn't check arm,arm64 and >>>> aarch64 although I don't expect any problems there (actually I've just >>>> added an empty method there). But it would be great if somebody could >>>> check that for any case. >>>> >>>> @Vladimir: I've also rebased the change for "8187091: >>>> ReturnBlobToWrongHeapTest fails because of problems in >>>> CodeHeap::contains_blob()": >>>> >>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091.v2/ >>>> >>>> Because it changes the same files like 8166317 it should be applied >>>> and pushed only after 8166317 was pushed. >>>> >>>> Thank you and best regards, >>>> Volker >>>> >>>>> -- >>>>> Andrew Haley >>>>> Java Platform Lead Engineer >>>>> Red Hat UK Ltd. 
>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From robbin.ehn at oracle.com Tue Oct 24 14:03:37 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 24 Oct 2017 16:03:37 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <33aff570-5bdb-d1aa-bccd-f6122db61051@redhat.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <33aff570-5bdb-d1aa-bccd-f6122db61051@redhat.com> Message-ID: <79f14e0e-6b68-feca-fbc9-3bd538ac7364@oracle.com> On 2017-10-23 18:36, Andrew Haley wrote: > This is a bad way to handle supports_thread_local_poll(): I agree, is this what you had in mind: Incremental: http://cr.openjdk.java.net/~rehn/8185640/v4/Support-Check-Haley-6/webrev/ Thanks, Robbin > > static bool supports_thread_local_poll() { > #if defined(AMD64) || defined(SPARC) > return true; > #else > return false; > #endif > } > > Instead, it is better to use a flag which is #defined in the back > ends, and allow each back end to specify if it supports thread-local > handshakes. We have *two* AARCH64 back ends, and only one of them > supports thread-local handshakes; both of them #define AARCH64. > > #if defined(BLAH) should be reserved for hardware-specific properties, > not back-end-specific properties. > From bob.vandette at oracle.com Tue Oct 24 14:11:43 2017 From: bob.vandette at oracle.com (Bob Vandette) Date: Tue, 24 Oct 2017 10:11:43 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <2d9dd746-63e1-cade-28f9-5ca1ae1c253e@oracle.com> <200F07CB-35DA-492B-B78D-9EC033EE0431@oracle.com> <833ba1a5-49fc-bb24-ff99-994011af52aa@oracle.com> Message-ID: > On Oct 23, 2017, at 12:52 AM, Kim Barrett wrote: > >> On Sep 27, 2017, at 9:20 PM, David Holmes wrote: >>>> 62 void set_subsystem_path(char *cgroup_path) { >>>> >>>> If this takes a "const char*" will it save you from casting string literals to "char*" elsewhere? >>> I tried several different ways of declaring the container accessor functions and >>> always ended up with warnings due to scanf not being able to validate arguments >>> since the format string didn?t end up being a string literal. I originally was using templates >>> and then ended up with the macros. I tried several different casts but could resolve the problem. >> >> Sounds like something Kim Barrett should take a look at :) > > Fortunately, I just happened by. > > The warnings are because we compile with -Wformat=2, which enables > -Wformat-nonliteral (among other things). > > Use PRAGMA_FORMAT_NONLITERAL_IGNORED, e.g. > > PRAGMA_DIAG_PUSH > PRAGMA_FORMAT_NONLITERAL_IGNORED > > PRAGMA_DIAG_POP > > That will silence warnings about sscanf (or anything else!) with a > non-literal format string within that . Thanks but I ended up taking a different approach that resulted in more compact code. http://cr.openjdk.java.net/~bobv/8146115/webrev.02 > > Also, while I was looking at this, I noticed that in > get_subsytem_file_contents_##return_name, if the sum of the lengths of > get_subsystem_path() and filename is >= MAXBUF, then we can end up > reading from a file other than the one intended, if such a file > exists. That seems like it might be bad. I fixed all uses of strncat. > > Also, the filename argument should be const char*. > Fixed. Thanks, Bob. 
From aph at redhat.com Tue Oct 24 14:21:45 2017 From: aph at redhat.com (Andrew Haley) Date: Tue, 24 Oct 2017 15:21:45 +0100 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <79f14e0e-6b68-feca-fbc9-3bd538ac7364@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <33aff570-5bdb-d1aa-bccd-f6122db61051@redhat.com> <79f14e0e-6b68-feca-fbc9-3bd538ac7364@oracle.com> Message-ID: On 24/10/17 15:03, Robbin Ehn wrote: > I agree, is this what you had in mind: > Incremental: > http://cr.openjdk.java.net/~rehn/8185640/v4/Support-Check-Haley-6/webrev/ Perfect, thanks. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From robbin.ehn at oracle.com Tue Oct 24 14:54:28 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 24 Oct 2017 16:54:28 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> Message-ID: <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> Hi, I did a fix for the interpreter performance regression, it's plain and simple, I kept the polling code inside dispatch_base but made it optional as the verify oop. Incremental: http://cr.openjdk.java.net/~rehn/8185640/v5/Interpreter-Poll-7/webrev/index.html Manual tested with jstack and it passes: hotspot_tier1, hotspot_handshake It reduces the polling cost of 80%, sensitive benchmarks shows -0.44% regression vs TLH off. More insensitive benchmark show no regression. Thanks, Robbin On 2017-10-23 17:58, Karen Kinnear wrote: > Works for me > > Thanks, > Karen > >> On Oct 23, 2017, at 8:40 AM, Doerr, Martin wrote: >> >> Hi Coleen and Robbin, >> >> I'm ok with putting it into a separate RFE. I understand that there are more fun activities than rebasing this XL change for a long time :-) >> So you don't need to delay it. It's acceptable for me. >> >> Thanks, Coleen, for sharing your proposal. I appreciate it. >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >> Sent: Montag, 23. Oktober 2017 17:26 >> To: Doerr, Martin ; hotspot-dev developers >> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >> >> Hi Martin, >> >>> On 2017-10-18 16:05, Doerr, Martin wrote: >>> Hi Robbin, >>> >>> thanks for the quick reply and for doing additional benchmarks. >>> Please note that t->does_dispatch() was just a first idea, but doesn't really fit for the purpose because it's false for conditional branch bytecodes for example. I just didn't find an appropriate quick check in the existing code. >>> I guess you will notice a performance impact when benchmarking with -Xint. (I don't know if Oracle usually runs startup performance benchmarks.) >> >> Yes, we are seeing a performance regression, 2.5%-6% depending on benchmark. >> We are committed to fix this, but it might come as separate RFE/bug depending on >> the JEP's timeline. >> >> (If the fix, very unlikely, would not be done before next release, we would >> change the default to off) >> >> I hope this is an acceptable path? 
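(Aside: a small self-contained model of the shape described above, a poll that is only emitted for the dispatch sites that ask for it, gated the same way the verify-oop check is. This is not interpreter code; the names are placeholders.)

#include <cstdint>

struct ThreadModel {
  volatile uintptr_t polling_word;   // per-thread poll word
  enum { poll_bit = 1 };
};

void take_safepoint_slow_path(ThreadModel* t);   // assumed slow path, defined elsewhere

inline void dispatch_with_optional_poll(ThreadModel* t, bool generate_poll) {
  if (generate_poll && (t->polling_word & (uintptr_t)ThreadModel::poll_bit) != 0) {
    take_safepoint_slow_path(t);     // rare: this thread has been armed
  }
  // ... fall through to the normal bytecode dispatch ...
}
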
>> >> Thanks, Robbin >> >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>> Sent: Mittwoch, 18. Oktober 2017 15:58 >>> To: Doerr, Martin ; hotspot-dev developers >>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>> >>> Hi Martin, >>> >>>> On 2017-10-18 12:11, Doerr, Martin wrote: >>>> Hi Robbin, >>>> >>>> so you would like to push your version first (as it does not break other platforms) and then help us to push non-Oracle platform implementations which change shared code again? >>>> I'd be fine with that, too. >>> >>> Yes, great! >>> >>>> >>>> While thinking a little longer about the interpreter implementation, a new idea came into my mind. >>>> I think we could significantly reduce impact on interpreter code size and performance by using safepoint polls only in a subset of bytecodes. E.g., we could use only bytecodes which perform any kind of jump by implementing something like >>>> if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch()) generate_safepoint_poll(); >>>> in TemplateInterpreterGenerator::generate_and_dispatch. >>> >>> We have not seen any performance regression in simple benchmark with this. >>> I will do a better benchmark and compare what difference it makes. >>> >>> Thanks, Robbin >>> >>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>>> Sent: Mittwoch, 18. Oktober 2017 11:07 >>>> To: Doerr, Martin ; hotspot-dev developers >>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>>> >>>> Thanks for looking at this. >>>> >>>>> On 2017-10-17 19:58, Doerr, Martin wrote: >>>>> Hi Robbin, >>>>> >>>>> my first impression is very good. Thanks for providing the webrev. >>>> >>>> Great! >>>> >>>>> >>>>> I only don't like that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one or the other mechanism. >>>>> Would it be ok to move the decision between what to use to platform code? >>>>> (Some platforms could still use both if this is beneficial.) >>>>> >>>>> E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. Would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion. >>>> >>>> I see no issue with this. >>>> Maybe SafepointMechanism::local_poll_armed should be possibly platform specific. >>>> Can we do this incremental when adding the platform support for PPC64? >>>> >>>> Thanks, Robbin >>>> >>>>> >>>>> Best regards, >>>>> Martin >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Robbin Ehn >>>>> Sent: Mittwoch, 11. Oktober 2017 15:38 >>>>> To: hotspot-dev developers >>>>> Subject: RFR(XL): 8185640: Thread-local handshakes >>>>> >>>>> Hi all, >>>>> >>>>> Starting the review of the code while JEP work is still not completed. >>>>> >>>>> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >>>>> >>>>> This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not >>>>> just all threads or none. 
>>>>> >>>>> Entire changeset: >>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >>>>> >>>>> Divided into 3-parts, >>>>> SafepointMechanism abstraction: >>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >>>>> Consolidating polling page allocation: >>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >>>>> Handshakes: >>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >>>>> >>>>> A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread >>>>> itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be >>>>> performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a >>>>> handshake can be performed with that single JavaThread as well. >>>>> >>>>> The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the >>>>> guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. >>>>> >>>>> Example of potential use-cases: >>>>> -Biased lock revocation >>>>> -External requests for stack traces >>>>> -Deoptimization >>>>> -Async exception delivery >>>>> -External suspension >>>>> -Eliding memory barriers >>>>> >>>>> All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. >>>>> Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported >>>>> platforms are Linux x64 and Solaris SPARC. >>>>> >>>>> Tested heavily with various test suits and comes with a few new tests. >>>>> >>>>> Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically >>>>> ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. >>>>> The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on >>>>> JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all >>>>> JavaThreads in an array instead of a linked list. >>>>> >>>>> Thanks, Robbin >>>>> > From martin.doerr at sap.com Tue Oct 24 17:08:25 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 24 Oct 2017 17:08:25 +0000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> Message-ID: <0b34f052cc7047cfb40dc44e91c1300d@sap.com> Hi Robbin, sounds good. Thanks a lot for doing it. 
The change looks good to me except that I'd expect a poll for wide_ret, too. Best regards, Martin -----Original Message----- From: Robbin Ehn [mailto:robbin.ehn at oracle.com] Sent: Dienstag, 24. Oktober 2017 16:54 To: Karen Kinnear ; Doerr, Martin Cc: hotspot-dev developers ; Coleen Phillimore (coleen.phillimore at oracle.com) Subject: Re: RFR(XL): 8185640: Thread-local handshakes Hi, I did a fix for the interpreter performance regression, it's plain and simple, I kept the polling code inside dispatch_base but made it optional as the verify oop. Incremental: http://cr.openjdk.java.net/~rehn/8185640/v5/Interpreter-Poll-7/webrev/index.html Manual tested with jstack and it passes: hotspot_tier1, hotspot_handshake It reduces the polling cost of 80%, sensitive benchmarks shows -0.44% regression vs TLH off. More insensitive benchmark show no regression. Thanks, Robbin On 2017-10-23 17:58, Karen Kinnear wrote: > Works for me > > Thanks, > Karen > >> On Oct 23, 2017, at 8:40 AM, Doerr, Martin wrote: >> >> Hi Coleen and Robbin, >> >> I'm ok with putting it into a separate RFE. I understand that there are more fun activities than rebasing this XL change for a long time :-) >> So you don't need to delay it. It's acceptable for me. >> >> Thanks, Coleen, for sharing your proposal. I appreciate it. >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >> Sent: Montag, 23. Oktober 2017 17:26 >> To: Doerr, Martin ; hotspot-dev developers >> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >> >> Hi Martin, >> >>> On 2017-10-18 16:05, Doerr, Martin wrote: >>> Hi Robbin, >>> >>> thanks for the quick reply and for doing additional benchmarks. >>> Please note that t->does_dispatch() was just a first idea, but doesn't really fit for the purpose because it's false for conditional branch bytecodes for example. I just didn't find an appropriate quick check in the existing code. >>> I guess you will notice a performance impact when benchmarking with -Xint. (I don't know if Oracle usually runs startup performance benchmarks.) >> >> Yes, we are seeing a performance regression, 2.5%-6% depending on benchmark. >> We are committed to fix this, but it might come as separate RFE/bug depending on >> the JEP's timeline. >> >> (If the fix, very unlikely, would not be done before next release, we would >> change the default to off) >> >> I hope this is an acceptable path? >> >> Thanks, Robbin >> >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>> Sent: Mittwoch, 18. Oktober 2017 15:58 >>> To: Doerr, Martin ; hotspot-dev developers >>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>> >>> Hi Martin, >>> >>>> On 2017-10-18 12:11, Doerr, Martin wrote: >>>> Hi Robbin, >>>> >>>> so you would like to push your version first (as it does not break other platforms) and then help us to push non-Oracle platform implementations which change shared code again? >>>> I'd be fine with that, too. >>> >>> Yes, great! >>> >>>> >>>> While thinking a little longer about the interpreter implementation, a new idea came into my mind. >>>> I think we could significantly reduce impact on interpreter code size and performance by using safepoint polls only in a subset of bytecodes. 
E.g., we could use only bytecodes which perform any kind of jump by implementing something like >>>> if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch()) generate_safepoint_poll(); >>>> in TemplateInterpreterGenerator::generate_and_dispatch. >>> >>> We have not seen any performance regression in simple benchmark with this. >>> I will do a better benchmark and compare what difference it makes. >>> >>> Thanks, Robbin >>> >>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>>> Sent: Mittwoch, 18. Oktober 2017 11:07 >>>> To: Doerr, Martin ; hotspot-dev developers >>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>>> >>>> Thanks for looking at this. >>>> >>>>> On 2017-10-17 19:58, Doerr, Martin wrote: >>>>> Hi Robbin, >>>>> >>>>> my first impression is very good. Thanks for providing the webrev. >>>> >>>> Great! >>>> >>>>> >>>>> I only don't like that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one or the other mechanism. >>>>> Would it be ok to move the decision between what to use to platform code? >>>>> (Some platforms could still use both if this is beneficial.) >>>>> >>>>> E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. Would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion. >>>> >>>> I see no issue with this. >>>> Maybe SafepointMechanism::local_poll_armed should be possibly platform specific. >>>> Can we do this incremental when adding the platform support for PPC64? >>>> >>>> Thanks, Robbin >>>> >>>>> >>>>> Best regards, >>>>> Martin >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Robbin Ehn >>>>> Sent: Mittwoch, 11. Oktober 2017 15:38 >>>>> To: hotspot-dev developers >>>>> Subject: RFR(XL): 8185640: Thread-local handshakes >>>>> >>>>> Hi all, >>>>> >>>>> Starting the review of the code while JEP work is still not completed. >>>>> >>>>> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >>>>> >>>>> This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not >>>>> just all threads or none. >>>>> >>>>> Entire changeset: >>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >>>>> >>>>> Divided into 3-parts, >>>>> SafepointMechanism abstraction: >>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >>>>> Consolidating polling page allocation: >>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >>>>> Handshakes: >>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >>>>> >>>>> A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread >>>>> itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be >>>>> performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a >>>>> handshake can be performed with that single JavaThread as well. 
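In code, the single-JavaThread case just described amounts to handing a closure to the handshake machinery and letting it run while the target is safepoint safe. The sketch below is conceptual: ThreadClosure is an existing HotSpot type, but the exact Handshake::execute() signature is an assumption based on the description above, not necessarily the API in the webrev:

// Conceptual sketch only; the exact handshake API in the webrev may differ.
class StackTraceClosure : public ThreadClosure {
 public:
  void do_thread(Thread* thread) {
    // Runs while 'thread' is in a safepoint-safe state, executed either by
    // the thread itself or by the VM thread with the target kept blocked.
  }
};

void handshake_one(JavaThread* target) {
  StackTraceClosure cl;
  // Single-thread handshake.  On platforms without the per-JavaThread poll
  // this falls back to a normal safepoint (HandshakeOneThread).
  Handshake::execute(&cl, target);
}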
>>>>> >>>>> The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the >>>>> guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. >>>>> >>>>> Example of potential use-cases: >>>>> -Biased lock revocation >>>>> -External requests for stack traces >>>>> -Deoptimization >>>>> -Async exception delivery >>>>> -External suspension >>>>> -Eliding memory barriers >>>>> >>>>> All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. >>>>> Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported >>>>> platforms are Linux x64 and Solaris SPARC. >>>>> >>>>> Tested heavily with various test suits and comes with a few new tests. >>>>> >>>>> Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically >>>>> ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. >>>>> The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on >>>>> JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all >>>>> JavaThreads in an array instead of a linked list. >>>>> >>>>> Thanks, Robbin >>>>> > From kim.barrett at oracle.com Wed Oct 25 06:57:14 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 25 Oct 2017 02:57:14 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <2d9dd746-63e1-cade-28f9-5ca1ae1c253e@oracle.com> <200F07CB-35DA-492B-B78D-9EC033EE0431@oracle.com> <833ba1a5-49fc-bb24-ff99-994011af52aa@oracle.com> Message-ID: > On Oct 24, 2017, at 10:11 AM, Bob Vandette wrote: > > >> On Oct 23, 2017, at 12:52 AM, Kim Barrett wrote: >> >>> On Sep 27, 2017, at 9:20 PM, David Holmes wrote: >>>>> 62 void set_subsystem_path(char *cgroup_path) { >>>>> >>>>> If this takes a "const char*" will it save you from casting string literals to "char*" elsewhere? >>>> I tried several different ways of declaring the container accessor functions and >>>> always ended up with warnings due to scanf not being able to validate arguments >>>> since the format string didn?t end up being a string literal. I originally was using templates >>>> and then ended up with the macros. I tried several different casts but could resolve the problem. >>> >>> Sounds like something Kim Barrett should take a look at :) >> >> Fortunately, I just happened by. >> >> The warnings are because we compile with -Wformat=2, which enables >> -Wformat-nonliteral (among other things). >> >> Use PRAGMA_FORMAT_NONLITERAL_IGNORED, e.g. >> >> PRAGMA_DIAG_PUSH >> PRAGMA_FORMAT_NONLITERAL_IGNORED >> >> PRAGMA_DIAG_POP >> >> That will silence warnings about sscanf (or anything else!) with a >> non-literal format string within that . > > Thanks but I ended up taking a different approach that resulted in more compact code. 
> > http://cr.openjdk.java.net/~bobv/8146115/webrev.02 Not a review, just a few more comments in passing. ------------------------------------------------------------------------------ src/hotspot/os/linux/osContainer_linux.cpp 150 log_debug(os, container)("Type %s not found in file %s\n", \ 151 scan_fmt , buf); \ uses buf as path, but buf has been clobbered to contain contents from file. Similarly for 155 log_debug(os, container)("Empty file %s\n", buf); \ ------------------------------------------------------------------------------ src/hotspot/os/linux/osContainer_linux.cpp 158 log_debug(os, container)("file not found %s\n", buf); \ There are many reasons why fopen might fail, and merging them all into a "file not found" message could be quite confusing. It would be much better to report the error from errno. ------------------------------------------------------------------------------ src/hotspot/os/linux/osContainer_linux.cpp Something like the following (where the obvious helpers are made up to keep this short) would eliminate the macrology. PRAGMA_DIAG_PUSH PRAGMA_FORMAT_NONLITERAL_IGNORED template int get_subsystem_file_contents_value(CgroupSubsystem* c, const char* filename, T* returnval, const char* scan_fmt, const char* description) { const char* line = get_subsystem_file_line(c, filename); if (line != NULL) { if (sscanf(line, scan_fmt, returnval) == 1) { return 0; } else { report_subsystem_file_contents_parse_error(description, c, filename); } } return OSCONTAINER_ERROR; } PRAGMA_DIAG_POP int subsystem_file_contents_int(CgroupSubsystem* c, const char* filename, int* returnval) { return get_subsystem_file_contents_value(c, filename, returnval, "%d", "int"); } ------------------------------------------------------------------------------ From aph at redhat.com Wed Oct 25 10:32:33 2017 From: aph at redhat.com (Andrew Haley) Date: Wed, 25 Oct 2017 11:32:33 +0100 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> Message-ID: Do we know hat this is always correct for C2? Could we not have something like ldr r0, [rthread, #polling_page_offset] loop: ldr rscratch, [r0] {poll} cmp foo, bar bne loop when C2 hoists the load of the polling page address out of a loop? Or is such hoisting disable for this case? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From robbin.ehn at oracle.com Wed Oct 25 11:36:33 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 25 Oct 2017 13:36:33 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> Message-ID: Hi Andrew, The address of the polling page address is static per thread. The load of the polling page address is a dependent load. If the add of the offset to rthread is done outside loop, that is perfectly fine. I do not see an issue here. If I did not understand you correctly, please let me know. Thanks, Robbin On 2017-10-25 12:32, Andrew Haley wrote: > Do we know hat this is always correct for C2? Could we not have > something like > > ldr r0, [rthread, #polling_page_offset] > > loop: > ldr rscratch, [r0] {poll} > cmp foo, bar > bne loop > > when C2 hoists the load of the polling page address out of a loop? > > Or is such hoisting disable for this case? 
> From erik.osterlund at oracle.com Wed Oct 25 11:45:46 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Wed, 25 Oct 2017 13:45:46 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> Message-ID: <59F0796A.9060408@oracle.com> Hi Robbin and Andrew, @Robbin: I think Andrew is concerned about a poll inside of the loop always being on if the initial load on rthread points to the trapping page and was loaded into a register (before a loop) that is not changed inside of the loop, and as a consequence gets stuck in trapping all the time for every poll in the loop. By making the initial load from rthread of "raw" pointer type, this load will (as far as I know) not be moved outside of the loop. If it ever was, it would be a bug. Thanks, /Erik On 2017-10-25 13:36, Robbin Ehn wrote: > Hi Andrew, > > The address of the polling page address is static per thread. > The load of the polling page address is a dependent load. > > If the add of the offset to rthread is done outside loop, that is > perfectly fine. I do not see an issue here. If I did not understand > you correctly, please let me know. > > Thanks, Robbin > > On 2017-10-25 12:32, Andrew Haley wrote: >> Do we know hat this is always correct for C2? Could we not have >> something like >> >> ldr r0, [rthread, #polling_page_offset] >> >> loop: >> ldr rscratch, [r0] {poll} >> cmp foo, bar >> bne loop >> >> when C2 hoists the load of the polling page address out of a loop? >> >> Or is such hoisting disable for this case? >> From aph at redhat.com Wed Oct 25 11:52:03 2017 From: aph at redhat.com (Andrew Haley) Date: Wed, 25 Oct 2017 12:52:03 +0100 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <59F0796A.9060408@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <59F0796A.9060408@oracle.com> Message-ID: On 25/10/17 12:45, Erik ?sterlund wrote: > By making the initial load from rthread of "raw" pointer type, this load > will (as far as I know) not be moved outside of the loop. If it ever > was, it would be a bug. OK, that's the answer to my question. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From robbin.ehn at oracle.com Wed Oct 25 12:28:50 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 25 Oct 2017 14:28:50 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <59F0796A.9060408@oracle.com> Message-ID: Thanks Erik for understanding and answering! On 2017-10-25 13:52, Andrew Haley wrote: > On 25/10/17 12:45, Erik ?sterlund wrote: >> By making the initial load from rthread of "raw" pointer type, this load >> will (as far as I know) not be moved outside of the loop. If it ever >> was, it would be a bug. > > OK, that's the answer to my question. Great! 
/Robbin > From robbin.ehn at oracle.com Wed Oct 25 12:53:38 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 25 Oct 2017 14:53:38 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <0b34f052cc7047cfb40dc44e91c1300d@sap.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <0b34f052cc7047cfb40dc44e91c1300d@sap.com> Message-ID: Hi Martin, On 2017-10-24 19:08, Doerr, Martin wrote: > Hi Robbin, > > sounds good. Thanks a lot for doing it. > The change looks good to me except that I'd expect a poll for wide_ret, too. Yes, incremental: http://cr.openjdk.java.net/~rehn/8185640/v6/Interpreter-Poll-Wide_Ret-8/webrev/index.html Sanity tested, running big test job now. Thanks! /Robbin > > Best regards, > Martin > > > -----Original Message----- > From: Robbin Ehn [mailto:robbin.ehn at oracle.com] > Sent: Dienstag, 24. Oktober 2017 16:54 > To: Karen Kinnear ; Doerr, Martin > Cc: hotspot-dev developers ; Coleen Phillimore (coleen.phillimore at oracle.com) > Subject: Re: RFR(XL): 8185640: Thread-local handshakes > > Hi, > > I did a fix for the interpreter performance regression, it's plain and simple, I > kept the polling code inside dispatch_base but made it optional as the verify oop. > > Incremental: > http://cr.openjdk.java.net/~rehn/8185640/v5/Interpreter-Poll-7/webrev/index.html > > Manual tested with jstack and it passes: hotspot_tier1, hotspot_handshake > > It reduces the polling cost of 80%, sensitive benchmarks shows -0.44% regression > vs TLH off. More insensitive benchmark show no regression. > > Thanks, Robbin > > On 2017-10-23 17:58, Karen Kinnear wrote: >> Works for me >> >> Thanks, >> Karen >> >>> On Oct 23, 2017, at 8:40 AM, Doerr, Martin wrote: >>> >>> Hi Coleen and Robbin, >>> >>> I'm ok with putting it into a separate RFE. I understand that there are more fun activities than rebasing this XL change for a long time :-) >>> So you don't need to delay it. It's acceptable for me. >>> >>> Thanks, Coleen, for sharing your proposal. I appreciate it. >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>> Sent: Montag, 23. Oktober 2017 17:26 >>> To: Doerr, Martin ; hotspot-dev developers >>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>> >>> Hi Martin, >>> >>>> On 2017-10-18 16:05, Doerr, Martin wrote: >>>> Hi Robbin, >>>> >>>> thanks for the quick reply and for doing additional benchmarks. >>>> Please note that t->does_dispatch() was just a first idea, but doesn't really fit for the purpose because it's false for conditional branch bytecodes for example. I just didn't find an appropriate quick check in the existing code. >>>> I guess you will notice a performance impact when benchmarking with -Xint. (I don't know if Oracle usually runs startup performance benchmarks.) >>> >>> Yes, we are seeing a performance regression, 2.5%-6% depending on benchmark. >>> We are committed to fix this, but it might come as separate RFE/bug depending on >>> the JEP's timeline. >>> >>> (If the fix, very unlikely, would not be done before next release, we would >>> change the default to off) >>> >>> I hope this is an acceptable path? 
>>> >>> Thanks, Robbin >>> >>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>>> Sent: Mittwoch, 18. Oktober 2017 15:58 >>>> To: Doerr, Martin ; hotspot-dev developers >>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>>> >>>> Hi Martin, >>>> >>>>> On 2017-10-18 12:11, Doerr, Martin wrote: >>>>> Hi Robbin, >>>>> >>>>> so you would like to push your version first (as it does not break other platforms) and then help us to push non-Oracle platform implementations which change shared code again? >>>>> I'd be fine with that, too. >>>> >>>> Yes, great! >>>> >>>>> >>>>> While thinking a little longer about the interpreter implementation, a new idea came into my mind. >>>>> I think we could significantly reduce impact on interpreter code size and performance by using safepoint polls only in a subset of bytecodes. E.g., we could use only bytecodes which perform any kind of jump by implementing something like >>>>> if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch()) generate_safepoint_poll(); >>>>> in TemplateInterpreterGenerator::generate_and_dispatch. >>>> >>>> We have not seen any performance regression in simple benchmark with this. >>>> I will do a better benchmark and compare what difference it makes. >>>> >>>> Thanks, Robbin >>>> >>>>> >>>>> Best regards, >>>>> Martin >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>>>> Sent: Mittwoch, 18. Oktober 2017 11:07 >>>>> To: Doerr, Martin ; hotspot-dev developers >>>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>>>> >>>>> Thanks for looking at this. >>>>> >>>>>> On 2017-10-17 19:58, Doerr, Martin wrote: >>>>>> Hi Robbin, >>>>>> >>>>>> my first impression is very good. Thanks for providing the webrev. >>>>> >>>>> Great! >>>>> >>>>>> >>>>>> I only don't like that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one or the other mechanism. >>>>>> Would it be ok to move the decision between what to use to platform code? >>>>>> (Some platforms could still use both if this is beneficial.) >>>>>> >>>>>> E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. Would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion. >>>>> >>>>> I see no issue with this. >>>>> Maybe SafepointMechanism::local_poll_armed should be possibly platform specific. >>>>> Can we do this incremental when adding the platform support for PPC64? >>>>> >>>>> Thanks, Robbin >>>>> >>>>>> >>>>>> Best regards, >>>>>> Martin >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Robbin Ehn >>>>>> Sent: Mittwoch, 11. Oktober 2017 15:38 >>>>>> To: hotspot-dev developers >>>>>> Subject: RFR(XL): 8185640: Thread-local handshakes >>>>>> >>>>>> Hi all, >>>>>> >>>>>> Starting the review of the code while JEP work is still not completed. >>>>>> >>>>>> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >>>>>> >>>>>> This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not >>>>>> just all threads or none. 
>>>>>> >>>>>> Entire changeset: >>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >>>>>> >>>>>> Divided into 3-parts, >>>>>> SafepointMechanism abstraction: >>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >>>>>> Consolidating polling page allocation: >>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >>>>>> Handshakes: >>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >>>>>> >>>>>> A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread >>>>>> itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be >>>>>> performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a >>>>>> handshake can be performed with that single JavaThread as well. >>>>>> >>>>>> The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the >>>>>> guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. >>>>>> >>>>>> Example of potential use-cases: >>>>>> -Biased lock revocation >>>>>> -External requests for stack traces >>>>>> -Deoptimization >>>>>> -Async exception delivery >>>>>> -External suspension >>>>>> -Eliding memory barriers >>>>>> >>>>>> All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. >>>>>> Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported >>>>>> platforms are Linux x64 and Solaris SPARC. >>>>>> >>>>>> Tested heavily with various test suits and comes with a few new tests. >>>>>> >>>>>> Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically >>>>>> ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. >>>>>> The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on >>>>>> JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all >>>>>> JavaThreads in an array instead of a linked list. >>>>>> >>>>>> Thanks, Robbin >>>>>> >> From martin.doerr at sap.com Wed Oct 25 13:14:32 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 25 Oct 2017 13:14:32 +0000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <0b34f052cc7047cfb40dc44e91c1300d@sap.com> Message-ID: <1b4cf2fdb4864377b41dc56016af819f@sap.com> Hi Robbin, thanks a lot. Looks good. 
Best regards, Martin -----Original Message----- From: Robbin Ehn [mailto:robbin.ehn at oracle.com] Sent: Mittwoch, 25. Oktober 2017 14:54 To: Doerr, Martin ; Karen Kinnear Cc: hotspot-dev developers ; Coleen Phillimore (coleen.phillimore at oracle.com) Subject: Re: RFR(XL): 8185640: Thread-local handshakes Hi Martin, On 2017-10-24 19:08, Doerr, Martin wrote: > Hi Robbin, > > sounds good. Thanks a lot for doing it. > The change looks good to me except that I'd expect a poll for wide_ret, too. Yes, incremental: http://cr.openjdk.java.net/~rehn/8185640/v6/Interpreter-Poll-Wide_Ret-8/webrev/index.html Sanity tested, running big test job now. Thanks! /Robbin > > Best regards, > Martin > > > -----Original Message----- > From: Robbin Ehn [mailto:robbin.ehn at oracle.com] > Sent: Dienstag, 24. Oktober 2017 16:54 > To: Karen Kinnear ; Doerr, Martin > Cc: hotspot-dev developers ; Coleen Phillimore (coleen.phillimore at oracle.com) > Subject: Re: RFR(XL): 8185640: Thread-local handshakes > > Hi, > > I did a fix for the interpreter performance regression, it's plain and simple, I > kept the polling code inside dispatch_base but made it optional as the verify oop. > > Incremental: > http://cr.openjdk.java.net/~rehn/8185640/v5/Interpreter-Poll-7/webrev/index.html > > Manual tested with jstack and it passes: hotspot_tier1, hotspot_handshake > > It reduces the polling cost of 80%, sensitive benchmarks shows -0.44% regression > vs TLH off. More insensitive benchmark show no regression. > > Thanks, Robbin > > On 2017-10-23 17:58, Karen Kinnear wrote: >> Works for me >> >> Thanks, >> Karen >> >>> On Oct 23, 2017, at 8:40 AM, Doerr, Martin wrote: >>> >>> Hi Coleen and Robbin, >>> >>> I'm ok with putting it into a separate RFE. I understand that there are more fun activities than rebasing this XL change for a long time :-) >>> So you don't need to delay it. It's acceptable for me. >>> >>> Thanks, Coleen, for sharing your proposal. I appreciate it. >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>> Sent: Montag, 23. Oktober 2017 17:26 >>> To: Doerr, Martin ; hotspot-dev developers >>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>> >>> Hi Martin, >>> >>>> On 2017-10-18 16:05, Doerr, Martin wrote: >>>> Hi Robbin, >>>> >>>> thanks for the quick reply and for doing additional benchmarks. >>>> Please note that t->does_dispatch() was just a first idea, but doesn't really fit for the purpose because it's false for conditional branch bytecodes for example. I just didn't find an appropriate quick check in the existing code. >>>> I guess you will notice a performance impact when benchmarking with -Xint. (I don't know if Oracle usually runs startup performance benchmarks.) >>> >>> Yes, we are seeing a performance regression, 2.5%-6% depending on benchmark. >>> We are committed to fix this, but it might come as separate RFE/bug depending on >>> the JEP's timeline. >>> >>> (If the fix, very unlikely, would not be done before next release, we would >>> change the default to off) >>> >>> I hope this is an acceptable path? >>> >>> Thanks, Robbin >>> >>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>>> Sent: Mittwoch, 18. 
Oktober 2017 15:58 >>>> To: Doerr, Martin ; hotspot-dev developers >>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>>> >>>> Hi Martin, >>>> >>>>> On 2017-10-18 12:11, Doerr, Martin wrote: >>>>> Hi Robbin, >>>>> >>>>> so you would like to push your version first (as it does not break other platforms) and then help us to push non-Oracle platform implementations which change shared code again? >>>>> I'd be fine with that, too. >>>> >>>> Yes, great! >>>> >>>>> >>>>> While thinking a little longer about the interpreter implementation, a new idea came into my mind. >>>>> I think we could significantly reduce impact on interpreter code size and performance by using safepoint polls only in a subset of bytecodes. E.g., we could use only bytecodes which perform any kind of jump by implementing something like >>>>> if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch()) generate_safepoint_poll(); >>>>> in TemplateInterpreterGenerator::generate_and_dispatch. >>>> >>>> We have not seen any performance regression in simple benchmark with this. >>>> I will do a better benchmark and compare what difference it makes. >>>> >>>> Thanks, Robbin >>>> >>>>> >>>>> Best regards, >>>>> Martin >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>>>> Sent: Mittwoch, 18. Oktober 2017 11:07 >>>>> To: Doerr, Martin ; hotspot-dev developers >>>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>>>> >>>>> Thanks for looking at this. >>>>> >>>>>> On 2017-10-17 19:58, Doerr, Martin wrote: >>>>>> Hi Robbin, >>>>>> >>>>>> my first impression is very good. Thanks for providing the webrev. >>>>> >>>>> Great! >>>>> >>>>>> >>>>>> I only don't like that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one or the other mechanism. >>>>>> Would it be ok to move the decision between what to use to platform code? >>>>>> (Some platforms could still use both if this is beneficial.) >>>>>> >>>>>> E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. Would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion. >>>>> >>>>> I see no issue with this. >>>>> Maybe SafepointMechanism::local_poll_armed should be possibly platform specific. >>>>> Can we do this incremental when adding the platform support for PPC64? >>>>> >>>>> Thanks, Robbin >>>>> >>>>>> >>>>>> Best regards, >>>>>> Martin >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Robbin Ehn >>>>>> Sent: Mittwoch, 11. Oktober 2017 15:38 >>>>>> To: hotspot-dev developers >>>>>> Subject: RFR(XL): 8185640: Thread-local handshakes >>>>>> >>>>>> Hi all, >>>>>> >>>>>> Starting the review of the code while JEP work is still not completed. >>>>>> >>>>>> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >>>>>> >>>>>> This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not >>>>>> just all threads or none. 
>>>>>> >>>>>> Entire changeset: >>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >>>>>> >>>>>> Divided into 3-parts, >>>>>> SafepointMechanism abstraction: >>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >>>>>> Consolidating polling page allocation: >>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >>>>>> Handshakes: >>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >>>>>> >>>>>> A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint safe state. The callback is executed either by the thread >>>>>> itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per thread operation will be >>>>>> performed on all threads as soon as possible and they will continue to execute as soon as it?s own operation is completed. If a JavaThread is known to be running, then a >>>>>> handshake can be performed with that single JavaThread as well. >>>>>> >>>>>> The current safepointing scheme is modified to perform an indirection through a per-thread pointer which will allow a single thread's execution to be forced to trap on the >>>>>> guard page. In order to force a thread to yield the VM updates the per-thread pointer for the corresponding thread to point to the guarded page. >>>>>> >>>>>> Example of potential use-cases: >>>>>> -Biased lock revocation >>>>>> -External requests for stack traces >>>>>> -Deoptimization >>>>>> -Async exception delivery >>>>>> -External suspension >>>>>> -Eliding memory barriers >>>>>> >>>>>> All of these will benefit the VM moving towards becoming more low-latency friendly by reducing the number of global safepoints. >>>>>> Platforms that do not yet implement the per JavaThread poll, a fallback to normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported >>>>>> platforms are Linux x64 and Solaris SPARC. >>>>>> >>>>>> Tested heavily with various test suits and comes with a few new tests. >>>>>> >>>>>> Performance testing using standardized benchmark show no signification changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC (not statistically >>>>>> ensured). A minor regression for the load vs load load on x64 is expected and a slight increase on SPARC due to the cost of ?materializing? the page vs load load. >>>>>> The time to trigger a safepoint was measured on a large machine to not be an issue. The looping over threads and arming the polling page will benefit from the work on >>>>>> JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all >>>>>> JavaThreads in an array instead of a linked list. 
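The per-thread indirection described above can be pictured with a small sketch. All names and helpers below are illustrative assumptions, not the actual SafepointMechanism code from the webrev:

// Illustrative sketch of the per-thread poll; names and helpers are assumed.
static uintptr_t* guarded_page();   // protected page: any load through it traps
static uintptr_t* readable_page();  // ordinary page: loads are harmless

class PerThreadPoll {
  // The per-JavaThread pointer that generated code polls through.
  volatile uintptr_t* _polling_page;
 public:
  void arm()    { _polling_page = guarded_page();  }  // VM forces this thread to yield
  void disarm() { _polling_page = readable_page(); }  // poll becomes a harmless load
  // What an emitted poll conceptually does: a dependent load through the
  // per-thread pointer, so only armed threads take the trap.
  void poll() const { (void)*_polling_page; }
};

Arming is just a store of the guarded address into one thread's pointer, which is why a single thread can be stopped without disturbing the others.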
>>>>>> >>>>>> Thanks, Robbin >>>>>> >> From coleen.phillimore at oracle.com Wed Oct 25 13:19:30 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 25 Oct 2017 09:19:30 -0400 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> Message-ID: <3b7f8cb6-527d-b5f6-25d4-16bbae9d7ae2@oracle.com> Hi Robbin, This change (with the addition of the poll at wide_ret) looks good. It came out nicely in the code. thanks, Coleen On 10/24/17 10:54 AM, Robbin Ehn wrote: > Hi, > > I did a fix for the interpreter performance regression, it's plain and > simple, I kept the polling code inside dispatch_base but made it > optional as the verify oop. > > Incremental: > http://cr.openjdk.java.net/~rehn/8185640/v5/Interpreter-Poll-7/webrev/index.html > > > Manual tested with jstack and it passes: hotspot_tier1, hotspot_handshake > > It reduces the polling cost of 80%, sensitive benchmarks shows -0.44% > regression vs TLH off. More insensitive benchmark show no regression. > > Thanks, Robbin > > On 2017-10-23 17:58, Karen Kinnear wrote: >> Works for me >> >> Thanks, >> Karen >> >>> On Oct 23, 2017, at 8:40 AM, Doerr, Martin >>> wrote: >>> >>> Hi Coleen and Robbin, >>> >>> I'm ok with putting it into a separate RFE. I understand that there >>> are more fun activities than rebasing this XL change for a long time >>> :-) >>> So you don't need to delay it. It's acceptable for me. >>> >>> Thanks, Coleen, for sharing your proposal. I appreciate it. >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>> Sent: Montag, 23. Oktober 2017 17:26 >>> To: Doerr, Martin ; hotspot-dev developers >>> >>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>> >>> Hi Martin, >>> >>>> On 2017-10-18 16:05, Doerr, Martin wrote: >>>> Hi Robbin, >>>> >>>> thanks for the quick reply and for doing additional benchmarks. >>>> Please note that t->does_dispatch() was just a first idea, but >>>> doesn't really fit for the purpose because it's false for >>>> conditional branch bytecodes for example. I just didn't find an >>>> appropriate quick check in the existing code. >>>> I guess you will notice a performance impact when benchmarking with >>>> -Xint. (I don't know if Oracle usually runs startup performance >>>> benchmarks.) >>> >>> Yes, we are seeing a performance regression, 2.5%-6% depending on >>> benchmark. >>> We are committed to fix this, but it might come as separate RFE/bug >>> depending on >>> the JEP's timeline. >>> >>> (If the fix, very unlikely, would not be done before next release, >>> we would >>> change the default to off) >>> >>> I hope this is an acceptable path? >>> >>> Thanks, Robbin >>> >>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>>> Sent: Mittwoch, 18. 
Oktober 2017 15:58 >>>> To: Doerr, Martin ; hotspot-dev developers >>>> >>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>>> >>>> Hi Martin, >>>> >>>>> On 2017-10-18 12:11, Doerr, Martin wrote: >>>>> Hi Robbin, >>>>> >>>>> so you would like to push your version first (as it does not break >>>>> other platforms) and then help us to push non-Oracle platform >>>>> implementations which change shared code again? >>>>> I'd be fine with that, too. >>>> >>>> Yes, great! >>>> >>>>> >>>>> While thinking a little longer about the interpreter >>>>> implementation, a new idea came into my mind. >>>>> I think we could significantly reduce impact on interpreter code >>>>> size and performance by using safepoint polls only in a subset of >>>>> bytecodes. E.g., we could use only bytecodes which perform any >>>>> kind of jump by implementing something like >>>>> if (SafepointMechanism::uses_thread_local_poll() && >>>>> t->does_dispatch()) generate_safepoint_poll(); >>>>> in TemplateInterpreterGenerator::generate_and_dispatch. >>>> >>>> We have not seen any performance regression in simple benchmark >>>> with this. >>>> I will do a better benchmark and compare what difference it makes. >>>> >>>> Thanks, Robbin >>>> >>>>> >>>>> Best regards, >>>>> Martin >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>>>> Sent: Mittwoch, 18. Oktober 2017 11:07 >>>>> To: Doerr, Martin ; hotspot-dev developers >>>>> >>>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>>>> >>>>> Thanks for looking at this. >>>>> >>>>>> On 2017-10-17 19:58, Doerr, Martin wrote: >>>>>> Hi Robbin, >>>>>> >>>>>> my first impression is very good. Thanks for providing the webrev. >>>>> >>>>> Great! >>>>> >>>>>> >>>>>> I only don't like that "poll_page_val | poll_bit()" is used in >>>>>> shared code. I'd prefer to use either one or the other mechanism. >>>>>> Would it be ok to move the decision between what to use to >>>>>> platform code? >>>>>> (Some platforms could still use both if this is beneficial.) >>>>>> >>>>>> E.g. on PPC64, we'd like to use conditional trap instructions >>>>>> with special bit patterns if UseSIGTRAP is on. Would be excellent >>>>>> if we could implement set functions for _poll_armed_value and >>>>>> _poll_disarmed_value in platform code. poll_bit() also fits >>>>>> better into platform code in my opinion. >>>>> >>>>> I see no issue with this. >>>>> Maybe SafepointMechanism::local_poll_armed should be possibly >>>>> platform specific. >>>>> Can we do this incremental when adding the platform support for >>>>> PPC64? >>>>> >>>>> Thanks, Robbin >>>>> >>>>>> >>>>>> Best regards, >>>>>> Martin >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] >>>>>> On Behalf Of Robbin Ehn >>>>>> Sent: Mittwoch, 11. Oktober 2017 15:38 >>>>>> To: hotspot-dev developers >>>>>> Subject: RFR(XL): 8185640: Thread-local handshakes >>>>>> >>>>>> Hi all, >>>>>> >>>>>> Starting the review of the code while JEP work is still not >>>>>> completed. >>>>>> >>>>>> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >>>>>> >>>>>> This JEP introduces a way to execute a callback on threads >>>>>> without performing a global VM safepoint. It makes it both >>>>>> possible and cheap to stop individual threads and not >>>>>> just all threads or none. 
>>>>>> >>>>>> Entire changeset: >>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >>>>>> >>>>>> Divided into 3-parts, >>>>>> SafepointMechanism abstraction: >>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >>>>>> Consolidating polling page allocation: >>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >>>>>> Handshakes: >>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >>>>>> >>>>>> A handshake operation is a callback that is executed for each >>>>>> JavaThread while that thread is in a safepoint safe state. The >>>>>> callback is executed either by the thread >>>>>> itself or by the VM thread while keeping the thread in a blocked >>>>>> state. The big difference between safepointing and handshaking is >>>>>> that the per thread operation will be >>>>>> performed on all threads as soon as possible and they will >>>>>> continue to execute as soon as it?s own operation is completed. >>>>>> If a JavaThread is known to be running, then a >>>>>> handshake can be performed with that single JavaThread as well. >>>>>> >>>>>> The current safepointing scheme is modified to perform an >>>>>> indirection through a per-thread pointer which will allow a >>>>>> single thread's execution to be forced to trap on the >>>>>> guard page. In order to force a thread to yield the VM updates >>>>>> the per-thread pointer for the corresponding thread to point to >>>>>> the guarded page. >>>>>> >>>>>> Example of potential use-cases: >>>>>> -Biased lock revocation >>>>>> -External requests for stack traces >>>>>> -Deoptimization >>>>>> -Async exception delivery >>>>>> -External suspension >>>>>> -Eliding memory barriers >>>>>> >>>>>> All of these will benefit the VM moving towards becoming more >>>>>> low-latency friendly by reducing the number of global safepoints. >>>>>> Platforms that do not yet implement the per JavaThread poll, a >>>>>> fallback to normal safepoint is in place. HandshakeOneThread will >>>>>> then be a normal safepoint. The supported >>>>>> platforms are Linux x64 and Solaris SPARC. >>>>>> >>>>>> Tested heavily with various test suits and comes with a few new >>>>>> tests. >>>>>> >>>>>> Performance testing using standardized benchmark show no >>>>>> signification changes, the latest number was -0.7% on Linux x64 >>>>>> and +1.5% Solaris SPARC (not statistically >>>>>> ensured). A minor regression for the load vs load load on x64 is >>>>>> expected and a slight increase on SPARC due to the cost of >>>>>> ?materializing? the page vs load load. >>>>>> The time to trigger a safepoint was measured on a large machine >>>>>> to not be an issue. The looping over threads and arming the >>>>>> polling page will benefit from the work on >>>>>> JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: >>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) >>>>>> which puts all >>>>>> JavaThreads in an array instead of a linked list. 
>>>>>> >>>>>> Thanks, Robbin >>>>>> >> From robbin.ehn at oracle.com Wed Oct 25 13:35:55 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 25 Oct 2017 15:35:55 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <3b7f8cb6-527d-b5f6-25d4-16bbae9d7ae2@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <3b7f8cb6-527d-b5f6-25d4-16bbae9d7ae2@oracle.com> Message-ID: <03a50ba7-7afd-aca5-d59c-0d7472b513c8@oracle.com> Thanks Coleen, Robbin On 2017-10-25 15:19, coleen.phillimore at oracle.com wrote: > > Hi Robbin, > This change (with the addition of the poll at wide_ret) looks good. It came out > nicely in the code. > thanks, > Coleen > > On 10/24/17 10:54 AM, Robbin Ehn wrote: >> Hi, >> >> I did a fix for the interpreter performance regression, it's plain and simple, >> I kept the polling code inside dispatch_base but made it optional as the >> verify oop. >> >> Incremental: >> http://cr.openjdk.java.net/~rehn/8185640/v5/Interpreter-Poll-7/webrev/index.html >> >> Manual tested with jstack and it passes: hotspot_tier1, hotspot_handshake >> >> It reduces the polling cost of 80%, sensitive benchmarks shows -0.44% >> regression vs TLH off. More insensitive benchmark show no regression. >> >> Thanks, Robbin >> >> On 2017-10-23 17:58, Karen Kinnear wrote: >>> Works for me >>> >>> Thanks, >>> Karen >>> >>>> On Oct 23, 2017, at 8:40 AM, Doerr, Martin wrote: >>>> >>>> Hi Coleen and Robbin, >>>> >>>> I'm ok with putting it into a separate RFE. I understand that there are more >>>> fun activities than rebasing this XL change for a long time :-) >>>> So you don't need to delay it. It's acceptable for me. >>>> >>>> Thanks, Coleen, for sharing your proposal. I appreciate it. >>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>>> Sent: Montag, 23. Oktober 2017 17:26 >>>> To: Doerr, Martin ; hotspot-dev developers >>>> >>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>>> >>>> Hi Martin, >>>> >>>>> On 2017-10-18 16:05, Doerr, Martin wrote: >>>>> Hi Robbin, >>>>> >>>>> thanks for the quick reply and for doing additional benchmarks. >>>>> Please note that t->does_dispatch() was just a first idea, but doesn't >>>>> really fit for the purpose because it's false for conditional branch >>>>> bytecodes for example. I just didn't find an appropriate quick check in the >>>>> existing code. >>>>> I guess you will notice a performance impact when benchmarking with -Xint. >>>>> (I don't know if Oracle usually runs startup performance benchmarks.) >>>> >>>> Yes, we are seeing a performance regression, 2.5%-6% depending on benchmark. >>>> We are committed to fix this, but it might come as separate RFE/bug >>>> depending on >>>> the JEP's timeline. >>>> >>>> (If the fix, very unlikely, would not be done before next release, we would >>>> change the default to off) >>>> >>>> I hope this is an acceptable path? >>>> >>>> Thanks, Robbin >>>> >>>>> >>>>> Best regards, >>>>> Martin >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>>>> Sent: Mittwoch, 18. 
Oktober 2017 15:58 >>>>> To: Doerr, Martin ; hotspot-dev developers >>>>> >>>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>>>> >>>>> Hi Martin, >>>>> >>>>>> On 2017-10-18 12:11, Doerr, Martin wrote: >>>>>> Hi Robbin, >>>>>> >>>>>> so you would like to push your version first (as it does not break other >>>>>> platforms) and then help us to push non-Oracle platform implementations >>>>>> which change shared code again? >>>>>> I'd be fine with that, too. >>>>> >>>>> Yes, great! >>>>> >>>>>> >>>>>> While thinking a little longer about the interpreter implementation, a new >>>>>> idea came into my mind. >>>>>> I think we could significantly reduce impact on interpreter code size and >>>>>> performance by using safepoint polls only in a subset of bytecodes. E.g., >>>>>> we could use only bytecodes which perform any kind of jump by implementing >>>>>> something like >>>>>> if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch()) >>>>>> generate_safepoint_poll(); >>>>>> in TemplateInterpreterGenerator::generate_and_dispatch. >>>>> >>>>> We have not seen any performance regression in simple benchmark with this. >>>>> I will do a better benchmark and compare what difference it makes. >>>>> >>>>> Thanks, Robbin >>>>> >>>>>> >>>>>> Best regards, >>>>>> Martin >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>>>>> Sent: Mittwoch, 18. Oktober 2017 11:07 >>>>>> To: Doerr, Martin ; hotspot-dev developers >>>>>> >>>>>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>>>>> >>>>>> Thanks for looking at this. >>>>>> >>>>>>> On 2017-10-17 19:58, Doerr, Martin wrote: >>>>>>> Hi Robbin, >>>>>>> >>>>>>> my first impression is very good. Thanks for providing the webrev. >>>>>> >>>>>> Great! >>>>>> >>>>>>> >>>>>>> I only don't like that "poll_page_val | poll_bit()" is used in shared >>>>>>> code. I'd prefer to use either one or the other mechanism. >>>>>>> Would it be ok to move the decision between what to use to platform code? >>>>>>> (Some platforms could still use both if this is beneficial.) >>>>>>> >>>>>>> E.g. on PPC64, we'd like to use conditional trap instructions with >>>>>>> special bit patterns if UseSIGTRAP is on. Would be excellent if we could >>>>>>> implement set functions for _poll_armed_value and _poll_disarmed_value in >>>>>>> platform code. poll_bit() also fits better into platform code in my opinion. >>>>>> >>>>>> I see no issue with this. >>>>>> Maybe SafepointMechanism::local_poll_armed should be possibly platform >>>>>> specific. >>>>>> Can we do this incremental when adding the platform support for PPC64? >>>>>> >>>>>> Thanks, Robbin >>>>>> >>>>>>> >>>>>>> Best regards, >>>>>>> Martin >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf >>>>>>> Of Robbin Ehn >>>>>>> Sent: Mittwoch, 11. Oktober 2017 15:38 >>>>>>> To: hotspot-dev developers >>>>>>> Subject: RFR(XL): 8185640: Thread-local handshakes >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> Starting the review of the code while JEP work is still not completed. >>>>>>> >>>>>>> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640 >>>>>>> >>>>>>> This JEP introduces a way to execute a callback on threads without >>>>>>> performing a global VM safepoint. It makes it both possible and cheap to >>>>>>> stop individual threads and not >>>>>>> just all threads or none. 
>>>>>>> >>>>>>> Entire changeset: >>>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/ >>>>>>> >>>>>>> Divided into 3-parts, >>>>>>> SafepointMechanism abstraction: >>>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/ >>>>>>> Consolidating polling page allocation: >>>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/ >>>>>>> Handshakes: >>>>>>> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/ >>>>>>> >>>>>>> A handshake operation is a callback that is executed for each JavaThread >>>>>>> while that thread is in a safepoint safe state. The callback is executed >>>>>>> either by the thread >>>>>>> itself or by the VM thread while keeping the thread in a blocked state. >>>>>>> The big difference between safepointing and handshaking is that the per >>>>>>> thread operation will be >>>>>>> performed on all threads as soon as possible and they will continue to >>>>>>> execute as soon as it?s own operation is completed. If a JavaThread is >>>>>>> known to be running, then a >>>>>>> handshake can be performed with that single JavaThread as well. >>>>>>> >>>>>>> The current safepointing scheme is modified to perform an indirection >>>>>>> through a per-thread pointer which will allow a single thread's execution >>>>>>> to be forced to trap on the >>>>>>> guard page. In order to force a thread to yield the VM updates the >>>>>>> per-thread pointer for the corresponding thread to point to the guarded >>>>>>> page. >>>>>>> >>>>>>> Example of potential use-cases: >>>>>>> -Biased lock revocation >>>>>>> -External requests for stack traces >>>>>>> -Deoptimization >>>>>>> -Async exception delivery >>>>>>> -External suspension >>>>>>> -Eliding memory barriers >>>>>>> >>>>>>> All of these will benefit the VM moving towards becoming more low-latency >>>>>>> friendly by reducing the number of global safepoints. >>>>>>> Platforms that do not yet implement the per JavaThread poll, a fallback >>>>>>> to normal safepoint is in place. HandshakeOneThread will then be a normal >>>>>>> safepoint. The supported >>>>>>> platforms are Linux x64 and Solaris SPARC. >>>>>>> >>>>>>> Tested heavily with various test suits and comes with a few new tests. >>>>>>> >>>>>>> Performance testing using standardized benchmark show no signification >>>>>>> changes, the latest number was -0.7% on Linux x64 and +1.5% Solaris SPARC >>>>>>> (not statistically >>>>>>> ensured). A minor regression for the load vs load load on x64 is expected >>>>>>> and a slight increase on SPARC due to the cost of ?materializing? the >>>>>>> page vs load load. >>>>>>> The time to trigger a safepoint was measured on a large machine to not be >>>>>>> an issue. The looping over threads and arming the polling page will >>>>>>> benefit from the work on >>>>>>> JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: >>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) >>>>>>> which puts all >>>>>>> JavaThreads in an array instead of a linked list. 
>>>>>>> >>>>>>> Thanks, Robbin >>>>>>> >>> > From aph at redhat.com Wed Oct 25 15:16:40 2017 From: aph at redhat.com (Andrew Haley) Date: Wed, 25 Oct 2017 16:16:40 +0100 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> Message-ID: <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> On 24/10/17 15:54, Robbin Ehn wrote: > I did a fix for the interpreter performance regression, it's plain and simple, I > kept the polling code inside dispatch_base but made it optional as the verify oop. > > Incremental: > http://cr.openjdk.java.net/~rehn/8185640/v5/Interpreter-Poll-7/webrev/index.html > > Manual tested with jstack and it passes: hotspot_tier1, hotspot_handshake > > It reduces the polling cost of 80%, sensitive benchmarks shows -0.44% regression > vs TLH off. More insensitive benchmark show no regression. I think it's not quite right: you're missing a check in tableswitch and fast_linearswitch. These can be used to construct loops. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From coleen.phillimore at oracle.com Wed Oct 25 16:49:54 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 25 Oct 2017 12:49:54 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot Message-ID: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> Summary: removed hotspot version of jvm*h and jni*h files Mostly used sed to remove prims/jvm.h and move #include "jvm.h" after precompiled.h, so if you have repetitive stress wrist issues don't click on most of these files. There were more issues to resolve, however.? The JDK windows jni_md.h file defined jint as long and the hotspot windows jni_x86.h as int.? I had to choose the jdk version since it's the public version, so there are changes to the hotspot files for this. Generally I changed the code to use 'int' rather than 'jint' where the surrounding API didn't insist on consistently using java types. We should mostly be using C++ types within hotspot except in interfaces to native/JNI code.? There are a couple of hacks in places where adding multiple jint casts was too painful. Tested with JPRT and tier2-4 (in progress). open webrev at http://cr.openjdk.java.net/~coleenp/8189610.01/webrev bug link https://bugs.openjdk.java.net/browse/JDK-8189610 I have a script to update copyright files on commit. Thanks to Magnus and ErikJ for the makefile changes. 
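The jint mismatch mentioned above is easy to reproduce in isolation (condensed illustration only, not the literal headers):

    // Condensed illustration of the conflict, not the literal headers.  On
    // Windows both types are 32 bits wide, but 'long' and 'int' are distinct
    // C++ types, so overload resolution, templates and format checking treat
    // them differently; that is what forces the (int)/(jint) adjustments in
    // the hotspot sources once the public definition is the only one left.
    #ifdef _MSC_VER
    typedef long jint_as_in_public_jni_md_h;   // JDK windows jni_md.h
    typedef int  jint_as_in_old_jni_x86_h;     // old hotspot jni_x86.h
    static_assert(sizeof(jint_as_in_public_jni_md_h) == sizeof(jint_as_in_old_jni_x86_h),
                  "same width, different type");
    #endif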
Thanks, Coleen From martin.doerr at sap.com Wed Oct 25 19:38:02 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 25 Oct 2017 19:38:02 +0000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> Message-ID: <8d7678bf2281406da43cbe090276b51f@sap.com> Hi Andrew, I think you're right. A Java program could have a goto in one of the cases of any switch which gets optimized out (by javac) replacing the branch target of the case. So I think we need safepoint polls in all switch templates, too. Best regards, Martin -----Original Message----- From: Andrew Haley [mailto:aph at redhat.com] Sent: Mittwoch, 25. Oktober 2017 17:17 To: Robbin Ehn ; Karen Kinnear ; Doerr, Martin Cc: hotspot-dev developers Subject: Re: RFR(XL): 8185640: Thread-local handshakes On 24/10/17 15:54, Robbin Ehn wrote: > I did a fix for the interpreter performance regression, it's plain and simple, I > kept the polling code inside dispatch_base but made it optional as the verify oop. > > Incremental: > http://cr.openjdk.java.net/~rehn/8185640/v5/Interpreter-Poll-7/webrev/index.html > > Manual tested with jstack and it passes: hotspot_tier1, hotspot_handshake > > It reduces the polling cost of 80%, sensitive benchmarks shows -0.44% regression > vs TLH off. More insensitive benchmark show no regression. I think it's not quite right: you're missing a check in tableswitch and fast_linearswitch. These can be used to construct loops. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From robbin.ehn at oracle.com Wed Oct 25 19:52:33 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 25 Oct 2017 21:52:33 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <8d7678bf2281406da43cbe090276b51f@sap.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> Message-ID: <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> Hi, First thanks both for reviewing this! On 2017-10-25 21:38, Doerr, Martin wrote: > Hi Andrew, > > I think you're right. > > A Java program could have a goto in one of the cases of any switch which gets optimized out (by javac) replacing the branch target of the case. > > So I think we need safepoint polls in all switch templates, too. That's lookupswitch and binaryswitch also? Thanks, Robbin > > Best regards, > Martin > > > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Mittwoch, 25. 
Oktober 2017 17:17 > To: Robbin Ehn ; Karen Kinnear ; Doerr, Martin > Cc: hotspot-dev developers > Subject: Re: RFR(XL): 8185640: Thread-local handshakes > > On 24/10/17 15:54, Robbin Ehn wrote: > >> I did a fix for the interpreter performance regression, it's plain and simple, I >> kept the polling code inside dispatch_base but made it optional as the verify oop. >> >> Incremental: >> http://cr.openjdk.java.net/~rehn/8185640/v5/Interpreter-Poll-7/webrev/index.html >> >> Manual tested with jstack and it passes: hotspot_tier1, hotspot_handshake >> >> It reduces the polling cost of 80%, sensitive benchmarks shows -0.44% regression >> vs TLH off. More insensitive benchmark show no regression. > > I think it's not quite right: you're missing a check in tableswitch > and fast_linearswitch. These can be used to construct loops. > From martin.doerr at sap.com Wed Oct 25 20:23:49 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 25 Oct 2017 20:23:49 +0000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> Message-ID: <818e352d5e3a450491cf0c140bf129d6@sap.com> Hi Robbin, as far as I can see tableswitch, fast_linearswitch and fast_binaryswitch should get the poll. lookupswitch gets rewritten to fast_linearswitch or fast_binaryswitch. Sorry that my proposal created extra work for you. Thanks for doing it. Best regards, Martin -----Original Message----- From: Robbin Ehn [mailto:robbin.ehn at oracle.com] Sent: Mittwoch, 25. Oktober 2017 21:53 To: Doerr, Martin ; Andrew Haley ; Karen Kinnear Cc: hotspot-dev developers Subject: Re: RFR(XL): 8185640: Thread-local handshakes Hi, First thanks both for reviewing this! On 2017-10-25 21:38, Doerr, Martin wrote: > Hi Andrew, > > I think you're right. > > A Java program could have a goto in one of the cases of any switch which gets optimized out (by javac) replacing the branch target of the case. > > So I think we need safepoint polls in all switch templates, too. That's lookupswitch and binaryswitch also? Thanks, Robbin > > Best regards, > Martin > > > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Mittwoch, 25. Oktober 2017 17:17 > To: Robbin Ehn ; Karen Kinnear ; Doerr, Martin > Cc: hotspot-dev developers > Subject: Re: RFR(XL): 8185640: Thread-local handshakes > > On 24/10/17 15:54, Robbin Ehn wrote: > >> I did a fix for the interpreter performance regression, it's plain and simple, I >> kept the polling code inside dispatch_base but made it optional as the verify oop. >> >> Incremental: >> http://cr.openjdk.java.net/~rehn/8185640/v5/Interpreter-Poll-7/webrev/index.html >> >> Manual tested with jstack and it passes: hotspot_tier1, hotspot_handshake >> >> It reduces the polling cost of 80%, sensitive benchmarks shows -0.44% regression >> vs TLH off. More insensitive benchmark show no regression. > > I think it's not quite right: you're missing a check in tableswitch > and fast_linearswitch. These can be used to construct loops. 
> From martin.doerr at sap.com Wed Oct 25 22:05:43 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 25 Oct 2017 22:05:43 +0000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <818e352d5e3a450491cf0c140bf129d6@sap.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> Message-ID: <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> Hi, it's me again. after looking at the bytecodes again, I remembered that ret is olny for jsr. I think polling should also be done for the regular returns. A poll at the beginning of TemplateTable::_return should do the job. Unfortunately, it doesn't fit into your dispatch scheme. Best regards, Martin -----Original Message----- From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Doerr, Martin Sent: Mittwoch, 25. Oktober 2017 22:24 To: Robbin Ehn ; Andrew Haley ; Karen Kinnear Cc: hotspot-dev developers Subject: RE: RFR(XL): 8185640: Thread-local handshakes Hi Robbin, as far as I can see tableswitch, fast_linearswitch and fast_binaryswitch should get the poll. lookupswitch gets rewritten to fast_linearswitch or fast_binaryswitch. Sorry that my proposal created extra work for you. Thanks for doing it. Best regards, Martin -----Original Message----- From: Robbin Ehn [mailto:robbin.ehn at oracle.com] Sent: Mittwoch, 25. Oktober 2017 21:53 To: Doerr, Martin ; Andrew Haley ; Karen Kinnear Cc: hotspot-dev developers Subject: Re: RFR(XL): 8185640: Thread-local handshakes Hi, First thanks both for reviewing this! On 2017-10-25 21:38, Doerr, Martin wrote: > Hi Andrew, > > I think you're right. > > A Java program could have a goto in one of the cases of any switch which gets optimized out (by javac) replacing the branch target of the case. > > So I think we need safepoint polls in all switch templates, too. That's lookupswitch and binaryswitch also? Thanks, Robbin > > Best regards, > Martin > > > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Mittwoch, 25. Oktober 2017 17:17 > To: Robbin Ehn ; Karen Kinnear ; Doerr, Martin > Cc: hotspot-dev developers > Subject: Re: RFR(XL): 8185640: Thread-local handshakes > > On 24/10/17 15:54, Robbin Ehn wrote: > >> I did a fix for the interpreter performance regression, it's plain and simple, I >> kept the polling code inside dispatch_base but made it optional as the verify oop. >> >> Incremental: >> http://cr.openjdk.java.net/~rehn/8185640/v5/Interpreter-Poll-7/webrev/index.html >> >> Manual tested with jstack and it passes: hotspot_tier1, hotspot_handshake >> >> It reduces the polling cost of 80%, sensitive benchmarks shows -0.44% regression >> vs TLH off. More insensitive benchmark show no regression. > > I think it's not quite right: you're missing a check in tableswitch > and fast_linearswitch. These can be used to construct loops. 
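Summing up this sub-thread, the extra templates that need a poll under thread-local polling can be written down as a checklist (hypothetical helper, purely illustrative; the real selection happens in the platform template code):

    #include "interpreter/bytecodes.hpp"

    // Hypothetical helper, purely illustrative: in addition to the branch
    // bytecodes the incremental webrev already covers, the discussion above
    // concludes that these also need a thread-local poll, because they can
    // transfer control backwards or leave the method.
    static bool also_needs_poll(Bytecodes::Code code) {
      switch (code) {
        case Bytecodes::_tableswitch:        // switch targets may point backwards,
        case Bytecodes::_lookupswitch:       // so a loop can consist of nothing
        case Bytecodes::_fast_linearswitch:  // but switch dispatches
        case Bytecodes::_fast_binaryswitch:
        case Bytecodes::_ireturn:            // regular returns; _ret only covers jsr
        case Bytecodes::_lreturn:
        case Bytecodes::_freturn:
        case Bytecodes::_dreturn:
        case Bytecodes::_areturn:
        case Bytecodes::_return:
          return true;
        default:
          return false;
      }
    }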
> From aph at redhat.com Thu Oct 26 08:58:31 2017 From: aph at redhat.com (Andrew Haley) Date: Thu, 26 Oct 2017 09:58:31 +0100 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> Message-ID: <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> On 25/10/17 23:05, Doerr, Martin wrote: > after looking at the bytecodes again, I remembered that ret is olny > for jsr. I think polling should also be done for the regular > returns. > A poll at the beginning of TemplateTable::_return should do the > job. Unfortunately, it doesn't fit into your dispatch scheme. I'm wondering if this is a good idea at all: it could increase the latency of taking a safepoint in bytecode. Granted, it does avoid some significant code bloat in the interpreter. BTW, I don't understand why interpreted code doesn't simply read the polling page. Or we could even simply read-protect the bytecode dispatch tables themselves. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Oct 26 08:59:22 2017 From: aph at redhat.com (Andrew Haley) Date: Thu, 26 Oct 2017 09:59:22 +0100 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> Message-ID: <2600c955-57b3-f442-f9fa-0e064ea3916f@redhat.com> On 26/10/17 09:58, Andrew Haley wrote: > Or we could even simply read-protect the bytecode > dispatch tables themselves. But not with thread-local handshakes, of course. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Thu Oct 26 09:30:37 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 26 Oct 2017 09:30:37 +0000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> Message-ID: Hi Andrew, I don't think this will increase safepoint latency (if implemented appropriately). Methods compiled by C2 may contain counted loops (with int range) without safepoint. So this may be quite long in comparison to an interpreted method which can only contain up to 64 k bytecodes while every branch contains a safepoint check. (One might be kind of concerned about no poll in calls in the current implementation.) Best regards, Martin -----Original Message----- From: Andrew Haley [mailto:aph at redhat.com] Sent: Donnerstag, 26. Oktober 2017 10:59 To: Doerr, Martin ; Robbin Ehn ; Karen Kinnear ; Coleen Phillimore (coleen.phillimore at oracle.com) Cc: hotspot-dev developers Subject: Re: RFR(XL): 8185640: Thread-local handshakes On 25/10/17 23:05, Doerr, Martin wrote: > after looking at the bytecodes again, I remembered that ret is olny > for jsr. I think polling should also be done for the regular > returns. > A poll at the beginning of TemplateTable::_return should do the > job. Unfortunately, it doesn't fit into your dispatch scheme. I'm wondering if this is a good idea at all: it could increase the latency of taking a safepoint in bytecode. Granted, it does avoid some significant code bloat in the interpreter. BTW, I don't understand why interpreted code doesn't simply read the polling page. Or we could even simply read-protect the bytecode dispatch tables themselves. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Oct 26 09:39:53 2017 From: aph at redhat.com (Andrew Haley) Date: Thu, 26 Oct 2017 10:39:53 +0100 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> Message-ID: <57ce3fa7-5ba7-2525-3e9f-8b65ee34a24d@redhat.com> On 26/10/17 10:30, Doerr, Martin wrote: > I don't think this will increase safepoint latency (if implemented > appropriately). Methods compiled by C2 may contain counted loops > (with int range) without safepoint. 
So this may be quite long in > comparison to an interpreted method which can only contain up to 64k > bytecodes while every branch contains a safepoint check. This is to say, I think, that we already have one source of severe safepoint delays, so why not have another one? 64k bytecodes is a lot. > (One might be kind of concerned about no poll in calls in the > current implementation.) I'm not sure why. For every call, there is a return. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From martin.doerr at sap.com Thu Oct 26 09:44:01 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 26 Oct 2017 09:44:01 +0000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <57ce3fa7-5ba7-2525-3e9f-8b65ee34a24d@redhat.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <57ce3fa7-5ba7-2525-3e9f-8b65ee34a24d@redhat.com> Message-ID: 64k is the absolute worst case. I guess it won't take long until a branch gets reached in typical bytecode. My point regarding the call is that it may be a tail recursion which fills up the stack. Best regards, Martin -----Original Message----- From: Andrew Haley [mailto:aph at redhat.com] Sent: Donnerstag, 26. Oktober 2017 11:40 To: Doerr, Martin ; Robbin Ehn ; Karen Kinnear ; Coleen Phillimore (coleen.phillimore at oracle.com) Cc: hotspot-dev developers Subject: Re: RFR(XL): 8185640: Thread-local handshakes On 26/10/17 10:30, Doerr, Martin wrote: > I don't think this will increase safepoint latency (if implemented > appropriately). Methods compiled by C2 may contain counted loops > (with int range) without safepoint. So this may be quite long in > comparison to an interpreted method which can only contain up to 64k > bytecodes while every branch contains a safepoint check. This is to say, I think, that we already have one source of severe safepoint delays, so why not have another one? 64k bytecodes is a lot. > (One might be kind of concerned about no poll in calls in the > current implementation.) I'm not sure why. For every call, there is a return. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From magnus.ihse.bursie at oracle.com Thu Oct 26 09:57:15 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Thu, 26 Oct 2017 11:57:15 +0200 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> Message-ID: <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> Coleen, Thank you for addressing this! 
On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: > Summary: removed hotspot version of jvm*h and jni*h files > > Mostly used sed to remove prims/jvm.h and move #include "jvm.h" after > precompiled.h, so if you have repetitive stress wrist issues don't > click on most of these files. > > There were more issues to resolve, however.? The JDK windows jni_md.h > file defined jint as long and the hotspot windows jni_x86.h as int.? I > had to choose the jdk version since it's the public version, so there > are changes to the hotspot files for this. Generally I changed the > code to use 'int' rather than 'jint' where the surrounding API didn't > insist on consistently using java types. We should mostly be using C++ > types within hotspot except in interfaces to native/JNI code.? There > are a couple of hacks in places where adding multiple jint casts was > too painful. > > Tested with JPRT and tier2-4 (in progress). > > open webrev at http://cr.openjdk.java.net/~coleenp/8189610.01/webrev Looks great! Just a few comments: * src/java.base/unix/native/include/jni_md.h: I don't think the externally_visible attribute should be there for arm. I know this was the case for the corresponding hotspot file for arm, but that was techically incorrect. The proper dependency here is that externally_visible should be in all JNIEXPORT if and only if we're building with JVM feature "link-time-opt". Traditionally, that feature been enabled when building arm32 builds, and only then, so there's been a (coincidentally) connection here. Nowadays, Oracle does not care about the arm32 builds, and I'm not sure if anyone else is building them with link-time-opt enabled. It does seem wrong to me to export this behavior in the public jni_md.h file, though. I think the correct way to solve this, if we should continue supporting link-time-opt is to make sure this attribute is set for exported hotspot functions. If it's still needed, that is. A quick googling seems to indicate that visibility("default") might be enough in modern gcc's. A third option is to remove the support for link-time-opt entirely, if it's not really used. * src/java.base/unix/native/include/jvm_md.h and src/java.base/windows/native/include/jvm_md.h: These files define a public API, and contain non-trivial changes. I suspect you should file a CSR request. (Even though I realize you're only matching the header file with the reality.) /Magnus > bug link https://bugs.openjdk.java.net/browse/JDK-8189610 > > I have a script to update copyright files on commit. > > Thanks to Magnus and ErikJ for the makefile changes. > > Thanks, > Coleen > From nils.eliasson at oracle.com Thu Oct 26 12:11:37 2017 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 26 Oct 2017 14:11:37 +0200 Subject: JDK10/RFR(M): 8172232: SPARC ISA/CPU feature detection is broken/insufficient (on Linux). In-Reply-To: <3fcc865d-3eda-a341-e112-8417711ee3e5@oracle.com> References: <7d5e1ebb-7de8-66f1-a1f0-db465bcad4ab@oracle.com> <9f2896ca-65dc-557f-793c-4235499cc340@oracle.com> <3fcc865d-3eda-a341-e112-8417711ee3e5@oracle.com> Message-ID: Thanks for fixing Patric, Looks good! Regards, Nils On 2017-10-04 11:04, Patric Hedlin wrote: > Thanks for reviewing Vladimir. > > On 09/29/2017 08:56 PM, Vladimir Kozlov wrote: >> In general it is fine. Few notes. >> You use ifdef DEBUG_SPARC_CAPS which is undefed at the beginning. Is >> it set by gcc by default? >> > Removed. 
> >> Coding style for methods definitions - open parenthesis should be on >> the same line: >> >> + bool match(const char* s) const >> + { >> > Updated/re-formated. > > Refreshed webrev. > > @Adrian: Please validate. > > Best regards, > Patric > >> Thanks, >> Vladimir >> >> On 9/29/17 6:08 AM, Patric Hedlin wrote: >>> Dear all, >>> >>> I would like to ask for help to review the following change/update: >>> >>> Issue: https://bugs.openjdk.java.net/browse/JDK-8172232 >>> >>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8172232/ >>> >>> >>> 8172232: SPARC ISA/CPU feature detection is broken/insufficient (on >>> Linux). >>> >>> Subsumes (duplicate) JDK-8186579: >>> VM_Version::platform_features() needs update on linux-sparc. >>> >>> >>> Caveat: >>> >>> This update will introduce some redundancies into the code >>> base, features and definitions >>> currently not used, addressed by subsequent bug or feature >>> updates/patches. Fujitsu HW is >>> treated very conservatively. >>> >>> >>> Testing: >>> >>> JDK9/JDK10 local jtreg/hotspot >>> >>> >>> Thanks to Adrian for additional test (and review) support. >>> >>> Tested-By: John Paul Adrian Glaubitz >>> >>> >>> Best regards, >>> Patric >>> > From erik.osterlund at oracle.com Thu Oct 26 14:39:47 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 26 Oct 2017 16:39:47 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> Message-ID: <59F1F3B3.10701@oracle.com> Hi Andrew, On 2017-10-26 10:58, Andrew Haley wrote: > BTW, I don't understand why interpreted code doesn't simply read the > polling page. Or we could even simply read-protect the bytecode > dispatch tables themselves. The reason we do not poll the page in the interpreter is that we need to generate appropriate relocation entries in the code blob for the PCs that we poll on, so that we in the signal handler can look up the code blob, walk the relocation entries, and find precisely why we got the trap, i.e. due to the poll, and precisely what kind of poll, so we know what trampoline needs to be taken into the runtime. While constructing something that does that is indeed possible, it simply did not seem worth the trouble compared to using a branch in these paths. The same reasoning applies for the poll performed in the native wrapper when waking up from native and transitioning into Java. It performs a conditional branch instead of indirect load to avoid signal handler logic for polls that are not performance critical. Only the polls in JIT-compiled code use the optimized indirect load mechanism. And we do not want to read-protect the bytecode dispatch tables, because we want the ability to stop individual threads, and that would stop all of them. I hope that explains it. 
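The branch-style poll being described can be modelled in a few lines (plain C++, not HotSpot code; it only mirrors the shape used in the interpreter and the native wrapper):

    #include <cstdint>

    // Plain C++ model of the branch-style poll: compare a thread-local word
    // and take an explicit slow path.  No trap is involved, so no relocation
    // entry or signal-handler lookup is needed at these call sites.
    struct BranchPollModel {
      volatile uintptr_t _poll_word;           // non-zero means "armed"

      void poll(void (*slow_path)()) {
        if (_poll_word != 0) {
          slow_path();                         // trampoline into the runtime
        }
      }
    };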
Thanks, /Erik From bob.vandette at oracle.com Thu Oct 26 14:45:49 2017 From: bob.vandette at oracle.com (Bob Vandette) Date: Thu, 26 Oct 2017 10:45:49 -0400 Subject: RFR: 8146115 - Improve docker container detection and resource configuration usage In-Reply-To: References: <74630458-926E-4B3E-B967-6F6ADCA0A406@oracle.com> <2d9dd746-63e1-cade-28f9-5ca1ae1c253e@oracle.com> <200F07CB-35DA-492B-B78D-9EC033EE0431@oracle.com> <833ba1a5-49fc-bb24-ff99-994011af52aa@oracle.com> Message-ID: <321E60F4-567F-4648-BC5C-53903B6C95BF@oracle.com> > On Oct 25, 2017, at 2:57 AM, Kim Barrett wrote: > >> On Oct 24, 2017, at 10:11 AM, Bob Vandette wrote: >> >> >>> On Oct 23, 2017, at 12:52 AM, Kim Barrett wrote: >>> >>>> On Sep 27, 2017, at 9:20 PM, David Holmes wrote: >>>>>> 62 void set_subsystem_path(char *cgroup_path) { >>>>>> >>>>>> If this takes a "const char*" will it save you from casting string literals to "char*" elsewhere? >>>>> I tried several different ways of declaring the container accessor functions and >>>>> always ended up with warnings due to scanf not being able to validate arguments >>>>> since the format string didn?t end up being a string literal. I originally was using templates >>>>> and then ended up with the macros. I tried several different casts but could resolve the problem. >>>> >>>> Sounds like something Kim Barrett should take a look at :) >>> >>> Fortunately, I just happened by. >>> >>> The warnings are because we compile with -Wformat=2, which enables >>> -Wformat-nonliteral (among other things). >>> >>> Use PRAGMA_FORMAT_NONLITERAL_IGNORED, e.g. >>> >>> PRAGMA_DIAG_PUSH >>> PRAGMA_FORMAT_NONLITERAL_IGNORED >>> >>> PRAGMA_DIAG_POP >>> >>> That will silence warnings about sscanf (or anything else!) with a >>> non-literal format string within that . >> >> Thanks but I ended up taking a different approach that resulted in more compact code. >> >> http://cr.openjdk.java.net/~bobv/8146115/webrev.02 > > Not a review, just a few more comments in passing. > > ------------------------------------------------------------------------------ > src/hotspot/os/linux/osContainer_linux.cpp > 150 log_debug(os, container)("Type %s not found in file %s\n", \ > 151 scan_fmt , buf); \ > > uses buf as path, but buf has been clobbered to contain contents from > file. > > Similarly for > 155 log_debug(os, container)("Empty file %s\n", buf); \ I fixed these by adding an additional buffer for the read. > > ------------------------------------------------------------------------------ > src/hotspot/os/linux/osContainer_linux.cpp > 158 log_debug(os, container)("file not found %s\n", buf); \ > > There are many reasons why fopen might fail, and merging them all into > a "file not found" message could be quite confusing. It would be much > better to report the error from errno. I added os::strerror(errno) to all failures from fopen to provide more detail. > > ------------------------------------------------------------------------------ > src/hotspot/os/linux/osContainer_linux.cpp > > Something like the following (where the obvious helpers are made up to > keep this short) would eliminate the macrology. 
> > PRAGMA_DIAG_PUSH > PRAGMA_FORMAT_NONLITERAL_IGNORED > template > int get_subsystem_file_contents_value(CgroupSubsystem* c, > const char* filename, > T* returnval, > const char* scan_fmt, > const char* description) { > const char* line = get_subsystem_file_line(c, filename); > if (line != NULL) { > if (sscanf(line, scan_fmt, returnval) == 1) { > return 0; > } else { > report_subsystem_file_contents_parse_error(description, c, filename); > } > } > return OSCONTAINER_ERROR; > } > PRAGMA_DIAG_POP > > int subsystem_file_contents_int(CgroupSubsystem* c, > const char* filename, > int* returnval) { > return get_subsystem_file_contents_value(c, filename, returnval, "%d", "int"); > } > I originally tried to use a template but ran into the issue of the literal and strings needed to be handled differently. I wasn?t sure how to limit the length of the string but I now see that I can use something like ?%1023s?. I?ll give it a try. Bob. From aph at redhat.com Thu Oct 26 16:05:11 2017 From: aph at redhat.com (Andrew Haley) Date: Thu, 26 Oct 2017 17:05:11 +0100 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <59F1F3B3.10701@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> Message-ID: <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> On 26/10/17 15:39, Erik ?sterlund wrote: > The reason we do not poll the page in the interpreter is that we > need to generate appropriate relocation entries in the code blob for > the PCs that we poll on, so that we in the signal handler can look > up the code blob, walk the relocation entries, and find precisely > why we got the trap, i.e. due to the poll, and precisely what kind > of poll, so we know what trampoline needs to be taken into the > runtime. Not really, no. If we know that we're in the interpreter and the faulting address is the safepoint poll, then we can read all of the context we need from the interpreter registers and the frame. > While constructing something that does that is indeed possible, it > simply did not seem worth the trouble compared to using a branch in > these paths. The same reasoning applies for the poll performed in > the native wrapper when waking up from native and transitioning into > Java. It performs a conditional branch instead of indirect load to > avoid signal handler logic for polls that are not performance > critical. If we're talking about performance, the existing bytecode interpreter is exquisitely carefully coded, even going to the extent of having multiple dispatch tables for safepoint- and non-safepoint cases. Clearly the original authors weren't thinking that code was not performance critical or they wouldn't have done what they did. I suppose, though, that the design we have is from the early days when people diligently strove to make the interpreter as fast as possible. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From erik.osterlund at oracle.com Thu Oct 26 17:00:11 2017 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Thu, 26 Oct 2017 19:00:11 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> Message-ID: <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> Hi Andrew, > On 26 Oct 2017, at 18:05, Andrew Haley wrote: > >> On 26/10/17 15:39, Erik ?sterlund wrote: >> >> The reason we do not poll the page in the interpreter is that we >> need to generate appropriate relocation entries in the code blob for >> the PCs that we poll on, so that we in the signal handler can look >> up the code blob, walk the relocation entries, and find precisely >> why we got the trap, i.e. due to the poll, and precisely what kind >> of poll, so we know what trampoline needs to be taken into the >> runtime. > > Not really, no. If we know that we're in the interpreter and the > faulting address is the safepoint poll, then we can read all of the > context we need from the interpreter registers and the frame. That sounds like what I said. As I said, it is definitely possible to dig out that it was an interpreter safepoint poll causing the trap given the execution context in the interpreter (and appropriate metadata generated for the trapping PC), and send the trapping thread back to a trampoline that saves state appropriately and calls into the runtime to yield to the safepoint synchronizer, like we do for the JIT-compiled code. But the cost of the conditional branch is empirically (this was attempted and measured a while ago) approximately the same as the indirect load during "normal circumstances". The indirect load was only marginally better. Therefore that added complexity with the signal handler dance was simply not warranted for the interpreter. It was only warranted when polling in the absolutely most performance critical code, i.e. JIT compiled code. > >> While constructing something that does that is indeed possible, it >> simply did not seem worth the trouble compared to using a branch in >> these paths. The same reasoning applies for the poll performed in >> the native wrapper when waking up from native and transitioning into >> Java. It performs a conditional branch instead of indirect load to >> avoid signal handler logic for polls that are not performance >> critical. > > If we're talking about performance, the existing bytecode interpreter > is exquisitely carefully coded, even going to the extent of having > multiple dispatch tables for safepoint- and non-safepoint cases. > Clearly the original authors weren't thinking that code was not > performance critical or they wouldn't have done what they did. 
I > suppose, though, that the design we have is from the early days when > people diligently strove to make the interpreter as fast as possible. On the other hand, branches have become a lot faster in "recent" years, and this one is particularly trivial to predict. Therefore I prefer to base design decisions on empirical measurements. And introducing that complexity for an close to insignificantly faster interpreter poll does not seem encouraging to me. Do you agree? Thanks, /Erik > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From paul.sandoz at oracle.com Thu Oct 26 17:03:15 2017 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Thu, 26 Oct 2017 10:03:15 -0700 Subject: [10] RFR 8186046 Minimal ConstantDynamic support Message-ID: Hi, Please review the following patch for minimal dynamic constant support: http://cr.openjdk.java.net/~psandoz/jdk10/JDK-8186046-minimal-condy-support-hs/webrev/ https://bugs.openjdk.java.net/browse/JDK-8186046 https://bugs.openjdk.java.net/browse/JDK-8186209 This patch is based on the JDK 10 unified HotSpot repository. Testing so far looks good. By minimal i mean just the support in the runtime for a dynamic constant pool entry to be referenced by a LDC instruction or a bootstrap method argument. Much of the work leverages the foundations built by invoke dynamic but is arguably simpler since resolution is less complex. A small set of bootstrap methods will be proposed as a follow on issue for 10 (these are currently being refined in the amber repository). Bootstrap method invocation has not changed (and the rules are the same for dynamic constants and indy). It is planned to enhance this in a further major release to support lazy resolution of bootstrap method arguments. The CSR for the VM specification is here: https://bugs.openjdk.java.net/browse/JDK-8189199 the j.l.invoke package documentation was also updated but please consider the VM specification as the definitive "source of truth" (we may clean up this area further later on so it becomes more informative, and that may also apply to duplicative text on MethodHandles/VarHandles). Any AoT-related work will be deferred to a future release. ? This patch only supports x64 platforms. There is a small set of changes specific to x64 (specifically to support null and primitives constants, as prior to this patch null was used as a sentinel for resolution and certain primitives types would never have been encountered, such as say byte). We will need to follow up with the SPARC platform and it is hoped/anticipated that OpenJDK members responsible for other platforms (namely ARM and PPC) will separately provide patches. ? Many of tests rely on an experimental byte code API that supports the generation of byte code with dynamic constants. One test uses class file bytes produced from a modified version of asmtools. The modifications have now been pushed but a new version of asmtools need to be rolled into jtreg before the test can operate directly on asmtools information rather than embedding class file bytes directly in the test. ? Paul. 
From aph at redhat.com Thu Oct 26 17:19:06 2017 From: aph at redhat.com (Andrew Haley) Date: Thu, 26 Oct 2017 18:19:06 +0100 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> Message-ID: On 26/10/17 18:00, Erik Osterlund wrote: > Hi Andrew, > >> On 26 Oct 2017, at 18:05, Andrew Haley wrote: >> >>> On 26/10/17 15:39, Erik ?sterlund wrote: >>> >>> The reason we do not poll the page in the interpreter is that we >>> need to generate appropriate relocation entries in the code blob for >>> the PCs that we poll on, so that we in the signal handler can look >>> up the code blob, walk the relocation entries, and find precisely >>> why we got the trap, i.e. due to the poll, and precisely what kind >>> of poll, so we know what trampoline needs to be taken into the >>> runtime. >> >> Not really, no. If we know that we're in the interpreter and the >> faulting address is the safepoint poll, then we can read all of the >> context we need from the interpreter registers and the frame. > > That sounds like what I said. Not exactly. We do not need to generate any more relocation entries. > But the cost of the conditional branch is empirically (this was > attempted and measured a while ago) approximately the same as the > indirect load during "normal circumstances". The indirect load was > only marginally better. That's interesting. The cost of the SEGV trap going through the kernel is fairly high, and I'm now wondering if, for very fast safepoint responses, we'd be better off not doing it. The cost of the write protect, given that it probably involves an IPI on all processors, isn't cheap either. >>> While constructing something that does that is indeed possible, it >>> simply did not seem worth the trouble compared to using a branch in >>> these paths. The same reasoning applies for the poll performed in >>> the native wrapper when waking up from native and transitioning into >>> Java. It performs a conditional branch instead of indirect load to >>> avoid signal handler logic for polls that are not performance >>> critical. >> >> If we're talking about performance, the existing bytecode interpreter >> is exquisitely carefully coded, even going to the extent of having >> multiple dispatch tables for safepoint- and non-safepoint cases. >> Clearly the original authors weren't thinking that code was not >> performance critical or they wouldn't have done what they did. I >> suppose, though, that the design we have is from the early days when >> people diligently strove to make the interpreter as fast as possible. > > On the other hand, branches have become a lot faster in "recent" > years, and this one is particularly trivial to predict. 
Therefore I > prefer to base design decisions on empirical measurements. And > introducing that complexity for an close to insignificantly faster > interpreter poll does not seem encouraging to me. Do you agree? Perhaps. It's interesting that the result falls one way in compiled code and the other in interpreted code. If the choice is so very finely balanced, though, it sort-of makes sense. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From kim.barrett at oracle.com Thu Oct 26 18:45:03 2017 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 26 Oct 2017 14:45:03 -0400 Subject: RFR: 8163897: oop_store has unnecessary memory barriers In-Reply-To: References: Message-ID: <198FAF45-59AD-4618-86B7-279C81248F9B@oracle.com> > On Oct 23, 2017, at 1:44 AM, Kim Barrett wrote: > > Please review this change to the oop_store function template, which > removes some unnecessary memory barriers, moves CMS-specific code into > GC-specific (though not completely CMS-specific) areas, and cleans up > the API a bit. See the CR for more details about the problems. Due to some miscommunication, Erik O and I have both developed solutions to this. Mine is a stand-alone piece of work for me, while his is some number of changes in a long patch train. In the interest of not imposing possibly messy merging on Erik, I'm withdrawing this RFR and reassigning the bug to him. From mandy.chung at oracle.com Thu Oct 26 18:47:19 2017 From: mandy.chung at oracle.com (mandy chung) Date: Thu, 26 Oct 2017 11:47:19 -0700 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> Message-ID: On 10/26/17 2:57 AM, Magnus Ihse Bursie wrote: > A third option is to remove the support for link-time-opt entirely, if > it's not really used. > > * src/java.base/unix/native/include/jvm_md.h and > src/java.base/windows/native/include/jvm_md.h: > > These files define a public API, and contain non-trivial changes. I > suspect you should file a CSR request. (Even though I realize you're > only matching the header file with the reality.) jvm.h and jvm_md.h are not public API and they are not copied to the $JAVA_HOME/includes directly.? This does raise the question that jvm*.h may belong to other location than src/java.base/{share,$OS}/native/include. Mandy From coleen.phillimore at oracle.com Thu Oct 26 20:34:30 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 26 Oct 2017 16:34:30 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> Message-ID: <8e157a28-5397-95c1-03dc-de6d0d3d37e8@oracle.com> On 10/26/17 2:47 PM, mandy chung wrote: > > > On 10/26/17 2:57 AM, Magnus Ihse Bursie wrote: >> A third option is to remove the support for link-time-opt entirely, >> if it's not really used. >> >> * src/java.base/unix/native/include/jvm_md.h and >> src/java.base/windows/native/include/jvm_md.h: >> >> These files define a public API, and contain non-trivial changes. I >> suspect you should file a CSR request. (Even though I realize you're >> only matching the header file with the reality.) 
> > jvm.h and jvm_md.h are not public API and they are not copied to the > $JAVA_HOME/includes directly.? This does raise the question that > jvm*.h may belong to other location than > src/java.base/{share,$OS}/native/include. I'm not sure where else it would go honestly, but it could be moved outside this changeset.? The good thing about where it is, is that the -I directives in the makefiles find both jni.h and jvm.h. thanks, Coleen > > Mandy From coleen.phillimore at oracle.com Thu Oct 26 20:44:05 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 26 Oct 2017 16:44:05 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> Message-ID: <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> ?Hi Magnus, Thank you for reviewing this.?? I have a new version that takes out the hack in globalDefinitions.hpp and adds casts to src/hotspot/share/opto/type.cpp instead. Also some fixes from Martin at SAP. open webrev at http://cr.openjdk.java.net/~coleenp/8189610.02/webrev see below. On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: > Coleen, > > Thank you for addressing this! > > On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >> Summary: removed hotspot version of jvm*h and jni*h files >> >> Mostly used sed to remove prims/jvm.h and move #include "jvm.h" after >> precompiled.h, so if you have repetitive stress wrist issues don't >> click on most of these files. >> >> There were more issues to resolve, however.? The JDK windows jni_md.h >> file defined jint as long and the hotspot windows jni_x86.h as int.? >> I had to choose the jdk version since it's the public version, so >> there are changes to the hotspot files for this. Generally I changed >> the code to use 'int' rather than 'jint' where the surrounding API >> didn't insist on consistently using java types. We should mostly be >> using C++ types within hotspot except in interfaces to native/JNI >> code.? There are a couple of hacks in places where adding multiple >> jint casts was too painful. >> >> Tested with JPRT and tier2-4 (in progress). >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.01/webrev > > Looks great! > > Just a few comments: > > * src/java.base/unix/native/include/jni_md.h: > > I don't think the externally_visible attribute should be there for > arm. I know this was the case for the corresponding hotspot file for > arm, but that was techically incorrect. The proper dependency here is > that externally_visible should be in all JNIEXPORT if and only if > we're building with JVM feature "link-time-opt". Traditionally, that > feature been enabled when building arm32 builds, and only then, so > there's been a (coincidentally) connection here. Nowadays, Oracle does > not care about the arm32 builds, and I'm not sure if anyone else is > building them with link-time-opt enabled. > > It does seem wrong to me to export this behavior in the public > jni_md.h file, though. I think the correct way to solve this, if we > should continue supporting link-time-opt is to make sure this > attribute is set for exported hotspot functions. If it's still needed, > that is. A quick googling seems to indicate that visibility("default") > might be enough in modern gcc's. 
> > A third option is to remove the support for link-time-opt entirely, if > it's not really used. I didn't know how to change this since we are still building ARM with the jdk10/hs repository, and ARM needed this change.? I could wait until we bring down the jdk10/master changes that remove the ARM build and remove this conditional before I push.? Or we could file an RFE to remove link-time-opt (?) and remove it then? > > * src/java.base/unix/native/include/jvm_md.h and > src/java.base/windows/native/include/jvm_md.h: > > These files define a public API, and contain non-trivial changes. I > suspect you should file a CSR request. (Even though I realize you're > only matching the header file with the reality.) > I filed the CSR.?? Waiting for the next steps. Thanks, Coleen > /Magnus > >> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >> >> I have a script to update copyright files on commit. >> >> Thanks to Magnus and ErikJ for the makefile changes. >> >> Thanks, >> Coleen >> > From mandy.chung at oracle.com Thu Oct 26 21:27:19 2017 From: mandy.chung at oracle.com (mandy chung) Date: Thu, 26 Oct 2017 14:27:19 -0700 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <8e157a28-5397-95c1-03dc-de6d0d3d37e8@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <8e157a28-5397-95c1-03dc-de6d0d3d37e8@oracle.com> Message-ID: <15a07ec6-3fc3-f757-1711-8d088d194115@oracle.com> On 10/26/17 1:34 PM, coleen.phillimore at oracle.com wrote: > > > On 10/26/17 2:47 PM, mandy chung wrote: >> >> >> On 10/26/17 2:57 AM, Magnus Ihse Bursie wrote: >>> A third option is to remove the support for link-time-opt entirely, >>> if it's not really used. >>> >>> * src/java.base/unix/native/include/jvm_md.h and >>> src/java.base/windows/native/include/jvm_md.h: >>> >>> These files define a public API, and contain non-trivial changes. I >>> suspect you should file a CSR request. (Even though I realize you're >>> only matching the header file with the reality.) >> >> jvm.h and jvm_md.h are not public API and they are not copied to the >> $JAVA_HOME/includes directly.? This does raise the question that >> jvm*.h may belong to other location than >> src/java.base/{share,$OS}/native/include. > > I'm not sure where else it would go honestly, but it could be moved > outside this changeset.? The good thing about where it is, is that the > -I directives in the makefiles find both jni.h and jvm.h. I agree we should keep this location for this change (the location is a separate issue).? I reviewed the change that looks good to me. Mandy From ioi.lam at oracle.com Thu Oct 26 21:53:15 2017 From: ioi.lam at oracle.com (Ioi Lam) Date: Thu, 26 Oct 2017 14:53:15 -0700 Subject: RFR [S] JDK-8179624 [REDO] Avoid repeated calls to JavaThread::last_frame in InterpreterRuntime Message-ID: <842ce767-4436-02a3-f536-b71fed1fa6ed@oracle.com> Hi, Please review the following change. It's a redo of a previous botched attempt (JDK-8179305) that had a typo which caused JIT-related crashes. Thanks to Dean for spotting the typo. + Bug https://bugs.openjdk.java.net/browse/JDK-8179624 + The full changeset: http://cr.openjdk.java.net/~iklam/jdk10/8179624-redo-8179305-avoid-last-frame.v01.full/ + The delta from the botched attempt ? (fixing the typo with monitor_begin/monitor_end): http://cr.openjdk.java.net/~iklam/jdk10/8179624-redo-8179305-avoid-last-frame.v01.redo_delta/ + Testing: hotspot tier1~5 tests. 
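The shape of the change is easy to picture; a sketch of the idea only, the actual helper and its call sites are in the webrev:

    // Sketch of the idea only (HotSpot-internal types; the usual frame and
    // thread includes are omitted here).  Compute thread->last_frame() once
    // per InterpreterRuntime entry and reuse it, instead of re-deriving the
    // last frame for every query.
    class LastFrameAccessorSketch {
      frame _last_frame;                       // cached once on construction
     public:
      LastFrameAccessorSketch(JavaThread* thread)
        : _last_frame(thread->last_frame()) {}
      Method* method() const { return _last_frame.interpreter_frame_method(); }
      address bcp()    const { return _last_frame.interpreter_frame_bcp();    }
    };

    // Illustrative use inside an entry point:
    //   LastFrameAccessorSketch last_frame(thread);
    //   Method* m = last_frame.method();      // no further last_frame() calls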
Thanks - Ioi From dean.long at oracle.com Thu Oct 26 23:41:35 2017 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 26 Oct 2017 16:41:35 -0700 Subject: RFR [S] JDK-8179624 [REDO] Avoid repeated calls to JavaThread::last_frame in InterpreterRuntime In-Reply-To: <842ce767-4436-02a3-f536-b71fed1fa6ed@oracle.com> References: <842ce767-4436-02a3-f536-b71fed1fa6ed@oracle.com> Message-ID: <958fef30-03d7-5b8c-1f3b-0bcca945565f@oracle.com> Looks good. dl On 10/26/17 2:53 PM, Ioi Lam wrote: > Hi, > > Please review the following change. It's a redo of a previous botched > attempt (JDK-8179305) that had a typo which caused JIT-related crashes. > > Thanks to Dean for spotting the typo. > > + Bug > https://bugs.openjdk.java.net/browse/JDK-8179624 > > > + The full changeset: > http://cr.openjdk.java.net/~iklam/jdk10/8179624-redo-8179305-avoid-last-frame.v01.full/ > > > > + The delta from the botched attempt > (fixing the typo with monitor_begin/monitor_end): > http://cr.openjdk.java.net/~iklam/jdk10/8179624-redo-8179305-avoid-last-frame.v01.redo_delta/ > > > > + Testing: > hotspot tier1~5 tests. > > > Thanks > - Ioi From hohensee at amazon.com Thu Oct 26 23:54:35 2017 From: hohensee at amazon.com (Hohensee, Paul) Date: Thu, 26 Oct 2017 23:54:35 +0000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> Message-ID: <4815B009-174E-4363-A60F-7EC3D4EDE3ED@amazon.com> As a reference point, Android Java branches on a flag in the TLS rather than issuing a poisoned page probe. On x86 at least, there's no performance disadvantage: branch prediction makes the compare-and-branch pair a single-cycle operation in the vast majority of cases. The interpreter was built at a time when branches had non-zero cost, as evidenced by the prediction bits in the sparc64 predicted branch instructions. The compare-and-branch code sequence takes up icache space in the interpreter (vs. zero for switching the dispatch table) and icache is still a limited resource on modern processors, so that's an argument for switching dispatch tables. For compiled code, compare-and-branch takes a bit more space than the current poison page probe, but not enough to matter imo. Compiled code is executed far more than interpreter code, so I'd go with optimizing compiled code performance. 
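For context, "switching the dispatch table" refers to the scheme the template interpreter already uses for global safepoints; a plain C++ model (simplified names, not the real TemplateInterpreter code):

    #include <cstring>

    // Plain C++ model, simplified names.  To request a stop, the VM copies a
    // "safepoint" table over the active table, so every subsequent bytecode
    // dispatch funnels through a checking entry: zero per-bytecode icache
    // cost, but it necessarily affects all interpreter threads at once.
    typedef void (*BytecodeEntry)();
    const int num_bytecodes = 256;

    struct DispatchTablesModel {
      BytecodeEntry active[num_bytecodes];   // what the dispatch loop indexes
      BytecodeEntry normal[num_bytecodes];   // plain entries
      BytecodeEntry safept[num_bytecodes];   // entries that check for a stop first

      void notice_safepoints() { std::memcpy(active, safept, sizeof(active)); }
      void ignore_safepoints() { std::memcpy(active, normal, sizeof(active)); }
    };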
Thanks, Paul On 10/26/17, 10:20 AM, "hotspot-dev on behalf of Andrew Haley" wrote: On 26/10/17 18:00, Erik Osterlund wrote: > Hi Andrew, > >> On 26 Oct 2017, at 18:05, Andrew Haley wrote: >> >>> On 26/10/17 15:39, Erik ?sterlund wrote: >>> >>> The reason we do not poll the page in the interpreter is that we >>> need to generate appropriate relocation entries in the code blob for >>> the PCs that we poll on, so that we in the signal handler can look >>> up the code blob, walk the relocation entries, and find precisely >>> why we got the trap, i.e. due to the poll, and precisely what kind >>> of poll, so we know what trampoline needs to be taken into the >>> runtime. >> >> Not really, no. If we know that we're in the interpreter and the >> faulting address is the safepoint poll, then we can read all of the >> context we need from the interpreter registers and the frame. > > That sounds like what I said. Not exactly. We do not need to generate any more relocation entries. > But the cost of the conditional branch is empirically (this was > attempted and measured a while ago) approximately the same as the > indirect load during "normal circumstances". The indirect load was > only marginally better. That's interesting. The cost of the SEGV trap going through the kernel is fairly high, and I'm now wondering if, for very fast safepoint responses, we'd be better off not doing it. The cost of the write protect, given that it probably involves an IPI on all processors, isn't cheap either. >>> While constructing something that does that is indeed possible, it >>> simply did not seem worth the trouble compared to using a branch in >>> these paths. The same reasoning applies for the poll performed in >>> the native wrapper when waking up from native and transitioning into >>> Java. It performs a conditional branch instead of indirect load to >>> avoid signal handler logic for polls that are not performance >>> critical. >> >> If we're talking about performance, the existing bytecode interpreter >> is exquisitely carefully coded, even going to the extent of having >> multiple dispatch tables for safepoint- and non-safepoint cases. >> Clearly the original authors weren't thinking that code was not >> performance critical or they wouldn't have done what they did. I >> suppose, though, that the design we have is from the early days when >> people diligently strove to make the interpreter as fast as possible. > > On the other hand, branches have become a lot faster in "recent" > years, and this one is particularly trivial to predict. Therefore I > prefer to base design decisions on empirical measurements. And > introducing that complexity for an close to insignificantly faster > interpreter poll does not seem encouraging to me. Do you agree? Perhaps. It's interesting that the result falls one way in compiled code and the other in interpreted code. If the choice is so very finely balanced, though, it sort-of makes sense. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ioi.lam at oracle.com Fri Oct 27 00:02:37 2017 From: ioi.lam at oracle.com (Ioi Lam) Date: Thu, 26 Oct 2017 17:02:37 -0700 Subject: RFR [S] JDK-8179624 [REDO] Avoid repeated calls to JavaThread::last_frame in InterpreterRuntime In-Reply-To: <958fef30-03d7-5b8c-1f3b-0bcca945565f@oracle.com> References: <842ce767-4436-02a3-f536-b71fed1fa6ed@oracle.com> <958fef30-03d7-5b8c-1f3b-0bcca945565f@oracle.com> Message-ID: Thanks Dean! 
- Ioi

On 10/26/17 4:41 PM, dean.long at oracle.com wrote:
> Looks good.
>
> dl
>
> On 10/26/17 2:53 PM, Ioi Lam wrote:
>> Hi,
>>
>> Please review the following change. It's a redo of a previous botched
>> attempt (JDK-8179305) that had a typo which caused JIT-related crashes.
>>
>> Thanks to Dean for spotting the typo.
>>
>> + Bug
>> https://bugs.openjdk.java.net/browse/JDK-8179624
>>
>> + The full changeset:
>> http://cr.openjdk.java.net/~iklam/jdk10/8179624-redo-8179305-avoid-last-frame.v01.full/
>>
>> + The delta from the botched attempt
>>   (fixing the typo with monitor_begin/monitor_end):
>> http://cr.openjdk.java.net/~iklam/jdk10/8179624-redo-8179305-avoid-last-frame.v01.redo_delta/
>>
>> + Testing:
>> hotspot tier1~5 tests.
>>
>> Thanks
>> - Ioi
>

From erik.osterlund at oracle.com Fri Oct 27 06:51:48 2017
From: erik.osterlund at oracle.com (Erik Osterlund)
Date: Fri, 27 Oct 2017 08:51:48 +0200
Subject: RFR(XL): 8185640: Thread-local handshakes
In-Reply-To: <4815B009-174E-4363-A60F-7EC3D4EDE3ED@amazon.com>
References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> <4815B009-174E-4363-A60F-7EC3D4EDE3ED@amazon.com>
Message-ID: 

Hi Paul,

Regarding the conditional branch on the TLS, I have the following to say:

1) Mikael Gerdin tried an earlier prototype doing that, and found that the indirect load was more desirable for now. The reason is that the performance of the branch variant is more sensitive to chip details such as the number of branch ports on the reservation stations (double branch ports were introduced in haswell). On some chips the branch would marginally win, on some it would marginally lose. But there are more pathological cases for the branch, like e.g. a nonsense loop that does not do anything but loop. Arguably that is a nonsense benchmark though. But since the indirect load was less sensitive to the chip details, always performed well consistently, was more predictable and deterministic, that approach was selected. Perhaps this decision may change in a few years, but it seems a bit early for that now.

2) As for the number of bytes in the code stream of the global testl (x86) to a conditional branch on TLS, you can get an optimal encoding of the branch variant of the same length, 6 bytes, on x86. The optimal testb on offset zero is 4 bytes and a short branch is 2 bytes. For the curious reader: in the past (years ago now) I prototyped getting the optimal machine encoding of a TLS conditional branch poll. I ended up exposing different thread pointers to the TLS register at an offset into Thread in the JIT to be able to get that offset zero, and changing locking code to deal with the owner being misaligned, and all sorts of fun. But it ultimately didn't seem to make any measurable difference at all. But I got the T-shirt anyway.
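To make the encoding remark in point 2 concrete, the "offset zero" trick amounts to something like this (purely hypothetical layout, not HotSpot's Thread):

  #include <cstddef>

  struct PollableThread {
    volatile unsigned char poll_flag;   // at offset 0: 4-byte testb + 2-byte jcc
    // ... the rest of the thread state follows ...
    void* other_state;
  };
  static_assert(offsetof(PollableThread, poll_flag) == 0,
                "the poll flag must stay at offset zero for the short encoding");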
Hope this explains why the indirect load was chosen over the conditional branch. Thanks, /Erik > On 27 Oct 2017, at 01:54, Hohensee, Paul wrote: > > As a reference point, Android Java branches on a flag in the TLS rather than issuing a poisoned page probe. On x86 at least, there?s no performance disadvantage: branch prediction makes the compare-and-branch pair a single-cycle operation in the vast majority of cases. > > The interpreter was built at a time when branches had non-zero cost, as evidenced by the prediction bits in the sparc64 predicted branch instructions. The compare-and-branch code sequence takes up icache space in the interpreter (vs. zero for switching the dispatch table) and icache is still a limited resource on modern processors, so that?s an argument for switching dispatch tables. For compiled code, compare-and-branch takes a bit more space than the current poison page probe, but not enough to matter imo. Compiled code is executed far more than interpreter code, so I?d go with optimizing compiled code performance. > > Thanks, > > Paul > > On 10/26/17, 10:20 AM, "hotspot-dev on behalf of Andrew Haley" wrote: > > On 26/10/17 18:00, Erik Osterlund wrote: >> Hi Andrew, >> >>>> On 26 Oct 2017, at 18:05, Andrew Haley wrote: >>>> >>>> On 26/10/17 15:39, Erik ?sterlund wrote: >>>> >>>> The reason we do not poll the page in the interpreter is that we >>>> need to generate appropriate relocation entries in the code blob for >>>> the PCs that we poll on, so that we in the signal handler can look >>>> up the code blob, walk the relocation entries, and find precisely >>>> why we got the trap, i.e. due to the poll, and precisely what kind >>>> of poll, so we know what trampoline needs to be taken into the >>>> runtime. >>> >>> Not really, no. If we know that we're in the interpreter and the >>> faulting address is the safepoint poll, then we can read all of the >>> context we need from the interpreter registers and the frame. >> >> That sounds like what I said. > > Not exactly. We do not need to generate any more relocation entries. > >> But the cost of the conditional branch is empirically (this was >> attempted and measured a while ago) approximately the same as the >> indirect load during "normal circumstances". The indirect load was >> only marginally better. > > That's interesting. The cost of the SEGV trap going through the > kernel is fairly high, and I'm now wondering if, for very fast > safepoint responses, we'd be better off not doing it. The cost of the > write protect, given that it probably involves an IPI on all > processors, isn't cheap either. > >>>> While constructing something that does that is indeed possible, it >>>> simply did not seem worth the trouble compared to using a branch in >>>> these paths. The same reasoning applies for the poll performed in >>>> the native wrapper when waking up from native and transitioning into >>>> Java. It performs a conditional branch instead of indirect load to >>>> avoid signal handler logic for polls that are not performance >>>> critical. >>> >>> If we're talking about performance, the existing bytecode interpreter >>> is exquisitely carefully coded, even going to the extent of having >>> multiple dispatch tables for safepoint- and non-safepoint cases. >>> Clearly the original authors weren't thinking that code was not >>> performance critical or they wouldn't have done what they did. 
I >>> suppose, though, that the design we have is from the early days when >>> people diligently strove to make the interpreter as fast as possible. >> >> On the other hand, branches have become a lot faster in "recent" >> years, and this one is particularly trivial to predict. Therefore I >> prefer to base design decisions on empirical measurements. And >> introducing that complexity for an close to insignificantly faster >> interpreter poll does not seem encouraging to me. Do you agree? > > Perhaps. It's interesting that the result falls one way in compiled > code and the other in interpreted code. If the choice is so very > finely balanced, though, it sort-of makes sense. > > -- > Andrew Haley > Java Platform Lead Engineer > Red Hat UK Ltd. > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > From erik.osterlund at oracle.com Fri Oct 27 07:11:32 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 27 Oct 2017 09:11:32 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> Message-ID: <59F2DC24.8050701@oracle.com> Hi Andrew, On 2017-10-26 19:19, Andrew Haley wrote: > On 26/10/17 18:00, Erik Osterlund wrote: >> Hi Andrew, >> >>> On 26 Oct 2017, at 18:05, Andrew Haley wrote: >>> >>>> On 26/10/17 15:39, Erik ?sterlund wrote: >>>> >>>> The reason we do not poll the page in the interpreter is that we >>>> need to generate appropriate relocation entries in the code blob for >>>> the PCs that we poll on, so that we in the signal handler can look >>>> up the code blob, walk the relocation entries, and find precisely >>>> why we got the trap, i.e. due to the poll, and precisely what kind >>>> of poll, so we know what trampoline needs to be taken into the >>>> runtime. >>> Not really, no. If we know that we're in the interpreter and the >>> faulting address is the safepoint poll, then we can read all of the >>> context we need from the interpreter registers and the frame. >> That sounds like what I said. > Not exactly. We do not need to generate any more relocation entries. Maybe. >> But the cost of the conditional branch is empirically (this was >> attempted and measured a while ago) approximately the same as the >> indirect load during "normal circumstances". The indirect load was >> only marginally better. > That's interesting. The cost of the SEGV trap going through the > kernel is fairly high, and I'm now wondering if, for very fast > safepoint responses, we'd be better off not doing it. The cost of the > write protect, given that it probably involves an IPI on all > processors, isn't cheap either. The current mechanism does not use mprotect to stop threads. It has one global trapping page and one global not trapping page. It simply performs stores to flip the polling word to point at the trapping page. 
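In code, arming a thread is just a plain pointer store; a minimal standalone sketch (invented names, plain POSIX mmap, error handling omitted, not the actual HotSpot implementation):

  #include <sys/mman.h>
  #include <cstdint>

  static void* good_page;   // mapped PROT_READ: loads through it succeed
  static void* bad_page;    // mapped PROT_NONE: loads through it trap

  struct ThreadSketch {
    volatile uintptr_t polling_page;    // what the generated poll loads through
  };

  void init_pages() {
    good_page = mmap(nullptr, 4096, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    bad_page  = mmap(nullptr, 4096, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  }

  // Arming/disarming is a store to the thread's polling word, not an mprotect:
  void arm(ThreadSketch* t)    { t->polling_page = (uintptr_t)bad_page; }
  void disarm(ThreadSketch* t) { t->polling_page = (uintptr_t)good_page; }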
So I am not so concerned about TLB shootdown costs here. As for the SEGV, the mechanism was stress tested (shooting handshakes on all threads continuously) to see how expensive the SEGV was, and the outcome was that it was surprisingly cheap. So we did not pursue making the slow path faster. > >>>> While constructing something that does that is indeed possible, it >>>> simply did not seem worth the trouble compared to using a branch in >>>> these paths. The same reasoning applies for the poll performed in >>>> the native wrapper when waking up from native and transitioning into >>>> Java. It performs a conditional branch instead of indirect load to >>>> avoid signal handler logic for polls that are not performance >>>> critical. >>> If we're talking about performance, the existing bytecode interpreter >>> is exquisitely carefully coded, even going to the extent of having >>> multiple dispatch tables for safepoint- and non-safepoint cases. >>> Clearly the original authors weren't thinking that code was not >>> performance critical or they wouldn't have done what they did. I >>> suppose, though, that the design we have is from the early days when >>> people diligently strove to make the interpreter as fast as possible. >> On the other hand, branches have become a lot faster in "recent" >> years, and this one is particularly trivial to predict. Therefore I >> prefer to base design decisions on empirical measurements. And >> introducing that complexity for an close to insignificantly faster >> interpreter poll does not seem encouraging to me. Do you agree? > Perhaps. It's interesting that the result falls one way in compiled > code and the other in interpreted code. If the choice is so very > finely balanced, though, it sort-of makes sense. Yeah. I wrote about that decision to use indirect load instead of conditional branch in compiled code in an email to Paul if you are interested. Thanks, /Erik From david.holmes at oracle.com Fri Oct 27 07:23:34 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 27 Oct 2017 17:23:34 +1000 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> Message-ID: <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> Hi Coleen, Thanks for tackling this. > Summary: removed hotspot version of jvm*h and jni*h files Can you update the bug synopsis to show it covers both sets of files please. I hate to start with this (and it took me quite a while to realize it) but as Mandy pointed out jvm.h is not an exported interface from the JDK to the outside world (so not subject to CSR review), but is a private interface between the JVM and the JDK libraries. So I think really jvm.h belongs in the hotspot sources where it was, while jni.h belongs in the exported JDK sources. In which case the bulk of your changes to the hotspot files would not be needed - sorry. Moving on ... First to address the initial comments/query you had: > The JDK windows jni_md.h file defined jint as long and the hotspot > windows jni_x86.h as int. I had to choose the jdk version since it's the > public version, so there are changes to the hotspot files for this. On Windows int and long are always the same as it uses ILP32 or LLP64 (not LP64 like *nix platforms). So either choice should be fine. 
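As a standalone illustration of the data-model point (the sizes agree on Windows, but the types stay distinct, which is what the template and overload issues discussed below are about):

  #include <type_traits>

  #ifdef _WIN64
  static_assert(sizeof(int) == sizeof(long),
                "LLP64: int and long are both 32-bit on Windows");
  #endif
  static_assert(!std::is_same<int, long>::value,
                "but they remain distinct types for overload resolution and "
                "template argument deduction");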
That said there are some odd casting issues I comment on below. Does the VS compiler complain about mixing int and long in expressions?

> Generally I changed the code to use 'int' rather than 'jint' where the
> surrounding API didn't insist on consistently using java types. We
> should mostly be using C++ types within hotspot except in interfaces to
> native/JNI code.

I think you pulled too hard on a few threads here and things are starting to unravel. There are numerous cases I refer to below where either the cast seems unnecessary/inappropriate or else highlights a bunch of additional changes that also need to be made. The fan out from this could be horrendous. Unless you actually get some kind of error - and I'd like to understand the details of those - I would not suggest making these changes as part of this work.

Looking through I have quite a few queries/comments - apologies in advance as I know how tedious this is:

make/hotspot/lib/CompileLibjsig.gmk
src/java.base/solaris/native/libjsig/jsig.c

Took a while to figure out why the include was needed. :) As a follow up I suggest just deleting the -I include directive, delete the Solaris-only definition of JSIG_VERSION_1_4_1, and delete everything to do with JVM_get_libjsig_version. It is all obsolete.

---

src/hotspot/cpu/arm/interp_masm_arm.cpp

Why did you need to add the jvm.h include?

---

src/hotspot/os/windows/os_windows.cpp

The type of process_exiting should be uint to match the DWORD of GetCurrentThreadId. Then you shouldn't need any casts. Also you missed this jint cast:

3796         process_exiting != (jint)GetCurrentThreadId()) {

---

src/hotspot/share/c1/c1_Canonicalizer.hpp

  43 #ifdef _WINDOWS
  44   // jint is defined as long in jni_md.h, so convert from int to jint
  45   void set_constant(int x)                       { set_constant((jint)x); }
  46 #endif

Why is this necessary? int and long are the same on Windows. The whole point is that jint hides the underlying type, so where does this go wrong?

---

src/hotspot/share/c1/c1_LinearScan.cpp

 ConstantIntValue((jint)0);

why is this cast needed? what causes the ambiguity? (If this was a template I'd understand ;-) ). Also didn't you change that constructor to take an int anyway - not that I think it should - see below.

---

src/hotspot/share/ci/ciReplay.cpp

793         jint* dims = NEW_RESOURCE_ARRAY(jint, rank);

why should this be jint?

---

src/hotspot/share/classfile/altHashing.cpp

Okay this looks more consistent with jint.

---

src/hotspot/share/code/debugInfo.hpp

These changes seem wrong. We have:

ConstantLongValue(jlong value)
ConstantDoubleValue(jdouble value)

so we should have:

ConstantIntValue(jint value)

---

src/hotspot/share/code/relocInfo.cpp

Change seems unnecessary - int32_t is fine

---

src/hotspot/share/compiler/compileBroker.cpp
src/hotspot/share/compiler/compileBroker.hpp

I see a complete mix of int and jint in this class, so why make the one change you did ??

---

src/hotspot/share/jvmci/jvmciCompilerToVM.cpp

1700     tty->write((char*) start, MIN2(length, (jint)O_BUFLEN));

why did you need to add the jint cast? It's used without any cast on the next two lines:

1701     length -= O_BUFLEN;
1702     offset += O_BUFLEN;

??

---

src/hotspot/share/jvmci/jvmciRuntime.cpp

Looking around this code it seems very confused about types - eg the previous function is declared jboolean yet returns a jint on one path! It isn't clear to me if the return type is what should be changed or the parameter type? I would just leave this alone.
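For what it's worth, the two C++ issues that keep turning up in the comments above can be reproduced standalone roughly like this (invented names, not the HotSpot declarations):

  typedef long jint_like;   // the Windows jni_md.h spelling in question

  // 1) A literal 0 becomes ambiguous once one overload takes 'long' and
  //    another takes a pointer: both require a standard conversion.
  struct ScopeValueLike {};
  void make_constant(jint_like v)       {}
  void make_constant(ScopeValueLike* p) {}

  // 2) Template argument deduction needs both arguments to have the same type.
  template <typename T> T min2_like(T a, T b) { return a < b ? a : b; }

  void demo() {
    make_constant((jint_like)0);      // OK: exact match after the cast
    // make_constant(0);              // error: ambiguous (long vs. pointer)
    (void)min2_like(4L, (long)3);     // OK once both arguments are 'long'
    // (void)min2_like(4L, 3);        // error: T deduced as 'long' and 'int'
  }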
--- src/hotspot/share/opto/mulnode.cpp Okay TypeInt has jint parts, so the remaining int32_t declarations (A, B, C, D) should also be jint. --- src/hotspot/share/opto/parse3.cpp I agree with the changes you made, but then: 419 jint dim_con = find_int_con(length[j], -1); should also be changed. And obviously MultiArrayExpandLimit should be defined as int not intx! --- src/hotspot/share/opto/phaseX.cpp I can see that intcon(jint i) is consistent with longcon(jlong l), but the use of "i" in the code is more consistent with int than jint. --- src/hotspot/share/opto/type.cpp 1505 int TypeInt::hash(void) const { 1506 return java_add(java_add(_lo, _hi), java_add((jint)_widen, (jint)Type::Int)); 1507 } I can see that the (jint) casts you added make sense, but then the whole function should be returning jint not int. Ditto the other hash functions. --- src/hotspot/share/prims/jni.cpp I think vm_created should be a bool. In fact all the fields you changed are logically bools - do Atomics work for bool now? --- src/hotspot/share/prims/jvm.cpp is_attachable is the terminology used in the JDK code. --- src/hotspot/share/prims/jvmtiEnvBase.cpp src/hotspot/share/prims/jvmtiImpl.cpp Are you making parameters consistent with the fields they initialize? --- src/hotspot/share/prims/jvmtiTagMap.cpp There is a mix of int and jint for slot in this code. You fixed some, but this remains: 2440 inline bool CallbackInvoker::report_stack_ref_root(jlong thread_tag, 2441 jlong tid, 2442 jint depth, 2443 jmethodID method, 2444 jlocation bci, 2445 jint slot, --- src/hotspot/share/runtime/perfData.cpp Callers pass both jint and int, so param type seems arbitrary. --- src/hotspot/share/runtime/perfMemory.cpp src/hotspot/share/runtime/perfMemory.hpp PerfMemory::_initialized should ideally be a bool - can OrderAccess handle that now? --- src/java.base/share/native/include/jvm.h Not clear why the jio functions are not also JNICALL ? --- src/java.base/unix/native/include/jni_md.h There is no need to special case ARM. The differences in the existing code were for LTO support and that is now irrelevant. --- src/java.base/unix/native/include/jvm_md.h I know you've just copied this across, but it seems wrong to me: 57 // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on others. This may 58 // cause problems if JVM and the rest of JDK are built on different 59 // Linux releases. Here we define JVM_MAXPATHLEN to be MAXPATHLEN + 1, 60 // so buffers declared in VM are always >= 4096. 61 #define JVM_MAXPATHLEN MAXPATHLEN + 1 It doesn't make sense to me to define an internal "max path length" that can _exceed_ the platform max! That aside there's no support for building different parts of the JDK on different platforms and then bringing them together. And in any case I would think the real problem would be building on a platform that uses 4096 and running on one that uses 4095! But that aside this is a Linux hack and should be guarded by ifdef LINUX. (I doubt BSD needs it, the bsd file is just a copy of the linux one - the JDK macosx version does the right thing). Solaris and AIX should stay as-is at MAXPATHLEN. 86 #define ASYNC_SIGNAL SIGJVM2 This only exists on Solaris so I think should be in #ifdef SOLARIS, to make that clear. --- src/java.base/windows/native/include/jvm_md.h Given the differences between the two versions either something has been broken or "extern C" declarations are not needed :) --- That was a really painful way to spend most of my Friday. TGIF! 
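Concretely, the jvm_md.h suggestion above would look something like this (a sketch of the suggestion only, not a tested patch):

  #include <sys/param.h>

  #ifdef LINUX
  // Linux headers have disagreed on 4095 vs 4096 over the years, so pad by
  // one to keep buffers declared in the VM at >= 4096.
  #define JVM_MAXPATHLEN (MAXPATHLEN + 1)
  #else
  // Solaris, AIX, macOS: just use the platform value.
  #define JVM_MAXPATHLEN MAXPATHLEN
  #endif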
:) Thanks, David ----- On 27/10/2017 6:44 AM, coleen.phillimore at oracle.com wrote: > ?Hi Magnus, > > Thank you for reviewing this.?? I have a new version that takes out the > hack in globalDefinitions.hpp and adds casts to > src/hotspot/share/opto/type.cpp instead. > > Also some fixes from Martin at SAP. > > open webrev at http://cr.openjdk.java.net/~coleenp/8189610.02/webrev > > see below. > > On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: >> Coleen, >> >> Thank you for addressing this! >> >> On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >>> Summary: removed hotspot version of jvm*h and jni*h files >>> >>> Mostly used sed to remove prims/jvm.h and move #include "jvm.h" after >>> precompiled.h, so if you have repetitive stress wrist issues don't >>> click on most of these files. >>> >>> There were more issues to resolve, however.? The JDK windows jni_md.h >>> file defined jint as long and the hotspot windows jni_x86.h as int. I >>> had to choose the jdk version since it's the public version, so there >>> are changes to the hotspot files for this. Generally I changed the >>> code to use 'int' rather than 'jint' where the surrounding API didn't >>> insist on consistently using java types. We should mostly be using >>> C++ types within hotspot except in interfaces to native/JNI code. >>> There are a couple of hacks in places where adding multiple jint >>> casts was too painful. >>> >>> Tested with JPRT and tier2-4 (in progress). >>> >>> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.01/webrev >> >> Looks great! >> >> Just a few comments: >> >> * src/java.base/unix/native/include/jni_md.h: >> >> I don't think the externally_visible attribute should be there for >> arm. I know this was the case for the corresponding hotspot file for >> arm, but that was techically incorrect. The proper dependency here is >> that externally_visible should be in all JNIEXPORT if and only if >> we're building with JVM feature "link-time-opt". Traditionally, that >> feature been enabled when building arm32 builds, and only then, so >> there's been a (coincidentally) connection here. Nowadays, Oracle does >> not care about the arm32 builds, and I'm not sure if anyone else is >> building them with link-time-opt enabled. >> >> It does seem wrong to me to export this behavior in the public >> jni_md.h file, though. I think the correct way to solve this, if we >> should continue supporting link-time-opt is to make sure this >> attribute is set for exported hotspot functions. If it's still needed, >> that is. A quick googling seems to indicate that visibility("default") >> might be enough in modern gcc's. >> >> A third option is to remove the support for link-time-opt entirely, if >> it's not really used. > > I didn't know how to change this since we are still building ARM with > the jdk10/hs repository, and ARM needed this change.? I could wait until > we bring down the jdk10/master changes that remove the ARM build and > remove this conditional before I push.? Or we could file an RFE to > remove link-time-opt (?) and remove it then? > >> >> * src/java.base/unix/native/include/jvm_md.h and >> src/java.base/windows/native/include/jvm_md.h: >> >> These files define a public API, and contain non-trivial changes. I >> suspect you should file a CSR request. (Even though I realize you're >> only matching the header file with the reality.) >> > > I filed the CSR.?? Waiting for the next steps. 
> > Thanks, > Coleen > >> /Magnus >> >>> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >>> >>> I have a script to update copyright files on commit. >>> >>> Thanks to Magnus and ErikJ for the makefile changes. >>> >>> Thanks, >>> Coleen >>> >> > From aph at redhat.com Fri Oct 27 08:26:18 2017 From: aph at redhat.com (Andrew Haley) Date: Fri, 27 Oct 2017 09:26:18 +0100 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <59F2DC24.8050701@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <15dd917732444959b7785efbe6640952@sap.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> <59F2DC24.8050701@oracle.com> Message-ID: On 27/10/17 08:11, Erik ?sterlund wrote: > The current mechanism does not use mprotect to stop threads. Eh? Sure it does: you're talking about the new, proposed mechanism that's the subject of this patch, surely. > It has one global trapping page and one global not trapping page. It > simply performs stores to flip the polling word to point at the > trapping page. So I am not so concerned about TLB shootdown costs > here. As for the SEGV, the mechanism was stress tested (shooting > handshakes on all threads continuously) to see how expensive the > SEGV was, and the outcome was that it was surprisingly cheap. So we > did not pursue making the slow path faster. Interesting. It's a lot of code. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From erik.osterlund at oracle.com Fri Oct 27 08:36:42 2017 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 27 Oct 2017 10:36:42 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> <59F2DC24.8050701@oracle.com> Message-ID: <59F2F01A.403@oracle.com> Hi Andrew, On 2017-10-27 10:26, Andrew Haley wrote: > On 27/10/17 08:11, Erik ?sterlund wrote: > >> The current mechanism does not use mprotect to stop threads. > Eh? Sure it does: you're talking about the new, proposed mechanism > that's the subject of this patch, surely. Yes indeed. Sorry, I was unclear. > >> It has one global trapping page and one global not trapping page. 
It >> simply performs stores to flip the polling word to point at the >> trapping page. So I am not so concerned about TLB shootdown costs >> here. As for the SEGV, the mechanism was stress tested (shooting >> handshakes on all threads continuously) to see how expensive the >> SEGV was, and the outcome was that it was surprisingly cheap. So we >> did not pursue making the slow path faster. > Interesting. It's a lot of code. :) Thanks, /Erik From serguei.spitsyn at oracle.com Fri Oct 27 08:52:54 2017 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 27 Oct 2017 01:52:54 -0700 Subject: RFR: SA: JDK-8189798: SA cleanup - part 1 In-Reply-To: References: <18501902-23db-de6c-b83d-640cd33df836@oracle.com> Message-ID: <691d8166-5395-906a-4256-ef0ab2e2773a@oracle.com> Hi Jini, The fix looks good to me. Thanks, Serguei On 10/24/17 00:31, Jini George wrote: > Adding hotspot-dev too. > > Thanks, > Jini. > > On 10/24/2017 12:05 PM, Jini George wrote: >> Hello, >> >> As a part of SA next, I am working on writing a test case which >> compares the fields and the types of the fields of the SA java >> classes with the corresponding entries in the vmStructs tables. This, >> to some extent, would help in preventing errors in SA due to the >> changes in hotspot. As a precursor to this, I am in the process of >> making some cleanup related changes (mostly in SA). I plan to have >> the changes done in parts. For this webrev, most of the changes are for: >> >> 1. Avoiding having some values being redefined in SA. Instead have >> those exported through vmStructs, and read it in SA. >> (CompactibleFreeListSpace::_min_chunk_size_in_bytes, >> CompactibleFreeListSpace::IndexSetSize) >> >> Redefinition of hotspot values in SA makes SA error prone, when the >> value gets altered in hotspot and the corresponding modification gets >> missed out in SA. >> >> 2. To remove some unused code (JNIid.java). >> 3. Add the missing "CMSBitMap::_bmStartWord" in vmStructs. >> 4. Modify variable names in SA and hotspot to match the counterpart >> names, so that the comparison of the fields become easier. Most of >> the changes belong to this group. >> >> Could I please get reviews done for these precursor changes ? >> >> JBS Id: https://bugs.openjdk.java.net/browse/JDK-8189798 >> webrev: http://cr.openjdk.java.net/~jgeorge/8189798/webrev.00/ >> >> Thank you, >> Jini. >> From coleen.phillimore at oracle.com Fri Oct 27 11:12:06 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 27 Oct 2017 07:12:06 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <15a07ec6-3fc3-f757-1711-8d088d194115@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <8e157a28-5397-95c1-03dc-de6d0d3d37e8@oracle.com> <15a07ec6-3fc3-f757-1711-8d088d194115@oracle.com> Message-ID: <77e1ab82-4307-671f-1ca8-fd7f8a557b2c@oracle.com> Thank you for reviewing this, Mandy! Coleen On 10/26/17 5:27 PM, mandy chung wrote: > > > On 10/26/17 1:34 PM, coleen.phillimore at oracle.com wrote: >> >> >> On 10/26/17 2:47 PM, mandy chung wrote: >>> >>> >>> On 10/26/17 2:57 AM, Magnus Ihse Bursie wrote: >>>> A third option is to remove the support for link-time-opt entirely, >>>> if it's not really used. 
>>>> >>>> * src/java.base/unix/native/include/jvm_md.h and >>>> src/java.base/windows/native/include/jvm_md.h: >>>> >>>> These files define a public API, and contain non-trivial changes. I >>>> suspect you should file a CSR request. (Even though I realize >>>> you're only matching the header file with the reality.) >>> >>> jvm.h and jvm_md.h are not public API and they are not copied to the >>> $JAVA_HOME/includes directly.? This does raise the question that >>> jvm*.h may belong to other location than >>> src/java.base/{share,$OS}/native/include. >> >> I'm not sure where else it would go honestly, but it could be moved >> outside this changeset.? The good thing about where it is, is that >> the -I directives in the makefiles find both jni.h and jvm.h. > > I agree we should keep this location for this change (the location is > a separate issue).? I reviewed the change that looks good to me. > > Mandy From magnus.ihse.bursie at oracle.com Fri Oct 27 11:44:53 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Fri, 27 Oct 2017 13:44:53 +0200 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> Message-ID: On 2017-10-26 22:44, coleen.phillimore at oracle.com wrote: > ?Hi Magnus, > > Thank you for reviewing this.?? I have a new version that takes out > the hack in globalDefinitions.hpp and adds casts to > src/hotspot/share/opto/type.cpp instead. > > Also some fixes from Martin at SAP. > > open webrev at http://cr.openjdk.java.net/~coleenp/8189610.02/webrev > > see below. > > On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: >> Coleen, >> >> Thank you for addressing this! >> >> On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >>> Summary: removed hotspot version of jvm*h and jni*h files >>> >>> Mostly used sed to remove prims/jvm.h and move #include "jvm.h" >>> after precompiled.h, so if you have repetitive stress wrist issues >>> don't click on most of these files. >>> >>> There were more issues to resolve, however.? The JDK windows >>> jni_md.h file defined jint as long and the hotspot windows jni_x86.h >>> as int.? I had to choose the jdk version since it's the public >>> version, so there are changes to the hotspot files for this. >>> Generally I changed the code to use 'int' rather than 'jint' where >>> the surrounding API didn't insist on consistently using java types. >>> We should mostly be using C++ types within hotspot except in >>> interfaces to native/JNI code. There are a couple of hacks in places >>> where adding multiple jint casts was too painful. >>> >>> Tested with JPRT and tier2-4 (in progress). >>> >>> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.01/webrev >> >> Looks great! >> >> Just a few comments: >> >> * src/java.base/unix/native/include/jni_md.h: >> >> I don't think the externally_visible attribute should be there for >> arm. I know this was the case for the corresponding hotspot file for >> arm, but that was techically incorrect. The proper dependency here is >> that externally_visible should be in all JNIEXPORT if and only if >> we're building with JVM feature "link-time-opt". Traditionally, that >> feature been enabled when building arm32 builds, and only then, so >> there's been a (coincidentally) connection here. 
Nowadays, Oracle >> does not care about the arm32 builds, and I'm not sure if anyone else >> is building them with link-time-opt enabled. >> >> It does seem wrong to me to export this behavior in the public >> jni_md.h file, though. I think the correct way to solve this, if we >> should continue supporting link-time-opt is to make sure this >> attribute is set for exported hotspot functions. If it's still >> needed, that is. A quick googling seems to indicate that >> visibility("default") might be enough in modern gcc's. >> >> A third option is to remove the support for link-time-opt entirely, >> if it's not really used. > > I didn't know how to change this since we are still building ARM with > the jdk10/hs repository, and ARM needed this change.? I could wait > until we bring down the jdk10/master changes that remove the ARM build > and remove this conditional before I push.? Or we could file an RFE to > remove link-time-opt (?) and remove it then? I'm looking into the link-time-opt issue right now. I think it boils down to us using an incorrect flag to gcc when linking, -fwhole-program, when -fuse-linker-plugin should have been used. This caused all exported symbols to disappear unless they were attributed with externally_visible, which makes sense for a program but not a shared library. I'm currently trying to verify that -fuse-linker-plugin removes the need for the externally_visible attribute when using link-time-opt. If it does, I'll open a separate bug to fix that, and if I push that first, you can safely delete the externally_visible attributes. /Magnus > >> >> * src/java.base/unix/native/include/jvm_md.h and >> src/java.base/windows/native/include/jvm_md.h: >> >> These files define a public API, and contain non-trivial changes. I >> suspect you should file a CSR request. (Even though I realize you're >> only matching the header file with the reality.) >> > > I filed the CSR.?? Waiting for the next steps. > > Thanks, > Coleen > >> /Magnus >> >>> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >>> >>> I have a script to update copyright files on commit. >>> >>> Thanks to Magnus and ErikJ for the makefile changes. >>> >>> Thanks, >>> Coleen >>> >> > From coleen.phillimore at oracle.com Fri Oct 27 12:13:12 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 27 Oct 2017 08:13:12 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> Message-ID: On 10/27/17 3:23 AM, David Holmes wrote: > Hi Coleen, > > Thanks for tackling this. > >> Summary: removed hotspot version of jvm*h and jni*h files > > Can you update the bug synopsis to show it covers both sets of files > please. > > I hate to start with this (and it took me quite a while to realize it) > but as Mandy pointed out jvm.h is not an exported interface from the > JDK to the outside world (so not subject to CSR review), but is a > private interface between the JVM and the JDK libraries. So I think > really jvm.h belongs in the hotspot sources where it was, while jni.h > belongs in the exported JDK sources. In which case the bulk of your > changes to the hotspot files would not be needed - sorry. 
Maybe someone can make that decision and change at a later date. The point of this change is that there is now only one of these files that is shared. I don't think jvm.h and the jvm_md.h belong in the hotspot sources for the jdk to find them in some random prims and os dependent directories.

I'm happy to withdraw the CSR. We generally use the CSR process to add and remove JVM_ interfaces even though they're a private interface, in case some other JVM/JDK combination relies on them. The changes to these files are very minor though and not likely to cause any even theoretical incompatibility, so I'll withdraw it.

>
> Moving on ...
>
> First to address the initial comments/query you had:
>
>> The JDK windows jni_md.h file defined jint as long and the hotspot
>> windows jni_x86.h as int. I had to choose the jdk version since it's the
>> public version, so there are changes to the hotspot files for this.
>
> On Windows int and long are always the same as it uses ILP32 or LLP64
> (not LP64 like *nix platforms). So either choice should be fine. That
> said there are some odd casting issues I comment on below. Does the VS
> compiler complain about mixing int and long in expressions?

Yes, it does even though int and long are the same representation.

>
>> Generally I changed the code to use 'int' rather than 'jint' where the
>> surrounding API didn't insist on consistently using java types. We
>> should mostly be using C++ types within hotspot except in interfaces to
>> native/JNI code.
>
> I think you pulled too hard on a few threads here and things are
> starting to unravel. There are numerous cases I refer to below where
> either the cast seems unnecessary/inappropriate or else highlights a
> bunch of additional changes that also need to be made. The fan out
> from this could be horrendous. Unless you actually get some kind of
> error - and I'd like to understand the details of those - I would not
> suggest making these changes as part of this work.

I didn't make any change unless there was an error. I have 100 failed JPRT jobs to confirm! I eventually got a Windows system to compile and test this on. Actually some of the changes came out better. Cases where we use jint as a bool simply turned to int. We do not have an overload for bool for cmpxchg.

>
> Looking through I have quite a few queries/comments - apologies in
> advance as I know how tedious this is:
>
> make/hotspot/lib/CompileLibjsig.gmk
> src/java.base/solaris/native/libjsig/jsig.c
>
> Took a while to figure out why the include was needed. :) As a follow
> up I suggest just deleting the -I include directive, delete the
> Solaris-only definition of JSIG_VERSION_1_4_1, and delete everything
> to do with JVM_get_libjsig_version. It is all obsolete.

Can I patch up jsig in a separate RFE? I don't remember why this broke so I simply moved the JSIG #define. Is jsig obsolete? Removing JVM_* definitions generally requires a CSR.

>
> ---
>
> src/hotspot/cpu/arm/interp_masm_arm.cpp
>
> Why did you need to add the jvm.h include?
>
  tbz(Raccess_flags, JVM_ACC_SYNCHRONIZED_BIT, unlocked);

---

>
> src/hotspot/os/windows/os_windows.cpp
>
> The type of process_exiting should be uint to match the DWORD of
> GetCurrentThreadId. Then you shouldn't need any casts. Also you missed
> this jint cast:
>
> 3796         process_exiting != (jint)GetCurrentThreadId()) {

Yes, that's better, to change process_exiting to a DWORD. It needs a DWORD cast to 0 in the cmpxchg.

Atomic::cmpxchg(GetCurrentThreadId(), &process_exiting, (DWORD)0); These templates are picky. > > --- > > src/hotspot/share/c1/c1_Canonicalizer.hpp > > ? 43 #ifdef _WINDOWS > ? 44?? // jint is defined as long in jni_md.h, so convert from int to > jint > ? 45?? void set_constant(int x)?????????????????????? { > set_constant((jint)x); } > ? 46 #endif > > Why is this necessary? int and long are the same on Windows. The whole > point is that jint hides the underlying type, so where does this go > wrong? No, they are not the same types even though they have the same representation! > > --- > > src/hotspot/share/c1/c1_LinearScan.cpp > > ?ConstantIntValue((jint)0); > > why is this cast needed? what causes the ambiguity? (If this was a > template I'd understand ;-) ). Also didn't you change that constructor > to take an int anyway - not that I think it should - see below. Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't match 'long' better than any pointer type.? So this cast is needed. > > --- > > src/hotspot/share/ci/ciReplay.cpp > > 793???????? jint* dims = NEW_RESOURCE_ARRAY(jint, rank); > > why should this be jint? To avoid a cast from int* to jint* in the line below: value = kelem->multi_allocate(rank, dims, CHECK); > > --- > > src/hotspot/share/classfile/altHashing.cpp > > Okay this looks more consistent with jint. Yes.? I translated this from some native code iirc. > > --- > > src/hotspot/share/code/debugInfo.hpp > > These changes seem wrong. We have: > > ConstantLongValue(jlong value) > ConstantDoubleValue(jdouble value) > > so we should have: > > ConstantIntValue(jint value) Again, there are multiple call sites with '0', which match int trivially but are confused with long.? It's less consistent I agree but better to not cast all the call sites. > > --- > > src/hotspot/share/code/relocInfo.cpp > > Change seems unnecessary - int32_t is fine > No, int32_t doesn't match the calls below it.? They all assume _lo and _hi are jint. > --- > > src/hotspot/share/compiler/compileBroker.cpp > src/hotspot/share/compiler/compileBroker.hpp > > I see a complete mix of int and jint in this class, so why make the > one change you did ?? This is another case of using jint as a flag with cmpxchg.? The templates for cmpxchg want the types to match and 0 and 1 are essentially 'int'.? This is a lot cleaner this way. > > --- > > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp > > 1700???? tty->write((char*) start, MIN2(length, (jint)O_BUFLEN)); > > why did you need to add the jint cast? It's used without any cast on > the next two lines: > > 1701???? length -= O_BUFLEN; > 1702???? offset += O_BUFLEN; > There's a conversion from O_BUFLEN from int to long in 1701 and 1702.?? MIN2 is a template that wants the types to match exactly. > ?? > > --- > > src/hotspot/share/jvmci/jvmciRuntime.cpp > > Looking around this code it seems very confused about types - eg the > previous function is declared jboolean yet returns a jint on one path! > It isn't clear to me if the return type is what should be changed or > the parameter type? I would just leave this alone. I can't leave it alone because it doesn't compile that way.? This was the minimal change and yea, does look a bit inconsistent. > > --- > > src/hotspot/share/opto/mulnode.cpp > > Okay TypeInt has jint parts, so the remaining int32_t declarations (A, > B, C, D) should also be jint. Yes.? c2 uses jint types. > > --- > > src/hotspot/share/opto/parse3.cpp > > I agree with the changes you made, but then: > > ?419???? 
jint dim_con = find_int_con(length[j], -1); > > should also be changed. > > And obviously MultiArrayExpandLimit should be defined as int not intx! Everything in globals.hpp is intx.? That's a thread that I don't want to pull on! Changed dim_con to int. > > --- > > src/hotspot/share/opto/phaseX.cpp > > I can see that intcon(jint i) is consistent with longcon(jlong l), but > the use of "i" in the code is more consistent with int than jint. huh?? really? > > --- > > src/hotspot/share/opto/type.cpp > > 1505 int TypeInt::hash(void) const { > 1506?? return java_add(java_add(_lo, _hi), java_add((jint)_widen, > (jint)Type::Int)); > 1507 } > > I can see that the (jint) casts you added make sense, but then the > whole function should be returning jint not int. Ditto the other hash > functions. I'm not messing with this, this is the minimal in type fixing that I'm going to do here. > > --- > > src/hotspot/share/prims/jni.cpp > > I think vm_created should be a bool. In fact all the fields you > changed are logically bools - do Atomics work for bool now? No, they do not.?? I had thought bool would be better originally too. > > --- > > src/hotspot/share/prims/jvm.cpp > > is_attachable is the terminology used in the JDK code. Well the JDK version had is_attach_supported() as the flag name so I used that in this one place. > > --- > > src/hotspot/share/prims/jvmtiEnvBase.cpp > src/hotspot/share/prims/jvmtiImpl.cpp > > Are you making parameters consistent with the fields they initialize? They're consistent with the declarations now. > > --- > > src/hotspot/share/prims/jvmtiTagMap.cpp > > There is a mix of int and jint for slot in this code. You fixed some, > but this remains: > > 2440 inline bool CallbackInvoker::report_stack_ref_root(jlong thread_tag, > 2441??????????????????????????????????????????????????? jlong tid, > 2442??????????????????????????????????????????????????? jint depth, > 2443??????????????????????????????????????????????????? jmethodID method, > 2444??????????????????????????????????????????????????? jlocation bci, > 2445??????????????????????????????????????????????????? jint slot, Right for consistency with the declarations. > > --- > > src/hotspot/share/runtime/perfData.cpp > > Callers pass both jint and int, so param type seems arbitrary. They are, but importantly they match the declarations. > > --- > > src/hotspot/share/runtime/perfMemory.cpp > src/hotspot/share/runtime/perfMemory.hpp > > PerfMemory::_initialized should ideally be a bool - can OrderAccess > handle that now? Nope. > > --- > > src/java.base/share/native/include/jvm.h > > Not clear why the jio functions are not also JNICALL ? They are now.? The JDK version didn't have JNICALL.? JVM needs JNICALL.? I can't tell you why JDK didn't need JNICALL linkage. > > --- > > src/java.base/unix/native/include/jni_md.h > > There is no need to special case ARM. The differences in the existing > code were for LTO support and that is now irrelevant. See discussion with Magnus.?? We still build ARM for jdk10/hs so I needed this conditional or of course I wouldn't have added it.? We can remove it with LTO support. > > --- > > src/java.base/unix/native/include/jvm_md.h > > I know you've just copied this across, but it seems wrong to me: > > ?57 // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on others. This > may > ? 58 //?????? cause problems if JVM and the rest of JDK are built on > different > ? 59 //?????? Linux releases. Here we define JVM_MAXPATHLEN to be > MAXPATHLEN + 1, > ? 60 //?????? 
so buffers declared in VM are always >= 4096. > ? 61 #define JVM_MAXPATHLEN MAXPATHLEN + 1 > > It doesn't make sense to me to define an internal "max path length" > that can _exceed_ the platform max! > > That aside there's no support for building different parts of the JDK > on different platforms and then bringing them together. And in any > case I would think the real problem would be building on a platform > that uses 4096 and running on one that uses 4095! > > But that aside this is a Linux hack and should be guarded by ifdef > LINUX. (I doubt BSD needs it, the bsd file is just a copy of the linux > one - the JDK macosx version does the right thing). Solaris and AIX > should stay as-is at MAXPATHLEN. All of the unix platforms had MAXPATHLEN+1.? I'll leave it for now and we can investigate that further. > > ?86 #define ASYNC_SIGNAL???? SIGJVM2 > > This only exists on Solaris so I think should be in #ifdef SOLARIS, to > make that clear. Ok.? I'll add this. > > --- > > src/java.base/windows/native/include/jvm_md.h > > Given the differences between the two versions either something has > been broken or "extern C" declarations are not needed :) Well, they are needed for Hotspot to build and do not prevent jdk from building.? I don't know what was broken. > > --- > > That was a really painful way to spend most of my Friday. TGIF! :) Thanks for going through it.? See comments inline for changes. Generating a webrev takes hours so I'm not going to do that unless you insist. Thanks, Coleen > > Thanks, > David > ----- > > > On 27/10/2017 6:44 AM, coleen.phillimore at oracle.com wrote: >> ??Hi Magnus, >> >> Thank you for reviewing this.?? I have a new version that takes out >> the hack in globalDefinitions.hpp and adds casts to >> src/hotspot/share/opto/type.cpp instead. >> >> Also some fixes from Martin at SAP. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.02/webrev >> >> see below. >> >> On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: >>> Coleen, >>> >>> Thank you for addressing this! >>> >>> On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >>>> Summary: removed hotspot version of jvm*h and jni*h files >>>> >>>> Mostly used sed to remove prims/jvm.h and move #include "jvm.h" >>>> after precompiled.h, so if you have repetitive stress wrist issues >>>> don't click on most of these files. >>>> >>>> There were more issues to resolve, however.? The JDK windows >>>> jni_md.h file defined jint as long and the hotspot windows >>>> jni_x86.h as int. I had to choose the jdk version since it's the >>>> public version, so there are changes to the hotspot files for this. >>>> Generally I changed the code to use 'int' rather than 'jint' where >>>> the surrounding API didn't insist on consistently using java types. >>>> We should mostly be using C++ types within hotspot except in >>>> interfaces to native/JNI code.? There are a couple of hacks in >>>> places where adding multiple jint casts was too painful. >>>> >>>> Tested with JPRT and tier2-4 (in progress). >>>> >>>> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.01/webrev >>> >>> Looks great! >>> >>> Just a few comments: >>> >>> * src/java.base/unix/native/include/jni_md.h: >>> >>> I don't think the externally_visible attribute should be there for >>> arm. I know this was the case for the corresponding hotspot file for >>> arm, but that was techically incorrect. The proper dependency here >>> is that externally_visible should be in all JNIEXPORT if and only if >>> we're building with JVM feature "link-time-opt". 
Traditionally, that >>> feature been enabled when building arm32 builds, and only then, so >>> there's been a (coincidentally) connection here. Nowadays, Oracle >>> does not care about the arm32 builds, and I'm not sure if anyone >>> else is building them with link-time-opt enabled. >>> >>> It does seem wrong to me to export this behavior in the public >>> jni_md.h file, though. I think the correct way to solve this, if we >>> should continue supporting link-time-opt is to make sure this >>> attribute is set for exported hotspot functions. If it's still >>> needed, that is. A quick googling seems to indicate that >>> visibility("default") might be enough in modern gcc's. >>> >>> A third option is to remove the support for link-time-opt entirely, >>> if it's not really used. >> >> I didn't know how to change this since we are still building ARM with >> the jdk10/hs repository, and ARM needed this change.? I could wait >> until we bring down the jdk10/master changes that remove the ARM >> build and remove this conditional before I push. Or we could file an >> RFE to remove link-time-opt (?) and remove it then? >> >>> >>> * src/java.base/unix/native/include/jvm_md.h and >>> src/java.base/windows/native/include/jvm_md.h: >>> >>> These files define a public API, and contain non-trivial changes. I >>> suspect you should file a CSR request. (Even though I realize you're >>> only matching the header file with the reality.) >>> >> >> I filed the CSR.?? Waiting for the next steps. >> >> Thanks, >> Coleen >> >>> /Magnus >>> >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >>>> >>>> I have a script to update copyright files on commit. >>>> >>>> Thanks to Magnus and ErikJ for the makefile changes. >>>> >>>> Thanks, >>>> Coleen >>>> >>> >> From coleen.phillimore at oracle.com Fri Oct 27 12:15:34 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 27 Oct 2017 08:15:34 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> Message-ID: <54c2f8c4-dcec-7b87-024f-65353c91242f@oracle.com> On 10/27/17 7:44 AM, Magnus Ihse Bursie wrote: > > On 2017-10-26 22:44, coleen.phillimore at oracle.com wrote: >> ?Hi Magnus, >> >> Thank you for reviewing this.?? I have a new version that takes out >> the hack in globalDefinitions.hpp and adds casts to >> src/hotspot/share/opto/type.cpp instead. >> >> Also some fixes from Martin at SAP. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.02/webrev >> >> see below. >> >> On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: >>> Coleen, >>> >>> Thank you for addressing this! >>> >>> On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >>>> Summary: removed hotspot version of jvm*h and jni*h files >>>> >>>> Mostly used sed to remove prims/jvm.h and move #include "jvm.h" >>>> after precompiled.h, so if you have repetitive stress wrist issues >>>> don't click on most of these files. >>>> >>>> There were more issues to resolve, however.? The JDK windows >>>> jni_md.h file defined jint as long and the hotspot windows >>>> jni_x86.h as int.? I had to choose the jdk version since it's the >>>> public version, so there are changes to the hotspot files for this. 
>>>> Generally I changed the code to use 'int' rather than 'jint' where >>>> the surrounding API didn't insist on consistently using java types. >>>> We should mostly be using C++ types within hotspot except in >>>> interfaces to native/JNI code. There are a couple of hacks in >>>> places where adding multiple jint casts was too painful. >>>> >>>> Tested with JPRT and tier2-4 (in progress). >>>> >>>> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.01/webrev >>> >>> Looks great! >>> >>> Just a few comments: >>> >>> * src/java.base/unix/native/include/jni_md.h: >>> >>> I don't think the externally_visible attribute should be there for >>> arm. I know this was the case for the corresponding hotspot file for >>> arm, but that was techically incorrect. The proper dependency here >>> is that externally_visible should be in all JNIEXPORT if and only if >>> we're building with JVM feature "link-time-opt". Traditionally, that >>> feature been enabled when building arm32 builds, and only then, so >>> there's been a (coincidentally) connection here. Nowadays, Oracle >>> does not care about the arm32 builds, and I'm not sure if anyone >>> else is building them with link-time-opt enabled. >>> >>> It does seem wrong to me to export this behavior in the public >>> jni_md.h file, though. I think the correct way to solve this, if we >>> should continue supporting link-time-opt is to make sure this >>> attribute is set for exported hotspot functions. If it's still >>> needed, that is. A quick googling seems to indicate that >>> visibility("default") might be enough in modern gcc's. >>> >>> A third option is to remove the support for link-time-opt entirely, >>> if it's not really used. >> >> I didn't know how to change this since we are still building ARM with >> the jdk10/hs repository, and ARM needed this change.? I could wait >> until we bring down the jdk10/master changes that remove the ARM >> build and remove this conditional before I push. Or we could file an >> RFE to remove link-time-opt (?) and remove it then? > > I'm looking into the link-time-opt issue right now. I think it boils > down to us using an incorrect flag to gcc when linking, > -fwhole-program, when -fuse-linker-plugin should have been used. This > caused all exported symbols to disappear unless they were attributed > with externally_visible, which makes sense for a program but not a > shared library. I'm currently trying to verify that > -fuse-linker-plugin removes the need for the externally_visible > attribute when using link-time-opt. If it does, I'll open a separate > bug to fix that, and if I push that first, you can safely delete the > externally_visible attributes. Thanks Magnus.? Let me know when you push this change and I'll update my change to remove this #ifdef ARM code.?? Please push to the hs repo though. Thanks! Coleen > > /Magnus > >> >>> >>> * src/java.base/unix/native/include/jvm_md.h and >>> src/java.base/windows/native/include/jvm_md.h: >>> >>> These files define a public API, and contain non-trivial changes. I >>> suspect you should file a CSR request. (Even though I realize you're >>> only matching the header file with the reality.) >>> >> >> I filed the CSR.?? Waiting for the next steps. >> >> Thanks, >> Coleen >> >>> /Magnus >>> >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >>>> >>>> I have a script to update copyright files on commit. >>>> >>>> Thanks to Magnus and ErikJ for the makefile changes. 
>>>> >>>> Thanks, >>>> Coleen >>>> >>> >> > From david.holmes at oracle.com Fri Oct 27 12:31:33 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 27 Oct 2017 22:31:33 +1000 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> Message-ID: Magnus, LTO is irrelevant now. David On 27/10/2017 9:44 PM, Magnus Ihse Bursie wrote: > > On 2017-10-26 22:44, coleen.phillimore at oracle.com wrote: >> ?Hi Magnus, >> >> Thank you for reviewing this.?? I have a new version that takes out >> the hack in globalDefinitions.hpp and adds casts to >> src/hotspot/share/opto/type.cpp instead. >> >> Also some fixes from Martin at SAP. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.02/webrev >> >> see below. >> >> On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: >>> Coleen, >>> >>> Thank you for addressing this! >>> >>> On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >>>> Summary: removed hotspot version of jvm*h and jni*h files >>>> >>>> Mostly used sed to remove prims/jvm.h and move #include "jvm.h" >>>> after precompiled.h, so if you have repetitive stress wrist issues >>>> don't click on most of these files. >>>> >>>> There were more issues to resolve, however.? The JDK windows >>>> jni_md.h file defined jint as long and the hotspot windows jni_x86.h >>>> as int.? I had to choose the jdk version since it's the public >>>> version, so there are changes to the hotspot files for this. >>>> Generally I changed the code to use 'int' rather than 'jint' where >>>> the surrounding API didn't insist on consistently using java types. >>>> We should mostly be using C++ types within hotspot except in >>>> interfaces to native/JNI code. There are a couple of hacks in places >>>> where adding multiple jint casts was too painful. >>>> >>>> Tested with JPRT and tier2-4 (in progress). >>>> >>>> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.01/webrev >>> >>> Looks great! >>> >>> Just a few comments: >>> >>> * src/java.base/unix/native/include/jni_md.h: >>> >>> I don't think the externally_visible attribute should be there for >>> arm. I know this was the case for the corresponding hotspot file for >>> arm, but that was techically incorrect. The proper dependency here is >>> that externally_visible should be in all JNIEXPORT if and only if >>> we're building with JVM feature "link-time-opt". Traditionally, that >>> feature been enabled when building arm32 builds, and only then, so >>> there's been a (coincidentally) connection here. Nowadays, Oracle >>> does not care about the arm32 builds, and I'm not sure if anyone else >>> is building them with link-time-opt enabled. >>> >>> It does seem wrong to me to export this behavior in the public >>> jni_md.h file, though. I think the correct way to solve this, if we >>> should continue supporting link-time-opt is to make sure this >>> attribute is set for exported hotspot functions. If it's still >>> needed, that is. A quick googling seems to indicate that >>> visibility("default") might be enough in modern gcc's. >>> >>> A third option is to remove the support for link-time-opt entirely, >>> if it's not really used. >> >> I didn't know how to change this since we are still building ARM with >> the jdk10/hs repository, and ARM needed this change.? 
I could wait >> until we bring down the jdk10/master changes that remove the ARM build >> and remove this conditional before I push.? Or we could file an RFE to >> remove link-time-opt (?) and remove it then? > > I'm looking into the link-time-opt issue right now. I think it boils > down to us using an incorrect flag to gcc when linking, -fwhole-program, > when -fuse-linker-plugin should have been used. This caused all exported > symbols to disappear unless they were attributed with > externally_visible, which makes sense for a program but not a shared > library. I'm currently trying to verify that -fuse-linker-plugin removes > the need for the externally_visible attribute when using link-time-opt. > If it does, I'll open a separate bug to fix that, and if I push that > first, you can safely delete the externally_visible attributes. > /Magnus > >> >>> >>> * src/java.base/unix/native/include/jvm_md.h and >>> src/java.base/windows/native/include/jvm_md.h: >>> >>> These files define a public API, and contain non-trivial changes. I >>> suspect you should file a CSR request. (Even though I realize you're >>> only matching the header file with the reality.) >>> >> >> I filed the CSR.?? Waiting for the next steps. >> >> Thanks, >> Coleen >> >>> /Magnus >>> >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >>>> >>>> I have a script to update copyright files on commit. >>>> >>>> Thanks to Magnus and ErikJ for the makefile changes. >>>> >>>> Thanks, >>>> Coleen >>>> >>> >> > From robbin.ehn at oracle.com Fri Oct 27 13:14:32 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Fri, 27 Oct 2017 15:14:32 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <59F2F01A.403@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> <59F2DC24.8050701@oracle.com> <59F2F01A.403@oracle.com> Message-ID: Hi all, Poll in switches: http://cr.openjdk.java.net/~rehn/8185640/v7/Interpreter-Poll-Switch-10/ Poll in return: http://cr.openjdk.java.net/~rehn/8185640/v7/Interpreter-Poll-Ret-11/ Please take an extra look at poll in return. Sanity tested, big test run still running (99% complete - OK). Performance regression for the added polls increased to total of -0.68% vs global poll. (was -0.44%) We are discussing the opt-out option, the newest suggestion is to make it diagnostic. Opinions? For anyone applying these patches, the number 9 patch changes the option from product. I have not sent that out. 
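
For readers following the handshake discussion, a minimal conceptual sketch may help; this is not the HotSpot implementation and every name below is made up. The point of the per-thread poll is that a single thread can be stopped (a handshake) without bringing every thread to a global safepoint, which is also why the extra interpreter polls above carry a small cost.

    // Conceptual sketch only; names and layout are invented for illustration.
    #include <cstdint>

    struct ThreadStub {
      volatile uintptr_t poll_word;   // hypothetical per-thread poll state
    };

    // Old scheme: every thread tests one shared flag, so arming it stops everyone.
    inline bool global_poll(const volatile bool* safepoint_requested) {
      return *safepoint_requested;
    }

    // Thread-local scheme: each thread tests its own word, which can be armed
    // for a global safepoint or for a handshake with just that one thread.
    inline bool thread_local_poll(const ThreadStub* self) {
      return self->poll_word != 0;
    }
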
Thanks, Robbin From aph at redhat.com Fri Oct 27 13:21:32 2017 From: aph at redhat.com (Andrew Haley) Date: Fri, 27 Oct 2017 14:21:32 +0100 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> <59F2DC24.8050701@oracle.com> <59F2F01A.403@oracle.com> Message-ID: <3d3474e5-2380-8209-cb95-3ca8cc4aa4ed@redhat.com> On 27/10/17 14:14, Robbin Ehn wrote: > We are discussing the opt-out option, the newest suggestion is to make it > diagnostic. Opinions? We're working on ultra-low-pause-time garbage collection, and it would be very useful to be able to safepoint the interpreter at any bytecode, not at jumps. It is a performance-related option rather than diagonstic. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From david.holmes at oracle.com Fri Oct 27 13:37:42 2017 From: david.holmes at oracle.com (David Holmes) Date: Fri, 27 Oct 2017 23:37:42 +1000 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> Message-ID: <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> On 27/10/2017 10:13 PM, coleen.phillimore at oracle.com wrote: > > > On 10/27/17 3:23 AM, David Holmes wrote: >> Hi Coleen, >> >> Thanks for tackling this. >> >>> Summary: removed hotspot version of jvm*h and jni*h files >> >> Can you update the bug synopsis to show it covers both sets of files >> please. >> >> I hate to start with this (and it took me quite a while to realize it) >> but as Mandy pointed out jvm.h is not an exported interface from the >> JDK to the outside world (so not subject to CSR review), but is a >> private interface between the JVM and the JDK libraries. So I think >> really jvm.h belongs in the hotspot sources where it was, while jni.h >> belongs in the exported JDK sources. In which case the bulk of your >> changes to the hotspot files would not be needed - sorry. > > Maybe someone can make that decision and change at a later date. The > point of this change is that there is now only one of these files that > is shared.? I don't think jvm.h and the jvm_md.h belong on the hotspot > sources for the jdk to find them in some random prims and os dependent > directories. The one file that is needed is a hotspot file - jvm.h defines the interface that hotspot exports via jvm.cpp. If you leave jvm.h in hotspot/prims then a very large chunk of your boilerplate changes are not needed. The JDK code doesn't care what the name of the directory is - whatever it is just gets added as a -I directive (the JDK code will include "jvm.h" not "prims/jvm.h" the way hotspot sources do. 
This isn't something we want to change back or move again later. Whatever we do now we live with. > I'm happy to withdraw the CSR.? We generally use the CSR process to add > and remove JVM_ interfaces even though they're a private interface in > case some other JVM/JDK combination relies on them. The changes to these > files are very minor though and not likely to cause any even theoretical > incompatibility, so I'll withdraw it. >> >> Moving on ... >> >> First to address the initial comments/query you had: >> >>> The JDK windows jni_md.h file defined jint as long and the hotspot >>> windows jni_x86.h as int. I had to choose the jdk version since it's the >>> public version, so there are changes to the hotspot files for this. >> >> On Windows int and long are always the same as it uses ILP32 or LLP64 >> (not LP64 like *nix platforms). So either choice should be fine. That >> said there are some odd casting issues I comment on below. Does the VS >> compiler complain about mixing int and long in expressions? > > Yes, it does even though int and long are the same representation. And what an absolute mess that makes. :( >> >>> Generally I changed the code to use 'int' rather than 'jint' where the >>> surrounding API didn't insist on consistently using java types. We >>> should mostly be using C++ types within hotspot except in interfaces to >>> native/JNI code. >> >> I think you pulled too hard on a few threads here and things are >> starting to unravel. There are numerous cases I refer to below where >> either the cast seems unnecessary/inappropriate or else highlights a >> bunch of additional changes that also need to be made. The fan out >> from this could be horrendous. Unless you actually get some kind of >> error - and I'd like to understand the details of those - I would not >> suggest making these changes as part of this work. > > I didn't make any change unless there was was an error.? I have 100 > failed JPRT jobs to confirm!? I eventually got a Windows system to > compile and test this on.?? Actually some of the changes came out > better.? Cases where we use jint as a bool simply turned to int.? We do > not have an overload for bool for cmpxchg. That's unfortunate - ditto for OrderAccess. >> >> Looking through I have a quite a few queries/comments - apologies in >> advance as I know how tedious this is: >> >> make/hotspot/lib/CompileLibjsig.gmk >> src/java.base/solaris/native/libjsig/jsig.c >> >> Took a while to figure out why the include was needed. :) As a follow >> up I suggest just deleting the -I include directive, delete the >> Solaris-only definition of JSIG_VERSION_1_4_1, and delete everything >> to do with JVM_get_libjsig_version. It is all obsolete. > > Can I patch up jsig in a separate RFE?? I don't remember why this broke > so I simply moved JSIG #define.? Is jsig obsolete?? Removing JVM_* > definitions generally requires a CSR. I did say "As a follow up". jsig is not obsolete but the jsig versioning code, only used by Solaris, is. >> >> --- >> >> src/hotspot/cpu/arm/interp_masm_arm.cpp >> >> Why did you need to add the jvm.h include? >> > > ? tbz(Raccess_flags, JVM_ACC_SYNCHRONIZED_BIT, unlocked); Okay. I'm not going to try and figure out how this code found this before. >> --- >> >> src/hotspot/os/windows/os_windows.cpp. >> >> The type of process_exiting should be uint to match the DWORD of >> GetCurrentThreadID. Then you should need any casts. Also you missed >> this jint cast: >> >> 3796???????? 
process_exiting != (jint)GetCurrentThreadId()) { > > Yes, that's better to change process_exiting to a DWORD.? It needs a > DWORD cast to 0 in the cmpxchg. > > ??????? Atomic::cmpxchg(GetCurrentThreadId(), &process_exiting, (DWORD)0); > > These templates are picky. Yes - their inability to deal with literals is extremely frustrating. >> >> --- >> >> src/hotspot/share/c1/c1_Canonicalizer.hpp >> >> ? 43 #ifdef _WINDOWS >> ? 44?? // jint is defined as long in jni_md.h, so convert from int to >> jint >> ? 45?? void set_constant(int x)?????????????????????? { >> set_constant((jint)x); } >> ? 46 #endif >> >> Why is this necessary? int and long are the same on Windows. The whole >> point is that jint hides the underlying type, so where does this go >> wrong? > > No, they are not the same types even though they have the same > representation! This is truly unfortunate. >> >> --- >> >> src/hotspot/share/c1/c1_LinearScan.cpp >> >> ?ConstantIntValue((jint)0); >> >> why is this cast needed? what causes the ambiguity? (If this was a >> template I'd understand ;-) ). Also didn't you change that constructor >> to take an int anyway - not that I think it should - see below. > > Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't match > 'long' better than any pointer type.? So this cast is needed. But you changed the constructor to take an int! class ConstantIntValue: public ScopeValue { private: - jint _value; + int _value; public: - ConstantIntValue(jint value) { _value = value; } + ConstantIntValue(int value) { _value = value; } >> >> --- >> >> src/hotspot/share/ci/ciReplay.cpp >> >> 793???????? jint* dims = NEW_RESOURCE_ARRAY(jint, rank); >> >> why should this be jint? > > To avoid a cast from int* to jint* in the line below: > > value = kelem->multi_allocate(rank, dims, CHECK); > > >> >> --- >> >> src/hotspot/share/classfile/altHashing.cpp >> >> Okay this looks more consistent with jint. > > Yes.? I translated this from some native code iirc. >> >> --- >> >> src/hotspot/share/code/debugInfo.hpp >> >> These changes seem wrong. We have: >> >> ConstantLongValue(jlong value) >> ConstantDoubleValue(jdouble value) >> >> so we should have: >> >> ConstantIntValue(jint value) > > Again, there are multiple call sites with '0', which match int trivially > but are confused with long.? It's less consistent I agree but better to > not cast all the call sites. This is really making a mess of the APIs - they should be a jint but we declare them int because of a 0 casting problem. Can't we just use 0L? >> >> --- >> >> src/hotspot/share/code/relocInfo.cpp >> >> Change seems unnecessary - int32_t is fine >> > > No, int32_t doesn't match the calls below it.? They all assume _lo and > _hi are jint. >> --- >> >> src/hotspot/share/compiler/compileBroker.cpp >> src/hotspot/share/compiler/compileBroker.hpp >> >> I see a complete mix of int and jint in this class, so why make the >> one change you did ?? > > This is another case of using jint as a flag with cmpxchg.? The > templates for cmpxchg want the types to match and 0 and 1 are > essentially 'int'.? This is a lot cleaner this way. >> >> --- >> >> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp >> >> 1700???? tty->write((char*) start, MIN2(length, (jint)O_BUFLEN)); >> >> why did you need to add the jint cast? It's used without any cast on >> the next two lines: >> >> 1701???? length -= O_BUFLEN; >> 1702???? offset += O_BUFLEN; >> > > There's a conversion from O_BUFLEN from int to long in 1701 and 1702. > MIN2 is a template that wants the types to match exactly. 
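
As a standalone illustration of that last point (this is not the real HotSpot MIN2, just a template with the same shape): a single template parameter cannot be deduced from two arguments of different integer types, which is exactly why the call site needs the explicit (jint) cast.

    // Illustration only; the types and values are made up, only the mismatch matters.
    template <class T>
    T min2(T a, T b) { return a < b ? a : b; }

    int example(int length) {
      long limit = 8192;                      // pretend this is the wider operand
      // min2(length, limit);                 // error: T deduced as both int and long
      return (int)min2((long)length, limit);  // the cast makes deduction unambiguous
    }
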
$%^%$! templates! >> ?? >> >> --- >> >> src/hotspot/share/jvmci/jvmciRuntime.cpp >> >> Looking around this code it seems very confused about types - eg the >> previous function is declared jboolean yet returns a jint on one path! >> It isn't clear to me if the return type is what should be changed or >> the parameter type? I would just leave this alone. > > I can't leave it alone because it doesn't compile that way.? This was > the minimal change and yea, does look a bit inconsistent. >> >> --- >> >> src/hotspot/share/opto/mulnode.cpp >> >> Okay TypeInt has jint parts, so the remaining int32_t declarations (A, >> B, C, D) should also be jint. > > Yes.? c2 uses jint types. >> >> --- >> >> src/hotspot/share/opto/parse3.cpp >> >> I agree with the changes you made, but then: >> >> ?419???? jint dim_con = find_int_con(length[j], -1); >> >> should also be changed. >> >> And obviously MultiArrayExpandLimit should be defined as int not intx! > > Everything in globals.hpp is intx.? That's a thread that I don't want to > pull on! We still have that limitation? > > Changed dim_con to int. >> >> --- >> >> src/hotspot/share/opto/phaseX.cpp >> >> I can see that intcon(jint i) is consistent with longcon(jlong l), but >> the use of "i" in the code is more consistent with int than jint. > > huh?? really? >> >> --- >> >> src/hotspot/share/opto/type.cpp >> >> 1505 int TypeInt::hash(void) const { >> 1506?? return java_add(java_add(_lo, _hi), java_add((jint)_widen, >> (jint)Type::Int)); >> 1507 } >> >> I can see that the (jint) casts you added make sense, but then the >> whole function should be returning jint not int. Ditto the other hash >> functions. > > I'm not messing with this, this is the minimal in type fixing that I'm > going to do here. >> >> --- >> >> src/hotspot/share/prims/jni.cpp >> >> I think vm_created should be a bool. In fact all the fields you >> changed are logically bools - do Atomics work for bool now? > > No, they do not.?? I had thought bool would be better originally too. >> >> --- >> >> src/hotspot/share/prims/jvm.cpp >> >> is_attachable is the terminology used in the JDK code. > > Well the JDK version had is_attach_supported() as the flag name so I > used that in this one place. >> >> --- >> >> src/hotspot/share/prims/jvmtiEnvBase.cpp >> src/hotspot/share/prims/jvmtiImpl.cpp >> >> Are you making parameters consistent with the fields they initialize? > > They're consistent with the declarations now. >> >> --- >> >> src/hotspot/share/prims/jvmtiTagMap.cpp >> >> There is a mix of int and jint for slot in this code. You fixed some, >> but this remains: >> >> 2440 inline bool CallbackInvoker::report_stack_ref_root(jlong thread_tag, >> 2441??????????????????????????????????????????????????? jlong tid, >> 2442??????????????????????????????????????????????????? jint depth, >> 2443??????????????????????????????????????????????????? jmethodID method, >> 2444??????????????????????????????????????????????????? jlocation bci, >> 2445??????????????????????????????????????????????????? jint slot, > > Right for consistency with the declarations. >> >> --- >> >> src/hotspot/share/runtime/perfData.cpp >> >> Callers pass both jint and int, so param type seems arbitrary. > > They are, but importantly they match the declarations. >> >> --- >> >> src/hotspot/share/runtime/perfMemory.cpp >> src/hotspot/share/runtime/perfMemory.hpp >> >> PerfMemory::_initialized should ideally be a bool - can OrderAccess >> handle that now? > > Nope. 
>> >> --- >> >> src/java.base/share/native/include/jvm.h >> >> Not clear why the jio functions are not also JNICALL ? > > They are now.? The JDK version didn't have JNICALL.? JVM needs JNICALL. > I can't tell you why JDK didn't need JNICALL linkage. ?? JVM currently does not have JNICALL. But they are declared as "extern C". >> >> --- >> >> src/java.base/unix/native/include/jni_md.h >> >> There is no need to special case ARM. The differences in the existing >> code were for LTO support and that is now irrelevant. > > See discussion with Magnus.?? We still build ARM for jdk10/hs so I > needed this conditional or of course I wouldn't have added it.? We can > remove it with LTO support. Those builds are gone - this is obsolete. But yes all LTO can be removed later if you wish. Just trying to simplify things now. >> >> --- >> >> src/java.base/unix/native/include/jvm_md.h >> >> I know you've just copied this across, but it seems wrong to me: >> >> ?57 // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on others. This >> may >> ? 58 //?????? cause problems if JVM and the rest of JDK are built on >> different >> ? 59 //?????? Linux releases. Here we define JVM_MAXPATHLEN to be >> MAXPATHLEN + 1, >> ? 60 //?????? so buffers declared in VM are always >= 4096. >> ? 61 #define JVM_MAXPATHLEN MAXPATHLEN + 1 >> >> It doesn't make sense to me to define an internal "max path length" >> that can _exceed_ the platform max! >> >> That aside there's no support for building different parts of the JDK >> on different platforms and then bringing them together. And in any >> case I would think the real problem would be building on a platform >> that uses 4096 and running on one that uses 4095! >> >> But that aside this is a Linux hack and should be guarded by ifdef >> LINUX. (I doubt BSD needs it, the bsd file is just a copy of the linux >> one - the JDK macosx version does the right thing). Solaris and AIX >> should stay as-is at MAXPATHLEN. > > All of the unix platforms had MAXPATHLEN+1.? I'll leave it for now and > we can investigate that further. I see the following existing code: src/java.base/unix/native/include/jvm_md.h: #define JVM_MAXPATHLEN MAXPATHLEN src/java.base/macosx/native/include/jvm_md.h #define JVM_MAXPATHLEN MAXPATHLEN src/hotspot/os/aix/jvm_aix.h #define JVM_MAXPATHLEN MAXPATHLEN src/hotspot/os/bsd/jvm_bsd.h #define JVM_MAXPATHLEN MAXPATHLEN + 1 // blindly copied from Linux version src/hotspot/os/linux/jvm_linux.h #define JVM_MAXPATHLEN MAXPATHLEN + 1 src/hotspot/os/solaris/jvm_solaris.h #define JVM_MAXPATHLEN MAXPATHLEN This is a linux only hack (if you ignore the blind copy from linux into the BSD code in the VM). >> >> ?86 #define ASYNC_SIGNAL???? SIGJVM2 >> >> This only exists on Solaris so I think should be in #ifdef SOLARIS, to >> make that clear. > > Ok.? I'll add this. >> >> --- >> >> src/java.base/windows/native/include/jvm_md.h >> >> Given the differences between the two versions either something has >> been broken or "extern C" declarations are not needed :) > > Well, they are needed for Hotspot to build and do not prevent jdk from > building.? I don't know what was broken. We really need to understand this better. Maybe related to the map files that expose the symbols. ?? >> >> --- >> >> That was a really painful way to spend most of my Friday. TGIF! :) > > Thanks for going through it.? See comments inline for changes. > Generating a webrev takes hours so I'm not going to do that unless you > insist. An incremental webrev shouldn't take long - right? You're a mq maestro now. 
:) If you can reasonably produce an incremental webrev once you've settled on all the comments/issues that would be good. Thanks, David > Thanks, > Coleen > > >> >> Thanks, >> David >> ----- >> >> >> On 27/10/2017 6:44 AM, coleen.phillimore at oracle.com wrote: >>> ??Hi Magnus, >>> >>> Thank you for reviewing this.?? I have a new version that takes out >>> the hack in globalDefinitions.hpp and adds casts to >>> src/hotspot/share/opto/type.cpp instead. >>> >>> Also some fixes from Martin at SAP. >>> >>> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.02/webrev >>> >>> see below. >>> >>> On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: >>>> Coleen, >>>> >>>> Thank you for addressing this! >>>> >>>> On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>> >>>>> Mostly used sed to remove prims/jvm.h and move #include "jvm.h" >>>>> after precompiled.h, so if you have repetitive stress wrist issues >>>>> don't click on most of these files. >>>>> >>>>> There were more issues to resolve, however.? The JDK windows >>>>> jni_md.h file defined jint as long and the hotspot windows >>>>> jni_x86.h as int. I had to choose the jdk version since it's the >>>>> public version, so there are changes to the hotspot files for this. >>>>> Generally I changed the code to use 'int' rather than 'jint' where >>>>> the surrounding API didn't insist on consistently using java types. >>>>> We should mostly be using C++ types within hotspot except in >>>>> interfaces to native/JNI code.? There are a couple of hacks in >>>>> places where adding multiple jint casts was too painful. >>>>> >>>>> Tested with JPRT and tier2-4 (in progress). >>>>> >>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.01/webrev >>>> >>>> Looks great! >>>> >>>> Just a few comments: >>>> >>>> * src/java.base/unix/native/include/jni_md.h: >>>> >>>> I don't think the externally_visible attribute should be there for >>>> arm. I know this was the case for the corresponding hotspot file for >>>> arm, but that was techically incorrect. The proper dependency here >>>> is that externally_visible should be in all JNIEXPORT if and only if >>>> we're building with JVM feature "link-time-opt". Traditionally, that >>>> feature been enabled when building arm32 builds, and only then, so >>>> there's been a (coincidentally) connection here. Nowadays, Oracle >>>> does not care about the arm32 builds, and I'm not sure if anyone >>>> else is building them with link-time-opt enabled. >>>> >>>> It does seem wrong to me to export this behavior in the public >>>> jni_md.h file, though. I think the correct way to solve this, if we >>>> should continue supporting link-time-opt is to make sure this >>>> attribute is set for exported hotspot functions. If it's still >>>> needed, that is. A quick googling seems to indicate that >>>> visibility("default") might be enough in modern gcc's. >>>> >>>> A third option is to remove the support for link-time-opt entirely, >>>> if it's not really used. >>> >>> I didn't know how to change this since we are still building ARM with >>> the jdk10/hs repository, and ARM needed this change.? I could wait >>> until we bring down the jdk10/master changes that remove the ARM >>> build and remove this conditional before I push. Or we could file an >>> RFE to remove link-time-opt (?) and remove it then? 
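
To make the attribute discussion above concrete, here is the rough shape such a conditional could take in a jni_md.h-style header. It is a sketch only, not the real file; LINK_TIME_OPT is a hypothetical macro standing in for however the build would signal the "link-time-opt" JVM feature, and whether externally_visible is still needed at all is exactly what is being verified with -fuse-linker-plugin.

    /* Sketch only -- not the actual jni_md.h. */
    #ifdef LINK_TIME_OPT
      /* With -flto -fwhole-program the link treats the image like a program and
         may drop symbols not reachable from main(), so exports are pinned here. */
      #define JNIEXPORT __attribute__((visibility("default"), externally_visible))
    #else
      /* With -flto -fuse-linker-plugin (or no LTO) default visibility suffices. */
      #define JNIEXPORT __attribute__((visibility("default")))
    #endif
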
>>> >>>> >>>> * src/java.base/unix/native/include/jvm_md.h and >>>> src/java.base/windows/native/include/jvm_md.h: >>>> >>>> These files define a public API, and contain non-trivial changes. I >>>> suspect you should file a CSR request. (Even though I realize you're >>>> only matching the header file with the reality.) >>>> >>> >>> I filed the CSR.?? Waiting for the next steps. >>> >>> Thanks, >>> Coleen >>> >>>> /Magnus >>>> >>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >>>>> >>>>> I have a script to update copyright files on commit. >>>>> >>>>> Thanks to Magnus and ErikJ for the makefile changes. >>>>> >>>>> Thanks, >>>>> Coleen >>>>> >>>> >>> > From coleen.phillimore at oracle.com Fri Oct 27 13:40:08 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 27 Oct 2017 09:40:08 -0400 Subject: RFR [S] JDK-8179624 [REDO] Avoid repeated calls to JavaThread::last_frame in InterpreterRuntime In-Reply-To: <842ce767-4436-02a3-f536-b71fed1fa6ed@oracle.com> References: <842ce767-4436-02a3-f536-b71fed1fa6ed@oracle.com> Message-ID: <5c7e239e-dd97-6bb7-2615-1cdb7c8b1844@oracle.com> This looks good. Thanks, Coleen On 10/26/17 5:53 PM, Ioi Lam wrote: > Hi, > > Please review the following change. It's a redo of a previous botched > attempt (JDK-8179305) that had a typo which caused JIT-related crashes. > > Thanks to Dean for spotting the typo. > > + Bug > https://bugs.openjdk.java.net/browse/JDK-8179624 > > > + The full changeset: > http://cr.openjdk.java.net/~iklam/jdk10/8179624-redo-8179305-avoid-last-frame.v01.full/ > > > > + The delta from the botched attempt > ? (fixing the typo with monitor_begin/monitor_end): > http://cr.openjdk.java.net/~iklam/jdk10/8179624-redo-8179305-avoid-last-frame.v01.redo_delta/ > > > > + Testing: > hotspot tier1~5 tests. > > > Thanks > - Ioi From coleen.phillimore at oracle.com Fri Oct 27 14:08:42 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 27 Oct 2017 10:08:42 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> Message-ID: <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> On 10/27/17 9:37 AM, David Holmes wrote: > On 27/10/2017 10:13 PM, coleen.phillimore at oracle.com wrote: >> >> >> On 10/27/17 3:23 AM, David Holmes wrote: >>> Hi Coleen, >>> >>> Thanks for tackling this. >>> >>>> Summary: removed hotspot version of jvm*h and jni*h files >>> >>> Can you update the bug synopsis to show it covers both sets of files >>> please. >>> >>> I hate to start with this (and it took me quite a while to realize >>> it) but as Mandy pointed out jvm.h is not an exported interface from >>> the JDK to the outside world (so not subject to CSR review), but is >>> a private interface between the JVM and the JDK libraries. So I >>> think really jvm.h belongs in the hotspot sources where it was, >>> while jni.h belongs in the exported JDK sources. In which case the >>> bulk of your changes to the hotspot files would not be needed - sorry. >> >> Maybe someone can make that decision and change at a later date. The >> point of this change is that there is now only one of these files >> that is shared.? 
I don't think jvm.h and the jvm_md.h belong on the >> hotspot sources for the jdk to find them in some random prims and os >> dependent directories. > > The one file that is needed is a hotspot file - jvm.h defines the > interface that hotspot exports via jvm.cpp. > > If you leave jvm.h in hotspot/prims then a very large chunk of your > boilerplate changes are not needed. The JDK code doesn't care what the > name of the directory is - whatever it is just gets added as a -I > directive (the JDK code will include "jvm.h" not "prims/jvm.h" the way > hotspot sources do. > > This isn't something we want to change back or move again later. > Whatever we do now we live with. I think it belongs with jni.h and I think the core libraries group would agree.?? It seems more natural there than buried in the hotspot prims directory.? I guess this is on hold while we have this debate.?? Sigh. Actually with -I directives, changing to jvm.h from prims/jvm.h would still work.?? Maybe we should change the name to jvm.hpp since it's jvm.cpp though??? Or maybe just have two divergent copies and close this as WNF. > >> I'm happy to withdraw the CSR.? We generally use the CSR process to >> add and remove JVM_ interfaces even though they're a private >> interface in case some other JVM/JDK combination relies on them. The >> changes to these files are very minor though and not likely to cause >> any even theoretical incompatibility, so I'll withdraw it. >>> >>> Moving on ... >>> >>> First to address the initial comments/query you had: >>> >>>> The JDK windows jni_md.h file defined jint as long and the hotspot >>>> windows jni_x86.h as int. I had to choose the jdk version since >>>> it's the >>>> public version, so there are changes to the hotspot files for this. >>> >>> On Windows int and long are always the same as it uses ILP32 or >>> LLP64 (not LP64 like *nix platforms). So either choice should be >>> fine. That said there are some odd casting issues I comment on >>> below. Does the VS compiler complain about mixing int and long in >>> expressions? >> >> Yes, it does even though int and long are the same representation. > > And what an absolute mess that makes. :( > >>> >>>> Generally I changed the code to use 'int' rather than 'jint' where the >>>> surrounding API didn't insist on consistently using java types. We >>>> should mostly be using C++ types within hotspot except in >>>> interfaces to >>>> native/JNI code. >>> >>> I think you pulled too hard on a few threads here and things are >>> starting to unravel. There are numerous cases I refer to below where >>> either the cast seems unnecessary/inappropriate or else highlights a >>> bunch of additional changes that also need to be made. The fan out >>> from this could be horrendous. Unless you actually get some kind of >>> error - and I'd like to understand the details of those - I would >>> not suggest making these changes as part of this work. >> >> I didn't make any change unless there was was an error.? I have 100 >> failed JPRT jobs to confirm!? I eventually got a Windows system to >> compile and test this on.?? Actually some of the changes came out >> better.? Cases where we use jint as a bool simply turned to int.? We >> do not have an overload for bool for cmpxchg. > > That's unfortunate - ditto for OrderAccess. 
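
A standalone sketch of the literal problem being lamented here; this is not the HotSpot Atomic API, only a template with the same deduction behaviour. The destination pointer fixes the template parameter, so a bare literal 0, whose type is int, conflicts with it and has to be cast, mirroring the (DWORD)0 above.

    typedef unsigned int DWORD_like;   // stands in for DWORD; the name is made up

    template <class T>
    T cmpxchg_like(T exchange_value, volatile T* dest, T compare_value) {
      T old = *dest;                                   // illustration only -- not atomic
      if (old == compare_value) *dest = exchange_value;
      return old;
    }

    volatile DWORD_like process_exiting_stub = 0;

    void example(DWORD_like tid) {
      // cmpxchg_like(tid, &process_exiting_stub, 0);           // error: T deduced as
      //                                                        // both DWORD_like and int
      cmpxchg_like(tid, &process_exiting_stub, (DWORD_like)0);  // cast resolves deduction
    }
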
> >>> >>> Looking through I have a quite a few queries/comments - apologies in >>> advance as I know how tedious this is: >>> >>> make/hotspot/lib/CompileLibjsig.gmk >>> src/java.base/solaris/native/libjsig/jsig.c >>> >>> Took a while to figure out why the include was needed. :) As a >>> follow up I suggest just deleting the -I include directive, delete >>> the Solaris-only definition of JSIG_VERSION_1_4_1, and delete >>> everything to do with JVM_get_libjsig_version. It is all obsolete. >> >> Can I patch up jsig in a separate RFE?? I don't remember why this >> broke so I simply moved JSIG #define.? Is jsig obsolete? Removing >> JVM_* definitions generally requires a CSR. > > I did say "As a follow up". jsig is not obsolete but the jsig > versioning code, only used by Solaris, is. > >>> >>> --- >>> >>> src/hotspot/cpu/arm/interp_masm_arm.cpp >>> >>> Why did you need to add the jvm.h include? >>> >> >> ?? tbz(Raccess_flags, JVM_ACC_SYNCHRONIZED_BIT, unlocked); > > Okay. I'm not going to try and figure out how this code found this > before. > >>> --- >>> >>> src/hotspot/os/windows/os_windows.cpp. >>> >>> The type of process_exiting should be uint to match the DWORD of >>> GetCurrentThreadID. Then you should need any casts. Also you missed >>> this jint cast: >>> >>> 3796???????? process_exiting != (jint)GetCurrentThreadId()) { >> >> Yes, that's better to change process_exiting to a DWORD.? It needs a >> DWORD cast to 0 in the cmpxchg. >> >> ???????? Atomic::cmpxchg(GetCurrentThreadId(), &process_exiting, >> (DWORD)0); >> >> These templates are picky. > > Yes - their inability to deal with literals is extremely frustrating. > >>> >>> --- >>> >>> src/hotspot/share/c1/c1_Canonicalizer.hpp >>> >>> ? 43 #ifdef _WINDOWS >>> ? 44?? // jint is defined as long in jni_md.h, so convert from int >>> to jint >>> ? 45?? void set_constant(int x)?????????????????????? { >>> set_constant((jint)x); } >>> ? 46 #endif >>> >>> Why is this necessary? int and long are the same on Windows. The >>> whole point is that jint hides the underlying type, so where does >>> this go wrong? >> >> No, they are not the same types even though they have the same >> representation! > > This is truly unfortunate. > >>> >>> --- >>> >>> src/hotspot/share/c1/c1_LinearScan.cpp >>> >>> ?ConstantIntValue((jint)0); >>> >>> why is this cast needed? what causes the ambiguity? (If this was a >>> template I'd understand ;-) ). Also didn't you change that >>> constructor to take an int anyway - not that I think it should - see >>> below. >> >> Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't match >> 'long' better than any pointer type.? So this cast is needed. > > But you changed the constructor to take an int! > > ?class ConstantIntValue: public ScopeValue { > ? private: > -? jint _value; > +? int _value; > ? public: > -? ConstantIntValue(jint value)???????? { _value = value; } > +? ConstantIntValue(int value)????????? { _value = value; } > > Okay I removed this cast. >>> --- >>> >>> src/hotspot/share/ci/ciReplay.cpp >>> >>> 793???????? jint* dims = NEW_RESOURCE_ARRAY(jint, rank); >>> >>> why should this be jint? >> >> To avoid a cast from int* to jint* in the line below: >> >> ????????? value = kelem->multi_allocate(rank, dims, CHECK); >> >> >>> >>> --- >>> >>> src/hotspot/share/classfile/altHashing.cpp >>> >>> Okay this looks more consistent with jint. >> >> Yes.? I translated this from some native code iirc. >>> >>> --- >>> >>> src/hotspot/share/code/debugInfo.hpp >>> >>> These changes seem wrong. 
We have: >>> >>> ConstantLongValue(jlong value) >>> ConstantDoubleValue(jdouble value) >>> >>> so we should have: >>> >>> ConstantIntValue(jint value) >> >> Again, there are multiple call sites with '0', which match int >> trivially but are confused with long.? It's less consistent I agree >> but better to not cast all the call sites. > > This is really making a mess of the APIs - they should be a jint but > we declare them int because of a 0 casting problem. Can't we just use 0L? There aren't that many casts.? You're right, that would have been better in some places. >>> >>> --- >>> >>> src/hotspot/share/code/relocInfo.cpp >>> >>> Change seems unnecessary - int32_t is fine >>> >> >> No, int32_t doesn't match the calls below it.? They all assume _lo >> and _hi are jint. >>> --- >>> >>> src/hotspot/share/compiler/compileBroker.cpp >>> src/hotspot/share/compiler/compileBroker.hpp >>> >>> I see a complete mix of int and jint in this class, so why make the >>> one change you did ?? >> >> This is another case of using jint as a flag with cmpxchg.? The >> templates for cmpxchg want the types to match and 0 and 1 are >> essentially 'int'.? This is a lot cleaner this way. > > > >>> >>> --- >>> >>> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp >>> >>> 1700???? tty->write((char*) start, MIN2(length, (jint)O_BUFLEN)); >>> >>> why did you need to add the jint cast? It's used without any cast on >>> the next two lines: >>> >>> 1701???? length -= O_BUFLEN; >>> 1702???? offset += O_BUFLEN; >>> >> >> There's a conversion from O_BUFLEN from int to long in 1701 and >> 1702.?? MIN2 is a template that wants the types to match exactly. > > $%^%$! templates! > >>> ?? >>> >>> --- >>> >>> src/hotspot/share/jvmci/jvmciRuntime.cpp >>> >>> Looking around this code it seems very confused about types - eg the >>> previous function is declared jboolean yet returns a jint on one >>> path! It isn't clear to me if the return type is what should be >>> changed or the parameter type? I would just leave this alone. >> >> I can't leave it alone because it doesn't compile that way. This was >> the minimal change and yea, does look a bit inconsistent. >>> >>> --- >>> >>> src/hotspot/share/opto/mulnode.cpp >>> >>> Okay TypeInt has jint parts, so the remaining int32_t declarations >>> (A, B, C, D) should also be jint. >> >> Yes.? c2 uses jint types. >>> >>> --- >>> >>> src/hotspot/share/opto/parse3.cpp >>> >>> I agree with the changes you made, but then: >>> >>> ?419???? jint dim_con = find_int_con(length[j], -1); >>> >>> should also be changed. >>> >>> And obviously MultiArrayExpandLimit should be defined as int not intx! >> >> Everything in globals.hpp is intx.? That's a thread that I don't want >> to pull on! > > We still have that limitation? >> >> Changed dim_con to int. >>> >>> --- >>> >>> src/hotspot/share/opto/phaseX.cpp >>> >>> I can see that intcon(jint i) is consistent with longcon(jlong l), >>> but the use of "i" in the code is more consistent with int than jint. >> >> huh?? really? >>> >>> --- >>> >>> src/hotspot/share/opto/type.cpp >>> >>> 1505 int TypeInt::hash(void) const { >>> 1506?? return java_add(java_add(_lo, _hi), java_add((jint)_widen, >>> (jint)Type::Int)); >>> 1507 } >>> >>> I can see that the (jint) casts you added make sense, but then the >>> whole function should be returning jint not int. Ditto the other >>> hash functions. >> >> I'm not messing with this, this is the minimal in type fixing that >> I'm going to do here. 
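
The (jint)0 casts and the 0L question in this exchange come down to one overload-resolution rule, illustrated standalone below (not HotSpot code): a bare 0 converts equally well to long and to a pointer, while a bare 0L converts equally well to int and to a pointer, so whichever width jint has on a given platform one of the two literals ends up ambiguous, and the explicit (jint) cast is the portable spelling.

    struct ScopeValueStub {};                  // stands in for a competing pointer overload

    void make_value(long) {}                   // shape of the ctor on a platform where jint is long
    void make_value(ScopeValueStub*) {}

    void example() {
      // make_value(0);      // ambiguous: int -> long vs null-pointer conversion
      make_value((long)0);   // exact match -- the role the (jint) cast plays
    }
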
> > > >>> >>> --- >>> >>> src/hotspot/share/prims/jni.cpp >>> >>> I think vm_created should be a bool. In fact all the fields you >>> changed are logically bools - do Atomics work for bool now? >> >> No, they do not.?? I had thought bool would be better originally too. >>> >>> --- >>> >>> src/hotspot/share/prims/jvm.cpp >>> >>> is_attachable is the terminology used in the JDK code. >> >> Well the JDK version had is_attach_supported() as the flag name so I >> used that in this one place. >>> >>> --- >>> >>> src/hotspot/share/prims/jvmtiEnvBase.cpp >>> src/hotspot/share/prims/jvmtiImpl.cpp >>> >>> Are you making parameters consistent with the fields they initialize? >> >> They're consistent with the declarations now. >>> >>> --- >>> >>> src/hotspot/share/prims/jvmtiTagMap.cpp >>> >>> There is a mix of int and jint for slot in this code. You fixed >>> some, but this remains: >>> >>> 2440 inline bool CallbackInvoker::report_stack_ref_root(jlong >>> thread_tag, >>> 2441??????????????????????????????????????????????????? jlong tid, >>> 2442??????????????????????????????????????????????????? jint depth, >>> 2443 jmethodID method, >>> 2444 jlocation bci, >>> 2445??????????????????????????????????????????????????? jint slot, >> >> Right for consistency with the declarations. >>> >>> --- >>> >>> src/hotspot/share/runtime/perfData.cpp >>> >>> Callers pass both jint and int, so param type seems arbitrary. >> >> They are, but importantly they match the declarations. >>> >>> --- >>> >>> src/hotspot/share/runtime/perfMemory.cpp >>> src/hotspot/share/runtime/perfMemory.hpp >>> >>> PerfMemory::_initialized should ideally be a bool - can OrderAccess >>> handle that now? >> >> Nope. >>> >>> --- >>> >>> src/java.base/share/native/include/jvm.h >>> >>> Not clear why the jio functions are not also JNICALL ? >> >> They are now.? The JDK version didn't have JNICALL.? JVM needs >> JNICALL.? I can't tell you why JDK didn't need JNICALL linkage. > > ?? JVM currently does not have JNICALL. But they are declared as > "extern C". This was a compilation error on Windows with JDK.?? Maybe the C code in the JDK doesn't complain about linkage differences.? I'll have to go back and figure this out then. > >>> >>> --- >>> >>> src/java.base/unix/native/include/jni_md.h >>> >>> There is no need to special case ARM. The differences in the >>> existing code were for LTO support and that is now irrelevant. >> >> See discussion with Magnus.?? We still build ARM for jdk10/hs so I >> needed this conditional or of course I wouldn't have added it.? We >> can remove it with LTO support. > > Those builds are gone - this is obsolete. But yes all LTO can be > removed later if you wish. Just trying to simplify things now. > >>> >>> --- >>> >>> src/java.base/unix/native/include/jvm_md.h >>> >>> I know you've just copied this across, but it seems wrong to me: >>> >>> ?57 // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on others. >>> This may >>> ? 58 //?????? cause problems if JVM and the rest of JDK are built on >>> different >>> ? 59 //?????? Linux releases. Here we define JVM_MAXPATHLEN to be >>> MAXPATHLEN + 1, >>> ? 60 //?????? so buffers declared in VM are always >= 4096. >>> ? 61 #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>> >>> It doesn't make sense to me to define an internal "max path length" >>> that can _exceed_ the platform max! >>> >>> That aside there's no support for building different parts of the >>> JDK on different platforms and then bringing them together. 
And in >>> any case I would think the real problem would be building on a >>> platform that uses 4096 and running on one that uses 4095! >>> >>> But that aside this is a Linux hack and should be guarded by ifdef >>> LINUX. (I doubt BSD needs it, the bsd file is just a copy of the >>> linux one - the JDK macosx version does the right thing). Solaris >>> and AIX should stay as-is at MAXPATHLEN. >> >> All of the unix platforms had MAXPATHLEN+1.? I'll leave it for now >> and we can investigate that further. > > I see the following existing code: > > src/java.base/unix/native/include/jvm_md.h: > > #define JVM_MAXPATHLEN MAXPATHLEN > > src/java.base/macosx/native/include/jvm_md.h > > #define JVM_MAXPATHLEN MAXPATHLEN > > src/hotspot/os/aix/jvm_aix.h > > #define JVM_MAXPATHLEN MAXPATHLEN > > src/hotspot/os/bsd/jvm_bsd.h > > #define JVM_MAXPATHLEN MAXPATHLEN + 1? // blindly copied from Linux > version > > src/hotspot/os/linux/jvm_linux.h > > #define JVM_MAXPATHLEN MAXPATHLEN + 1 > > src/hotspot/os/solaris/jvm_solaris.h > > #define JVM_MAXPATHLEN MAXPATHLEN > > This is a linux only hack (if you ignore the blind copy from linux > into the BSD code in the VM). Oh, thanks, so should I add a bunch of ifdefs then?? Or do you think having MAXPATHLEN + 1 will really break the other platforms?? Do you really see this as a problem or are you just pointing out inconsistency? > >>> >>> ?86 #define ASYNC_SIGNAL???? SIGJVM2 >>> >>> This only exists on Solaris so I think should be in #ifdef SOLARIS, >>> to make that clear. >> >> Ok.? I'll add this. >>> >>> --- >>> >>> src/java.base/windows/native/include/jvm_md.h >>> >>> Given the differences between the two versions either something has >>> been broken or "extern C" declarations are not needed :) >> >> Well, they are needed for Hotspot to build and do not prevent jdk >> from building.? I don't know what was broken. > > We really need to understand this better. Maybe related to the map > files that expose the symbols. ?? They're needed because the JDK files are written mostly in C and that doesn't complain about the linkage difference.? Hotspot files are in C++ which does complain. > >>> >>> --- >>> >>> That was a really painful way to spend most of my Friday. TGIF! :) >> >> Thanks for going through it.? See comments inline for changes. >> Generating a webrev takes hours so I'm not going to do that unless >> you insist. > > An incremental webrev shouldn't take long - right? You're a mq maestro > now. :) Well I generally trash a repository whenever I use mq but sure. > > If you can reasonably produce an incremental webrev once you've > settled on all the comments/issues that would be good. Ok, sure. Coleen > > Thanks, > David > >> Thanks, >> Coleen >> >> >>> >>> Thanks, >>> David >>> ----- >>> >>> >>> On 27/10/2017 6:44 AM, coleen.phillimore at oracle.com wrote: >>>> ??Hi Magnus, >>>> >>>> Thank you for reviewing this.?? I have a new version that takes out >>>> the hack in globalDefinitions.hpp and adds casts to >>>> src/hotspot/share/opto/type.cpp instead. >>>> >>>> Also some fixes from Martin at SAP. >>>> >>>> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.02/webrev >>>> >>>> see below. >>>> >>>> On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: >>>>> Coleen, >>>>> >>>>> Thank you for addressing this! 
>>>>> >>>>> On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >>>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>>> >>>>>> Mostly used sed to remove prims/jvm.h and move #include "jvm.h" >>>>>> after precompiled.h, so if you have repetitive stress wrist >>>>>> issues don't click on most of these files. >>>>>> >>>>>> There were more issues to resolve, however.? The JDK windows >>>>>> jni_md.h file defined jint as long and the hotspot windows >>>>>> jni_x86.h as int. I had to choose the jdk version since it's the >>>>>> public version, so there are changes to the hotspot files for >>>>>> this. Generally I changed the code to use 'int' rather than >>>>>> 'jint' where the surrounding API didn't insist on consistently >>>>>> using java types. We should mostly be using C++ types within >>>>>> hotspot except in interfaces to native/JNI code.? There are a >>>>>> couple of hacks in places where adding multiple jint casts was >>>>>> too painful. >>>>>> >>>>>> Tested with JPRT and tier2-4 (in progress). >>>>>> >>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.01/webrev >>>>> >>>>> Looks great! >>>>> >>>>> Just a few comments: >>>>> >>>>> * src/java.base/unix/native/include/jni_md.h: >>>>> >>>>> I don't think the externally_visible attribute should be there for >>>>> arm. I know this was the case for the corresponding hotspot file >>>>> for arm, but that was techically incorrect. The proper dependency >>>>> here is that externally_visible should be in all JNIEXPORT if and >>>>> only if we're building with JVM feature "link-time-opt". >>>>> Traditionally, that feature been enabled when building arm32 >>>>> builds, and only then, so there's been a (coincidentally) >>>>> connection here. Nowadays, Oracle does not care about the arm32 >>>>> builds, and I'm not sure if anyone else is building them with >>>>> link-time-opt enabled. >>>>> >>>>> It does seem wrong to me to export this behavior in the public >>>>> jni_md.h file, though. I think the correct way to solve this, if >>>>> we should continue supporting link-time-opt is to make sure this >>>>> attribute is set for exported hotspot functions. If it's still >>>>> needed, that is. A quick googling seems to indicate that >>>>> visibility("default") might be enough in modern gcc's. >>>>> >>>>> A third option is to remove the support for link-time-opt >>>>> entirely, if it's not really used. >>>> >>>> I didn't know how to change this since we are still building ARM >>>> with the jdk10/hs repository, and ARM needed this change.? I could >>>> wait until we bring down the jdk10/master changes that remove the >>>> ARM build and remove this conditional before I push. Or we could >>>> file an RFE to remove link-time-opt (?) and remove it then? >>>> >>>>> >>>>> * src/java.base/unix/native/include/jvm_md.h and >>>>> src/java.base/windows/native/include/jvm_md.h: >>>>> >>>>> These files define a public API, and contain non-trivial changes. >>>>> I suspect you should file a CSR request. (Even though I realize >>>>> you're only matching the header file with the reality.) >>>>> >>>> >>>> I filed the CSR.?? Waiting for the next steps. >>>> >>>> Thanks, >>>> Coleen >>>> >>>>> /Magnus >>>>> >>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >>>>>> >>>>>> I have a script to update copyright files on commit. >>>>>> >>>>>> Thanks to Magnus and ErikJ for the makefile changes. 
>>>>>> >>>>>> Thanks, >>>>>> Coleen >>>>>> >>>>> >>>> >> From robbin.ehn at oracle.com Fri Oct 27 14:45:36 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Fri, 27 Oct 2017 16:45:36 +0200 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <3d3474e5-2380-8209-cb95-3ca8cc4aa4ed@redhat.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> <59F2DC24.8050701@oracle.com> <59F2F01A.403@oracle.com> <3d3474e5-2380-8209-cb95-3ca8cc4aa4ed@redhat.com> Message-ID: On 2017-10-27 15:21, Andrew Haley wrote: > On 27/10/17 14:14, Robbin Ehn wrote: >> We are discussing the opt-out option, the newest suggestion is to make it >> diagnostic. Opinions? > > We're working on ultra-low-pause-time garbage collection, and it would be very > useful to be able to safepoint the interpreter at any bytecode, not at jumps. > It is a performance-related option rather than diagonstic. > For that I suggest the e.g UseShenandoah to set a VM internal global setting to low latency. Not exposing yet another option to the user. And in dispatch_base look for that, e.g: if (SafepointMechanism::uses_thread_local_poll() && table != safepoint_table && (generate_poll || SOME_GLOBAL_SETTING_FOR_LOW_LATENCY)) { When I get this into jdk10/hs, down-stream it to Shenandoah repo, do the benchmarks, upstream to jdk10/hs (ZGC might want this also). /Robbin From martin.doerr at sap.com Fri Oct 27 14:47:13 2017 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 27 Oct 2017 14:47:13 +0000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <18f2001cbbbd4772aa9268e6e34b4be9@sap.com> <770b1286-3c8e-92e5-3929-17eb4e6c3847@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> <59F2DC24.8050701@oracle.com> <59F2F01A.403@oracle.com> Message-ID: <4ebb905f23324a00b9cf10d8d410d420@sap.com> Hi Robbin, excellent. I think this matches what Coleen had proposed, now. Thanks for doing all the work with so many incremental patches and for responding on so many discussions. Seems to be a tough piece of work. Best regards, Martin -----Original Message----- From: Robbin Ehn [mailto:robbin.ehn at oracle.com] Sent: Freitag, 27. 
Oktober 2017 15:15 To: Erik ?sterlund ; Andrew Haley ; Doerr, Martin ; Karen Kinnear ; Coleen Phillimore (coleen.phillimore at oracle.com) Cc: hotspot-dev developers Subject: Re: RFR(XL): 8185640: Thread-local handshakes Hi all, Poll in switches: http://cr.openjdk.java.net/~rehn/8185640/v7/Interpreter-Poll-Switch-10/ Poll in return: http://cr.openjdk.java.net/~rehn/8185640/v7/Interpreter-Poll-Ret-11/ Please take an extra look at poll in return. Sanity tested, big test run still running (99% complete - OK). Performance regression for the added polls increased to total of -0.68% vs global poll. (was -0.44%) We are discussing the opt-out option, the newest suggestion is to make it diagnostic. Opinions? For anyone applying these patches, the number 9 patch changes the option from product. I have not sent that out. Thanks, Robbin From coleen.phillimore at oracle.com Fri Oct 27 15:13:57 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 27 Oct 2017 11:13:57 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> Message-ID: <57390ec3-8d8d-a3d7-9774-b5945a323be9@oracle.com> On 10/27/17 9:37 AM, David Holmes wrote: >>> src/hotspot/share/c1/c1_LinearScan.cpp >>> >>> ?ConstantIntValue((jint)0); >>> >>> why is this cast needed? what causes the ambiguity? (If this was a >>> template I'd understand ;-) ). Also didn't you change that >>> constructor to take an int anyway - not that I think it should - see >>> below. >> >> Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't match >> 'long' better than any pointer type.? So this cast is needed. > > But you changed the constructor to take an int! > > ?class ConstantIntValue: public ScopeValue { > ? private: > -? jint _value; > +? int _value; > ? public: > -? ConstantIntValue(jint value)???????? { _value = value; } > +? ConstantIntValue(int value)????????? { _value = value; } > I changed this back to not take an int and changed c1_LinearScan.cpp to have the (jint)0 cast and output.cp needed (jint)0 casts.? 0L doesn't work for platforms where jint is an 'int' rather than a long because it's ambiguous with the functions that take a pointer type. Probably better to keep the type of ConstantIntValue consistent with j types. Thanks, Coleen From jini.george at oracle.com Fri Oct 27 15:49:45 2017 From: jini.george at oracle.com (Jini George) Date: Fri, 27 Oct 2017 21:19:45 +0530 Subject: RFR: SA: JDK-8189798: SA cleanup - part 1 In-Reply-To: <691d8166-5395-906a-4256-ef0ab2e2773a@oracle.com> References: <18501902-23db-de6c-b83d-640cd33df836@oracle.com> <691d8166-5395-906a-4256-ef0ab2e2773a@oracle.com> Message-ID: Thank you very much, Serguei. -Jini. On 10/27/2017 2:22 PM, serguei.spitsyn at oracle.com wrote: > Hi Jini, > > The fix looks good to me. > > Thanks, > Serguei > > > On 10/24/17 00:31, Jini George wrote: >> Adding hotspot-dev too. >> >> Thanks, >> Jini. >> >> On 10/24/2017 12:05 PM, Jini George wrote: >>> Hello, >>> >>> As a part of SA next, I am working on writing a test case which >>> compares the fields and the types of the fields of the SA java >>> classes with the corresponding entries in the vmStructs tables. 
This, >>> to some extent, would help in preventing errors in SA due to the >>> changes in hotspot. As a precursor to this, I am in the process of >>> making some cleanup related changes (mostly in SA). I plan to have >>> the changes done in parts. For this webrev, most of the changes are for: >>> >>> 1. Avoiding having some values being redefined in SA. Instead have >>> those exported through vmStructs, and read it in SA. >>> (CompactibleFreeListSpace::_min_chunk_size_in_bytes, >>> CompactibleFreeListSpace::IndexSetSize) >>> >>> Redefinition of hotspot values in SA makes SA error prone, when the >>> value gets altered in hotspot and the corresponding modification gets >>> missed out in SA. >>> >>> 2. To remove some unused code (JNIid.java). >>> 3. Add the missing "CMSBitMap::_bmStartWord" in vmStructs. >>> 4. Modify variable names in SA and hotspot to match the counterpart >>> names, so that the comparison of the fields become easier. Most of >>> the changes belong to this group. >>> >>> Could I please get reviews done for these precursor changes ? >>> >>> JBS Id: https://bugs.openjdk.java.net/browse/JDK-8189798 >>> webrev: http://cr.openjdk.java.net/~jgeorge/8189798/webrev.00/ >>> >>> Thank you, >>> Jini. >>> > From mandy.chung at oracle.com Fri Oct 27 17:47:21 2017 From: mandy.chung at oracle.com (mandy chung) Date: Fri, 27 Oct 2017 10:47:21 -0700 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> Message-ID: <51f09db9-06f5-ad01-bc92-1d73e1113f86@oracle.com> On 10/27/17 7:08 AM, coleen.phillimore at oracle.com wrote: > > > On 10/27/17 9:37 AM, David Holmes wrote: >> >> The one file that is needed is a hotspot file - jvm.h defines the >> interface that hotspot exports via jvm.cpp. >> >> If you leave jvm.h in hotspot/prims then a very large chunk of your >> boilerplate changes are not needed. The JDK code doesn't care what >> the name of the directory is - whatever it is just gets added as a -I >> directive (the JDK code will include "jvm.h" not "prims/jvm.h" the >> way hotspot sources do. >> >> This isn't something we want to change back or move again later. >> Whatever we do now we live with. > > I think it belongs with jni.h and I think the core libraries group > would agree.?? It seems more natural there than buried in the hotspot > prims directory.? I guess this is on hold while we have this debate.?? > Sigh. > > Actually with -I directives, changing to jvm.h from prims/jvm.h would > still work.?? Maybe we should change the name to jvm.hpp since it's > jvm.cpp though??? Or maybe just have two divergent copies and close > this as WNF. I also think hotspot/prims is not a good location. src/java.base/share/include is a well-defined location for native header files.? Maybe internal header files could be placed in include/internal but this is a separate issue .? I should create an issue for jvm.h and jmm.h (I looked at the files under the include directory and jvm.h and jmm.h are the only two internal header files in the include directory). I do think removing the duplicated copy of jvm.h is a good change. 
This is finally possible with the consolidated repository and we no longer need to update two copies of jvm.h for any change to the JVM interface. This change will work with a -I directive setting to the new location, if changed later.

What do you think?

Mandy

From coleen.phillimore at oracle.com Fri Oct 27 18:13:22 2017
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Fri, 27 Oct 2017 14:13:22 -0400
Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot
In-Reply-To: <51f09db9-06f5-ad01-bc92-1d73e1113f86@oracle.com>
References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> <51f09db9-06f5-ad01-bc92-1d73e1113f86@oracle.com>
Message-ID: <2a48b157-06e5-668e-7533-3e073620d7cd@oracle.com>

On 10/27/17 1:47 PM, mandy chung wrote:
>
> On 10/27/17 7:08 AM, coleen.phillimore at oracle.com wrote:
>>
>> On 10/27/17 9:37 AM, David Holmes wrote:
>>>
>>> The one file that is needed is a hotspot file - jvm.h defines the interface that hotspot exports via jvm.cpp.
>>>
>>> If you leave jvm.h in hotspot/prims then a very large chunk of your boilerplate changes are not needed. The JDK code doesn't care what the name of the directory is - whatever it is just gets added as a -I directive (the JDK code will include "jvm.h" not "prims/jvm.h" the way hotspot sources do).
>>>
>>> This isn't something we want to change back or move again later. Whatever we do now we live with.
>>
>> I think it belongs with jni.h and I think the core libraries group would agree. It seems more natural there than buried in the hotspot prims directory. I guess this is on hold while we have this debate. Sigh.
>>
>> Actually with -I directives, changing to jvm.h from prims/jvm.h would still work. Maybe we should change the name to jvm.hpp since it's jvm.cpp though? Or maybe just have two divergent copies and close this as WNF.
>
> I also think hotspot/prims is not a good location. src/java.base/share/include is a well-defined location for native header files. Maybe internal header files could be placed in include/internal but this is a separate issue. I should create an issue for jvm.h and jmm.h (I looked at the files under the include directory and jvm.h and jmm.h are the only two internal header files in the include directory).
>
> I do think removing the duplicated copy of jvm.h is a good change. This is finally possible with the consolidated repository and we no longer need to update two copies of jvm.h for any change to the JVM interface. This change will work with a -I directive setting to the new location, if changed later.
>
> What do you think?

I agree. I'm not really bothered by it being in src/java.base/share/include in the first place though. Only jni.h and jni_md.h are copied into the images, so it seems a bit painful to make jvm.h be in some other directory. But your call, really.
Thanks, Coleen > > Mandy From coleen.phillimore at oracle.com Fri Oct 27 20:20:18 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 27 Oct 2017 16:20:18 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <57390ec3-8d8d-a3d7-9774-b5945a323be9@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <57390ec3-8d8d-a3d7-9774-b5945a323be9@oracle.com> Message-ID: Incremental webrev: http://cr.openjdk.java.net/~coleenp/8189610.incr.01/webrev/index.html thanks, Coleen On 10/27/17 11:13 AM, coleen.phillimore at oracle.com wrote: > > > On 10/27/17 9:37 AM, David Holmes wrote: >>>> src/hotspot/share/c1/c1_LinearScan.cpp >>>> >>>> ?ConstantIntValue((jint)0); >>>> >>>> why is this cast needed? what causes the ambiguity? (If this was a >>>> template I'd understand ;-) ). Also didn't you change that >>>> constructor to take an int anyway - not that I think it should - >>>> see below. >>> >>> Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't match >>> 'long' better than any pointer type.? So this cast is needed. >> >> But you changed the constructor to take an int! >> >> ?class ConstantIntValue: public ScopeValue { >> ? private: >> -? jint _value; >> +? int _value; >> ? public: >> -? ConstantIntValue(jint value)???????? { _value = value; } >> +? ConstantIntValue(int value)????????? { _value = value; } >> > I changed this back to not take an int and changed c1_LinearScan.cpp > to have the (jint)0 cast and output.cp needed (jint)0 casts.? 0L > doesn't work for platforms where jint is an 'int' rather than a long > because it's ambiguous with the functions that take a pointer type. > Probably better to keep the type of ConstantIntValue consistent with j > types. > > Thanks, > Coleen From david.holmes at oracle.com Sat Oct 28 07:46:44 2017 From: david.holmes at oracle.com (David Holmes) Date: Sat, 28 Oct 2017 17:46:44 +1000 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <51f09db9-06f5-ad01-bc92-1d73e1113f86@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> <51f09db9-06f5-ad01-bc92-1d73e1113f86@oracle.com> Message-ID: <66b590da-f94c-6d87-cf61-e269bf1afc0d@oracle.com> On 28/10/2017 3:47 AM, mandy chung wrote: > On 10/27/17 7:08 AM, coleen.phillimore at oracle.com wrote: >> >> >> On 10/27/17 9:37 AM, David Holmes wrote: >>> >>> The one file that is needed is a hotspot file - jvm.h defines the >>> interface that hotspot exports via jvm.cpp. >>> >>> If you leave jvm.h in hotspot/prims then a very large chunk of your >>> boilerplate changes are not needed. The JDK code doesn't care what >>> the name of the directory is - whatever it is just gets added as a -I >>> directive (the JDK code will include "jvm.h" not "prims/jvm.h" the >>> way hotspot sources do. >>> >>> This isn't something we want to change back or move again later. >>> Whatever we do now we live with. 
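To make the -I point above concrete, a minimal illustration (the exact flag spelling and directory here are assumptions for the sketch, not taken from the webrev or the build files):

    // Illustration only: if the directory containing jvm.h is on the include
    // path, e.g. the build adds something like
    //   -I$(TOPDIR)/src/java.base/share/native/include
    // then a JDK native source file can simply write
    #include "jvm.h"     // found via -I, wherever jvm.h actually lives
    // instead of hard-coding a HotSpot-internal layout such as
    // #include "prims/jvm.h"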
>> >> I think it belongs with jni.h and I think the core libraries group >> would agree.?? It seems more natural there than buried in the hotspot >> prims directory.? I guess this is on hold while we have this debate. >> Sigh. >> >> Actually with -I directives, changing to jvm.h from prims/jvm.h would >> still work.?? Maybe we should change the name to jvm.hpp since it's >> jvm.cpp though??? Or maybe just have two divergent copies and close >> this as WNF. > > I also think hotspot/prims is not a good location. > src/java.base/share/include is a well-defined location for native header > files.? Maybe internal header files could be placed in include/internal > but this is a separate issue .? I should create an issue for jvm.h and > jmm.h (I looked at the files under the include directory and jvm.h and > jmm.h are the only two internal header files in the include directory). Keeping it in prims avoids the need to touch many hotspot files, and with no changes needed on the JDK side because we use a -I directive to set the include path anyway. This is the exported VM interface so it makes sense to me for it to be located in the VM sources. But I'm not going to oppose this either way so it's up to Coleen. > I do think removing the duplicated copy of jvm.h is a good change. This > is finally possible with the consolidated repository and we no longer > need to update two copies of jvm.h for any change to the JVM Unfortunately we did not do this though - hence the divergence between the two. The use of int versus long for jint is causing a real problem. Coleen also hit the other issue on the head. The JNI and JVM interfaces are C interfaces, not C++. The JDK code that uses them is compiled as C - so all good. But the JVM code that implements them is compiled as C++, and that is why we are getting issues with differing linkage directives. David ----- > interface.?? This change will work with -I directive setting to the new > location, if changed later. > > What do you think? > > Mandy From david.holmes at oracle.com Sat Oct 28 07:50:27 2017 From: david.holmes at oracle.com (David Holmes) Date: Sat, 28 Oct 2017 17:50:27 +1000 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> Message-ID: <22afedef-59cc-ecde-48fc-0afb7b4bbb47@oracle.com> Hi Coleen, I've commented on the file location in response to Mandy's email. The only issue I'm still concerned about is the JVM_MAXPATHLEN issue. I think it is a bug to define a JVM_MAXPATHLEN that is bigger than the platform MAXPATHLEN. I also would not want to see any change in behaviour because of this - so AIX and Solaris should not get a different JVM_MAXPATHLEN due to this refactoring change. So yes I think this needs to be ifdef'd for Linux and reluctantly (because it was a copy error) for OSX/BSD as well. Thanks, David On 28/10/2017 12:08 AM, coleen.phillimore at oracle.com wrote: > > > On 10/27/17 9:37 AM, David Holmes wrote: >> On 27/10/2017 10:13 PM, coleen.phillimore at oracle.com wrote: >>> >>> >>> On 10/27/17 3:23 AM, David Holmes wrote: >>>> Hi Coleen, >>>> >>>> Thanks for tackling this. 
>>>> >>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>> >>>> Can you update the bug synopsis to show it covers both sets of files >>>> please. >>>> >>>> I hate to start with this (and it took me quite a while to realize >>>> it) but as Mandy pointed out jvm.h is not an exported interface from >>>> the JDK to the outside world (so not subject to CSR review), but is >>>> a private interface between the JVM and the JDK libraries. So I >>>> think really jvm.h belongs in the hotspot sources where it was, >>>> while jni.h belongs in the exported JDK sources. In which case the >>>> bulk of your changes to the hotspot files would not be needed - sorry. >>> >>> Maybe someone can make that decision and change at a later date. The >>> point of this change is that there is now only one of these files >>> that is shared.? I don't think jvm.h and the jvm_md.h belong on the >>> hotspot sources for the jdk to find them in some random prims and os >>> dependent directories. >> >> The one file that is needed is a hotspot file - jvm.h defines the >> interface that hotspot exports via jvm.cpp. >> >> If you leave jvm.h in hotspot/prims then a very large chunk of your >> boilerplate changes are not needed. The JDK code doesn't care what the >> name of the directory is - whatever it is just gets added as a -I >> directive (the JDK code will include "jvm.h" not "prims/jvm.h" the way >> hotspot sources do. >> >> This isn't something we want to change back or move again later. >> Whatever we do now we live with. > > I think it belongs with jni.h and I think the core libraries group would > agree.?? It seems more natural there than buried in the hotspot prims > directory.? I guess this is on hold while we have this debate.?? Sigh. > > Actually with -I directives, changing to jvm.h from prims/jvm.h would > still work.?? Maybe we should change the name to jvm.hpp since it's > jvm.cpp though??? Or maybe just have two divergent copies and close this > as WNF. > >> >>> I'm happy to withdraw the CSR.? We generally use the CSR process to >>> add and remove JVM_ interfaces even though they're a private >>> interface in case some other JVM/JDK combination relies on them. The >>> changes to these files are very minor though and not likely to cause >>> any even theoretical incompatibility, so I'll withdraw it. >>>> >>>> Moving on ... >>>> >>>> First to address the initial comments/query you had: >>>> >>>>> The JDK windows jni_md.h file defined jint as long and the hotspot >>>>> windows jni_x86.h as int. I had to choose the jdk version since >>>>> it's the >>>>> public version, so there are changes to the hotspot files for this. >>>> >>>> On Windows int and long are always the same as it uses ILP32 or >>>> LLP64 (not LP64 like *nix platforms). So either choice should be >>>> fine. That said there are some odd casting issues I comment on >>>> below. Does the VS compiler complain about mixing int and long in >>>> expressions? >>> >>> Yes, it does even though int and long are the same representation. >> >> And what an absolute mess that makes. :( >> >>>> >>>>> Generally I changed the code to use 'int' rather than 'jint' where the >>>>> surrounding API didn't insist on consistently using java types. We >>>>> should mostly be using C++ types within hotspot except in >>>>> interfaces to >>>>> native/JNI code. >>>> >>>> I think you pulled too hard on a few threads here and things are >>>> starting to unravel. 
There are numerous cases I refer to below where >>>> either the cast seems unnecessary/inappropriate or else highlights a >>>> bunch of additional changes that also need to be made. The fan out >>>> from this could be horrendous. Unless you actually get some kind of >>>> error - and I'd like to understand the details of those - I would >>>> not suggest making these changes as part of this work. >>> >>> I didn't make any change unless there was was an error.? I have 100 >>> failed JPRT jobs to confirm!? I eventually got a Windows system to >>> compile and test this on.?? Actually some of the changes came out >>> better.? Cases where we use jint as a bool simply turned to int.? We >>> do not have an overload for bool for cmpxchg. >> >> That's unfortunate - ditto for OrderAccess. >> >>>> >>>> Looking through I have a quite a few queries/comments - apologies in >>>> advance as I know how tedious this is: >>>> >>>> make/hotspot/lib/CompileLibjsig.gmk >>>> src/java.base/solaris/native/libjsig/jsig.c >>>> >>>> Took a while to figure out why the include was needed. :) As a >>>> follow up I suggest just deleting the -I include directive, delete >>>> the Solaris-only definition of JSIG_VERSION_1_4_1, and delete >>>> everything to do with JVM_get_libjsig_version. It is all obsolete. >>> >>> Can I patch up jsig in a separate RFE?? I don't remember why this >>> broke so I simply moved JSIG #define.? Is jsig obsolete? Removing >>> JVM_* definitions generally requires a CSR. >> >> I did say "As a follow up". jsig is not obsolete but the jsig >> versioning code, only used by Solaris, is. >> >>>> >>>> --- >>>> >>>> src/hotspot/cpu/arm/interp_masm_arm.cpp >>>> >>>> Why did you need to add the jvm.h include? >>>> >>> >>> ?? tbz(Raccess_flags, JVM_ACC_SYNCHRONIZED_BIT, unlocked); >> >> Okay. I'm not going to try and figure out how this code found this >> before. >> >>>> --- >>>> >>>> src/hotspot/os/windows/os_windows.cpp. >>>> >>>> The type of process_exiting should be uint to match the DWORD of >>>> GetCurrentThreadID. Then you should need any casts. Also you missed >>>> this jint cast: >>>> >>>> 3796???????? process_exiting != (jint)GetCurrentThreadId()) { >>> >>> Yes, that's better to change process_exiting to a DWORD.? It needs a >>> DWORD cast to 0 in the cmpxchg. >>> >>> ???????? Atomic::cmpxchg(GetCurrentThreadId(), &process_exiting, >>> (DWORD)0); >>> >>> These templates are picky. >> >> Yes - their inability to deal with literals is extremely frustrating. >> >>>> >>>> --- >>>> >>>> src/hotspot/share/c1/c1_Canonicalizer.hpp >>>> >>>> ? 43 #ifdef _WINDOWS >>>> ? 44?? // jint is defined as long in jni_md.h, so convert from int >>>> to jint >>>> ? 45?? void set_constant(int x)?????????????????????? { >>>> set_constant((jint)x); } >>>> ? 46 #endif >>>> >>>> Why is this necessary? int and long are the same on Windows. The >>>> whole point is that jint hides the underlying type, so where does >>>> this go wrong? >>> >>> No, they are not the same types even though they have the same >>> representation! >> >> This is truly unfortunate. >> >>>> >>>> --- >>>> >>>> src/hotspot/share/c1/c1_LinearScan.cpp >>>> >>>> ?ConstantIntValue((jint)0); >>>> >>>> why is this cast needed? what causes the ambiguity? (If this was a >>>> template I'd understand ;-) ). Also didn't you change that >>>> constructor to take an int anyway - not that I think it should - see >>>> below. >>> >>> Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't match >>> 'long' better than any pointer type.? So this cast is needed. 
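A standalone sketch of the overload ambiguity being described here (the overload names and types are hypothetical, not the actual debugInfo.hpp declarations; jint is spelled long to mirror the Windows jni_md.h):

    #include <cstdio>

    typedef long jint;                      // Windows: jint is long

    struct ScopeValue {};

    // Two plausible overloads, standing in for ConstantIntValue(jint) and a
    // pointer-taking overload elsewhere in the same API.
    void take(jint v)        { std::printf("integer overload\n"); }
    void take(ScopeValue* p) { std::printf("pointer overload\n"); }

    int main() {
      // take(0);      // error: ambiguous - the int 0 converts equally well to
      //               // long (integral conversion) and to ScopeValue* (null
      //               // pointer conversion)
      take((jint)0);   // OK: exact match once the literal is cast
      return 0;
    }
    // On unix, where jint is a plain int, take(0) is an exact match and compiles
    // without the cast - which is why the problem only shows up on Windows.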
>> >> But you changed the constructor to take an int! >> >> ?class ConstantIntValue: public ScopeValue { >> ? private: >> -? jint _value; >> +? int _value; >> ? public: >> -? ConstantIntValue(jint value)???????? { _value = value; } >> +? ConstantIntValue(int value)????????? { _value = value; } >> >> > > Okay I removed this cast. > >>>> --- >>>> >>>> src/hotspot/share/ci/ciReplay.cpp >>>> >>>> 793???????? jint* dims = NEW_RESOURCE_ARRAY(jint, rank); >>>> >>>> why should this be jint? >>> >>> To avoid a cast from int* to jint* in the line below: >>> >>> ????????? value = kelem->multi_allocate(rank, dims, CHECK); >>> >>> >>>> >>>> --- >>>> >>>> src/hotspot/share/classfile/altHashing.cpp >>>> >>>> Okay this looks more consistent with jint. >>> >>> Yes.? I translated this from some native code iirc. >>>> >>>> --- >>>> >>>> src/hotspot/share/code/debugInfo.hpp >>>> >>>> These changes seem wrong. We have: >>>> >>>> ConstantLongValue(jlong value) >>>> ConstantDoubleValue(jdouble value) >>>> >>>> so we should have: >>>> >>>> ConstantIntValue(jint value) >>> >>> Again, there are multiple call sites with '0', which match int >>> trivially but are confused with long.? It's less consistent I agree >>> but better to not cast all the call sites. >> >> This is really making a mess of the APIs - they should be a jint but >> we declare them int because of a 0 casting problem. Can't we just use 0L? > > There aren't that many casts.? You're right, that would have been better > in some places. > >>>> >>>> --- >>>> >>>> src/hotspot/share/code/relocInfo.cpp >>>> >>>> Change seems unnecessary - int32_t is fine >>>> >>> >>> No, int32_t doesn't match the calls below it.? They all assume _lo >>> and _hi are jint. >>>> --- >>>> >>>> src/hotspot/share/compiler/compileBroker.cpp >>>> src/hotspot/share/compiler/compileBroker.hpp >>>> >>>> I see a complete mix of int and jint in this class, so why make the >>>> one change you did ?? >>> >>> This is another case of using jint as a flag with cmpxchg.? The >>> templates for cmpxchg want the types to match and 0 and 1 are >>> essentially 'int'.? This is a lot cleaner this way. >> >> >> >>>> >>>> --- >>>> >>>> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp >>>> >>>> 1700???? tty->write((char*) start, MIN2(length, (jint)O_BUFLEN)); >>>> >>>> why did you need to add the jint cast? It's used without any cast on >>>> the next two lines: >>>> >>>> 1701???? length -= O_BUFLEN; >>>> 1702???? offset += O_BUFLEN; >>>> >>> >>> There's a conversion from O_BUFLEN from int to long in 1701 and >>> 1702.?? MIN2 is a template that wants the types to match exactly. >> >> $%^%$! templates! >> >>>> ?? >>>> >>>> --- >>>> >>>> src/hotspot/share/jvmci/jvmciRuntime.cpp >>>> >>>> Looking around this code it seems very confused about types - eg the >>>> previous function is declared jboolean yet returns a jint on one >>>> path! It isn't clear to me if the return type is what should be >>>> changed or the parameter type? I would just leave this alone. >>> >>> I can't leave it alone because it doesn't compile that way. This was >>> the minimal change and yea, does look a bit inconsistent. >>>> >>>> --- >>>> >>>> src/hotspot/share/opto/mulnode.cpp >>>> >>>> Okay TypeInt has jint parts, so the remaining int32_t declarations >>>> (A, B, C, D) should also be jint. >>> >>> Yes.? c2 uses jint types. >>>> >>>> --- >>>> >>>> src/hotspot/share/opto/parse3.cpp >>>> >>>> I agree with the changes you made, but then: >>>> >>>> ?419???? jint dim_con = find_int_con(length[j], -1); >>>> >>>> should also be changed. 
>>>> >>>> And obviously MultiArrayExpandLimit should be defined as int not intx! >>> >>> Everything in globals.hpp is intx.? That's a thread that I don't want >>> to pull on! >> >> We still have that limitation? >>> >>> Changed dim_con to int. >>>> >>>> --- >>>> >>>> src/hotspot/share/opto/phaseX.cpp >>>> >>>> I can see that intcon(jint i) is consistent with longcon(jlong l), >>>> but the use of "i" in the code is more consistent with int than jint. >>> >>> huh?? really? >>>> >>>> --- >>>> >>>> src/hotspot/share/opto/type.cpp >>>> >>>> 1505 int TypeInt::hash(void) const { >>>> 1506?? return java_add(java_add(_lo, _hi), java_add((jint)_widen, >>>> (jint)Type::Int)); >>>> 1507 } >>>> >>>> I can see that the (jint) casts you added make sense, but then the >>>> whole function should be returning jint not int. Ditto the other >>>> hash functions. >>> >>> I'm not messing with this, this is the minimal in type fixing that >>> I'm going to do here. >> >> >> >>>> >>>> --- >>>> >>>> src/hotspot/share/prims/jni.cpp >>>> >>>> I think vm_created should be a bool. In fact all the fields you >>>> changed are logically bools - do Atomics work for bool now? >>> >>> No, they do not.?? I had thought bool would be better originally too. >>>> >>>> --- >>>> >>>> src/hotspot/share/prims/jvm.cpp >>>> >>>> is_attachable is the terminology used in the JDK code. >>> >>> Well the JDK version had is_attach_supported() as the flag name so I >>> used that in this one place. >>>> >>>> --- >>>> >>>> src/hotspot/share/prims/jvmtiEnvBase.cpp >>>> src/hotspot/share/prims/jvmtiImpl.cpp >>>> >>>> Are you making parameters consistent with the fields they initialize? >>> >>> They're consistent with the declarations now. >>>> >>>> --- >>>> >>>> src/hotspot/share/prims/jvmtiTagMap.cpp >>>> >>>> There is a mix of int and jint for slot in this code. You fixed >>>> some, but this remains: >>>> >>>> 2440 inline bool CallbackInvoker::report_stack_ref_root(jlong >>>> thread_tag, >>>> 2441??????????????????????????????????????????????????? jlong tid, >>>> 2442??????????????????????????????????????????????????? jint depth, >>>> 2443 jmethodID method, >>>> 2444 jlocation bci, >>>> 2445??????????????????????????????????????????????????? jint slot, >>> >>> Right for consistency with the declarations. >>>> >>>> --- >>>> >>>> src/hotspot/share/runtime/perfData.cpp >>>> >>>> Callers pass both jint and int, so param type seems arbitrary. >>> >>> They are, but importantly they match the declarations. >>>> >>>> --- >>>> >>>> src/hotspot/share/runtime/perfMemory.cpp >>>> src/hotspot/share/runtime/perfMemory.hpp >>>> >>>> PerfMemory::_initialized should ideally be a bool - can OrderAccess >>>> handle that now? >>> >>> Nope. >>>> >>>> --- >>>> >>>> src/java.base/share/native/include/jvm.h >>>> >>>> Not clear why the jio functions are not also JNICALL ? >>> >>> They are now.? The JDK version didn't have JNICALL.? JVM needs >>> JNICALL.? I can't tell you why JDK didn't need JNICALL linkage. >> >> ?? JVM currently does not have JNICALL. But they are declared as >> "extern C". > > This was a compilation error on Windows with JDK.?? Maybe the C code in > the JDK doesn't complain about linkage differences.? I'll have to go > back and figure this out then. >> >>>> >>>> --- >>>> >>>> src/java.base/unix/native/include/jni_md.h >>>> >>>> There is no need to special case ARM. The differences in the >>>> existing code were for LTO support and that is now irrelevant. >>> >>> See discussion with Magnus.?? 
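For readers following the externally_visible discussion, the conditional in question is roughly of this shape (a simplified sketch; the real jni_md.h feature tests and the link-time-opt wiring differ, so treat the details as assumptions):

    /* Simplified sketch of the unix jni_md.h export macros under discussion. */
    #if defined(__GNUC__)
      #ifdef ARM   /* only while arm32 builds with link-time-opt still exist */
        #define JNIEXPORT  __attribute__((externally_visible, visibility("default")))
        #define JNIIMPORT  __attribute__((externally_visible, visibility("default")))
      #else
        #define JNIEXPORT  __attribute__((visibility("default")))
        #define JNIIMPORT  __attribute__((visibility("default")))
      #endif
    #else
      #define JNIEXPORT
      #define JNIIMPORT
    #endif
    #define JNICALL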
We still build ARM for jdk10/hs so I >>> needed this conditional or of course I wouldn't have added it.? We >>> can remove it with LTO support. >> >> Those builds are gone - this is obsolete. But yes all LTO can be >> removed later if you wish. Just trying to simplify things now. >> >>>> >>>> --- >>>> >>>> src/java.base/unix/native/include/jvm_md.h >>>> >>>> I know you've just copied this across, but it seems wrong to me: >>>> >>>> ?57 // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on others. >>>> This may >>>> ? 58 //?????? cause problems if JVM and the rest of JDK are built on >>>> different >>>> ? 59 //?????? Linux releases. Here we define JVM_MAXPATHLEN to be >>>> MAXPATHLEN + 1, >>>> ? 60 //?????? so buffers declared in VM are always >= 4096. >>>> ? 61 #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>>> >>>> It doesn't make sense to me to define an internal "max path length" >>>> that can _exceed_ the platform max! >>>> >>>> That aside there's no support for building different parts of the >>>> JDK on different platforms and then bringing them together. And in >>>> any case I would think the real problem would be building on a >>>> platform that uses 4096 and running on one that uses 4095! >>>> >>>> But that aside this is a Linux hack and should be guarded by ifdef >>>> LINUX. (I doubt BSD needs it, the bsd file is just a copy of the >>>> linux one - the JDK macosx version does the right thing). Solaris >>>> and AIX should stay as-is at MAXPATHLEN. >>> >>> All of the unix platforms had MAXPATHLEN+1.? I'll leave it for now >>> and we can investigate that further. >> >> I see the following existing code: >> >> src/java.base/unix/native/include/jvm_md.h: >> >> #define JVM_MAXPATHLEN MAXPATHLEN >> >> src/java.base/macosx/native/include/jvm_md.h >> >> #define JVM_MAXPATHLEN MAXPATHLEN >> >> src/hotspot/os/aix/jvm_aix.h >> >> #define JVM_MAXPATHLEN MAXPATHLEN >> >> src/hotspot/os/bsd/jvm_bsd.h >> >> #define JVM_MAXPATHLEN MAXPATHLEN + 1? // blindly copied from Linux >> version >> >> src/hotspot/os/linux/jvm_linux.h >> >> #define JVM_MAXPATHLEN MAXPATHLEN + 1 >> >> src/hotspot/os/solaris/jvm_solaris.h >> >> #define JVM_MAXPATHLEN MAXPATHLEN >> >> This is a linux only hack (if you ignore the blind copy from linux >> into the BSD code in the VM). > > Oh, thanks, so should I add a bunch of ifdefs then?? Or do you think > having MAXPATHLEN + 1 will really break the other platforms?? Do you > really see this as a problem or are you just pointing out inconsistency? >> >>>> >>>> ?86 #define ASYNC_SIGNAL???? SIGJVM2 >>>> >>>> This only exists on Solaris so I think should be in #ifdef SOLARIS, >>>> to make that clear. >>> >>> Ok.? I'll add this. >>>> >>>> --- >>>> >>>> src/java.base/windows/native/include/jvm_md.h >>>> >>>> Given the differences between the two versions either something has >>>> been broken or "extern C" declarations are not needed :) >>> >>> Well, they are needed for Hotspot to build and do not prevent jdk >>> from building.? I don't know what was broken. >> >> We really need to understand this better. Maybe related to the map >> files that expose the symbols. ?? > > They're needed because the JDK files are written mostly in C and that > doesn't complain about the linkage difference.? Hotspot files are in C++ > which does complain. > >> >>>> >>>> --- >>>> >>>> That was a really painful way to spend most of my Friday. TGIF! :) >>> >>> Thanks for going through it.? See comments inline for changes. >>> Generating a webrev takes hours so I'm not going to do that unless >>> you insist. 
>> >> An incremental webrev shouldn't take long - right? You're a mq maestro >> now. :) > > Well I generally trash a repository whenever I use mq but sure. >> >> If you can reasonably produce an incremental webrev once you've >> settled on all the comments/issues that would be good. > > Ok, sure. > > Coleen >> >> Thanks, >> David >> >>> Thanks, >>> Coleen >>> >>> >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>> >>>> On 27/10/2017 6:44 AM, coleen.phillimore at oracle.com wrote: >>>>> ??Hi Magnus, >>>>> >>>>> Thank you for reviewing this.?? I have a new version that takes out >>>>> the hack in globalDefinitions.hpp and adds casts to >>>>> src/hotspot/share/opto/type.cpp instead. >>>>> >>>>> Also some fixes from Martin at SAP. >>>>> >>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.02/webrev >>>>> >>>>> see below. >>>>> >>>>> On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: >>>>>> Coleen, >>>>>> >>>>>> Thank you for addressing this! >>>>>> >>>>>> On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >>>>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>>>> >>>>>>> Mostly used sed to remove prims/jvm.h and move #include "jvm.h" >>>>>>> after precompiled.h, so if you have repetitive stress wrist >>>>>>> issues don't click on most of these files. >>>>>>> >>>>>>> There were more issues to resolve, however.? The JDK windows >>>>>>> jni_md.h file defined jint as long and the hotspot windows >>>>>>> jni_x86.h as int. I had to choose the jdk version since it's the >>>>>>> public version, so there are changes to the hotspot files for >>>>>>> this. Generally I changed the code to use 'int' rather than >>>>>>> 'jint' where the surrounding API didn't insist on consistently >>>>>>> using java types. We should mostly be using C++ types within >>>>>>> hotspot except in interfaces to native/JNI code.? There are a >>>>>>> couple of hacks in places where adding multiple jint casts was >>>>>>> too painful. >>>>>>> >>>>>>> Tested with JPRT and tier2-4 (in progress). >>>>>>> >>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.01/webrev >>>>>> >>>>>> Looks great! >>>>>> >>>>>> Just a few comments: >>>>>> >>>>>> * src/java.base/unix/native/include/jni_md.h: >>>>>> >>>>>> I don't think the externally_visible attribute should be there for >>>>>> arm. I know this was the case for the corresponding hotspot file >>>>>> for arm, but that was techically incorrect. The proper dependency >>>>>> here is that externally_visible should be in all JNIEXPORT if and >>>>>> only if we're building with JVM feature "link-time-opt". >>>>>> Traditionally, that feature been enabled when building arm32 >>>>>> builds, and only then, so there's been a (coincidentally) >>>>>> connection here. Nowadays, Oracle does not care about the arm32 >>>>>> builds, and I'm not sure if anyone else is building them with >>>>>> link-time-opt enabled. >>>>>> >>>>>> It does seem wrong to me to export this behavior in the public >>>>>> jni_md.h file, though. I think the correct way to solve this, if >>>>>> we should continue supporting link-time-opt is to make sure this >>>>>> attribute is set for exported hotspot functions. If it's still >>>>>> needed, that is. A quick googling seems to indicate that >>>>>> visibility("default") might be enough in modern gcc's. >>>>>> >>>>>> A third option is to remove the support for link-time-opt >>>>>> entirely, if it's not really used. 
>>>>> >>>>> I didn't know how to change this since we are still building ARM >>>>> with the jdk10/hs repository, and ARM needed this change.? I could >>>>> wait until we bring down the jdk10/master changes that remove the >>>>> ARM build and remove this conditional before I push. Or we could >>>>> file an RFE to remove link-time-opt (?) and remove it then? >>>>> >>>>>> >>>>>> * src/java.base/unix/native/include/jvm_md.h and >>>>>> src/java.base/windows/native/include/jvm_md.h: >>>>>> >>>>>> These files define a public API, and contain non-trivial changes. >>>>>> I suspect you should file a CSR request. (Even though I realize >>>>>> you're only matching the header file with the reality.) >>>>>> >>>>> >>>>> I filed the CSR.?? Waiting for the next steps. >>>>> >>>>> Thanks, >>>>> Coleen >>>>> >>>>>> /Magnus >>>>>> >>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >>>>>>> >>>>>>> I have a script to update copyright files on commit. >>>>>>> >>>>>>> Thanks to Magnus and ErikJ for the makefile changes. >>>>>>> >>>>>>> Thanks, >>>>>>> Coleen >>>>>>> >>>>>> >>>>> >>> > From david.holmes at oracle.com Sat Oct 28 07:58:30 2017 From: david.holmes at oracle.com (David Holmes) Date: Sat, 28 Oct 2017 17:58:30 +1000 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <57390ec3-8d8d-a3d7-9774-b5945a323be9@oracle.com> Message-ID: <0f568e05-6f06-d2df-571e-0c591f062c15@oracle.com> On 28/10/2017 6:20 AM, coleen.phillimore at oracle.com wrote: > > Incremental webrev: > > http://cr.openjdk.java.net/~coleenp/8189610.incr.01/webrev/index.html That all looks fine - thanks. If I get a chance I'll look deeper into why the VS compiler needs 0 to be cast to jint (aka long) to avoid ambiguity with it being a NULL pointer. I could understand if it always needed the cast, but not only needing it for long, but not int. Thanks, David > thanks, > Coleen > > On 10/27/17 11:13 AM, coleen.phillimore at oracle.com wrote: >> >> >> On 10/27/17 9:37 AM, David Holmes wrote: >>>>> src/hotspot/share/c1/c1_LinearScan.cpp >>>>> >>>>> ?ConstantIntValue((jint)0); >>>>> >>>>> why is this cast needed? what causes the ambiguity? (If this was a >>>>> template I'd understand ;-) ). Also didn't you change that >>>>> constructor to take an int anyway - not that I think it should - >>>>> see below. >>>> >>>> Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't match >>>> 'long' better than any pointer type.? So this cast is needed. >>> >>> But you changed the constructor to take an int! >>> >>> ?class ConstantIntValue: public ScopeValue { >>> ? private: >>> -? jint _value; >>> +? int _value; >>> ? public: >>> -? ConstantIntValue(jint value)???????? { _value = value; } >>> +? ConstantIntValue(int value)????????? { _value = value; } >>> >> I changed this back to not take an int and changed c1_LinearScan.cpp >> to have the (jint)0 cast and output.cp needed (jint)0 casts.? 0L >> doesn't work for platforms where jint is an 'int' rather than a long >> because it's ambiguous with the functions that take a pointer type. >> Probably better to keep the type of ConstantIntValue consistent with j >> types. 
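A companion sketch for the unix case mentioned here, where jint is a plain int and a 0L literal runs into the same kind of ambiguity (again hypothetical overloads, not the real constructors):

    typedef int jint;                        // unix: jint is int

    struct ScopeValue {};

    void take(jint v)        {}              // e.g. ConstantIntValue(jint)
    void take(ScopeValue* p) {}              // some pointer-taking overload

    void demo() {
      take(0);         // fine: exact match for the int overload
      // take(0L);     // error: long -> int and 0L -> ScopeValue* are both
      //               // plain conversions, so the call is ambiguous
      take((jint)0);   // the cast works for either definition of jint
    }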
>>
>> Thanks,
>> Coleen
>

From kumar.x.srinivasan at oracle.com Fri Oct 27 17:12:43 2017
From: kumar.x.srinivasan at oracle.com (Kumar Srinivasan)
Date: Fri, 27 Oct 2017 10:12:43 -0700
Subject: RFR: 8190287: Update JDK's internal ASM to ASMv6
Message-ID: <59F3690B.6070309@oracle.com>

Hello Remi, Sundar and others,

Please review the webrev [1] to update JDK's internal ASM to v6. To help with review areas, you can use the browser to search for mq patches commented with //

Highlights of changes:
1. updated ASMv6 // jdk-new-asmv6.patch
2. changes to jlink and jar to add ModuleMainClass and ModulePackages attributes //jdk-new-asm-update.patch
3. adjustments to jdk tests //jdk-new-asm-test.patch
4. minor adjustments to hotspot tests //jdk-new-hotspot-test.patch

Tests: jdk_tier1, jdk_tier2, testset hotspot, hotspot_tier1, nashorn ant tests. Alan has also run several tests.

Big thanks to Alan for #2 and #3 as part of [3].

Thanks
Kumar

[1] http://cr.openjdk.java.net/~ksrini/8190287/webrev.00/index.html
[2] https://bugs.openjdk.java.net/browse/JDK-8190287
[3] https://bugs.openjdk.java.net/browse/JDK-8186236

From magnus.ihse.bursie at oracle.com Mon Oct 30 07:50:02 2017
From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie)
Date: Mon, 30 Oct 2017 08:50:02 +0100
Subject: RFR [10] 8189800: Add support for AddressSanitizer
In-Reply-To:
References: <51eabbae-5435-59be-f443-a6b214a17513@oracle.com>
Message-ID: <55e0e055-2e65-5c83-3f8e-36895f71860e@oracle.com>

On 2017-10-30 08:39, Artem Smotrakov wrote:
> cc'ing hotspot-dev at openjdk.java.net as David suggested.
>
> Artem
>
> On 10/27/2017 11:02 PM, Artem Smotrakov wrote:
>> Hello,
>>
>> Please review the following patch which adds support for AddressSanitizer.
>>
>> AddressSanitizer is a runtime memory error detector which looks for various memory corruption issues and leaks.
>>
>> Please refer to [1] for details. AddressSanitizer is available in gcc 4.8+ and clang 3.1+.
>>
>> The patch below introduces an --enable-asan parameter for the configure script which enables AddressSanitizer.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8189800
>> Webrev: http://cr.openjdk.java.net/~asmotrak/8189800/webrev.00/

spec.gmk.in should only have export for variables that need to be exported in the environment for executing binaries, that is ASAN_OPTIONS and LD_LIBRARY_PATH, not ASAN_ENABLED or DEVKIT_LIB_DIR.

I'm also a bit curious about the addition of DEVKIT_LIB_DIR. Would you care to elaborate your thinking?

Otherwise it looks good.
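For context, a tiny self-contained example of the kind of defect AddressSanitizer reports at run time (illustrative only, unrelated to the build patch itself):

    // Compile with an ASan-enabled toolchain, e.g. g++ -fsanitize=address -g
    int main() {
      int* a = new int[10];
      int v = a[10];          // heap-buffer-overflow: reads one past the end
      delete [] a;
      return v != 0;          // keep the read alive so it is not optimized away
    }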
/Magnus >> >> [1] https://github.com/google/sanitizers/wiki/AddressSanitizer >> >> Artem > From coleen.phillimore at oracle.com Mon Oct 30 12:07:46 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 30 Oct 2017 08:07:46 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <66b590da-f94c-6d87-cf61-e269bf1afc0d@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> <51f09db9-06f5-ad01-bc92-1d73e1113f86@oracle.com> <66b590da-f94c-6d87-cf61-e269bf1afc0d@oracle.com> Message-ID: On 10/28/17 3:46 AM, David Holmes wrote: > On 28/10/2017 3:47 AM, mandy chung wrote: >> On 10/27/17 7:08 AM, coleen.phillimore at oracle.com wrote: >>> >>> >>> On 10/27/17 9:37 AM, David Holmes wrote: >>>> >>>> The one file that is needed is a hotspot file - jvm.h defines the >>>> interface that hotspot exports via jvm.cpp. >>>> >>>> If you leave jvm.h in hotspot/prims then a very large chunk of your >>>> boilerplate changes are not needed. The JDK code doesn't care what >>>> the name of the directory is - whatever it is just gets added as a >>>> -I directive (the JDK code will include "jvm.h" not "prims/jvm.h" >>>> the way hotspot sources do. >>>> >>>> This isn't something we want to change back or move again later. >>>> Whatever we do now we live with. >>> >>> I think it belongs with jni.h and I think the core libraries group >>> would agree.?? It seems more natural there than buried in the >>> hotspot prims directory.? I guess this is on hold while we have this >>> debate.?? Sigh. >>> >>> Actually with -I directives, changing to jvm.h from prims/jvm.h >>> would still work.?? Maybe we should change the name to jvm.hpp since >>> it's jvm.cpp though??? Or maybe just have two divergent copies and >>> close this as WNF. >> >> I also think hotspot/prims is not a good location. >> src/java.base/share/include is a well-defined location for native >> header files.? Maybe internal header files could be placed in >> include/internal but this is a separate issue .? I should create an >> issue for jvm.h and jmm.h (I looked at the files under the include >> directory and jvm.h and jmm.h are the only two internal header files >> in the include directory). > > Keeping it in prims avoids the need to touch many hotspot files, and > with no changes needed on the JDK side because we use a -I directive > to set the include path anyway. This is the exported VM interface so > it makes sense to me for it to be located in the VM sources. > > But I'm not going to oppose this either way so it's up to Coleen. I've already disagreed that this file belongs in src/hotspot/share/prims, so the include directive without prims is preferred.? This allows putting jvm.h in a new place if/when that is agreed upon. > >> I do think removing the duplicated copy of jvm.h is a good change. >> This is finally possible with the consolidated repository and we no >> longer need to update two copies of jvm.h for any change to the JVM > > Unfortunately we did not do this though - hence the divergence between > the two. The use of int versus long for jint is causing a real problem. > > Coleen also hit the other issue on the head. The JNI and JVM > interfaces are C interfaces, not C++. 
The JDK code that uses them is > compiled as C - so all good. But the JVM code that implements them is > compiled as C++, and that is why we are getting issues with differing > linkage directives. Well, there is now one source file for jvm.h and jni.h and their machine dependent counterparts and 2500 lines of duplicated code is removed with this change.? The issues with jint and linkages are resolved and tested as well with this changeset. Thanks, Coleen > > David > ----- > >> interface.?? This change will work with -I directive setting to the >> new location, if changed later. >> >> What do you think? >> >> Mandy From coleen.phillimore at oracle.com Mon Oct 30 12:13:45 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 30 Oct 2017 08:13:45 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <22afedef-59cc-ecde-48fc-0afb7b4bbb47@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> <22afedef-59cc-ecde-48fc-0afb7b4bbb47@oracle.com> Message-ID: On 10/28/17 3:50 AM, David Holmes wrote: > Hi Coleen, > > I've commented on the file location in response to Mandy's email. > > The only issue I'm still concerned about is the JVM_MAXPATHLEN issue. > I think it is a bug to define a JVM_MAXPATHLEN that is bigger than the > platform MAXPATHLEN. I also would not want to see any change in > behaviour because of this - so AIX and Solaris should not get a > different JVM_MAXPATHLEN due to this refactoring change. So yes I > think this needs to be ifdef'd for Linux and reluctantly (because it > was a copy error) for OSX/BSD as well. #if defined(AIX) || defined(SOLARIS) #define JVM_MAXPATHLEN MAXPATHLEN #else // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on others. This may //?????? cause problems if JVM and the rest of JDK are built on different //?????? Linux releases. Here we define JVM_MAXPATHLEN to be MAXPATHLEN + 1, //?????? so buffers declared in VM are always >= 4096. #define JVM_MAXPATHLEN MAXPATHLEN + 1 #endif Is this ok? thanks, Coleen > > Thanks, > David > > On 28/10/2017 12:08 AM, coleen.phillimore at oracle.com wrote: >> >> >> On 10/27/17 9:37 AM, David Holmes wrote: >>> On 27/10/2017 10:13 PM, coleen.phillimore at oracle.com wrote: >>>> >>>> >>>> On 10/27/17 3:23 AM, David Holmes wrote: >>>>> Hi Coleen, >>>>> >>>>> Thanks for tackling this. >>>>> >>>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>> >>>>> Can you update the bug synopsis to show it covers both sets of >>>>> files please. >>>>> >>>>> I hate to start with this (and it took me quite a while to realize >>>>> it) but as Mandy pointed out jvm.h is not an exported interface >>>>> from the JDK to the outside world (so not subject to CSR review), >>>>> but is a private interface between the JVM and the JDK libraries. >>>>> So I think really jvm.h belongs in the hotspot sources where it >>>>> was, while jni.h belongs in the exported JDK sources. In which >>>>> case the bulk of your changes to the hotspot files would not be >>>>> needed - sorry. >>>> >>>> Maybe someone can make that decision and change at a later date. >>>> The point of this change is that there is now only one of these >>>> files that is shared.? 
I don't think jvm.h and the jvm_md.h belong >>>> on the hotspot sources for the jdk to find them in some random >>>> prims and os dependent directories. >>> >>> The one file that is needed is a hotspot file - jvm.h defines the >>> interface that hotspot exports via jvm.cpp. >>> >>> If you leave jvm.h in hotspot/prims then a very large chunk of your >>> boilerplate changes are not needed. The JDK code doesn't care what >>> the name of the directory is - whatever it is just gets added as a >>> -I directive (the JDK code will include "jvm.h" not "prims/jvm.h" >>> the way hotspot sources do. >>> >>> This isn't something we want to change back or move again later. >>> Whatever we do now we live with. >> >> I think it belongs with jni.h and I think the core libraries group >> would agree.?? It seems more natural there than buried in the hotspot >> prims directory.? I guess this is on hold while we have this >> debate.?? Sigh. >> >> Actually with -I directives, changing to jvm.h from prims/jvm.h would >> still work.?? Maybe we should change the name to jvm.hpp since it's >> jvm.cpp though??? Or maybe just have two divergent copies and close >> this as WNF. >> >>> >>>> I'm happy to withdraw the CSR.? We generally use the CSR process to >>>> add and remove JVM_ interfaces even though they're a private >>>> interface in case some other JVM/JDK combination relies on them. >>>> The changes to these files are very minor though and not likely to >>>> cause any even theoretical incompatibility, so I'll withdraw it. >>>>> >>>>> Moving on ... >>>>> >>>>> First to address the initial comments/query you had: >>>>> >>>>>> The JDK windows jni_md.h file defined jint as long and the hotspot >>>>>> windows jni_x86.h as int. I had to choose the jdk version since >>>>>> it's the >>>>>> public version, so there are changes to the hotspot files for this. >>>>> >>>>> On Windows int and long are always the same as it uses ILP32 or >>>>> LLP64 (not LP64 like *nix platforms). So either choice should be >>>>> fine. That said there are some odd casting issues I comment on >>>>> below. Does the VS compiler complain about mixing int and long in >>>>> expressions? >>>> >>>> Yes, it does even though int and long are the same representation. >>> >>> And what an absolute mess that makes. :( >>> >>>>> >>>>>> Generally I changed the code to use 'int' rather than 'jint' >>>>>> where the >>>>>> surrounding API didn't insist on consistently using java types. We >>>>>> should mostly be using C++ types within hotspot except in >>>>>> interfaces to >>>>>> native/JNI code. >>>>> >>>>> I think you pulled too hard on a few threads here and things are >>>>> starting to unravel. There are numerous cases I refer to below >>>>> where either the cast seems unnecessary/inappropriate or else >>>>> highlights a bunch of additional changes that also need to be >>>>> made. The fan out from this could be horrendous. Unless you >>>>> actually get some kind of error - and I'd like to understand the >>>>> details of those - I would not suggest making these changes as >>>>> part of this work. >>>> >>>> I didn't make any change unless there was was an error.? I have 100 >>>> failed JPRT jobs to confirm!? I eventually got a Windows system to >>>> compile and test this on.?? Actually some of the changes came out >>>> better.? Cases where we use jint as a bool simply turned to int.? >>>> We do not have an overload for bool for cmpxchg. >>> >>> That's unfortunate - ditto for OrderAccess. 
>>> >>>>> >>>>> Looking through I have a quite a few queries/comments - apologies >>>>> in advance as I know how tedious this is: >>>>> >>>>> make/hotspot/lib/CompileLibjsig.gmk >>>>> src/java.base/solaris/native/libjsig/jsig.c >>>>> >>>>> Took a while to figure out why the include was needed. :) As a >>>>> follow up I suggest just deleting the -I include directive, delete >>>>> the Solaris-only definition of JSIG_VERSION_1_4_1, and delete >>>>> everything to do with JVM_get_libjsig_version. It is all obsolete. >>>> >>>> Can I patch up jsig in a separate RFE?? I don't remember why this >>>> broke so I simply moved JSIG #define.? Is jsig obsolete? Removing >>>> JVM_* definitions generally requires a CSR. >>> >>> I did say "As a follow up". jsig is not obsolete but the jsig >>> versioning code, only used by Solaris, is. >>> >>>>> >>>>> --- >>>>> >>>>> src/hotspot/cpu/arm/interp_masm_arm.cpp >>>>> >>>>> Why did you need to add the jvm.h include? >>>>> >>>> >>>> ?? tbz(Raccess_flags, JVM_ACC_SYNCHRONIZED_BIT, unlocked); >>> >>> Okay. I'm not going to try and figure out how this code found this >>> before. >>> >>>>> --- >>>>> >>>>> src/hotspot/os/windows/os_windows.cpp. >>>>> >>>>> The type of process_exiting should be uint to match the DWORD of >>>>> GetCurrentThreadID. Then you should need any casts. Also you >>>>> missed this jint cast: >>>>> >>>>> 3796???????? process_exiting != (jint)GetCurrentThreadId()) { >>>> >>>> Yes, that's better to change process_exiting to a DWORD.? It needs >>>> a DWORD cast to 0 in the cmpxchg. >>>> >>>> ???????? Atomic::cmpxchg(GetCurrentThreadId(), &process_exiting, >>>> (DWORD)0); >>>> >>>> These templates are picky. >>> >>> Yes - their inability to deal with literals is extremely frustrating. >>> >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/c1/c1_Canonicalizer.hpp >>>>> >>>>> ? 43 #ifdef _WINDOWS >>>>> ? 44?? // jint is defined as long in jni_md.h, so convert from int >>>>> to jint >>>>> ? 45?? void set_constant(int x)?????????????????????? { >>>>> set_constant((jint)x); } >>>>> ? 46 #endif >>>>> >>>>> Why is this necessary? int and long are the same on Windows. The >>>>> whole point is that jint hides the underlying type, so where does >>>>> this go wrong? >>>> >>>> No, they are not the same types even though they have the same >>>> representation! >>> >>> This is truly unfortunate. >>> >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/c1/c1_LinearScan.cpp >>>>> >>>>> ?ConstantIntValue((jint)0); >>>>> >>>>> why is this cast needed? what causes the ambiguity? (If this was a >>>>> template I'd understand ;-) ). Also didn't you change that >>>>> constructor to take an int anyway - not that I think it should - >>>>> see below. >>>> >>>> Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't match >>>> 'long' better than any pointer type.? So this cast is needed. >>> >>> But you changed the constructor to take an int! >>> >>> ?class ConstantIntValue: public ScopeValue { >>> ? private: >>> -? jint _value; >>> +? int _value; >>> ? public: >>> -? ConstantIntValue(jint value)???????? { _value = value; } >>> +? ConstantIntValue(int value)????????? { _value = value; } >>> >>> >> >> Okay I removed this cast. >> >>>>> --- >>>>> >>>>> src/hotspot/share/ci/ciReplay.cpp >>>>> >>>>> 793???????? jint* dims = NEW_RESOURCE_ARRAY(jint, rank); >>>>> >>>>> why should this be jint? >>>> >>>> To avoid a cast from int* to jint* in the line below: >>>> >>>> ????????? 
value = kelem->multi_allocate(rank, dims, CHECK); >>>> >>>> >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/classfile/altHashing.cpp >>>>> >>>>> Okay this looks more consistent with jint. >>>> >>>> Yes.? I translated this from some native code iirc. >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/code/debugInfo.hpp >>>>> >>>>> These changes seem wrong. We have: >>>>> >>>>> ConstantLongValue(jlong value) >>>>> ConstantDoubleValue(jdouble value) >>>>> >>>>> so we should have: >>>>> >>>>> ConstantIntValue(jint value) >>>> >>>> Again, there are multiple call sites with '0', which match int >>>> trivially but are confused with long.? It's less consistent I agree >>>> but better to not cast all the call sites. >>> >>> This is really making a mess of the APIs - they should be a jint but >>> we declare them int because of a 0 casting problem. Can't we just >>> use 0L? >> >> There aren't that many casts.? You're right, that would have been >> better in some places. >> >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/code/relocInfo.cpp >>>>> >>>>> Change seems unnecessary - int32_t is fine >>>>> >>>> >>>> No, int32_t doesn't match the calls below it.? They all assume _lo >>>> and _hi are jint. >>>>> --- >>>>> >>>>> src/hotspot/share/compiler/compileBroker.cpp >>>>> src/hotspot/share/compiler/compileBroker.hpp >>>>> >>>>> I see a complete mix of int and jint in this class, so why make >>>>> the one change you did ?? >>>> >>>> This is another case of using jint as a flag with cmpxchg. The >>>> templates for cmpxchg want the types to match and 0 and 1 are >>>> essentially 'int'.? This is a lot cleaner this way. >>> >>> >>> >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp >>>>> >>>>> 1700???? tty->write((char*) start, MIN2(length, (jint)O_BUFLEN)); >>>>> >>>>> why did you need to add the jint cast? It's used without any cast >>>>> on the next two lines: >>>>> >>>>> 1701???? length -= O_BUFLEN; >>>>> 1702???? offset += O_BUFLEN; >>>>> >>>> >>>> There's a conversion from O_BUFLEN from int to long in 1701 and >>>> 1702.?? MIN2 is a template that wants the types to match exactly. >>> >>> $%^%$! templates! >>> >>>>> ?? >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/jvmci/jvmciRuntime.cpp >>>>> >>>>> Looking around this code it seems very confused about types - eg >>>>> the previous function is declared jboolean yet returns a jint on >>>>> one path! It isn't clear to me if the return type is what should >>>>> be changed or the parameter type? I would just leave this alone. >>>> >>>> I can't leave it alone because it doesn't compile that way. This >>>> was the minimal change and yea, does look a bit inconsistent. >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/opto/mulnode.cpp >>>>> >>>>> Okay TypeInt has jint parts, so the remaining int32_t declarations >>>>> (A, B, C, D) should also be jint. >>>> >>>> Yes.? c2 uses jint types. >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/opto/parse3.cpp >>>>> >>>>> I agree with the changes you made, but then: >>>>> >>>>> ?419???? jint dim_con = find_int_con(length[j], -1); >>>>> >>>>> should also be changed. >>>>> >>>>> And obviously MultiArrayExpandLimit should be defined as int not >>>>> intx! >>>> >>>> Everything in globals.hpp is intx.? That's a thread that I don't >>>> want to pull on! >>> >>> We still have that limitation? >>>> >>>> Changed dim_con to int. 
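
(Aside: the MIN2 and cmpxchg failures mentioned above all come from template argument deduction needing one consistent type. A minimal sketch follows; min2_like, cmpxchg_like, DWORD_t, O_BUFLEN_like and current_thread_id_like are stand-ins with roughly the shape of the HotSpot and Win32 names, not the real declarations.)

    template <class T> T min2_like(T a, T b) { return a < b ? a : b; }

    template <typename T>
    T cmpxchg_like(T exchange_value, volatile T* dest, T compare_value) {
      T old = *dest;                        // the real thing is an atomic instruction
      if (old == compare_value) *dest = exchange_value;
      return old;
    }

    typedef unsigned long DWORD_t;          // what windows.h calls DWORD
    DWORD_t current_thread_id_like() { return 42; }

    const int O_BUFLEN_like = 2000;

    void examples(long length, volatile DWORD_t* process_exiting) {
      // min2_like(length, O_BUFLEN_like) deduces T = long from the first argument
      // and T = int from the second: deduction conflict, hard error.  The cast,
      // spelled (jint)O_BUFLEN in the jvmciCompilerToVM.cpp hunk, fixes it.
      long n = min2_like(length, (long)O_BUFLEN_like);
      (void)n;

      // Same story for the cmpxchg-shaped template: the literal 0 is an int while
      // the other two arguments say T = DWORD_t, so deduction fails without the
      // (DWORD)0 cast added in os_windows.cpp.
      cmpxchg_like(current_thread_id_like(), process_exiting, (DWORD_t)0);
    }

This is also why the jint flags used with cmpxchg in compileBroker.cpp came out simpler as plain int: the literals 0 and 1 then match the deduced type without casts.
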
>>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/opto/phaseX.cpp >>>>> >>>>> I can see that intcon(jint i) is consistent with longcon(jlong l), >>>>> but the use of "i" in the code is more consistent with int than jint. >>>> >>>> huh?? really? >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/opto/type.cpp >>>>> >>>>> 1505 int TypeInt::hash(void) const { >>>>> 1506?? return java_add(java_add(_lo, _hi), java_add((jint)_widen, >>>>> (jint)Type::Int)); >>>>> 1507 } >>>>> >>>>> I can see that the (jint) casts you added make sense, but then the >>>>> whole function should be returning jint not int. Ditto the other >>>>> hash functions. >>>> >>>> I'm not messing with this, this is the minimal in type fixing that >>>> I'm going to do here. >>> >>> >>> >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/prims/jni.cpp >>>>> >>>>> I think vm_created should be a bool. In fact all the fields you >>>>> changed are logically bools - do Atomics work for bool now? >>>> >>>> No, they do not.?? I had thought bool would be better originally too. >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/prims/jvm.cpp >>>>> >>>>> is_attachable is the terminology used in the JDK code. >>>> >>>> Well the JDK version had is_attach_supported() as the flag name so >>>> I used that in this one place. >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp >>>>> src/hotspot/share/prims/jvmtiImpl.cpp >>>>> >>>>> Are you making parameters consistent with the fields they initialize? >>>> >>>> They're consistent with the declarations now. >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/prims/jvmtiTagMap.cpp >>>>> >>>>> There is a mix of int and jint for slot in this code. You fixed >>>>> some, but this remains: >>>>> >>>>> 2440 inline bool CallbackInvoker::report_stack_ref_root(jlong >>>>> thread_tag, >>>>> 2441 jlong tid, >>>>> 2442 jint depth, >>>>> 2443 jmethodID method, >>>>> 2444 jlocation bci, >>>>> 2445 jint slot, >>>> >>>> Right for consistency with the declarations. >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/runtime/perfData.cpp >>>>> >>>>> Callers pass both jint and int, so param type seems arbitrary. >>>> >>>> They are, but importantly they match the declarations. >>>>> >>>>> --- >>>>> >>>>> src/hotspot/share/runtime/perfMemory.cpp >>>>> src/hotspot/share/runtime/perfMemory.hpp >>>>> >>>>> PerfMemory::_initialized should ideally be a bool - can >>>>> OrderAccess handle that now? >>>> >>>> Nope. >>>>> >>>>> --- >>>>> >>>>> src/java.base/share/native/include/jvm.h >>>>> >>>>> Not clear why the jio functions are not also JNICALL ? >>>> >>>> They are now.? The JDK version didn't have JNICALL.? JVM needs >>>> JNICALL.? I can't tell you why JDK didn't need JNICALL linkage. >>> >>> ?? JVM currently does not have JNICALL. But they are declared as >>> "extern C". >> >> This was a compilation error on Windows with JDK.?? Maybe the C code >> in the JDK doesn't complain about linkage differences. I'll have to >> go back and figure this out then. >>> >>>>> >>>>> --- >>>>> >>>>> src/java.base/unix/native/include/jni_md.h >>>>> >>>>> There is no need to special case ARM. The differences in the >>>>> existing code were for LTO support and that is now irrelevant. >>>> >>>> See discussion with Magnus.?? We still build ARM for jdk10/hs so I >>>> needed this conditional or of course I wouldn't have added it.? We >>>> can remove it with LTO support. >>> >>> Those builds are gone - this is obsolete. But yes all LTO can be >>> removed later if you wish. Just trying to simplify things now. 
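
(For reference, the shape of the unix jni_md.h conditional being discussed above is roughly this sketch, not the exact text that was pushed; the ARM branch carrying externally_visible is the arm32/link-time-opt special case in question.)

    /* sketch of src/java.base/unix/native/include/jni_md.h, not the literal file */
    #if defined(__GNUC__)
      #if defined(ARM)   /* historically: arm32 builds done with link-time optimization */
        #define JNIEXPORT __attribute__((externally_visible, visibility("default")))
        #define JNIIMPORT __attribute__((externally_visible, visibility("default")))
      #else
        #define JNIEXPORT __attribute__((visibility("default")))
        #define JNIIMPORT __attribute__((visibility("default")))
      #endif
    #else
      #define JNIEXPORT
      #define JNIIMPORT
    #endif
    #define JNICALL

    typedef int jint;
    #ifdef _LP64
    typedef long jlong;
    #else
    typedef long long jlong;
    #endif
    typedef signed char jbyte;

GCC's externally_visible attribute only has an effect in whole-program/LTO builds, which is why the cleaner dependency would be the link-time-opt JVM feature rather than the CPU, or plain visibility("default") if that turns out to be enough, as suggested elsewhere in the thread.
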
>>> >>>>> >>>>> --- >>>>> >>>>> src/java.base/unix/native/include/jvm_md.h >>>>> >>>>> I know you've just copied this across, but it seems wrong to me: >>>>> >>>>> ?57 // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on others. >>>>> This may >>>>> ? 58 //?????? cause problems if JVM and the rest of JDK are built >>>>> on different >>>>> ? 59 //?????? Linux releases. Here we define JVM_MAXPATHLEN to be >>>>> MAXPATHLEN + 1, >>>>> ? 60 //?????? so buffers declared in VM are always >= 4096. >>>>> ? 61 #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>>>> >>>>> It doesn't make sense to me to define an internal "max path >>>>> length" that can _exceed_ the platform max! >>>>> >>>>> That aside there's no support for building different parts of the >>>>> JDK on different platforms and then bringing them together. And in >>>>> any case I would think the real problem would be building on a >>>>> platform that uses 4096 and running on one that uses 4095! >>>>> >>>>> But that aside this is a Linux hack and should be guarded by ifdef >>>>> LINUX. (I doubt BSD needs it, the bsd file is just a copy of the >>>>> linux one - the JDK macosx version does the right thing). Solaris >>>>> and AIX should stay as-is at MAXPATHLEN. >>>> >>>> All of the unix platforms had MAXPATHLEN+1.? I'll leave it for now >>>> and we can investigate that further. >>> >>> I see the following existing code: >>> >>> src/java.base/unix/native/include/jvm_md.h: >>> >>> #define JVM_MAXPATHLEN MAXPATHLEN >>> >>> src/java.base/macosx/native/include/jvm_md.h >>> >>> #define JVM_MAXPATHLEN MAXPATHLEN >>> >>> src/hotspot/os/aix/jvm_aix.h >>> >>> #define JVM_MAXPATHLEN MAXPATHLEN >>> >>> src/hotspot/os/bsd/jvm_bsd.h >>> >>> #define JVM_MAXPATHLEN MAXPATHLEN + 1? // blindly copied from Linux >>> version >>> >>> src/hotspot/os/linux/jvm_linux.h >>> >>> #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>> >>> src/hotspot/os/solaris/jvm_solaris.h >>> >>> #define JVM_MAXPATHLEN MAXPATHLEN >>> >>> This is a linux only hack (if you ignore the blind copy from linux >>> into the BSD code in the VM). >> >> Oh, thanks, so should I add a bunch of ifdefs then?? Or do you think >> having MAXPATHLEN + 1 will really break the other platforms?? Do you >> really see this as a problem or are you just pointing out inconsistency? >>> >>>>> >>>>> ?86 #define ASYNC_SIGNAL???? SIGJVM2 >>>>> >>>>> This only exists on Solaris so I think should be in #ifdef >>>>> SOLARIS, to make that clear. >>>> >>>> Ok.? I'll add this. >>>>> >>>>> --- >>>>> >>>>> src/java.base/windows/native/include/jvm_md.h >>>>> >>>>> Given the differences between the two versions either something >>>>> has been broken or "extern C" declarations are not needed :) >>>> >>>> Well, they are needed for Hotspot to build and do not prevent jdk >>>> from building.? I don't know what was broken. >>> >>> We really need to understand this better. Maybe related to the map >>> files that expose the symbols. ?? >> >> They're needed because the JDK files are written mostly in C and that >> doesn't complain about the linkage difference.? Hotspot files are in >> C++ which does complain. >> >>> >>>>> >>>>> --- >>>>> >>>>> That was a really painful way to spend most of my Friday. TGIF! :) >>>> >>>> Thanks for going through it.? See comments inline for changes. >>>> Generating a webrev takes hours so I'm not going to do that unless >>>> you insist. >>> >>> An incremental webrev shouldn't take long - right? You're a mq >>> maestro now. :) >> >> Well I generally trash a repository whenever I use mq but sure. 
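
(A note on the extern "C" point a few replies up: the usual way a header shared between the C JDK sources and the C++ HotSpot sources handles linkage is the standard guard sketched below. jio_vsnprintf is used only because it is the function family under discussion; the real jvm.h declaration, JNIEXPORT/JNICALL placement included, may differ.)

    #include <stdarg.h>
    #include <stddef.h>

    #ifdef __cplusplus
    extern "C" {      /* HotSpot is C++, so it needs C linkage here to match the
                         unmangled symbols; a C compiler never sees this block,
                         which is why the JDK side builds either way. */
    #endif

    int jio_vsnprintf(char *str, size_t count, const char *fmt, va_list args);

    #ifdef __cplusplus
    }
    #endif

JNICALL is a separate, calling-convention axis (it expands to __stdcall on Windows and to nothing on unix), and per the thread it is the C++ HotSpot build that notices when the two sides of that disagree.
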
>>> >>> If you can reasonably produce an incremental webrev once you've >>> settled on all the comments/issues that would be good. >> >> Ok, sure. >> >> Coleen >>> >>> Thanks, >>> David >>> >>>> Thanks, >>>> Coleen >>>> >>>> >>>>> >>>>> Thanks, >>>>> David >>>>> ----- >>>>> >>>>> >>>>> On 27/10/2017 6:44 AM, coleen.phillimore at oracle.com wrote: >>>>>> ??Hi Magnus, >>>>>> >>>>>> Thank you for reviewing this.?? I have a new version that takes >>>>>> out the hack in globalDefinitions.hpp and adds casts to >>>>>> src/hotspot/share/opto/type.cpp instead. >>>>>> >>>>>> Also some fixes from Martin at SAP. >>>>>> >>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.02/webrev >>>>>> >>>>>> see below. >>>>>> >>>>>> On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: >>>>>>> Coleen, >>>>>>> >>>>>>> Thank you for addressing this! >>>>>>> >>>>>>> On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >>>>>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>>>>> >>>>>>>> Mostly used sed to remove prims/jvm.h and move #include "jvm.h" >>>>>>>> after precompiled.h, so if you have repetitive stress wrist >>>>>>>> issues don't click on most of these files. >>>>>>>> >>>>>>>> There were more issues to resolve, however.? The JDK windows >>>>>>>> jni_md.h file defined jint as long and the hotspot windows >>>>>>>> jni_x86.h as int. I had to choose the jdk version since it's >>>>>>>> the public version, so there are changes to the hotspot files >>>>>>>> for this. Generally I changed the code to use 'int' rather than >>>>>>>> 'jint' where the surrounding API didn't insist on consistently >>>>>>>> using java types. We should mostly be using C++ types within >>>>>>>> hotspot except in interfaces to native/JNI code.? There are a >>>>>>>> couple of hacks in places where adding multiple jint casts was >>>>>>>> too painful. >>>>>>>> >>>>>>>> Tested with JPRT and tier2-4 (in progress). >>>>>>>> >>>>>>>> open webrev at >>>>>>>> http://cr.openjdk.java.net/~coleenp/8189610.01/webrev >>>>>>> >>>>>>> Looks great! >>>>>>> >>>>>>> Just a few comments: >>>>>>> >>>>>>> * src/java.base/unix/native/include/jni_md.h: >>>>>>> >>>>>>> I don't think the externally_visible attribute should be there >>>>>>> for arm. I know this was the case for the corresponding hotspot >>>>>>> file for arm, but that was techically incorrect. The proper >>>>>>> dependency here is that externally_visible should be in all >>>>>>> JNIEXPORT if and only if we're building with JVM feature >>>>>>> "link-time-opt". Traditionally, that feature been enabled when >>>>>>> building arm32 builds, and only then, so there's been a >>>>>>> (coincidentally) connection here. Nowadays, Oracle does not care >>>>>>> about the arm32 builds, and I'm not sure if anyone else is >>>>>>> building them with link-time-opt enabled. >>>>>>> >>>>>>> It does seem wrong to me to export this behavior in the public >>>>>>> jni_md.h file, though. I think the correct way to solve this, if >>>>>>> we should continue supporting link-time-opt is to make sure this >>>>>>> attribute is set for exported hotspot functions. If it's still >>>>>>> needed, that is. A quick googling seems to indicate that >>>>>>> visibility("default") might be enough in modern gcc's. >>>>>>> >>>>>>> A third option is to remove the support for link-time-opt >>>>>>> entirely, if it's not really used. >>>>>> >>>>>> I didn't know how to change this since we are still building ARM >>>>>> with the jdk10/hs repository, and ARM needed this change.? 
I >>>>>> could wait until we bring down the jdk10/master changes that >>>>>> remove the ARM build and remove this conditional before I push. >>>>>> Or we could file an RFE to remove link-time-opt (?) and remove it >>>>>> then? >>>>>> >>>>>>> >>>>>>> * src/java.base/unix/native/include/jvm_md.h and >>>>>>> src/java.base/windows/native/include/jvm_md.h: >>>>>>> >>>>>>> These files define a public API, and contain non-trivial >>>>>>> changes. I suspect you should file a CSR request. (Even though I >>>>>>> realize you're only matching the header file with the reality.) >>>>>>> >>>>>> >>>>>> I filed the CSR.?? Waiting for the next steps. >>>>>> >>>>>> Thanks, >>>>>> Coleen >>>>>> >>>>>>> /Magnus >>>>>>> >>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >>>>>>>> >>>>>>>> I have a script to update copyright files on commit. >>>>>>>> >>>>>>>> Thanks to Magnus and ErikJ for the makefile changes. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Coleen >>>>>>>> >>>>>>> >>>>>> >>>> >> From coleen.phillimore at oracle.com Mon Oct 30 12:15:31 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 30 Oct 2017 08:15:31 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <0f568e05-6f06-d2df-571e-0c591f062c15@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <57390ec3-8d8d-a3d7-9774-b5945a323be9@oracle.com> <0f568e05-6f06-d2df-571e-0c591f062c15@oracle.com> Message-ID: <29688c76-4983-dffc-6ce2-402cf91dafbf@oracle.com> On 10/28/17 3:58 AM, David Holmes wrote: > On 28/10/2017 6:20 AM, coleen.phillimore at oracle.com wrote: >> >> Incremental webrev: >> >> http://cr.openjdk.java.net/~coleenp/8189610.incr.01/webrev/index.html > > That all looks fine - thanks. > > If I get a chance I'll look deeper into why the VS compiler needs 0 to > be cast to jint (aka long) to avoid ambiguity with it being a NULL > pointer. I could understand if it always needed the cast, but not only > needing it for long, but not int. Thanks,? Kim can probably tell you where in the spec this is. Coleen > > Thanks, > David > >> thanks, >> Coleen >> >> On 10/27/17 11:13 AM, coleen.phillimore at oracle.com wrote: >>> >>> >>> On 10/27/17 9:37 AM, David Holmes wrote: >>>>>> src/hotspot/share/c1/c1_LinearScan.cpp >>>>>> >>>>>> ?ConstantIntValue((jint)0); >>>>>> >>>>>> why is this cast needed? what causes the ambiguity? (If this was >>>>>> a template I'd understand ;-) ). Also didn't you change that >>>>>> constructor to take an int anyway - not that I think it should - >>>>>> see below. >>>>> >>>>> Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't match >>>>> 'long' better than any pointer type.? So this cast is needed. >>>> >>>> But you changed the constructor to take an int! >>>> >>>> ?class ConstantIntValue: public ScopeValue { >>>> ? private: >>>> -? jint _value; >>>> +? int _value; >>>> ? public: >>>> -? ConstantIntValue(jint value)???????? { _value = value; } >>>> +? ConstantIntValue(int value)????????? { _value = value; } >>>> >>> I changed this back to not take an int and changed c1_LinearScan.cpp >>> to have the (jint)0 cast and output.cp needed (jint)0 casts.? 
0L >>> doesn't work for platforms where jint is an 'int' rather than a long >>> because it's ambiguous with the functions that take a pointer type. >>> Probably better to keep the type of ConstantIntValue consistent with >>> j types. >>> >>> Thanks, >>> Coleen >> From david.holmes at oracle.com Mon Oct 30 12:17:38 2017 From: david.holmes at oracle.com (David Holmes) Date: Mon, 30 Oct 2017 22:17:38 +1000 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> <22afedef-59cc-ecde-48fc-0afb7b4bbb47@oracle.com> Message-ID: <815ac734-ea8b-ea2d-ecec-85cb547ba2f4@oracle.com> On 30/10/2017 10:13 PM, coleen.phillimore at oracle.com wrote: > On 10/28/17 3:50 AM, David Holmes wrote: >> Hi Coleen, >> >> I've commented on the file location in response to Mandy's email. >> >> The only issue I'm still concerned about is the JVM_MAXPATHLEN issue. >> I think it is a bug to define a JVM_MAXPATHLEN that is bigger than the >> platform MAXPATHLEN. I also would not want to see any change in >> behaviour because of this - so AIX and Solaris should not get a >> different JVM_MAXPATHLEN due to this refactoring change. So yes I >> think this needs to be ifdef'd for Linux and reluctantly (because it >> was a copy error) for OSX/BSD as well. > > #if defined(AIX) || defined(SOLARIS) > #define JVM_MAXPATHLEN MAXPATHLEN > #else > // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on others. This may > //?????? cause problems if JVM and the rest of JDK are built on different > //?????? Linux releases. Here we define JVM_MAXPATHLEN to be MAXPATHLEN > + 1, > //?????? so buffers declared in VM are always >= 4096. > #define JVM_MAXPATHLEN MAXPATHLEN + 1 > #endif > > Is this ok? Yes - thanks. It preserves existing behaviour on the VM side at least. Time will tell if it messes anything up on the JDK side for Linux/OSX. David > thanks, > Coleen >> >> Thanks, >> David >> >> On 28/10/2017 12:08 AM, coleen.phillimore at oracle.com wrote: >>> >>> >>> On 10/27/17 9:37 AM, David Holmes wrote: >>>> On 27/10/2017 10:13 PM, coleen.phillimore at oracle.com wrote: >>>>> >>>>> >>>>> On 10/27/17 3:23 AM, David Holmes wrote: >>>>>> Hi Coleen, >>>>>> >>>>>> Thanks for tackling this. >>>>>> >>>>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>>> >>>>>> Can you update the bug synopsis to show it covers both sets of >>>>>> files please. >>>>>> >>>>>> I hate to start with this (and it took me quite a while to realize >>>>>> it) but as Mandy pointed out jvm.h is not an exported interface >>>>>> from the JDK to the outside world (so not subject to CSR review), >>>>>> but is a private interface between the JVM and the JDK libraries. >>>>>> So I think really jvm.h belongs in the hotspot sources where it >>>>>> was, while jni.h belongs in the exported JDK sources. In which >>>>>> case the bulk of your changes to the hotspot files would not be >>>>>> needed - sorry. >>>>> >>>>> Maybe someone can make that decision and change at a later date. >>>>> The point of this change is that there is now only one of these >>>>> files that is shared.? 
I don't think jvm.h and the jvm_md.h belong >>>>> on the hotspot sources for the jdk to find them in some random >>>>> prims and os dependent directories. >>>> >>>> The one file that is needed is a hotspot file - jvm.h defines the >>>> interface that hotspot exports via jvm.cpp. >>>> >>>> If you leave jvm.h in hotspot/prims then a very large chunk of your >>>> boilerplate changes are not needed. The JDK code doesn't care what >>>> the name of the directory is - whatever it is just gets added as a >>>> -I directive (the JDK code will include "jvm.h" not "prims/jvm.h" >>>> the way hotspot sources do. >>>> >>>> This isn't something we want to change back or move again later. >>>> Whatever we do now we live with. >>> >>> I think it belongs with jni.h and I think the core libraries group >>> would agree.?? It seems more natural there than buried in the hotspot >>> prims directory.? I guess this is on hold while we have this >>> debate.?? Sigh. >>> >>> Actually with -I directives, changing to jvm.h from prims/jvm.h would >>> still work.?? Maybe we should change the name to jvm.hpp since it's >>> jvm.cpp though??? Or maybe just have two divergent copies and close >>> this as WNF. >>> >>>> >>>>> I'm happy to withdraw the CSR.? We generally use the CSR process to >>>>> add and remove JVM_ interfaces even though they're a private >>>>> interface in case some other JVM/JDK combination relies on them. >>>>> The changes to these files are very minor though and not likely to >>>>> cause any even theoretical incompatibility, so I'll withdraw it. >>>>>> >>>>>> Moving on ... >>>>>> >>>>>> First to address the initial comments/query you had: >>>>>> >>>>>>> The JDK windows jni_md.h file defined jint as long and the hotspot >>>>>>> windows jni_x86.h as int. I had to choose the jdk version since >>>>>>> it's the >>>>>>> public version, so there are changes to the hotspot files for this. >>>>>> >>>>>> On Windows int and long are always the same as it uses ILP32 or >>>>>> LLP64 (not LP64 like *nix platforms). So either choice should be >>>>>> fine. That said there are some odd casting issues I comment on >>>>>> below. Does the VS compiler complain about mixing int and long in >>>>>> expressions? >>>>> >>>>> Yes, it does even though int and long are the same representation. >>>> >>>> And what an absolute mess that makes. :( >>>> >>>>>> >>>>>>> Generally I changed the code to use 'int' rather than 'jint' >>>>>>> where the >>>>>>> surrounding API didn't insist on consistently using java types. We >>>>>>> should mostly be using C++ types within hotspot except in >>>>>>> interfaces to >>>>>>> native/JNI code. >>>>>> >>>>>> I think you pulled too hard on a few threads here and things are >>>>>> starting to unravel. There are numerous cases I refer to below >>>>>> where either the cast seems unnecessary/inappropriate or else >>>>>> highlights a bunch of additional changes that also need to be >>>>>> made. The fan out from this could be horrendous. Unless you >>>>>> actually get some kind of error - and I'd like to understand the >>>>>> details of those - I would not suggest making these changes as >>>>>> part of this work. >>>>> >>>>> I didn't make any change unless there was was an error.? I have 100 >>>>> failed JPRT jobs to confirm!? I eventually got a Windows system to >>>>> compile and test this on.?? Actually some of the changes came out >>>>> better.? Cases where we use jint as a bool simply turned to int. We >>>>> do not have an overload for bool for cmpxchg. 
>>>> >>>> That's unfortunate - ditto for OrderAccess. >>>> >>>>>> >>>>>> Looking through I have a quite a few queries/comments - apologies >>>>>> in advance as I know how tedious this is: >>>>>> >>>>>> make/hotspot/lib/CompileLibjsig.gmk >>>>>> src/java.base/solaris/native/libjsig/jsig.c >>>>>> >>>>>> Took a while to figure out why the include was needed. :) As a >>>>>> follow up I suggest just deleting the -I include directive, delete >>>>>> the Solaris-only definition of JSIG_VERSION_1_4_1, and delete >>>>>> everything to do with JVM_get_libjsig_version. It is all obsolete. >>>>> >>>>> Can I patch up jsig in a separate RFE?? I don't remember why this >>>>> broke so I simply moved JSIG #define.? Is jsig obsolete? Removing >>>>> JVM_* definitions generally requires a CSR. >>>> >>>> I did say "As a follow up". jsig is not obsolete but the jsig >>>> versioning code, only used by Solaris, is. >>>> >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/cpu/arm/interp_masm_arm.cpp >>>>>> >>>>>> Why did you need to add the jvm.h include? >>>>>> >>>>> >>>>> ?? tbz(Raccess_flags, JVM_ACC_SYNCHRONIZED_BIT, unlocked); >>>> >>>> Okay. I'm not going to try and figure out how this code found this >>>> before. >>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/os/windows/os_windows.cpp. >>>>>> >>>>>> The type of process_exiting should be uint to match the DWORD of >>>>>> GetCurrentThreadID. Then you should need any casts. Also you >>>>>> missed this jint cast: >>>>>> >>>>>> 3796???????? process_exiting != (jint)GetCurrentThreadId()) { >>>>> >>>>> Yes, that's better to change process_exiting to a DWORD.? It needs >>>>> a DWORD cast to 0 in the cmpxchg. >>>>> >>>>> ???????? Atomic::cmpxchg(GetCurrentThreadId(), &process_exiting, >>>>> (DWORD)0); >>>>> >>>>> These templates are picky. >>>> >>>> Yes - their inability to deal with literals is extremely frustrating. >>>> >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/c1/c1_Canonicalizer.hpp >>>>>> >>>>>> ? 43 #ifdef _WINDOWS >>>>>> ? 44?? // jint is defined as long in jni_md.h, so convert from int >>>>>> to jint >>>>>> ? 45?? void set_constant(int x)?????????????????????? { >>>>>> set_constant((jint)x); } >>>>>> ? 46 #endif >>>>>> >>>>>> Why is this necessary? int and long are the same on Windows. The >>>>>> whole point is that jint hides the underlying type, so where does >>>>>> this go wrong? >>>>> >>>>> No, they are not the same types even though they have the same >>>>> representation! >>>> >>>> This is truly unfortunate. >>>> >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/c1/c1_LinearScan.cpp >>>>>> >>>>>> ?ConstantIntValue((jint)0); >>>>>> >>>>>> why is this cast needed? what causes the ambiguity? (If this was a >>>>>> template I'd understand ;-) ). Also didn't you change that >>>>>> constructor to take an int anyway - not that I think it should - >>>>>> see below. >>>>> >>>>> Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't match >>>>> 'long' better than any pointer type.? So this cast is needed. >>>> >>>> But you changed the constructor to take an int! >>>> >>>> ?class ConstantIntValue: public ScopeValue { >>>> ? private: >>>> -? jint _value; >>>> +? int _value; >>>> ? public: >>>> -? ConstantIntValue(jint value)???????? { _value = value; } >>>> +? ConstantIntValue(int value)????????? { _value = value; } >>>> >>>> >>> >>> Okay I removed this cast. >>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/ci/ciReplay.cpp >>>>>> >>>>>> 793???????? jint* dims = NEW_RESOURCE_ARRAY(jint, rank); >>>>>> >>>>>> why should this be jint? 
>>>>> >>>>> To avoid a cast from int* to jint* in the line below: >>>>> >>>>> ????????? value = kelem->multi_allocate(rank, dims, CHECK); >>>>> >>>>> >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/classfile/altHashing.cpp >>>>>> >>>>>> Okay this looks more consistent with jint. >>>>> >>>>> Yes.? I translated this from some native code iirc. >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/code/debugInfo.hpp >>>>>> >>>>>> These changes seem wrong. We have: >>>>>> >>>>>> ConstantLongValue(jlong value) >>>>>> ConstantDoubleValue(jdouble value) >>>>>> >>>>>> so we should have: >>>>>> >>>>>> ConstantIntValue(jint value) >>>>> >>>>> Again, there are multiple call sites with '0', which match int >>>>> trivially but are confused with long.? It's less consistent I agree >>>>> but better to not cast all the call sites. >>>> >>>> This is really making a mess of the APIs - they should be a jint but >>>> we declare them int because of a 0 casting problem. Can't we just >>>> use 0L? >>> >>> There aren't that many casts.? You're right, that would have been >>> better in some places. >>> >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/code/relocInfo.cpp >>>>>> >>>>>> Change seems unnecessary - int32_t is fine >>>>>> >>>>> >>>>> No, int32_t doesn't match the calls below it.? They all assume _lo >>>>> and _hi are jint. >>>>>> --- >>>>>> >>>>>> src/hotspot/share/compiler/compileBroker.cpp >>>>>> src/hotspot/share/compiler/compileBroker.hpp >>>>>> >>>>>> I see a complete mix of int and jint in this class, so why make >>>>>> the one change you did ?? >>>>> >>>>> This is another case of using jint as a flag with cmpxchg. The >>>>> templates for cmpxchg want the types to match and 0 and 1 are >>>>> essentially 'int'.? This is a lot cleaner this way. >>>> >>>> >>>> >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp >>>>>> >>>>>> 1700???? tty->write((char*) start, MIN2(length, (jint)O_BUFLEN)); >>>>>> >>>>>> why did you need to add the jint cast? It's used without any cast >>>>>> on the next two lines: >>>>>> >>>>>> 1701???? length -= O_BUFLEN; >>>>>> 1702???? offset += O_BUFLEN; >>>>>> >>>>> >>>>> There's a conversion from O_BUFLEN from int to long in 1701 and >>>>> 1702.?? MIN2 is a template that wants the types to match exactly. >>>> >>>> $%^%$! templates! >>>> >>>>>> ?? >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/jvmci/jvmciRuntime.cpp >>>>>> >>>>>> Looking around this code it seems very confused about types - eg >>>>>> the previous function is declared jboolean yet returns a jint on >>>>>> one path! It isn't clear to me if the return type is what should >>>>>> be changed or the parameter type? I would just leave this alone. >>>>> >>>>> I can't leave it alone because it doesn't compile that way. This >>>>> was the minimal change and yea, does look a bit inconsistent. >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/opto/mulnode.cpp >>>>>> >>>>>> Okay TypeInt has jint parts, so the remaining int32_t declarations >>>>>> (A, B, C, D) should also be jint. >>>>> >>>>> Yes.? c2 uses jint types. >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/opto/parse3.cpp >>>>>> >>>>>> I agree with the changes you made, but then: >>>>>> >>>>>> ?419???? jint dim_con = find_int_con(length[j], -1); >>>>>> >>>>>> should also be changed. >>>>>> >>>>>> And obviously MultiArrayExpandLimit should be defined as int not >>>>>> intx! >>>>> >>>>> Everything in globals.hpp is intx.? That's a thread that I don't >>>>> want to pull on! >>>> >>>> We still have that limitation? 
>>>>> >>>>> Changed dim_con to int. >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/opto/phaseX.cpp >>>>>> >>>>>> I can see that intcon(jint i) is consistent with longcon(jlong l), >>>>>> but the use of "i" in the code is more consistent with int than jint. >>>>> >>>>> huh?? really? >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/opto/type.cpp >>>>>> >>>>>> 1505 int TypeInt::hash(void) const { >>>>>> 1506?? return java_add(java_add(_lo, _hi), java_add((jint)_widen, >>>>>> (jint)Type::Int)); >>>>>> 1507 } >>>>>> >>>>>> I can see that the (jint) casts you added make sense, but then the >>>>>> whole function should be returning jint not int. Ditto the other >>>>>> hash functions. >>>>> >>>>> I'm not messing with this, this is the minimal in type fixing that >>>>> I'm going to do here. >>>> >>>> >>>> >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/prims/jni.cpp >>>>>> >>>>>> I think vm_created should be a bool. In fact all the fields you >>>>>> changed are logically bools - do Atomics work for bool now? >>>>> >>>>> No, they do not.?? I had thought bool would be better originally too. >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/prims/jvm.cpp >>>>>> >>>>>> is_attachable is the terminology used in the JDK code. >>>>> >>>>> Well the JDK version had is_attach_supported() as the flag name so >>>>> I used that in this one place. >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp >>>>>> src/hotspot/share/prims/jvmtiImpl.cpp >>>>>> >>>>>> Are you making parameters consistent with the fields they initialize? >>>>> >>>>> They're consistent with the declarations now. >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/prims/jvmtiTagMap.cpp >>>>>> >>>>>> There is a mix of int and jint for slot in this code. You fixed >>>>>> some, but this remains: >>>>>> >>>>>> 2440 inline bool CallbackInvoker::report_stack_ref_root(jlong >>>>>> thread_tag, >>>>>> 2441 jlong tid, >>>>>> 2442 jint depth, >>>>>> 2443 jmethodID method, >>>>>> 2444 jlocation bci, >>>>>> 2445 jint slot, >>>>> >>>>> Right for consistency with the declarations. >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/runtime/perfData.cpp >>>>>> >>>>>> Callers pass both jint and int, so param type seems arbitrary. >>>>> >>>>> They are, but importantly they match the declarations. >>>>>> >>>>>> --- >>>>>> >>>>>> src/hotspot/share/runtime/perfMemory.cpp >>>>>> src/hotspot/share/runtime/perfMemory.hpp >>>>>> >>>>>> PerfMemory::_initialized should ideally be a bool - can >>>>>> OrderAccess handle that now? >>>>> >>>>> Nope. >>>>>> >>>>>> --- >>>>>> >>>>>> src/java.base/share/native/include/jvm.h >>>>>> >>>>>> Not clear why the jio functions are not also JNICALL ? >>>>> >>>>> They are now.? The JDK version didn't have JNICALL.? JVM needs >>>>> JNICALL.? I can't tell you why JDK didn't need JNICALL linkage. >>>> >>>> ?? JVM currently does not have JNICALL. But they are declared as >>>> "extern C". >>> >>> This was a compilation error on Windows with JDK.?? Maybe the C code >>> in the JDK doesn't complain about linkage differences. I'll have to >>> go back and figure this out then. >>>> >>>>>> >>>>>> --- >>>>>> >>>>>> src/java.base/unix/native/include/jni_md.h >>>>>> >>>>>> There is no need to special case ARM. The differences in the >>>>>> existing code were for LTO support and that is now irrelevant. >>>>> >>>>> See discussion with Magnus.?? We still build ARM for jdk10/hs so I >>>>> needed this conditional or of course I wouldn't have added it.? We >>>>> can remove it with LTO support. 
>>>> >>>> Those builds are gone - this is obsolete. But yes all LTO can be >>>> removed later if you wish. Just trying to simplify things now. >>>> >>>>>> >>>>>> --- >>>>>> >>>>>> src/java.base/unix/native/include/jvm_md.h >>>>>> >>>>>> I know you've just copied this across, but it seems wrong to me: >>>>>> >>>>>> ?57 // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on others. >>>>>> This may >>>>>> ? 58 //?????? cause problems if JVM and the rest of JDK are built >>>>>> on different >>>>>> ? 59 //?????? Linux releases. Here we define JVM_MAXPATHLEN to be >>>>>> MAXPATHLEN + 1, >>>>>> ? 60 //?????? so buffers declared in VM are always >= 4096. >>>>>> ? 61 #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>>>>> >>>>>> It doesn't make sense to me to define an internal "max path >>>>>> length" that can _exceed_ the platform max! >>>>>> >>>>>> That aside there's no support for building different parts of the >>>>>> JDK on different platforms and then bringing them together. And in >>>>>> any case I would think the real problem would be building on a >>>>>> platform that uses 4096 and running on one that uses 4095! >>>>>> >>>>>> But that aside this is a Linux hack and should be guarded by ifdef >>>>>> LINUX. (I doubt BSD needs it, the bsd file is just a copy of the >>>>>> linux one - the JDK macosx version does the right thing). Solaris >>>>>> and AIX should stay as-is at MAXPATHLEN. >>>>> >>>>> All of the unix platforms had MAXPATHLEN+1.? I'll leave it for now >>>>> and we can investigate that further. >>>> >>>> I see the following existing code: >>>> >>>> src/java.base/unix/native/include/jvm_md.h: >>>> >>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>> >>>> src/java.base/macosx/native/include/jvm_md.h >>>> >>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>> >>>> src/hotspot/os/aix/jvm_aix.h >>>> >>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>> >>>> src/hotspot/os/bsd/jvm_bsd.h >>>> >>>> #define JVM_MAXPATHLEN MAXPATHLEN + 1? // blindly copied from Linux >>>> version >>>> >>>> src/hotspot/os/linux/jvm_linux.h >>>> >>>> #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>>> >>>> src/hotspot/os/solaris/jvm_solaris.h >>>> >>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>> >>>> This is a linux only hack (if you ignore the blind copy from linux >>>> into the BSD code in the VM). >>> >>> Oh, thanks, so should I add a bunch of ifdefs then?? Or do you think >>> having MAXPATHLEN + 1 will really break the other platforms?? Do you >>> really see this as a problem or are you just pointing out inconsistency? >>>> >>>>>> >>>>>> ?86 #define ASYNC_SIGNAL???? SIGJVM2 >>>>>> >>>>>> This only exists on Solaris so I think should be in #ifdef >>>>>> SOLARIS, to make that clear. >>>>> >>>>> Ok.? I'll add this. >>>>>> >>>>>> --- >>>>>> >>>>>> src/java.base/windows/native/include/jvm_md.h >>>>>> >>>>>> Given the differences between the two versions either something >>>>>> has been broken or "extern C" declarations are not needed :) >>>>> >>>>> Well, they are needed for Hotspot to build and do not prevent jdk >>>>> from building.? I don't know what was broken. >>>> >>>> We really need to understand this better. Maybe related to the map >>>> files that expose the symbols. ?? >>> >>> They're needed because the JDK files are written mostly in C and that >>> doesn't complain about the linkage difference.? Hotspot files are in >>> C++ which does complain. >>> >>>> >>>>>> >>>>>> --- >>>>>> >>>>>> That was a really painful way to spend most of my Friday. TGIF! :) >>>>> >>>>> Thanks for going through it.? See comments inline for changes. 
>>>>> Generating a webrev takes hours so I'm not going to do that unless >>>>> you insist. >>>> >>>> An incremental webrev shouldn't take long - right? You're a mq >>>> maestro now. :) >>> >>> Well I generally trash a repository whenever I use mq but sure. >>>> >>>> If you can reasonably produce an incremental webrev once you've >>>> settled on all the comments/issues that would be good. >>> >>> Ok, sure. >>> >>> Coleen >>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, >>>>> Coleen >>>>> >>>>> >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> ----- >>>>>> >>>>>> >>>>>> On 27/10/2017 6:44 AM, coleen.phillimore at oracle.com wrote: >>>>>>> ??Hi Magnus, >>>>>>> >>>>>>> Thank you for reviewing this.?? I have a new version that takes >>>>>>> out the hack in globalDefinitions.hpp and adds casts to >>>>>>> src/hotspot/share/opto/type.cpp instead. >>>>>>> >>>>>>> Also some fixes from Martin at SAP. >>>>>>> >>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8189610.02/webrev >>>>>>> >>>>>>> see below. >>>>>>> >>>>>>> On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: >>>>>>>> Coleen, >>>>>>>> >>>>>>>> Thank you for addressing this! >>>>>>>> >>>>>>>> On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >>>>>>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>>>>>> >>>>>>>>> Mostly used sed to remove prims/jvm.h and move #include "jvm.h" >>>>>>>>> after precompiled.h, so if you have repetitive stress wrist >>>>>>>>> issues don't click on most of these files. >>>>>>>>> >>>>>>>>> There were more issues to resolve, however.? The JDK windows >>>>>>>>> jni_md.h file defined jint as long and the hotspot windows >>>>>>>>> jni_x86.h as int. I had to choose the jdk version since it's >>>>>>>>> the public version, so there are changes to the hotspot files >>>>>>>>> for this. Generally I changed the code to use 'int' rather than >>>>>>>>> 'jint' where the surrounding API didn't insist on consistently >>>>>>>>> using java types. We should mostly be using C++ types within >>>>>>>>> hotspot except in interfaces to native/JNI code.? There are a >>>>>>>>> couple of hacks in places where adding multiple jint casts was >>>>>>>>> too painful. >>>>>>>>> >>>>>>>>> Tested with JPRT and tier2-4 (in progress). >>>>>>>>> >>>>>>>>> open webrev at >>>>>>>>> http://cr.openjdk.java.net/~coleenp/8189610.01/webrev >>>>>>>> >>>>>>>> Looks great! >>>>>>>> >>>>>>>> Just a few comments: >>>>>>>> >>>>>>>> * src/java.base/unix/native/include/jni_md.h: >>>>>>>> >>>>>>>> I don't think the externally_visible attribute should be there >>>>>>>> for arm. I know this was the case for the corresponding hotspot >>>>>>>> file for arm, but that was techically incorrect. The proper >>>>>>>> dependency here is that externally_visible should be in all >>>>>>>> JNIEXPORT if and only if we're building with JVM feature >>>>>>>> "link-time-opt". Traditionally, that feature been enabled when >>>>>>>> building arm32 builds, and only then, so there's been a >>>>>>>> (coincidentally) connection here. Nowadays, Oracle does not care >>>>>>>> about the arm32 builds, and I'm not sure if anyone else is >>>>>>>> building them with link-time-opt enabled. >>>>>>>> >>>>>>>> It does seem wrong to me to export this behavior in the public >>>>>>>> jni_md.h file, though. I think the correct way to solve this, if >>>>>>>> we should continue supporting link-time-opt is to make sure this >>>>>>>> attribute is set for exported hotspot functions. If it's still >>>>>>>> needed, that is. 
A quick googling seems to indicate that >>>>>>>> visibility("default") might be enough in modern gcc's. >>>>>>>> >>>>>>>> A third option is to remove the support for link-time-opt >>>>>>>> entirely, if it's not really used. >>>>>>> >>>>>>> I didn't know how to change this since we are still building ARM >>>>>>> with the jdk10/hs repository, and ARM needed this change.? I >>>>>>> could wait until we bring down the jdk10/master changes that >>>>>>> remove the ARM build and remove this conditional before I push. >>>>>>> Or we could file an RFE to remove link-time-opt (?) and remove it >>>>>>> then? >>>>>>> >>>>>>>> >>>>>>>> * src/java.base/unix/native/include/jvm_md.h and >>>>>>>> src/java.base/windows/native/include/jvm_md.h: >>>>>>>> >>>>>>>> These files define a public API, and contain non-trivial >>>>>>>> changes. I suspect you should file a CSR request. (Even though I >>>>>>>> realize you're only matching the header file with the reality.) >>>>>>>> >>>>>>> >>>>>>> I filed the CSR.?? Waiting for the next steps. >>>>>>> >>>>>>> Thanks, >>>>>>> Coleen >>>>>>> >>>>>>>> /Magnus >>>>>>>> >>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >>>>>>>>> >>>>>>>>> I have a script to update copyright files on commit. >>>>>>>>> >>>>>>>>> Thanks to Magnus and ErikJ for the makefile changes. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Coleen >>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>> > From coleen.phillimore at oracle.com Mon Oct 30 12:38:23 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 30 Oct 2017 08:38:23 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <815ac734-ea8b-ea2d-ecec-85cb547ba2f4@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> <22afedef-59cc-ecde-48fc-0afb7b4bbb47@oracle.com> <815ac734-ea8b-ea2d-ecec-85cb547ba2f4@oracle.com> Message-ID: <440f79ba-2da3-b627-53bc-e1842e3cf73c@oracle.com> On 10/30/17 8:17 AM, David Holmes wrote: > On 30/10/2017 10:13 PM, coleen.phillimore at oracle.com wrote: >> On 10/28/17 3:50 AM, David Holmes wrote: >>> Hi Coleen, >>> >>> I've commented on the file location in response to Mandy's email. >>> >>> The only issue I'm still concerned about is the JVM_MAXPATHLEN >>> issue. I think it is a bug to define a JVM_MAXPATHLEN that is bigger >>> than the platform MAXPATHLEN. I also would not want to see any >>> change in behaviour because of this - so AIX and Solaris should not >>> get a different JVM_MAXPATHLEN due to this refactoring change. So >>> yes I think this needs to be ifdef'd for Linux and reluctantly >>> (because it was a copy error) for OSX/BSD as well. >> >> #if defined(AIX) || defined(SOLARIS) >> #define JVM_MAXPATHLEN MAXPATHLEN >> #else >> // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on others. This may >> //?????? cause problems if JVM and the rest of JDK are built on >> different >> //?????? Linux releases. Here we define JVM_MAXPATHLEN to be >> MAXPATHLEN + 1, >> //?????? so buffers declared in VM are always >= 4096. >> #define JVM_MAXPATHLEN MAXPATHLEN + 1 >> #endif >> >> Is this ok? > > Yes - thanks. It preserves existing behaviour on the VM side at least. > Time will tell if it messes anything up on the JDK side for Linux/OSX. 
I don't want to wait for time so I'm investigating. It's one use is: Java_java_io_UnixFileSystem_canonicalize0(JNIEnv *env, jobject this, ... ??????? char canonicalPath[JVM_MAXPATHLEN]; ??????? if (canonicalize((char *)path, ???????????????????????? canonicalPath, JVM_MAXPATHLEN) < 0) { ??????????? JNU_ThrowIOExceptionWithLastError(env, "Bad pathname"); Which goes to: canonicalize_md.c canonicalize(char *original, char *resolved, int len) ??? if (len < PATH_MAX) { ??????? errno = EINVAL; ??????? return -1; ??? } So this should fail every time. sys/param.h:# define MAXPATHLEN??? PATH_MAX I haven't found any tests for it. I don't know why Java_java_io_UnixFileSystem uses JVM_MAXPATHLEN since it's not calling the JVM interface as far as I can tell.??? I think it should be changed to PATH_MAX. ? Coleen > > David > >> thanks, >> Coleen >>> >>> Thanks, >>> David >>> >>> On 28/10/2017 12:08 AM, coleen.phillimore at oracle.com wrote: >>>> >>>> >>>> On 10/27/17 9:37 AM, David Holmes wrote: >>>>> On 27/10/2017 10:13 PM, coleen.phillimore at oracle.com wrote: >>>>>> >>>>>> >>>>>> On 10/27/17 3:23 AM, David Holmes wrote: >>>>>>> Hi Coleen, >>>>>>> >>>>>>> Thanks for tackling this. >>>>>>> >>>>>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>>>> >>>>>>> Can you update the bug synopsis to show it covers both sets of >>>>>>> files please. >>>>>>> >>>>>>> I hate to start with this (and it took me quite a while to >>>>>>> realize it) but as Mandy pointed out jvm.h is not an exported >>>>>>> interface from the JDK to the outside world (so not subject to >>>>>>> CSR review), but is a private interface between the JVM and the >>>>>>> JDK libraries. So I think really jvm.h belongs in the hotspot >>>>>>> sources where it was, while jni.h belongs in the exported JDK >>>>>>> sources. In which case the bulk of your changes to the hotspot >>>>>>> files would not be needed - sorry. >>>>>> >>>>>> Maybe someone can make that decision and change at a later date. >>>>>> The point of this change is that there is now only one of these >>>>>> files that is shared.? I don't think jvm.h and the jvm_md.h >>>>>> belong on the hotspot sources for the jdk to find them in some >>>>>> random prims and os dependent directories. >>>>> >>>>> The one file that is needed is a hotspot file - jvm.h defines the >>>>> interface that hotspot exports via jvm.cpp. >>>>> >>>>> If you leave jvm.h in hotspot/prims then a very large chunk of >>>>> your boilerplate changes are not needed. The JDK code doesn't care >>>>> what the name of the directory is - whatever it is just gets added >>>>> as a -I directive (the JDK code will include "jvm.h" not >>>>> "prims/jvm.h" the way hotspot sources do. >>>>> >>>>> This isn't something we want to change back or move again later. >>>>> Whatever we do now we live with. >>>> >>>> I think it belongs with jni.h and I think the core libraries group >>>> would agree.?? It seems more natural there than buried in the >>>> hotspot prims directory.? I guess this is on hold while we have >>>> this debate.?? Sigh. >>>> >>>> Actually with -I directives, changing to jvm.h from prims/jvm.h >>>> would still work.?? Maybe we should change the name to jvm.hpp >>>> since it's jvm.cpp though??? Or maybe just have two divergent >>>> copies and close this as WNF. >>>> >>>>> >>>>>> I'm happy to withdraw the CSR.? We generally use the CSR process >>>>>> to add and remove JVM_ interfaces even though they're a private >>>>>> interface in case some other JVM/JDK combination relies on them. 
>>>>>> The changes to these files are very minor though and not likely >>>>>> to cause any even theoretical incompatibility, so I'll withdraw it. >>>>>>> >>>>>>> Moving on ... >>>>>>> >>>>>>> First to address the initial comments/query you had: >>>>>>> >>>>>>>> The JDK windows jni_md.h file defined jint as long and the hotspot >>>>>>>> windows jni_x86.h as int. I had to choose the jdk version since >>>>>>>> it's the >>>>>>>> public version, so there are changes to the hotspot files for >>>>>>>> this. >>>>>>> >>>>>>> On Windows int and long are always the same as it uses ILP32 or >>>>>>> LLP64 (not LP64 like *nix platforms). So either choice should be >>>>>>> fine. That said there are some odd casting issues I comment on >>>>>>> below. Does the VS compiler complain about mixing int and long >>>>>>> in expressions? >>>>>> >>>>>> Yes, it does even though int and long are the same representation. >>>>> >>>>> And what an absolute mess that makes. :( >>>>> >>>>>>> >>>>>>>> Generally I changed the code to use 'int' rather than 'jint' >>>>>>>> where the >>>>>>>> surrounding API didn't insist on consistently using java types. We >>>>>>>> should mostly be using C++ types within hotspot except in >>>>>>>> interfaces to >>>>>>>> native/JNI code. >>>>>>> >>>>>>> I think you pulled too hard on a few threads here and things are >>>>>>> starting to unravel. There are numerous cases I refer to below >>>>>>> where either the cast seems unnecessary/inappropriate or else >>>>>>> highlights a bunch of additional changes that also need to be >>>>>>> made. The fan out from this could be horrendous. Unless you >>>>>>> actually get some kind of error - and I'd like to understand the >>>>>>> details of those - I would not suggest making these changes as >>>>>>> part of this work. >>>>>> >>>>>> I didn't make any change unless there was was an error. I have >>>>>> 100 failed JPRT jobs to confirm!? I eventually got a Windows >>>>>> system to compile and test this on. Actually some of the changes >>>>>> came out better.? Cases where we use jint as a bool simply turned >>>>>> to int. We do not have an overload for bool for cmpxchg. >>>>> >>>>> That's unfortunate - ditto for OrderAccess. >>>>> >>>>>>> >>>>>>> Looking through I have a quite a few queries/comments - >>>>>>> apologies in advance as I know how tedious this is: >>>>>>> >>>>>>> make/hotspot/lib/CompileLibjsig.gmk >>>>>>> src/java.base/solaris/native/libjsig/jsig.c >>>>>>> >>>>>>> Took a while to figure out why the include was needed. :) As a >>>>>>> follow up I suggest just deleting the -I include directive, >>>>>>> delete the Solaris-only definition of JSIG_VERSION_1_4_1, and >>>>>>> delete everything to do with JVM_get_libjsig_version. It is all >>>>>>> obsolete. >>>>>> >>>>>> Can I patch up jsig in a separate RFE?? I don't remember why this >>>>>> broke so I simply moved JSIG #define.? Is jsig obsolete? Removing >>>>>> JVM_* definitions generally requires a CSR. >>>>> >>>>> I did say "As a follow up". jsig is not obsolete but the jsig >>>>> versioning code, only used by Solaris, is. >>>>> >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/cpu/arm/interp_masm_arm.cpp >>>>>>> >>>>>>> Why did you need to add the jvm.h include? >>>>>>> >>>>>> >>>>>> ?? tbz(Raccess_flags, JVM_ACC_SYNCHRONIZED_BIT, unlocked); >>>>> >>>>> Okay. I'm not going to try and figure out how this code found this >>>>> before. >>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/os/windows/os_windows.cpp. 
>>>>>>> >>>>>>> The type of process_exiting should be uint to match the DWORD of >>>>>>> GetCurrentThreadID. Then you should need any casts. Also you >>>>>>> missed this jint cast: >>>>>>> >>>>>>> 3796???????? process_exiting != (jint)GetCurrentThreadId()) { >>>>>> >>>>>> Yes, that's better to change process_exiting to a DWORD.? It >>>>>> needs a DWORD cast to 0 in the cmpxchg. >>>>>> >>>>>> ???????? Atomic::cmpxchg(GetCurrentThreadId(), &process_exiting, >>>>>> (DWORD)0); >>>>>> >>>>>> These templates are picky. >>>>> >>>>> Yes - their inability to deal with literals is extremely frustrating. >>>>> >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/c1/c1_Canonicalizer.hpp >>>>>>> >>>>>>> ? 43 #ifdef _WINDOWS >>>>>>> ? 44?? // jint is defined as long in jni_md.h, so convert from >>>>>>> int to jint >>>>>>> ? 45?? void set_constant(int x) { set_constant((jint)x); } >>>>>>> ? 46 #endif >>>>>>> >>>>>>> Why is this necessary? int and long are the same on Windows. The >>>>>>> whole point is that jint hides the underlying type, so where >>>>>>> does this go wrong? >>>>>> >>>>>> No, they are not the same types even though they have the same >>>>>> representation! >>>>> >>>>> This is truly unfortunate. >>>>> >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/c1/c1_LinearScan.cpp >>>>>>> >>>>>>> ?ConstantIntValue((jint)0); >>>>>>> >>>>>>> why is this cast needed? what causes the ambiguity? (If this was >>>>>>> a template I'd understand ;-) ). Also didn't you change that >>>>>>> constructor to take an int anyway - not that I think it should - >>>>>>> see below. >>>>>> >>>>>> Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't >>>>>> match 'long' better than any pointer type.? So this cast is needed. >>>>> >>>>> But you changed the constructor to take an int! >>>>> >>>>> ?class ConstantIntValue: public ScopeValue { >>>>> ? private: >>>>> -? jint _value; >>>>> +? int _value; >>>>> ? public: >>>>> -? ConstantIntValue(jint value)???????? { _value = value; } >>>>> +? ConstantIntValue(int value)????????? { _value = value; } >>>>> >>>>> >>>> >>>> Okay I removed this cast. >>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/ci/ciReplay.cpp >>>>>>> >>>>>>> 793???????? jint* dims = NEW_RESOURCE_ARRAY(jint, rank); >>>>>>> >>>>>>> why should this be jint? >>>>>> >>>>>> To avoid a cast from int* to jint* in the line below: >>>>>> >>>>>> ????????? value = kelem->multi_allocate(rank, dims, CHECK); >>>>>> >>>>>> >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/classfile/altHashing.cpp >>>>>>> >>>>>>> Okay this looks more consistent with jint. >>>>>> >>>>>> Yes.? I translated this from some native code iirc. >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/code/debugInfo.hpp >>>>>>> >>>>>>> These changes seem wrong. We have: >>>>>>> >>>>>>> ConstantLongValue(jlong value) >>>>>>> ConstantDoubleValue(jdouble value) >>>>>>> >>>>>>> so we should have: >>>>>>> >>>>>>> ConstantIntValue(jint value) >>>>>> >>>>>> Again, there are multiple call sites with '0', which match int >>>>>> trivially but are confused with long.? It's less consistent I >>>>>> agree but better to not cast all the call sites. >>>>> >>>>> This is really making a mess of the APIs - they should be a jint >>>>> but we declare them int because of a 0 casting problem. Can't we >>>>> just use 0L? >>>> >>>> There aren't that many casts.? You're right, that would have been >>>> better in some places. 
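
(The recurring 0-versus-0L problem reduces to a few lines of overload resolution. The emit_* overloads below are hypothetical; they only mirror a jint parameter, long on Windows and int elsewhere, competing with an overload that takes a pointer.)

    struct ScopeValueLike {};            // stands in for whatever pointer overloads exist

    void emit_int(int)              {}   // "jint" parameter on unix
    void emit_int(ScopeValueLike*)  {}
    void emit_long(long)            {}   // "jint" parameter on Windows
    void emit_long(ScopeValueLike*) {}

    void calls() {
      emit_int(0);              // fine: 0 is an int, exact match beats the pointer overload
      // emit_long(0);          // ambiguous: int -> long and 0 -> null pointer are both
                                // plain conversions, so neither overload wins; this is the
                                // error behind the (jint)0 casts once jint means long
      emit_long((long)0);       // the cast, spelled (jint)0 in the patch, resolves it
      emit_long(0L);            // 0L also works here ...
      // emit_int(0L);          // ... but is ambiguous where jint is int, for the same
                                // reason, which is the 0L objection raised above
    }

This also answers the puzzlement earlier in the thread about the cast being needed for long but not for int: with int the literal 0 is an exact match, so only the Windows (jint == long) case can become ambiguous with a pointer overload.
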
>>>> >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/code/relocInfo.cpp >>>>>>> >>>>>>> Change seems unnecessary - int32_t is fine >>>>>>> >>>>>> >>>>>> No, int32_t doesn't match the calls below it.? They all assume >>>>>> _lo and _hi are jint. >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/compiler/compileBroker.cpp >>>>>>> src/hotspot/share/compiler/compileBroker.hpp >>>>>>> >>>>>>> I see a complete mix of int and jint in this class, so why make >>>>>>> the one change you did ?? >>>>>> >>>>>> This is another case of using jint as a flag with cmpxchg. The >>>>>> templates for cmpxchg want the types to match and 0 and 1 are >>>>>> essentially 'int'.? This is a lot cleaner this way. >>>>> >>>>> >>>>> >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp >>>>>>> >>>>>>> 1700???? tty->write((char*) start, MIN2(length, (jint)O_BUFLEN)); >>>>>>> >>>>>>> why did you need to add the jint cast? It's used without any >>>>>>> cast on the next two lines: >>>>>>> >>>>>>> 1701???? length -= O_BUFLEN; >>>>>>> 1702???? offset += O_BUFLEN; >>>>>>> >>>>>> >>>>>> There's a conversion from O_BUFLEN from int to long in 1701 and >>>>>> 1702.?? MIN2 is a template that wants the types to match exactly. >>>>> >>>>> $%^%$! templates! >>>>> >>>>>>> ?? >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/jvmci/jvmciRuntime.cpp >>>>>>> >>>>>>> Looking around this code it seems very confused about types - eg >>>>>>> the previous function is declared jboolean yet returns a jint on >>>>>>> one path! It isn't clear to me if the return type is what should >>>>>>> be changed or the parameter type? I would just leave this alone. >>>>>> >>>>>> I can't leave it alone because it doesn't compile that way. This >>>>>> was the minimal change and yea, does look a bit inconsistent. >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/opto/mulnode.cpp >>>>>>> >>>>>>> Okay TypeInt has jint parts, so the remaining int32_t >>>>>>> declarations (A, B, C, D) should also be jint. >>>>>> >>>>>> Yes.? c2 uses jint types. >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/opto/parse3.cpp >>>>>>> >>>>>>> I agree with the changes you made, but then: >>>>>>> >>>>>>> ?419???? jint dim_con = find_int_con(length[j], -1); >>>>>>> >>>>>>> should also be changed. >>>>>>> >>>>>>> And obviously MultiArrayExpandLimit should be defined as int not >>>>>>> intx! >>>>>> >>>>>> Everything in globals.hpp is intx.? That's a thread that I don't >>>>>> want to pull on! >>>>> >>>>> We still have that limitation? >>>>>> >>>>>> Changed dim_con to int. >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/opto/phaseX.cpp >>>>>>> >>>>>>> I can see that intcon(jint i) is consistent with longcon(jlong >>>>>>> l), but the use of "i" in the code is more consistent with int >>>>>>> than jint. >>>>>> >>>>>> huh?? really? >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/opto/type.cpp >>>>>>> >>>>>>> 1505 int TypeInt::hash(void) const { >>>>>>> 1506?? return java_add(java_add(_lo, _hi), >>>>>>> java_add((jint)_widen, (jint)Type::Int)); >>>>>>> 1507 } >>>>>>> >>>>>>> I can see that the (jint) casts you added make sense, but then >>>>>>> the whole function should be returning jint not int. Ditto the >>>>>>> other hash functions. >>>>>> >>>>>> I'm not messing with this, this is the minimal in type fixing >>>>>> that I'm going to do here. >>>>> >>>>> >>>>> >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/prims/jni.cpp >>>>>>> >>>>>>> I think vm_created should be a bool. 
In fact all the fields you >>>>>>> changed are logically bools - do Atomics work for bool now? >>>>>> >>>>>> No, they do not.?? I had thought bool would be better originally >>>>>> too. >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/prims/jvm.cpp >>>>>>> >>>>>>> is_attachable is the terminology used in the JDK code. >>>>>> >>>>>> Well the JDK version had is_attach_supported() as the flag name >>>>>> so I used that in this one place. >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp >>>>>>> src/hotspot/share/prims/jvmtiImpl.cpp >>>>>>> >>>>>>> Are you making parameters consistent with the fields they >>>>>>> initialize? >>>>>> >>>>>> They're consistent with the declarations now. >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/prims/jvmtiTagMap.cpp >>>>>>> >>>>>>> There is a mix of int and jint for slot in this code. You fixed >>>>>>> some, but this remains: >>>>>>> >>>>>>> 2440 inline bool CallbackInvoker::report_stack_ref_root(jlong >>>>>>> thread_tag, >>>>>>> 2441 jlong tid, >>>>>>> 2442 jint depth, >>>>>>> 2443 jmethodID method, >>>>>>> 2444 jlocation bci, >>>>>>> 2445 jint slot, >>>>>> >>>>>> Right for consistency with the declarations. >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/runtime/perfData.cpp >>>>>>> >>>>>>> Callers pass both jint and int, so param type seems arbitrary. >>>>>> >>>>>> They are, but importantly they match the declarations. >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/runtime/perfMemory.cpp >>>>>>> src/hotspot/share/runtime/perfMemory.hpp >>>>>>> >>>>>>> PerfMemory::_initialized should ideally be a bool - can >>>>>>> OrderAccess handle that now? >>>>>> >>>>>> Nope. >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/java.base/share/native/include/jvm.h >>>>>>> >>>>>>> Not clear why the jio functions are not also JNICALL ? >>>>>> >>>>>> They are now.? The JDK version didn't have JNICALL.? JVM needs >>>>>> JNICALL.? I can't tell you why JDK didn't need JNICALL linkage. >>>>> >>>>> ?? JVM currently does not have JNICALL. But they are declared as >>>>> "extern C". >>>> >>>> This was a compilation error on Windows with JDK.?? Maybe the C >>>> code in the JDK doesn't complain about linkage differences. I'll >>>> have to go back and figure this out then. >>>>> >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/java.base/unix/native/include/jni_md.h >>>>>>> >>>>>>> There is no need to special case ARM. The differences in the >>>>>>> existing code were for LTO support and that is now irrelevant. >>>>>> >>>>>> See discussion with Magnus.?? We still build ARM for jdk10/hs so >>>>>> I needed this conditional or of course I wouldn't have added it.? >>>>>> We can remove it with LTO support. >>>>> >>>>> Those builds are gone - this is obsolete. But yes all LTO can be >>>>> removed later if you wish. Just trying to simplify things now. >>>>> >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/java.base/unix/native/include/jvm_md.h >>>>>>> >>>>>>> I know you've just copied this across, but it seems wrong to me: >>>>>>> >>>>>>> ?57 // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on >>>>>>> others. This may >>>>>>> ? 58 //?????? cause problems if JVM and the rest of JDK are >>>>>>> built on different >>>>>>> ? 59 //?????? Linux releases. Here we define JVM_MAXPATHLEN to >>>>>>> be MAXPATHLEN + 1, >>>>>>> ? 60 //?????? so buffers declared in VM are always >= 4096. >>>>>>> ? 61 #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>>>>>> >>>>>>> It doesn't make sense to me to define an internal "max path >>>>>>> length" that can _exceed_ the platform max! 
>>>>>>> >>>>>>> That aside there's no support for building different parts of >>>>>>> the JDK on different platforms and then bringing them together. >>>>>>> And in any case I would think the real problem would be building >>>>>>> on a platform that uses 4096 and running on one that uses 4095! >>>>>>> >>>>>>> But that aside this is a Linux hack and should be guarded by >>>>>>> ifdef LINUX. (I doubt BSD needs it, the bsd file is just a copy >>>>>>> of the linux one - the JDK macosx version does the right thing). >>>>>>> Solaris and AIX should stay as-is at MAXPATHLEN. >>>>>> >>>>>> All of the unix platforms had MAXPATHLEN+1.? I'll leave it for >>>>>> now and we can investigate that further. >>>>> >>>>> I see the following existing code: >>>>> >>>>> src/java.base/unix/native/include/jvm_md.h: >>>>> >>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>> >>>>> src/java.base/macosx/native/include/jvm_md.h >>>>> >>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>> >>>>> src/hotspot/os/aix/jvm_aix.h >>>>> >>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>> >>>>> src/hotspot/os/bsd/jvm_bsd.h >>>>> >>>>> #define JVM_MAXPATHLEN MAXPATHLEN + 1? // blindly copied from >>>>> Linux version >>>>> >>>>> src/hotspot/os/linux/jvm_linux.h >>>>> >>>>> #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>>>> >>>>> src/hotspot/os/solaris/jvm_solaris.h >>>>> >>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>> >>>>> This is a linux only hack (if you ignore the blind copy from linux >>>>> into the BSD code in the VM). >>>> >>>> Oh, thanks, so should I add a bunch of ifdefs then?? Or do you >>>> think having MAXPATHLEN + 1 will really break the other platforms?? >>>> Do you really see this as a problem or are you just pointing out >>>> inconsistency? >>>>> >>>>>>> >>>>>>> ?86 #define ASYNC_SIGNAL???? SIGJVM2 >>>>>>> >>>>>>> This only exists on Solaris so I think should be in #ifdef >>>>>>> SOLARIS, to make that clear. >>>>>> >>>>>> Ok.? I'll add this. >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/java.base/windows/native/include/jvm_md.h >>>>>>> >>>>>>> Given the differences between the two versions either something >>>>>>> has been broken or "extern C" declarations are not needed :) >>>>>> >>>>>> Well, they are needed for Hotspot to build and do not prevent jdk >>>>>> from building.? I don't know what was broken. >>>>> >>>>> We really need to understand this better. Maybe related to the map >>>>> files that expose the symbols. ?? >>>> >>>> They're needed because the JDK files are written mostly in C and >>>> that doesn't complain about the linkage difference. Hotspot files >>>> are in C++ which does complain. >>>> >>>>> >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> That was a really painful way to spend most of my Friday. TGIF! :) >>>>>> >>>>>> Thanks for going through it.? See comments inline for changes. >>>>>> Generating a webrev takes hours so I'm not going to do that >>>>>> unless you insist. >>>>> >>>>> An incremental webrev shouldn't take long - right? You're a mq >>>>> maestro now. :) >>>> >>>> Well I generally trash a repository whenever I use mq but sure. >>>>> >>>>> If you can reasonably produce an incremental webrev once you've >>>>> settled on all the comments/issues that would be good. >>>> >>>> Ok, sure. >>>> >>>> Coleen >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Thanks, >>>>>> Coleen >>>>>> >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>> >>>>>>> On 27/10/2017 6:44 AM, coleen.phillimore at oracle.com wrote: >>>>>>>> ??Hi Magnus, >>>>>>>> >>>>>>>> Thank you for reviewing this.?? 
I have a new version that takes >>>>>>>> out the hack in globalDefinitions.hpp and adds casts to >>>>>>>> src/hotspot/share/opto/type.cpp instead. >>>>>>>> >>>>>>>> Also some fixes from Martin at SAP. >>>>>>>> >>>>>>>> open webrev at >>>>>>>> http://cr.openjdk.java.net/~coleenp/8189610.02/webrev >>>>>>>> >>>>>>>> see below. >>>>>>>> >>>>>>>> On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: >>>>>>>>> Coleen, >>>>>>>>> >>>>>>>>> Thank you for addressing this! >>>>>>>>> >>>>>>>>> On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >>>>>>>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>>>>>>> >>>>>>>>>> Mostly used sed to remove prims/jvm.h and move #include >>>>>>>>>> "jvm.h" after precompiled.h, so if you have repetitive stress >>>>>>>>>> wrist issues don't click on most of these files. >>>>>>>>>> >>>>>>>>>> There were more issues to resolve, however.? The JDK windows >>>>>>>>>> jni_md.h file defined jint as long and the hotspot windows >>>>>>>>>> jni_x86.h as int. I had to choose the jdk version since it's >>>>>>>>>> the public version, so there are changes to the hotspot files >>>>>>>>>> for this. Generally I changed the code to use 'int' rather >>>>>>>>>> than 'jint' where the surrounding API didn't insist on >>>>>>>>>> consistently using java types. We should mostly be using C++ >>>>>>>>>> types within hotspot except in interfaces to native/JNI >>>>>>>>>> code.? There are a couple of hacks in places where adding >>>>>>>>>> multiple jint casts was too painful. >>>>>>>>>> >>>>>>>>>> Tested with JPRT and tier2-4 (in progress). >>>>>>>>>> >>>>>>>>>> open webrev at >>>>>>>>>> http://cr.openjdk.java.net/~coleenp/8189610.01/webrev >>>>>>>>> >>>>>>>>> Looks great! >>>>>>>>> >>>>>>>>> Just a few comments: >>>>>>>>> >>>>>>>>> * src/java.base/unix/native/include/jni_md.h: >>>>>>>>> >>>>>>>>> I don't think the externally_visible attribute should be there >>>>>>>>> for arm. I know this was the case for the corresponding >>>>>>>>> hotspot file for arm, but that was techically incorrect. The >>>>>>>>> proper dependency here is that externally_visible should be in >>>>>>>>> all JNIEXPORT if and only if we're building with JVM feature >>>>>>>>> "link-time-opt". Traditionally, that feature been enabled when >>>>>>>>> building arm32 builds, and only then, so there's been a >>>>>>>>> (coincidentally) connection here. Nowadays, Oracle does not >>>>>>>>> care about the arm32 builds, and I'm not sure if anyone else >>>>>>>>> is building them with link-time-opt enabled. >>>>>>>>> >>>>>>>>> It does seem wrong to me to export this behavior in the public >>>>>>>>> jni_md.h file, though. I think the correct way to solve this, >>>>>>>>> if we should continue supporting link-time-opt is to make sure >>>>>>>>> this attribute is set for exported hotspot functions. If it's >>>>>>>>> still needed, that is. A quick googling seems to indicate that >>>>>>>>> visibility("default") might be enough in modern gcc's. >>>>>>>>> >>>>>>>>> A third option is to remove the support for link-time-opt >>>>>>>>> entirely, if it's not really used. >>>>>>>> >>>>>>>> I didn't know how to change this since we are still building >>>>>>>> ARM with the jdk10/hs repository, and ARM needed this change.? >>>>>>>> I could wait until we bring down the jdk10/master changes that >>>>>>>> remove the ARM build and remove this conditional before I push. >>>>>>>> Or we could file an RFE to remove link-time-opt (?) and remove >>>>>>>> it then? 
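For readers following the link-time-opt point above, a rough sketch of the two JNIEXPORT variants in question (hypothetical guard name; the real jni_md.h is organized differently):

  // JVM_LINK_TIME_OPT is a made-up guard standing in for "built with the
  // link-time-opt JVM feature"; it is not an existing macro.
  #if defined(__GNUC__) && defined(JVM_LINK_TIME_OPT)
    // externally_visible keeps the symbol alive under whole-program optimization.
    #define JNIEXPORT __attribute__((visibility("default"), externally_visible))
  #elif defined(__GNUC__)
    // Plain default visibility, which may be enough on modern gcc as noted above.
    #define JNIEXPORT __attribute__((visibility("default")))
  #else
    #define JNIEXPORT
  #endif

  extern "C" JNIEXPORT int jni_example_export(void) { return 42; }  // hypothetical exported function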
>>>>>>>> >>>>>>>>> >>>>>>>>> * src/java.base/unix/native/include/jvm_md.h and >>>>>>>>> src/java.base/windows/native/include/jvm_md.h: >>>>>>>>> >>>>>>>>> These files define a public API, and contain non-trivial >>>>>>>>> changes. I suspect you should file a CSR request. (Even though >>>>>>>>> I realize you're only matching the header file with the reality.) >>>>>>>>> >>>>>>>> >>>>>>>> I filed the CSR.?? Waiting for the next steps. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Coleen >>>>>>>> >>>>>>>>> /Magnus >>>>>>>>> >>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >>>>>>>>>> >>>>>>>>>> I have a script to update copyright files on commit. >>>>>>>>>> >>>>>>>>>> Thanks to Magnus and ErikJ for the makefile changes. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Coleen >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>> >> From Alan.Bateman at oracle.com Mon Oct 30 13:24:33 2017 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 30 Oct 2017 13:24:33 +0000 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <440f79ba-2da3-b627-53bc-e1842e3cf73c@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> <22afedef-59cc-ecde-48fc-0afb7b4bbb47@oracle.com> <815ac734-ea8b-ea2d-ecec-85cb547ba2f4@oracle.com> <440f79ba-2da3-b627-53bc-e1842e3cf73c@oracle.com> Message-ID: <9fa3a074-3ebc-4fb5-4ffa-72d8bc4e5dc2@oracle.com> On 30/10/2017 12:38, coleen.phillimore at oracle.com wrote: > : > > > I don't know why Java_java_io_UnixFileSystem uses JVM_MAXPATHLEN since > it's not calling the JVM interface as far as I can tell. I think it > should be changed to PATH_MAX. This code used to use the JVM_* functions (dates back to early JDK releases). The JVM_MAXPATHLEN usage is likely left over from when this code was change to use the syscalls directly. -Alan From robbin.ehn at oracle.com Mon Oct 30 14:34:29 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 30 Oct 2017 15:34:29 +0100 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <4ebb905f23324a00b9cf10d8d410d420@sap.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> <59F2DC24.8050701@oracle.com> <59F2F01A.403@oracle.com> <4ebb905f23324a00b9cf10d8d410d420@sap.com> Message-ID: Thanks! There have been a bit hesitation and confusion about the option (at least internally). The option is opt-out but in globals.hpp it starts out as false. 
Now instead we explicit set it true in globals.hpp but we turn it off if we notice that: - We are on an unsupported platform - User have specified UseAOT - User have specified EnableJVMCI Here is webrev for changes needed: http://cr.openjdk.java.net/~rehn/8185640/v8/Option-Cleanup-12/webrev/ And here is CSR: https://bugs.openjdk.java.net/browse/JDK-8189942 Manual testing + basic testing done. And since I'm really hoping that this can be the last incremental, here is my whole patch queue flatten out: http://cr.openjdk.java.net/~rehn/8185640/v8/Full/webrev/ Thanks, Robbin On 10/27/2017 04:47 PM, Doerr, Martin wrote: > Hi Robbin, > > excellent. I think this matches what Coleen had proposed, now. > Thanks for doing all the work with so many incremental patches and for responding on so many discussions. Seems to be a tough piece of work. > > Best regards, > Martin > > > -----Original Message----- > From: Robbin Ehn [mailto:robbin.ehn at oracle.com] > Sent: Freitag, 27. Oktober 2017 15:15 > To: Erik ?sterlund ; Andrew Haley ; Doerr, Martin ; Karen Kinnear ; Coleen Phillimore (coleen.phillimore at oracle.com) > Cc: hotspot-dev developers > Subject: Re: RFR(XL): 8185640: Thread-local handshakes > > Hi all, > > Poll in switches: > http://cr.openjdk.java.net/~rehn/8185640/v7/Interpreter-Poll-Switch-10/ > > Poll in return: > http://cr.openjdk.java.net/~rehn/8185640/v7/Interpreter-Poll-Ret-11/ > > Please take an extra look at poll in return. > > Sanity tested, big test run still running (99% complete - OK). > > Performance regression for the added polls increased to total of -0.68% vs > global poll. (was -0.44%) > > We are discussing the opt-out option, the newest suggestion is to make it > diagnostic. Opinions? > > For anyone applying these patches, the number 9 patch changes the option from > product. I have not sent that out. > > Thanks, Robbin > > > From artem.smotrakov at oracle.com Mon Oct 30 07:39:49 2017 From: artem.smotrakov at oracle.com (Artem Smotrakov) Date: Mon, 30 Oct 2017 10:39:49 +0300 Subject: RFR [10] 8189800: Add support for AddressSanitizer In-Reply-To: <51eabbae-5435-59be-f443-a6b214a17513@oracle.com> References: <51eabbae-5435-59be-f443-a6b214a17513@oracle.com> Message-ID: cc'ing hotspot-dev at openjdk.java.net as David suggested. Artem On 10/27/2017 11:02 PM, Artem Smotrakov wrote: > Hello, > > Please review the following patch which adds support for > AddressSanitizer. > > AddressSanitizer is a runtime memory error detector which looks for > various memory corruption issues and leaks. > > Please refer to [1] for details. AddressSanitizer is available in gcc > 4.8+ and clang 3.1+ > > The patch below introduces --enable-asan parameter for the configure > script which enables AddressSanitizer. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8189800 > Webrev: http://cr.openjdk.java.net/~asmotrak/8189800/webrev.00/ > > [1] https://github.com/google/sanitizers/wiki/AddressSanitizer > > Artem From artem.smotrakov at oracle.com Mon Oct 30 09:31:40 2017 From: artem.smotrakov at oracle.com (Artem Smotrakov) Date: Mon, 30 Oct 2017 12:31:40 +0300 Subject: RFR [10] 8189800: Add support for AddressSanitizer In-Reply-To: <55e0e055-2e65-5c83-3f8e-36895f71860e@oracle.com> References: <51eabbae-5435-59be-f443-a6b214a17513@oracle.com> <55e0e055-2e65-5c83-3f8e-36895f71860e@oracle.com> Message-ID: <3b4c5abb-762f-a66c-02d5-93909dc656d4@oracle.com> Hi Magnus, The current approach uses AddressSanitizer as a shared library (libasan.so) which is part of GCC/Clang toolkit. 
In case you use system toolkit, then libasan.so is available for linker and at runtime. But if you set a custom toolkit by --with-devkit option, then libasan.so form this toolkit may not be available for linker and at runtime by default. As a result, you can get errors while linking and running. To fix that, you normally need to make it available using ldconfig, or update LD_LIBRARY_PATH. That's why it updates LD_LIBRARY_PATH with DEVKIT_LIB_DIR if a custom toolkit was used. That may be helpful when you build JDK in environment like jib/jprt. I tried to remove exporting ASAN_ENABLED and DEVKIT_LIB_DIR, and as a result, ASAN_OPTIONS and DEVKIT_LIB_DIR didn't go to jtreg command which caused tests to fail when you run "make test". If we don't export ASAN_OPTIONS and DEVKIT_LIB_DIR, then the updates in TestCommon.gmk don't make much sense to me because those variables have to be explicitly set for "make" anyway. I can remove exporting those variables and revert TestCommon.gmk. Although, it looks nicer to me if we can run the tests just with "make test" without specifying ASAN_OPTIONS and DEVKIT_LIB_DIR explicitly. What do you think? Artem On 10/30/2017 10:50 AM, Magnus Ihse Bursie wrote: > On 2017-10-30 08:39, Artem Smotrakov wrote: >> cc'ing hotspot-dev at openjdk.java.net as David suggested. >> >> Artem >> >> >> On 10/27/2017 11:02 PM, Artem Smotrakov wrote: >>> Hello, >>> >>> Please review the following patch which adds support for >>> AddressSanitizer. >>> >>> AddressSanitizer is a runtime memory error detector which looks for >>> various memory corruption issues and leaks. >>> >>> Please refer to [1] for details. AddressSanitizer is available in >>> gcc 4.8+ and clang 3.1+ >>> >>> The patch below introduces --enable-asan parameter for the configure >>> script which enables AddressSanitizer. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8189800 >>> Webrev: http://cr.openjdk.java.net/~asmotrak/8189800/webrev.00/ > spec.gmk.in should only have export for variables that needs to be > exported in the environment for executing binaries, that is > ASAN_OPTIONS and LD_LIBRARY_PATH, not ASAN_ENABLED or DEVKIT_LIB_DIR. > > I'm also a bit curious about the addition of of DEVKIT_LIB_DIR. Would > you care to elaborate your thinking? > > Otherwise it looks good. > > /Magnus > >>> >>> [1] https://github.com/google/sanitizers/wiki/AddressSanitizer >>> >>> Artem >> > From coleen.phillimore at oracle.com Mon Oct 30 14:48:32 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 30 Oct 2017 10:48:32 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <440f79ba-2da3-b627-53bc-e1842e3cf73c@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> <22afedef-59cc-ecde-48fc-0afb7b4bbb47@oracle.com> <815ac734-ea8b-ea2d-ecec-85cb547ba2f4@oracle.com> <440f79ba-2da3-b627-53bc-e1842e3cf73c@oracle.com> Message-ID: http://cr.openjdk.java.net/~coleenp/8189610.incr.02/webrev/index.html Changed JDK file to use PATH_MAX.? Retested jdk tier1 tests. 
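For context, the shape of the change being described, as a standalone sketch (hypothetical helper names; this is not the actual UnixFileSystem_md.c or canonicalize_md.c code):

  #include <limits.h>   // PATH_MAX (POSIX; assumed available on the platforms discussed)
  #include <stdio.h>

  // Hypothetical stand-in for canonicalize(): like the real check quoted below,
  // it rejects an output buffer smaller than PATH_MAX.
  static int canonicalize_sketch(const char* original, char* resolved, int len) {
    if (len < PATH_MAX) return -1;
    return snprintf(resolved, (size_t)len, "%s", original) < 0 ? -1 : 0;
  }

  int main() {
    char canonicalPath[PATH_MAX];   // sized with the platform limit rather than JVM_MAXPATHLEN
    return canonicalize_sketch("/tmp/./x", canonicalPath, (int)sizeof canonicalPath);
  }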
thanks, Coleen On 10/30/17 8:38 AM, coleen.phillimore at oracle.com wrote: > > > On 10/30/17 8:17 AM, David Holmes wrote: >> On 30/10/2017 10:13 PM, coleen.phillimore at oracle.com wrote: >>> On 10/28/17 3:50 AM, David Holmes wrote: >>>> Hi Coleen, >>>> >>>> I've commented on the file location in response to Mandy's email. >>>> >>>> The only issue I'm still concerned about is the JVM_MAXPATHLEN >>>> issue. I think it is a bug to define a JVM_MAXPATHLEN that is >>>> bigger than the platform MAXPATHLEN. I also would not want to see >>>> any change in behaviour because of this - so AIX and Solaris should >>>> not get a different JVM_MAXPATHLEN due to this refactoring change. >>>> So yes I think this needs to be ifdef'd for Linux and reluctantly >>>> (because it was a copy error) for OSX/BSD as well. >>> >>> #if defined(AIX) || defined(SOLARIS) >>> #define JVM_MAXPATHLEN MAXPATHLEN >>> #else >>> // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on others. This may >>> //?????? cause problems if JVM and the rest of JDK are built on >>> different >>> //?????? Linux releases. Here we define JVM_MAXPATHLEN to be >>> MAXPATHLEN + 1, >>> //?????? so buffers declared in VM are always >= 4096. >>> #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>> #endif >>> >>> Is this ok? >> >> Yes - thanks. It preserves existing behaviour on the VM side at >> least. Time will tell if it messes anything up on the JDK side for >> Linux/OSX. > > I don't want to wait for time so I'm investigating. > > It's one use is: > > Java_java_io_UnixFileSystem_canonicalize0(JNIEnv *env, jobject this, > ... > ??????? char canonicalPath[JVM_MAXPATHLEN]; > ??????? if (canonicalize((char *)path, > ???????????????????????? canonicalPath, JVM_MAXPATHLEN) < 0) { > ??????????? JNU_ThrowIOExceptionWithLastError(env, "Bad pathname"); > > Which goes to: > > canonicalize_md.c > > canonicalize(char *original, char *resolved, int len) > ??? if (len < PATH_MAX) { > ??????? errno = EINVAL; > ??????? return -1; > ??? } > > > So this should fail every time. > > sys/param.h:# define MAXPATHLEN??? PATH_MAX > > I haven't found any tests for it. > > I don't know why Java_java_io_UnixFileSystem uses JVM_MAXPATHLEN since > it's not calling the JVM interface as far as I can tell. I think it > should be changed to PATH_MAX. > > ? > Coleen >> >> David >> >>> thanks, >>> Coleen >>>> >>>> Thanks, >>>> David >>>> >>>> On 28/10/2017 12:08 AM, coleen.phillimore at oracle.com wrote: >>>>> >>>>> >>>>> On 10/27/17 9:37 AM, David Holmes wrote: >>>>>> On 27/10/2017 10:13 PM, coleen.phillimore at oracle.com wrote: >>>>>>> >>>>>>> >>>>>>> On 10/27/17 3:23 AM, David Holmes wrote: >>>>>>>> Hi Coleen, >>>>>>>> >>>>>>>> Thanks for tackling this. >>>>>>>> >>>>>>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>>>>> >>>>>>>> Can you update the bug synopsis to show it covers both sets of >>>>>>>> files please. >>>>>>>> >>>>>>>> I hate to start with this (and it took me quite a while to >>>>>>>> realize it) but as Mandy pointed out jvm.h is not an exported >>>>>>>> interface from the JDK to the outside world (so not subject to >>>>>>>> CSR review), but is a private interface between the JVM and the >>>>>>>> JDK libraries. So I think really jvm.h belongs in the hotspot >>>>>>>> sources where it was, while jni.h belongs in the exported JDK >>>>>>>> sources. In which case the bulk of your changes to the hotspot >>>>>>>> files would not be needed - sorry. >>>>>>> >>>>>>> Maybe someone can make that decision and change at a later date. 
>>>>>>> The point of this change is that there is now only one of these >>>>>>> files that is shared.? I don't think jvm.h and the jvm_md.h >>>>>>> belong on the hotspot sources for the jdk to find them in some >>>>>>> random prims and os dependent directories. >>>>>> >>>>>> The one file that is needed is a hotspot file - jvm.h defines the >>>>>> interface that hotspot exports via jvm.cpp. >>>>>> >>>>>> If you leave jvm.h in hotspot/prims then a very large chunk of >>>>>> your boilerplate changes are not needed. The JDK code doesn't >>>>>> care what the name of the directory is - whatever it is just gets >>>>>> added as a -I directive (the JDK code will include "jvm.h" not >>>>>> "prims/jvm.h" the way hotspot sources do. >>>>>> >>>>>> This isn't something we want to change back or move again later. >>>>>> Whatever we do now we live with. >>>>> >>>>> I think it belongs with jni.h and I think the core libraries group >>>>> would agree.?? It seems more natural there than buried in the >>>>> hotspot prims directory.? I guess this is on hold while we have >>>>> this debate.?? Sigh. >>>>> >>>>> Actually with -I directives, changing to jvm.h from prims/jvm.h >>>>> would still work.?? Maybe we should change the name to jvm.hpp >>>>> since it's jvm.cpp though??? Or maybe just have two divergent >>>>> copies and close this as WNF. >>>>> >>>>>> >>>>>>> I'm happy to withdraw the CSR. We generally use the CSR process >>>>>>> to add and remove JVM_ interfaces even though they're a private >>>>>>> interface in case some other JVM/JDK combination relies on them. >>>>>>> The changes to these files are very minor though and not likely >>>>>>> to cause any even theoretical incompatibility, so I'll withdraw it. >>>>>>>> >>>>>>>> Moving on ... >>>>>>>> >>>>>>>> First to address the initial comments/query you had: >>>>>>>> >>>>>>>>> The JDK windows jni_md.h file defined jint as long and the >>>>>>>>> hotspot >>>>>>>>> windows jni_x86.h as int. I had to choose the jdk version >>>>>>>>> since it's the >>>>>>>>> public version, so there are changes to the hotspot files for >>>>>>>>> this. >>>>>>>> >>>>>>>> On Windows int and long are always the same as it uses ILP32 or >>>>>>>> LLP64 (not LP64 like *nix platforms). So either choice should >>>>>>>> be fine. That said there are some odd casting issues I comment >>>>>>>> on below. Does the VS compiler complain about mixing int and >>>>>>>> long in expressions? >>>>>>> >>>>>>> Yes, it does even though int and long are the same representation. >>>>>> >>>>>> And what an absolute mess that makes. :( >>>>>> >>>>>>>> >>>>>>>>> Generally I changed the code to use 'int' rather than 'jint' >>>>>>>>> where the >>>>>>>>> surrounding API didn't insist on consistently using java >>>>>>>>> types. We >>>>>>>>> should mostly be using C++ types within hotspot except in >>>>>>>>> interfaces to >>>>>>>>> native/JNI code. >>>>>>>> >>>>>>>> I think you pulled too hard on a few threads here and things >>>>>>>> are starting to unravel. There are numerous cases I refer to >>>>>>>> below where either the cast seems unnecessary/inappropriate or >>>>>>>> else highlights a bunch of additional changes that also need to >>>>>>>> be made. The fan out from this could be horrendous. Unless you >>>>>>>> actually get some kind of error - and I'd like to understand >>>>>>>> the details of those - I would not suggest making these changes >>>>>>>> as part of this work. >>>>>>> >>>>>>> I didn't make any change unless there was was an error. I have >>>>>>> 100 failed JPRT jobs to confirm!? 
I eventually got a Windows >>>>>>> system to compile and test this on. Actually some of the changes >>>>>>> came out better.? Cases where we use jint as a bool simply >>>>>>> turned to int. We do not have an overload for bool for cmpxchg. >>>>>> >>>>>> That's unfortunate - ditto for OrderAccess. >>>>>> >>>>>>>> >>>>>>>> Looking through I have a quite a few queries/comments - >>>>>>>> apologies in advance as I know how tedious this is: >>>>>>>> >>>>>>>> make/hotspot/lib/CompileLibjsig.gmk >>>>>>>> src/java.base/solaris/native/libjsig/jsig.c >>>>>>>> >>>>>>>> Took a while to figure out why the include was needed. :) As a >>>>>>>> follow up I suggest just deleting the -I include directive, >>>>>>>> delete the Solaris-only definition of JSIG_VERSION_1_4_1, and >>>>>>>> delete everything to do with JVM_get_libjsig_version. It is all >>>>>>>> obsolete. >>>>>>> >>>>>>> Can I patch up jsig in a separate RFE?? I don't remember why >>>>>>> this broke so I simply moved JSIG #define.? Is jsig obsolete? >>>>>>> Removing JVM_* definitions generally requires a CSR. >>>>>> >>>>>> I did say "As a follow up". jsig is not obsolete but the jsig >>>>>> versioning code, only used by Solaris, is. >>>>>> >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/cpu/arm/interp_masm_arm.cpp >>>>>>>> >>>>>>>> Why did you need to add the jvm.h include? >>>>>>>> >>>>>>> >>>>>>> ?? tbz(Raccess_flags, JVM_ACC_SYNCHRONIZED_BIT, unlocked); >>>>>> >>>>>> Okay. I'm not going to try and figure out how this code found >>>>>> this before. >>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/os/windows/os_windows.cpp. >>>>>>>> >>>>>>>> The type of process_exiting should be uint to match the DWORD >>>>>>>> of GetCurrentThreadID. Then you should need any casts. Also you >>>>>>>> missed this jint cast: >>>>>>>> >>>>>>>> 3796???????? process_exiting != (jint)GetCurrentThreadId()) { >>>>>>> >>>>>>> Yes, that's better to change process_exiting to a DWORD.? It >>>>>>> needs a DWORD cast to 0 in the cmpxchg. >>>>>>> >>>>>>> ???????? Atomic::cmpxchg(GetCurrentThreadId(), &process_exiting, >>>>>>> (DWORD)0); >>>>>>> >>>>>>> These templates are picky. >>>>>> >>>>>> Yes - their inability to deal with literals is extremely >>>>>> frustrating. >>>>>> >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/c1/c1_Canonicalizer.hpp >>>>>>>> >>>>>>>> ? 43 #ifdef _WINDOWS >>>>>>>> ? 44?? // jint is defined as long in jni_md.h, so convert from >>>>>>>> int to jint >>>>>>>> ? 45?? void set_constant(int x) { set_constant((jint)x); } >>>>>>>> ? 46 #endif >>>>>>>> >>>>>>>> Why is this necessary? int and long are the same on Windows. >>>>>>>> The whole point is that jint hides the underlying type, so >>>>>>>> where does this go wrong? >>>>>>> >>>>>>> No, they are not the same types even though they have the same >>>>>>> representation! >>>>>> >>>>>> This is truly unfortunate. >>>>>> >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/c1/c1_LinearScan.cpp >>>>>>>> >>>>>>>> ?ConstantIntValue((jint)0); >>>>>>>> >>>>>>>> why is this cast needed? what causes the ambiguity? (If this >>>>>>>> was a template I'd understand ;-) ). Also didn't you change >>>>>>>> that constructor to take an int anyway - not that I think it >>>>>>>> should - see below. >>>>>>> >>>>>>> Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't >>>>>>> match 'long' better than any pointer type.? So this cast is needed. >>>>>> >>>>>> But you changed the constructor to take an int! >>>>>> >>>>>> ?class ConstantIntValue: public ScopeValue { >>>>>> ? private: >>>>>> -? 
jint _value; >>>>>> +? int _value; >>>>>> ? public: >>>>>> -? ConstantIntValue(jint value)???????? { _value = value; } >>>>>> +? ConstantIntValue(int value)????????? { _value = value; } >>>>>> >>>>>> >>>>> >>>>> Okay I removed this cast. >>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/ci/ciReplay.cpp >>>>>>>> >>>>>>>> 793???????? jint* dims = NEW_RESOURCE_ARRAY(jint, rank); >>>>>>>> >>>>>>>> why should this be jint? >>>>>>> >>>>>>> To avoid a cast from int* to jint* in the line below: >>>>>>> >>>>>>> ????????? value = kelem->multi_allocate(rank, dims, CHECK); >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/classfile/altHashing.cpp >>>>>>>> >>>>>>>> Okay this looks more consistent with jint. >>>>>>> >>>>>>> Yes.? I translated this from some native code iirc. >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/code/debugInfo.hpp >>>>>>>> >>>>>>>> These changes seem wrong. We have: >>>>>>>> >>>>>>>> ConstantLongValue(jlong value) >>>>>>>> ConstantDoubleValue(jdouble value) >>>>>>>> >>>>>>>> so we should have: >>>>>>>> >>>>>>>> ConstantIntValue(jint value) >>>>>>> >>>>>>> Again, there are multiple call sites with '0', which match int >>>>>>> trivially but are confused with long.? It's less consistent I >>>>>>> agree but better to not cast all the call sites. >>>>>> >>>>>> This is really making a mess of the APIs - they should be a jint >>>>>> but we declare them int because of a 0 casting problem. Can't we >>>>>> just use 0L? >>>>> >>>>> There aren't that many casts.? You're right, that would have been >>>>> better in some places. >>>>> >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/code/relocInfo.cpp >>>>>>>> >>>>>>>> Change seems unnecessary - int32_t is fine >>>>>>>> >>>>>>> >>>>>>> No, int32_t doesn't match the calls below it.? They all assume >>>>>>> _lo and _hi are jint. >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/compiler/compileBroker.cpp >>>>>>>> src/hotspot/share/compiler/compileBroker.hpp >>>>>>>> >>>>>>>> I see a complete mix of int and jint in this class, so why make >>>>>>>> the one change you did ?? >>>>>>> >>>>>>> This is another case of using jint as a flag with cmpxchg. The >>>>>>> templates for cmpxchg want the types to match and 0 and 1 are >>>>>>> essentially 'int'.? This is a lot cleaner this way. >>>>>> >>>>>> >>>>>> >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp >>>>>>>> >>>>>>>> 1700???? tty->write((char*) start, MIN2(length, (jint)O_BUFLEN)); >>>>>>>> >>>>>>>> why did you need to add the jint cast? It's used without any >>>>>>>> cast on the next two lines: >>>>>>>> >>>>>>>> 1701???? length -= O_BUFLEN; >>>>>>>> 1702???? offset += O_BUFLEN; >>>>>>>> >>>>>>> >>>>>>> There's a conversion from O_BUFLEN from int to long in 1701 and >>>>>>> 1702.?? MIN2 is a template that wants the types to match exactly. >>>>>> >>>>>> $%^%$! templates! >>>>>> >>>>>>>> ?? >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/jvmci/jvmciRuntime.cpp >>>>>>>> >>>>>>>> Looking around this code it seems very confused about types - >>>>>>>> eg the previous function is declared jboolean yet returns a >>>>>>>> jint on one path! It isn't clear to me if the return type is >>>>>>>> what should be changed or the parameter type? I would just >>>>>>>> leave this alone. >>>>>>> >>>>>>> I can't leave it alone because it doesn't compile that way. This >>>>>>> was the minimal change and yea, does look a bit inconsistent. 
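To make the ConstantIntValue((jint)0) ambiguity discussed above concrete, a self-contained illustration (hypothetical names; the real overload set is different):

  typedef long my_jint;          // jint is 'long' in the 32-bit Windows jni_md.h;
                                 // spelled my_jint here to avoid implying the real typedef
  struct ScopeValueLike {};      // hypothetical

  void record(my_jint)         {}   // stands in for the jint/ConstantIntValue path
  void record(ScopeValueLike*) {}   // a competing pointer overload

  void demo() {
    // record(0);           // ambiguous: the literal 0 converts to 'long' and to a
    //                      // null pointer equally well, so neither overload wins.
    record((my_jint)0);     // the explicit cast resolves it
    record(5);              // a non-zero int literal is not a null pointer constant,
                            // so only the integral overload is viable here
  }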
>>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/opto/mulnode.cpp >>>>>>>> >>>>>>>> Okay TypeInt has jint parts, so the remaining int32_t >>>>>>>> declarations (A, B, C, D) should also be jint. >>>>>>> >>>>>>> Yes.? c2 uses jint types. >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/opto/parse3.cpp >>>>>>>> >>>>>>>> I agree with the changes you made, but then: >>>>>>>> >>>>>>>> ?419???? jint dim_con = find_int_con(length[j], -1); >>>>>>>> >>>>>>>> should also be changed. >>>>>>>> >>>>>>>> And obviously MultiArrayExpandLimit should be defined as int >>>>>>>> not intx! >>>>>>> >>>>>>> Everything in globals.hpp is intx.? That's a thread that I don't >>>>>>> want to pull on! >>>>>> >>>>>> We still have that limitation? >>>>>>> >>>>>>> Changed dim_con to int. >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/opto/phaseX.cpp >>>>>>>> >>>>>>>> I can see that intcon(jint i) is consistent with longcon(jlong >>>>>>>> l), but the use of "i" in the code is more consistent with int >>>>>>>> than jint. >>>>>>> >>>>>>> huh?? really? >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/opto/type.cpp >>>>>>>> >>>>>>>> 1505 int TypeInt::hash(void) const { >>>>>>>> 1506?? return java_add(java_add(_lo, _hi), >>>>>>>> java_add((jint)_widen, (jint)Type::Int)); >>>>>>>> 1507 } >>>>>>>> >>>>>>>> I can see that the (jint) casts you added make sense, but then >>>>>>>> the whole function should be returning jint not int. Ditto the >>>>>>>> other hash functions. >>>>>>> >>>>>>> I'm not messing with this, this is the minimal in type fixing >>>>>>> that I'm going to do here. >>>>>> >>>>>> >>>>>> >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/prims/jni.cpp >>>>>>>> >>>>>>>> I think vm_created should be a bool. In fact all the fields you >>>>>>>> changed are logically bools - do Atomics work for bool now? >>>>>>> >>>>>>> No, they do not.?? I had thought bool would be better originally >>>>>>> too. >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/prims/jvm.cpp >>>>>>>> >>>>>>>> is_attachable is the terminology used in the JDK code. >>>>>>> >>>>>>> Well the JDK version had is_attach_supported() as the flag name >>>>>>> so I used that in this one place. >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp >>>>>>>> src/hotspot/share/prims/jvmtiImpl.cpp >>>>>>>> >>>>>>>> Are you making parameters consistent with the fields they >>>>>>>> initialize? >>>>>>> >>>>>>> They're consistent with the declarations now. >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/prims/jvmtiTagMap.cpp >>>>>>>> >>>>>>>> There is a mix of int and jint for slot in this code. You fixed >>>>>>>> some, but this remains: >>>>>>>> >>>>>>>> 2440 inline bool CallbackInvoker::report_stack_ref_root(jlong >>>>>>>> thread_tag, >>>>>>>> 2441 jlong tid, >>>>>>>> 2442 jint depth, >>>>>>>> 2443 jmethodID method, >>>>>>>> 2444 jlocation bci, >>>>>>>> 2445 jint slot, >>>>>>> >>>>>>> Right for consistency with the declarations. >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/runtime/perfData.cpp >>>>>>>> >>>>>>>> Callers pass both jint and int, so param type seems arbitrary. >>>>>>> >>>>>>> They are, but importantly they match the declarations. >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/hotspot/share/runtime/perfMemory.cpp >>>>>>>> src/hotspot/share/runtime/perfMemory.hpp >>>>>>>> >>>>>>>> PerfMemory::_initialized should ideally be a bool - can >>>>>>>> OrderAccess handle that now? >>>>>>> >>>>>>> Nope. 
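Helpers like the java_add used in the TypeInt::hash snippet above avoid relying on signed int overflow, which is undefined behaviour in C++; a sketch of the usual wrap-around-safe idiom (not the actual HotSpot definition of java_add):

  #include <stdint.h>

  typedef int32_t jint_t;   // local stand-in for jint

  // Do the addition in unsigned arithmetic, where wrap-around is well defined,
  // then convert back; on two's-complement targets this matches Java's int '+'.
  inline jint_t java_add_sketch(jint_t a, jint_t b) {
    return (jint_t)((uint32_t)a + (uint32_t)b);
  }

  // Combining fields the way the quoted hash function does.
  inline int hash_sketch(jint_t lo, jint_t hi, jint_t widen, jint_t kind) {
    return java_add_sketch(java_add_sketch(lo, hi), java_add_sketch(widen, kind));
  }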
>>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/java.base/share/native/include/jvm.h >>>>>>>> >>>>>>>> Not clear why the jio functions are not also JNICALL ? >>>>>>> >>>>>>> They are now.? The JDK version didn't have JNICALL. JVM needs >>>>>>> JNICALL.? I can't tell you why JDK didn't need JNICALL linkage. >>>>>> >>>>>> ?? JVM currently does not have JNICALL. But they are declared as >>>>>> "extern C". >>>>> >>>>> This was a compilation error on Windows with JDK.?? Maybe the C >>>>> code in the JDK doesn't complain about linkage differences. I'll >>>>> have to go back and figure this out then. >>>>>> >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/java.base/unix/native/include/jni_md.h >>>>>>>> >>>>>>>> There is no need to special case ARM. The differences in the >>>>>>>> existing code were for LTO support and that is now irrelevant. >>>>>>> >>>>>>> See discussion with Magnus.?? We still build ARM for jdk10/hs so >>>>>>> I needed this conditional or of course I wouldn't have added >>>>>>> it.? We can remove it with LTO support. >>>>>> >>>>>> Those builds are gone - this is obsolete. But yes all LTO can be >>>>>> removed later if you wish. Just trying to simplify things now. >>>>>> >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/java.base/unix/native/include/jvm_md.h >>>>>>>> >>>>>>>> I know you've just copied this across, but it seems wrong to me: >>>>>>>> >>>>>>>> ?57 // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on >>>>>>>> others. This may >>>>>>>> ? 58 //?????? cause problems if JVM and the rest of JDK are >>>>>>>> built on different >>>>>>>> ? 59 //?????? Linux releases. Here we define JVM_MAXPATHLEN to >>>>>>>> be MAXPATHLEN + 1, >>>>>>>> ? 60 //?????? so buffers declared in VM are always >= 4096. >>>>>>>> ? 61 #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>>>>>>> >>>>>>>> It doesn't make sense to me to define an internal "max path >>>>>>>> length" that can _exceed_ the platform max! >>>>>>>> >>>>>>>> That aside there's no support for building different parts of >>>>>>>> the JDK on different platforms and then bringing them together. >>>>>>>> And in any case I would think the real problem would be >>>>>>>> building on a platform that uses 4096 and running on one that >>>>>>>> uses 4095! >>>>>>>> >>>>>>>> But that aside this is a Linux hack and should be guarded by >>>>>>>> ifdef LINUX. (I doubt BSD needs it, the bsd file is just a copy >>>>>>>> of the linux one - the JDK macosx version does the right >>>>>>>> thing). Solaris and AIX should stay as-is at MAXPATHLEN. >>>>>>> >>>>>>> All of the unix platforms had MAXPATHLEN+1.? I'll leave it for >>>>>>> now and we can investigate that further. >>>>>> >>>>>> I see the following existing code: >>>>>> >>>>>> src/java.base/unix/native/include/jvm_md.h: >>>>>> >>>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>>> >>>>>> src/java.base/macosx/native/include/jvm_md.h >>>>>> >>>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>>> >>>>>> src/hotspot/os/aix/jvm_aix.h >>>>>> >>>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>>> >>>>>> src/hotspot/os/bsd/jvm_bsd.h >>>>>> >>>>>> #define JVM_MAXPATHLEN MAXPATHLEN + 1? // blindly copied from >>>>>> Linux version >>>>>> >>>>>> src/hotspot/os/linux/jvm_linux.h >>>>>> >>>>>> #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>>>>> >>>>>> src/hotspot/os/solaris/jvm_solaris.h >>>>>> >>>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>>> >>>>>> This is a linux only hack (if you ignore the blind copy from >>>>>> linux into the BSD code in the VM). >>>>> >>>>> Oh, thanks, so should I add a bunch of ifdefs then?? 
Or do you >>>>> think having MAXPATHLEN + 1 will really break the other >>>>> platforms?? Do you really see this as a problem or are you just >>>>> pointing out inconsistency? >>>>>> >>>>>>>> >>>>>>>> ?86 #define ASYNC_SIGNAL???? SIGJVM2 >>>>>>>> >>>>>>>> This only exists on Solaris so I think should be in #ifdef >>>>>>>> SOLARIS, to make that clear. >>>>>>> >>>>>>> Ok.? I'll add this. >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> src/java.base/windows/native/include/jvm_md.h >>>>>>>> >>>>>>>> Given the differences between the two versions either something >>>>>>>> has been broken or "extern C" declarations are not needed :) >>>>>>> >>>>>>> Well, they are needed for Hotspot to build and do not prevent >>>>>>> jdk from building.? I don't know what was broken. >>>>>> >>>>>> We really need to understand this better. Maybe related to the >>>>>> map files that expose the symbols. ?? >>>>> >>>>> They're needed because the JDK files are written mostly in C and >>>>> that doesn't complain about the linkage difference. Hotspot files >>>>> are in C++ which does complain. >>>>> >>>>>> >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> That was a really painful way to spend most of my Friday. TGIF! :) >>>>>>> >>>>>>> Thanks for going through it.? See comments inline for changes. >>>>>>> Generating a webrev takes hours so I'm not going to do that >>>>>>> unless you insist. >>>>>> >>>>>> An incremental webrev shouldn't take long - right? You're a mq >>>>>> maestro now. :) >>>>> >>>>> Well I generally trash a repository whenever I use mq but sure. >>>>>> >>>>>> If you can reasonably produce an incremental webrev once you've >>>>>> settled on all the comments/issues that would be good. >>>>> >>>>> Ok, sure. >>>>> >>>>> Coleen >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>>> Thanks, >>>>>>> Coleen >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>> >>>>>>>> On 27/10/2017 6:44 AM, coleen.phillimore at oracle.com wrote: >>>>>>>>> ??Hi Magnus, >>>>>>>>> >>>>>>>>> Thank you for reviewing this.?? I have a new version that >>>>>>>>> takes out the hack in globalDefinitions.hpp and adds casts to >>>>>>>>> src/hotspot/share/opto/type.cpp instead. >>>>>>>>> >>>>>>>>> Also some fixes from Martin at SAP. >>>>>>>>> >>>>>>>>> open webrev at >>>>>>>>> http://cr.openjdk.java.net/~coleenp/8189610.02/webrev >>>>>>>>> >>>>>>>>> see below. >>>>>>>>> >>>>>>>>> On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: >>>>>>>>>> Coleen, >>>>>>>>>> >>>>>>>>>> Thank you for addressing this! >>>>>>>>>> >>>>>>>>>> On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >>>>>>>>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>>>>>>>> >>>>>>>>>>> Mostly used sed to remove prims/jvm.h and move #include >>>>>>>>>>> "jvm.h" after precompiled.h, so if you have repetitive >>>>>>>>>>> stress wrist issues don't click on most of these files. >>>>>>>>>>> >>>>>>>>>>> There were more issues to resolve, however. The JDK windows >>>>>>>>>>> jni_md.h file defined jint as long and the hotspot windows >>>>>>>>>>> jni_x86.h as int. I had to choose the jdk version since it's >>>>>>>>>>> the public version, so there are changes to the hotspot >>>>>>>>>>> files for this. Generally I changed the code to use 'int' >>>>>>>>>>> rather than 'jint' where the surrounding API didn't insist >>>>>>>>>>> on consistently using java types. We should mostly be using >>>>>>>>>>> C++ types within hotspot except in interfaces to native/JNI >>>>>>>>>>> code. 
There are a couple of hacks in places where adding >>>>>>>>>>> multiple jint casts was too painful. >>>>>>>>>>> >>>>>>>>>>> Tested with JPRT and tier2-4 (in progress). >>>>>>>>>>> >>>>>>>>>>> open webrev at >>>>>>>>>>> http://cr.openjdk.java.net/~coleenp/8189610.01/webrev >>>>>>>>>> >>>>>>>>>> Looks great! >>>>>>>>>> >>>>>>>>>> Just a few comments: >>>>>>>>>> >>>>>>>>>> * src/java.base/unix/native/include/jni_md.h: >>>>>>>>>> >>>>>>>>>> I don't think the externally_visible attribute should be >>>>>>>>>> there for arm. I know this was the case for the corresponding >>>>>>>>>> hotspot file for arm, but that was techically incorrect. The >>>>>>>>>> proper dependency here is that externally_visible should be >>>>>>>>>> in all JNIEXPORT if and only if we're building with JVM >>>>>>>>>> feature "link-time-opt". Traditionally, that feature been >>>>>>>>>> enabled when building arm32 builds, and only then, so there's >>>>>>>>>> been a (coincidentally) connection here. Nowadays, Oracle >>>>>>>>>> does not care about the arm32 builds, and I'm not sure if >>>>>>>>>> anyone else is building them with link-time-opt enabled. >>>>>>>>>> >>>>>>>>>> It does seem wrong to me to export this behavior in the >>>>>>>>>> public jni_md.h file, though. I think the correct way to >>>>>>>>>> solve this, if we should continue supporting link-time-opt is >>>>>>>>>> to make sure this attribute is set for exported hotspot >>>>>>>>>> functions. If it's still needed, that is. A quick googling >>>>>>>>>> seems to indicate that visibility("default") might be enough >>>>>>>>>> in modern gcc's. >>>>>>>>>> >>>>>>>>>> A third option is to remove the support for link-time-opt >>>>>>>>>> entirely, if it's not really used. >>>>>>>>> >>>>>>>>> I didn't know how to change this since we are still building >>>>>>>>> ARM with the jdk10/hs repository, and ARM needed this change.? >>>>>>>>> I could wait until we bring down the jdk10/master changes that >>>>>>>>> remove the ARM build and remove this conditional before I >>>>>>>>> push. Or we could file an RFE to remove link-time-opt (?) and >>>>>>>>> remove it then? >>>>>>>>> >>>>>>>>>> >>>>>>>>>> * src/java.base/unix/native/include/jvm_md.h and >>>>>>>>>> src/java.base/windows/native/include/jvm_md.h: >>>>>>>>>> >>>>>>>>>> These files define a public API, and contain non-trivial >>>>>>>>>> changes. I suspect you should file a CSR request. (Even >>>>>>>>>> though I realize you're only matching the header file with >>>>>>>>>> the reality.) >>>>>>>>>> >>>>>>>>> >>>>>>>>> I filed the CSR.?? Waiting for the next steps. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Coleen >>>>>>>>> >>>>>>>>>> /Magnus >>>>>>>>>> >>>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >>>>>>>>>>> >>>>>>>>>>> I have a script to update copyright files on commit. >>>>>>>>>>> >>>>>>>>>>> Thanks to Magnus and ErikJ for the makefile changes. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Coleen >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> > From dmitry.samersoff at bell-sw.com Mon Oct 30 18:05:41 2017 From: dmitry.samersoff at bell-sw.com (Dmitry Samersoff) Date: Mon, 30 Oct 2017 21:05:41 +0300 Subject: [10] RFR 8186046 Minimal ConstantDynamic support In-Reply-To: References: Message-ID: Paul, templateTable_x86.cpp: 564 const Register flags = rcx; 565 const Register rarg = NOT_LP64(rcx) LP64_ONLY(c_rarg1); Should we use another register for rarg under NOT_LP64 ? 
-Dmitry On 10/26/2017 08:03 PM, Paul Sandoz wrote: > Hi, > > Please review the following patch for minimal dynamic constant support: > > http://cr.openjdk.java.net/~psandoz/jdk10/JDK-8186046-minimal-condy-support-hs/webrev/ > > https://bugs.openjdk.java.net/browse/JDK-8186046 > https://bugs.openjdk.java.net/browse/JDK-8186209 > > This patch is based on the JDK 10 unified HotSpot repository. Testing so far looks good. > > By minimal i mean just the support in the runtime for a dynamic constant pool entry to be referenced by a LDC instruction or a bootstrap method argument. Much of the work leverages the foundations built by invoke dynamic but is arguably simpler since resolution is less complex. > > A small set of bootstrap methods will be proposed as a follow on issue for 10 (these are currently being refined in the amber repository). > > Bootstrap method invocation has not changed (and the rules are the same for dynamic constants and indy). It is planned to enhance this in a further major release to support lazy resolution of bootstrap method arguments. > > The CSR for the VM specification is here: > > https://bugs.openjdk.java.net/browse/JDK-8189199 > > the j.l.invoke package documentation was also updated but please consider the VM specification as the definitive "source of truth" (we may clean up this area further later on so it becomes more informative, and that may also apply to duplicative text on MethodHandles/VarHandles). > > Any AoT-related work will be deferred to a future release. > > ? > > This patch only supports x64 platforms. There is a small set of changes specific to x64 (specifically to support null and primitives constants, as prior to this patch null was used as a sentinel for resolution and certain primitives types would never have been encountered, such as say byte). > > We will need to follow up with the SPARC platform and it is hoped/anticipated that OpenJDK members responsible for other platforms (namely ARM and PPC) will separately provide patches. > > ? > > Many of tests rely on an experimental byte code API that supports the generation of byte code with dynamic constants. > > One test uses class file bytes produced from a modified version of asmtools. The modifications have now been pushed but a new version of asmtools need to be rolled into jtreg before the test can operate directly on asmtools information rather than embedding class file bytes directly in the test. > > ? > > Paul. > From volker.simonis at gmail.com Mon Oct 30 19:34:22 2017 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 30 Oct 2017 20:34:22 +0100 Subject: RFR(S): 8187091: ReturnBlobToWrongHeapTest fails because of problems in CodeHeap::contains_blob() In-Reply-To: References: Message-ID: Hi Vladimir, this one is still pending (you only pushed "8166317: InterpreterCodeSize should be computed"). Could you please also sponsor this one? The latest version is here: http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091.v2/ Thank you and best regards, Volker On Tue, Sep 5, 2017 at 6:35 PM, Vladimir Kozlov wrote: > On 9/4/17 10:23 AM, Volker Simonis wrote: >> >> On Fri, Sep 1, 2017 at 6:00 PM, Vladimir Kozlov >> wrote: >>> >>> Checking type is emulation of virtual call ;-) >> >> >> I agree :) But it is only a bimorphic dispatch in this case which >> should be still faster than a normal virtual call. >> >>> But I agree that it is simplest solution - one line change (excluding >>> comment - comment is good BTW). >>> >> >> Thanks. 
>> >>> You can also add guard AOT_ONLY() around aot specific code: >>> >>> const void* start = AOT_ONLY( (code_blob_type() == CodeBlobType::AOT) >>> ? >>> blob->code_begin() : ) (void*)blob; >>> >>> because we do have builds without AOT. >>> >> >> Done. Please find the new webrev here: >> >> http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091.v1/ > > > Looks good. Thank you for updated CodeBlob description comment. > >> >> Could you please sponsor the change once jdk10-hs opens again? > > > We have to wait when jdk10 "consolidation" is finished. It may take 2 weeks. > >> >> Thanks, >> Volker >> >> PS: one thing which is still unclear to me is why you haven't caught >> this issue before? Isn't >> test/compiler/codecache/stress/ReturnBlobToWrongHeapTest.java part of >> JPRT and/or your regular tests? > > > test/compiler/codecache/stress are excluded from JPRT runs: > > https://bugs.openjdk.java.net/browse/JDK-8069021 > > Also these tests are marked with @key stress. Originally it was only 2 tests > and ReturnBlobToWrongHeapTest.java was added later: > > https://bugs.openjdk.java.net/browse/JDK-8069021 > > I am trying to find which testing tier runs them. I will follow this. > > Thanks, > Vladimir > > >> >> >>> Thanks, >>> Vladimir >>> >>> >>> On 9/1/17 8:42 AM, Volker Simonis wrote: >>>> >>>> >>>> Hi, >>>> >>>> can I please have a review and sponsor for the following small fix: >>>> >>>> http://cr.openjdk.java.net/~simonis/webrevs/2017/8187091/ >>>> https://bugs.openjdk.java.net/browse/JDK-8187091 >>>> >>>> We see failures in >>>> test/compiler/codecache/stress/ReturnBlobToWrongHeapTest.java which >>>> are cause by problems in CodeHeap::contains_blob() for corner cases >>>> with CodeBlobs of zero size: >>>> >>>> # A fatal error has been detected by the Java Runtime Environment: >>>> # >>>> # Internal Error (heap.cpp:248), pid=27586, tid=27587 >>>> # guarantee((char*) b >= _memory.low_boundary() && (char*) b < >>>> _memory.high()) failed: The block to be deallocated 0x00007fffe6666f80 >>>> is not within the heap starting with 0x00007fffe6667000 and ending >>>> with 0x00007fffe6ba000 >>>> >>>> The problem is that JDK-8183573 replaced >>>> >>>> virtual bool contains_blob(const CodeBlob* blob) const { return >>>> low_boundary() <= (char*) blob && (char*) blob < high(); } >>>> >>>> by: >>>> >>>> bool contains_blob(const CodeBlob* blob) const { return >>>> contains(blob->code_begin()); } >>>> >>>> But that my be wrong in the corner case where the size of the >>>> CodeBlob's payload is zero (i.e. the CodeBlob consists only of the >>>> 'header' - i.e. the C++ object itself) because in that case >>>> CodeBlob::code_begin() points right behind the CodeBlob's header which >>>> is a memory location which doesn't belong to the CodeBlob anymore. >>>> >>>> This exact corner case is exercised by ReturnBlobToWrongHeapTest which >>>> allocates CodeBlobs of size zero (i.e. zero 'payload') with the help >>>> of sun.hotspot.WhiteBox.allocateCodeBlob() until the CodeCache fills >>>> up. The test first fills the 'non-profiled nmethods' CodeHeap. If the >>>> 'non-profiled nmethods' CodeHeap is full, the VM automatically tries >>>> to allocate from the 'profiled nmethods' CodeHeap until that fills up >>>> as well. But in the CodeCache the 'profiled nmethods' CodeHeap is >>>> located right before the non-profiled nmethods' CodeHeap. 
So if the >>>> last CodeBlob allocated from the 'profiled nmethods' CodeHeap has a >>>> payload size of zero and uses all the CodeHeaps remaining size, we >>>> will end up with a CodeBlob whose code_begin() address will point >>>> right behind the actual CodeHeap (i.e. it will point right at the >>>> beginning of the adjacent, 'non-profiled nmethods' CodeHeap). This >>>> will result in the above guarantee to fire, when we will try to free >>>> the last allocated CodeBlob (with >>>> sun.hotspot.WhiteBox.freeCodeBlob()). >>>> >>>> In a previous mail thread >>>> >>>> >>>> (http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-August/028175.html) >>>> Vladimir explained why JDK-8183573 was done: >>>> >>>>> About contains_blob(). The problem is that AOTCompiledMethod allocated >>>>> in >>>>> CHeap and not in aot code section (which is RO): >>>>> >>>>> >>>>> >>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/8acd232fb52a/src/share/vm/aot/aotCompiledMethod.hpp#l124 >>>>> >>>>> It is allocated in CHeap after AOT library is loaded. Its code_begin() >>>>> points to AOT code section but AOTCompiledMethod* >>>>> points outside it (to normal malloced space) so you can't use >>>>> (char*)blob >>>>> address. >>>> >>>> >>>> >>>> and proposed these two fixes: >>>> >>>>> There are 2 ways to fix it, I think. >>>>> One is to add new field to CodeBlobLayout and set it to blob* address >>>>> for >>>>> normal CodeCache blobs and to code_begin for >>>>> AOT code. >>>>> Second is to use contains(blob->code_end() - 1) assuming that AOT code >>>>> is >>>>> never zero. >>>> >>>> >>>> >>>> I came up with a slightly different solution - just use >>>> 'CodeHeap::code_blob_type()' whether to use 'blob->code_begin()' (for >>>> the AOT case) or '(void*)blob' (for all other blobs) as input for the >>>> call to 'CodeHeap::contain()'. It's simple and still much cheaper than >>>> a virtual call. What do you think? >>>> >>>> I've also updated the documentation of the CodeBlob class hierarchy in >>>> codeBlob.hpp. Please let me know if I've missed something. >>>> >>>> Thank you and best regards, >>>> Volker >>>> >>> > From paul.sandoz at oracle.com Mon Oct 30 19:44:54 2017 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 30 Oct 2017 12:44:54 -0700 Subject: [10] RFR 8186046 Minimal ConstantDynamic support In-Reply-To: References: Message-ID: <93431280-9CBF-4722-961D-F2D2D0F83B4E@oracle.com> Hi, Thanks for reviewing. > On 30 Oct 2017, at 11:05, Dmitry Samersoff wrote: > > Paul, > > templateTable_x86.cpp: > > 564 const Register flags = rcx; > 565 const Register rarg = NOT_LP64(rcx) LP64_ONLY(c_rarg1); > > Should we use another register for rarg under NOT_LP64 ? > I think it should be ok, it i ain?t an expert here on the interpreter and the calling conventions, so please correct me. Some more context: + const Register flags = rcx; + const Register rarg = NOT_LP64(rcx) LP64_ONLY(c_rarg1); + __ movl(rarg, (int)bytecode()); The current bytecode code is loaded into ?rarg? + call_VM(obj, CAST_FROM_FN_PTR(address, InterpreterRuntime::resolve_ldc), rarg); Then ?rarg" is the argument to the call to InterpreterRuntime::resolve_ldc, after which it is no longer referred to. +#ifndef _LP64 + // borrow rdi from locals + __ get_thread(rdi); + __ get_vm_result_2(flags, rdi); + __ restore_locals(); +#else + __ get_vm_result_2(flags, r15_thread); +#endif The result from the call is then loaded into flags. So i don?t think it matters in this case if rcx is aliased. Paul. 
> -Dmitry > > > On 10/26/2017 08:03 PM, Paul Sandoz wrote: >> Hi, >> >> Please review the following patch for minimal dynamic constant support: >> >> http://cr.openjdk.java.net/~psandoz/jdk10/JDK-8186046-minimal-condy-support-hs/webrev/ >> >> https://bugs.openjdk.java.net/browse/JDK-8186046 >> https://bugs.openjdk.java.net/browse/JDK-8186209 >> >> This patch is based on the JDK 10 unified HotSpot repository. Testing so far looks good. >> >> By minimal i mean just the support in the runtime for a dynamic constant pool entry to be referenced by a LDC instruction or a bootstrap method argument. Much of the work leverages the foundations built by invoke dynamic but is arguably simpler since resolution is less complex. >> >> A small set of bootstrap methods will be proposed as a follow on issue for 10 (these are currently being refined in the amber repository). >> >> Bootstrap method invocation has not changed (and the rules are the same for dynamic constants and indy). It is planned to enhance this in a further major release to support lazy resolution of bootstrap method arguments. >> >> The CSR for the VM specification is here: >> >> https://bugs.openjdk.java.net/browse/JDK-8189199 >> >> the j.l.invoke package documentation was also updated but please consider the VM specification as the definitive "source of truth" (we may clean up this area further later on so it becomes more informative, and that may also apply to duplicative text on MethodHandles/VarHandles). >> >> Any AoT-related work will be deferred to a future release. >> >> ? >> >> This patch only supports x64 platforms. There is a small set of changes specific to x64 (specifically to support null and primitives constants, as prior to this patch null was used as a sentinel for resolution and certain primitives types would never have been encountered, such as say byte). >> >> We will need to follow up with the SPARC platform and it is hoped/anticipated that OpenJDK members responsible for other platforms (namely ARM and PPC) will separately provide patches. >> >> ? >> >> Many of tests rely on an experimental byte code API that supports the generation of byte code with dynamic constants. >> >> One test uses class file bytes produced from a modified version of asmtools. The modifications have now been pushed but a new version of asmtools need to be rolled into jtreg before the test can operate directly on asmtools information rather than embedding class file bytes directly in the test. >> >> ? >> >> Paul. >> > From mandy.chung at oracle.com Mon Oct 30 21:08:53 2017 From: mandy.chung at oracle.com (mandy chung) Date: Mon, 30 Oct 2017 14:08:53 -0700 Subject: RFR: 8190287: Update JDK's internal ASM to ASMv6 In-Reply-To: <59F3690B.6070309@oracle.com> References: <59F3690B.6070309@oracle.com> Message-ID: <1d6c773a-8495-cf14-61b6-7616c8b80225@oracle.com> On 10/27/17 10:12 AM, Kumar Srinivasan wrote: > Hello Remi, Sundar and others, > > Please review the webrev [1] to update JDK's internal ASM to v6. > > [1] http://cr.openjdk.java.net/~ksrini/8190287/webrev.00/index.html The jlink and module-related change looks fine to me.? I also skimmed through asm6 change which looks fine too. Please update src/java.base/share/legal/asm.md to reflect the new version. 
thanks Mandy From frederic.parain at oracle.com Mon Oct 30 21:56:37 2017 From: frederic.parain at oracle.com (Frederic Parain) Date: Mon, 30 Oct 2017 17:56:37 -0400 Subject: [10] RFR 8186046 Minimal ConstantDynamic support In-Reply-To: <93431280-9CBF-4722-961D-F2D2D0F83B4E@oracle.com> References: <93431280-9CBF-4722-961D-F2D2D0F83B4E@oracle.com> Message-ID: I?m seeing no issue with rcx being aliased in this code. Fred > On Oct 30, 2017, at 15:44, Paul Sandoz wrote: > > Hi, > > Thanks for reviewing. > >> On 30 Oct 2017, at 11:05, Dmitry Samersoff wrote: >> >> Paul, >> >> templateTable_x86.cpp: >> >> 564 const Register flags = rcx; >> 565 const Register rarg = NOT_LP64(rcx) LP64_ONLY(c_rarg1); >> >> Should we use another register for rarg under NOT_LP64 ? >> > > I think it should be ok, it i ain?t an expert here on the interpreter and the calling conventions, so please correct me. > > Some more context: > > + const Register flags = rcx; > + const Register rarg = NOT_LP64(rcx) LP64_ONLY(c_rarg1); > + __ movl(rarg, (int)bytecode()); > > The current bytecode code is loaded into ?rarg? > > + call_VM(obj, CAST_FROM_FN_PTR(address, InterpreterRuntime::resolve_ldc), rarg); > > Then ?rarg" is the argument to the call to InterpreterRuntime::resolve_ldc, after which it is no longer referred to. > > +#ifndef _LP64 > + // borrow rdi from locals > + __ get_thread(rdi); > + __ get_vm_result_2(flags, rdi); > + __ restore_locals(); > +#else > + __ get_vm_result_2(flags, r15_thread); > +#endif > > The result from the call is then loaded into flags. > > So i don?t think it matters in this case if rcx is aliased. > > Paul. > >> -Dmitry >> >> >> On 10/26/2017 08:03 PM, Paul Sandoz wrote: >>> Hi, >>> >>> Please review the following patch for minimal dynamic constant support: >>> >>> http://cr.openjdk.java.net/~psandoz/jdk10/JDK-8186046-minimal-condy-support-hs/webrev/ >>> >>> https://bugs.openjdk.java.net/browse/JDK-8186046 >>> https://bugs.openjdk.java.net/browse/JDK-8186209 >>> >>> This patch is based on the JDK 10 unified HotSpot repository. Testing so far looks good. >>> >>> By minimal i mean just the support in the runtime for a dynamic constant pool entry to be referenced by a LDC instruction or a bootstrap method argument. Much of the work leverages the foundations built by invoke dynamic but is arguably simpler since resolution is less complex. >>> >>> A small set of bootstrap methods will be proposed as a follow on issue for 10 (these are currently being refined in the amber repository). >>> >>> Bootstrap method invocation has not changed (and the rules are the same for dynamic constants and indy). It is planned to enhance this in a further major release to support lazy resolution of bootstrap method arguments. >>> >>> The CSR for the VM specification is here: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8189199 >>> >>> the j.l.invoke package documentation was also updated but please consider the VM specification as the definitive "source of truth" (we may clean up this area further later on so it becomes more informative, and that may also apply to duplicative text on MethodHandles/VarHandles). >>> >>> Any AoT-related work will be deferred to a future release. >>> >>> ? >>> >>> This patch only supports x64 platforms. There is a small set of changes specific to x64 (specifically to support null and primitives constants, as prior to this patch null was used as a sentinel for resolution and certain primitives types would never have been encountered, such as say byte). 
>>> >>> We will need to follow up with the SPARC platform and it is hoped/anticipated that OpenJDK members responsible for other platforms (namely ARM and PPC) will separately provide patches. >>> >>> ? >>> >>> Many of tests rely on an experimental byte code API that supports the generation of byte code with dynamic constants. >>> >>> One test uses class file bytes produced from a modified version of asmtools. The modifications have now been pushed but a new version of asmtools need to be rolled into jtreg before the test can operate directly on asmtools information rather than embedding class file bytes directly in the test. >>> >>> ? >>> >>> Paul. >>> >> > From david.holmes at oracle.com Tue Oct 31 00:21:45 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 31 Oct 2017 10:21:45 +1000 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> <22afedef-59cc-ecde-48fc-0afb7b4bbb47@oracle.com> <815ac734-ea8b-ea2d-ecec-85cb547ba2f4@oracle.com> <440f79ba-2da3-b627-53bc-e1842e3cf73c@oracle.com> Message-ID: <058662bb-5d5b-0085-cc08-02192d000838@oracle.com> On 31/10/2017 12:48 AM, coleen.phillimore at oracle.com wrote: > > http://cr.openjdk.java.net/~coleenp/8189610.incr.02/webrev/index.html > > Changed JDK file to use PATH_MAX.? Retested jdk tier1 tests. Why PATH_MAX instead of MAXPATHLEN? They appear to be the same on Linux and Solaris, but I don't know if that is true for AIX and Mac OS / BSD. Does UnixFileSystem_md.c still need the jvm.h include now? Thanks, David > thanks, > Coleen > > On 10/30/17 8:38 AM, coleen.phillimore at oracle.com wrote: >> >> >> On 10/30/17 8:17 AM, David Holmes wrote: >>> On 30/10/2017 10:13 PM, coleen.phillimore at oracle.com wrote: >>>> On 10/28/17 3:50 AM, David Holmes wrote: >>>>> Hi Coleen, >>>>> >>>>> I've commented on the file location in response to Mandy's email. >>>>> >>>>> The only issue I'm still concerned about is the JVM_MAXPATHLEN >>>>> issue. I think it is a bug to define a JVM_MAXPATHLEN that is >>>>> bigger than the platform MAXPATHLEN. I also would not want to see >>>>> any change in behaviour because of this - so AIX and Solaris should >>>>> not get a different JVM_MAXPATHLEN due to this refactoring change. >>>>> So yes I think this needs to be ifdef'd for Linux and reluctantly >>>>> (because it was a copy error) for OSX/BSD as well. >>>> >>>> #if defined(AIX) || defined(SOLARIS) >>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>> #else >>>> // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on others. This may >>>> //?????? cause problems if JVM and the rest of JDK are built on >>>> different >>>> //?????? Linux releases. Here we define JVM_MAXPATHLEN to be >>>> MAXPATHLEN + 1, >>>> //?????? so buffers declared in VM are always >= 4096. >>>> #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>>> #endif >>>> >>>> Is this ok? >>> >>> Yes - thanks. It preserves existing behaviour on the VM side at >>> least. Time will tell if it messes anything up on the JDK side for >>> Linux/OSX. >> >> I don't want to wait for time so I'm investigating. >> >> It's one use is: >> >> Java_java_io_UnixFileSystem_canonicalize0(JNIEnv *env, jobject this, >> ... >> ??????? 
char canonicalPath[JVM_MAXPATHLEN]; >> ??????? if (canonicalize((char *)path, >> ???????????????????????? canonicalPath, JVM_MAXPATHLEN) < 0) { >> ??????????? JNU_ThrowIOExceptionWithLastError(env, "Bad pathname"); >> >> Which goes to: >> >> canonicalize_md.c >> >> canonicalize(char *original, char *resolved, int len) >> ??? if (len < PATH_MAX) { >> ??????? errno = EINVAL; >> ??????? return -1; >> ??? } >> >> >> So this should fail every time. >> >> sys/param.h:# define MAXPATHLEN??? PATH_MAX >> >> I haven't found any tests for it. >> >> I don't know why Java_java_io_UnixFileSystem uses JVM_MAXPATHLEN since >> it's not calling the JVM interface as far as I can tell. I think it >> should be changed to PATH_MAX. >> >> ? >> Coleen >>> >>> David >>> >>>> thanks, >>>> Coleen >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 28/10/2017 12:08 AM, coleen.phillimore at oracle.com wrote: >>>>>> >>>>>> >>>>>> On 10/27/17 9:37 AM, David Holmes wrote: >>>>>>> On 27/10/2017 10:13 PM, coleen.phillimore at oracle.com wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 10/27/17 3:23 AM, David Holmes wrote: >>>>>>>>> Hi Coleen, >>>>>>>>> >>>>>>>>> Thanks for tackling this. >>>>>>>>> >>>>>>>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>>>>>> >>>>>>>>> Can you update the bug synopsis to show it covers both sets of >>>>>>>>> files please. >>>>>>>>> >>>>>>>>> I hate to start with this (and it took me quite a while to >>>>>>>>> realize it) but as Mandy pointed out jvm.h is not an exported >>>>>>>>> interface from the JDK to the outside world (so not subject to >>>>>>>>> CSR review), but is a private interface between the JVM and the >>>>>>>>> JDK libraries. So I think really jvm.h belongs in the hotspot >>>>>>>>> sources where it was, while jni.h belongs in the exported JDK >>>>>>>>> sources. In which case the bulk of your changes to the hotspot >>>>>>>>> files would not be needed - sorry. >>>>>>>> >>>>>>>> Maybe someone can make that decision and change at a later date. >>>>>>>> The point of this change is that there is now only one of these >>>>>>>> files that is shared.? I don't think jvm.h and the jvm_md.h >>>>>>>> belong on the hotspot sources for the jdk to find them in some >>>>>>>> random prims and os dependent directories. >>>>>>> >>>>>>> The one file that is needed is a hotspot file - jvm.h defines the >>>>>>> interface that hotspot exports via jvm.cpp. >>>>>>> >>>>>>> If you leave jvm.h in hotspot/prims then a very large chunk of >>>>>>> your boilerplate changes are not needed. The JDK code doesn't >>>>>>> care what the name of the directory is - whatever it is just gets >>>>>>> added as a -I directive (the JDK code will include "jvm.h" not >>>>>>> "prims/jvm.h" the way hotspot sources do. >>>>>>> >>>>>>> This isn't something we want to change back or move again later. >>>>>>> Whatever we do now we live with. >>>>>> >>>>>> I think it belongs with jni.h and I think the core libraries group >>>>>> would agree.?? It seems more natural there than buried in the >>>>>> hotspot prims directory.? I guess this is on hold while we have >>>>>> this debate.?? Sigh. >>>>>> >>>>>> Actually with -I directives, changing to jvm.h from prims/jvm.h >>>>>> would still work.?? Maybe we should change the name to jvm.hpp >>>>>> since it's jvm.cpp though??? Or maybe just have two divergent >>>>>> copies and close this as WNF. >>>>>> >>>>>>> >>>>>>>> I'm happy to withdraw the CSR. 
We generally use the CSR process >>>>>>>> to add and remove JVM_ interfaces even though they're a private >>>>>>>> interface in case some other JVM/JDK combination relies on them. >>>>>>>> The changes to these files are very minor though and not likely >>>>>>>> to cause any even theoretical incompatibility, so I'll withdraw it. >>>>>>>>> >>>>>>>>> Moving on ... >>>>>>>>> >>>>>>>>> First to address the initial comments/query you had: >>>>>>>>> >>>>>>>>>> The JDK windows jni_md.h file defined jint as long and the >>>>>>>>>> hotspot >>>>>>>>>> windows jni_x86.h as int. I had to choose the jdk version >>>>>>>>>> since it's the >>>>>>>>>> public version, so there are changes to the hotspot files for >>>>>>>>>> this. >>>>>>>>> >>>>>>>>> On Windows int and long are always the same as it uses ILP32 or >>>>>>>>> LLP64 (not LP64 like *nix platforms). So either choice should >>>>>>>>> be fine. That said there are some odd casting issues I comment >>>>>>>>> on below. Does the VS compiler complain about mixing int and >>>>>>>>> long in expressions? >>>>>>>> >>>>>>>> Yes, it does even though int and long are the same representation. >>>>>>> >>>>>>> And what an absolute mess that makes. :( >>>>>>> >>>>>>>>> >>>>>>>>>> Generally I changed the code to use 'int' rather than 'jint' >>>>>>>>>> where the >>>>>>>>>> surrounding API didn't insist on consistently using java >>>>>>>>>> types. We >>>>>>>>>> should mostly be using C++ types within hotspot except in >>>>>>>>>> interfaces to >>>>>>>>>> native/JNI code. >>>>>>>>> >>>>>>>>> I think you pulled too hard on a few threads here and things >>>>>>>>> are starting to unravel. There are numerous cases I refer to >>>>>>>>> below where either the cast seems unnecessary/inappropriate or >>>>>>>>> else highlights a bunch of additional changes that also need to >>>>>>>>> be made. The fan out from this could be horrendous. Unless you >>>>>>>>> actually get some kind of error - and I'd like to understand >>>>>>>>> the details of those - I would not suggest making these changes >>>>>>>>> as part of this work. >>>>>>>> >>>>>>>> I didn't make any change unless there was was an error. I have >>>>>>>> 100 failed JPRT jobs to confirm!? I eventually got a Windows >>>>>>>> system to compile and test this on. Actually some of the changes >>>>>>>> came out better.? Cases where we use jint as a bool simply >>>>>>>> turned to int. We do not have an overload for bool for cmpxchg. >>>>>>> >>>>>>> That's unfortunate - ditto for OrderAccess. >>>>>>> >>>>>>>>> >>>>>>>>> Looking through I have a quite a few queries/comments - >>>>>>>>> apologies in advance as I know how tedious this is: >>>>>>>>> >>>>>>>>> make/hotspot/lib/CompileLibjsig.gmk >>>>>>>>> src/java.base/solaris/native/libjsig/jsig.c >>>>>>>>> >>>>>>>>> Took a while to figure out why the include was needed. :) As a >>>>>>>>> follow up I suggest just deleting the -I include directive, >>>>>>>>> delete the Solaris-only definition of JSIG_VERSION_1_4_1, and >>>>>>>>> delete everything to do with JVM_get_libjsig_version. It is all >>>>>>>>> obsolete. >>>>>>>> >>>>>>>> Can I patch up jsig in a separate RFE?? I don't remember why >>>>>>>> this broke so I simply moved JSIG #define.? Is jsig obsolete? >>>>>>>> Removing JVM_* definitions generally requires a CSR. >>>>>>> >>>>>>> I did say "As a follow up". jsig is not obsolete but the jsig >>>>>>> versioning code, only used by Solaris, is. 
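Coming back to the Windows int/long point above: under ILP32/LLP64 the two types have the same 32-bit representation, but the language still treats them as distinct types, which is exactly why VS complains in places one might not expect. A minimal standalone illustration (not code from the patch):

    // Same size on Windows, still different types: overloads on them are
    // distinct functions, and pointers to them do not interconvert.
    void f(int)  { }
    void f(long) { }          // a legal, separate overload

    void demo() {
      long v = 0;
      f(v);                   // picks f(long), never f(int)
      // int* p = &v;         // rejected: 'long *' does not convert to 'int *'
    }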
>>>>>>> >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/cpu/arm/interp_masm_arm.cpp >>>>>>>>> >>>>>>>>> Why did you need to add the jvm.h include? >>>>>>>>> >>>>>>>> >>>>>>>> ?? tbz(Raccess_flags, JVM_ACC_SYNCHRONIZED_BIT, unlocked); >>>>>>> >>>>>>> Okay. I'm not going to try and figure out how this code found >>>>>>> this before. >>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/os/windows/os_windows.cpp. >>>>>>>>> >>>>>>>>> The type of process_exiting should be uint to match the DWORD >>>>>>>>> of GetCurrentThreadID. Then you should need any casts. Also you >>>>>>>>> missed this jint cast: >>>>>>>>> >>>>>>>>> 3796???????? process_exiting != (jint)GetCurrentThreadId()) { >>>>>>>> >>>>>>>> Yes, that's better to change process_exiting to a DWORD.? It >>>>>>>> needs a DWORD cast to 0 in the cmpxchg. >>>>>>>> >>>>>>>> ???????? Atomic::cmpxchg(GetCurrentThreadId(), &process_exiting, >>>>>>>> (DWORD)0); >>>>>>>> >>>>>>>> These templates are picky. >>>>>>> >>>>>>> Yes - their inability to deal with literals is extremely >>>>>>> frustrating. >>>>>>> >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/c1/c1_Canonicalizer.hpp >>>>>>>>> >>>>>>>>> ? 43 #ifdef _WINDOWS >>>>>>>>> ? 44?? // jint is defined as long in jni_md.h, so convert from >>>>>>>>> int to jint >>>>>>>>> ? 45?? void set_constant(int x) { set_constant((jint)x); } >>>>>>>>> ? 46 #endif >>>>>>>>> >>>>>>>>> Why is this necessary? int and long are the same on Windows. >>>>>>>>> The whole point is that jint hides the underlying type, so >>>>>>>>> where does this go wrong? >>>>>>>> >>>>>>>> No, they are not the same types even though they have the same >>>>>>>> representation! >>>>>>> >>>>>>> This is truly unfortunate. >>>>>>> >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/c1/c1_LinearScan.cpp >>>>>>>>> >>>>>>>>> ?ConstantIntValue((jint)0); >>>>>>>>> >>>>>>>>> why is this cast needed? what causes the ambiguity? (If this >>>>>>>>> was a template I'd understand ;-) ). Also didn't you change >>>>>>>>> that constructor to take an int anyway - not that I think it >>>>>>>>> should - see below. >>>>>>>> >>>>>>>> Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't >>>>>>>> match 'long' better than any pointer type.? So this cast is needed. >>>>>>> >>>>>>> But you changed the constructor to take an int! >>>>>>> >>>>>>> ?class ConstantIntValue: public ScopeValue { >>>>>>> ? private: >>>>>>> -? jint _value; >>>>>>> +? int _value; >>>>>>> ? public: >>>>>>> -? ConstantIntValue(jint value)???????? { _value = value; } >>>>>>> +? ConstantIntValue(int value)????????? { _value = value; } >>>>>>> >>>>>>> >>>>>> >>>>>> Okay I removed this cast. >>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/ci/ciReplay.cpp >>>>>>>>> >>>>>>>>> 793???????? jint* dims = NEW_RESOURCE_ARRAY(jint, rank); >>>>>>>>> >>>>>>>>> why should this be jint? >>>>>>>> >>>>>>>> To avoid a cast from int* to jint* in the line below: >>>>>>>> >>>>>>>> ????????? value = kelem->multi_allocate(rank, dims, CHECK); >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/classfile/altHashing.cpp >>>>>>>>> >>>>>>>>> Okay this looks more consistent with jint. >>>>>>>> >>>>>>>> Yes.? I translated this from some native code iirc. >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/code/debugInfo.hpp >>>>>>>>> >>>>>>>>> These changes seem wrong. 
We have: >>>>>>>>> >>>>>>>>> ConstantLongValue(jlong value) >>>>>>>>> ConstantDoubleValue(jdouble value) >>>>>>>>> >>>>>>>>> so we should have: >>>>>>>>> >>>>>>>>> ConstantIntValue(jint value) >>>>>>>> >>>>>>>> Again, there are multiple call sites with '0', which match int >>>>>>>> trivially but are confused with long.? It's less consistent I >>>>>>>> agree but better to not cast all the call sites. >>>>>>> >>>>>>> This is really making a mess of the APIs - they should be a jint >>>>>>> but we declare them int because of a 0 casting problem. Can't we >>>>>>> just use 0L? >>>>>> >>>>>> There aren't that many casts.? You're right, that would have been >>>>>> better in some places. >>>>>> >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/code/relocInfo.cpp >>>>>>>>> >>>>>>>>> Change seems unnecessary - int32_t is fine >>>>>>>>> >>>>>>>> >>>>>>>> No, int32_t doesn't match the calls below it.? They all assume >>>>>>>> _lo and _hi are jint. >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/compiler/compileBroker.cpp >>>>>>>>> src/hotspot/share/compiler/compileBroker.hpp >>>>>>>>> >>>>>>>>> I see a complete mix of int and jint in this class, so why make >>>>>>>>> the one change you did ?? >>>>>>>> >>>>>>>> This is another case of using jint as a flag with cmpxchg. The >>>>>>>> templates for cmpxchg want the types to match and 0 and 1 are >>>>>>>> essentially 'int'.? This is a lot cleaner this way. >>>>>>> >>>>>>> >>>>>>> >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp >>>>>>>>> >>>>>>>>> 1700???? tty->write((char*) start, MIN2(length, (jint)O_BUFLEN)); >>>>>>>>> >>>>>>>>> why did you need to add the jint cast? It's used without any >>>>>>>>> cast on the next two lines: >>>>>>>>> >>>>>>>>> 1701???? length -= O_BUFLEN; >>>>>>>>> 1702???? offset += O_BUFLEN; >>>>>>>>> >>>>>>>> >>>>>>>> There's a conversion from O_BUFLEN from int to long in 1701 and >>>>>>>> 1702.?? MIN2 is a template that wants the types to match exactly. >>>>>>> >>>>>>> $%^%$! templates! >>>>>>> >>>>>>>>> ?? >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/jvmci/jvmciRuntime.cpp >>>>>>>>> >>>>>>>>> Looking around this code it seems very confused about types - >>>>>>>>> eg the previous function is declared jboolean yet returns a >>>>>>>>> jint on one path! It isn't clear to me if the return type is >>>>>>>>> what should be changed or the parameter type? I would just >>>>>>>>> leave this alone. >>>>>>>> >>>>>>>> I can't leave it alone because it doesn't compile that way. This >>>>>>>> was the minimal change and yea, does look a bit inconsistent. >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/opto/mulnode.cpp >>>>>>>>> >>>>>>>>> Okay TypeInt has jint parts, so the remaining int32_t >>>>>>>>> declarations (A, B, C, D) should also be jint. >>>>>>>> >>>>>>>> Yes.? c2 uses jint types. >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/opto/parse3.cpp >>>>>>>>> >>>>>>>>> I agree with the changes you made, but then: >>>>>>>>> >>>>>>>>> ?419???? jint dim_con = find_int_con(length[j], -1); >>>>>>>>> >>>>>>>>> should also be changed. >>>>>>>>> >>>>>>>>> And obviously MultiArrayExpandLimit should be defined as int >>>>>>>>> not intx! >>>>>>>> >>>>>>>> Everything in globals.hpp is intx.? That's a thread that I don't >>>>>>>> want to pull on! >>>>>>> >>>>>>> We still have that limitation? >>>>>>>> >>>>>>>> Changed dim_con to int. 
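Several of the casts above (MIN2, Atomic::cmpxchg) are forced by template argument deduction rather than by any representation issue: deduction needs every argument to yield the same T, and a bare integer literal is an int. A small self-contained illustration with a MIN2-like helper (written here for the example; it is not the actual HotSpot declaration):

    template <class T>
    T min2(T a, T b) { return a < b ? a : b; }

    void demo(long length) {
      // min2(length, 512);      // error: T deduced as both 'long' and 'int'
      min2(length, (long)512);   // fine once the literal has the same type
      min2<long>(length, 512);   // or name T explicitly and let 512 convert
    }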
>>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/opto/phaseX.cpp >>>>>>>>> >>>>>>>>> I can see that intcon(jint i) is consistent with longcon(jlong >>>>>>>>> l), but the use of "i" in the code is more consistent with int >>>>>>>>> than jint. >>>>>>>> >>>>>>>> huh?? really? >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/opto/type.cpp >>>>>>>>> >>>>>>>>> 1505 int TypeInt::hash(void) const { >>>>>>>>> 1506?? return java_add(java_add(_lo, _hi), >>>>>>>>> java_add((jint)_widen, (jint)Type::Int)); >>>>>>>>> 1507 } >>>>>>>>> >>>>>>>>> I can see that the (jint) casts you added make sense, but then >>>>>>>>> the whole function should be returning jint not int. Ditto the >>>>>>>>> other hash functions. >>>>>>>> >>>>>>>> I'm not messing with this, this is the minimal in type fixing >>>>>>>> that I'm going to do here. >>>>>>> >>>>>>> >>>>>>> >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/prims/jni.cpp >>>>>>>>> >>>>>>>>> I think vm_created should be a bool. In fact all the fields you >>>>>>>>> changed are logically bools - do Atomics work for bool now? >>>>>>>> >>>>>>>> No, they do not.?? I had thought bool would be better originally >>>>>>>> too. >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/prims/jvm.cpp >>>>>>>>> >>>>>>>>> is_attachable is the terminology used in the JDK code. >>>>>>>> >>>>>>>> Well the JDK version had is_attach_supported() as the flag name >>>>>>>> so I used that in this one place. >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp >>>>>>>>> src/hotspot/share/prims/jvmtiImpl.cpp >>>>>>>>> >>>>>>>>> Are you making parameters consistent with the fields they >>>>>>>>> initialize? >>>>>>>> >>>>>>>> They're consistent with the declarations now. >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/prims/jvmtiTagMap.cpp >>>>>>>>> >>>>>>>>> There is a mix of int and jint for slot in this code. You fixed >>>>>>>>> some, but this remains: >>>>>>>>> >>>>>>>>> 2440 inline bool CallbackInvoker::report_stack_ref_root(jlong >>>>>>>>> thread_tag, >>>>>>>>> 2441 jlong tid, >>>>>>>>> 2442 jint depth, >>>>>>>>> 2443 jmethodID method, >>>>>>>>> 2444 jlocation bci, >>>>>>>>> 2445 jint slot, >>>>>>>> >>>>>>>> Right for consistency with the declarations. >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/runtime/perfData.cpp >>>>>>>>> >>>>>>>>> Callers pass both jint and int, so param type seems arbitrary. >>>>>>>> >>>>>>>> They are, but importantly they match the declarations. >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/hotspot/share/runtime/perfMemory.cpp >>>>>>>>> src/hotspot/share/runtime/perfMemory.hpp >>>>>>>>> >>>>>>>>> PerfMemory::_initialized should ideally be a bool - can >>>>>>>>> OrderAccess handle that now? >>>>>>>> >>>>>>>> Nope. >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/java.base/share/native/include/jvm.h >>>>>>>>> >>>>>>>>> Not clear why the jio functions are not also JNICALL ? >>>>>>>> >>>>>>>> They are now.? The JDK version didn't have JNICALL. JVM needs >>>>>>>> JNICALL.? I can't tell you why JDK didn't need JNICALL linkage. >>>>>>> >>>>>>> ?? JVM currently does not have JNICALL. But they are declared as >>>>>>> "extern C". >>>>>> >>>>>> This was a compilation error on Windows with JDK.?? Maybe the C >>>>>> code in the JDK doesn't complain about linkage differences. I'll >>>>>> have to go back and figure this out then. 
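On the JNICALL puzzle just above: on Windows JNICALL expands to __stdcall, and in C++ the calling convention is part of a function's type, so a header/definition mismatch is a hard error in a C++ translation unit (which may be part of why hotspot trips over it while the C code on the JDK side, as Coleen speculates, never complained). A sketch of the kind of mismatch involved -- jvm_example is a made-up name for the illustration:

    #include "jni.h"   // for JNICALL (__stdcall on Windows, empty elsewhere)

    // What a C++ caller sees in the header:
    extern "C" int JNICALL jvm_example(int x);

    // A definition written without JNICALL defaults to __cdecl, and MSVC's C++
    // front end rejects it as a redeclaration with different type modifiers:
    //
    //   extern "C" int jvm_example(int x) { return x; }
    //
    // Keeping the convention on both sides resolves it:
    extern "C" int JNICALL jvm_example(int x) { return x; }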
>>>>>>> >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/java.base/unix/native/include/jni_md.h >>>>>>>>> >>>>>>>>> There is no need to special case ARM. The differences in the >>>>>>>>> existing code were for LTO support and that is now irrelevant. >>>>>>>> >>>>>>>> See discussion with Magnus.?? We still build ARM for jdk10/hs so >>>>>>>> I needed this conditional or of course I wouldn't have added >>>>>>>> it.? We can remove it with LTO support. >>>>>>> >>>>>>> Those builds are gone - this is obsolete. But yes all LTO can be >>>>>>> removed later if you wish. Just trying to simplify things now. >>>>>>> >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/java.base/unix/native/include/jvm_md.h >>>>>>>>> >>>>>>>>> I know you've just copied this across, but it seems wrong to me: >>>>>>>>> >>>>>>>>> ?57 // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on >>>>>>>>> others. This may >>>>>>>>> ? 58 //?????? cause problems if JVM and the rest of JDK are >>>>>>>>> built on different >>>>>>>>> ? 59 //?????? Linux releases. Here we define JVM_MAXPATHLEN to >>>>>>>>> be MAXPATHLEN + 1, >>>>>>>>> ? 60 //?????? so buffers declared in VM are always >= 4096. >>>>>>>>> ? 61 #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>>>>>>>> >>>>>>>>> It doesn't make sense to me to define an internal "max path >>>>>>>>> length" that can _exceed_ the platform max! >>>>>>>>> >>>>>>>>> That aside there's no support for building different parts of >>>>>>>>> the JDK on different platforms and then bringing them together. >>>>>>>>> And in any case I would think the real problem would be >>>>>>>>> building on a platform that uses 4096 and running on one that >>>>>>>>> uses 4095! >>>>>>>>> >>>>>>>>> But that aside this is a Linux hack and should be guarded by >>>>>>>>> ifdef LINUX. (I doubt BSD needs it, the bsd file is just a copy >>>>>>>>> of the linux one - the JDK macosx version does the right >>>>>>>>> thing). Solaris and AIX should stay as-is at MAXPATHLEN. >>>>>>>> >>>>>>>> All of the unix platforms had MAXPATHLEN+1.? I'll leave it for >>>>>>>> now and we can investigate that further. >>>>>>> >>>>>>> I see the following existing code: >>>>>>> >>>>>>> src/java.base/unix/native/include/jvm_md.h: >>>>>>> >>>>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>>>> >>>>>>> src/java.base/macosx/native/include/jvm_md.h >>>>>>> >>>>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>>>> >>>>>>> src/hotspot/os/aix/jvm_aix.h >>>>>>> >>>>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>>>> >>>>>>> src/hotspot/os/bsd/jvm_bsd.h >>>>>>> >>>>>>> #define JVM_MAXPATHLEN MAXPATHLEN + 1? // blindly copied from >>>>>>> Linux version >>>>>>> >>>>>>> src/hotspot/os/linux/jvm_linux.h >>>>>>> >>>>>>> #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>>>>>> >>>>>>> src/hotspot/os/solaris/jvm_solaris.h >>>>>>> >>>>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>>>> >>>>>>> This is a linux only hack (if you ignore the blind copy from >>>>>>> linux into the BSD code in the VM). >>>>>> >>>>>> Oh, thanks, so should I add a bunch of ifdefs then?? Or do you >>>>>> think having MAXPATHLEN + 1 will really break the other >>>>>> platforms?? Do you really see this as a problem or are you just >>>>>> pointing out inconsistency? >>>>>>> >>>>>>>>> >>>>>>>>> ?86 #define ASYNC_SIGNAL???? SIGJVM2 >>>>>>>>> >>>>>>>>> This only exists on Solaris so I think should be in #ifdef >>>>>>>>> SOLARIS, to make that clear. >>>>>>>> >>>>>>>> Ok.? I'll add this. 
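Presumably the guard being agreed to ends up with roughly this shape in the shared jvm_md.h (a sketch of the shape only, not the actual patch):

    #ifdef SOLARIS
    // SIGJVM2 only exists on Solaris; no other platform defines an equivalent here.
    #define ASYNC_SIGNAL SIGJVM2
    #endif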
>>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> src/java.base/windows/native/include/jvm_md.h >>>>>>>>> >>>>>>>>> Given the differences between the two versions either something >>>>>>>>> has been broken or "extern C" declarations are not needed :) >>>>>>>> >>>>>>>> Well, they are needed for Hotspot to build and do not prevent >>>>>>>> jdk from building.? I don't know what was broken. >>>>>>> >>>>>>> We really need to understand this better. Maybe related to the >>>>>>> map files that expose the symbols. ?? >>>>>> >>>>>> They're needed because the JDK files are written mostly in C and >>>>>> that doesn't complain about the linkage difference. Hotspot files >>>>>> are in C++ which does complain. >>>>>> >>>>>>> >>>>>>>>> >>>>>>>>> --- >>>>>>>>> >>>>>>>>> That was a really painful way to spend most of my Friday. TGIF! :) >>>>>>>> >>>>>>>> Thanks for going through it.? See comments inline for changes. >>>>>>>> Generating a webrev takes hours so I'm not going to do that >>>>>>>> unless you insist. >>>>>>> >>>>>>> An incremental webrev shouldn't take long - right? You're a mq >>>>>>> maestro now. :) >>>>>> >>>>>> Well I generally trash a repository whenever I use mq but sure. >>>>>>> >>>>>>> If you can reasonably produce an incremental webrev once you've >>>>>>> settled on all the comments/issues that would be good. >>>>>> >>>>>> Ok, sure. >>>>>> >>>>>> Coleen >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>>> Thanks, >>>>>>>> Coleen >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>> >>>>>>>>> On 27/10/2017 6:44 AM, coleen.phillimore at oracle.com wrote: >>>>>>>>>> ??Hi Magnus, >>>>>>>>>> >>>>>>>>>> Thank you for reviewing this.?? I have a new version that >>>>>>>>>> takes out the hack in globalDefinitions.hpp and adds casts to >>>>>>>>>> src/hotspot/share/opto/type.cpp instead. >>>>>>>>>> >>>>>>>>>> Also some fixes from Martin at SAP. >>>>>>>>>> >>>>>>>>>> open webrev at >>>>>>>>>> http://cr.openjdk.java.net/~coleenp/8189610.02/webrev >>>>>>>>>> >>>>>>>>>> see below. >>>>>>>>>> >>>>>>>>>> On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: >>>>>>>>>>> Coleen, >>>>>>>>>>> >>>>>>>>>>> Thank you for addressing this! >>>>>>>>>>> >>>>>>>>>>> On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >>>>>>>>>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>>>>>>>>> >>>>>>>>>>>> Mostly used sed to remove prims/jvm.h and move #include >>>>>>>>>>>> "jvm.h" after precompiled.h, so if you have repetitive >>>>>>>>>>>> stress wrist issues don't click on most of these files. >>>>>>>>>>>> >>>>>>>>>>>> There were more issues to resolve, however. The JDK windows >>>>>>>>>>>> jni_md.h file defined jint as long and the hotspot windows >>>>>>>>>>>> jni_x86.h as int. I had to choose the jdk version since it's >>>>>>>>>>>> the public version, so there are changes to the hotspot >>>>>>>>>>>> files for this. Generally I changed the code to use 'int' >>>>>>>>>>>> rather than 'jint' where the surrounding API didn't insist >>>>>>>>>>>> on consistently using java types. We should mostly be using >>>>>>>>>>>> C++ types within hotspot except in interfaces to native/JNI >>>>>>>>>>>> code. There are a couple of hacks in places where adding >>>>>>>>>>>> multiple jint casts was too painful. >>>>>>>>>>>> >>>>>>>>>>>> Tested with JPRT and tier2-4 (in progress). >>>>>>>>>>>> >>>>>>>>>>>> open webrev at >>>>>>>>>>>> http://cr.openjdk.java.net/~coleenp/8189610.01/webrev >>>>>>>>>>> >>>>>>>>>>> Looks great! 
>>>>>>>>>>> >>>>>>>>>>> Just a few comments: >>>>>>>>>>> >>>>>>>>>>> * src/java.base/unix/native/include/jni_md.h: >>>>>>>>>>> >>>>>>>>>>> I don't think the externally_visible attribute should be >>>>>>>>>>> there for arm. I know this was the case for the corresponding >>>>>>>>>>> hotspot file for arm, but that was techically incorrect. The >>>>>>>>>>> proper dependency here is that externally_visible should be >>>>>>>>>>> in all JNIEXPORT if and only if we're building with JVM >>>>>>>>>>> feature "link-time-opt". Traditionally, that feature been >>>>>>>>>>> enabled when building arm32 builds, and only then, so there's >>>>>>>>>>> been a (coincidentally) connection here. Nowadays, Oracle >>>>>>>>>>> does not care about the arm32 builds, and I'm not sure if >>>>>>>>>>> anyone else is building them with link-time-opt enabled. >>>>>>>>>>> >>>>>>>>>>> It does seem wrong to me to export this behavior in the >>>>>>>>>>> public jni_md.h file, though. I think the correct way to >>>>>>>>>>> solve this, if we should continue supporting link-time-opt is >>>>>>>>>>> to make sure this attribute is set for exported hotspot >>>>>>>>>>> functions. If it's still needed, that is. A quick googling >>>>>>>>>>> seems to indicate that visibility("default") might be enough >>>>>>>>>>> in modern gcc's. >>>>>>>>>>> >>>>>>>>>>> A third option is to remove the support for link-time-opt >>>>>>>>>>> entirely, if it's not really used. >>>>>>>>>> >>>>>>>>>> I didn't know how to change this since we are still building >>>>>>>>>> ARM with the jdk10/hs repository, and ARM needed this change. >>>>>>>>>> I could wait until we bring down the jdk10/master changes that >>>>>>>>>> remove the ARM build and remove this conditional before I >>>>>>>>>> push. Or we could file an RFE to remove link-time-opt (?) and >>>>>>>>>> remove it then? >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> * src/java.base/unix/native/include/jvm_md.h and >>>>>>>>>>> src/java.base/windows/native/include/jvm_md.h: >>>>>>>>>>> >>>>>>>>>>> These files define a public API, and contain non-trivial >>>>>>>>>>> changes. I suspect you should file a CSR request. (Even >>>>>>>>>>> though I realize you're only matching the header file with >>>>>>>>>>> the reality.) >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I filed the CSR.?? Waiting for the next steps. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Coleen >>>>>>>>>> >>>>>>>>>>> /Magnus >>>>>>>>>>> >>>>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >>>>>>>>>>>> >>>>>>>>>>>> I have a script to update copyright files on commit. >>>>>>>>>>>> >>>>>>>>>>>> Thanks to Magnus and ErikJ for the makefile changes. 
>>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Coleen >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>> >> > From david.holmes at oracle.com Tue Oct 31 06:33:10 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 31 Oct 2017 16:33:10 +1000 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <29688c76-4983-dffc-6ce2-402cf91dafbf@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <57390ec3-8d8d-a3d7-9774-b5945a323be9@oracle.com> <0f568e05-6f06-d2df-571e-0c591f062c15@oracle.com> <29688c76-4983-dffc-6ce2-402cf91dafbf@oracle.com> Message-ID: On 30/10/2017 10:15 PM, coleen.phillimore at oracle.com wrote: > On 10/28/17 3:58 AM, David Holmes wrote: >> On 28/10/2017 6:20 AM, coleen.phillimore at oracle.com wrote: >>> >>> Incremental webrev: >>> >>> http://cr.openjdk.java.net/~coleenp/8189610.incr.01/webrev/index.html >> >> That all looks fine - thanks. >> >> If I get a chance I'll look deeper into why the VS compiler needs 0 to >> be cast to jint (aka long) to avoid ambiguity with it being a NULL >> pointer. I could understand if it always needed the cast, but not only >> needing it for long, but not int. > > Thanks,? Kim can probably tell you where in the spec this is. Now I get it. Given: void x(int i) { ...} void x(Foo* p) { ... } a call x(0) is a call to x(int) because 0 is an int. No conversion needed. But given: void x(long i) { ...} void x(Foo* p) { ... } a call x(0) has no direct match (no int version) so standard conversions apply and IIUC conversion to long and conversion to Foo* have the same rank, so neither is preferred and the call is ambiguous. David > Coleen > >> >> Thanks, >> David >> >>> thanks, >>> Coleen >>> >>> On 10/27/17 11:13 AM, coleen.phillimore at oracle.com wrote: >>>> >>>> >>>> On 10/27/17 9:37 AM, David Holmes wrote: >>>>>>> src/hotspot/share/c1/c1_LinearScan.cpp >>>>>>> >>>>>>> ?ConstantIntValue((jint)0); >>>>>>> >>>>>>> why is this cast needed? what causes the ambiguity? (If this was >>>>>>> a template I'd understand ;-) ). Also didn't you change that >>>>>>> constructor to take an int anyway - not that I think it should - >>>>>>> see below. >>>>>> >>>>>> Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't match >>>>>> 'long' better than any pointer type.? So this cast is needed. >>>>> >>>>> But you changed the constructor to take an int! >>>>> >>>>> ?class ConstantIntValue: public ScopeValue { >>>>> ? private: >>>>> -? jint _value; >>>>> +? int _value; >>>>> ? public: >>>>> -? ConstantIntValue(jint value)???????? { _value = value; } >>>>> +? ConstantIntValue(int value)????????? { _value = value; } >>>>> >>>> I changed this back to not take an int and changed c1_LinearScan.cpp >>>> to have the (jint)0 cast and output.cp needed (jint)0 casts.? 0L >>>> doesn't work for platforms where jint is an 'int' rather than a long >>>> because it's ambiguous with the functions that take a pointer type. >>>> Probably better to keep the type of ConstantIntValue consistent with >>>> j types. 
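David's explanation above is easy to verify with a tiny standalone case (Foo, x and y are illustrative names, not code from the patch):

    struct Foo;

    void x(int)  { }
    void x(Foo*) { }

    void y(long) { }
    void y(Foo*) { }

    void demo() {
      x(0);           // fine: 0 is an int, exact match for x(int)
      // x(0L);       // ambiguous (Coleen's 0L case): long -> int and 0L -> Foo* tie
      // y(0);        // ambiguous: 0 -> long and 0 -> Foo* are conversions of equal rank
      y(0L);          // fine: exact match for y(long)
      y((Foo*)0);     // fine: explicit choice of the pointer overload
    }

This is also why the (jint)0 casts in c1_LinearScan.cpp and output.cpp are the least invasive fix: they turn the call back into an exact match for whichever width jint happens to be on the platform.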
>>>> >>>> Thanks, >>>> Coleen >>> > From david.holmes at oracle.com Tue Oct 31 10:27:47 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 31 Oct 2017 20:27:47 +1000 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <6e10687c-f70e-5ee7-414f-b2c22d3e8f21@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> <59F2DC24.8050701@oracle.com> <59F2F01A.403@oracle.com> <4ebb905f23324a00b9cf10d8d410d420@sap.com> Message-ID: <9ff3abc3-9809-a9df-141b-15f0b05bd8a4@oracle.com> Hi Robbin, On 31/10/2017 12:34 AM, Robbin Ehn wrote: > Thanks! > > There have been a bit hesitation and confusion about the option (at > least internally). > The option is opt-out but in globals.hpp it starts out as false. > > Now instead we explicit set it true in globals.hpp but we turn it off if > we notice that: > - We are on an unsupported platform > - User have specified UseAOT > - User have specified EnableJVMCI That logic from #4617 onwards is absolutely doing my head in! 4617 bool aot_enabled = UseAOT && ((AOTLibrary != NULL) || !FLAG_IS_DEFAULT(UseAOT)); why do we care if the flag is default or not? If they set an AOTLibrary they expect AOT to be enabled. If they know UseAOT is true by default then they won't set it explicitly and so the flag will be default. If they set UseAOT directly but don't set a library then they won't get AOT - and UseAOT should be turned off somewhere else. 4623 if (FLAG_IS_DEFAULT(UseAOT)) { Why do we care if it is default or not? If we got here AOT is not enabled. We can just do: if (UseAOT) FLAG_SET_DEFAULT(UseAOT, false) or even skip the query and just set it false. 4627 if (FLAG_IS_DEFAULT(ThreadLocalHandshakes) && ThreadLocalHandshakes) { Okay I get why you check for default here :) 4631 FLAG_SET_ERGO(bool, ThreadLocalHandshakes, false); I wouldn't really say this is an "ergo" choice - if we can't have it on then we set it off - just as previously done with UseAOT. 4632 } else if (!FLAG_IS_DEFAULT(UseAOT) && UseAOT) { Again why do we care about default? You seem to be saying that "java -XX:+UseAOT -XX:AOTLibrary=..." is a stronger request for AOT than just "java -XX:AOTLibrary=...". But I'd always use the latter if I know UseAOT defaults to true anyway. 4635 FLAG_SET_ERGO(bool, ThreadLocalHandshakes, false); 4639 FLAG_SET_ERGO(bool, ThreadLocalHandshakes, false); Same ergo comment. I'm also thinking, if this is platform dependent then shouldn't ThreadLocalHandshakes be a product_pd flag, with pd specific default setting - and turning it on when on an unsupported platform should be a error ? Thanks, David ----- > Here is webrev for changes needed: > http://cr.openjdk.java.net/~rehn/8185640/v8/Option-Cleanup-12/webrev/ > And here is CSR: > https://bugs.openjdk.java.net/browse/JDK-8189942 > > Manual testing + basic testing done. 
> > And since I'm really hoping that this can be the last incremental, here > is my whole patch queue flatten out: > http://cr.openjdk.java.net/~rehn/8185640/v8/Full/webrev/ > > Thanks, Robbin > > On 10/27/2017 04:47 PM, Doerr, Martin wrote: >> Hi Robbin, >> >> excellent. I think this matches what Coleen had proposed, now. >> Thanks for doing all the work with so many incremental patches and for >> responding on so many discussions. Seems to be a tough piece of work. >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >> Sent: Freitag, 27. Oktober 2017 15:15 >> To: Erik ?sterlund ; Andrew Haley >> ; Doerr, Martin ; Karen Kinnear >> ; Coleen Phillimore >> (coleen.phillimore at oracle.com) >> Cc: hotspot-dev developers >> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >> >> Hi all, >> >> Poll in switches: >> http://cr.openjdk.java.net/~rehn/8185640/v7/Interpreter-Poll-Switch-10/ >> >> Poll in return: >> http://cr.openjdk.java.net/~rehn/8185640/v7/Interpreter-Poll-Ret-11/ >> >> Please take an extra look at poll in return. >> >> Sanity tested, big test run still running (99% complete - OK). >> >> Performance regression for the added polls increased to total of >> -0.68% vs >> global poll. (was -0.44%) >> >> We are discussing the opt-out option, the newest suggestion is to make it >> diagnostic. Opinions? >> >> For anyone applying these patches, the number 9 patch changes the >> option from >> product. I have not sent that out. >> >> Thanks, Robbin >> >> >> From artem.smotrakov at oracle.com Tue Oct 31 10:58:07 2017 From: artem.smotrakov at oracle.com (Artem Smotrakov) Date: Tue, 31 Oct 2017 13:58:07 +0300 Subject: RFR [10] 8189800: Add support for AddressSanitizer In-Reply-To: <4982c49e-859c-75d4-e5b1-b4a68b49d746@oracle.com> References: <51eabbae-5435-59be-f443-a6b214a17513@oracle.com> <4982c49e-859c-75d4-e5b1-b4a68b49d746@oracle.com> Message-ID: <3b08231c-4f8b-3d9a-3515-15afe743b82d@oracle.com> Hi David, That's a good question, I thought about it. According to [1]: - recommended versions of gcc is 4.9.2 - the minimum accepted version of gcc is 4.7 (Older versions will generate a warning by `configure` and are unlikely to work.) - the minimum accepted version of clang is 3.2 (Older versions will not be accepted by `configure`) It looks like that clang has to be at least 3.2 which should contain AddressSanitizer. Only for gcc, there may be a chance that someone wants to use 4.7. So, we might want to check version to see if it's 4.7, although I am not sure how many people would like to use gcc 4.7. As a result, this case didn't look very common to me, so I preferred to simplify the patch, and didn't add such a check. Without version check, compilation is going to fail if gcc 4.7 is used, and -fsanitize=address enabled. [1] http://hg.openjdk.java.net/jdk10/master/file/438e0c9f2f17/doc/building.md Artem On 10/31/2017 01:37 PM, David Holmes wrote: > Hi Artem, > > On 28/10/2017 6:02 AM, Artem Smotrakov wrote: >> Hello, >> >> Please review the following patch which adds support for >> AddressSanitizer. >> >> AddressSanitizer is a runtime memory error detector which looks for >> various memory corruption issues and leaks. >> >> Please refer to [1] for details. AddressSanitizer is available in gcc >> 4.8+ and clang 3.1+ > > Should we be checking the version before adding the flags? > > Thanks, > David > >> The patch below introduces --enable-asan parameter for the configure >> script which enables AddressSanitizer. 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8189800 >> Webrev: http://cr.openjdk.java.net/~asmotrak/8189800/webrev.00/ >> >> [1] https://github.com/google/sanitizers/wiki/AddressSanitizer >> >> Artem From david.holmes at oracle.com Tue Oct 31 12:24:10 2017 From: david.holmes at oracle.com (David Holmes) Date: Tue, 31 Oct 2017 22:24:10 +1000 Subject: RFR [10] 8189800: Add support for AddressSanitizer In-Reply-To: <3b08231c-4f8b-3d9a-3515-15afe743b82d@oracle.com> References: <51eabbae-5435-59be-f443-a6b214a17513@oracle.com> <4982c49e-859c-75d4-e5b1-b4a68b49d746@oracle.com> <3b08231c-4f8b-3d9a-3515-15afe743b82d@oracle.com> Message-ID: <8a97162b-4055-472c-dc55-5e38bd9a5ca8@oracle.com> Sounds reasonable. Anyone using older gcc simply won't/shouldn't enable Asan. Thanks, David On 31/10/2017 8:58 PM, Artem Smotrakov wrote: > Hi David, > > That's a good question, I thought about it. According to [1]: > > - recommended versions of gcc is 4.9.2 > - the minimum accepted version of gcc is 4.7 (Older versions will > generate a warning by `configure` and are unlikely to work.) > - the minimum accepted version of clang is 3.2 (Older versions will not > be accepted by `configure`) > > It looks like that clang has to be at least 3.2 which should contain > AddressSanitizer. Only for gcc, there may be a chance that someone wants > to use 4.7. So, we might want to check version to see if it's 4.7, > although I am not sure how many people would like to use gcc 4.7. As a > result, this case didn't look very common to me, so I preferred to > simplify the patch, and didn't add such a check. > > Without version check, compilation is going to fail if gcc 4.7 is used, > and -fsanitize=address enabled. > > [1] > http://hg.openjdk.java.net/jdk10/master/file/438e0c9f2f17/doc/building.md > > Artem > > On 10/31/2017 01:37 PM, David Holmes wrote: >> Hi Artem, >> >> On 28/10/2017 6:02 AM, Artem Smotrakov wrote: >>> Hello, >>> >>> Please review the following patch which adds support for >>> AddressSanitizer. >>> >>> AddressSanitizer is a runtime memory error detector which looks for >>> various memory corruption issues and leaks. >>> >>> Please refer to [1] for details. AddressSanitizer is available in gcc >>> 4.8+ and clang 3.1+ >> >> Should we be checking the version before adding the flags? >> >> Thanks, >> David >> >>> The patch below introduces --enable-asan parameter for the configure >>> script which enables AddressSanitizer. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8189800 >>> Webrev: http://cr.openjdk.java.net/~asmotrak/8189800/webrev.00/ >>> >>> [1] https://github.com/google/sanitizers/wiki/AddressSanitizer >>> >>> Artem > From magnus.ihse.bursie at oracle.com Tue Oct 31 12:41:09 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Tue, 31 Oct 2017 13:41:09 +0100 Subject: RFR [10] 8189800: Add support for AddressSanitizer In-Reply-To: <3b4c5abb-762f-a66c-02d5-93909dc656d4@oracle.com> References: <51eabbae-5435-59be-f443-a6b214a17513@oracle.com> <55e0e055-2e65-5c83-3f8e-36895f71860e@oracle.com> <3b4c5abb-762f-a66c-02d5-93909dc656d4@oracle.com> Message-ID: On 2017-10-30 10:31, Artem Smotrakov wrote: > Hi Magnus, > > The current approach uses AddressSanitizer as a shared library > (libasan.so) which is part of GCC/Clang toolkit. In case you use > system toolkit, then libasan.so is available for linker and at > runtime. But if you set a custom toolkit by --with-devkit option, then > libasan.so form this toolkit may not be available for linker and at > runtime by default. 
As a result, you can get errors while linking and > running. To fix that, you normally need to make it available using > ldconfig, or update LD_LIBRARY_PATH. That's why it updates > LD_LIBRARY_PATH with DEVKIT_LIB_DIR if a custom toolkit was used. That > may be helpful when you build JDK in environment like jib/jprt. > > I tried to remove exporting ASAN_ENABLED and DEVKIT_LIB_DIR, and as a > result, ASAN_OPTIONS and DEVKIT_LIB_DIR didn't go to jtreg command > which caused tests to fail when you run "make test". If we don't > export ASAN_OPTIONS and DEVKIT_LIB_DIR, then the updates in > TestCommon.gmk don't make much sense to me because those variables > have to be explicitly set for "make" anyway. > > I can remove exporting those variables and revert TestCommon.gmk. > Although, it looks nicer to me if we can run the tests just with "make > test" without specifying ASAN_OPTIONS and DEVKIT_LIB_DIR explicitly. > > What do you think? Ah, I see. TestCommon.gmk is not properly integrated into the rest of the build system. I'm still a bit surprised at this behavior, but I accept your explanation. Keep it as it is. TestCommon is due to be removed by the new RunTests.gmk (which is properly integrated), and when that happens, we can remove the exports then. /Magnus > > Artem > > > On 10/30/2017 10:50 AM, Magnus Ihse Bursie wrote: >> On 2017-10-30 08:39, Artem Smotrakov wrote: >>> cc'ing hotspot-dev at openjdk.java.net as David suggested. >>> >>> Artem >>> >>> >>> On 10/27/2017 11:02 PM, Artem Smotrakov wrote: >>>> Hello, >>>> >>>> Please review the following patch which adds support for >>>> AddressSanitizer. >>>> >>>> AddressSanitizer is a runtime memory error detector which looks for >>>> various memory corruption issues and leaks. >>>> >>>> Please refer to [1] for details. AddressSanitizer is available in >>>> gcc 4.8+ and clang 3.1+ >>>> >>>> The patch below introduces --enable-asan parameter for the >>>> configure script which enables AddressSanitizer. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8189800 >>>> Webrev: http://cr.openjdk.java.net/~asmotrak/8189800/webrev.00/ >> spec.gmk.in should only have export for variables that needs to be >> exported in the environment for executing binaries, that is >> ASAN_OPTIONS and LD_LIBRARY_PATH, not ASAN_ENABLED or DEVKIT_LIB_DIR. >> >> I'm also a bit curious about the addition of of DEVKIT_LIB_DIR. Would >> you care to elaborate your thinking? >> >> Otherwise it looks good. >> >> /Magnus >> >>>> >>>> [1] https://github.com/google/sanitizers/wiki/AddressSanitizer >>>> >>>> Artem >>> >> > From magnus.ihse.bursie at oracle.com Tue Oct 31 12:42:45 2017 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Tue, 31 Oct 2017 13:42:45 +0100 Subject: RFR [10] 8189800: Add support for AddressSanitizer In-Reply-To: <8a97162b-4055-472c-dc55-5e38bd9a5ca8@oracle.com> References: <51eabbae-5435-59be-f443-a6b214a17513@oracle.com> <4982c49e-859c-75d4-e5b1-b4a68b49d746@oracle.com> <3b08231c-4f8b-3d9a-3515-15afe743b82d@oracle.com> <8a97162b-4055-472c-dc55-5e38bd9a5ca8@oracle.com> Message-ID: <65ec3f30-2d0a-fe84-5455-d7ced2235061@oracle.com> On 2017-10-31 13:24, David Holmes wrote: > Sounds reasonable. Anyone using older gcc simply won't/shouldn't > enable Asan. Agree. We will probably not be keeping any pretense of supporting anything older than gcc 4.9 at some point in time anyway. I believe the only known user of the oldest gcc is SAP for some of their platforms. 
/Magnus > > Thanks, > David > > On 31/10/2017 8:58 PM, Artem Smotrakov wrote: >> Hi David, >> >> That's a good question, I thought about it. According to [1]: >> >> - recommended versions of gcc is 4.9.2 >> - the minimum accepted version of gcc is 4.7 (Older versions will >> generate a warning by `configure` and are unlikely to work.) >> - the minimum accepted version of clang is 3.2 (Older versions will >> not be accepted by `configure`) >> >> It looks like that clang has to be at least 3.2 which should contain >> AddressSanitizer. Only for gcc, there may be a chance that someone >> wants to use 4.7. So, we might want to check version to see if it's >> 4.7, although I am not sure how many people would like to use gcc >> 4.7. As a result, this case didn't look very common to me, so I >> preferred to simplify the patch, and didn't add such a check. >> >> Without version check, compilation is going to fail if gcc 4.7 is >> used, and -fsanitize=address enabled. >> >> [1] >> http://hg.openjdk.java.net/jdk10/master/file/438e0c9f2f17/doc/building.md >> >> Artem >> >> On 10/31/2017 01:37 PM, David Holmes wrote: >>> Hi Artem, >>> >>> On 28/10/2017 6:02 AM, Artem Smotrakov wrote: >>>> Hello, >>>> >>>> Please review the following patch which adds support for >>>> AddressSanitizer. >>>> >>>> AddressSanitizer is a runtime memory error detector which looks for >>>> various memory corruption issues and leaks. >>>> >>>> Please refer to [1] for details. AddressSanitizer is available in >>>> gcc 4.8+ and clang 3.1+ >>> >>> Should we be checking the version before adding the flags? >>> >>> Thanks, >>> David >>> >>>> The patch below introduces --enable-asan parameter for the >>>> configure script which enables AddressSanitizer. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8189800 >>>> Webrev: http://cr.openjdk.java.net/~asmotrak/8189800/webrev.00/ >>>> >>>> [1] https://github.com/google/sanitizers/wiki/AddressSanitizer >>>> >>>> Artem >> From coleen.phillimore at oracle.com Tue Oct 31 12:53:17 2017 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 31 Oct 2017 08:53:17 -0400 Subject: RFR (L, tedious again, sorry) 8189610: Reconcile jvm.h and all jvm_md.h between java.base and hotspot In-Reply-To: <058662bb-5d5b-0085-cc08-02192d000838@oracle.com> References: <8671321f-398c-5f7f-634d-9d9664e04d87@oracle.com> <05bf853a-52f8-c450-4171-89f6b64d793a@oracle.com> <72f2aac7-ed20-c995-913b-ee4341a2a978@oracle.com> <55ec3559-c593-bcb6-51b0-4639da126068@oracle.com> <4509dce7-10f8-4558-2adb-90d4745e054e@oracle.com> <396ab0f7-3710-3f76-675a-5108bcb50af5@oracle.com> <22afedef-59cc-ecde-48fc-0afb7b4bbb47@oracle.com> <815ac734-ea8b-ea2d-ecec-85cb547ba2f4@oracle.com> <440f79ba-2da3-b627-53bc-e1842e3cf73c@oracle.com> <058662bb-5d5b-0085-cc08-02192d000838@oracle.com> Message-ID: On 10/30/17 8:21 PM, David Holmes wrote: > On 31/10/2017 12:48 AM, coleen.phillimore at oracle.com wrote: >> >> http://cr.openjdk.java.net/~coleenp/8189610.incr.02/webrev/index.html >> >> Changed JDK file to use PATH_MAX.? Retested jdk tier1 tests. > > Why PATH_MAX instead of MAXPATHLEN? They appear to be the same on > Linux and Solaris, but I don't know if that is true for AIX and Mac OS > / BSD. I picked PATH_MAX because canonicalize_md.c uses that constant. > > Does UnixFileSystem_md.c still need the jvm.h include now? No, I will remove it. 
Thanks, Coleen > > Thanks, > David > >> thanks, >> Coleen >> >> On 10/30/17 8:38 AM, coleen.phillimore at oracle.com wrote: >>> >>> >>> On 10/30/17 8:17 AM, David Holmes wrote: >>>> On 30/10/2017 10:13 PM, coleen.phillimore at oracle.com wrote: >>>>> On 10/28/17 3:50 AM, David Holmes wrote: >>>>>> Hi Coleen, >>>>>> >>>>>> I've commented on the file location in response to Mandy's email. >>>>>> >>>>>> The only issue I'm still concerned about is the JVM_MAXPATHLEN >>>>>> issue. I think it is a bug to define a JVM_MAXPATHLEN that is >>>>>> bigger than the platform MAXPATHLEN. I also would not want to see >>>>>> any change in behaviour because of this - so AIX and Solaris >>>>>> should not get a different JVM_MAXPATHLEN due to this refactoring >>>>>> change. So yes I think this needs to be ifdef'd for Linux and >>>>>> reluctantly (because it was a copy error) for OSX/BSD as well. >>>>> >>>>> #if defined(AIX) || defined(SOLARIS) >>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>> #else >>>>> // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on others. This >>>>> may >>>>> //?????? cause problems if JVM and the rest of JDK are built on >>>>> different >>>>> //?????? Linux releases. Here we define JVM_MAXPATHLEN to be >>>>> MAXPATHLEN + 1, >>>>> //?????? so buffers declared in VM are always >= 4096. >>>>> #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>>>> #endif >>>>> >>>>> Is this ok? >>>> >>>> Yes - thanks. It preserves existing behaviour on the VM side at >>>> least. Time will tell if it messes anything up on the JDK side for >>>> Linux/OSX. >>> >>> I don't want to wait for time so I'm investigating. >>> >>> It's one use is: >>> >>> Java_java_io_UnixFileSystem_canonicalize0(JNIEnv *env, jobject this, >>> ... >>> ??????? char canonicalPath[JVM_MAXPATHLEN]; >>> ??????? if (canonicalize((char *)path, >>> ???????????????????????? canonicalPath, JVM_MAXPATHLEN) < 0) { >>> ??????????? JNU_ThrowIOExceptionWithLastError(env, "Bad pathname"); >>> >>> Which goes to: >>> >>> canonicalize_md.c >>> >>> canonicalize(char *original, char *resolved, int len) >>> ??? if (len < PATH_MAX) { >>> ??????? errno = EINVAL; >>> ??????? return -1; >>> ??? } >>> >>> >>> So this should fail every time. >>> >>> sys/param.h:# define MAXPATHLEN??? PATH_MAX >>> >>> I haven't found any tests for it. >>> >>> I don't know why Java_java_io_UnixFileSystem uses JVM_MAXPATHLEN >>> since it's not calling the JVM interface as far as I can tell. I >>> think it should be changed to PATH_MAX. >>> >>> ? >>> Coleen >>>> >>>> David >>>> >>>>> thanks, >>>>> Coleen >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 28/10/2017 12:08 AM, coleen.phillimore at oracle.com wrote: >>>>>>> >>>>>>> >>>>>>> On 10/27/17 9:37 AM, David Holmes wrote: >>>>>>>> On 27/10/2017 10:13 PM, coleen.phillimore at oracle.com wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 10/27/17 3:23 AM, David Holmes wrote: >>>>>>>>>> Hi Coleen, >>>>>>>>>> >>>>>>>>>> Thanks for tackling this. >>>>>>>>>> >>>>>>>>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>>>>>>> >>>>>>>>>> Can you update the bug synopsis to show it covers both sets >>>>>>>>>> of files please. >>>>>>>>>> >>>>>>>>>> I hate to start with this (and it took me quite a while to >>>>>>>>>> realize it) but as Mandy pointed out jvm.h is not an exported >>>>>>>>>> interface from the JDK to the outside world (so not subject >>>>>>>>>> to CSR review), but is a private interface between the JVM >>>>>>>>>> and the JDK libraries. 
So I think really jvm.h belongs in the >>>>>>>>>> hotspot sources where it was, while jni.h belongs in the >>>>>>>>>> exported JDK sources. In which case the bulk of your changes >>>>>>>>>> to the hotspot files would not be needed - sorry. >>>>>>>>> >>>>>>>>> Maybe someone can make that decision and change at a later >>>>>>>>> date. The point of this change is that there is now only one >>>>>>>>> of these files that is shared.? I don't think jvm.h and the >>>>>>>>> jvm_md.h belong on the hotspot sources for the jdk to find >>>>>>>>> them in some random prims and os dependent directories. >>>>>>>> >>>>>>>> The one file that is needed is a hotspot file - jvm.h defines >>>>>>>> the interface that hotspot exports via jvm.cpp. >>>>>>>> >>>>>>>> If you leave jvm.h in hotspot/prims then a very large chunk of >>>>>>>> your boilerplate changes are not needed. The JDK code doesn't >>>>>>>> care what the name of the directory is - whatever it is just >>>>>>>> gets added as a -I directive (the JDK code will include "jvm.h" >>>>>>>> not "prims/jvm.h" the way hotspot sources do. >>>>>>>> >>>>>>>> This isn't something we want to change back or move again >>>>>>>> later. Whatever we do now we live with. >>>>>>> >>>>>>> I think it belongs with jni.h and I think the core libraries >>>>>>> group would agree.?? It seems more natural there than buried in >>>>>>> the hotspot prims directory.? I guess this is on hold while we >>>>>>> have this debate. Sigh. >>>>>>> >>>>>>> Actually with -I directives, changing to jvm.h from prims/jvm.h >>>>>>> would still work.?? Maybe we should change the name to jvm.hpp >>>>>>> since it's jvm.cpp though??? Or maybe just have two divergent >>>>>>> copies and close this as WNF. >>>>>>> >>>>>>>> >>>>>>>>> I'm happy to withdraw the CSR. We generally use the CSR >>>>>>>>> process to add and remove JVM_ interfaces even though they're >>>>>>>>> a private interface in case some other JVM/JDK combination >>>>>>>>> relies on them. The changes to these files are very minor >>>>>>>>> though and not likely to cause any even theoretical >>>>>>>>> incompatibility, so I'll withdraw it. >>>>>>>>>> >>>>>>>>>> Moving on ... >>>>>>>>>> >>>>>>>>>> First to address the initial comments/query you had: >>>>>>>>>> >>>>>>>>>>> The JDK windows jni_md.h file defined jint as long and the >>>>>>>>>>> hotspot >>>>>>>>>>> windows jni_x86.h as int. I had to choose the jdk version >>>>>>>>>>> since it's the >>>>>>>>>>> public version, so there are changes to the hotspot files >>>>>>>>>>> for this. >>>>>>>>>> >>>>>>>>>> On Windows int and long are always the same as it uses ILP32 >>>>>>>>>> or LLP64 (not LP64 like *nix platforms). So either choice >>>>>>>>>> should be fine. That said there are some odd casting issues I >>>>>>>>>> comment on below. Does the VS compiler complain about mixing >>>>>>>>>> int and long in expressions? >>>>>>>>> >>>>>>>>> Yes, it does even though int and long are the same >>>>>>>>> representation. >>>>>>>> >>>>>>>> And what an absolute mess that makes. :( >>>>>>>> >>>>>>>>>> >>>>>>>>>>> Generally I changed the code to use 'int' rather than 'jint' >>>>>>>>>>> where the >>>>>>>>>>> surrounding API didn't insist on consistently using java >>>>>>>>>>> types. We >>>>>>>>>>> should mostly be using C++ types within hotspot except in >>>>>>>>>>> interfaces to >>>>>>>>>>> native/JNI code. >>>>>>>>>> >>>>>>>>>> I think you pulled too hard on a few threads here and things >>>>>>>>>> are starting to unravel. 
There are numerous cases I refer to >>>>>>>>>> below where either the cast seems unnecessary/inappropriate >>>>>>>>>> or else highlights a bunch of additional changes that also >>>>>>>>>> need to be made. The fan out from this could be horrendous. >>>>>>>>>> Unless you actually get some kind of error - and I'd like to >>>>>>>>>> understand the details of those - I would not suggest making >>>>>>>>>> these changes as part of this work. >>>>>>>>> >>>>>>>>> I didn't make any change unless there was was an error. I have >>>>>>>>> 100 failed JPRT jobs to confirm!? I eventually got a Windows >>>>>>>>> system to compile and test this on. Actually some of the >>>>>>>>> changes came out better.? Cases where we use jint as a bool >>>>>>>>> simply turned to int. We do not have an overload for bool for >>>>>>>>> cmpxchg. >>>>>>>> >>>>>>>> That's unfortunate - ditto for OrderAccess. >>>>>>>> >>>>>>>>>> >>>>>>>>>> Looking through I have a quite a few queries/comments - >>>>>>>>>> apologies in advance as I know how tedious this is: >>>>>>>>>> >>>>>>>>>> make/hotspot/lib/CompileLibjsig.gmk >>>>>>>>>> src/java.base/solaris/native/libjsig/jsig.c >>>>>>>>>> >>>>>>>>>> Took a while to figure out why the include was needed. :) As >>>>>>>>>> a follow up I suggest just deleting the -I include directive, >>>>>>>>>> delete the Solaris-only definition of JSIG_VERSION_1_4_1, and >>>>>>>>>> delete everything to do with JVM_get_libjsig_version. It is >>>>>>>>>> all obsolete. >>>>>>>>> >>>>>>>>> Can I patch up jsig in a separate RFE?? I don't remember why >>>>>>>>> this broke so I simply moved JSIG #define.? Is jsig obsolete? >>>>>>>>> Removing JVM_* definitions generally requires a CSR. >>>>>>>> >>>>>>>> I did say "As a follow up". jsig is not obsolete but the jsig >>>>>>>> versioning code, only used by Solaris, is. >>>>>>>> >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/cpu/arm/interp_masm_arm.cpp >>>>>>>>>> >>>>>>>>>> Why did you need to add the jvm.h include? >>>>>>>>>> >>>>>>>>> >>>>>>>>> ?? tbz(Raccess_flags, JVM_ACC_SYNCHRONIZED_BIT, unlocked); >>>>>>>> >>>>>>>> Okay. I'm not going to try and figure out how this code found >>>>>>>> this before. >>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/os/windows/os_windows.cpp. >>>>>>>>>> >>>>>>>>>> The type of process_exiting should be uint to match the DWORD >>>>>>>>>> of GetCurrentThreadID. Then you should need any casts. Also >>>>>>>>>> you missed this jint cast: >>>>>>>>>> >>>>>>>>>> 3796???????? process_exiting != (jint)GetCurrentThreadId()) { >>>>>>>>> >>>>>>>>> Yes, that's better to change process_exiting to a DWORD.? It >>>>>>>>> needs a DWORD cast to 0 in the cmpxchg. >>>>>>>>> >>>>>>>>> ???????? Atomic::cmpxchg(GetCurrentThreadId(), >>>>>>>>> &process_exiting, (DWORD)0); >>>>>>>>> >>>>>>>>> These templates are picky. >>>>>>>> >>>>>>>> Yes - their inability to deal with literals is extremely >>>>>>>> frustrating. >>>>>>>> >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/c1/c1_Canonicalizer.hpp >>>>>>>>>> >>>>>>>>>> ? 43 #ifdef _WINDOWS >>>>>>>>>> ? 44?? // jint is defined as long in jni_md.h, so convert >>>>>>>>>> from int to jint >>>>>>>>>> ? 45?? void set_constant(int x) { set_constant((jint)x); } >>>>>>>>>> ? 46 #endif >>>>>>>>>> >>>>>>>>>> Why is this necessary? int and long are the same on Windows. >>>>>>>>>> The whole point is that jint hides the underlying type, so >>>>>>>>>> where does this go wrong? >>>>>>>>> >>>>>>>>> No, they are not the same types even though they have the same >>>>>>>>> representation! 
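To make the "same representation, different type" point concrete, here is a stand-alone sketch (not HotSpot code): cmpxchg below is a simplified placeholder showing the same deduction failure as the Atomic::cmpxchg call quoted above, and DWORD_t stands in for the Win32 DWORD typedef.

    #include <cstdio>

    typedef long jint;              // the Windows jni_md.h choice; same 32-bit size as int under LLP64
    typedef unsigned long DWORD_t;  // stand-in for the Win32 DWORD typedef

    // Simplified placeholder: all three value parameters must deduce to one T.
    template <typename T>
    T cmpxchg(T exchange_value, volatile T* dest, T compare_value) {
      T old = *dest;
      if (old == compare_value) *dest = exchange_value;
      return old;
    }

    struct Canonicalizer {
      void set_constant(jint x) { std::printf("set_constant(jint = %ld)\n", (long)x); }
      // Forwarding overload in the spirit of c1_Canonicalizer.hpp line 45:
      // plain int arguments reach the jint version without casts at every call site.
      void set_constant(int x)  { set_constant((jint)x); }
    };

    int main() {
      volatile DWORD_t process_exiting = 0;
      DWORD_t tid = 42;
      // cmpxchg(tid, &process_exiting, 0);        // ill-formed: T deduced as DWORD_t and as int
      cmpxchg(tid, &process_exiting, (DWORD_t)0);  // the cast reconciles the deduction

      Canonicalizer c;
      c.set_constant(1);   // int literal, handled by the forwarding overload
      return 0;
    }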
>>>>>>>> >>>>>>>> This is truly unfortunate. >>>>>>>> >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/c1/c1_LinearScan.cpp >>>>>>>>>> >>>>>>>>>> ?ConstantIntValue((jint)0); >>>>>>>>>> >>>>>>>>>> why is this cast needed? what causes the ambiguity? (If this >>>>>>>>>> was a template I'd understand ;-) ). Also didn't you change >>>>>>>>>> that constructor to take an int anyway - not that I think it >>>>>>>>>> should - see below. >>>>>>>>> >>>>>>>>> Yes, it caused an ambiguity.? 0 matches 'int' but it doesn't >>>>>>>>> match 'long' better than any pointer type.? So this cast is >>>>>>>>> needed. >>>>>>>> >>>>>>>> But you changed the constructor to take an int! >>>>>>>> >>>>>>>> ?class ConstantIntValue: public ScopeValue { >>>>>>>> ? private: >>>>>>>> -? jint _value; >>>>>>>> +? int _value; >>>>>>>> ? public: >>>>>>>> -? ConstantIntValue(jint value)???????? { _value = value; } >>>>>>>> +? ConstantIntValue(int value)????????? { _value = value; } >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> Okay I removed this cast. >>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/ci/ciReplay.cpp >>>>>>>>>> >>>>>>>>>> 793???????? jint* dims = NEW_RESOURCE_ARRAY(jint, rank); >>>>>>>>>> >>>>>>>>>> why should this be jint? >>>>>>>>> >>>>>>>>> To avoid a cast from int* to jint* in the line below: >>>>>>>>> >>>>>>>>> ????????? value = kelem->multi_allocate(rank, dims, CHECK); >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/classfile/altHashing.cpp >>>>>>>>>> >>>>>>>>>> Okay this looks more consistent with jint. >>>>>>>>> >>>>>>>>> Yes.? I translated this from some native code iirc. >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/code/debugInfo.hpp >>>>>>>>>> >>>>>>>>>> These changes seem wrong. We have: >>>>>>>>>> >>>>>>>>>> ConstantLongValue(jlong value) >>>>>>>>>> ConstantDoubleValue(jdouble value) >>>>>>>>>> >>>>>>>>>> so we should have: >>>>>>>>>> >>>>>>>>>> ConstantIntValue(jint value) >>>>>>>>> >>>>>>>>> Again, there are multiple call sites with '0', which match int >>>>>>>>> trivially but are confused with long.? It's less consistent I >>>>>>>>> agree but better to not cast all the call sites. >>>>>>>> >>>>>>>> This is really making a mess of the APIs - they should be a >>>>>>>> jint but we declare them int because of a 0 casting problem. >>>>>>>> Can't we just use 0L? >>>>>>> >>>>>>> There aren't that many casts.? You're right, that would have >>>>>>> been better in some places. >>>>>>> >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/code/relocInfo.cpp >>>>>>>>>> >>>>>>>>>> Change seems unnecessary - int32_t is fine >>>>>>>>>> >>>>>>>>> >>>>>>>>> No, int32_t doesn't match the calls below it. They all assume >>>>>>>>> _lo and _hi are jint. >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/compiler/compileBroker.cpp >>>>>>>>>> src/hotspot/share/compiler/compileBroker.hpp >>>>>>>>>> >>>>>>>>>> I see a complete mix of int and jint in this class, so why >>>>>>>>>> make the one change you did ?? >>>>>>>>> >>>>>>>>> This is another case of using jint as a flag with cmpxchg. The >>>>>>>>> templates for cmpxchg want the types to match and 0 and 1 are >>>>>>>>> essentially 'int'.? This is a lot cleaner this way. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp >>>>>>>>>> >>>>>>>>>> 1700???? tty->write((char*) start, MIN2(length, >>>>>>>>>> (jint)O_BUFLEN)); >>>>>>>>>> >>>>>>>>>> why did you need to add the jint cast? 
It's used without any >>>>>>>>>> cast on the next two lines: >>>>>>>>>> >>>>>>>>>> 1701???? length -= O_BUFLEN; >>>>>>>>>> 1702???? offset += O_BUFLEN; >>>>>>>>>> >>>>>>>>> >>>>>>>>> There's a conversion from O_BUFLEN from int to long in 1701 >>>>>>>>> and 1702.?? MIN2 is a template that wants the types to match >>>>>>>>> exactly. >>>>>>>> >>>>>>>> $%^%$! templates! >>>>>>>> >>>>>>>>>> ?? >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/jvmci/jvmciRuntime.cpp >>>>>>>>>> >>>>>>>>>> Looking around this code it seems very confused about types - >>>>>>>>>> eg the previous function is declared jboolean yet returns a >>>>>>>>>> jint on one path! It isn't clear to me if the return type is >>>>>>>>>> what should be changed or the parameter type? I would just >>>>>>>>>> leave this alone. >>>>>>>>> >>>>>>>>> I can't leave it alone because it doesn't compile that way. >>>>>>>>> This was the minimal change and yea, does look a bit >>>>>>>>> inconsistent. >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/opto/mulnode.cpp >>>>>>>>>> >>>>>>>>>> Okay TypeInt has jint parts, so the remaining int32_t >>>>>>>>>> declarations (A, B, C, D) should also be jint. >>>>>>>>> >>>>>>>>> Yes.? c2 uses jint types. >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/opto/parse3.cpp >>>>>>>>>> >>>>>>>>>> I agree with the changes you made, but then: >>>>>>>>>> >>>>>>>>>> ?419???? jint dim_con = find_int_con(length[j], -1); >>>>>>>>>> >>>>>>>>>> should also be changed. >>>>>>>>>> >>>>>>>>>> And obviously MultiArrayExpandLimit should be defined as int >>>>>>>>>> not intx! >>>>>>>>> >>>>>>>>> Everything in globals.hpp is intx.? That's a thread that I >>>>>>>>> don't want to pull on! >>>>>>>> >>>>>>>> We still have that limitation? >>>>>>>>> >>>>>>>>> Changed dim_con to int. >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/opto/phaseX.cpp >>>>>>>>>> >>>>>>>>>> I can see that intcon(jint i) is consistent with >>>>>>>>>> longcon(jlong l), but the use of "i" in the code is more >>>>>>>>>> consistent with int than jint. >>>>>>>>> >>>>>>>>> huh?? really? >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/opto/type.cpp >>>>>>>>>> >>>>>>>>>> 1505 int TypeInt::hash(void) const { >>>>>>>>>> 1506?? return java_add(java_add(_lo, _hi), >>>>>>>>>> java_add((jint)_widen, (jint)Type::Int)); >>>>>>>>>> 1507 } >>>>>>>>>> >>>>>>>>>> I can see that the (jint) casts you added make sense, but >>>>>>>>>> then the whole function should be returning jint not int. >>>>>>>>>> Ditto the other hash functions. >>>>>>>>> >>>>>>>>> I'm not messing with this, this is the minimal in type fixing >>>>>>>>> that I'm going to do here. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/prims/jni.cpp >>>>>>>>>> >>>>>>>>>> I think vm_created should be a bool. In fact all the fields >>>>>>>>>> you changed are logically bools - do Atomics work for bool now? >>>>>>>>> >>>>>>>>> No, they do not.?? I had thought bool would be better >>>>>>>>> originally too. >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/prims/jvm.cpp >>>>>>>>>> >>>>>>>>>> is_attachable is the terminology used in the JDK code. >>>>>>>>> >>>>>>>>> Well the JDK version had is_attach_supported() as the flag >>>>>>>>> name so I used that in this one place. 
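The ambiguity described above for ConstantIntValue((jint)0) can be reproduced in isolation; make_value and ScopeValueDemo are stand-ins, not the debugInfo.hpp types:

    typedef long jint;                      // as in the Windows jni_md.h

    struct ScopeValueDemo {};

    void make_value(jint)            {}     // integer-taking overload
    void make_value(ScopeValueDemo*) {}     // pointer-taking overload

    int main() {
      // make_value(0);      // ambiguous: int -> long (integral conversion) and
      //                     // 0 -> ScopeValueDemo* (null pointer conversion) rank equally
      make_value((jint)0);   // exact match for the integer overload
      make_value(0L);        // the 0L spelling suggested in the thread also resolves it
      return 0;
    }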
>>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/prims/jvmtiEnvBase.cpp >>>>>>>>>> src/hotspot/share/prims/jvmtiImpl.cpp >>>>>>>>>> >>>>>>>>>> Are you making parameters consistent with the fields they >>>>>>>>>> initialize? >>>>>>>>> >>>>>>>>> They're consistent with the declarations now. >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/prims/jvmtiTagMap.cpp >>>>>>>>>> >>>>>>>>>> There is a mix of int and jint for slot in this code. You >>>>>>>>>> fixed some, but this remains: >>>>>>>>>> >>>>>>>>>> 2440 inline bool CallbackInvoker::report_stack_ref_root(jlong >>>>>>>>>> thread_tag, >>>>>>>>>> 2441 jlong tid, >>>>>>>>>> 2442 jint depth, >>>>>>>>>> 2443 jmethodID method, >>>>>>>>>> 2444 jlocation bci, >>>>>>>>>> 2445 jint slot, >>>>>>>>> >>>>>>>>> Right for consistency with the declarations. >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/runtime/perfData.cpp >>>>>>>>>> >>>>>>>>>> Callers pass both jint and int, so param type seems arbitrary. >>>>>>>>> >>>>>>>>> They are, but importantly they match the declarations. >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/hotspot/share/runtime/perfMemory.cpp >>>>>>>>>> src/hotspot/share/runtime/perfMemory.hpp >>>>>>>>>> >>>>>>>>>> PerfMemory::_initialized should ideally be a bool - can >>>>>>>>>> OrderAccess handle that now? >>>>>>>>> >>>>>>>>> Nope. >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/java.base/share/native/include/jvm.h >>>>>>>>>> >>>>>>>>>> Not clear why the jio functions are not also JNICALL ? >>>>>>>>> >>>>>>>>> They are now.? The JDK version didn't have JNICALL. JVM needs >>>>>>>>> JNICALL.? I can't tell you why JDK didn't need JNICALL linkage. >>>>>>>> >>>>>>>> ?? JVM currently does not have JNICALL. But they are declared >>>>>>>> as "extern C". >>>>>>> >>>>>>> This was a compilation error on Windows with JDK. Maybe the C >>>>>>> code in the JDK doesn't complain about linkage differences. I'll >>>>>>> have to go back and figure this out then. >>>>>>>> >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/java.base/unix/native/include/jni_md.h >>>>>>>>>> >>>>>>>>>> There is no need to special case ARM. The differences in the >>>>>>>>>> existing code were for LTO support and that is now irrelevant. >>>>>>>>> >>>>>>>>> See discussion with Magnus.?? We still build ARM for jdk10/hs >>>>>>>>> so I needed this conditional or of course I wouldn't have >>>>>>>>> added it.? We can remove it with LTO support. >>>>>>>> >>>>>>>> Those builds are gone - this is obsolete. But yes all LTO can >>>>>>>> be removed later if you wish. Just trying to simplify things now. >>>>>>>> >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/java.base/unix/native/include/jvm_md.h >>>>>>>>>> >>>>>>>>>> I know you've just copied this across, but it seems wrong to me: >>>>>>>>>> >>>>>>>>>> ?57 // Hack: MAXPATHLEN is 4095 on some Linux and 4096 on >>>>>>>>>> others. This may >>>>>>>>>> ? 58 //?????? cause problems if JVM and the rest of JDK are >>>>>>>>>> built on different >>>>>>>>>> ? 59 //?????? Linux releases. Here we define JVM_MAXPATHLEN >>>>>>>>>> to be MAXPATHLEN + 1, >>>>>>>>>> ? 60 //?????? so buffers declared in VM are always >= 4096. >>>>>>>>>> ? 61 #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>>>>>>>>> >>>>>>>>>> It doesn't make sense to me to define an internal "max path >>>>>>>>>> length" that can _exceed_ the platform max! >>>>>>>>>> >>>>>>>>>> That aside there's no support for building different parts of >>>>>>>>>> the JDK on different platforms and then bringing them >>>>>>>>>> together. 
And in any case I would think the real problem >>>>>>>>>> would be building on a platform that uses 4096 and running on >>>>>>>>>> one that uses 4095! >>>>>>>>>> >>>>>>>>>> But that aside this is a Linux hack and should be guarded by >>>>>>>>>> ifdef LINUX. (I doubt BSD needs it, the bsd file is just a >>>>>>>>>> copy of the linux one - the JDK macosx version does the right >>>>>>>>>> thing). Solaris and AIX should stay as-is at MAXPATHLEN. >>>>>>>>> >>>>>>>>> All of the unix platforms had MAXPATHLEN+1.? I'll leave it for >>>>>>>>> now and we can investigate that further. >>>>>>>> >>>>>>>> I see the following existing code: >>>>>>>> >>>>>>>> src/java.base/unix/native/include/jvm_md.h: >>>>>>>> >>>>>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>>>>> >>>>>>>> src/java.base/macosx/native/include/jvm_md.h >>>>>>>> >>>>>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>>>>> >>>>>>>> src/hotspot/os/aix/jvm_aix.h >>>>>>>> >>>>>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>>>>> >>>>>>>> src/hotspot/os/bsd/jvm_bsd.h >>>>>>>> >>>>>>>> #define JVM_MAXPATHLEN MAXPATHLEN + 1? // blindly copied from >>>>>>>> Linux version >>>>>>>> >>>>>>>> src/hotspot/os/linux/jvm_linux.h >>>>>>>> >>>>>>>> #define JVM_MAXPATHLEN MAXPATHLEN + 1 >>>>>>>> >>>>>>>> src/hotspot/os/solaris/jvm_solaris.h >>>>>>>> >>>>>>>> #define JVM_MAXPATHLEN MAXPATHLEN >>>>>>>> >>>>>>>> This is a linux only hack (if you ignore the blind copy from >>>>>>>> linux into the BSD code in the VM). >>>>>>> >>>>>>> Oh, thanks, so should I add a bunch of ifdefs then? Or do you >>>>>>> think having MAXPATHLEN + 1 will really break the other >>>>>>> platforms?? Do you really see this as a problem or are you just >>>>>>> pointing out inconsistency? >>>>>>>> >>>>>>>>>> >>>>>>>>>> ?86 #define ASYNC_SIGNAL???? SIGJVM2 >>>>>>>>>> >>>>>>>>>> This only exists on Solaris so I think should be in #ifdef >>>>>>>>>> SOLARIS, to make that clear. >>>>>>>>> >>>>>>>>> Ok.? I'll add this. >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> src/java.base/windows/native/include/jvm_md.h >>>>>>>>>> >>>>>>>>>> Given the differences between the two versions either >>>>>>>>>> something has been broken or "extern C" declarations are not >>>>>>>>>> needed :) >>>>>>>>> >>>>>>>>> Well, they are needed for Hotspot to build and do not prevent >>>>>>>>> jdk from building.? I don't know what was broken. >>>>>>>> >>>>>>>> We really need to understand this better. Maybe related to the >>>>>>>> map files that expose the symbols. ?? >>>>>>> >>>>>>> They're needed because the JDK files are written mostly in C and >>>>>>> that doesn't complain about the linkage difference. Hotspot >>>>>>> files are in C++ which does complain. >>>>>>> >>>>>>>> >>>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> >>>>>>>>>> That was a really painful way to spend most of my Friday. >>>>>>>>>> TGIF! :) >>>>>>>>> >>>>>>>>> Thanks for going through it.? See comments inline for changes. >>>>>>>>> Generating a webrev takes hours so I'm not going to do that >>>>>>>>> unless you insist. >>>>>>>> >>>>>>>> An incremental webrev shouldn't take long - right? You're a mq >>>>>>>> maestro now. :) >>>>>>> >>>>>>> Well I generally trash a repository whenever I use mq but sure. >>>>>>>> >>>>>>>> If you can reasonably produce an incremental webrev once you've >>>>>>>> settled on all the comments/issues that would be good. >>>>>>> >>>>>>> Ok, sure. 
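A compact sketch of that linkage point; DEMO_CALL and demo_exported_entry are illustrative and are not the real jvm.h declarations:

    // In C++ the language linkage (and, with MSVC, the calling convention) is part
    // of a function's type, so shared declarations are wrapped in extern "C" and
    // must agree with their definitions; a plain C compiler is far more forgiving,
    // which is why the JDK's C sources never complained.
    #ifdef __cplusplus
    extern "C" {
    #endif

    #ifdef _WIN32
    #define DEMO_CALL __stdcall   // as JNICALL does on Windows (ignored on x64)
    #else
    #define DEMO_CALL
    #endif

    int DEMO_CALL demo_exported_entry(int value);

    #ifdef __cplusplus
    }
    #endif

    // Definition, typically in a C++ file; linkage and convention must match.
    extern "C" int DEMO_CALL demo_exported_entry(int value) {
      return value + 1;
    }

    int main() {
      return demo_exported_entry(41) == 42 ? 0 : 1;
    }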
>>>>>>> >>>>>>> Coleen >>>>>>>> >>>>>>>> Thanks, >>>>>>>> David >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Coleen >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 27/10/2017 6:44 AM, coleen.phillimore at oracle.com wrote: >>>>>>>>>>> ??Hi Magnus, >>>>>>>>>>> >>>>>>>>>>> Thank you for reviewing this.?? I have a new version that >>>>>>>>>>> takes out the hack in globalDefinitions.hpp and adds casts >>>>>>>>>>> to src/hotspot/share/opto/type.cpp instead. >>>>>>>>>>> >>>>>>>>>>> Also some fixes from Martin at SAP. >>>>>>>>>>> >>>>>>>>>>> open webrev at >>>>>>>>>>> http://cr.openjdk.java.net/~coleenp/8189610.02/webrev >>>>>>>>>>> >>>>>>>>>>> see below. >>>>>>>>>>> >>>>>>>>>>> On 10/26/17 5:57 AM, Magnus Ihse Bursie wrote: >>>>>>>>>>>> Coleen, >>>>>>>>>>>> >>>>>>>>>>>> Thank you for addressing this! >>>>>>>>>>>> >>>>>>>>>>>> On 2017-10-25 18:49, coleen.phillimore at oracle.com wrote: >>>>>>>>>>>>> Summary: removed hotspot version of jvm*h and jni*h files >>>>>>>>>>>>> >>>>>>>>>>>>> Mostly used sed to remove prims/jvm.h and move #include >>>>>>>>>>>>> "jvm.h" after precompiled.h, so if you have repetitive >>>>>>>>>>>>> stress wrist issues don't click on most of these files. >>>>>>>>>>>>> >>>>>>>>>>>>> There were more issues to resolve, however. The JDK >>>>>>>>>>>>> windows jni_md.h file defined jint as long and the hotspot >>>>>>>>>>>>> windows jni_x86.h as int. I had to choose the jdk version >>>>>>>>>>>>> since it's the public version, so there are changes to the >>>>>>>>>>>>> hotspot files for this. Generally I changed the code to >>>>>>>>>>>>> use 'int' rather than 'jint' where the surrounding API >>>>>>>>>>>>> didn't insist on consistently using java types. We should >>>>>>>>>>>>> mostly be using C++ types within hotspot except in >>>>>>>>>>>>> interfaces to native/JNI code. There are a couple of hacks >>>>>>>>>>>>> in places where adding multiple jint casts was too painful. >>>>>>>>>>>>> >>>>>>>>>>>>> Tested with JPRT and tier2-4 (in progress). >>>>>>>>>>>>> >>>>>>>>>>>>> open webrev at >>>>>>>>>>>>> http://cr.openjdk.java.net/~coleenp/8189610.01/webrev >>>>>>>>>>>> >>>>>>>>>>>> Looks great! >>>>>>>>>>>> >>>>>>>>>>>> Just a few comments: >>>>>>>>>>>> >>>>>>>>>>>> * src/java.base/unix/native/include/jni_md.h: >>>>>>>>>>>> >>>>>>>>>>>> I don't think the externally_visible attribute should be >>>>>>>>>>>> there for arm. I know this was the case for the >>>>>>>>>>>> corresponding hotspot file for arm, but that was techically >>>>>>>>>>>> incorrect. The proper dependency here is that >>>>>>>>>>>> externally_visible should be in all JNIEXPORT if and only >>>>>>>>>>>> if we're building with JVM feature "link-time-opt". >>>>>>>>>>>> Traditionally, that feature been enabled when building >>>>>>>>>>>> arm32 builds, and only then, so there's been a >>>>>>>>>>>> (coincidentally) connection here. Nowadays, Oracle does not >>>>>>>>>>>> care about the arm32 builds, and I'm not sure if anyone >>>>>>>>>>>> else is building them with link-time-opt enabled. >>>>>>>>>>>> >>>>>>>>>>>> It does seem wrong to me to export this behavior in the >>>>>>>>>>>> public jni_md.h file, though. I think the correct way to >>>>>>>>>>>> solve this, if we should continue supporting link-time-opt >>>>>>>>>>>> is to make sure this attribute is set for exported hotspot >>>>>>>>>>>> functions. If it's still needed, that is. A quick googling >>>>>>>>>>>> seems to indicate that visibility("default") might be >>>>>>>>>>>> enough in modern gcc's. 
>>>>>>>>>>>> >>>>>>>>>>>> A third option is to remove the support for link-time-opt >>>>>>>>>>>> entirely, if it's not really used. >>>>>>>>>>> >>>>>>>>>>> I didn't know how to change this since we are still building >>>>>>>>>>> ARM with the jdk10/hs repository, and ARM needed this >>>>>>>>>>> change. I could wait until we bring down the jdk10/master >>>>>>>>>>> changes that remove the ARM build and remove this >>>>>>>>>>> conditional before I push. Or we could file an RFE to remove >>>>>>>>>>> link-time-opt (?) and remove it then? >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> * src/java.base/unix/native/include/jvm_md.h and >>>>>>>>>>>> src/java.base/windows/native/include/jvm_md.h: >>>>>>>>>>>> >>>>>>>>>>>> These files define a public API, and contain non-trivial >>>>>>>>>>>> changes. I suspect you should file a CSR request. (Even >>>>>>>>>>>> though I realize you're only matching the header file with >>>>>>>>>>>> the reality.) >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I filed the CSR.?? Waiting for the next steps. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Coleen >>>>>>>>>>> >>>>>>>>>>>> /Magnus >>>>>>>>>>>> >>>>>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8189610 >>>>>>>>>>>>> >>>>>>>>>>>>> I have a script to update copyright files on commit. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks to Magnus and ErikJ for the makefile changes. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Coleen >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> >> From dmitry.samersoff at bell-sw.com Tue Oct 31 12:58:37 2017 From: dmitry.samersoff at bell-sw.com (Dmitry Samersoff) Date: Tue, 31 Oct 2017 15:58:37 +0300 Subject: [10] RFR 8186046 Minimal ConstantDynamic support In-Reply-To: References: <93431280-9CBF-4722-961D-F2D2D0F83B4E@oracle.com> Message-ID: Paul and Frederic, Thank you. One more question. Do we need to call verify_oop below? 509 { // Check for the null sentinel. ... 517 xorptr(result, result); // NULL object reference ... 521 if (VerifyOops) { 522 verify_oop(result); 523 } -Dmitry On 31.10.2017 00:56, Frederic Parain wrote: > I?m seeing no issue with rcx being aliased in this code. > > Fred > >> On Oct 30, 2017, at 15:44, Paul Sandoz wrote: >> >> Hi, >> >> Thanks for reviewing. >> >>> On 30 Oct 2017, at 11:05, Dmitry Samersoff wrote: >>> >>> Paul, >>> >>> templateTable_x86.cpp: >>> >>> 564 const Register flags = rcx; >>> 565 const Register rarg = NOT_LP64(rcx) LP64_ONLY(c_rarg1); >>> >>> Should we use another register for rarg under NOT_LP64 ? >>> >> >> I think it should be ok, it i ain?t an expert here on the interpreter and the calling conventions, so please correct me. >> >> Some more context: >> >> + const Register flags = rcx; >> + const Register rarg = NOT_LP64(rcx) LP64_ONLY(c_rarg1); >> + __ movl(rarg, (int)bytecode()); >> >> The current bytecode code is loaded into ?rarg? >> >> + call_VM(obj, CAST_FROM_FN_PTR(address, InterpreterRuntime::resolve_ldc), rarg); >> >> Then ?rarg" is the argument to the call to InterpreterRuntime::resolve_ldc, after which it is no longer referred to. >> >> +#ifndef _LP64 >> + // borrow rdi from locals >> + __ get_thread(rdi); >> + __ get_vm_result_2(flags, rdi); >> + __ restore_locals(); >> +#else >> + __ get_vm_result_2(flags, r15_thread); >> +#endif >> >> The result from the call is then loaded into flags. >> >> So i don?t think it matters in this case if rcx is aliased. >> >> Paul. 
>> >>> -Dmitry >>> >>> >>> On 10/26/2017 08:03 PM, Paul Sandoz wrote: >>>> Hi, >>>> >>>> Please review the following patch for minimal dynamic constant support: >>>> >>>> http://cr.openjdk.java.net/~psandoz/jdk10/JDK-8186046-minimal-condy-support-hs/webrev/ >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8186046 >>>> https://bugs.openjdk.java.net/browse/JDK-8186209 >>>> >>>> This patch is based on the JDK 10 unified HotSpot repository. Testing so far looks good. >>>> >>>> By minimal i mean just the support in the runtime for a dynamic constant pool entry to be referenced by a LDC instruction or a bootstrap method argument. Much of the work leverages the foundations built by invoke dynamic but is arguably simpler since resolution is less complex. >>>> >>>> A small set of bootstrap methods will be proposed as a follow on issue for 10 (these are currently being refined in the amber repository). >>>> >>>> Bootstrap method invocation has not changed (and the rules are the same for dynamic constants and indy). It is planned to enhance this in a further major release to support lazy resolution of bootstrap method arguments. >>>> >>>> The CSR for the VM specification is here: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8189199 >>>> >>>> the j.l.invoke package documentation was also updated but please consider the VM specification as the definitive "source of truth" (we may clean up this area further later on so it becomes more informative, and that may also apply to duplicative text on MethodHandles/VarHandles). >>>> >>>> Any AoT-related work will be deferred to a future release. >>>> >>>> ? >>>> >>>> This patch only supports x64 platforms. There is a small set of changes specific to x64 (specifically to support null and primitives constants, as prior to this patch null was used as a sentinel for resolution and certain primitives types would never have been encountered, such as say byte). >>>> >>>> We will need to follow up with the SPARC platform and it is hoped/anticipated that OpenJDK members responsible for other platforms (namely ARM and PPC) will separately provide patches. >>>> >>>> ? >>>> >>>> Many of tests rely on an experimental byte code API that supports the generation of byte code with dynamic constants. >>>> >>>> One test uses class file bytes produced from a modified version of asmtools. The modifications have now been pushed but a new version of asmtools need to be rolled into jtreg before the test can operate directly on asmtools information rather than embedding class file bytes directly in the test. >>>> >>>> ? >>>> >>>> Paul. >>>> >>> >> > From doug.simon at oracle.com Tue Oct 31 13:05:11 2017 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 31 Oct 2017 14:05:11 +0100 Subject: RFR: 8190415: [JVMCI] JVMCIRuntime::adjust_comp_level must not swallow ThreadDeath Message-ID: Please review this change that fixes a JVMCI code path that was swallowing ThreadDeath exceptions and thus preventing Thread.stop from working as intended. The webrev also contains some minor unrelated cleanup to mx_jvmci.py needed for supporting the consolidated repo. The internal test that caught this problem is now passing. 
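For context, a self-contained sketch of the general pattern at issue; FakeThread and the string-based exception check are illustrative, and this is not the JVMCIRuntime::adjust_comp_level code:

    #include <cstdio>
    #include <string>

    struct FakeThread {
      std::string pending_exception;               // empty means nothing pending
      bool has_pending_exception() const { return !pending_exception.empty(); }
      void clear_pending_exception()     { pending_exception.clear(); }
    };

    // After an upcall into Java, decide what to do with a pending exception.
    // The point of the fix: ThreadDeath must stay pending so that Thread.stop()
    // still terminates the thread further up the stack.
    void after_upcall(FakeThread& t) {
      if (!t.has_pending_exception()) return;
      if (t.pending_exception == "java.lang.ThreadDeath") {
        return;                                    // do not swallow it
      }
      std::printf("ignoring %s\n", t.pending_exception.c_str());
      t.clear_pending_exception();                 // benign failures may be dropped
    }

    int main() {
      FakeThread t;
      t.pending_exception = "java.lang.ThreadDeath";
      after_upcall(t);
      std::printf("ThreadDeath still pending: %s\n",
                  t.has_pending_exception() ? "yes" : "no");   // prints "yes"
      return 0;
    }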
https://bugs.openjdk.java.net/browse/JDK-8190415 http://cr.openjdk.java.net/~dnsimon/8190415/ -Doug From robbin.ehn at oracle.com Tue Oct 31 14:37:14 2017 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 31 Oct 2017 15:37:14 +0100 Subject: RFR(XL): 8185640: Thread-local handshakes In-Reply-To: <9ff3abc3-9809-a9df-141b-15f0b05bd8a4@oracle.com> References: <6f2f6259-73f1-c09c-063e-39ae528fb96f@oracle.com> <580AD7F0-2713-472C-A440-AAFDDA2D3EB3@oracle.com> <7591f6c0-7192-78c3-fe79-56a7785c43e4@oracle.com> <2ff79d24-90ab-822a-bd61-e01b79c01ada@redhat.com> <8d7678bf2281406da43cbe090276b51f@sap.com> <28bc3976-424d-1e05-cf7f-29bc38ccabcb@oracle.com> <818e352d5e3a450491cf0c140bf129d6@sap.com> <1c4a025da6ad4d39bedc1d6a12549b87@sap.com> <93cce80b-e9d7-f016-1324-2b0f5fac48c4@redhat.com> <59F1F3B3.10701@oracle.com> <43837915-a3a3-b36f-940e-1327937f0f17@redhat.com> <2EB9D7C3-B868-4C3E-BD88-6A4F92A39999@oracle.com> <59F2DC24.8050701@oracle.com> <59F2F01A.403@oracle.com> <4ebb905f23324a00b9cf10d8d410d420@sap.com> <9ff3abc3-9809-a9df-141b-15f0b05bd8a4@oracle.com> Message-ID: Thank you David for having a look. I updated after your review, I think I got it all, please see: http://cr.openjdk.java.net/~rehn/8185640/v9/DavidH-Option-Cleanup-13/webrev/ I'm also updating CSR with product_pd. Short thing: On 10/31/2017 11:27 AM, David Holmes wrote: > > I'm also thinking, if this is platform dependent then shouldn't > ThreadLocalHandshakes be a product_pd flag, with pd specific default setting - > and turning it on when on an unsupported platform should be a error ? Yes, the error checking already exists in: 135 Flag::Error ThreadLocalHandshakesConstraintFunc(bool value, bool verbose) { 136 if (value) { 137 if (!SafepointMechanism::supports_thread_local_poll()) { 138 CommandLineError::print(verbose, "ThreadLocalHandshakes not yet supported on this platform\n"); 139 return Flag::VIOLATES_CONSTRAINT; 140 } 141 if (UseAOT JVMCI_ONLY(|| EnableJVMCI || UseJVMCICompiler)) { 142 CommandLineError::print(verbose, "ThreadLocalHandshakes not yet supported in combination with AOT or JVMCI\n"); 143 return Flag::VIOLATES_CONSTRAINT; 144 } 145 } 146 return Flag::SUCCESS; 147 } Sanity tested with handshake benchmark on all supported + 1 unsupported platform. Thanks, Robbin > > Thanks, > David > ----- > >> Here is webrev for changes needed: >> http://cr.openjdk.java.net/~rehn/8185640/v8/Option-Cleanup-12/webrev/ >> And here is CSR: >> https://bugs.openjdk.java.net/browse/JDK-8189942 >> >> Manual testing + basic testing done. >> >> And since I'm really hoping that this can be the last incremental, here is my >> whole patch queue flatten out: >> http://cr.openjdk.java.net/~rehn/8185640/v8/Full/webrev/ >> >> Thanks, Robbin >> >> On 10/27/2017 04:47 PM, Doerr, Martin wrote: >>> Hi Robbin, >>> >>> excellent. I think this matches what Coleen had proposed, now. >>> Thanks for doing all the work with so many incremental patches and for >>> responding on so many discussions. Seems to be a tough piece of work. >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Robbin Ehn [mailto:robbin.ehn at oracle.com] >>> Sent: Freitag, 27. 
Oktober 2017 15:15 >>> To: Erik ?sterlund ; Andrew Haley >>> ; Doerr, Martin ; Karen Kinnear >>> ; Coleen Phillimore (coleen.phillimore at oracle.com) >>> >>> Cc: hotspot-dev developers >>> Subject: Re: RFR(XL): 8185640: Thread-local handshakes >>> >>> Hi all, >>> >>> Poll in switches: >>> http://cr.openjdk.java.net/~rehn/8185640/v7/Interpreter-Poll-Switch-10/ >>> >>> Poll in return: >>> http://cr.openjdk.java.net/~rehn/8185640/v7/Interpreter-Poll-Ret-11/ >>> >>> Please take an extra look at poll in return. >>> >>> Sanity tested, big test run still running (99% complete - OK). >>> >>> Performance regression for the added polls increased to total of -0.68% vs >>> global poll. (was -0.44%) >>> >>> We are discussing the opt-out option, the newest suggestion is to make it >>> diagnostic. Opinions? >>> >>> For anyone applying these patches, the number 9 patch changes the option from >>> product. I have not sent that out. >>> >>> Thanks, Robbin >>> >>> >>> From kumar.x.srinivasan at oracle.com Tue Oct 31 16:42:43 2017 From: kumar.x.srinivasan at oracle.com (Kumar Srinivasan) Date: Tue, 31 Oct 2017 09:42:43 -0700 Subject: RFR: 8190287: Update JDK's internal ASM to ASMv6 In-Reply-To: <59F3690B.6070309@oracle.com> References: <59F3690B.6070309@oracle.com> Message-ID: <59F8A803.9060305@oracle.com> Hi Remi, Are you ok with the ASMv6 changes ? Thanks Kumar On 10/27/2017 10:12 AM, Kumar Srinivasan wrote: > Hello Remi, Sundar and others, > > Please review the webrev [1] to update JDK's internal ASM to v6. > > To help with review areas, you can use the browser to search for mq > patches commented with // > > Highlights of changes: > 1. updated ASMv6 // jdk-new-asmv6.patch > 2. changes to jlink and jar to add ModuleMainClass and ModulePackages > attributes //jdk-new-asm-update.patch > 3. adjustments to jdk tests //jdk-new-asm-test.patch > 4. minor adjustments to hotspot tests //jdk-new-hotspot-test.patch > > Tests: > jdk_tier1, jdk_tier2, testset hotspot, hotspot_tier1, nashorn ant tests, > Alan has also run several tests. > > Big thanks to Alan for #2 and #3 as part of [3]. > > Thanks > Kumar > > [1] http://cr.openjdk.java.net/~ksrini/8190287/webrev.00/index.html > [2] https://bugs.openjdk.java.net/browse/JDK-8190287 > [3] https://bugs.openjdk.java.net/browse/JDK-8186236 > From paul.sandoz at oracle.com Tue Oct 31 17:32:25 2017 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 31 Oct 2017 10:32:25 -0700 Subject: [10] RFR 8186046 Minimal ConstantDynamic support In-Reply-To: References: <93431280-9CBF-4722-961D-F2D2D0F83B4E@oracle.com> Message-ID: <58726425-BA16-482B-A02E-3B0613CD5010@oracle.com> > On 31 Oct 2017, at 05:58, Dmitry Samersoff wrote: > > Paul and Frederic, > > Thank you. > > One more question. Do we need to call verify_oop below? > > 509 { // Check for the null sentinel. > ... > 517 xorptr(result, result); // NULL object reference > ... > > 521 if (VerifyOops) { > 522 verify_oop(result); > 523 } > I believe it?s harmless. When the flag is on it eventually results in a call to the stub generated by generate_verify_oop: http://hg.openjdk.java.net/jdk10/hs/file/tip/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#l1023 // make sure object is 'reasonable' __ testptr(rax, rax); __ jcc(Assembler::zero, exit); // if obj is NULL it is OK If the oop is null the verification will exit safely. Paul. > -Dmitry > > > On 31.10.2017 00:56, Frederic Parain wrote: >> I?m seeing no issue with rcx being aliased in this code. 
>> >> Fred >> >>> On Oct 30, 2017, at 15:44, Paul Sandoz wrote: >>> >>> Hi, >>> >>> Thanks for reviewing. >>> >>>> On 30 Oct 2017, at 11:05, Dmitry Samersoff wrote: >>>> >>>> Paul, >>>> >>>> templateTable_x86.cpp: >>>> >>>> 564 const Register flags = rcx; >>>> 565 const Register rarg = NOT_LP64(rcx) LP64_ONLY(c_rarg1); >>>> >>>> Should we use another register for rarg under NOT_LP64 ? >>>> >>> >>> I think it should be ok, it i ain?t an expert here on the interpreter and the calling conventions, so please correct me. >>> >>> Some more context: >>> >>> + const Register flags = rcx; >>> + const Register rarg = NOT_LP64(rcx) LP64_ONLY(c_rarg1); >>> + __ movl(rarg, (int)bytecode()); >>> >>> The current bytecode code is loaded into ?rarg? >>> >>> + call_VM(obj, CAST_FROM_FN_PTR(address, InterpreterRuntime::resolve_ldc), rarg); >>> >>> Then ?rarg" is the argument to the call to InterpreterRuntime::resolve_ldc, after which it is no longer referred to. >>> >>> +#ifndef _LP64 >>> + // borrow rdi from locals >>> + __ get_thread(rdi); >>> + __ get_vm_result_2(flags, rdi); >>> + __ restore_locals(); >>> +#else >>> + __ get_vm_result_2(flags, r15_thread); >>> +#endif >>> >>> The result from the call is then loaded into flags. >>> >>> So i don?t think it matters in this case if rcx is aliased. >>> >>> Paul. >>> >>>> -Dmitry >>>> >>>> >>>> On 10/26/2017 08:03 PM, Paul Sandoz wrote: >>>>> Hi, >>>>> >>>>> Please review the following patch for minimal dynamic constant support: >>>>> >>>>> http://cr.openjdk.java.net/~psandoz/jdk10/JDK-8186046-minimal-condy-support-hs/webrev/ >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8186046 >>>>> https://bugs.openjdk.java.net/browse/JDK-8186209 >>>>> >>>>> This patch is based on the JDK 10 unified HotSpot repository. Testing so far looks good. >>>>> >>>>> By minimal i mean just the support in the runtime for a dynamic constant pool entry to be referenced by a LDC instruction or a bootstrap method argument. Much of the work leverages the foundations built by invoke dynamic but is arguably simpler since resolution is less complex. >>>>> >>>>> A small set of bootstrap methods will be proposed as a follow on issue for 10 (these are currently being refined in the amber repository). >>>>> >>>>> Bootstrap method invocation has not changed (and the rules are the same for dynamic constants and indy). It is planned to enhance this in a further major release to support lazy resolution of bootstrap method arguments. >>>>> >>>>> The CSR for the VM specification is here: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8189199 >>>>> >>>>> the j.l.invoke package documentation was also updated but please consider the VM specification as the definitive "source of truth" (we may clean up this area further later on so it becomes more informative, and that may also apply to duplicative text on MethodHandles/VarHandles). >>>>> >>>>> Any AoT-related work will be deferred to a future release. >>>>> >>>>> ? >>>>> >>>>> This patch only supports x64 platforms. There is a small set of changes specific to x64 (specifically to support null and primitives constants, as prior to this patch null was used as a sentinel for resolution and certain primitives types would never have been encountered, such as say byte). >>>>> >>>>> We will need to follow up with the SPARC platform and it is hoped/anticipated that OpenJDK members responsible for other platforms (namely ARM and PPC) will separately provide patches. >>>>> >>>>> ? 
>>>>> >>>>> Many of tests rely on an experimental byte code API that supports the generation of byte code with dynamic constants. >>>>> >>>>> One test uses class file bytes produced from a modified version of asmtools. The modifications have now been pushed but a new version of asmtools need to be rolled into jtreg before the test can operate directly on asmtools information rather than embedding class file bytes directly in the test. >>>>> >>>>> ? >>>>> >>>>> Paul. >>>>> >>>> >>> >> > > From paul.sandoz at oracle.com Tue Oct 31 19:32:28 2017 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 31 Oct 2017 12:32:28 -0700 Subject: [10] RFR 8186046 Minimal ConstantDynamic support In-Reply-To: References: Message-ID: Lois identified and fixed a bug found when running the JCK VM tests. I merged the changes below into the current webrev. Paul. --- old/src/hotspot/share/interpreter/linkResolver.cpp 2017-10-31 11:56:30.541287505 -0400 +++ new/src/hotspot/share/interpreter/linkResolver.cpp 2017-10-31 11:56:29.215676272 -0400 @@ -301,14 +301,14 @@ if (vca_result != Reflection::ACCESS_OK) { ResourceMark rm(THREAD); char* msg = Reflection::verify_class_access_msg(ref_klass, - InstanceKlass::cast(sel_klass), + InstanceKlass::cast(base_klass), vca_result); if (msg == NULL) { Exceptions::fthrow( THREAD_AND_LOCATION, vmSymbols::java_lang_IllegalAccessError(), "failed to access class %s from class %s", - sel_klass->external_name(), + base_klass->external_name(), ref_klass->external_name()); } else { // Use module specific message returned by verify_class_access_msg(). > On 26 Oct 2017, at 10:03, Paul Sandoz wrote: > > Hi, > > Please review the following patch for minimal dynamic constant support: > > http://cr.openjdk.java.net/~psandoz/jdk10/JDK-8186046-minimal-condy-support-hs/webrev/ > > https://bugs.openjdk.java.net/browse/JDK-8186046 > https://bugs.openjdk.java.net/browse/JDK-8186209 > > This patch is based on the JDK 10 unified HotSpot repository. Testing so far looks good. > > By minimal i mean just the support in the runtime for a dynamic constant pool entry to be referenced by a LDC instruction or a bootstrap method argument. Much of the work leverages the foundations built by invoke dynamic but is arguably simpler since resolution is less complex. > > A small set of bootstrap methods will be proposed as a follow on issue for 10 (these are currently being refined in the amber repository). > > Bootstrap method invocation has not changed (and the rules are the same for dynamic constants and indy). It is planned to enhance this in a further major release to support lazy resolution of bootstrap method arguments. > > The CSR for the VM specification is here: > > https://bugs.openjdk.java.net/browse/JDK-8189199 > > the j.l.invoke package documentation was also updated but please consider the VM specification as the definitive "source of truth" (we may clean up this area further later on so it becomes more informative, and that may also apply to duplicative text on MethodHandles/VarHandles). > > Any AoT-related work will be deferred to a future release. > > ? > > This patch only supports x64 platforms. There is a small set of changes specific to x64 (specifically to support null and primitives constants, as prior to this patch null was used as a sentinel for resolution and certain primitives types would never have been encountered, such as say byte). 
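The "null was used as a sentinel" remark is worth unpacking with a stand-alone sketch; CacheEntry and NULL_SENTINEL are illustrative, not the constant-pool cache types, but the x64 code quoted elsewhere in this thread does the analogous rewrite from a sentinel back to a real NULL reference:

    #include <cstdio>

    static int  s_the_null_sentinel;                 // any unique address will do
    static void* const NULL_SENTINEL = &s_the_null_sentinel;

    struct CacheEntry {
      void* resolved = nullptr;                      // nullptr still means "not resolved yet"
      bool  is_resolved() const { return resolved != nullptr; }
      void* value()       const { return resolved == NULL_SENTINEL ? nullptr : resolved; }
      void  set_resolved_null()  { resolved = NULL_SENTINEL; }   // a constant whose value is null
    };

    int main() {
      CacheEntry e;
      std::printf("resolved? %d\n", e.is_resolved());            // 0: nothing cached yet
      e.set_resolved_null();                                     // resolve to the null constant
      std::printf("resolved? %d, value = %p\n",
                  e.is_resolved(), e.value());                   // 1, and the value is a real null
      return 0;
    }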
> > We will need to follow up with the SPARC platform and it is hoped/anticipated that OpenJDK members responsible for other platforms (namely ARM and PPC) will separately provide patches. > > ? > > Many of tests rely on an experimental byte code API that supports the generation of byte code with dynamic constants. > > One test uses class file bytes produced from a modified version of asmtools. The modifications have now been pushed but a new version of asmtools need to be rolled into jtreg before the test can operate directly on asmtools information rather than embedding class file bytes directly in the test. > > ? > > Paul. From sundararajan.athijegannathan at oracle.com Tue Oct 31 04:27:35 2017 From: sundararajan.athijegannathan at oracle.com (Sundararajan Athijegannathan) Date: Tue, 31 Oct 2017 09:57:35 +0530 Subject: RFR: 8190287: Update JDK's internal ASM to ASMv6 In-Reply-To: <59F3690B.6070309@oracle.com> References: <59F3690B.6070309@oracle.com> Message-ID: <59F7FBB7.2080400@oracle.com> jlink changes look good. I ran jlink tests and all nashorn tests (jtreg as well as ant test/test262parallel) after applying the patch locally. All fine! +1 -Sundar On 27/10/17, 10:42 PM, Kumar Srinivasan wrote: > Hello Remi, Sundar and others, > > Please review the webrev [1] to update JDK's internal ASM to v6. > > To help with review areas, you can use the browser to search for mq > patches commented with // > > Highlights of changes: > 1. updated ASMv6 // jdk-new-asmv6.patch > 2. changes to jlink and jar to add ModuleMainClass and ModulePackages > attributes //jdk-new-asm-update.patch > 3. adjustments to jdk tests //jdk-new-asm-test.patch > 4. minor adjustments to hotspot tests //jdk-new-hotspot-test.patch > > Tests: > jdk_tier1, jdk_tier2, testset hotspot, hotspot_tier1, nashorn ant tests, > Alan has also run several tests. > > Big thanks to Alan for #2 and #3 as part of [3]. > > Thanks > Kumar > > [1] http://cr.openjdk.java.net/~ksrini/8190287/webrev.00/index.html > [2] https://bugs.openjdk.java.net/browse/JDK-8190287 > [3] https://bugs.openjdk.java.net/browse/JDK-8186236 > From mandy.chung at oracle.com Tue Oct 31 21:43:59 2017 From: mandy.chung at oracle.com (mandy chung) Date: Tue, 31 Oct 2017 14:43:59 -0700 Subject: [10] RFR 8186046 Minimal ConstantDynamic support In-Reply-To: References: Message-ID: <230aad0f-8649-baf2-71e8-8efc75d0cb16@oracle.com> On 10/26/17 10:03 AM, Paul Sandoz wrote: > Hi, > > Please review the following patch for minimal dynamic constant support: > > http://cr.openjdk.java.net/~psandoz/jdk10/JDK-8186046-minimal-condy-support-hs/webrev/ I reviewed the non-hotspot change as a learning exercise (I am not close to j.l.invoke implementation).? I assume DynamicConstant intends to be non-public in this patch, right? 
30 public final class DynamicConstant Mandy From paul.sandoz at oracle.com Tue Oct 31 22:53:30 2017 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 31 Oct 2017 15:53:30 -0700 Subject: [10] RFR 8186046 Minimal ConstantDynamic support In-Reply-To: <230aad0f-8649-baf2-71e8-8efc75d0cb16@oracle.com> References: <230aad0f-8649-baf2-71e8-8efc75d0cb16@oracle.com> Message-ID: <05E86643-EE91-49BB-9A57-B291AA087211@oracle.com> > On 31 Oct 2017, at 14:43, mandy chung wrote: > > > > On 10/26/17 10:03 AM, Paul Sandoz wrote: >> Hi, >> >> Please review the following patch for minimal dynamic constant support: >> >> >> http://cr.openjdk.java.net/~psandoz/jdk10/JDK-8186046-minimal-condy-support-hs/webrev/ > > > I reviewed the non-hotspot change as a learning exercise (I am not close to j.l.invoke implementation). I assume DynamicConstant intends to be non-public in this patch, right? > 30 public final class DynamicConstant > Well spotted. More likely to be renamed to ConstantBootstraps when a minimal set of dynamic constant bootstraps will be proposed (likely this week) as a follow on patch. I'll make it non-public in the updated webrev so as to keep this patch self-contained. Paul.